iText - add a portion of one PDF to another - pdf

I have two PDFs. One is the main PDF; the other contains an image that I need to insert into the main PDF. After inserting that image, I also need to append the remaining pages of the second PDF.

The solution was to superimpose the page containing the image onto the last page of the main PDF, then append the remaining pages. Here "design_section" is the PDF with the image in it. This code does the job:
PdfReader confirmation_section = new PdfReader(SOURCE);
PdfReader design_section = new PdfReader(SOURCE2);
PdfStamper stamper = new PdfStamper(confirmation_section, new FileOutputStream(RESULT));
// Page 1 of the design PDF carries the image.
PdfImportedPage page = stamper.getImportedPage(design_section, 1);
int c = confirmation_section.getNumberOfPages();
// Draw the imported page underneath the content of the last page of the main PDF.
stamper.getUnderContent(c).addTemplate(page, 0, 0);
int d = design_section.getNumberOfPages();
// Append the remaining design pages (2..d) after the pages of the main PDF.
for (int f = 2; f <= d; f++) {
    // The page number lies past the current last page, so the new page ends up at the end.
    stamper.insertPage(c + f, confirmation_section.getPageSize(1));
    page = stamper.getImportedPage(design_section, f);
    stamper.getOverContent(c + f - 1).addTemplate(page, 0, 0);
}
stamper.close();
A pointed suggestion for iText: how about renaming "addTemplate()" to "addPage()"? iText is the most cryptic library I have used, and that includes regular expressions.
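To make the resulting page order easier to see, here is a minimal sketch in plain Java that models pages as strings. It does not use iText at all; the class name PageOrderSketch, the method merge and the page labels are hypothetical and only illustrate "overlay the first design page on the last main page, then append the rest".

```java
import java.util.ArrayList;
import java.util.List;

/** Models only the page ordering of the stamping approach; no PDF library is involved. */
final class PageOrderSketch {

    /**
     * Overlays the first design page on the last main page,
     * then appends the remaining design pages in order.
     */
    static List<String> merge(List<String> mainPages, List<String> designPages) {
        List<String> result = new ArrayList<>(mainPages);
        if (result.isEmpty() || designPages.isEmpty()) {
            return result; // nothing to overlay or nothing to overlay onto
        }
        int last = result.size() - 1;
        // Stands in for addTemplate(...) on the last page of the main PDF.
        result.set(last, result.get(last) + "+" + designPages.get(0));
        // Stands in for insertPage(...) followed by addTemplate(...) for design pages 2..d.
        for (int f = 1; f < designPages.size(); f++) {
            result.add(designPages.get(f));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(merge(List.of("C1", "C2"), List.of("D1", "D2", "D3")));
    }
}
```

With two main pages and three design pages, the model yields four pages: the second one carries both C2 and D1, followed by D2 and D3.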

Thanks for the follow-up. I did read that many, many times; honestly, at least 6 times. I know it is just an excerpt, and I am sure there is more valuable information in the book, but I did not find what I was looking for. Where in that text does it discuss, compare and differentiate PdfCopy, PdfStamper and PdfReader/PdfWriter in the context of, for example, adding pages from one PDF to another?

Related

Duplicate pageItem from one layer to another in illustrator without offset

I'm copying items from a layer in one Illustrator document into a new layer in a new Illustrator document. It all works fine, except that the items are not pasted into the same location in the new document: they end up in a different position on the artboard than in the original. Could anyone tell me how to resolve this? I've had a good look around but can't find anything.
Many thanks
var targetLayer = newDoc.layers.add()
for (var k = 0; k < layerName.pageItems.length; k++) {
var newItem = layerName.pageItems[k].duplicate(targetLayer, ElementPlacement.PLACEATEND)
}
This seems to work:
for (var k = 0; k < layerName.pageItems.length; k++) {
var pos = layerName.pageItems[k].position
var newItem = layerName.pageItems[k].duplicate(targetLayer, ElementPlacement.PLACEATEND)
newItem.position = pos
}

Apache PDFBox replace text results in few character missed

Using Apache PDFBox version 2.0.2 for a text replacement (with the code below) produces output in which some characters are not displayed, mostly capital letters. For example, with the replacement "ABCDEFGHIJKLMNOPQRSTUVWXYZ" the output appears in the PDF as "ABCDEF HIJKLM OP RST W Y ". Is this a bug, or is there a workaround to handle these characters?
public static PDDocument replaceText(PDDocument document, String searchString, String replacement) throws IOException {
if (StringUtils.isEmpty(searchString) || StringUtils.isEmpty(replacement)) {
return document;
}
PDPageTree pages = document.getDocumentCatalog().getPages();
for (PDPage page : pages) {
PDFStreamParser parser = new PDFStreamParser(page);
parser.parse();
List tokens = parser.getTokens();
for (int j = 0; j < tokens.size(); j++) {
Object next = tokens.get(j);
if (next instanceof Operator) {
Operator op = (Operator) next;
//Tj and TJ are the two operators that display strings in a PDF
if (op.getName().equals("Tj")) {
// Tj takes one operator and that is the string to display so lets update that operator
COSString previous = (COSString) tokens.get(j - 1);
String string = previous.getString();
string = string.replaceFirst(searchString, replacement);
previous.setValue(string.getBytes());
} else if (op.getName().equals("TJ")) {
COSArray previous = (COSArray) tokens.get(j - 1);
for (int k = 0; k < previous.size(); k++) {
Object arrElement = previous.getObject(k);
if (arrElement instanceof COSString) {
COSString cosString = (COSString) arrElement;
String string = cosString.getString();
string = StringUtils.replaceOnce(string, searchString, replacement);
cosString.setValue(string.getBytes());
}
}
}
}
}
// now that the tokens are updated we will replace the page content stream.
PDStream updatedStream = new PDStream(document);
OutputStream out = updatedStream.createOutputStream();
ContentStreamWriter tokenWriter = new ContentStreamWriter(out);
tokenWriter.writeTokens(tokens);
page.setContents(updatedStream);
out.close();
}
return document;
}
Quoting from
https://pdfbox.apache.org/2.0/migration.html
Why was the ReplaceText example removed?
The ReplaceText example has been removed as it gave the incorrect illusion that text can be replaced easily. Words are often split, as seen by this excerpt of a content stream:
[ (Do) -29 (c) -1 (umen) 30 (tation) ] TJ
Other problems will appear with font subsets: for example, if only the glyphs for a, b and c are used, these would be encoded as hex 0, 1 and 2, so you won’t find “abc”. Additionally, you can’t replace “c” with “d” because it isn’t part of the subset.
You could also have problems with ligatures, e.g. “ff”, “fl”, “fi”, “ffi”, “ffl”, which can be represented by a single code in many fonts. To understand this yourself, view any file with PDFDebugger and have a look at the “Contents” entry of a page.
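The split-word problem from the quote can be illustrated without PDFBox. The following is a minimal sketch in plain Java; the class SplitTextSketch and its two methods are hypothetical names, and the fragments are the strings from the TJ excerpt above. It shows that a replacement working on one string at a time never sees the whole word.

```java
import java.util.List;

/** Illustrates why per-string search misses words that a TJ array splits into fragments. */
final class SplitTextSketch {

    /** True if any single fragment contains the search string (what a per-string replace sees). */
    static boolean foundInAnyFragment(List<String> fragments, String search) {
        for (String fragment : fragments) {
            if (fragment.contains(search)) {
                return true;
            }
        }
        return false;
    }

    /** True if the fragments, joined in order, contain the search string (what a reader sees). */
    static boolean foundInJoinedText(List<String> fragments, String search) {
        return String.join("", fragments).contains(search);
    }

    public static void main(String[] args) {
        // String operands of: [ (Do) -29 (c) -1 (umen) 30 (tation) ] TJ
        List<String> fragments = List.of("Do", "c", "umen", "tation");
        System.out.println(foundInAnyFragment(fragments, "Documentation"));
        System.out.println(foundInJoinedText(fragments, "Documentation"));
    }
}
```

The numbers between the fragments in the TJ array are kerning adjustments, so even a replacement on the joined text would still have to decide what to do with them.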
======================================================================
Your description suggests that the initial file uses a font subset that is missing some of the capital letters, such as G, N and Q.
And no, there is no easy workaround. You would have to delete the text you don't want from the content stream, and then append a new content stream with the text you want with a new font at the correct place.
P.S. the current PDFBox version is 2.0.7, not 2.0.2.
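The effect of a font subset can be mimicked in a few lines of plain Java. This is only a sketch under assumptions: the class FontSubsetSketch is hypothetical, the set of available glyphs is made up to match the output in the question, and a missing glyph is modelled as a blank, which is how it appeared there (a viewer may show something else).

```java
import java.util.HashSet;
import java.util.Set;

/** Mimics rendering text with a font that only contains a subset of glyphs. */
final class FontSubsetSketch {

    /** Collects the characters of a string into a set of available glyphs. */
    static Set<Character> glyphs(String available) {
        Set<Character> set = new HashSet<>();
        for (char ch : available.toCharArray()) {
            set.add(ch);
        }
        return set;
    }

    /** Replaces every character that has no glyph in the subset with a blank. */
    static String render(String text, Set<Character> subset) {
        StringBuilder out = new StringBuilder();
        for (char ch : text.toCharArray()) {
            out.append(subset.contains(ch) ? ch : ' ');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed subset: only the letters that were visible in the question's output.
        Set<Character> subset = glyphs("ABCDEFHIJKLMOPRSTWY");
        System.out.println("[" + render("ABCDEFGHIJKLMNOPQRSTUVWXYZ", subset) + "]");
    }
}
```

The replacement string itself is intact; only the glyphs needed to draw some of its characters are absent from the embedded font.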

How to tell PdfPTable that PdfPCells need to be showed dynamically according to free space on PdfPTable

I have a PDF document that I created with iText in Java.
The document contains data that is added via PdfPTable objects.
The 'problem' is that when I have more data than fits on one PDF page, the data is rendered on the next page, leaving empty space on the first page (see the 'Problem' side of the image).
I would like to fill these empty spaces with PdfPCell objects, see 'Solution'. Each of these PdfPCell objects contains another PdfPTable, and the data in that PdfPTable must not be continued on the next page of the PDF when it does not fit.
This is a small example in code:
PdfPTable outerTable = new PdfPTable(1);
outerTable.setHorizontalAlignment(Element.ALIGN_LEFT);
outerTable.setWidthPercentage(100);
int i = 0;
while (i < 5)
{
i++;
PdfPTable innerTable = new PdfPTable(new float[] {0.25f, 0.25f, 0.25f, 0.25f});
innerTable.setHorizontalAlignment(Element.ALIGN_LEFT);
innerTable.setWidthPercentage(100);
PdfPCell cell = new PdfPCell(innerTable);
cell.setPadding(0);
innerTable.addCell(new Phrase("test Data"));
innerTable.addCell(new Phrase("test Data"));
innerTable.addCell(new Phrase("test Data"));
innerTable.addCell(new Phrase("test Data"));
outerTable.addCell(cell);
}
document.add(outerTable);
document.close();
Please take a look at the DropTablePart example. In this example, I add 4 tables with 19 rows to a ColumnText object. As soon as a table doesn't fit the page, I drop the remaining content of the ColumnText object (which will automatically drop the rest of the table) and I start a new page where a new table will start.
Dropping the content of the ColumnText object can be done in two different ways:
Either:
ct = new ColumnText(writer.getDirectContent());
Or:
ct.setText(null);
The result looks like this:
As you can see, rows 10-18 are dropped from inner table 3.
This is the full code:
public void createPdf(String dest) throws IOException, DocumentException {
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(dest));
document.open();
Rectangle column = new Rectangle(36, 36, 559, 806);
ColumnText ct = new ColumnText(writer.getDirectContent());
ct.setSimpleColumn(column);
for (int i = 0; i < 4; ) {
PdfPTable table = new PdfPTable(new float[]{0.25f, 0.25f, 0.25f, 0.25f});
table.setHorizontalAlignment(Element.ALIGN_LEFT);
table.setWidthPercentage(100);
PdfPCell cell = new PdfPCell(new Phrase("inner table " + (++i)));
cell.setColspan(4);
table.addCell(cell);
for (int j = 0; j < 18; j++) {
table.addCell(new Phrase("test Data " + (j + 1) + ".1"));
table.addCell(new Phrase("test Data " + (j + 1) + ".1"));
table.addCell(new Phrase("test Data " + (j + 1) + ".1"));
table.addCell(new Phrase("test Data " + (j + 1) + ".1"));
}
ct.addElement(table);
if (ColumnText.hasMoreText(ct.go())) {
document.newPage();
ct = new ColumnText(writer.getDirectContent());
ct.setSimpleColumn(column);
}
}
document.close();
}
I didn't use nested tables, because nesting tables is generally a bad idea. It has a negative impact on the performance of your application, and it usually results in code that is hard to maintain (the programmers who inherit your application will thank you for not using nested tables).
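The drop-the-remainder logic can also be followed without iText. Below is a minimal sketch in plain Java that counts rows instead of measuring heights. The class DropRemainderSketch and the method renderedRows are hypothetical, and the page capacity of 48 rows is an assumption chosen so that the model reproduces the result described above; the real capacity depends on fonts, padding and page size.

```java
import java.util.ArrayList;
import java.util.List;

/** Row-counting model of "render what fits, drop the rest, start the next table on a new page". */
final class DropRemainderSketch {

    /** Returns, for each table, how many of its rows were rendered before the rest was dropped. */
    static List<Integer> renderedRows(int tables, int rowsPerTable, int pageCapacity) {
        List<Integer> rendered = new ArrayList<>();
        int used = 0; // rows already placed on the current page
        for (int t = 0; t < tables; t++) {
            int fit = Math.min(rowsPerTable, pageCapacity - used);
            rendered.add(fit);
            if (fit < rowsPerTable) {
                // The table did not fit: drop its remaining rows and start a new page.
                used = 0;
            } else {
                used += fit;
            }
        }
        return rendered;
    }

    public static void main(String[] args) {
        // 4 tables of 19 rows each (1 header row + 18 data rows), assumed 48 rows per page.
        System.out.println(renderedRows(4, 19, 48));
    }
}
```

In this model the third table renders 10 rows, that is, its header row plus data rows 1 to 9, so data rows 10 to 18 are dropped and the fourth table starts on a fresh page.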

Pdf Merge issue in itextsharp - PDF looks distorted after Merge

I have a simple scenario where I extract pages from a PDF document (or split the document into two parts, if you will) and merge the parts back into a new document, with an option to add new pages in between.
However, in one particular case the resulting document differs from the original: a couple of pages (in this case pages 4 and 5) look distorted compared to the source document.
How can I avoid the distortion of these pages? The reproduction code below has been tested with iTextSharp versions 5.5.0.0 and 5.5.6.0 (the latest at the moment).
You can find the input file I used here.
void Main()
{
var pathPrefix = @"C:\temp"; // TODO change
var inputDocPath = @"input.pdf";
var part1 = ExtractPages(Path.Combine(pathPrefix, inputDocPath), 1, 2);
var outputPath1 = Path.Combine(pathPrefix, "part1.pdf");
File.WriteAllBytes(outputPath1, part1);
var part2 = ExtractPages(Path.Combine(pathPrefix, inputDocPath), 3);
var outputPath2 = Path.Combine(pathPrefix, "part2.pdf");
File.WriteAllBytes(outputPath2, part2);
var merged = Merge(new[] {
outputPath1,
outputPath2
});
var mergedPath = Path.Combine(pathPrefix, "output.pdf");
File.WriteAllBytes(mergedPath, merged);
}
//Page sizes:
// input: 8,26x11,68; 8,26x11,69; 8,26x11,69; 8,26x11,69; 8,26x11,69; 8,26x11,68; 8,26x11,68
// output: 8,26x11,68; 8,26x11,69; 8,26x11,69; 8,26x11,69; 8,26x11,69; 8,26x11,68; 8,26x11,68
public static byte[] Merge(string[] documentPaths)
{
byte[] mergedDocument;
using (MemoryStream memoryStream = new MemoryStream())
using (Document document = new Document())
{
PdfSmartCopy pdfSmartCopy = new PdfSmartCopy(document, memoryStream);
document.Open();
foreach (var docPath in documentPaths)
{
PdfReader reader = new PdfReader(docPath);
try
{
reader.ConsolidateNamedDestinations();
var numberOfPages = reader.NumberOfPages;
for (int page = 0; page < numberOfPages;)
{
PdfImportedPage pdfImportedPage = pdfSmartCopy.GetImportedPage(reader, ++page);
pdfSmartCopy.AddPage(pdfImportedPage);
}
}
finally
{
reader.Close();
}
}
document.Close();
mergedDocument = memoryStream.ToArray();
}
return mergedDocument;
}
public static byte[] ExtractPages(string pdfDocument, int startPage, int? endPage = null)
{
var reader = new PdfReader(pdfDocument);
var numberOfPages = reader.NumberOfPages;
var endPageResolved = endPage.HasValue ? endPage.Value : numberOfPages;
if (startPage > numberOfPages || endPageResolved > numberOfPages)
string.Format("Error: page indices ({0}, {1}) out of bounds. Document has {2} pages.",
startPage, endPageResolved, numberOfPages).Dump();
byte[] outputDocument;
using (var doc = new Document()) // NOTE use reader.GetPageSizeWithRotation(startPage) ?
using (var msOut = new MemoryStream())
{
var pdfCopyProvider = new PdfCopy(doc, msOut);
doc.Open();
for (var i = startPage; i <= endPageResolved; i++)
{
var page = pdfCopyProvider.GetImportedPage(reader, i);
pdfCopyProvider.AddPage(page);
}
doc.Close();
reader.Close();
outputDocument = msOut.ToArray();
}
return outputDocument;
}
I could reproduce the issue using your code and your test file with iTextSharp 5.5.6. Actually, though, the images are not merely distorted, they have been replaced by other ones! Inspecting the result PDF internally, one observes:
Originally page 3 through 5 each had their own respective Resource dictionary containing different entries than the ones of each other.
After split up, as pages 1 through 3 of part2.pdf, they still had different Resource dictionaries.
In the final merged result, though, page 3 through 5 all refer to the same Resource dictionary object, a copy of the resources of the original page 3!
(As page 3 contains images with the same names as the images on pages 4 and 5, this results in page 3 images being shown on pages 4 and 5.)
Somehow PdfSmartCopy seems to outsmart itself here; using PdfCopy instead creates the expected result.
I assume PdfSmartCopy falsely considers those resource dictionaries identical, probably because of a hash collision without an actual equality check.
It might be of interest to note that an equivalent test using Java and iText, SmartMerging.java, does not show the same issue; its result is as expected.
Thus, this looks like an issue of the iTextSharp port or of .NET in general.
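The suspected failure mode, a hash collision without an equality check, can be shown in isolation. This is a minimal sketch in plain Java and not the actual PdfSmartCopy implementation; the class DedupSketch and its methods are hypothetical. It relies on the fact that the Java strings "Aa" and "BB" have the same hashCode (2112).

```java
import java.util.HashMap;
import java.util.Map;

/** Contrasts de-duplication keyed on the hash alone with de-duplication that also checks equality. */
final class DedupSketch {

    /** Flawed: two items are treated as the same object whenever their hash codes match. */
    static int countDistinctByHashOnly(String... items) {
        Map<Integer, String> seen = new HashMap<>();
        for (String item : items) {
            seen.putIfAbsent(item.hashCode(), item);
        }
        return seen.size();
    }

    /** Safer: HashMap locates the bucket by hash and then confirms the match with equals(). */
    static int countDistinctByEquality(String... items) {
        Map<String, String> seen = new HashMap<>();
        for (String item : items) {
            seen.putIfAbsent(item, item);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode() == "BB".hashCode());
        System.out.println(countDistinctByHashOnly("Aa", "BB"));
        System.out.println(countDistinctByEquality("Aa", "BB"));
    }
}
```

If resource dictionaries were de-duplicated the first way, two different dictionaries with colliding hashes would collapse into one, which would match the observation that pages 3 through 5 end up sharing the resources of page 3.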

link coming twice while exporting to pdf using itextsharp

My ASP.NET BoundField:
<asp:BoundField DataField = "SiteUrl" HtmlEncode="false" HeaderText = "Team Site URL" SortExpression = "SiteUrl" ></asp:BoundField>
My iTextSharp code:
for (int i = 0; i < dtUIExport.Rows.Count; i++)
{
for (int j = 0; j < dtUIExport.Columns.Count; j++)
{
if (j == 1)
{ continue; }
string cellText = Server.HtmlDecode(dtUIExport.Rows[i][j].ToString());
// cellText = Server.HtmlDecode((domainGridview.Rows[i][j].FindControl("link") as HyperLink).NavigateUrl);
// string cellText = Server.HtmlDecode((domainGridview.Rows[i].Cells[j].FindControl("hyperLinkId") as HyperLink).NavigateUrl);
iTextSharp.text.Font font = new iTextSharp.text.Font(bf, 10, iTextSharp.text.Font.NORMAL);
font.Color = new BaseColor(domainGridview.RowStyle.ForeColor);
iTextSharp.text.pdf.PdfPCell cell = new iTextSharp.text.pdf.PdfPCell(new Phrase(12, cellText, font));
pdfTable.AddCell(cell);
}
}
domainGridview is the name of the grid. However, I am building the PDF from the data table.
The hyperlink shows up like this:
http://dtsp2010vm:47707/sites/TS1>http://dtsp2010vm:47707/sites/TS1
How can I remove the additional link?
Edit: I have added a screenshot of the PDF file.
Your initial question didn't get an answer because it is rather misleading. You claim that the link is coming twice, but that's not true. What you see is a single link shown in its HTML syntax:
<a href="http://stackoverflow.com">http://stackoverflow.com</a>
This is the HTML definition of a single link, and it is stored in the cellText variable.
You are adding this content to a PdfPCell as if it were a simple string. It shouldn't surprise you that iText renders this string as-is. It would be a serious bug if iText didn't show:
<a href="http://stackoverflow.com">http://stackoverflow.com</a>
If you want the HTML to be rendered, for instance like this: http://stackoverflow.com, you need to parse the HTML into iText objects (e.g. the <a>-tag will result in a Chunk object with an anchor).
Parsing HTML for use in a PdfPCell is explained in the following question: How to add a rich Textbox (HTML) to a table cell?
When you have <a href="http://stackoverflow.com">http://stackoverflow.com</a>, you are talking about HTML, not just ordinary text. There's a big difference.
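The difference between the markup and the visible text can be made concrete with a small sketch in plain Java. The class StripTagsSketch and the method stripTags are hypothetical, and the regular expression is a crude tag remover that is adequate for a simple anchor like the one in the question; it is not an HTML parser and does not produce a clickable link.

```java
/** Removes anything that looks like an HTML tag, leaving only the visible text. */
final class StripTagsSketch {

    /** Deletes every "<...>" run; entities and malformed markup are not handled. */
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        String cellText = "<a href=\"http://stackoverflow.com\">http://stackoverflow.com</a>";
        System.out.println(stripTags(cellText));
    }
}
```

Stripping the tags gives a plain string that a Phrase can show once; keeping the link behaviour would still require turning the anchor into iText objects.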
I wrote this code to achieve my result. Thanks, Bruno, for your answer.
for (int j = 0; j < dtUIExport.Columns.Count; j++)
{
if (j == 1)
{ continue; }
if (j == 2)
{
String cellTextLink = Server.HtmlDecode(dtUIExport.Rows[i][j].ToString());
cellTextLink = Regex.Replace(cellTextLink, @"<[^>]*>", String.Empty);
iTextSharp.text.Font fontLink = new iTextSharp.text.Font(bf, 10, iTextSharp.text.Font.NORMAL);
fontLink.Color = new BaseColor(domainGridview.RowStyle.ForeColor);
iTextSharp.text.pdf.PdfPCell cellLink = new iTextSharp.text.pdf.PdfPCell(new Phrase(12, cellTextLink, fontLink));
pdfTable.AddCell(cellLink);
}