PdfDocument's copyPagesTo method or PdfPage's copyAsFormXObject to copy content from PDF to PDF

I followed the guide at this URL: http://developers.itextpdf.com/content/itext-7-jump-start-tutorial/chapter-6-reusing-existing-pdf-documents
Following that guide, I ran into a problem where some content from the source PDF was not copied into the destination PDF when using copyAsFormXObject (I have submitted a support ticket for this). An alternative I found in the meantime is to use PdfDocument's copyPagesTo method and then retrieve the copied page with getPage on the destination PDF. From there, I can create a PdfCanvas on the existing page and apply our transformations (such as scaling).
This seems to behave exactly like the code in the guide, except that for the PDFs where content was missing, the content is now copied.
Are there any drawbacks to using the copyPagesTo method to copy the content as opposed to what the guide suggests (copyAsFormXObject)? Performance, memory, or extraneous non-visible content, etc.?
Code that exhibits this problem:
PdfDocument pdf = new PdfDocument(new PdfWriter(dest));
PdfDocument origPdf = new PdfDocument(new PdfReader(src));
PdfPage origPage = origPdf.getPage(1);
PdfPage page = pdf.addNewPage();
PdfCanvas canvas = new PdfCanvas(page);
PdfFormXObject pageCopy = origPage.copyAsFormXObject(pdf);
canvas.addXObject(pageCopy, 0, 0);
pdf.close();
origPdf.close();
Code that does not:
PdfDocument pdf = new PdfDocument(new PdfWriter(dest));
PdfDocument origPdf = new PdfDocument(new PdfReader(src));
origPdf.copyPagesTo(1,2,pdf);
pdf.close();
origPdf.close();

I've provided code and answers for the specific problem on your support ticket.
As for the difference between copyPagesTo() and copyAsFormXObject() for copying pages:
copyPagesTo() is a high-level method that copies the entire page, preserving its structure and adding any applicable resources to the new document.
With copyAsFormXObject(), you first need to transform the page into a form XObject, essentially turning it into an appearance stream. If the page needs additional settings or resources to be displayed correctly, such as a different page size or fonts that are not stored on the page itself, they need to be set or added manually. XObjects are always added at absolute positions, so the position needs to be specified too.
While copying with low-level constructs such as XObjects grants much more control over what the result looks like, it comes with its own dangers and pitfalls. For ubiquitous tasks such as copying pages, it is better to use the high-level methods to avoid these problems.
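Because a form XObject is painted at an absolute position, scaling a copied page ultimately comes down to a transformation matrix in the content stream. The sketch below uses only the JDK to compute a uniform scale-to-fit matrix and the operators that paint the XObject with it. The class name FormPlacement and the resource name Fm1 are hypothetical, and the code does not call iText; with iText you would normally let PdfCanvas write these operators for you.

```java
import java.util.Locale;

// Hypothetical helper, not an iText API: computes the matrix that scales a form
// XObject of size srcW x srcH uniformly to fit a target box and centers it there.
final class FormPlacement {
    private FormPlacement() {}

    // Returns {a, b, c, d, e, f}, the six operands of the PDF "cm" operator.
    static double[] fitMatrix(double srcW, double srcH,
                              double boxX, double boxY, double boxW, double boxH) {
        if (srcW <= 0 || srcH <= 0) {
            throw new IllegalArgumentException("source size must be positive");
        }
        double s = Math.min(boxW / srcW, boxH / srcH); // uniform scale factor
        double e = boxX + (boxW - srcW * s) / 2;        // x offset that centers horizontally
        double f = boxY + (boxH - srcH * s) / 2;        // y offset that centers vertically
        return new double[] {s, 0, 0, s, e, f};
    }

    // Content-stream operators: save state, apply the matrix, paint the XObject
    // registered under resource name `name`, restore state.
    static String drawOperators(String name, double[] m) {
        StringBuilder sb = new StringBuilder("q");
        for (double v : m) {
            sb.append(' ').append(fmt(v));
        }
        return sb.append(" cm /").append(name).append(" Do Q").toString();
    }

    // Formats a number compactly, e.g. 0.5 -> "0.5", 25.0 -> "25".
    private static String fmt(double v) {
        String s = String.format(Locale.ROOT, "%.4f", v);
        s = s.replaceAll("0+$", "");
        return s.endsWith(".") ? s.substring(0, s.length() - 1) : s;
    }
}
```

For example, fitting a 200 x 100 pt page into a 100 x 100 pt box at the origin yields `q 0.5 0 0 0.5 0 25 cm /Fm1 Do Q`: half size, shifted up 25 pt to center it vertically.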
EDIT:
We've decided that this behaviour is a bug and that copyAsFormXObject() should include the used resources even if they're stored at the /Pages level. This will be fixed in a later release of iText.
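Resources stored at the /Pages level come from page-attribute inheritance: a page without its own /Resources entry takes it from the nearest ancestor /Pages node, and the PDF specification (ISO 32000-1) also marks MediaBox, CropBox and Rotate as inheritable. A copy routine that reads only the page dictionary therefore misses them. The JDK-only sketch below models that lookup with plain maps; PageTreeNode and resolve are hypothetical names, not iText classes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a page-tree node (/Page or /Pages dictionary); not an iText type.
final class PageTreeNode {
    // Page attributes that may be inherited from ancestor /Pages nodes.
    static final Set<String> INHERITABLE =
            new HashSet<>(Arrays.asList("Resources", "MediaBox", "CropBox", "Rotate"));

    final Map<String, Object> entries = new HashMap<>();
    final PageTreeNode parent; // null for the root /Pages node

    PageTreeNode(PageTreeNode parent) {
        this.parent = parent;
    }

    // Returns the value for `key`, walking up /Parent links only for inheritable attributes.
    Object resolve(String key) {
        for (PageTreeNode n = this; n != null; n = n.parent) {
            Object v = n.entries.get(key);
            if (v != null) {
                return v;
            }
            if (!INHERITABLE.contains(key)) {
                break; // non-inheritable keys must be on the node itself
            }
        }
        return null;
    }
}
```

A page whose own dictionary has no /Resources still resolves them from its parent, which is exactly the case a page-only copy overlooks.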


Tagged annotation link not reading properly in PDF with JAWS?

What is the problem?
Using PDFBox, I tried to tag a link that has a link annotation. The sample code that creates the PDF is here. I tagged a nested paragraph link with PDFBox, and the resulting tagged PDF passes the Adobe accessibility checker.
What did I observe?
After tagging the PDF, I tried to read it with JAWS. Unfortunately, JAWS does not read the links in either "entire document" mode or "read currently visible page" mode. I then opened the StructTreeRoot and compared an Adobe-tagged PDF with ours: our ParentTree (number tree) does not match Adobe's.
Tagged by me
Tagged by Adobe
What did I try?
I tried to replicate Adobe's number tree in my PDF. I managed to reproduce almost all of it, except for one object.
In the image above, the left side is tagged by Adobe and the right side is tagged by me. I don't understand why Adobe created the entire structure tree under object 142 0 R. I add the annotation object to the number tree (NumTree) with this code:
private void addWidgetContent(PDObjectReference objectReference, PDStructureElement fieldElem, String type, int pageIndex) {
COSDictionary annotDict = new COSDictionary();
COSArray annotArray = new COSArray();
annotArray.add(COSInteger.get(currentMCID));
annotArray.add(objectReference);
annotDict.setItem(COSName.K, annotArray);
annotDict.setString(COSName.LANG, "EN-US");
annotDict.setItem(COSName.P, currentElem.getCOSObject());
annotDict.setItem(COSName.PG, pages.get(pageIndex).getCOSObject());
annotDict.setName(COSName.S, type);
annotDicts.add(annotDict);
setNextMarkedContentDictionary();
numDictionaries.add(annotDict);
fieldElem.appendKid(objectReference);
currentElem.appendKid(fieldElem);
}
Adobe Preflight also reported one more error:
How can I fix these errors, and what is the correct way to tag a link annotation so that JAWS reads it?
Update: I have now created the parent tree root without any errors. Here is the tagged PDF file.
JAWS still does not read the links in this tagged PDF. Why?
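One thing worth checking against the ParentTree comparison above: in ISO 32000-1, each key of the parent number tree has one of two shapes. A page's /StructParents key maps to an array indexed by MCID, giving the parent structure element of each marked-content sequence on that page; an annotation's /StructParent key maps directly to its structure element (here the Link element, whose /K should also contain an OBJR dictionary pointing at the annotation). /ParentTreeNextKey must be greater than any key in use. The JDK-only sketch below models that layout with strings standing in for object references; ParentTreeModel and its methods are hypothetical, and in PDFBox 2.x the annotation's key would be stored with PDAnnotation.setStructParent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a ParentTree number tree, flattened into a sorted map.
// Strings stand in for indirect references to structure elements.
final class ParentTreeModel {
    // Key -> List<String> (page entry, indexed by MCID) or String (annotation entry).
    private final TreeMap<Integer, Object> nums = new TreeMap<>();
    private int nextKey = 0; // plays the role of /ParentTreeNextKey

    // Registers a page; the returned key goes into the page's /StructParents.
    int addPage(List<String> parentByMcid) {
        int key = nextKey++;
        nums.put(key, new ArrayList<>(parentByMcid));
        return key;
    }

    // Registers an annotation; the returned key goes into the annotation's /StructParent.
    int addAnnotation(String parentElement) {
        int key = nextKey++;
        nums.put(key, parentElement);
        return key;
    }

    int nextKey() {
        return nextKey;
    }

    // Renders the /Nums array in PDF-like syntax, e.g. [0 [P Link] 1 Link].
    @SuppressWarnings("unchecked")
    String numsArray() {
        StringBuilder sb = new StringBuilder("[");
        for (Map.Entry<Integer, Object> e : nums.entrySet()) {
            if (sb.length() > 1) {
                sb.append(' ');
            }
            sb.append(e.getKey()).append(' ');
            Object v = e.getValue();
            if (v instanceof List) {
                sb.append('[').append(String.join(" ", (List<String>) v)).append(']');
            } else {
                sb.append(v);
            }
        }
        return sb.append(']').toString();
    }
}
```

With a page whose MCID 0 belongs to a P element and MCID 1 (the link text) to a Link element, plus one link annotation, the model yields `[0 [P Link] 1 Link]`: the page entry is an array, while the annotation entry points straight at the Link element.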

How to get rid of unwanted extra pages when converting a Google document to PDF via google-apps-script?

I have an old script that (among other things) converts a Google document to PDF.
It used to work ok, but now two extra blank pages appear in the pdf version of the file.
I just discovered that this problem also affects the "Download as PDF" menu option in Google Docs. There are a number of workarounds for that case, but I need a workaround for google-apps-script.
In this post, the solution to a similar problem seems to involve fine-tuning the page size. I tried something like that, but it does not apply directly.
I also tried some other (somewhat random) variations of the page size and margins, but to no avail.
Below is a minimal working example. It should create a document "test" and its PDF version "test.pdf" in the root folder of your Google Drive.
Any help getting rid of the two extra pages is greatly appreciated.
Thanks
function myFunction() {
// this function
// - creates a google document "test",
// - writes "this is a test" inside it
// - saves and closes the document
// - creates a pdf version of the document, called "test.pdf"
//
// the conversion is ok, except two extra blank pages appear in the pdf version.
// create google document
var doc = DocumentApp.create('test');
var docFile = DriveApp.getFileById( doc.getId() );
// set margins (I need landscape layout)
// this is an attempt at a solution, inspired by https://stackoverflow.com/questions/18426817/extra-blank-page-when-converting-html-to-pdf
var body = doc.getBody();
body.setPageHeight(595.2).setPageWidth(841.8);
var mrg = 40; // in points
body.setMarginTop(mrg).setMarginBottom(mrg);
body.setMarginLeft(mrg).setMarginRight(mrg);
// write something
body.appendParagraph('this is a test').setHeading(DocumentApp.ParagraphHeading.HEADING2).setAlignment(DocumentApp.HorizontalAlignment.CENTER);
// save and close file
doc.saveAndClose();
// convert file to pdf
var docblob = docFile.getAs('application/pdf');
// set pdf name
docblob.setName("test.pdf");
// save pdf file
var file = DriveApp.createFile(docblob);
}
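For reference, the values passed to setPageHeight and setPageWidth are the A4 dimensions in landscape orientation, expressed in points (1 pt = 1/72 inch) and truncated to one decimal. With 40 pt margins on every side, the remaining text area works out as follows:

```latex
\begin{aligned}
1\,\mathrm{pt} &= \tfrac{1}{72}\,\mathrm{in} = \tfrac{25.4}{72}\,\mathrm{mm} \\
297\,\mathrm{mm} \times \tfrac{72}{25.4} &\approx 841.89\,\mathrm{pt} \quad \text{(A4 long side, used as width)} \\
210\,\mathrm{mm} \times \tfrac{72}{25.4} &\approx 595.28\,\mathrm{pt} \quad \text{(A4 short side, used as height)} \\
\text{text area} &= (841.8 - 2 \cdot 40) \times (595.2 - 2 \cdot 40) = 761.8 \times 515.2\,\mathrm{pt}
\end{aligned}
```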
I found the source of the problem, and a solution, in this post on the Google product forum from 8 months ago.
The extra pages appear in the PDF if View -> Print layout is not checked.
I did some further tests with my own account and my colleagues' accounts.
The results are consistent:
1. When View -> Print layout is not checked, two extra pages appear in the PDF version of the document.
2. When View -> Print layout is checked, the PDF version of the document has the expected number of pages.
3. This setting also affects the DocumentApp service in Google Apps Script. That is, the script above produces the expected PDF only if the "View -> Print layout" option in Google Docs is checked.
I do not see how this behaviour could be a "feature", so I think it's a bug. By the way, "Print layout" does not seem to have any visible effect on my documents (other than messing up the PDF version). I'm surprised that the bug is still there after 8 months.
Number 3 above surprised me, because I did not think that an option set manually in any Google document would affect my scripts.
I'm currently looking for a way to set the "Print layout" option from inside the script. So far I have had no luck.

The font properties look the same, but the look and feel differs between two PDF documents

I have a PDF template document that my client gave me as a reference, and my application has to generate the same document. When I compared the PDF generated by my application with the client's template, I noticed that the look and feel is slightly different. When I opened both in Adobe Acrobat Pro DC, the font properties were still the same. Please see the link below for the PDF header I created, which has this issue.
Please see the image:
The font properties are defined in XML as shown below; this is unmarshalled later.
<font name="Arial Narrow" id="blackBold" fsize="12" type="bold" rgb="0,0,0"/>
The different font types are defined in a map as follows:
fontAndAlignmentMap.put("bold", 1);
//Internally this will be resolved to-->com.itextpdf.text.Font.BOLD = 1;
The iText Font is created from the unmarshalled XML font object as follows:
FontFactory.registerDirectories();
com.itextpdf.text.Font itextFont =
FontFactory.getFont(sbcFont.getName(), sbcFont.getFsize(), fontAndAlignmentMap.get(sbcFont.getType()), new BaseColor(r,g,b));
The 10-page PDF document has different fonts and formatting. The statically defined fonts are unmarshalled and loaded into itextFontMap as shown below:
itextFontMap.put(sbcFont.getId(), itextFont);
The values of sbcFont.getId() are "blackBold", "blackRegular", etc. Depending on the use, the appropriate font is retrieved from this map:
Phrase pSbc = new Phrase("Summary of Benefits and Coverage:",
SBCMappingSingletonBuilder.itextFontMap.get("blackBold"));
Phrase pSbcDesc = new Phrase(
" What this Plan Covers & What You Pay For Covered Services",
SBCMappingSingletonBuilder.itextFontMap.get("blackRegular"));
When I printed the family name of each com.itextpdf.text.Font object, I got:
blackBold.getFamilyname() :Arial Narrow
blackRegular.getFamilyname() :Arial Narrow
And this is what the code printed for sbcFont.getName():
sbcFont.getName() :Arial Narrow Bold
sbcFont.getName() :Arial Narrow
The phrases are added to a PdfPCell, the PdfPCell is added to a PdfPTable, and the table is then added to the document.
Can anyone shed some light on this? I am using Java 1.6 and itextpdf 5.3.4.
To see the sample template I am referring to, click
here
To see the PDF I am trying to create to match the template, click
here
I opened the PDF in Adobe Acrobat Pro. The bold font looks narrow even though Acrobat shows its font properties as bold. When I open the PDF in a browser, however, it looks fine.

PDFBox - document is empty after loading

I am using Apache PDFBox to render thumbnails of PDF documents: I load the PDF and use the first page as the thumbnail. The problem is that one particular document does not seem to load correctly. For all other documents it works as expected.
ByteArrayInputStream is = new ByteArrayInputStream(pdfData);
PDDocument pdf = PDDocument.load(is, true);
List<PDPage> pages = pdf.getDocumentCatalog().getAllPages(); //pages is empty here
The PDF file has 238 pages and is about 6.5 MB in size.
Assuming that you're using a 1.8.x version, please use the non-sequential parser:
PDDocument pdf = PDDocument.loadNonSeq(is, null);
The non sequential parser is successful in certain cases where the old parser fails, e.g. for PDFs that have had revisions (example). Another advantage is that no extra code is needed for "protected" PDFs that are encrypted with the empty password.
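To check whether a problem file has been incrementally updated (the "revisions" case above), one rough heuristic is to count %%EOF markers, since each incremental update appends its own cross-reference section and trailer ending in %%EOF. Treat the count only as a hint: linearized files typically contain two markers even without edits, and the bytes can also occur by chance inside compressed streams. The JDK-only sketch below is hypothetical; PdfRevisions is not a PDFBox class.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical heuristic, not part of PDFBox: counts "%%EOF" markers in raw PDF bytes.
final class PdfRevisions {
    private PdfRevisions() {}

    static int countEofMarkers(byte[] pdf) {
        byte[] marker = "%%EOF".getBytes(StandardCharsets.US_ASCII);
        int count = 0;
        int i = 0;
        while (i + marker.length <= pdf.length) {
            if (matchesAt(pdf, i, marker)) {
                count++;
                i += marker.length; // continue searching after this marker
            } else {
                i++;
            }
        }
        return count;
    }

    // True if `pattern` occurs in `data` starting at `offset`.
    private static boolean matchesAt(byte[] data, int offset, byte[] pattern) {
        for (int j = 0; j < pattern.length; j++) {
            if (data[offset + j] != pattern[j]) {
                return false;
            }
        }
        return true;
    }
}
```

Reading the file with java.nio.file.Files.readAllBytes and getting a count above 2 would suggest the PDF has been revised, which is the situation where the non-sequential parser tends to help.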

Hyperlink in existing PDF

I am trying to add a hyperlink at known position coordinates in the PDF. I tried editing the raw PDF code and managed to add a link, but in the process I deleted other content in the PDF.
[/Rect [ x x x x ]
/Action
<</Subtype /URI/URI (http://www.xxxxx.com/)>>
/Subtype /Link
/ANN pdfmark
Is there any way to add the hyperlink without corrupting the existing PDF? Would converting to a different file format, adding the link, and converting back be a better approach? Because of possible commercial use, I cannot use some GNU-licensed products.
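The snippet in the question uses pdfmark syntax, which is meant for PostScript processed by Acrobat Distiller or Ghostscript. Inside a PDF file, a link is an annotation dictionary that must be stored as an object, referenced from the page's /Annots array, and recorded in the cross-reference table; inserting bytes by hand shifts the offsets that table relies on, which is why hand edits tend to corrupt the file. The JDK-only sketch below only builds such a dictionary so its shape is visible; LinkAnnotation is a hypothetical name, and wiring the object into the file (new object, /Annots entry, xref, ideally as an incremental update) is the part a PDF library does for you.

```java
import java.math.BigDecimal;

// Hypothetical helper: builds the raw dictionary of a URI link annotation.
final class LinkAnnotation {
    private LinkAnnotation() {}

    // Wraps s as a PDF literal string, escaping backslashes and parentheses.
    static String pdfLiteral(String s) {
        StringBuilder sb = new StringBuilder("(");
        for (char c : s.toCharArray()) {
            if (c == '\\' || c == '(' || c == ')') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.append(')').toString();
    }

    // Rect is [llx lly urx ury] in default user space (points, origin at bottom-left).
    // Assumes an ASCII URI; non-ASCII characters would need percent-encoding first.
    static String dictionary(double llx, double lly, double urx, double ury, String uri) {
        return "<< /Type /Annot /Subtype /Link"
                + " /Rect [" + num(llx) + " " + num(lly) + " " + num(urx) + " " + num(ury) + "]"
                + " /Border [0 0 0]" // no visible border around the clickable area
                + " /A << /S /URI /URI " + pdfLiteral(uri) + " >> >>";
    }

    // Formats a number without trailing zeros, e.g. 100.0 -> "100", 12.5 -> "12.5".
    private static String num(double v) {
        return BigDecimal.valueOf(v).stripTrailingZeros().toPlainString();
    }
}
```

For example, `LinkAnnotation.dictionary(100, 700, 260, 720, "http://www.example.com/")` produces `<< /Type /Annot /Subtype /Link /Rect [100 700 260 720] /Border [0 0 0] /A << /S /URI /URI (http://www.example.com/) >> >>`. Note that in raw PDF the action type is given by /S, whereas the pdfmark form uses /Subtype inside /Action.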
Debenu Quick PDF Library also provides a solution. I also recommend not editing the 'physical code' of the PDF file (with Notepad or similar editors), because that will not solve this problem or any other.
Here is sample code showing how to do it with Debenu Quick PDF Library:
/* Add a link to a webpage*/
// Set the origin for the co-ordinates to be the top left corner of the page.
DPL.SetOrigin(1);
// Adding a link to an external web page using the AddLinkToWeb function.
DPL.AddLinkToWeb(200, 100, 60, 20, "www.debenu.com", 0);
// Hyperlinks and text are two separate elements in a PDF,
//so we'll draw some text now so that you know
//where the hyperlink is located on the page.
DPL.DrawText(205, 114, "Click me!");
// When the Debenu Quick PDF Library object is initiated a blank document
// is created and selected in memory by default. So
// all we need to do now is save the document to
// the local hard disk to see the changes that we've made.
DPL.SaveToFile("link_to_web.pdf");
Member of Debenu
Docotic.Pdf library can add hyperlinks to existing PDFs. The library is not *GPL-licensed and can be used in commercial solutions after purchasing a license.
Below is code that adds a hyperlink to the first page of a PDF.
using System;
using System.Drawing;
using BitMiracle.Docotic.Pdf;

public static class HyperlinkSample
{
    public static void AddHyperlink()
    {
        // NOTE:
        // When used in trial mode, the library imposes some restrictions.
        // Please visit http://bitmiracle.com/pdf-library/trial-restrictions.aspx
        // for more information.
        using (PdfDocument pdf = new PdfDocument("input.pdf"))
        {
            // Pages are zero-indexed, so this is the first page.
            PdfPage page = pdf.Pages[0];
            // Clickable area: x, y, width, height.
            RectangleF rectWithLink = new RectangleF(10, 70, 200, 100);
            page.AddHyperlink(rectWithLink, new Uri("http://google.com"));
            pdf.Save("output.pdf");
        }
    }
}
Disclaimer: I work for the vendor of the library.