Page Templates with Form XObject in PDF - pdf

I'm writing a PDF generation library and wanted to add the ability to use other PDFs as templates. The specification suggests that a TemplateInstantiated property on the page, holding the alias of the template object, should be all that is needed.
Here is a gist of the pdf content:
https://gist.github.com/tyre/89c12f8203181f078001
The template itself is stored in object 16 and the page in object 19.
qpdf --check reports the PDF as invalid:
WARNING: tmp/alpaca.pdf: file is damaged
WARNING: tmp/alpaca.pdf (file position 32089): xref not found
WARNING: tmp/alpaca.pdf: Attempting to reconstruct cross-reference table
checking tmp/alpaca.pdf
PDF Version: 1.7
File is not encrypted
File is not linearized

I'm afraid your PDF document is completely and utterly broken and that you have misunderstood a number of key concepts. You cannot simply incorporate a complete PDF file into another PDF file in the way you have done and expect that to work.
The template system you are referring to is intended to include "hidden" pages, which are not referenced in the page tree of the PDF file, in the context of an interactive form document (or interactive documents in general). That doesn't sound like what you intend to do. These pages also need to be valid PDF pages. In other words, you cannot just include the original PDF document verbatim and expect the PDF reader to sort things out; you need to insert a syntactically correct PDF page object.
What you want to do is take the content of one document and apply it as a background to another. This is most commonly done using XObjects. Pseudo-code for this could be:
Open the original PDF document
Open the "template" document
Read the template document and copy all elements from the template page into a newly created XObject in the original PDF document.
Modify the page contents of the pages in the original PDF document to paint the new XObject at the beginning of the page description of the existing pages.
Again, it's important to note that you're not supposed to simply insert the template document into the stream of the newly created XObject. You have to create a valid XObject that contains a properly formed resources dictionary referencing all resources needed by the XObject, and whose stream holds the content stream of the template page.
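As a rough illustration of what "a valid XObject" looks like on disk, here is a minimal Python sketch that serializes a Form XObject and prepends the operator that paints it. The function names, object numbers and the resource name Tpl0 are made up for the example, and the content stream is assumed to be already decoded (no /Filter); the resources dictionary is assumed to have been copied into the target file separately.

```python
def build_form_xobject(obj_num, content, bbox, resources_ref):
    """Serialize one Form XObject as an indirect PDF object.

    content       -- the template page's decoded content stream (bytes)
    bbox          -- (llx, lly, urx, ury), normally the template's MediaBox
    resources_ref -- object number of a resources dictionary already
                     copied into the target file (fonts, images, ...)
    """
    header = (
        f"{obj_num} 0 obj\n"
        f"<< /Type /XObject /Subtype /Form\n"
        f"   /BBox [{' '.join(str(v) for v in bbox)}]\n"
        f"   /Resources {resources_ref} 0 R\n"
        f"   /Length {len(content)} >>\n"
        f"stream\n"
    ).encode("ascii")
    # The end-of-line before "endstream" is not counted in /Length.
    return header + content + b"\nendstream\nendobj\n"


def paint_first(name, page_content):
    """Prepend 'q /name Do Q' so the template is drawn under the page."""
    return b"q /" + name.encode("ascii") + b" Do Q\n" + page_content


# Hypothetical template: a light grey rectangle covering a Letter page.
template_ops = b"0.9 g 0 0 612 792 re f"
xobj = build_form_xobject(20, template_ops, (0, 0, 612, 792), 21)
page = paint_first("Tpl0", b"BT /F1 12 Tf 72 720 Td (Hello) Tj ET")
```

For the page to render, its own resources dictionary would also need an entry such as /XObject << /Tpl0 20 0 R >> so that the name used with Do resolves to the new object.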

As already indicated in comments, the PDF presented by the OP is structurally defective: the cross-reference table position and entries are wrong. Furthermore, the transition from one PDF revision to the next update looks questionable. The OP will therefore have to provide a sample PDF which is at least syntactically correct.
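For a generation library, the usual way to keep the cross-reference table correct is to record the byte offset of every object while writing and to emit the table from those recorded values. The following Python sketch does that for a single-revision file; write_pdf and the three-object document are illustrative assumptions, and incremental updates (which append a further xref section with a /Prev entry) are not covered.

```python
def write_pdf(objects):
    """Serialize object bodies (numbered from 1) into a one-revision PDF.

    objects[0] must be the document catalog. Returns the file bytes.
    """
    out = bytearray(b"%PDF-1.7\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))          # byte offset of "N 0 obj"
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"

    xref_pos = len(out)                   # the value written after startxref
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"       # object 0 heads the free list
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # every entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


pdf = write_pdf([
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
    b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>",
])
```

Because offsets are measured on the bytes actually written, they stay correct regardless of line endings or binary stream data, which is where hand-computed positions tend to go wrong.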
That being said, though, the OP indicated he was
writing a PDF generation library and wanted to add the ability to use other PDFs as templates. The specification notes a TemplateInstantiated property on pages with the alias of the template object should be all that is needed.
The Named Pages mechanism is not meant for something like that. Its main current use (if it is used at all) is in the context of spawning page templates by AcroForm actions.
To use pages from other PDFs as separate pages as they are, one can simply copy them (and the objects they reference) from the source PDF. If multiple templates are to be put onto a single target page, one can wrap the copied sources into form XObjects and include them in the target page.

Related

Embedding PDF graphics in PDF output file programmatically

I am looking for a rough overview of how one would go about embedding graphics (coming from a PDF file) into another PDF file when writing a C++ document processor.
Background: I work on the LilyPond music typesetter, and recently added Cairo output to the system. Now I would like to support adding externally provided graphics to the PDF files that we generate (e.g. adding a logo to a laid-out page). This is trivial with EPS for PS output.
I can see how you could hook up Poppler to read the PDF and render the PDF contents onto a Cairo surface, but I wonder if there is a simpler shortcut (e.g. embed the PDF file as a binary stream, and then point directly to that stream).
Going via an external route, such as reading the PDF and drawing it into the existing PDF using Cairo, would be simpler. To do it manually:
A PDF page consists of a stream of operators for drawing it, and a dictionary of external resources (fonts, images etc.). To stamp one PDF page onto another, you would need to:
a) Find all objects for external resources in the stamp which are needed, and add them to the destination PDF.
b) Convert the page to a "Form XObject", which is a sort of reusable piece of content. Add this to the /XObject entry in the resources of the destination page, making sure to pick a fresh name.
c) Add some operators to the page content in the destination page to invoke the new XObject.
To see how this might work, you could play with -stamp-as-xobject and -postpend-content "/XObjName Do" from section 8.4 of the cpdf manual.
Making this work for arbitrary PDFs is really not for the faint of heart, I'm afraid.
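Steps (b) and (c) can be sketched in a few lines of Python. This only covers picking an unused resource name and producing the operators that draw the XObject; copying the resources in step (a) is the hard part and is not shown. The helper names and the "Stamp" prefix are invented for the example.

```python
def fresh_xobject_name(existing, prefix="Stamp"):
    """Return a name not yet used in the page's /XObject dictionary."""
    n = 0
    while f"{prefix}{n}" in existing:
        n += 1
    return f"{prefix}{n}"


def stamp_operators(name, x=0, y=0, scale=1.0):
    """Content-stream operators that draw the form XObject `name`.

    q and Q save and restore the graphics state, so the cm matrix
    (uniform scale plus a translation to x, y) does not affect
    anything drawn afterwards.
    """
    ops = f"q {scale:g} 0 0 {scale:g} {x:g} {y:g} cm /{name} Do Q\n"
    return ops.encode("ascii")


used = {"Im1", "Stamp0"}                 # names already on the page
name = fresh_xobject_name(used)          # "Stamp1"
ops = stamp_operators(name, x=36, y=700, scale=0.5)
```

When the operators are appended after existing page content, it is usually safer to wrap that existing content in its own q ... Q pair as well, since it may have left a modified transformation matrix behind.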

PDFBOX merge several PDF without render Optional Content Group visibility

In my project, I need to merge several pdf documents. Each document contains only one page with several optional content groups (OCG). Many OCGs are not activated and are not visible in the pdf viewer application.
After merging this pdf with PDFMergerUtility, I get merged PDF files.
In this PDF, the first page is correct, but on all other pages every OCG is visible, including those that were previously disabled.
I don't know whether I'm using PDFMergerUtility incorrectly or whether this is a bug. Do you know how to merge several PDFs into a single file in Java so that the OCGs are displayed correctly?
Is it possible to completely delete an OCG in pdf format?
Edit :
I currently use pdfbox-app-2.0.6, I will try with 2.0.7.
I use this PDF with all OCGs disabled: simple pdf layer example
and I try to merge it with a copy of itself.
Sorry, my other PDFs are confidential.
This was fixed in issue PDFBOX-3973 and will be part of release 2.0.14 in a few weeks / months. It is available as a snapshot until release.
In the merged file both OCGs will coexist.

How to merge PDFs into a PDFA1b with watermarks using iText5

Here is what I need to do:
Merge several PDF documents (which may or may not be PDFA) into one PDFA1b.
Add a watermark (a simple text label) on each page of the resulting PDF.
It has to be with iText 5
I have looked at this official merging example: http://developers.itextpdf.com/examples/merging-pdf-documents/adding-cover-page-existing-pdf
But can this method be used to create a PDFA, and also add watermarks?
Or am I stuck with using this other method which he specifically says not to use: http://developers.itextpdf.com/examples/merging-pdf-documents-itext5/how-not-merge-documents
You can create files that conform to PDF/A-1b with just about any PDF library including iText. PDF/A, in general, is a subset of ISO 32000 (PDF) so it's really just a matter of using the tool to do what you need to with the files but not adding anything that is forbidden by PDF/A-1b (in your case).
The thing to be aware of is that iText or any of the other libraries that "support" PDF/A, will not prevent you from modifying PDF in a way that is forbidden by PDF/A... you just need to know what those things are.
So... before merging, you'll want to be sure that the input files don't have any annotations or form fields or any other interactive content.
After merging, add your watermark as page content and be sure your XMP metadata is conforming and you should be OK.

Is there any PDF version which allows automatic (or manual) addition of the HTTP source of a document?

Is there any PDF version which allows automatic (or manual) addition of the HTTP source of a document?
From the user's side, the scenario looks like this:
I find the desired document in PDF format on the web.
I save it.
A few months later I open this document and want to find the web page where I found it.
It would be nice to have the address of that file stored somewhere. Of course it could be written manually in some text file, but there are usually problems with copying and pasting PDF document titles.
If you can modify your PDF files before sending them to the browser, then there are several places where you could put the URL where the document came from:
You could use a node in the "logical structure" tree (chapter 14 part 7 of the PDF reference document). This tree will show up in Acrobat Reader in the "Model Tree" tab.
You could add a hyperlink annotation to the top or bottom of each page, or the first page, or in a new page that you can add at the beginning or at the end of the file. I personally think this is the best approach since the link will be click-able.
You could add a button field on a page that fires a URI action pointing to the source URL (a GoTo action only jumps to a destination inside the document). Actions are explained in chapter 12 - Interactive Features of the PDF reference document.
You could add a bookmark (outline) that points to a named destination that is linked to the source URL. Named Destinations are also explained in chapter 12. This approach also needs just one click, and the bookmarks tab can be hidden if it is not otherwise used.
You could add it as a Document property as @Bobrovsky said.
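To make the hyperlink-annotation option more concrete, here is a small Python sketch that builds the dictionary for a link annotation with a URI action. It only produces the dictionary text; adding it as an object and listing it in the page's /Annots array is left to whatever PDF library is in use. The helper names and the example URL are assumptions.

```python
def pdf_string(text):
    """Escape text as a PDF literal string (ASCII only in this sketch)."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return "(" + escaped + ")"


def link_annotation(url, rect):
    """Dictionary text for a clickable rectangle that opens `url`."""
    x1, y1, x2, y2 = rect
    return (
        "<< /Type /Annot /Subtype /Link\n"
        f"   /Rect [{x1} {y1} {x2} {y2}]\n"
        "   /Border [0 0 0]\n"               # no visible border
        f"   /A << /S /URI /URI {pdf_string(url)} >> >>"
    )


# Hypothetical source URL, clickable area near the bottom of the page.
annot = link_annotation("http://example.com/docs/report(1).pdf", (72, 30, 300, 44))
```

The annotation itself is invisible, so some visible text showing the URL would normally be drawn at the same rectangle in the page content.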
PDF allows you to add custom values to the document information dictionary (see 14.3.3, "Document Information Dictionary" in PDF Reference). You might put your URL there. Adobe Reader will show custom values in the Document Properties dialog on the Advanced tab.
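As a sketch of what such a custom entry looks like, the Python snippet below builds an information dictionary with one extra key. The key name SourceURL is purely an assumption (any name that does not clash with the standard keys would do), and the values are written as hex strings in UTF-16BE with a byte-order mark so that non-ASCII titles survive.

```python
def pdf_text_hex(text):
    """Encode text as a PDF hex string in UTF-16BE with a byte-order mark."""
    return "<" + (b"\xfe\xff" + text.encode("utf-16-be")).hex().upper() + ">"


def info_dictionary(title, source_url, key="SourceURL"):
    """Info dictionary text with one custom entry; `key` is a made-up name."""
    return (
        f"<< /Title {pdf_text_hex(title)}\n"
        f"   /{key} {pdf_text_hex(source_url)} >>"
    )


# Hypothetical values for illustration.
info = info_dictionary("Annual report", "http://example.com/report.pdf")
```

The dictionary has to be written as an indirect object and referenced from the file trailer through its /Info entry for viewers to pick it up.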
Starting from PDF 1.4 (Acrobat 5.x and later) you might add the URL to the XMP metadata stream referenced from the document catalog (see 14.3 Metadata in PDF Reference). Adobe Reader will show metadata properties too if you put them in a custom schema.
Acrobat Professional could be used to add custom values or XMP metadata. Almost any PDF library that can open and save PDFs could be used for the task too.
I think there are no other places in a PDF document where you can store this information.
PDF Reference

PDF files Stitching with cover page and disclaimer pages

We are looking for a way to stitch multiple PDF files into one merged PDF file.
We need to stitch multiple PDF financial reports into a single merged PDF file (one PDF report package).
The first page of each report is the cover page, followed by the report body, and finally one or more pages containing disclaimer info. The table of contents on the cover page is different for each of the separate reports before stitching, and the disclaimer info also differs somewhat between reports.
What we want is that the merged PDF package contains one cover page with a new table of contents, such as the page numbers for each section in the stitched report package, and one set of combined disclosures on the last one or several pages that covers all disclaimer info, without missing or repeating any of the disclaimer info from the separate reports.
The report bodies themselves should be concatenated between the cover page and the first disclaimer page of the merged PDF package.
If you are using either Java or C#, you can probably use the iText PDF library. In particular, see the 'addPage()' method of the PdfCopy class:
PdfCopy Class addPage method
If commercial products are an option for you, and you are targeting Windows, you could use Amyuni PDF Creator ActiveX (C++, Delphi, VB, ASP Classic, etc) or Amyuni PDF Converter .Net (C#, VB.NET etc) for this task, specifically the Append method. You can get customer support during evaluation period.
From the documentation:
Append Method
The Append method can be used to append or concatenate a PDF file to
the current document. Syntax:
C++: HRESULT Append ([in] BSTR FileName,[in] BSTR Password)
C#: void Append (string fileName, string password)
VB: Sub Append (FileName As String, Password As String)
You may also need to use the DeletePage and Merge methods. The Merge method draws the content of a PDF file on top of another PDF file.
Usual disclaimer applies.
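Whichever library does the actual copying, the page bookkeeping for the new cover page can be worked out beforehand. The Python sketch below assumes that every report starts with exactly one cover page of its own and that the number of trailing disclaimer pages per report is known; the function name and the sample reports are invented for illustration. Combining the disclaimer text without repeats would need text extraction and is not shown.

```python
def plan_package(reports, cover_pages=1):
    """Work out which pages to keep and where each body starts.

    reports -- list of (title, total_pages, disclaimer_pages); every
               report is assumed to open with one cover page of its own.
    Returns (toc, keep, disclaimer_start):
      toc              -- title -> first page of that body in the package
      keep             -- (title, first, last) source page ranges, 1-based
      disclaimer_start -- package page where combined disclaimers begin
    """
    toc, keep = {}, []
    next_page = cover_pages + 1               # bodies follow the new cover
    for title, total, disclaimers in reports:
        first, last = 2, total - disclaimers  # drop old cover and disclaimers
        toc[title] = next_page
        keep.append((title, first, last))
        next_page += last - first + 1
    return toc, keep, next_page


toc, keep, disclaimer_start = plan_package([("Q1", 10, 2), ("Q2", 6, 1)])
# toc == {"Q1": 2, "Q2": 9}, disclaimer_start == 13
```

The ranges in keep are what would be handed to the copying step (for example one addPage call per page), and toc provides the numbers to print on the new cover page.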