PDFBox: merging several PDFs does not preserve Optional Content Group visibility


In my project, I need to merge several PDF documents. Each document contains a single page with several optional content groups (OCGs). Many of the OCGs are disabled and therefore not visible in the PDF viewer.
After merging these PDFs with PDFMergerUtility, I get a merged PDF file.
In this file the first page is correct, but on every other page all OCGs are visible: the previously disabled OCGs have become visible.
I don't know whether I'm using PDFMergerUtility incorrectly or whether this is a bug. Does anyone know how to merge several PDFs into a single file in Java so that the OCGs are displayed correctly?
Is it possible to completely delete an OCG from a PDF?
Edit:
I currently use pdfbox-app-2.0.6; I will try with 2.0.7.
I am testing with this PDF, in which all OCGs are disabled: simple PDF layer example
and I try to merge it with a copy of itself.
Sorry, my other PDFs are confidential.

This was fixed in issue PDFBOX-3973 and will be part of release 2.0.14, expected in a few weeks or months. Until then it is available as a snapshot.
In the merged file both OCGs will coexist.

Related

Can you embed a separate pdf into Indesign and open it after exporting to PDF?

I would like to ask the following, if possible. We have a client who wants a separate PDF document embedded in a main PDF document, so that it opens when you click it. This is like the function in MS Word where you can attach another Word document inside a Word document (Word-ception, lol) and still open it.
I've tried it in Acrobat Pro with the Attachment and Link tools. Another option was to put the linked document on an FTP server for accessibility, but our client really wants this functionality. Is this possible in InDesign?
Thank you!

Using Word as your example, there are several ways to link two documents.
One is to make one document an appendix to the other. In PDF terms this is a merge or binding: a single flowing document with separate sequential sections or chapters.
Another way is to link to an external file. In PDF terms this is a hyperlink to a second file, either relative to the local folder or an absolute web reference. You have tried that.
In Word we can embed objects that appear as icons. In PDF the equivalent is a file attachment annotation, which the reader can save externally and then open. You also seem to discount that approach.
Finally, PDF offers an Adobe-specific structure in which multiple PDF attachments are embedded in an overall PDF wrapper. These are called Portfolios, not to be confused with Adobe's Portfolio service.
They are unpopular because a browser without Adobe Reader will typically offer only the cover page,
while more secure offline readers may well show the files as attachments that you need to save or open independently to view them.
Only some non-Acrobat viewers can display them as a collection. In the past that required running insecure Flash (SWF), but I understand that has changed.
Here is how the 3 internal PDF files seen above were shown in the older Acrobat 9.
Possibly the best experience is with Foxit Reader.

How to merge PDFs into a PDFA1b with watermarks using iText5

Here is what I need to do:
Merge several PDF documents (which may or may not be PDFA) into one PDFA1b.
Add a watermark (a simple text label) on each page of the resulting PDF.
It has to be with iText 5
I have looked at this official merging example: http://developers.itextpdf.com/examples/merging-pdf-documents/adding-cover-page-existing-pdf
But can this method be used to create a PDF/A file and also add watermarks?
Or am I stuck with this other method, which the author specifically says not to use: http://developers.itextpdf.com/examples/merging-pdf-documents-itext5/how-not-merge-documents

You can create files that conform to PDF/A-1b with just about any PDF library including iText. PDF/A, in general, is a subset of ISO 32000 (PDF) so it's really just a matter of using the tool to do what you need to with the files but not adding anything that is forbidden by PDF/A-1b (in your case).
The thing to be aware of is that iText or any of the other libraries that "support" PDF/A, will not prevent you from modifying PDF in a way that is forbidden by PDF/A... you just need to know what those things are.
So... before merging, you'll want to be sure that the input files don't have any annotations or form fields or any other interactive content.
After merging, add your watermark as page content and be sure your XMP metadata is conforming and you should be OK.

Page Templates with Form XObject in PDF

I'm writing a PDF generation library and wanted to add the ability to use other PDFs as templates. According to the specification, a TemplateInstantiated property on pages with the alias of the template object should be all that is needed.
Here is a gist of the pdf content:
https://gist.github.com/tyre/89c12f8203181f078001
The template itself is stored in object 16 and the page in object 19.
qpdf --check reports the PDF as invalid:
WARNING: tmp/alpaca.pdf: file is damaged
WARNING: tmp/alpaca.pdf (file position 32089): xref not found
WARNING: tmp/alpaca.pdf: Attempting to reconstruct cross-reference table
checking tmp/alpaca.pdf
PDF Version: 1.7
File is not encrypted
File is not linearized

I'm afraid your PDF document is completely and utterly broken and that you have misunderstood a number of key concepts. You cannot simply incorporate a complete PDF file into another PDF file in the way you have done and expect that to work.
The template system you are referring to is intended to include "hidden" pages - not referenced in the pages tree in the PDF file - in the context of an interactive form document (or interactive document in general). That doesn't sound like what you are intending to do. And these pages need to be valid PDF pages. You can in other words not just include the original PDF document verbatim and expect the PDF reader to sort things out; you need to insert a syntactically correct PDF page object.
What you want to do is take the content of a document and apply that as a background to a document. This most commonly is done using XObjects. Pseudo-code for this could be:
Open the original PDF document
Open the "template" document
Read the template document and copy all elements from the template page into a newly created XObject in the original PDF document.
Modify the page contents of the pages in the original PDF document to paint the new XObject at the beginning of the page description of the existing pages.
It's important to note that again, you're not supposed to simply insert the template document into the stream for the newly created XObject. You will have to create a valid XObject that contains a properly formed resources dictionary referencing all resources needed by your XObject, and that contains the content stream from your template document.
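The steps above can be sketched at the level of raw PDF syntax. This is only an illustration, not a complete implementation: it assumes the template's content stream has already been decoded into plain text and that a valid resources dictionary for it is at hand. The class and method names, the XObject name Tpl0 and the font object 5 are hypothetical; object number 16 is borrowed from the question.

```java
import java.nio.charset.StandardCharsets;

// Sketch only: emits raw PDF syntax as text; object numbers and names are hypothetical.
class FormXObjectSketch {

    /**
     * Serialises a form XObject as an indirect object.
     * templateContent is assumed to be the already decoded content stream of the
     * template page; resourcesDict is assumed to be a valid dictionary listing every
     * font, image, etc. that the content stream refers to.
     */
    static String formXObject(int objNum, String templateContent, String resourcesDict,
                              int width, int height) {
        // /Length is the byte length of the stream data, not the character count.
        int length = templateContent.getBytes(StandardCharsets.ISO_8859_1).length;
        return objNum + " 0 obj\n"
                + "<< /Type /XObject /Subtype /Form\n"
                + "   /BBox [0 0 " + width + " " + height + "]\n"
                + "   /Resources " + resourcesDict + "\n"
                + "   /Length " + length + " >>\n"
                + "stream\n"
                + templateContent + "\n"
                + "endstream\n"
                + "endobj\n";
    }

    /** Prepends operators that paint the XObject, so it ends up behind the existing page content. */
    static String paintFirst(String xobjectName, String pageContent) {
        // q/Q save and restore the graphics state around the Do operator.
        return "q /" + xobjectName + " Do Q\n" + pageContent;
    }

    public static void main(String[] args) {
        String template = "BT /F1 12 Tf 72 720 Td (DRAFT) Tj ET";
        String resources = "<< /Font << /F1 5 0 R >> >>"; // hypothetical font object
        System.out.print(formXObject(16, template, resources, 612, 792));
        // The page's /Resources must also map the name: /XObject << /Tpl0 16 0 R >>
        System.out.println(paintFirst("Tpl0", "0 0 m 100 100 l S"));
    }
}
```

Every object referenced from the resources dictionary (here the font) has to be copied into the target file as well, and the page's own resource dictionary must list the XObject under the name used with the Do operator.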

As already indicated in the comments, the PDF presented by the OP is structurally defective: the cross-reference table position and entries are wrong. Furthermore, the transition from one PDF revision to the next update looks questionable. Essentially, therefore, the OP will have to provide a sample PDF which is at least syntactically correct.
That being said, though, the OP indicated he was
writing a PDF generation library and wanted to add the ability to use other PDFs as templates. The specification notes a TemplateInstantiated property on pages with the alias of the template object should be all that is needed.
The Named Pages mechanism is not meant for something like that. Its main current use (if it is used at all) is in the context of spawning page templates by Acroform actions.
For using pages from other PDFs, one can simply copy them (and the referenced other objects) from the source PDF if they are to be used as separate pages as is; and if multiple templates are to be put onto a single target page, one can wrap the copied sources into form xobjects and include them in the target page.
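On the cross-reference point: in a classic xref table, each in-use entry holds the byte offset at which the corresponding "N 0 obj" begins, and the number after startxref is the byte offset of the xref keyword itself. A generator can get both right by recording the output position as it writes. The following is a minimal sketch, assuming a single revision without incremental updates and consecutively numbered objects starting at 1, with object 1 as the catalog; XrefSketch and buildPdf are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Sketch only: writes objects 1..n and a matching classic cross-reference table.
class XrefSketch {

    static byte[] buildPdf(List<String> objectBodies) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        List<Integer> offsets = new ArrayList<>();
        write(out, "%PDF-1.7\n");
        for (int i = 0; i < objectBodies.size(); i++) {
            offsets.add(out.size()); // byte offset where "N 0 obj" begins
            write(out, (i + 1) + " 0 obj\n" + objectBodies.get(i) + "\nendobj\n");
        }
        int xrefPos = out.size(); // byte offset of the "xref" keyword
        write(out, "xref\n0 " + (offsets.size() + 1) + "\n");
        // Each entry is exactly 20 bytes; entry 0 is the head of the free list.
        write(out, "0000000000 65535 f \n");
        for (int offset : offsets) {
            write(out, String.format(Locale.ROOT, "%010d 00000 n \n", offset));
        }
        write(out, "trailer\n<< /Size " + (offsets.size() + 1) + " /Root 1 0 R >>\n");
        write(out, "startxref\n" + xrefPos + "\n%%EOF\n");
        return out.toByteArray();
    }

    private static void write(ByteArrayOutputStream out, String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        out.write(bytes, 0, bytes.length);
    }

    public static void main(String[] args) {
        byte[] pdf = buildPdf(Arrays.asList(
                "<< /Type /Catalog /Pages 2 0 R >>",
                "<< /Type /Pages /Kids [] /Count 0 >>"));
        System.out.print(new String(pdf, StandardCharsets.ISO_8859_1));
    }
}
```

Because the offsets are counted in bytes, anything that changes the file after the table is written (re-encoding, newline conversion, appended data) invalidates them, and a reader then has to reconstruct the table.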

ExpertPdf - how to generate pdf with marking text possibility from HTML

I'm using the ExpertPDF library to generate PDFs. I had the same code on two different servers. On one of them the PDF was generated as text (I could select and copy text in the PDF). On the second server the PDF was generated as a picture, so I couldn't select text in the PDF file.
What is the difference? Where should I look for the error?
Now, after installing Windows updates on all servers, the PDF is generated as a picture everywhere. I think the updates were the main cause of that change, but I'm not sure.

I had this problem too, with an older version of ExpertPdf that used IE for rendering. The problem is fixed in the latest version (v9), which no longer relies on IE.

Save out a new PDF with updates from users

In my iOS app, I would like to regenerate an existing pdf into another pdf after the users are done annotating on the existing pdf.
My regenerated pdf should be an exact replica of the existing pdf but should have embedded annotations and highlights etc which can be opened and viewed on desktops as well.
I have done some research on this including the solutions proposed on other SO posts. I have tried libharu etc.
But somehow I am not able to convert an existing PDF into a replica PDF. I am only able to add annotations to a new PDF that I create using libharu.
So my problem is how to copy the existing PDF as is into my regenerated PDF. Any pointers would be very helpful.

My understanding is that a library that can save back out a PDF with "true" annotations (those that can be hidden in Acrobat, for example) is not something that exists in a FOSS solution.
LibHaru, for example, only supports creating new PDFs, not editing or appending existing PDFs. From their homepage:
At this moment libHaru does not support reading and editing existing
PDF files and it's unlikely this support will ever appear.
You can render the PDF on a page by page basis, and then re-save it with some additional information. This S.O question has a reasonable looking piece of code. That will save any "annotations" more as an image in the PDF itself, though.
You might try a paid library like PDFNet.
