Add a cover page to a PDF document

I create a PDF document from an HTML page with the EVO PDF library, using the code below:
HtmlToPdfConverter htmlToPdfConverter = new HtmlToPdfConverter();
byte[] outPdfBuffer = htmlToPdfConverter.ConvertUrl(url);
Response.AddHeader("Content-Type", "application/pdf");
Response.AddHeader("Content-Disposition", String.Format("attachment; filename=Merge_HTML_with_Existing_PDF.pdf; size={0}", outPdfBuffer.Length.ToString()));
Response.BinaryWrite(outPdfBuffer);
Response.End();
This produces a PDF document, but I have another PDF document that I would like to use as a cover page in the final PDF document.
One possibility I was thinking about was to create the PDF document and then merge my cover page PDF with the PDF produced by the converter, but this looks like an inefficient solution. Saving the PDF and loading it back for the merge seems to introduce unnecessary overhead. I would like to merge the cover page while the PDF document produced by the converter is still in memory.

The following line, added to your code right after you create the HTML to PDF converter object, should do the trick:
// Set the PDF file to be inserted before conversion result
htmlToPdfConverter.PdfDocumentOptions.AddStartDocument("CoverPage.pdf");

Related

How to replace PDF page

Given a PDF (with multiple pages), a page (stream) and an index, how can I replace the page at the target index with my stream and save the PDF?
public Stream ReplacePDFPageAtIndex(Stream pdfStream, int index, Stream replacePageStream)
{
    List<Stream> pdfPages = // fetch pdf pages
    pdfPages.RemoveAt(index);
    pdfPages.Insert(index, replacePageStream);
    Stream newPdf = // create new pdf using the edited list
    return newPdf;
}
I was wondering whether there is an open source solution, or whether I can roll my own with ease, given that I just need to split the PDF into pages and replace one of them.
I have also checked iTextSharp, but I do not see the functionality that I require.
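For what it's worth, here is a minimal sketch of this split-and-replace flow with iText 7 (shown in Java; iText 7 for .NET exposes the same copyPagesTo API). The file paths, and the assumption that the replacement page arrives as its own one-page PDF, are mine:
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.PdfWriter;

// Sketch: replace the page at 'index' (1-based) in src with the first page of replacement
public static void replacePage(String src, String replacement, String dest, int index) throws Exception {
    try (PdfDocument srcDoc = new PdfDocument(new PdfReader(src));
         PdfDocument repDoc = new PdfDocument(new PdfReader(replacement));
         PdfDocument destDoc = new PdfDocument(new PdfWriter(dest))) {
        int total = srcDoc.getNumberOfPages();
        if (index > 1) {
            srcDoc.copyPagesTo(1, index - 1, destDoc);     // pages before the target
        }
        repDoc.copyPagesTo(1, 1, destDoc);                 // the replacement page
        if (index < total) {
            srcDoc.copyPagesTo(index + 1, total, destDoc); // pages after the target
        }
    }
}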

How can I reduce size of generated PDF using iText7

I use this method to copy and scale pages, selected by page number, from an original PDF into a generated PDF that contains only the selected, scaled pages.
private static void addScaledPage(PdfDocument pdf, PdfDocument srcDoc, String pageNumber) throws IOException {
    // Add a blank A4 page to the destination and get a canvas for it
    PdfPage page = pdf.addNewPage(PageSize.A4);
    PdfCanvas canvas = new PdfCanvas(page);
    // Scale everything drawn from here on down to 86%
    AffineTransform transformationMatrix = AffineTransform.getScaleInstance(0.86, 0.86);
    canvas.concatMatrix(transformationMatrix);
    // Import the source page as a form XObject and draw it with a small offset
    PdfFormXObject pageCopy = srcDoc.getPage(Integer.valueOf(pageNumber)).copyAsFormXObject(pdf);
    canvas.addXObject(pageCopy, 50, 30);
}
This code works fine, but a small issue occurs when I take 3 pages from an original PDF that has 140 pages and is approx. 10 MB in size => the generated PDF with the 3 selected pages is also approx. 10 MB.
Also, whether I copy 3 pages or 10 pages from the source document, I always get the same size of generated PDF => it seems like references are copied from the source PDF.
I would appreciate some advice: did I do something wrong in the implementation? Or do you have some other advice?
Kindest regards,
It depends a lot on the resources embedded in the document. If a large image that uses CMYK color, or a font with CJK glyphs (either of these resources could easily be several MB in size), is used on the pages you are copying, that entire resource will be copied into the PDF you're creating. The fact that you are only copying three out of 140 pages wouldn't make much difference: the bulk of the file size will be taken up by the resource, and the pages won't display properly without it.
A solution would be a workflow that optimizes your document during or after copying the pages. This could convert images to an equivalent, smaller color space, or subset the fonts so that you only carry the required glyphs. Both of these techniques can substantially reduce the size of the file (but this all depends on how the source file itself is constructed, of course).
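For the image side of that, one possible post-processing pass (a sketch only, loosely based on iText 7's image-manipulation examples; the blanket JPEG re-encode and DeviceRGB conversion are my assumptions, and they will discard transparency and degrade quality) walks the finished document's objects, finds the image streams, and re-encodes them:
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfName;
import com.itextpdf.kernel.pdf.PdfNumber;
import com.itextpdf.kernel.pdf.PdfObject;
import com.itextpdf.kernel.pdf.PdfStream;
import com.itextpdf.kernel.pdf.xobject.PdfImageXObject;
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Sketch: re-encode every decodable image stream in the document as a JPEG
public static void recompressImages(PdfDocument pdfDoc) throws IOException {
    for (int i = 1; i <= pdfDoc.getNumberOfPdfObjects(); i++) {
        PdfObject obj = pdfDoc.getPdfObject(i);
        if (obj == null || !obj.isStream()) continue;
        PdfStream stream = (PdfStream) obj;
        if (!PdfName.Image.equals(stream.getAsName(PdfName.Subtype))) continue;
        BufferedImage bi = ImageIO.read(new ByteArrayInputStream(
                new PdfImageXObject(stream).getImageBytes()));
        if (bi == null) continue; // filter not supported by ImageIO: leave untouched
        ByteArrayOutputStream jpg = new ByteArrayOutputStream();
        ImageIO.write(bi, "jpg", jpg); // a writeJPG-style quality parameter could go here
        // Overwrite the stream with the re-encoded image and matching image keys
        stream.clear();
        stream.setData(jpg.toByteArray());
        stream.put(PdfName.Type, PdfName.XObject);
        stream.put(PdfName.Subtype, PdfName.Image);
        stream.put(PdfName.Filter, PdfName.DCTDecode);
        stream.put(PdfName.Width, new PdfNumber(bi.getWidth()));
        stream.put(PdfName.Height, new PdfNumber(bi.getHeight()));
        stream.put(PdfName.BitsPerComponent, new PdfNumber(8));
        stream.put(PdfName.ColorSpace, PdfName.DeviceRGB);
    }
}
This would run on a PdfDocument opened in stamping mode (new PdfDocument(new PdfReader(src), new PdfWriter(dest))) before it is closed.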

Best practice to compress generated PDF using iText7

I have an existing source PDF document, and I copy selected pages from it to generate a destination PDF containing only those pages. Every page in the source document was scanned at a different resolution, so the result varies in size:
generated document with 4 pages => 175 kb
generated document with 4 pages => 923 kb (I suppose this is because of the higher scan resolution of each page in the source document)
What would be the best practice to compress these pages?
Is there any code sample for compressing / reducing the size of a final PDF that consists of pages copied from a source document at different resolutions?
Kindest regards
If you are just adding scans to a PDF document, it makes sense for the size of the resulting document to go up if you're using high-resolution images.
Keep in mind that iText is a PDF library, not an image-manipulation library.
You could, of course, use regular old Java to compress the images:
public static void writeJPG(BufferedImage bufferedImage, OutputStream outputStream, float quality) throws IOException
{
    Iterator<ImageWriter> iterator = ImageIO.getImageWritersByFormatName("jpg");
    ImageWriter imageWriter = iterator.next();
    ImageWriteParam imageWriteParam = imageWriter.getDefaultWriteParam();
    imageWriteParam.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
    imageWriteParam.setCompressionQuality(quality);
    ImageOutputStream imageOutputStream = new MemoryCacheImageOutputStream(outputStream);
    imageWriter.setOutput(imageOutputStream);
    IIOImage iioimage = new IIOImage(bufferedImage, null, null);
    imageWriter.write(null, iioimage, imageWriteParam);
    imageOutputStream.flush();
}
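From there, the compressed bytes could go straight back into an iText 7 layout Document (a sketch; the 0.5f quality is an arbitrary choice, and the method assumes writeJPG above is in the same class):
import com.itextpdf.io.image.ImageDataFactory;
import com.itextpdf.layout.Document;
import com.itextpdf.layout.element.Image;
import javax.imageio.ImageIO;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;

// Compress the scan with writeJPG above, then hand the JPEG bytes to iText
public static void addCompressedScan(Document document, File scan) throws IOException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    writeJPG(ImageIO.read(scan), baos, 0.5f); // 0.5f quality is an arbitrary choice
    document.add(new Image(ImageDataFactory.create(baos.toByteArray())));
}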
But really, putting scanned images into a PDF makes life so much more difficult. Imagine the people who have to handle that document after you: they open it, see text, try to select it, and nothing happens.
Additionally, you might change the WriterProperties when creating your PdfWriter instance:
PdfWriter writer = new PdfWriter(dest,
    new WriterProperties().setFullCompressionMode(true));
Full compression mode will compress certain objects into an object stream, and it will also compress the cross-reference table of the PDF. Since most of the objects in your document will be images (which are already compressed), compressing objects won't have much effect, but if you have a large number of pages, compressing the cross-reference table may result in smaller PDF files.

How to move XFA xml data into PDF/A-2 conforming File with iText/XFA Worker

Adobe's ISO 32000 spec for PDF/A states that XFA data can be stored in a special place in a PDF/A-2 conforming PDF. Here is the text of that section:
Incorporation of XFA Datasets into a PDF/A-2 Conforming File
To support PDF/A-2 conforming files, ExtensionLevel 3 adds support for XML form data (XFA datasets)
through the XFAResources name tree, which is part of the name dictionary of the document catalog.
(See “TABLE 3.28 Entries in the name dictionary” on page 23.) While Acrobat forms (and form data) are
permitted in a PDF/A-2 conforming file, XML forms are not. Such XML forms are specified as XDP streams
referenced from interactive form dictionaries. XDP streams can contain XFA datasets.
For applications that convert PDF documents to PDF/A-2, the XFAResources name tree supports
relocation of XML form data from XDP streams in a PDF document into the XFAResources name tree.
The XFAResources name tree consists of a string name and an indirect reference to a stream. The string
name is created at the time the document is converted to a PDF/A-2 conforming file. The stream contains
the datasets element of the XFA, comprised of data elements.
In addition to data values for XML form fields, the datasets elements enable the storage and retrieval
of other types of information that may be useful for other workflows, including data that is not bound to
form fields, and one or more XML signature(s).
See the XML Architecture, XML Forms Architecture (XFA) Specification, version 2.6 in the Bibliography.
We have an XFA form that we pass XML to, and we now need to convert that document to PDF/A-2.
We are currently testing XFA Worker to see if it will allow us to do this, but I have been unable to find a sample of XFA Worker that does what we need.
I first tried flattening with XFA Worker, but that removes the data completely, so it can no longer be extracted.
How do you get the XFA XML data into the place that Adobe says to put it, using XFA Worker?
UPDATE: Thanks Bruno, but my code isn't letting me convert the XFA form to PDF/A-2. Here is the code I used:
xfa.fillXfaForm(new ByteArrayInputStream(xmlSchemaStream.toByteArray()));
stamper.close();
reader.close();
try (ByteArrayOutputStream outputStreamDest = new ByteArrayOutputStream()) {
    PdfReader pdfAReader = new PdfReader(output.toByteArray());
    PdfAStamper pdfAStamper = new PdfAStamper(pdfAReader, outputStreamDest, PdfAConformanceLevel.PDF_A_2A);
    ....
and I get the error com.itextpdf.text.pdf.PdfAConformanceException: Only PDF/A documents can be opened in PdfAStamper.
So I am now assuming that PdfAStamper isn't a converter, but just reads in the byte array of the XFA PDF.
Allow me to start with some fatherly advice. XFA will be deprecated in ISO 32000-2 (PDF 2.0), and it is great that you are turning your XFA documents into PDF/A documents. However, why would you choose PDF/A-2? PDF/A-3 is identical to PDF/A-2 with one exception: in PDF/A-3, you are allowed to embed XML files. You can even indicate the relationship between the attached XML and the PDF. Wouldn't it be smarter to create a PDF/A-3 file and attach the original data (not the XFA file) as an attachment?
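For what that alternative could look like in iText 5, here is a sketch of the attachment part, modeled on the ZUGFeRD examples (the writer variable, the xmlBytes content, the file name and the Data relationship are my choices; the rest of the PDF/A-3 setup, such as the output intent and embedded fonts, is omitted):
// Sketch: embed the original XML in a PDF/A-3 file created with PdfAWriter
PdfDictionary parameters = new PdfDictionary();
parameters.put(PdfName.MODDATE, new PdfDate());
PdfFileSpecification fileSpec = PdfFileSpecification.fileEmbedded(
    writer, null, "data.xml", xmlBytes, "text/xml", parameters, 0);
fileSpec.put(new PdfName("AFRelationship"), new PdfName("Data"));
writer.addFileAttachment("XML form data", fileSpec);
// PDF/A-3 also expects an AF entry in the catalog pointing at the file spec
PdfArray array = new PdfArray();
array.add(fileSpec.getReference());
writer.getExtraCatalog().put(new PdfName("AF"), array);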
Suppose that you'd ignore this fatherly advice, what could you do?
Annex D of ISO-19005-2 (and -3) tells you that you have to add an entry to the Names dictionary of the document catalog. Unfortunately, iText 5 doesn't allow you to add your own entries to this names dictionary while creating a file, so you will have to post-process the document.
Suppose that you have a file located in filePath, then you can get the Catalog entry and the Names entry of the Catalog entry like this:
PdfReader reader = new PdfReader(filePath);
PdfDictionary catalog = reader.getCatalog();
PdfDictionary names = catalog.getAsDict(PdfName.NAMES);
You can add entries to this names dictionary. For instance, suppose that I want to add a stream with the content "Some bytes" as a custom entry; I would use this code:
public void manipulatePdf(String src, String dest) throws IOException, DocumentException {
    PdfReader reader = new PdfReader(src);
    PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
    PdfDictionary catalog = reader.getCatalog();
    PdfDictionary names = catalog.getAsDict(PdfName.NAMES);
    if (names == null) {
        names = new PdfDictionary();
    }
    PdfStream stream = new PdfStream("Some bytes".getBytes());
    PdfIndirectObject objref = stamper.getWriter().addToBody(stream);
    names.put(new PdfName("ITXT_Custom"), objref.getIndirectReference());
    catalog.put(PdfName.NAMES, names);
    stamper.close();
    reader.close();
}
In your case, you don't want an entry named ITXT_Custom. You want to add an entry called XFAResources, and the value of that entry should be a name tree consisting of a string name and an indirect reference to a stream. It should be fairly easy to adapt my example to achieve this.
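For instance, a sketch of that adaptation, dropped into manipulatePdf above in place of the ITXT_Custom lines (the string key "datasets" and the xfaDatasetsBytes variable holding the datasets XML are my assumptions):
// Build a minimal name tree node: a Names array of [string name, stream ref]
PdfStream stream = new PdfStream(xfaDatasetsBytes);
PdfIndirectObject objref = stamper.getWriter().addToBody(stream);
PdfArray namesArray = new PdfArray();
namesArray.add(new PdfString("datasets"));     // the string name
namesArray.add(objref.getIndirectReference()); // the stream with the XFA datasets
PdfDictionary xfaResources = new PdfDictionary();
xfaResources.put(PdfName.NAMES, namesArray);
names.put(new PdfName("XFAResources"), xfaResources);
catalog.put(PdfName.NAMES, names);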
Note: All code provided by me on Stack Overflow can be used under the CC-BY-SA as defined in the Stack Exchange Network Terms of Service. If you do not like the CC-BY-SA, I also provide this code under the same license as used for iText, more specifically the AGPL.

XFA missing filled fields?

I am using pdfbox-1.8.12 to read content from a PDF to get the XFA.
I have been able to get the XFA for most files successfully, without missing any field values.
The trouble is with some files, like error.pdf: many of the fields have no values (for example CIN), but when I open the file in any PDF viewer, Foxit or Acrobat, it shows that field.
public static byte[] getParsableXFAForm(File file) {
    if (file == null)
        return null;
    PDDocument doc;
    PDDocumentCatalog catalog;
    PDAcroForm acroForm;
    PDXFA xfa;
    try {
        doc = PDDocument.load(file);
        catalog = doc.getDocumentCatalog();
        acroForm = catalog.getAcroForm();
        xfa = acroForm.getXFA();
        byte[] xfaBytes = xfa.getBytes();
        doc.close();
        return xfaBytes;
    } catch (IOException e) {
        // handle IOException
        // happens when the file is corrupt.
        System.out.println("IOException");
        return null;
    }
}
Then the byte[] is converted to a String.
This is the XFA for this file, and if you search it for 'U72300DL1996PLC075672', that value is missing.
This is a normal file that gives all fields.
Any ideas? I have tried everything, but my guess is that since readers can see that value, I should be able to as well.
EDIT :
You will have to download the files, you might not be able to view them in the browser.
There are multiple entries of XFA content within the form, representing the different states the form had prior to applying the different signatures. As you are using
PDDocument.load(file)
the PDF is parsed sequentially, and the most current XFA content is not picked up. If you change that to
PDDocument.loadNonSeq(file, null)
the Xref information is used, and the most current XFA is extracted, containing the information you are looking for.
Note that for PDFBox 1.8.x one should always use PDDocument.loadNonSeq in order to parse the PDF in line with the specification, i.e. by following the Xref information. PDDocument.load should only be used to handle files with (Xref-related) parsing errors, where sequential parsing can be a fallback.
For PDFBox 2.x, PDDocument.load parses following the Xref, i.e. like PDDocument.loadNonSeq in 1.8, and sequential parsing is done behind the scenes in case there are errors.
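Applied to getParsableXFAForm above, the fix is a one-line change (for PDFBox 1.8.x; the second argument is the optional scratch file):
// Follow the Xref information so the most recent XFA revision is picked up
doc = PDDocument.loadNonSeq(file, null);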