Bad:Converting pdf to images

Bad:Converting pdf to images - pdf

Convert class:
public void getImage(String pdfFilename) throws Exception{
List<byte[]> listImg = new ArrayList<>();
try (final PDDocument document = PDDocument.load(new File(pdfFilename))){
PDFRenderer pdfRenderer = new PDFRenderer(document);
for (int page = 0; page < document.getNumberOfPages(); ++page)
{
File file = new File("C:\\path1\\"+page+".png");
BufferedImage bim = pdfRenderer.renderImage(page);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
ImageIO.write(bim, "png",file);
System.out.println("!!!!");
// System.out.println(Arrays.toString(listImg.get(page)));
}
document.close();
} catch (IOException e){
System.err.println("Exception while trying to create pdf document - " + e);
}
}
Everything works well. All pdf files are converted, but if I use the class shw (this is very necessary for my project):
PdfDocument srcDoc = new PdfDocument(new PdfReader(DEST1));
Rectangle rect = srcDoc.getFirstPage().getPageSize();
System.out.println(rect);
Rectangle pageSize = new Rectangle(rect.getWidth(), rect.getHeight());
PdfDocument pdfDoc = new PdfDocument(new PdfWriter(dest));
pdfDoc.setDefaultPageSize(new PageSize(pageSize));
System.out.println(srcDoc.getNumberOfPages());
PdfCanvas content = new PdfCanvas(pdfDoc.addNewPage());
int n = 0;
for (int i =1 ; i <= srcDoc.getNumberOfPages(); i++) {
PdfFormXObject page = srcDoc.getPage(i).copyAsFormXObject(pdfDoc);
content.clip();
content.newPath();
content.addXObject(page,MainPdf.right_Margin-MainPdf.left_Margin,0);
content = new PdfCanvas(pdfDoc.addNewPage());
for (double y = 4.251969f; y <= 595; y += 14.1732) {
content.moveTo(0, y);
content.lineTo(420, y);
}
for (double x = 0; x <= 420; x += 14.1732) {
content.moveTo(x, 0);
content.lineTo(x, 595);
}
content.closePathStroke();
}
srcDoc.close();
pdfDoc.close();
}
Those images that have been converted to empty (contain nothing inside themselves, just a white background). Pdf not empty.
pdf:https://dropmefiles.com/UXedd
images:

The cause was the call
content.clip();
in the itext segment. This clips with an empty path. Adobe Reader ignores this, but PDFBox doesn't, so the current clipping path is empty so that nothing gets seen.
Per one of the comments, removing that call solves the problem. (I suspect that content.newPath(); isn't needed either)
I have also tried other viewers: PDF.js and GhostScript don't display it, Chrome and Edge display it.

Related

A better solution for Itext 7 Text fitting

I have a project that used Itext 5 and worked as intended.
Program had to put userInput in certain 'Chunks' inside paragraphs. Paragraphs have unmovable (chunks)words per line, and the userInput should scale in the space reserved for the userInput inside paragraph.
Old Project had the following code(made as example)
public class Oldway {
static final transient Font bold2 = FontFactory.getFont("Times-Roman", 10.0f, 1);
public static void main(String[] args) {
Document document = new Document();
document.setPageSize(PageSize.A4);
try {
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(new File("itext5.pdf")));
document.open();
Paragraph title = new Paragraph("Title of doc");
title.setAlignment(1);
document.add(title);
Paragraph dec= new Paragraph();
Chunk ch01 = new Chunk("Prev text ");
dec.add(ch01);
Chunk ch02 = new Chunk(getEmptySpace(42));
dec.add(ch02);
Chunk ch03 = new Chunk(" next Text");
dec.add(ch03);
document.add(dec);
float y = writer.getVerticalPosition(false);
float x2 = document.left() + ch01.getWidthPoint();
float x3 = x2 + ch02.getWidthPoint();
getPlainFillTest("Text to insert", document, y, x3, x2, writer, false);
document.close();
writer.flush();
} catch (FileNotFoundException | DocumentException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
public static Chunk getEmptySpace(int size) {
Chunk ch = new Chunk();
for(int i = 0;i<=size;i++) {
ch.append("\u00a0");
}
return new Chunk(ch);
}
public static void getPlainFillTest(String str,Document document,float y, float x1pos,
float x2pos, PdfWriter writer,boolean withTab) {
if(str.isEmpty() || str.isBlank()) {
str = "________";
}
Rectangle rec2 = null;
if(!withTab)
rec2 = new Rectangle(x2pos, y, x1pos-2,y+10);
else {
rec2 = new Rectangle(x2pos+35, y, x1pos+33,y+10);
}
BaseFont bf = bold2.getBaseFont();
PdfContentByte cb = writer.getDirectContent();
float fontSize = getMaxFontSize(bf, str,(int)rec2.getWidth(), (int)rec2.getHeight());
Phrase phrase = new Phrase(str, new Font(bf, fontSize));
ColumnText.showTextAligned(cb, Element.ALIGN_CENTER, phrase,
// center horizontally
(rec2.getLeft() + rec2.getRight()) / 2,
// shift baseline based on descent
rec2.getBottom() - bf.getDescentPoint(str, fontSize),0);
cb.saveState();//patrulaterul albastru
cb.setColorStroke(Color.BLUE);
cb.rectangle(rec2.getLeft(), rec2.getBottom(), rec2.getWidth(), rec2.getHeight());
cb.stroke();
cb.restoreState();
}
//stackoverflow solution
private static float getMaxFontSize(BaseFont bf, String text, int width, int height){
// avoid infinite loop when text is empty
if(text.isEmpty()){
return 0.0f;
}
float fontSize = 0.1f;
while(bf.getWidthPoint(text, fontSize) < width){
fontSize += 0.1f;
}
float maxHeight = measureHeight(bf, text, fontSize);
while(maxHeight > height){
fontSize -= 0.1f;
maxHeight = measureHeight(bf, text, fontSize);
};
return fontSize;
}
public static float measureHeight(BaseFont baseFont, String text, float fontSize)
{
float ascend = baseFont.getAscentPoint(text, fontSize);
float descend = baseFont.getDescentPoint(text, fontSize);
return ascend - descend;
}}
Now I'm trying to do the same thing in IText 7 and ...is not that easy!
I manage to create a working code, but its messy, and some things don't get the right coordinates. The Itext7 code(made as example):
public class Newway {
public static void main(String[] args) {
PdfWriter writer;
try {
writer = new PdfWriter(new File("test2.pdf"));
PdfDocument document = new PdfDocument(writer);
document.getDocumentInfo().addCreationDate();
document.getDocumentInfo().setTitle("Title");
document.setDefaultPageSize(PageSize.A4);
Document doc = new Document(document);
doc.setFontSize(12);
Paragraph par = new Paragraph();
Text ch01 = new Text("Prev Text ");
par.add(ch01);
Paragraph space = new Paragraph();
space.setMaxWidth(40);
for(int i=0;i<40;i++) {
par.add("\u00a0");
space.add("\u00a0");
}
Text ch02 = new Text(" next text");
par.add(ch02);
doc.add(par);
Paragraph linePara = new Paragraph().add("Test from UserInput")
.setTextAlignment(TextAlignment.CENTER).setBorder(new DottedBorder(1));
float width = doc.getPageEffectiveArea(PageSize.A4).getWidth();
float height = doc.getPageEffectiveArea(PageSize.A4).getHeight();
IRenderer primul = ch01.createRendererSubTree().setParent(doc.getRenderer());
IRenderer spaceR = space.createRendererSubTree().setParent(doc.getRenderer());
LayoutResult primulResult = primul.layout(new LayoutContext(new LayoutArea(1, new Rectangle(width,height))));
LayoutResult layoutResult = spaceR.layout(new LayoutContext(new LayoutArea(1, new Rectangle(width,height))));
Rectangle primulBox = ((TextRenderer) primul).getInnerAreaBBox();
Rectangle rect = ((ParagraphRenderer) spaceR).getInnerAreaBBox();
float rwidth = rect.getWidth();
float rheight = rect.getHeight();
float x = primulBox.getWidth()+ doc.getLeftMargin();
float y = rect.getY()+(rheight*2.05f);//rect.getY() is never accurate, is always below the paragraph. WHY ??
Rectangle towr = new Rectangle(x, y, rwidth, rheight*1.12f);//rheight on default is way too small
PdfCanvas pdfcanvas = new PdfCanvas(document.getFirstPage());
Canvas canvas = new Canvas(pdfcanvas, towr);
//from theinternet
float fontSizeL = 1;
float fontSizeR = 14;
while (Math.abs(fontSizeL - fontSizeR) > 1e-1) {
float curFontSize = (fontSizeL + fontSizeR) / 2;
linePara.setFontSize(curFontSize);
// It is important to set parent for the current element renderer to a root renderer
IRenderer renderer = linePara.createRendererSubTree().setParent(canvas.getRenderer());
LayoutContext context = new LayoutContext(new LayoutArea(1, towr));
if (renderer.layout(context).getStatus() == LayoutResult.FULL) {
// we can fit all the text with curFontSize
fontSizeL = curFontSize;
} else {
fontSizeR = curFontSize;
}
}
canvas.add(linePara);
new PdfCanvas(document.getFirstPage()).rectangle(towr).setStrokeColor(ColorConstants.BLACK).stroke();
canvas.close();
doc.close();
writer.flush();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}}
The questions are
Is there a better, more elegant way to do this ?
Why rect.getY() is always below the paragraph? and how do I get Y to match the Paragraph real Y coorinate ?
Why the default 'rheight' is always too small? but (rheight*1.1f) works ?
(Optional) How do I set tab() space size in IText 7 ?

This way is quite good because it's taking into account all the possible model element settings and implications of the layout process. Your iText 5 alternative was good enough for basic case of Latin-based text without any modifications on the visual side. The iText 7 code you have is much more flexible and will still work if you use more complex layout settings, complex scripts etc. Also I see the iText 5 code is 105 lines in your example while the iText 7 code is 80 lines.
You are adding some magic +(rheight*2.05f); here while in reality what you are missing here is that when you draw via Canvas you don't have your margins defined anymore, so what you really need instead of rect.getY()+(rheight*2.05f); is rect.getY() + doc.getBottomMargin()
The issue comes from the fact that you are calculating rheight as renderer.getInnerAreaBBox() while this calculation does not take into account default margins that are applied to a paragraph. Margins are included into the occupied area but not the inner area bbox. To fix that, use renderer.getOccupiedArea().getBBox() instead. In this case there is not need to multiply rheight by a coefficient anymore.
The visual result is slightly different now but there is no magic constants anymore. Depending on what you are trying to really achieve you can tune the code further (add some margins here and there etc). But the code adapts well to the change in the user text.
Visual result before:
Visual result after:
Resultant code:
PdfWriter writer;
try {
writer = new PdfWriter(new File("test2.pdf"));
PdfDocument document = new PdfDocument(writer);
document.getDocumentInfo().addCreationDate();
document.getDocumentInfo().setTitle("Title");
document.setDefaultPageSize(PageSize.A4);
Document doc = new Document(document);
doc.setFontSize(12);
Paragraph par = new Paragraph();
Text ch01 = new Text("Prev Text ");
par.add(ch01);
Paragraph space = new Paragraph();
space.setMaxWidth(40);
for(int i=0;i<40;i++) {
par.add("\u00a0");
space.add("\u00a0");
}
Text ch02 = new Text(" next text");
par.add(ch02);
doc.add(par);
Paragraph linePara = new Paragraph().add("Test from UserInput")
.setTextAlignment(TextAlignment.CENTER).setBorder(new DottedBorder(1));
float width = doc.getPageEffectiveArea(PageSize.A4).getWidth();
float height = doc.getPageEffectiveArea(PageSize.A4).getHeight();
IRenderer primul = ch01.createRendererSubTree().setParent(doc.getRenderer());
IRenderer spaceR = space.createRendererSubTree().setParent(doc.getRenderer());
LayoutResult primulResult = primul.layout(new LayoutContext(new LayoutArea(1, new Rectangle(width,height))));
LayoutResult layoutResult = spaceR.layout(new LayoutContext(new LayoutArea(1, new Rectangle(width,height))));
Rectangle primulBox = ((TextRenderer) primul).getInnerAreaBBox();
Rectangle rect = ((ParagraphRenderer) spaceR).getOccupiedArea().getBBox();
float rwidth = rect.getWidth();
float rheight = rect.getHeight();
float x = primulBox.getWidth()+ doc.getLeftMargin();
float y = rect.getY() + doc.getBottomMargin();
Rectangle towr = new Rectangle(x, y, rwidth, rheight);
PdfCanvas pdfcanvas = new PdfCanvas(document.getFirstPage());
Canvas canvas = new Canvas(pdfcanvas, towr);
//from theinternet
float fontSizeL = 1;
float fontSizeR = 14;
while (Math.abs(fontSizeL - fontSizeR) > 1e-1) {
float curFontSize = (fontSizeL + fontSizeR) / 2;
linePara.setFontSize(curFontSize);
// It is important to set parent for the current element renderer to a root renderer
IRenderer renderer = linePara.createRendererSubTree().setParent(canvas.getRenderer());
LayoutContext context = new LayoutContext(new LayoutArea(1, towr));
if (renderer.layout(context).getStatus() == LayoutResult.FULL) {
// we can fit all the text with curFontSize
fontSizeL = curFontSize;
} else {
fontSizeR = curFontSize;
}
}
canvas.add(linePara);
new PdfCanvas(document.getFirstPage()).rectangle(towr).setStrokeColor(ColorConstants.BLACK).stroke();
canvas.close();
doc.close();
writer.flush();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}

Merging PDFs using iTextSharp removes Trim Box Detail

I am trying to use iTextSharp to merge 2 or more PDF files. However I am not getting any details about the TrimBox. Performing the code below on the PDF (which was merged) always return NULL
Rectangle rect = reader.GetBoxSize(1, "trim");
This is the code for merging.
public void Merge(List<String> InFiles, String OutFile)
{
using (FileStream stream = new FileStream(OutFile, FileMode.Create))
using (Document doc = new Document())
using (PdfCopy pdf = new PdfCopy(doc, stream))
{
doc.Open();
PdfReader reader = null;
PdfImportedPage page = null;
InFiles.ForEach(file =>
{
reader = new PdfReader(file);
for (int i = 0; i < reader.NumberOfPages; i++)
{
page = pdf.GetImportedPage(reader, i + 1);
pdf.AddPage(page);
}
pdf.FreeReader(reader);
reader.Close();
});
}
}
How to keep I keep the box information after the merge?
-Alan-

Here is the code I created to merge Portrait and Landscape docs using iTextSharp. It works rather well.
public void MergeFiles(System.Collections.Generic.List<string> sourceFiles, string destinationFile)
{
Document document=null;
if (System.IO.File.Exists(destinationFile))
System.IO.File.Delete(destinationFile);
try
{
PdfCopy writer = null;
int numberOfPages=0;
foreach(string sourceFile in sourceFiles)
{
PdfReader reader = new PdfReader(sourceFile);
reader.ConsolidateNamedDestinations();
numberOfPages = reader.NumberOfPages;
if(document==null)
{
document = new Document(reader.GetPageSizeWithRotation(1));
writer = new PdfCopy(document, new FileStream(destinationFile, FileMode.Create));
document.Open();
}
for (int x = 1;x <= numberOfPages;x++ )
{
if (writer != null)
{
PdfImportedPage page = writer.GetImportedPage(reader, x);
writer.AddPage(page);
}
}
PRAcroForm form = reader.AcroForm;
if (form != null && writer != null)
writer.CopyAcroForm(reader);
}
}
finally
{
if (document != null && document.IsOpen())
document.Close();
}
}

How to retain page labels when concatenating an existing pdf with a pdf created from scratch?

I have a code which is creating a "cover page" and then merging it with an existing pdf. The pdf labels were lost after merging. How can I retain the pdf labels of the existing pdf and then add a page label to the pdf page created from scratch (eg "Cover page")? The example of the book I think is about retrieving and replacing page labels. I don't know how to apply this when concatenating an existing pdf with a pdf created from scratch. I am using itext 5.3.0. Thanks in advance.
EDIT
as per comment of mkl
public ByteArrayOutputStream getConcatenatePDF()
{
if (bitstream == null)
return null;
if (item == null)
{
item = getItem();
if (item == null)
return null;
}
ByteArrayOutputStream byteout = null;
InputStream coverStream = null;
try
{
// Get Cover Page
coverStream = getCoverStream();
if (coverStream == null)
return null;
byteout = new ByteArrayOutputStream();
int pageOffset = 0;
ArrayList<HashMap<String, Object>> master = new ArrayList<HashMap<String, Object>>();
Document document = null;
PdfCopy writer = null;
PdfReader reader = null;
byte[] password = (ownerpass != null && !"".equals(ownerpass)) ? ownerpass.getBytes() : null;
// Get infomation of the original pdf
reader = new PdfReader(bitstream.retrieve(), password);
boolean isPortfolio = reader.getCatalog().contains(PdfName.COLLECTION);
char version = reader.getPdfVersion();
int permissions = reader.getPermissions();
// Get metadata
HashMap<String, String> info = reader.getInfo();
String title = (info.get("Title") == null || "".equals(info.get("Title")))
? getFieldValue("dc.title") : info.get("Title");
String author = (info.get("Author") == null || "".equals(info.get("Author")))
? getFieldValue("dc.contributor.author") : info.get("Author");
String subject = (info.get("Subject") == null || "".equals(info.get("Subject")))
? "" : info.get("Subject");
String keywords = (info.get("Keywords") == null || "".equals(info.get("Keywords")))
? getFieldValue("dc.subject") : info.get("Keywords");
reader.close();
// Merge cover page and the original pdf
InputStream[] is = new InputStream[2];
is[0] = coverStream;
is[1] = bitstream.retrieve();
for (int i = 0; i < is.length; i++)
{
// we create a reader for a certain document
reader = new PdfReader(is[i], password);
reader.consolidateNamedDestinations();
if (i == 0)
{
// step 1: creation of a document-object
document = new Document(reader.getPageSizeWithRotation(1));
// step 2: we create a writer that listens to the document
writer = new PdfCopy(document, byteout);
// Set metadata from the original pdf
// the position of these lines is important
document.addTitle(title);
document.addAuthor(author);
document.addSubject(subject);
document.addKeywords(keywords);
if (pdfa)
{
// Set thenecessary information for PDF/A-1B
// the position of these lines is important
writer.setPdfVersion(PdfWriter.VERSION_1_4);
writer.setPDFXConformance(PdfWriter.PDFA1B);
writer.createXmpMetadata();
}
else if (version == '5')
writer.setPdfVersion(PdfWriter.VERSION_1_5);
else if (version == '6')
writer.setPdfVersion(PdfWriter.VERSION_1_6);
else if (version == '7')
writer.setPdfVersion(PdfWriter.VERSION_1_7);
else
; // no operation
// Set security parameters
if (!pdfa)
{
if (password != null)
{
if (security && permissions != 0)
{
writer.setEncryption(null, password, permissions, PdfWriter.STANDARD_ENCRYPTION_128);
}
else
{
writer.setEncryption(null, password, PdfWriter.ALLOW_PRINTING | PdfWriter.ALLOW_COPY | PdfWriter.ALLOW_SCREENREADERS, PdfWriter.STANDARD_ENCRYPTION_128);
}
}
}
// step 3: we open the document
document.open();
// if this pdf is portfolio, does not add cover page
if (isPortfolio)
{
reader.close();
byte[] coverByte = getCoverByte();
if (coverByte == null || coverByte.length == 0)
return null;
PdfCollection collection = new PdfCollection(PdfCollection.TILE);
writer.setCollection(collection);
PdfFileSpecification fs = PdfFileSpecification.fileEmbedded(writer, null, "cover.pdf", coverByte);
fs.addDescription("cover.pdf", false);
writer.addFileAttachment(fs);
continue;
}
}
int n = reader.getNumberOfPages();
// step 4: we add content
PdfImportedPage page;
PdfCopy.PageStamp stamp;
for (int j = 0; j < n; )
{
++j;
page = writer.getImportedPage(reader, j);
if (i == 1) {
stamp = writer.createPageStamp(page);
Rectangle mediabox = reader.getPageSize(j);
Rectangle crop = new Rectangle(mediabox);
writer.setCropBoxSize(crop);
// add overlay text
//<-- Code for adding overlay text -->
stamp.alterContents();
}
writer.addPage(page);
}
PRAcroForm form = reader.getAcroForm();
if (form != null && !pdfa)
{
writer.copyAcroForm(reader);
}
// we retrieve the total number of pages
List<HashMap<String, Object>> bookmarks = SimpleBookmark.getBookmark(reader);
//if (bookmarks != null && !pdfa)
if (bookmarks != null)
{
if (pageOffset != 0)
{
SimpleBookmark.shiftPageNumbers(bookmarks, pageOffset, null);
}
master.addAll(bookmarks);
}
pageOffset += n;
}
if (!master.isEmpty())
{
writer.setOutlines(master);
}
if (isPortfolio)
{
reader = new PdfReader(bitstream.retrieve(), password);
PdfDictionary catalog = reader.getCatalog();
PdfDictionary documentnames = catalog.getAsDict(PdfName.NAMES);
PdfDictionary embeddedfiles = documentnames.getAsDict(PdfName.EMBEDDEDFILES);
PdfArray filespecs = embeddedfiles.getAsArray(PdfName.NAMES);
PdfDictionary filespec;
PdfDictionary refs;
PRStream stream;
PdfFileSpecification fs;
String path;
// copy embedded files
for (int i = 0; i < filespecs.size(); )
{
filespecs.getAsString(i++); // remove description
filespec = filespecs.getAsDict(i++);
refs = filespec.getAsDict(PdfName.EF);
for (PdfName key : refs.getKeys())
{
stream = (PRStream) PdfReader.getPdfObject(refs.getAsIndirectObject(key));
path = filespec.getAsString(key).toString();
fs = PdfFileSpecification.fileEmbedded(writer, null, path, PdfReader.getStreamBytes(stream));
fs.addDescription(path, false);
writer.addFileAttachment(fs);
}
}
}
if (pdfa)
{
InputStream iccFile = this.getClass().getClassLoader().getResourceAsStream(PROFILE);
ICC_Profile icc = ICC_Profile.getInstance(iccFile);
writer.setOutputIntents("Custom", "", "http://www.color.org", "sRGB IEC61966-2.1", icc);
writer.setViewerPreferences(PdfWriter.PageModeUseOutlines);
}
// step 5: we close the document
document.close();
}
catch (Exception e)
{
log.info(LogManager.getHeader(context, "cover_page: getConcatenatePDF", "bitstream_id="+bitstream.getID()+", error="+e.getMessage()));
// e.printStackTrace();
return null;
}
return byteout;
}
UPDATE
Based on mkl's answer, I modified the code above to look like this:
public ByteArrayOutputStream getConcatenatePDF()
{
if (bitstream == null)
return null;
if (item == null)
{
item = getItem();
if (item == null)
return null;
}
ByteArrayOutputStream byteout = null;
try
{
// Get Cover Page
InputStream coverStream = getCoverStream();
if (coverStream == null)
return null;
byteout = new ByteArrayOutputStream();
InputStream documentStream = bitstream.retrieve();
PdfReader coverPageReader = new PdfReader(coverStream);
PdfReader reader = new PdfReader(documentStream);
PdfStamper stamper = new PdfStamper(reader, byteout);
PdfImportedPage page = stamper.getImportedPage(coverPageReader, 1);
stamper.insertPage(1, coverPageReader.getPageSize(1));
PdfContentByte content = stamper.getUnderContent(1);
int n = reader.getNumberOfPages();
for (int j = 2; j <= n; j++) {
//code for overlay text
ColumnText.showTextAligned(stamper.getOverContent(j), Element.ALIGN_CENTER, overlayText,
crop.getLeft(10), crop.getHeight() / 2 + crop.getBottom(), 90);
}
content.addTemplate(page, 0, 0);
stamper.close();
}
catch (Exception e)
{
log.info(LogManager.getHeader(context, "cover_page: getConcatenatePDF", "bitstream_id="+bitstream.getID()+", error="+e.getMessage()));
e.printStackTrace();
return null;
}
return byteout;
}
And then I set the page labels to the cover page. I omitted code not relevant to my question.
/**
*
* #return InputStream the resulting output stream
*/
private InputStream getCoverStream()
{
ByteArrayOutputStream byteout = getCover();
return new ByteArrayInputStream(byteout.toByteArray());
}
/**
*
* #return InputStream the resulting output stream
*/
private byte[] getCoverByte()
{
ByteArrayOutputStream byteout = getCover();
return byteout.toByteArray();
}
/**
*
* #return InputStream the resulting output stream
*/
private ByteArrayOutputStream getCover()
{
ByteArrayOutputStream byteout;
Document doc = null;
try
{
byteout = new ByteArrayOutputStream();
doc = new Document(PageSize.LETTER, 24, 24, 20, 40);
PdfWriter pdfwriter = PdfWriter.getInstance(doc, byteout);
PdfPageLabels labels = new PdfPageLabels();
labels.addPageLabel(1, PdfPageLabels.EMPTY, "Cover page", 1);
pdfwriter.setPageLabels(labels);
pdfwriter.setPageEvent(new HeaderFooter());
doc.open();
//code omitted (contents of cover page)
doc.close();
return byteout;
}
catch (Exception e)
{
log.info(LogManager.getHeader(context, "cover_page", "bitstream_id="+bitstream.getID()+", error="+e.getMessage()));
return null;
}
}
The modified code retained the page labels of the existing pdf (see screenshot 1) (documentStream), but the resulting merged pdf (screenshots 2 and 3) is off by 1 page since a cover page was inserted. As suggested by mkl, I should use page labels to the cover page, but it seems the pdf labels of the imported page was lost. My concern now is how do I set the page labels to the final document state as also suggested by mkl? I suppose I should use PdfWriter but I don't know where to put that in my modified code. Am I correct to assume that after the stamper.close() portion, that is the final state of my document? Thanks again in advance.
Screenshot 1. Notice the actual page 1 labeled Front cover
Screenshot 2. Merged pdf, after the generated on-the-fly "cover page" was inserted. The page label "Front cover" was now assigned to the cover page even after I've set the pdf label of the inserted page using labels.addPageLabel(1, PdfPageLabels.EMPTY, "Cover page", 1)
Screenshot 3. Note that the page label 3 was assigned to page 2.
FINAL UPDATE
Kudos to #mkl
The screenshot below is the result after I applied the latest update of mkl's answer. The pages labels are now assigned correctly to pages. Also, using PdfStamper instead of PdfCopy (as used in my original code) did not break the PDF/A compliance of the existing pdf.

Adding the cover page
Usually using PdfCopy for merging PDFs is the right choice, it creates a new document from the copied pages copying as much of the page-level information as possible not preferring any single document.
Your case is somewhat special, though: You have one document whose structure and content you prefer and want to apply a small change to it by adding a single page, a title page. All the while all information including document-level information (e.g. metadata, embedded files, ...) from the main document shall still be present in the result.
In such a use case it is more appropriate to use a PdfStamper which you use to "stamp" changes onto an existing PDF.
You might want to start from something like this:
try ( InputStream documentStream = getClass().getResourceAsStream("template.pdf");
InputStream titleStream = getClass().getResourceAsStream("title.pdf");
OutputStream outputStream = new FileOutputStream(new File(RESULT_FOLDER, "test-with-title-page.pdf")) )
{
PdfReader titleReader = new PdfReader(titleStream);
PdfReader reader = new PdfReader(documentStream);
PdfStamper stamper = new PdfStamper(reader, outputStream);
PdfImportedPage page = stamper.getImportedPage(titleReader, 1);
stamper.insertPage(1, titleReader.getPageSize(1));
PdfContentByte content = stamper.getUnderContent(1);
content.addTemplate(page, 0, 0);
stamper.close();
}
PS: Concerning questions in comments:
In my code above, I should have an overlay text supposedly (before the stamp.alterContents() portion) but I omitted that part of code for testing purposes. Can you please give me an idea how to implement that?
Do you mean something like an overlayed watermark? The PdfStamper allows you to access an "over content" for each page onto which you can draw any content:
PdfContentByte overContent = stamper.getOverContent(pageNumber);
Keeping page labels
My other question is about page offset, because I inserted the cover page, the page numbering are off by 1 page. How can I resolve that?
Unfortunately iText's PdfStamper does not automatically update the page label definition of the manipulated PDF. Actually this is no wonder because it is not clear how the inserted page is meant to be labeled. #Bruno At least, though, iText could change the page label sections starting after the insertion page number.
Using iText's low level API it is possible, though, to fix the original label positions and add a label for the inserted page. This can be implemented similarly to the iText in Action PageLabelExample example, more exactly its manipulatePageLabel part; simply add this before stamper.close():
PdfDictionary root = reader.getCatalog();
PdfDictionary labels = root.getAsDict(PdfName.PAGELABELS);
if (labels != null)
{
PdfArray newNums = new PdfArray();
newNums.add(new PdfNumber(0));
PdfDictionary coverDict = new PdfDictionary();
coverDict.put(PdfName.P, new PdfString("Cover Page"));
newNums.add(coverDict);
PdfArray nums = labels.getAsArray(PdfName.NUMS);
if (nums != null)
{
for (int i = 0; i < nums.size() - 1; )
{
int n = nums.getAsNumber(i++).intValue();
newNums.add(new PdfNumber(n+1));
newNums.add(nums.getPdfObject(i++));
}
}
labels.put(PdfName.NUMS, newNums);
stamper.markUsed(labels);
}
For a document with these labels:
It generates a document with these labels:
Keeping links
I just found out that the inserted page "Cover Page" lost its link annotations. I wonder if there's a workaround for this, since according to the book, the interactive features of the inserted page are lost when using PdfStamper.
Indeed, among the iText PDF generating classes only Pdf*Copy* keeps interactive features like annotations. Unfortunately one has to decide whether one wants to
create a genuinely new PDF (PdfWriter) with no information from other PDFs beyond contents being embedable;
manipulate a single existing PDF ('PdfStamper') with all information from that one PDF being preserved but no information from other PDFs beyond contents being embedable;
merge any number of existing PDFs (PdfCopy) with most page-level information from all those PDFs being preserved but no document-level information from any.
In your case I thought the new cover page had only static content, no dynamic features, and so assumes the PdfStamper was best. If you only have to deal with links, you may consider copying links manually, e.g. using this helper method
/**
* <p>
* A primitive attempt at copying links from page <code>sourcePage</code>
* of <code>PdfReader reader</code> to page <code>targetPage</code> of
* <code>PdfStamper stamper</code>.
* </p>
* <p>
* This method is meant only for the use case at hand, i.e. copying a link
* to an external URI without expecting any advanced features.
* </p>
*/
void copyLinks(PdfStamper stamper, int targetPage, PdfReader reader, int sourcePage)
{
PdfDictionary sourcePageDict = reader.getPageNRelease(sourcePage);
PdfArray annotations = sourcePageDict.getAsArray(PdfName.ANNOTS);
if (annotations != null && annotations.size() > 0)
{
for (PdfObject annotationObject : annotations)
{
annotationObject = PdfReader.getPdfObject(annotationObject);
if (!annotationObject.isDictionary())
continue;
PdfDictionary annotation = (PdfDictionary) annotationObject;
if (!PdfName.LINK.equals(annotation.getAsName(PdfName.SUBTYPE)))
continue;
PdfArray rectArray = annotation.getAsArray(PdfName.RECT);
if (rectArray == null || rectArray.size() < 4)
continue;
Rectangle rectangle = PdfReader.getNormalizedRectangle(rectArray);
PdfName hightLight = annotation.getAsName(PdfName.H);
if (hightLight == null)
hightLight = PdfAnnotation.HIGHLIGHT_INVERT;
PdfDictionary actionDict = annotation.getAsDict(PdfName.A);
if (actionDict == null || !PdfName.URI.equals(actionDict.getAsName(PdfName.S)))
continue;
PdfString urlPdfString = actionDict.getAsString(PdfName.URI);
if (urlPdfString == null)
continue;
PdfAction action = new PdfAction(urlPdfString.toString());
PdfAnnotation link = PdfAnnotation.createLink(stamper.getWriter(), rectangle, hightLight, action);
stamper.addAnnotation(link, targetPage);
}
}
}
which you can call right after inserting the original page:
PdfImportedPage page = stamper.getImportedPage(titleReader, 1);
stamper.insertPage(1, titleReader.getPageSize(1));
PdfContentByte content = stamper.getUnderContent(1);
content.addTemplate(page, 0, 0);
copyLinks(stamper, 1, titleReader, 1);
Beware, this method is really simple. It only considers links with URI actions and creates a link on the target page using the same location, target, and highlight setting as the original one. If the original one uses more refined features (e.g. if it brings along its own appearance streams or even merely uses the border style attributes) and you want to keep these features, you have to improve the method to also copy the entries for these features to the new annotation.

Copy annotations

I have a source PDF with some Free Text Annotations.
I would like to perform a mail merge like function on PDF. I would like to make a copy of the PDF and replace the Free Text Annotation based on some text replacement method.
For simplicity, I have a program that takes the annotations and add "LHC" behind it. Alas, the copy works, but the annotations remains unchanged.
I would have tried to use PdfAnnotation however, I am unsure how to convert from the PdfDictionary to PdfAnnotation
See my code below
string oldFile = "C:\\Temp\\oldFile.pdf";
string newFile = "C:\\Temp\\newFile.pdf";
// open the reader
PdfReader reader = new PdfReader(oldFile);
Rectangle size = reader.GetPageSizeWithRotation(1);
Document document = new Document(size);
// open the writer
FileStream fs = new FileStream(newFile, FileMode.Create, FileAccess.Write);
PdfCopy writer = new PdfCopy(document,fs);
document.Open();
// the pdf content
PdfContentByte cb = writer.DirectContent;
// adding Free Text Annotation
for (int pg = 1; pg < reader.NumberOfPages; pg++)
{
PdfDictionary pageDict = reader.GetPageN(pg);
PdfArray annotArray = pageDict.GetAsArray(PdfName.ANNOTS);
for (int i = 0; i < annotArray.Size; ++i)
{
PdfDictionary curAnnot = annotArray.GetAsDict(i);
PdfName contents = new PdfName("Contents");
PdfString str = curAnnot.GetAsString(contents);
String newString = str.ToString() + "LHC";
curAnnot.Remove(contents);
curAnnot.Put(contents, new PdfString(newString));
}
PdfImportedPage page = writer.GetImportedPage(reader, pg);
// PdfImportedPage pageOut = writer.destinationPdfReader(reader, pg);
//cb.AddTemplate(page, 0, 0);
writer.AddPage(page);
PdfAnnotation annot = new PdfAnnotation(writer, new Rectangle(0, 0));
writer.AddAnnotation(annot);
}
document.Close();
fs.Close();
writer.Close();
reader.Close();

References:
http://itextsharp.10939.n7.nabble.com/How-to-edit-annotations-td3352.html
(There is another link in stackoverflow, that I can't find, when I find it I will add it here)
The steps:
Step 1. Create a stamper from a reader.
Step 2. Read all the annotations
Step 3. Delete a set of keys and as a fallback any dictionary items
You now have performed an edit/copy of the annotation and changed the values.
The following is the code:
// Step 1. Create the stamper
string oldFile = "C:\\Temp\\oldFile.pdf";
string newFile = "C:\\Temp\\newFile.pdf";
// open the reader
PdfReader reader = new PdfReader(oldFile);
Rectangle size = reader.GetPageSizeWithRotation(1);
Document document = new Document(size);
// open the writer
// remember to set the page size before opening document
// otherwise the page is already set.
/* chapter02/HelloWorldMetadata.java */
document.Open();
// the pdf content
// cb does not work with stamper
// create the new pagez and add it to the pdf
// this segment of code is meant for writer
FileStream fs = new FileStream(newFile, FileMode.Create, FileAccess.ReadWrite);
PdfStamper writer = new PdfStamper(reader, fs, reader.PdfVersion, false);
for (int pg = 1; pg < reader.NumberOfPages; pg++)
{
// taken from http://itextsharp.10939.n7.nabble.com/How-to-edit-annotations-td3352.html
PdfDictionary pagedic = reader.GetPageN(pg);
PdfArray annotarray = (PdfArray)PdfReader.GetPdfObject(pagedic.Get(PdfName.ANNOTS));
if (annotarray == null || annotarray.Size == 0)
continue;
// step 2. read all the annotations
foreach (PdfIndirectReference annot in annotarray.ArrayList)
{
PdfDictionary annotationDic = (PdfDictionary)PdfReader.GetPdfObject(annot);
PdfName subType = (PdfName)annotationDic.Get(PdfName.SUBTYPE);
if (subType.Equals(PdfName.TEXT) || subType.Equals(PdfName.FREETEXT))
{
// 3. Change values of different properties of a certain annotation and delete a few keys & dictionaries
annotationDic.Put(PdfName.CONTENTS, new PdfString("These are changed contents", PdfObject.TEXT_UNICODE));
}
PdfString contents = annotationDic.GetAsString(PdfName.CONTENTS);
if (contents != null)
{
String value = contents.ToString();
annotationDic.Put(PdfName.CONTENTS, new PdfString(value));
annotationDic.Remove(PdfName.AP);
List<PdfName> tobeDel = new List<PdfName>();
foreach (PdfName key in annotationDic.Keys)
{
if (key.CompareTo(PdfName.AP) == 0 ||
key.CompareTo(PdfName.RC) == 0 ||
annotationDic.Get(key).IsDictionary())
{
tobeDel.Add(key);
}
}
foreach (PdfName key in tobeDel)
{
annotationDic.Remove(key);
}
}
writer.MarkUsed(annotationDic);
}
if ((pg + 1) < reader.NumberOfPages)
{
document.NewPage();
}
}
// close the streams and voilá the file should be changed :)
writer.Close();
reader.Close();

How can I copy annotations/highlighting from one PDF to an updated version of that PDF?

I've recently moved to almost exclusively electronic books. I prefer to mark up documents with highlighting or annotations as I read them.
However, when I get an updated version of a PDF - O'Reilly, for example, will give access to corrected versions of the books you've purchased - I'm then stuck with a marked up older copy and a newer copy, without my notes.
My preferred language being C# I realize that iTextSharp is probably what I'd need to use if I wanted to programmatically do this (see for example Copy pdf annotations via C#), but is there an easier way to handle this?
I can't believe I'm the only one with this issue, so is there perhaps already a solution that will handle this for me?

You can use this example for iTextSharp to approach your problem:
var output = new MemoryStream();
using (var document = new Document(PageSize.A4, 70f, 70f, 20f, 20f))
{
var readers = new List<PdfReader>();
var writer = PdfWriter.GetInstance(document, output);
writer.CloseStream = false;
document.Open();
const Int32 requiredWidth = 500;
const Int32 zeroBottom = 647;
const Int32 left = 50;
Action<String, Action> inlcudePdfInDocument = (filename, e) =>
{
var reader = new PdfReader(filename);
readers.Add(reader);
var pageCount = reader.NumberOfPages;
for (var i = 0; i < pageCount; i++)
{
e?.Invoke();
var imp = writer.GetImportedPage(reader, (i + 1));
var scale = requiredWidth / imp.Width;
var height = imp.Height * scale;
writer.DirectContent.AddTemplate(imp, scale, 0, 0, scale, left, zeroBottom - height);
var annots = reader.GetPageN(i + 1).GetAsArray(PdfName.ANNOTS);
if (annots != null && annots.Size != 0)
{
foreach (var a in annots)
{
var newannot = new PdfAnnotation(writer, new Rectangle(0, 0));
var annotObj = (PdfDictionary) PdfReader.GetPdfObject(a);
newannot.PutAll(annotObj);
var rect = newannot.GetAsArray(PdfName.RECT);
rect[0] = new PdfNumber(((PdfNumber)rect[0]).DoubleValue * scale + left); // Left
rect[1] = new PdfNumber(((PdfNumber)rect[1]).DoubleValue * scale); // top
rect[2] = new PdfNumber(((PdfNumber)rect[2]).DoubleValue * scale + left); // right
rect[3] = new PdfNumber(((PdfNumber)rect[3]).DoubleValue * scale); // bottom
writer.AddAnnotation(newannot);
}
}
document.NewPage();
}
}
foreach (var apprPdf in pdfs)
{
document.NewPage();
inlcudePdfInDocument(apprPdf.Pdf, null);
}
document.Close();
readers.ForEach(x => x.Close());
}
output.Position = 0;
return output;
This example copies a list of pdf files with annotations into a new pdf file.
Obtain data from two PdfReaders simultaneously - one for copying new pdf and another for copying annotations from old pdf.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Bad:Converting pdf to images - pdf

Related

A better solution for Itext 7 Text fitting

Merging PDFs using iTextSharp removes Trim Box Detail

How to retain page labels when concatenating an existing pdf with a pdf created from scratch?

Copy annotations

How can I copy annotations/highlighting from one PDF to an updated version of that PDF?

Categories

Resources