Copying pages from one PDF to another gives an error on save

I'm trying to take pages from one PDF, scale them down, and put them side by side in another PDF. To do this I make an intermediate PDF that has all of the pages from the source scaled down to the size I need. Then I go through the scaled PDF and copy the pages two at a time to the final PDF. My thinking is that once the copying is finished I'm done with the scaled PDF and can close it, but when I do that I get an error while saving the final PDF:
COSStream has been closed and cannot be read. Perhaps its enclosing PDDocument has been closed?
I'm not sure why the intermediate doc should matter when I try to save the final doc. Could it be that I'm doing something wrong when copying the pages? Here's the code I use for that:
private PDDocument sideBySide(PaperSize paperSize, PaperSize pageSize) throws IOException {
    PDRectangle targetPaperSize = getRect(paperSize);
    PDRectangle targetPageSize = getRect(pageSize);
    PDDocument scaledDoc = scaleDoc(pageSize, doc);
    PDDocument outputDoc = new PDDocument();
    final double theta = Math.PI / 2;
    for (int offset = 0; offset < scaledDoc.getNumberOfPages() - 1; offset+=2) {
        PDPage twoUp = new PDPage(targetPaperSize);
        twoUp.setRotation(90);
        twoUp.setResources(new PDResources());
        outputDoc.addPage(twoUp);
        PDPage leftPage = scaledDoc.getPage(offset);
        PDPage rightPage = scaledDoc.getPage(offset + 1);
        PDFormXObject leftObject = importAsXObject(outputDoc, leftPage);
        twoUp.getResources().add(leftObject);
        PDFormXObject rightObject = importAsXObject(outputDoc, rightPage);
        twoUp.getResources().add(rightObject);
        PDPageContentStream content = new PDPageContentStream(outputDoc, twoUp);
        AffineTransform leftTrans = AffineTransform.getRotateInstance(theta);
        leftTrans.concatenate(AffineTransform.getTranslateInstance(0, -targetPageSize.getHeight()));
        AffineTransform rightTrans = AffineTransform.getRotateInstance(theta);
        rightTrans.concatenate(AffineTransform.getTranslateInstance(targetPageSize.getWidth(), -targetPageSize.getHeight()));
        leftObject.setMatrix(leftTrans);
        rightObject.setMatrix(rightTrans);
        content.drawForm(leftObject);
        content.drawForm(rightObject);
        content.close();
    }
    scaledDoc.close();
    return outputDoc;
}
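The two placement matrices in the code above can be checked in isolation, without any PDF library, because they are plain java.awt.geom.AffineTransform objects. The following is a minimal sketch, not part of the original code: the class name TwoUpPlacement, its helper methods, and the 420 x 595 page size are assumptions made for illustration. It shows where the corners of a scaled page end up under each matrix.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

/** Hypothetical helper that rebuilds the two placement matrices from the question. */
final class TwoUpPlacement {

    /** Matrix for the left page: translate by (0, -pageHeight), then rotate by 90 degrees. */
    static AffineTransform leftTransform(double pageHeight) {
        AffineTransform t = AffineTransform.getRotateInstance(Math.PI / 2);
        // concatenate() post-multiplies, so the translation is applied to a point first
        t.concatenate(AffineTransform.getTranslateInstance(0, -pageHeight));
        return t;
    }

    /** Matrix for the right page: same as the left one, shifted by one page width. */
    static AffineTransform rightTransform(double pageWidth, double pageHeight) {
        AffineTransform t = AffineTransform.getRotateInstance(Math.PI / 2);
        t.concatenate(AffineTransform.getTranslateInstance(pageWidth, -pageHeight));
        return t;
    }

    /** Maps a single point through the given matrix. */
    static Point2D map(AffineTransform t, double x, double y) {
        return t.transform(new Point2D.Double(x, y), null);
    }

    public static void main(String[] args) {
        double w = 420, h = 595; // assumed size of one scaled page, in user units
        System.out.println("left origin  -> " + map(leftTransform(h), 0, 0));
        System.out.println("left corner  -> " + map(leftTransform(h), w, h));
        System.out.println("right origin -> " + map(rightTransform(w, h), 0, 0));
        System.out.println("right corner -> " + map(rightTransform(w, h), w, h));
    }
}
```

Under these assumptions the left page covers y from 0 to w and the right page covers y from w to 2w in the unrotated page space, which becomes a side-by-side layout once the page rotation of 90 is applied.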

Related

How to extract a portion of a page and write to a new PDF file in itext7?

I want to divide a PDF page into 4 quadrants and then write each quadrant to a separate PDF page (or document). I don't want to crop the existing page; I want to extract the contents of each quadrant and write them to a new PDF file. Is there a way to do this using iText 7?
Be aware that the documentation for iTextSharp and iText 7 is lacking in many ways. The book "iText in Action, 2nd Edition" is the only real help if you are willing to read a book, but its examples are in Java only, some of the code is implemented differently in C#, and it covers only iTextSharp 5.
For future reference, assuming you need a split into equal parts, here is what will do the trick (this is for 4x4, that is, 16 parts):
public void manipulatePdf(string src, string dest)
{
    PdfReader reader = new PdfReader(src);
    iTextSharp.text.Rectangle pagesize = reader.GetPageSizeWithRotation(1);
    // Every output page has the size of the source page
    Document document = new Document(pagesize);
    PdfWriter writer = PdfWriter.GetInstance(document,
        new FileStream(dest, FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.ReadWrite));
    document.Open();
    PdfContentByte content = writer.DirectContent;
    PdfImportedPage page = writer.GetImportedPage(reader, 1);
    float x, y;
    for (int i = 0; i < 16; i++)
    {
        // Shift the 4x-scaled page so that tile i (row by row, top row first) fills the output page
        x = -pagesize.Width * (i % 4);
        y = pagesize.Height * (i / 4 - 3);
        content.AddTemplate(page, 4, 0, 0, 4, x, y);
        document.NewPage();
    }
    document.Close();
}
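The offset arithmetic inside the loop can be looked at on its own. Below is a small Java sketch of the same calculation, generalised to an n x n grid; the class TileOffsets and its method names are hypothetical and only illustrate the formula used in the answer. Because the page is drawn scaled by n, tile i is brought into view by shifting the scaled page left by whole page widths and down by whole page heights.

```java
/** Hypothetical helper that reproduces the tile offsets from the answer above. */
final class TileOffsets {

    /** Horizontal shift that brings column (i % gridSize) of the scaled page into view. */
    static float offsetX(float pageWidth, int i, int gridSize) {
        return -pageWidth * (i % gridSize);
    }

    /** Vertical shift: row 0 is the top strip, so it needs the largest downward shift. */
    static float offsetY(float pageHeight, int i, int gridSize) {
        // Integer division selects the row, counted from the top of the page
        return pageHeight * (i / gridSize - (gridSize - 1));
    }

    public static void main(String[] args) {
        float width = 595f, height = 842f; // assumed page size in user units
        int gridSize = 4;
        for (int i = 0; i < gridSize * gridSize; i++) {
            System.out.println("tile " + i + ": x=" + offsetX(width, i, gridSize)
                    + ", y=" + offsetY(height, i, gridSize));
        }
    }
}
```

With gridSize set to 2 the same formula would give the four quadrants asked about in the question.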

When using iText to generate a PDF, if I need to switch fonts many times the file size becomes too large

I have a section of my PDF in which I need to use one font for its unicode symbol and the rest of the paragraph should be a different font. (It is something like "1. a 2. b 3. c" where "1." is the unicode symbol/font and "a" is another font) I have followed the method Bruno describes here: iText 7: How to build a paragraph mixing different fonts? and it works fine to generate the PDF. The issue is that the file size of the PDF goes from around 20MB to around 100MB compared to using only one font and one Text element. This section is used repeatedly in the document thousands of times. I am wondering if there is a way to reduce the impact of switching fonts or to reduce the file size of the entire document in some way.
Style creation pseudocode:
Style style1 = new Style();
Style style2 = new Style();
PdfFont font1 = PdfFontFactory.createFont(FontProgramFactory.createFont(fontFile1), PdfEncodings.IDENTITY_H, true);
style1.setFont(font1).setFontSize(8f).setFontColor(Color.DARK_GRAY);
PdfFont font2 = PdfFontFactory.createFont(FontProgramFactory.createFont(fontFile2), "", false);
style2.setFont(font2).setFontSize(8f).setFontColor(Color.DARK_GRAY);
Writing text/paragraph pseudocode:
Div div = new Div().setPaddingLeft(3).setMarginBottom(0).setKeepTogether(true);
Paragraph paragraph = new Paragraph();
for (int i = 0; i < 25; i++) { // up to 25 symbol/code pairs per paragraph
    Text symbolText = new Text(unicodeSymbol + " ").addStyle(style1);
    paragraph.add(symbolText);
    Text codeText = new Text(plainText + " ").addStyle(style2);
    paragraph.add(codeText);
}
div.add(paragraph);
This writing of text/paragraph is done thousands of times and makes up most of the document. Basically the document consists of thousands of "buildings" that have corresponding codes and the codes have categories. I need to have the index for the category as the unicode symbol and then all of the corresponding codes within the paragraph for the building.
Here is reproducible code:
float offSet = 50;
Integer leading = 10;
DateFormat format = new SimpleDateFormat("yyyy_MM_dd_kkmmss");
String formattedDate = format.format(new Date());
String path = "/tmp/testing_pdf_"+formattedDate + ".pdf";
File targetPdfFile = new File(path);
PdfWriter writer = new PdfWriter(path, new WriterProperties().addXmpMetadata());
PdfDocument pdf = new PdfDocument(writer);
pdf.setTagged();
PageSize pageSize = PageSize.LETTER;
Document document = new Document(pdf, pageSize);
document.setMargins(offSet, offSet, offSet, offSet);
byte[] font1file = IOUtils.toByteArray(FileUtility.getInputStreamFromClassPath("fonts/Garamond-Premier-Pro-Regular.ttf"));
byte[] font2file = IOUtils.toByteArray(FileUtility.getInputStreamFromClassPath("fonts/Quivira.otf"));
PdfFont font1 = PdfFontFactory.createFont(FontProgramFactory.createFont(font1file), "", true);
PdfFont font2 = PdfFontFactory.createFont(FontProgramFactory.createFont(font2file), PdfEncodings.IDENTITY_H, true);
Style style1 = new Style().setFont(font1).setFontSize(8f).setFontColor(Color.DARK_GRAY);
Style style2 = new Style().setFont(font2).setFontSize(8f).setFontColor(Color.DARK_GRAY);
float columnGap = 5;
float columnWidth = (pageSize.getWidth() - offSet * 2 - columnGap * 2) / 3;
float columnHeight = pageSize.getHeight() - offSet * 2;
Rectangle[] columns = {
new Rectangle(offSet, offSet, columnWidth, columnHeight),
new Rectangle(offSet + columnWidth + columnGap, offSet, columnWidth, columnHeight),
new Rectangle(offSet + columnWidth * 2 + columnGap * 2, offSet, columnWidth, columnHeight)};
document.setRenderer(new ColumnDocumentRenderer(document, columns));
for (int j = 0; j < 5000; j++) {
    Div div = new Div().setPaddingLeft(3).setMarginBottom(0).setKeepTogether(true);
    Paragraph paragraph = new Paragraph().setFixedLeading(leading);
    // StringBuilder stringBuilder = new StringBuilder();
    for (int i = 0; i < 26; i++) {
        paragraph.add(new Text("\u3255 ").addStyle(style2));
        paragraph.add(new Text("test ").addStyle(style1));
        // stringBuilder.append("\u3255 ").append(" test ");
    }
    // paragraph.add(stringBuilder.toString()).addStyle(style2);
    div.add(paragraph);
    document.add(div);
}
document.close();
While creating the reproducible code I found that this is related to the document being tagged. If you remove the line that marks it as tagged, the file size drops considerably.
You can also reduce the file size by using the commented-out StringBuilder with one font instead of two (comment out the two paragraph.add calls in the for loop). This mirrors the issue I have in my code.
The problem is not in the fonts themselves. The issue comes from the fact that you are creating a tagged PDF. Tagged documents contain a lot of PDF objects that take up a lot of space in the file.
I wasn't able to reproduce your 20MB vs 100MB results. On my machine the resulting file size is ~44MB whether I use one font or two, as long as two Text elements are used.
To compress the file when creating large tagged documents, you should use full compression mode, which compresses all PDF objects, not only streams.
To activate full compression mode, create a PdfWriter instance with WriterProperties:
PdfWriter writer = new PdfWriter(outFileName,
new WriterProperties().setFullCompressionMode(true));
This setting reduced the file size for me from >40MB to ~5MB.
Please note that you are using iText 7.0.x, while the 7.1.x line has already been released and is now the main line of iText, so I recommend that you update to the latest version.

Using pdfbox - how to get the font from a COSName?

How to get the font from a COSName?
The solution I'm looking for looks something like this:
COSDictionary dict = new COSDictionary();
dict.add(fontname, something); // fontname COSName from below code
PDFontFactory.createFont(dict);
If you need more background, I added the whole story below:
I am trying to replace some strings in a PDF. This succeeds (as long as all of the text is stored in one token). To keep the formatting I would like to re-center the text. As far as I understand, I can do this by getting the widths of the old string and the new one, doing a trivial calculation, and setting the new position.
I found some inspiration on Stack Overflow for the replacement, https://stackoverflow.com/a/36404377 (yes, it has some issues, but it works for my simple PDFs), and for the centering, "How to center a text using PDFBox". Unfortunately, that example uses a font constant.
So using the first link's code I get a handling for operator 'TJ' and one for 'Tj'.
PDFStreamParser parser = new PDFStreamParser(page);
parser.parse();
java.util.List<Object> tokens = parser.getTokens();
for (int j = 0; j < tokens.size(); j++)
{
    Object next = tokens.get(j);
    if (next instanceof Operator)
    {
        Operator op = (Operator) next;
        // Tj and TJ are the two operators that display strings in a PDF
        if (op.getName().equals("Tj"))
        {
            // Tj takes one operand, the string to display, so let's
            // update that operand
            COSString previous = (COSString) tokens.get(j - 1);
            String string = previous.getString();
            String replaced = prh.getReplacement(string);
            if (!string.equals(replaced))
            { // if there are changes, replace the content
                previous.setValue(replaced.getBytes());
                float xpos = getPosX(tokens, j);
                //if (true) // center the text
                if (6 * xpos > page.getMediaBox().getWidth()) // check if the text starts to the right of 1/6 of the page width
                {
                    float fontsize = getFontSize(tokens, j);
                    COSName fontname = getFontName(tokens, j);
                    // TODO
                    PDFont font = ?getFont?(fontname);
                    // TODO
                    float widthnew = getStringWidth(replaced, font, fontsize);
                    setPosX(tokens, j, page.getMediaBox().getWidth() / 2F - (widthnew / 2F));
                }
                replaceCount++;
            }
        }
        // (handling of TJ not shown)
    }
}
Regarding the code between the TODO tags: I get the required values from the token list. (Yes, this code is awful, but for now it lets me concentrate on the main issue.)
Having the string, the size and the font, I should be able to call the getWidth(..) method from the sample code.
Unfortunately, I run into trouble creating a font from the COSName variable.
PDFont doesn't provide a method to create a font by name.
PDFontFactory looks promising, but it requires a COSDictionary. This is the point where I gave up and decided to ask for help.
The names are associated with font objects in the page resources.
Assuming you use PDFBox 2.0.x and that page is a PDPage instance, you can resolve the name fontname using:
PDFont font = page.getResources().getFont(fontname);
But the warning from the comments on the question you reference remains: this approach will work only for very simple PDFs and might even damage other ones.
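Once the font has been resolved, the re-centering itself is simple arithmetic. The sketch below shows it without any PDFBox dependency; the class CenterText and its methods are hypothetical. It assumes the width comes from PDFont.getStringWidth(), which reports widths in 1/1000 of a text space unit, and that the x operand being rewritten is an absolute position on the page, which may not hold for every content stream.

```java
/** Hypothetical helper for the re-centering arithmetic described in the question. */
final class CenterText {

    /** Converts a width in 1/1000 text space units into user units at the given font size. */
    static float textWidth(float glyphUnits, float fontSize) {
        return glyphUnits / 1000f * fontSize;
    }

    /** Left edge at which text of the given width is horizontally centered on the page. */
    static float centeredX(float pageWidth, float textWidth) {
        return (pageWidth - textWidth) / 2f;
    }

    public static void main(String[] args) {
        float pageWidth = 612f;   // assumed: US Letter width in user units
        float glyphUnits = 5000f; // assumed: a value as returned by font.getStringWidth(text)
        float width = textWidth(glyphUnits, 12f);
        System.out.println("text width = " + width + ", x = " + centeredX(pageWidth, width));
    }
}
```

This is the same result as the expression pageWidth / 2 - width / 2 used in the question's setPosX call.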
try {
    // Load an existing document
    File file = new File("UKRSICH_Mo6i-Spikyer_z1560-FAV.pdf");
    PDDocument document = PDDocument.load(file);
    PDPage page = document.getPage(0);
    PDResources pageResources = page.getResources();
    // Print the resource names of all fonts on the page
    System.out.println(pageResources.getFontNames());
    for (COSName key : pageResources.getFontNames())
    {
        // Resolve each resource name to its font object
        PDFont font = pageResources.getFont(key);
        System.out.println("Font: " + font.getName());
    }
    document.close();
} catch (IOException e) {
    e.printStackTrace();
}

Unable to add margins in iTextSharp document having images

Requirement:
A large (dynamic) image needs to be split and shown across PDF pages. If the image can't be accommodated on one page, we need to add another page, fit the remaining portion there, and so on.
So far I am able to split the image across multiple pages; however, the pages appear to ignore the margin values completely, so the images are shown without any margins.
Please see below code:
string fileStringReplace = imageByteArray.Replace("data:image/jpeg;base64,", "");
Byte[] imageByte = Convert.FromBase64String(fileStringReplace);
iTextSharp.text.Image image = iTextSharp.text.Image.GetInstance(imageByte);
float w = image.ScaledWidth;
float h = image.ScaledHeight;
float cropHeight = 1500f;
iTextSharp.text.Rectangle page = new iTextSharp.text.Rectangle(1150f, cropHeight);
var x = page.Height;
Byte[] created;
iTextSharp.text.Document document = new iTextSharp.text.Document(page, 20f, 20f, 20f, 40f); // This has no impact
using (var outputMemoryStream = new MemoryStream())
{
    PdfWriter writer = PdfWriter.GetInstance(document, outputMemoryStream);
    writer.CloseStream = false;
    document.Open();
    PdfContentByte canvas = writer.DirectContentUnder;
    float usedHeights = h;
    while (usedHeights >= 0)
    {
        usedHeights -= cropHeight;
        document.SetPageSize(new iTextSharp.text.Rectangle(1150f, cropHeight));
        canvas.AddImage(image, w, 0, 0, h, 0, -usedHeights);
        document.NewPage();
    }
    document.Close();
    created = outputMemoryStream.ToArray();
    outputMemoryStream.Write(created, 0, created.Length);
    outputMemoryStream.Position = 0;
}
return created;
I also tried to set margin in the loop by document.SetMargins() - but that's not working.
You are mixing different things.
When you define margins, be it while constructing the Document instance or by using the SetMargins() method, you define margins for when you let iText(Sharp) decide on the layout. That is: the margins will be respected when you do something like document.Add(image).
However, you do not allow iText to create the layout. You create a PdfContentByte named canvas and you decide to add the image to that canvas using a transformation matrix. This means that you have to calculate the a, b, c, d, e, and f values needed for the AddImage() method.
You are supposed to do that math yourself. If you want to see a margin, then the values w, 0, 0, h, 0, and -usedHeights are wrong, and you shouldn't blame iTextSharp; you should blame your lack of insight into analytic geometry (that's the stuff you learn in high school at the age of 16).
This might be easier for you:
iTextSharp.text.Image image = iTextSharp.text.Image.GetInstance(imageByte);
float w = image.ScaledWidth;
float h = image.ScaledHeight;
// For the sake of simplicity, I don't crop the image, I just add 20 user units
iTextSharp.text.Rectangle page = new iTextSharp.text.Rectangle(w + 20, h + 20);
iTextSharp.text.Document document = new iTextSharp.text.Document(page);
PdfWriter writer = PdfWriter.GetInstance(document, outputMemoryStream);
// Please drop the line that prevents closing the output stream!
// Why are so many people making this mistake?
// Who told you you shouldn't close the output stream???
document.Open();
// We define an absolute position for the image
// it will leave a margin of 10 to the left and to the bottom
// as we created a page that is 20 user units too wide and too high,
// we will also have a margin of 10 to the right and to the top
image.SetAbsolutePosition(10, 10);
document.Add(image);
document.Close();
Note that SetAbsolutePosition() also lets you take control, regardless of the margins. As an alternative, you could use:
iTextSharp.text.Image image = iTextSharp.text.Image.GetInstance(imageByte);
float w = image.ScaledWidth;
float h = image.ScaledHeight;
// For the sake of simplicity, I don't crop the image, I just add 20 user units
iTextSharp.text.Rectangle page = new iTextSharp.text.Rectangle(w + 20, h + 20);
iTextSharp.text.Document document = new iTextSharp.text.Document(page, 10, 10, 10, 10);
PdfWriter writer = PdfWriter.GetInstance(document, outputMemoryStream);
// Please drop the line that prevents closing the output stream!
// Why are so many people making this mistake?
// Who told you you shouldn't close the output stream???
document.Open();
// We add the image to the document, and we let iTextSharp decide where to put it
// As there is just sufficient space to fit the image inside the page, it should fit,
// But be aware of the existence of a leading; that could create side-effects
// such as forwarding the image to the next page because it doesn't fit vertically
document.Add(image);
document.Close();
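For the multi-page case in the question, the matrix arithmetic the answer refers to could look roughly like the sketch below. It is written in Java rather than C# and uses no iText calls; the class ImageSlices, its methods, and the sample numbers are assumptions made for illustration. The idea is that the e value of the matrix is the left margin, and the f value places the image so that slice k falls inside the area between the top and bottom margins.

```java
/** Hypothetical helper: where to place a tall image on each page so margins stay free. */
final class ImageSlices {

    /** Number of pages needed when each page can show usableHeight of the image. */
    static int pageCount(float imageHeight, float usableHeight) {
        return (int) Math.ceil(imageHeight / usableHeight);
    }

    /** Lower-left y of the whole image on page pageIndex (0-based); this is the f value. */
    static float imageY(float pageHeight, float marginTop, float marginBottom,
                        float imageHeight, int pageIndex) {
        float usable = pageHeight - marginTop - marginBottom;
        // On page 0 the top of the image touches the top margin;
        // every following page moves the image up by one usable height.
        return (pageHeight - marginTop) - imageHeight + pageIndex * usable;
    }

    public static void main(String[] args) {
        float pageHeight = 1500f, top = 20f, bottom = 40f; // values taken from the question
        float imageHeight = 4000f;                          // assumed image height
        int pages = pageCount(imageHeight, pageHeight - top - bottom);
        for (int k = 0; k < pages; k++) {
            System.out.println("page " + k + ": f = " + imageY(pageHeight, top, bottom, imageHeight, k));
        }
    }
}
```

Note that positioning alone does not hide the parts of the image that extend into the margins; a clipping rectangle covering the usable area would also be needed on each page.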

PDF Watermark for printing only, programmatically

I can watermark any PDF already, and the images inside, everything ok, but now I need the watermark only showing up when the PDF is printed... Is this possible? How?
I need to do this programmatically of course.
For future readers, this is possible to do by wrapping the watermark in a PDF layer (Optional Content Group), then configuring the Usage attribute of this layer as Print-Only. See the PDF Reference Document, Chapter 4-Graphics, part 4.10-Optional Content for more details.
Using iTextSharp, I was able to get it working with the following. The key points are PDF version 1.7 and SetPrint("Watermark", true):
string oldfile = @"c:\temp\oldfile.pdf";
string newFile = @"c:\temp\newfile.pdf";
PdfReader pdfReaderS = new PdfReader(oldfile);
Document document = new Document(pdfReaderS.GetPageSizeWithRotation(1));
PdfWriter pdfWriterD = PdfWriter.GetInstance(document, new FileStream(newFile, FileMode.Create, FileAccess.Write));
pdfWriterD.SetPdfVersion(PdfWriter.PDF_VERSION_1_7);
document.Open();
PdfContentByte pdfContentByteD = pdfWriterD.DirectContent;
BaseFont bf = BaseFont.CreateFont(BaseFont.HELVETICA, BaseFont.CP1252, BaseFont.NOT_EMBEDDED);
int n = pdfReaderS.NumberOfPages;
string text = "UNCONTROLLED";
for (int i = 1; i <= n; i++)
{
    iTextSharp.text.Rectangle pageSizeS = pdfReaderS.GetPageSizeWithRotation(i);
    float pageWidth = pageSizeS.Width / 2;
    float pageheight = pageSizeS.Height / 2;
    document.SetPageSize(pageSizeS);
    document.NewPage();
    PdfImportedPage pdfImportedPage = pdfWriterD.GetImportedPage(pdfReaderS, i);
    PdfLayer layer1 = new PdfLayer("Watermark", pdfWriterD);
    layer1.SetPrint("Watermark", true);
    layer1.View = false;
    layer1.On = false;
    layer1.OnPanel = false;
    pdfContentByteD.BeginLayer(layer1);
    pdfContentByteD.SetColorFill(BaseColor.RED);
    pdfContentByteD.SetFontAndSize(bf, 30);
    ColumnText.ShowTextAligned(pdfContentByteD, Element.ALIGN_CENTER, new Phrase(text), 300, 700, 0);
    pdfContentByteD.EndLayer();
    pdfContentByteD.AddTemplate(pdfImportedPage, 0, 0);//, 0, 1, 0, 0);
}
document.Close();
pdfReaderS.Close();
You should probably make use of the fact that the screen uses RGB and the printer CMYK. You should be able to create two colors in CMYK that map to the same RGB value. This is of course not enough against a determined specialist.
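The claim that two different CMYK colors can map to the same RGB value can be illustrated with the common naive device conversion, where each RGB channel is 255 * (1 - ink) * (1 - black). This is only a sketch under that assumption: the class NaiveCmyk is hypothetical, and real viewers and printers typically convert through color profiles, so the actual on-screen and printed results may differ.

```java
import java.util.Arrays;

/** Hypothetical demo of the naive CMYK to RGB conversion (no color profiles involved). */
final class NaiveCmyk {

    /** Converts CMYK components in the range 0..1 to 8-bit RGB using the naive formula. */
    static int[] toRgb(double c, double m, double y, double k) {
        return new int[] {
            (int) Math.round(255 * (1 - c) * (1 - k)),
            (int) Math.round(255 * (1 - m) * (1 - k)),
            (int) Math.round(255 * (1 - y) * (1 - k))
        };
    }

    public static void main(String[] args) {
        // A grey mixed from the three colored inks, and the same grey made from black ink only
        int[] mixedInks = toRgb(0.5, 0.5, 0.5, 0.0);
        int[] blackOnly = toRgb(0.0, 0.0, 0.0, 0.5);
        System.out.println(Arrays.toString(mixedInks) + " vs " + Arrays.toString(blackOnly));
    }
}
```

On paper the two ink combinations are physically different, which is the property the suggestion above relies on.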
The bOnScreen parameter determines whether the watermark will be displayed when the PDF is viewed on the computer screen, and bOnPrint determines whether it will be displayed when the PDF is printed.
-- https://acrobatusers.com/tutorials/watermarking-a-pdf-with-javascript