I have a system that generates PDF files using itextsharp and mails them to my users. The files grow in a way that is not OK.
1. I start off with a one-page Word document, 28 KB.
2. I print this one-page Word document using Adobe's PDF printer and the PDF file is 73 KB.
3. I open the document in Adobe Acrobat X, insert my form fields and save: 1055 KB.
4. I load the document in itextsharp and set the 30 different values, and now my file is 2031 KB.
Are there any compression flags or tricks that can be set in itextsharp or in Adobe to keep my file at ~73 KB? I don't add any images or other media, just text.
BR
Andreas
I've had similar fun with PDF file size growing. I've gotten form template generation down to a process where I shrink my template to the smallest size possible before I send it to itextsharp to populate it.
In your step 3 you are adding the AcroFields for your form. After you've finished adding all your fields and saved the document, use the "Reduce File Size..." option to shrink the document a little. Then you can also use the PDF Optimizer to reduce the file size further. Personally I use a workaround: I use the Create PDF from multiple files feature, add only the one document I'm working on, and then select the smaller file size option for lower quality optimized PDFs.
Then in step 4 it depends on how you are populating and generating the PDF files. Our process takes the template PDFs and copies each page, with the form fields filled, onto a new document. For the copying we use the PdfSmartCopy class instead of the PdfCopy class; it copies content to the new document but does not duplicate content that is identical. After switching to the smart copy class we saw a significant reduction in the size of the files generated by itextsharp.
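The idea behind copying without duplicating identical content can be sketched in a few lines. This is a conceptual illustration in plain JavaScript (Node.js), not itextsharp code and not a description of how PdfSmartCopy is implemented; the function name `copyWithDedup` and the sample data are made up for the example.

```javascript
const crypto = require('crypto');

// Conceptual sketch only: this is NOT itextsharp code. It illustrates the idea of
// storing identical content once and pointing later copies at the first one.
function copyWithDedup(streams) {
  const stored = [];             // unique streams actually written to the output
  const indexByHash = new Map(); // content hash -> position in `stored`
  const refs = [];               // for each input stream, which stored entry it uses

  for (const data of streams) {
    const hash = crypto.createHash('sha256').update(data).digest('hex');
    if (!indexByHash.has(hash)) {
      indexByHash.set(hash, stored.length);
      stored.push(data);
    }
    refs.push(indexByHash.get(hash));
  }
  return { stored, refs };
}

// Hypothetical input: several pages that all carry the same embedded font data.
const font = Buffer.from('font-program-bytes'.repeat(1000));
const pages = [font, Buffer.from('page 1 text'), font, Buffer.from('page 2 text'), font];

const result = copyWithDedup(pages);
const naiveSize = pages.reduce((sum, b) => sum + b.length, 0);
const dedupSize = result.stored.reduce((sum, b) => sum + b.length, 0);
console.log('bytes without dedup:', naiveSize, 'bytes with dedup:', dedupSize);
```

When the same font or other resource is repeated on every copied page, the deduplicated output only pays for it once, which matches the kind of reduction described above.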
Hope this helps.
Related
I have been using Google Sheets for a few years now and have several bigger sheets running. One of these uses an Apps Script to create a PDF document from a single sheet. The code has been running well for over three years. But for the past few days the created PDF files are no longer searchable, and the file size has increased from about 150 KB to 2 MB for some (2 DIN A4 pages) and from about 1.5 MB to over 12 MB for others (8 DIN A4 pages).
Below is the part of the code that creates the PDF file. The gid is just an example.
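// ssID, namearchive and foldersave are defined elsewhere in the script.
var url = "https://docs.google.com/spreadsheets/d/" + ssID + "/export" +
    "?format=pdf&" +
    "gid=1234567890&" +
    "size=A4&" +
    "fzr=true&" +
    "portrait=true&" +
    "fitw=false&" +
    "gridlines=false&" +
    "printtitle=false&" +
    "sheetnames=false&" +
    "pagenum=CENTER&" +
    "top_margin=0.5&" +
    "left_margin=0.8&" +
    "right_margin=0.5&" +
    "bottom_margin=0.8&" +
    "attachment=true";
// Assumption: the original snippet does not show how `pdf` is obtained.
// A common way is to fetch the export URL with the script's OAuth token.
var pdf = UrlFetchApp.fetch(url, {
  headers: { Authorization: "Bearer " + ScriptApp.getOAuthToken() }
});
var blob = pdf.getBlob().getAs('application/pdf').setName(namearchive + '.pdf');
var file = foldersave.createFile(blob);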
Does anyone have any idea why this has changed now? Has Google changed something in the app?
The only thing I found was this possibly related bug, but I'm not sure whether it is the same issue: File size increases when converting an html file to adobe pdf using google apps script
It would be very nice if someone had an idea.
Many thanks.
After a lot of testing I have now found out that the file size depends on the font used. The sheet was created with the font "Calibri". When I print one sheet, for example, the size is 1.3 MB. When I change the font to "Arial", the file size is 53 KB.
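For experiments like this it can help to build the export URL from the question out of a plain object, so a single setting can be changed without touching the string concatenation. The following is only a sketch: `buildExportUrl` is a hypothetical helper name and `'SPREADSHEET_ID'` is a placeholder; the parameter names and values are the ones from the question.

```javascript
// Hypothetical helper: builds the same kind of export URL as in the question
// from a plain object of query parameters.
function buildExportUrl(ssID, params) {
  var query = Object.keys(params)
    .map(function (key) {
      return encodeURIComponent(key) + '=' + encodeURIComponent(params[key]);
    })
    .join('&');
  return 'https://docs.google.com/spreadsheets/d/' + ssID + '/export?' + query;
}

// Parameter names and values copied from the question; the gid is just an example.
var exportParams = {
  format: 'pdf',
  gid: '1234567890',
  size: 'A4',
  fzr: 'true',
  portrait: 'true',
  fitw: 'false',
  gridlines: 'false',
  printtitle: 'false',
  sheetnames: 'false',
  pagenum: 'CENTER',
  top_margin: '0.5',
  left_margin: '0.8',
  right_margin: '0.5',
  bottom_margin: '0.8',
  attachment: 'true'
};

// 'SPREADSHEET_ID' is a placeholder for the real spreadsheet ID (ssID in the question).
var exportUrl = buildExportUrl('SPREADSHEET_ID', exportParams);
```

The resulting `exportUrl` can be used in place of the concatenated `url` string; the rest of the script stays the same.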
I've got a version 1.4 PDF created with the R function "pdf". The file contains six pages and is 135 KB. Now I want each of these pages in a separate file in order to include it as a picture in LaTeX. Since I have the full Adobe Acrobat and not just Adobe Reader, deleting pages isn't a problem, but after a page is deleted from the document Adobe Acrobat automatically changes the version to 1.6, which then causes problems in LaTeX.
I've now tried to save it as a version 1.4 PDF, which itself isn't a problem, but the file size then increases from 28 KB to 759 KB and my final PDF mustn't be larger than 3 MB. I've already played a bit with the compression settings, but the size doesn't really change. Why does Adobe change the version automatically, and how can I extract the pages without blowing up the size that much?
Acrobat always sets the PDF version to its own level, even if the file itself would comply with an earlier version. It has been doing so since Acrobat 2…
You can control quite a few things when you do Save as… --> Optimized PDF. There you can also set the standard at which the document is saved, and many more things.
About the file size, it really depends on what your document contains. It is also possible that your PDF creation tool creates an incomplete document, and saving it in Acrobat will create a more complete one (think of embedded fonts, etc.).
I want to use MigraDoc/PdfSharp to create and store PDF documents.
Is there a way to show these documents in an application on-screen? I'd like to show the print in my program rather than starting Acrobat Reader with the document name.
I considered storing the print using XPS instead of PDF, but then I'd need a way to convert XPS to PDF for mailing it to customers. And I don't want to save the same print in two formats for space reasons.
MigraDoc can save files in its own format "MigraDoc DDL". You can preview MDDDL on the screen, create PDF or RTF from it or print it.
Disadvantage: images are not included in the MDDDL file (OTOH this can be an advantage as images can be shared between several documents).
You can ZIP document plus images for storage.
PDFsharp can create PDF files from XPS (but this is in a beta state and not fully operational).
I am not sure this question belongs on a programming forum but then again not sure where it would.
I currently open any PDF documents in Adobe Acrobat 9 Pro when reading or editing files. Many times, I want to make a change to the text in those files and will simply use the Tools->Advanced Editing->Touch Up Text Tool to do so.
No issues with the actual text changes but when I go back to save the file, the file size increases drastically. Even after running Advanced->PDF Optimizer and Document->Reduce File Size, the size is still much larger than the previous file, in many cases even if I am reducing the amount of text on that page.
It is quite frustrating. I am sure entire books have been written about proper PDF compression, but take one text-only document I have as an example: the file size is 110 KB for a 12-page document. We just migrated to Google Apps, and an entire 72-page PDF was under 600 KB.
Am I missing something?
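Use Save As... on your document after making changes. A plain Save appends the changes to the existing file, while Save As... rewrites the whole file.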
Sounds like the font data is being embedded into the PDF when you edit it. Run Acrobat's Space Audit on the original and modified PDF to determine what is taking up the extra space in the modified PDF.
This is a bit more of a fun question than a serious one, but how does the Adobe PDF format make documents so... portable?
I just created a small Word document, 235 KB in size, containing multiple color photos and a few textual phrases. A PDF created using CutePDF (which I understand isn't the most efficient method of PDF creation) is only 176 KB. That's about 25% smaller. When those files are placed into a compressed folder, the PDF is capable of 3% compression where the .docx can only take 2%. I'm sure that larger files would have even greater differences in size.
My question is, how does Adobe manage to make their files so much smaller? I understand that they are drawn from raster graphics, but my 3 bitmap files really can't be helped from raster that much, can they?
If you have Acrobat 9 there is a nice tool built-in so you can see how the PDF was put together (and compressions used). There is a blog post explaining how to use it at http://pdf.jpedal.org/java-pdf-blog/bid/10479/Viewing-PDF-objects
There are a few ways it can be compressing this:
PDF files use LZW and zip (Flate) compression.
If the image is scaled in the document, or has a higher dpi on disk than you allow for in CutePDF (for example, if CutePDF is set for 300 dpi and the image is 600 dpi), it can be scaled down in the PDF.
Microsoft stores TONS of info in the docx format, in XML. WAY more than is really needed to just export the info (for an example, try copying and pasting your text into a textbox cell, and look at the HTML that comes out - I had a limit on a textbox size for a CMS, and a 7-word sentence ballooned to 950 characters). This is so it can be edited later, along with a lot of esoteric info to make sure everything displays right in every possible permutation. The PDF doesn't need that info, so it can just record the font and size and strip out all the unnecessary info, saving a ton of space.
When you use such small files any overhead in the document format will have a disproportionate effect which is why you are seeing such large % differences.
I took a 2683 KB JPEG and inserted it into a new Word 2003 document. The resulting .doc file was 2725 KB (or 2697 KB as docx). Turning this into a PDF gives me a 2701 KB PDF. So I am seeing a difference of about 25 KB, but only about a 1% difference because of the size of the image data. It is about half what you got, but maybe the version of Word you have is more verbose when making docx?
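The arithmetic behind the two comparisons can be checked with a short snippet. It uses only the sizes quoted in this thread; the function name `percentSaved` is made up for the example.

```javascript
// Percentage by which a file shrank, given its size before and after in KB.
function percentSaved(beforeKB, afterKB) {
  return ((beforeKB - afterKB) / beforeKB) * 100;
}

// Sizes quoted in this thread.
const small = percentSaved(235, 176);   // small Word document with photos -> CutePDF output
const large = percentSaved(2725, 2701); // Word 2003 .doc with one large JPEG -> PDF

console.log(small.toFixed(1) + '%'); // 25.1%
console.log(large.toFixed(1) + '%'); // 0.9%
```

A saving of a few tens of KB is a quarter of the small file but under 1% of the large one, which is the disproportionate effect of format overhead on small files.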
For the PDF, Acrobat shows space usage as 2691K image, 8.27K overhead and 1K fonts. PDF is quite a sparse format in its syntax, which limits overhead, and much of it consists of repeating strings, so it is easily compressible.
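How well repeating strings compress can be seen with a quick experiment. This is a rough sketch using Node's built-in zlib module as a stand-in for the zip-style compression mentioned above; the sample text is a made-up, PDF-like run of text operators, not data taken from a real file.

```javascript
const zlib = require('zlib');

// Made-up sample: 200 nearly identical lines in the style of PDF text operators.
const lines = [];
for (let i = 0; i < 200; i++) {
  lines.push('BT /F1 12 Tf 72 ' + (720 - i) + ' Td (Line ' + i + ') Tj ET');
}
const raw = Buffer.from(lines.join('\n'));

// Deflate the sample and verify that it round-trips unchanged.
const compressed = zlib.deflateSync(raw);
const restored = zlib.inflateSync(compressed);

console.log('raw bytes:', raw.length, 'compressed bytes:', compressed.length);
console.log('round trip ok:', restored.equals(raw));
```

Because the lines differ only in a few digits, the compressed size is a small fraction of the raw size, whereas already-compressed image data such as a JPEG gains almost nothing from a second pass.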
If you want to see what the PDF contains in a tree-like view you can download the demo version of CosEdit.