How to compare two pdf files? [duplicate] - pdf

Closed 10 years ago as a possible duplicate of:
Tool to compare large numbers of PDF files?
I have generated two PDF files:
1. MNTR305K.PRT.pdf (1862 pages), 2760 KB
2. MNTR305K.PRT.pdf (1862 pages), 7345 KB
I compared the content and fonts of the two files and found everything to be the same. I don't know why the second file is so much larger than the first. Can anyone help me find the difference?
Thanks in advance.

It is probably down to the way the PDF creator stores and compresses the page data. Have you looked inside at the structures with Acrobat Pro?

Acrobat has a "space auditor" that can tell you whether the size increase is caused by images, fonts, format overhead, etc. Open the "PDF Optimizer" and click the "Audit space usage..." button.
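If Acrobat is not available, a very rough version of that audit can be sketched in Python using only the standard library. This is a heuristic byte scan, not a PDF parser: it counts a few well-known PDF name tokens (such as /FlateDecode and /FontFile2) in the raw file, so it will miss objects packed into compressed object streams and may miscount tokens that happen to appear inside binary data. The function names are made up for illustration.

```python
import re

# Tokens that hint at where the bytes in a PDF go. Heuristic only:
# objects inside compressed object streams are invisible to this scan.
MARKERS = {
    "streams": rb"\bendstream\b",
    "flate_compressed": rb"/FlateDecode",
    "jpeg_images": rb"/DCTDecode",
    "image_objects": rb"/Subtype\s*/Image",
    "embedded_font_files": rb"/FontFile[23]?\b",
    "font_objects": rb"/Type\s*/Font\b",
}


def rough_audit(data: bytes) -> dict:
    """Count marker tokens in the raw bytes of a PDF file."""
    counts = {name: len(re.findall(pattern, data)) for name, pattern in MARKERS.items()}
    counts["total_bytes"] = len(data)
    return counts


def audit_differences(data_a: bytes, data_b: bytes) -> dict:
    """Return only the entries whose counts differ, as (a, b) pairs."""
    a, b = rough_audit(data_a), rough_audit(data_b)
    return {name: (a[name], b[name]) for name in a if a[name] != b[name]}


def compare_files(path_a: str, path_b: str) -> dict:
    """Read two PDF files and report which counts differ."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return audit_differences(fa.read(), fb.read())
```

Calling compare_files on the two generated files would show, for example, whether one of them has far fewer /FlateDecode streams (uncompressed page data) or more embedded font files than the other.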

I have generated two pdf files
What are you using to generate the PDF?
If you are using the iText API:
Always call setFormFlattening before closing the stamper.
If you are copying or merging PDFs with iText, call freeReader before closing the reader.

You can use i-net PDF content comparer to compare 2 PDF files. It's specifically for finding "hidden" differences that are hard to make out with the naked eye.

Would Beyond Compare work for you?
http://www.scootersoftware.com/

Related

Adjust PDF scale to print

In the context of my studies I often receive PDF files written in LaTeX, with big margins.
When I have to print those files, I like to print them with 2 pages per sheet to spare paper. But I then have a lot of white-space and the text is quite small.
So I'm looking for a way to scale the page contents first and only then print them 2 pages per sheet, to avoid losing space and to have the text as big and readable as possible.
Has anyone an idea of how I could do that either programmatically, or scripted, or on a "step-by-step commands" basis ?
(Note that I have no access to the LaTeX code, otherwise I would just change the margins...)
I used FinePrint to do this on Windows. There are also some alternatives, which I haven't tried:
https://superuser.com/questions/190869/fineprint-alternative-on-linux
https://superuser.com/questions/107687/good-virtual-printers-with-cropping-for-windows-and-linux
Here are previous answers (all mine) which provide building blocks that will help you construct your own programmatic or scripted or "some step-by-step commands" solution:
PDF Manipulation: "2-Up" page layout (SuperUser)
Linux-based tool to chop PDFs into multiple pages (SuperUser)
Convert PDF 2 sides per page to 1 side per page (SuperUser)
How can I split a PDF's pages down the middle? (SuperUser)
Cropping a PDF using Ghostscript 9.01 (StackOverflow)
Split one PDF page into two (StackOverflow)
PDF - Remove White Margins (StackOverflow)
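The building blocks listed above boil down to two steps: crop away the margins, then scale what is left back up to the full page before printing 2-up. The arithmetic for that can be sketched as follows. This is only an illustration: the function name is invented, the margin values are assumptions you would measure on your own document, and the units are PDF points (1/72 inch) with the origin at the bottom-left corner of the page.

```python
def crop_and_scale(page_w, page_h, left, right, top, bottom):
    """Return (crop_box, scale) for removing margins and refilling the page.

    crop_box is (x0, y0, x1, y1) in points, origin at bottom-left.
    scale is the largest uniform factor at which the cropped content
    still fits on a page of the original size.
    """
    content_w = page_w - left - right
    content_h = page_h - top - bottom
    if content_w <= 0 or content_h <= 0:
        raise ValueError("margins leave no content area")
    crop_box = (left, bottom, page_w - right, page_h - top)
    # Uniform scaling keeps the aspect ratio, so the tighter axis wins.
    scale = min(page_w / content_w, page_h / content_h)
    return crop_box, scale


# Example: an A4 page (595 x 842 pt) with assumed 100 pt margins all round.
box, factor = crop_and_scale(595, 842, 100, 100, 100, 100)
print(box, round(factor, 2))
```

The crop box is what a cropping tool needs as input, and the scale factor says how much larger the text becomes once the cropped content is fitted back onto the page.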

Converting bitmap pdf to pdf with text that can be copied

When reading journal articles for school, I often come across PDFs in which I cannot select text, which makes taking notes very inconvenient. The funny thing is that the PDF does not appear to be scanned, and the security permissions allow copying, yet I still cannot select the text. How can I convert such a PDF into one where I can select the text, preferably without having to convert each page individually, as there are typically 20 pages per PDF?
I've tried some online converters, but I can't seem to find one that can 'digitize' the text while keeping the file in PDF format.
Any suggestions? Any background information to explain this helps as well.
Thanks much.
Try to "print" the original PDF file into a new PDF file by using "PDF Creator" or a similar application. In the new file you should be able to select the text.

Get text from a pdf in NSString

I am trying to make an iOS app that extracts plain text from a PDF file and displays it in a UITextView. It is not simply a PDF reader for viewing a file; I would later like to perform certain operations on that text.
I have already googled a lot but have not found an exact solution.
I already tried using https://github.com/zachron/pdfiphone
but its files are built for the ARMv6 architecture, which seems to be obsolete with Xcode 4.5.
If anyone can suggest some exact and non-confusing code using the Quartz 2D framework of iOS, that would be great.
Here is some sample code for extracting text from a PDF; I hope it helps.
https://github.com/zachron/pdfiphone
This is a library to get the text out of a PDF for the iPhone.
There is also another demo that uses OCR technology; see the link below.
https://github.com/nolanbrown/Tesseract-iPhone-Demo
Also check this page of the Quartz 2D Programming Guide; it covers everything you need to open and parse a PDF file in iOS. Note that it is not a simple task, since there is no method to extract the full text in one line. You have to work with the data as an input stream, using a CGPDFScanner.
Two other libraries:
https://github.com/KurtCode/PDFKitten/
https://github.com/mobfarm/FastPdfKit
This question comes up all the time. It is VERY hard to extract text from PDF in general. The PDF specification is not designed with text extraction in mind. There are many libraries that try to do the job, essentially by reconstructing the text from the geometric placement of the individual glyphs. These libraries have varying degrees of success, but will all fail on certain PDF documents. In fact, some PDF documents have Glyphs but no way to associate the glyph with a character. For these documents it is simply not possible to extract text, short of using some kind of OCR approach.
PDF is designed as a read-only format that is portable in the sense that a PDF document will be rendered identically on any platform. That is what it is best at, and what it should be used for.
If text is to be edited, do not use PDF.
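To make the "reconstruct the text from the geometric placement of the glyphs" idea concrete, here is a minimal Python sketch of what such libraries do in principle. Everything in it is an assumption for illustration: the Glyph record, the function name, and the tolerance values are invented, and a real extractor would first have to obtain glyph positions and character codes from the PDF content streams. It also cannot help with the documents mentioned above whose glyphs have no character mapping.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str     # character the glyph maps to (assumed to be known)
    x: float      # left edge of the glyph
    y: float      # baseline; PDF y coordinates grow upward
    width: float  # advance width of the glyph


def glyphs_to_text(glyphs, line_tol=2.0, space_ratio=0.3):
    """Rebuild lines of text from positioned glyphs.

    Glyphs whose baselines lie within line_tol of each other are treated
    as one line. A space is inserted when the horizontal gap to the
    previous glyph exceeds space_ratio times that glyph's width.
    """
    # Visit glyphs from the top of the page to the bottom.
    lines = []
    for g in sorted(glyphs, key=lambda g: (-g.y, g.x)):
        if lines and abs(lines[-1][0].y - g.y) <= line_tol:
            lines[-1].append(g)
        else:
            lines.append([g])

    out = []
    for line in lines:
        line.sort(key=lambda g: g.x)  # left to right within the line
        text = line[0].char
        for prev, cur in zip(line, line[1:]):
            gap = cur.x - (prev.x + prev.width)
            if gap > space_ratio * prev.width:
                text += " "
            text += cur.char
        out.append(text)
    return "\n".join(out)
```

Even this toy version shows why results vary: the line tolerance and the space threshold are guesses, and multi-column layouts, rotated text, or unusual kerning break them easily.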
Here (Extracting text from pdf using objective-c), I found an answer to your question, and it works, but not as well as I need it to:
it can extract only ASCII
it returns only one paragraph
Good luck.

PDF Compression and editing techniques

I am not sure this question belongs on a programming forum but then again not sure where it would.
I currently open any PDF documents in Adobe Acrobat 9 Pro when reading or editing files. Many times, I want to make a change to the text in those files and will simply use the Tools->Advanced Editing->Touch Up Text Tool to do so.
No issues with the actual text changes but when I go back to save the file, the file size increases drastically. Even after running Advanced->PDF Optimizer and Document->Reduce File Size, the size is still much larger than the previous file, in many cases even if I am reducing the amount of text on that page.
It is quite frustrating. I am sure entire books have been written about proper PDF compression but take one text only document I have for example: file size is 110KB for a 12 page document. We just migrated to Google Apps and an entire 72 page PDF was under 600 KB.
Am I missing something?
Use Save As... to save your document after making changes.
Sounds like the font data is being embedded into the PDF when you edit it. Run Acrobat's Space Audit on the original and modified PDF to determine what is taking up the extra space in the modified PDF.

Append text to PDF in Coldfusion 8

I have a PDF that I want to append some text to. The addFooter() that is available in CF9 would work perfectly, but I only have access to CF8.
Does anyone have a workaround for this feature in CF8?
Thanks
Yes, even in ColdFusion 8 you can use DDX to add footers and headers to a PDF. See the specific Adobe 8 Livedocs on how to do this. I also have a couple of blog posts, 1 and 2, that might help. Although I tested on CF9, they contain information that is valid for CF8 as well. You might also want to get the almost-impossible-to-find DDX reference. Also check out ColdFusion Jedi's 8-part series on PDF manipulation in CF8.
UPDATE (Added information below on combining text):
To take PDF1 and PDF2 and put the text on a single page in resulting PDF, the first thing that comes to mind is that you could use cfpdf with the getinfo action to get the text (if you don't already have it in a plain text or HTML format). Then you could cfoutput the text into a cfdocument element of type pdf. That way you get a new merged PDF with the contents combined.