How to convert a large image PDF file to PNG? - pdf

I have a large single-page PDF (about 700KB) that was automatically generated by the Trace2UML tool, which generates a UML interaction diagram from the trace logs obtained while running an application. It has options to export the diagram to PDF and PNG file formats; however, due to the large image size, the PNG export fails, whereas the PDF export succeeds. So I have been looking for a way to convert this large file from PDF to PNG format.
I've googled this for a couple of hours this morning. There are lots of programs and online services to do it, but none of them work with my file. When I load the file into PDF-XChange Viewer, it indicates that the image size is 136 x 11186 cm. So, definitely huge! Is there any way to convert this file?

On a Mac, Preview can convert a PDF to PNG (just open the PDF and save it as PNG), though I'm not sure whether it works for a file this large.
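
The 136 x 11186 cm page size explains why most converters give up: even at a modest 72 dpi the raster would be about 3855 x 317083 pixels. One possible workaround is to render the page as a series of horizontal strips rather than as a single PNG. The following is a minimal sketch that assumes Poppler's pdftoppm is installed; the 72 dpi resolution, the 10000-pixel strip height, and the names diagram.pdf and strip-NNN are illustrative choices, not values from the question. The code only builds the command lines; nothing is rendered until they are passed to subprocess.run.

```python
CM_PER_INCH = 2.54


def pixels_at_dpi(width_cm, height_cm, dpi):
    """Return the (width, height) in pixels of a page rendered at `dpi`."""
    return (round(width_cm / CM_PER_INCH * dpi),
            round(height_cm / CM_PER_INCH * dpi))


def strips(total_height_px, strip_height_px):
    """Split a page height into (y_offset, height) strips, top to bottom."""
    return [(y, min(strip_height_px, total_height_px - y))
            for y in range(0, total_height_px, strip_height_px)]


def pdftoppm_commands(pdf_path, out_prefix, width_px, height_px, dpi,
                      strip_height_px):
    """Build one pdftoppm command per strip (crop area is in pixels)."""
    commands = []
    for index, (y, h) in enumerate(strips(height_px, strip_height_px)):
        commands.append([
            "pdftoppm", "-png", "-singlefile",
            "-r", str(dpi),
            "-x", "0", "-y", str(y),
            "-W", str(width_px), "-H", str(h),
            pdf_path, f"{out_prefix}-{index:03d}",
        ])
    return commands


if __name__ == "__main__":
    # Page size reported by PDF-XChange Viewer in the question.
    width_px, height_px = pixels_at_dpi(136, 11186, dpi=72)
    for cmd in pdftoppm_commands("diagram.pdf", "strip",
                                 width_px, height_px, 72, 10000):
        print(" ".join(cmd))
        # To render for real: subprocess.run(cmd, check=True)
```

Whether the strips are then stitched together or kept as separate images depends on what the viewer of the final PNG can handle; many image libraries also cap the height of a single image.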

Related

How to convert a "pdf" to "odg" file with OpenOffice cmd

I can easily convert a pdf to an odt file using:
soffice --infilter="writer_pdf_import" --convert-to odt a.pdf
But when I try to do:
soffice --infilter="writer_pdf_import" --convert-to odg a.pdf
I get an error:
no export filter
TL;DR: the answer is at the bottom, but do read the following to see why there can be issues.
ODG is a multi-part graphics file, usually a blank template and often similar to an ORA. However, there are many ways such files can be structured and converted TO a set of PDF page printouts, as they contain thumbnails plus one or more high-resolution images or scalable vector graphics. Common variants can be used with Inkscape, Krita, possibly Scribus / OODraw, and other more graphics-oriented apps.
PDF is a page-oriented document output format and thus not a suitable candidate for converting to professional images with scalable graphics. (Except see the last comment.)
An ODG or ORA may convert well to an image, but the reverse is not usually true.
An OpenOffice Graphic is like a DOCX: a zip wrapper around a core object. Here it is a JPEG, but it could be PNG, SVG, etc.
However, the contents of the zip are not simple, potentially running to thousands of lines of code. Thus you need a more appropriate method to hand-build an ODG, not a simple command-line conversion from a cruder PDF.
The real strength of an EXPORT from Draw as PDF is the hybrid use of embedded ODG content: when you open such a PDF, you can edit it in Draw.
And it will look just as good in any PDF viewer. However, it is too specialist to be simply translated without the app settings. In reality the PDF is the chimera/polyglot ODG.
But if you wish to try with simple files, the command line to convert a.pdf to a.odg is:
soffice --infilter="draw_pdf_import" --convert-to odg a.pdf
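
The difference between the two commands in this thread is only the import filter: the Writer importer is paired with odt and the Draw importer with odg, presumably because the module that opens the PDF must also have an exporter for the target format. A minimal sketch of a wrapper that picks the filter from the target format follows. The function names and the use of --headless are additions for illustration and do not come from the thread; the filter names and the other flags are the ones quoted above.

```python
import subprocess

# Import filters named in the thread: Writer's PDF importer for text
# documents, Draw's PDF importer for drawings.
IMPORT_FILTERS = {
    "odt": "writer_pdf_import",
    "odg": "draw_pdf_import",
}


def soffice_command(pdf_path, target_format):
    """Build the soffice command line for converting a PDF to odt or odg."""
    try:
        infilter = IMPORT_FILTERS[target_format]
    except KeyError:
        raise ValueError(f"no import filter known for {target_format!r}")
    return [
        "soffice",
        "--headless",  # addition: run without opening the GUI
        f"--infilter={infilter}",
        "--convert-to", target_format,
        pdf_path,
    ]


def convert(pdf_path, target_format):
    """Run the conversion; requires soffice on PATH."""
    subprocess.run(soffice_command(pdf_path, target_format), check=True)
```

Calling convert("a.pdf", "odg") would run the same command as the one given in the answer, plus --headless.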

Adjusting format of PDF to print it faster

I am using a combination of iTextSharp and PdfSharp to assemble a large PDF file for printing to a Canon Oce VarioPrint 6000 series printer. The PDF is replacing a postscript file.
Both this new file and the old are transferred to the printer via an LPR command.
The postscript file would take maybe 10 minutes to rip to the printer. My PDF version of the same file is taking over 30 minutes to process before it is ready to print.
Can anyone give me pointers into ways I could change the way this file is written / created that would decrease the processing time on the Vario?
EDIT: I took the file that was ripping so slowly and ran it through Acrobat Preflight and it found many RGB images, that it wanted to convert to CMYK. When I look at the PDF though, they are all black and white logos, so I had Preflight do a fix up to convert all images to print Black and White.
I also noticed the Preflight was consolidating backgrounds. Half of the pages have the same logo on them, so leveraging this conversion is probably also helpful.
When I LPR'd that file, it copied and ripped in less than 5 minutes! So I guess the real question is how can I do that programmatically?
I am modifying the title and tags.
Thanks!
A result equivalent to the Preflight repair process in this case can be obtained by using iText (or in my case, iTextSharp). I replaced the PdfSharp method of aggregating the PDFs with the PdfSmartCopy class. Combined with iText's reader.RemoveUnusedObjects(), this brought down the size of the output PDF significantly, and my rip time to the printer dropped to the same as or below the rip times we previously had with the PostScript file. Very pleased.
So the RGB images that were probably contributing to the long processing time were reduced by the smart copy removing duplicates.
More info on PdfSmartCopy can be found at: http://api.itextpdf.com/itext/com/itextpdf/text/pdf/PdfSmartCopy.html
and in Bruno's book, iText In Action, more specifically in Chapter 6.
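
The reason duplicate removal helps so much here is that half of the pages carry the same logo. The following is a conceptual sketch of that idea only, written in Python rather than C#; it is not iText's implementation and does not read or write PDF files. It keys each resource by a hash of its bytes so that identical resources are stored once and referenced from every page that uses them. The function name deduplicate and the sample data are hypothetical.

```python
import hashlib


def deduplicate(pages):
    """Store each distinct resource once and let pages refer to it by key.

    `pages` is a list of pages, each a list of raw resource bytes
    (for example, the encoded bytes of a logo image).
    """
    store = {}       # digest -> resource bytes, stored once
    page_refs = []   # per page: list of digests pointing into `store`
    for resources in pages:
        refs = []
        for data in resources:
            digest = hashlib.sha256(data).hexdigest()
            store.setdefault(digest, data)
            refs.append(digest)
        page_refs.append(refs)
    return store, page_refs


if __name__ == "__main__":
    logo = b"LOGO" * 1000  # stand-in for an image shared by many pages
    pages = [[logo, b"page 1"], [logo, b"page 2"], [logo]]
    store, page_refs = deduplicate(pages)
    naive = sum(len(data) for page in pages for data in page)
    shared = sum(len(data) for data in store.values())
    print(f"without sharing: {naive} bytes, with sharing: {shared} bytes")
```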

compress pdf by c# and adobe printer

A friend of mine scans a lot of document pages and saves them as a PDF.
The resulting PDF is 1GB. When I reprint this PDF using the Adobe PDF printer, the file size is reduced to 80MB.
I installed Adobe Acrobat X Pro to open PDFs, and it set up a virtual PDF printer for me.
The image quality in the second PDF is very good, and the most important thing is the difference in file size.
Now how can I do this in a C# program? I want to write a piece of C# code to do this automatically.
I have about 500 PDF files, all of them very large, and I want to reduce their size.
I need C# code that takes a file path, prints that file using the Adobe PDF printer, and gives me back a PDF file, or lets me set an export path for the output PDF. I tested some DLLs to do this,
for example iTextSharp, PDFSharp-MigraDocFoundation-1_32, and sharpPDF_2_0_Beta2_dll, among many others.
But these are not nice, and working with them is not easy for me. I just want a method, class, or fast component to do this.
Please remember that we want to do this with Adobe Acrobat X Pro.
Thanks

Embed JPG data properly in PDF files generated by Inkscape

There is a bug in Inkscape where JPEG images included in an SVG document are embedded as bitmaps rather than JPEG when exporting to PDF files.
The result is a huge increase in file size. For example, I have a simple SVG drawing which includes a 2 MB JPEG image; exporting to PDF results in a 14 MB file.
I am looking for a workaround. Is there a way to fix the resulting PDF by inserting the correctly-encoded JPG image, perhaps via some sort of pdftk trickery?
(In my case, the resulting PDF will be included as a figure in a LaTeX document rendered with pdflatex, so there may be workarounds other than directly fixing the PDF generated by Inkscape.)
One kludge is to use pdf2ps followed by ps2pdf, which will re-encode the bitmap data as JPEG:
pdf2ps made-by-inkscape.pdf foo.ps
ps2pdf foo.ps smaller-file.pdf
For my test case, the file sizes were:
original JPEG 2.1M
made-by-inkscape.pdf 15M
foo.ps 104M
smaller-file.pdf 1.5M
But of course, this involves re-encoding the JPEG data, which is best avoided.
I found that, with Inkscape 0.48.1, exporting to EPS instead and passing the resulting EPS file to the epstopdf script produces good results. PNG/JPG files stay PNG/JPG within the PDF file, fonts look alright, etc.
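
To check whether a given workaround really kept the image as JPEG, it helps to look at which stream filters the PDF declares: in the PDF format, JPEG data is stored with the DCTDecode filter and zlib-compressed bitmaps with FlateDecode. The following is a rough sketch that simply counts those filter names in the raw file bytes. It is a heuristic, not a PDF parser: it counts all streams (not only images) and could in principle match a name inside binary data. The function names are hypothetical.

```python
import re
from collections import Counter

# Stream filters defined by the PDF format that matter here:
# DCTDecode is JPEG data, FlateDecode is zlib/deflate-compressed data.
FILTER_PATTERN = re.compile(rb"/(DCTDecode|FlateDecode|LZWDecode|JPXDecode)\b")


def count_filters(pdf_bytes):
    """Count occurrences of common stream filter names in raw PDF bytes."""
    return Counter(match.group(1).decode("ascii")
                   for match in FILTER_PATTERN.finditer(pdf_bytes))


def report(path):
    """Read a PDF file from disk and count the filter names it declares."""
    with open(path, "rb") as handle:
        return count_filters(handle.read())
```

Running report("made-by-inkscape.pdf") and report("smaller-file.pdf") and comparing the DCTDecode counts would give a first indication of whether the image ended up as JPEG in each file.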

PDF compression: How does Adobe do it?

This is a bit more of a fun question than a serious one, but how does the Adobe PDF format make documents so... portable?
I just created a small Word document, 235KB in size, containing multiple color photos and a few textual phrases. A PDF created using CutePDF (which I understand isn't the most efficient method of PDF creation) is only 176KB. That's a 25% reduction in size. When those files are placed into a compressed folder, the PDF compresses by a further 3%, whereas the .docx only manages 2%. I'm sure that larger files would have even greater differences in size.
My question is, how does Adobe manage to make their files so much smaller? I understand that they are drawn from raster graphics, but my 3 bitmap files really can't be helped from raster that much, can they?
If you have Acrobat 9 there is a nice tool built-in so you can see how the PDF was put together (and compressions used). There is a blog post explaining how to use it at http://pdf.jpedal.org/java-pdf-blog/bid/10479/Viewing-PDF-objects
There are a few ways it can be compressing this:
PDF files use LZW and zip (Flate) compression.
If the image is scaled in the document, or has a higher dpi on disk than you allow for in CutePDF (for example, if CutePDF is set for 300 dpi and the image is 600 dpi), it can be scaled down in the PDF.
Microsoft stores TONS of info in the docx format, in xml. WAY more than is really needed to just export the info (for an example, try copying and pasting your text into a textbox cell, and look at the html info that comes out - I had a limit on a textbox size for a cms, and a 7 word sentence ballooned to 950 characters). This is so it can be later edited, and with a lot of esoteric info to make sure everything displays right in every possible permutation. The pdf doesn't need that info, and so it can just do the font and size, and strip out all the unnecessary info, saving a ton of space.
When you use such small files any overhead in the document format will have a disproportionate effect which is why you are seeing such large % differences.
I took a 2683KB JPEG and inserted it into a new Word 2003 document. The resulting .doc file was 2725KB (or 2697KB as docx). Turning this into a PDF gives me a 2701KB PDF. So I am seeing a difference of 25KB, but only about 1% difference because of the size of the image data. It is about half what you got, but maybe the version of Word you have is more verbose when making docx?
For the PDF, Acrobat shows space usage as 2691K image, 8.27K overhead and 1K fonts. PDF is quite a sparse format in its syntax, which limits overhead, and much of it consists of repeating strings, so it is easily compressible.
If you want to see what the PDF contains in a tree-like view you can download the demo version of CosEdit.
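
The point about repeating strings versus image data can be illustrated with Python's zlib module, which implements the same deflate algorithm that zip and PDF's Flate compression use. This is a small sketch under stated assumptions: the repeated text line is made-up PDF-like syntax, and seeded pseudo-random bytes stand in for JPEG data, which is already compressed and therefore looks random to a second compressor.

```python
import random
import zlib


def compressed_fraction(data):
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, 9)) / len(data)


# Repetitive, PDF-like syntax: the same operators and keys over and over.
syntax = b"BT /F1 12 Tf 72 720 Td (Hello) Tj ET\n" * 2000

# Stand-in for JPEG image data: already-compressed bytes look random.
rng = random.Random(0)
image_like = bytes(rng.getrandbits(8) for _ in range(len(syntax)))

if __name__ == "__main__":
    print(f"syntax:     {compressed_fraction(syntax):.3f}")
    print(f"image-like: {compressed_fraction(image_like):.3f}")
```

The repetitive syntax shrinks to a tiny fraction of its size, while the image-like bytes do not shrink at all, which is consistent with a photo-heavy PDF gaining only a few percent when placed in a compressed folder.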