There is a bug in Inkscape where JPEG images included in an SVG document are embedded as bitmaps rather than JPEG when exporting to PDF files.
The result is a huge increase in file size. For example, I have a simple SVG drawing which includes a 2 MB JPEG image; exporting to PDF results in a 14 MB file.
I am looking for a workaround. Is there a way to fix the resulting PDF by inserting the correctly-encoded JPG image, perhaps via some sort of pdftk trickery?
(In my case, the resulting PDF will be included as a figure in a LaTeX document rendered with pdflatex, so there may be workarounds other than directly fixing the PDF generated by Inkscape.)
One kludge is to use pdf2ps followed by ps2pdf, which will re-encode the bitmap data as JPEG:
pdf2ps made-by-inkscape.pdf foo.ps
ps2pdf foo.ps smaller-file.pdf
For my test case, the file sizes were:
original JPEG 2.1M
made-by-inkscape.pdf 15M
foo.ps 104M
smaller-file.pdf 1.5M
But of course, this involves re-encoding the JPEG data, which is best avoided.
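If you do resort to this kludge, you can at least limit the generation loss by forcing JPEG encoding at a higher quality. A sketch calling gs directly (AutoFilterColorImages, ColorImageFilter and ColorImageDict are standard pdfwrite distiller parameters; the filenames follow the example above, and the QFactor value of 0.15 is an assumption – lower means higher quality):

gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=smaller-file.pdf \
   -dAutoFilterColorImages=false -dColorImageFilter=/DCTEncode \
   -c '<< /ColorImageDict << /QFactor 0.15 /Blend 1 /HSamples [1 1 1 1] /VSamples [1 1 1 1] >> >> setdistillerparams' \
   -f foo.ps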
I found that with Inkscape 0.48.1, exporting to EPS instead and passing the resulting EPS file to the epstopdf script produces good results. PNG/JPG files stay PNG/JPG within the PDF file, fonts look alright, etc.
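For reference, that route is roughly (a sketch with assumed filenames; --export-eps is the Inkscape 0.48 command-line option):

inkscape drawing.svg --export-eps=drawing.eps
epstopdf drawing.eps   # writes drawing.pdf next to the input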
Related
I can use Acrobat to reduce a PDF file of 30MB to 10MB. The input PDF is just the result of combining many monochrome TIFF files like the following.
$ file x.tiff
x.tiff: TIFF image data, little-endian, direntries=14, height=2957, bps=1, compression=bi-level group 4, PhotometricIntepretation=WhiteIsZero, orientation=upper-left, width=1627
The TIFF files are converted to PDF files using the following command.
convert x.tiff x.pdf
The single-page PDF files are then merged into a single PDF file with the following command.
cpdf input1.pdf input2.pdf ... -o output.pdf
OCR (Searchable Image (Exact)) is then done on the PDF file. I am surprised that the file size can be reduced to a third of the original.
Note that I don't see any changes in image resolution. For example, when I zoom in, I still see squares for pixels. The image in the PDF still looks black-and-white; there are no gray pixels.
What can be done to reduce the PDF files by such a big amount?
You may want to run the PDF through pdfsizeopt. For such a PDF, pdfsizeopt will most probably recompress the images with JBIG2 compression, which makes them smaller without loss of quality or reducing the image resolution. However, it's unlikely that this will make the PDF much smaller than by a factor of 3.
pdfsizeopt --use-pngout=no output.pdf optimized_output.pdf
If you need an even smaller PDF, you may want to reduce the image resolution (number of image pixels) first (before running pdfsizeopt):
convert x.tiff -resize 50% x.pdf
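Note that plain resampling of a bi-level image produces gray pixels at the edges; if the pages should stay strictly black-and-white, re-binarize after resizing (a sketch; -threshold and -compress Group4 are standard ImageMagick options, and Group4 keeps bi-level compression inside the PDF):

convert x.tiff -resize 50% -threshold 50% -compress Group4 x.pdf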
If you are unsure what is taking up so much space in a PDF, run:
pdfsizeopt --stats output.pdf
The use of XFA inside PDF isn't only for creating forms
In short: I need valid test cases for a new XFA PDF reader, but I couldn't find any, nor could I work out how to use Ghostscript to create such test cases in batch.
The point is, I don't know how to build the extra information Ghostscript should handle without a hex editor.
Ghostscript doesn't handle XFA at all, neither on input nor on output, so you cannot use Ghostscript to create XFA files.
Nor does Ghostscript (currently) create PDF files which solely consist of an image. Even if it did, these wouldn't be PNG or TIFF images, as those file formats are not directly supported by PDF. The next release of Ghostscript will contain devices which produce PDF files where the content is a rendered bitmap image created from the input. But they won't be either PNG or TIFF file format.
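A sketch of what such an invocation would look like (the device name pdfimage24 is what later releases ended up shipping, so treat it as an assumption):

gs -dNOPAUSE -dBATCH -sDEVICE=pdfimage24 -r150 -sOutputFile=raster.pdf input.pdf

This renders the input at 150 dpi and wraps the resulting bitmap in a new PDF.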
Note that XFA has been removed from the PDF 2.0 specification (hardly surprising, as it is an XML format, not a PDF one).
I'm converting PDF to JPG with gs.
Does gs substitute embedded fonts? How exactly does this work? If I embed all fonts used in the PDF, does gs still look for a substitute, or can it use the embedded font data?
So does embedding fonts in a PDF mean that all glyphs used in the PDF with that font are embedded, and that I don't need to have that font in my gs font path?
Thanks!
When you output a JPEG file, you are in effect outputting an image. This means that Ghostscript renders the page as an image, then compresses that image using JPEG (which is lossy – to avoid reduced legibility of the text, use a lossless compression format such as PNG instead; JPEG is basically only good for photographs, where lossless compression would produce far larger files).
In a bitmap image, there are no fonts, only pixels – so, for text rendering (e.g. black text on a white page), Ghostscript will create a bitmap image consisting only of greyscale pixels (by means of anti-aliasing), then save that.
To be able to do that, Ghostscript must have access to the fonts at the time of PDF rendering and JPEG creation. This means that the fonts either must be installed on the system (and in your font path), or embedded in the PDF in the first place. They are not necessary to view the JPEG file.
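For reference, a minimal invocation sketch (filenames and resolution are assumptions; %03d numbers the output pages):

gs -dNOPAUSE -dBATCH -sDEVICE=jpeg -r300 -sOutputFile=page-%03d.jpg input.pdf
gs -dNOPAUSE -dBATCH -sDEVICE=png16m -r300 -sOutputFile=page-%03d.png input.pdf

The second line is the lossless PNG variant recommended above.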
When I print a PDF file with a PostScript driver and then convert the PS file to a searchable PDF with Ghostscript (the pdfwrite device), something is wrong with the final PDF file: it becomes corrupt.
In some cases the space character disappears, and in other cases the text width becomes too large, so text overlaps text.
The settings for gs are: -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dEmbedAllFonts=true -dSubsetFonts=false -sOutputFile=output.pdf input.ps
I am wondering if it is Ghostscript that just can't produce good output when the input file is a PDF.
If I print a Word document, everything works fine!
Are there any other solutions, like using an XPS driver and converting the XPS file to a searchable PDF instead? Are there any tools out there that can do this?
I use gs 9.07.
Best regards
Joe
Why are you going through the step of printing the PDF file to a PostScript file? Ghostscript is already capable of accepting a PDF file as input.
This simply adds more confusion; it certainly won't add anything useful.
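In other words, feed the original file straight to pdfwrite (a sketch reusing your settings, with assumed filenames):

gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dEmbedAllFonts=true -dSubsetFonts=false -sOutputFile=output.pdf input.pdf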
It's not possible to say what the problem 'might' be without seeing the original PDF file and the PostScript file produced by your driver. My guess would be that whatever application is processing the PDF hasn't embedded the font, or that the PostScript driver hasn't been able to convert the font into something suitable for PostScript, resulting in the font being missing in the output and the pdfwrite device having to substitute 'something else' for the missing font.
Ghostscript (more accurately, the pdfwrite device) is perfectly capable of producing a decent PDF file when the input is PDF – but your input isn't PDF, it's PostScript!
To be perfectly honest, if your original PDF file isn't 'searchable', it's very unlikely that the PDF file produced by pdfwrite will be either, no matter whether you use the original PDF or mangle it into PostScript first.
The usual reasons why a PDF file is not 'searchable' are that there is no ToUnicode information and that the font uses a custom encoding and does not use standard glyph names. If this is the case, there is nothing you can do with the PDF file except OCR it.
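One quick way to check this is pdffonts from the Poppler utilities, which lists every font in a PDF along with 'emb' (embedded) and 'uni' (ToUnicode) columns (assumed filename):

pdffonts original.pdf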
I have a large single-page PDF (about 700KB) that was automatically generated by the Trace2UML tool. This provides the means to generate a UML interaction diagram from the trace logs obtained while running an application. It has options to export the diagram to PDF and PNG file formats; however, due to the large image size, the PNG export fails, whereas the PDF export succeeds. So I have been looking for a way to convert this large file from PDF to PNG format.
I've googled this for a couple of hours this morning. There are lots of programs and online services to do it, but none of them work with my file. When I load the file into PDF-XChange Viewer, it indicates that the image size is 136 x 11186 cm. So, definitely huge! Is there any way to convert this file?
On a Mac, Preview can convert PDF to PNG (just open the PDF and save it as PNG); I'm not sure whether it works for such a large file, though.
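Failing that, Ghostscript's PNG output may cope where GUI tools give up. A sketch (assumed filename; choose -r so that the pixel dimensions of the page stay manageable):

gs -dNOPAUSE -dBATCH -sDEVICE=png16m -r72 -sOutputFile=diagram.png large.pdf
# at 72 dpi an 11186 cm page is still ~317,000 pixels tall; lower -r if that is too much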