How to create a pdf with tiff or png images with ghostscript? - pdf

The use of xfa inside pdf isn’t only for creating forms
Short : I need valid test cases for a new xfa ᴘᴅꜰ reader, but couldn’t found anyone nor I could find how to use ghostscript in order to create such test cases in batch.
The point is I don’t know how to build the extra information ghostscript should handle without an hex editor.

Ghostscript doesn't handle XFA at all, neither on input nor in output, you cannot use Ghostscript to create XFA files.
Nor does Ghostscript (currently) create PDF files which solely consist of an image. Even if it did, these wouldn't be PNG or TIFF images, as those file formats are not directly supported by PDF. The next release of Ghostscript will contain devices which produce PDF files where the content is a rendered bitmap image created from the input. But they won't be either PNG or TIFF file format.
Note that XFA has been removed from the PDF 2.0 specification (hardly surprising as its XML not PDF format).

Related

Ghostscript converted PDF not rendered corectly with pdfium

I have a problem with some pdf documents I convert with Ghostsript to pdf/a documents. If the original document contains subseted fonts, the document is not correctly displayed in Chrome (pdfium) after converting. The chars will be displayed as squares.
In Adobe PDF Reader the output will be displayed correctly. Maybe the attached files can help you.
original PDF
converted PDF/A

Relative crop with Ghostscript

I want to crop all pages by 1 inch, as with Adobe Acrobat:
but I can't find such switch in ghostscript reference.
Is there an easy way to crop pages relatively by 1 inch without knowing page dimension upfront?
Short answer; no.
Note that the dialog you have posted doesn't actually crop pages per se, it sets the CropBox of the PDF file. Ghostscript's pdfwrite device does NOT 'edit' PDF files, so you cannot simply alter the CropBox of the original PDF file using Ghostcript.
Long answer; What you can do is create a brand new PDF file which should look the same as the original, and you can create that PDF file with a different CropBox. Now the PDF interpreter must know the MediaBox (and other boxes) from the PDF file, because, obviously, it needs to know how big the original is. Which means that you can write PostScript to alter that.
But doing this isn't simple, especially not if you expect each page in the file to have a different MediaSize, and hence a different CropBox.

Print a file (pdf) to a printer with PS driver, grab PS-file and convert to searchable pdf with ghostscript

When I print a PDF-file with a PS-driver and then convert the PS-file to a searchable PDF with ghostscript (pdfwrite device) something is wrong with the final pdf file. It becomes corrupt.
In some cases, the space character disappears, and in other cases the text width becomes too large so text overlap text.
The settings for gs is -dNOPAUSE -dBatch -sDEVICE=pdfwrite -dEmbedAllFonts=true -dSubsetFonts=false -sOutputFile=output.pdf input.ps
I am wondering if it is ghostscript that just cant produce a good output when the input file is a pdf.
If I print a word-document everything works fine!
Are there any other solutions like using a xps-driver and convert the xps file to a searchable pdf instead? are there any solutions out there that can do this?
I use gs 9.07.
Best regards
Joe
Why are you going through the step of printing the PDF file to a PostScript file? Ghostscript is already capable of accepting a PDF file as input.
This simply adds more confusion, it certainly won't add anything useful.
Its not possible to say what the problem 'might' be without seeing the original PDF file and the PostScript file produced by your driver. My guess would be that whatever application is processing the PDF hasn't embedded the font, or that the PostScript driver hasn't been able to convert the font into something suitable for PostScript, resulting in the font being missing in the output, and the pdfwrite device having to substitute 'something else' for the missing font.
Ghostscript (more accurately the pdfwrite device) is perfectly capable of producing a decent PDF file when the input is PDF, but your input isn't PDF, its PostScript!
To be perfectly honest, if your original PDF file isn't 'searchable' its very unlikely that the PDF file produced by pdfwrite will be either, no matter whether you use the original PDF or mangle it into PostScript instead.
The usual reasons why a PDF file are not 'searchable' are because there is no ToUnicode information and the font is encoded with a custom encoding and deos not use standard glyph names. If this is the case there is nothing you can do with the PDF file except OCR it.

Embed JPG data properly in PDF files generated by Inkscape

There is a bug in Inkscape where JPEG images included in an SVG document are embedded as bitmaps rather than JPEG when exporting to PDF files.
The result is a huge increase in file size. For example, I have a simple SVG drawing which includes a 2 MB JPEG image; exporting to PDF results in a 14 MB file.
I am looking for a workaround. Is there a way to fix the resulting PDF by inserting the correctly-encoded JPG image, perhaps via some sort of pdftk trickery?
(In my case, the resulting PDF will be included as a figure in a LaTeX document rendered with pdflatex, so there may be workarounds other than directly fixing the PDF generated by Inkscape.)
One kludge is to use pdf2ps followed by ps2pdf, which will re-encode the bitmap data as JPEG:
pdf2ps made-by-inkscape.pdf foo.ps
ps2pdf foo.ps smaller-file.pdf
For my test case, the file sizes were:
original JPEG 2.1M
made-by-inkscape.pdf 15M
foo.ps 104M
smaller-file.pdf 1.5M
But of course, this involves re-encoding the JPEG data, which is best avoided.
I found that with Inkscape 0.48.1 exporting to EPS instead, and passing the resulting EPS file to the epstopdf script, produces good results. PNG/JPG files stay PNG/JPG within the PDF file, fonts look alright, etc.

Is there any easy way to convert images inside a pdf file?

I have a few books that I absolutely MUST be reading; they are a set of calculus textbooks as PDF files. The problem is that the graphs and images in these pdf file are all png, which is apparently not supported by my kindle. Is there anyway I can convert these images as a batch into jpeg or any other format inside the pdf file. I have tried everything from converting the pdf to other formats (equation formatting didn't let it work), to extracting the images from the pdf file and getting them converted. I just really need to know if there is any program I can use to help me or if maybe, there is a way I could 'open' the pdf container, and switch out the png images for the jpeg images and replace the png file extensions with jpg. Any help would be greatly appreciated.
The books are:
http://tutorial.math.lamar.edu/pdf/CalcI/CalcI_Complete.pdf
http://tutorial.math.lamar.edu/pdf/CalcII/CalcII_Complete.pdf