I'm looking at one page of a book. In a web preview it is displayed properly. If I download a PDF of the page, it appears to be blank, but only appears so: the text can still be selected and copied (in the Firefox preview and in Evince, but not in Atril).
When processing it with pdf2djvu I get the following error:
PDF syntax warning (97406879): Missing or invalid segmentation symbol in JPX stream
This problem makes the page unreadable. Is there any way to fix it?
Edit: I suspected a corrupt PDF, since two separate viewers failed to read the page.
Both Evince and the Firefox PDF preview seem not to handle JPEG2000 images properly. You can use Ghostscript to rewrite the problematic page (source):
gs -dNOPAUSE -dBATCH -dPDFSETTINGS=/ebook -sDEVICE=pdfwrite -sOutputFile=output.pdf input.pdf
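Before rewriting a file, it can help to confirm that it actually contains JPEG2000 data. The following is only a rough sketch: it counts the lines of the raw PDF that mention /JPXDecode, the PDF filter name for JPEG2000 streams. The function name count_jpx_lines is made up for illustration, and a plain text search like this is a heuristic, not a real PDF parse.

```shell
# Hypothetical helper: print how many lines of the raw PDF mention the
# /JPXDecode filter name. -a makes grep treat the binary file as text.
count_jpx_lines() {
    grep -a -c '/JPXDecode' "$1"
}

# Usage (input.pdf is the file name used in the command above):
#   count_jpx_lines input.pdf
```

A result of 0 suggests the problem lies elsewhere; anything larger means the file carries at least one JPEG2000 image that the viewer has to decode.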
Related
I am trying to run regular conversions of PS to PDF but having some issues with Ghostscript.
With default settings the output crops the top of the page, as you would expect, since the PS is laid out for A4 but doesn't define a page size.
However, when I use -sPAPERSIZE or change the default in gs_init as described here, it prints a blank page.
I tried on a colleague's PC, which runs Adobe Distiller, and the conversion worked perfectly. I also tried using PDF24 rather than GS directly; it cropped the page in the same way, but I couldn't find an init file to change.
Unfortunately the PS files are auto-generated, so changing them isn't an option.
Windows 10 10.0.17763 x64
GS 9.53.3
PDF24 9.2.2
Adobe Distiller: Version unknown (probably older)
Solved my issue:
C:\Program Files (x86)\gs\gs9.53.3\bin>gswin32c.exe -sOutputFile="output.pdf" -dNOPAUSE -dBATCH -sPAPERSIZE=a4 -sDEVICE=pdfwrite -dSAFER "input.PS"
I solved it using a combination of answers: from here, to get the gs options,
https://stackoverflow.com/questions/30128250/ps2pdf-preserve-page-size#:~:text=An%20A4%20page%20has%20a,it%20comes%20to%20PDF%20output
and from here, to run gs from the command line (I was unable to get it to work outside of the command line):
Keep getting error messages in ghostscript when using the documented ghostscript syntax
I am happily converting docx files to PDF via the command line (controlled via C# process calls) out of my service.
Unfortunately I could not find any internet search results on how to set the options for the output PDF that the GUI offers me. I am specifically looking for generating PDF/A and tagged PDF via the command line.
Has anyone done this and knows how?
EDIT:
Apparently a PDF/A can be produced by using unoconv instead.
On Windows, one would use the following command line in a checked-out unoconv repository:
python.exe .\unoconv -f pdf -eSelectPdfVersion=1 C:\temp\libre\renderingtest.docx
I did not find further information on how to select other things (tagged PDF etc.) and where to get a complete list of the options that are available.
EDIT: It seems one can try the different options in the GUI. The settings are saved to C:\Users\<userName>\AppData\Roaming\LibreOffice\4\user\registrymodifications.xcu. One can then look up the changed setting and pass it to unoconv like this:
python.exe .\unoconv -f pdf -eUseTaggedPDF=1 -eSelectPdfVersion=1 C:\temp\libre\renderingtest.docx
Still not sure if I am doing this correctly though.
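One way to find out which setting a GUI option changed is to compare a copy of registrymodifications.xcu taken before the change with the file after it. This is a sketch only: the helper name changed_settings is hypothetical, it assumes a POSIX shell (for example Git Bash or WSL on Windows), and it assumes the file stores one setting per line so that a line-based diff is readable.

```shell
# Hypothetical helper: print the lines that exist only in the second file.
# diff exits with status 1 when the files differ; inside this pipeline that
# status is not propagated, so it does not abort the function.
changed_settings() {
    diff "$1" "$2" | grep '^>' | sed 's/^> //'
}

# Usage, with both file names chosen for illustration:
#   cp registrymodifications.xcu before.xcu
#   (change one option in the PDF export dialog, then export once)
#   changed_settings before.xcu registrymodifications.xcu
```

The property name that shows up in the new lines is the candidate to pass to unoconv with -e, as in the command above.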
The gotenberg project shows how that can be done using unoconv.
$ curl --request POST 'http://localhost:3000/forms/libreoffice/convert' --form 'files=@"doc.docx"' --form 'nativePdfFormat="PDF/A-1a"' -o pdfA.pdf
Example PDF
I am using Spatie/pdfToImage, which builds on Ghostscript and ImageMagick, to do the following on my server:
Take a multiple page pdf from an email using mailgun routing.
Save the pdf in folder /docs_pdf like file.pdf
Use a foreach to loop through each page and save each page as a png to /docs like file_#.png
Locally, where I use Laravel with Valet, everything works fine.
On my server, hosted at DigitalOcean and managed through Laravel Forge, the Swedish text in a multi-page PDF turns from normal Swedish into a jumble of random letters and signs.
The left is correct (yes, it's true, it's Swedish) and the right is wrong:
Someone suggested to me that this is probably a matter of the font missing on the server. The fonts used in the pdf:
<</StemV 68/FontName/PSQHMO+FoundrySans-Normal/FontFile2 216 0 R/FontStretch/Normal/FontWeight 400/Flags 32/Descent -240/FontBBox[-40 -240 960 916]/Ascent 916/FontFamily(FoundrySans-Normal)/CapHeight 667/XHeight 465/Type/FontDescriptor/ItalicAngle 0>>
<</StemV 100/FontName/MLHPWU+FoundrySans-Medium/FontFile2 217 0 R/FontStretch/Normal/FontWeight 400/Flags 32/Descent -241/FontBBox[-42 -241 1008 916]/Ascent 916/FontFamily(FoundrySans-Medium)/CapHeight 667/XHeight 470/Type/FontDescriptor/ItalicAngle 0>>
<</StemV 68/FontName/SUEECI+FoundrySans-Normal/FontFile2 218 0 R/FontStretch/Normal/FontWeight 400/Flags 4/Descent -240/FontBBox[-40 -240 960 916]/Ascent 916/FontFamily(FoundrySans-Normal)/CapHeight 667/XHeight 465/Type/FontDescriptor/ItalicAngle 0>>
<</StemV 48/FontName/KIDDUY+FoundrySans-Light/FontFile2 9 0 R/FontStretch/Normal/FontWeight 400/Flags 32/Descent -248/FontBBox[-28 -248 978 924]/Ascent 924/FontFamily(FoundrySans-Light)/CapHeight 667/XHeight 458/Type/FontDescriptor/ItalicAngle 0>>
Here is configuration of fonts in imagemagick and ghostscript:
https://www.imagemagick.org/script/resources.php
How can this be solved?
Update:
I have now made a clean install on a new server.
Installed Imagick and spatie/pdfToImage
As suggested by KenS I ran
gs -sDEVICE=png16m -o out%d.png
terminal output
forge@Server:~/app/storage/app/public/files$ gs -sDEVICE=png16m -o test_out%d.png file.pdf
GPL Ghostscript 9.22 (2017-10-04)
Copyright (C) 2017 Artifex Software, Inc. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Processing pages 1 through 2.
Page 1
Page 2
The document rendered the same way, that is, wrong.
I am at a complete loss and don't know what the next step might be.
Update2:
I also ran ImageMagick's convert command, and the image rendered the same way.
So whether I use Ghostscript alone, ImageMagick, or spatie/pdfToImage, I get the same output.
Well, the current version of Ghostscript (9.25) renders this acceptably for me; that is the text appears to be correct. All the fonts are embedded, so there shouldn't be any problems.
And this means that even if you did replace the default font substitution, it wouldn't help, because Ghostscript shouldn't be using the default font, it will be using the fonts embedded in the PDF file.
Without knowing what version of Ghostscript you are using (I see from a later comment that it's 9.25), or the command line that is used to start it, I can't really do a like-for-like comparison. It's hard for me to see how you could be getting such a different result, though. That looks like Ghostscript has failed to find the embedded fonts.
It's possible that whatever package you are using has done something 'unfortunate'. The various package maintainers on Linux add their own patches, and sometimes modify the way that Ghostscript is built. Possibly that has broken something.
If you are able to build Ghostscript yourself you could try cloning our Git repository and doing that. You could also try downloading the Linux binaries off our website. They won't work with every Linux distribution (different ABI) but you can try, you might be lucky.
You could also try running Ghostscript directly on the PDF file. Something like:
gs -sDEVICE=png16m -o out%d.png file.pdf
should produce 2 PNG files, out1.png and out2.png. It will also print a bunch of output on the terminal. That back-channel output is valuable information for me, so if you can reproduce the problem, I'd like to see that too.
One last thought: it's possible to have more than one version of Ghostscript installed simultaneously, so your current setup may be using an old version of Ghostscript.
I can't help you with ImageMagick or Spatie, but if you can debug those to the point where you can reproduce the problem with a plain Ghostscript command line then I can look further at it.
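To check for several Ghostscript installations, as suggested above, one can walk the PATH and print every executable named gs together with the version it reports. This is a minimal sketch: find_all_on_path is a hypothetical helper, not a Ghostscript tool, and it only sees binaries that are on the PATH of the current user (a web server or queue worker may run with a different PATH).

```shell
# Hypothetical helper: print every executable called "$1" found on PATH,
# in lookup order. The body runs in a subshell so that the changes to IFS
# and globbing do not leak into the calling shell.
find_all_on_path() (
    name=$1
    IFS=:
    set -f                          # no glob expansion while splitting PATH
    for dir in $PATH; do
        candidate="${dir:-.}/$name" # an empty PATH entry means the current directory
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
        fi
    done
)

# Report the version of each gs binary found; the first line printed is
# the one a plain "gs" command would run.
find_all_on_path gs | while IFS= read -r gs_bin; do
    printf '%s: %s\n' "$gs_bin" "$("$gs_bin" --version)"
done
```

If more than one line appears, or the version differs from the one you expect, the rendering tools may be picking up an older build.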
Finally got it to work. First I want to give kudos to KenS, who really helped me; without him it would not have worked.
This is what I did:
1 - I removed Ghostscript:
sudo apt-get purge --auto-remove ghostscript
then
wget https://github.com/ArtifexSoftware/ghostpdl-downloads/releases/download/gs925/ghostscript-9.25.tar.gz
tar xvf ghostscript-9.25.tar.gz
Enter the unpacked folder and do
./configure
make
make install
then
sudo ln -s /usr/local/bin/gs /usr/bin/gs
On top of the above I did:
sudo add-apt-repository ppa:glasen/freetype2
and then:
sudo apt update && sudo apt install freetype2-demos
I updated Ghostscript from version 9.05 to 9.15. After the update, ps2pdf doesn't use the correct fonts in the conversion from PostScript to PDF anymore.
The fonts in question are not from the 12 default PostScript fonts. fc-list lists them properly and I even made sure that the font-files actually exist.
With gs 9.05 everything worked as expected. The command I use is simply
ps2pdf $FILE $PDF_FILE_NAME
I tried -dEmbedAllFonts=true, but it did not help.
I use the Generic Mapping Tools to generate my PostScript files. The PostScript file looks correct when opened with Mac OS X's Preview.app:
Only the gs generated PDF lacks the proper fonts:
System:
Ubuntu 12.04.5 LTS (GNU/Linux 2.6.32-042stab092.2 x86_64)
GPL Ghostscript 9.15 (2014-09-22)
GMT 5.2.0_r13493 [64-bit] [4 cores]
I ran out of ideas on what might cause my problem. Your ideas and input are highly appreciated.
Seems to have been raised as a bug at:
http://bugs.ghostscript.com/show_bug.cgi?id=695552
Still waiting to hear a response from the bug reporter there.
You could try to use ps2pdf with the additional option -sFONTPATH:
ps2pdf -sFONTPATH=/path/to/your/fonts $FILE $OUTPUT
I'm led to believe that it's possible to output from Ghostscript to SVG, as described on this blog post:
gs -dBATCH -dSAFER -dNOPAUSE -sDEVICE=svg -sOutputFile=Logo.svg Logo.pdf
However, I just get "Unknown device: svg".
I am using Ghostscript 9.06.
My question is: where do I get the svg device, and how do I install it? (Red Hat x64)
So far I have tried googling (many dead ends but no real mention of this output device) and looking on the Ghostscript website.
If you do gs -? the usage will give you a list of available devices. Presumably your build doesn't include the svg device, in which case you will need to rebuild your executable with that support included.
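The check described above can be scripted roughly as follows. This is a sketch under assumptions: device_listed is a hypothetical helper that searches whatever help text it receives on standard input for the device name as a whole word, so a match elsewhere in the help text would also count.

```shell
# Hypothetical helper: read help text on stdin and succeed if the given
# device name appears in it as a whole word.
device_listed() {
    grep -qw -- "$1"
}

# Only query Ghostscript if it is installed; gs -h prints the usage text,
# including the list of available devices.
if command -v gs >/dev/null 2>&1; then
    if gs -h | device_listed svg; then
        echo "svg device available"
    else
        echo "svg device missing from this build"
    fi
fi
```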
I do have svg as one of my available devices. It does something a little different with the output than you expect. Specifically, it sends the file to standard error instead of to the output file you specified. Set up your command line like this for the special case:
gs -dBATCH -dSAFER -dNOPAUSE -sDEVICE=svg Logo.pdf 2>Logo.svg
You may or may not want to look into the -q flag, which will suppress the usual standard output.