Use Ghostscript to convert each page of a PDF to images and the output is still PDF

I know that Adobe Acrobat Reader DC can print to the Microsoft Print to PDF printer with Print As Image checked in the Advanced Print Setup dialog, which outputs a PDF file. However, I want to do this from the command line. I tried the following command, but it did not convert each page to an image (note that the output file should still be a PDF).
gs -o 0.999.watermask.compact.screen.pdf -sDEVICE=pdfwrite -dDetectDuplicateImages=true -dPDFSETTINGS=/screen 0.999.watermask.pdf

Your switches include -dDetectDuplicateImages=true, which should be superfluous under the circumstances, and the device can be any one of the four pointed out by KenS.
gs -o 0.999.watermask.compact.screen.pdf -sDEVICE=pdfimage32 -dPDFSETTINGS=/screen 0.999.watermask.pdf
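As a rough sketch of how the image-device approach could be wrapped for repeated use, the function below calls the pdfimage32 device and adds -r to set the raster resolution. The function name rasterize_pdf and the 150 dpi default are assumptions chosen for illustration; they are not part of the original answer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rasterise every page of a PDF into an image-only PDF.
# Usage: rasterize_pdf input.pdf output.pdf [dpi]
rasterize_pdf() {
  local input=$1 output=$2 dpi=${3:-150}   # 150 dpi is an assumed default
  # pdfimage32 renders each page as an image and wraps it in a PDF page;
  # -r sets the rendering resolution in dots per inch.
  gs -q -sDEVICE=pdfimage32 -r"$dpi" -o "$output" "$input"
}

# Example (not run here):
#   rasterize_pdf 0.999.watermask.pdf 0.999.watermask.image.pdf 100
```

Lower resolutions give smaller files at the cost of sharpness, and text in the result is no longer selectable because every page is an image.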
If you want to emulate Microsoft's Print As Image to PDF on Windows, you will find the result inferior in some ways (and often many times bigger). For comparison, though, the command would be as follows.
NOTE: the "%%printer%%..." form is for a batch file; on the command line use "%printer%...".
gswin64c.exe -sDEVICE=mswinpr2 -dNoCancel -o "%%printer%%Microsoft Print to PDF" -dPDFSETTINGS=/screen -f "0.999.watermask.pdf"

Related

Print multiple PDF page ranges

I have a PDF with 200+ editable pages and need to print them to hardcoded (non-editable) PDFs as smaller files (i.e. pages 1-2, 3-8, 9, 10-11, 12-14, etc.).
Is there a way to automate this since I do this exercise each month? Right now I have to manually print each sub section one at a time.

You can use Ghostscript to copy a range of pages from a PDF file to another.
For example, to write pages 3-8 from input.pdf to output.pdf you could run the following from the command prompt, using the command line options to specify the first and last page to process.
gswin64c.exe -sDEVICE=pdfwrite -dFirstPage=3 -dLastPage=8 -o output.pdf input.pdf
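Since the same split is repeated every month, the single command above could be wrapped in a loop over the page ranges. The following is a minimal sketch, assuming a bash shell with the Ghostscript executable available as gs (on Windows this would be gswin64c.exe); the function name split_pdf_ranges and the output naming scheme are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write one PDF per page range.
# Usage: split_pdf_ranges input.pdf 1-2 3-8 9 10-11 12-14
split_pdf_ranges() {
  local input=$1; shift
  local base=${input%.pdf} range first last
  for range in "$@"; do
    first=${range%-*}   # part before the dash (whole value if there is no dash)
    last=${range#*-}    # part after the dash (whole value if there is no dash)
    # Same options as the single-range command, with an assumed output name
    # such as input_p3-8.pdf.
    gs -q -sDEVICE=pdfwrite -dFirstPage="$first" -dLastPage="$last" \
       -o "${base}_p${range}.pdf" "$input" || return 1
  done
}

# Example (not run here):
#   split_pdf_ranges input.pdf 1-2 3-8 9 10-11 12-14
```

A single page such as 9 is handled by passing the same number as both first and last page.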

GhostScript PDF 1.5 (from tiff to PDF with ImageMagick) convert to PDF/A

I need to create a PDF/A from a Folder of Tiff Files.
Creating a PDF (1.5) is working with ImageMagick.
But Converting this PDF to a PDF/A using Ghostscript is a problem.
My GhostScript cmd:
-dPDFA=2 -dNOOUTERSAVE -sProcessColorModel=DeviceRGB -sDEVICE=pdfwrite -o "C:\Temp\TestData\TIFF to PDF Imagemagick\pdfa.pdf" "C:\Temp\TestData\TIFF to PDF Imagemagick\PDFA_def.ps" -dPDFACompatibilityPolicy=1 "C:\Temp\TestData\TIFF to PDF Imagemagick\test.pdf"
Also tried:
-dPDFA=2 -dBATCH -dNOPAUSE -sColorConversionStrategy=RGB -sDEVICE=pdfwrite -sPDFACompatibilityPolicy=1 -sOutputFile="C:\Temp\TestData\TIFF to PDF Imagemagick\pdfa.pdf" "C:\Temp\TestData\TIFF to PDF Imagemagick\PDFA_def.ps" "C:\Temp\TestData\TIFF to PDF Imagemagick\test.pdf"
my PDFA_def.ps is the GS standard with:
/ICCProfile (AdobeRGB1998.icc) % Customise
The created PDF/? is not passing the "Verify compliance with PDF/A-2b" preflight in Adobe Acrobat:
Error
Metadata missing (XMP)
PDF/A entry missing
Syntax problem: Indirect object “endobj” keyword not preceded by an EOL marker
Syntax problem: Stream dictionary improperly formatted
It also fails the https://www.pdf-online.com/osa/validate.aspx validator:
File pdfa.pdf
Compliance pdf1.5
Result Document does not conform to PDF/A.
Details
Validating file "pdfa.pdf" for conformance level pdf1.5
XML line 10:212: xmlParseCharRef: invalid xmlChar value 0.
The document does not conform to the requested standard.
The document's meta data is either missing or inconsistent or corrupt.
The document does not conform to the PDF 1.5 standard.
Done.
Also tried VeraPDF ....
What kind of settings have I forgotten?

Well there's quite a few problems here.
You haven't said what version of Ghostscript you are using, nor have you supplied an example file to experiment with. You also haven't given the back channel output which might contain additional information.
You can't use the supplied model PDFA_def.ps without modification; at the very least you need to modify the /ICCProfile entry to point to a real, valid ICC profile. I suspect this has caused pdfwrite to abort PDF/A-2 production, which would normally be mentioned in the back channel output.
You haven't set -dColorConversionStrategy, just setting the ProcessColorModel is not sufficient, pdfwrite will mostly ignore that. If you don't tell pdfwrite that you want colours converted to a different space, it will preserve them unchanged, regardless of the Process color model.
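One way to catch the ICC profile problem before running the conversion is to check that the file named in PDFA_def.ps actually exists. The sketch below is only an illustration: the function name check_icc_profile is hypothetical, it assumes the entry is written on one line as /ICCProfile (name), and it assumes a relative name is looked up from the current directory.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: does the /ICCProfile entry in a PDFA_def.ps
# file point at an existing file? Prints the profile path on success.
check_icc_profile() {
  local def_file=$1 profile
  # Pull the text between the parentheses of the first "/ICCProfile (...)" line.
  profile=$(sed -n 's/.*\/ICCProfile *(\([^)]*\)).*/\1/p' "$def_file" | head -n 1)
  if [ -z "$profile" ]; then
    echo "no /ICCProfile entry found in $def_file" >&2
    return 2
  fi
  if [ ! -f "$profile" ]; then
    echo "ICC profile not found: $profile" >&2
    return 1
  fi
  echo "$profile"
}

# Example (not run here):
#   check_icc_profile PDFA_def.ps && echo "profile entry looks usable"
```

This only checks that the file exists; it does not verify that the file is a valid ICC profile or that it matches the chosen colour conversion strategy.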

With this command it's now running:
-dPDFA=2 -sColorConversionStrategy=RGB -sDEVICE=pdfwrite -dPDFACompatibilityPolicy=1 -dNOPAUSE -dBATCH -o "C:\Temp\TestData\tiff2pdfa\pdfatest.pdf" "C:\Temp\TestData\tiff2pdfa\PDFA\PDFA_def.ps" "C:\Temp\TestData\tiff2pdfa\test.pdf"
Thanks to:
Batch Convert PDF to PDF/A - MARK BERRY
But I still get some errors:
GPL Ghostscript 9.25: UTF16BE text string detected in DOCINFO cannot be represented
in XMP for PDF/A 1, discarding DOCINFO
Processing pages 1 through 56.
Page 1
GPL Ghostscript 9.25: Setting Overprint Mode to 1
not permitted in PDF/A-2, overprint mode not set
Should I be thinking about this "Overprint Mode"?

Reverse white and black colors in a PDF

Given a black and white PDF, how do I reverse the colors such that background is black and everything else is white?
Adobe Reader does it (Preferences -> Accessibility), but for viewing purposes only within the program. It does not change the document itself, so the colors are not reversed in other PDF readers.
How to reverse colors permanently?

You can run the following Ghostscript command:
gs -o inverted.pdf \
-sDEVICE=pdfwrite \
-c "{1 exch sub}{1 exch sub}{1 exch sub}{1 exch sub} setcolortransfer" \
-f input.pdf
Acrobat will show the colors inverted.
The four identical parts {1 exch sub} are meant for CMYK color spaces and are applied to C(yan), M(agenta), Y(ellow) and (blac)K color channels in the order of appearance.
You may use only three of them -- then it is meant for RGB color spaces and is applied to R(ed), G(reen) and B(lue).
Of course you can "invent" your own transfer functions too, instead of the simple 1 exch sub one: for example {0.5 mul} will just use 50% of the original color values for each color channel.
Note: Above command will show ALL colors inverted, not just black+white!
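To experiment with different transfer functions without retyping the four identical procedures, the command could be wrapped as below. This is a sketch only: the function name apply_transfer is invented here, and it simply repeats whatever PostScript procedure it is given four times before setcolortransfer, as in the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply one PostScript transfer procedure to all four
# slots of setcolortransfer.
# Usage: apply_transfer '{1 exch sub}' input.pdf output.pdf
apply_transfer() {
  local proc=$1 input=$2 output=$3
  gs -q -o "$output" -sDEVICE=pdfwrite \
     -c "$proc $proc $proc $proc setcolortransfer" \
     -f "$input"
}

# Examples (not run here):
#   apply_transfer '{1 exch sub}' input.pdf inverted.pdf   # v becomes 1 - v
#   apply_transfer '{0.5 mul}'    input.pdf halved.pdf     # v becomes 0.5 * v
```

The procedure must be quoted so that the shell passes the braces and spaces through to Ghostscript unchanged.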
Caveats:
Some PDF viewers won't display the inverted colors, notably Preview.app on Mac OS X, Evince, MuPDF and PDF.js (Firefox PDF Viewer) won't. But Chrome's native PDF viewer PDFium will do it, as well as Ghostscript and Adobe Reader.
It will not work with all PDFs (or for all pages of the PDF), because it is also dependent on how exactly the document's colors are defined.
Update
Command above updated with the added -f parameter (required) before input.pdf. Sorry for not noticing this flaw in my command line before. I became aware of it again only because some good soul gave it its first upvote today...
Additional update: The most recent versions of Ghostscript do not require the added -f parameter any more. Verified with v9.26 (may also be true even with v9.25 or earlier versions).

The best method would be to use "pdf2ps - Ghostscript PDF to PostScript translator", which converts the PDF to a PS file.
Once the PS file is created, open it with any text editor and add {1 exch sub} settransfer before the first line.
Now "re-convert" the PS file back to PDF with the same software used above.

If you have the Adobe PDF printer installed, you go to Print -> Adobe PDF -> Advanced... -> Output area and select the "Invert" checkbox. Your printed PDF file will then be inverted permanently.

None of the previously posted solutions worked for me so I wrote this simple bash script. It depends on pdftk and awk. Just copy the code into a file and make it executable. Then run it like:
$ /path/to/this_script.sh /path/to/mypdf.pdf
The script:
#!/bin/bash
pdftk "$1" output - uncompress | \
awk '
  /^1 1 1 / {
    sub(/1 1 1 /,"0 0 0 ",$0);
    print;
    next;
  }
  /^0 0 0 / {
    sub(/0 0 0 /,"1 1 1 ",$0);
    print;
    next;
  }
  { print }' | \
pdftk - output "${1/%.pdf/_inverted.pdf}" compress
This script works for me but your mileage may vary. In particular sometimes the colors are listed in the form 1.000 1.000 1.000 instead of 1 1 1. The script can easily be modified as needed. If desired, additional color conversions could be added as well.
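As one possible modification for the 1.000 1.000 1.000 case mentioned above, the awk patterns can allow an optional run of zero decimals after each number. This is a sketch rather than a tested replacement for the script: the function name swap_black_white is made up, and it still only handles lines that begin with a pure black or pure white RGB triple.

```shell
#!/usr/bin/env bash
# Hypothetical filter: swap pure white and pure black RGB operands at the
# start of a line, accepting "1 1 1" as well as "1.000 1.000 1.000".
swap_black_white() {
  awk '
    /^1(\.0+)? 1(\.0+)? 1(\.0+)? / {
      sub(/^1(\.0+)? 1(\.0+)? 1(\.0+)? /, "0 0 0 "); print; next
    }
    /^0(\.0+)? 0(\.0+)? 0(\.0+)? / {
      sub(/^0(\.0+)? 0(\.0+)? 0(\.0+)? /, "1 1 1 "); print; next
    }
    { print }
  '
}

# Example (not run here), mirroring the pipeline of the script above:
#   pdftk in.pdf output - uncompress | swap_black_white | pdftk - output out.pdf compress
```

Lines with any other colour values, such as 0.5 0.5 0.5, pass through unchanged.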
For me, the pdf2ps -> edit -> ps2pdf solution did not work. The intermediate .ps file is inverted correctly, but the final .pdf is the same as the original. The final .pdf in the suggested gs solution was also the same as the original.

For a cross-platform option, try MuPDF:
mutool draw -I -o out.pdf in.pdf [range of pages]
It should permanently change colours in many viewers
Later Edit
A sample file that did not reverse contained linework only (no images). The method needed there was to save the graphics as inverted images and then reuse those to build a replacement PDF. Beware, however, that converting whole pages to images turns any searchable text into unsearchable pixels, so the rebuild would need to be run with OCR active.
The two commands needed will be something like (%4d means numbers for images start output0001)
mutool draw -o output%4d.png -I input.pdf
For Linux users the following second pass should work easily:
mutool convert -O compress -o output.pdf output*.png
Windows users will for now (v1.19) need to combine the files by scripting, or list them in groups:
mutool convert -O compress -o output.pdf output0001.png output0002.png output0003.png
The next version may include a #filelist option; see https://bugs.ghostscript.com/show_bug.cgi?id=703163

This is probably just a frontend for the ghostscript command Kurt Pfeifle posted, but you could also use imagemagick with something like:
convert -density 300 -colorspace RGB -channel RGB -negate input.pdf output.pdf

Ghostscript: convert PDF to EPS with embedded font rather than outlined curve

I use the following command to convert a PDF to EPS:
gswin32 -dNOCACHE -dNOPAUSE -dBATCH -dSAFER -sDEVICE=epswrite -dLanguageLevel=2 -sOutputFile=test.eps -f test.pdf
I then use the following command to convert the EPS to another PDF (test2.pdf) to view the EPS figure.
gswin32 -dSAFER -dNOPLATFONTS -dNOPAUSE -dBATCH -dEPSCrop -sDEVICE=pdfwrite -dPDFSETTINGS=/printer -dCompatibilityLevel=1.4 -dMaxSubsetPct=100 -dSubsetFonts=true -dEmbedAllFonts=true -sOutputFile=test2.pdf -f test.eps
I found that the text in the generated test2.pdf has been converted to outline curves. There is no font embedded anymore either.
Is it possible to convert PDF to EPS without converting text to outlines? I mean, to EPS with embedded fonts and text.
Also, after the conversion (test.pdf -> test.eps -> test2.pdf), the height and width of the PDF figure (test2.pdf) are a little smaller than those of the original PDF (test.pdf).
Is it possible to keep the width and height of the figure after conversion?
Here is the test.pdf: https://dl.dropboxusercontent.com/u/45318932/test.pdf
I tried KenS's suggestion:
gswin32 -dNOPAUSE -dBATCH -dSAFER -sDEVICE=eps2write -dLanguageLevel=2 -sOutputFile=test.eps -f test.pdf
gswin32 -dSAFER -dNOPLATFONTS -dNOPAUSE -dBATCH -dEPSCrop -sDEVICE=pdfwrite -dPDFSETTINGS=/printer -dCompatibilityLevel=1.4 -dMaxSubsetPct=100 -dSubsetFonts=true -dEmbedAllFonts=true -sOutputFile=test2.pdf -f test.eps
The converted test2.pdf has a very weird font that is different from the original font in test.pdf.
When I copy the text from test2.pdf, I only get a couple of symbols like:
✕ ✖ ✗✘✙ ✚✛
Here is the test2.pdf: https://dl.dropboxusercontent.com/u/45318932/test2.pdf
I was using the latest Ghostscript 9.15. So what is the problem?

I just noticed you are using epswrite; you don't want to do that. That device is terrible and has been deprecated (and has now been removed). Use the eps2write device instead (you will need a relatively recent version of Ghostscript).
There's nothing you can do with epswrite except throw it away; it makes terrible EPS files. It also can't make level 2 files, no matter what you set -dLanguageLevel to.
Oh, and don't use -dNOCACHE; that prevents fonts being processed and decomposes everything to outlines or bitmaps.
UPDATE
You set subset fonts to true. By doing so the character codes which are used are more or less random. The first glyph in the document (say for example the 'H' in 'Hello World') gets the code 1, the second one (eg 'e') gets the code 2 and so on.
If you have a ToUnicode CMap, then Acrobat and other readers can convert these character codes to Unicode code points. Without that, the readers have to fall back on heuristics, the final one being 'treat it as ASCII'. Because the encoding arrangement isn't ASCII, you get gibberish. MS Windows' PostScript output can contain additional ToUnicode information, but that's not something we try to mimic in ps2write. After all, presumably you had a PDF file already....
Every time you do a conversion you run the risk of this kind of degradation, you should really try and minimise this in your workflow.
The problem is even worse in this case: the input PDF file has a TrueType CID Font. Basic language level 2 PostScript can't handle CIDFonts (IIRC this was introduced in version 2015). Since eps2write only emits basic level 2, it cannot write the font as a CIDFont. So instead it captures the glyph outlines and stores them in a type 3 font.
However, our EPS/PS output doesn't attempt to embed ToUnicode information in the PostScript (it's non-standard, very few applications can make use of it, and it therefore makes the files larger for little benefit). In addition, CIDFonts use multiple (2 or more) bytes for the character code, so there's no way to encode the type 3 fonts as ASCII.
Fundamentally you cannot use Ghostscript to go PDF->PS->PDF and still be able to copy/paste/search text, if the input contains CIDFonts.
By the way, there's no point in setting -dLanguageLevel at all. eps2write only creates level 2 output.

I used Inkscape to convert a .pdf to .eps. Just open the .pdf file in Inkscape, choose high mesh in the import options, and save it as an EPS file.

Cropped PCL after gswin PDF to PCL conversion

I have a PDF, which I want to convert to PCL
I convert PDF to PCL using the following command:
(gs 8.70)
gswin32c.exe -q -dNOPAUSE -dBATCH \
-sDEVICE=ljetplus -dDuplex=false -dTumble=false \
-sPAPERSIZE=a4 -sOutputFile="d:\doc1.pcl" \
-f"d:\doc1.pdf" -c -quit
When I view or print the output PCL, it is cropped. I would expect the output to start right at the edge of the paper (at least in the viewer).
Is there any way to get the whole output without moving the contents of the page away from the paper edge?
I tried the -dPDFFitpage option which works, but results in a scaled output.

You are using -sPAPERSIZE=a4. This causes the PCL to render for A4 sized media.
Very likely, your input PDF is made for a non-A4 size. That leaves you with 3 options:
...you either use that exact page size for the PCL too (which your printer possibly cannot handle),
...or you have to add -dPDFFitPage (as you tried, but didn't like),
...or you skip the -sPAPERSIZE=... parameter altogether (which most likely will automatically use the same size as the PDF, and which your printer possibly cannot handle...)
Update 1:
In case ljetplus is not a hard requirement for your requested PCL format variant, you could try this:
gs -sDEVICE=pxlmono -o pxlmono.pcl a4-fo.pdf
gs -sDEVICE=pxlcolor -o pxlcolor.pcl a4-fo.pdf
Update 2:
I can confirm now that even the most recent version of Ghostscript (v9.06) cannot handle non-Letter page sizes for ljetplus output.
I'd regard this as a bug... but it could well be that it won't be fixed, even if reported at the GS bug tracker. However, the least that can be expected is that it will get documented as a known limitation for ljetplus output...
