Change PDF document properties from the command line

I want to change PDF document properties like the following from the command line instead of in Acrobat.
Navigation tab: Bookmarks Panel and Page
Page layout: Single Page Continuous
Magnification: Fit Width
It seems that cpdf and pdftk cannot do this (correct me if I am wrong). Is there a command-line tool that can?

Untested:
cpdf in.pdf -set-page-layout OneColumn AND -set-page-mode UseOutlines AND -open-at-page 1 -o out.pdf
Details in section 11.4 of the cpdf manual. See also the Nota Bene at the end of 11.4.1 in case of trouble with -set-page-layout.

Related

Use Ghostscript to convert each page of a PDF to images and the output is still PDF

I know that Adobe Acrobat Reader DC can print to the Microsoft Print to PDF printer with Print As Image checked in the Advanced Print Setup dialog, which outputs a PDF file. However, I want to do this with a command. I tried the following command, but it did not convert each page to an image (note that the output file should still be a PDF).
gs -o 0.999.watermask.compact.screen.pdf -sDEVICE=pdfwrite -dDetectDuplicateImages=true -dPDFSETTINGS=/screen 0.999.watermask.pdf
Your switches include -dDetectDuplicateImages=true, which should be superfluous under the circumstances. The device can be any one of four, as pointed out by KenS.
gs -o 0.999.watermask.compact.screen.pdf -sDEVICE=pdfimage32 -dPDFSETTINGS=/screen 0.999.watermask.pdf
If you want to emulate MS Print As Image PDF on Windows, you would find the result in some ways inferior (and often many times bigger), but for comparison the command would be as follows.
Note: "%%printer%%..." is the form for a batch file; at an interactive command line use "%printer%..." instead.
gswin64c.exe -sDEVICE=mswinpr2 -dNoCancel -o "%%printer%%Microsoft Print to PDF" -dPDFSETTINGS=/screen -f "0.999.watermask.pdf"

Reverse white and black colors in a PDF

Given a black and white PDF, how do I reverse the colors such that background is black and everything else is white?
Adobe Reader does it (Preferences -> Accessibility), but only for viewing within the program. It does not change the document itself, so the colors are not reversed in other PDF readers.
How can I reverse the colors permanently?
You can run the following Ghostscript command:
gs -o inverted.pdf \
-sDEVICE=pdfwrite \
-c "{1 exch sub}{1 exch sub}{1 exch sub}{1 exch sub} setcolortransfer" \
-f input.pdf
Acrobat will show the colors inverted.
The four identical parts {1 exch sub} are meant for CMYK color spaces and are applied to C(yan), M(agenta), Y(ellow) and (blac)K color channels in the order of appearance.
You may use only three of them -- then it is meant for RGB color spaces and is applied to R(ed), G(reen) and B(lue).
Of course you can "invent" your own transfer functions too, instead of the simple 1 exch sub one: for example {0.5 mul} will just use 50% of the original color values for each color channel.
Note: Above command will show ALL colors inverted, not just black+white!
Caveats:
Some PDF viewers won't display the inverted colors, notably Preview.app on Mac OS X, Evince, MuPDF and PDF.js (Firefox PDF Viewer) won't. But Chrome's native PDF viewer PDFium will do it, as well as Ghostscript and Adobe Reader.
It will not work with all PDFs (or for all pages of the PDF), because it is also dependent on how exactly the document's colors are defined.
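To make the arithmetic of the two transfer functions mentioned above concrete, here is a small sketch that applies the same mappings to plain numbers (one channel value per line, in the range 0 to 1). It does not touch any PDF; the function names transfer_invert and transfer_half are made up for this illustration.

```shell
#!/bin/sh
# Illustration only: mimic the PostScript transfer functions on plain numbers.
# Input: one channel value (0..1) per line. The function names are hypothetical.

# {1 exch sub}: pushes 1, swaps it below the value, subtracts -> 1 - v
transfer_invert() {
  awk '{ print 1 - $1 }'
}

# {0.5 mul}: multiplies the value by 0.5 -> 0.5 * v
transfer_half() {
  awk '{ print 0.5 * $1 }'
}
```

For example, printf '0\n0.25\n1\n' | transfer_invert maps white-ish values to dark ones and vice versa, which is why every color (not just black and white) ends up inverted.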
Update
The command above was updated with the added -f parameter (required) before input.pdf. Sorry for not noticing this flaw in my command line before. I became aware of it again only because some good soul gave it its first upvote today...
Additional update: The most recent versions of Ghostscript do not require the added -f parameter any more. Verified with v9.26 (may also be true even with v9.25 or earlier versions).
The best method would be to use pdf2ps (the Ghostscript PDF to PostScript translator), which converts the PDF to a PS file.
Once the PS file is created, open it in any text editor and add {1 exch sub} settransfer before the first line.
Now convert the PS file back to PDF with the same software used above.
If you have the Adobe PDF printer installed, you go to Print -> Adobe PDF -> Advanced... -> Output area and select the "Invert" checkbox. Your printed PDF file will then be inverted permanently.
None of the previously posted solutions worked for me so I wrote this simple bash script. It depends on pdftk and awk. Just copy the code into a file and make it executable. Then run it like:
$ /path/to/this_script.sh /path/to/mypdf.pdf
The script:
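#!/bin/bash
# Usage: this_script.sh file.pdf (writes file_inverted.pdf next to the input)
# Uncompress the PDF so the color operators are visible as plain text.
pdftk "$1" output - uncompress | \
awk '
# Lines that start with pure white (1 1 1) become pure black.
/^1 1 1 / {
sub(/1 1 1 /,"0 0 0 ",$0);
print;
next;
}
# Lines that start with pure black (0 0 0) become pure white.
/^0 0 0 / {
sub(/0 0 0 /,"1 1 1 ",$0);
print;
next;
}
# Everything else passes through unchanged.
{ print }' | \
pdftk - output "${1/%.pdf/_inverted.pdf}" compress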
This script works for me but your mileage may vary. In particular sometimes the colors are listed in the form 1.000 1.000 1.000 instead of 1 1 1. The script can easily be modified as needed. If desired, additional color conversions could be added as well.
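As a sketch of the modification mentioned above, the filter below also accepts the decimal spellings (1.0, 1.000, 0.000 and so on). It is only the awk stage, wrapped in a function with the made-up name invert_bw, so it can be tried on plain text without pdftk; like the original, it assumes the three color operands sit at the start of a line in the uncompressed PDF.

```shell
#!/bin/sh
# Sketch: swap pure white and pure black at the start of a line, accepting
# "1", "1.0", "1.000" (and likewise for 0). invert_bw is a hypothetical name.
invert_bw() {
  awk '
    # white in any of its spellings -> black
    /^1(\.0+)? 1(\.0+)? 1(\.0+)? / {
      sub(/^1(\.0+)? 1(\.0+)? 1(\.0+)? /, "0 0 0 ")
      print
      next
    }
    # black in any of its spellings -> white
    /^0(\.0+)? 0(\.0+)? 0(\.0+)? / {
      sub(/^0(\.0+)? 0(\.0+)? 0(\.0+)? /, "1 1 1 ")
      print
      next
    }
    # everything else (for example 0.5 0.5 0.5) passes through unchanged
    { print }'
}

# Possible use in place of the awk stage of the script (requires pdftk):
#   pdftk in.pdf output - uncompress | invert_bw | pdftk - output out.pdf compress
```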
For me, the pdf2ps -> edit -> ps2pdf solution did not work. The intermediate .ps file is inverted correctly, but the final .pdf is the same as the original. The final .pdf in the suggested gs solution was also the same as the original.
For a cross-platform option, try MuPDF:
mutool draw -I -o out.pdf in.pdf [range of pages]
It should permanently change the colours in many viewers.
Later Edit
One sample file that did not reverse contained linework only (no images). The method needed there was to save the graphics as inverted images and then reuse those to build a replacement PDF. Beware, however, that converting whole pages to images turns any searchable text into unsearchable pixels, so the rebuild would need to be run with OCR active.
The two commands needed will be something like the following (%4d means the image numbers start at output0001):
mutool draw -o output%4d.png -I input.pdf
For Linux users the following second pass should work easily:
mutool convert -O compress -o output.pdf output*.png
Windows users will for now (v1.19) need to combine the files by scripting or name them in groups:
mutool convert -O compress -o output.pdf output0001.png output0002.png output0003.png
The next version may include a #filelist option; see https://bugs.ghostscript.com/show_bug.cgi?id=703163
This is probably just a frontend for the Ghostscript command Kurt Pfeifle posted, but you could also use ImageMagick with something like:
convert -density 300 -colorspace RGB -channel RGB -negate input.pdf output.pdf

Can I create a PDF from ImageMagick which will always open at zoom 100%?

I am running into an issue with PNG to PDF conversion.
My PNG files are big, not in file size but in content.
Converting them creates big PDF files. I don't have any issue with the quality, but whenever I open such a PDF in a PDF viewer, it opens in "Fit to Page" mode.
So I cannot read the PDF in its initial view and have to zoom in to 100% myself.
My question is: can I create a PDF which will always open at zoom 100%?
You can possibly achieve what you want with the help of Ghostscript.
Ghostscript supports inserting PostScript snippets into its command line parameters via -c "...[PostScript code here]...".
PostScript has a special operator called pdfmark. This operator is not understood by most PostScript interpreters, but is understood by Acrobat Distiller and (for most of its parameters) also by Ghostscript when generating PDFs.
So you could try to insert
-c "[ /PageMode /UseNone /Page 1 /View [/XYZ null null 1] \
/PageLayout /SinglePage /DOCVIEW pdfmark"
into a PDF->PDF conversion Ghostscript command line.
Please take note about various basic things concerning this snippet:
The contents of the command line snippet appear to be 'unbalanced' regarding the [ and ] operators/keywords. But they are not! The initial [ is balanced by the final pdfmark keyword. (Don't ask -- I did not define this syntax...)
The 'inner' [ ... ] brackets delimit an array representing the page /View settings you desire.
Not all PDF viewers do respect the view settings embedded in the PDF file (Acrobat software does!).
Most PDF viewers allow users to override the view settings embedded in PDF files (Acrobat software also does this). That is, you can tell your viewer never to respect any settings from the PDF files it opens, but, for example, always to open them with "fit to width".
Some specific things about this snippet:
The page mode /UseNone means: the document displays without bookmarks or thumbnails. It could be replaced by
/UseOutlines (to display bookmarks also, not just the pages)
/UseThumbs (to display thumbnail images of the pages, not just the pages)
/FullScreen (to open document in full screen mode)
The array for the view mode constructed as [/XYZ <left> <top> <zoom>] means: The zoom factor is 1 (=100%), the left distance from the page origin is the special 'null' value, which means to keep the previously user-set value; the top distance from the page origin is also 'null'. This array could be replaced by
/Fit (to adapt the page to the current window size)
/FitB (to adapt the visible page content to the current window size)
/FitH <top> (to adapt the page width to the current window width); <top> indicates the required distance from page origin to upper edge of window.
...plus several others I cannot remember right now.
So to change the settings of an existing PDF file, you could do the following:
gs \
-o out.pdf \
-sDEVICE=pdfwrite \
-c "[ /PageMode /UseNone /Page 1 /View [ /XYZ null null 1 ] " \
-c " /PageLayout /SinglePage /DOCVIEW pdfmark" \
-f in.pdf
To check if the Ghostscript command worked, open the PDF in a text editor which is capable of handling binary files. Search for the /View or the /PageMode keywords and check if they are there, inserted as values into the PDF root object.
If it worked, check if your PDF viewer honors the settings. If it doesn't honor them, see if there is an overriding setting within the viewer's preference settings.
I did a quick test run on a sample PDF of mine. Here is how the PDF root object's dictionary looks now, checked with the help of pdf-parser.py:
pdf-parser-beta.py -s Catalog a.pdf
obj 1 0
Type: /Catalog
Referencing: 3 0 R, 9 0 R
<<
/Type /Catalog
/Pages 3 0 R
/PageMode /UseNone
/Page 1
/View [/XYZ null null 1]
/PageLayout /SinglePage
/Metadata 9 0 R
>>
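The page mode, layout and zoom values discussed above can be swapped without retyping the whole snippet. The following is a minimal sketch of a helper that only builds the pdfmark string; the function name docview_pdfmark and its argument order are made up for this example, and /OneColumn is the PDF page layout name for continuous single-page scrolling.

```shell
#!/bin/sh
# Sketch: print a DOCVIEW pdfmark string for use with gs -c "...".
# docview_pdfmark is a hypothetical helper, not part of Ghostscript.
#   $1 = page mode   (default UseNone; e.g. UseOutlines, UseThumbs, FullScreen)
#   $2 = page layout (default SinglePage; e.g. OneColumn)
#   $3 = zoom factor (default 1, meaning 100%)
docview_pdfmark() {
  printf '[ /PageMode /%s /Page 1 /View [/XYZ null null %s] /PageLayout /%s /DOCVIEW pdfmark' \
    "${1:-UseNone}" "${3:-1}" "${2:-SinglePage}"
}

# Possible use (requires Ghostscript; not run here):
#   gs -o out.pdf -sDEVICE=pdfwrite \
#      -c "$(docview_pdfmark UseOutlines OneColumn 1)" \
#      -f in.pdf
```

Called without arguments it reproduces the snippet shown earlier; whether a viewer honors the result is subject to the same caveats as above.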
To learn more about the pdfmark operator, google for 'pdfmark reference filetype:pdf'. You should be able to find it on the Adobe website and elsewhere:
https://www.google.de/search?q=pdfmark%20reference%20filetype%3Apdf&oq=pdfmark%20reference%20filetype%3Apdf
In order to let ImageMagick create a PDF as you want it, you may be able to hack the file defining your delegate settings. For more help about this topic see for example here:
http://www.imagemagick.org/Usage/files/#delegates
The PDF specification supports this as follows: create a GoTo action that goes to the first page and sets the zoom level to 100%, then set that action as the document's open action.
How exactly you implement it in real life depends very much on the tool you use to create the PDF file. I do not know if ImageMagick can create such actions.

When converting first page of a PDF into an image using Ghostscript, sometimes I get "extra" space. Why?

I am building a simple script which converts the first page of a PDF into an image using Ghostscript. Here is the command I use:
gs -q -o output.png -sDEVICE=pngalpha -dLastPage=1 input.pdf
This works beautifully with some PDFs, e.g. if I convert the first page of a PDF that looks like this:
I actually get this first page as an image and there aren't any problems.
But I have noticed that with some first pages of other PDFs, like the following:
With the same gs command, after the conversion, the .png image looks like this:
The problem is that I get this extra white space on the left inside the image when I convert that page. Why does Ghostscript do this? Where does that extra blank white space come from?
Most likely, your PDFs do not use identical values for /MediaBox and for /CropBox. For details about these technical terms related to a page, see this illustration from the German Wikipedia:
In other words: the /CropBox values (if given) for a PDF page determine which (smaller) part of the overall page information (which is inside the /MediaBox) the PDF viewer should make visible to the user (or to the printer).
Solution
To see the box values for all the pages of your book(s), run this command:
pdfinfo -f 1 -l 1000 -box my.pdf
To see these values just for the first page, run
pdfinfo -l 1 -box my.pdf
For Ghostscript to give the results you want, add -dUseCropBox to your command line:
gs -q -o output.png -sDEVICE=pngalpha -dLastPage=1 -dUseCropBox input.pdf
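To decide automatically whether -dUseCropBox matters for a given file, the pdfinfo output can be compared line by line. This is a rough sketch under two assumptions: that pdfinfo prints lines containing "MediaBox:" and "CropBox:" followed by four numbers, and that each page's MediaBox line comes before its CropBox line. The function name boxes_differ is made up for this example.

```shell
#!/bin/sh
# Sketch: read `pdfinfo -box` output on stdin and exit 0 if any page's
# CropBox differs from its MediaBox, otherwise exit 1.
# boxes_differ is a hypothetical name; the line format is an assumption.
boxes_differ() {
  awk '
    # remember the four coordinates of the most recent MediaBox line
    /MediaBox:/ { media = $(NF-3) " " $(NF-2) " " $(NF-1) " " $NF }
    # compare the CropBox coordinates against them
    /CropBox:/  { crop = $(NF-3) " " $(NF-2) " " $(NF-1) " " $NF
                  if (crop != media) differ = 1 }
    END { exit (differ ? 0 : 1) }'
}

# Possible use (requires pdfinfo and Ghostscript; not run here):
#   if pdfinfo -l 1 -box my.pdf | boxes_differ; then
#     gs -q -o output.png -sDEVICE=pngalpha -dLastPage=1 -dUseCropBox my.pdf
#   fi
```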

How to change header ("Contents") of automatic TOC when using Pandoc?

When converting markdown to PDF with pandoc (version 1.12.1), the ToC option adds an English header: "Contents".
Since my document is in Dutch, I would like to put the Dutch equivalent of "Contents" there. Unfortunately I couldn't find any configuration option for this, nor did I find any clues in the default.latex file.
My query:
pandoc -S --toc essay.md --biblio "MCM Essay.bib" --csl apa.csl -o mcm.pdf
I'm using Windows.
I use MiKTeX, as in the pandoc instructions.
The string "Contents" is not supplied by pandoc, but by latex (which pandoc calls to create the PDF).
Try adding
-Vlang=dutch
to your command line. This will be passed to latex in the documentclass options, and LaTeX will provide the right string.
Adding
-V toc-title="My Custom TOC Header"
to the pandoc command line will also work. See https://pandoc.org/MANUAL.html#variables-set-automatically.