Is it possible to programmatically "chain" several PDF files, preferably from the command line?

Is there a way, in Linux, Windows, or preferably Mac OS X, to take a bunch of PDF files and "chain them" into one "booklet" without owning Acrobat and preferably without doing this manually?
I have TeXShop, MiKTeX and the like installed, if any of their utilities help.

Ghostscript method:
gs -q -sPAPERSIZE=letter -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=out.pdf in1.pdf in2.pdf in3.pdf ...
from: How to concatenate PDFs without pain
ImageMagick method:
convert file1.pdf file2.pdf file3.pdf out.pdf
pdftk method:
pdftk file1.pdf file2.pdf file3.pdf cat output out.pdf
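For a larger set of inputs, the Ghostscript command above can be wrapped in a small shell function. This is only a sketch under a few assumptions: a POSIX shell, gs on the PATH, and illustrative names (merge_pdfs and the GS variable are made up here, not part of Ghostscript).

```shell
# Hypothetical helper: merge_pdfs OUT.pdf IN1.pdf IN2.pdf ...
# GS may be set to point at a different Ghostscript binary (assumption: defaults to "gs").
merge_pdfs() {
    if [ "$#" -lt 2 ]; then
        echo "usage: merge_pdfs OUT.pdf IN.pdf [IN.pdf ...]" >&2
        return 2
    fi
    out=$1
    shift
    # Same options as the Ghostscript method above; inputs are merged in argument order.
    "${GS:-gs}" -q -sPAPERSIZE=letter -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
        -sOutputFile="$out" "$@"
}

# Example (the shell expands the glob in sorted order):
#   merge_pdfs booklet.pdf chapter-*.pdf
```

Passing the inputs as "$@" keeps file names with spaces intact, which a plain copy-and-paste command line does not.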

I have tried several different tools and have gotten the most reliable results with the PDF toolkit, pdftk. It seems to work more consistently than trying to use gs or messing around with conversion to PostScript and back. And it avoids dealing with one image per page, which is a nuisance.
pdftk is included in Debian-based Linux distributions and perhaps others as well.

I recently had to research this and came up with the following. In the end I went with ImageMagick.
Merging is hard! http://ansuz.sooke.bc.ca/software/pdf-append.php
pdfjoin from the pdfjam package seems to be the standard on unix-like systems but not available on Windows
Coherentpdf is multi-platform. However licences cost up to €700
pdftk is multi-platform and open source. However it does appear to be 3 years old.
ImageMagick will merge PDFs and also generate PDFs from JPGs. I know it works on Linux and Windows.
PDFsam works using iText and Java

You can chain the "Get selected Items", "Combine PDF Pages", "Rename PDF Document" and "Move Finder Items" actions in Automator to create the desired workflow.

Have a look at Multivalent Document Tools
Failing that you can search out other tools via Freshmeat.net

I've mentioned it in the other topics and I'll mention it again: you can use the Ghostscript utilities pdf2ps and ps2pdf to do it like so:
pdf2ps file1.pdf file1.ps # Convert file1 to PostScript
pdf2ps file2.pdf file2.ps # Convert file2 to PostScript
cat file2.ps >> file1.ps # Concatenate files
ps2pdf file1.ps output.pdf # Convert back to PDF

I have also used the Java-based Multivalent tools. You simply invoke the Multivalent Java main program and pass each PDF file you want to append as an argument.

Related

Remove flickering during pdf refresh when compiling latex with pandoc

I am creating latex documents on Linux (Debian 9) writing them in markdown and converting them with pandoc.
I use the following pandoc terminal command:
watch -n 5 pandoc -N --toc --toc-depth=3 -f markdown checklist.md -t latex -o checklist.pdf
which compiles the markdown code into a pdf (latex) every 5 seconds, which enables live preview.
The issue is that the PDF in the PDF reader (I have tried several, e.g. evince, atril, okular, mupdf) flashes every time the PDF is recompiled (in this case every 5 seconds), which is somewhat annoying.
I had a similar setup for LaTeX live preview in vim, and it was not flashing like this.
(One possible difference between the vim setup and my current workflow is that in vim the LaTeX code would auto-save after any change, which I am not doing now and would rather not do.)
PS: I have found this script, which seems interesting, but I am not sure how to get it to work:
https://gist.github.com/mmcclimon/7311538
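One possible cause of the flashing, offered as an assumption rather than a diagnosis, is that the viewer reloads every 5 seconds even when nothing changed, or reloads while pandoc is still writing checklist.pdf. A minimal sketch that skips the rebuild when the Markdown file is not newer than the PDF, builds into a temporary file, and then swaps it in with mv (a rename on the same filesystem replaces the file in one step). The names rebuild_pdf and PANDOC are made up for illustration, and the -nt test is a common shell extension rather than strict POSIX.

```shell
# Hypothetical helper: rebuild_pdf SRC.md OUT.pdf
# PANDOC may be set to another pandoc binary (assumption: defaults to "pandoc").
rebuild_pdf() {
    src=$1
    pdf=$2
    # Skip the rebuild when the PDF exists and the source is not newer than it.
    if [ -e "$pdf" ] && [ ! "$src" -nt "$pdf" ]; then
        return 0
    fi
    # Temporary name keeps the .pdf extension so pandoc still produces a PDF.
    tmp="$pdf.tmp.pdf"
    if "${PANDOC:-pandoc}" -N --toc --toc-depth=3 -f markdown "$src" -t latex -o "$tmp"; then
        # Replace the target only after a complete, successful build.
        mv -f "$tmp" "$pdf"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Example loop, in place of the `watch` invocation:
#   while :; do rebuild_pdf checklist.md checklist.pdf; sleep 5; done
```

With this loop the PDF only changes on disk after the Markdown file has been saved, so a viewer that reloads on file change has nothing to redraw in between.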

Convert all files in a folder from PDF to PCL with Ghostscript

I'm trying to use Ghostscript to convert my files in PDF to PCL. I'm able to convert one file with this command:
gswin64c -dBATCH -dNOPAUSE -dSAFER -sDEVICE=pxlcolor -sOutputFile=[PCLPath].pcl [PDFPath].pdf
It works fine, I think; if you see anything wrong or unnecessary, please tell me.
The question is how to convert all the files in a folder. I don't know how to change the command line to do that, or whether I need something else, maybe a script file.
My other question is whether there is some way to speed up the process, with command-line options, by using Linux instead of Windows, or anything else.
Thanks in advance!
Greetings.
You can use Ghostscript Studio to do that.
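Alternatively, the single-file command from the question can be run in a loop over the folder. A minimal sketch for a POSIX shell (Linux, or something like Git Bash or WSL on Windows); the function name pdfs_to_pcl and the GS variable are illustrative, and on Windows GS would be set to gswin64c.

```shell
# Hypothetical helper: pdfs_to_pcl DIR
# Writes NAME.pcl next to each NAME.pdf found directly in DIR (default: current directory).
pdfs_to_pcl() {
    dir=${1:-.}
    for pdf in "$dir"/*.pdf; do
        # If nothing matches, the pattern stays literal; skip it.
        [ -e "$pdf" ] || continue
        # Same options as the single-file command; stop at the first failure.
        "${GS:-gs}" -dBATCH -dNOPAUSE -dSAFER -sDEVICE=pxlcolor \
            -sOutputFile="${pdf%.pdf}.pcl" "$pdf" || return 1
    done
}

# Example:
#   GS=gswin64c pdfs_to_pcl /path/to/folder
```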

Is it possible to uncompress PDF by using Adobe Acrobat or Acrobat Distiller?

Most PDF files found on the Web have compressed and unreadable data streams. Is it possible to uncompress the internal content of a PDF file using Acrobat or Acrobat Distiller, allowing us to read the source code by a text editor?
P.S. This question is inspired by this answer which explains how it can be done with GhostScript.
qpdf and pdftk have already been mentioned. To show the commands:
$ qpdf --qdf --object-streams=disable orig.pdf uncompressed-orig.pdf
$ pdftk orig.pdf output uncompressed-orig.pdf uncompress
mutool however hasn't been mentioned yet:
$ mutool clean -d -a orig.pdf uncompressed-orig.pdf
mutool is a command line tool which ships alongside the lightweight MuPDF PDF + document viewer.
I do not think you can achieve the uncompressing of PDF objects' streams with Acrobat or Distiller (unless you have additional payware plugins available).
Use cpdf:
cpdf -decompress in.pdf -o out.pdf
and then the graphic operators for each page can be read in a text editor. You'll need a copy of the standard as a reference, though.
Disclosure: I am the author of cpdf.
This is easy with qpdf and pdftk.
With Adobe Acrobat you can get at the internal structure after profiling a PDF (preflight with some profile (e.g. detect PDF syntax errors), then Options->Internal PDF structure) - but there's no way to get something editable with a text editor.
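The command-line options quoted in the answers above can be combined into one helper that uses whichever tool happens to be installed. This is a sketch only: the function name uncompress_pdf is made up, and the order in which the tools are tried is an arbitrary choice, not a recommendation from the answers.

```shell
# Hypothetical helper: uncompress_pdf IN.pdf OUT.pdf
# Tries qpdf, mutool, pdftk and cpdf in turn and runs the first one found.
uncompress_pdf() {
    src=$1
    dst=$2
    if command -v qpdf >/dev/null 2>&1; then
        qpdf --qdf --object-streams=disable "$src" "$dst"
    elif command -v mutool >/dev/null 2>&1; then
        mutool clean -d -a "$src" "$dst"
    elif command -v pdftk >/dev/null 2>&1; then
        pdftk "$src" output "$dst" uncompress
    elif command -v cpdf >/dev/null 2>&1; then
        cpdf -decompress "$src" -o "$dst"
    else
        echo "uncompress_pdf: none of qpdf, mutool, pdftk, cpdf found" >&2
        return 127
    fi
}

# Example:
#   uncompress_pdf orig.pdf uncompressed-orig.pdf
```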

Ghostscript not extracting all the text from PDF file

I am using ghostscript 8.71 to extract text from the PDF pages.
The command I am using is:
gswin32c -q -sFONTPATH=c:\\fonts -dNODISPLAY -dSAFER -dDELAYBIND \
-dWRITESYSTEMDICT -dSIMPLE -fps2ascii.ps -dFirstPage=1 \
-dLastPage=1 input.pdf -dQUIET
I redirect stdout to write the text to another file.
But the problem is some searchable text items are not extracted by Ghostscript.
Some font text is not extracted, for example: Verdana in bold characters. But Ghostscript is opening the font files.
I can upload the PDF file but here I didn't find any upload option. If any option is available let me know.
Did you also try alternative commandline tools to extract the text, such as pdftotext from the XPDF package? How do these compare?
Can you give more details about what exactly is missing in your output? Just certain types of characters, just certain fonts, just certain pages?
Also, you are mixing Linux/Unix syntax ("gs") with Windows syntax ("c:\fonts"). On Windows systems, the default location where fonts are hosted usually is c:\Windows\fonts ...
Oh, and yes: having your problematic PDF file to look at would definitely help.
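To make the suggested comparison concrete, here is a sketch that extracts the same page with pdftotext so the result can be diffed against the Ghostscript output. It assumes pdftotext is installed; the function name first_page_text, the PDFTOTEXT variable and the file names in the example are made up for illustration.

```shell
# Hypothetical helper: first_page_text INPUT.pdf
# Prints the text of page 1 to standard output (the trailing "-" means stdout).
first_page_text() {
    "${PDFTOTEXT:-pdftotext}" -f 1 -l 1 -layout "$1" -
}

# Example: compare with the text previously saved from Ghostscript
# (gs-page1.txt is a made-up file name):
#   first_page_text input.pdf > pdftotext-page1.txt
#   diff gs-page1.txt pdftotext-page1.txt
```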

How to use ghostscript to convert PDF to PDF/A or PDF/X?

Is there a way to use Ghostscript to convert PDF to PDF/A or PDF/X? I know it can be used to convert PDF to images, but I don't know if it can be used to convert to PDF/A. What parameters should I use?
This is to convert a pdf document (not pdf/a) into pdf/a:
gs -dPDFA -dBATCH -dNOPAUSE -dUseCIEColor -sProcessColorModel=DeviceCMYK -sDEVICE=pdfwrite -sPDFACompatibilityPolicy=1 -sOutputFile=output_filename.pdf input_filename.pdf
Hope this will help someone!
Hope this answer helps others coming from Google with the same problem:
To convert from PDF to PDFA-1b or PDFA-2b, you can use Ghostscript. I suggest you use the latest version (9.19 today).
Install it
In Mac OS, you may prefer to use Homebrew:
brew install ghostscript
(UPDATE: 2023-01-23. This no longer works in mac with homebrew, as versions newer than 9.19 will adamantly refuse to do the conversion, no matter what I've tried)
In Linux, some distros bring a much older version (rhel7 sports 9.07). To download a fully independent modern one-file-only ghostscript, download it directly from the site:
wget https://github.com/ArtifexSoftware/ghostpdl-downloads/releases/download/gs919/ghostscript-9.19-linux-x86_64.tgz
(UPDATE: 2023-01-23: stick to that version, newer versions won't work with the method presented below.)
If the link above is broken when you try it 20 years from now, please refer to ghostscript.com and search for download section. Download the binary version, don't go for the source, unless you know what you are doing.
In Windows, I cannot help you, but if you manage to install it, the following commands will also work, if you substitute the location of files and gs executable.
Command line
(note to future editors, please don't remove formatting, as this is more readable, yet working command line)
gs-919-linux_x86_64 \
-dPDFA=1 \
-dNOOUTERSAVE \
-sProcessColorModel=DeviceRGB \
-sDEVICE=pdfwrite \
-dPDFACompatibilityPolicy=1 \
-o output_file.pdf \
/path/to/PDFA_def.ps \
input_file.pdf
On a Mac, gs-919-linux_x86_64 will simply be gs.
Please note that output_file.pdf and input_file.pdf must be changed to the names of the output file (the converted file) and the input file (the file to be converted). /path/to/PDFA_def.ps is your copy of the file PDFA_def.ps.
-dPDFA=1 is for PDFA-1b.
-dPDFA=2 if you want PDFA-2b.
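The command line above, with the PDF/A level as a parameter, could be wrapped like this. It is a sketch only: to_pdfa and the GS variable are illustrative names, and the options are exactly the ones listed above, so the same version caveats apply.

```shell
# Hypothetical helper: to_pdfa LEVEL /path/to/PDFA_def.ps INPUT.pdf OUTPUT.pdf
# LEVEL is 1 (PDFA-1b) or 2 (PDFA-2b), as described above.
# GS may be set to the Ghostscript binary, e.g. gs-919-linux_x86_64 (assumption: defaults to "gs").
to_pdfa() {
    case $1 in
        1|2) ;;
        *) echo "to_pdfa: LEVEL must be 1 or 2" >&2; return 2 ;;
    esac
    "${GS:-gs}" \
        -dPDFA="$1" \
        -dNOOUTERSAVE \
        -sProcessColorModel=DeviceRGB \
        -sDEVICE=pdfwrite \
        -dPDFACompatibilityPolicy=1 \
        -o "$4" \
        "$2" \
        "$3"
}

# Example:
#   GS=gs-919-linux_x86_64 to_pdfa 2 /path/to/PDFA_def.ps input_file.pdf output_file.pdf
```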
What is PDFA_def.ps?
PDFA_def.ps is some sort of template Ghostscript uses to create a PDFA file. The tricky part is that, for some reason, Ghostscript comes with a non-working file.
You'll need to edit PDFA_def.ps and include the path to a valid ICC (color profile) file. Download a good color profile from Adobe:
wget https://tutankhamon.acc.umu.se/mirror/archive/ftp.sunet.se/pub/vendor/adobe/adobe/iccprofiles/win/AdobeICCProfilesWin_end-user.zip
Inside that zip, find a file called AdobeRGB1998.icc, put it somewhere and put the path to that file INSIDE your PDFA_def.ps file. Note that the path should be absolute, with no quotes. Like:
/ICCProfile (/full/path/to/file/AdobeRGB1998.icc) % Customize.
Here is a version of PDFA_def.ps; change PATH_TO_YOUR_ICC_FILE to the path of your AdobeRGB1998.icc.
https://gist.githubusercontent.com/weltonrodrigo/19df77833f023fbe1572168982e4b515/raw/ea86e87379d14120d7ff26f6f235ac7eeb5f5dd5/PDFA_def.ps
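Filling in the PATH_TO_YOUR_ICC_FILE placeholder can be done with sed instead of a text editor. A minimal sketch, assuming the template contains that placeholder as described above and that the ICC path contains no |, &, backslash or parentheses; the function name set_icc_path and the file names in the example are made up.

```shell
# Hypothetical helper: set_icc_path TEMPLATE.ps /absolute/path/to/profile.icc
# Prints the template with the placeholder replaced; redirect the output to PDFA_def.ps.
set_icc_path() {
    # The path must be absolute, as noted above.
    case $2 in
        /*) ;;
        *) echo "set_icc_path: ICC path must be absolute" >&2; return 2 ;;
    esac
    # '|' is used as the sed delimiter so slashes in the path need no escaping.
    sed "s|PATH_TO_YOUR_ICC_FILE|$2|g" "$1"
}

# Example (template.ps stands for the downloaded file):
#   set_icc_path template.ps /full/path/to/file/AdobeRGB1998.icc > PDFA_def.ps
```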
@danio, @imgen: Even recently released documentation pages on PDF/X (standardized Prepress requirements) and PDF/A (standardized Archiving requirements) generation were quite misleading. (Your link pointed to a v8.63 release.) In the end, it suggested that running the example commandlines using the sample PDF*_def.ps would already generate valid PDF/A and PDF/X files.
But, they do not!
Here is one of the sample commands, which by itself is correct:
gs \
-dPDFA \
-dBATCH \
-dNOPAUSE \
-dNOOUTERSAVE \
-dUseCIEColor \
-sDEVICE=pdfwrite \
-sOutputFile=out-a.pdf \
PDFA_def.ps \
input.ps
The output file will declare itself to be PDF/A (and most PDF viewers would happily go along with this), but the output file fails all real compliance tests.
The fix is easy: you need to edit your sample PDFA_def.ps (for PDF/X: your PDFX_def.ps) files to match your environment. These required edits were not clearly spelled out in older documentation versions, and the provided command suggested it would work out of the box.
Especially in case of PDF/X you MUST specify a valid ICC profile to use.
See also the updated documentation (current SVN trunk version) about this:
http://svn.ghostscript.com/ghostscript/trunk/gs/doc/Ps2pdf.htm#PDFA
Please note that current answers are not completely correct. You can define which level of PDF/A you want, resulting in different behaviors of the program. This one is correct:
gs -dPDFA -dBATCH -dNOPAUSE -sColorConversionStrategy=UseDeviceIndependentColor -sDEVICE=pdfwrite -dPDFACompatibilityPolicy=2 -sOutputFile=output_filename.pdf input_filename.pdf
Please note my change from -sPDFACompatibilityPolicy to -dPDFACompatibilityPolicy.
Change it to a higher number to get other versions. 1 is good if you don't need DOCINFO.
Furthermore we use the option UseDeviceIndependentColor to avoid validating issues.
If you change options here, you will most likely get a non-compliant PDF/A (even if it claims otherwise).
You can check your pdf/a here:
https://www.pdf-online.com/osa/validate.aspx
If you're using Windows and want to create PDF/A-1b documents explicitly (PDFCreator has an output option for PDF/A-2b but not for PDF/A-1b), you can simply enter the parameters Artur described above, without the ones for the document names, into the UI settings of PDFCreator. Start PDFCreator, choose the printer menu, then go to settings. Now, choose 'Ghostscript' from the settings list on the left side. Under 'additional ghostscript settings', enter the following:
-dPDFA|-dBATCH|-dNOPAUSE|-dUseCIEColor|-sProcessColorModel=DeviceCMYK|-sDEVICE=pdfwrite|-sPDFACompatibilityPolicy=1
Click on 'Save', then print something from MS Word or any other application you want using the PDFCreator - it will be created in PDF/A-1b.
Greetings,
Fritz