How to merge many PDF files into a single one? [duplicate]


This question already has answers here:
Merge / convert multiple PDF files into one PDF [closed]
(23 answers)
Closed 6 years ago.
I have 16 PDFs that I want to combine into a single one. I am on Ubuntu 10.10; how can I do it?

First, install pdftk:
sudo apt-get install pdftk
Then, as shown on the pdftk examples page, run
pdftk 1.pdf 2.pdf 3.pdf cat output 123.pdf
to merge the PDF files into one.
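One pitfall with 16 numbered files: a shell glob such as *.pdf sorts 10.pdf before 2.pdf. The Python sketch below builds the same pdftk ... cat output command with the inputs in natural (numeric-aware) order. The helper names and the one-folder layout are assumptions for illustration, not part of pdftk.

```python
import re
import subprocess
from pathlib import Path


def natural_key(name: str) -> list:
    """Sort key that compares digit runs as numbers, so 2.pdf sorts before 10.pdf."""
    # re.split with a capturing group alternates text (even index) and
    # digit runs (odd index), so keys always compare str/str and int/int.
    return [int(part) if i % 2 else part.lower()
            for i, part in enumerate(re.split(r"(\d+)", name))]


def pdftk_merge_command(inputs: list[str], output: str) -> list[str]:
    """Build the argv for: pdftk in1.pdf in2.pdf ... cat output out.pdf"""
    return ["pdftk", *sorted(inputs, key=natural_key), "cat", "output", output]


def merge_folder(folder: str, output: str) -> None:
    """Hypothetical helper: merge every PDF in folder into output."""
    pdfs = [str(p) for p in Path(folder).glob("*.pdf")]
    subprocess.run(pdftk_merge_command(pdfs, output), check=True)
```

For example, merge_folder("scans", "all.pdf") would pass scans/1.pdf through scans/16.pdf to pdftk in numeric order. Write the output outside the input folder so a second run does not pick it up as an input.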

You can also use Ghostscript to merge different PDFs. You can even use it to merge a mix of PDFs, PostScript (PS) and EPS into one single output PDF file:
gs \
-o merged.pdf \
-sDEVICE=pdfwrite \
-dPDFSETTINGS=/prepress \
input_1.pdf \
input_2.pdf \
input_3.eps \
input_4.ps \
input_5.pdf
However, I agree with the other answers: for your use case of merging only PDF files, pdftk may be the best (and certainly the fastest) option.
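The trade-off in this answer (pdftk for PDF-only input, Ghostscript when PS or EPS files are mixed in) can be expressed as a small command builder. This is a hedged Python sketch: the function name is hypothetical, and the Ghostscript arguments are copied from the command above.

```python
from pathlib import Path


def merge_command(inputs: list[str], output: str) -> list[str]:
    """Return a merge argv: pdftk for PDF-only input, Ghostscript otherwise."""
    suffixes = {Path(name).suffix.lower() for name in inputs}
    if suffixes <= {".pdf"}:
        # PDF only: pdftk simply concatenates the files (fast, no re-encoding).
        return ["pdftk", *inputs, "cat", "output", output]
    # Mixed PDF/PS/EPS input: Ghostscript's pdfwrite device re-distills everything.
    return ["gs", "-o", output, "-sDEVICE=pdfwrite",
            "-dPDFSETTINGS=/prepress", *inputs]
```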
Update:
If processing time is not the main concern but file size is (or fine-grained control over certain features of the output file), then Ghostscript certainly offers you more power. To highlight a few of the differences:
Ghostscript can 'consolidate' the fonts of the input files, which leads to a smaller output file. It can also re-sample images, scale all pages to a different size, or perform a controlled color conversion from RGB to CMYK (or vice versa) should you need this (but that requires more CLI options than the command above).
pdftk just concatenates the files and does not convert any colors. If each of your 16 input PDFs contains 5 subsetted fonts, the resulting output will contain 80 subsetted fonts. The resulting PDF's size is (nearly exactly) the sum of the input file sizes.

You can use http://www.mergepdf.net/, for example.
Or:
PDFTK http://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/
If you are NOT on Ubuntu and have the same problem (for example, you were about to start a new topic on SO and SO suggested this question), you can also do it like this:
Things You'll Need:
* Full Version of Adobe Acrobat
Open all the .pdf files you wish to merge. These can be minimized on your desktop as individual tabs.
Pull up what you wish to be the first page of your merged document.
Click the 'Combine Files' icon on the top left portion of the screen.
The 'Combine Files' window that pops up is divided into three sections. The first section is titled 'Choose the files you wish to combine'. Select the 'Add Open Files' option.
Select the other open .pdf documents on your desktop when prompted.
Rearrange the documents as you wish in the second window, titled 'Arrange the files in the order you want them to appear in the new PDF'.
The final window, titled 'Choose a file size and conversion setting', lets you control the size of your merged PDF document. Consider the purpose of your new document: if it's to be sent as an e-mail attachment, use a low size setting; if the PDF contains images or is to be used for a presentation, choose a high setting. When finished, select 'Next'.
A final choice: choose between either a single PDF document, or a PDF package, which comes with the option of creating a specialized cover sheet. When finished, hit 'Create', and save to your preferred location.
Tips & Warnings
Double-check the PDF documents before merging to make sure all pertinent information is included. It's much easier to re-create a single PDF page than a multi-page document.

There are lots of free tools that can do this.
I use PDFTK (an open-source, cross-platform command-line tool) for things like that.

Also see pdfjam: http://www2.warwick.ac.uk/fac/sci/statistics/staff/academic/firth/software/pdfjam/

Related

inkscape: multiple page pdf to multiple png

When I convert a PDF to images on the Linux command line, Inkscape seems to give the best result (better quality than gs at the same DPI). Unfortunately, it only converts the first page to PNG. How can I convert every PDF page to a separate PNG file? Do I have to extract each page into a new PDF file, run the Inkscape conversion on it, and so on?

This isn't solely using Inkscape, but you could use e.g. pdftk to split up the pdf-file into separate pages and convert every page into a png with Inkscape. For example, like this:
pdftk file.pdf burst
for i in pg_*.pdf; do inkscape "$i" -z --export-dpi=300 --export-area-page --export-png="${i%.pdf}.png"; done
Note that pdftk burst creates pdf-files called pg_0001.pdf, etc., so if you have any files named like that, they'll be overwritten. You can remove them afterwards easily using
rm pg_*.pdf
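The same burst-then-render loop as a Python sketch, assuming pdftk and Inkscape are on PATH. It names the output pg_0001.png rather than pg_0001.pdf.png and can emit either the pre-1.0 flags used above or the Inkscape 1.x --export-filename form; the helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def inkscape_png_command(page_pdf: Path, dpi: int = 300,
                         legacy: bool = False) -> list[str]:
    """Argv that renders one single-page PDF to a PNG next to it."""
    png = page_pdf.with_suffix(".png")  # pg_0001.pdf -> pg_0001.png
    common = [f"--export-dpi={dpi}", "--export-area-page"]
    if legacy:
        # Pre-1.0 syntax, as in the shell loop above.
        return ["inkscape", str(page_pdf), "-z", *common, f"--export-png={png}"]
    # Inkscape 1.x syntax: the output type follows from the file extension.
    return ["inkscape", str(page_pdf), *common, f"--export-filename={png}"]


def pdf_to_pngs(pdf: str, workdir: str = ".", legacy: bool = False) -> None:
    """Burst pdf into pg_NNNN.pdf files in workdir, render each, then delete it."""
    folder = Path(workdir)
    # pdftk burst writes pg_0001.pdf, pg_0002.pdf, ... into its working directory.
    subprocess.run(["pdftk", str(Path(pdf).resolve()), "burst"], cwd=folder, check=True)
    for page in sorted(folder.glob("pg_*.pdf")):
        subprocess.run(inkscape_png_command(page, legacy=legacy), check=True)
        page.unlink()  # same clean-up as rm pg_*.pdf
```

The same overwrite caveat applies: existing pg_*.pdf files in workdir would be rendered and deleted too.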

Lu Kas' answer threw warnings for me without doing the conversion, probably because I'm running Inkscape 1.1.
However, I got it running by replacing some deprecated options:
inkscape pdfFile.pdf --export-dpi=300 --export-area-page --export-filename=imageFile.png;

For batch processing, rather than slowly looping file by file, Inkscape has a shell mode for scripting with a command file. See https://wiki.inkscape.org/wiki/index.php/Using_the_Command_Line#Shell_mode
However, as with any command-file script, you need to write a custom text file. Windows users should run it with inkscape.com rather than inkscape.exe (inkscape.com takes precedence).
Since version 1.0 (1.2 at the time of writing), the pages of a multipage PDF can be addressed to produce multiple outputs. For more examples, see https://inkscape.org/doc/inkscape-man.html#EXAMPLES
Options get replaced over time: currently, --export-type="xxx" batch-exports a list of input files to type xxx, so in this case use --export-type="png".
For PDF-related input options and support, see https://wiki.inkscape.org/wiki/index.php/Using_the_Command_Line#New_options
For Windows users, there is a handy batch-file converter here: https://gist.github.com/JohannesDeml/779b29128cdd7f216ab5000466404f11
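A command file for shell mode can be generated rather than written by hand. This Python sketch writes one line of actions per burst page. The action names (file-open, export-dpi, export-area-page, export-filename, export-do, file-close) are an assumption about Inkscape 1.x; confirm them with inkscape --action-list for your version before relying on this.

```python
from pathlib import Path


def shell_mode_script(pages: list[str], dpi: int = 300) -> str:
    """Command file for `inkscape --shell`, one line of actions per page PDF.

    ASSUMPTION: these action names exist in your Inkscape 1.x build;
    check with `inkscape --action-list`.
    """
    lines = []
    for page in pages:
        png = Path(page).with_suffix(".png").as_posix()
        # Open, export the page area at the given DPI, then close the document
        # so thousands of pages do not accumulate in memory.
        lines.append(f"file-open:{page}; export-dpi:{dpi}; export-area-page; "
                     f"export-filename:{png}; export-do; file-close")
    return "\n".join(lines) + "\n"
```

Usage idea: write shell_mode_script(sorted(glob.glob("pg_*.pdf"))) to commands.txt and feed it to inkscape --shell < commands.txt (on Windows, via inkscape.com).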

How can I force PDFsharp to embed a subset of a font only?

I am able to successfully create PDF files using PDFsharp and MigraDoc.
Two private fonts (OTF format) are used to create a single-page PDF. The created PDF contains both fonts fully embedded.
Unfortunately, each font also contains Chinese characters and therefore measures about 4 MB, resulting in a PDF file of about 9 MB (containing only one page with a bit of text!).
Is it possible to use a subset of those fonts to save valuable space?
I need to create a few thousand PDF files, so file size is crucial.
Is there a special setting I can use?
Can anyone point me in the right direction?
Update:
I used FontForge to extract the embedded fonts and found that the fonts taken from the PDF match the full font files exactly.
So indeed no font subsetting is used at all.
Looking at the PDFsharp sources, I found the function
public OpenTypeFontface CreateFontSubSet(Dictionary<int, object> glyphs, bool cidFont)
which is commented as follows: Creates a new font image that is a subset of this font image containing only the specified glyphs.
Which is exactly what I want to be used here.
The thing I do not understand is why this function seems not to get used when creating my PDF.
What criteria need to be met in order to make it work?

I just found a solution to my problem that requires no extensive fiddling with additional PDF frameworks: I am able to create font subsets using Ghostscript (command line).
In fact, Ghostscript takes the PDFsharp-generated file and rewrites it (while optimizing the fonts). Here is the command-line solution:
gswin64 -dCompatibilityLevel=1.4 -dPDFSETTINGS=/printer -dCompressFonts=true -dSubsetFonts=true -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=optimized.pdf -c ".setpdfwrite <</NeverEmbed [ ]>> setdistillerparams" -f my_pdfsharp.pdf
My file size of about 9 MB is now down to 51 KB. Yihaa!!!
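Since the question mentions a few thousand PDFs, the command above can be applied to a whole folder. A hedged Python sketch: the options are copied verbatim from the gswin64 command, the folder helper is hypothetical, and it prefers the console binary gswin64c on Windows (gs elsewhere). If your Ghostscript release rejects the .setpdfwrite snippet, try dropping the -c argument and its PostScript string.

```python
import shutil
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def subset_fonts_command(src: Path, dst: Path, gs: str = "gs") -> list[str]:
    """Argv with the same options as the gswin64 command above."""
    return [
        gs, "-dCompatibilityLevel=1.4", "-dPDFSETTINGS=/printer",
        "-dCompressFonts=true", "-dSubsetFonts=true",
        "-dNOPAUSE", "-dBATCH", "-sDEVICE=pdfwrite",
        f"-sOutputFile={dst}",
        "-c", ".setpdfwrite <</NeverEmbed [ ]>> setdistillerparams",
        "-f", str(src),
    ]


def subset_folder(src_dir: str, dst_dir: str, workers: int = 4) -> None:
    """Hypothetical helper: rewrite every PDF in src_dir into dst_dir."""
    # Prefer the Windows console binary, fall back to gs on Linux/macOS.
    gs = shutil.which("gswin64c") or shutil.which("gs") or "gs"
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    jobs = [subset_fonts_command(p, out / p.name, gs)
            for p in sorted(Path(src_dir).glob("*.pdf"))]
    # Threads only wait on external gs processes, so a thread pool suffices;
    # list() consumes the results so a failing run raises CalledProcessError.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda argv: subprocess.run(argv, check=True), jobs))
```

Keep dst_dir different from src_dir, otherwise Ghostscript would overwrite each input while reading it.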

Some fonts have a "loca" table, some do not. The loca table stores the offsets to the locations of the glyphs in the font.
CreateFontSubSet is only called (and can only work) for fonts with a loca table, because that table provides the information needed to create subsets.
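To check which case a given font falls into, you can look for a loca table in the font's table directory. TrueType-outline fonts carry one, while OpenType fonts with CFF outlines (typically .otf files whose header starts with 'OTTO') do not, which may explain why the OTF fonts in the question were embedded whole. A stdlib-only Python sketch; it does not handle TrueType collections (.ttc).

```python
import struct


def sfnt_table_tags(data: bytes) -> set[str]:
    """Table tags listed in the table directory of a TrueType/OpenType font."""
    # Offset table (big-endian): sfntVersion (4 bytes), numTables (uint16),
    # then searchRange, entrySelector, rangeShift: 12 bytes in total.
    _version, num_tables = struct.unpack(">4sH", data[:6])
    tags = set()
    for i in range(num_tables):
        # Each 16-byte table record: tag, checksum, offset, length.
        start = 12 + 16 * i
        tag, _checksum, _offset, _length = struct.unpack(">4sIII", data[start:start + 16])
        tags.add(tag.decode("latin-1"))
    return tags


def has_loca_table(path: str) -> bool:
    """True if the font file at path contains a loca table (TrueType outlines)."""
    with open(path, "rb") as f:
        return "loca" in sfnt_table_tags(f.read())
```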

How to batch convert CMYK PDFs to RGB?

I have a large batch of PDFs (6,000+) that need to be converted from a CMYK color profile to RGB. Are there any scripts that can accomplish this task, ideally without a (too) visible change in color? The PDFs are book files that were originally designed for print and are being prepared to be loaded as e-books.
I've found a few InDesign scripts that might be able to do this, but at this point obtaining and re-exporting from the original design files will be extraordinarily time consuming. Another option seems to be running actions via Adobe Acrobat, but I haven't had any success with that yet.
I've also found this bit of Java, if anyone can vouch for it:
http://www.aspose.com/docs/display/pdfjava/Changing+Color+space+of+a+PDF+document
Any suggestions or insights?

You can use Ghostscript for this job. Make sure to use a very recent release though.
Here is a command to try:
gs \
-o rgb.pdf \
-sDEVICE=pdfwrite \
-sProcessColorModel=DeviceRGB \
-sColorConversionStrategy=RGB \
-sColorConversionStrategyForImages=RGB \
cmyk.pdf
Note that your goal of achieving the conversion 'ideally without a (too) visible change in color' is not always possible. It depends very much on whether the input PDF uses an embedded color profile, and which one.
It also depends on the color profile you apply. The above command uses a default RGB profile compiled into Ghostscript. To use a custom profile, you can add various command line parameters. To use one profile for all types of PDF content, use:
-sDefaultRGBProfile=rgb-profile-filename
This profile applies to source colors that are not already colorimetrically defined in the source document.
If you want to override the profiles which are already embedded in the PDF document, add this:
-dOverrideICC=true
On top of these options, you can also control the ICC profile for the output device, by adding:
-sOutputICCProfile=output-profile-filename
When using an output profile, you frequently also want to set the rendering intent. For this purpose use:
-dRenderIntent=intent
where intent is one of
0 : for Perceptual
1 : for Colorimetric
2 : for Saturation
3 : for Absolute Colorimetric intent.
Ghostscript even supports using different profiles for different types of PDF content (graphics, text and images):
-sGraphicICCProfile=graphicprofile-filename
-sTextICCProfile=textprofile-filename
-sImageICCProfile=imageprofile-filename
Similar to the above explained generic option -dRenderIntent, you can specify different intents for different content types:
-dGraphicIntent=intent
-dTextIntent=intent
-dImageIntent=intent
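For 6,000+ files, the command and the optional ICC parameters above can be assembled per file. This is a hedged Python sketch: the flags are exactly those listed in this answer, the intent names mapped to 0 to 3 follow the list above, and the function names are hypothetical.

```python
from pathlib import Path
from typing import Optional

# -dRenderIntent values as listed in the answer above.
INTENTS = {"perceptual": 0, "colorimetric": 1, "saturation": 2,
           "absolute_colorimetric": 3}


def cmyk_to_rgb_command(src: str, dst: str,
                        default_rgb_profile: Optional[str] = None,
                        override_icc: bool = False,
                        output_profile: Optional[str] = None,
                        intent: Optional[str] = None) -> list[str]:
    """Ghostscript argv for the CMYK-to-RGB command, plus optional ICC options."""
    argv = ["gs", "-o", dst, "-sDEVICE=pdfwrite",
            "-sProcessColorModel=DeviceRGB",
            "-sColorConversionStrategy=RGB",
            "-sColorConversionStrategyForImages=RGB"]
    if default_rgb_profile:
        argv.append(f"-sDefaultRGBProfile={default_rgb_profile}")
    if override_icc:
        # Ignore profiles already embedded in the input PDF.
        argv.append("-dOverrideICC=true")
    if output_profile:
        argv.append(f"-sOutputICCProfile={output_profile}")
    if intent is not None:
        argv.append(f"-dRenderIntent={INTENTS[intent]}")
    argv.append(src)  # the input file goes last, as in the original command
    return argv


def batch_commands(src_dir: str, dst_dir: str, **options) -> list[list[str]]:
    """One command per PDF in src_dir, each writing a same-named file to dst_dir."""
    return [cmyk_to_rgb_command(str(p), str(Path(dst_dir) / p.name), **options)
            for p in sorted(Path(src_dir).glob("*.pdf"))]
```

Each returned argv can then be run with subprocess.run(argv, check=True); trying a handful of files first is a cheap way to judge the color shift before committing the whole batch.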

I'd use Adobe Acrobat Pro. Go to Tools, Preflight (it might be in a different spot depending on your version).
In the PDF fixups section, look for "Convert to sRGB". You can run this command manually on a single PDF to see if it works for you. If it does, go to the Options menu and select "Create Preflight Droplet".
You'll get some options about what to do on success and failure, but when you click the "Save" button you'll get an actual EXE file on Windows (on Mac you should get an Application file). You can drag files (and, I think, folders) directly onto this file and it will run that action, just like Photoshop droplets.

A preflight created in Adobe Acrobat Pro can be used in a batch process. In my case I have to convert spot colors to CMYK without affecting other colors, so I chose "Convert to CMYK only (SWOP)" in the PDF fixups section.
A .exe file is generated when you save the preflight as a droplet. It can be tested with the following command in a command prompt:
"%location_of_file%\Convert to CMYK only (SWOP).exe" "" "%file_name%"
I also prepared a script to automate the process; here is a small prototype of it:
REM %dir%, %location_of_executable% and %output_folder_with_location% are placeholders
cd /d "%dir%"
:cycle
set count_files=0
REM Count the PDFs waiting to be processed (PDF in my case, so *.pdf)
for %%x in (*.pdf) do set /a count_files+=1
if %count_files%==0 goto :MISSING
REM dir /o-d lists the newest file first, so the last assignment is the oldest
for /F "delims=" %%a in ('dir /a-d /b /o-d *.pdf') do set "oldest=%%a"
"%location_of_executable%\Convert to CMYK only (SWOP).exe" "" "%oldest%"
move "%oldest%" "%output_folder_with_location%\%oldest%"
:MISSING
REM Delay so the conversion can complete; also avoids a busy loop when idle
timeout /t 3 /nobreak >nul
goto :cycle
This batch script keeps looping whether a conversion fails or succeeds; whenever there are PDFs to be processed, it converts the oldest file first.

Undo Pdfnup Operation

I have a PDF file which contains several slides per page, including text (not only images).
This PDF was probably created using pdfnup.
Can I revert the pdfnup operation so that each slide is shown on its own page?

As far as I know, there is no simple 'undo' operation.
However, the following answers show the principle of how to achieve the undo-equivalent operation using Ghostscript:
Convert PDF 2 sides per page to 1 side per page (Superuser)
How can I split a PDF's pages down the middle? (Superuser)
Cropping a PDF using Ghostscript 9.01 (Stackoverflow)
PDF - Remove White Margins (Stackoverflow)
(Should these not help you find the final solution, ask again. But to come up with a fully working command line, I'd first need the complete output of the following command: pdfinfo -f 1 -l 100 -box your.pdf.)
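The linked answers work by giving Ghostscript a page the size of one slide and shifting the content so that slide lands on it, once per slide position. A hedged Python sketch of that geometry, assuming equal-size cells with no margins (pdfnup's actual layout may add margins or scaling, so check the pdfinfo -box output). The DEVICEWIDTHPOINTS/PageOffset usage follows the approach of those answers and should be verified against your Ghostscript version; the function names are hypothetical.

```python
def nup_cells(width: float, height: float, cols: int, rows: int) -> list[tuple]:
    """Split a width x height page (in points) into cols x rows equal cells.

    Returns (x, y, w, h) per cell in reading order (left to right, top to
    bottom), using PDF coordinates with the origin at the bottom-left corner.
    """
    w, h = width / cols, height / rows
    cells = []
    for r in range(rows):
        for c in range(cols):
            # Row 0 is the top row, so it has the largest y offset.
            cells.append((c * w, height - (r + 1) * h, w, h))
    return cells


def gs_extract_cell(src: str, dst: str, cell: tuple) -> list[str]:
    """Argv putting one cell of every page of src onto a cell-sized page in dst."""
    x, y, w, h = cell
    # Shift the content left/down so the cell's lower-left corner lands at (0, 0);
    # 0.0 - x avoids printing "-0" for the first column.
    dx, dy = 0.0 - x, 0.0 - y
    return ["gs", "-o", dst, "-sDEVICE=pdfwrite",
            f"-dDEVICEWIDTHPOINTS={w:g}", f"-dDEVICEHEIGHTPOINTS={h:g}",
            "-dFIXEDMEDIA",
            "-c", f"<</PageOffset [{dx:g} {dy:g}]>> setpagedevice",
            "-f", src]
```

For a 2-up A4 landscape sheet (842 x 595 pt), nup_cells(842, 595, 2, 1) gives a left and a right cell; running Ghostscript once per cell yields one PDF per slide position, which could then be interleaved, for example with pdftk's shuffle operation.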

Combining PDF with GhostScript: Using Original Bookmarks with corrected page numbers

I am using
gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=book.pdf -f front-matter.pdf fulltext-0.pdf fulltext-1.pdf back-matter.pdf
to create a single PDF document from a series of PDF documents. I was going to include a new, made-up table of contents using the pdfmark mechanism. Then I noticed that the original files already have bookmarks in them; however, they reference the original page numbers, not the ones in the combined document.
I am looking for one of two possible solutions: remove the original bookmarks, or make use of the original bookmarks but somehow update their page references.

As so often the case, someone has walked the same path before you...
unfolding disasters has worked out a solution to this very problem. His Python script pdf-merge.py first invokes pdftk with its dump_data switch to retrieve all the pdfmark information. It then keeps track of the total number of pages of each merged document and offsets each page number in the pdfmark instructions by the total page count of all PDF documents included before the current one. So it is close to, but not the same as, the 2-pass approach of KenS: it first discovers bookmarks using pdftk and then creates a new bookmark file with correct page numbers. It also manages to turn the original pdfmark instructions (which gs would normally preserve) into no-ops. I won't pretend I understand how that last part works...
However, the script does all I need, including the option of tweaking the bookmark file before the final write. Very neat, and hat tip to Trevor King.
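The offset arithmetic described above can be sketched in a few lines of Python. This is not Trevor King's pdf-merge.py; it is an independent, hedged illustration that assumes pdftk's usual dump_data layout (NumberOfPages, then BookmarkTitle, BookmarkLevel, BookmarkPageNumber per bookmark) and plain-ASCII titles.

```python
def parse_bookmarks(dump_data: str) -> list[tuple[str, int, int]]:
    """(title, level, page) per bookmark in `pdftk in.pdf dump_data` output."""
    bookmarks, current = [], {}
    for line in dump_data.splitlines():
        key, _, value = line.partition(": ")
        if key == "BookmarkBegin":
            current = {}
        elif key == "BookmarkTitle":
            current["title"] = value
        elif key == "BookmarkLevel":
            current["level"] = int(value)
        elif key == "BookmarkPageNumber":  # assumed to be the last field
            current["page"] = int(value)
            bookmarks.append((current["title"], current["level"], current["page"]))
    return bookmarks


def page_count(dump_data: str) -> int:
    """Value of the NumberOfPages line in dump_data output."""
    for line in dump_data.splitlines():
        if line.startswith("NumberOfPages: "):
            return int(line.split(": ", 1)[1])
    raise ValueError("no NumberOfPages line found")


def ps_string(text: str) -> str:
    """PostScript literal string with backslashes and parentheses escaped."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def merged_pdfmarks(dumps: list[str]) -> str:
    """Outline pdfmarks for the merged file, given each input's dump_data in order."""
    entries, offset = [], 0
    for dump in dumps:
        for title, level, page in parse_bookmarks(dump):
            entries.append((title, level, page + offset))
        offset += page_count(dump)  # later files start after these pages
    lines = []
    for i, (title, level, page) in enumerate(entries):
        # /Count = number of direct children: entries one level deeper that
        # follow before the next entry at this level or shallower.
        children = 0
        for _, child_level, _ in entries[i + 1:]:
            if child_level <= level:
                break
            if child_level == level + 1:
                children += 1
        count = f" /Count {children}" if children else ""
        lines.append(f"[/Title {ps_string(title)} /Page {page}{count} /OUT pdfmark")
    return "\n".join(lines) + "\n"
```

The resulting text could be saved as a pdfmarks file and given to Ghostscript after the merged PDF (for example gs -o final.pdf -sDEVICE=pdfwrite merged.pdf pdfmarks). As KenS notes, pdfwrite may still carry over the original bookmarks, which is the part Trevor King's script deals with.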

In general pdfwrite doesn't know you are appending files, so it preserves bookmark and other 'metadata' information on the assumption that you will want it in the output.
However, when you are combining PDF files, preserving the information won't work, as the page numbers for the second and subsequent files will be incorrect.
So you need a 2-pass approach: first merge all the files, discarding the bookmarks, then 'convert' the merged file and add pdfmarks to set the correct bookmarks.
There is currently no option (with pdfwrite) to not preserve bookmarks. You will need to modify the Ghostscript PDF interpreter PostScript files to achieve this I think. You might try setting -dDOPDFMARKS=false, but I doubt that will work.
