I have a Pdf file which contains several slides per page, including text (not only images).
This pdf was probably created using pdfnup.
Can I revert the pdfnup operation so that each slide is shown on one page?
As far as I know, there is no simple to be used 'undo' operation.
However, the following answers show you the approach principle, how you can achieve the undo-equivalent operation using Ghostscript:
Convert PDF 2 sides per page to 1 side per page (Superuser)
How can I split a PDF's pages down the middle? (Superuser)
Cropping a PDF using Ghostscript 9.01 (Stackoverflow)
PDF - Remove White Margins (Stackoverflow)
(Should these not help you to find the final solution, ask again. But then to come up with a fully working commandline, I'd need the complete output of the following command first: pdfinfo -f 1 -l 100 -box your.pdf.)
Related
I receive postage labels from a supplier as single page pdf documents. The labels would fit on an A5 sheet but they are presented as a portrait within an A4 page, also in portrait orientation. I would like to be able to print two of these labels per A4 page to cut down on waste.
This can be achieved by rotating the page content without rotating the page itself. Or by resizing the page by swapping the height and width about the content. I am aware that both of these things can result in content being lost, which isn't a problem for my use case. Ideally I'd like a command line application that works on both Linux or Windows machines. Unfortunately, web searches for "rotate" or "resize" pdf will point to the many applications that just rotate or resize pdf pages along with the content which isn't what I want.
Similar questions:
With PdfBox: identical use case, see my comments on PdfBox below.
With iText: almost identical use case, I explicitly don't want any resizing of the content. See my comments on iText below as well.
Things I have investigated tried:
pdftk - too basic
ImageMagick - the original image contains transparency and the extent argument results in a visible loss of quality
pdfjam - also requires install of Latex and PdfPages. Ideally I'd like something that works on both Windows and Linux.
iText7 - the documentation isn't great. Looks like it was completely re-written in the last few years and the Nuget feed makes it clear that previous version, iTextSharp, is EOL. Consequently most of the examples one finds online (including on this site) are out of date. iText7 doesn't let you resize a page. I got as far as saving a document with a new page that was the right size but struggling to copy the content over. I think I could get what I wanted from this but it would take a long time and I'm trying to do something simple.
PdfBox - I've already tried one .NET library without success. Looking at the comments to the question I've linked above, this one seems to also have a version issue. I'm trying to do something really simple here, I will try this one if I exhaust all other avenues
Gimp - does what I want but I have to fire up the application, point and click quite a few times to rescale the image canvas, set the background and export
Screenshot the label from a pdf reader at 100% size and paste into a Word/LibreOffice doc. Sadly this is the most reliable method I have at the moment
I have example labels but they contain the name and address of people I've sent things to, I'd rather not upload them.
Try the command line tool cpdf from here: https://community.coherentpdf.com
cpdf -rotate-contents <angle> in.pdf -o out.pdf
to rotate contents without rotating the page. or...
cpdf -mediabox "100 100 600 500" in.pdf -o out.pdf
(and -cropbox and so on) to change page dimensions without altering content. Chapter 3 of the manual is of relevance.
You can also prepare the file by removing any page rotation whilst counter-rotating the content to leave the visual appearance unchanged:
cpdf -upright in.pdf -o out.pdf
In the context of my studies I often receive PDF files written in LaTeX, with big margins.
When I have to print those files, I like to print them with 2 pages per sheet to spare paper. But I then have a lot of white-space and the text is quite small.
So I'm looking for a way to scale the page contents first and only then print them 2 pages per sheet, to avoid losing space and to have the text as big and readable as possible.
Has anyone an idea of how I could do that either programmatically, or scripted, or on a "step-by-step commands" basis ?
(Note that I have no access to the LaTeX code, otherwise I would just change the margins...)
I used FinePrint to do this on windows. But there are some alternatives, which I haven't try:
https://superuser.com/questions/190869/fineprint-alternative-on-linux
https://superuser.com/questions/107687/good-virtual-printers-with-cropping-for-windows-and-linux
Here are previous answers (all mine) which provide building blocks that will help you construct your own programmatic or scripted or "some step-by-step commands" solution:
PDF Manipulation: "2-Up" page layout (SuperUser)
Linux-based tool to chop PDFs into multiple pages (SuperUser)
Convert PDF 2 sides per page to 1 side per page (SuperUser)
How can I split a PDF's pages down the middle? (SuperUser)
Cropping a PDF using Ghostscript 9.01 (StackOverflow)
Split one PDF page into two (StackOverflow)
PDF - Remove White Margins (StackOverflow)
We have to do a comparison of about 1500 PDF's in one folder with 1500 PDF's in another to check for visual differences.
We have found DiffPDF(and comparePDF command line version) for Windows which is a lot faster than our automated Acrobat Pro comparisons.
So far I have used:
comparepdf -v=2 =c=a old.pdf new.pdf
but the problem with this is that it just returns "these files are different". Does anyone know of any way to save the output from command line? You can do this from the GUI but that would mean using something like TestCOmplete to automate it :(
Or are there better ways of doing a comparison of 2 PDF's visually- with output/highlighting/
Bonus points for C# .net libraries.
You could have a look at these answers to similar questions:
PDF compare on linux command line
How to compare two pdf files through command line
How to unit test a Python function that draws PDF graphics?
However, I have no idea if any of these would be performing faster than what your automated Acrobat Pro comparison does... Let me know if you found out, will you?
Shortcut:
For simplicity, let's assume your input files to be compared are similar enough, and each being only 1 page. (For multi-page input expand the base idea of this answer...)
The two most essential commands any such comparison boils down to are these:
compare.exe ^
%input1% ^
%input2% ^
-compose src ^
%output%.tmp.pdf
and
pdftk.exe ^
%output%.tmp.pdf ^
background %input1% ^
output %output%.pdf
The first command generates a PDF with all differential pixels colored in red. (A default resolution is used here, 72 dpi. For a more fine-grained view on pixel differences add -density 200 (that will mean: 200 dpi) or higher -- but your processing time will increase accordingly as will the disk space needed by the output...)
The second command tries to merge the resulting PDF with a background taken from ${input1}.
Optionally, you may add -verbose -debug coder after the compare command for a better idea about what's going on.
compare.exe is a commandline tool from the great, great ImageMagick family of utilities (available for Linux, Windows, Unix and MacOSX). But it requires a Ghostscript installation to use as a 'delegate' in order to be able to process PDF input. pdftk.exe is also a commandline utility, available for the same platforms. Both a Free Software.
After the first command, you'll have an output file which has only red pixels where there are differences found on the page.
After the second command, you'll have an output with all red 'diff' pixels in the context of the first input PDF.
Example output:
Here are screenshots of two 1-page PDF files with differences in their content:
Here are screenshots of the output produced by the two commands above:
The left one shows the intermediate result (after first command), with only the difference pixels displaying as red (identical pixels being white).
The screenshot on the right shows the red difference pixels, but this time with the input PDF file number 1 as a (gray) background (after second command).
(PDF input files courtesy of Mark Summerfield, author of the beautiful DiffPDF tool.)
I had the same problem, diffpdf is quick and nice but GUI only.
[comparepdf] is console one but reports only exit code (no diff itself).
[diff-pdf] has both console mode and diff.pdf output but it is slow and output is not friendly.
I have tried to add the required code to diffpdf,
you can find it here: http://github.com/taurus-forever/diffpdf-console
I am using
gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=book.pdf -f front-matter.pdf fulltext-0.pdf fulltext-1.pdf back-matter.pdf
to create a single PDF document from a series of pdf documents. I was going to include a new made-up table of content and include it using the pdfmark mechanism. Then I notice that the original files already have bookmarks in them - they are however referenced to the original page numbers, not the ones in the combined document.
I am looking for two possible solutions. Remove the orginal bookmarks or make use of the original bookmarks but somehow update their page references...
As so often the case, someone has walked the same path before you...
unfolding disasters has worked out a solution to this very problem. His python script pdf-merge.py first invokes pdftk with its dump_data switch to retrieve all the pdfmark information. It then keeps track of the total number of pages for each merged document and does the math to offset the new page number pointer in the pdfmark instruction by the sum total of page counts of all the PDF documents included before the current PDF document. So it is close but not the same as the 2-pass approach of KenS. It first discovers bookmarks using pdftk and then creates a new bookmark file with correct page numbers. It also manages to turn the original pdfmark instruction (that would normally be preserved by gs into noop). I won't pretend I understand how that last part worked ...
However, the script does all I need including the option of tweaking the bookmark file before the final writing. Very neat and hat tip to Trevor King.
In general pdfwrite doesn't know you are appending files, so it preserves bookmark and other 'metadata' information on the assumption that you will want it in the output.
However, when you are combining PDF files, preserving the information won't work, as the page numbers for the second and subsequent files will be incorrect.
So you need a 2-pass approach, first merge all the files, discarding the bookmarks, then 'convert' the merged file and add pdfmarks to set the correct bookmarks.
There is currently no option (with pdfwrite) to not preserve bookmarks. You will need to modify the Ghostscript PDF interpreter PostScript files to achieve this I think. You might try setting -dDOPDFMARKS=false, but I doubt that will work.
This question already has answers here:
Merge / convert multiple PDF files into one PDF [closed]
(23 answers)
Closed 6 years ago.
I have 16 pdfs that I want to convert into a single one... I am on Ubuntu 10.10, how can I do it?
First, get Pdftk:
sudo apt-get install pdftk
Now, as shown on example page, use
pdftk 1.pdf 2.pdf 3.pdf cat output 123.pdf
for merging pdf files into one.
You can also use Ghostscript to merge different PDFs. You can even use it to merge a mix of PDFs, PostScript (PS) and EPS into one single output PDF file:
gs \
-o merged.pdf \
-sDEVICE=pdfwrite \
-dPDFSETTINGS=/prepress \
input_1.pdf \
input_2.pdf \
input_3.eps \
input_4.ps \
input_5.pdf
However, I agree with other answers: for your use case of merging PDF file types only, pdftk may be the best (and certainly fastest) option.
Update:
If processing time is not the main concern, but if the main concern is file size (or a fine-grained control over certain features of the output file), then the Ghostscript way certainly offers more power to you. To highlight a few of the differences:
Ghostscript can 'consolidate' the fonts of the input files which leads to a smaller file size of the output. It also can re-sample images, or scale all pages to a different size, or achieve a controlled color conversion from RGB to CMYK (or vice versa) should you need this (but that will require more CLI options than outlined in above command).
pdftk will just concatenate each file, and will not convert any colors. If each of your 16 input PDFs contains 5 subsetted fonts, the resulting output will contain 80 subsetted fonts. The resulting PDF's size is (nearly exactly) the sum of the input file bytes.
You can use http://www.mergepdf.net/ for example
Or:
PDFTK http://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/
If you are NOT on Ubuntu and you have the same problem (and you wanted to start a new topic on SO and SO suggested to have a look at this question) you can also do it like this:
Things You'll Need:
* Full Version of Adobe Acrobat
Open all the .pdf files you wish to merge. These can be minimized on your desktop as individual tabs.
Pull up what you wish to be the first page of your merged document.
Click the 'Combine Files' icon on the top left portion of the screen.
The 'Combine Files' window that pops up is divided into three sections. The first section is titled, 'Choose the files you wish to combine'. Select the 'Add Open Files' option.
Select the other open .pdf documents on your desktop when prompted.
Rearrange the documents as you wish in the second window, titled, 'Arrange the files in the order you want them to appear in the new PDF'
The final window, titled, 'Choose a file size and conversion setting' allows you to control the size of your merged PDF document. Consider the purpose of your new document. If its to be sent as an e-mail attachment, use a low size setting. If the PDF contains images or is to be used for presentation, choose a high setting. When finished, select 'Next'.
A final choice: choose between either a single PDF document, or a PDF package, which comes with the option of creating a specialized cover sheet. When finished, hit 'Create', and save to your preferred location.
Tips & Warnings
Double check the PDF documents prior to merging to make sure all pertinent information is included. Its much easier to re-create a single PDF page than a multi-page document.
There are lots of free tools that can do this.
I use PDFTK (a open source cross-platform command-line tool) for things like that.
Also seem pdfjam: http://www2.warwick.ac.uk/fac/sci/statistics/staff/academic/firth/software/pdfjam/