How can I make two PDF pages show up on the same page? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 6 years ago.
I am looking for a free tool that allows re-arranging pages of a PDF document and combining multiple pages per sheet. The first part (re-arranging) is easily solved by many tools (I use PyPDF).
The problem is with the second requirement: to combine two (or more) pages into single page. For example, take two pages (A and B), rotate them, scale and combine into a single page like this
------    ------         ------
|    |    |    |         |    |
| A  |    | B  |         | a  |
|    |    |    |         |    |
|    |    |    |   --->  ------
|    |    |    |         |    |
|    |    |    |         | b  |
|    |    |    |         |    |
------    ------         ------
The solution needs to work on Linux and preferably on Windows too. I'm looking for either console application or library with Python or Perl bindings.
Edit: there is a pdfnup library that is supposed to perform exactly this kind of transformation and is cross-platform; however, I cannot use it due to a bug similar to this.

This is a summary of the tools I found for PDF (I wanted to find the equivalent of psnup and psbook):
Create booklets: pdfbook, pdf-tools (command: pdfbklt)
Merge PDF files: pdfmerge, pdfjam (command: pdfjoin)
Rotate pages: pdfjam (command: pdf90)
Multiple pages per sheet: pdfjam (command: pdfnup)
Create posters (multiple sheets per page): pdfposter
From my package manager:
pdf-tools: http://search.cpan.org/dist/Text-PDF
pdfbook: http://www.ctan.org/tex-archive/support/pdfbook/
pdfmerge: https://github.com/dmaphy/pdfmerge
pdfjam: http://go.warwick.ac.uk/pdfjam
pdfposter: http://pdfposter.origo.ethz.ch/
Create an A6 booklet:
pdfbook -2 -p a5 infile.pdf outfile.pdf
pdf-tools contains:
pdfbklt: create booklets
pdfrevert: Removes one layer of changes to a PDF file, trying to maximise the size of the output file (to account for linearised PDF).
pdfstamp: Adds the given string to the infile .pdf file at the given location, font and size.
There is also multivalent: http://multivalent.sourceforge.net/Tools/index.html

On Linux, you can convert the PDF files to Postscript and use psnup. The exact way to invoke it depends on exactly how you want the pages to be put together, whether you want them rotated, what paper size(s) you want to use, etc. but it'll be something like this:
pdf2ps infile.pdf infile.ps
psnup -2 infile.ps outfile.ps
ps2pdf outfile.ps outfile.pdf
Depending on what tools you have available, you might have a more efficient way to do this - psnup is certainly not the only way, but it's a relatively well-known program (on Linux anyway).

If you use Linux, you can use BookletImposer for putting multiple PDF pages on one single page.
For Ubuntu users, this tool is available at Ubuntu Software Center.

Check out this answer that uses Multivalent to impose PDF pages

In answer to your question, you'll need a PDF 'Imposition' tool, which is a fancy way of saying a tool that arranges PDF page images onto a particular array to create a NEW single PDF page. Imagine it's something like typesetting a newspaper. You define an array of slots a certain number of columns wide, by a certain number of rows deep, on a page of a certain fixed dimensions (in cm). Then you fill those empty slots top to bottom, left to right with pages from a pdf source-file. In the case of the OP, they want to create a single page, composed of two 8.5x11 pages arranged in a 1x2 array (1 column, 2 rows). Their pages will be dropped into the array in the following order: 1,2. So you are dropping the first page (page 1 of the pdf) into the first slot of your array (Column 1, row 1), and you are dropping the second page (page 2) into the second slot (Column 1, row 2).
How to use the tool to make this happen:
Download the old version of Multivalent. Note the author removed the good tool classes from the latest edition without explanation, so you have to use an older one. Here's a working link as of 02/12:
http://code.google.com/p/pdfsizeopt/downloads/detail?name=Multivalent20060102.jar
For simplicity, I renamed the jar file to m.jar.
It sort of goes without saying that you need to install JRE for this to work, but I'll put it out there.
Add m.jar to your Java Class Path Environment variables (for scripting) or run the command line syntax with the -cp option and the relative path (shown below). Note, I ran it FROM the command-line at the install directory in my example below. Provide an absolute path from root otherwise (like c:\1\bin\m.jar).
Here is an example that will accomplish exactly what OP posted about:
C:\1\bin>java -cp m.jar tool.pdf.Impose -dim 1x2 -verbose -papersize "21.59x55.88cm" -layout "1,2" yourfilename.pdf
Note, the -dim option creates the array in Columns x Rows. The -papersize is specified in centimeters here, but if you need inches, just multiply inches by 2.54 to get cm. The -layout option gives you the order you want to fill the empty slots in your array, filling from top to bottom and left to right. In this case, we want page one of the pdf on top and page two on the bottom, so our argument is "1,2". The final argument is your actual source file. The tool will output a file called yourfilename-up.pdf when you are done.
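To double-check the paper size arithmetic, here is a minimal Python sketch. It only reproduces the inches-to-cm conversion described above and prints the matching options; sheet_size_cm is a made-up helper name, and nothing here calls Multivalent itself.

```python
INCH_TO_CM = 2.54


def sheet_size_cm(page_w_in, page_h_in, cols, rows):
    """Return (width, height) in cm of a sheet holding cols x rows pages."""
    width = round(page_w_in * cols * INCH_TO_CM, 2)
    height = round(page_h_in * rows * INCH_TO_CM, 2)
    return width, height


if __name__ == "__main__":
    # Two 8.5x11 inch pages in a 1x2 array (1 column, 2 rows).
    w, h = sheet_size_cm(8.5, 11, cols=1, rows=2)
    print(f'-dim 1x2 -papersize "{w}x{h}cm"')
```

For other grids, change cols and rows and pass the resulting values to -dim and -papersize.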
Hope that helps.
-Matt Zweil

Check the source code of PyPDF, especially the rotateClockwise() method. There must be a place where the content of a page is written. Insert a "q" operator (save state) and "cm" (with the correct parameters for a scaling matrix) before the content and a "Q" operator (restore state) afterwards.
See the PDF documentation for an explanation of operators and the structure of a page (scroll to the bottom for some useful links).
Don't forget to send a patch to PyPDF :)
[EDIT] You might also want to check the pdfjam sources which include a pdfnup command.

This is a perl function I use to grab a directory full of prn files from a 3rd party app and create a single merged pdf.
sub runMerged($)
{
    my ($path) = @_;
    print "Generating merged PDFs for $path\n";
    # getFiles() and $BASE_PATH are defined elsewhere in my script
    my @files = sort(getFiles($path, ".prn\$"));
    if (scalar(@files))
    {
        open(MERGE, ">$path/merged.prn") or die "Cannot write $path/merged.prn: $!";
        for (my $i = 0; $i < scalar(@files); $i++)
        {
            print MERGE "^L\n" if ($i > 0);
            open(FN, "$path/" . $files[$i]) or die "Cannot read $files[$i]: $!";
            while (my $line = <FN>)
            {
                print MERGE $line;
            }
            close(FN);
        }
        # Flush the merged file to disk before converting it
        close(MERGE);
        chdir("$BASE_PATH/txt2pdf");
        print `./txt2pdf.pl $path/merged.prn`;
    }
}

I had a similar need this week.
But I needed to repeat each A4 page (landscape) twice on an A3 sheet (portrait), to later cut them in half.
I found an Acrobat plugin with tons of imposition features that worked great for my needs, and it has a fully functional 30-day trial.
I hope it is helpful for someone else.
http://www.pdfsnake.com/

I had the same issue as you and this is what I did:
Extracted all the pages in the pdf file as a separate file each
In irfanView (with plugins) I created a Vertical "Panoramic" image
Dragged the pdf files over to the images section
Clicked created
The "image" is created with all the pages following each other as one very long vertical image
You can export to PDF with almost no loss in quality.
Enjoy

Here's a script for repeating pages (like A5, landscape) twice on a sheet double this size (A4, portrait):
#!/bin/bash
INPUTFILE=$*
PAGENUM=`pdftk ${INPUTFILE} dump_data | grep NumberOfPages | cut -d : -f 2 | cut -d " " -f 2`
PAGES=`seq 1 ${PAGENUM}`
DUPAGES=`for i in ${PAGES} ; do echo $i $i | tr "\n" " " ; done`
OUTPUT1=`basename ${INPUTFILE} .pdf`.dup.pdf
OUTPUT2=`basename ${INPUTFILE} .pdf`.double.pdf
pdftk ${INPUTFILE} cat ${DUPAGES} output ${OUTPUT1}
pdfjam --nup 1x2 ${OUTPUT1} --outfile ${OUTPUT2}
It's not really elegant; it could be done without the second pdftk call, and it does not work with files containing spaces. But it works with multi-page pdfs.
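For what it's worth, here is a Python sketch of the same idea that copes with spaces in file names, because the arguments are kept as a list instead of being re-split by the shell. The function names are my own, and the page count is passed in rather than read from pdftk dump_data. Unlike basename, it keeps the directory part of the input path.

```python
def duplicated_pages(page_count):
    """Return each page number twice, in order: 1 1 2 2 3 3 ..."""
    return [str(p) for p in range(1, page_count + 1) for _ in range(2)]


def build_commands(input_pdf, page_count):
    """Build the pdftk and pdfjam argument lists used by the script above."""
    stem = input_pdf[:-4] if input_pdf.lower().endswith(".pdf") else input_pdf
    dup = stem + ".dup.pdf"
    double = stem + ".double.pdf"
    cat = ["pdftk", input_pdf, "cat", *duplicated_pages(page_count), "output", dup]
    nup = ["pdfjam", "--nup", "1x2", dup, "--outfile", double]
    return cat, nup


if __name__ == "__main__":
    for cmd in build_commands("my slides.pdf", 3):
        print(cmd)
```

Each list can be handed to subprocess.run(cmd, check=True) as is.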

Using Adobe Acrobat XI Pro, open the 1st document.
Edit, select "take a snapshot", then click at the top corner of what you want to copy and drag to the opposite corner.
Have an open clean sheet in Paint.
Click over to the clean sheet in paint and control V to paste the 1st document into the clean sheet.
Repeat for the 2nd item that you want to combine on the same page except position the 2nd item UNDER the 1st item temporarily in your Paint sheet.
Then drag the 2nd item to position it where you need it in the Paint Sheet.
Save the paint sheet file and you are done!

Related

Can man pass an option to the roff formatter?

SYNOPSIS
From man(1):
-l
Format and display local manual files instead of
searching through the system's manual collection.
-t
Use groff -mandoc to format the manual page to stdout.
From groff_tmac(5):
papersize
This macro file is already loaded at start-up by troff so it
isn't necessary to call it explicitly. It provides an interface
to set the paper size on the command line with the option
-dpaper=size. Possible values for size are the same as
the predefined papersize values in the DESC file (only
lowercase; see groff_font(5) for more) except a7–d7.
An appended l (ell) character denotes landscape orientation.
Examples: a4, c3l, letterl.
Most output drivers need additional command-line switches -p
and -l to override the default paper length and orientation
as set in the driver-specific DESC file. For example, use the
following for PS output on A4 paper in landscape orientation:
sh# groff -Tps -dpaper=a4l -P-pa4 -P-l -ms foo.ms > foo.ps
THE PROBLEM
I would like to use these to format local and system man pages to print out, but want to switch the paper size from letter to A4. Unfortunately I couldn't find anything in man(1) about passing options to the underlying roff formatter.
Right now I can use
zcat `man -w man` | groff -tman -dpaper=a4 -P-pa4
to format man(1) on stdout, but that's kind of long and I'd rather have man build the pipeline for me if I can. In addition the above pipeline might need changing for more complicated man pages, and while I could use grog, even it doesn't detect things like accented characters (for groff's -k option), while man does (perhaps using locale settings).
The man command is typically intended only for searching for and displaying manual pages on a TTY device, not for producing typeset and paper printed output.
Depending on the host system and/or the programs of interest, a fully typeset, printable form of a manual page can sometimes be generated when a program (or the whole system) is compiled. This is more common for system documents and less common for manual pages, though.
Note that depending on which manual pages you are trying to print there may be additional steps required. Traditionally the following pipeline would be used to cover all the bases:
grap $MANFILE | pic | tbl | eqn /usr/pub/eqnchar | troff -tman -Tps | lpr -Pps
Your best solution for simplifying your command line would probably be to write a little tiny script which encapsulates what you're doing. Note that man -w might find several different filenames, so you would probably want to print each separately (or maybe only print the first one).
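Such a wrapper could look roughly like the Python sketch below. It only builds the argument lists for the groff invocation quoted in the question (written as -t -man, which is the same as -tman); the function names are made up, and it assumes compressed pages end in .gz. It does not do any of man's locale or preprocessor detection.

```python
def groff_command(paper="a4", landscape=False):
    """Build the groff arguments for a given paper size."""
    size = paper + ("l" if landscape else "")
    cmd = ["groff", "-t", "-man", f"-dpaper={size}", f"-P-p{paper}"]
    if landscape:
        cmd.append("-P-l")
    return cmd


def pipeline_for(path, paper="a4"):
    """Return (reader, formatter) commands for one file printed by man -w."""
    reader = ["zcat", path] if path.endswith(".gz") else ["cat", path]
    return reader, groff_command(paper)


if __name__ == "__main__":
    # man -w may print several paths; handle each one separately.
    for path in ["/usr/share/man/man1/man.1.gz"]:
        reader, formatter = pipeline_for(path)
        print(" ".join(reader), "|", " ".join(formatter))
```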

Change 88 characters limit for Black plugin in PyCharm

I am using Black inside PyCharm to format my Python code.
I am using the Black-Pycharm plugin, unfortunately, selecting code and applying Black on it (Code > Reformat Code (BLACK)) cuts all my lines at 88 characters (the default limit for Black).
I want to change this limit to cut the lines at 80 characters. I tried two different ways:
Changing the Black exe path in the "Black plugin settings" from ~/.local/bin/black to ~/.local/bin/black -l80, but applying Black with PyCharm outputs this error: BlackPycharm: Cannot run program "/home/BCT/.local/bin/black -l80": error=2, File or folder not found
Using Black as an 'External Tool' in Pycharm (as described here), and specifying the line length in the arguments text box. This successfully applies Black on my file with the desired character limit, but:
It automatically saves/replaces my file with the new formatted file, I can't undo the change.
I can't apply Black on a portion of code only.
Do you know ways to use Black with:
The ability to specify a desired line length
The ability to reformat only a portion of code
at the same time ?
EDIT: Apparently PyCharm cannot use Black for only a portion of code...
I also had the same problem with adjusting the linelength for the black 'external tool'.
1- follow this link to install and set black as an external tool: https://black.readthedocs.io/en/stable/editor_integration.html#pycharm-intellij-idea
2- within the "PyCharm/IntelliJ IDEA" section, in "Arguments":
replace "$FilePath$" with "$FilePath$" -l 120
Note: the '-l 120' should be outside the quotes, and replace 120 by whatever line length you need.
Cheers!
Maher.
I use Black as an external tool within PyCharm, and I am able to specify the line length by adding a pyproject.toml file (see the PEP for more info) to my project's root directory. I don't have to pass anything as an argument. Maybe it can solve your problem. It looks like this:
# NOTE: you have to use single-quoted strings in TOML for regular expressions.
# It's the equivalent of r-strings in Python. Multiline strings are treated as
# verbose regular expressions by Black. Use [ ] to denote a significant space
# character.
[tool.black]
line-length = 79
target-version = ['py37', 'py38']
include = '\.pyi?$'
exclude = '''
/(
\.eggs
| \.git
| \.hg
| \.mypy_cache
| \.tox
| \.venv
| _build
| buck-out
| build
| dist
# The following are specific to Black, you probably don't want those.
| blib2to3
| tests/data
| profiling
)/
'''
@Nadros's solution is correct, but it should be noted that -l 120 has to be added to the FileWatcher that was created as well if you are using it to format files on save.

Stack PDF images: single page output

How can I stack PDF images (vertically) into a single page output PDF? I.e.:
|-----|
| 1 |
| 2 |
| ... |
|-----|
(See example below.) What I am looking for is a PDF equivalent of this tool that stacks SVG graphics.
Note that this is distinctly different from a multi-page combination
|-----|
| 1 |
|-----|
| 2 |
|-----|
| ... |
|-----|
which one would obtain using
gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=merged.pdf 1.pdf 2.pdf ...
(See this answer.)
Example
My goal is perfectly described by @KenS in the comments: I take pages 1 and 2, both of which are 612x792 points, which should become a PDF with a single page containing the marking content from page 1 at the top and the marking content from page 2 at the bottom. The size of this page should be 612x1584.
OK so the PostScript program from this answer will, I think, do the job. The way this works is that you set up the Ghostscript media size to be what you want the final output to look like, then you simply run the program through Ghostscript passing GS the name of the PDF file.
The program gets the current media size, and then attempts to fit the pages from the PDF onto that media. Obviously I don't have your test file but I believe if you set up GS to have media 612x1584 and then run it, then GS will decide that the pages fit best unscaled and unrotated. If that's not the case I'd need to see an example to figure out why.
Assuming you copy the program from the answer, and save it with the name 2-up.ps, the usage is in the comments at the start of the program:
% usage: gs -dNODISPLAY -sFile=____.pdf [-dVerbose] 2-up.ps
So you would need something like:
gs -dNODISPLAY -dDEVICEWIDTHPOINTS=612 -dDEVICEHEIGHTPOINTS=1584 -dFIXEDMEDIA -sDEVICE=pdfwrite -sOutputFile=out.pdf -sFile=<insert your full path and filename here> 2-up.ps
That will take the original PDF file (defined by -sFile), and try to create a 2-up representation of it, writing the output to a new PDF file.
Note the comments: this doesn't attempt to preserve metadata like hyperlinks (because these are page-based and will be wrong when the pages are renumbered), and it will only work with the current PDF interpreter in Ghostscript. It won't work with any other PostScript interpreter because the program uses internals of the Ghostscript PDF interpreter that it isn't really supposed to meddle with.
Oh, and the program assumes that all the pages in the PDF file are the same size, the size of the first page.
We're supposed to be adding more (better) support for imposition in Ghostscript in a future release.
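If it helps to work out the media size for other inputs, here is a small Python sketch of the arithmetic only. stack_layout and gs_media_flags are hypothetical helper names; the sketch assumes the default PDF coordinate system (origin at the bottom left) and does not run Ghostscript.

```python
def stack_layout(page_sizes):
    """Compute the sheet size and page offsets for a vertical stack.

    page_sizes is a list of (width, height) in points, top page first.
    Returns (sheet_width, sheet_height, offsets), where offsets[i] is the
    lower-left corner of page i on the sheet.
    """
    sheet_w = max(w for w, _ in page_sizes)
    sheet_h = sum(h for _, h in page_sizes)
    offsets = []
    y = sheet_h
    for _, h in page_sizes:
        y -= h
        offsets.append((0, y))
    return sheet_w, sheet_h, offsets


def gs_media_flags(sheet_w, sheet_h):
    """Format the Ghostscript media options used in the command above."""
    return [
        f"-dDEVICEWIDTHPOINTS={sheet_w}",
        f"-dDEVICEHEIGHTPOINTS={sheet_h}",
        "-dFIXEDMEDIA",
    ]


if __name__ == "__main__":
    w, h, offsets = stack_layout([(612, 792), (612, 792)])
    print(gs_media_flags(w, h), offsets)
```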

How can I drop metadata fields (e.g., PageLabel fields) from PDFs?

I have used pdftk to change the "Info" metadata associated with a PDF. I currently have several PDFs with extraneous page labels and I cannot figure how to drop them. This is what I am currently doing:
$ pdftk example_orig.pdf dump_data output page_labels.orig
$ grep -v PageLabel page_labels.orig > page_labels.new
$ pdftk example_orig.pdf update_info page_labels.new output example_new.pdf
This does not remove the PageLabel* metadata which can be verified with:
$ pdftk example_orig.pdf dump_data | grep PageLabel
How can I programmatically remove this metadata from the PDF? It would be nice to do this with pdftk, but if there is another tool or way to do this on GNU/Linux, that would also work for me.
I need this because I am using LaTeX Beamer to generate presentations with the \setbeameroption{show notes on second screen} option, which generates a double-width PDF for showing notes on a second screen. Unfortunately, there seems to be a bug in pgfpages which results in incorrect and extraneous PageLabels in these files (example). If I generate a slides-only PDF, it generates the correct PageLabels (example). Since I can generate a correct set of PageLabels, one solution would be to replace the page labels in the first example with those in the second. That said, since there are extra page labels in the first example, I would need to remove them first.
Using a text editor to remove PDF metadata
If it is the first time you edit a PDF, make a backup copy first.
Open your PDF with a text editor that can handle binary blobs. vim -b will be fine.
Locate the /Info dictionary. Overwrite all the entries you do not want any more completely with blanks (an entry consists of /Key names plus the (some values) following them).
Be careful not to use more spaces than there were characters initially. Otherwise your xref table (the ToC of PDF objects) will be invalidated, and some viewers will report the PDF as corrupted.
For additional measure, locate the /XML string in your PDF. It should show you where your XMP/XML metadata section is (not all PDFs have them). Locate all the key values (not the <something keys>!) in there which you want to remove. Again, just overwrite them with blanks and be careful not to change the total length (neither longer, nor shorter).
In case your PDF does not make the /Info dictionary accessible, transform it with the help of qpdf.
Use this command:
qpdf --qdf --object-streams=disable orig.pdf qdf---orig.pdf
Apply the procedure outlined above. (The qdf---orig.pdf now should be much better suited for editing in a text editor.)
Re-compact your edited file:
qpdf qdf---orig.pdf edited---orig.pdf
Done! Enjoy your edited---orig.pdf. Check if it has all the data removed:
pdfinfo -meta edited---orig.pdf
Update
After looking at the sample PDF files provided, it became clear to me that the /PageLabel key is not part of the /Info dictionary (PDF's Document Information Dictionary), but of the /Root object.
That's probably one reason why pdftk was unable to update it with the method the OP described.
The other reason is the following: the PDF which the OP quoted as containing the correct page labels does in fact contain incorrect ones!
Logical Page No. | Page Label
-----------------+-----------
               1 | 1
               2 | 2
               3 | 2
               4 | 2
               5 | 2
               6 | 4
The other PDF (which supposedly contains extraneous page labels) is incorrect in a different way:
Logical Page No. | Page Label
-----------------+-----------
               1 | 1
               2 | 1
               3 | 2
               4 | 2
               5 | 2
               6 | 4
My original advice about how to manually edit the classical metadata of a PDF remains valid. For the case of editing page labels you can apply the same method with a slight variation.
In the case of the OP's example files, a complication comes into play: the /Root object is not directly accessible, because it is hidden inside a compressed object stream (PDF object type /ObjStm). That means one has to decompress it with the help of qpdf first:
Use qpdf:
qpdf --qdf --object-streams=disable example_presentation-NOTES.pdf q-notes.pdf
Open the resulting file in binary mode with vim:
vim -b q-notes.pdf
Locate the 1 0 obj marker for the beginning of the /Root object, containing a dictionary named /PageLabels.
(a) To disable page labels altogether, just replace the /PageLabels string by /Pagelabels, using a lowercase 'l' (PDF is case sensitive, and will no longer recognize the keyword; you yourself could at some other time restore the original version should you need it.)
(b) To edit the page labels, first see how the consecutive labels for pages 1--6 are being referred to as
<feff0031>
[....]
<feff0032>
[....]
<feff0032>
[....]
<feff0032>
[....]
<feff0033>
[....]
<feff0034>
(These values are in BOM-marked hex, meaning 1, 2, 2, 2, 3, 4...)
Edit these values to read:
<feff0031>
[....]
<feff0032>
[....]
<feff0033>
[....]
<feff0034>
[....]
<feff0035>
[....]
<feff0036>
Save the file and run qpdf again in order to re-compress the PDF:
qpdf q-notes.pdf notes.pdf
These now hopefully are the page labels the OP is looking for....
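If you would rather not work out the hex strings by hand, this small Python sketch converts between a label and the BOM-marked hex form shown above. The function names are made up for illustration; it only handles the string encoding, not the surrounding /PageLabels structure.

```python
def label_to_hex(label):
    """Encode a label as a BOM-marked UTF-16BE hex string, e.g. <feff0031>."""
    return "<feff" + label.encode("utf-16-be").hex() + ">"


def hex_to_label(token):
    """Decode a BOM-marked hex string back into text."""
    raw = bytes.fromhex(token.strip("<>"))
    # The utf-16 codec reads the byte order from the BOM and drops it.
    return raw.decode("utf-16")


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, label_to_hex(str(n)))
```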
Since the OP seems to be familiar with editing the output of pdftk's dump_data, he can possibly edit that output and use update_info to apply the fix to the PDF without needing to resort to qpdf and vim.
Update 2:
User @Iserni posted a very good, short and working answer, which limits itself to one command, pdftk, which the OP seems to be familiar with already, plus sed -- not needing to use a text editor to open the PDF, and not introducing an additional utility qpdf like my answer did.
Unfortunately @Iserni deleted it again after a comment of mine. I think his answer deserves to get the bounty and I call you to vote to "undelete" his answer!
So temporarily, I'll include a copy of @Iserni's answer here, until his is undeleted again:
Not sure if I correctly understood the problem. You can try with a butcher's solution: brute force replace the /PageLabels block with a different one which will not be recognized.
# Get a readable/writable PDF
pdftk file1.pdf output temp.pdf uncompress
# Mangle the PDF. Keep same length
sed -e 's|^/PageLabels|/BageLapels|g' < temp.pdf > mangled.pdf
# Recompress
pdftk mangled.pdf output final.pdf compress
# Remove temp file
rm -f temp.pdf mangled.pdf
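The mangling step can also be sketched in Python, assuming the PDF has already been uncompressed (for example with pdftk ... uncompress as above). mangle_key is a hypothetical helper; unlike the sed expression it replaces the key anywhere in the file, not only at the start of a line, and it refuses replacements that would change the file length.

```python
def mangle_key(pdf_bytes, old=b"/PageLabels", new=b"/BageLapels"):
    """Replace a PDF key with an unrecognized one of the same length."""
    if len(old) != len(new):
        raise ValueError("replacement must keep the same length")
    return pdf_bytes.replace(old, new)


if __name__ == "__main__":
    sample = b"<< /Type /Catalog\n/PageLabels 5 0 R >>"
    mangled = mangle_key(sample)
    # Same length, so byte offsets elsewhere in the file stay valid.
    print(len(sample) == len(mangled), mangled)
```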

How to annotate PS or PDF from (Linux) command line without losing quality?

Is there any command line tool for Linux that will allow me to annotate a PS or PDF file with text of a particular font, color, and size with no loss of quality? I have tried ImageMagick's convert, and the resulting PDF is of pretty poor quality.
I have a template originally authored in Adobe Illustrator, and I would like to generate PDFs from it with names in certain places. I have a huge list of names, so I would like to do this in a batch (not interactively).
If anyone has any ideas I'd appreciate hearing them.
Thanks,
Carl
Another way to accomplish this would be to hack the postscript file itself. It used to be that AI files were postscript files, and you could modify them directly; I don't know if that's true anymore. So you may have to export it.
For simplicity, I assume there's a single page. Therefore, at the very end there will be a single call to showpage (perhaps through another name). Any drawing commands performed before showpage will show up on the page.
You may need to reinitialize the graphics state (initgraphics), as the rest of the document may have left it all funny, expecting showpage to clean up before anyone notices.
To place text, you'll need to set a new font (the old one was invalidated by initgraphics) and measure the location in points (72 points/inch, 28.3465 points/cm).
/Palatino-Roman 17 selectfont %so much prettier than Times
x y moveto
(new text) show
To do the merging, you can use perl: emit the beginning of the document as a HERE-document, construct some text-writing lines by program, emit the tail of the document. Here's an example of generating postscript with PERL
Or you can take data from the command-line (with ghostscript) by using the -- option ($gs -q -- program.ps arg1 arg2 ... argn). These arguments are accessible to the program through an array named /ARGUMENTS.
So, say you have a nice graphic of a scary clown holding a blank sign about 1 inch wide, 3 inches tall, top left corner at 4 inches from the left, 4 inches from the bottom. You can insert this code into the ps program, just before showpage.
initgraphics
/Palatino-Roman 12 selectfont
4 72 mul 4 72 mul moveto
ARGUMENTS {
gsave show grestore 0 -14 rmoveto
} forall
Now you can make him say funny things ($gs -- clown.ps "On a dark," "and stormy night...").
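For batch work with a long list of names, the same snippet can be generated by a program. Here is a rough Python sketch, assuming you splice its output into the PostScript file just before showpage yourself; ps_escape and annotation_ps are made-up names, and the font, size and 14-point line spacing are simply copied from the example above.

```python
def ps_escape(text):
    """Escape backslashes and parentheses for a PostScript string literal."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def annotation_ps(lines, x_in, y_in, font="Palatino-Roman", size=12, leading=14):
    """Build PostScript that shows lines of text starting at (x_in, y_in) inches."""
    x, y = x_in * 72, y_in * 72  # 72 points per inch
    out = ["initgraphics", f"/{font} {size} selectfont", f"{x:g} {y:g} moveto"]
    for line in lines:
        out.append(f"gsave ({ps_escape(line)}) show grestore 0 -{leading} rmoveto")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(annotation_ps(["On a dark,", "and stormy night..."], 4, 4))
```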
I think it's better to create PDF form and fill it with pdftk fill_form in batch:
$ pdftk form.pdf fill_form data.fdf output out.pdf flatten
Form data should be in Forms Data Format (FDF) or its XML variant, XFDF; either way it is just a file with field names and values specified.
Note the flatten command. It is required to convert filled form to plain document.
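Since pdftk fill_form accepts XFDF (the XML flavour of FDF) as well, the data files for a batch run can be generated with a few lines of Python. This is only a sketch: the field name "name" is an assumption and has to match whatever the field is called in your form.

```python
from xml.sax.saxutils import escape, quoteattr


def xfdf(fields):
    """Build an XFDF document from a dict of field name -> value."""
    rows = "\n".join(
        f"    <field name={quoteattr(name)}><value>{escape(value)}</value></field>"
        for name, value in fields.items()
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">\n'
        "  <fields>\n" + rows + "\n  </fields>\n"
        "</xfdf>\n"
    )


if __name__ == "__main__":
    # "name" is a hypothetical field name; use the one defined in form.pdf.
    print(xfdf({"name": "Carl & Co"}))
```

Write one such file per name and pass it to pdftk in place of data.fdf.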
Another way is to create set of PDF documents "with names in certain places" and transparent background, and pdftk stamp each of them over the template:
$ pdftk template.pdf stamp words.pdf output out.pdf