PDF cover page printed as the only page on the sheet, without inserting empty pages

I have a make-based workflow which uses pdftk to merge a cover with a LaTeX document. The cover is generated by the following recipe (if that matters):
$(COVER) : $(patsubst %.pdf,%.svg,$(COVER))
inkscape $< --export-area-page --batch-process --export-type=pdf --export-filename=$@ || inkscape --file=$< --export-area-page --export-area-drawing --without-gui --export-pdf=$@
When I view the document or print it single-sided, everything looks fine. But when I print it double-sided, the first page of the document ends up on the back of the cover. The obvious solution is to add a blank page, but that ruins viewing and single-sided printing.
How can I suggest (not necessarily force) to the printer that the cover page be printed single-sided by default? Preferably without losing hyperlinks in either the cover or the document. I am looking for something that marks a page as "the only page on the sheet" or "the first page on the sheet" when printed.
I can use any tool available in Ubuntu, preferably commandline (so I can use it in the Makefile) or a Python package (so I can write an appropriate commandline tool myself).
The answer to "Insert blank pages into PDF when printing" does not help, as it does not cover hiding the blank pages when viewing or printing single-sided.

Related

Interactive PDF (for Mobile) - Is there a way to navigate to a page within a PDF and ensure that the user starts at the top of the page?

I'm creating an interactive PDF to be viewed on a mobile device (it's not an ideal option, but this is the direction we need to pursue).
Currently, I have tested buttons and hyperlinks, and I can easily navigate through the PDF. However, my view following the jump is centred on the page (content is cut off on the page above and below).
I have tried a variety of hyperlink anchoring types, but I'm not having any luck.
I thought the solution might be to use navto:// within a "Go to URL" hyperlink, but I have been unsuccessful with that as well.
Is there a way to navigate to a page within a PDF and ensure that the user is brought to the top of the page?
If you are willing to post-process the file, cpdf -open-at-page <n> in.pdf -o out.pdf should set this up, for a given value of <n>. See Chapter 11 of the manual for details.
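If the post-processing step is driven from a script, the cpdf call quoted above can be wrapped as below. This is a minimal sketch that assumes cpdf is on the PATH; the helper names open_at_page_cmd and set_open_page are hypothetical, and only the command line itself comes from the answer.

```python
import subprocess


def open_at_page_cmd(page, src, dst):
    """Build the argument list for `cpdf -open-at-page <n> in.pdf -o out.pdf`."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return ["cpdf", "-open-at-page", str(page), src, "-o", dst]


def set_open_page(page, src, dst):
    """Run cpdf (assumed to be installed); raises CalledProcessError on failure."""
    subprocess.run(open_at_page_cmd(page, src, dst), check=True)
```

For example, set_open_page(1, "in.pdf", "out.pdf") would produce out.pdf set to open at page 1.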

Pandoc, make PDF for printing purposes, links into footnotes

I have a markdown document which I would like to convert to a PDF.
I will then be printing this PDF onto paper.
I would like to know how to convert my markdown into a PDF so that there are no digital elements the reader has to click on, since the document will be printed and readers will not be able to click anything.
I would like anything the reader has to click on to be converted into a footnote or some other visual indicator.
The two main offenders I have found are web links and section links.
I am using the -V links-as-notes=true option to make all web links into footnotes.
However, section links (e.g. [section name](#section-name)) are still digital elements which must be clicked.
I have looked into the options pandoc exposes for customizing the packages it uses, but I have not found anything relevant.
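One possible approach (not from the thread) is a pandoc JSON filter that replaces internal links, i.e. links whose target starts with "#", with their plain text, while leaving web links alone so that -V links-as-notes=true still turns them into footnotes. This is a sketch assuming pandoc's JSON AST layout for Link, {"t": "Link", "c": [attr, inlines, [url, title]]}; the file name unlink_internal.py is hypothetical.

```python
#!/usr/bin/env python3
"""Pandoc JSON filter: replace internal links ([text](#anchor)) with their text."""
import json
import sys


def is_internal_link(node):
    # In pandoc's JSON AST a Link is {"t": "Link", "c": [attr, inlines, [url, title]]}.
    return (isinstance(node, dict) and node.get("t") == "Link"
            and node["c"][2][0].startswith("#"))


def unlink_internal(node):
    """Return a copy of the AST with internal links replaced by their inlines."""
    if isinstance(node, list):
        result = []
        for item in node:
            if is_internal_link(item):
                # Splice the link's own (recursively processed) text into the parent list.
                result.extend(unlink_internal(item["c"][1]))
            else:
                result.append(unlink_internal(item))
        return result
    if isinstance(node, dict):
        return {key: unlink_internal(value) for key, value in node.items()}
    return node


if __name__ == "__main__" and len(sys.argv) > 1:
    # pandoc runs a filter with the target format as its first argument.
    json.dump(unlink_internal(json.load(sys.stdin)), sys.stdout)
```

It would be used as pandoc doc.md --filter ./unlink_internal.py -V links-as-notes=true -o doc.pdf. The section name stays in the text, but no page reference is added; that would need further work.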

Adjust PDF scale to print

In the context of my studies I often receive PDF files written in LaTeX, with big margins.
When I have to print those files, I like to print them 2 pages per sheet to save paper. But then there is a lot of white space and the text is quite small.
So I'm looking for a way to scale the page contents first and only then print them 2 pages per sheet, to avoid losing space and to have the text as big and readable as possible.
Does anyone have an idea of how I could do that, either programmatically, with a script, or as a series of step-by-step commands?
(Note that I have no access to the LaTeX code, otherwise I would just change the margins...)
I used FinePrint to do this on Windows. But there are some alternatives, which I haven't tried:
https://superuser.com/questions/190869/fineprint-alternative-on-linux
https://superuser.com/questions/107687/good-virtual-printers-with-cropping-for-windows-and-linux
Here are previous answers (all mine) which provide building blocks that will help you construct your own programmatic or scripted or "some step-by-step commands" solution:
PDF Manipulation: "2-Up" page layout (SuperUser)
Linux-based tool to chop PDFs into multiple pages (SuperUser)
Convert PDF 2 sides per page to 1 side per page (SuperUser)
How can I split a PDF's pages down the middle? (SuperUser)
Cropping a PDF using Ghostscript 9.01 (StackOverflow)
Split one PDF page into two (StackOverflow)
PDF - Remove White Margins (StackOverflow)
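To see why cropping before printing 2-up helps, the scale factor can be computed directly: the content box is scaled uniformly until it fits one half of the sheet. This sketch assumes A4 paper (rounded to 595 x 842 points) and a hypothetical 1-inch (72 pt) margin removed on every side; the helper fit_scale is illustrative.

```python
def fit_scale(content_w, content_h, slot_w, slot_h):
    """Largest uniform scale factor that fits a content box into a slot."""
    return min(slot_w / content_w, slot_h / content_h)


# Sizes in PostScript points (1/72 inch), A4 rounded.
A4_W, A4_H = 595, 842             # portrait A4 page
SLOT_W, SLOT_H = A4_H / 2, A4_W   # one half of a landscape A4 sheet (2-up)
MARGIN = 72                       # hypothetical margin cropped on every side

uncropped = fit_scale(A4_W, A4_H, SLOT_W, SLOT_H)
cropped = fit_scale(A4_W - 2 * MARGIN, A4_H - 2 * MARGIN, SLOT_W, SLOT_H)
print(f"2-up without cropping: {uncropped:.3f}")
print(f"2-up after cropping:   {cropped:.3f} ({cropped / uncropped:.2f}x larger text)")
```

Under these assumptions the factor rises from about 0.707 to about 0.852, so text comes out roughly 21% larger than with plain 2-up printing.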

Combining PDF with GhostScript: Using Original Bookmarks with corrected page numbers

I am using
gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=book.pdf -f front-matter.pdf fulltext-0.pdf fulltext-1.pdf back-matter.pdf
to create a single PDF document from a series of PDF documents. I was going to create a new table of contents and include it using the pdfmark mechanism. Then I noticed that the original files already have bookmarks in them; however, they reference the original page numbers, not those in the combined document.
I am looking for one of two possible solutions: remove the original bookmarks, or keep the original bookmarks but somehow update their page references...
As is so often the case, someone has walked the same path before you...
unfolding disasters has worked out a solution to this very problem. His Python script pdf-merge.py first invokes pdftk with its dump_data switch to retrieve all the pdfmark information. It then keeps track of the number of pages in each merged document and offsets each bookmark's page number by the total page count of all the PDF documents included before the current one. So it is close to, but not the same as, the 2-pass approach of KenS: it first discovers bookmarks using pdftk and then creates a new bookmark file with the correct page numbers. It also manages to turn the original pdfmark instructions (which gs would normally preserve) into no-ops. I won't pretend I understand how that last part works...
However, the script does all I need including the option of tweaking the bookmark file before the final writing. Very neat and hat tip to Trevor King.
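The offset arithmetic described above can be sketched as follows. This is not the code of pdf-merge.py; it is a minimal illustration assuming pdftk's dump_data output format (NumberOfPages, BookmarkTitle, BookmarkLevel, BookmarkPageNumber lines). The function names are hypothetical, and non-ASCII titles, which dump_data writes as entities, are not decoded.

```python
import subprocess


def pdftk_dump_data(path):
    """Return the output of `pdftk <path> dump_data` (pdftk must be installed)."""
    return subprocess.run(["pdftk", path, "dump_data"], capture_output=True,
                          text=True, check=True).stdout


def parse_dump_data(text):
    """Extract (page count, [(title, level, page), ...]) from dump_data text."""
    num_pages = 0
    bookmarks = []
    title = level = None
    for line in text.splitlines():
        key, _, value = line.partition(": ")
        if key == "NumberOfPages":
            num_pages = int(value)
        elif key == "BookmarkTitle":
            title = value
        elif key == "BookmarkLevel":
            level = int(value)
        elif key == "BookmarkPageNumber":
            # The page number is the last field of each bookmark record.
            bookmarks.append((title, level, int(value)))
    return num_pages, bookmarks


def offset_bookmarks(docs):
    """Shift each file's bookmark pages by the page counts of all files before it."""
    merged = []
    offset = 0
    for num_pages, bookmarks in docs:
        merged.extend((title, level, page + offset) for title, level, page in bookmarks)
        offset += num_pages
    return merged
```

For the files in the gs command above, one would call parse_dump_data(pdftk_dump_data(f)) for each file in merge order and pass the resulting list to offset_bookmarks.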
In general pdfwrite doesn't know you are appending files, so it preserves bookmark and other 'metadata' information on the assumption that you will want it in the output.
However, when you are combining PDF files, preserving the information won't work, as the page numbers for the second and subsequent files will be incorrect.
So you need a 2-pass approach: first merge all the files, discarding the bookmarks, then 'convert' the merged file and add pdfmarks to set the correct bookmarks.
There is currently no option (with pdfwrite) to not preserve bookmarks. You will need to modify the Ghostscript PDF interpreter PostScript files to achieve this I think. You might try setting -dDOPDFMARKS=false, but I doubt that will work.
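For the second pass, the corrected bookmarks have to be written as /OUT pdfmark operators, where /Count gives the number of direct children of an outline entry. This is a sketch that assumes bookmarks are given in outline order as (title, level, page) tuples with levels starting at 1; the function names are hypothetical, and titles with non-ASCII characters would need a different string encoding than the plain escaping used here.

```python
def escape_ps(text):
    """Escape characters that are special inside a PostScript (string)."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def to_pdfmarks(bookmarks):
    """Turn [(title, level, page), ...] into /OUT pdfmark lines.

    /Count is the number of direct children; a positive value means the
    entry is shown expanded.
    """
    lines = []
    for i, (title, level, page) in enumerate(bookmarks):
        children = 0
        for _, later_level, _ in bookmarks[i + 1:]:
            if later_level <= level:
                break  # we have left this entry's subtree
            if later_level == level + 1:
                children += 1
        count = f" /Count {children}" if children else ""
        lines.append(f"[/Title ({escape_ps(title)}) /Page {page}{count} /OUT pdfmark")
    return "\n".join(lines)
```

The resulting lines would be saved to a file and given to the second Ghostscript pass as an extra PostScript input after the merged PDF; whether the original bookmarks are also carried over is the separate problem discussed above.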

Is there a way to programmatically remove all blank pages from a PDF file?

Nowadays it is more practical to purchase an ebook than the dead-tree version. But the PDFs frequently contain the blank pages used by the print edition. I typically see between 10 and 30 blank pages (or pages with the text "This page intentionally left blank.") per ebook. Is it possible to programmatically remove these blank pages? Currently I manually identify the blank pages and then run the file through this:
pdftops orig.pdf - | psselect "$range_of_non_blank_pages" | ps2pdf - new.pdf
So the hard part is identifying the blank pages. pdftotext would work for the most part, except where the page has only images and no text.
Also, even after removing many pages and seeing the resulting file size is smaller, after shrinking both the original file and the new version (using various methods found on the internets), the original file is usually smaller by several hundred KB or more. So it appears the method I'm using to remove the blank pages doesn't create an optimal pdf. I've also tried various gui programs and see the same results in this respect.
Partial answer: you don't need to go via PostScript (this is probably why you get a bigger file). One possibility is
pdftk orig.pdf cat "$range_of_non_blank_pages" output new.pdf
To identify blank pages, you'd need to use a tool that can go beyond selecting and reassembling pages. Try a library for a scripting language, for example CAM::PDF or PDF::API2 in Perl.
I don't know of an open source solution that can detect and remove blank pages. However, Apago's commercial PDF Enhancer can automatically remove blank pages -- both vector and scanned. For scanned, it can remove scan artifacts such as black edges, hole punches and noise prior to determining if page is blank.
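For the text-based case from the question, the pdftotext output can be split into pages and turned into page ranges for the pdftk cat command above. This is a sketch assuming pdftotext's default behaviour of ending each page with a form feed; the helper names are hypothetical, and, as noted in the question, image-only pages would be misclassified as blank.

```python
import re
import subprocess

# A page counts as blank if, after collapsing whitespace, it is empty or only
# says "This page (is) intentionally left blank" (case-insensitive).
BLANK = re.compile(r"(this page (is )?intentionally left blank\.?)?", re.IGNORECASE)


def page_texts(pdf_path):
    """Run pdftotext (poppler-utils) and split its output into pages."""
    out = subprocess.run(["pdftotext", pdf_path, "-"], capture_output=True,
                         text=True, check=True).stdout
    return split_pages(out)


def split_pages(text):
    pages = text.split("\f")  # by default pdftotext ends each page with a form feed
    if pages and pages[-1] == "":
        pages.pop()  # drop the empty piece after the final form feed
    return pages


def non_blank_pages(pages):
    """1-based numbers of pages whose text is not blank."""
    return [n for n, text in enumerate(pages, start=1)
            if not BLANK.fullmatch(" ".join(text.split()))]


def to_ranges(numbers):
    """Compress sorted page numbers into pdftk-style ranges, e.g. ['1-2', '5']."""
    ranges = []
    start = prev = None
    for n in numbers:
        if start is None:
            start = prev = n
        elif n == prev + 1:
            prev = n
        else:
            ranges.append(f"{start}-{prev}" if start != prev else str(start))
            start = prev = n
    if start is not None:
        ranges.append(f"{start}-{prev}" if start != prev else str(start))
    return ranges
```

The ranges would then be passed as separate arguments, for example subprocess.run(["pdftk", "orig.pdf", "cat", *to_ranges(non_blank_pages(page_texts("orig.pdf"))), "output", "new.pdf"], check=True).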