does pandoc have option for putting multiple "pages" per "sheet"? - pdf

I've been using a shell script with pandoc to create multi-page PDF files. I can specify the size of the page, but if we consider this "page" to be a sheet of paper (the PDF gets printed onto paper), I actually want multiple pages per sheet: 2 pages on each side of the paper. So 4 pages get printed on each sheet of paper (and the paper gets folded in half).
Depending on the style of printing, the ordering of pages is different:
For "booklet" printing, where all the sheets are stacked together and then folded once together, the page ordering of an 8-page (2 sheets of paper) document would be pages 8 & 1, on the back of that pages 2 & 7, on the next sheet pages 6 & 3, and on its back pages 4 & 5.
For "book" style printing, where each sheet gets folded on its own and the folded sheets are then placed together, the order is different: pages 4 & 1 with 2 & 3 on the back, and on the next sheet pages 8 & 5 on one side with 6 & 7 on the other.
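The two orderings described above can be sketched in a few lines of Python. This is only an illustration of the page arithmetic, not something pandoc provides; the function names `signature_order`, `booklet_order` and `book_order` are hypothetical, and the page count is assumed to be a multiple of 4.

```python
def signature_order(first, count):
    """Return (left, right) page pairs, side by side, for one folded stack.

    `first` is the first page number in the stack and `count` the number
    of pages in it (assumed to be a multiple of 4, i.e. whole sheets).
    """
    if count <= 0 or count % 4:
        raise ValueError("page count must be a positive multiple of 4")
    last = first + count - 1
    sides = []
    for i in range(count // 4):                          # one iteration per sheet
        sides.append((last - 2 * i, first + 2 * i))      # front of the sheet
        sides.append((first + 2 * i + 1, last - 2 * i - 1))  # back of the sheet
    return sides


def booklet_order(n_pages):
    """All sheets stacked and folded together: one big signature."""
    return signature_order(1, n_pages)


def book_order(n_pages):
    """Each sheet folded on its own: one 4-page signature per sheet."""
    if n_pages <= 0 or n_pages % 4:
        raise ValueError("page count must be a positive multiple of 4")
    sides = []
    for first in range(1, n_pages + 1, 4):
        sides.extend(signature_order(first, 4))
    return sides


print(booklet_order(8))  # [(8, 1), (2, 7), (6, 3), (4, 5)]
print(book_order(8))     # [(4, 1), (2, 3), (8, 5), (6, 7)]
```

Each tuple is one side of a sheet, so every two consecutive tuples make up the front and back of one sheet of paper.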
The desktop publishing program Scribus (among other design-to-print software) has functionality for ordering the pages like this (for reference, this article describes the situation I have in mind). But I don't want to use a GUI like Scribus. I'm writing the pages in markdown in Vim and generating the PDF from the command line.
Does pandoc have a way of ordering pages like this?

As you state in the comments, "It might be that pandoc can NOT do this", at least not without some TeX fettling.
I previously gave an answer on TeX Stack Exchange, with visuals of the flat layout, showing a versatile way that 32-up imposition can be done in LaTeX (so an 8-page document should not be a problem if you trim that answer down).
see https://tex.stackexchange.com/questions/494047/a-tex-script-to-impose-multiple-layout-signatures-optionally-saddle-stitch-fro/494232#494232
Simple imposition (booklets and n-up print compositing) can be done by other PDF-oriented CLI tools that Pandoc uses, but the last time I built a LaTeX program (linked above) it was a pain to get right.
More recently I wrote a similar function to work inside the SumatraPDF reader. For that I used simple CMD batch commands via either N-Up-PDF or cPDF, as they do the basic stuff like rotating, joining pages 2-up and reordering into booklets, but you may need to adapt them to suit your own use.
My RTFManual (Rich Text Format) for install/usage is available as a PDF:
https://github.com/GitHubRulesOK/MyNotes/raw/master/AppNotes/SumatraPDF/Addins/N-Up-PDF/N-Up-PDF.pdf

Related

Adjust PDF scale to print

In the context of my studies I often receive PDF files written in LaTeX, with big margins.
When I have to print those files, I like to print them with 2 pages per sheet to save paper. But I then get a lot of white space and the text is quite small.
So I'm looking for a way to scale the page contents first and only then print them 2 pages per sheet, to avoid losing space and to have the text as big and readable as possible.
Does anyone have an idea of how I could do that, either programmatically, scripted, or on a "step-by-step commands" basis?
(Note that I have no access to the LaTeX code, otherwise I would just change the margins...)
I used FinePrint to do this on Windows. But there are some alternatives, which I haven't tried:
https://superuser.com/questions/190869/fineprint-alternative-on-linux
https://superuser.com/questions/107687/good-virtual-printers-with-cropping-for-windows-and-linux
Here are previous answers (all mine) which provide building blocks that will help you construct your own programmatic or scripted or "some step-by-step commands" solution:
PDF Manipulation: "2-Up" page layout (SuperUser)
Linux-based tool to chop PDFs into multiple pages (SuperUser)
Convert PDF 2 sides per page to 1 side per page (SuperUser)
How can I split a PDF's pages down the middle? (SuperUser)
Cropping a PDF using Ghostscript 9.01 (StackOverflow)
Split one PDF page into two (StackOverflow)
PDF - Remove White Margins (StackOverflow)
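To see why cropping before the 2-up step matters, here is a rough sketch of the arithmetic. It assumes A4 pages (595 x 842 PostScript points), two pages side by side on a landscape A4 sheet, and a purely hypothetical margin of 100 points on every side; `fit_scale` is an illustrative helper, not part of any of the tools linked above.

```python
def fit_scale(content_w, content_h, slot_w, slot_h):
    """Largest uniform scale at which the content still fits in the slot."""
    return min(slot_w / content_w, slot_h / content_h)


# A4 in PostScript points; each 2-up slot is half of a landscape A4 sheet.
A4_W, A4_H = 595.0, 842.0
SLOT_W, SLOT_H = A4_H / 2, A4_W

MARGIN = 100.0  # hypothetical white margin on every side of the source page

# Scale when the full page (margins included) is placed in the slot.
plain = fit_scale(A4_W, A4_H, SLOT_W, SLOT_H)

# Scale when the page is first cropped to its content area.
cropped = fit_scale(A4_W - 2 * MARGIN, A4_H - 2 * MARGIN, SLOT_W, SLOT_H)

print(f"uncropped: {plain:.2f}, cropped first: {cropped:.2f}")
```

With these assumed numbers the text comes out at roughly 0.71 of its original size without cropping and roughly 0.93 with cropping, which is the gain the question is after.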

A Table of Contents Page for a Scanned PDF

I was given some really old but very useful hand-written notes recently and in a bid to preserve them, I had them scanned into a file in the PDF format. What I have is a 35 page PDF but I want to add a contents page at the beginning so that I can use the first page to click my way to a specific topic.
More precisely,
I want a page which says
Topic 1
Topic 2
Topic 3
...
Each one should be linked to a page of my choosing.
I've explored a lot of standard tools out there to help me with this, like LibreOffice, pdftk etc., but the solution does not appear to be in the form of a simple application and a few clicks. My hunch is that this will require a program written in a suitable language. The way I'd want this program to work is as follows:
ProgramName Input.pdf CustomTOC.txt
Where CustomTOC.txt could be a simple ASCII table containing two columns, one column being the title and the second column being the page number. The output of this program will be another PDF file which contains one page appended at the beginning of Input.pdf containing a table of contents with hyperlinks to the right pages.
I have managed to solve this problem, though I don't think this is the best way to do it. I have written a Python program that accepts two mandatory inputs: the input PDF file and a '|'-separated ASCII table containing titles and page numbers. A third, optional argument can be the name of the PDF file to write the output to. If this is not provided then the original input file is overwritten.
How does the code work? It uses a system call to 'pdftk' to burst the PDF file into its constituent pages. It then writes a .tex file which contains a \listoffigures command for the first page, with the hyperref package ensuring that the entries link to the figures. The later part of the .tex code contains several figure insertion statements where the PDF file corresponding to each page is inserted, providing captions only for those pages that have an entry in the provided TOC table.
Why is the code not ideal? It has too many dependencies. It relies on a system call to the pdftk package, and it requires that LaTeX also be installed on the machine, along with the graphics package. In the current version of the code, the PDFs on each page have some offset, which I am trying to fix using the geometry package with custom margin settings. I will try to post the code once this problem is solved.
A more ideal solution would be one that does not require LaTeX and uses some PDF library within Python to achieve the same effect. Comments and suggestions welcome!
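Until the full program is posted, here is a minimal sketch of just the table-reading step, using only the standard library. The name `parse_toc` and the exact file layout (one `title|page` entry per line) are assumptions based on the description above. It also records where each target ends up once the contents page has been prepended, since every original page shifts down by one.

```python
def parse_toc(text, toc_pages=1):
    """Parse a '|'-separated table of 'title|page' lines.

    Returns a list of (title, page_in_input, page_in_output) tuples, where
    page_in_output accounts for `toc_pages` pages prepended to the PDF.
    """
    entries = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # ignore empty lines
        # Split on the last '|' so that titles may themselves contain '|'.
        title, sep, page = line.rpartition("|")
        if not sep:
            raise ValueError(f"line {lineno}: expected 'title|page'")
        page_in_input = int(page)
        entries.append((title.strip(), page_in_input, page_in_input + toc_pages))
    return entries


sample = "Topic 1|3\nTopic 2 | 10\n"
for title, src, dst in parse_toc(sample):
    print(f"{title}: page {src} of Input.pdf, page {dst} of the output")
```

The resulting list could then be handed to whatever generates the linked contents page, whether that is the LaTeX route described above or a PDF library.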

Build a PDF file from parts of another

I have a template for a Hipster PDA (you remember those, don't you?) that shows four copies of the same card on one page, then four copies of the next card on the next, and so forth. I would like to rearrange things so that each page only has one copy of each card, so I can print four distinct cards to a page without wasting a lot of paper. I did something vaguely similar to this years ago, but that involved hand-editing a lot of PostScript and took forever to do. I would like some sort of command-line solution that would cut a different quadrant from each page and then paste four of them onto a single new page.
You might try and get what you want in two steps:
Set up the CropBox for each of the pages so that only one copy of a card lies within the CropBox.
Use PDF imposition software to make new pages from 4 "old" ones.
For the latter you could try the Multivalent Impose tool.
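For the first step, the CropBox coordinates are simple to compute. The sketch below assumes the four cards sit in equal quadrants of the page and that page n should keep quadrant (n - 1) mod 4, so that four consecutive pages contribute four different positions; both function names are hypothetical, and applying the boxes to the file is left to whichever PDF tool you use.

```python
def quadrant_boxes(width, height):
    """CropBox rectangles (llx, lly, urx, ury) for the four page quadrants.

    PDF user space has its origin at the bottom-left corner of the page.
    """
    half_w, half_h = width / 2, height / 2
    return [
        (0, half_h, half_w, height),      # top-left
        (half_w, half_h, width, height),  # top-right
        (0, 0, half_w, half_h),           # bottom-left
        (half_w, 0, width, half_h),       # bottom-right
    ]


def crop_plan(n_pages, width, height):
    """Pair each 1-based page number with the quadrant it should keep."""
    boxes = quadrant_boxes(width, height)
    return [(page, boxes[(page - 1) % 4]) for page in range(1, n_pages + 1)]


# US Letter is 612 x 792 points.
for page, box in crop_plan(5, 612, 792):
    print(page, box)
```

After cropping, a 4-up imposition of the cropped pages gives one sheet with four distinct cards.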

Is it possible to break a PDF file into pieces smaller than page-wise splitting allows?

I found that there are a lot of tools available for breaking big PDF files into smaller ones by splitting the original PDF file page-wise. For example, if I have a 10-page PDF document, I can break the original PDF file into 10 pieces by splitting it page-wise.
But I want a similar kind of tool that breaks the PDF file into pieces smaller than a page. That is, I need to split a PDF page into different documents based on some parameter like paragraph, section, element...
For example,
if my PDF file has 2 pages with 10 paragraphs, then I would like to split it into 10 separate PDF files based on the paragraph parameter.
Also, I strongly believe PDF does not contain any structure like Open XML. But I am also wondering:
How are the tools able to break PDF files into smaller PDF files by splitting page-wise? What kind of mechanism do they use for page-wise splitting of a PDF file?
So, is there any way to do what I need? Please give me your suggestions on this.
PDF is a vector-based document description language. It is page-based, so in a way every page is independent of the next one. Splitting page-wise is therefore pretty easy. Unlike a raster image, where you can extract small subsets independently, in a PDF you have to render the whole page to know what a small subset looks like.
Say you have a Page (black) which contains a complex shaped object (here it is a line but it could be any text, shape, image, etc.) and you want to extract a subset (red). You would have to first find all the objects that produce visible output in the region of interest. Then you would have to modify them so they are rendered correctly (in this case calculate the green points from the blue points while preserving the shape of the object).
An easier approach would be to include the whole page and clip the viewing area to the dimensions of the region.
You could do this with pdfjam. Check the --trim/--offset/--delta options in conjunction with a custom paper size (Examples 6 and 7 on the pdfjam website). You would still have to somehow calculate the coordinates of the region of interest, though.
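Once you do have the region's coordinates, turning them into trim amounts is plain subtraction. The sketch below assumes the trim values are given in the order left, bottom, right, top (the order used by the pdfpages package that pdfjam is built on; check the documentation of your version) and writes them in `bp` units, since a PDF point corresponds to TeX's "big point". The helper names are hypothetical.

```python
def trim_for_region(page_w, page_h, region):
    """Amounts to cut from the left, bottom, right and top page edges.

    `region` is (llx, lly, urx, ury) in PDF points, origin at bottom-left.
    """
    llx, lly, urx, ury = region
    if not (0 <= llx < urx <= page_w and 0 <= lly < ury <= page_h):
        raise ValueError("region must lie inside the page")
    return (llx, lly, page_w - urx, page_h - ury)


def as_trim_option(trim):
    """Format the four amounts as a single option value in bp units."""
    return " ".join(f"{value:g}bp" for value in trim)


# A 300 x 400 pt region on an A4 page (595 x 842 pt).
trim = trim_for_region(595, 842, (100, 200, 400, 600))
print(as_trim_option(trim))  # 100bp 200bp 195bp 242bp
```

The custom paper size to go with it would then be the width and height of the region itself, here 300 x 400 points.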

Is there a way to programmatically remove all blank pages from a PDF file?

Nowadays it is more practical to purchase an ebook than the dead-tree version. But the PDFs frequently contain the blank pages used by the print edition. I typically see between 10-30 blank pages (or pages with text "This page intentionally left blank.") per ebook. Is it possible to programmatically remove these blank pages? Currently I manually identify the blank pages and then run it through this:
pdftops orig.pdf - | psselect "$range_of_non_blank_pages" | ps2pdf - new.pdf
So the hard part is identifying the blank pages. pdftotext would work for the most part, except where the page has only images and no text.
Also, even after removing many pages and seeing the resulting file size is smaller, after shrinking both the original file and the new version (using various methods found on the internets), the original file is usually smaller by several hundred KB or more. So it appears the method I'm using to remove the blank pages doesn't create an optimal pdf. I've also tried various gui programs and see the same results in this respect.
Partial answer: you don't need to go via postscript (this is probably the reason why you get a bigger file). One possibility is
pdftk orig.pdf cat "$range_of_non_blank_pages" output new.pdf
To identify blank pages, you'd need to use a tool that can go beyond selecting and reassembling pages. Try a library for a scripting language, for example CAM::PDF or PDF::API2 in Perl.
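As a lighter-weight starting point than a full PDF library, the text-only check mentioned in the question can be scripted. The sketch below assumes the input is the output of `pdftotext`, which separates pages with form feed characters, and it shares the limitation already noted: a page containing only images has no text and will be treated as blank. The function names are hypothetical.

```python
import re

# Pages whose only text is the usual placeholder sentence count as blank.
PLACEHOLDER = re.compile(r"this page (is )?intentionally left blank\.?",
                         re.IGNORECASE)


def nonblank_pages(pdftotext_output):
    """Return the 1-based numbers of pages that contain real text."""
    pages = pdftotext_output.split("\f")
    if pdftotext_output.endswith("\f"):
        pages.pop()  # the form feed after the last page leaves an empty tail
    keep = []
    for number, text in enumerate(pages, start=1):
        body = text.strip()
        if body and not PLACEHOLDER.fullmatch(body):
            keep.append(number)
    return keep


def to_ranges(page_numbers):
    """Collapse sorted page numbers into range tokens such as '1-2', '5'."""
    ranges = []
    for page in page_numbers:
        if ranges and page == ranges[-1][1] + 1:
            ranges[-1][1] = page       # extend the current run
        else:
            ranges.append([page, page])  # start a new run
    return [str(a) if a == b else f"{a}-{b}" for a, b in ranges]


sample = "one\ftwo\f\fThis page intentionally left blank.\n\ffive\f"
print(" ".join(to_ranges(nonblank_pages(sample))))  # 1-2 5
```

The printed tokens can be passed as the page ranges in the pdftk command above, each token as a separate argument.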
I don't know of an open source solution that can detect and remove blank pages. However, Apago's commercial PDF Enhancer can automatically remove blank pages -- both vector and scanned. For scanned pages, it can remove scan artifacts such as black edges, hole punches and noise before determining whether a page is blank.