Imposed (nup'ped) PDF file does not allow selection of text - pdf

I have a PDF file with 5 pages. I have created an imposed (nup'ped) PDF file with all of those pages on one long sheet of paper. I used the pdfpages LaTeX package for that, with the following code:
\includepdf[pages={1-5},nup=1x5]{original.pdf}
The original PDF file had recognized and selectable text on all pages. But in the resulting file (result.pdf) only the first two pages allow text selection. The imposition was done correctly and looks as expected.

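It turned out to be Poppler's limit of 50,000 characters per page. Because the imposed file places all five original pages on a single sheet, their combined text presumably exceeds that limit, which would explain why only the first two pages remain selectable.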

Related

creating pdftk watermark file from command line

I need pdftk to watermark a PDF. I'm generating the content of the watermark programmatically and writing it out to a text file. Then I use cupsfilter to create the watermark PDF, and pdftk to apply the generated watermark PDF to an eBook PDF.
I understand that pdftk applies the watermark PDF to the eBook PDF page by page.
If I create a 62-line text file, with 61 empty lines and the watermark text on the 62nd, it gets applied properly at around 5/6 of the page height on every page of the eBook PDF.
If I add one more empty line, the watermark text disappears. It does not end up on the next page; it is simply not there.
My ultimate goal is to have the watermark text at the bottom of the second page of the eBook.
So I would need to create a 3-page PDF with the first page empty, the watermark text at the bottom of the second page, and a third page that is again empty.
I tried inserting a page break into the text file using BBEdit, but I did not get the expected result.
Does anybody have a hint on how I could create the required text file, which, once printed to a PDF with cupsfilter, would produce the needed watermark PDF (first and third pages empty, and a line or two of text at the bottom of the second page)?
OK, so first: the manual is not entirely clear about the difference between stamp and multistamp, or between background and multibackground. It explains that a multipage watermark PDF is applied page by page to the eBook PDF, and that if the watermark PDF has fewer pages than the eBook PDF, its last page is applied to all surplus pages of the eBook. This is correct, but only for the multistamp/multibackground options. If you use the stamp/background options, only the first page of the watermark PDF is applied, and it goes onto every page of the eBook PDF. That was the first thing to figure out.
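The rule described above can be modelled as a small function. This is only an illustrative sketch of the behaviour as this answer describes it, not pdftk code; the function name `watermark_page_for` and its parameters are made up for the example.

```python
def watermark_page_for(ebook_page: int, watermark_pages: int, multi: bool) -> int:
    """Return the 1-based watermark page that lands on a 1-based eBook page.

    multi=False models stamp/background, multi=True models
    multistamp/multibackground, following the description in this answer.
    """
    if ebook_page < 1 or watermark_pages < 1:
        raise ValueError("page numbers and page counts start at 1")
    if not multi:
        # stamp / background: the first watermark page is used everywhere.
        return 1
    # multistamp / multibackground: page by page, and the last watermark
    # page is reused for any surplus eBook pages.
    return min(ebook_page, watermark_pages)


if __name__ == "__main__":
    # Compare both modes for a 3-page watermark on a 5-page eBook.
    for page in range(1, 6):
        print(page,
              watermark_page_for(page, 3, multi=False),
              watermark_page_for(page, 3, multi=True))
```

With a 3-page watermark whose first and third pages are empty, only eBook page 2 receives visible text in the multi case, which is the effect being aimed for here.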
So I created two text files using echo, one empty (containing a single space) and one with one line of watermark text. Then I used the pdftk cat operation to merge the empty PDF with the watermark PDF, which gave me a two-page PDF with the first page empty and the second carrying the line of text. Then I merged this file once again with the empty PDF, and ended up with a 3-page PDF.
Then I applied this 3-page watermark PDF to the eBook with the multibackground option and got what I wanted: no watermark on the first page, the line of text on the second page, and no watermark on the third and all other pages.
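A rough sketch of those two steps as pdftk command lines, built in Python so they can be inspected before being run. The file names (`empty.pdf`, `text.pdf`, `ebook.pdf` and the output names) are placeholders, and the merge is done in a single cat call with input handles rather than in two passes as described above; treat it as an assumption-laden outline rather than the exact commands that were used.

```python
def build_watermark_commands(empty_pdf, text_pdf, ebook_pdf,
                             watermark_pdf="watermark3.pdf",
                             output_pdf="ebook_marked.pdf"):
    """Return the two pdftk invocations as argument lists (nothing is run)."""
    # Step 1: empty page + text page + empty page -> 3-page watermark file.
    # E and T are pdftk input handles for the two source files.
    merge = ["pdftk", f"E={empty_pdf}", f"T={text_pdf}",
             "cat", "E", "T", "E", "output", watermark_pdf]
    # Step 2: apply the watermark pages one by one behind the eBook pages.
    apply_cmd = ["pdftk", ebook_pdf, "multibackground", watermark_pdf,
                 "output", output_pdf]
    return [merge, apply_cmd]


if __name__ == "__main__":
    # Print the commands; pass each list to subprocess.run(cmd, check=True)
    # to actually execute them on a machine where pdftk is installed.
    for cmd in build_watermark_commands("empty.pdf", "text.pdf", "ebook.pdf"):
        print(" ".join(cmd))
```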

Export PDF Page contents to individual pages

I have a PDF document which contains more than one page within each page.
The original document is only 2 pages (size A4), but each of those 2 pages contains multiple pages.
I need to export each of these "pages within each page" to an individual PDF page.
I have tried increasing the zoom of the pages and printing from there, but it prints incorrectly.
What could I do in Adobe Reader or a similar program to export each of these pages as its own PDF page?
Link to PDF
In Acrobat Reader, you could make clever use of custom poster printing (possibly printing to a new PDF):
https://apple.stackexchange.com/questions/12305/split-a-single-page-pdf-into-multiple-pages
Otherwise you can do any of these:
Splitting single page into two pages with ghostscript
Alternatively you could use other tools such as Inkscape to do the splitting.
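Whichever tool does the splitting, it needs one crop rectangle per sub-page. The sketch below computes those rectangles under the assumption that the sub-pages sit on a regular grid, which the question does not actually state; `grid_boxes` is a hypothetical helper, and the resulting boxes would still have to be passed to a cropping tool.

```python
def grid_boxes(page_width, page_height, cols, rows):
    """Return (left, bottom, right, top) boxes in reading order.

    Coordinates are in PDF points with the origin at the bottom-left
    corner of the sheet, so the top row has the largest y values.
    """
    if cols < 1 or rows < 1:
        raise ValueError("cols and rows must be at least 1")
    cell_w = page_width / cols
    cell_h = page_height / rows
    boxes = []
    for row in range(rows):              # row 0 is the top row
        top = page_height - row * cell_h
        for col in range(cols):          # left to right within a row
            left = col * cell_w
            boxes.append((left, top - cell_h, left + cell_w, top))
    return boxes


if __name__ == "__main__":
    # An A4 sheet is roughly 595 x 842 points; 2 x 2 is just an example layout.
    for box in grid_boxes(595, 842, cols=2, rows=2):
        print(box)
```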

ghostscript extract pages containing a text string

I need to programmatically extract, from a multipage PDF, only the pages containing a text string. Is this possible with Ghostscript, or do I need some other tool? I'm working on AIX.
Thanks in advance.
OK, firstly, Ghostscript doesn't extract pages from PDF files. It creates brand new PDF files whose visual appearance should be the same as the original, but whose content will be different.
There is no way to do this with Ghostscript in a single pass. You could use the txtwrite device to extract the text, then grep through the output files for the text you want, note the page numbers, and then run another pass to get those pages into new files.
Be aware that extracting text from a PDF file is far from guaranteed to work! That was not the intent of the original PDF format.
Also note that Ghostscript currently only allows for handling a single range of pages, First->Last, so if you have a discontinuous set (e.g. pages 1, 3, 5, 7 etc.) then you will have to run this step multiple times.
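The two passes could be outlined as follows. This is a sketch that only builds the command lines: it assumes the executable is called `gs`, that a `%03d` pattern in `-sOutputFile` yields one text file per page with txtwrite, and that the matching page numbers have already been collected (for instance by grepping those text files). The output file names are placeholders.

```python
def contiguous_ranges(pages):
    """Group page numbers into (first, last) runs of consecutive pages."""
    ranges = []
    for page in sorted(set(pages)):
        if ranges and page == ranges[-1][1] + 1:
            # Extend the current run by one page.
            ranges[-1] = (ranges[-1][0], page)
        else:
            # Start a new run.
            ranges.append((page, page))
    return ranges


def text_pass_command(pdf_path, pattern="page-%03d.txt"):
    """First pass: dump the text of every page to its own file."""
    return ["gs", "-q", "-dNOPAUSE", "-dBATCH", "-sDEVICE=txtwrite",
            f"-sOutputFile={pattern}", pdf_path]


def extract_commands(pdf_path, pages):
    """Second pass: one pdfwrite run per contiguous range of matching pages."""
    commands = []
    for first, last in contiguous_ranges(pages):
        commands.append(["gs", "-q", "-dNOPAUSE", "-dBATCH",
                         "-sDEVICE=pdfwrite",
                         f"-dFirstPage={first}", f"-dLastPage={last}",
                         f"-sOutputFile=pages-{first}-{last}.pdf", pdf_path])
    return commands


if __name__ == "__main__":
    print(" ".join(text_pass_command("input.pdf")))
    for cmd in extract_commands("input.pdf", [1, 3, 4, 5, 7]):
        print(" ".join(cmd))
```

Grouping the matches into runs keeps the number of Ghostscript invocations down when the matching pages happen to be adjacent.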

Recover text from PDF file when normal methods fail

I have a few hundred PDF files from which I need to extract sections of text. For many, pdftotext works fine, but for others, it misses large sections of text. If I open the PDF in Acrobat and select that text by hand and copy/paste into emacs and then view the file without an encoding, I get stuff like this:
Husband \364\200\200\272\364\200\201\213\364 etc.
How can I extract the text correctly?
I should mention that I've tried saving as text from Acrobat; also tried applying Acrobat's Document=>OCR feature before copying.
Why not convert the PDF to doc or txt first? See the guide:
http://www.aolor.com/pdf-converter/user-guide.html
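As a diagnostic aside, the escapes quoted in the question are octal byte values, and they can be decoded to see what the PDF actually contains. The sketch below uses only the two complete sequences shown there (the trailing `\364` is cut off in the question and is left out). They decode as UTF-8 to private-use code points, which suggests, though it does not prove, that the fonts in those PDFs carry no usable mapping back to real characters.

```python
import unicodedata

# The two complete byte sequences quoted in the question, as octal values.
SAMPLE = bytes([0o364, 0o200, 0o200, 0o272,
                0o364, 0o200, 0o201, 0o213])


def describe(raw: bytes):
    """Decode UTF-8 bytes and return (code point, Unicode category) pairs."""
    text = raw.decode("utf-8")
    return [(f"U+{ord(ch):04X}", unicodedata.category(ch)) for ch in text]


if __name__ == "__main__":
    # Category "Co" means "private use": no standard character is assigned.
    for code_point, category in describe(SAMPLE):
        print(code_point, category)
```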

create two pdfs from one .ps file?

I need to reformat a text file into a PDF. Using Perl, I am modifying an existing PostScript template file based on what is in the text file. Sometimes this text file will be long enough to require a two page PDF.
Can I create a two-page PDF file from one .ps file using Ghostscript? If so, what tells Ghostscript where the page break should occur?
Maybe I need to use two template files: one for a one-page PDF and another for a two-page PDF.
PostScript doesn't directly have the concept of text flows or page breaks. The showpage operator renders the page to the device, clears the page and starts a new one. PS to PDF conversion will create a new page in the PDF on this operator. If you want to chop up a PostScript file into pages, psutils is a series of programs for manipulating PostScript files.
It's down to whatever is converting your text file to create appropriate PostScript commands to handle the page break.
A page break will happen if (and only if) your PostScript template invokes showpage.
I would guess it depends on what's in your PostScript template. A PostScript file is a computer program, and page breaks are determined by the logic in the PostScript. If the two-page format is substantially the same as the one-page format, you could have your Perl script split the data up, then create two single-page files and concatenate them. Ghostscript should render that file correctly.
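To make the role of showpage concrete, here is a minimal sketch that splits text lines into pages and ends each page with showpage. It is written in Python rather than the Perl used in the question, and the font, margins, and lines-per-page value are arbitrary assumptions, not taken from any real template.

```python
def ps_escape(text: str) -> str:
    """Escape backslashes and parentheses for a PostScript string literal."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def text_to_postscript(lines, lines_per_page=50, font_size=12,
                       left=72, top=720):
    """Return a PostScript program that prints the lines, one page at a time."""
    leading = font_size + 2
    # Split the input into chunks of at most lines_per_page lines each.
    pages = [lines[i:i + lines_per_page]
             for i in range(0, len(lines), lines_per_page)] or [[]]
    out = ["%!PS"]
    for number, page in enumerate(pages, start=1):
        out.append(f"% page {number}")
        out.append(f"/Courier findfont {font_size} scalefont setfont")
        y = top
        for line in page:
            out.append(f"{left} {y} moveto ({ps_escape(line)}) show")
            y -= leading
        # This operator is what ends the page; the PDF gets one page per call.
        out.append("showpage")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = [f"line {i}" for i in range(1, 61)]  # 60 lines -> 2 pages
    print(text_to_postscript(sample))
```

Feeding the resulting file to Ghostscript's pdfwrite device should then give one PDF page per showpage, so a single generator can cover both the one-page and the two-page case without a second template.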