Build a PDF file from parts of another


I have a template for a Hipster PDA (you remember those, don't you?) that shows four copies of the same card on one page, then four copies of the next card on the next page, and so forth. I would like to rearrange things so that each page has only one copy of each card, so I can print four distinct cards to a page without wasting a lot of paper. I did something vaguely similar to this years ago, but that involved hand-editing a lot of PostScript and took forever. I would like some sort of command-line solution that would cut a different quadrant from each page and then paste four of them onto a single new page.

You might try to get what you want in two steps:
1. Set up the CropBox of each page so that only one copy of a card lies within it.
2. Use PDF imposition software to make new pages from four "old" ones.
For the latter, you could try the Multivalent Impose tool.
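The arithmetic for step 1 can be sketched as follows. This is a minimal illustration, assuming the four cards sit in an equal 2x2 grid and quadrants are numbered 0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right; the names `Box`, `quadrant_box`, and `impose_plan` are hypothetical, and actually writing the CropBox values and merging pages is left to a PDF tool. Taking quadrant `n % 4` from source page `n` follows the question's idea of cutting a different quadrant from each page: each card keeps the position it has on the output sheet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A rectangle in PDF user-space units (points), origin at bottom-left."""
    llx: float
    lly: float
    urx: float
    ury: float


def quadrant_box(width, height, q):
    """Return the crop box for quadrant q (0..3) of a width x height page.

    Assumed numbering: 0=top-left, 1=top-right, 2=bottom-left, 3=bottom-right.
    """
    if q not in range(4):
        raise ValueError("quadrant must be 0..3")
    half_w, half_h = width / 2, height / 2
    col, row = q % 2, q // 2          # row 0 is the top half of the page
    llx = col * half_w
    lly = half_h if row == 0 else 0.0
    return Box(llx, lly, llx + half_w, lly + half_h)


def impose_plan(num_pages, width, height):
    """For each output sheet, list (source_page, crop_box) pairs.

    Source page n (0-based) contributes quadrant n % 4, so four consecutive
    source pages fill one sheet; a final partial sheet is allowed.
    """
    sheets = []
    for first in range(0, num_pages, 4):
        sheets.append([(n, quadrant_box(width, height, n % 4))
                       for n in range(first, min(first + 4, num_pages))])
    return sheets
```

Because each card is cropped from the slot it will occupy, the cropped pages can be overlaid without translation, or fed in order to a 2x2 imposition whose slot order matches the quadrant numbering.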

Related

Automating Split, Save Name and Error Checking with Acrobat Action Scripts?

I'm considering convincing my company to upgrade to Acrobat Pro so I can automate the processing of my scanned documents. Before I bring it up, I want to make sure the things I want to do are possible. I don't need anyone to give me the code, I just want to know if this is possible.
The documents I'm working with are landscape, 2-5 pages long, and have the filename and page numbers in the footer. I want to scan a big stack of them and have a script perform the following actions:
Use OCR to acquire the filename and page numbers for each page. I would like to restrict the OCR to only look at the footer to save time and RAM.
Using the filenames, I want it to detect when one document ends and the next one begins so they can be split into separate files.
Before saving the split files, check that the number of pages in each file matches the page total in the footer. (I work in a factory and the documents can get sticky, so my scanner frequently pulls two pages at once.)
Instead of saving the files where the page total doesn't match, compile a list of the errors so I know which documents need to be rescanned.
Finally, save all correct documents with their filenames from the footer to a folder on my desktop.
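Once OCR has produced the footer text for each page, the splitting and error-checking steps above are plain bookkeeping. The sketch below shows only that logic, independent of Acrobat; `PageInfo` and `split_and_check` are hypothetical names, OCR and file saving are not shown, and it assumes that consecutive pages with the same footer filename belong to one document and that filenames are unique within a batch.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class PageInfo:
    """Footer data for one scanned page (assumed to come from OCR)."""
    filename: str
    page_no: int
    page_total: int


def split_and_check(pages):
    """Group consecutive pages by footer filename and validate page counts.

    Returns (good, errors): good maps filename -> list of 0-based scan
    indices to save; errors lists documents that need to be rescanned.
    """
    good, errors = {}, []
    indexed = list(enumerate(pages))
    for name, group in groupby(indexed, key=lambda item: item[1].filename):
        group = list(group)
        expected = group[0][1].page_total   # total printed in the footer
        if len(group) == expected:
            good[name] = [i for i, _ in group]
        else:
            errors.append(f"{name}: found {len(group)} of {expected} pages")
    return good, errors
```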
This could save me hours a week, so I'm hopeful that it's all possible. Thanks.

Possible to control PDF layout with iText?

I'm writing some logic to build a large single PDF file that our users can print at their convenience. I'm using Java's iText library (through Clojure's clj-pdf).
I'm trying to have the PDF show the same exact template form on every single page, however I can't seem to find any documentation or indication that one can have PDF content "fit to a page".
The text in these forms varies a little, so a form might need more or fewer lines per page. This means the content may spill over to the next page, or be too short, so that the next form creeps up onto the previous page, breaking the "one form per page" requirement for the rest of the document.
I'm trying to figure out whether my only option is to manually check the length of the text on each page and crop it by hand if it goes over n lines, or whether the PDF format supports some smart way of making paragraphs, tables, and headings all fit on one page. Some UI systems let you control how spill-over is handled, anywhere from cropping to resizing the font, so I'm curious whether PDF supports anything of that sort.
Edit: I ended up using page breaks for simplicity; I wasn't aware of that option when I wrote this question.

If you want to control the space taken by text, for instance to fit it on a single page, the way to go is to create a ColumnText object and add the content in simulation mode. If the text fits the page, add it for real. If it doesn't, use a smaller font size and simulate again. This is demonstrated in the MovieAds example, where snippets of text are fitted into AcroForm fields.
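The simulate-then-shrink loop described in that answer can be sketched without iText. This is not iText code: `lines_needed` is a hypothetical, deliberately crude stand-in for ColumnText's simulation mode that assumes every character is half an em wide and ignores word boundaries, while `fit_font_size` shows the control flow of trying the largest size first and stepping down until the text fits.

```python
import math


def lines_needed(text, font_size, column_width, avg_char_em=0.5):
    """Crude layout simulation: estimate how many lines `text` wraps to.

    Assumes each character is avg_char_em * font_size wide; a real layout
    engine measures actual glyph widths and breaks at word boundaries.
    """
    chars_per_line = max(1, int(column_width // (avg_char_em * font_size)))
    total = 0
    for paragraph in text.split("\n"):
        # An empty paragraph still occupies one line.
        total += max(1, math.ceil(len(paragraph) / chars_per_line))
    return total


def fit_font_size(text, column_width, column_height, max_size=12.0,
                  min_size=6.0, step=0.5, leading=1.2):
    """Return the largest font size at which `text` fits, or None."""
    size = max_size
    while size >= min_size:
        height = lines_needed(text, size, column_width) * size * leading
        if height <= column_height:
            return size   # fits: this is where the content is added "for real"
        size -= step      # does not fit: simulate again with a smaller font
    return None
```

Returning `None` leaves the caller to decide what to do when even the minimum size overflows, such as cropping or starting a new page.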

Is there a way to programmatically remove all blank pages from a PDF file?

Nowadays it is more practical to purchase an ebook than the dead-tree version. But the PDFs frequently contain the blank pages used by the print edition. I typically see between 10 and 30 blank pages (or pages with the text "This page intentionally left blank.") per ebook. Is it possible to programmatically remove these blank pages? Currently I manually identify the blank pages and then run the file through this:
pdftops orig.pdf - | psselect "$range_of_non_blank_pages" | ps2pdf - new.pdf
So the hard part is identifying the blank pages. pdftotext would work for the most part, except where the page has only images and no text.
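A text-based check along the lines the question describes can be sketched like this, using poppler's `pdftotext` with `-f`/`-l` to extract one page at a time and `-` to write to standard output. The regular expression and function names are assumptions for illustration, and, as the question notes, image-only pages have no text, so this approach would wrongly flag them as blank.

```python
import re
import subprocess

# Hypothetical pattern for "This page (is) intentionally (left) blank."
BLANK_NOTICE = re.compile(r"this page (is )?intentionally (left )?blank\.?",
                          re.IGNORECASE)


def page_text(pdf_path, page):
    """Extract the text of one 1-based page with pdftotext (poppler)."""
    result = subprocess.run(
        ["pdftotext", "-f", str(page), "-l", str(page), pdf_path, "-"],
        capture_output=True, text=True, check=True)
    return result.stdout


def looks_blank(text):
    """True if a page has no text or only an 'intentionally left blank' notice."""
    normalized = " ".join(text.split())   # also drops pdftotext's form feeds
    return normalized == "" or BLANK_NOTICE.fullmatch(normalized) is not None


def non_blank_pages(texts):
    """Return the 1-based numbers of pages whose text is not blank."""
    return [i for i, t in enumerate(texts, start=1) if not looks_blank(t)]
```

In practice you would build `texts` by calling `page_text` for each page number; the total page count can be read from `pdfinfo`.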
Also, even after removing many pages and seeing that the resulting file is smaller, after shrinking both the original file and the new version (using various methods found on the internets), the original file is usually smaller by several hundred KB or more. So it appears the method I'm using to remove the blank pages doesn't create an optimal PDF. I've also tried various GUI programs and see the same results in this respect.

Partial answer: you don't need to go via PostScript (this is probably why you get a bigger file). One possibility is
pdftk orig.pdf cat "$range_of_non_blank_pages" output new.pdf
To identify blank pages, you'd need to use a tool that can go beyond selecting and reassembling pages. Try a library for a scripting language, for example CAM::PDF or PDF::API2 in Perl.
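Once the pages to keep are known, the pdftk command above needs them as page ranges. The helper below is a hypothetical sketch that collapses page numbers into ranges such as `1-3`. pdftk's documented examples pass each range as a separate argument after `cat`, whereas psselect takes a single comma-separated list, so the sketch returns a list rather than one string.

```python
def to_pdftk_ranges(pages):
    """Collapse 1-based page numbers into pdftk 'cat' range arguments.

    For example, [1, 2, 3, 5, 7, 8] becomes ["1-3", "5", "7-8"].
    """
    ranges = []
    start = prev = None
    for p in sorted(set(pages)):
        if start is None:
            start = prev = p
        elif p == prev + 1:
            prev = p              # still inside a consecutive run
        else:
            ranges.append(f"{start}-{prev}" if start != prev else str(start))
            start = prev = p      # begin a new run
    if start is not None:
        ranges.append(f"{start}-{prev}" if start != prev else str(start))
    return ranges
```

A call could then look like `subprocess.run(["pdftk", "orig.pdf", "cat", *to_pdftk_ranges(keep), "output", "new.pdf"], check=True)`, where `keep` is your list of non-blank page numbers.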

I don't know of an open source solution that can detect and remove blank pages. However, Apago's commercial PDF Enhancer can automatically remove blank pages, both vector and scanned. For scanned pages, it can remove scan artifacts such as black edges, hole punches, and noise before determining whether a page is blank.

Batch Decollating Directory of PDFs and Bar Code Imposing

I have a directory of PDFs. They are all different, but they all have 5 pages. I need to insert a bar code on each page for each PDF. After this process I need to combine and decollate every PDF. Essentially there would be 5 different PDFs created. The first would contain all page ones from every PDF, the second the second page, etc.
I need to find a tool, or a toolset, that would allow me to accomplish this. I'm willing to program my own solution but I'm not even sure what would be the most efficient language to attack it with.
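Whatever language is used, the decollation step is a regrouping of pages. The sketch below only builds the plan: output file k collects page k of every input, in sorted filename order. The names `decollate_plan` and `page{k}.pdf` are hypothetical, and the barcode payload (file stem plus page number) is an assumption, since the question doesn't say what the barcode should encode; rendering the barcode and copying pages would be done with a PDF library.

```python
from pathlib import Path


def decollate_plan(pdf_paths, pages_per_pdf=5):
    """Map each output file to (source_pdf, page, barcode_text) triples.

    Output k (1-based) receives page k from every input PDF.
    """
    sources = sorted(pdf_paths)
    plan = {}
    for k in range(1, pages_per_pdf + 1):
        plan[f"page{k}.pdf"] = [
            (src, k, f"{Path(src).stem}-{k}")   # hypothetical barcode payload
            for src in sources
        ]
    return plan
```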

What I ended up doing was using Perl with the PDF::Reuse and PDF::Reuse::BarCode libraries to get all the PDFs in the directory, pull the pages I wanted, add the barcode, and save the result to a new PDF.

Automated Development of Presentation with Interactivity

I am trying to identify the right tool, language, software package, or other for the automated development of presentations, where the presentation is user interactive.
The presentation will consist of images with titles and some descriptive text. Most of the time there will be 35–70 images. I would like to show each image on a separate page, slide, tab, etc. (I guess proper terminology depends on the solution.)
The images will change, but the titles will remain the same, and there will be a little bit of change to the description of each image.
After putting the presentation together, I would like the user to be able to circle and "write" on the electronic image in kind of the wax pencil sense (I previously worked in a photo lab and we worked with wax pencils on negatives all the time and would like to have kind of a similar flexibility). Moreover, I would like users to be able to add comments as well, kind of in the way Adobe PDF Professional allows, e.g. inserting bubble comments, etc.
Most importantly, I would like to be able to do this in an automated way. Right now we are using PowerPoint, but the amount of time it takes to put an image on a slide, resize it, and then set up the text is killing us. Plus, as the images change, it takes tons of time to go back and update them. Thus, we would like something that makes it faster to update images and get feedback from our few users. It doesn't necessarily have to be a web-hosted solution, but it could run in a browser.
Sorry this is so long and thanks for any ideas and feedback, especially if there is an existing software package solution, language that can be used, or other approach to get this done.
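For the automated-layout part of the question, one browser-based approach is to generate the slides from a list of (image, title, description) entries, so that swapping images means re-running a script instead of editing slides by hand. The sketch below is an illustration under that assumption: `build_slides` is a hypothetical name, and it does not cover the drawing and commenting requirements, which would need a separate annotation layer.

```python
from html import escape


def build_slides(slides):
    """Render (image_path, title, description) triples as one HTML page.

    Each image gets its own <section>; all text is HTML-escaped.
    """
    parts = ["<!DOCTYPE html>",
             '<html><head><meta charset="utf-8"><title>Slides</title></head><body>']
    for image, title, description in slides:
        parts.append(
            f"<section><h2>{escape(title)}</h2>"
            f'<img src="{escape(image, quote=True)}" alt="{escape(title, quote=True)}">'
            f"<p>{escape(description)}</p></section>")
    parts.append("</body></html>")
    return "\n".join(parts)
```

Writing the returned string to a file such as `slides.html` gives a page that opens in any browser.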

These days, two of the most popular are Adobe Captivate and Articulate Presenter. For a service instead of a product, you can check out services like http://voicethread.com.

I don't know of any product that completely answers your requirements.
But, for similar results I use two different tools for developing the presentations and another one for drawing while presenting.
If I just want a presentation made of pictures and text, and I want to automate its creation, I use IrfanView http://www.irfanview.com/ with its wonderful feature for automated slideshows. I put all the images together, annotate them (using either their filenames or, if that is not enough, the EXIF and comment fields), and create a slideshow that can be compiled into an .exe file.
If I want a more elaborate presentation with full annotation capabilities, I use Wink http://www.debugmode.com/wink/
For drawing over the screen during the presentation, I use a very old bitmap drawing program called PC-Draw. With one hotkey it captures the screen as a bitmap and lets you draw over it, and with another hotkey it returns to the original screen without altering the running programs at all. I have not found it anywhere on the web; however, similar programs are just a quick Google search away.
All three tools are free and easy (and even fun) to use.
