How to vertically split a PDF e-book with collation (2-page per sheet to 1 sheet per page) - pdf

I have a scanned e-book with two book pages on each PDF page. I was able to crop the white borders on all four sides, but because two book pages share a single PDF page, the book is hard to read on an e-reader such as a Kindle. I am trying to split the e-book so that each PDF page holds one book page. Is there a way to do this in Acrobat Professional?
I thought of cropping the PDF in two batches (left and right) and merging them, but the page collation would be completely off: the pages would not end up adjacent to each other. I would get pages 1, 3, 5, 7, ... up to 101 in one PDF and 2, 4, 6, ... 100 in another.
Please provide a solution in Acrobat Professional.

You can merge the PDFs back together with a script. When run from Acrobat, JavaScript has access to quite a few functions that aren't available in Reader, such as:
doc.insertPages(nPageInDoc, pathToOtherPDF, nStartPage, nEndPage)
So you could create a script in a button in one of your 1,3,5,7... files to import all the pages from the other. Something like:
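var oddPagesDoc = app.openDoc("c:\\oddPages.pdf");
var evenPagesDoc = app.openDoc("c:\\evenPages.pdf");
var evenPageCount = evenPagesDoc.numPages;
for (var i = 0; i < evenPageCount; ++i) {
    // nPage is the page to insert after; every earlier insert shifts the
    // remaining odd pages by one, so odd page i now sits at index 2 * i.
    oddPagesDoc.insertPages(2 * i, "c:\\evenPages.pdf", i, i);
}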
So insert a button into the "odd pages" file with the above script as the button's "mouse down" JavaScript action. Click it, then delete the button.
It's entirely possible there's an "off by one" error in my script, so I don't recommend saving over the original until you're sure everything was assembled properly.

If you have Adobe Acrobat Pro, there is a way to do this without scripting. It's quite tedious, but I'll explain.
I advise you to make a copy of the file first.
1. Crop the left part of the page: select Crop Pages. If it's an A4 sheet, then under Margin Controls set Right = 14.85 (half of the 29.7 cm width of a landscape A4 sheet, with units in centimeters), and for Page Range select All. Save as left.pdf.
2. Extract all the left pages as separate files: in the file left.pdf, select Extract Pages, edit the From and To boxes to select all the pages, and check the box Extract Pages As Separate Files. Now select a folder to save all the files in; Acrobat will name them left 1.pdf, left 2.pdf, left 3.pdf, etc.
3. Repeat step 1 for the right side of the page: open the original file, crop with Left = 14.85 for all pages, and save as right.pdf.
4. Repeat step 2 for right.pdf to extract all the pages into the same folder as right 1.pdf, right 2.pdf, right 3.pdf, etc.
5. In Acrobat choose Create -> Combine Files into a Single PDF, and navigate to the folder where you've saved all the separate pages. You can rearrange the files into the correct order, i.e. left 1.pdf, right 1.pdf, left 2.pdf, right 2.pdf, etc. Then click Combine Files and save your new e-book.
Tip: if the book has many pages, rearranging the files when combining them can take a long time. Acrobat arranges them alphabetically, so it would be better if the files were named, for instance for a PDF with 354 pages, 001left.pdf, 001right.pdf, ..., 354left.pdf, 354right.pdf. I can't find any setting in Acrobat to change the default names, but you could use this free tool to batch rename files: http://www.snapfiles.com/get/denrenamer.html
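If you would rather script the renaming than install a tool, a minimal Python sketch along these lines might do it. It assumes the extracted files are named exactly "left N.pdf" and "right N.pdf", as described in the steps above; the function names `padded_name` and `rename_pages` are made up for this example.

```python
import re
from pathlib import Path
from typing import List, Optional

# Matches the names described in the steps above, e.g. "left 12.pdf".
NAME_PATTERN = re.compile(r"^(left|right) (\d+)\.pdf$")


def padded_name(filename: str, width: int = 3) -> Optional[str]:
    """Return e.g. '012left.pdf' for 'left 12.pdf', or None if the name does not match."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        return None
    side, number = match.groups()
    return f"{int(number):0{width}d}{side}.pdf"


def rename_pages(folder: str, width: int = 3) -> List[str]:
    """Rename every matching file in `folder` and return the new names, sorted."""
    renamed = []
    # Take a snapshot of the directory first so renames do not disturb the iteration.
    for path in list(Path(folder).iterdir()):
        new_name = padded_name(path.name, width)
        if new_name is not None:
            path.rename(path.with_name(new_name))
            renamed.append(new_name)
    return sorted(renamed)
```

Because the number comes first and is zero-padded, an alphabetical sort gives 001left.pdf, 001right.pdf, 002left.pdf, and so on. Try it on a copy of the folder first.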

The following modified script (idea based on the previous answer) worked for me:
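// Run from the document that holds the odd pages; the other file is assumed to have the same number of pages.
var pageCount = this.numPages; // read once, before the inserts change it
for (var i = 0; i < pageCount; i++) {
    this.insertPages({nPage: 2*i, cPath: "/C/***/fileName.pdf", nStart: i, nEnd: i});
}
Multiplication by 2 is needed because each inserted page shifts the numbering of the pages that follow it.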
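To see why the index has to be doubled, here is a small Python simulation that uses plain lists in place of PDF documents. It is only an illustration of the arithmetic, not Acrobat code. The function names are invented for this sketch, and it relies on nPage meaning "insert after this 0-based page".

```python
def interleave_with_doubling(odd_pages, even_pages):
    """Insert even page i after index 2*i of the growing document."""
    doc = list(odd_pages)
    for i, page in enumerate(even_pages):
        # "Insert after index 2*i" means the new page lands at list position 2*i + 1.
        doc.insert(2 * i + 1, page)
    return doc


def interleave_without_doubling(odd_pages, even_pages):
    """Insert even page i after index i, ignoring the shift caused by earlier inserts."""
    doc = list(odd_pages)
    for i, page in enumerate(even_pages):
        doc.insert(i + 1, page)
    return doc


print(interleave_with_doubling([1, 3, 5], [2, 4, 6]))     # [1, 2, 3, 4, 5, 6]
print(interleave_without_doubling([1, 3, 5], [2, 4, 6]))  # [1, 2, 4, 6, 3, 5]
```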

Related

Word / PDF - Merge Documents

I am looking to merge two documents, but it is not a typical merge.
My first document is a mail merge that creates a cover letter; each page has a name and address.
My next document is a static document that cannot be changed.
I need to insert the static document into the merged document after every page, so that each cover-letter page is followed by a copy of the static document.
I have tried Insert Document in both Word 2010 and Adobe Acrobat, and, as you might expect, it only inserted one copy after the first page.
I'm looking at VBA, but I have never used VBA with Word before.
Any pointers would be appreciated.
Many thanks
I should have spent more time on this.
The original template contains fields to merge.
In the static document, go to the Insert tab, Text section, and select Object -> Text from File.
Select the cover letter / template that contains the fields to merge. This will insert the template followed by the static document that cannot be changed.
Note: I have spotted some formatting changes in the template after the merge; further work is required.
From this point, start your mail merge and complete the merge to Adobe PDF or Word.
This creates a mail-merged document containing the cover letter with name and address fields, followed by the static document.
Extremely simple. I always overcomplicate things!
I'll work on the changed formatting, but other than that, this works.

Comparing two files using BeyondCompare - check for content

I have two text files containing many lines of data (they are just some Linux paths). The order of the paths is different in the two files. I need Beyond Compare to compare the files based on content. Right now, it compares line by line and reports a difference whenever the same content is not present on the corresponding lines. I want Beyond Compare to go through the entire file before saying that some path is missing. How can I do this?
You can make Beyond Compare 4 sort the files before comparison. Open the files in the Text Compare, then click the dropdown on the right side of the Format toolbar button and select Sorted.
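If you only need the list of paths that are missing from one side, the same order-independent comparison can be sketched outside Beyond Compare in a few lines of Python. This is an alternative, not a Beyond Compare feature. The function names are invented here, and duplicate lines are deliberately ignored because sets are used.

```python
from pathlib import Path


def read_paths(filename):
    """Return the set of non-empty lines in a file, with surrounding whitespace stripped."""
    lines = Path(filename).read_text().splitlines()
    return {line.strip() for line in lines if line.strip()}


def compare_unordered(left_file, right_file):
    """Return (only_in_left, only_in_right) as sorted lists, ignoring line order."""
    left = read_paths(left_file)
    right = read_paths(right_file)
    return sorted(left - right), sorted(right - left)
```

Calling `compare_unordered("a.txt", "b.txt")` (hypothetical file names) returns two lists: the paths found only in the first file and those found only in the second.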

Cloning a form field with all styles using PDFBox

We need to fill out a form for our customers and produce a PDF, which they can print, sign and send back. As the PDF has to change frequently, we use PDF templates with form fields, which are filled by our tool (pdftk). For various reasons I would like to switch to PDFBox (one is that the current tool requires us to split the templates into individual pages, save them to disk, fill them and then merge them together again). So far everything works fine.
But I struggle with the page numbering. As the form is combined out of multiple templates, I have to fix the page number with PDFBox. So far, we used a styled input field page_num on every page. But since they all have the same name, I can't fill them individually.
Can I somehow split or clone the fields and give them individual names, so I can fill them individually? Of course, the styling should stay like it is.
With the help of the PDFBox guys I've got it to work. My solution is using JRuby, but I think you could pretty easily translate it to Java (remove the Java::OrgApachePdfbox... namespace).
doc = Java::OrgApachePdfboxPdmodel::PDDocument.load("input.pdf")
form = doc.getDocumentCatalog.getAcroForm
pages = doc.getDocumentCatalog.getAllPages.toArray.to_a
page_num = form.getField("page_num")
string = page_num.getDictionary
  .getDictionaryObject(Java::OrgApachePdfboxCos::COSName::DA)
page_num.getKids.to_array.each do |widget|
  widget_dict = widget.getDictionary
  widget_dict.setString(Java::OrgApachePdfboxCos::COSName::DA, string.getString)
  field = Java::OrgApachePdfboxPdmodelInteractiveForm::PDTextbox.new(
    form,
    widget_dict
  )
  field.setParent(page_num)
  page = (pages.index(widget.getPage) + 1).to_s
  field.setPartialName("page_num_#{page}")
  field.setValue(page)
end
doc.save("output.pdf")

How to parse text from a plain text file and use the result to highlight a PDF file

Back in 2010, some guy claimed to be capable of doing this:
http://www.mobileread.com/forums/showthread.php?t=103847
"The Kindle stores its annotations in a Mobipocket (".mobi") file for each document and in one long text file named "My Clippings.txt." In this post I describe a system that synchronizes these annotations with PDF versions of the corresponding documents on a computer.
Overview
This system is embodied in an Applescript that parses the My Clippings file and controls the Skim PDF reader. The script first parses the clippings file. It then searches through the clippings and isolates any that come from documents on the kindle matching the filename of the currently open PDF file (the "pertinent clippings"). The script then iterates through each of the pertinent clippings, locating the matching text or location in the PDF document and applying highlights or adding notes where appropriate. The end result is an annotated, printable PDF document that matches the document on the kindle.
You can download the script here: http://dl.dropbox.com/u/2541109/KindleClippings.scpt. Before running the script, be sure to change the value of MyEmail to match your sending address and to verify that the Kindle mount point defined in MyClippingsFile is correct. You'll also need the free Skim PDF Reader.
To use it, send or copy a document file to your kindle. Remember, the kindle supports RTF, DOC, TXT and other common text formats and it will convert them into MobiPocket files internally for easier reading. Make some notes. Then take the same document that you just sent to the kindle and convert it to a PDF, e.g. by using the print to PDF feature in Mac OS X. Be sure to keep the filename the same. Open that same PDF in Skim and run the script. The highlights and notes should appear in the PDF.
If you're interested in how this works, read more on my blog here:
[no longer available]
Sadly, neither his script nor his blog is available any longer.
Do you guys know if this is possible? I've been looking for this kind of functionality but can't find it anywhere.
This code, using Python and PyMuPDF, works:
import fitz
# the document to annotate
doc = fitz.open("text_to_highlight.pdf")
# the text to be marked
text_list = [
"first piece of text",
"second piece of text",
"third piece of text"
]
for page in doc:
    for text in text_list:
        rl = page.search_for(text, quads=True)
        if rl:  # skip text that does not occur on this page
            page.add_highlight_annot(rl)
# save to a new PDF
doc.save("text_annotated.pdf")
The original 'My Clippings.txt' has to be processed first. stringr could work, but I found it more useful to edit the text with multiple selections in Sublime Text. The goal is a list of highlights in the form of text_list above.
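As an alternative to editing the file by hand, a short Python sketch could build text_list directly. It makes several assumptions about the clippings format: entries are separated by a line of ten equals signs, the first line of each entry is the book title, the second line is metadata, and highlight entries contain the words "Your Highlight" in that metadata line. The exact wording differs between Kindle models, so check your own file. The function name `extract_highlights` is invented for this example.

```python
SEPARATOR = "=========="


def extract_highlights(clippings_text, book_title):
    """Return the highlighted passages recorded for one book.

    Assumed entry layout: title line, metadata line, blank line, then the text.
    """
    highlights = []
    for entry in clippings_text.split(SEPARATOR):
        lines = [line.strip() for line in entry.strip().splitlines()]
        if len(lines) < 3:
            continue  # bookmarks and empty entries carry no text
        title, metadata = lines[0], lines[1]
        if book_title not in title or "Your Highlight" not in metadata:
            continue  # another book, or a note rather than a highlight
        text = " ".join(line for line in lines[2:] if line)
        if text:
            highlights.append(text)
    return highlights
```

With the file read into a string, `text_list = extract_highlights(contents, "Book Title")` would give a list that can be passed to the highlighting loop.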
I am trying to do this using Python + a Windows macro creator (I'm a Win 7 user). You can use this approach to save the file as RTF, DOCX, PDF, etc. So far, it's been reasonably effective. Do note 2 things first:
1- The 'My Clippings' file only saves the text and the page; it does not save the location on the page (e.g., if you highlighted "mammals are animals" on page 15, it will give you this line and the page number, but if there is more than one "mammals are animals" on page 15, it's impossible to know which one you highlighted). This is especially bad when you've highlighted a generic word, like "animals" or "the". And if you made a comment by pressing on a word, that word is the only information you'll get about what the comment refers to on that page (e.g., I pressed on "animals", the menu popped up, and I selected 'Comment'. If "animals" appears 20 times on page 15, I cannot know which of them my comment refers to).
2- The only way to retrieve the location on the page would be to analyze the *.pds and *.pdt files, inside the *.sdr folder in Kindle's drive ('Documents'). I can make no sense of these files.
In Python, you can run a short script to extract the information you want from "My Clippings". Then you can use a macro creator to automate the process of copying the text and annotating the PDF with it (using Adobe Acrobat, for example), and then saving the PDF file.
An example with Adobe Acrobat:
Say I want to save all my highlights to the PDF file. First, I'll create a *.txt file in Python and run a script to copy all the strings related to the highlights to this new txt file (i.e., the highlighted text and the page number). Here's an example of such code (but first, copy and paste the "My Clippings.txt" file to the IDE start folder, e.g.: C:\Python27):
#for python 2.7.6
with open('My Clippings.txt','r') as rf:
    with open('My Clippings Output.txt','w') as wf:
        access = 0
        bookTitle = 'Book Title'#put the book file's name as it's written in "My Clippings.txt"
        for x in rf:
            if access == 1:
                wf.write(x)
            if bookTitle in x:
                access = 1
            #for highlights only, instead of all annotations, include this if statement:
            if (' | Added on ' in x) and ('- Your Note ' in x) or ('- Your Bookmark ' in x):
                access = 0
            if x == '==========\n':
                access = 0
Then I'll create a macro that copies the page number from the "My Clippings Output.txt" file (it's in the same folder as the "My Clippings.txt" file), pastes it into Acrobat's page-number box, finds (Ctrl+F) the string on the page, and then presses "highlight". Done!
There's a catch in Acrobat, though: the search/find function has a limit of ~28 characters, so the text you search for can't be longer than that. I still don't know how to circumvent this limitation; I raised the problem here: https://superuser.com/questions/884221/how-to-search-and-highlight-long-passages-in-a-pdf-file . As a workaround for the 28-character limit, you can program the macro to select the text by pressing Shift + Right Arrow 28 times, and then use "cut" instead of "copy".
There are many free and libre macro creators out there; just google and choose the one you like best. For Windows, my favorite is Pulover's Macro Creator. If you have any questions about the process, you can comment here or PM me. I'd prefer that you comment here, so that I can improve the answer.

How to generate pdf from a libreoffice calc sheet fitting the page width?

Using LibreOffice 4.1.2.3 in Ubuntu 13.10 I am desperately trying to export the content of a sheet (4 columns) into a pdf (portrait), so all 4 columns fit on a page. A page nicely explains all the settings - but they do not have any effect!
I select all the range I want to export to a pdf (the 4 columns previously mentioned), click File -> Export as PDF, and no matter what I change (e.g. zoom to 7%), the generated pdf contains two pages: One page with the first three columns, and another page with the fourth column.
This is quite cumbersome and ridiculous, and any help is appreciated to solve this problem.
Maybe the LibreOffice Help is misleading here. Those settings (Fit width etc.) only affect how the resulting PDF is displayed. If you want to scale the output to fit a certain number of pages, you have to modify the page style's properties: menu Format -> Page... -> Sheet tab.
Here, you have three options:
Reduce / enlarge printout: set a fixed scaling factor (e.g. 50 %);
Fit print range(s) to width / height: set the maximum width or the maximum height in pages; the scaling is proportional in either case;
Fit print range(s) on number of pages: set the maximum number of pages.
In your case, just select the third option and set the number of pages to 1.
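If you prefer the first option and want to pick the fixed scaling factor yourself, the arithmetic can be sketched as below. This is only a rough estimate, not how LibreOffice computes its own scaling. The defaults assume an A4 portrait page (21.0 cm wide) with 2 cm left and right margins, and the function name is invented for this example.

```python
import math


def fit_width_scale(content_width_cm, page_width_cm=21.0,
                    margin_left_cm=2.0, margin_right_cm=2.0):
    """Return the largest whole-percent scale (at most 100) at which the content fits the page width."""
    printable_cm = page_width_cm - margin_left_cm - margin_right_cm
    if content_width_cm <= 0 or printable_cm <= 0:
        raise ValueError("widths must be positive")
    # Round down so the scaled columns never exceed the printable width.
    return min(100, math.floor(100 * printable_cm / content_width_cm))
```

For example, four columns that are 34 cm wide in total would need a factor of 50 % to fit the 17 cm printable width assumed here.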
I had the same settings as in tohuwawohu's answer, but the page still ended too early, after column EF, regardless of the scale, page width or margin settings.
Then I discovered the Format -> Print Ranges -> Edit menu, where a custom print range was set.
Changing it to end at the last column solved my problem. HTH somebody.
Go to File > Print Preview, and adjust the content size with the zoom slider. Click Export and you're done.