I scanned old documents into multi-page PDFs (typically 50 pages each). Each PDF page contains several pages of the original document. I would like to preprocess the PDF so that its pages match those of the original document.
Because the original documents do not all have the same format, this necessarily implies manual stages, such as selecting the pages of the original document (red rectangles in the image below).
Many tools can do that, but given the number of PDFs, I would like something as convenient as possible; in particular, the red rectangles should always have the same size.
So the workflow would be:
Go to page 1 of PDF
Choose rectangle size once and for all
Move rectangle to page 1 of original document, extract
Move rectangle to page 2 of original document, extract
Go to page 2 of PDF
Move rectangle to page 1 of original document, extract
...
This is a typical page of a PDF; the red rectangles correspond to the pages of the original document that I would like to extract.
Question: Do you know of any tool (Linux or Windows), ideally free, that would be suitable for what I am trying to do?
Note: this is related, but in my case the parts I want to extract are not always at the same position (otherwise it could easily be done with pdfcrop and a little script).
I finally decided to write a piece of Mathematica code, reproduced here in case it helps someone one day:
(* import *)
fileName = "mypdf.pdf";
pdf = Import[fileName];
(* desired output size *)
rectangleSize = {1620, 2070};
i = 1;
(* update function: crop and save when clicked *)
update[a_, b_] := Block[{},
Export[fileName <> IntegerString[i, 10, 3] <> ".pdf",
ImageTake[
pdf[[i]], {a, a + rectangleSize[[2]]}, {b,
b + rectangleSize[[1]]}]];
i += 2;]
(* a function to deal with pages that should not be cropped, if any; defined with := so that it runs only when the button is clicked *)
relaxCropSize := Block[{},
buffer = rectangleSize;
rectangleSize = ImageDimensions[pdf[[i]]];]
(* the "interface" *)
Manipulate[
ImageResize[ImageTake[pdf[[i]], {a, a + rectangleSize[[2]]}, {b,
b + rectangleSize[[1]]}], 200], {a, 1,
ImageDimensions[pdf[[i]]][[2]] - rectangleSize[[2]]}, {b, 1,
ImageDimensions[pdf[[i]]][[1]] - rectangleSize[[1]]},
Row[{Spacer[100],
Button["Save and next page", update[a, b], Method -> "Queued",
ImageSize -> 100]}],
Row[{Spacer[100],
Button["Relax Crop Size", relaxCropSize, Method -> "Queued",
ImageSize -> 100]}]]
Slide a and b to adjust the rectangle, then click "Save and next page" to save the result and go to the next page.
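For readers without Mathematica, the core of the idea (a rectangle of fixed size whose top-left corner is the only thing that moves) can be sketched in Python. This is only an illustration under assumptions: the function name crop_box and the page size used below are hypothetical, and the actual cropping is left to whatever image or PDF library you use.

```python
def crop_box(page_size, rect_size, left, top):
    """Return (left, top, right, bottom) for a fixed-size rectangle,
    shifted if needed so that it stays inside the page.

    page_size and rect_size are (width, height) in pixels.
    """
    page_w, page_h = page_size
    rect_w, rect_h = rect_size
    if rect_w > page_w or rect_h > page_h:
        raise ValueError("rectangle is larger than the page")
    # Clamp the top-left corner so the rectangle never leaves the page.
    left = max(0, min(left, page_w - rect_w))
    top = max(0, min(top, page_h - rect_h))
    return (left, top, left + rect_w, top + rect_h)


# Same rectangle size as in the Mathematica code; the page size is made up.
rect = (1620, 2070)
page = (3500, 2500)
print(crop_box(page, rect, 100, 50))     # (100, 50, 1720, 2120)
print(crop_box(page, rect, 3000, 1000))  # (1880, 430, 3500, 2500), pushed back inside
```

The returned tuple follows the (left, upper, right, lower) order that Pillow's Image.crop expects, so it could be passed to it directly if you rasterize the pages first.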
What I'm trying to do: Iterate over each page in a PDF, and extract the number of words on each page.
What is happening instead: The code below returns 0 words for any page that has not become "editable". Although I have selected the option for all pages to become editable at once, Acrobat does not keep a page editable for very long after I have left that page. Side note: it also seems to cap how many pages can be "editable" at once. This is a problem because right now I'm working with a 10-page selection of a PDF file, and the same code will have to work with a 120+ page PDF. Please click 'Edit PDF'-->'Scanned Documents'-->'Settings' to see what I mean by "editable".
What I've tried so far: I've tried various ways to get Acrobat to make the page being iterated upon the "active one" so that it would become editable. I've tried manually setting the page number after each iteration of the for loop, and adding an artificial delay, as with the for loop over h in the sample code. I've also looked for some method that determines which page is the "active one", but I've had no luck so far.
CurrDoc = app.activeDocs[0];
CurrDoc.title;
NumPagesInDoc = CurrDoc.numPages;
console.println("Document has " + NumPagesInDoc + " pages");
for (j = 0; j < NumPagesInDoc; j++)
{
    NumWordsOnPage = CurrDoc.getPageNumWords(j);
    CurrDoc.pageNum = j;
    for (h = 0; h < 10000; h++); // <-- I've tried adding delays to give Acrobat time
                                 // to catch up, but this hasn't worked.
    console.println("Page number: " + j + " has this number of words: " + NumWordsOnPage);
}
Output:
Document has 10 pages
Page number: 0 has this number of words: 309
Page number: 1 has this number of words: 0
Page number: 2 has this number of words: 0
Page number: 3 has this number of words: 0
Page number: 4 has this number of words: 0
Page number: 5 has this number of words: 0
Page number: 6 has this number of words: 0
Page number: 7 has this number of words: 0
Page number: 8 has this number of words: 0
Page number: 9 has this number of words: 158
true
Note: Different pages may show a non-zero count at different times, depending on which pages I've clicked on most recently before running the script.
Any guidance or help would be greatly appreciated. Thank you for your time.
I'm still not entirely sure what the issue is, but I've found a way to get Acrobat to behave most of the time.
Before clicking the "make all pages editable" option, zoom all the way out until you can see all the pages in the document. For whatever reason, when I did this, it would seem to refresh something about the settings and once again make all the pages editable. This even seemed to work when I opened a totally different pdf and pressed "make all pages editable" even without zooming out.
I want to move the search bar, which is at the bottom of the data table in the admin template, to the top.
It is defined in JavaScript. Any suggestions, please?
Take this piece of code from the link you provided:
/*
* Set DOM structure for table controls
* #url http://www.datatables.net/examples/basic_init/dom.html
*/
sDom: '<"block-controls"<"controls-buttons"p>>rti<"block-footer clearfix"lf>',
Here you can see the sDom option. It is the DataTables option that sets the table's DOM structure and places the control sections around it. I would suggest taking a look at the DataTables DOM settings.
The letters you see in the setting are p, r, t, i, l and f. They mean:
p - pagination control
r - processing display element
t - The table!
i - Table information summary
l - length changing input control
f - filtering input
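So by changing this dom setting you should be able to place the items wherever you wish.
To get the length control and the search box above the table, you would need:
sDom: 'lftir'
Note that this string leaves out p, so the pagination control is not rendered; add p wherever you want it to appear (for example 'lftipr').
Add the divs and styling accordingly, as explained in the markup and styling section of the link above.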
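To check what a given dom string will produce before trying it in the page, the letter meanings listed above can be put into a small lookup. The following is only a rough Python sketch for reading such strings, not part of DataTables: the names DOM_LETTERS and controls_in_order are made up for illustration, and it assumes class names are written in double quotes as in the template's string.

```python
# Letter meanings taken from the list in this answer.
DOM_LETTERS = {
    "p": "pagination control",
    "r": "processing display element",
    "t": "the table",
    "i": "table information summary",
    "l": "length changing input control",
    "f": "filtering input",
}


def controls_in_order(dom):
    """List the controls named by a dom string, in the order they appear.

    Double-quoted class names and the angle brackets of the div
    wrappers are skipped; only the bare option letters are kept.
    """
    controls = []
    in_quotes = False
    for ch in dom:
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes and ch in DOM_LETTERS:
            controls.append(DOM_LETTERS[ch])
    return controls


template = '<"block-controls"<"controls-buttons"p>>rti<"block-footer clearfix"lf>'
print(controls_in_order(template))  # filtering input comes last, below the table
print(controls_in_order("lftir"))   # filtering input now comes before the table
```

Anything that appears before "the table" in the returned list ends up above the table, and anything after it ends up below.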
How do I specify the page size dynamically in a Yii view (not a grid view)?
In my custom view page there is a search option to find the list of users between two dates.
I need to know how to specify the page size dynamically. If the number of rows is less than 10, how do I hide the pagination?
Please see the screenshot.
I don't want the page navigation to appear after a search; it confuses users.
My controller code:
$criteria=new CDbCriteria();
$count=CustomerPayments::model()->count($criteria);
$pages=new CPagination($count);
// results per page
$pages->pageSize=10;
$pages->applyLimit($criteria);
$paymnt_details=CustomerPayments::model()->findAll($criteria);
$this->render('renewal',array('paymnt_details'=>$paymnt_details,'report'=>$report,'branch'=>$branch,'model'=>$model,'pages' => $pages,));
}
my view
Link
I'm assuming you want the entire list of search results to be shown on a single page, no matter how long it is?
If so, this is quite easy to achieve.
As I can see in your code, you are currently setting pageSize to 10. You can make it dynamic by doing this:
$pages->pageSize = $count;
To hide the pagination controls you can change the CSS of your view file, but for that we would need to see your view file and how you defined it.
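The arithmetic behind "hide the pager when there is nothing to page through" is small enough to sketch. The Python below is only an illustration of the logic, not Yii code; the function names page_count and should_show_pager are hypothetical. In the view, the same condition would wrap whatever renders the pager.

```python
def page_count(total_rows, page_size):
    """Number of pages needed to show total_rows with page_size rows each."""
    if total_rows <= 0:
        return 0
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # Integer ceiling division.
    return (total_rows + page_size - 1) // page_size


def should_show_pager(total_rows, page_size):
    """The pager is only useful when there is more than one page."""
    return page_count(total_rows, page_size) > 1


print(should_show_pager(7, 10))   # False: fewer rows than one page holds
print(should_show_pager(25, 10))  # True: 3 pages
print(should_show_pager(25, 25))  # False: page size set to the row count
```

The last call corresponds to setting the page size to the row count as suggested above: everything fits on one page, so there is nothing for the pager to do. Note that the row count can be 0 for an empty search result, which is why the sketch checks for that case before dividing.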
Can someone help me identify the page boundaries in a multi-page job in a .prn file?
My goal is to insert the PJL command @PJL SET MEDIASOURCE=TRAYX at the start of every page of a multi-page job file,
where X = page number, for example:
for page 1: @PJL SET MEDIASOURCE=TRAY1
for page 2: @PJL SET MEDIASOURCE=TRAY2