Splitting pages within a PDF with AcroJS / Acrobat JS using a given array of names

I am very new to the JS interface in Acrobat, and I am trying to write a script that splits a PDF into single pages and saves each page under a name taken from an array of file names. I cannot find many snippets that show how to work with Acrobat JS. Can you give some guidance on what such a script would look like and how I can run it within Acrobat? Thanks!

First, you will generally need Acrobat Professional or Standard for JS tasks. You run code in what Acrobat calls the JavaScript Debugger, much as you would in a terminal or immediate window. First enable JavaScript and the debugger in Preferences. After setting the preferences, restart Acrobat and open the JavaScript Debugger (its location differs by version; search online if you can't find it).
Once the debugger is running, edit the code below so that it uses your file names and the appropriate file paths. Then select the entire code block and press Ctrl+Enter, and it will split the pages for you. Enjoy.
// In the console, "this" is the active document; pass it in explicitly.
Split(this);
function Split(doc) {
    var totalPages = doc.numPages;
    var i;
    var arrNames = [ "SOME ARRAY" ]; // one file name per page, in page order
    var targetPath = "/C/Users/...SOMEPATH/";
    try {
        for (i = 0; i < totalPages; i++) {
            // With only nStart given, extractPages extracts that single page.
            doc.extractPages({
                nStart: i,
                cPath: targetPath + arrNames[i] + ".pdf"
            });
            console.println("Completed: " + targetPath + arrNames[i] + ".pdf");
        }
    } catch (e) {
        console.println("Aborted: " + e);
    }
}
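The loop relies on the names array having exactly one entry per page, and nothing checks that. As a minimal sketch of that check, the plain-JavaScript helper below builds the list of targets up front and fails early on a length mismatch or a duplicate name. buildSplitTargets and the "/C/out/" path are hypothetical names used only for illustration; the helper does not call any Acrobat API, and it is written in older-style syntax on the assumption that Acrobat's engine may not support newer language features.

```javascript
// Hypothetical helper: pairs each zero-based page index with an output path.
// It only builds plain objects; it does not touch any Acrobat API.
function buildSplitTargets(numPages, names, targetPath) {
    if (names.length !== numPages) {
        throw new Error("Expected " + numPages + " names but got " + names.length);
    }
    var seen = {};
    var targets = [];
    for (var i = 0; i < numPages; i++) {
        var name = names[i];
        // A repeated name would make a later page overwrite an earlier file.
        if (Object.prototype.hasOwnProperty.call(seen, name)) {
            throw new Error("Duplicate file name: " + name);
        }
        seen[name] = true;
        targets.push({ nStart: i, cPath: targetPath + name + ".pdf" });
    }
    return targets;
}

// Example with made-up names and a made-up folder.
var targets = buildSplitTargets(2, ["cover", "summary"], "/C/out/");
```

Each object in the returned array has the same shape as the argument passed to extractPages in the script, so inside Acrobat it could presumably be handed over as is.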

How to clean existing properties and replace with metadata template on Photoshop (scripting)?

While creating a script to automate the different tasks I do when I start working on a new picture in Photoshop, I encountered the following problem.
Manually, I would press Ctrl + Alt + Shift + I, click on the template I want and choose the option "Clear existing properties and replace with template properties".
I can't find a way to do precisely this from a script. The best thing I managed to find is something like this:
app.activeDocument.info.author = "test";
app.activeDocument.info.caption = "";
app.activeDocument.info.captionWriter = "";
app.activeDocument.info.headline = "";
app.activeDocument.info.instructions = "";
app.activeDocument.info.keywords = "";
app.activeDocument.info.authorPosition = "";
app.activeDocument.info.credit = "";
app.activeDocument.info.source = "";
app.activeDocument.info.category = "";
app.activeDocument.info.supplementalCategories = "";
app.activeDocument.info.title = "";
// etc.
However, this doesn't behave like "Clear existing properties and replace with template properties".
I didn't find anything in the Photoshop scripting guide, nor on the internet. Any help would be greatly appreciated!
I think the problem is that Photoshop separates file metadata from its activeDocument metadata. What you see in "File info..." (via Ctrl+Alt+Shift+I) is supposed to represent the file in the filesystem, in which the metadata is embedded.
There are several guides to Photoshop scripting. I think the relevant one for you is the "JavaScript Tools Guide", specifically chapter 10, "Scripting Access to XMP Metadata".
Is it important for you to set up the metadata already when creating a new picture? If not, you may want to look at a solution using a customized export script.
It customizes the XMP metadata on export, roughly as follows.
Create a basic metadata object:
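// Load the XMP scripting library first; XMPMeta is undefined without it.
if (ExternalObject.AdobeXMPScript == undefined) {
    ExternalObject.AdobeXMPScript = new ExternalObject("lib:AdobeXMPScript");
}
var meta = new XMPMeta();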
Provide a namespace URI known to Photoshop (see the XMP specification), along with a tag name and a value:
meta.setProperty(XMPConst.NS_XMP, "CreatorTool", app.version);
Save the image temporarily (saveImage comes from another script):
var imgFile = new File(fileName);
saveImage(fileName);
Finish saving by adding the metadata-object:
var metaFile = new XMPFile(imgFile.fsName, XMPConst.FILE_UNKNOWN, XMPConst.OPEN_FOR_UPDATE);
if (metaFile.canPutXMP(meta)) { metaFile.putXMP(meta); }
metaFile.closeFile(XMPConst.CLOSE_UPDATE_SAFELY);
Doing it this way also erases any existing or default metadata.

Handle download dialog box in SlimerJS

I have written a script that clicks on a link which downloads an MP3 file. The problem I am facing is that when the script simulates the click on that link, a download dialog box pops up.
Now, I want to save this file to a path of my choice and automate the whole process. I have no idea how to handle this dialog box.
Here's a script, adapted from a blog post, that downloads a file.
In SlimerJS it is possible to use response.body inside the onResourceReceived handler. However, to avoid using too much memory, it does not capture anything by default. You first have to set page.captureContent to say what you want: assign it an array of regexes describing which files to receive. Each regex is applied to the MIME type. In the example code below I use /.*/ to mean "get everything". Using [/^image\/.+$/] should get only images, and so on.
var fs = require('fs');
var page = require('webpage').create();
fs.makeTree('contents');
// Capture the body of every response; each regex is matched against the MIME type.
page.captureContent = [ /.*/ ];
page.onResourceReceived = function(response) {
    // Wait until the whole body has arrived, and skip empty responses.
    if (response.stage != "end" || !response.bodySize) {
        return;
    }
    // Use the last path segment of the URL as the file name.
    var matches = response.url.match(/[/]([^/]+)$/);
    if (!matches) {
        return; // the URL ends with "/", so there is no file name to use
    }
    var fname = "contents/" + matches[1];
    console.log("Saving " + response.bodySize + " bytes to " + fname);
    fs.write(fname, response.body);
    phantom.exit();
};
page.onResourceRequested = function(requestData, networkRequest) {
    //console.log('Request (#' + requestData.id + '): ' + JSON.stringify(requestData));
};
page.open("http://....mp3", function() {
});
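To get a feel for which responses a given captureContent list would select, the matching can be mimicked in plain JavaScript. This is only a sketch of "regexes tested against a MIME type string" as described above; shouldCapture is a hypothetical helper, not part of SlimerJS, and the way SlimerJS applies the patterns internally is assumed rather than confirmed.

```javascript
// Hypothetical helper: true when at least one pattern matches the MIME type.
function shouldCapture(mimeType, patterns) {
    for (var i = 0; i < patterns.length; i++) {
        if (patterns[i].test(mimeType)) {
            return true;
        }
    }
    return false;
}

var everything = [ /.*/ ];          // same list as in the script above
var imagesOnly = [ /^image\/.+$/ ]; // note the escaped slash inside the literal
var audioOnly  = [ /^audio\/.+$/ ]; // would cover audio/mpeg, the MIME type of MP3

var mp3Captured  = shouldCapture("audio/mpeg", audioOnly);
var htmlCaptured = shouldCapture("text/html", imagesOnly);
```

Narrowing the list to the type you actually want keeps the other responses of the page out of memory, which is the reason capturing is off by default.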
You can't control a dialog box; SlimerJS doesn't have an API for this.
Firefox generates a temporary "downloadfile.extension.part" file which contains the content. Simply rename the file, e.g. myfile.csv.part > myfile.csv.
If you are working locally on a Mac, you should find the .part file in the Downloads directory; on Linux, look in the /tmp folder.
It is not the most elegant solution, but it should do the trick.

How to check multiple PDF files for annotations/comments?

Problem: I routinely receive PDF reports and annotate (highlight etc.) some of them. I had the bad habit of saving the annotated PDFs together with the non-annotated PDFs. I now have hundreds of PDF files in the same folder, some annotated and some not. Is there a way to check every PDF file for annotations and copy only the annotated ones to a new folder?
Thanks a lot!
I'm on Windows 7 64-bit, I have Adobe Acrobat XI installed, and I'm able to do some beginner coding in Python and JavaScript.
Please ignore the following suggestion, since the answers already solved the problem.
EDIT: Following Mr. Wyss' suggestion, I created the following code for Acrobat's Javascript console to be run only once at the beginning:
counter = 1;
// Open a new report
var rep = new Report();
rep.size = 1.2;
rep.color = color.blue;
rep.writeText("Files WITH Annotations");
Then this code should be applied to all PDFs:
this.syncAnnotScan();
annots = this.getAnnots();
path = this.path;
if (annots) {
    rep.color = color.black;
    rep.writeText(" ");
    rep.writeText(counter.toString() + "- " + path);
    rep.writeText(" ");
    if (counter % 20 == 0) {
        rep.breakPage();
    }
    counter++;
}
And, at last, one code to be run only once at the end:
//Now open the report
var docRep = rep.open("files_with_annots.pdf");
There are two problems with this solution:
1. The "Action Wizard" seems to always apply the same code afresh to each PDF. That means the "counter" variable, for instance, is meaningless: it will always be 1. More importantly, the variable "rep" will be unassigned when the middle code is run on different PDFs.
2. How can I make the code that should run only once run only at the beginning or at the end, instead of running every time for every single PDF (as it does by default)?
Thank you very much again for your help!
This would be possible using the Action Wizard to put together an action.
The function to determine whether there are annotations in the document would be done in Acrobat JavaScript. Roughly, the core function would look like this:
this.syncAnnotScan(); // updates all annots
var myAnnots = this.getAnnots();
if (myAnnots != null) {
    // do something if there are annots
} else {
    // do something if there are no annots
}
And that should get you there.
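If the check is needed in more than one place, it can be wrapped in a small function. The sketch below assumes it is handed an Acrobat Doc object (for example "this" in an Action Wizard step); hasAnnotations is a hypothetical name, and the two stand-in objects at the end exist only so the logic can be tried outside Acrobat.

```javascript
// Hypothetical helper; `doc` is expected to behave like an Acrobat Doc object.
function hasAnnotations(doc) {
    doc.syncAnnotScan();          // make sure all annotations have been scanned
    var annots = doc.getAnnots(); // null when the document has no annotations
    return annots !== null && annots !== undefined && annots.length > 0;
}

// Stand-in objects for trying the function outside Acrobat.
var annotatedDoc = {
    syncAnnotScan: function () {},
    getAnnots: function () { return [{ type: "Highlight" }]; }
};
var cleanDoc = {
    syncAnnotScan: function () {},
    getAnnots: function () { return null; }
};

var annotatedResult = hasAnnotations(annotatedDoc);
var cleanResult = hasAnnotations(cleanDoc);
```

Inside Acrobat the call would presumably be hasAnnotations(this), with the two branches of the if statement deciding what to do with the file.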
I am not completely positive, but I think there is also a Preflight check which tells you whether there are annotations in the document. If so, you would create a Preflight droplet, which would sort out the annotated and not annotated documents.
Mr. Wyss is right, here's a step-by-step guide:
1. In Acrobat XI Pro, go to the 'Tools' panel on the right side.
2. Click on the 'Action Wizard' tab (you must first make it visible, though).
3. Click on 'Create New Action...', choose 'More tools' > 'Execute Javascript' and add it to the right-hand pane, then click on 'Execute Javascript' > 'Specify Settings' (uncheck 'prompt user' if you want) and paste this code:
this.syncAnnotScan();
var annots = this.getAnnots();
var fname = this.documentFileName;
// Replace every comma; a plain string pattern would replace only the first one.
fname = fname.replace(/,/g, ";");
var errormsg = "";
if (annots) {
    try {
        this.saveAs({
            cPath: "/c/folder/" + fname,
            bPromptToOverwrite: false // make this 'true' if you want to be prompted on overwrites
        });
    } catch (e) {
        // Collect every property of the exception for the alert text.
        for (var i in e) {
            errormsg += (i + ": " + e[i] + " / ");
        }
        app.alert({
            cMsg: "Error! Unable to save the file under this name ('" + fname + "'- possibly an unicode string?) See this: " + errormsg,
            cTitle: "Damn you Acrobat"
        });
    }
}
annots = 0;
Save and run it! All your annotated PDFs will be saved to 'c:\folder' (but only if this folder already exists!)
Be sure to first enable JavaScript in 'Edit' > 'Preferences...' > 'Javascript' > 'Enable Acrobat Javascript'.
VERY IMPORTANT: Acrobat's JS has a bug that doesn't allow documents to be saved with commas (",") in their names (e.g., "Meeting with suppliers, May 11th.pdf" will produce an error). Therefore, the code above replaces every "," with ";".
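One JavaScript detail matters for this substitution: String.replace with a plain string as the pattern replaces only the first match, so a file name with several commas needs a regex with the g flag. A minimal sketch of the difference follows; replaceCommas is a hypothetical helper name and the file names are made up.

```javascript
// Hypothetical helper: swaps every comma in a file name for a semicolon.
function replaceCommas(fileName) {
    return fileName.replace(/,/g, ";");
}

var firstOnly = "a, b, c.pdf".replace(",", ";"); // "a; b, c.pdf"
var allCommas = replaceCommas("a, b, c.pdf");    // "a; b; c.pdf"
```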

Print PDF in Website

I have been searching for days for a solution to this problem.
Description : I have a website which loads a PDF dynamically via an iFrame. The PDF is saved on the server and the user of the website can view the pdf on the website.
Problem : Introduce a Print button on website which prints the PDF which was created dynamically and saved on the server.
Is this even possible? To make things worse, I also need a cross-browser implementation. I have tried any number of JS options from the web, but none of them seem to work: I cannot get the PDF printed the way it looks on screen. In short, I am trying to emulate the print button that appears on the PDF when it is loaded. Is there a way to pass the PDF document from the server to the print dialog box?
Solution : I could not find an exact solution to this problem, but here is how I solved it -
Create the 'Print' button as required and have it redirect to another page which contains only the PDF.
Copy the original PDF into a new PDF with embedded JS, this.print(), so that the print dialog pops up for the user as soon as the new PDF opens.
In the new page -
// sPdf, inStr, bytecnt and buffer are assumed to be declared elsewhere in the page class.
if ("Location of PDF " != null)
{
    sPdf = "Location of PDF ";
    PdfReader pReader = new PdfReader(sPdf);
    Document document = new Document
        (pReader.GetPageSizeWithRotation(ApplicationConstants.INDEX_ONE));
    int n = pReader.NumberOfPages;
    FileStream fs = new FileStream
        ("New PDF location",
        FileMode.Create, FileAccess.Write);
    PdfCopy copy = new PdfCopy(document, fs);
    // Copy every page of the original PDF into the new one
    document.Open();
    for (int i = ApplicationConstants.INDEX_ONE; i <= n; i++)
    {
        PdfImportedPage page = copy.GetImportedPage(pReader, i);
        copy.AddPage(page);
    }
    // Embed JavaScript that opens the print dialog when the PDF is opened
    copy.AddJavaScript("this.print(true);", true);
    document.Close();
    pReader.Close();
    // Stream the new PDF back to the browser
    inStr = File.OpenRead("New PDF location");
    while ((bytecnt = inStr.Read
        (buffer, ApplicationConstants.INDEX_ZERO, buffer.Length))
        > ApplicationConstants.INDEX_ZERO)
    {
        if (Context.Response.IsClientConnected)
        {
            Context.Response.ContentType = "application/pdf";
            // Write only the bytes actually read; the last chunk is usually shorter than the buffer
            Context.Response.OutputStream.Write(buffer,
                ApplicationConstants.INDEX_ZERO, bytecnt);
            Context.Response.Flush();
        }
    }
    inStr.Close();
}
Please note that I am using iTextSharp to inject the JS script into the new PDF. Hope this helps someone else. I am still trying to find a solution that does not rely on iTextSharp or any other DLL, but this will have to do for now.
I am not sure if this will work, but you could try launching a popup window with a special version of your PDF file that opens the print dialog when opened. Then close the popup afterwards. This last part might be tricky since I think there is no clean way to know if the print dialog has been closed.

Selenium 2: How to save a HTML page including all referenced resources (css, js, images...)?

In Selenium 2, the WebDriver object only offers a method getPageSource() which saves the raw HTML page without any CSS, JS, images etc.
Is there a way to also save all referenced resources in the HTML page (similar to HtmlUnit's HtmlPage.save())?
I know I'm royally late with my answer, but I didn't really find an answer for this question when I was searching myself. So I did something myself, hope I can help some people still.
For C#, here's how I did it:
using System.Net;
string DataDirectory = "C:\\Temp\\AutoTest\\Data\\";
string PageSourceHTML = Driver.PageSource;
string[] StringSeparators = new string[] { "<" };
string[] Result = PageSourceHTML.Split(StringSeparators, StringSplitOptions.None);
string CSSFile;
string FileName = "filename.html";
// Save the raw HTML first
System.IO.File.WriteAllText(DataDirectory + FileName, PageSourceHTML);
foreach (string S in Result)
{
    if (S.Contains("stylesheet"))
    {
        CSSFile = S.Substring(28); // strip off "link rel="stylesheet" href="
        CSSFile = CSSFile.Substring(0, CSSFile.Length - 10); // strip off trailing characters such as " />", the newline and the spaces up to the next "<". Can and probably will be different in your case.
        System.IO.Directory.CreateDirectory(DataDirectory + "\\" + CSSFile.Substring(0, CSSFile.LastIndexOf("/"))); // create the CSS directory structure
        var Client = new WebClient();
        Client.DownloadFile(Browser.Browser.WebUrl + "/" + CSSFile, DataDirectory + "\\" + CSSFile); // download the file and save it with the same filename under the same relative path.
    }
}
I'm sure it could be improved to include any unforeseen situations, but for my website in test it will always work like this.
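The fixed substring offsets work only while every link tag has exactly the same shape. As a rough alternative, the href values can be pulled out with a regex, which tolerates a different attribute order and different spacing. The sketch below is in JavaScript rather than C#, extractStylesheetHrefs is a hypothetical helper, and the sample markup is made up; regex-based HTML handling remains fragile, so treat it as an illustration rather than a robust parser.

```javascript
// Hypothetical helper: collects the href of every <link rel="stylesheet"> tag.
function extractStylesheetHrefs(html) {
    var hrefs = [];
    var linkTag = /<link\b[^>]*>/gi;
    var tag;
    while ((tag = linkTag.exec(html)) !== null) {
        var text = tag[0];
        // Skip link tags that are not stylesheets (icons, canonical URLs, ...).
        if (!/\brel\s*=\s*["']?stylesheet["']?/i.test(text)) {
            continue;
        }
        var href = /\bhref\s*=\s*["']([^"']+)["']/i.exec(text);
        if (href) {
            hrefs.push(href[1]);
        }
    }
    return hrefs;
}

// Made-up page source with two stylesheets and one icon link.
var sampleHtml =
    '<head><link rel="stylesheet" href="css/site.css" />\n' +
    '<link href="theme/dark.css" rel="stylesheet">' +
    '<link rel="icon" href="favicon.ico"></head>';
var stylesheetHrefs = extractStylesheetHrefs(sampleHtml);
```

Each returned path could then be downloaded relative to the site URL, in the same way the loop above does for CSSFile.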
Nope. If you can, go for HtmlUnit for this particular task.
The best you could do, I think, is Robot. Press Ctrl + S, then confirm with Enter. It's blind, it's imperfect, but it's the closest thing to what you need.
You can use the selenium interactions to handle it.
using OpenQA.Selenium.Interactions;
There are a few ways to do it as well. One of the ways that I handle something like this, is to find an item central to the page, or whichever area that you wish to save, and do an actions builder.
var htmlElement = driver.FindElement(By.XPath("//your path"));
Actions action = new Actions(driver);
try
{
    action.MoveToElement(htmlElement).ContextClick(htmlElement).SendKeys("p").Build().Perform();
}
catch (WebDriverException) {}
This will simply right-click on the area and then send the key "p", which is the 'Save Page As' hotkey in the Firefox context menu. Another way is to have the builder send the keyboard shortcut.
var htmlElement = driver.FindElement(By.XPath("//your path"));
action.MoveToElement(htmlElement);
try
{
    action.KeyDown(Keys.Control).SendKeys("S").KeyUp(Keys.Control).Build().Perform();
}
catch (WebDriverException) {}
Note that in both cases, if you leave the scope of the driver, say for a Windows form, you will have to switch your code to handle the Windows form when it pops up. Selenium also has issues when nothing is returned after the keys are sent, which is what the try/catch blocks are for. If anyone has a way to work around that, it would be awesome.