How do I convert multiple .pdf files to .ai files all at once?

As part of my job, my boss wants me to convert thousands of .pdf files to .ai format in Illustrator CS6 without opening each file and saving it as .ai by hand. I need to convert these files in bulk with a few simple steps.
Using Illustrator CS6, I tried the Batch option, which applies the same action to multiple files (two, in my test). I chose two folders: a source folder containing the PDFs and a destination folder for the converted .ai files.
While the conversions were successful, the two files each opened individually in Illustrator, and I had to save each one manually.
This is not what I need. I need to be able to automatically convert thousands of PDFs into .ai files, without having to open and save each and every one of them.
How do I do this?

You can use this script as a starting point. It works for single-page .pdf files right away. For multipage files you will have to tweak it a bit more.
(function (thisObj) {
    main();

    function main() {
        // Ask the user for one or more PDF files (multi-select enabled).
        var pdffiles = File.openDialog('select one or more pdf files', '*.pdf', true);
        if (pdffiles === null) {
            return; // the dialog was cancelled
        }
        for (var f = 0; f < pdffiles.length; f++) {
            var pdf = pdffiles[f];
            // Open the PDF as an Illustrator document.
            var doc = app.open(pdf);
            // Save next to the source file, e.g. "logo.pdf.converted.ai".
            var namepattern = pdf.path + "/" + pdf.name + ".converted.ai";
            var newai = new File(namepattern);
            doc.saveAs(newai);
            doc.close(SaveOptions.DONOTSAVECHANGES);
        }
    }
})(this);
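One tweak that is easy to try: the script builds the output name as pdf.name + ".converted.ai", so logo.pdf ends up as logo.pdf.converted.ai. Below is a minimal sketch of a helper that swaps the extension instead. The name aiNameFor is hypothetical (it is not part of the original script or of the Illustrator API), and it is written in plain ES3-style JavaScript on the assumption that it will be pasted into an ExtendScript file.

```javascript
// Hypothetical helper (not part of the original script): derive the output
// file name by replacing a trailing ".pdf" (any letter case) with ".ai".
// Names without a ".pdf" suffix simply get ".ai" appended.
function aiNameFor(pdfName) {
    var stripped = pdfName.replace(/\.pdf$/i, "");
    return stripped + ".ai";
}
```

In the loop, the namepattern line could then read pdf.path + "/" + aiNameFor(pdf.name).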

Related

ImageMagick.Net - convert pdf to tiff

I am running into an issue when converting from pdf to tiff. Here is the code I used (based on a sample provided in the documentation):
private void convImageMx(string pdfFile)
{
var settings = new MagickReadSettings();
// Setting the density to 300 dpi will create an image with better quality
settings.Density = new Density(300, 300);
settings.ColorType = ColorType.TrueColor;
string tifpath = Path.GetDirectoryName(pdfFile) + "\\" + Path.GetFileNameWithoutExtension(pdfFile);
using (var images = new MagickImageCollection())
{
// Add all the pages of the pdf file to the collection
images.Read(pdfFile, settings);
var page = 1;
foreach (var image in images)
{
// Write page to file that contains the page number
image.Format = MagickFormat.Ptif;
image.Crop(image.Width, image.Height);
image.Write(tifpath + "_p_" + page + ".tif");
page++;
}
}
}
When I provide a multipage PDF as input, I get multiple TIFF files, one file per page. However, each file contains 7 pages, which are progressively smaller copies of the original page, and the files are very large (the original PDF is 328 KB; one TIFF is 67 MB!).
I think I need to set the compression and crop properties correctly, but I could not find any documentation for .NET.
[EDIT] I commented out the line that sets the density, which fixed the size issue. However, the repeated images are still an issue.

Extract Text from Multipage Attachment PDF Using Google Apps Script

I have a Gmail attachment PDF with multiple scanned pages. When I use Google Apps Script to save the blob from the attachment to a Drive file, open the PDF manually from Google Drive, then select Open With Google Docs, all of the text from the PDF is displayed as a Google Doc. However, when I save the blob as a Google Doc with OCR, only the text from the image on the first page is saved to a Doc, accessed either manually or by code.
The code to get the blob and process it is:
function getAttachments(desiredLabel, processedLabel, emailQuery){
// Find emails
var threads = GmailApp.search(emailQuery);
if(threads.length > 0){
// Iterate through the emails
for(var i in threads){
var mesgs = threads[i].getMessages();
for(var j in mesgs){
var processingMesg = mesgs[j];
var attachments = processingMesg.getAttachments();
var processedAttachments = 0;
// Iterate through attachments
for(var k in attachments){
var attachment = attachments[k];
var attachmentName = attachment.getName();
var attachmentType = attachment.getContentType();
// Process PDFs
if (attachmentType.includes('pdf')) {
processedAttachments += 1;
var pdfBlob = attachment.copyBlob();
var filename = attachmentName + " " + processedAttachments;
processPDF(pdfBlob, filename);
}
}
}
}
}
}
function processPDF(pdfBlob, filename){
// Saves the blob as a PDF.
// All pages are displayed if I click on it from Google Drive after running this script.
let pdfFile = DriveApp.createFile(pdfBlob);
pdfFile.setName(filename);
// Saves the blob as an OCRed Doc.
let resources = {
title: filename,
mimeType: "application/pdf"
};
let options = {
ocr: true,
ocrLanguage: "en"
};
let file = Drive.Files.insert(resources, pdfBlob, options);
let fileID = file.getId();
// Open the file to get the text.
// Only the text of the image on the first page is available in the Doc.
let doc = DocumentApp.openById(fileID);
let docText = doc.getBody().getText();
}
If I try to use Google Docs to read the PDF without OCR directly, I get Exception: Invalid argument, for example:
DocumentApp.openById(pdfFile.getId());
How do I get the text from all of the pages of the PDF?
DocumentApp.openById is a method that can only be used for Google Docs documents.
pdfFile can only be "opened" with DriveApp: DriveApp.getFileById(pdfFile.getId());
Opening a file with DriveApp gives you a File object, so you can use the DriveApp File methods on it.
When it comes to OCR conversion, your code works correctly for me and converts all pages of a PDF document to Google Docs, so the source of your error is likely the attachment itself or the way you retrieve the blob.
Mind that OCR conversion is not good at preserving formatting, so a two-page PDF might be collapsed into a one-page Doc, depending on the formatting of the PDF.

Apps Script save as pdf doesn't include drawings and images

I want to save a Google Doc file as a pdf in the same Google Drive folder as my current file. I know I can download the file as a pdf, but then I have to upload it into the same Google Drive folder. I am trying to skip the upload step.
I have created a script to accomplish all of this, but I cannot get the images and drawings to be included in the resulting pdf.
Here is my code:
function onOpen() {
// Add a custom menu to the document.
var ui = DocumentApp.getUi();
var menu = ui.createAddonMenu();
menu.addItem('Save As PDF','saveToPDF')
.addToUi();
}
function saveToPDF(){
var currentDocument = DocumentApp.getActiveDocument();
var parentFolder = DriveApp.getFileById(currentDocument.getId()).getParents();
var folderId = parentFolder.next().getId();
var currentFolder = DriveApp.getFolderById(folderId);
var pdf = currentDocument.getAs('application/PDF');
pdf.setName(currentDocument.getName() + ".pdf");
// Check if the file already exists and add a datecode if it does
var hasFile = DriveApp.getFilesByName(pdf.getName());
if(hasFile.hasNext()){
var d = new Date();
var dateCode = d.getFullYear() + "" + ("0" + (d.getMonth() + 1)).slice(-2) + "" + ("0" + (d.getDate())).slice(-2);
pdf.setName(currentDocument.getName() + "_" + dateCode +".pdf");
}
// Create the file (puts it in the root folder)
var file = DriveApp.createFile(pdf);
// Add to source document original folder
currentFolder.addFile(file);
// Remove the new file from the root folder
DriveApp.getRootFolder().removeFile(file);
}
Is there another way to create the pdf, save to the current Google Drive folder, and not lose the images?
UPDATE
I just tested and realized that even if I export as a pdf, the images and drawings aren't included. There has to be a way to do this.
UPDATE 2
I have been testing some more and have learned a few things:
Images in the header/footer are included if they are In line, but if I use Wrap text or Break text they are not.
Images in the body can be any of the three
However, if I use the "Project Proposal" template, they include an image in the footer with Break text and it exports to pdf. I can't tell why their image is any different.
I don't want to use In line because I want the image to touch both sides of the page and In line will always leave at least 1 pixel to the left of the image.
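Separately from the image problem, the date code that saveToPDF appends to duplicate file names can be sketched as a standalone helper. This is only an illustration: the name dateCode is hypothetical, and it uses getFullYear() because getYear() is deprecated and, per the ECMAScript specification, returns the year minus 1900.

```javascript
// Hypothetical helper: format a Date as YYYYMMDD using local-time getters.
function dateCode(d) {
    var year = String(d.getFullYear());
    // getMonth() is zero-based, so add 1; pad month and day to two digits.
    var month = ("0" + (d.getMonth() + 1)).slice(-2);
    var day = ("0" + d.getDate()).slice(-2);
    return year + month + day;
}
```

With this helper, the fallback name would be currentDocument.getName() + "_" + dateCode(new Date()) + ".pdf".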

How to Download PDF Links in Column and Save to Common Folder

We have a column that contains links to PDFs that starts on line 4 (e.g B4:B). I am trying to find a way to automatically download the PDF files that are accessed via the links to a folder on Drive. This is what I have so far:
function savePDFs() {
var sheet = SpreadsheetApp.getActiveSheet();
var data = sheet.getDataRange().getValues();
for (var i = 3; i < data.length; i++) {
Logger.log(data[i][1]);
}
}
Presumably the above code writes the links in column B (index 1) starting on row 4 (index 3), i.e. B4, down to the bottom of the data set (data.length).
I'm now confused about how to access the PDF links that are written to the logger and save the files to a folder.
Would someone be willing to help me out? I'm currently having to go to each link, click Save Link As... and then navigate to the folder that I'd like to save the linked PDF to. My hope is to modify the above process using code.
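The zero-based indexing described above can be checked on a plain array before touching the sheet. The sketch below is an assumption-laden illustration: linksFromColumn is a hypothetical helper, and sample is made-up data standing in for the result of sheet.getDataRange().getValues().

```javascript
// Hypothetical helper: collect the non-empty cells of one column, starting
// at a given row. Both indexes are zero-based, as in getValues() output.
function linksFromColumn(data, firstRow, column) {
    var links = [];
    for (var i = firstRow; i < data.length; i++) {
        var cell = data[i][column];
        if (cell !== "" && cell !== undefined && cell !== null) {
            links.push(String(cell));
        }
    }
    return links;
}

// Made-up stand-in for sheet.getDataRange().getValues(); links start at B4.
var sample = [
    ["Name", "Link"],
    ["", ""],
    ["", ""],
    ["first", "http://example.com/a.pdf"],
    ["second", ""],
    ["third", "http://example.com/c.pdf"]
];

// Row index 3 is row 4 and column index 1 is column B.
var links = linksFromColumn(sample, 3, 1);
```

Skipping blank cells here means a later fetch step never receives an empty URL.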
Update: I found this bit of code here that may help me out. Note, I changed the PDF link to a currently valid PDF link.
var urlOfThePdf = 'http://download.p4c.philips.com/l4b/9/929000277411_eu/929000277411_eu_pss_aenaa.pdf';// an example of online pdf file
var folderName = 'GAS';// an example of folder name
function saveInDriveFolder(){
var folder = DocsList.getFolder(folderName);// get the folder
var file = UrlFetchApp.fetch(urlOfThePdf); // get the file content as blob
folder.createFile(file);//create the file directly in the folder
}
Okay, I'm going to go and noodle with the data that is in the logger to confirm that the data is in properly formatted PDF links, then I'm going to test this new bit of code out. I feel like I'm getting close.
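For the "properly formatted PDF links" check mentioned above, a rough heuristic can be sketched in plain JavaScript. The function name looksLikePdfLink and the rule it applies (an http or https URL whose path ends in .pdf, optionally followed by a query string) are assumptions made for illustration; a link that serves a PDF without a .pdf extension would be rejected by it.

```javascript
// Hypothetical check: true only for strings that look like an http(s) URL
// whose path ends in ".pdf", optionally followed by a query string.
function looksLikePdfLink(value) {
    if (typeof value !== "string") {
        return false;
    }
    return /^https?:\/\/[^\s\/]+\/\S*\.pdf(\?\S*)?$/i.test(value);
}
```

Calling this on each logged cell value would show which rows are safe to pass on to the fetch step.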
You can't force a download of a file from Apps Script; you would have to try that from HtmlService, and I'm not sure it will work.
For your need, I would recommend creating a dedicated folder, adding all the PDFs to it, and using the download function of the Drive interface to download all the files in one click.
In Drive, a file can be in several folders, so the PDF files stay in their original folder, but you create a new folder, 'PDF for download' for example, and add them to it. To do that from the Drive interface, press Shift+Z when the file(s) are selected.
For your current list of files, you just have to call an add-to-folder function in your loop. You can use this function.
function addFileToFolder(id){
var folderPDF = DriveApp.getFolderById("Id OFFolder to put pdf");
var file = DriveApp.getFileById(id);
folderPDF.addFile(file);
}
EDIT: This function browses the list of URLs, gets each file, and makes a copy in a dedicated folder on the user's Drive.
function downloadInDriveFolder() {
    var folderID = 'Id of the folder'; // put the id of the folder here
    var folder = DriveApp.getFolderById(folderID); // get the folder
    var sheet = SpreadsheetApp.getActiveSheet();
    var data = sheet.getDataRange().getValues();
    for (var i = 3; i < data.length; i++) {
        // getBlob() returns a Blob, which DriveApp.createFile accepts;
        // getContent() would return a raw byte array instead.
        var blob = UrlFetchApp.fetch(data[i][1]).getBlob();
        var pdf = DriveApp.createFile(blob);
        pdf.setName(data[i][0]); // use the value in column A as the file name
        folder.addFile(pdf);
    }
}
Well I figured it out. I was expecting more code, but this does it for me:
function listPDFs() {
    var row = 3; // row index of 0 = row 1
    var column = 4; // column index of 0 = column A
    var sheet = SpreadsheetApp.getActiveSheet();
    var data = sheet.getDataRange().getValues();
    // destination folder (this is the 0978SDFSDFKJHSDF078Y98hkyo looking value when you right click your folder and select "Get Link")
    var folder = DriveApp.getFolderById("this is where you paste your folder id");
    for (var i = row; i < data.length; i++) {
        if (data[i][column] !== "") {
            var file = UrlFetchApp.fetch(data[i][column]);
            folder.createFile(file);
        }
    }
}
As you can see, I included a row and column variable so that I could easily change these.
I haven't figured out how to assemble them into a merged PDF, but I did figure out that I could sort them by date (which places the top most item first) and then right click and select "Open With...PDF Mergy", which then moves the PDFs into PDF Mergy and merges them up in the correct order. You can find PDF Mergy in the Chrome App Store. If I figure out how to automatically call PDF Mergy from GAS, I'll post that up--but for the time being the above code has saved us a ton of time...so I'm calling it good enough for the time being.

Need alternative to local or remote goto/destinations with merged documents

BACKGROUND
I have a Java program that analyzes data and creates a PDF report using iText 5.
I recently had to add a summary of major problems at the start of the document so a user would not have to read over a hundred pages to find problems. Problems are only discovered when serially looking through the data.
I solved the problem by creating 3 pdf documents and then merging them, a start/title pdf, the summary of problems pdf, and the body or analysis pdf. (Basically splitting the original document at the point I wanted to insert the summary)
I use PdfReader and PdfCopy to combine the documents. I am able to keep the chapter bookmarks OK.
THE PROBLEM
As I encounter a significant problem I add it to the 'summary' document. I want to add a link in the summary to point to the problem in the body.
I tried to use Chunk.setLocalDestination and setLocalGoto but realized why that did not work, so I tried using setLocalDestination and setRemoteGoto (with and without 'file://'), but that did not work either. (Also, I used the final pdf document name in the RemoteGoto, not the temporary pdf document name.)
I do not want to use bookmarks because that seems wrong and would not look right.
I am hoping someone could suggest an alternate method or make a suggestion.
To recap, in my current code I create a Chunk with setLocalDestination, and that chunk goes into the 'body' document. At the same time I create a Chunk with setRemoteGoto, which is put into the summary document. I was hoping the link would work once the documents were combined, but clicking it takes you to the first page of the combined document.
Thanks.....
PS I have both iText in action books
CLARIFICATION 3/5/2014
What I was calling 'bookmarks' are really Chapter class entities that are inserted into sections of the 3 documents as they are being created.
After saving the 3 documents, PdfReader is used to open each and PdfCopy is used to put them into a new, final document.
I get the data from the Chapters, which creates the 'bookmarks' on the left side of the Pdf reader used by the user, e.g. Acrobat Reader.
int thisPdfPages = reader.getNumberOfPages();
reader.consolidateNamedDestinations();
java.util.List<HashMap<String, Object>> bookmarks = SimpleBookmark.getBookmark(reader);
if (bookmarks != null) {
if (pageOffset != 0) {
if (debug3) auditLogger.log("Shifting pages by " + pageOffset );
SimpleBookmark.shiftPageNumbers(bookmarks, pageOffset, null);
}
masterBookmarks.addAll(bookmarks);
}
for (int i = 0; i < thisPdfPages;) {
page = copy.getImportedPage(reader, ++i);
stamp = copy.createPageStamp(page);
// add page numbers
ColumnText.showTextAligned(stamp.getUnderContent(), Element.ALIGN_CENTER, new Phrase(String.format("page %d of %d", start + i, totalPages)), 297.5f, 28, 0);
stamp.alterContents();
copy.addPage(page);
}
PRAcroForm form = reader.getAcroForm();
if (form != null) {
copy.copyAcroForm(reader);
}
When analyzing the data I have 2 documents open, a base document which contains all the details and a summary document which contains notable events over some thresholds.
//NOTE section is part of the 'body' document
//NOTE summaryPhrase is a part of the 'summary' document
String linkName = "summaryPf_" + networkid ;
//create Link target
section.add(new Chunk("CHANGE TO EMPTY STRING WHEN WORKING").setLocalDestination( linkName ));
//create Link
Chunk linkChunk = new Chunk( "[Link] " );
Font linkFont = new Font( regularFont );
linkFont.setColor(BaseColor.BLUE);
linkFont.setStyle( Font.UNDERLINE );
linkChunk.setFont( linkFont );
boolean useLocal = true;
// both local and remote goto's fail
if (useLocal) {
linkChunk.setLocalGoto( linkName);
} else {
// all permutations of setting filename fail,
// but it does bring up a permissions dialog when the link is clicked.
//String remotePdfName = "file://./" + pdfReportName ;
//String remotePdfName = "file://" + pdfReportName ;
//String remotePdfName = "file:" + pdfReportName ;
String remotePdfName = pdfReportName ;
linkChunk.setRemoteGoto( remotePdfName, linkName);
}
// add link to summary document
summaryPhrase.add( linkChunk );
summaryPhrase.add( String.format("There were %d devices with ping failures", summaryCount));
summaryPhrase.add( Chunk.NEWLINE );
}
If I use setLocalGoto, clicking the link in the final document takes you to the first page.
If I use setRemoteGoto, a dialog asks for permission to go to a document, but the document fails to open. I tried several permutations of the filename.