Open a .pdf file - vba

I am trying to open a .pdf file within Excel, like an iframe in HTML.
My requirements are:
Save the paths of multiple PDF files in Excel.
Excel should open each .pdf file within Excel itself (not in a separate PDF viewer window).
It should work like an iframe in HTML: the user should be able to view the .pdf within Excel itself.
I know this is a little weird, but can anybody help me?

You could probably get the filenames via VBA.
Here is a thread with answers that claim to work:
Loop through files in a folder using VBA?
As for opening a PDF inside Excel, that's kind of pushing it.
Since your request is exotic, I can think of an exotic workaround:
If you can spare the interactivity, you can make copies of your PDFs, convert them to Word format, and load them that way. I've seen people convert PDFs to JPGs just to load them into other documents, but that's rudimentary and really fringe.
Otherwise you are facing a lot of custom coding to make it possible.

Related

Multiple PDF + Excel list

I am trying to do something a bit difficult (for me) here. I have a PDF with over 100 pages and an Excel sheet with the corresponding name for each page.
What I want is a way to split the PDF into individual PDFs and rename them according to the Excel sheet.
Thanks in advance.
Do you have Acrobat?
It has a function to split PDFs by pages, bookmarks, etc. I am sure other PDF editing software has a similar feature.
You can then write a VBA script or similar to read the Excel sheet and rename the files, if you are comfortable with code: How to rename multiple pdf files used excel database vba
Or you could try a bulk file-renaming tool, something like http://www.bulkrenameutility.co.uk/Main_Intro.php
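If you'd rather script the renaming step outside of VBA, here is a minimal Python sketch of the idea. It assumes the PDF has already been split into one file per page, that the split files sort in page order (e.g. page_001.pdf, page_002.pdf), and that the Excel column of names has been saved as a CSV with one name per row. The function name rename_pages and the file layout are made up for illustration.

```python
import csv
from pathlib import Path


def rename_pages(folder, names_csv):
    """Rename the split PDFs in `folder` using the names listed in `names_csv`.

    Assumptions (not from the original post): the PDFs sort in page order,
    and the CSV holds one name per row in its first column, in that same order.
    """
    pages = sorted(Path(folder).glob("*.pdf"))
    with open(names_csv, newline="", encoding="utf-8") as fh:
        names = [row[0].strip() for row in csv.reader(fh) if row and row[0].strip()]

    # Refuse to guess if the sheet and the split files disagree.
    if len(pages) != len(names):
        raise ValueError(f"{len(pages)} PDFs but {len(names)} names")

    renamed = []
    for page, name in zip(pages, names):
        target = page.with_name(name + ".pdf")
        if target.exists():
            raise FileExistsError(f"{target} already exists")
        page.rename(target)
        renamed.append(target)
    return renamed
```

Calling rename_pages("split_pages", "names.csv") would rename the files in place; try it on a copy of the folder first, since there is no undo.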

Programmatically fill in pdf text fields

I have some PDF forms whose text fields I am trying to populate programmatically. The program I am working with is MS Access 2013. I have tried to fill the fields directly, but had no luck. The PDF will be local to the Access database.
Is there a way that I could write the fields to a text file and kick off another script (PowerShell, JavaScript or whatever) to read that file and fill in the text fields? I'm not familiar with other languages, but will figure it out if it will work.
So the fix is going to be writing to a text file from Access, then opening the PDF, which contains JavaScript that reads that file and updates the fields. A little clunky, but it is working.

docx4j word/googledocs compatibility

I'm creating a program that reads a docx file and displays it in a JavaFX graphical interface, with buttons in place of flags put in the docx; when one presses a button, the program modifies the input docx.
I'm using the docx4j API to extract and modify the document.
The problem is that the program fails if I give it a docx generated by Microsoft Word as input. I'm forced to use a workaround.
I take my docx made in Word, load it into Google Docs, and use the "Download in .docx format" option. If I feed the docx from Word directly to my program, it fails.
I noticed my Word file was half the size after being passed through Google Docs. Likewise, if I take a docx file downloaded from Google Docs, open it in Word, modify one letter and save it, it becomes twice as large. For the record, I use Word 2008.
That's it. I'd like to know if someone knows what explains this difference.
Thanks

Converting Office files to PDF using SaveAs

I have a question about saving Microsoft Office files such as doc and xls to PDF. I am using the SaveAs option in VB.Net to convert the files to PDF programmatically. However, I need to open each file type differently to achieve this.
If the file is an Excel file, I need to open it using the Excel APIs and then perform SaveAs. Similarly with Word documents. Is there a way to open these documents generically and then use SaveAs to convert them to PDF, regardless of the file type?
The Word API and Excel API are two distinct APIs, even though they may both have a SaveAs method. You will not be able to use the same call for both file types.
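What you can still do is put one entry point in front of the two APIs and decide by file extension which one has to open the file. Here is a minimal Python sketch of just that dispatch step. The table APP_FOR_EXTENSION and the function office_app_for are hypothetical names for illustration, and the actual Word and Excel open/SaveAs calls are deliberately left out.

```python
from pathlib import Path

# Which Office application has to open each file type.
# The extensions listed here are illustrative, not exhaustive.
APP_FOR_EXTENSION = {
    ".doc": "Word",
    ".docx": "Word",
    ".xls": "Excel",
    ".xlsx": "Excel",
}


def office_app_for(path):
    """Return the name of the Office application that must open `path`."""
    ext = Path(path).suffix.lower()
    try:
        return APP_FOR_EXTENSION[ext]
    except KeyError:
        raise ValueError(f"no converter registered for {ext!r}") from None
```

The caller then branches once on the returned name and runs the Word-specific or Excel-specific conversion, so the per-application code stays separate but the rest of the program only sees one function.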

Extract text from a PowerPoint (.ppt or .pptx) file?

I'm currently using a combination of OpenOffice macros and a pdf2text program to extract text, and would like to find an easier, more efficient way of getting the text out of a PowerPoint file.
I've tried using the Apache POI library and have not had much luck: I encountered numerous exceptions within the library when trying to process the files I'm looking at, and I don't particularly want to sift through the library's source code.
Is there an easy way to do this without using the aforementioned library?
If you have MS Office and you save the PPT in the RTF (Rich Text Format), it contains just the text from the presentation. You could then open the file in any editor that understands RTF files and save it as a text (TXT) file.
I expect this to work from Open Office too.
Since you talk of API, this may not be the way to go for you but maybe it will give you newer ideas on getting there. Say, you use multiple macros to do the conversion in stages...
Edit: I got curious and did a short Google search.
This is what I found on one of the www.openoffice.org pages:
As people in this thread have pointed out, retrieving text from an OO
document isn't hard since it's just zipped xml that can be parsed with a
perl script. The problem is getting Microsoft Powerpoint documents into
a zipped XML format in the first place.
I've found that File -> Wizards -> Document Convertor does exactly that.
Just tell it you want to convert Powerpoint documents, not templates,
point it to your source directory and where you want it to spit out the
result and you're away.
I then find
unzip -p $file.sxi content.xml | perl -p -e "s/<[^>]*>/\n/g;s/ +//;s/\n\n/\n/g;" -w
works rather well for extracting the text.
Sorry, I don't have Open Office handy to try any of that out.
pptx files are relatively easy to deal with, because they are just zipped xml - you can just unzip them and then strip all the xml tags from the content of the files in the 'ppt/slides' subdirectory of the unzipped stuff, yielding most of the pertinent text.
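A minimal Python sketch of that unzip-and-strip approach, using only the standard library. Instead of stripping every tag, it collects the text runs (the a:t elements) from each slide. The function name pptx_text is made up for illustration, and slides are ordered by file number, which usually but not always matches the presentation order.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace used for text runs (<a:t>) in slide XML.
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"


def pptx_text(path):
    """Return a list with one string per slide, ordered by slide file number."""
    slides = []
    with zipfile.ZipFile(path) as zf:
        # Only the slide parts themselves, not their _rels or the notes.
        names = [n for n in zf.namelist()
                 if re.fullmatch(r"ppt/slides/slide\d+\.xml", n)]
        # Sort numerically so slide10 comes after slide9.
        names.sort(key=lambda n: int(re.search(r"\d+", n.rsplit("/", 1)[1]).group()))
        for name in names:
            root = ET.fromstring(zf.read(name))
            runs = [t.text for t in root.iter(A_NS + "t") if t.text]
            slides.append(" ".join(runs))
    return slides
```

Something like print("\n\n".join(pptx_text("deck.pptx"))) dumps the text slide by slide. Speaker notes live in a separate directory of the archive and are not covered here.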
ppt files are a whole other ballgame, and the process is rendered even more painful because the canonical tool, catppt from the catdoc package, is susceptible to a buffer overflow that makes it nearly useless (it segfaults on a large percentage of ppt files).
In LibreOffice 5, File - Export - HTML includes both slide contents and presenter notes.
Then open the .html file in Firefox or another browser and use File - Save Page As - Text File (or use a utility such as pandoc -o file.txt file.html).