Error When Importing PDF into Word - vba

I need to perform multiple edits in a Word document and, based on search criteria, import the relevant PDF for each specific page. I build the file path dynamically from data on the various pages. The code works fine if there are fewer than roughly 70 PDF files to import.
Once it gets past around 70 PDF files, Word starts to warn that the action about to be performed cannot be undone, even though I clear the undo history after importing each PDF.
An error also comes up when importing the PDF.
I can still open the PDF document itself when this error occurs, but if I stop my code and try to import the PDF manually, it doesn't work. Only after I save the Word document and reopen it can I import PDF documents into it again.
I believe it has something to do with all the actions being performed in Word.
Unfortunately, I am not able to create demo code that reproduces the same behavior for this question.
Why would PDF documents give an error when importing into Word?

Related

Getdata count the number of pages in a PDF file

I am currently trying to write some code to pull some data from some PDF files. I have used record macro to get the required power query lines correct and that's not an issue. My issue is that the number of pages in each PDF file changes. When I run power query through the get data on the data tab, you can see in this example there are 7 pages. I need a way to set a variable to 7, so I can update my for loop for each page.
Is this possible? Surely it is because getdata knows there are 7 pages. I don't have Adobe Acrobat Pro, and all solutions look at using Acrobat to open the file to then count. All solutions I've tried result in ActiveX 429 error, as I don't have Acrobat.
I have tried all the code samples that involve opening the PDF with Acrobat, and also using Word as an intermediary (the page count is often incorrect).
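One approach that needs neither Acrobat nor Word is to scan the raw bytes of the file for page objects. The following is a minimal sketch in Python, not a definitive implementation; the function names are hypothetical, and the same scan could be ported to VBA by reading the file in binary mode. It assumes the page objects are stored uncompressed. PDFs that keep them inside compressed object streams (possible from PDF 1.5 onward) will report too few pages or zero, and in that case a real PDF parser is needed.

```python
import re

# Matches "/Type /Page" but not "/Type /Pages" (the page-tree node).
PAGE_OBJECT = re.compile(rb"/Type\s*/Page(?![A-Za-z])")


def estimate_page_count(pdf_bytes: bytes) -> int:
    """Estimate the page count by counting page objects in raw PDF bytes."""
    return len(PAGE_OBJECT.findall(pdf_bytes))


def estimate_page_count_from_file(path: str) -> int:
    """Read a PDF from disk in binary mode and estimate its page count."""
    with open(path, "rb") as handle:
        return estimate_page_count(handle.read())
```

The result is only an estimate, so it is worth checking it against a few files with a known page count before relying on it in a loop.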

Open a .pdf file

I am trying to open a .pdf file within Excel like an iframe in HTML.
My requirement is:
Save the path of multiple PDF files in Excel.
Excel should open each .pdf file within Excel itself (no need to open that in a separate .pdf window).
It should be like an iframe in HTML. The user should be able to view the .pdf within Excel itself.
I know this is a little weird, but can anybody help me?
You could probably get the filenames via VBA.
Here are some answers that claim to work:
Loop through files in a folder using VBA?
As far as opening a PDF in Excel goes, that's kind of pushing it.
Since your request is exotic, I can think of an exotic workaround:
If you can spare the interactivity, you can simply make copies of your PDFs, convert them to Word formats, and load them that way. I've seen people convert PDFs to JPGs just to load them into other documents, but that's rudimentary and really fringe.
Otherwise you are facing a lot of custom coding to make it possible.
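For the first part of the requirement (collecting the PDF paths from a folder), the idea can be sketched as follows. This is an illustration in Python rather than VBA, and the function name `list_pdf_paths` is hypothetical; the linked question covers the VBA equivalent. It only looks at files directly inside the folder, not in subfolders.

```python
from pathlib import Path


def list_pdf_paths(folder: str) -> list[str]:
    """Return the paths of the PDF files directly inside `folder`, sorted by path."""
    return sorted(
        str(path)
        for path in Path(folder).iterdir()
        # Skip directories and match the extension case-insensitively.
        if path.is_file() and path.suffix.lower() == ".pdf"
    )
```

The resulting list could then be written into a column of the workbook, one path per row.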

How can I edit the search text of a searchable PDF?

I have access to a scanner at my library which can create "searchable PDFs." These are PDFs that show the exact image of a scanned document, but there is a kind of hidden text in the PDF that can be selected when you try to select a portion of the image that contains text. In this way you can copy and paste text or search for text in the scanned document. This is VERY useful. It's an awesome improvement over raw scanned images. I also have several apps on my mac that can create this kind of searchable PDF from a scanned document or a raw image.
Now it's obvious to anyone who has ever used OCR that the process of converting images to text is not 100% accurate, so the text that you search or copy will not be correct in some places.
So I searched for quite some time to find an application that would load a searchable PDF and allow me to repair the hidden searchable text without reformatting or modifying the original scanned image.
Does anyone know of a tool (or library API) that would allow this?
It's worth saying here that I tried the latest version of Adobe Acrobat DC for Mac, and it doesn't seem to even allow me to view the hidden searchable text, much less edit it. It does allow me to replace the scanned image with the results of its own OCR process so that I could edit and save the document. But this would produce horrible results for any of the scanned documents that I am using. It seems designed for editing a "native PDF", not for editing a scanned document.
I have also tried ABBYY FineReader with no luck.
I'm using ABBYY FineReader 12 Professional (not open source).
Just open a scanned image or scanned PDF and press Verify Text (or Ctrl + F7), then go over all the spelling errors or low-confidence characters and fix them.
The program is very good, it shows you the exact place in image/pdf to correct and the OCR guessing side by side for convenience. It iterates all of them.
[By the way, I'm using the shortcuts to speed up things:
Alt+Enter to add the unrecognized word to dictionary.
Ctrl+Delete to skip word or confirm in case you fixed it.]
Then save the document as a PDF file (Menu: File > Save Document As > PDF File), and you can search it in any PDF reader. The saved file looks the same as the scanned one, but "behind" it there is text.
It's weird that you tried ABBYY with no luck... it's working great for me. Maybe the version you tried was not the Professional one.
Hope it helps you.
Creating a searchable PDF from images is not what the poster is after; they want to start with an already searchable PDF and modify its text (e.g. because a searchable PDF was made initially, but an overlooked recognition error was found later and needs correcting). I see no way and no tool that assists in doing this.

docx4j word/googledocs compatibility

I'm creating a program that extracts a docx file and displays it in a JavaFX graphical interface, with buttons in place of flags placed in the docx; when one of them is pressed, the program modifies the input docx.
I'm using the docx4j API for extracting and modifying the document.
The problem is that the program fails if the input is a docx generated by Microsoft Word. I'm forced to use a workaround.
I take my docx made in Word, load it into Google Docs, and use the "Download in .docx format" option. If I feed the docx from Word directly into my program, it fails.
I noticed my Word file was half the size after being passed through Google Docs. Likewise, if I take a docx file downloaded from Google Docs, open it in Word, modify one letter, and save it, it becomes twice as large. For the record, I use Word 2008.
That's it, so I'd like to know if someone knows what explains this difference.
Thanks
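Since a .docx file is a ZIP package of XML parts, one way to investigate the size difference is to list the parts of both files and compare them. Below is a minimal sketch in Python, with a hypothetical helper name; it does not say what the cause is, it only shows which parts each file contains and how large they are.

```python
import zipfile
from typing import BinaryIO, Union


def list_docx_parts(docx: Union[str, BinaryIO]) -> list[tuple[str, int, int]]:
    """Return (part name, uncompressed size, compressed size), largest part first."""
    with zipfile.ZipFile(docx) as package:
        parts = [
            (info.filename, info.file_size, info.compress_size)
            for info in package.infolist()
        ]
    # Sort by uncompressed size so the heaviest parts appear at the top.
    return sorted(parts, key=lambda part: part[1], reverse=True)
```

Running this on the Word version and on the Google Docs version and comparing the two lists may show which parts exist in only one of them, or which parts differ most in size.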

Word Automation Service breaks links in table of contents

I have written code that uses Word Automation Service to convert a .DOCX file to .PDF. I have noticed that when the Word document contains a table of contents, its links are removed in the PDF. This is very bad for my business case.
On the other hand, manually opening MS Word and saving the same document as PDF preserves the links in the table of contents. This is the behavior I am looking for, but I want to keep my code independent of having MS Office Word installed on the machine running my code.
Has anyone had a similar issue, and was anybody able to resolve it?
In my case, I found out that this is related to the job settings. Try commenting out or removing this line of code if you have it:
jobSettings.UpdateFields = true;