Automation of PDF String Search using Excel VBA - OLE error

I'm getting the error "Microsoft Excel is waiting for another application to complete an OLE action" when trying to automate a PDF string search and record the findings in Excel. For certain PDFs the error does not appear. I assume this is because less optimized PDFs take longer to search, since the search indexes them page by page.
To be more precise, I have a workbook containing two sheets. One contains a list of PDF file names and the other has a list of words that I want to search. From the file list the macro would open each PDF file and take each word from the list of words and perform a string search. If found it would record each finding in a new sheet in the same workbook with the file name and the found string.
Below is the code I'm struggling with. Any help is welcome.
Public Sub SearchWords()
    'variables
    Dim ps As Range
    Dim fs As Range
    Dim PList As Range
    Dim FList As Range
    Dim PLRow As Long
    Dim FLRow As Long
    Dim Tracker As Worksheet
    Dim gapp As Object
    Dim gAvDoc As Object
    Dim gPDFPath As String
    Dim sText As String 'String to search for
    FLRow = ActiveWorkbook.Sheets("List Files").Range("B1").End(xlDown).Row
    PLRow = ActiveWorkbook.Sheets("Prohibited Words").Range("A1").End(xlDown).Row
    Set PList = ActiveWorkbook.Sheets("Prohibited Words").Range("A2:A" & PLRow)
    Set FList = ActiveWorkbook.Sheets("List Files").Range("B2:B" & FLRow)
    Set Tracker = ActiveWorkbook.Sheets("Tracker")
    'For each PDF file listed in the Excel range
    For Each fs In FList
        'Initialize Acrobat by creating App object
        Set gapp = CreateObject("AcroExch.App")
        'Set AVDoc object
        Set gAvDoc = CreateObject("AcroExch.AVDoc")
        'Set PDF file path to open in PDF
        gPDFPath = fs.Cells.Value
        ' open the PDF
        If gAvDoc.Open(gPDFPath, "") = True Then
            'Bring the PDF to front
            gAvDoc.BringToFront
            'For each word listed in the range
            For Each ps In PList
                'Assign String to search
                sText = ps.Cells.Value
                'This is where the error is appearing
                If gAvDoc.FindText(sText, False, True, False) = True Then
                    'Record findings
                    Tracker.Range("A1").End(xlDown).Offset(1, 0) = fs.Cells.Offset(0, -1).Value
                    Tracker.Range("B1").End(xlDown).Offset(1, 0) = ps.Cells.Value
                End If
            Next
        End If
        'Message to display once the search is over for a particular PDF
        MsgBox (fs.Cells.Offset(0, -1).Value & " assignment complete")
    Next
    gAvDoc.Close True
    gapp.Exit
    Set gAvDoc = Nothing
    Set gapp = Nothing
End Sub

I have now found the answer to this problem.
I'm using Acrobat Pro, and whenever I open a PDF file it opens with limited features because of the Protected View settings. If I disable Protected View, or if I click Enable All Features and save the changes to the PDF files, the VBA macro runs smoothly.
It's funny, I'm posting an answer to my own problem.
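Independently of the Protected View fix, the loop in the question can be tightened a little. What follows is only a sketch, assuming the same sheet names and the same Acrobat calls the question already uses (Open, FindText, Close, Exit); the name SearchWordsSketch is made up for illustration. It creates the Acrobat application object once, closes each PDF before the next one is opened, passes True as the last FindText argument so that each word is searched from the first page (my reading of that reset parameter, not something confirmed above), and finds the next free Tracker row with End(xlUp), which assumes column A of Tracker has a header in row 1.

```vba
Public Sub SearchWordsSketch()
    Dim wb As Workbook
    Dim fs As Range, ps As Range
    Dim FList As Range, PList As Range
    Dim Tracker As Worksheet
    Dim gApp As Object, gAvDoc As Object
    Dim nextRow As Long

    Set wb = ActiveWorkbook
    Set Tracker = wb.Sheets("Tracker")
    'Find the last used row from the bottom up, so gaps do not cut the list short
    With wb.Sheets("List Files")
        Set FList = .Range("B2:B" & .Cells(.Rows.Count, "B").End(xlUp).Row)
    End With
    With wb.Sheets("Prohibited Words")
        Set PList = .Range("A2:A" & .Cells(.Rows.Count, "A").End(xlUp).Row)
    End With

    'One Acrobat application object for the whole run
    Set gApp = CreateObject("AcroExch.App")

    For Each fs In FList
        Set gAvDoc = CreateObject("AcroExch.AVDoc")
        If gAvDoc.Open(CStr(fs.Value), "") Then
            For Each ps In PList
                'Last argument True: assumed to restart the search at page 1
                If gAvDoc.FindText(CStr(ps.Value), False, True, True) Then
                    nextRow = Tracker.Cells(Tracker.Rows.Count, "A").End(xlUp).Row + 1
                    Tracker.Cells(nextRow, "A").Value = fs.Offset(0, -1).Value
                    Tracker.Cells(nextRow, "B").Value = ps.Value
                End If
            Next ps
            'Close this PDF without saving before opening the next one
            gAvDoc.Close True
        End If
        Set gAvDoc = Nothing
    Next fs

    gApp.Exit
    Set gApp = Nothing
End Sub
```

Closing each document inside the loop keeps Acrobat from accumulating open files, which may also reduce how long Excel waits on it.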

Related

Printing pdf through automating word with VB net without showing dialog

I've finally encountered a problem, where I didn't already find the answer here or anywhere else on the web:
My program grabs some measurement values from an instrument (I cannot control it directly, so I have to wait until the user has done the measurement and then parse the report), calculates some derived values, and is supposed to put these values back into the PDF report that the instrument control software generated automatically.
It all works until I reach the line where the printout is started. It always opens the Word print dialog instead of silently overwriting my file. I don't understand what I am doing wrong when calling PrintOut.
Here is the example code:
Imports Microsoft.Office.Interop
Module Example
    Private Sub PrintReport()
        Dim intAnswer As Integer
        Dim strReportFileName As String = ""
        Dim appWord As New Word.Application
        Dim wdDoc As Word.Document
        Dim strPPF As String = "0.5" 'For testing, normally a parameter
        Dim strFolder As String = "C:\UVVis-Data" 'For testing, normally a parameter
        'Find and open the PDF file of the report:
        strReportFileName = (From fi As IO.FileInfo In New IO.DirectoryInfo(strFolder).GetFiles("*.pdf") Order By fi.LastWriteTime Descending Select fi)(0).FullName 'It will always be the newest file in that folder
        appWord.Visible = False 'hide Word from the user
        wdDoc = appWord.Documents.Open(strReportFileName) 'open the PDF report
        'Replace the placeholders which were defined in the report template earlier:
        With appWord.Selection.Find
            .Text = "#PPF#"
            .Replacement.ClearFormatting()
            .Replacement.Text = strPPF
            .Execute(Replace:=Microsoft.Office.Interop.Word.WdReplace.wdReplaceAll)
        End With
        'Print out the modified report:
        'wdDoc.PrintOut(False, False,, strReportFileName,,,,,,, True) 'this was my first approach
        wdDoc.PrintOut(Background:=False, Append:=False, OutputFileName:=strReportFileName, PrintToFile:=True) 'this also doesn't work as intended
        'Close the file and restore Word to its normal state:
        wdDoc.Close(Microsoft.Office.Interop.Word.WdSaveOptions.wdDoNotSaveChanges)
        appWord.Visible = True
        appWord.Quit()
    End Sub
End Module
Use the Document.ExportAsFixedFormat method, which saves a document in PDF or XPS format.
Public Sub ExportAsFixedFormat_Example()
    'wdDoc is the open Word document from the question
    wdDoc.ExportAsFixedFormat "pathandfilename.pdf", wdExportFormatPDF
End Sub
Thank you for the input.
I found a second problem with my code: I cannot overwrite the original document while it is open in Word.
I solved this by first moving the PDF to a temporary folder, opening that temporary file in Word, and deleting it after Word is closed.

How to check pdf check box

I am trying to read one PDF and a VBA userform and then fill out another PDF.
I wrote code that reads all the text in a PDF and then finds certain substrings based on tokens it can locate in that text. It is intended to populate the fields in the destination PDF from the substrings and to check the appropriate check boxes based on the userform. I can get the code to fill in the substrings and save the document, but it won't check the boxes.
Previously the code used an AVDoc, but I switched to a JSObject (JSO) because I don't want the PDF to pop up, and the JSO avoids that problem.
I tried pdfBool.value = cBool(vbaBool), pdfBool.value = 1, pdfBool.value = "1", jso.setValue("checked"), jso.setValue("yes"), etc.
This code will run without crashing. I reduced the number of variables to one string and one bool for the sake of the example.
Sub main()
    'findString grabs all text from a pdf file. This code works.
    Dim mystr As String
    If findString(mystr) = False Then
        Application.StatusBar = "Cannot find Source PDF"
        Exit Sub
    End If
    Dim mypath As String
    mypath = ActiveWorkbook.Path & "\destination.pdf"
    Dim aApp As acrobat.AcroApp
    Dim pdfDoc As acrobat.CAcroPDDoc
    Dim jso As Object
    Set aApp = CreateObject("AcroExch.App")
    Set pdfDoc = CreateObject("AcroExch.PDDoc")
    If pdfDoc.Open(mypath) = True Then
        Set jso = pdfDoc.GetJSObject
        Dim vbaText As String
        Dim vbaBool As String
        vbaText = returnString("Token1")
        vbaBool = userForm.checkBox1.value
        Dim pdfText As Object
        Dim pdfBool As Object
        Set pdfText = jso.getField("TextField1")
        Set pdfBool = jso.getField("CheckBox1")
        pdfText.Value = vbaText
        pdfBool.Value = vbaBool
        'save pdffile
        Dim fileSavePath As String
        fileSavePath = ActiveWorkbook.Path & "\My Save File.pdf"
        pdfDoc.Save PDSaveFull, fileSavePath
        'clean up memory
        Set pdfDoc = Nothing
        Set pdfText = Nothing
        Set pdfBool = Nothing
        Set jso = Nothing
    End If
    aApp.Exit
    Set aApp = Nothing
    Unload userForm1
End Sub
OK, so after some searching, I have found a solution. Basically, forms created with LiveCycle don't work well with check boxes. I asked somebody in my organization, and they confirmed that LiveCycle was used for our forms for a while until we got rid of it. Honestly, I don't know what LiveCycle is, but the solution worked, so I think the issue was related to it.
The solution? Redo the PDF form. I exported the PDF to an Encapsulated PostScript file, which stripped away all the fields. After that, I used the Prepare Form tool, which found the relevant fields automatically. Fortunately, with my PDF it found all of the fields, though there were one or two extra ones that I had to delete. The field names in the PDF and in the code need to match, so adjustments have to be made either to the PDF or to the code, but once I had made that adjustment, everything worked.
Try jso.getField(pdfFieldName).Value = "Yes" or "No". The value is case sensitive, so you have to use Yes or No exactly.
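A small sketch of how the userform state could be mapped to the string the field expects. It assumes the check box's export value is "Yes" (the usual default, but a form designer can change it) and that an unchecked box is represented by "Off"; that second point is my understanding of how Acrobat reports check boxes, not something stated above, so substitute "No" if that is what the form really uses. The helper name CheckBoxFieldValue is made up for illustration.

```vba
'Hypothetical helper: returns the string to assign to a PDF check box field.
'exportValue is the value the form uses for "checked" (assumed "Yes" by default).
Public Function CheckBoxFieldValue(ByVal isChecked As Boolean, _
                                   Optional ByVal exportValue As String = "Yes") As String
    If isChecked Then
        CheckBoxFieldValue = exportValue
    Else
        'Assumption: an unchecked check box is stored as "Off"
        CheckBoxFieldValue = "Off"
    End If
End Function

'Possible use with the objects from the question:
'    pdfBool.Value = CheckBoxFieldValue(userForm.checkBox1.Value)
```

Passing the userform's Boolean directly, instead of first storing it in a String as the question does, avoids assigning the text "True" or "False" to the field.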

Run Macro with vb.net Application to format Word Documents

I have macro files with the file extension .DO. From a VB.NET application I open a Word document and also open the macro, and I need to use that macro to format the document. I have tried many things without success. When I try to select part of the Word document, I get this error:
Object reference not set to instance of object
Private Sub beginFormatting()
    ls_inipath = System.Windows.Forms.Application.StartupPath & "\"
    ls_Document = GetIniValue("Remove_Pages", "doc_name", txtFileName.Text)
    Dim what As Object = Word.WdGoToItem.wdGoToLine
    Dim which As Object = Word.WdGoToDirection.wdGoToLast
    Dim SelectionOne As Selection
    Dim returnValue As Range = SelectionOne.GoTo(what, which, Nothing, Nothing)
    SelectionOne.EndKey(WdUnits.wdStory, WdMovementType.wdMove)
End Sub

Run Time Error Generated when opening Microsoft Word Document from Excel VBA

I am receiving a Runtime Error '13' exception when creating a Word application object. I am also experiencing fairly extreme delays (on the order of 20-30 seconds) before the exception is raised. Any idea what could be causing this? I searched Stack Overflow and haven't seen anything very similar.
I have never worked with a Word document from Excel before, so this is new to me.
Code Below:
Public Sub GetRawData()
    'Meant to translate data from a raw word file and format into excel
    Dim filePath As String
    Dim objWord As Application
    Set objWord = CreateObject("Word.Application")
    Dim objDoc As Word.Document
    'File management vars
    Dim oneLine As Paragraph
    Dim lineText As String
    filePath = "U:\Research_Dev Docs\DevFolder\Word Doc Translation In Excel For Phys\testWordDoc.docx"
    'Set word doc object using standard file directory and file name
    Set objDoc = objWord.Documents.Open(Filename:=filePath, Visible:=True)
    For Each oneLine In objDoc.Paragraphs
        'Pull in each line and eventually parse
        lineText = oneLine.Range.Text
        'DEBUG OUTPUT TO THE SCREEN FOR TESTING
        MsgBox (lineText)
    Next oneLine
End Sub

Writing Data from Excel to Word

I want to use Excel to store "tag names" in column A and their associated "replacement text" in Column B. When the code runs, it needs to collect each tag, one at a time (row by row), search an entire Word document for those words, and replace them with their corresponding replacements.
I noticed the special tags in the headers and footers weren't being replaced. I turned to this article (http://word.mvps.org/faqs/customization/ReplaceAnywhere.htm) and found that working with a range of ranges (or cycling through all available Story Ranges in the document) I was able to do this.
I improved my code, as recommended in the link above and it worked, so long as my code was embedded in my "Normal" Word file, thereby using my VBA code from Word to operate on another Word document. However, the goal is to use VBA Excel to operate the replacements while reading an Excel file.
When I moved the code to Excel, I got hung up on an Automation error which reads,
"Run-time error '-2147319779 (8002801d)': Automation error Library not registered.".
I've looked for answers from reviewing the Registry to using "Word.Application.12" in place of "Word.Application".
I have a Windows 7, 64-Bit machine, with Microsoft Office 2007. I have the following libraries selected:
Excel:
Visual Basic For Applications
Microsoft Excel 12.0 Object Library
OLE Automation
Microsoft Access 12.0 Object Library
Microsoft Outlook 12.0 Object Library
Microsoft Word 12.0 Object Library
Microsoft Forms 2.0 Object Library
Microsoft Office 14.0 Object Library
Word:
Visual Basic For Applications
Microsoft Word 12.0 Object Library
OLE Automation
Microsoft Office 12.0 Object Library
I have no issues with operating inside of Excel with regard to VBA. Normally, I will be passing a set of strings to this function, but for now, I have embedded the strings inside of the function, as if I am only planning on swapping one string (for any number of instances), with another predetermined string.
Function Story_Test()
    Dim File As String
    Dim Tag As String
    Dim ReplacementString As String
    Dim a As Integer
    Dim WordObj As Object
    Dim WordDoc As Object
    Dim StoryRange As Word.Range
    Dim Junk As Long
    Dim BaseFile As String
    'Normally, these lines would be strings which get passed in
    File = "Z:\File.docx"
    Tag = "{{Prepared_By}}"
    ReplacementString = "Joe Somebody"
    'Review currently open documents, and Set WordDoc to the correct one
    'Don't worry, I already have error handling in place for the more complex code
    Set WordObj = GetObject(, "Word.Application")
    BaseFile = Basename(File)
    For a = 1 To WordObj.Documents.Count
        If WordObj.Documents(a).Name = BaseFile Then
            Set WordDoc = WordObj.Documents(a)
            Exit For
        End If
    Next a
    'This is a fix provided to fix the skipped blank Header/Footer problem
    Junk = WordDoc.Sections(1).Headers(1).Range.StoryType
    'Okay, this is the line where we can see the error.
    'When this code is run from Excel VBA, problem. From Word VBA, no problem.
    'Anyone know why this is???
    '***********************************************************************
    For Each StoryRange In WordObj.Documents(a).StoryRanges
    '***********************************************************************
        Do
            'All you need to know about the following function call is
            ' that I have a function that works to replace strings.
            'It works fine provided it has valid strings and a valid StoryRange.
            Call SearchAndReplaceInStory_ForVariants(StoryRange, Tag, _
                ReplacementString, PreAdditive, FinalAdditive)
            Set StoryRange = StoryRange.NextStoryRange
        Loop Until StoryRange Is Nothing
    Next StoryRange
    Set WordObj = Nothing
    Set WordDoc = Nothing
End Function
For Each StoryRange In WordObj.Documents(a).StoryRanges
should probably be
For Each StoryRange In WordDoc.StoryRanges
since you just assigned that in the loop above.
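Building on that suggestion, here is a rough late-bound sketch of the same loop written for Excel VBA. It assumes WordDoc is the document object already located in the question's code, declares the story range As Object so that Excel does not need the Word type library to compile the loop, and uses a plain Find/Replace in place of the asker's SearchAndReplaceInStory_ForVariants helper, which is not shown. The Sub name ReplaceInAllStories is made up, and whether late binding actually avoids the "Library not registered" error on that particular machine is untested.

```vba
'Hypothetical sketch: replace Tag with ReplacementString in every story of WordDoc.
Public Sub ReplaceInAllStories(ByVal WordDoc As Object, _
                               ByVal Tag As String, _
                               ByVal ReplacementString As String)
    'Word constants declared locally because no Word reference is assumed
    Const wdFindContinue As Long = 1
    Const wdReplaceAll As Long = 2

    Dim StoryRange As Object
    Dim Junk As Long

    'Same workaround as in the question for skipped blank headers/footers
    Junk = WordDoc.Sections(1).Headers(1).Range.StoryType

    For Each StoryRange In WordDoc.StoryRanges
        Do
            With StoryRange.Find
                .ClearFormatting
                .Replacement.ClearFormatting
                .Text = Tag
                .Replacement.Text = ReplacementString
                .Wrap = wdFindContinue
                .Execute Replace:=wdReplaceAll
            End With
            'Move on to the next linked story of the same type, if any
            Set StoryRange = StoryRange.NextStoryRange
        Loop Until StoryRange Is Nothing
    Next StoryRange
End Sub
```

It could be called as ReplaceInAllStories WordDoc, Tag, ReplacementString once WordDoc has been set.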
For now, since I cannot test the contrary, I have to conclude that the culprit is the difference between the Microsoft Office 12 Object Library referenced in one VBA environment and the Microsoft Office 14 Object Library referenced in the other. I don't have the means or authorization to change either. Going forward, I would assume the Microsoft Office 12 Object Library is the correct one and that version 14 differs in ways I am not aware of.
Thank you to all who provided input. If you have any other suggestions, we can discuss and forward. Thanks!
This updates a bunch of links spread over the body, headers and footers.
I didn't write this myself; from memory, I only made a number of fixes, additions and tweaks.
It shows you how to cover all the different sections and can easily be modified to work within your parameters.
Please post your final code once done.
Public Sub UpdateAllFields()
    Dim doc As Document
    Dim wnd As Window
    Dim lngMain As Long
    Dim lngSplit As Long
    Dim lngActPane As Long
    Dim rngStory As Range
    Dim TOC As TableOfContents
    Dim TOA As TableOfAuthorities
    Dim TOF As TableOfFigures
    Dim shp As Shape
    Dim sctn As Section
    Dim Hdr As HeaderFooter
    Dim Ftr As HeaderFooter
    ' Set Objects
    Set doc = ActiveDocument
    Set wnd = ActiveDocument.ActiveWindow
    ' get Active Pane Number
    lngActPane = wnd.ActivePane.Index
    ' Hold View Type of Main pane
    lngMain = wnd.Panes(1).View.Type
    ' Hold SplitSpecial
    lngSplit = wnd.View.SplitSpecial
    ' Get Rid of any split
    wnd.View.SplitSpecial = wdPaneNone
    ' Set View to Normal
    wnd.View.Type = wdNormalView
    ' Loop through each story in doc to update
    For Each rngStory In doc.StoryRanges
        If rngStory.StoryType = wdCommentsStory Then
            Application.DisplayAlerts = wdAlertsNone
            ' Update fields
            rngStory.Fields.Update
            Application.DisplayAlerts = wdAlertsAll
        Else
            ' Update fields
            rngStory.Fields.Update
        End If
    Next
    'Loop through text boxes and update
    For Each shp In doc.Shapes
        With shp.TextFrame
            If .HasText Then
                shp.TextFrame.TextRange.Fields.Update
            End If
        End With
    Next
    ' Loop through TOC and update
    For Each TOC In doc.TablesOfContents
        TOC.Update
    Next
    ' Loop through TOA and update
    For Each TOA In doc.TablesOfAuthorities
        TOA.Update
    Next
    ' Loop through TOF and update
    For Each TOF In doc.TablesOfFigures
        TOF.Update
    Next
    For Each sctn In doc.Sections
        For Each Hdr In sctn.Headers
            Hdr.Range.Fields.Update
            For Each shp In Hdr.Shapes
                With shp.TextFrame
                    If .HasText Then
                        shp.TextFrame.TextRange.Fields.Update
                    End If
                End With
            Next shp
        Next Hdr
        For Each Ftr In sctn.Footers
            Ftr.Range.Fields.Update
            For Each shp In Ftr.Shapes
                With shp.TextFrame
                    If .HasText Then
                        shp.TextFrame.TextRange.Fields.Update
                    End If
                End With
            Next shp
        Next Ftr
    Next sctn
    ' Return Split to original state
    wnd.View.SplitSpecial = lngSplit
    ' Return main pane to original state
    wnd.Panes(1).View.Type = lngMain
    ' Activate proper pane
    wnd.Panes(lngActPane).Activate
    ' Close and release all pointers
    Set wnd = Nothing
    Set doc = Nothing
End Sub