Excel VBA code for searching PDF in Adobe Acrobat - vba

I want to search a PDF file for a string and print the number of counted instances. I've done this for Word, Excel, and Powerpoint, but never Acrobat. There is an error when I call acroDoc.Range, so I assume this is the wrong syntax for Acrobat.
Run-time error '450': Wrong number of arguments or invalid property assignment.
I can't find answers in Adobe's documentation. What is the correct syntax for selecting the whole document and searching for a string?
Sub pdfSearch()
Dim acroApp As Object
Dim acroDoc As Object
Dim aRng As Object
Dim i As Integer
i = 0
Set acroApp = CreateObject("AcroExch.App")
Set acroDoc = CreateObject("AcroExch.pddoc")
acroDoc.Open ("C:\Documents\example.pdf")
Set aRng = acroDoc.Range
With aRng.Find
Do While .Execute(FindText:="desk", MatchCase:=False)
i = i + 1
Loop
End With
acroDoc.Close 0
Set aRng = Nothing
Set acroDoc = Nothing
Set acroApp = Nothing
Debug.Print (i)
End Sub

Acrobat doesn't have a concept of Range. FindText finds the specified text, scrolls so that it is visible, and highlights it. The return value is -1 when the text is found. Unless you also pass a parameter to reset the selection, subsequent calls will start where you left off so to get the count, you just loop until the return value is something other than -1. I haven't used VAB in quite a while but I think the code would look like this...
i = 0
Set acroApp = CreateObject("AcroExch.App")
Set acroDoc = CreateObject("AcroExch.AVDoc")
acroDoc.Open ("C:\Documents\example.pdf")
Do While acroDoc.FindText("desk",0) == -1
i = i + 1
Loop
Documentation to FindText:
http://help.adobe.com/en_US/acrobat/acrobat_dc_sdk/2015/HTMLHelp/index.html#t=Acro12_MasterBook%2FIAC_API_OLE_Objects%2FFindText.htm

Related

Add not replace content in Word Content Control using VBA

I am trying to generate multiple Word documents which have content controls that are populated from an Excel file. The second content control needs to be populated with a list which varies in length.
How do I add each value to the content control instead of replacing the current value? I am currently using Rich Text Content Controls.
Here is what I have so far:
Sub CreateCoverLetters()
Dim objWord As Word.Application
Dim wDoc As Word.Document
Dim Rows As Integer
Set objWord = CreateObject(Class:="Word.Application")
objWord.Visible = True
Set wDoc = objWord.Documents.Open(*insert filepath*)
objWord.Activate
wDoc.ContentControls(1).Range.Text = Worksheets("Lists").Range("A2").Value
Rows = Worksheets("Lists").Range("A3", Range("A3").End(xlDown)).Rows.Count
r = 3
For i = 1 To Rows
wDoc.ContentControls(2).Range.Text = Worksheets("Lists").Cells(r, 1).Value
r = r + 1
Next
wDoc.SaveAs (*insert filepath*)
End Sub
Any help much appreciated!
Solved it as follows:
Sub CreateCoverLetters()
Dim objWord As Word.Application
Dim wDoc As Word.Document
Dim Rows As Integer
Dim Content As String
Set objWord = CreateObject(Class:="Word.Application")
objWord.Visible = True
Set wDoc = objWord.Documents.Open(*insert filepath*)
objWord.Activate
wDoc.ContentControls(1).Range.Text = Worksheets("Lists").Range("A2").Value
Rows = Worksheets("Lists").Range("A3", Range("A3").End(xlDown)).Rows.Count
r = 3
For i = 1 To Rows
Content = Content & "- " & Worksheets("Lists").Cells(r, 1).Value & vbNewLine
r = r + 1
Next
wDoc.ContentControls(2).Range.Text = Content
wDoc.SaveAs (*insert filepath*)
End Sub
The approach in user's answer works if the content can 1) be concatenated in a single string and 2) none of the elements require special formatting. This would also be the fastest approach.
If for any reason this process is not possible, then the way to "append" content without replacing goes something like in the code snippet that follows.
Notice how Range and ContentControl objects are declared and instantiated, especially the Range object. This makes it much easier to pick up the "target" at a later point in the code. Also, a Range object can be collapsed (think of it like pressing the right-arrow to make a selection a blinking cursor): this makes it possible to append content and work with that new content (format it, for example). Word also has a Range.InsertAfter method which can be used if the new content does not have to be manipulated in any special way.
Dim cc as Object ' Word.ContentControl
Dim rngCC as Object 'Word.Range
Set cc = wDoc.ContentControls(1).Range
Set rngCC = cc.Range
rngCC.Text = Worksheets("Lists").Range("A2").Value
'Add something at a later point
rngCC.Collapse wdCollapseEnd
rngCC.Text = " New material at the end of the content control."

Copying a lotus notus database view and paste into word document

The code below reads and inserts the email from lotus notes to an excel sheet. However I would like to read a lotus notes database and copy its content and paste it as a rich text into a word document.
I assume this line of code needs to be modified.
Set nitem = .GetFirstItem("Body")
What would be the best way to go about this?
Public Sub Lotus_Notes_Current_Email4()
Dim NSession As Object 'NotesSession
Dim NUIWorkSpace As Object 'NotesUIWorkspace
Dim NUIdoc As Object 'NotesUIDocument
Dim nitem As Object 'NotesItem
Dim lines As Variant
Set NSession = CreateObject("Notes.NotesSession")
Set NUIWorkSpace = CreateObject("Notes.NotesUIWorkspace")
Set NUIdoc = NUIWorkSpace.CurrentDocument
If Not NUIdoc Is Nothing Then
With NUIdoc.Document
Set nitem = .GetFirstItem("Body")
If Not nitem Is Nothing Then
lines = Split(nitem.Text, vbCrLf)
Sheets(1).Activate
Range("H8").Resize(UBound(lines) + 1, 1).Value = Application.WorksheetFunction.Transpose(lines)
End If
End With
Else
MsgBox "Lotus Notes is not displaying an email"
End If
Set NUIdoc = Nothing
Set NUIWorkSpace = Nothing
Set NSession = Nothing
End Sub
Your assumption is incorrect. Your entire script needs to be rewritten.
If you want to copy the contents of a view, then you need to start by opening a NotesView object. Your current code is opening a NotesDocument object. To get the NotesView, you might want to use the CurrentView property NotesUIWorkspace to get a NotesUIView object, and then use that objects View property.
Once you have the NotesView object, then I'm guessing you might want to want to use the columns property to get at the data, but there are other ways you could go about it. Pretty much no matter what you do, though, you're going to have to handle the formatting of the data on your own.

How to call Word macros from Excel

I have two macros, one in Excel, and one in Word. The Excel Macro calls the Word macro. My code is as follows:
Excel:
Public wb1 As Workbook
Public dt1 As Document
Sub openword()
Dim wpath, epath As String 'where the word document will be opened and where the excel sheet will be saved
Dim wordapp As Object 'preparing to open word
Set wb1 = ThisWorkbook
While wb1.Sheets.Count <> 1
wb1.Sheets(2).Delete
Wend
wpath = "C:\users\GPerry\Desktop\Projects and Work\document.docm"
Set wordapp = CreateObject("Word.Application")
'Set wordapp = CreateObject(Shell("C:\Program Files (x86)\Microsoft Office\Office14\WINWORD", vbNormalFocus)) this is one I tried to make work because while word.application seems to work, I don't *understand* it, so if anyone can help, that'd be awesome
wordapp.Visible = True
Set dt1 = wordapp.Documents.Open(wpath)
wordapp.Run "divider", wb1, dt1
dt1.Close
wordapp.Quit
End Sub
And word:
Sub divider(wb1, dt1)
Set dt1 = ThisDocument
If dt1.Paragraphs.Count > 65000 Then
Set cutrange = dt1.Range(dt1.Paragraphs(1).Range.Start, dt1.Paragraphs(65000).Range.End)
If wb1.Sheets(Sheets.Count).Cells(1, 1) <> "" Then
wb1.Sheets.Add After:=Sheets.Count
End If
Else
Set cutrange = dt1.Content
If wb1.Sheets(Sheets.Count).Cells(1, 1) <> "" Then
wb1.Sheets.Add After:=Sheets.Count
End If
End If
cutrange.Cut Destination:=wb1.Sheets(wb1.Sheets(Sheets.Count)).Cells(1, 1)
wb1.Sheets(Sheets.Count).Cells(1, 1).TextToColumns Destination:=wb1.Sheets(1).Cells(1, 1)
End Sub
My problem is that the variable wb1 isn't getting passed between them. Even though I put wb1 in the list of variables to send to the macro, when it arrives at the document, wb1 has no value inside of it. I would re-initialize it, but I don't know how to refer to an already existing document - only how to set it equal to one as you open it.
So either how do I pass the value through into the Word macro, or how do I re-initialize this variable? Preferably without having to set something equal to the excel application, because every time I try that it sets it equal to Excel 2003, not 2010 (though any solutions to that are also, of course, welcome).
Thanks!
You can't use the Excel global objects from inside of Word without explicitly qualifying them (they simply don't exist there). In particular, that means you can't use Sheets. You should also explicitly declare the variable types of your parameters - otherwise they'll be treated as Variant. This is important with reference types because in that it helps prevent run-time errors because the compiler knows that the Set keyword is required.
Sub divider(wb1 As Object, dt1 As Document)
Set dt1 = ThisDocument
If dt1.Paragraphs.Count > 65000 Then
Set cutrange = dt1.Range(dt1.Paragraphs(1).Range.Start, dt1.Paragraphs(65000).Range.End)
If wb1.Sheets(wb1.Sheets.Count).Cells(1, 1) <> "" Then
wb1.Sheets.Add After:=wb1.Sheets.Count
End If
Else
Set cutrange = dt1.Content
If wb1.Sheets(wb1.Sheets.Count).Cells(1, 1) <> "" Then
wb1.Sheets.Add After:=wb1.Sheets.Count
End If
End If
cutrange.Cut Destination:=wb1.Sheets(wb1.Sheets(wb1.Sheets.Count)).Cells(1, 1)
wb1.Sheets(wb1.Sheets.Count).Cells(1, 1).TextToColumns Destination:=wb1.Sheets(1).Cells(1, 1)
End Sub
Note - you also don't need to pass dt1 at all. You never use the value in the parameter and actually set it to something else. This could be a source of errors if you're using internal calls, because dt1 is implicitly passed ByRef (it gets boxed when you call it through Application.Run). That means whenever you call divider, whatever you pass to dt1 in the calling code will change to ThisDocument. You should either remove the parameter or specify that it is ByVal.
Borrowed from another SO link.
Sub Sample()
Dim wdApp As Object, newDoc As Object
Dim strFile As String
strFile = "C:\Some\Folder\MyWordDoc.dotm"
'~~> Establish an Word application object
On Error Resume Next
Set wdApp = GetObject(, "Word.Application")
If Err.Number <> 0 Then
Set wdApp = CreateObject("Word.Application")
End If
Err.Clear
On Error GoTo 0
wdApp.Visible = True
Set newDoc = wdApp.Documents.Add(strFile)
Call wdApp.Run("YHelloThar", "Hello")
'
'~~> Rest of the code
'
End Sub

Collect images url and data from webpage table to Excel table

Need to collect data from table on a webpage, some table cell have images.
The codes is to copy the data to Excel, and if the cell has images, then get its src links instead of images. below are the codes, but it is not working, I don't know how to detect if the cell has image in it or not, and add its src links to Excel cell.
Sub extractData()
Dim IE As Object, obj As Object
Dim myYear As String
Dim r As Integer, c As Integer, t As Integer
Dim elemCollection As Object
Set IE = CreateObject("InternetExplorer.Application")
myYear = InputBox("Enter year")
With IE
.Visible = True
.navigate ("url")
While IE.ReadyState <> 4
DoEvents
Wend
For Each obj In IE.Document.All.Item("Year").Options
If obj.innerText = myYear Then
obj.Selected = True
End If
Next obj
IE.Document.getElementsByName("btn_search").Item.Click
Do While IE.busy: DoEvents: Loop
ThisWorkbook.Sheets("Sheet1").Range("A1:K2000").ClearContents
Set elemCollection = IE.Document.getElementsByTagName("TABLE")
For t = 0 To (elemCollection.Length - 9)
For r = 0 To (elemCollection(t).Rows.Length - 1)
For c = 0 To (elemCollection(t).Rows(r).Cells.Length - 1)
If elemCollection(t).Rows(r).Cells(c).innerText = "" Then
ThisWorkbook.Worksheets(1).Cells(r + 1, c + 1) = elemCollection(t).Rows(r).Cells(c).getAttribute("src")
Exit For
End If
Next
ThisWorkbook.Worksheets(1).Cells(r + 1, c + 1) = elemCollection(t).Rows(r).Cells(c).innerText
Next c
Next r
Next t
End With
Set IE = Nothing
End Sub
First, brush up on HTML Document Object Model. There are tons of tutorials on how to use JavaScript to work with the DOM, and VBA is real similar (because the DOM doesn't change based on language and VBA is very closely related to VBScript which is similar to JavaScript). Second, if you get an error but no line is highlighted when you click the Debug button, step through your code line by line with the F8 key. This will let you figure out where the error is occurring. Third, add a reference to the Microsoft HTML Object Library so you can use Intellisense for code hints.
It's tough to give an actual solution without seeing the HTML source so instead I'll give some pointers:
Use IE.Document.Body.getElementsByTagName("TABLE") (note the addition of BODY to narrow the scope) to get a collection of every table on the page. With a reference to the HTML Objects Lib you can do something like this:
Dim oTable As HTMLTable
Dim oCell As HTMLTableCell
Dim oImg As HTMLImage
Dim strSrc As String
For Each oCell In oTable.Cells
strSrc = ""
On Error Resume Next
Set oImg = oCell.getElementsByTagName("img")
strSrc = oImg.Source
On Error GoTo 0
If strSrc <> "" Then Debug.Print strSrc
Next
This should (I did not test it) loop through every cell in a table and attempt to get an img element. If it fails, no biggie, just continue to the next cell. If you want to use late binding after you get it working, remove the HTML Obj Lib reference then simply dim everything as an object. Eg:
Dim oTable As Object 'HTMLTable
Dim oCell As Object 'HTMLTableCell
Dim oImg As Object 'HTMLImage
Dim strSrc As String

How to read the content of an online PDF file into a string variable using VBA?

I am wondering if anyone has dealt with this before. I have a spreadsheet with links to thousands of pdf files. I would like to load the content of each pdf into a string variable and run a few RegEx to extract useful data. I have the function shown below which loads the content of a pdf file into a string, however this function only works for local files. However in my case I am opening the pdf file using IE.Navigate2 "https://www.example.com/mypdf.pdf" this will open the pdf in the browser, how can I load the content of that file into a string. The extreme solution would be to download the file and open it with the function below and then delete it. Please let me know your thoughts. Please note that the function below will only work if you have Acrobat installed (not the reader) you will also will need to add the reference in the VBA project to Adobe Acrobat Type Library
Public Function ReadAcrobatDocument(strFileName As String) As String
Dim AcroApp As CAcroApp, AcroAVDoc As CAcroAVDoc, AcroPDDoc As CAcroPDDoc
Dim AcroHiliteList As CAcroHiliteList, AcroTextSelect As CAcroPDTextSelect
Dim PageNumber, PageContent, Content, i, j
Set AcroApp = CreateObject("AcroExch.App")
Set AcroAVDoc = CreateObject("AcroExch.AVDoc")
If AcroAVDoc.Open(strFileName, vbNull) <> True Then Exit Function
' The following While-Wend loop shouldn't be necessary but timing issues may occur.
While AcroAVDoc Is Nothing
Set AcroAVDoc = AcroApp.GetActiveDoc
Wend
Set AcroPDDoc = AcroAVDoc.GetPDDoc
For i = 0 To AcroPDDoc.GetNumPages - 1
Set PageNumber = AcroPDDoc.AcquirePage(i)
Set PageContent = CreateObject("AcroExch.HiliteList")
If PageContent.Add(0, 9000) <> True Then Exit Function
Set AcroTextSelect = PageNumber.CreatePageHilite(PageContent)
' The next line is needed to avoid errors with protected PDFs that can't be read
On Error Resume Next
For j = 0 To AcroTextSelect.GetNumText - 1
Content = Content & AcroTextSelect.GetText(j)
Next j
Next i
ReadAcrobatDocument = Content
AcroAVDoc.Close True
AcroApp.Exit
Set AcroAVDoc = Nothing: Set AcroApp = Nothing
End Function