getElementByID in a Word Document - vba

I am writing some VBA code which manipulates an HTML document. The document is opened as text so that the HTML can be worked on. Thus:
Dim oWordDoc As Word.Document
Set oWordDoc = Documents.Open(FileName:=strFolder & "\" & strFileName, _
Format:=wdOpenFormatText)
The HTML contains some meta tags that I would like to access by ID. This is the code I tried:
Dim objHtmlDoc As HTMLDocument
Dim objMetaTag As HTMLMetaElement
Set objHtmlDoc = oWordDoc
Set objMetaTag = objHtmlDoc.getElementById("keywords")
However, I get a Type Mismatch error on the line:
Set objHtmlDoc = oWordDoc
I also tried setting objHtmlDoc to oWordDoc.Content and got the same error. Is there any way to convert the Word.Document object to an HTMLDocument object so that I can assign one to the other? Or will I have to write my own getElementById function?
Thanks.

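An alternative I would suggest: load the file in Internet Explorer and work with its Document object, which is an HTMLDocument (this needs a reference to the Microsoft HTML Object Library).
Sub Sample()
Dim objHtmlDoc As HTMLDocument
Dim objMetaTag As HTMLMetaElement
Dim IE As Object
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
IE.navigate "C:\Sample.Htm"
'Wait for the page to finish loading before touching the document
Do While IE.Busy Or IE.readyState <> 4
DoEvents
Loop
Set objHtmlDoc = IE.Document
Set objMetaTag = objHtmlDoc.getElementById("keywords")
End Sub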
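If launching Internet Explorer is undesirable, another option is to hand the text that Word already loaded to the MSHTML parser directly. The following is only a sketch under a few assumptions: it uses the late-bound "htmlfile" object so no reference is required, the helper name GetMetaContent is hypothetical, and it matches the meta tag by its name attribute (as in meta name="keywords") rather than by id.

```vba
' Hypothetical helper: returns the content attribute of the first
' <meta> tag whose name matches strName, or "" when none is found.
Function GetMetaContent(strHtml As String, strName As String) As String
    Dim objDoc As Object
    Dim objMeta As Object

    ' Late-bound MSHTML document; write/Close parse the markup in memory
    Set objDoc = CreateObject("htmlfile")
    objDoc.write strHtml
    objDoc.Close

    For Each objMeta In objDoc.getElementsByTagName("meta")
        ' Case-insensitive match on the name attribute
        If StrComp(objMeta.Name, strName, vbTextCompare) = 0 Then
            GetMetaContent = objMeta.Content
            Exit Function
        End If
    Next objMeta
End Function
```

With the document from the question, it could be called as GetMetaContent(oWordDoc.Content.Text, "keywords").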

Related

How to scrape post div table content on VBA?

I am new to web scraping. I am trying to get data from O-net online. Specifically, I would like to pull the median wages and employment stats for a few dozen selected jobs. For example:
https://www.onetonline.org/link/summary/13-2041.00
Upon inspection, the html looks like this
Below is my code though it does not work.
Sub scrape()
Dim appIE As Object
Set appIE = CreateObject("internetexplorer.application")
With appIE
.Navigate "https://www.onetonline.org/link/summary/13-2041.00"
.Visible = False
End With
Do While appIE.Busy
DoEvents
Loop
Set allRowOfData = appIE.document.getElementByID("wrapper_WagesEmployment").getelementbytagname("table")(0)
Dim myValue As String: myValue = allRowOfData.Cells(3).innerHTML
appIE.Quit
Set appIE = Nothing
Range("A1").Value = myValue
End Sub
Insert Option Explicit at the top of the module; it will help you identify variables that are not declared properly (allRowOfData in your code).
The method name is also misspelled: getelementbytagname should be getElementsByTagName.
Try the code below:
Private Sub scrape()
Dim appIE As Object
Set appIE = CreateObject("internetexplorer.application")
With appIE
.navigate "https://www.onetonline.org/link/summary/13-2041.00"
.Visible = False
Do While .Busy Or .readyState <> 4
DoEvents
Loop
End With
Dim ieDoc As Object
Set ieDoc = appIE.Document
Dim allRowOfData As Object
Set allRowOfData = ieDoc.getElementById("wrapper_WagesEmployment").getElementsByTagName("table")(0)
Dim medianWageValue As String
medianWageValue = allRowOfData.getElementsByClassName("report2")(0).innerText
Range("A1").Value = medianWageValue 'Fully qualify your range reference, e.g. Sheet1.Range("A1").Value
Set ieDoc = Nothing
appIE.Quit
Set appIE = Nothing
End Sub
Alternative: you can do the same with XMLHTTP, which does not require opening IE and is usually faster. You will need to add a reference to the Microsoft HTML Object Library:
Private Sub ScrapeByXMLHTTP()
Dim xmlhttp As Object
Set xmlhttp = CreateObject("MSXML2.XMLHTTP")
With xmlhttp
.Open "GET", "https://www.onetonline.org/link/summary/13-2041.00", False 'False = synchronous, so responseText is complete after .send
.send
Dim htmlDoc As MSHTML.HTMLDocument
Set htmlDoc = New MSHTML.HTMLDocument
htmlDoc.body.innerHTML = .responseText
End With
Set xmlhttp = Nothing
htmlDoc.body.innerHTML = htmlDoc.getElementById("wrapper_WagesEmployment").getElementsByTagName("table")(0).outerHTML
Dim medianWageValue As String
medianWageValue = htmlDoc.getElementsByClassName("report2")(0).innerText
Set htmlDoc = Nothing
Range("A1").Value = medianWageValue 'Fully qualify your range reference, e.g. Sheet1.Range("A1").Value
End Sub

Macro to open multiple links in new tabs

I want my macro to open each link stored in a spreadsheet in a separate IE tab. The first link opens successfully, but on the second iteration of the loop I get this error:
Automation error. The interface is unknown
I suspect the macro somehow loses IE object reference after first iteration, but I am not sure why.
Range is set OK.
Here is the code:
Sub OpenCodingForms()
Dim wb1 As Workbook
Dim ws1 As Worksheet
Dim CodingFormLinks As Range
Dim IE as InternetExplorerMedium
Set wb1 = Workbooks("New shortcut.xlsm")
Set ws1 = wb1.Worksheets("Data")
Set CodingFormLinks = ws1.Range("A2", Range("A2").End(xlDown))
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
ws1.Activate
For Each link In CodingFormLinks.Cells
IE.Navigate link, CLng(2049)
Next link
End Sub
I ran into this issue before and ended up writing a routine to get the instance back. You will need to add a reference to Microsoft Shell Controls And Automation.
If the page redirects, you may have to adjust the routine to compare sUrl against only the beginning of the actual URL.
Sub OpenCodingForms()
Dim wb1 As Workbook
Dim ws1 As Worksheet
Dim CodingFormLinks As Range
Dim IE As InternetExplorerMedium
Set wb1 = Workbooks("New shortcut.xlsm")
Set ws1 = wb1.Worksheets("Data")
Set CodingFormLinks = ws1.Range("A2", Range("A2").End(xlDown))
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
ws1.Activate
Dim sUrl As String
For Each link In CodingFormLinks.Cells
sUrl = link.Value
IE.navigate sUrl, CLng(2048)
Set IE = GetWebPage(sUrl)
Next link
End Sub
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
'Desc: The function gets the Internet Explorer window that has the current
' URL from the sURL parameter. The function times out after 300 seconds
'Input parameters:
'String sURL - The URL to look for
'Output parameters:
'InternetExplorer ie - the Internet Explorer window holding the webpage
'Result: returns the Internet Explorer window holding the webpage
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
Function GetWebPage(sUrl As String) As InternetExplorer
Dim winShell As Shell
Dim dt As Date
'set the timeout period
dt = DateAdd("s", 300, DateTime.Now)
Dim IE As InternetExplorer
'loop until we timeout
Do While dt > DateTime.Now
Set winShell = New Shell
'loop through the windows and check the internet explorer windows
For Each IE In winShell.Windows
'check for the url
If IE.LocationURL = sUrl Then
'set the window visible
IE.Visible = True
IE.Silent = True
'set the return value
Set GetWebPage = IE
Do While IE.Busy
DoEvents
Loop
Set winShell = Nothing
Exit Do
End If
Next IE
Set winShell = Nothing
DoEvents
Loop
End Function
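The numeric flags passed to Navigate in the two listings above come from the BrowserNavConstants enumeration: 2048 is navOpenInNewTab, and 2049 adds navOpenInNewWindow (1) to it. A small sketch of giving those values names, assuming a late-bound IE object; the constant names and the helper OpenInNewTab are hypothetical and only mirror the enumeration:

```vba
' Values taken from the BrowserNavConstants enumeration
Const NAV_OPEN_IN_NEW_WINDOW As Long = 1
Const NAV_OPEN_IN_NEW_TAB As Long = 2048
Const NAV_OPEN_IN_BACKGROUND_TAB As Long = 4096

' Hypothetical helper: opens sUrl in a new tab of the given IE instance
Sub OpenInNewTab(IE As Object, sUrl As String)
    IE.navigate sUrl, NAV_OPEN_IN_NEW_TAB
End Sub
```

Named constants make it easier to see that the question's 2049 combines two flags, while the answer's 2048 requests only a new tab.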

Web-scraping in Excel with variable url (url extension)

I am fairly new to VBA in Excel. I have been trying to scrape web data based on the value of a cell ("Guid"), but I have not found a way to make the routine dynamic. Right now it retrieves data for one specific cell and writes the result to another specified cell. I believe I am missing some kind of loop (and there is probably a more correct way of writing the code).
Sub ie_open()
Dim wb As Workbook
Dim ws As Worksheet
Dim TxtRng As Range
Dim Guid As Range
Dim ie As Object
Dim URL As String
URL = "https://url.com/userpage="
Set wb = ActiveWorkbook
Set ws = wb.Sheets("Detail Report - Individuals")
Set Guid = ws.Range("E2")
Set TxtRng = ws.Range("F2")
Set ie = CreateObject("INTERNETEXPLORER.APPLICATION")
ie.NAVIGATE (URL + Guid)
ie.Visible = True
While ie.ReadyState <> 4
DoEvents
Wend
TxtRng = ie.document.getelementbyid("lbl_Location").innertext
End Sub
Thank you in advance.
Turn on a reference to the Microsoft HTML Object Library (go to Tools -- References). You can also turn on a reference to Microsoft Internet Controls so you can declare IE as an InternetExplorer object rather than just an Object, but that is not necessary. Then you can loop through each element like this:
Sub ie_open()
Dim wb As Workbook
Dim ws As Worksheet
Dim TxtRng As Range
Dim Guid As Range
Dim ie As Object
Dim URL As String
'ADDED THIS
Dim sl As IHTMLElement
Dim r As Long
r = 1
URL = "https://url.com/userpage="
Set wb = ActiveWorkbook
Set ws = wb.Sheets("Detail Report - Individuals")
Set Guid = ws.Range("E2")
Set TxtRng = ws.Range("F2")
Set ie = CreateObject("INTERNETEXPLORER.APPLICATION")
ie.NAVIGATE (URL + Guid)
ie.Visible = True
While ie.ReadyState <> 4
DoEvents
Wend
For each sl in ie.document.all
ws.cells(r, 1).value = sl.innertext
r = r + 1
Next
'TxtRng = ie.document.getelementbyid("lbl_Location").innertext
End Sub
Edit: forgot to increment the r variable in the loop, and I think it should be IE.Document.All instead of just IE.Document when initializing the loop
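To cover the looping part of the question, one possible sketch walks down column E and writes each result next to its Guid in column F. It assumes the Guids start in E2 with no header below them, reuses the placeholder URL and the lbl_Location id from the question, and uses a hypothetical procedure name:

```vba
Sub ScrapeAllGuids()
    Const BASE_URL As String = "https://url.com/userpage="
    Dim ws As Worksheet
    Dim ie As Object
    Dim el As Object
    Dim lastRow As Long
    Dim r As Long

    Set ws = ActiveWorkbook.Sheets("Detail Report - Individuals")
    ' Last used row in column E (assumed to hold the Guid values)
    lastRow = ws.Cells(ws.Rows.Count, "E").End(xlUp).Row

    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = True

    For r = 2 To lastRow
        If Len(ws.Cells(r, "E").Value) > 0 Then
            ie.navigate BASE_URL & ws.Cells(r, "E").Value
            ' Wait until the page has finished loading
            Do While ie.Busy Or ie.readyState <> 4
                DoEvents
            Loop
            ' Leave column F untouched when the page lacks the element
            Set el = ie.document.getElementById("lbl_Location")
            If Not el Is Nothing Then ws.Cells(r, "F").Value = el.innerText
        End If
    Next r

    ie.Quit
    Set ie = Nothing
End Sub
```

Reusing a single IE instance for every row avoids starting a new browser per Guid.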

how to get the meta name keywords -vba

I am trying to get the keywords meta tag from a webpage:
<meta name="keywords" content="Mitch Albom,For One More Day,Little, Brown Book Group,0751537535,Fiction / General,General & Literary Fiction,Modern & contemporary fiction (post c 1945),USA">
I need to get the value of its content attribute. Any help is appreciated.
Option Explicit
Sub GetData()
Dim ie As New InternetExplorer
Dim str As String
Dim wk As Worksheet
Dim webpage As New HTMLDocument
Dim item As HTMLHtmlElement
Set wk = Sheet1
str = wk.Range("Link").value
ie.Visible = True
ie.Navigate str
Do
DoEvents
Loop Until ie.ReadyState = READYSTATE_COMPLETE
Dim Doc As HTMLDocument
Set Doc = ie.Document
Dim kwd As String
kwd = Trim(Doc.getElementsByTagName("keywords").innerText)
MsgBox kwd
End Sub
The best way to do that is to find the meta element whose name is "keywords" and read its Content property. getElementsByTagName expects a tag name such as "meta", not the value of a name attribute. You can do it like this:
Option Explicit
Sub GetData()
Dim ie As New InternetExplorer
Dim str As String
Dim wk As Worksheet
Dim webpage As New HTMLDocument
Dim item As HTMLHtmlElement
Set wk = Sheet1
str = wk.Range("Link").value
ie.Visible = True
ie.Navigate str
Do
DoEvents
Loop Until ie.ReadyState = READYSTATE_COMPLETE
'Find the proper meta element --------------
Const META_TAG As String = "META"
Const META_NAME As String = "keywords"
Dim Doc As HTMLDocument
Dim metaElements As Object
Dim element As Object
Dim kwd As String
Set Doc = ie.Document
Set metaElements = Doc.all.tags(META_TAG)
For Each element In metaElements
If element.Name = META_NAME Then
kwd = element.Content
End If
Next
MsgBox kwd
End Sub

vba htmldocument How can I delete a specific element

I am writing the code in VBA.
How can I delete a specific element?
Thank you
Dim html As MSHTML.HTMLDocument
Set html = New MSHTML.HTMLDocument
Dim document As MSHTML.HTMLDocument
Set document = html.createDocumentFromUrl("http://example.com/", vbNullString)
' wait download
Do While document.readyState <> "complete"
Loop
' Insert I Can
Call document.DocumentElement.insertAdjacentHTML("afterend", "<div>test</div>")
' I want to delete a specific element here But I fail
' Call document.removeNode("specific element")
You can't remove elements directly, but you can remove nodes, which is more or less the same.
Function DelTagById(strData As String, strID As String) As String
On Local Error GoTo MyError
Dim HTMLDoc As HTMLDocument
Dim Node As IHTMLDOMNode
DelTagById = strData
If strID = "" Then GoTo MyExit
Set HTMLDoc = New HTMLDocument
HTMLDoc.body.innerHTML = strData
Set Node = HTMLDoc.getElementById(strID)
If Node Is Nothing Then GoTo MyExit
Node.parentNode.removeChild Node
DelTagById = HTMLDoc.body.innerHTML
MyExit:
Set Node = Nothing
Set HTMLDoc = Nothing
Exit Function
MyError:
'Handle Error
Resume MyExit
End Function
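A short usage sketch for the function above, assuming DelTagById sits in the same module and the Microsoft HTML Object Library is referenced. The sample markup and the procedure name are made up for illustration:

```vba
Sub DemoDelTagById()
    Dim strHtml As String
    strHtml = "<div id=""banner"">remove me</div><p id=""keep"">keep me</p>"

    ' Prints the remaining markup; MSHTML may re-serialize it
    ' (for example with different tag casing or attribute quoting)
    Debug.Print DelTagById(strHtml, "banner")

    ' An id that does not exist returns the input string unchanged
    Debug.Print DelTagById(strHtml, "missing")
End Sub
```

Because the function returns a string, the caller decides what to do with the result, for example assigning it back to body.innerHTML of the document being edited.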