Query div element by class name using Excel VBA - vba

I'm trying to get the data to Excel of a div element with a specific class name, like:
<div class="myClass">
<span>Text1</span>
<span>Text2</span>
</div>
My VBA code:
Sub GetData()
Dim oHtml As HTMLDocument
Dim oElement As Object
Set oHtml = New HTMLDocument
With CreateObject("WINHTTP.WinHTTPRequest.5.1")
.Open "GET", "http://www.example.com/", False
.send
oHtml.body.innerHTML = .responseText
End With
For Each oElement In oHtml.getElementsByClassName("myClass")
Debug.Print oElement.Children(0).src
Next oElement
End Sub
This is returning the error:
Run-time error: '438': Object doesn't support this property or method. The error is on line Debug.Print ...
I have activated the following refereces:
Microsoft HTML Object Library
Microsoft Internet Controls
I want to be able to select the text on the first or second span and paste it on a cell, how can I do this?

This worked for me:
Dim oHtml As HTMLDocument
Dim oElement As Object
Set oHtml = New HTMLDocument
With CreateObject("WINHTTP.WinHTTPRequest.5.1")
.Open "GET", "http://www.example.com", False
.send
oHtml.body.innerHTML = .responseText
End With
Set dados = oHtml.getElementsByClassName("myClass")(0).getElementsByTagName("span")
i = 0
For Each oElement In dados
Sheets("Sheet1").Range("A" & i + 1) = dados(i).innerText
i = i + 1
Next oElement

Here's how I've done it in the past:
Set oElement = oHtml.getElementsByClassName("myClass")
i = 0
While i < oElement.Length
Debug.Print oElement(i).innerText
i = i + 1
Wend
you might want innerHtml instead?

Related

VBA scrape src instead of href

I am using the code below code but it brings the value of 'src' instead of 'href' for some reason. Anyone can help please?
Sub bringfox(txt As String)
Dim oHtml As HTMLDocument
Dim oElement As Object
Set oHtml = New HTMLDocument
maintext2 = "https://www.jjfox.co.uk/cigars/show/all.html"
With CreateObject("WINHTTP.WinHTTPRequest.5.1")
.Open "GET", maintext2 & gr, False
.send
oHtml.body.innerHTML = .responseText
End With
counter = cnt
'oElement(i).Children(0).getAttribute ("href")
Set oElement = oHtml.getElementsByClassName("products-grid products-grid--max-3-col")(0).getElementsByTagName("a")
i = 0
While i < oElement.Length
Debug.Print oElement(i).Children(0).getAttribute("href")
i = i + 1
Wend
End Sub
You could try using a CSS selector.
#wrapper div.category-products > ul a
This is a reduced version of the full selector that targets a tags within the products category. You then parse the outerHTML for the hrefs as that is where the information is located.
Site image (Sample view)
Output from code (Sample view)
Code
Option Explicit
Public Sub GetInfo()
Dim oHtml As HTMLDocument, nodeList As Object, currentItem As Long
Const URL As String = "https://www.jjfox.co.uk/cigars/show/all.html"
Set oHtml = New HTMLDocument
With CreateObject("WINHTTP.WinHTTPRequest.5.1")
.Open "GET", URL, False
.send
oHtml.body.innerHTML = .responseText
End With
Set nodeList = oHtml.querySelectorAll("#wrapper div.category-products > ul a")
For currentItem = 0 To nodeList.Length - 1
On Error Resume Next
Debug.Print Split(Split(nodeList.item(currentItem).outerHTML, "<A href=")(1), ">")(0)
On Error GoTo 0
Next currentItem
End Sub
Or more simply, use the following
For currentItem = 0 To nodeList.Length - 1
On Error Resume Next
Debug.Print nodeList.item(currentItem).href
On Error GoTo 0
Next currentItem

Open website, find specific value and return value to Excel in VBA

I would like to use VBA to open a website, look for a certain paragraph in the HTML code of this website (<p class="myClass">XYZ</p>) and return this value to Excel, in my example "XYZ".
The website has only one paragraph (p element) with the above class.
I know that this is possible but don't know where to start here.
My code:
Dim objIE As Object
Set objIE = CreateObject("InternetExplorer.Application")
With objIE
.Navigate varUrl
Do While .Busy
Application.Wait Now + TimeValue("0:00:01")
Loop
.Visible = True
End With
Instead of opening IE, use a web request:
Set oRequest = New WinHttp.WinHttpRequest
With oRequest
.Open "GET", sUrl, True
.SetRequestHeader "Content-Type", "application/x-www-form-urlencoded; charset=UTF-8"
.Send "{range:9129370}"
.WaitForResponse
Set index = .ResponseText.IndexOf("<p class=""myClass"">")
Set text = .ResponseText.Substring(index,3)
Cells(row, col).Value = text
End With
If you don't know the length of the string you are looking for, you could also do a loop after index until you hit a "<" character.
Dim objIE As Object
Set objIE = CreateObject("InternetExplorer.Application")
With objIE
.Navigate varUrl
Do While .Busy
Application.Wait Now + TimeValue("0:00:01")
Loop
.Visible = True
End With
'HTML document
Dim doc As Object
Set doc = objIE.document
Dim el As Object
Dim myText as string
For Each el In doc.GetElementsByClassName("myClass")
'put paragrah text in cell A1
Cells(1, 1).Value = el.innerText
'put your paragraph text in a variable string
myText = el.innerText
Next el
That is a tricky and interesting question. Let's say that you want to obtain the title of this current website, which is in class question-hyperlink within StackOverflow. Thus, using the idea of the solution of #Matt Spinks you may come up with something like this:
Option Explicit
Public Sub TestMe()
Dim oRequest As Object
Dim strOb As String
Dim strInfo As String: strInfo = "class=""question-hyperlink"">"
Dim lngStart As Long
Dim lngEnd As Long
Set oRequest = CreateObject("WinHttp.WinHttpRequest.5.1")
With oRequest
.Open "GET", "http://stackoverflow.com/questions/42254051/vba-open-website-find-specific-value-and-return-value-to-excel#42254254", True
.SetRequestHeader "Content-Type", "application/x-www-form-urlencoded; charset=UTF-8"
.Send "{range:9129370}"
.WaitForResponse
strOb = .ResponseText
End With
lngStart = InStr(1, strOb, strInfo)
lngEnd = InStr(lngStart, strOb, "<")
Debug.Print Mid(strOb, lngStart + Len(strInfo), lngEnd - lngStart - Len(strInfo))
End Sub
Instead of Debug.print you may get the Title in a string and work further.

MSXML2.XMLHTTP response different from IE document

I want to know why in some cases the HTML in the response from MSXML2.XMLHTTP object does not produce the same results as automating Internet Explorer and inspecting the document property value.
For example, the procedure below compares the results found by using the object MSXML2.XMLHTTP (column A) with the results found by using the InternetExplorer object (column B).
The results of the InternetExplorer object include the NASDAQ index as expected but the results of the MSXML2.XMLHTTP object do not include the NASDAQ index and are completely different:
Sub ExtractDataFromInternet()
'Enable Tools/references 1. Microsoft Internet Control and 2. Microsoft HTML Object Library.
Dim URL
Dim objHTML As HTMLDocument
Dim Oelement As Object
Dim ie As New InternetExplorer
Dim J, Field1, Field2
Set objHTML = New HTMLDocument
URL = "http://www.nasdaq.com/"
'-------- METHOD1: MSXML2.XMLHTTP --------------------------
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", URL, False
.send
objHTML.body.innerHTML = .responseText
End With
J = 0
Set Field1 = objHTML.getElementsByTagName("td")
For Each Oelement In Field1
Worksheets("sheet1").Cells(J + 1, 1) = Field1(J).innerText
J = J + 1
Next Oelement
'----------METHOD2: InternetExplorer Object----------------
Set ie = New InternetExplorer
With ie
.navigate URL
.Visible = False
While .Busy Or .readyState <> READYSTATE_COMPLETE
DoEvents
Wend
Set objHTML = .document
DoEvents
End With
J = 0
Set Field1 = objHTML.getElementsByTagName("td")
For Each Oelement In Field1
Worksheets("sheet1").Cells(J + 1, 2) = Field1(J).innerText
J = J + 1
Next Oelement
DoEvents
ie.Quit
DoEvents
Set ie = Nothing
'----------------------------------------------
End Sub

vba htmldocument How can I delete a specific element

I am writing the code in VBA.
How can I delete a specific element?
Thank you
Dim html As MSHTML.HTMLDocument
Set html = New MSHTML.HTMLDocument
Dim document As MSHTML.HTMLDocument
Set document = html.createDocumentFromUrl("http://example.com/", vbNullString)
' wait download
Do While document.readyState <> "complete"
Loop
' Insert I Can
Call document.DocumentElement.insertAdjacentHTML("afterend", "<div>test</div>")
' I want to delete a specific element here But I fail
' Call document.removeNode("specific element")
You cant remove elements, but you can remove nodes, which is more or less the same.
Function DelTagById(strData As String, strID As String) As String
On Local Error GoTo MyError
Dim HTMLDoc As HTMLDocument
Dim Node As IHTMLDOMNode
DelTagById = strData
If strID = "" Then GoTo MyExit
Set HTMLDoc = New HTMLDocument
HTMLDoc.body.innerHTML = strData
Set Node = HTMLDoc.getElementById(strID)
If Node Is Nothing Then GoTo MyExit
Node.parentNode.removeChild Node
DelTagById = HTMLDoc.body.innerHTML
MyExit:
Set Node = Nothing
Set HTMLDoc = Nothing
Exit Function
MyError:
'Handle Error
Resume MyExit
End Function

Can't get to value inside XML spanclass

I'm trying to retrieve a value in one of the spanclasses in an online XML-file.
File: http://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml
I want to get to the USD rate but my code does not seem to loop through and span-classes, where is my mistake?
My code
Function response_Text(url As String) ' get the responsetext from an xml request
Dim xml As Object
Set xml = CreateObject("MSXML2.XMLHTTP")
With xml
.Open "Get", url, False
.send
response_Text = .responsetext
End With
Set xml = Nothing
End Function
Private Sub find_ClassElement(HTML_doc As MSHTML.HTMLDocument) ' return value inside span_class
Dim ticker As Variant
Dim XML_elements As MSHTML.IHTMLElementCollection
Dim XML_spanclass As MSHTML.HTMLSpanElement
Dim XML_targetElement As MSHTML.HTMLLIElement
Set XML_elements = HTML_doc.getElementsByClassName("line") **<--- something seems to be wrong here, the code does not loop through any span_classes after this point as intended ( the for statement is not being executed )**
For Each XML_spanclass In XML_elements
If InStr(XML_spanclass.innerHTML, "USD") > 0 Then
Debug.Print "success"
Set XML_targetElement = XML_spanclass.parentElement
Debug.Print CSng(XML_targetElement.getElementsByClassName("webkit-html-attribute-value")(0).innerHTML)
End If
Next
End Sub
Private Sub run() ' run the whole operation
Dim http_req As http_req: Set http_req = New http_req
Dim xml As MSHTML.HTMLDocument: Set xml = New MSHTML.HTMLDocument
Dim url As String: url = "http://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml"
xml.body.innerHTML = http_req.response_Text(url)
Call find_ClassElement(xml)
End Sub
There are no tags with a class of "line" so your collection is empty - nothing to loop through. Here's another way
Sub GetUSD()
Dim xHttp As MSXML2.XMLHTTP
Dim xDoc As MSXML2.DOMDocument
Dim xCube As MSXML2.IXMLDOMElement
Dim xCubes As MSXML2.IXMLDOMSelection
Dim sCurrency As String
'load the xml document
Set xDoc = New MSXML2.DOMDocument
xDoc.Load "http://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml"
'wait until it's completely loaded
Do
DoEvents
Loop Until xDoc.readyState = 4
'get all the cube tags
Set xCubes = xDoc.getElementsByTagName("Cube")
For Each xCube In xCubes
'some cube tags don't have attributes
On Error Resume Next
sCurrency = xCube.Attributes(0).Text
On Error GoTo 0
'if the first attribute is USD, get the second attribute
If sCurrency = "USD" Then
Debug.Print xCube.Attributes(1).Text
End If
Next xCube
End Sub
edit
I don't know xpath well enough to do this properly, but this works.
Sub GetUSD()
Dim xDoc As MSXML2.DOMDocument60
Dim xCube As MSXML2.IXMLDOMNode
Dim xCubes As MSXML2.IXMLDOMNodeList
Dim sCurrency As String
'load the xml document
Set xDoc = New MSXML2.DOMDocument60
xDoc.Load "http://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml"
'wait until it's completely loaded
Do
DoEvents
Loop Until xDoc.readyState = 4
'get all the cube tags
Set xCubes = xDoc.SelectNodes("//*")
For Each xCube In xCubes
On Error Resume Next
sCurrency = xCube.Attributes(0).NodeValue
On Error GoTo 0
'if the first attribute is USD, get the second attribute
If sCurrency = "USD" Then
Debug.Print xCube.Attributes(1).NodeValue
End If
Next xCube
End Sub