getElementsByClassName() failed in my VBA

The web page contains the following:
<pre>
<div class="catdivlogo">
<img src="//static.designandreuse.com/sip/logo/rambus2.gif" border="0" class="catlogo" alt="Rambus, Inc.">
</div>
</pre>
I could not get the middle line (the img element) with .getElementsByClassName("catlogo"). I can only get the first line, with .getElementsByClassName("catdivlogo").
Can anybody help with this?
My VBA is as follows:
Sub Test()
Const URL As String = "https://www.design-reuse.com/sip/pci-express-c-430/"
Const CLASS_NAME As String = "catlogo"
Dim oDoc As MSHTML.HTMLDocument
Dim results As Object
With CreateObject("MSXML2.XMLHttp")
.Open "GET", URL, False
.send
'Check the response is valid
If .Status = 200 Then
Set oDoc = CreateObject("htmlfile")
oDoc.body.innerHTML = .responseText
Set results = oDoc.getElementsByClassName(CLASS_NAME)
End If
End With
End Sub

Related

Extract Data from URL VBA

I am trying to get address data from a URL but am running into an error. I am a beginner in VBA and do not understand where the problem in my code is. I hope somebody can help me find the right solution.
I have attached an image and my VBA code.
Here is my code:
Public Sub IE_GetLink()
Dim sResponse As String, HTML As HTMLDocument
Dim url As String
Dim Re As Object
Set HTML = New HTMLDocument
Set Re = CreateObject("MSXML2.XMLHTTP")
'On Error Resume Next
url = "http://markexpress.co.in/network1.aspx?Center=360370&Tmp=1656224682265"
With Re
.Open "GET", url, False
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.send
sResponse = StrConv(.responseBody, vbUnicode)
End With
Dim Title As Object
With HTML
.body.innerHTML = sResponse
Title = .querySelectorAll("#colspan")(0).innerText
End With
MsgBox Title
End Sub
Please help me ...
Several things.
What is wrong with your code:
Title should be a String, because you are assigning the return value of .innerText to it. You have declared it as an Object, which would require the Set keyword (and removal of the .innerText accessor).
colspan is an attribute, not an id, so your CSS selector list is incorrect.
Furthermore, looking at what the page actually does, there is a request for an additional document which actually has the info you need. You need to take the centre ID you already have and change the URI you make a request to.
Then, you want only the first td in the target table. Change your CSS selector list to target that.
Public Sub GetInfo()
Dim HTML As MSHTML.HTMLDocument
Dim re As Object
Set HTML = New MSHTML.HTMLDocument
Set re = CreateObject("MSXML2.XMLHTTP")
Dim url As String
Dim response As String
url = "http://crm.markerp.in/NetworkDetail.aspx?Center=360370&Tmp="
With re
.Open "GET", url, False
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.send
response = .responseText
End With
Dim info As String
With HTML
.body.innerHTML = response
info = .querySelector("#tblDisp td").innerText
End With
MsgBox info
End Sub
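To illustrate the selector point above, here is a minimal sketch that runs against a made-up HTML fragment instead of the real page. The markup, the Sub name and the cell text are all assumptions for illustration; only the selectors "#colspan" and "#tblDisp td" come from the code above. It assumes a reference to the Microsoft HTML Object Library.

```vba
' Requires a reference to "Microsoft HTML Object Library".
Public Sub SelectorSketch()
    Dim html As MSHTML.HTMLDocument
    Set html = New MSHTML.HTMLDocument

    ' Hypothetical markup standing in for the real response
    html.body.innerHTML = "<table id=""tblDisp""><tr>" & _
        "<td colspan=""2"">First cell</td><td>Second cell</td>" & _
        "</tr></table>"

    ' "#colspan" means id="colspan"; no element has that id here,
    ' so querySelector should hand back Nothing
    Debug.Print html.querySelector("#colspan") Is Nothing

    ' "[colspan]" is the attribute form: elements carrying a colspan attribute
    Debug.Print html.querySelector("[colspan]").innerText

    ' "#tblDisp td" targets td descendants of the element with id="tblDisp";
    ' querySelector returns only the first match
    Debug.Print html.querySelector("#tblDisp td").innerText
End Sub
```

Checking the result of querySelector for Nothing before touching .innerText is a cheap way to avoid error 91 when a selector does not match.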

Unable to collect links of different properties from a webpage

I've written a script in VBA to get only the links of the different properties under the title Single Family Homes from the right-hand area of a webpage. When I run my script, I get nothing, and no error either. The content I wish to grab is static and available within the page source, so XMLHttpRequest should do the trick.
Although the selectors I've defined in my script seem to be error-free, I still can't fetch the links of the different properties.
Webpage address
I've written:
Sub GetLinks()
Const link$ = "https://www.zillow.com/homes/for_sale/33125/house_type/12_zm/0_mmm/"
Dim oHttp As New XMLHTTP60, Html As New HTMLDocument
Dim I&
With oHttp
.Open "GET", link, False
.setRequestHeader "User-Agent", "Mozilla/5.0"
.send
Html.body.innerHTML = .responseText
With Html.querySelectorAll("article > a.list-card-info")
For I = 0 To .Length - 1
Sheet1.Range("A1").Offset(I, 0) = .item(I).getAttribute("href")
Next I
End With
End With
End Sub
Expected links are like:
https://www.zillow.com/homedetails/3446-NW-15th-St-Miami-FL-33125/43822210_zpid/
https://www.zillow.com/homedetails/1877-NW-22nd-Ave-Miami-FL-33125/43823838_zpid/
https://www.zillow.com/homedetails/1605-NW-8th-Ter-Miami-FL-33125/43825765_zpid/
How can I get all the links of the different properties from the landing page at the link above?
Use the class of the child alone. Note there are a number of other things I would like to change about the code but know you like to keep your structure/style.
Sub GetLinks()
Const link$ = "https://www.zillow.com/homes/for_sale/33125/house_type/12_zm/0_mmm/"
Dim oHttp As New XMLHTTP60, Html As New HTMLDocument
Dim I&
With oHttp
.Open "GET", link, False
.setRequestHeader "User-Agent", "Mozilla/5.0"
.send
Html.body.innerHTML = .responseText
With Html.querySelectorAll(".list-card-info")
For I = 0 To .Length - 1
Sheet1.Range("A1").Offset(I, 0) = .item(I).getAttribute("href")
Next I
End With
End With
End Sub
Some of the changes I might make:
Private Sub GetLinks()
Const LINK As String = "https://www.zillow.com/homes/for_sale/33125/house_type/12_zm/0_mmm/"
Dim http As MSXML2.XMLHTTP60, html As MSHTML.HTMLDocument
Dim i As Long, links As Object
Set http = New MSXML2.XMLHTTP60: Set html = New MSHTML.HTMLDocument
With http
.Open "GET", LINK, False
.setRequestHeader "User-Agent", "Mozilla/5.0"
.send
html.body.innerHTML = .responseText
End With
Set links = html.querySelectorAll(".list-card-info")
With ThisWorkbook.Worksheets("Sheet1")
For i = 0 To links.Length - 1
.Cells(i + 1, 1) = links.item(i).href
Next i
End With
End Sub

Can't use querySelector in the right way in vba

I've written some code in VBA to get all the movie names from a specific webpage on a torrent site. Stepping through with F8, I found that the code works well and prints the results until it hits the last result from that page. As soon as it reaches the last name to parse, the program crashes. I tried several times with the same result. If VBA doesn't support this CSS selector method, then how could I collect the results before the last one? Is there a reference to add to the library, or something else to do before execution? Any help on this will be greatly appreciated.
Here is the code I have written:
Sub Torrent_data()
Dim http As New XMLHTTP60, html As New HTMLDocument
Dim movie_name As Object, movie As Object
With http
.Open "GET", "https://www.yify-torrent.org/search/1080p/", False
.send
html.body.innerHTML = .responseText
End With
Set movie_name = html.querySelectorAll("div.mv h3 a")
For Each movie In movie_name
x = x + 1: Cells(x, 1) = movie.innerText
Next movie
End Sub
Try this:
Sub Torrent_data()
Dim http As New XMLHTTP60, html As New HTMLDocument, x As Long
With http
.Open "GET", "https://www.yify-torrent.org/search/1080p/", False
.send
html.body.innerHTML = .responseText
End With
Do
x = x + 1
On Error Resume Next
Cells(x, 1) = html.querySelectorAll("div.mv h3 a")(x - 1).innerText
Loop Until Err.Number = 91
End Sub
Here is another way, which doesn't need an error handler:
Sub GetContent()
Const URL$ = "https://yify-torrent.cc/search/1080p/"
Dim HTMLDoc As New HTMLDocument, R&, I&
With New ServerXMLHTTP60
.Open "Get", URL, False
.send
HTMLDoc.body.innerHTML = .responseText
End With
With HTMLDoc.querySelectorAll("h3 > a.movielink")
For I = 0 To .Length - 1
R = R + 1: Cells(R, 1).Value = .Item(I).innerText
Next I
End With
End Sub
The code retrieves one element after the last movie.
This extra element causes the failure, so For Each cannot be used.
I am not sure why yet; I will update when I know more.
Sub Torrent_data()
Dim http As New XMLHTTP60, html As New HTMLDocument
Dim movie_name As Object, movie As Object
With http
.Open "GET", "https://www.yify-torrent.org/search/1080p/", False
.send
html.body.innerHTML = .responseText
End With
Set movie_name = html.querySelectorAll("div.mv h3 a")
Dim i As Long
For i = 0 To movie_name.Length - 1
Cells(i + 1, 1) = movie_name(i).innerText ' worksheet rows are 1-based; x was never set, so Cells(x + i, 1) would start at the invalid row 0
Next i
End Sub
It looks like querySelectorAll has an issue of some sort.
The object returned by html.querySelectorAll(".mv h3 a") cannot be examined in the Watch window.
Attempting to do so crashes Excel or Word (I tried both).
I tried other tags, with the same result.
Sub Torrent_data()
Dim http As New XMLHTTP60, html As New HTMLDocument
Dim movie_name As Object, movie As Object
With http
.Open "GET", "https://www.yify-torrent.org/search/1080p/", False
.send
html.body.innerHTML = .responseText
End With
' Set movie_name = html.querySelectorAll("div.mv h3 a") ' querySelectorAll crashes VBA when trying to examine movie_name object
Set movie_name = html.getElementsByClassName("mv") ' HTMLElementCollection
For Each movie In movie_name
x = x + 1: Cells(x, 1) = movie.getElementsByTagName("a")(1).innerText
Next movie
' HTML block for each movie looks like this
' <div class="mv">
' <h3>
' Smoke (1995) 1080p
' </h3>
' <div class="movie">
' <div class="movie-image">
' <a href="/movie/55346/download-smoke-1995-1080p-mp4-yify-torrent.html" target="_blank" title="Download Smoke (1995) 1080p">
' <span class="play"><span class="name">Smoke (1995) 1080p</span></span>
' <img src="//pic.yify-torrent.org/20170820/55346/smoke-1995-1080p-poster.jpg" alt="Smoke (1995) 1080p" />
' </a>
' </div>
' </div>
' <div class="mdif">
' <ul>
' <li><b>Genre:</b>Comedy</li><li><b>Quality:</b>1080p</li><li><b>Screen:</b>1920x1040</li><li><b>Size:</b>2.14G</li><li><b>Rating:</b>7.4/10</li><li><b>Peers:</b>2</li><li><b>Seeds:</b>0</li>
' </ul>
' Download
' </div>
' </div>
End Sub
I know this is old, but I managed to use querySelectorAll without crashing my IE.
Instead of For Each, I used an indexed For loop.
Example below:
Dim priceData As Object
Set priceData = IE.document.getElementsByClassName("list-flights")(0).querySelectorAll("[class$='price']")
For i = 0 To priceData.Length - 1
Debug.Print priceData.item(i).getElementsByClassName("cash js_linkInsideCell")(0).innerHTML
Next i

Error when changing IE automation code to XML

I recently started working with XML automation and after changing some basic IE automation code over, I seem to be getting an error. Here's the HTML:
<tbody>
<tr class="group-2 first">
<td class="date-col">
<a href="/stats/matches/mapstatsid/48606/teamone-vs-merciless">
<div class="time" data-time-format="d/M/yy" data-unix="1498593600000">27/6/17</div>
</a>
</td>
......SOME MORE HTML HERE......
</tr>
......SOME MORE HTML HERE......
</tbody>
And here's the code I'm using in Excel VBA:
Sub readData()
Dim XMLPage As New MSXML2.XMLHTTP60
Dim html As New MSHTML.HTMLDocument
XMLPage.Open "GET", "https://www.hltv.org/stats/matches", False
XMLPage.send
If XMLPage.Status <> 200 Then MsgBox XMLPage.statusText
html.body.innerHTML = XMLPage.responseText
For Each profile In html.getElementsByTagName("tbody")(0).Children
Debug.Print profile.getElementsByClassName("date-col")(0).getElementsByTagName("a")(0).getAttribute("href") 'Run time error '438' here
Next
End Sub
I'm getting run-time error '438' on the Debug.Print line. It seems to happen when getting the class, but I'm unsure why. It works fine if I use this, for example:
Debug.Print profile.innertext
Worked for me:
Sub readData()
Dim XMLPage As New MSXML2.XMLHTTP60
Dim html As New MSHTML.HTMLDocument, links, a, i
XMLPage.Open "GET", "https://www.hltv.org/stats/matches", False
XMLPage.send
If XMLPage.Status <> 200 Then MsgBox XMLPage.statusText
html.body.innerHTML = XMLPage.responseText
Set links = html.querySelectorAll("td.date-col > a")
Debug.Print links.Length
For i = 0 To links.Length - 1
Debug.Print links(i).href
Next
Set links = Nothing
Set html = Nothing
End Sub
FYI, when I used For Each to loop over the links collection, Excel would reliably crash, so I'd stay with the indexed loop shown.
profile refers to a row, and profile.cells(0) will refer to the first column in that row. So try...
profile.cells(0).getElementsByTagName("a")(0).getAttribute("href")
Also, profile should be declared as HTMLTableRow.
The URL you are using isn't serving valid XML, but it's recoverable with some simple regex replacements. Once we have some valid XML, we can load that into a DOM document and use XPath to select the nodes as required:
Option Explicit
'Add references to:
' - MSXML v3
' - Microsoft VBScript Regular Expressions 5.5
Sub test()
Const START_MARKER As String = "<table class=""stats-table matches-table"">"
Const END_MARKER As String = "</table>"
With New MSXML2.XMLHTTP
.Open "GET", "https://www.hltv.org/stats/matches", False
.send
If .Status = 200 Then
'The HTML isn't valid XHTML, so we can't just use the request's responseXML DOMDocument
'Let's extract the HTML table
Dim tableStart As Long
tableStart = InStr(.responseText, START_MARKER)
Dim tableEnd As Long
tableEnd = InStr(tableStart, .responseText, END_MARKER)
Dim tableHTML As String
tableHTML = Mid$(.responseText, tableStart, tableEnd - tableStart + Len(END_MARKER))
'The HTML table has invalid img tags (let's add a closing tag with some regex)
With New RegExp
.Global = True
.Pattern = "(\<img [\W\w]*?)"">"
Dim tableXML As String
tableXML = .Replace(tableHTML, "$1"" />")
End With
'And load an XML document from the cleaned up HTML fragment
Dim doc As MSXML2.DOMDocument
Set doc = New MSXML2.DOMDocument
doc.LoadXML tableXML
End If
End With
If Not doc Is Nothing Then
'Use XPath to select the nodes we need
Dim nodes As MSXML2.IXMLDOMSelection
Set nodes = doc.SelectNodes("//td[@class='date-col']/a/@href")
'Enumerate the URLs
Dim node As IXMLDOMAttribute
For Each node In nodes
Debug.Print node.nodeTypedValue
Next node
End If
End Sub
Output:
/stats/matches/mapstatsid/48606/teamone-vs-merciless
/stats/matches/mapstatsid/48607/merciless-vs-teamone
/stats/matches/mapstatsid/48608/merciless-vs-teamone
/stats/matches/mapstatsid/48600/wysix-vs-fnatic-academy
/stats/matches/mapstatsid/48602/skitlite-vs-nexus
/stats/matches/mapstatsid/48604/extatus-vs-forcebuy
/stats/matches/mapstatsid/48605/extatus-vs-forcebuy
/stats/matches/mapstatsid/48599/planetkey-vs-gatekeepers
/stats/matches/mapstatsid/48603/gatekeepers-vs-planetkey
/stats/matches/mapstatsid/48595/wysix-vs-gambit
/stats/matches/mapstatsid/48596/kinguin-vs-playing-ducks
/stats/matches/mapstatsid/48597/spirit-academy-vs-tgfirestorm
/stats/matches/mapstatsid/48601/spirit-academy-vs-tgfirestorm
/stats/matches/mapstatsid/48593/fnatic-academy-vs-gambit
/stats/matches/mapstatsid/48594/alternate-attax-vs-nexus
/stats/matches/mapstatsid/48590/pro100-vs-playing-ducks
/stats/matches/mapstatsid/48583/extatus-vs-ex-fury
/stats/matches/mapstatsid/48589/extatus-vs-ex-fury
/stats/matches/mapstatsid/48584/onlinerol-vs-forcebuy
/stats/matches/mapstatsid/48591/forcebuy-vs-onlinerol
/stats/matches/mapstatsid/48581/epg-vs-veni-vidi-vici
/stats/matches/mapstatsid/48588/epg-vs-veni-vidi-vici
/stats/matches/mapstatsid/48592/veni-vidi-vici-vs-epg
/stats/matches/mapstatsid/48582/log-vs-gatekeepers
/stats/matches/mapstatsid/48586/gatekeepers-vs-log
/stats/matches/mapstatsid/48580/spraynpray-vs-epg
/stats/matches/mapstatsid/48579/quantum-bellator-fire-vs-spraynpray
/stats/matches/mapstatsid/48571/noxide-vs-masterminds
/stats/matches/mapstatsid/48572/athletico-vs-legacy
/stats/matches/mapstatsid/48578/node-vs-avant
/stats/matches/mapstatsid/48573/funky-monkeys-vs-grayhound
/stats/matches/mapstatsid/48574/grayhound-vs-funky-monkeys
/stats/matches/mapstatsid/48575/hegemonyperson-vs-eclipseo
/stats/matches/mapstatsid/48577/eclipseo-vs-hegemonyperson
/stats/matches/mapstatsid/48566/masterminds-vs-tainted-black
/stats/matches/mapstatsid/48562/grayhound-vs-legacy
/stats/matches/mapstatsid/48563/noxide-vs-riotous-raccoons
/stats/matches/mapstatsid/48564/avant-vs-dark-sided
/stats/matches/mapstatsid/48565/avant-vs-dark-sided
/stats/matches/mapstatsid/48567/eclipseo-vs-uya
/stats/matches/mapstatsid/48568/uya-vs-eclipseo
/stats/matches/mapstatsid/48560/uya-vs-new4
/stats/matches/mapstatsid/48561/new4-vs-uya
/stats/matches/mapstatsid/48559/jaguar-sa-vs-miami-flamingos
/stats/matches/mapstatsid/48558/spartak-vs-binary-dragons
/stats/matches/mapstatsid/48557/kungar-vs-spartak
/stats/matches/mapstatsid/48556/igamecom-vs-fragsters
/stats/matches/mapstatsid/48554/nordic-warthogs-vs-aligon
/stats/matches/mapstatsid/48555/binary-dragons-vs-kungar
/stats/matches/mapstatsid/48550/havu-vs-rogue-academy
Looking at the MSHTML.HTMLDocument reference there is no method getElementsByClassName.
You will need to loop through each row in the tbody you are selecting and then get the first td in that row and then get the first link in that td and read the href attribute from it. You could alternately compare the class attribute of the td but since it is the first element in the row there is no need to do that.
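The steps just described can be sketched as follows. This is only an illustration run against a made-up fragment shaped like the table in the question; the markup, the hrefs, the Sub name and the variable names are assumptions, and it uses only getElementsByTagName and getAttribute, which the question's code already calls. It assumes a reference to the Microsoft HTML Object Library.

```vba
' Requires a reference to "Microsoft HTML Object Library".
Public Sub FirstLinkPerRowSketch()
    Dim html As MSHTML.HTMLDocument
    Set html = New MSHTML.HTMLDocument

    ' Hypothetical markup shaped like the table in the question
    html.body.innerHTML = _
        "<table><tbody>" & _
        "<tr><td class=""date-col""><a href=""/stats/a"">27/6/17</a></td><td>x</td></tr>" & _
        "<tr><td class=""date-col""><a href=""/stats/b"">26/6/17</a></td><td>y</td></tr>" & _
        "</tbody></table>"

    Dim trs As Object, tds As Object, anchors As Object
    Dim i As Long

    ' All rows of the first tbody
    Set trs = html.getElementsByTagName("tbody")(0).getElementsByTagName("tr")

    For i = 0 To trs.Length - 1
        ' First td of the row, then the first link inside that td
        Set tds = trs.Item(i).getElementsByTagName("td")
        If tds.Length > 0 Then
            Set anchors = tds.Item(0).getElementsByTagName("a")
            If anchors.Length > 0 Then
                Debug.Print anchors.Item(0).getAttribute("href")
            End If
        End If
    Next i
End Sub
```

The Length checks are there so that a row without cells or a cell without a link is skipped rather than raising an error.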

How to access HTML file Document created through XMLHTTP request

I have got a responseText from an XMLHTTP request:
Set XMLHttp = CreateObject("MSXML2.XMLHTTP")
XMLHttp.Open "GET", urlPiece, False
XMLHttp.send
that I store in an HTML file created in memory:
Set htmlResponse = CreateObject("htmlfile")
htmlResponse.body.innerHTML = XMLHttp.responseText
If I look at the object htmlResponse on the debugger, I see the structure of a normal HTML file. However, when I try to get the document, I don't succeed:
Set doc = htmlResponse.document '<-- Invalid method or property
What am I doing wrong? Below my full code in case you want to test on real sample:
Sub getPrice()
Dim urlPiece As String: urlPiece = "https://fr.finance.yahoo.com/q?s="
Dim htmlResponse As Object
Dim XMLHttp As Object
Set htmlResponse = CreateObject("htmlfile")
ccyPair = "XAUUSD"
urlPiece = urlPiece & ccyPair & "=X"
Set XMLHttp = CreateObject("MSXML2.XMLHTTP")
XMLHttp.Open "GET", urlPiece, False
XMLHttp.send
htmlResponse.body.innerHTML = XMLHttp.responseText
Set doc = htmlResponse.document '<-- error here
End Sub
I found the mistake myself.
Unlike in JavaScript, the document is reached through the body of the htmlfile object and is not itself a property of that object.
Hence:
Set doc = htmlResponse.document
should rather be
Set doc = htmlResponse.body.document
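As a minimal sketch of how this can be checked without a network request: the fragment, the id "price" and the Sub name below are made up for illustration. My understanding, which is an assumption rather than something stated above, is that the object returned by CreateObject("htmlfile") already acts as the document, so the same lookup can also be made on it directly.

```vba
Public Sub HtmlFileDocumentSketch()
    Dim htmlResponse As Object
    Dim doc As Object

    Set htmlResponse = CreateObject("htmlfile")

    ' Hypothetical fragment standing in for XMLHttp.responseText
    htmlResponse.body.innerHTML = "<p id=""price"">1234.5</p>"

    ' The route from the answer: go through the body to reach the document
    Set doc = htmlResponse.body.document
    Debug.Print doc.getElementById("price").innerText

    ' Assumption: the htmlfile object can be queried directly as well,
    ' in which case both lines should print the same text
    Debug.Print htmlResponse.getElementById("price").innerText
End Sub
```

If the second lookup works in your environment, the extra doc variable is not needed at all.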