VBA HTML Scraping Error "Object Variable or With Variable not set"

I was following this tutorial video https://www.youtube.com/watch?v=sGw6r5GVA5g&t=2803s made by the WiseOwlTutorials channel and got stuck at a listing procedure he explains at the 36:00 mark.
At that point, he starts to explain how to return the URL and name of each video in a specific category using a procedure called Sub ListVideosOnPage(VidCatName As String, VidCatURL As String). It is called from another module, which loops through all the video categories on the main video page of their website, https://www.wiseowl.co.uk/videos (the menu list in the left corner).
When this procedure runs, it goes into each video category and gets the name and URL of each video in that category in order to list them on a page which, in that part of the YouTube video cited above, is a debug page. However, the current WiseOwl video page is different from how it was when the tutorial video was made.
So, I changed his method a little in order to put the correct elements on the debugging page, as shown below:
Sub ListVideosOnPage(VidCatName As String, VidCatURL As String)
Dim XMLReq As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument
Dim VidTables As MSHTML.IHTMLElementCollection
Dim VidTable As MSHTML.IHTMLElement
Dim VidRows As MSHTML.IHTMLElementCollection
Dim VidRow As MSHTML.IHTMLElement
Dim VidLink As MSHTML.IHTMLElement
XMLReq.Open "GET", VidCatURL, False
XMLReq.send
If XMLReq.Status <> 200 Then
MsgBox "Problem" & vbNewLine & XMLReq.Status & " - " & XMLReq.statusText
Exit Sub
End If
HTMLDoc.body.innerHTML = XMLReq.responseText
'get the table element in each video category found by other module
'VidTables tag added by me to get the new element on the WiseOwl website
Set VidTables = HTMLDoc.getElementsByTagName("table")
'loop starts to search for row and link tags on the current table
For Each VidTable In VidTables
Set VidRows = VidTable.getElementsByTagName("tr")
For Each VidRow In VidRows
Set VidLink = VidRow.getElementsByTagName("a")(0) 'just pick the first link
Debug.Print VidRow.innerText, VidRow.getattribute("href") 'object variable not set error happens here
Next VidRow
Next VidTable
End Sub
I found a way to circumvent this Object Variable or With Variable not set error by changing the code inside the VidRow loop, adding a manual index so that only the first link in each row is printed:
For Each VidTable In VidTables
Set VidRows = VidTable.getElementsByTagName("tr")
For Each VidRow In VidRows
Set VidLinks = VidRow.getElementsByTagName("a") 'collection of links in this row
Index = 0
For Each VidLink In VidLinks
If Index = 0 Then
Debug.Print VidLink.innerText, VidLink.getAttribute("href")
Index = Index + 1
End If
Next VidLink
Next VidRow
Next VidTable
But, in the tutorial video referenced above, the instructor doesn't get this error when he indexes into the collection in the way shown below:
VidLink = VidRow.getElementsByTagName("a")(0)
Debug.Print VidRow.innerText, VidRow.getattribute("href")
So my question is: why do I get these object variable not set errors when the instructor in the tutorial video doesn't? It looks like the same code to me, with each element declared in the right way, and it is a much more efficient way to code than using Ifs. Could anyone more used to VBA please help with an answer to this? Maybe I am missing something.

tl;dr:
I first give you the debug and fix info;
I go on to show you a different way using CSS selectors to target the page styling. This is generally faster, more robust and more flexible;
VidCatName doesn't appear to be used but I have left it in for now. I personally would remove it unless you will later develop the code to use this variable. The parameters of the second sub can be passed by value, so I have added ByVal to the signature.
① Debugging:
Your error occurs because you are looping over all table rows and trying to access a tags and then their href attributes. The first row of each table is the header row, which has no a tag elements and so no associated href attributes.
In the table element on the page, the first tr element contains a child th element, indicating it is the table header, and there is no associated a tag element.
Kind of like you were shown elsewhere in that video, you want to change your loop to a For Next, and then, in this case, start from index 1 to skip the header row.
So, the part containing this line: For Each VidRow In VidRows , becomes the following:
Dim VidRowID As Long
For Each VidTable In VidTables
Set VidRows = VidTable.getElementsByTagName("tr")
For VidRowID = 1 To VidRows.Length - 1 'first row is actually header which doesn't have an a tag or href
Set VidLink = VidRows(VidRowID).getElementsByTagName("a")(0)
Debug.Print VidLink.innerText, VidLink.getAttribute("href")
Next VidRowID
Next VidTable
There is also only one table per page so a loop of all tables is unnecessary code in this case.
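As an alternative to starting the loop at index 1, the header row can be skipped by checking whether a row contains any a elements before reading the first one. The following is only a minimal sketch, assuming the same Microsoft HTML Object Library reference; the sub name PrintFirstLinkPerRow and the inline markup are made up for illustration and stand in for the downloaded page:

```vba
Option Explicit

'Requires a reference to Microsoft HTML Object Library (VBE > Tools > References)
Public Sub PrintFirstLinkPerRow()
    Dim HTMLDoc As New MSHTML.HTMLDocument
    Dim VidRows As MSHTML.IHTMLElementCollection
    Dim VidRow As MSHTML.IHTMLElement
    Dim VidLinks As MSHTML.IHTMLElementCollection

    'Hypothetical markup: one header row without a link, one data row with a link
    HTMLDoc.body.innerHTML = _
        "<table><tr><th>Title</th></tr>" & _
        "<tr><td><a href=""https://example.com/video-one"">Video one</a></td></tr></table>"

    Set VidRows = HTMLDoc.getElementsByTagName("tr")
    For Each VidRow In VidRows
        Set VidLinks = VidRow.getElementsByTagName("a")
        'Only rows that actually contain a link are printed
        If VidLinks.Length > 0 Then
            Debug.Print VidLinks(0).innerText, VidLinks(0).getAttribute("href")
        End If
    Next VidRow
End Sub
```

This does not rely on the header being the first row, at the cost of one extra collection lookup per row.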
Example full call (using your code with just the change in loop type):
Option Explicit
Public Sub test()
ListVideosOnPage "Business Intelligence (70)", "https://www.wiseowl.co.uk/business-intelligence/videos/"
End Sub
Public Sub ListVideosOnPage(ByVal VidCatName As String, ByVal VidCatURL As String)
Dim XMLReq As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument
Dim VidTables As MSHTML.IHTMLElementCollection
Dim VidTable As MSHTML.IHTMLElement
Dim VidRows As MSHTML.IHTMLElementCollection
Dim VidRow As MSHTML.IHTMLElement
Dim VidLink As MSHTML.IHTMLElement
XMLReq.Open "GET", VidCatURL, False
XMLReq.send
If XMLReq.Status <> 200 Then
MsgBox "Problem" & vbNewLine & XMLReq.Status & " - " & XMLReq.statusText
Exit Sub
End If
HTMLDoc.body.innerHTML = XMLReq.responseText
Set VidTables = HTMLDoc.getElementsByTagName("table") 'Should limit to just one table
Dim VidRowID As Long
For Each VidTable In VidTables
Set VidRows = VidTable.getElementsByTagName("tr")
For VidRowID = 1 To VidRows.Length - 1 'first row is actually header which doesn't have an a tag or href
Set VidLink = VidRows(VidRowID).getElementsByTagName("a")(0)
Debug.Print VidLink.innerText, VidLink.getAttribute("href")
Next VidRowID
Next VidTable
End Sub
② CSS selectors:
I would instead use a CSS selector combination to target the a tag elements within the target parent table element. This is written as .bpTable a. A more official term for this combination is descendant selector.
The descendant combinator, typically represented by a single space (" ") character, combines two selectors such that elements matched by the second selector are selected if they have an ancestor element matching the first selector. Selectors that utilize a descendant combinator are called descendant selectors.
The .bpTable part is itself a class selector (like .getElementsByClassName). The class part is indicated by the leading ".". So it matches elements with the class name bpTable, which is the class name of the target table on each page.
Target table element on page:
This selector is applied via the .querySelectorAll method of .document and returns a static nodeList. You can then loop the .Length of this nodeList, from 0 to .Length -1, accessing elements by index.
Public Sub ListVideosOnPage(ByVal VidCatName As String, ByVal VidCatURL As String)
Dim XMLReq As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument
XMLReq.Open "GET", VidCatURL, False
XMLReq.send
If XMLReq.Status <> 200 Then
MsgBox "Problem" & vbNewLine & XMLReq.Status & " - " & XMLReq.statusText
Exit Sub
End If
HTMLDoc.body.innerHTML = XMLReq.responseText
Dim aNodeList As Object, link As Long
Set aNodeList = HTMLDoc.querySelectorAll(".bpTable a")
For link = 0 To aNodeList.Length - 1
Debug.Print aNodeList(link).innerText, aNodeList(link).href
Next
End Sub
References (VBE > Tools > References):
Microsoft HTML Object Library
Microsoft XML, V6.0 'For my Excel 2016 version

Related

MSXML2 - How to search specific nodes and replace its child nodes

I have this XML file
I need to search for the <deviceset> element by its name (for example name="DB_") and replace its children subtree <technologies> with updated data.
So far I made function that returns MSXML2.IXMLDOMElement <technologies> with correct structure, but I have no clue how to search and replace in the main document.
I'm trying this approach
'Select everything from table Interlink - This table contains element's names
Dim RS As Recordset
Set RS = CurrentDb.OpenRecordset("SELECT * FROM Interlink")
'Create new document and load the file
Dim oDoc As DOMDocument60
Set oDoc = New DOMDocument60
oDoc.async = False
oDoc.Load CurrentProject.Path & "\JLC_pattern.xml"
Dim Tech As IXMLDOMElement 'I can set this to contain updated <technologies> subtree
'is it better to use IXMLDOMNode? or IXMLDOMDocumentFragment?
Dim devSets As IXMLDOMNodeList 'Collection ?
Dim devSet As IXMLDOMNode 'Node?
'Loop through the recordset, search for elements and replace the subtree <technologies>
Do Until RS.EOF
'Recordset now contains the deviceset name attribute
Debug.Print RS.Fields("lbrDeviceSetName") ' first record contains "DB_"
'I can't find the right method to find the node or collection
'I have tried:
Set devSets = oDoc.getElementsByTagName("deviceset") 'and
Set devSets = oDoc.selectNodes("//eagle/drawing/library/devicesets/deviceset")
'but devSets collection is always empty
For Each devSet In devSets
Debug.Print devSet.baseName ' this does not loop
Next devSet
'I made a function that returns IXMLDOMNode with needed data structure
'Once I find the node I need to replace the subtree
'and move to the next deviceset name
RS.MoveNext
Loop
'Save the modified XML document to disk
oDoc.Save CurrentProject.Path & "\SynthetizedDoc.xml"
RS.Close
'Cleanup...
It may be easier to loop through the collection of nodes and search the recordset instead of looping through the recordset and search the nodes.
Can anyone give me a clue please?
EDIT: I have expanded the VBA code with for each loop
Pattern XML is here JLC_Pattern.xml
EDIT 2: The <technologies> subtree can be quite huge. I don't want to overwhelm this post by code. I have a function getTechnology(tech as string) as IXMLDOMElement that pulls data from DB. Function output content can be downloaded here: IXMLDOMElement.xml The issue is not this function, I just don't know how to insert this output into the correct place of the oDoc
This works for me:
'Create new document and load the file
Dim oDoc As DOMDocument60
Dim devSet As IXMLDOMNode
Set oDoc = New DOMDocument60
oDoc.async = False
'https://learn.microsoft.com/en-us/previous-versions/windows/desktop/ms762632(v=vs.85)
oDoc.SetProperty "ProhibitDTD", False 'needed for MSXML6
oDoc.validateOnParse = False 'or get a DTD-related error
'"The element 'eagle' is used but not declared in the DTD/Schema."
'always test for load errors
If Not oDoc.Load("C:\Tester\JLC_pattern.xml") Then
Debug.Print oDoc.parseError.reason
Exit Sub
End If
'select a single node based on its name attribute value
Set devSet = oDoc.SelectSingleNode("/eagle/drawing/library/devicesets/deviceset[@name='DB_']")
If Not devSet Is Nothing Then
Debug.Print devSet.XML
'work with devSet child nodes...
Else
Debug.Print "node not found"
End If
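Once the deviceset node has been found, one possible way to swap its <technologies> subtree is replaceChild on the parent of the old subtree. This is only a sketch under assumptions: the XML string is a heavily simplified, invented stand-in for JLC_pattern.xml, the technology element and its name attribute are placeholders, and the replacement is built with createElement in the same document rather than coming from the asker's getTechnology function. A node built by a different document might first need to be brought into oDoc (MSXML 6 offers importNode for that); that step is not shown.

```vba
Option Explicit

'Requires a reference to Microsoft XML, v6.0
Public Sub ReplaceTechnologiesSketch()
    Dim oDoc As New MSXML2.DOMDocument60
    Dim devSet As MSXML2.IXMLDOMNode
    Dim oldTech As MSXML2.IXMLDOMNode
    Dim newTech As MSXML2.IXMLDOMElement
    Dim newChild As MSXML2.IXMLDOMElement

    oDoc.async = False
    'Hypothetical, heavily simplified stand-in for JLC_pattern.xml
    If Not oDoc.LoadXML( _
        "<devicesets><deviceset name=""DB_"">" & _
        "<technologies><technology name=""old""/></technologies>" & _
        "</deviceset></devicesets>") Then
        Debug.Print oDoc.parseError.reason
        Exit Sub
    End If

    Set devSet = oDoc.SelectSingleNode("/devicesets/deviceset[@name='DB_']")
    If devSet Is Nothing Then Exit Sub

    'First <technologies> element anywhere below the matched deviceset
    Set oldTech = devSet.SelectSingleNode(".//technologies")
    If oldTech Is Nothing Then Exit Sub

    'Build the replacement subtree in the same document
    Set newTech = oDoc.createElement("technologies")
    Set newChild = oDoc.createElement("technology")
    newChild.setAttribute "name", "new"
    newTech.appendChild newChild

    'Swap the old subtree for the new one in place
    oldTech.ParentNode.replaceChild newTech, oldTech
    Debug.Print devSet.XML
End Sub
```

Inside the recordset loop from the question, the same three steps (select by name, locate the old subtree, replaceChild) would run once per device set name.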

VBA GetElementsById Method "Object Variable Not Set"

I'm trying to select the main menu ID of this page http://greyhoundstats.co.uk/index.php (labeled "menu_wholesome") in order to get its hyperlinks later on. In the HTML document, there are two tags with this ID, a <div> and its child element <ul>, but when I search for them with the code below, I get the "object variable not set" error.
Option Explicit
Public Const MenuPage As String = "http://greyhoundstats.co.uk/index.php"
Sub BrowseMenus()
Dim XMLHTTPReq As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument
Dim MainMenuList As MSHTML.IHTMLElement
Dim aElement As MSHTML.IHTMLElementCollection
Dim ulElement As MSHTML.IHTMLUListElement
Dim liElement As MSHTML.IHTMLLIElement
XMLHTTPReq.Open "GET", MenuPage, False
XMLHTTPReq.send
HTMLDoc.body.innerText = XMLHTTPReq.responseText
Set MainMenuList = HTMLDoc.getElementById("menu_wholesome")(0) '<-- error happens here
End Sub
Does anyone know why getElementById can't find the referred ID, although it is part of the HTML document? I know that this method is supposed to return a unique ID, but I also know that when the same ID is used by other tags it will return the first one found, which should be the <div id="menu_wholesome"> part of the HTML page being requested.
Firstly: You want to set the innerHTML, not the innerText, as you intend to traverse a DOM document.
Secondly: This line
Set MainMenuList = HTMLDoc.getElementById("menu_wholesome")(0)
is incorrect. getElementById returns a single element, which you cannot index into. You index into a collection.
Please note: Both div and ul lead to the same content.
If you want to select them separately use querySelector
HTMLDoc.querySelector("div#menu_wholesome")
HTMLDoc.querySelector("ul#menu_wholesome")
The above target by tag name first then the id attribute.
If you want a collection of ids then use querySelectorAll to return a nodeList of matching items. Ids should be unique to the page but sometimes they are not!
HTMLDoc.querySelectorAll("#menu_wholesome")
You can then index into the nodeList e.g.
HTMLDoc.querySelectorAll("#menu_wholesome").item(0)
VBA:
Option Explicit
Public Const MenuPage As String = "http://greyhoundstats.co.uk/index.php"
Sub BrowseMenus()
Dim sResponse As String, HTMLDoc As New MSHTML.HTMLDocument
Dim MainMenuList As Object, div As Object, ul As Object
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", MenuPage, False
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.send
sResponse = StrConv(.responseBody, vbUnicode)
End With
sResponse = Mid$(sResponse, InStr(1, sResponse, "<!DOCTYPE "))
HTMLDoc.body.innerHTML = sResponse
Set MainMenuList = HTMLDoc.querySelectorAll("#menu_wholesome")
Debug.Print MainMenuList.Length
Set div = HTMLDoc.querySelector("div#menu_wholesome")
Set ul = HTMLDoc.querySelector("ul#menu_wholesome")
Debug.Print div.outerHTML
Debug.Print ul.outerHTML
End Sub
It is unclear what you are trying to achieve, so I have just fixed the problem you are having at the moment. .getElementById() deals with an individual element, so treating its result as a collection of elements throws that error. Compare getElementBy and getElementsBy: the s tells you which one returns a collection of elements (don't overlook it). You can only use (0) or something similar with the getElementsBy methods.
You should indent your code in the right way so that others can read it without any trouble:
Sub BrowseMenus()
Const MenuPage$ = "http://greyhoundstats.co.uk/index.php"
Dim HTTPReq As New XMLHTTP60, HTMLDoc As New HTMLDocument
Dim MainMenuList As Object
With HTTPReq
.Open "GET", MenuPage, False
.send
HTMLDoc.body.innerHTML = .responseText
End With
Set MainMenuList = HTMLDoc.getElementById("menu_wholesome")
End Sub
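Since the stated goal is to read the menu's hyperlinks later on, one possible next step is a descendant selector that collects every a element under the ul with the menu id. This is a sketch under assumptions: the markup below is invented to mimic the duplicated id, and the sub name ListMenuLinksSketch is hypothetical.

```vba
Option Explicit

'Requires a reference to Microsoft HTML Object Library
Public Sub ListMenuLinksSketch()
    Dim HTMLDoc As New MSHTML.HTMLDocument
    Dim menuLinks As Object
    Dim i As Long

    'Hypothetical markup: a div and its child ul share the same id
    HTMLDoc.body.innerHTML = _
        "<div id=""menu_wholesome""><ul id=""menu_wholesome"">" & _
        "<li><a href=""https://example.com/one"">One</a></li>" & _
        "<li><a href=""https://example.com/two"">Two</a></li>" & _
        "</ul></div>"

    'All a elements nested under the ul with the menu id
    Set menuLinks = HTMLDoc.querySelectorAll("ul#menu_wholesome a")
    For i = 0 To menuLinks.Length - 1
        Debug.Print menuLinks.Item(i).innerText, menuLinks.Item(i).href
    Next i
End Sub
```

Targeting the ul rather than the bare id avoids matching the same links through both elements that carry the id.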

Trying to get VBA to pull in data from Zillow

I am new to coding and have been trying to figure out how to extract specific data from Zillow and import it into Excel. To be honest I am pretty lost trying to figure this out. I have been looking through the forum and other online videos, but I haven't had any luck.
Here is the link to the website I am using https://www.zillow.com/new-york-ny/home-values/
I am looking to pull all the numbers into excel so I can run some calculations. If someone could help me just pull in the Zillow Home Value Index of $660,000 into excel, I feel that I can figure out the rest.
This is the code from the website
<ul class="value-info-list" id="yui_3_18_1_1_1529698944920_2626">
<li id="yui_3_18_1_1_1529698944920_2625">
<!-- TODO: need zillow logo icon here -->
<!-- <span class="zss-logo-color"><span class="zss-font-icon"></span></span> -->
<span class="value" id="yui_3_18_1_1_1529698944920_2624">
$660,000
</span>
<span class="info zsg-fineprint"> ZHVI
</span>
I tried getElementsByTagName, getElementById and getElementsByClassName. The id is confusing me, since I want to be able to enter any town into Excel and have it search Zillow for the data on the web page. All the id values are different, so if I search by id in this code it will not work for other towns. I used the class name and was able to get some of the data I was looking for.
This is the code I came up with. It pulls the $660,000 into the text box. The Range function is working and puts the text box data into Excel. This pulls a bunch of strings, from which I was able to extract the $660,000, but given the way the string is set up I'm not sure how to pull the remaining data, such as the 1-year forecast. "yr_forecast" is the cell range I want to pull that data into.
Sub SearchBot1()
'dimension (declare or set aside memory for) our variables
Dim objIE As InternetExplorer 'special object variable representing the IE browser
Dim aEle As HTMLLinkElement 'special object variable for an <a> (link) element
Dim y As Integer 'integer variable we'll use as a counter
Dim result As String 'string variable that will hold our result link
Dim Doc As HTMLDocument 'holds document object for internet explorer
'initiating a new instance of Internet Explorer and assigning it to objIE
Set objIE = New InternetExplorer
'make IE browser visible (False would allow IE to run in the background)
objIE.Visible = True
'navigate IE to this web page
objIE.navigate "https://www.zillow.com/new-york-ny/home-values/"
'wait here a few seconds while the browser is busy
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
'in the search box put Sheet2 cell B3's value, a comma and cell B4's value
objIE.document.getElementById("local-search").Value = _
Sheets("Sheet2").Range("B3").Value & ", " & Sheets("Sheet2").Range("B4").Value
'click the 'go' button
Set the_input_elements = objIE.document.getElementsByTagName("button")
For Each input_element In the_input_elements
If input_element.getAttribute("name") = "SubmitButton" Then
input_element.Click
Exit For
End If
Next input_element
'wait again for the browser
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
'price for home
Set Doc = objIE.document
Dim cclass As String
cclass = Trim(Doc.getElementsByClassName("value-info-list")(0).innerText)
MsgBox cclass
Dim aclass As Variant
aclass = Split(cclass, " ")
Range("Market_Price").Value = aclass(0)
Range("yr_forecast").Value = aclass(5)
'close the browser
objIE.Quit
End Sub
If you need any more information please let me know.
The value you want is in the first element with the class name value. You can use querySelector to apply a CSS selector of .value, where "." is the selector for class, to get this value.
Option Explicit
Public Sub GetInfo()
Dim html As New MSHTML.HTMLDocument
Const URL As String = "https://www.zillow.com/new-york-ny/home-values/"
html.body.innerHTML = GetHTML(URL)
Debug.Print html.querySelector(".value").innerText
End Sub
Public Function GetHTML(ByVal URL As String) As String
Dim sResponse As String
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", URL, False
.send
sResponse = StrConv(.responseBody, vbUnicode)
End With
GetHTML = Mid$(sResponse, InStr(1, sResponse, "<!DOCTYPE "))
End Function
You could also use:
Debug.Print html.getElementsByClassName("value")(0).innerText
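Because the scraped text arrives as a string such as "$660,000", possibly with surrounding whitespace and line breaks, it may help to convert it to a number before running calculations. The helper below is a sketch, not part of the original answer: the function name ParseDollarAmount is hypothetical, and it assumes whole-dollar amounts with "$" and "," as the only decoration.

```vba
Option Explicit

'Turns scraped text such as "$660,000" into a number usable in calculations
Public Function ParseDollarAmount(ByVal rawText As String) As Currency
    Dim cleaned As String
    'Drop line breaks that surround the value in the page source
    cleaned = Replace(Replace(rawText, vbCr, ""), vbLf, "")
    'Drop the currency symbol and thousands separators
    cleaned = Replace(Replace(cleaned, "$", ""), ",", "")
    ParseDollarAmount = CCur(Trim$(cleaned))
End Function

Public Sub ParseDollarAmountDemo()
    Debug.Print ParseDollarAmount(" $660,000 ")
End Sub
```

The result could then be written to the named range from the question, for example Range("Market_Price").Value = ParseDollarAmount(html.querySelector(".value").innerText).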

Extract list of all input boxes on webpage vba

I want to create a list on Excel of all the labels of input boxes on a webpage- so I imagine the code would be something like:
Sub IEInteract()
Dim i As Long
Dim URL As String
Dim IE As Object
Dim objCollection As Object
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
URL = "mywebsite.com"
IE.Navigate URL
Do While IE.ReadyState = 4: DoEvents: Loop
Do Until IE.ReadyState = 4: DoEvents: Loop
objCollection = IE.Document.getElementsByTagName("input")
For Each el In objCollection
label = el.label 'or something like that????'
Debug.Print label
Next el
End Sub
Where am I going wrong? Thanks
BTW My VBA is OK, but my HTML is non-existent.
For learning purposes maybe choose a website that has more obvious inputboxes, rather than dropdowns.
Many inputboxes won't be pre-populated so maybe consider reading other properties of the retrieved elements. Or even writing to them and then retrieving those values.
Selecting by tag name can bring back a host of items that you might not have expected.
Bearing all of the above in mind, try running the following, which generates a collection of <input> tag elements.
Code:
Option Explicit
Public Sub PrintTagInfo()
'Tools > references > Microsoft XML and HTML Object library
Dim http As New XMLHTTP60 '<== this will be specific to your excel version
Dim html As New HTMLDocument
With http
.Open "GET", "https://www.mrexcel.com/forum/register.php", False
.send
html.body.innerHTML = .responseText
End With
Dim inputBoxes As MSHTML.IHTMLElementCollection, iBox As MSHTML.IHTMLElement, i As Long
Set inputBoxes = html.getElementsByTagName("input") '<== the collection of input tags on the page
'<== These are input boxes i.e. you are putting info into them so perhaps populate and then try to read what is in the entry box?
For Each iBox In inputBoxes
Debug.Print "Result #" & i + 1
Debug.Print vbNewLine
Debug.Print "ID: " & iBox.ID '<== select a sample of properties to print out as some maybe empty
Debug.Print "ClassName: " & iBox.className,
Debug.Print "Title: " & iBox.Title
Debug.Print String$(20, Chr$(61))
Debug.Print vbNewLine
i = i + 1
Next iBox
End Sub
Sample output:
From the above, it looks like class name might be in some ways more informative if you are looking to target boxes to input information into.
An initial inspection of the page source, selecting an inputbox and right-click > inspect... will help you refine your choices.
I noticed that a lot of the boxes of interest had the input tag and then type="text".
This means you can target elements matching this pattern using CSS selectors. In this case, use the selector input[type="text"] (the inner quotes are doubled when it is written inside a VBA string).
Adjusting the former code to factor this in gives a smaller set of more targeted results. Note, using .querySelectorAll, to apply the CSS selector, returns a NodeList object which requires a different method of iterating over. A For Each Loop will cause Excel to crash as described here.
Code:
Option Explicit
Public Sub PrintTagInfo()
'Tools > references > Microsoft XML and HTML Object library
Dim http As New XMLHTTP60 '<== this will be specific to your excel version
Dim html As New HTMLDocument
With http
.Open "GET", "https://www.mrexcel.com/forum/register.php", False
.send
html.body.innerHTML = .responseText
End With
Dim inputBoxes As Object, i As Long
Set inputBoxes = html.querySelectorAll("input[type=""text""]") '<== the collection of text input boxes on page. Returned as a NodeList
'<== These are input boxes i.e. you are putting info into them so perhaps populate and then try to read what is in the entry box?
For i = 0 To inputBoxes.Length - 1
Debug.Print "Result #" & i + 1
Debug.Print vbNewLine
Debug.Print "ID: " & inputBoxes.Item(i).ID '<== select a sample of properties to print out as some maybe empty
Debug.Print "ClassName: " & inputBoxes.Item(i).className,
Debug.Print "Title: " & inputBoxes.Item(i).Title
Debug.Print String$(20, Chr$(61))
Debug.Print vbNewLine
Next i
End Sub
References added via VBE > Tools > References: the Microsoft HTML Object Library and Microsoft XML.
The XML reference is version specific, so you will need to change XMLHTTP60, which is for XML 6.0, to target your version if you are not using Excel 2016.

Error when changing IE automation code to XML

I recently started working with XML automation and after changing some basic IE automation code over, I seem to be getting an error. Here's the HTML:
<tbody>
<tr class="group-2 first">
<td class="date-col">
<a href="/stats/matches/mapstatsid/48606/teamone-vs-merciless">
<div class="time" data-time-format="d/M/yy" data-unix="1498593600000">27/6/17</div>
</a>
</td>
......SOME MORE HTML HERE......
</tr>
......SOME MORE HTML HERE......
</tbody>
And here's the code I'm using in Excel VBA:
Sub readData()
Dim XMLPage As New MSXML2.XMLHTTP60
Dim html As New MSHTML.HTMLDocument
XMLPage.Open "GET", "https://www.hltv.org/stats/matches", False
XMLPage.send
If XMLPage.Status <> 200 Then MsgBox XMLPage.statusText
html.body.innerHTML = XMLPage.responseText
For Each profile In html.getElementsByTagName("tbody")(0).Children
Debug.Print profile.getElementsByClassName("date-col")(0).getElementsByTagName("a")(0).getAttribute("href") 'Run time error '438' here
Next
End Sub
I'm getting run-time error '438' at the Debug.Print line. It seems to be happening when getting the class, but I'm unsure why. It works fine if I use this, for example:
Debug.Print profile.innertext
Worked for me:
Sub readData()
Dim XMLPage As New MSXML2.XMLHTTP60
Dim html As New MSHTML.HTMLDocument, links, a, i
XMLPage.Open "GET", "https://www.hltv.org/stats/matches", False
XMLPage.send
If XMLPage.Status <> 200 Then MsgBox XMLPage.statusText
html.body.innerHTML = XMLPage.responseText
Set links = html.querySelectorAll("td.date-col > a")
Debug.Print links.Length
For i = 0 To links.Length - 1
Debug.Print links(i).href
Next
Set links = Nothing
Set html = Nothing
End Sub
FYI when I used For Each to loop over the links collection Excel would reliably crash, so I'd stay with the loop shown
profile refers to a row, and profile.cells(0) will refer to the first column in that row. So try...
profile.cells(0).getElementsByTagName("a")(0).getAttribute("href")
Also, profile should be declared as HTMLTableRow.
The URL you are using isn't serving valid XML, but it's recoverable with some simple regex replacements. Once we have some valid XML, we can load that into a DOM document and use XPath to select the nodes as required:
Option Explicit
'Add references to:
' - MSXML v3
' - Microsoft VBScript Regular Expressions 5.5
Sub test()
Const START_MARKER As String = "<table class=""stats-table matches-table"">"
Const END_MARKER As String = "</table>"
With New MSXML2.XMLHTTP
.Open "GET", "https://www.hltv.org/stats/matches", False
.send
If .Status = 200 Then
'The HTML isn't valid XHTML, so we can't just use the request's responseXML DOMDocument
'Let's extract the HTML table
Dim tableStart As Long
tableStart = InStr(.responseText, START_MARKER)
Dim tableEnd As Long
tableEnd = InStr(tableStart, .responseText, END_MARKER)
Dim tableHTML As String
tableHTML = Mid$(.responseText, tableStart, tableEnd - tableStart + Len(END_MARKER))
'The HTML table has invalid img tags (let's add a closing tag with some regex)
With New RegExp
.Global = True
.Pattern = "(\<img [\W\w]*?)"">"
Dim tableXML As String
tableXML = .Replace(tableHTML, "$1"" />")
End With
'And load an XML document from the cleaned up HTML fragment
Dim doc As MSXML2.DOMDocument
Set doc = New MSXML2.DOMDocument
doc.LoadXML tableXML
End If
End With
If Not doc Is Nothing Then
'Use XPath to select the nodes we need
Dim nodes As MSXML2.IXMLDOMSelection
Set nodes = doc.SelectNodes("//td[@class='date-col']/a/@href")
'Enumerate the URLs
Dim node As IXMLDOMAttribute
For Each node In nodes
Debug.Print node.nodeTypedValue
Next node
End If
End Sub
Output:
/stats/matches/mapstatsid/48606/teamone-vs-merciless
/stats/matches/mapstatsid/48607/merciless-vs-teamone
/stats/matches/mapstatsid/48608/merciless-vs-teamone
/stats/matches/mapstatsid/48600/wysix-vs-fnatic-academy
/stats/matches/mapstatsid/48602/skitlite-vs-nexus
/stats/matches/mapstatsid/48604/extatus-vs-forcebuy
/stats/matches/mapstatsid/48605/extatus-vs-forcebuy
/stats/matches/mapstatsid/48599/planetkey-vs-gatekeepers
/stats/matches/mapstatsid/48603/gatekeepers-vs-planetkey
/stats/matches/mapstatsid/48595/wysix-vs-gambit
/stats/matches/mapstatsid/48596/kinguin-vs-playing-ducks
/stats/matches/mapstatsid/48597/spirit-academy-vs-tgfirestorm
/stats/matches/mapstatsid/48601/spirit-academy-vs-tgfirestorm
/stats/matches/mapstatsid/48593/fnatic-academy-vs-gambit
/stats/matches/mapstatsid/48594/alternate-attax-vs-nexus
/stats/matches/mapstatsid/48590/pro100-vs-playing-ducks
/stats/matches/mapstatsid/48583/extatus-vs-ex-fury
/stats/matches/mapstatsid/48589/extatus-vs-ex-fury
/stats/matches/mapstatsid/48584/onlinerol-vs-forcebuy
/stats/matches/mapstatsid/48591/forcebuy-vs-onlinerol
/stats/matches/mapstatsid/48581/epg-vs-veni-vidi-vici
/stats/matches/mapstatsid/48588/epg-vs-veni-vidi-vici
/stats/matches/mapstatsid/48592/veni-vidi-vici-vs-epg
/stats/matches/mapstatsid/48582/log-vs-gatekeepers
/stats/matches/mapstatsid/48586/gatekeepers-vs-log
/stats/matches/mapstatsid/48580/spraynpray-vs-epg
/stats/matches/mapstatsid/48579/quantum-bellator-fire-vs-spraynpray
/stats/matches/mapstatsid/48571/noxide-vs-masterminds
/stats/matches/mapstatsid/48572/athletico-vs-legacy
/stats/matches/mapstatsid/48578/node-vs-avant
/stats/matches/mapstatsid/48573/funky-monkeys-vs-grayhound
/stats/matches/mapstatsid/48574/grayhound-vs-funky-monkeys
/stats/matches/mapstatsid/48575/hegemonyperson-vs-eclipseo
/stats/matches/mapstatsid/48577/eclipseo-vs-hegemonyperson
/stats/matches/mapstatsid/48566/masterminds-vs-tainted-black
/stats/matches/mapstatsid/48562/grayhound-vs-legacy
/stats/matches/mapstatsid/48563/noxide-vs-riotous-raccoons
/stats/matches/mapstatsid/48564/avant-vs-dark-sided
/stats/matches/mapstatsid/48565/avant-vs-dark-sided
/stats/matches/mapstatsid/48567/eclipseo-vs-uya
/stats/matches/mapstatsid/48568/uya-vs-eclipseo
/stats/matches/mapstatsid/48560/uya-vs-new4
/stats/matches/mapstatsid/48561/new4-vs-uya
/stats/matches/mapstatsid/48559/jaguar-sa-vs-miami-flamingos
/stats/matches/mapstatsid/48558/spartak-vs-binary-dragons
/stats/matches/mapstatsid/48557/kungar-vs-spartak
/stats/matches/mapstatsid/48556/igamecom-vs-fragsters
/stats/matches/mapstatsid/48554/nordic-warthogs-vs-aligon
/stats/matches/mapstatsid/48555/binary-dragons-vs-kungar
/stats/matches/mapstatsid/48550/havu-vs-rogue-academy
Looking at the MSHTML.HTMLDocument reference there is no method getElementsByClassName.
You will need to loop through each row in the tbody you are selecting, get the first td in that row, then get the first link in that td and read the href attribute from it. You could alternatively compare the class attribute of the td, but since it is the first element in the row there is no need to do that.
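The row-by-row approach described above could look roughly like this. It is a sketch only: the sub name PrintFirstCellLinks and the inline markup (modelled on the snippet in the question, with an invented URL) are assumptions, and the row variable is declared as HTMLTableRow as suggested in the other answer.

```vba
Option Explicit

'Requires a reference to Microsoft HTML Object Library
Public Sub PrintFirstCellLinks()
    Dim html As New MSHTML.HTMLDocument
    Dim profile As MSHTML.HTMLTableRow
    Dim links As Object

    'Hypothetical markup modelled on the snippet in the question
    html.body.innerHTML = _
        "<table><tbody><tr class=""group-2 first""><td class=""date-col"">" & _
        "<a href=""https://example.com/stats/matches/1"">27/6/17</a>" & _
        "</td></tr></tbody></table>"

    For Each profile In html.getElementsByTagName("tbody")(0).Children
        'First cell of the row, then the links inside that cell
        Set links = profile.Cells(0).getElementsByTagName("a")
        If links.Length > 0 Then
            Debug.Print links(0).getAttribute("href")
        End If
    Next profile
End Sub
```

Checking links.Length before indexing keeps the loop from failing on any row whose first cell has no link.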