I am trying to convert some of my working VBA code to VBScript, but keep getting errors in VBScript when trying to use the getElementsByClassName method. Here's the full code:
option explicit
Dim XMLPage, html
Set XMLPage = WScript.CreateObject("MSXML2.XMLHTTP")
Set html= CreateObject("HTMLFile")
XMLPage.Open "GET", "https://www.hltv.org/stats/matches/mapstatsid/48745/immortals-vs-dignitas", False
Wscript.Sleep 50
XMLPage.send
Wscript.Sleep 50
If XMLPage.Status <> 200 Then MsgBox XMLPage.statusText
html.Open
html.write XMLPage.responseText
html.Close
'msgbox html.getElementsByTagName("tbody")(0).innertext'WORKS
msgbox html.getElementsByClassName("match-info-box-con")(0).innertext'DOESNT WORK
The last line of code is where the following error occurs:
If I comment that out and run it to search for a tag name instead (code on the line above) - it works fine no problem.
I suspect it has something to do with how the variable html is declared, as from what I understand. getElementsByClassName comes from IHTMLElement6 - but I am unsure on how to get this to work in VBScript.
MSHTML behaves differently depending on how it was instantiated - it exposes different interfaces depending on whether or not its early or late bound (its heavily reliant on IDispatch).
You are late binding and no interface exposing getElementsByClassName is available.
You can loop over document.all() and look at each item.className.
I used similar code to retrieve data from a POST request.
getElementsByClassName worked only if preceded with another command, like "msgbox 1" or anything to halt the script for a fraction of a second.
Then I tried Wscript.Sleep 200, decreasing it to the smallest possible number, and it still worked.
Wscript.Sleep 1 ' This line got it working.
msgbox html.getElementsByClassName("match-info-box-con")(0).innertext'
I took some time to elaborate a working example.
Thanks to GSerg for pointing out the load delay. That is accurate.
Had to tweak the code a little to get it working though.
Based on the previous comments, maybe the behaviour of MSHTML depends on the code being parsed. Hence the aditional meta tag below.
Set objHTTP = CreateObject("MSXML2.XMLHTTP")
Dim htmldoc: Set htmldoc = CreateObject("htmlfile")
'
URL = "https://stackoverflow.com/questions/44853941/vbscript-getelementsbyclassname-not-supported"
' sEnv = ""
'
objHTTP.Open "GET", URL, False
' objHTTP.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
objHTTP.send ' (sEnv)
ttext = objHTTP.responsetext
ttext = "<meta http-equiv=""X-UA-Compatible"" content=""IE=EDGE,chrome=1"" />" & vbnewline & ttext
htmldoc.write ttext
htmldoc.close
htmldoc.designMode = "on" ' Refer to https://developpaper.com/method-of-parsing-html-documents-by-vbs-htmlfile/
WScript.ConnectObject htmldoc, "htmldoc_"
Sub htmldoc_onreadystatechange()
If htmldoc.readyState = "interactive" Then
ttext = htmldoc.getElementsByClassName("fs-headline1").Item(0).innerText
msgbox ttext
Wscript.quit
End If
End Sub
'-----------------
Wscript.Sleep 10000 ' Random timeout
msgbox "Timeout!"
Related
running following code i get a .readystate = 1, i do not understand why. Is anyone able to help me ? As well and more globally, i am looking for some documentation on using MSXML2.ServerXMLHTTP or XMLHTTP60, is there a website for noobs on this topic ?*
thanks in advance !
Sub a()
Dim wwwPage As Object
Dim Email As String
DoEvents
On Error Resume Next
Set wwwPage = CreateObject("MSXML2.ServerXMLHTTP")
With wwwPage
.Open "GET", "https://runninggeek.be/annuaire-des-clubs/", False
.send
Position = 1
If .Status = 200 And .readyState = 4 Then
Email = "xxx"
End If
End With
End Sub
Am expecting .readystate 4. I see sometimes we need to use headers but i do not know why nor how, nor even how i can find what header to use
I need some help to download the stock table located in this URL:
I’ve tried with the code below to at least grab the first line, but what in the inspector is showed as :
<a target=”_blank”href=”/equities/apple-computer-inc” title=Apple Inc”>Apple</a>
I can only see:
A title={fullName} href="about:{pairLink}" target=_blank>{pairName}
This is the code I've put together:
Sub table()
Dim XMLReq As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument
Dim Tables As MSHTML.IHTMLElementCollection
Dim table As MSHTML.IHTMLElement
Dim TableRow As MSHTML.IHTMLElement
XMLReq.Open "GET", "https://es.investing.com/stock-screener/?sp=country::5|sector::a|industry::a|equityType::a|exchange::a|eq_market_cap::110630000,1990000000000%3Ceq_market_cap;2", False
XMLReq.send
If XMLReq.Status <> 200 Then
MsgBox "problem" & vbNewLine & XMLReq.Status & "- " & XMLReq.statusText
Exit Sub
End If
HTMLDoc.body.innerHTML = XMLReq.responseText
Set Tables = HTMLDoc.getElementsByTagName("Table")
For Each table In Tables
If table.className = "displayNone genTbl openTbl resultsStockScreenerTbl elpTbl " Then
For Each TableRow In table.getElementsByTagName("td")
Debug.Print TableRow.innerHTML
Next
End If
Next table
End Sub
Any help will be appreciated.
It looks like the actual data that fills the table is pulled from JSON from another request that some javascript or something runs on the page.
This might make it easier to parse the response with a json parser but it might be difficult to compose the correct request to get the data you want. The owners of the website might not want you do do this so they might not make it easy.
It looks like a POST request with a bunch of parameters and also a cookie sent along. So basically you would need to re-create this POST request by adding all of the correct parameters and the correct cookie in the header. I would get a web debugging program like fiddler (shown above) to look and see what is going on.
I was going to also suggest you check and see if that website provides an API but it looks like it doesn't?
EDIT:
I was actually able to get the JSON with the data you want by pretty much just copying the request used on the site:
Sub getdata()
Dim XMLReq As New MSXML2.XMLHTTP60
XMLReq.Open "POST", "https://es.investing.com/stock-screener/Service/SearchStocks", False
XMLReq.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
XMLReq.setRequestHeader "Accept", "application/json"
XMLReq.setRequestHeader "X-Requested-With", "XMLHttpRequest"
XMLReq.send "country%5B%5D=5&exchange%5B%5D=95&exchange%5B%5D=2&exchange%5B%5D=1§or=5%2C12%2C3%2C8%2C9%2C1%2C7%2C6%2C2%2C11%2C4%2C10&industry=74%2C56%2C73%2C29%2C25%2C4%2C47%2C12%2C8%2C44%2C52%2C45%2C71%2C99%2C65%2C70%2C98%2C40%2C39%2C42%2C92%2C101%2C6%2C30%2C59%2C77%2C100%2C9%2C50%2C46%2C88%2C94%2C62%2C75%2C14%2C51%2C93%2C96%2C34%2C55%2C57%2C76%2C66%2C5%2C3%2C41%2C87%2C67%2C85%2C16%2C90%2C53%2C32%2C27%2C48%2C24%2C20%2C54%2C33%2C19%2C95%2C18%2C22%2C60%2C17%2C11%2C35%2C31%2C43%2C97%2C81%2C69%2C102%2C72%2C36%2C78%2C10%2C86%2C7%2C21%2C2%2C13%2C84%2C1%2C23%2C79%2C58%2C49%2C38%2C89%2C63%2C64%2C80%2C37%2C28%2C82%2C91%2C61%2C26%2C15%2C83%2C68&equityType=ORD%2CDRC%2CPreferred%2CUnit%2CClosedEnd%2CREIT%2CELKS%2COpenEnd%2CRight%2CParticipationShare%2CCapitalSecurity%2CPerpetualCapitalSecurity%2CGuaranteeCertificate%2CIGC%2CWarrant%2CSeniorNote%2CDebenture%2CETF%2CADR%2CETC%2CETN&eq_market_cap%5Bmin%5D=110630000&eq_market_cap%5Bmax%5D=1990000000000&pn=1&order%5Bcol%5D=eq_market_cap&order%5Bdir%5D=d"
If XMLReq.Status <> 200 Then
MsgBox "problem" & vbNewLine & XMLReq.Status & "- " & XMLReq.statusText
Exit Sub
End If
Debug.Print XMLReq.responseText
End Sub
So now you will just need to figure out how to parse the JSON response.
I want to create a list on Excel of all the labels of input boxes on a webpage- so I imagine the code would be something like:
Sub IEInteract()
Dim i As Long
Dim URL As String
Dim IE As Object
Dim objCollection As Object
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
URL = "mywebsite.com"
IE.Navigate URL
Do While IE.ReadyState = 4: DoEvents: Loop
Do Until IE.ReadyState = 4: DoEvents: Loop
objCollection = IE.Document.getElementsByTagName("input")
For Each el In objCollection
label = el.label 'or something like that????'
Debug.Print label
Next el
End Sub
Where am I going wrong? Thanks
BTW My VBA is OK, but my HTML is non-existent.
For learning purposes maybe choose a website that has more obvious inputboxes, rather than dropdowns.
Many inputboxes won't be pre-populated so maybe consider reading other properties of the retrieved elements. Or even writing to them and then retrieving those values.
Selecting by tag name can bring back a host of items that you might not have expected.
Bearing all of the above in mind. Try running the following, which generates a collection of <input> tag elements.
Code:
Option Explicit
Public Sub PrintTagInfo()
'Tools > references > Microsoft XML and HTML Object library
Dim http As New XMLHTTP60 '<== this will be specific to your excel version
Dim html As New HTMLDocument
With http
.Open "GET", "https://www.mrexcel.com/forum/register.php", False
.send
html.body.innerHTML = .responseText
End With
Dim inputBoxes As MSHTML.IHTMLElementCollection, iBox As MSHTML.IHTMLElement, i As Long
Set inputBoxes = html.getElementsByTagName("input") '<== the collection of input tags on the page
'<== These are input boxes i.e. you are putting info into them so perhaps populate and then try to read what is in the entry box?
For Each iBox In inputBoxes
Debug.Print "Result #" & i + 1
Debug.Print vbNewLine
Debug.Print "ID: " & iBox.ID '<== select a sample of properties to print out as some maybe empty
Debug.Print "ClassName: " & iBox.className,
Debug.Print "Title: " & iBox.Title
Debug.Print String$(20, Chr$(61))
Debug.Print vbNewLine
i = i + 1
Next iBox
End Sub
Sample output:
From the above, it looks like class name might be in some ways more informative if you are looking to target boxes to input information into.
An initial inspection of the page source, selecting an inputbox and right-click > inspect... will help you refine your choices.
I noticed that a lot of the boxes of interest had the Input tag and then type = "text"
This means you can target elements matching this pattern using CSS selectors. In this case using the selector input[type=""text""].
Adjusting the former code to factor this in gives a smaller set of more targeted results. Note, using .querySelectorAll, to apply the CSS selector, returns a NodeList object which requires a different method of iterating over. A For Each Loop will cause Excel to crash as described here.
Code:
Option Explicit
Public Sub PrintTagInfo()
'Tools > references > Microsoft XML and HTML Object library
Dim http As New XMLHTTP60 '<== this will be specific to your excel version
Dim html As New HTMLDocument
With http
.Open "GET", "https://www.mrexcel.com/forum/register.php", False
.send
html.body.innerHTML = .responseText
End With
Dim inputBoxes As Object, i As Long
Set inputBoxes = html.querySelectorAll("input[type=""text""]") '<== the collection of text input boxes on page. Returned as a NodeList
'<== These are input boxes i.e. you are putting info into them so perhaps populate and then try to read what is in the entry box?
For i = 0 To inputBoxes.Length - 1
Debug.Print "Result #" & i + 1
Debug.Print vbNewLine
Debug.Print "ID: " & inputBoxes.Item(i).ID '<== select a sample of properties to print out as some maybe empty
Debug.Print "ClassName: " & inputBoxes.Item(i).className,
Debug.Print "Title: " & inputBoxes.Item(i).Title
Debug.Print String$(20, Chr$(61))
Debug.Print vbNewLine
Next i
End Sub
Sample results:
Note: I have edited the spacing to fit more into the image.
References added via VBE > Tools > References
Last two are those of interest. The bottom one will be version specific and you will need to re-write XMLHTTP60 which is for XML 6.0 to target your version of Excel if not using Excel 2016.
Running my code for a site to crawl the titles of different tutorials spreading across several pages, I found it working flawless. I tried to write some code not depending on the last page number the url has but on the status code until it shows http.status<>200. The code I'm pasting below is working impeccably in this case. However, Trouble comes up when I try to use another url to see whether it breaks automatically but found that the code did fetch all the results but did not break. What is the workaround in this case so that the code will break when it is done and stop the macro? Here is the working one?
Sub WiseOwl()
Const mlink = "http://www.wiseowl.co.uk/videos/default"
Dim http As New XMLHTTP60, html As New HTMLDocument
Dim post As Object
Do While True
y = y + 1
With http
.Open "GET", mlink & "-" & y & ".htm", False
.send
If .Status <> 200 Then
MsgBox "It's done"
Exit Sub
End If
html.body.innerHTML = .responseText
End With
For Each post In html.getElementsByClassName("woVideoListDefaultSeriesTitle")
With post.getElementsByTagName("a")
x = x + 1
If .Length Then Cells(x, 1) = .item(0).innerText
End With
Next post
Loop
End Sub
I found a logic to get around with yellowpage. My update script is able to parse yellowpage but breaks before scraping the last page because there is no "Next Page" button. I tried with this:
"https://www.dropbox.com/s/iptqm79b0byw3dz/Yellowpage.txt?dl=0"
However, the same logic I tried to apply with torrent site but it doesn't work here:
"https://www.yify-torrent.org/genres/western/p-1/"
You can always rely on elements if they exits or not. Here for example, if you try to use the object which you have set your element to, you will get:
Run-time error '91': Object variable or With block variable not set
This is the key you should be looking for to put an end to your code. Please see the below example:
Sub yify()
Const mlink = "https://www.yify-torrent.org/genres/western/p-"
Dim http As New XMLHTTP60, html As New HTMLDocument
Dim post As Object
Dim posts As Object
y = 1
Do
With http
.Open "GET", mlink & y & "/", False
.send
html.body.innerHTML = .responseText
End With
Set posts = html.getElementsByClassName("mv")
On Error GoTo Endofpage
Debug.Print Len(posts) 'to force Error 91
For Each post In posts
With post.getElementsByTagName("div")
x = x + 1
If .Length Then Cells(x, 1) = .Item(0).innerText
End With
Next post
y = y + 1
Endofpage:
Loop Until Err.Number = 91
Debug.Print "It's over"
End Sub
I have the following function which returns HTML document for the URL passed. I am using the returned HTML Doc in some other function.
The function works perfectly on Windows 7 but NOT on windows 8 unfortunately. How can I write code which works on Windows 7 and 8 both? I think I need to use a different version of XML HTTP object.
Function GetHtmlDoc(ByVal URL As String) As Object
Dim msg As String
' Reset the Global variables.
PageSrc = ""
Set htmlDoc = Nothing
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", URL, True
.Send
While .readyState <> 4: DoEvents: Wend
a = .statusText
' Check for any server errors.
If .statusText <> "OK" Then
' Check for WinHTTP error.
'msg = GetWinHttpErrorMsg(.Status)
'If msg = "" Then msg = .statusText
' Return the error number and message.
GetPageSource = "ERROR: " & .Status & " - " & msg
Exit Function
End If
' Save the HTML code in the Global variable.
PageSrc = .responseText
End With
' Create an empty HTML Document.
Set htmlDoc = CreateObject("htmlfile")
htmlDoc.Open URL:="text/html", Replace:=False
' Convert the HTML code into an HTML Document Object.
htmlDoc.write PageSrc
' Terminate Input-Output with the Global variable.
htmlDoc.Close
' Return the HTML text of the web page.
Set GetHtmlDoc = htmlDoc
End Function
Example function call:
Set htmlDoc = GetHtmlDoc("http://www.censusdata.abs.gov.au/census_services/getproduct/census/2011/quickstat/POA2155?opendocument&navpos=220")
XMLHTTP no longer likes accessing remote servers from local scripts, switch to ServerXMLHTTP:
With CreateObject("MSXML2.ServerXMLHTTP")
.Open "GET", URL, False
(Using False performs the operation synchronously ngating the need for the readyState loop.)
I had same error turns out my code was working fine, just firewall blocked Excel so it failed.