Parse HTML content in VBA - vba

I have a question about HTML parsing. I have a website with some products, and I would like to pull text from each product page into my current spreadsheet. The spreadsheet is quite big; it contains the ItemNbr in the 3rd column, I want the text in the 14th column, and each row corresponds to one product (item).
My idea is to fetch the 'Material' value on the webpage, which is the inner text of the cell that follows the "Material" cell. The id number changes from one page to another (sometimes ).
Here is the structure of the website:
<div style="position:relative;">
<div></div>
<table id="list-table" width="100%" tabindex="1" cellspacing="0" cellpadding="0" border="0" role="grid" aria-multiselectable="false" aria-labelledby="gbox_list-table" class="ui-jqgrid-btable" style="width: 930px;">
<tbody>
<tr class="jqgfirstrow" role="row" style="height:auto">
<td ...</td>
<td ...</td>
</tr>
<tr role="row" id="1" tabindex="-1" class="ui-widget-content jqgrow ui-row-ltr">
<td ...</td>
<td ...</td>
</tr>
<tr role="row" id="2" tabindex="-1" class="ui-widget-content jqgrow ui-row-ltr">
<td ...</td>
<td ...</td>
</tr>
<tr role="row" id="3" tabindex="-1" class="ui-widget-content jqgrow ui-row-ltr">
<td ...</td>
<td ...</td>
</tr>
<tr role="row" id="4" tabindex="-1" class="ui-widget-content jqgrow ui-row-ltr">
<td ...</td>
<td ...</td>
</tr>
<tr role="row" id="5" tabindex="-1" class="ui-widget-content jqgrow ui-row-ltr">
<td ...</td>
<td ...</td>
</tr>
<tr role="row" id="6" tabindex="-1" class="ui-widget-content jqgrow ui-row-ltr">
<td ...</td>
<td ...</td>
</tr>
<tr role="row" id="7" tabindex="-1" class="ui-widget-content jqgrow ui-row-ltr">
<td role="gridcell" style="padding-left:10px" title="Material" aria-describedby="list-table_">Material</td>
<td role="gridcell" style="" title="600D polyester." aria-describedby="list-table_">600D polyester.</td>
</tr>
<tr ...>
</tr>
</tbody>
</table> </div>
I would like to get "600D Polyester" as a result.
My (non-working) code snippet is as follows:
Sub ParseMaterial()
Dim Cell As Integer
Dim ItemNbr As String
Dim AElement As Object
Dim AElements As IHTMLElementCollection
Dim IE As MSXML2.XMLHTTP60
Set IE = New MSXML2.XMLHTTP60
Dim HTMLDoc As MSHTML.HTMLDocument
Dim HTMLBody As MSHTML.HTMLBody
Set HTMLDoc = New MSHTML.HTMLDocument
Set HTMLBody = HTMLDoc.body
For Cell = 1 To 5 'I iterate through the file row by row
ItemNbr = Cells(Cell, 3).Value 'ItemNbr is in the 3rd column of my spreadsheet
IE.Open "GET", "http://www.example.com/?item=" & ItemNbr, False
IE.send
While IE.ReadyState <> 4
DoEvents
Wend
HTMLBody.innerHTML = IE.responseText
Set AElements = HTMLDoc.getElementById("list-table").getElementsByTagName("tr")
For Each AElement In AElements
If AElement.Title = "Material" Then
Cells(Cell, 14) = AElement.nextNode.value 'I write the material in the 14th column
End If
Next AElement
Application.Wait (Now + TimeValue("0:00:2"))
Next Cell
End Sub
Thanks for your help !

Just a couple of things that will hopefully get you going in the right direction:
clean up a bit: remove the readyState testing loop. The value returned by the readyState property will never change in this context: because the request is synchronous, the code pauses after the send instruction and resumes only once the server response has been received or the request has failed. The readyState property is set accordingly by the time execution resumes. You should still test the ready state, but the loop is unnecessary.
target the right HTML elements: you are searching through the tr elements, while the way you use these elements in your code actually points to td elements.
make sure the properties are actually available on the objects you are using them with: to help with this, declare all your variables as specific object types instead of the generic Object. This activates IntelliSense. If you have a hard time finding the actual name of an object's type as defined in the relevant library in the first place, declare it as the generic Object, run your code, and then inspect the type of the object, for instance by printing TypeName(your_object) to the Immediate window. This should put you on your way.
I have also included some code below that may help. If you still can't get this to work and you can share your URLs, please do.
Sub getInfoWeb()
    Dim cell As Long
    Dim ItemNbr As String
    Dim xhr As MSXML2.XMLHTTP60
    Dim doc As MSHTML.HTMLDocument
    Dim table As MSHTML.HTMLTable
    Dim tableCells As MSHTML.IHTMLElementCollection
    Dim tableCell As Object
    Set xhr = New MSXML2.XMLHTTP60
    For cell = 1 To 5
        ItemNbr = Cells(cell, 3).Value
        Set doc = Nothing 'forget the page from the previous iteration
        With xhr
            .Open "GET", "http://www.example.com/?item=" & ItemNbr, False
            .send
            If .readyState = 4 And .Status = 200 Then
                Set doc = New MSHTML.HTMLDocument
                doc.body.innerHTML = .responseText
            Else
                MsgBox "Error" & vbNewLine & "Ready state: " & .readyState & _
                    vbNewLine & "HTTP request status: " & .Status
            End If
        End With
        'Only parse when the request succeeded
        If Not doc Is Nothing Then
            Set table = doc.getElementById("list-table")
            If Not table Is Nothing Then
                Set tableCells = table.getElementsByTagName("td")
                For Each tableCell In tableCells
                    If tableCell.getAttribute("title") = "Material" Then
                        'The value sits in the cell right after the "Material" cell
                        Cells(cell, 14).Value = tableCell.NextSibling.innerHTML
                    End If
                Next tableCell
            End If
        End If
    Next cell
End Sub
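The TypeName tip above can be sketched as follows. This is only an illustration: it assumes a Windows host where the "htmlfile" COM class is available, the sample markup is made up, and PrintElementTypes is a hypothetical name.

```vba
' Minimal sketch: discover the concrete type of late-bound objects
' so they can later be declared with a specific type.
' Assumption: the "htmlfile" COM class is available on this machine.
Sub PrintElementTypes()
    Dim doc As Object   ' late bound on purpose
    Dim el As Object

    Set doc = CreateObject("htmlfile")
    ' Made-up sample markup, loosely modelled on the table in the question
    doc.body.innerHTML = "<table id=""list-table""><tr>" & _
        "<td title=""Material"">Material</td><td>600D polyester.</td>" & _
        "</tr></table>"

    Set el = doc.getElementById("list-table")

    ' The printed names are candidates for the type in a Dim statement
    Debug.Print TypeName(doc)
    Debug.Print TypeName(el)
End Sub
```

The names appear in the Immediate window (Ctrl+G in the VBA editor); once known, the variables can be re-declared with the matching type from the Microsoft HTML Object Library.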
EDIT: as a follow-up to the further information you provided in the comment below - and the additional comments I have added
'Determine your product number
'Open an xhr for your source url, and retrieve the product number from there - search for the tag whose
'text includes the "productnummer:" substring, and extract the product number from the outerstring
'OR
'if the product number consistently consists of the fctkeywords you are entering in your source url
'with two "0" appended - just build the product number like that
'Open a new xhr for this url "http://www.pfconcept.com/cgi-bin/wspd_pcdb_cgi.sh/y/y2productspec-ajax.p?itemc=" & product_number & "&_search=false&rows=-1&page=1&sidx=&sord=asc"
'Load the response in an XML document, and retrieve the material information
Sub getInfoWeb()
    Dim xhr As MSXML2.XMLHTTP60
    Dim doc As MSXML2.DOMDocument60
    Dim xmlCell As MSXML2.IXMLDOMElement
    Dim xmlCells As MSXML2.IXMLDOMNodeList
    Dim materialValueElement As MSXML2.IXMLDOMElement
    Set xhr = New MSXML2.XMLHTTP60
    With xhr
        .Open "GET", "http://www.pfconcept.com/cgi-bin/wspd_pcdb_cgi.sh/y/y2productspec-ajax.p?itemc=10031700&_search=false&rows=-1&page=1&sidx=&sord=asc", False
        .send
        If .readyState = 4 And .Status = 200 Then
            Set doc = New MSXML2.DOMDocument60
            doc.LoadXML .responseText
        Else
            MsgBox "Error" & vbNewLine & "Ready state: " & .readyState & _
                vbNewLine & "HTTP request status: " & .Status
            Exit Sub 'nothing to parse
        End If
    End With
    Set xmlCells = doc.getElementsByTagName("cell")
    For Each xmlCell In xmlCells
        If xmlCell.Text = "Materiaal" Then
            'The value is held by the cell that follows the "Materiaal" cell
            Set materialValueElement = xmlCell.NextSibling
        End If
    Next
    If materialValueElement Is Nothing Then
        MsgBox "No ""Materiaal"" cell found"
    Else
        MsgBox materialValueElement.Text
    End If
End Sub
EDIT2: an alternative automating IE
Sub searchWebViaIE()
Dim ie As SHDocVw.InternetExplorer
Dim doc As MSHTML.HTMLDocument
Dim anchors As MSHTML.IHTMLElementCollection
Dim anchor As MSHTML.HTMLAnchorElement
Dim prodSpec As MSHTML.HTMLAnchorElement
Dim tableCells As MSHTML.IHTMLElementCollection
Dim materialValueElement As MSHTML.HTMLTableCell
Dim tableCell As MSHTML.HTMLTableCell
Set ie = New SHDocVw.InternetExplorer
With ie
.navigate "http://www.pfconcept.com/cgi-bin/wspd_pcdb_cgi.sh/y/y2facetmain.p?fctkeywords=100317&world=general#tabs-4"
.Visible = True
Do While .readyState <> READYSTATE_COMPLETE Or .Busy = True
DoEvents
Loop
Set doc = .document
Set anchors = doc.getElementsByTagName("a")
For Each anchor In anchors
If InStr(anchor.innerHTML, "Product Specificatie") <> 0 Then
anchor.Click
Exit For
End If
Next anchor
Do While .readyState <> READYSTATE_COMPLETE Or .Busy = True
DoEvents
Loop
End With
For Each anchor In anchors
If InStr(anchor.innerHTML, "Product Specificatie") <> 0 Then
Set prodSpec = anchor
End If
Next anchor
Set tableCells = doc.getElementById("list-table").getElementsByTagName("td")
If Not tableCells Is Nothing Then
For Each tableCell In tableCells
If tableCell.innerHTML = "Materiaal" Then
Set materialValueElement = tableCell.NextSibling
End If
Next tableCell
End If
MsgBox materialValueElement.innerHTML
End Sub

Not related to tables or Excel (I use MS Access 2013), but directly related to the topic title. My solution is:
Private Sub Sample(urlSource)
Dim httpRequest As New WinHttpRequest
Dim doc As MSHTML.HTMLDocument
Dim tags As MSHTML.IHTMLElementCollection
Dim tag As MSHTML.HTMLAnchorElement ' the loop reads href, which is an anchor property
Dim i As Long
httpRequest.Option(WinHttpRequestOption_UserAgentString) = "Mozilla/4.0 (compatible;MSIE 7.0; Windows NT 6.0)"
httpRequest.Open "GET", urlSource
httpRequest.send ' fetching webpage
Set doc = New MSHTML.HTMLDocument
doc.body.innerHTML = httpRequest.responseText
Set tags = doc.getElementsByTagName("a")
i = 1
For Each tag In tags
Debug.Print i
Debug.Print tag.href
Debug.Print tag.innerText
'Debug.Print tag.Attributes("any other attributes you need")() ' may return an object
i = i + 1
If i Mod 50 = 0 Then Stop
' or code to store results in a table
Next
End Sub

Related

Setting a web table scraper by Msxml2.ServerXMLHTTP.6.0 on Excel VBA

I need to build a web data scraper.
I need to log in to the site: enter user and password, then click the login button.
Click a second button.
Wait for the page to load; this page holds the table in question. The table is a call log and adds new content dynamically, so it is always refreshing.
I want to exclude a form from the table content and limit the rows pasted into Excel.
I got it working with InternetExplorer.Application code, but it is very slow, so I need to switch to MSXML2.XMLHTTP.
Working InternetExplorer.Application version:
Sub extractTablesData()
'we define the essential variables
Dim IE As Object, obj As Object
Dim r As Integer, c As Integer, t As Integer
Dim elemCollection As Object
'add the "Microsoft Internet Controls" reference in your VBA Project indirectly
Set IE = CreateObject("InternetExplorer.Application")
With IE
.Silent = True
.Visible = True
.navigate ("https://www.clickphone.ro")
' we ensure that the web page downloads completely before we fill the form automatically
While IE.readyState <> 4
DoEvents
Wend
Application.Wait Now + TimeValue("00:00:03")
Set HTMLDoc = IE.document
HTMLDoc.all.user.Value = "user or email" 'Enter your email/user id here
HTMLDoc.all.pass.Value = "xXXxXXXxxXXXxx" 'Enter your password here
'Login Button Click
With IE.document
Set elems = .getElementsByTagName("a")
For Each e In elems
If (e.getAttribute("class") = "orange_button") Then
e.Click
Exit For
End If
Next e
End With
'Needed Table page Button Click https://www.clickphone.ro/account/istoric_apel_in.html
While IE.readyState <> 4
DoEvents
Wend
Set iedoc = IE.document
Set elems = iedoc.getElementsByClassName("black")(12)
elems.Click
' again ensuring that the web page loads completely before we start scraping data
While IE.readyState <> 4
DoEvents
Wend
Application.Wait Now + TimeValue("00:00:05")
Set iedoc = IE.document
'Clearing any unnecessary or old data in Sheet1
ThisWorkbook.Sheets("Sheet1").Range("A1:K1000").ClearContents
'Scraping data and pasting it to Sheet1
Set elemCollection = IE.document.getElementsByTagName("table")
For t = 0 To (elemCollection.Length - 1)
For r = 0 To (elemCollection(t).Rows.Length - 1)
For c = 0 To (elemCollection(t).Rows(r).Cells.Length - 1)
ThisWorkbook.Worksheets(1).Cells(r + 1, c + 1) = elemCollection(t).Rows(r).Cells(c).innerText
Next c
Next r
Next t
End With
' cleaning up memory
Set IE = Nothing
End Sub
This is my attempt with MSXML2.XMLHTTP:
Option Explicit
'reference to Microsoft Internet Controls
'reference to Microsoft HTML Object Library
Sub Web_Table_Option_One()
Dim xml As Object
Dim html As Object
Dim objTable As Object
Dim result As String
Dim lRow As Long
Dim lngTable As Long
Dim lngRow As Long
Dim lngCol As Long
Dim ActRw As Long
Set xml = CreateObject("MSXML2.XMLHTTP.6.0")
Set html = CreateObject("htmlFile")
With xml
.Open "POST", "https://www.clickphone.ro/login.html", False
.setRequestHeader "Content-type", "application/x-www-form-urlencoded"
.send "userName=USER&password=XXXXxxxxXxxxxXXX"
.Open "GET", "https://www.clickphone.ro/account/istoric_apel_in.html", False
.setRequestHeader "Content-type", "text/xml"
.send
End With
html.body.innerHTML = xml.responseText
Set objTable = html.getElementsByTagName("table")
For lngTable = 0 To objTable.Length - 1
For lngRow = 0 To objTable(lngTable).Rows.Length - 1
For lngCol = 0 To objTable(lngTable).Rows(lngRow).Cells.Length - 1
ThisWorkbook.Sheets("Sheet2").Cells(ActRw + lngRow + 1, lngCol + 1) = objTable(lngTable).Rows(lngRow).Cells(lngCol).innerText
Next lngCol
Next lngRow
ActRw = ActRw + objTable(lngTable).Rows.Length + 1
Next lngTable
End Sub
HTML source code:
For user,pass,login button:
<form action="/login.html" id="toploginform" name="toploginform" method="post">
<script>
function processLoginForm(){
with (document.toploginform) {
if (user.value=="Email"){alert('Email/Parola incorecte!'); return false}
document.getElementById('toploginform').submit();
}
}
</script>
<fieldset>
<input name="userlogin" type="hidden" id="userlogin" value="true" />
<span class="text">
<input name="user" type="text" onFocus="if(this.value=='Email'){this.value=''}" onBlur="if(this.value==''){this.value='Email'}" value="Email">
</span> <span class="text">
<input name="pass" type="password" onFocus="if(this.value=='Password'){this.value=''}" onBlur="if(this.value==''){this.value='Password'}" value="Password">
</span>
<input name="authcode" type="hidden" id="authcode" value="false" />
<span>Login</span>
<span class="links">Am uitat parola<br/>
<input class="css-checkbox" id="checkbox2" type="checkbox" name="rememberpass" value="da" />
<label for="checkbox2" name="checkbox2_lbl" class="css-label lite-orange-check">Retin datele?</label>
</span>
</fieldset>
</form>
Table page button:
<br /> <img src="/images/sageata_orange.gif" width="7" height="8" /> <a class="black" href="/account/istoric_apel_in.html">Apeluri primite</a>
Table source code:
<table class="TabelDate" cellspacing="0">
<thead>
<tr>
<th width="130">Data</th>
<th>Sursa</th>
<th>Destinatie</th>
<th>Durata</th>
<th class="ultima">Status</th>
</tr>
</thead>
<tr class="u"> <td class="prima">19-03-2017 17:31:16</td><td><font color="green"><form name="form24-1489937476.41719" method="post" action="">0720145931 <a class="TipFB" href="#"><span class="tip"><span class="tipTitle">Suna inapoi la 0720145931</span><span class="tipBody">Click si serverul te va suna gratuit pe numarul tau 0371780434.<br />Dupa ce raspunzi se formeaza automat numarul 0720145931.</span><span class="tipArrow"></span></span><input type="image" name="btn_opentextbox" src="/images/phone_small.gif" value="Submit" /></a> <input name="numartel" type="hidden" id="numartel" value="0720145931" /></form></font></td><td align="center"><font color="green">0371780444</font></td><td align="center"><font color="green">00:00:07</font></td>
<td class="ultima" align="center"><font color="green">Apel preluat</font></td></tr> <tr class="gri"> <td class="prima">19-03-2017 17:30:48</td><td><font color="green"><form name="form24-1489937448.41715" method="post" action="">0728409617 <a class="TipFB" href="#"><span class="tip"><span class="tipTitle">Suna inapoi la 0728409617</span><span class="tipBody">Click si serverul te va suna gratuit pe numarul tau 0371780434.<br />Dupa ce raspunzi se formeaza automat numarul 0728409617.</span><span class="tipArrow"></span></span><input type="image" name="btn_opentextbox" src="/images/phone_small.gif" value="Submit" /></a> <input name="numartel" type="hidden" id="numartel" value="0728409617" /></form></font></td><td align="center"><font color="green">0371780655</font></td><td align="center"><font color="green">00:00:07</font></td>
I managed to partially solve my problem. Now I can log in and retrieve the table I need with XMLHTTP. I'll post the working code here so everyone can use it (I don't take any credit for this code; I put it together with help from different forums).
Option Explicit
'reference to Microsoft Internet Controls
'reference to Microsoft HTML Object Library
Sub CallLog()
Dim xml As Object
Dim html As Object
Dim objTable As Object
Dim result As String
Dim lRow As Long
Dim lngTable As Long
Dim lngRow As Long
Dim lngCol As Long
Dim ActRw As Long
Set xml = CreateObject("MSXML2.XMLHTTP.6.0")
Set html = CreateObject("htmlFile")
With xml
.Open "POST", "https://www.XXXXXX.xx/login.html", False
.setRequestHeader "Content-type", "application/x-www-form-urlencoded" 'send appropriate Headers
.send "userlogin=true&user=USERNAME&pass=PASSWORD&authcode=false" ' send login info
'MsgBox xml.responseText
.Open "GET", "https://www.XXXXXX.xx/account/callLog.html", False
.setRequestHeader "Content-type", "text/xml"
.send
End With
html.body.innerHTML = xml.responseText
Set objTable = html.getElementsByTagName("table")
For lngTable = 0 To objTable.Length - 1
For lngRow = 0 To objTable(lngTable).Rows.Length - 1
For lngCol = 0 To objTable(lngTable).Rows(lngRow).Cells.Length - 1
ThisWorkbook.Sheets("Sheet2").Cells(ActRw + lngRow + 1, lngCol + 1) = objTable(lngTable).Rows(lngRow).Cells(lngCol).innerText
Next lngCol
Next lngRow
ActRw = ActRw + objTable(lngTable).Rows.Length + 1
Next lngTable
End Sub
Now I'm left with two problems:
How can I get the child "table" from the parent "table" (the table I'm after is nested in a bigger table, see the source code below)? I want to get only the first row, excluding a "form" from the row (it's an href link).
Source Code
How can I get this continuously? The table is dynamic: it updates every time someone calls me, so the first row changes continuously.
Version 2.0 of my working code:
Option Explicit
'reference to Microsoft Internet Controls
'reference to Microsoft HTML Object Library
Sub CallLog()
Dim xml As Object
Dim html As Object
Dim objTable As Object
Dim result As String
Dim lRow As Long
Dim lngTable As Long
Dim lngRow As Long
Dim lngCol As Long
Dim ActRw As Long
Set xml = CreateObject("MSXML2.XMLHTTP.6.0")
Set html = CreateObject("htmlFile")
With xml
.Open "POST", "https://www.XXXXXX.xx/login.html", False
.setRequestHeader "Content-type", "application/x-www-form-urlencoded" 'send appropriate Headers
.send "userlogin=true&user=USERNAME&pass=PASSWORD&authcode=false" ' send login info
'MsgBox xml.responseText
.Open "GET", "https://www.XXXXXX.xx/account/callLog.html", False
.setRequestHeader "Content-type", "text/xml"
.send
End With
html.body.innerHTML = xml.responseText
Set objTable = html.getElementsByTagName("table")
ThisWorkbook.Sheets("LogClickPhone").Range("A2") = objTable(1).Rows(1).Cells(0).innerText
ThisWorkbook.Sheets("LogClickPhone").Range("B2") = objTable(1).Rows(1).Cells(1).innerText
ThisWorkbook.Sheets("LogClickPhone").Range("C2") = objTable(1).Rows(1).Cells(2).innerText
ThisWorkbook.Sheets("LogClickPhone").Range("D2") = objTable(1).Rows(1).Cells(3).innerText
ThisWorkbook.Sheets("LogClickPhone").Range("E2") = objTable(1).Rows(1).Cells(4).innerText
End Sub
I managed to get only the row I need, but it's very slow: it takes 38.5 seconds to complete. I think it would be better to use MSXML2.DOMDocument.6.0 to get the text I need, but I don't know how to do that.
Question:
How can I automate this code so it runs every 60 seconds or so?
Thanks
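One possible way to approach the scheduling question, sketched under assumptions: it relies on Excel's Application.OnTime, expects the CallLog routine above to live in a standard module of the same workbook, and uses hypothetical names (NextRun, ScheduleCallLog, RunCallLog, StopCallLog). It does not address the 38.5-second run time.

```vba
Option Explicit

' Time of the pending run; needed again to cancel it.
' All names here are hypothetical; CallLog is the routine from the post.
Private NextRun As Date

Sub ScheduleCallLog()
    ' Queue the next run roughly 60 seconds from now
    NextRun = Now + TimeSerial(0, 1, 0)
    Application.OnTime EarliestTime:=NextRun, Procedure:="RunCallLog"
End Sub

Sub RunCallLog()
    CallLog             ' scrape once
    ScheduleCallLog     ' then re-arm the timer
End Sub

Sub StopCallLog()
    ' Cancel the pending run; ignore the error raised if none is queued
    On Error Resume Next
    Application.OnTime EarliestTime:=NextRun, Procedure:="RunCallLog", Schedule:=False
    On Error GoTo 0
End Sub
```

Run ScheduleCallLog once to start the cycle and StopCallLog before closing the workbook; because the next run is queued only after CallLog returns, runs do not overlap.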

How to get a particular link from a specific class?

I'd like to extract this href from that particular class
<tr class="even">
<td>
Serie A 2015/2016
</td>
This is what I wrote:
Sub ExtractHrefClass()
Dim ie As Object
Dim doc As HTMLDocument
Dim class As Object
Dim href As Object
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
ie.navigate Range("D8")
Do
DoEvents
Loop Until ie.readyState = READYSTATE_COMPLETE
Set doc = ie.document
Set class = doc.getElementsByClassName("even")
Set href = class.getElementsByTagName("a")
Range("E8").Value = href
ie.Quit
End Sub
But unfortunately I get the error "Object doesn't support this property or method" (Error 438) on the line:
Set href = class.getElementsByTagName("a")
UPDATE 1
I modified the code as per @RyszardJędraszyk's answer, but no output comes out. What am I doing wrong?
Sub ExtractHrefClass()
Dim ie As Object
Dim doc As HTMLDocument
Dim href As Object
Dim htmlEle As Object
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
ie.navigate Range("D8")
Do
DoEvents
Loop Until ie.readyState = READYSTATE_COMPLETE And ie.Busy = False
Set doc = ie.document
Set href = doc.getElementsByTagName("a")
For Each htmlEle In href
If htmlEle.className = "even" Then
Range("E8").Value = htmlEle
End If
Next
ie.Quit
End Sub
UPDATE 2
As @dee requested in a comment, here is the code from the web page http://www.soccer24.com/italy/serie-a/archive/
<tbody>
<tr>
<td>
Serie A 2016/2017
</td>
<td></td>
</tr>
<tr class="even">
<td>
Serie A 2015/2016
</td>
<td>
<span class="team-logo" style="background-image: url(/res/image/data/UZbZIMhM-bsGsveSt.png)"></span>Juventus
</td>
</tr>
<tr>
<td>
Serie A 2014/2015
</td>
<td>
<span class="team-logo" style="background-image: url(/res/image/data/UZbZIMhM-bsGsveSt.png)"></span>Juventus
</td>
</tr>
I only need to extract this value: /italy/serie-a-2015-2016/
This worked for me:
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "http://www.soccer24.com/italy/serie-a/archive/", False
.Send
MsgBox Split(Split(Split(.ResponseText, "<tr class=""even"">", 2)(1), "<a href=""", 2)(1), """", 2)(0)
End With
The procedure you need might look like:
Sub ExtractHrefClass()
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", Range("D8").Value, False
.Send
Range("E8").Value = Split(Split(Split(.ResponseText, "<tr class=""even"">", 2)(1), "<a href=""", 2)(1), """", 2)(0)
End With
End Sub
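The nested Split idiom above is compact but fails with a "Subscript out of range" error when a marker is missing. A minimal sketch of the same idea with that case guarded is shown below; TextBetween and DemoTextBetween are hypothetical names, and the sample markup is an assumption, since the anchor tags are not visible in the listing in the question.

```vba
' Returns the text between startMarker and endMarker,
' or an empty string if either marker is missing.
Function TextBetween(ByVal source As String, ByVal startMarker As String, _
                     ByVal endMarker As String) As String
    Dim tail As Variant
    tail = Split(source, startMarker, 2)      ' at most two pieces
    If UBound(tail) < 1 Then Exit Function    ' start marker not found
    If InStr(tail(1), endMarker) = 0 Then Exit Function
    TextBetween = Split(tail(1), endMarker, 2)(0)
End Function

Sub DemoTextBetween()
    Dim html As String, evenRow As String
    ' Assumed markup: two rows, the second one carrying class "even"
    html = "<tr><td><a href=""/italy/serie-a-2016-2017/"">x</a></td></tr>" & _
           "<tr class=""even""><td><a href=""/italy/serie-a-2015-2016/"">y</a></td></tr>"
    ' First narrow down to the "even" row, then to its first href value
    evenRow = TextBetween(html, "<tr class=""even"">", "</tr>")
    Debug.Print TextBetween(evenRow, "<a href=""", """")   ' prints /italy/serie-a-2015-2016/
End Sub
```

String splitting like this depends on the exact markup the server returns, so it may break if attribute order or quoting changes; the DOM-based answers are less sensitive to that.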
Try:
Dim href As HTMLObjectElement
Make sure that the proper libraries are checked in References (Microsoft HTML Object Library).
Are you sure that doc.getElementsByClassName("even") works? It is not listed as an available method here: https://msdn.microsoft.com/en-us/library/aa926433.aspx
I always use getElementsByTagName first and then add a condition: If htmlEle.className = "even" Then.
Also add the following: ie.readyState = READYSTATE_COMPLETE And ie.Busy = False. Still, if it is an AJAX-based website, this may not be enough to determine that the page has fully loaded (guessing from the link, it could be flashscore.com, where you need to track elements on the page that report its loading status).
querySelectorAll or querySelector can be used here to select the anchor elements inside the tr with the specific class, and then the href attribute can be retrieved with getAttribute("href"). HTH.
' Add reference to Microsoft Internet Controls (SHDocVw)
' Add reference to Microsoft HTML Object Library
Dim ie As Object
Dim name As String
Dim Doc As HTMLDocument
Set ie = New InternetExplorer
ie.Visible = 1
ie.navigate "<URL>"
While ie.Busy Or ie.readyState <> 4
DoEvents
Wend
Set Doc = ie.document
Dim anchors As IHTMLDOMChildrenCollection
Dim anchor As IHTMLAnchorElement
Dim i As Integer
Set anchors = Doc.querySelectorAll("tr[class~='even'] a")
If Not anchors Is Nothing Then
For i = 0 To anchors.Length - 1
Set anchor = anchors.item(i)
If anchor.getAttribute("href") = "/italy/serie-a-2015-2016/" Then
Range("E8").Value = anchor.innerHTML
End If
Next
End If
ie.Quit

Extracting data from a web page using VBA code

I am writing Visual Basic code to automatically extract some weather data. The following is the source code of the web page.
<tr><td>
<table align="center" width="100%" summary="table used for formatting"><tr>
<td align="center"><b>Example:</b></td>
<td align="center" width="40%">Latitude 33.5<br>Longitude -80.75</td>
<td align="center">OR</td>
<td align="center" width="40%">Latitude 33 30<br>Longitude -80 45</td>
</tr></table></td></tr>
<tr><td><table width="100%" summary="table used for formatting"><tr>
<td><b><label for="lat">Latitude? </label>
<input type="text" name="lat" id="lat" size="12" value=""></b></td>
<td width="30%">South: -90 to 0</td>
<td width="30%">North: 0 to 90</td>
</tr><tr>
<td><b><label for="lon">Longitude? </label>
<input type="text" name="lon" id="lon" size="12" value=""></b></td>
<td width="30%">West: -180 to 0</td>
<td width="30%">East: 0 to 180</td>
</tr></table></td></tr>
<tr><td><table align="center" summary="table used for formatting"><tr>
<td><b> <input type="submit" name="submit" value="Submit"> </b></td>
<td><b> <input type="submit" name="submit" value=" Reset "> </b></td>
<td><i> This form is "Reset" if the input is out of range. </i></td>
</tr></table>
</td></tr></table>
</form>
I am getting an error (Object variable or With block variable not set). Could anybody help me on this code? Thank you very much in advance. Here is what I have written:
Sub extractSolData()
Dim IE As Object
Dim latitude As String, longitude As String
Set IE = CreateObject("InternetExplorer.Application")
latitude = InputBox("Enter Latitude of the location")
longitude = InputBox("Enter Longitude of the location")
With IE
.Visible = True
.navigate ("https://eosweb.larc.nasa.gov/cgi-bin/sse/grid.cgi?email=skip@larc.nasa.gov")
While IE.readyState <> 4
DoEvents
Wend
IE.document.getElementsByName(“lat”).Item(0).Value = latitude
IE.document.getElementsByName(“lon”).Item.innertext = longitude
IE.document.getElementsByName("submit").Item.Value = "Submit"
Do While .Busy Or _
.readyState <> 4
DoEvents
Loop
End With
Set IE = Nothing
End Sub
The problem is with the quotation marks you are using in
IE.document.getElementsByName(“lat”).Item(0).Value = latitude
IE.document.getElementsByName(“lon”).Item.innertext = longitude
These are not real quotes; I bet you copied this from a website and the quotation marks got mangled along the way. They need to look like
IE.document.getElementsByName("lat").Item(0).Value = latitude
IE.document.getElementsByName("lon").Item.innertext = longitude
You could have shortened all of that to the following by concatenating the latitude and longitude into the URL string:
Option Explicit
Public Sub nota_ong()
Dim latitude As String, longitude As String
latitude = InputBox("Enter Latitude of the location")
longitude = InputBox("Enter Longitude of the location")
If latitude = vbNullString Or longitude = vbNullString Then Exit Sub
With CreateObject("internetexplorer.application")
.Visible = True
.navigate "https://eosweb.larc.nasa.gov/cgi-bin/sse/grid.cgi?email=skip%40larc.nasa.gov&step=1&lat=" & latitude & "&lon=" & longitude & "&submit=Submit"
While .Busy Or .readyState < 4: DoEvents: Wend
'Do stuff with new page
'.Quit '< Uncomment this later
End With
End Sub

VBA Getting results in another url but same window

I'm working with VBA to fill in a form on a web page and submit it to get results.
When I submit the form with correct values via VBA, I get the results at another URL but in the same window.
The problem is that I don't know how to change the HTML reference so that I can start scraping data from this new URL.
Here is my code:
'to refer to the running copy of Internet Explorer
Dim IE As InternetExplorer
'to refer to the HTML document returned
Dim html As HTMLDocument
'open Internet Explorer in memory, and go to website
Set IE = New InternetExplorer
IE.Visible = False
IE.navigate "http://url..."
'Wait until IE is done loading page
Do While IE.READYSTATE <> READYSTATE_COMPLETE
Application.StatusBar = "Connecting with http://url..."
DoEvents
Loop
'show text of HTML document returned
Set html = IE.document
'close down IE and reset status bar
Set IE = Nothing
Application.StatusBar = ""
Set txtArea = html.getElementsByTagName("textarea")(0)
txtArea.Value = txtArea_data
Set formSubmit = html.getElementsByName("submit")(1)
formSubmit.Click
'-------------Get results
'Dim html_results As HTMLDocument
IE.navigate "http://new_url" 'Im not sure if I must do it this way...
'Wait until IE is done loading page
Do While IE.READYSTATE <> READYSTATE_COMPLETE
Application.StatusBar = "Connecting with http://new_url..."
DoEvents
Loop
Set html = IE.document
Dim trResults As IHTMLElementCollection
Set trResults = html.getElementsByClassName("tr")
MsgBox (trResults.Length) 'At this point, trResults always have 0 results...
Do you have any idea how to help me?
Thanks!
This is a new approach, and it still doesn't work.
I've used IE.Visible = True, so I could check that I get many <tr> results on the results page.
Dim rowNumber As Long
Dim txtArea_data As String
Dim txtArea As Object
Dim formSubmit As Object
Dim IE As InternetExplorer
Dim html As HTMLDocument
Set IE = New InternetExplorer
IE.Visible = True
IE.navigate "http://url..."
Do While IE.READYSTATE <> READYSTATE_COMPLETE
Application.StatusBar = "Connecting with http://url..."
DoEvents
Loop
Set html = IE.document
rowNumber = 4
For rowNumber = 4 To 120 'Rows.Count
txtArea_data = txtArea_data & Cells(rowNumber, 1).Value & Chr(10)
Next rowNumber
Set txtArea = html.getElementsByTagName("textarea")(0)
txtArea.Value = txtArea_data
Set formSubmit = html.getElementsByName("submit")(1)
formSubmit.Click
'Wait until results
Do While IE.Busy: DoEvents: Loop
'-------------Get results
'At this point, the page URL with results has changed, but on the same tab. In the other code, I used IE.navigate "http://new_url...", but an "Invalid file" message appears.
'-------------
'I suppose the html variable could be reloaded with the new results data, but nothing happens...
Set html = IE.document
Dim trResults As IHTMLElementCollection
Set trResults = html.getElementsByClassName("tr")
MsgBox (trResults.Length) 'At this point, MsgBox returns 0...
Set html = Nothing
Results page looks like:
<body>
<table id="tabla-a">
<thead>
<tr>
<th>...</th>
<th>...</th>
<th>...</th>
</tr>
</thead>
<tbody>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
</tbody>
</table>
</body>
Is there anything else I could tell you to help you find the problem?
Thanks!
EDIT
Oh my God! I've been using Set trResults = html.getElementsByClassName("tr") instead of Set trResults = html.getElementsByTagName("tr") !!!
Thanks for your time!
What should I do now? Edit the original question with the final solution, or close the entire question?
I don't use Stack Overflow much to ask questions...
Thanks!
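The difference between the two lookups can be seen in a small sketch. It assumes a Windows host where the "htmlfile" COM class is available and uses made-up markup shaped like the results page above; CountResultRows is a hypothetical name.

```vba
' Sketch: "tr" is a tag name, not a class name, so rows are found
' with getElementsByTagName.
Sub CountResultRows()
    Dim doc As Object
    Dim trRows As Object

    Set doc = CreateObject("htmlfile")
    ' Made-up sample markup modelled on the results page
    doc.body.innerHTML = "<table id=""tabla-a"">" & _
        "<thead><tr><th>h</th></tr></thead>" & _
        "<tbody><tr><td>a</td></tr><tr><td>b</td></tr></tbody>" & _
        "</table>"

    ' Look the rows up by tag; a class lookup for "tr" would only match
    ' elements written as class="tr", and there are none here
    Set trRows = doc.getElementsByTagName("tr")
    Debug.Print trRows.Length
End Sub
```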

VBA If comparison TR\TD Elements

I have a web query that returns a table like this:
[C] [D] [E]
[A] Events | IP source | IP destination |
[B] MyEvent (1.10) | 192.168.0.1 | 192.168.0.3 |
The HTML code that gives me the info follows this kind of schema:
<table class="wrapper">
<tbody>
<tr>
<td width="100%" valign="top">
<center>
<div id="contenpanel">
<table id="contenttable" class="full_table_of_events">
<tbody>
<tr class="content" oncontextmenu="blablabla",( "src=192.168.0.1&dst=192.168.0.2")></tr>
<tr></tr>
<tr></tr>
</tbody>
</table>
</div>
</center>
</td>
<tr>
</tbody>
</table>
So I can get the value with a simple .innerText; the complete value of innerText is:
myEvent (1.10) 192.168.0.1 192.168.0.2
I can display this value just fine with MsgBox. I also have code in Excel that checks where the selection is on the worksheet and copies what it finds into the captions of three form labels: one for the event, one for the source IP and one for the destination IP. I then try to check whether that line exists in the table, but nothing happens.
Private Sub Worksheet_SelectionChange(ByVal Target As Range)
Eventi.Caption = ActiveCell.Value
Source.Caption = Cells(ActiveCell.Row, ActiveCell.Column + 1)
Destination.Caption = Cells(ActiveCell.Row, ActiveCell.Column + 2)
End Sub
Sub Extract()
Dim URL As String
Dim IE As InternetExplorer
Dim HTMLdoc As HTMLDocument
Dim TRelements As IHTMLElementCollection
Dim TRelement As HTMLTableRow
Dim Chkb0x As IHTMLElementCollection
URL = "https://localhost/events/index.cgi"
Set IE = New InternetExplorer
With IE
.navigate URL
.Visible = True
While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend
Set HTMLdoc = .document
End With
Set TRelements = HTMLdoc.getElementsByTagName("TR")
MsgBox Eventi.Caption & " " & Source.Caption & " " & Destination.Caption
MsgBox "Search Starting..."
For Each TRelement In TRelements
If Trim(TRelement.innerText) = Eventi.Caption & " " & Source.Caption & " " & Destination.Caption Then
MsgBox "Bingo! You have this event"
End If
Next
End Sub
Well, I hope someone can see what I'm doing wrong; I just want to know if there is a way to do this.
A few debug points:
Step through your code and manually check each Trim(TRelement.innerText) value for validity.
Ensure the Eventi.Caption value is exactly "myEvent (x.xx)" including the space, parenthesis and case.
I'm not familiar with HTMLdoc.getElementsByTagName("TR"), but ensure it's not case sensitive.
As @Dick suggested, check for any characters that look like spaces but are not, using Asc(Mid(TRelement.innerText, 8, 1)), where 8 is the position where you think a space is. If it comes back with something other than 32 (ASCII for a space), you won't get a match in your If statement.
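Checking one position at a time gets tedious, so the same check can be applied to the whole string. The following is a small sketch; DumpCharCodes and DemoDumpCharCodes are hypothetical helper names, and the sample string is made up.

```vba
' Sketch: print the position, character code and character for every
' position of a string, to spot look-alike spaces.
Sub DumpCharCodes(ByVal s As String)
    Dim i As Long
    For i = 1 To Len(s)
        ' AscW returns the Unicode code; 32 is an ordinary space
        Debug.Print i, AscW(Mid$(s, i, 1)), Mid$(s, i, 1)
    Next i
End Sub

Sub DemoDumpCharCodes()
    ' ChrW(160) is a non-breaking space, which looks like a normal space
    ' but does not compare equal to " "
    DumpCharCodes "myEvent" & ChrW(160) & "(1.10)"
End Sub
```

In the question's loop this could be called as DumpCharCodes TRelement.innerText for a row that is expected to match; any position that prints a code other than 32 where a space is expected explains a failed comparison.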