I can't seem to find an answer to this anywhere through searching.
I am trying to iterate through a list on a webpage using VBA and then use the data in Excel.
Accessing the webpage is fine and locating the correct div is fine, but I cannot find out how to iterate through the list.
What I am trying is:
Sub getdata()
Dim ie As InternetExplorer
Dim html As HTMLDocument
Set ie = New InternetExplorer
ie.Visible = False
ie.navigate "http://www.springfieldeducationalfurniture.co.uk/products/60-Chair-Trolley/11116/"
Do While ie.READYSTATE <> READYSTATE_COMPLETE
Application.StatusBar = "Attempting connection ..."
DoEvents
Loop
Set html = ie.document
Set ie = Nothing
Application.StatusBar = ""
Dim content
Set content = html.getElementsByClassName("tabs__content")
For Each bullet In content
'tried this
IHtml = bullet.innerHTML 'this gives the whole div; not sure how to convert it to a string
'and this but get "Run-time error '438': Object doesn't support this property or method"
IHtml = bullet.getElementsByTagName("li")
Next
End Sub
The HTML I am after is as follows. I want to iterate through the <ul> in the <div class="tabs__content"> and assign each item's content (e.g. "Requires simple self assembly") to a cell in Excel (once I can read the data from the list, the rest is easy):
<div class="tabs">
<div class="container">
<ul class="tabs__nav">
<li class="is-active background-grey-lighter">
Description
</li>
<li class="background-grey-light">
Delivery
</li>
</ul>
</div>
<div class="tabs__tab tabs__tab--product-info is-active">
<div class="tabs__title">
Information
</div>
<div class="tabs__content">
<div class="container">
<p>
60 Chair Trolley</p>
<ul>
<li>
Requires simple self assembly</li>
<li>
Non marking wheels </li>
<li>
Heavy duty lockable castors</li>
<li>
Black frame</li>
<li>
Vertical / hanging chair storage</li>
<li>
Does not fit through a single doorway</li>
<li>
Fits through double doors when fully loaded</li>
<li>
Dimensions: W780 x L1770 x H1340mm</li>
</ul>
<p>
Code: Y16527<br />
</p>
</div>
</div>
</div>
<div class="tabs__tab tabs__tab--product-info ">
<div class="tabs__title">
Delivery
</div>
<div class="tabs__content">
<div class="container">
<p>
Please <span style="color: rgb(0, 0, 255);">contact us</span> for delivery information.</p>
</div>
</div>
</div>
</div>
And this version targets the class you mentioned. It requires references to the Microsoft HTML Object Library and Microsoft XML (your version):
Option Explicit
Sub Getinfo2()
Dim http As New XMLHTTP60
Dim html As New HTMLDocument
With http
.Open "GET", "http://www.springfieldeducationalfurniture.co.uk/products/60-Chair-Trolley/11116/", False
.send
html.body.innerHTML = .responseText
End With
Dim posts As MSHTML.IHTMLElementCollection
Dim post As MSHTML.IHTMLElement
Set posts = html.getElementsByClassName("tabs__content")(0).getElementsByTagName("li")
For Each post In posts
Debug.Print post.innerHTML
Next post
End Sub
Output:
This version gets the HTML for all the li elements on the page:
Option Explicit
Sub Getinfo2()
Dim http As New XMLHTTP60
Dim html As New HTMLDocument
With http
.Open "GET", "http://www.springfieldeducationalfurniture.co.uk/products/60-Chair-Trolley/11116/", False
.send
html.body.innerHTML = .responseText
End With
Dim posts As MSHTML.IHTMLElementCollection
Dim post As MSHTML.IHTMLElement
Set posts = html.getElementsByTagName("li")
For Each post In posts
Debug.Print post.innerHTML
Next post
End Sub
Here is an alternative that doesn't require any references to libraries (late binding). It also shows a different way of looping through the class elements, as well as the LIs.
Sub getData()
Dim ie As Object
Dim li As Object
Dim tabsClass As Object
'Late Binding
Set ie = CreateObject("InternetExplorer.Application")
On Error GoTo Catch
ie.Visible = False
ie.navigate "http://www.springfieldeducationalfurniture.co.uk/products/60-Chair-Trolley/11116/"
While ie.ReadyState <> 4 Or ie.Busy: DoEvents: Wend
'LOOP EACH CLASS ELEMENT
For Each tabsClass In ie.Document.getElementsByClassName("tabs__content")
'LOOP EACH LI WITHIN THAT CLASS
For Each li In tabsClass.getElementsByTagName("li")
Debug.Print li.innertext
Next li
Next tabsClass
'CLOSE INSTANCE OF IE
Catch:
ie.Quit
Set ie = Nothing
End Sub
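To try the class-then-li loop without opening IE, here is a minimal sketch that loads an HTML string into a late-bound MSHTML document (CreateObject("htmlfile")) and collects the li text inside elements carrying a given class. ListItemsInClass is a hypothetical helper name. It matches divs by className via getElementsByTagName rather than calling getElementsByClassName, on the assumption that the htmlfile object may run in an older document mode where getElementsByClassName is not available.

```vba
Option Explicit

' Hypothetical helper: returns the trimmed innerText of every <li> found inside
' any <div> whose class list contains wantedClass (1-based Collection).
Public Function ListItemsInClass(ByVal htmlText As String, ByVal wantedClass As String) As Collection
    Dim doc As Object, divs As Object, lis As Object
    Dim i As Long, j As Long
    Dim items As New Collection

    Set doc = CreateObject("htmlfile")      ' late-bound MSHTML document, no references needed
    doc.body.innerHTML = htmlText

    Set divs = doc.getElementsByTagName("div")
    For i = 0 To divs.Length - 1
        ' pad with spaces so "tabs__content" only matches as a whole class name
        If InStr(1, " " & divs(i).className & " ", " " & wantedClass & " ", vbBinaryCompare) > 0 Then
            Set lis = divs(i).getElementsByTagName("li")
            For j = 0 To lis.Length - 1
                items.Add Trim$(lis(j).innerText)
            Next j
        End If
    Next i

    Set ListItemsInClass = items
End Function
```

With a page's HTML (for example the responseText from the XMLHTTP answer), each item can then be written out with Cells(k, 1).Value = items(k). Note that nested divs sharing the class would return their li items twice.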
I am not new to VBA, but I am new to coding anything that interacts with the web. I can open the web page, which then shows several icons. I need help clicking a specific icon.
Here is the HTML that appears when I right-click the desired icon and click "Inspect element":
<div title="" class="myapps-myapp resource" position="N" index="22">
<div class="myapps-icon-background"></div>
<a class="myapps-icon" href="#">
<img class="iconImage" alt="SomeName"
src="Resources/Icon/aklhdjQ2QWJGVVQxcHpUcEJ5RG5FcEZwcytzPQ--?size=48"
iconid="akltdjStWJGVVRxcHpUcEJ7QG3FcEZwtytzMT">
</a>
<div class="myapps-status"></div>
<div class="myapps-name">SomeName</div>
</div>
Here is my VBA code so far:
Sub test()
Dim oHTML_Element As IHTMLElement
Dim oBrowser As InternetExplorer
Dim objIE As Variant
Set objIE = CreateObject("InternetExplorer.Application")
objIE.navigate "http://The webpage goes here" 'This opens the site
While objIE.readyState <> READYSTATE_COMPLETE And objIE.readyState _
<> READYSTATE_LOADED
DoEvents
Wend
For Each oHTML_Element In objIE.document.getElementsByName("SomeName")
oHTML_Element.Click 'This does not work!
Next
End Sub
Any help would be greatly appreciated. Thanks!
It is the value of the alt attribute you want to match on. No loop is needed:
objIe.document.querySelector("[alt='SomeName']").click
Though it may actually be the a tag (the link wrapping the image) you want to click:
objIe.document.querySelector("[index='22'] .myapps-icon").click
Read about CSS selectors and querySelector.
I've written a script in VBA using IE to get data from a webpage. The data are not stored in a table; that is, there is no table, tr or td tag. However, they appear to be in a tabular format. See the image below for clarity.
What I've tried so far gets the data one value per line, like:
$4,085
$1,620
$1,435
$35
$1,125
$905
How I would like to get them is:
$4,085 $1,620
$1,435 $35
$1,125 $905
In other languages I could handle this in a single line of code with a list comprehension, but in VBA I get stuck.
The HTML elements containing the data (just a chunk of the whole):
<ul id="tco_detail_data">
<li>
<ul class="list-title">
<li class="first"> </li>
<li>Year 1</li>
<li>Year 2</li>
<li>Year 3</li>
<li>Year 4</li>
<li>Year 5</li>
<li class="last">5 Yr Total</li>
</ul>
</li>
<hr class="loose-dotted">
<li class="first">
<ul class="first">
<li class="first">Depreciation</li>
<li>$4,085</li>
<li>$1,620</li>
<li>$1,425</li>
<li>$1,263</li>
<li>$1,133</li>
<li class="last">$9,526</li>
</ul>
</li>
</ul>
This is how the data look on the page:
This is what I've attempted so far:
Sub Get_Information()
Dim IE As New InternetExplorer, HTML As HTMLDocument
Dim post As Object
With IE
.Visible = False
.Navigate "https://www.edmunds.com/ford/escape/2017/cost-to-own/?zip=43215"
While .Busy = True Or .ReadyState < 4: DoEvents: Wend
Set HTML = .Document
End With
Application.Wait Now + TimeValue("00:00:05") 'waiting for the items to be available
For Each post In HTML.getElementById("tco_detail_data").getElementsByTagName("li")
Debug.Print post.innerText
Next post
IE.Quit
End Sub
References to add in order to run the above script:
Microsoft Internet Controls
Microsoft HTML Object Library
This works using a CSS selector. (Updated to remove the explicit wait.)
The selector is:
#tco_detail_data > li
which selects the li elements that are direct children of the element with id tco_detail_data.
Sample results from running this CSS query on the webpage look like the following:
Code:
Option Explicit
Public Sub Get_Information()
Dim IE As New InternetExplorer
With IE
.Visible = False
.navigate "https://www.edmunds.com/ford/escape/2017/cost-to-own/?zip=43215"
While .Busy = True Or .readyState < 4: DoEvents: Wend
End With
Dim a As Object, exitTime As Date
exitTime = Now + TimeSerial(0, 0, 5)
Do
DoEvents
On Error Resume Next
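'querySelector returns Nothing until the element exists; querySelectorAll never returns Nothing
Set a = IE.document.querySelector("#tco_detail_data")
On Error GoTo 0
If Now > exitTime Then Exit Do
Loop While a Is Nothing
If a Is Nothing Then IE.Quit: Exit Sub 'give up after 5 seconds, but still close IE
Dim resultsNodeList As Object, i As Long, arr() As String
Set resultsNodeList = IE.document.querySelectorAll("#tco_detail_data > li")
With ActiveSheet
For i = 0 To resultsNodeList.Length - 1
arr = Split(resultsNodeList(i).innerText, Chr$(10))
'Split("") returns an empty array (UBound = -1), so skip rows with no text
If UBound(arr) >= 0 Then .Cells(i + 1, 1).Resize(1, UBound(arr) + 1).Value = arr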
Next
End With
IE.Quit
End Sub
Result in sheet
Additional info:
The array part is needed because resultsNodeList(i).innerText returns a "stacked string", i.e. with line breaks in between (see image below). I split on those to produce an array, which I then write out to the sheet. The array is 0-based, so I have to add 1 to UBound to resize the range to the right number of columns.
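The split step can also be pulled into a small function. This is a minimal sketch, not the answer's code: SplitLines is a hypothetical name. It normalises CRLF and CR to LF before splitting, in case the innerText line breaks are CRLF (which would otherwise leave a trailing Chr$(13) on each item), and it drops blank lines so that every element maps to one cell.

```vba
Option Explicit

' Hypothetical helper: splits a "stacked" innerText (one value per line) into a
' 0-based String array, treating CRLF, CR and LF alike and skipping blank lines.
Public Function SplitLines(ByVal stacked As String) As Variant
    Dim parts() As String, result() As String
    Dim i As Long, n As Long

    stacked = Replace(stacked, vbCrLf, vbLf)
    stacked = Replace(stacked, vbCr, vbLf)
    parts = Split(stacked, vbLf)
    If UBound(parts) < 0 Then            ' Split("") gives an empty array
        SplitLines = Array()
        Exit Function
    End If

    ReDim result(0 To UBound(parts))
    For i = 0 To UBound(parts)
        If Len(Trim$(parts(i))) > 0 Then
            result(n) = Trim$(parts(i))
            n = n + 1
        End If
    Next i

    If n = 0 Then
        SplitLines = Array()
    Else
        ReDim Preserve result(0 To n - 1) ' keep only the non-blank values
        SplitLines = result
    End If
End Function
```

To use it in the loop above, declare arr As Variant (the function returns a Variant) and keep the UBound(arr) >= 0 check before writing the row.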
Apart from what QHarr has already shown, here is another way to achieve the same goal:
Sub Get_Information()
Dim IE As New InternetExplorer, HTML As HTMLDocument
Dim posts As Object, post As Object, oitem As Object
Dim R&, C&, B As Boolean
With IE
.Visible = False
.Navigate "https://www.edmunds.com/ford/escape/2017/cost-to-own/?zip=43215"
Do While .Busy = True Or .ReadyState <> 4: DoEvents: Loop
Set HTML = .Document
End With
'No hard-coded delay is required: the next line polls until the element exists (note that it has no timeout)
Do: Set oitem = HTML.getElementById("tco_detail_data"): DoEvents: Loop While oitem Is Nothing
For Each posts In oitem.getElementsByTagName("li")
C = 1: B = False
For Each post In posts.getElementsByTagName("li")
Cells(R + 1, C).Value = post.innerText
C = C + 1: B = True
Next post
If B Then R = R + 1
Next posts
IE.Quit
End Sub
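VBA has no list comprehension, but reshaping the one-value-per-line output from the question into pairs is a short loop. A minimal sketch, assuming the values have already been collected in reading order into a 1-D array (for example by the loop in the question); ToRows is a hypothetical name. It builds a 1-based 2-D array, so the whole block can be written to the sheet in a single assignment instead of cell by cell.

```vba
Option Explicit

' Hypothetical helper: reshapes a flat 1-D array into a 1-based 2-D array with
' perRow columns, filled row by row (the loop stands in for a list comprehension).
Public Function ToRows(ByVal values As Variant, ByVal perRow As Long) As Variant
    Dim n As Long, rowCount As Long, i As Long
    Dim out() As Variant

    n = UBound(values) - LBound(values) + 1
    If n < 1 Or perRow < 1 Then Exit Function   ' nothing to reshape: returns Empty
    rowCount = (n + perRow - 1) \ perRow         ' round up so a short last row is kept

    ReDim out(1 To rowCount, 1 To perRow)
    For i = 0 To n - 1
        ' value i goes to row (i \ perRow) + 1, column (i Mod perRow) + 1
        out(i \ perRow + 1, i Mod perRow + 1) = values(LBound(values) + i)
    Next i
    ToRows = out
End Function
```

Usage: r = ToRows(values, 2), then ActiveSheet.Range("A1").Resize(UBound(r, 1), UBound(r, 2)).Value = r writes the pairs as two columns.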
So far, when writing parsers in the VBE, I have used the img tag and its src attribute to scrape an image, but I got stuck on the portion pasted below: there is no img tag, so I can't isolate the part I need to parse the image.
Set topics = html.getElementsByClassName("card card-lg")
For i = 0 To topics.Length - 1
Set topic = topics(i)
Cells(x, 1).Value = topic.getElementsByClassName("wine-card__image-wrapper")(0).getElementsByTagName("img")(0).src
x = x + 1
Next i
And a sample of the HTML I'm working with:
<div class="wine-card__image-wrapper">
<a href="/wineries/tschida/wines/angerhof-eiswein-gruner-veltliner-2012">
<figure class="wine-card__image" style="background-image: url(//images.vivino.com/thumbs/qlER3oggQVKh1FZn7YGxZg_375x500.jpg)">
<div class="image-inner"></div>
</figure>
</a>
</div>
I believe you can access the style attribute, so:
Sub t()
Dim ie As SHDocVw.InternetExplorer
Dim d As MSHTML.HTMLDocument
Dim dv As MSHTML.HTMLDivElement
Dim ha As MSHTML.IHTMLElement
Set ie = New SHDocVw.InternetExplorer
ie.Visible = True
ie.navigate "https://www.vivino.com/explore?e=eJzLLbI11jNVy83MswWSiRW2RgZqyZW26Ulq5SXRsbaGAKA_Cdk%3D"
While ie.Busy Or ie.readyState <> READYSTATE_COMPLETE: DoEvents: Wend
Set d = ie.document
Set dv = d.getElementsByClassName("wine-card__image-wrapper")(0)
Set ha = dv.Children(0).Children(0) 'div > a > figure
Debug.Print ha.Style.backgroundImage 'prints the url(...) value from the figure's style attribute
End Sub
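The value printed above is the raw CSS, something of the form url(...). To get a usable address, the url(...) wrapper can be stripped with plain string functions. A minimal sketch; UrlFromCss is a hypothetical name, and prefixing protocol-relative addresses (those starting with //) with https: is an assumption about the site, not something the page states.

```vba
Option Explicit

' Hypothetical helper: extracts the address from a CSS value such as
' url(//host/path.jpg) or url("/path.png"); returns "" if there is no url(...).
Public Function UrlFromCss(ByVal cssValue As String) As String
    Dim s As String, p As Long

    s = Trim$(cssValue)
    p = InStr(1, s, "url(", vbTextCompare)
    If p = 0 Then Exit Function
    s = Mid$(s, p + 4)                          ' text after "url("
    p = InStr(s, ")")
    If p = 0 Then Exit Function
    s = Trim$(Left$(s, p - 1))                  ' text up to the closing ")"
    s = Replace(Replace(s, """", ""), "'", "")  ' drop optional quotes
    If Left$(s, 2) = "//" Then s = "https:" & s ' assumption: the site is served over https
    UrlFromCss = s
End Function
```

In the answer's code this would be used as Cells(x, 1).Value = UrlFromCss(ha.Style.backgroundImage).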
I'd like to extract the href from this particular class:
<tr class="even">
<td>
Serie A 2015/2016
</td>
This is what I wrote:
Sub ExtractHrefClass()
Dim ie As Object
Dim doc As HTMLDocument
Dim class As Object
Dim href As Object
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
ie.navigate Range("D8")
Do
DoEvents
Loop Until ie.readyState = READYSTATE_COMPLETE
Set doc = ie.document
Set class = doc.getElementsByClassName("even")
Set href = class.getElementsByTagName("a")
Range("E8").Value = href
ie.Quit
End Sub
But unfortunately I get the error "Object doesn't support this property or method" (Error 438) on the line:
Set href = class.getElementsByTagName("a")
UPDATE 1
I modified the code as per @RyszardJędraszyk's answer, but no output comes out. What am I doing wrong?
Sub ExtractHrefClass()
Dim ie As Object
Dim doc As HTMLDocument
Dim href As Object
Dim htmlEle As Object
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
ie.navigate Range("D8")
Do
DoEvents
Loop Until ie.readyState = READYSTATE_COMPLETE And ie.Busy = False
Set doc = ie.document
Set href = doc.getElementsByTagName("a")
For Each htmlEle In href
If htmlEle.className = "even" Then
Range("E8").Value = htmlEle
End If
Next
ie.Quit
End Sub
UPDATE 2
As @dee requested in a comment, here is the code from the web page http://www.soccer24.com/italy/serie-a/archive/
<tbody>
<tr>
<td>
Serie A 2016/2017
</td>
<td></td>
</tr>
<tr class="even">
<td>
Serie A 2015/2016
</td>
<td>
<span class="team-logo" style="background-image: url(/res/image/data/UZbZIMhM-bsGsveSt.png)"></span>Juventus
</td>
</tr>
<tr>
<td>
Serie A 2014/2015
</td>
<td>
<span class="team-logo" style="background-image: url(/res/image/data/UZbZIMhM-bsGsveSt.png)"></span>Juventus
</td>
</tr>
I only need to extract this href: /italy/serie-a-2015-2016/
This worked for me:
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "http://www.soccer24.com/italy/serie-a/archive/", False
.Send
MsgBox Split(Split(Split(.ResponseText, "<tr class=""even"">", 2)(1), "<a href=""", 2)(1), """", 2)(0)
End With
The procedure you need might look like:
Sub ExtractHrefClass()
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", Range("D8").Value, False
.Send
Range("E8").Value = Split(Split(Split(.ResponseText, "<tr class=""even"">", 2)(1), "<a href=""", 2)(1), """", 2)(0)
End With
End Sub
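Each Split(text, marker, 2)(1) in the answer keeps everything after the first occurrence of marker, and the final Split(..., """", 2)(0) keeps everything before the next double quote. If a marker is missing, indexing (1) raises run-time error 9 (subscript out of range). The sketch below does the same extraction with InStr and Mid$ and returns an empty string instead; HrefAfterEvenRow is a hypothetical name, and like the original it assumes the markup is written exactly as <tr class="even"> and <a href="...">.

```vba
Option Explicit

' Hypothetical helper: returns the href of the first <a href="..."> that follows
' the first <tr class="even"> in pageHtml, or "" if either marker is missing.
Public Function HrefAfterEvenRow(ByVal pageHtml As String) As String
    Const ROW_MARKER As String = "<tr class=""even"">"
    Const LINK_MARKER As String = "<a href="""
    Dim p As Long, q As Long

    p = InStr(1, pageHtml, ROW_MARKER, vbBinaryCompare)
    If p = 0 Then Exit Function
    p = InStr(p, pageHtml, LINK_MARKER, vbBinaryCompare)
    If p = 0 Then Exit Function
    p = p + Len(LINK_MARKER)                       ' first character of the URL
    q = InStr(p, pageHtml, """", vbBinaryCompare)  ' closing quote of the attribute
    If q = 0 Then Exit Function
    HrefAfterEvenRow = Mid$(pageHtml, p, q - p)
End Function
```

Inside the With CreateObject("MSXML2.XMLHTTP") block this becomes Range("E8").Value = HrefAfterEvenRow(.ResponseText), and an empty E8 signals that the markers were not found.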
Try:
Dim href As HTMLObjectElement
Make sure the proper library is checked in References (Microsoft HTML Object Library).
Are you sure that doc.getElementsByClassName("even") works? It is not listed as an available method here: https://msdn.microsoft.com/en-us/library/aa926433.aspx
I always use getElementsByTagName first and then add a condition: If htmlEle.className = "even" Then.
Also wait on both conditions: ie.readyState = READYSTATE_COMPLETE And ie.Busy = False. Even so, on an AJAX-based website this may not be enough to determine that the page has fully loaded (judging from the link, it could be flashscore.com, where you need to track elements on the page that report its loading status).
querySelectorAll or querySelector can be used here to select the anchor elements inside the tr with the specific class; the href attribute can then be retrieved with getAttribute("href"). HTH.
' Add reference to Microsoft Internet Controls (SHDocVw)
' Add reference to Microsoft HTML Object Library
Sub ExtractHrefClass()
Dim ie As Object
Dim Doc As HTMLDocument
Set ie = New InternetExplorer
ie.Visible = 1
ie.navigate "<URL>"
While ie.Busy Or ie.readyState <> 4
DoEvents
Wend
Set Doc = ie.document
Dim anchors As IHTMLDOMChildrenCollection
Dim anchor As IHTMLElement 'IHTMLElement exposes getAttribute
'every anchor inside a tr whose class list contains "even"
Set anchors = Doc.querySelectorAll("tr[class~='even'] a")
If Not anchors Is Nothing Then
If anchors.Length > 0 Then
Set anchor = anchors.item(0)
'write the href attribute itself (e.g. /italy/serie-a-2015-2016/), not the link text
Range("E8").Value = anchor.getAttribute("href")
End If
End If
ie.Quit
End Sub
I have read many answers to similar problems, but when I try to mimic what I see, I still can't do what I need.
The problem is very simple: fill an input box on an open IE page.
Result: the code gets stuck on the line with getelementbyid, showing run-time error 424 (Object required).
Private Sub AddInfoFromIntranet()
Dim ie As Object
Set ie = CreateObject("internetexplorer.application")
Application.SendKeys "{ESC}" ' I need this to ignore a prompt
With ie
.Visible = True
.navigate "{here goes the address of my website}"
Do Until Not .Busy And .readyState = 4
DoEvents
Loop
.document.getelementbyid("Nachnamevalue").Value = "{here goes what I want to insert}"
End With
Set ie = Nothing
End Sub
The Internet Explorer libraries are of course referenced (otherwise "internetexplorer.application" wouldn't work).
I am positive that the field I want to fill is called "Nachnamevalue", based on what I learned this morning looking around the internet.
The html code of my webpage (only the interesting piece) looks like this:
<!DOCTYPE html>
<html>
<head>
<title></title>
<style>
'{style info goes here, which I'm going to ignore}
</style>
</head>
<body bgcolor="#ffffcc"><table width="1000"><tbody><tr><td>
<form name="Suchform" action="index.cfm" method="get" target="bottom_window">
Nachname:
<select name="Nachnamepulldown" class="font09px" onchange="wait_and_search()">
<option value="BEGINS_WITH">beginnt mit
<option value="EQUAL">ist
<option value="CONTAINS">enthält
</option></select>
<input name="Nachnamevalue" onkeyup="wait_and_search()" type="text" size="8">
Abteilung:
<select name="Abteilungpulldown" class="font09px" onchange="wait_and_search()">
<option value="BEGINS_WITH">beginnt mit
<option value="EQUAL">ist
<option value="CONTAINS">enthält
</option></select>
<input name="Abteilungvalue" onkeyup="wait_and_search()" type="text" size="3">
<input name="fuseaction" type="hidden" value="StdSearchResult">
<input type="submit" value="suchen">
<script language="JavaScript" type="text/JavaScript">
document.Suchform.Nachnamevalue.focus();
</script>
</form>
</td></tr></tbody></table></body>
</html>
There is also (I don't know if it helps) an embedded JavaScript that brings up search results every time at least 2 characters are typed into the "Nachnamevalue" input box.
What am I doing wrong?
EDIT:
When I try to execute the Sub step-by-step, I get the following:
Set Doc = ie.document
? Doc
[object HTMLDocument]
(in the Watch window it is an object without any variables inside)
GetElementById gets an element by its id attribute, but "Nachnamevalue" is the value of the name attribute.
To use the name:
.document.Forms("Suchform").Elements("Nachnamevalue").value = "xxx"
This worked for me. The code uses the HTML from your question, saved in the file c:\Temp\page1.html.
Option Explicit
' Add reference to Microsoft Internet Controls
' Add reference to Microsoft HTML Object Library
Sub AddInfoFromIntranet()
Dim ie As SHDocVw.InternetExplorer
Dim doc As MSHTML.HTMLDocument
Dim elements As MSHTML.IHTMLElementCollection
Dim nachnameValueInput As MSHTML.HTMLInputElement
Set ie = New SHDocVw.InternetExplorer
With ie
.Visible = True
.navigate "c:\Temp\page1.html"
Do Until Not .Busy And .readyState = 4
DoEvents
Loop
Set doc = .document
Set elements = doc.getElementsByName("Nachnamevalue")
If Not elements Is Nothing Then
Set nachnameValueInput = elements(0)
If Not nachnameValueInput Is Nothing Then _
nachnameValueInput.Value = "{here goes what I want to insert}"
End If
.Quit
End With
Set ie = Nothing
End Sub
To check the names of all input elements that exist on the page at the moment you execute the VBA code, you could use getElementsByTagName("input"):
Set elements = doc.getElementsByTagName("input")
If Not elements Is Nothing Then
Dim inputElement
For Each inputElement In elements
Debug.Print inputElement.Name
Next inputElement
End If
You can try cycling through all input elements and selecting the one with the name you need:
Set Elements = IE.document.getelementsbytagname("Input")
For Each Element In Elements
If Element.Name = "Nachnamevalue" Then
Element.Value = "{here goes your value}"
Exit For
End If
Next Element
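The same loop can be wrapped in a function that reports whether the input was found, which makes it easier to tell a missing element (for example, because the page has not finished loading) apart from other errors. A minimal sketch; SetInputByName is a hypothetical name, and it is late bound, so it needs no references.

```vba
Option Explicit

' Hypothetical helper: sets the value of the first <input> whose name attribute
' equals inputName. Returns False when no such input exists, so the caller can
' react instead of failing later with error 424 (Object required).
Public Function SetInputByName(ByVal doc As Object, ByVal inputName As String, _
                               ByVal newValue As String) As Boolean
    Dim el As Object
    For Each el In doc.getElementsByTagName("input")
        If el.Name = inputName Then
            el.Value = newValue
            SetInputByName = True
            Exit Function
        End If
    Next el
    ' falls through with the default False when nothing matched
End Function
```

Usage: If Not SetInputByName(ie.document, "Nachnamevalue", "{here goes your value}") Then MsgBox "Input not found".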