How to pull data from a website with Excel VBA

I am trying to write VBA code to pull the price of a product from a website. To do this, I enabled the "Microsoft HTML Object Library" and "Microsoft Internet Controls" in the VBA References. However, when I get to the point of searching for the element that holds the price, the code fails. I would appreciate it if anyone could provide a solution.
Below is the link to the sample webpage that I would like to copy the price from.
Link
Below is my initial code:
Sub Update()
Dim IE As New InternetExplorer
IE.Visible = False
IE.navigate "http://www.chemistwarehouse.com.au/buy/36985/Reach-Dentotape-Waxed-20m"
Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
Dim Doc As HTMLDocument
Set Doc = IE.document
Dim getprice As String
getprice = Trim(Doc.getElementsByTagName("div class="Price" itemprop="price"").innerText)
Worksheets("Sheet1").Range(C1).Value = getprice
End Sub

The function getElementsByTagName() takes only a tag name as its parameter:
e.g. getElementsByTagName("div")
Try getElementsByClassName() instead, and index into the collection it returns:
getprice = Trim(Doc.getElementsByClassName("Price").Item(0).innerText)

There were a few issues with the above code.
Issue 1
getprice = Trim(Doc.getElementsByTagName("div class="Price" itemprop="price"").innerText)
This:
div class="Price" itemprop="price" isn't a tag name. Tag names are things like input, img, a, etc. However, the price element you are interested in has a class attribute, so we can select it by class name instead:
getprice = Trim(Doc.getElementsByClassName("Price")(0).innerText)
You may notice the (0) at the end of the element selection. It indicates which element of the "Price" class-name collection is being selected. getElementsByClassName returns a collection of elements, and the first element has index 0.
Issue 2
Worksheets("Sheet1").Range(C1).Value = getprice
I don't see C1 defined anywhere, so VBA treats it as an empty variable rather than a cell address. One way to reference a specific cell is to pass the address as a String. In your code this becomes:
Worksheets("Sheet1").Range("C1").Value = getprice
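Putting both fixes together, a corrected routine might look roughly like the sketch below. It assumes the page still exposes the price in an element with class "Price" and that both references from the question are enabled. The name UpdatePrice, the Length check and the IE.Quit call are my own additions, not part of the original code.

```vba
Sub UpdatePrice()
    ' Needs references: Microsoft HTML Object Library, Microsoft Internet Controls
    Dim IE As InternetExplorer
    Dim Doc As HTMLDocument
    Dim priceNodes As Object

    Set IE = New InternetExplorer
    IE.Visible = False
    IE.navigate "http://www.chemistwarehouse.com.au/buy/36985/Reach-Dentotape-Waxed-20m"

    ' Wait until the page has finished loading
    Do While IE.Busy Or IE.readyState <> READYSTATE_COMPLETE
        DoEvents
    Loop

    Set Doc = IE.document

    ' Select by class name; the result is a collection, so take element 0
    Set priceNodes = Doc.getElementsByClassName("Price")
    If priceNodes.Length > 0 Then
        Worksheets("Sheet1").Range("C1").Value = Trim(priceNodes.Item(0).innerText)
    Else
        MsgBox "No element with class 'Price' was found."
    End If

    IE.Quit
End Sub
```

Checking Length first avoids a run-time error if the page layout changes and the class is no longer present.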

Related

How to Click an href using an Excel VBA

I am very new to VBA and have a question about how to click an href link in Internet Explorer. There are multiple hrefs on the source page. I have never encountered this before and it has been giving me a hard time! I searched this website for answers but decided to ask here.
Below I have listed the code I have, up to the point where I encounter the problem, as well as the Source Code on Internet Explorer.
I commented out what I have tried and listed the error I received.
Code Below:
Sub ()
Dim i As Long
Dim URL As String
Dim IE As Object
Dim objElement As Object
Dim objCollection As Object
User = "User"
Pwd = "Pwd"
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
URL = "URL.com"
IE.Navigate URL
Do While IE.ReadyState <> 4
DoEvents
Loop
IE.Document.getElementById("txtUsername").Value = User
IE.Document.getElementById("txtPassword").Value = Pwd
IE.Document.getElementById("btnSubmit").Click
'IE.getElementByClassName("txtTerms").Click - Runtime Error 438
'IE.getElementByTagName("Claims Management").Click - Runtime Error 438
'Set HREF = IE.Document.getElementsByClassName("txtTerms")
'For Each HREF In IE.Document.getElementsByTagName("Claims").Click - No error occurs, nothing happens.
End Sub
Internet Explorer Source Code:
<table id="tblContent">
<tr>
<td class="txtTerms"><a href='href url 1'>Claims</a>
<br>Download<br>Create<br><a class='terms' href='href url 2'
target='terms'>Terms</a><br><br></td>
</tr>
My question would be, how to get VBA to click only on 'href url 1'?
Let me know if any additional information is needed. I apologize for my level of VBA but I am excited to learn more!
Thanks for the help!
In HTML, href is an attribute of the <a> (anchor) element that contains an absolute or relative path.
For example:
<a href="/questions/">Questions</a>
... will show as "Questions" and, if you click it, will take you to www.stackoverflow.com/questions/. Note that "www.stackoverflow.com" has been added automatically because the path is relative.
<a href="http://www.facebook.com">Facebook</a>
... will show as "Facebook" and, if you click it, will take you to www.facebook.com. In this case, the path is absolute.
Although your HTML code is incomplete, I guess that all the links you want to navigate are contained in the table having id="tblContent". If that's the case, then you can get all the links (tagName == 'a') in that table and store the values in a collection:
Dim allHREFs As New Collection
Set allLinks = IE.Document.getElementById("tblContent").getElementsByTagName("a")
For Each link In allLinks
allHREFs.Add link.href
Next link
You can then decide to navigate them and do what you have to do one by one:
For j = 1 To allHREFs.Count
IE.Navigate URL & allHREFs(j) '<-- I'm assuming hrefs are relative.
'do your stuff here
Next j
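If the goal is only to click the "Claims" link ('href url 1') rather than to visit every link, one option is to match on the link text. This is a rough sketch that assumes the table id "tblContent" and the visible text "Claims" from the posted source; ClickLinkByText is a hypothetical helper name.

```vba
' Clicks the first <a> inside #tblContent whose visible text equals linkText.
Sub ClickLinkByText(ByVal IE As Object, ByVal linkText As String)
    Dim link As Object

    For Each link In IE.Document.getElementById("tblContent").getElementsByTagName("a")
        If Trim(link.innerText) = linkText Then
            link.Click
            Exit For   ' stop after the first match
        End If
    Next link
End Sub
```

You would call it after the login page has finished loading, e.g. ClickLinkByText IE, "Claims". Matching on text avoids depending on the order of the links in the table.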

Clicking a button in IE using VBA

I am using Excel VBA to try click a button on a site, here's the code from the site using inspect element:
<button class="_ah57t _84y62 _frcv2 _rmr7s">ClickHere</button>
And here's what i'm doing in VBA:
Sub testcode()
Dim ie As InternetExplorer
Dim html As HTMLDocument
Set ie = New InternetExplorer
ie.Visible = True
ie.Navigate "somesite.com"
Do While ie.READYSTATE <> READYSTATE_COMPLETE
DoEvents
Loop
Dim e
Set e = ie.Document.getElementsByClassName("_ah57t _84y62 _frcv2 _rmr7s")
e.Click
End Sub
Using the debugger I found that the code seems to be storing something called "[object]" in the variable e and then gives a Runtime error '438' when it gets to e.Click. I have even tried using .Focus first, but get the same error. Any ideas?
The getElementsByClassName() function returns a collection, not a single element. You need to specify an index on the returned collection in order to get a single element. If there is only one element with that class, you can simply use:
ie.Document.getElementsByClassName("_ah57t _84y62 _frcv2 _rmr7s")(0).Click
The (0) specifies the index of the element within the collection returned for the class.
It's easy to tell whether a function returns a collection or a single element:
getElementBy... - Returns a single element.
getElementsBy... - Returns a collection of elements.
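Because the collection can be empty (for example, if the button has not been rendered yet), it can help to check its Length before indexing. A small sketch, where ClickFirstByClass is a hypothetical helper name of my own:

```vba
' Clicks the first element matching the given class name(s), if any exist.
Sub ClickFirstByClass(ByVal ie As Object, ByVal classNames As String)
    Dim matches As Object

    Set matches = ie.Document.getElementsByClassName(classNames)

    If matches.Length = 0 Then
        Debug.Print "No element found for class: " & classNames
    Else
        matches.Item(0).Click   ' index 0 = first element in the collection
    End If
End Sub
```

Usage would be something like ClickFirstByClass ie, "_ah57t _84y62 _frcv2 _rmr7s" once the page has loaded.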

VBA to change dropdown value in internet explorer

I am looking to automate internet explorer using Excel VBA to extract football results from a website and am really struggling with getting the data to update when I change the dropdown value.
The website is: http://www.whoscored.com/Regions/250/Tournaments/30/Seasons/3871/Stages/8209/Fixtures/Europe-UEFA-Europa-League-2013-2014
I am looking to change the value of the 'stages' dropdown and scrape the match results.
My code works fine for opening IE and changing the value of the 'stages' dropdown, but I can't get the data to update. Whilst I am comfortable with VBA, I know very little about HTML and JavaScript, although I can guess what some lines are doing. From what I can see there is JavaScript code that handles the change event; I just can't see how to get it to run. I have tried firing the 'onchange' event in my code, as suggested by my searches, but I can't get it to work.
This is the code I can see that controls the dropdown (I have deleted a lot of the values for the other dropdowns, as they made this post even longer):
<div id="breadcrumb-nav">
.
.
<span><select id="stages" name="stages"><option selected="selected" value="/Regions/250/Tournaments/30/Seasons/3871/Stages/8209">Europa League Group Stages</option>
<option value="/Regions/250/Tournaments/30/Seasons/3871/Stages/7816">Europa League Qualification</option>
<option value="/Regions/250/Tournaments/30/Seasons/3871/Stages/8158">Europa League Grp. A</option>
<option value="/Regions/250/Tournaments/30/Seasons/3871/Stages/8159">Europa League Grp. B</option>
.
.
<option value="/Regions/250/Tournaments/30/Seasons/3871/Stages/8466">Europa League</option>
</select></span>
</div>
<script type="text/javascript">
$('#breadcrumb-nav select').change(function () {
NG.GA.trackEvent('BreadcrumbNav', this.id);
window.location.href = this.value;
// TODO: Disable all selects?
});
</script>
my code:
Sub ScrapeData()
Dim ie As InternetExplorer
Dim URL As String
URL = "http://www.whoscored.com/Regions/250/Tournaments/30/Seasons/3871/Stages/8466/Fixtures/Europe-UEFA-Europa-League-2013-2014"
Set ie = New InternetExplorer
ie.Visible = True
ie.navigate (URL)
Do
DoEvents
Loop Until ie.readyState = 4
SelectValue ie, "/Regions/250/Tournaments/30/Seasons/3871/Stages/7816"
SelectValue ie, "/Regions/250/Tournaments/30/Seasons/3871/Stages/8209"
End Sub
Sub SelectValue(ByVal ie As InternetExplorer, ByVal value As String)
Dim htmlDoc As HTMLDocument
Dim ddStages As HTMLSelectElement
Dim idBreadCrumb As Object
Set htmlDoc = ie.document
With ie.document
Set idBreadCrumb = .getelementbyid("breadcrumb-nav")
Set ddStages = .getelementbyid("stages")
End With
ddStages.value = value
ddStages.FireEvent ("onchange")
'fireevent on ddStages didn't work so tried here too
idBreadCrumb.FireEvent ("onchange")
Do
DoEvents
Loop Until ie.readyState = 4
End Sub
Any help would be really appreciated.
The JavaScript that runs when the select element changes its value simply sets window.location.href to the selected value. In other words, it loads a different HTML page rather than updating elements within the same page. So my suggestion, which is much easier than triggering the JavaScript, is to navigate to that link directly.
For example, I would just replace this:
SelectValue ie, "/Regions/250/Tournaments/30/Seasons/3871/Stages/7816"
with this
ie.Navigate "http://www.whoscored.com" & "/Regions/250/Tournaments/30/Seasons/3871/Stages/7816"
to get exactly the same result.
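To cover several stages, the same idea can be wrapped in a loop. The sketch below assumes the option values from the posted HTML are still valid paths on the site; ScrapeStages and BASE_URL are names I made up, and the scraping step itself is left as a placeholder.

```vba
Sub ScrapeStages()
    Const BASE_URL As String = "http://www.whoscored.com"
    Dim ie As Object
    Dim stagePaths As Variant
    Dim i As Long

    ' Option values copied from the <select id="stages"> element in the question
    stagePaths = Array( _
        "/Regions/250/Tournaments/30/Seasons/3871/Stages/7816", _
        "/Regions/250/Tournaments/30/Seasons/3871/Stages/8209")

    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = True

    For i = LBound(stagePaths) To UBound(stagePaths)
        ' Navigate straight to the page the dropdown would have loaded
        ie.Navigate BASE_URL & stagePaths(i)
        Do While ie.Busy Or ie.readyState <> 4
            DoEvents
        Loop
        ' ... scrape the match results for this stage here ...
    Next i

    ie.Quit
End Sub
```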

Using MSXML to fetch data from website

I am trying to use the following code to geocode a bunch of cities from this website: mygeoposition.com but there seems to be some sort of issue and the variable 'Lat' in the following code always returns empty:
Sub Code()
Dim IE As MSXML2.XMLHTTP60
Set IE = New MSXML2.XMLHTTP60
IE.Open "GET", "http://mygeoposition.com/?q=Chuo-ku, Osaka", False
IE.send
While IE.ReadyState <> 4
DoEvents
Wend
Dim HTMLDoc As MSHTML.HTMLDocument
Dim htmlBody As MSHTML.htmlBody
Set HTMLDoc = New MSHTML.HTMLDocument
Set htmlBody = HTMLDoc.body
htmlBody.innerHTML = IE.responseText
Lat = HTMLDoc.getElementById("geodata-lat").innerHTML
IE.abort
End Sub
I have another code that uses the browser to do the same thing and it works fine with that but it gets quite slow. When I use this code with MSXML, it just doesn't work. Apologies I am a newbie with using VBA for data extraction from website. Please help.
The response contains no content in the geodata-lat element. It appears that client-side code fills in that data, so your request only sees the HTML that the server generated. I tried this out myself, and the geodata-lat section of the response is empty.
If you try an element that has content (geodata-kml-button), it does pull in a value ("Download KML file"). Ignore the ByteArrayToString() call, that was just me testing.
If they don't have a real API then I don't think you can get your data this way.
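A quick way to check this kind of thing yourself is to print what the server-generated HTML actually contains for the elements you care about. This is only a sketch: it reuses the URL and element ids mentioned above, assumes the Microsoft HTML Object Library reference is enabled, and CheckServerHtml is a name I made up.

```vba
Sub CheckServerHtml()
    Dim req As Object
    Dim html As MSHTML.HTMLDocument
    Dim el As Object
    Dim elementId As Variant

    Set req = CreateObject("MSXML2.XMLHTTP.6.0")
    req.Open "GET", "http://mygeoposition.com/?q=Chuo-ku,%20Osaka", False
    req.send   ' synchronous request, so no readyState loop is needed

    Set html = New MSHTML.HTMLDocument
    html.body.innerHTML = req.responseText

    ' Report whether each element exists and whether it has any text
    For Each elementId In Array("geodata-lat", "geodata-kml-button")
        Set el = html.getElementById(CStr(elementId))
        If el Is Nothing Then
            Debug.Print elementId & ": not present in the response"
        ElseIf Len(Trim(el.innerText)) = 0 Then
            Debug.Print elementId & ": present but empty"
        Else
            Debug.Print elementId & ": " & Trim(el.innerText)
        End If
    Next elementId
End Sub
```

An element that is present but empty in this output is most likely filled in later by script in the browser, which a plain HTTP request will never run.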

Excel with VBA - XmlHttp to use div

I am using Excel with VBA to open a page, extract some information and put it in my database. After some research, I figured out that opening IE takes more time and that the same thing can be achieved using XmlHTTP. I am using XmlHTTP to open a web page, as proposed in another question of mine. However, while using IE I was able to navigate through div tags. How can I accomplish the same with XmlHTTP?
If I use IE to open the page, I am doing something like below to navigate through multiple div elements.
Set openedpage1 = iedoc1.getElementById("profile-experience").getElementsbyClassName("title")
For Each div In openedpage1
---------
However, with XmlHttp, I am not able to do like below.
For Each div In html.getElementById("profile-experience").getElementsbyClassName("title")
I am getting an error as object doesn't support this property or method.
Take a look at this answer that I had posted for another question as this is close to what you're looking for. In summary, you will:
Create a Microsoft.xmlHTTP object
Use the xmlHTTP object to open your url
Load the response as XML into a DOMDocument object
From there you can get a set of XMLNodes, select elements, attributes, etc. from the DOMDocument
The XMLHttp object returns the contents of the page as a string in responseText. You will need to parse this string to find the information you need. Regex is an option but it will be quite cumbersome.
This page uses string functions (Mid, InStr) to extract information from the html-text.
It may be possible to create a DOMDocument from the retrieved HTML (I believe it is) but I haven't pursued this.
As mentioned in the answers above, put the .responseText into an HTMLDocument and then work with that object, e.g.
Option Explicit
Public Sub test()
Dim html As HTMLDocument
Set html = New HTMLDocument
With CreateObject("WINHTTP.WinHTTPRequest.5.1")
.Open "GET", "http://www.someurl.com", False
.send
html.body.innerHTML = .responseText
End With
Dim aNodeList As Object, iItem As Long
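Set aNodeList = html.querySelectorAll("#profile-experience .title")
With ActiveSheet
For iItem = 0 To aNodeList.Length - 1
.Cells(iItem + 1, 1) = aNodeList.item(iItem).innerText
'.Cells(iItem + 1, 1) = aNodeList(iItem).innerText '<== or potentially this syntax
Next iItem
End With
End Sub
Note:
I have literally translated your getElementById("profile-experience").getElementsbyClassName("title") into a CSS selector, querySelectorAll("#profile-experience .title"), so I am assuming that your original selection was correct. The space in the selector matters: it means "elements with class title inside the element with id profile-experience", whereas "#profile-experience.title" without the space would match only an element that has both that id and that class.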