VBA web scraping contents without class name or ID - vba

I would like to scrape Dividend Future Prices from HKEX.
Here's the URL of this site :
http://www.hkex.com.hk/Market-Data/Futures-and-Options-Prices/Equity-Index/HSCEI-Dividend-Futures?sc_lang=en#&product=DHH
I want to scrape the Prev.Day Settlement Price of the "Dec-19" contract via VBA.
However, the cell doesn't have any class name or id, so I have no idea how to access the information.
<tr>
<td>Dec-19</td>
<td>-</td>
<td>-</td>
<td>413.78</td>
<td>
-
<br>
-
</td>
<td>-</td>
<td>
-
<br>
-
</td>
<td>-</td>
<td>17,330</td>
</tr>
How can I scrape this via VBA?

Finding a specific item is awkward when nothing distinctive is attached to it. However, I've written this script without hard-coding any index into the elements. Give it a shot and see whether you get your desired values:
Sub Hkex_Data()
    Dim IE As New InternetExplorer, html As HTMLDocument
    Dim posts As Object, Row As Long

    With IE
        .Visible = False
        .navigate "http://www.hkex.com.hk/Market-Data/Futures-and-Options-Prices/Equity-Index/HSCEI-Dividend-Futures?sc_lang=en#&product=DHH"
        ' Wait for the initial load; DoEvents keeps Excel responsive
        Do Until .readyState = READYSTATE_COMPLETE: DoEvents: Loop
        Set html = .document
    End With

    ' Give the page's scripts time to fill in the price table
    Application.Wait (Now + TimeValue("0:00:05"))

    ' From each "hsirowcon" element, step two siblings forward and
    ' read that node's first and last child
    For Each posts In html.getElementsByClassName("hsirowcon")
        Row = Row + 1
        Cells(Row, 1) = posts.NextSibling.NextSibling.FirstChild.innerText
        Cells(Row, 2) = posts.NextSibling.NextSibling.LastChild.innerText
    Next posts
    IE.Quit
End Sub
Result:
19-Dec 17,330
References to add (Tools > References):
Microsoft Internet Controls
Microsoft HTML Object Library

Use getElementsByTagName. Identify your table and then go through each row and each td in the rows. Something like this:
Dim objTR As Object
Dim objTD As Object
Dim objTable As Object
' Set objTable to the table element you identified before looping
For Each objTR In objTable.getElementsByTagName("tr")
    For Each objTD In objTR.getElementsByTagName("td")
        'do something with objTD.innerText
    Next objTD
Next objTR
The variables are declared as Object (late binding) so the calls resolve at run time; use early-bound types from the Microsoft HTML Object Library if you prefer.

You could also simply use a CSS selector and no loop:
html.querySelectorAll("td:nth-child(4)")(1).innerText
This method is fragile. If the style on the page changes this may break.
CSS selector:
Each contract year is one table row on the page; the HTML for contract year 2019 is the Dec-19 row shown in the question.
Prev.Day Settlement Price is the 4th td within this row, i.e. CSS selector td:nth-child(4).
This pattern is repeated for all contract years so you can return a nodeList of all matches to this (i.e. every td:nth-child(4) with the .querySelectorAll method).
Year 2019 is at index position 1; this is the second element in a 0 based indexed nodeList, so you access with .querySelectorAll("td:nth-child(4)")(1).
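To see how the selector and the 0-based index behave without opening the live page, here is a minimal sketch that runs the same query against a small in-memory table. The table contents are made up for illustration (only the Dec-19 value comes from the question), the Sub name is hypothetical, and it assumes a reference to the Microsoft HTML Object Library. The index that matches a given contract on the real page depends on how many td:nth-child(4) cells precede it there.

```vba
Option Explicit

' Requires a reference to "Microsoft HTML Object Library".
Public Sub NthChildSketch()
    Dim html As HTMLDocument
    Dim nodes As Object
    Dim i As Long

    Set html = New HTMLDocument

    ' Hypothetical two-row table standing in for the live page;
    ' the first row is placeholder data, not real prices
    html.body.innerHTML = _
        "<table>" & _
        "<tr><td>Dec-18</td><td>-</td><td>-</td><td>400.00</td></tr>" & _
        "<tr><td>Dec-19</td><td>-</td><td>-</td><td>413.78</td></tr>" & _
        "</table>"

    ' Every td that is the 4th child of its parent row
    Set nodes = html.querySelectorAll("td:nth-child(4)")

    ' List each match with its 0-based position in the nodeList
    For i = 0 To nodes.Length - 1
        Debug.Print i, nodes.Item(i).innerText
    Next i
End Sub
```

Printing every match with its index first is a quick way to confirm which position holds the value you want before hard-coding it.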

Related

Find element by...(."<table class = "table table-bordered table-condensed table-striped text-center table-hover">") in selenium

I'm using Selenium.Driver to find an element by class that should return a specific table within a webpage.
I have succeeded in extracting the entire page with the find-by-tag method. My challenge now is to return ONLY the table within the page, but the table's class is a "compound name", which is not supported in Selenium.
I've tried both the .xpath and the .css methods without success. My failure could be the result of using the wrong expressions.
My code:
Set HTMLTables = HTMLDoc.FindElementsByTag("table")
' The above code returns all elements within the entire page.
' Instead of finding elements by "table" tag,
' I wanna FindElement(s)By...("table table-bordered table-condensed table-striped text-center table-hover")
' The given code shall return ONLY the TABLE from within the entire page.
Here's an update of my question, I've added both the micro and the targeted html page. The url link is also posted.
url link: https://portalseven.com/lottery/southafrica_powerball_winning_numbers.jsp?viewType=2&timeRange=3
To match the full compound class, use an XPath test on the class attribute:
Dim Table As Selenium.WebElement
Set Table = driver.FindElementByXPath("//*[@class='table table-bordered table-condensed table-striped text-center table-hover']")
If you are looking for part of the compound class you can also use
Dim FindBy As New Selenium.By
If Not driver.IsElementPresent(FindBy.Class("table-condensed"), 3000) Then
driver.Quit
Exit Sub
Else
' do something ...
Set Table = driver.FindElement(FindBy.Class("table-condensed"))
End If
Or
Dim FindBy As New Selenium.By
If Not driver.IsElementPresent(FindBy.Css(".table-bordered"), 3000) Then
driver.Quit
Exit Sub
Else
' do something ...
Set Table = driver.FindElement(FindBy.Css(".table-bordered"))
End If
The problem in your code is the Set Table = ... line. Compare the Set Table line below with yours.
I tested this procedure in Excel 2007 and it works!
Sub Selenium_FindElementByClass_Compound()
Dim driver As New WebDriver
Dim myUrl As String
Dim Table As Selenium.WebElement
' Set URL
myUrl = "https://portalseven.com/lottery/southafrica_powerball_winning_numbers.jsp?viewType=2&timeRange=3"
' Open chrome
driver.Start "Chrome"
' Navigate to Url
driver.Get myUrl
Application.Wait Now + TimeValue("00:00:05")
' Find table
Set Table = driver.FindElementByXPath("//*[@class='table table-bordered table-condensed table-striped text-center table-hover']")
' Copy table to Excel
Table.AsTable.ToExcel ThisWorkbook.Worksheets.Add.Range("A1")
End Sub

How to Click an href using an Excel VBA

I am very new to VBA and had a question regarding how to click an href link in Internet Explorer. There are multiple href's on the source page. I have never encountered this and it has been giving me a hard time! I have looked on this website searching for answers but decided to ask here.
Below I have listed the code I have, up to the point where I encounter the problem, as well as the Source Code on Internet Explorer.
I commented out what I have tried and listed the error I received.
Code Below:
Sub ()
Dim i As Long
Dim URL As String
Dim IE As Object
Dim objElement As Object
Dim objCollection As Object
User = "User"
Pwd = "Pwd"
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
URL = "URL.com"
IE.Navigate URL
Do While IE.ReadyState <> 4
DoEvents
Loop
IE.Document.getElementById("txtUsername").Value = User
IE.Document.getElementById("txtPassword").Value = Pwd
IE.Document.getElementById("btnSubmit").Click
'IE.getElementByClassName("txtTerms").Click - Runtime Error 438
'IE.getElementByTagName("Claims Management").Click - Runtime Error 438
'Set HREF = IE.Document.getElementsByClassName("txtTerms")
'For Each HREF In IE.Document.getElementsByTagName("Claims").Click - No error occurs, nothing happens.
End Sub
Internet Explorer Source Code:
<table id="tblContent">
<tr>
<td class="txtTerms"><a href='href url 1'>Claims</a>
<br>Download<br>Create<br><a class='terms' href='href url 2'
target='terms'>Terms</a><br><br></td>
</tr>
My question would be, how to get VBA to click only on 'href url 1'?
Let me know if any additional information is needed. I apologize for my level of VBA but I am excited to learn more!
Thanks for the help!
In HTML, href is an attribute of the <a> (anchor) element; it contains an absolute or relative path.
For example:
<a href="/questions/">Questions</a>
... will show as "Questions" and, if you click it, will bring you to www.stackoverflow.com/questions/. Note that "www.stackoverflow.com" has been added automatically since the path is relative.
<a href="http://www.facebook.com">Facebook</a>
... will show as "Facebook" and, if you click it, will bring you to www.facebook.com. In this case, the path is absolute.
Although your HTML code is incomplete, I guess that all the links you want to navigate are contained in the table having id="tblContent". If that's the case, then you can get all the links (tagName == 'a') in that table and store the values in a collection:
Dim allHREFs As New Collection
Set allLinks = IE.Document.getElementById("tblContent").getElementsByTagName("a")
For Each link In allLinks
allHREFs.Add link.href
Next link
You can then decide to navigate them and do what you have to do one by one:
For j = 1 To allHREFs.Count
    IE.Navigate URL & allHREFs(j) '<-- I'm assuming hrefs are relative.
    'do your stuff here
Next j
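If the goal is only to click the first link ('href url 1'), a shorter route is to take the first <a> inside the table and click it directly. The following is a sketch under assumptions: the Sub name and its ie parameter are hypothetical, ie is taken to be the already logged-in InternetExplorer object from the question, and the page containing tblContent is assumed to have finished loading after the btnSubmit click.

```vba
' ie is assumed to be the InternetExplorer object from the question,
' already logged in and showing the page that contains tblContent.
Public Sub ClickFirstLinkInTable(ByVal ie As Object)
    Dim links As Object

    ' All anchors inside the table with id="tblContent"
    Set links = ie.Document.getElementById("tblContent").getElementsByTagName("a")

    ' Index 0 is the first <a> in document order, i.e. the "Claims" link
    If links.Length > 0 Then
        links.Item(0).Click
    End If
End Sub
```

It would be called as ClickFirstLinkInTable IE once the wait loop after the login has completed. If the link order on the page can change, looping over links and testing innerText = "Claims" is a safer way to pick the element.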

How to pull data from a website with Excel VBA

I am trying to write VBA code to pull the price of a product from a website. In order to do this, I turned on "Microsoft HTML Object Library" and "Microsoft Internet Controls" in the VBA References. However, when I get to the point of searching for the tag of the item that holds the price, the code fails. I would appreciate it if anyone could provide a solution.
The sample page I would like to copy the price from is the URL used in the code below.
Below is my initial code:
Sub Update()
Dim IE As New InternetExplorer
IE.Visible = False
IE.navigate "http://www.chemistwarehouse.com.au/buy/36985/Reach-Dentotape-Waxed-20m"
Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
Dim Doc As HTMLDocument
Set Doc = IE.document
Dim getprice As String
getprice = Trim(Doc.getElementsByTagName("div class="Price" itemprop="price"").innerText)
Worksheets("Sheet1").Range(C1).Value = getprice
End Sub
The function getElementsByTagName() requires a tag name only as parameter:
e.g. getElementsByTagName("div")
Try getElementsByClassName() instead:
getprice = Trim(Doc.getElementsByClassName("Price").Item(0).innerText)
There were a few issues with the above code.
Issue 1
getprice = Trim(Doc.getElementsByTagName("div class="Price" itemprop="price"").innerText)
This:
div class="Price" itemprop="price" isn't a TagName. TagNames are things like Input, IMG, Anchors, etc. However, we can see the Class attribute for the price element you are interested in. We can change how we select this element by doing:
getprice = Trim(Doc.getElementsByClassName("Price")(0).innerText)
You may notice (0) at the end of the element selection. This is to indicate which element is being selected of the Price ClassName collection. getElementsByClassName returns multiple elements, the first element being 0.
Issue 2
Worksheets("Sheet1").Range(C1).Value = getprice
I don't see C1 defined anywhere. Without quotes, VBA treats C1 as a variable rather than a cell address. One way to reference a specific cell is to use a String to represent the range. From your code this becomes:
Worksheets("Sheet1").Range("C1").Value = getprice
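Putting both fixes together, the procedure might look like the sketch below. It assumes the two references named in the question are enabled and that the page still marks the price with class "Price". The Sub name, the Length guard and the IE.Quit call are additions for illustration and are not part of the original code.

```vba
Option Explicit

' Requires references: Microsoft Internet Controls, Microsoft HTML Object Library
Public Sub UpdatePrice()
    Dim IE As New InternetExplorer
    Dim Doc As HTMLDocument
    Dim prices As Object

    IE.Visible = False
    IE.navigate "http://www.chemistwarehouse.com.au/buy/36985/Reach-Dentotape-Waxed-20m"

    ' Wait until the page has loaded
    Do
        DoEvents
    Loop Until IE.readyState = READYSTATE_COMPLETE

    Set Doc = IE.document

    ' Select by class name (Issue 1), not by tag name
    Set prices = Doc.getElementsByClassName("Price")

    ' Only write to the sheet if at least one matching element exists
    If prices.Length > 0 Then
        ' "C1" is passed as a String (Issue 2)
        Worksheets("Sheet1").Range("C1").Value = Trim(prices.Item(0).innerText)
    End If

    IE.Quit
End Sub
```

Checking Length before indexing avoids a run-time error when the class is missing, for example if the page layout changes.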

click link using getelementsbyclassname vba 7 using access 2007

I have a connection to an internet explorer webpage but i cannot seem to "click" a specific link using vba7 in access. I've tried loads of options like getelementsbyclassname. Here's the source code from the webpage:
<tr>
<td valign=bottom class="formbutton" align=center colspan=2>
view
</td>
</tr>
As you can see it has no name nor a usable hyperlink and the purpose of this link is to submit the filled-out form and give it a unique number on a new web page.
Can you please help me in VBA 7 for Access 2007 to come up with a vba code that works?
Many thanks!
The following should work assuming you have a pointer to the InternetExplorer object already.
Public Sub ClickElement()
'Assuming you already have an IE object
Dim Elements As Object
Dim Element As Object
'get all 'A' tags, which the hyperlink is part of
Set Elements = IE.Document.getElementsByTagName("a")
For Each Element In Elements
'I believe the element you want to click appears as 'View' on screen;
'the comparison is case-insensitive because the page source shows "view"
If LCase$(Trim$(Element.innerText)) = "view" Then
'Usually a good idea to give the element focus and
'possibly fire the OnClick event, but sometimes not needed
Element.Focus
Element.Click
Element.FireEvent ("OnClick")
Exit For
End If
Next
End Sub

Excel with VBA - XmlHttp to use div

I am using Excel with VBA to open a page, extract some information and put it in my database. After some research, I found that opening IE takes more time and that the same can be achieved using XmlHTTP. I am using XmlHTTP to open a web page as proposed in another question of mine. However, while using IE I was able to navigate through div tags. How can I accomplish the same with XmlHTTP?
If I use IE to open the page, I am doing something like below to navigate through multiple div elements.
Set openedpage1 = iedoc1.getElementById("profile-experience").getElementsbyClassName("title")
For Each div In openedpage1
---------
However, with XmlHttp, I am not able to do like below.
For Each div In html.getElementById("profile-experience").getElementsbyClassName("title")
I am getting an error as object doesn't support this property or method.
Take a look at this answer that I had posted for another question as this is close to what you're looking for. In summary, you will:
Create a Microsoft.XMLHTTP object
Use the XMLHTTP object to open your url
Load the response as XML into a DOMDocument object
From there you can get a set of XMLNodes, select elements, attributes, etc. from the DOMDocument
The XMLHttp object returns the contents of the page as a string in responseText. You will need to parse this string to find the information you need. Regex is an option but it will be quite cumbersome.
This page uses string functions (Mid, InStr) to extract information from the html-text.
It may be possible to create a DOMDocument from the retrieved HTML (I believe it is) but I haven't pursued this.
As mentioned in the answers above, put the .responseText into an HTMLDocument and then work with that object, e.g.
Option Explicit
Public Sub test()
Dim html As HTMLDocument
Set html = New HTMLDocument
With CreateObject("WINHTTP.WinHTTPRequest.5.1")
.Open "GET", "http://www.someurl.com", False
.send
html.body.innerHTML = .responseText
End With
Dim aNodeList As Object, iItem As Long
Set aNodeList = html.querySelectorAll("#profile-experience .title")
With ActiveSheet
For iItem = 0 To aNodeList.Length - 1
.Cells(iItem + 1, 1) = aNodeList.item(iItem).innerText
'.Cells(iItem + 1, 1) = aNodeList(iItem).innerText '<== or potentially this syntax
Next iItem
End With
End Sub
Note:
I have translated your getElementById("profile-experience").getElementsbyClassName("title") into the CSS selector "#profile-experience .title". The space is a descendant combinator: it matches elements of class title inside the element with that id, whereas "#profile-experience.title" without the space would match only an element that has both the id and the class. I am assuming that your original element selection was correct.