I'm trying to scrape data, such as real-time prices (e.g. German 5 YR Bobl, US 30Y T-Bond), from the website http://uk.investing.com/rates-bonds/financial-futures via VBA. I have tried an Excel web query, but it scrapes the whole page, and I would like to scrape the rate only. Is there a way of doing this?
There are several ways of doing this. I am writing this answer hoping that anyone searching for the keywords "scraping data from website" will find all the basics of Internet Explorer automation here, but remember that nothing is worth as much as your own research (if you don't want to be stuck with pre-written code that you're not able to customize).
Please note that this is only one way, and not the one I prefer in terms of performance (since it depends on the browser's speed), but it is a good way to understand the rationale behind Internet automation.
1) If I need to browse the web, I need a browser! So I create an Internet Explorer browser:
Dim appIE As Object
Set appIE = CreateObject("internetexplorer.application")
2) I ask the browser to browse the target webpage. With the property ".Visible", I decide whether I want to see the browser doing its job. While building the code it's nice to have Visible = True, but once the code is working it's nicer not to see the browser every time, so Visible = False.
With appIE
.Navigate "http://uk.investing.com/rates-bonds/financial-futures"
.Visible = True
End With
3) The webpage will need some time to load, so I will wait while the browser is busy...
Do While appIE.Busy
DoEvents
Loop
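As a hedged variation on the wait loop above: Busy alone can turn False before the document is fully parsed, so a common pattern is to also check readyState and to give up after a timeout. This is only a sketch; WaitForPage and timeoutSeconds are hypothetical names that do not appear in the original answer.

```vba
' Sketch: wait until IE reports the page as complete, or give up after a timeout.
' WaitForPage and timeoutSeconds are illustrative names, not part of the original code.
Public Function WaitForPage(ByVal ie As Object, Optional ByVal timeoutSeconds As Long = 30) As Boolean
    Const READYSTATE_COMPLETE As Long = 4
    Dim started As Date
    started = Now
    Do While ie.Busy Or ie.readyState <> READYSTATE_COMPLETE
        DoEvents                                   'keep Excel responsive while waiting
        If DateDiff("s", started, Now) >= timeoutSeconds Then
            Exit Function                          'timed out: the function returns False
        End If
    Loop
    WaitForPage = True
End Function
```

It could then be called as If Not WaitForPage(appIE) Then MsgBox "Page did not load in time" before touching appIE.document.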
4) Well, now the page is loaded. Let's say that I want to scrape the change of the US30Y T-Bond:
What I will do is press F12 in Internet Explorer to see the webpage's code; then, using the pointer (in the red circle), I will click on the element that I want to scrape to see how I can reach it.
5) What I should do is straightforward. First of all, I will use the ID property to get the tr element that contains the value:
Set allRowOfData = appIE.document.getElementById("pair_8907")
Here I will get a collection of td elements (specifically, tr is a row of data, and the td elements are its cells). We are looking for the 8th, so I will write:
Dim myValue As String: myValue = allRowOfData.Cells(7).innerHTML
Why did I write 7 instead of 8? Because the collection of cells starts from 0, so the index of the 8th element is 7 (8-1). Briefly analysing this line of code:
.Cells() makes me access the td elements;
innerHTML is the property of the cell containing the value we look for.
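If the row ID changes or the page has not finished loading, getElementById returns Nothing and the line above raises run-time error 91. A minimal defensive sketch follows; ReadCellText is a hypothetical helper, and it reads innerText rather than innerHTML so that any nested markup is dropped.

```vba
' Hypothetical helper: returns the text of the zero-based cell at cellIndex,
' or an empty string when the row or the cell is missing.
Public Function ReadCellText(ByVal tableRow As Object, ByVal cellIndex As Long) As String
    If tableRow Is Nothing Then Exit Function                    'row not found on the page
    If cellIndex < 0 Or cellIndex >= tableRow.Cells.Length Then Exit Function
    ReadCellText = Trim$(tableRow.Cells(cellIndex).innerText)
End Function
```

Usage, assuming the same row ID as above: myValue = ReadCellText(appIE.document.getElementById("pair_8907"), 7).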
Once we have our value, which is now stored in the myValue variable, we can close the IE browser and release the memory by setting it to Nothing:
appIE.Quit
Set appIE = Nothing
Well, now you have your value and you can do whatever you want with it: put it into a cell (Range("A1").Value = myValue), or into a label on a form (Me.Label1.Caption = myValue).
I'd just like to point out that this is not how Stack Overflow works: here you post questions about specific coding problems, but you should do your own research first. The reason I'm answering a question that doesn't show much research effort is that I have seen it asked several times and, thinking back to when I learned how to do this, I remember that I would have liked some better support to get started. So I hope that this answer, which is just a "study input" and not at all the best or most complete solution, can help the next user who has the same problem. I learned how to program thanks to this community, and I like to think that you and other beginners might use my input to discover the beautiful world of programming.
Enjoy your practice ;)
Other methods were mentioned so let us please acknowledge that, at the time of writing, we are in the 21st century. Let's park the local bus browser opening, and fly with an XMLHTTP GET request (XHR GET for short).
Wiki moment:
XHR is an API in the form of an object whose methods transfer data
between a web browser and a web server. The object is provided by the
browser's JavaScript environment
It's a fast method for retrieving data that doesn't require opening a browser. The server response can be read into an HTMLDocument and the process of grabbing the table continued from there.
Note that javascript rendered/dynamically added content will not be retrieved as there is no javascript engine running (which there is in a browser).
In the below code, the table is grabbed by its id cr1.
In the helper sub, WriteTable, we first loop over the column headers (th tags), then over the table rows (tr tags), and finally traverse each table row, table cell by table cell (td tags). As we only want data from columns 1 and 8, a Select Case statement is used to specify what is written out to the sheet.
Sample webpage view:
Sample code output:
VBA:
Option Explicit
Public Sub GetRates()
Dim html As HTMLDocument, hTable As HTMLTable '<== Tools > References > Microsoft HTML Object Library
Set html = New HTMLDocument
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://uk.investing.com/rates-bonds/financial-futures", False
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT" 'to deal with potential caching
.send
html.body.innerHTML = .responseText
End With
Application.ScreenUpdating = False
Set hTable = html.getElementById("cr1")
WriteTable hTable, 1, ThisWorkbook.Worksheets("Sheet1")
Application.ScreenUpdating = True
End Sub
Public Sub WriteTable(ByVal hTable As HTMLTable, Optional ByVal startRow As Long = 1, Optional ByVal ws As Worksheet)
Dim tSection As Object, tRow As Object, tCell As Object, tr As Object, td As Object, r As Long, C As Long, tBody As Object
r = startRow: If ws Is Nothing Then Set ws = ActiveSheet
With ws
Dim headers As Object, header As Object, columnCounter As Long
Set headers = hTable.getElementsByTagName("th")
For Each header In headers
columnCounter = columnCounter + 1
Select Case columnCounter
Case 2
.Cells(startRow, 1) = header.innerText
Case 8
.Cells(startRow, 2) = header.innerText
End Select
Next header
startRow = startRow + 1
Set tBody = hTable.getElementsByTagName("tbody")
For Each tSection In tBody
Set tRow = tSection.getElementsByTagName("tr")
For Each tr In tRow
r = r + 1
Set tCell = tr.getElementsByTagName("td")
C = 1
For Each td In tCell
Select Case C
Case 2
.Cells(r, 1).Value = td.innerText
Case 8
.Cells(r, 2).Value = td.innerText
End Select
C = C + 1
Next td
Next tr
Next tSection
End With
End Sub
You can use the WinHttpRequest object instead of Internet Explorer. It retrieves only the page's HTML, without downloading the pictures and advertisements that make the Internet Explorer object heavy by comparison.
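A minimal sketch of that suggestion, with a few assumptions: FetchHtml and PrintRatesTable are hypothetical names, the table id cr1 is borrowed from the XMLHTTP answer above, and the Microsoft HTML Object Library reference is set. As with any plain HTTP request, content added by JavaScript will not be present in the response.

```vba
' Hypothetical helper: returns the raw HTML of url, or "" on a non-200 response.
Public Function FetchHtml(ByVal url As String) As String
    Dim http As Object
    Set http = CreateObject("WinHttp.WinHttpRequest.5.1")
    http.Open "GET", url, False                'False = synchronous request
    http.send
    If http.Status = 200 Then FetchHtml = http.responseText
End Function

Public Sub PrintRatesTable()
    Dim html As HTMLDocument                   'Tools > References > Microsoft HTML Object Library
    Dim hTable As Object
    Dim pageSource As String

    pageSource = FetchHtml("https://uk.investing.com/rates-bonds/financial-futures")
    If Len(pageSource) = 0 Then Exit Sub       'request failed; nothing to parse

    Set html = New HTMLDocument
    html.body.innerHTML = pageSource           'parse the response without opening a browser
    Set hTable = html.getElementById("cr1")    'id assumed from the earlier answer
    If Not hTable Is Nothing Then Debug.Print hTable.innerText
End Sub
```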
This question was asked a long time ago, but I thought the following information would be useful for newbies. You can easily get the values by class name, like this.
Sub ExtractLastValue()
Set objIE = CreateObject("InternetExplorer.Application")
objIE.Top = 0
objIE.Left = 0
objIE.Width = 800
objIE.Height = 600
objIE.Visible = True
objIE.Navigate ("https://uk.investing.com/rates-bonds/financial-futures/")
Do
DoEvents
Loop Until objIE.readystate = 4
MsgBox objIE.document.getElementsByClassName("pid-8907-last")(0).innerText
End Sub
And if you are new to web scraping please read this blog post.
Web Scraping - Basics
There are also various other techniques for extracting data from web pages. This article explains a few of them with examples.
Web Scraping - Collecting Data From a Webpage
I modified a few things that were raising errors for me and ended up with this, which worked great for extracting the data I needed:
Sub get_data_web()
Dim appIE As Object
Set appIE = CreateObject("internetexplorer.application")
With appIE
.navigate "https://finance.yahoo.com/quote/NQ%3DF/futures?p=NQ%3DF"
.Visible = True
End With
Do While appIE.Busy
DoEvents
Loop
Set allRowofData = appIE.document.getElementsByClassName("Ta(end) BdT Bdc($c-fuji-grey-c) H(36px)")
Dim i As Long
Dim myValue As String
Count = 1
For Each itm In allRowofData
For i = 0 To 4
myValue = itm.Cells(i).innerText
ActiveSheet.Cells(Count, i + 1).Value = myValue
Next
Count = Count + 1
Next
appIE.Quit
Set appIE = Nothing
End Sub
This isn't an important task, but it is one that I thought would be easy and has instead been very frustrating. I'm trying to grab the current print counter values from our MFP but I am not able to get past the second Frame on the page.
In fact, it seems to continually loop me back to the top as I try to go "deeper" into the page.
The data I am after is stored in the "contents" frame.
HTML Source - Nested Frames
Sub Copy_Count()
Dim IE As InternetExplorerMedium
Dim strURL() As String
Dim HTML_Doc As HTMLDocument
Dim HTML_Doc2 As HTMLDocument
' There will be additional machines to retrieve data from, current copier must navigate to main page first before counter.
strURL = Split("http://192.168.50.26/?MAIN=DEVICE,http://192.168.50.26/?MAIN=COUNTER&SUB=TOTAL", ",")
Set IE = New InternetExplorerMedium
IE.Navigate2 strURL(0)
Do
DoEvents
Loop Until IE.ReadyState = READYSTATE_COMPLETE
IE.Navigate2 strURL(1)
Do
DoEvents
Loop Until IE.ReadyState = READYSTATE_COMPLETE
Set HTML_Doc = IE.Document
' This gets me to "TopLevelFrame"
Debug.Print HTML_Doc.getElementsByTagName("frameset")(0).getElementsByTagName("frame")(0).Document.Body.innerHTML
' Assign the document of the TopLevelFrame
Set HTML_Doc2 = HTML_Doc.getElementsByTagName("frameset")(0).getElementsByTagName("frame")(0).Document
Debug.Print HTML_Doc2.getElementById("TotalFullColor")
Debug.Print HTML_Doc.getElementById("TotalFullColor").innerText
Debug.Print IE.Document.GetElementsByID("TotalFullColor")(1)
End Sub
Windows 10 Pro, IE 11, Office Pro Plus 2016
I am trying to scrape all the tables, from the first page to the last, from this webpage.
Using the code below I can scrape the table contents of page 1, but I don't know how to modify the code to get the data from the first page through to the last.
Option Explicit
Sub NBAStats()
Dim IE As Object, obj As Object
Dim r As Integer, c As Integer, t As Integer
Dim elemCollection As Object
Set IE = CreateObject("InternetExplorer.Application")
With IE
.Visible = True
.navigate ("http://stats.nba.com/league/player/#!/")
While IE.ReadyState <> 4
DoEvents
Wend
Do While IE.busy: DoEvents: Loop
ThisWorkbook.Sheet1.Clear
Set elemCollection = IE.Document.getElementsByTagName("TABLE")
For t = 0 To (elemCollection.Length - 1)
For r = 0 To (elemCollection(t).Rows.Length - 1)
For c = 0 To (elemCollection(t).Rows(r).Cells.Length - 1)
ThisWorkbook.Worksheets(1).Cells(r + 1, c + 1) = elemCollection(t).Rows(r).Cells(c).innerText
Next c
Next r
Next t
End With
Set IE = Nothing
End Sub
Try to find the sitemap.xml of the website you are scraping. The sitemap.xml will have all the links present on the website.
Import that XML file into your Excel sheet, read each link, and fetch each table from it.
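A rough sketch of the sitemap idea, assuming the site actually publishes a sitemap in the standard sitemaps.org format (not every site does). ListSitemapLinks is a hypothetical name, and the sitemap URL is whatever you find for the target site.

```vba
' Sketch: write every <loc> URL from a sitemap into column A of ws.
' ListSitemapLinks and its parameters are illustrative, not from the original answer.
Public Sub ListSitemapLinks(ByVal sitemapUrl As String, ByVal ws As Worksheet)
    Dim xml As Object, nodes As Object, node As Object
    Dim r As Long

    Set xml = CreateObject("MSXML2.DOMDocument.6.0")
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", sitemapUrl, False
        .send
        If .Status <> 200 Then Exit Sub                      'no sitemap at this address
        If Not xml.loadXML(.responseText) Then Exit Sub      'response was not well-formed XML
    End With

    'Sitemaps declare a default namespace, so XPath needs a prefix mapped to it
    xml.setProperty "SelectionNamespaces", "xmlns:sm='http://www.sitemaps.org/schemas/sitemap/0.9'"
    Set nodes = xml.selectNodes("//sm:loc")

    For Each node In nodes
        r = r + 1
        ws.Cells(r, 1).Value = node.Text                     'one link per row
    Next node
End Sub
```

Each link in column A can then be fetched in turn and its tables written out.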
First of all, in my opinion VBA automation of Internet Explorer is highly unstable and not really practicable in production use cases. This also means that scraping data from websites that are provided only for viewing within a browser is not really practicable in production. If you are entitled to use the data, you should ask for another data source (XML or JSON, for example). If you are not entitled, you should not do this; the provider of the website may not agree with it.
To be clear, I'm talking about websites like this one, which provide their data through JavaScript only. If the data were in the HTML, you could get it via XMLHTTP. That is another matter.
I will nevertheless provide a "solution", so that you can't simply think, "He is unable to do this, so he is saying you should not do it."
So you must analyze the site and pick out the elements you can click for navigation.
Option Explicit
Sub NBAStats()
Dim IE As Object
Dim r As Long, c As Long, t As Long, rSheet As Long, rStart As Long
Dim bReady As Boolean
Dim elementsTable As Object
Dim elementsPageNavRigth As Object
Dim elemPageNavRigth As Object
Dim elementsTableDiv As Object
ThisWorkbook.Worksheets(1).Cells.Clear
Set IE = CreateObject("InternetExplorer.Application")
With IE
.Visible = True
.navigate ("http://stats.nba.com/league/player/#!/")
Do While IE.busy
DoEvents
Loop
rSheet = 0
Do
Do While elementsTableDiv Is Nothing
Set elementsTableDiv = IE.Document.getElementsByClassName("table-responsive")
DoEvents
Loop
Do While elementsTableDiv(0) Is Nothing
DoEvents
Loop
Set elementsPageNavRigth = IE.Document.getElementsByClassName("page-nav right")
Set elemPageNavRigth = elementsPageNavRigth(0)
If elemPageNavRigth.className = "page-nav right disabled" Then bReady = True
'If rSheet = 0 Then rStart = 0 Else rStart = 1
Set elementsTable = elementsTableDiv(0).getElementsByTagName("TABLE")
For r = rStart To (elementsTable(0).Rows.Length - 1)
For c = 0 To (elementsTable(0).Rows(r).Cells.Length - 1)
ThisWorkbook.Worksheets(1).Cells(r + rSheet + 1, c + 1) = elementsTable(0).Rows(r).Cells(c).innerText
Next c
Next r
rSheet = rSheet + r
If Not elemPageNavRigth Is Nothing Then elemPageNavRigth.Click
Set elementsTableDiv = Nothing
Loop Until bReady Or elemPageNavRigth Is Nothing
End With
Set IE = Nothing
End Sub
This is my first post on Stack Overflow :) I've been Googling VBA knowledge and writing some VBA for about a month.
My computer info:
1. Windows 8.1
2. Excel 2013
3. IE 11
My excel reference
Microsoft Object Library: yes
Microsoft Internet Controls: yes
Microsoft Forms 2.0 Object Library: yes
Microsoft Script Control 1.0: yes
Issue:
I am trying to retrieve data from Internet Explorer automatically using VBA.
I would like to retrieve the value of an input tag inside an element with the id "u_0_1", which is under an element with the id "facebook". I expect to get the value "AQFFmT0qn1TW" in cell C2. However, when I run the VBA code, this message pops up: "Run-time error '91': Object variable or With block variable not set".
I have been trying this for a couple of weeks using different methods, such as:
1. getElementsByClassName
2. getElementById
3. getElementsByTagName
But it just doesn't work.
url:
http://coursesweb.net/javascript/getelementsbytagname
Below is my VBA code. Could you guys help me out a little bit please?
Private Sub CommandButton1_Click()
Dim ie As Object
Dim Doc As HTMLDocument
Dim getThis As String
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = 0
ie.navigate "http://coursesweb.net/javascript/getelementsbytagname"
Do
DoEvents
Loop Until ie.readyState = 4
Set Doc = ie.document
getThis = Trim(Doc.getElementById("u_0_1")(0).getElementsByTagName("input")(0).Value)
Range("c2").Value = getThis
End Sub
Thanks for your help. I had no idea that there is a difference between JS and VBA with respect to the getElementsBy...() methods. I also find the loop method for finding the id very useful.
I still have some issues retrieving the value from a form or input element. I hope that you can help me or give me some suggestions as well.
Expected Result:
Retrieve the value "AQFFmT0qn1TW" and copy it to cell C2 automatically.
Actual Result:
Nothing is returned to cell C2.
Below are the HTML elements.
<form rel="async" ajaxify="/plugins/like/connect" method="post" action="/plugins/like/connect" onsubmit="return window.Event && Event.__inlineSubmit && Event.__inlineSubmit(this,event)" id="u_0_1">
<input type="hidden" name="fb_dtsg" value="AQFFmT0qn1TW" autocomplete="off">
Below is the VBA code based on your code.
Private Sub CommandButton1_Click()
Dim ie As Object
Dim Doc As HTMLDocument
Dim Elements As IHTMLElementCollection
Dim Element As IHTMLElement
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = 0
ie.navigate "http://coursesweb.net/javascript/getelementsbytagname"
Do
DoEvents
Loop Until ie.readyState = 4
Set Doc = ie.document
Set Elements = Doc.getElementsByTagName("input")
For Each Element In Elements
If Element.name = "fb_dtsg" Then
Range("c2").Value = Element.innerText
End If
Next Element
Set Elements = Nothing
End Sub
Cheers.
First of all, I can't find the tags you were searching for in the source of the website. Anyway, I think you can't chain getElementById and getElementsByTagName the way you would in JS. You have to loop through the collection of document elements.
Private Sub CommandButton1_Click()
Dim ie As Object
Dim Doc As HTMLDocument
Dim Elements As IHTMLElementCollection
Dim Element As IHTMLElement
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = 0
ie.navigate "http://coursesweb.net/javascript/getelementsbytagname"
Do
DoEvents
Loop Until ie.readyState = 4
Set Doc = ie.document
Set Elements = Doc.getElementsByTagName("ul")
For Each Element In Elements
If Element.ID = "ex4" Then
Sheets(1).Cells(1, 1).Value = Element.innerText
End If
Next Element
Set Elements = Nothing
End Sub
First I get the collection of "ul" tags, then loop through them looking for the id "ex4". In your case you'd get the collection of "input" elements and then loop for the id you want. Finding an id that is followed by a different id shouldn't be hard, just some If...Then statements.
If you need further assistance, please respond with a URL at which I can find exactly what you're looking for.
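For an input element the text lives in its value attribute, so innerText comes back empty; reading .Value is the usual way to get it. A small sketch of the loop described above, where GetInputValue is a hypothetical helper and "fb_dtsg" is the name taken from the HTML posted in the question:

```vba
' Hypothetical helper: returns the value of the first <input> whose name matches,
' or an empty string when no such input exists in the document.
Public Function GetInputValue(ByVal doc As Object, ByVal inputName As String) As String
    Dim el As Object
    For Each el In doc.getElementsByTagName("input")
        If el.Name = inputName Then
            GetInputValue = el.Value       'inputs expose .Value, not innerText
            Exit Function
        End If
    Next el
End Function
```

It would be called as Range("C2").Value = GetInputValue(ie.document, "fb_dtsg"), provided the input really is present in the loaded document.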
Cheers
I am trying to find a way to get the data from yelp.com
I have a spreadsheet on which there are several keywords and locations. I am looking to extract data from yelp listings based on these keywords and locations already in my spreadsheet.
I have created the following code, but it seems to get absurd data and not the exact information I am looking for.
I want to get the business name, address and phone number, but I am getting nothing useful. Could anyone here help me solve this problem?
Sub find()
Dim ie As Object
Set ie = CreateObject("InternetExplorer.Application")
With ie
ie.Visible = False
ie.Navigate "http://www.yelp.com/search?find_desc=boutique&find_loc=New+York%2C+NY&ns=1&ls=3387133dfc25cc99#start=10"
' Don't show window
ie.Visible = False
'Wait until IE is done loading page
Do While ie.Busy
Application.StatusBar = "Downloading information, please wait..."
DoEvents
Loop
' Make a string from IE content
Set mDoc = ie.Document
peopleData = mDoc.body.innerText
ActiveSheet.Cells(1, 1).Value = peopleData
End With
peopleData = "" 'Nothing
Set mDoc = Nothing
End Sub
If you right click in IE, and do View Source, it is apparent that the data served on the site is not part of the document's .Body.innerText property. I notice this is often the case with dynamically served data, and that approach is really too simple for most web-scraping.
I open it in Google Chrome and inspect the elements to get an idea of what I'm really looking for, and how to find it using a DOM/HTML parser; you will need to add a reference to Microsoft HTML Object Library.
I think you can get it to return a collection of the <DIV> tags, and then check those for the class name with an If statement inside the loop.
I made some revisions to my original answer; this should print each record in a new cell:
Option Explicit
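'Sleep is a Windows API function, so it needs a Declare statement (PtrSafe on VBA7 / 64-bit Office)
#If VBA7 Then
Private Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#Else
Private Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#End If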
Sub find()
'Uses late binding, or add reference to Microsoft HTML Object Library
' and change variable Types to use intellisense
Dim ie As Object 'InternetExplorer.Application
Dim html As Object 'HTMLDocument
Dim Listings As Object 'IHTMLElementCollection
Dim l As Object 'IHTMLElement
Dim r As Long
Set ie = CreateObject("InternetExplorer.Application")
With ie
.Visible = False
.Navigate "http://www.yelp.com/search?find_desc=boutique&find_loc=New+York%2C+NY&ns=1&ls=3387133dfc25cc99#start=10"
' Don't show window
'Wait until IE is done loading page
Do While .readyState <> 4
Application.StatusBar = "Downloading information, Please wait..."
DoEvents
Sleep 200
Loop
Set html = .Document
End With
Set Listings = html.getElementsByTagName("LI") ' ## returns the list
For Each l In Listings
'## make sure this list item looks like the listings Div Class:
' then, build the string to put in your cell
If InStr(1, l.innerHTML, "media-block clearfix media-block-large main-attributes") > 0 Then
Range("A1").Offset(r, 0).Value = l.innerText
r = r + 1
End If
Next
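Application.StatusBar = False 'hand the status bar back to Excel
Set html = Nothing
ie.Quit 'close the hidden browser so it does not keep running in the background
Set ie = Nothing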
End Sub