Can't pull data from a stubborn webpage using vba - vba

Hope you are doing well. The site i tried to scrape category-names from is very simple to look at if you notice it's inspected element but when i create a parser i can't pull the data. I wanted to scrape only the 7 category names from that page. I tried with all possible angles but failed. If anybody helps me point out what I'm doing wrong, I would be very grateful to him. Thanks in advance. FYC, I'm pasting here the code I tried with.
Sub ItemName()
Dim http As New MSXML2.XMLHTTP60, html As New HTMLDocument
Dim topics As Object, topic As Object, posts As Object, post As Object, ele As Object
Dim x As Long
x = 2
http.Open "GET", "http://www.bjs.com/tv--electronics.category.3000000000000144985.2002193", False
http.send
html.body.innerHTML = http.responseText
Set topics = html.getElementsByClassName("categories")
For Each topic In topics
For Each posts In topic.getElementsByTagName("li")
For Each post In posts.getElementsByTagName("a")
Set ele = post.getElementsByTagName("h4")(0)
Cells(x, 1) = ele.innerText
x = x + 1
Next post
Next posts
Next topic
End Sub

Here's one possible solution, I'm using the internet explorer object instead of MSXML. I'm able to retrieve the data from the page, and it's pretty quick.
Here's the full code:
Option Explicit
#If VBA7 Then
Public Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As LongPtr)
#Else
Public Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#End If
Sub ItemName()
On Error GoTo errhand:
Dim ie As Object: Set ie = CreateObject("InternetExplorer.Application")
Dim topics As Object, topic As Object
Dim i As Byte
With ie
.Visible = False
.Navigate "http://www.bjs.com/tv--electronics.category.3000000000000144985.2002193"
Sleep 500 ' Wait for the page to start loading
Do Until .document.readyState = 4 Or .busy = False Or i >= 100
Sleep 100
DoEvents
i = i + 1
Loop
End With
Set topics = ie.document.getElementsByClassName("name ng-binding")
For Each topic In topics
'Print out the element's innertext
Debug.Print topic.innertext
Next
ie.Quit
Set ie = Nothing
Exit Sub
errhand:
Debug.Print Err.Number, Err.Description
ie.Quit
Set ie = Nothing
End Sub

As the content of that site are generated dynamically, so there is no way for xmlhttp request to catch the page source. However, to get around that selenium is good to go, as it works well when it comes to deal with javascriptheavy website. I only used selenium in my below script to get the page source. As soon as it get that, I reverted back to usual vba method to accomplish the process.
Sub Grabbing_item()
Dim driver As New ChromeDriver, html As New HTMLDocument
Dim post As Object
With driver
.get "http://www.bjs.com/tv--electronics.category.3000000000000144985.2002193"
html.body.innerHTML = .ExecuteScript("return document.body.innerHTML;")
.Quit
End With
For Each post In html.getElementsByClassName("name")
x = x + 1: Cells(x, 1) = post.innerText
Next post
End Sub

Related

How Do I Test If Webpage Contains Certain Text

I'm trying to detect if a web page has certain text. For example, I want to see if this web page includes the following phrase: "Here is my code"
I can't get it to ever find that the "If Then" condition is satisfied. Here's what I'm trying:
Const READYSTATE_COMPLETE = 4
Declare Function SetForegroundWindow Lib "user32" _
Alias "SetForegroundWindow" (ByVal Hwnd As Long)As Long
' Declare Internet Explorer object
Dim IE As SHDocVw.InternetExplorer
Dim strProgramName As String
Sub Main
' create instance of InternetExplorer
Set IE = New InternetExplorer
' using your newly created instance of Internet Explorer
With IE
SetForegroundWindow IE.HWND
.Visible = True
.Navigate2 "https://stackoverflow.com/questions/38355762/how-do-i-modify-web-scraping-code-to-loop-through-product-bullets-until-it-finds"
' Wait until page we are navigating to is loaded
Do While .Busy
Loop
Do
Loop Until .readyState = READYSTATE_COMPLETE
On Error Resume Next
If Err Then
Else
End If
Wait 2
If InStr(IE.document.body.innerHTML, "Here is my code") > 0 Then
MsgBox "Yessiree Bob"
Else
MsgBox "The text dosen't exist"
End If
Set IE = Nothing
' Tidy Up
End With
End Sub
I've also tried:
FindText = InStr(1, IE.document.body.innerHTML, "Here is my code")
If FindText > 0 Then
And
msg = IE.document.body.innerHTML
If InStr(msg, "Here is my code") > 0 Then
But nothing works. I've looked on Stack Overflow, but can't find this exact question.
Thanks in advance!
Use:
If InStr(IE.document.getElementById("body").innerHTML, "Here is my code") > 0 Then

Extracting website data with Excel and VBA [duplicate]

Im trying to scrape data from website: http://uk.investing.com/rates-bonds/financial-futures via vba, like real-time price, i.e. German 5 YR Bobl, US 30Y T-Bond, i have tried excel web query but it only scrapes the whole website, but I would like to scrape the rate only, is there a way of doing this?
There are several ways of doing this. This is an answer that I write hoping that all the basics of Internet Explorer automation will be found when browsing for the keywords "scraping data from website", but remember that nothing's worth as your own research (if you don't want to stick to pre-written codes that you're not able to customize).
Please note that this is one way, that I don't prefer in terms of performance (since it depends on the browser speed) but that is good to understand the rationale behind Internet automation.
1) If I need to browse the web, I need a browser! So I create an Internet Explorer browser:
Dim appIE As Object
Set appIE = CreateObject("internetexplorer.application")
2) I ask the browser to browse the target webpage. Through the use of the property ".Visible", I decide if I want to see the browser doing its job or not. When building the code is nice to have Visible = True, but when the code is working for scraping data is nice not to see it everytime so Visible = False.
With appIE
.Navigate "http://uk.investing.com/rates-bonds/financial-futures"
.Visible = True
End With
3) The webpage will need some time to load. So, I will wait meanwhile it's busy...
Do While appIE.Busy
DoEvents
Loop
4) Well, now the page is loaded. Let's say that I want to scrape the change of the US30Y T-Bond:
What I will do is just clicking F12 on Internet Explorer to see the webpage's code, and hence using the pointer (in red circle) I will click on the element that I want to scrape to see how can I reach my purpose.
5) What I should do is straight-forward. First of all, I will get by the ID property the tr element which is containing the value:
Set allRowOfData = appIE.document.getElementById("pair_8907")
Here I will get a collection of td elements (specifically, tr is a row of data, and the td are its cells. We are looking for the 8th, so I will write:
Dim myValue As String: myValue = allRowOfData.Cells(7).innerHTML
Why did I write 7 instead of 8? Because the collections of cells starts from 0, so the index of the 8th element is 7 (8-1). Shortly analysing this line of code:
.Cells() makes me access the td elements;
innerHTML is the property of the cell containing the value we look for.
Once we have our value, which is now stored into the myValue variable, we can just close the IE browser and releasing the memory by setting it to Nothing:
appIE.Quit
Set appIE = Nothing
Well, now you have your value and you can do whatever you want with it: put it into a cell (Range("A1").Value = myValue), or into a label of a form (Me.label1.Text = myValue).
I'd just like to point you out that this is not how StackOverflow works: here you post questions about specific coding problems, but you should make your own search first. The reason why I'm answering a question which is not showing too much research effort is just that I see it asked several times and, back to the time when I learned how to do this, I remember that I would have liked having some better support to get started with. So I hope that this answer, which is just a "study input" and not at all the best/most complete solution, can be a support for next user having your same problem. Because I have learned how to program thanks to this community, and I like to think that you and other beginners might use my input to discover the beautiful world of programming.
Enjoy your practice ;)
Other methods were mentioned so let us please acknowledge that, at the time of writing, we are in the 21st century. Let's park the local bus browser opening, and fly with an XMLHTTP GET request (XHR GET for short).
Wiki moment:
XHR is an API in the form of an object whose methods transfer data
between a web browser and a web server. The object is provided by the
browser's JavaScript environment
It's a fast method for retrieving data that doesn't require opening a browser. The server response can be read into an HTMLDocument and the process of grabbing the table continued from there.
Note that javascript rendered/dynamically added content will not be retrieved as there is no javascript engine running (which there is in a browser).
In the below code, the table is grabbed by its id cr1.
In the helper sub, WriteTable, we loop the columns (td tags) and then the table rows (tr tags), and finally traverse the length of each table row, table cell by table cell. As we only want data from columns 1 and 8, a Select Case statement is used specify what is written out to the sheet.
Sample webpage view:
Sample code output:
VBA:
Option Explicit
Public Sub GetRates()
Dim html As HTMLDocument, hTable As HTMLTable '<== Tools > References > Microsoft HTML Object Library
Set html = New HTMLDocument
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://uk.investing.com/rates-bonds/financial-futures", False
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT" 'to deal with potential caching
.send
html.body.innerHTML = .responseText
End With
Application.ScreenUpdating = False
Set hTable = html.getElementById("cr1")
WriteTable hTable, 1, ThisWorkbook.Worksheets("Sheet1")
Application.ScreenUpdating = True
End Sub
Public Sub WriteTable(ByVal hTable As HTMLTable, Optional ByVal startRow As Long = 1, Optional ByVal ws As Worksheet)
Dim tSection As Object, tRow As Object, tCell As Object, tr As Object, td As Object, r As Long, C As Long, tBody As Object
r = startRow: If ws Is Nothing Then Set ws = ActiveSheet
With ws
Dim headers As Object, header As Object, columnCounter As Long
Set headers = hTable.getElementsByTagName("th")
For Each header In headers
columnCounter = columnCounter + 1
Select Case columnCounter
Case 2
.Cells(startRow, 1) = header.innerText
Case 8
.Cells(startRow, 2) = header.innerText
End Select
Next header
startRow = startRow + 1
Set tBody = hTable.getElementsByTagName("tbody")
For Each tSection In tBody
Set tRow = tSection.getElementsByTagName("tr")
For Each tr In tRow
r = r + 1
Set tCell = tr.getElementsByTagName("td")
C = 1
For Each td In tCell
Select Case C
Case 2
.Cells(r, 1).Value = td.innerText
Case 8
.Cells(r, 2).Value = td.innerText
End Select
C = C + 1
Next td
Next tr
Next tSection
End With
End Sub
you can use winhttprequest object instead of internet explorer as it's good to load data excluding pictures n advertisement instead of downloading full webpage including advertisement n pictures those make internet explorer object heavy compare to winhttpRequest object.
This question asked long before. But I thought following information will useful for newbies. Actually you can easily get the values from class name like this.
Sub ExtractLastValue()
Set objIE = CreateObject("InternetExplorer.Application")
objIE.Top = 0
objIE.Left = 0
objIE.Width = 800
objIE.Height = 600
objIE.Visible = True
objIE.Navigate ("https://uk.investing.com/rates-bonds/financial-futures/")
Do
DoEvents
Loop Until objIE.readystate = 4
MsgBox objIE.document.getElementsByClassName("pid-8907-last")(0).innerText
End Sub
And if you are new to web scraping please read this blog post.
Web Scraping - Basics
And also there are various techniques to extract data from web pages. This article explain few of them with examples.
Web Scraping - Collecting Data From a Webpage
I modified some thing that were poping up error for me and end up with this which worked great to extract the data as I needed:
Sub get_data_web()
Dim appIE As Object
Set appIE = CreateObject("internetexplorer.application")
With appIE
.navigate "https://finance.yahoo.com/quote/NQ%3DF/futures?p=NQ%3DF"
.Visible = True
End With
Do While appIE.Busy
DoEvents
Loop
Set allRowofData = appIE.document.getElementsByClassName("Ta(end) BdT Bdc($c-fuji-grey-c) H(36px)")
Dim i As Long
Dim myValue As String
Count = 1
For Each itm In allRowofData
For i = 0 To 4
myValue = itm.Cells(i).innerText
ActiveSheet.Cells(Count, i + 1).Value = myValue
Next
Count = Count + 1
Next
appIE.Quit
Set appIE = Nothing
End Sub

vba code running but not fetching data

I am new to vba.
I am trying to use below code by David Zemens to fetch data from yelp
Option Explicit
Private Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
Sub find()
'Uses late binding, or add reference to Microsoft HTML Object Library
' and change variable Types to use intellisense
Dim ie As Object 'InternetExplorer.Application
Dim html As Object 'HTMLDocument
Dim Listings As Object 'IHTMLElementCollection
Dim l As Object 'IHTMLElement
Dim r As Long
Set ie = CreateObject("InternetExplorer.Application")
With ie
.Visible = False
.Navigate "http://www.yelp.com/search?find_desc=boutique&find_loc=New+York%2C+NY&ns=1&ls=3387133dfc25cc99#start=10"
' Don't show window
'Wait until IE is done loading page
Do While .readyState <> 4
Application.StatusBar = "Downloading information, Please wait..."
DoEvents
Sleep 200
Loop
Set html = .Document
End With
Set Listings = html.getElementsByTagName("LI") ' ## returns the list
For Each l In Listings
'## make sure this list item looks like the listings Div Class:
' then, build the string to put in your cell
If InStr(1, l.innerHTML, "media-block clearfix media-block-large main-attributes") > 0 Then
Range("A1").Offset(r, 0).Value = l.innerText
r = r + 1
End If
Next
Set html = Nothing
Set ie = Nothing
End Sub
Problem is that it's not getting any data from the source.
Regards
There's a lot of work to be done.
Here's something that you can start with. Hopefully, you will be able to find the other pieces of information using the same logic. This will print business names in the immediate window. I've found the business names in meta tag description.
I've changed the sleep amount to 5 seconds. IE will be able to fully load and the rest of the code will be processed reliably. The initial 200 milliseconds gave results once every couple of runs. I guess this depends how fast your computer is so 5 seconds is pretty safe I guess.
Option Explicit
Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
Sub find()
'Uses late binding, or add reference to Microsoft HTML Object Library
' and change variable Types to use intellisense
Dim returnstring As String 'this is going to hold boutiques names
Dim ie As Object 'InternetExplorer.Application
Dim html As Object 'HTMLDocument
Dim meta As Object 'IHTMLElementCollection
Dim l As Object 'IHTMLElement
Dim r As Long
Set ie = CreateObject("InternetExplorer.Application")
With ie
.Visible = False
.Navigate "http://www.yelp.com/search?find_desc=boutique&find_loc=New+York%2C+NY&ns=1&ls=3387133dfc25cc99#start=10"
' Don't show window
'Wait until IE is done loading page
Do While .readyState <> 4
Application.StatusBar = "Downloading information, Please wait..."
DoEvents
Sleep 5000
Loop
Set html = .Document
End With
Set meta = html.GetElementsByTagName("META") ' ## returns attribures
Dim m As Object
For Each m In meta
If InStr(m.Content, "Reviews on Boutique in New York -") > 0 Then
returnstring = Replace(m.Content, "Reviews on Boutique in New York -", "")
End If
Next
Dim i As Integer
For i = 0 To UBound(Split(returnstring, ","))
Debug.Print (Split(returnstring, ",")(i))
Next
Set html = Nothing
Set ie = Nothing
End Sub
Myoutput:

Get data from listings on a website to excel VBA

I am trying to find a way to get the data from yelp.com
I have a spreadsheet on which there are several keywords and locations. I am looking to extract data from yelp listings based on these keywords and locations already in my spreadsheet.
I have created the following code, but it seems to get absurd data and not the exact information I am looking for.
I want to get business name, address and phone number, but all I am getting is nothing. If anyone here could help me solve this problem.
Sub find()
Dim ie As Object
Set ie = CreateObject("InternetExplorer.Application")
With ie
ie.Visible = False
ie.Navigate "http://www.yelp.com/search?find_desc=boutique&find_loc=New+York%2C+NY&ns=1&ls=3387133dfc25cc99#start=10"
' Don't show window
ie.Visible = False
'Wait until IE is done loading page
Do While ie.Busy
Application.StatusBar = "Downloading information, lease wait..."
DoEvents
Loop
' Make a string from IE content
Set mDoc = ie.Document
peopleData = mDoc.body.innerText
ActiveSheet.Cells(1, 1).Value = peopleData
End With
peopleData = "" 'Nothing
Set mDoc = Nothing
End Sub
If you right click in IE, and do View Source, it is apparent that the data served on the site is not part of the document's .Body.innerText property. I notice this is often the case with dynamically served data, and that approach is really too simple for most web-scraping.
I open it in Google Chrome and inspect the elements to get an idea of what I'm really looking for, and how to find it using a DOM/HTML parser; you will need to add a reference to Microsoft HTML Object Library.
I think you can get it to return a collection of the <DIV> tags, and then check those for the classname with an If statment inside the loop.
I made some revisions to my original answer, this should print each record in a new cell:
Option Explicit
Private Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
Sub find()
'Uses late binding, or add reference to Microsoft HTML Object Library
' and change variable Types to use intellisense
Dim ie As Object 'InternetExplorer.Application
Dim html As Object 'HTMLDocument
Dim Listings As Object 'IHTMLElementCollection
Dim l As Object 'IHTMLElement
Dim r As Long
Set ie = CreateObject("InternetExplorer.Application")
With ie
.Visible = False
.Navigate "http://www.yelp.com/search?find_desc=boutique&find_loc=New+York%2C+NY&ns=1&ls=3387133dfc25cc99#start=10"
' Don't show window
'Wait until IE is done loading page
Do While .readyState <> 4
Application.StatusBar = "Downloading information, Please wait..."
DoEvents
Sleep 200
Loop
Set html = .Document
End With
Set Listings = html.getElementsByTagName("LI") ' ## returns the list
For Each l In Listings
'## make sure this list item looks like the listings Div Class:
' then, build the string to put in your cell
If InStr(1, l.innerHTML, "media-block clearfix media-block-large main-attributes") > 0 Then
Range("A1").Offset(r, 0).Value = l.innerText
r = r + 1
End If
Next
Set html = Nothing
Set ie = Nothing
End Sub

Input textbox value is empty when submiting a web form in vba

I am trying to submit a webform by giving all the values in form input text boxes but when i call the submit button click from excel vba, one of the text boxes is becoming empty and throwing a validation error.
Sub Click_Btn()
Dim objForms As Object
Dim vTxtInput As Variant
'Set objIE = GetIEApp
Set objIE = New InternetExplorer
'Make sure an IE object was hooked
If TypeName(objIE) = "Nothing" Then
MsgBox "Could not hook Internet Explorer object", _
vbCritical, "GetFields() Error"
'GoTo Clean_Up
End If
objIE.Navigate "http://fly3.emirates.com/CAB/IBE/SearchAvailability.aspx"
objIE.Visible = True
Sleep 5000
Set objForms = objIE.document.all
'Choose one way Flights
objIE.document.getElementById("ctl00_c_CtWNW_onewaySearch").Click
Sleep 1000
objIE.document.getElementById("ctl00_c_CtWNW_ddlTo-suggest").Value = "Sydney (SYD)"
' Departure
objIE.document.getElementById("ctl00_c_CtWNW_ddlFrom-suggest").Value = "Mumbai (BOM)"
' Departure Date
objIE.document.getElementById("ctl00_c_CtWNW_txtDepartDate").Value = "28 Sep 12"
Sleep 1000
objIE.document.getElementById("ctl00_c_FS_FF").Click
End Sub
When i click the submit button for the form the ctl00_c_CtWNW_ddlTo-suggest text box is becoming empty and getting an error.
You are getting that error because you are not giving it enough time to validate the names of the FROM and TO Dropdowns (Yes - not Text Boxes)
Try this code (TRIED AND TESTED). Also I am using Late Binding with IE. Change as applicable in your code.
The Sleep 5000 give the text in the drop downs enough time to validate itself with the list of the drop down.
Option Explicit
Private Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
Sub Click_Btn()
Dim objForms As Object
Dim vTxtInput As Variant
Dim objIE As Object
Set objIE = CreateObject("InternetExplorer.Application")
'Make sure an IE object was hooked
If TypeName(objIE) = "Nothing" Then
MsgBox "Could not hook Internet Explorer object", _
vbCritical, "GetFields() Error"
'GoTo Clean_Up
End If
objIE.Navigate "http://fly3.emirates.com/CAB/IBE/SearchAvailability.aspx"
objIE.Visible = True
Sleep 5000
Set objForms = objIE.document.all
'~~> Choose one way Flights
objIE.document.getElementById("ctl00_c_CtWNW_onewaySearch").Click
'~~> From
objIE.document.getElementById("ctl00_c_CtWNW_ddlFrom-suggest").Focus
objIE.document.getElementById("ctl00_c_CtWNW_ddlFrom-suggest").Value _
= "Mumbai (BOM)"
Sleep 5000
'~~> Departure Date
objIE.document.getElementById("ctl00_c_CtWNW_txtDepartDate").Focus
objIE.document.getElementById("ctl00_c_CtWNW_txtDepartDate").Value _
= "28 Sep 12"
Sleep 5000
'~~> To
objIE.document.getElementById("ctl00_c_CtWNW_ddlTo-suggest").Focus
objIE.document.getElementById("ctl00_c_CtWNW_ddlTo-suggest").Value _
= "Sydney (SYD)"
Sleep 5000
objIE.document.getElementById("ctl00_c_FS_FF").Focus
objIE.document.getElementById("ctl00_c_FS_FF").Click
End Sub
EDIT
Here is Way 2, which is faster than Way 1 (Above) by 15 Seconds as we are not using Sleep 5000 for validating. This doesn't require you to validate the drop downs. What it does is it bypasses the checkValidation(); javascript which gets executed in the ONCLICK event of the Submit Button.
WAY 2
Option Explicit
Private Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
Sub Click_Btn()
Dim objForms As Object
Dim vTxtInput As Variant
Dim objIE As Object
Set objIE = CreateObject("InternetExplorer.Application")
'Make sure an IE object was hooked
If TypeName(objIE) = "Nothing" Then
MsgBox "Could not hook Internet Explorer object", _
vbCritical, "GetFields() Error"
'GoTo Clean_Up
End If
objIE.Navigate "http://fly3.emirates.com/CAB/IBE/SearchAvailability.aspx"
objIE.Visible = True
Sleep 5000
Set objForms = objIE.document.all
'Choose one way Flights
objIE.document.getElementById("ctl00_c_CtWNW_onewaySearch").Click
' From
objIE.document.getElementById("ctl00_c_CtWNW_ddlFrom-suggest").Focus
objIE.document.getElementById("ctl00_c_CtWNW_ddlFrom-suggest").Value _
= "Mumbai (BOM)"
' Departure Date
objIE.document.getElementById("ctl00_c_CtWNW_txtDepartDate").Focus
objIE.document.getElementById("ctl00_c_CtWNW_txtDepartDate").Value _
= "28 Sep 12"
' To
objIE.document.getElementById("ctl00_c_CtWNW_ddlTo-suggest").Focus
objIE.document.getElementById("ctl00_c_CtWNW_ddlTo-suggest").Value _
= "Sydney (SYD)"
objIE.document.getElementById("ctl00_c_FS_FF").Focus
objIE.document.getElementById("ctl00_c_FS_FF").onclick = _
Replace(objIE.document.getElementById("ctl00_c_FS_FF").onclick, _
"checkValidation();", "true;")
objIE.document.getElementById("ctl00_c_FS_FF").Click
End Sub