VBA code running but not fetching data

I am new to VBA.
I am trying to use the code below, by David Zemens, to fetch data from Yelp:
Option Explicit
Private Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
Sub find()
'Uses late binding, or add reference to Microsoft HTML Object Library
' and change variable Types to use intellisense
Dim ie As Object 'InternetExplorer.Application
Dim html As Object 'HTMLDocument
Dim Listings As Object 'IHTMLElementCollection
Dim l As Object 'IHTMLElement
Dim r As Long
Set ie = CreateObject("InternetExplorer.Application")
With ie
.Visible = False
.Navigate "http://www.yelp.com/search?find_desc=boutique&find_loc=New+York%2C+NY&ns=1&ls=3387133dfc25cc99#start=10"
' Don't show window
'Wait until IE is done loading page
Do While .readyState <> 4
Application.StatusBar = "Downloading information, Please wait..."
DoEvents
Sleep 200
Loop
Set html = .Document
End With
Set Listings = html.getElementsByTagName("LI") ' ## returns the list
For Each l In Listings
'## make sure this list item looks like the listings Div Class:
' then, build the string to put in your cell
If InStr(1, l.innerHTML, "media-block clearfix media-block-large main-attributes") > 0 Then
Range("A1").Offset(r, 0).Value = l.innerText
r = r + 1
End If
Next
Set html = Nothing
Set ie = Nothing
End Sub
Problem is that it's not getting any data from the source.
Regards

There's a lot of work to be done.
Here's something you can start with; hopefully you will be able to find the other pieces of information using the same logic. This prints the business names in the Immediate window. I found the business names in the description meta tag.
I've changed the sleep amount to 5 seconds so that IE can fully load the page and the rest of the code runs reliably. With the initial 200 milliseconds I got results only once every couple of runs. This probably depends on how fast your computer is, so 5 seconds should be fairly safe.
Option Explicit
Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
Sub find()
'Uses late binding, or add reference to Microsoft HTML Object Library
' and change variable Types to use intellisense
Dim returnstring As String 'this is going to hold boutiques names
Dim ie As Object 'InternetExplorer.Application
Dim html As Object 'HTMLDocument
Dim meta As Object 'IHTMLElementCollection
Dim l As Object 'IHTMLElement
Dim r As Long
Set ie = CreateObject("InternetExplorer.Application")
With ie
.Visible = False
.Navigate "http://www.yelp.com/search?find_desc=boutique&find_loc=New+York%2C+NY&ns=1&ls=3387133dfc25cc99#start=10"
' Don't show window
'Wait until IE is done loading page
Do While .readyState <> 4
Application.StatusBar = "Downloading information, Please wait..."
DoEvents
Sleep 5000
Loop
Set html = .Document
End With
Set meta = html.GetElementsByTagName("META") ' ## returns the META tags
Dim m As Object
For Each m In meta
If InStr(m.Content, "Reviews on Boutique in New York -") > 0 Then
returnstring = Replace(m.Content, "Reviews on Boutique in New York -", "")
End If
Next
Dim i As Integer
For i = 0 To UBound(Split(returnstring, ","))
Debug.Print (Split(returnstring, ",")(i))
Next
Set html = Nothing
Set ie = Nothing
End Sub
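A fixed five-second pause can also be swapped for a wait that gives up after a timeout. The following is only a sketch and not part of the answer above: `WaitForPage` and its `timeoutSeconds` parameter are hypothetical names introduced for illustration, and a completed `readyState` still does not guarantee that script-generated content is already on the page.

```vba
Option Explicit

#If VBA7 Then
    Private Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#Else
    Private Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#End If

' Hypothetical helper: returns True once IE reports the page as loaded,
' or False if timeoutSeconds elapse first.
Private Function WaitForPage(ByVal ie As Object, ByVal timeoutSeconds As Long) As Boolean
    Dim deadline As Date
    deadline = DateAdd("s", timeoutSeconds, Now)

    Do While ie.Busy Or ie.readyState <> 4      ' 4 = READYSTATE_COMPLETE
        If Now > deadline Then Exit Function    ' give up; the result stays False
        DoEvents
        Sleep 200                               ' poll five times per second
    Loop

    WaitForPage = True
End Function
```

It could be called right after `.Navigate`, for example `If Not WaitForPage(ie, 30) Then Exit Sub`, so a slow page costs at most the timeout and a fast page costs almost nothing.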

Related

How Do I Test If Webpage Contains Certain Text

I'm trying to detect if a web page has certain text. For example, I want to see if this web page includes the following phrase: "Here is my code"
I can't get it to ever find that the "If Then" condition is satisfied. Here's what I'm trying:
Const READYSTATE_COMPLETE = 4
Declare Function SetForegroundWindow Lib "user32" _
Alias "SetForegroundWindow" (ByVal Hwnd As Long)As Long
' Declare Internet Explorer object
Dim IE As SHDocVw.InternetExplorer
Dim strProgramName As String
Sub Main
' create instance of InternetExplorer
Set IE = New InternetExplorer
' using your newly created instance of Internet Explorer
With IE
SetForegroundWindow IE.HWND
.Visible = True
.Navigate2 "https://stackoverflow.com/questions/38355762/how-do-i-modify-web-scraping-code-to-loop-through-product-bullets-until-it-finds"
' Wait until page we are navigating to is loaded
Do While .Busy
Loop
Do
Loop Until .readyState = READYSTATE_COMPLETE
On Error Resume Next
If Err Then
Else
End If
Wait 2
If InStr(IE.document.body.innerHTML, "Here is my code") > 0 Then
MsgBox "Yessiree Bob"
Else
MsgBox "The text doesn't exist"
End If
Set IE = Nothing
' Tidy Up
End With
End Sub
I've also tried:
FindText = InStr(1, IE.document.body.innerHTML, "Here is my code")
If FindText > 0 Then
And
msg = IE.document.body.innerHTML
If InStr(msg, "Here is my code") > 0 Then
But nothing works. I've looked on Stack Overflow, but can't find this exact question.
Thanks in advance!
Use:
If InStr(IE.document.getElementById("body").innerHTML, "Here is my code") > 0 Then
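For comparison, here is a minimal sketch of the whole check that yields with `DoEvents` while waiting and then searches the visible text of the page. It is not taken from the answer above: `PageContainsText` is a hypothetical helper name, late binding is assumed so no references are needed, and the match is case-insensitive by choice.

```vba
Option Explicit

' Hypothetical helper: loads url in a hidden IE window and reports whether
' phrase occurs anywhere in the page's visible text.
Public Function PageContainsText(ByVal url As String, ByVal phrase As String) As Boolean
    Dim ie As Object
    Set ie = CreateObject("InternetExplorer.Application")

    ie.Visible = False
    ie.Navigate url

    ' Yield to the message loop while IE loads; 4 = READYSTATE_COMPLETE
    Do While ie.Busy Or ie.readyState <> 4
        DoEvents
    Loop

    ' innerText ignores markup, so tags in the middle of the phrase
    ' do not break the match the way they can with innerHTML
    PageContainsText = InStr(1, ie.document.body.innerText, phrase, vbTextCompare) > 0

    ie.Quit
    Set ie = Nothing
End Function
```

A call such as `If PageContainsText(someUrl, "Here is my code") Then ...` keeps the browser handling in one place. Text that a page inserts by script after loading may still be missing at the moment of the check.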

Can't pull data from a stubborn webpage using vba

Hope you are doing well. The site I tried to scrape category names from looks very simple when you inspect its elements, but when I create a parser I can't pull the data. I want to scrape only the 7 category names from that page. I have tried every approach I could think of, but failed. If anybody can point out what I'm doing wrong, I would be very grateful. Thanks in advance. For your consideration, here is the code I tried:
Sub ItemName()
Dim http As New MSXML2.XMLHTTP60, html As New HTMLDocument
Dim topics As Object, topic As Object, posts As Object, post As Object, ele As Object
Dim x As Long
x = 2
http.Open "GET", "http://www.bjs.com/tv--electronics.category.3000000000000144985.2002193", False
http.send
html.body.innerHTML = http.responseText
Set topics = html.getElementsByClassName("categories")
For Each topic In topics
For Each posts In topic.getElementsByTagName("li")
For Each post In posts.getElementsByTagName("a")
Set ele = post.getElementsByTagName("h4")(0)
Cells(x, 1) = ele.innerText
x = x + 1
Next post
Next posts
Next topic
End Sub
Here's one possible solution. I'm using the Internet Explorer object instead of MSXML; with it I'm able to retrieve the data from the page, and it's pretty quick.
Here's the full code:
Option Explicit
#If VBA7 Then
Public Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#Else
Public Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#End If
Sub ItemName()
On Error GoTo errhand:
Dim ie As Object: Set ie = CreateObject("InternetExplorer.Application")
Dim topics As Object, topic As Object
Dim i As Byte
With ie
.Visible = False
.Navigate "http://www.bjs.com/tv--electronics.category.3000000000000144985.2002193"
Sleep 500 ' Wait for the page to start loading
' Wait until IE reports the page as complete (4) and idle, or give up after 100 tries
Do Until (.readyState = 4 And Not .Busy) Or i >= 100
Sleep 100
DoEvents
i = i + 1
Loop
End With
Set topics = ie.document.getElementsByClassName("name ng-binding")
For Each topic In topics
'Print out the element's innertext
Debug.Print topic.innertext
Next
ie.Quit
Set ie = Nothing
Exit Sub
errhand:
Debug.Print Err.Number, Err.Description
ie.Quit
Set ie = Nothing
End Sub
Because the content of that site is generated dynamically, an XMLHTTP request cannot capture the rendered page source. Selenium gets around that, as it works well with JavaScript-heavy websites. In the script below I used Selenium only to get the page source; as soon as it has that, I revert to the usual VBA method to finish the job.
Sub Grabbing_item()
Dim driver As New ChromeDriver, html As New HTMLDocument
Dim post As Object, x As Long
With driver
.get "http://www.bjs.com/tv--electronics.category.3000000000000144985.2002193"
html.body.innerHTML = .ExecuteScript("return document.body.innerHTML;")
.Quit
End With
For Each post In html.getElementsByClassName("name")
x = x + 1: Cells(x, 1) = post.innerText
Next post
End Sub

Excel Issue: Object Variable or with block Variable not set

I have code that works for me about 70% of the time; the other times it throws error 91, "Object variable or With block variable not set". If I click End and re-run it, it works fine.
The function is taking data that is entered into cells of an excel spreadsheet and populating text boxes, checking radio buttons, and selecting from drop-down lists on a webpage.
I can't post a link to the actual webpage for privacy reasons, but I'm hoping someone can help me understand why the error is coming up.
The line that shows the error is
drp.selectedIndex = Thisworkbook.sheets("sheet1").Range("L2").
Sub FillInternetForm()
Dim ie As Object
Set ie = CreateObject("InternetExplorer.Application")
'set references for copying to submitted reps sheet and deleting for next rep
Application.ScreenUpdating = False
'create new instance of IE.
ie.navigate "removed for privacy"
'go to web page listed inside quotes
ie.Visible = True
While ie.busy
DoEvents 'wait until IE is done loading page.
Wend
'select onboarding system CRIS or ENS
Set html = ie.document
Dim drp As HTMLFormElement
Set drp = html.getElementById("ctl00$m$g_62853594_bb4b_4cec_8b5c_17fb6abb735e$ff45_1$ctl00 $DropDownChoice")
drp.selectedIndex = ThisWorkbook.Sheets("sheet1").Range("L2")
'set address nickname based on value
Set drp = html.getElementById("ctl00$m$g_62853594_bb4b_4cec_8b5c_17fb6abb735e$ff32_1$ctl00 $Lookup")
drp.selectedIndex = ThisWorkbook.Sheets("sheet1").Range("m2")
'set market based on value
Set drp = html.getElementById("ctl00$m$g_62853594_bb4b_4cec_8b5c_17fb6abb735e$ff6_1$ctl00$ DropDownChoice")
drp.selectedIndex = ThisWorkbook.Sheets("sheet1").Range("e2")
'check Not moved from another partner
ie.document.getElementById("ctl00_m_g_62853594_bb4b_4cec_8b5c_17fb6abb735e_ff46_1_ctl00_ctl01").Click
'input name and ssn based on excel sheet values
ie.document.all("ctl00$m$g_62853594_bb4b_4cec_8b5c_17fb6abb735e$ff3_1$ctl00$ctl00$TextField").Value = ThisWorkbook.Sheets("sheet1").Range("a2")
ie.document.all("ctl00$m$g_62853594_bb4b_4cec_8b5c_17fb6abb735e$ff4_1$ctl00$ctl00$TextField").Value = ThisWorkbook.Sheets("sheet1").Range("b2")
ie.document.all("ctl00$m$g_62853594_bb4b_4cec_8b5c_17fb6abb735e$ff5_1$ctl00$ctl00$TextField").Value = ThisWorkbook.Sheets("sheet1").Range("c2")
'input extra fields based on ICL value
'owner
ie.document.all("ctl00_m_g_62853594_bb4b_4cec_8b5c_17fb6abb735e_ff11_1_ctl00_ctl00_TextField").Value = ThisWorkbook.Sheets("sheet1").Range("j2")
'city
ie.document.all("ctl00$m$g_62853594_bb4b_4cec_8b5c_17fb6abb735e$ff14_1$ctl00$ctl00$TextField").Value = ThisWorkbook.Sheets("sheet1").Range("f2")
'state
ie.document.all("ctl00$m$g_62853594_bb4b_4cec_8b5c_17fb6abb735e$ff15_1$ctl00$ctl00$TextField").Value = ThisWorkbook.Sheets("sheet1").Range("g2")
'address
ie.document.all("ctl00$m$g_62853594_bb4b_4cec_8b5c_17fb6abb735e$ff13_1$ctl00$ctl00$TextField").Value = ThisWorkbook.Sheets("sheet1").Range("i2")
'phone
ie.document.all("ctl00$m$g_62853594_bb4b_4cec_8b5c_17fb6abb735e$ff10_1$ctl00$ctl00$TextField").Value = ThisWorkbook.Sheets("sheet1").Range("k2")
'zip
ie.document.all("ctl00$m$g_62853594_bb4b_4cec_8b5c_17fb6abb735e$ff16_1$ctl00$ctl00$TextField").Value = ThisWorkbook.Sheets("sheet1").Range("h2")
End Sub
While ie.busy is probably to blame. Use Loop Until ie.readyState = READYSTATE_COMPLETE instead.
I would convert .Range("L2") to an integer using CInt(.Range("L2")). You already have a reference to the HTML Object Library set; you should go ahead and add a reference to Microsoft Internet Controls as well. This way you'll get the advantages of IntelliSense and the Internet constants. If you don't want to set the reference, add Const READYSTATE_COMPLETE = 4 to the code and change the ie references back.
Sub FillInternetForm()
Dim ie As InternetExplorer
Dim doc As HTMLDocument
Set ie = New InternetExplorer
'set references for copying to submitted reps sheet and deleting for next rep
Application.ScreenUpdating = False
'create new instance of IE.
ie.navigate "removed for privacy"
'go to web page listed inside quotes
ie.Visible = True
Do
DoEvents
Loop Until ie.readyState = READYSTATE_COMPLETE
With Sheets("sheet1")
'select onboarding system CRIS or ENS
Set doc = ie.document
Dim drp As HTMLFormElement
Set drp = doc.getElementById("ctl00$m$g_62853594_bb4b_4cec_8b5c_17fb6abb735e$ff45_1$ctl00 $DropDownChoice")
drp.selectedIndex = CInt(.Range("L2"))
'set address nickname based on value
Set drp = doc.getElementById("ctl00$m$g_62853594_bb4b_4cec_8b5c_17fb6abb735e$ff32_1$ctl00 $Lookup")
drp.selectedIndex = CInt(.Range("m2"))
'set market based on value
Set drp = doc.getElementById("ctl00$m$g_62853594_bb4b_4cec_8b5c_17fb6abb735e$ff6_1$ctl00$ DropDownChoice")
drp.selectedIndex = CInt(.Range("e2"))
'check Not moved from another partner
doc.getElementById("ctl00_m_g_62853594_bb4b_4cec_8b5c_17fb6abb735e_ff46_1_ctl00_ctl01").Click
'input name and ssn based on excel sheet values
doc.all("ctl00$m$g_62853594_bb4b_4cec_8b5c_17fb6abb735e$ff3_1$ctl00$ctl00$TextField").value = .Range("a2")
doc.all("ctl00$m$g_62853594_bb4b_4cec_8b5c_17fb6abb735e$ff4_1$ctl00$ctl00$TextField").value = .Range("b2")
doc.all("ctl00$m$g_62853594_bb4b_4cec_8b5c_17fb6abb735e$ff5_1$ctl00$ctl00$TextField").value = .Range("c2")
'input extra fields based on ICL value
'owner
doc.all("ctl00_m_g_62853594_bb4b_4cec_8b5c_17fb6abb735e_ff11_1_ctl00_ctl00_TextField").value = .Range("j2")
'city
doc.all("ctl00$m$g_62853594_bb4b_4cec_8b5c_17fb6abb735e$ff14_1$ctl00$ctl00$TextField").value = .Range("f2")
'state
doc.all("ctl00$m$g_62853594_bb4b_4cec_8b5c_17fb6abb735e$ff15_1$ctl00$ctl00$TextField").value = .Range("g2")
'address
doc.all("ctl00$m$g_62853594_bb4b_4cec_8b5c_17fb6abb735e$ff13_1$ctl00$ctl00$TextField").value = .Range("i2")
'phone
doc.all("ctl00$m$g_62853594_bb4b_4cec_8b5c_17fb6abb735e$ff10_1$ctl00$ctl00$TextField").value = .Range("k2")
'zip
doc.all("ctl00$m$g_62853594_bb4b_4cec_8b5c_17fb6abb735e$ff16_1$ctl00$ctl00$TextField").value = .Range("h2")
End With
End Sub
I agree with @Thomas Inzina that it's probably because the document hasn't fully loaded yet.
But I've found in past projects that the best way to handle these 30% failure rates is to just "wait a bit longer" before processing or referring to elements.
Try adding a DoEvents and a Sleep command before your selectedIndex call.
See if it makes a difference.
At the top of your module (before any subs are declared)
Private Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
Then add these two lines before you set the doc variable:
DoEvents
Sleep 2000 ' Sleep two seconds (2000 ms)
Set doc = ie.document
You can play with the sleep value until you get a number that works for you.
EDIT - added solution
A more reliable way is to actually run a loop that has DoEvents/Sleep and an incrementing counter for numTries. It exits the loop only when the check for getElementById doesn't fail - or your maxTries counter is reached.
Pretty simple loop but let me know if you need an example
EDIT - EXAMPLE: A wait until loaded loop
Change this
Set drp = doc.getElementById("ctl00$m$g_62853594_bb4b_4cec_8b5c_17fb6abb735e$ff45_1$ctl00 $DropDownChoice")
drp.selectedIndex = CInt(.Range("L2"))
To this:
Dim iTries As Integer
Dim iMaxTries As Integer ' it's better to turn this into a const at top of your sub
iMaxTries = 3
iTries = 0
' getElementById returns Nothing (not Null) while the element is missing
While (iTries < iMaxTries) And (doc.getElementById("ctl00$m$g_62853594_bb4b_4cec_8b5c_17fb6abb735e$ff45_1$ctl00 $DropDownChoice") Is Nothing)
iTries = iTries + 1
DoEvents
Sleep 750 ' try 3/4 second delays to start with
Wend
If iTries = iMaxTries Then
MsgBox "Did not load HTML element in " & iMaxTries & " tries"
Exit Sub
End If
' Should all be loaded and ready to process now
Set drp = doc.getElementById("ctl00$m$g_62853594_bb4b_4cec_8b5c_17fb6abb735e$ff45_1$ctl00 $DropDownChoice")
drp.selectedIndex = CInt(.Range("L2"))

How to scrape data from the following table format VBA

I am trying to scrape the table contents of every page, from the first page to the last, from this webpage.
Using the code below I can scrape the table contents of page 1, but I don't know how to modify the code to get the data from the first page through to the last.
Option Explicit
Sub NBAStats()
Dim IE As Object, obj As Object
Dim r As Integer, c As Integer, t As Integer
Dim elemCollection As Object
Set IE = CreateObject("InternetExplorer.Application")
With IE
.Visible = True
.navigate ("http://stats.nba.com/league/player/#!/")
While IE.ReadyState <> 4
DoEvents
Wend
Do While IE.busy: DoEvents: Loop
ThisWorkbook.Sheet1.Clear
Set elemCollection = IE.Document.getElementsByTagName("TABLE")
For t = 0 To (elemCollection.Length - 1)
For r = 0 To (elemCollection(t).Rows.Length - 1)
For c = 0 To (elemCollection(t).Rows(r).Cells.Length - 1)
ThisWorkbook.Worksheets(1).Cells(r + 1, c + 1) = elemCollection(t).Rows(r).Cells(c).innerText
Next c
Next r
Next t
End With
Set IE = Nothing
End Sub
Try to find the sitemap.xml of the website you are scraping. The sitemap.xml will have all the links present on the site.
Import that XML file into your Excel sheet, read each link, and fetch each table from it.
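The sitemap step might look roughly like the sketch below. It is an illustration only: `ListSitemapLinks` and its `sitemapUrl` parameter are hypothetical, not every site publishes a sitemap, and the code assumes the server labels the response as XML so that `responseXML` is populated.

```vba
Option Explicit

' Hypothetical helper: downloads a sitemap and writes each <loc> URL
' into column A of the active sheet.
Public Sub ListSitemapLinks(ByVal sitemapUrl As String)
    Dim http As Object, xml As Object, nodes As Object
    Dim i As Long

    Set http = CreateObject("MSXML2.XMLHTTP.6.0")
    http.Open "GET", sitemapUrl, False      ' False = synchronous request
    http.send

    If http.Status <> 200 Then
        Debug.Print "Request failed with status " & http.Status
        Exit Sub
    End If

    ' responseXML is only filled in when the response was parsed as XML
    Set xml = http.responseXML
    If xml.documentElement Is Nothing Then
        Debug.Print "Response could not be parsed as XML"
        Exit Sub
    End If

    ' getElementsByTagName matches on the tag name, so the sitemap's
    ' default namespace does not have to be registered first
    Set nodes = xml.getElementsByTagName("loc")
    For i = 0 To nodes.Length - 1
        ActiveSheet.Cells(i + 1, 1).Value = nodes.Item(i).Text
    Next i
End Sub
```

Each URL written to column A can then be fed to whatever routine fetches the table from a single page.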
First, in my opinion, VBA automation of Internet Explorer is highly unstable and not really practicable for production use. This also means that scraping data from websites that provide it only for viewing in a browser is not really practicable for production use. If you are entitled to use the data, you should ask for another data source (XML or JSON, for example). If you are not entitled, you should not do it; the provider of the website may not agree to it.
To be clear, I'm talking about websites like this one, which provide their data through JavaScript only. If the data were in the HTML itself, you could get it via XMLHTTP. That is a different matter.
I will nevertheless provide a "solution", so you can't simply think, "He is unable to do this, so he says you should not do it."
You must analyze the site and pick out the elements you can click for navigation.
Option Explicit
Sub NBAStats()
Dim IE As Object
Dim r As Long, c As Long, t As Long, rSheet As Long, rStart As Long
Dim bReady As Boolean
Dim elementsTable As Object
Dim elementsPageNavRigth As Object
Dim elemPageNavRigth As Object
Dim elementsTableDiv As Object
ThisWorkbook.Worksheets(1).Cells.Clear
Set IE = CreateObject("InternetExplorer.Application")
With IE
.Visible = True
.navigate ("http://stats.nba.com/league/player/#!/")
Do While IE.busy
DoEvents
Loop
rSheet = 0
Do
Do While elementsTableDiv Is Nothing
Set elementsTableDiv = IE.Document.getElementsByClassName("table-responsive")
DoEvents
Loop
Do While elementsTableDiv(0) Is Nothing
DoEvents
Loop
Set elementsPageNavRigth = IE.Document.getElementsByClassName("page-nav right")
Set elemPageNavRigth = elementsPageNavRigth(0)
If elemPageNavRigth.className = "page-nav right disabled" Then bReady = True
'If rSheet = 0 Then rStart = 0 Else rStart = 1
Set elementsTable = elementsTableDiv(0).getElementsByTagName("TABLE")
For r = rStart To (elementsTable(0).Rows.Length - 1)
For c = 0 To (elementsTable(0).Rows(r).Cells.Length - 1)
ThisWorkbook.Worksheets(1).Cells(r + rSheet + 1, c + 1) = elementsTable(t).Rows(r).Cells(c).innerText
Next c
Next r
rSheet = rSheet + r
If Not elemPageNavRigth Is Nothing Then elemPageNavRigth.Click
Set elementsTableDiv = Nothing
Loop Until bReady Or elemPageNavRigth Is Nothing
End With
Set IE = Nothing
End Sub

Get data from listings on a website to excel VBA

I am trying to find a way to get the data from yelp.com
I have a spreadsheet on which there are several keywords and locations. I am looking to extract data from yelp listings based on these keywords and locations already in my spreadsheet.
I have created the following code, but it seems to get absurd data and not the exact information I am looking for.
I want to get the business name, address and phone number, but I am getting nothing useful. I would appreciate it if anyone here could help me solve this problem.
Sub find()
Dim ie As Object
Set ie = CreateObject("InternetExplorer.Application")
With ie
ie.Visible = False
ie.Navigate "http://www.yelp.com/search?find_desc=boutique&find_loc=New+York%2C+NY&ns=1&ls=3387133dfc25cc99#start=10"
' Don't show window
ie.Visible = False
'Wait until IE is done loading page
Do While ie.Busy
Application.StatusBar = "Downloading information, Please wait..."
DoEvents
Loop
' Make a string from IE content
Set mDoc = ie.Document
peopleData = mDoc.body.innerText
ActiveSheet.Cells(1, 1).Value = peopleData
End With
peopleData = "" 'Nothing
Set mDoc = Nothing
End Sub
If you right-click in IE and choose View Source, it is apparent that the data served on the site is not part of the document's .Body.innerText property. I notice this is often the case with dynamically served data, and that approach is really too simple for most web scraping.
I opened it in Google Chrome and inspected the elements to get an idea of what I'm really looking for and how to find it using a DOM/HTML parser; you will need to add a reference to the Microsoft HTML Object Library.
I think you can get it to return a collection of the <DIV> tags, and then check those for the class name with an If statement inside the loop.
I made some revisions to my original answer; this should print each record in a new cell:
Option Explicit
Private Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
Sub find()
'Uses late binding, or add reference to Microsoft HTML Object Library
' and change variable Types to use intellisense
Dim ie As Object 'InternetExplorer.Application
Dim html As Object 'HTMLDocument
Dim Listings As Object 'IHTMLElementCollection
Dim l As Object 'IHTMLElement
Dim r As Long
Set ie = CreateObject("InternetExplorer.Application")
With ie
.Visible = False
.Navigate "http://www.yelp.com/search?find_desc=boutique&find_loc=New+York%2C+NY&ns=1&ls=3387133dfc25cc99#start=10"
' Don't show window
'Wait until IE is done loading page
Do While .readyState <> 4
Application.StatusBar = "Downloading information, Please wait..."
DoEvents
Sleep 200
Loop
Set html = .Document
End With
Set Listings = html.getElementsByTagName("LI") ' ## returns the list
For Each l In Listings
'## make sure this list item looks like the listings Div Class:
' then, build the string to put in your cell
If InStr(1, l.innerHTML, "media-block clearfix media-block-large main-attributes") > 0 Then
Range("A1").Offset(r, 0).Value = l.innerText
r = r + 1
End If
Next
Set html = Nothing
Set ie = Nothing
End Sub