Unable to parse some links lying within an iframe - vba

I've written a script in VBA using IE to parse some links from a webpage. The thing is, the links are within an iframe. I've tweaked my code so that the script first finds a link within that iframe, navigates to that new page, and parses the required content from there. If I do it this way, I can get all the links.
Webpage URL: weblink
Successful approach (working one):
Sub Get_Links()
    Dim IE As New InternetExplorer, HTML As HTMLDocument
    Dim elem As Object, post As Object

    With IE
        .Visible = True
        .navigate "put here the above link"
        While .Busy = True Or .readyState < 4: DoEvents: Wend
        Set elem = .document.getElementById("compInfo")    'this element is the iframe holding the links
        .navigate elem.src
        While .Busy = True Or .readyState < 4: DoEvents: Wend
        Set HTML = .document
    End With

    For Each post In HTML.getElementsByClassName("news")
        With post.getElementsByTagName("a")
            If .Length Then R = R + 1: Cells(R, 1) = .Item(0).href
        End With
    Next post
    IE.Quit
End Sub
However, I've seen a few sites where no such link exists within the iframe, so I have no link I can use to track down the content.
If you look at the approach below by following the link, you'll notice that I've parsed content that sits within an iframe on another site. There was no link within that iframe to navigate to a new webpage and locate the content, so I used contentWindow.document instead and found it working flawlessly.
Link to the working code that parses iframe content on another site:
contentWindow approach
However, my question is: why should I navigate to a new webpage to collect the links when I can already see the content on the landing page? I tried using contentWindow.document, but it gives me an "access denied" error. How can I make my code below work with contentWindow.document, the way I did above?
This is what I tried, but it throws the "access denied" error:
Sub Get_Links()
    Dim IE As New InternetExplorer, HTML As HTMLDocument
    Dim frm As Object, post As Object

    With IE
        .Visible = True
        .Navigate "put here the above link"
        While .Busy = True Or .readyState < 4: DoEvents: Wend
        Set HTML = .document
    End With

    'the code breaks on the following line with an "access denied" error
    Set frm = HTML.getElementById("compInfo").contentWindow.document

    For Each post In frm.getElementsByClassName("news")
        With post.getElementsByTagName("a")
            If .Length Then R = R + 1: Cells(R, 1) = .Item(0).href
        End With
    Next post
    IE.Quit
End Sub
I've attached an image to show which links I'm after (they are marked with a pencil).
These are the elements within which one such link (the kind I would like to grab) is found:
<div class="news">
<span class="news-date_time"><img src="images/arrow.png" alt="">19 Jan 2018 00:01</span>
<a style="color:#5b5b5b;" href="/HomeFinancial.aspx?&cocode=INE117A01022&Cname=ABB-India-Ltd&srno=17019039003&opt=9">ABB India Limited - Press Release</a>
</div>
Image of the links on that page I would like to grab:
From the very first day of this thread I have strictly requested not to use the URL http://hindubusiness.cmlinks.com/Companydetails.aspx?cocode=INE117A01022 to locate the data. I asked for a solution that works from this main_page_link without touching the link within the iframe. However, everyone keeps providing solutions I've already shown in my post. What did I put a bounty up for, then?

You can see the links within the <iframe> in the browser, but you can't access them programmatically due to the same-origin policy.
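As a quick check before touching contentWindow, you can compare the host of the main document with the host in the iframe's src; if they differ, IE raises exactly that "access denied" error. A minimal sketch, reusing the IE instance and the compInfo iframe from the question's code (it assumes both URLs are absolute, which IE normally guarantees for the resolved src):

Dim frameEl As Object, mainHost As String, frameHost As String

Set frameEl = IE.document.getElementById("compInfo")
'Pull the host name out of each URL (the part between "//" and the next "/")
mainHost = Split(Split(IE.LocationURL, "//")(1), "/")(0)
frameHost = Split(Split(frameEl.src, "//")(1), "/")(0)

If StrComp(mainHost, frameHost, vbTextCompare) <> 0 Then
    'Cross-origin iframe: contentWindow.document will be blocked, so fall back
    'to navigating to frameEl.src or fetching it with XMLHTTP as shown below
    Debug.Print "Cross-origin frame, host: " & frameHost
End If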
Here is an example showing how to retrieve the links using XHR and RegEx:
Option Explicit

Sub Test()

    Dim sContent As String
    Dim sUrl As String
    Dim aLinks() As String
    Dim i As Long

    ' Retrieve initial webpage HTML content via XHR
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.thehindubusinessline.com/stocks/abb-india-ltd/overview/", False
        .Send
        sContent = .ResponseText
    End With
    'WriteTextFile sContent, CreateObject("WScript.Shell").SpecialFolders("Desktop") & "\tmp\tmp.htm", -1

    ' Extract target iframe URL via RegEx
    With CreateObject("VBScript.RegExp")
        .Global = True
        .MultiLine = True
        .IgnoreCase = True
        ' Match the iframe whose src points to the Companydetails page
        .Pattern = "<iframe[\s\S]*?src=""([^""]*?Companydetails[^""]*)""[^>]*>"
        sUrl = .Execute(sContent).Item(0).SubMatches(0)
    End With

    ' Retrieve iframe HTML content via XHR
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", sUrl, False
        .Send
        sContent = .ResponseText
    End With
    'WriteTextFile sContent, CreateObject("WScript.Shell").SpecialFolders("Desktop") & "\tmp\tmp.htm", -1

    ' Parse links via RegEx
    With CreateObject("VBScript.RegExp")
        .Global = True
        .MultiLine = True
        .IgnoreCase = True
        ' Process all anchors within div.news
        .Pattern = "<div class=""news"">[\s\S]*?href=""([^""]*)"
        With .Execute(sContent)
            ReDim aLinks(0 To .Count - 1)
            For i = 0 To .Count - 1
                aLinks(i) = .Item(i).SubMatches(0)
            Next
        End With
    End With

    Debug.Print Join(aLinks, vbCrLf)

End Sub
Generally, RegEx isn't recommended for HTML parsing, so consider this a disclaimer: the data being processed in this case is simple enough that RegEx handles it fine.
The output for me is as follows:
/HomeFinancial.aspx?&cocode=INE117A01022&Cname=ABB-India-Ltd&srno=17047038016&opt=9
/HomeFinancial.aspx?&cocode=INE117A01022&Cname=ABB-India-Ltd&srno=17046039003&opt=9
/HomeFinancial.aspx?&cocode=INE117A01022&Cname=ABB-India-Ltd&srno=17045039006&opt=9
/HomeFinancial.aspx?&cocode=INE117A01022&Cname=ABB-India-Ltd&srno=17043039002&opt=9
/HomeFinancial.aspx?&cocode=INE117A01022&Cname=ABB-India-Ltd&srno=17043010019&opt=9
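As noted in the disclaimer, RegEx is only used here because the markup is simple; the second step could also be done with an HTML parser. A rough sketch (it assumes sContent already holds the iframe HTML fetched above; getAttribute with the flag 2 returns the raw href attribute rather than a resolved URL):

Dim oDoc As Object, divEl As Object, anchors As Object

Set oDoc = CreateObject("htmlfile")
oDoc.body.innerHTML = sContent    'sContent = iframe HTML retrieved via XHR above

For Each divEl In oDoc.getElementsByTagName("div")
    If divEl.className = "news" Then
        Set anchors = divEl.getElementsByTagName("a")
        If anchors.Length > 0 Then Debug.Print anchors(0).getAttribute("href", 2)
    End If
Next divEl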
I also tried to copy the content of the <iframe> from IE to the clipboard (for later pasting into the worksheet) using these commands:
IE.ExecWB OLECMDID_SELECTALL, OLECMDEXECOPT_DODEFAULT
IE.ExecWB OLECMDID_COPY, OLECMDEXECOPT_DODEFAULT
But those commands actually select and copy the main document, excluding the frame, unless I click on the frame manually first. So that approach could only be applied if the click on the frame could be reproduced from VBA (frame node methods like .focus and .click didn't help).

Something like this should work. The key is to realize that the iframe is technically another document. Reviewing the iframe on the page you listed, you can easily use a web request to get at the data you need. As already mentioned, the error you get is due to the same-origin policy. You could write something to get the src of the iframe and then make the web request as I've shown below, or use IE to scrape the page, get the src, and then load that page, which looks like what you have done.
I would recommend the web request approach; Internet Explorer can get annoying, fast.
Code
Public Sub SOExample()
    Dim html As Object     'To store the HTML content
    Dim Elements As Object 'To store the anchor collection
    Dim Element As Object  'To iterate the anchor collection

    Set html = CreateObject("htmlFile")
    With CreateObject("MSXML2.XMLHTTP")
        'Navigate to the source of the iframe; it's another page.
        'View the source for the iframe. Alternatively -
        'you could navigate to this page and use IE to scrape it
        .Open "GET", "https://stocks.thehindubusinessline.com/Companydetails.aspx?&cocode=INE117A01022", False
        .send ""
        'See if the request was OK; exit if there was an error
        If Not .Status = 200 Then Exit Sub
        'Assign the page's HTML to an HTML object
        html.body.innerHTML = .responseText
        Set Elements = html.body.document.getElementById("hmstockchart_CompanyNews1_updateGLVV")
        Set Elements = Elements.getElementsByTagName("a")
        For Each Element In Elements
            'Print out the data to the Immediate window
            Debug.Print Element.innerText
        Next
    End With
End Sub
Results
ABB India Limited - AGM/Book Closure
Board of ABB India recommends final dividend
ABB India to convene AGM
ABB India to pay dividend
ABB India Limited - Outcome of Board Meeting
More ?
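If the goal is to land these rows on a worksheet rather than in the Immediate window, the Debug.Print loop above could be swapped for something along these lines (the sheet name "Sheet1" is just a placeholder for wherever you want the output):

Dim r As Long
For Each Element In Elements
    r = r + 1
    'Column A gets the link text, column B the href attribute
    ThisWorkbook.Worksheets("Sheet1").Cells(r, 1).Value = Element.innerText
    ThisWorkbook.Worksheets("Sheet1").Cells(r, 2).Value = Element.getAttribute("href")
Next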

The simple solution, as everyone has suggested, is to go directly to the link. That takes the IFRAME out of the picture and makes it easier for you to loop through the links. But if you still don't like that approach, then you need to dig a bit deeper into the hole.
Below is a function from a library I wrote a long time back in VB.NET:
https://github.com/tarunlalwani/ScreenCaptureAPI/blob/2646c627b4bb70e36fe2c6603acde4cee3354b39/Source%20Code/ScreenCaptureAPI/ScreenCaptureAPI/ScreenCapture.vb#L803
Private Function _EnumIEFramesDocument(ByVal wb As HTMLDocumentClass) As Collection
    Dim pContainer As olelib.IOleContainer = Nothing
    Dim pEnumerator As olelib.IEnumUnknown = Nothing
    Dim pUnk As olelib.IUnknown = Nothing
    Dim pBrowser As SHDocVW.IWebBrowser2 = Nothing
    Dim pFramesDoc As Collection = New Collection

    _EnumIEFramesDocument = Nothing
    pContainer = wb

    Dim i As Integer = 0

    ' Get an enumerator for the frames
    If pContainer.EnumObjects(olelib.OLECONTF.OLECONTF_EMBEDDINGS, pEnumerator) = 0 Then
        pContainer = Nothing

        ' Enumerate and refresh all the frames
        Do While pEnumerator.Next(1, pUnk) = 0
            On Error Resume Next

            ' Clear errors
            Err.Clear()

            ' Get the IWebBrowser2 interface
            pBrowser = pUnk

            If Err.Number = 0 Then
                pFramesDoc.Add(pBrowser.Document)
                i = i + 1
            End If
        Loop

        pEnumerator = Nothing
    End If

    _EnumIEFramesDocument = pFramesDoc
End Function
Basically, this is a VB.NET version of the C++ approach below:
Accessing body (at least some data) in a iframe with IE plugin Browser Helper Object (BHO)
Now you just need to port it to VBA. The only problem you may have is finding the olelib reference; most of the rest is VBA-compatible.
Once you get the collection of documents, find the one that belongs to your frame and then just use that one:
frames = _EnumIEFramesDocument(IE)
frames.Item(1).document.getElementsByTagName("A").length

Related

Can't click on some dots to scrape information

I've written a script in VBA, in combination with IE, to click some dots on a map in a web page. When a dot is clicked, a small box containing relevant information pops up.
Link to that website
I would like to parse the content of each box. The content of a box can be found using the class name contentPane. However, the main concern here is to generate each box by clicking on those dots. When a box shows up, it looks like the one in the image below.
This is the script I've tried so far:
Sub HitDotOnAMap()
    Const Url As String = "https://www.arcgis.com/apps/Embed/index.html?webmap=4712740e6d6747d18cffc6a5fa5988f8&extent=-141.1354,10.7295,-49.7292,57.6712&zoom=true&scale=true&search=true&searchextent=true&details=true&legend=true&active_panel=details&basemap_gallery=true&disable_scroll=true&theme=light"
    Dim IE As New InternetExplorer, HTML As HTMLDocument
    Dim post As Object, I&

    With IE
        .Visible = True
        .navigate Url
        While .Busy = True Or .readyState < 4: DoEvents: Wend
        Set HTML = .document
    End With

    Application.Wait Now + TimeValue("00:0:07")
    ''the following line zooms in the slider
    HTML.querySelector("#mapDiv_zoom_slider .esriSimpleSliderIncrementButton").Click
    Application.Wait Now + TimeValue("00:0:04")

    With HTML.querySelectorAll("[id^='NWQMC_VM_directory_'] circle")
        For I = 0 To .Length - 1
            .item(I).Focus
            .item(I).Click
            Application.Wait Now + TimeValue("00:0:03")
            Set post = HTML.querySelector(".contentPane")
            Debug.Print post.innerText
            HTML.querySelector("[class$='close']").Click
        Next I
    End With
End Sub
When I execute the above script, it looks like it is running smoothly, but nothing happens (I mean, no clicking) and it doesn't throw any error either. Finally, it quits the browser gracefully.
This is how a box with information looks when a dot gets clicked.
Although I've used hardcoded delays within my script, they can be fixed later once the macro starts working.
Question: How can I click each of the dots on that map and collect the relevant information from the pop-up box? I only expect a solution using Internet Explorer.
The data are not the main concern here. I would like to know how IE works in such cases so that I can deal with them in the future. Any solution other than IE is not what I'm looking for.
There's no need to click on each dot. The JSON file has all the details, and you can extract whatever you need from it.
Installation of JsonConverter
Download the latest release
Import JsonConverter.bas into your project (Open VBA Editor, Alt + F11; File > Import File)
Add Dictionary reference/class
For Windows-only, include a reference to "Microsoft Scripting Runtime"
For Windows and Mac, include VBA-Dictionary
References to be added
Download the sample file here.
Code:
Sub HitDotOnAMap()
    Const Url As String = "https://www.arcgis.com/sharing/rest/content/items/4712740e6d6747d18cffc6a5fa5988f8/data?f=json"
    Dim IE As New InternetExplorer, HTML As HTMLDocument
    Dim post As Object, I&
    Dim data As String, colObj As Object
    Dim JSON As Object, Item As Object, j As Long

    With IE
        .Visible = True
        .navigate Url
        While .Busy = True Or .readyState < 4: DoEvents: Wend
        data = .document.body.innerHTML
        data = Replace(Replace(data, "<pre>", ""), "</pre>", "")
    End With

    Set JSON = JsonConverter.ParseJson(data)
    Set colObj = JSON("operationalLayers")(1)("featureCollection")("layers")(1)("featureSet")

    For Each Item In colObj("features")
        For j = 1 To Item("attributes").Count - 1
            Debug.Print Item("attributes").Keys()(j), Item("attributes").Items()(j)
        Next
    Next
End Sub
Output
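As an aside, IE is only used here to download the JSON; the same payload can be fetched with XMLHTTP, which is usually faster. A minimal sketch (hypothetical sub name, same URL, JsonConverter still required):

Sub HitDotOnAMap_NoIE()
    Dim data As String, JSON As Object, Item As Object, j As Long

    'Fetch the JSON directly instead of loading it in a browser
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.arcgis.com/sharing/rest/content/items/4712740e6d6747d18cffc6a5fa5988f8/data?f=json", False
        .Send
        data = .responseText
    End With

    Set JSON = JsonConverter.ParseJson(data)
    For Each Item In JSON("operationalLayers")(1)("featureCollection")("layers")(1)("featureSet")("features")
        For j = 1 To Item("attributes").Count - 1
            Debug.Print Item("attributes").Keys()(j), Item("attributes").Items()(j)
        Next j
    Next Item
End Sub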

Unable to switch to a new tab in an efficient manner

I've written a script in VBA which can click a certain link (Draw a map) on a webpage. When the click is done, a new tab opens containing the information I would like to grab. My script does all of this without errors. Upon running, it scrapes the title visible as Make a Google Map from a GPS file from the new tab.
My question: is there any alternative way to switch to the new tab other than using a hardcoded search like If IE.LocationURL Like "*" & "output_geocoder" Then?
This is my script:
Sub FetchInfo()
    Const url As String = "http://www.gpsvisualizer.com/geocoder/"
    Dim IE As New InternetExplorer, Html As HTMLDocument, R&
    Dim winShell As New Shell

    With IE
        .Visible = True
        .navigate url
        While .Busy = True Or .readyState < 4: DoEvents: Wend
        Set Html = .document
    End With

    Html.querySelector("input[value$='map']").Click

    For Each IE In winShell.Windows
        If IE.LocationURL Like "*" & "output_geocoder" Then
            IE.Visible = True
            While IE.Busy = True Or IE.readyState < 4: DoEvents: Wend
            Set Html = IE.document
            Exit For
        End If
    Next

    Set post = Html.querySelector("h1")
    MsgBox post.innerText
    IE.Quit
End Sub
To execute the above script, add these references:
Microsoft Shell Controls And Automation
Microsoft Internet Controls
Microsoft HTML Object Library
By the way, there is nothing wrong with the above script. I only wish to know whether there is a better way to do the same thing.
This is the best I have so far with Selenium:
Option Explicit

Public Sub GetInfo()
    Dim d As WebDriver
    Set d = New ChromeDriver
    Const url = "http://www.gpsvisualizer.com/geocoder/"

    With d
        .Start "Chrome"
        .get url
        .FindElementByCss("input[value$='map']").Click
        .SwitchToNextWindow
        .FindElementByCss("input.gpsv_submit").Click
        MsgBox .Title
        Stop
        .Quit
    End With
End Sub
A version tied more firmly to the title is:
.SwitchToWindowByTitle("GPS Visualizer: Draw a map from a GPS data file").Activate
.FindElementByCss("input.gpsv_submit").Click
tl;dr;
I will need to read up more on how robust .SwitchToNextWindow is.
FYI, you can get handles info with:
Dim hwnds As List
Set hwnds = driver.Send("GET", "/window_handles")

Trying to get VBA to pull in data from Zillow

I am new to coding and have been trying to figure out how to extract specific data from Zillow and import it into Excel. To be honest, I am pretty lost trying to figure this out; I have been looking through the forum and other online videos, but I haven't had any luck.
Here is the link to the website I am using https://www.zillow.com/new-york-ny/home-values/
I am looking to pull all the numbers into Excel so I can run some calculations. If someone could help me just pull the Zillow Home Value Index of $660,000 into Excel, I feel that I can figure out the rest.
This is the markup from the website:
<ul class="value-info-list" id="yui_3_18_1_1_1529698944920_2626">
<li id="yui_3_18_1_1_1529698944920_2625">
<!-- TODO: need zillow logo icon here -->
<!-- <span class="zss-logo-color"><span class="zss-font-icon"></span></span> -->
<span class="value" id="yui_3_18_1_1_1529698944920_2624">
$660,000
</span>
<span class="info zsg-fineprint"> ZHVI
</span>
I tried getElementsByTagName, getElementById, and getElementsByClassName. The id is confusing me, since I want to be able to enter any town into Excel and have it search Zillow for the data on that page. All the id tags are different, so if I search by id in this code it will not work for other towns. I used the class name and was able to get some of the data I was looking for.
This is the code I came up with. It pulls the $660,000 into a message box, and the Range calls are working and putting that data into Excel. The call returns a bunch of strings, from which I was able to pull out the $660,000, but the way the string is set up I'm not sure how to pull the remaining data, such as the 1-year forecast ("yr_forecast" is the cell range I want to pull that data into).
Sub SearchBot1()
    'dimension (declare or set aside memory for) our variables
    Dim objIE As InternetExplorer 'special object variable representing the IE browser
    Dim aEle As HTMLLinkElement   'special object variable for an <a> (link) element
    Dim y As Integer              'integer variable we'll use as a counter
    Dim result As String          'string variable that will hold our result link
    Dim Doc As HTMLDocument       'holds the document object for Internet Explorer

    'initiate a new instance of Internet Explorer and assign it to objIE
    Set objIE = New InternetExplorer

    'make the IE browser visible (False would allow IE to run in the background)
    objIE.Visible = True

    'navigate IE to this web page
    objIE.navigate "https://www.zillow.com/new-york-ny/home-values/"

    'wait here a few seconds while the browser is busy
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop

    'in the search box put cell "B3" value, a comma, and cell "B4" value
    objIE.document.getElementById("local-search").Value = _
        Sheets("Sheet2").Range("B3").Value & ", " & Sheets("Sheet2").Range("B4").Value

    'click the 'go' button
    Set the_input_elements = objIE.document.getElementsByTagName("button")
    For Each input_element In the_input_elements
        If input_element.getAttribute("name") = "SubmitButton" Then
            input_element.Click
            Exit For
        End If
    Next input_element

    'wait again for the browser
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop

    'price for home
    Set Doc = objIE.document
    Dim cclass As String
    cclass = Trim(Doc.getElementsByClassName("value-info-list")(0).innerText)
    MsgBox cclass

    Dim aclass As Variant
    aclass = Split(cclass, " ")
    Range("Market_Price").Value = aclass(0)
    Range("yr_forecast").Value = aclass(5)

    'close the browser
    objIE.Quit
End Sub
If you need any more information, please let me know.
The value you want is in the first element with the class name value. You can use querySelector to apply the CSS selector .value, where "." is the class selector, to get this value.
Option Explicit

Public Sub GetInfo()
    Dim html As New MSHTML.HTMLDocument
    Const URL As String = "https://www.zillow.com/new-york-ny/home-values/"
    html.body.innerHTML = GetHTML(URL)
    Debug.Print html.querySelector(".value").innerText
End Sub

Public Function GetHTML(ByVal URL As String) As String
    Dim sResponse As String
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", URL, False
        .send
        sResponse = StrConv(.responseBody, vbUnicode)
    End With
    GetHTML = Mid$(sResponse, InStr(1, sResponse, "<!DOCTYPE "))
End Function
You could also use:
Debug.Print html.getElementsByClassName("value")(0).innerText
Current webpage value:
Code output:
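To land the scraped figure in the workbook as a real number (so it can feed the calculations mentioned in the question), something like the sub below could sit alongside GetInfo and GetHTML. The sub name is hypothetical, and the sheet and named range ("Sheet2", "Market_Price") are borrowed from the question's code, so adjust them to your workbook:

Public Sub WriteValueToSheet()
    Dim html As New MSHTML.HTMLDocument
    Dim raw As String

    html.body.innerHTML = GetHTML("https://www.zillow.com/new-york-ny/home-values/")
    raw = Trim$(html.querySelector(".value").innerText)    'e.g. "$660,000"

    'Strip the currency formatting so the cell holds a number, not text
    ThisWorkbook.Worksheets("Sheet2").Range("Market_Price").Value = _
        CDbl(Replace(Replace(raw, "$", vbNullString), ",", vbNullString))
End Sub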

How can I import data from a child URL?

I thought I figured this out over the weekend, but it actually doesn't work the way I thought it would. I have a confidential corporate SharePoint site that I work with. I can't post the link here, or any specific data, but the concept below will illustrate the point fine.
I have a parent URL that I want to import data from. Let's say this is the parent URL.
http://www.sharenet.co.za/v3/q_sharelookup.php
From there, I want to import data from a specific link. Let's say this is the link: 'Building & Construction Materials'
I think the best way to do this is some kind of InStr() function and search for the string. Then, if found, click the link and open the child URL. When the child URL opens, it looks something like this:
http://www.sharenet.co.za/v3/sharesfound.php?ssector=2353&exch=JSE&bookmark=Building%20&%20Construction%20Materials&scheme=default
I can't tell what the sector numbers will be ahead of time, so I can't use a specific URL. I need to reference it as the parent and child, or maybe IE1 and IE2. I want to import all data from the child URL, which, in this example, looks like this:
Name        Full Name                                       Code   Sector
BUILDMX     BUILDMAX LIMITED                                BDM    2353
KAYDAV      KAYDAV GROUP LTD                                KDV    2353
AFRIMAT     AFRIMAT LTD                                     AFT    2353
Trellidor   Trellidor Hldgs Ltd                             TRL    2353
MASONITE    MASONITE (AFRICA) LIMITED                       MAS    2353
DAWN        DISTRIBUTION AND WAREHOUSING NETWORK LIMITED    DAW    2353
MAZOR       MAZOR GROUP LTD                                 MZR    2353
PPC         PPC LIMITED                                     PPC    2353
PPCN        PPC Limited NPL                                 PPCN   2353
Just to demonstrate how I tried to solve this, I tried the script below.
Sub ListLinks()
    'Set a reference to Microsoft Internet Controls
    Dim IeApp As InternetExplorer
    Dim sURL As String
    Dim IeDoc As Object
    Dim i As Long

    Set IeApp = New InternetExplorer
    IeApp.Visible = True
    sURL = "http://www.sharenet.co.za/v3/q_sharelookup.php"
    IeApp.Navigate sURL

    Do
        DoEvents 'keep Excel responsive while the page loads
    Loop Until IeApp.ReadyState = READYSTATE_COMPLETE

    Set IeDoc = IeApp.Document

    For i = 0 To IeDoc.Links.Length - 1
        Cells(i + 1, 1).Value = IeDoc.Links(i).href
    Next i

    Set IeApp = Nothing
End Sub
I thought it would work fine, to list all URLs, and then loop through each to import data, but the problem on my SharePoint site is that the href doesn't appear to have any relevance to the name of the hyperlink.
In the picture above you can see 'Building & Construction Materials' in the TD element. If I can reference that in the 1st browser, and click the correct link to open a 2nd browser, and then reference that 2nd browser and scrape all TD elements from that, everything should work fine. Does anyone here know how to do that?
Good try on the code, you got it pretty close. The one area that needs some fixing is where you try to get the list of items and loop through it. You had the right idea about how it would work, but the HTML element syntax is a little off, so it looks like you just need some more experience using HTML objects... see the sample code below:
Public Sub sampleCode()
    Dim URL As String
    Dim XMLHTTP As MSXML2.XMLHTTP60
    Dim HTMLDoc_Main As HTMLDocument
    Dim HTMLDoc_Secondary As HTMLDocument
    Dim targetTable As Object   'generic element variable; the "dataTable" lookup below returns a table element
    Dim links As IHTMLElementCollection
    Dim linkCounter As Long
    Dim searchText As String

    URL = "http://www.sharenet.co.za/v3/q_sharelookup.php"
    searchText = "Building & Construction Materials"

    Set XMLHTTP = New MSXML2.XMLHTTP60
    Set HTMLDoc_Main = New HTMLDocument

    With XMLHTTP
        .Open "GET", URL, False
        .send
        While .readyState <> 4: Wend
        HTMLDoc_Main.body.innerHTML = .responseText
    End With

    Set targetTable = HTMLDoc_Main.getElementsByClassName("dataTable")(0)
    Set links = targetTable.getElementsByTagName("a")

    For linkCounter = 0 To links.Length - 1
        With links(linkCounter)
            If InStr(1, .innerText, searchText) > 0 Then
                Set XMLHTTP = New MSXML2.XMLHTTP60
                Set HTMLDoc_Secondary = New HTMLDocument

                XMLHTTP.Open "GET", .href, False
                XMLHTTP.send
                While XMLHTTP.readyState <> 4: Wend
                HTMLDoc_Secondary.body.innerHTML = XMLHTTP.responseText

                'Parse HTMLDoc_Secondary
            End If
        End With
    Next

    Set XMLHTTP = Nothing
    Set HTMLDoc_Main = Nothing
    Set HTMLDoc_Secondary = Nothing
End Sub
A couple of notes: 1) I used XMLHTTP instead of IE as it is faster, so 2) you are going to need to add 'Microsoft HTML Object Library' and 'Microsoft XML, v6.0' to your references, and 3) I can see you are writing to ranges cell by cell in your original code; if at all possible this should be avoided. Populate an array and then dump its entire contents into your target sheet all at once to save time, as sketched below...
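For example, a rough sketch of that array dump, assuming the child page's results sit in plain <tr>/<td> rows inside HTMLDoc_Secondary and that the four columns are Name, Full Name, Code and Sector as in the question (this would slot in where the 'Parse HTMLDoc_Secondary' comment sits):

Dim dataRows As IHTMLElementCollection
Dim rowCells As Object
Dim outArr() As String
Dim r As Long, c As Long

Set dataRows = HTMLDoc_Secondary.getElementsByTagName("tr")
ReDim outArr(1 To dataRows.Length, 1 To 4)

For r = 1 To dataRows.Length
    Set rowCells = dataRows(r - 1).getElementsByTagName("td")
    For c = 1 To 4
        If rowCells.Length >= c Then outArr(r, c) = rowCells(c - 1).innerText
    Next c
Next r

'One single write to the sheet instead of one write per cell
ThisWorkbook.Worksheets(1).Range("A1").Resize(dataRows.Length, 4).Value = outArr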
Hope this helps,
TheSilkCode

Get data from listings on a website to excel VBA

I am trying to find a way to get data from yelp.com.
I have a spreadsheet with several keywords and locations on it. I am looking to extract data from Yelp listings based on the keywords and locations already in my spreadsheet.
I have created the following code, but it seems to pull back a jumble of data rather than the exact information I am looking for.
I want to get the business name, address, and phone number, but all I am getting is nothing useful. I would appreciate any help solving this problem.
Sub find()
    Dim ie As Object
    Set ie = CreateObject("InternetExplorer.Application")

    With ie
        ie.Visible = False
        ie.Navigate "http://www.yelp.com/search?find_desc=boutique&find_loc=New+York%2C+NY&ns=1&ls=3387133dfc25cc99#start=10"

        ' Don't show window
        ie.Visible = False

        'Wait until IE is done loading the page
        Do While ie.Busy
            Application.StatusBar = "Downloading information, please wait..."
            DoEvents
        Loop

        ' Make a string from the IE content
        Set mDoc = ie.Document
        peopleData = mDoc.body.innerText
        ActiveSheet.Cells(1, 1).Value = peopleData
    End With

    peopleData = "" 'Nothing
    Set mDoc = Nothing
End Sub
If you right-click in IE and choose View Source, it is apparent that the data served on the site is not part of the document's .Body.innerText property. I notice this is often the case with dynamically served data, and that approach is really too simple for most web scraping.
I opened it in Google Chrome and inspected the elements to get an idea of what I'm really looking for and how to find it using a DOM/HTML parser; you will need to add a reference to Microsoft HTML Object Library.
I think you can get it to return a collection of the <DIV> tags and then check those for the class name with an If statement inside the loop.
I made some revisions to my original answer; this should print each record in a new cell:
Option Explicit

'API declaration for Sleep (use "Private Declare PtrSafe Sub ..." on 64-bit Office)
Private Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)

Sub find()
    'Uses late binding, or add a reference to Microsoft HTML Object Library
    ' and change the variable Types to use IntelliSense
    Dim ie As Object       'InternetExplorer.Application
    Dim html As Object     'HTMLDocument
    Dim Listings As Object 'IHTMLElementCollection
    Dim l As Object        'IHTMLElement
    Dim r As Long

    Set ie = CreateObject("InternetExplorer.Application")
    With ie
        .Visible = False
        .Navigate "http://www.yelp.com/search?find_desc=boutique&find_loc=New+York%2C+NY&ns=1&ls=3387133dfc25cc99#start=10"
        ' Don't show window
        'Wait until IE is done loading the page
        Do While .readyState <> 4
            Application.StatusBar = "Downloading information, please wait..."
            DoEvents
            Sleep 200
        Loop
        Set html = .Document
    End With

    Set Listings = html.getElementsByTagName("LI") ' ## returns the list
    For Each l In Listings
        '## make sure this list item looks like the listings Div Class:
        ' then, build the string to put in your cell
        If InStr(1, l.innerHTML, "media-block clearfix media-block-large main-attributes") > 0 Then
            Range("A1").Offset(r, 0).Value = l.innerText
            r = r + 1
        End If
    Next

    Set html = Nothing
    Set ie = Nothing
End Sub
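If separate columns for the name, address and phone number are needed, one rough option (this assumes each listing's innerText puts those pieces on their own lines, which should be verified against Yelp's current markup) is to split the text on vbCrLf, IE's usual line separator for innerText, and spread it across the row. The snippet below would replace the single-cell write inside the If block above:

Dim parts() As String, p As Long
parts = Split(l.innerText, vbCrLf)
For p = 0 To UBound(parts)
    'One piece of the listing per column, starting in column A
    Range("A1").Offset(r, p).Value = Trim$(parts(p))
Next p
r = r + 1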