Trying to get VBA to pull in data from Zillow - vba

I am new to coding and have been trying to figure out how to extract specific data from Zillow and import it into Excel. To be honest, I am pretty lost. I have been looking through the forum and online videos, but I haven't had any luck.
Here is the link to the website I am using: https://www.zillow.com/new-york-ny/home-values/
I am looking to pull all the numbers into Excel so I can run some calculations. If someone could help me pull just the Zillow Home Value Index of $660,000 into Excel, I feel that I can figure out the rest.
This is the HTML from the website:
<ul class="value-info-list" id="yui_3_18_1_1_1529698944920_2626">
<li id="yui_3_18_1_1_1529698944920_2625">
<!-- TODO: need zillow logo icon here -->
<!-- <span class="zss-logo-color"><span class="zss-font-icon"></span></span> -->
<span class="value" id="yui_3_18_1_1_1529698944920_2624">
$660,000
</span>
<span class="info zsg-fineprint"> ZHVI
</span>
I tried getElementsByTagName, getElementById and getElementsByClassName. The id is confusing me, since I want to be able to enter any town into Excel and have it search Zillow for that town's data. All the ids are different, so if I search by id, this code will not work for other towns. I used the class name and was able to get some of the data I was looking for.
This is the code I came up with. It shows the $660,000 in the message box, and the Range assignment puts that data into Excel. The innerText comes back as one long string; I was able to pull the $660,000 out of it, but because of the way the string is set up I'm not sure how to pull the remaining data, such as the 1-year forecast. "yr_forecast" is the named range I want to put that value into.
Sub SearchBot1()
'dimension (declare or set aside memory for) our variables
Dim objIE As InternetExplorer 'special object variable representing the IE browser
Dim aEle As HTMLLinkElement 'special object variable for an <a> (link) element
Dim y As Integer 'integer variable we'll use as a counter
Dim result As String 'string variable that will hold our result link
Dim Doc As HTMLDocument 'holds document object for internet explorer
'initiating a new instance of Internet Explorer and assigning it to objIE
Set objIE = New InternetExplorer
'make IE browser visible (False would allow IE to run in the background)
objIE.Visible = True
'navigate IE to this web page (a pretty neat search engine really)
objIE.navigate "https://www.zillow.com/new-york-ny/home-values/"
'wait here a few seconds while the browser is busy
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
'in the search box put the values of cells B3 and B4 on Sheet2, separated by a comma
objIE.document.getElementById("local-search").Value = _
Sheets("Sheet2").Range("B3").Value & ", " & Sheets("Sheet2").Range("B4").Value
'click the 'go' button
Set the_input_elements = objIE.document.getElementsByTagName("button")
For Each input_element In the_input_elements
If input_element.getAttribute("name") = "SubmitButton" Then
input_element.Click
Exit For
End If
Next input_element
'wait again for the browser
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
'price for home
Set Doc = objIE.document
Dim cclass As String
cclass = Trim(Doc.getElementsByClassName("value-info-list")(0).innerText)
MsgBox cclass
Dim aclass As Variant
aclass = Split(cclass, " ")
Range("Market_Price").Value = aclass(0)
Range("yr_forecast").Value = aclass(5)
'close the browser
objIE.Quit
End Sub
If you need any more information, please let me know.

The value you want is in the first element with the class name value. You can use querySelector with the CSS selector .value (where "." selects by class) to get this value.
Option Explicit
'Requires a reference to Microsoft HTML Object Library
Public Sub GetInfo()
Dim html As New MSHTML.HTMLDocument
Const URL As String = "https://www.zillow.com/new-york-ny/home-values/"
html.body.innerHTML = GetHTML(URL)
Debug.Print html.querySelector(".value").innerText
End Sub
Public Function GetHTML(ByVal URL As String) As String
Dim sResponse As String, startPos As Long
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", URL, False
.send
sResponse = StrConv(.responseBody, vbUnicode)
End With
'Start at the DOCTYPE if there is one; otherwise keep the whole response
startPos = InStr(1, sResponse, "<!DOCTYPE ", vbTextCompare)
If startPos = 0 Then startPos = 1
GetHTML = Mid$(sResponse, startPos)
End Function
You could also use:
Debug.Print html.getElementsByClassName("value")(0).innerText
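Both calls return text such as "$660,000", not a number. Since the goal is to run calculations on it, the text has to be converted first. A minimal sketch, assuming the values look like the ones in the question's HTML (a leading "$", thousands separators, possibly surrounding line breaks, or a trailing "%" for a forecast); CurrencyToDouble is a hypothetical helper name. Val is used because it always reads "." as the decimal separator, regardless of the system locale.

```vba
' Hypothetical helper: turns text like "$660,000" or "-2.5%" into a Double.
' Assumes US-style formatting ("," as thousands separator, "." as decimal).
' Returns 0 if no number can be read, so check for empty text first if that matters.
Public Function CurrencyToDouble(ByVal s As String) As Double
    Dim t As String
    t = s
    t = Replace$(t, vbCr, "")   ' innerText can include line breaks
    t = Replace$(t, vbLf, "")
    t = Replace$(t, "$", "")    ' currency symbol
    t = Replace$(t, ",", "")    ' thousands separators
    t = Replace$(t, "%", "")    ' percentage sign on forecast values
    t = Replace$(t, " ", "")    ' stray spaces
    CurrencyToDouble = Val(t)   ' Val is locale-independent
End Function

' Example use with the answer's GetInfo (Market_Price is the named range from the question):
' Range("Market_Price").Value = CurrencyToDouble(html.querySelector(".value").innerText)
```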

Related

How to enter a website address using VBA and search

I know this may seem easy. I have already written some code to try to get this to work, but ran into one problem. The format of the link below is the same for all cities and states. As long as you type the name of the city ("City_Search") and the state ("State_Search"), you should be able to access the page with the information, as shown below.
I have attached the code I am using below. If anyone can assist me with the search, I would appreciate it.
Sub SearchBot1()
'dimension (declare or set aside memory for) our variables
Dim objIE As InternetExplorer 'special object variable representing the IE browser
Dim aEle As HTMLLinkElement 'special object variable for an <a> (link) element
Dim HTMLinputs As MSHTML.IHTMLElementCollection
'initiating a new instance of Internet Explorer and assigning it to objIE
Set objIE = New InternetExplorer
'make IE browser visible (False would allow IE to run in the background)
objIE.Visible = True
'navigate IE to this web page (a pretty neat search engine really)
objIE.navigate "https://datausa.io/profile/geo/" & Range("City_Search").Value & "-" & Range("State_Search").Value
'wait here a few seconds while the browser is busy
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
End Sub
The idea would be for me to type any city into Excel and, once I run the macro, it will go to the site and load that town's data. Below is an example of the page I am looking to get when I search.
https://datausa.io/profile/geo/hoboken-nj/
You need to hyphenate cities that have spaces in their names. Counties and states need to use the correct abbreviation, and both parts are case sensitive, i.e. they need to be all lower case. So you need to add these hyphens, if missing, using a function like Replace in VBA to swap spaces (Chr$(32)) for hyphens ("-" or Chr$(45)), and potentially LCase$ to convert to lower case.
You should also fully qualify the range with the worksheet you intend to use.
With data already in correct format in cell:
E.g. with los-angeles-ca or los-angeles-county-ca in a cell.
Option Explicit
Public Sub SearchBot1()
Dim objIE As InternetExplorer, aEle As HTMLLinkElement
Dim HTMLinputs As MSHTML.IHTMLElementCollection
Set objIE = New InternetExplorer
'e.g. https://datausa.io/profile/geo/los-angeles-ca/
With objIE
.Visible = True
.navigate "https://datausa.io/profile/geo/" & Range("City_Search").Value & "-" & Range("State_Search").Value
Do While .Busy = True Or .readyState <> 4: DoEvents: Loop
Stop
' .Quit '<== Uncomment me to close browser at end
End With
End Sub
Adding hyphens:
If you had los angeles, not los-angeles, in a cell:
Replace$(Range("City_Search").Value, Chr$(32), Chr$(45))
Lowercase and hyphen:
To be really safe, you could also convert to lower case to handle any upper-case letters in the cell you are referencing, e.g.
For Los Angeles use: Replace$(LCase$(Range("City_Search").Value), Chr$(32), Chr$(45))
Option Explicit
Public Sub SearchBot1()
Dim objIE As InternetExplorer, aEle As HTMLLinkElement
Dim HTMLinputs As MSHTML.IHTMLElementCollection, ws As Worksheet
Set ws = ThisWorkbook.Worksheets("Sheet1")
Set objIE = New InternetExplorer
'e.g. https://datausa.io/profile/geo/los-angeles-ca/
With objIE
.Visible = True
.navigate "https://datausa.io/profile/geo/" & ws.Range("City_Search").Value & "-" & ws.Range("State_Search").Value
Do While .Busy = True Or .readyState <> 4: DoEvents: Loop
Stop
' .Quit '<== Uncomment me to close browser at end
End With
End Sub
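The hyphen and lower-case advice can be folded into the fully qualified version in one place. This is a sketch, assuming the URL pattern shown above (city-state, lower case, hyphen-separated) and the same named ranges on Sheet1; BuildDataUsaSlug and SearchBotSlug are hypothetical names, and the Sub needs the same Microsoft Internet Controls reference as the code above.

```vba
' Hypothetical helper: "Los Angeles", "CA" -> "los-angeles-ca"
Public Function BuildDataUsaSlug(ByVal city As String, ByVal state As String) As String
    Dim s As String
    s = LCase$(Trim$(city)) & "-" & LCase$(Trim$(state))
    ' Collapse repeated spaces so "New  York" does not become "new--york"
    Do While InStr(1, s, "  ") > 0
        s = Replace$(s, "  ", " ")
    Loop
    BuildDataUsaSlug = Replace$(s, Chr$(32), Chr$(45))   ' spaces -> hyphens
End Function

Public Sub SearchBotSlug()
    Dim objIE As InternetExplorer, ws As Worksheet
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    Set objIE = New InternetExplorer
    With objIE
        .Visible = True
        .navigate "https://datausa.io/profile/geo/" & _
            BuildDataUsaSlug(ws.Range("City_Search").Value, ws.Range("State_Search").Value)
        Do While .Busy Or .readyState <> 4: DoEvents: Loop
    End With
End Sub
```

Keeping the slug logic in a function means it can be checked on its own in the Immediate window without opening a browser.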
That gets you to the pages. What you do then......
Did you know that this website has its own data-search API?
And that you can also extract data using a background object instead of creating an Internet Explorer instance?
For instance:
Sub getCityData()
''' Create a background server connection
Dim myCon As Object: Set myCon = CreateObject("MSXML2.ServerXMLHTTP.6.0")
''' Open a connection string with the DataUSA API and basic request for (geo, place, population)
myCon.Open "GET", "http://api.datausa.io/api/?show=geo&sumlevel=place&required=pop"
myCon.send ''' Send the request
''' Dataset in the ResponseText is HUGE so for demo show first 5000 characters
Sheet1.Range("A1").Value2 = Left(myCon.responseText, 5000)
End Sub
That will pull the ENTIRE DATA SET for every "place" in America with its population for every year from 2013 onwards in about a second. It will place the first 5000 characters of the dataset into cell A1 on Sheet1 (I recommend putting this in a new Excel file).
I don't have time to learn the site's API, but it seems to have good documentation on GitHub, and the responses come back in JSON format. If you really want to make a powerful Excel interface, use their API with background connections; they have so much data for the USA at your fingertips.
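Only the first 5000 characters are shown because a single Excel cell holds at most 32,767 characters. To look at more of the response without parsing the JSON, one option is to split the text into cell-sized chunks and write one chunk per row. A minimal sketch; ChunkText is a hypothetical helper, and the usage lines assume the myCon object from the example above.

```vba
' Hypothetical helper: splits s into pieces of at most size characters.
' Returns an empty array for an empty string.
Public Function ChunkText(ByVal s As String, ByVal size As Long) As Variant
    Dim n As Long, i As Long, parts() As String
    If size <= 0 Then Err.Raise 5, , "size must be positive"
    If Len(s) = 0 Then
        ChunkText = Array()
        Exit Function
    End If
    n = (Len(s) + size - 1) \ size          ' number of chunks, rounded up
    ReDim parts(0 To n - 1)
    For i = 0 To n - 1
        parts(i) = Mid$(s, i * size + 1, size)
    Next i
    ChunkText = parts
End Function

' Example use after myCon.send:
' Dim parts As Variant, i As Long
' parts = ChunkText(myCon.responseText, 32767)
' Sheet1.Columns(1).NumberFormat = "@"     ' store chunks as text, never as formulas
' For i = LBound(parts) To UBound(parts)
'     Sheet1.Cells(i + 1, 1).Value2 = parts(i)
' Next i
```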

Unable to parse some links lying within an iframe

I've written a script in VBA using IE to parse some links from a webpage. The thing is, the links are within an iframe. I've tweaked my code so that the script first finds a link within that iframe, navigates to that new page and parses the required content from there. If I do it this way, I can get all the links.
Webpage URL: weblink
Successful approach (working one):
Sub Get_Links()
Dim IE As New InternetExplorer, HTML As HTMLDocument
Dim elem As Object, post As Object, R As Long
With IE
.Visible = True
.navigate "put here the above link"
While .Busy = True Or .readyState < 4: DoEvents: Wend
Set elem = .document.getElementById("compInfo") 'it is within the iframe
.navigate elem.src
While .Busy = True Or .readyState < 4: DoEvents: Wend
Set HTML = .document
End With
For Each post In HTML.getElementsByClassName("news")
With post.getElementsByTagName("a")
If .Length Then R = R + 1: Cells(R, 1) = .Item(0).href
End With
Next post
IE.Quit
End Sub
I've seen a few sites where no such link exists within the iframe, so I would have no link to use to track down the content.
If you take a look at the approach linked below, you will notice that I've parsed content from a webpage that sits within an iframe. There is no link within that iframe to navigate to a new webpage to locate the content, so I used contentWindow.document instead and found it working flawlessly.
Link to the working code of parsing Iframe content from another site:
contentWindow approach
However, my question is: why should I navigate to a new webpage to collect the links when I can see the content on the landing page? I tried using contentWindow.document, but it gives me an access denied error. How can I make my code below work using contentWindow.document, as I did above?
I tried like this but it throws access denied error:
Sub Get_Links()
Dim IE As New InternetExplorer, HTML As HTMLDocument
Dim frm As Object, post As Object, R As Long
With IE
.Visible = True
.Navigate "put here the above link"
While .Busy = True Or .readyState < 4: DoEvents: Wend
Set HTML = .document
End With
''the code breaks when it hits the following line "access denied error"
Set frm = HTML.getElementById("compInfo").contentWindow.document
For Each post In frm.getElementsByClassName("news")
With post.getElementsByTagName("a")
If .Length Then R = R + 1: Cells(R, 1) = .Item(0).href
End With
Next post
IE.Quit
End Sub
I've attached an image to show which links (they are marked with a pencil) I'm after.
These are the elements within which one such link (that I would like to grab) is found:
<div class="news">
<span class="news-date_time"><img src="images/arrow.png" alt="">19 Jan 2018 00:01</span>
<a style="color:#5b5b5b;" href="/HomeFinancial.aspx?&cocode=INE117A01022&Cname=ABB-India-Ltd&srno=17019039003&opt=9">ABB India Limited - Press Release</a>
</div>
Image of the links of that page I would like to grab:
From the very first day of this thread, I explicitly asked people not to use this URL http://hindubusiness.cmlinks.com/Companydetails.aspx?cocode=INE117A01022 to locate the data. I asked for a solution from this main_page_link without touching the link within the iframe. However, everyone is providing solutions that I've already shown in my post. What did I put up a bounty for, then?
You can see the links within the <iframe> in the browser, but you can't access them programmatically due to the same-origin policy.
Here is an example showing how to retrieve the links using XHR and RegEx:
Option Explicit
Sub Test()
Dim sContent As String
Dim sUrl As String
Dim aLinks() As String
Dim i As Long
' Retrieve initial webpage HTML content via XHR
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.thehindubusinessline.com/stocks/abb-india-ltd/overview/", False
.Send
sContent = .ResponseText
End With
'WriteTextFile sContent, CreateObject("WScript.Shell").SpecialFolders("Desktop") & "\tmp\tmp.htm", -1
' Extract target iframe URL via RegEx
With CreateObject("VBScript.RegExp")
.Global = True
.MultiLine = True
.IgnoreCase = True
' Capture the src of the <iframe> that points to Companydetails
.Pattern = "<iframe[\s\S]*?src=""([^""]*?Companydetails[^""]*)""[^>]*>"
If Not .Test(sContent) Then Exit Sub
sUrl = .Execute(sContent).Item(0).SubMatches(0)
End With
' Retrieve iframe HTML content via XHR
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", sUrl, False
.Send
sContent = .ResponseText
End With
'WriteTextFile sContent, CreateObject("WScript.Shell").SpecialFolders("Desktop") & "\tmp\tmp.htm", -1
' Parse links via RegEx
With CreateObject("VBScript.RegExp")
.Global = True
.MultiLine = True
.IgnoreCase = True
' Capture the first href within each div.news
.Pattern = "<div class=""news"">[\s\S]*?href=""([^""]*)"
With .Execute(sContent)
If .Count = 0 Then Exit Sub
ReDim aLinks(0 To .Count - 1)
For i = 0 To .Count - 1
aLinks(i) = .Item(i).SubMatches(0)
Next
End With
End With
Debug.Print Join(aLinks, vbCrLf)
End Sub
Generally, regexes aren't recommended for HTML parsing, hence the disclaimer. The data being processed in this case is quite simple, which is why it is parsed with RegEx.
The output for me is as follows:
/HomeFinancial.aspx?&cocode=INE117A01022&Cname=ABB-India-Ltd&srno=17047038016&opt=9
/HomeFinancial.aspx?&cocode=INE117A01022&Cname=ABB-India-Ltd&srno=17046039003&opt=9
/HomeFinancial.aspx?&cocode=INE117A01022&Cname=ABB-India-Ltd&srno=17045039006&opt=9
/HomeFinancial.aspx?&cocode=INE117A01022&Cname=ABB-India-Ltd&srno=17043039002&opt=9
/HomeFinancial.aspx?&cocode=INE117A01022&Cname=ABB-India-Ltd&srno=17043010019&opt=9
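The extracted hrefs are root-relative (they start with "/"), so they only become usable URLs once the host is added back. A browser resolves a relative link against the document that contains it, which here is the iframe page, so a sketch, assuming sUrl holds the absolute iframe URL (for example the https://stocks.thehindubusinessline.com/Companydetails.aspx?... address used in another answer), could look like this. ToAbsoluteUrl is a hypothetical helper that deliberately only handles absolute and root-relative hrefs, not "../" style paths.

```vba
' Hypothetical helper: prefixes a root-relative href with the scheme and host of baseUrl.
' Absolute hrefs (starting with "http") are returned unchanged.
' Simplification: other relative forms are also resolved against the host root.
Public Function ToAbsoluteUrl(ByVal baseUrl As String, ByVal href As String) As String
    Dim schemeEnd As Long, hostEnd As Long, origin As String
    If LCase$(Left$(href, 4)) = "http" Then
        ToAbsoluteUrl = href
        Exit Function
    End If
    schemeEnd = InStr(1, baseUrl, "://")
    hostEnd = InStr(schemeEnd + 3, baseUrl, "/")     ' first "/" after the host
    If hostEnd = 0 Then origin = baseUrl Else origin = Left$(baseUrl, hostEnd - 1)
    If Left$(href, 1) = "/" Then
        ToAbsoluteUrl = origin & href
    Else
        ToAbsoluteUrl = origin & "/" & href
    End If
End Function

' Example use at the end of Sub Test:
' For i = 0 To UBound(aLinks)
'     Debug.Print ToAbsoluteUrl(sUrl, aLinks(i))
' Next i
```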
I also tried to copy the content of the <iframe> from IE to clipboard (for further pasting to the worksheet) using commands:
IE.ExecWB OLECMDID_SELECTALL, OLECMDEXECOPT_DODEFAULT
IE.ExecWB OLECMDID_COPY, OLECMDEXECOPT_DODEFAULT
But those commands actually select and copy the main document, excluding the frame, unless I click on the frame manually. So this approach might work if the click on the frame could be reproduced from VBA (frame node methods like .focus and .click didn't help).
Something like this should work. The key is to realize that the iframe is technically another document. Reviewing the iframe on the page you listed, you can easily use a web request to get at the data you need. As already mentioned, the reason you get an error is the same-origin policy. You could write something to get the src of the iframe and then do the web request as I've shown below, or use IE to scrape the page, get the src, and then load that page, which looks like what you have done.
I would recommend using a web request approach, Internet Explorer can get annoying, fast.
Code
Public Sub SOExample()
Dim html As Object 'To store the HTML content
Dim Elements As Object 'To store the anchor collection
Dim Element As Object 'To iterate the anchor collection
Set html = CreateObject("htmlFile")
With CreateObject("MSXML2.XMLHTTP")
'Navigate to the source of the iFrame, it's another page
'View the source for the iframe. Alternatively -
'you could navigate to this page and use IE to scrape it
.Open "GET", "https://stocks.thehindubusinessline.com/Companydetails.aspx?&cocode=INE117A01022", False
.send
'See if the request was OK; exit if there was an error
If .Status <> 200 Then Exit Sub
'Assign the page's HTML to an HTML object
html.body.InnerHTML = .responseText
'Find the news container by id; exit if it is missing
Set Elements = html.getElementById("hmstockchart_CompanyNews1_updateGLVV")
If Elements Is Nothing Then Exit Sub
Set Elements = Elements.getElementsByTagName("a")
For Each Element In Elements
'Print out the data to the Immediate window
Debug.Print Element.InnerText
Next
End With
End Sub
Results
ABB India Limited - AGM/Book Closure
Board of ABB India recommends final dividend
ABB India to convene AGM
ABB India to pay dividend
ABB India Limited - Outcome of Board Meeting
More ?
The simple solution, as everyone suggested, is to go directly to the link. This takes the iframe out of the picture and makes it easier for you to loop through the links. But if you still don't like that approach, then you need to go a bit deeper down the hole.
Below is a function from a library I wrote a long time ago in VB.NET:
https://github.com/tarunlalwani/ScreenCaptureAPI/blob/2646c627b4bb70e36fe2c6603acde4cee3354b39/Source%20Code/ScreenCaptureAPI/ScreenCaptureAPI/ScreenCapture.vb#L803
Private Function _EnumIEFramesDocument(ByVal wb As HTMLDocumentClass) As Collection
Dim pContainer As olelib.IOleContainer = Nothing
Dim pEnumerator As olelib.IEnumUnknown = Nothing
Dim pUnk As olelib.IUnknown = Nothing
Dim pBrowser As SHDocVW.IWebBrowser2 = Nothing
Dim pFramesDoc As Collection = New Collection
_EnumIEFramesDocument = Nothing
pContainer = wb
Dim i As Integer = 0
' Get an enumerator for the frames
If pContainer.EnumObjects(olelib.OLECONTF.OLECONTF_EMBEDDINGS, pEnumerator) = 0 Then
pContainer = Nothing
' Enumerate and refresh all the frames
Do While pEnumerator.Next(1, pUnk) = 0
On Error Resume Next
' Clear errors
Err.Clear()
' Get the IWebBrowser2 interface
pBrowser = pUnk
If Err.Number = 0 Then
pFramesDoc.Add(pBrowser.Document)
i = i + 1
End If
Loop
pEnumerator = Nothing
End If
_EnumIEFramesDocument = pFramesDoc
End Function
So basically this is a VB.NET version of the C++ version from this question:
Accessing body (at least some data) in a iframe with IE plugin Browser Helper Object (BHO)
Now you just need to port it to VBA. The only problem you may have is finding the olelib reference; most of the rest is VBA compatible.
So once you get the collection of documents, find the one that belongs to your frame and then use just that one:
frames = _EnumIEFramesDocument(IE.Document)
frames.Item(1).getElementsByTagName("A").length

Scrape data from 'nested' iframe with vba

I am trying to scrape the title and price from mcmaster.com to auto-populate an internal purchase rec form. (The URL with the item part number is hard-coded below for testing.) I got the title fine, but can't get the price. I don't think my code is looking in the correct place, because the element is not found. I modeled my code on the example here.
Sub get_title_header()
Dim wb As Object
Dim doc As Object
Dim sURL As String
Set wb = CreateObject("internetExplorer.Application")
sURL = "https://www.mcmaster.com/#95907A480"
wb.navigate sURL
wb.Visible = True
While wb.Busy Or wb.readyState <> READYSTATE_COMPLETE
DoEvents
Wend
'Wait a bit for everything to catchup...(network?)
Application.Wait (Now + TimeValue("0:00:02"))
'HTML document
Set doc = wb.document
Dim MyString As String
MyString = doc.Title
Sheet2.Cells(1, 2) = Trim(Replace(MyString, "McMaster-Carr -", "", , , vbTextCompare))
Dim el As Object
For Each el In doc.getElementsByName("Prce")
Sheet2.Cells(2, 2).Value = el.innerText
Next el
wb.Quit
End Sub
Is this iframe nested? Can anyone explain why my code can't see the iframe data and help me get the item price? Thanks in advance!
The element you try to scrape in your link is nested as follows:
The first mistake you make is that you're trying to get this object by name, not by id:
doc.getElementsByName("Prce")
There's no object in your HTML document with name="Prce", so I'd expect the For loop not even to start.
What you might want to do to get your price, instead, is:
Sheet2.Cells(2, 2).Value = doc.getElementById("Prce").getElementsByClassName("PrceTxt")(0).innerText
where:
doc.getElementById("Prce") gets you the element <div id="Prce">...</div>
.getElementsByClassName("PrceTxt")(0) gets you the first (and only) element inside the above div, i.e. <div class="PrceTxt">...</div>
.innerText gets you the text of this element
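The one-liner above raises run-time error 91 if #Prce has not been rendered yet, because getElementById then returns Nothing (the question already waits two seconds for the page to catch up, which suggests timing matters). getElementsByClassName may also be unavailable in some document modes, such as the late-bound "htmlfile" object. A more defensive sketch walks the child elements by tag name and compares className instead; FirstTextByClass is a hypothetical helper, and the usage line assumes the question's doc and Sheet2.

```vba
' Hypothetical helper: text of the first descendant of parent with the given tag
' whose class list contains className. Returns "" if parent is Nothing or no match.
Public Function FirstTextByClass(ByVal parent As Object, ByVal tagName As String, _
                                 ByVal className As String) As String
    Dim el As Object
    If parent Is Nothing Then Exit Function           ' element not (yet) in the page
    For Each el In parent.getElementsByTagName(tagName)
        ' Pad with spaces so "PrceTxt" matches inside class="PrceTxt other"
        If InStr(1, " " & el.className & " ", " " & className & " ", vbBinaryCompare) > 0 Then
            FirstTextByClass = Trim$(el.innerText)
            Exit Function
        End If
    Next el
End Function

' Example use in get_title_header:
' Sheet2.Cells(2, 2).Value = FirstTextByClass(doc.getElementById("Prce"), "div", "PrceTxt")
```

If the function returns an empty string, waiting a little longer and calling it again is a simple retry strategy.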

Visual Basic Word Get Elements from a Website

I'm trying to import text from specific divs of a website into a bookmark in a Word document, and I'm stuck on reading the HTML from the website. I tried 100 tutorials, all for Excel VBA (maybe that's why), and always got the same result.
Let's say that I have a site like:
<html>
<div id = "test">
this is an example text
</div>
</html>
and here is my VBA Code:
Sub read_html()
Set objIE = CreateObject("InternetExplorer.Application")
Dim htmlOut As String
With objIE
.Navigate "http://blabla.net/testy/test.html"
Do
Loop Until Not .Busy
htmlOut = .Document.getElementsByName("test")
.Quit
MsgBox "example:" & htmlOut
End With
Set objIE = Nothing
End Sub
The MsgBox returns: example [object]
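getElementsByName returns a collection of all the elements with that name. Even if there's only one, it still returns a collection, just a collection with one item in it. You can get the first element in the collection like this:
htmlOut = .Document.getElementsByName("test").Item(0).InnerText
The .Item(0) returns the first item, and the .InnerText returns a string. Note that the div in your example has id="test" rather than name="test", so .Document.getElementById("test").InnerText, which returns a single element instead of a collection, matches your HTML more directly.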
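Since MSXML2.XMLHTTP and the "htmlfile" document are COM objects, the late-bound approach used in the Excel answers also works in Word VBA, without opening Internet Explorer. A sketch, assuming the test page from the question and a hypothetical bookmark named "ExampleBookmark"; TextById, FetchHtml and FillBookmark are hypothetical names. Assigning to a bookmark's Range.Text removes the bookmark, so the sketch re-adds it around the new text.

```vba
' Hypothetical helper: trimmed innerText of the element with the given id, or "".
Public Function TextById(ByVal html As String, ByVal elementId As String) As String
    Dim doc As Object, el As Object
    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = html
    Set el = doc.getElementById(elementId)
    If el Is Nothing Then Exit Function
    TextById = Trim$(el.innerText)
End Function

' Hypothetical helper: synchronous GET; returns "" unless the status is 200.
Public Function FetchHtml(ByVal url As String) As String
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", url, False          ' False = wait for the response
        .send
        If .Status = 200 Then FetchHtml = .responseText
    End With
End Function

Public Sub FillBookmark()
    Dim rng As Range
    Set rng = ActiveDocument.Bookmarks("ExampleBookmark").Range
    rng.Text = TextById(FetchHtml("http://blabla.net/testy/test.html"), "test")
    ActiveDocument.Bookmarks.Add "ExampleBookmark", rng   ' restore the bookmark
End Sub
```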

need help scraping with excel vba

I need to scrape Title, product description and Product code and save it into worksheet from <<<HERE>>> in this case those are :
"Catherine Lansfield Helena Multi Bedspread - Double"
"This stunning ivory bedspread has been specially designed to sit with the Helena bedroom range. It features a subtle floral design with a diamond shaped quilted finish. The bedspread is padded so can be used as a lightweight quilt in the summer or as an extra layer in the winter.
Polyester.
Size L260, W240cm.
Suitable for a double bed.
Machine washable at 30°C.
Suitable for tumble drying.
EAN: 5055184924746.
Product Code 116/4196"
I have tried different methods and none of them worked for me in the end. With the Mid and InStr functions I got no result; it could be that my code was wrong. Sorry I am not giving any code, but I had already messed it up many times and got no result. I have tried to scrape the whole page with GetDatafromPage. It works well, but for different product pages the output goes to different rows, as the number of elements changes from page to page. Also, it's not possible to scrape only chosen elements, so it is pointless to read values from fixed cells.
Another option, instead of using the InternetExplorer object, is the xmlhttp object. Here is a similar example to kekusemau's, but using the xmlhttp object to request the page. I then load the responseText from the xmlhttp object into the htmlfile document.
Sub test()
Dim xml As Object
Set xml = CreateObject("MSXML2.XMLHTTP")
xml.Open "GET", "http://www.argos.co.uk/static/Product/partNumber/1164196.htm", False
xml.send
Dim doc As Object
Set doc = CreateObject("htmlfile")
doc.body.innerhtml = xml.responsetext
'Product title: first h1 inside #pdpProduct
Dim titleEl As Object
Set titleEl = doc.getElementById("pdpProduct").getElementsByTagName("h1")(0)
MsgBox titleEl.innerText
'Description: first div inside the description container
Dim descEl As Object
Set descEl = doc.getElementById("genericESpot_pdp_proddesc2colleft").getElementsByTagName("div")(0)
MsgBox descEl.innerText
'Product code: third span inside the first span of #pdpProduct
Dim codeEl As Object
Set codeEl = doc.getElementById("pdpProduct").getElementsByTagName("span")(0).getElementsByTagName("span")(2)
MsgBox codeEl.innerText
End Sub
This seems to be not too difficult. You can use Firefox to take a look at the page structure (right-click somewhere and click inspect element, and go on from there...)
Here is a simple sample code:
Sub test()
Dim ie As InternetExplorer
Dim x
Set ie = New InternetExplorer
ie.Visible = True
ie.Navigate "http://www.argos.co.uk/static/Product/partNumber/1164196.htm"
While ie.ReadyState <> READYSTATE_COMPLETE
DoEvents
Wend
Set x = ie.Document.getElementById("pdpProduct").getElementsByTagName("h1")(0)
MsgBox Trim(x.innerText)
Set x = ie.Document.getElementById("genericESpot_pdp_proddesc2colleft").getElementsByTagName("div")(0)
MsgBox x.innerText
Set x = ie.Document.getElementById("pdpProduct").getElementsByTagName("span")(0).getElementsByTagName("span")(2)
MsgBox x.innerText
ie.Quit
End Sub
(I have a reference in Excel to Microsoft Internet Controls. I don't know if it is there by default; if not, you have to set it first to run this code.)
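The question mentions trying Mid and InStr without success. Once the description text has been extracted with either answer, labelled values such as "Product Code 116/4196" or "EAN: 5055184924746." can be pulled out of it with exactly those functions. A sketch, assuming each label is followed by its value on the same line, as in the product text quoted in the question; ValueAfterLabel is a hypothetical helper name.

```vba
' Hypothetical helper: returns the text after label up to the end of that line,
' trimmed and without a trailing full stop. Returns "" if the label is not found.
Public Function ValueAfterLabel(ByVal sourceText As String, ByVal label As String) As String
    Dim startPos As Long, endPos As Long, result As String
    sourceText = Replace$(sourceText, vbCr, "")            ' treat CRLF and LF the same
    startPos = InStr(1, sourceText, label, vbTextCompare)
    If startPos = 0 Then Exit Function
    startPos = startPos + Len(label)                       ' first character after the label
    endPos = InStr(startPos, sourceText, vbLf)             ' value ends at the line break
    If endPos = 0 Then endPos = Len(sourceText) + 1
    result = Trim$(Mid$(sourceText, startPos, endPos - startPos))
    If Right$(result, 1) = "." Then result = Left$(result, Len(result) - 1)
    ValueAfterLabel = result
End Function

' Example use after the xmlhttp answer, writing one product per row (r is a hypothetical row counter):
' Sheet1.Cells(r, 3).Value = ValueAfterLabel(descEl.innerText, "Product Code")
' Sheet1.Cells(r, 4).Value = ValueAfterLabel(descEl.innerText, "EAN:")
```

Writing each field to a fixed column of the same row keeps every product on one row, however many elements the page has.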