I know this may seem easy. I have already written some code to try to get this to work, but ran into one problem. The format of the link below is the same for all cities and states. As long as you can type the name of the city ("City_Search") and the state ("State_Search"), you should be able to access the website with the information as seen below.
I have attached the code I am using below. If anyone can assist me with the search I would appreciate it.
Sub SearchBot1()
'dimension (declare or set aside memory for) our variables
Dim objIE As InternetExplorer 'special object variable representing the IE browser
Dim aEle As HTMLLinkElement 'special object variable for an <a> (link) element
Dim HTMLinputs As MSHTML.IHTMLElementCollection
'initiating a new instance of Internet Explorer and assigning it to objIE
Set objIE = New InternetExplorer
'make IE browser visible (False would allow IE to run in the background)
objIE.Visible = True
'navigate IE to this web page (a pretty neat search engine really)
objIE.navigate "https://datausa.io/profile/geo/" & Range("City_Search").Value & "-" & Range("State_Search").Value
'wait here a few seconds while the browser is busy
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
End Sub
The idea would be for me to type any city into Excel and, once I hit run on a macro, it will go to the site and search for the town's data. I have added a link below as an example of the page I am looking to get when I search.
https://datausa.io/profile/geo/hoboken-nj/
You need to hyphenate cities that have spaces in their name. Counties need to be the correct abbreviation, and both are case sensitive, i.e. they need to be all lower case. So you need to add these hyphens, if missing, using a function like Replace in VBA to swap Chr$(32) with "-" (Chr$(45)), and potentially LCase$ to convert to lowercase.
You should also fully qualify the range with the worksheet you intend to use.
With data already in correct format in cell:
E.g. with los-angeles-ca or los-angeles-county-ca in a cell.
Option Explicit
Public Sub SearchBot1()
Dim objIE As InternetExplorer, aEle As HTMLLinkElement
Dim HTMLinputs As MSHTML.IHTMLElementCollection
Set objIE = New InternetExplorer
'e.g. https://datausa.io/profile/geo/los-angeles-ca/
With objIE
.Visible = True
.navigate "https://datausa.io/profile/geo/" & Range("City_Search").Value & "-" & Range("State_Search").Value
Do While .Busy = True Or .readyState <> 4: DoEvents: Loop
Stop
' .Quit '<== Uncomment me to close browser at end
End With
End Sub
Adding hyphens:
If you had los angeles, not los-angeles, in a cell:
Replace$(Range("City_Search").Value, Chr$(32), Chr$(45))
Lowercase and hyphen:
To be really safe you could convert to lowercase as well, to handle any upper case letters in the cell you are referencing, e.g.
For Los Angeles use: Replace$(LCase$(Range("City_Search").Value), Chr$(32), Chr$(45))
Option Explicit
Public Sub SearchBot1()
Dim objIE As InternetExplorer, aEle As HTMLLinkElement
Dim HTMLinputs As MSHTML.IHTMLElementCollection, ws As Worksheet
Set ws = ThisWorkbook.Worksheets("Sheet1")
Set objIE = New InternetExplorer
'e.g. https://datausa.io/profile/geo/los-angeles-ca/
With objIE
.Visible = True
.navigate "https://datausa.io/profile/geo/" & ws.Range("City_Search").Value & "-" & ws.Range("State_Search").Value
Do While .Busy = True Or .readyState <> 4: DoEvents: Loop
Stop
' .Quit '<== Uncomment me to close browser at end
End With
End Sub
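The hyphen and lowercase steps above can be folded into one small helper that builds the URL before navigating. This is only a sketch: BuildGeoUrl and its parameter names are hypothetical, and it assumes the site keeps the /profile/geo/city-state/ pattern shown in the example links.

```vba
' Hypothetical helper (not part of the original answer): builds the DataUSA
' profile URL from a city name and a state abbreviation.
Public Function BuildGeoUrl(ByVal city As String, ByVal stateAbbr As String) As String
    Const BASE_URL As String = "https://datausa.io/profile/geo/"
    Dim slug As String
    ' Trim stray outer spaces, lowercase, then swap inner spaces for hyphens
    slug = Replace$(LCase$(Trim$(city)), Chr$(32), Chr$(45))
    BuildGeoUrl = BASE_URL & slug & "-" & LCase$(Trim$(stateAbbr)) & "/"
End Function
```

Inside the With objIE block you could then write something like .navigate BuildGeoUrl(ws.Range("City_Search").Value, ws.Range("State_Search").Value), so a cell holding "Los Angeles" and a cell holding "CA" should still land on the los-angeles-ca page.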
That gets you to the pages. What you do then......
Did you know that this website has its own data-search API?
And that you can also extract data using a background object instead of creating an Internet Explorer instance?
For instance:
Sub getCityData()
''' Create a background server connection
Dim myCon As Object: Set myCon = CreateObject("MSXML2.ServerXMLHTTP.6.0")
''' Open a connection string with the DataUSA API and basic request for (geo, place, population)
myCon.Open "GET", "http://api.datausa.io/api/?show=geo&sumlevel=place&required=pop"
myCon.send ''' Send the request
''' Dataset in the ResponseText is HUGE so for demo show first 5000 characters
Sheet1.Range("A1").Value2 = Left(myCon.responseText, 5000)
End Sub
That will pull the ENTIRE DATA SET for every "place" in America with its population for every year from 2013 onwards in about a second. It will place the first 5000 characters of the dataset into cell A1 on Sheet1 (I recommend putting this in a new Excel file).
I don't have time to learn the site's API, but it seems to have good documentation on GitHub and the responses come back in JSON format. If you really want to make a powerful Excel interface, use their API with background connections; they have so much data for the USA at your fingertips.
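If you build on the background request, it is probably worth checking the HTTP status before using the response. The version below is a rough sketch under a few assumptions: the endpoint is the same one as in getCityData and may no longer respond the same way, and IsSuccessStatus and GetCityDataChecked are made-up names for illustration.

```vba
' Hypothetical helper: treats any 2xx HTTP status as success.
Public Function IsSuccessStatus(ByVal statusCode As Long) As Boolean
    IsSuccessStatus = (statusCode >= 200 And statusCode < 300)
End Function

' Sketch only: same request as getCityData, with a status check added.
Public Sub GetCityDataChecked()
    Dim req As Object
    Set req = CreateObject("MSXML2.ServerXMLHTTP.6.0")
    ' False = synchronous, so send returns only once the response has arrived
    req.Open "GET", "http://api.datausa.io/api/?show=geo&sumlevel=place&required=pop", False
    req.send
    If IsSuccessStatus(req.Status) Then
        ' Keep the preview short; the full response is far too long for one cell
        Sheet1.Range("A1").Value2 = Left$(req.responseText, 5000)
    Else
        MsgBox "Request failed: " & req.Status & " " & req.statusText
    End If
End Sub
```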
Related
I am new to coding and have been trying to figure out how to extract specific data from Zillow and import it into Excel. To be honest I am pretty lost trying to figure this out, and I have been looking through the forum and other online videos, but I haven't had any luck.
Here is the link to the website I am using https://www.zillow.com/new-york-ny/home-values/
I am looking to pull all the numbers into excel so I can run some calculations. If someone could help me just pull in the Zillow Home Value Index of $660,000 into excel, I feel that I can figure out the rest.
This is the code from the website
<ul class="value-info-list" id="yui_3_18_1_1_1529698944920_2626">
<li id="yui_3_18_1_1_1529698944920_2625">
<!-- TODO: need zillow logo icon here -->
<!-- <span class="zss-logo-color"><span class="zss-font-icon"></span></span> -->
<span class="value" id="yui_3_18_1_1_1529698944920_2624">
$660,000
</span>
<span class="info zsg-fineprint"> ZHVI
</span>
I tried getElementsByTagName, getElementById and getElementsByClassName. The id is confusing me, since I want to be able to enter any town into Excel and have it search Zillow for the data on the web page. All the id tags are different, so if I search by id in this code it will not work for other towns. I used the class name instead and was able to get some of the data I was looking for.
This is the code I came up with. It pulls the $660,000 into the message box. The Range call is working and putting that value into Excel. The innerText comes back as one long string, from which I was able to pull out the $660,000, but the way the string is set up I'm not sure how to pull the remaining data, such as the 1-year forecast. "yr_forecast" is the named range I want to pull that data into.
Sub SearchBot1()
'dimension (declare or set aside memory for) our variables
Dim objIE As InternetExplorer 'special object variable representing the IE browser
Dim aEle As HTMLLinkElement 'special object variable for an <a> (link) element
Dim y As Integer 'integer variable we'll use as a counter
Dim result As String 'string variable that will hold our result link
Dim Doc As HTMLDocument 'holds document object for internet explorer
'initiating a new instance of Internet Explorer and assigning it to objIE
Set objIE = New InternetExplorer
'make IE browser visible (False would allow IE to run in the background)
objIE.Visible = True
'navigate IE to this web page (a pretty neat search engine really)
objIE.navigate "https://www.zillow.com/new-york-ny/home-values/"
'wait here a few seconds while the browser is busy
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
'in the search box put the values of cells B3 and B4 on Sheet2, separated by a comma
objIE.document.getElementById("local-search").Value = _
Sheets("Sheet2").Range("B3").Value & ", " & Sheets("Sheet2").Range("B4").Value
'click the 'go' button
Set the_input_elements = objIE.document.getElementsByTagName("button")
For Each input_element In the_input_elements
If input_element.getAttribute("name") = "SubmitButton" Then
input_element.Click
Exit For
End If
Next input_element
'wait again for the browser
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
'price for home
Set Doc = objIE.document
Dim cclass As String
cclass = Trim(Doc.getElementsByClassName("value-info-list")(0).innerText)
MsgBox cclass
Dim aclass As Variant
aclass = Split(cclass, " ")
Range("Market_Price").Value = aclass(0)
Range("yr_forecast").Value = aclass(5)
'close the browser
objIE.Quit
End Sub
If you need anymore information please let me know.
The value you want is the first element with className value. You can use querySelector to apply a CSS selector of .value, where "." is the selector for class, to get this value.
Option Explicit
Public Sub GetInfo()
Dim html As New MSHTML.HTMLDocument
Const URL As String = "https://www.zillow.com/new-york-ny/home-values/"
html.body.innerHTML = GetHTML(URL)
Debug.Print html.querySelector(".value").innerText
End Sub
Public Function GetHTML(ByVal URL As String) As String
Dim sResponse As String
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", URL, False
.send
sResponse = StrConv(.responseBody, vbUnicode)
End With
GetHTML = Mid$(sResponse, InStr(1, sResponse, "<!DOCTYPE "))
End Function
You could also use:
Debug.Print html.getElementsByClassName("value")(0).innerText
I am new to VBA, and trying to put together a code that will allow me to search a website for bond information, which is listed in my excel file, and pull back the bond's issue date. I have been able to get to the website, and using F8 to execute the code manually, the code appears to work fine. However, when I run the macro, I get
Error 91: Object variable or With block variable not set.
I am unsure how to fix this issue, and I have tried to find earlier answers, as there is a lot of info on error 91. But none seem to be able to help with my specific problem.
Please assist, thanks.
Dim objIE As InternetExplorer 'special object variable representing the IE browser
Dim datelabel As String
Dim x As String 'for the CUSIP number
Dim y As Integer 'integer variable we'll use as a counter
Dim result As String 'string variable that will hold our result link
'initiating a new instance of Internet Explorer and assigning it to objIE
Set objIE = New InternetExplorer
'make IE browser visible (False would allow IE to run in the background)
objIE.Visible = False
'navigate IE to this web page
objIE.navigate "https://emma.msrb.org/Search/Search.aspx"
'wait here a few seconds while the browser is busy
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
'DATA SCRAPING Portion
Dim Doc As HTMLDocument
Set Doc = objIE.document
'get issue date
datelabel = Doc.getElementsByClassName("value")(0).innerText 'HERE IS WHERE I HAVE MY PROBLEM
'MsgBox datelabel
End If
Loop
'close the browser
objIE.Quit
This is the html that I am trying to pull:
span class="value"
from:
https://emma.msrb.org/SecurityView/SecurityDetails.aspx?cusip=ACDB05F7DCC14B929AC9D2D082A3D9AE0
I thought I figured this out over the weekend, but it actually doesn't work the way I thought it would. I have a confidential corporate SharePoint site that I work with. I can't post the link here, or any specific data, but the concept below will illustrate the point fine.
I have a parent URL that I want to import data from. Let's say this is the parent URL.
http://www.sharenet.co.za/v3/q_sharelookup.php
From there, I want to import data from a specific link. Let's say this is the link: 'Building & Construction Materials'
I think the best way to do this is some kind of InStr() function and search for the string. Then, if found, click the link and open the child URL. When the child URL opens, it looks something like this:
http://www.sharenet.co.za/v3/sharesfound.php?ssector=2353&exch=JSE&bookmark=Building%20&%20Construction%20Materials&scheme=default
I can't tell what the sector numbers will be ahead of time, so I can't use a specific URL. I need to reference it as the parent and child, or maybe IE1 and IE2. I want to import all data from the child URL, which in this example, looks like this.
Name Full Name Code Sector
BUILDMX BUILDMAX LIMITED BDM 2353
KAYDAV KAYDAV GROUP LTD KDV 2353
AFRIMAT AFRIMAT LTD AFT 2353
Trellidor Trellidor Hldgs Ltd TRL 2353
MASONITE MASONITE (AFRICA) LIMITED MAS 2353
DAWN DISTRIBUTION AND WAREHOUSING NETWORK LIMITED DAW 2353
MAZOR MAZOR GROUP LTD MZR 2353
PPC PPC LIMITED PPC 2353
PPCN PPC Limited NPL PPCN 2353
Just to demonstrate how I tried to solve this, I tried the script below.
Sub ListLinks()
'Set a reference to microsoft Internet Controls
Dim IeApp As InternetExplorer
Dim sURL As String
Dim IeDoc As Object
Dim i As Long
Set IeApp = New InternetExplorer
IeApp.Visible = True
sURL = "http://www.sharenet.co.za/v3/q_sharelookup.php"
IeApp.Navigate sURL
Do
Loop Until IeApp.ReadyState = READYSTATE_COMPLETE
Set IeDoc = IeApp.Document
For i = 0 To IeDoc.Links.Length - 1
Cells(i + 1, 1).Value = IeDoc.Links(i).href
Next i
Set IeApp = Nothing
End Sub
I thought it would work fine, to list all URLs, and then loop through each to import data, but the problem on my SharePoint site is that the href doesn't appear to have any relevance to the name of the hyperlink.
In the picture above you can see 'Building & Construction Materials' in the TD element. If I can reference that in the 1st browser, and click the correct link to open a 2nd browser, and then reference that 2nd browser and scrape all TD elements from that, everything should work fine. Does anyone here know how to do that?
Good try on the code, you got pretty close. The one area that needs some fixing is where you try to get the list of items and loop over it. You had the right idea on how it would work, but the HTML element syntax is a little off, so it looks like you just need some more experience using HTML objects. See the sample code below:
Public Sub sampleCode()
Dim URL As String
Dim XMLHTTP As MSXML2.XMLHTTP60
Dim HTMLDoc_Main As HTMLDocument
Dim HTMLDoc_Secondary As HTMLDocument
Dim targetTable As Object 'late-bound; HTMLObjectElement is the type for <object> tags, not tables
Dim links As IHTMLElementCollection
Dim linkCounter As Long
Dim searchText As String
URL = "http://www.sharenet.co.za/v3/q_sharelookup.php"
searchText = "Building & Construction Materials"
Set XMLHTTP = New MSXML2.XMLHTTP60
Set HTMLDoc_Main = New HTMLDocument
With XMLHTTP
.Open "GET", URL, False
.send
While .readyState <> 4: Wend
HTMLDoc_Main.body.innerHTML = .responseText
End With
Set targetTable = HTMLDoc_Main.getElementsByClassName("dataTable")(0)
Set links = targetTable.getElementsByTagName("a")
For linkCounter = 0 To links.Length - 1
With links(linkCounter)
If InStr(1, .innerText, searchText) > 0 Then
Set XMLHTTP = New MSXML2.XMLHTTP60
Set HTMLDoc_Secondary = New HTMLDocument
XMLHTTP.Open "GET", .href, False
XMLHTTP.send
While XMLHTTP.readyState <> 4: Wend
HTMLDoc_Secondary.body.innerHTML = XMLHTTP.responseText
'Parse HTMLDoc_Secondary
End If
End With
Next
Set XMLHTTP = Nothing
Set HTMLDoc_Main = Nothing
Set HTMLDoc_Secondary = Nothing
End Sub
A couple of notes: 1) I used XMLHTTPRequest instead of IE as it is faster, 2) you are going to need to add 'Microsoft HTML Object Library' and 'Microsoft XML, v6.0' to your references, and 3) I can see you are outputting to ranges cell by cell in your original code; if at all possible this should be avoided. Populate an array and then dump its entire contents out into your target sheet all at once to save time.
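To illustrate the array note, here is a minimal sketch of collecting rows in memory and writing them to the sheet in a single assignment. BuildDemoRows, DumpRowsToSheet, the demo values and the "Sheet1" target are all stand-ins for illustration; in practice the array would be filled from the parsed links or table cells.

```vba
' Hypothetical helper: fills a 2-D array with demo rows instead of scraped data.
Public Function BuildDemoRows(ByVal rowCount As Long) As Variant
    Dim data() As Variant, r As Long
    ReDim data(1 To rowCount, 1 To 2)
    For r = 1 To rowCount
        data(r, 1) = "Item " & r      ' e.g. the link text
        data(r, 2) = r * 10           ' e.g. a scraped number
    Next r
    BuildDemoRows = data
End Function

' Sketch only: one write for the whole block instead of one write per cell.
Public Sub DumpRowsToSheet()
    Dim data As Variant
    data = BuildDemoRows(100)
    ThisWorkbook.Worksheets("Sheet1").Range("A1") _
        .Resize(UBound(data, 1), UBound(data, 2)).Value = data
End Sub
```

The Resize call sizes the target range to match the array, so the same two lines work for any number of rows and columns.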
Hope this helps,
TheSilkCode
I have a vba code which can upload the data from an excel sheet to a website. However, the code works fine in Win7 System and IE browser 8,but it does not work on a win8 IE browser 11.
Here are part of the code:
Dim objIE As SHDocVw.InternetExplorer
Dim htmlDoc As MSHTML.HTMLDocument
Dim htmlFrame As MSHTML.HTMLFrameElement
Dim frame As HTMLIFrame
Dim htmlElement As HTMLDTElement
Dim myDoc As Object
Set curSheet = ActiveWorkbook.ActiveSheet
Set oShApp = CreateObject("Shell.Application")
For Each oWin In oShApp.Windows
If oWin.Name = "Windows Internet Explorer" Then
Set IE = oWin
Exit For
End If
Next
If IE Is Nothing Then
MsgBox ("Please sign into Avocado, then re-run this macro")
Set IE = New InternetExplorerMedium
IE.Visible = True
IE.navigate "https://www.google.com"
Exit Sub
End If
Sheets("Prepare").Select
fPathName = Cells(5, 5)
Call MakeFolders(fPathName)
Call MakeFolders2(fPathName)
Call MakeFolders3(fPathName)
'fFileName = fPathName & "\*.xls"
fFileName = Dir(fPathName & "\*.xls")
The code keeps entering the branch at the statement "If IE Is Nothing Then".
Even when the Google site is open, the program still keeps showing the MsgBox and reopens the website again and again, and it never reaches the last part, Sheets("Prepare").Select. I am very confused because it works perfectly in IE 8. I am wondering if there is any difference between IE 11 and IE 8 in terms of the VBA IE functions.
Please take a look at it and give me some ideas on this. Your help is greatly appreciated. Thank you very much.
I ran into the identical problem and found that the window name reported by Shell.Application has changed in IE 11. Earlier versions had the name "Windows Internet Explorer", but IE 11 uses only "Internet Explorer".
If you change your If condition accordingly, it will work again.
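One way to make the check work on both old and new versions is to accept either window name. The following is a sketch, not the original poster's code: IsIEWindowName and FindOpenIE are hypothetical helper names, and it assumes the two names quoted above are the only ones you need to match.

```vba
' Hypothetical helper: True for the window name used by IE 11 and by earlier versions.
Public Function IsIEWindowName(ByVal windowName As String) As Boolean
    IsIEWindowName = (windowName = "Windows Internet Explorer") _
                  Or (windowName = "Internet Explorer")
End Function

' Sketch only: returns the first open IE window, or Nothing if there is none.
Public Function FindOpenIE() As Object
    Dim oWin As Object
    For Each oWin In CreateObject("Shell.Application").Windows
        If Not oWin Is Nothing Then
            If IsIEWindowName(oWin.Name) Then
                Set FindOpenIE = oWin
                Exit Function
            End If
        End If
    Next oWin
End Function
```

In the macro from the question, Set IE = FindOpenIE() followed by the existing If IE Is Nothing Then check would replace the For Each loop.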
I think you are using an old version of Internet Explorer. Use the code I provided below and see if it works. Please make sure you have the following references added to your project:
Microsoft HTML Object Library
Microsoft Internet Controls.
If you are not sure how to add the references to your project, check out this link: External References
Sub Test()
Dim objIE As InternetExplorer
Dim htmlDoc As HTMLDocument
Dim htmlFrame As HTMLFrameElement
Dim frame As HTMLIFrame
Dim htmlElement As HTMLDTElement
Dim myDoc As Object
Dim curSheet As Worksheet
' Set the variables
Set curSheet = ActiveWorkbook.ActiveSheet
Set objIE = New InternetExplorer
' Make the browser visible and navigate
With objIE
.Visible = True
.navigate "https://www.google.com"
End With
WaitForInternetToLoad objIE
Sheets("Prepare").Select
fPathName = Cells(5, 5).Value
Call MakeFolders(fPathName)
Call MakeFolders2(fPathName)
Call MakeFolders3(fPathName)
'fFileName = fPathName & "\*.xls"
fFileName = Dir(fPathName & "\*.xls")
End Sub
Sub WaitForInternetToLoad(ByRef ie As InternetExplorer)
Do
DoEvents
Loop While Not ie.readyState = READYSTATE_COMPLETE
End Sub
Extra Information.
Hi I see you are having some trouble with this and I want to extend my help.
Let's start with the fact that you are using two external libraries in your code, and you should have references set to them for the code to work properly. See my picture; I have highlighted the libraries in yellow.
What are the differences:
Microsoft Internet Controls
This is the one that takes care of the browser object. It creates an Internet Explorer application and navigates to the links you give it. Once it finishes navigating to the URL, the "internet object" has a "document". That is all this object does.
Microsoft HTML Object Library
This library takes care of the Html document. And as you probably guessed you will assign the "document" from the previous object (internet explorer) to a HTML document variable and then you can do further manipulation.
I need to scrape Title, product description and Product code and save it into worksheet from <<<HERE>>> in this case those are :
"Catherine Lansfield Helena Multi Bedspread - Double"
"This stunning ivory bedspread has been specially designed to sit with the Helena bedroom range. It features a subtle floral design with a diamond shaped quilted finish. The bedspread is padded so can be used as a lightweight quilt in the summer or as an extra layer in the winter.
Polyester.
Size L260, W240cm.
Suitable for a double bed.
Machine washable at 30°C.
Suitable for tumble drying.
EAN: 5055184924746.
Product Code 116/4196"
I have tried different methods and none worked for me in the end. With the Mid and InStr functions I got no result; it could be that my code was wrong. Sorry that I am not posting any code, but I had already messed it up many times with no result. I have also tried to scrape the whole page with GetDatafromPage. It works well, but for different product pages the output goes to different rows, as the number of elements changes from page to page. It is also not possible to scrape only chosen elements, so it is pointless to read the values from fixed cells.
Another option instead of using the InternetExplorer object is the xmlhttp object. Here is an example similar to kekusemau's, but using the xmlhttp object to request the page. I then load the responseText from the xmlhttp object into an htmlfile object.
Sub test()
Dim xml As Object
Set xml = CreateObject("MSXML2.XMLHTTP")
xml.Open "Get", "http://www.argos.co.uk/static/Product/partNumber/1164196.htm", False
xml.send
Dim doc As Object
Set doc = CreateObject("htmlfile")
doc.body.innerhtml = xml.responsetext
Dim name
Set name = doc.getElementById("pdpProduct").getElementsByTagName("h1")(0)
MsgBox name.innerText
Dim desc
Set desc = doc.getElementById("genericESpot_pdp_proddesc2colleft").getElementsByTagName("div")(0)
MsgBox desc.innerText
Dim id
Set id = doc.getElementById("pdpProduct").getElementsByTagName("span")(0).getElementsByTagName("span")(2)
MsgBox id.innerText
End Sub
This seems to be not too difficult. You can use Firefox to take a look at the page structure (right-click somewhere and click inspect element, and go on from there...)
Here is a simple sample code:
Sub test()
Dim ie As InternetExplorer
Dim x
Set ie = New InternetExplorer
ie.Visible = True
ie.Navigate "http://www.argos.co.uk/static/Product/partNumber/1164196.htm"
While ie.ReadyState <> READYSTATE_COMPLETE
DoEvents
Wend
Set x = ie.Document.getElementById("pdpProduct").getElementsByTagName("h1")(0)
MsgBox Trim(x.innerText)
Set x = ie.Document.getElementById("genericESpot_pdp_proddesc2colleft").getElementsByTagName("div")(0)
MsgBox x.innerText
Set x = ie.Document.getElementById("pdpProduct").getElementsByTagName("span")(0).getElementsByTagName("span")(2)
MsgBox x.innerText
ie.Quit
End Sub
(I have a reference in Excel to Microsoft Internet Controls, I don't know if that is there by default, if not you have to set it first to run this code).