Scrape data from 'nested' iframe with vba - vba

Am trying to scrape the title and price from mcmaster.com to auto populate an internal purchase rec form. (URL with item part number is hard coded below for testing.) Got the title fine, but can't get the price. I don't think my code is looking at the correct place because the element is not found. I modeled my code from here.
Sub get_title_header()
Dim wb As Object
Dim doc As Object
Dim sURL As String
Set wb = CreateObject("internetExplorer.Application")
sURL = "https://www.mcmaster.com/#95907A480"
wb.navigate sURL
wb.Visible = True
While wb.Busy Or wb.readyState <> READYSTATE_COMPLETE
DoEvents
Wend
'Wait a bit for everything to catchup...(network?)
Application.Wait (Now + TimeValue("0:00:02"))
'HTML document
Set doc = wb.document
Dim MyString As String
MyString = doc.Title
Sheet2.Cells(1, 2) = Trim(Replace(MyString, "McMaster-Carr -", "", , , vbTextCompare))
Dim el As Object
For Each el In doc.getElementsByName("Prce")
Sheet2.Cells(2, 2).Value = el.innerText
Next el
wb.Quit
End Sub
Is this iframe nested? Can anyone explain why my code can't see the iframe data and help me get the item price? Thanks in advance!

The element you try to scrape in your link is nested as follows:
The first mistake you do is that you're trying to get this object by name, and not by id:
doc.getElementsByName("Prce")
There's no object in your HTML document having name="Prce", so I'd expect the For cycle not even to start.
What you might want to do to get your price, instead, is:
Sheet2.Cells(2, 2).Value = doc.getElementById("Prce").getElementsByClassName("PrceTxt")(0).innerText
where:
doc.getElementById("Prce") gets you the element <div id="Prce">...</div>
.getElementsByClassName("PrceTxt")(0) gets you the first (and only) element inside the above div, i.e. <div class="PrceTxt">...</div>
.innerText gets you the text of this element

Related

Trying to get VBA to pull in data from Zillow

I am new to coding and have been trying to figure out how to extract specific data from zillow and import it into excel. To be honest I am pretty lost trying to figure this out and I have been looking throughout the form and other online videos, but I haven't had any luck.
Here is the link to the website I am using https://www.zillow.com/new-york-ny/home-values/
I am looking to pull all the numbers into excel so I can run some calculations. If someone could help me just pull in the Zillow Home Value Index of $660,000 into excel, I feel that I can figure out the rest.
This is the code from the website
<ul class="value-info-list" id="yui_3_18_1_1_1529698944920_2626">
<li id="yui_3_18_1_1_1529698944920_2625">
<!-- TODO: need zillow logo icon here -->
<!-- <span class="zss-logo-color"><span class="zss-font-icon"></span></span> -->
<span class="value" id="yui_3_18_1_1_1529698944920_2624">
$660,000
</span>
<span class="info zsg-fineprint"> ZHVI
</span>
I tried getElementsByTagName getElementById and getElemenByClass The id is confusing me since I want to be able to enter any town into excel and it will search on zillow for the data on the web page. All the id tags are different so if I search by id in this code it will not work for other towns. I used the Class tag and was able to get some of the data I was looking for.
This is the code I came up with It pulls into the text box the $660,000. The Range function is working and putting the text box data into excel. This is pulling a bunch of strings which I was able to pull out the $660,000, but the way the sting is set up Im not sure how to pull the remaining data, such as the 1 year forecast "yr_forcast" is the cell range I want to pull the data into excel.
Sub SearchBot1()
'dimension (declare or set aside memory for) our variables
Dim objIE As InternetExplorer 'special object variable representing the IE browser
Dim aEle As HTMLLinkElement 'special object variable for an <a> (link) element
Dim y As Integer 'integer variable we'll use as a counter
Dim result As String 'string variable that will hold our result link
Dim Doc As HTMLDocument 'holds document object for internet explorer
'initiating a new instance of Internet Explorer and asigning it to objIE
Set objIE = New InternetExplorer
'make IE browser visible (False would allow IE to run in the background)
objIE.Visible = True
'navigate IE to this web page (a pretty neat search engine really)
objIE.navigate "https://www.zillow.com/new-york-ny/home-values/"
'wait here a few seconds while the browser is busy
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
'in the search box put cell "A2" value, the word "in" and cell "C1" value
objIE.document.getElementById("local-search").Value = _
Sheets("Sheet2").Range("B3").Value & ", " & Sheets("Sheet2").Range("B4").Value
'click the 'go' button
Set the_input_elements = objIE.document.getElementsByTagName("button")
For Each input_element In the_input_elements
If input_element.getAttribute("name") = "SubmitButton" Then
input_element.Click
Exit For
End If
Next input_element
'wait again for the browser
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
'price for home
Set Doc = objIE.document
Dim cclass As String
cclass = Trim(Doc.getElementsByClassName("value-info-list")(0).innerText)
MsgBox cclass
Dim aclass As Variant
aclass = Split(cclass, " ")
Range("Market_Price").Value = aclass(0)
Range("yr_forecast").Value = aclass(5)
'close the browser
objIE.Quit
End Sub
If you need anymore information please let me know.
The value you want is the first element with className value. You can use querySelector to apply a CSS selector of .value, where "." is the selector for class, to get this value.
Option Explicit
Public Sub GetInfo()
Dim html As New MSHTML.HTMLDocument
Const URL As String = "https://www.zillow.com/new-york-ny/home-values/"
html.body.innerHTML = GetHTML(URL)
Debug.Print html.querySelector(".value").innerText
End Sub
Public Function GetHTML(ByVal URL As String) As String
Dim sResponse As String
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", URL, False
.send
sResponse = StrConv(.responseBody, vbUnicode)
End With
GetHTML = Mid$(sResponse, InStr(1, sResponse, "<!DOCTYPE "))
End Function
You could also use:
Debug.Print html.getElementsByClassName("value")(0).innerText
Current webpage value:
Code output:

VBA Macro to retrieve information from a google header after a search

I am writing a macro that will go into google to search for the county of several cities. So far I have managed to make the macro search for the correct terms but the only roadblock I have is when I get the results. I don't know what to write to retrieve them.
When I do the search I get this result:
http://i.stack.imgur.com/O8jEh.png
What I want to do is retrieve the "Suffolk County" line but I am at lost on how to do that.
Does anybody have any tips or know how to do this or what to look for? I already tried inspecting the item but to no avail.
This is how my code looks so far:
Sub AutomateIE()
Dim ie As InternetExplorer
Dim City As String
Dim State As String
Dim URL As String
Set ie = New InternetExplorer
'For testing Purposes
City = "boston"
State = "MA"
'Search google for state and county
URL = "www.google.com/?safe=active&ssui=on#q=" + City + "+" + State + "+county&safe=active&ssui=on"
ie.Navigate URL
'Loop unitl ie page is fully loaded
Do Until ie.ReadyState = READYSTATE_COMPLETE
Loop
End Sub
You'll need to make sure the "Microsoft HTML Object Library" and "Microsoft Internet Controls" are available. In the VB Window, go to Tools -> References, and put a check mark by that library (may have to scroll to find it).
Try this:
Sub test()
Dim IE As New InternetExplorer
Dim city$, state$
'For testing Purposes
city = "boston"
state = "MA"
'Search google for state and county
URL = "www.google.com/?safe=active&ssui=on#q=" + city + "+" + state + "+county&safe=active&ssui=on"
IE.navigate URL
IE.Visible = False
Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
Application.Wait (Now() + TimeValue("00:00:016")) ' For internal page refresh or loading
Dim doc As HTMLDocument 'variable for document or data which need to be extracted out of webpage
Set doc = IE.document
Dim dd As Variant
dd = doc.getElementsByClassName("_eF")(0).innerText
' Now, trim the name to take the State out
dd = Left(dd, WorksheetFunction.Search(",", dd) - 1)
MsgBox dd
End Sub

VBA skipping code directly after submitting form in IE

Currently I have 2 pieces of code that work separately, but when used together they don't work properly.
The first code asks the user to input information which is stored. It then navigates to the correct webpage where it uses the stored user input information to navigate via filling and submitting a form. It arrives at the correct place.
The second code uses a specific URL via ie.navigate "insert url here" to navigate to the same place as the first code. It then scrapes URL data and stores it in a newly created sheet. It does this correctly.
When merging them I replace the navigation segment from the second code with the first code, but then it only stores the first 5 of 60 URLs as if it hadn't fully loaded the page before scraping data. It seems to skip the code directly after ie.document.forms(0).submit which is supposed to wait for the page to load before moving on to the scraping..
extra info: the button wasn't defined so I cannot just click it so I had to use ie.document.forms(0).submit
Summary of what I want the code to do:
request user input
store user input
open ie
navigate to page
enter user input into search field
select correct search category from listbox
submit form
'problem happens here
scrape url data
store url data in specific excel worksheet
The merged code:
Sub extractTablesData()
Dim ie As Object, obj As Object
Dim Var_input As String
Dim elemCollection As Object
Dim html As HTMLDocument
Dim Link As Object
Dim erow As Long
' create new sheet to store info
Application.DisplayAlerts = False
ThisWorkbook.Sheets("HL").Delete
ThisWorkbook.Sheets.Add.Name = "HL"
Application.DisplayAlerts = True
Set ie = CreateObject("InternetExplorer.Application")
Var_input = InputBox("Enter info")
With ie
.Visible = True
.navigate ("URL to the webpage")
While ie.readyState <> 4
DoEvents
Wend
'Input Term 1 into input box
ie.document.getElementById("trm1").Value = Var_input
'accessing the Field 1 ListBox
For Each obj In ie.document.all.Item("FIELD1").Options
If obj.Value = "value in listbox" Then
obj.Selected = True
End If
Next obj
' button undefined - using this to submit form
ie.document.forms(0).submit
'----------------------------------------------------------------
'seems to skip this part all together when merged
'Wait until IE is done loading page
Do While ie.readyState <> READYSTATE_COMPLETE
Application.StatusBar = "Trying to go to website…"
DoEvents
Loop
'----------------------------------------------------------------
Set html = ie.document
Set ElementCol = html.getElementsByTagName("a")
For Each Link In ElementCol
erow = Worksheets("HL").Cells(Rows.Count, 1).End(xlUp).Offset(1, 0).Row
Cells(erow, 1).Value = Link
Cells(erow, 1).Columns.AutoFit
Next
Application.StatusBar = “”
Application.ScreenUpdating = True
End With
End Sub
I've been stuck for quite some time on this and haven't found any solutions on my own so I'm reaching out. Any help will be greatly appreciated!
You mentioned you think the website might not be fully loaded. This is a common problem because of the more dynamic elements on a webpage. The easiest way to handle this is to insert the line:
Application.Wait Now + Timevalue("00:00:02")
This will force the code to pause for an additional 2 seconds. Insert this line below the code which waits for the page to load and this will give Internet Explorer a chance to catch back up. Depending on the website and the reliability of your connection to it I recommend adjusting this value anywhere up to about 5 seconds.
Most websites seem to require additional waiting like this, so handy code to remember when things don't work as expected. Hope this helps.
I solved this by using a completely different method. I used a query table with strings to go where I wanted.
Sub ExtractTableData()
Dim This_input As String
Const prefix As String = "Beginning of url"
Const postfix As String = "end of url"
Dim qt As QueryTable
Dim ws As Worksheet
Application.DisplayAlerts = False
ThisWorkbook.Sheets("HL").Delete
ThisWorkbook.Sheets.Add.Name = "HL"
Application.DisplayAlerts = True
This_input = InputBox("enter key info to go to specific url")
Set ws = ActiveSheet
Set qt = ws.QueryTables.Add( _
Connection:="URL;" & prefix & This_input & postfix, _
Destination:=Worksheets("HL").Range("A1"))
qt.RefreshOnFileOpen = True
qt.WebSelectionType = xlSpecifiedTables
'qt.webtables is key to getting the specific table on the page
qt.WebTables = 2
qt.Refresh BackgroundQuery:=False
End Sub

VBA: not able to pull value within <input> tag using getelementsbyTagname().Value

This is my first post on stackflow :) I've been Googling VBA knowledge and writing some VBA for about a month.
My computer info:
1.window 8.1
2.excel 2013
3.ie 11
My excel reference
Microsoft Object Library: yes
Microsoft Internet Controls: yes
Microsoft Form 2.0 Object library: yes
Microsoft Script Control 1.0: yes
Issue:
I was trying to retrieve data from internet explorer automatically using VBA.
I would like to retrieve the value within an input tag from a id called "u_0_1" which is under a id called "facebook". I am expecting to retrieve the value "AQFFmT0qn1TW" on cell c2. However, it got this msg popped up after I run the VBA "run-time error '91':object variable or with block variable not set.
I have been trying this for a couple of weeks using different methods such as,
1.getelementsbyClassname
2.getelementbyid
3.getelementsbyTagname
But it just doesn't work.
url:
http://coursesweb.net/javascript/getelementsbytagname
Below is my VBA code. Could you guys help me out a little bit please?
Private Sub CommandButton1_Click()
Dim ie As Object
Dim Doc As HTMLDocument
Dim getThis As String
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = 0
ie.navigate "http://coursesweb.net/javascript/getelementsbytagname"
Do
DoEvents
Loop Until ie.readyState = 4
Set Doc = ie.document
getThis = Trim(Doc.getElementById("u_0_1")(0).getElementsByTagName("input")(0).Value)
Range("c2").Value = getThis
End Sub
Thanks for your help. I have no idea that there is difference between JS and VBA in aspect of getelementsby () methods. And using the loop method to find the id which I find it very useful as well.
I still have some issues to retrieve value from a form or input type. I hope that you could help me or give me some suggestions as well.
Expected Result:
retrieve the value "AQFFmT0qn1TW" and copy it on Cell ("c2") automatically.
Actual Result:
nothing return to Cell ("C2")
Below is the HTML elements.
<form rel="async" ajaxify="/plugins/like/connect" method="post" action="/plugins/like/connect" onsubmit="return window.Event && Event.__inlineSubmit && Event.__inlineSubmit(this,event)" id="u_0_1">
<input type="hidden" name="fb_dtsg" value="AQFFmT0qn1TW" autocomplete="off">
Below is the VBA code based on your code.
Private Sub CommandButton1_Click()
Dim ie As Object
Dim Doc As HTMLDocument
Dim Elements As IHTMLElementCollection
Dim Element As IHTMLElement
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = 0
ie.navigate "http://coursesweb.net/javascript/getelementsbytagname"
Do
DoEvents
Loop Until ie.readyState = 4
Set Doc = ie.document
Set Elements = Doc.getElementsByTagName("input")
For Each Element In Elements
If Element.name = "fb_dtsg" Then
Range("c2").Value = Element.innerText
End If
Next Element
Set Elements = Nothing
End Sub
Cheers.
first of all, I can't find in source of website tags you were searching. Anyway, I think you can't chain getElementById.getElementsByTag as in JS. You have to loop through collection of document elements.
Private Sub CommandButton1_Click()
Dim ie As Object
Dim Doc As HTMLDocument
Dim Elements As IHTMLElementCollection
Dim Element As IHTMLElement
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = 0
ie.navigate "http://coursesweb.net/javascript/getelementsbytagname"
Do
DoEvents
Loop Until ie.readyState = 4
Set Doc = ie.document
Set Elements = Doc.getElementsByTagName("ul")
For Each Element In Elements
If Element.ID = "ex4" Then
Sheets(1).Cells(1, 1).Value = Element.innerText
End If
Next Element
Set Elements = Nothing
End Sub
First I'm getting collection of tags "ul", then looping through them for id "ex4". In your case you'd get collection of "input"s then loop for id you want. Finding id which is followed by different id shouldn't be hard, just some if...thens.
If you need further assistant please respond with url in which I can find exactly what you're looking for.
Cheers

Get data from listings on a website to excel VBA

I am trying to find a way to get the data from yelp.com
I have a spreadsheet on which there are several keywords and locations. I am looking to extract data from yelp listings based on these keywords and locations already in my spreadsheet.
I have created the following code, but it seems to get absurd data and not the exact information I am looking for.
I want to get business name, address and phone number, but all I am getting is nothing. If anyone here could help me solve this problem.
Sub find()
Dim ie As Object
Set ie = CreateObject("InternetExplorer.Application")
With ie
ie.Visible = False
ie.Navigate "http://www.yelp.com/search?find_desc=boutique&find_loc=New+York%2C+NY&ns=1&ls=3387133dfc25cc99#start=10"
' Don't show window
ie.Visible = False
'Wait until IE is done loading page
Do While ie.Busy
Application.StatusBar = "Downloading information, lease wait..."
DoEvents
Loop
' Make a string from IE content
Set mDoc = ie.Document
peopleData = mDoc.body.innerText
ActiveSheet.Cells(1, 1).Value = peopleData
End With
peopleData = "" 'Nothing
Set mDoc = Nothing
End Sub
If you right click in IE, and do View Source, it is apparent that the data served on the site is not part of the document's .Body.innerText property. I notice this is often the case with dynamically served data, and that approach is really too simple for most web-scraping.
I open it in Google Chrome and inspect the elements to get an idea of what I'm really looking for, and how to find it using a DOM/HTML parser; you will need to add a reference to Microsoft HTML Object Library.
I think you can get it to return a collection of the <DIV> tags, and then check those for the classname with an If statment inside the loop.
I made some revisions to my original answer, this should print each record in a new cell:
Option Explicit
Private Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
Sub find()
'Uses late binding, or add reference to Microsoft HTML Object Library
' and change variable Types to use intellisense
Dim ie As Object 'InternetExplorer.Application
Dim html As Object 'HTMLDocument
Dim Listings As Object 'IHTMLElementCollection
Dim l As Object 'IHTMLElement
Dim r As Long
Set ie = CreateObject("InternetExplorer.Application")
With ie
.Visible = False
.Navigate "http://www.yelp.com/search?find_desc=boutique&find_loc=New+York%2C+NY&ns=1&ls=3387133dfc25cc99#start=10"
' Don't show window
'Wait until IE is done loading page
Do While .readyState <> 4
Application.StatusBar = "Downloading information, Please wait..."
DoEvents
Sleep 200
Loop
Set html = .Document
End With
Set Listings = html.getElementsByTagName("LI") ' ## returns the list
For Each l In Listings
'## make sure this list item looks like the listings Div Class:
' then, build the string to put in your cell
If InStr(1, l.innerHTML, "media-block clearfix media-block-large main-attributes") > 0 Then
Range("A1").Offset(r, 0).Value = l.innerText
r = r + 1
End If
Next
Set html = Nothing
Set ie = Nothing
End Sub