Currently I have 2 pieces of code that work separately, but when used together they don't work properly.
The first code asks the user to input information which is stored. It then navigates to the correct webpage where it uses the stored user input information to navigate via filling and submitting a form. It arrives at the correct place.
The second code uses a specific URL via ie.navigate "insert url here" to navigate to the same place as the first code. It then scrapes URL data and stores it in a newly created sheet. It does this correctly.
When merging them, I replace the navigation segment of the second code with the first code, but then it only stores the first 5 of 60 URLs, as if the page hadn't fully loaded before the scraping started. It seems to skip the code directly after ie.document.forms(0).submit, which is supposed to wait for the page to load before moving on to the scraping.
Extra info: the button isn't defined, so I cannot just click it; I had to use ie.document.forms(0).submit instead.
Summary of what I want the code to do:
request user input
store user input
open ie
navigate to page
enter user input into search field
select correct search category from listbox
submit form
'problem happens here
scrape url data
store url data in specific excel worksheet
The merged code:
Sub extractTablesData()
Dim ie As Object, obj As Object
Dim Var_input As String
Dim elemCollection As Object
Dim html As HTMLDocument
Dim Link As Object
Dim erow As Long
' create new sheet to store info
Application.DisplayAlerts = False
ThisWorkbook.Sheets("HL").Delete
ThisWorkbook.Sheets.Add.Name = "HL"
Application.DisplayAlerts = True
Set ie = CreateObject("InternetExplorer.Application")
Var_input = InputBox("Enter info")
With ie
.Visible = True
.navigate ("URL to the webpage")
While ie.readyState <> 4
DoEvents
Wend
'Input Term 1 into input box
ie.document.getElementById("trm1").Value = Var_input
'accessing the Field 1 ListBox
For Each obj In ie.document.all.Item("FIELD1").Options
If obj.Value = "value in listbox" Then
obj.Selected = True
End If
Next obj
' button undefined - using this to submit form
ie.document.forms(0).submit
'----------------------------------------------------------------
'seems to skip this part all together when merged
'Wait until IE is done loading page
Do While ie.readyState <> READYSTATE_COMPLETE
Application.StatusBar = "Trying to go to website…"
DoEvents
Loop
'----------------------------------------------------------------
Set html = ie.document
Set elemCollection = html.getElementsByTagName("a")
For Each Link In elemCollection
erow = Worksheets("HL").Cells(Rows.Count, 1).End(xlUp).Offset(1, 0).Row
Cells(erow, 1).Value = Link
Cells(erow, 1).Columns.AutoFit
Next
Application.StatusBar = ""
Application.ScreenUpdating = True
End With
End Sub
I've been stuck for quite some time on this and haven't found any solutions on my own so I'm reaching out. Any help will be greatly appreciated!
You mentioned you think the website might not be fully loaded. This is a common problem with the more dynamic elements on a webpage. The easiest way to handle it is to insert the line:
Application.Wait Now + TimeValue("00:00:02")
This forces the code to pause for an additional 2 seconds. Insert this line below the code that waits for the page to load, and it will give Internet Explorer a chance to catch up. Depending on the website and the reliability of your connection to it, I recommend adjusting this value anywhere up to about 5 seconds.
Many websites seem to need additional waiting like this, so it's a handy line to remember when things don't work as expected. Hope this helps.
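A fixed pause can also be combined with a polling loop that gives up after a timeout. The following is only a minimal sketch of that idea, not code from the original posts: the name WaitForPage, the 1-second initial pause, and the 30-second default timeout are all assumptions chosen for illustration. The literal 4 stands in for READYSTATE_COMPLETE so the sketch does not depend on a reference to Microsoft Internet Controls.

```vba
' Hypothetical helper (not from the original posts).
' Waits until IE reports the page as loaded, or until timeoutSec elapses.
' Returns True if the page finished loading, False on timeout.
'
' Example use, right after the submit:
'     ie.document.forms(0).submit
'     If Not WaitForPage(ie, 20) Then MsgBox "Page did not finish loading"
Public Function WaitForPage(ByVal ie As Object, _
                            Optional ByVal timeoutSec As Single = 30) As Boolean
    Const READY_COMPLETE As Long = 4   ' numeric value of READYSTATE_COMPLETE
    Dim startTime As Single

    startTime = Timer                  ' seconds since midnight

    ' Assumed 1-second head start: immediately after a submit, readyState
    ' can still describe the old page, so an instant check may pass too early.
    Application.Wait Now + TimeValue("00:00:01")

    Do While ie.Busy Or ie.readyState <> READY_COMPLETE
        DoEvents                       ' keep Excel responsive while polling
        ' Note: Timer resets at midnight, so this timeout check is not
        ' reliable for a wait that spans midnight.
        If Timer - startTime > timeoutSec Then Exit Function   ' returns False
    Loop

    WaitForPage = True
End Function
```

Even when this returns True, a page that fills in its links with script after loading may still need the extra Application.Wait described above.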
I solved this by using a completely different method. I used a query table with strings to go where I wanted.
Sub ExtractTableData()
Dim This_input As String
Const prefix As String = "Beginning of url"
Const postfix As String = "end of url"
Dim qt As QueryTable
Dim ws As Worksheet
Application.DisplayAlerts = False
ThisWorkbook.Sheets("HL").Delete
ThisWorkbook.Sheets.Add.Name = "HL"
Application.DisplayAlerts = True
This_input = InputBox("enter key info to go to specific url")
Set ws = ActiveSheet
Set qt = ws.QueryTables.Add( _
Connection:="URL;" & prefix & This_input & postfix, _
Destination:=Worksheets("HL").Range("A1"))
qt.RefreshOnFileOpen = True
qt.WebSelectionType = xlSpecifiedTables
'qt.webtables is key to getting the specific table on the page
qt.WebTables = 2
qt.Refresh BackgroundQuery:=False
End Sub
Thanks in advance for your answers.
I have a little Excel document that is made so that the naïve user can enter web pages in an Excel sheet, hit a button, and play the videos from that page in their browser, in full screen, and automatically loop the videos without any further user interaction. It basically creates a slide show of videos.
I originally made it for YouTube and it works fine there. I'm now trying to expand it to use another site. It works as planned but needs an extra step.
Whereas YouTube was made with a Full Screen mode that I can access programmatically, this website has embedded videos. (An example: https://www.sharecare.com/video/health-topics-a-z/copd/what-can-i-do-to-prevent-my-copd-from-getting-worse).
You can see in the code that I open IE in full screen mode (which it does) but that's the full web page (header, side banner etc.). I want the video from that page to be the only element, full screen.
If I physically go into the page I can select for the video to play full screen. I've tried searching for various ways to do this, but most of the posts are for something else or how to get a video to play inside Excel rather than what I'm doing.
Sub StartLooping()
Dim IEapp As Object
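'Each variable needs its own type; "Dim a, b As String" makes a a Variant
Dim VidAddr1 As String, VidAddr2 As String
Dim AddrStrStart As Long, AddrStrEnd As Long
Dim AddrFudge1 As Integer, AddrFudge2 As Integer
Dim TimeStart As Single, DurMin As Single, DurSec As Single, DurTot As Single
Dim LRAll As Long, LRVid As Long, LRMin As Long, LRSec As Long
Dim LRVidB As Long, LRMinB As Long, LRSecB As Long
Dim I As Long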
Application.EnableCancelKey = xlErrorHandler
On Error GoTo ErrorHandle
'Review Sheet
LRVid = Cells(Rows.Count, "D").End(xlUp).Row
LRMin = Cells(Rows.Count, "E").End(xlUp).Row
LRSec = Cells(Rows.Count, "F").End(xlUp).Row
LRVidB = Cells(Rows.Count, "I").End(xlUp).Row
LRMinB = Cells(Rows.Count, "J").End(xlUp).Row
LRSecB = Cells(Rows.Count, "K").End(xlUp).Row
LRAll = Cells(Rows.Count, "S").End(xlUp).Row
If LRVid <> LRMin Then
MsgBox "You have to include video address and how long (the minutes and the seconds - use 0 if needed)"
Exit Sub
End If
If LRVid <> LRSec Then
MsgBox "You have to include video address and how long (the minutes and the seconds - use 0 if needed)"
Exit Sub
End If
If LRVidB <> LRMinB Then
MsgBox "You have to include video address and how long (the minutes and the seconds - use 0 if needed)"
Exit Sub
End If
If LRVidB <> LRSecB Then
MsgBox "You have to include video address and how long (the minutes and the seconds - use 0 if needed)"
Exit Sub
End If
'Start of For-Next Loop
For I = 20 To LRAll
'Set Addr
'VidAddr1
If Len(Range("S" & I).Text) = 0 Then
Exit Sub
Else
VidAddr1 = Range("S" & I).Text
End If
VidAddr2 = VidAddr1
'Set Timer
TimeStart = Timer 'Start time
DurMin = Range("T" & I).Value
DurSec = Range("U" & I).Value
DurTot = (DurMin * 60) + DurSec
'Open the web page
Set IEapp = CreateObject("Internetexplorer.Application") 'Set IEapp = InternetExplorer
With IEapp
.Silent = True 'No Pop-ups
.Visible = True
.FullScreen = True
.Navigate VidAddr2 'Load web page
'Keep it open for the duration
Do While Timer < (TimeStart + DurTot)
'Check for Esc - refers to a public function
If KeyDown(vbKeyEscape) Then
IEapp.Quit
Set IEapp = Nothing
Exit Sub
End If
Loop
'Close the page
IEapp.Quit
Set IEapp = Nothing
End With
If I = LRAll Then I = 19
Next I
Exit Sub 'Don't fall through into the error handler
ErrorHandle:
MsgBox Err.Number & " " & Err.Description
End Sub
I copied the code. It works fine. It's just that extra bit of "Oh, I did that, here's how to go about it" that I need.
The browser used is IE so I can keep it simple, but if this were possible in another common browser that would be good to know.
Here's the second set that I tried today (9/12)
Dim IEapp As Object
Dim IEAppColl As HTMLButtonElement
'Open doc
Set IEapp = CreateObject("Internetexplorer.Application") 'Set IEapp = InternetExplorer
With IEapp
.Silent = True 'No Pop-ups
.Visible = True
'.FullScreen = True
.navigate "https://www.sharecare.com/video/health-topics-a-z/copd/got-copd-ask-your-doctor-about-vitamin-d"
Do While .readyState < 4 Or .Busy
Loop
Set IEAppColl = IEapp.Document.getElementsByTagName("BUTTON")
If IEAppColl.Name = "Fullscreen" Then
IEAppColl.Click
End If
End With
For the example COPD page given, this works with SeleniumBasic. After installing it, go to VBE > Tools > References and add a reference to Selenium Type Library. You can also use an IEDriver to work with Internet Explorer rather than Chrome (which uses ChromeDriver).
Option Explicit
Public Sub PlayFullScreen()
Dim d As WebDriver, t As Date, ele As Object
Const MAX_WAIT_SEC As Long = 10
Set d = New ChromeDriver
Const URL = "https://www.sharecare.com/video/health-topics-a-z/copd/what-can-i-do-to-prevent-my-copd-from-getting-worse"
With d
.Start "Chrome"
.get URL
t = Timer 'Start the timeout clock; without this the loop exits after one pass
Do
DoEvents
On Error Resume Next
Set ele = .FindElementByCss("[title='Accept Cookies']")
If Timer - t > MAX_WAIT_SEC Then Exit Do
On Error GoTo 0
Loop While ele Is Nothing
Application.Wait Now + TimeSerial(0, 0, 1)
If Not ele Is Nothing Then ele.Click
.FindElementByCss("#myExperience").Click
Application.Wait Now + TimeSerial(0, 0, 1)
.FindElementByCss("[Title=Fullscreen]", timeout:=7000).Click
Stop '<==Delete me later
.Quit
End With
End Sub
My suggestion is to, instead of using a browser to play video, use a video player.
Under "More Controls" you should have a "Windows Media Player", and likely others, depending on what you have installed.
For example, I've used the VLC control on Access forms. When you install VLC it automatically adds the control to Office (so I assume Office has to be installed first.)
Here's a tutorial I found online:
Link: How to Play Video on Access Form
Random Tip Time:
It can be tricky to Google about a website because, for example, using the search term *YouTube* in a Google search results in a list of YouTube content (videos).
Exclude results from a specific site with Google's site: and - operators like:
"microsoft access" form play youtube video -site:youtube.com
...which is how I found the tutorial above, and several others.
The search term access can also be tricky, since it's such a common word, which is why I'll often enclose it in quotes like "MS Access" or "Microsoft Access" (as above), which makes Google search for those words in that order.
(More Google tips)
I've got a bit of code now that pulls information from a cell and pastes it into a search engine to search. It works fine: I tested it on both Google and Bing, and it pulls back what I want to search for.
Now I've changed the code, which I think is correct, to use an internal website, but I get a run-time error and an automation error. It loads the site fine; it just doesn't search through it.
I'm no expert at this and am still trying to learn VBA. I'd also appreciate any pointers on the best way to troubleshoot things like this.
At the moment, the only way I can figure it out is to step through each line, see what it does, and find the fault. But this one is a bit beyond me.
The only thing I think it could be, though I might be wrong, is a missing reference.
I've got the HTML Object Library and Internet Controls.
Run time error
This is the code from the site for the search box.
<input class="genInput" type="text" size="32" name="xx_quicksearch" value="" id="quicksearchbox">
This is my code,
Sub Test()
Dim ie As Object
Dim form As Variant
Dim button As Variant
Dim LR As Integer
Dim var As String
LR = Cells(Rows.Count, 1).End(xlUp).Row
For x = 2 To LR
var = Cells(x, 1).Value
Set ie = CreateObject("internetexplorer.application")
ie.Visible = True
With ie
.Visible = True
.navigate "*******"
While Not .readyState = READYSTATE_COMPLETE
Wend
End With
'Wait some to time for loading the page
While ie.Busy
DoEvents
Wend
Application.Wait (Now + TimeValue("0:00:02"))
ie.document.getElementById("quicksearch").Value = var
'code to click the button
Set form = ie.document.getElementsByTagName("form")
Application.Wait (Now + TimeValue("0:00:02"))
Set button = form(0).onsubmit
form(0).submit
'wait for page to load
While ie.Busy
DoEvents
Wend
Next x
End Sub
The id is different from what you posted. The HTML shows id="quicksearchbox", so in the line ie.document.getElementById("quicksearch").Value = var, use "quicksearchbox" rather than just "quicksearch".
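On the troubleshooting question: a mistyped id is easier to spot if the lookup is checked before it is used. The sketch below is only an illustration of that idea under a few assumptions: the helper name SetValueById is hypothetical, and it relies on getElementById handing back Nothing when no element has the requested id.

```vba
' Hypothetical helper (not from the original posts).
' Sets the value of the element with the given id. If no such element
' exists, it logs the id and page URL to the Immediate window and
' returns False instead of raising a run-time error.
'
' Example use:
'     If Not SetValueById(ie.document, "quicksearchbox", var) Then Exit Sub
Public Function SetValueById(ByVal doc As Object, _
                             ByVal elementId As String, _
                             ByVal newValue As String) As Boolean
    Dim el As Object

    Set el = doc.getElementById(elementId)
    If el Is Nothing Then
        Debug.Print "No element with id '" & elementId & "' on " & doc.URL
        Exit Function                  ' returns False
    End If

    el.Value = newValue
    SetValueById = True
End Function
```

With the original "quicksearch" id, this would print the missing id to the Immediate window (Ctrl+G in the VBE) rather than stopping on an automation error.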
I've written a script in VBA using IE to parse some links from a webpage. The thing is, the links are within an iframe. I've tweaked my code so that the script first finds the source link of that iframe, navigates to that new page, and parses the required content from there. If I do it this way, I can get all the links.
Webpage URL: weblink
Successful approach (working one):
Sub Get_Links()
Dim IE As New InternetExplorer, HTML As HTMLDocument
Dim elem As Object, post As Object
With IE
.Visible = True
.navigate "put here the above link"
While .Busy = True Or .readyState < 4: DoEvents: Wend
Set elem = .document.getElementById("compInfo") 'it is within iframe
.navigate elem.src
While .Busy = True Or .readyState < 4: DoEvents: Wend
Set HTML = .document
End With
For Each post In HTML.getElementsByClassName("news")
With post.getElementsByTagName("a")
If .Length Then R = R + 1: Cells(R, 1) = .Item(0).href
End With
Next post
IE.Quit
End Sub
I've seen a few sites where no such link exists within the iframe, so there I have no link to follow to track down the content.
If you look at the approach linked below, you'll notice that I parsed content that sits within an iframe on a webpage. There was no link within that iframe to navigate to, so I used contentWindow.document instead and found it working flawlessly.
Link to the working code of parsing Iframe content from another site:
contentWindow approach
However, my question is: why should I navigate to a new webpage to collect the links when I can see the content on the landing page? I tried using contentWindow.document, but it gives me an access denied error. How can I make my code below work using contentWindow.document like I did above?
I tried like this but it throws access denied error:
Sub Get_Links()
Dim IE As New InternetExplorer, HTML As HTMLDocument
Dim frm As Object, post As Object
With IE
.Visible = True
.Navigate "put here the above link"
While .Busy = True Or .readyState < 4: DoEvents: Wend
Set HTML = .document
End With
''the code breaks when it hits the following line "access denied error"
Set frm = HTML.getElementById("compInfo").contentWindow.document
For Each post In frm.getElementsByClassName("news")
With post.getElementsByTagName("a")
If .Length Then R = R + 1: Cells(R, 1) = .Item(0).href
End With
Next post
IE.Quit
End Sub
I've attached an image to let you know which links (they are marked with pencil) I'm after.
These are the elements within which one such link (i would like to grab) is found:
<div class="news">
<span class="news-date_time"><img src="images/arrow.png" alt="">19 Jan 2018 00:01</span>
<a style="color:#5b5b5b;" href="/HomeFinancial.aspx?&cocode=INE117A01022&Cname=ABB-India-Ltd&srno=17019039003&opt=9">ABB India Limited - Press Release</a>
</div>
Image of the links of that page I would like to grab:
From the very first day while creating this thread I strictly requested not to use this url http://hindubusiness.cmlinks.com/Companydetails.aspx?cocode=INE117A01022 to locate the data. I requested any solution from this main_page_link without touching the link within iframe. However, everyone is trying to provide solutions that I've already shown in my post. What did I put a bounty for then?
You can see the links within <iframe> in browser but can't access them programmatically due to Same-origin policy.
Here is an example showing how to retrieve the links using XHR and RegEx:
Option Explicit
Sub Test()
Dim sContent As String
Dim sUrl As String
Dim aLinks() As String
Dim i As Long
' Retrieve initial webpage HTML content via XHR
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.thehindubusinessline.com/stocks/abb-india-ltd/overview/", False
.Send
sContent = .ResponseText
End With
'WriteTextFile sContent, CreateObject("WScript.Shell").SpecialFolders("Desktop") & "\tmp\tmp.htm", -1
' Extract target iframe URL via RegEx
With CreateObject("VBScript.RegExp")
.Global = True
.MultiLine = True
.IgnoreCase = True
' Match the iframe whose src contains "Companydetails"
.Pattern = "<iframe[\s\S]*?src=""([^""]*?Companydetails[^""]*)""[^>]*>"
sUrl = .Execute(sContent).Item(0).SubMatches(0)
End With
' Retrieve iframe HTML content via XHR
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", sUrl, False
.Send
sContent = .ResponseText
End With
'WriteTextFile sContent, CreateObject("WScript.Shell").SpecialFolders("Desktop") & "\tmp\tmp.htm", -1
' Parse links via RegEx
With CreateObject("VBScript.RegExp")
.Global = True
.MultiLine = True
.IgnoreCase = True
' Process all anchors within div.news
.Pattern = "<div class=""news"">[\s\S]*?href=""([^""]*)"
With .Execute(sContent)
ReDim aLinks(0 To .Count - 1)
For i = 0 To .Count - 1
aLinks(i) = .Item(i).SubMatches(0)
Next
End With
End With
Debug.Print Join(aLinks, vbCrLf)
End Sub
RegEx generally isn't recommended for HTML parsing, so take this as a disclaimer. The data being processed in this case is quite simple, which is why it is parsed with RegEx.
The output for me is as follows:
/HomeFinancial.aspx?&cocode=INE117A01022&Cname=ABB-India-Ltd&srno=17047038016&opt=9
/HomeFinancial.aspx?&cocode=INE117A01022&Cname=ABB-India-Ltd&srno=17046039003&opt=9
/HomeFinancial.aspx?&cocode=INE117A01022&Cname=ABB-India-Ltd&srno=17045039006&opt=9
/HomeFinancial.aspx?&cocode=INE117A01022&Cname=ABB-India-Ltd&srno=17043039002&opt=9
/HomeFinancial.aspx?&cocode=INE117A01022&Cname=ABB-India-Ltd&srno=17043010019&opt=9
I also tried to copy the content of the <iframe> from IE to clipboard (for further pasting to the worksheet) using commands:
IE.ExecWB OLECMDID_SELECTALL, OLECMDEXECOPT_DODEFAULT
IE.ExecWB OLECMDID_COPY, OLECMDEXECOPT_DODEFAULT
But those commands actually select and copy the main document, excluding the frame, unless I click on the frame manually. So that approach might work if a click on the frame could be reproduced from VBA (frame node methods like .focus and .click didn't help).
Something like this should work. The key is to realize the iFrame is technically another document. Reviewing the iFrame on the page you listed, you can easily use a web request to get at the data you need. As already mentioned, the reason you get an error is the Same-Origin policy. You could write something to get the src of the iFrame and then do the web request as I've shown below, or use IE to scrape the page, get the src, and then load that page, which looks like what you have done.
I would recommend the web request approach; Internet Explorer can get annoying, fast.
Code
Public Sub SOExample()
Dim html As Object 'To store the HTML content
Dim Elements As Object 'To store the anchor collection
Dim Element As Object 'To iterate the anchor collection
Set html = CreateObject("htmlFile")
With CreateObject("MSXML2.XMLHTTP")
'Navigate to the source of the iFrame, it's another page
'View the source for the iframe. Alternatively -
'you could navigate to this page and use IE to scrape it
.Open "GET", "https://stocks.thehindubusinessline.com/Companydetails.aspx?&cocode=INE117A01022", False 'False = synchronous, so .Status is ready after .send
.send ""
'See if the request was ok, exit if there was an error
If Not .Status = 200 Then Exit Sub
'Assign the page's HTML to an HTML object
html.body.InnerHTML = .responseText
Set Elements = html.body.document.getElementByID("hmstockchart_CompanyNews1_updateGLVV")
Set Elements = Elements.getElementsByTagName("a")
For Each Element In Elements
'Print out the data to the Immediate window
Debug.Print Element.InnerText
Next
End With
End Sub
Results
ABB India Limited - AGM/Book Closure
Board of ABB India recommends final dividend
ABB India to convene AGM
ABB India to pay dividend
ABB India Limited - Outcome of Board Meeting
More ?
The simplest solution, as everyone suggested, is to go directly to the link. This takes the IFRAME out of the picture and makes it easier for you to loop through the links. But in case you still don't like that approach, you need to go a bit deeper down the hole.
Below is a function from a library I wrote long back in VB.NET
https://github.com/tarunlalwani/ScreenCaptureAPI/blob/2646c627b4bb70e36fe2c6603acde4cee3354b39/Source%20Code/ScreenCaptureAPI/ScreenCaptureAPI/ScreenCapture.vb#L803
Private Function _EnumIEFramesDocument(ByVal wb As HTMLDocumentClass) As Collection
Dim pContainer As olelib.IOleContainer = Nothing
Dim pEnumerator As olelib.IEnumUnknown = Nothing
Dim pUnk As olelib.IUnknown = Nothing
Dim pBrowser As SHDocVW.IWebBrowser2 = Nothing
Dim pFramesDoc As Collection = New Collection
_EnumIEFramesDocument = Nothing
pContainer = wb
Dim i As Integer = 0
' Get an enumerator for the frames
If pContainer.EnumObjects(olelib.OLECONTF.OLECONTF_EMBEDDINGS, pEnumerator) = 0 Then
pContainer = Nothing
' Enumerate and refresh all the frames
Do While pEnumerator.Next(1, pUnk) = 0
On Error Resume Next
' Clear errors
Err.Clear()
' Get the IWebBrowser2 interface
pBrowser = pUnk
If Err.Number = 0 Then
pFramesDoc.Add(pBrowser.Document)
i = i + 1
End If
Loop
pEnumerator = Nothing
End If
_EnumIEFramesDocument = pFramesDoc
End Function
So basically this is a VB.NET version of the C++ version below:
Accessing body (at least some data) in a iframe with IE plugin Browser Helper Object (BHO)
Now you just need to port it to VBA. The only problem you may have is finding the olelib reference. Most of the rest is VBA compatible.
Once you get the collection of objects, find the one that belongs to your frame and then just use that one:
frames = _EnumIEFramesDocument(IE)
frames.Item(1).document.getElementsByTagName("A").length
I have a VERY basic script I am using with an excel spreadsheet to populate a form. It inserts data into the fields then waits for me to click the submit button on each page. Then it waits for the page to load and fills in the next set of fields. I click submit, next page loads, etc. Finally, on the last two pages I have to do some manual input that I can't do with the spreadsheet. Code thus far is shown below.
My question is, at the end of what I have there now, how can I make the system wait for me to fill out that last page, then once I submit it realize that it has been submitted and loop back to the beginning, incrementing the row on the spreadsheet so that we can start over and do the whole thing again for the next student?
As you will probably be able to tell I am not a programmer, just a music teacher who does not savor the idea of filling out these forms manually for all 200 of my students and got the majority of the code you see from a tutorial.
Function FillInternetForm()
Dim IE As Object
Set IE = CreateObject("InternetExplorer.Application")
'create new instance of IE. use reference to return current open IE if
'you want to use open IE window. Easiest way I know of is via title bar.
IE.Navigate "https://account.makemusic.com/Account/Create/?ReturnUrl=/OpenId/VerifyGradebookRequest"
'go to web page listed inside quotes
IE.Visible = True
While IE.busy
DoEvents 'wait until IE is done loading page.
Wend
IE.Document.All("BirthMonth").Value = "1"
IE.Document.All("BirthYear").Value = "2000"
IE.Document.All("Email").Value = ThisWorkbook.Sheets("queryRNstudents").Range("f2")
IE.Document.All("Password").Value = ThisWorkbook.Sheets("queryRNstudents").Range("e2")
IE.Document.All("PasswordConfirm").Value = ThisWorkbook.Sheets("queryRNstudents").Range("e2")
IE.Document.All("Country").Value = "USA"
IE.Document.All("responseButtonsDiv").Click
newHour = Hour(Now())
newMinute = Minute(Now())
newSecond = Second(Now()) + 3
waitTime = TimeSerial(newHour, newMinute, newSecond)
Application.Wait waitTime
IE.Document.All("FirstName").Value = ThisWorkbook.Sheets("queryRNstudents").Range("a2")
IE.Document.All("LastName").Value = ThisWorkbook.Sheets("queryRNstudents").Range("b2")
IE.Document.All("Address1").Value = "123 Nowhere St"
IE.Document.All("City").Value = "Des Moines"
IE.Document.All("StateProvince").Value = "IA"
IE.Document.All("ZipPostalCode").Value = "50318"
End Function
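I would use the events of IE, more specifically of the form, something like this, using the Microsoft HTML Object Library (MSHTML). Note that a WithEvents variable has to be declared in an object module (a class module, ThisWorkbook, or a sheet module), not a standard module.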
Private WithEvents IEForm As MSHTML.HTMLFormElement
Public Sub InternetExplorerTest()
Dim ie As SHDocVw.InternetExplorer
Dim doc As MSHTML.HTMLDocument
Set ie = New SHDocVw.InternetExplorer
ie.Visible = 1
ie.navigate "http://stackoverflow.com/questions/tagged/vba"
While ie.readyState <> READYSTATE_COMPLETE Or ie.Busy
DoEvents
Wend
Set doc = ie.document
Set IEForm = doc.forms(0)
End Sub
Private Function IEForm_onsubmit() As Boolean
MsgBox "Form Submitted"
End Function
I am trying to find a way to get the data from yelp.com
I have a spreadsheet on which there are several keywords and locations. I am looking to extract data from yelp listings based on these keywords and locations already in my spreadsheet.
I have created the following code, but it seems to return absurd data and not the exact information I am looking for.
I want to get the business name, address, and phone number, but I am not getting any of them. I'd appreciate it if anyone here could help me solve this problem.
Sub find()
Dim ie As Object
Set ie = CreateObject("InternetExplorer.Application")
With ie
ie.Visible = False
ie.Navigate "http://www.yelp.com/search?find_desc=boutique&find_loc=New+York%2C+NY&ns=1&ls=3387133dfc25cc99#start=10"
' Don't show window
ie.Visible = False
'Wait until IE is done loading page
Do While ie.Busy
Application.StatusBar = "Downloading information, please wait..."
DoEvents
Loop
' Make a string from IE content
Set mDoc = ie.Document
peopleData = mDoc.body.innerText
ActiveSheet.Cells(1, 1).Value = peopleData
End With
peopleData = "" 'Nothing
Set mDoc = Nothing
End Sub
If you right click in IE, and do View Source, it is apparent that the data served on the site is not part of the document's .Body.innerText property. I notice this is often the case with dynamically served data, and that approach is really too simple for most web-scraping.
I open it in Google Chrome and inspect the elements to get an idea of what I'm really looking for, and how to find it using a DOM/HTML parser; you will need to add a reference to Microsoft HTML Object Library.
I think you can get it to return a collection of the <DIV> tags, and then check those for the class name with an If statement inside the loop.
I made some revisions to my original answer, this should print each record in a new cell:
Option Explicit
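'API declaration needs the Declare keyword (and PtrSafe on VBA7 / 64-bit Office)
#If VBA7 Then
Private Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#Else
Private Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#End If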
Sub find()
'Uses late binding, or add reference to Microsoft HTML Object Library
' and change variable Types to use intellisense
Dim ie As Object 'InternetExplorer.Application
Dim html As Object 'HTMLDocument
Dim Listings As Object 'IHTMLElementCollection
Dim l As Object 'IHTMLElement
Dim r As Long
Set ie = CreateObject("InternetExplorer.Application")
With ie
.Visible = False
.Navigate "http://www.yelp.com/search?find_desc=boutique&find_loc=New+York%2C+NY&ns=1&ls=3387133dfc25cc99#start=10"
' Don't show window
'Wait until IE is done loading page
Do While .readyState <> 4
Application.StatusBar = "Downloading information, Please wait..."
DoEvents
Sleep 200
Loop
Set html = .Document
End With
Set Listings = html.getElementsByTagName("LI") ' ## returns the list
For Each l In Listings
'## make sure this list item looks like the listings Div Class:
' then, build the string to put in your cell
If InStr(1, l.innerHTML, "media-block clearfix media-block-large main-attributes") > 0 Then
Range("A1").Offset(r, 0).Value = l.innerText
r = r + 1
End If
Next
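Application.StatusBar = False 'Hand the status bar back to Excel
Set html = Nothing
ie.Quit 'Close the hidden IE instance so it doesn't linger in the background
Set ie = Nothing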
End Sub