Manipulate a web-page control with VBA - vba

Thanks in advance for your answers.
I have a little Excel document that is made so that the naïve user can enter web pages in an Excel sheet, hit a button, and play the videos from that page in their browser, in full screen, and automatically loop the videos without any further user interaction. It basically creates a slide show of videos.
I originally made it for YouTube and it works fine there. I'm now trying to expand it to use another site. It works as planned but needs an extra step.
Whereas YouTube has a full-screen mode that I can access programmatically, this website has embedded videos (an example: https://www.sharecare.com/video/health-topics-a-z/copd/what-can-i-do-to-prevent-my-copd-from-getting-worse).
You can see in the code that I open IE in full screen mode (which it does) but that's the full web page (header, side banner etc.). I want the video from that page to be the only element, full screen.
If I physically go into the page I can select for the video to play full screen. I've tried searching for various ways to do this, but most of the posts are for something else or how to get a video to play inside Excel rather than what I'm doing.
Sub StartLooping()
Dim IEapp As Object
Dim VidAddr1, VidAddr2 As String
Dim AddrStrStart, AddrStrEnd As Long
Dim AddrFudge1, AddrFudge2 As Integer
Dim TimeStart, DurMin, DurSec, DurTot As Single
Dim LRAll, LRVid, LRMin, LRSec, LRVidB, LRMinB, LRSecB As Integer
Dim I As Integer
Application.EnableCancelKey = xlErrorHandler
On Error GoTo ErrorHandle
'Review Sheet
LRVid = Cells(Rows.Count, "D").End(xlUp).Row
LRMin = Cells(Rows.Count, "E").End(xlUp).Row
LRSec = Cells(Rows.Count, "F").End(xlUp).Row
LRVidB = Cells(Rows.Count, "I").End(xlUp).Row
LRMinB = Cells(Rows.Count, "J").End(xlUp).Row
LRSecB = Cells(Rows.Count, "K").End(xlUp).Row
LRAll = Cells(Rows.Count, "S").End(xlUp).Row
If LRVid <> LRMin Then
MsgBox "You have to include video address and how long (the minutes and the seconds - use 0 if needed)"
Exit Sub
End If
If LRVid <> LRSec Then
MsgBox "You have to include video address and how long (the minutes and the seconds - use 0 if needed)"
Exit Sub
End If
If LRVidB <> LRMinB Then
MsgBox "You have to include video address and how long (the minutes and the seconds - use 0 if needed)"
Exit Sub
End If
If LRVidB <> LRSecB Then
MsgBox "You have to include video address and how long (the minutes and the seconds - use 0 if needed)"
Exit Sub
End If
'Start of For-Next Loop
For I = 20 To LRAll
'Set Addr
'VidAddr1
If Len(Range("S" & I).Text) = 0 Then
Exit Sub
Else
VidAddr1 = Range("S" & I).Text
End If
VidAddr2 = VidAddr1
'Set Timer
TimeStart = Timer 'Start time
DurMin = Range("T" & I).Value
DurSec = Range("U" & I).Value
DurTot = (DurMin * 60) + DurSec
'Open the web page
Set IEapp = CreateObject("Internetexplorer.Application") 'Set IEapp = InternetExplorer
With IEapp
.Silent = True 'No Pop-ups
.Visible = True
.FullScreen = True
.Navigate VidAddr2 'Load web page
'Keep it open for the duration
Do While Timer < (TimeStart + DurTot)
'Check for Esc - refers to a public function
If KeyDown(vbKeyEscape) Then
IEapp.Quit
Set IEapp = Nothing
Exit Sub
End If
Loop
'Close the page
IEapp.Quit
Set IEapp = Nothing
End With
If I = LRAll Then I = 19
Next I
ErrorHandle:
MsgBox Err.Number & " " & Err.Description
Exit Sub
End Sub
I copied the code. It works fine. It's just that extra bit of "Oh, I did that, here's how to go about it" that I need.
The browser used is IE so I can keep it simple, but if this were possible in another common browser that would be good to know.
Here's the second set that I tried today (9/12)
Dim IEapp As Object
Dim IEAppColl As HTMLButtonElement
'Open doc
Set IEapp = CreateObject("Internetexplorer.Application") 'Set IEapp = InternetExplorer
With IEapp
.Silent = True 'No Pop-ups
.Visible = True
'.FullScreen = True
.navigate "https://www.sharecare.com/video/health-topics-a-z/copd/got-copd-ask-your-doctor-about-vitamin-d"
Do While .readyState < 4 Or .Busy
Loop
Set IEAppColl = IEapp.Document.getElementsByTagName("BUTTON")
If IEAppColl.Name = "Fullscreen" Then
IEAppColl.Click
End If
End With

For the example COPD page given, this works with SeleniumBasic. After installing SeleniumBasic, go to VBE > Tools > References and add a reference to Selenium Type Library. You can also use IEDriver to work with Internet Explorer rather than Chrome (which uses ChromeDriver).
Option Explicit
Public Sub PlayFullScreen()
Dim d As WebDriver, t As Single, ele As Object
Const MAX_WAIT_SEC As Long = 10
Set d = New ChromeDriver
Const URL = "https://www.sharecare.com/video/health-topics-a-z/copd/what-can-i-do-to-prevent-my-copd-from-getting-worse"
With d
.Start "Chrome"
.get URL
t = Timer 'start of the wait for the cookie banner
Do
DoEvents
On Error Resume Next
Set ele = .FindElementByCss("[title='Accept Cookies']")
On Error GoTo 0
If Timer - t > MAX_WAIT_SEC Then Exit Do 'give up on the banner after MAX_WAIT_SEC
Loop While ele Is Nothing
Application.Wait Now + TimeSerial(0, 0, 1)
If Not ele Is Nothing Then ele.Click
.FindElementByCss("#myExperience").Click
Application.Wait Now + TimeSerial(0, 0, 1)
.FindElementByCss("[Title=Fullscreen]", timeout:=7000).Click
Stop '<==Delete me later
.Quit
End With
End Sub

My suggestion is to, instead of using a browser to play video, use a video player.
Under "More Controls" you should have a "Windows Media Player", and likely others, depending on what you have installed.
For example, I've used the VLC control on Access forms. When you install VLC it automatically adds the control to Office (so I assume Office has to be installed first.)
Here's a tutorial I found online:
Link: How to Play Video on Access Form
Random Tip Time:
It can be tricky to Google about a website because, for example, using the search term *YouTube* in a Google search results in a list of YouTube content (videos).
Exclude results from a specific site with Google's site: and - operators like:
"microsoft access" form play youtube video -site:youtube.com
...which is how I found the tutorial above, and several others.
The word access can also be tricky to search for since it's such a common word, which is why I'll often use a quoted phrase like "MS Access" or "Microsoft Access" (as above), which makes Google search for those words in that order.
(More Google tips)

Related

Office 2016 VBA fails to open Internet Explorer shell window but works in Office 2013

I inherited this VBA script from my predecessor. It works fine for me in Excel 2013 up until recently when I was told I may need to work from home. Come to find out, the Office 2016 environment of my newly accessed VPN desktop does not like this script. I keep getting "The remote server machine is unknown or unavailable" when it reaches .ReadyState <> READYSTATE_COMPLETE.
The navigation did not fail as I can see the window where it successfully navigated to the URL and I can interact with it correctly. The strange thing is if I change the URL to "www.google.com" I get a valid ready state result.
I also need to figure out how to late bind the Shell Windows so it will work with both the v15 and v16 libraries simultaneously.
The intent of this script is to automate a process that
1. Opens an internal database at DBurl via web interface
2. Manipulates and runs JavaScript located on the web page
3. Closes the browser window without closing any other browser windows
This could be modified for someone else's use by looking for a page element, such as a search box or specific button on a page, and interacting with it.
Edit:
Additional testing has revealed that pausing at the Do While loop, skipping it, and resuming at IETab1 = SWs.Count results in this script working in Office 2016. The only issue, then, is that without the loop the page isn't yet ready for the next step when the script tries to run the interaction. A 5-second wait in place of the loop band-aids this issue. Finding out why .ReadyState can't be read will fix it properly.
Declare PtrSafe Function apiShowWindow Lib "user32" Alias "ShowWindow" _
(ByVal hwnd As LongPtr, ByVal nCmdShow As Long) As Long
Sub OpenWebDB()
Dim ieApp As Object
Dim SWs As ShellWindows
Dim IETab1 As Integer
Dim JScript As String
Dim CurrentWindow As Object
Dim DBurl As String
Dim tNow As Date, tOut As Date
DBurl = "My.Database.url"
Set SWs = New ShellWindows
tNow = Now
tOut = tNow + TimeValue("00:00:15")
If ieApp Is Nothing Then
Set ieApp = CreateObject("InternetExplorer.Application")
With ieApp
.Navigate DBurl
Do While tNow < tOut And .ReadyState <> READYSTATE_COMPLETE
DoEvents
tNow = Now
Loop
IETab1 = SWs.Count
End With
End If
If Not tNow < tOut Then GoTo DBFail
On Error GoTo DBFail
Set CurrentWindow = SWs.Item(IETab1 - 1).Document.parentWindow
JScript = "javascript: DoSomething"
Call CurrentWindow.execScript(JScript)
On Error GoTo 0
SWs.Item(IETab1 - 1).Quit
Set ieApp = Nothing
Set SWs = Nothing
Exit Sub
DBFail:
MsgBox (DBurl & vbCrLf & "took too long to connect or failed to load correctly." & vbCrLf & _
"Please notify the Database manager if this issue continues."), vbCritical, "DB Error"
SWs.Item(IETab1 - 1).Quit
Set ieApp = Nothing
Set SWs = Nothing
End Sub
Try removing tNow < tOut from the Do While condition, or use a While statement to wait for the page to finish loading:
While IE.ReadyState <> 4
DoEvents
Wend
Besides, given the intent of the script, you could refer to the following code (it loops through the tabs and closes a specific tab according to its title):
Sub TestClose()
Dim IE As Object, Data As Object
Dim ticket As String
Dim my_url As String, my_title As String
Set IE = CreateObject("InternetExplorer.Application")
With IE
.Visible = True
.Navigate "https://www.microsoft.com/en-sg/" '1st tab
.Navigate "https://www.bing.com", CLng(2048) '2nd
.Navigate "https://www.google.com", CLng(2048) '3rd
While IE.ReadyState <> 4
DoEvents
Wend
'wait some time to let page load
Application.Wait (Now + TimeValue("0:00:05"))
'get the opened windows
Set objShell = CreateObject("Shell.Application")
IE_count = objShell.Windows.Count
'loop through the window and find the tab
For x = 0 To (IE_count - 1)
On Error Resume Next
'get the location and title
my_url = objShell.Windows(x).Document.Location
my_title = objShell.Windows(x).Document.Title
'debug to check the value
Debug.Print x
Debug.Print my_title
'find the special tab based on the title.
If my_title Like "Bing" & "*" Then
Set IE = objShell.Windows(x)
IE.Quit 'call the Quit method to close the tab.
Exit For 'exit the for loop
Else
End If
Next
End With
Set IE = Nothing
End Sub
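The question also asks how to late bind ShellWindows so the same workbook runs against both the v15 and v16 libraries. A minimal sketch of one way to do that follows. The helper names WaitForPage and FindIEWindow are hypothetical, and the literal 4 stands in for READYSTATE_COMPLETE, which is not defined without a reference to Microsoft Internet Controls. Re-fetching the window through Shell.Application after navigation is only something to try if the original reference raises an error. It is an assumption, not a confirmed diagnosis of the "remote server machine" error.

```vba
Option Explicit

' Hypothetical helper: waits until the browser is idle and reports
' readyState 4 (complete), or until timeoutSeconds have elapsed.
' Returns True only if the page finished loading in time.
Public Function WaitForPage(ByVal ie As Object, ByVal timeoutSeconds As Long) As Boolean
    Dim deadline As Date
    deadline = DateAdd("s", timeoutSeconds, Now)
    Do While Now < deadline
        If Not ie.Busy Then
            If ie.readyState = 4 Then
                WaitForPage = True
                Exit Function
            End If
        End If
        DoEvents
    Loop
End Function

' Hypothetical helper: late-bound stand-in for "New ShellWindows".
' Returns the first open window whose URL contains urlPart, or Nothing.
Public Function FindIEWindow(ByVal urlPart As String) As Object
    Dim wins As Object, w As Object
    Set wins = CreateObject("Shell.Application").Windows
    For Each w In wins
        If Not w Is Nothing Then
            If InStr(1, w.LocationURL, urlPart, vbTextCompare) > 0 Then
                Set FindIEWindow = w
                Exit Function
            End If
        End If
    Next w
End Function
```

A caller might navigate with the object from CreateObject("InternetExplorer.Application"), then use Set win = FindIEWindow("My.Database.url") and WaitForPage(win, 15) before running the script on the page.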

Need help to convert Internet Explorer based web scraping to XMLHTTP

I am trying to speed up some intranet web scraping as well as make it more reliable. I am just learning how to implement XMLHTTP and I need some advice on converting my code from IE-based scraping to XMLHTTP.
I have 2 subs in my module that load and navigate the intranet site (GetWebTable) and parse the data (GetOneTable) to return a table in Excel. The subs are as follows:
Sub GetWebTable(sAccountNum As String)
On Error Resume Next
Dim objIE As Object
Dim strBuffer As String
Dim thisCol As Integer
Dim iAcctCount As Integer
Dim iCounter As Integer
Dim iNextCounter As Integer
Dim iAcctCell As Integer
Dim thisColCustInfo As Integer
Dim iErrorCounter As Integer
If InStr(1, sAccountNum, "-") <> 0 Then
sAccountNum = Replace(sAccountNum, "-", "")
End If
If InStr(1, sAccountNum, " ") <> 0 Then
sAccountNum = Replace(sAccountNum, " ", "")
End If
iErrorCounter = 1
TRY_AGAIN:
'Spawn Internet Explorer
Set objIE = GetObject("new:{D5E8041D-920F-45e9-B8FB-XXXXXXX}")
DoEvents
With objIE
.Visible = False
.Navigate "http://intranetsite.aspx"
While .busy = True Or .readystate <> 4: DoEvents: Wend
While .Document.readyState <> "complete": DoEvents: Wend
.Document.getElementById("ctl00_MainContentRegion_tAcct").Value = sAccountNum
While .busy = True Or .readyState <> 4: DoEvents: Wend
While .Document.readyState <> "complete": DoEvents: Wend
.Document.getElementById("ctl00_MainContentRegion_btnRunReport").Click
While .busy = True Or .readyState <> 4: DoEvents: Wend
While .Document.readyState <> "complete": DoEvents: Wend
End With
thisCol = 53
thisColCustInfo = 53
GetOneTable objIE.Document, 9, thisCol
'Cleanup:
objIE.Quit
Set objIE = Nothing
GetWebTable_Error:
Select Case Err.Number
Case 0
Case Else
Debug.Print Err.Number, Err.Description
iErrorCounter = iErrorCounter + 1
objIE.Quit
Set objIE = Nothing
If iErrorCounter > 4 Then On Error Resume Next
GoTo TRY_AGAIN
'Stop
End Select
End Sub
Sub GetOneTable(varWebPageDoc, varTableNum, varColInsert)
Dim varDocElement As Object ' the elements of the document
Dim varDocTable As Object ' the table required
Dim varDocRow As Object ' the rows of the table
Dim varDocCell As Object ' the cells of the rows.
Dim Rng As Range
Dim iCellCount As Long
Dim iElemCount As Long
Dim iTableCount As Long
Dim iRowCount As Long
Dim iRowCounter As Integer
Dim bTableEndFlag As Boolean
bTableEndFlag = False
For Each varDocElement In varWebPageDoc.all
If varDocElement.nodeName = "TABLE" Then
iElemCount = iElemCount + 1
End If
If iElemCount = varTableNum Then
Set varDocTable = varDocElement
iTableCount = iTableCount + 1
iRowCount = iRowCount + 1
Set Rng = Worksheets("Sheet1").Cells(2, varColInsert)
For Each varDocRow In varDocTable.Rows
For Each varDocCell In varDocRow.Cells
If Left(varDocCell.innerText, 9) = "Total for" Then
bTableEndFlag = True
Exit For
End If
Rng.Value = varDocCell.innerText
Set Rng = Rng.Offset(, 1)
iCellCount = iCellCount + 1
Next varDocCell
iRowCount = iRowCount + 1
Set Rng = Rng.Offset(1, -iCellCount)
iCellCount = 0
Next varDocRow
Exit For
End If
Next varDocElement
Set varDocElement = Nothing
Set varDocTable = Nothing
Set varDocRow = Nothing
Set varDocCell = Nothing
Set Rng = Nothing
End Sub
Any thoughts?
HTML is not XML. XML is strictly enforced in terms of opening and closing tags, whilst HTML is famous for <br> tags without a closing </br>. You'd be very lucky if the HTML is XML compliant.
Anyway, if you want to use XMLHTTP because of the HTTP request and still keep your IE based web scraping code then see this article http://exceldevelopmentplatform.blogspot.com/2018/01/vba-xmlhttp-request-xhr-does-not-parse.html It shows how to use XMLHTTP before passing response to MSHTML.
You can use MSHTML independently of IE, see this article Use MSHTML to parse local HTML file without using Internet Explorer (Microsoft HTML Object Library). If you read that you will see that much of the code you write against the IE object model is in fact against the MSHTML object model, and as such you can decouple and jettison IE. Enjoy!
EDIT1: Don't forget you can ask your company's IT staff
You say it is an intranet site which implies internal to your company, you could ask the programmers who are responsible for that system for a direct API guide.
EDIT2: Folding in feedback about how to mimic a browser...
To mimic the browser you need to figure out the traffic that button clicks generate...
To watch network traffic I recommend you switch to Chrome as your browser. Then, on this web page, right-click mouse button and take "Inspect" menu option, this opens the Chrome Developer Tools. Then, in Developer Tools select the Network tab, then click on a link on this page and you will see the traffic that is generated.
So, if you want to go pure XMLHTTP and leave browsers behind then you won't have buttons available to click but you can observe the network traffic that happens when a button is clicked in a browser and you can then mimic this in code.
So for example, in your comment you ask how do I enter an account number and click the button. I'm guessing that clicking a button will result in a XMLHTTP call of something like http://example.com/dowork/mypage.asp?accountnumber=1233456&otherParams=true so you see account number would be buried in the query parameters. Once you have that url you can put that in your XMLHTTP request.
One potential problem is that system designers may have chosen to hide account numbers in the body of an HTTP POST because it is sensitive/confidential data. However, Chrome Developer Tools is very good and should still yield that information, but you may have to poke around.
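As a rough sketch of the "pure XMLHTTP" route described above: the request below uses the guessed URL and query parameters from this answer (they are placeholders, not the real intranet endpoint), and the sub name FetchReportViaXhr is hypothetical. The response is handed to a late-bound MSHTML document, so no browser window is involved.

```vba
Option Explicit

' Sketch only: BASE_URL and the parameter names are the guessed values
' from the answer text, not a real endpoint.
Public Sub FetchReportViaXhr()
    Const BASE_URL As String = "http://example.com/dowork/mypage.asp"
    Dim accountNumber As String
    Dim http As Object, doc As Object, tables As Object

    accountNumber = "1233456"

    ' Synchronous GET with the account number in the query string
    Set http = CreateObject("MSXML2.XMLHTTP")
    http.Open "GET", BASE_URL & "?accountnumber=" & accountNumber & "&otherParams=true", False
    http.send

    If http.Status <> 200 Then
        Debug.Print "Request failed: " & http.Status & " " & http.statusText
        Exit Sub
    End If

    ' Parse the response with MSHTML instead of Internet Explorer
    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = http.responseText
    Set tables = doc.getElementsByTagName("table")
    Debug.Print "Tables found: " & tables.Length
End Sub
```

If the real traffic turns out to be a POST, the same object can send it with .Open "POST", ... and a body passed to .send. The exact fields have to be read from the Network tab.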

Extracting website data with Excel and VBA [duplicate]

I'm trying to scrape data, such as real-time prices (e.g. German 5 YR Bobl, US 30Y T-Bond), from the website http://uk.investing.com/rates-bonds/financial-futures via VBA. I have tried an Excel web query, but it only scrapes the whole website and I would like to scrape the rate only. Is there a way of doing this?
There are several ways of doing this. I'm writing this answer in the hope that anyone searching for the keywords "scraping data from website" will find all the basics of Internet Explorer automation here, but remember that nothing is worth as much as your own research (if you don't want to be stuck with pre-written code that you're not able to customize).
Please note that this is one way, which I don't prefer in terms of performance (since it depends on the browser speed), but it is a good way to understand the rationale behind Internet automation.
1) If I need to browse the web, I need a browser! So I create an Internet Explorer browser:
Dim appIE As Object
Set appIE = CreateObject("internetexplorer.application")
2) I ask the browser to browse the target webpage. Through the use of the property ".Visible", I decide if I want to see the browser doing its job or not. When building the code it's nice to have Visible = True, but once the code is working and scraping data it's nice not to see the browser every time, so Visible = False.
With appIE
.Navigate "http://uk.investing.com/rates-bonds/financial-futures"
.Visible = True
End With
3) The webpage will need some time to load. So, I will wait while it's busy...
Do While appIE.Busy
DoEvents
Loop
4) Well, now the page is loaded. Let's say that I want to scrape the change of the US30Y T-Bond:
What I will do is just press F12 in Internet Explorer to see the webpage's code, and then, using the pointer (in red circle), I will click on the element that I want to scrape to see how I can reach it.
5) What I should do is straightforward. First of all, I will get, by its ID property, the tr element which contains the value:
Set allRowOfData = appIE.document.getElementById("pair_8907")
Here I will get a collection of td elements (specifically, tr is a row of data, and the td are its cells). We are looking for the 8th, so I will write:
Dim myValue As String: myValue = allRowOfData.Cells(7).innerHTML
Why did I write 7 instead of 8? Because the collection of cells starts from 0, so the index of the 8th element is 7 (8-1). Briefly analysing this line of code:
.Cells() makes me access the td elements;
innerHTML is the property of the cell containing the value we look for.
Once we have our value, which is now stored in the myValue variable, we can just close the IE browser and release the memory by setting it to Nothing:
appIE.Quit
Set appIE = Nothing
Well, now you have your value and you can do whatever you want with it: put it into a cell (Range("A1").Value = myValue), or into a label of a form (Me.Label1.Caption = myValue).
I'd just like to point you out that this is not how StackOverflow works: here you post questions about specific coding problems, but you should make your own search first. The reason why I'm answering a question which is not showing too much research effort is just that I see it asked several times and, back to the time when I learned how to do this, I remember that I would have liked having some better support to get started with. So I hope that this answer, which is just a "study input" and not at all the best/most complete solution, can be a support for next user having your same problem. Because I have learned how to program thanks to this community, and I like to think that you and other beginners might use my input to discover the beautiful world of programming.
Enjoy your practice ;)
Other methods were mentioned so let us please acknowledge that, at the time of writing, we are in the 21st century. Let's park the local bus browser opening, and fly with an XMLHTTP GET request (XHR GET for short).
Wiki moment:
XHR is an API in the form of an object whose methods transfer data
between a web browser and a web server. The object is provided by the
browser's JavaScript environment
It's a fast method for retrieving data that doesn't require opening a browser. The server response can be read into an HTMLDocument and the process of grabbing the table continued from there.
Note that javascript rendered/dynamically added content will not be retrieved as there is no javascript engine running (which there is in a browser).
In the below code, the table is grabbed by its id cr1.
In the helper sub, WriteTable, we loop over the header cells (th tags), then the table rows (tr tags), and finally traverse each table row, table cell (td tag) by table cell. As we only want data from two of the columns, a Select Case statement is used to specify what is written out to the sheet (cells 2 and 8 of each row in the code below).
Sample webpage view:
Sample code output:
VBA:
Option Explicit
Public Sub GetRates()
Dim html As HTMLDocument, hTable As HTMLTable '<== Tools > References > Microsoft HTML Object Library
Set html = New HTMLDocument
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://uk.investing.com/rates-bonds/financial-futures", False
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT" 'to deal with potential caching
.send
html.body.innerHTML = .responseText
End With
Application.ScreenUpdating = False
Set hTable = html.getElementById("cr1")
WriteTable hTable, 1, ThisWorkbook.Worksheets("Sheet1")
Application.ScreenUpdating = True
End Sub
Public Sub WriteTable(ByVal hTable As HTMLTable, Optional ByVal startRow As Long = 1, Optional ByVal ws As Worksheet)
Dim tSection As Object, tRow As Object, tCell As Object, tr As Object, td As Object, r As Long, C As Long, tBody As Object
r = startRow: If ws Is Nothing Then Set ws = ActiveSheet
With ws
Dim headers As Object, header As Object, columnCounter As Long
Set headers = hTable.getElementsByTagName("th")
For Each header In headers
columnCounter = columnCounter + 1
Select Case columnCounter
Case 2
.Cells(startRow, 1) = header.innerText
Case 8
.Cells(startRow, 2) = header.innerText
End Select
Next header
startRow = startRow + 1
Set tBody = hTable.getElementsByTagName("tbody")
For Each tSection In tBody
Set tRow = tSection.getElementsByTagName("tr")
For Each tr In tRow
r = r + 1
Set tCell = tr.getElementsByTagName("td")
C = 1
For Each td In tCell
Select Case C
Case 2
.Cells(r, 1).Value = td.innerText
Case 8
.Cells(r, 2).Value = td.innerText
End Select
C = C + 1
Next td
Next tr
Next tSection
End With
End Sub
You can use the WinHttpRequest object instead of Internet Explorer. It retrieves only the page data, without the pictures and advertisements, whereas the Internet Explorer object downloads the full webpage including them, which makes it heavy compared to the WinHttpRequest object.
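A minimal sketch of that suggestion, assuming the page can be fetched with a plain GET. The sub name FetchWithWinHttp and the User-Agent value are illustrative choices, not something the site is known to require.

```vba
Option Explicit

' Sketch only: fetches the raw HTML with WinHttpRequest (late bound),
' so no images, ads or scripts are downloaded or run.
Public Sub FetchWithWinHttp()
    Dim req As Object
    Set req = CreateObject("WinHttp.WinHttpRequest.5.1")

    req.Open "GET", "https://uk.investing.com/rates-bonds/financial-futures", False
    req.setRequestHeader "User-Agent", "Mozilla/5.0" ' assumed; some sites reject requests without one
    req.send

    If req.Status = 200 Then
        Debug.Print Len(req.responseText) & " characters received"
    Else
        Debug.Print "Request failed: " & req.Status & " " & req.statusText
    End If
End Sub
```

The responseText can then be loaded into an HTMLDocument and parsed the same way as in the XMLHTTP answer above.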
This question was asked long ago, but I thought the following information would be useful for newbies. You can easily get the values by class name like this.
Sub ExtractLastValue()
Set objIE = CreateObject("InternetExplorer.Application")
objIE.Top = 0
objIE.Left = 0
objIE.Width = 800
objIE.Height = 600
objIE.Visible = True
objIE.Navigate ("https://uk.investing.com/rates-bonds/financial-futures/")
Do
DoEvents
Loop Until objIE.readystate = 4
MsgBox objIE.document.getElementsByClassName("pid-8907-last")(0).innerText
End Sub
And if you are new to web scraping please read this blog post.
Web Scraping - Basics
And also there are various techniques to extract data from web pages. This article explain few of them with examples.
Web Scraping - Collecting Data From a Webpage
I modified some things that were throwing errors for me and ended up with this, which worked great to extract the data I needed:
Sub get_data_web()
Dim appIE As Object
Set appIE = CreateObject("internetexplorer.application")
With appIE
.navigate "https://finance.yahoo.com/quote/NQ%3DF/futures?p=NQ%3DF"
.Visible = True
End With
Do While appIE.Busy
DoEvents
Loop
Set allRowofData = appIE.document.getElementsByClassName("Ta(end) BdT Bdc($c-fuji-grey-c) H(36px)")
Dim i As Long
Dim myValue As String
Count = 1
For Each itm In allRowofData
For i = 0 To 4
myValue = itm.Cells(i).innerText
ActiveSheet.Cells(Count, i + 1).Value = myValue
Next
Count = Count + 1
Next
appIE.Quit
Set appIE = Nothing
End Sub

VBA skipping code directly after submitting form in IE

Currently I have 2 pieces of code that work separately, but when used together they don't work properly.
The first code asks the user to input information which is stored. It then navigates to the correct webpage where it uses the stored user input information to navigate via filling and submitting a form. It arrives at the correct place.
The second code uses a specific URL via ie.navigate "insert url here" to navigate to the same place as the first code. It then scrapes URL data and stores it in a newly created sheet. It does this correctly.
When merging them I replace the navigation segment from the second code with the first code, but then it only stores the first 5 of 60 URLs, as if it hadn't fully loaded the page before scraping data. It seems to skip the code directly after ie.document.forms(0).submit, which is supposed to wait for the page to load before moving on to the scraping.
Extra info: the button wasn't defined, so I cannot just click it and had to use ie.document.forms(0).submit
Summary of what I want the code to do:
request user input
store user input
open ie
navigate to page
enter user input into search field
select correct search category from listbox
submit form
'problem happens here
scrape url data
store url data in specific excel worksheet
The merged code:
Sub extractTablesData()
Dim ie As Object, obj As Object
Dim Var_input As String
Dim elemCollection As Object
Dim html As HTMLDocument
Dim Link As Object
Dim erow As Long
' create new sheet to store info
Application.DisplayAlerts = False
ThisWorkbook.Sheets("HL").Delete
ThisWorkbook.Sheets.Add.Name = "HL"
Application.DisplayAlerts = True
Set ie = CreateObject("InternetExplorer.Application")
Var_input = InputBox("Enter info")
With ie
.Visible = True
.navigate ("URL to the webpage")
While ie.readyState <> 4
DoEvents
Wend
'Input Term 1 into input box
ie.document.getElementById("trm1").Value = Var_input
'accessing the Field 1 ListBox
For Each obj In ie.document.all.Item("FIELD1").Options
If obj.Value = "value in listbox" Then
obj.Selected = True
End If
Next obj
' button undefined - using this to submit form
ie.document.forms(0).submit
'----------------------------------------------------------------
'seems to skip this part all together when merged
'Wait until IE is done loading page
Do While ie.readyState <> READYSTATE_COMPLETE
Application.StatusBar = "Trying to go to website…"
DoEvents
Loop
'----------------------------------------------------------------
Set html = ie.document
Set ElementCol = html.getElementsByTagName("a")
For Each Link In ElementCol
erow = Worksheets("HL").Cells(Rows.Count, 1).End(xlUp).Offset(1, 0).Row
Cells(erow, 1).Value = Link
Cells(erow, 1).Columns.AutoFit
Next
Application.StatusBar = ""
Application.ScreenUpdating = True
End With
End Sub
I've been stuck for quite some time on this and haven't found any solutions on my own so I'm reaching out. Any help will be greatly appreciated!
You mentioned you think the website might not be fully loaded. This is a common problem because of the more dynamic elements on a webpage. The easiest way to handle this is to insert the line:
Application.Wait Now + Timevalue("00:00:02")
This will force the code to pause for an additional 2 seconds. Insert this line below the code which waits for the page to load and this will give Internet Explorer a chance to catch back up. Depending on the website and the reliability of your connection to it I recommend adjusting this value anywhere up to about 5 seconds.
Most websites seem to require additional waiting like this, so it's handy code to remember when things don't work as expected. Hope this helps.
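A fixed pause works but has to be tuned by hand. As an alternative sketch, the code can poll until the page actually contains the expected number of links, giving up after a timeout. The helper name WaitForLinks and its parameters are hypothetical, and the threshold of 60 in the usage note comes from the question.

```vba
Option Explicit

' Hypothetical helper: waits until IE is idle and the page holds at least
' minLinks anchor tags, or gives up after timeoutSeconds.
Public Function WaitForLinks(ByVal ie As Object, ByVal minLinks As Long, _
                             ByVal timeoutSeconds As Long) As Boolean
    Dim deadline As Date
    Dim linkCount As Long

    deadline = DateAdd("s", timeoutSeconds, Now)
    Do While Now < deadline
        DoEvents
        linkCount = 0
        On Error Resume Next   ' the document can be unavailable mid-navigation
        If Not ie.Busy And ie.readyState = 4 Then
            linkCount = ie.document.getElementsByTagName("a").Length
        End If
        On Error GoTo 0
        If linkCount >= minLinks Then
            WaitForLinks = True
            Exit Function
        End If
    Loop
End Function
```

Called right after ie.document.forms(0).submit, for example as If Not WaitForLinks(ie, 60, 15) Then MsgBox "Page did not finish loading", the scrape only starts once the results are present.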
I solved this by using a completely different method. I used a query table with strings to go where I wanted.
Sub ExtractTableData()
Dim This_input As String
Const prefix As String = "Beginning of url"
Const postfix As String = "end of url"
Dim qt As QueryTable
Dim ws As Worksheet
Application.DisplayAlerts = False
ThisWorkbook.Sheets("HL").Delete
ThisWorkbook.Sheets.Add.Name = "HL"
Application.DisplayAlerts = True
This_input = InputBox("enter key info to go to specific url")
Set ws = ActiveSheet
Set qt = ws.QueryTables.Add( _
Connection:="URL;" & prefix & This_input & postfix, _
Destination:=Worksheets("HL").Range("A1"))
qt.RefreshOnFileOpen = True
qt.WebSelectionType = xlSpecifiedTables
'qt.webtables is key to getting the specific table on the page
qt.WebTables = 2
qt.Refresh BackgroundQuery:=False
End Sub

Get data from listings on a website to excel VBA

I am trying to find a way to get the data from yelp.com
I have a spreadsheet on which there are several keywords and locations. I am looking to extract data from yelp listings based on these keywords and locations already in my spreadsheet.
I have created the following code, but it returns garbage rather than the exact information I am looking for.
I want to get the business name, address and phone number, but I am getting none of them. Could anyone here help me solve this problem?
Sub find()
Dim ie As Object
Set ie = CreateObject("InternetExplorer.Application")
With ie
ie.Visible = False
ie.Navigate "http://www.yelp.com/search?find_desc=boutique&find_loc=New+York%2C+NY&ns=1&ls=3387133dfc25cc99#start=10"
' Don't show window
ie.Visible = False
'Wait until IE is done loading page
Do While ie.Busy
Application.StatusBar = "Downloading information, please wait..."
DoEvents
Loop
' Make a string from IE content
Set mDoc = ie.Document
peopleData = mDoc.body.innerText
ActiveSheet.Cells(1, 1).Value = peopleData
End With
peopleData = "" 'Nothing
Set mDoc = Nothing
End Sub
If you right click in IE, and do View Source, it is apparent that the data served on the site is not part of the document's .Body.innerText property. I notice this is often the case with dynamically served data, and that approach is really too simple for most web-scraping.
I open it in Google Chrome and inspect the elements to get an idea of what I'm really looking for, and how to find it using a DOM/HTML parser; you will need to add a reference to Microsoft HTML Object Library.
I think you can get it to return a collection of the <DIV> tags, and then check those for the class name with an If statement inside the loop.
I made some revisions to my original answer, this should print each record in a new cell:
Option Explicit
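#If VBA7 Then
Private Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#Else
Private Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#End If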
Sub find()
'Uses late binding, or add reference to Microsoft HTML Object Library
' and change variable Types to use intellisense
Dim ie As Object 'InternetExplorer.Application
Dim html As Object 'HTMLDocument
Dim Listings As Object 'IHTMLElementCollection
Dim l As Object 'IHTMLElement
Dim r As Long
Set ie = CreateObject("InternetExplorer.Application")
With ie
.Visible = False
.Navigate "http://www.yelp.com/search?find_desc=boutique&find_loc=New+York%2C+NY&ns=1&ls=3387133dfc25cc99#start=10"
' Don't show window
'Wait until IE is done loading page
Do While .readyState <> 4
Application.StatusBar = "Downloading information, Please wait..."
DoEvents
Sleep 200
Loop
Set html = .Document
End With
Set Listings = html.getElementsByTagName("LI") ' ## returns the list
For Each l In Listings
'## make sure this list item looks like the listings Div Class:
' then, build the string to put in your cell
If InStr(1, l.innerHTML, "media-block clearfix media-block-large main-attributes") > 0 Then
Range("A1").Offset(r, 0).Value = l.innerText
r = r + 1
End If
Next
Set html = Nothing
Set ie = Nothing
End Sub