Parsing a website with VBA following various links - vba

I am trying to download data from a website that is within a table.
The actual downloading works fine as long as I know the direct link to the page. One page, though, has its data spread over several sub-pages, so I first load the links on page 1 and then follow each of them.
The trouble is that once I load a sub-page, the object variable holding the initial links is lost. How can I preserve it?
My code so far looks like this:
...
more code here
...
ieApp.Navigate "http://www.website.com/blabla"
Do While ieApp.Busy: Sleep 500: DoEvents: Loop
Do Until ieApp.readyState = READYSTATE_COMPLETE: DoEvents: Loop
Set rcl = ieApp.Document.getElementsbyClassName("col-md-3")
For Each ahref In rcl(0).getElementsByTagName("a")
ieApp.Navigate ahref.href
Do While ieApp.Busy: Sleep 500: DoEvents: Loop
Do Until ieApp.readyState = READYSTATE_COMPLETE: DoEvents: Loop
' Now get the data
Call subSaveRecords
Next
...
continue more
...
Basically, after I call ieApp.Navigate inside the For Each, the rcl/ahref objects are lost because ieApp now holds a new page. How can I "preserve" the object values while still moving on to new pages?
Thanks for your help.

Can't you simply create a second browser object to open the sub-links, and pass that object to the subroutine that gets the data?
Set ie2 = CreateObject("InternetExplorer.Application")
ieApp.Navigate "http://www.website.com/blabla"
Do While ieApp.Busy: Sleep 500: DoEvents: Loop
Do Until ieApp.readyState = READYSTATE_COMPLETE: DoEvents: Loop
Set rcl = ieApp.Document.getElementsbyClassName("col-md-3")
For Each ahref In rcl(0).getElementsByTagName("a")
ie2.Navigate ahref.href
Do While ie2.Busy: Sleep 500: DoEvents: Loop
Do Until ie2.readyState = READYSTATE_COMPLETE: DoEvents: Loop
'Now get the data
Call subSaveRecords(ie2)
Next
Or you can add the links to a collection or an array before changing page.
Set linkCollection = New Collection
ieApp.Navigate "http://www.website.com/blabla"
Do While ieApp.Busy: Sleep 500: DoEvents: Loop
Do Until ieApp.readyState = READYSTATE_COMPLETE: DoEvents: Loop
Set rcl = ieApp.Document.getElementsbyClassName("col-md-3")
For Each ahref in rcl(0).getElementsByTagName("a")
linkCollection.add ahref.href
Next ahref
For Each ahref In linkCollection
ieApp.Navigate ahref
Do While ieApp.Busy: Sleep 500: DoEvents: Loop
Do Until ieApp.readyState = READYSTATE_COMPLETE: DoEvents: Loop
'Now get the data
Call subSaveRecords
Next
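The second approach works because a plain string does not depend on the page it came from, whereas the anchor elements belong to a document that is gone after the next Navigate. A minimal sketch of that copy step as a reusable function follows; the name CollectHrefs and its parameter are hypothetical, not part of the original code:

```vba
' Hypothetical helper: copies the href of every <a> inside "container"
' into a Collection of plain strings, so the URLs remain usable after
' the browser has navigated away from the page.
Function CollectHrefs(ByVal container As Object) As Collection
    Dim result As New Collection
    Dim anchor As Object

    For Each anchor In container.getElementsByTagName("a")
        ' CStr forces a string copy instead of keeping the DOM element
        result.Add CStr(anchor.href)
    Next anchor

    Set CollectHrefs = result
End Function
```

It could be called as Set linkCollection = CollectHrefs(rcl(0)) right after the first page has loaded, followed by the second For Each loop shown above.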

Related

VBA Web Scraping: macro stops when I run it but not in debugging mode

I have tried a few things, such as waiting 1 to 10 seconds before the line where it stops, Html.querySelector("a[href='#tab-import']").Click, or adding a loop that waits until the page has loaded, but I don't know why it only works in debugging mode.
HTML source:
Import
The error is 91: Object variable not set.
My code, with credentials redacted:
Sub Direnet()
Dim ieApp As InternetExplorer
'create a new instance of ie
Set ieApp = New InternetExplorer
'you don’t need this, but it’s good for debugging
ieApp.Visible = True
'go to the page we want
ieApp.Navigate "http://direnetdemos.com/casagarza-op/admin/index.php?route=tool/export_import&token=9reP7LOHg0SMChCYFBbcLPoQSjiQ72W1"
Do While ieApp.Busy: DoEvents: Loop
Do Until ieApp.ReadyState = READYSTATE_COMPLETE: DoEvents: Loop
With ieApp
.document.all.Item("input-username").Value = "xxxx"
.document.all.Item("input-password").Value = "xxxx"
.document.forms(0).submit
Do While ieApp.Busy: DoEvents: Loop
Do Until ieApp.ReadyState = READYSTATE_COMPLETE: DoEvents: Loop
'Click the Import tab
Dim Html As HTMLDocument
Set Html = ieApp.document
Application.Wait (Now + TimeValue("0:00:02"))
Html.querySelector("a[href='#tab-import']").Click
Html.querySelectorAll("input[name='incremental']")(1).Click
Html.querySelector("input[id='upload']").Click
Html.querySelector("a[onclick='uploadData();']").Click
End With
End Sub
Every time you click something you should wait for the browser to be "ready" again:
Html.querySelector("a[href='#tab-import']").Click
Do
DoEvents
Loop Until Not ieApp.Busy And ieApp.ReadyState = READYSTATE_COMPLETE
Html.querySelectorAll("input[name='incremental']")(1).Click
Do
DoEvents
Loop Until Not ieApp.Busy And ieApp.ReadyState = READYSTATE_COMPLETE
Html.querySelector("input[id='upload']").Click
Do
DoEvents
Loop Until Not ieApp.Busy And ieApp.ReadyState = READYSTATE_COMPLETE
Html.querySelector("a[onclick='uploadData();']").Click
Do
DoEvents
Loop Until Not ieApp.Busy And ieApp.ReadyState = READYSTATE_COMPLETE
Also, don't use Application.Wait for this kind of thing: it just freezes code execution for a given time. It doesn't allow events to be processed and has no link to the browser's ready state.
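Error 91 on the Click line means querySelector returned Nothing, that is, the element was not yet in the document when the line ran. As a complement to the ready-state loops, here is a sketch that polls for the element itself with a timeout. The function name WaitForSelector and the 10-second default are assumptions for illustration:

```vba
' Hypothetical helper: polls the document until an element matching
' cssSelector exists, or until timeoutSeconds have elapsed.
' Returns Nothing on timeout, so the caller has to check the result.
Function WaitForSelector(ByVal doc As Object, ByVal cssSelector As String, _
                         Optional ByVal timeoutSeconds As Long = 10) As Object
    Dim deadline As Date
    Dim found As Object

    deadline = DateAdd("s", timeoutSeconds, Now)
    Do
        Set found = doc.querySelector(cssSelector)
        If Not found Is Nothing Then Exit Do
        DoEvents    ' keep Excel and the browser responsive while waiting
    Loop While Now < deadline

    Set WaitForSelector = found
End Function
```

A call might look like Set tabLink = WaitForSelector(Html, "a[href='#tab-import']"), followed by If Not tabLink Is Nothing Then tabLink.Click, so a missing element is handled explicitly instead of raising error 91.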
Well, it finally worked after moving the line "IE.Visible = True" a few lines further down. Thank you all very much.

VBA code how to select a dropdown list

I need help. The code below logs in, moves to the right page, selects the table data and copies it. The issue is that the page has a drop-down list (HTML shown below): how do I get the code to select "All", and how do I write that? I am willing to pay to get this answer fast.
The data from web page for the drop down is here:
"<select name="flightrisk_tbl_length" aria-controls="flightrisk_tbl" class="form-control input-sm"><option value="5">5</option><option value="10">10</option><option value="25">25</option><option value="-1">All</option></select>"
Sub GetTable()
Dim ieApp As InternetExplorer
Dim ieDoc As Object
Dim ieTable As Object
Dim clip As DataObject
'create a new instance of ie
Set ieApp = New InternetExplorer
'you don’t need this, but it’s good for debugging
ieApp.Visible = True
'assume we’re not logged in and just go directly to the login page
ieApp.Navigate "xxxx"
Do While ieApp.Busy: DoEvents: Loop
Do Until ieApp.ReadyState = READYSTATE_COMPLETE: DoEvents: Loop
Set ieDoc = ieApp.Document
'fill in the login form – View Source from your browser to get the control names
With ieDoc.forms(0)
.UserName.Value = "xxxx"
.Password.Value = "xxxx"
.submit
End With
Do While ieApp.Busy: DoEvents: Loop
Do Until ieApp.ReadyState = READYSTATE_COMPLETE: DoEvents: Loop
'now that we’re in, go to the page we want
ieApp.Navigate "xxxx"
Do While ieApp.Busy: DoEvents: Loop
Do Until ieApp.ReadyState = READYSTATE_COMPLETE: DoEvents: Loop
'get the table based on the table’s id
Set ieDoc = ieApp.Document
Set ieTable = ieDoc.All.Item("flightrisk_tbl")
'copy the tables html to the clipboard and paste to the sheet
If Not ieTable Is Nothing Then
Set clip = New DataObject
clip.SetText "<html>" & ieTable.outerHTML & "</html>"
clip.PutInClipboard
Sheet12.Select
Sheet12.Range("A1").Select
Sheet12.PasteSpecial "Unicode Text"
End If
'close 'er up
ieApp.Quit
Set ieApp = Nothing
End Sub
You can use the getElementsByName method of the document to get the dropdown object:
Dim oDropDown As Object
Set oDropDown = ieApp.Document.getElementsByName("flightrisk_tbl_length")(0) 'Use index zero, because this returns a collection of all elements with the same name.
Afterwards, you can read its innerHTML, or call getElementsByTagName("option") on it, to get everything from the option elements.
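To actually select "All", one option is to set the value of the select element to "-1", which is the value of the "All" option in the HTML posted in the question. The sketch below also raises a change event, on the assumption that the page only redraws the table when its script sees one. The procedure name is hypothetical, and createEvent/dispatchEvent require the page to run in an IE9 or later document mode:

```vba
' Hypothetical helper: selects the "All" entry of the page-length drop-down.
' Assumes ieDoc is the loaded document (e.g. ieApp.Document).
Sub SelectAllRows(ByVal ieDoc As Object)
    Dim oDropDown As Object
    Dim evt As Object

    Set oDropDown = ieDoc.getElementsByName("flightrisk_tbl_length")(0)
    If oDropDown Is Nothing Then Exit Sub   ' drop-down not found on this page

    ' "-1" is the value attribute of <option value="-1">All</option>
    oDropDown.Value = "-1"

    ' Notify the page's script that the selection changed
    Set evt = ieDoc.createEvent("HTMLEvents")
    evt.initEvent "change", True, False
    oDropDown.dispatchEvent evt
End Sub
```

In GetTable this would be called after the second navigation has finished and before the table is read, for example SelectAllRows ieApp.Document. The table may need a moment to redraw before it is copied.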

VBA to click a webpage button

I would like to click a webpage button using VBA. I tried several options (please see the code below), but none of them seems to work. If I use focus instead of click, it selects the button that I want to push; the click command, however, does nothing. What other methods can I try?
Sub SubmitInfo()
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
IE.navigate "http://www.ecotransit.org/calculation.en.html"
Do While IE.readyState <> 4: DoEvents: Loop
Application.Wait (Now + TimeValue("0:00:02"))
' 1st option
IE.document.getElementById("calculationBT").Click
' 2nd option
IE.document.getElementsByClassName("formGreenButton")(0).Click
' 3rd option
Set htmlColl = IE.document.getElementsByTagName("input")
Do While IE.document.readyState <> "complete": DoEvents: Loop
For Each htmlInput In htmlColl
If Trim(htmlInput.Type) = "submit" Then
htmlInput.Click
Exit For
End If
Next htmlInput
' 4th option
Set htmlColl = IE.document.getElementsByTagName("input")
Do While IE.document.readyState <> "complete": DoEvents: Loop
For Each htmlInput In htmlColl
If Trim(htmlInput.Value) = "CALCULATE" Then
htmlInput.Click
Exit For
End If
Next htmlInput
End Sub
Below is the button HTML code that I want to click
<input id="calculationBT" type="submit" class="formGreenButton" value="CALCULATE" />
IE.document.getElementById("calculationBT").Click
should do the job, but you must be sure that the id occurs only once on the page, which should be the case here.

Obtain web data after logging into website

I'm trying to get data from a website that requires logging in with a username and password. I've followed this tutorial and managed to log into the website, but for some reason it's not getting the table.
Here's the code:
Sub GetTable()
Dim ieApp As InternetExplorer
Dim ieDoc As Object
Dim ieTable As Object
Dim clip As DataObject
'create a new instance of ie
Set ieApp = New InternetExplorer
'you don’t need this, but it’s good for debugging
ieApp.Visible = True
'assume we’re not logged in and just go directly to the login page
ieApp.Navigate "https://accounts.google.com/ServiceLogin"
Do While ieApp.Busy: DoEvents: Loop
Do Until ieApp.ReadyState = READYSTATE_COMPLETE: DoEvents: Loop
Set ieDoc = ieApp.Document
'fill in the login form –
With ieDoc.forms(0)
.Email.Value = "email#email.com"
.Passwd.Value = "password"
.submit
End With
Do While ieApp.Busy: DoEvents: Loop
Do Until ieApp.ReadyState = READYSTATE_COMPLETE: DoEvents: Loop
'now that we’re in, go to the page we want
ieApp.Navigate "my-website.com"
Do While ieApp.Busy: DoEvents: Loop
Do Until ieApp.ReadyState = READYSTATE_COMPLETE: DoEvents: Loop
'get the table based on the table's id
Set ieDoc = ieApp.Document
For i = 0 To (ieDoc.all.Length - 1)
'Only look at tables
If TypeName(ieDoc.all(i)) = "HTMLTable" Then
Set ieTable = ieDoc.all(i)
'I want to check the 3rd row (.Rows(2)) and will get an error if there
'are less than three rows.
If ieTable.Rows.Length > 2 Then
'Here’s the text in the first cell of the third row that tells me
'I have the right table
If ieTable.Rows(0).Cells(0).innertext = "Text" Then
'copy the table's html to the clipboard and paste to the sheet
If Not ieTable Is Nothing Then
Set clip = New DataObject
clip.SetText "<html>" & ieTable.outerHTML & "</html>"
clip.PutInClipboard
Sheet1.Select
Sheet1.Range("A1").Select
Sheet1.PasteSpecial "Unicode Text"
End If
End If
End If
End If
Next i
'close 'er up
ieApp.Quit
Set ieApp = Nothing
End Sub
Assuming the tables are correctly marked up with the table tag, you could use the following to get the collection of tables, which you can then loop through:
ieDoc.getElementsByTagName("table")
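As an illustration of looping through that collection, here is a sketch that returns the first table whose first cell contains a given marker text, mirroring the check in the question's code. The function name and the marker parameter are hypothetical:

```vba
' Hypothetical helper: returns the first <table> whose top-left cell text
' equals "marker", or Nothing when no such table exists.
Function FindTableByFirstCell(ByVal ieDoc As Object, ByVal marker As String) As Object
    Dim tbl As Object

    For Each tbl In ieDoc.getElementsByTagName("table")
        ' Guard against empty tables before indexing into rows and cells
        If tbl.Rows.Length > 0 Then
            If tbl.Rows(0).Cells.Length > 0 Then
                If Trim$(tbl.Rows(0).Cells(0).innerText) = marker Then
                    Set FindTableByFirstCell = tbl
                    Exit Function
                End If
            End If
        End If
    Next tbl
End Function
```

With this, the loop over ieDoc.all in the question could be reduced to Set ieTable = FindTableByFirstCell(ieDoc, "Text"), followed by the existing If Not ieTable Is Nothing check.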

VBA Click Link in IE Ajax PopUp

I'm trying to automate the process of downloading a CSV from a web site.
I've managed to write the code to:
Log in to the site
Navigate to the correct page
Click the link to make the download accessible.
From there, my problem begins. Upon clicking the link, the site does a series of Ajax calls and displays a div with a link to download the file that has a unique name each time. I can get the box to pop up, but I cannot get the VBA to click the link after it becomes available. Can anybody help with getting VBA to click the link in the box that is displayed after the ajax call?
Sub GetTableData()
Dim ieApp As InternetExplorer
Dim ieDoc As Object
Set ieApp = New InternetExplorer
ieApp.Visible = True
ieApp.Navigate "https://www.thesite.com/Login.aspx"
Do While ieApp.Busy: DoEvents: Loop
Do Until ieApp.ReadyState = READYSTATE_COMPLETE: DoEvents: Loop
Set ieDoc = ieApp.Document
On Error GoTo downloadPage
testPage = ieDoc.getElementById("file_box_link_id")
With ieDoc
.getElementById("EMail").Value = "me#web.com"
.getElementById("Password").Value = "pass"
.getElementById("Submit").Click
End With
Do While ieApp.Busy: DoEvents: Loop
Do Until ieApp.ReadyState = READYSTATE_COMPLETE: DoEvents: Loop
ieApp.Navigate "https://www.thesite.com/Menu.aspx"
Do While ieApp.Busy: DoEvents: Loop
Do Until ieApp.ReadyState = READYSTATE_COMPLETE: DoEvents: Loop
downloadPage:
ieDoc.getElementById("download_link").Click
Do While ieApp.Busy: DoEvents: Loop
Do Until ieApp.ReadyState = READYSTATE_COMPLETE: DoEvents: Loop
' THIS IS WHERE THE PROBLEM COMES...THIS HTML IS ADDED AFTER THE AJAX CALL
dlLink = ieDoc.getElementById("download_link")
Debug.Print dlLink.href
ieApp.Quit
Set ieApp = Nothing
End Sub
I'm not sure if you can get any data from the pop-up, but if you can, try to use getElementsByTagName method and the outerText property.
For Each Link In ieDoc.getElementsByTagName("a")
If Link.outerText = "The link text" Then
Link.Click
Exit For
End If
Next
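Because the link is added by an Ajax call, it may not exist yet when the loop above runs once, and ieApp.Busy does not necessarily reflect pending Ajax requests. A sketch that repeats the search until the link appears or a timeout expires follows; the function name and the 15-second default are assumptions:

```vba
' Hypothetical helper: repeatedly scans all <a> elements for the given
' link text until one is found or timeoutSeconds have elapsed.
' Returns Nothing on timeout.
Function WaitForLinkByText(ByVal ieDoc As Object, ByVal linkText As String, _
                           Optional ByVal timeoutSeconds As Long = 15) As Object
    Dim deadline As Date
    Dim lnk As Object

    deadline = DateAdd("s", timeoutSeconds, Now)
    Do
        For Each lnk In ieDoc.getElementsByTagName("a")
            If Trim$(lnk.outerText) = linkText Then
                Set WaitForLinkByText = lnk
                Exit Function
            End If
        Next lnk
        DoEvents    ' let the browser finish the Ajax call and update the DOM
    Loop While Now < deadline
End Function
```

The caller would then test the result, for example Set dl = WaitForLinkByText(ieDoc, "The link text"), and call dl.Click only if dl is not Nothing.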
You can find more information here
Hope it helps!
Best regards,
Luiz Fernando