Query Web Table on Current Website - vba

I am running into a bit of a problem. Normally when I pull a table I use the "Data from Web" tool in Excel; however, I now have quite a few places I need to pull data from that first require me to enter a username and password. I figured out some code for that (though probably not the most elegant) but realized that once I get to my desired page I have no idea how to extract the table. Here is what I have so far.
Sub Login()
Sheets("IOL").Select
Set ie = CreateObject("InternetExplorer.application")
ie.Visible = True
ie.Navigate ("https://internalsite.company.com/secure/login" & ActiveCell)
Do
If ie.ReadyState = 4 Then
ie.Visible = True
Exit Do
Else
DoEvents
End If
Loop
ie.Document.forms(0).all("badgeBarcodeId").Value = "00000"
ie.Document.forms(0).submit
'used because it redirects to a new page after submitting and I couldn't figure out how to make it wait for the new page to load before proceeding.
Application.Wait (Now + TimeValue("0:00:02"))
ie.Document.forms(0).all("password").Value = "00000"
ie.Document.forms(0).submit
End Sub
After the login is accomplished I would like to go to http://internalsite.company.com/csv and import the csv directly into a sheet. Anytime I make a new connection it makes me log in again so I figure there has to be a way to extract the file without adding a new connection. I'm pretty new with more complex VBA so bear with me.

I was able to get this code to do the job, but I would prefer to get the CSV directly instead of the table. Sometimes the table doesn't like to load.
Sub Login()
Dim clip As DataObject
Dim ieTable As Object
Set ie = CreateObject("InternetExplorer.application")
ie.Visible = True
ie.Navigate ("https://internalsite1.company.com/secure/login" & ActiveCell)
Do
If ie.ReadyState = 4 Then
ie.Visible = True
Exit Do
Else
DoEvents
End If
Loop
ie.Document.forms(0).all("badgeBarcodeId").Value = "00000"
ie.Document.forms(0).submit
Do While ie.Busy: DoEvents: Loop
Do Until ie.ReadyState = 4: DoEvents: Loop
ie.Document.forms(0).all("password").Value = "000000"
ie.Document.forms(0).submit
Do While ie.Busy: DoEvents: Loop
Do Until ie.ReadyState = 4: DoEvents: Loop
ie.Navigate "http://internalsite2.company.com/site/Inbound?filter=1To3Days"
Do While ie.Busy: DoEvents: Loop
Do Until ie.ReadyState = 4: DoEvents: Loop
Set ieTable = ie.Document.all.Item("DataTables_Table_0")
If Not ieTable Is Nothing Then
Set clip = New DataObject
clip.SetText "" & ieTable.outerHTML & ""
clip.PutInClipboard
Workbooks("Production Meeting Dashboard.xlsm").Activate
Sheets("IOL").Select
Range("A1").Select
ActiveSheet.PasteSpecial Format:="Unicode Text", link:=False, _
DisplayAsIcon:=False, NoHTMLFormatting:=True
End If
End Sub

Related

Crawler & Scraper using excel vba

I am trying to crawl an intranet URL so that Excel automatically selects one of the options from a dropdown menu, enters a value in a text box, and then clicks Find to get redirected to another page, where I want to copy a value to another worksheet in the same workbook. I have created the code below, but it is not working; it fails with "Object required". :(
Sub Test()
Dim rng As Range
Set rng = Sheets("sheet1").Range("A1", Sheets("sheet1").Cells.Range("A1").End(xlDown))
Set ie = CreateObject("InternetExplorer.application")
ie.Visible = True
ie.Navigate ("https://gcd.ad.plc.cwintra.com/GCD_live/login/login.asp")
Do
If ie.ReadyState = 4 Then
ie.Visible = False
Exit Do
Else
DoEvents
End If
Loop
ie.Document.forms(0).all("txtUsername").Value = ""
ie.Document.forms(0).all("txtPassword").Value = ""
ie.Document.forms(0).submit
ie.Visible = True
Appliction.Wait (Now + TimeValue("00:00:02"))
DoEvents
For Each cell In rng
ie.Navigate ("https://gcd.ad.plc.cwintra.com/GCD_live/search.asp")
DoEvents
ie.Document.getElementById("cboFieldName").selectedIndex = 6
ie.Document.getElementById("txtFieldValue").Select
SendKeys (cell.Value)
DoEvents
ie.Document.getElementById("cmdFind").Click
Next cell
End Sub

VBA Internet Explorer Application gives different results for each function call

I'm trying to automate a task in Excel that requires opening a webpage, navigating to a link on that page, and then clicking on a button on the second page to download an .xlsx file.
I've written a script that should do this. However, the response I get from the webpage is not always the same. In particular, sometimes it returns a download from the first page, and sometimes it navigates to the second page and does not download anything. Once or twice it has done both.
My sense is that this has to do with how long it takes for InternetExplorer.application to complete a request. I can't figure out how to troubleshoot this though, given that I tell the script to wait for IE.application to complete its request.
Sub DoBrowse2()
'For Each lnk In Sheets("Sheet4").Hyperlinks
'Range(lnk).Hy.Follow
'Next
Dim i As Long
Dim URL As String
Dim BaseURL As String
Dim ToURL As String
Dim IE As Object
Dim objElement As Object
Dim objCollection As Object
Dim HWNDSrc As Long
Dim html As IHTMLDocument
Set IE = CreateObject("InternetExplorer.Application")
URL = Range("B2").Hyperlinks(1).Address
IE.Navigate URL
IE.Visible = True
Application.StatusBar = URL & " is loading. Please wait..."
Do While IE.ReadyState = 4: DoEvents: Loop
Do Until IE.ReadyState = 4: DoEvents: Loop
Application.StatusBar = URL & " Loaded"
'Set html = IE.Document
'Dim elements As IHTMLElementCollection
'Set elements = html.all
For Each itm In IE.Document.all
If itm.className = "datagrid" Then
For Each el In itm.Document.all
Debug.Print "hello"
If el.className = "ujump" And Right(el.innerText, 12) = "Constituents" Then
'Debug.Print el.innerText
ToURL = el.getAttribute("data-subset")
BaseURL = "http://datastream.thomsonreuters.com/navigator/search.aspx?dsid=ZUCH002&AppGroup=DSAddin&host=Metadata&prev=scmTELCMBR&s=D&subset="
ToURL = BaseURL & ToURL
'Debug.Print ToURL
IE.Navigate ToURL
IE.Visible = True
Do While IE.Busy
Debug.Print "in busy loop"
Application.Wait DateAdd("s", 1, Now)
Loop
GoTo end_of_for
End If
Next
End If
Next
end_of_for:
Debug.Print ("STOP STOP STOP STOP STOP")
Dim Script As String
For Each itm In IE.Document.all
If itm.className = "lgc excel" Then
Debug.Print "hello world"
Debug.Print itm.getAttribute("onclick")
itm.Click
Do While IE.Busy
Debug.Print "app busy"
Application.Wait DateAdd("s", 1, Now)
Loop
Exit For
End If
Next
End Sub
Thanks in advance for your help.
Use this to determine whether the IE page has fully loaded; both of these conditions must always be checked:
Do Until ie.ReadyState = 4 And ie.Busy = False
DoEvents
Loop
Even with the code above, if there are scripts on the page, some content may still load after the ie.ReadyState = 4 And ie.Busy = False condition is met. The easy but inefficient and unreliable fix is Application.Wait. Alternatively, you can look for elements on the page that indicate its loading state and determine that state from their visibility attributes, etc.
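A minimal sketch of both ideas, with a timeout so the loops cannot spin forever. The helper names WaitForPage and WaitForElement and their parameters are hypothetical, not part of the original answer; the element id you poll for depends entirely on the page in question.

```vba
Option Explicit

' Hypothetical helper: wait until IE reports the page as loaded, or give up
' after timeoutSeconds. Returns True if the page finished loading in time.
Function WaitForPage(ByVal ie As Object, ByVal timeoutSeconds As Double) As Boolean
    Dim deadline As Date
    deadline = Now + timeoutSeconds / 86400#   ' 86400 seconds per day
    Do Until ie.ReadyState = 4 And Not ie.Busy
        If Now > deadline Then Exit Function   ' returns False on timeout
        DoEvents
    Loop
    WaitForPage = True
End Function

' Hypothetical helper: poll for an element that only appears once the page
' scripts have finished. Returns Nothing if it never shows up in time.
Function WaitForElement(ByVal ie As Object, ByVal elementId As String, _
                        ByVal timeoutSeconds As Double) As Object
    Dim deadline As Date
    Dim el As Object
    deadline = Now + timeoutSeconds / 86400#
    Do
        Set el = Nothing
        On Error Resume Next      ' Document may be unavailable mid-navigation
        Set el = ie.Document.getElementById(elementId)
        On Error GoTo 0
        If Not el Is Nothing Then Exit Do
        If Now > deadline Then Exit Do
        DoEvents
    Loop
    Set WaitForElement = el
End Function
```

A call such as If WaitForPage(ie, 30) Then ... replaces the fixed Application.Wait, and the caller decides what to do when the function reports a timeout.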
Part of your code is wrong and causes an endless loop:
Do While IE.ReadyState = 4: DoEvents: Loop
Do Until IE.ReadyState = 4: DoEvents: Loop
The first line keeps calling DoEvents for as long as the ready state is complete, and the second keeps calling it until the ready state is complete again.
Narrow down a collection of all elements:
For Each itm In IE.Document.all
to a specific collection for better performance when possible, for example:
For Each itm In IE.Document.GetElementsByTagName("div")
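As a rough sketch of that narrowing, the function below searches only elements of one tag for an exact class match. FindByClass is a hypothetical helper name, and which tag carries a given class (for example "lgc excel") is an assumption you would need to check against the actual page.

```vba
' Hypothetical helper: return the first element with the given tag whose
' class attribute matches className exactly, or Nothing if none is found.
Function FindByClass(ByVal doc As Object, ByVal tagName As String, _
                     ByVal className As String) As Object
    Dim el As Object
    For Each el In doc.getElementsByTagName(tagName)
        If el.className = className Then
            Set FindByClass = el
            Exit Function
        End If
    Next el
    ' Falls through with the return value still Nothing
End Function
```

The caller can then test the result with Is Nothing before calling Click, instead of looping over every element in the document.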

Reference Cell in VBA Script

I was curious if there is a way to reference a specific cell within the following VBA script. The cell contains a date in yyyymmdd format, which would go in the commented section below.
Sub OpenData()
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = False
IE.navigate "http://website.com/'desired cell value'/subdirectory/file.txt"
'Check for good connection to web page loop!
Do
If IE.readyState = 4 Then
IE.Visible = True
Exit Do
Else
DoEvents
End If
Loop
'Wait for window to open!
Application.wait (Now + TimeValue("0:00:02"))
'MsgBox "Done"
IE.Visible = True
Eric
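Have you tried Format? If the cell holds a real date value, Format can render it as yyyymmdd. If it already holds the text in that form, concatenate Cells(1, 1).Value directly.
'...
IE.navigate "http://website.com/" & Format(Cells(1, 1).Value, "yyyymmdd") & "/subdirectory/file.txt"
'...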
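A small sketch of building the URL from a cell value, assuming the question's yyyymmdd requirement. BuildUrl is a hypothetical helper name; it formats true date values and passes anything else through as text, since a cell that already contains 20240131 as text or a number should not be reformatted.

```vba
' Hypothetical helper: build the file URL from a cell value.
' True date values are rendered as yyyymmdd; other values are used as-is.
Function BuildUrl(ByVal cellValue As Variant) As String
    Dim datePart As String
    If VarType(cellValue) = vbDate Then
        datePart = Format(cellValue, "yyyymmdd")
    Else
        datePart = CStr(cellValue)
    End If
    BuildUrl = "http://website.com/" & datePart & "/subdirectory/file.txt"
End Function
```

Usage would look like IE.navigate BuildUrl(Cells(1, 1).Value), where cell A1 is only an example location.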

Internet Explorer 11 automation error

I am having a problem when using VBA from Excel to navigate across multiple pages. I have used the code on IE8 and IE9 without any problems, but in IE11, when I navigate to another URL, ie.locationurl and ie.document still refer to the previous URL. I noticed this only happens with PHP websites. I am interested in finding a way to solve this. Any help appreciated.
Here is some example code with a php website:
Sub test()
Dim ie As SHDocVw.InternetExplorer
Set ie = New SHDocVw.InternetExplorer
With ie
.navigate "https://www.facebook.com/" 'php website
.Visible = True
Do While ie.Busy
Loop
Do While ie.readyState <> 4
Loop
.navigate "https://en.wikipedia.org/wiki/Main_Page"
.Visible = True
Debug.Print ie.locationurl
Do While ie.Busy
Loop
Do While ie.readyState <> 4
Loop
End With
End Sub
And here is one that works but doesn't have a php website:
Sub test2()
Dim ie As SHDocVw.InternetExplorer
Set ie = New SHDocVw.InternetExplorer
With ie
.navigate "http://www.break.com/" 'non php website
.Visible = True
Do While ie.Busy
Loop
Do While ie.readyState <> 4
Loop
.navigate "https://en.wikipedia.org/wiki/Main_Page"
.Visible = True
Debug.Print ie.locationurl
Do While ie.Busy
Loop
Do While ie.readyState <> 4
Loop
End With
End Sub
You are requesting the LocationURL property before you've allowed the page to load.
Sub test()
Dim ie As SHDocVw.InternetExplorer
Set ie = New SHDocVw.InternetExplorer
With ie
.Visible = True
.navigate "https://www.facebook.com/" 'php website
Debug.Print "1." & .locationurl
Do While .Busy Or .readyState <> 4: DoEvents: Loop
Debug.Print "2." & ie.locationurl
.navigate "https://en.wikipedia.org/wiki/Main_Page"
Debug.Print "3." & .locationurl
Do While .Busy Or .readyState <> 4: DoEvents: Loop
Debug.Print "4." & .locationurl
End With
End Sub
Results from the VBE's Immediate window:
test
1.
2.https://www.facebook.com/
3.https://www.facebook.com/
4.https://en.wikipedia.org/wiki/Main_Page
I've paired up the ie.Busy and ie.readyState checks and thrown DoEvents into the loop to allow some processing to happen while the page loads. You were not taking full advantage of the parent ie object references in the With ... End With statement, so I chopped off a few unnecessary ie references within the block.

I can not get Web site text using VBA IE

I used the VBA below and monitored the variable IE.document. When I navigate to "http://www.mixi.jp", I can get all of the page text using IE.document.all.
But on other sites, such as "http://www.yahoo.co.jp", I cannot get the page text. Why is that?
Sub Main()
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
IE.navigate "http://www.yahoo.co.jp"
Do While IE.Busy Or IE.readyState < 4
DoEvents
Loop
End Sub
Because for this particular website, IE.document is Nothing.
This seems to be the case with dynamically served sites. The code below uses one suggested approach, which is to query elements from the document body.
Although I am not sure you will be able to easily "get all of the text", you can certainly still iterate over the elements you're interested in extracting:
Sub Main()
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
ie.navigate "http://www.yahoo.co.jp"
Do While ie.Busy Or ie.readyState < 4
DoEvents
Loop
'Create a collection of DIV elements for example
Dim myElements
Dim ele
Set myElements = ie.Document.Body.GetElementsByTagName("DIV")
'Then you can iterate over the DIV elements/etc., modify as needed.
For Each ele In myElements
Debug.Print ele.InnerText
Next
End Sub
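Since the document can come back as Nothing, it may be worth guarding the access before iterating. The following is only a sketch; GetBodyText is a hypothetical helper name, and it simply returns an empty string when no document or body is available.

```vba
' Hypothetical helper: return the visible text of the page body, or an empty
' string when IE exposes no document (or no body) for the current page.
Function GetBodyText(ByVal ie As Object) As String
    Dim doc As Object
    On Error Resume Next          ' reading Document can fail mid-navigation
    Set doc = ie.Document
    On Error GoTo 0
    If doc Is Nothing Then Exit Function
    If doc.body Is Nothing Then Exit Function
    GetBodyText = doc.body.innerText
End Function
```

Calling Debug.Print GetBodyText(ie) after the wait loop makes the empty-document case visible without raising a run-time error.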