Web-scraping in Excel with variable url (url extension) - vba

I am fairly new to VBA in Excel. I have been trying to find out how to scrape web data conditionally, based on the value of one cell ("Guid"), but I have not found a way to make the routine dynamic. Right now I can only retrieve data for one specific cell and write it to another specified cell. I believe I am just missing some kind of loop? (Aside from that, there is probably a more correct way of writing the code.)
Sub ie_open()
Dim wb As Workbook
Dim ws As Worksheet
Dim TxtRng As Range
Dim Guid As Range
Dim ie As Object
Dim URL As String
URL = "https://url.com/userpage="
Set wb = ActiveWorkbook
Set ws = wb.Sheets("Detail Report - Individuals")
Set Guid = ws.Range("E2")
Set TxtRng = ws.Range("F2")
Set ie = CreateObject("INTERNETEXPLORER.APPLICATION")
ie.NAVIGATE (URL + Guid)
ie.Visible = True
While ie.ReadyState <> 4
DoEvents
Wend
TxtRng = ie.document.getelementbyid("lbl_Location").innertext
End Sub
Thank you in advance.

Add a reference to the Microsoft HTML Object Library (Tools > References in the VBA editor). You can also add a reference to Microsoft Internet Controls so that you can declare ie As InternetExplorer rather than just As Object, but that is not necessary. Then you can loop through each element like this:
Sub ie_open()
Dim wb As Workbook
Dim ws As Worksheet
Dim TxtRng As Range
Dim Guid As Range
Dim ie As Object
Dim URL As String
'ADDED THIS
Dim sl As IHTMLElement
Dim r As Long
r = 1 'VBA cannot initialise a variable inside the Dim statement
URL = "https://url.com/userpage="
Set wb = ActiveWorkbook
Set ws = wb.Sheets("Detail Report - Individuals")
Set Guid = ws.Range("E2")
Set TxtRng = ws.Range("F2")
Set ie = CreateObject("INTERNETEXPLORER.APPLICATION")
ie.NAVIGATE URL & Guid.Value 'use & so a numeric Guid is concatenated rather than added
ie.Visible = True
While ie.ReadyState <> 4
DoEvents
Wend
For each sl in ie.document.all
ws.cells(r, 1).value = sl.innertext
r = r + 1
Next
'TxtRng = ie.document.getelementbyid("lbl_Location").innertext
End Sub
Edit: forgot to increment the r variable in the loop, and I think it should be IE.Document.All instead of just IE.Document when initializing the loop
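The question asks for one lookup per "Guid" cell rather than one row per page element. A minimal sketch of that loop follows. It assumes the Guids sit in column E from row 2 downwards and that each result belongs in column F of the same row; only E2 and F2 appear in the question, so the rest of the layout is a guess. The sub name ScrapeAllGuids is made up for illustration.

```vba
Sub ScrapeAllGuids()
    Const BASE_URL As String = "https://url.com/userpage="   ' placeholder URL from the question
    Dim ws As Worksheet
    Dim ie As Object
    Dim el As Object
    Dim lastRow As Long, r As Long

    Set ws = ActiveWorkbook.Sheets("Detail Report - Individuals")
    ' Last used row in column E (assumed to hold the Guids)
    lastRow = ws.Cells(ws.Rows.Count, "E").End(xlUp).Row

    ' One browser instance is reused for every row
    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = True

    For r = 2 To lastRow
        If Len(ws.Cells(r, "E").Value) > 0 Then
            ie.Navigate BASE_URL & ws.Cells(r, "E").Value
            ' Wait until the page has finished loading (4 = READYSTATE_COMPLETE)
            Do While ie.Busy Or ie.ReadyState <> 4
                DoEvents
            Loop
            ' Write the label text next to the Guid, if the element exists on the page
            Set el = ie.Document.getElementById("lbl_Location")
            If Not el Is Nothing Then ws.Cells(r, "F").Value = el.innerText
        End If
    Next r

    ie.Quit
    Set ie = Nothing
End Sub
```

Reusing a single ie object avoids opening a new browser window per row; whether the target site tolerates rapid successive requests is something to check separately.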

Related

Macro to open multiple links in new tabs

I want my macro to open each link stored in a spreadsheet in a separate IE tab. I am successful with opening the first link, but for some reason on the second iteration of the loop I get this error:
Automation error. The interface is unknown
I suspect the macro somehow loses IE object reference after first iteration, but I am not sure why.
Range is set OK.
Here is the code:
Sub OpenCodingForms()
Dim wb1 As Workbook
Dim ws1 As Worksheet
Dim CodingFormLinks As Range
Dim IE as InternetExplorerMedium
Set wb1 = Workbooks("New shortcut.xlsm")
Set ws1 = wb1.Worksheets("Data")
Set CodingFormLinks = ws1.Range("A2", Range("A2").End(xlDown))
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
ws1.Activate
For Each link In CodingFormLinks.Cells
IE.Navigate link, CLng(2049)
Next link
End Sub
I ran into this issue before and ended up writing a routine to get the instance. You will need to add a reference to Microsoft Shell Controls And Automation (and to Microsoft Internet Controls for the InternetExplorer type).
If there is redirection, you may have to adjust this so that it checks whether the actual URL begins with the sUrl value instead of testing for an exact match.
Sub OpenCodingForms()
Dim wb1 As Workbook
Dim ws1 As Worksheet
Dim CodingFormLinks As Range
Dim IE As InternetExplorerMedium
Set wb1 = Workbooks("New shortcut.xlsm")
Set ws1 = wb1.Worksheets("Data")
Set CodingFormLinks = ws1.Range("A2", Range("A2").End(xlDown))
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
ws1.Activate
Dim sUrl As String
For Each link In CodingFormLinks.Cells
sUrl = link.Value
IE.navigate sUrl, CLng(2048)
Set IE = GetWebPage(sUrl)
Next link
End Sub
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
'Desc: The Function gets the Internet Explorer window that has the current
' URL from the sURL Parameter. The Function times out after 300 seconds
'Input parameters:
'String sURL - The URL to look for
'Output parameters:
'InternetExplorer ie - the Internet Explorer window holding the webpage
'Result: returns the Internet Explorer window holding the webpage
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
Function GetWebPage(sUrl As String) As InternetExplorer
Dim winShell As Shell
Dim dt As Date
'set the timeout period
dt = DateAdd("s", 300, DateTime.Now)
Dim IE As InternetExplorer
'loop until we timeout
Do While dt > DateTime.Now
Set winShell = New Shell
'loop through the windows and check the internet explorer windows
For Each IE In winShell.Windows
'check for the url
If IE.LocationURL = sUrl Then
'set the window visible
IE.Visible = True
IE.Silent = True
'set the return value
Set GetWebPage = IE
Do While IE.Busy
DoEvents
Loop
Set winShell = Nothing
Exit Do
End If
Next IE
Set winShell = Nothing
DoEvents
Loop
End Function
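The redirection caveat mentioned in this answer could be handled by comparing only the start of the window's URL with the requested one. The helper below is a sketch of that idea; the name UrlStartsWith is hypothetical, and a prefix match only helps when the redirect keeps the original URL as a prefix (a redirect to a different host would still not match).

```vba
' Hypothetical helper: True when actualUrl begins with expectedUrl.
' The comparison ignores case; an empty expectedUrl never matches.
Function UrlStartsWith(ByVal actualUrl As String, ByVal expectedUrl As String) As Boolean
    If Len(expectedUrl) = 0 Then Exit Function
    UrlStartsWith = (StrComp(Left$(actualUrl, Len(expectedUrl)), expectedUrl, vbTextCompare) = 0)
End Function
```

Inside GetWebPage, the test `If IE.LocationURL = sUrl Then` would then become `If UrlStartsWith(IE.LocationURL, sUrl) Then`.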

Internet Explorer VBA Automation Error: The object Invoked has disconnected from its clients

I'm trying to write code that will read a value from Excel, look it up in an internal web based system and store the results back in the Excel. It reads the Excel with no problem, opens Internet Explorer with no problem, but when I then try to reference what's been opened, I get the above error. The line "ie.Navigate url" works, but the next line "Set DOC = ie.Document" generates the error. Any ideas on what's causing this? Here's my code:
Public Sub getClient()
Dim xOpen As Boolean
xOpen = False
Dim row As Long
Dim xL As Excel.Application
Set xL = New Excel.Application
xL.Visible = False
Dim wb As Excel.Workbook
Dim sh As Excel.Worksheet
'Change the name as needed, or put in some facility to input it or
'process multiples...
Dim filename As String
filename = "auditLookup.xlsx"
Set wb = xL.Workbooks.Open(getPath("Audit") + filename)
xOpen = True
Set sh = wb.Sheets(1)
Dim ie As Variant
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
Dim DOC As HTMLDocument
Dim idx As Integer
Dim data As String
Dim links As Variant
Dim lnk As Variant
Dim iRow As Long
iRow = 2 'Assume headers
Dim clientName As String
Dim clientID As String
Dim nameFound As Boolean
Dim idFound As Boolean
Dim url As String
While sh.Cells(iRow, 1) <> ""
'Just in case these IDs are ever prefixed with zeroes, I'm inserting
'some random character in front, but removing it of course when
'processing.
url = "https://.../" + mid(sh.Cells(iRow, 1), 2)
ie.navigate url
Set DOC = ie.Document
'Search td until we find "Name:" then the next td will be the name.
'Then search for "P1 ID (ACES):" and the next td with be that.
Set links = DOC.getElementsByTagName("td")
clientName = ""
clientID = ""
nameFound = False
idFound = False
For Each lnk In links
data = lnk.innerText
If nameFound Then
clientName = data
ElseIf idFound Then
clientID = data
End If
If nameFound And idFound Then
Exit For
End If
If data = "Name:" Then
nameFound = True
ElseIf data = "P1 ID (ACES):" Then
idFound = True
End If
Next
sh.Cells(iRow, 2) = clientName
sh.Cells(iRow, 3) = clientID 'column 3, so the ID does not overwrite the name in column 2
iRow = iRow + 1
Wend
Set ie = Nothing
If xOpen Then
wb.Save
Set wb = Nothing
xL.Quit
Set xL = Nothing
Set sh = Nothing
xOpen = False
End If
Exit Sub
Changing the declaration and creation of ie to:
Dim ie As InternetExplorer
Set ie = New InternetExplorerMedium
...
solved the problem. I also had to add back the Do loop mentioned in the comments, right after ie.navigate:
Do
DoEvents
Loop Until ie.ReadyState = READYSTATE_COMPLETE
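Put together, the two changes might look like the sketch below. It assumes references to Microsoft Internet Controls and the Microsoft HTML Object Library are set (needed for InternetExplorerMedium, READYSTATE_COMPLETE and HTMLDocument). The URL is a placeholder and the sub name is made up, since the real address is elided in the question.

```vba
Sub NavigateThenReadDocument()
    Dim ie As InternetExplorer
    Dim doc As HTMLDocument

    ' InternetExplorerMedium instead of CreateObject("InternetExplorer.Application")
    Set ie = New InternetExplorerMedium
    ie.Visible = True
    ie.navigate "https://example.com/"   ' placeholder, not the URL from the question

    ' Do not touch ie.document until the page has finished loading
    Do
        DoEvents
    Loop Until ie.readyState = READYSTATE_COMPLETE

    Set doc = ie.document
    Debug.Print doc.Title

    ie.Quit
    Set ie = Nothing
End Sub
```

The wait loop matters because navigate returns immediately; reading ie.document before the load completes is what the added Do loop guards against.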

Copy some text from the site to Excel (Run-time error 462)

The program works (the text I need is displayed in Excel), but afterwards IE stops working and I get Run-time error 462 (The remote server machine does not exist or is unavailable). While searching for a solution I found https://support.microsoft.com/en-us/kb/178510 .
I have not been able to change the code so that it works.
Sub extractTablesData1()
Dim IE As Object
Dim Data As Object
Dim xlApp As Excel.Application
Dim xlwb As Excel.Workbook
Dim xlsh As Excel.Worksheet
Dim i As Integer
Dim elemCollection As Variant
Set xlApp = CreateObject("Excel.Application")
Set xlwb = xlApp.Workbooks("IESite.xlsx")
Set xlsh = xlwb.Worksheets("Data")
Set IE = CreateObject("InternetExplorer.Application")
With IE
.Visible = False
.navigate ("http://allscores.ru/soccer/new_ftour.php?champ=2404&f_team=406&tour=110")
While IE.ReadyState <> 4
DoEvents
Wend
Set Data = IE.document.getElementsbyTagName("table")(6).querySelectorAll("td.clr, td.clr_win, td.clr_draw, td.clr_loose")
i = 1
For Each elemCollection In Data
xlsh.Cells(34, 1 + i).Value = elemCollection.innerText
i = i + 1
Next elemCollection
End With
IE.Quit
Set IE = Nothing
End Sub
As described in the link:
RESOLUTION
To resolve this problem, modify the code so each call to an Excel object, method, or property is qualified with the appropriate object variable.
You are using:
ActiveWorkbook.Sheets(1).Cells(34, 1 + i).Value
This should be:
Dim xlwb as Workbook
Dim xlsh as Worksheet
set xlwb = Workbooks("IESite")
set xlsh = xlwb.Worksheets("Data")
xlsh.Cells(34, 1 + i).Value = elemCollection.innerText
Further changes:
As the resolution requires, every object needs to be declared and every call qualified. Let me know whether the version below works; if so, I'll remove the code above.
Don't forget to declare elemCollection.
Sub extractTablesData1()
Dim IE As Object
Dim Data As Object
Dim xlApp As Excel.Application
Dim xlwb As Excel.Workbook
Dim xlsh As Excel.Worksheet
Dim i As Integer
Dim elemCollection as ... 'please define elemCollection as the type it is
Set xlApp = CreateObject("Excel.Application")
Set xlwb = xlApp.Workbooks("IESite.xlsx")
Set xlsh = xlwb.Worksheets("Data")
Set IE = CreateObject("InternetExplorer.Application")
With IE
.Visible = False
.navigate ("http://allscores.ru/soccer/new_ftour.php?champ=2404&f_team=406&tour=110")
While IE.ReadyState <> 4
DoEvents
Wend
Set Data = IE.document.getElementsbyTagName("table")(6).querySelectorAll("td.clr, td.clr_win, td.clr_draw, td.clr_loose")
i = 1
For Each elemCollection In Data
xlsh.Cells(34, 1 + i).Value = elemCollection.innerText
i = i + 1
Next elemCollection
End With
IE.Quit
Set IE = Nothing
End Sub

VBA returns [object] when using Doc.getElementsByTagName. Can't locate correct TagName from browser

I'm having trouble locating the correct tag name from Chrome. When I run the following code, all that is returned is "[object]". Does anyone have any suggestions for pulling in the correct field?
Here's the website I'm attempting to pull information from. Specifically I need the Census Tract, but at this point, if someone could show me how to return any information, I will mine the correct one.
Here's my code. It currently sends the output to a MsgBox just to find the right field. From there I will re-enable the For statement and make the URL in the VBA dynamic. Also note that I've attempted to use getElementsByName, but made no progress there either:
Sub censusTract()
Dim sht As Worksheet
Dim lastRow As Long
Set sht = ActiveWorkbook.Sheets("Sheet1")
lastRow = sht.Cells(sht.Rows.Count, "A").End(xlUp).Row
'For i = 2 To lastRow
Dim IE As New InternetExplorer
'IE.Visible = True
IE.navigate "http://geocoding.geo.census.gov/geocoder/geographies/address?street=1308+Lapwing+Rd&city=Edmond&state=OK&zip=73003&benchmark=4&vintage=4"
Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
Dim Doc As HTMLDocument
Set Doc = IE.document
Dim sDD As String
'sDD = Doc.getElementsByTagName("br")(0)
sDD = Doc.getElementsByName("pl_gov_census_geo_geocoder_domain_AddressResult")(0)
MsgBox sDD
'IE.Quit
'sht.Cells(i, 41).Value = sDD
'Next i
End Sub
You are very close. All you need to do is pull the correct data from the object. You can use one of the following
sDD = Doc.getElementsByName("pl_gov_census_geo_geocoder_domain_AddressResult").item(0).innerText
sDD = Doc.getElementsByName("pl_gov_census_geo_geocoder_domain_AddressResult").item(0).innerHTML
sDD = Doc.getElementsByName("pl_gov_census_geo_geocoder_domain_AddressResult").item(0).outerHTML
Right now it's just pulling the object itself; adding innerText, innerHTML or outerHTML should pull your result.
Once you have that, you may want to split the result into an array so you can do what you need with each of the elements. The code below should get you started.
Sub censusTract()
Dim sht As Worksheet
Dim lastRow As Long, v As Variant, block As Variant, x As Integer
Set sht = ActiveWorkbook.Sheets("Sheet1")
lastRow = sht.Cells(sht.Rows.Count, "A").End(xlUp).Row
'For i = 2 To lastRow
Dim IE As New InternetExplorer
'IE.Visible = True
IE.navigate "http://geocoding.geo.census.gov/geocoder/geographies/address?street=1308+Lapwing+Rd&city=Edmond&state=OK&zip=73003&benchmark=4&vintage=4"
Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
Dim Doc As HTMLDocument
Set Doc = IE.document
Dim sDD As String
'sDD = Doc.getElementsByTagName("br")(0)
sDD = Doc.getElementsByName("pl_gov_census_geo_geocoder_domain_AddressResult").Item(0).innerText
block = Split(sDD, vbLf)
x = 2 'start column
Dim i As Long
i = 2 'target row; drop this assignment once the For i loop above is re-enabled
For Each v In block
If v <> "" Then
sht.Cells(i, x).Value = v
x = x + 1
End If
Next v
'IE.Quit
'sht.Cells(i, 41).Value = sDD
'Next i
End Sub
Let me know if you have any questions.
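The split step in this answer can also be pulled out into a small helper. The sketch below is one way to do it, assuming the page text may use either vbLf or vbCrLf line endings. The names NonEmptyLines and DemoNonEmptyLines are made up, and the sample string is invented purely for the demo.

```vba
' Hypothetical helper: splits text into trimmed, non-empty lines.
Function NonEmptyLines(ByVal rawText As String) As Collection
    Dim result As New Collection
    Dim part As Variant

    ' Normalise line endings so a single Split on vbLf is enough
    rawText = Replace(rawText, vbCrLf, vbLf)
    rawText = Replace(rawText, vbCr, vbLf)

    For Each part In Split(rawText, vbLf)
        If Len(Trim$(part)) > 0 Then result.Add Trim$(part)
    Next part

    Set NonEmptyLines = result
End Function

Sub DemoNonEmptyLines()
    Dim ln As Variant
    ' Invented sample text; in the answer above this would be sDD
    For Each ln In NonEmptyLines("first line" & vbCrLf & vbCrLf & "  second line  ")
        Debug.Print ln
    Next ln
End Sub
```

In censusTract, the loop over block could then iterate over NonEmptyLines(sDD) and write each item to the next column.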

Search column for urls, save webpages as individual text files

I have code here that works for a url that is hard coded, and it only works for one url and one text file.
Sub saveUrl_Test()
Dim FileName As String
Dim FSO As Object
Dim ieApp As Object
Dim Txt As String
Dim TxtFile As Object
Dim URL As String
URL = "www.bing.com"
FileName = "C:\mallet\bing.com.txt"
Set FSO = CreateObject("Scripting.FileSystemObject")
Set TxtFile = FSO.OpenTextFile(FileName, 2, True, -1)
Set ieApp = CreateObject("InternetExplorer.Application")
ieApp.Visible = True
ieApp.Navigate URL
While ieApp.Busy Or ieApp.ReadyState <> 4
DoEvents
Wend
Txt = ieApp.Document.body.innerText
TxtFile.Write Txt
TxtFile.Close
ieApp.Quit
Set ieApp = Nothing
Set FSO = Nothing
End Sub
What I want it to do is search in column B for urls (possibly using InStr(variable, "http://") as a boolean), and then save each webpage as an individual text file. Would there be a way to name the text files using part of the URL strings? Also, is there a way for the webpage not to open, but still save as a text file? Opening the webpages wastes a lot of time.
I created this additional sub based on @MikeD's suggestion, but I get a "Wend without While" error.
Sub url_Test(URL As String, FileName As String)
Dim FSO As Object
Dim ieApp As Object
Dim Txt As String
Dim TxtFile As Object
Set FSO = CreateObject("Scripting.FileSystemObject")
Set TxtFile = FSO.OpenTextFile(FileName, 2, True, -1)
Set ieApp = CreateObject("InternetExplorer.Application")
ieApp.Visible = True
ieApp.Navigate URL
While ieApp.Busy Or ieApp.ReadyState <> 4
DoEvents
Wend
Txt = ieApp.Document.body.innerText
TxtFile.Write Txt
TxtFile.Close
ieApp.Quit
Set ieApp = Nothing
Set FSO = Nothing
End Sub
Sub LoopOverB()
Dim myRow As Long
myRow = 10
While Cells(myRow, 2).Value <> ""
If InStr(1, Cells(myRow, 2).Value, "http:\\", vbTextCompare) Then Call url_Test(Cells(myRow, 2).Value, "C:\mallet\test\" & Cells(myRow, 1).Value & ".txt")
myRow = myRow + 1
Wend
End Sub
First you could parameterize the sub
Sub saveUrl_param(URL as String, FileName as String)
....
End Sub
and remove the Dim and assignment statements for URL and FileName.
Secondly, write another Sub that loops through the non-empty cells in column B, retrieves their values and conditionally calls the saveUrl_param() routine.
Example:
Sub LoopOverB()
Dim C As Range
For Each C In Intersect(ActiveSheet.Columns("B"), ActiveSheet.UsedRange).SpecialCells(xlCellTypeConstants)
' If C = .... Then ' note: URL in [B], filename in [C]
' saveUrl_param C.Value, C.Offset(0, 1).Value ' no parentheses when calling a Sub without Call
' End If
Next C
End Sub
And no, you can't save the page without loading it: the page has to be fetched from the server (or the proxy) somehow. Here this is done by
ieApp.Navigate URL
and the following While ... Wend construct waits until the page is fully loaded into the browser object.
To speed up things you could skip
ieApp.Visible = True
once you have confidence that your Sub is working correctly, and you could move
Dim ieApp As Object ' I would prefer As SHDocVw.InternetExplorer .... don't like late binding
Set ieApp = CreateObject("InternetExplorer.Application")
to the calling sub and hand over the ieApp object to the subroutine as a parameter in order to not open/close the browser again & again.
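The suggestions above could be combined roughly as follows. This is a sketch under several assumptions: the URLs are in column B of the active sheet, the folder C:\mallet\test\ already exists, and file names are derived from the URL by the made-up helper FileNameFromUrl. SaveUrlWithBrowser and SaveAllUrlsInColumnB are likewise illustrative names, not part of the original code.

```vba
' Hypothetical helper: turns a URL into a string that is safe as a Windows file name.
Function FileNameFromUrl(ByVal url As String) As String
    Dim badChars As Variant, ch As Variant
    Dim s As String

    s = url
    ' Drop the scheme ("http://"), if present
    If InStr(1, s, "://") > 0 Then s = Mid$(s, InStr(1, s, "://") + 3)
    ' Replace characters that Windows does not allow in file names
    badChars = Array("\", "/", ":", "*", "?", """", "<", ">", "|")
    For Each ch In badChars
        s = Replace(s, ch, "_")
    Next ch
    ' Keep the name reasonably short
    FileNameFromUrl = Left$(s, 100) & ".txt"
End Function

' Saves the visible text of one page, using a browser object owned by the caller.
Sub SaveUrlWithBrowser(ByVal ieApp As Object, ByVal url As String, ByVal fileName As String)
    Dim fso As Object, txtFile As Object

    ieApp.Navigate url
    Do While ieApp.Busy Or ieApp.ReadyState <> 4   ' 4 = READYSTATE_COMPLETE
        DoEvents
    Loop

    Set fso = CreateObject("Scripting.FileSystemObject")
    Set txtFile = fso.OpenTextFile(fileName, 2, True, -1)   ' 2 = ForWriting, -1 = Unicode
    txtFile.Write ieApp.Document.body.innerText
    txtFile.Close
End Sub

Sub SaveAllUrlsInColumnB()
    Dim ieApp As Object
    Dim c As Range

    ' Created once and handed to the worker sub; left invisible for speed
    Set ieApp = CreateObject("InternetExplorer.Application")

    For Each c In Intersect(ActiveSheet.Columns("B"), ActiveSheet.UsedRange).Cells
        If Not IsError(c.Value) Then
            If InStr(1, c.Value, "http://", vbTextCompare) > 0 Then
                SaveUrlWithBrowser ieApp, c.Value, "C:\mallet\test\" & FileNameFromUrl(c.Value)
            End If
        End If
    Next c

    ieApp.Quit
    Set ieApp = Nothing
End Sub
```

The InStr test follows the check proposed in the question, so cells containing only https:// addresses would be skipped unless the condition is widened.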