Pasting URLs That Have Been Scraped From a Webpage - vba

So I'm looking to dump a bunch of URLs from a webpage into Excel as a list. I was previously dumping the items into a listbox, but I have found that listboxes are quite difficult to work with!
Once I have collected the URLs into a column in Excel, I want Excel to open each link and find the email address that is on the page. Here is the code that I currently have:
Dim IE As Object
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
url_name = Sheet1.Range("A2")
If url_name = "" Then Exit Sub
IE.Navigate (url_name)
Do
DoEvents
Loop Until IE.ReadyState = 4
Set AllHyperLinks = IE.Document.GetElementsByTagName("A")
For Each hyper_link In AllHyperLinks
Range("x":"F").Value = hyper_link
This is all I have so far! I'm not sure how to complete the loop! I want the code to paste every new URL that it finds on the page in the next empty row in column F.

You can complete the loop in this way:
Dim IE As Object, LR As Long, i As Long
LR = Sheet1.Range("A" & Sheet1.Rows.Count).End(xlUp).Row
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
For i = 2 To LR
url_name = Sheet1.Range("A" & i).Value
If url_name = "" Then Exit Sub
IE.Navigate (url_name)
Do
DoEvents
Loop Until IE.ReadyState = 4
Set AllHyperLinks = IE.Document.GetElementsByTagName("A")
For Each hyper_link In AllHyperLinks
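' Write each link to the next empty row in column F
Sheet1.Range("F" & Sheet1.Rows.Count).End(xlUp).Offset(1, 0).Value = hyper_link.href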
Next hyper_link
Next i
Please note that if you have a large set of data, this is going to take a long time, because every URL has to be loaded in Internet Explorer one after another.

Related

Pause script till website fully loaded - Excel VBA

I'm currently trying to create a sheet which will extract tracking information for parcels sent out. I've worked out the following code for the time being but encounter the following issues:
The code continues before the page has fully loaded. I suspect this is because, after the initial load completes, the page runs a script and refreshes.
If the mouse is not moving over the Internet Explorer window, there is a high probability of an image-based human verification. I understand this may not be possible to avoid, but is there any way I can pause the script while someone completes the verification?
Sub RoyalTrack()
Dim i As Long
Dim ie As Object
Dim t As String
Set ie = CreateObject("InternetExplorer.Application")
With ie
.Visible = True
.Navigate "https://www.royalmail.com/track-your-item#/tracking-results/SF511991733GB"
.Resizable = True
End With
While ie.ReadyState <> 4 Or ie.Busy: DoEvents: Wend
Dim full As Variant
Dim latest As Variant
full = ie.Document.getElementsByClassName("c-tracking-history")(0).innerText
latest = ie.Document.getElementsByClassName("tracking-history-item ng-scope")(0).innerText
MsgBox full
MsgBox latest
End Sub
Managed to figure it out. I added a 2-second wait after the page loads so its scripts can finish, and an error handler that detects whether the required element is available.
Sub RoyalTrack()
Dim i As Long
Dim ie As Object
Dim t As String
Dim trackingN As String
Dim count As Integer
count = 2
Do While Worksheets("Sheet1").Range("D" & count).Value <> ""
Set ie = CreateObject("InternetExplorer.Application")
trackingN = Worksheets("Sheet1").Range("D" & count).Value
With ie
.Visible = True
' Variable tracking SF-GB
.Navigate "https://www.royalmail.com/track-your-item#/tracking-results/" & trackingN
.resizable = True
End With
While ie.readyState <> 4 Or ie.Busy: DoEvents: Wend
Application.Wait (Now + TimeValue("0:00:02"))
Dim full As Variant
Dim latest As Variant
On Error Resume Next
latest = ie.document.getElementsByClassName("tracking-history-item ng-scope")(0).innerText
If Err Then
MsgBox "Prove your humanity if you can"
Err.Clear
End If
latest = ie.document.getElementsByClassName("tracking-history-item ng-scope")(0).innerText
' Restore normal error handling so later errors are not silently ignored
On Error GoTo 0
Windows("Book1.xls").Activate
Sheets("Sheet1").Select
Range("E" & count).Value = latest
ie.Quit
Set ie = Nothing
count = count + 1
Loop
End Sub
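A fixed 2-second wait can be too short on a slow connection and too long on a fast one. As a rough alternative, here is a minimal sketch that polls until an element with the wanted class appears, or gives up after a timeout. The function name WaitForClass and its parameters are made up for illustration, and it assumes the page's document mode supports getElementsByClassName, as the code above already does.

```vba
' Hypothetical helper (not from the original post): returns True as soon as
' at least one element with the given class exists, False on timeout.
Function WaitForClass(ie As Object, ByVal className As String, _
                      ByVal timeoutSeconds As Long) As Boolean
    Dim deadline As Date
    Dim found As Long

    deadline = DateAdd("s", timeoutSeconds, Now)
    Do While Now < deadline
        found = 0
        ' The document can be briefly unavailable while the page refreshes,
        ' so ignore errors for this single lookup only.
        On Error Resume Next
        found = ie.document.getElementsByClassName(className).Length
        On Error GoTo 0

        If found > 0 Then
            WaitForClass = True
            Exit Function
        End If
        DoEvents    ' keep Excel and the browser responsive while polling
    Loop
    WaitForClass = False
End Function
```

It could be called right after the ReadyState loop, for example `If Not WaitForClass(ie, "tracking-history-item ng-scope", 30) Then MsgBox "Prove your humanity if you can"`, and then called again once the verification has been completed.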

Crawler & Scraper using excel vba

I am trying to crawl an intranet URL so that Excel automatically selects one of the options from a dropdown menu, enters a value in a text box, and clicks Find to get redirected to another page, where I want to copy a value to another worksheet in the same workbook. I have created the code below, but it is not working; it fails with "Object required". :(
Sub Test()
Dim rng As Range
Set rng = Sheets("sheet1").Range("A1", Sheets("sheet1").Cells.Range("A1").End(xlDown))
Set ie = CreateObject("InternetExplorer.application")
ie.Visible = True
ie.Navigate ("https://gcd.ad.plc.cwintra.com/GCD_live/login/login.asp")
Do
If ie.ReadyState = 4 Then
ie.Visible = False
Exit Do
Else
DoEvents
End If
Loop
ie.Document.forms(0).all("txtUsername").Value = ""
ie.Document.forms(0).all("txtPassword").Value = ""
ie.Document.forms(0).submit
ie.Visible = True
Appliction.Wait (Now + TimeValue("00:00:02"))
DoEvents
For Each cell In rng
ie.Navigate ("https://gcd.ad.plc.cwintra.com/GCD_live/search.asp")
DoEvents
ie.Document.getElementById("cboFieldName").selectedIndex = 6
ie.Document.getElementById("txtFieldValue").Select
SendKeys (cell.Value)
DoEvents
ie.Document.getElementById("cmdFind").Click
Next cell
End Sub

GetElementsByTagName Returning [object HTMLParagraphElement]

I have the below code, wherein I'm trying to open a series of urls and pull in the data from each url (example: http://apps.mohltc.ca/ltchomes/detail.php?id=2588&lang=en). Of most interest to me would be those labeled as "Local Health Integration Network", "Licensee" and "Licensed Beds".
As it stands, I'm trying to just pull in all elements with tag name "p" and deal with the data scrubbing later on. My code currently pulls in "[object HTMLParagraphElement]" instead of the array that I'm hoping for. Can someone explain why this is?
Sub ImportLicenseeData()
Dim ie As Object
Dim LH As Object
Dim r As Integer
Set ie = CreateObject("InternetExplorer.Application")
For r = 4 To 10
With ie
ie.Visible = False
ie.Navigate Cells(r, "H").Value
Do While (ie.Busy Or ie.ReadyState <> 4): DoEvents: Loop
Set Doc = ie.Document
Set LH = Doc.getElementsByTagName("p")
End With
Worksheets("Sheet1").Range("J" & r).Value = LH
Next r
End Sub
Any help is appreciated.
Dim LH As IHTMLElementCollection
Dim htmlEle1 As IHTMLElement
This requires a reference to the Microsoft HTML Object Library. getElementsByTagName returns a collection of elements (it's not an array), so writing LH to a cell only gives you the text form of the object. You can loop over the elements of the LH collection like this:
Set LH = Doc.getElementsByTagName("p")
For Each htmlEle1 In LH
Debug.Print htmlEle1.innerText
Next htmlEle1
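To get that text into a worksheet cell instead of the Immediate window, something along these lines might work. It is only a sketch: the function name ParagraphText is made up, and it uses late binding (As Object) so it runs without the library reference.

```vba
' Hypothetical helper: joins the text of every <p> element in the document
' into one string, one paragraph per line, so it fits in a single cell.
Function ParagraphText(doc As Object) As String
    Dim el As Object
    Dim result As String

    For Each el In doc.getElementsByTagName("p")
        If Len(result) > 0 Then result = result & vbLf
        result = result & el.innerText
    Next el
    ParagraphText = result
End Function
```

In the question's loop this would replace the assignment of LH, for example `Worksheets("Sheet1").Range("J" & r).Value = ParagraphText(ie.Document)`.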
Thanks for the help everyone. I wasn't too familiar with handling the HTML Elements, so I ended up going with a different approach. Appreciate the feedback regardless.
via http://www.ozgrid.com/forum/showthread.php?t=178150
Sub RetrieveHTML()
Dim rngSelect As Range
Dim sURL As String
Set rngSelect = Range("H8", Range("H8").End(xlDown))
Debug.Print rngSelect.Address
Set ie = CreateObject("InternetExplorer.Application")
For Each c In rngSelect
sURL = c.Value
With ie
.Visible = False
.Navigate sURL
Do Until .ReadyState = 4
DoEvents
Loop
Do While .Busy: DoEvents: Loop
Range(c.Address).Offset(0, 1).Value = ie.Document.DocumentElement.outerHTML
End With
Next c
End Sub

VBA to find text from webpages

I have created a macro that lists all the URLs present on a webpage.
We just need to provide the URL, and it collects all the links present on that page and pastes them into one column.
Private Sub CommandButton4_Click()
'We refer to an active copy of Internet Explorer
Dim ie As InternetExplorer
'code to refer to the HTML document returned
Dim html As HTMLDocument
Dim ElementCol As Object
Dim Link As Object
Dim erow As Long
Application.ScreenUpdating = False
'open Internet Explorer and go to website
Set ie = New InternetExplorer
ie.Visible = True
ie.navigate Cells(1, 1)
'Wait until IE is done loading page
Do While ie.READYSTATE <> READYSTATE_COMPLETE
Application.StatusBar = "Trying to go to website…"
DoEvents
Loop
Set html = ie.document
'Display text of HTML document returned in a cell
'Range("A1") = html.DocumentElement.innerHTML
Set ElementCol = html.getElementsByTagName("a")
For Each Link In ElementCol
erow = Worksheets("Sheet4").Cells(Rows.Count, 1).End(xlUp).Offset(1, 0).Row
Cells(erow, 1).Value = Link
Cells(erow, 1).Columns.AutoFit
Next
'close down IE, reset status bar & turn on screenupdating
'Set ie = Nothing
Application.StatusBar = ""
Application.ScreenUpdating = True
ie.Quit
ActiveSheet.Range("$A$1:$A$2752").removeDuplicates Columns:=1, Header:=xlNo
End Sub
Now, can anyone help me create a macro that looks for particular text in all the URLs present in that column and, if the text is present, prints "Text found" in the next column?
For example, if we search for the text "New", it should print "Text found" in the column next to the URL.
Thank you.
The key is the InStr function: if it finds the string "New", it returns the position where the match begins; otherwise it returns 0.
i = 1
Do Until Trim(Cells(i, 1).Value) = vbNullString
    If InStr(Cells(i, 1).Value, "New") > 0 Then
        Cells(i, 2).Value = "Text found"
    End If
    i = i + 1
Loop
Similar to above.
Dim a As Long
a = 2
' Loop down column A until the first empty cell
While Cells(a, 1) <> ""
    If InStr(Cells(a, 1), "new") > 0 Then
        Cells(a, 2) = "Text Found"
    End If
    a = a + 1
Wend
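One thing to watch with both loops: unless the module sets Option Compare Text, InStr compares case-sensitively, so searching for "New" and for "new" gives different results. A small sketch of a case-insensitive check follows; the function name ContainsText is made up for illustration.

```vba
' Hypothetical helper: True if needle occurs anywhere in haystack,
' ignoring upper/lower case. The start position (1) must be given
' whenever the compare argument is used.
Function ContainsText(ByVal haystack As String, ByVal needle As String) As Boolean
    ContainsText = InStr(1, haystack, needle, vbTextCompare) > 0
End Function
```

Inside either loop it could be used as `If ContainsText(CStr(Cells(i, 1).Value), "new") Then Cells(i, 2).Value = "Text found"`.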

VBA hanging on ie.busy and readystate check

I am trying to grab some football player data from a website to fill a privately used database. I've included the entire code below. This first section is a looper that calls the second function to fill a database. I've run this code in MSAccess to fill a database last summer and it worked great.
Now I am only getting a few teams to fill before the program gets hung up at
While IE.Busy Or IE.ReadyState <> READYSTATE_COMPLETE: DoEvents: Wend
I've searched countless websites regarding this error and tried changing this code by putting in sub function to wait a period of seconds or other work-arounds. None of those solve the issue. I've also tried running this on multiple computers.
The first computer made it through 3 teams (or three calls of the 2nd function). The second slower computer makes it through 5 teams. Both eventually hang. The 1st computer has Internet Explorer 10 and the second has IE8.
Sub Parse_NFL_RawSalaries()
Status ("Importing NFL Salary Information.")
Dim mydb As Database
Dim teamdata As DAO.Recordset
Dim i As Integer
Dim j As Double
Set mydb = CurrentDb()
Set teamdata = mydb.OpenRecordset("TEAM")
i = 1
With teamdata
Do Until .EOF
Call Parse_Team_RawSalaries(teamdata![RotoworldTeam])
.MoveNext
i = i + 1
j = i / 32
Status("Importing NFL Salary Information. " & Str(Round(j * 100, 0)) & "% done")
Loop
End With
teamdata.Close ' reset variables
Set teamdata = Nothing
Set mydb = Nothing
Status ("") 'resets the status bar
End Sub
Second function:
Function Parse_Team_RawSalaries(Team As String)
Dim mydb As Database
Dim rst As DAO.Recordset
Dim IE As InternetExplorer
Dim HTMLdoc As HTMLDocument
Dim TABLEelements As IHTMLElementCollection
Dim TRelements As IHTMLElementCollection
Dim TDelements As IHTMLElementCollection
Dim TABLEelement As Object
Dim TRelement As Object
Dim TDelement As HTMLTableCell
Dim c As Long
' open the table
Set mydb = CurrentDb()
Set rst = mydb.OpenRecordset("TempSalary")
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = False
IE.navigate "http://www.rotoworld.com/teams/contracts/nfl/" & Team
While IE.Busy Or IE.ReadyState <> READYSTATE_COMPLETE: DoEvents: Wend
Set HTMLdoc = IE.Document
Set TABLEelements = HTMLdoc.getElementsByTagName("Table")
For Each TABLEelement In TABLEelements
If TABLEelement.id = "cp1_tblContracts" Then
Set TRelements = TABLEelement.getElementsByTagName("TR")
For Each TRelement In TRelements
If TRelement.className <> "columnnames" Then
rst.AddNew
rst![Team] = Team
c = 0
Set TDelements = TRelement.getElementsByTagName("TD")
For Each TDelement In TDelements
Select Case c
Case 0
rst![Player] = Trim(TDelement.innerText)
Case 1
rst![position] = Trim(TDelement.innerText)
Case 2
rst![ContractTerms] = Trim(TDelement.innerText)
End Select
c = c + 1
Next TDelement
rst.Update
End If
Next TRelement
End If
Next TABLEelement
' reset variables
rst.Close
Set rst = Nothing
Set mydb = Nothing
IE.Quit
End Function
In Parse_Team_RawSalaries, instead of using the InternetExplorer.Application object, how about using MSXML2.XMLHTTP60?
So, instead of this:
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = False
IE.navigate "http://www.rotoworld.com/teams/contracts/nfl/" & Team
While IE.Busy Or IE.ReadyState <> READYSTATE_COMPLETE: DoEvents: Wend
Set HTMLdoc = IE.Document
Maybe try using this (add a reference to "Microsoft XML 6.0" in VBA Editor first):
Dim IE As MSXML2.XMLHTTP60
Set IE = New MSXML2.XMLHTTP60
IE.Open "GET", "http://www.rotoworld.com/teams/contracts/nfl/" & Team, False
IE.send
While IE.ReadyState <> 4
DoEvents
Wend
Dim HTMLDoc As MSHTML.HTMLDocument
Dim HTMLBody As MSHTML.htmlBody
Set HTMLDoc = New MSHTML.HTMLDocument
Set HTMLBody = HTMLDoc.body
HTMLBody.innerHTML = IE.responseText
I've found that MSXML2.XMLHTTP60 (and WinHttp.WinHttpRequest, for that matter) generally performs better (faster and more reliably) than InternetExplorer.Application.
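If you would rather not add the reference, a late-bound version is possible too. This is just a sketch: the function name FetchHtml is made up, and it assumes the page returns the data in its raw HTML. Anything the page builds later with JavaScript will not be in the response.

```vba
' Hypothetical helper: downloads a page synchronously and returns its HTML,
' or an empty string if the server does not answer with status 200.
Function FetchHtml(ByVal url As String) As String
    Dim http As Object

    ' Late binding, so no reference to "Microsoft XML 6.0" is needed
    Set http = CreateObject("MSXML2.XMLHTTP.6.0")
    http.Open "GET", url, False     ' False = wait for the response
    http.send

    If http.Status = 200 Then
        FetchHtml = http.responseText
    End If
End Function
```

The result can then be assigned to HTMLBody.innerHTML exactly as in the snippet above, for example `HTMLBody.innerHTML = FetchHtml("http://www.rotoworld.com/teams/contracts/nfl/" & Team)`.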
I found this post very helpful when I encountered a similar problem. Here is my solution:
I used
Dim browser As SHDocVw.InternetExplorer
Set browser = New SHDocVw.InternetExplorer
and
cTime = Now + TimeValue("00:01:00")
Do Until (browser.readyState = 4 And Not browser.Busy)
If Now < cTime Then
DoEvents
Else
browser.Quit
Set browser = Nothing
MsgBox "Error"
Exit Sub
End If
Loop
Sometimes the page has loaded but the code keeps spinning on DoEvents and never leaves the loop. With this code the loop runs for at most 1 minute; if the browser is still not ready by then, it quits the browser and exits the sub.
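The same idea can be wrapped in a reusable function so the caller decides what to do on a timeout. A minimal sketch, with a made-up name (WaitUntilReady) and a timeout parameter that is not in the code above:

```vba
' Hypothetical helper: waits until the browser reports it has finished
' loading. Returns False if that has not happened within timeoutSeconds.
Function WaitUntilReady(browser As Object, ByVal timeoutSeconds As Long) As Boolean
    Dim deadline As Date

    deadline = DateAdd("s", timeoutSeconds, Now)
    Do Until browser.readyState = 4 And Not browser.Busy
        If Now >= deadline Then
            WaitUntilReady = False
            Exit Function
        End If
        DoEvents
    Loop
    WaitUntilReady = True
End Function
```

A caller could then write `If Not WaitUntilReady(browser, 60) Then browser.Quit`, which keeps the clean-up and the error message out of the waiting loop itself.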
I know this is an old post, but I had the same problem with my code for downloading website pictures using Excel VBA automation. Some sites won't let you download an image file using a link without first opening the link in a browser. However, my code sometimes hung when objBrowser.Visible was set to False, with the following code:
Do Until (objBrowser.busy = False And objBrowser.readyState = 4)
Application.Wait (Now + TimeValue("0:00:01"))
DoEvents 'browser.readyState = 4
Loop
The simple fix was to make objBrowser visible.
I fixed it with:
Dim Passes As Integer: Passes = 0
Do Until (objBrowser.busy = False And objBrowser.readyState = 4)
Passes = Passes + 1 'count loops
Application.Wait (Now + TimeValue("0:00:01"))
DoEvents
If Passes > 5 Then
'set the browser size (it cannot be set smaller than 400)
objBrowser.Width = 400
objBrowser.Height = 400
Label8.Caption = Passes 'display loop count
'position the browser next to the form; it cannot be moved off the screen, or the ready state won't change
objBrowser.Left = UserForm2.Left + UserForm2.Width
objBrowser.Top = UserForm2.Top + UserForm2.Height
objBrowser.Visible = True
DoEvents
objBrowser.Visible = False
End If
Loop
objBrowser only flashes for less than a second but it gets the job done!