VBA scraping with Excel

I am trying to scrape data from: http://www.boliga.dk/salg/resultater?so=1&sort=omregnings_dato-d&maxsaledate=today&iPostnr=&gade=&type=Villa&minsaledate=2017
In connection with this I have two questions.
Michał Perłakowski has given an excellent guide on how to scrape (Scraping data from website using vba), but the code he uses relies on getElementById. The webpage I want to scrape does not use an ID, so I am wondering what alternatives are available. My guess would be getElementsByClassName.
My next question is how to make the macro change page (I have more than 100 pages). Can I just write "next"?
Dim appIE As Object
Set appIE = CreateObject("internetexplorer.application")
With appIE
.navigate "http://www.boliga.dk/salg/resultater?so=1&sort=omregnings_dato-d&maxsaledate=today&iPostnr=&gade=&type=Villa&minsaledate=2017"
.Visible = True
End With
Do While appIE.Busy
DoEvents
Loop
Set allRowOfData = appIE.document.getElementsByName("Bygget")
Dim myValue As String: myValue = allRowOfData.Cells(7).innerHTML
appIE.Quit
Set appIE = Nothing
Range("A1").Value = myValue

Try this:
Option Explicit
Sub scrape()
Dim appIE As Object
Dim ihtml As Object
Set appIE = CreateObject("internetexplorer.application")
With appIE
.Visible = True
.navigate "http://www.boliga.dk/salg/resultater?so=1&sort=omregnings_dato-d&maxsaledate=today&iPostnr=&gade=&type=Villa&minsaledate=2017"
While .Busy = True Or .readyState < 4: DoEvents: Wend
Set ihtml = .document
Dim allRowOfData As Object
Set allRowOfData = appIE.document.getElementById("searchresult")
Dim r As Long, c As Long
Dim curHTMLRow As Object
For r = 1 To allRowOfData.Rows.Length - 1
Set curHTMLRow = allRowOfData.Rows(r)
For c = 0 To curHTMLRow.Cells.Length - 1 'comment out this line to write a single column only
Cells(r + 1, c + 1) = curHTMLRow.Cells(c).innerText 'for a single column use: Cells(r + 1, c + 1) = curHTMLRow.Cells(7).innerText
Next c 'comment out together with the For c line
Next r
.Quit
End With
Set appIE = Nothing
End Sub
Just the column of interest rather than whole table:
Option Explicit
Sub scrape()
Dim appIE As Object
Dim ihtml As Object
Set appIE = CreateObject("internetexplorer.application")
With appIE
.Visible = True
.navigate "http://www.boliga.dk/salg/resultater?so=1&sort=omregnings_dato-d&maxsaledate=today&iPostnr=&gade=&type=Villa&minsaledate=2017"
While .Busy = True Or .readyState < 4: DoEvents: Wend
Set ihtml = .document
Dim allRowOfData As Object
Set allRowOfData = appIE.document.getElementById("searchresult")
Dim r As Long, c As Long
Dim curHTMLRow As Object
For r = 1 To allRowOfData.Rows.Length - 1
Set curHTMLRow = allRowOfData.Rows(r)
Cells(r + 1, c + 1) = curHTMLRow.Cells(7).innerText
Next r
.Quit
End With
Set appIE = Nothing
End Sub
Reference:
https://www.experts-exchange.com/questions/28571716/Excel-VBA-WEb-Data-Scraping-from-a-Table.html
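For the paging part of the question, one possible approach is to request each results page by URL and append its rows below the previous ones. The sketch below is only an illustration under stated assumptions: the "page" query parameter, the BuildPageUrl helper, and the LAST_PAGE value are hypothetical and not taken from the site, so check the address bar while paging manually and adjust them. The "searchresult" table ID is the one used in the answer above.

```vba
Option Explicit

' ASSUMPTION: the site accepts a "page" query parameter. This name is a guess;
' replace it with whatever the real URL shows when you page manually.
Private Const PAGE_PARAM As String = "page"

' Hypothetical helper: appends the page number to a search URL.
Private Function BuildPageUrl(ByVal baseUrl As String, ByVal pageNo As Long) As String
    Dim sep As String
    sep = IIf(InStr(baseUrl, "?") > 0, "&", "?")
    BuildPageUrl = baseUrl & sep & PAGE_PARAM & "=" & CStr(pageNo)
End Function

Public Sub ScrapeAllPages()
    Const BASE_URL As String = "http://www.boliga.dk/salg/resultater?so=1&sort=omregnings_dato-d&maxsaledate=today&iPostnr=&gade=&type=Villa&minsaledate=2017"
    Const LAST_PAGE As Long = 3   ' ASSUMPTION: set this to the real number of pages

    Dim appIE As Object, tbl As Object
    Dim p As Long, r As Long, c As Long, nextRow As Long

    Set appIE = CreateObject("InternetExplorer.Application")
    nextRow = 2                   ' first sheet row to write to

    For p = 1 To LAST_PAGE
        appIE.navigate BuildPageUrl(BASE_URL, p)
        Do While appIE.Busy Or appIE.readyState <> 4
            DoEvents
        Loop

        Set tbl = appIE.document.getElementById("searchresult")
        If tbl Is Nothing Then Exit For   ' no result table: stop paging

        ' Skip the header row (index 0) and append the data rows.
        For r = 1 To tbl.Rows.Length - 1
            For c = 0 To tbl.Rows(r).Cells.Length - 1
                ActiveSheet.Cells(nextRow, c + 1).Value = tbl.Rows(r).Cells(c).innerText
            Next c
            nextRow = nextRow + 1
        Next r
    Next p

    appIE.Quit
    Set appIE = Nothing
End Sub
```

Reusing one browser instance for all pages avoids starting Internet Explorer more than a hundred times.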

Related

How to scrape post div table content on VBA?

I am new to web-scraping. I am trying to get data from O-net online. To be specific, I would like to pull the median wages and employment stats for a few dozen selected jobs. For example:
https://www.onetonline.org/link/summary/13-2041.00
Upon inspection, the html looks like this
Below is my code though it does not work.
Sub scrape()
Dim appIE As Object
Set appIE = CreateObject("internetexplorer.application")
With appIE
.Navigate "https://www.onetonline.org/link/summary/13-2041.00"
.Visible = False
End With
Do While appIE.Busy
DoEvents
Loop
Set allRowOfData = appIE.document.getElementByID("wrapper_WagesEmployment").getelementbytagname("table")(0)
Dim myValue As String: myValue = allRowOfData.Cells(3).innerHTML
appIE.Quit
Set appIE = Nothing
Range("A1").Value = myValue
End Sub
Please insert Option Explicit at the top of the module; it helps you identify variables that are not declared properly.
You also misspelled the method name: getelementbytagname should be getElementsByTagName.
Try the code below:
Private Sub scrape()
Dim appIE As Object
Set appIE = CreateObject("internetexplorer.application")
With appIE
.navigate "https://www.onetonline.org/link/summary/13-2041.00"
.Visible = False
Do While .Busy Or .readyState <> 4
DoEvents
Loop
End With
Dim ieDoc As Object
Set ieDoc = appIE.Document
Dim allRowOfData As Object
Set allRowOfData = ieDoc.getElementById("wrapper_WagesEmployment").getElementsByTagName("table")(0)
Dim medianWageValue As String
medianWageValue = allRowOfData.getElementsByClassName("report2")(0).innerText
Range("A1").Value = medianWageValue 'Fully qualify your range reference, e.g. Sheet1.Range("A1").Value
Set ieDoc = Nothing
appIE.Quit
Set appIE = Nothing
End Sub
Alternative: you can do the same with XMLHTTP, which does not require opening IE and is usually faster (you will need to add a reference to the Microsoft HTML Object Library):
Private Sub ScrapeByXMLHTTP()
Dim xmlhttp As Object
Set xmlhttp = CreateObject("MSXML2.XMLHTTP")
With xmlhttp
.Open "GET", "https://www.onetonline.org/link/summary/13-2041.00", False 'False = synchronous, so .responseText is ready right after .send
.send
Dim htmlDoc As MSHTML.HTMLDocument
Set htmlDoc = New MSHTML.HTMLDocument
htmlDoc.body.innerHTML = .responseText
End With
Set xmlhttp = Nothing
htmlDoc.body.innerHTML = htmlDoc.getElementById("wrapper_WagesEmployment").getElementsByTagName("table")(0).outerHTML
Dim medianWageValue As String
medianWageValue = htmlDoc.getElementsByClassName("report2")(0).innerText
Set htmlDoc = Nothing
Range("A1").Value = medianWageValue 'Fully qualify your range reference, e.g. Sheet1.Range("A1").Value
End Sub
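Since the question mentions a few dozen jobs, the XMLHTTP approach could be wrapped in a function and called once per occupation code. The following is a minimal sketch, not a definitive implementation. It assumes the occupation codes (such as 13-2041.00) sit in column A starting at A2 of the active sheet, and that the page still uses the wrapper_WagesEmployment ID and report2 class shown above. The names ScrapeSeveralJobs and GetMedianWage are made up for illustration, and a reference to the Microsoft HTML Object Library is required.

```vba
Option Explicit

' Requires a reference to "Microsoft HTML Object Library".
' ASSUMPTION: occupation codes such as 13-2041.00 are listed in A2:A<last row>.
Public Sub ScrapeSeveralJobs()
    Const BASE_URL As String = "https://www.onetonline.org/link/summary/"
    Dim cell As Range, lastRow As Long

    With ActiveSheet
        lastRow = .Cells(.Rows.Count, "A").End(xlUp).Row
        If lastRow < 2 Then Exit Sub          ' nothing below the header row

        For Each cell In .Range("A2:A" & lastRow)
            If Len(cell.Value) > 0 Then
                ' Write the result next to the code, in column B.
                cell.Offset(0, 1).Value = GetMedianWage(BASE_URL & cell.Value)
            End If
        Next cell
    End With
End Sub

' Hypothetical helper: returns the text of the first "report2" element inside
' the wages section, or an empty string if the request or lookup fails.
Private Function GetMedianWage(ByVal url As String) As String
    Dim htmlDoc As MSHTML.HTMLDocument
    Dim node As Object

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", url, False               ' synchronous request
        .send
        If .Status <> 200 Then Exit Function
        Set htmlDoc = New MSHTML.HTMLDocument
        htmlDoc.body.innerHTML = .responseText
    End With

    Set node = htmlDoc.querySelector("#wrapper_WagesEmployment .report2")
    If Not node Is Nothing Then GetMedianWage = node.innerText
End Function
```

Returning an empty string on failure keeps the loop running when a single code has no wage data, at the cost of hiding which request failed.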

Get final URL with Excel VBA

I've been struggling with this issue and trying to find a solution here on StackOverflow, but nothing has helped.
I have thousands of image links (column A), each of which takes you to a final JPG URL. It doesn't seem to be a redirect link, because I've tried different websites and they don't detect a redirect.
Here is an example:
https://www.pepperl-fuchs.com/global/en/doci.htm?docilang=ENG&view=showproductpicbypartno&partno=000046
It will get you here:
https://files.pepperl-fuchs.com/webcat/navi/productInfo/pd/d428540a.jpg
So I would like to extract all the final links into column B.
I found some code that opens IE for each link, but it is probably missing the part that copies the URL and pastes it into the cell:
Sub Test()
Dim IE As Object
Dim URL As Range
Dim objDocument As Object
Dim x As Integer
Set IE = CreateObject("InternetExplorer.Application")
With IE
.Visible = True
For Each URL In Range("A2:A16")
.Navigate URL.Value
While .busy Or .ReadyState <> 4: DoEvents: Wend
If LCase(TypeName(objDocument)) = "htmldocument" Then
Cells(A, 1).Value = objDocument.URL
Cells(A, 2).Value = objDocument.Title
x = x + 1
End If
Next
End With
End Sub
Can you guys help me figure out what is missing? Unfortunately I'm not really familiar with VBA.
Thank you very much
Try this
Sub Test()
Dim IE As Object
Dim URL As Range
Dim objDocument As Object
Dim x As Integer
Set IE = CreateObject("InternetExplorer.Application")
With IE
.Visible = True
For Each URL In Range("A2:A16")
.Navigate URL.Value
While .busy Or .ReadyState <> 4: DoEvents: Wend
URL.Offset(, 1).Value = .LocationURL
Next
End With
End Sub
Try this code
Sub Test()
Dim html As HTMLDocument
Dim ie As Object
Dim objDocument As Object
Dim url As Range
Dim x As Integer
Set ie = CreateObject("InternetExplorer.Application")
x = 1
With ie
.Visible = True
For Each url In Range("A2:A3")
.navigate url.Value
While .Busy Or .readyState <> 4: DoEvents: Wend
Set html = .document
x = x + 1
Cells(x, 2).Value = html.url
Cells(x, 3).Value = html.Title
Next url
End With
End Sub
I would need more links to test with, but this will be a lot faster. You can speed it up further by reading the range into an array and looping over its first dimension instead of looping over the sheet, using Dim arr(): arr = .Range("A1:A" & .Cells(.Rows.Count, "A").End(xlUp).Row).Value.
Option Explicit
Public Sub GetInfo()
Dim rng As Range
With Worksheets("Sheet1")
For Each rng In .Range("A1:A" & .Cells(.Rows.Count, "A").End(xlUp).Row)
If InStr(rng.Value, "http") > 0 Then Debug.Print GetURL(rng.Value)
Next
End With
End Sub
Public Function GetURL(ByVal url As String) As String
Dim sResponse As String, s As Long, e As Long
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", url, False
.send
sResponse = StrConv(.responseBody, vbUnicode)
End With
s = InStr(1, sResponse, "https")
e = InStr(1, sResponse, ".jpg") + 4
GetURL = Mid(sResponse, s, e - s)
End Function
This does assume that all your links follow the same pattern as the first.
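The array variant mentioned above could look roughly like the sketch below. It is an illustration under assumptions, not the answer's exact method: FetchBody and ExtractJpgUrl are hypothetical helper names, the extraction searches backwards from the first ".jpg" for the nearest "https" (a guarded variation on the InStr logic above), and the results overwrite column B from B1 downwards.

```vba
Option Explicit

Public Sub FillFinalUrls()
    Dim arr As Variant, out() As Variant
    Dim i As Long, lastRow As Long

    With Worksheets("Sheet1")
        lastRow = .Cells(.Rows.Count, "A").End(xlUp).Row
        If lastRow < 2 Then Exit Sub      ' a single cell would not come back as an array

        arr = .Range("A1:A" & lastRow).Value          ' read column A once
        ReDim out(1 To UBound(arr, 1), 1 To 1)

        For i = 1 To UBound(arr, 1)
            If InStr(arr(i, 1), "http") > 0 Then
                out(i, 1) = ExtractJpgUrl(FetchBody(CStr(arr(i, 1))))
            End If
        Next i

        ' Write all results back in one go (overwrites B1:B<lastRow>).
        .Range("B1").Resize(UBound(out, 1), 1).Value = out
    End With
End Sub

' Hypothetical helper: downloads the page body as text, as in GetURL above.
Private Function FetchBody(ByVal url As String) As String
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", url, False
        .send
        FetchBody = StrConv(.responseBody, vbUnicode)
    End With
End Function

' Hypothetical helper: returns the https...jpg substring, or "" if none is found.
Public Function ExtractJpgUrl(ByVal html As String) As String
    Dim s As Long, e As Long
    e = InStr(1, html, ".jpg", vbTextCompare)
    If e = 0 Then Exit Function
    s = InStrRev(html, "https", e, vbTextCompare)     ' nearest "https" before ".jpg"
    If s = 0 Then Exit Function
    ExtractJpgUrl = Mid$(html, s, e + 4 - s)
End Function
```

Checking for a zero position before calling Mid$ avoids a runtime error on pages that do not follow the expected pattern.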

Not able to login to my indeed account using VBA

In the code below, the line ieElement.Click, which follows
Set ieElement = objIE.Document.getElementsByClassName("sg-btn sg-btn-primary btn-signin")
throws the error "Object does not support this property or method".
Sub login()
Dim objIE As InternetExplorer
Dim uid As String
Dim pwd As String
Dim rng As Range
Dim sh As Worksheet
Dim ieElement As Object
Set sh = Sheets("Indeed Resume Download")
Set rng = sh.Range("A2")
On Error Resume Next
objIE.Quit
Set objIE = Nothing
On Error GoTo 0
uid = rng.Value
pwd = rng.Offset(0, 1).Value
Dim j As Long
Set objIE = New InternetExplorer 'Initialize internet object
objIE.Navigate "https://secure.indeed.com/account/loginservice=my&hl=en_IN&co=IN&continue=https%3A%2F%2Fwww.indeed.co.in%2F"
objIE.Visible = True
objIE.Document.all.signin_email.Value = UserID
objIE.Document.all.signin_password.Value = Password
Set ieElement = objIE.Document.getElementsByClassName("sg-btn sg-btn-primary btn-signin")
ieElement.Click
For j = 2 To sh.Cells(Rows.Count, 27).End(xlUp).Row
Set objIE = New InternetExplorer
With objIE
.Navigate sh.Range("CA" & j).Value
Do While .Busy = True
DoEvents
Loop
Do While .Busy: Loop
Do While .ReadyState <> READYSTATE_COMPLETE: Loop
Do While .Busy: Loop
End With
Do While objIE.Busy = True
DoEvents
Loop
Next j
Set objIE = Nothing
End Sub
getElementsByClassName returns a collection, so you have to specify which element of it you want, as many elements can share a class name. If the class name is unique (which appears to be the case here), you can simply add (0) to the end. If you want to search through all the matching elements, you can use a For Each...Next statement.
Notice the difference between getElementById() and getElementsByClassName(): Elements is plural when used with ClassName, so you need to designate which element of that class you want.
Try replacing your line with this:
Set ieElement = objIE.Document.getElementsByClassName("sg-btn sg-btn-primary btn-signin")(0)
If you keep the whole collection, as your current code does, loop over it along these lines:
Dim ieElements, ieElemBtn, ieElement
Set ieElements = objIE.Document.getElementsByClassName("sg-btn sg-btn-primary btn-signin")
For Each ieElement in ieElements
If ieElement ..... Then
Set ieElemBtn = ieElement
Exit For
End If
Next ieElement
ieElemBtn.Click
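The indexing advice can be wrapped in two small helpers, one that waits for the page to finish loading and one that clicks the first match only if the collection is not empty. This is a minimal sketch: WaitForLoad and ClickFirstByClass are hypothetical names introduced for illustration, and the class string is the one from the question.

```vba
Option Explicit

' Hypothetical helper: blocks until the browser has finished loading.
Private Sub WaitForLoad(ByVal ie As Object)
    Do While ie.Busy Or ie.readyState <> 4   ' 4 = READYSTATE_COMPLETE
        DoEvents
    Loop
End Sub

' Hypothetical helper: clicks the first element carrying all the given
' class names. Returns False when nothing matches instead of raising an error.
Private Function ClickFirstByClass(ByVal doc As Object, ByVal classNames As String) As Boolean
    Dim matches As Object
    Set matches = doc.getElementsByClassName(classNames)
    If matches.Length > 0 Then
        matches(0).Click
        ClickFirstByClass = True
    End If
End Function

' Usage sketch; objIE is an already navigated InternetExplorer object.
Public Sub ClickSignIn(ByVal objIE As Object)
    WaitForLoad objIE
    If Not ClickFirstByClass(objIE.Document, "sg-btn sg-btn-primary btn-signin") Then
        MsgBox "Sign-in button not found"
    End If
End Sub
```

Waiting before touching objIE.Document matters here because the question's code reads the document immediately after Navigate, when the page may not have loaded yet.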

Error While using Macro to add google's first image link to excel

I'm using the code below to put Google's first image link in B1 for certain values in A1.
Public Sub Test()
Dim IE As InternetExplorer
Dim HTMLdoc As HTMLDocument
Dim imgElements As IHTMLElementCollection
Dim imgElement As HTMLImg
Dim aElement As HTMLAnchorElement
Dim N As Integer, I As Integer
Dim Url As String, Url2 As String
Dim LastRow As Long
Dim m, sImageSearchString
LastRow = Range("A" & Rows.Count).End(xlUp).Row
For I = 1 To LastRow
Url = "http://www.google.co.in/search?q=" & Cells(I, 1) & "&source=lnms&tbm=isch&sa=X&rnd=1"
Set IE = New InternetExplorer
With IE
.Visible = False
.Navigate Url 'sWebSiteURL
Do Until .readyState = 4: DoEvents: Loop
'Do Until IE.document.readyState = "Complete": DoEvents: Loop
Set HTMLdoc = .document
Set imgElements = HTMLdoc.getElementsByTagName("IMG")
N = 1
For Each imgElement In imgElements
If InStr(imgElement.src, sImageSearchString) Then
If imgElement.ParentNode.nodeName = "A" Then
Set aElement = imgElement.ParentNode
Url2 = imgElement.src
N = N + 1
End If
End If
Next
Cells(I, 2) = Url2
IE.Quit
Set IE = Nothing
End With
Next
End Sub
However, I'm receiving the error below. Can you please advise?
I'm using Windows 10, Excel 365.
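In the VBA editor, go to Tools > References and tick Microsoft Internet Controls (the HTMLDocument and HTMLImg declarations additionally need the Microsoft HTML Object Library).
Or use late binding: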
Dim ie As Object
Set ie = CreateObject("InternetExplorer.Application")

How to import data from multiple pages?

I have a small piece of code that lists links in a site.
Sub ListLinks()
'Set a reference to microsoft Internet Controls
Dim IeApp As InternetExplorer
Dim sURL As String
Dim IeDoc As Object
Dim i As Long
Set IeApp = New InternetExplorer
IeApp.Visible = True
sURL = "http://www.sharenet.co.za/v3/q_sharelookup.php"
IeApp.Navigate sURL
Do
Loop Until IeApp.ReadyState = READYSTATE_COMPLETE
Set IeDoc = IeApp.Document
For i = 0 To IeDoc.Links.Length - 1
Cells(i + 1, 1).Value = IeDoc.Links(i).href
Next i
Set IeApp = Nothing
End Sub
That is very useful for listing all links in a site. How can I loop through these URLs and import data from each one?
For instance, the first link under 'Name or Sector' is this:
http://www.sharenet.co.za/v3/sharesfound.php?ssector=0533&exch=JSE&bookmark=Oil & Gas&scheme=default
There is actually nothing to import from there. The next link has some data:
http://www.sharenet.co.za/v3/sharesfound.php?ssector=0537&exch=JSE&bookmark=Oil%20-%20Integrated&scheme=default
The data from there looks like this:
Name Full Name Code Sector
SACOIL-N Sacoil Holdings Ltd NPL SCLN 0537
ERIN Erin Energy Corporation ERN 0537
BEE-SASOL BEE - SASOL LIMITED SOLBE1 0537
SACOIL SACOIL HOLDINGS LD SCL 0537
OANDO OANDO PLC OAO 0537
OANDORIGT OANDO PLC RIGT OAON 0537
MONTAUK Montauk Holdings Ltd MNK 0537
How can I import that data from each link?
This seems to work pretty well. It may need a bit of fine tuning, but this should be pretty darn close.
Sub ListLinks()
'Set a reference to microsoft Internet Controls
Dim IeApp As InternetExplorer
Dim sURL As String
Dim IeDoc As Object
Dim i As Long
Set IeApp = New InternetExplorer
IeApp.Visible = True
sURL = "http://www.sharenet.co.za/v3/q_sharelookup.php"
IeApp.Navigate sURL
Do
Loop Until IeApp.ReadyState = READYSTATE_COMPLETE
Set IeDoc = IeApp.Document
For i = 0 To IeDoc.Links.Length - 1
Cells(i + 1, 1).Value = IeDoc.Links(i).href
Next i
Set IeApp = Nothing
Call CopyFromURL
End Sub
Public Sub CopyFromURL()
Dim IE As InternetExplorer, doc As HTMLDocument
Dim thisClass As IHTMLElement2, thisLink As IHTMLElement
Dim rng As Range, cell As Range
Const READYSTATE_COMPLETE As Integer = 4
Dim TR_col As Object, TR As Object
Dim TD_col As Object, TD As Object
Dim row As Long, col As Long
row = 1
Set rng = Range("A1:A5")
For Each cell In rng
Set IE = CreateObject("InternetExplorer.Application")
IE.Navigate cell
Do While (IE.Busy Or IE.ReadyState <> READYSTATE_COMPLETE)
DoEvents
Loop
Set TR_col = IE.Document.getElementsByTagName("TR")
For Each TR In TR_col
Set TD_col = TR.getElementsByTagName("TD")
col = 2
For Each TD In TD_col
Cells(row, col) = TD.innerText
col = col + 1
Next
col = 2
row = row + 1
Next
IE.Quit 'close this link's browser before the next one is created
Next cell
End Sub