Excel VBA Web Scraping - vba

I'm new to Excel VBA programming and need some help to debug my code for web scraping some data off of Amazon.
On Sheet1 I list the ASINs of products, and I'm trying to use this code to complete the URL that lands on each product page and then read the product's sales rank via its class, ID or another HTML tag.
However, I keep getting the error:
User-Defined Type not Defined
I can't figure out where the bug is.
The code is listed below:
Private Sub Worksheet_Change(ByVal Target As Range)
If Target.Row = Range("azAsin").Row And Target.Column = Range("azAsin").Column Then
Dim IE As New InternetExplorer
IE.Visible = True
IE.Navigate "https://www.amazon.com/dp/" & Range("azAsin").Value
Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
Dim pap As HTMLDocument
Set pap = IE.document
Dim sDoc As String
sDoc = Doc.getElementsById("SalesRank")
MsgBox sDoc
End If
End Sub

Try this:
Dim ie As Object
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
'...
Dim objRank as Object
Set objRank = ie.Document.getElementById("SalesRank")
Dim rank As String
If Not objRank Is Nothing Then rank = objRank.innerText

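Here is a fully late-bound version that also removes the need for a browser. Late binding also avoids the "User-defined type not defined" compile error, which appears when early-bound types such as InternetExplorer or HTMLDocument are used without the Microsoft Internet Controls and Microsoft HTML Object Library references being set.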
Observations on your code:
The other answer has corrected, but not mentioned, that in your existing script you attempt:
Dim sDoc As String
sDoc = Doc.getElementsById("SalesRank")
getElementsById is invalid: Id should be singular, so the method is getElementById. It also returns an object, not a string, so you would need ie.Document.getElementById("SalesRank").innerText to get a string. It is better, though, to set the result to an object first and test If Not obj Is Nothing Then, as the other answer has done.
Code:
Option Explicit
Private Sub Worksheet_Change(ByVal Target As Range)
Dim html As Object, i As Long, URL As String
URL = "https://www.amazon.com/dp/" & Range("azAsin").Value
If Target.Row = Range("azAsin").Row And Target.Column = Range("azAsin").Column Then
With CreateObject("MSXML2.serverXMLHTTP")
.Open "GET", URL, False
.send
Set html = CreateObject("HTMLFile")
html.body.innerHTML = .responseText
End With
For i = 0 To html.all.Length - 1
If html.all(i).ID = "SalesRank" Then MsgBox html.all(i).innerText
Next i
End If
End Sub
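The Nothing check discussed above can be wrapped in a small reusable function. This is only a sketch: InnerTextById is a hypothetical helper name, and it assumes the document passed in exposes getElementById, as both the IE document and the late-bound HTMLFile object do.

```vba
Option Explicit

' Hypothetical helper: returns the inner text of the element with the given id,
' or an empty string when the page contains no such element.
Public Function InnerTextById(ByVal doc As Object, ByVal elementId As String) As String
    Dim el As Object
    Set el = doc.getElementById(elementId)
    ' getElementById returns Nothing when the id is not found,
    ' so test before touching any property
    If Not el Is Nothing Then InnerTextById = el.innerText
End Function
```

It could then be called as MsgBox InnerTextById(html, "SalesRank"), which shows an empty message box instead of raising an error if Amazon's markup has no element with that id.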

Related

Can't get rid of "old format or invalid type library" error in vba

I've written a script in VBA to get some set names from a webpage. The script fetches them as expected until it hits an error somewhere during execution. This is the first time I've encountered this error.
The script collects all the links under Company Sets, follows each one a layer deeper, then follows every link under Set Name another layer deeper, and would finally parse the table found there. To keep the script short, I parse only the name of the PUBLISHED SET (stored in the variable bName) instead of the table. I used IE to get the PUBLISHED SET because a few of the links were causing encoding issues.
I searched everywhere for a workaround, but with no luck.
However, I came across this thread with a proposed solution written in VB, but I can't figure out how to make it work in VBA.
Script I'm trying with:
Sub FetchRecords()
Const baseUrl$ = "https://www.psacard.com"
Const link = "https://www.psacard.com/psasetregistry/baseball/company-sets/16"
Dim IE As New InternetExplorer, Htmldoc As HTMLDocument
Dim Http As New XMLHTTP60, Html As New HTMLDocument, bName$, tRow As Object
Dim post As Object, elem As Object, posts As Object, I&, R&, C&
Dim key As Variant
Dim idic As Object: Set idic = CreateObject("Scripting.Dictionary")
With Http
.Open "GET", link, False
.send
Html.body.innerHTML = .responseText
End With
Set posts = Html.querySelectorAll(".dataTable tr td a[href*='/psasetregistry/baseball/company-sets/']")
For I = 0 To posts.Length - 7
idic(baseUrl & Split(posts(I).getAttribute("href"), "about:")(1)) = 1
Next I
For Each key In idic.Keys
With Http
.Open "GET", key, False
.send
Html.body.innerHTML = .responseText
End With
For Each post In Html.getElementsByTagName("a")
If InStr(post.getAttribute("title"), "Contact User") > 0 Then
If InStr(post.ParentNode.getElementsByTagName("a")(0).getAttribute("href"), "publishedset") > 0 Then
IE.Visible = True
IE.navigate baseUrl & Split(post.ParentNode.getElementsByTagName("a")(0).getAttribute("href"), "about:")(1)
While IE.Busy = True Or IE.readyState < 4: DoEvents: Wend
Set Htmldoc = IE.document
bName = Htmldoc.querySelector("h1 b.text-primary").innerText
If InStr(bName, "/") > 0 Then bName = Split(Htmldoc.querySelector(".inline-block a[href*='/contactuser/']").innerText, " ")(1)
R = R + 1: Cells(R, 1) = bName
End If
End If
Next post
Next key
IE.Quit
End Sub
The error points at the following line after somewhere between 70 and 90 records have been extracted:
bName = Htmldoc.querySelector("h1 b.text-primary").innerText
The error looks like:
Automation Error: old format or invalid type library
Proposed solution in the linked thread, written in VB.NET (I can't convert it to VBA):
'save the current settings for easier restoration later
Dim oldCI As System.Globalization.CultureInfo = _
System.Threading.Thread.CurrentThread.CurrentCulture
'change the settings
System.Threading.Thread.CurrentThread.CurrentCulture = _
New System.Globalization.CultureInfo("en-US")
'Your code here
'restore the previous settings
System.Threading.Thread.CurrentThread.CurrentCulture = oldCI

IE Web Automation - How to auto select value from combo box using Excel VBA/XML Macro

I'm a beginner in VBA and I've failed to select a country name automatically in a web combo box or list box from my Excel spreadsheet. My code only types the country name in; it does not select it.
How can I change this code so that it picks the country name from my Excel spreadsheet and selects it in the web combo box, in a loop? The passport number, DOB and nationality in my code are correct. If you fill in the form manually, you can find the work permit number, which I need to capture in my spreadsheet. A Chrome Inspect Element screenshot is attached.
My code is as follows:
Sub MOL()
Dim IE As New SHDocVw.InternetExplorer
Dim Doc As MSHTML.HTMLDocument
Dim Buttons As MSHTML.IHTMLElementCollection
Dim Button As MSHTML.IHTMLElement
Dim HTMLInput As MSHTML.IHTMLElement
Dim Tags As MSHTML.IHTMLElement
Dim HTMLTables As MSHTML.IHTMLElementCollection
Dim HTMLTable As MSHTML.IHTMLElement
Dim HTMLRow As MSHTML.IHTMLElement
Dim HTMLCell As MSHTML.IHTMLElement
Dim Alltext As IHTMLElementCollection
Application.ScreenUpdating = False
'Application.Calculation = xlCalculationManual
'Application.EnableEvents = False
On Error Resume Next
IE.Visible = True
IE.navigate "https://eservices.mol.gov.ae/SmartTasheel/Complain/IndexLogin?lang=en-gb"
Do While IE.readyState <> READYSTATE_COMPLETE: Loop
Set Doc = IE.document
Set Buttons = Doc.getElementsByTagName("Button")
Buttons(2).Click
Do While IE.readyState <> READYSTATE_INTERACTIVE = 3: Loop
Set HTMLInputs = Doc.getElementsByTagName("Input")
HTMLInputs(46).Value = "somevalue"
HTMLInputs(48).Value = "24/02/1990"
HTMLInputs(47).Value = "India"
Buttons(21).Click
End Sub
The solution you're looking for is a bit difficult to provide. There are a few tricky parts to get past in order to select the NATIONALITY from the dropdown. I've used .querySelector() within the script to keep it concise. It should serve your purpose whichever country you want to select from the dropdown. Give it a shot:
Sub GetInfo()
Dim IE As New InternetExplorer, HTML As HTMLDocument, post As Object, URL$
URL = "https://eservices.mol.gov.ae/SmartTasheel/Complain/IndexLogin?lang=en-gb"
With IE
.Visible = True
.navigate URL
While .Busy = True Or .readyState < 4: DoEvents: Wend
Set HTML = .document
HTML.getElementById("TransactionInfo_WorkPermitNumber").innerText = "2659558"
HTML.querySelector("button[ng-click='showEmployeeSearch()']").Click
Application.Wait Now + TimeValue("00:00:03") ''If for some reason the script fails, make sure to increase the delay
HTML.getElementById("txtPassportNumber").Value = "J2659558"
HTML.getElementById("Nationality").Focus
For Each post In HTML.getElementsByClassName("ng-scope")
With post.getElementsByClassName("ng-binding")
For I = 0 To .Length - 1
If .item(I).innerText = "INDIA" Then ''you can change the country name here to select from dropdown
.item(I).Click
Exit For
End If
Next I
End With
Next post
HTML.getElementById("txtBirthDate").Value = "24/02/1990"
HTML.querySelector("button[onclick='SearchEmployee()']").Click
End With
End Sub
Reference to add to the library:
Microsoft Internet Controls
Microsoft HTML Object library
When you execute the above script, it should give you the desired result.
Another way is to use an XMLHTTP request, which is much faster than IE. You need to send the search parameters as a JSON string in the body of a "POST" request. If you want to change a parameter such as the birth date, passport or nationality, just do it in the QueryString. Btw, the Nationality parameter should be filled in with the value instead of the name, e.g. 100 for INDIA. This is how your script should look:
Sub Get_Data()
Dim res As Variant, QueryString$, ID$, Name$
QueryString = "{""PersonPassportNumber"":""J2659558"",""PersonNationality"":""100"",""PersonBirthDate"":""24/02/1990""}"
With New XMLHTTP60
.Open "POST", "https://eservices.mol.gov.ae/SmartTasheel/Dashboard/GetEmployees", False
.setRequestHeader "User-Agent", "Mozilla/5.0"
.setRequestHeader "Content-Type", "application/json"
.send QueryString
res = .responseText
End With
ID = Split(Split(Split(res, "Employees"":")(1), "ID"":""")(1), """,")(0)
Name = Split(Split(Split(res, "Employees"":")(1), "OtherData2"":""")(1), """}")(0)
[A1] = ID: [B1] = Name
End Sub
Reference to add to the library:
Microsoft XML, V6.0
Running the above script, you should get the NAME and ID of your required search.
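To run the request in a loop over spreadsheet rows, the JSON body can be assembled from values rather than hard-coded. A minimal sketch, assuming the endpoint accepts exactly the three fields shown in the QueryString above; BuildQueryString is a hypothetical helper name, and the values are inserted as-is without escaping any quotes they might contain.

```vba
Option Explicit

' Hypothetical helper: assembles the JSON body used in the POST request above.
' nationalityCode is the numeric value (e.g. "100" for INDIA), not the country name.
Public Function BuildQueryString(ByVal passportNumber As String, _
                                 ByVal nationalityCode As String, _
                                 ByVal birthDate As String) As String
    ' Doubled quotes ("") produce a literal quote character in the result
    BuildQueryString = "{""PersonPassportNumber"":""" & passportNumber & _
                       """,""PersonNationality"":""" & nationalityCode & _
                       """,""PersonBirthDate"":""" & birthDate & """}"
End Function
```

Calling BuildQueryString("J2659558", "100", "24/02/1990") yields the same string that is assigned to QueryString in the script above, so the result can be passed straight to .send.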

Unable to fetch some content using createDocumentFromUrl in vba

I've written some code in VBA to get the movie names from a torrent website using the .createDocumentFromUrl() method. I've never worked with this method before and haven't found any useful information on how to use it, so I'm stuck. I have tried to create a script, though.
Here is my try:
Sub Fromurl_Method()
Const URL As String = "https://yts.am/browse-movies"
Dim hStart As HTMLDocument, hdoc As HTMLDocument
Set hStart = New HTMLDocument
Set hdoc = hStart.createDocumentFromUrl(URL, vbNullString)
Do While hdoc.readyState = "loading" Or hdoc.readyState = "interactive": DoEvents: Loop
Debug.Print hdoc.DocumentElement.innerHTML
End Sub
When I execute the above script, it fetches some HTML elements which are not from that website. I took a closer look at the elements I had parsed and noticed this line at the top: "This content cannot be displayed in a frame." It behaves the same way with most sites. How can I make it work? Thanks in advance.
Once again, my intention is to parse all the movie names from that site using .createDocumentFromUrl().
Sub Get_Info()
Dim Elems, e As Variant
Const READYSTATE_COMPLETE& = 4&
Dim ie As Object
Set ie = Nothing
DoEvents
Set ie = CreateObject("InternetExplorer.Application")
DoEvents
With ie
.Visible = false
.Navigate "https://yts.am/browse-movies"
While Not .readyState = READYSTATE_COMPLETE
DoEvents
Wend
End With
Dim i As Double
With ie.Document
Set Elems = .getElementsByTagName("a")
DoEvents
i = 2
For Each e In Elems
If e.getAttribute("class") = "browse-movie-title" Then
Range("A" & i).Value = e.innerText
i = i + 1
End If
Next e
End With
Set Elems = Nothing
Set e = Nothing
ie.Quit
Set ie = Nothing
End Sub
The code above will give you a list of all the movies. If you only need the first one, modify the loop to stop after the first match.

VBA: Run-time error 424: Object required when trying to web scrape

I'm trying to update various fund sizes using morningstar.co.uk. The code worked fine until it suddenly stopped and gave an error:
"Run-time error 424: Object required".
The exact line where the error occurs is:
Set allData = IE.document.getElementById("overviewQuickstatsDiv").getElementsByTagName("tbody")(0)
The idea is to ultimately scan the whole "tbody"-tag and look for the line "Fund Size" inside "tr" and "td"-tags. When "Fund Size" is found, the code would return the 3rd "td"-tag (actual fund size).
After this I'd add a loop to loop through a list of funds that I've got.
As the code stopped working completely, I haven't got this far yet. Here I'm just trying to check if the code returns the actual fund size.
Since there are not always 3 "td"-tags inside the "tr"-tags, I'll still have to construct some sort of IF-statement to fix that issue.
But for now I'd just like to know how I could get the code running again. I've spent a great deal of time searching for an answer, but since this seems to be a variable type problem, the solution depends on the situation.
I'm using Excel 2010 and Internet Explorer 11.
URL in easy form to copy-paste:
http://www.morningstar.co.uk/uk/funds/snapshot/snapshot.aspx?id=F0GBR04BKW
Sub testToScrapeWholeTbodyTag()
'Microsoft Internet Controls
'Microsoft HTML Object Library
'Microsoft Shell Controls and Automation
'======Opens URL======
Dim IE As Object
Set IE = CreateObject("internetexplorer.application")
With IE
.navigate "http://www.morningstar.co.uk/uk/funds/snapshot/snapshot.aspx?id=F0GBR04BKW"
.Visible = False
End With
While IE.Busy
DoEvents
Wend
'======Got from internet, fixed a previous error. However, I'm not 100% sure what this does======
Dim sh
Dim eachIE As Object
Do
Set sh = New Shell32.Shell
For Each eachIE In sh.Windows
If InStr(1, eachIE.LocationURL, "http://www.morningstar.co.uk/uk/funds/snapshot/snapshot.aspx?id=F0GBR04BKW") Then
Set IE = eachIE
IE.Visible = False '"This is here because in some environments, the new process defaults to Visible."
Exit Do
End If
Next eachIE
Loop
Set eachIE = Nothing
Set sh = Nothing
'======Looks for the "Fund Size"======
'Trying to look for "Fund Size" inside "tr"-tag and if found, return the value in the 3rd "td"-tag
Set allData = IE.document.getElementById("overviewQuickstatsDiv").getElementsByTagName("tbody")(0) 'Run-time error 424: Object required
row1 = allData.getElementsByTagName("tr")(5).Cells(0).innerHTML
row2 = allData.getElementsByTagName("tr")(5).Cells(1).innerHTML
row3 = allData.getElementsByTagName("tr")(5).Cells(2).innerHTML
If Left(row1, 9) = "Fund Size" Then
Worksheets("Sheet3").Range("B3") = Split(row3, ";")(1)
End If
Debug.Print allData.getElementsByTagName("tr")(5).Cells(0).innerHTML '"Fund Size"
Debug.Print allData.getElementsByTagName("tr")(5).Cells(2).innerHTML 'Actual fund size
IE.Quit
Set IE = Nothing
End Sub
EDIT:
Switched method. Now the problem is to get the fund size extracted. So the below code works as it is but I'd need to add a couple of lines to get the fund size out of it. This is my first time using this method so it may well be that I've just not understood some really basic thing. Still, I wasn't able to find a solution to this on my own.
Sub XMLhttpRequestTest()
'Microsoft XML, v 6.0
'Microsoft HTML object library
Dim HTMLDoc As New HTMLDocument
Dim ohttp As New MSXML2.XMLHTTP60
Dim myurl As String
Dim TRelements As Object
Dim TRelement As Object
myurl = "http://www.morningstar.co.uk/uk/funds/snapshot/snapshot.aspx?id=F0GBR04BKW"
ohttp.Open "GET", myurl, False
ohttp.send
HTMLDoc.body.innerHTML = ohttp.responseText
With HTMLDoc.body
Set TRelements = .getElementsByTagName("tr")
For Each TRelement In TRelements
Debug.Print TRelement.innerText
Next
End With
End Sub
You can use a CSS selector of
#overviewQuickstatsDiv td.line.text
and then select the element at index 4.
In the selector, # matches an id and . matches a class name.
Option Explicit
Public Sub XMLhttpRequestTest()
'Microsoft XML, v 6.0
'Microsoft HTML object library
Dim HTMLDoc As New HTMLDocument, ohttp As New MSXML2.XMLHTTP60
Const URL As String = "http://www.morningstar.co.uk/uk/funds/snapshot/snapshot.aspx?id=F0GBR04BKW"
Dim TRelements As Object, TRelement As Object
With ohttp
.Open "GET", URL, False
.send
HTMLDoc.body.innerHTML = .responseText
Debug.Print HTMLDoc.querySelectorAll("#overviewQuickstatsDiv td.line.text")(4).innerText
'Your other stuff
End With
End Sub
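To see how the selector above matches elements, it can be tried against a small in-memory document. This is only a sketch: SelectorDemo is a hypothetical name and the markup is made up to mirror the structure the selector implies, not copied from the Morningstar page. It needs a reference to Microsoft HTML Object Library.

```vba
Option Explicit

' Requires a reference to Microsoft HTML Object Library.
Public Sub SelectorDemo()
    Dim doc As New MSHTML.HTMLDocument
    Dim cells As Object, i As Long
    ' Made-up markup: one div with an id, holding cells with different class lists
    doc.body.innerHTML = "<div id=""overviewQuickstatsDiv""><table><tbody>" & _
        "<tr><td class=""line heading"">Fund Size</td>" & _
        "<td class=""line text"">GBP 123 m</td></tr>" & _
        "</tbody></table></div>"
    ' #id selects the div; td.line.text selects td elements carrying both classes
    Set cells = doc.querySelectorAll("#overviewQuickstatsDiv td.line.text")
    ' Loop by index rather than For Each over the returned collection
    For i = 0 To cells.Length - 1
        Debug.Print cells.Item(i).innerText
    Next i
End Sub
```

Only the second cell is printed, because the first one has the class line but not text. Checking cells.Length before reading a fixed index such as 4 also guards against errors if the page layout changes.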

Get ticker price within html class issue. Creating a stock ticker through InternetExplorer.application

I'm trying to create my own price ticker. I now want to get the ticker prices for indexes and trust funds, which complicates things. I want to access the ticker price at the following URL: https://www.avanza.se/fonder/om-fonden.html/313047/norron-active-r
The issue is getting to the 180,05 (under "NAV-kurs") ticker price, which is under the HTML class "SText bold" (right-click the price and select view component).
I can do individual stocks through Yahoo Finance with getElementById, but how do I access the innerText for an HTML class? I tried getElementsByClassName but can't find any property that works with it.
Code below:
Private Sub get_ticker()
Dim ie_app As InternetExplorer
Dim ie_doc As htmldocument
Dim ticker As String
Set ie_app = CreateObject("internetexplorer.application")
ie_app.Visible = True
ie_app.navigate ("https://www.avanza.se/fonder/om-fonden.html/313047/norron-active-r")
Do Until ie_app.readyState = READYSTATE_COMPLETE
DoEvents
Loop
Set ie_doc = ie_app.document
ticker = ie_doc.getelement <<---- gaah
Debug.Print ticker
End Sub
If you are reluctant to use JavaScript at all, you can try something like this.
Note: this code relies on early binding and requires both the Microsoft XML (v6.0) and Microsoft HTML Object Library to be ticked in your references.
Sub getPrice()
Dim xhr As MSXML2.XMLHTTP60
Dim doc As MSHTML.HTMLDocument
Dim XSText As MSHTML.IHTMLElementCollection
Dim elt As MSHTML.HTMLDivElement
Dim parentElt As MSHTML.HTMLLIElement
Dim myPrice As Single
Set xhr = New MSXML2.XMLHTTP60
With xhr
.Open "GET", "https://www.avanza.se/fonder/om-fonden.html/313047/norron-active-r", False
.send
If .readyState = 4 And .Status = 200 Then
Set doc = New MSHTML.HTMLDocument
doc.body.innerHTML = .responseText
Else
MsgBox "xhr error"
Exit Sub
End If
End With
set xhr = Nothing
Set XSText = doc.getElementsByClassName("XSText")
For Each elt In XSText
If InStr(elt.innerHTML, "NAV SEK") <> 0 Then
Set parentElt = elt.parentElement
myPrice = CSng(parentElt.getElementsByClassName("SText bold")(0).innerHTML)
End If
Next
MsgBox myPrice
End Sub
Cheat - That page contains jQuery which is much nicer to use to select elements with no ID, E.g. tell it to look for a div of class XSText containing NAV SEK : div.XSText:contains('NAV SEK') and read the text of the next element:
...
Set ie_doc = ie_app.Document
''Create a new element in the document we can read from:
Dim tempInput As HTMLInputElement
Set tempInput = ie_doc.createElement("input")
tempInput.Type = "hidden"
tempInput.ID = "tempInput"
'' add it to the document
ie_doc.body.appendChild tempInput
'' use jQuery to lookup the value and assign it to the temp input
ie_doc.parentWindow.execScript("$('#tempInput').val($( ""div.XSText:contains('NAV SEK')"" ).next().text())")
'' read the value
msgbox tempInput.Value