VBA: Run-time error 424: Object required when trying to web scrape

I'm trying to update various fund sizes using morningstar.co.uk. The code worked fine until it suddenly stopped working and gave the error:
"Run-time error 424: Object required".
The exact line where the error occurs is:
Set allData = IE.document.getElementById("overviewQuickstatsDiv").getElementsByTagName("tbody")(0)
The idea is to scan the whole "tbody" tag and look for the line "Fund Size" inside the "tr" and "td" tags. When "Fund Size" is found, the code returns the third "td" tag (the actual fund size).
After that I'd add a loop over the list of funds I've got.
Since the code stopped working completely, I haven't got that far yet. Here I'm just trying to check whether the code returns the actual fund size.
Since there are not always three "td" tags inside the "tr" tags, I'll still have to write some sort of If statement to handle that.
But for now I'd just like to know: how can I get the code running again? I've spent a great deal of time searching for an answer, but as this seems to be a variable-type problem, the solution depends on the situation.
I'm using Excel 2010 and Internet Explorer 11.
URL in easy form to copy-paste:
http://www.morningstar.co.uk/uk/funds/snapshot/snapshot.aspx?id=F0GBR04BKW
Sub testToScrapeWholeTbodyTag()
    'Microsoft Internet Controls
    'Microsoft HTML Object Library
    'Microsoft Shell Controls and Automation
    '======Opens URL======
    Dim IE As Object
    Set IE = CreateObject("internetexplorer.application")
    With IE
        .navigate "http://www.morningstar.co.uk/uk/funds/snapshot/snapshot.aspx?id=F0GBR04BKW"
        .Visible = False
    End With
    While IE.Busy
        DoEvents
    Wend
    '======Got from internet, fixed a previous error. However, I'm not 100% sure what this does======
    Dim sh
    Dim eachIE As Object
    Do
        Set sh = New Shell32.Shell
        For Each eachIE In sh.Windows
            If InStr(1, eachIE.LocationURL, "http://www.morningstar.co.uk/uk/funds/snapshot/snapshot.aspx?id=F0GBR04BKW") Then
                Set IE = eachIE
                IE.Visible = False '"This is here because in some environments, the new process defaults to Visible."
                Exit Do
            End If
        Next eachIE
    Loop
    Set eachIE = Nothing
    Set sh = Nothing
    '======Looks for the "Fund Size"======
    'Trying to look for "Fund Size" inside "tr"-tag and if found, return the value in the 3rd "td"-tag
    Set allData = IE.document.getElementById("overviewQuickstatsDiv").getElementsByTagName("tbody")(0) 'Run-time error 424: Object required
    row1 = allData.getElementsByTagName("tr")(5).Cells(0).innerHTML
    row2 = allData.getElementsByTagName("tr")(5).Cells(1).innerHTML
    row3 = allData.getElementsByTagName("tr")(5).Cells(2).innerHTML
    If Left(row1, 9) = "Fund Size" Then
        Worksheets("Sheet3").Range("B3") = Split(row3, ";")(1)
    End If
    Debug.Print allData.getElementsByTagName("tr")(5).Cells(0).innerHTML '"Fund Size"
    Debug.Print allData.getElementsByTagName("tr")(5).Cells(2).innerHTML 'Actual fund size
    IE.Quit
    Set IE = Nothing
End Sub
EDIT:
I switched methods. Now the problem is extracting the fund size: the code below works as it is, but I need to add a couple of lines to get the fund size out of it. This is my first time using this method, so I may simply have missed something basic. Still, I wasn't able to find a solution on my own.
Sub XMLhttpRequestTest()
    'Microsoft XML, v 6.0
    'Microsoft HTML object library
    Dim HTMLDoc As New HTMLDocument
    Dim ohttp As New MSXML2.XMLHTTP60
    Dim myurl As String
    Dim TRelements As Object
    Dim TRelement As Object
    myurl = "http://www.morningstar.co.uk/uk/funds/snapshot/snapshot.aspx?id=F0GBR04BKW"
    ohttp.Open "GET", myurl, False
    ohttp.send
    HTMLDoc.body.innerHTML = ohttp.responseText
    With HTMLDoc.body
        Set TRelements = .getElementsByTagName("tr")
        For Each TRelement In TRelements
            Debug.Print TRelement.innerText
        Next
    End With
End Sub
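Error 424 ("Object required") on the getElementById line is consistent with getElementById returning Nothing, for example because the page had not finished loading (the original loop waits only on IE.Busy, not on readyState) or because the element's id changed; the chained .getElementsByTagName call then has no object to act on. A minimal sketch of a more defensive version, assuming late binding and the overviewQuickstatsDiv id from the question (WaitForIE and GetFirstTbody are hypothetical helper names, not part of any library):

```vba
' Hypothetical helper: waits until IE reports it is idle and fully loaded.
Public Sub WaitForIE(ByVal ie As Object)
    Do While ie.Busy Or ie.readyState <> 4   ' 4 = READYSTATE_COMPLETE
        DoEvents
    Loop
End Sub

' Hypothetical helper: returns the first tbody inside the element with the
' given id, or Nothing if the document, the element or the tbody is missing,
' instead of letting a chained call fail with error 424.
Public Function GetFirstTbody(ByVal doc As Object, ByVal containerId As String) As Object
    Dim container As Object, bodies As Object

    If doc Is Nothing Then Exit Function
    Set container = doc.getElementById(containerId)
    If container Is Nothing Then Exit Function   ' element not in the DOM (yet)

    Set bodies = container.getElementsByTagName("tbody")
    If bodies.Length = 0 Then Exit Function       ' container holds no tbody

    Set GetFirstTbody = bodies(0)
End Function
```

In the original Sub this would become `WaitForIE IE` followed by `Set allData = GetFirstTbody(IE.document, "overviewQuickstatsDiv")`, with an `If allData Is Nothing Then` check before any rows are read, so a missing element becomes a condition you can handle rather than a run-time error.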

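You can use a CSS selector of
#overviewQuickstatsDiv td.line.text
and then select the element at index 4 (the collection is zero-based, so this is the fifth match).
In a CSS selector, # selects by id and . selects by class name, so td.line.text matches td elements that have both the "line" and "text" classes.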
Option Explicit
Public Sub XMLhttpRequestTest()
    'Microsoft XML, v 6.0
    'Microsoft HTML object library
    Dim HTMLDoc As New HTMLDocument, ohttp As New MSXML2.XMLHTTP60
    Const URL As String = "http://www.morningstar.co.uk/uk/funds/snapshot/snapshot.aspx?id=F0GBR04BKW"
    Dim TRelements As Object, TRelement As Object
    With ohttp
        .Open "GET", URL, False
        .send
        HTMLDoc.body.innerHTML = .responseText
        Debug.Print HTMLDoc.querySelectorAll("#overviewQuickstatsDiv td.line.text")(4).innerText
        'Your other stuff
    End With
End Sub
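Index 4 depends on the exact page layout, and the question notes that rows don't always contain three td cells. An alternative is to search the rows for the "Fund Size" label, as the question originally intended. A sketch under these assumptions, taken from the question's description: the label sits in the first td of a row inside #overviewQuickstatsDiv and the value in the third td. FindFundSize is a hypothetical helper name:

```vba
' Hypothetical helper: returns the text of the third td in the row of
' #overviewQuickstatsDiv whose first td starts with "Fund Size",
' or an empty string if the container, the row or the third cell is missing.
Public Function FindFundSize(ByVal doc As Object) As String
    Dim container As Object, tableRows As Object, rowCells As Object
    Dim i As Long

    Set container = doc.getElementById("overviewQuickstatsDiv")
    If container Is Nothing Then Exit Function

    Set tableRows = container.getElementsByTagName("tr")
    For i = 0 To tableRows.Length - 1
        Set rowCells = tableRows(i).getElementsByTagName("td")
        ' Rows do not always have three cells, so check before indexing
        If rowCells.Length >= 3 Then
            ' & vbNullString guards against a Null innerText
            If Left$(Trim$(rowCells(0).innerText & vbNullString), 9) = "Fund Size" Then
                FindFundSize = Trim$(rowCells(2).innerText & vbNullString)
                Exit Function
            End If
        End If
    Next i
End Function
```

With the XMLHTTP code above, `Debug.Print FindFundSize(HTMLDoc)` could stand in for the fixed-index line. Note that Trim$ does not strip non-breaking spaces (Chr(160)), which the page may use between the currency and the amount.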

Related

Can't get rid of "old format or invalid type library" error in vba

I've written a script in VBA to get some set names from a webpage, and the script retrieves them correctly until it throws an error partway through the run. This is the first time I have encountered such an error.
My script gets all the links under Company Sets, follows each of them one layer deep, then follows all the links under Set Name another layer deep, and finally parses the table there. Because the script is getting big, for now I only parse the name of the PUBLISHED SET, stored in the variable bName, instead of the table. I used IE to get the PUBLISHED SET because a few leads were causing encoding issues.
I have searched everywhere for a workaround, with no luck.
However, I came across this thread, which proposes a solution written in VB.NET, but I can't figure out how to make it work in VBA.
Script I'm trying with:
Sub FetchRecords()
    Const baseUrl$ = "https://www.psacard.com"
    Const link = "https://www.psacard.com/psasetregistry/baseball/company-sets/16"
    Dim IE As New InternetExplorer, Htmldoc As HTMLDocument
    Dim Http As New XMLHTTP60, Html As New HTMLDocument, bName$, tRow As Object
    Dim post As Object, elem As Object, posts As Object, I&, R&, C&
    Dim key As Variant
    Dim idic As Object: Set idic = CreateObject("Scripting.Dictionary")
    With Http
        .Open "GET", link, False
        .send
        Html.body.innerHTML = .responseText
    End With
    Set posts = Html.querySelectorAll(".dataTable tr td a[href*='/psasetregistry/baseball/company-sets/']")
    For I = 0 To posts.Length - 7
        idic(baseUrl & Split(posts(I).getAttribute("href"), "about:")(1)) = 1
    Next I
    For Each key In idic.Keys
        With Http
            .Open "GET", key, False
            .send
            Html.body.innerHTML = .responseText
        End With
        For Each post In Html.getElementsByTagName("a")
            If InStr(post.getAttribute("title"), "Contact User") > 0 Then
                If InStr(post.ParentNode.getElementsByTagName("a")(0).getAttribute("href"), "publishedset") > 0 Then
                    IE.Visible = True
                    IE.navigate baseUrl & Split(post.ParentNode.getElementsByTagName("a")(0).getAttribute("href"), "about:")(1)
                    While IE.Busy = True Or IE.readyState < 4: DoEvents: Wend
                    Set Htmldoc = IE.document
                    bName = Htmldoc.querySelector("h1 b.text-primary").innerText
                    If InStr(bName, "/") > 0 Then bName = Split(Htmldoc.querySelector(".inline-block a[href*='/contactuser/']").innerText, " ")(1)
                    R = R + 1: Cells(R, 1) = bName
                End If
            End If
        Next post
    Next key
    IE.Quit
End Sub
After extracting somewhere between 70 and 90 records, I get that error on the following line:
bName = Htmldoc.querySelector("h1 b.text-primary").innerText
The error looks like:
Automation Error: old format or invalid type library
The solution proposed in the linked thread, written in VB.NET (which I can't convert to VBA):
'save the current settings for easier restoration later
Dim oldCI As System.Globalization.CultureInfo = _
    System.Threading.Thread.CurrentThread.CurrentCulture
'change the settings
System.Threading.Thread.CurrentThread.CurrentCulture = _
    New System.Globalization.CultureInfo("en-US")
'Your code here
'restore the previous settings
System.Threading.Thread.CurrentThread.CurrentCulture = oldCI
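Separately from the automation error, the Split(..., "about:")(1) calls in the script rely on MSHTML reporting root-relative href values with an "about:" prefix when markup is loaded through innerHTML. If a link is ever already absolute, Split returns a single element and index (1) raises "Subscript out of range". The sketch below does not address the "old format or invalid type library" error itself; it only makes that URL step tolerate both forms. ToAbsoluteUrl is a hypothetical helper name, and it assumes root-relative links (starting with "/"), as in the question:

```vba
' Hypothetical helper: turns an href read from an innerHTML-loaded document
' into an absolute URL. Root-relative links come back as "about:/path" and are
' joined to baseUrl; anything else (e.g. http/https links) is returned unchanged.
Public Function ToAbsoluteUrl(ByVal baseUrl As String, ByVal href As String) As String
    If LCase$(Left$(href, 6)) = "about:" Then
        ToAbsoluteUrl = baseUrl & Mid$(href, 7)   ' drop the 6-character prefix
    Else
        ToAbsoluteUrl = href
    End If
End Function
```

In the first loop this would read `idic(ToAbsoluteUrl(baseUrl, posts(I).getAttribute("href"))) = 1`.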

VBA doesn't read XMLHTTP request's response according to its tree structure

I have checked that both the browser-generated page and the string response of the VBA XMLHTTP request have the same tree structure, with the a tag being a child of aside.
Unfortunately, when I try to return the bookie name, which is the title attribute of a, I get an error accessing the first child of aside. It turns out that to get it working I need code that assumes the a tag is a sibling of aside:
Required reference: Microsoft HTML Object Library
Sub SendRequest()
    Dim XMLHTTP As Object: Set XMLHTTP = CreateObject("MSXML2.XMLHTTP.6.0")
    Dim htmlEle1 As IHTMLElement
    Dim htmlDoc As New HTMLDocument
    Dim urlName As String
    urlName = "https://www.oddschecker.com/golf/the-masters/2018-us-masters/winner"
    With XMLHTTP
        .Open "GET", urlName, False
        .send
        htmlDoc.body.innerHTML = .responseText
        For Each htmlEle1 In htmlDoc.getElementsByClassName("eventTableHeader")(0).Children
            If InStr(htmlEle1.className, "bookie-area") <> 0 Then
                Debug.Print htmlEle1.Children(1).getAttribute("title")
            End If
        Next htmlEle1
    End With
End Sub
Does this behavior have something to do with the fact that aside is an HTML5 element and VBA thinks it is a self-closing tag?
This took an awful lot of time to figure out. The issue is that you can't do it this way: when you create a new HTMLDocument, its documentMode is set to 5 by default.
So when we write any HTML into it, it has no idea of these HTML5 tags and just applies its own corrections. This is about as good as running an HTML5 site in IE6. Unfortunately, I couldn't find any way to create or parse a document with a higher documentMode.
Update
Thanks to @FlorentB for pointing out that browser emulation mode works for the MSHTML library as well. I was already aware of it from the question below:
Embedding Youtube Videos in webbrowser. Object doesn't support property or method
But I assumed it wouldn't work for the MSHTML library. I have now tested it by running the command below
REG ADD "HKCU\Software\Microsoft\Internet Explorer\Main\FeatureControl\FEATURE_BROWSER_EMULATION" /v excel.exe /t REG_DWORD /d 11001 /f
and then the existing code, and it works.
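The same value can also be written from VBA rather than a command prompt, using the RegWrite method of WScript.Shell. A sketch assuming, as in the REG ADD command above, that the value name is excel.exe and the data is 11001 as a DWORD; BrowserEmulationValuePath and SetBrowserEmulation are hypothetical names. The answer does not say whether Excel must be restarted before MSHTML picks up the change, so restarting it is the safe assumption:

```vba
' Hypothetical helper: full registry path of the FEATURE_BROWSER_EMULATION value
' for an executable. RegWrite treats the text after the last "\" as the value name.
Public Function BrowserEmulationValuePath(ByVal exeName As String) As String
    BrowserEmulationValuePath = "HKCU\Software\Microsoft\Internet Explorer\Main\" & _
        "FeatureControl\FEATURE_BROWSER_EMULATION\" & exeName
End Function

' Hypothetical helper: writes the same DWORD as the REG ADD command in the answer
' (excel.exe = 11001) for the current user.
Public Sub SetBrowserEmulation(Optional ByVal emulationMode As Long = 11001)
    Dim wsh As Object
    Set wsh = CreateObject("WScript.Shell")
    wsh.RegWrite BrowserEmulationValuePath("excel.exe"), emulationMode, "REG_DWORD"
End Sub
```

After running SetBrowserEmulation (and restarting Excel), printing htmlDoc.documentMode from the question's code is one way to confirm the document is no longer in mode 5.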
Alternate approach
If setting the registry key needs to be avoided for any reason, you can use the IE COM browser directly.
You can do this by adding references to Microsoft Internet Controls and Microsoft HTML Object Library and then executing the code below
Sub dothis()
    'Requires references to Microsoft Internet Controls and Microsoft HTML Object Library
    Dim htmlEle1 As IHTMLElement
    Dim htmlDoc As HTMLDocument
    Dim urlName As String
    urlName = "https://www.oddschecker.com/golf/the-masters/2018-us-masters/winner"
    Dim ie As InternetExplorerMedium
    Set ie = New InternetExplorerMedium
    ie.Visible = False
    ie.navigate2 urlName
    While ie.readyState <> READYSTATE_COMPLETE
        DoEvents
    Wend
    Set htmlDoc = ie.document
    Debug.Print htmlDoc.documentMode
    For Each htmlEle1 In htmlDoc.getElementsByClassName("eventTableHeader")(0).Children
        If InStr(htmlEle1.className, "bookie-area") <> 0 Then
            Debug.Print htmlEle1.Children(0).Children(0).getAttribute("title")
        End If
    Next htmlEle1
    'Close the hidden browser so it doesn't keep running in the background
    ie.Quit
End Sub
Now you can see that a is a child of aside.

Excel VBA Web Scraping

I'm new to Excel VBA programming and need some help debugging my code for scraping some data from Amazon.
On Sheet1 I list product ASINs, and I'm trying to use this code to complete the URL, land on the product page, and then get the product's sales rank via its class, ID, or another HTML tag.
However, I keep getting the error:
User-defined type not defined
I can't figure out where the bug is.
The code is listed below:
Private Sub Worksheet_Change(ByVal Target As Range)
    If Target.Row = Range("azAsin").Row And Target.Column = Range("azAsin").Column Then
        Dim IE As New InternetExplorer
        IE.Visible = True
        IE.Navigate "https://www.amazon.com/dp/" & Range("azAsin").Value
        Do
            DoEvents
        Loop Until IE.readyState = READYSTATE_COMPLETE
        Dim pap As HTMLDocument
        Set pap = IE.document
        Dim sDoc As String
        sDoc = Doc.getElementsById("SalesRank")
        MsgBox sDoc
    End If
End Sub
Try this:
Dim ie As Object
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
'...
Dim objRank As Object
Set objRank = ie.Document.getElementById("SalesRank")
Dim rank As String
If Not objRank Is Nothing Then rank = objRank.innerText
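Here is a fully late-bound version that also removes the need for a browser. Because it declares its objects As Object and creates them with CreateObject, it needs no library references, so it also avoids the "User-defined type not defined" error, which is raised when types such as InternetExplorer or HTMLDocument are declared without references to Microsoft Internet Controls and Microsoft HTML Object Library.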
Observations on your code:
The other answer has corrected, but not mentioned, that in your existing script you attempt:
Dim sDoc As String
sDoc = Doc.getElementsById("SalesRank")
getElementsById is invalid: Id should be singular, and the method is getElementById. It also returns an object, not a string, so you would need ie.Document.getElementById("SalesRank").innerText to get a string; though it is better to set it to an object first and test If Not obj Is Nothing Then, as the other answer has done. Note also that the document was assigned to pap, so Doc is never set.
Code:
Option Explicit
Private Sub Worksheet_Change(ByVal Target As Range)
    Dim html As Object, i As Long, URL As String
    URL = "https://www.amazon.com/dp/" & Range("azAsin").Value
    If Target.Row = Range("azAsin").Row And Target.Column = Range("azAsin").Column Then
        With CreateObject("MSXML2.serverXMLHTTP")
            .Open "GET", URL, False
            .send
            Set html = CreateObject("HTMLFile")
            html.body.innerHTML = .responseText
        End With
        For i = 0 To html.all.Length - 1
            If html.all(i).ID = "SalesRank" Then MsgBox html.all(i).innerText
        Next i
    End If
End Sub

How can I import data from a child URL?

I thought I had figured this out over the weekend, but it doesn't work the way I expected. I work with a confidential corporate SharePoint site, so I can't post the link or any specific data here, but the example below illustrates the concept.
I have a parent URL that I want to import data from. Let's say it is this one:
http://www.sharenet.co.za/v3/q_sharelookup.php
From there, I want to import data from a specific link. Let's say this is the link: 'Building & Construction Materials'
I think the best way to do this is to search for the link text with something like InStr() and, if it is found, click the link to open the child URL. When the child URL opens, it looks something like this:
http://www.sharenet.co.za/v3/sharesfound.php?ssector=2353&exch=JSE&bookmark=Building%20&%20Construction%20Materials&scheme=default
I can't tell what the sector numbers will be ahead of time, so I can't use a specific URL. I need to reference the pages as parent and child, or maybe IE1 and IE2. I want to import all data from the child URL, which, in this example, looks like this:
Name | Full Name | Code | Sector
BUILDMX | BUILDMAX LIMITED | BDM | 2353
KAYDAV | KAYDAV GROUP LTD | KDV | 2353
AFRIMAT | AFRIMAT LTD | AFT | 2353
Trellidor | Trellidor Hldgs Ltd | TRL | 2353
MASONITE | MASONITE (AFRICA) LIMITED | MAS | 2353
DAWN | DISTRIBUTION AND WAREHOUSING NETWORK LIMITED | DAW | 2353
MAZOR | MAZOR GROUP LTD | MZR | 2353
PPC | PPC LIMITED | PPC | 2353
PPCN | PPC Limited NPL | PPCN | 2353
To show how I tried to solve this, here is the script I tried:
Sub ListLinks()
    'Set a reference to Microsoft Internet Controls
    Dim IeApp As InternetExplorer
    Dim sURL As String
    Dim IeDoc As Object
    Dim i As Long
    Set IeApp = New InternetExplorer
    IeApp.Visible = True
    sURL = "http://www.sharenet.co.za/v3/q_sharelookup.php"
    IeApp.Navigate sURL
    Do
    Loop Until IeApp.ReadyState = READYSTATE_COMPLETE
    Set IeDoc = IeApp.Document
    For i = 0 To IeDoc.Links.Length - 1
        Cells(i + 1, 1).Value = IeDoc.Links(i).href
    Next i
    Set IeApp = Nothing
End Sub
I thought it would work to list all the URLs and then loop through each one to import the data, but the problem on my SharePoint site is that the href doesn't appear to bear any relation to the hyperlink's text.
In the picture above you can see 'Building & Construction Materials' in the TD element. If I can reference that in the first browser, click the correct link to open a second browser, and then reference that second browser and scrape all TD elements from it, everything should work. Does anyone here know how to do that?
Good try on the code; you got pretty close. The one area that needs fixing is where you get the list of items and loop over it. You had the right idea about how it would work, but the HTML element syntax is a little off, so it looks like you just need some more experience with HTML objects. See the sample code below:
Public Sub sampleCode()
    Dim URL As String
    Dim XMLHTTP As MSXML2.XMLHTTP60
    Dim HTMLDoc_Main As HTMLDocument
    Dim HTMLDoc_Secondary As HTMLDocument
    'Declared As Object: a table element cannot be assigned to an HTMLObjectElement variable
    Dim targetTable As Object
    Dim links As IHTMLElementCollection
    Dim linkCounter As Long
    Dim searchText As String
    URL = "http://www.sharenet.co.za/v3/q_sharelookup.php"
    searchText = "Building & Construction Materials"
    Set XMLHTTP = New MSXML2.XMLHTTP60
    Set HTMLDoc_Main = New HTMLDocument
    With XMLHTTP
        .Open "GET", URL, False
        .send
        While .readyState <> 4: Wend
        HTMLDoc_Main.body.innerHTML = .responseText
    End With
    Set targetTable = HTMLDoc_Main.getElementsByClassName("dataTable")(0)
    Set links = targetTable.getElementsByTagName("a")
    For linkCounter = 0 To links.Length - 1
        With links(linkCounter)
            If InStr(1, .innerText, searchText) > 0 Then
                Set XMLHTTP = New MSXML2.XMLHTTP60
                Set HTMLDoc_Secondary = New HTMLDocument
                'In a document filled via innerHTML, relative links may be reported with an "about:" prefix;
                'if so, replace that prefix with the site's base URL before requesting .href
                XMLHTTP.Open "GET", .href, False
                XMLHTTP.send
                While XMLHTTP.readyState <> 4: Wend
                HTMLDoc_Secondary.body.innerHTML = XMLHTTP.responseText
                'Parse HTMLDoc_Secondary
            End If
        End With
    Next
    Set XMLHTTP = Nothing
    Set HTMLDoc_Main = Nothing
    Set HTMLDoc_Secondary = Nothing
End Sub
A couple of notes: 1) I used XMLHTTPRequest instead of IE because it is faster, so 2) you will need to add 'Microsoft HTML Object Library' and 'Microsoft XML, v6.0' to your references, and 3) I can see you are writing to cells one at a time in your original code; if at all possible, avoid this. Populate an array and then write its entire contents to the target sheet in one go to save time.
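Note 3 could look like this in practice: read the child page's table into a 2D array, then assign the whole array to a range in a single statement. This sketch assumes the data sits in a table element of the parsed child document; TableToArray and WriteArrayToSheet are hypothetical helper names, and the destination cell is up to you:

```vba
' Hypothetical helper: copies the cells (td and th) of an HTML table element
' into a 1-based 2D Variant array (rows x columns). Short rows leave Empty cells.
' Returns Empty if the table has no rows or no cells.
Public Function TableToArray(ByVal tbl As Object) As Variant
    Dim tableRows As Object, rowCells As Object
    Dim r As Long, c As Long, maxCols As Long
    Dim result() As Variant

    Set tableRows = tbl.getElementsByTagName("tr")
    If tableRows.Length = 0 Then Exit Function

    ' First pass: find the widest row so the array can hold every cell
    For r = 0 To tableRows.Length - 1
        Set rowCells = tableRows(r).Cells
        If rowCells.Length > maxCols Then maxCols = rowCells.Length
    Next r
    If maxCols = 0 Then Exit Function

    ReDim result(1 To tableRows.Length, 1 To maxCols)
    ' Second pass: fill the array
    For r = 0 To tableRows.Length - 1
        Set rowCells = tableRows(r).Cells
        For c = 0 To rowCells.Length - 1
            result(r + 1, c + 1) = rowCells(c).innerText
        Next c
    Next r
    TableToArray = result
End Function

' Hypothetical helper: writes the array in one assignment instead of cell by cell.
Public Sub WriteArrayToSheet(ByVal data As Variant, ByVal topLeft As Range)
    If IsEmpty(data) Then Exit Sub
    topLeft.Resize(UBound(data, 1), UBound(data, 2)).Value = data
End Sub
```

Inside the answer's If block, for example: `WriteArrayToSheet TableToArray(HTMLDoc_Secondary.getElementsByTagName("table")(0)), Worksheets("Sheet1").Range("A1")`. One Range.Value assignment avoids a round trip to the worksheet for every cell.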
Hope this helps,
TheSilkCode

My excel macro generates errors inconsistently

I am working on a macro to automate some web-browser tasks, and I get an intermittent run-time error 91, "Object variable or With block variable not set". I have experimented with how I declare and create my browser object, but nothing works.
Sub Morning_Script()
    Dim WebBrowser As Object
    Set WebBrowser = New InternetExplorerMedium
    WebBrowser.Visible = True
    WebBrowser.navigate "www.google.com"
    While WebBrowser.Busy = True
        DoEvents
    Wend
    WebBrowser.document.getElementById("lst-ib").Value = "test"
    'WebBrowser.document.getElementById("verify").Value = Worksheets("Sheet1").Range("B2")
    'WebBrowser.document.getElementById("institution").Value = "xxx"
    'Worksheets("Sheet1").Range("B1").Clear
    'Worksheets("Sheet1").Range("B2").Clear
End Sub
The URL has been changed to a public website and the form entry has been commented out for now.
If you would prefer to avoid fixed timers, try this one:
Dim objElement As Object: Set objElement = Nothing
While objElement Is Nothing
    Set objElement = WebBrowser.document.getElementById("lst-ib")
    DoEvents
Wend
objElement.Value = "test"
If you also want to refer to verify and institution, you would have to apply this scheme to them too, and wait until all three variables are not Nothing. You may also want to set a timeout or a maximum number of retries, especially if you have a large number of pages to download.
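A timeout for the loop above could look like the following sketch: it polls for the element by id and gives up after a given number of seconds, returning Nothing so the caller decides what to do. WaitForElementById and FillSearchBox are hypothetical names, and the 10-second limit in the example is an arbitrary choice:

```vba
' Hypothetical helper: polls browser.document for an element with the given id
' until it appears or timeoutSeconds elapse. Returns Nothing on timeout.
' Uses Now (1-second resolution) rather than Timer, which resets at midnight.
Public Function WaitForElementById(ByVal browser As Object, ByVal elementId As String, _
                                   ByVal timeoutSeconds As Long) As Object
    Dim deadline As Date
    Dim doc As Object, el As Object

    deadline = DateAdd("s", timeoutSeconds, Now)
    Do
        Set doc = browser.document               ' may be Nothing while navigating
        If Not doc Is Nothing Then
            Set el = doc.getElementById(elementId)
            If Not el Is Nothing Then
                Set WaitForElementById = el
                Exit Function
            End If
        End If
        DoEvents
    Loop While Now < deadline
End Function

' Example usage with the question's WebBrowser object
Public Sub FillSearchBox(ByVal WebBrowser As Object)
    Dim box As Object
    Set box = WaitForElementById(WebBrowser, "lst-ib", 10)
    If box Is Nothing Then
        MsgBox "lst-ib was not found within 10 seconds"
    Else
        box.Value = "test"
    End If
End Sub
```

For verify and institution, the same function can be called once per id, stopping at the first one that comes back as Nothing.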