I've written some code in vba to find certain identities against some names in some websites. The code is working well if everything in it's right order, i meant if the link is valid, the name matches with a tags and finally the regex can find the identity. If any of the three or all of the three are bad searches then the script throws error. I've already specified the position where error occurs in my below script.
All i expect from you experts to provide me with any solution as to how i can handle the errors and let my script continue until all the links are exhausted.
As I do not have much knowledge on VBA so i tried with On error resume next to skip the errors. However, it turns out to be a clear mess when i take a look at the results. I'm pasting a rough example what i get when i use On error resume next.
Sub Identity_Finder()
Dim http As New XMLHTTP60, html As New HTMLDocument
Dim post As Object, link As Variant, refined_links As String
Dim rxp As New RegExp, identity As Object
For Each link In [{"http://spltech.in/","http://www.unifrostindia.com/","http://www.unitfrostindia.com/","http://www.greenplanet.in/"}]
With http
.Open "GET", link, False
.send '''throws here the first error if the link is invalid
html.body.innerHTML = .responseText
End With
For Each post In html.getElementsByTagName("a")
If InStr(post.innerText, "certain_name") > 0 Then refined_links = post.href: Exit For
Next post
With http
.Open "GET", refined_links, False
.send ''throws another error here if no such link is found
End With
With rxp
.Pattern = "some_regex"
.Global = True
Set identity = .Execute(http.responseText)
End With
r = r + 1: Cells(r, 1) = link
Cells(r, 2) = identity(0) ''''throws another error here if no such identity is noticed
Next link
End Sub
Upon using On error resume next What i get:
John executive
Mac lawyer
lulu lawyer
Robin lawyer
Cathy student
Expected output:
John executive
Mac lawyer
lulu
Robin
Cathy student
The empty fields (when they are not found) are getting filled in with the previous values when i use On error resume next. How can I get around this misleading result? Thanks in advance.
The most efficient way to error trap in VBA is to
1) actually test the inputs / results before running them either through custom-made functions or built-in coding concepts or a mix of both.
2) Use VBA built-in error-handling if absolutely needed
Example 1
For example. You can wrap this statement with a custom function to test if a URL is valid or not.
With http
.Open "GET", link, False
.send '''throws here the first error if the link is invalid
html.body.innerHTML = .responseText
End With
If ValidURL Then
With http
.Open "GET", link, False
.send
html.body.innerHTML = .responseText
End With
End If
Where ValidURL is a function defined as:
Function ValidURL(URL as String) as Boolean
Dim result as Boolean
'I don't know how you would specify a valid link in your specific case
'but that code goes here
'a dummy example follows
result = Left(URL,7) = "http://"
ValidURL = result 'True or False
End Function
Example 2
I assume in this statement:
With http
.Open "GET", refined_links, False
.send ''throws another error here if no such link is found
End With
there is a specific error number (code) that is produced when no such link is found. Discover that number and use this code to bypass.
With http
.Open "GET", refined_links, False
On Error Resume Next
.Send
On Error GoTo 0
End With
If err.Number <> 9999 'replace with correct number
'continue with regex test
End If
PUTTING IT ALL TOGETHER
Finally putting that all together you can build like so, with minimal use of On Error Resume Next and no GoTo statements.
For Each link In [{"http://spltech.in/","http://www.unifrostindia.com/","http://www.unitfrostindia.com/","http://www.greenplanet.in/"}]
If ValidURL(link) Then
With http
.Open "GET", link, False
.send
html.body.innerHTML = .responseText
End With
For Each post In html.getElementsByTagName("a")
If InStr(post.innerText, "certain_name") > 0 Then refined_links = post.href: Exit For
Next post
With http
.Open "GET", refined_links, False
On Error Resume Next
.Send
On Error GoTo 0
End With
If err.Number <> 9999 'replace with correct number
With rxp
.Pattern = "some_regex"
.Global = True
Set identity = .Execute(http.responseText)
End With
'i will leave it to you on how to account for no pattern match
r = r + 1: Cells(r, 1) = link
Cells(r, 2) = identity(0) ''''throws another error here if no such identity is noticed
End If
End If
Next link
Related
running following code i get a .readystate = 1, i do not understand why. Is anyone able to help me ? As well and more globally, i am looking for some documentation on using MSXML2.ServerXMLHTTP or XMLHTTP60, is there a website for noobs on this topic ?*
thanks in advance !
Sub a()
Dim wwwPage As Object
Dim Email As String
DoEvents
On Error Resume Next
Set wwwPage = CreateObject("MSXML2.ServerXMLHTTP")
With wwwPage
.Open "GET", "https://runninggeek.be/annuaire-des-clubs/", False
.send
Position = 1
If .Status = 200 And .readyState = 4 Then
Email = "xxx"
End If
End With
End Sub
Am expecting .readystate 4. I see sometimes we need to use headers but i do not know why nor how, nor even how i can find what header to use
I've written a script in vba to get some set names from a webpage and the script is getting them accordingly until it catches an error somewhere within the execution. This is the first time I encountered such error.
What my script is doing is get all the links under Company Sets and then tracking down each of the links it goes one layer deep and then following all the links under Set Name it goes another layer deep and finally parse the table from there. I parsed the name of PUBLISHED SET which is stored within the variable bName instead of the table as the script is getting bigger. I used IE to get the PUBLISHED SET as there are few leads which were causing encoding issues.
I searched through all the places to find any workaround but no luck.
However, I came across this thread where there is a proposed solution written in vb but can't figure out how can I make it work within vba.
Script I'm trying with:
Sub FetchRecords()
Const baseUrl$ = "https://www.psacard.com"
Const link = "https://www.psacard.com/psasetregistry/baseball/company-sets/16"
Dim IE As New InternetExplorer, Htmldoc As HTMLDocument
Dim Http As New XMLHTTP60, Html As New HTMLDocument, bName$, tRow As Object
Dim post As Object, elem As Object, posts As Object, I&, R&, C&
Dim key As Variant
Dim idic As Object: Set idic = CreateObject("Scripting.Dictionary")
With Http
.Open "GET", link, False
.send
Html.body.innerHTML = .responseText
End With
Set posts = Html.querySelectorAll(".dataTable tr td a[href*='/psasetregistry/baseball/company-sets/']")
For I = 0 To posts.Length - 7
idic(baseUrl & Split(posts(I).getAttribute("href"), "about:")(1)) = 1
Next I
For Each key In idic.Keys
With Http
.Open "GET", key, False
.send
Html.body.innerHTML = .responseText
End With
For Each post In Html.getElementsByTagName("a")
If InStr(post.getAttribute("title"), "Contact User") > 0 Then
If InStr(post.ParentNode.getElementsByTagName("a")(0).getAttribute("href"), "publishedset") > 0 Then
IE.Visible = True
IE.navigate baseUrl & Split(post.ParentNode.getElementsByTagName("a")(0).getAttribute("href"), "about:")(1)
While IE.Busy = True Or IE.readyState < 4: DoEvents: Wend
Set Htmldoc = IE.document
bName = Htmldoc.querySelector("h1 b.text-primary").innerText
If InStr(bName, "/") > 0 Then bName = Split(Htmldoc.querySelector(".inline-block a[href*='/contactuser/']").innerText, " ")(1)
R = R + 1: Cells(R, 1) = bName
End If
End If
Next post
Next key
IE.Quit
End Sub
I get that error pointing at the following line after extracting records between 70 to 90:
bName = Htmldoc.querySelector("h1 b.text-primary").innerText
The error looks like:
Automation Error: old format or invalid type library
Proposed solution in the linked thread written in vb (can't convert to vba):
'save the current settings for easier restoration later
Dim oldCI As System.Globalization.CultureInfo = _
System.Threading.Thread.CurrentThread.CurrentCulture
'change the settings
System.Threading.Thread.CurrentThread.CurrentCulture = _
New System.Globalization.CultureInfo("en-US")
Your code here
'restore the previous settings
System.Threading.Thread.CurrentThread.CurrentCulture = oldCI
I found the following code on an excel forum:
Function URLExists(url As String) As Boolean
Dim Request As Object
Dim ff As Integer
Dim rc As Variant
On Error GoTo EndNow
Set Request = CreateObject("WinHttp.WinHttpRequest.5.1")
With Request
.Open "GET", url, False
.Send
rc = .StatusText
End With
Set Request = Nothing
If rc = "OK" Then URLExists = True
Exit Function
EndNow:
End Function
I see no reason why it would not work if I use a legitiamte URL. I have been testing it and keep on getting a FALSE returned. Opening the same URL in a browser works.
I suspect that this might have something to do with our coporate security settings. How can I test whether the code works or if it is a security issue?
The function goes to the error line on .Send
I have an excel-sheet which contains many links.
How do I open them all at once in a new tab with my default browser?
Like this? Included checking url is valid (basic check). The advantage here is you adapt to log information about the response from the URL.
Option Explicit
Sub TEST()
Dim h As Hyperlink
For Each h In ActiveSheet.Hyperlinks
If UrlOK(h.Address) Then h.Follow
Next h
End Sub
Public Function UrlOK(ByVal url As String) As Boolean
Dim request As Object
Dim respCode As Long
On Error Resume Next
Set request = CreateObject("WinHttp.WinHttpRequest.5.1")
With request
.Open "GET", url, False
.Send
respCode = .Status
End With
If respCode = 200 Then UrlOK = True
On Error GoTo 0
End Function
Edit: Thanks to #Omegastripes for noting
1) If you use MSXML2.XMLHTTP over WinHttp.WinHttpRequest.5.1 you get a more reliable result
Benefits include (amongst others):
A) Simplified code to open a URL.
B) Separate sessions do not impact each other.
C) Protected Mode IE Support
D) Credential Cache
2) Use HEAD over GET, in the request, to reduce network traffic
With a HEAD request, a server will only return the headers of a resource, rather than the resource itself.
So you could use a revised, more efficient function, as follows:
Public Function UrlOK(ByVal url As String) As Boolean
Dim request As Object
Dim respCode As Long
On Error Resume Next
Set request = CreateObject("MSXML2.XMLHTTP")
With request
.Open "HEAD", url, False
.Send
respCode = .Status
End With
If respCode = 200 Then UrlOK = True
On Error GoTo 0
End Function
Image of code in a standard module and where to place cursor to execute Test sub.
That's pretty easy in VBA
Sub OpenAll()
Dim H As Hyperlink
For Each H In ActiveWorkbook.ActiveSheet.UsedRange.Hyperlinks
H.Follow
Next
End Sub
If there are invalid URLs you can stop the code from erroring like this:
Sub OpenAll()
Dim H As Hyperlink
For Each H In ActiveWorkbook.ActiveSheet.Hyperlinks
On Error Resume Next
H.Follow
On Error GoTo 0
Next
End Sub
Running my code for a site to crawl the titles of different tutorials spreading across several pages, I found it working flawless. I tried to write some code not depending on the last page number the url has but on the status code until it shows http.status<>200. The code I'm pasting below is working impeccably in this case. However, Trouble comes up when I try to use another url to see whether it breaks automatically but found that the code did fetch all the results but did not break. What is the workaround in this case so that the code will break when it is done and stop the macro? Here is the working one?
Sub WiseOwl()
Const mlink = "http://www.wiseowl.co.uk/videos/default"
Dim http As New XMLHTTP60, html As New HTMLDocument
Dim post As Object
Do While True
y = y + 1
With http
.Open "GET", mlink & "-" & y & ".htm", False
.send
If .Status <> 200 Then
MsgBox "It's done"
Exit Sub
End If
html.body.innerHTML = .responseText
End With
For Each post In html.getElementsByClassName("woVideoListDefaultSeriesTitle")
With post.getElementsByTagName("a")
x = x + 1
If .Length Then Cells(x, 1) = .item(0).innerText
End With
Next post
Loop
End Sub
I found a logic to get around with yellowpage. My update script is able to parse yellowpage but breaks before scraping the last page because there is no "Next Page" button. I tried with this:
"https://www.dropbox.com/s/iptqm79b0byw3dz/Yellowpage.txt?dl=0"
However, the same logic I tried to apply with torrent site but it doesn't work here:
"https://www.yify-torrent.org/genres/western/p-1/"
You can always rely on elements if they exits or not. Here for example, if you try to use the object which you have set your element to, you will get:
Run-time error '91': Object variable or With block variable not set
This is the key you should be looking for to put an end to your code. Please see the below example:
Sub yify()
Const mlink = "https://www.yify-torrent.org/genres/western/p-"
Dim http As New XMLHTTP60, html As New HTMLDocument
Dim post As Object
Dim posts As Object
y = 1
Do
With http
.Open "GET", mlink & y & "/", False
.send
html.body.innerHTML = .responseText
End With
Set posts = html.getElementsByClassName("mv")
On Error GoTo Endofpage
Debug.Print Len(posts) 'to force Error 91
For Each post In posts
With post.getElementsByTagName("div")
x = x + 1
If .Length Then Cells(x, 1) = .Item(0).innerText
End With
Next post
y = y + 1
Endofpage:
Loop Until Err.Number = 91
Debug.Print "It's over"
End Sub