Fetch all the href links of a webpage using vba selenium - vba

Option Explicit
Private chr As Selenium.ChromeDriver
Sub Test()
Set chr = New Selenium.ChromeDriver
chr.Start
chr.Get "https://www.grainger.com/category/pumps/rotary-pumps/rotary-gear-pump-heads?
categoryIndex=1"
chr.Wait 1000
Dim mylinks As Selenium.WebElements
Dim mylink As Selenium.WebElement
Dim i As Long
Set mylinks = chr.FindElementsByTag("a")
For i = 0 To mylinks.Count
For Each mylink In mylinks
If LCase(mylink.Attribute("data-testid")) = "product-detail-title" Then
Debug.Print mylink.Attribute("href")
Exit For
End If
Next mylink
i = i + 1
Next
End Sub
I was trying to extract all the href links present in that webpage link mentioned with the above code. But, the issue that i am facing is, whatever the first link it is fetching in the result, the same link is repeating in the next loop also in the result till mylinks.count. I think, i'm missing something here, kindly suggest what needs to be corrected here to get all the links as i needed. Please advise.

Related

VBA HTML Scraping Error "Object Variable or With Variable not set"

I was following a tutorial of this video https://www.youtube.com/watch?v=sGw6r5GVA5g&t=2803s made by the WiseOwlTutorials channel and got stuck at a listing procedure he explains at the 36:00 position of the video.
At that point, he starts to explain how to return the video url and name of a video list from a specific category through a iteration method called Sub ListVideosOnPage(VidCatName As String, VidCatURL As String) used in another module which loops through all video categories of their website main video page https://www.wiseowl.co.uk/videos (left corner menu list).
When this procedure starts, it goes inside each video category and get the name and url of each video from that category in order to list it on a page which, in that part of the Youtube video cited above, is a debug page. However, the actual WiseOwl Video page is diferente from that when the tutorial video was made.
So, I changed his method a little in order to put the correct elements on the debbugin page, as shown below:
Sub ListVideosOnPage(VidCatName As String, VidCatURL As String)
Dim XMLReq As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument
Dim VidTables As MSHTML.IHTMLElementCollection
Dim VidTable As MSHTML.IHTMLElement
Dim VidRows As MSHTML.IHTMLElementCollection
Dim VidRow As MSHTML.IHTMLElement
Dim VidLink As MSHTML.IHTMLElement
XMLReq.Open "GET", VidCatURL, False
XMLReq.send
If XMLReq.Status <> 200 Then
MsgBox "Problem" & vbNewLine & XMLReq.Status & " - " & XMLReq.statusText
Exit Sub
End If
HTMLDoc.body.innerHTML = XMLReq.responseText
'get the table element in each video category found by other module
'VidTables tag added by me to get the new element on the WiseOwl website
Set VidTables = HTMLDoc.getElementsByTagName("table")
'loop starts to search for row and link tags on the current table
For Each VidTable In VidTables
Set VidRows = VidTable.getElementsByTagName("tr")
For Each VidRow In VidRows
Set VidLink = VidRow.getElementsByTagName("a")(0) 'just pick the first link
Debug.Print VidRow.innerText, VidRow.getattribute("href") 'objetc variable not set error happpens here
Next VidRow
Next VidTable
End Sub
I found a way to circumvent this Object Variable or With Variable not set error by changing the code inside vidrow loop, adding a manual index to the code to get only the first link in each row:
For Each VidTable In VidTables
Set VidRows = VidTable.getElementsByTagName("tr")
For Each VidRow In VidRows
Index = 0
For Each VidLink In VidLinks
If Index = 0 Then
Debug.Print VidLink.innerText, VidLink.getAttribute("href")
Index = Index + 1
End If
Next VidLink
Next VidRow
Next VidTable
But, in the turorial video referenced above, the instructor doesnt get this error when he codes indexes in the way shown below:
VidLink = VidRow.getElementsByTagName("a")(0)
Debug.Print VidRow.innerText, VidRow.getattribute("href")
So my question is how do I get these object variable not set errors and in the tutorial video the instructor doesnt? Looks like the same code to me, with each element defined in the right way and a much more efficient way to code then using if's. Could anyone more used to VBA please help with an answer this? Maybe I missing something.
tl:dr:
I first give you the debug and fix info;
I go on to show you a different way using CSS selectors to target the page styling. This is generally faster, more robust and more flexible;
VidCatName doesn't appear to be used but I have left in for now. I personally would remove unless you will later develop the code to use this variable. The second sub parameters are passed by value so I have added ByVal to the signature.
① Debugging:
Your error is because you are looping all table rows and trying to access a tags and then href attributes. The first row of each table is the header row and this doesn't have a tag elements, nor associated href attributes. See image below:
Table element on page:
See that the first tr tagged element in the table contains a child th tag element, indicating it is the table header, and that there is no associated a tag element.
Kind of like you were shown elsewhere in that video, you want to change your loop to a For Next, and then, in this case, start from index 1 to skip the header row.
So, the part containing this line: For Each VidRow In VidRows , becomes the following:
Dim VidRowID As Long
For Each VidTable In VidTables
Set VidRows = VidTable.getElementsByTagName("tr")
For VidRowID = 1 To VidRows.Length - 1 'first row is actually header which doesn't have an a tag or href
Set VidLink = VidRows(VidRowID).getElementsByTagName("a")(0)
Debug.Print VidLink.innerText, VidLink.getAttribute("href")
Next VidRowID
Next VidTable
There is also only one table per page so a loop of all tables is unnecessary code in this case.
Example full call (using your code with just the change in loop type):
Option Explicit
Public Sub test()
ListVideosOnPage "Business Intelligence (70)", "https://www.wiseowl.co.uk/business-intelligence/videos/"
End Sub
Public Sub ListVideosOnPage(ByVal VidCatName As String,ByVal VidCatURL As String)
Dim XMLReq As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument
Dim VidTables As MSHTML.IHTMLElementCollection
Dim VidTable As MSHTML.IHTMLElement
Dim VidRows As MSHTML.IHTMLElementCollection
Dim VidRow As MSHTML.IHTMLElement
Dim VidLink As MSHTML.IHTMLElement
XMLReq.Open "GET", VidCatURL, False
XMLReq.send
If XMLReq.Status <> 200 Then
MsgBox "Problem" & vbNewLine & XMLReq.Status & " - " & XMLReq.statusText
Exit Sub
End If
HTMLDoc.body.innerHTML = XMLReq.responseText
Set VidTables = HTMLDoc.getElementsByTagName("table") 'Should limit to just one table
Dim VidRowID As Long
For Each VidTable In VidTables
Set VidRows = VidTable.getElementsByTagName("tr")
For VidRowID = 1 To VidRows.Length - 1 'first row is actually header which doesn't have an a tag or href
Set VidLink = VidRows(VidRowID).getElementsByTagName("a")(0)
Debug.Print VidLink.innerText, VidLink.getAttribute("href")
Next VidRowID
Next VidTable
End Sub
② CSS selectors:
I would instead use a CSS selector combination to target the a tag elements within the target parent table element. This is written as .bpTable a. A more official term for this combination is descendant selector.
The descendant combinator — typically represented by a single space (
) character — combines two selectors such that elements matched by the
second selector are selected if they have an ancestor element matching
the first selector. Selectors that utilize a descendant combinator are
called descendant selectors.
The .bpTable is in fact itself a class selector (like .getElementsByClassName). The class part indicated by the leading ".". So, elements with class name bpTable; which is the class name of the target table on each page.
Target table element on page:
This selector is applied via the .querySelectorAll method of .document and returns a static nodeList. You can then loop the .Length of this nodeList, from 0 to .Length -1, accessing elements by index.
Public Sub ListVideosOnPage(ByVal VidCatName As String, ByVal VidCatURL As String)
Dim XMLReq As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument
XMLReq.Open "GET", VidCatURL, False
XMLReq.send
If XMLReq.Status <> 200 Then
MsgBox "Problem" & vbNewLine & XMLReq.Status & " - " & XMLReq.statusText
Exit Sub
End If
HTMLDoc.body.innerHTML = XMLReq.responseText
Dim aNodeList As Object, link As Long
Set aNodeList = HTMLDoc.querySelectorAll(".bpTable a")
For link = 0 To aNodeList.Length - 1
Debug.Print aNodeList(link).innerText, aNodeList(link).href
Next
End Sub
References (VBE > Tools > References):
Microsoft HTML Object Library
Microsoft XML, V6.0 'For my Excel 2016 version

VBA : Finding nested elements in HTML Document

I am trying to scrape all the href links of products from this link.
I am using the following code to get the product links on the page:
Sub urlCatch()
Dim internet As Object
Dim internetdata As Object
Dim div_result As Object
Dim header_links As Object
Dim link, itm As Object
Dim url As String
Dim X As Variant
Set internet = CreateObject("InternetExplorer.Application")
internet.Visible = True
url = "http://www.dell.com/vg/p/desktops.aspx?c=vg&cs=vgdhs1&l=en&s=dhs&~ck=mn"
internet.Navigate url
Do Until internet.ReadyState >= 4
DoEvents
Loop
Set internetdata = internet.document
Set div_result = internetdata.getelementsbyclassname("categorySubNavigation").getelementsbyclassname("c4 seriesOptions")
Set header_links = div_result.getelementsbytagname("a")
For Each itm In header_links
Set link = itm.ChildNodes.Item(0)
Cells(Range("A" & Rows.Count).End(xlUp).Row + 1, 1) = link.href
Next
End Sub
I am getting an error at div_reult: "object doesn't support this property or method".
This is the first time I am using an element to retrieve something, so maybe I might be making mistakes which I am currently unaware off.
Please look into my code, and let me know what blunder I am making here.
I am told by one of my friends that I might need to use a regex to get all these links, but I wanted to get a hold of this method first.
Please give me some guidance. Thanks.
GetElementsByTagName or other fetch methods work on single element and not over a collection. You need another loop.
Sub urlCatch()
Dim url As String
Dim internet As Object
Dim internetdata
Dim div_result
Dim links
Dim itm
Dim itm2
Set internet = CreateObject("InternetExplorer.Application")
internet.Visible = True
url = "http://www.dell.com/vg/p/desktops.aspx?c=vg&cs=vgdhs1&l=en&s=dhs&~ck=mn"
internet.Navigate url
Do
DoEvents
Loop Until internet.ReadyState >= 4 And Not internet.busy
Set internetdata = internet.document.body
Set div_result = internetdata.getelementsbyclassname("c4 seriesOptions")
For Each itm In div_result
Set links = itm.getElementsByTagName("A")
For Each itm2 In links
Cells(Range("A" & Rows.Count).End(xlUp).Row + 1, 1) = itm2.href
Next
Next
End Sub

Refactor code to break links so Variant type is not used

How do I refactor the following sub-routine so it does not use the Variant data type?
Sub BreakAllLinks()
Dim Link As Variant
Dim myLinks As Variant
myLinks = Excel.ActiveWorkbook.LinkSources(Type:=Excel.xlLinkTypeExcelLinks)
For Each Link In myLinks
Excel.ActiveWorkbook.BreakLink Name:=Link, Type:=Excel.xlLinkTypeExcelLinks
Next Link
End Sub
Here's how you could do it with no Variants - but you shouldn't.
Sub BreakAllLinks()
Dim myLinks() As String
Dim LinkIdx As Long
Dim Link As String
ReDim myLinks(1 To UBound(ActiveWorkbook.LinkSources(xlLinkTypeExcelLinks)))
For LinkIdx = LBound(myLinks) To UBound(myLinks)
myLinks(LinkIdx) = ActiveWorkbook.LinkSources(xlLinkTypeExcelLinks)(LinkIdx)
Next LinkIdx
For LinkIdx = LBound(myLinks) To UBound(myLinks)
Link = myLinks(LinkIdx)
ActiveWorkbook.BreakLink Link, xlLinkTypeExcelLinks
Next LinkIdx
End Sub
That's a little over-the-top on purpose to demonstrate all the data types involved. You can only For..Each an array with a Variant - it's just how the language is written. The best practice isn't 'don't use Variants' but rather 'Use the most restrictively typed variable that you can'. In your case, the Variant is the most restrictively typed variable you can use.
There is a way to write that without Variants and not so obviously crazy
Sub BreakAllLinks()
Dim LinkIdx As Long
For LinkIdx = LBound(ActiveWorkbook.LinkSources(1)) To UBound(ActiveWorkbook.LinkSources(1))
ActiveWorkbook.BreakLink ActiveWorkbook.LinkSources(1)(1), xlLinkTypeExcelLinks
Next LinkIdx
End Sub
But even then, I'd opt for the Variant. It's worth the trade off.
A Linksource is a String.
But why bother ?
Sub M_snb()
For Each it In ActiveWorkbook.LinkSources(1)
MsgBox = TypeName(it)
ActiveWorkbook.BreakLink it, 1
Next
End Sub

Excel vba macro to connect to a particular webpage - search and retrieve data

I have column, say column A containing 1500 rows each having a string (Hexadecimal encodes). What I need is connect to a particular website search paste the string, press on decode, copy the result and paste it back to column B.
Any help would be of great help. I am new here.
Example:
String in Column A: 5468616e6b732061206c6f7420696e20616476616e6365
Website to search in: http://encodertool.com/hexadecimal
Copy from excel cell and paste in tab (under heading): ENTER AN Hexadecimal CONTENT TO DECODE
Then hit DECODE
Then Copy from DECODING RESULT
Finally paste back in ColumnB in my excel sheet.
Looking forward for an answer.
Thanks a million in advance.
Are you doing this as an exercise in automating the browser? Seems like you could more easily do it directly in VBA
From: http://bytes.com/topic/access/answers/874752-convert-hex-string
Sub tester()
Debug.Print fConvertHexToString( _
"5468616e6b732061206c6f7420696e20616476616e6365")
End Sub
Public Function fConvertHexToString(strHexString As String) As String
Dim intLenOfString As Integer
Dim intCounter As Integer
Dim strBuild As String
'Hex String must have a valid length, and it must be an even length
If Len(strHexString) = 0 Or Len(strHexString) Mod 2 <> 0 Then Exit Function
intLenOfString = Len(strHexString)
For intCounter = 1 To Len(strHexString)
If intCounter Mod 2 <> 0 Then 'need Hex pairs
'Retrieve the Value of the Hex Pair, then Convert to a Character,
'then Append to a Base String
strBuild = strBuild & Chr$(Val("&H" & Mid$(strHexString, intCounter, 2)))
End If
Next
fConvertHexToString = strBuild
End Function
Something like this. I have just run a mock test and it works. Give it a try. You can modify the code to your needs. This is a plain code. Code can be enhanced as well. But this does what you ask for
Dim ie As InternetExplorer
Dim doc As HTMLDocument
Sub start()
Dim ran As Range
Dim cel As Excel.Range
Set ran = Worksheets("Sheet1").Range("A1:A4") 'Change Your input range here
For Each cel In ran
If cel.Value <> Empty Then
Set ie = New InternetExplorerMedium 'open iE
ie.navigate ("http://encodertool.com/hexadecimal") 'Navigate to IE
ie.Visible = True
'Wait untill IE is loaded
Do
' Wait till the Browser is loaded
Loop Until ie.readyState = READYSTATE_COMPLETE
Set doc = ie.document
doc.getElementById("input_4").innerText = cel.Value ' Enter input value
test ' Click button
cel.Offset(0, 1).Value = doc.getElementById("output_4").innerText ' save Output value
End If
ie.Quit
Next cel
End Sub
'Click the Decode button
Sub test()
Set cl_button= doc.getElementsByTagName("a")
For Each one In cl_button
If one.getAttribute("onclick") = "ajaxfct('fcts.php','4')" Then
one.Click
Exit For
End If
Next one
End Sub
Before running the code, add reference to HTML object library & Internet controls. Also change the range of your input. I have set it to A1:A4 . Change to it whatever. Make sure there are no blank cells in the Range. ALso, If you dont want the browser to be displayed set
ie.visible = false
This is one way of doing it. THere are many simpler and effective ways of doing it

How can click on a javascript show link programatically with vb.net?

I'm trying to develop a new feature for our vb.net order entry system. At the moment I provide an assisted paypal login which loops through transactions and copies the transactions. My program then looks at this data and copies it into text boxes. The operator then approves and saves the record.
EDIT: I need to see the transaction in PayPal.
So my code uses IHTMLFormElement and loops round form elements and adds values. However I only really use this to log in to paypal. See my code...
Dim theObject As Object = Nothing
theObject = "https://www.paypal.com/cgi-bin/webscr?cmd=_login-run"
WebBrowPayPal.AxWebBrowser1.Navigate2(theObject)
While WebBrowPayPal.AxWebBrowser1.ReadyState <>
tagREADYSTATE.READYSTATE_COMPLETE
Application.DoEvents()
End While
Dim HtmlDoc As IHTMLDocument2 = CType(WebBrowPayPal.AxWebBrowser1.Document,
IHTMLDocument2)
Dim FormCol As IHTMLElementCollection = HtmlDoc.forms
Dim iForms As Integer = FormCol.length
Dim i As Integer
Dim x As Integer
For i = 0 To iForms - 1
Dim oForm As IHTMLFormElement = CType(FormCol.item(CType(i, Object),
CType(i, Object)), IHTMLFormElement)
For x = 0 To oForm.length - 1
If oForm.elements(x).tagname = "INPUT" Then
If oForm.elements(x).name = "login_email" Then
oForm.elements(x).value = "PayPal#mydomain.com"
End If
If oForm.elements(x).name = "login_password" Then
oForm.elements(x).value = "mypassword"
End If
If oForm.elements(x).type = "submit" Or _
oForm.elements(x).type = "SUBMIT" Then
oForm.elements(x).click()
End If
End If
Next
Next i
I'm now trying this page
https://www.paypal.com/uk/cgi-bin/webscr?cmd=_history&nav=0.3.0
Which is the history page, which allows you to search on the paypal transaction id.
Unfortunately you need to click on 'find a transaction' which then uses some javascript to shows the post fields. So the problem is that the fields I need to use are hidden.
How can I click on this javascript link in code ?
I'm not sure if this will help but you might want to try executing the script (which fires when you're clicking "Find a transaction") directly using IHTMLDocument2.write method:
Dim HtmlDoc As IHTMLDocument2 = CType(WebBrowPayPal.AxWebBrowser1.Document,
IHTMLDocument2)
HtmlDoc.write("<script>[Search button event handler]</script>")
UPDATE
I managed to get it working. Here is a form Load event handler (WinForms) that causes the Web Browser control to load http://www.google.com and then clicks "I'm feeling lucky" button:
Imports mshtml
Public Class Form1
Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
Dim clicked As Boolean = False
Dim doc As IHTMLDocument2 = Nothing
Dim form As IHTMLFormElement = Nothing
Dim input As HTMLInputElement = Nothing
Dim forms As IHTMLElementCollection = Nothing
AxWebBrowser1.Navigate2("http://www.google.com")
While AxWebBrowser1.ReadyState <> SHDocVw.tagREADYSTATE.READYSTATE_COMPLETE
Application.DoEvents()
End While
doc = CType(AxWebBrowser1.Document, IHTMLDocument2)
For i As Integer = 0 To doc.forms.length - 1
form = CType(doc.forms.item(i, i), IHTMLFormElement)
For j As Integer = 0 To form.length - 1
If TypeOf (form.elements(j)) Is HTMLInputElement Then
input = CType(form.elements(j), HTMLInputElement)
If String.Compare(input.name, "btnI", StringComparison.InvariantCultureIgnoreCase) = 0 Then
input.click()
clicked = True
Exit For
End If
End If
Next
If clicked Then
Exit For
End If
Next
End Sub
End Class
If you need to click on a link then just change HTMLInputElement to HTMLLinkElement (in declaration and cast operation).
I suppose you're not using WinForms but if you do so, I'd suggest you to switch to the .NET version of the WebBrowser control.
Hope this will help you.
-- Pavel