VBA Selenium sending keys

I am trying to scrape a LinkedIn page. I successfully sign in and type the search value into the input box, but after that I cannot send the Enter key. The code is below. I think there was something like "~" for Enter with SendKeys. What is also interesting: why does code like webdriver.SendKeys("inputvalue", Keys.Enter) give an error?
Sub LinkdNIn_SlnIUM_login()
    Dim webdriber As WebDriver
    Set webdriber = New ChromeDriver
    Const URL = "https://www.linkedin.com/"
    webdriber.Start "chrome", URL
    webdriber.Get "/"
    webdriber.Wait 3000
    On Error Resume Next
    Dim a As WebElements
    Set a = webdriber.FindElementsByClass("authwall-join-form__form-toggle--bottom")
    a.Item(1).Click
    Set a = webdriber.FindElementsById("session_key")
    a.First.Click
    a.First.SendKeys Worksheets(1).Range("I6").Value
    webdriber.Wait 3000
    Set a = webdriber.FindElementsById("session_password")
    a.Item(1).Click
    webdriber.Wait 3000
    a.First.SendKeys Worksheets(1).Range("I7").Value
    webdriber.Wait 3000
    Set a = webdriber.FindElementsByClass("sign-in-form__submit-button")
    a.Item(1).Click
    webdriber.Wait 3000
    Set a = webdriber.FindElementsByClass("secondary-action")
    a.Item(1).Click
    webdriber.Wait 3000
    Set a = webdriber.FindElementsByClass("search-global-typeahead__collapsed-search-button")
    a.Item(1).Click
    webdriber.Wait 3000
    Set a = webdriber.FindElementsByClass("search-global-typeahead__input")
    a.Item(1).Click
    webdriber.Wait 3000
    a.Item(1).SendKeys ("data analyst in jobs")
    a.Item(1).SendKeys Keys.Enter ' error object is required
End Sub
I tried the following with no result:
a.Item(1).SendKeys ("data analyst in jobs")
webdriber.Wait 3000
a.Item(1).SendKeys "{ENTER}"

Try:
a.Item(1).SendKeys ("data analyst in jobs" & vbCrLf)

Related

Can't get rid of "old format or invalid type library" error in vba

I've written a script in VBA to get some set names from a webpage, and the script is getting them accordingly until it catches an error somewhere during execution. This is the first time I have encountered such an error.
What my script does is get all the links under Company Sets, then track down each of those links one layer deep, then follow all the links under Set Name another layer deep, and finally parse the table from there. Since the script is getting bigger, I parse the name of the PUBLISHED SET, which is stored in the variable bName, instead of the table. I used IE to get the PUBLISHED SET because a few leads were causing encoding issues.
I searched all over for a workaround but had no luck.
However, I came across this thread where there is a proposed solution written in VB, but I can't figure out how to make it work in VBA.
Script I'm trying with:
Sub FetchRecords()
    Const baseUrl$ = "https://www.psacard.com"
    Const link = "https://www.psacard.com/psasetregistry/baseball/company-sets/16"
    Dim IE As New InternetExplorer, Htmldoc As HTMLDocument
    Dim Http As New XMLHTTP60, Html As New HTMLDocument, bName$, tRow As Object
    Dim post As Object, elem As Object, posts As Object, I&, R&, C&
    Dim key As Variant
    Dim idic As Object: Set idic = CreateObject("Scripting.Dictionary")

    With Http
        .Open "GET", link, False
        .send
        Html.body.innerHTML = .responseText
    End With

    Set posts = Html.querySelectorAll(".dataTable tr td a[href*='/psasetregistry/baseball/company-sets/']")
    For I = 0 To posts.Length - 7
        idic(baseUrl & Split(posts(I).getAttribute("href"), "about:")(1)) = 1
    Next I

    For Each key In idic.Keys
        With Http
            .Open "GET", key, False
            .send
            Html.body.innerHTML = .responseText
        End With

        For Each post In Html.getElementsByTagName("a")
            If InStr(post.getAttribute("title"), "Contact User") > 0 Then
                If InStr(post.ParentNode.getElementsByTagName("a")(0).getAttribute("href"), "publishedset") > 0 Then
                    IE.Visible = True
                    IE.navigate baseUrl & Split(post.ParentNode.getElementsByTagName("a")(0).getAttribute("href"), "about:")(1)
                    While IE.Busy = True Or IE.readyState < 4: DoEvents: Wend
                    Set Htmldoc = IE.document
                    bName = Htmldoc.querySelector("h1 b.text-primary").innerText
                    If InStr(bName, "/") > 0 Then bName = Split(Htmldoc.querySelector(".inline-block a[href*='/contactuser/']").innerText, " ")(1)
                    R = R + 1: Cells(R, 1) = bName
                End If
            End If
        Next post
    Next key
    IE.Quit
End Sub
I get that error, pointing at the following line, after extracting somewhere between 70 and 90 records:
bName = Htmldoc.querySelector("h1 b.text-primary").innerText
The error looks like:
Automation Error: old format or invalid type library
Proposed solution in the linked thread, written in VB (which I can't convert to VBA):
'save the current settings for easier restoration later
Dim oldCI As System.Globalization.CultureInfo = _
    System.Threading.Thread.CurrentThread.CurrentCulture

'change the settings
System.Threading.Thread.CurrentThread.CurrentCulture = _
    New System.Globalization.CultureInfo("en-US")

'Your code here

'restore the previous settings
System.Threading.Thread.CurrentThread.CurrentCulture = oldCI
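VBA has no System.Globalization, but a rough equivalent of that culture swap can be sketched with the kernel32 thread-locale functions (GetThreadLocale / SetThreadLocale; &H409 is the en-US locale ID 1033). This is an untested sketch of the idea, not a confirmed fix for the automation error:
#If VBA7 Then
    Private Declare PtrSafe Function GetThreadLocale Lib "kernel32" () As Long
    Private Declare PtrSafe Function SetThreadLocale Lib "kernel32" (ByVal lcid As Long) As Long
#Else
    Private Declare Function GetThreadLocale Lib "kernel32" () As Long
    Private Declare Function SetThreadLocale Lib "kernel32" (ByVal lcid As Long) As Long
#End If

Sub RunWithEnUsLocale()
    Dim oldLcid As Long
    oldLcid = GetThreadLocale()   'save the current settings for easier restoration later
    SetThreadLocale &H409         'change the settings to en-US (LCID 1033)
    'Your code here
    SetThreadLocale oldLcid       'restore the previous settings
End Sub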

Web-scraping across multipages without even knowing the last page number

Running my code to crawl the titles of different tutorials spread across several pages of a site, I found it working flawlessly. I then tried to write code that does not depend on the last page number in the URL, but on the status code instead, looping until http.Status <> 200. The code I'm pasting below works impeccably in this case. However, trouble comes up when I try it with another URL to see whether it breaks automatically: the code fetches all the results but never breaks. What is the workaround so that the code breaks when it is done and stops the macro? Here is the working one:
Sub WiseOwl()
    Const mlink = "http://www.wiseowl.co.uk/videos/default"
    Dim http As New XMLHTTP60, html As New HTMLDocument
    Dim post As Object

    Do While True
        y = y + 1
        With http
            .Open "GET", mlink & "-" & y & ".htm", False
            .send
            If .Status <> 200 Then
                MsgBox "It's done"
                Exit Sub
            End If
            html.body.innerHTML = .responseText
        End With

        For Each post In html.getElementsByClassName("woVideoListDefaultSeriesTitle")
            With post.getElementsByTagName("a")
                x = x + 1
                If .Length Then Cells(x, 1) = .item(0).innerText
            End With
        Next post
    Loop
End Sub
I found a logic to get around this with Yellow Pages. My updated script is able to parse Yellow Pages, but it breaks before scraping the last page because there is no "Next Page" button. I tried with this:
"https://www.dropbox.com/s/iptqm79b0byw3dz/Yellowpage.txt?dl=0"
However, when I tried to apply the same logic to this torrent site, it doesn't work:
"https://www.yify-torrent.org/genres/western/p-1/"
You can always rely on whether elements exist or not. Here, for example, if you try to use the object you have set your element to when nothing was found, you will get:
Run-time error '91': Object variable or With block variable not set
This is the key you should be looking for to put an end to your code. Please see the example below:
Sub yify()
    Const mlink = "https://www.yify-torrent.org/genres/western/p-"
    Dim http As New XMLHTTP60, html As New HTMLDocument
    Dim post As Object
    Dim posts As Object
    y = 1

    Do
        With http
            .Open "GET", mlink & y & "/", False
            .send
            html.body.innerHTML = .responseText
        End With

        Set posts = html.getElementsByClassName("mv")
        On Error GoTo Endofpage
        Debug.Print Len(posts) 'to force Error 91

        For Each post In posts
            With post.getElementsByTagName("div")
                x = x + 1
                If .Length Then Cells(x, 1) = .Item(0).innerText
            End With
        Next post

        y = y + 1
Endofpage:
    Loop Until Err.Number = 91
    Debug.Print "It's over"
End Sub
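If relying on Err.Number 91 feels indirect, a slightly more explicit exit condition is to test the collection itself before looping. This sketch assumes that an out-of-range page comes back either as Nothing or as a zero-length collection for the "mv" class, which may vary by site:
Set posts = html.getElementsByClassName("mv")
If posts Is Nothing Then Exit Do       'no matching elements at all
If posts.Length = 0 Then Exit Do       'an empty collection also means we are past the last page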

Error when changing IE automation code to XML

I recently started working with XML automation and after changing some basic IE automation code over, I seem to be getting an error. Here's the HTML:
<tbody>
    <tr class="group-2 first">
        <td class="date-col">
            <a href="/stats/matches/mapstatsid/48606/teamone-vs-merciless">
                <div class="time" data-time-format="d/M/yy" data-unix="1498593600000">27/6/17</div>
            </a>
        </td>
        ......SOME MORE HTML HERE......
    </tr>
    ......SOME MORE HTML HERE......
</tbody>
And here's the code I'm using in Excel VBA:
Sub readData()
    Dim XMLPage As New MSXML2.XMLHTTP60
    Dim html As New MSHTML.HTMLDocument

    XMLPage.Open "GET", "https://www.hltv.org/stats/matches", False
    XMLPage.send
    If XMLPage.Status <> 200 Then MsgBox XMLPage.statusText
    html.body.innerHTML = XMLPage.responseText

    For Each profile In html.getElementsByTagName("tbody")(0).Children
        Debug.Print profile.getElementsByClassName("date-col")(0).getElementsByTagName("a")(0).getAttribute("href") 'Run time error '438' here
    Next
End Sub
I'm getting the run-time error '438' at the Debug.Print line. It seems to happen when getting the class, but I'm unsure why. It works fine if I use this, for example:
Debug.Print profile.innertext
Worked for me:
Sub readData()
    Dim XMLPage As New MSXML2.XMLHTTP60
    Dim html As New MSHTML.HTMLDocument, links, a, i

    XMLPage.Open "GET", "https://www.hltv.org/stats/matches", False
    XMLPage.send
    If XMLPage.Status <> 200 Then MsgBox XMLPage.statusText
    html.body.innerHTML = XMLPage.responseText

    Set links = html.querySelectorAll("td.date-col > a")
    Debug.Print links.Length

    For i = 0 To links.Length - 1
        Debug.Print links(i).href
    Next

    Set links = Nothing
    Set html = Nothing
End Sub
FYI, when I used For Each to loop over the links collection, Excel would reliably crash, so I'd stay with the indexed loop shown.
profile refers to a row, and profile.cells(0) will refer to the first column in that row. So try...
profile.cells(0).getElementsByTagName("a")(0).getAttribute("href")
Also, profile should be declared as HTMLTableRow.
The URL you are using isn't serving valid XML, but it's recoverable with some simple regex replacements. Once we have some valid XML, we can load that into a DOM document and use XPath to select the nodes as required:
Option Explicit

'Add references to:
' - MSXML v3
' - Microsoft VBScript Regular Expressions 5.5

Sub test()
    Const START_MARKER As String = "<table class=""stats-table matches-table"">"
    Const END_MARKER As String = "</table>"

    With New MSXML2.XMLHTTP
        .Open "GET", "https://www.hltv.org/stats/matches", False
        .send
        If .Status = 200 Then
            'The HTML isn't valid XHTML, so we can't just use the http.XMLResponse DOMDocument
            'Let's extract the HTML table
            Dim tableStart As Long
            tableStart = InStr(.responseText, START_MARKER)
            Dim tableEnd As Long
            tableEnd = InStr(tableStart, .responseText, END_MARKER)
            Dim tableHTML As String
            tableHTML = Mid$(.responseText, tableStart, tableEnd - tableStart + Len(END_MARKER))

            'The HTML table has invalid img tags (let's add a closing tag with some regex)
            With New RegExp
                .Global = True
                .Pattern = "(\<img [\W\w]*?)"">"
                Dim tableXML As String
                tableXML = .Replace(tableHTML, "$1"" />")
            End With

            'And load an XML document from the cleaned up HTML fragment
            Dim doc As MSXML2.DOMDocument
            Set doc = New MSXML2.DOMDocument
            doc.LoadXML tableXML
        End If
    End With

    If Not doc Is Nothing Then
        'Use XPath to select the nodes we need
        Dim nodes As MSXML2.IXMLDOMSelection
        Set nodes = doc.SelectNodes("//td[@class='date-col']/a/@href")

        'Enumerate the URLs
        Dim node As IXMLDOMAttribute
        For Each node In nodes
            Debug.Print node.nodeTypedValue
        Next node
    End If
End Sub
Output:
/stats/matches/mapstatsid/48606/teamone-vs-merciless
/stats/matches/mapstatsid/48607/merciless-vs-teamone
/stats/matches/mapstatsid/48608/merciless-vs-teamone
/stats/matches/mapstatsid/48600/wysix-vs-fnatic-academy
/stats/matches/mapstatsid/48602/skitlite-vs-nexus
/stats/matches/mapstatsid/48604/extatus-vs-forcebuy
/stats/matches/mapstatsid/48605/extatus-vs-forcebuy
/stats/matches/mapstatsid/48599/planetkey-vs-gatekeepers
/stats/matches/mapstatsid/48603/gatekeepers-vs-planetkey
/stats/matches/mapstatsid/48595/wysix-vs-gambit
/stats/matches/mapstatsid/48596/kinguin-vs-playing-ducks
/stats/matches/mapstatsid/48597/spirit-academy-vs-tgfirestorm
/stats/matches/mapstatsid/48601/spirit-academy-vs-tgfirestorm
/stats/matches/mapstatsid/48593/fnatic-academy-vs-gambit
/stats/matches/mapstatsid/48594/alternate-attax-vs-nexus
/stats/matches/mapstatsid/48590/pro100-vs-playing-ducks
/stats/matches/mapstatsid/48583/extatus-vs-ex-fury
/stats/matches/mapstatsid/48589/extatus-vs-ex-fury
/stats/matches/mapstatsid/48584/onlinerol-vs-forcebuy
/stats/matches/mapstatsid/48591/forcebuy-vs-onlinerol
/stats/matches/mapstatsid/48581/epg-vs-veni-vidi-vici
/stats/matches/mapstatsid/48588/epg-vs-veni-vidi-vici
/stats/matches/mapstatsid/48592/veni-vidi-vici-vs-epg
/stats/matches/mapstatsid/48582/log-vs-gatekeepers
/stats/matches/mapstatsid/48586/gatekeepers-vs-log
/stats/matches/mapstatsid/48580/spraynpray-vs-epg
/stats/matches/mapstatsid/48579/quantum-bellator-fire-vs-spraynpray
/stats/matches/mapstatsid/48571/noxide-vs-masterminds
/stats/matches/mapstatsid/48572/athletico-vs-legacy
/stats/matches/mapstatsid/48578/node-vs-avant
/stats/matches/mapstatsid/48573/funky-monkeys-vs-grayhound
/stats/matches/mapstatsid/48574/grayhound-vs-funky-monkeys
/stats/matches/mapstatsid/48575/hegemonyperson-vs-eclipseo
/stats/matches/mapstatsid/48577/eclipseo-vs-hegemonyperson
/stats/matches/mapstatsid/48566/masterminds-vs-tainted-black
/stats/matches/mapstatsid/48562/grayhound-vs-legacy
/stats/matches/mapstatsid/48563/noxide-vs-riotous-raccoons
/stats/matches/mapstatsid/48564/avant-vs-dark-sided
/stats/matches/mapstatsid/48565/avant-vs-dark-sided
/stats/matches/mapstatsid/48567/eclipseo-vs-uya
/stats/matches/mapstatsid/48568/uya-vs-eclipseo
/stats/matches/mapstatsid/48560/uya-vs-new4
/stats/matches/mapstatsid/48561/new4-vs-uya
/stats/matches/mapstatsid/48559/jaguar-sa-vs-miami-flamingos
/stats/matches/mapstatsid/48558/spartak-vs-binary-dragons
/stats/matches/mapstatsid/48557/kungar-vs-spartak
/stats/matches/mapstatsid/48556/igamecom-vs-fragsters
/stats/matches/mapstatsid/48554/nordic-warthogs-vs-aligon
/stats/matches/mapstatsid/48555/binary-dragons-vs-kungar
/stats/matches/mapstatsid/48550/havu-vs-rogue-academy
Looking at the MSHTML.HTMLDocument reference, there is no getElementsByClassName method.
You will need to loop through each row in the tbody you are selecting, get the first td in that row, then get the first link in that td and read its href attribute. You could alternatively compare the class attribute of the td, but since it is the first element in the row there is no need to do that.
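A minimal sketch of that row-by-row approach, assuming every direct child of the tbody is a table row whose first cell contains the link:
Dim profile As MSHTML.HTMLTableRow
Dim anchors As Object
For Each profile In html.getElementsByTagName("tbody")(0).Children
    Set anchors = profile.Cells(0).getElementsByTagName("a")              'links in the first td of the row
    If anchors.Length > 0 Then Debug.Print anchors(0).getAttribute("href")
Next profile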

Half of the records are getting scraped out of 84

I've made a parser in VBA which is able to scrape names from Yellow Pages Canada. However, the issue is that the page contains 84 names but my parser is scraping only 41 of them. How can I fix this? Any help would be a blessing. Thanks in advance. Here is the code:
http.Open "GET", "http://www.yellowpages.ca/search/si/1/Outdoor%20wedding/Edmonton", False
http.send
html.body.innerHTML = http.responseText
Set topics = html.getElementsByClassName("listing__name--link jsListingName")
For Each topic In topics
Cells(x, 1) = topic.innerText
x = x + 1
Next topic
By the way, I used an MSXML2.XMLHTTP60 request.
If you look at the page's web requests, you'll notice it'll trigger another web request once the page has been scrolled past a certain point.
The format of the new requests is like this:
First 40 records: http://www.yellowpages.ca/search/si/1/Outdoor%20wedding/Edmonton
Next 40 records: http://www.yellowpages.ca/search/si/2/Outdoor%20wedding/Edmonton
Next 40 records: http://www.yellowpages.ca/search/si/3/Outdoor%20wedding/Edmonton
Basically, for new data (in batches of 40 records) it increments part of the URL by 1.
Which is good news: we can just loop to return the results. Here's the code I came up with. For whatever reason, the getElementsByClassName selector wasn't working for me, so I worked around it in my code. If you can use that selector, use it instead of what I have below for that part.
Lastly, I added an explicit reference to Microsoft XML v6.0, so you should do the same to get this to function as it is.
Option Explicit

Public Sub SOTestScraper()
    Dim topics As Object
    Dim topic As Object
    Dim webResp As Object
    Dim i As Long
    Dim j As Long
    Dim mySheet As Worksheet: Set mySheet = ThisWorkbook.Sheets("Sheet1") ' Change this
    Dim myArr() As Variant: ReDim myArr(10000) 'Probably overkill

    For i = 1 To 20 ' unsure how many records you expect, I defaulted to 20 pages, or 800 results
        Set webResp = getWebResponse(CStr(i)) ' return the web response
        Set topics = webResp.getElementsByTagName("*") ' I couldn't find the className so I did this instead
        If topics Is Nothing Then Exit For 'Exit the for loop if Status 200 wasn't received
        For Each topic In topics
            On Error Resume Next
            'If getElementByClassName is working for you, use it
            If topic.ClassName = "listing__name--link jsListingName" Then
                myArr(j) = topic.InnerText
                j = j + 1
            End If
        Next
    Next

    'add the data to Excel
    ReDim Preserve myArr(j - 1)
    mySheet.Range("A1:A" & j) = WorksheetFunction.Transpose(myArr)
End Sub

Function getWebResponse(ByVal pageNumber As String) As Object
    Dim http As MSXML2.ServerXMLHTTP60: Set http = New MSXML2.ServerXMLHTTP60
    Dim html As Object: Set html = CreateObject("htmlfile")

    With http
        .Open "GET", "http://www.yellowpages.ca/search/si/" & pageNumber & "/Outdoor%20wedding/Edmonton"
        .send
        .waitForResponse
        html.body.innerHTML = .responseText
        .waitForResponse
    End With

    If Not http.Status = 200 Then
        Set getWebResponse = Nothing
    Else
        Set getWebResponse = html
    End If

    Set html = Nothing
    Set http = Nothing
End Function

Input textbox value is empty when submiting a web form in vba

I am trying to submit a web form by supplying all the values in the form's input text boxes, but when I call the submit button's click from Excel VBA, one of the text boxes becomes empty and throws a validation error.
Sub Click_Btn()
    Dim objForms As Object
    Dim vTxtInput As Variant

    'Set objIE = GetIEApp
    Set objIE = New InternetExplorer

    'Make sure an IE object was hooked
    If TypeName(objIE) = "Nothing" Then
        MsgBox "Could not hook Internet Explorer object", _
               vbCritical, "GetFields() Error"
        'GoTo Clean_Up
    End If

    objIE.Navigate "http://fly3.emirates.com/CAB/IBE/SearchAvailability.aspx"
    objIE.Visible = True
    Sleep 5000
    Set objForms = objIE.document.all

    'Choose one way Flights
    objIE.document.getElementById("ctl00_c_CtWNW_onewaySearch").Click
    Sleep 1000
    objIE.document.getElementById("ctl00_c_CtWNW_ddlTo-suggest").Value = "Sydney (SYD)"

    ' Departure
    objIE.document.getElementById("ctl00_c_CtWNW_ddlFrom-suggest").Value = "Mumbai (BOM)"

    ' Departure Date
    objIE.document.getElementById("ctl00_c_CtWNW_txtDepartDate").Value = "28 Sep 12"
    Sleep 1000
    objIE.document.getElementById("ctl00_c_FS_FF").Click
End Sub
When I click the submit button for the form, the ctl00_c_CtWNW_ddlTo-suggest text box becomes empty and I get an error.
You are getting that error because you are not giving the page enough time to validate the names in the FROM and TO dropdowns (yes, dropdowns, not text boxes).
Try this code (tried and tested). Also, I am using late binding with IE; change as applicable in your code.
The Sleep 5000 gives the text in the dropdowns enough time to validate itself against the dropdown list.
Option Explicit

Private Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)

Sub Click_Btn()
    Dim objForms As Object
    Dim vTxtInput As Variant
    Dim objIE As Object

    Set objIE = CreateObject("InternetExplorer.Application")

    'Make sure an IE object was hooked
    If TypeName(objIE) = "Nothing" Then
        MsgBox "Could not hook Internet Explorer object", _
               vbCritical, "GetFields() Error"
        'GoTo Clean_Up
    End If

    objIE.Navigate "http://fly3.emirates.com/CAB/IBE/SearchAvailability.aspx"
    objIE.Visible = True
    Sleep 5000
    Set objForms = objIE.document.all

    '~~> Choose one way Flights
    objIE.document.getElementById("ctl00_c_CtWNW_onewaySearch").Click

    '~~> From
    objIE.document.getElementById("ctl00_c_CtWNW_ddlFrom-suggest").Focus
    objIE.document.getElementById("ctl00_c_CtWNW_ddlFrom-suggest").Value _
        = "Mumbai (BOM)"
    Sleep 5000

    '~~> Departure Date
    objIE.document.getElementById("ctl00_c_CtWNW_txtDepartDate").Focus
    objIE.document.getElementById("ctl00_c_CtWNW_txtDepartDate").Value _
        = "28 Sep 12"
    Sleep 5000

    '~~> To
    objIE.document.getElementById("ctl00_c_CtWNW_ddlTo-suggest").Focus
    objIE.document.getElementById("ctl00_c_CtWNW_ddlTo-suggest").Value _
        = "Sydney (SYD)"
    Sleep 5000

    objIE.document.getElementById("ctl00_c_FS_FF").Focus
    objIE.document.getElementById("ctl00_c_FS_FF").Click
End Sub
EDIT
Here is Way 2, which is faster than Way 1 (above) by 15 seconds, since we are not using Sleep 5000 for validation. This one doesn't require you to validate the dropdowns: it bypasses the checkValidation(); JavaScript that gets executed in the ONCLICK event of the Submit button.
WAY 2
Option Explicit

Private Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)

Sub Click_Btn()
    Dim objForms As Object
    Dim vTxtInput As Variant
    Dim objIE As Object

    Set objIE = CreateObject("InternetExplorer.Application")

    'Make sure an IE object was hooked
    If TypeName(objIE) = "Nothing" Then
        MsgBox "Could not hook Internet Explorer object", _
               vbCritical, "GetFields() Error"
        'GoTo Clean_Up
    End If

    objIE.Navigate "http://fly3.emirates.com/CAB/IBE/SearchAvailability.aspx"
    objIE.Visible = True
    Sleep 5000
    Set objForms = objIE.document.all

    'Choose one way Flights
    objIE.document.getElementById("ctl00_c_CtWNW_onewaySearch").Click

    ' From
    objIE.document.getElementById("ctl00_c_CtWNW_ddlFrom-suggest").Focus
    objIE.document.getElementById("ctl00_c_CtWNW_ddlFrom-suggest").Value _
        = "Mumbai (BOM)"

    ' Departure Date
    objIE.document.getElementById("ctl00_c_CtWNW_txtDepartDate").Focus
    objIE.document.getElementById("ctl00_c_CtWNW_txtDepartDate").Value _
        = "28 Sep 12"

    ' To
    objIE.document.getElementById("ctl00_c_CtWNW_ddlTo-suggest").Focus
    objIE.document.getElementById("ctl00_c_CtWNW_ddlTo-suggest").Value _
        = "Sydney (SYD)"

    objIE.document.getElementById("ctl00_c_FS_FF").Focus
    objIE.document.getElementById("ctl00_c_FS_FF").onclick = _
        Replace(objIE.document.getElementById("ctl00_c_FS_FF").onclick, _
                "checkValidation();", "true;")
    objIE.document.getElementById("ctl00_c_FS_FF").Click
End Sub