Task:
So this is my first foray into Selenium, and I am attempting to:
1) Find the number of pages in the pagination set listed at the bottom of https://codingislove.com/ (this is purely to support Task 2 by determining the loop end).
2) Loop over them.
I believe these are linked, but for those who want a single issue: I simply want to find the correct collection and loop over it to load each page.
The number of pages is, at the time of writing, 6, as seen at the bottom of the webpage.
As an MCVE, I simply want to find the number of pages and click my way through them, using SeleniumBasic.
What I have tried:
I have read through a number of online resources; a few are listed under References.
Task 1)
It seems that I should be able to find the count of pages using the Size property, but I can't find the right object to use it with. I have made a number of attempts; a few are shown below:
bot.FindElementsByXPath("//*[@id=""main""]/nav/div/a[3]").Size '<==this I think is too specific
bot.FindElementsByClass("page-numbers").Size
But these yield the run-time error 438:
"Object does not support this property or method"
And the following doesn't seem to expose the required methods:
bot.FindElementByCss(".navigation.pagination")
I have fudged it with
bot.FindElementsByClass("page-numbers").Count + 1
but I would like something more robust.
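A sketch of a more robust count, under the assumption that the pagination items are the elements with class page-numbers inside the element with class pagination (as on the page at the time of writing). Instead of counting elements and adding 1, it reads each item's visible text and keeps the largest number, so the "Next" link and any "..." gap are ignored. Note that a SeleniumBasic WebElements collection exposes Count rather than Size, which is consistent with the error 438 above. TrailingNumber and PageCountFromPagination are hypothetical helper names.

```vba
Option Explicit

'Hypothetical helper: returns the integer at the end of s ("Page 6" -> 6), or 0 if s does not end in a digit.
Public Function TrailingNumber(ByVal s As String) As Long
    Dim i As Long
    s = Trim$(s)
    i = Len(s)
    'Walk backwards over the trailing digits
    Do While i > 0
        If Not Mid$(s, i, 1) Like "#" Then Exit Do
        i = i - 1
    Loop
    If i < Len(s) Then TrailingNumber = CLng(Mid$(s, i + 1))
End Function

'Hypothetical helper: highest page number shown in the pagination bar.
'Assumes the items carry the class "page-numbers" inside ".pagination".
Public Function PageCountFromPagination(bot As WebDriver) As Long
    Dim el As WebElement, n As Long
    For Each el In bot.FindElementsByCss(".pagination .page-numbers")
        n = TrailingNumber(el.Text)
        If n > PageCountFromPagination Then PageCountFromPagination = n
    Next el
End Function
```

Usage: pageCount = PageCountFromPagination(bot) after bot.Get "/". If the site changes its pagination markup, only the CSS selector should need updating.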
Task 2)
I know that I can navigate to the next page, from page 1, with:
bot.FindElementByXPath("//*[@id=""main""]/nav/div/a[3]").Click
But I can't use this in a loop, presumably because the XPath needs to be updated.
If it is not updated, it leads to run-time error 13.
As the redirects follow the general pattern
href="https://codingislove.com/page/pageNumber/"
I can again fudge my way through by constructing each URL in the loop with
bot.Get "https://codingislove.com/page/" & i & "/"
But I would like something more robust.
Question:
How do I loop over the pagination set in a robust fashion using Selenium? I'm sure I am having a dense day and that there is an easy-to-target, appropriate collection to loop over.
Code - My current attempt
Option Explicit
Public Sub scrapeCIL()
Dim bot As New WebDriver, i As Long, pageCount As Long
bot.Start "chrome", "https://codingislove.com"
bot.Get "/"
pageCount = bot.FindElementsByClass("page-numbers").Count + 1
For i = 1 To pageCount 'technically can loop from 2 I know!
' bot.FindElementByXPath("//*[@id=""main""]/nav/div/a[3]").Click 'runtime error 13
' bot.FindElementByXPath("//*[@id=""main""]/nav/div/a[2]/span").Click 'runtime error 13
bot.Get "https://codingislove.com/page/" & i & "/"
Next i
Stop
bot.Quit
End Sub
Note:
Any supported browser will do. It doesn't have to be Chrome.
References:
Finding the number of pagination buttons in Selenium WebDriver
http://seleniumhome.blogspot.co.uk/2013/07/how-can-we-automate-pagination-using.html
Requirements:
Selenium Basic
ChromeDriver 2.37 (or use IE, but the zoom must be at 100%)
VBE: Tools > References > Selenium Type Library
To click the element, it must be visible on the screen, so you need to scroll to the bottom of the page first (Selenium might do this implicitly sometimes, but I don't find that reliable).
Try this:
Option Explicit
Public Sub scrapeCIL()
Dim bot As New WebDriver, btn As Object, i As Long, pageCount As Long
bot.Start "chrome", "https://codingislove.com"
bot.Get "/"
pageCount = bot.FindElementsByClass("page-numbers").Count
For i = 1 To pageCount
bot.ExecuteScript ("window.scrollTo(0,document.body.scrollHeight);")
Application.wait Now + TimeValue("00:00:02")
On Error Resume Next
Set btn = bot.FindElementByCss("a[class='next page-numbers']")
If btn.IsPresent = True Then
btn.Click
End If
On Error GoTo 0
Next i
bot.Quit
End Sub
Similar principle:
Option Explicit
Public Sub GetItems()
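Dim i As Long
Dim ks As New Selenium.Keys 'SeleniumBasic key constants
With New ChromeDriver
.Get "https://codingislove.com/"
For i = 1 To 6
.FindElementByXPath("//*[@id=""main""]/nav/div/a[3]").SendKeys ks.PageDown 'the PageDown key, not the literal text "Keys.PageDown"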
Application.Wait Now + TimeValue("00:00:02")
On Error Resume Next
.FindElementByCss("a.next").Click
On Error GoTo 0
Next i
End With
End Sub
Reference:
http://seleniumhome.blogspot.co.uk/2013/07/how-to-press-keyboard-in-selenium.html
If you're only interested in clicking through each of the pages (and getting the number of pages is just an aid to doing this), then you should be able to click this element until it's no longer there:
<span class="screen-reader-text">Next Page</span>
Using
bot.FindElementByXpath("//span[contains(text(), 'Next Page')]")
Have a loop click that link on each page load. Eventually it won't be there. Then use VBA's error handling to deal with whatever the equivalent of NoSuchElementException is in this implementation of WebDriver. You will need to re-find the element each time in the loop.
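A minimal sketch of that idea, assuming the Next link can be matched with the CSS selector a.next (as in the other answers). Instead of trapping an error, it uses FindElementsByCss(...).Count = 0 as the "no longer there" test, and it re-finds the link on every page because a reference obtained on the previous page is stale after navigation. The maxPages cap and the sub name are hypothetical.

```vba
Option Explicit

Public Sub ClickThroughPages()
    Const maxPages As Long = 100          'hypothetical safety cap against an endless loop
    Dim bot As New WebDriver, pagesVisited As Long
    bot.Start "chrome", "https://codingislove.com"
    bot.Get "/"
    pagesVisited = 1
    Do While pagesVisited < maxPages
        'An empty collection (Count = 0) means there is no Next link: last page reached
        If bot.FindElementsByCss("a.next").Count = 0 Then Exit Do
        'Re-find the link on the current page and bring it on screen before clicking
        With bot.FindElementByCss("a.next")
            .ScrollIntoView
            .Click
        End With
        pagesVisited = pagesVisited + 1
        '...scrape the newly loaded page here...
    Loop
    Debug.Print "Pages visited: " & pagesVisited
    bot.Quit
End Sub
```

If a page loads slowly, the Count check may still see the previous page's link, so a short wait after the click may be needed; that is a judgment call for the specific site.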
How about trying it like this? A few days back I figured out that sending the End key with .SendKeys takes you to the bottom of the page, so that the driver can reach the element to click. I used If Err.Number <> 0 Then Exit Do within the Do loop so that if the scraper encounters any error (in this case, an element-not-found error once the last page button has been clicked), it breaks out of the loop.
Give this a shot:
Sub GetItems()
Dim pagenum As Object
Dim ks As New Selenium.Keys 'SeleniumBasic key constants
With New ChromeDriver
.get "https://codingislove.com/"
Do
On Error Resume Next
Set pagenum = .FindElementByCss("a.next")
pagenum.SendKeys ks.End 'the End key, not the literal text "Keys.END"
Application.Wait Now + TimeValue("00:00:03")
pagenum.Click
If Err.Number <> 0 Then Exit Do
On Error GoTo 0
Loop
.Quit
End With
End Sub
Reference to add to the library:
Selenium Type Library
I am having trouble with my code when trying to locate an element by its class name. When I run it, run-time error 32 appears:
InvalidSelector Error: Compound class names not permitted
Maybe somebody can help with the code below. I have Chrome browser 103.5060.114 but WebDriver 103.5060.56. I could not find another driver to download.
HTML:
<span dir="auto" title="Customer" class="ggj6brxn gfz4du6o r7fjleex g0rxnol2 lhj4utae le5p0ye3 l7jjieqr i0jNr">Customer</span>
VBA code:
Sub iselementpresenttest()
Dim bot As New WebDriver
Dim ks As New Keys
'Init New Chrome instance & navigate to WebWhatsApp
bot.Start "chrome", "https://web.whatsapp.com/"
bot.Get "/"
bot.Window.Maximize
MsgBox "Please scan the QR code. After you are logged in, please confirm this message box by clicking 'ok'"
bot.Wait (3500)
' If bot.FindElementByClass("_2qo4q _3NIfV") = 1 Then
' Debug.Print "true"
'End If
If bot.FindElementByClass("ggj6brxn gfz4du6o r7fjleex g0rxnol2 lhj4utae le5p0ye3 l7jjieqr i0jNr") = 1 Then
' If bot.FindElementByXPath("//*[@id='pane-side']/div/div/div/div[11]/div/div/div[2]/div[2]/div[1]/span/div/span") = 1 Then
Debug.Print "true"
End If
' If bot.FindElementsByXPath("//*[@id='main']/div[3]/div/div[2]/div[3]/div[20]/div/div[1]/div[1]/div[2]/div/div/span").Count > 0 Then
If bot.FindElementsByXPath("//*[@id='main']/div[3]/div/div[2]/div[3]/div[20]/div/div[1]/div[1]/div[2]/div/div/span").Count > 0 Then
bot.TakeScreenshot.ToExcel.PasteSpecial (xlPasteAll)
bot.Quit
MsgBox "Yes"
Else
bot.Quit
MsgBox "NO"
End If
End Sub
My aim is to get the WhatsApp sent/received icon into the Excel sheet.
You need to take care of a couple of things here:
The class names of the <span> look dynamic and may change sooner or later, possibly even the next time you access the application afresh.
Selenium doesn't permit compound class names (FindElementByClass accepts a single class name).
Solution
To identify the element you can use the following locator strategies:
Using FindElementByCss:
bot.FindElementByCss("span[title='Customer']")
Using FindElementByXPath:
bot.FindElementByXPath("//span[@title='Customer' and text()='Customer']")
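If you do want to reuse a compound class attribute, one option is to turn it into a CSS selector in which each class becomes a .class part, since CSS (unlike FindElementByClass) accepts several classes. ClassesToCss and IsCustomerLabelPresent are hypothetical helper names. The presence check uses FindElementsByCss(...).Count > 0 instead of comparing an element to 1, as the question's If bot.FindElementByClass(...) = 1 does. Given the dynamic-looking class names, the title-based selector is probably the more stable choice.

```vba
Option Explicit

'Hypothetical helper: "a b  c" -> ".a.b.c" (one CSS class selector per class name)
Public Function ClassesToCss(ByVal classAttr As String) As String
    Dim parts() As String, i As Long, result As String
    parts = Split(Trim$(classAttr), " ")
    For i = LBound(parts) To UBound(parts)
        'Skip empty parts produced by repeated spaces
        If Len(parts(i)) > 0 Then result = result & "." & parts(i)
    Next i
    ClassesToCss = result
End Function

'Presence check without relying on an error: Count is 0 when nothing matches
Public Function IsCustomerLabelPresent(ByVal bot As Object) As Boolean
    IsCustomerLabelPresent = bot.FindElementsByCss("span[title='Customer']").Count > 0
End Function
```

Usage: bot.FindElementsByCss("span" & ClassesToCss("ggj6brxn gfz4du6o i0jNr")).Count > 0.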
I'm trying to click the Load More button located at the bottom of the left pane of this webpage using VBA in combination with Selenium, but the script always throws a timeout error pointing at the .Get Url line. Although I seem to have defined an accurate XPath to locate the element, I can't work out what to do next to achieve this.
How can I click on that Load More button?
Sub ClickOnLoadMore()
Const Url$ = "http://www.ratemyprofessors.com/search.jsp?queryoption=TEACHER&queryBy=schoolDetails&schoolID=457&schoolName=James+Madison+University&dept=select"
Dim driver As New ChromeDriver, post As Object
With driver
.Get Url
Set post = .FindElementByXPath("//div[contains(.,'Load More')]")
.ExecuteScript "arguments[0].scrollIntoView();", post
post.Click
End With
End Sub
I see two "Load More" buttons. Both are matched by "//div[contains(.,'Load More')]". The first one is hidden, so you need to handle the second one.
Try this XPath:
"//div[@class='content' and . = 'Load More']"
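A sketch that does not depend on which match comes first: it takes the narrower XPath and clicks the first match whose IsDisplayed property is True, so a hidden duplicate is skipped wherever it sits. The sub name is hypothetical, and the driver is passed late-bound (As Object) so either a WebDriver or a ChromeDriver instance can be used.

```vba
Option Explicit

'Clicks the first visible "Load More" match, skipping hidden duplicates.
Public Sub ClickVisibleLoadMore(ByVal driver As Object)
    Dim el As WebElement
    For Each el In driver.FindElementsByXPath("//div[@class='content' and . = 'Load More']")
        If el.IsDisplayed Then
            el.ScrollIntoView     'bring it on screen before clicking
            el.Click
            Exit Sub
        End If
    Next el
    Debug.Print "No visible 'Load More' element found"
End Sub
```

Usage: call ClickVisibleLoadMore driver after .Get Url. Banners that overlay the button (see the next answer) may still need dismissing first.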
At least for me, there were a couple of banners to dismiss, as well as scrolling to do. There was no problem with the .get line.
Option Explicit
Public Sub ClickOnLoadMore()
Const Url$ = "http://www.ratemyprofessors.com/search.jsp?queryoption=TEACHER&queryBy=schoolDetails&schoolID=457&schoolName=James+Madison+University&dept=select"
Dim driver As New ChromeDriver, post As Object
With driver
.get Url
If .FindElementsByCss(".close-notice.close-this").Count > 0 Then
.FindElementByCss(".close-notice.close-this").Click
End If
.SwitchToFrame .FindElementByCss("[id^='spout-unit-iframe']")
With .FindElementByCss("#spout-ads #spout-header-close")
.ScrollIntoView
.Click
End With
.SwitchToDefaultContent
.ExecuteScript "document.querySelector('.result-list [onclick*=LoadMore]').scrollIntoView(true);" & _
"window.scrollBy(0, -(window.innerHeight - this.clientHeight) / 2);"
.FindElementByCss(".result-list [onclick*=LoadMore]").Click
Stop '<== Delete me later
'other code
.Quit
End With
End Sub
I'd like to use Selenium VBA to download some data for the KOSPI Composite Index from Yahoo Finance.
I'm having difficulty clicking the date picker arrow to open the mini window where I can select today as the end date. I tried to record the macro with Selenium IDE in Chrome, but the IDE does not record the step when I click the Time Period arrow to make the date picker visible.
Below is my code in VBA.
Public Function seleniumKorea(bot As WebDriver)
Dim url As String
url = "https://finance.yahoo.com/quote/%5EKS11/history?period1=1484018309&period2=1515554309&interval=1d&filter=history&frequency=1d"
bot.Start "chrome", url
bot.Get "/"
'Not sure how to add date picker here
bot.FindElementByName("endDate").Clear
bot.FindElementByName("endDate").SendKeys (Date)
bot.FindElementByXPath("(.//*[normalize-space(text()) and normalize-space(.)='End Date'])[1]/following::button[1]").Click
Application.Wait (Now + TimeValue("0:01:00"))
bot.FindElementByXPath("(.//*[normalize-space(text()) and normalize-space(.)='As of'])[1]/following::div[4]").Click
Application.Wait (Now + TimeValue("0:01:00"))
bot.FindElementByXPath("(.//*[normalize-space(text()) and normalize-space(.)='Currency in KRW'])[1]/following::span[2]").Click
Application.Wait (Now + TimeValue("0:01:00"))
End Function
I tried to use FindElementByXPath to target the svg class, but it failed.
Thanks in advance.
I would use the following, which submits the Oath consent form if required and scrolls the date picker into the viewport:
Option Explicit
Public Sub DatePicking()
Dim d As WebDriver
Set d = New ChromeDriver
Const URL = "https://finance.yahoo.com/quote/%5EKS11/history?period1=1484018309&period2=1515554309&interval=1d&filter=history&frequency=1d"
With d
.get URL
If .Title = "Yahoo is now part of Oath" Then
.FindElementByCss("form").submit
End If
With .FindElementByCss("[data-test='date-picker-full-range']")
.ScrollIntoView
.Click
End With
With .FindElementByCss("[name=startDate]")
.Clear
.SendKeys "05/10/2017"
End With
With .FindElementByCss("[name=endDate]")
.Clear
.SendKeys "05/10/2017"
End With
Stop '<==Delete me later
.Quit
End With
End Sub
Check this out. It should serve the purpose. I used XPath to solve the issue.
Sub CustomizeDate()
Const Url$ = "https://finance.yahoo.com/quote/%5EKS11/history?period1=1484018309&period2=1515554309&interval=1d&filter=history&frequency=1d"
Dim driver As New ChromeDriver
With driver
.get Url
.FindElementByXPath("//input[@data-test='date-picker-full-range']", timeout:=5000).Click
.FindElementByXPath("//input[@name='startDate']").Clear.SendKeys ("01/05/2017")
.FindElementByXPath("//input[@name='endDate']").Clear.SendKeys ("01/08/2017")
.FindElementByXPath("//button/span[.='Done']", timeout:=5000).Click
.FindElementByXPath("//button/span[.='Apply']", timeout:=5000).Click
End With
End Sub
Don't use a hard-coded delay to wait for an item to appear; use an explicit wait instead. With timeout:=5000, the script waits up to 5000 ms (5 seconds) for that item to become available.
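To set the end date to today, as the question asks, one option is to format Date explicitly rather than passing it straight to SendKeys, where it is converted using the system's regional settings. This sketch assumes the date picker is already open (see the answers above) and that the field expects mm/dd/yyyy, which is only inferred from the dates the answers type in. UsDate and SetEndDateToToday are hypothetical names; the lookup reuses the timeout:=5000 explicit wait.

```vba
Option Explicit

'Hypothetical helper: formats a date as mm/dd/yyyy whatever the regional settings are.
'The backslashes make "/" literal instead of the locale's date separator.
Public Function UsDate(ByVal d As Date) As String
    UsDate = Format$(d, "mm\/dd\/yyyy")
End Function

'Assumes the date picker is already open.
Public Sub SetEndDateToToday(ByVal driver As Object)
    'Wait up to 5 s for the field rather than using a fixed delay
    With driver.FindElementByXPath("//input[@name='endDate']", timeout:=5000)
        .Clear
        .SendKeys UsDate(Date)   'Date returns today's date
    End With
End Sub
```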
I've written a script using VBA in combination with Selenium to get to the bottom of a lazy-loading webpage, and the script is able to do that. However, the For x loop I've used looks weird, and I have no justification for it. What I want is the same loop without a hard-coded number (200 in this case). Any help on this will be highly appreciated.
Sub Get_links()
Dim driver As New WebDriver
With driver
.Start "chrome", "http://fortune.com/fortune500"
.get "/list/"
End With
For x = 0 To 200
driver.ExecuteScript "window.scrollTo(0, document.body.scrollHeight);"
driver.Wait 500
Next x
End Sub
To be honest, I really like solving/adapting your questions; they are really challenging. Here you go:
Sub Get_links()
Dim driver As New WebDriver
Dim CurrentPageHeight As Long, PrevPageHeight As Long
Dim EndofPage As Boolean
'EndofPage = False
With driver
.Start "chrome", "http://fortune.com/fortune500"
.get "/list/"
End With
Do While EndofPage = False
PrevPageHeight = CurrentPageHeight
CurrentPageHeight = driver.ExecuteScript("window.scrollTo(0, document.body.scrollHeight);var CurrentPageHeight=document.body.scrollHeight;return CurrentPageHeight;")
driver.Wait 3000 'depending on your internet connection, increase or decrease time
If PrevPageHeight = CurrentPageHeight Then
EndofPage = True
End If
Loop
End Sub
EDIT:
I suppose there is no implicit or explicit wait for Selenium in VBA, and there is no need to.
When scraping the web, with Selenium or not, I always rely on checking whether an element exists on the page. In my experience, implicit and explicit waits have failed me in both Python and VBA while scraping.
Again, personally, I find VBA more reliable and easier than Python, not only for scraping but also for extracting data to Excel, since they are on the same platform. The reason is that I found a way to make sure I am scraping the page I want (not the page loaded in the previous loop iteration). Please check this post for that solution, which I was unable to find anywhere else on the net.
I could implement the same thing in Python, but I would do that only if I were going to use the parsed data in, for example, an API. Since the target is Excel, VBA is the better choice.
Anyway, I have mimicked the implicit wait for you below. I hope it offers some insight into your comment/question.
Sub Get_links()
Dim driver As New WebDriver
Dim CurrentPageHeight As Long, NextPageHeight As Long
Dim EndofPage As Boolean
'EndofPage = False
With driver
.Start "chrome", "http://fortune.com/fortune500"
.get "/list/"
End With
Do
driver.ExecuteScript "window.scrollTo(0, document.body.scrollHeight);"
On Error Resume Next
Debug.Print Split(driver.FindElementsByClass("company-list")(1).Text, vbLf)(3001)
Loop Until Err.Number <> 9
End Sub
Edit 2: The reason for using Debug.Print Split(driver.FindElementsByClass("company-list")(1).Text, vbLf)(3001) is to check whether an element at the bottom of the page exists. There is nothing special about this expression; you can use anything similar, as long as it returns an element from the bottom. Let me explain my logic:
If you Debug.Print driver.FindElementsByClass("company-list")(1).Text, you will see that it is the complete list, separated by line feeds.
So I split it on vbLf, and rank 1000 is at index 3001 of the resulting array. How do I know this? With some quick, simple logic:
...(1).Text, vbLf)(0) -> RANK
...(1).Text, vbLf)(1) -> COMPANY
...(1).Text, vbLf)(2) -> REVENUES ($M)
...(1).Text, vbLf)(3) -> 1
...(1).Text, vbLf)(4) -> Walmart
...(1).Text, vbLf)(5) -> $485,873
...(1).Text, vbLf)(6) -> 2
...
(Rank 1) * 3 = (3)
(Rank 2) * 3 = (6)
...
(Rank 1000) * 3 = (3000)
You would expect rank 1000 at (3000), but it isn't there, because there is another div right after the 20th entry in the list. So it is at (3001). You can use 3000, 2950, 2912, whatever you like, as long as it falls within the last group of 50.
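The index arithmetic above can be captured in a small function. This is a sketch that assumes, per the explanation, three header lines, three lines per company (rank, company, revenues), and exactly one extra line after the 20th company; RankIndex is a hypothetical name, and the layout may change with the site.

```vba
Option Explicit

'Hypothetical helper: index of a company's rank in Split(listText, vbLf).
'Assumes 3 header lines, 3 lines per company, and 1 extra line after the 20th company.
Public Function RankIndex(ByVal rank As Long) As Long
    RankIndex = 3 * rank
    If rank > 20 Then RankIndex = RankIndex + 1   'skip the extra line after entry 20
End Function
```

Usage: Debug.Print Split(driver.FindElementsByClass("company-list")(1).Text, vbLf)(RankIndex(1000)).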
I've written a script using VBA in combination with Selenium to get all the company links from a webpage that doesn't display all the links until it is scrolled to the very bottom. However, when I run my script, I get only 20 links, but there are 1000 links in total. I've heard that it is possible to accomplish this type of task by executing a JavaScript function from within the code. At this point, I have no idea how to place that within my script. Here is what I've tried so far:
Sub Testing_scroll()
Dim driver As New WebDriver
Dim posts As Object, post As Object
driver.Start "chrome", "http://fortune.com/fortune500"
driver.get "/list/"
driver.execute_script ("window.scrollTo(0, document.body.scrollHeight);") '<-- this is not supported
Set posts = driver.FindElementsByXPath("//li[contains(concat(' ', @class, ' '), ' small-12 ')]")
For Each post In posts
i = i + 1
Cells(i, 1) = post.FindElementByXPath(".//a").Attribute("href")
Next post
End Sub
According to the examples included with SeleniumBasic you should be using
driver.ExecuteScript("window.scrollTo(0, document.body.scrollHeight);")
not "driver.execute_script", which is the Python equivalent from the previous solution I gave you :) You are going to have to loop that in the same way until you've got all 1000 links on the page.
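A sketch of such a loop, assuming (as in the question) that each company is an li with class small-12 containing a link. It scrolls, waits, and stops once a scroll no longer increases the number of matched items; only then does it collect the hrefs. The 2-second wait and the 500-pass cap are arbitrary assumptions, and the sub name is hypothetical.

```vba
Option Explicit

Public Sub ScrollAndCollectLinks()
    Dim driver As New WebDriver, post As WebElement
    Dim prevCount As Long, curCount As Long, passes As Long, r As Long
    driver.Start "chrome", "http://fortune.com/fortune500"
    driver.Get "/list/"
    'Keep scrolling until a scroll adds no new items (or the cap is hit)
    Do
        prevCount = curCount
        driver.ExecuteScript "window.scrollTo(0, document.body.scrollHeight);"
        driver.Wait 2000    'give the lazy loader time to append items
        curCount = driver.FindElementsByCss("li.small-12").Count
        passes = passes + 1
    Loop While curCount > prevCount And passes < 500
    'Write every company link's href to column A
    For Each post In driver.FindElementsByCss("li.small-12")
        r = r + 1
        Cells(r, 1) = post.FindElementByTag("a").Attribute("href")
    Next post
    driver.Quit
End Sub
```

If the connection is slow, a batch may not arrive within the wait and the loop can stop early; increasing the wait, or requiring two unchanged counts in a row, are possible adjustments.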
This works for me. It scrolls incrementally down the page until it reaches the end. I found that scrolling directly to the end didn't load all of the in-between elements. This example is Selenium VBA for a Chrome driver.
Sub scrollToEndofPage(dr As WebDriver)
Dim aCH As WebActionChain, scrPOS As Long, oldPOS As Long
scrPOS = dr.ExecuteScript("return window.pageYOffset;")
Set aCH = dr.ActionChain
oldPOS = -1
Do While scrPOS > oldPOS
oldPOS = scrPOS
aCH.ScrollBy 0, 100: aCH.Perform
scrPOS = dr.ExecuteScript("return window.pageYOffset;")
DoEvents
Loop
End Sub