How can I scroll down a webpage using Selenium with VBA

I've written a script using VBA in combination with Selenium to get all the company links from a webpage that doesn't display all the links until it is scrolled to the bottom. However, when I run my script, I get only 20 links, but there are 1000 links in total. I've heard that this kind of task can be accomplished by executing a JavaScript function within the code, but I have no idea how to place that within my script. Here is what I've tried so far:
Sub Testing_scroll()
Dim driver As New WebDriver
Dim posts As Object, post As Object, i As Long
driver.Start "chrome", "http://fortune.com/fortune500"
driver.get "/list/"
driver.execute_script ("window.scrollTo(0, document.body.scrollHeight);") ' <-- this is not supported here
Set posts = driver.FindElementsByXPath("//li[contains(concat(' ', @class, ' '), ' small-12 ')]")
For Each post In posts
i = i + 1
Cells(i, 1) = post.FindElementByXPath(".//a").Attribute("href")
Next post
End Sub

According to the examples included with SeleniumBasic you should be using
driver.ExecuteScript("window.scrollTo(0, document.body.scrollHeight);")
not "driver.execute_script", which is the Python equivalent from the previous solution I gave you :) You will have to run that in a loop, the same way as before, until all 1000 links have loaded on the page.
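For comparison, in Python's Selenium bindings the method really is `execute_script`. Below is a minimal sketch of the loop this answer describes, assuming a `driver` with that method and a hypothetical `count_links` callable that returns how many company links are currently loaded (for example `lambda: len(driver.find_elements(By.CSS_SELECTOR, "li.small-12 a"))`, where the selector is only illustrative). The `target`, `max_rounds` and `pause` values are assumptions. It stops once the target is reached or a scroll brings in no new links.

```python
import time


def scroll_until_count(driver, count_links, target, max_rounds=200, pause=0.5):
    """Scroll to the bottom repeatedly until count_links() reaches target.

    driver      -- any object with Selenium's execute_script(script) method
    count_links -- hypothetical callable returning how many links are loaded now
    Returns the last count seen.
    """
    count = count_links()
    for _ in range(max_rounds):
        if count >= target:
            break
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the lazy loader time to append new items
        new_count = count_links()
        if new_count == count:
            break  # the scroll loaded nothing new, so assume we hit the end
        count = new_count
    return count
```

A single "no new links" round ends the loop here; on a slow connection a longer `pause` is the simplest knob to turn.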

This works for me. It scrolls incrementally down the page until it reaches the end. I found that scrolling directly to the end didn't load all of the in-between elements. This example is Selenium VBA for a Chrome driver.
Sub scrollToEndofPage(dr As WebDriver)
Dim aCH As WebActionChain, scrPOS As Long, oldPOS As Long
scrPOS = dr.ExecuteScript("return window.pageYOffset;")
Set aCH = dr.ActionChain
oldPOS = -1
Do While scrPOS > oldPOS
oldPOS = scrPOS
aCH.ScrollBy 0, 100: aCH.Perform
scrPOS = dr.ExecuteScript("return window.pageYOffset;")
DoEvents
Loop
End Sub
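The same incremental idea as a minimal Python sketch. Instead of an action chain it calls `window.scrollBy` through `execute_script` (a swapped-in technique, not the one used above), and the `max_steps` cap is an added safety assumption. The driver only needs an `execute_script` method.

```python
def scroll_to_end_incrementally(driver, step=100, max_steps=10000):
    """Scroll down `step` pixels at a time until window.pageYOffset stops growing.

    Mirrors the VBA loop above: remember the old offset, scroll, re-read the
    offset, and stop once it no longer increases. Returns the final offset.
    """
    old_pos = -1
    pos = driver.execute_script("return window.pageYOffset;")
    steps = 0
    while pos > old_pos and steps < max_steps:
        old_pos = pos
        driver.execute_script(f"window.scrollBy(0, {step});")
        pos = driver.execute_script("return window.pageYOffset;")
        steps += 1
    return pos
```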

Related

Extract Data with Selenium and VBA

I'm new to Selenium. I have a site that is no longer compatible with IE, so I decided to try this new technique, but I can't see what is wrong with my code. Any help will be appreciated.
Sub ExtractPrice()
Dim bot As WebDriver, myproducts As WebElements, myproduct As WebElement
Set bot = New WebDriver
bot.Start "chrome"
bot.Get "https://www.veadigital.com.ar/prod/72060/lechuga-capuchina-por-kg"
' Application.Wait Now + TimeValue("00:00:20")
Set myproducts = bot.FindElementsByClass("datos-producto-container")
'
For Each myproduct In myproducts
If myproduct.FindElementByClass("product-price").Text <> "" Then
'Debug.Print myproducts.FindElementByClass("product-price").Text
Worksheets("VEA").Range("b2").Value = myproducts.FindElementsByClass("product-price").Text
End If
Next
MsgBox ("complete")
End Sub
The issue is in this line:
Worksheets("VEA").Range("b2").Value = myproducts.FindElementsByClass("product-price").Text
Remember that FindElements returns a collection of WebElements rather than a single WebElement. Instead, use the same expression you used in the If condition:
Worksheets("VEA").Range("b2").Value = myproduct.FindElementByClass("product-price").Text
Note: with the line above you will get the price, but it will come out as $379 instead of $3.79, because the price on the page has no decimal point. A better way to store the price is:
Dim intValue As String, decValue As String
intValue = myproduct.FindElementByClass("product-price").Text
decValue = myproduct.FindElementByXPath(".//div[@class='product-price']//span").Text
' Insert the decimal point before the trailing cents digits
Worksheets("VEA").Range("b2").Value = Left$(intValue, Len(intValue) - Len(decValue)) & "." & decValue
The above assigns $3.79.
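The same reassembly as a small Python sketch, assuming (as the answer does) that the cents are the trailing characters of the full price text. Splitting the cents off the end, rather than replacing every occurrence of those digits, avoids a wrong result when the same digits also appear earlier in the text, as in `$7979`.

```python
def join_price(int_text, dec_text):
    """Rebuild '$3.79' from the full text '$379' and the cents text '79'.

    Assumes the cents are the trailing characters of int_text; if they are not
    (or are empty), the text is returned unchanged.
    """
    if not dec_text or not int_text.endswith(dec_text):
        return int_text
    return int_text[: -len(dec_text)] + "." + dec_text
```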

Can't set chrome preferences in vba selenium

I've created a script in VBA in combination with Selenium to parse the first headline from this webpage. Most of the time my script throws either a timeout error or "Run-time error 21: Application-defined or object-defined error", while sometimes it works flawlessly. As the page takes a long time to load its content, I suppose I'm seeing a side effect of a slow-loading page, so I want to disable images on that page.
I've tried with:
Sub TestSelenium()
Const URL$ = "https://www.marketscreener.com/"
Dim driver As Object, post As Object
Set driver = New ChromeDriver
driver.get URL
Set post = driver.FindElementByCss(".une_title")
MsgBox post.Text
driver.Quit
End Sub
With the Python Selenium bindings, I can use this option to disable images:
from selenium import webdriver

option = webdriver.ChromeOptions()
# 2 = block; set both the default and the managed content setting for images
chrome_prefs = {
    "profile.default_content_settings": {"images": 2},
    "profile.managed_default_content_settings": {"images": 2},
}
option.add_experimental_option("prefs", chrome_prefs)
driver = webdriver.Chrome(options=option)
I know there are methods for setting various preferences in VBA, but I can't find a proper way to use them to disable images:
driver.SetPreference
driver.AddArgument
How can I set Chrome preferences in VBA Selenium so that the page loads quickly without images?
To disable images on that page, this is how you can set the preference with the VBA Selenium bindings:
driver.SetPreference "profile.managed_default_content_settings.images", 2
Your script looks like the following when you implement the above suggestion:
Sub TestSelenium()
Const URL$ = "https://www.marketscreener.com/"
Dim driver As Object, post As Object
Set driver = New ChromeDriver
driver.SetPreference "profile.managed_default_content_settings.images", 2
driver.get URL
Set post = driver.FindElementByCss(".une_title")
MsgBox post.Text
Stop
driver.Quit
End Sub
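You could also try running it headless, which may speed things up; note, though, that headless Chrome still downloads images unless they are disabled with the preference shown above. The argument has to be added before the browser starts (that is, before driver.get):
driver.AddArgument "--headless"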

Trouble clicking on a load more button using vba

I'm trying to click the Load More button at the bottom of the left pane of this webpage using VBA in combination with Selenium, but the script always throws a timeout error pointing at the .Get Url line. Although I seem to have defined an accurate XPath to locate the element, I can't work out what to do next to get it clicked.
How can I click on that Load More button?
Sub ClickOnLoadMore()
Const Url$ = "http://www.ratemyprofessors.com/search.jsp?queryoption=TEACHER&queryBy=schoolDetails&schoolID=457&schoolName=James+Madison+University&dept=select"
Dim driver As New ChromeDriver, post As Object
With driver
.Get Url
Set post = .FindElementByXPath("//div[contains(.,'Load More')]")
.ExecuteScript "arguments[0].scrollIntoView();", post
post.Click
End With
End Sub
I see two "Load More" buttons. Both are matched by "//div[contains(.,'Load More')]". The first one is hidden, so you need to target the second one.
Try this XPath:
"//div[@class='content' and . = 'Load More']"
At least for me, there were a couple of banners to dismiss as well as scrolling to do. There was no problem with the .get line.
Option Explicit
Public Sub ClickOnLoadMore()
Const Url$ = "http://www.ratemyprofessors.com/search.jsp?queryoption=TEACHER&queryBy=schoolDetails&schoolID=457&schoolName=James+Madison+University&dept=select"
Dim driver As New ChromeDriver, post As Object
With driver
.get Url
If .FindElementsByCss(".close-notice.close-this").Count > 0 Then
.FindElementByCss(".close-notice.close-this").Click
End If
.SwitchToFrame .FindElementByCss("[id^='spout-unit-iframe']")
With .FindElementByCss("#spout-ads #spout-header-close")
.ScrollIntoView
.Click
End With
.SwitchToDefaultContent
.ExecuteScript "var el = document.querySelector('.result-list [onclick*=LoadMore]');" & _
"el.scrollIntoView(true);" & _
"window.scrollBy(0, -(window.innerHeight - el.clientHeight) / 2);"
.FindElementByCss(".result-list [onclick*=LoadMore]").Click
Stop '<== Delete me later
'other code
.Quit
End With
End Sub
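The first answer's diagnosis (two elements match "Load More" and the first is hidden) can also be handled by collecting every match and keeping the first one that is actually displayed. A minimal sketch in Python's Selenium bindings, where `WebElement.is_displayed()` exists; the helper itself is hypothetical and works with any objects that have that method.

```python
def first_displayed(elements):
    """Return the first element whose is_displayed() is True, or None if none are.

    `elements` is whatever find_elements(...) returned.
    """
    for element in elements:
        if element.is_displayed():
            return element
    return None
```

Pair it with a selector that matches only the button itself: `//div[contains(.,'Load More')]` also matches every ancestor div whose text contains the phrase, and a visible container div would be picked first.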

Loop over a set of pages with Selenium Basic (VBA)

Task:
This is my first foray into Selenium, and I am attempting to:
1) Find the number of pages in the pagination set listed at the bottom of https://codingislove.com/. This is purely to support task 2 by determining where the loop ends.
2) Loop over those pages.
I believe these are linked, but for those who want a single issue: I simply want to find the correct collection and loop over it to load each page.
The number of pages is, at the time of writing, 6, as seen at the bottom of the webpage.
As an MCVE, I simply want to find the number of pages and click my way through them, using SeleniumBasic.
What I have tried:
I have read through a number of online resources; a few of them are listed under References.
Task 1)
It seems that I should be able to find the count of pages using the Size property, but I can't find the right object to use it with. I have made a number of attempts; a few are shown below:
bot.FindElementsByXPath("//*[@id=""main""]/nav/div/a[3]").Size '<== this, I think, is too specific
bot.FindElementsByClass("page-numbers").Size
But these yield run-time error 438:
"Object does not support this property or method"
And the following doesn't seem to expose the required methods:
bot.FindElementByCss(".navigation.pagination")
I have fudged it with
bot.FindElementsByClass("page-numbers").Count + 1
but I would like something more robust.
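A sketch of a more robust count, in Python: take the text of every `.page-numbers` element and use the largest numeric label, so the ellipsis and "Next Page" links need no `+ 1` fudge. The list of texts is assumed to come from something like `[e.text for e in driver.find_elements(By.CLASS_NAME, "page-numbers")]`; `page_count` is a hypothetical helper.

```python
def page_count(link_texts):
    """Largest numeric pagination label, e.g. ['1', '2', '…', '6', 'Next Page'] -> 6.

    Non-numeric labels (ellipsis, 'Next Page') are ignored; returns 1 when none
    of the labels is a number (a single page has no pagination links).
    """
    numbers = [int(t.strip()) for t in link_texts if t.strip().isdecimal()]
    return max(numbers, default=1)
```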
Task 2)
I know that I can navigate to the next page, from page 1, with:
bot.FindElementByXPath("//*[@id=""main""]/nav/div/a[3]").Click
But I can't use this in a loop, presumably because the XPath needs to be updated.
If it is not updated, it leads to run-time error 13.
As the re-directs follow a general pattern of
href="https://codingislove.com/page/pageNumber/"
I can again fudge my way through by constructing each URL in the loop with
bot.Get "https://codingislove.com/page/" & i & "/"
But I would like something more robust.
Question:
How do I loop over the pagination set in a robust fashion using Selenium? I'm sure I am having a dense day and that there is an easy-to-target collection to loop over.
Code - My current attempt
Option Explicit
Public Sub scrapeCIL()
Dim bot As New WebDriver, i As Long, pageCount As Long
bot.Start "chrome", "https://codingislove.com"
bot.Get "/"
pageCount = bot.FindElementsByClass("page-numbers").Count + 1
For i = 1 To pageCount 'technically can loop from 2 I know!
'    bot.FindElementByXPath("//*[@id=""main""]/nav/div/a[3]").Click 'runtime error 13
'    bot.FindElementByXPath("//*[@id=""main""]/nav/div/a[2]/span").Click 'runtime error 13
bot.Get "https://codingislove.com/page/" & i & "/"
Next i
Stop
bot.Quit
End Sub
Note:
Any supported browser will do. It doesn't have to be Chrome.
References:
Finding the number of pagination buttons in Selenium WebDriver
http://seleniumhome.blogspot.co.uk/2013/07/how-can-we-automate-pagination-using.html
Requirements:
Selenium Basic
ChromeDriver 2.37 (or use IE, but zoom must be at 100%)
VBE Tools > References > Selenium Type Library
To click the element, it must be visible on the screen, so you need to scroll to the bottom of the page first (Selenium might do this implicitly sometimes, but I don't find it reliable).
Try this:
Option Explicit
Public Sub scrapeCIL()
Dim bot As New WebDriver, btn As Object, i As Long, pageCount As Long
bot.Start "chrome", "https://codingislove.com"
bot.Get "/"
pageCount = bot.FindElementsByClass("page-numbers").Count
For i = 1 To pageCount
bot.ExecuteScript ("window.scrollTo(0,document.body.scrollHeight);")
Application.wait Now + TimeValue("00:00:02")
On Error Resume Next
Set btn = bot.FindElementByCss("a[class='next page-numbers']")
If btn.IsPresent = True Then
btn.Click
End If
On Error GoTo 0
Next i
bot.Quit
End Sub
Similar principle:
Option Explicit
Public Sub GetItems()
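Dim i As Long, keys As New Selenium.Keys
With New ChromeDriver
.Get "https://codingislove.com/"
For i = 1 To 6
' Send the real PageDown key; the string "Keys.PageDown" would be typed as literal text
.FindElementByXPath("//*[@id=""main""]/nav/div/a[3]").SendKeys keys.PageDown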
Application.Wait Now + TimeValue("00:00:02")
On Error Resume Next
.FindElementByCss("a.next").Click
On Error GoTo 0
Next i
End With
End Sub
Reference:
'http://seleniumhome.blogspot.co.uk/2013/07/how-to-press-keyboard-in-selenium.html
If you're only interested in clicking through each of the pages (and getting the number of pages is just an aid to doing this) then you should be able to click this element until it's no longer there:
<span class="screen-reader-text">Next Page</span>
Using
bot.FindElementByXpath("//span[contains(text(), 'Next Page')]")
Have a loop click that link on each page load. Eventually it won't be there. Then use VBA's error handling to deal with whatever the equivalent of NoSuchElementException is in this implementation of WebDriver. You will need to re-find the element on each pass through the loop.
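The click-until-gone loop this answer describes, as a minimal Python sketch. `find_next` is a hypothetical callable that re-finds the "next" link on the current page and returns it, or `None` once it is gone; `max_pages` is an added safety cap.

```python
def click_through_pages(find_next, max_pages=1000):
    """Click the 'next' link until it disappears; return how many clicks were made."""
    clicks = 0
    while clicks < max_pages:
        link = find_next()  # re-find every time: the old element is stale after navigation
        if link is None:
            break  # no 'next' link, so this is the last page
        link.click()
        clicks += 1
    return clicks
```

With Python's Selenium bindings, `find_elements` returns an empty list when nothing matches, so `find_next=lambda: next(iter(driver.find_elements(By.CSS_SELECTOR, "a.next")), None)` avoids exception handling altogether.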
How about trying it like this? A few days back I figured out that sending the End key with .SendKeys (keys.End) takes you to the bottom of a page, so that the driver can reach the element it needs to click. I used If Err.Number <> 0 Then Exit Do within the Do loop so that if the scraper encounters any error, it breaks out of the loop; here that is the element-not-found error raised after the last page button has been clicked.
Give this a shot:
Sub GetItems()
Dim pagenum As Object, keys As New Selenium.Keys
With New ChromeDriver
.get "https://codingislove.com/"
Do
On Error Resume Next
Set pagenum = .FindElementByCss("a.next")
pagenum.SendKeys keys.End  ' the real End key, not the literal text "Keys.END"
Application.Wait Now + TimeValue("00:00:03")
pagenum.Click
If Err.Number <> 0 Then Exit Do
On Error GoTo 0
Loop
.Quit
End With
End Sub
Reference to add to the library:
Selenium Type Library

How can I get to the bottom of a webpage rectifying the existing loop?

I've written a script using VBA in combination with Selenium to get to the bottom of a lazy-loading webpage. The script does manage that, but the For x loop I've used looks weird and I have no justification for it. What I want is to use the same kind of loop without a hardcoded number (200 in this case). Any help on this will be highly appreciated.
Sub Get_links()
Dim driver As New WebDriver
With driver
.Start "chrome", "http://fortune.com/fortune500"
.get "/list/"
End With
For x = 0 To 200
driver.ExecuteScript "window.scrollTo(0, document.body.scrollHeight);"
driver.Wait 500
Next x
End Sub
To be honest, I really like solving and adapting your questions; they are really challenging. Here you go:
Sub Get_links()
Dim driver As New WebDriver
Dim CurrentPageHeight As Long, PrevPageHeight As Long
Dim EndofPage As Boolean
'EndofPage = False
With driver
.Start "chrome", "http://fortune.com/fortune500"
.get "/list/"
End With
Do While EndofPage = False
PrevPageHeight = CurrentPageHeight
CurrentPageHeight = driver.ExecuteScript("window.scrollTo(0, document.body.scrollHeight);var CurrentPageHeight=document.body.scrollHeight;return CurrentPageHeight;")
driver.Wait 3000 'depending on your internet connection, increase or decrease time
If PrevPageHeight = CurrentPageHeight Then
EndofPage = True
End If
Loop
End Sub
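The same "stop when the height no longer changes" loop as a minimal Python sketch. As in the VBA version, one JavaScript call both scrolls and returns the new `document.body.scrollHeight`; the `max_rounds` cap is an added assumption so the loop cannot run forever on an endlessly growing page.

```python
import time


def scroll_until_height_stable(driver, pause=3.0, max_rounds=500):
    """Scroll to the bottom until document.body.scrollHeight stops changing.

    driver only needs Selenium's execute_script(script). Returns the final height.
    """
    script = (
        "window.scrollTo(0, document.body.scrollHeight);"
        "return document.body.scrollHeight;"
    )
    current = driver.execute_script(script)
    for _ in range(max_rounds):
        time.sleep(pause)  # let lazy-loaded content arrive before measuring again
        previous, current = current, driver.execute_script(script)
        if current == previous:
            break  # no growth since the last scroll: assume we reached the end
    return current
```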
EDIT:
I suppose there is no implicit or explicit wait for Selenium in VBA, and there is no need for one.
When scraping the web, whether with Selenium or not, I always choose to rely on whether an element exists on the page. In my personal experience, implicit and explicit waits have failed me while scraping in both Python and VBA.
Again, personally, I have found VBA more reliable and easier than Python, not only for scraping but also for extracting data to Excel, since they are on the same platform. The reason is that I found a way to make sure I am scraping the page I want (not the page loaded previously in the loop). Please check this post for that solution; I was unable to find anything like it elsewhere on the net.
I could implement the same thing in Python, but I would do that only if I were going to use the parsed data in an API, for example. Since the target is Excel, VBA is the better choice.
Anyway, I have mimicked an implicit wait for you below. I hope it offers some insight into your comment/question.
Sub Get_links()
Dim driver As New WebDriver
Dim CurrentPageHeight As Long, NextPageHeight As Long
Dim EndofPage As Boolean
'EndofPage = False
With driver
.Start "chrome", "http://fortune.com/fortune500"
.get "/list/"
End With
Do
driver.ExecuteScript "window.scrollTo(0, document.body.scrollHeight);"
On Error Resume Next
Debug.Print Split(driver.FindElementsByClass("company-list")(1).Text, vbLf)(3001)
Loop Until Err.Number <> 9  ' 9 = subscript out of range: rank 1000 is not loaded yet
On Error GoTo 0
End Sub
EDIT 2: The reason for using Debug.Print Split(driver.FindElementsByClass("company-list")(1).Text, vbLf)(3001) is to check whether an element that belongs to the bottom of the page exists. There is nothing special about this expression; you can use something similar as long as it returns an element from the bottom. Let me explain my logic:
If you Debug.Print driver.FindElementsByClass("company-list")(1).Text, you will see that it is the complete list separated by line feeds.
So I split it with vbLf, and rank 1000 sits at index 3001 of the resulting array. How do I know this? With some quick, simple logic:
...(1).Text, vbLf)(0) -> RANK
...(1).Text, vbLf)(1) -> COMPANY
...(1).Text, vbLf)(2) -> REVENUES ($M)
...(1).Text, vbLf)(3) -> 1
...(1).Text, vbLf)(4) -> Walmart
...(1).Text, vbLf)(5) -> $485,873
...(1).Text, vbLf)(6) -> 2
.
.
(Rank 1) * 3 = (3)
(Rank 2) * 3 = (6)
.
.
.
(Rank 1000) * 3 = (3000)
You would expect rank 1000 at index 3000, but it isn't there because there is another div right after the 20th line of the list, so it is at index 3001. You can use 3000, 2950, 2912, or whatever you like, as long as the index falls in the last group of 50.
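The index arithmetic above as a small Python sketch. Three header lines come first and each company takes three lines (rank, name, revenue), so rank r starts at index 3 * r; the extra div then shifts later ranks by one. Exactly where that shift begins is an assumption here (after the first 20 companies, the batch the page shows before scrolling), so `extra_after` is a hypothetical parameter.

```python
HEADER_LINES = 3  # "RANK", "COMPANY", "REVENUES ($M)" occupy indices 0-2


def rank_line_index(rank, extra_after=20):
    """Index of a rank number in the vbLf-split list text.

    Assumes 3 lines per company and one extra line after the first
    `extra_after` companies (an assumption, see the lead-in).
    """
    index = HEADER_LINES + 3 * (rank - 1)  # equals 3 * rank
    if rank > extra_after:
        index += 1  # shifted by the extra div
    return index
```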