Webscraping using selenium grid docker cluster - selenium

I am using a Selenium Grid running in Docker to scrape a website. With a single Chrome node the grid works, but when I scale to more than one Chrome node and run Scrapy again, it stops working: the cursor just blinks, and after some time a long error message appears.
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
import scrapy
from selenium import webdriver

class ProductSpider(scrapy.Spider):
    name = "product_spider"
    start_urls = ['https://google.com']

    def __init__(self):
        options = webdriver.ChromeOptions()
        options.add_argument('--headless')
        self.driver = webdriver.Remote(command_executor='http://localhost:5000/wd/hub',
                                       desired_capabilities=DesiredCapabilities.CHROME)

    def parse(self, response):
        data = self.driver.get(response.url)
        print(data,'/////////////')
Then I opened a Python shell and typed the code in line by line:
Python 3.6.5 (default, Apr 1 2018, 05:46:30)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from selenium import webdriver
>>> from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
>>> options = webdriver.ChromeOptions()
>>> options.add_argument('--headless')
>>> driver = webdriver.Remote(command_executor='http://localhost:5000/wd/hub',
... desired_capabilities=DesiredCapabilities.CHROME)
As you can see, it hangs at webdriver.Remote: the cursor just blinks for a long time, and then a long error message is shown. I think the problem is in the webdriver.Remote(command_executor='http://localhost:5000/wd/hub', desired_capabilities=DesiredCapabilities.CHROME) line.
Can anyone suggest a solution for this problem?
Note: it works if the Selenium Grid has one Chrome node, but fails as soon as I scale to more than one Chrome node.
This is the error message after long time:
Traceback (most recent call last):
  File "", line 1, in
  File "/home/vicky/.local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 156, in __init__
    self.start_session(capabilities, browser_profile)
  File "/home/vicky/.local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 251, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/home/vicky/.local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 320, in execute
    self.error_handler.check_response(response)
  File "/home/vicky/.local/lib/python3.6/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: Error forwarding the new session Error forwarding the request Connect to 172.18.0.8:5555 [/172.18.0.8] failed: Connection timed out (Connection timed out)
Stacktrace:
at org.openqa.grid.web.servlet.handler.RequestHandler.process (RequestHandler.java:117)
at org.openqa.grid.web.servlet.DriverServlet.process (DriverServlet.java:84)
at org.openqa.grid.web.servlet.DriverServlet.doPost (DriverServlet.java:68)
at javax.servlet.http.HttpServlet.service (HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service (HttpServlet.java:790)
at org.seleniumhq.jetty9.servlet.ServletHolder.handle (ServletHolder.java:860)
at org.seleniumhq.jetty9.servlet.ServletHandler.doHandle (ServletHandler.java:535)
at org.seleniumhq.jetty9.server.handler.ScopedHandler.nextHandle (ScopedHandler.java:188)
at org.seleniumhq.jetty9.server.session.SessionHandler.doHandle (SessionHandler.java:1595)
at org.seleniumhq.jetty9.server.handler.ScopedHandler.nextHandle (ScopedHandler.java:188)
at org.seleniumhq.jetty9.server.handler.ContextHandler.doHandle (ContextHandler.java:1253)
at org.seleniumhq.jetty9.server.handler.ScopedHandler.nextScope (ScopedHandler.java:168)
at org.seleniumhq.jetty9.servlet.ServletHandler.doScope (ServletHandler.java:473)
at org.seleniumhq.jetty9.server.session.SessionHandler.doScope (SessionHandler.java:1564)
at org.seleniumhq.jetty9.server.handler.ScopedHandler.nextScope (ScopedHandler.java:166)
at org.seleniumhq.jetty9.server.handler.ContextHandler.doScope (ContextHandler.java:1155)
at org.seleniumhq.jetty9.server.handler.ScopedHandler.handle (ScopedHandler.java:141)
at org.seleniumhq.jetty9.server.handler.HandlerWrapper.handle (HandlerWrapper.java:132)
at org.seleniumhq.jetty9.server.Server.handle (Server.java:530)
at org.seleniumhq.jetty9.server.HttpChannel.handle (HttpChannel.java:347)
at org.seleniumhq.jetty9.server.HttpConnection.onFillable (HttpConnection.java:256)
at org.seleniumhq.jetty9.io.AbstractConnection$ReadCallback.succeeded (AbstractConnection.java:279)
at org.seleniumhq.jetty9.io.FillInterest.fillable (FillInterest.java:102)
at org.seleniumhq.jetty9.io.ChannelEndPoint$2.run (ChannelEndPoint.java:124)
at org.seleniumhq.jetty9.util.thread.strategy.EatWhatYouKill.doProduce (EatWhatYouKill.java:247)
at org.seleniumhq.jetty9.util.thread.strategy.EatWhatYouKill.produce (EatWhatYouKill.java:140)
at org.seleniumhq.jetty9.util.thread.strategy.EatWhatYouKill.run (EatWhatYouKill.java:131)
at org.seleniumhq.jetty9.util.thread.ReservedThreadExecutor$ReservedThread.run (ReservedThreadExecutor.java:382)
at org.seleniumhq.jetty9.util.thread.QueuedThreadPool.runJob (QueuedThreadPool.java:708)
at org.seleniumhq.jetty9.util.thread.QueuedThreadPool$2.run (QueuedThreadPool.java:626)
I have also attached a screenshot of the Selenium Grid console taken while multiple nodes were running.

It looks like you're starting up new Selenium nodes with Firefox but your tests specifically look for Chrome.
I'd recommend using Zalenium to set up your Selenium Grid:
https://github.com/zalando/zalenium
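Separately from the browser mismatch, the traceback in the question shows the hub timing out while connecting to 172.18.0.8:5555, so it may also be worth checking whether that node address is reachable at all. The following is only a minimal sketch using the standard library; the helper name can_connect is made up for illustration, and the check is only meaningful when run from where the hub runs (for example inside the same Docker network).

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For the address from the traceback, the call would be can_connect("172.18.0.8", 5555). A False result would suggest a networking problem between the hub and that node rather than a problem in the scraping code.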

Related

Option debuggerAddress does not work in Selenium 4 (Python)

After upgrading Selenium to version 4, attaching to a Chrome instance started in debugging mode no longer works.
Here is my code:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options
import subprocess
# path setting
browser_path = 'C:/Program Files/Google/Chrome/Application/chrome.exe'
# run browser
subprocess.Popen([browser_path,'--remote-debugging-port=9222'])
# option for debugging chrome
options = Options()
options.add_experimental_option("debuggerAddress", "127.0.0.1:9222")
# attach to the opened browser
driver = webdriver.Chrome(service = Service(ChromeDriverManager().install()), options = options)
# do something
driver.get('http://google.com')
but it raises an error on this line, after hanging there for about 60 seconds:
driver = webdriver.Chrome(service = Service(ChromeDriverManager().install()), options = options)
The error message is:
"name": "WebDriverException",
"message": "Message: unknown error: cannot connect to chrome at 127.0.0.1:9222
The code worked well until I upgraded Selenium.
What is the problem with Selenium version 4?
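One possible cause (an assumption, not something the question confirms) is that nothing is listening on port 9222 when ChromeDriver tries to attach, for example because Chrome is still starting or because an already running Chrome instance took over the launch and ignored the flag. A rough way to check is to poll Chrome's /json/version endpoint on the debugging port before creating the driver. The helper name wait_for_devtools and its defaults are made up for this sketch.

```python
import http.client
import json
import time
import urllib.request


def wait_for_devtools(address="127.0.0.1:9222", timeout=30.0, interval=0.5):
    """Poll http://<address>/json/version until it answers or timeout expires.

    Returns the decoded JSON dict, or None if nothing answered in time.
    """
    # Bypass any configured HTTP proxy, since the target is a local port.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    url = "http://%s/json/version" % address
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with opener.open(url, timeout=interval) as resp:
                return json.loads(resp.read().decode("utf-8"))
        except (OSError, ValueError, http.client.HTTPException):
            # Not reachable (or not valid JSON) yet: wait and retry.
            time.sleep(interval)
    return None
```

In the script above this would be called right after subprocess.Popen(...). If it returns None, the browser never opened the debugging port, and the problem lies in how Chrome was started rather than in the Selenium upgrade.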

Unable to launch firefox browser via selenium geckodriver with error : Service geckodriver unexpectedly exited. Status code was: -9

Code block:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.firefox.service import Service
from selenium.webdriver.firefox.options import Options
new_driver_path = '/Users/username/Desktop/Python/geckodriver'
new_binary_path = '/Applications/Firefox.app/Contents/MacOS/firefox-bin'
ops = Options()
ops.binary_location = new_binary_path
serv = Service(new_driver_path)
driver = webdriver.Firefox(service=serv, options=ops)
On running the above Python program I get the following error:
Traceback (most recent call last):
  File "prog.py", line 13, in <module>
    driver = webdriver.Firefox(service=serv, options=ops)
  File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/firefox/webdriver.py", line 174, in __init__
    self.service.start()
  File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/common/service.py", line 98, in start
    self.assert_process_still_running()
  File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/common/service.py", line 112, in assert_process_still_running
    % (self.path, return_code)
selenium.common.exceptions.WebDriverException: Message: Service /Users/chetanparakh/Desktop/Python/geckodriver unexpectedly exited. Status code was: -9
I might be wrong but maybe something seems to be wrong with the new_binary_path.
This error message...
    self.assert_process_still_running()
  File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/common/service.py", line 112, in assert_process_still_running
    % (self.path, return_code)
selenium.common.exceptions.WebDriverException: Message: Service /Users/chetanparakh/Desktop/Python/geckodriver unexpectedly exited. Status code was: -9
...implies that a previous instance of GeckoDriver is still running, so the program was unable to spawn a new GeckoDriver process.
Solution
Always invoke driver.quit() in the tearDown() method to close and destroy the WebDriver and web client instances gracefully.
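Outside a test framework, the same "always quit" idea can be sketched with a small context manager. This is only an illustration: the helper name quitting is made up here, and it works with any object that has a quit() method.

```python
from contextlib import contextmanager


@contextmanager
def quitting(driver):
    """Yield the driver and always call driver.quit() afterwards, even on errors."""
    try:
        yield driver
    finally:
        # Runs on normal exit and when the with-block raises.
        driver.quit()
```

With the code from the question this would look like: with quitting(webdriver.Firefox(service=serv, options=ops)) as driver: ... so that no GeckoDriver process is left behind if the script fails halfway.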

Selenium starts the browser but shows an error as Message: Can not connect to the Service

The code starts the browser, stops at this step (line 5), and after a while throws an error:
selenium.common.exceptions.WebDriverException: Message: Can not connect to the Service C:\Program Files\Mozilla Firefox\firefox.exe
from selenium import webdriver
from selenium.webdriver.firefox.service import Service
s = Service(r'C:\Program Files\Mozilla Firefox\firefox.exe')
driver = webdriver.Firefox(service=s)
driver.get('http://www.google.com')
myPageTitle = driver.title
print(myPageTitle)
driver.quit()
Firefox - 95.0.2
Selenium - 4.1.0
I tried with chrome, same problem
Does anyone know what the problem is and how to solve it?
As the argument to Service() you need to pass the absolute location of the GeckoDriver executable, not the Firefox executable. GeckoDriver can be downloaded from the mozilla/geckodriver page.
So your effective code block will be:
from selenium import webdriver
from selenium.webdriver.firefox.service import Service
s = Service(r'C:\path\to\geckodriver.exe')
driver = webdriver.Firefox(service=s)
driver.get('http://www.google.com')
myPageTitle = driver.title
print(myPageTitle)
driver.quit()
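If geckodriver has been placed in a folder that is on PATH, its location can also be looked up instead of hard-coded. A minimal sketch with the standard library follows; the helper name find_driver and its error message are illustrative only.

```python
import os
import shutil


def find_driver(name="geckodriver", search_path=None):
    """Return the absolute path of the driver executable found on PATH.

    search_path may be a PATH-style string to search instead of the
    environment's PATH. Raises FileNotFoundError if nothing is found.
    """
    found = shutil.which(name, path=search_path)
    if found is None:
        raise FileNotFoundError(
            "%s was not found; download it and add its folder to PATH" % name)
    return os.path.abspath(found)
```

The result could then be passed on as Service(find_driver()). If the lookup fails, the error points at the missing driver right away instead of a timeout on "Can not connect to the Service".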

Web-scraping with Selenium in Python

I am trying to get the latest price of a currency rate on Bloomberg using Python + Selenium + PhantomJS.
Here is the URL
Here is the HTML
<div class="overviewRow__0956421f">
<span class="priceText__1853e8a5">3.9100</span>
<span class="currency__defc7184">BRL</span>
</div>
Here is my code
from selenium import webdriver
my_url = 'https://www.bloomberg.com/quote/USDBRL:CUR'
driver = webdriver.PhantomJS()
driver.get(my_url)
price = driver.find_element_by_class_name("priceText__1853e8a5")
print(price)
But it does not scrape anything.
Here is the error stack trace:
/Users/marcelo/PycharmProjects/extractwiki/venv/bin/python /Users/marcelo/PycharmProjects/extractwiki/wiki.py
/Users/marcelo/PycharmProjects/extractwiki/venv/lib/python3.7/site-packages/selenium/webdriver/phantomjs/webdriver.py:49: UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead
  warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless '
Traceback (most recent call last):
  File "/Users/marcelo/PycharmProjects/extractwiki/wiki.py", line 8, in <module>
    price = driver.find_element_by_class_name("overviewRow__0956421f")
  File "/Users/marcelo/PycharmProjects/extractwiki/venv/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 563, in find_element_by_class_name
    return self.find_element(by=By.CLASS_NAME, value=name)
  File "/Users/marcelo/PycharmProjects/extractwiki/venv/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 966, in find_element
    'value': value})['value']
  File "/Users/marcelo/PycharmProjects/extractwiki/venv/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 320, in execute
    self.error_handler.check_response(response)
  File "/Users/marcelo/PycharmProjects/extractwiki/venv/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: {"errorMessage":"Unable to find element with class name 'overviewRow__0956421f'","request":{"headers":{"Accept":"application/json","Accept-Encoding":"identity","Content-Length":"110","Content-Type":"application/json;charset=UTF-8","Host":"127.0.0.1:54302","User-Agent":"selenium/3.14.0 (python mac)"},"httpVersion":"1.1","method":"POST","post":"{\"using\": \"class name\", \"value\": \"overviewRow__0956421f\", \"sessionId\": \"1eaf82f0-a39a-11e8-867d-9dbde70c7bc5\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/1eaf82f0-a39a-11e8-867d-9dbde70c7bc5/element"}}
Screenshot: available via screen
Process finished with exit code 1
Could some expert help me please?
Try this:
from selenium import webdriver
my_url = 'https://www.bloomberg.com/quote/USDBRL:CUR'
driver = webdriver.PhantomJS()
driver.get(my_url)
# find_elements (plural) returns a list, which can be iterated over
price = driver.find_elements_by_class_name("priceText__1853e8a5")
for value in price:
    print(value.text)
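The class names in the HTML above end in what look like generated suffixes (for example priceText__1853e8a5). Assuming those suffixes can change between site builds, which the question does not confirm, matching on the stable prefix may be more robust. Here is a small sketch that does this on the page source with the standard library; PriceParser and extract_prices are names made up for the example.

```python
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Collect the text of every <span> whose class starts with a given prefix."""

    def __init__(self, class_prefix="priceText__"):
        super().__init__()
        self.class_prefix = class_prefix
        self.prices = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and any(c.startswith(self.class_prefix) for c in classes):
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._inside = False

    def handle_data(self, data):
        # Keep only non-empty text found inside a matching span.
        if self._inside and data.strip():
            self.prices.append(data.strip())


def extract_prices(html, class_prefix="priceText__"):
    """Return the texts of all spans whose class begins with class_prefix."""
    parser = PriceParser(class_prefix)
    parser.feed(html)
    parser.close()
    return parser.prices
```

With Selenium this could be fed from extract_prices(driver.page_source). An empty list would mean the price element was not in the rendered page at all, which is a different problem from a wrong class name.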

Selenium WebDriverException: Expected 'id' mouse to be mapped to InputState whose subtype is undefined, got: pointerMove

I have a problem with Selenium that I can't make sense of. Also, I can't find a lot of information about this problem via Google.
My Selenium script performs the following steps:
1. Log into Facebook.
2. Go to the list of friend proposals.
3. Scroll down a few times (in order to load more proposals).
4. Present all proposals one by one on the console and ask the user whether the friend should be added.
5. On confirmation, an Action chain is created that moves to the proposal in question and then the add button is clicked.
But the Action chain does not work. I get the following error:
Potential friend name: 'John Doe'
Social context: 'Max Mustermann und 3 weitere gemeinsame Freunde'
Traceback (most recent call last):
  File "c:\...\facebook_selenium_minimal.py", line 74, in <module>
    main()
  File "c:\...\facebook_selenium_minimal.py", line 57, in main
    friend_add_button).perform()
  File "C:\Python36\lib\site-packages\selenium\webdriver\common\action_chains.py", line 77, in perform
    self.w3c_actions.perform()
  File "C:\Python36\lib\site-packages\selenium\webdriver\common\actions\action_builder.py", line 76, in perform
    self.driver.execute(Command.W3C_ACTIONS, enc)
  File "C:\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 238, in execute
    self.error_handler.check_response(response)
  File "C:\Python36\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 193, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: Expected 'id' mouse to be mapped to InputState whose subtype is undefined, got: pointerMove
This is my Selenium script:
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
# NoSuchElementException is caught below, so it must be imported as well
from selenium.common.exceptions import NoSuchElementException, TimeoutException
from selenium.webdriver.support.ui import WebDriverWait # available since 2.4.0
from selenium.webdriver.support import expected_conditions as EC # available since 2.26.0
from selenium.webdriver.common.action_chains import ActionChains

TIMEOUT = 5

def main():
    driver = webdriver.Firefox()
    driver.get("http://www.facebook.com")
    print(driver.title)
    input_mail = driver.find_element_by_id("email")
    input_password = driver.find_element_by_id("pass")
    input_mail.send_keys("your_login@example.com")
    input_password.send_keys("your_password")
    input_password.submit()
    try:
        WebDriverWait(driver, TIMEOUT).until(
            EC.visibility_of_element_located((By.NAME, "requests")))
        driver.get("https://www.facebook.com/friends/requests/?fcref=jwl")
        WebDriverWait(driver, TIMEOUT).until(
            EC.visibility_of_element_located((By.ID, "fbSearchResultsBox")))
        # Let Facebook load more friend proposals.
        for i in range(2):
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
            time.sleep(1.0)
        friend_proposals = driver.find_elements_by_class_name(
            "friendBrowserListUnit")
        for friend_proposal in friend_proposals:
            try:
                friend_title = friend_proposal.find_element_by_class_name(
                    "friendBrowserNameTitle")
            except NoSuchElementException:
                print("Title element could not be found. Skipping.")
                continue
            print("Potential friend name: '%s'" % friend_title.text)
            social_context = friend_proposal.find_element_by_class_name(
                "friendBrowserSocialContext")
            social_context_text = social_context.text
            print("Social context: '%s'" % social_context_text)
            friend_add_button = friend_proposal.find_element_by_class_name(
                "FriendRequestAdd")
            actions = ActionChains(driver)
            actions.move_to_element(friend_proposal).move_to_element(
                friend_add_button).perform()
            time.sleep(0.1)
            print("Should I add the friend (y/N): ")
            response = input()
            if response == "y":
                friend_add_button.click()
                time.sleep(1.0)
                print("Added friend...")
    except TimeoutException as exc:
        print("TimeoutException: " + str(exc))
    finally:
        driver.quit()

if __name__ == '__main__':
    try:
        main()
    except:
        raise
I'm using the latest Selenium version:
C:\Users\Robert>pip show selenium
Name: selenium
Version: 3.3.1
And I have Firefox 52.0.1 with geckodriver v0.15.0.
Update: A quick test revealed that the same script works flawlessly with the Chrome Webdriver.
Update 2: This issue in the Selenium bugtracker on Github might be related: https://github.com/SeleniumHQ/selenium/issues/3642
I ran into the same issue today. You might have observed that the first move_to_element and perform() worked - at least this was true in my case. To repeat this action, you should reset the action chain in your for loop:
actions.perform()
actions.reset_actions()
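Put into a loop, the perform-then-reset pattern might look like the sketch below. The function name hover_each and the optional on_hover callback are made up for illustration; actions is expected to be a selenium ActionChains instance, or any object offering move_to_element(), perform() and reset_actions().

```python
def hover_each(actions, elements, on_hover=None):
    """Move the pointer to each element in turn, resetting the chain after each perform()."""
    for element in elements:
        actions.move_to_element(element)
        actions.perform()
        # Clear the queued actions so the next perform() does not replay them.
        actions.reset_actions()
        if on_hover is not None:
            on_hover(element)
```

In the script from the question this would roughly correspond to creating actions = ActionChains(driver) once before the loop and calling hover_each(actions, friend_proposals).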
For me, .perform() fails the first time through. I am on Selenium 3.3.1, geckodriver 0.15 and the latest Firefox, using Java. The same code works perfectly on Chrome.