Web-scraping with Selenium in Python - selenium

I am trying to get the latest price of a currency rate from Bloomberg using Python + Selenium + PhantomJS.
Here is the URL: https://www.bloomberg.com/quote/USDBRL:CUR
Here is the HTML:
<div class="overviewRow__0956421f">
<span class="priceText__1853e8a5">3.9100</span>
<span class="currency__defc7184">BRL</span>
</div>
Here is my code:
from selenium import webdriver
my_url = 'https://www.bloomberg.com/quote/USDBRL:CUR'
driver = webdriver.PhantomJS()
driver.get(my_url)
price = driver.find_element_by_class_name("priceText__1853e8a5")
print(price)
But it is not scraping anything.
Here is the error stack trace:
/Users/marcelo/PycharmProjects/extractwiki/venv/bin/python /Users/marcelo/PycharmProjects/extractwiki/wiki.py
/Users/marcelo/PycharmProjects/extractwiki/venv/lib/python3.7/site-packages/selenium/webdriver/phantomjs/webdriver.py:49: UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead
warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless '
Traceback (most recent call last):
File "/Users/marcelo/PycharmProjects/extractwiki/wiki.py", line 8, in <module>
price = driver.find_element_by_class_name("overviewRow__0956421f")
File "/Users/marcelo/PycharmProjects/extractwiki/venv/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 563, in find_element_by_class_name
return self.find_element(by=By.CLASS_NAME, value=name)
File "/Users/marcelo/PycharmProjects/extractwiki/venv/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 966, in find_element
'value': value})['value']
File "/Users/marcelo/PycharmProjects/extractwiki/venv/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 320, in execute
self.error_handler.check_response(response)
File "/Users/marcelo/PycharmProjects/extractwiki/venv/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: {"errorMessage":"Unable to find element with class name 'overviewRow__0956421f'","request":{"headers":{"Accept":"application/json","Accept-Encoding":"identity","Content-Length":"110","Content-Type":"application/json;charset=UTF-8","Host":"127.0.0.1:54302","User-Agent":"selenium/3.14.0 (python mac)"},"httpVersion":"1.1","method":"POST","post":"{\"using\": \"class name\", \"value\": \"overviewRow__0956421f\", \"sessionId\": \"1eaf82f0-a39a-11e8-867d-9dbde70c7bc5\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/1eaf82f0-a39a-11e8-867d-9dbde70c7bc5/element"}}
Screenshot: available via screen
Process finished with exit code 1
Could an expert help me, please?

Try this:
from selenium import webdriver
my_url = 'https://www.bloomberg.com/quote/USDBRL:CUR'
driver = webdriver.PhantomJS()
driver.get(my_url)
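# find_elements (plural) returns a list; find_element returns a single, non-iterable element
prices = driver.find_elements_by_class_name("priceText__1853e8a5")
for value in prices:
    print(value.text)  # .text is the visible text, e.g. the price string
driver.quit()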
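A sketch of a more robust variant, assuming Selenium 3.8+ or 4 and a headless browser in place of the deprecated PhantomJS (the warning in the trace above recommends headless Chrome or Firefox). It rests on two assumptions: that the hashed suffix in class names such as priceText__1853e8a5 can change between site builds, so it matches on the priceText__ prefix instead; and that the price may only appear after the page's scripts run, so it polls with WebDriverWait rather than reading immediately. PRICE_SELECTOR, parse_price and read_prices are illustrative names, not part of Selenium.

```python
# Assumption: the hashed suffix in "priceText__1853e8a5" may change between site
# builds, so match any element whose class list contains a "priceText__..." class.
PRICE_SELECTOR = '[class^="priceText__"], [class*=" priceText__"]'


def parse_price(text):
    """Convert displayed text such as '3.9100' (or '1,234.50') to a float."""
    return float(text.strip().replace(",", ""))


def read_prices(driver, timeout=10):
    """Poll until at least one price element exists, then return the parsed prices.

    Raises Selenium's TimeoutException if nothing matches within `timeout` seconds.
    """
    # Imported lazily so parse_price() can be used without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    # until() keeps calling the lambda while it returns an empty (falsy) list.
    elements = WebDriverWait(driver, timeout).until(
        lambda d: d.find_elements(By.CSS_SELECTOR, PRICE_SELECTOR)
    )
    return [parse_price(e.text) for e in elements if e.text.strip()]
```

Typical use: create options = webdriver.ChromeOptions(), call options.add_argument("--headless"), start driver = webdriver.Chrome(options=options), call driver.get(my_url), print(read_prices(driver)), then driver.quit(). Reading e.text, rather than printing the element object itself, is what yields the price string instead of a WebElement repr.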

Related

Unable to launch Firefox browser via Selenium geckodriver with error: Service geckodriver unexpectedly exited. Status code was: -9

Code block:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.firefox.service import Service
from selenium.webdriver.firefox.options import Options
new_driver_path = '/Users/username/Desktop/Python/geckodriver'
new_binary_path = '/Applications/Firefox.app/Contents/MacOS/firefox-bin'
ops = Options()
ops.binary_location = new_binary_path
serv = Service(new_driver_path)
driver = webdriver.Firefox(service=serv, options=ops)
When I run the above Python program, I get the following error:
Traceback (most recent call last):
File "prog.py", line 13, in <module>
driver = webdriver.Firefox(service=serv, options=ops)
File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/firefox/webdriver.py", line 174, in __init__
self.service.start()
File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/common/service.py", line 98, in start
self.assert_process_still_running()
File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/common/service.py", line 112, in assert_process_still_running
% (self.path, return_code)
selenium.common.exceptions.WebDriverException: Message: Service /Users/chetanparakh/Desktop/Python/geckodriver unexpectedly exited. Status code was: -9
I might be wrong, but something may be wrong with new_binary_path.
This error message...
self.assert_process_still_running()
File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/common/service.py", line 112, in assert_process_still_running
% (self.path, return_code)
selenium.common.exceptions.WebDriverException: Message: Service /Users/chetanparakh/Desktop/Python/geckodriver unexpectedly exited. Status code was: -9
...implies that a previous instance of GeckoDriver is still running, hence the program was unable to spawn a new GeckoDriver process.
Solution
Always invoke driver.quit() within the tearDown() method to close and destroy the WebDriver and web client instances gracefully.

Selenium NoSuchElementException when the element does exist

I am working on a Discord bot that uses Selenium to grab a Zoom link from an Instagram account. It worked perfectly until today, when it started raising selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id="react-root"]/section/main/div/header/section/div[2]/a[1]"}. I have no idea why. I have checked that the XPath is correct and added a sleep to give the page time to load. So far the only thing that has worked is switching the browser from Chrome to Firefox, but I did that on my local machine. I can't do it in the deployed code, because the bot runs on Heroku with headless Chrome, and I don't want to go through the trouble of getting Firefox to work there, since it took me over an hour to get Chrome working.
Here is the relevant code:
@client.command()
async def zoom(ctx):
    await ctx.send("Retrieving current zoom link from instagram...\nPlease wait...")
    chrome_options = webdriver.ChromeOptions()
    chrome_options.binary_location = os.environ.get("GOOGLE_CHROME_BIN")
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("--disable-dev-shm-usage")
    chrome_options.add_argument("--no-sandbox")
    driver = webdriver.Chrome(executable_path=os.environ.get("CHROMEDRIVER_PATH"), chrome_options=chrome_options)
    url = "https://instagram.com/profile/"
    driver.get(url)  # load instagram page
    time.sleep(5)
    link = driver.find_element_by_xpath('//*[@id="react-root"]/section/main/div/header/section/div[2]/a[1]').get_attribute("innerHTML")  # Get zoom link
    await ctx.send("Here is the current zoom link:\nhttps://" + link)
    driver.quit()
Here is the whole error traceback:
2020-06-02T03:45:55.540096+00:00 app[worker.1]: Traceback (most recent call last):
2020-06-02T03:45:55.540174+00:00 app[worker.1]: File "main.py", line 48, in zoom
2020-06-02T03:45:55.540177+00:00 app[worker.1]: link = driver.find_element_by_xpath('//*[@id="react-root"]/section/main/div/header/section/div[2]/a[1]').get_attribute("innerHTML") # Get zoom link
2020-06-02T03:45:55.540204+00:00 app[worker.1]: File "/app/.heroku/python/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 394, in find_element_by_xpath
2020-06-02T03:45:55.540206+00:00 app[worker.1]: return self.find_element(by=By.XPATH, value=xpath)
2020-06-02T03:45:55.540264+00:00 app[worker.1]: File "/app/.heroku/python/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 978, in find_element
2020-06-02T03:45:55.540265+00:00 app[worker.1]: 'value': value})['value']
2020-06-02T03:45:55.540305+00:00 app[worker.1]: File "/app/.heroku/python/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
2020-06-02T03:45:55.540305+00:00 app[worker.1]: self.error_handler.check_response(response)
2020-06-02T03:45:55.540306+00:00 app[worker.1]: File "/app/.heroku/python/lib/python3.6/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
2020-06-02T03:45:55.540306+00:00 app[worker.1]: raise exception_class(message, screen, stacktrace)
2020-06-02T03:45:55.540375+00:00 app[worker.1]: selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id="react-root"]/section/main/div/header/section/div[2]/a[1]"}
2020-06-02T03:45:55.540377+00:00 app[worker.1]: (Session info: headless chrome=83.0.4103.61)
Any help is greatly appreciated; I really need to get this bot back up and running.
Thanks in advance!
You have not escaped the double quotes around the id. Try:
//*[@id=\"react-root\"]/section/main/div/header/section/div[2]/a[1]
instead of:
//*[@id="react-root"]/section/main/div/header/section/div[2]/a[1]
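Whether escaping matters depends on how the string is quoted: inside a single-quoted Python literal, as in the question, \" and " produce the same character, so both spellings hand Selenium an identical XPath. Because the bot "worked until today", a changed page structure, or a different page being served to headless Chrome on Heroku, is another plausible cause (an assumption, not confirmed in the thread). The sketch below replaces the fixed time.sleep(5) with an explicit wait; wait_for_inner_html is a hypothetical helper name.

```python
# In a single-quoted Python literal, \" and " are the same character,
# so both spellings of the XPath give Selenium an identical string.
ESCAPED = '//*[@id=\"react-root\"]/section/main/div/header/section/div[2]/a[1]'
PLAIN = '//*[@id="react-root"]/section/main/div/header/section/div[2]/a[1]'
assert ESCAPED == PLAIN


def wait_for_inner_html(driver, xpath=PLAIN, timeout=15):
    """Wait up to `timeout` seconds for the element instead of a fixed sleep,
    then return its innerHTML. Raises TimeoutException if it never appears."""
    # Lazy imports so the module loads without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    element = WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located((By.XPATH, xpath))
    )
    return element.get_attribute("innerHTML")
```

If the wait times out on Heroku, calling driver.save_screenshot("page.png") or printing driver.page_source before driver.quit() shows what headless Chrome actually received.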

Selenium IE Driver is not launching the browser in User Profile

I tried to open the URL using IE Driver. The script works fine under the Admin profile, but it fails under a user profile. The user is behind a proxy, and I have tried the following: setting Protected Mode to the same level, the registry setting, and the same proxy for both user and admin.
CODE:
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
cap = DesiredCapabilities().INTERNETEXPLORER
cap['ignoreProtectedModeSettings'] = True
cap['IntroduceInstabilityByIgnoringProtectedModeSettings'] = True
cap['nativeEvents'] = True
cap['ignoreZoomSetting'] = True
cap['requireWindowFocus'] = True
browser = webdriver.Ie(capabilities=cap, executable_path='C:\\IEDriver\\IEDriverServer.exe')
browser.get('https://www.bharti-axagi.co.in/')
Error
Traceback (most recent call last):
File "C:/Users/hitesh kumar/PycharmProjects/Open IE/Open IE1.py", line 11, in <module>
browser = webdriver.Ie(capabilities=cap, executable_path='C:\\IEDriver\\IEDriverServer.exe')
File "C:\Python27\lib\site-packages\selenium\webdriver\ie\webdriver.py", line 88, in __init__
desired_capabilities=capabilities)
File "C:\Python27\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 156, in __init__
self.start_session(capabilities, browser_profile)
File "C:\Python27\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 251, in start_session
response = self.execute(Command.NEW_SESSION, parameters)
File "C:\Python27\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 320, in execute
self.error_handler.check_response(response)
File "C:\Python27\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 208, in check_response
raise exception_class(value)
selenium.common.exceptions.WebDriverException: Message:
Just change the line cap = DesiredCapabilities.INTERNETEXPLORER
To:
cap = DesiredCapabilities.INTERNETEXPLORER.copy()
This is how it's done in the documentation...
Hope this helps you!
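Why .copy() matters, as a self-contained sketch: Selenium keeps its capability presets as plain dicts stored as class attributes on DesiredCapabilities, so assigning keys on DesiredCapabilities.INTERNETEXPLORER (or on DesiredCapabilities().INTERNETEXPLORER, which is the same object) changes the shared preset for every later caller. DesiredCapabilitiesStandIn and ie_capabilities are hypothetical names used so the demo runs without Selenium; whether the copy alone fixes the failure under the user profile is not confirmed in the thread.

```python
class DesiredCapabilitiesStandIn:
    """Hypothetical stand-in for Selenium's DesiredCapabilities: presets are
    plain dicts stored as class attributes, shared by every caller."""
    INTERNETEXPLORER = {"browserName": "internet explorer"}


def ie_capabilities(preset):
    """Return a private copy of the preset with the question's IE settings applied."""
    cap = preset.copy()  # shallow copy; the shared preset is left untouched
    cap["ignoreProtectedModeSettings"] = True
    cap["IntroduceInstabilityByIgnoringProtectedModeSettings"] = True
    cap["nativeEvents"] = True
    cap["ignoreZoomSetting"] = True
    cap["requireWindowFocus"] = True
    return cap


# Instantiating the class does not help: the attribute is still the shared dict.
assert DesiredCapabilitiesStandIn().INTERNETEXPLORER is DesiredCapabilitiesStandIn.INTERNETEXPLORER
```

With Selenium installed, the same helper would be called as ie_capabilities(DesiredCapabilities.INTERNETEXPLORER).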

Webscraping using selenium grid docker cluster

I am using a Selenium Grid Docker cluster to scrape a website. With only one Chrome node, the grid works, but if I scale to more than one Chrome node and run Scrapy again, it stops working. The cursor just blinks for a while, and then a long error message appears.
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
import scrapy
from selenium import webdriver
class ProductSpider(scrapy.Spider):
    name = "product_spider"
    start_urls = ['https://google.com']
    def __init__(self):
        options = webdriver.ChromeOptions()
        options.add_argument('--headless')  # note: options is never passed to webdriver.Remote below
        self.driver = webdriver.Remote(command_executor='http://localhost:5000/wd/hub',
                                       desired_capabilities=DesiredCapabilities.CHROME)
    def parse(self, response):
        data = self.driver.get(response.url)  # driver.get() returns None; the HTML is in self.driver.page_source
        print(data,'/////////////')
Then I opened a Python shell and typed the code line by line:
Python 3.6.5 (default, Apr 1 2018, 05:46:30)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from selenium import webdriver
>>> from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
>>> options = webdriver.ChromeOptions()
>>> options.add_argument('--headless')
>>> driver = webdriver.Remote(command_executor='http://localhost:5000/wd/hub',
... desired_capabilities=DesiredCapabilities.CHROME)
As you can see, it hangs at webdriver.Remote: the cursor just blinks for a long time, and then a long error message is shown. I think the problem is in the webdriver.Remote(command_executor='http://localhost:5000/wd/hub', desired_capabilities=DesiredCapabilities.CHROME) line.
Can anyone suggest a solution for this problem?
Note: it works when the Selenium Grid has a single Chrome node, but not when I scale to more than one Chrome node.
This is the error message that appears after a long time:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/vicky/.local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 156, in __init__
self.start_session(capabilities, browser_profile)
File "/home/vicky/.local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 251, in start_session
response = self.execute(Command.NEW_SESSION, parameters)
File "/home/vicky/.local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 320, in execute
self.error_handler.check_response(response)
File "/home/vicky/.local/lib/python3.6/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: Error forwarding the new session Error forwarding the request Connect to 172.18.0.8:5555 [/172.18.0.8] failed: Connection timed out (Connection timed out)
Stacktrace:
at org.openqa.grid.web.servlet.handler.RequestHandler.process (RequestHandler.java:117)
at org.openqa.grid.web.servlet.DriverServlet.process (DriverServlet.java:84)
at org.openqa.grid.web.servlet.DriverServlet.doPost (DriverServlet.java:68)
at javax.servlet.http.HttpServlet.service (HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service (HttpServlet.java:790)
at org.seleniumhq.jetty9.servlet.ServletHolder.handle (ServletHolder.java:860)
at org.seleniumhq.jetty9.servlet.ServletHandler.doHandle (ServletHandler.java:535)
at org.seleniumhq.jetty9.server.handler.ScopedHandler.nextHandle (ScopedHandler.java:188)
at org.seleniumhq.jetty9.server.session.SessionHandler.doHandle (SessionHandler.java:1595)
at org.seleniumhq.jetty9.server.handler.ScopedHandler.nextHandle (ScopedHandler.java:188)
at org.seleniumhq.jetty9.server.handler.ContextHandler.doHandle (ContextHandler.java:1253)
at org.seleniumhq.jetty9.server.handler.ScopedHandler.nextScope (ScopedHandler.java:168)
at org.seleniumhq.jetty9.servlet.ServletHandler.doScope (ServletHandler.java:473)
at org.seleniumhq.jetty9.server.session.SessionHandler.doScope (SessionHandler.java:1564)
at org.seleniumhq.jetty9.server.handler.ScopedHandler.nextScope (ScopedHandler.java:166)
at org.seleniumhq.jetty9.server.handler.ContextHandler.doScope (ContextHandler.java:1155)
at org.seleniumhq.jetty9.server.handler.ScopedHandler.handle (ScopedHandler.java:141)
at org.seleniumhq.jetty9.server.handler.HandlerWrapper.handle (HandlerWrapper.java:132)
at org.seleniumhq.jetty9.server.Server.handle (Server.java:530)
at org.seleniumhq.jetty9.server.HttpChannel.handle (HttpChannel.java:347)
at org.seleniumhq.jetty9.server.HttpConnection.onFillable (HttpConnection.java:256)
at org.seleniumhq.jetty9.io.AbstractConnection$ReadCallback.succeeded (AbstractConnection.java:279)
at org.seleniumhq.jetty9.io.FillInterest.fillable (FillInterest.java:102)
at org.seleniumhq.jetty9.io.ChannelEndPoint$2.run (ChannelEndPoint.java:124)
at org.seleniumhq.jetty9.util.thread.strategy.EatWhatYouKill.doProduce (EatWhatYouKill.java:247)
at org.seleniumhq.jetty9.util.thread.strategy.EatWhatYouKill.produce (EatWhatYouKill.java:140)
at org.seleniumhq.jetty9.util.thread.strategy.EatWhatYouKill.run (EatWhatYouKill.java:131)
at org.seleniumhq.jetty9.util.thread.ReservedThreadExecutor$ReservedThread.run (ReservedThreadExecutor.java:382)
at org.seleniumhq.jetty9.util.thread.QueuedThreadPool.runJob (QueuedThreadPool.java:708)
at org.seleniumhq.jetty9.util.thread.QueuedThreadPool$2.run (QueuedThreadPool.java:626)
I have also attached a screenshot of the Selenium Grid console when multiple nodes are used.
It looks like you're starting new Selenium nodes with Firefox, but your tests specifically request Chrome.
I'd recommend using Zalenium to set up your Selenium Grid:
https://github.com/zalando/zalenium
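The trace also shows that the hub listening on localhost:5000 tried to forward the new session to a node at 172.18.0.8:5555 and the connection timed out, which suggests (an inference from the log, not confirmed in the thread) that the hub cannot reach that node's registered address. A small stdlib sketch for checking plain TCP reachability; can_connect is a hypothetical helper, and running it from a container on the same Docker network as the hub would approximate the hub's view (an assumption about your setup).

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out (socket.timeout is an OSError)
        return False
```

For example, can_connect("172.18.0.8", 5555) returning False from the hub's network would point at node registration or networking rather than at the Python code.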

Selenium WebDriverException: Expected 'id' mouse to be mapped to InputState whose subtype is undefined, got: pointerMove

I have a problem with Selenium that I can't make sense of. Also, I can't find a lot of information about this problem via Google.
My Selenium script performs the following steps:
Log into Facebook.
Go to the list of friend proposals.
Scroll down a few times (in order to load more proposals).
Present all proposals one by one on the console and ask the user whether the friend should be added.
On confirmation, an Action chain is created that moves to the proposal in question and then the add button is clicked.
But the Action chain does not work. I get the following error:
Potential friend name: 'John Doe'
Social context: 'Max Mustermann und 3 weitere gemeinsame Freunde'
Traceback (most recent call last):
File "c:\...\facebook_selenium_minimal.py", line 74, in <module>
main()
File "c:\...\facebook_selenium_minimal.py", line 57, in main
friend_add_button).perform()
File "C:\Python36\lib\site-packages\selenium\webdriver\common\action_chains.py", line 77, in perform
self.w3c_actions.perform()
File "C:\Python36\lib\site-packages\selenium\webdriver\common\actions\action_builder.py", line 76, in perform
self.driver.execute(Command.W3C_ACTIONS, enc)
File "C:\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 238, in execute
self.error_handler.check_response(response)
File "C:\Python36\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 193, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: Expected 'id' mouse to be mapped to InputState whose subtype is undefined, got: pointerMove
This is my Selenium script:
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException, TimeoutException
from selenium.webdriver.support.ui import WebDriverWait  # available since 2.4.0
from selenium.webdriver.support import expected_conditions as EC  # available since 2.26.0
from selenium.webdriver.common.action_chains import ActionChains
TIMEOUT = 5
def main():
    driver = webdriver.Firefox()
    driver.get("http://www.facebook.com")
    print(driver.title)
    input_mail = driver.find_element_by_id("email")
    input_password = driver.find_element_by_id("pass")
    input_mail.send_keys("your_login@example.com")
    input_password.send_keys("your_password")
    input_password.submit()
    try:
        WebDriverWait(driver, TIMEOUT).until(
            EC.visibility_of_element_located((By.NAME, "requests")))
        driver.get("https://www.facebook.com/friends/requests/?fcref=jwl")
        WebDriverWait(driver, TIMEOUT).until(
            EC.visibility_of_element_located((By.ID, "fbSearchResultsBox")))
        # Let Facebook load more friend proposals.
        for i in range(2):
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
            time.sleep(1.0)
        friend_proposals = driver.find_elements_by_class_name(
            "friendBrowserListUnit")
        for friend_proposal in friend_proposals:
            try:
                friend_title = friend_proposal.find_element_by_class_name(
                    "friendBrowserNameTitle")
            except NoSuchElementException:
                print("Title element could not be found. Skipping.")
                continue
            print("Potential friend name: '%s'" % friend_title.text)
            social_context = friend_proposal.find_element_by_class_name(
                "friendBrowserSocialContext")
            social_context_text = social_context.text
            print("Social context: '%s'" % social_context_text)
            friend_add_button = friend_proposal.find_element_by_class_name(
                "FriendRequestAdd")
            actions = ActionChains(driver)
            actions.move_to_element(friend_proposal).move_to_element(
                friend_add_button).perform()
            time.sleep(0.1)
            print("Should I add the friend (y/N): ")
            response = input()
            if response == "y":
                friend_add_button.click()
                time.sleep(1.0)
                print("Added friend...")
    except TimeoutException as exc:
        print("TimeoutException: " + str(exc))
    finally:
        driver.quit()
if __name__ == '__main__':
    try:
        main()
    except:
        raise
I'm using the latest Selenium version:
C:\Users\Robert>pip show selenium
Name: selenium
Version: 3.3.1
And I have Firefox 52.0.1 with geckodriver v0.15.0.
Update: A quick test revealed that the same script works flawlessly with the Chrome Webdriver.
Update 2: This issue in the Selenium bugtracker on Github might be related: https://github.com/SeleniumHQ/selenium/issues/3642
I ran into the same issue today. You may have noticed that the first move_to_element and perform() worked (at least that was true in my case). To repeat the action, reset the action chain in your for loop:
actions.perform()
actions.reset_actions()
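A minimal sketch of resetting inside the loop, assuming Selenium's ActionChains.reset_actions(), which exists in Selenium 3 and 4. hover_each and its chains_cls parameter are hypothetical, added only so the loop can be exercised without a browser. Whether this avoids the geckodriver error on every version is not confirmed; the comment below reports perform() failing even on the first pass.

```python
def hover_each(driver, pairs, chains_cls=None):
    """For each (container, target) pair, hover over the container and then the
    target, clearing the recorded actions after every perform(); yields each
    target so the caller can decide whether to click it."""
    if chains_cls is None:
        # Imported lazily so the helper can be exercised with a stand-in class.
        from selenium.webdriver.common.action_chains import ActionChains as chains_cls
    actions = chains_cls(driver)
    for container, target in pairs:
        actions.move_to_element(container).move_to_element(target).perform()
        actions.reset_actions()  # the fix suggested above, applied once per iteration
        yield target
```

In the question's script, pairs could be built as (friend_proposal, friend_add_button) tuples, and the loop body would become: for button in hover_each(driver, pairs): if input() == "y": button.click().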
For me, .perform() fails the first time through. I am on Selenium 3.3.1, geckodriver 0.15, and the latest Firefox, using Java; the same code works perfectly on Chrome.