Databricks - Selenium - Open browser tab with - selenium

I have successfully installed Selenium in Databricks and can import the Python selenium and webdriver. On my local computer once I run the Selenium get command a separate browser windows open where I can see what Selenium is doing.
However, when running the same script on Databricks there is unfortunately no window opening. I am wondering if this is at all possible. I found some options like
browser.execute_script('''window.open("http://bings.com","_blank");''')
or
add_experimental_option
but these options did not work.
I am currently using the following options on Databricks:
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from webdriver_manager.utils import ChromeType
import pandas as pd
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-dev-shm-usage')
# installed driver
chrome_driver = "/tmp/chromedriver1/chromedriver"
driver = webdriver.Chrome(executable_path=chrome_driver, options=chrome_options)
Has someone an idea how this could work?
Thanks in advance!

No it won't open since you are using headless browser. See the below line from your code :-
chrome_options.add_argument('--headless')
Having said that, it will execute the Python-Selenium bindings instruction, you have. You can try to print the page title like this :
driver = webdriver.Chrome(executable_path=chrome_driver, options=chrome_options)
driver.get("https://www.google.com")
print(driver.title)
should print google.com title in the console.

Related

Selenium: get() not working with custom google profile

All what im trying to do is pretty much access whatsapp web where I have my whatsapp already linked, However when I use a custom profile the profile does open, however browser.get("https://web.whatsapp.com) doesn't seem to open. or any browser.get(). What could be the issue?
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from selenium.common.exceptions import NoSuchElementException, TimeoutException, WebDriverException
options = webdriver.ChromeOptions()
options.add_argument('--user-data-dir=/Users/omarassouma/Library/Application Support/Google/Chrome/')
options.add_experimental_option("deatch", True)
browser = webdriver.Chrome(executable_path="/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",chrome_options=options)
browser.get("https://web.whatsapp.com/")
this is the updated version, it now opens whatsapp web however not in a custom profile, moreover I cant really use webdriver.options(), is there anything extra I have to import?.
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
options = webdriver.ChromeOptions()
options.add_argument("user-data-dir=/Users/omarassouma/Library/Application Support/Google/Chrome/User Data/Default")
browser = webdriver.Chrome(executable_path="/Users/omarassouma/Downloads/chromedriver",options=options)
browser.get("https://web.whatsapp.com/")
You need to take care of a couple of things as follows:
To use a Custome Chrome Profile you have to pass the absolute path as follows:
options.add_argument("user-data-dir=/Users/omarassouma/Library/Application Support/Google/Chrome/User Data/Default")
You can find a detailed discussion in How to use Chrome Profile in Selenium Webdriver Python 3
Instead of passing the absolute path of the google-chrome binary, you need to pass the absolute path of the ChromeDriver through the key executable_path.
Additionally, instead of chrome_options you need to use options as chrome_options is deprecated now.
You can find a detailed discussion in DeprecationWarning: use options instead of chrome_options error using Brave Browser With Python Selenium and Chromedriver on Windows
So effectively the line of code will be:
browser = webdriver.Chrome(executable_path="/path/to/chromedriver", options=options)
At 05.11.2022 I found the only way to pass through authorization for myself is using cookie - https://stackoverflow.com/a/15058521
Runing selenium driver with google account isn't working

Selenium and ChromeDriver Issues

When the ChromeDriver version does not match my current chrome version, i upgrade chromedriver by the following code:
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
driver = webdriver.Chrome(ChromeDriverManager().install())
Then i use selenium for scraping the website data, but still i got some errors. Anyone can help me with this issue? Appreciate.
import time
import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
options.headless = True
driver = webdriver.Chrome(options=options)
driver.get("https://www.binance.com/cn/futures/funding-history/0")
time.sleep(5)
The errors took place when the above code is ran which has been attached.
You may try to include the below code to get the latest version automatically
through PIP :
pip install chromedriver-autoinstaller
Usage
Just type import chromedriver_autoinstaller in the module you want to use chromedriver.
Example
from selenium import webdriver
import chromedriver_autoinstaller
chromedriver_autoinstaller.install() # Check if the current version of chromedriver exists
# and if it doesn't exist, download it automatically,
# then add chromedriver to path
driver = webdriver.Chrome()
driver.get("http://www.python.org")
assert "Python" in driver.title
Read more about auto upgrade here

Python - Can't download file after opening a new tab using selenium

I need to download a xls file from a website using chrome and selenium. There are multiple websites I need to go and so I need to open new tabs. However, when I open the second tab, I cannot download the file I need. Below are simple version my code. Image that I have just download some file from one tab and then open a new one using window.open():
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
options = webdriver.ChromeOptions()
prefs = {'download.default_directory' : SAVE_PATH, "download.prompt_for_download": False}
options.add_experimental_option('prefs', prefs)
driver = webdriver.Chrome(executable_path = DRIVE_PATH, chrome_options = options)
driver.execute_script("window.open('https://www.fhfa.gov/DataTools/Downloads/Pages/House-Price-Index-Datasets.aspx#mpo');")
time.sleep(5)
driver.switch_to.window(driver.window_handles[1])
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//*[#id='WebPartWPQ2']/div[1]/table[3]/tbody/tr[2]/td[2]/p/a"))).click()
Without opening new tab, I could download the file successfully. But after opening new tab, chrome tells me "Fail - Download error". Something wrong with my code?
MacOS, Chrome Version 76.0.3809.100, ChromeDriver version 75.0.3770.140 in download success in both ways.
To locate download link better to use css-selectors below, you find more information about locator strategies here
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a[href*='HPI_PO_summary.xls']"))).click()
Faster way is to use requests to download files from https://www.fhfa.gov/, here's example:
import requests
import os
file_name = "HPI_PO_summary.xls"
response = requests.get(f'https://www.fhfa.gov/DataTools/Downloads/Documents/HPI/{file_name}')
with open(os.path.join(SAVE_PATH, file_name), 'wb') as f:
f.write(response.content)
Answering my question in here:
It seems the issue is in the SAVE_PATH. Initially my SAVE_PATH was:
r"C:\Users\hw\Desktop\myfile\"
And for some reason it works (based on the answer here) if I add one more slash to the end of the path:
r"C:\Users\hw\Desktop\myfile\\"

selenium.common.exceptions.SessionNotCreatedException: Message: session not created: No matching capabilities error with ChromeDriver Chrome Selenium

First, machine and package specs:
I am running:
ChromeDriver version 75.0.3770.140
Selenium: version '3.141.0'
WSL (linux subsystem) of windows 10
I am trying to run a chromebrowser through selenium. I found: these commands, to use selenium through google chrome.
I have a test directory, with only the chromedriver binary file, and the script, in it. The location of the directory is: /home/kela/test_dir/
I ran the code:
import selenium
from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
options = Options()
options.binary_location='/home/kela/test_dir/chromedriver'
driver = webdriver.Chrome(chrome_options = options,executable_path='/home/kela/test_dir/chromedriver')
The output from this code is:
selenium.common.exceptions.SessionNotCreatedException: Message: session not created: No matching capabilities found
Can anyone explain why I need capabilities when the same script works for others without capabilities? I did try adding:
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
but I got the same error. So I'm not sure what capabilities I need to add (considering it works for others without it?)
Edit 1: Addressing DebanjanB's comments below:
Chromedriver is in the expected location. I am using windows 10. From here, the expected location is C:\Program Files (x86)\Google\Chrome\Application\chrome.exe; and this is where it is on my machine (I copied and pasted this location from the chrome Properties table).
ChromeDriver is having executable permission for non-root users.
I definitely have Google Chrome v75.0 installed (I can see that the Product version 75.0.3770.100)
I am running the script as a non-root user, as my bash command line ends with a $ and not # (i.e kela:~/test_dir$ and not kela:~/test_dir#)
Edit 2: Based on DebanjanB's answer below, I am very close to having it working, but just not quite.
The code:
import selenium
from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.chrome.options import Options
options = Options()
options.binary_location='/c/Program Files (x86)/Google/Chrome/Application/chrome.exe'
driver = webdriver.Chrome(options=options)
driver.get('http://google.com/')
Produces a dialog box that reads:
Google Chrome cannot read and write to it's data directory: /tmp/.com/google.Chrom.gyw63s
So then I double checked my Chrome permissions and I should be able to write to Chrome:
Also, I can see that /tmp/ has a bunch of .com dirs in it:
.com.google.Chrome.4jnWme/ .com.google.Chrome.FdNyKP/ .com.google.Chrome.VAcWMQ/ .com.google.Chrome.ZbkRx0/ .com.google.Chrome.iRrceF/
.com.google.Chrome.A2QHHB/ .com.google.Chrome.G7Y51c/ .com.google.Chrome.WD8BtK/ .com.google.Chrome.cItmhA/ .com.google.Chrome.pm28hN/
However, since that seemed to be more of a warning than an error, I clicked 'ok' to close the dialog box, and a new tab does open in the browser; but the URL is just 'data:,'. The same thing happens if I remove the line 'driver.get('http://google.com')' from the script, so I know the warning/issue is with the line:
driver = webdriver.Chrome(chrome_options = options,executable_path='/home/kela/test_dir/chromedriver')
For example, from here, I tried adding:
options.add_argument('--profile-directory=Default')
But the same warning pops up.
Edit 3:
As edit 3 was starting to veer into a different question than specifically being addressed here, I started a new question here.
This error message...
selenium.common.exceptions.SessionNotCreatedException: Message: session not created: No matching capabilities found
...implies that the ChromeDriver was unable to initiate/spawn a new WebBrowser i.e. Chrome Browser session.
binary_location
binary_location set/get(s) the location of the Chrome (executable) binary and is defined as:
def binary_location(self, value):
"""
Allows you to set where the chromium binary lives
:Args:
- value: path to the Chromium binary
"""
self._binary_location = value
So as per your code trials, options.binary_location='/home/kela/test_dir/chromedriver' is incorrect.
Solution
If Chrome is installed at the default location, you can safely remove this property. Incase Chrome is installed at a customized location you need to use the options.binary_location property to point to the Chrome installation.
You can find a detailed discussion in Selenium: WebDriverException:Chrome failed to start: crashed as google-chrome is no longer running so ChromeDriver is assuming that Chrome has crashed
Effectively, you code block will be:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
options.binary_location=r'C:\Program Files (x86)\Google\Chrome\Application\chrome.exe'
driver = webdriver.Chrome(options=options, executable_path='/home/kela/test_dir/chromedriver.exe')
driver.get('http://google.com/')
Additionally, ensure the following:
ChromeDriver is having executable permission for non-root users.
As you are using ChromeDriver v75.0 ensure that you have the recommended version of the Google Chrome v75.0 as:
---------ChromeDriver 75.0.3770.8 (2019-04-29)---------
Supports Chrome version 75
Execute the Selenium Test as non-root user.

Selenium: How to make geckodriver headless

I have completed code for geckodriver and was wondering how to make geckodriver headless now. I saw a post previously with the following text:
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
options = Options()
options.add_argument("--headless")
driver = webdriver.Firefox(firefox_options=options,
executable_path="C:\\Utility\\BrowserDrivers\\geckodriver.exe")
print("Firefox Headless Browser Invoked")
driver.get('http://google.com/')
driver.quit()
I don't understand where the download for options came from under webdriver. When I downloaded geckodriver, all that came with it was the executable file. Any help is greatly appreciated!!
Works for me. Steps I used: (1) Open a command prompt and navigate to the folder containing geckodriver.exe. (2) Start geckodriver.exe without any options from a command prompt. (3) Open another command prompt and type python and press the Return key. (4) Copy/paste the following code into your your python session.
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver import Firefox
options = Options()
options.add_argument("--headless")
# Don't put the path to geckodriver in the following. But the firefox executable
# must be in the path. If not, include the path to firefox, not geckodriver below.
driver = Firefox(firefox_options=options)
print("Firefox Headless Browser Invoked")
driver.get('http://google.com/')
# Print the first 300 characters on the page.
print(driver.page_source[:300])
driver.quit()