Python - Can't download file after opening a new tab using Selenium

I need to download an .xls file from a website using Chrome and Selenium. There are multiple websites I need to visit, so I need to open new tabs. However, when I open the second tab, I cannot download the file I need. Below is a simplified version of my code. Imagine that I have just downloaded a file in one tab and then open a new one using window.open():
import time
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
SAVE_PATH = "/path/to/downloads"  # placeholder: absolute path of the download folder
DRIVE_PATH = "/path/to/chromedriver"  # placeholder: absolute path of the ChromeDriver binary
options = webdriver.ChromeOptions()
prefs = {'download.default_directory': SAVE_PATH, "download.prompt_for_download": False}
options.add_experimental_option('prefs', prefs)
driver = webdriver.Chrome(executable_path=DRIVE_PATH, options=options)
# Open the FHFA page in a second tab and switch to it
driver.execute_script("window.open('https://www.fhfa.gov/DataTools/Downloads/Pages/House-Price-Index-Datasets.aspx#mpo');")
time.sleep(5)
driver.switch_to.window(driver.window_handles[1])
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//*[@id='WebPartWPQ2']/div[1]/table[3]/tbody/tr[2]/td[2]/p/a"))).click()
Without opening a new tab, I can download the file successfully. But after opening a new tab, Chrome shows "Fail - Download error". Is something wrong with my code?

On macOS with Chrome 76.0.3809.100 and ChromeDriver 75.0.3770.140, the download succeeds both ways (with and without opening a new tab).
To locate the download link, it is better to use the CSS selector below; you can find more information about locator strategies here:
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a[href*='HPI_PO_summary.xls']"))).click()
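Clicking the link only starts the download; the script can move on, or quit the driver, before Chrome has finished writing the file. A minimal polling sketch, assuming Chrome's convention of naming in-progress downloads with a .crdownload suffix and a download folder used only for these files (wait_for_download is a hypothetical helper, not a Selenium API):

```python
import time
from pathlib import Path


def wait_for_download(download_dir, file_name, timeout=30.0, poll=0.5):
    """Poll download_dir until file_name exists and no partial download remains.

    Assumes Chrome's ".crdownload" suffix for in-progress files and that the
    folder is dedicated to these downloads (unrelated partial files would block).
    """
    folder = Path(download_dir)
    target = folder / file_name
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        in_progress = list(folder.glob("*.crdownload"))
        if target.exists() and not in_progress:
            return target
        time.sleep(poll)
    raise TimeoutError(f"{file_name} did not finish downloading within {timeout} s")
```

For example, after the click above, call wait_for_download(SAVE_PATH, "HPI_PO_summary.xls") before driver.quit().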
A faster way is to download files from https://www.fhfa.gov/ with requests. Here's an example:
import os
import requests
file_name = "HPI_PO_summary.xls"
response = requests.get(f'https://www.fhfa.gov/DataTools/Downloads/Documents/HPI/{file_name}')
response.raise_for_status()  # fail loudly on HTTP errors instead of saving an error page
with open(os.path.join(SAVE_PATH, file_name), 'wb') as f:
    f.write(response.content)
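If requests is not installed, the same download can be done with the standard library instead (a swapped-in technique, not part of the answer above). A minimal sketch using urllib.request, which raises HTTPError on non-2xx HTTP responses and streams the body to disk in chunks; download_file is a hypothetical helper name, and some sites may reject urllib's default User-Agent:

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url, dest_dir, file_name):
    """Stream url into dest_dir/file_name using only the standard library."""
    dest = Path(dest_dir) / file_name
    with urllib.request.urlopen(url, timeout=60) as response, open(dest, "wb") as out:
        # Copy in chunks instead of loading the whole file into memory
        shutil.copyfileobj(response, out)
    return dest
```

Usage: download_file("https://www.fhfa.gov/DataTools/Downloads/Documents/HPI/HPI_PO_summary.xls", SAVE_PATH, "HPI_PO_summary.xls").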

Answering my own question here:
It seems the issue was SAVE_PATH. Initially, my SAVE_PATH was:
r"C:\Users\hw\Desktop\myfile\"
For some reason, it works (based on the answer here) if I add one more backslash to the end of the path:
r"C:\Users\hw\Desktop\myfile\\"
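Note that in Python a raw string literal cannot end in a single backslash, so r"C:\Users\hw\Desktop\myfile\" would not even parse as written. One way to avoid hand-escaping trailing separators is to let pathlib build the path. A minimal sketch, assuming Chrome accepts an absolute path without a trailing separator for download.default_directory (download_prefs is a hypothetical helper name):

```python
from pathlib import Path


def download_prefs(save_dir: str) -> dict:
    """Build Chrome download preferences from a directory path.

    Path.resolve() yields an absolute path with no trailing separator,
    so no backslashes need to be escaped by hand.
    """
    directory = Path(save_dir).expanduser().resolve()
    # Make sure the folder exists before Chrome tries to save into it
    directory.mkdir(parents=True, exist_ok=True)
    return {
        "download.default_directory": str(directory),
        "download.prompt_for_download": False,
    }
```

On Windows this would be used as options.add_experimental_option('prefs', download_prefs(r"C:\Users\hw\Desktop\myfile")).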

Related

Trying to run selenium chromedriver headless?

So I'm trying to scrape some forecasting data from a website periodically, and ideally I would like for it to happen in the background. I had a look at some documentation and came up with the following code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
options = webdriver.ChromeOptions()
options.add_argument("--headless")
driver = webdriver.Chrome(options = options)
options.add_argument("disable-infobars")
options.add_argument("--disable-extensions")
driver = webdriver.Chrome()
driver.get('https://www.windguru.cz/53')
WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "#forecasts-page")))
# Scraping block of code goes here
driver.quit()
I think the following line is overriding the --headless argument, but I'm not sure.
WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "#forecasts-page")))
The reason I added it in the first place is that the website I'm scraping isn't just static HTML (have a look for yourself; the link is in the code). I think some JavaScript triggers the forecast data to load, so I need to wait until it appears before the script starts scraping the DOM.
Any idea how I can achieve this and run the browser in headless mode?
The line that opens the visible Chrome window is this:
driver = webdriver.Chrome()
You can remove this line, and the code should still work the same, just headless.

Selenium: get() not working with custom google profile

All I'm trying to do is access WhatsApp Web, where my WhatsApp is already linked. However, when I use a custom profile, the profile opens, but browser.get("https://web.whatsapp.com") doesn't seem to load the page, nor does any other browser.get(). What could be the issue?
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from selenium.common.exceptions import NoSuchElementException, TimeoutException, WebDriverException
options = webdriver.ChromeOptions()
options.add_argument('--user-data-dir=/Users/omarassouma/Library/Application Support/Google/Chrome/')
options.add_experimental_option("detach", True)
browser = webdriver.Chrome(executable_path="/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",chrome_options=options)
browser.get("https://web.whatsapp.com/")
This is the updated version. It now opens WhatsApp Web, but not in the custom profile. Moreover, I can't really use webdriver.options(); is there anything extra I have to import?
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
options = webdriver.ChromeOptions()
options.add_argument("user-data-dir=/Users/omarassouma/Library/Application Support/Google/Chrome/User Data/Default")
browser = webdriver.Chrome(executable_path="/Users/omarassouma/Downloads/chromedriver",options=options)
browser.get("https://web.whatsapp.com/")
You need to take care of a couple of things as follows:
To use a custom Chrome profile, you have to pass its absolute path as follows:
options.add_argument("user-data-dir=/Users/omarassouma/Library/Application Support/Google/Chrome/User Data/Default")
You can find a detailed discussion in How to use Chrome Profile in Selenium Webdriver Python 3
Instead of passing the absolute path of the google-chrome binary, you need to pass the absolute path of the ChromeDriver through the key executable_path.
Additionally, instead of chrome_options you need to use options as chrome_options is deprecated now.
You can find a detailed discussion in DeprecationWarning: use options instead of chrome_options error using Brave Browser With Python Selenium and Chromedriver on Windows
So effectively the line of code will be:
browser = webdriver.Chrome(executable_path="/path/to/chromedriver", options=options)
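A variant that is often used splits the profile path into two Chrome switches: --user-data-dir for the folder that holds all profiles, and --profile-directory for the profile folder inside it. On macOS the default location is ~/Library/Application Support/Google/Chrome (the "User Data" folder name comes from the Windows layout). Chrome generally will not let two running instances share one user data directory, so close your everyday Chrome window first. A minimal sketch; profile_arguments is a hypothetical helper name:

```python
import os


def profile_arguments(user_data_dir, profile_directory="Default"):
    """Return Chrome switches that select an existing profile.

    --user-data-dir points at the parent folder holding all profiles;
    --profile-directory names the profile folder inside it
    (for example "Default" or "Profile 1").
    """
    user_data = os.path.expanduser(user_data_dir)
    return [
        f"--user-data-dir={user_data}",
        f"--profile-directory={profile_directory}",
    ]


# Usage with Selenium (not executed here):
#   options = webdriver.ChromeOptions()
#   for arg in profile_arguments("~/Library/Application Support/Google/Chrome"):
#       options.add_argument(arg)
#   browser = webdriver.Chrome(executable_path="/path/to/chromedriver", options=options)
```

Because add_argument passes each switch straight to Chrome without a shell, the space in "Application Support" needs no quoting.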
As of 05.11.2022, the only way I found to get past authorization was to use cookies: https://stackoverflow.com/a/15058521
Running selenium driver with google account isn't working
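The cookie approach can be sketched with Selenium's driver.get_cookies() and driver.add_cookie(). A minimal version that stores the cookies as JSON; save_cookies and load_cookies are hypothetical helper names. Whether cookies alone restore a logged-in session depends on the site, since some sites also keep session state in local storage:

```python
import json
from pathlib import Path


def save_cookies(cookies, path):
    """Write the list returned by driver.get_cookies() to a JSON file."""
    Path(path).write_text(json.dumps(cookies, indent=2), encoding="utf-8")


def load_cookies(path):
    """Read cookies saved by save_cookies(); pass each dict to driver.add_cookie()."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Typical flow (requires a live driver, not executed here):
#   save_cookies(driver.get_cookies(), "cookies.json")  # after logging in once by hand
#   driver.get("https://web.whatsapp.com/")            # must be on the cookie's domain first
#   for cookie in load_cookies("cookies.json"):
#       driver.add_cookie(cookie)
#   driver.refresh()
```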

Databricks - Selenium - Open browser tab with

I have successfully installed Selenium in Databricks and can import the Python selenium and webdriver modules. On my local computer, once I run the Selenium get command, a separate browser window opens where I can see what Selenium is doing.
However, when I run the same script on Databricks, no window opens. I am wondering whether this is possible at all. I found some options, like
browser.execute_script('''window.open("http://bings.com","_blank");''')
or
add_experimental_option
but these options did not work.
I am currently using the following options on Databricks:
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from webdriver_manager.utils import ChromeType
import pandas as pd
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-dev-shm-usage')
# installed driver
chrome_driver = "/tmp/chromedriver1/chromedriver"
driver = webdriver.Chrome(executable_path=chrome_driver, options=chrome_options)
Has someone an idea how this could work?
Thanks in advance!
No, it won't open, since you are using a headless browser. See this line from your code:
chrome_options.add_argument('--headless')
Having said that, it will still execute the Python Selenium binding instructions you have. You can try printing the page title like this:
driver = webdriver.Chrome(executable_path=chrome_driver, options=chrome_options)
driver.get("https://www.google.com")
print(driver.title)
This should print the title of google.com in the console.

Is there a command to execute selenium tests that are not wrapped within a framework?

Is there a command to run selenium tests without using a framework? e.g. pytest foo_test.py
What would be required on my local machine to run the following test? I am confused because it appears the only requirement would be chromedriver, but I don't know which command to use to execute the actual test.
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
capa = DesiredCapabilities.CHROME
capa["pageLoadStrategy"] = "none"
driver = webdriver.Chrome(desired_capabilities=capa)
wait = WebDriverWait(driver, 20)
driver.get('http://stackoverflow.com/')
wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, '#h-top-questions')))
driver.execute_script("window.stop();")
Here is the answer to your question:
As for whether there is a command to run Selenium tests without using a framework: yes.
Put simply, Python has frameworks such as pytest and unittest that structure your test execution and help interpret the test results. Each framework has its own strengths, and as the code base grows, a framework helps keep it organized. But using a framework is not mandatory.
As for your code, I don't see any significant error, but with Selenium 3.x.x you need to download chromedriver from here and save it on your machine. When you initialize the WebDriver instance, you need to pass the absolute path of chromedriver, as below.
Here is your own code with a few simple tweaks, which works at my end:
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
capa = DesiredCapabilities.CHROME.copy()  # copy so the shared default capabilities are not mutated
capa["pageLoadStrategy"] = "none"
driver = webdriver.Chrome(desired_capabilities=capa, executable_path="C:\\your_directory\\chromedriver.exe")
wait = WebDriverWait(driver, 20)
driver.get('http://stackoverflow.com/')
wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, '#h-top-questions')))
driver.execute_script("window.stop();")
Let me know if this answers your question.
There are actually two requirements here: Selenium itself, and chromedriver, as you mentioned. The file is just a Python file, so you can run it with python foo_test.py. There is also the option to use a framework like unittest, which can be useful for seeing test results.
Selenium itself is not a "testing framework", it is a library of commands that allow a user to interact with a web browser. Selenium can be used for webscraping or automating tasks as well as testing purposes.

How to use a running instance of Firefox with Selenium

I am running Ubuntu 14.04, Firefox 49.0.2, Python 3.4.3 & Selenium 3.0.1
I want to use Selenium to automate some browser functions, not to do any website testing. How can I modify the simple login script below to use the instance of Firefox running on my desktop instead of opening a new Firefox window?
# login_yahoo_mail.py
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Firefox()
driver.get('http://mail.yahoo.com/?.intl=us')
enter_email = driver.find_element_by_id('login-username')
enter_email.clear()
enter_email.send_keys('cleanman2@yahoo.com')
next_button = driver.find_element_by_id('login-signin')
next_button.click()
enter_password = WebDriverWait(driver,10).until(EC.visibility_of_element_located((By.ID, 'login-passwd')))
enter_password.send_keys('dumba$$yahoo!^!&')
signin_button = driver.find_element_by_id('login-signin')
signin_button.click()
Thanks, Jim
It is possible with Selenium 4.
Each window has a unique identifier that persists within a single session. You can get the window handle of the current window by using:
driver.current_window_handle
from selenium import webdriver
driver = webdriver.Firefox()
# Store the ID of the original window
original_window = driver.current_window_handle
# Open a new tab and switch to it
driver.switch_to.new_window('tab')
# Open a new window and switch to it
driver.switch_to.new_window('window')
# Close the current tab or window
driver.close()
# Switch back to the original tab or window
driver.switch_to.window(original_window)
I haven't tried it yet, but since WebDriver doesn't know where the OS focus is, you may be able to achieve what you want with these options.
More: Handling windows and tabs
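When a tab is opened with window.open() rather than switch_to.new_window(), the driver does not switch to it automatically, and the order of driver.window_handles is not guaranteed to follow the order in which windows were opened. A minimal sketch that finds the new tab by diffing handle lists instead of indexing [1]; new_window_handle is a hypothetical helper name:

```python
def new_window_handle(handles_before, handles_after):
    """Return the single handle present in handles_after but not in handles_before."""
    before = set(handles_before)
    new = [h for h in handles_after if h not in before]
    if len(new) != 1:
        raise RuntimeError(f"expected exactly one new window, found {len(new)}")
    return new[0]


# Usage with a live driver (not executed here):
#   before = driver.window_handles
#   driver.execute_script("window.open('https://example.com');")
#   WebDriverWait(driver, 10).until(EC.number_of_windows_to_be(len(before) + 1))
#   driver.switch_to.window(new_window_handle(before, driver.window_handles))
```

Waiting with EC.number_of_windows_to_be also replaces a fixed time.sleep() before switching.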