Selenium with python using headless Chrome on MacOS - takes too long - selenium

I have a script that logs into a site and then takes a screenshot. It uses Chrome 59 on MacOS in headless mode.
I have two problems that I think are related. The first is that my script takes minutes when it should take seconds. The second is that the Chrome icon lingers in my Dock and never closes.
I think these problems are caused by a couple of elements on the site I am checking that never finish loading: \images\grey_semi.png and https://www.google-analytics.com/analytics.js. I think this holds up Selenium and prevents it from closing as instructed with driver.close().
What can I do?
script:
import os
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(executable_path=os.path.abspath("chromedriver"), chrome_options=chrome_options)
driver.get("https://url.com/")
username = driver.find_element_by_name("L1")
username.clear()
username.send_keys("name")
password = driver.find_element_by_name("P1")
password.clear()
password.send_keys("pass")
driver.find_element_by_id("login").click()
driver.get("https://url.com/Overview.aspx")
driver.get_screenshot_as_file('main-page.png')
driver.close()

I don't see any waits in your code. As you know, web apps use AJAX techniques to load dynamic data, so when a page is loaded by the browser, the elements within that page may load at different time intervals. Depending on the implementation, it is possible that the load event is affected by google-analytics.com/analytics.js, since a web page is considered completely loaded only after all content (including images, script files, CSS files, etc.) has loaded. By default your UI Selenium tests use a fresh instance of the browser, so analytics.js shouldn't be cached.
One more thing to check is whether Google Analytics is placed so that it isn't loaded until the page has loaded, or runs asynchronously. It used to go just before the </body> tag, but I believe it is now supposed to be the last <script> in the <head> tag. You can find more details on the page-load impact of Google Analytics here; they claim that, done right, the load times are so small they're not even worth worrying about. My best guess is that the issue is with how Google Analytics is used.
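As a sketch of the explicit-wait approach, here is the question's script reworked to wait for a specific element rather than the full page-load event (the "L1" field name and URL come from the question; the 30-second timeout is an arbitrary choice):

```python
import os
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(executable_path=os.path.abspath("chromedriver"),
                          chrome_options=chrome_options)
driver.get("https://url.com/")

# Wait up to 30 seconds for the login field to appear, instead of
# relying on the full page-load event (which analytics.js can delay).
wait = WebDriverWait(driver, 30)
username = wait.until(EC.presence_of_element_located((By.NAME, "L1")))
```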
About your second problem
Chrome icon lingers in my Dock and never closes
In case you see errors in the browser console, try using the quit() method: it closes the browser and shuts down the chromedriver executable that was started along with the session. Keep in mind that close() closes the browser window only; the driver instance remains dangling. Another thing to check is that you are actually using the latest versions of both the ChromeDriver executable and the Chrome browser.
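A minimal sketch of that cleanup, mirroring the options and URL from the question's script:

```python
import os
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(executable_path=os.path.abspath("chromedriver"),
                          chrome_options=chrome_options)
try:
    driver.get("https://url.com/")
    driver.get_screenshot_as_file("main-page.png")
finally:
    # quit() closes every window AND shuts down the chromedriver process,
    # so no Chrome icon is left lingering in the Dock.
    driver.quit()
```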
UPDATE:
If waits do NOT affect your execution time, this means that Selenium will wait for the page to finish loading and then look for the elements you've specified. The only real option that I can think of is to specify a page load timeout, like so:
from selenium.common.exceptions import TimeoutException
driver.set_page_load_timeout(seconds)
try:
    driver.get("https://url.com/")
except TimeoutException:
    # the page load timed out; fall back to explicit waits here
    pass

I solved this with the following:
driver.find_element_by_link_text("Job Board").click()
driver.find_element_by_link_text("Available Deployments").click()

Related

Selenium with headless chrome fails to get url when switching tabs

I'm currently running Selenium with Specflow.
One of my tests clicks on a button which triggers the download of a pdf file.
That file is automatically opened in a new tab where the test then grabs the url and downloads the referenced file directly to the selenium project.
This whole process works perfectly when chrome driver is run normally but fails on a headless browser with the following error:
The HTTP request to the remote WebDriver server for URL http://localhost:59658/session/c72cd9679ae5f713a6c857b80c3515e4/url timed out
after 60 seconds. -> The request was aborted: The operation has timed out.
This error occurs when attempting to run driver.Url
driver.Url calls work elsewhere in the code. It only fails after the headless browser switches tabs. (Yes, I am switching windows using the driver)
For reference, I cannot get this url without clicking the button on the first page and switching tabs as the url is auto-generated after the button is clicked.
I believe you are just passing the "--headless" argument; for better performance you should set the window size too. Sometimes, due to an inappropriate window size, the driver cannot find the elements you are looking for. Try using this code, or just add the one line for the size:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--window-size=1920x1080")
driver = webdriver.Chrome(chrome_options=chrome_options)
Don't forget to add any other arguments you need to chrome_options before creating the driver.

Tests timeouts (Selenium+Jenkins+Grid)

We've started getting random timeouts, but cannot work out the reason. The tests run on remote machines on Amazon using Selenium Grid. Here is how it goes:
the browser is opened,
then a page starts loading, but cannot load fully within 120 seconds,
then a timeout exception is thrown.
If I run the same tests locally, everything is OK.
The error is an ordinary timeout exception, thrown when a page is not loaded completely within the period set in driver.manage().timeouts().pageLoadTimeout(). The problem is that a page of the site cannot be loaded completely within that time. But when the period set in driver.manage().timeouts().pageLoadTimeout() expires and, consequently, Selenium's possession of the browser ends, the page loads at once. The issue cannot be reproduced manually on the same remote machines. We've tried different versions of Selenium Standalone, ChromeDriver, and the Selenium driver. The browser is Google Chrome 63. I would be happy to hear any suggestions about the reasons.
When Selenium loads a webpage/URL, by default it follows a pageLoadStrategy of normal. To stop Selenium from waiting for the full page load, we can configure pageLoadStrategy, which supports 3 different values:
normal (full page load)
eager (interactive)
none
Code Sample :
Java
capabilities.setCapability("pageLoadStrategy", "none");
Python
caps["pageLoadStrategy"] = "none"
Here you can find the detailed discussions through Java and Python clients.
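A fuller Python sketch of the capability above, using the Selenium 3-era DesiredCapabilities API that matches the caps dictionary in the answer (the URL is a placeholder):

```python
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

caps = DesiredCapabilities.CHROME.copy()
caps["pageLoadStrategy"] = "none"  # do not wait for the load event at all

driver = webdriver.Chrome(desired_capabilities=caps)
driver.get("https://url.com/")  # returns almost immediately; pair with explicit waits
driver.quit()
```

With "none", driver.get() no longer blocks on the 120-second page load, so you must combine it with explicit waits for the elements you actually need.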

Slow/incomplete page load in the browser launched by selenium webdriver

I am using Selenium WebDriver with Robot Framework. The major problem we are facing is that my tests are timing out even after setting the timeout as high as 10 minutes. It happens with any browser I use. Things work much faster if I run the tests manually (with all browser cache/data/cookies cleared). These are other things I have been observing for a few months:
Some components are never loaded (I checked the call trace using BrowserMob Proxy and found nothing unusual)
"Click Element" does not work in many cases. Clicking on an element triggers some action, but that action is not always triggered in automation. Manually it works every time.
Notes:
This is happening on FF and Chrome. (IE is not working for me at all)
App server and automation suite is in LAN so latency is not an issue
No other heavy process is running at this time
The issue persists even with the default Firefox profile
I tried it with different selenium versions. (2.45 - 2.52)
I am using the latest driver for Chrome. Browsers: FF 40+ and Chrome 48
This does not look like an application issue, as we spent 2 months confirming that. Let me know if you need any other details.

Selenium - Chrome Web Driver - Html Only, No Images

I am doing a lot of testing with the Chrome Web Driver within Selenium. The problem is that every time I run it, it has to re-download all the site's images, which takes time; it does not keep these images cached.
So I would like to set which files to render and which not to. Obviously I want JavaScript and CSS files to still download, but I particularly want to turn off images.
Is this possible? If not, is there a way to enable caching, so the next time I run the program it can get the images from the local cache?
The solution is to load the same Chrome profile again; it should (though it may not) ensure that images and other similar resources are cached.
Here is how to load a particular profile :
https://code.google.com/p/selenium/wiki/ChromeDriver
DesiredCapabilities capabilities = DesiredCapabilities.chrome();
capabilities.setCapability("chrome.switches", Arrays.asList("--user-data-dir=/path/to/profile/directory"));
WebDriver driver = new ChromeDriver(capabilities);
A similar search on SO gave this result - Load Chrome Profile using Selenium WebDriver using java - you might want to take a look.
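For completeness, a Python equivalent of the Java snippet above, using the modern --user-data-dir argument (the profile path is a placeholder you must replace):

```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
# Point Chrome at a persistent profile directory so its disk cache
# (including images) survives between runs.
chrome_options.add_argument("--user-data-dir=/path/to/profile/directory")
driver = webdriver.Chrome(chrome_options=chrome_options)
```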
Caching from one session to another is not possible as far as I know.
However, it is possible to skip rendering the page if you run headless. This will load the page but not render it (make it visible, load images).
You can use HtmlUnitDriver, which is the standard but somewhat outdated option, or you can use PhantomJS, which has a more modern JavaScript engine.
"You can use HTMLUnitDriver" >>> careful with this!
Please note that since HTMLUnitDriver is not the same implementation as the real browser, you may or may not get the same results in both.
So if you start hitting weird issues, like running the test with Chrome passes but running the test with HTMLUnitDriver does not, consider just running your tests with the browser driver. I've heard of looong troubleshooting stories from colleagues and they had to give up on running their suites in headless mode (ie, using HTMLUnitDriver).
For caching images you would need something like squid or haproxy, and then proxy your selenium browser through it.
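A sketch of pointing the Selenium-driven browser at such a proxy (the address assumes a caching proxy like squid is already running locally on its default port 3128):

```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
# Route all browser traffic through the local caching proxy, so images
# are served from the proxy's cache on subsequent runs.
chrome_options.add_argument("--proxy-server=http://localhost:3128")
driver = webdriver.Chrome(chrome_options=chrome_options)
```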

Selenium Webdriver/Browser with Python

I need to build a Python scraper to scrape data from a website where content is only displayed after a user clicks a link bound to a JavaScript onclick function, and the page is not reloaded. I've looked into Selenium in order to do this and played around with it a bit, and it seems Selenium opens a new Firefox browser every time I instantiate a driver:
>>> driver = webdriver.Firefox()
Is this open browser required, or is there a way to get rid of it? I'm asking because the scraper is potentially part of a web app, and I'm afraid that if multiple users start using it, I will have a bunch of browser windows open on my server.
Yes, selenium automates web browsers.
You can add this at the bottom of your python code to make sure the browser is closed at the end:
driver.quit()
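A minimal shape for the scraper described above, ensuring the browser always closes even if scraping fails (the URL and link text are placeholders):

```python
from selenium import webdriver

driver = webdriver.Firefox()
try:
    driver.get("https://example.com/")
    # Click the JS-bound link that reveals the content without a reload.
    driver.find_element_by_link_text("Show data").click()
    html = driver.page_source  # now includes the dynamically revealed content
finally:
    driver.quit()  # close the browser window and end the session
```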