Should I be able to load an entire HTML page using HtmlUnitDriver? - Selenium

I am using ChromeDriver for my Selenium test case, and it works fine. Because of a performance issue in my project, I want to migrate the test case from ChromeDriver to HtmlUnitDriver. When I simply swap the driver for HtmlUnitDriver, the Selenium test case stops working.
After experimenting with this driver, I suspect that HtmlUnitDriver is not loading the entire page.
I say this because HtmlUnitDriver can find some div ids that are at the beginning of the page.
Other divs are not found by this driver; I get a NoSuchElementException for those div ids.
Please help me resolve this problem in my project.

Are the elements you are looking for created by JavaScript/AJAX calls? If so, you might need to enable JavaScript support in HtmlUnitDriver first.
But beware: it may work well, yet it can behave differently from what you see in real browsers.
Otherwise, are you using implicit/explicit waits for your searches? Even with JS enabled, it sometimes takes a while before all asynchronous requests are handled.
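To illustrate what an explicit wait does, here is a minimal polling sketch in plain Python. The helper name wait_until and its parameters are made up for this example and are not part of Selenium; Selenium ships its own WebDriverWait for the same purpose. The idea is simply to retry the lookup until it succeeds or a deadline passes, instead of failing on the first NoSuchElementException.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() until it returns a truthy value or the timeout expires.

    Hypothetical helper for illustration; exceptions raised by condition()
    (for example a NoSuchElementException) are treated as "not ready yet".
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except Exception:
            # The element may simply not exist yet; keep polling.
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll_interval)
```

With a driver at hand, this could be called as wait_until(lambda: driver.find_element(By.ID, "some-div")), where "some-div" stands in for one of the ids that currently raises NoSuchElementException.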

Related

Chrome Headless - Firefox

I'm working on a monitoring tool for my website to log data. The actual logging is done on the server. My goal is to calculate stats based on how long the user stays on the website.
Main question: I used headless Chrome with --remote-debugging-port=80. I got logs for up to 10 minutes, and it works perfectly. But how long will it keep running if left alone? Is there a default timeout? If yes, how can I change it? What if I want to run it for exactly 30 minutes after the page has finished loading?
I'm trying to do the same in Firefox (I tried PhantomJS, but it wasn't loading the page correctly even though the user agent was set to Firefox), but Firefox just shows a blank page when I try to start headless mode. I used "firefox -headless" and tried capturing a screenshot. It just closed my currently open Firefox tabs without capturing any image. Any ideas?
I'm using Firefox Quantum 59.0. I don't want to use Selenium.
A PhantomJS solution would also be great. Currently I just want to collect logs, so it only has to run all the JavaScript (and jQuery) code on the page, which then sends the data using AJAX. I tried page.onLoadFinished and then a wait function to make it stay on the page for the exact time after page loading.
Since no one answered, I will try to answer my own question after even more research and logical thinking.
Main question: It seems there is no default timeout, but if needed, --timeout X can be used. It's not perfect, though, because the timeout applies regardless of whether the page has fully loaded.
As for Firefox, it's buggy. -new-instance (which should let headless run while Firefox is already open) is not working, and -no-remote didn't help. Headless mode works only when no other Firefox instance is running, while Chrome runs fine. So if you want to run tests on the same PC you work on, Firefox is not for you.
PhantomJS didn't work even though I tried multiple solutions.
Best solution? Use Chrome. Need something portable? Use Chromium in headless mode. Or write your software with CefSharp, which is based on Chromium. Your browser with all libraries will be around 120-200 MB. That's pretty big for a portable tool, but it does its job, same as portable Chrome or Chromium. CefSharp has the advantage of letting you integrate whatever you like into the browser, since it's a... browser.
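As a rough sketch of keeping headless Chrome alive for a fixed time, the process can be started from a small Python script and terminated after the desired duration. The binary name "google-chrome", the default port 9222 and both function names are assumptions for this example; adjust them to your installation. Note that this counts from process start, not from the moment the page finishes loading, so it has the same limitation as a plain timeout.

```python
import subprocess


def build_headless_command(url, binary="google-chrome", debug_port=9222):
    """Build the argument list for a headless Chrome run (names are assumptions)."""
    return [
        binary,
        "--headless",
        "--remote-debugging-port=%d" % debug_port,
        url,
    ]


def run_for(command, seconds):
    """Start the process and terminate it if it is still running after `seconds`."""
    proc = subprocess.Popen(command)
    try:
        proc.wait(timeout=seconds)
    except subprocess.TimeoutExpired:
        proc.terminate()
        proc.wait()
    return proc.returncode
```

For a 30-minute session this would be called as run_for(build_headless_command("https://example.com"), 30 * 60), with the URL replaced by the page being monitored.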

Selenium Golang binding without server

There are many Selenium WebDriver binding packages for Golang.
However, I don't want to control the browser through a server.
How can I control a browser with Golang and Selenium without a Selenium server?
You can try github.com/fedesog/webdriver which says in its documentation:
This is a pure go library and doesn't require a running Selenium driver.
I would characterize the Selenium webdriver as a client rather than a server. Caveat: I have used the Selenium webdriver (Chrome version) from .Net and I am assuming it is similar for Go.
The way Selenium works is that you launch an instance of it from within your code, and it creates a live instance of the selected browser (e.g. Chrome) over which your program retains control. Then you write code to tell the browser to navigate to a page, inspect the response, and interact with the page by filling out form data, clicking on buttons, etc. You can see what is happening in the browser as the code runs, so it is easy to troubleshoot when the interaction doesn't go as planned.
I have used Selenium to upload tens of thousands of records to a website that has no API and only a graphical user interface. Give it a chance.

Selenium - Chrome Web Driver - Html Only, No Images

I am doing a lot of testing with the Chrome WebDriver within Selenium. The problem is that every time I run it, it has to re-download all the site images, which takes time. It does not keep these images cached.
So I would like to choose which files to load and which not to. Obviously I want JavaScript and CSS files to still download, but I particularly want to turn off images.
Is this possible? If not, is there a way to enable caching, so that the next time I run the program it can get the images from the local cache?
The solution is to load the same Chrome profile again. It should (though it may not) ensure that images and other similar resources are cached.
Here is how to load a particular profile :
https://code.google.com/p/selenium/wiki/ChromeDriver
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
ChromeOptions options = new ChromeOptions();
options.addArguments("--user-data-dir=/path/to/profile/directory");
WebDriver driver = new ChromeDriver(options);
A similar search on SO gave this result: "Load Chrome Profile using Selenium WebDriver using java". It may be worth a look.
Caching from one session to another is not possible as far as I know.
However, it is possible to skip rendering the page if you run it headless. This will load the page, but not render it (make it visible, load images).
You can use HTMLUnitDriver, which is the standard, but somewhat outdated, or you can use PhantomDriver, which has a more modern version of Javascript.
"You can use HTMLUnitDriver" >>> careful with this!
Please note that since HTMLUnitDriver is not the same implementation as the real browser, you may or may not get the same results in both.
So if you start hitting weird issues, like running the test with Chrome passes but running the test with HTMLUnitDriver does not, consider just running your tests with the browser driver. I've heard of looong troubleshooting stories from colleagues and they had to give up on running their suites in headless mode (ie, using HTMLUnitDriver).
NOTE: since I cannot leave a comment on the selected answer, I am "forced" to leave it as an answer. If somebody can help me convert it in a comment, I will appreciate! thanks
For caching images you would need something like squid or haproxy, and then proxy your selenium browser through it.
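For the original question (turning off images while still loading JavaScript and CSS), one commonly used approach is a Chrome content-settings preference passed through ChromeOptions. The sketch below uses Selenium's Python bindings; the preference key is an assumption based on common usage rather than something confirmed in this thread, and the function names are made up for the example. The selenium import is done lazily so the preference helper can be inspected without a browser installed.

```python
def image_blocking_prefs():
    """Chrome preferences that ask the browser not to load images.

    The value 2 means "block"; the key is the commonly used content
    setting and should be verified against your Chrome version.
    """
    return {"profile.managed_default_content_settings.images": 2}


def make_chrome_without_images(profile_dir=None):
    """Create a Chrome driver with images disabled (hypothetical helper)."""
    from selenium import webdriver  # imported here so the module loads without selenium

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", image_blocking_prefs())
    if profile_dir:
        # Optionally reuse a profile directory, as suggested in the answer above.
        options.add_argument("--user-data-dir=" + profile_dir)
    return webdriver.Chrome(options=options)
```

Calling make_chrome_without_images() then returns a regular driver object; whether the speed-up is noticeable depends on how image-heavy the pages are.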

HTMLUNIT with Headless Selenium

I am trying to scrape a website that contains images using headless Selenium.
Initially, the website populates 50 images. If you scroll down, more and more images are loaded.
Windows 7 x64
python 2.7
recent install of selenium
[1] Non-Headless
Navigating to the website with selenium as follows:
from selenium import webdriver

browser = webdriver.Firefox()
browser.get(url)  # url: address of the page to scrape
# Scroll down to trigger loading of more images
browser.execute_script('window.scrollBy(0, 10000)')
html = browser.page_source
This works (if anyone has a better suggestion please let me know).
I can continue to scrollBy() until I reach the end and then pull the source page.
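The "scroll until the end" step can be sketched as a loop that stops once the page height no longer grows. The function name, the pause and the round limit are assumptions for this example; it relies only on the driver's execute_script method and page_source attribute, both of which appear in the snippet above.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=50):
    """Scroll down repeatedly until the document height stops increasing.

    Returns the page source after the last scroll. `pause` gives the page
    time to load the next batch of images after each scroll.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        time.sleep(pause)
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            # Nothing new was loaded, so the end of the page has been reached.
            break
        last_height = new_height
    return driver.page_source
```

A fixed pause is a simplification: if the site loads slowly, a longer pause or an explicit wait for new image elements would be more reliable.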
[2] Headless with HTMLUNIT
from selenium import webdriver
driver = webdriver.Remote(desired_capabilities=webdriver.DesiredCapabilities.HTMLUNIT)
driver.get(url)
I cannot use scrollBy() in this headless environment.
Any suggestions on how to scrape this kind of page?
Thanks
One option is to study the JavaScript to see how it calculates what to load next. Then implement that logic in your scraping client instead. Once you have done that, you can use faster scraping tools like Perl's WWW::Mechanize.
You need to enable JavaScript explicitly when using the HtmlUnit Driver:
driver.setJavascriptEnabled(true);
According to the docs (http://code.google.com/p/selenium/wiki/HtmlUnitDriver), it should emulate IE's JavaScript handling by default.
When I tried the same method, I got error messages saying that Selenium crashed while connecting to Java to simulate JavaScript.
When I passed the script to the execute_script method instead, the code worked well.
I guess the communication between Selenium and the Java server part was not configured properly.
Enabling JavaScript with DesiredCapabilities.HTMLUNITWITHJS is possible and quick ;)

What is the difference between Selenium's Remote Control vs WebDriver?

I'm not sure I quite understand the difference. The WebDriver API also directly controls the browser of choice. When should you use Selenium Remote Control (Selenium RC) instead?
Right now, my current situation is I am testing a web application by writing a suite with Selenium WebDriver API and letting it run on my computer. The tests are taking longer and longer to complete, so I have been searching for ways to run the tests on a Linux server.
If I use Selenium Remote Control, does this mean I have to rewrite everything I wrote with WebDriver API?
I am getting confused by Selenium Grid, Hudson and Selenium RC. I found a Selenium Grid plugin for Hudson, but I'm not sure if it includes Selenium RC.
Am I taking the correct route? I envision the following architecture:
Hudson running on a few dedicated Ubuntu servers.
Hudson running with Xvnc and the Selenium Grid plugin. (Do I need to install Firefox separately?)
Selenium Grid running Selenium RC test suites.
I think this is far more time-efficient than running tests on my current desktop computer with the WebDriver API.
WebDriver is now Selenium 2. The Selenium and WebDriver code bases are being merged. WebDriver overcomes a number of issues that Selenium has, and Selenium overcomes a number of issues that WebDriver has.
If you have written your tests in Selenium 1, you don't have to rewrite them to work with Selenium 2. We, the core developers, have written it so that you create a browser instance and inject it into Selenium, and your Selenium 1 tests will work in Selenium 2. I have put an example below for you.
// You may use any WebDriver implementation. Firefox is used here as an example
WebDriver driver = new FirefoxDriver();
// A "base url", used by selenium to resolve relative URLs
String baseUrl = "http://www.google.com";
// Create the Selenium implementation
Selenium selenium = new WebDriverBackedSelenium(driver, baseUrl);
// Perform actions with selenium
selenium.open("http://www.google.com");
selenium.type("name=q", "cheese");
selenium.click("name=btnG");
Selenium 2 unfortunately has not been put into Selenium Grid yet, but it shouldn't be too long until it has been added, since we are hoping to reach beta in the next couple of months.
As far as I understand, the WebDriver implementation started a little later than Selenium RC. From my point of view, WebDriver is a more flexible solution that fixes some annoying problems of Selenium RC.
WebDriver provides a standard interface for testing web GUIs. There are several implementations of this interface (HTTP, browser-specific and Selenium-based). Since you already have some WebDriver tests, you must be familiar with the basic docs.
"The tests are taking longer and longer to complete, so I have been searching for ways to run the tests on a Linux server."
Did you try to find the actual bottlenecks? I'm not sure that eliminating the WebDriver layer will help. I think most of the time is spent sending Selenium commands and making HTTP requests to the system under test.
"If I use Selenium Remote Control, does this mean I have to rewrite everything I wrote with WebDriver API?"
Generally, yes, unless you implemented some additional layer between the test code and WebDriver.
As for Selenium Grid:
You may start several Selenium RC instances on several different [virtual] nodes, then register them with Selenium Grid. Your tests connect to Selenium Grid, and it redirects all commands to the Selenium RC instances, coordinating them according to the required browsers.
For details, see the documentation of the Hudson plugin.