I've written a website scraper for use in a project.
I'm controlling Firefox through Sahi using Mink to visit each site and interact with any elements where necessary. I've managed to get this working perfectly on all sites I've tried except for one...
I'm trying to get the markup from https://www.o2.co.uk/shop/phones/
I'm using exactly the same code for this page as I have for all the others:
// Configure driver
$this->driver = new \Behat\Mink\Driver\SahiDriver('firefox',
new \Behat\SahiClient\Client(
new \Behat\SahiClient\Connection(null, CRAWL_SERVER, 9999)
)
);
// Init session:
$this->session = new \Behat\Mink\Session($this->driver);
// Start session:
$this->session->start();
// Open the url
$this->session->visit($config['url']);
// Get the markup from the page
$markup = $this->session->getPage()->getContent();
When I use this code to attempt to get the markup from https://www.o2.co.uk/shop/phones/ Mink just seems to hang, waiting for something to happen.
It would seem that maybe something on this page is preventing either Sahi or Mink from returning the markup. I've also tried running other functions instead of getContent(), such as $this->session->wait(2000); and attempting to search through getPage using the find command.
If anyone has any idea why this is happening, I would be very interested to know the cause and how I can make this work.
tl;dr
Why is Mink/Sahi timing out on this site?
I am running the Selenium Firefox WebDriver from Python for web scraping. As I move between different pages, some of them have a mechanism that opens new windows, something along the lines of this:
$(function(){
window.open(url, windowName[, windowFeatures]);
});
It is some kind of malicious webpage that keeps opening random pages in new windows, and after a few minutes my PC runs out of memory and crashes.
So what I want is to set some option on the webdriver so that it doesn't allow pages to open new windows.
I have tried disabling JavaScript, but I guess that option no longer works.
Also, if you know of an option or preference to ignore script tags, I would like to know about it.
Thanks in advance.
Try loading a JS file just in your tests which overwrites the window.open function. Something like:
(function(){
window.open = function() { return false; }
})();
Note that this is an immediately invoked function expression (IIFE).
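To make the idea above a bit more concrete, here is a minimal sketch of the override with a way to undo it. The helper name blockPopups and the fakeWindow stand-in are my own inventions for illustration; in a real test the argument would be the page's window object.

```javascript
// Hypothetical helper: replaces `open` on a window-like object with a no-op
// and returns a function that puts the original back.
function blockPopups(win) {
  const original = win.open; // keep a reference so it can be restored
  win.open = function () {
    return null; // report "no window was opened" to the caller
  };
  return function restore() {
    win.open = original;
  };
}

// Stand-in for the browser's `window`, so the sketch also runs outside a browser.
const fakeWindow = {
  opened: [],
  open(url) {
    this.opened.push(url);
    return {};
  },
};

const restore = blockPopups(fakeWindow);
const popupResult = fakeWindow.open('http://example.com/popup'); // blocked
```

In a Selenium test, something like this would probably have to be injected after each page load, so scripts that call window.open while the page is still loading may run before the override is in place.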
I've tried the standard
var elementForMs = driver.findElement(By.xpath(selector));
driver.executeScript("arguments[0].click()", elementForMs);
and
var elementForMs = driver.findElement(By.css(selector));
driver.executeScript("arguments[0].click()", elementForMs);
And there are simply cases where the element never responds to the click in Microsoft Edge 15.01563.
Each driver has its own bugs, so something that works in Firefox may not work in Chrome, and so on. The only way around this is to find what works and use it and, if possible, report the issue to the driver's maintainers.
In your case, finding the element and clicking on it doesn't work with
var elementForMs = driver.findElement(By.xpath(selector));
driver.executeScript("arguments[0].click()", elementForMs);
but it works when you run the JavaScript directly in the console. That means you should execute the same script from your code:
driver.executeScript("document.getElementXXX().click()");
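As a rough sketch of that suggestion, the lookup and the click can both happen inside the page, the way they would in the console. The helper name clickViaDom is hypothetical, and I'm assuming a CSS selector and a driver object that exposes executeScript(script, ...args), as the code in the question does.

```javascript
// Hypothetical helper: finds the element inside the page itself and clicks it,
// instead of passing a WebElement reference from the test into the script.
function clickViaDom(driver, cssSelector) {
  const script =
    'var el = document.querySelector(arguments[0]);' +
    'if (!el) { return false; }' + // nothing matched the selector
    'el.click();' +
    'return true;';
  // Returns whatever the driver returns (a promise in selenium-webdriver).
  return driver.executeScript(script, cssSelector);
}
```

The false return value makes it possible to tell "element not found" apart from "click ignored", which is the distinction that matters when chasing a driver-specific bug.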
This is solved. I had my baseURL wrong, so it was going to a different site that redirected to the base URL of the correct site. Thanks.
I have a problem using driver.get() with a variable inside. For example:
driver.get(baseURL+othervariable);
When I execute it, the browser goes to baseURL alone.
I have added before that line a console print to make sure the concatenation is ok, like:
System.out.println(baseURL+othervariable);
driver.get(baseURL+othervariable);
And I can see in the console that the concatenation is ok.
The weird thing is that if I insert the url directly without base url, like:
driver.get("http://examplesite.com/subsection");
It works.
Why does this matter? Because I'm using a for loop to open an array of URLs that I need to check.
So the structure of my program is something like:
for (int i = 0; i < URLs.length; i++) {
    driver.get(baseURL + URLs[i]);
    // then do some stuff
}
But the browser always opens baseURL alone.
The weird thing is that I don't have any problem when executing this in the lower environments of this website. The problem occurs with the Live site.
Could it be that some configuration in the site is preventing Selenium from going to the desired URL?
But then I don't understand why when I insert the URL directly as a String into driver.get(), it works as expected, even in the live site.
So the problem is when I insert a variable inside, and only in the live site.
I'm totally confused. I tried Firefox driver, Chrome driver, etc. All do the same.
I also tried:
String finalURL = baseURL+URLs[i];
driver.get(finalURL);
And it refuses to open the complete URL. I never had this problem in many tests. Many times I executed driver.get() with variables and concatenations inside and I never faced this problem.
Could someone give me a hint? Why does the problem only appear when I send the URL as a variable, but not when I send it as a string literal?
I'm using Selenium 3.0 btw.
Thanks for your help.
Make sure the URL syntax is correct (especially \ and //).
On some websites, an incorrect URL redirects to the baseURL page.
If you're working on an AngularJS project, you may need some waits (try ngWebDriver).
Good luck!
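To illustrate the slash problem mentioned above, here is a small sketch (in JavaScript rather than the question's Java) of how plain concatenation can produce a doubled slash, and one way to normalise it. The helper joinUrl is hypothetical, and whether a doubled slash triggers a redirect depends entirely on the site.

```javascript
// Hypothetical helper: joins a base URL and a path with exactly one slash between them.
function joinUrl(baseUrl, path) {
  const base = baseUrl.replace(/\/+$/, ''); // drop trailing slashes from the base
  const rest = String(path).replace(/^\/+/, ''); // drop leading slashes from the path
  return base + '/' + rest;
}

const baseURL = 'http://examplesite.com/';
const naive = baseURL + '/subsection'; // ends up with "//" before "subsection"
const joined = joinUrl(baseURL, '/subsection'); // single "/" before "subsection"
```

Printing the concatenated value and comparing it character by character with the URL that works when typed in directly is a quick way to spot this kind of difference.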
I'm trying to write an automated test that updates a Google Chrome extension. I'm not aware of another way of doing this automatically, so here is what I'm currently trying to do:
Open the chrome extensions page (as far as I'm aware this is just an html page unless I'm missing something).
Click on the "Update extensions" button
Here is what I have tried having opened the chrome extensions page:
IWebElement UpdateButton = driver.FindElement(By.Id("update-extensions-now"));
UpdateButton.Click();
For some reason the button click is not registering. I have tried some other locators such as CSS path and Xpath but they don't work either. Also, when I debug this test, it passes fine so I know it's not an issue with any of my locators. I have (as a test) tried to automate clicks on the other elements on this page and it's the same issue. I can't get a handle on any elements on the chrome://extensions page at all.
Has anyone encountered this or have any ideas as to what's going on?
You can use the Chrome extensions API to auto-update the required extension.
Find the file "manifest.json" in the default Google Chrome extensions directory:
C:\Users\*UserName*\AppData\Local\Google\Chrome\User Data\Default\Extensions
There, find the update URL of your extension:
{
"name": "My extension",
...
"update_url": "http://myhost.com/mytestextension/updates.xml",
...
}
The XML returned by the Google server looks like:
<?xml version='1.0' encoding='UTF-8'?>
<gupdate xmlns='http://www.google.com/update2/response' protocol='2.0'>
<app appid='yourAppID'>
<updatecheck codebase='http://myhost.com/mytestextension/mte_v2.crx' version='2.0' />
</app>
</gupdate>
appid
The extension or app ID, generated based on a hash of the public key, as described in Packaging. You can find the ID of an extension or Chrome App by going to the Extensions page (chrome://extensions).
codebase
A URL to the .crx file.
version
Used by the client to determine whether it should download the .crx file specified by codebase. It should match the value of "version" in the .crx file's manifest.json file.
The update manifest XML file may contain information about multiple extensions by including multiple <app> elements.
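As an illustration of how the version attribute might be used, here is a minimal sketch that compares dot-separated integer versions. Both function names are hypothetical, and I'm assuming the client only downloads the .crx when the offered version is strictly newer than the installed one.

```javascript
// Hypothetical helper: compares dot-separated integer versions such as "1.1" and "2.0".
// Returns -1 if a is older than b, 1 if a is newer, 0 if they are equal.
function compareVersions(a, b) {
  const pa = a.split('.').map(Number);
  const pb = b.split('.').map(Number);
  const len = Math.max(pa.length, pb.length);
  for (let i = 0; i < len; i++) {
    const diff = (pa[i] || 0) - (pb[i] || 0); // missing parts count as 0
    if (diff !== 0) {
      return diff < 0 ? -1 : 1;
    }
  }
  return 0;
}

// Assumption: download only when the offered version is strictly newer.
function shouldDownload(installedVersion, offeredVersion) {
  return compareVersions(installedVersion, offeredVersion) < 0;
}
```

Comparing the parts as numbers rather than as strings matters here, because a plain string comparison would rank "1.10" below "1.9".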
Another option is to use the --extensions-update-frequency command-line flag to set a more frequent interval in seconds. For example, to make checks run every 45 seconds, run Google Chrome like this:
chrome.exe --extensions-update-frequency=45
Note that this affects checks for all installed extensions and apps, so consider the bandwidth and server load implications of this. You may want to temporarily uninstall all but the one you are testing with, and should not run with this option turned on during normal browser usage.
The request to update each individual extension would be:
http://test.com/extension_updates.php?x=id%3DyourAppID%26v%3D1.1
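The request URL above can be rebuilt by URL-encoding the id and v pairs into a single x parameter. This is only a sketch: the helper name buildUpdateCheckUrl is hypothetical, and the host, ID and version are the placeholders from the example.

```javascript
// Hypothetical helper: builds an update-check URL in the shape shown above.
function buildUpdateCheckUrl(updateUrl, appId, version) {
  const x = 'id=' + appId + '&v=' + version; // inner key/value pairs
  const separator = updateUrl.includes('?') ? '&' : '?'; // respect an existing query string
  return updateUrl + separator + 'x=' + encodeURIComponent(x);
}

const checkUrl = buildUpdateCheckUrl(
  'http://test.com/extension_updates.php',
  'yourAppID',
  '1.1'
);
```

The inner = and & have to be encoded so that the server sees them as part of the value of x rather than as separate query parameters.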
You can find even more detailed information on the extensions developer site: https://developer.chrome.com/extensions
If you look at the HTML of the "chrome://extensions" page you will notice that the "Update extensions now" button is contained within an iframe. You need to switch to the iframe before trying to click the button, e.g.:
(This is in C#. Note that this code is written from memory, so it may not be 100% accurate. You will also want to write a more robust method; this code just quickly demonstrates that switching to the iframe makes the click work.)
string ChromeExtensionsPage = "chrome://extensions";
driver.Navigate().GoToUrl(ChromeExtensionsPage);
// Switch into the iframe that contains the button before looking it up
driver.SwitchTo().Frame("DesiredFrame");
IWebElement UpdateButton = driver.FindElement(By.Id("DesiredButtonID"));
UpdateButton.Click();
I am writing a story for a BDD framework that uses JBehave/Selenium/WebDriver, and the test hits an error when the story runs even though the same steps work fine when performed manually. The JavaScript for the functionality I'm testing behaves slightly differently when I run the steps manually in Firefox than when Selenium WebDriver runs them on the same system and the same version of Firefox, and this difference causes a JS error.
I've debugged this, and the root of the problem appears to be that var request_XML_container = $('div_appendpoint_id'); returns something different when I run the test manually than when I run it through the BDD framework.
var request_XML_container = $('div_appendpoint_id');
request_XML_container.innerHTML = encoded_xml_from_request;
var pos = method_to_get_position('id_of_place_div_should_be_appended_to');
// JS exception is thrown saying that style is not defined **ONLY**
// when running through web driver. Running test manually on
// same system and same browser works fine.
request_XML_container.style.left = (pos[0] - 300) + 'px';
request_XML_container.style.top = (pos[1] + 25) + 'px';
request_XML_container.style.display = "block";
Why would var request_XML_container = $('div_appendpoint_id'); return an element with style defined when I run the steps manually, but an element without style defined when running through webdriver?
UPDATE: I had originally thought that this was updating an iframe, but I read the markup wrong and the iframe I saw is a sibling - not a parent - of the element that the response is being appended to. I'm trying to simply append the response to a div. To be honest, this only makes things more confusing, as grabbing a div by id should be pretty straightforward and I'm not sure why webdriver would be producing a different return element in this situation.
UPDATE 2: Steps to reproduce and information about the system I'm on:
Use webdriver to navigate to this url: http://fiddle.jshell.net/C3VB5/11/show/
Have webdriver click the button. It should not work.
Run your test again, but pause it with a breakpoint on the code that clicks the button.
Click the button in the browser that webdriver opened. It should not work.
Refresh the page in the browser that webdriver opened. Now it should work.
System details:
OS : OS X 10.8.5 (12F37)
IDE : Eclipse Kepler: Build id: 20130614-0229
Browser (used manually and by webdriver) : Firefox 23.0.1
Selenium version: 2.35.0
UPDATE 3: I have provided this maven project on github to aid in reproducing: https://github.com/dkwestbr/WebdriverBug/tree/master/Webdriver
Synopsis/tl;dr: Basically, in certain situations it appears as though webdriver is overwriting the '$()' JavaScript method with a method that does not return an HTMLElement with innerHTML or style defined (among other things). This post details the issue and how to reproduce it.
I have opened this ticket to track the issue: https://code.google.com/p/selenium/issues/detail?id=6287&thanks=6287&ts=1379519170
I have confirmed that this is a bug with the Thucydides framework (understandable since they still aren't at a 1.0 release).
Issue can be tracked here: https://java.net/jira/browse/THUCYDIDES-203
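Until the framework bug is fixed, one possible workaround (not something I have verified against Thucydides) is to avoid the global '$()' altogether and look the element up through the document, failing loudly if it is not styleable. The helper name getElementByIdStrict and the fakeDocument stand-in are hypothetical; in the page the first argument would be the real document.

```javascript
// Hypothetical helper: looks an element up by id without relying on a global `$`,
// and throws early if the result has no usable `style` object.
function getElementByIdStrict(doc, id) {
  const el = doc.getElementById(id);
  if (!el || typeof el.style !== 'object') {
    throw new Error('No styleable element with id "' + id + '"');
  }
  return el;
}

// Stand-in for the browser's `document`, so the sketch also runs outside a browser.
const fakeDocument = {
  elements: { div_appendpoint_id: { innerHTML: '', style: {} } },
  getElementById(id) {
    return this.elements[id] || null;
  },
};

const container = getElementByIdStrict(fakeDocument, 'div_appendpoint_id');
container.style.display = 'block';
```

Failing at the lookup gives a clearer error than the later "style is not defined" exception, and it does not depend on which library happens to own '$' at that moment.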