Selenium grid runs out of free slots

I have a large suite of SpecFlow tests executing against a Selenium grid running locally. The grid has a single host configured for a maximum of 10 Firefox instances. The tests are run serially from NUnit, so I would expect to need only a single session at a time.
However, when approximately half of the test cases have run, the hub's console output starts reporting
INFO: Node host [url] has no free slots
Why?
All the test cases are associated with a TearDown method that closes and disposes the WebDriver, although I haven't verified that absolutely every test reaches this method without failing. I would expect a maximum of one session to be active at a time. How can I find out what is stopping the host from recycling those sessions?
edit #1:
I think I've narrowed down the cause of the issue - it is indeed to do with not closing the WebDriver. There are [AfterScenario] attributes on the teardown methods that are meant to do this, but they carry tag parameters, so they only match the subset of scenarios with those tags. Removing the parameters so that the teardown associates with every scenario fixes the session exhaustion (or seems to), but some tests expect to reacquire an existing session, so I'll have to fix those separately.
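For anyone else hitting this, here is a minimal sketch of the before/after shape of the hook (the binding class, tag name, and static driver field are hypothetical stand-ins for however your suite actually stores its driver):

    using OpenQA.Selenium;
    using TechTalk.SpecFlow;

    [Binding]
    public class TearDownHooks
    {
        // Hypothetical: assumes one driver per scenario run.
        private static IWebDriver _driver;

        // Before: the tag parameter meant this hook only ran for scenarios
        // tagged @ui, so every untagged scenario leaked its grid session.
        // [AfterScenario("ui")]

        // After: a parameterless [AfterScenario] matches every scenario,
        // so each session is handed back to the grid node.
        [AfterScenario]
        public static void CloseDriver()
        {
            _driver?.Quit();    // ends the remote session on the node
            _driver?.Dispose();
            _driver = null;
        }
    }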
A bit of background: This test suite was inherited as part of a 'complete' solution and it's been left untouched and never run since delivery. I'm putting it back into service and have had to discover its quirks as I go - I didn't write any of this. I've had brief encounters with both Selenium and SpecFlow but never used the two together.

The issue turned out to be a facepalm-level fail - mostly in the sense that I didn't spot it. Some logging code was trying to write to a file that wasn't there; the thrown exception bypassed the call to Dispose() on the WebDriver and was then swallowed with no error reporting, so the sessions were left hanging around. Removing the logging code fixed the session exhaustion.
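In code terms the bug looked roughly like this (the log path and logging call are hypothetical; the shape is the point - cleanup sitting after a statement that can throw, instead of in a finally block):

    using System.IO;
    using OpenQA.Selenium;

    public class ScenarioTearDown
    {
        private IWebDriver _driver;

        // Broken: if the log file's directory is missing, AppendAllText
        // throws, Dispose() is never reached, and the swallowed exception
        // hides the problem - the grid session stays alive.
        public void TearDown()
        {
            File.AppendAllText(@"C:\logs\test.log", "scenario finished\n");
            _driver.Dispose();
        }

        // Fixed: cleanup in a finally block runs whether or not the
        // logging call throws, so the session is always released.
        public void SaferTearDown()
        {
            try
            {
                File.AppendAllText(@"C:\logs\test.log", "scenario finished\n");
            }
            finally
            {
                _driver?.Quit();
                _driver?.Dispose();
            }
        }
    }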

Look on the node (via remote desktop) and see what is happening on the box. It does sound like your test isn't closing out its session properly.

Is there a way to tell one browser instance from another when running concurrent tests in Testcafe?
Say we have two tests.
One creates some entity, then changes it and verifies that the change is applied correctly.
Another deletes all the entities and verifies that everything is deleted.
If we run these tests in parallel, they will interfere with each other. So there must be either a way to embrace this concurrency and synchronize the tests with some primitive, or a way to run them in parallel in isolated sandboxes.
I would prefer the second option.
It could be something like
test('Some test', async t => {
    await useSandbox(t.browser.alias, t.browser.os.name, t.browser.instanceId);
    // ... rest of the test
});
But AFAIK there is no way to tell one browser instance from another inside the test code. Or is there?
TestCafe does not have a mechanism for one test to affect the execution of another. When TestCafe starts tests in parallel, it does not expect one test to interfere with another.
TestCafe starts every test with clean cookies, clean storage, and a fresh user profile. So, if your data is kept in localStorage, every test runs independently. However, if your data is kept on the server side (e.g. in a database), then TestCafe cannot sandbox it, since all tests interact with the database through the same website.
In this case, it's better to run these two tests one after the other, not simultaneously.

How can an uncalled test affect another in Go?

I have a test function TestJobqueue() in https://github.com/VertebrateResequencing/wr/blob/develop/jobqueue/jobqueue_test.go that I can call in isolation: go test -tags netgo ./jobqueue -v -run 'TestJobqueue$'.
I recently started getting test failures related to boltdb (one of my dependencies) bombing out with signal SIGBUS: bus error panics, or just failing normally because the database couldn't be opened - but only when working off an NFS-mounted directory. Fair enough: either I or boltdb have some kind of NFS-related bug.
But the thing I can't wrap my head around is that I only get these errors when an entirely different test function exists.
As per the comments in TestREST() in https://github.com/VertebrateResequencing/wr/blob/92fb61ccd7819c8f1edfa8cce8468c4250d40ea7/jobqueue/rest_test.go, if I call Serve(serverConfig) (a function in the package being tested, a call that is made many times in TestJobqueue() and other test functions) in that test function, TestJobqueue() fails. If I don't, it doesn't.
In short, the failure of tests in one test function can be controlled by the value of a boolean in a test function that I'm not running.
How is this possible?
Edit: to address some points brought up by the first answer, TestJobqueue() is being run in isolation. No other test runs before or after it. If the database file already exists, Serve() results in that file being deleted first, then a new one created to run the new set of tests. The odd thing I'm seeking an answer for is how an unexecuted function can have this side effect. I can demonstrate it really is unexecuted by beginning or ending TestREST() with a panic call: the output of that panic is never seen, but the failure of TestJobqueue() can still be controlled by the boolean in TestREST() (if the panic comes at the end).
Edit2: this turns out to be caused by an unusual thing I do in TestJobqueue(), which is to call go test on itself. Needless to say, if you do this, strange things can happen...
In short, the failure of tests in one test function can be controlled by the value of a boolean in a test function that I'm not running.
This is not a great summary. Your test starts a server. The other test starts a server; clearly, the problem is there. You appear to have commented out the bit of code that stops the server at the end of the test? You can't run two servers on the same port.
You probably have a port conflict or some network condition that is triggered by running the two servers at once, because they both appear to use a similar (identical?) config loaded like this:
config := internal.ConfigLoad("development", true)
Running with no config uses default values, avoiding the conflict; running with the config causes the conflict. So to pin it down, try creating a config with one setting at a time till you find the setting that causes the problem (most likely Port or WebPort). Alternatively, make sure the tests stop the server at the end.
[EDIT] Looks like you have narrowed it down to the DBFile config setting by changing one setting at a time. This implies the server starts a new db instance: if both tests try to use the same file for a new db, that would cause contention and make the second test to run fail.
It's not entirely clear from your description exactly what you're doing or what the problem is, so you could improve it by stating the exact sequence of actions and the problem observed. If, for example, you have previously run a test which creates a db, it could affect later test runs because of the presence of the db file, so your tests are not completely independent.
[EDIT 2 - after further edits to question]
If commenting out TestREST completely solves your problem (as does a panic before it starts), and given that changing it breaks the other test, you are executing TestREST somehow.
Looking at your code for jobqueue_test, it appears to invoke go test itself, so you might be running more tests than you assume. Given you don't see the panic output, I'd suspect your use of exec.Command in this big test. Try removing bits of the failing test until it works, to narrow down exactly which invocation runs the other test. Calling go test within a test is pretty unusual!
https://github.com/VertebrateResequencing/wr/blob/develop/jobqueue/jobqueue_test.go#L2445

Selenium Grid Headless Parallel

I am using Selenium grid to scrape thousands of pages, since they are all heavily populated with JavaScript.
I found this tutorial, which gave me a pretty good idea of how to set up Selenium grid and run scripts in parallel. However, my situation is a little different.
(1) I only want one type of browser, like Chrome (or Firefox), but I want to run as many instances as possible.
(2) To make sure this solution scales, I will probably use some cloud service where the code runs in a Linux environment.
So here is my question:
Do I have to use the TestNG/JUnit framework to run the code in parallel? If I run the code in multiple processes, all making requests to the same hub, will the hub coordinate them out of the box?
(1) I only want one type of browser, like Chrome (or Firefox), but I want to run as many as possible.
You should run not as many as possible, but a number that demonstrably works for you, because running something like 30 Chrome browsers at a time can give you unpredictable results.
(2) To make sure this solution scales, I will probably use some cloud service where the code runs in a Linux environment.
You can look at BrowserStack.
Do I have to use the TestNG/JUnit framework to run the code in parallel?
That's up to you. As long as you create the driver from multiple threads, you're fine. If you're using your own framework, you can create a thread pool and have each worker thread create its own driver.
If I run the code in multiple processes, all making requests to the same hub, will the hub coordinate them out of the box?
Yes, the Selenium hub coordinates this for you out of the box. You don't need to worry about anything here.
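As a sketch of that, parallel sessions against one hub in the .NET bindings might look like the following (the hub URL and page list are placeholders, and the degree of parallelism should stay at or below the node's configured slot count):

    using System;
    using System.Threading.Tasks;
    using OpenQA.Selenium;
    using OpenQA.Selenium.Chrome;
    using OpenQA.Selenium.Remote;

    class GridScraper
    {
        static void Main()
        {
            var hub = new Uri("http://localhost:4444/wd/hub"); // placeholder hub address
            var pages = new[] { "https://example.com/a", "https://example.com/b" };

            // Each worker asks the hub for its own session; the hub queues
            // the requests and assigns them to free node slots without any
            // client-side coordination.
            Parallel.ForEach(pages, new ParallelOptions { MaxDegreeOfParallelism = 5 }, url =>
            {
                IWebDriver driver = new RemoteWebDriver(hub, new ChromeOptions().ToCapabilities());
                try
                {
                    driver.Navigate().GoToUrl(url);
                    Console.WriteLine($"{url}: {driver.Title}");
                }
                finally
                {
                    driver.Quit(); // frees the slot for the next request
                }
            });
        }
    }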

Possible issue with running selenium tests on one machine concurrently

I have multiple similar sites (same layout, just different data), and each of them has a drop-down menu that appears on mouse-over (and disappears on mouse-out).
I am using Selenium 2 and WebDriver, and I have one Selenium test case that basically does the mouse-over and makes sure each link in the drop-down menu works.
I am using Selenium grid, so I have a hub and a few test machines.
Because I have many sites (a few hundred) to test, I am thinking of making each machine run the test case against multiple sites in parallel.
My concern is that because there can be only one active browser window at a time, will it cause issues if WebDriver tries to perform Action.moveToElement() on multiple browsers at roughly the same time? Will only the active browser perform Action.moveToElement() properly while the other browsers fail? If there will be an issue, is there any workaround?
I have tried it using JUnitCore.runClasses(ParallelComputer.classes(), SomeClass1.class, SomeClass2.class, SomeClass3.class); it decreased the pass rate from 100% to about 67% when running three tests on a machine. Not good =/.
The good part: Firefox actually can do it in parallel. If the FF instances are offset from each other so they don't do the same thing at the same time, it works better. Some of the failures happened during Firefox boot-up, so if you can minimize closing and opening windows, do it. But still, sometimes it just fails for no reason.
If the saved time is really worth it to you, go for it: log all failed tests and run them again after the first round, this time one at a time.
You could also solve this, depending on your ultimate testing goal, by not using the Action class with its mouse-movement click, but the plain WebDriver findBy-and-click method or a JavaScript executor method instead. That would probably be less contentious when running multiple windows at the same time. If the Action class uses native calls when defining a mouse movement, such as "move to point", then with one browser window on top of another it seems possible that the movement point could be masked by the other window. I am really not sure about this - just giving you another idea to try.
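For illustration, both alternatives in the .NET bindings (the selector is a hypothetical stand-in for one of the drop-down links):

    // Plain element click: no OS-level cursor movement, so overlapping
    // browser windows and focus matter far less.
    IWebElement link = driver.FindElement(By.CssSelector("#menu a"));
    link.Click();

    // Or dispatch the click from JavaScript, bypassing synthesized
    // mouse events entirely.
    ((IJavaScriptExecutor)driver).ExecuteScript("arguments[0].click();", link);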

Click does not always work in Selenium

I use Selenium with PHPUnit, and sometimes tests fail with an error condition that seems to be caused by the browser ignoring clickAndWait calls. The test execution passes the clickAndWait command without much delay (even if I set a large timeout), and the next assertion or element access fails; if I take a screenshot, it shows the previous page, as if the click command did not happen at all. This happens both with links and with submit buttons (both normal ones, no javascript: or similar trickery), non-deterministically. It seems to happen more often on certain controls than others (many are not affected at all), and the frequency of failing tests seems more or less constant in the short term but changes wildly in the long term (sometimes it is 1 in 100, sometimes 1 in 2). I am guessing it is influenced by some sort of server load, but I could not see any obvious correlation.
I work more with Selenium 2, but I have noticed this as well. In my case I suspect other system clicks were interfering with Selenium (purely speculation), since I ran the tests on my machine.
The way I solved it was to send a press of the Return key instead. In most cases this is equivalent to a click, and in my experience it has produced more stable tests.
A quick caveat is that this technique stopped working for me after version 2.3.0. I submitted a bug report about it if you want to take a look.
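The answer above concerns the PHPUnit bindings, but the idea carries over; a minimal sketch in the WebDriver .NET bindings, with a hypothetical form field:

    // Focus the control and press Return instead of clicking the submit
    // button; for a normal form this triggers the same submission.
    IWebElement searchBox = driver.FindElement(By.Name("q"));
    searchBox.SendKeys("selenium grid");
    searchBox.SendKeys(Keys.Return);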