Selenium Best Practice: One Long Test or Several Successively Long Tests? - selenium

In Selenium I often find myself making tests like ...
// Test #1
login();
// Test #2
login();
goToPageFoo();
// Test #3
login();
goToPageFoo();
doSomethingOnPageFoo();
// ...
In a unit testing environment, you'd want separate tests for each piece (i.e. one for login, one for goToPageFoo, etc.) so that when a test fails you know exactly what went wrong. However, I'm not sure this is a good practice in Selenium.
It seems to result in a lot of redundant tests, and the "know what went wrong" problem doesn't seem so bad, since it's usually clear what went wrong from the step the test was on. And it certainly takes longer to run a bunch of "build up" tests than it takes to run just the last ("built up") test.
Am I missing anything, or should I just have a single long test and skip all the shorter ones building up to it?

I have built a large test suite in Selenium using a lot of smaller tests (like in your code example). I did it for exactly the same reason you did: to know "what went wrong" on a test failure.
This is a common best practice for standard unit tests, but if I had to do it over again, I would go mostly with the second approach: larger built-up tests, with some smaller tests when needed.
The reason is that Selenium tests take an order of magnitude longer than standard unit tests to run, particularly on longer scenarios. This makes the whole test suite unbearably long, with most of the time being spent running the same redundant code over and over again.
When you do get an error, say in a step that is repeated at the beginning of 20+ different tests, it does not really help to know you got the same error 20+ times. My test runner runs my tests out of order, so my first error isn't even on the first incremental test of the "build-up" series. I end up looking at the first test failure and its error message to see where the failure came from, which is the same thing I would do if I had used larger "built-up" tests.
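One way to keep the "know what went wrong" benefit in a single built-up test is to name each step and report the first one that fails. The following is only a minimal sketch: runSteps is a hypothetical helper, and login, goToPageFoo and doSomethingOnPageFoo are synchronous stand-ins for real Selenium actions (real WebDriver calls would normally be asynchronous).

```javascript
// Hypothetical step runner: runs named steps in order and names the first one that fails.
function runSteps(steps) {
  for (const [name, step] of steps) {
    try {
      step();
    } catch (err) {
      // Wrap the error so the failing step appears in the message.
      throw new Error(`Step "${name}" failed: ${err.message}`);
    }
  }
}

// Stand-ins for real Selenium actions (assumptions for illustration only).
const visited = [];
const login = () => visited.push('login');
const goToPageFoo = () => visited.push('foo');
const doSomethingOnPageFoo = () => visited.push('action');

// One built-up test instead of three successively longer ones.
runSteps([
  ['login', login],
  ['goToPageFoo', goToPageFoo],
  ['doSomethingOnPageFoo', doSomethingOnPageFoo],
]);
```

If a step throws, the error message starts with the step name, so the failure report points at the step without needing a separate shorter test for it.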

Related

How to compare value from previous test run with current?

I'm using Robot Framework and Selenium via Selenium2Library.
I would like to test whether a value extracted from a DOM element has changed, i.e. is different from the one checked in the previous test run.
I'm thinking about using Robotframework-MongoDB-Library or another database. The next step would be adding a custom mini-library for saving and retrieving the extracted value for test cases.
In the first test run all tests of this kind will be marked as failed, but subsequent runs should theoretically work correctly.
I'm not experienced in the testing field; is this the right approach? If not, how can I execute this kind of test?
This is a bad practice, as on the 2nd run (which will pass) you don't really know if that DOM is actually correct as it might be a persistent issue.
The idea is that tests are reproducible, so when something fails, you can reproduce the reason why they failed.
Also, this approach might cause an interesting behaviour change in your team: When the tests fail, re-run them until they pass, and don't bother looking at why they failed (I would bet good money on this :)).
Something you might want to do is to refine your test, so you only check the bits that are important, rather than the whole DOM (or a big chunk of it).

Grails integration tests failing in a (seemingly) random and non-repeatable way

We are writing integration tests for our Grails 2.0.0 application with the help of the Fixtures and Build-Test-Data plugins.
During testing, we discovered that the integration tests fail at certain times and pass at other times. Running 'test-app' sometimes results in all tests passing, and sometimes results in some of our tests failing.
When the tests fail, they are caused by a unique constraint being violated during the insert of an instance of a domain class. This would indicate that there are still records in the test DB. I am running the H2 db, and have definitely got 'dbCreate = "create-drop"' in my DataSource.groovy.
The question "Grails 2.0 integration test pollution?" seems to indicate there is a significant test-pollution problem in Grails. Are there any solutions to this? Have I hit Grails-8530?
[Edit] the test-pollution seems to be caused by the unit tests. We have sort-of proved this by deleting the unit tests and successfully running 'test-app' repeatedly.
When I run into errors like this, I like to try to find the unit test(s) causing the problem. This might be kinda tricky, since yours seem to fail only on occasion.
1) I'd look at unit tests that were recently added. If this problem just started happening, that's a good place to look.
2) Metaclassing seems to be good at causing this type of error, so I'd look for metaclassing that isn't set up/torn down properly. It's not as much of an issue with 2.0 as with <= 1.3.7, but it could be the problem.
3) I wrote a plugin that executes your tests in a random order. That might not help you solve your current problem, but what might help is that it prints out all of your tests, so you can take what it gives you and run grails test-app <pasted list of unit tests> IntegrationTestThatIsFailing, then start removing unit tests to find the culprit(s) (http://grails.org/plugin/random-test-order). I found a bug in this with 2.0 that I haven't had time to fix yet (integration tests fail when asserting on rendered view name), but it should still print out your test names for you (which is better than doing it yourself :)
The fact that integration tests fail with a constraint violation due to existing records reminds me of a situation I once encountered with functional tests (Selenium) executing in unpredictable order, some of them not cleaning up the database properly. Sure, the situation with functional tests is different, since it is more difficult to restore the database state (the test case cannot roll back a transaction in another JVM).
Although integration tests usually roll back transactions, it is still possible to break this behavior if your code controls transactions (commits) explicitly.
First, I would try forcing execution order as mentioned by Jarred in 3). Assuming you can then reproduce the behavior, I would then check transactional behaviour next. Setting the logging level of org.hibernate.transaction to debug should show you where transaction boundaries are.
Sorry, I don't yet have a good explanation for why wiping out the unit tests helps get rid of the symptoms, besides a general "possibly metaclassing issues". :)

How do you compare the results of two nunit test runs?

We currently have a situation where several tests are failing. Someone is working on this, but it is not me. I have been tasked with other work. So I plan on running the tests in NUnit before I begin my work, so I have a baseline of failing tests and what the failure message is. I would like to use this result to verify that those tests fail with the exact same failure result while testing my own code. Are there any resources that would allow me to do this?
update
I'm aware of the ExpectedException attribute. However that will not work for the tests that are failing the test condition. Also there are thousands of tests of which only about 100 tests are failing. I was hoping for something that would compare the two test runs and show me the differences.
I'd throw an
[Ignore("SomeCustomStringICanFindLater")]
attribute on the failing tests until they are fixed.
See IgnoreAttribute.
And try to convince your manager that a broken build should be everyone's top priority.
After doing some research while waiting on an answer, I found that the console runner produces XML output. I can use a diff tool to compare the two test runs and see which tests failed differently than the baseline test run.
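A plain text diff of the two XML files works, but the comparison itself is simple enough to script. The sketch below makes no assumption about the XML schema: it assumes each run has already been parsed into a plain object keyed by test name. That shape, the diffRuns helper, and the sample test names are all hypothetical.

```javascript
// Compare two test runs, each given as { testName: { result, message } }.
// Returns the tests whose outcome or failure message differs from the baseline.
function diffRuns(baseline, current) {
  const names = new Set([...Object.keys(baseline), ...Object.keys(current)]);
  const changes = [];
  for (const name of [...names].sort()) {
    const before = baseline[name];
    const after = current[name];
    const same =
      before !== undefined &&
      after !== undefined &&
      before.result === after.result &&
      before.message === after.message;
    if (!same) {
      changes.push({ name, before, after });
    }
  }
  return changes;
}

// Hypothetical data standing in for two parsed result files.
const baselineRun = {
  'OrderTests.Total': { result: 'Failed', message: 'Expected 10 but was 9' },
  'OrderTests.Tax': { result: 'Passed', message: '' },
};
const currentRun = {
  'OrderTests.Total': { result: 'Failed', message: 'Expected 10 but was 9' },
  'OrderTests.Tax': { result: 'Failed', message: 'Expected 2 but was 0' },
};

const changed = diffRuns(baselineRun, currentRun);
console.log(changed.map((c) => c.name));
```

Here only OrderTests.Tax is reported, because OrderTests.Total fails with the same message in both runs. Tests that appear in only one of the runs are reported as well.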

How to protect yourself when refactoring non-regression tests?

Are there specific techniques to consider when refactoring the non-regression tests? The code is usually pretty simple, but it's obviously not included into the safety net of a test suite...
When building a non-regression test, I first ensure that it really exhibits the issue that I want to correct, of course. But if I come back later to this test because I want to refactor it (e.g. I just added another very similar test), I usually can't put the code-under-test back in a state where it was exhibiting the first issue. So I can't be sure that the test, once refactored, is still exercising the same paths in the code.
Are there specific techniques to deal with this issue, except being extra careful?
It's not a big problem. The tests test the code, and the code tests the tests. Although it's possible to make a clumsy mistake that causes the test to start passing under all circumstances, it's not likely. You'll be running the tests again and again, so the tests and the code they test get a lot of exercise, and when things change for the worse, tests generally start failing.
Of course, be careful; of course, run the tests immediately before and after refactoring. If you're uncomfortable about your refactoring, do it in a way that allows you to see the test working (passing and failing). Find a reliable way to fail each test before the refactoring, and write it down. Get to green - all tests passing - then refactor the test. Run the tests; still green? Good. (If not, of course, get green, perhaps by starting over). Perform the changes that made the original unrefactored tests fail. Red? Same failure as before? Then reinstate the working code, and check for green again. Check it in and move onto your next task.
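The "see it red, see it green" check described above can be sketched in a few lines, assuming you can keep (or recreate) a copy of the code with the original bug in it. Everything named here (passes, fixedDiscount, buggyDiscount, discountNeverNegative) is hypothetical and only illustrates the idea.

```javascript
// Returns true when the test passes against the given implementation.
function passes(test, impl) {
  try {
    test(impl);
    return true;
  } catch (err) {
    return false;
  }
}

// Hypothetical code under test, plus a copy with the original bug reintroduced.
const fixedDiscount = (price) => Math.max(0, price - 5);
const buggyDiscount = (price) => price - 5; // goes negative for small prices

// The refactored non-regression test.
function discountNeverNegative(discount) {
  if (discount(3) < 0) {
    throw new Error('discount produced a negative price');
  }
}

// The refactored test is trustworthy only if it is green on the fix and red on the bug.
const green = passes(discountNeverNegative, fixedDiscount);
const red = !passes(discountNeverNegative, buggyDiscount);
console.log({ green, red });
```

If red comes out false after a refactoring, the test no longer detects the issue it was written for.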
Try to include not only positive cases in your automated test, but also negative cases (and a proper handler for them).
Also, you can try to run your refactored automated test with breakpoints and supervise through the debugger that it keeps on exercising all the paths you intended it to exercise.

What is code coverage and how do YOU measure it?

What is code coverage and how do YOU measure it?
I was asked this question regarding the code coverage of our automated testing. It seems to me that, outside of automated tools, it is more art than science. Are there any real-world examples of how to use code coverage?
Code coverage is a measurement of how many lines/blocks/arcs of your code are executed while the automated tests are running.
Code coverage is collected by using a specialized tool to instrument the binaries to add tracing calls and run a full set of automated tests against the instrumented product. A good tool will give you not only the percentage of the code that is executed, but also will allow you to drill into the data and see exactly which lines of code were executed during a particular test.
Our team uses Magellan - an in-house set of code coverage tools. If you are a .NET shop, Visual Studio has integrated tools to collect code coverage. You can also roll some custom tools, like this article describes.
If you are a C++ shop, Intel has some tools that run for Windows and Linux, though I haven't used them. I've also heard there's the gcov tool for GCC, but I don't know anything about it and can't give you a link.
As to how we use it - code coverage is one of our exit criteria for each milestone. We have actually three code coverage metrics - coverage from unit tests (from the development team), scenario tests (from the test team) and combined coverage.
BTW, while code coverage is a good metric of how much testing you are doing, it is not necessarily a good metric of how well you are testing your product. There are other metrics you should use along with code coverage to ensure the quality.
Code coverage basically tells you how much of your code is covered under tests. For example, if you have 90% code coverage, it means 10% of the code is not covered under tests.
I know you might be thinking that if 90% of the code is covered, it's good enough, but you have to look from a different angle. What is stopping you from getting 100% code coverage?
A good example will be this:
if(customer.IsOldCustomer())
{
}
else
{
}
Now, in the code above, there are two paths/branches. If you are always hitting the "YES" branch, you are not covering the "else" part, and this will be shown in the code coverage results. This is good because now you know what is not covered and you can write a test to cover the "else" part. Without code coverage, you are just sitting on a time bomb that is waiting to explode.
NCover is a good tool to measure code coverage.
Just remember, having "100% code-coverage" doesn't mean everything is tested completely: while it means every line of code is executed by some test, it doesn't mean the lines are tested under every (common) situation.
I would use code-coverage to highlight bits of code that I should probably write tests for. For example, if whatever code-coverage tool shows myImportantFunction() isn't executed while running my current unit-tests, they should probably be improved.
Basically, 100% code-coverage doesn't mean your code is perfect. Use it as a guide to write more comprehensive (unit-)tests.
Complementing a few points to many of the previous answers:
Code coverage means how well your test set covers your source code, i.e. to what extent the source code is covered by the set of test cases.
As mentioned in the answers above, there are various coverage criteria, like paths, conditions, functions, statements, etc. Additional criteria to be covered are:
Condition coverage: All boolean expressions are evaluated for both true and false.
Decision coverage: Boolean expressions are not just evaluated for true and false once; every subsequent if-elseif-else body is also covered.
Loop coverage: Every possible loop has been executed zero times, once, and more than once. Also, if a maximum limit is assumed, then, where feasible, test the maximum number of iterations and one more than the maximum.
Entry and exit coverage: Test every possible call and its return value.
Parameter value coverage (PVC): Check whether all possible values for a parameter are tested. For example, a string could commonly be any of these: a) null, b) empty, c) whitespace (space, tabs, new line), d) valid string, e) invalid string, f) single-byte string, g) double-byte string. Failure to test each possible parameter value may leave a bug. Testing only one of these could result in 100% code coverage, as each line is covered, but since only one of seven options is tested, parameter value coverage is only about 14.3%.
Inheritance coverage: In object-oriented code, when a derived object is returned through a base class reference, the case where a sibling object is returned should also be tested.
Note: Static code analysis will find any unreachable or hanging code, i.e. code not reached by any function call, as well as other static coverage. Even if static code analysis reports that 100% of the code is covered, it says nothing about whether your test set exercises all possible coverage.
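Parameter value coverage maps naturally onto a table-driven test. As a rough sketch, assume a hypothetical one-line function hasContent; any single case below already gives it 100% line coverage, which is exactly why the other parameter classes need their own rows. The sample values are illustrative, not exhaustive.

```javascript
// Hypothetical function under test: true when the string has non-whitespace content.
function hasContent(value) {
  return typeof value === 'string' && value.trim().length > 0;
}

// One representative per parameter class (values are illustrative).
const paramCases = [
  { label: 'null', input: null, expected: false },
  { label: 'empty', input: '', expected: false },
  { label: 'whitespace', input: ' \t\n', expected: false },
  { label: 'valid string', input: 'hello', expected: true },
  { label: 'double-byte string', input: '日本語', expected: true },
];

// Collect every case whose actual result differs from the expected one.
const failures = paramCases.filter((c) => hasContent(c.input) !== c.expected);
console.log(`${paramCases.length - failures.length} of ${paramCases.length} cases passed`);
```

Adding a new parameter class is then one more row in the table rather than a new test function.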
Code coverage has been explained well in the previous answers. So this is more of an answer to the second part of the question.
We've used three tools to determine code coverage.
JTest - a proprietary tool built over JUnit. (It generates unit tests as well.)
Cobertura - an open source code coverage tool that can easily be coupled with JUnit tests to generate reports.
Emma - another - this one we've used for a slightly different purpose than unit testing. It has been used to generate coverage reports when the web application is accessed by end-users. This coupled with web testing tools (example: Canoo) can give you very useful coverage reports which tell you how much code is covered during typical end user usage.
We use these tools to
Review that developers have written good unit tests
Ensure that all code is traversed during black-box testing
Code coverage is simply a measure of the code that is tested. There are a variety of coverage criteria that can be measured, but typically it is the various paths, conditions, functions, and statements within a program that make up the total coverage. The code coverage metric is just the percentage of these criteria that the tests execute.
As far as how I go about tracking unit test coverage on my projects, I use static code analysis tools to keep track.
For Perl there's the excellent Devel::Cover module which I regularly use on my modules.
If the build and installation is managed by Module::Build you can simply run ./Build testcover to get a nice HTML site that tells you the coverage per sub, line and condition, with nice colors making it easy to see which code path has not been covered.
Code coverage has been explained well in the previous answers. I am just adding some information about tools: if you are working on the iOS and OS X platforms, Xcode provides the facility to test and monitor code coverage.
Reference Links:
https://developer.apple.com/library/archive/documentation/DeveloperTools/Conceptual/testing_with_xcode/chapters/07-code_coverage.html
https://medium.com/zendesk-engineering/code-coverage-and-xcode-6b2fb8756a51
Both are helpful links for learning and exploring code coverage with Xcode.
The purpose of code coverage testing is to figure out how much code is being tested. Code coverage tools generate a report which shows how much of the application code has been run. Code coverage is measured as a percentage; the closer to 100%, the better. This is an example of a white-box test. Here are some open source tools for code coverage testing:
Simplecov - For Ruby
Coverlet - For .NET
Cobertura - For Java
Coverage.py - For Python
Jest - For JavaScript
For PHP you should take a look at the Github from Sebastian Bergmann
Provides collection, processing, and rendering functionality for PHP code coverage information.
https://github.com/sebastianbergmann/php-code-coverage
What code coverage IS NOT
To truly understand what code coverage is, it is very important to understand what it is not.
A couple of answers/comments here and on related questions have alluded to this:
Franci Penov
BTW, while code coverage is a good metric of how much testing you are doing, it is not necessarily a good metric of how well you are testing your product.
steve
Just because every line of your code is run at some point in your tests, it doesn't mean you have tested every possible scenario that the code could be run under. If you just had a function that took x and returned x/x and you ran the test using my_func(2) you would have 100% coverage (as the function's code will have been run) but you've missed a huge issue when 0 is the parameter. I.e. you haven't tested all necessary scenarios even with 100% coverage.
KeithS:
However, the flip side of coverage is actually twofold: first, a test that adds coverage for coverage's sake is useless; every test must prove that code works as expected in some novel situation. Also, "coverage" is not "exercise"; your test suites may execute every line of code in the SUT, but they may not prove that a line of logic works in every situation.
No one says it more succinctly and to the point than Mark Simpson:
Code coverage tells you what you definitely haven't tested, not what you have.
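The x/x example quoted above is easy to reproduce. This is only a sketch: myFunc mirrors the my_func from the quote, and the behaviour shown for zero is specific to JavaScript, where division by zero does not throw.

```javascript
// The function from the quoted example: takes x and returns x / x.
function myFunc(x) {
  return x / x;
}

// This single check executes every line of myFunc, so line coverage is 100%.
const covered = myFunc(2) === 1;

// The scenario the coverage report never flags: 0 / 0 yields NaN instead of throwing.
const zeroCase = myFunc(0);
console.log(covered, Number.isNaN(zeroCase));
```

The coverage report is identical with or without the zero case, which is the point of the quotes: coverage shows which lines ran, not which scenarios were checked.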
An Illustrative Example
I spent some time writing a reply to a feature request asking that Istanbul (a JavaScript test coverage tool) "Change definition of coverage to require more than 1 hit" per line. No one will ever see it there 🤣, so I thought it might be useful to reuse the gist of it here:
A coverage tool CANNOT prove that your code is tested adequately. All it can do is tell you that you provided some kind of coverage for every line of code in your codebase, but even then it doesn't prove the coverage means anything, because a test might execute a line of code without making any assertions on its results. Only you as a developer can decide the actual semantically unique input variations and boundary conditions that need to be covered by tests and ensure that the test logic does in fact make the right assertions.
For example, say you have the following Javascript function. A single test that asserts an input of (1, 1) returns 1 would give you 100% line coverage. What does that prove?
function max(a, b) {
return a > b ? a : b
}
Putting aside for a moment the semantically poor coverage of this test, the 100% line coverage is rather misleading too, as it doesn't provide 100% branch coverage. That's easily seen by splitting the branches onto different lines and rerunning the line coverage report:
function max(a, b) {
if (a > b) {
return a
} else {
return b
}
}
or even
function max(a, b) {
return a > b ?
a :
b
}
What this tells us is that the "coverage" metric depends too much on the implementation, whereas ideally testing should be black box. And even then it's a judgement call.
For example, would the following three input cases constitute complete testing of the max function?
(2, 1)
(1, 2)
(1, 1)
You'd get 100% line and 100% branch coverage for the above implementations. But what about non-number inputs? Ok, so you add two more input cases:
(null, 1)
(1, null)
which forces you to update the implementation:
function max(a, b) {
if (typeof a !== 'number' || typeof b !== 'number') {
return undefined
}
return a > b ? a : b
}
Looking good. You have 100% line and branch coverage, and you've covered invalid inputs.
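For reference, the five input cases discussed so far can be written as a small table-driven check. This is just a sketch without any test framework; maxCases is an illustrative name, and the expected values follow from the implementation shown above.

```javascript
// The final implementation from the text above.
function max(a, b) {
  if (typeof a !== 'number' || typeof b !== 'number') {
    return undefined;
  }
  return a > b ? a : b;
}

// The five input cases discussed above, each paired with the result it should produce.
const maxCases = [
  [[2, 1], 2],
  [[1, 2], 2],
  [[1, 1], 1],
  [[null, 1], undefined],
  [[1, null], undefined],
];

// Throw on the first case whose actual result differs from the expected one.
for (const [[a, b], expected] of maxCases) {
  if (max(a, b) !== expected) {
    throw new Error(`max(${a}, ${b}) should be ${expected}`);
  }
}
console.log('all five cases pass');
```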
But is that enough? What about negative numbers?
The ideal of 100% blackbox coverage is a fantasy
In my opinion, in this situation, for the simple nature of this function, testing negative number cases is anal overkill. If the situation were different, say the function only existed because we needed to implement some tricky algorithm or optimization that may or may not work as expected for negative numbers, then I'd add more input cases including negative numbers.
Oftentimes, you only discover corner cases because you have hundreds or thousands of users: such rare cases are exposed only through their using your software in unexpected ways, or in conditions and software environments you could not have foreseen or reproduced. And often those rare cases are artifacts of the nature of your implementation, not something you'd arrive at from analysis of an idealized abstraction of the buggy code's interfaces.
I think what that shows is the ideal of 100% blackbox coverage is a bit of a fantasy. You would waste a lot of time writing unnecessary tests if you treated everything as an idealized black box. In the example above, I know the implementation uses a simple and reliable non-number check and then uses the native Javascript logic to compare values (a > b), and that it would be silly to do anything more complex. Knowing that, I'm not going to test passing in negative numbers, floats, strings, objects, etc.
At the end of the day, you have to be practical and use good judgement, and that judgement usually cannot ignore knowing something about the nature of what's in the black box, or at least the assumptions made inside the black box.
All this said, I don't have a CS degree 😂. What's the equivalent of IANAL for programmer advice?