Find commit that broke a test without running every test on every commit - testing

I am trying to find a tool that can solve the following problem:
Our entire test suite takes hours to run, which often makes it difficult, or at least very time consuming, to find out which commit broke a specific test, since there may be 50 to 200 commits between test runs. At any given time only very few tests are broken, so rerunning just the broken tests is very fast compared to running the entire test suite.
Is there a tool, e.g. a continuous integration server, that can rerun failed tests at a couple of revisions between the last revision where the test was OK and the first revision where it was not OK, and thereby automatically find out on which specific commit a test switched from successful to broken?
For example:
Test A and B are ok in revision 100. Test A and B are broken in revision 200.
The tool should now run both tests with revision 150.
Then if e.g. test A was broken and test B was OK in revision 150, it could continue to check test A with revision 125 and test B with revision 175 and so on until every broken test can be explained by some specific commit.
For a single test, I could probably hack something together with git bisect, along the lines of the sketch below. But for multiple failed tests this is probably not sufficient, since we need to search in both directions across many revisions.
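A minimal sketch of that hack, assuming a hypothetical run_test.sh script that builds the project and runs a single named test, exiting non-zero on failure (rev100 and rev200 stand in for the real known-good and known-bad revisions):
for t in testA testB; do
    git bisect start rev200 rev100      # <bad> <good>
    git bisect run ./run_test.sh "$t"   # git drives the binary search
    git bisect reset                    # return to the original HEAD
done
This bisects each failing test independently, so it repeats builds that a smarter tool could share between tests.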

git
Are you using git or mercurial (see: What is Mercurial bisect good for?)?
Suppose version 2.6.18 of your project worked, but the version at "master" crashes. Sometimes the best way to find the cause of such a regression is to perform a brute-force search through the project's history to find the particular commit that caused the problem. The git bisect command can help you do this:[...]
From: Finding Issues - Git Bisect.
Essentially you mark two revisions as the known-bad and known-good endpoints and start the bisection with:
$ git bisect start <bad> <good>
Then run the test and, depending on whether it passed or failed, call
$ git bisect good
or
$ git bisect bad
Respectively. git will do a binary search, always cutting the remaining revisions in half. You can script the whole thing with git bisect run, as sketched below. If you are using svn, git can easily import the whole repository.
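For example, a minimal sketch of a scripted session, assuming a hypothetical run_failing_test.sh that exits 0 when the test passes and non-zero when it fails:
$ git bisect start master v2.6.18    # <bad> <good>
$ git bisect run ./run_failing_test.sh
$ git bisect reset                   # return to the original HEAD
git checks out a revision at each step, runs the script, and finally reports the first bad commit.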
Building single revision at a time
This is not an answer but a piece of advice: just test every single commit! Today you can cluster continuous integration servers easily; with a farm of servers it is not that hard to test all commits.
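Even without a server farm, a minimal brute-force sketch, assuming a hypothetical run_tests.sh that exits non-zero on failure (rev100 and rev200 stand in for real revisions):
for rev in $(git rev-list --reverse rev100..rev200); do
    git checkout -q "$rev"
    ./run_tests.sh || echo "$rev failed" >> failures.log
done
git checkout -q master
With a result recorded for every commit, the first failing revision for each test can be read straight out of the log.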

Related

Rerun Failed Behat Tests Only A Set Number of Times

I have a Behat testing suite running through Travis CI on pull requests. I know that you can add a "--rerun" option to the command line to rerun failed tests, but for me Behat just keeps trying to rerun the failed tests, which eventually times out the test run session.
Is there a way to limit the number of times that failed tests are re-run? Something like "behat --rerun=3" to try a failed scenario up to three times?
The only other way I can think of to accomplish what I want is to use the database I'm testing Behat against, or to write to a file, to store the test failures and the number of times they have been run.
EDIT:
Locally, running the following commands ends up re-running only the one test I purposely made to fail... and it does so in a loop until something happens. Sometimes it was 11 times and sometimes 100+ times.
behat --tags="@some_tag"    # initial run, filtered by tag
behat --rerun               # supposed to re-run only the scenarios that failed
So that doesn't match what the Behat command line guide states. In my terminal, the help option gives me "--rerun Re-run scenarios that failed during last execution." without any mention of the failed-scenario file. I'm using a 3.0 version of Behat, though.
Packages used:
"behat/behat": "~3.0",
"behat/mink": "~1.5",
"behat/mink-extension": "~2.0",
"behat/mink-goutte-driver": "~1.0",
"behat/mink-selenium2-driver": "~1.1",
"behat/mink-browserkit-driver": "~1.1",
"drupal/drupal-extension": "~3.0"
Problem:
Tests fail at random, mainly due to Guzzle timeout errors going past 30 seconds while trying to GET a URL. Sure, you could try bumping up the max execution time, but since other tests have no issues and 30 seconds is already a long time to wait for a request, I don't think that would fix the issue, and it would make my test runs much longer for no good reason.
Is it possible that you don't have an efficient CI setup?
I think that this option should run the failed tests only once, so maybe your setup is entering a loop.
If you need to run the failed scenarios a number of times, maybe you should write a script that checks some condition and runs with --rerun, like the sketch below.
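A minimal sketch of such a wrapper, assuming behat exits non-zero while failures remain and that --rerun re-executes only the scenarios that failed last time:
behat --tags="@some_tag" && exit 0   # full run; stop if everything passed
for attempt in 1 2 3; do             # cap the reruns at three
    behat --rerun && exit 0
done
exit 1                               # still failing after three reruns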
Be aware that, as I see in the Behat CLI guide, if no scenario is found in the file where the failed scenarios are saved, then all scenarios will be executed.
I don't think that using '--rerun' in CI is good practice. You should do a high level review of the results before deciding to do a rerun.
I checked the rerun on Behat 3 today and it seems there might be a bug related to the rerun option; I saw some pull requests on GitHub.
One of them is https://github.com/Behat/Behat/pull/857
Related to the timeout, you can check whether Travis has a timeout you can set and whether it has enough resources, and you can run the same steps from your desktop and compare the results for the same test environment.
Also, set CURLOPT_TIMEOUT for Guzzle to a value high enough for the requests to pass, as long as it is not too exaggerated; otherwise you will need to find another solution to improve the execution speed.
A higher value should not be much of an issue, because this is a conditional wait: if the response comes back faster, it will not impact the execution time; only the problematic scenarios will wait longer.

What is the correct way to create branch in RCS, and do you need to set a lock first?

I am looking for best practices for using branches in RCS.
I have read the man pages for rcs and ci and also browsed the following links:
http://www.gnu.org/software/rcs/manual/html_node/Concepts.html
http://www.gnu.org/software/rcs/manual/html_node/Quick-tour.html
Suppose I have revision 1.3 at the tip of the trunk.
I now want to change revision 1.2 (as 1.3 has several other changes I cannot yet use).
I understand I can create a branch on revision 1.2 using ci -r1.2.1
My questions are as follows:
1. Do I need to set a lock on the file? If so, on which revision?
2. If no lock is set, I cannot use the -u flag to keep the file in my local directory. If I wish to keep it anyway, is it still possible without checking the file out again?
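For reference, a minimal sketch of the sequence I have in mind (assuming, per my reading of the man pages, that the lock goes on 1.2; foo.c is a placeholder file whose trunk tip is 1.3):
co -l1.2 foo.c       # check out and lock revision 1.2, not the tip
(edit foo.c)
ci -r1.2.1 foo.c     # check the change in on branch 1.2.1, creating 1.2.1.1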
Side note: I feel RCS does not suit my company's needs, but migrating to another system is not my decision to make, so for now I need to keep working with it.
I'm looking for much the same thing, but seeing that you've had no answers, I'll offer my current practice:
I use branches for development, not for keeping different variants going in parallel. The trunk is reserved for my best, presumably working, code, and I try not to check in anything there that might break it. I branch the code when I want to start a line of development that will take some time, will break things for a while, is an experiment I might have to abandon, etc.
To start a new line of development I change the default branch to a new branch off the trunk rev that's to be the base of my code, and force a checkin onto that branch, with:
rcs -b1.2.1 foo.cpp      # make 1.2.1 the new default branch
ci -f1.2.1 -l foo.cpp    # force a check-in onto that branch, keeping the lock
Now I can dive into developing the branch, and my next check-ins will go onto the new branch instead of onto the trunk. Whether you lock a revision or not is only relevant to whether you intend to modify the working file.
You're correct that you can't keep both revisions, trunk-tip and branch-tip, in the same folder; they have the same file name. But you can check one of them out with the -p switch, which forces the output to stdout (instead of to a local file), and redirect that into a sub-folder, or into a local file with a unique name.
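For example, a minimal sketch assuming foo.cpp has trunk tip 1.3 and branch tip 1.2.1.1:
co -p1.2.1.1 foo.cpp > branch/foo.cpp   # branch tip into a sub-folder
co -p1.3 foo.cpp > foo-trunk.cpp        # trunk tip under a unique name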

How to ignore some builds in Hudson test results trend?

Sometimes the Hudson build step fails and does not execute all my tests, which really screws up the test trend graph, because it shows a 50% drop in the number of tests and then goes up again. Is there a way to exclude the failed builds? I tried to delete the whole failed build, but that didn't help.
If you delete the "offending" build and then run another build I think you'll find the trend graph is rebuilt without the blip.
That said, I have to side with @DarkDust on this one. If your CI environment is prone to wobble, then these stats can be useful for diagnostics.

Mercurial - Delivering Isolated Features to Test and Live

We are going to switch to Mercurial.
A piece that is missing from our plan is how to manage branching/merging of builds to the TestBox (and LiveBox) so isolated features can be mixed with the StableRelease and built to the TestBox.
For instance, it seems the predominant usage is to have
DefaultStableBranch
TestBranch
FeatureABranch
FeatureBBranch
Development on FeatureA and FeatureB will happen at the same time. It looks like the predominant usage is to have cloned repositories with branches for the above.
Scenario 1: If we build to test, we would merge LiveCode+FeatureA+FeatureB. If all goes well, we can then merge the changesets upstream to the DefaultStable branch and build to LiveBox with FeatureA and FeatureB. Job done.
Scenario 2: If we build to test, we would merge LiveCode+FeatureA+FeatureB, and QA shows there is a problem with FeatureB. We do not want to build FeatureB anymore, but we do want to progress FeatureA. We want to re-test with FeatureA on its own, let QA pass that, and then release it to Live, and hence gain business agility.
Questions :
If FeatureB fails QA, we need to take the FeatureB changeset nodes out of the Test branch, build to TestBox again, and then hopefully merge upstream to the DefaultStable branch and build to LiveBox.
What is the best way of removing the FeatureB changeset nodes from the TestBranch, given that:
1. we need more dev on FeatureB, and the FeatureB nodeset is not finished, and
2. we need to isolate DefaultStable+FeatureABranch and build that to test?
How are other people managing this ?
There are a lot of great writeups of Mercurial workflows, including:
http://stevelosh.com/blog/2010/02/mercurial-workflows-branch-as-needed/
http://stevelosh.com/blog/2010/05/mercurial-workflows-stable-default/
https://www.mercurial-scm.org/wiki/StandardBranching
All of those use Named Branches very minimally -- definitely not one per feature, which with clones as branches sounds like the work mode you (and I) prefer.
Hitting your specific question: if the combination LiveCode+FeatureA+FeatureB is failing tests, the best way to handle it is to just keep repairing FeatureB and then merging those changes down into FeatureA+FeatureB. However, before you get to that stage it's a good idea to have QA hit LiveCode+FeatureA and LiveCode+FeatureB separately too. It's slightly more work for them (more test targets), but having each feature in isolation helps find the cause of a defect more quickly.
Once LiveCode+FeatureA and LiveCode+FeatureB are passing QA, you merge them into LiveCode+FeatureA+FeatureB, and if that still passes tests, merge the whole thing into DefaultStable. There should be no need to remove FeatureB from LiveCode+FeatureA+FeatureB, because you can always just create a new clone of LiveCode and merge in only FeatureA if you want it.
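For example, a minimal sketch using clones as branches, with hypothetical clone directories livecode, featureA, and featureB:
hg clone livecode test-featureA      # fresh, throwaway test clone
cd test-featureA
hg pull ../featureA                  # bring in only FeatureA's changesets
hg merge                             # LiveCode + FeatureA, nothing else
hg commit -m "test build: LiveCode + FeatureA"
The unwanted LiveCode+FeatureA+FeatureB clone is simply deleted rather than having FeatureB's changesets surgically removed.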
Here's a great writeup of a Mercurial (Kiln) based QA/release process:
http://blog.bitquabit.com/2011/03/10/when-things-go-well/
Create FeatureA and FeatureB in feature clone branches from stable. Test is merely a temporary area for QA/Test to work from, so I would treat it as 'throwaway' from day one.
When FeatureA and FeatureB are developed enough, create a clone of either one, and pull the other into QA/Test. Do the build for QA, and when they provide feedback, make appropriate changes to FeatureB.
If FeatureA is acceptable for promotion, pull it into / push it to stable and merge it there, as sketched below.
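A minimal sketch of that promotion step, again with hypothetical clone directories:
cd stable
hg pull ../featureA                  # bring the approved feature in
hg merge                             # merge it with the stable tip
hg commit -m "promote FeatureA to stable"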
Is that clearer than my original post?

TeamCity: Managing deployment dependencies for acceptance tests?

I'm trying to configure a set of build configurations in TeamCity 6, and am trying to model a specific requirement in the cleanest manner TeamCity enables.
I have a set of acceptance tests (around 4-8 suites of tests grouped by the functional area of the system they pertain to) that I wish to run in parallel (I'll model them as build configurations so they can be distributed across a set of agents).
From my initial research, it seems that having an AcceptanceTests meta-build config that pulls in the set of individual acceptance test configs via snapshot dependencies should do the trick. Then all I have to do is say that my Commit build config should trigger AcceptanceTests and they'll all get pulled in. So, let's say I also have AcceptanceSuiteA, AcceptanceSuiteB and AcceptanceSuiteC.
So far, so good (I know I could also turn it around the other way and have the Commit config trigger AcceptanceSuiteA, AcceptanceSuiteB and AcceptanceSuiteC; the problem there is that I'd need to manually aggregate the results to determine the overall success of the acceptance tests as a whole).
The complicating bit is that while AcceptanceSuiteC just needs some Commit artifacts and can then live on its own, AcceptanceSuiteA and AcceptanceSuiteB need to:
DeploySite (let's say it takes 2 minutes, and I can't afford to spin up a completely isolated one just for this run)
Run tests against the deployed site
The problem is that I need to be able to ensure that:
the website only gets configured once
the website does not get clobbered while the two suites are running
If I set up DeploySite as a build config and have AcceptanceSuiteA and AcceptanceSuiteB pull it in as a snapshot dependency, AFAICT:
a subsequent or parallel run of AcceptanceSuiteB could trigger another DeploySite which would clobber the deployment that AcceptanceSuiteA and/or AcceptanceSuiteB are in the middle of using.
While I can set "Limit the number of simultaneously running builds" to force only one to happen at a time, I need one at a time and none while the dependent pieces are still running.
Is there a way in TeamCity to model such a hierarchy?
EDIT: Ideas:
A crap solution is that DeploySite could set an 'in use' marker flag and then have the AcceptanceTests config clear that flag [after AcceptanceSuiteA and AcceptanceSuiteB have completed]. The problem then becomes one of making the next DeploySite down the pipeline wait until said gate has been opened again (doing a blocking wait within the build doesn't feel right; I want it to be flagged as 'not yet started' rather than looking like it's taking a long time to do something). However, this sort of 'stuff a flag over here and have this bit check it' approach is exactly the mutable-state / flakiness smell I'm trying to get away from.
EDIT 2: If I could programmatically alter the agent configuration, I could set Agent Requirements to require InUse=false, set the flag when a deploy starts, and clear it after the tests have run.
Seems you should go look on the JetBrains DevNet and the YouTrack tracker first, and remember to use the magic word 'clobber' in your search.
Then you install the groovy-plug plugin and use the StartBuildPrecondition facility:
To use the feature, add system.locks.readLock. or system.locks.writeLock. property to the build configuration.
The build with writeLock will only start when there are no builds running with read or write locks of the same name.
The build with readLock will only start when there are no builds running with write lock of the same name.
Use the locks therein to manage the fact that the dependent configs 'read' and the DeploySite config 'writes' the shared item.
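A minimal sketch of how that might look, assuming the naming scheme quoted above and a hypothetical shared-resource name TestSite (exact property placement and name format unverified; see the EDIT below):
system.locks.writeLock.TestSite    # on DeploySite, the writer
system.locks.readLock.TestSite     # on AcceptanceSuiteA and AcceptanceSuiteB, the readers
DeploySite then cannot start while either suite still holds its read lock, and the suites cannot start while a deploy holds the write lock.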
(This is not a fully productised solution, hence the tracker item remains open.)
EDIT: And I still don't know whether the lock should go under Build Parameters | System Properties, and what the exact name format should be: is it locks.writeLock.MYLOCKNAME (i.e., showing up in the config with the reference syntax %system.locks.writeLock.MYLOCKNAME%)?
Other puzzlers are: how does one manage giving read access to builds triggered by the completion of a writeLock task - does the lock get dropped before the next one picks it up (which would allow another writer in), or is it necessary to have something queue up the parent and child dependency at the same time?