go tests: clean up after panic - testing

Suppose that I set up a Docker container with a DB for my tests, and I do that in TestMain because I want this to be done once and globally. I write a defer statement in TestMain that does the cleanup (namely, removes the DB container).
Now, suppose that something goes wrong and my tests panic. This issue tells me that I cannot write custom recover code to ensure that the container is removed. This is true: testing.M.Run() does its own recover() call, and it looks like there's no way to override its behaviour.
The question is: what should I do to make sure that my cleanup code is executed no matter what?

As noted in the issue you linked to:
The panic could come from a goroutine started by a test and the
testing package can't add a defer to those goroutines to catch the
panic.
Additionally, some panics are impossible to recover, e.g. out of
memory or runtime memory corruption.
In short, you cannot make sure that any code will be executed in all circumstances.
If your cleanup is non-critical, you can do it before & after (i.e. at the start of your tests, check if your container already exists and destroy it before creating a new one, then make a best effort to destroy it at the end). If your cleanup is critical, then wrap your go test call with something (e.g. a shell script or makefile) and make the wrapper responsible for the setup & teardown of external dependencies.
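The "before & after" approach for non-critical cleanup could look roughly like the sketch below. Resource, RunWithResource and fakeDB are hypothetical names invented for illustration; a real implementation would talk to Docker instead of flipping a boolean. The deferred cleanup lives inside the helper so that it has already run by the time the caller reaches os.Exit, which skips deferred functions.

```go
package main

import (
	"fmt"
	"os"
)

// Resource is a hypothetical stand-in for the external dependency
// (for example, the DB container). It is not part of the testing package.
type Resource interface {
	Exists() bool
	Create() error
	Destroy() error
}

// RunWithResource removes leftovers from a previous crashed run, creates a
// fresh resource, runs the tests, and then makes a best-effort cleanup.
// It returns the exit code produced by run.
func RunWithResource(r Resource, run func() int) (int, error) {
	if r.Exists() {
		if err := r.Destroy(); err != nil {
			return 1, fmt.Errorf("removing stale resource: %w", err)
		}
	}
	if err := r.Create(); err != nil {
		return 1, fmt.Errorf("creating resource: %w", err)
	}
	// Best effort only: this does not run if the process is killed or if a
	// goroutine started by a test panics.
	defer func() {
		if err := r.Destroy(); err != nil {
			fmt.Fprintln(os.Stderr, "cleanup failed:", err)
		}
	}()
	return run(), nil
}

// fakeDB is an in-memory Resource used only to exercise the sketch.
type fakeDB struct {
	exists bool
	log    []string
}

func (f *fakeDB) Exists() bool { return f.exists }

func (f *fakeDB) Create() error {
	f.exists = true
	f.log = append(f.log, "create")
	return nil
}

func (f *fakeDB) Destroy() error {
	f.exists = false
	f.log = append(f.log, "destroy")
	return nil
}

func main() {
	db := &fakeDB{exists: true} // pretend a previous run crashed
	code, err := RunWithResource(db, func() int { return 0 })
	fmt.Println(code, err, db.log) // prints: 0 <nil> [destroy create destroy]
}
```

In a real TestMain, the call would be something like code, err := RunWithResource(realDB, m.Run), followed by os.Exit(code).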

Related

How to define an optional change set in Liquibase?

We use Liquibase as database refactoring tool in a cloud service, and would now like to employ it to do some lightweight data migration, which would be realized as CustomTaskChange and would take just a few seconds. This data migration is 'nice to have' but it is by no means mandatory for the service to function properly - if it fails for some reason, the change set should just be skipped, the service started nevertheless, and the change set retried during the next restart of the service until it finally succeeds. So, errors when executing the change set should be ignored but the set marked as ran only after it actually ran successfully once.
We wonder how we could implement this kind of behavior using Liquibase: The <changeSet> attribute failOnError="false" continues in case of an error but according to documentation and an answer given by Nathan Voxland here at StackOverflow it always marks the change set as ran - hence Liquibase wouldn't retry to execute it during the next startup of the service. The <preConditions> attribute onFail seems to be concerned only with failing preconditions so startup would still fail in case of an error when setting onFail to, say, CONTINUE.
Is there any other option / attribute that we overlooked or a recommended fashion to solve this kind of situation?
You may be able to achieve the "retry until successful" behaviour if you implement the optional data migration inside the code of a custom precondition. Then, you could configure onFail of that precondition to CONTINUE which will give you the behaviour you want (source):
CONTINUE – Skip over the change set. Execution of the change set will be attempted again on the next update. Continue with the change log.
I'm not entirely sure if implementing the migration in the precondition code is technically possible – because it certainly wasn't meant for such things. And you also may want to verify that the custom precondition is not executed again once the patch set has been marked as ran.

How can an uncalled test affect another in Go?

I have a test function TestJobqueue() in https://github.com/VertebrateResequencing/wr/blob/develop/jobqueue/jobqueue_test.go that I can call in isolation: go test -tags netgo ./jobqueue -v -run 'TestJobqueue$'.
I recently started getting test failures related to boltdb (one of my dependencies) bombing out with signal SIGBUS: bus error code panics, or just normally failing tests because the database couldn't be opened. But only when working off an NFS mounted directory. Fair enough, I or boltdb have some kind of NFS-related bug.
But the thing I can't wrap my head around is that I only get these errors when an entirely different test function exists.
As per the comments in TestREST() in https://github.com/VertebrateResequencing/wr/blob/92fb61ccd7819c8f1edfa8cce8468c4250d40ea7/jobqueue/rest_test.go, if I call Serve(serverConfig) (a function in the package being tested, a function call which is made many times in TestJobqueue() and other test functions) in that test function, TestJobqueue() fails. If I don't, it doesn't.
In short, the failure of tests in one test function can be controlled by the value of a boolean in a test function that I'm not running.
How is this possible?
Edit: to address some points brought up by the first answer, TestJobqueue() is being run in isolation. No other test runs before or after it. If the database file already exists, Serve() results in those files being deleted first, then a new one created to run the new set of tests. The odd thing that I'm seeking an answer for is how an unexecuted function can have this side effect. I can demonstrate it is really unexecuted by beginning or ending TestREST() with a panic call: the output of that panic is never seen, but TestJobqueue() failure can still be controlled by the boolean in TestREST() (if the panic comes at the end).
Edit2: this turns out to be caused by an unusual thing I do in TestJobqueue(), which is to call go test on itself. Needless to say, if you do this, strange things can happen...
In short, the failure of tests in one test function can be controlled by the value of a boolean in a test function that I'm not running.
This is not a great summary. Your test starts a server. The other test starts a server too, so clearly the problem is there. You appear to have commented out the bit of code that stops the server at the end of the test? You can't run two servers on the same port.
You probably have a port conflict or some network condition that is triggered by running the two servers at once, because they both appear to use a similar (identical?) config loaded like this:
config := internal.ConfigLoad("development", true)
Running with no config uses default values, avoiding the conflict; running with config causes the conflict. So to pin it down, try creating a config with one setting at a time till you find the config setting that causes the problem (most likely Port or WebPort). Alternatively, make sure the tests stop the server at the end.
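To see what a port conflict of this kind looks like in isolation, here is a minimal sketch that is independent of the wr code base. PortConflicts is a hypothetical helper: it binds a loopback port chosen by the OS and then tries to bind the same address again while the first listener is still open.

```go
package main

import (
	"fmt"
	"net"
)

// PortConflicts reports whether a second listener on an address that is
// already in use is rejected. The second return value is a setup error from
// opening the first listener.
func PortConflicts() (bool, error) {
	first, err := net.Listen("tcp", "127.0.0.1:0") // port 0: let the OS pick
	if err != nil {
		return false, err
	}
	defer first.Close()

	// Same address, first listener still open.
	second, err := net.Listen("tcp", first.Addr().String())
	if err != nil {
		return true, nil
	}
	second.Close()
	return false, nil
}

func main() {
	conflict, err := PortConflicts()
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	fmt.Println("second listener rejected:", conflict)
}
```

If two tests each start a server from the same config and the first one is never stopped, the second start would be expected to fail in the same way.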
[EDIT] Looks like you have narrowed it down to DBFile config setting by changing one at a time. This implies the server starts a new db instance - if both try to use the same file for a new db, this would cause contention and the second test to run would fail.
It's not entirely clear from your description above what you're doing or what the problem is, so you could try to improve that to state exactly the sequence of actions and the problem. If for example you have previously run a test which creates a db, it could affect later test runs because of the presence of a db file, so your tests are not completely independent.
[EDIT 2 - after further edits to question]
If commenting out TestREST completely solves your problem (or a panic before it starts), and given changing it breaks the other test, you are executing TestREST somehow.
Looking at your code for jobqueue_test, it appears to invoke go test, so you might be running more tests than you assume? Given you don't see the panic output, I'd suspect your use of exec.Command in this big test. Try removing bits of the failing test till it works to narrow down exactly which invocation is running the other test. Calling go test within a test is pretty unusual!
https://github.com/VertebrateResequencing/wr/blob/develop/jobqueue/jobqueue_test.go#L2445

How to enforce single instance of an application under mono?

So, I am able to enforce single instance of my application on Windows as follows.
[STAThread]
class method Program.Main(args: array of string);
begin
  var mutex := new Mutex(true, "{8F6F0AC4-B9A1-45fd-A8CF-72F04E6BDE8F}");
  if mutex.WaitOne(Timespan.Zero, true) then
  begin
    Application.EnableVisualStyles();
    Application.SetCompatibleTextRenderingDefault(false);
    Application.ThreadException += OnThreadException;
    lMainForm := new MainForm;
    lMainForm.ShowInTaskbar := true;
    lMainForm.Visible := false;
    Application.Run(lMainForm);
  end
  else
    MessageBox.Show("Another copy running!!!");
end;
However, when I run the same application on Linux under mono, this code does NOT work at all: I am able to run multiple copies. I don't know if it has to do with the fact that I am starting the application from the terminal with mono MyPro.exe. If this is the problem, do I need to pass some values before executing the command line?
Thanks in advance,
You need to enable shared memory in mono as Adrian Faciu has mentioned to make your approach work, however this is not the best approach (there's a reason it's disabled by default in the first place, even if I can't remember now exactly why).
I've used two solutions in the past:
A file-based lock. Create a known file, write the pid into that file. At startup in your app check if the file exists, and if it exists read the pid and check if there are any running processes with that pid (so that it can recover from crashes). And delete the file upon exit (in the instance that created it in the first place). The drawback is that there is a race condition at startup if several instances are launched pretty much at the same time. You can improve this with file locking, but you may have to use P/Invokes to do the proper file locking on Linux (I'm not entirely sure the managed API would do what you'd expect).
A socket-based lock. Open a known port. The advantage over the above is that you don't need to do any cleanup and there are no race conditions. The drawback is that you need a fixed/known port, and some other program might happen to use that exact port at the same time.
You probably need to enable shared handles using MONO_ENABLE_SHM environment variable:
MONO_ENABLE_SHM=1 mono MyPro.exe
Adrian's solution works for me(tm).
Wild guess: did you need MONO_ENABLE_SHM in both invocations?

How to override edit locks

I'm writing a WLST script to deploy some WARs and an EAR. However, intermittently, the script will time out because it can't seem to get an edit lock (this script is part of a chain of many other scripts). I was wondering, is there a way to override or stop any current locks on the server? This is only a temporary solution, but in the interest of time, it will do for now.
Thanks.
You could try setting a wait period and timeout:
startEdit([waitTimeInMillis], [timeoutInMillis], [exclusive]).
Are other scripts erroring out, leaving the session locked? You could try adding exception handling around those. Also, if you have "Automatically acquire lock" enabled in the Admin Console and you sometimes use the console while scripts are running, it can cause problems, even though you are not making "lock-requiring" changes.
Also, are you using the same user for the chained scripts?
Within WLST, you can pass a number as a parameter to gain an exclusive lock. This allows the script to grab a different lock than the regular one that's used whenever an administrator locks from the console. It also prevents two instances of the same script from stepping on each other.
However, this creates complex change merge scenarios that are best avoided (by processes).
Oracle's documentation on configuration locks can be found here.
Alternatively, if you want the script to temporarily relieve any existing locks regardless of the pending changes, you may as well disable change management from the console, minimizing the inconvenience caused.
WLST also contains the cancelEdit command that you could run before you startEdit. Hope one of these options pans out!
To take the configuration change lock from another administrator:
If another administrator already has the configuration lock, the following message appears: Another user already owns the lock. You will need to either wait for the lock to be released, or take the lock.
Locate the Change Center in the upper left corner of the Administration Console.
Click Take Lock & Edit.
Make your configuration changes.
In the Change Center, click Activate Changes. Not all changes take effect immediately. Some require a restart (see Use the Change Center).
As long as you're running WLST as an administrative user, you should be able to jump into an existing edit session with the edit() command - I've done a quick test with two admin users, one in the Admin Console, and one using WLST, and it appears to work fine - I can see the changes in the Admin Console session inside the WLST interpreter.
You could put a very simple exception handler around your calls to startEdit that will log the exception's stack trace, but do nothing else. And then rely on the edit call to pop you into the change session.
Relying on that is going to be tricky though if another script has started an edit session and is expecting to be able to commit that change session itself - you'll be getting exceptions and unreliable behaviour across multiple invocations.

TeamCity: Managing deployment dependencies for acceptance tests?

I'm trying to configure a set of build configurations in TeamCity 6 and am trying to model a specific requirement in the cleanest possible way that TeamCity allows.
I have a set of acceptance tests (around 4-8 suites of tests grouped by the functional area of the system they pertain to) that I wish to run in parallel (I'll model them as build configurations so they can be distributed across a set of agents).
From my initial research, it seems that having an AcceptanceTests meta-build config that pulls in the set of individual Acceptance test configs via Snapshot dependencies should do the trick. Then all I have to do is say that my Commit build config should trigger AcceptanceTests and they'll all get pulled in. So, let's say I also have AcceptanceSuiteA, AcceptanceSuiteB and AcceptanceSuiteC.
So far, so good (I know I could also turn it around the other way and cause the Commit config to trigger AcceptanceSuiteA, AcceptanceSuiteB and AcceptanceSuiteC - problem there is I need to manually aggregate the results to determine the overall success of the acceptance tests as a whole).
The complicating bit is that while AcceptanceSuiteC just needs some Commit artifacts and can then live on its own, AcceptanceSuiteA and AcceptanceSuiteB need to:
DeploySite (let's say it takes 2 minutes and I can't afford to spin up a completely isolated one just for this run)
Run tests against the deployed site
The problem is that I need to be able to ensure that:
the website only gets configured once
The website does not get clobbered while the two suites are running
If I set up DeploySite as a build config and have AcceptanceSuiteA and AcceptanceSuiteB pull it in as a snapshot dependency, AFAICT:
a subsequent or parallel run of AcceptanceSuiteB could trigger another DeploySite which would clobber the deployment that AcceptanceSuiteA and/or AcceptanceSuiteB are in the middle of using.
While I can say Limit the number of simultaneously running builds to force only one to happen at a time, I need to have one at a time and not while the dependent pieces are still running.
Is there a way in TeamCity to model such a hierarchy?
EDIT: Ideas:-
A crap solution is that DeploySite could set an 'in use' flag and then have the AcceptanceTests config clear that flag [after AcceptanceSuiteA and AcceptanceSuiteB have completed]. The problem then becomes one of having the next DeploySite down the pipeline wait until said gate has been opened again (doing a blocking wait within the build doesn't feel right: I want it to be flagged as 'not yet started' rather than looking like it's taking a long time to do something). However, this sort of "stuff a flag over here and have this bit check it" approach is the sort of mutable state / flakiness smell I'm trying to get away from.
EDIT 2: if I could programmatically alter the agent configuration, I could set Agent Requirements to require InUse=false and then set the flag when a deploy starts and clear it after the tests have run
It seems you should first go and look on the JetBrains Devnet and YouTrack tracker, remembering to use the magic word clobber in your search.
Then you install groovy-plug and use the StartBuildPrecondition facility
To use the feature, add system.locks.readLock. or system.locks.writeLock. property to the build configuration.
The build with writeLock will only start when there are no builds running with read or write locks of the same name.
The build with readLock will only start when there are no builds running with write lock of the same name.
therein to manage the fact that the dependent configs 'read' and the DeploySite config 'writes' the shared item.
(This is not a full productised solution hence the tracker item remains open)
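The two quoted start rules amount to ordinary reader/writer semantics. A small Go model of them, with hypothetical types that are not part of TeamCity or the groovy-plug API, and with "site" as a made-up lock name:

```go
package main

import "fmt"

// LockKind distinguishes the two lock properties from the quoted text.
type LockKind int

const (
	ReadLock LockKind = iota
	WriteLock
)

// Build is a hypothetical record of a build holding a named lock.
type Build struct {
	Name string
	Lock string
	Kind LockKind
}

// CanStart applies the quoted rules: a writeLock build needs no running
// readers or writers of the same lock name; a readLock build needs no
// running writer of the same lock name.
func CanStart(candidate Build, running []Build) bool {
	for _, b := range running {
		if b.Lock != candidate.Lock {
			continue // locks with other names never block
		}
		if candidate.Kind == WriteLock || b.Kind == WriteLock {
			return false
		}
	}
	return true
}

func main() {
	running := []Build{{Name: "AcceptanceSuiteA", Lock: "site", Kind: ReadLock}}

	suiteB := Build{Name: "AcceptanceSuiteB", Lock: "site", Kind: ReadLock}
	deploy := Build{Name: "DeploySite", Lock: "site", Kind: WriteLock}

	fmt.Println(CanStart(suiteB, running)) // true: readers can share
	fmt.Println(CanStart(deploy, running)) // false: a reader is still running
}
```

Under this model, DeploySite waits until both suites have finished, and neither suite starts while a deploy is in progress.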
EDIT: And I still don't know whether the lock should be under Build Parameters|System Properties and what the exact name format should be: is it locks.writeLock.MYLOCKNAME (i.e., shows up in config with reference syntax %system.locks.writeLock.MYLOCKNAME%)?
Other puzzlers are: how does one manage giving builds triggered by build completion of a writeLock task read access? Does the lock get dropped until the next one picks up (which would allow another writer in), or is it necessary to have something queue up the parent and child dependency at the same time?