RSpec errors when run en masse, but not individually - ruby-on-rails-3

Unfortunately I don't have a specific question (or clues), but was hoping someone could point me in the right direction.
When I run all of my tests (rspec spec), two tests fail, both specifically related to Delayed Job.
When I run that spec file in isolation (rspec ./spec/controllers/xxx_controller_spec.rb), all the tests pass. Is this a common problem? What should I be looking for?
Thanks!

You are already mentioning it: isolation might be the solution. Usually I would guess that you have things in the database that are being changed and not cleaned up properly (or rather, not mocked properly).
In this case, though, I would suggest that, because the system is under quite a high workload, the delayed jobs are not being worked off fast enough. The challenge with all asynchronous tasks that should be tested is that you must not let the system actually run the delayed jobs; instead, mock the calls and just make sure that the delayed jobs have been received.
Sadly, with no examples, I can hardly point out the missing mocks. But make sure that all calls to delayed jobs and the like receive the correct data, yet do not actually create and run those jobs - your specs will be faster, too. Make sure you isolate the function under test and do not call external dependencies.

Related

Snakemake: Job preemption can interrupt running jobs on clusters; how can I make sure the task is not considered failed?

I'm using Snakemake on a cluster, and I don't know how best to handle the fact that some jobs can be preempted.
On the cluster I use, it is possible to get more power by accessing the resources of other teams, but with the risk of being preempted, which means the job in progress is stopped and rescheduled. It will be launched again as soon as a resource is available. This is especially advantageous when you have a lot of quick jobs to run. Unfortunately, Snakemake does not seem to support this properly.
In the example given in the help on the cluster-status feature for Slurm, there is no PREEMPTED in the running_status list (running_status=["PENDING", "CONFIGURING", "COMPLETING", "RUNNING", "SUSPENDED"]), which may lead Snakemake to consider a preempted job as failed. Not a big deal, I've added PREEMPTED to this list, but it leads me to believe that Snakemake did not consider this scenario.
More annoyingly, even when running Snakemake with the --rerun-incomplete option, when the job is interrupted by the preemption, then restarted, I get the following error:
IncompleteFilesException:
The files below seem to be incomplete. If you are sure that certain files are not incomplete, mark them as complete with
snakemake --cleanup-metadata <filenames>
To re-generate the files rerun your command with the --rerun-incomplete flag.
I would expect the interrupted job to restart from scratch.
For now, the only solution I have found is to stop using other teams' resources to avoid having my jobs preempted, but I am losing computing power.
How do you use Snakemake in a context where your jobs can be preempted? Anyone see a solution so I don't get the IncompleteFilesException anymore?
Thanks in advance
Snakemake has a restart feature, which can be used to let jobs be resubmitted automatically. However, there is indeed no special handling for preemption currently. You are also right; I was not even aware that something like that exists on Slurm. A PR in that direction would of course be welcome. Basically, one would need to extend the status-script handling to recognize this case and restart the job.
Thanks for reporting these; I see two separate issues here:
Handling of the PREEMPTED status returned by slurm.
The IncompleteFilesException suggesting you use --rerun-incomplete when that is exactly what you are doing.
1. PREEMPTED status handling
I have no experience in using Slurm, so I cannot comment on whether the script example in the docs that you are linking to will work for Slurm. In particular, the expression in output = str(subprocess.check_output(expression)) might have to be adjusted for Slurm in some way. Maybe there's someone around here who also uses Slurm and has found a working solution in the past?
But otherwise, adding PREEMPTED to the running_status list should be exactly what you want to do (assuming that is exactly the tag returned by expression).
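For reference, a full status script along those lines might look like the sketch below (untested; the sacct invocation follows the docs example and may need adjusting for your site, and PREEMPTED is simply added to the running list):

#!/usr/bin/env python
# Sketch of a --cluster-status script for Slurm, based on the example in the
# Snakemake docs, with PREEMPTED treated as "still running" so a preempted
# job is not reported as failed.
import subprocess
import sys

jobid = sys.argv[1]

# Ask Slurm for the job state; the exact sacct pipeline may need tweaking
# for your cluster's accounting setup.
output = str(subprocess.check_output(
    "sacct -j %s --format State --noheader | head -1 | awk '{print $1}'" % jobid,
    shell=True))

running_status = ["PENDING", "CONFIGURING", "COMPLETING", "RUNNING",
                  "SUSPENDED", "PREEMPTED"]

if "COMPLETED" in output:
    print("success")
elif any(r in output for r in running_status):
    print("running")
else:
    print("failed")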
If this has to be adapted to slurm and you manage to generate a working status.py script, it might be worth adding this to the docs via a pull request onto this file, so that other slurm users don't have to reinvent the solution.
2. IncompleteFilesException with --rerun-incomplete flag
From the general description, this sounds a bit like a bug, but without any details I cannot be sure. It might be worth describing this in more detail while filing an issue in the snakemake repo, either by simply providing more details or, even better, by providing a minimal example that reproduces this behavior.

Handling test data when going from running Selenium tests in series to parallel

I'd like to start running my existing Selenium tests in parallel, but I'm having trouble deciding on the best approach due to the way my current tests are written.
The first step of most of my tests is to get the DB into a clean state and then populate it with the data needed for the rest of the test. While this is great for isolating tests from each other, if I start running these same Selenium tests in parallel on the same SUT, they'll end up erasing other tests' data.
After much digging, I haven't been able to find any guidance or best-practices on how to deal with this situation. I've thought of a few ideas, but none have struck me as particularly awesome:
Rewrite the tests to not overwrite other tests' data, i.e. only add test data, never erase -- I could see this potentially leading to unexpected failures due to the variability of the database when each test is run. Anything from a different ordering of tests to an ill-placed failure could throw off the other tests. This just feels wrong.
Don't pre-populate the database -- Instead, create all needed data via Selenium itself. This would most replicate real-world usage, but would also take significantly longer than loading data directly into the database. This would probably negate any benefits from parallelization depending on how much test data each test case needs.
Have each Selenium node test a different copy of the SUT -- This way, each test would be free to do as it pleases with the database, since we can assume that no other test is touching it at the same time. The downside is that I'd need to have multiple databases set up and, at the start of each test case, figure out how to coordinate which database to initialize and how to signal to the node and SUT that this particular test case should be using this particular database. Not awful, but not what I would love to do if there's a better way.
Have each Selenium node test a different copy of the SUT, but break up the tests into distinct suites, one suite per node, before run-time -- Also viable, but not as flexible, since over time you'd want to keep going back and evening out the length of each suite as much as possible.
All in all, none of these seem like clear winners. Option 3 seems the most reasonable, but I also have doubts about whether that is even a feasible approach. After researching a bit, it looks like I'll need to write a custom test runner to facilitate running the tests in parallel anyway, but the parts regarding the initial test data still have me looking for a better way.
Anyone have any better ways of handling database initialization when running Selenium tests in parallel?
FWIW, the app and test suite are in PHP/PHPUnit.
Update
Since it sounds like the answer I'm looking for is very project-dependent, I'm at least going to attempt to come up with my own solution and report back with my findings.
There's no easy answer, and it looks like you've thought out most of it. Also worth considering is rewriting the tests to use separately partitioned data - this may or may not work depending on your domain (e.g. a separate bank account per node, if it's a banking app). Your pre-population of the DB could be restricted to static reference data, or you could pre-populate the data for each separate 'account'. Again, it depends on how easy this is to do for your data.
I'm inclined to vote for 3, though, because database setup is relatively easy to script these days and the hardware requirements probably aren't too high for a small test data suite.
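To make option 3 a bit more concrete, the coordination can be as simple as deriving the database name from a per-node worker id. The sketch below is Python purely for illustration (your suite is PHP/PHPUnit, and WORKER_ID plus the naming scheme are invented), but the same idea ports directly:

# Sketch: each parallel Selenium node/test process gets its own database,
# chosen from a worker id in the environment. WORKER_ID and the naming
# scheme are made-up examples of the coordination idea, not a drop-in fix.
import os

def database_for_worker(prefix="app_test"):
    # Each node/test process is launched with a distinct WORKER_ID
    # (0, 1, 2, ...); map it to a dedicated database name.
    worker_id = os.environ.get("WORKER_ID", "0")
    return "%s_%s" % (prefix, worker_id)

if __name__ == "__main__":
    # The SUT instance serving this node would be configured (via environment,
    # config file, or similar) to use the same database name, so each worker
    # can wipe and seed its copy without touching the others.
    print("This worker uses database:", database_for_worker())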

Overhead of an exec() call?

I have a web script that is a simple wrapper around a perl program:
#!/bin/sh
source ~/.myinit  # pull in some configurations like [PRIVATE_DIR]
exec [PRIVATE_DIR]/myprog.pl
This is really just to keep the code better compartmentalized, since the main program (myprog.pl) runs on different machines with different configuration modes and it's cleaner to not have to update that for every installation. However, I would be willing to sacrifice cleanliness for efficiency.
So the question is, does this extra sh exec() call add any non-negligible overhead, considering it could be hit by the webserver quite frequently (we have 1000s of users on at a time)? I ask because I know that people have gone to great lengths to embed programs into httpd to avoid having to make an extra fork/exec call. My assumption has been that this was due to the overhead of the program being called (e.g. mod_perl bypasses the extremely slow Perl startup), not the process of calling it. Would that be accurate?
I've tried to benchmark this but I can't get the resolution I need to see if it makes a difference. Any advice on that would also be appreciated.
Thanks!
Try a test?
Assuming you have a test environment separate from production.
(and this is a little vague, and real testers will be able to amplify how to improve on this)
Run your current two-level design in a tight loop, using the same parameters, for 100,000 or 1,000,000 iterations.
Then 'fix' your code so the Perl is called directly, with the same parameters as above, looping for the same count.
Capture performance stats for both runs. The diff between the two should be (roughly) the cost of the extra call.
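If it helps, a small driver script can time the two variants; the sketch below is only an outline (the wrapper.sh / myprog.pl paths and the iteration count are placeholders for your own setup):

# Rough benchmark sketch: time N invocations of the sh wrapper versus the
# Perl script called directly; the difference divided by N approximates the
# per-call cost of the extra sh level.
import subprocess
import time

N = 10000  # raise this until the difference rises above the noise

def time_command(cmd):
    start = time.time()
    for _ in range(N):
        subprocess.call(cmd, stdout=subprocess.DEVNULL,
                        stderr=subprocess.DEVNULL)
    return time.time() - start

wrapped = time_command(["./wrapper.sh"])   # sh wrapper that execs the Perl
direct = time_command(["./myprog.pl"])     # Perl program called directly

print("wrapper: %.2fs  direct: %.2fs  per-call overhead: %.4f ms"
      % (wrapped, direct, (wrapped - direct) / N * 1000))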
If this works out, I'd appreciate seeing the results on this:-)
(There are many more posts for tag=testing than tag=benchmark)
I hope this helps.
P.S. as you appear to be a new user, if you get an answer that helps you please remember to mark it as accepted, or give it a + (or -) as a useful answer
It's easy for a Perl program to take over some env vars from a shell script; see the answer to Have perl execute shellscript & take over env vars.
Using this, there is no need for exec at all, which should alleviate all your worries about exec overhead. :-)

How many cycles are required to validate an automated script

I have one query. Maybe it is a silly question, but I still need the answer to clear my doubts.
Testing is evaluating the product or application. We do testing to check whether there are any show stoppers, any issues that should not be present.
We automate (I am talking about scripting) test cases from the existing manual test cases. Once a test case is automated, how many cycles do we need to run the script to confirm that it runs with no major errors and is therefore reliable enough to run instead of manually executing the test cases?
Thanks in advance.
If the test script always fails when a test fails, you need to run the script only once. Running the script several times without changing the code will not give you additional safety.
You may discover that your tests depend on some external source that changes during the tests and thereby make the tests fail sometimes. Running the tests several times will not solve this issue, either. To solve it, you must make sure that the test setup really initializes all external factors in such a way that the tests always succeed. If you can't achieve this, you can't test reliably, so there is no way around this.
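As a generic illustration of pinning down such external factors (not tied to any particular framework; the greeting function and the fixed seed/timestamp are invented), passing time and randomness in explicitly makes repeated runs behave identically:

# Sketch: make "external factors" (here, the clock and randomness) explicit
# inputs so the test setup fully controls them and reruns are deterministic.
import random
import unittest
from datetime import datetime

FIXED_NOW = datetime(2020, 1, 1, 12, 0, 0)

def greeting(now, rng):
    # Function under test receives its external factors as parameters
    # instead of reaching for datetime.now() or the global random state.
    options = ["Hello", "Hi", "Hey"]
    return "%s! It is %s" % (rng.choice(options), now.isoformat())

class GreetingTest(unittest.TestCase):
    def test_greeting_is_repeatable(self):
        # Same seed and same timestamp -> exactly the same output every run.
        first = greeting(FIXED_NOW, random.Random(42))
        second = greeting(FIXED_NOW, random.Random(42))
        self.assertEqual(first, second)

if __name__ == "__main__":
    unittest.main()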
That said, tests can never make sure that your product is 100% correct or safe. They just make sure that your product is still as good as (or better than) it was before all the changes you made since the last test. It's kind of like having a watermark that tells you the least amount of quality you can depend on. Anything above the watermark is speculation, but below it (the part that your tests cover) is safe.
So by refining your tests, you can make your product better with every change. Without the automatic tests, every change has a chance to make your product worse. This means that without tests your quality will certainly deteriorate, while with tests you can guarantee that a certain level of quality is maintained.
It's a huge field with no simple answer.
It depends on several factors, including:
The code coverage of your tests
How you define reliable

When/how frequently should I test?

As a novice developer who is getting into the rhythm of my first professional project, I'm trying to develop good habits as soon as possible. However, I've found that I often forget to test, put it off, or do a whole bunch of tests at the end of a build instead of one at a time.
My question is what rhythm do you like to get into when working on large projects, and where testing fits into it.
Well, if you want to follow the TDD guys, before you start to code ;)
I am very much in the same position as you. I want to get more into testing, but I am currently in a position where we are working to "get the code out" rather than "get the code out right" which scares the crap out of me. So I am slowly trying to integrate testing processes in my development cycle.
Currently, I test as I code, trying to bust the code as I write it. I do find it hard to get into the TDD mindset. It's taking time, but that is the way I would want to work.
EDIT:
I thought I should probably expand on this; this is my basic "working process":
1. Plan what I want from the code, possible object design, whatever.
2. Create my first class, add a huge comment to the top outlining what my "vision" for the class is.
3. Outline the basic test scenarios. These will basically become the unit tests.
4. Create my first method, also writing a short comment explaining how it is expected to work.
5. Write an automated test to see if it does what I expect.
6. Repeat steps 4-6 for each method (note the automated tests are in a huge list that runs on F5).
7. I then create some beefy tests to emulate the class in the working environment, obviously fixing any issues.
8. If any new bugs come to light following this, I then go back and write the new test in, make sure it fails (this also serves as proof-of-concept for the bug), then fix it.
I hope that helps. Open to comments on how to improve this; as I said, it is a concern of mine.
Before you check the code in.
First and often.
If I'm creating some new functionality for the system, I'll initially define the interfaces and then write unit tests for those interfaces. To work out what tests to write, consider the API of the interface and the functionality it provides; get out a pen and paper and think for a while about potential error conditions or ways to prove that it is doing the correct job. If this is too difficult, it's likely that your API isn't good enough.
In regards to the tests, see if you can avoid writing "integration" tests that exercise more than one specific object, and keep them as "unit" tests.
Then create a default implementation of your interface (that does nothing, returns rubbish values but doesn't throw exceptions), plug it into the tests to make sure that the tests fail (this tests that your tests work! :) ). Then write in the functionality and re-run the tests.
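A minimal sketch of that flow using Python's unittest (the PriceCalculator class and its method are invented purely for illustration):

# Default "rubbish" implementation plus the tests written against the
# interface. Running this now makes both tests fail, which proves that the
# tests themselves can fail; the real logic then replaces the placeholder.
import unittest

class PriceCalculator:
    def total(self, unit_price, quantity):
        return -1  # deliberately wrong placeholder, no exceptions thrown

class PriceCalculatorTest(unittest.TestCase):
    def test_total_is_price_times_quantity(self):
        self.assertEqual(PriceCalculator().total(10, 3), 30)

    def test_zero_quantity_costs_nothing(self):
        self.assertEqual(PriceCalculator().total(10, 0), 0)

if __name__ == "__main__":
    unittest.main()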
This mechanism isn't perfect but will cover a lot of simple coding mistakes and provide you with an opportunity to run your new feature without having to plug it into the entire application.
Following this, you then need to test it in the main application in combination with the existing features.
This is where testing is more difficult and, if possible, should be partially outsourced to a good QA tester, as they'll have the knack of breaking things. Although it helps if you have these skills too.
Getting testing right is a knack that you have to pick up to be honest. My own experience comes from my own naive deployments and the subsequent bugs that were reported by the users when they used it in anger.
At first when this happened to me I found it irritating that the user was intentionally trying to break my software, and I wanted to mark all the "bugs" down as "training issues". However, after reflecting on it, I realised that it is our role (as developers) to make the application as simple and reliable to use as possible, even by idiots. It is our role to empower idiots, and that's why we get paid the dollar. Idiot handling.
To effectively test like this you have to get into the mindset of trying to break everything. Assume the mantle of a user that bashes the buttons and generally attempts to destroy your application in weird and wonderful ways.
Assume that if you don't find flaws, they will be discovered in production, to your company's serious loss of face. Take full responsibility for all of these issues and curse yourself when a bug you are responsible (or even partly responsible) for is discovered in production.
If you do most of the above then you should start to produce much more robust code; however, it is a bit of an art form and requires a lot of experience to be good at.
A good key to remember is
"Test early, test often and test again, when you think you are done"
When to test? When it's important that the code works correctly!
When hacking something together for myself, I test at the end. Bad practice, but these are usually small things that I'll use a few times and that's it.
On a larger project, I write tests before I write a class and I run the tests after every change to that class.
I test constantly. After I finish even a loop inside of a function, I run the program and hit a breakpoint at the top of the loop, then run through it. This is all just to make sure that the process is doing exactly what I want it to.
Then, once a function is finished, you test it in its entirety. You probably want to set a breakpoint just before the function is called, and check your debugger to make sure that it works perfectly.
I guess I would say: "Test often."
I've only recently added unit testing to my regular work flow but I write unit tests:
to express the requirements for each new code module (right after I write the interface but before writing the implementation)
every time I think "it had better ... by the time I'm done"
when something breaks, to quantify the bug and prove that I've fixed it
when I write code which explicitly allocates or deallocates memory--I loathe hunting for memory leaks...
I run the tests on most builds, and always before running the code.
Start with unit testing. Specifically, check out TDD, Test Driven Development. The concept behind TDD is you write the unit tests first, then write your code. If the test fails, you go back and re-work your code. If it passes, you move on to the next one.
I take a hybrid approach to TDD. I don't like to write tests against nothing, so I usually write some of the code first, then put the unit tests in. It's an iterative process, one which you're never really done with. You change the code, you run your tests. If there's any failures, fix and repeat.
The other sort of testing is integration testing, which comes along later in the process, and might typically be done by a QA testing team. In any case, integration testing addresses the need to test the pieces as a whole. It's the working product you're concerned with testing. This one is more difficult to deal with b/c it usually involves having automated testing tools (like Robot, for ex.).
Also, take a look at a product like CruiseControl.NET to do continuous builds. CC.NET is nice b/c it will run your unit tests with each build, notifying you immediately of any failures.
We don't do TDD here (though some have advocated it), but our rule is that you're supposed to check your unit tests in with your changes. It doesn't always happen, but it's easy to go back and look at a specific changeset and see whether or not tests were written.
I find that if I wait until the end of writing some new feature to test, I forget many of the edge cases that I thought might break the feature. This is ok if you are doing things to learn for yourself, but in a professional environment, I find my flow to be the classic form of: Red, Green, Refactor.
Red: Write your test so that it fails. That way you know the test is asserting against the correct variable.
Green: Make your new test pass in the easiest way possible. If that means hard-coding it, that's ok. This is great for those that just want something to work right away.
Refactor: Now that your test passes, you can go back and change your code with confidence. Your new change broke your test? Great, your change had an implication you didn't realize, now your test is telling you.
This rhythm has sped up my development over time because I basically have a history compiler for all the things I thought needed to be checked in order for a feature to work! This, in turn, leads to many other benefits that I won't get into here...
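To make the rhythm concrete, one tiny cycle might look like the sketch below (the slugify helper is a made-up example; the comments mark where Red, Green, and Refactor happen):

# One red-green-refactor cycle in miniature. slugify() is hypothetical;
# only the rhythm matters.
import unittest

def slugify(title):
    # RED: with this function missing (or returning the wrong thing), the
    #      test below fails, proving the assertion checks the right value.
    # GREEN: the simplest passing version could just return "hello-world".
    # REFACTOR: generalize, with the passing test as the safety net.
    return "-".join(title.lower().split())

class SlugifyTest(unittest.TestCase):
    def test_spaces_become_dashes(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

if __name__ == "__main__":
    unittest.main()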
Lots of great answers here!
I try to test at the lowest level that makes sense:
If a single computation or conditional is difficult or complex, add test code while you're writing it and ensure each piece works. Comment out the test code when you're done, but leave it there to document how you tested the algorithm.
Test each function.
Exercise each branch at least once.
Exercise the boundary conditions -- input values at which the code changes its behavior -- to catch "off by one" errors (see the sketch after this list).
Test various combinations of valid and invalid inputs.
Look for situations that might break the code, and test them.
Test each module with the same strategy as above.
Test the body of code as a whole, to ensure the components interact properly. If you've been diligent about lower-level testing, this is essentially a "confidence test" to ensure nothing broke during assembly.
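As a tiny illustration of the boundary-condition point above (the bulk_discount rule and its threshold of 100 are invented):

# Boundary-condition sketch: test just below, exactly at, and just above the
# value where the behavior changes (the classic off-by-one trouble spot).
import unittest

def bulk_discount(quantity):
    # Hypothetical rule: 10% off for orders of 100 items or more.
    return 0.10 if quantity >= 100 else 0.0

class BulkDiscountBoundaryTest(unittest.TestCase):
    def test_just_below_threshold(self):
        self.assertEqual(bulk_discount(99), 0.0)

    def test_exactly_at_threshold(self):
        self.assertEqual(bulk_discount(100), 0.10)

    def test_just_above_threshold(self):
        self.assertEqual(bulk_discount(101), 0.10)

if __name__ == "__main__":
    unittest.main()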
Since most of my code is for embedded devices, I pay particular attention to robustness, interaction between various threads, tasks, and components, and unexpected use of resources: memory, CPU, filesystem space, etc.
In general, the earlier you encounter an error, the easier it is to isolate, identify, and fix it--and the more time you get to spend creating, rather than chasing your tail*.
*I know, -1 for the gratuitous buffer-pointer reference!