Speeding up RSpec & FactoryGirl model tests? - ruby-on-rails-3

I'm currently using FactoryGirl and Rspec to test my models, which is great but incredibly slow. The hundreds of tests that I have for each model take about 30 seconds to run, per model.
The core issue is that when I create an object to test, I use the FactoryGirl.create() method. That hits the DB and is definitely slower than using build or build_stubbed. But if I just use build, I'll never know whether I've run into an error with the database (such as trying to write a null value to a column that I've specified as non-null), right?
Is there any way to get the best of both worlds? Or should I test the DB integration part explicitly somewhere outside of model/unit tests?

Don't know if this is applicable in your case, but have you considered tweaking your spec_helper.rb to get your suite to run faster?
I documented the evolution of my spec_helper.rb file in this StackOverflow answer (see specifically Edit 4), and the links to other SO answers and blogs listed there helped me a lot in reducing the running time of the suite.

I tend to use FactoryGirl.build, or just .new to create instances in model specs, and then save them only if the test needs to check some behavior that requires a persisted instance.
This can be problematic when using associations or joins where the row ID must be present. It's something of a tradeoff--speedy tests vs. tests that are easy to write.

You should use build most of the time. If you want to be sure that some value won't be saved as null, write a spec just for that; it makes no sense to always create the objects in the DB.
If you test once that the factory creates a valid object, then you can trust the factory to create valid objects every time.
Also, always use presence validations on the fields that can't be null/nil. If the validation guarantees the field is not nil, you can be sure the DB won't receive a null value.
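The split suggested above (in-memory objects for most specs, plus one dedicated spec for the database constraint) can be sketched outside Rails. This is only an illustration in Python with the standard sqlite3 module, not FactoryGirl itself; the `User` class, the `build_user` and `save_user` helpers, and the `users` table are all hypothetical names.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    name: Optional[str]

    def is_valid(self) -> bool:
        # Model-level presence validation, checked without touching the database.
        return bool(self.name)


def build_user(name: Optional[str] = "alice") -> User:
    # Rough analogue of a factory "build": an unsaved, in-memory object.
    return User(name=name)


def make_db() -> sqlite3.Connection:
    # In-memory database with a NOT NULL column, standing in for the real schema.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    return conn


def save_user(conn: sqlite3.Connection, user: User) -> int:
    # Rough analogue of "create": writes a row and returns its id.
    cur = conn.execute("INSERT INTO users (name) VALUES (?)", (user.name,))
    return cur.lastrowid


def db_rejects_null_name() -> bool:
    # The single dedicated check that exercises the NOT NULL constraint.
    conn = make_db()
    try:
        save_user(conn, build_user(name=None))
    except sqlite3.IntegrityError:
        return True
    return False
```

Most tests would only call `build_user()` and `is_valid()`; only `db_rejects_null_name()` pays the cost of a database round trip.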

Related

Should I add a "real-delete" method on DAO for integration test?

I am writing some integration tests for some legacy code. To ensure the functions behave as expected, I need to setup the fake data, invoke the testing APIs, then clean up the data.
For policy reasons, we can only access the database via tools like Hibernate and MyBatis, never through a direct connection. However, the delete() method on our DAOs is always a soft delete (i.e., it turns on the is_delete flag). So the clean-up actually just turns on the is_delete flag, and the fake data is still there!
So, should I add a "real-delete" method on the DAOs for the integration tests, or is there a better way to deal with this problem?
There is nothing wrong with adding a real delete method. After all, the point of an integration test is to test all the components together in an effort to emulate the way they will actually be used.
I would just make sure that if you do this, you first add records that you know will not be duplicates. Then you can assert that those records are present in the database, delete them, and assert that they are no longer present. That way you ensure that your test never deletes real data.
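The insert, assert, delete, assert sequence described above might look roughly like this. It is a sketch in Python against an in-memory SQLite database rather than Hibernate or MyBatis; `ItemDao`, its methods, and the `items` table are hypothetical stand-ins for the real DAOs.

```python
import sqlite3
import uuid


class ItemDao:
    """Hypothetical DAO; the real one would sit on top of Hibernate or MyBatis."""

    def __init__(self, conn):
        self.conn = conn

    def insert(self, name):
        self.conn.execute("INSERT INTO items (name, is_delete) VALUES (?, 0)", (name,))

    def exists(self, name):
        row = self.conn.execute("SELECT 1 FROM items WHERE name = ?", (name,)).fetchone()
        return row is not None

    def delete(self, name):
        # Existing soft delete: only flips the flag, the row stays in the table.
        self.conn.execute("UPDATE items SET is_delete = 1 WHERE name = ?", (name,))

    def real_delete(self, name):
        # Test-only hard delete: physically removes the row.
        self.conn.execute("DELETE FROM items WHERE name = ?", (name,))


def run_cleanup_check():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (name TEXT PRIMARY KEY, is_delete INTEGER NOT NULL)")
    conn.execute("INSERT INTO items VALUES ('real-data', 0)")  # pre-existing "real" row
    dao = ItemDao(conn)

    # A unique name guarantees the fixture cannot collide with real data.
    name = "it-test-" + uuid.uuid4().hex
    dao.insert(name)
    present_before = dao.exists(name)
    dao.delete(name)
    present_after_soft = dao.exists(name)
    dao.real_delete(name)
    present_after_hard = dao.exists(name)
    return present_before, present_after_soft, present_after_hard, dao.exists("real-data")
```

The returned tuple shows the fixture surviving the soft delete, disappearing after the hard delete, and the unrelated row staying untouched.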

How to quickly analyse the impact of a program change?

Lately I have needed to do an impact analysis for changing a DB column definition of a widely used table (like PRODUCT, USER, etc.). I find it a very time-consuming, boring and difficult task. Is there any known methodology for doing this?
The question also applies to changes to the application, file system, search engine, etc. At first, I thought this kind of functional relationship should be pre-documented or somehow kept track of, but then I realized that everything can change, so it would be impossible to do so.
I don't even know what should be tagged to this question, please help.
Sorry for my poor English.
Sure. One can technically at least know what code touches the DB column (reads or writes it), by determining program slices.
Methodology: Find all SQL code elements in your sources. Determine which ones touch the column in question. (Careful: SELECT * may touch your column, so you need to know the schema). Determine which variables read or write that column. Follow those variables wherever they go, and determine the code and variables they affect; follow all those variables too. (This amounts to computing a forward slice). Likewise, find the sources of the variables used to fill the column; follow them back to their code and sources, and follow those variables too. (This amounts to computing a backward slice).
All the elements of the slice are potentially affecting or affected by a change. There may be conditions in the slice-selected code that are clearly outside the conditions expected by your new use case, and you can eliminate that code from consideration. Everything else in the slices you may have to inspect or modify to make your change.
Now, your change may affect some other code (e.g., a new place to use the DB column, or combine the value from the DB column with some other value). You'll want to inspect up and downstream slices on the code you change too.
You can apply this process for any change you might make to the code base, not just DB columns.
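The "follow those variables wherever they go" step is essentially graph reachability. Here is a minimal Python sketch, assuming the "flows into" edges have already been extracted (by hand or by a tool); the `EDGES` data and every node name in it are made up for illustration, and a real slicer would also have to account for control dependencies.

```python
from collections import deque


def forward_slice(edges, start):
    """Return every node reachable from `start` by following "flows into" edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def backward_slice(edges, target):
    """Return every node that can flow into `target` (reachability on reversed edges)."""
    reversed_edges = {}
    for src, dsts in edges.items():
        for dst in dsts:
            reversed_edges.setdefault(dst, []).append(src)
    return forward_slice(reversed_edges, target)


# Hypothetical data-flow edges: "value of key flows into each listed node".
EDGES = {
    "form.price_input": ["PRODUCT.price"],
    "PRODUCT.price": ["price_var"],
    "price_var": ["invoice_total", "price_label"],
    "invoice_total": ["tax_amount"],
    "stock_count": ["reorder_flag"],
}
```

Calling `forward_slice(EDGES, "PRODUCT.price")` gives the code that may be affected by the column, and `backward_slice(EDGES, "PRODUCT.price")` gives what feeds it; anything outside both sets (here `stock_count` and `reorder_flag`) can be left out of the review.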
Manually this is not easy to do in a big code base, and it certainly isn't quick. There is some automation available for C and C++ code, but not much for other languages.
You can get a bad approximation by running test cases that involve your desired variable or action, and inspecting the test coverage. (Your approximation gets better if you run test cases that you are sure do NOT cover your desired variable or action, and eliminate all the code they cover).
Ultimately this task cannot be fully automated or reduced to an algorithm, otherwise there would be a tool to preview refactored changes. The better the code was written in the beginning, the easier the task.
Let me explain how to reach the answer: isolation is the key. Mapping everything to object properties can help you automate your review.
I can give you an example. If you can manage to map your specific case to the below, it will save your life.
The OR/M change pattern
Like Hibernate or Entity Framework...
A change to a database column can be previewed simply by analysing what code uses a certain object's property. Since all DB columns are mapped to object properties, and assuming no code uses pure SQL, you are good to go for your estimations.
This is a very simple pattern for change management.
In order to reduce a file system/network or data file issue to the above pattern you need other software patterns implemented. I mean, if you can reduce a complex scenario to a change in your objects' properties, you can leverage your IDE to detect the changes for you, including code that needs a slight modification to compile or needs to be rewritten entirely.
If you want to manage a change in a remote service when you initially write your software, wrap that service in an interface, so that you will only have to modify its implementation.
If you want to manage a possible change in a data file format (e.g. length of field change in positional format, column reordering), write a service that maps that file to objects (e.g. using the BeanIO parser).
If you want to manage a possible change in file system paths, design your application to use more runtime variables.
If you want to manage a possible change in cryptography algorithms, wrap them in services (e.g. HashService, CryptoService, SignService).
If you do the above, your manual requirements review will be easier: the overall task is still manual, but it can be aided by automated tools. You can try changing the name of a class's property and see the side effects in the compiler.
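As a small illustration of the "wrap it in a service" advice, here is a sketch in Python. `HashService` is the name used above; the two concrete classes and the `fingerprint` caller are hypothetical. The point is only that the caller depends on the interface, so swapping the algorithm touches one place.

```python
import hashlib
from abc import ABC, abstractmethod


class HashService(ABC):
    """Interface the rest of the application depends on."""

    @abstractmethod
    def digest(self, data: bytes) -> str:
        ...


class Sha256HashService(HashService):
    def digest(self, data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()


class Sha512HashService(HashService):
    # A later algorithm change only adds or replaces an implementation.
    def digest(self, data: bytes) -> str:
        return hashlib.sha512(data).hexdigest()


def fingerprint(document: bytes, hasher: HashService) -> str:
    # The caller knows nothing about the concrete algorithm.
    return hasher.digest(document)
```

An impact analysis for "change the hash algorithm" then reduces to finding where a `HashService` implementation is constructed.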
Worst case
Obviously, if you need to change the name, type and length of a specific column in a database, in software with plain SQL hardcoded and scattered in multiple places around the code, where many tables have similar column names, and without project documentation (I did say worst case, right?), across a total of 10000+ classes, you have no other way than manually exploring your project, using find tools but not relying on them.
And if you don't have a test plan, which is the document from which you can hope to originate a software test suite, it will be time to make one.
Just adding my 2 cents. I'm assuming you're working in a production environment so there's got to be some form of unit tests, integration tests and system tests already written.
If yes, then a good way to validate your changes is to run all these tests again and create any new tests which might be necessary.
And to state the obvious, do not integrate your code changes into the main production code base without running these tests.
Then again, changes which worked fine in a test environment may not work in a production environment.
Have some form of source code configuration management system like Subversion, Git, CVS, etc.
This enables you to roll back your changes.

In “Given-When-Then” style BDD tests, is it OK to have multiple “When”s conjoined with an “And”?

I read Bob Martin's brilliant article on how "Given-When-Then" can actually be compared to an FSM. It got me thinking. Is it OK for a BDD test to have multiple "When"s?
For example:
GIVEN my system is in a defined state
WHEN an event A occurs
AND an event B occurs
AND an event C occurs
THEN my system should behave in this manner
I personally think these should be 3 different tests for good separation of intent. But other than that, are there any compelling reasons for or against this approach?
When multiple steps (WHEN) are needed before you do your actual assertion (THEN), I prefer to group them in the initial condition part (GIVEN) and keep only one in the WHEN section. This shows that the event that really triggers the "action" of my SUT is this one, and that the previous ones are just steps to get there.
Your test would become:
GIVEN my system is in a defined state
AND an event A occurs
AND an event B occurs
WHEN an event C occurs
THEN my system should behave in this manner
but this is more of a personal preference I guess.
If you truly need to test that a system behaves in a particular manner under those specific conditions, it's a perfectly acceptable way to write a test.
I found that another limiting factor shows up in an E2E testing scenario where you would like to reuse a statement multiple times. In my case the BDD framework of my choice (pytest_bdd) is implemented in a way that a given statement can have a single return value, and it maps the then input parameters automagically by the name of the function that was mapped to the given step. This design prevents reuse, which is exactly what I wanted. In short, I needed to create objects and add them to a sequence object provided by another given statement.
The way I worked around this limitation was to use a test fixture (which I named test_context), a Python dictionary (a hashmap), together with when statements, which don't have the same single-return-value requirement. The '(when) add object to sequence' step looked up the sequence in the context and appended the object in question to it. So now I could reuse the add object to sequence action multiple times.
This requirement was tricky because BDD aims to be descriptive. I could have used a single given statement with the pickled memory map of the sequence object that I wanted to perform the test action on, but would it have been useful? I think not. I needed to get the sequence constructed first, and that needed reusable statements. And although this is not in the BDD bible, I think in the end it is a practical and pragmatic solution to a very real E2E descriptive testing problem.
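The shared-context idea can be shown without any framework. The sketch below uses plain Python functions in place of real pytest_bdd step definitions (no pytest_bdd decorators or fixtures are used, so treat the wiring as an assumption); only the `test_context` name comes from the description above, the step function names are made up.

```python
def given_an_empty_sequence(test_context):
    # The "given" step stores the object under test in the shared dictionary.
    test_context["sequence"] = []


def when_add_object_to_sequence(test_context, obj):
    # Reusable "when" step: looks the sequence up in the context and appends to it.
    test_context["sequence"].append(obj)


def then_sequence_has_length(test_context, expected):
    assert len(test_context["sequence"]) == expected


def run_scenario():
    test_context = {}  # in a real suite this would be provided as a fixture
    given_an_empty_sequence(test_context)
    when_add_object_to_sequence(test_context, "first")
    when_add_object_to_sequence(test_context, "second")  # same step, reused
    then_sequence_has_length(test_context, 2)
    return test_context["sequence"]
```

Because every step reads and writes the same dictionary, the "add object" step can appear as many times as the scenario needs.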

DRY for JMeter tests

Is there a way to modularize JMeter tests?
I have recorded several use cases for our application. Each of them is in a separate thread group in the same test plan. To control the workflow I wrote some primitives (e.g. postprocessor elements) that are used in many of these thread groups.
Is there a way not to copy these elements into each thread group but to use some kind of referencing within the same test plan? What would also be helpful is a way to reference elements from a different file.
Does anybody have any solutions or workarounds? I guess I am not the only one trying to follow the DRY principle...
I think this post from Atlassian describes what you're after using Module controllers. I've not tried it myself yet, but have it on my list of things to do :)
http://blogs.atlassian.com/developer/2008/10/performance_testing_with_jmete.html
Jared
You can't do this with JMeter. The UI doesn't support it. The Workbench would be a perfect place to store those common elements but it's not saved in JMX.
However, you can parameterize just about anything so you can achieve similar effects. For example, we use the same regex post processor in several thread groups. Even though we can't share the processor, the whole expression is a parameter defined in the test plan, which is shared. We only need to change one place when the regex changes.
They are talking about saving the Workbench in a future version of JMeter. Once that's done, it's trivial to add some UI to refer to the element in the Workbench.
Module controllers are useful for executing the same samples in different thread groups.
It is possible to use the same assertions in multiple thread groups very easily.
At your Test Plan level, create a set of User Defined Variables with names like "Expected_Result_x". Then, in your response assertion, simply reference the variable name ${Expected_Result_x}. You would still need to add the assertion manually to every page you want a particular assertion on, but now you only have to change it in one place if the assertion changes.

ASP.NET MVC TDD with LINQ and SQL database

I am trying to start a new MVC project with tests and I thought the best way to go would have 2 databases. 1 for testing against and 1 for when I run the app and use it (also test really as it's not production yet).
For the test database I was thinking of putting create table scripts and fill data scripts within the test setup method and then deleting all this in the tear down method.
I am going to be using Linq to SQL though and I don't think that will allow me to do this?
Will I have to just go the ADO route if I want to do it this way? Or should I just use a mock object and store data as an array or something?
Any tips on best practices?
How did Jeff go about doing this for Stack Overflow?
What I do is define an interface for a DataContext wrapper and use an implementation of the wrapper for the DataContext. This allows me to use an alternate, fake DataContext implementation in my tests (or mock it, if easier). This abstracts the database out of my unit tests completely. I found some starter code at http://andrewtokeley.net/archive/2008/07/06/mocking-linq-to-sql-datacontext.aspx, although I've extended it so that it handles the validation implementations on my entity classes.
I should also mention that I have a separate staging server for QA, so there is live testing of the entire system. I just don't use an actual database in my unit testing.
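The wrapper-plus-fake arrangement described above is not specific to LINQ to SQL. As a language-neutral illustration, here is a sketch in Python rather than C#; `ProductStore`, `FakeProductStore` and `display_name` are hypothetical names, and the real wrapper would expose whatever the DataContext exposes.

```python
from typing import Dict, Optional, Protocol


class ProductStore(Protocol):
    """What the code under test needs from the data layer."""

    def get_name(self, product_id: int) -> Optional[str]: ...

    def add(self, product_id: int, name: str) -> None: ...


class FakeProductStore:
    """In-memory stand-in used by unit tests instead of the real database."""

    def __init__(self) -> None:
        self._rows: Dict[int, str] = {}

    def get_name(self, product_id: int) -> Optional[str]:
        return self._rows.get(product_id)

    def add(self, product_id: int, name: str) -> None:
        self._rows[product_id] = name


def display_name(store: ProductStore, product_id: int) -> str:
    # Logic under test; it only sees the interface, never a connection.
    name = store.get_name(product_id)
    return name.upper() if name is not None else "(unknown)"
```

A unit test constructs `FakeProductStore`, seeds it with `add`, and passes it to `display_name`; the production code would pass an implementation backed by the real data context.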
I checked out the link from tvanfosson and RikMigrations and after playing about with them I prefer the mocking datacontext method best. I realised I don't need to create tables and drop them all the time.
After a little more research I found Stephen Walther's article http://stephenwalther.com/blog/archive/2008/08/17/asp-net-mvc-tip-33-unit-test-linq-to-sql.aspx which to me seems easier and more reliable.
So I am going with this implementation.
Thanks for the help.
You may want to find some other way around actually hitting the database for your unit tests because it takes a lot more time. That being said, have you considered using Migrations for creating / deleting your tables instead of using sql scripts? RikMigrations is what I have been using to create my database so I can easily revision all of my code in one place. Justin Etheredge has a great article on using RikMigrations.
Consider these methods on DataContext:
http://msdn.microsoft.com/en-us/library/system.data.linq.datacontext.createdatabase.aspx
http://msdn.microsoft.com/en-us/library/system.data.linq.datacontext.executecommand(v=VS.100).aspx
I agree with much of the above, relating to unit testing. However, I think it's important to raise the point that using Mock Repositories and unit tests doesn't give you the same level of tests as a DB Integration Test would.
For example, our databases often have cascading deletes built right in to the schema. In this case, deleting a primary entity in an aggregate will automatically delete all child entities. However, this would not automatically apply in a mocked repository that was not backed up by a physical database with these business rules (unless you built all of those rules in to the Mock). This is important because if somebody comes along and changes the design of my schema, I need it to break my tests so I can adjust the code/schema accordingly. I appreciate that this is Integration Testing and not Unit Testing but thought it was worth mentioning.
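To make the cascading-delete point concrete, here is a small sketch of a check that only a real database can answer. It uses Python with an in-memory SQLite database purely for illustration (not SQL Server or LINQ to SQL); the `orders` and `order_lines` tables are hypothetical.

```python
import sqlite3


def open_db():
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE order_lines ("
        " id INTEGER PRIMARY KEY,"
        " order_id INTEGER NOT NULL REFERENCES orders(id) ON DELETE CASCADE)"
    )
    return conn


def child_rows_after_parent_delete():
    conn = open_db()
    conn.execute("INSERT INTO orders (id) VALUES (1)")
    conn.executemany("INSERT INTO order_lines (order_id) VALUES (?)", [(1,), (1,)])
    # Deleting the parent row should remove the children through the schema rule.
    conn.execute("DELETE FROM orders WHERE id = 1")
    return conn.execute("SELECT COUNT(*) FROM order_lines").fetchone()[0]
```

If someone removes `ON DELETE CASCADE` from the schema, this check starts failing, which is exactly the signal a mocked repository would not give.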
My preferred option is to create a Master Design Database that contains sample data (the same sort of data you would create in your Mocks). At the start of each test run, I have an automated script that creates a backup of the MasterDB and restores it to "TestDB" (which all my tests use). That way, I maintain a repository of clean test data in Master that recreates itself upon each test run. My tests can play around with the data and test out all the scenarios needed.
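The copy-master-to-test step can be sketched as follows. This is an illustration in Python using SQLite's online backup through the standard sqlite3 module (Python 3.7+), not the SQL Server backup/restore script described here; the `customers` table and the helper names are hypothetical.

```python
import sqlite3


def make_master():
    # Stand-in for the Master Design Database with its clean sample data.
    master = sqlite3.connect(":memory:")
    master.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    master.executemany("INSERT INTO customers (name) VALUES (?)", [("Ann",), ("Bob",)])
    master.commit()
    return master


def fresh_test_db(master):
    # Copy the whole master database into a new, throwaway test database.
    test_db = sqlite3.connect(":memory:")
    master.backup(test_db)
    return test_db


def count(conn):
    return conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
```

Tests are free to modify or delete rows in the copy returned by `fresh_test_db`; the master stays untouched and the next run starts from the same sample data.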
When I debug the application, I have another script that backs up and restores the Master DB to a DEV database. I can play around with data here too without worrying about losing my sample data. I don't typically run this particular script every session because of the delay waiting for the DB to be recreated. I may run it once a day and then play around/debug the app throughout the day. If for example, I delete all the records from a table as part of my debugging, I would run the script to recreate the DevDB when I'm done.
These steps sound like they would add a huge amount of time to the process, but actually - they don't. Our application currently has in the region of 3500 tests, with about 3000 of them accessing the DB at some point. The database backup and restore typically takes around 10-12 seconds at the start of each test run. And since the whole test suite is only executed upon TFS checkin, we don't mind if we have to wait a while longer anyway. On an average day, our entire test suite takes about 15-20 minutes to run.
I appreciate and accept that integration testing is much slower than unit testing (because of the inherent need to use a real DB) but it more closely represents the 'real world' app. For example, Mock Repositories don't return DB error codes, they don't time out, they don't lock up, they don't run out of disk space, etc.
Unit tests are ok for simple calculations, basic business rules, etc. and certainly they are absolutely the best choice for most operations that don't involve DB (or other resource) access. But I don't think they are as valuable as integration tests - people talk a lot about unit tests, but little is said about integration tests.
I expect those passionate about unit tests will be sending flames my way for this. That's fine - I'm just trying to bring some balance and to remind people that projects that are full of passed unit tests can still fail badly the moment you implement them in the field.
This article gives an example of mocking LINQ to SQL with Typemock.
http://blog.benhall.me.uk/2007/11/how-to-unit-test-linq-to-sql-and.html