NHibernate - Usefulness - nhibernate

I work at a software and hardware development firm. Today one of my colleagues told me that NHibernate is only useful for small projects, and that it must be avoided for complex or large-scale projects. They also said it makes code harder to change.
Are those statements true?

eBay uses Hibernate (the Java library that NHibernate is ported from). I don't consider that a small project.
As far as changing code goes, consider this: Let's assume we need to add a new property to an object.
Here is what has to be done with a hand-rolled data access layer:
Add the column to the db table.
Change every stored procedure that deals with that object / table. This is usually several stored procedures in my experience.
Change the code in the mapping layer.
Add the property to the object.
Here is what has to be done with NHibernate:
Add the column to the db table.
Add the property to the HBM file
Add the property to the object.
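To make the comparison concrete, here is a minimal sketch of why the mapping-based route needs fewer edits. It is written in Python with sqlite3 rather than NHibernate, and the mapping dictionary, table name and helper functions are all made up for illustration; the dictionary merely plays the role of the HBM file. Adding the email property touches the table, the mapping and the class, while the generic insert and load code stays untouched:

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Person:
    id: int
    name: str
    email: str  # the newly added property


# Hypothetical mapping (stands in for the HBM file): property -> column.
PERSON_MAPPING = {
    "table": "person",
    "columns": {"id": "id", "name": "full_name", "email": "email"},
}


def insert(conn, obj, mapping):
    """Build the INSERT from the mapping instead of hand-written SQL."""
    props = list(mapping["columns"])
    cols = ", ".join(mapping["columns"][p] for p in props)
    marks = ", ".join("?" for _ in props)
    conn.execute(
        f"INSERT INTO {mapping['table']} ({cols}) VALUES ({marks})",
        [getattr(obj, p) for p in props],
    )


def load_all(conn, cls, mapping):
    """Build the SELECT from the same mapping and rebuild the objects."""
    props = list(mapping["columns"])
    cols = ", ".join(mapping["columns"][p] for p in props)
    rows = conn.execute(f"SELECT {cols} FROM {mapping['table']} ORDER BY id")
    return [cls(**dict(zip(props, row))) for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, full_name TEXT, email TEXT)"
)
insert(conn, Person(1, "Ada", "ada@example.com"), PERSON_MAPPING)
people = load_all(conn, Person, PERSON_MAPPING)
```

A real ORM does far more than this (caching, change tracking, relationships), but the maintenance argument is the same: the SQL is derived from one mapping rather than repeated in several stored procedures.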

Have to agree with Daniel Augur on the first point.
On the second, "does it make code harder to change?", I'll provide a general view. Any time you use something ready-rolled you're going to run into restrictions that might not be easy to overcome. Even when the source is available, you may not wish to modify it for fear of deviating to the point of a breaking change.
Part of a software developer's job is determining whether the merits outweigh the drawbacks with 3rd party code.


Model validation logic in the database via constraints. Good idea, bad idea, or not worth it?

It's always rubbed me the wrong way to write code in my model's clean method to validate various constraints on the data when these same constraints aren't also present in the database.
After all, the database already has constraints for some of my data, like NOT NULL.
So, in my most recent project I've been writing raw SQL migrations that ADD CONSTRAINT some_logic, matching whatever logic I have in my clean() method.
It works OK, but it isn't an insignificant task to remember to add these constraints, add tests for these migrations, and update them when my model changes. Also, of course, I'm violating DRY by writing code in two places to do the same thing.
Should I give up this quixotic quest?
This is by no means a comprehensive answer, but at least I wanted to give my opinion.
There have been many frameworks that pushed the idea of removing the constraints from the database in order to check them at the application level. The idea seemed nice to me at first (in the early 2000s), but after some years I came to the (very personal) conclusion that this is a bad idea.
I think, to me it boils down to two things:
Data survives much longer than the applications. Whole systems go obsolete, but the data survives many more years. Sometimes the application is replaced, but the database is still the same one.
The application is not as reliable when it comes to validating data. I'm talking about programming defects here. One version of the app may work well and then the next one has a bug. It may be that one developer leaves the company, then the replacement -- who doesn't know as much -- changes the app with disastrous consequences. All that time a simple database constraint (that is usually very cheap to implement) could have enforced data quality.
Yep, I'm a fan of strict database constraints. Nevertheless, this doesn't mean I'm against application validations. These can show much nicer error messages.
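The second point can be illustrated with a small sketch. It uses Python's sqlite3 module instead of Django so that it is self-contained; the product table, the price rule and the two save functions are invented for the example. The first "release" validates in the application, the second one forgets to, and the CHECK constraint still rejects the bad row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE product (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        price NUMERIC NOT NULL CHECK (price >= 0)
    )
    """
)


def validate_price(price):
    """Application-level check: can produce a friendly message."""
    if price < 0:
        raise ValueError("Price must not be negative.")


def save_product_v1(name, price):
    """First release: validates before writing."""
    validate_price(price)
    conn.execute("INSERT INTO product (name, price) VALUES (?, ?)", (name, price))


def save_product_v2(name, price):
    """Later release that forgot the validation call (a programming defect)."""
    conn.execute("INSERT INTO product (name, price) VALUES (?, ?)", (name, price))


save_product_v1("widget", 10)

try:
    save_product_v2("gadget", -5)
    rejected_by_database = False
except sqlite3.IntegrityError:
    # The constraint is the last line of defence when the app check is missing.
    rejected_by_database = True

row_count = conn.execute("SELECT COUNT(*) FROM product").fetchone()[0]
```

The two checks are not redundant in practice: the application check gives the user a readable message, and the constraint guarantees the rule regardless of which code path wrote the row.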
If writing too much logic in clean() feels dirty, an in-between solution would be to use Django's built-in validators directly on your model fields.
The validation logic isn't saved in the database, but it is tracked in migrations. Like clean() logic, Validators require you to call Model.clean_fields(), but a ModelForm does this automatically.
You can also dig into django-db-constraints. The library might help do what you're looking to do, and the source code might help you roll a solution that fits your needs.

How to quickly analyse the impact of a program change?

Lately I needed to do an impact analysis on changing the column definition of a widely used DB table (like PRODUCT, USER, etc.). I find it a very time-consuming, boring and difficult task. Is there any known methodology for doing this?
The question also applies to changes to applications, file systems, search engines, etc. At first, I thought this kind of functional relationship should be documented in advance or somehow kept track of, but then I realized that everything can change, so it would be impossible to do so.
Sure. One can technically at least know what code touches the DB column (reads or writes it), by determining program slices.
Methodology: Find all SQL code elements in your sources. Determine which ones touch the column in question. (Careful: SELECT * may touch your column, so you need to know the schema.) Determine which variables read or write that column. Follow those variables wherever they go, and determine the code and variables they affect; follow all those variables too. (This amounts to computing a forward slice.) Likewise, find the sources of the variables used to fill the column; follow them back to their code and sources, and follow those variables too. (This amounts to computing a backward slice.)
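Once the "what flows into what" relationships have been extracted, both slices reduce to graph reachability. The sketch below assumes that extraction has already been done and uses a hand-written, hypothetical set of def-use edges; real slicing tools also handle control dependence, aliasing and calls, which this toy version ignores:

```python
from collections import deque

# Hypothetical def-use edges: "a -> b" means a value flows from a to b.
FLOWS = {
    "ui.price_field": ["order.price"],
    "import.csv_price": ["order.price"],
    "order.price": ["PRODUCT.PRICE"],                     # code writing the column
    "PRODUCT.PRICE": ["report.total", "cart.subtotal"],   # code reading the column
    "cart.subtotal": ["invoice.amount"],
    "user.name": ["invoice.header"],                      # unrelated to the column
}


def reachable(start, edges):
    """Breadth-first search: everything reachable from `start`."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def reverse(edges):
    """Flip every edge so the same search walks backwards."""
    rev = {}
    for src, dsts in edges.items():
        for dst in dsts:
            rev.setdefault(dst, []).append(src)
    return rev


# Code affected BY the column, and code that FEEDS the column.
forward_slice = reachable("PRODUCT.PRICE", FLOWS)
backward_slice = reachable("PRODUCT.PRICE", reverse(FLOWS))
```

Everything outside both sets (here, user.name and invoice.header) can be left out of the review.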
All the elements of the slice potentially affect, or are affected by, a change. There may be conditions in the slice-selected code that are clearly outside the conditions expected by your new use case, and you can eliminate that code from consideration. Everything else in the slices you may have to inspect/modify to make your change.
Now, your change may affect some other code (e.g., a new place to use the DB column, or combine the value from the DB column with some other value). You'll want to inspect up and downstream slices on the code you change too.
You can apply this process for any change you might make to the code base, not just DB columns.
Manually this is not easy to do in a big code base, and it certainly isn't quick. There is some automation available for C and C++ code, but not much for other languages.
You can get a bad approximation by running test cases that involve your desired variable or action, and inspecting the test coverage. (Your approximation gets better if you also run test cases you are sure do NOT cover your desired variable or action, and eliminate all the code they cover.)
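In set terms, this approximation is a union followed by a difference. The sketch below assumes you have already exported per-test coverage as sets of (file, line) pairs; the test names and coverage data are invented:

```python
# Hypothetical coverage data: test name -> set of (file, line) it executed.
COVERAGE = {
    "test_update_price": {("product.py", 10), ("product.py", 11), ("db.py", 5)},
    "test_price_report": {("report.py", 3), ("product.py", 10), ("db.py", 5)},
    "test_login": {("auth.py", 7), ("db.py", 5)},
}


def approximate_impact(coverage, touching, not_touching):
    """Lines run by tests that touch the feature, minus lines also run by
    tests that are known not to touch it (shared plumbing)."""
    touched = set().union(*(coverage[t] for t in touching))
    untouched = set().union(*(coverage[t] for t in not_touching))
    return touched - untouched


impact = approximate_impact(
    COVERAGE,
    touching=["test_update_price", "test_price_report"],
    not_touching=["test_login"],
)
```

Here the shared db.py line drops out because the login test also executes it. The result is only as good as the tests: code that no test exercises never shows up.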
Ultimately this task cannot be automated or reduced to an algorithm; otherwise there would be a tool to preview refactored changes. The better the code was written in the beginning, the easier the task.
Let me explain how to reach the answer: isolation is the key. Mapping everything to object properties can help you automate your review.
I can give you an example. If you can manage to map your specific case to the below, it will save your life.
The OR/M change pattern
Like Hibernate or Entity Framework...
A change to a database column can be previewed simply by analysing which code uses a certain object's property. Since all DB columns are mapped to object properties, and assuming no code uses raw SQL, you are good to go with your estimates.
This is a very simple pattern for change management.
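As a rough illustration of what an IDE's "find usages" does for a mapped property, the sketch below walks a syntax tree and lists every read and write of one attribute name. It uses Python's ast module on a made-up source snippet; it matches by name only, so it assumes the property name is unique across classes, which a type-aware IDE does not need to assume:

```python
import ast

# Made-up code base; in practice you would read your source files.
SOURCE = '''
def total(order):
    return order.unit_price * order.quantity

def label(product):
    return product.name

def discount(order):
    order.unit_price = order.unit_price * 0.9
'''


def find_property_uses(source, prop):
    """Return (line, 'read' | 'write') for every access to `.prop`.

    Anything that is not an assignment target is reported as a read.
    """
    uses = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Attribute) and node.attr == prop:
            kind = "write" if isinstance(node.ctx, ast.Store) else "read"
            uses.append((node.lineno, kind))
    return sorted(uses)


uses = find_property_uses(SOURCE, "unit_price")
```

The list of hits is the starting point for the estimate: each one is a place that may need to change when the mapped column changes.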
In order to reduce a file system/network or data file issue to the above pattern you need other software patterns implemented. I mean, if you can reduce a complex scenario to a change in your objects' properties, you can leverage your IDE to detect the changes for you, including code that needs a slight modification to compile or needs to be rewritten altogether.
If you want to manage a change in a remote service when you initially write your software, wrap that service in an interface, so you will only have to modify its implementation.
If you want to manage a possible change in a data file format (e.g. a change of field length in a positional format, column reordering), write a service that maps that file to objects (like using the BeanIO parser).
If you want to manage a possible change in file system paths, design your application to use more runtime variables.
If you want to manage a possible change in cryptography algorithms, wrap them in services (e.g. HashService, CryptoService, SignService).
If you do the above, your manual requirements review will be easier: the overall task is still manual, but it can be aided with automated tools. You can try to change the name of a class's property and see its side effects in the compiler.
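A minimal sketch of the cryptography case follows. Only the name HashService comes from the list above; the method name, the two implementations and the calling function are assumptions made for the example. The point is that callers never mention a concrete algorithm, so swapping it is a one-line change at the place where the service is created:

```python
import hashlib
from typing import Protocol


class HashService(Protocol):
    """The interface the rest of the application depends on."""

    def digest(self, data: bytes) -> str: ...


class Sha256HashService:
    def digest(self, data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()


class Sha512HashService:
    def digest(self, data: bytes) -> str:
        return hashlib.sha512(data).hexdigest()


def fingerprint_document(text: str, hasher: HashService) -> str:
    """Caller code: knows the interface only, never hashlib."""
    return hasher.digest(text.encode("utf-8"))


# Changing the algorithm means changing which implementation is passed in.
old = fingerprint_document("hello", Sha256HashService())
new = fingerprint_document("hello", Sha512HashService())
```

The impact analysis for "change the hash algorithm" then shrinks to the implementations of the interface plus any stored hashes that need migrating.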
Worst case
Obviously, the worst case is having to change the name, type and length of a specific column in software with plain SQL hardcoded and scattered in multiple places around the code, where many tables have similarly named columns, there is no project documentation, and the total is 10000+ classes (I did say worst case, right?). Then you have no other way than manually exploring your project, using find tools but not relying on them.
And if you don't have a test plan, which is the document from which you can hope to originate a software test suite, it will be time to make one.
Just adding my 2 cents. I'm assuming you're working in a production environment so there's got to be some form of unit tests, integration tests and system tests already written.
If yes, then a good way to validate your changes is to run all these tests again and create any new tests which might be necessary.
And to state the obvious, do not integrate your code changes into the main production code base without running these tests.
Then again, changes that worked fine in a test environment may not work in a production environment.
Have some form of source code configuration management system such as Subversion, Git or CVS.
This enables you to roll back your changes.

Micro ORM - maintaining your SQL query strings

I will not go into the details of why I am exploring the use of Micro ORMs at this stage - except to say that I feel powerless when I use a full-blown ORM. There are too many things going on in the background that happen automatically, and not all of them are the best possible choices. I was quite ready to go back to raw database access, but I found out about the three new guys on the block: Dapper, PetaPoco and Massive. So I decided to give the low-level approach a go with a pet project. It is not relevant, but so far, I am using PetaPoco.
In any case, I am having trouble deciding how to go about maintaining the SQL strings that I will use from the higher levels. There are three main solutions that I can think of:
Sprinkle the SQL queries wherever I need them. This is the least infrastructure heavy method. However, it suffers in both maintainability and testability areas.
Limit the query usage to some service classes. This helps maintainability and is still low on the infrastructure I need to implement. It may also be possible to build these service classes so that they are easy to mock for testing purposes.
Prepare some classes to make the system somewhat flexible. I have started on this path. I implemented a Repository interface, and a database-dependent Repository class. I have also built some tiny interfaces to capture SQL queries that can be passed to my Repository's GetMany() method. All the queries are implemented as individual classes right now, and I will probably need a little more interface around this to add some level of database independence - and maybe for some flexibility in decorating queries into paged and sorted queries (again, this would also make them a little bit more flexible in handling different databases).
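For readers who want to picture the third option, here is one possible shape, sketched in Python with sqlite3 rather than C# and PetaPoco. The Query class, the paged decorator and the table are assumptions made for the example; only the idea of a repository with a GetMany-style method comes from the description above. Each query keeps its explicit SQL, and paging is layered on without the caller rewriting it:

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """An explicit, hand-written SQL statement plus its parameters."""
    sql: str
    params: tuple = ()


def users_by_city(city):
    return Query("SELECT id, name FROM users WHERE city = ? ORDER BY id", (city,))


def paged(query, page, page_size):
    """Decorate a query with paging (SQLite syntax, pages start at 1).

    Assumes the wrapped SQL has an ORDER BY and no LIMIT of its own.
    """
    return Query(
        f"{query.sql} LIMIT ? OFFSET ?",
        query.params + (page_size, (page - 1) * page_size),
    )


class Repository:
    """The only class that talks to the database."""

    def __init__(self, conn):
        self._conn = conn

    def get_many(self, query):
        return self._conn.execute(query.sql, query.params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO users (name, city) VALUES (?, ?)",
    [("Ann", "Oslo"), ("Bob", "Oslo"), ("Cid", "Rome"), ("Dee", "Oslo")],
)

repo = Repository(conn)
all_oslo = repo.get_many(users_by_city("Oslo"))
second_page = repo.get_many(paged(users_by_city("Oslo"), page=2, page_size=2))
```

The dialect-specific part is confined to the decorator, which is where a second database would need its own variant.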
What I am mainly worried about right now is that I have entered the slippery slope of writing all the functions needed for a full-blown ORM, but badly. For example, it feels sensible right now that I write or find a library to convert LINQ calls into SQL statements so that I can massage my queries easily or write extenders that can decorate any query I pass to it, etc. But that is a large task, and is already done by the big guys, so I am resisting the urge to go there. I also want to retain control over what queries I send to the database - by explicitly writing them.
So what is the suggestion? Should I go with option #2, or try to stumble along with option #3? I am certain I cannot show any code written in the first option to anyone without blushing. Is there any other approach you can recommend?
EDIT: After I asked the question, I realized there is another option, somewhat orthogonal to these three: stored procedures. There seem to be a few advantages to putting all your queries inside the database as stored procedures. They are kept in a central location and not spread through the code (though maintenance is an issue - the parameters may get out of sync). The reliance on database dialect is solved automatically: if you move databases, you port all your stored procedures, and you are done. And there are also the security benefits.
With the stored procedure option, alternatives 1 and 2 seem a little bit more suitable. There do not seem to be enough entities to warrant option 3 - but it is still possible to separate the procedure call commands from the database access code.
I've implemented option 3 without stored procedures, and option 2 with stored procedures, and it seems like the latter is more suitable for me (in case anyone is interested in the outcome of the question).
I would say put the SQL where you would have put the equivalent LINQ query, or the SQL for DataContext.ExecuteQuery. As for where that is... well, that is up to you and depends on how much separation you want. - Marc Gravell, creator of Dapper
See Marc's opinion on the matter
I think the key point is, you shouldn't really be re-using the SQL. If your logic is re-used then it should be wrapped in a method that can then be called from multiple places.
I know you've accepted your answer already but I still wanted to show you a nice alternative that may be helpful in your case as well. Now or in the future.
When using stored procedures it's wise to use T4
I tend to use stored procedures on my project even though it's not using PetaPoco, Dapper or Massive (project started before these were here). It uses BLToolkit instead. Anyway. Instead of writing my methods to run stored procedures and write code to provide stored procedure parameters, I've written a T4 template that generates the code for me.
Whenever stored procedures change (some may be added/removed, parameters added/removed/renamed/retyped), my code will break on compilation because method calls will not match their signature any more.
I keep my stored procedures in a file (so they get version controlled). If you work in a multi-developer team it may be sensible to keep each stored procedure in its own file. It makes updates much less painful. I've experienced that on some projects and it worked OK as long as the number of SPs is not huge. You can organize them into folders based on the entity they're related to.
Anyway. Maintenance happens in the stored procedures; the code change is just a click of a button in Visual Studio that converts all T4s at once. You don't have to search for the methods that use those procedures. Errors will be reported at compile time. One thing less to worry about.
So instead of writing
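    using (var db = new DbManager())
    {
        return db
            // procedure name is assumed; the original call was lost in extraction
            .SetSpCommand("Person_SaveWithRelations",
                db.Parameter("@Name", name),
                db.Parameter("@Email", email),
                db.Parameter("@Birth", birth),
                db.Parameter("@ExternalID", exId))
            .ExecuteObject<Person>();
    }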
and having a bunch of magic strings I can just simply write:
    using (var db = new DataManager())
        return db
            .SaveWithRelations(name, email, birth, exId);
This is nicer and cleaner, breaks at compile time, and provides IntelliSense, so it's also faster to write while developing.
The good thing is that stored procedures may become very complex and may do many things. In my example above I check some data, insert a person record and some related ones as well, and in the end return the newly inserted Person record. Inserts and updates should usually return the data that was added/changed to reflect the actual state.
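The generation idea itself is independent of T4. As a loose analogy only, sketched in Python purely for brevity, the snippet below turns stored-procedure metadata into wrapper functions with real parameter lists. The procedure names, the parameter lists, the call_procedure method and the FakeDb class are all hypothetical; a real template would read the metadata from the database or from the versioned .sql files and write a source file during the build:

```python
# Hypothetical stored-procedure metadata (name -> parameter names).
PROCEDURES = {
    "Person_SaveWithRelations": ["name", "email", "birth", "ex_id"],
    "Person_Delete": ["person_id"],
}

# One wrapper per procedure; the magic strings live only in generated code.
TEMPLATE = '''def {func}(db, {args}):
    return db.call_procedure("{proc}", {{{pairs}}})
'''


def generate_wrappers(procedures):
    """Render one wrapper function per stored procedure."""
    chunks = []
    for proc, params in sorted(procedures.items()):
        chunks.append(TEMPLATE.format(
            func=proc.lower(),
            proc=proc,
            args=", ".join(params),
            pairs=", ".join(f'"@{p}": {p}' for p in params),
        ))
    return "\n".join(chunks)


class FakeDb:
    """Stand-in for a real data access object; just echoes the call."""

    def call_procedure(self, name, params):
        return (name, params)


generated_source = generate_wrappers(PROCEDURES)
namespace = {}
exec(generated_source, namespace)  # a build step would write a file instead

result = namespace["person_savewithrelations"](
    FakeDb(), "Ada", "ada@example.com", "1815-12-10", 7
)

# After a procedure's parameters change and the code is regenerated,
# stale call sites stop matching the signature.
try:
    namespace["person_delete"](FakeDb())  # missing person_id
    mismatch_detected = False
except TypeError:
    mismatch_detected = True
```

In C# the mismatch surfaces at compile time, which is the benefit described above; in Python it would surface at call time or under a static type checker.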

How does EF4 compare with NHibernate?

Is it any better? I heard about the Code-First extension, but is it ready for prime time? Please share your experience with development, any performance overheads, etc.
I think this is a timely question, as I was wondering the exact same thing. I am trying to create a serious e-commerce model and I am trying to keep my POCOs free of persistence concerns as well as trying to stay true to Domain Driven Design. So far, I am very wary, and I am on the fence about whether I should jump ship to NHibernate. The only thing keeping me from doing so is that I assume that Microsoft will improve (and quickly).
Some of the biggest problems so far:
Inability to finely control object materialization. EF calls the zero-arg constructor on your POCO, and this is a behavior you cannot change.
No enum support. The community has been screaming -- screaming! -- for this, and it hasn't happened. The workarounds are terrible, and pollute your domain model.
Weird mapping bugs when trying to control column names and relationships in the database. The main ones I can think of are with compound keys and many-to-many relationships. These can be worked around, and I assume these will be fixed by release time, but they are frustrating nonetheless.
Bad SQL. I also do DBA work, and the SQL that EF generates (with or without Code-First) is atrocious.
And this is just the tip of the iceberg: I am only starting to learn EF4 and I'm running into awful roadblocks. As I think of more reasons, I'll add them here. I'm still struggling through it.
(I wonder whether the community will give it another vote of "no confidence.")
To add to the "Weird mapping bugs" problem: You cannot control the name of a column if it participates in a self-referencing relationship (for example, if you have a hierarchy). I assume this will be fixed in the final release.
Lack of batching, resulting in multiple roundtrips to the database. For example, how do you delete a bunch of items from a collection? Load all entities into memory and delete them one at a time. A smaller gripe is the number of DB hits when inserting into tables that participate in an inheritance relationship.
No intelligent way to deal with model changes. EF Code-First loves to completely drop your entire database if it needs to change the schema.
Few extensibility points. You can literally count on one hand the number of events that EF4 allows you to subscribe to (and Code-First doesn't provide much more).
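The batching complaint in the list above is easier to see with numbers. The sketch below does not use EF at all; it is a plain sqlite3 comparison, with an invented item table, between the "load everything, delete one at a time" pattern and a single set-based DELETE, counting the statements each approach sends:

```python
import sqlite3


def make_db():
    """55 rows: 50 for order 1 and 5 for order 2."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, order_id INTEGER)")
    conn.executemany(
        "INSERT INTO item (order_id) VALUES (?)", [(1,)] * 50 + [(2,)] * 5
    )
    return conn


def count_rows(conn):
    return conn.execute("SELECT COUNT(*) FROM item").fetchone()[0]


def delete_one_at_a_time(conn, order_id):
    """Load every matching row, then issue one DELETE per row."""
    ids = [
        row[0]
        for row in conn.execute(
            "SELECT id FROM item WHERE order_id = ?", (order_id,)
        )
    ]
    statements = 1  # the SELECT above
    for item_id in ids:
        conn.execute("DELETE FROM item WHERE id = ?", (item_id,))
        statements += 1
    return statements


def delete_as_a_set(conn, order_id):
    """One statement, regardless of how many rows match."""
    conn.execute("DELETE FROM item WHERE order_id = ?", (order_id,))
    return 1


db_a, db_b = make_db(), make_db()
slow = delete_one_at_a_time(db_a, 1)
fast = delete_as_a_set(db_b, 1)
remaining = (count_rows(db_a), count_rows(db_b))
```

Both approaches leave the same rows behind; against a networked database, each extra statement is also an extra round trip.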
As for me - I prefer EF, but with some enhancements. Basically, EF offers you the following advantages:
Visual Model Editor
Database/Model Update wizard (instead of manual XML changes - which are terrible for me)
Also, I'm using third-party commercial tools based on EF and L2S (LinqConnect) that provide me with the following features:
Geography support
Optimized SQL generation
Full integration with Visual Studio
Smart database update wizard (synchronization mode)

Does an ORM integrate with existing applications or do I not understand?

Assume Hibernate for the ORM.
I'm not sure how to ask this. I want to build an application that can replace part of another. For example, say I have an application with various modules, called the "big" app. This application may handle HR, financial, purchases, skill sets, etc. But maybe, for whatever reason, I don't like the skill set module, but I like the rest of the application. I want to build an app that uses the same database that the rest of the "big" app uses but use my software as the front end for that piece.
I could build my app and have it hit the database directly with no ORM. My question is: is there an advantage to using an ORM here? I'm thinking there is, because if the "big" app goes away and another app is purchased, we could continue to use my version of skill set because I am using Hibernate instead of hitting things directly. I'm still learning, but I thought that my application would use objects that I named, and that in the case I just described I'd only have to change my mapping files and/or a little of my code.
Here is another example. I have a legacy application and legacy database. It uses database X. I decide that I no longer like the old terminal emulator application that is used to get the data and that I want a graphical version. I can use Hibernate with my application, and when I finally decide to get rid of the legacy database and change to the latest Oracle or SQL Server, can I do so with minimal headache? Or is my database going to change so much that it wouldn't have mattered anyway (I'm suggesting that upon changing to a new database, more information will want to be captured)?
I was hoping for comments, if I am misunderstanding why hibernate/ORM might or might not be a benefit.
Thank you.
I do not think you will get a huge benefit from Hibernate if the database schema changes to something completely different; you might have to change more than just your mapping, especially if more "structure" is added to the database (tables, columns and similar schema elements). That said, if the database is structured mostly the same way, and, let's say, just the column and table names change and a couple of tables are merged, you can get by with just changing your mapping.
But I would really recommend using Hibernate for database agnosticism; that is a pretty easy path.
And even though it doesn't exactly help you if your entire database changes, it has such an incredible number of other strengths that I would choose it over direct DB access most of the time.
Lastly, you could think about using a service layer, such as the repository pattern, that abstracts away the data access, so the business logic of your application wouldn't need to change if the database changes.
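A minimal sketch of that last suggestion, in Python rather than Java and Hibernate: the business function depends only on a small repository interface, so a schema change means writing a new implementation rather than touching the logic. The interface, the legacy table and column names, and the in-memory replacement are all invented for the example:

```python
import sqlite3
from typing import Protocol


class SkillRepository(Protocol):
    """What the skill set module needs from any data source."""

    def skills_for(self, employee_id: int) -> list[str]: ...


class LegacySqlSkillRepository:
    """Knows the legacy schema (hypothetical table and column names)."""

    def __init__(self, conn):
        self._conn = conn

    def skills_for(self, employee_id):
        rows = self._conn.execute(
            "SELECT SKL_NM FROM EMP_SKL WHERE EMP_NO = ? ORDER BY SKL_NM",
            (employee_id,),
        )
        return [row[0] for row in rows]


class InMemorySkillRepository:
    """Stand-in for a future schema or a different 'big' app."""

    def __init__(self, data):
        self._data = data

    def skills_for(self, employee_id):
        return sorted(self._data.get(employee_id, []))


def skill_summary(repo: SkillRepository, employee_id: int) -> str:
    """Business logic: depends on the interface, not on any schema."""
    skills = repo.skills_for(employee_id)
    return ", ".join(skills) if skills else "no skills recorded"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP_SKL (EMP_NO INTEGER, SKL_NM TEXT)")
conn.executemany(
    "INSERT INTO EMP_SKL VALUES (?, ?)", [(7, "welding"), (7, "cad")]
)

from_legacy = skill_summary(LegacySqlSkillRepository(conn), 7)
from_new = skill_summary(InMemorySkillRepository({7: ["welding", "cad"]}), 7)
```

An ORM would typically sit inside one of the implementations; the interface is what protects the rest of the application if the "big" app is replaced.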
Switching from one DBMS to another (e.g. Oracle to SQL Server) is one thing that using an ORM would certainly make much easier.
As for switching from one "big app" to another "big app", I doubt that using an ORM would help that much. It's likely that the database structure and business logic would be different enough that you would find yourself rewriting lots of code anyway.
You can generate domain objects with Hibernate Tools; if you do that, then it will be painless and fast. However, if you write all the objects by hand, you will die. I think it's a good idea to rewrite part of the app and get to know Hibernate better.
I think it's generally a bad idea to make any decision based on the unknowns versus the knowns. Whether you're deciding on a data access/persistence strategy, what car to buy, or what college to go to, you should put the most weight on the things you know you want today, rather than worrying about what may or may not happen tomorrow.
So when considering ORMs, I wouldn't worry too much about things such as apps "going away" or DBMSs changing (unless that's either already been talked about, or there's a history of this in your company). I'm not saying that these things will never happen, but rather that they should take a back seat to the generally much more important considerations of maintainability, performance, and developer productivity.
So in short, choose an ORM based on its ability to solve the problems and satisfy the requirements that you have today.