ORM: Handwritten schema or auto-generated? - orm

Should I use a hand-written schema for my projected developed in a high-level language (such as Python, Ruby) or should I let my ORM solution auto-generate it?
Eventually I will need to migrate without destroying all the data. It's okay to be tied to a specific RDBMS but it would be nice if features such as constraints and procedures could be supported somehow.

I never go with ORM-generated schema.
I find that the ways in which the ORM wants to generate the schema are often at total odds with how I want my database to be structured. Also, and I know this is trivial, the nomenclature scheme is usually poor.
Database structure has its own constraints, that I find that usually the ORM autogeneration tools don't consider fully. And if you're going to be wanting to run reports on your database later (and you will), then having good database structure and design is very important.

See this Coding Horror article and links for discussion on that migration you'll eventually need to do. Plan for it now.
Also see Martin Fowler on database evolution; I particularly recommend the notion that test data generation is part of database set-up. The idea may be a little underdeveloped, in that there is not a clear delineation of the different problems in different environments, development versus QA versus production.

Let the ORM generate the schema it wants. Then you can always change things that are too slow or that you want differently. But it allows you to quickly get started and have something working plus the ORM people usually know what they do when it comes to generating schemas.

Let your ORM solution generate it, but don't just blindly use it; read through it and sanity-check it.

Related

For a data analytical web app, would it be better to use django's auto-generated relational objects or raw SQL queries?

I am making an application that will serve as a demonstration of the data analysis capabilities our team can provide. I am very new to django and a the auto-generated APIs seem pretty cool, but I worry about scalability and having the quickness that only a carefully constructed and queried database can provide. Has anyone been in this situation and regretted/been satisfied with there choice of raw queries vs django APIs?
Django's ORM is great. It is one of the most complete and easy to use ORM I have ever encounter, but as every ORM, it has its limitations.
If your application requires full control of the database and very efficient queries, you might consider other approaches and compare them to Django and see which one fits better. You'll have to do some research.
Django is great for developing very fast complicated database applications, but I'm sure --if the application grows long enough-- sooner or later you'll have to start working directly with your database engine for optimization reasons. ORMs are generic tools, so database engine specific functions will not be available.
There is no rule to decide wheter or not django is gonna work for you but, one thing I can tell you is that its ORM helps you get your app started very quick and if you find some specific circumstance where you need to customize your SQL, then you can do it in Django as well. If not, just create a Python module which handle the database as you like in those specific circumstances and use it from your Django code. That probably will be the best way if you need to show your very efficient data analysis capabilities.
I hope this bring some light, it is a very wide question. One thing I'm sure is that you won't regret until your app grows big enough and when it does, you'll have the resources to find great programmers that could twist Python to handle every specific situation that behaves odd with Django's database accessing tools.
Found this link which may be helpful.
Good luck!
I'd consider this a case of premature optimization.
Django ORM is good enough in a general case, provided that your database is reasonably designed, has appropriate indexes, etc.
In > 90% cases this will be adequate, and often optimal.
When you will have identified specific complicated and slow queries, have reviewed the ORM-generated SQL and came up with a better query, then you may add a special case for it.
Maybe you will have more than one such special case. I still think that ORM will save you a lot of legwork in the 90% of database access cases where it is adequate.
Besides querying, an ORM allows you to describe the DB schema, its constraints, ways to recreate it and migrate it between versions, etc. Even if the ORM would not let you query the DB, these management capabilities would be enough reason to use an ORM.

What do I need to know about databases in order to create a quality Django app?

I'm trying to optimize my site and found this nice little Django doc:
Database Access Optimization, which suggests profiling followed by indexing and the selection of proper fields as the starting point for database optimization.
Normally, the django docs explain things pretty well, even things that more experienced programmers might consider "obvious". Not so in this case. After no explanation of indexing, the doc goes on to say:
We will assume you have done the obvious things above.
Uhhh. Wait! What the heck is indexing?
Obviously I can figure out what indexing is via google, my question is: what is it that I need to know as far as database stuff goes in order to create a scalable website? What should I be aware of about the Django framework specifically? What other "obvious" things ought I know? Where can I learn them?
I'm looking to get pointed in a direction here. I don't need to learn anything and everything about SQL, I just want to be informed enough to build my app the right way.
Thanks in advance!
I encourage you to read all that the other answers suggest and whatever else you can find on the subject, because it's all good information to know and will make you a better programmer.
That said, one of the nice things about Django and other similar frameworks is that for the most part you don't have to know what's going on behind the scenes in the DB. Django adds indexes automatically for fields that need them. The encouragement to add more is based on the use cases of your app. If you continually query based on one particular field, you should ensure that that field is indexed. It might be already (if it's a foreign key, primary key, etc.), but other random fields typically aren't.
There's also various optimizations that are database client-specific. Django can't do much here because it's goal is to remain database independent. So, if you're using PostgreSQL, MySQL, whatever, read about optimizations and best practices concerning those particular clients.
Wikipedia database design, and database normalization http://en.wikipedia.org/wiki/Database_design, and http://en.wikipedia.org/wiki/Database_normalization are two very important concepts, in addition to indexing.
In addition to these, having a basic understanding of your database of choice is necessary. Being able to add users, set permissions, and create a database are key things that you should know.
Learning how to backup your data is also a crucial thing.
The list keeps getting longer, one should also be aware of the db relationships that django handles for you, OneToOne, ManyToMany, ManyToOne. https://docs.djangoproject.com/en/dev/topics/db/models/
The performance impact of JOINs shouldn't be ignored. Access model properties in django is so easy, but understanding that some of Foreign Key relationships could have huge performance impacts is something to consider too.
Once you have a basic understanding of these things you should be at a pretty good starting point for creating a non-trivial django app!
Wikipedia has a nice article about database indexes, they are similar(ish) to an index in a book i.e. lets you (the computer) find things faster because you just look at the index (probably a very bad example :-)
As for performance there are many things you can do and presumably as it is a very detailed subject in itself, and is something that is particular to each RDBMS then it would be distracting / irrelevant for them (django) to go into great detail. Best thing is really to google performance tips for your particular RDBMS. There are some general tips such as indexing, limiting queries to only return the required data etc.
I think one of the main things is a good design, sticking as much as possible to Normal Form and in general actually taking your database into consideration before programming your models etc (which clearly you seem to be doing). Naming conventions are also a big plus, remembering explicit is better then implicit :-)
To summarise:
Learn/understand the fundamentals such as the relational model
Decide on a naming convention
Design your database perhaps using an ERM tool
Prefer surrogate ID's
Use the correct data type of minimum possible size
Use indexes appropriately and don't over index
Avoid unecessary/over querying
Prioritise security and stability over raw performance
Once you have an up and running database 'tune' the database analysing/profiling settings, queries, design etc
Backup and archive regularly - cron
Hang out here :-)
If required advance into replication (master/slave - django supports this quite well too)
Consider upgrading your hardware
Don't get too hung up about it

Migrating procedural, antique CRUD code and proprietary DBMS to OO ORM on SQL

Please excuse my long-winded explanation, but I wanted to be as explicit as possible in the hopes of getting as much useful feedback on my situation as possible. You can skip to the questions at the bottom if you are impatient.
Explanation
At my current job, development is done in an antiquated language that is hard-wired to a proprietary DBMS that comes with the language. The language is CRUD-focused, and is essentially a glorified database querying/reporting/updating language with some programming features bolted on as an afterthought. Most programs are top-down procedures and there is very little code reuse; updating a record often requires updating many entangled, related records at the same time that you just need to "know about" as the proprietary database has no inherent foreign key relationships. If a table needs to be updated, we generally must grep our source code and update every procedure that creates/updates records for that table and recompile. I could go on with other annoyances, but needless to say, I am looking for a way to abstract away as much of this behavior as possible into reusable code segments.
The language has semi-recently added some support for object-oriented development, and I have been able to demonstrate the benefits of reusable code to my coworkers with a recent project written using OO constructs. However, my project was only possible because it was a rare task that did not require interacting with our database.
I have really been trying hard to find a way to create re-usable code using OO techniques with this language, but since everything is so database-focused, what I really need is a way to create container classes around our table designs, putting most of our data processing logic into class methods and merging N related tables into 1 singular class. This has brought me to the idea of ORM frameworks, which of course is non-existent on the language I am using at work.
What I have found, is that the DBMS for this language can run a SQL99 engine concurrently with the proprietary language engine, and it includes JDBC and ODBC drivers. This has opened the door for me to explore migration strategies, which is where I think we eventually need to go. Since the SQL engine runs concurrently with the old engine, it is possible for us to do an incremental migration, running new code alongside old code with an eventual goal of migrating our data to a "pure" SQL DBMS when all the old code is replaced.
I initially did quite a bit of reading and proposed Java (using JPA2 for ORM) to my manager, but I think I scared him as he views Java as being a bit heavyweight for our needs. I then did a little more digging and re-proposed Ruby using the JRuby interpreter (using either ActiveRecord or DataMapper for ORM), which was much better received as Rails seems to fit in well with the re-shifting of our development to Web-based front-ends that we are attempting to move to with our old cludgy code, and of course because the ability to interact with Java if the need arises is a great capability.
The Questions
Nearly all of the reading I have
been doing about ORM is focused on
starting with a class structure, and
creating the mapped database
structure as a secondary process.
Is going the other way around
(starting with an existing database
and mapping classes to it) a very
odd thing to do?
Assuming question #1 == true, how
flexible are existing ORM frameworks
such as JPA2, ActiveRecord,
DataMapper etc. to "imperfect" table
design? I am sure we will have to
do some refactoring of existing
table design, but would like to know
if I am undertaking a Herculean task
before I waste too much time on the
effort.
If anyone has a better idea for
language+ORM, I would love to hear
it. It must be SQL-ready using JDBC
or ODBC to fit into our incremental
migration plan.
If anyone has any experience on a similar effort and could point out any helpful resources (especially books), I would be very grateful!
Nearly all of the reading I have been doing about ORM is focused on starting with a class structure, and creating the mapped database structure as a secondary process. Is going the other way around (starting with an existing database and mapping classes to it) a very odd thing to do?
Not really. There are several approaches when dealing with the persistence layer of an application:
Top-down: You start with the object model and the mappings and you derive the database schema from that data.
Bottom-up: You start with your data model i.e. the database schema and you derive the object model and the mappings from the tables.
Middle-out: You start with the mapping and you generate the object model and the tables.
Meet-in-the-middle: You start with an existing database schema and an existing object model, you develop a mapping to map between the two (you can even introduce an additional object layer and brige the existing one).
The top-down approach is the most object-oriented but the meet-in-the-middle approach is probably the most common.
Assuming question #1 == true, how flexible are existing ORM frameworks such as JPA2, ActiveRecord, DataMapper etc. to "imperfect" table design? I am sure we will have to do some refactoring of existing table design, but would like to know if I am undertaking a Herculean task before I waste too much time on the effort.
I would say that JPA is not the most flexible, it will not deal very well with exotic or heavily denormalized schemas (the result might be ugly from an OO point of view). Accesses that don't go through JPA might also be a problem. A data mapper tool like iBatis (now mybatis) will give you more flexibility.
If anyone has a better idea for language+ORM, I would love to hear it. It must be SQL-ready using JDBC or ODBC to fit into our incremental migration plan.
I know that RoR can deal with existing databases, I'm just not sure what the result will look like. But I don't really have enough experience with RoR so I'll let experts elaborate on this.
If anyone has any experience on a similar effort and could point out any helpful resources (especially books), I would be very grateful!
I suggest to browse Scott Ambler website and his book(s):
The Process of Database Refactoring: Strategies for Improving Database Quality
More food for thought:
Working Effectively with Legacy Code by Michael Feathers
Clean Code by Robert Martin

What is the difference between NHibernate and iBATIS.NET?

I am looking for some up to date information comparing NHibernate and iBATIS.NET. I found some information searching Google, but a good bit of it applies either to the Java versions of these products or is dated.
Some specific things I am interested in:
Which is better if you control both the data model and the application?
iBATIS is repeatedly called simpler to learn - does this have long-term maintenance consequences (i.e. easy to start, hard to maintain)?
Do both make it easy to switch the underlying database vendor?
How skilled do your developers need to be with SQL?
Any major feature that one has that the other lacks?
Is either product more suitable for a particular type of application?
Real world examples of observed benefits and drawbacks are appreciated!
EDIT: Thanks for the information. I am doing my own evaluation as well. One thing I am wondering about still, does iBATIS help you to save/update complex object graphs? It seems like NHibernate is nice in that I can pass it a root object and it figures out the details of what, if anything, needs to be updated in the database.
I made some research a while ago.
One specific question from me, might give you some additional information:
Would you use NHibernate for a project with a legacy database, which is partly out of your control?
Some of your points of interest I can answer:
Which is better if you control both the data model and the application?
I can answer it the other way around: If you don't have control over the data model and thus facing some legacy database, iBatis is the better choice.
iBATIS is repeatedly called simpler to learn - does this have long-term maintenance consequences (i.e. easy to start, hard to maintain)?
It depends what you want to do with it. If you have a domain driven development approach then iBatis might get painful by time. If you just do simple data manipulation and don't have a full blown domain model then nHibernate might be a overkill by the time.
Do both make it easy to switch the underlying database vendor?
Both have mechanisms to shield you off from a specific database vendor, but I admit that have not done intense research in this direction.
How skilled do your developers need to be with SQL?
When you use iBatis, you need more SQL skills than NHibernate. Using iBatis you always need to code some SQL. NHibernate doesn't require you to code SQL statements -- it even can do the DDLs for you. Powerful features will require you to go to old good SQL, which will be inevitable.
Some other points:
I personally find that iBatis much more lightweighter. You can get things done very quickly. NHibernate is more powerful, but has much more features, which you can use in wrong way.
It is possible to combine the use of NHibernate and iBatis! You can use NHibernate for your business logic. For reporting purposes, where you just read data out of tables, fallback to iBatis.
If your application has a longer life cycle and a lot of business logic, consider NHibernate. It has a lot of feature aiding you in handle business objects.
The community around NHibernate is very active and come up with useful tools.
In a sense it's comparing apples to oranges.
Which is better if you control both the data model and the application?
They both work with normalized databases well, so they are more-or-less equal if you can shape the db. iBatis is better at mapping to legacy databases since it doesn't actually care about the database structure at all. It only cares about the shape of the result set.
.iBATIS is repeatedly called simpler to learn - does this have long-term maintenance consequences (i.e. easy to start, hard to maintain)?
It is much simpler, but that is because it has a much smaller featureset. I don't think it has any ticking timebomb long term maintenance issues.
Do both make it easy to switch the underlying database vendor?
Yes
How skilled do your developers need to be with SQL?
Both require a good knowledge of SQL. With iBatis, you still have to write the sql queries/procs. With NHibernate you have to know how to write NHibernate queries to get effective SQL. Neither are a replacement for SQL knowledge.
Any major feature that one has that the other lacks?
iBatis is a datamapper (a term used on the iBatis site). NHibernate is a full-blown Object Relational Mapper. iBatis is a great way to go if you primarily want something that takes the monotony out of mapping objects to result sets. However, it doesn't go all the way in trying to solve the object/relational mismatch. NHibernate has many more features such as dirty tracking, caching based on identity /identity map, flexible querying, dynamic sql, batching etc... NHibernate is much more dynamic in that it can do many things in one trip to the DB that could take iBatis several trips.
We recently posted an article comparing these two tools, and I think many of your questions are addressed. The article is here on our wiki site.

Is O/R Mapping worth it?

The expressiveness of the query languages (QL) provided with ORMs can be very powerful. Unfortunately, once you have a fleet of complex queries, and then some puzzling schema or data problem arises, it is very difficult to enlist the DBA help that you need? Here they are, part of the team that is evolving the database, yet they can't read the application QL, much less suggest modifications. I generally end up grabbing generated SQL out of the log for them. But then when they recommend changes to it, how does that relate to the original QL? The process is not round-trip.
So after a decade of promoting the value of ORMs, I am now wondering if I should be writing my SQL manually. And maybe all that I really want the framework to do is automate the data marshaling as much as possible.
Question: Have you found a way to deal with the round-trip issue in your organization? Is there a SQL-marshaling framework that scales well, and maintains easily?
(Yes, I know that pure SQL might bind me to the database vendor. But it is possible to write standards-compliant SQL.)
I think that what you want is a solution that maximizes the benefits of ORM without preventing you using other means. We have much the same issue as you do in our application; very heavy queries, and a large data model. Given the size of the data model, ORM is invaluable for the vast majority of the application. It allows us to extend the data model without having to go to a great deal of effort hand-maintaining SQL scripts. Moreover, and you touched on this, we support four database vendors, so the abstraction is nice.
However, there are instances where we've had to tune the queries manually, and since we chose a flexible ORM solution, we can do that too. As you say, it gets out of our way when we need it gone, and simply marshals objects for us.
So, in short (yep, short) yes, ORM is worth it, but like every solution to a problem, it's not a panacea.
In general, ORMs increase developer productivity a lot so I'd using them unless they've become a bigger problem than they're worth. If a majority of your tables are big enough that you are having a lot of problems, consider ditching the ORM. I would definitely not say that ORMs are a bad idea in general. Most databases are small enough and most queries are simple enough that they work well.
I've overcome that problem by using stored procedures or hand-written SQL only for the poorly performing queries. DBAs love stored procedures because they can modify them without telling you. Most (if not all) ORMs allow you to mix in hand written SQL or stored procedures.
todays O/R frameworks, as i believe you're familiar with, support the option of defining some queries manually ((N)Hibernate does). that can be used for complex parts of schemas, and for straight-forward parts use the ORM as provided by the framework.
another thing for you to check out might be the iBatis framework (http://ibatis.apache.org/). i haven't used it, but i've read that it's more close to SQL and people familiar with databases and SQL prefer it over full-blown ORM framework like hibernate, because it's closer to them than the completely different concept of ORM.