In the past, any database administrator worth his salt would have told you never to query against a table directly. In fact, they would have prevented it by putting all tables in one schema & cutting-off direct access to that schema...thereby forcing you to query or CRUD from views & procedures etc. And further, protecting the data with security-layers like this made sense from a security perspective.
Now enters Entity Framework...
We use Entity Framework now where I work. The IQueryable is King! Everything is back into the DBO schema. And, people are going directly to tables left-and-right because Repository patterns and online examples seem to encourage this very practice.
Does Entity Framework do something under the hood that makes this practice okay now?
If so, what does it do?
Why is this okay now?
Help me understand.
Does Entity Framework do something under the hood that makes this
practice okay now?
EF doesn't really change the dynamic. It is a convenience of data access and rapid development, not a better way to get data.
If so, what does it do?
It does not. It does, I think, at least avoid constructing SELECT * type queries.
Why is this okay now?
It remains a matter of opinion and culture. In general a "strict" DBA will want you to hit only exposed objects layered on top of the tables for CRUD. It is much easier to tune such queries and maintain control of performance if the application is using the expose custom objects rather than using an ORM or hand-coding direct queries.
IMO, ORM are great for rapid protyping and basic stuff, but anything with more complex logic or substantial performance should be moved to custom built objects.
Where/when that lines is will vary substantially based on any number of things including:
Size/load of database
Availability of database professionals vs app
developers
Maturity of company
There are different types of Entity Framework (Code First, Database First, Model First). It appears you have an issue with Code First, which creates the database based on your classes and limits what you can do on the database side of things.
However with EF Database First, you can still do everything you did before. You can restrict access to your tables and expose Stored Procedures/Views for your CRUD operations. You would still benefit from EF, because of the strongly typed classes that are generated from your Views, and strongly typed methods generated from your Stored Procedures.
Now everyone can be happy - you get to cut off access to the schema and IQueryable is still king
Is there any situation in which you would create a table called Node and another table called NodeInstance if you were designing the DB schema to serve as the persistence and CRUD layer for an entity called Node using a relational database?
This question has been posted to try to dissuade my colleagues from making costly mistakes during the design of the database schema whose purpose is to serve as the storage, persistence and CRUD data-layer for the backend of an iPad app I am working on and also to avoid creating bugs and issues which will be a maintenance nightmare in the future. Because I am under NDA, I cannot post any details regarding the exact nature of the project, except to say that the main entity that we are creating the CRUD layer on the server for, is called a Node. Therefore, I have recommended to my colleagues who are working on the backend to create a class called Node to represent the Node object and to insert rows in the Node table in the relational DB for the Create operation, with a new row representing every instance of the Node which is created using the client app, in true ORM fashion.
However, for some reason, my colleagues seem to think that having 1 table for persisting the Node objects is the wrong approach and the right approach is to create a Node table and a NodeInstance table and that maintaining 2 tables in parallel to manage the persistence of the Node entity is more efficient / performant. Since I am a bit of an ORM nerd and also a DB-schema geek, I have been trying to figure out if there is any planet in the known universe where using a 2 table approach to persist and perform CRUD on 1 entity, would be a good idea but in all scenarios, it seems that this adds code complexity, not to mention that it necessitates unnecessary SQL joins, multiple queries to maintain data-integrity, issues with atomic transactions, and concurrency issues. However, if there is anyone on stackoverflow that can help me understand why my colleagues approach is sane, then I would like to have an open mind and try to entertain the notion that my colleagues are right. However, at the time of writing this, I am convinced that my colleagues do not have a complete understanding of ORM and are therefore, thinking of using 2 tables for persisting 1 entity. Please do shine some wisdom on this matter, my respected stackoverflow peers and experts.
Is it any better? I heard the CodeFirst extension but is it ready for primetime. Please share your experience with development, any performance overheads, etc.
I think this is a timely question, as I was wondering the exact same thing. I am trying to create a serious e-commerce model and I am trying to keep my POCOs free of persistence concerns as well as trying to stay true to Domain Driven Design. So far, I am very wary, and I am on the fence about whether I should jump ship to NHibernate. The only thing keeping me from doing so is that I assume that Microsoft will improve (and quickly).
Some of the biggest problems so far:
Inability to finely control object materialization. EF calls the zero-arg constructor on your POCO, and this is a behavior you cannot change.
No enum support. The community has been screaming -- screaming! -- for this, and it hasn't happened. The workarounds are terrible, and pollute your domain model.
Weird mapping bugs when trying to control column names and relationships in the database. The main ones I can think of are with compound keys and many-to-many relationships. These can be worked around, and I assume these will be fixed by release time, but they are frustrating nonetheless.
Bad SQL. I also do DBA work, and the SQL that EF generates (with or without Code-First) is atrocious.
And this is just the tip of the iceberg: I am only starting to learn EF4 and I'm running into awful roadblocks. As I think of more reasons, I'll add them here. I'm still struggling through it.
(I wonder whether the community will give it another vote of "no confidence.")
More:
To add to the "Weird mapping bugs" problem: You cannot control the name of a column if it participates in a self-referencing relationship (for example, if you have a hierarchy). I assume this will be fixed in the final release.
Lack of batching, resulting in multiple roundtrips to the database. For example, how do you delete a bunch of items from a collection? Load all entities into memory and delete them one at a time. A smaller gripe is the number of DB hits when inserting into tables that participate in an inheritance relationship.
No intelligent way to deal with model changes. EF Code-First loves to completely drop your entire database if it needs to change the schema.
Few extensibility points. You can literally count on one hand the number of events that EF4 allows you to subscribe to (and Code-First doesn't provide much more).
As for me - I prefer EF but with some enhancements. Basically EF offers to you the following advantages:
Visual Model Editor
Database/Model Update wizard (instead of manual XML changes - what is terrible for me)
Also, I'm using 3-rd party commercial tools based on EF and L2S (LinqConnect) that provide for me the following features:
Geography support
Optimized SQL generation
Product absolutely integrated to Visual Studio
Smart database update wizard (synchronization mode)
Please excuse my long-winded explanation, but I wanted to be as explicit as possible in the hopes of getting as much useful feedback on my situation as possible. You can skip to the questions at the bottom if you are impatient.
Explanation
At my current job, development is done in an antiquated language that is hard-wired to a proprietary DBMS that comes with the language. The language is CRUD-focused, and is essentially a glorified database querying/reporting/updating language with some programming features bolted on as an afterthought. Most programs are top-down procedures and there is very little code reuse; updating a record often requires updating many entangled, related records at the same time that you just need to "know about" as the proprietary database has no inherent foreign key relationships. If a table needs to be updated, we generally must grep our source code and update every procedure that creates/updates records for that table and recompile. I could go on with other annoyances, but needless to say, I am looking for a way to abstract away as much of this behavior as possible into reusable code segments.
The language has semi-recently added some support for object-oriented development, and I have been able to demonstrate the benefits of reusable code to my coworkers with a recent project written using OO constructs. However, my project was only possible because it was a rare task that did not require interacting with our database.
I have really been trying hard to find a way to create re-usable code using OO techniques with this language, but since everything is so database-focused, what I really need is a way to create container classes around our table designs, putting most of our data processing logic into class methods and merging N related tables into 1 singular class. This has brought me to the idea of ORM frameworks, which of course is non-existent on the language I am using at work.
What I have found, is that the DBMS for this language can run a SQL99 engine concurrently with the proprietary language engine, and it includes JDBC and ODBC drivers. This has opened the door for me to explore migration strategies, which is where I think we eventually need to go. Since the SQL engine runs concurrently with the old engine, it is possible for us to do an incremental migration, running new code alongside old code with an eventual goal of migrating our data to a "pure" SQL DBMS when all the old code is replaced.
I initially did quite a bit of reading and proposed Java (using JPA2 for ORM) to my manager, but I think I scared him as he views Java as being a bit heavyweight for our needs. I then did a little more digging and re-proposed Ruby using the JRuby interpreter (using either ActiveRecord or DataMapper for ORM), which was much better received as Rails seems to fit in well with the re-shifting of our development to Web-based front-ends that we are attempting to move to with our old cludgy code, and of course because the ability to interact with Java if the need arises is a great capability.
The Questions
Nearly all of the reading I have
been doing about ORM is focused on
starting with a class structure, and
creating the mapped database
structure as a secondary process.
Is going the other way around
(starting with an existing database
and mapping classes to it) a very
odd thing to do?
Assuming question #1 == true, how
flexible are existing ORM frameworks
such as JPA2, ActiveRecord,
DataMapper etc. to "imperfect" table
design? I am sure we will have to
do some refactoring of existing
table design, but would like to know
if I am undertaking a Herculean task
before I waste too much time on the
effort.
If anyone has a better idea for
language+ORM, I would love to hear
it. It must be SQL-ready using JDBC
or ODBC to fit into our incremental
migration plan.
If anyone has any experience on a similar effort and could point out any helpful resources (especially books), I would be very grateful!
Nearly all of the reading I have been doing about ORM is focused on starting with a class structure, and creating the mapped database structure as a secondary process. Is going the other way around (starting with an existing database and mapping classes to it) a very odd thing to do?
Not really. There are several approaches when dealing with the persistence layer of an application:
Top-down: You start with the object model and the mappings and you derive the database schema from that data.
Bottom-up: You start with your data model i.e. the database schema and you derive the object model and the mappings from the tables.
Middle-out: You start with the mapping and you generate the object model and the tables.
Meet-in-the-middle: You start with an existing database schema and an existing object model, you develop a mapping to map between the two (you can even introduce an additional object layer and brige the existing one).
The top-down approach is the most object-oriented but the meet-in-the-middle approach is probably the most common.
Assuming question #1 == true, how flexible are existing ORM frameworks such as JPA2, ActiveRecord, DataMapper etc. to "imperfect" table design? I am sure we will have to do some refactoring of existing table design, but would like to know if I am undertaking a Herculean task before I waste too much time on the effort.
I would say that JPA is not the most flexible, it will not deal very well with exotic or heavily denormalized schemas (the result might be ugly from an OO point of view). Accesses that don't go through JPA might also be a problem. A data mapper tool like iBatis (now mybatis) will give you more flexibility.
If anyone has a better idea for language+ORM, I would love to hear it. It must be SQL-ready using JDBC or ODBC to fit into our incremental migration plan.
I know that RoR can deal with existing databases, I'm just not sure what the result will look like. But I don't really have enough experience with RoR so I'll let experts elaborate on this.
If anyone has any experience on a similar effort and could point out any helpful resources (especially books), I would be very grateful!
I suggest to browse Scott Ambler website and his book(s):
The Process of Database Refactoring: Strategies for Improving Database Quality
More food for thought:
Working Effectively with Legacy Code by Michael Feathers
Clean Code by Robert Martin
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
As a web developer looking to move from hand-coded PHP sites to framework-based sites, I have seen a lot of discussion about the advantages of one ORM over another. It seems to be useful for projects of a certain (?) size, and even more important for enterprise-level applications.
What does it give me as a developer? How will my code differ from the individual SELECT statements that I use now? How will it help with DB access and security? How does it find out about the DB schema and user credentials?
Edit: #duffymo pointed out what should have been obvious to me: ORM is only useful for OOP code. My code is not OO, so I haven't run into the problems that ORM solves.
I'd say that if you aren't dealing with objects there's little point in using an ORM.
If your relational tables/columns map 1:1 with objects/attributes, there's not much point in using an ORM.
If your objects don't have any 1:1, 1:m or m:n relationships with other objects, there's not much point in using an ORM.
If you have complex, hand-tuned SQL, there's not much point in using an ORM.
If you've decided that your database will have stored procedures as its interface, there's not much point in using an ORM.
If you have a complex legacy schema that can't be refactored, there's not much point in using an ORM.
So here's the converse:
If you have a solid object model, with relationships between objects that are 1:1, 1:m, and m:n, don't have stored procedures, and like the dynamic SQL that an ORM solution will give you, by all means use an ORM.
Decisions like these are always a choice. Choose, implement, measure, evaluate.
ORMs are being hyped for being the solution to Data Access problems. Personally, after having used them in an Enterprise Project, they are far from being the solution for Enterprise Application Development. Maybe they work in small projects. Here are the problems we have experienced with them specifically nHibernate:
Configuration: ORM technologies require configuration files to map table schemas into object structures. In large enterprise systems the configuration grows very quickly and becomes extremely difficult to create and manage. Maintaining the configuration also gets tedious and unmaintainable as business requirements and models constantly change and evolve in an agile environment.
Custom Queries: The ability to map custom queries that do not fit into any defined object is either not supported or not recommended by the framework providers. Developers are forced to find work-arounds by writing adhoc objects and queries, or writing custom code to get the data they need. They may have to use Stored Procedures on a regular basis for anything more complex than a simple Select.
Proprietery binding: These frameworks require the use of proprietary libraries and proprietary object query languages that are not standardized in the computer science industry. These proprietary libraries and query languages bind the application to the specific implementation of the provider with little or no flexibility to change if required and no interoperability to collaborate with each other.
Object Query Languages: New query languages called Object Query Languages are provided to perform queries on the object model. They automatically generate SQL queries against the databse and the user is abstracted from the process. To Object Oriented developers this may seem like a benefit since they feel the problem of writing SQL is solved. The problem in practicality is that these query languages cannot support some of the intermediate to advanced SQL constructs required by most real world applications. They also prevent developers from tweaking the SQL queries if necessary.
Performance: The ORM layers use reflection and introspection to instantiate and populate the objects with data from the database. These are costly operations in terms of processing and add to the performance degradation of the mapping operations. The Object Queries that are translated to produce unoptimized queries without the option of tuning them causing significant performance losses and overloading of the database management systems. Performance tuning the SQL is almost impossible since the frameworks provide little flexiblity over controlling the SQL that gets autogenerated.
Tight coupling: This approach creates a tight dependancy between model objects and database schemas. Developers don't want a one-to-one correlation between database fields and class fields. Changing the database schema has rippling affects in the object model and mapping configuration and vice versa.
Caches: This approach also requires the use of object caches and contexts that are necessary to maintian and track the state of the object and reduce database roundtrips for the cached data. These caches if not maintained and synchrnonized in a multi-tiered implementation can have significant ramifications in terms of data-accuracy and concurrency. Often third party caches or external caches have to be plugged in to solve this problem, adding extensive burden to the data-access layer.
For more information on our analysis you can read:
http://www.orasissoftware.com/driver.aspx?topic=whitepaper
At a very high level: ORMs help to reduce the Object-Relational impedance mismatch. They allow you to store and retrieve full live objects from a relational database without doing a lot of parsing/serialization yourself.
What does it give me as a developer?
For starters it helps you stay DRY. Either you schema or you model classes are authoritative and the other is automatically generated which reduces the number of bugs and amount of boiler plate code.
It helps with marshaling. ORMs generally handle marshaling the values of individual columns into the appropriate types so that you don't have to parse/serialize them yourself. Furthermore, it allows you to retrieve fully formed object from the DB rather than simply row objects that you have to wrap your self.
How will my code differ from the individual SELECT statements that I use now?
Since your queries will return objects rather then just rows, you will be able to access related objects using attribute access rather than creating a new query. You are generally able to write SQL directly when you need to, but for most operations (CRUD) the ORM will make the code for interacting with persistent objects simpler.
How will it help with DB access and security?
Generally speaking, ORMs have their own API for building queries (eg. attribute access) and so are less vulnerable to SQL injection attacks; however, they often allow you to inject your own SQL into the generated queries so that you can do strange things if you need to. Such injected SQL you are responsible for sanitizing yourself, but, if you stay away from using such features then the ORM should take care of sanitizing user data automatically.
How does it find out about the DB schema and user credentials?
Many ORMs come with tools that will inspect a schema and build up a set of model classes that allow you to interact with the objects in the database. [Database] user credentials are generally stored in a settings file.
If you write your data access layer by hand, you are essentially writing your own feature poor ORM.
Oren Eini has a nice blog which sums up what essential features you may need in your DAL/ORM and why it writing your own becomes a bad idea after time:
http://ayende.com/Blog/archive/2006/05/12/25ReasonsNotToWriteYourOwnObjectRelationalMapper.aspx
EDIT: The OP has commented in other answers that his code base isn't very object oriented. Dealing with object mapping is only one facet of ORMs. The Active Record pattern is a good example of how ORMs are still useful in scenarios where objects map 1:1 to tables.
Top Benefits:
Database Abstraction
API-centric design mentality
High Level == Less to worry about at the fundamental level (its been thought of for you)
I have to say, working with an ORM is really the evolution of database-driven applications. You worry less about the boilerplate SQL you always write, and more on how the interfaces can work together to make a very straightforward system.
I love not having to worry about INNER JOIN and SELECT COUNT(*). I just work in my high level abstraction, and I've taken care of database abstraction at the same time.
Having said that, I never have really run into an issue where I needed to run the same code on more than one database system at a time realistically. However, that's not to say that case doesn't exist, its a very real problem for some developers.
I can't speak for other ORM's, just Hibernate (for Java).
Hibernate gives me the following:
Automatically updates schema for tables on production system at run-time. Sometimes you still have to update some things manually yourself.
Automatically creates foreign keys which keeps you from writing bad code that is creating orphaned data.
Implements connection pooling. Multiple connection pooling providers are available.
Caches data for faster access. Multiple caching providers are available. This also allows you to cluster together many servers to help you scale.
Makes database access more transparent so that you can easily port your application to another database.
Make queries easier to write. The following query that would normally require you to write 'join' three times can be written like this:
"from Invoice i where i.customer.address.city = ?" this retrieves all invoices with a specific city
a list of Invoice objects are returned. I can then call invoice.getCustomer().getCompanyName(); if the data is not already in the cache the database is queried automatically in the background
You can reverse-engineer a database to create the hibernate schema (haven't tried this myself) or you can create the schema from scratch.
There is of course a learning curve as with any new technology but I think it's well worth it.
When needed you can still drop down to the lower SQL level to write an optimized query.
Most databases used are relational databases which does not directly translate to objects. What an Object-Relational Mapper does is take the data, create a shell around it with utility functions for updating, removing, inserting, and other operations that can be performed. So instead of thinking of it as an array of rows, you now have a list of objets that you can manipulate as you would any other and simply call obj.Save() when you're done.
I suggest you take a look at some of the ORM's that are in use, a favourite of mine is the ORM used in the python framework, django. The idea is that you write a definition of how your data looks in the database and the ORM takes care of validation, checks and any mechanics that need to run before the data is inserted.
What does it give me as a developer?
Saves you time, since you don't have to code the db access portion.
How will my code differ from the individual SELECT statements that I use now?
You will use either attributes or xml files to define the class mapping to the database tables.
How will it help with DB access and security?
Most frameworks try to adhere to db best practices where applicable, such as parametrized SQL and such. Because the implementation detail is coded in the framework, you don't have to worry about it. For this reason, however, it's also important to understand the framework you're using, and be aware of any design flaws or bugs that may open unexpected holes.
How does it find out about the DB schema and user credentials?
You provide the connection string as always. The framework providers (e.g. SQL, Oracle, MySQL specific classes) provide the implementation that queries the db schema, processes the class mappings, and renders / executes the db access code as necessary.
Personally I've not had a great experience with using ORM technology to date. I'm currently working for a company that uses nHibernate and I really can't get on with it. Give me a stored proc and DAL any day! More code sure ... but also more control and code that's easier to debug - from my experience using an early version of nHibernate it has to be added.
Using an ORM will remove dependencies from your code on a particular SQL dialect. Instead of directly interacting with the database you'll be interacting with an abstraction layer that provides insulation between your code and the database implementation. Additionally, ORMs typically provide protection from SQL injection by constructing parameterized queries. Granted you could do this yourself, but it's nice to have the framework guarantee.
ORMs work in one of two ways: some discover the schema from an existing database -- the LINQToSQL designer does this --, others require you to map your class onto a table. In both cases, once the schema has been mapped, the ORM may be able to create (recreate) your database structure for you. DB permissions probably still need to be applied by hand or via custom SQL.
Typically, the credentials supplied programatically via the API or using a configuration file -- or both, defaults coming from a configuration file, but able to be override in code.
While I agree with the accepted answer almost completely, I think it can be amended with lightweight alternatives in mind.
If you have complex, hand-tuned SQL
If your objects don't have any 1:1, 1:m or m:n relationships with other objects
If you have a complex legacy schema that can't be refactored
...then you might benefit from a lightweight ORM where SQL is is not
obscured or abstracted to the point where it is easier to write your
own database integration.
These are a few of the many reasons why the developer team at my company decided that we needed to make a more flexible abstraction to reside on top of the JDBC.
There are many open source alternatives around that accomplish similar things, and jORM is our proposed solution.
I would recommend to evaluate a few of the strongest candidates before choosing a lightweight ORM. They are slightly different in their approach to abstract databases, but might look similar from a top down view.
jORM
ActiveJDBC
ORMLite
my concern with ORM frameworks is probably the very thing that makes it attractive to lots of developers.
nameley that it obviates the need to 'care' about what's going on at the DB level. Most of the problems that we see during the day to day running of our apps are related to database problems. I worry slightly about a world that is 100% ORM that people won't know about what queries are hitting the database, or if they do, they are unsure about how to change them or optimize them.
{I realize this may be a contraversial answer :) }