I am working on a project that requires frequent database access, insertions, and deletions. Should I use raw SQL commands, or should I go with an ORM? The project can work fine without any objects, using only SQL commands. Does this choice affect scalability in general?
EDIT: This is the kind of project where users generate the content rather than consuming content I provide, and the project is online. So the amount of content depends on the number of users. If the project has even 50,000 users, and every user can both create and read content, what would be the most suitable approach?
If you have no (or limited) experience with ORMs, it will take time to learn a new API. Plus, you have to keep in mind that you sacrifice speed for 'magic'. For example, most ORMs will select the wildcard '*' for fields, even when you just need a list of titles from your Articles table.
And ORMs will always fail in niche cases.
Most ORMs out there (the ones based on the ActiveRecord pattern) are extremely flawed from an OOP point of view. They create tight coupling between your database structure and your class/model.
You can think of ORMs as technical debt. They make the start of a project easier. But as the code grows more complex, you will encounter more and more problems caused by limitations in the ORM's API. Eventually you will hit situations where it is impossible to do something with the ORM, and you will have to start writing SQL fragments and entire statements directly.
I would suggest staying away from ORMs and implementing a DataMapper pattern in your code. This will give you separation between your Domain Objects and the Database Access Layer.
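A minimal sketch of what such a mapper could look like, using Python's built-in sqlite3 module. The Article class, the ArticleMapper class, and the articles table are all invented for illustration; the only point is that the domain object contains no SQL, and the mapper is free to run narrow queries (such as titles only) instead of a wildcard select.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Article:
    """Domain object: knows nothing about SQL or the database."""
    title: str
    body: str
    id: Optional[int] = None


class ArticleMapper:
    """Moves Article objects to and from the (hypothetical) articles table."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def insert(self, article: Article) -> None:
        cur = self.conn.execute(
            "INSERT INTO articles (title, body) VALUES (?, ?)",
            (article.title, article.body),
        )
        article.id = cur.lastrowid

    def find(self, article_id: int) -> Optional[Article]:
        row = self.conn.execute(
            "SELECT id, title, body FROM articles WHERE id = ?", (article_id,)
        ).fetchone()
        if row is None:
            return None
        return Article(id=row[0], title=row[1], body=row[2])

    def titles(self) -> List[str]:
        # A custom query that fetches only the column this use case needs.
        rows = self.conn.execute("SELECT title FROM articles ORDER BY id")
        return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
mapper = ArticleMapper(conn)
post = Article(title="Hello", body="First body")
mapper.insert(post)
```

If the table layout changes, only ArticleMapper needs to change; code that works with Article objects is untouched.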
I'd say it's better to try to achieve the objective in the most simple way possible.
If using an ORM has no real added advantage, and the application is fairly simple, I would not use an ORM.
If the application is really about processing large sets of data, and there is no business logic, I would not use an ORM.
That doesn't mean you shouldn't design your application properly, though. But again: if using an ORM doesn't give you any benefit, why should you use it?
For speed of development, I would go with an ORM, in particular if most data access is CRUD.
This way you don't have to also develop the SQL and write data access routines.
Scalability shouldn't suffer, though you do need to understand what you are doing (you could hurt scalability with raw SQL as well).
If the project is oriented toward either:
- data editing (as in viewing simple tables of data and editing them)
- performance (as in designing the fastest algorithm to do a simple task)
then you could go with direct SQL commands in your code.
What you don't want is to do this in a large piece of software, where you end up with many classes and lots of code. If you are in that situation and you scatter SQL everywhere in your code, you will clearly regret it someday. You will have a hard time making changes to your domain model. Any modification would become really hard (except for adding functionality or entities independent of the existing ones).
More information would be good, though, such as:
- What do you mean by frequent (how frequent)?
- What performance do you need?
EDIT
It seems you're making some sort of CMS service. My bet is you don't want to start stuffing your code with SQL. @teresko's pattern suggestion seems interesting: it separates your application logic from the DB (which is always good) while still letting you customize every query. Admittedly, adding a layer that fills in-memory objects can take more time than simply using the database result to write your page, but I don't think that small difference should matter in your case.
I'd suggest choosing a good pattern that separates your business logic from your data access, like the one @teresko suggested.
It depends a bit on timescale and your current knowledge of MySQL and ORM systems. If you don't have much time, just do whatever you know best, rather than wasting time learning a whole new set of code.
With more time, an ORM system like Doctrine or Propel can massively improve your development speed. When the schema is still changing a lot, you don't want to be spending a lot of time just rewriting queries. With an ORM system, it can be as simple as changing the schema file and clearing the cache.
Then when the design settles down, keep an eye on performance. If you do use ORM and your code is solid OOP, it's not too big an issue to migrate to SQL one query at a time.
That's the great thing about coding with OOP - a decision like this doesn't have to bind you forever.
I would always recommend using some form of ORM for your data access layer, as there has been a lot of time invested into the security aspect. That alone is a reason to not roll your own, unless you feel confident about your skills in protecting against SQL injection and other vulnerabilities.
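To show what the injection risk looks like in practice, here is a small sketch using Python's built-in sqlite3 module. The users table and the malicious input are invented for illustration. It contrasts a query built by string concatenation with one that uses a bound parameter, which is the mechanism data access libraries generally rely on for this protection.

```python
import sqlite3

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

malicious = "nobody' OR '1'='1"

# Unsafe: the input becomes part of the SQL text, so the OR clause matches every row.
unsafe_sql = "SELECT name FROM users WHERE name = '" + malicious + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the driver passes the input as a value, never as SQL.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (malicious,)
).fetchall()
```

The concatenated query returns every user, while the parameterized one returns nothing, because no user is literally named "nobody' OR '1'='1".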
The project I currently work on will soon have some time to improve and specialise a product that is already in use.
We may have about 4 man-weeks spare in which we could replace the typed datasets that are in use.
The project is currently written in VB.NET and we will definitely not have time to rewrite this code in C#, although we would like to.
My question is: what would you suggest as a replacement for the typed datasets?
I have suggested NHibernate, as I have worked with Hibernate before and loved it.
LINQ to SQL has been discounted.
So if you can suggest something else or better, or highlight the advantages or disadvantages with regard to our time constraints, please do!
Considering your time constraints, LINQ to SQL (despite being deprecated) would have been ideal. While NH or EF4 are more complete and flexible ORM solutions, they require more consideration of mappings than a simple drag and drop from the Server Explorer connection onto the LINQ to SQL designer and a simple instantiation of a DataContext object.
If you don't have the time to get everyone up to speed on an ORM with a future, why eliminate the typed datasets at all?
Performance-wise they are probably close to identical to what you would get out of an ORM. The benefit of replacement would be maintainability and developer pleasure, both of which would be <warning:shameless plug for personal preference> accompanied by a C# rewrite at the same time...
I think NHibernate is a good choice to replace typed datasets; I recently did exactly that on a project. I wouldn't take a "big bang" approach, though. I would write new features using NHibernate and maintain old features using typed datasets. Once the new features are working well with NHibernate and you have the appropriate usage patterns in place, I would carefully transition the typed dataset and sproc code to NHibernate. The speed at which you do the replacement doesn't really matter; just move at a comfortable pace.
Big bang is always a highly risky approach, and incremental progress is easier for everyone to swallow.
I honestly don't see a compelling reason to switch a project in production from VB.NET to C#. There are so few meaningful differences, and it helps to have VB.NET (in addition to C#) experience on your resume.
I would not encourage use of LinqToSql nor would I encourage use of Entity Framework 3.5. EF 4 may be a reasonable option using the same incremental approach.
I often hear people bashing ORMs for being inflexible and a "leaky abstraction", but you rarely hear why they're problematic. When used properly, what exactly are the faults of ORMs? I'm asking because I'm working on a PHP ORM and I'd like it to solve problems that a lot of other ORMs fail at, such as lazy loading and the lack of subqueries.
Please be specific with your answers. Show some code or describe a database schema where an ORM struggles. Doesn't matter the language or the ORM.
One of the bigger issues I have noticed with all the ORMs I have used is updating only a few fields without retrieving the object first.
For example, say I have a Project object mapped in my database with the following fields: Id, name, description, owning_user. Say, through ajax, I want to just update the description field. In most ORMs the only way for me to update the database table while only having an Id and description values is to either retrieve the project object from the database, set the description and then send the object back to the database (thus requiring two database operations just for one simple update) or to update it via stored procedures (which is the method I am currently using).
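For comparison, this is roughly what the single-statement update looks like in plain SQL, sketched with Python's built-in sqlite3 module. The project table mirrors the fields listed above; the update_description helper is a hypothetical name, not part of any ORM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE project ("
    "id INTEGER PRIMARY KEY, name TEXT, description TEXT, owning_user TEXT)"
)
conn.execute(
    "INSERT INTO project (id, name, description, owning_user) "
    "VALUES (1, 'Apollo', 'old text', 'ed')"
)


def update_description(conn, project_id, description):
    """Update one column in a single round trip; returns the number of rows changed."""
    cur = conn.execute(
        "UPDATE project SET description = ? WHERE id = ?",
        (description, project_id),
    )
    return cur.rowcount


changed = update_description(conn, 1, "new text")
row = conn.execute(
    "SELECT name, description, owning_user FROM project WHERE id = 1"
).fetchone()
```

Only the id and the new description are needed; the other columns are never read or rewritten.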
Objects and database records really aren't all that similar. They have typed slots that you can store stuff in, but that's about it. Databases have a completely different notion of identity than programming languages. They can't handle composite objects well, so you have to use additional tables and foreign keys instead. Most have no concept of type inheritance. And the natural way to navigate a network of objects (follow some of the pointers in one object, get another object, and dereference again) is much less efficient when mapped to the database world, because you have to make multiple round trips and retrieve lots of data that you didn't care about.
In other words: the abstraction cannot be made very good in the first place. It isn't the ORM tools that are bad, but the metaphor that they implement. Instead of a perfect isomorphism there is only a superficial similarity, so the task itself isn't a very good abstraction. (It is still way more useful than having to understand databases intimately, though. The scorn for ORM tools comes mostly from DBAs looking down on mere programmers.)
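The round-trip cost of following pointers from object to object can be sketched as follows, using Python's built-in sqlite3 module. The author and article tables and the run helper are invented for illustration. The first loop mimics object-style navigation (one query per dereference); the second gets the same pairs with one JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE article (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO author (id, name) VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO article (author_id, title) VALUES (1, 'A1'), (2, 'B1'), (3, 'C1');
""")

counter = {"queries": 0}


def run(sql, params=()):
    """Execute a query and count it as one round trip."""
    counter["queries"] += 1
    return conn.execute(sql, params).fetchall()


# Object-style navigation: one query for the articles, then one more per author.
pairs_lazy = []
for title, author_id in run("SELECT title, author_id FROM article ORDER BY id"):
    (name,) = run("SELECT name FROM author WHERE id = ?", (author_id,))[0]
    pairs_lazy.append((title, name))
lazy_queries = counter["queries"]

# Relational style: a single JOIN returns the same pairs.
counter["queries"] = 0
pairs_join = run(
    "SELECT article.title, author.name FROM article "
    "JOIN author ON author.id = article.author_id ORDER BY article.id"
)
join_queries = counter["queries"]
```

With three articles the navigation style issues four queries where the JOIN issues one, and the gap grows with the number of rows.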
ORMs also can write code that is not efficient. Since database performance is critical to most systems, they can cause problems that could have been avoided if a human being wrote the code (but which might not have been any better if the human in question didn't understand database performance tuning). This is especially true when the querying gets complex.
I think my biggest problem with them, though, is that by abstracting away the details, they leave junior programmers with less understanding of how to write queries, which they need in order to handle the edge cases and the places where the ORM writes really bad code. It's really hard to learn the advanced stuff when you never had to understand the basics. An ORM in the hands of someone who understands joins and group by and advanced querying is a good thing. In the hands of someone who doesn't understand boolean algebra and joins and a bunch of other basic SQL concepts, it is a very bad thing, resulting in very poor design of databases and queries.
Relational databases are not objects and shouldn't be treated as such. Trying to make an eagle into a silk purse is generally not successful. Far better to learn what the eagle is good at and why and let the eagle fly than to have a bad purse and a dead eagle.
The way I see it is this. To use an ORM, you usually have to stack several PHP functions, and then connect to a database and essentially still run a MySQL query or something similar.
Why all of the abstraction between code and database? Why can't we just use what we already know? Typically a web dev knows their backend language, their DB language (some sort of SQL), and some frontend languages, such as HTML, CSS, JS, etc.
In essence, we're adding a layer of abstraction that includes many functions (and we all know PHP function calls can be slower than assigning a variable). Yes, each call is a tiny cost, but it still adds up.
Not only do we now have several functions to go through, but we also have to learn how the ORM works, so there's some time wasted there. I thought the whole idea of separation of code was to keep your code separate at all levels. If you're in the LAMP world, just create your query (you should know MySQL) and use PHP's existing support for prepared statements. DONE!
LAMP WAY:
create query (string);
use mysqli prepared statements and retrieve data into array.
ORM WAY:
run a function that gets the entity
which runs a MySQL query
run another function that adds a conditional
run another function that adds another conditional
run another function that joins
run another function that adds conditionals on the join
run another function that prepares
runs another MySQL query
run another function that fetches the data
runs another MySQL Query
Does anyone else have a problem with the ORM stack? Why are we becoming such lazy developers? Or so creative that we're harming our code? If it ain't broke don't fix it. In turn, fix your dev team to understand the basics of web dev.
ORMs are trying to solve a very complex problem. There are edge cases galore and major design tradeoffs with no clear or obvious solutions. When you optimize an ORM design for situation A, you inherently make it awkward for solving situation B.
There are ORMs that handle lazy loading and subqueries in a "good enough" manner, but it's almost impossible to get from "good enough" to "great".
When designing your ORM, you have to have a pretty good handle on all the possible awkward database designs your ORM will be expected to handle. You have to explicitly make tradeoffs around which situations you are willing to handle awkwardly.
I don't look at ORMs as inflexible or any more leaky than your average complex abstraction. That said, certain ORMs are better than others in those respects.
Good luck reinventing the wheel.
So I'm having a head-against-the-wall moment and hoping somebody can either help remove the wall or stop my head from moving!!
Over the last 3-4 weeks I've been investigating ORMs in readiness for a new project. The ORM must map to an existing, large and ageing SQL database.
So I tried SubSonic. I really liked v2, and v3, after modding it to work nicely with VB and named schemas in SQL, was running OK. However, its lack of flexibility in keeping entity property names separate from column names had me pulling my hair out (sorry Rob).
I tried Entity Framework but, like others, I found it lacking in certain areas.
So I bit the bullet and tried NHibernate, but after a week or so of getting it working how I liked (with help from CodeSmith to generate classes/hbms for me) I'm frustrated with the time it takes to start up (build a config object), despite trying a number of tricks to reduce this time.
I'm essentially after building a DAL class that I can share between apps and websites. Am I barking up the wrong tree? For a legacy project with 100s of tables, should I go back to ADO.NET and use DTOs? Aarrgh!
Sorry for the ranty style of question. I don't have much hair left and I'd like to keep what I have!!
Thanks in advance, Ed
PS. I should add that I know SQL very well and am not scared of getting my hands dirty to write fast queries. If anything, I don't need to be hidden from SQL.
An ORM lets you:
Map table rows to objects, which are the workable pieces of object-oriented programming.
Automatically navigate through object relationships.
Easily add, edit and remove table rows.
Query the database in a more intuitive way, as you don't have to think of joins (this one will depend on the ORM and the query method).
Transparently handle L1 and L2 cache.
All of the above would have to be handled by hand if you weren't using an ORM.
PS: I agree with Dmitry about the startup time of NHibernate (see question comments). Besides, did you try Fluent NHibernate? Fluent NHibernate is impressively easy. I couldn't believe my eyes when I first mapped a database. It's even easier than proprietary ORMs like DevExpress XPO.
The biggest benefit of an ORM tool is that it will help you layer your application correctly. Most projects nowadays use a Data Layer to connect to the database. You start from the ORM tool to produce classes that correspond to your database objects. Then you define an interface with methods that use these classes. All persistence code goes through the methods of this interface. This way the business logic layer is only coupled to this higher-layer interface and needs to know nothing about the database. In fact there should be no dependency on ADO.NET or even NHibernate.
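The layering described above might be sketched like this in Python. Everything here (UserStore, SqliteUserStore, InMemoryUserStore, greeting) is a hypothetical name chosen for illustration, and the SQLite-backed class stands in for whatever an ORM would generate. The point is only that the business function depends on the interface and nothing else.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Dict, Optional


class UserStore(ABC):
    """Hypothetical data-layer interface: all that business logic may depend on."""

    @abstractmethod
    def add(self, name: str) -> int: ...

    @abstractmethod
    def get_name(self, user_id: int) -> Optional[str]: ...


class SqliteUserStore(UserStore):
    """One implementation; an ORM-backed class could take its place."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)"
        )

    def add(self, name: str) -> int:
        cur = self.conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
        return cur.lastrowid

    def get_name(self, user_id: int) -> Optional[str]:
        row = self.conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None


class InMemoryUserStore(UserStore):
    """A second implementation with no database at all, e.g. for tests."""

    def __init__(self) -> None:
        self.rows: Dict[int, str] = {}

    def add(self, name: str) -> int:
        user_id = len(self.rows) + 1
        self.rows[user_id] = name
        return user_id

    def get_name(self, user_id: int) -> Optional[str]:
        return self.rows.get(user_id)


def greeting(store: UserStore, user_id: int) -> str:
    """Business logic: depends only on the UserStore interface."""
    name = store.get_name(user_id)
    return f"Hello, {name}!" if name else "Hello, stranger!"


sql_store = SqliteUserStore(sqlite3.connect(":memory:"))
mem_store = InMemoryUserStore()
sql_id = sql_store.add("Ed")
mem_id = mem_store.add("Ed")
```

Calling greeting with either store gives the same result, which is the decoupling the answer is describing.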
Another advantage of ORM tools is that you de-couple your application from the database server. You could change the db engine and still use the same code. Also there isn't only the complexity of the SQL that the ORM hides from you. It can also help you with transactions logic and connection pooling.
I'd say that for new projects an ORM tool is a necessity. For legacy projects it is not as beneficial, unless of course you have the time/money to start from scratch.
In my experience, most ORMs end up being way more complex than SQL. Which defeats the entire purpose of using them.
One solution I'm enthusiastic about is LINQ2SQL. It excels as a thin layer over stored procedures or views. It's really easy to use and doesn't try to hide SQL.
There are basically two questions here:
What's great about ORMs? There are similar questions on Stackoverflow. See:
What are the advantages of using an ORM?
Is everyone here jumping on the ORM band wagon?
How can I improve NHibernate startup time? See:
http://ayende.com/Blog/archive/2007/10/26/Real-World-NHibernate-Reducing-startup-times-for-large-amount-of.aspx
http://nhforge.org/blogs/nhibernate/archive/2009/03/13/an-improvement-on-sessionfactory-initialization.aspx
For some of the apps I've developed (then proceeded to forget about), I've been writing plain SQL, primarily for MySQL. Though I have used ORMs in Python like SQLAlchemy, I didn't stick with them for long. Usually it was either the documentation or the complexity (from my point of view) holding me back.
I see it like this: use an ORM for portability, plain SQL if it's just going to be using one type of database. I'm really looking for advice on when to use an ORM or SQL when developing an app that needs database support.
Thinking about it, it would be far better to just use a lightweight wrapper to handle database inconsistencies vs. using an ORM.
Speaking as someone who spent quite a bit of time working with JPA (Java Persistence API, basically the standardized ORM API for Java/J2EE/EJB), which includes Hibernate, EclipseLink, Toplink, OpenJPA and others, I'll share some of my observations.
ORMs are not fast. They can be adequate and most of the time adequate is OK but in a high-volume low-latency environment they're a no-no;
In general purpose programming languages like Java and C# you need an awful lot of magic to make them work (eg load-time weaving in Java, instrumentation, etc);
When using an ORM, rather than getting further from SQL (which seems to be the intent), you'll be amazed how much time you spend tweaking XML and/or annotations/attributes to get your ORM to generate performant SQL;
For complex queries, there really is no substitute for raw SQL. In JPA, for example, some queries that are possible in raw SQL simply can't be expressed, and when you have to use raw SQL in JPA it's not pretty (C#/.NET at least has anonymous types with implicitly typed var, which is a lot nicer than an Object array);
There are an awful lot of "gotchas" when using ORMs. These include unintended or unexpected behavior, and the fact that you have to build in the capability to cope with direct SQL updates to your database (by using refresh() in JPA or similar methods), because JPA by default caches everything and so won't notice a direct database update. Running direct SQL updates is a common production support activity;
The object-relational mismatch is always going to cause problems. With any such problem there is a tradeoff between complexity and completeness of the abstraction. At times I felt JPA went too far and hit a real law of diminishing returns where the complexity hit wasn't justified by the abstraction.
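The caching gotcha from the list above can be illustrated with a toy first-level cache in Python. This is a hand-rolled stand-in, not JPA's implementation or API; CachingSession and the account table are invented, and the refresh method is only named after the JPA method mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account (id, balance) VALUES (1, 100)")


class CachingSession:
    """Toy first-level cache: each row is read from the database at most once."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}

    def balance(self, account_id):
        if account_id not in self.cache:
            row = self.conn.execute(
                "SELECT balance FROM account WHERE id = ?", (account_id,)
            ).fetchone()
            self.cache[account_id] = row[0]
        return self.cache[account_id]

    def refresh(self, account_id):
        """Drop the cached value so the next read goes back to the database."""
        self.cache.pop(account_id, None)


session = CachingSession(conn)
before = session.balance(1)

# A production-support fix applied directly with SQL, behind the cache's back.
conn.execute("UPDATE account SET balance = 250 WHERE id = 1")

stale = session.balance(1)   # still served from the cache
session.refresh(1)
fresh = session.balance(1)   # re-read from the database
```

Until refresh is called, the session keeps reporting the old balance even though the row has changed.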
There's another problem which takes a bit more explanation.
The traditional model for a Web application is to have a persistence layer and a presentation layer (possibly with a services or other layers in between but these are the important two for this discussion). ORMs force a rigid view from your persistence layer up to the presentation layer (ie your entities).
One of the criticisms of more raw SQL methods is that you end up with all these VOs (value objects) or DTOs (data transfer objects) that are used by simply one query. This is touted as an advantage of ORMs because you get rid of that.
Thing is those problems don't go away with ORMs, they simply move up to the presentation layer. Instead of creating VOs/DTOs for queries, you create custom presentation objects, typically one for every view. How is this better? IMHO it isn't.
I've written about this in ORM or SQL: Are we there yet?.
My persistence technology of choice (in Java) these days is ibatis. It's a pretty thin wrapper around SQL that does 90%+ of what JPA can do (it can even do lazy-loading of relationships although it's not well documented) but with far less overhead (in terms of complexity and actual code).
This came up last year in a GWT application I was writing. Lots of translation from EclipseLink to presentation objects in the service implementation. If we were using ibatis it would've been far simpler to create the appropriate objects with ibatis and then pass them all the way up and down the stack. Some purists might argue this is Bad™. Maybe so (in theory) but I tell you what: it would've led to simpler code, a simpler stack and more productivity.
ORMs have some nice features. They can handle much of the dog-work of copying database columns to object fields. They usually handle converting the language's date and time types to the appropriate database type. They generally handle one-to-many relationships pretty elegantly as well by instantiating nested objects. I've found that if you design your database with the strengths and weaknesses of the ORM in mind, it saves a lot of work in getting data in and out of the database. (You'll want to know how it handles polymorphism and many-to-many relationships if you need to map those. It's these two domains that provide most of the 'impedance mismatch' that makes some call ORM the 'Vietnam of computer science'.)
For applications that are transactional, i.e. you make a request, get some objects, traverse them to get some data and render it on a Web page, the performance tax is small, and in many cases ORM can be faster because it will cache objects it's seen before, that otherwise would have queried the database multiple times.
For applications that are reporting-heavy, or deal with a large number of database rows per request, the ORM tax is much heavier, and the caching that they do turns into a big, useless memory-hogging burden. In that case, simple SQL mapping (LINQ or iBatis) or hand-coded SQL queries in a thin DAL is the way to go.
I've found for any large-scale application you'll find yourself using both approaches. (ORM for straightforward CRUD and SQL/thin DAL for reporting).
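To make the reporting case concrete, here is a small sketch using Python's built-in sqlite3 module. The orders table and its data are invented for illustration. It computes the same per-customer totals twice: once by pulling every row into the application, as object-style access tends to do, and once by letting the database aggregate.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("ann", 10), ("bob", 5), ("ann", 30), ("bob", 20), ("cy", 7)],
)

# Object-style report: pull every row into the application, then add them up.
all_rows = conn.execute("SELECT customer, amount FROM orders").fetchall()
totals_in_app = defaultdict(int)
for customer, amount in all_rows:
    totals_in_app[customer] += amount

# SQL-style report: the database aggregates and returns one row per customer.
report_rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
totals_in_sql = dict(report_rows)
```

Both approaches give the same totals, but the first transfers five rows and the second three; on a real reporting workload that ratio can be many orders of magnitude larger.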
I say plain SQL for Reads, ORM for CUD.
Performance is something I'm always concerned about, especially in web applications, but so are code maintainability and readability. To address these issues I wrote SqlBuilder.
ORM is not just about portability (which is kind of hard to achieve even with ORMs, for that matter). What it gives you is basically a layer of abstraction over a persistent store, where an ORM tool frees you from writing boilerplate SQL queries (selects by PK or by predicates, inserts, updates and deletes) and lets you concentrate on the problem domain.
Any respectable design will need some abstraction for the database, just to handle the impedance mismatch. But the simplest first step (and adequate for most cases) I would expect would be a DAL, not a heavyweight ORM. Your only options aren't those at the ends of the spectrum.
EDIT in response to a comment requesting me to describe how I distinguish DAL from ORM:
A DAL is what you write yourself, maybe starting from a class that simply encapsulates a table and maps its fields to properties. An ORM is code you don't write for abstraction mechanisms inferred from other properties of your dbms schema, mostly PKs and FKs. (This is where you find out if the automatic abstractions start getting leaky or not. I prefer to inform them intentionally, but that may just be my personal preference).
The key that made my ORM use really fly was code generation. I agree that the ORM route isn't the fastest in code performance terms. But when you have a medium to large team and the DB is changing rapidly, the ability to regenerate classes and mappings from the DB as part of the build process is something brilliant to behold, especially when you use CI. So your code may not be the fastest, but your coding will be. I know which I'd take in most projects.
My recommendation is to develop using an ORM while the schema is still fluid, use profiling to find bottlenecks, then tune the areas that need it using raw SQL.
Another thought, the caching built into Hibernate can often make massive performance improvements if used in the right way. No more going back to the DB to read reference data.
The dilemma of whether to use a framework is quite common in modern software development.
What is important to understand is that every framework or approach has its pros and cons. For example, in our experience ORM is useful when dealing with transactions, i.e. insert/update/delete operations, but when it comes to fetching data with complex results it becomes important to evaluate the performance and effectiveness of the ORM tool.
It is also important to understand that you are not obliged to pick one framework or approach and implement everything in it. We can mix ORM and native query language. Many ORM frameworks provide extension points to plug in native SQL. We should try not to overuse a framework or an approach; we can combine frameworks or approaches to arrive at an appropriate solution.
You can use an ORM for inserts, updates, deletes, and versioning under a high level of concurrency, and native SQL for report generation and long listings.
There's no 'one-tool-fits-all' solution, and this is also true for the question 'should I use an OR/M or not?'.
I would say: if you have to write an application/tool which is very 'data' focused, without much other logic, then I 'd use plain SQL, since SQL is the domain-specific language for this kind of applications.
On the other hand, if I were to write a business/enterprise application that contains a lot of 'domain' logic, I'd write a rich class model that expresses this domain in code. In that case, an OR/M might be very helpful, as it takes a lot of plumbing code out of your hands.
One of the apps I've developed was an IRC bot written in Python. The modules it uses run in separate threads, but I haven't figured out a way to handle threading when using SQLite. Though, that might be better for a separate question.
I really should have just reworded both the title and the actual question. I've never actually used a DAL before, in any language.
Use an ORM that works like SQL, but provides compile-time checks and type safety. Like my favorite: Data Knowledge Objects (disclosure: I wrote it)
For example:
for (Bug bug : Bug.ALL.limit(100)) {
    int id = bug.getId();
    String title = bug.getTitle();
    System.out.println(id + " " + title);
}
Fully streaming. Easy to set up (no mappings to define; it reads your existing schemas). Supports joins, transactions, inner queries, aggregation, etc. Pretty much anything you can do in SQL. And it has been proven on everything from giant datasets (financial time series) all the way down to trivial ones (Android).
I know this question is very old, but I thought I would post an answer in case anyone comes across it like I did. ORMs have come a long way. Some of them actually give you the best of both worlds: making development more productive while maintaining performance.
Take a look at SQL Data (http://sqldata.codeplex.com). It is a very lightweight ORM for C# that covers all the bases.
FYI, I am the author of SQL Data.
I'd like to add my voice to the chorus of replies that say "There's a middle ground!".
To an application programmer, SQL is a mixture of things you might want to control and things you almost certainly don't want to be bothered controlling.
What I've always wanted is a layer (call it DAL, ORM, or micro-ORM, I don't mind which) that will take charge of the completely predictable decisions (how to spell SQL keywords, where the parentheses go, when to invent column aliases, what columns to create for a class that holds two floats and an int ...), while leaving me in charge of the higher-level aspects of the SQL, i.e. how to arrange JOINs, server-side computations, DISTINCTs, GROUP BYs, scalar subqueries, etc.
So I wrote something that does this: http://quince-lib.com/
It's for C++: I don't know whether that's the language you're using, but all the same it might be interesting to see this take on what a "middle ground" could look like.
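As a rough picture of the kind of "middle ground" layer described above, here is a tiny Python sketch. It is not quince's API (quince is a C++ library); the select helper and the tables are hypothetical. The helper takes care of keyword spelling and clause order, while the caller still decides the joins and grouping. It does no escaping, so it should only be fed trusted identifiers, with values passed separately as bound parameters.

```python
import sqlite3


def select(columns, table, joins=(), where=None, group_by=()):
    """Assemble a SELECT statement; the caller decides joins and grouping."""
    parts = ["SELECT " + ", ".join(columns), "FROM " + table]
    parts.extend("JOIN " + join for join in joins)
    if where:
        parts.append("WHERE " + where)
    if group_by:
        parts.append("GROUP BY " + ", ".join(group_by))
    return " ".join(parts)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE article (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO author (id, name) VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO article (author_id, title) VALUES (1, 'A1'), (1, 'A2'), (2, 'B1');
""")

sql = select(
    columns=["author.name", "COUNT(*)"],
    table="article",
    joins=["author ON author.id = article.author_id"],
    group_by=["author.name"],
)
rows = conn.execute(sql).fetchall()
```

The generated statement is ordinary SQL that can be logged, inspected, and tuned like any hand-written query.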