Related
I suppose this isn't a particularly "good fit for our Q&A format" but no idea where else to ask it.
Why is SQL-querying text based? Is there a historical motivation, or just laziness?
mysqli_query("SELECT * FROM users WHERE Name=$name");
It opens up an incredible number of stupid mistakes like above, encouraging SQL injection. It's basically exec for SQL, and exec'ing code, especially dynamically generated, is widely discouraged.
Plus, to be honest, prepared statements seem like just a verbose workaround.
Why don't we have a more object-oriented, noninjectable way of doing it, like having a directly-accessible database object with certain methods:
$DB->setDB("mysite");
$table='users';
$col='username';
$DB->selectWhereEquals($table,$col,$name);
Which the database itself natively implements, completely eliminating all resemblance to exec and nullifying the entire basis of SQL injection.
Culminating in the real question, is there any database framework that does anything like this?
Why are programming languages text-based? Quite simply: because text can be used to represent powerful human-readable/editable DSLs (Domain-specific language), such as SQL.
The better (?) question is: Why do programmers refuse to use placeholders?
Modern database providers (e.g. ADO.NET, PDO) support placeholders and an appropriate range of database adapters1. Programmers who fail to take advantage of this support only have themselves to blame.
Besides ubiquitous support for placeholders in direct database providers, there are also many different APIs available including:
"Fluent database libraries"; these generally map an AST, such as LINQ, onto a dynamically generated SQL query and take care of the details such as correctly using placeholders.
A wide variety of ORMs; may or may not include a fluent API.
Primitive CRUD wrappers as found in the Android SQLite API, which looks suspiciously similar to the proposal.
I love the power of SQL and almost none of the queries I write can be expressed in such a trivial API. Even LINQ, which is a powerful provider, can sometimes get in my way (although the prudent use of Views helps).
1 Even though SQL is text-based (and such is used to pass the shape of the query to the back-end), bound data need not be included in-line, which is where the issue of SQL Injection (being able to change the shape of the query) comes in. The database adapter can, and does if it is smart enough, use the API exposed by the target database. In the case of SQL Server this amounts to sending the data separately [via RPC] and for SQLite this means creating and binding separate prepared-statement objects.
This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
What are the pros and cons to keeping SQL in Stored Procs versus Code
Just curious on the advantages and disadvantages of using a stored procedure vs. other forms of getting data from a database. What is the preferred method to ensure speed, accuracy, and security (we don't want sql injections!).
(should I post this question to another stack exchange site?)
As per the answer to all database questions 'it depends'. However, stored procedures definitely help in terms of speed because of plan caching (although properly parameterized SQL will benefit from that too). Accuracy is no different - an incorrect query is incorrect whether it's in a stored procedure or not. And in terms of security, they can offer a useful way of limiting access for users - seeing as you don't need to give them direct access to the underlying tables - you can just allow them to execute the stored procedures that you want. There are, however, many many questions on this topic and I'd advise you to search a bit and find out some more.
There are several questions on Stackoverflow about this problem. I really don't think you'll get a "right" answer here, both can work out very well, and both can work horribly. I think if you are using Java then the general pattern is to use an ORM framework like Hibernate/JPA. This can be completely safe from SQL injection attacks as long as you use the framework correctly. My experience with .Net developers is that they are more likely to use stored procedure backed persistence, but that seems to be more open than it was before. Both NHibernate and other MS technologies seem to be gaining popularity.
My personal view is that in general an ORM will save you some time from lots of verbose coding since it can automatically generate much of the SQL you use in a typical CRUD type system. To gain this you will likely give up a little performance and some flexibility. If your system is low to medium volume (10's of thousands of requests per day) then an ORM will be just fine for you. If you start getting in to the millions of requests per day then you may need something a little more bare metal like straight SQL or stored procedures. Note than an ORM doesn't prevent you from going more direct to the DB, it's just not normally what you would use.
One final note, is that I think ORM persistence makes an application much more testable. If you use stored procedures for much of your persistence then you are almost bound to start getting a bunch of business logic in these. To test them you have to actually persist data and interact with the DB, this makes testing slow and brittle. Using an ORM framework you can either avoid most of this testing or use an in memory DB when you really want to test persistence.
See:
Stored Procedures and ORM's
Manual DAL & BLL vs. ORM
This may be better on the Programmers SE, but I'll answer here.
CRUD stored procedures used to be, and sometimes still are, the best practice for data persistence and retrieval on a SQL DBMS. Every such DBMS has stored procedures, so you're practically guaranteed to be able to use this solution regardless of the coding language and DBMS, and code which uses the solution can be pointed to any DB that has the proper stored procs and it'll work with minimal code changes (there are some syntax changes required when calling SPs in different DBMSes; often these are integrated into a language's library support for accessing SPs on a particular DBMS). Perhaps the biggest advantage is centralized access to the table data; you can lock the tables themselves down like Fort Knox, and dispense access rights for the SPs as necessary to more limited user accounts.
However, they have some drawbacks. First off, SPs are difficult to TDD, because the tools don't really exist within database IDEs; you have to create tests in other code that exercise the SPs (and so the test must set up the DB with the test data that is expected). From a technical standpoint, such a test is not and cannot be a "unit test", which is a small, narrow test of a small, narrow area of functionality, which has no side effects (such as reading/writing to the file system). Also, SPs are one more layer that has to be changed when making a needed change to functionality. Adding a new field to a query result requires changing the table, the retrieval source code, and the SP. Adding a new way to search for records of a particular type requires the statement to be created and tested, then encapsulated in a SP, and the corresponding method created on the DAO.
The new best practice where available, IMO, is a library called an object-relational mapper or ORM. An ORM abstracts the actual data layer, so what you're asking for becomes the code objects themselves, and you query for them based on properties of those objects, not based on table data. These queries are almost always code-configurable, and are translated into the DBMS's flavor of SQL based on one or more "mappings" that you define between the object model and the data model (objects of type A are persisted as records in table B, where this property C is written to field D).
The advantages are more flexibility within the code actually looking for data in the form of these code objects. The criteria of a query is usually able to be customized in-code; if a new query is needed that has a different WHERE clause, you just write the query, and the ORM will translate it into the new SQL statement. Because the ORM is the only place where SQL is actually used (and most ORMs use system stored procs to execute parameterized query strings where available) injection attacks are virtually impossible. Lastly, depending on the language and the ORM, queries can be compiler-checked; in .NET, a library called Linq is available that provides a SQL-ish keyword syntax, that is then converted into method calls that are given to a "query provider" that can translate those method calls into the data store's native query language. This also allows queries to be tested in-code; you can verify that the query used will produce the desired results given an in-memory collection of objects that stands in for the actual DBMS.
The disadvantages of an ORM is that the ORM library is usually language-specific; Hibernate is available in Java, NHibernate (and L2E and L2SQL) in .NET, and a few similar libraries like Pork in PHP, but if you're coding in an older or more esoteric language there's simply nothing of the sort available. Another one is that security becomes a little trickier; most ORMs require direct access to the tables in order to query and update them. A few will tolerate being pointed to a view for retrieval and SPs for updating (allowing segregation of view/SP and table security and the ability to restrict the retrievable fields), but now you're mixing the worst of both worlds; you still have to define mappings, but now you also have code in the data layer. The easiest way to overcome this is to implement your security elsewhere; force applications to get data using a web service, which provides the data using the ORM and has specific, limited "front doors". Also, many ORMs have some performance problems when used in certain ways; most are designed to "lazy-load" data, where data is retrieved the moment it's actually needed and not before, which increases up-front performance when you don't need every record you asked for. However, when you DO need every record you asked for, this creates extra round trips. You have to structure queries in specific ways to get around this expected use-case behavior.
Which is better? You have to decide. I can tell you now that using an ORM is MUCH easier to set up and get working correctly than SPs, and it's much easier to make (and limit the scope of) changes to the schema and to queries. In the modern development house, where the priority is to make it work first, and then make it perform well and/or be secure against intrusion, that's a HUGE plus. In most cases where you think security is an issue, it really isn't, and when security really is an issue, putting the solution in the DB layer is usually the wrong place, because the DBMS is the very last line of defense against intrusion; if the DBMS itself has to be counted on to stop something unwanted from happening, you have failed to do so (or even encouraged it to happen) in many layers of software and firmware above it.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
I'm not quite sure stackoverflow is a place for such a general question, but let's give it a try.
Being exposed to the need of storing application data somewhere, I've always used MySQL or sqlite, just because it's always done like that. As it seems like the whole world is using these databases (most of all software products, frameworks, etc), it is rather hard for a beginning developer like me to start thinking about whether this is a good solution or not.
Ok, say we have some object-oriented logic in our application, and objects are related to each other somehow. We need to map this logic to the storage logic, so relations between database objects are required too. This leads us to using relational database, and I'm ok with that - to put it simple, our database table rows sometimes will need to have references to other tables' rows. But why use SQL language for interaction with such a database?
SQL query is a text message. I can understand this is cool for actually understanding what it does, but isn't it silly to use text table and column names for a part of application that no one ever seen after deploynment? If you had to write a data storage from scratch, you would have never used this kind of solution. Personally, I would have used some 'compiled db query' bytecode, that would be assembled once inside a client application and passed to the database. And it surely would name tables and colons by id numbers, not ascii-strings. In the case of changes in table structure those byte queries could be recompiled according to new db schema, stored in XML or something like that.
What are the problems of my idea? Is there any reason for me not to write it myself and to use SQL database instead?
EDIT To make my question more clear. Most of answers claim that SQL, being a text query, helps developers better understand the query itself and debug it more easily. Personally, I haven't seen people writing SQL queries by hand for a while. Everyone I know, including me, is using ORM. This situation, in which we build up a new level of abstraction to hide SQL, leads to thinking if we need SQL or not. I would be very grateful if you could give some examples in which SQL is used without ORM purposely, and why.
EDIT2 SQL is an interface between a human and a database. The question is why do we have to use it for application/database interaction? I still ask for examples of human beings writing/debugging SQL.
Everyone I know, including me, is using ORM
Strange. Everyone I know, including me, still writes most of the SQL by hand. You typically end up with tighter, more high performance queries than you do with a generated solution. And, depending on your industry and application, this speed does matter. Sometimes a lot. yeah, I'll sometimes use LINQ for a quick-n-dirty where I don't really care what the resulting SQL looks like, but thus far nothing automated beats hand-tuned SQL for when performance against a large database in a high-load environment really matters.
If all you need to do is store some application data somewhere, then a general purpose RDBMS or even SQLite might be overkill. Serializing your objects and writing them to a file might be simpler in some cases. An advantage to SQLite is that if you have a lot of this kind of information, it is all contained in one file. A disadvantage is that it is more difficult to read it. For example, if you serialize you data to YAML, you can read the file with any text editor or shell.
Personally, I would have used some
'compiled db query' bytecode, that
would be assembled once inside a
client application and passed to the
database.
This is how some database APIs work. Check out static SQL and prepared statements.
Is there any reason for me not to
write it myself and to use SQL
database instead?
If you need a lot of features, at some point it will be easier to use an existing RDMBS then to write your own database from scratch. If you don't need many features, a simpler solution may be wiser.
The whole point of database products is to avoid writing the database layer for every new program. Yes, a modern RDMBS might not always be a perfect fit for every project. This is because they were designed to be very general, so in practice, you will always get additional features you don't need. That doesn't mean it is better to have a custom solution. The glove doesn't always need to be a perfect fit.
UPDATE:
But why use SQL language for
interaction with such a database?
Good question.
The answer to that may be found in the original paper describing the relational model A Relational Model of Data for Large Shared Data Banks, by E. F. Codd, published by IBM in 1970. This paper describes the problems with the existing database technologies of the time, and explains why the relational model is superior.
The reason for using the relational model, and thus a logical query language like SQL, is data independence.
Data independence is defined in the paper as:
"... the independence of application programs and terminal activities from the growth in data types and changes in data representations."
Before the relational model, the dominate technology for databases was referred to as the network model. In this model, the programmer had to know the on-disk structure of the data and traverse the tree or graph manually. The relational model allows one to write a query against the conceptual or logical scheme that is independent of the physical representation of the data on disk. This separation of logical scheme from the physical schema is why we use the relational model. For a more on this issue, here are some slides from a database class. In the relational model, we use logic based query languages like SQL to retrieve data.
Codd's paper goes into more detail about the benefits of the relational model. Give it a read.
SQL is a query language that is easy to type into a computer in contrast with the query languages typically used in a research papers. Research papers generally use relation algebra or relational calculus to write queries.
In summary, we use SQL because we happen to use the relational model for our databases.
If you understand the relational model, it is not hard to see why SQL is the way it is. So basically, you need to study the relation model and database internals more in-depth to really understand why we use SQL. It may be a bit of a mystery otherwise.
UPDATE 2:
SQL is an interface between a human
and a database. The question is why do
we have to use it for
application/database interaction? I
still ask for examples of human beings
writing/debugging SQL.
Because the database is a relational database, it only understands relational query languages. Internally it uses a relational algebra like language for specifying queries which it then turns into a query plan. So, we write our query in a form we can understand (SQL), the DB takes our SQL query and turns it into its internal query language. Then it takes the query and tries to find a "query plan" for executing the query. Then it executes the query plan and returns the result.
At some point, we must encode our query in a format that the database understands. The database only knows how to convert SQL to its internal representation, that is why there is always SQL at some point in the chain. It cannot be avoided.
When you use ORM, your just adding a layer on top of the SQL. The SQL is still there, its just hidden. If you have a higher-level layer for translating your request into SQL, then you don't need to write SQL directly which is beneficial in some cases. Some times we do not have such a layer that is capable of doing the kinds of queries we need, so we must use SQL.
Given the fact that you used MySQL and SQLite, I understand your point of view completely. Most DBMS have features that would require some of the programming from your side, while you get it from database for free:
Indexes - you can store large amounts of data and still be able to filter and search very quickly because of indexes. Of course, you could implement you own indexing, but why reinvent the wheel
data integrity - using database features like cascading foreign keys can ensure data integrity across the system. You only need to declare relationship between data, and system takes care of the rest. Of course, once more, you could implement constraints in code, but it's more work. Consider, for example, deletion, where you would have to write code in object's destructor to track all dependent objects and act accordingly
ability to have multiple applications written in different programming languages, working on different operating systems, some even distributed across the network - all using the same data stored in a common database
dead easy implementation of observer pattern via triggers. There are many cases where only some data depends on some other data and it does not affect UI aspect of application. Ensuring consistency can be very tricky or require a lot of programming. Of course, you could implement trigger-like behavior with objects but it requires more programming than simple SQL definition
There are some good answers here. I'll attempt to add my two cents.
I like SQL, I can think in it pretty easily. The queries produced by layers on top of the DB (like ORM frameworks) are usually hideous. They'll select tons of extra stuff, join in things you don't need, etc.; all because they don't know that you only want a small part of the object in this code. When you need high performance, you'll often end up going in and using at least some custom SQL queries in an ORM system just to speed up a few bottlenecks.
Why SQL? As others have said, it's easy for humans. It makes a good lowest common denominator. Any language can make SQL and call command line clients if necessary, and they is pretty much always a good library.
Is parsing out the SQL inefficient? Somewhat. The grammar is pretty structured, so there aren't tons of ambiguities that would make the parser's job really hard. The real thing is that the overhead of parsing out SQL is basically nothing.
Let's say you run a query like "SELECT x FROM table WHERE id = 3", and then do it again with 4, then 5, over and over. In that case, the parsing overhead may exist. That's why you have prepared statements (as others have mentioned). The server parses the query once, and can swap in the 3 and 4 and 5 without having to reparse everything.
But that's the trivial case. In real life, your system may join 6 tables and have to pull hundreds of thousands of records (if not more). It may be a query that you let run on a database cluster for hours, because that's the best way to do things in your case. Even with a query that takes only a minute or two to execute, the time to parse the query is essentially free compared to pulling records off disk and doing sorting/aggregation/etc. The overhead of sending the ext "LEFT OUTER JOIN ON" is only a few bytes compared to sending special encoded byte 0x3F. But when your result set is 30 MB (let alone gigs+), those few extra bytes are worthless compared to not having to mess with some special query compiler object.
Many people use SQL on small databases. The biggest one I interact with is only a few dozen gigs. SQL is used on everything from tiny files (like little SQLite DBs may be) up to terabyte size Oracle clusters. Considering it's power, it's actually a surprisingly simple and small command set.
It's an ubiquitous standard. Pretty much every programming language out there has a way to access SQL databases. Try that with a proprietary binary protocol.
Everyone knows it. You can find experts easily, new developers will usually understand it to some degree without requiring training
SQL is very closely tied to the relational model, which has been thoroughly explored in regard to optimization and scalability. But it still frequently requires manual tweaking (index creation, query structure, etc.), which is relatively easy due to the textual interface.
But why use SQL language for interaction with such a database?
I think it's for the same reason that you use a human-readable (source code) language for interaction with the compiler.
Personally, I would have used some 'compiled db query' bytecode, that would be assembled once inside a client application and passed to the database.
This is an existing (optional) feature of databases, called "stored procedures".
Edit:
I would be very grateful if you could give some examples in which SQL is used without ORM purposely, and why
When I implemented my own ORM, I implemented the ORM framework using ADO.NET: and using ADO.NET includes using SQL statements in its implementation.
After all the edits and comments, the main point of your question appears to be : why is the nature of SQL closer to being a human/database interface than to being an application/database interface ?
And the very simple answer to that question is : because that is exactly what it was originally intended to be.
The predecessors of SQL (QUEL being presumably the most important one) were intended to be exactly that : a QUERY language, i.e. one that didn't have any of INSERT, UPDATE, DELETE.
Moreover, it was intended to be a query language that could be used by any user, provided that user was aware of the logical structure of the database, and obviously knew how to express that logical structure in the query language he was using.
The original ideas behind QUEL/SQL were that a database was built using "just any mechanism conceivable", that the "real" database could be really just anything (e.g. one single gigantic XML file - allthough 'XML' was not considered a valid option at the time), and that there would be "some kind of machinery" that understood how to transform the actual structure of that 'just anything' into the logical relational structure as it was perceived by the SQL user.
The fact that in order to actually achieve that, the underlying structures are required to lend themselves to "viewing them relationally", was not understood as well in those days as it is now.
Yes, it is annoying to have to write SQL statements to store and retrieve objects.
That's why Microsoft have added things like LINQ (language integrated query) into C# and VB.NET to make it possible to query databases using objects and methods instead of strings.
Most other languages have something similar with varying levels of success depending on the abilities of that language.
On the other hand, it is useful to know how SQL works and I think it is a mistake to shield yourself entirely from it. If you use the database without thinking you can write extremely inefficient queries and index the database incorrectly. But once you understand how to use SQL correctly and have tuned your database, you have a very powerful tried-and-tested tool available for finding exactly the data you need extremely quickly.
My biggest reason for SQL is Ad-hoc reporting. That report your business users want but don't know that they need it yet.
SQL is an interface between a human
and a database. The question is why do
we have to use it for
application/database interaction? I
still ask for examples of human beings
writing/debugging SQL.
I use sqlite a lot right from the simplest of tasks (like logging my firewall logs directly to a sqlite database) to more complex analytic and debugging tasks in my day-to-day research. Laying out my data in tables and writing SQL queries to munge them in interesting ways seems to be the most natural thing to me in these situations.
On your point about why it is still used as an interface between application/database, this is my simple reasoning:
There is about 3-4 decades of
serious research in that area
starting in 1970 with Codd's seminal
paper on Relational Algebra.
Relational Algebra forms the
mathematical basis to SQL (and other
QLs), although SQL does not
completely follow the relational
model.
The "text" form of the language
(aside from being easily
understandable to humans) is also
easily parsable by machines (say
using a grammar parser like like
lex) and is easily convertable to whatever "bytecode" using any number of optimizations.
I am not sure if doing this in any
other way would have yielded
compelling benefits in the generic cases. Otherwise it
would have been probably discovered
and adopted in the 3 decades of
research. SQL probably provides the
best tradeoffs when bridging the
divide between humans/databases and
applications/databases.
The question that then becomes interesting to ask is, "What are the real benefits of doing SQL in any other "non-text" way?" Will google for this now:)
SQL is a common interface used by the DBMS platform - the entire point of the interface is that all database operations can be specified in SQL without needing supplementary API calls. This means that there is a common interface across all clients of the system - application software, reports and ad-hoc query tools.
Secondly, SQL gets more and more useful as queries get more complex. Try using LINQ to specify a 12-way join a with three conditions based on existential predicates and a condition based on an aggregate calculated in a subquery. This sort of thing is fairly comprehensible in SQL but unlikely to be possible in an ORM.
In many cases an ORM will do 95% of what you want - most of the queries issued by applications are simple CRUD operations that an ORM or other generic database interface mechanism can handle easily. Some operations are best done using custom SQL code.
However, ORMs are not the be-all and end-all of database interfacing. Fowler's Patterns of Enterprise Application Architecture has quite a good section on other types of database access strategy with some discussion of the merits of each.
There are often good reasons not to use an ORM as the primary database interface layer. An example of a good one is that platform database libraries like ADO.Net often do a good enough job and integrate nicely with the rest of the environment. You might find that the gain from using some other interface doesn't really outweigh the benefits from the integration.
However, the final reason that you can't really ignore SQL is that you are ultimately working with a database if you are doing a database application. There are many, many WTF stories about screw-ups in commercial application code done by people who didn't understand databases properly. Poorly thought-out database code can cause trouble in so many ways, and blithely thinking that you don't need to understand how the DBMS works is an act of Hubris that is bound to come and bite you some day. Worse yet, it will come and bite some other poor schmoe who inherits your code.
While I see your point, SQL's query language has a place, especially in large applications with a lot of data. And to point out the obvious, if the language wasn't there, you couldn't call it SQL (Structured Query Language). The benefit of having SQL over the method you described is SQL is generally very readable, though some really push the limits on their queries.
I whole heartly agree with Mark Byers, you should not shield yourself from SQL. Any developer can write SQL, but to really make your application perform well with SQL interaction, understanding the language is a must.
If everything was precompiled with bytecode as you described, I'd hate to be the one to have to debug the application after the original developer left (or even after not seeing the code for 6 months).
I think the premise of the question is incorrect. That SQL can be represented as text is immaterial. Most modern databases would only compile queries once and cache them anyway, so you already have effectively a 'compiled bytecode'. And there's no reason this couldn't happen client-wise though I'm not sure if anyone's done it.
You said SQL is a text message, well I think of him as a messenger, and, as we know, don't shoot the messenger. The real issue is that relations are not a good enough way of organising real world data. SQL is just lipstick on the pig.
If the first part you seem to refer to what is usually called the Object - relational mapping impedance. There are already a lot of frameworks to alleviate that problem. There are tradeofs as well. Some things will be easier, others will get more complex, but in the general case they work well if you can afford the extra layer.
In the second part you seem to complain about SQL being text (it uses strings instead of ids, etc)... SQL is a query language. Any language (computer or otherwise) that is meant to be read or written by humans is text oriented for that matter. Assembly, C, PHP, you name it. Why? Because, well... it does make sense, doesn't it?
If you want precompiled queries, you already have stored procedures. Prepared statements are also compiled once on the fly, IIRC. Most (if not all) db drivers talk to the database server using a binary protocol anyway.
yes, text is a bit inefficient. But actually getting the data is a lot more costly, so the text based sql is reasonably insignificant.
SQL was created to provide an interface to make ad hoc queries against a relational database.
Generally, most relational databases understand some form of SQL.
Object-oriented databases exist, and (presumably) use objects to do their querying... but as I understand it, OO databases have a lot more overheard, and relational databases work just fine.
Relational Databases also allow you to operate in a "disconnected" state. Once you have the information you asked for, you can close the database connection. With an OO database, you either need to return all objects related to the current one (and the ones they're related to... and the... etc...) or reopen the connection to retrieve new objects as they are accessed.
In addition to SQL, you also have ORMs (object-relational mappings) that map objects to SQL and back. There are quite a few of them, including LINQ (.NET), the MS Entity Framework (.NET), Hibernate (Java), SQLAlchemy (Python), ActiveRecord (Ruby), Class::DBI (Perl), etc...
A database language is useful because it provides a logical model for your data independent of any applications that use it. SQL has a lot of shortcomings however, not the least being that its integration with other languages is poor, type support is about 30 years behind the rest of the industry and it has never been a truly relational language anyway.
SQL has survived mostly because the database market has been and remains dominated by the three mega-vendors who have a vested interest in protecting their investment. That's changing and SQL's days are probably numbered but the model that will finally replace it probably hasn't arrived yet - although there are plenty of contenders around these days.
I don't think most people are getting your question, though I think it's very clear. Unfortunately I don't have the "correct" answer. I would guess it's a combination of several things:
Semi-arbitrary decisions when it was designed such as ease of use, not needing a SQL compiler (or IDE), portability, etc.
It happened to catch on well (probably due to similar reasons)
And now due to historical reasons (compatibility, well known, proven, etc.) continues to be used.
I don't think most companies have bothered with another solution because it works well, isn't much of a bottleneck, it's a standard, blah, blah..
One of the Unix design principles can be said thusly, "Write programs to handle text streams, because that is a universal interface.".
And that, I believe, is why we typically use SQL instead of some 'byte-SQL' that only has a compilation interface. Even if we did have a byte-SQL, someone would write a "Text SQL", and the loop would be complete.
Also, MySQL and SQLite are less full-featured than, say, MSSQL and Oracle SQL. So you're still in the low end of the SQL pool.
Actually there are a few non-SQL database (like Objectivity, Oracle Berkeley DB, etc.) products came but non of them succeeded. In future if someone finds intuitive alternative for SQL, that will answer your question.
There are a lot of non relational database systems. Here are just a few:
Memcached
Tokyo Cabinet
As far as finding a relational database that doesn't use SQL as its primary interface, I think you won't find it. Reason: SQL is a great way to talk about relations. I can't figure out why that's a big deal to you: if you don't like SQL, put an abstraction over it (like an ORM) so you don't have to worry about it. Let the abstraction worry about it. It gets you to the same place.
However, the problem your'e really mentioning here is the object-relation disconnect - the problem is with the relation itself. Objects and relational-tuples don't always lend themselves to be a 1-1 relationship, which is the reason why a developer can frustrated with a database. The solution to that is to use a different database type.
Because often, you cannot be sure that (citing you) "no one ever seen after deployment". Knowing that there is an easy interface for reporting and for dataset level querying is a good path for evolution of your app.
You're right, that there are other solutions that may be valid in some situations: XML, plain text files, OODB...
But having a set of common interfaces (like ODBC) is a huge plus for the life of data.
I think the reason might be the search/find/grab algorithms the sql laungage is connected to do. Remember that sql has been developed for 40 years - and the goal has been both preformence wise and user firendly wise.
Ask yourself what the best way of finding 2 attibutes is. Now why investigating that each time you would want to do something that includes that each time you develope your application. Assuming the main goal is the developing of your application when developing an application.
An application has similarities with other applications, a database has similarities with other databases. So there should be a "best way" of these to interact, logically.
Also ask yourself how you would develop a better console only application that does not use sql laungage. If you cannot do that I think you need to develope a new kind of GUI that are even more fundamentally easier to use than with a console - to develope things from it. And that might actually be possible. But still most development of applications is based around console and typing.
Then when it comes to laungage I don´t think you can make a much more fundamentally easier text laungage than sql. And remember that each word of anything is inseperatly connected to its meaning - if you remove the meaning the word cannot be used - if you remove the word you cannot communicate the meaning. You have nothing to describe it with (And maybe you cannot even think it beacuse it woulden´t be connected to anything else you have thought before...).
So basically the best possible algorithms for database manipulation are assigned to words - if you remove these words you will have to assign these manipulations something else - and what would that be?
i think you can use ORM
if and only if you know the basic of sql.
else the result there isn't the best
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 8 years ago.
The community reviewed whether to reopen this question last year and left it closed:
Original close reason(s) were not resolved
Improve this question
For some of the apps I've developed (then proceeded to forget about), I've been writing plain SQL, primarily for MySQL. Though I have used ORMs in python like SQLAlchemy, I didn't stick with them for long. Usually it was either the documentation or complexity (from my point of view) holding me back.
I see it like this: use an ORM for portability, plain SQL if it's just going to be using one type of database. I'm really looking for advice on when to use an ORM or SQL when developing an app that needs database support.
Thinking about it, it would be far better to just use a lightweight wrapper to handle database inconsistencies vs. using an ORM.
Speaking as someone who spent quite a bit of time working with JPA (Java Persistence API, basically the standardized ORM API for Java/J2EE/EJB), which includes Hibernate, EclipseLink, Toplink, OpenJPA and others, I'll share some of my observations.
ORMs are not fast. They can be adequate and most of the time adequate is OK but in a high-volume low-latency environment they're a no-no;
In general purpose programming languages like Java and C# you need an awful lot of magic to make them work (eg load-time weaving in Java, instrumentation, etc);
When using an ORM, rather than getting further from SQL (which seems to be the intent), you'll be amazed how much time you spend tweaking XML and/or annotations/attributes to get your ORM to generate performant SQL;
For complex queries, there really is no substitute. Like in JPA there are some queries that simply aren't possible that are in raw SQL and when you have to use raw SQL in JPA it's not pretty (C#/.Net at least has dynamic types--var--which is a lot nicer than an Object array);
There are an awful lot of "gotchas" when using ORMs. This includes unintended or unexpected behavior, the fact that you have to build in the capability to do SQL updates to your database (by using refresh() in JPA or similar methods because JPA by default caches everything so it won't catch a direct database update--running direct SQL updates is a common production support activity);
The object-relational mismatch is always going to cause problems. With any such problem there is a tradeoff between complexity and completeness of the abstraction. At times I felt JPA went too far and hit a real law of diminishing returns where the complexity hit wasn't justified by the abstraction.
There's another problem which takes a bit more explanation.
The traditional model for a Web application is to have a persistence layer and a presentation layer (possibly with a services or other layers in between but these are the important two for this discussion). ORMs force a rigid view from your persistence layer up to the presentation layer (ie your entities).
One of the criticisms of more raw SQL methods is that you end up with all these VOs (value objects) or DTOs (data transfer objects) that are used by simply one query. This is touted as an advantage of ORMs because you get rid of that.
Thing is those problems don't go away with ORMs, they simply move up to the presentation layer. Instead of creating VOs/DTOs for queries, you create custom presentation objects, typically one for every view. How is this better? IMHO it isn't.
I've written about this in ORM or SQL: Are we there yet?.
My persistence technology of choice (in Java) these days is ibatis. It's a pretty thin wrapper around SQL that does 90%+ of what JPA can do (it can even do lazy-loading of relationships although its not well-documented) but with far less overhead (in terms of complexity and actual code).
This came up last year in a GWT application I was writing. Lots of translation from EclipseLink to presentation objects in the service implementation. If we were using ibatis it would've been far simpler to create the appropriate objects with ibatis and then pass them all the way up and down the stack. Some purists might argue this is Bad™. Maybe so (in theory) but I tell you what: it would've led to simpler code, a simpler stack and more productivity.
ORMs have some nice features. They can handle much of the dog-work of copying database columns to object fields. They usually handle converting the language's date and time types to the appropriate database type. They generally handle one-to-many relationships pretty elegantly as well by instantiating nested objects. I've found if you design your database with the strengths and weaknesses of the ORM in mind, it saves a lot of work in getting data in and out of the database. (You'll want to know how it handles polymorphism and many-to-many relationships if you need to map those. It's these two domains that provide most of the 'impedance mismatch' that makes some call ORM the 'vietnam of computer science'.)
For applications that are transactional, i.e. you make a request, get some objects, traverse them to get some data and render it on a Web page, the performance tax is small, and in many cases ORM can be faster because it will cache objects it's seen before, that otherwise would have queried the database multiple times.
For applications that are reporting-heavy, or deal with a large number of database rows per request, the ORM tax is much heavier, and the caching that they do turns into a big, useless memory-hogging burden. In that case, simple SQL mapping (LinQ or iBatis) or hand-coded SQL queries in a thin DAL is the way to go.
I've found for any large-scale application you'll find yourself using both approaches. (ORM for straightforward CRUD and SQL/thin DAL for reporting).
I say plain SQL for Reads, ORM for CUD.
Performance is something I'm always concerned about, specially in web applications, but also code maintainability and readability. To address these issues I wrote SqlBuilder.
ORM is not just portability (which is kinda hard to achieve even with ORMs, for that matter). What it gives you is basically a layer of abstraction over a persistent store, when a ORM tool frees you from writing boilerplate SQL queries (selects by PK or by predicates, inserts, updates and deletes) and lets you concentrate on the problem domain.
Any respectable design will need some abstraction for the database, just to handle the impedance mismatch. But the simplest first step (and adequate for most cases) I would expect would be a DAL, not a heavyweight ORM. Your only options aren't those at the ends of the spectrum.
EDIT in response to a comment requesting me to describe how I distinguish DAL from ORM:
A DAL is what you write yourself, maybe starting from a class that simply encapsulates a table and maps its fields to properties. An ORM is code you don't write for abstraction mechanisms inferred from other properties of your dbms schema, mostly PKs and FKs. (This is where you find out if the automatic abstractions start getting leaky or not. I prefer to inform them intentionally, but that may just be my personal preference).
The key that made my ORM use really fly was code generation. I agree that the ORM route isn't the fastest, in code performance terms. But when you have a medium to large team, the DB is changing rapidly the ability to regenerate classes and mappings from the DB as part of the build process is something brilliant to behold, especially when you use CI. So your code may not be the fastest, but your coding will be - I know which I'd take in most projects.
My recommendation is to develop using an ORM while the Schema is still fluid, use profiling to find bottlenecks, then tune those areas which need it using raw Sql.
Another thought, the caching built into Hibernate can often make massive performance improvements if used in the right way. No more going back to the DB to read reference data.
Dilemma whether to use a framework or not is quite common in modern day software development scenario.
What is important to understand is that every framework or approach has its pros and cons - for example in our experience we have found that ORM is useful when dealing with transactions i.e. insert/update/delete operations - but when it comes to fetch data with complex results it becomes important to evaluate the performance and effectiveness of the ORM tool.
Also it is important to understand that it is not compulsory to select a framework or an approach and implement everything in that. What we mean by that is we can have mix of ORM and native query language. Many ORM frameworks give extension points to plugin in native SQL. We should try not to over use a framework or an approach. We can combine certain frameworks or approaches and come with an appropriate solution.
You can use ORM when it comes to insertion, updation, deletion, versioning with high level of concurrency and you can use Native SQL for report generation and long listing
There's no 'one-tool-fits-all' solution, and this is also true for the question 'should i use an or/m or not ? '.
I would say: if you have to write an application/tool which is very 'data' focused, without much other logic, then I 'd use plain SQL, since SQL is the domain-specific language for this kind of applications.
On the other hand, if I was to write a business/enterprise application which contains a lot of 'domain' logic, then I'd write a rich class model which could express this domain in code. In such case, an OR/M mapper might be very helpfull to successfully do so, as it takes a lot of plumbing code out of your hands.
One of the apps I've developed was an IRC bot written in python. The modules it uses run in separate threads, but I haven't figured out a way to handle threading when using sqlite. Though, that might be better for a separate question.
I really should have just reworded both the title and the actual question. I've never actually used a DAL before, in any language.
Use an ORM that works like SQL, but provides compile-time checks and type safety. Like my favorite: Data Knowledge Objects (disclosure: I wrote it)
For example:
for (Bug bug : Bug.ALL.limit(100)) {
int id = bug.getId();
String title = bug.getTitle();
System.out.println(id +" "+ title);
}
Fully streaming. Easy to set up (no mappings to define - reads your existing schemas). Supports joins, transactions, inner queries, aggregation, etc. Pretty much anything you can do in SQL. And has been proven from giant datasets (financial time series) all the way down to trivial (Android).
I know this question is very old, but I thought that I would post an answer in case anyone comes across it like me. ORMs have come a long way. Some of them actually give you the best of both worlds: making development more productive and maintaining performance.
Take a look at SQL Data (http://sqldata.codeplex.com). It is a very light weight ORM for c# that covers all the bases.
FYI, I am the author of SQL Data.
I'd like to add my voice to the chorus of replies that say "There's a middle ground!".
To an application programmer, SQL is a mixture of things you might want to control and things you almost certainly don't want to be bothered controlling.
What I've always wanted is a layer (call it DAL, ORM, or micro-ORM, I don't mind which) that will take charge of the completely predictable decisions (how to spell SQL keywords, where the parentheses go, when to invent column aliases, what columns to create for a class that holds two floats and an int ...), while leaving me in charge of the higher-level aspects of the SQL, i.e. how to arrange JOINs, server-side computations, DISTINCTs, GROUP BYs, scalar subqueries, etc.
So I wrote something that does this: http://quince-lib.com/
It's for C++: I don't know whether that's the language you're using, but all the same it might be interesting to see this take on what a "middle ground" could look like.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
As a web developer looking to move from hand-coded PHP sites to framework-based sites, I have seen a lot of discussion about the advantages of one ORM over another. It seems to be useful for projects of a certain (?) size, and even more important for enterprise-level applications.
What does it give me as a developer? How will my code differ from the individual SELECT statements that I use now? How will it help with DB access and security? How does it find out about the DB schema and user credentials?
Edit: #duffymo pointed out what should have been obvious to me: ORM is only useful for OOP code. My code is not OO, so I haven't run into the problems that ORM solves.
I'd say that if you aren't dealing with objects there's little point in using an ORM.
If your relational tables/columns map 1:1 with objects/attributes, there's not much point in using an ORM.
If your objects don't have any 1:1, 1:m or m:n relationships with other objects, there's not much point in using an ORM.
If you have complex, hand-tuned SQL, there's not much point in using an ORM.
If you've decided that your database will have stored procedures as its interface, there's not much point in using an ORM.
If you have a complex legacy schema that can't be refactored, there's not much point in using an ORM.
So here's the converse:
If you have a solid object model, with relationships between objects that are 1:1, 1:m, and m:n, don't have stored procedures, and like the dynamic SQL that an ORM solution will give you, by all means use an ORM.
Decisions like these are always a choice. Choose, implement, measure, evaluate.
ORMs are being hyped for being the solution to Data Access problems. Personally, after having used them in an Enterprise Project, they are far from being the solution for Enterprise Application Development. Maybe they work in small projects. Here are the problems we have experienced with them specifically nHibernate:
Configuration: ORM technologies require configuration files to map table schemas into object structures. In large enterprise systems the configuration grows very quickly and becomes extremely difficult to create and manage. Maintaining the configuration also gets tedious and unmaintainable as business requirements and models constantly change and evolve in an agile environment.
Custom Queries: The ability to map custom queries that do not fit into any defined object is either not supported or not recommended by the framework providers. Developers are forced to find work-arounds by writing adhoc objects and queries, or writing custom code to get the data they need. They may have to use Stored Procedures on a regular basis for anything more complex than a simple Select.
Proprietery binding: These frameworks require the use of proprietary libraries and proprietary object query languages that are not standardized in the computer science industry. These proprietary libraries and query languages bind the application to the specific implementation of the provider with little or no flexibility to change if required and no interoperability to collaborate with each other.
Object Query Languages: New query languages called Object Query Languages are provided to perform queries on the object model. They automatically generate SQL queries against the databse and the user is abstracted from the process. To Object Oriented developers this may seem like a benefit since they feel the problem of writing SQL is solved. The problem in practicality is that these query languages cannot support some of the intermediate to advanced SQL constructs required by most real world applications. They also prevent developers from tweaking the SQL queries if necessary.
Performance: The ORM layers use reflection and introspection to instantiate and populate the objects with data from the database. These are costly operations in terms of processing and add to the performance degradation of the mapping operations. The Object Queries that are translated to produce unoptimized queries without the option of tuning them causing significant performance losses and overloading of the database management systems. Performance tuning the SQL is almost impossible since the frameworks provide little flexiblity over controlling the SQL that gets autogenerated.
Tight coupling: This approach creates a tight dependancy between model objects and database schemas. Developers don't want a one-to-one correlation between database fields and class fields. Changing the database schema has rippling affects in the object model and mapping configuration and vice versa.
Caches: This approach also requires the use of object caches and contexts that are necessary to maintian and track the state of the object and reduce database roundtrips for the cached data. These caches if not maintained and synchrnonized in a multi-tiered implementation can have significant ramifications in terms of data-accuracy and concurrency. Often third party caches or external caches have to be plugged in to solve this problem, adding extensive burden to the data-access layer.
For more information on our analysis you can read:
http://www.orasissoftware.com/driver.aspx?topic=whitepaper
At a very high level: ORMs help to reduce the Object-Relational impedance mismatch. They allow you to store and retrieve full live objects from a relational database without doing a lot of parsing/serialization yourself.
What does it give me as a developer?
For starters it helps you stay DRY. Either you schema or you model classes are authoritative and the other is automatically generated which reduces the number of bugs and amount of boiler plate code.
It helps with marshaling. ORMs generally handle marshaling the values of individual columns into the appropriate types so that you don't have to parse/serialize them yourself. Furthermore, it allows you to retrieve fully formed object from the DB rather than simply row objects that you have to wrap your self.
How will my code differ from the individual SELECT statements that I use now?
Since your queries will return objects rather then just rows, you will be able to access related objects using attribute access rather than creating a new query. You are generally able to write SQL directly when you need to, but for most operations (CRUD) the ORM will make the code for interacting with persistent objects simpler.
How will it help with DB access and security?
Generally speaking, ORMs have their own API for building queries (eg. attribute access) and so are less vulnerable to SQL injection attacks; however, they often allow you to inject your own SQL into the generated queries so that you can do strange things if you need to. Such injected SQL you are responsible for sanitizing yourself, but, if you stay away from using such features then the ORM should take care of sanitizing user data automatically.
How does it find out about the DB schema and user credentials?
Many ORMs come with tools that will inspect a schema and build up a set of model classes that allow you to interact with the objects in the database. [Database] user credentials are generally stored in a settings file.
If you write your data access layer by hand, you are essentially writing your own feature poor ORM.
Oren Eini has a nice blog which sums up what essential features you may need in your DAL/ORM and why it writing your own becomes a bad idea after time:
http://ayende.com/Blog/archive/2006/05/12/25ReasonsNotToWriteYourOwnObjectRelationalMapper.aspx
EDIT: The OP has commented in other answers that his code base isn't very object oriented. Dealing with object mapping is only one facet of ORMs. The Active Record pattern is a good example of how ORMs are still useful in scenarios where objects map 1:1 to tables.
Top Benefits:
Database Abstraction
API-centric design mentality
High Level == Less to worry about at the fundamental level (its been thought of for you)
I have to say, working with an ORM is really the evolution of database-driven applications. You worry less about the boilerplate SQL you always write, and more on how the interfaces can work together to make a very straightforward system.
I love not having to worry about INNER JOIN and SELECT COUNT(*). I just work in my high level abstraction, and I've taken care of database abstraction at the same time.
Having said that, I never have really run into an issue where I needed to run the same code on more than one database system at a time realistically. However, that's not to say that case doesn't exist, its a very real problem for some developers.
I can't speak for other ORM's, just Hibernate (for Java).
Hibernate gives me the following:
Automatically updates schema for tables on production system at run-time. Sometimes you still have to update some things manually yourself.
Automatically creates foreign keys which keeps you from writing bad code that is creating orphaned data.
Implements connection pooling. Multiple connection pooling providers are available.
Caches data for faster access. Multiple caching providers are available. This also allows you to cluster together many servers to help you scale.
Makes database access more transparent so that you can easily port your application to another database.
Make queries easier to write. The following query that would normally require you to write 'join' three times can be written like this:
"from Invoice i where i.customer.address.city = ?" this retrieves all invoices with a specific city
a list of Invoice objects are returned. I can then call invoice.getCustomer().getCompanyName(); if the data is not already in the cache the database is queried automatically in the background
You can reverse-engineer a database to create the hibernate schema (haven't tried this myself) or you can create the schema from scratch.
There is of course a learning curve as with any new technology but I think it's well worth it.
When needed you can still drop down to the lower SQL level to write an optimized query.
Most databases used are relational databases which does not directly translate to objects. What an Object-Relational Mapper does is take the data, create a shell around it with utility functions for updating, removing, inserting, and other operations that can be performed. So instead of thinking of it as an array of rows, you now have a list of objets that you can manipulate as you would any other and simply call obj.Save() when you're done.
I suggest you take a look at some of the ORM's that are in use, a favourite of mine is the ORM used in the python framework, django. The idea is that you write a definition of how your data looks in the database and the ORM takes care of validation, checks and any mechanics that need to run before the data is inserted.
What does it give me as a developer?
Saves you time, since you don't have to code the db access portion.
How will my code differ from the individual SELECT statements that I use now?
You will use either attributes or xml files to define the class mapping to the database tables.
How will it help with DB access and security?
Most frameworks try to adhere to db best practices where applicable, such as parametrized SQL and such. Because the implementation detail is coded in the framework, you don't have to worry about it. For this reason, however, it's also important to understand the framework you're using, and be aware of any design flaws or bugs that may open unexpected holes.
How does it find out about the DB schema and user credentials?
You provide the connection string as always. The framework providers (e.g. SQL, Oracle, MySQL specific classes) provide the implementation that queries the db schema, processes the class mappings, and renders / executes the db access code as necessary.
Personally I've not had a great experience with using ORM technology to date. I'm currently working for a company that uses nHibernate and I really can't get on with it. Give me a stored proc and DAL any day! More code sure ... but also more control and code that's easier to debug - from my experience using an early version of nHibernate it has to be added.
Using an ORM will remove dependencies from your code on a particular SQL dialect. Instead of directly interacting with the database you'll be interacting with an abstraction layer that provides insulation between your code and the database implementation. Additionally, ORMs typically provide protection from SQL injection by constructing parameterized queries. Granted you could do this yourself, but it's nice to have the framework guarantee.
ORMs work in one of two ways: some discover the schema from an existing database -- the LINQToSQL designer does this --, others require you to map your class onto a table. In both cases, once the schema has been mapped, the ORM may be able to create (recreate) your database structure for you. DB permissions probably still need to be applied by hand or via custom SQL.
Typically, the credentials supplied programatically via the API or using a configuration file -- or both, defaults coming from a configuration file, but able to be override in code.
While I agree with the accepted answer almost completely, I think it can be amended with lightweight alternatives in mind.
If you have complex, hand-tuned SQL
If your objects don't have any 1:1, 1:m or m:n relationships with other objects
If you have a complex legacy schema that can't be refactored
...then you might benefit from a lightweight ORM where SQL is is not
obscured or abstracted to the point where it is easier to write your
own database integration.
These are a few of the many reasons why the developer team at my company decided that we needed to make a more flexible abstraction to reside on top of the JDBC.
There are many open source alternatives around that accomplish similar things, and jORM is our proposed solution.
I would recommend to evaluate a few of the strongest candidates before choosing a lightweight ORM. They are slightly different in their approach to abstract databases, but might look similar from a top down view.
jORM
ActiveJDBC
ORMLite
my concern with ORM frameworks is probably the very thing that makes it attractive to lots of developers.
nameley that it obviates the need to 'care' about what's going on at the DB level. Most of the problems that we see during the day to day running of our apps are related to database problems. I worry slightly about a world that is 100% ORM that people won't know about what queries are hitting the database, or if they do, they are unsure about how to change them or optimize them.
{I realize this may be a contraversial answer :) }