When shouldn't you use a relational database? [closed] - sql

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
Apart from the google/bigtable scenario, when shouldn't you use a relational database? Why not, and what should you use? (did you learn 'the hard way'?)

In my experience, you shouldn't use a relational database when any one of these criteria are true:
your data is structured as a hierarchy or a graph (network) of arbitrary depth,
the typical access pattern emphasizes reading over writing, or
there’s no requirement for ad-hoc queries.
Deep hierarchies and graphs do not translate well to relational tables. Even with the assistance of proprietary extensions like Oracle's CONNECT BY, chasing down trees is a mighty pain using SQL.
Relational databases add a lot of overhead for simple read access. Transactional and referential integrity are powerful, but overkill for some applications. So for read-mostly applications, a file metaphor is good enough.
Finally, you simply don’t need a relational database with its full-blown query language if there are no unexpected queries anticipated. If there are no suits asking questions like "how many 5%-discounted blue widgets did we sell in on the east coast grouped by salesperson?", and there never will be, then you, sir, can live free of DB.

The relational database paradigm makes some assumptions about usage of data.
A relation consists of an unordered set of rows.
All rows in a relation have the same set of columns.
Each column has a fixed name and data type and semantic meaning on all rows.
Rows in a relation are identified by unique values in primary key column(s).
etc.
These assumptions support simplicity and structure, at the cost of some flexibility. Not all data management tasks fit into this kind of structure. Entities with complex attributes or variable attributes do not, for instance. If you need flexibility in areas where a relational database solution doesn't support it, you need to use a different kind of solution.
There are other solutions for managing data with different requirements. Semantic Web technology, for example, allows each entity to define its own attributes and to be self-describing, by treating metadata as attributes just like data. This is more flexible than the structure imposed by a relational database, but that flexibility comes with a cost of its own.
Overall, you should use the right tool for each job.
See also my other answer to "The Next-gen databases."

There are three main data models (C.J.Date, E.F.Codd) and I am adding a flat file to this:
flat file(s) (structure varies - from 'stupid' flat text to files conforming to grammars which coupled with clever tools do very clever things, think compilers and what they can do, narrow application in modelling new things)
hierarchical (trees, nested sets - examples: xml and other markup languages, registry, organizational charts, etc; anything can be modelled, but integrity rules are not easy to express and retrieval is hard to optimize automatically, some retrieval is fast and some is very slow )
network (networks, graphs - examples: navigational databases, hyperlinks, semantic web, again almost anything can be modelled but automatic optimizing of retrieval is a problem)
relational (first order predicate logic - example: relational databases, automatic optimization of retrieval)
Both hierarchical and network can be represented in relational and relational can be expressed in the other two.
The reason that relational is considered 'better' is the declarative nature and standardization on not only the data retrieval language but also on the data definition language, including the strong declarative data integrity, backed up with stable, scalable, multi-user management system.
Benefits come at a cost, which most projects find to be a good ratio for systems (multi application) that store long term data in a from that will be usable in foreseeable future.
If you are not building a system, but a single application, perhaps for a single user, and you are fairly certain that you will not want multiple applications using your data, nor multiple users, any time soon then you'll probably find faster approaches.
Also if you don't know what kind of data you want to store and how to model it then relational model strengths are wasted on it.
Or if you simply don't care about integrity of your data that much (which can be fine).
All data structures are optimized for a certain kind of use, only relational if properly modelled tries to represent the 'reality' in semantically unbiased way. People who had bad experience with relational databases usually don't realize that their experience would have been much worse with other types of data models. Horrible implementations are possible, and especially with relational databases, where it is relatively easy to build complex models, you could end up with quite a monster on your hands. Still I always feel better when I try to imagine the same monster in xml.
One example of how good relational model is, IMO, is ratio of complexity vs shortness of the questions that you will find that involve SQL.

I suggest you visit the High Scalability blog, which discusses this topic almost on a daily basis and has many articles about projects that chose distributed hashes, etc. over RDMBS.
The quick (but very incomplete answer) is that not all data translates well to tables in efficient ways. For example, if your data is essentially one big dictionary, there are probably much faster alternatives that plain old RDBMS. Having said that, it mostly a matter of performance, and if performance isn't a huge concern in a project, and stability, consistency and reliability, for example, are, then I don't see much point in delving into these technologies when RDBMS is a much more mature and well developed scheme, with support in all languages and platforms and a huge set of solutions to choose from.

Fifteen years ago I was working on a credit risk system (basically a big tree walking system). We were using Sybase on HPUX & solaris and performnce was killing us. We hired in consultants direct from Sybase who said it couldn't be done. Then we switched to an OO database (Object store in this case) and got a about a 100x performance increase (and the code was about 100x easier to write too)
But such situations are quite rare - a relational database is a good first choice.

When you schema varies a lot you will have a hard time with relational databases. This is where XML databases or key-value pair databases work best. or you could use IBM DB2 and have both relational data and XML data managed by a single database engine.

About 7-8 years ago I worked on a web site that grew in popularity beyond our initial expectations and it got us in trouble performance-wise. Since we were all relatively inexperienced in web based projects it posed a significant strain on us about what to do beyond usual database separation onto separate server, load balancing etc.
One day I've thought of something pretty simple. Since site was based on users, their profiles were stored in a database table the usual way someone would do it - user id, lots of info variables and stuff like that - which would show up as a users profile page which other users could look up. I've flushed all that data into a simple html file, already prepared as a users profile page and got a significant boost - basically a cache. I even made a system that when user edited their profile info, it would parse original html file, put it up for edit, and then flush out html back to the file system - got even more boost.
I made something simillar with messages users sent to each other. Basically wherever I could make a system bypass a database altogether, avoiding a INSERT or UPDATE, I got a significant boost. It may sound like a common sense, but it was an enlightening moment. It is not an avoidance of relational setup per se, but it is an avoidance of the database altogether - KISS.

Related

What is NoSQL? How it works? [duplicate]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 9 years ago.
Improve this question
I've been hearing things about NoSQL and that it may eventually become the replacement for SQL DB storage methods due to the fact that DB interaction is often a bottle neck for speed on the web.
So I just have a few questions:
What exactly is it?
How does it work?
Why would it be better than using a SQL Database? And how much better is it?
Is the technology too new to start implementing yet or is it worth taking a look into?
There is no such thing as NoSQL!
NoSQL is a buzzword.
For decades, when people were talking about databases, they meant relational databases. And when people were talking about relational databases, they meant those you control with Edgar F. Codd's Structured Query Language. Storing data in some other way? Madness! Anything else is just flatfiles.
But in the past few years, people started to question this dogma. People wondered if tables with rows and columns are really the only way to represent data. People started thinking and coding, and came up with many new concepts how data could be organized. And they started to create new database systems designed for these new ways of working with data.
The philosophies of all these databases were different. But one thing all these databases had in common, was that the Structured Query Language was no longer a good fit for using them. So each database replaced SQL with their own query languages. And so the term NoSQL was born, as a label for all database technologies which defy the classic relational database model.
So what do NoSQL databases have in common?
Actually, not much.
You often hear phrases like:
NoSQL is scalable!
NoSQL is for BigData!
NoSQL violates ACID!
NoSQL is a glorified key/value store!
Is that true? Well, some of these statements might be true for some databases commonly called NoSQL, but every single one is also false for at least one other. Actually, the only thing NoSQL databases have in common, is that they are databases which do not use SQL. That's it. The only thing that defines them is what sets them apart from each other.
So what sets NoSQL databases apart?
So we made clear that all those databases commonly referred to as NoSQL are too different to evaluate them together. Each of them needs to be evaluated separately to decide if they are a good fit to solve a specific problem. But where do we begin? Thankfully, NoSQL databases can be grouped into certain categories, which are suitable for different use-cases:
Document-oriented
Examples: MongoDB, CouchDB
Strengths: Heterogenous data, working object-oriented, agile development
Their advantage is that they do not require a consistent data structure. They are useful when your requirements and thus your database layout changes constantly, or when you are dealing with datasets which belong together but still look very differently. When you have a lot of tables with two columns called "key" and "value", then these might be worth looking into.
Graph databases
Examples: Neo4j, GiraffeDB.
Strengths: Data Mining
While most NoSQL databases abandon the concept of managing data relations, these databases embrace it even more than those so-called relational databases.
Their focus is at defining data by its relation to other data. When you have a lot of tables with primary keys which are the primary keys of two other tables (and maybe some data describing the relation between them), then these might be something for you.
Key-Value Stores
Examples: Redis, Cassandra, MemcacheDB
Strengths: Fast lookup of values by known keys
They are very simplistic, but that makes them fast and easy to use. When you have no need for stored procedures, constraints, triggers and all those advanced database features and you just want fast storage and retrieval of your data, then those are for you.
Unfortunately they assume that you know exactly what you are looking for. You need the profile of User157641? No problem, will only take microseconds. But what when you want the names of all users who are aged between 16 and 24, have "waffles" as their favorite food and logged in in the last 24 hours? Tough luck. When you don't have a definite and unique key for a specific result, you can't get it out of your K-V store that easily.
Is SQL obsolete?
Some NoSQL proponents claim that their favorite NoSQL database is the new way of doing things, and SQL is a thing of the past.
Are they right?
No, of course they aren't. While there are problems SQL isn't suitable for, it still got its strengths. Lots of data models are simply best represented as a collection of tables which reference each other. Especially because most database programmers were trained for decades to think of data in a relational way, and trying to press this mindset onto a new technology which wasn't made for it rarely ends well.
NoSQL databases aren't a replacement for SQL - they are an alternative.
Most software ecosystems around the different NoSQL databases aren't as mature yet. While there are advances, you still haven't got supplemental tools which are as mature and powerful as those available for popular SQL databases.
Also, there is much more know-how for SQL around. Generations of computer scientists have spent decades of their careers into research focusing on relational databases, and it shows: The literature written about SQL databases and relational data modelling, both practical and theoretical, could fill multiple libraries full of books. How to build a relational database for your data is a topic so well-researched it's hard to find a corner case where there isn't a generally accepted by-the-book best practice.
Most NoSQL databases, on the other hand, are still in their infancy. We are still figuring out the best way to use them.
What exactly is it?
On one hand, a specific system, but it has also become a generic word for a variety of new data storage backends that do not follow the relational DB model.
How does it work?
Each of the systems labelled with the generic name works differently, but the basic idea is to offer better scalability and performance by using DB models that don't support all the functionality of a generic RDBMS, but still enough functionality to be useful. In a way it's like MySQL, which at one time lacked support for transactions but, exactly because of that, managed to outperform other DB systems. If you could write your app in a way that didn't require transactions, it was great.
Why would it be better than using a SQL Database? And how much better is it?
It would be better when your site needs to scale so massively that the best RDBMS running on the best hardware you can afford and optimized as much as possible simply can't keep up with the load. How much better it is depends on the specific use case (lots of update activity combined with lots of joins is very hard on "traditional" RDBMSs) - could well be a factor of 1000 in extreme cases.
Is the technology too new to start implementing yet or is it worth taking a look into?
Depends mainly on what you're trying to achieve. It's certainly mature enough to use. But few applications really need to scale that massively. For most, a traditional RDBMS is sufficient. However, with internet usage becoming more ubiquitous all the time, it's quite likely that applications that do will become more common (though probably not dominant).
Since someone said that my previous post was off-topic, I'll try to compensate :-) NoSQL is not, and never was, intended to be a replacement for more mainstream SQL databases, but a couple of words are in order to get things in the right perspective.
At the very heart of the NoSQL philosophy lies the consideration that, possibly for commercial and portability reasons, SQL engines tend to disregard the tremendous power of the UNIX operating system and its derivatives.
With a filesystem-based database, you can take immediate advantage of the ever-increasing capabilities and power of the underlying operating system, which have been steadily increasing for many years now in accordance with Moore's law. With this approach, many operating-system commands become automatically also "database operators" (think of "ls" "sort", "find" and the other countless UNIX shell utilities).
With this in mind, and a bit of creativity, you can indeed devise a filesystem-based database that is able to overcome the limitations of many common SQL engines, at least for specific usage patterns, which is the whole point behind NoSQL's philosophy, the way I see it.
I run hundreds of web sites and they all use NoSQL to a greater or lesser extent. In fact, they do not host huge amounts of data, but even if some of them did I could probably think of a creative use of NoSQL and the filesystem to overcome any bottlenecks. Something that would likely be more difficult with traditional SQL "jails". I urge you to google for "unix", "manis" and "shaffer" to understand what I mean.
If I recall correctly, it refers to types of databases that don't necessarily follow the relational form. Document databases come to mind, databases without a specific structure, and which don't use SQL as a specific query language.
It's generally better suited to web applications that rely on performance of the database, and don't need more advanced features of Relation Database Engines. For example, a Key->Value store providing a simple query by id interface might be 10-100x faster than the corresponding SQL server implementation, with a lower developer maintenance cost.
One example is this paper for an OLTP Tuple Store, which sacrificed transactions for single threaded processing (no concurrency problem because no concurrency allowed), and kept all data in memory; achieving 10-100x better performance as compared to a similar RDBMS driven system. Basically, it's moving away from the 'One Size Fits All' view of SQL and database systems.
In practice, NoSQL is a database system which supports fast access to large binary objects (docs, jpgs etc) using a key based access strategy. This is a departure from the traditional SQL access which is only good enough for alphanumeric values. Not only the internal storage and access strategy but also the syntax and limitations on the display format restricts the traditional SQL. BLOB implementations of traditional relational databases too suffer from these restrictions.
Behind the scene it is an indirect admission of the failure of the SQL model to support any form of OLTP or support for new dataformats. "Support" means not just store but full access capabilities - programmatic and querywise using the standard model.
Relational enthusiasts were quick to modify the defnition of NoSQL from Not-SQL to Not-Only-SQL to keep SQL still in the picture! This is not good especially when we see that most Java programs today resort to ORM mapping of the underlying relational model. A new concept must have a clearcut definition. Else it will end up like SOA.
The basis of the NoSQL systems lies in the random key - value pair. But this is not new. Traditional database systems like IMS and IDMS did support hashed ramdom keys (without making use of any index) and they still do. In fact IDMS already has a keyword NONSQL where they support SQL access to their older network database which they termed as NONSQL.
It's like Jacuzzi: both a brand and a generic name. It's not just a specific technology, but rather a specific type of technology, in this case referring to large-scale (often sparse) "databases" like Google's BigTable or CouchDB.
NoSQL the actual program appears to be a relational database implemented in awk using flat files on the backend. Though they profess, "NoSQL essentially has no arbitrary limits, and can work where other products can't. For example there is no limit on data field size, the number of columns, or file size" , I don't think it is the large scale database of the future.
As Joel says, massively scalable databases like BigTable or HBase, are much more interesting. GQL is the query language associated with BigTable and App Engine. It's largely SQL tweaked to avoid features Google considers bottle-necks (like joins). However, I haven't heard this referred to as "NoSQL" before.
NoSQL is a database system which doesn't use string based SQL queries to fetch data.
Instead you build queries using an API they will provide, for example Amazon DynamoDB is a good example of a NoSQL database.
NoSQL databases are better for large applications where scalability is important.
Does NoSQL mean non-relational database?
Yes, NoSQL is different from RDBMS and OLAP. It uses looser consistency models than traditional relational databases.
Consistency models are used in distributed systems like distributed shared memory systems or distributed data store.
How it works internally?
NoSQL database systems are often highly optimized for retrieval and appending operations and often offer little functionality beyond record storage (e.g. key-value stores). The reduced run-time flexibility compared to full SQL systems is compensated by marked gains in scalability and performance for certain data models.
It can work on Structured and Unstructured Data. It uses Collections instead of Tables
How do you query such "database"?
Watch SQL vs NoSQL: Battle of the Backends; it explains it all.

SQL versus noSQL (speed)

When people are comparing SQL and noSQL, and concluding the upsides and downsides of each one, what I never hear anyone talking about is the speed.
Isn't performing SQL queries generally faster than performing noSQL queries?
I mean, for me this would be a really obvious conclusion, because you should always be able to find something faster if you know the structure of your database than if you don't.
But people never seem to mention this, so I want to know if my conclusion is right or wrong.
People who tend to use noSQL use it specifically because it fits their use cases. Being divorced from normal RDBMS table relationships and constraints, as well as ACID-ity of data, it's very easy to make it run a lot faster.
Consider Twitter, which uses NoSQL because a user only does very limited things on site, or one exactly - tweet. And concurrency can be considered non-existent since (1) nobody else can modify your tweet and (2) you won't normally be simultaneously tweeting from multiple devices.
The definition of noSQL systems is a very broad one -- a database that doesn't use SQL / is not a RDBMS.
Therefore, the answer to your question is, in short: "it depends".
Some noSQL systems are basically just persistent key/value storages (like Project Voldemort). If your queries are of the type "look up the value for a given key", such a system will (or at least should be) faster that an RDBMS, because it only needs to have a much smaller feature set.
Another popular type of noSQL system is the document database (like CouchDB).
These databases have no predefined data structure.
Their speed advantage relies heavily on denormalization and creating a data layout that is tailored to the queries that you will run on it. For example, for a blog, you could save a blog post in a document together with its comments. This reduces the need for joins and lookups, making your queries faster, but it also could reduce your flexibility regarding queries.
As Einstein would say, speed is relative.
If you need to store a master/detail simple application (like a shopping cart), you would need to do several Insert statements in your SQL application, also you will get a Data set of information when you do a query to get the purchase, if you're using NoSQL, and you're using it well, then you would have all the data for a single order in one simple "record" (document if you use the terms of NoSQL databases like djondb).
So, I really think that the performance of an application can be measured by the number of things it need to do to achieve a single requirement, if you need to do several Inserts to store an order and you only need one simple Insert in a database like djondb then the performance will be 10x faster in the NoSQL world, just because you're using 10 times less calls to the database layer, that's it.
To illustrate my point let me link an example I wrote sometime ago about the differences between NoSQL and SQL data models approach: https://web.archive.org/web/20160510045647/http://djondb.com/blog/nosql-masterdetail-sample/, I know it's a self reference, but basically I wrote it to address this question which I found it's the most challenging question a RDBMS guy could have and it's always a good way to explain why NoSQL is so different from SQL world, and why it will achieve better performance anytime, not because we use "nasa" technology, it's because NoSQL will let the developer do less... and get more, and less code = greater performance.
The answer is: it depends. Generally speaking, the objective of NoSQL DATABASES (no "queries") is scalability. RDBMS usually have some hard limits at some point (I'm talking about millons and millons of rows) where you could not scale any more by traditional means (Replication, clustering, partitioning), and you need something more because your needs keep growing. Or even if you manage to scale, the overall setup is quite complicated. Or you can scale reads, but not writes.
And the queries depends on the particular implementation of your server, the type of query you are doing, the columns in the table, etc... remember that queries are just one part of the RDBMS.
query time of relational database like SQL for 1000 person data is 2000 ms and graph database like neo4j
is 2ms .if you crate more node 1000000 speed stable 2 ms

Why use SQL database? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
I'm not quite sure stackoverflow is a place for such a general question, but let's give it a try.
Being exposed to the need of storing application data somewhere, I've always used MySQL or sqlite, just because it's always done like that. As it seems like the whole world is using these databases (most of all software products, frameworks, etc), it is rather hard for a beginning developer like me to start thinking about whether this is a good solution or not.
Ok, say we have some object-oriented logic in our application, and objects are related to each other somehow. We need to map this logic to the storage logic, so relations between database objects are required too. This leads us to using relational database, and I'm ok with that - to put it simple, our database table rows sometimes will need to have references to other tables' rows. But why use SQL language for interaction with such a database?
SQL query is a text message. I can understand this is cool for actually understanding what it does, but isn't it silly to use text table and column names for a part of application that no one ever seen after deploynment? If you had to write a data storage from scratch, you would have never used this kind of solution. Personally, I would have used some 'compiled db query' bytecode, that would be assembled once inside a client application and passed to the database. And it surely would name tables and colons by id numbers, not ascii-strings. In the case of changes in table structure those byte queries could be recompiled according to new db schema, stored in XML or something like that.
What are the problems of my idea? Is there any reason for me not to write it myself and to use SQL database instead?
EDIT To make my question more clear. Most of answers claim that SQL, being a text query, helps developers better understand the query itself and debug it more easily. Personally, I haven't seen people writing SQL queries by hand for a while. Everyone I know, including me, is using ORM. This situation, in which we build up a new level of abstraction to hide SQL, leads to thinking if we need SQL or not. I would be very grateful if you could give some examples in which SQL is used without ORM purposely, and why.
EDIT2 SQL is an interface between a human and a database. The question is why do we have to use it for application/database interaction? I still ask for examples of human beings writing/debugging SQL.
Everyone I know, including me, is using ORM
Strange. Everyone I know, including me, still writes most of the SQL by hand. You typically end up with tighter, more high performance queries than you do with a generated solution. And, depending on your industry and application, this speed does matter. Sometimes a lot. yeah, I'll sometimes use LINQ for a quick-n-dirty where I don't really care what the resulting SQL looks like, but thus far nothing automated beats hand-tuned SQL for when performance against a large database in a high-load environment really matters.
If all you need to do is store some application data somewhere, then a general purpose RDBMS or even SQLite might be overkill. Serializing your objects and writing them to a file might be simpler in some cases. An advantage to SQLite is that if you have a lot of this kind of information, it is all contained in one file. A disadvantage is that it is more difficult to read it. For example, if you serialize you data to YAML, you can read the file with any text editor or shell.
Personally, I would have used some
'compiled db query' bytecode, that
would be assembled once inside a
client application and passed to the
database.
This is how some database APIs work. Check out static SQL and prepared statements.
Is there any reason for me not to
write it myself and to use SQL
database instead?
If you need a lot of features, at some point it will be easier to use an existing RDMBS then to write your own database from scratch. If you don't need many features, a simpler solution may be wiser.
The whole point of database products is to avoid writing the database layer for every new program. Yes, a modern RDMBS might not always be a perfect fit for every project. This is because they were designed to be very general, so in practice, you will always get additional features you don't need. That doesn't mean it is better to have a custom solution. The glove doesn't always need to be a perfect fit.
UPDATE:
But why use SQL language for
interaction with such a database?
Good question.
The answer to that may be found in the original paper describing the relational model A Relational Model of Data for Large Shared Data Banks, by E. F. Codd, published by IBM in 1970. This paper describes the problems with the existing database technologies of the time, and explains why the relational model is superior.
The reason for using the relational model, and thus a logical query language like SQL, is data independence.
Data independence is defined in the paper as:
"... the independence of application programs and terminal activities from the growth in data types and changes in data representations."
Before the relational model, the dominate technology for databases was referred to as the network model. In this model, the programmer had to know the on-disk structure of the data and traverse the tree or graph manually. The relational model allows one to write a query against the conceptual or logical scheme that is independent of the physical representation of the data on disk. This separation of logical scheme from the physical schema is why we use the relational model. For a more on this issue, here are some slides from a database class. In the relational model, we use logic based query languages like SQL to retrieve data.
Codd's paper goes into more detail about the benefits of the relational model. Give it a read.
SQL is a query language that is easy to type into a computer in contrast with the query languages typically used in a research papers. Research papers generally use relation algebra or relational calculus to write queries.
In summary, we use SQL because we happen to use the relational model for our databases.
If you understand the relational model, it is not hard to see why SQL is the way it is. So basically, you need to study the relation model and database internals more in-depth to really understand why we use SQL. It may be a bit of a mystery otherwise.
UPDATE 2:
SQL is an interface between a human
and a database. The question is why do
we have to use it for
application/database interaction? I
still ask for examples of human beings
writing/debugging SQL.
Because the database is a relational database, it only understands relational query languages. Internally it uses a relational algebra like language for specifying queries which it then turns into a query plan. So, we write our query in a form we can understand (SQL), the DB takes our SQL query and turns it into its internal query language. Then it takes the query and tries to find a "query plan" for executing the query. Then it executes the query plan and returns the result.
At some point, we must encode our query in a format that the database understands. The database only knows how to convert SQL to its internal representation, that is why there is always SQL at some point in the chain. It cannot be avoided.
When you use ORM, your just adding a layer on top of the SQL. The SQL is still there, its just hidden. If you have a higher-level layer for translating your request into SQL, then you don't need to write SQL directly which is beneficial in some cases. Some times we do not have such a layer that is capable of doing the kinds of queries we need, so we must use SQL.
Given the fact that you used MySQL and SQLite, I understand your point of view completely. Most DBMS have features that would require some of the programming from your side, while you get it from database for free:
Indexes - you can store large amounts of data and still be able to filter and search very quickly because of indexes. Of course, you could implement you own indexing, but why reinvent the wheel
data integrity - using database features like cascading foreign keys can ensure data integrity across the system. You only need to declare relationship between data, and system takes care of the rest. Of course, once more, you could implement constraints in code, but it's more work. Consider, for example, deletion, where you would have to write code in object's destructor to track all dependent objects and act accordingly
ability to have multiple applications written in different programming languages, working on different operating systems, some even distributed across the network - all using the same data stored in a common database
dead easy implementation of observer pattern via triggers. There are many cases where only some data depends on some other data and it does not affect UI aspect of application. Ensuring consistency can be very tricky or require a lot of programming. Of course, you could implement trigger-like behavior with objects but it requires more programming than simple SQL definition
There are some good answers here. I'll attempt to add my two cents.
I like SQL, I can think in it pretty easily. The queries produced by layers on top of the DB (like ORM frameworks) are usually hideous. They'll select tons of extra stuff, join in things you don't need, etc.; all because they don't know that you only want a small part of the object in this code. When you need high performance, you'll often end up going in and using at least some custom SQL queries in an ORM system just to speed up a few bottlenecks.
Why SQL? As others have said, it's easy for humans. It makes a good lowest common denominator. Any language can make SQL and call command line clients if necessary, and they is pretty much always a good library.
Is parsing out the SQL inefficient? Somewhat. The grammar is pretty structured, so there aren't tons of ambiguities that would make the parser's job really hard. The real thing is that the overhead of parsing out SQL is basically nothing.
Let's say you run a query like "SELECT x FROM table WHERE id = 3", and then do it again with 4, then 5, over and over. In that case, the parsing overhead may exist. That's why you have prepared statements (as others have mentioned). The server parses the query once, and can swap in the 3 and 4 and 5 without having to reparse everything.
But that's the trivial case. In real life, your system may join 6 tables and have to pull hundreds of thousands of records (if not more). It may be a query that you let run on a database cluster for hours, because that's the best way to do things in your case. Even with a query that takes only a minute or two to execute, the time to parse the query is essentially free compared to pulling records off disk and doing sorting/aggregation/etc. The overhead of sending the ext "LEFT OUTER JOIN ON" is only a few bytes compared to sending special encoded byte 0x3F. But when your result set is 30 MB (let alone gigs+), those few extra bytes are worthless compared to not having to mess with some special query compiler object.
Many people use SQL on small databases. The biggest one I interact with is only a few dozen gigs. SQL is used on everything from tiny files (like little SQLite DBs may be) up to terabyte size Oracle clusters. Considering it's power, it's actually a surprisingly simple and small command set.
It's an ubiquitous standard. Pretty much every programming language out there has a way to access SQL databases. Try that with a proprietary binary protocol.
Everyone knows it. You can find experts easily, new developers will usually understand it to some degree without requiring training
SQL is very closely tied to the relational model, which has been thoroughly explored in regard to optimization and scalability. But it still frequently requires manual tweaking (index creation, query structure, etc.), which is relatively easy due to the textual interface.
But why use SQL language for interaction with such a database?
I think it's for the same reason that you use a human-readable (source code) language for interaction with the compiler.
Personally, I would have used some 'compiled db query' bytecode, that would be assembled once inside a client application and passed to the database.
This is an existing (optional) feature of databases, called "stored procedures".
Edit:
I would be very grateful if you could give some examples in which SQL is used without ORM purposely, and why
When I implemented my own ORM, I implemented the ORM framework using ADO.NET: and using ADO.NET includes using SQL statements in its implementation.
After all the edits and comments, the main point of your question appears to be : why is the nature of SQL closer to being a human/database interface than to being an application/database interface ?
And the very simple answer to that question is : because that is exactly what it was originally intended to be.
The predecessors of SQL (QUEL being presumably the most important one) were intended to be exactly that : a QUERY language, i.e. one that didn't have any of INSERT, UPDATE, DELETE.
Moreover, it was intended to be a query language that could be used by any user, provided that user was aware of the logical structure of the database, and obviously knew how to express that logical structure in the query language he was using.
The original ideas behind QUEL/SQL were that a database was built using "just any mechanism conceivable", that the "real" database could be really just anything (e.g. one single gigantic XML file - allthough 'XML' was not considered a valid option at the time), and that there would be "some kind of machinery" that understood how to transform the actual structure of that 'just anything' into the logical relational structure as it was perceived by the SQL user.
The fact that in order to actually achieve that, the underlying structures are required to lend themselves to "viewing them relationally", was not understood as well in those days as it is now.
Yes, it is annoying to have to write SQL statements to store and retrieve objects.
That's why Microsoft have added things like LINQ (language integrated query) into C# and VB.NET to make it possible to query databases using objects and methods instead of strings.
Most other languages have something similar with varying levels of success depending on the abilities of that language.
On the other hand, it is useful to know how SQL works and I think it is a mistake to shield yourself entirely from it. If you use the database without thinking you can write extremely inefficient queries and index the database incorrectly. But once you understand how to use SQL correctly and have tuned your database, you have a very powerful tried-and-tested tool available for finding exactly the data you need extremely quickly.
My biggest reason for SQL is Ad-hoc reporting. That report your business users want but don't know that they need it yet.
SQL is an interface between a human
and a database. The question is why do
we have to use it for
application/database interaction? I
still ask for examples of human beings
writing/debugging SQL.
I use sqlite a lot right from the simplest of tasks (like logging my firewall logs directly to a sqlite database) to more complex analytic and debugging tasks in my day-to-day research. Laying out my data in tables and writing SQL queries to munge them in interesting ways seems to be the most natural thing to me in these situations.
On your point about why it is still used as an interface between application/database, this is my simple reasoning:
There is about 3-4 decades of
serious research in that area
starting in 1970 with Codd's seminal
paper on Relational Algebra.
Relational Algebra forms the
mathematical basis to SQL (and other
QLs), although SQL does not
completely follow the relational
model.
The "text" form of the language
(aside from being easily
understandable to humans) is also
easily parsable by machines (say
using a grammar parser like like
lex) and is easily convertable to whatever "bytecode" using any number of optimizations.
I am not sure if doing this in any
other way would have yielded
compelling benefits in the generic cases. Otherwise it
would have been probably discovered
and adopted in the 3 decades of
research. SQL probably provides the
best tradeoffs when bridging the
divide between humans/databases and
applications/databases.
The question that then becomes interesting to ask is, "What are the real benefits of doing SQL in any other "non-text" way?" Will google for this now:)
SQL is a common interface used by the DBMS platform - the entire point of the interface is that all database operations can be specified in SQL without needing supplementary API calls. This means that there is a common interface across all clients of the system - application software, reports and ad-hoc query tools.
Secondly, SQL gets more and more useful as queries get more complex. Try using LINQ to specify a 12-way join a with three conditions based on existential predicates and a condition based on an aggregate calculated in a subquery. This sort of thing is fairly comprehensible in SQL but unlikely to be possible in an ORM.
In many cases an ORM will do 95% of what you want - most of the queries issued by applications are simple CRUD operations that an ORM or other generic database interface mechanism can handle easily. Some operations are best done using custom SQL code.
However, ORMs are not the be-all and end-all of database interfacing. Fowler's Patterns of Enterprise Application Architecture has quite a good section on other types of database access strategy with some discussion of the merits of each.
There are often good reasons not to use an ORM as the primary database interface layer. An example of a good one is that platform database libraries like ADO.Net often do a good enough job and integrate nicely with the rest of the environment. You might find that the gain from using some other interface doesn't really outweigh the benefits from the integration.
However, the final reason that you can't really ignore SQL is that you are ultimately working with a database if you are doing a database application. There are many, many WTF stories about screw-ups in commercial application code done by people who didn't understand databases properly. Poorly thought-out database code can cause trouble in so many ways, and blithely thinking that you don't need to understand how the DBMS works is an act of Hubris that is bound to come and bite you some day. Worse yet, it will come and bite some other poor schmoe who inherits your code.
While I see your point, SQL's query language has a place, especially in large applications with a lot of data. And to point out the obvious, if the language wasn't there, you couldn't call it SQL (Structured Query Language). The benefit of having SQL over the method you described is SQL is generally very readable, though some really push the limits on their queries.
I whole heartly agree with Mark Byers, you should not shield yourself from SQL. Any developer can write SQL, but to really make your application perform well with SQL interaction, understanding the language is a must.
If everything was precompiled with bytecode as you described, I'd hate to be the one to have to debug the application after the original developer left (or even after not seeing the code for 6 months).
I think the premise of the question is incorrect. That SQL can be represented as text is immaterial. Most modern databases would only compile queries once and cache them anyway, so you already have effectively a 'compiled bytecode'. And there's no reason this couldn't happen client-wise though I'm not sure if anyone's done it.
You said SQL is a text message, well I think of him as a messenger, and, as we know, don't shoot the messenger. The real issue is that relations are not a good enough way of organising real world data. SQL is just lipstick on the pig.
If the first part you seem to refer to what is usually called the Object - relational mapping impedance. There are already a lot of frameworks to alleviate that problem. There are tradeofs as well. Some things will be easier, others will get more complex, but in the general case they work well if you can afford the extra layer.
In the second part you seem to complain about SQL being text (it uses strings instead of ids, etc)... SQL is a query language. Any language (computer or otherwise) that is meant to be read or written by humans is text oriented for that matter. Assembly, C, PHP, you name it. Why? Because, well... it does make sense, doesn't it?
If you want precompiled queries, you already have stored procedures. Prepared statements are also compiled once on the fly, IIRC. Most (if not all) db drivers talk to the database server using a binary protocol anyway.
yes, text is a bit inefficient. But actually getting the data is a lot more costly, so the text based sql is reasonably insignificant.
SQL was created to provide an interface to make ad hoc queries against a relational database.
Generally, most relational databases understand some form of SQL.
Object-oriented databases exist, and (presumably) use objects to do their querying... but as I understand it, OO databases have a lot more overheard, and relational databases work just fine.
Relational Databases also allow you to operate in a "disconnected" state. Once you have the information you asked for, you can close the database connection. With an OO database, you either need to return all objects related to the current one (and the ones they're related to... and the... etc...) or reopen the connection to retrieve new objects as they are accessed.
In addition to SQL, you also have ORMs (object-relational mappings) that map objects to SQL and back. There are quite a few of them, including LINQ (.NET), the MS Entity Framework (.NET), Hibernate (Java), SQLAlchemy (Python), ActiveRecord (Ruby), Class::DBI (Perl), etc...
A database language is useful because it provides a logical model for your data independent of any applications that use it. SQL has a lot of shortcomings however, not the least being that its integration with other languages is poor, type support is about 30 years behind the rest of the industry and it has never been a truly relational language anyway.
SQL has survived mostly because the database market has been and remains dominated by the three mega-vendors who have a vested interest in protecting their investment. That's changing and SQL's days are probably numbered but the model that will finally replace it probably hasn't arrived yet - although there are plenty of contenders around these days.
I don't think most people are getting your question, though I think it's very clear. Unfortunately I don't have the "correct" answer. I would guess it's a combination of several things:
Semi-arbitrary decisions when it was designed such as ease of use, not needing a SQL compiler (or IDE), portability, etc.
It happened to catch on well (probably due to similar reasons)
And now due to historical reasons (compatibility, well known, proven, etc.) continues to be used.
I don't think most companies have bothered with another solution because it works well, isn't much of a bottleneck, it's a standard, blah, blah..
One of the Unix design principles can be said thusly, "Write programs to handle text streams, because that is a universal interface.".
And that, I believe, is why we typically use SQL instead of some 'byte-SQL' that only has a compilation interface. Even if we did have a byte-SQL, someone would write a "Text SQL", and the loop would be complete.
Also, MySQL and SQLite are less full-featured than, say, MSSQL and Oracle SQL. So you're still in the low end of the SQL pool.
Actually there are a few non-SQL database (like Objectivity, Oracle Berkeley DB, etc.) products came but non of them succeeded. In future if someone finds intuitive alternative for SQL, that will answer your question.
There are a lot of non relational database systems. Here are just a few:
Memcached
Tokyo Cabinet
As far as finding a relational database that doesn't use SQL as its primary interface, I think you won't find it. Reason: SQL is a great way to talk about relations. I can't figure out why that's a big deal to you: if you don't like SQL, put an abstraction over it (like an ORM) so you don't have to worry about it. Let the abstraction worry about it. It gets you to the same place.
However, the problem your'e really mentioning here is the object-relation disconnect - the problem is with the relation itself. Objects and relational-tuples don't always lend themselves to be a 1-1 relationship, which is the reason why a developer can frustrated with a database. The solution to that is to use a different database type.
Because often, you cannot be sure that (citing you) "no one ever seen after deployment". Knowing that there is an easy interface for reporting and for dataset level querying is a good path for evolution of your app.
You're right, that there are other solutions that may be valid in some situations: XML, plain text files, OODB...
But having a set of common interfaces (like ODBC) is a huge plus for the life of data.
I think the reason might be the search/find/grab algorithms the sql laungage is connected to do. Remember that sql has been developed for 40 years - and the goal has been both preformence wise and user firendly wise.
Ask yourself what the best way of finding 2 attibutes is. Now why investigating that each time you would want to do something that includes that each time you develope your application. Assuming the main goal is the developing of your application when developing an application.
An application has similarities with other applications, a database has similarities with other databases. So there should be a "best way" of these to interact, logically.
Also ask yourself how you would develop a better console only application that does not use sql laungage. If you cannot do that I think you need to develope a new kind of GUI that are even more fundamentally easier to use than with a console - to develope things from it. And that might actually be possible. But still most development of applications is based around console and typing.
Then when it comes to laungage I don´t think you can make a much more fundamentally easier text laungage than sql. And remember that each word of anything is inseperatly connected to its meaning - if you remove the meaning the word cannot be used - if you remove the word you cannot communicate the meaning. You have nothing to describe it with (And maybe you cannot even think it beacuse it woulden´t be connected to anything else you have thought before...).
So basically the best possible algorithms for database manipulation are assigned to words - if you remove these words you will have to assign these manipulations something else - and what would that be?
i think you can use ORM
if and only if you know the basic of sql.
else the result there isn't the best

NoSQL vs. SQL when scalability is irrelevant

Recently I have read a lot about different NoSQL databases and how they are being effectively deployed by some major websites out there. I'm starting a project in which I think the schema-free nature of a database such as MongoDB would be tremendously useful. Everything I have read though seems to indicate that the main advantage of a NoSQL database is scalability. Is choosing a NoSQL database for the schema-free design just as legitimate a design decision as that of scalability?
Yes, sometimes RDBMS are not the best solution, although there are ways to accomodate user defined fields (see XML Datatype, EAV design pattern, or just have spare generic columns) sometimes a schema free database is a good choice.
However, you need to nail down your requirements before choosing to go with a document database, as you will loose a lot of the power you may be used to with the relational model
eg...
If you would otherwise have multiple tables in your RDBMS database, you will need to research the features MongoDB affords you to accomodate these needs.
If you will need to query the data in specific ways, again you need to research what MongoDB offers you.
I wouldnt think of NoSQL as replacement for RDBMS, rather a slightly different tool that brings its own sets of advantages and disadvantages making it more suitable for some projects than others.
(Both databases may be used in some circumstances. Also if you decide to go down the route of possibly using MongoDB, once you have researched the websites out there and have more specific questions, you can visit Freenode IRC #mongodb channel)
There are a lot of other conditions that I've been hearing about with non-relational systems vs relational. I prefer this terminology over sql/no-sql as I personally think it describes the differences better, and several of the "no-sql" servers have sql add-ons, so anyway.... what sort of concurrency pattern or tranaction isolation is required in your system. One of the purported differences between rel and non-rel dbs is the "consistent-always", "consistent-mostly" or "consistent-eventually". Relation dbs by default usually fall into the "consistent-mostly" category and with some work, and a whole lot of locking and race conditions, ;) can be "consistent-always" so everyone is always looking at the most correct representation of a given piece of data. Most of what I've read/heard about non-rel dbs is that they are mainly "consistent-eventually". By this it means that there may be many instances of our data floating around, so user "A" may see that we have 92 widgets in inventory, whereas user "B" may see 79, and they may not get reconciled until someone actually goes to pull stuff from the warehouse. Another issue is mutability of data, how often does it need to be updated? The particular non-rel db's I've been exposed to have more overhead for updates, some of them having to regenerate the entire dataset to incorporate any updates.
Now mind, I think non-rel/nosql are great tools if they really match your use case. I've got several I'm looking into now for projects I've got. But you've got to look at all the trade offs when making the decision, otherwise it just turns into more resume driven development.
I don't think you should choose NoSQL datastore for its schema free design. Schema free design always existed in RDBMS via XML and some databases have good XML support. It is a lot easier to deal with a database than a NoSQL datastore. Scalability and big data should be the primary drivers to choose a NoSQL datastore otherwise the tradeoff of ACID and SQL is a lot to switch to NoSQL.
the most important things should be noticed to distinguish between No-SQL and SQL
which is :
NO-SQL useful when data base scales in a huge manner like social network
for example :
Stack Overflow: each question has multiple answers and not imaginary an answer without question, so No-SQL will ensure that each question include it's answers
as a result when needing getting answers of a question we can bring all answers without joining.Because join is the most expensive query in related database
thanks alot
what raised this issue that if you have a large server farm and need to manage the distribution of your data and load balancing which is more difficult and harder to implement using RDBMS and requires high IT skills to design, plan and deploy for your solution (and still performance is less).
but if you have only 3 or 4 servers with small project. I don't think you have an issue about it. NoSQL database is usually considered in large server farms not small number of servers

Replace relational DB (SQL Server) with rules-based/declarative implementation?

I have started working on a project in the financial services industry that is based (mainly) on SQL Server (2000), ColdFusion (8), and some Access/.NET applications. This project started as some simple Access forms/VBA and was slowly converted to web interfaces.
I could say that the database design and application coding was done by people that were learning on the job and didn't have the opportunity to learn about good design principles from the start. Many of the business rules are set in a myriad of cascading functions and stored procedures as well as in the web server templates. There is a huge amount of special case handling deep within complex 500-line SQL UDFs that use uncommented constants. It is very difficult to trace all of the interactions between the 10-20 UDFs that might be involved in a query. Some of the queries seem to take way too long to run (up to 15 minutes).
While the tables are fairly well indexed, there is a lack of FK relationships and almost no referential integrity. The DB is updated infrequently with daily batches of low volume (1,000 records in multiple tables.) It is primarily used to serve as a data repository - I suppose a data warehouse. We get very infrequent deadlocks or delays.
So, my question is: If I want to re-implement the whole project including the database and front-end would it make sense to look at non-relational implementations? The primary DB is only about 1GB (.mdf) so it could fit easily in memory. I would like to move from the SQL query structure to some declarative model that could be efficiently compiled and executed. If necessary, I could use the SQL DB just as a data store.
Why do you want to move from the relational approach? By moving from the relational approach you are only going to bury business logic deeper into the code by using any other approach. As you pointed out, the data model is fairly simple. You could first look at improving the data model itself. The reason they may not be any referential integrity constraints is because the initial designers might have assumed that this would lead to lower performance. They might be doing the checks using code that might itself be inefficient.
Your DB is small. adding referential integrity constraints will not affect the performance in any way. If required, you can rewrite some of the UDFs. Why dont you use a query analyzer to look at the performance metrics? That will give you a good starting point for analysis.
If I want to re-implement the whole project including the database and front-end would it make sense to look at non-relational implementations?
In general, most of the developers, even those who breathe map/reduce, and wear NoSQL T shirts, feel a LOT more comfortable with SQL.
If your application follows the classic MVC/MVP model, then most of the frameworks ( e.g. Spring, Rails, Grails, Django, Webmachine, etc.. ) actually come with first class support for a SQL back end. And some support for a NoSQL one.
In case you see no actual benefit that NoSQL can bring to your system ( here are the benefits I posted to another question ), why bother?
I would like to have a set of "english-language" rules that describe the transformations from the underlying raw data to a form that can be directly consumed by the application (web)
Seems that you are talking about a classic persistence layer with a service layer on top of it. Where "english-language" rules are just "english-language" methods in your service layer. Unless you need a more sophisticated rules engine, but most of the time it is not needed.