NoSQL vs. SQL when scalability is irrelevant - sql

Recently I have read a lot about different NoSQL databases and how they are being effectively deployed by some major websites out there. I'm starting a project in which I think the schema-free nature of a database such as MongoDB would be tremendously useful. Everything I have read though seems to indicate that the main advantage of a NoSQL database is scalability. Is choosing a NoSQL database for the schema-free design just as legitimate a design decision as that of scalability?

Yes, sometimes RDBMS are not the best solution, although there are ways to accomodate user defined fields (see XML Datatype, EAV design pattern, or just have spare generic columns) sometimes a schema free database is a good choice.
However, you need to nail down your requirements before choosing to go with a document database, as you will loose a lot of the power you may be used to with the relational model
eg...
If you would otherwise have multiple tables in your RDBMS database, you will need to research the features MongoDB affords you to accomodate these needs.
If you will need to query the data in specific ways, again you need to research what MongoDB offers you.
I wouldnt think of NoSQL as replacement for RDBMS, rather a slightly different tool that brings its own sets of advantages and disadvantages making it more suitable for some projects than others.
(Both databases may be used in some circumstances. Also if you decide to go down the route of possibly using MongoDB, once you have researched the websites out there and have more specific questions, you can visit Freenode IRC #mongodb channel)

There are a lot of other conditions that I've been hearing about with non-relational systems vs relational. I prefer this terminology over sql/no-sql as I personally think it describes the differences better, and several of the "no-sql" servers have sql add-ons, so anyway.... what sort of concurrency pattern or tranaction isolation is required in your system. One of the purported differences between rel and non-rel dbs is the "consistent-always", "consistent-mostly" or "consistent-eventually". Relation dbs by default usually fall into the "consistent-mostly" category and with some work, and a whole lot of locking and race conditions, ;) can be "consistent-always" so everyone is always looking at the most correct representation of a given piece of data. Most of what I've read/heard about non-rel dbs is that they are mainly "consistent-eventually". By this it means that there may be many instances of our data floating around, so user "A" may see that we have 92 widgets in inventory, whereas user "B" may see 79, and they may not get reconciled until someone actually goes to pull stuff from the warehouse. Another issue is mutability of data, how often does it need to be updated? The particular non-rel db's I've been exposed to have more overhead for updates, some of them having to regenerate the entire dataset to incorporate any updates.
Now mind, I think non-rel/nosql are great tools if they really match your use case. I've got several I'm looking into now for projects I've got. But you've got to look at all the trade offs when making the decision, otherwise it just turns into more resume driven development.

I don't think you should choose NoSQL datastore for its schema free design. Schema free design always existed in RDBMS via XML and some databases have good XML support. It is a lot easier to deal with a database than a NoSQL datastore. Scalability and big data should be the primary drivers to choose a NoSQL datastore otherwise the tradeoff of ACID and SQL is a lot to switch to NoSQL.

the most important things should be noticed to distinguish between No-SQL and SQL
which is :
NO-SQL useful when data base scales in a huge manner like social network
for example :
Stack Overflow: each question has multiple answers and not imaginary an answer without question, so No-SQL will ensure that each question include it's answers
as a result when needing getting answers of a question we can bring all answers without joining.Because join is the most expensive query in related database
thanks alot

what raised this issue that if you have a large server farm and need to manage the distribution of your data and load balancing which is more difficult and harder to implement using RDBMS and requires high IT skills to design, plan and deploy for your solution (and still performance is less).
but if you have only 3 or 4 servers with small project. I don't think you have an issue about it. NoSQL database is usually considered in large server farms not small number of servers

Related

What is NoSQL? How it works? [duplicate]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 9 years ago.
Improve this question
I've been hearing things about NoSQL and that it may eventually become the replacement for SQL DB storage methods due to the fact that DB interaction is often a bottle neck for speed on the web.
So I just have a few questions:
What exactly is it?
How does it work?
Why would it be better than using a SQL Database? And how much better is it?
Is the technology too new to start implementing yet or is it worth taking a look into?
There is no such thing as NoSQL!
NoSQL is a buzzword.
For decades, when people were talking about databases, they meant relational databases. And when people were talking about relational databases, they meant those you control with Edgar F. Codd's Structured Query Language. Storing data in some other way? Madness! Anything else is just flatfiles.
But in the past few years, people started to question this dogma. People wondered if tables with rows and columns are really the only way to represent data. People started thinking and coding, and came up with many new concepts how data could be organized. And they started to create new database systems designed for these new ways of working with data.
The philosophies of all these databases were different. But one thing all these databases had in common, was that the Structured Query Language was no longer a good fit for using them. So each database replaced SQL with their own query languages. And so the term NoSQL was born, as a label for all database technologies which defy the classic relational database model.
So what do NoSQL databases have in common?
Actually, not much.
You often hear phrases like:
NoSQL is scalable!
NoSQL is for BigData!
NoSQL violates ACID!
NoSQL is a glorified key/value store!
Is that true? Well, some of these statements might be true for some databases commonly called NoSQL, but every single one is also false for at least one other. Actually, the only thing NoSQL databases have in common, is that they are databases which do not use SQL. That's it. The only thing that defines them is what sets them apart from each other.
So what sets NoSQL databases apart?
So we made clear that all those databases commonly referred to as NoSQL are too different to evaluate them together. Each of them needs to be evaluated separately to decide if they are a good fit to solve a specific problem. But where do we begin? Thankfully, NoSQL databases can be grouped into certain categories, which are suitable for different use-cases:
Document-oriented
Examples: MongoDB, CouchDB
Strengths: Heterogenous data, working object-oriented, agile development
Their advantage is that they do not require a consistent data structure. They are useful when your requirements and thus your database layout changes constantly, or when you are dealing with datasets which belong together but still look very differently. When you have a lot of tables with two columns called "key" and "value", then these might be worth looking into.
Graph databases
Examples: Neo4j, GiraffeDB.
Strengths: Data Mining
While most NoSQL databases abandon the concept of managing data relations, these databases embrace it even more than those so-called relational databases.
Their focus is at defining data by its relation to other data. When you have a lot of tables with primary keys which are the primary keys of two other tables (and maybe some data describing the relation between them), then these might be something for you.
Key-Value Stores
Examples: Redis, Cassandra, MemcacheDB
Strengths: Fast lookup of values by known keys
They are very simplistic, but that makes them fast and easy to use. When you have no need for stored procedures, constraints, triggers and all those advanced database features and you just want fast storage and retrieval of your data, then those are for you.
Unfortunately they assume that you know exactly what you are looking for. You need the profile of User157641? No problem, will only take microseconds. But what when you want the names of all users who are aged between 16 and 24, have "waffles" as their favorite food and logged in in the last 24 hours? Tough luck. When you don't have a definite and unique key for a specific result, you can't get it out of your K-V store that easily.
Is SQL obsolete?
Some NoSQL proponents claim that their favorite NoSQL database is the new way of doing things, and SQL is a thing of the past.
Are they right?
No, of course they aren't. While there are problems SQL isn't suitable for, it still got its strengths. Lots of data models are simply best represented as a collection of tables which reference each other. Especially because most database programmers were trained for decades to think of data in a relational way, and trying to press this mindset onto a new technology which wasn't made for it rarely ends well.
NoSQL databases aren't a replacement for SQL - they are an alternative.
Most software ecosystems around the different NoSQL databases aren't as mature yet. While there are advances, you still haven't got supplemental tools which are as mature and powerful as those available for popular SQL databases.
Also, there is much more know-how for SQL around. Generations of computer scientists have spent decades of their careers into research focusing on relational databases, and it shows: The literature written about SQL databases and relational data modelling, both practical and theoretical, could fill multiple libraries full of books. How to build a relational database for your data is a topic so well-researched it's hard to find a corner case where there isn't a generally accepted by-the-book best practice.
Most NoSQL databases, on the other hand, are still in their infancy. We are still figuring out the best way to use them.
What exactly is it?
On one hand, a specific system, but it has also become a generic word for a variety of new data storage backends that do not follow the relational DB model.
How does it work?
Each of the systems labelled with the generic name works differently, but the basic idea is to offer better scalability and performance by using DB models that don't support all the functionality of a generic RDBMS, but still enough functionality to be useful. In a way it's like MySQL, which at one time lacked support for transactions but, exactly because of that, managed to outperform other DB systems. If you could write your app in a way that didn't require transactions, it was great.
Why would it be better than using a SQL Database? And how much better is it?
It would be better when your site needs to scale so massively that the best RDBMS running on the best hardware you can afford and optimized as much as possible simply can't keep up with the load. How much better it is depends on the specific use case (lots of update activity combined with lots of joins is very hard on "traditional" RDBMSs) - could well be a factor of 1000 in extreme cases.
Is the technology too new to start implementing yet or is it worth taking a look into?
Depends mainly on what you're trying to achieve. It's certainly mature enough to use. But few applications really need to scale that massively. For most, a traditional RDBMS is sufficient. However, with internet usage becoming more ubiquitous all the time, it's quite likely that applications that do will become more common (though probably not dominant).
Since someone said that my previous post was off-topic, I'll try to compensate :-) NoSQL is not, and never was, intended to be a replacement for more mainstream SQL databases, but a couple of words are in order to get things in the right perspective.
At the very heart of the NoSQL philosophy lies the consideration that, possibly for commercial and portability reasons, SQL engines tend to disregard the tremendous power of the UNIX operating system and its derivatives.
With a filesystem-based database, you can take immediate advantage of the ever-increasing capabilities and power of the underlying operating system, which have been steadily increasing for many years now in accordance with Moore's law. With this approach, many operating-system commands become automatically also "database operators" (think of "ls" "sort", "find" and the other countless UNIX shell utilities).
With this in mind, and a bit of creativity, you can indeed devise a filesystem-based database that is able to overcome the limitations of many common SQL engines, at least for specific usage patterns, which is the whole point behind NoSQL's philosophy, the way I see it.
I run hundreds of web sites and they all use NoSQL to a greater or lesser extent. In fact, they do not host huge amounts of data, but even if some of them did I could probably think of a creative use of NoSQL and the filesystem to overcome any bottlenecks. Something that would likely be more difficult with traditional SQL "jails". I urge you to google for "unix", "manis" and "shaffer" to understand what I mean.
If I recall correctly, it refers to types of databases that don't necessarily follow the relational form. Document databases come to mind, databases without a specific structure, and which don't use SQL as a specific query language.
It's generally better suited to web applications that rely on performance of the database, and don't need more advanced features of Relation Database Engines. For example, a Key->Value store providing a simple query by id interface might be 10-100x faster than the corresponding SQL server implementation, with a lower developer maintenance cost.
One example is this paper for an OLTP Tuple Store, which sacrificed transactions for single threaded processing (no concurrency problem because no concurrency allowed), and kept all data in memory; achieving 10-100x better performance as compared to a similar RDBMS driven system. Basically, it's moving away from the 'One Size Fits All' view of SQL and database systems.
In practice, NoSQL is a database system which supports fast access to large binary objects (docs, jpgs etc) using a key based access strategy. This is a departure from the traditional SQL access which is only good enough for alphanumeric values. Not only the internal storage and access strategy but also the syntax and limitations on the display format restricts the traditional SQL. BLOB implementations of traditional relational databases too suffer from these restrictions.
Behind the scene it is an indirect admission of the failure of the SQL model to support any form of OLTP or support for new dataformats. "Support" means not just store but full access capabilities - programmatic and querywise using the standard model.
Relational enthusiasts were quick to modify the defnition of NoSQL from Not-SQL to Not-Only-SQL to keep SQL still in the picture! This is not good especially when we see that most Java programs today resort to ORM mapping of the underlying relational model. A new concept must have a clearcut definition. Else it will end up like SOA.
The basis of the NoSQL systems lies in the random key - value pair. But this is not new. Traditional database systems like IMS and IDMS did support hashed ramdom keys (without making use of any index) and they still do. In fact IDMS already has a keyword NONSQL where they support SQL access to their older network database which they termed as NONSQL.
It's like Jacuzzi: both a brand and a generic name. It's not just a specific technology, but rather a specific type of technology, in this case referring to large-scale (often sparse) "databases" like Google's BigTable or CouchDB.
NoSQL the actual program appears to be a relational database implemented in awk using flat files on the backend. Though they profess, "NoSQL essentially has no arbitrary limits, and can work where other products can't. For example there is no limit on data field size, the number of columns, or file size" , I don't think it is the large scale database of the future.
As Joel says, massively scalable databases like BigTable or HBase, are much more interesting. GQL is the query language associated with BigTable and App Engine. It's largely SQL tweaked to avoid features Google considers bottle-necks (like joins). However, I haven't heard this referred to as "NoSQL" before.
NoSQL is a database system which doesn't use string based SQL queries to fetch data.
Instead you build queries using an API they will provide, for example Amazon DynamoDB is a good example of a NoSQL database.
NoSQL databases are better for large applications where scalability is important.
Does NoSQL mean non-relational database?
Yes, NoSQL is different from RDBMS and OLAP. It uses looser consistency models than traditional relational databases.
Consistency models are used in distributed systems like distributed shared memory systems or distributed data store.
How it works internally?
NoSQL database systems are often highly optimized for retrieval and appending operations and often offer little functionality beyond record storage (e.g. key-value stores). The reduced run-time flexibility compared to full SQL systems is compensated by marked gains in scalability and performance for certain data models.
It can work on Structured and Unstructured Data. It uses Collections instead of Tables
How do you query such "database"?
Watch SQL vs NoSQL: Battle of the Backends; it explains it all.

Any Validity to the NoSQL movement?

First of all im relatively new to the Database world, Im graduating with my B.S. in Comp Science this semester and Database Technologies have really caught my eye so ive been studying alot of T-SQL because I want to in the end get a SQL Development job (MS SQL server seemed like the best choice right now because it's on the rise)
ANYWAYS, i've heard alot of hoopla about this NOsql movement of Non-relational database management systems. Trying to keep this question and non-subjective as possible i mainly want to know the advantages/disadvantages of NRDBMS's (like Nosql) and if there is really a future in them. Perhaps as a side question, is it a bad time to be studying SQL in general (specifically the normal RDBMS's we are so used to). I forsee people sticking with this for a long time, but then again.....I dont know. I'd hate to see my interest suddenly be taking a dive in the market.
There is definitely validity to the NoSQL movement, but I wouldn't worry about your SQL skills going to waste. NoSQL storage architectures were born out of the need for highly available and scalable data stores that went beyond what a typical relational database could provide. This comes at a cost though, and typically that cost is guaranteed consistency. This isn't always a large concern. In the case of something like Facebook doesn't have complete consistency for a period of time for things like your pictures, status updates, etc. As long as they get consistent at some point, it's okay. On the other end, take your bank account. That type of data store needs to provide the strong ACID characteristics that a relational database provides.
NoSQL isn't something that I see taking over the world, it's an alternative to the common approach of RDBMS's and as with everything else it has it's strengths and weaknesses.
Here is an excellent article on the subject written about NetFlix.
Others can address the NoSQL specifics better than I can, but as for the second part of your question (worrying about getting into SQL if NoSQL starts becoming more popular): I have customers who still use very old flat-file based mainframes.
SQL hasn't even reached full penetration yet, and it is VERY entrenched in a large number of business processes. The market for SQL development and maintenance won't be going away any time soon, and if it starts to it won't be overnight - you'll have time to learn the Next Big Thing before you're obsolete.
NoSql databases are great for storing unstructured data. Think of it as the next generation of Lotus Notes.
I wouldn't leverage a NoSql database for storing a list of people and addresses, as those are completely structured and well known.
However, if I had a set of dynamic attributes of some type (name/value pairs) or something a similar which required a lot of pivoting to get to, then I'd seriously look into it. I might even go that route even if there is structure, but it isn't known ahead of time. Such as with dynamic tables.
That said, when we did some evaluations earlier this year (March 2010) and we didn't think the state of the available open source NoSql databases were ready for serious production. There's a lot more to databases than just putting data in and getting it out. Automated backups, load balancing, solid query tools, consistency checkers, etc are an absolute must. We will reevaluate early next year.
SQL ain't going away, and the relational model is a basic information-systems building block that's definitely worth studying and understanding in its own right. I'd stick with it.
Databases based on an object instead of relational model have existed forever. The difference is that in the past they tended to be closed (and expensive!) packages from single vendors. No-one really wants to have their mission-critical apps locked into a proprietary database, dependent on licensing from a single, sometimes unresponsive, supplier.
In contrast today's NoSQL databases are typically free, open, and well-aligned to existing web-oriented technologies, allowing for quick, responsive scaling without worrying about licenses, and potential participation in future development (or local forking/patching if necessary).
What they also are is diverse, such that you can't really classify them all together as being good for a particular kind of task. There are trivial key-value buckets that make no attempt at being ACID-safe, there are object databases with their own safety paradigms (like CouchDB's revision conflicts), there are more traditional relational-like databases that just don't use SQL as a query mechanism (because let's face it, nice though it is that you can use the same query language across databases, hacking together SQL queries into a string just so that the database at the other end can pick the string apart to get the logic of the query you wanted to do, is a bit silly).
There are lots of them, most are very immature compared to the ancient edifice of SQL, and it'll take a while for winners to emerge. Is NoSQL “valid”? Sure. But I would say to use a particular NoSQL database as a basis for study today (as opposed to using one that fits your needs for a particular task that SQL is bad at) would be premature.
The future of big systems will require skills with both SQL and NoSQL.
NoSQL is an important paradigm and it's not going anywhere. Joins don't scale horizontally and SQL database are effectively just big "join machines". NoSQL is still in relative infancy, there are tons of players and just like SQL, each one has its own little variations.
But that's all going to shake out in the next few years
As a recent grad, you have to start somewhere. SQL is simply the easiest place to start. You will see lots of it going forward. However, once you've got your head around SQL (say you've passed your MS T-SQL course), I strongly suggest taking a look at something like MongoDB/Riak/CouchDB as your next adventure.
You probably won't jump into a company using NoSQL, but you will run in to problems where NoSQL is actually a much simpler solution. But you won't know this until you actually play with NoSQL.
It sounds like you're already pointing in the right direction by looking at job postings and seeing what current needs are in the way of data storage and management, if this is your passion. I wouldn't be surprised if interviews will start asking about the advantages/disadvantages of nosql just to see if you're familiar with the latest developments (and if you're applying for a dba position, they might also ask about ACID compliance and the CAP theorem).
Lots of companies are starting to use NoSQL technologies, so it's valid in that people are using it. And not just small startups either, but companies like facebook (cassandra), yahoo (hadoop), google (bigtable), and etsy (mongodb) believe that nosql solutions fit certain needs.
I think NoSQL is more of a niche. It's really good for some applications, but will probably never totally displace RDBMSs (although combinations of NoSQL on top of an RDBMS backend seem to be coming out more I hear). Advice would be to get good with an old-school RDBMS (it's still much more common, at least from what I've seen), and then get into NoSQL on the side if you want.
Brent Ozar did an excellent writeup on the topic here: NoSQL Basics for Database Administrators

What makes Cassandra (and NoSQL in general) a better solution to an RDBMS?

Well, NoSQL is a buzzword right now so I've been looking into it. I'm yet to get my head around ColumnFamilies and SuperColumns, etc... But I have been looking at how the data is mapped.
After reading this article, and others, it seems the data is mapped in a JSON like format.
Users = {
1: {
username: "dave",
password: "blahblah",
dateReged: "1/1/1"
},
2: {
username: "etc",
password: "blahblah",
dateReged: "2/1/1",
comment: "this guy has a comment and dave doesns't"
},
}
The RDBMS format would be:
Table name: "Users"
id | username | password | dateReged | comment
---+----------+----------+-----------+--------
1 | dave | blahblah | 1/1/1 |
---+----------+----------+-----------+--------
2 | etc | blahblah | 2/1/1 | this guy has a comment and dave doesn't
Assuming I understand this correctly and my above examples are right, why would I choose the RDBMS design over the NoSQL design? Personally, I'd much rather work with the JSON structure... Does this mean I should choose NoSQL over, say, MySQL?
I guess what I'm asking is "when should I choose NoSQL over RDBMS?"
On a side note, as I've said, I'm still not fully understanding how to go about implementing a Cassandra database. Ie, how do I create the above Users table in a new database? Any tutorials, documentation, etc you could point to would be great. My google'ing hasn't turned up much in terms of 'starting from scratch'...
If you are google, then you might be in a position where a NoSQL would be easier on you than a RDBMS. Since you are not, the many advantages an RDBMS provides you will probably be of some use. Significantly, on a single node, NoSQL offers absolutely no advantages over RDBMSes. RDBMSes offer lots of advantages over NoSQL, though. what are they?
RDBMSes use some pretty deep magic to understand the data it owns, and the data you are asking for, in such a way that it can return that data in the most efficient manner possible. If you didn't ask about some column, the rdbms doesn't waste any effort retrieving it. If you are interested in rows that have fields in common across two tables, (this is a join, btw), the RDBMS doesn't have to check every single pair of rows for matches, or what a NoSQL db usually does is just give you everything and make you do the checking. with a RDBMS, you can usually construct queries that are actually 'about' the data you are using, like "if the date is a tuesday", and if your indexes support it (if you do that query alot then you would add such an index) you can get those rows efficiently.
There is another reason why RDBMSes are nice. Transactions are easy on RDBMSes, but are much harder to get right on NoSQL databases. Supposing you are implementing a blogging engine. Suppose the post title (which appears in the URL) needs to be unique across all posts. In an RDBMS, you can easily be sure that you won't get this wrong accidentally. With a NoSQL database, if it does support some kind of transactional integrity, it's usually at the shard level, anything that could possibly require that kind of integrity must be on the same shard. since any pair of users could possibly be posting at the same moment, then every users' post must be on the same shard to get the same effect. Well, then you don't get any benefit at all from NoSQL.
The main advantage of NoSQL is horizontal scalability and distributed storage. That means you can have a large number of 'cluster nodes' and write to them in parallel. The cluster will ensure changes are propagated to the other cluster nodes eventually (eventual consistency).
NoSQL is not so much about SQL (the term means "not only SQL"). In fact, some NoSQL products do support a subset of SQL. The reason the data format is different (JSON or list of property / value pairs versus tabular data) is: within relational databases, the number of columns (and column names) is defined in a central place, which doesn't work well with horizontal scalability (you would need to stop all cluster nodes for schema changes). Also, joins are not supported as much because that would break horizontal scalability (data from multiple cluster nodes may need to be read, if the data is distributed).
NoSQl databases are fine for some websites where you don't need transaction or consistency where all you are doing is presenting some data (but until you get really really large, they are not really very needed).
But if you need to enforce financial rules (or other complex data integrity rules) or internal controls or reporting and aggregating data for reporting, you need an RDBMS. I'll bet even Google uses RDBMS' for their own HR and financial data, etc.
For some web applications, you might even want a combination of both, the nosql database for some types of information, the transactional relational database for orders and other things where transactional consistency is a must.
If you develop web sites, I think you need to thoroughly understand both types of databases and the needs behind them before choosing how to handle any new functionality.
It seems to me that you have almost no knowledge of relational databases and would rather do what is easier for you personally than what is right for the project. Maybe I'm not reading that correctly, but anyone who never uses joins is suspect in terms of understanding relational databases.
You don't decide between these two based on which one seems easier to understand or which is the buzzword of the month, you decide them based on the functionality you will need, not just for the user interface but for administrative tasks, reporting, financial or other types of data auditing, government regulation, data recovery in case of a hardware failure, etc.
RDBMS' are all about consistency. They do a great job on data that gets churned alot with transactions. See also ACID (atomicity, consistency, isolation, durability). Sometimes you don't need all that, like when storing data from logs or working on data that's not going to change, just accumulate.
NoSQL databases let you relax the requirements for transactions and get better performance (as well as scale to large distributed storage silos easier).
The advantage fo NoSql is that its simpler and if you have your OO blinkers on it fullfills all your persistence needs.
The advantage of SQL based realtional database is that you can easily re-use and extend your data in ways that were not envisaged in the original design. Also "Object" databases tend to perform very badly (even if its possable) when you want to do the equivalent of SQLs aggregate queries like COUNT, SUM, AVG.
Googles BIGTABLE which is the biggest OO database anywhere (and probably the biggest database period) also supports SQL and sql features like indexing and strong typing.
Answer is easy. If you need data storage - use NoSQL, if you need more features then just storing data - use RDBMS.
I guess what I'm asking is "when should I choose NoSQL over RDBMS?"
[Caveat: I've never read about NoSQL before]
According to Wikipedia, NoSQL isn't good at joins: which implies (to me) no referential integrity and no normalization.
As many books about NoSQL mention, it's not about which database is better than the other. It's more what you need.
As everyone say in the other answers, many NoSQL databases support horizontal scalability and are focused on high availability but they are not always the best fit for your needs.
for example, Cassandra is great to add or remove nodes from a cluster, allowing that high scalability. But when you compare Cassandra with MySQL in an environment with just one node (one server), and with no distributed architecture, there isn't a lot of different, since the main advantages of Cassandra are not used.
Now, why should you use SQL? The most common reason is transaction management. Currently, no popular NoSQL database natively supports transactions. You can emulate them, but they are not part of the native functionality as in most SQL databases.
For Cassandra, there is a full and free training in https://academy.datastax.com
There you won't only find trainings to install and configure Cassandra, but to use its tools. It even gives you completion certificates.
Datastax has its own distribution of Cassandra, but it follows all the same guidelines as the Apache project; it offers some extra tools.
The simplest answer I can think of is: When your data doesn't fit a relational model.
I gave a talk at OSCON about when NoSQL can be the right choice, and some of the different sub-categories to be aware of: http://assets.en.oreilly.com/1/event/45/The%20NoSQL%20Ecosystem%20Presentation.pdf
Cassandra in and of itself is not better than an RDBMS. It is better under some circumstances. An RDBMS is vastly superior for transaction processing, master data management, reference data, data warehousing and (some forms of) BI.
Use NOSQL if your application requires a flexible schema, variable length rows, variable types of columns, eventual integrity, horizonal scalability on commodity servers, and high availability achieved by means of a distributed architecture.
NOSQL does not do joins for several reasons: you already joined the data before the NOSQL file was loaded so there is no need to; because a distributed join over far-reaching servers would be resource intensive. The first reason above is simple: you have embedded all the data you need into a single structure. If you do not embed the data and have to link, don't expect great performance out of it. Linking is a euphemism for application-provided joining without the benefit of consolidating the data as a join does. Assuming hashing a key is the method of data distribution, different records that have the same hash key would be collocated. Thereby if joining were permitted, the joined data would all be on the same server.
It's not just black and white.

Strategic issue: Mixing relational and non-relational db?

There has been a lot of talk about contra-revolutionary NoSQL databases like Cassandra, CouchDB, Hypertable, MongoDB, Project Voldemort, BigTable, and so many more. As far as I am concerned, the strongest pros are scalability, performance and simplicity.
I am seriously considering to suggest using some non-relational db for our next project. However, some teams comprise some RDBMS fanatics, so convincing a hard switch might be impossible in some cases just because of emotional reasons. Also, when it comes to complex data models, I personally still believe in the power of RDBMS with their low-level consistance enforcement mechanisms.
Now here comes my question: I was wondering, if someone could seriously consider using both, RDBMS and non-relational DB in a new project: The complex, but not performance critical data model would still be implemented using a relational model and db, while all performance critical, yet, simple models would be implemented with a non-relational db. Furthermore, such a soft paradigm shift would be much easier to sell to some highly emotionalized team members than a hard one.
Would anyone recommend such an approach? Or would you rather recommend a black or white, i.e. relational or non-relational approach? All comments highly welcome!
P.S.: Any idea if such a mix-up works well with Spring and Hibernate/JPA?
Rob Conery recently wrote about his experience building his popular web application TekPub with both MongoDB and MySQL, highlighting the strengths of both:
The high-read stuff (account info, productions and episode info) is perfect for a "right now" kind of thing like MongoDb. The "what happened yesterday" stuff is perfect for a relational system.
At a high-level, Rob breaks their application data into two scopes: runtime data and historical data. The current state of a user's shopping cart, for example, is great for keeping in MongoDB. It's an object blob that is constantly mutating. To keep a historical record of what went in and out of the shopping cart; when it happened; the state of the checkout are great for relational, tabular data in MySQL.
He summarizes with this:
It works perfectly. I could not be happier with our setup. It's incredibly low maintenance, we can back it up like any other solution, and we have the data we need when we need it.
Update May 2016 (6 years later)
This has only become more true in the last 6 years. It's now common to see NoSQL databases powering transactional stores and traditional relational databases powering analytical databases.
I use both. RDBMS is good for complex analysis, reports, and data accessed in different ways by multiple users. NOSQL is great when I'm looking at a data from a single users perspective. A question I ask myself is: "Who is accessing this data?". If the answer is 1 user, I use NOSQL to store it.
Of course, there are other times when it is appropriate, but that's just one example.
As you mentioned, NOSQL is simpler storage, and it could cause you to have to right complex code to maintain data... For example, storing a list of connections.
Non-relational key-value databases are best used for blob storage and caching.

When shouldn't you use a relational database? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
Apart from the google/bigtable scenario, when shouldn't you use a relational database? Why not, and what should you use? (did you learn 'the hard way'?)
In my experience, you shouldn't use a relational database when any one of these criteria are true:
your data is structured as a hierarchy or a graph (network) of arbitrary depth,
the typical access pattern emphasizes reading over writing, or
there’s no requirement for ad-hoc queries.
Deep hierarchies and graphs do not translate well to relational tables. Even with the assistance of proprietary extensions like Oracle's CONNECT BY, chasing down trees is a mighty pain using SQL.
Relational databases add a lot of overhead for simple read access. Transactional and referential integrity are powerful, but overkill for some applications. So for read-mostly applications, a file metaphor is good enough.
Finally, you simply don’t need a relational database with its full-blown query language if there are no unexpected queries anticipated. If there are no suits asking questions like "how many 5%-discounted blue widgets did we sell in on the east coast grouped by salesperson?", and there never will be, then you, sir, can live free of DB.
The relational database paradigm makes some assumptions about usage of data.
A relation consists of an unordered set of rows.
All rows in a relation have the same set of columns.
Each column has a fixed name and data type and semantic meaning on all rows.
Rows in a relation are identified by unique values in primary key column(s).
etc.
These assumptions support simplicity and structure, at the cost of some flexibility. Not all data management tasks fit into this kind of structure. Entities with complex attributes or variable attributes do not, for instance. If you need flexibility in areas where a relational database solution doesn't support it, you need to use a different kind of solution.
There are other solutions for managing data with different requirements. Semantic Web technology, for example, allows each entity to define its own attributes and to be self-describing, by treating metadata as attributes just like data. This is more flexible than the structure imposed by a relational database, but that flexibility comes with a cost of its own.
Overall, you should use the right tool for each job.
See also my other answer to "The Next-gen databases."
There are three main data models (C.J.Date, E.F.Codd) and I am adding a flat file to this:
flat file(s) (structure varies - from 'stupid' flat text to files conforming to grammars which coupled with clever tools do very clever things, think compilers and what they can do, narrow application in modelling new things)
hierarchical (trees, nested sets - examples: xml and other markup languages, registry, organizational charts, etc; anything can be modelled, but integrity rules are not easy to express and retrieval is hard to optimize automatically, some retrieval is fast and some is very slow )
network (networks, graphs - examples: navigational databases, hyperlinks, semantic web, again almost anything can be modelled but automatic optimizing of retrieval is a problem)
relational (first order predicate logic - example: relational databases, automatic optimization of retrieval)
Both hierarchical and network can be represented in relational and relational can be expressed in the other two.
The reason that relational is considered 'better' is the declarative nature and standardization on not only the data retrieval language but also on the data definition language, including the strong declarative data integrity, backed up with stable, scalable, multi-user management system.
Benefits come at a cost, which most projects find to be a good ratio for systems (multi application) that store long term data in a from that will be usable in foreseeable future.
If you are not building a system, but a single application, perhaps for a single user, and you are fairly certain that you will not want multiple applications using your data, nor multiple users, any time soon then you'll probably find faster approaches.
Also if you don't know what kind of data you want to store and how to model it then relational model strengths are wasted on it.
Or if you simply don't care about integrity of your data that much (which can be fine).
All data structures are optimized for a certain kind of use, only relational if properly modelled tries to represent the 'reality' in semantically unbiased way. People who had bad experience with relational databases usually don't realize that their experience would have been much worse with other types of data models. Horrible implementations are possible, and especially with relational databases, where it is relatively easy to build complex models, you could end up with quite a monster on your hands. Still I always feel better when I try to imagine the same monster in xml.
One example of how good relational model is, IMO, is ratio of complexity vs shortness of the questions that you will find that involve SQL.
I suggest you visit the High Scalability blog, which discusses this topic almost on a daily basis and has many articles about projects that chose distributed hashes, etc. over RDMBS.
The quick (but very incomplete answer) is that not all data translates well to tables in efficient ways. For example, if your data is essentially one big dictionary, there are probably much faster alternatives that plain old RDBMS. Having said that, it mostly a matter of performance, and if performance isn't a huge concern in a project, and stability, consistency and reliability, for example, are, then I don't see much point in delving into these technologies when RDBMS is a much more mature and well developed scheme, with support in all languages and platforms and a huge set of solutions to choose from.
Fifteen years ago I was working on a credit risk system (basically a big tree walking system). We were using Sybase on HPUX & solaris and performnce was killing us. We hired in consultants direct from Sybase who said it couldn't be done. Then we switched to an OO database (Object store in this case) and got a about a 100x performance increase (and the code was about 100x easier to write too)
But such situations are quite rare - a relational database is a good first choice.
When you schema varies a lot you will have a hard time with relational databases. This is where XML databases or key-value pair databases work best. or you could use IBM DB2 and have both relational data and XML data managed by a single database engine.
About 7-8 years ago I worked on a web site that grew in popularity beyond our initial expectations and it got us in trouble performance-wise. Since we were all relatively inexperienced in web based projects it posed a significant strain on us about what to do beyond usual database separation onto separate server, load balancing etc.
One day I've thought of something pretty simple. Since site was based on users, their profiles were stored in a database table the usual way someone would do it - user id, lots of info variables and stuff like that - which would show up as a users profile page which other users could look up. I've flushed all that data into a simple html file, already prepared as a users profile page and got a significant boost - basically a cache. I even made a system that when user edited their profile info, it would parse original html file, put it up for edit, and then flush out html back to the file system - got even more boost.
I made something simillar with messages users sent to each other. Basically wherever I could make a system bypass a database altogether, avoiding a INSERT or UPDATE, I got a significant boost. It may sound like a common sense, but it was an enlightening moment. It is not an avoidance of relational setup per se, but it is an avoidance of the database altogether - KISS.