Related
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 9 years ago.
Improve this question
I've been hearing things about NoSQL and that it may eventually become the replacement for SQL DB storage methods due to the fact that DB interaction is often a bottle neck for speed on the web.
So I just have a few questions:
What exactly is it?
How does it work?
Why would it be better than using a SQL Database? And how much better is it?
Is the technology too new to start implementing yet or is it worth taking a look into?
There is no such thing as NoSQL!
NoSQL is a buzzword.
For decades, when people were talking about databases, they meant relational databases. And when people were talking about relational databases, they meant those you control with Edgar F. Codd's Structured Query Language. Storing data in some other way? Madness! Anything else is just flatfiles.
But in the past few years, people started to question this dogma. People wondered if tables with rows and columns are really the only way to represent data. People started thinking and coding, and came up with many new concepts how data could be organized. And they started to create new database systems designed for these new ways of working with data.
The philosophies of all these databases were different. But one thing all these databases had in common, was that the Structured Query Language was no longer a good fit for using them. So each database replaced SQL with their own query languages. And so the term NoSQL was born, as a label for all database technologies which defy the classic relational database model.
So what do NoSQL databases have in common?
Actually, not much.
You often hear phrases like:
NoSQL is scalable!
NoSQL is for BigData!
NoSQL violates ACID!
NoSQL is a glorified key/value store!
Is that true? Well, some of these statements might be true for some databases commonly called NoSQL, but every single one is also false for at least one other. Actually, the only thing NoSQL databases have in common, is that they are databases which do not use SQL. That's it. The only thing that defines them is what sets them apart from each other.
So what sets NoSQL databases apart?
So we made clear that all those databases commonly referred to as NoSQL are too different to evaluate them together. Each of them needs to be evaluated separately to decide if they are a good fit to solve a specific problem. But where do we begin? Thankfully, NoSQL databases can be grouped into certain categories, which are suitable for different use-cases:
Document-oriented
Examples: MongoDB, CouchDB
Strengths: Heterogenous data, working object-oriented, agile development
Their advantage is that they do not require a consistent data structure. They are useful when your requirements and thus your database layout changes constantly, or when you are dealing with datasets which belong together but still look very differently. When you have a lot of tables with two columns called "key" and "value", then these might be worth looking into.
Graph databases
Examples: Neo4j, GiraffeDB.
Strengths: Data Mining
While most NoSQL databases abandon the concept of managing data relations, these databases embrace it even more than those so-called relational databases.
Their focus is at defining data by its relation to other data. When you have a lot of tables with primary keys which are the primary keys of two other tables (and maybe some data describing the relation between them), then these might be something for you.
Key-Value Stores
Examples: Redis, Cassandra, MemcacheDB
Strengths: Fast lookup of values by known keys
They are very simplistic, but that makes them fast and easy to use. When you have no need for stored procedures, constraints, triggers and all those advanced database features and you just want fast storage and retrieval of your data, then those are for you.
Unfortunately they assume that you know exactly what you are looking for. You need the profile of User157641? No problem, will only take microseconds. But what when you want the names of all users who are aged between 16 and 24, have "waffles" as their favorite food and logged in in the last 24 hours? Tough luck. When you don't have a definite and unique key for a specific result, you can't get it out of your K-V store that easily.
Is SQL obsolete?
Some NoSQL proponents claim that their favorite NoSQL database is the new way of doing things, and SQL is a thing of the past.
Are they right?
No, of course they aren't. While there are problems SQL isn't suitable for, it still got its strengths. Lots of data models are simply best represented as a collection of tables which reference each other. Especially because most database programmers were trained for decades to think of data in a relational way, and trying to press this mindset onto a new technology which wasn't made for it rarely ends well.
NoSQL databases aren't a replacement for SQL - they are an alternative.
Most software ecosystems around the different NoSQL databases aren't as mature yet. While there are advances, you still haven't got supplemental tools which are as mature and powerful as those available for popular SQL databases.
Also, there is much more know-how for SQL around. Generations of computer scientists have spent decades of their careers into research focusing on relational databases, and it shows: The literature written about SQL databases and relational data modelling, both practical and theoretical, could fill multiple libraries full of books. How to build a relational database for your data is a topic so well-researched it's hard to find a corner case where there isn't a generally accepted by-the-book best practice.
Most NoSQL databases, on the other hand, are still in their infancy. We are still figuring out the best way to use them.
What exactly is it?
On one hand, a specific system, but it has also become a generic word for a variety of new data storage backends that do not follow the relational DB model.
How does it work?
Each of the systems labelled with the generic name works differently, but the basic idea is to offer better scalability and performance by using DB models that don't support all the functionality of a generic RDBMS, but still enough functionality to be useful. In a way it's like MySQL, which at one time lacked support for transactions but, exactly because of that, managed to outperform other DB systems. If you could write your app in a way that didn't require transactions, it was great.
Why would it be better than using a SQL Database? And how much better is it?
It would be better when your site needs to scale so massively that the best RDBMS running on the best hardware you can afford and optimized as much as possible simply can't keep up with the load. How much better it is depends on the specific use case (lots of update activity combined with lots of joins is very hard on "traditional" RDBMSs) - could well be a factor of 1000 in extreme cases.
Is the technology too new to start implementing yet or is it worth taking a look into?
Depends mainly on what you're trying to achieve. It's certainly mature enough to use. But few applications really need to scale that massively. For most, a traditional RDBMS is sufficient. However, with internet usage becoming more ubiquitous all the time, it's quite likely that applications that do will become more common (though probably not dominant).
Since someone said that my previous post was off-topic, I'll try to compensate :-) NoSQL is not, and never was, intended to be a replacement for more mainstream SQL databases, but a couple of words are in order to get things in the right perspective.
At the very heart of the NoSQL philosophy lies the consideration that, possibly for commercial and portability reasons, SQL engines tend to disregard the tremendous power of the UNIX operating system and its derivatives.
With a filesystem-based database, you can take immediate advantage of the ever-increasing capabilities and power of the underlying operating system, which have been steadily increasing for many years now in accordance with Moore's law. With this approach, many operating-system commands become automatically also "database operators" (think of "ls" "sort", "find" and the other countless UNIX shell utilities).
With this in mind, and a bit of creativity, you can indeed devise a filesystem-based database that is able to overcome the limitations of many common SQL engines, at least for specific usage patterns, which is the whole point behind NoSQL's philosophy, the way I see it.
I run hundreds of web sites and they all use NoSQL to a greater or lesser extent. In fact, they do not host huge amounts of data, but even if some of them did I could probably think of a creative use of NoSQL and the filesystem to overcome any bottlenecks. Something that would likely be more difficult with traditional SQL "jails". I urge you to google for "unix", "manis" and "shaffer" to understand what I mean.
If I recall correctly, it refers to types of databases that don't necessarily follow the relational form. Document databases come to mind, databases without a specific structure, and which don't use SQL as a specific query language.
It's generally better suited to web applications that rely on performance of the database, and don't need more advanced features of Relation Database Engines. For example, a Key->Value store providing a simple query by id interface might be 10-100x faster than the corresponding SQL server implementation, with a lower developer maintenance cost.
One example is this paper for an OLTP Tuple Store, which sacrificed transactions for single threaded processing (no concurrency problem because no concurrency allowed), and kept all data in memory; achieving 10-100x better performance as compared to a similar RDBMS driven system. Basically, it's moving away from the 'One Size Fits All' view of SQL and database systems.
In practice, NoSQL is a database system which supports fast access to large binary objects (docs, jpgs etc) using a key based access strategy. This is a departure from the traditional SQL access which is only good enough for alphanumeric values. Not only the internal storage and access strategy but also the syntax and limitations on the display format restricts the traditional SQL. BLOB implementations of traditional relational databases too suffer from these restrictions.
Behind the scene it is an indirect admission of the failure of the SQL model to support any form of OLTP or support for new dataformats. "Support" means not just store but full access capabilities - programmatic and querywise using the standard model.
Relational enthusiasts were quick to modify the defnition of NoSQL from Not-SQL to Not-Only-SQL to keep SQL still in the picture! This is not good especially when we see that most Java programs today resort to ORM mapping of the underlying relational model. A new concept must have a clearcut definition. Else it will end up like SOA.
The basis of the NoSQL systems lies in the random key - value pair. But this is not new. Traditional database systems like IMS and IDMS did support hashed ramdom keys (without making use of any index) and they still do. In fact IDMS already has a keyword NONSQL where they support SQL access to their older network database which they termed as NONSQL.
It's like Jacuzzi: both a brand and a generic name. It's not just a specific technology, but rather a specific type of technology, in this case referring to large-scale (often sparse) "databases" like Google's BigTable or CouchDB.
NoSQL the actual program appears to be a relational database implemented in awk using flat files on the backend. Though they profess, "NoSQL essentially has no arbitrary limits, and can work where other products can't. For example there is no limit on data field size, the number of columns, or file size" , I don't think it is the large scale database of the future.
As Joel says, massively scalable databases like BigTable or HBase, are much more interesting. GQL is the query language associated with BigTable and App Engine. It's largely SQL tweaked to avoid features Google considers bottle-necks (like joins). However, I haven't heard this referred to as "NoSQL" before.
NoSQL is a database system which doesn't use string based SQL queries to fetch data.
Instead you build queries using an API they will provide, for example Amazon DynamoDB is a good example of a NoSQL database.
NoSQL databases are better for large applications where scalability is important.
Does NoSQL mean non-relational database?
Yes, NoSQL is different from RDBMS and OLAP. It uses looser consistency models than traditional relational databases.
Consistency models are used in distributed systems like distributed shared memory systems or distributed data store.
How it works internally?
NoSQL database systems are often highly optimized for retrieval and appending operations and often offer little functionality beyond record storage (e.g. key-value stores). The reduced run-time flexibility compared to full SQL systems is compensated by marked gains in scalability and performance for certain data models.
It can work on Structured and Unstructured Data. It uses Collections instead of Tables
How do you query such "database"?
Watch SQL vs NoSQL: Battle of the Backends; it explains it all.
As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I occasionally hear things about how SQL sucks and it's not a good language, but I never really hear much about alternatives to it. So, are other good languages that serve the same purpose (database access) and what makes them better than SQL? Are there any good databases that use this alternative language?
EDIT:
I'm familiar with SQL and use it all the time. I don't have a problem with it, I'm just interested in any alternatives that might exist, and why people like them better.
I'm also not looking for alternative kinds of databases (the NoSQL movement), just different ways of accessing databases.
I certainly agree that SQL's syntax is difficult to work with, both from the standpoint of automatically generating it, and from the standpoint of parsing it, and it's not the style of language we would write today if we were designing SQL for the demands we place on it today. I don't think we'd find so many varied keywords if we designed the language today, I suspect join syntax would be different, functions like GROUP_CONCAT would have more regular syntax rather than sticking more keywords in the middle of the parentheses to control its behavior... create your own laundry list of inconsistencies and redundancies in SQL that you'd like/expect to see smoothed out if we redesigned the language today.
There aren't any alternatives to SQL for speaking to relational databases (i.e. SQL as a protocol), but there are many alternatives to writing SQL in your applications. These alternatives have been implemented in the form of frontends for working with relational databases. Some examples of a frontend include:
SchemeQL and CLSQL, which are probably the most flexible, owing to their Lisp heritage, but they also look like a lot more like SQL than other frontends.
LINQ (in .Net)
ScalaQL and ScalaQuery (in Scala)
SqlStatement, ActiveRecord and many others in Ruby,
HaskellDB
...the list goes on for many other languages.
I think that the underlying theme today is that rather than replace SQL with one new query language, we are instead creating language-specific frontends to hide the SQL in our regular every-day programming languages, and treating SQL as the protocol for talking to relational databases.
Take a look at this list.
Hibernate Query Language is probably the most common. The advantage of Hibernate is that objects map very easily (nearly automatically) to the relational database, and the developer doesn't have to spend much time doing database design. Check out the Hibernate website for more info. I'm sure others will chime in with other interesting query languages...
Of course, there's plenty of NoSQL stuff out there, but you specifically mention that you're not interested in those.
"I occasionally hear things about how SQL sucks and it's not a good language"
SQL is over thirty years old. Insights about "which features make something a 'good' language and which ones make it a 'bad' one" have evolved more rapidly than SQL itself.
Also, SQL is not a language that conforms to current standards of "what it takes to be relational", so, SQL just isn't a relational language to boot.
"but I never really hear much about alternatives to it."
I invite you to ponder the possibility that you are trying to hear only in the wrong places (that is, the commercial DBMS industry exclusively).
"So, are other good languages that serve the same purpose (database access) and what makes them better than SQL?"
Date&Darwen describe the features that a modern data manipulation language must conform to in their "Third Manifesto", the most recent version of which is laid down in their book "Databases, Types & the Relational Model".
"Are there any good databases that use this alternative language?"
If by "good", you mean something like "industrial-strength", then no. The closest thing available would probably be Dataphor.
The Rel project offers an implementation for the Tutorial D language defined in "Databases, Types & The Relational Model", but the current prime goal of Rel is to be educational in nature.
My SIRA_PRISE project offers an implementation for "truly relational" data management, but I hesitate to also label it "an implementation of a language".
And of course, you might also look into some non-relational stuff, as some have proposed, but I personally dismiss non-relational data management as multiple decades of technological regression. Not worth considering, that is.
Oh, and by the way, a software system that is used to manage databases is not "a database", but "a DataBase Management System", "DBMS" for short. Just like a photograph is not the same thing as a camera, and if you are discussing cameras, and you want to avoid confusion, then you should be using the proper word "cameras" instead of "photograph".
Perhaps you're thinking of the criticism C. Date and his friends have uttered against existing relational databases and SQL; they say the systems and language aren't 100% relational, and should be. I don't really see any real problem here; as far as I can see you can have a 100% relational system, if you want, just by disciplining the way in which you use SQL.
What I personally keep running into is the lack of expressive power SQL inherits from its theoretical basis, relational algebra. One issue is the lack of support for the use of domain ordering, which you run into when you work with data marked by dates, timestamps, etcetera. I once tried to do a reporting application entirely in plain SQL on a database full of timestamps and it just wasn't feasible. Another is the lack of support for path traversal: most of my data look like directed graphs that I need to traverse paths in, and SQL can't do it. (It lacks "transitive closure". SQL-1999 can do it with "recursive subqueries" but I haven't seen them in actual use yet. There are also various hacks to make SQL cope but they're ugly.) These problems are also discussed by some of Date's writings, by the way.
Recently I was pointed at .QL which appears to address the transitive closure issue nicely, but I don't know whether it can resolve the issue with ordered domains.
Take a look at LINQ to SQL...
Tried it out a couple months ago and never looked back....
Direct answer: I don't think there's any serious contender out there. DBase and its imitators (Foxpro, Codebase etc) was a contender for a while, but I think they basically lost the database query language war. There have been many other database products that had their own query language, like Progress and Paradox and several others I've used whose names I don't remember and surely many more that I never heard of. But I don't think any other contender even came close to getting a non-trivial share of the market.
As simple proof that there is a difference between a database format and a query language, the last version of DBase I used -- many years ago now -- offerred both the "traditional" DBase query language and SQL, both of which could be used to access the same data.
Side ramble: I wouldn't say that SQL sucks, but it has many flaws. With the benefit of the years of experience and hindsight we now have, I'm sure one could design a better query language. But creating a better query language, and convincing people to use it, are two very different things. Would it be enough better to convince people that it was worth the trouble of learning. People have invested many years of their lives learning to use SQL effectively. Even if your new language is easier to use, there would surely be a learning curve. And how would you migrate your existing systems from SQL to the new language? Etc. It can be done, of course, just like C++, C#, and Java have largely overthrown COBOL and FORTRAN. But it takes a combination of technical superiority and good marketing to pull it off.
Still, I get a chuckle out of people who rush forward to defend SQL anytime someone criticizes it, who insist that any problem you have with SQL must be your own ineptitude in using it and not any fault of SQL, that you must just not have reached the higher plane of thingking necessary to comprehend its perfection, etc. Calm down, take a deep breath: We are insulting a computer language, not your mother.
Back in the 1980's, ObjectStore provided transparent object access. It was kind of like an RDBMS plus an ORM, except without all those extra leaky abstraction layers: it stored objects directly in the database.
So this alternative was really "no language at all", or perhaps "the language you're already using". You'd write C++ code and create or traverse objects as if they were native objects, and the database took care of everything as needed. Kind of like ActiveRecord but it actually worked as well as the ActiveRecord marketing blitzes claim. :-)
(Of course, it didn't have Oracle's marketing muscle, and it didn't have MySQL's zero-cost, so everybody ignored it. And now we try to replicate that with RDBMSs and ORMs, and some people try to argue that tables actually make sense for storing objects, and that writing giant XML file to tell your computer how to map objects to tables is somehow a reasonable solution.)
I think you might be interested in looking at Dataphor, which is an open-source relational development environment with its own database server (which speaks D), and the ability to derive user interfaces from its query language.
Also, it appears Ingres still supports QUEL, and it's open source.
The general movement these days is NoSQL; generally these technologies are:
Distributed "hashtables" that store data as key/value pairs
Document-oriented databases
Personally I think there is nothing wrong with SQL as long as it fits your needs. SQL is expressive and great for working with structured data.
SQL works fine for the domain for which it was designed — interrelated tables of data. This is generally found in traditional business data processing. SQL doesn't work that well when trying to persist a complex network of objects.
If your needs are to store and process relatively traditional data, use some SQL-based DBMS.
In response to your edit:
If you're looking for alternatives to the SQL DML for retrieving data from relational data stores, I've never heard of any serious alternative to SQL.
The knocks SQL gets are not, I think, so much against the language as opposed to the underlying data storage principles on which the language is based. People often confuse the language SQL with the relational data model on which RDBMSes are built.
Relational Databases are not the only kind of databases around. I should say a word about Object-Databases as I havn't seen it in responses from others. I had some experience with the Zope python framework that use ZODB for objects persistency instead of RDBMS (well, it's theoretically possible to replace ZODB by another database within zope but the last time I checked I didn't succeed to have it working, so can't be positive about that).
The ZODB mindset is really different, more like object programming that would happen to be persistent.
ORM can be seen as a kind of language
In a way I believe the Object-database model is what ORM are about : accessing persistent data through your usual object model. It's a kind of language and it's gaining some market share, but for now we don't see it as a language but as an abstraction layer. However I believe it would be much more efficient to use an ORM over an Object-database than over SQL (in other words performance of ORMs I happened to use using some SQL database as base layers sucked).
There are many implementations of SQL (SQL Server, mysql, Oracle, etc.), but there is no other language that serves the same purpose in the sense of being a general purpose language designed for relational data storage and retrieval.
There are object databases such as db4o, and there are similar so-called noSQL databases that refer to just about any data storage mechanism that doesn't rely on SQL, but most commonly open-source products like Cassandra based loosely on Google's Bigtable concept.
There are also a number of special-purpose database products like CDF, but you probably don't need to worry about those - if you need one, you'll know.
None of these are equivalent to SQL.
That doesn't mean they're "better" or "worse" - they're just not the same. Dennis Forbes wrote a great post recently breaking down a number of the strange claims surfacing against SQL. He maintains (and I agree) that these complaints originate largely from people and shops who have either picked the wrong tool for the job in the first place, or aren't using their SQL DBMS properly (I'm not even surprised anymore when I see another SQL database where every column is a varchar(50) and there's not a single index or key, anywhere).
If you are implementing yet another social networking site and aren't too concerned with ACID principles, by all means start looking into products such as db4o. If you are developing a mission-critical business system, however, I highly highly recommend that you think twice before joining the "SQL sucks" chorus. Do the research first, find out what features the various products can and cannot support.
Edit - I was busy writing my answer and didn't get the question update from a few minutes. Having said that, SQL is essentially inseparable from the DBMS itself. If you run a SQL database product, then you access it with SQL, period.
Perhaps you are looking for abstractions over the syntax; Linq to SQL, Entity Framework, Hibernate/NHibernate, SubSonic, and a host of other ORM tools all provide their own SQL-like syntax that is not quite SQL. All of these "compile down" to SQL. If you run SQL Server, then you can also write CLR Functions/Procedures/Triggers, which allows you to write code in any .NET language that will run inside the database; however, this isn't really a substitute for SQL, more of an extension to it.
I'm not aware of any full "language" that you can layer on top of a SQL database; short of switching to a different database product, you're eventually going to see SQL on the pipe.
SQL is de-facto.
Frameworks that try to shield developers from it have eventually created their own specific language (Hibernate HQL comes to mind).
SQL solves a problem fairly well. It is no more difficult to learn than a high level programming language. If you already know a functional language then it is a breeze to grasp SQL.
Considering the leading database vendors providing state of the art databases (Oracle and SQL Server) support SQL and have invested years into optimization engines, etc. and all leading data modelling software and change management software deals in SQL, I'd say it is the safest bet.
Also, there is more to a database than just queries. There is scalability, backup and recovery, data mining. The big vendors support a lot of things that even the new "cache" engines don't even consider.
Problems with SQL have motivated me to cook up a draft query language called SMEQL over at the Portland Pattern Repository wiki. Comments Welcome. It borrows ideas from functional programming and IBM's experimental Business System 12 language. (I originally called it TQL, but found later that name was taken.)
Within the .NET world, while it still has a SQL-esque feel to it, LINQ-to-SQL will allow you to have a good mix of SQL and in-memory .NET processing of your data. It also simplifies a lot of the lower-level data plumbing that nobody really wants to do.
If you want to see a database type of a completely different mindset, take a look at CouchDB. "Better" is obviously a relative requirement and this sort of non-relation database is "Better" but only in certain scenarios.
SQL the language is very powerful, and relational database management systems have been and still are a huge success. But there is a class of application that requires very high scalability and availability, but not necessarily a high degree of data consistency (eventual consistency is what matters). A variety of systems get better performance and scaling than an RDBMS by relaxing the need for full ACID compliant transactions. These have been named "NoSQL", but as others point out, this is a misnomer: that perhaps they should be called NoACID databases.
Michael Stonebraker covers this in The "NoSQL" Discussion has Nothing to Do With SQL.
I'm trying to decide what database system to use for storing information that is relatively static but needs to be computed in a number of different (runtime specified) ways. The basic contours of the data are votes in the US Congress:
A bill:
has many roll calls
has a name, and other short metadata
has text, and other potentially long metadata
has a status (passed, failed, in progress)
A roll call:
has a date
has many votes
has a status (passed, failed)
A vote:
belongs to a member of Congress
has a kind (aye, nay, present, not voting)
A member of congress:
has a name (and other short metadata)
has many periods
A period:
has a start and end date
has a political party (Democrat, Republican, other)
has a position (member of Congress, committee chair, Speaker, etc.)
I would like to be able to easily build queries like:
For X, Y, and Z roll call votes, tell me the "Democratic" position and the "Republican" position. Then, rank congressmen in the congress those votes were held by their fidelity to those positions.
For X bill which failed, tell me the closest roll calls. Then, tell me which members of the majority party defected to produce those failures.
For X bill passed, but which was opposed by the majority party, tell me which members of the majority defected to produce the passage.
I will have a finite number of query types like these, but the bills, roll call votes, political parties, etc. involved will be dynamically generated.
What is the best storage mechanism for the underlying data that will allow me to issue these queries dynamically and as performantly as possible?
This looks like pretty standard relational data to me. Any RDBMS (MySQL, SqlServer, postgres, etc.) will do.
Or are you asking advice on how to make tables to store this data?
You could use just about any database, until I read:
...rank congressmen...
MySQL doesn't have any ranking functionality. I'm not clear on Postgres' ranking support, but Oracle and SQL Server have supported ranking for a while now (Oracle 9i+, SQL Server 2005+). And they both provide free versions.
Storage mechanism? Any mainstream database should be capable of dealing with the kind of scenario you are describing. Looks pretty standard stuff to me.
As others have stated - any relational database can support a simple model to solve this problem. However, a few other considerations:
This is an analytical and not transactional app and the commercial databases are currently stronger at analytics - because of the more mature optimizers, greater sql functionality, greater support for parallelism, materialized queries, automatic query rewrite against summary tables, etc.
If you just stick with the US congress and don't decide to also support state congresses and don't also decide to add a hundred years of historical data (all useful requirements), then pretty much any popular relational database could handle the performance issues. But if you do decide to get into the state level then I'd consider the commercial databases first.
Of the open source databases I'd consider the analytical functionality of postgresql to be the most mature.
Here's where I would usually chime in and say, use CouchDB or some other schema-free NOSQL database. But the way the problem is spec'd lays out nicely for a relational store. Plus, there's not a terribly large amount of data that would require distributed processing a la mapreduce.
That being said, if the question was framed a bit differently, without the initial relational bias (you're already in data design mode :) ), then a system like CouchDB could work. Depending on the analyses to be performed, a more document-centered approach might be helpful, as all the information needed for an analysis is present on each document (denormalized) and would avoid expensive joins.
Each bill might be one of these docs (json in CouchDB's case), and the rollcalls/votes/congress members with periods as sub-attribs/etc are all on the one 'bill' document. You could then mapreduce over all of the 'bill' documents performing your queries. A different document-oriented design might make sense depending on query requirements.
As the data set grows, you're not worried about size/performance, because you can always use more servers to perform mapreduce queries and distribute the load. Further, schemaless means documents can change as your app changes, without expensive rdbms table locking. But again, this data set doesn't change terribly often, and is not massive.
in this post Stack Overflow Architecture i read about something called nosql, i didn't understand what it means, and i tried to search on google but seams that i can't get exactly whats it.
Can anyone explain what nosql means in simple words?
If you've ever worked with a database, you've probably worked with a relational database. Examples would be an Access database, SQL Server, or MySQL. When you think about tables in these kinds of databases, you generally think of a grid, like in Excel. You have to name each column of your database table, and you have to specify whether all the values in that column are integers, strings, etc. Finally, when you want to look up information in that table, you have to use a language called SQL.
A new trend is forming around non-relational databases, that is, databases that do not fall into a neat grid. You don't have to specify which things are integers and strings and booleans, etc. These types of databases are more flexible, but they don't use SQL, because they are not structured that way.
Put simply, that is why they are "NoSQL" databases.
The advantage of using a NoSQL database is that you don't have to know exactly what your data will look like ahead of time. Perhaps you have a Contacts table, but you don't know what kind of information you'll want to store about each contact. In a relational database, you need to make columns like "Name" and "Address". If you find out later on that you need a phone number, you have to add a column for that. There's no need for this kind of planning/structuring in a NoSQL database. There are also potential scaling advantages, but that is a bit controversial, so I won't make any claims there.
Disadvantages of NoSQL databases is really the lack of SQL. SQL is simple and ubiquitous. SQL allows you to slice and dice your data easier to get aggregate results, whereas it's a bit more complicated in NoSQL databases (you'll probably use things like MapReduce, for which there is a bit of a learning curve).
From the NoSQL Homepage
NoSQL is a fast, portable, relational database management system without arbitrary limits, (other than memory and processor speed) that runs under, and interacts with, the UNIX 1 Operating System. It uses the "Operator-Stream Paradigm" described in "Unix Review", March, 1991, page 24, entitled "A 4GL Language". There are a number of "operators" that each perform a unique function on the data. The "stream" is supplied by the UNIX Input/Output redirection mechanism. Therefore each operator processes some data and then passes it along to the next operator via the UNIX pipe function. This is very efficient as UNIX pipes are implemented in memory. NoSQL is compliant with the "Relational Model".
I would also see this answer on Stackoverflow.
Put simply, it means not using a relational database for data storage.
Here's a relevant article: http://www.computerworld.com/s/article/9135086/No_to_SQL_Anti_database_movement_gains_steam_
NoSql is the new database philosophy which talks about all the shortcomings of the relational database design, particularly the problems they have in scaling up for today's demanding web environments.
NoSql is quickly evolving into a movement with new tools, software and formats coming up as alternative to SQL.
RDBMS is as ubiquitous as OOP and while both of these design methodologies solve some problems wonderfully, they don't solve all.
So think of NoSql as the functional programmin of the database world.
Was this simple enough?
NoSQL is the idea that SQL-type databases don't satisfy the demands/requirements of a heavily-used database that requires transactions be reliable and failsafe (or close to it). This ties into the ideas of ACID and CAP, both things worth looking into but not something to lose sleep over unless you run a really popular site that is transaction-heavy (ie Amazon or Ebay). To get a great start on these subjects, I suggest:
http://www.eflorenzano.com/blog/post/my-thoughts-nosql/
and
http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
Something everyone considering a "nosql" approach should consider:
(I shan't risk putting the image into this post as it contains a curse word, and I don't want offensive flags. So clicker beware -- there's an f-word in there. Only click if you have a sense of humor.)
http://browsertoolkit.com/fault-tolerance.png
Found this nice article about no-sql
and this as well:
NoSQL, Yes Search
Why are SQL distributions so non-standard despite an ANSI standard existing for SQL? Are there really that many meaningful differences in the way SQL databases work or is it just the two databases with which I have been working: MS-SQL and PostgreSQL? Why do these differences arise?
The ANSI standard specifies only a limited set of commands and data types. Once you go beyond those, the implementors are on their own. And some very important concepts aren't specified at all, such as auto-incrementing columns. SQLite just picks the first non-null integer, MySQL requires AUTO INCREMENT, PostgreSQL uses sequences, etc. It's a mess, and that's only among the OSS databases! Try getting Oracle, Microsoft, and IBM to collectively decide on a tricky bit of functionality.
It's a form of "Stealth lock-in". Joel goes into great detail here:
http://www.joelonsoftware.com/articles/fog0000000056.html
http://www.joelonsoftware.com/articles/fog0000000052.html
Companies end up tying their business functionality to non-standard or weird unsupported functionality in their implementation, this restricts their ability to move away from their vendor to a competitor.
On the other hand, it's pretty short-sighted because anyone with half a brain will tend to abstract away the proprietary pieces, or avoid the lock-in altogether, if it gets too egregious.
First, I don't find databases to be as, say, browsers or operating systems in terms of incompatibility. Anyone with a few hours of training can start doing selects, inserts, deletes and updates on any SQL database. Meanwhile, it's difficult to write HTML that renders identically on every browser or write system code for more than one OS. Generally, differences in SQL are related to performance or fairly esoteric features. The major exception seems to be date formats and functions.
Second, database developers generally are motivated to add features that differentiate their product from everyone else. Products like Oracle, MS SQL Server and MySQL are vast ecosystems that rarely cross-pollinate in practice. At my workplace, we use Oracle and MySQL, but we could probably switch over to 100% Oracle in about a day if needed or desired. So I care a lot about the shiny toys Oracle gives us with each release, but I don't even know what version of MySQL we are using. IBM, Microsoft, PostgreSQL and the rest might as well not exist as far as we are concerned. Having the features to get and keep customers and users is far more important than compatibility in the database world. (That's the positive spin on the "lock-in" answer, I suppose.)
Third, there are legitimate reasons for different companies to implement SQL differently. For instance, Oracle has a multi-versioning system that allows very fast and scalable consistent reads. Other databases lack that feature, but usually are faster inserting rows and rolling back transactions. This is a fundamental difference in these systems. It doesn't make one better than the other (at least in the general case), just different. One should not be surprised if the SQL ontop of a database engine takes advantage of its strengths and attempts to minimize its weaknesses. In fact, it would be irresponsible of the developers to not do this.
John: The standard actually covers lots of subjects, including identity columns, sequences, triggers, routines, upsert, etc. But of course, many of these standards-components may have been brought in place later than the first implementations; and this could be a reason why SQL standards compliance is somewhat low, generally.
Neall: There are actually areas where the SQL standard is ahead of the implementations. For example, it would be nice to have CREATE ASSERTION, but as far as I know, no DBMS implements assertions yet.
Personally, I believe that the closed nature of some ISO standards (like the SQL standard) is part of the problem: When a standard is not readily available online, it's less likely to be known by implementors/planners, and too few customers ask for compliance because they don't know what to ask for.
It's certainly effective lock-in, as 1800 says. But in fairness to the database vendors, the SQL standard is always playing catch-up to current databases' feature sets. Most databases we have today are of pretty ancient lineages. If you trace Microsoft SQL Server back to its roots, I think you'll find Ingres - one of the very first relational databases written in the '70s. And Postgres was originally written by some of the same people in the '80s as a successor to Ingres. Oracle goes way back, and I'm not sure where MySQL came in.
Database non-portability does suck, but it could be a lot worse.