Many articles claim that relational databases cannot be scaled and NOSQL is better at it but do not explain why. Scalability is often projected as an advantage of NOSQL. What is the problem with scaling relational databases? What makes NOSQL databases superior to relational databases in the aspect of scalability?
Both SQL and NOSQL databases can scale. However, NOSQL databases have some simplified functionality that can improve scalability.
For instance, SQL databases generally enforce a set of properties called the ACID properties. These ensure the consistency of the data over time and the ability to implement an entire transaction "all at once".
However, when running in a multi-processor environment, there is overhead to strictly maintaining the ACID properties. Basically, the data needs to look the same from any processor at the same time.
NOSQL databases often implement "ACID-lite". For instance, they offer "eventual-consistency". This means that for a few seconds or minutes, a query might return different values depending on which processor(s) process it. And, this is fine for many applications.
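To make "eventual consistency" concrete, here is a toy sketch in Python. It is not how any particular NoSQL database works internally; the `Replica` and `EventuallyConsistentStore` classes are made up for illustration, and replication is triggered by hand rather than by a background process.

```python
class Replica:
    """One node holding its own copy of the data."""

    def __init__(self):
        self.data = {}


class EventuallyConsistentStore:
    """Toy store: a write lands on one replica and reaches the others later."""

    def __init__(self, replica_count=3):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = []  # (replica_index, key, value) not yet applied

    def write(self, key, value):
        # The first replica sees the write immediately; the rest are queued.
        self.replicas[0].data[key] = value
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def read(self, key, replica_index):
        # A read is answered by whichever replica happens to serve it.
        return self.replicas[replica_index].data.get(key)

    def replicate(self):
        # Apply all queued writes; afterwards every replica agrees.
        for index, key, value in self.pending:
            self.replicas[index].data[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("likes", 10)
stale = [store.read("likes", i) for i in range(3)]   # before replication
store.replicate()
fresh = [store.read("likes", i) for i in range(3)]   # after replication
print(stale)  # [10, None, None]
print(fresh)  # [10, 10, 10]
```

Skipping the coordination that would keep `stale` from ever being observed is where the scalability gain comes from; a strictly ACID system has to pay for that coordination on every write.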
This really depends on the long-term requirements of the enterprise and the volume of data expected. The other key factor is the workload: if you mainly need OLTP with little reporting, that points to an ACID-compliant SQL database. NoSQL is usually a better fit than SQL where reporting is vital. Both have their own merits, but a hybrid model that takes advantage of both usually works better: scalability and better transaction control on SQL databases, and high-performance reporting on NoSQL databases, which allow more freedom in the data model (graph databases, key-value stores). There are a lot of interesting comparisons available, even for the specific databases you want to evaluate.
Puneet
I've been tasked with a project to collect server configuration metadata from Windows servers and store it in a DB for reporting. I will be collecting data for over 100 configuration fields for each server.
One of the tasks the client wants to be able to do is compare config data for either the same server at different points in time, or two different servers which have the same function (e.g. Exchange servers), to see whether there are any differences and what those differences are.
As for DB design, I would normally just normalize all of the data into an OLTP-type schema, where all of the similar config items would be persisted to a table relating to their specific area (e.g. hardware info). But I'm thinking this may be a bad move and I should be looking to save this to some kind of OLAP-type data warehouse.
I'm just not sure which way to go with the DB design, so could do with some direction on this. Should I go with normalizing the data and creating lots of tables, or one massive table with no normalisation and over 100 fields, or should I look into a star topology or something completely different (EAV)?
I am limited to using .Net and MSSQL server 2005.
Edit: The tool to collect and store the data will be run on an as required basis, rather than just grabbing the config data every day/week. Would be looking to keep the data for a couple of years at least.
In my experience, a star schema is best for reporting. You do not have to use a star schema for storage, though: it can be a set of views (indexed for performance), and you can design those views later. The storage model should be a set of event tables that record configuration changes. You can start from a flat log-file structure and normalize it iteratively until you find good structures for storage and queries. A storage model is good if you can define constraints on it; a reporting model is good if it supports fast ad-hoc queries. Focus on the storage model first, because the reporting model is a denormalization of the storage model and it is easier to denormalize later. EAV structures are useless for both models: you cannot define any constraints, and the queries are complex anyway.
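As a rough sketch of what one such event table could look like, here is a small example. It uses Python with SQLite purely as a stand-in for .NET and SQL Server 2005, and the table name `hardware_event`, its columns, and the sample values are all invented for illustration. The point is that each collection run adds a typed, constrained row, and comparing two runs is a column-by-column diff.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row          # lets us read columns by name
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE server (
    server_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
-- One row per collection run; typed columns with constraints (not EAV).
CREATE TABLE hardware_event (
    server_id    INTEGER NOT NULL REFERENCES server(server_id),
    collected_at TEXT    NOT NULL,
    cpu_count    INTEGER NOT NULL CHECK (cpu_count > 0),
    memory_mb    INTEGER NOT NULL CHECK (memory_mb > 0),
    os_version   TEXT    NOT NULL,
    PRIMARY KEY (server_id, collected_at)
);
""")
conn.execute("INSERT INTO server VALUES (1, 'EXCH01')")
conn.executemany(
    "INSERT INTO hardware_event VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2010-01-01", 4, 8192, "5.2.3790"),
        (1, "2010-06-01", 4, 16384, "5.2.3790"),
    ],
)


def snapshot(conn, server_id, collected_at):
    """Fetch the config columns recorded for one server at one collection."""
    return conn.execute(
        "SELECT cpu_count, memory_mb, os_version FROM hardware_event "
        "WHERE server_id = ? AND collected_at = ?",
        (server_id, collected_at),
    ).fetchone()


def diff(old, new):
    """Return {column: (old_value, new_value)} for columns that differ."""
    return {c: (old[c], new[c]) for c in old.keys() if old[c] != new[c]}


changes = diff(snapshot(conn, 1, "2010-01-01"), snapshot(conn, 1, "2010-06-01"))
print(changes)  # {'memory_mb': (8192, 16384)}
```

The same `diff` works for two different servers with the same role: pass a snapshot from each. A star-schema view for reporting can be layered on top of tables like this later.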
I face the following problem.
The target is to develop a DB to store the following schema:
You have PRODUCTS that can be composed of both PRIMARY_PRODUCTS and also other PRODUCTS.
My first question is which of SQL or NoSQL technology would be recommended for this.
I don't know NoSQL well, and I am not sure it is worth spending time investigating if the whole concept is not suited to the problem.
If NoSQL is worth looking at, which one is recommended? I was looking at Cassandra, but there are so many types that the universe is quite big.
If NoSQL is not suited for this, then we need to fall back on SQL.
Do you think that hierarchyId is suited?
Both SQL and NoSQL can store and retrieve data of this kind, and both technologies can be made to do this job.
The major differences are elsewhere: in a nutshell, transactions and guaranteed consistency for SQL versus high performance for readers for NoSQL.
In your precise situation SQL, with its support for transactions, will ensure that viewers will see a composite product when all sub-products have been successfully stored.
In most real-life situations, however, the chance of a viewer seeing a partially-committed product on a NoSQL system is so slim as to be irrelevant: future reads of the product will be correct.
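A minimal sketch of the SQL side, assuming a simple self-referencing design: the `product` and `composition` tables and the sample data are hypothetical, and Python with SQLite stands in for whatever SQL database you pick. The transaction makes a composite product appear all at once or not at all, and a recursive query walks the composition down to the primary products.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    name       TEXT    NOT NULL,
    is_primary INTEGER NOT NULL CHECK (is_primary IN (0, 1))
);
-- A product may be composed of primary products and of other products.
CREATE TABLE composition (
    parent_id INTEGER NOT NULL REFERENCES product(product_id),
    child_id  INTEGER NOT NULL REFERENCES product(product_id),
    PRIMARY KEY (parent_id, child_id),
    CHECK (parent_id <> child_id)
);
""")


def store_products(conn, products, links):
    """Insert products and their composition links as one transaction."""
    with conn:  # commits on success, rolls back if anything raises
        conn.executemany("INSERT INTO product VALUES (?, ?, ?)", products)
        conn.executemany("INSERT INTO composition VALUES (?, ?)", links)


store_products(
    conn,
    [(1, "steel", 1), (2, "rubber", 1), (3, "wheel", 0), (4, "cart", 0)],
    [(3, 1), (3, 2), (4, 3), (4, 1)],
)

# A failing store: the duplicate link violates the primary key, so the
# whole transaction, including the already inserted product row, is undone.
try:
    store_products(conn, [(5, "trailer", 0)], [(5, 4), (5, 4)])
except sqlite3.IntegrityError:
    pass
trailer_rows = conn.execute(
    "SELECT COUNT(*) FROM product WHERE product_id = 5"
).fetchone()[0]

# Walk the composition of product 4 down to its primary products.
primary_parts = [row[0] for row in conn.execute("""
    WITH RECURSIVE parts(id) AS (
        SELECT child_id FROM composition WHERE parent_id = ?
        UNION
        SELECT c.child_id FROM composition c JOIN parts p ON c.parent_id = p.id
    )
    SELECT pr.name FROM parts JOIN product pr ON pr.product_id = parts.id
    WHERE pr.is_primary = 1
    ORDER BY pr.name
""", (4,))]

print(trailer_rows)   # 0
print(primary_parts)  # ['rubber', 'steel']
```

Readers never see a half-stored "trailer" here, which is the guarantee described above.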
If my data is relational (publishers-authors-books, associations-teams-players), can we use NoSQL system like HBase or MongoDB to store the data?
(I know it may sound like a stupid question but I'm just learning :))
Yes, you can store any type of data in NoSQL datastores. The kind of information you describe should be well suited to NoSQL.
However, be aware that in a typical NoSQL solution, you would be trading some/many features that are taken for granted in SQL databases, such as transactions, strong consistency, rich queries, ad-hoc queries, etc, mainly in favour of simpler models that can scale horizontally very easily.
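To illustrate the trade-off, here is a small sketch in plain Python, with no database involved. The publisher, author, and book data are made up, and the document shape is just one possible choice: the data is denormalized into one document per publisher, the way you might model it for a document store.

```python
# Normalized, relational-style data: each fact is stored once, linked by id.
authors = {1: {"name": "Ann"}, 2: {"name": "Bob"}}
books = [
    {"title": "Graphs", "author_id": 1, "publisher": "Acme"},
    {"title": "Trees", "author_id": 1, "publisher": "Acme"},
    {"title": "Heaps", "author_id": 2, "publisher": "Zenith"},
]


def to_publisher_documents(authors, books):
    """Denormalize into one document per publisher with embedded authors/books."""
    docs = {}
    for book in books:
        doc = docs.setdefault(
            book["publisher"], {"_id": book["publisher"], "authors": {}}
        )
        author_name = authors[book["author_id"]]["name"]
        doc["authors"].setdefault(author_name, []).append(book["title"])
    return docs


docs = to_publisher_documents(authors, books)

# The access pattern the documents were designed for is a single key lookup.
acme = docs["Acme"]

# A question they were not designed for ("all titles by Ann, any publisher")
# has no join to lean on, so the application scans every document.
titles_by_ann = sorted(
    title for doc in docs.values() for title in doc["authors"].get("Ann", [])
)

print(acme)           # {'_id': 'Acme', 'authors': {'Ann': ['Graphs', 'Trees']}}
print(titles_by_ann)  # ['Graphs', 'Trees']
```

So the data fits, but you choose the document shape around the queries you expect, rather than relying on ad-hoc joins later.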
One of the Digg engineers working on Cassandra (another NoSQL solution) wrote a very good post about data models and NoSQL (specifically Cassandra).
This may help you start thinking in column oriented data structures.
You can store relational data with playOrm and still do joins and such, and scale that data as well. Lots of people say you can't store relational data in NoSQL, but this is simply not true: we do it today, and we scale via partitioning and Scalable SQL (S-SQL), which is a slight twist on SQL that lets us scale.
I've got an app fully working with PostgreSQL. After reading about MongoDB, I was interested to see how the app would work with it. After a few weeks, I migrated the whole system to MongoDB.
I like a few things with MongoDB. However, I found certain queries I was doing in PostgreSQL, I couldn't do efficiently in MongoDB. Especially, when I had to join several tables to calculate some logic. For example, this.
Moreover, I am using Ruby on Rails 3, and an ODM called Mongoid. Mongoid is still in beta release. Documentation was good, but again, at times I found the ODM to be very limiting compared to what Active Record offered with traditional (SQL) database systems.
Even to this date, I feel more comfortable working with PostgreSQL than MongoDB. Only because I can join tables and do anything with the data.
I've made two types of backups. One with PostgreSQL and the other with MongoDB. Some say, some apps are more suitable with one or the other type of db. Should I continue with MongoDB and eventually hope for its RoR ODM (Mongoid) to fully mature, or should I consider using PostgreSQL?
A few more questions:
1) Which one would be more suitable for developing a social networking site similar to Facebook?
2) Which one would be more suitable for a 4-page standard layout type of website (Home, Products, About, Contact)?
You dumped a decades-tested, fully featured RDBMS for a young, beta-quality, feature-thin document store with little community support. Unless you're already running tens of thousands of dollars a month in servers and think MongoDB was a better fit for the nature of your data, you probably wasted a lot of time for negative benefit. MongoDB is fun to toy with, and I've built a few apps using it myself for that reason, but it's almost never a better choice than Postgres/MySQL/SQL Server/etc. for production applications.
Let's quote what you wrote and see what it tells us:
"I like a few things with Mongodb. However, I found certain queries I was
doing in PostgreSql, I couldn't do efficiently in Mongodb. Especially,
when I had to join several tables to calculate some logic."
"I found the ODM to be very limiting compared to what Active Record offered
with traditional (SQL) database systems."
"I feel more comfortable working with PostgreSql than Mongodb. Only because
I can join tables and do anything with the data."
Based on what you've said it looks to me like you should stick with PostgreSQL. Keep an eye on MongoDB and use it if and when it's appropriate. But given what you've said it sounds like PG is a better fit for you at present.
Share and enjoy.
I haven't used MongoDB yet, and may never get round to it as I haven't found anything I can't do with Postgres, but just to quote the PostgreSQL 9.2 release notes:
With PostgreSQL 9.2, query results can be returned as JSON data types.
Combined with the new PL/V8 Javascript and PL/Coffee database
programming extensions, and the optional HStore key-value store, users
can now utilize PostgreSQL like a "NoSQL" document database, while
retaining PostgreSQL's reliability, flexibility and performance.
So looks like in new versions of Postgres you can have the best of both worlds. I haven't used this yet either but as a bit of a fan of PostgreSQL (excellent docs / mailing lists) I wouldn't hesitate using it for almost anything RDBMS related.
First of all, Postgres is an RDBMS and MongoDB is a NoSQL database.
Stand-alone NoSQL technologies do not meet ACID standards, because they sacrifice critical data protections in favor of high-throughput performance for unstructured applications.
Postgres 9.4 provides NoSQL capabilities along with full transaction support, storing JSON documents with constraints on the field data.
So you get the advantages of both RDBMS and NoSQL.
For a detailed article, see http://www.aptuz.com/blog/is-postgres-nosql-database-better-than-mongodb/
To experience Postgres' NoSQL performance for yourself, download pg_nosql_benchmark from GitHub: https://github.com/EnterpriseDB/pg_nosql_benchmark
We also researched which is better, Postgres or MongoDB. With all the facts and figures in hand, we found that Postgres is far better to use than MongoDB. Besides eating up memory and CPU, MongoDB also occupies a large amount of disk space: its disk usage doubles at certain intervals.
My experience with Postgres and Mongo, after working with both databases in my projects:
Postgres (RDBMS)
Postgres is recommended if your future applications have a complicated schema that needs lots of joins, if all the data is related, or if the workload is write-heavy. Postgres is open source, fast, ACID compliant, and uses less disk space. It also performs well all round for JSON storage, and it offers full serializability of transactions with 3 levels of transaction isolation.
The biggest advantage of staying with Postgres is that we get the best of both worlds. We can store data in JSONB with constraints, consistency and speed, and we can use all SQL features for other types of data. The underlying engine is very stable and copes well with a good range of data volumes. It also runs on your choice of hardware and operating system. Postgres provides NoSQL capabilities along with full transaction support, storing JSON documents with constraints on the field data.
General Constraints for Postgres
Scaling Postgres horizontally is significantly harder, but doable.
Fast read operations cannot be fully achieved with Postgres.
NoSQL Databases
MongoDB (WiredTiger)
MongoDB may beat Postgres in the dimension of “horizontal scale”. Storing JSON is what Mongo is optimized to do. Mongo stores its data in a binary format called BSON, which is (roughly) just a binary representation of a superset of JSON. MongoDB stores objects exactly as they were designed. According to MongoDB, for write-intensive applications the new engine (WiredTiger) gives users up to a 10x increase in write performance (I should try this) and an 80 percent reduction in storage utilization, helping to lower storage costs and achieve greater utilization of hardware.
General Constraints of MongoDB
Using a schemaless storage engine leads to the problem of implicit schemas. These schemas aren’t defined by the storage engine but are instead defined by application behavior and expectations.
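A small sketch of what an implicit schema tends to look like in practice: since the store accepts any document, the application ends up carrying the checks itself. The field names and the `validate` helper below are hypothetical, written in plain Python rather than against any MongoDB API.

```python
# The "schema" lives in application code, not in the database.
EXPECTED_FIELDS = {"name": str, "price": float}


def validate(doc):
    """Return a list of problems; an empty list means the document conforms."""
    errors = []
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in doc:
            errors.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected_type):
            errors.append(f"{field} should be {expected_type.__name__}")
    return errors


good = {"name": "widget", "price": 9.99}
bad = {"name": "gadget", "price": "9.99"}   # price stored as a string
incomplete = {"price": 1.5}

print(validate(good))        # []
print(validate(bad))         # ['price should be float']
print(validate(incomplete))  # ['missing field: name']
```

Every piece of code that reads or writes these documents has to agree on `EXPECTED_FIELDS`, whereas in an RDBMS the table definition enforces it in one place.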
Stand-alone NoSQL technologies do not meet ACID standards, because they sacrifice critical data protections in favor of high-throughput performance for unstructured applications. It’s not hard to apply ACID to NoSQL databases, but it would make the database slower and less flexible to some extent.
“Most of the NoSQL limitations were optimized in the newer versions and releases which have overcome its previous limitations up to a great extent”.
Which one would be more suitable for developing a social networking site similar to Facebook?
Facebook currently uses a combination of databases, such as Hive and Cassandra.
Which one would be more suitable for a 4-page standard layout type of website (Home, Products, About, Contact)?
Again, it depends on how you want to store and process your data, but any SQL or NoSQL database would do the job.