NoSQL system to save relational data - sql

If my data is relational (publishers-authors-books, associations-teams-players), can we use a NoSQL system like HBase or MongoDB to store the data?
(I know it may sound like a stupid question but I'm just learning :))

Yes, you can store any type of data in NoSQL datastores. The kind of information you describe should fit a NoSQL store well.
However, be aware that in a typical NoSQL solution you would be trading away some or many features that are taken for granted in SQL databases, such as transactions, strong consistency, rich queries, and ad-hoc queries, mainly in favour of simpler models that can scale horizontally very easily.
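To make the trade-off concrete, here is a minimal sketch of the publishers-authors-books data in both shapes. The table names, column names, and sample rows are invented for illustration, and a plain Python dict stands in for a document as a store like MongoDB might hold it; no real NoSQL client is used.

```python
import sqlite3

# Relational form: one table per entity, linked by foreign keys.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE publishers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE authors    (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (
        id           INTEGER PRIMARY KEY,
        title        TEXT,
        publisher_id INTEGER REFERENCES publishers(id),
        author_id    INTEGER REFERENCES authors(id)
    );
    INSERT INTO publishers VALUES (1, 'Example Press');
    INSERT INTO authors    VALUES (1, 'A. Writer');
    INSERT INTO books      VALUES (1, 'First Book', 1, 1), (2, 'Second Book', 1, 1);
""")

# An ad-hoc join: which titles did this publisher release, and by whom?
rows = conn.execute("""
    SELECT b.title, a.name
    FROM books b
    JOIN authors a    ON a.id = b.author_id
    JOIN publishers p ON p.id = b.publisher_id
    WHERE p.name = ?
    ORDER BY b.id
""", ("Example Press",)).fetchall()

# Document form (hypothetical layout): the publisher embeds its books, and each
# book carries a copy of the author data, so no join is needed for this read.
publisher_doc = {
    "_id": 1,
    "name": "Example Press",
    "books": [
        {"title": "First Book", "author": {"id": 1, "name": "A. Writer"}},
        {"title": "Second Book", "author": {"id": 1, "name": "A. Writer"}},
    ],
}
doc_rows = [(book["title"], book["author"]["name"]) for book in publisher_doc["books"]]

print(rows == doc_rows)
```

The document form answers this one question without a join, but the author's name is now duplicated in every book, and a different question (say, all books by one author across publishers) would need either a second document layout or a scan.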

One of the Digg engineers working on Cassandra (another NoSQL solution) wrote a very good post about data models and NoSQL (specifically Cassandra).
This may help you start thinking in column oriented data structures.

You can store relational data with playOrm and still do joins and such AND scale that data as well. Lots of people say you can't store relational data in NoSQL, but this is simply not true: we do it today AND we scale via partitioning and Scalable SQL (S-SQL), which is a slight twist on SQL that lets us scale.

Related

These days SQL databases can store JSON. Then why do we need NoSQL?

One of the advantages of NoSQL databases is the ability to handle unstructured data. Since that issue is now resolved in SQL databases, is there any need left for NoSQL? The only advantage that I can think of is that NoSQL is still better at scalability.
You might choose a NoSQL database for the following reasons:

To store large volumes of data that might have little to no structure.
NoSQL databases do not limit the types of data that you can store together. NoSQL databases also enable you to add new data types as your needs change. With document-oriented databases, you can store data in one place without having to define the data type in advance.

To make the most of cloud computing and storage.
In order for a cloud solution to be scalable, the data must be easy to share across multiple servers.

To speed development.
When you are developing in rapid iterations or making frequent updates to the data structure, a relational database slows you down. However, because NoSQL data doesn’t need to be prepped ahead of time, you can make frequent updates to the data structure with minimal downtime.

To boost horizontal scalability.
The CAP (consistency, availability, and partition tolerance) theorem states that in any distributed system, only two of the three CAP properties can be guaranteed simultaneously. Adjusting these properties in favor of strong partition tolerance enables NoSQL users to boost horizontal scalability.

The following link gives more detail on when a NoSQL database is needed.
https://support.rackspace.com/how-to/reasons-to-use-a-nosql-db/

Why are Relational databases said to be not good at scalability and what gives NOSQL databases the edge here?

Many articles claim that relational databases cannot be scaled and NOSQL is better at it but do not explain why. Scalability is often projected as an advantage of NOSQL. What is the problem with scaling relational databases? What makes NOSQL databases superior to relational databases in the aspect of scalability?
Both SQL and NOSQL databases can scale. However, NOSQL databases have some simplified functionality that can improve scalability.
For instance, SQL databases generally enforce a set of properties called ACID properties. These ensure the consistency of the data over time and the ability to implement an entire transaction "all at once".
However, when running in a multi-processor environment, there is overhead to strictly maintaining the ACID properties. Basically, the data needs to look the same from any processor at the same time.
NOSQL databases often implement "ACID-lite". For instance, they offer "eventual-consistency". This means that for a few seconds or minutes, a query might return different values depending on which processor(s) process it. And, this is fine for many applications.
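As a rough illustration of what "eventual consistency" means for a reader, here is a toy in-memory sketch. The `Replica` and `EventuallyConsistentStore` classes are invented for this example and do not model any particular database; real systems replicate in the background rather than through an explicit `sync()` call.

```python
class Replica:
    """One copy of the data, as a single node would hold it."""

    def __init__(self):
        self.data = {}


class EventuallyConsistentStore:
    """Toy store: a write lands on one replica and reaches the rest later."""

    def __init__(self, replica_count=2):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = []  # writes not yet copied to the other replicas

    def write(self, key, value):
        # The write is acknowledged as soon as the first replica has it.
        self.replicas[0].data[key] = value
        self.pending.append((key, value))

    def read(self, key, replica_index):
        # A read is served by whichever replica the request happens to hit.
        return self.replicas[replica_index].data.get(key)

    def sync(self):
        # Stand-in for background replication catching up.
        for key, value in self.pending:
            for replica in self.replicas[1:]:
                replica.data[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("balance", 100)
stale = store.read("balance", 1)      # replica 1 has not seen the write yet
fresh = store.read("balance", 0)      # replica 0 has
store.sync()
converged = store.read("balance", 1)  # after replication, both agree
print(stale, fresh, converged)
```

Skipping the coordination that would keep `stale` from ever being observed is what makes the write cheap; whether that window is acceptable depends on the application.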
This truly depends on the long-term requirements of the enterprise and the volume of data expected. Another key factor is whether you mainly need an OLTP workload with little reporting, which means implementing ACID guarantees. Compared to SQL, NoSQL is usually best for scenarios where reporting is vital. Both have their own merits, but a hybrid model that takes advantage of both usually works better: scalability and better transaction control on the SQL databases, and high-performance reporting on a NoSQL database, which allows every level of freedom, such as graph databases and key-value stores. Many interesting comparisons are available, even for the specific database you want to evaluate.
Puneet

why is sql vertically scalable and nosql horizontally

I am new to NoSQL and trying to understand its meaning.
I have seen many articles in many different websites that repeat the fact that "SQL DataBases are scaled vertically (by adding CPU/memory) whereas NoSQL DataBases are scaled horizontally (by adding more machines that can perform distributed calculations)".
For example these articles:
http://dataconomy.com/sql-vs-nosql-need-know/
http://www.thegeekstuff.com/2014/01/sql-vs-nosql-db/
The thing is that I can't understand why.
As far as I am aware, the major difference between SQL and NoSQL (besides the scalability issue) is that SQL data is stored in tables, whereas NoSQL data is stored in different ways (key-value, graph, XML, etc.).
I can't seem to understand the connection between those two facts (scalability and storing strategy). These seem like unrelated things to me (probably due to lack of understanding).
The articles are generally reasonable. Both NoSQL technologies and SQL technologies (for lack of a better term) have important roles to play nowadays --- as both articles point out. The discussion is somewhat reminiscent of hierarchical databases versus relational databases, once upon a time.
I disagree with the scalability differences. The discussions leave out technologies such as Hive, PrestoDB, and BigQuery, which are based on highly scalable technologies in the spirit of traditional RDBMSs.
The major differences between RDBMS and NoSQL (in my opinion) are ACID-compliance and data structure. The first is a "burden" that relational databases carry, for both better and worse -- definitely handy for financial transactions, but at the cost of overhead for other purposes. The second is an area where traditional databases are moving towards better handling of unstructured data, with direct support for nested tables, JSON, and XML formats. However, structure is important, as legions of data scientists probably learn the hard way as they interact with data.
Large scalable key-value databases have been designed with "horizontal" scalability in mind. That combined with the lack of pure ACID properties facilitates re-balancing the data for new hardware -- assuming you have designed the database correctly (and that can be a large assumption).
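A minimal sketch of why key-value data spreads easily across machines: the key alone decides where a record lives, so no node has to consult another. The `node_for` and `partition` helpers and the key names are invented for this example. It uses naive modulo hashing; real systems typically use consistent hashing or similar schemes so that fewer keys move when a node is added.

```python
import hashlib


def node_for(key, node_count):
    """Pick a node from the key alone, using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def partition(keys, node_count):
    """Distribute keys over node_count buckets, one bucket per node."""
    nodes = [[] for _ in range(node_count)]
    for key in keys:
        nodes[node_for(key, node_count)].append(key)
    return nodes


keys = [f"player:{i}" for i in range(1000)]
three_nodes = partition(keys, 3)
four_nodes = partition(keys, 4)

# Keys whose home changes when a fourth node joins must be copied over.
moved = sum(1 for key in keys if node_for(key, 3) != node_for(key, 4))

print([len(bucket) for bucket in three_nodes])
print([len(bucket) for bucket in four_nodes])
print("keys to move:", moved)
```

Because no record refers to another, moving a key is a plain copy. With joins and multi-row transactions in play, the same move has to keep related rows reachable and consistent, which is where the extra cost comes from.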
Databases such as Oracle, DB2, and Teradata have supported parallel processing literally for decades (although more biased toward a single server, albeit with shared-nothing architectures). Their technology pre-dates the more modern Apache-based systems (for lack of a better term), but it doesn't mean that they cannot scale across multiple processors.
New databases such as Hive, Redshift, BigQuery, and PrestoDB provide SQL-based interfaces in the more modern "horizontally" scalable sense (at least for queries). A lot of work is going on in the Postgres world to support parallel processing there -- and examples such as Greenplum, Netezza, Vertica, and so on belie the idea that relational databases do not scale across multiple independent processors.

access sql database as nosql (couchbase)

I would like to access a SQL database the way a NoSQL store is accessed, as key-value pairs or documents.
This is to prepare for a future upgrade: if the number of users increases a lot,
I could migrate from SQL to NoSQL immediately without changing any application code.
Of course I could write the API myself, but I wonder whether anyone has already done this and published a solution.
Your comments are welcome.
While I agree with everything scalabilitysolved has said, there is an interesting feature in the offing for Postgres, scheduled for the 9.4 Postgres release, namely, jsonb: http://www.postgresql.org/docs/devel/static/datatype-json.html with some interesting indexing and query possibilities. I mention this as you tagged MongoDB and Couchbase, both of which use JSON (well, technically BSON in MongoDB's case).
Of course, querying, sharding, replication, ACID guarantees, etc. will still be totally different between Postgres (or any other traditional RDBMS) and any document-based NoSQL solution. Migrations between any two RDBMSs tend to be quite painful, let alone between an RDBMS and a NoSQL data store. However, jsonb looks quite promising as a potential half-way house between two of the major paradigms of data storage.
On the other hand, every release of MongoDB brings enhancements to the aggregation pipeline, which is certainly something that seems to appeal to people used to the flexibility that SQL offers and less "alien" than distributed map/reduce jobs. So, it seems reasonable to conclude that there will continue to be cross pollination.
See Explanation of JSONB introduced by PostgreSQL for some further insights into jsonb.
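For a feel of the "half-way house" idea, here is a minimal sketch of document-style access on top of a relational table. It is not Postgres jsonb: SQLite stands in for the database, the document is stored as plain JSON text, and the `SqlDocumentStore` class and `documents` table are invented for the example. Unlike jsonb, nothing here can index or query inside the document.

```python
import json
import sqlite3


class SqlDocumentStore:
    """Hypothetical key -> JSON document interface backed by one SQL table."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS documents ("
            "key TEXT PRIMARY KEY, body TEXT NOT NULL)"
        )

    def put(self, key, document):
        # `with conn` commits on success and rolls back on error.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO documents (key, body) VALUES (?, ?)",
                (key, json.dumps(document)),
            )

    def get(self, key):
        row = self.conn.execute(
            "SELECT body FROM documents WHERE key = ?", (key,)
        ).fetchone()
        return json.loads(row[0]) if row else None


store = SqlDocumentStore(sqlite3.connect(":memory:"))
store.put("user:1", {"name": "Ann", "tags": ["admin"]})
store.put("user:1", {"name": "Ann", "tags": ["admin", "editor"]})  # overwrite
print(store.get("user:1"))
print(store.get("user:2"))
```

An interface like this hides the storage engine from the application code, but it also reduces the database to get and put by key, so the querying differences mentioned above still surface as soon as anything richer is needed.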
No no no, do not consider this, it's a really bad idea. Pick either an RDBMS or a NoSQL solution based upon how your data is modelled and your usage patterns. Converting from one to the other is going to be painful, especially if your 'user amount increases a lot'.
Let's face it, either approach would deal with a large increase in usage, and both would benefit more from specific optimizations to their database than from simply swapping because one 'scales more'.
If your data model fits an RDBMS and it needs to perform better, then analyze your queries, check that your indexes are optimized, and look into caching and better data access patterns.
If your data model fits a NoSQL database, then as your dataset grows you can add additional nodes (Couchbase), cache expensive map/reduce jobs, and again optimize your data access patterns.
In summary, pick either SQL or NoSQL depending on your data needs. Don't just assume that NoSQL is a magic bullet, as with easier scaling comes a much less flexible querying model.

Which choice of technology for this?

I face the following problem.
The target is to develop a DB to store the following schema:
You have PRODUCTS that can be composed of both PRIMARY_PRODUCTS and also other PRODUCTS.
My first question is which of SQL or NoSQL technology would be recommended for this.
I don't know NoSQL well, and I am not sure it is worth spending time investigating if the whole concept is not suited to the problem.
If NoSQL is worth looking at, which product is recommended? I was looking at Cassandra, but there are so many types that the universe is quite big.
If NoSQL is not suited for this, we need to fall back to SQL.
Do you think that hierarchyId is suited?
Both SQL and NoSQL can store and retrieve data of this kind, and both technologies can be made to do this job.
The major differences are elsewhere: in a nutshell, transactions and guaranteed consistency for SQL versus high performance for readers for NoSQL.
In your precise situation SQL, with its support for transactions, will ensure that viewers see a composite product only once all of its sub-products have been successfully stored.
In most real-life situations, however, the chance of a viewer seeing a partially-committed product on a NoSQL system is so slim as to be irrelevant: future reads of the product will be correct.
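A minimal sketch of the transactional guarantee described above, using SQLite as a stand-in SQL database. The schema is an assumption made for the example: a single `products` table plus a `components` link table, rather than separate PRODUCTS and PRIMARY_PRODUCTS tables, and the `store_composite` helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE components (
        parent_id INTEGER NOT NULL REFERENCES products(id),
        child_id  INTEGER NOT NULL REFERENCES products(id)
    );
""")


def store_composite(conn, parent, children):
    """Store a product and all of its sub-products as one transaction."""
    # `with conn` commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute("INSERT INTO products (id, name) VALUES (?, ?)", parent)
        for child in children:
            conn.execute("INSERT INTO products (id, name) VALUES (?, ?)", child)
            conn.execute(
                "INSERT INTO components (parent_id, child_id) VALUES (?, ?)",
                (parent[0], child[0]),
            )


store_composite(conn, (1, "bicycle"), [(2, "frame"), (3, "wheel")])

try:
    # The second sub-product has no name, which violates NOT NULL, so the
    # whole composite is rolled back, including the rows already inserted.
    store_composite(conn, (10, "scooter"), [(11, "deck"), (12, None)])
except sqlite3.IntegrityError:
    pass

names = [row[0] for row in conn.execute("SELECT name FROM products ORDER BY id")]
links = conn.execute("SELECT COUNT(*) FROM components").fetchone()[0]
print(names, links)
```

The failed "scooter" leaves no trace, so a reader never sees it with only some of its parts. On a store without multi-record transactions, the application would have to handle that window itself, for example by writing the parent record last.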