Join SQL with NoSQL databases - sql

I just wanted to know: does it make any sense to join a SQL database with a NoSQL database?

Yes, it makes sense. One of the big advantages of NoSQL data storage is that data is not tied to a specific schema.
One fundamental difference between SQL and NoSQL databases is support for transactions.
Imagine you were writing a banking app that keeps account balances. You will not be able to maintain accurate balances unless you use transactions. Transactions are standard in SQL databases that support ACID semantics.
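However, many NoSQL stores offer only limited transaction support, or none at all across multiple records (some, such as MongoDB since version 4.0, have added multi-document transactions). A NoSQL store without that support is not suited to a project that needs transactions.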
That said, if the same banking app needs tremendous scale, it can be built so that all non-transactional data, or data that can tolerate "eventual consistency", uses NoSQL, while data that needs transaction support is stored in a SQL database.
The advantage of this design is the automatic sharding (splitting of data across nodes) that NoSQL databases provide, which allows them to scale easily. In effect, the maintenance needs of the database can be significantly reduced by choosing a hybrid model such as this.
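A minimal sketch of the hybrid split described above, under stated assumptions: an in-memory SQLite database holds the balances, and a plain Python dict stands in for a NoSQL key-value store. The names `transfer`, `balance`, and `activity_feed` are hypothetical and only illustrate the idea.

```python
import sqlite3

# Transactional data lives in a SQL database (in-memory SQLite here).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

# Hypothetical stand-in for a NoSQL key-value store: data that can
# tolerate eventual consistency, such as a per-user activity feed.
activity_feed = {}


def transfer(src, dst, amount):
    """Move money atomically; return True on commit, False on rollback."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            # The CHECK constraint rejects an overdraft; the rollback
            # then undoes the credit above as well.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
    except sqlite3.IntegrityError:
        return False
    # Non-transactional side data goes to the key-value store.
    activity_feed.setdefault(src, []).append(f"sent {amount} to {dst}")
    return True


def balance(account_id):
    row = conn.execute(
        "SELECT balance FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()
    return row[0]
```

A failed transfer leaves both balances untouched, which is the property the banking example depends on; the activity feed is only written after the commit and could lag without harming correctness.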

Related

Sharding a RDBMS (SQL) database

I am reading about sharding and I understand it up to a point. Most of the material I have read says that sharding (horizontally scaling) an RDBMS is a challenging task. But I don't see why NoSQL would be easy to shard and an RDBMS tough to shard.
My understanding is: some NoSQL databases provide built-in sharding support, which makes them easy to shard. But if a NoSQL database does not provide built-in sharding support, then the sharding overhead is the same for SQL and NoSQL, because it has to be implemented in the application layer.
Is my understanding correct or did I miss anything?
I don't think sharding is particularly "harder" in a SQL versus a NO-SQL database from the user perspective. After all, the complicated stuff is all done "under the hood", so the interface for users is pretty similar.
Sharding means that rows of a given table are stored separately -- often in local storage on different nodes. The issue is keeping them up-to-date.
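As a rough illustration of rows being stored separately, here is a sketch of hash-based routing, which is one common sharding scheme and an assumption here (the answer does not say which scheme a given database uses). The shard count and the `put`/`get` helpers are hypothetical, and each shard is a dict standing in for a node's local storage.

```python
import hashlib

NUM_SHARDS = 4  # assumed number of nodes

# Each dict stands in for the local storage of one node.
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(key, num_shards=NUM_SHARDS):
    """Map a row key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def put(key, row):
    # Only the shard that owns the key is touched.
    shards[shard_for(key)][key] = row


def get(key):
    return shards[shard_for(key)].get(key)
```

A single-key read or write touches one shard only. An operation that spans several keys may touch several shards, and that is where keeping them consistent becomes the hard part.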
One key difference is that SQL enforces ACID properties on the data, in particular "consistency". This means that queries see the database only after transactions have been completed entirely or not at all.
NO-SQL databases typically implement eventual consistency. That is, a given transaction may take some time (typically measured in seconds up to a minute) before the transaction completes across all shards.
Consider the situation where a query is deleting one row in each shard. A SQL database will either see all rows deleted or none. A NO-SQL database might return intermediate results.
The advantage of NO-SQL is that large databases are often append-only and transactions only affect one shard -- so eventual consistency is good enough.
The advantage of SQL databases is that consistency is guaranteed (well, in some databases you can fiddle with settings to weaken it). However, there is a higher cost of waiting for all shards to agree that a transaction has completed.
I will note that in some situations SQL databases have a tremendous application advantage -- because the applications do not need to deal with potentially inconsistent data.

These days SQL databases can store JSON. Then why do we need NoSQL?

One of the advantages of NoSQL databases is handling unstructured data. Since SQL databases can now do that too, is there any need left for NoSQL? The only advantage I can think of is that NoSQL is still better at scalability.
You might choose a NoSQL database for the following reasons:
To store large volumes of data that might have little to no structure.
NoSQL databases do not limit the types of data that you can store together. NoSQL databases also enable you to add new data types as your needs change. With document-oriented databases, you can store data in one place without having to define the data type in advance.
To make the most of cloud computing and storage.
In order for a cloud solution to be scalable, the data must be easy to share across multiple servers.
To speed development.
When you are developing in rapid iterations or making frequent updates to the data structure, a relational database slows you down. However, because NoSQL data doesn’t need to be prepped ahead of time, you can make frequent updates to the data structure with minimal downtime.
To boost horizontal scalability.
The CAP (consistency, availability, and partition tolerance) theorem states that in any distributed system, only two of the three CAP properties can be guaranteed simultaneously. Adjusting these properties in favor of strong partition tolerance enables NoSQL users to boost horizontal scalability.
The following link gives more detail on the reasons for using NoSQL databases:
https://support.rackspace.com/how-to/reasons-to-use-a-nosql-db/

Why are Relational databases said to be not good at scalability and what gives NOSQL databases the edge here?

Many articles claim that relational databases cannot be scaled and NOSQL is better at it but do not explain why. Scalability is often projected as an advantage of NOSQL. What is the problem with scaling relational databases? What makes NOSQL databases superior to relational databases in the aspect of scalability?
Both SQL and NOSQL databases can scale. However, NOSQL databases have some simplified functionality that can improve scalability.
For instance, SQL databases generally enforce a set of properties called ACID properties. These ensure the consistency of the data over time and the ability to implement an entire transaction "all at once".
However, when running in a multi-processor environment, there is overhead to strictly maintaining the ACID properties. Basically, the data needs to look the same from any processor at the same time.
NOSQL databases often implement "ACID-lite". For instance, they offer "eventual-consistency". This means that for a few seconds or minutes, a query might return different values depending on which processor(s) process it. And, this is fine for many applications.
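The following toy model shows what "a query might return different values depending on which processor(s) process it" can look like. It is a sketch under assumptions: the `Replica` and `EventuallyConsistentStore` classes are hypothetical, and an explicit `sync()` call stands in for the background replication a real system would run.

```python
class Replica:
    """One copy of the data, plus writes it has received but not applied."""

    def __init__(self):
        self.data = {}
        self.pending = []

    def apply_pending(self):
        for key, value in self.pending:
            self.data[key] = value
        self.pending.clear()


class EventuallyConsistentStore:
    def __init__(self, num_replicas=3):
        self.replicas = [Replica() for _ in range(num_replicas)]

    def write(self, key, value):
        # The first replica applies the write at once; the others only
        # queue it, so the write returns without waiting for them.
        self.replicas[0].data[key] = value
        for replica in self.replicas[1:]:
            replica.pending.append((key, value))

    def read(self, key, replica_index):
        return self.replicas[replica_index].data.get(key)

    def sync(self):
        # Stand-in for background replication catching up.
        for replica in self.replicas:
            replica.apply_pending()
```

Between `write` and `sync`, a read served by replica 0 sees the new value while a read served by replica 1 still sees the old one. A strictly consistent system would have to wait for every replica before acknowledging the write, which is the overhead described above.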
This really depends on the long-term requirements of the enterprise and the volume of data expected. The other key factor is the workload: if you only need an OLTP kind of scenario with little reporting, that means implementing ACID guarantees. NoSQL is usually the better fit, compared to SQL, where reporting is vital. Both have their own merits, but ideally a hybrid model that takes advantage of both usually works better: scalability and better transaction control on SQL databases, and high-performance reporting on a NoSQL database, which allows every level of freedom in the data model, such as a graph DB or key-value pairs. There are a lot of interesting comparisons available, even for the specific databases you want to evaluate.
Puneet

Differences between OLAP and OLTP databases

What are the key differences between OLAP and OLTP databases?
Specifically in terms of implementation (rather than use cases).
OLAP is of course primarily used for reporting while OLTP is used for handling transactions.
I understand that OLAP databases are optimized for read over write, and that OLAP databases contain more denormalised data.
What other characteristics set the two apart?
OLTP:
As the name "Online Transaction Processing" suggests, this is used for transactional needs such as INSERT/SELECT/UPDATE/DELETE.
Low response time.
These are the original source of data.
Data is usually stored in 3NF.
ACID properties are necessarily followed.
OLAP:
As the name "Online Analytical Processing" suggests, this is used for complex analytical queries and for drawing inferences.
Periodic batch processing jobs are run here.
Typically denormalized with fewer tables; uses star and/or snowflake schemas.
Does not necessarily follow ACID properties.
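To make the contrast concrete, here is a small sketch using in-memory SQLite for both sides. It is only an illustration under assumptions: the table names, the `place_order`, `run_batch_load`, and `revenue_by_region` helpers, and the single flat reporting table (simpler than a real star schema) are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- OLTP side: normalized tables, the original source of data
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount INTEGER
    );
    -- OLAP side: denormalized reporting table filled by a batch job
    CREATE TABLE sales_facts (
        order_id INTEGER, customer_name TEXT, region TEXT, amount INTEGER
    );
    """
)


def place_order(customer_id, amount):
    """OLTP: a short transaction that touches a single row."""
    with conn:
        cur = conn.execute(
            "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
            (customer_id, amount),
        )
    return cur.lastrowid


def run_batch_load():
    """Periodic batch job: rebuild the reporting table from the OLTP tables."""
    with conn:
        conn.execute("DELETE FROM sales_facts")
        conn.execute(
            "INSERT INTO sales_facts "
            "SELECT o.id, c.name, c.region, o.amount "
            "FROM orders o JOIN customers c ON c.id = o.customer_id"
        )


def revenue_by_region():
    """OLAP: an aggregate over many rows, with no join at query time."""
    return conn.execute(
        "SELECT region, SUM(amount) FROM sales_facts "
        "GROUP BY region ORDER BY region"
    ).fetchall()
```

The reporting table is only as fresh as the last batch load, which is one reason the analytical side does not need the same transactional guarantees as the order-entry side.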
There are many differences, and you can find plenty of answers by searching for this question. But some characteristics that I have seen in practical implementations are:
An OLTP system is specific to a business domain and designed to perform specific tasks. For example, an eCommerce website may have one OLTP database for handling online orders, another for back-end order processing, another for logistics, and so on. OLAP systems, by contrast, are designed to look at information at the level of the whole business by sourcing data from many heterogeneous systems.
To simplify the example above: an OLTP system is a small unit of business processing, while an OLAP system is a large unit of business information.
You can refer to this link for more clarification.

How can NoSQL databases achieve much better write throughput than some relational databases?

How is this possible? What is it about NoSQL that gives it a higher write throughput than some RDBMS? Does it boil down to scalability?
Some NoSQL systems are basically just persistent key/value stores (like Project Voldemort). If your queries are of the type "look up the value for a given key", such a system will (or at least should) be faster than an RDBMS, because it only needs a much smaller feature set.
Another popular type of noSQL system is the document database (like CouchDB). These databases have no predefined data structure. Their speed advantage relies heavily on denormalization and creating a data layout that is tailored to the queries that you will run on it. For example, for a blog, you could save a blog post in a document together with its comments. This reduces the need for joins and lookups, making your queries faster, but it also could reduce your flexibility regarding queries.
There are many NoSQL solutions around, each one with its own strengths and weaknesses, so the following must be taken with a grain of salt.
But essentially, what many NoSQL databases do is rely on denormalization and try to optimize for the denormalized case. For instance, say you are reading a blog post together with its comments in a document-oriented database. Often, the comments will be saved together with the post itself. This means that it will be faster to retrieve all of them together, as they are stored in the same place and you do not have to perform a join.
Of course, you can do the same in SQL, and denormalizing is a common practice when one needs performance. It is just that many NoSQL solutions are engineered from the start to be always used this way. You then get the usual tradeoffs: for instance, adding a comment in the above example will be slower because you have to save the whole document with it. And once you have denormalized, you have to take care of preserving data integrity in your application.
Moreover, in many NoSQL solutions, it is impossible to do arbitrary joins, hence arbitrary queries. Some databases, like CouchDB, require you to think ahead of the queries you will need and prepare them inside the DB.
All in all, it boils down to expecting a denormalized schema and optimizing reads for that situation. This works well for data that is not highly relational and that requires many more reads than writes.
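The blog post example can be sketched in a few lines. This is an illustration only: plain Python structures stand in for both storage styles, JSON strings stand in for stored documents, and every name below is hypothetical.

```python
import json

# Normalized layout: posts and comments are stored separately and
# combined at read time (the equivalent of a join).
posts = {1: {"title": "Hello"}}
comments = [
    {"post_id": 1, "text": "First!"},
    {"post_id": 1, "text": "Nice post"},
]


def load_post_normalized(post_id):
    post = dict(posts[post_id])
    post["comments"] = [c["text"] for c in comments if c["post_id"] == post_id]
    return post


# Denormalized layout: one document per post, comments embedded.
documents = {}


def save_document(post_id, doc):
    documents[post_id] = json.dumps(doc)


def load_post_document(post_id):
    # A single lookup returns the post together with its comments.
    return json.loads(documents[post_id])


def add_comment(post_id, text):
    # The tradeoff mentioned above: the whole document is read,
    # modified, and written back to add one comment.
    doc = load_post_document(post_id)
    doc["comments"].append(text)
    save_document(post_id, doc)
```

Reading the document needs one lookup and no join, while a question such as "all comments across all posts" is simple in the normalized layout and requires scanning every document in the denormalized one.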
This link explains a lot more. In short:
RDBMS -> data integrity is a key feature (which can slow down some operations, like writing)
NoSQL -> speed and horizontal scalability are imperative (so speed is very high as a result)
AAAND... the thing about NoSQL is that NoSQL cannot be compared to SQL as a single thing. NoSQL is the name for all persistence technologies that are not SQL. Document DBs, key-value DBs, and event DBs are all NoSQL. They differ in almost every aspect, be it the structure of the saved data, querying, performance, or available tools.
Hope this helps.
In summary, NoSQL databases are built to easily scale across a large number of servers (by sharding/horizontal partitioning of data items), and to be fault tolerant (through replication, write-ahead logging, and data repair mechanisms). Furthermore, NoSQL supports achieving high write throughput (by employing memory caches and append-only storage semantics), low read latencies (through caching and smart storage data models), and flexibility (with schema-less design and denormalization).
From:
Open Journal of Databases (OJDB)
Volume 1, Issue 2, 2014
www.ronpub.com/journals/ojdb
ISSN 2199-3459
https://estudogeral.sib.uc.pt/bitstream/10316/27748/1/Which%20NoSQL%20Database.pdf - page 19
A higher write throughput can also be credited to the internal data structures that power the database storage engine.
Even though B-tree implementations used by some RDBMS have stood the test of time, LSM-trees used in some key-value datastores are typically faster for writes:
1: When a write comes in, you add it to an in-memory balanced tree, called a memtable.
2: When the memtable grows big, it is flushed to disk as a sorted file, which is a sequential write rather than a random in-place update.
To understand this data structure better, please check this video and this answer.
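The two steps can be sketched as a toy class. This is a simplification under stated assumptions, not how any particular engine is implemented: a dict that is sorted at flush time replaces the balanced tree, in-memory lists stand in for files on disk, and compaction and write-ahead logging are left out. `TinyLSM` and `memtable_limit` are hypothetical names.

```python
import bisect


class TinyLSM:
    def __init__(self, memtable_limit=3):
        self.memtable_limit = memtable_limit
        self.memtable = {}  # step 1: recent writes are held in memory
        self.sstables = []  # flushed sorted runs, oldest first

    def put(self, key, value):
        # A write only touches memory, which keeps it fast.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Step 2: write the memtable out as one sorted, immutable run.
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # Reads check the memtable first, then runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            i = bisect.bisect_left(table, (key,))
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None
```

Writes never modify an existing run, so a lookup may have to check several runs before it finds the key. That is the read-side cost that LSM-based stores trade for faster writes.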