SQL or NoSQL: which one is more suitable for this case, and why?

In my project:
The data is not going to be modified (only queried).
There will be more than 1,000,000 records.
Query performance is critical.
If I use SQL, it will be a single table with 7 columns (no joins).
NoSQL databases are also classified in different ways, listed below with some examples:
Column: Accumulo, Cassandra, HBase
Document: Clusterpoint, Couchdb, Couchbase, MarkLogic, MongoDB
Key-value: Dynamo, FoundationDB, MemcacheDB, Redis, Riak, FairCom c-treeACE
Graph: Allegro, Neo4J, OrientDB, Virtuoso, Stardog
Source: http://en.wikipedia.org/wiki/NoSQL#cite_note-7
First of all, does the choice of database system really make an observable performance difference in this case?
If it does, can you please explain which is more suitable for my project, SQL or NoSQL, and if NoSQL, which type?
Thank you in advance

I am currently working on a project to set up a "standard" database with a huge amount of data. We start by implementing it in SQL to see how the queries perform. Once this is done, we address the performance problems.
There are multiple reasons for this, but to name a few:
Standard SQL is easy to implement and is standard across multiple products (as of today).
If you know SQL, you can build an implementation quickly, which saves time and gets the project going.
There is a lot of information available about SQL implementations.
I cannot answer about NoSQL but hopefully someone can fill me in.
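To follow the "implement it in SQL first and measure" approach for the original question, a minimal sketch could load a single 7-column, read-only table and check that a lookup actually uses an index. It uses Python's built-in `sqlite3` as a stand-in for whatever SQL database you pick; the `measurements` table, its columns, and the synthetic data are hypothetical assumptions, and real numbers depend entirely on your engine, hardware, and query shapes.

```python
import random
import sqlite3
import time

def build_table(n_rows: int) -> sqlite3.Connection:
    """Load a hypothetical 7-column, read-only table into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE measurements (
               id INTEGER PRIMARY KEY,
               sensor TEXT, category TEXT,
               a REAL, b REAL, c REAL, d REAL)"""
    )
    rng = random.Random(42)  # fixed seed so every run loads the same data
    rows = (
        (i, f"s{i % 1000}", f"cat{i % 10}",
         rng.random(), rng.random(), rng.random(), rng.random())
        for i in range(n_rows)
    )
    conn.executemany("INSERT INTO measurements VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    # The data never changes after loading, so this index adds no ongoing write cost.
    conn.execute("CREATE INDEX idx_sensor ON measurements(sensor)")
    conn.commit()
    return conn

LOOKUP = "SELECT id, a, b FROM measurements WHERE sensor = ?"

def plan(conn: sqlite3.Connection) -> list:
    """Return SQLite's query-plan details for the lookup (last column of each plan row)."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + LOOKUP, ("s7",))]

def timed_lookup(conn: sqlite3.Connection, sensor: str):
    """Run the lookup once and return (rows, elapsed seconds)."""
    start = time.perf_counter()
    rows = conn.execute(LOOKUP, (sensor,)).fetchall()
    return rows, time.perf_counter() - start
```

Calling `build_table(1_000_000)` and then `timed_lookup(conn, "s7")` gives a first baseline; if `plan(conn)` mentions `idx_sensor`, the query is an index search rather than a full scan, which is usually the first thing to get right before comparing database systems.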

The important question you need to ask is what kind of queries you will be performing. For example ClusterPoint offers real-time aggregation, so if you need result grouping and extracting summaries, it gives you great performance.
For regular key/value lookups, they should all perform pretty well, so pick the one you are most comfortable with.

Related

Store Application [Graph Database or SQL]

I have a question: I want to make an application for a store that stores and shows information about items, receipts, vendors, clients, and profits.
What is the best technology to use in this case, SQL or Neo4j, for example? And why? :)
Thank you so much, your help will be greatly appreciated :)
Neo4j is a graph database, so performance-wise it's always better than a relational database. You just need to model your requirements around Neo4j's features. I recently read a blog post pointing out the performance difference between graph databases and relational databases.
Read this Why Graph database? Why Neo4j?
That depends on your usage scenarios. Do you simply want to store inventory and customer information? Do you want to mine data from transactions, like who bought what? Do you want to implement a recommender system?
Generally, when you have lots of interrelated data and you're primarily interested in these relations, using a graph database is a good choice. In most other cases, it isn't. That doesn't mean that NoSQL is off the table, though: if the information stored for your items differs greatly in structure, using a schema-less database (e.g. a document store like CouchDB or MongoDB) might still be a good idea.
If you're simply interested in storing your data, SQL is a good choice. If you'd like to aggregate over this data (e.g. to check inventories, analyse sales, etc.), it might even be the best one.
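A minimal sketch of the kind of aggregation meant here, using Python's built-in `sqlite3` module: profit per vendor comes out of one join plus `GROUP BY`. The `items`/`sales` tables, vendor names, and prices are made-up assumptions, not a recommended store schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, vendor TEXT, cost REAL);
CREATE TABLE sales (id INTEGER PRIMARY KEY,
                    item_id INTEGER REFERENCES items(id),
                    quantity INTEGER, unit_price REAL);
""")
# Hypothetical sample data.
conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?)", [
    (1, "pen", "Acme", 0.50),
    (2, "notebook", "Acme", 1.20),
    (3, "stapler", "Bolt", 4.00),
])
conn.executemany("INSERT INTO sales (item_id, quantity, unit_price) VALUES (?, ?, ?)", [
    (1, 10, 1.00),
    (2, 5, 2.00),
    (3, 2, 7.50),
    (1, 4, 1.00),
])

# Profit per vendor: join each sale to its item, then aggregate in a single query.
profit_by_vendor = conn.execute("""
    SELECT i.vendor, SUM(s.quantity * (s.unit_price - i.cost)) AS profit
    FROM sales s JOIN items i ON i.id = s.item_id
    GROUP BY i.vendor
    ORDER BY i.vendor
""").fetchall()
```

Swapping the `SELECT` for other groupings (per item, per day, per client) needs no change to how the data is stored, which is the flexibility the answer is pointing at.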
PS: It is wrong to assume that a graph database is always faster than a relational database. That totally depends on your data model and the kind of queries you need to do.
Regards
Hendrik

SQL versus noSQL (speed)

When people compare SQL and noSQL and weigh the upsides and downsides of each, what I never hear anyone talk about is speed.
Isn't performing SQL queries generally faster than performing noSQL queries?
I mean, for me this would be a really obvious conclusion, because you should always be able to find something faster if you know the structure of your database than if you don't.
But people never seem to mention this, so I want to know if my conclusion is right or wrong.
People who use noSQL tend to use it specifically because it fits their use cases. Freed from normal RDBMS table relationships and constraints, as well as ACID guarantees, it is very easy to make it run a lot faster.
Consider Twitter, which uses NoSQL because a user does only a very limited set of things on the site, really just one: tweet. And concurrency can be considered non-existent, since (1) nobody else can modify your tweet and (2) you won't normally be tweeting from multiple devices simultaneously.
The definition of noSQL systems is a very broad one -- a database that doesn't use SQL / is not a RDBMS.
Therefore, the answer to your question is, in short: "it depends".
Some noSQL systems are basically just persistent key/value stores (like Project Voldemort). If your queries are of the type "look up the value for a given key", such a system will be (or at least should be) faster than an RDBMS, because it only needs a much smaller feature set.
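As a rough illustration of how small that feature set can be, here is a hypothetical key/value store built on Python's standard `shelve` module. It is not Project Voldemort's API; the `KeyValueStore` class and its method names are assumptions made for this sketch.

```python
import os
import shelve
import tempfile

class KeyValueStore:
    """Hypothetical minimal persistent key/value store on top of Python's shelve.

    The only operations are put/get/delete by key: no schema, no joins,
    no secondary indexes, no query language.
    """

    def __init__(self, path: str):
        self._db = shelve.open(path)  # creates the backing file(s) if missing

    def put(self, key: str, value) -> None:
        self._db[key] = value  # value is pickled and stored under key

    def get(self, key: str, default=None):
        return self._db.get(key, default)

    def delete(self, key: str) -> None:
        self._db.pop(key, None)  # deleting a missing key is a no-op

    def close(self) -> None:
        self._db.close()
```

For example, inside a `tempfile.TemporaryDirectory()`, `store = KeyValueStore(os.path.join(d, "kv"))`, `store.put("user:42", {"name": "Ann"})` and `store.get("user:42")` round-trip the value, and it survives closing and reopening the store. Anything beyond "fetch by key" (filtering, grouping, joining) has to be written by hand, which is the trade-off the answer describes.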
Another popular type of noSQL system is the document database (like CouchDB).
These databases have no predefined data structure.
Their speed advantage relies heavily on denormalization and creating a data layout that is tailored to the queries that you will run on it. For example, for a blog, you could save a blog post in a document together with its comments. This reduces the need for joins and lookups, making your queries faster, but it also could reduce your flexibility regarding queries.
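The blog example can be sketched both ways. Below, a plain Python dict of JSON strings stands in for a document store (an assumption, not CouchDB's API), next to a normalized SQLite layout that needs a join; the post, authors, and helper names are hypothetical.

```python
import json
import sqlite3

# Document layout: a dict of JSON strings stands in for a document store.
documents = {}

def save_post(doc: dict) -> None:
    documents[doc["_id"]] = json.dumps(doc)

def load_post(post_id: str) -> dict:
    return json.loads(documents[post_id])  # one lookup returns the post and its comments

save_post({"_id": "post-1", "title": "Hello",
           "comments": [{"author": "ann", "text": "Nice post"},
                        {"author": "bob", "text": "Thanks"}]})

# Normalized relational layout: comments live in their own table and need a join.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id TEXT PRIMARY KEY, title TEXT);
CREATE TABLE comments (post_id TEXT REFERENCES posts(id), author TEXT, text TEXT);
INSERT INTO posts VALUES ('post-1', 'Hello');
INSERT INTO comments VALUES ('post-1', 'ann', 'Nice post'), ('post-1', 'bob', 'Thanks');
""")
joined = conn.execute("""
    SELECT p.title, c.author, c.text
    FROM posts p JOIN comments c ON c.post_id = p.id
    WHERE p.id = ? ORDER BY c.rowid
""", ("post-1",)).fetchall()

# The flexibility cost: "every comment by a given author" means scanning all post
# documents here, while the relational layout answers it with a WHERE on comments.
def comments_by_author(author: str) -> list:
    return [c["text"]
            for raw in documents.values()
            for c in json.loads(raw)["comments"]
            if c["author"] == author]
```

Reading one post is a single key lookup in the document layout, while a query that cuts across posts is where the embedded layout loses flexibility.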
As Einstein would say, speed is relative.
If you need to store a simple master/detail application (like a shopping cart), you would need several INSERT statements in your SQL application, and you will get back a data set when you query for the purchase. If you're using NoSQL, and using it well, you would have all the data for a single order in one simple "record" (a document, in the terms of NoSQL databases like djondb).
So I really think the performance of an application can be measured by the number of things it needs to do to fulfil a single requirement. If you need several INSERTs to store an order, but only one simple insert in a database like djondb, then it will be 10x faster in the NoSQL world simply because you make 10 times fewer calls to the database layer; that's it.
To illustrate my point, here is an example I wrote some time ago about the differences between the NoSQL and SQL data-modeling approaches: https://web.archive.org/web/20160510045647/http://djondb.com/blog/nosql-masterdetail-sample/. I know it's a self-reference, but I wrote it to address exactly this question, which I find is the most challenging question an RDBMS person could have. It's always a good way to explain why NoSQL is so different from the SQL world and why it will achieve better performance every time: not because we use "NASA" technology, but because NoSQL lets the developer do less and get more, and less code = greater performance.
The answer is: it depends. Generally speaking, the objective of NoSQL DATABASES (not "queries") is scalability. RDBMSs usually hit hard limits at some point (I'm talking about millions and millions of rows) where you cannot scale any further by traditional means (replication, clustering, partitioning), and you need something more because your needs keep growing. Or even if you manage to scale, the overall setup is quite complicated. Or you can scale reads, but not writes.
And query performance depends on the particular implementation of your server, the type of query you are running, the columns in the table, etc. Remember that queries are just one part of an RDBMS.
The query time of a relational (SQL) database over data for 1,000 people is 2,000 ms, while a graph database like Neo4j takes 2 ms. If you create more nodes, up to 1,000,000, the speed stays stable at 2 ms.

SQL vs NOSQL: Which to use for this schema?

I've got an upcoming project and I can't decide whether to stick with SQL or switch over to NoSQL. It's basically a reporting system with the main interface being reporting on the data entered in by users.
Here's the schema I've got mapped out:
Because this schema is so nested, I started thinking about NoSQL. With SQL, I'm afraid I'm going to have a crap-ton of joins to get to the bottom of the tree (the Record model).
My concerns, though, are two-fold:
I'm only just starting to get into NoSQL, and I'm worried my knowledge may limit me because of the tight timeframe.
Although creating data at the bottom of the tree will probably be relatively simple, I'm worried that it may be hard to report on without getting into some heavy map/reduce stuff (that I have zero experience with)
My question:
Given my concerns, do you think this schema -- because of how deeply nested it is -- lends itself more to NoSQL? If so, do you think the reporting on the "records" will be difficult?
I realize that it may be difficult to answer these questions without more info, so please let me know what other info may be helpful in coming up with an answer.
Thanks in advance for your help!
Just my opinion:
I stared at the diagram for about 3 seconds: this is clearly relational. The benefits of an RDBMS heavily outweigh a NoSQL solution here. Why would you want to use NoSQL? Are there 100,000+ records (maybe a million plus)? Do you need microsecond/millisecond performance?
NoSQL, as I understand it, is not for when you simply don't like lots of joins. It exists because big systems for hierarchical data don't suit every situation. This situation suits one perfectly, however.
You can probably implode the whole {organisation, region, campus, event} hierarchy into one hierarchical / tree-based / self-referential relation. Maybe "user", too.
That would drastically reduce the number of tables needed. For an example, please take a look at this implementation: Interesting tree/hierarchical data structure problem (which is actually more complex than yours).
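A minimal sketch of such a self-referential relation, using SQLite through Python's `sqlite3`: one `nodes` table holds every level of the hierarchy, and a recursive common table expression sums the records under any node. The table names, the `kind` values, and the sample data are hypothetical, and this is not the implementation from the linked question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One self-referential table instead of separate organisation/region/campus/event tables.
CREATE TABLE nodes (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES nodes(id),  -- NULL for the root
    kind      TEXT NOT NULL,                 -- 'organisation', 'region', 'campus', 'event'
    name      TEXT NOT NULL
);
CREATE TABLE records (
    id      INTEGER PRIMARY KEY,
    node_id INTEGER NOT NULL REFERENCES nodes(id),
    value   REAL NOT NULL
);
""")
# Hypothetical sample hierarchy and records.
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?)", [
    (1, None, "organisation", "Org"),
    (2, 1, "region", "North"),
    (3, 1, "region", "South"),
    (4, 2, "campus", "North Campus"),
    (5, 4, "event", "Open Day"),
    (6, 3, "event", "Fair"),
])
conn.executemany("INSERT INTO records (node_id, value) VALUES (?, ?)", [
    (5, 10.0), (5, 5.0), (6, 7.0),
])

def total_under(conn: sqlite3.Connection, root_id: int):
    """Sum the records attached to root_id or any of its descendants."""
    (total,) = conn.execute("""
        WITH RECURSIVE subtree(id) AS (
            SELECT id FROM nodes WHERE id = ?
            UNION ALL
            SELECT n.id FROM nodes n JOIN subtree s ON n.parent_id = s.id
        )
        SELECT COALESCE(SUM(r.value), 0)
        FROM records r JOIN subtree s ON r.node_id = s.id
    """, (root_id,)).fetchone()
    return total
```

With this data, `total_under(conn, 1)` covers the whole organisation and `total_under(conn, 2)` only the North region. The depth of the tree no longer dictates the number of joins you write by hand.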
BTW: I don't have the faintest idea what "metric model" means. Inches? Miles to the gallon? Or just "measurements" ? Could you please explain a bit more what you intend to do?
EDIT: BTW2: the model you propose is technically not too difficult for Postgres, but it is probably bigger than necessary for humans.
My question: Given my concerns, do you think this schema -- because of how deeply nested it is -- lends itself more to NoSQL?
Deep nesting is not a point pro or contra SQL/NoSQL.
If so, do you think the reporting on the "records" will be difficult?
That's the tipping point, and here you don't give us the relevant information: what is this "reporting" thing in your case?
Does one report aggregate much data? E.g. does it simply aggregate all records and return a sum of them?
Does it aggregate over many of your layers?
Does a report evaluate strictly hierarchical or does it correlate event1.metric4.record42 to event2.metric18.record50 (or something like that)?
How much data must be transferred from the NoSQL DB to your application only to aggregate it and throw most of it away?
How unstructured is your data? Well - very structured it seems.
Those are typical situations where RDBMSs have proven their value. If these points are not important in your case, then you can choose freely.

Advice for hand-written olap-like extractions from relational database

Over the years, we've implemented a series of web-based reports summarizing historical business data (product sales, traffic, etc.). They rely heavily on complex SQL queries, and the boss expects the results to be real-time, but they take up to a minute to execute. The reports are customizable along several dimensions.
I've done some basic research, and it looks like what we need is some kind of OLAP (?), ETL(?), whatever.
Is that true? Are we supposed to convert to a whole package and trash our beloved developments, or is there a possibility to keep it relational, SQL-based, and get close to a dedicated solution by simply pre-calculating some optimized views with a batch process running at night? Have you got pointers to good documentation on the subject?
Thank you.
You can do ETL (Extract, transform, and load) at night, loading the (probably summarized) data into tables that can usually be queried pretty quickly. Appropriate indexes are still important.
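A minimal sketch of that nightly step, using SQLite through Python's `sqlite3`: a batch job rebuilds a small pre-aggregated table, indexes it, and daytime reports read from it. The `sales` and `daily_sales_summary` tables, their columns, and the helper names are hypothetical assumptions; a production warehouse load would be considerably more involved.

```python
import sqlite3

def nightly_summarize(conn: sqlite3.Connection) -> None:
    """Rebuild a pre-aggregated summary table from the detailed sales rows.

    Intended for a batch window (e.g. a nightly job): daytime reports then read the
    small summary table instead of re-aggregating every detail row per request.
    """
    conn.executescript("""
        BEGIN;  -- SQLite supports transactional DDL, so the rebuild commits as one unit
        DROP TABLE IF EXISTS daily_sales_summary;
        CREATE TABLE daily_sales_summary AS
            SELECT sale_date, product,
                   SUM(quantity) AS units,
                   SUM(quantity * unit_price) AS revenue
            FROM sales
            GROUP BY sale_date, product;
        CREATE INDEX idx_summary_date ON daily_sales_summary(sale_date);
        COMMIT;
    """)

def report_for_day(conn: sqlite3.Connection, day: str) -> list:
    """A daytime report query that only touches the indexed summary table."""
    return conn.execute(
        "SELECT product, units, revenue FROM daily_sales_summary "
        "WHERE sale_date = ? ORDER BY product",
        (day,),
    ).fetchall()
```

The design choice is to trade freshness for speed: reports are only as current as the last rebuild, which is why the partitioning and real-time points raised in the other answer still matter.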
It often makes sense to put those summary tables in a different schema, a different database, or on a different server, but you don't absolutely have to do that.
The structure of the tables is important, and it's not like designing tables for an OLTP system. The IBM Redbooks have a couple of titles that can help you design the tables.
Data Modeling Techniques for Data Warehousing
Dimensional Modeling: In a Business Intelligence Environment
Most DBMSs today support SQL analytic functions. See, for example, Analytic Functions by Example for Oracle, or Window Functions for PostgreSQL.
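As a small illustration of what analytic (window) functions buy you, the sketch below computes a running total and a per-product rank in one pass, without self-joins. It uses SQLite via Python's `sqlite3`, assuming the bundled SQLite is 3.25 or newer (the first version with window functions); the `sales` table and its rows are made-up sample data, and Oracle and PostgreSQL use the same `OVER (PARTITION BY ... ORDER BY ...)` syntax for these two functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, product TEXT, revenue REAL)")
# Hypothetical sample data.
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("2024-01-01", "A", 10.0),
    ("2024-01-02", "A", 20.0),
    ("2024-01-03", "A", 5.0),
    ("2024-01-01", "B", 7.0),
    ("2024-01-02", "B", 3.0),
])

rows = conn.execute("""
    SELECT product, sale_date, revenue,
           -- cumulative revenue per product, in date order
           SUM(revenue) OVER (PARTITION BY product ORDER BY sale_date) AS running_total,
           -- 1 = the product's best day by revenue
           RANK() OVER (PARTITION BY product ORDER BY revenue DESC) AS revenue_rank
    FROM sales
    ORDER BY product, sale_date
""").fetchall()
```

For product A this yields running totals of 10, 30, and 35, with the 20.0 day ranked first.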
In the long term, it sounds as though a move to a data warehouse would definitely benefit you (as suggested in Catcall's answer). You can use the existing reports as a starting point for your data warehouse's requirements.
In the short term, you could build summarised tables optimised for your existing reporting requirements. This should probably be regarded as a stopgap, unless you are never going to change these reports again.
You might also benefit from looking into partitioning tables in your database by date/time, since you will probably still want to report the current day's data for realtime reporting purposes.

What does NoSQL mean? Can someone explain it to me in simple words?

In this post, Stack Overflow Architecture, I read about something called NoSQL. I didn't understand what it means, and I tried searching on Google, but it seems I can't find out exactly what it is.
Can anyone explain what nosql means in simple words?
If you've ever worked with a database, you've probably worked with a relational database. Examples would be an Access database, SQL Server, or MySQL. When you think about tables in these kinds of databases, you generally think of a grid, like in Excel. You have to name each column of your database table, and you have to specify whether all the values in that column are integers, strings, etc. Finally, when you want to look up information in that table, you have to use a language called SQL.
A new trend is forming around non-relational databases, that is, databases that do not fall into a neat grid. You don't have to specify which things are integers and strings and booleans, etc. These types of databases are more flexible, but they don't use SQL, because they are not structured that way.
Put simply, that is why they are "NoSQL" databases.
The advantage of using a NoSQL database is that you don't have to know exactly what your data will look like ahead of time. Perhaps you have a Contacts table, but you don't know what kind of information you'll want to store about each contact. In a relational database, you need to make columns like "Name" and "Address". If you find out later on that you need a phone number, you have to add a column for that. There's no need for this kind of planning/structuring in a NoSQL database. There are also potential scaling advantages, but that is a bit controversial, so I won't make any claims there.
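A minimal sketch of the Contacts example, assuming SQLite (via Python's `sqlite3`) for the relational side and a plain list of Python dicts as a stand-in for a document database (not any real product's API): the table needs an `ALTER TABLE` before it can hold phone numbers, while each document simply carries whatever fields it has.

```python
import sqlite3

# Relational: the column set is fixed up front; a new attribute needs a schema change.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, address TEXT)")
conn.execute("INSERT INTO contacts VALUES (?, ?)", ("Ann", "1 Main St"))
conn.execute("ALTER TABLE contacts ADD COLUMN phone TEXT")  # required before storing phones
conn.execute("INSERT INTO contacts VALUES (?, ?, ?)", ("Bob", "2 Side St", "555-0100"))
# Existing rows get NULL in the new column.

# Document style: each contact is a self-describing record; fields can differ per contact.
contacts_docs = [
    {"name": "Ann", "address": "1 Main St"},
    {"name": "Bob", "address": "2 Side St", "phone": "555-0100"},
    {"name": "Cy", "twitter": "@cy"},  # a field no other contact has, no schema change needed
]
with_phone = [c["name"] for c in contacts_docs if "phone" in c]
```

The flip side is that nothing stops one document from spelling the field `phone` and another `tel`; the schema that the relational table enforces now lives in application code.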
The main disadvantage of NoSQL databases is really the lack of SQL. SQL is simple and ubiquitous. SQL lets you slice and dice your data more easily to get aggregate results, whereas this is a bit more complicated in NoSQL databases (you'll probably use things like MapReduce, which has a bit of a learning curve).
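To make the comparison concrete, here is the same "total per customer" aggregate written MapReduce-style in plain Python and as a single SQL `GROUP BY` (SQLite via `sqlite3`). The `orders` documents and the `map_phase`/`reduce_phase` functions are hypothetical; real MapReduce engines distribute these phases across machines, which this in-process sketch does not attempt.

```python
import sqlite3
from collections import defaultdict

orders = [  # hypothetical order documents
    {"customer": "ann", "total": 30},
    {"customer": "bob", "total": 15},
    {"customer": "ann", "total": 5},
]

# MapReduce style: map each document to (key, value) pairs, group by key, reduce each group.
def map_phase(doc):
    yield doc["customer"], doc["total"]

def reduce_phase(values):
    return sum(values)

grouped = defaultdict(list)  # the "shuffle" step: collect values per key
for doc in orders:
    for key, value in map_phase(doc):
        grouped[key].append(value)
mapreduce_totals = {key: reduce_phase(values) for key, values in grouped.items()}

# SQL: the same aggregate is one declarative GROUP BY.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, total INTEGER)")
conn.executemany("INSERT INTO orders VALUES (:customer, :total)", orders)
sql_totals = dict(conn.execute("SELECT customer, SUM(total) FROM orders GROUP BY customer"))
```

Both produce the same totals; the learning curve the answer mentions is in thinking about queries as separate map and reduce steps instead of one declarative statement.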
From the NoSQL Homepage
NoSQL is a fast, portable, relational database management system without arbitrary limits (other than memory and processor speed) that runs under, and interacts with, the UNIX Operating System. It uses the "Operator-Stream Paradigm" described in "Unix Review", March, 1991, page 24, entitled "A 4GL Language". There are a number of "operators" that each perform a unique function on the data. The "stream" is supplied by the UNIX Input/Output redirection mechanism. Therefore each operator processes some data and then passes it along to the next operator via the UNIX pipe function. This is very efficient as UNIX pipes are implemented in memory. NoSQL is compliant with the "Relational Model".
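The operator-stream idea can be loosely mimicked with Python generators: each hypothetical operator below reads a stream of rows, does one job, and hands the rows to the next, much as Unix pipes connect small programs. This is an analogy only, not that NoSQL system's actual operators; the tab-separated input format, the operator names, and the sample table are assumptions.

```python
import io

def read_table(stream):
    """Parse tab-separated text: the first line is the header, the rest are rows."""
    header = stream.readline().rstrip("\n").split("\t")
    for line in stream:
        yield dict(zip(header, line.rstrip("\n").split("\t")))

def select(rows, predicate):
    """Pass along only the rows that satisfy predicate (a relational selection)."""
    for row in rows:
        if predicate(row):
            yield row

def project(rows, *columns):
    """Keep only the named columns of each row (a relational projection)."""
    for row in rows:
        yield {c: row[c] for c in columns}

table = io.StringIO("name\tcity\tage\nann\tParis\t31\nbob\tRome\t25\ncy\tParis\t40\n")
# Equivalent in spirit to: read_table < table | select city=Paris | project name age
result = list(project(select(read_table(table), lambda r: r["city"] == "Paris"), "name", "age"))
```

Because generators are lazy, each row flows through the whole chain before the next one is read, similar to how data streams through a Unix pipeline instead of being materialized between steps.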
I would also look at this answer on Stack Overflow.
Put simply, it means not using a relational database for data storage.
Here's a relevant article: http://www.computerworld.com/s/article/9135086/No_to_SQL_Anti_database_movement_gains_steam_
NoSQL is a new database philosophy that addresses the shortcomings of relational database design, particularly the problems relational databases have scaling up for today's demanding web environments.
NoSql is quickly evolving into a movement with new tools, software and formats coming up as alternative to SQL.
RDBMS is as ubiquitous as OOP and while both of these design methodologies solve some problems wonderfully, they don't solve all.
So think of NoSQL as the functional programming of the database world.
Was this simple enough?
NoSQL is the idea that SQL-type databases don't satisfy the demands/requirements of a heavily used database that requires transactions to be reliable and failsafe (or close to it). This ties into the ideas of ACID and CAP, both worth looking into but not something to lose sleep over unless you run a really popular, transaction-heavy site (e.g. Amazon or eBay). To get a great start on these subjects, I suggest:
http://www.eflorenzano.com/blog/post/my-thoughts-nosql/
and
http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
Something everyone considering a "nosql" approach should consider:
(I shan't risk putting the image into this post as it contains a curse word, and I don't want offensive flags. So clicker beware -- there's an f-word in there. Only click if you have a sense of humor.)
http://browsertoolkit.com/fault-tolerance.png
I found this nice article about NoSQL
and this as well:
NoSQL, Yes Search