SQL vs NoSQL: Which to use for this schema?

I've got an upcoming project and I can't decide whether to stick with SQL or switch over to NoSQL. It's basically a reporting system, with the main interface being reports on the data entered by users.
Here's the schema I've got mapped out:
Because this schema is so nested, I started thinking about NoSQL. With SQL, I'm afraid I'm going to have a crap-ton of joins to get to the bottom of the tree (the Record model).
My concerns, though, are two-fold:
1. I'm only just starting to get into NoSQL, and I'm worried my knowledge may limit me because of the tight timeframe.
2. Although creating data at the bottom of the tree will probably be relatively simple, I'm worried that it may be hard to report on without getting into some heavy map/reduce stuff (which I have zero experience with).
My question:
Given my concerns, do you think this schema -- because of how deeply nested it is -- lends itself more to NoSQL? If so, do you think the reporting on the "records" will be difficult?
I realize that it may be difficult to answer these questions without more info, so please let me know what other info may be helpful in coming up with an answer.
Thanks in advance for your help!

Just my opinion:
I stared at the diagram for about 3 seconds, and this is clearly relational. The benefits of an RDBMS heavily outweigh a NoSQL solution here. Why would you want to use NoSQL? Are there 100,000+ records (maybe a million plus)? Do you need microsecond/millisecond performance?
NoSQL, as I understand it, is not there because you don't like lots of joins; it's there because big relational systems don't suit every hierarchical-data situation. A relational database suits this one perfectly, however.

You can probably implode the whole {organisation, region, campus, event} hierarchy into one hierarchical / tree-based / self-referential relation. Maybe "user", too.
That would drastically reduce the number of tables needed. For an example, please take a look at this implementation: Interesting tree/hierarchical data structure problem (which is actually more complex than yours).
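A minimal sketch of what such a self-referential relation could look like; all table and column names here are my assumptions, since the original diagram isn't reproduced in this thread:

```sql
-- Hypothetical adjacency-list version of the organisation/region/campus/event tree.
CREATE TABLE node (
    node_id   INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES node (node_id),  -- NULL for the root (the organisation)
    node_type TEXT NOT NULL,                      -- 'organisation' | 'region' | 'campus' | 'event'
    name      TEXT NOT NULL
);

-- Walking a subtree with a recursive CTE (works in Postgres and MySQL 8+):
WITH RECURSIVE subtree AS (
    SELECT node_id, parent_id, node_type, name
    FROM node
    WHERE node_id = 1            -- start at some organisation
    UNION ALL
    SELECT n.node_id, n.parent_id, n.node_type, n.name
    FROM node n
    JOIN subtree s ON n.parent_id = s.node_id
)
SELECT * FROM subtree;
```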
BTW: I don't have the faintest idea what "metric model" means. Inches? Miles to the gallon? Or just "measurements"? Could you please explain a bit more what you intend to do?
EDIT: BTW2: the model you propose is technically not too difficult for Postgres. But it is probably bigger than necessary for humans.

My question: Given my concerns, do you think this schema -- because of how deeply nested it is -- lends itself more to NoSQL?
Deep nesting is not a point pro or contra SQL/NoSQL.
If so, do you think the reporting on the "records" will be difficult?
That's the tipping point and here you don't give us the relevant information: What is this "reporting" thing in your case?
Does one report aggregate much data? E.g. does it simply aggregate all records and return a sum of them?
Does it aggregate over many of your layers?
Does a report evaluate strictly hierarchical or does it correlate event1.metric4.record42 to event2.metric18.record50 (or something like that)?
How much data must be transferred from the NoSQL DB to your application only to aggregate it and throw most of it away?
How unstructured is your data? Well - very structured it seems.
Those are typical situations/points where RDBMSs have proven their value. If these items are not important in your case, then you can choose freely.
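For what it's worth, if the reporting really is of the "aggregate records up the hierarchy" kind, it tends to be a plain GROUP BY in SQL rather than map/reduce. A hedged sketch, with every table and column name assumed since the schema itself isn't shown in this thread:

```sql
-- Hypothetical report: total record values per region and metric for one organisation.
SELECT r.name         AS region,
       m.name         AS metric,
       SUM(rec.value) AS total_value,
       COUNT(*)       AS record_count
FROM record rec
JOIN metric m ON m.metric_id = rec.metric_id
JOIN event  e ON e.event_id  = m.event_id
JOIN campus c ON c.campus_id = e.campus_id
JOIN region r ON r.region_id = c.region_id
WHERE r.organisation_id = 42
GROUP BY r.name, m.name;
```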

Related

What business folks have to understand about database design

I have a business team asking me to set up a meeting to explain database design considerations to them. Since they do not have much background in RDBMSs, I'm thinking of explaining the following:
What is RDBMS
What is a table and what are constraints / why we need them
What is a transaction and what are ACID Properties
Things to consider before/while developing a database (see the sketch after this list)
a. Decide how much detail you need and how much you may need in the future
b. Identify fields with unique values
c. Select the appropriate data types for your fields
d. Normalization and Index design
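If a concrete example helps the talk, here is a minimal, purely hypothetical pair of tables covering points a through d (keys with unique values, constraints, appropriate data types, and a normalized split into two tables); none of these names come from the actual project:

```sql
CREATE TABLE customer (
    customer_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,  -- field with unique values
    email       VARCHAR(255) NOT NULL UNIQUE,             -- constraint: no duplicate accounts
    signup_date DATE NOT NULL                              -- appropriate data type, not free text
);

CREATE TABLE customer_order (
    order_id    INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    customer_id INT UNSIGNED NOT NULL,
    order_total DECIMAL(10,2) NOT NULL,
    ordered_at  DATETIME NOT NULL,
    CONSTRAINT fk_order_customer
        FOREIGN KEY (customer_id) REFERENCES customer (customer_id)  -- why constraints matter
);

-- Index design: support the reports that filter by date.
CREATE INDEX idx_order_date ON customer_order (ordered_at);
```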
Also, most of the time this team has their data coming in from flat files, which we need to load into the DB and present in the format they need. Can anybody please suggest what more I can explain, or any better way I can explain it? Their data is kind of all over the place, and I just want to emphasize thinking it through, because we couldn't set up a stable process to do the import. Any suggestion for me is welcome as well :)
Appreciate your help!
You haven't said what your audience expects to take away from your presentation. So I'll have to guess, based on my dealings with business people in the past. Your mileage may vary.
Business people typically don't care about the skills and knowledge you put into doing a good job with database design, even when they say they do. They want to understand database design in terms of costs and benefits. That is how business people think.
So if you must cover some technical topic like indexing, do so from a cost benefit point of view. There is a cost to adding an index to a table, and there is a benefit to adding an index to a table. Figuring out in advance whether the benefit is worth the cost is the really tricky part, and they will be interested in this.
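To make the cost/benefit point tangible for yourself before the talk, a small hypothetical illustration (the table and column names are mine, not from the source) might be:

```sql
-- Cost: the index takes disk space and must be maintained on every INSERT/UPDATE/DELETE.
CREATE INDEX idx_invoice_date ON invoice (invoice_date);

-- Benefit: a date-range report can use the index instead of scanning the whole table;
-- EXPLAIN shows whether the optimizer actually uses it.
EXPLAIN
SELECT invoice_id, amount
FROM invoice
WHERE invoice_date >= '2024-01-01';
```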
On a larger scale, data is a business asset. There is a cost to managing that asset well, and there is a benefit to managing that asset well. If you can connect your talk to these two concepts, they will be interested.
If they are really good business people, they will have a good understanding of the subject matter that the database covers, provided it's a part of the enterprise data that affects their business. If you have a good ER model of the data in the database, this model will connect every value in every table to an attribute, and every attribute will describe some aspect of the subject matter. This is a very different use of an ER model than just using it as a preliminary to creating a relational model.
Technical people tend to think of ER modeling as "relational modeling light". It's really much deeper than that. It's an analytical handle on the question "what does the data really mean?" And this is a handle on "what is the data really worth?". And this is where the technical world meets the business world.
How about starting from the basics of CRUD operations, then moving on to normalization, giving the scenarios that create the need for normalization and the concept of keys in an RDBMS, and then talking about ER modeling?
Considering the fact that you are presenting to business folks, I think there would be 2 approaches best suited to your needs.
a) WHEN YOU HAVE LESS TIME:
Only cover topics which need minimal or no prior knowledge. Cover RDBMS & things to consider.
Keep it simple and easy to understand. Tell them how your solution works and why it is an effective one.
Cover only topics which are relevant and make it layman friendly. Provide them the pros & cons of your DB design. Connect it to business needs.
In all cases, provide contextual examples which they may relate to with ease.
b) WHEN YOU HAVE MORE TIME
You may cover topics in detail as suggested in the previous comments. (#SQL_Underworld & #Ramya)

SQL and NoSQL: which one is more suitable for this case, and why?

In my project:
Data is not going to be modified (queries only).
There are going to be more than 1,000,000 instances of data.
Query performance is critical.
In the case of using SQL, it is going to be a single table with 7 columns (no joins).
There are also different classification approaches used in NoSQL, which are given below with some examples:
Column: Accumulo, Cassandra, HBase
Document: Clusterpoint, Couchdb, Couchbase, MarkLogic, MongoDB
Key-value: Dynamo, FoundationDB, MemcacheDB, Redis, Riak, FairCom c-treeACE
Graph: Allegro, Neo4J, OrientDB, Virtuoso, Stardog
Source: http://en.wikipedia.org/wiki/NoSQL#cite_note-7
First of all, does the database system really make an observable performance difference for this case?
If it does, can you please explain which one is more suitable for my project, SQL or NoSQL? If NoSQL, then which classification approach?
Thank you in advance
I am currently involved in a project to set up a "standard" database with a huge amount of data. We start by implementing it in SQL to see the performance of the queries; once this is done we address any performance problems.
There are multiple reasons for this, but to name a few:
Standard SQL is easily implemented and consistent across multiple implementations (as of the present day).
If you know SQL, you can make a fast implementation, to save time and get the project going.
There is plenty of information available about SQL implementations.
I cannot answer about NoSQL but hopefully someone can fill me in.
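A rough sketch of what "implement it in SQL first and measure" could look like for the single-table case described above. The question only says 7 columns, read-only data and about 1,000,000 rows, so every name and type here is an assumption (MySQL-flavoured syntax):

```sql
-- Hypothetical 7-column, query-only table.
CREATE TABLE measurement (
    id    BIGINT PRIMARY KEY,
    col_a VARCHAR(64),
    col_b VARCHAR(64),
    col_c INT,
    col_d INT,
    col_e DOUBLE,
    col_f DATETIME
);

-- Index whatever column the performance-critical query filters on,
-- then time the query against the full data set.
CREATE INDEX idx_measurement_col_c ON measurement (col_c);

SELECT id, col_a, col_e
FROM measurement
WHERE col_c = 123;
```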
The important question you need to ask is what kind of queries you will be performing. For example ClusterPoint offers real-time aggregation, so if you need result grouping and extracting summaries, it gives you great performance.
For a regular key/value they should all perform pretty well, so pick the one you are most comfortable with.

SQL versus noSQL (speed)

When people are comparing SQL and noSQL, and concluding the upsides and downsides of each one, what I never hear anyone talking about is the speed.
Isn't performing SQL queries generally faster than performing noSQL queries?
I mean, for me this would be a really obvious conclusion, because you should always be able to find something faster if you know the structure of your database than if you don't.
But people never seem to mention this, so I want to know if my conclusion is right or wrong.
People who tend to use noSQL use it specifically because it fits their use cases. Being divorced from normal RDBMS table relationships and constraints, as well as ACID-ity of data, it's very easy to make it run a lot faster.
Consider Twitter, which uses NoSQL because a user only does very limited things on the site, or exactly one: tweet. And concurrency can be considered non-existent since (1) nobody else can modify your tweet and (2) you won't normally be simultaneously tweeting from multiple devices.
The definition of noSQL systems is a very broad one -- a database that doesn't use SQL / is not an RDBMS.
Therefore, the answer to your question is, in short: "it depends".
Some noSQL systems are basically just persistent key/value storages (like Project Voldemort). If your queries are of the type "look up the value for a given key", such a system will be (or at least should be) faster than an RDBMS, because it only needs a much smaller feature set.
Another popular type of noSQL system is the document database (like CouchDB).
These databases have no predefined data structure.
Their speed advantage relies heavily on denormalization and creating a data layout that is tailored to the queries that you will run on it. For example, for a blog, you could save a blog post in a document together with its comments. This reduces the need for joins and lookups, making your queries faster, but it also could reduce your flexibility regarding queries.
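For contrast, the normalized relational version of that blog example needs a join (or two queries) to reassemble a post with its comments; that is the work the denormalized document avoids. A hedged sketch with made-up table names:

```sql
CREATE TABLE post (
    post_id INT PRIMARY KEY,
    title   VARCHAR(200) NOT NULL,
    body    TEXT NOT NULL
);

CREATE TABLE comment (
    comment_id INT PRIMARY KEY,
    post_id    INT NOT NULL,
    author     VARCHAR(100) NOT NULL,
    body       TEXT NOT NULL,
    FOREIGN KEY (post_id) REFERENCES post (post_id)
);

-- Reading one post together with all of its comments:
SELECT p.title, c.author, c.body
FROM post p
LEFT JOIN comment c ON c.post_id = p.post_id
WHERE p.post_id = 1;
```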
As Einstein would say, speed is relative.
If you need to store a simple master/detail application (like a shopping cart), you would need to do several INSERT statements in your SQL application, and you will also get a data set of information back when you query for the purchase. If you're using NoSQL, and you're using it well, then you would have all the data for a single order in one simple "record" (a document, to use the terminology of NoSQL databases like djondb).
So I really think that the performance of an application can be measured by the number of things it needs to do to achieve a single requirement. If you need to do several inserts to store an order, and you only need one simple insert in a database like djondb, then the performance will be 10x faster in the NoSQL world, just because you're making 10 times fewer calls to the database layer; that's it.
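Just to make the comparison concrete, this is roughly what the "several inserts" side looks like in SQL for one order, with hypothetical orders/order_items tables; a document database would write the same order as a single nested document:

```sql
BEGIN;

INSERT INTO orders (order_id, customer_id, ordered_at)
VALUES (1001, 42, NOW());

INSERT INTO order_items (order_id, product_id, quantity, unit_price)
VALUES (1001, 7, 2, 19.99),
       (1001, 9, 1,  5.50);

COMMIT;
```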
To illustrate my point, let me link an example I wrote some time ago about the differences between the NoSQL and SQL data-model approaches: https://web.archive.org/web/20160510045647/http://djondb.com/blog/nosql-masterdetail-sample/. I know it's a self-reference, but I basically wrote it to address this question, which I find is the most challenging question an RDBMS person can face, and it's always a good way to explain why NoSQL is so different from the SQL world, and why it will achieve better performance every time: not because we use "NASA" technology, but because NoSQL lets the developer do less and get more, and less code = greater performance.
The answer is: it depends. Generally speaking, the objective of NoSQL databases (not "queries") is scalability. An RDBMS usually hits hard limits at some point (I'm talking about millions and millions of rows) where you cannot scale any more by traditional means (replication, clustering, partitioning), and you need something more because your needs keep growing. Or even if you manage to scale, the overall setup is quite complicated. Or you can scale reads, but not writes.
And query speed depends on the particular implementation of your server, the type of query you are doing, the columns in the table, etc. Remember that queries are just one part of the RDBMS.
The query time of a relational (SQL) database for 1,000 persons' data is 2,000 ms, while a graph database like Neo4j takes 2 ms. If you create more nodes (1,000,000), the speed stays stable at 2 ms.

Fastest way to become a MySQL expert?

I have been using MySQL for years, mainly on smaller projects until the last year or so. I'm not sure if it's the nature of the language or my lack of real tutorials, but I never feel sure whether what I'm writing is done the proper way for optimization and scaling purposes.
While self-taught in PHP I'm very sure of myself and the code I write, easily can compare it to others and so on.
With MySQL, I'm not sure whether (and in what cases) an INNER JOIN or LEFT JOIN should be used, nor am I aware of the large amount of functionality that it has. While I've written code for databases that handled tens of millions of records, I don't know if it's optimum. I often find that a small tweak will make a query take less than 1/10 of the original time... but how do I know that my current query isn't also slow?
I would like to become completely confident in this field in the ability to optimize databases and be scalable. Use is not a problem -- I use it on a daily basis in a number of different ways.
So, the question is, what's the path? Reading a book? Website/tutorials? Recommendations?
EXPLAIN is your friend for one. If you learn to use this tool, you should be able to optimize your queries very effectively.
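For example, a minimal EXPLAIN session (MySQL syntax; the table is hypothetical) looks like this:

```sql
EXPLAIN
SELECT user_id, created_at
FROM login_events
WHERE user_id = 42
ORDER BY created_at DESC
LIMIT 10;

-- If the plan shows type: ALL (a full table scan), an index usually helps:
CREATE INDEX idx_login_user_date ON login_events (user_id, created_at);
-- Re-running EXPLAIN should then show type: ref and far fewer examined rows.
```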
Scan the MySQL manual and read Paul DuBois' MySQL book.
Use EXPLAIN SELECT, SHOW VARIABLES, SHOW STATUS and SHOW PROCESSLIST.
Learn how the query optimizer works.
Optimize your table formats.
Maintain your tables (myisamchk, CHECK TABLE, OPTIMIZE TABLE).
Use MySQL extensions to get things done faster.
Write a MySQL UDF function if you notice that you would need some function in many places.
Don't use GRANT on table level or column level if you don't really need it.
http://dev.mysql.com/tech-resources/presentations/presentation-oscon2000-20000719/index.html
The only way to become an expert in something is experience, and that usually takes time. It also helps to have good mentors who are better than you, to teach you what you are missing. The problem is you don't know what you don't know.
Research and experience - if you don't have the projects to warrant the research, make them. Make three tables with related data and make up scenarios.
E.g.:
Make a table of movies and their data
Make a table of users
Make a table of ratings for users
Spend time learning how joins work, how to get movies of a particular rating range in one query, and how to search the movies table (LIKE, REGEXP). As mentioned, use EXPLAIN to see how different things affect speed. Make a day of it; I guarantee your handle on it will be greatly increased.
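If it helps to have a starting point, here is one possible layout for that practice scenario (all names are just a sketch, MySQL-flavoured):

```sql
CREATE TABLE movie (
    movie_id INT PRIMARY KEY,
    title    VARCHAR(200) NOT NULL,
    year     INT
);

CREATE TABLE app_user (
    user_id  INT PRIMARY KEY,
    username VARCHAR(50) NOT NULL UNIQUE
);

CREATE TABLE rating (
    user_id  INT NOT NULL,
    movie_id INT NOT NULL,
    stars    INT NOT NULL,
    PRIMARY KEY (user_id, movie_id),
    FOREIGN KEY (user_id)  REFERENCES app_user (user_id),
    FOREIGN KEY (movie_id) REFERENCES movie (movie_id)
);

-- Movies in a particular rating range, in one query:
SELECT m.title, AVG(r.stars) AS avg_stars
FROM movie m
JOIN rating r ON r.movie_id = m.movie_id
GROUP BY m.movie_id, m.title
HAVING AVG(r.stars) BETWEEN 3 AND 4;

-- Searching the movies table:
SELECT title FROM movie WHERE title LIKE '%matrix%';
SELECT title FROM movie WHERE title REGEXP '^The ';
```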
If you're still struggling for case-scenarios, start looking here on SO for questions and try out those scenarios yourself.
I don't know if MIT open courseware has anything about databases... Well whaddya know? They do: http://ocw.mit.edu/OcwWeb/Electrical-Engineering-and-Computer-Science/6-830Fall-2005/CourseHome/
I would recommend that as one source based only on MIT's reputation. If you can take a formal course from a university, you may find that helpful. Also, a good understanding of fundamental discrete mathematics/logic certainly would do no harm.
As others have said, time and practice is the only real approach.
More practically, I found that EXPLAIN worked wonders for me personally. Learning to read the output of that was probably the biggest single leap I made in being able to write efficient queries.
The second thing I found really helpful was SQL Tuning by Dan Tow, which describes a fairly formal methodology for extracting performance. It's a bit involved, but works well in lots of situations. And if nothing else, it will give you a much better understanding of the way joins are processed.
Start with a class like this one: https://www.udemy.com/sql-mysql-databases/
Then use what you've learned to create and manage a number of SQL databases and run queries. Getting to the expert level is really about practice. But of course you need to learn the pieces before you can practice.

Signs of a great SQL developer

Based on their work, how do you distinguish a great SQL developer?
Examples might include:
Seldom uses CURSORs, and tries to refactor them away.
Seldom uses temporary tables, and tries to refactor them away.
Handles NULL values in OUTER JOINs with confidence.
Avoids SQL extensions that are not widely implemented.
Knows how to indent with elegance.
I've found that a great SQL developer is usually also a great database designer, and will prefer to be involved in both the design and implementation of the database. That's because a bad database design can frustrate and hold back even the best developer - good SQL instincts don't always work right in the face of pathological designs, or systems where RI is poor or non-existent. So, one way to tell a great SQL developer is to test them on data modeling.
Also, a great DB developer has to have complex join logic down cold, and know exactly what the results of various multi-way joins will be in different situations. Lack of comfort with joins is the #1 cause of bad SQL code (and bad SQL design, for that matter).
As for specific syntax things, I'd hesitate at directives like:
Does not use CURSORs.
Does not use temporary tables.
Use of those techniques might allow you to tell the difference between a dangerously amateur SQL programmer (who uses them when simple relational predicates would be far better) and a decent starting SQL programmer (who knows how to do most stuff without them). However, there are many situations in real world usage where temp tables and cursors are perfectly adequate ways (sometimes, the only ways) to accomplish things (short of moving to another layer to do the processing, which is sometimes better anyway).
So, use of advanced concepts like these isn't forbidden, but unless you're clearly dealing with a SQL expert working on a really tough problem that, for some reason, doesn't lend itself to a relational solution ... yeah, they're probably warning signs.
I don't think that cursors, temporary tables or other SQL practices are inherently bad or that their usage is a clear sign of how good a database programmer is.
I think there is the right tool for every type of problem. Sure, if you only have a hammer, everything looks like a nail. I think a great SQL programmer or database developer is a person who knows which tool is the right one in a specific situation. IMHO you can't generalize excluding specific patterns.
But a rule of thumb may be: a great database developer will find a shorter and more elegant solution for complex situations than the average programmer.
Here are a few things that don't apply to run-of-the-mill software developers, but do apply to someone with good SQL skills:
Defines beneficial indexes, but not redundant or unused indexes.
Employs transactions effectively.
Values referential integrity.
Applies normalization to database design.
Thinks in terms of sets, not in terms of loops.
Uses JOIN confidently.
Knows how NULL and three-valued logic work (see the sketch after this answer).
Understands the uses and benefits of query parameters.
The examples you give, of not using cursors, temp tables, or knowing 3 alternative queries for a given task, I would not consider indications of being a great SQL developer. Perhaps I would call someone who does those things an "acrobat."
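A small, purely illustrative sketch of two of the points above (set thinking instead of loops, and NULL-aware outer joins), using made-up tables:

```sql
-- Set thinking: "customers with no orders" in one statement,
-- instead of looping over customers and querying orders one by one.
SELECT c.customer_id, c.name
FROM customer c
LEFT JOIN orders o ON o.customer_id = c.customer_id
WHERE o.customer_id IS NULL;   -- NULL from the outer join means "no matching order"

-- Three-valued logic: this does NOT return rows where region is NULL,
-- because NULL <> 'EU' evaluates to UNKNOWN rather than TRUE.
SELECT customer_id FROM customer WHERE region <> 'EU';

-- To include the NULL rows explicitly:
SELECT customer_id FROM customer WHERE region <> 'EU' OR region IS NULL;
```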
Just to add to the already great answers: a great developer can reduce a complex problem to something simple and easy to maintain.
Knows how to use INFORMATION_SCHEMA and table metadata in order to write generic code, or to generate code, and so save time on repetitive database tasks.
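For instance, a hedged sketch of the code-generation idea (MySQL; the schema name is an assumption):

```sql
-- Generate repetitive maintenance statements instead of typing them by hand.
SELECT CONCAT('OPTIMIZE TABLE `', table_schema, '`.`', table_name, '`;') AS stmt
FROM information_schema.tables
WHERE table_schema = 'my_app'
  AND engine = 'InnoDB';
```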