SQL database versus document database?

Small introduction:
I've been developing my project with a SQL database, Entity Framework, and LINQ.
I have a table 'Users', and each user has, for example, a list of educations. I asked myself: how many educations can a user have? 1, 2, 10...? It seems that in 99% of cases it's no more than 10. So for such a case in SQL I need to create a referenced table 'Educations', right? And if I need to display a user along with his educations, I have to join those two tables. But what if a user has 10 or more such collections, each with no more than 10 items? Do I need to create 10 referenced tables in SQL, and then join all of them whenever I need to display the data? For better performance I created denormalized tables shaped like the data I need to show on the UI, and every time a user was updated, I had to update the denormalized structure too.
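To illustrate, here is roughly what the normalized layout looks like (table and column names are hypothetical):

CREATE TABLE Users (
    UserId INT PRIMARY KEY,
    Name   VARCHAR(100)
);
CREATE TABLE Educations (
    EducationId INT PRIMARY KEY,
    UserId      INT NOT NULL REFERENCES Users(UserId), -- one row per education, linked to its user
    School      VARCHAR(200),
    Degree      VARCHAR(100)
);
-- Displaying a user together with his educations requires a join:
SELECT u.Name, e.School, e.Degree
FROM Users u
LEFT JOIN Educations e ON e.UserId = u.UserId
WHERE u.UserId = 1;

With 10 such collections this becomes 10 child tables and 10 joins.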
Now I have redeveloped my project to use a document database (MongoDB), and I created one document for the user with all 10 collections inside.
Maybe I've missed something, but the document database seems to win here: it's very fast and very easy to maintain. +1 to the document database.
So, what is your opinion: which is better to use, a document database or a SQL database?
When should I use a document database and when SQL?

This article has suggestions on when to use NoSQL DBs. Also this one.

Related

One table with a lot of rows, or a lot of tables with a view? (SQL Server)

My question is about what is more efficient when making queries and inserts, since the number of records (rows) in my table will grow a lot.
I would like to know which is more efficient: placing all the data in a single table, or partitioning it and using a view and a trigger to read and insert records.
As already mentioned, take a look at database normalization.
SQL is a way to work with relational databases and is built on the idea that we should have many tables linked to each other through relationships. I therefore recommend multiple tables, because you will be able to reuse data (for example a user's name and surname) through IDs rather than copying that data every time a user performs some action on your platform and you need to insert or update information.
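As a small sketch of that idea (hypothetical names): the name and surname live in one row, and every action merely references it by ID, so an update touches a single row:

CREATE TABLE users (
    id      INT PRIMARY KEY,
    name    VARCHAR(100),
    surname VARCHAR(100)
);
CREATE TABLE user_actions (
    id      INT PRIMARY KEY,
    user_id INT NOT NULL REFERENCES users(id), -- reference the user instead of copying name/surname
    action  VARCHAR(100)
);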
Hope this helps!

How to handle many columns / variable schema?

I apologize in advance in case this question has been asked already.
I'm working on revamping a reporting application used within my company. The requirements are:
Support the addition of new fields (done through the web app) and allow users to select those fields when building reports. There are currently 300 of these, and their values are stored in a single SQL Server table with 300 columns. Users have to be able to select the new fields in the report builder. In other words, the schema is dynamic.
Improve report generation performance.
My thought process was to split these 300 (and potentially more) columns into multiple tables (normalization), but I'm not sure that's the right approach, given that there doesn't seem to be a logical way of grouping the data without ending up with 20+ tables.
Another option would be to store the values as rows (key, attribute, attribute value) and then pivot, but I'm not sure that would perform well. This option would handle the dynamic schema nicely, but the pivot statements would have to be built programmatically before a user can consume the data (views).
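To illustrate, a rough sketch of that layout and the pivot (made-up names; this uses conditional aggregation, though SQL Server's PIVOT operator would also work):

CREATE TABLE report_values (
    record_id INT,
    field     VARCHAR(100),
    value     VARCHAR(4000),
    PRIMARY KEY (record_id, field)
);
-- One MAX(CASE ...) per field the user selected, built programmatically:
SELECT record_id,
       MAX(CASE WHEN field = 'Region'  THEN value END) AS Region,
       MAX(CASE WHEN field = 'Revenue' THEN value END) AS Revenue
FROM report_values
GROUP BY record_id;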
Thanks!

A database in which the users can create data types

I have a really old application using an SQL database that I need to update. I would like to take also the opportunity to improve the database structure and I would appreciate some advice.
The basic problem is that an important part of the database must be user configurable without touching the code. To be more concrete, the DB stores products and these products have different specs (i.e. columns) depending on the type. The app must be able to search for any of the columns. There are only a few types (~20) but the administrator must be able to create a new one without touching the code.
The data that needs to be stored for each product is either strings or floats, never more than 7 of each type.
Instead of creating an interface to create and delete tables, the following "solution" was implemented:
- In the Products table, there is one column for the ID, one column for the ProductTypeID, 7 string columns, and 7 float columns.
- In a ProductTypes table, there is one column for the ProductTypeID and 14 string columns holding the names of the 7 string columns and 7 float columns for each product type. If a product type does not need that many columns, the column name is NULL.
This works, but due to the extra indirection it is extremely annoying to maintain the client code.
The question is: Should I stay with an SQL DB and add a way to create/delete tables or should I use a noSQL DB? Which are the pros and cons in each case?
Keep in mind that in SQL databases, adding and removing columns on a large table can be a very expensive operation which can take minutes or even hours. Doing it on-the-fly is a really bad idea. Adding a bunch of "multi-purpose" columns to a table is not much better. It's hard to query and you have a limit on how many properties a product can have.
The usual by-the-book solution when each product has 0-n dynamic properties is to create a second table: ProductID (primary key) | PropertyName (primary key) | PropertyValue. This allows each product to have any number of properties. You can easily JOIN it with the main products table to get all products with their properties.
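A sketch of that layout and the join (hypothetical names):

CREATE TABLE ProductProperties (
    ProductID     INT NOT NULL REFERENCES Products(ProductID),
    PropertyName  VARCHAR(100) NOT NULL,
    PropertyValue VARCHAR(4000),
    PRIMARY KEY (ProductID, PropertyName) -- each product can have any number of named properties
);
-- All products with their properties:
SELECT p.ProductID, pp.PropertyName, pp.PropertyValue
FROM Products p
LEFT JOIN ProductProperties pp ON pp.ProductID = p.ProductID;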
When you are open to switching database technologies, you could also use a document-oriented NoSQL database which doesn't use a fixed schema like MongoDB or CouchDB. In such databases, each document in a collection can have a different set of fields. But before you decide to make this step, evaluate how such a database would affect other parts of your application. Listing everything that could be positively or negatively affected without knowing your whole application in and out would be too broad of a question.

Dynamically creating tables as a means of partitioning: OK or bad practice?

Is it reasonable for an application to create database tables dynamically as a means of partitioning?
For example, say I have a large table "widgets" with a "userID" column identifying the owner of each row. If this table tended to grow extremely large, would it make sense to instead have the application create a new table called "widgets_{username}" for each new user? Assume that the application will only ever have to query for widgets belonging to a single user at a time (i.e. no need to try and join any of these user widget tables together).
Doing this would break up the one large table into more easily-managed chunks, but this doesn't seem like an elegant solution. In my mind, the database schema should be defined when the application is written, and any runtime data is stored as rows, not as additional tables.
As a more general question, is modifying the database schema at runtime ever ok?
Edit: This question is mostly hypothetical; I had a pretty good feeling that creating tables at runtime didn't make sense. That being said, we do have a table with millions of rows in our application. SELECTs perform fine, but things like deleting all rows owned by a particular user can take a while. Basically, I'm looking for some solid reasoning for why dynamically creating a table per user doesn't make sense, for when I'm asked.
NO, NO, NO!! Now repeat after me: I will not do this, because it will create many headaches and problems in the future! Databases are made to handle large amounts of information. They use indexes to quickly find what you are after. Think of a phone book: how effective is its index? Would it be better to have a different book for each last name?
Splitting the table will not gain you anything performance-wise. Keep a single table, but be sure to index on userID and you'll be able to get the data fast. If you split the table up, however, it becomes impossible (or really, really hard) to get any information that spans multiple users, like searching all users for a certain widget or counting all widgets of a certain type, and every query has to be built dynamically.
If deleting rows is slow, look into that. How many rows at a time are we talking about: 10, 1,000, 100,000? What is the clustered index on this table? Could you use a "soft delete", with a status column that you UPDATE to 'D' to mark the row as deleted, and then delete the rows later, when there is less database activity? Is the delete slow because it is being blocked by other activity? Look into those things before you break up the table.
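To sketch both suggestions in T-SQL (assuming SQL Server; the names are hypothetical):

-- Index so lookups and deletes by user are cheap:
CREATE INDEX IX_widgets_userID ON widgets (userID);

-- Delete one user's rows in small batches to keep blocking and log growth down:
DECLARE @userID INT = 42; -- stand-in for the real parameter
DECLARE @rows INT = 1;
WHILE @rows > 0
BEGIN
    DELETE TOP (5000) FROM widgets WHERE userID = @userID;
    SET @rows = @@ROWCOUNT;
END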
No, that would be a bad idea. However, some DBMSs (e.g. Oracle) allow a single table to be partitioned on the values of a column, which achieves the objective without creating new tables at run time. Having said that, partitioning tables like this is not "the norm": it is usually only done in very large databases.
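For example, in Oracle something like this sketch spreads one logical table across partitions by user, with no runtime DDL (the columns are hypothetical):

CREATE TABLE widgets (
    widget_id NUMBER PRIMARY KEY,
    user_id   NUMBER NOT NULL,
    payload   VARCHAR2(4000)
)
PARTITION BY HASH (user_id) PARTITIONS 16; -- queries filtered on user_id touch only their partition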
Using an index on userID should give nearly the same performance.
In my opinion, changing the database schema at runtime is bad practice.
Consider, for example, security issues...
"Is it reasonable for an application to create database tables dynamically as a means of partitioning?"
No. (smile)

Dynamic Database Schema

What is a recommended architecture for providing storage for a dynamic logical database schema?
To clarify: Where a system is required to provide storage for a model whose schema may be extended or altered by its users once in production, what are some good technologies, database models or storage engines that will allow this?
A few possibilities to illustrate:
Creating/altering database objects via dynamically generated DDL
Creating tables with large numbers of sparse physical columns and using only those required for the 'overlaid' logical schema
Creating a 'long, narrow' table that stores dynamic column values as rows that then need to be pivoted to create a 'short, wide' rowset containing all the values for a specific entity
Using a BigTable/SimpleDB PropertyBag type system
Any answers based on real-world experience would be greatly appreciated.
What you are proposing is not new. Plenty of people have tried it... most have found that they chase "infinite" flexibility and instead end up with much, much less than that. It's the "roach motel" of database designs -- data goes in, but it's almost impossible to get it out. Try and conceptualize writing the code for ANY sort of constraint and you'll see what I mean.
The end result is typically a system that is MUCH more difficult to debug and maintain, and full of data consistency problems. This is not always the case, but more often than not that is how it ends up, mostly because the programmer(s) don't see this train wreck coming and fail to code defensively against it. It also often turns out that the "infinite" flexibility really isn't that necessary; it's a very bad "smell" when the dev team gets a spec that says "Gosh, I have no clue what sort of data they are going to put here, so let 'em put WHATEVER"... and the end users are just fine having pre-defined attribute types that they can use (code up a generic phone #, and let them create any # of them -- this is trivial in a nicely normalized system and maintains flexibility and integrity!).
If you have a very good development team and are intimately aware of the problems you'll have to overcome with this design, you can successfully code up a well-designed, not terribly buggy system. Most of the time.
Why start out with the odds stacked so much against you, though?
Don't believe me? Google "One True Lookup Table" or "single table design". Some good results:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:10678084117056
http://thedailywtf.com/Comments/Tom_Kyte_on_The_Ultimate_Extensibility.aspx?pg=3
http://www.dbazine.com/ofinterest/oi-articles/celko22
http://thedailywtf.com/Comments/The_Inner-Platform_Effect.aspx?pg=2
A strongly typed XML field in MSSQL has worked for us.
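For what it's worth, a sketch of how that can look in SQL Server (the XSD content is a made-up stand-in):

-- Register the XSD once, then bind it to the column; rows that do not
-- validate against the schema are rejected by the engine.
CREATE XML SCHEMA COLLECTION ProductSpecSchema AS
N'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:element name="spec">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="color"  type="xsd:string"/>
          <xsd:element name="weight" type="xsd:float"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
  </xsd:schema>';
CREATE TABLE Products (
    ProductID INT PRIMARY KEY,
    Spec      XML(ProductSpecSchema) -- the strongly typed XML field
);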
Like some others have said, don't do this unless you have no other choice. One case where this is required is if you are selling an off-the-shelf product that must allow users to record custom data. My company's product falls into this category.
If you do need to allow your customers to do this, here are a few tips:
- Create a robust administrative tool to perform the schema changes, and do not allow these changes to be made any other way.
- Make it an administrative feature; don't allow normal users to access it.
- Log every detail about every schema change. This will help you debug problems, and it will also give you CYA data if a customer does something stupid.
If you can do those things successfully (especially the first one), then any of the architectures you mentioned will work. My preference is to dynamically change the database objects, because that lets you take advantage of your DBMS's query features when you access the data stored in the custom fields. The other three options require you to load large chunks of data and then do most of your data processing in code.
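As a sketch of tips 1 and 3 combined (all names here are hypothetical; in practice the statements would be generated by the administrative tool):

-- The tool issues the schema change itself...
ALTER TABLE CustomData ADD CustomerRegion VARCHAR(100) NULL;
-- ...and logs every detail of it:
INSERT INTO SchemaChangeLog (ChangedAt, ChangedBy, ChangeSql)
VALUES (CURRENT_TIMESTAMP, 'admin_tool',
        'ALTER TABLE CustomData ADD CustomerRegion VARCHAR(100) NULL');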
I have a similar requirement and decided to use the schema-less MongoDB.
MongoDB (from "humongous") is an open source, scalable, high-performance, schema-free, document-oriented database written in the C++ programming language. (Wikipedia)
Highlights:
it has rich query functionality (maybe the closest to SQL DBs)
it is production ready (Foursquare and SourceForge use it)
Drawbacks (stuff you need to understand so you can use Mongo correctly):
no multi-document transactions (operations on a single document are atomic)
this stuff here: http://ethangunderson.com/blog/two-reasons-to-not-use-mongodb/
durability... mostly ACID-related stuff
I did this once in a real project:
The database consisted of one table with one field which was an array of 50 elements. It had a 'word' index set on it. All the data was typeless, so the 'word index' worked as expected. Numeric fields were represented as characters, and the actual sorting was done on the client side. (It is still possible to have several array fields for each data type if needed.)
The logical data schema for the logical tables was held within the same database, with a different table-row 'type' (the first array element). It also supported simple versioning in copy-on-write style using the same 'type' field.
Advantages:
You can rearrange and add/delete columns dynamically, with no need to dump and reload the database. Any new column's data may be set to an initial value (virtually) in zero time.
Fragmentation is minimal, since all records and tables are the same size; sometimes this gives better performance.
The whole table schema is virtual. Any logical schema structure is possible (even recursive or object-oriented).
It is good for "write-once, read-mostly, no-delete/mark-as-deleted" data (most web apps are actually like that).
Disadvantages:
Indexing works only on full words, no abbreviations,
Complex queries are possible, but with a slight performance degradation.
It depends on whether your preferred database system supports arrays and word indexes (it was implemented in the PROGRESS RDBMS).
The relational model exists only in the programmer's mind (i.e. only at run time).
And now I'm thinking the next step could be to implement such a database at the file-system level. That might be relatively easy.
The whole point of having a relational DB is keeping your data safe and consistent. The moment you allow users to alter the schema, your data integrity is gone...
If you need to store heterogeneous data, for example in a CMS scenario, I would suggest storing XML validated by an XSD in a row. Of course you lose performance and easy search capabilities, but it's a good trade-off IMHO.
Since it's 2016, forget XML! Use JSON to store the non-relational data bag, with an appropriately typed column as the backend. You shouldn't normally need to query by value inside the bag, which will be slow even though many contemporary SQL databases understand JSON natively.
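If an occasional lookup by value is still needed, one sketch (SQL Server 2016+ syntax; all names are hypothetical) is to keep the bag opaque and promote a single value into an indexed computed column:

CREATE TABLE documents (
    id       INT PRIMARY KEY,
    bag      NVARCHAR(MAX) CHECK (ISJSON(bag) = 1),  -- the non-relational data bag
    doc_type AS JSON_VALUE(bag, '$.type') PERSISTED  -- one value promoted out of the bag
);
CREATE INDEX IX_documents_doc_type ON documents (doc_type); -- so that value can be indexed
SELECT id, JSON_VALUE(bag, '$.title') AS title
FROM documents
WHERE doc_type = 'article';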
Sounds to me like what you really want is some sort of "meta-schema": a database schema capable of describing a flexible schema for storing the actual data. Dynamic schema changes are touchy and not something you want to mess with, especially not if users are allowed to make the changes.
You're not going to find a database that is better suited to this task than any other, so your best bet is to select one based on other criteria. For example: what platform are you using to host the DB? What language is the app written in?
To clarify what I mean by "meta-schema":
CREATE TABLE data (
    id    INTEGER NOT NULL AUTO_INCREMENT,
    `key` VARCHAR(255), -- the logical field name ("key" is a reserved word in MySQL, hence the backticks)
    data  TEXT,         -- the value stored under that name
    PRIMARY KEY (id)
);
This is a very simple example; you would likely have something more specific to your needs (and hopefully a little easier to work with), but it serves to illustrate my point. You should consider the database schema itself to be immutable at the application level; any structural changes should be reflected in the data (that is, in the instantiation of that schema).
I know that models like those indicated in the question are used in production systems all over. A rather large one is in use at a large university/teaching institution that I work for. They specifically use the long, narrow table approach to map data gathered by many varied data acquisition systems.
Also, Google recently released their internal data interchange format, Protocol Buffers, as open source via their code site. A database system modeled on this approach would be quite interesting.
Check the following:
Entity-attribute-value model
Google Protocol Buffer
Create 2 databases
DB1 contains static tables, and represents the "real" state of the data.
DB2 is free for users to do with as they wish; they (or you) will have to write code to populate their odd-shaped tables from DB1.
The EAV approach is, I believe, the best approach, but it comes with a heavy cost.
I know it's an old topic, but I guess it never loses relevance.
I'm developing something like that right now.
Here is my approach.
I use a stack with MySQL, Apache, PHP, and Zend Framework 2 as the application framework, but it should work just as well with any other stack.
Here is a simple implementation guide; you can evolve it further yourself.
You will need to implement your own query-language interpreter, because the effective SQL would be too complicated.
Example:
select id, password from user where email_address = "xyz@xyz.com"
The physical database layout:
Table 'specs': (should be cached in your data access layer)
id: int
parent_id: int
name: varchar(255)
Table 'items':
id: int
parent_id: int
spec_id: int
data: varchar(20000)
Contents of table 'specs':
1, 0, 'user'
2, 1, 'email_address'
3, 1, 'password'
Contents of table 'items':
1, 0, 1, ''
2, 1, 2, 'xyz@xyz.com'
3, 1, 3, 'my password'
The translation of the example in our own query language:
select id, password from user where email_address = "xyz@xyz.com"
to standard SQL would look like this:
select
parent_id, -- user id
data -- password
from
items
where
spec_id = 3 -- make sure this is a 'password' item
and
parent_id in
( -- get the 'user' item to which this 'password' item belongs
select
id
from
items
where
spec_id = 1 -- make sure this is a 'user' item
and
id in
( -- fetch all item id's with the desired 'email_address' child item
select
parent_id -- id of the parent item of the 'email_address' item
from
items
where
spec_id = 2 -- make sure this is an 'email_address' item
and
data = "xyz@xyz.com" -- with the desired data value
)
)
You will need to have the specs table cached in an associative array or hashtable or something similar to get the spec_id's from the spec names. Otherwise you would need to insert some more SQL overhead to get the spec_id's from the names, like in this snippet:
Bad example, don't use this, avoid this, cache the specs table instead!
select
parent_id,
data
from
items
where
spec_id = (select id from specs where name = "password")
and
parent_id in (
select
id
from
items
where
spec_id = (select id from specs where name = "user")
and
id in (
select
parent_id
from
items
where
spec_id = (select id from specs where name = "email_address")
and
data = "xyz#xyz.com"
)
)
I hope you get the idea and can determine for yourself whether that approach is feasible for you.
Enjoy! :-)
Over at the c2.com wiki, the idea of "Dynamic Relational" was explored. You DON'T need a DBA: columns and tables are create-on-write, unless you start adding constraints to make it act more like a traditional RDBMS; as a project matures, you can incrementally "lock it down".
Conceptually you can think of each row as an XML statement. For example, an employee record could be represented as:
<employee lastname="Li" firstname="Joe" salary="120000" id="318"/>
This does not imply it has to be implemented as XML; it's just a handy conceptualization. If you ask for a non-existent column, such as "SELECT madeUpColumn ...", it's treated as blank or null (unless added constraints forbid that). And it's possible to use SQL, although one has to be careful about comparisons because of the implied type model. But other than type handling, users of a Dynamic Relational system would feel right at home, because they can leverage most of their existing RDBMS knowledge. Now, if somebody would just build it...
In the past I've chosen option C: creating a 'long, narrow' table that stores dynamic column values as rows that then need to be pivoted to create a 'short, wide' rowset containing all the values for a specific entity. However, I was using an ORM, and that REALLY made things painful. I can't think of how you'd do it in, say, LINQ to SQL. I guess I'd have to create a Hashtable to reference the fields.
@Skliwz: I'm guessing he's more interested in allowing users to create user-defined fields.
ElasticSearch. You should consider it, especially if you're dealing with datasets you can partition by date, you can use JSON for your data, and you are not fixed on using SQL to retrieve the data.
ES infers the schema for any new JSON fields you send, either automatically or with hints, or you can define/change it manually with one HTTP command ("mappings").
Although it does not support SQL, it has some great lookup capabilities and even aggregations.
I know this is a super old post, and much has changed in the last 11 years, but I thought I would add this as it might be helpful to future readers. One of the reasons my co-founders and I created HarperDB was to natively accomplish a dynamic schema in a single, unduplicated data set while providing full indexing capability. You can read more about it here:
https://harperdb.io/blog/dynamic-schema-the-harperdb-way/
SQL already provides a way to change your schema: the ALTER command.
Simply have a table that lists the fields that users are not allowed to change, and write a nice interface for ALTER.
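A sketch of that combination (hypothetical names; the check lives in the interface, which refuses the change before ever issuing the ALTER):

-- The list of fields users may not touch:
CREATE TABLE protected_fields (
    table_name  VARCHAR(64) NOT NULL,
    column_name VARCHAR(64) NOT NULL,
    PRIMARY KEY (table_name, column_name)
);
-- The interface first checks the list...
SELECT COUNT(*)
FROM protected_fields
WHERE table_name = 'products' AND column_name = 'price';
-- ...and only if the count is zero does it run the user's change:
ALTER TABLE products ADD COLUMN shoe_size INT NULL;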