How to design a database that is to change often?

I inherited a program that configures devices via Ethernet. Settings are stored in the database. The set of settings changes constantly as the devices evolve, so schema changes must be simple (the user must be able to perform them).
Currently, this simplicity is achieved with an XSD schema (easily readable and editable), and the data is stored as XML. This approach also satisfies the requirement to support multiple database engines (MS SQL and Oracle are currently supported).
I want to move database structure to the relational model. Are there any solutions that are as easy to change as the one described while using a relational database?

I want to move database structure to the relational model.
Why?
Do you want to be able to index/query parts of the configuration, or be able to change just one part of the configuration without touching the rest?
If no, then just treating the XML as an opaque BLOB should be sufficient.
If yes, then you'll have to tell us more about the actual structure of the configuration. [1]
[1] BTW, some DBMSes can "see inside" the XML, index the elements and so on, but that would no longer be DBMS-agnostic.

There are several solutions to your design problem.
I suggest one of the following:
1. Use a different database. Relational databases are not the best choice for this kind of data. There are databases with good support for dynamic data. One example of such a database is MongoDB, which uses JSON-style documents.
or
2. Create one (or a small set) of key/value table(s). You can support a hierarchical structure by adding a parent column that points to the parent key-value pair.
I wouldn't recommend changing a relational db schema on the fly as the result of a user operation. It goes against fundamental design rules for relational database design.
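As a rough illustration of option 2, here is a minimal sketch of a key/value table with a parent column. It uses Python's built-in sqlite3 module only so that it runs anywhere; the table name `setting`, its columns, and the sample device settings are all made up for the example and are not taken from the original project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE setting (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES setting(id),  -- NULL for a top-level node
        key       TEXT NOT NULL,
        value     TEXT                             -- NULL for pure grouping nodes
    )
""")

# Hypothetical sample data: a device with a nested "network" group.
conn.executemany(
    "INSERT INTO setting VALUES (?, ?, ?, ?)",
    [
        (1, None, "device", None),
        (2, 1, "network", None),
        (3, 2, "ip", "192.168.0.10"),
        (4, 2, "port", "502"),
        (5, 1, "name", "pump-01"),
    ],
)

def load_tree(conn, parent_id=None):
    """Rebuild the nested settings below parent_id as a dict."""
    if parent_id is None:
        cur = conn.execute(
            "SELECT id, key, value FROM setting WHERE parent_id IS NULL ORDER BY id")
    else:
        cur = conn.execute(
            "SELECT id, key, value FROM setting WHERE parent_id = ? ORDER BY id",
            (parent_id,))
    tree = {}
    for node_id, key, value in cur.fetchall():
        children = load_tree(conn, node_id)
        # A node with children becomes a nested dict, a leaf keeps its value.
        tree[key] = children if children else value
    return tree

settings = load_tree(conn)
```

Adding a new setting is then a plain INSERT rather than a schema change, at the cost of every value being stored as text and validated in the application.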

Related

DocumentDB vs. SQL Database

I have a question regarding the usage of a DocumentDB or SQL-Database.
E.g. I have categories which can have multiple child categories and so on. Every category can have multiple attributes and every attribute can have one or many values. Would it be better to use a schemaless solution like a DocumentDB because I could add new sub categories etc. with no effort or is it better to stick with a schema and use a SQL-Database.
Many thanks in advance.
As @DavidMakogon said, there is no standard, absolutely right answer; it is up to you and depends on the application scenario. For the current need, storing a tree of categories with attributes, it is simple to design the database schema and develop the application with either option, absent additional constraints such as data volume or concurrency, and both are good choices.
For other considerations, two documents may help you analyze which features you need, or which are more suitable for your scenario, in order to make your choice:
MongoDB vs MySQL: Comparison Between RDBMS and Document Oriented Database. The comparison between DocumentDB and SQL Database is very similar.
10 things never to do with a relational database. I think the advantages of an RDBMS, and the scenarios it suits, are well known, but those of NoSQL are not.
Hope it helps.

SQL database design with some user defined fields

I'm developing a database schema to handle collection of data and later, reporting on this data.
After a requirements discussion, it seems that either an entity-attribute-value (EAV) solution, or a flat table solution would be alright - since the data is somewhat sparse but not highly sparse.
However, user defined fields will become a must in the future, but I understand that querying and optimizing an RDBMS with EAV tables can become complex.
I've taken a look at the discussion here, and I was thinking an option similar to option 1 would be possible. For example, have a number of set fields, then a number of spare fields that users can define the labels of.
In terms of reporting, is there any downside in using this approach rather than using EAV?
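To make the "set fields plus spare fields with user-defined labels" idea concrete, here is a small sketch using Python's sqlite3 module. The table names `sample` and `spare_label`, the column names, and the data are assumptions invented for the illustration, not part of any real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sample (
        id       INTEGER PRIMARY KEY,
        taken_on TEXT NOT NULL,   -- fixed field
        site     TEXT NOT NULL,   -- fixed field
        spare1   TEXT,            -- meaning is defined by the user
        spare2   TEXT
    );
    CREATE TABLE spare_label (
        column_name TEXT PRIMARY KEY,  -- 'spare1', 'spare2', ...
        label       TEXT NOT NULL      -- what the user calls that column
    );
""")
conn.execute("INSERT INTO spare_label VALUES ('spare1', 'Operator')")
conn.execute("INSERT INTO sample VALUES (1, '2020-01-05', 'North', 'Alice', NULL)")
conn.execute("INSERT INTO sample VALUES (2, '2020-01-06', 'South', 'Bob', NULL)")

# Whitelist of physical spare columns; user text is never interpolated into SQL.
SPARE_COLUMNS = ("spare1", "spare2")

def report(conn):
    """Return (headers, rows), showing only the spare columns that have a label."""
    labels = dict(conn.execute("SELECT column_name, label FROM spare_label"))
    used = [c for c in SPARE_COLUMNS if c in labels]
    columns = ["taken_on", "site"] + used
    headers = ["taken_on", "site"] + [labels[c] for c in used]
    rows = conn.execute(
        f"SELECT {', '.join(columns)} FROM sample ORDER BY id").fetchall()
    return headers, rows

headers, rows = report(conn)
```

Reporting stays a plain single-table SELECT, which is the main attraction over EAV; the limits are the fixed number of spare columns and the fact that they all share one data type.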
You will regret EAV, especially when it comes to reporting
Make sure you're aware of existing data model patterns before you try anything: Ready to use database model patterns
Familiarize yourself with Table Inheritance: How can you represent inheritance in a database?
Consider allowing users to modify their own schemas: https://martinfowler.com/bliki/UserDefinedField.html
EAV is almost always a really bad idea. If you still need custom fields after trying the above, use a blob type (like JSON or XML) with indexing: http://backchannel.org/blog/friendfeed-schemaless-mysql. Postgres's binary jsonb is fast and allows indexing/querying.
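A minimal sketch of the "blob with indexing" idea, assuming the application serializes each record to JSON and maintains a separate index table for any field it wants to search on. This is only in the spirit of the linked article, not its actual schema; the table names `entity` and `index_colour` and the sample records are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity (
        id   INTEGER PRIMARY KEY,
        body TEXT NOT NULL              -- whole record serialized as JSON
    );
    CREATE TABLE index_colour (         -- one index table per searchable field
        colour    TEXT NOT NULL,
        entity_id INTEGER NOT NULL REFERENCES entity(id),
        PRIMARY KEY (colour, entity_id)
    );
""")

def save(conn, record):
    """Store the record as a JSON blob and update the index table."""
    cur = conn.execute("INSERT INTO entity (body) VALUES (?)", (json.dumps(record),))
    entity_id = cur.lastrowid
    if "colour" in record:
        conn.execute("INSERT INTO index_colour VALUES (?, ?)",
                     (record["colour"], entity_id))
    return entity_id

def find_by_colour(conn, colour):
    """Look up records through the index table, then decode the blobs."""
    cur = conn.execute(
        "SELECT e.body FROM index_colour i JOIN entity e ON e.id = i.entity_id "
        "WHERE i.colour = ? ORDER BY e.id", (colour,))
    return [json.loads(body) for (body,) in cur]

save(conn, {"name": "widget", "colour": "red"})
save(conn, {"name": "gadget", "colour": "blue", "weight_kg": 1.5})  # extra custom field
save(conn, {"name": "gizmo"})                                       # no colour at all
red_items = find_by_colour(conn, "red")
```

Custom fields cost nothing to add because they live inside the blob; only fields that must be searchable need an index table (or, on Postgres, an index on the jsonb column).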

How Does NoSQL Scale Out Exactly?

I use SQL Server 2012. I have a database sharded across physical tiers by User ID. In my app User is an aggregate root (i.e., nothing about Users comes from or goes into my repository without the entire User coming or going). Everything about one particular User exists on one particular machine.
Is my system any less scalable than one that employs NoSQL? Or, can someone explain how NoSQL systems scale out across servers exactly? Wouldn't they have to shard in a similar manner to what I'm doing? We've all read that NoSQL enables scalability but at the moment I don't see how, say, MongoDB would benefit my architecture.
MongoDB allows you to scale in two ways: sharding and replication. I think you can do both in MS SQL Server.
What usually is different is the data model:
In a relational database, you typically have multiple tables that reference each other. Theoretically, you can do something similar with MongoDB by using multiple collections, however this is not the way it's typically done. Instead, in MongoDB, you tend to store all the data that belongs together in the same collection. So typically you have fewer collections than you would have tables in a relational database. This will often result in more redundancy (data is copied). You can try to do that in a relational database, but it's not quite so easy (there will be fewer tables, each having more columns).
MongoDB collections are more flexible than tables in that you don't need to define the data model up front (the exact list of columns / properties, the data types). This allows you to change the data model without having to alter the tables - the disadvantage is that you need to take this into account in the application (you can't rely on all rows / documents having the same structure). I'm not sure if you can do that in MS SQL Server.
In MongoDB, each document is a JSON object, so it's a tree and not a flat table. This allows more flexibility in the data model. For example, in an application I'm developing (Apache Jackrabbit Oak / MongoMK), for each property (column) we can store multiple values; one value for each revision. Doing that in a relational database is possible, but quite tricky.
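To illustrate the last two points, here is a plain-Python sketch of tree-shaped documents in which a property holds one value per revision, and of application code that tolerates documents with different structure. The document layout, the `r1`/`r2` revision keys, and the helper `latest` are invented for the illustration; this is not the actual storage format of Jackrabbit Oak or MongoMK.

```python
# Two documents from the same hypothetical collection. They do not share a
# structure: the second one has no "author" property at all.
documents = [
    {"_id": "/content/a",
     "title": {"r1": "Draft", "r2": "Final"},   # one value per revision
     "author": {"r1": "kim"}},
    {"_id": "/content/b",
     "title": {"r1": "Only"}},
]

def latest(document, prop, default=None):
    """Return the value of prop at its newest revision, or default if absent."""
    revisions = document.get(prop)
    if not revisions:
        return default
    # Revision keys look like "r1", "r2", ...; compare them numerically.
    newest = max(revisions, key=lambda r: int(r[1:]))
    return revisions[newest]

titles = [latest(d, "title") for d in documents]
authors = [latest(d, "author", default="unknown") for d in documents]
```

The default handling in `latest` is the price of the flexibility: the application, not the database, has to cope with missing properties.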

Normalisation and multi-valued fields

I'm having a problem with my students using multi-valued fields in Access and getting confused about normalisation as a result.
Here is what I can make out. Given a 1-to-many relationship, e.g.
Articles      Comments
--------      --------
artID{PK}     commID{PK}
text          text
              artID{FK}
Access makes it possible to store this information into what appears to be one table, something like
Articles
--------
artID{PK}
text
comment
+ value
"value" referring to multiple comment values for the comment "column", which Access actually stores as a separate table. The specifics of how the values are stored (the table, its PK and FK) are completely hidden, but it is possible to query the multi-valued field, e.g. in the example above with the query
INSERT INTO article( [comment].Value )
VALUES ('thank you')
WHERE artID = 1;
But the query doesn't quite reveal the underlying structure of the hidden table implementing the multi-valued field.
Given this (disaster, in my view) - my problem is how to help newcomers to database design and normalisation understand what Access is offering them, why it may not be helpful, and that it is not a reason to ignore the basics of the relational model. More specifically:
Are there better ways, besides queries as above, to reveal the structure behind multi-valued fields?
Are there good examples of where the multi-valued field is not good enough, and shows the advantage of normalising explicitly?
Are there straightforward ways to obtain the multi-select visual output of Access multi-values, but based on separate, explicit tables?
Thanks!
I cannot give you advice in using this feature, because I never used it; however, I can give you reasons not to use it.
I want to have full control over what I'm doing. This is not the case for multi-valued fields, therefore I don't use them.
This feature is not expandable. What if you want to add a date field to your comments, for instance?
It is sometimes necessary to upsize an Access (backend) database to a "big" database (SQL Server, Oracle). These databases don't offer such a feature. It is often the customer who decides which database has to be used. Recently I had to migrate an Access application (frontend) using an Oracle backend to a SQL Server backend because my client decided to drop his Oracle server. Therefore it is a good idea to restrict yourself to using only common features.
For common tasks like editing lookup tables I created generic forms. My existing solutions will not work with multi-valued fields.
I have a (self-made) tool that synchronizes changes in the structure of the database on my developer’s site with the database on the client’s site. This tool cannot deal with multi-valued fields.
I have tools for the security management that can grant SELECT, INSERT, UPDATE and DELETE rights on tables or revoke them. Again, the management tool does not work with multi-valued fields.
Having a separate table for the comments allows you to quickly inspect all the comments (by opening the table). You cannot do this with multi-valued fields.
You will not see the 1 to n relation between the articles and the comments in a database diagram.
With a separate table you can choose whether you want to cascade deletes to the details table or not. If you don't, you will not be able to delete an article as long as there are comments attached to it. This can be desirable, if you want to protect the comments from being deleted inadvertently.
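Several of these points can be seen in a small sketch with explicit tables. It uses Python's sqlite3 module rather than Access so that it is self-contained; the table and column names follow the question, while `posted_on` is a hypothetical extra column standing in for the "date field" mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE articles (
        artID INTEGER PRIMARY KEY,
        text  TEXT NOT NULL
    );
    CREATE TABLE comments (
        commID    INTEGER PRIMARY KEY,
        text      TEXT NOT NULL,
        posted_on TEXT,                 -- extra attribute, easy to add here
        artID     INTEGER NOT NULL
                  REFERENCES articles(artID) ON DELETE RESTRICT
    );
""")
conn.execute("INSERT INTO articles VALUES (1, 'First article')")
conn.execute(
    "INSERT INTO comments (text, posted_on, artID) VALUES ('thank you', '2012-03-01', 1)")

# With ON DELETE RESTRICT the article is protected while comments exist.
try:
    conn.execute("DELETE FROM articles WHERE artID = 1")
    delete_blocked = False
except sqlite3.IntegrityError:
    delete_blocked = True

# All comments can be inspected directly, independent of any article.
all_comments = conn.execute(
    "SELECT artID, text, posted_on FROM comments ORDER BY commID").fetchall()
```

Replacing `ON DELETE RESTRICT` with `ON DELETE CASCADE` would instead delete the comments together with their article; the point is that with an explicit table the choice is yours.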
It is important to realize the difference between physical and logical relationships. Today the whole internet and web services (SOAP) rely heavily on a data format that is multi-value in nature.
When you represent multi-value data with a relational database (such as Access), then behind the scenes you are using a traditional (and legitimate) relation. I cannot stress enough that, as such, the use of multi-value columns in Access is in fact a LEGITIMATE relational model.
The fact that the table is not exposed does not negate this. In fact, if you represent an invoice (master record, and repeating details) as an XML data cube, then we see three things:
1) you can build and represent that invoice with a relational database like Access
2) such a relational data model that is normalized can ALSO be represented as a SINGLE xml string.
3) deleting the XML record (or string) means that cascade delete of the child rows (invoice details) MUST occur.
So while it is true that Multi-Value fields were added to Access to deal with SharePoint, it is MOST important to realize that such data can be mapped to a relational database (if you could not do this, then Access could not consume that XML data using relational database tables as ACCESS CURRENTLY DOES RIGHT NOW).
And with web technologies such as XML and SharePoint, the need to consume, manage and utilize such data is not only widespread, but is in fact a basic staple of the internet.
As more and more data becomes complex in nature, we find the requirement for multi-value data exploding in use. Anyone who uses that so-called "fad", the internet, is thus relying on and using data that is in fact VERY OFTEN XML and is multi-value (complex) in nature.
As long as the logical (not physical) relational data model is kept, then use of multi-value columns to represent such data is possible and this is exactly what Access is doing (it is mapping the relational data model to a complex model). Note that the complex (XML) data model does NOT necessarily have to be relational in nature. However, if you ARE going to map such data to Access then the complex multi-value model MUST CONFORM TO A RELATIONAL data model.
This is EXACTLY what is occurring in Access.
The fact that such a correct and legitimate mathematical relational model is not exposed is of little consequence here. Are we to suggest that because Excel does not expose the binary codes used then users will never learn about computers? Or perhaps we all must program in assembler so we all correctly learn how computers work?
At the end of the day, who cares and why does this matter? The fact that people drive automatic cars today does not toss out the concept that they are using different gears to operate that car. The idea that we shut down all of society because someone is going to drive an automatic car, or in this case use complex data, would be galactically stupid on our part.
So keep in mind that extensions to SQL do exist in Access to query the multi-value data, but as well pointed out here those underlying tables are not exposed. However, as noted, exposing such tables would STILL REQUIRE one to not change or mess with cascade delete since that feature is required TO MAINTAIN AN INTERSECTION OF FEATURES and a CORRECT MATH relational model between the complex data model (XML) and that of using two related tables to represent such data.
In other words, you can use related tables to represent the complex data model IF YOU REMOVE the ability of users to play with the referential integrity options. The RI options MUST remain as set in those hidden tables, else such data will not be able to make the trip BACK to the XML or complex data model from which it was consumed.
As noted, teaching users how gasoline reacts with oxygen in order to drive a car, or forcing someone who uses a word processor to learn a relational model by exposing the underlying tables, makes little sense here.
However, the points made here in regards to such tables being exposed are legitimate concerns.
The REAL problem is that SQL Server, Oracle etc. cannot consume or represent that complex data WHILE ACCESS CAN CONSUME such data.
As noted, the complex data ship has LONG ago sailed! XML, SOAP, and the basic technologies of the internet are based on this complex data model.
In effect, the fact that SQL Server, Oracle and most databases cannot consume this multi-value data and represent it without users having to create and model such data in a relational fashion is a BIG shortcoming of SQL Server etc.
Access stands alone in this ability to consume this data.
So, for anyone who used a smartphone, iPad or the web, you are using basic technologies that are built around using complex data, something that Access now allows.
It is likely that the rest of the industry will have to follow suit given that more and more data is complex in nature. If the database industry does not change, then the mainstream traditional relational database system will NOT be the resting place of such data.
A trend away from storing data in related tables is occurring at a rapid pace right now, and products like SharePoint, or even Google Docs, are proof of this concept. So Access is only reacting to market pressures and it is likely that other database vendors will have to follow suit or simply give up on being part of the "fad" called the internet.
XML and complex data structures are a STAPLE and a fact of our industry right now – this is not an issue we all should run away from, but in fact embrace.
Albert D. Kallal (Access MVP)
Edmonton, Alberta Canada
kallal@msn.com
The technical discussion is interesting. I think the real problem lies in student understanding. Because it is available in Access, students will use it, and initially it will probably provide a simple solution to some design problems. The negatives will occur later when they try to use the data. Maybe a simple example demonstrating the problems would persuade some students to avoid using multi-valued fields? Maybe an example of storing the data in another, more usable format would help?
Good luck!
Peter Bullard
MS Access does a great job of simplifying database management and abstracting out a lot of complexity. This, however, makes learning DBMS concepts a bit difficult. Have you tried using other 'standard' DBMS tools like MySQL (or even SQLite)? From a learning perspective they may be better.
I know this post is old. But, it's not quite the same as every other post I've seen on this topic. This one has someone making a good case for using Multi Valued Fields...
As someone who is still trying very hard to get their head around Access, I find the discussion for and against using Multi Valued Fields incredibly frustrating.
I'm trying to sort through it all, but if everyone is so against them, what is an alternative method? It seems that in every search result I find everyone is either telling you how to use Multi Valued Fields and Controls or telling you how horrible and what a mistake they are. Many people refer to an alternative to them, but nobody says "Here's an example". I'm here to learn about these things. And while I know that this is a simpler concept for a lot of people in these forums, I could really use some examples to take a look at.
I'm at a point where I have to decide which way to go. It would be wonderful to compare examples of using Multi Valued Fields and alternatives and using a control to select multiple values.
Or am I wrong and the functionality of a combobox where you can select multiple items is only available through Access?
I want to address the last of your questions first. There is a way of providing a visual presentation of a parent child relationship. It's called subforms. If you get help about subforms in Access, it will explain the concept.
I have used subforms in a project where I wanted to display the transaction header in a form and the transaction details in a subform. There is nothing to hinder this construct even when the data is stored in two normalized tables.
Of course, this affects the screen, not the database. That's the whole point. Normalization is relevant to storage and retrieval, not to other uses of data.
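As a rough sketch of that separation between storage and display, the following uses two normalized tables and builds a one-row-per-article presentation in the query. It is written with Python's sqlite3 module instead of Access forms, so `group_concat` here is SQLite's aggregate, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (
        artID INTEGER PRIMARY KEY,
        text  TEXT NOT NULL
    );
    CREATE TABLE comments (
        commID INTEGER PRIMARY KEY,
        text   TEXT NOT NULL,
        artID  INTEGER NOT NULL REFERENCES articles(artID)
    );
    INSERT INTO articles VALUES (1, 'First article'), (2, 'Second article');
    INSERT INTO comments (text, artID) VALUES ('thank you', 1), ('nice read', 1);
""")

# Storage stays normalized; the "multi-value" look is produced only for display.
# The order of the concatenated comments is not guaranteed by SQLite.
rows = conn.execute("""
    SELECT a.artID, a.text, group_concat(c.text, '; ') AS all_comments
    FROM articles AS a
    LEFT JOIN comments AS c ON c.artID = a.artID
    GROUP BY a.artID
    ORDER BY a.artID
""").fetchall()
```

An Access subform plays the same role on screen: the header form is bound to the parent table and the subform to the child table, with the data still held in two explicit tables.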

Is SQL the ''assembler'' of the NoSQL database world?

I recently came across http://www.fossil-scm.org/index.html/doc/tip/www/theory1.wiki by D. Richard Hipp, the developer responsible for SQLite.
It got me thinking: is Fossil the only NoSQL database that uses SQL?
Do others use SQL as a 'High Level Scripting Language'?
From the article, it sounds like Fossil isn't a database any more than git is a database. Yes, it's a thing that contains data, and yes, it's backed by a database, but it seems pretty far from a database itself. So the first part of your question basically relies on a faulty assumption. There is a database called Friendly which uses MySQL to store schema-less models, but it seems like an awkward bandaid sort of solution at best.
I'm certainly not familiar with all of the NoSQL options out there, but, to my knowledge, none of the well-thought-of ones use SQL for anything. MongoDB and CouchDB, the two I'm most familiar with, both use JavaScript as part of their query interface, though in very different ways. MongoDB has queries more like what you'd expect from a relational database: you can write an arbitrary query for all documents that match a certain set of attributes. However, unlike a relational database, there's no such thing as a join (you'll only ever get a list of distinct documents back, not compound documents) and you can write arbitrary JavaScript code to select documents. CouchDB, on the other hand, does not allow arbitrary queries. Instead, you create views (which are essentially simpler key-value stores) using map/reduce functions written in JavaScript and then query those views from a start key to an end key.
In both cases, the type of information being transmitted to the server to perform the query isn't well-suited for the type of problem that SQL is good at solving. The trade-off to SQL being so high-level (to use the logic of the author of the paper) is that it's only suitable for a very narrow set of problems.
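The CouchDB-style model described above (a view built by a map function, then queried by key range) can be imitated in a few lines. This is a conceptual sketch in plain Python, not CouchDB's actual API: real views are written in JavaScript and stored on the server, and the names `map_posts_by_date`, `build_view` and `query` as well as the sample documents are invented here.

```python
from bisect import bisect_left, bisect_right

docs = [
    {"_id": "1", "type": "post", "date": "2011-01-03", "title": "A"},
    {"_id": "2", "type": "post", "date": "2011-02-10", "title": "B"},
    {"_id": "3", "type": "comment", "date": "2011-01-04", "body": "hi"},
    {"_id": "4", "type": "post", "date": "2011-01-20", "title": "C"},
]

def map_posts_by_date(doc):
    """Map function: emit (key, value) pairs for the documents of interest."""
    if doc.get("type") == "post":
        yield doc["date"], doc["title"]

def build_view(docs, map_fn):
    """A view is essentially a sorted list of (key, value, doc id) rows."""
    return sorted((key, value, doc["_id"])
                  for doc in docs
                  for key, value in map_fn(doc))

def query(view, start_key, end_key):
    """Return the rows whose key lies between start_key and end_key, inclusive."""
    keys = [row[0] for row in view]
    return view[bisect_left(keys, start_key):bisect_right(keys, end_key)]

view = build_view(docs, map_posts_by_date)
january_posts = query(view, "2011-01-01", "2011-01-31")
```

The query side can only ask for a key range; anything resembling a WHERE clause has to be decided in advance, inside the map function.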
The creator of Fossil / SQLite is working on and pushing UnQL as the NoSQL standard:
UnQL means Unstructured Query Language.
It's an open query language for JSON, semi-structured and document databases.
It looks like a stripped-down version of SQL.