How to handle many columns / variable schema? - sql

I apologize in advance in case this question has been asked already.
I'm working on revamping a reporting application used within my company. The requirements are:
- Support the addition of new fields (done through the web app) and allow users to select those fields when building reports in the report builder. There are currently 300 of these fields, and their values are stored in a single SQL Server table with 300 columns. In other words, the schema is dynamic.
- Improve report generation performance.
My thought process was to split up these 300 (and potentially more) columns into multiple tables (normalization), but I'm not sure that's the right approach given there doesn't seem to be a logical way of grouping data without ending up with 20+ tables.
Another option would be to store values in rows (key, attribute, attribute-value) then do a pivot, but I'm not sure that would perform well. This option would handle the dynamic schema nicely, but the pivot statements would have to be built programmatically before a user can consume data (views).
Thanks!
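To make the row-based option concrete, here is a minimal sketch of building the pivot programmatically. It uses Python with SQLite as a stand-in for SQL Server, and conditional aggregation (MAX(CASE ...)) instead of the PIVOT operator. The table field_values, its columns, and the helper names are hypothetical and not taken from the question.

```python
import sqlite3


def quote_ident(name):
    # Quote an identifier so a user-defined field name is safe as a column alias.
    return '"' + name.replace('"', '""') + '"'


def build_pivot_sql(field_names):
    # One MAX(CASE ...) expression per selected field turns rows into columns.
    if not field_names:
        raise ValueError("select at least one field")
    cols = ",\n  ".join(
        "MAX(CASE WHEN field_name = ? THEN field_value END) AS " + quote_ident(n)
        for n in field_names
    )
    return (
        "SELECT record_id,\n  " + cols + "\n"
        "FROM field_values\n"
        "GROUP BY record_id\n"
        "ORDER BY record_id"
    )


def run_report(conn, selected_fields):
    # Field names are bound as parameters; only the quoted aliases are interpolated.
    sql = build_pivot_sql(selected_fields)
    return conn.execute(sql, list(selected_fields)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE field_values ("
    " record_id INTEGER NOT NULL,"
    " field_name TEXT NOT NULL,"
    " field_value TEXT,"
    " PRIMARY KEY (record_id, field_name))"
)
conn.executemany(
    "INSERT INTO field_values VALUES (?, ?, ?)",
    [(1, "region", "EMEA"), (1, "revenue", "1200"), (2, "region", "APAC")],
)

rows = run_report(conn, ["region", "revenue"])
```

A record that has no value for a selected field simply gets NULL in that column. Whether this performs well enough for 300+ fields would have to be measured on real data.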

Related

one table with a lot of rows or a lot of tables with a view? on SQL Server

My question is about which approach is more efficient for queries and inserts, since the number of records in my table will grow a lot.
I would like to know whether it is more efficient to keep all the data in a single table, or to partition it and use a view and triggers to read and insert records.
As already mentioned take a look at database normalization.
SQL is a way to work with relational databases and is built on the idea that we should have many tables that are linked with each other through relationships. Thus I recommend multiple tables, because you will be able to reuse data (for example a user's name and surname) through specific IDs rather than copying that data each time a user performs some action on your platform and you need to insert or update some information.
Hope this helps!

Using multiple values in a SQL column - Is it possible (or even a good idea)?

If you were creating a todo list app where users can sign up and manage their own todo list, would it be better to:
Option 1: Have a todo_lists table in the database. There will be a username, password, and todo list items column.
Option 2: When a user creates an account, their username and password will be recorded in a users table. Then, a new table would be created automatically, called (insert your username)_list, where their todo list items would be stored.
The problem with option 1 is I do not know if it is possible to store multiple values in a single column. (I couldn't find any good information on if SQL supports any sort of arrays.)
The problem with option 2 is I'm not sure if creating a new table for every user is bad performance-wise.
If the records all have the same structure, then they should be stored in the same table.
SQL is optimized to handle large tables, rather than a large number of parallel tables. There are also many other advantages when querying the data: if the data is in multiple tables, then you will need to use dynamic SQL for many otherwise simple queries.
There are a handful of situations where separate tables -- even separate tables in different databases -- are desirable. These mostly center on security-intensive applications and such considerations are probably not important for your needs.
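As a rough illustration of the single-table approach, here is a minimal sketch using Python with SQLite. The table and column names (users, todo_items, user_id and so on) are assumptions made for the example, not something given in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id            INTEGER PRIMARY KEY,
    username      TEXT NOT NULL UNIQUE,
    password_hash TEXT NOT NULL
);
-- One shared table for every user's items; user_id says who owns each row.
CREATE TABLE todo_items (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    title   TEXT NOT NULL,
    done    INTEGER NOT NULL DEFAULT 0
);
CREATE INDEX idx_todo_items_user ON todo_items(user_id);
""")

conn.executemany(
    "INSERT INTO users (id, username, password_hash) VALUES (?, ?, ?)",
    [(1, "alice", "<hash>"), (2, "bob", "<hash>")],
)
conn.executemany(
    "INSERT INTO todo_items (user_id, title) VALUES (?, ?)",
    [(1, "buy milk"), (2, "call mom"), (1, "write report")],
)


def items_for(conn, username):
    # The same static query works for every user; no dynamic SQL is needed.
    query = """
        SELECT t.title
        FROM todo_items t
        JOIN users u ON u.id = t.user_id
        WHERE u.username = ?
        ORDER BY t.id
    """
    return [row[0] for row in conn.execute(query, (username,))]


alice_items = items_for(conn, "alice")
```

Each todo item is its own row, so there is no need to store multiple values in one column, and adding a user is just an INSERT rather than a CREATE TABLE.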

Having all contact information in one table vs. using key-value tables

(NB. The question is not a duplicate for this, since I am dealing with an ORM system)
I have a table in my database to store all Contacts information. Some of the columns for each contact are fixed (e.g. Id, InsertDate and UpdateDate). In my program I would like to give the user the option to add or remove properties for each contact.
Now there are of course two alternatives here:
(1) Save it all in one table and add and remove entire columns when the user needs to;
(2) Create a key-value table to save each property alongside its type and connect the record to the user's id.
These alternatives are both doable. But I am wondering which one is better in terms of speed? In the program it will be a very common thing for the user to view the entire Contact list to check for updates. Plus, I am using an ORM framework (Microsoft's Entity Framework) to deal with database queries. So if the user is to add and remove columns from a table all the time, it will be a difficult task to map them to my program. But again, if alternative (1) is a significantly better option than (2), then I can reconsider the key-value option.
I have actually done both of these.
Example #1
Large, wide table with columns of data holding names, phone, address and lots of small integer values of information that tracked details of the clients.
Example #2
Many different tables separating out all of the Character Varying data fields, the small integer values etc.
Example #1 was a lot faster to code for but in terms of performance, it got pretty slow once the table filled with records. 5000 wasn't a problem. When it reached 50,000 there was a noticeable performance degradation.
Example #2 was built later in my coding experience and was built to resolve the issues found in Example #1. While it took more to get the records I was after (LEFT JOIN this and UNION that) it was MUCH faster as you could ultimately pick and choose EXACTLY what the client was after without having to search a massive wide table full of data that was not all being requested.
I would recommend Example #2 to fit your #2 in the question.
And your user-specified columns for their data set could be stored in a table of their own (depending on how many users you have, I suppose), which would allow you to draw on the table specific to that user. That would also give you unlimited ability to remove and add columns to suit that particular setup.
You could then also have another table which kept track of the custom columns in the custom column table, which would give you the ability to "recover" columns later, as in "Do you want to add this to your current column choices or to one of these columns you have deleted in the past".

A database in which the users can create data types

I have a really old application using an SQL database that I need to update. I would like to take also the opportunity to improve the database structure and I would appreciate some advice.
The basic problem is that an important part of the database must be user configurable without touching the code. To be more concrete, the DB stores products and these products have different specs (i.e. columns) depending on the type. The app must be able to search for any of the columns. There are only a few types (~20) but the administrator must be able to create a new one without touching the code.
The data that needs to be stored for each product are either strings or floats, and never more than 7 of each type.
Instead of creating an interface to create and delete tables, the following "solution" was implemented.
- In the Products Table, there is one column for the id; one column for the ProducTypeID; 7 string columns and 7 float columns
- In a ProducType table, there is one column for the ProducTypeID, and 14 string columns indicating the names of the 7 string columns and 7 float columns for each product type. If a product does not need so many columns, the column name is NULL
This works, but because of the extra indirection the client code is extremely annoying to maintain.
The question is: Should I stay with an SQL DB and add a way to create/delete tables, or should I use a NoSQL DB? What are the pros and cons in each case?
Keep in mind that in SQL databases, adding and removing columns on a large table can be a very expensive operation which can take minutes or even hours. Doing it on-the-fly is a really bad idea. Adding a bunch of "multi-purpose" columns to a table is not much better. It's hard to query and you have a limit on how many properties a product can have.
The usual by-the-book solution when each product has 0-n dynamic properties is to create a second table with the columns ProductID (primary key) | PropertyName (primary key) | PropertyValue, where ProductID and PropertyName together form the composite primary key. This allows each product to have any number of properties. You can easily JOIN it with the main products table to get all products with their properties.
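A minimal sketch of such a property table, using Python with SQLite, could look like the following. The snake_case table and column names and the sample products are made up for illustration, and storing every value as text is a simplification (the question has both strings and floats).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- One row per (product, property); the pair is the composite primary key.
CREATE TABLE product_properties (
    product_id     INTEGER NOT NULL REFERENCES products(id),
    property_name  TEXT NOT NULL,
    property_value TEXT,
    PRIMARY KEY (product_id, property_name)
);
""")
conn.executemany(
    "INSERT INTO products (id, name) VALUES (?, ?)",
    [(1, "resistor"), (2, "cable")],
)
conn.executemany(
    "INSERT INTO product_properties VALUES (?, ?, ?)",
    [(1, "ohms", "470"), (1, "tolerance", "5%"), (2, "length_m", "2.5")],
)


def products_with_properties(conn):
    # LEFT JOIN keeps products that have no properties at all.
    query = """
        SELECT p.name, pp.property_name, pp.property_value
        FROM products p
        LEFT JOIN product_properties pp ON pp.product_id = p.id
        ORDER BY p.id, pp.property_name
    """
    result = {}
    for name, prop, value in conn.execute(query):
        props = result.setdefault(name, {})
        if prop is not None:
            props[prop] = value
    return result


def find_by_property(conn, prop, value):
    # Searching on any property is one parameterized query, whatever the product type.
    query = """
        SELECT p.name
        FROM products p
        JOIN product_properties pp ON pp.product_id = p.id
        WHERE pp.property_name = ? AND pp.property_value = ?
        ORDER BY p.id
    """
    return [row[0] for row in conn.execute(query, (prop, value))]


catalog = products_with_properties(conn)
matches = find_by_property(conn, "ohms", "470")
```

Adding a new product type then needs no schema change: it is just new rows with new property names.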
If you are open to switching database technologies, you could also use a document-oriented NoSQL database such as MongoDB or CouchDB, which doesn't use a fixed schema. In such databases, each document in a collection can have a different set of fields. But before you decide to take this step, evaluate how such a database would affect other parts of your application. Listing everything that could be positively or negatively affected without knowing your whole application inside and out would be too broad of a question.

Dynamic SQL examples

I have lately learned what dynamic SQL is, and one of its most interesting features to me is that we can use dynamic column and table names. But I cannot think of useful real-life examples. The only one that came to mind is a statistical table.
Let's say that we have a table with name, type and created_data. Then we want a table whose columns are the years from the created_data column and whose rows hold, for each type, the number of names created in each year. (sorry for my English)
What are other useful real-life examples of using dynamic SQL with columns and tables as parameters? How do you use it?
Thanks for any suggestions and help :)
regards
Gabe
/edit
Thanks for the replies. I am particularly interested in examples that do not involve administrative tasks, database conversion or anything like that. I am looking for examples where the code in, for example, Java is more complicated than using dynamic SQL in, for example, a stored procedure.
An example of dynamic SQL is to fix a broken schema and make it more usable.
For example if you have hundreds of users and someone originally decided to create a new table for each user, you might want to redesign the database to have only one table. Then you'd need to migrate all the existing data to this new system.
You can query the information schema for table names with a certain naming pattern or containing certain columns then use dynamic SQL to select all the data from each of those tables then put it into a single table.
INSERT INTO users (name, col1, col2)
SELECT 'foo', col1, col2 FROM user_foo
UNION ALL
SELECT 'bar', col1, col2 FROM user_bar
UNION ALL
...
Then hopefully after doing this once you will never need to touch dynamic SQL again.
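A minimal sketch of that one-off migration, written in Python against SQLite, might look like this. SQLite's sqlite_master catalog stands in for the information schema here, and the user_foo / user_bar tables with col1 and col2 are sample data invented to match the query above.

```python
import sqlite3


def quote_ident(name):
    # Table names cannot be bound as parameters, so quote them before interpolating.
    return '"' + name.replace('"', '""') + '"'


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_foo (col1 TEXT, col2 INTEGER);
CREATE TABLE user_bar (col1 TEXT, col2 INTEGER);
CREATE TABLE users (name TEXT NOT NULL, col1 TEXT, col2 INTEGER);
INSERT INTO user_foo VALUES ('a', 1), ('b', 2);
INSERT INTO user_bar VALUES ('c', 3);
""")


def migrate(conn):
    # Find the per-user tables by naming pattern; '!' escapes the literal underscore.
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master"
            " WHERE type = 'table' AND name LIKE ? ESCAPE '!'"
            " ORDER BY name",
            ("user!_%",),
        )
    ]
    for table in tables:
        owner = table[len("user_"):]
        # The statement text is built at run time: this is the dynamic SQL part.
        conn.execute(
            "INSERT INTO users (name, col1, col2)"
            " SELECT ?, col1, col2 FROM " + quote_ident(table),
            (owner,),
        )
    return tables


migrated = migrate(conn)
rows = conn.execute(
    "SELECT name, col1, col2 FROM users ORDER BY name, col1"
).fetchall()
```

This runs one INSERT ... SELECT per source table instead of a single UNION ALL statement, which keeps the generated SQL short; the result is the same.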
Long ago I worked with an application where users had their own tables in a common database.
Imagine that each user can create their own table in the database from the UI. To access the data in these tables, a developer needs to use dynamic SQL.
I once had to write an Excel import where the Excel sheet was not like a CSV file but laid out like a matrix. So I had to deal with an unknown number of columns for 3 temporary tables (columns, rows, "infield"). The rows were also a short form of a tree. Sounds weird, but it was fun to do.
In SQL Server there was no chance to handle this without dynamic SQL.
Another example from a situation I recently came up against. A MySQL database of about 250 tables, all using the MyISAM engine, with no database design schema, chart or other explanation at all (well, except the not so helpful table and column names).
To plan for conversion to InnoDB and find possible foreign keys, we either had to manually check all queries (and the conditions used in JOIN and WHERE clauses) created from the web frontend code, or make a script that uses dynamic SQL to check all combinations of columns with compatible data types and compare the data stored in those column combinations (and then manually accept or reject these possibilities).
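The script option could be sketched roughly as follows, here in Python against SQLite rather than MySQL. The customers / orders tables are invented sample data, "compatible data type" is simplified to "same declared type", and a pair is reported when every non-NULL value in one column also appears in the other.

```python
import sqlite3


def quote_ident(name):
    # Identifiers are interpolated into generated SQL, so quote them.
    return '"' + name.replace('"', '""') + '"'


def list_columns(conn):
    # Collect (table, column, declared type) for every table in the database.
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    cols = []
    for table in tables:
        info = conn.execute("PRAGMA table_info(" + quote_ident(table) + ")")
        for _cid, name, ctype, *_rest in info:
            cols.append((table, name, ctype.upper()))
    return cols


def fk_candidates(conn):
    cols = list_columns(conn)
    found = []
    for ct, cc, ctype in cols:          # possible referencing (child) column
        for pt, pc, ptype in cols:      # possible referenced (parent) column
            if ct == pt or ctype != ptype:
                continue
            non_null = conn.execute(
                "SELECT COUNT(" + quote_ident(cc) + ") FROM " + quote_ident(ct)
            ).fetchone()[0]
            # Count child values that have no matching parent value.
            orphans = conn.execute(
                "SELECT COUNT(*) FROM " + quote_ident(ct) + " c"
                " WHERE c." + quote_ident(cc) + " IS NOT NULL"
                " AND NOT EXISTS (SELECT 1 FROM " + quote_ident(pt) + " p"
                " WHERE p." + quote_ident(pc) + " = c." + quote_ident(cc) + ")"
            ).fetchone()[0]
            if non_null > 0 and orphans == 0:
                found.append((ct, cc, pt, pc))
    return found


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER, name TEXT);
CREATE TABLE orders (id INTEGER, customer_id INTEGER, note TEXT);
INSERT INTO customers VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO orders VALUES (10, 1, 'x'), (11, 3, 'y');
""")

candidates = fk_candidates(conn)
```

On real data this produces false positives (small integer columns tend to match each other), which is why the manual accept-or-reject step is still needed.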