I want to create a reference table. The number of rows and columns can grow (and become redundant) over time. The column names may change over time as well. Lets asume the value in Column 'Position' is 'IT Engineer', in row(n). For that specific row(n) further down in the column structure, lets say in column 'Behind Sheds', is a value I need to retrieve. If the number of columns was fixed, it was no issue, but now the columns are added dynamically with T-SQL, which is fairly easy to do, even renaming these columns. My question is, is it good practice to add columns dynamicaly for the example above, or is there an better alternative, and how?
Thanks.
may be better use something like addition table with fields: person_id (forign key to main table), person_param (as example 'Position'), person_value (as example 'IT Engineer')?
I believe your issue is the right for implementing the EAV model.
Basically instead of having 20 columns, you have 3:
Entity
Attribute (otherwise known as column name)
Value (might actually be more values, of different types, and you just populate the right one)
That way, for 10 entities in the database, having 20 columns each, you populate 200 rows of data in an EAV table. It has less performance, it's harder to query, but it does allow flexibility, so you can make each entity have different sets of attributes even.
Related
In our database we have many tables with a 'Notes' column. This is important functionality, but for most rows the value of Notes is null. These tables have many columns and we would like to remove some columns for better legibility.
We could add one Notes table for every table that has a notes column. But this would create clutter of a different kind- too many small tables.
My idea is to create a generic Notes table and also a reference table. The Notes table would have a column for the notes text, a column for the id of the row being linked to, and a foreign key to the reference table. The reference table would have a text value for each table for which we need notes. Using these two tables we should be able to link the note back to whichever table and column it came from.
By using this solution, we remove any cases of null values from notes and also slim down some of our tables. All at the modest price of two additional tables. It feels very 'hacky' to me however. Is there a reason why using a 'generic' id column or a reference table of other tables is a bad idea from a DB management perspective?
Managing the references to disparate entities can be really challenging in SQL Server. Postgres, by contrast, supports inheritance which makes this much simpler.
So, my recommendation is to add a notes column to every entity where you want notes. You an add a view to bring all the notes together if you need a view of all the notes.
This has minimal impact on performance or data size. There is no additional overhead for a varchar column, other than the additional NULL bit -- and that is pretty minimal.
IMO, the other solution of managing two tables doesn't bring in much efficiency but adds complexity to the solution. You should probably stick with the the notes column in the original table with datatype as varchar.
Generic id column is not bad inherently but the use of it generally gives smell of bad/hacky design.
Additionaly for SQL Server you can use sparse for the note columns to reduce size.
But i used a similary approach myself. (Note column needed for many columns to write info / changerequest / lockcomment. But normally never used).
Works fine and can be programmed genericaly in source.
But if you need only one comment column per table i wood prefer sparse
It might be a stupid question, but here goes:
Is it possible to make a dynamic table that's able to contain rows with variable number of columns and custom column names?
I have glanced over EAV-modelling, but it seems heavy. A real life example could be this:
Let's say I have a register with customers. But each customer might have different information to be entered. And depending on what you want to enter, it should be reflected in the database. (I.E. every customer has different columns)
Is this impossible/probable?
Update:
The standard approach (i.e. having a table with all needed columns and saving information only into columns that make sense for a particular customer while setting the remaining ones to NULL) doesn't work for me because what I want can't use 'fixed' column names. Example one customer might want CVR-number and another might want their phonenumber as a reference number. And a third might want some completely different information. So to avoid having a table containing 500 columns, I have now thought of making an extra table containing rows of column-data. Like so: Id, Name, Value, CustomerId. So when I want information for a customer, all I have to do is to iterate through this table with a specific customer Id.
my own edit!:
Sorry for troubling you with this simple SQL-issue! :-) Have a nice day...
You could model this as a one-to-many relationship between a Customer and a CustomerAttributes table. Something like:
**Customer table**
CustomerId
LastName
FirstName
...
**CustomerAttributes table**
CustomerId
AttributeName
AttributeValue
This is not possible in Sql-Server. As Marco says, you can store each customer's data in xml.
If all the columns are known ahead of time and some customers use one set and other customers use a different set, then sub-tables with each set of columns is the normal approach.
If the columns are not known ahead of time, then how would the data even be used? No code or reports could refer to it. Perhaps it should be stored unstructured in a general purpose 'Notes' field.
As far as I know it's not possible in standard relational databases, but you can take a look at schema-less databases called 'No-SQL' like MongoDB
The user wants to add new fields in UI dynamically. This new field should get stored in database and they should be allowed to perform CRUD on it.
Now I can do this by specifying a XML but I wanted a better way where these new columns are searchable. Also the idea of firing ALTER statement and adding a new column seems wrong.
Can anyone help me with a design pattern on database server side of how to solve this problem?
This can be approached using a key value system. You create a table with the primary key column(s) of the table you want to annotate, a column for the name of the attribute, and a column for its value. When you user wants to add an attribute (say height) to the record of person 123 you add a row to the new table with the values (123, 'HEIGHT', '140.5').
In general you cast the values to TEXT for storage but if you know all the attributes will be numeric you can choose a different type for the value column. You can also (not recommended) use several different value columns depending on the type of the data.
This technique has the advantage that you don't need to modify the database structure to add new attributes and attributes are only stored for those records that have them. The disadvantage is that querying is not as straightforward as if the columns were all in the main data table.
There is no reason why a qualified business user should not be allowed to add a column to a table. It is less likely to cause a problem than just about anything else you can imagine including adding a new row to a table or changing. the value of a data element.
Using either of the methods described above do not avoid any risk; they are simply throwbacks to COBOL filler fields or unnecessary embellishments of the database function. The result can still be unnormalized and inaccurate.
These same business persons add columns to spreadsheets and tables to Word documents without DBAs getting in their way.
Of course, just adding the column is the smallest part of getting an information system to work, but it is often the case that it is perceived to be an almost insurmountable barrier. It is in fact 5 min worth of work assuming you know where to put it. Adding a column to the proper table with the proper datatype is easy to do, easy to use, and has the best chance of encouraging data quality.
Find out what the maximum number of user-added fields will be and add them before hand. For example 'User1', 'User2', 'User3', 'User4'...etc. You can then enable the fields on the UI based on some configurable settings.
I am refactoring an old Oracle 10g schema to try to introduce some normalization. In one of the larger tables, there is a text field that has at most, 10-15 possible values. In my mind, it seems that this field is an example of unnecessary data duplication and should be extracted to a separate table.
After examining the data, I cannot find one relevant piece of information that could be associated with that text value. Basically, if I pulled that value out and put it into its own table, it would be the only field in that table. It exists today as more of a 'flag' field. Should I create a two-column table with a surrogate key, keep it as it is, or do something entirely different? Am I doing more harm than good by trying to minimize data duplication on this field?
You might save some space by extracting the column to a separate table. This is called a lookup table. It can give you a couple of other benefits:
You can declare a foreign key constraint to the lookup table, so you can rely on the column in the main table never having any value other than the 10-15 values you want.
It's easy to query for a concise list of all permitted values, by querying the lookup table. This can be faster than using SELECT DISTINCT on the main table's column. It also returns values that are permitted, but not currently used in the main table.
If you change a value in the lookup table, it automatically applies to all rows in the main table that reference it.
However, creating a lookup table with one column is not strictly normalization. You're just replacing one value with another. The attribute in the main table either already supports a normal form, or not.
Using surrogate keys (vs. natural keys) also has nothing to do with normalization. A lot of people make this mistake.
However, if you move other attributes into the lookup table, attributes that depend only on the lookup value and therefore would create repeating groups (violating 3NF) in the main table if you left them there, then that would be normalization.
If you want normalization break it out.
I think of these types of data in DBs as the equivalent of enums in C,C++,C#. Mostly you put them in the table as documentation.
I often have an ID, Name, Description, and auditing columns for them (eg modified by, modified date, create date, create by, active.) The description field is rarely used.
Example (some might say there are more than just 2)
Gender
ID Name Audit Columns...
1 Male
2 Female
Then in your contacts you would have a GenderID column which would link to this one.
Of course you don't "need" the table. You could have external documentation somewhere that says 1=Male, 2=Female -- but I think these tables serve to document a system.
If it's really a free-entry text field that's not re-used somewhere else in the database, and there's just a single field without repeated instances, I'd probably go ahead and leave it as it is. If you're determined to break it out I'd create a 'validation' table with a surrogate key and the text value, then put the surrogate key in the base table.
Share and enjoy.
Are these 10-15 values actually meaningful, or are they really just flags? If they're meaningful pieces of text and it seems wasteful to replicate them, then sure create a lookup table. But if they're just arbitrary flag values, then your new table will be nothing more than a mapping from one arbitrary value to another, and not terribly helpful.
A completely separate question is whether all or most of the rows in your big table even have a value for this column. If not, then indeed you have a good opportunity for normalization and can create a separate table linking the primary key from your base table with the flag value.
Edit: One thing. If there's some chance that one of these "flag" values is likely to be wholesale replaced with another value at some point in the future, that would be another good reason to create a table.
I have a design question.
I have to store approx 100 different attributes in a table which should be searchable also. So each attribute will be stored in its own column. The value of each attribute will always be less than 200, so I decided to use TINYINT as data type for each attribute.
Is it a good idea to create a table which will have approx 100 columns (Each of TINYINT)? What could be wrong in this design?
Or should I classify the attributes into some groups (Say 4 groups) and store them in 4 different tables (Each approx have 25 columns)
Or any other data storage technique I have to follow.
Just for example the table is Table1 and it has columns Column1,Column2 ... Column100 of each TINYINT data type.
Since size of each row is going to be very small, Is it OK to do what I explained above?
I just want to know the advantages/disadvantages of it.
If u think that it is not a good idea to have a table with 100 columns, then please suggest other alternatives.
Please note that I don;t want to store the information in composite form (e.g. few xml columns)
Thanks in advance
Wouldn't a many-to-many setup work here?
Say Table A would have a list of widget, which your attributes would apply to
Table B has your types of attributes (color, size, weight, etc), each as a different row (not column)
Table C has foreign keys to the widget id (Table A) and the attribute type (Table B) and then it actually has the attribute value
That way you don't have to change your table structure when you've got a new attribute to add, you simply add a new attribute type row to Table C
Its ok to have 100 columns. Why not? Just employ code generation to reduce handwriting of this columns.
I wouldn't worry much about the number of columns per se (unless you're stuck using some really terrible relational engine, in which case upgrading to a decent one would be my most hearty recommendation -- what engine[s] do you plan/need to support, btw?) but about the searchability thereby.
Does the table need to be efficiently searchable by the value of an attribute? If you need 100 indexes on that table, THAT might make insert and update operations slow -- how frequent are such modifications (vs reads to the table and especially searches on attribute values) and how important is their speed to you?
If you do "need it all" there just possibly may be no silver bullet of a "perfect" solution, just compromises among unpleasant alternatives -- more info is needed to weigh them. Are typical rows "sparse", i.e. mostly NULL with just a few of the 100 attributes "active" for any given row (just different subsets for each)? Is there (at least statistically) some correlation among groups of attributes (e.g. most of the time when attribute 12 is worth 93, attribute 41 will be worth 27 or 28 -- that sort of thing)?
BAsed on your last, it seems to me that you may have a bad design. WHat is the nature of these columns? Are you storing information together that shouldn't be together, are you storing information that shoul be in related tables?
So really what we need to best help you is to see what the nature of the data you have is.
what would be in
column1,column3,column10 vice column4,column15,column20,column25
I had a table with 250 columns. There's nothing wrong. For some cases, it's how it works.
unless some of the columns you are defining have a meaning "per se" as independent entities and they can be shared by multiple rows. In that case, it makes sense to normalize out the set of columns in a different table, and put a column in the original table (possibly with a foreign key constraint)
I think the correct way is to have a table that looks more like:
CREATE TABLE [dbo].[Settings](
[key] [varchar](250) NOT NULL,
[value] tinyint NOT NULL
) ON [PRIMARY]
Put an index on the key column. You can eventually make a page where the user can update the values.
Having done a lot of these in the real world, I don't understand why anyone would advocate having each variable be its own column. You have "approx 100 different attributes" so far, you don't think you are going to want to add and delete to this list? Every time you do it is a table change and a production release? You will not be able to build something to hand the maintenance off to a power user. Your reports are going to be hard-coded too? Things take off and you reach the max number of columns of 1,024 are you going to rework the whole thing?
It is nothing to expand the table above - add Category, LastEditDate, LastEditBy, IsActive, etc. or to create archiving functionality. Much more awkward to do this with the column based solution.
Performance is not going to be any different with this small amount of data, but to rely on the programmer to make and release a change every time the list changes is unworkable.