Newb SQL question: Is there a "foreign key" equivalent for referencing columns, or is it logical to store column names as strings in another table? - sql

Disclaimer: I'm an SQL newb, just discovering the great possibilities of sqlite embedded in applications.
I'm designing a small database for an application that primarily handles data storage for plotting. Say I have several instrument data tables (one for each type of instrument) with a column for each data field, such as pressure, temperature, humidity, etc. I would like to join these data tables, pull out the column names, and assign a more human-readable column "description" made available to the user for selecting the independent/dependent variables for plotting. Getting the column names in sqlite is easy enough, but I'm not sure if I should store the resulting names (and longer descriptions) as another table in the db or in a list of some sort in the application. If it were another table in the db, I think I would have to store the column name as a string... unless there is some way to reference a column in SQL? I'm not sure if there is a foreign key equivalent for columns, or if this is just a sign of bad database design needing restructuring.
Thanks for any suggestions, stack overflowers.

There are ways to do introspection in SQL and pull out table and column names, but you probably don't want to do it that way, and the implementation details of that are highly dependent on your particular RDBMS.
Since the human-readable column name is a display detail, and likely a static value, keep it out of the database and somewhere in the display/rendering portion of your app. (There are exceptions to this, such as if you want the column name to be customizable per-user.) You can keep a hash or dictionary which defines a human-friendly name and uses the actual database column name as its key. This setup can be handy for saving other settings for the display of the column, such as datatype, precision to display (if a number), etc.

What you are after is sometimes called a 'data dictionary'. If the DBMS does not provide annotation facilities, then you will need to simulate them, and that is likely to include creating a table with table names as well as column names in it. It is perfectly reasonable to do that.
Be aware of the difficulties with keeping track of changes to table names and added or deleted columns; the built-in facilities only maintain the system catalog (the 'data dictionary' information the DBMS itself needs to track what goes on) and will not keep your own annotation table in sync.
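For instance, a minimal data dictionary table in SQLite might look like the sketch below (the table and column names are only illustrative):

CREATE TABLE column_descriptions (
    table_name  TEXT NOT NULL,  -- which instrument data table the column lives in
    column_name TEXT NOT NULL,  -- the actual column name, stored as a string
    description TEXT NOT NULL,  -- human-readable label shown in the plotting UI
    PRIMARY KEY (table_name, column_name)
);

INSERT INTO column_descriptions VALUES
    ('barometer_data', 'press_mb', 'Pressure (mbar)'),
    ('barometer_data', 'temp_c',   'Temperature (deg C)');

There is no foreign key that can point at a column, so the name really is just a string; if a column is renamed or dropped, this table must be updated by hand (or by your application).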

Related

SQL: Best way to conditionally relate multiple tables to a single table based on row value in the main table

In a nutshell, here's the situation: we have a database that is used to build a hierarchy of "locations". (Example: Street Address > Building 1 > First Floor > Room).
Each location is stored in a table. Each location can be of a different "type". The types are defined in another table. (We use the types of locations to restrict what locations can be added to a location).
Here's the quandary we are facing: We need to be able to store different types of information for different types of locations. For example, a location type of "building" may need to have the address stored, whereas a location type of "room" may need to have dimensions or paint color stored.
Obviously we could create a table for each location type we define to hold the properties required for the particular location type and then use application logic to query the appropriate table to pull in a particular location's additional information. Is there a more elegant or practical way to accomplish this relationally in the database without having to rely on application logic?
Thanks!
The relationally pure way to do this would be to implement your initial suggestion i.e. have a separate table such as BUILDING_LOCATIONS that holds the attributes that are only applicable to this type of location. A different [TYPE]_LOCATIONS table would be created for each type of location that has its own attributes. In this way you could use standard database constraint functionality to ensure the integrity of the data in these tables.
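A rough sketch of that layout (table and column names are illustrative, not taken from your schema):

CREATE TABLE locations (
    location_id   INT PRIMARY KEY,
    parent_id     INT REFERENCES locations(location_id),
    location_type VARCHAR(20) NOT NULL,  -- 'BUILDING', 'ROOM', ...
    name          VARCHAR(100) NOT NULL
);

-- attributes that apply only to buildings
CREATE TABLE building_locations (
    location_id INT PRIMARY KEY REFERENCES locations(location_id),
    address     VARCHAR(200) NOT NULL
);

-- attributes that apply only to rooms
CREATE TABLE room_locations (
    location_id INT PRIMARY KEY REFERENCES locations(location_id),
    dimensions  VARCHAR(50),
    paint_color VARCHAR(30)
);

The shared key means each location has at most one row of type-specific attributes, and ordinary constraints keep that data honest.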
Another method would be to add a series of nullable columns to the LOCATIONS table such as BUILDING_ADDRESS and ROOM_DIMENSIONS. This is not relationally pure as it means null values can exist in this table. However, you can still use standard database functionality to ensure the integrity of the data. It can be a bit more convoluted if certain values are mandatory in certain situations. Also, if there are many types of location with many differing attributes the number of columns in the table can become unwieldy.
Another method is the Entity-Attribute-Value model. Generally, this is to be avoided if at all possible. It is not relationally pure, as your column values are now no longer defined over domains, and it is extremely difficult, if not impossible, to ensure the integrity of the data. Any real attempt to do so will require a lot of bespoke coding (which needs to be carefully implemented to cater for things like concurrency control that database constraints give you for free) as you cannot use standard database constraints. However, if you are just interested in storing values for information and not doing anything with them you could use this method.
The EAV method does have a danger: because it appears so easy to add attributes to an entity, it becomes the default way of doing so. It then gets used for attributes on which vital processing depends and, because you cannot ensure the integrity of the data with this method, you find the values being used are meaningless and the whole logical basis for that processing is destroyed.

Design Pattern to add columns in database table dynamically

The user wants to add new fields in the UI dynamically. These new fields should be stored in the database, and users should be allowed to perform CRUD on them.
I could do this by storing XML, but I wanted a better way where these new columns are searchable. Also, the idea of firing an ALTER statement to add a new column seems wrong.
Can anyone help me with a design pattern, on the database server side, for solving this problem?
This can be approached using a key/value system. You create a table with the primary key column(s) of the table you want to annotate, a column for the name of the attribute, and a column for its value. When your user wants to add an attribute (say height) to the record of person 123, you add a row to the new table with the values (123, 'HEIGHT', '140.5').
In general you cast the values to TEXT for storage but if you know all the attributes will be numeric you can choose a different type for the value column. You can also (not recommended) use several different value columns depending on the type of the data.
This technique has the advantage that you don't need to modify the database structure to add new attributes and attributes are only stored for those records that have them. The disadvantage is that querying is not as straightforward as if the columns were all in the main data table.
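A minimal sketch of such a key/value table, assuming a person table with an integer person_id (names are illustrative):

CREATE TABLE person_attributes (
    person_id       INT NOT NULL REFERENCES person(person_id),
    attribute_name  VARCHAR(50) NOT NULL,
    attribute_value TEXT,  -- values cast to text for storage
    PRIMARY KEY (person_id, attribute_name)
);

-- "add a height attribute to the record of person 123"
INSERT INTO person_attributes VALUES (123, 'HEIGHT', '140.5');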
There is no reason why a qualified business user should not be allowed to add a column to a table. It is less likely to cause a problem than just about anything else you can imagine, including adding a new row to a table or changing the value of a data element.
Using either of the methods described above does not avoid any risk; they are simply throwbacks to COBOL filler fields or unnecessary embellishments of the database function. The result can still be unnormalized and inaccurate.
These same business persons add columns to spreadsheets and tables to Word documents without DBAs getting in their way.
Of course, just adding the column is the smallest part of getting an information system to work, but it is often the case that it is perceived to be an almost insurmountable barrier. It is in fact 5 min worth of work assuming you know where to put it. Adding a column to the proper table with the proper datatype is easy to do, easy to use, and has the best chance of encouraging data quality.
Find out what the maximum number of user-added fields will be and add them beforehand. For example 'User1', 'User2', 'User3', 'User4'... etc. You can then enable the fields in the UI based on some configurable settings.

How can I design a DB where the user can define the fields and types of a detail table in an M-D relationship?

My application has one table called 'events' and each event has approx 30 standard fields, but also user defined fields that could be any name or type, in an 'eventdata' table. Users can define these event data tables, by specifying x number of fields (either text/double/datetime/boolean) and the names of these fields. This 'eventdata' (table) can be different for each 'event'.
My current approach is to create a lookup table for the definitions. So if I need to query all 'event' and 'eventdata' per record, I do so in an M-D relationship using two queries (i.e. select * from events, then for each record in 'events', select * from 'some table').
Is there a better approach to doing this? I have implemented this so far, but most of my queries require two distinct calls to the DB - I cannot simply join my master 'events' table with different 'eventdata' tables for each record in 'events'.
I guess my main question is: can I join my master table with different detail tables for each record?
E.g.
SELECT E.*, E.Tablename
FROM events E
LEFT JOIN 'E.tablename' T ON E._ID = T.ID
If not, is there a better way to design my database, considering I have no idea how many user-defined fields there may be or what type they will be?
There are four ways of handling this.
Add several additional fields named "Custom1", "Custom2", "Custom3", etc. These should have a datatype of varchar(?) or similar.
Add a field to hold the unstructured data (like an XML column).
Create a table of name/value pairs which are associated with some type of template. Let them manage the template. You'll have to use pivot tables or similar to get the data out.
Use a database like MongoDB or another NoSql style product to store this.
That said, the first one has the advantage of being fast but limits the number of custom fields to the number you defined. Older mainframe-type applications work this way. SalesForce CRM used to.
The second option means that each record can have its own custom fields. However, depending on your database there are definite challenges here. Tried this, don't recommend it.
The third one is generally harder to code for but allows for extreme flexibility. SalesForce and other applications have gone this route; including a couple I'm responsible for. The downside is that Microsoft apparently acquired a patent on doing things this way and is in the process of suing a few companies over it. Personally, I think that's bullcrap; but whatever. Point is, use at your own risk.
The fourth option is interesting. We've played with it a bit and the performance is great while coding is pretty darn simple. This might be your best bet for the unstructured data.
Those types of joins won't work because you would need to pivot the eventdata table to turn its rows into columns. How to do that depends on which database technology you are using.
Here is an example with MySQL: How to pivot a MySQL entity-attribute-value schema
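As a rough illustration of the idea, assuming a single eventdata table laid out as name/value rows (event_id, field_name, field_value), conditional aggregation turns those rows into columns:

SELECT e._ID,
       MAX(CASE WHEN d.field_name = 'severity' THEN d.field_value END) AS severity,
       MAX(CASE WHEN d.field_name = 'reporter' THEN d.field_value END) AS reporter
FROM events e
LEFT JOIN eventdata d ON d.event_id = e._ID
GROUP BY e._ID;

The catch is that the list of output columns must be known (or built with dynamic SQL) before the query runs.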
My approach would be to avoid using a different table for each event, if that's possible.
I would use something like:
Event (EventId, ..., ...)
EventColumnType (EventColumnTypeId, EventTypeId, ColumnName)
EventColumnData (EventColumnTypeId, Data)
You are then limited in the types of data you can store (everything would have to be strings, for example), but the number of events and columns is unrestricted.
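Spelled out as DDL, that outline might look like the sketch below; the types are assumptions, and an EventId column is added to EventColumnData so each value ties back to its event:

CREATE TABLE Event (
    EventId     INT PRIMARY KEY,
    EventTypeId INT NOT NULL
    -- ...plus the ~30 standard fields...
);

CREATE TABLE EventColumnType (
    EventColumnTypeId INT PRIMARY KEY,
    EventTypeId       INT NOT NULL,
    ColumnName        VARCHAR(100) NOT NULL
);

CREATE TABLE EventColumnData (
    EventId           INT NOT NULL REFERENCES Event(EventId),  -- assumed addition
    EventColumnTypeId INT NOT NULL REFERENCES EventColumnType(EventColumnTypeId),
    Data              VARCHAR(255),  -- everything stored as a string
    PRIMARY KEY (EventId, EventColumnTypeId)
);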
What I'm getting from your description is you have an event table, and then a separate EventData table for each and every event.
Rather than that, why not have a single EventCustomFields table that contains a foreign key to the event table, a field name (event+field being the PK) and a field value?
Sure it's not the best. You'd be stuck serializing the value or storing everything as a string. And you'd still be stuck doing two queries, one for the event table and one to get its custom fields, but at least you wouldn't have a new table for every event in the system (yuck x10).
Another (arguably worse) option is to serialize the custom fields into a single column and then deserialize when you need them. So your query would be something like
Select E.*, C.*
From events E, customFields C
Where E.ID = C.ID
Is it possible to just impose a limit on your users? I know the tables underneath Sharepoint 2007 had a bunch of columns for custom data that were just named like CustomString1, CustomDate2, etc. That may end up easier than some of the approaches above, where everything is in one column (though that's an approach I've taken as well), and I would think it would scale up better.
The answer to your main question is: no. You can't have different rows in the result set with different columns. The result set is kind of like a table, so each row has to have the same columns. You can fake it with padding and dummy columns, but that's probably not much better.
You could try defining a fixed event data table, with (say) ten of each type of column. Then you'd store the usage metadata in a separate table and just read that in at system startup. The metadata would tell you that event type "foo" has a field "name" mapped to column string0 in the event data table, a field named "reporter" mapped to column string1, and a field named "reportDate" mapped to column date0. It's ugly and wastes space, but it's reasonably flexible. If you're in charge of the database, you can even define a view on the table so to the client it looks like a "normal" table. If the clients create their own tables and just stick the table name in the event record, then obviously this won't fly.
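As a rough sketch of that idea (all names below are made up for illustration):

-- wide, generic data table with a fixed pool of typed columns
CREATE TABLE event_data (
    event_id INT PRIMARY KEY,
    string0  VARCHAR(255), string1 VARCHAR(255),
    date0    DATETIME,     date1   DATETIME,
    num0     FLOAT,        num1    FLOAT
);

-- metadata read at startup: which logical field lives in which physical column
CREATE TABLE event_field_map (
    event_type  VARCHAR(50) NOT NULL,
    field_name  VARCHAR(50) NOT NULL,
    column_name VARCHAR(20) NOT NULL,  -- e.g. 'string0', 'date0'
    PRIMARY KEY (event_type, field_name)
);

-- optional per-type view so clients see friendly names
CREATE VIEW foo_events AS
SELECT event_id, string0 AS name, string1 AS reporter, date0 AS reportDate
FROM event_data;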
If you're really hardcore you can write a database procedure to query the table structures and serialize everything to a list of key/type/value tuples and return that in one long string as the last column, but that's probably not much handier than what you're doing now.

SQL Server - Schema/Code Analysis Rules - What would your rules include?

We're using Visual Studio Database Edition (DBPro) to manage our schema. This is a great tool that, among the many things it can do, can analyse our schema and T-SQL code based on rules (much like what FxCop does with C# code), and flag certain things as warnings and errors.
Some example rules might be that every table must have a primary key, no underscores in column names, every stored procedure must have comments, etc.
The number of rules built into DBPro is fairly small, and a bit odd. Fortunately DBPro has an API that allows the developer to create their own. I'm curious as to the types of rules you and your DB team would create (both schema rules and T-SQL rules). Looking at some of your rules might help us decide what we should consider.
Thanks - Randy
Some of mine. Not all could be tested programmatically:
No hungarian-style prefixes (like "tbl" for table, "vw" for view)
If there is any chance this would ever be ported to Oracle, no identifiers longer than 30 characters.
All table and column names expressed in lower-case letters only
Underscores between words in column and table names--we differ on this one obviously
Table names are singular ("customer" not "customers")
Words that make up table, column, and view names are not abbreviated, concatenated, or acronym-based unless necessary.
Indexes will be prefixed with “IX_”.
Primary Keys are prefixed with “PK_”.
Foreign Keys are prefixed with “FK_”.
Unique Constraints are prefixed with “UC_”.
I suspect most of my list would be hard to put in a rules engine, but here goes:
If possible I'd have it report any tables that are defined as wider than the bytes that can be stored in a record (excluding varchar(max) and text type fields) and/or a datapage.
I want all related PK and FK columns to have the same name if at all possible. The only time it isn't possible is when you need to have two FKs in the same table relating to one PK and even then, I would name it the name of the PK and a prefix or suffix describing the difference. For instance if I had a PersonID PK and a table needed to have both the sales rep id and the customer id, they would be CustomerPersonID, and RepPersonID.
I would check to make sure all FKs have an index.
I would want to know about all fields that are required but have no default value. Depending on what it is, you may not want to define a default, but I would want to be able to easily see which ones don't have one, to hopefully find the ones that should have a default.
I would want all triggers checked to see that they are set-based and not designed to run one row at a time.
No table without a defined Unique index or PK. No table where the PK is more than one field. No table where the PK is not an int.
No object names that use reserved words for the database I'm using.
No fields with the word Date as part of the name that are not defined as date or datetime.
No table without an associated audit table.
No field called SSN, SocialSecurityNumber, etc. that is not encrypted. Same for any field named CreditCardNumber.
No user defined datatypes (In SQL Server at least, these are far more trouble than they are worth.)
No views that call other views. Experience has shown me these are often a performance disaster waiting to happen. Especially if they layer more than one layer deep.
If using replication, no table without a GUID field.
All tables should have a DateInserted field and InsertedBy field (even with auditing, it is often easier to research data problems if this info is easily available.)
Consistent use of the same case in naming. It doesn't matter which as long as all use the same one.
No tables with a field called ID. Hate these with a passion. They are so useless. ID fields should be named tablenameID if a PK and with the PK name if an FK.
No spaces or special characters in object names. In other words if you need special handling for the database to recognize it in the correct context in query, don't use it.
If it is going to analyze code as well, I'd want to see any code that uses a cursor or a correlated subquery. Why create performance problems from the start?
I would want to see if a proc uses dynamic SQL and, if so, whether it has an input bit variable called Debug (and code to only print the dynamic SQL statement and not execute it, if the Debug variable is set to 1).
I'd want to be able to check that if there is more than one statement causing action in the database (insert/update/delete), there is also an explicit transaction in the proc and error trapping to roll the whole thing back if any part of it fails.
I'm sure I could think of more.
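Some of these can be checked straight from the system catalog; for example, a sketch (SQL Server) of the "no table without a PK" rule:

-- tables that have no primary key constraint
SELECT t.name AS table_name
FROM sys.tables AS t
WHERE NOT EXISTS (
    SELECT 1
    FROM sys.key_constraints AS kc
    WHERE kc.parent_object_id = t.object_id
      AND kc.type = 'PK'
);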

What is the preferred way to store custom fields in a SQL database?

My friend is building a product to be used by different independent medical units.
The database stores a vast collection of measurements taken at different times, like the temperature, blood pressure, etc...
Let us assume these are held in a table called exams with columns temperature, pressure, etc... (as well as id, patient_id and timestamp). Most of the measurements are stored as floats, but some are of other types (strings, integers...)
While many of these measurements are handled by their product, it needs to allow the different medical units to record and process other custom measurements. A very nifty UI allows the administrator to edit these custom fields, specify their name, type, possible range of values, etc...
He is unsure as to how to store these custom fields.
He is leaning towards a separate table (say a table custom_exam_data with fields like exam_id, custom_field_id, float_value, string_value, ...)
I worry that this will make searching both more difficult to achieve and less efficient.
I am leaning towards modifying the exam table directly (while avoiding conflicts on column names with some scheme like prefixing all custom fields with an underscore or naming them custom_1, ...)
He worries about modifying the database dynamically and having different schemas for each medical unit.
Hopefully some people with more experience can weigh in on this issue.
Notes:
he is using Ruby on Rails, but I think this question is pretty much framework agnostic, except for the fact that he is looking for solutions in SQL databases only.
I simplified the problem a bit since the custom fields need to be available for more than one table, but I believe this doesn't really impact the direction to take.
(added) A very generic reporting module will need to search, sort, generate stats, etc. on this data, so it is required that this data be stored in columns of the appropriate type
(added) User inputs will be filtered, for the standard fields as well as for the custom fields. For example, numbers will be checked within a given range (can't have a temperature of -12 or +444), etc... Thus, conversion to the appropriate SQL type is not a problem.
I've had to deal with this situation many times over the years, and I agree with your initial idea of modifying the DB tables directly, and using dynamic SQL to generate statements.
Creating string UserAttribute or Key/Value columns sounds appealing at first, but it leads to the inner-platform effect where you end up having to re-implement foreign keys, data types, constraints, transactions, validation, sorting, grouping, calculations, et al. inside your RDBMS. You may as well just use flat files and not SQL at all.
SQL Server provides INFORMATION_SCHEMA views that let you query table schemas at runtime, and ordinary DDL lets you create and modify them. This gives you full type checking, constraints, transactions, calculations, and everything you need already built in; don't reinvent it.
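For instance (the exams table and the new column below are just for illustration):

-- discover existing columns at runtime
SELECT COLUMN_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'exams';

-- add a custom field as a real, typed, constrained column
ALTER TABLE exams
    ADD blood_sugar DECIMAL(5,2) NULL
        CONSTRAINT ck_exams_blood_sugar CHECK (blood_sugar BETWEEN 0 AND 50);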
It's strange that so many people come up with ad-hoc solutions for this when there's a well-documented pattern for it:
Entity-Attribute-Value (EAV) Model
Two alternatives are XML and Nested Sets. XML is easier to manage but generally slow. Nested Sets usually require some type of proprietary database extension to do without making a mess, like CLR types in SQL Server 2005+. They violate first-normal form, but are nevertheless the fastest-performing solution.
Microsoft Dynamics CRM achieves this by altering the database design each time a change is made. Nasty, I think.
I would say a better option would be to consider an attribute table. Even though these are often frowned upon, it gives you the flexibility you need, and you can always create views using dynamic SQL to pivot the data out again. Just make sure you always use LEFT JOINs and FKs when creating these views, so that the Query Optimizer can do its job better.
I have seen a use of your friend's idea in a commercial accounting package. The table was split into two: the first contained fields solely defined by the system, the second contained fields like USER_STRING1, USER_STRING2, USER_FLOAT1, etc. The tables were linked by identity value (when a record is inserted into the main table, a record with the same identity is inserted into the second one). Each table that needed user fields was split like that.
Well, whenever I need to store some unknown type in a database field, I usually store it as String, serializing it as needed, and also store the type of the data.
This way, you can have any kind of data, working with any type of database.
I would be inclined to store the measurement in the database as a string (varchar), with another column identifying the measurement type. My reasoning is that it will presumably come from the UI as a string, and casting it to any other datatype may introduce corruption before the user input gets stored.
The downside is that when you go to filter result sets by some measurement metric you will still have to perform a cast, but at least the storage and persistence mechanism is not introducing corruption.
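For instance, filtering on a numeric measurement stored as a string looks roughly like this (table and column names are only illustrative):

SELECT exam_id, value
FROM custom_exam_data
WHERE measurement_type = 'temperature'
  AND CAST(value AS DECIMAL(5,2)) > 38.0;

Note that the cast usually prevents an index on the value column from being used.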
I can't tell you the best way but I can tell you how Drupal achieves a sort of schemaless structure while still using the standard RDBMSs available today.
The general idea is that there's a schema table with a list of fields. Each row really only has two columns: a 'table' string column and a 'column' string column. For each of these columns it actually defines a whole table with just an id and the actual data for that column.
The trick really is that when you are working with the data it's never more than one join away from the bundle table that lists all the possible columns so you end up not losing as much speed as you might otherwise think. This will also allow you to expand much farther than just a few medical companies unlike the custom_ prefix you were proposing.
MySQL is very fast at returning row data for short rows with few columns. In this way this scheme ends up fairly quick while allowing you lots of flexibility.
As to search, my suggestion would be to index the page content instead of the database content. Use Solr to parse through rendered pages and hold links to the actual page instead of trying to search through the database using clever SQL.
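Roughly, the layout described above looks like the sketch below (made-up names, not Drupal's actual schema):

-- registry of custom fields: which logical column belongs to which entity table
CREATE TABLE field_registry (
    entity_table VARCHAR(64) NOT NULL,
    field_name   VARCHAR(64) NOT NULL,
    PRIMARY KEY (entity_table, field_name)
);

-- one narrow table per field, holding only the id and that field's value
CREATE TABLE field_data_blood_pressure (
    exam_id INT PRIMARY KEY,
    value   VARCHAR(255)
);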
Define two new tables: custom_exam_schema and custom_exam_data.
custom_exam_data has an exam_id column, plus an additional column for every custom attribute.
custom_exam_schema would have a row to describe how to interpret each of the columns of the custom_exam_data table. It would have columns like name, type, minValue, maxValue, etc.
So, for example, to create a custom field to track the number of fingers a person has, you would add ('fingerCount', 'number', 0, 10) to custom_exam_schema and then add a column named fingerCount to the custom_exam_data table.
Someone might say it's bad to change the database schema at run time, but I'd argue that configuring these custom fields is part of set up and won't happen too often. Still, this method lets you handle changes at any time and doesn't risk messing around with your core table schemas.
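A minimal sketch of those two tables (the column types and the reference to exams(id) are assumptions):

CREATE TABLE custom_exam_schema (
    name     VARCHAR(50) PRIMARY KEY,  -- e.g. 'fingerCount'
    type     VARCHAR(20) NOT NULL,     -- 'number', 'string', ...
    minValue FLOAT NULL,
    maxValue FLOAT NULL
);

CREATE TABLE custom_exam_data (
    exam_id INT PRIMARY KEY REFERENCES exams(id)
    -- one column per custom attribute is added here at configuration time,
    -- e.g. fingerCount INT NULL
);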
Let's say that your friend's database has to store data values from multiple sources such as demographic values, diagnoses, interventions, physionomic values, physiologic exam values, hospitalisation values, etc.
He might as well have to define choices. Let's say his database is missing race and the unit staff need the race of the patient (some diseases are more or less likely depending on race); they might want to use a drop-down with several choices.
I would propose using another table to hold these choices, or a "Custom_field_choices" table, which at some point is exactly the same thing under a different name.
Considering that the database:
- needs to be flexible
- that data from multiple tables can be added and be customized
- that you might want to keep the integrity of the main structure of your database for distribution and uniformity purpose
- that data MUST have a limit and alarms and warnings
- that data must have units (10 kg or 10 pounds)
- that data can have a selection of choices
- that data can be with different rights (from simple user to admin)
- that these data might be needed to generate reports without modifying the code (automation)
- that these data might be needed to make cross reference analysis within the system without modifying the code
The custom table would be my solution; modifying each table would end up being too risky.
I would store those custom fields in a table where each record (dataType, dataValue, dataUnit) uses one row. So there would be a one-to-many relation from one sample to the data. You can also create a table to record all the kinds of custom types you would use. For example:
create table DataType
(
    id int primary key,
    name varchar(100) not null unique,
    description text,
    uri varchar(255) -- can be used for an ONTOLOGY
);

create table DataRecord
(
    id int primary key,
    sample_id int not null,   -- reference to the sample
    dataType_id int not null, -- references DataType
    value varchar(100),       -- the value as string
    unit varchar(50)          -- g, mg/ml, etc... but it could also be a link to a table describing the units, just like DataType
);