database design: link table for multiple different tables - sql

I have a conceptual question about database design which came up multiple times in my history as a developer.
Imagine I have a larger database that was designed in the past and is already in production (for example, take http://sqlfiddle.com/#!9/7c06f/1, a library database).
Now there is a new feature request: every existing object in every table shall get a "help text" (or something else: an error, a tag...) that you can view all in one place.
I implemented something like that multiple times in the past, but every time I'm not satisfied with the solution.
One solution is to have a link table for every table, like this:
CREATE TABLE BooksHelp (
    BookId   INT NOT NULL,
    HelpText VARCHAR(500) NOT NULL
);

CREATE TABLE AuthorHelp (
    AuthorId INT NOT NULL,
    HelpText VARCHAR(500) NOT NULL
);
...
But I would need multiple of these tables, one per base table, which makes selecting every existing "help text" in one query difficult.
How would you design this problem? Is there another, better solution?

This appears to be similar to trying to map polymorphism to the relational model - it's just a bad fit, and there's no obvious right answer.
There are a few obvious solutions. The one you've identified (storing the helptext in a linked table) is neat, but requires lots of joins if you're retrieving all the books for an author belonging to a publisher etc. As the business logic says "all objects should have helptext" (and I assume each row has different helptext), it's not super logical to store this in a child table - "helptext" is an attribute of each object, not a related concept.
You could also add a helptext column to each table. That stores the attribute in the main table, and reduces the cognitive load (as well as the number of joins). This is logical if each author, book, etc. has their own helptext.
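For instance, a rough sketch of that second approach, assuming the Books and Authors tables from the linked library schema have an Id primary key:

ALTER TABLE Books   ADD HelpText VARCHAR(500) NULL;
ALTER TABLE Authors ADD HelpText VARCHAR(500) NULL;

-- "View all help texts in one place" then becomes a UNION across the tables:
SELECT 'Book'   AS ObjectType, Id, HelpText FROM Books   WHERE HelpText IS NOT NULL
UNION ALL
SELECT 'Author' AS ObjectType, Id, HelpText FROM Authors WHERE HelpText IS NOT NULL;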

For text items of some sort, especially if they are used in several tables (like help texts), it seems preferable to centralize the text data in one table and store references to it in the tables that need it. This design also supports multi-lingual apps, where the text items have to be maintained in several languages.
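A minimal sketch of that centralized approach (table and column names are illustrative, not from the original schema):

CREATE TABLE TextItems (
    TextItemId INT PRIMARY KEY,
    Text       VARCHAR(1000) NOT NULL
    -- a multi-lingual app would move the text into a child table keyed by (TextItemId, LanguageCode)
);

ALTER TABLE Books ADD HelpTextId INT NULL;
ALTER TABLE Books ADD CONSTRAINT fk_books_helptext
    FOREIGN KEY (HelpTextId) REFERENCES TextItems (TextItemId);
-- ...and the same for Authors, Publishers, etc.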

Related

Database design variable amount of fields

I'm making a website for a client but I stumbled upon a problem and I need some advice on it.
For each project, they want the possibility to set a variable number of images and (sometimes) some corresponding text.
I was thinking about storing all of the information in one field, instead of making field_1 to field_99 just in case they need 99 fields.
// database column
'../fotos/foto1.png',
'hier komt tekst',
'../fotos/foto2.png',
'', (empty text)
'../fotos/foto3.png'
This solution has some disadvantages; there must be better ways out there to achieve this.
What's the preferred way to do this?
Create another table (e.g. FOTO_CODES) with all possible values of foto and generate an id for each.
Create a child table that holds the master table's record id, the id from the FOTO_CODES table, and the FOTO data (image).
It's called normalization.
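A sketch of one way to read that design (names are illustrative; the master table is assumed to be called project):

CREATE TABLE foto_codes (                 -- every distinct photo gets an id here
    foto_code_id INT PRIMARY KEY,
    foto_path    VARCHAR(255) NOT NULL    -- e.g. '../fotos/foto1.png'
);

CREATE TABLE project_foto (               -- child table: master record id + FOTO_CODES id + text
    project_id   INT NOT NULL,            -- id of the master (project) record
    foto_code_id INT NOT NULL,
    caption      VARCHAR(1000) NULL,      -- the optional corresponding text
    PRIMARY KEY (project_id, foto_code_id),
    FOREIGN KEY (foto_code_id) REFERENCES foto_codes (foto_code_id)
);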
The solution you described violates the principle of atomicity and therefore the 1NF. You'd have trouble maintaining and querying data in this format.
This is a classic one-to-many relationship that can be modeled in two ways:
1) Identifying relationship:
2) Non-identifying relationship:
Both have pros and cons; StackOverflow already has plenty of discussions on this topic.
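In DDL the two options look roughly like this (a sketch; all names are illustrative, and a master project table with a project_id primary key is assumed):

-- 1) Identifying relationship: the child's primary key includes the parent's key
CREATE TABLE project_foto_a (
    project_id INT NOT NULL,
    foto_no    INT NOT NULL,
    foto_path  VARCHAR(255) NOT NULL,
    PRIMARY KEY (project_id, foto_no),
    FOREIGN KEY (project_id) REFERENCES project (project_id)
);

-- 2) Non-identifying relationship: the child has its own surrogate primary key
CREATE TABLE project_foto_b (
    foto_id    INT PRIMARY KEY,
    project_id INT NOT NULL,
    foto_path  VARCHAR(255) NOT NULL,
    FOREIGN KEY (project_id) REFERENCES project (project_id)
);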

db design critique / suggestions

I am redesigning a pharmacy db system and need inputs to see if the new design is optimal or requires tweaking.
Here's a snapshot of the old system.
As can be seen, the pharmacies table stores pharmacy information, along with its address and contact information. Pharmacies are grouped together for invoicing purposes (pharmacygroup) or for sales, advertising and other purposes (banner group). The invoice group may have a different physical address and different contact information.
Here's my new design. I have split the address out of both the pharmacy and pharmacygroup tables into a table of its own and made a new table for contacts. There could be technical contacts, account contacts, owner contacts, etc., hence the contacttypes table. The pharmacy and the pharmacygroup can have separate contact info. I thought of making a single contact table with 'linktype' and 'linkid' columns to indicate whether it is a pharmacy contact or a pharmacy group contact, but I am not sure if this is the right approach. Is this a good design, or will it be costly in terms of data retrieval because of the number of joins?
Another thing I noted is that the old design didn't have any foreign key constraints, although the pharmacy table had groupid and bannergroupid references to pharmacygroup and bannergroup, possibly to save time on data retrieval. Is this a good approach?
Your design looks good to me. I always prefer having a couple of extra joins at the design stage over spending time reorganizing data after the system has gone into production. You never know in advance what kind of reports will be requested by management/sales/financial people, and a proper relational design will give you more freedom.
Also, you cannot blame only a couple of extra JOINs for your performance issues. You should always look at:
data volumes (and physical data layout),
transaction amount and density,
I/O, CPU, memory usage,
your RDBMS configuration,
SQL queries quality.
In my view, JOINs will be on the bottom of this list.
As to RI (referential integrity) constraints, I've seen a couple of projects that had been running without any primary/foreign keys for increased performance. The main excuse was: all checks are embedded in the application, and the application is the only source of any changes in the system. On the other hand, they agreed that it was not known whether the systems were in a consistent state (in fact, analysis showed they were not).
I always stick to creating all possible keys/constraints at the design stage, as there will always be some "cowboys" around who will dig into your database and "adjust" data as they see fit. Still, you might want to temporarily disable or even drop some constraints/indexes for bulk data manipulations, which is also an official recommendation.
If uncertain, create 2 test databases, one with and another without constraints. Load some data and compare query performance. I think it will be similar.
And here are my comments on your sketches; decisions are all yours.
You might want to create a common contacts table the same way you did for addresses, i.e. add contact_id, owner_contact_id, etc. columns to the target relations instead of referencing the target relations from the contacts table;
As you have only one column in the contacttype table (and especially if you move to a common contacts table), it's better to move that single field into the referencing table and drop contacttype altogether;
You seem to have a mixture of singular/plural names for your tables; better to stick to a common pattern here. I personally prefer singular;
In pharmacygroup your PK is named id, while all the other PKs follow the tableid pattern; it will be easier to write scripts later if you use a common pattern here;
In the addresses table you have fields with underscores, like street_name, while elsewhere you avoid underscores; consider making this consistent as well;
References are named differently. Although it is not that important, I do have a couple of systems where I have to rely on the constraints' names, so it's better to use some pattern here. I use the following one:
prefix p_, f_, c_, t_, u_ or i_ for primary, foreign keys, check constraints, triggers, unique and other indexes;
name of the table;
name of the column constraint/index/trigger refers to.
Why do I prefer naming tables in singular form? Because I always name the PK using the table_id pattern, and IMHO pharmacy_id looks better than pharmacies_id. I use this approach as I have a bunch of general-purpose scripts which rely on this pattern when performing data consistency checks prior to loading data into the main tables.
EDIT:
More on contacts.
You can use contact_id in all your tables, making it a primary contact, whatever this might mean in your application. Should you need more contacts to be there for some relations, then you can go with different prefixes, like owner_contact_id, sales_contact_id, etc.
In case you expect a huge number of contacts to be there for some relations, like pharmacygroup, then you can add an extra table like this:
CREATE TABLE pharmacygroupcontact (
    contactid    int4 NOT NULL,  -- FK to the contacts table
    groupid      int4 NOT NULL,  -- FK to pharmacygroup
    contact_desc text,
    PRIMARY KEY (contactid, groupid)
);
It partially copies your initial groupcontacts, but consists of two FKs and a description.
Which approach is better I cannot tell, as I'm not aware of how the application is designed.
You have two contact tables. I would create one, then use linking tables to link groupcontacts and pharmacycontacts. I would definitely want to have the FK and PK relationships set up too.
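A rough sketch of that, guessing at column names and reusing the pharmacyid / id PK names mentioned above; the existing pharmacy and pharmacygroup tables are assumed:

CREATE TABLE contact (
    contactid int4 PRIMARY KEY,
    name      text NOT NULL,
    phone     text,
    email     text
);

CREATE TABLE pharmacycontact (
    pharmacyid int4 NOT NULL REFERENCES pharmacy (pharmacyid),
    contactid  int4 NOT NULL REFERENCES contact (contactid),
    PRIMARY KEY (pharmacyid, contactid)
);

CREATE TABLE groupcontact (
    groupid   int4 NOT NULL REFERENCES pharmacygroup (id),
    contactid int4 NOT NULL REFERENCES contact (contactid),
    PRIMARY KEY (groupid, contactid)
);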

How to add user customized data to database?

I am trying to design a sqlite database that will store notes. Each of these notes will have common fields like title, due date, details, priority, and completed.
In addition though, I would like to add data for more specialized notes like price for shopping list items and author/publisher data for books.
I also want to have a few general purpose fields that users can fill with whatever text data they want.
How can I design my database table in this case?
I could just have a field for each piece of data for every note, but that would waste a lot of fields and I'd like to have other options and suggestions.
There are several standard approaches you could use for solving this situation.
You could create separate tables for each kind of note, copying over the common columns in each case. This would be easy, but it would make it difficult to query over all notes.
You could create one large table with many columns and some kind of type field which would let you know which type of note it is (and therefore which subset of columns to use)
CREATE TABLE NOTE (ID int PRIMARY KEY, NOTE_TYPE int, DUEDATE datetime, /* ...more common fields... */ PRICE NUMERIC NULL, AUTHOR VARCHAR(100) NULL /* ...more specific fields... */);
You could break your tables up into an inheritance relationship, something like this:
CREATE TABLE NOTE (ID int PRIMARY KEY, NOTE_TYPE int, DUEDATE datetime /* ...more common fields... */);
CREATE TABLE SHOPPINGLISTITEM (ID int PRIMARY KEY, NOTE_ID int REFERENCES NOTE(ID), PRICE NUMERIC /* ...more shopping list item fields... */);
Option 1 would be easy to implement but would involve lots of mostly redundant table definitions.
Option 2 would be easy to create and easy to write queries on but would be space inefficient
And option 3 would be more space efficient and less redundant but would possibly have slower queries because of all the foreign keys.
This is the typical set of trade-offs for modeling these kinds of relationships in SQL; any of these solutions could be appropriate for your use case depending on your performance requirements.
You could create something like a custom_field table. It gets pretty messy once you start to normalize.
So you have your note table with its common fields.
Now add:
dynamic_note_field
id label
1 publisher
2 color
3 size
dynamic_note_field_data
id dynamic_note_field_id value
1 1 Penguin
2 1 Marvel
3 2 Red
Finally, you can relate instances of your data with the fields they use through
note_dynamic_note_field_data
note_id dynamic_note_field_data_id
1 1
1 3
2 2
So now we've said: note_id 1 has two additional fields. The first one has a value "Penguin" and represents a publisher. The second one has a value of "Red" and represents a color.
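For reference, the tables behind that data might be declared like this (a sketch in SQLite-friendly DDL, assuming the main table is called note):

CREATE TABLE dynamic_note_field (
    id    INTEGER PRIMARY KEY,
    label TEXT NOT NULL                    -- e.g. 'publisher', 'color', 'size'
);

CREATE TABLE dynamic_note_field_data (
    id                    INTEGER PRIMARY KEY,
    dynamic_note_field_id INTEGER NOT NULL REFERENCES dynamic_note_field (id),
    value                 TEXT NOT NULL    -- e.g. 'Penguin', 'Red'
);

CREATE TABLE note_dynamic_note_field_data (
    note_id                    INTEGER NOT NULL REFERENCES note (id),
    dynamic_note_field_data_id INTEGER NOT NULL REFERENCES dynamic_note_field_data (id),
    PRIMARY KEY (note_id, dynamic_note_field_data_id)
);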
So what's the point of normalizing it this far?
You're not wasting space adding fields to every item (you relate a note with its additional dynamic fields via the m2m table).
You're not storing redundant labels (you may continue to store redundant data, however, as the same publisher is likely to appear many times... this aspect is extremely subjective). If you want rich data about your publishers, you typically want to take the step of turning them into their own entity rather than an ad-hoc string. Be careful when making this leap, because it adds an extra level of hairiness to the db. Evaluate the use case accordingly.
The dynamic_note_field acts as your data definition. If you're interested in answering a question such as "what are the additional fields I've created" this lets you do it easily without searching all of your dynamic_note_field_data. Eventually, you might add extra info to this table such as a type field. I like to create this separation off the bat, but that might be a violation of the YAGNI principle in your case.
Disadvantages:
It's not too bad to search for all notes that have a publisher, where that publisher is "Penguin" (see the query sketch below).
What's tricky is something like "Find any note with a value of 'Penguin' in any field". You don't know up front which fields you're searching. At this point you're better off with a separate index that's generated alongside your normalized db data and acts as the point of truth. Again, the nice thing about normalization is that you maintain the data in a very lossless, non-destructive state.
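For illustration, the "publisher is Penguin" search mentioned above stays manageable (a sketch against the tables sketched earlier):

SELECT n.*
FROM note n
JOIN note_dynamic_note_field_data link ON link.note_id = n.id
JOIN dynamic_note_field_data d         ON d.id = link.dynamic_note_field_data_id
JOIN dynamic_note_field f              ON f.id = d.dynamic_note_field_id
WHERE f.label = 'publisher'
  AND d.value = 'Penguin';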
For data you want to store but does not have to be searchable, another option is to serialize it to/from JSON and store it in a TEXT column. This gives you arbitrary structure, but you cannot readily query against those values.
Yet another option is to dump SQLite and go with an object database. I seem to recall there are one or two working for Android. I have not tried any of these, however.
Just create a small table which contains the common fields of all your notes.
Then a table for each class of special notes you have, which contains all the extra fields plus a reference to your first table.
For each note you will enter, you create a row in your main table (that contains the common fields) and a row in your extra table that contains the extra fields, and a reference to the row in your main table.
Then you will just have to make a join in your query (see the sketch after the list below).
With this solution:
1) you have a safe design (you can't access fields that are not part of your note);
2) your db will be optimized.
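For example, fetching a full shopping-list note is a single join (a sketch reusing the NOTE / SHOPPINGLISTITEM tables from the earlier answer; the NOTE_TYPE value is hypothetical):

SELECT n.*, s.PRICE
FROM NOTE n
JOIN SHOPPINGLISTITEM s ON s.NOTE_ID = n.ID
WHERE n.NOTE_TYPE = 2;   -- hypothetical type code for shopping-list notes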

Database design: Store data from paper forms in database

Database design question for y'all. I have a form (like, the paper kind) that has several entry points for data. This form has changed, and is expected to change over years. It is being turned into a computer app, so that we can, among other things, quit wasting paper. (And minor things, like have all the data in one central store that can be queried, etc.) I'd like to store all of the forms data in a database, and have it be pretty agnostic as to the changes.
Originally, I was just considering each field to be a string -- and I had a table something like this:
FormId int (FK)
FieldName nvarchar(64)
FieldValue nvarchar(128)
...something like that. It was actually a bit more 3NFy in that FieldName was in another table, associated with an artificial key, so that the field names weren't duplicated all over the place.
However, I'd like to extend this to numeric and drop-down data. I could just store numeric data as strings, but that seems like a pretty crappy idea. Same with drop downs.
I could stop using a table, and actually use columns on the main form table (the one that FormId above references), but that means adding a column for each new item as they come along, and older forms would just be null. (And, unless I stored it, I wouldn't know when that column was created. With the string table above, it's implicit.)
I could extend the table above to something like:
FormId int (FK)
FieldName nvarchar(64)
FieldValueType int -- enum as to which of the columns below are valid (or just let nulls imply that)
FieldValue nvarchar(128)
FieldValueInt int
Combos would have to be in an OTLT (one true lookup table), which I have reservations about, but perhaps it's needed here?
Any advice? I'm using MSSQL, but this is really a more general question.
Use Nulls. Proper database design is a complicated subject; you may do well to pick up a good reference and do some research on the whole thing (I gather this is a good book on the topic). In general, it sounds like you would be well served by starting with a single table that encapsulates all the fields in your form, and then putting it through the normalization process. And yes, use nulls and do NOT use an int to enumerate which columns are set to valid values; that is exactly what nulls are for.
You could have a separate table for each datatype.
I.e. to fetch an entire form you'd do an N-way join using the form id where N is the number of distinct datatypes you support (+ perhaps extras depending on the info you want - e.g. dropdown values would probably be stored in another table / your fieldname lookup / etc.)
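A rough sketch of that layout (MSSQL-ish types since the asker mentioned MSSQL; all table and column names are made up):

CREATE TABLE FormFieldText (
    FormId  INT NOT NULL,            -- FK to the form
    FieldId INT NOT NULL,            -- FK to the field-name lookup table
    Value   NVARCHAR(128) NOT NULL,
    PRIMARY KEY (FormId, FieldId)
);

CREATE TABLE FormFieldInt (
    FormId  INT NOT NULL,
    FieldId INT NOT NULL,
    Value   INT NOT NULL,
    PRIMARY KEY (FormId, FieldId)
);

-- Fetching an entire form then touches each datatype table, e.g.:
SELECT FieldId, Value AS TextValue, NULL AS IntValue FROM FormFieldText WHERE FormId = 42
UNION ALL
SELECT FieldId, NULL, Value FROM FormFieldInt WHERE FormId = 42;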
But the design should probably also depend on how you intend to use the data, which you've said nothing about. And it would also depend on just how fast the rate of change is for these forms...
By creating a table with a description of your forms, you are actually defining a metadata structure. That's daunting. You would need much of the infrastructure required for proper table descriptions. I think the vendors of your database system spent a lot of effort doing exactly that.
At first I thought - what a nice idea! Build your own compatibility-aware table description system!
But then I thought - I'm too stupid to do that on my own. There must be a database system capable of doing that.
So I conclude (not being a db expert): define proper defaults for 'new fields' in new form versions, and handle the compatibility issue in your business logic.
I would strongly advise against having a "generic table" like you describe.
You are essentially reinventing the relational database, which is not a good idea: Queries and updates will be very painful with your structure, and you will not be able to use the more advanced features like foreign keys and triggers, should you need them.
Just make a table(s) with columns for the data fields, and if a form does not have a field, let it be null.
Or, probably even better, have a "base table" (fields that are in every form), give names/version numbers to updated forms, and have a new table for the new columns that each version adds, then use a synthetic PK to join these new tables to your base table.
I.e.:
base table: id(numeric,PK), name, birthday, town
addresstable1: street, number, postal code, country, base_table_id (foreign key)
addresstable2: po box no, po box code, base_table_id (FK)
and so on.
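In DDL that might look like (a sketch; names and types are illustrative):

CREATE TABLE base_form (
    id       INT PRIMARY KEY,
    name     NVARCHAR(100),
    birthday DATE,
    town     NVARCHAR(100)
);

CREATE TABLE addresstable1 (
    base_table_id INT NOT NULL REFERENCES base_form (id),
    street        NVARCHAR(100),
    number        NVARCHAR(20),
    postal_code   NVARCHAR(20),
    country       NVARCHAR(100)
);

CREATE TABLE addresstable2 (
    base_table_id INT NOT NULL REFERENCES base_form (id),
    po_box_no     NVARCHAR(20),
    po_box_code   NVARCHAR(20)
);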
That way you avoid loads of null fields; your tables are not so wide (always desirable), and your records are implicitly versioned, because the list of tables that have a record belonging to a record in your base table tells you which fields the original form had, hence what kind of form was used originally.

What is the preferred way to store custom fields in a SQL database?

My friend is building a product to be used by different independent medical units.
The database stores a vast collection of measurements taken at different times, like the temperature, blood pressure, etc...
Let us assume these are held in a table called exams with columns temperature, pressure, etc... (as well as id, patient_id and timestamp). Most of the measurements are stored as floats, but some are of other types (strings, integers...)
While many of these measurements are handled by their product, it needs to allow the different medical units to record and process other custom measurements. A very nifty UI allows the administrator to edit these custom fields, specify their name, type, possible range of values, etc...
He is unsure as to how to store these custom fields.
He is leaning towards a separate table (say a table custom_exam_data with fields like exam_id, custom_field_id, float_value, string_value, ...)
I worry that this will make searching both more difficult to achieve and less efficient.
I am leaning towards modifying the exam table directly (while avoiding conflicts on column names with some scheme like prefixing all custom fields with an underscore or naming them custom_1, ...)
He worries about modifying the database dynamically and having different schemas for each medical unit.
Hopefully some people with more experience can weigh in on this issue.
Notes:
he is using Ruby on Rails, but I think this question is pretty much framework agnostic, except for the fact that he is looking for solutions in SQL databases only.
I simplified the problem a bit since the custom fields need to be available for more than one table, but I believe this doesn't really impact the direction to take.
(added) A very generic reporting module will need to search, sort, generate stats, etc.. of this data, so it is required that this data be stored in the columns of the appropriate type
(added) User inputs will be filtered, for the standard fields as well as for the custom fields. For example, numbers will be checked within a given range (can't have a temperature of -12 or +444), etc... Thus, conversion to the appropriate SQL type is not a problem.
I've had to deal with this situation many times over the years, and I agree with your initial idea of modifying the DB tables directly, and using dynamic SQL to generate statements.
Creating string UserAttribute or Key/Value columns sounds appealing at first, but it leads to the inner-platform effect where you end up having to re-implement foreign keys, data types, constraints, transactions, validation, sorting, grouping, calculations, et al. inside your RDBMS. You may as well just use flat files and not SQL at all.
SQL Server provides INFORMATION_SCHEMA views that let you inspect table schemas at runtime, and dynamic DDL lets you create and modify them. This gives you full type checking, constraints, transactions, calculations, and everything else you need already built in; don't reinvent it.
It's strange that so many people come up with ad-hoc solutions for this when there's a well-documented pattern for it:
Entity-Attribute-Value (EAV) Model
Two alternatives are XML and Nested Sets. XML is easier to manage but generally slow. Nested Sets usually require some type of proprietary database extension to do without making a mess, like CLR types in SQL Server 2005+. They violate first-normal form, but are nevertheless the fastest-performing solution.
Microsoft Dynamics CRM achieves this by altering the database design each time a change is made. Nasty, I think.
I would say a better option would be to consider an attribute table. Even though these are often frowned upon, it gives you the flexibility you need, and you can always create views using dynamic SQL to pivot the data out again. Just make sure you always use LEFT JOINs and FKs when creating these views, so that the Query Optimizer can do its job better.
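For example, the pivoting view might look like this (a static sketch; in practice you would generate it with dynamic SQL from the custom-field definitions, and the two custom fields shown are hypothetical):

CREATE VIEW exam_with_custom AS
SELECT e.id,
       e.patient_id,
       e.temperature,
       f1.float_value  AS blood_oxygen,   -- hypothetical custom field with custom_field_id = 1
       f2.string_value AS nurse_notes     -- hypothetical custom field with custom_field_id = 2
FROM exams e
LEFT JOIN custom_exam_data f1 ON f1.exam_id = e.id AND f1.custom_field_id = 1
LEFT JOIN custom_exam_data f2 ON f2.exam_id = e.id AND f2.custom_field_id = 2;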
I have seen a use of your friend's idea in a commercial accounting package. The table was split into two: the first contained fields solely defined by the system, the second contained fields like USER_STRING1, USER_STRING2, USER_FLOAT1, etc. The tables were linked by identity value (when a record is inserted into the main table, a record with the same identity is inserted into the second one). Each table that needed user fields was split like that.
Well, whenever I need to store some unknown type in a database field, I usually store it as String, serializing it as needed, and also store the type of the data.
This way, you can have any kind of data, working with any type of database.
I would be inclined to store the measurement in the database as a string (varchar) with another column identifying the measurement type. My reasoning is that it will presumably come from the UI as a string, and casting to any other datatype may introduce corruption before the user input gets stored.
The downside is that when you go to filter result sets by some measurement metric you will still have to perform a cast, but at least the storage and persistence mechanism is not introducing corruption.
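For example, filtering on a numeric measurement stored as a string would look something like this (a sketch using the custom_exam_data table from the question; the field id is hypothetical):

SELECT *
FROM custom_exam_data
WHERE custom_field_id = 1                            -- hypothetical "temperature" field
  AND CAST(string_value AS DECIMAL(5, 2)) > 38.5;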
I can't tell you the best way but I can tell you how Drupal achieves a sort of schemaless structure while still using the standard RDBMSs available today.
The general idea is that there's a schema table with a list of fields. Each row really only has two columns, the 'table':String column and the 'column':String column. For each of these columns it actually defines a whole table with just an id and the actual data for that column.
The trick really is that when you are working with the data it's never more than one join away from the bundle table that lists all the possible columns so you end up not losing as much speed as you might otherwise think. This will also allow you to expand much farther than just a few medical companies unlike the custom_ prefix you were proposing.
MySQL is very fast at returning row data for short rows with few columns. In this way this scheme ends up fairly quick while allowing you lots of flexibility.
As to search, my suggestion would be to index the page content instead of the database content. Use Solr to parse through rendered pages and hold links to the actual page instead of trying to search through the database using clever SQL.
Define two new tables: custom_exam_schema and custom_exam_data.
custom_exam_data has an exam_id column, plus an additional column for every custom attribute.
custom_exam_schema would have a row to describe how to interpret each of the columns of the custom_exam_data table. It would have columns like name, type, minValue, maxValue, etc.
So, for example, to create a custom field to track the number of fingers a person has, you would add ('fingerCount', 'number', 0, 10) to custom_exam_schema and then add a column named fingerCount to the custom_exam_data table.
Someone might say it's bad to change the database schema at run time, but I'd argue that configuring these custom fields is part of setup and won't happen too often. Still, this method lets you handle changes at any time and doesn't risk messing around with your core table schemas.
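In SQL terms, that configuration step might boil down to (a sketch; exact DDL varies by RDBMS):

INSERT INTO custom_exam_schema (name, type, minValue, maxValue)
VALUES ('fingerCount', 'number', 0, 10);

ALTER TABLE custom_exam_data ADD fingerCount INT NULL;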
Let's say that your friend's database has to store data values from multiple sources such as demographic values, diagnoses, interventions, physiognomic values, physiologic exam values, hospitalisation values, etc.
He might also have to define choices. Let's say his database is missing race, and the unit staff need the race of the patient (some races are more or less likely to get certain diseases); they might want to use a drop-down with several choices.
I would propose using another table that holds these choices, or you could just use a "Custom_field_choices" table, which at some point is exactly the same thing with a different name (a sketch follows the list below).
Considering that the database:
- needs to be flexible
- that data from multiple tables can be added and be customized
- that you might want to keep the integrity of the main structure of your database for distribution and uniformity purpose
- that data MUST have a limit and alarms and warnings
- that data must have units (10 kg or 10 pounds)?
- that data can have a selection of choices
- that data can be with different rights (from simple user to admin)
- that these data might be needed to generate reports without modifying the code (automation)
- that these data might be needed to make cross reference analysis within the system without modifying the code
the custom table would be my solution; modifying each table would end up being too risky.
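A sketch of such a choices table, tied to the custom-field design from the question (names are illustrative):

CREATE TABLE custom_field_choice (
    choice_id       INT PRIMARY KEY,
    custom_field_id INT NOT NULL,          -- which custom field this choice belongs to
    label           VARCHAR(100) NOT NULL  -- e.g. the race options shown in the drop-down
);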
I would store those custom fields in a table where each record (dataType, dataValue, dataUnit) occupies one row. So there would be a one-to-many relation from one sample to the data. You can also create a table to record all the kinds of custom types you would use. For example:
create table DataType
(
    id          int primary key,
    name        varchar(100) not null unique,
    description text,
    uri         varchar(255)   -- can be used for an ONTOLOGY
);

create table DataRecord
(
    id          int primary key,
    sample_id   int not null,  -- reference to the sample
    dataType_id int not null,  -- references DataType
    value       varchar(100),  -- the value as a string
    unit        varchar(50)    -- g, mg/ml, etc.; could also be a link to a table describing the units, just like DataType
);
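Example usage (a sketch; the measurement, ids, and unit are made up):

insert into DataType (id, name, description, uri)
values (1, 'blood_glucose', 'Fasting blood glucose', null);

insert into DataRecord (id, sample_id, dataType_id, value, unit)
values (100, 42, 1, '5.4', 'mmol/l');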