I've been considering three similar database designs, but haven't come up with a very strong for or against regarding any of them.
One huge table for content
content: int id, enum type, int parent_id, varchar title, text body, text data
This would assign each row a type (news, blog, etc), and have fields for common/standard/searchable data, then any non-standard or trivial data is stored as serialized xml in the data field.
One table for ids, many tables for content
ids: int id, enum tableName, int parent_id
This has one large table for ids, then every other table references this id, making it easy to have hierarchical content.
A combination of the two above, where a main table stores all common info, but unimportant data is stored in a respective table.
Naturally it's easier to keep data consistent when everything has its own table, but the above ideas make it much easier to force standardization of common fields, and makes it a lot simpler to relate content to eachother (especially with tagging).
Any thoughts or links would be appreciated.
I think the idea of a master lookup table with secondary tables for actual content is the best solution. Drupal has a similar structure to what you describe and it has proven quite flexible.
http://projects.contentment.org/blog/84
Drupal has a main "node" database in a single table and references specialized tables when getting the actual content.
I'm not a fan of the idea of trying to tuck everything into a table as XML. That could prove to be a performance and flexibility dog over time.
Why not just have different tables for each type of content? News, blog, etc. It seems to me that would be the best and easiest to use option.
Different tables for different types of content. If you ever need to add a new content type or additional features to an existing content type you don't have to worry about breaking the world while you do it.
Related
I have a conceptual question about database design which came up multiple times in my history as a developer.
Imagine I have a bigger database that is designed in the past, and already in production (for example you can take http://sqlfiddle.com/#!9/7c06f/1, a library database).
Now there is a new feature request: for every existing object in every table there shall be a "help text" (or something different, an error, a tag...) that you can all view in one place.
I implemented something like that multiple times in the past, but every time I'm not satisfied with the solution.
One solution is to have a link table with every table, like that:
CREATE TABLE BooksHelp (
BookId INT NOT NULL,
HelpText VARCHAR NOT NULL
);
CREATE TABLE AuthorHelp (
AuthorId INT NOT NULL,
HelpText VARCHAR NOT NULL
);
...
But I would need multiple link tables, which makes the selection of every existing "help text" difficult.
How would you design this problem? Is there another, better solution?
This appears to be similar trying to map polymorphism to the relational model - it's just a bad fit - there's no obvious right answer.
There are a few obvious solutions. The one you've identified (storing the helptext in a linked table) is neat, but requires lots of joins if you're retrieving all the books for an author belonging to a publisher etc. As the business logic says "all objects should have helptext" (and I assume each row has different helptext), it's not super logical to store this in a child table - "helptext" is an attribute of each object, not a related concept.
You could also add a helptext column to each table. That stores the attribute in the main table, and reduces the cognitive load (as well as the number of joins). This is logical if each author, book, etc. has their own helptext.
In the case of text items of some sort, especially if they are used in several tybles, like help texts, it seems preferable to centralize their text data in one table and store references in the tables that need them. This design also supports multi-lingual apps where the text items would have to be maintained in several languages.
I am trying to design a sqlite database that will store notes. Each of these notes will have common fields like title, due date, details, priority, and completed.
In addition though, I would like to add data for more specialized notes like price for shopping list items and author/publisher data for books.
I also want to have a few general purpose fields that users can fill with whatever text data they want.
How can I design my database table in this case?
I could just have a field for each piece of data for every note, but that would waste a lot of fields and I'd like to have other options and suggestions.
There are several standard approaches you could use for solving this situation.
You could create separate tables for each kind of note, copying over the common columns in each case. this would be easy but it would make it difficult to query over all notes.
You could create one large table with many columns and some kind of type field which would let you know which type of note it is (and therefore which subset of columns to use)
CREATE TABLE NOTE ( ID int PRIMARY KEY, NOTE_TYPE int, DUEDATE datetime, ...more common fields, price NUMBER NULL, author VARCHAR(100) NULL,.. more specific fields)
you could break your tables up into a inheritance relationship something like this:
CREATE TABLE NOTE ( ID int PRIMARY KEY, NOTE_TYPE int, DUEDATE datetime, ...more common fields);
CREATE TABLE SHOPPINGLITITEM (ID int PRIMARY KEY, NOTE_ID int FORIENKEY NOTE.ID, price number ... more shopping list item fields)
Option 1 would be easy to implement but would involve lots of mostly redundant table definitions.
Option 2 would be easy to create and easy to write queries on but would be space inefficient
And option 3 would be more space efficient and less redundant but would possibly have slower queries because of all the foreign keys.
This is the typical set of trade-offs for modeling these kinds of relationships in SQL, any of these solutions could be appropriate for use case depending non your performance requirements.
You could create something like a custom_field table. It gets pretty messy once you start to normalize.
So you have your note table with it's common fields.
Now add:
dynamic_note_field
id label
1 publisher
2 color
3 size
dynamic_note_field_data
id dynamic_note_field_id value
1 1 Penguin
2 1 Marvel
3 2 Red
Finally, you can relate instances of your data with the fields they use through
note_dynamic_note_field_data
note_id dynamic_note_field_data_id
1 1
1 3
2 2
So now we've said: note_id 1 has two additional fields. The first one has a value "Penguin" and represents a publisher. The second one has a value of "Red" and represents a color.
So what's the point of normalizing it this far?
You're not wasting space adding fields to every item (you relate a note with it's additional dynamic field via the m2m table).
You're not storing redundant labels (you may continue to store redundant data however as the same publisher is likely to appear many times... this aspect is extremely subjective. If you want rich data about your publishers you typically want to take the step of turning them into their own entity rather than an ad-hoc string. Be careful when making this leap because it adds an extra level of hairiness to the db. Evaluate the use case accordingly.
The dynamic_note_field acts as your data definition. If you're interested in answering a question such as "what are the additional fields I've created" this lets you do it easily without searching all of your dynamic_note_field_data. Eventually, you might add extra info to this table such as a type field. I like to create this separation off the bat, but that might be a violation of the YAGNI principle in your case.
Disadvantages:
It's not too bad to search for all notes that have a publisher, where that publisher is "Penguin".
What's tricky is something like "Find any note with a value of 'Penguin' in any field". You don't know up front which field's your searching. At this point you're better off with a separate index that's generated alongside your normalized db data which acts as the point of truth. Again, the nice thing about normalization is that you maintain the data in a very lossless, non-destructive state.
For data you want to store but does not have to be searchable, another option is to serialize it to/from JSON and store it in a TEXT column. This gives you arbitrary structure, but you cannot readily query against those values.
Yet another option is to dump SQLite and go with an object database. I seem to recall there are one or two working for Android. I have not tried any of these, however.
Just create a small table which contains the common fields of all your notes.
Then a table for each class of special notes you have, that that contains all the extra fiels plus a reference on your first table.
For each note you will enter, you create a row in your main table (that contains the common fields) and a row in your extra table that contains the extra fields, and a reference to the row in your main table.
Then you will just have to make a join in you request.
With this solution :
1)you have a safe design (can't access fields that are not part of your note)
2)your db will be optimized
I need to store info about county, municipality and city in Norway in a mysql database. They are related in a hierarchical manner (a city belongs to a municipality which again belongs to a county).
Is it best to store this as three different tables and reference by foreign key, or should I store them in one table and relate them with a parent_id field?
What are the pros and cons of either solution? (both structural end efficiency wise)
If you've really got a limit of these three levels (county, municipality, city), I think you'll be happiest with three separate tables with foreign keys reaching up one level each. This will make queries almost trivial to write.
Using a single table with a parent_id field referencing the same table allows you to represent arbitrary tree structures, but makes querying to extract the full path from node to root an iterative process best handled in your application code.
The separate table solution will be much easier to use.
three different tables:
more efficient, if your application mostly accesses information about only one entity (county, municipality, city)
owner-member-relationship is a clear and elegant model ;)
County, Municipality, and City don't sound like they are the same kind of data ; so, I would use three different tables : one per data-type.
And, then, I would indeed use foreign keys between those.
Efficiency-speaking, not sure it'll change much :
you'll do joins on 3 tables instead of joining 3 times on the same table ; I suppose it's quite the same.
it might make a little difference when you need to work on only one of those three type of data ; but with the right indexes, the differences should be minimal.
But, structurally speaking, if those are three different kind of entities, it makes sense to use three different tables.
I would recommend for using three different tables as they are three different entities.
I would use only one table in those cases you don´t know the depth of the hierarchy, but it is not case.
I would put them in three different tables, just on the grounds that it is 3 different concepts. This will hamper speed and will complicate your queries. However given that MySQL does not have any special support for hirachical queries (like Oracle's connect by statement) these would be complicated anyway.
Different tables: it's just "right". I doubt you'll see any performance gains/losses either way but this is one where modelling it properly up-front will probably save you lots of headaches later on. For one thing it'll make SQL SELECTs easier to write and read.
You'll get different opinions coming back to you on this but my personal preference would be to have separate tables because they are separate entities.
In reality you need to think about the queries you will doing on this data and usually your answer will come from that. With separate tables your queries will look much cleaner and in the end your not saving yourself anything because you'll still be joining tables together, even if they are the same table.
I would use three separate tables, since you know exactly what categories of information you are working with, and won't need to dynamically alter the 'depth' of your hierarchy.
It'll also make the data simpler to manage, as you'll be able to tell if the data is for a city, municipality or a county just by knowing the table (and without having to discern the 'depth' of a record in the hierarchy first!).
Since you'll probably be doing self joins anyway to get the hierarchy to work, I'd doubt there would be any benefits from having all the data in a single table.
In dataware housing applications, adherents of the Kimball methodology might place these fields in the same attribute table:
create table city (
id int not null,
county varchar(50) not null,
municipality varchar(50),
city varchar(50),
primary key(id)
);
The idea being that attibutes should never be more than l join away from the fact table.
I just state this as an alternative view. I would go with the 3 table design personally.
This is a case of ‘Database Normalization’, which is the process of organizing the fields and tables of a relational database to minimize redundancy and dependency. The purpose is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships.
Multiple tables will help in the situation if the task has been distributed among different developers, or users at different levels require different rights to view and change the data or the small tables help when you need this data for other purposes as well or so.
My vote would be for multiple tables - with data appropriately distributed.
Database design question for y'all. I have a form (like, the paper kind) that has several entry points for data. This form has changed, and is expected to change over years. It is being turned into a computer app, so that we can, among other things, quit wasting paper. (And minor things, like have all the data in one central store that can be queried, etc.) I'd like to store all of the forms data in a database, and have it be pretty agnostic as to the changes.
Originally, I was just considering each field to be a string -- and I had a table something like this:
FormId int (FK)
FieldName nvarchar(64)
FieldValue nvarchar(128)
...something like that. It was actually a bit more 3NFy in that FieldName was in another table, associated with an artificial key, so that the field names weren't duplicated all over the place.
However, I'd like to extend this to numeric and drop-down data. I could just store numeric data as strings, but that seems like a pretty crappy idea. Same with drop downs.
I could stop using a table, and actually use columns on the main form table (the one that FormId above references), but that means adding a column for each new item as they come along, and older forms would just be null. (And, unless I stored it, I wouldn't know when that column was created. With the string table above, it's implicit.)
I could extend the table above to something like:
FormId int (FK)
FieldName nvarchar(64)
FieldValueType int -- enum as to which of the columns below are valid (or just let nulls imply that)
FieldValue nvarchar(128)
FieldValueInt int
Combos would have to be in a OTLT (one true lookup table), which I have reservations about, but perhaps it's needed here?
Any advice on StackOverflow? I'm using MSSQL, but this is really a more general question.
Use Nulls. Proper database design is a complicated subject; you may do well to pick up a good reference and do some research on the whole thing (I gather this is a good book on the topic). In general, it sounds like you would be well served by starting with a single table that encapsulates all the fields in your form, and then putting it through the normalization process. And yes, use nulls and do NOT use an int to enumerate which columns are set to valid values; that is exactly what nulls are for.
You could have a separate table for each datatype.
I.e. to fetch an entire form you'd do an N-way join using the form id where N is the number of distinct datatypes you support (+ perhaps extras depending on the info you want - e.g. dropdown values would probably be stored in another table / your fieldname lookup / etc.)
But the design should probably also depend on how you intend to use the data, which you've said nothing about. And it would also depend on just how fast the rate of change is for these forms . . .
By creating a table with a description of your forms, you are actually defining a metadata structure. That's daunting. You would need a lot of the infrastructure needed for proper table description. I think the vendors of your database system spent a lot of effort in doing all that.
At first I thought - what a nice idea! Build your own compatibility-aware table description system!
But then I thought - I'm too stupid to do that on my own. There must be a database system capable of doing that.
So I conclude, not being a db expert, define proper defaults for 'new fields' in new form versions. Handle the compatibility issue in your business logic.
I would strongly advise against having a "generic table" like you describe.
You are essentially reinventing the relational database, which is not a good idea: Queries and updates will be very painful with your structure, and you will not be able to use the more advanced features like foreign keys and triggers, should you need them.
Just make a table(s) with columns for the data fields, and if a form does not have a field, let it be null.
Or, probably even better, have a "base table" (field that are in every form), and give names/version numbers to updated forms, and have a new table for the new columns that this version adds, then use a synthetic PK to join these new tables to your base table.
I.e.:
base table: id(numeric,PK), name, birthday, town
addresstable1: street, number, postal code, country, base_table_id (foreign key)
addresstable2: po box no, po box code, base_table_id (FK)
and so on.
That way you avoid loads of null fields; your tables are not so wide (always desirable), and your records are implicitly versioned, because the list of tables that have a record belonging to a record in your base table tells you which fields the original form had, hence what kind of form was used originally.
My friend is building a product to be used by different independent medical units.
The database stores a vast collection of measurements taken at different times, like the temperature, blood pressure, etc...
Let us assume these are held in a table called exams with columns temperature, pressure, etc... (as well as id, patient_id and timestamp). Most of the measurements are stored as floats, but some are of other types (strings, integers...)
While many of these measurements are handled by their product, it needs to allow the different medical units to record and process other custom measurements. A very nifty UI allows the administrator to edit these customs fields, specify their name, type, possible range of values, etc...
He is unsure as to how to store these custom fields.
He is leaning towards a separate table (say a table custom_exam_data with fields like exam_id, custom_field_id, float_value, string_value, ...)
I worry that this will make searching both more difficult to achieve and less efficient.
I am leaning towards modifying the exam table directly (while avoiding conflicts on column names with some scheme like prefixing all custom fields with an underscore or naming them custom_1, ...)
He worries about modifying the database dynamically and having different schemas for each medical unit.
Hopefully some people which more experience can weigh in on this issue.
Notes:
he is using Ruby on Rails but I think this question is pretty much framework agnostic, except from the fact that he is only looking for solutions in SQL databases only.
I simplified the problem a bit since the custom fields need to be available for more than one table, but I believe this doesn`t really impact the direction to take.
(added) A very generic reporting module will need to search, sort, generate stats, etc.. of this data, so it is required that this data be stored in the columns of the appropriate type
(added) User inputs will be filtered, for the standard fields as well as for the custom fields. For example, numbers will be checked within a given range (can't have a temperature of -12 or +444), etc... Thus, conversion to the appropriate SQL type is not a problem.
I've had to deal with this situation many times over the years, and I agree with your initial idea of modifying the DB tables directly, and using dynamic SQL to generate statements.
Creating string UserAttribute or Key/Value columns sounds appealing at first, but it leads to the inner-platform effect where you end up having to re-implement foreign keys, data types, constraints, transactions, validation, sorting, grouping, calculations, et al. inside your RDBMS. You may as well just use flat files and not SQL at all.
SQL Server provides INFORMATION_SCHEMA tables that let you create, query, and modify table schemas at runtime. This has full type checking, constraints, transactions, calculations, and everything you need already built-in, don't reinvent it.
It's strange that so many people come up with ad-hoc solutions for this when there's a well-documented pattern for it:
Entity-Attribute-Value (EAV) Model
Two alternatives are XML and Nested Sets. XML is easier to manage but generally slow. Nested Sets usually require some type of proprietary database extension to do without making a mess, like CLR types in SQL Server 2005+. They violate first-normal form, but are nevertheless the fastest-performing solution.
Microsoft Dynamics CRM achieves this by altering the database design each time a change is made. Nasty, I think.
I would say a better option would be to consider an attribute table. Even though these are often frowned upon, it gives you the flexibility you need, and you can always create views using dynamic SQL to pivot the data out again. Just make sure you always use LEFT JOINs and FKs when creating these views, so that the Query Optimizer can do its job better.
I have seen a use of your friend's idea in a commercial accounting package. The table was split into two, first contained fields solely defined by the system, second contained fields like USER_STRING1, USER_STRING2, USER_FLOAT1 etc. The tables were linked by identity value (when a record is inserted into the main table, a record with same identity is inserted into the second one). Each table that needed user fields was split like that.
Well, whenever I need to store some unknown type in a database field, I usually store it as String, serializing it as needed, and also store the type of the data.
This way, you can have any kind of data, working with any type of database.
I would be inclined to store the measurement in the database as a string (varchar) with another column identifying the measurement type. My reasoning is that it will presumably, come from the UI as a string and casting to any other datatype may introduce a corruption before the user input get's stored.
The downside is that when you go to filter result-sets by some measurement metric you will still have to perform a casting but at least the storage and persistence mechanism is not introducing corruption.
I can't tell you the best way but I can tell you how Drupal achieves a sort of schemaless structure while still using the standard RDBMSs available today.
The general idea is that there's a schema table with a list of fields. Each row really only has two columns, the 'table':String column and the 'column':String column. For each of these columns it actually defines a whole table with just an id and the actual data for that column.
The trick really is that when you are working with the data it's never more than one join away from the bundle table that lists all the possible columns so you end up not losing as much speed as you might otherwise think. This will also allow you to expand much farther than just a few medical companies unlike the custom_ prefix you were proposing.
MySQL is very fast at returning row data for short rows with few columns. In this way this scheme ends up fairly quick while allowing you lots of flexibility.
As to search, my suggestion would be to index the page content instead of the database content. Use Solr to parse through rendered pages and hold links to the actual page instead of trying to search through the database using clever SQL.
Define two new tables: custom_exam_schema and custom_exam_data.
custom_exam_data has an exam_id column, plus an additional column for every custom attribute.
custom_exam_schema would have a row to describe how to interpret each of the columns of the custom_exam_data table. It would have columns like name, type, minValue, maxValue, etc.
So, for example, to create a custom field to track the number of fingers a person has, you would add ('fingerCount', 'number', 0, 10) to custom_exam_schema and then add a column named fingerCount to the exam table.
Someone might say it's bad to change the database schema at run time, but I'd argue that configuring these custom fields is part of set up and won't happen too often. Still, this method lets you handle changes at any time and doesn't risk messing around with your core table schemas.
lets say that your friend's database has to store data values from multiple sources such as demogrphic values, diagnosis, interventions, physionomic values, physiologic exam values, hospitalisation values etc.
He might have as well to define choices, lets say his database is missing the race and the unit staff need the race of the patient (different races are more unlikely to get some diseases), they might want to use a drop down with several choices.
I would propose to use an other table that would have these choices or would you just use a "Custom_field_choices" table, which at some point is exactly the same but with a different name.
Considering that the database :
- needs to be flexible
- that data from multiple tables can be added and be customized
- that you might want to keep the integrity of the main structure of your database for distribution and uniformity purpose
- that data MUST have a limit and alarms and warnings
- that data must have units ( 10 kg or 10 pounds) ?
- that data can have a selection of choices
- that data can be with different rights (from simple user to admin)
- that these data might be needed to generate reports without modifying the code (automation)
- that these data might be needed to make cross reference analysis within the system without modifying the code
the custom table would be my solution, modifying each table would end up being too risky.
I would store those custom fields in a table where each record ( dataType, dataValue, dataUnit ) would use in one row. So there would be a relation oneToMany from one sample to the data. You can also create a table to record all the kind of cutsom types you would use. For example:
create table DataType
(
id int primary key,
name varchar(100) not null unique
description text,
uri varchar(255) //<-- can be used for an ONTOLOGY
)
create table DataRecord
(
id int primary key,
sample_id int not null,//<-- reference to the sample
dataType_id int not null, //<-- references DataType
value varchar(100),//<-- the value as string
unit varchar(50)//<-- g, mg/ml, etc... but it could also be a link to a table describing the units just like DataType
)