Hierarchy vs many tables - SQL

We have a requirement to store locations. There are different types of locations. Areas, Blocks, Buildings, Floors, Rooms and Beds. So, a Bed is in a room, which is on a floor etc.
I think I have two options. First is to have a table for each type. And a foreign key to keep them all linked.
OR...
CREATE TABLE [dbo].[Location]
(
    [ID] int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    [ParentID] int NULL,            -- self-reference to Location.ID; NULL for top-level areas
    [LocationTypeID] int NOT NULL,  -- Area, Block, Building, Floor, Room, Bed, ...
    [Description] varchar(100) COLLATE Latin1_General_CI_AS NOT NULL
)
A table to hold all locations, in a hierarchical style.
I like the idea of this: if we add new types, it's data driven; no new table needed. But I think the querying can be expensive.
If I want to show bed details (Bed 1 in Room 5 on the 4th floor of the science building...), it's a recursive function, which is more tricky than a simple INNER JOIN of all the tables to get details about a location.
One thing though.
I need to record movements, and a movement might be from a room to an area. So, with separate tables, it would be hard to record movements in a single 'movement' table, as which table do I FK to? With a hierarchy, it's very easy.
Also, reporting on where 1,000 people are would call the recursive query a lot to produce results. Slow? Or is there a clean way around this?

All methods have their own pros and cons. Another approach I have seen (especially with dates) is to use one table and encode the hierarchy in a single field (no parent id).
For example, ID (integer column) = 1289674566 would mean bed 66, floor 45, etc.
It will require a little work when you need to "extract" a specific hierarchy level (for example, to count the number of distinct buildings), but arithmetic operations are quite fast, and you can build views on top of the base table if you want to make life easier for end users.
Just another option...
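A rough sketch of that extraction, assuming two digits per level with the bed in the lowest two digits (the table and column names here are only illustrative):
SELECT
    ID,
    ID % 100         AS BedNumber,    -- lowest two digits
    (ID / 100) % 100 AS FloorNumber   -- next two digits
FROM Location;

-- e.g. counting distinct buildings, if the building occupies digits 5-6
SELECT COUNT(DISTINCT (ID / 10000) % 100) AS BuildingCount
FROM Location;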

My canned suggestion is to store the data how it is in the real world, then if you're lucky you can query it without too much of a hit. If there is too much of a hit, extract the data into a format that you can easily search.
In your case, I would go with the hierarchical style you are thinking. That way you can have a building with a room with a dresser with a drawer with a box in a box. You can then move the dresser to another room and all the stuff goes with it.
You'll find that recursive CTEs are fast as long as you're not trying to 'trick' SQL Server into doing something.
I just answered a hierarchical question here that has a good example for you to play with. In particular pay attention to the SORT_PATH. In the example I build a SORT_PATH that looks something like this:
TEST01.TEST03.LABSTL
SSRS: Recursive Parent Child
You can store this value in your table on EDIT/UPDATE and it can do a lot for you (for performance) as long as you don't mind the hit when you're updating the record.
If you do mind the hit on an update, you can use a backend process to keep your SORT_PATH updated. In the past I've used a "DIRTY BIT" field that gets flipped when something is modified; a backend process then comes through and updates everything related to that record, but since it's a backend process, users don't notice the impact. This is a good job for SERVICE BROKER to manage: on edit/update/delete, set DIRTY_BIT = True and send SERVICE BROKER a message that kicks off a process to update anything with DIRTY_BIT = True.
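As a sketch of the kind of recursive CTE this implies, here is an untested query against the Location table from the question that builds a SORT_PATH-style value for every row (SQL Server 2005+):
WITH LocationPath AS
(
    -- Anchor: top-level locations (no parent)
    SELECT ID, ParentID, [Description],
           CAST([Description] AS VARCHAR(4000)) AS SortPath
    FROM dbo.Location
    WHERE ParentID IS NULL

    UNION ALL

    -- Recursive step: append each child to its parent's path
    SELECT c.ID, c.ParentID, c.[Description],
           CAST(p.SortPath + '.' + c.[Description] AS VARCHAR(4000))
    FROM dbo.Location c
    INNER JOIN LocationPath p ON c.ParentID = p.ID
)
SELECT ID, SortPath
FROM LocationPath;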

Have a look at using the hierarchyid data type. It's a CLR data type native to SQL Server and has built-in functions for querying parent/child relationships:
Tutorial: Using the hierarchyid Data Type
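A rough sketch of what that could look like for the locations example (column names are illustrative; hierarchyid is available in SQL Server 2008+):
CREATE TABLE dbo.LocationTree
(
    Node hierarchyid PRIMARY KEY,
    LocationTypeID int NOT NULL,
    [Description] varchar(100) NOT NULL
);

-- All descendants of a given building (every floor, room and bed under it)
DECLARE @building hierarchyid =
    (SELECT Node FROM dbo.LocationTree WHERE [Description] = 'Science Building');

SELECT [Description], Node.GetLevel() AS Depth
FROM dbo.LocationTree
WHERE Node.IsDescendantOf(@building) = 1;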

Related

database design: link table for multiple different tables

I have a conceptual question about database design which came up multiple times in my history as a developer.
Imagine I have a larger database that was designed in the past and is already in production (for example, take http://sqlfiddle.com/#!9/7c06f/1, a library database).
Now there is a new feature request: for every existing object in every table there shall be a "help text" (or something similar: an error, a tag...) that you can view all in one place.
I have implemented something like that multiple times in the past, but every time I was not satisfied with the solution.
One solution is to have a link table for every table, like this:
CREATE TABLE BooksHelp (
    BookId INT NOT NULL,             -- references Books
    HelpText VARCHAR(1000) NOT NULL
);
CREATE TABLE AuthorHelp (
    AuthorId INT NOT NULL,           -- references Authors
    HelpText VARCHAR(1000) NOT NULL
);
...
But I would need multiple link tables, which makes the selection of every existing "help text" difficult.
How would you design this problem? Is there another, better solution?
This appears to be similar to trying to map polymorphism to the relational model - it's just a bad fit - there's no obvious right answer.
There are a few obvious solutions. The one you've identified (storing the helptext in a linked table) is neat, but requires lots of joins if you're retrieving all the books for an author belonging to a publisher etc. As the business logic says "all objects should have helptext" (and I assume each row has different helptext), it's not super logical to store this in a child table - "helptext" is an attribute of each object, not a related concept.
You could also add a helptext column to each table. That stores the attribute in the main table, and reduces the cognitive load (as well as the number of joins). This is logical if each author, book, etc. has their own helptext.
In the case of text items of some sort, especially if they are used in several tables, like help texts, it seems preferable to centralize their text data in one table and store references in the tables that need them. This design also supports multi-lingual apps, where the text items would have to be maintained in several languages.
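A minimal sketch of that centralized variant (table and constraint names are illustrative, not from the answer):
CREATE TABLE HelpText (
    HelpTextId INT NOT NULL PRIMARY KEY,
    HelpText   VARCHAR(1000) NOT NULL
    -- a multi-lingual variant would move the text into a child table keyed by (HelpTextId, LanguageCode)
);

-- each existing table gets a nullable reference instead of its own link table
ALTER TABLE Books ADD HelpTextId INT NULL;
ALTER TABLE Books ADD CONSTRAINT fk_books_helptext
    FOREIGN KEY (HelpTextId) REFERENCES HelpText (HelpTextId);
-- ...repeat for Authors and the other tables

-- "All help texts in one place" is then a single query against HelpText
SELECT HelpTextId, HelpText FROM HelpText;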

Is it efficient to have 2 SQL tables with the same structure?

In inventory / production systems I usually implement a table structure similar to the following description...
-- Raw Item
CREATE TABLE raw_item (
    id INT(10) UNSIGNED AUTO_INCREMENT,
    name VARCHAR(32) NOT NULL,
    description VARCHAR(128) NOT NULL,
    ideal INT(10) UNSIGNED,
    PRIMARY KEY (id)
);
And another table with the same fields for the processed items...
Next, tables for clients and providers with similar structures.
Then tables for incoming orders and outgoing orders with similar structures.
And finally, a table which establishes relationships between a certain processed item and the types of raw items (with quantities) required to produce a batch...
It performs well... But I wonder if it would be better to merge the similar tables and add a field such as 'type'. I would like some advice, please.
In my mind adding an artificial type to combine similar data is not 3NF. Go with separate tables unless you need the data in the same table.
A client and a provider have similar fields but they are two different things.
If an order migrates from PO to Processed then it is the same thing, and it would be appropriate to have a status flag. If you are moving data from one table to the other, then combining them with a flag is preferred.
That's a good question, and as with many good SQL questions the answer is "it depends".
IMHO it's OK to create similar (internally structured) tables for different artifacts (with similar properties).
See, you can get:
Owner
Id, Name
Pet
Id, Name
Tables with the same columns but different meanings.
Sure, you can put Items and RawItems in the same table with just a flag column to differentiate between the two. You can even use a self-referencing FK to relate Items with RawItems, but how does that affect performance?
Well, as your table grows the engine will need more time (resources, memory, CPU) to retrieve rows/data. If you double the rows... for most DBMSs, doubling the number of tables affects performance a lot less than doubling a table's rows.
It also affects future maintenance. If you later need to add a column for your RawItems but not for your Items, you end up wasting space.
"Merging" similar tables can make your schema harder to understand, not simpler.
I would imagine this would depend slightly on the database engine you choose. See this link for a little more information: https://dev.mysql.com/doc/refman/5.1/en/storage-engines.html
Each one offers various degrees of performance improvement and lacks features in certain areas. I guess the real question is: do you want to deal with multiple tables, or would it be easier to deal with a single table?
A simple solution would be to add something like a status column to the table, then change your queries to check against that status. A very simple and space-efficient way to merge both tables into one.
-- Items (raw and processed merged into one table)
CREATE TABLE items (
    id INT(10) UNSIGNED AUTO_INCREMENT,
    name VARCHAR(32) NOT NULL,
    description VARCHAR(128) NOT NULL,
    ideal INT(10) UNSIGNED,
    status VARCHAR(9) NOT NULL,
    PRIMARY KEY (id)
);

SELECT * FROM items WHERE status = 'PROCESSED';
Personally I would prefer this approach. It leaves you with only a single table of inventory, as opposed to having multiple tables cluttering up the database schema. Not to mention if it ever expands further (say, you have new, processed, and archived).
How are you going to use the data over time? If you need to combine all these tables together for reporting, one table is likely preferable, with partitioning if it is very large. If you need to move the records from one table to the other as they move through a process, it is easier to check on the status of a process if it is all in one table.
Also, if the things are very different entities, they may have different related tables, and then combining them just muddies the waters, makes the database more difficult to understand, and reduces the effectiveness of your PK/FK relationships. In this case, separate tables make the most sense. It also makes the most sense if you think the data will diverge over time as currently planned features are added.
Take for instance a customer and a sales rep. They might have many of the same fields, but what they are related to will be very different, and you would not want a customer to be able to be put into the child table for a rep. So now you have to enforce the relationships with more than an FK. Further, if you have enough child tables, it makes deleting records difficult: the db has to check all the tables even if only half of them might apply to a particular record.
In one database I have seen, the original designer combined two things that were dissimilar but had the same fields at the parent level and ended up with well over 100 FKs on that table. It was a nightmare to delete from that table and never fast. And there were occasional data integrity problems where one type of record ended up in the wrong child tables.

How important are lookup tables?

A lot of the applications I write make use of lookup tables, since that was just the way I was taught (normalization and such). The problem is that the queries I make are often more complicated because of this. They often look like this
get all posts that are still open
"SELECT * FROM posts WHERE status_id = (SELECT id FROM statuses WHERE name = 'open')"
Often times, the lookup tables themselves are very short. For instance, there may only be 3 or so different statuses. In this case, would it be okay to search for a certain type by using a constant or so in the application? Something like
get all posts that are still open
"SELECT * FROM posts WHERE status_id = ".Status::OPEN
Or, what if instead of using a foreign id, I set it as an enum and queried off of that?
Thanks.
The answer depends a little on whether you are limited to freeware such as PostgreSQL (not fully SQL compliant), or whether you are thinking about SQL (i.e. SQL-compliant) and large databases.
In SQL compliant, Open Architecture databases, where there are many apps using one database, and many users using different report tools (not just the apps) to access the data, standards, normalisation, and open architecture requirements are important.
Despite the people who attempt to change the definition of "normalisation", etc. to suit their ever-changing purpose, Normalisation (the science) has not changed.
If you have data values such as {Open; Closed; etc} repeated in data tables, that is data duplication, a simple Normalisation error: if those values change, you may have to update millions of rows, which is a very limited design.
Such values should be Normalised into a Reference or Lookup table, with a short CHAR(2) PK:
O Open
C Closed
U [NotKnown]
The data values {Open;Closed;etc} are no longer duplicated in the millions of rows. It also saves space.
The second point is ease of change. If Closed were changed to Expired, again, only one row needs to be changed, and that is reflected in the entire database; whereas in the un-normalised files, millions of rows would need to be changed.
Adding new data values, eg. (H,HalfOpen) is then simply a matter of inserting one row.
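As a minimal sketch of that Reference table and its FK, assuming posts is given a status_code CHAR(2) column in place of the integer status_id (names are illustrative, not from the answer):
CREATE TABLE status (
    status_code CHAR(2)     NOT NULL PRIMARY KEY,  -- 'O', 'C', 'U'
    name        VARCHAR(30) NOT NULL               -- 'Open', 'Closed', 'NotKnown'
);

-- posts carries the short code; the FK keeps it within the domain
ALTER TABLE posts
    ADD CONSTRAINT posts_status_fk
    FOREIGN KEY (status_code) REFERENCES status (status_code);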
In Open Architecture terms, the Lookup table is an ordinary table. It exists in the [SQL compliant] catalogue; as long as the FOREIGN KEY relation has been defined, the report tool can find that as well.
ENUM is non-SQL; do not use it. In SQL the "enum" is a Lookup table.
The next point relates to the meaningfulness of the key.
If the Key is meaningless to the user, fine, use an {INT;BIGINT;GUID;etc} or whatever is suitable; do not number them incrementally; allow "gaps".
But if the Key is meaningful to the user, do not use a meaningless number, use a meaningful Relational Key.
Now some people will get in to tangents regarding the permanence of PKs. That is a separate point. Yes, of course, always use a stable value for a PK (not "immutable", because no such thing exists, and a system-generated key does not provide row uniqueness).
{M,F} are unlikely to change
if you have used {0,1,2,4,6}, well, don't change it; why would you want to? Those values were supposed to be meaningless, remember; only a meaningful Key needs to be changed.
if you do use meaningful keys, use short alphabetic codes, that developers can readily understand (and infer the long description from). You will appreciate this only when you code SELECT and realise you do not have to JOIN every Lookup table. Power users too, appreciate it.
Since PKs are stable, particularly in Lookup tables, you can safely code:
WHERE status_code = 'O' -- Open
You do not have to JOIN the Lookup table and obtain the data value Open, as a developer, you are supposed to know what the Lookup PKs mean.
Last, if the database were large, and supported BI or DSS or OLAP functions in addition to OLTP (as properly Normalised databases can), then the Lookup table is actually a Dimension or Vector, in Dimension-Fact analyses. If it was not there, then it would have to be added in, to satisfy the requirements of that software, before such analyses can be mounted.
If you do that to your database from the outset, you will not have to upgrade it (and the code) later.
Your Example
SQL is a low-level language, thus it is cumbersome, especially when it comes to JOINs. That is what we have, so we need to just accept the encumbrance and deal with it. Your example code is fine. But simpler forms can do the same thing.
A report tool would generate:
SELECT p.*,
s.name
FROM posts p,
status s
WHERE p.status_id = s.status_id
AND p.status_id = 'O'
Another Example
For banking systems, where we use short codes which are meaningful (since they are meaningful, we do not change them with the seasons, we just add to them), given a Lookup table such as (carefully chosen, similar to ISO Country Codes):
Eq Equity
EqCS Equity/Common Share
OTC OverTheCounter
OF OTC/Future
Code such as this is common:
WHERE InstrumentTypeCode LIKE 'Eq%'
And the users of the GUI would choose the value from a drop-down that displays
{Equity/Common Share;Over The Counter},
not {Eq;OTC;OF}, not {M;F;U}.
Without a lookup table, you can't do that, either in the apps, or in the report tool.
For look-up tables I use a sensible primary key -- usually just a CHAR(1) that makes sense in the domain with an additional Title (VARCHAR) field. This can maintain relationship enforcement while "keeping the SQL simple". The key to remember here is the look-up table does not "contain data". It contains identities. Some other identities might be time-zone names or assigned IOC country codes.
For instance gender:
ID Label
M Male
F Female
N Neutral
select * from people where gender = 'M'
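A sketch of the DDL behind that example; the people table and its columns are assumed here for illustration:
CREATE TABLE gender (
    ID    CHAR(1)     NOT NULL PRIMARY KEY,
    Label VARCHAR(20) NOT NULL
);

CREATE TABLE people (
    person_id INT         NOT NULL PRIMARY KEY,
    name      VARCHAR(80) NOT NULL,
    gender    CHAR(1)     NOT NULL REFERENCES gender (ID)
);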
Alternatively, an ORM could be used and manual SQL generation might never have to be done -- in this case the standard "int" surrogate key approach is fine because something else deals with it :-)
Happy coding.
Create a function for each lookup.
There is no easy way; you want both performance and query simplicity. Ensure the function is kept in sync with the lookup values. You could create an SP_TestAppEnums procedure to compare the existing lookup values against the function and flag anything out of sync (or returning nothing).
CREATE FUNCTION [Enum_Post](@postname varchar(10))
RETURNS int
AS
BEGIN
    DECLARE @postId int
    SET @postId =
        CASE @postname
            WHEN 'Open' THEN 1
            WHEN 'Closed' THEN 2
        END
    RETURN @postId
END
GO
/* Calling the function */
SELECT dbo.Enum_Post('Open')
SELECT dbo.Enum_Post('Closed')
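As a rough sketch of that consistency check, assuming the statuses(id, name) lookup table from the question (this body is illustrative, not from the answer):
-- Rows where the lookup table and the function disagree (or the function returns NULL)
SELECT s.id, s.name, dbo.Enum_Post(s.name) AS function_id
FROM statuses s
WHERE dbo.Enum_Post(s.name) IS NULL
   OR dbo.Enum_Post(s.name) <> s.id;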
Question is: do you need to include the lookup tables (domain tables 'round my neck of the woods) in your queries? Presumably, these sorts of tables are usually:
- pretty static in nature — the domain might get extended, but it probably won't get shortened;
- their primary key values are pretty unlikely to change (e.g., the status_id for a status of 'open' is unlikely to suddenly get changed to something other than what it was created as).
If the above assumptions are correct, there's no real need to add all those extra tables to your joins just so your where clause can use a friendly name instead of an id value. Just filter on status_id directly where you need to. I'd suspect the non-key attribute in the where clause ('name' in your example above) is more likely to get changed than the key attribute ('status_id' in your example above): you're more protected by referencing the desired key value(s) of the domain table in your join.
Domain tables serve:
- to limit the domain of the variable via a foreign key relationship,
- to allow the domain to be expanded by adding data to the domain table,
- to populate UI controls and the like with user-friendly information.
Naturally, you'd need to suck domain tables into your queries where you actually require the non-key attributes from the domain table (e.g., the descriptive name of the value).
YMMV: a lot depends on context and the nature of the problem space.
The answer is "whatever makes sense".
Lookup tables involve joins or subqueries, which are not always efficient. I make use of enums a lot to do this job; it's efficient and fast.
Where possible (and it is not always . . .), I use this rule of thumb: if I need to hard-code a value into my application (vs. letting it remain a record in the database), and also store that value in my database, then something is amiss with my design. It's not ALWAYS true, but basically, whatever the value in question is, it either represents a piece of DATA, or a piece of PROGRAM LOGIC. It is a rare case that it is both.
NOT that you won't find yourself discovering which one it is halfway into the project. But as the others said above, there can be trade-offs either way. Just as we don't always achieve "perfect" normalization in a database design (for reasons of performance, or simply because you CAN take things too far in pursuit of academic perfection . . .), we may make some conscious choices about where we locate our "look-up" values.
Personally, though, I try to stand on my rule above. It is either DATA, or PROGRAM LOGIC, and rarely both. If it ends up as (or IN) a record in the database, I try to keep it out of the application code (except, of course, to retrieve it from the database . . .). If it is hard-coded in my application, I try to keep it out of my database.
In cases where I can't observe this rule, I DOCUMENT THE CODE with my reasoning, so three years later some poor soul will be able to figure out how it broke, if that happens.
The commenters have convinced me of the error of my ways. This answer and the discussion that went along with it, however, remain here for reference.
I think a constant is appropriate here, and a database table is not. As you design your application, you expect that table of statuses to never, ever change, since your application has hard-coded into it what those statuses mean, anyway. The point of a database is that the data within it will change. There are cases where the lines are fuzzy (e.g. "this data might change every few months or so…"), but this is not one of the fuzzy cases.
Statuses are a part of your application's logic; use constants to define them within the application. It's not only more strictly organized that way, but it will also allow your database interactions to be significantly speedier.

How to add user customized data to database?

I am trying to design a sqlite database that will store notes. Each of these notes will have common fields like title, due date, details, priority, and completed.
In addition though, I would like to add data for more specialized notes like price for shopping list items and author/publisher data for books.
I also want to have a few general purpose fields that users can fill with whatever text data they want.
How can I design my database table in this case?
I could just have a field for each piece of data for every note, but that would waste a lot of fields and I'd like to have other options and suggestions.
There are several standard approaches you could use for solving this situation.
You could create separate tables for each kind of note, copying over the common columns in each case. This would be easy, but it would make it difficult to query across all notes.
You could create one large table with many columns and some kind of type field which would let you know which type of note it is (and therefore which subset of columns to use)
CREATE TABLE NOTE (ID int PRIMARY KEY, NOTE_TYPE int, DUEDATE datetime, /* ...more common fields */ price NUMERIC NULL, author VARCHAR(100) NULL /* ...more type-specific fields */);
You could break your tables up into an inheritance relationship, something like this:
CREATE TABLE NOTE (ID int PRIMARY KEY, NOTE_TYPE int, DUEDATE datetime /* ...more common fields */);
CREATE TABLE SHOPPINGLISTITEM (ID int PRIMARY KEY, NOTE_ID int REFERENCES NOTE (ID), price NUMERIC /* ...more shopping list item fields */);
Option 1 would be easy to implement but would involve lots of mostly redundant table definitions.
Option 2 would be easy to create and easy to write queries on but would be space inefficient
And option 3 would be more space efficient and less redundant but would possibly have slower queries because of all the foreign keys.
This is the typical set of trade-offs for modeling these kinds of relationships in SQL; any of these solutions could be appropriate for your use case, depending on your performance requirements.
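For example, reading shopping-list notes under option 3 could look like this (the NOTE_TYPE value is illustrative):
SELECT n.ID, n.DUEDATE, s.price
FROM NOTE n
INNER JOIN SHOPPINGLISTITEM s ON s.NOTE_ID = n.ID
WHERE n.NOTE_TYPE = 1;   -- 1 = shopping list item (illustrative)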
You could create something like a custom_field table. It gets pretty messy once you start to normalize.
So you have your note table with its common fields.
Now add:
dynamic_note_field
id label
1 publisher
2 color
3 size
dynamic_note_field_data
id dynamic_note_field_id value
1 1 Penguin
2 1 Marvel
3 2 Red
Finally, you can relate instances of your data with the fields they use through
note_dynamic_note_field_data
note_id dynamic_note_field_data_id
1 1
1 3
2 2
So now we've said: note_id 1 has two additional fields. The first one has a value "Penguin" and represents a publisher. The second one has a value of "Red" and represents a color.
So what's the point of normalizing it this far?
You're not wasting space adding fields to every item (you relate a note with its additional dynamic fields via the m2m table).
You're not storing redundant labels. (You may continue to store redundant data, however, as the same publisher is likely to appear many times... this aspect is extremely subjective. If you want rich data about your publishers, you typically want to take the step of turning them into their own entity rather than an ad-hoc string. Be careful when making this leap, because it adds an extra level of hairiness to the db. Evaluate the use case accordingly.)
The dynamic_note_field acts as your data definition. If you're interested in answering a question such as "what are the additional fields I've created" this lets you do it easily without searching all of your dynamic_note_field_data. Eventually, you might add extra info to this table such as a type field. I like to create this separation off the bat, but that might be a violation of the YAGNI principle in your case.
Disadvantages:
It's not too bad to search for all notes that have a publisher where that publisher is "Penguin".
What's tricky is something like "Find any note with a value of 'Penguin' in any field". You don't know up front which field you're searching. At this point you're better off with a separate index that's generated alongside your normalized db data, which acts as the point of truth. Again, the nice thing about normalization is that you maintain the data in a very lossless, non-destructive state.
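In SQLite terms, a sketch of those three tables and the "publisher = Penguin" search might look like this (untested; it assumes the main notes table is called note with an INTEGER PRIMARY KEY id):
CREATE TABLE dynamic_note_field (
    id    INTEGER PRIMARY KEY,
    label TEXT NOT NULL
);

CREATE TABLE dynamic_note_field_data (
    id                    INTEGER PRIMARY KEY,
    dynamic_note_field_id INTEGER NOT NULL REFERENCES dynamic_note_field (id),
    value                 TEXT
);

CREATE TABLE note_dynamic_note_field_data (
    note_id                    INTEGER NOT NULL REFERENCES note (id),
    dynamic_note_field_data_id INTEGER NOT NULL REFERENCES dynamic_note_field_data (id),
    PRIMARY KEY (note_id, dynamic_note_field_data_id)
);

-- "All notes with a publisher of Penguin"
SELECT m.note_id
FROM note_dynamic_note_field_data m
JOIN dynamic_note_field_data d ON d.id = m.dynamic_note_field_data_id
JOIN dynamic_note_field f      ON f.id = d.dynamic_note_field_id
WHERE f.label = 'publisher' AND d.value = 'Penguin';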
For data you want to store but does not have to be searchable, another option is to serialize it to/from JSON and store it in a TEXT column. This gives you arbitrary structure, but you cannot readily query against those values.
Yet another option is to dump SQLite and go with an object database. I seem to recall there are one or two working for Android. I have not tried any of these, however.
Just create a small table which contains the common fields of all your notes.
Then create a table for each class of special notes you have, which contains all the extra fields plus a reference to your first table.
For each note you enter, you create a row in your main table (which contains the common fields) and a row in the extra table that contains the extra fields, plus a reference to the row in your main table.
Then you will just have to make a join in your query.
With this solution:
1) you have a safe design (can't access fields that are not part of your note)
2) your db will be optimized
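A minimal sketch of that split, assuming the common-fields table is called note (with assumed columns title and due_date) and that books are one class of special notes:
CREATE TABLE book_note (
    note_id   INTEGER NOT NULL PRIMARY KEY REFERENCES note (id),
    author    TEXT,
    publisher TEXT
);

-- One join returns the common and the extra fields together
SELECT n.title, n.due_date, b.author, b.publisher
FROM note n
JOIN book_note b ON b.note_id = n.id;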

What is the preferred way to store custom fields in a SQL database?

My friend is building a product to be used by different independent medical units.
The database stores a vast collection of measurements taken at different times, like the temperature, blood pressure, etc...
Let us assume these are held in a table called exams with columns temperature, pressure, etc... (as well as id, patient_id and timestamp). Most of the measurements are stored as floats, but some are of other types (strings, integers...)
While many of these measurements are handled by their product, it needs to allow the different medical units to record and process other custom measurements. A very nifty UI allows the administrator to edit these customs fields, specify their name, type, possible range of values, etc...
He is unsure as to how to store these custom fields.
He is leaning towards a separate table (say a table custom_exam_data with fields like exam_id, custom_field_id, float_value, string_value, ...)
I worry that this will make searching both more difficult to achieve and less efficient.
I am leaning towards modifying the exam table directly (while avoiding conflicts on column names with some scheme like prefixing all custom fields with an underscore or naming them custom_1, ...)
He worries about modifying the database dynamically and having different schemas for each medical unit.
Hopefully some people with more experience can weigh in on this issue.
Notes:
he is using Ruby on Rails, but I think this question is pretty much framework agnostic, except for the fact that he is only looking for solutions in SQL databases.
I simplified the problem a bit, since the custom fields need to be available for more than one table, but I believe this doesn't really impact the direction to take.
(added) A very generic reporting module will need to search, sort, generate stats, etc.. of this data, so it is required that this data be stored in the columns of the appropriate type
(added) User inputs will be filtered, for the standard fields as well as for the custom fields. For example, numbers will be checked within a given range (can't have a temperature of -12 or +444), etc... Thus, conversion to the appropriate SQL type is not a problem.
I've had to deal with this situation many times over the years, and I agree with your initial idea of modifying the DB tables directly, and using dynamic SQL to generate statements.
Creating string UserAttribute or Key/Value columns sounds appealing at first, but it leads to the inner-platform effect where you end up having to re-implement foreign keys, data types, constraints, transactions, validation, sorting, grouping, calculations, et al. inside your RDBMS. You may as well just use flat files and not SQL at all.
SQL Server provides INFORMATION_SCHEMA views that let you inspect table schemas at runtime, and dynamic DDL lets you modify them. This way you get full type checking, constraints, transactions, calculations, and everything you need already built in; don't reinvent it.
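A hedged sketch of that combination in T-SQL (the exams table and the custom column name are made up for illustration): check the catalog for the column, then add it with dynamic SQL.
IF NOT EXISTS (
    SELECT 1
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_NAME = 'exams' AND COLUMN_NAME = 'custom_oxygen_saturation'
)
BEGIN
    -- Dynamic DDL; in real code the identifier should be validated/quoted (QUOTENAME)
    DECLARE @sql nvarchar(max) =
        N'ALTER TABLE dbo.exams ADD custom_oxygen_saturation float NULL;';
    EXEC sp_executesql @sql;
END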
It's strange that so many people come up with ad-hoc solutions for this when there's a well-documented pattern for it:
Entity-Attribute-Value (EAV) Model
Two alternatives are XML and Nested Sets. XML is easier to manage but generally slow. Nested Sets usually require some type of proprietary database extension to do without making a mess, like CLR types in SQL Server 2005+. They violate first-normal form, but are nevertheless the fastest-performing solution.
Microsoft Dynamics CRM achieves this by altering the database design each time a change is made. Nasty, I think.
I would say a better option would be to consider an attribute table. Even though these are often frowned upon, it gives you the flexibility you need, and you can always create views using dynamic SQL to pivot the data out again. Just make sure you always use LEFT JOINs and FKs when creating these views, so that the Query Optimizer can do its job better.
I have seen a use of your friend's idea in a commercial accounting package. The table was split into two, first contained fields solely defined by the system, second contained fields like USER_STRING1, USER_STRING2, USER_FLOAT1 etc. The tables were linked by identity value (when a record is inserted into the main table, a record with same identity is inserted into the second one). Each table that needed user fields was split like that.
Well, whenever I need to store some unknown type in a database field, I usually store it as String, serializing it as needed, and also store the type of the data.
This way, you can have any kind of data, working with any type of database.
I would be inclined to store the measurement in the database as a string (varchar), with another column identifying the measurement type. My reasoning is that it will presumably come from the UI as a string, and casting to any other datatype may introduce corruption before the user input gets stored.
The downside is that when you go to filter result sets by some measurement metric you will still have to perform a cast, but at least the storage and persistence mechanism is not introducing corruption.
I can't tell you the best way but I can tell you how Drupal achieves a sort of schemaless structure while still using the standard RDBMSs available today.
The general idea is that there's a schema table with a list of fields. Each row really only has two columns, the 'table':String column and the 'column':String column. For each of these columns it actually defines a whole table with just an id and the actual data for that column.
The trick really is that when you are working with the data it's never more than one join away from the bundle table that lists all the possible columns so you end up not losing as much speed as you might otherwise think. This will also allow you to expand much farther than just a few medical companies unlike the custom_ prefix you were proposing.
MySQL is very fast at returning row data for short rows with few columns. In this way this scheme ends up fairly quick while allowing you lots of flexibility.
As to search, my suggestion would be to index the page content instead of the database content. Use Solr to parse through rendered pages and hold links to the actual page instead of trying to search through the database using clever SQL.
Define two new tables: custom_exam_schema and custom_exam_data.
custom_exam_data has an exam_id column, plus an additional column for every custom attribute.
custom_exam_schema would have a row to describe how to interpret each of the columns of the custom_exam_data table. It would have columns like name, type, minValue, maxValue, etc.
So, for example, to create a custom field to track the number of fingers a person has, you would add ('fingerCount', 'number', 0, 10) to custom_exam_schema and then add a column named fingerCount to the custom_exam_data table.
Someone might say it's bad to change the database schema at run time, but I'd argue that configuring these custom fields is part of set up and won't happen too often. Still, this method lets you handle changes at any time and doesn't risk messing around with your core table schemas.
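A compact sketch of that arrangement (untested; the types and column names are illustrative):
CREATE TABLE custom_exam_schema (
    name      VARCHAR(64) NOT NULL PRIMARY KEY,
    type      VARCHAR(20) NOT NULL,   -- 'number', 'string', ...
    min_value FLOAT NULL,
    max_value FLOAT NULL
);

CREATE TABLE custom_exam_data (
    exam_id INT NOT NULL PRIMARY KEY REFERENCES exams (id)
    -- one additional column per custom attribute, added at configuration time
);

-- Configuring the fingerCount example from above:
INSERT INTO custom_exam_schema (name, type, min_value, max_value)
VALUES ('fingerCount', 'number', 0, 10);

ALTER TABLE custom_exam_data ADD fingerCount INT NULL;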
Let's say that your friend's database has to store data values from multiple sources such as demographic values, diagnoses, interventions, physiognomic values, physiological exam values, hospitalisation values, etc.
He may also have to define choices. Let's say his database is missing race and the unit staff need the race of the patient (some diseases are more or less likely for different races); they might want to use a drop-down with several choices.
I would propose using another table that holds these choices, i.e. a "Custom_field_choices" table, which at some point is exactly the same thing under a different name.
Considering that the database:
- needs to be flexible
- that data from multiple tables can be added and customized
- that you might want to keep the integrity of the main structure of your database for distribution and uniformity purposes
- that data MUST have limits, alarms and warnings
- that data must have units (10 kg or 10 pounds)
- that data can have a selection of choices
- that data can come with different rights (from simple user to admin)
- that these data might be needed to generate reports without modifying the code (automation)
- that these data might be needed for cross-reference analysis within the system without modifying the code
the custom table would be my solution; modifying each table would end up being too risky.
I would store those custom fields in a table where each record (dataType, dataValue, dataUnit) takes one row, so there is a one-to-many relation from one sample to the data. You can also create a table to record all the kinds of custom types you would use. For example:
create table DataType
(
    id int primary key,
    name varchar(100) not null unique,
    description text,
    uri varchar(255)          -- can be used for an ONTOLOGY
);

create table DataRecord
(
    id int primary key,
    sample_id int not null,   -- reference to the sample
    dataType_id int not null, -- references DataType
    value varchar(100),       -- the value as a string
    unit varchar(50)          -- g, mg/ml, etc.; could also be a link to a table describing the units, just like DataType
);
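A usage sketch for that pair of tables (the sample_id value is illustrative):
-- All recorded values for one sample, with their type names and units
SELECT t.name AS data_type, r.value, r.unit
FROM DataRecord r
INNER JOIN DataType t ON t.id = r.dataType_id
WHERE r.sample_id = 42;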