SQL - Two foreign keys that have a dependency between them

The current structure is as follows:
Table RowType:    RowTypeID

Table RowSubType: RowSubTypeID
                  FK_RowTypeID

Table ColumnDef:  FK_RowTypeID
                  FK_RowSubTypeID (nullable)
In short, I'm mapping column definitions to rows. In some cases, those rows have subtype(s), which will have column definitions specific to them. Alternatively, I could hang those column definitions that are specific to subtypes off their own table, or I could combine the data in RowType and RowSubType into one table and work with a single ID, but I'm not sure either is a better solution (if anything, I'd lean towards the latter, as we mostly end up pulling ColumnDefs for a given RowType/RowSubType).
Is the current design SQL blasphemy?
If I keep the current structure, how do I maintain that if RowSubTypeID is specified in ColumnDef, that it must correspond to the RowType specified by RowTypeID? Should I try to enforce this with a trigger or am I missing a simple redesign that would solve the problem?

What you're having trouble with is Fourth Normal Form.
Here's the solution:
Table RowSubType:       RowSubTypeID
                        FK_RowTypeID
                        UNIQUE(FK_RowTypeID, RowSubTypeID)

Table ColumnDef:        ColumnDefID
                        FK_RowTypeID
                        UNIQUE(ColumnDefID, FK_RowTypeID)

Table ColumnDefSubType: FK_ColumnDefID  }  compound foreign key to ColumnDef
                        FK_RowTypeID    }  }
                        FK_RowSubTypeID    }  compound foreign key to RowSubType
You only need to create a row in the ColumnDefSubType table for columns that have a row subtype. But all references are constrained so you can't create an anomaly.
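Roughly, in standard SQL DDL, that could look like the following (the data types and the primary key choice on ColumnDefSubType are placeholders, not part of the original design):

CREATE TABLE RowType (
    RowTypeID INT PRIMARY KEY
);

CREATE TABLE RowSubType (
    RowSubTypeID INT PRIMARY KEY,
    FK_RowTypeID INT NOT NULL REFERENCES RowType (RowTypeID),
    UNIQUE (FK_RowTypeID, RowSubTypeID)
);

CREATE TABLE ColumnDef (
    ColumnDefID  INT PRIMARY KEY,
    FK_RowTypeID INT NOT NULL REFERENCES RowType (RowTypeID),
    UNIQUE (ColumnDefID, FK_RowTypeID)
);

-- Only columns that apply to a subtype get a row here. Both compound FKs
-- share FK_RowTypeID, so the subtype is forced to belong to the same row
-- type as the column definition; no trigger is needed.
CREATE TABLE ColumnDefSubType (
    FK_ColumnDefID  INT NOT NULL,
    FK_RowTypeID    INT NOT NULL,
    FK_RowSubTypeID INT NOT NULL,
    PRIMARY KEY (FK_ColumnDefID, FK_RowSubTypeID),
    FOREIGN KEY (FK_ColumnDefID, FK_RowTypeID)
        REFERENCES ColumnDef (ColumnDefID, FK_RowTypeID),
    FOREIGN KEY (FK_RowTypeID, FK_RowSubTypeID)
        REFERENCES RowSubType (FK_RowTypeID, RowSubTypeID)
);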
But for what it's worth, I agree with @Seth's comment about possible over-engineering. I'm not sure I understand how you're using these column defs and row types, but it smells like the Inner-Platform Effect anti-pattern. In SQL, the schema (tables and columns) is the metadata; don't try to use data to create a dynamic schema.
See also this excellent story: Bad CaRMa.
Re your comment: in your case I'd recommend using Class Table Inheritance or Concrete Table Inheritance. This means defining a separate table per subtype, and each column of your original text record would go into the respective column of the subtype table. That way you don't need your rowtype or rowsubtype tables; the type is implicit in which subtype table a row lives in. And you don't need your columndefs table; that's implicit in the columns defined on those tables.
See also my answer to Product table, many kinds of product, each product has many parameters or my presentation slides Practical Object-Oriented Models in SQL.

Related

SQL multiple reference

I’m in a dilemma choosing the best strategy to model my database.
Let’s say I have two tables: Variable(ID) and Object(ID).
Now, an entry in Variable may reference another entry in Variable or in Object.
To model this, one approach is creating 2 mapping tables:
Variable_Variable(variable_id, variable_id), Variable_Object(variable_id, object_id)
The other approach is to have in the Variable table two reference columns:
Variable(ID, parent_variable_id, parent_object_id).
If this variable references another variable, then the parent_object_id is null and vice-versa.
I feel the first approach is neater, but the second approach is faster when querying the database.
Is there a standard to apply in these cases? Which is the usual approach?
Thanks,
Danny.
Given that all relations are 1:1 I would go with your second approach of having parent_variable_id and parent_object_id columns in your Variable table.
You could then have a CHECK constraint to ensure that only one or the other column contains a value (or neither, if your variables don't have to reference a parent).
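A minimal sketch of that (column names are taken from the question; as written, the CHECK also allows both columns to be NULL, which matches the optional-parent case):

CREATE TABLE Object (
    ID INT PRIMARY KEY
);

CREATE TABLE Variable (
    ID                 INT PRIMARY KEY,
    parent_variable_id INT NULL REFERENCES Variable (ID),
    parent_object_id   INT NULL REFERENCES Object (ID),
    -- a variable may reference another variable or an object, but not both
    CHECK (parent_variable_id IS NULL OR parent_object_id IS NULL)
);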
Another alternative that you didn't mention is using a single mapping table MappingTable (variable_id, parent_variable_id, parent_object_id). The downside with this is that, if variables must have a parent, you will then have to enforce a 1:1 relationship between the Variables table and Mappings table.
I would only consider using a mapping table if modelling an n:n relationship, or if there is additional information about the relationship between a variable and its parent that needs to be recorded.

Table referenced by other tables having different PKs

I would like to create a table called "NOTES". I was thinking this table would contain a "table_name" VARCHAR(100) column indicating which table the note belongs to, one or more "key" columns representing the primary key values of the row that the note applies to, and a "note" VARCHAR(MAX) field. When other tables use this table they would supply THEIR primary key(s) and their "table_name" and get all the notes associated with the primary key(s) they supplied. The problem is that other tables might have 1, 2 or more PKs, so I am looking for ideas on how I can design this...
What you're suggesting sounds a little convoluted to me. I would suggest something like this.
Notes
------
Id - PK
NoteTypeId - FK to NoteTypes.Id
NoteContent
NoteTypes
----------
Id - PK
Description - This could replace the "table_name" column you suggested
SomeOtherTable
--------------
Id - PK
...
Other Columns
...
NoteId - FK to Notes.Id
This would allow you to keep your data better normalized, but still get the relationships between data that you want. Note that this assumes a 1:1 relationship between rows in your other tables and Notes. If that relationship will be many to one, you'll need a cross table.
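If you do end up needing more than one note per row, the cross table could be as simple as this (the name is hypothetical, and SomeOtherTable would then drop its NoteId column):

CREATE TABLE SomeOtherTableNotes (
    SomeOtherTableId INT NOT NULL REFERENCES SomeOtherTable (Id),
    NoteId           INT NOT NULL REFERENCES Notes (Id),
    PRIMARY KEY (SomeOtherTableId, NoteId)
);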
Have a look at this thread about database normalization
What is Normalisation (or Normalization)?
Additionally, you can check this resource to learn more about foreign keys
http://www.w3schools.com/sql/sql_foreignkey.asp
Instead of putting the other tables' names and primary keys in this table, have the primary key of the NOTES table be NoteId. Create an FK in each other table that will make a note, and store the corresponding NoteIds in those tables. Then you can simply join on NoteId from all of those other tables to the NOTES table.
As I understand your problem, you're attempting to "abstract" the auditing of multiple tables in a way that you might abstract a class in OOP.
While it's a great OOP design principle, it falls flat in databases for multiple reasons. Perhaps the largest single reason is that if you cannot envision it, neither will someone (even you) looking at it later have an easy time reassembling the data. Smaller than that, though, is that while you tend to think of a table as a container and thus similar to an object, in reality tables are implemented instances of that hypothetical container, and they behave better if you treat them as such. By creating an audit table specific to a table, or to a subset of tables that share structural and data similarity, you increase the performance of your database and you won't run into strange trigger- or SELECT-related issues later.
And you can't envision it not because you're not good at what you're doing, but rather, the structure is not conducive to database logging.
Instead, I would recommend that you create separate logging tables that manage the auditing of each table you want to audit or log. In fact, a few quick Google searches show many scripts already written to do much of this task for you: Example of one such search
You should create these individual tables and then if you want to be able to report on multiple table or even all tables at once, you can create a stored procedure (if you want to make queries based on criterion) or a view with an included SELECT statement that JOINs and/or UNIONs the tables you are interested in - in a fashion that makes sense to the report type. You'll still have to write new objects in to the view, but even with your original table design, you'd have to account for that.
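As a rough T-SQL-flavored sketch of that layout (all of these names are made up for illustration, and generated audit scripts will typically capture more detail per table):

CREATE TABLE Customer_Audit (
    AuditId    INT IDENTITY PRIMARY KEY,
    CustomerId INT NOT NULL,
    ChangedBy  VARCHAR(100),
    ChangedAt  DATETIME,
    Detail     VARCHAR(MAX)
);

CREATE TABLE Order_Audit (
    AuditId   INT IDENTITY PRIMARY KEY,
    OrderId   INT NOT NULL,
    ChangedBy VARCHAR(100),
    ChangedAt DATETIME,
    Detail    VARCHAR(MAX)
);

-- (in SQL Server tools, CREATE VIEW must start its own batch)
GO

-- one view for cross-table reporting; each newly audited table adds a branch
CREATE VIEW AllAudits AS
    SELECT 'Customer' AS SourceTable, CustomerId AS SourceId, ChangedBy, ChangedAt, Detail
    FROM Customer_Audit
    UNION ALL
    SELECT 'Order', OrderId, ChangedBy, ChangedAt, Detail
    FROM Order_Audit;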

Recommended structure for table that has FK for 3 other tables

I have a table that will contain information for 3 other tables. The design I have is that this table will have a column that holds the object's ID and another column that tells the object's type (and thus the table that that row refers to).
Two questions:
a) Is that the best design or is there something else more widely accepted?
b) What is the recommended procedure to ensure that IDs are valid for the given object's type?
If I understood your question correctly, each row in your table links to exactly one of the three other tables.
Your approach (type field + one foreign key field) is a valid design, and it's useful if you want to create a general-purpose table that contains meta-information about your data (e.g. a list of records that should be retransmitted for replication).
Another approach, which might be more suitable for real application-level data, would be to have three columns, each being a foreign key to one of the three tables, and to add a constraint that requires exactly two of those fields to be null (see the sketch after this list). This has the following advantages:
The three FKs do not need to have the same data type.
The JOIN syntax becomes more natural (not involving the type field).
You can add referential integrity constraints on those FK columns.
You don't need to ensure correctness of the type field -- in fact, you don't need the type field at all. The type is determined implicitly by the one FK column which is not null.
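A sketch of that shape (the table names are hypothetical; the CASE expression just counts how many of the three FKs are non-null):

CREATE TABLE TableA (Id INT PRIMARY KEY);
CREATE TABLE TableB (Id INT PRIMARY KEY);
CREATE TABLE TableC (Id INT PRIMARY KEY);

CREATE TABLE Info (
    Id   INT PRIMARY KEY,
    A_Id INT NULL REFERENCES TableA (Id),
    B_Id INT NULL REFERENCES TableB (Id),
    C_Id INT NULL REFERENCES TableC (Id),
    -- exactly one of the three references must be set
    CHECK ( (CASE WHEN A_Id IS NULL THEN 0 ELSE 1 END)
          + (CASE WHEN B_Id IS NULL THEN 0 ELSE 1 END)
          + (CASE WHEN C_Id IS NULL THEN 0 ELSE 1 END) = 1 )
);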
a) I'm supposing you have a one-to-many relationship between objects and object types. In a normal design you'd have a reference from the object type column in the objects table to the primary key of the object types table.
b) I would enforce referential integrity in the relationship properties (this depends on the DBMS you are using). It's also up to you to use cascading on updates and deletes. That way, an update or a delete of the primary key in the object types table would be reflected in the objects table, updating its foreign key column (the object type column) or deleting the rows that have that object type.
The basics of DB schema design are easy, but more complicated situations can be really complicated to figure out what's best. There is a lot of personal subjectivity that can come into play here, and even performance can be a factor in denormalizing a design.
Disclaimer aside, my personal recommendation is to never use a column to store more than one kind of FK, i.e. a column for FKs should store FKs that point only to a single table. If you don't do this, you have to map the cascade of that column's data into multiple sub-select queries inside your code, and it can begin to get more messy than you expected. Your given "Problem No. 2, ensuring validity between type and FK" is just the beginning of a whole world of pain that will cascade throughout your source code.
Assuming you change the design to use one field per FK reference, I would also check whether each FK field in your main "information-holding table" will be fully valid for each record. If not, I would move out the FK columns that will only be applicable some of the time to a separate table.

Normalization Help

I am refactoring an old Oracle 10g schema to try to introduce some normalization. In one of the larger tables, there is a text field that has at most, 10-15 possible values. In my mind, it seems that this field is an example of unnecessary data duplication and should be extracted to a separate table.
After examining the data, I cannot find one relevant piece of information that could be associated with that text value. Basically, if I pulled that value out and put it into its own table, it would be the only field in that table. It exists today as more of a 'flag' field. Should I create a two-column table with a surrogate key, keep it as it is, or do something entirely different? Am I doing more harm than good by trying to minimize data duplication on this field?
You might save some space by extracting the column to a separate table. This is called a lookup table. It can give you a couple of other benefits (a sketch follows the list):
You can declare a foreign key constraint to the lookup table, so you can rely on the column in the main table never having any value other than the 10-15 values you want.
It's easy to query for a concise list of all permitted values, by querying the lookup table. This can be faster than using SELECT DISTINCT on the main table's column. It also returns values that are permitted, but not currently used in the main table.
If you change a value in the lookup table, it automatically applies to all rows in the main table that reference it.
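A minimal Oracle-flavored sketch (table and column names here are placeholders; with only the text value, the value itself can serve as the key, so no surrogate is strictly required):

CREATE TABLE status_flag (
    flag VARCHAR2(30) PRIMARY KEY   -- the 10-15 permitted values
);

CREATE TABLE big_table (
    id   NUMBER PRIMARY KEY,
    -- ... the existing columns ...
    flag VARCHAR2(30) NOT NULL REFERENCES status_flag (flag)
);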
However, creating a lookup table with one column is not strictly normalization. You're just replacing one value with another. The attribute in the main table either already supports a normal form, or not.
Using surrogate keys (vs. natural keys) also has nothing to do with normalization. A lot of people make this mistake.
However, if you move other attributes into the lookup table, attributes that depend only on the lookup value and therefore would create repeating groups (violating 3NF) in the main table if you left them there, then that would be normalization.
If you want normalization, break it out.
I think of these types of data in DBs as the equivalent of enums in C, C++, or C#. Mostly you put them in the table as documentation.
I often have an ID, Name, Description, and auditing columns for them (e.g. modified by, modified date, created date, created by, active). The Description field is rarely used.
Example (some might say there are more than just 2):

Gender
------
ID  Name    Audit Columns...
1   Male
2   Female

Then in your contacts you would have a GenderID column which would link to this one.
Of course you don't "need" the table. You could have external documentation somewhere that says 1=Male, 2=Female -- but I think these tables serve to document a system.
If it's really a free-entry text field that's not re-used somewhere else in the database, and there's just a single field without repeated instances, I'd probably go ahead and leave it as it is. If you're determined to break it out I'd create a 'validation' table with a surrogate key and the text value, then put the surrogate key in the base table.
Share and enjoy.
Are these 10-15 values actually meaningful, or are they really just flags? If they're meaningful pieces of text and it seems wasteful to replicate them, then sure create a lookup table. But if they're just arbitrary flag values, then your new table will be nothing more than a mapping from one arbitrary value to another, and not terribly helpful.
A completely separate question is whether all or most of the rows in your big table even have a value for this column. If not, then indeed you have a good opportunity for normalization and can create a separate table linking the primary key from your base table with the flag value.
Edit: One thing. If there's some chance that one of these "flag" values is likely to be wholesale replaced with another value at some point in the future, that would be another good reason to create a table.

Subtyping database tables

I hear a lot about subtyping tables when designing a database, and I'm fully aware of the theory behind them. However, I have never actually seen table subtyping in action. How can you create subtypes of tables? I am using MS Access, and I'm looking for a way of doing it in SQL as well as through the GUI (Access 2003).
Cheers!
An easy example would be to have a Person table with a primary key and some columns in that table. Now you can create another table called Student that has a foreign key to the Person table (its supertype). The Student table has some columns which the supertype doesn't have, like GPA, Major, etc., while the name, last name and such live in the parent table. You can always get back to the student's name in the Person table through the foreign key in the Student table.
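In plain SQL DDL, that example could look roughly like this (column names are just illustrative):

CREATE TABLE Person (
    PersonID  INT PRIMARY KEY,
    FirstName VARCHAR(50),
    LastName  VARCHAR(50)
);

-- the subtype's primary key doubles as its foreign key to the supertype
CREATE TABLE Student (
    PersonID INT PRIMARY KEY REFERENCES Person (PersonID),
    GPA      DECIMAL(3,1),
    Major    VARCHAR(50)
);

In Access you would create the same two tables and then enforce the relationship (with referential integrity checked) in the Relationships window.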
Anyways, just remember the following:
The hierarchy depicts the relationship between supertypes and subtypes
Supertypes have common attributes
Subtypes have unique attributes
Subtyping of tables is a conceptual thing in EER diagrams. I haven't seen an RDBMS (excluding object-relational DBMSs) that supports it directly. Subtypes are usually implemented as either:
A set of nullable columns for each property of the subtype, all in a single table, or
A table for the base type's properties plus additional tables, each with at most one row per base-table row, holding the subtype's properties.
The notion of table sub-types is useful when using an ORM mapper to produce a class sub-type hierarchy that exactly models the domain.
A sub-type table will have a foreign key back to its parent, which is also the sub-type table's primary key.
Keep in mind that in designing a bound application, as with an Access application, subtypes impose a heavy cost in terms of joins.
For instance, if you have a supertype table with three subtype tables and you need to display all three in a single form at once (and you need to show not just the supertype data), you end up with a choice of using three outer joins and Nz(), or you need a UNION ALL of three mutually exclusive SELECT statements (one for each subtype). Neither of these will be editable.
I was going to paste some SQL from the first major app where I worked with super/subtype tables, but looking at it, the SQL is so complicated it would just confuse people. That's not so much because my app was complicated, but it's because the nature of the problem is complex -- presenting the full set of data to the user, both super- and subtypes, is by its very nature complex. My conclusion from working with it was that I'd have been better off with only one subtype table.
That's not to say it's not useful in some circumstances, just that Access's bound forms don't necessarily make it easy to present this data to the user.
I have a similar problem I've been working on.
While looking for a repeatable pattern, I wanted to make sure I didn't abandon referential integrity, which meant that I wouldn't use a (TABLE_NAME, PK_ID) solution.
I finally settled on:
Base Type Table: CUSTOMER
Sub Type Tables: PERSON, BUSINESS, GOVT_ENTITY
I put nullable PERSON_ID, BUSINESS_ID and GOVT_ENTITY_ID fields in CUSTOMER, with foreign keys on each, and a check constraint so that only one is not null. It's easy to add new sub types: just add the nullable foreign key and modify the check constraint.