Recommended structure for table that has FK for 3 other tables - sql

I have a table that will contain information for 3 other tables. The design I have is that this table will have a column that will tell the objects's ID and another column will tell the objects's type (and thus the table that that row refers to).
Two questions:
a) Is that the best design or is there something else more widely accepted?
b) What is the recommend procedure to assure that IDs are valid for the given objects's type?

If I understood your question correctly, each row in your table links to exactly one of the three other tables.
Your approach (type field + one foreign key field) is a valid design, and it's useful if you want to create a general-purpose table that contains meta-information about your data (e.g. a list of records that should be retransmitted for replication).
Another approach, which might be more suitable for real application-level data, would be to have three columns, each being a foreign key to one of the three tables, and to add a constraint that requires exactly two of those fields to be null. The has the following advantages:
The three FKs do not need to have the same data type.
The JOIN syntax becomes more natural (not involving the type field).
You can add referential integrity constraints on those FK columns.
You don't need to ensure correctness of the type field -- in fact, you don't need the type field at all. The type is determined implicitly by the one FK column which is not null.

a) I'm supposing you have a relationship one to many between objects and object types. In a normal design you'd have a reference from the objecttype column in the objects table to the primary key of the object types table
b) I would enforce referential integrity in the relationship properties (this depends on the dbms you are using). It's also up to you to use cascading on updates and deletes. This way, an update or a delete of the primary key on object types table would be reflected on the objects one, updating its foreign key column (object type column) or deleting the registers that have that object type.

The basics of DB schema design are easy, but more complicated situations can be really complicated to figure out what's best. There is a lot of personal subjectivity that can come into play here, and even performance can be a factor in denormalizing a design.
Disclaimer aside, my personal recommendation is to never use a column to store more than one kind of FK, i.e. a column for FKs should store FKs that point only to a single table. If you don't do this, you have to map the cascade of that column's data into multiple sub-select queries inside your code, and it can begin to get more messy than you expected. Your given "Problem No. 2, ensuring validity between type and FK" is just the beginning of a whole world of pain that will cascade throughout your source code.
Assuming you change the design to use one field per FK reference, I would also check whether each FK field in your main "information-holding table" will be fully valid for each record. If not, I would move out the FK columns that will only be applicable some of the time to a separate table.

Related

SQL Server database design with foreign keys

I have the following partial database design:
All the tables are dependent on each other so the table bvd_docflow_subdocuments is dependent on the table bdd_docflow_subsets
and the table bvd_docflow_subdocuments is dependent on bvd_docflow_subsets. So I thought I could me smart and use foreign keys on every table (and ON DELETE CASCADE). However the FK are being drilldown how further I go in to the tables.
The problem is the table bvd_docflow_documents has no point having a reference to the 1docflow_documentset_id` PK / FK. Is there a way (and maybe my design is crappy) that only the table standing above it has an FK relationship between the tables and not all the tables above it.
Edit:
More explanation:
In the bvd_docflow_subsets table information is stored about objects to create documents. There is an relation between that table and bvd_docflow_subdocuments table (This table stores master data about all the documents for an subset. (docflow_subset_id is in both tables). This is the link between those to tables.
Going further down we also got the table bvd_docflow_documents this table contains the actual document data. The link between bvd_docflow_documents and bvd_docflow_subdocuments is bvd_docflow_subdocument_id.
On every table I got an foreign key defined so when data is removed on a table all the data linked to that data is also removed.
However when we look to the bvd_docflow_documents table it has all the foreign keys from the other tables (docflow_subset_id and docflow_documentset_id) and there is the problem. The only foreign key needed for that bvd_docflow_documents table is docflow_subdocument_id and no other.
Edit 2
I have changed my design further and removed information that I don't need after initial import of the data.
See the following link for the (total) databse design:
https://sqldbm.com/Project/SQLServer/Share/_AUedvNutCEV2DGLJleUWA
The tables subsets, subdocuments and documents have a many to many relationship so I thought a table in between those 3 documents_subdocuments is the way to go were I define all the different keys for those tables.
I am not used to the database design first and then build it. But, for everything there is a first time, and I try to do make a database that is using standards and is using the power of SQL Server the correct way.
I'll address the bottom-most table and ignore the rest for the most part.
But first some comments. Your schema is simply a model of a system. To provide feedback, one must understand this "system" and how it actually works to evaluate your model. In addition, it is important to understand your entities and your reasons for choosing them and modelling them in the specified manner. Without that understanding all of this guessing based on experience.
And another comment. Slapping an identity column into every table is just lazy modelling IMO. Others will disagree, but you need to also enforce all natural keys. Do you have natural keys? It is rare not to have any. Enforce those that do exist.
And one last comment. Stop the ridiculous pattern of prepending the column names with the table names. And you should really think long and hard about using very long table names. Given what you have, I sense you need a schema for your docflow stuff.
For the documents table, your current PK makes no sense. Again, you've slapped an identity column into the table. By itself, this column is a key for the table. The inclusion of any other columns does not make the key any more "unique" - that inclusion is logical nonsense. Following your pattern, you would designate the identity column as the primary key. But ...
According to your image, the documents table is related to one and only one subdocument. You added a foreign key to that table - which matches the image. You also added additional columns and foreign keys to the "higher" tables. So now a document "points" to a specific subdocument. It also points to a specific subset - which may have no relationship to the subdocument. The same thought applies to the other FK. I have a doubt that this is logically correct. So why do these columns (and related FKs) exist? Perhaps this is the result of premature optimization - which everyone knows is the root of all evil coding. Again, it is impossible to know if this is "right" or even "useful" for your model.
To answer your question "... is there a way", the answer is obviously yes. You remove the columns of which you complain. You added them - Why? Is this perhaps a problem with the tool you are using?
And some last comments. There is nothing special about "varchar(50)". Perhaps this is a place holder that will be updated later. It may also be another sign of laziness. And generally speaking, columns with names like "type" and "code" tend to be foreign keys to "lookup" tables - because people like to add, modify, or remove these sorts categorization values over time. I'm also concerned about the column name overlap among the tables. "Location" exists in multiple tables, as do action_code and action_id. And a column named "id" (action_id) suggests a lookup to another table - is it? Should it be? Is there a relationship between action_id and action_code? From a distance it is impossible to answer any of these questions.
But designing a database is more art than science. Sometimes you just need to create something, populate it with some sample data, and then determine if it works for your needs. Everyone will get something wrong in the first try. That is expected; that is how you learn. The most difficult part is actually completing your first attempt.

Is it possible to implement a TRUE one-to-one relation?

Consider the following model where a Customer should have one and only one Address and an Address should belong to one and only one Customer:
To implement it, as almost everybody in DB field says, Shared PK is the solution:
But I think it is a fake one-to-one relationship. Because nothing in terms of database relationship actually prevents deleting any row in table Address. So truely, it is 1..[0..1] not 1..1
Am I right? Is there any other way to implement a true 1..1 relation?
Update:
Why cascade delete is not a solution:
If we consider cascade delete as a solution we should put this on either of the tables. Let's say if a row is deleted from table Address, it causes corresponding row in table Customer to be deleted. it's okay but half of the solution. If a row in Customer is deleted, the corresponding row in Address should be deleted as well. This is the second half of the solution, and it obviously makes a cycle.
Beside my comment
You could implement DELETE CASCADE See HOW
I realize there is also the problem of insert.
You have to insert Customer first and then Address
So I think the best way if you really want a 1:1 is create a single table instead.
Customer
CustomerID
Name
Address
City
Sorry, is this meant to be a real-world database relationship? In all of the many databases I have ever built with customer data, there has always been real cases of either customers with multiple addresses, or more than one organisation at the same address.
I wouldn't want to lead you into a database modelling fallacy by suggesting anything different.
Yes, the "shared PK" idiom you show is for 1-to-0-or-1.
The straightforward way to have a true 1-to-1 correspondence is to have one table with Customer and Address as CKs (candidate keys). (Via UNIQUE NOT NULL and/or PRIMARY KEY.) You could offer the separate tables as views. Unfortunately typical DBMSs have restrictions on what you can do via the views, in particular re updating.
The relational way to have separate CUSTOMER and ADDRESS tables and a third table/association/relationship with Customer and Address columns as CKs plus FKs on Customer to and from CUSTOMER and on Address to and from ADDRESS (or equivalent constraint(s)). Unfortunately most DBMSs needlessly won't let you declare cycles in FKs and you cannot impose the constraints without triggers/complexity. (Ultimately, if you want to have proper integrity in a typical SQL database you need to use triggers and complex idioms.)
Entity-oriented design methods unfortunately artificially distinguish between entities, associations and properties. Here is an example where if you consider the simplest design to simply be the one table with PKs then you don't want to always have to have distinct tables for each entity. Or if you consider the simplest design to be the three tables (or even two) with the PKs and FKs (or some other constraint(s) for 1-to-1) then unfortunately typical DBMSs just don't declaratively/ergonomically support that particular design situation.
(Straightforward relational design is to have values (that are sometimes used as ids) 1-to-1 with application things but then just have whatever relevant application relationships/associations/relations and corresponding/representing tables/relations as needed to describe your application situations.)
It's possible in principle to implement a true 1-1 data structure in some DBMSs. It's very difficult to add data or modify data in such a structure using standard SQL however. Standard SQL only permits one table to be updated at a time and therefore as soon as you insert a row into one or other table the intended constraint is broken.
Here are two examples. First using Tutorial D. Note that the comma between the two INSERT statements ensures that the 1-1 constraint is never broken:
VAR CUSTOMER REAL RELATION {
id INTEGER} KEY{id};
VAR ADDRESS REAL RELATION {
id INTEGER} KEY{id};
CONSTRAINT one_to_one (CUSTOMER{id} = ADDRESS{id});
INSERT CUSTOMER RELATION {
TUPLE {id 1234}
},
INSERT ADDRESS RELATION {
TUPLE {id 1234}
};
Now the same thing in SQL.
CREATE TABLE CUSTOMER (
id INTEGER NOT NULL PRIMARY KEY);
CREATE TABLE ADDRESS (
id INTEGER NOT NULL PRIMARY KEY);
INSERT INTO CUSTOMER (id)
VALUES (1234);
INSERT INTO ADDRESS (id)
VALUES (1234);
ALTER TABLE CUSTOMER ADD CONSTRAINT one_to_one_1
FOREIGN KEY (id) REFERENCES ADDRESS (id);
ALTER TABLE ADDRESS ADD CONSTRAINT one_to_one_2
FOREIGN KEY (id) REFERENCES CUSTOMER (id);
The SQL version uses two foreign key constraints, which is the only kind of multi-table constraint supported by most SQL DBMSs. It requires two INSERT statements which means I could only insert a row before adding the constraints, not after.
A strict one-to-one constraint probably isn't very useful in practice but it's actually just a special case of something more important and interesting: join dependency. A join dependency is effectively an "at least one" constraint between tables rather than "exactly one". In the world outside databases it is common to encounter examples of business rules that ought to be implemented as join dependencies ("each customer must have AT LEAST ONE addresss", "each order must have AT LEAST ONE item in it"). In SQL DBMSs it's hard or impossible to implement join dependencies. The usual solution is simply to ignore such business rules thus weakening the data integrity value of the database.
Yes, what you say is true, the dependent side of a 1:1 relationship may not exist -- if only for the time it takes to create the dependent entity after creating the independent entity. In fact, all relationships may have a zero on one side or the other. You can even turn the relationship into a 1:m by placing the FK of the address in the Customer row and making the field not null. You can still have addresses that aren't referenced by any customer.
At first glance, a m:n may look like an exception. The intersection entry is generally defined so that neither FK can be null. But there can be customers and addresses both that have no entry referring to them. So this is really a 0..m:0..n relationship.
What of it? Everyone I've ever worked with has understood that "one" (as in 1:1) or "many" (as in 1:m or m:n) means "no more than this." There is no "exactly this, no more or less." For example, we can design a 1:3 relationship on paper. We cannot strictly enforce it in any database. We have to use triggers, stored procedures and/or scheduled tasks to seek out and call our attention to deviations. Execute a stored procedure weekly, for instance, that will seek and and flag or delete any such orphaned addresses.
Think of it like a "frictionless surface." It exists only on paper.
I see this question as a conceptual misunderstanding. Relations are between different things. Things with a "true 1-to-1 relation" are by definition aspects or attributes of the same thing, and belong in the same table. No, of course a person and and address are not the same, but if they are inseparable, and must always be inserted, deleted, or otherwise acted upon as a unit, then as data they are "the same thing". This is exactly what is described here.
Yes, and it's actually quite easy: just put both entities in the same table!
OTOH, if you need to keep them in separate tables for some reason, then you need a key in one table referencing1 a key in another, and vice-versa. This, of course, represents a "chicken and egg" problem2 which can be resolved by deferring the enforcement of FKs to the end of the transaction3. This works only on DBMSes that support deferred constraints (such as Oracle and PostgreSQL).
1 Via a foreign key.
2 Inserting a row in the first table is impossible because that would violate the referential integrity towards the second table, but inserting a row in the second table is impossible because that would violate the referential integrity towards the first table, etc... Ditto for deletion.
3 So you simply insert both rows, and then check both FKs.

How do I structure a generic item that can have a relationship with different tables?

In my example, I have a watch, which is an indication a user wants notifications about events on a different item, say a group and an organization.
I see two ways to do this:
Have a groupwatch resource, with a groupwatch table, with id,user,group (group FK to group resource and table); and a orgwatch resource, with a orgwatch table, with id,user,organization (org FK to organization resource and table)
Have a generic watch resource, with a watch table, with id,user,type,typeid. type is one of group or organization, and typeid is the ID of the group or organization being watched.
Since both of them are watches, it seems a waste to have two different tables and resources to watch 2 different objects. It gets worse if I start watching 4, 5, 6, 20, 50 different types of resources.
On the other hand, a foreign key relationship appears impossible if I just have a generic typeid, which means that my database (if relational) and my framework (activerecord or anything else) cannot enforce it correctly.
How do I best implement this type of "association to different types of record/table for each record in my table"?
UPDATE:
Are my only choices for doing this:
separate tables/resources for each watch type, which enables the database to enforce relational integrity and do joins
single table for all watches, but I will have to enforce relational integrity and do joins at the app level?
If you add a new type of resource once every six months, you may want to define your tables in such a way that adding new resources involves changing data definitions. If you add a new resource type every week, you may want to make your data definitions stay the same when you add new types. There's a downside to either choice.
If you do choose to define table in such a way that the types are visible in the table structure, there are two patterns often used with type/subtype (aka class/subclass) situations.
One pattern has been called "single table inheritance". Put data about all the types in a single table, and leave some columns NULL wherever they do not apply.
Another pattern has been called "class table inheritance". Define one table for the superclass, with all the data that is common to all the types. Then define tables for each subtype (subclass) to contain class specific data. Make the primary key of the subtype tables a duplicate of the primary key in the supertype table, and also declare it as a foreign key that references the primary key of the supertype table. It's going to be up to the app, at insert time, to replicate the value of the primary key in the supertype table over in the subtype table.
I like Fowlers' treatment of these two patterns.
http://martinfowler.com/eaaCatalog/classTableInheritance.html
http://www.martinfowler.com/eaaCatalog/singleTableInheritance.html
This matter of sharing primary keys has a few beneficial effects.
First, it enforces the one-to-one nature of the ISa relationships.
Second, it makes it easy to find out whether a given entry belongs to a desired subtype, by just joining with the subtype table. You don't really need an extra type field.
Third, it speeds up the joins, because of the index that gets built when you declare a primary key.
If you want a structure that can adapt to new attributes without changing data definitions, you can look into E-A-V design. Be careful, though. Sometimes this results in data that is nearly impossible to use, because the logical structure is so obscure. I usually think of E-A-V as an anti-pattern for this reason, although there are some who really like the results they get from it.

SQL Referencial Integrity Between a Column and (One of Many Possible) Tables

This is more of a curiosity at the moment, but let's picture an environment where I bill on a staunch nickle&dime basis. I have many operations that my system does and they're all billable. All these operations are recorded across various tables (these tables need to be separate because they record very different kinds of information). I also want to micro manage my accounts receivables. (Forgive me if you find inconsistencies here, as this example is not a real situation)
Is there a somewhat standard way of substituting a foreign key with something that can verify that the identifier in column X on my billing table is an existing identifier within one of many operations record tables?
One idea is that when journalizing account activity, I could reference the operation's identifier as well as the operation (specifically, the table that it's in) and use a CHECK constraint. This is probably the best way to go so that my journal is not ambiguous.
Are there other ways to solve this problem, de-facto or proprietary?
Do non-relational databases solve this problem?
EDIT:
To rephrase my initial question,
Is there a somewhat standard way of substituting a foreign key with something that can verify that the identifier in column X on my billing table is an existing identifier within one of many (but not necessarily all) operations record tables?
No, there's no way to achieve this with a single foreign key column.
You can do basically one of two things:
in your table which potentially references any of the other x tables, have x foreign key reference fields (ideally: ID's of type INT), only one of which will ever be non-NULL at any given time. Each FK reference key references exactly one of your other data tables
or:
have one "child" table per master table with a proper and enforced reference, and pull together the data from those n child tables into a view (instead of a table) for your reporting / billing.
Or just totally forget about referential integrity - which I would definitely not recommend!
you can Implementing Table inheritance
see article
http://www.sqlteam.com/article/implementing-table-inheritance-in-sql-server
An alternative is to enforce complex referential integrity rules via a trigger. However,and not knowing exactly what your design is, usually when these types of questions are asked it is to work around a bad design. Look at the design first and see if you can change it to make this something that can be handled through FKs, they are much more managable than doing this sort of thing through triggers.
If you do go the trigger route, don't forget to enforce updates as well as inserts and make sure your trigger will work properly with a set-based multi-row insert and update.
A design alternative is to havea amaster table that is parent to all your tables with the differnt details and use the FK against that.

SQL - Two foreign keys that have a dependency between them

The current structure is as follows:
Table RowType: RowTypeID
Table RowSubType: RowSubTypeID
FK_RowTypeID
Table ColumnDef: FK_RowTypeID
FK_RowSubTypeID (nullable)
In short, I'm mapping column definitions to rows. In some cases, those rows have subtype(s), which will have column definitions specific to them. Alternatively, I could hang those column definitions that are specific to subtypes off their own table, or I could combine the data in RowType and RowSubType into one table and work with a single ID, but I'm not sure either is a better solution (if anything, I'd lean towards the latter, as we mostly end up pulling ColumnDefs for a given RowType/RowSubType).
Is the current design SQL blasphemy?
If I keep the current structure, how do I maintain that if RowSubTypeID is specified in ColumnDef, that it must correspond to the RowType specified by RowTypeID? Should I try to enforce this with a trigger or am I missing a simple redesign that would solve the problem?
What you're having trouble with is Fourth Normal Form.
Here's the solution:
Table RowSubType: RowSubTypeID
FK_RowTypeID
UNIQUE(FK_RowTypeID, RowSubTypeID)
Table ColumnDef: ColumnDefID
FK_RowTypeID
UNIQUE(ColumnDefID, FK_RowTypeID)
Table ColumnDefSubType: FK_ColumnDefID } compound foreign key to ColumnDef
FK_RowTypeID } }
FK_RowSubTypeID } compound foreign key to RowSubType
You only need to create a row in the ColumnDefSubType table for columns that have a row subtype. But all references are constrained so you can't create an anomaly.
But for what it's worth, I agree with #Seth's comment about possible over-engineering. I'm not sure I understand how you're using these column defs and row types, but it smells like the Inner-Platform Effect anti-pattern. In SQL, just use metadata to define metadata. Don't try to use data to create a dynamic schema.
See also this excellent story: Bad CaRMa.
Re your comment: In your case I'd recommend using Class Table Inheritance or Concrete Table Inheritance. This means defining a separate table per subtype. But each column of your original text record would go into the respective column of the subtype table. That way you don't need to have your rowtype or rowsubtype tables, it's implicit by defining tables for each subtype. And you don't need your columndefs table, that's implicit by the columns defined in your tables.
See also my answer to Product table, many kinds of product, each product has many parameters or my presentation slides Practical Object-Oriented Models in SQL.