SQL Referential Integrity Between a Column and (One of Many Possible) Tables

This is more of a curiosity at the moment, but let's picture an environment where I bill on a staunch nickel-and-dime basis. My system performs many operations and they're all billable. All these operations are recorded across various tables (these tables need to be separate because they record very different kinds of information). I also want to micromanage my accounts receivable. (Forgive me if you find inconsistencies here, as this example is not a real situation.)
Is there a somewhat standard way of substituting a foreign key with something that can verify that the identifier in column X on my billing table is an existing identifier within one of many operations record tables?
One idea is that when journalizing account activity, I could reference the operation's identifier as well as the operation (specifically, the table that it's in) and use a CHECK constraint. This is probably the best way to go so that my journal is not ambiguous.
Are there other ways to solve this problem, de-facto or proprietary?
Do non-relational databases solve this problem?
EDIT:
To rephrase my initial question,
Is there a somewhat standard way of substituting a foreign key with something that can verify that the identifier in column X on my billing table is an existing identifier within one of many (but not necessarily all) operations record tables?

No, there's no way to achieve this with a single foreign key column.
You can do basically one of two things:
In the table that potentially references any of the other x tables, have x foreign key reference fields (ideally IDs of type INT), only one of which will ever be non-NULL at any given time. Each FK field references exactly one of your other data tables.
Or:
Have one "child" table per master table with a proper and enforced reference, and pull together the data from those n child tables into a view (instead of a table) for your reporting / billing.
Or just totally forget about referential integrity - which I would definitely not recommend!
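The first option above (one nullable FK column per referenced table, exactly one of them set) can be sketched as follows. This is only an illustration under assumptions: it uses Python's built-in sqlite3 module so it runs anywhere, and the tables email_op, report_op and billing are hypothetical names, not taken from the question. The CHECK relies on SQLite treating comparisons as 0/1; on SQL Server you would likely write the same rule with CASE expressions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE email_op (id INTEGER PRIMARY KEY, recipient TEXT NOT NULL);
CREATE TABLE report_op (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE billing (
    id INTEGER PRIMARY KEY,
    amount_cents INTEGER NOT NULL,
    email_op_id INTEGER REFERENCES email_op (id),
    report_op_id INTEGER REFERENCES report_op (id),
    -- exactly one of the two references must be non-NULL
    CHECK ((email_op_id IS NOT NULL) + (report_op_id IS NOT NULL) = 1)
);
INSERT INTO email_op (id, recipient) VALUES (1, 'a@example.com');
INSERT INTO report_op (id, title) VALUES (1, 'Monthly summary');
""")


def try_bill(email_op_id, report_op_id):
    """Return True if the billing row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO billing (amount_cents, email_op_id, report_op_id) "
                "VALUES (5, ?, ?)",
                (email_op_id, report_op_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

A billing row that points at one existing operation is accepted; a row that points at none, at both, or at a missing operation is rejected by the CHECK or by the foreign key.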

You can implement table inheritance.
See this article:
http://www.sqlteam.com/article/implementing-table-inheritance-in-sql-server

An alternative is to enforce complex referential integrity rules via a trigger. However, without knowing exactly what your design is, I would note that questions like this are usually asked to work around a bad design. Look at the design first and see if you can change it so that this can be handled through FKs; they are much more manageable than doing this sort of thing through triggers.
If you do go the trigger route, don't forget to enforce updates as well as inserts, and make sure your trigger works properly with set-based multi-row inserts and updates.
A design alternative is to have a master table that is parent to all your tables with the different details, and to use the FK against that.
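The master-table alternative might look roughly like the sketch below. It is an assumption-laden illustration, not the asker's schema: the tables operation, email_op, report_op and billing are hypothetical, and SQLite (via Python's sqlite3) stands in for the real DBMS. Every detail table shares its key with the master table, and billing needs only one ordinary FK.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # FK enforcement is opt-in in SQLite
conn.executescript("""
-- Master table: one row per billable operation of any kind.
CREATE TABLE operation (
    id INTEGER PRIMARY KEY,
    kind TEXT NOT NULL CHECK (kind IN ('email', 'report'))
);
-- Detail tables share the master's key.
CREATE TABLE email_op (
    id INTEGER PRIMARY KEY REFERENCES operation (id),
    recipient TEXT NOT NULL
);
CREATE TABLE report_op (
    id INTEGER PRIMARY KEY REFERENCES operation (id),
    title TEXT NOT NULL
);
-- Billing references the master table with a single plain foreign key.
CREATE TABLE billing (
    id INTEGER PRIMARY KEY,
    operation_id INTEGER NOT NULL REFERENCES operation (id),
    amount_cents INTEGER NOT NULL
);
""")


def record_email(recipient):
    """Insert the master row and its detail row together; return the shared id."""
    with conn:
        op_id = conn.execute("INSERT INTO operation (kind) VALUES ('email')").lastrowid
        conn.execute(
            "INSERT INTO email_op (id, recipient) VALUES (?, ?)", (op_id, recipient)
        )
    return op_id


def bill(operation_id, amount_cents):
    """Return True if the billing row is accepted, False if the FK rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO billing (operation_id, amount_cents) VALUES (?, ?)",
                (operation_id, amount_cents),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Note that this sketch does not stop a master row from existing without a detail row, or from having a detail row of the wrong kind; that would need further constraints.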

Related

Adding a row to Table A if it has a required foreign key to Table B which has a required foreign key to Table A

This might sound complicated, so I'll give an example.
Say, I have two tables Instructor and Class.
Instructor has a required field called PreferredClassID which has a foreign key against Class.
Class has a required field called CurrentInstructorID which is a foreign key against Instructor
Is it possible to insert a row into either of these tables?
If I try to insert a row into Instructor, I can't, because I need to supply a PreferredClassID; but I can't create a Class row either, because it needs a CurrentInstructorID.
If I can't do this, how would I solve this problem? Would I just need to make one of those fields non-required (even if the business requirements specify that it really should be required)?
If you find yourself here, reevaluate your data relation model.
In this case, you could simply have a lookup table called PreferredCourse with courseId and instructorId.
This will enforce that both the course and the instructor exist before a row is added to the PreferredCourse lookup, maintaining business model requirements without bending the rules of database model requirements.
While it may seem excessive to have another table, it will prevent a whole lot of maintenance overhead in both your database procedures and jobs, and your application code. Circular references create nothing but headaches and are easily solved with small lookup tables and JOINs.
The Impaler gave an example of how to accomplish this with your current data structure. Please note that you have to (1) make a key nullable in at least one of the tables and then (2) perform the INSERTs in a specified order; or (3) disable the constraints, (4) perform the INSERTs, (5) re-enable the constraints, and (6) roll back the transaction if the constraints are now broken.
There is a whole lot that can go wrong; simply fix the relation model now before things get out of hand.
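A minimal sketch of the PreferredCourse lookup table follows, using Python's sqlite3 as a stand-in DBMS. The column names other than courseId and instructorId are assumptions made for illustration. Making instructorId the primary key of the lookup limits each instructor to at most one preferred course; note that the database then no longer forces an instructor to have one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # FK enforcement is opt-in in SQLite
conn.executescript("""
CREATE TABLE Instructor (
    InstructorID INTEGER PRIMARY KEY,
    Name TEXT NOT NULL
);
CREATE TABLE Class (
    ClassID INTEGER PRIMARY KEY,
    Title TEXT NOT NULL,
    CurrentInstructorID INTEGER NOT NULL REFERENCES Instructor (InstructorID)
);
-- The circular reference is gone: the preference lives in its own table.
CREATE TABLE PreferredCourse (
    instructorId INTEGER PRIMARY KEY REFERENCES Instructor (InstructorID),
    courseId INTEGER NOT NULL REFERENCES Class (ClassID)
);
""")

# Plain insert order now works: instructor first, then the class.
with conn:
    conn.execute("INSERT INTO Instructor (InstructorID, Name) VALUES (1, 'Ada')")
    conn.execute(
        "INSERT INTO Class (ClassID, Title, CurrentInstructorID) "
        "VALUES (10, 'Databases', 1)"
    )


def set_preferred(instructor_id, course_id):
    """Return True if the preference row is accepted, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO PreferredCourse (instructorId, courseId) VALUES (?, ?)",
                (instructor_id, course_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```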
As long as one of those foreign keys allows a null value, you're good. So you:
1. Insert the row that accepts the null value first (say Instructor), with a null value on the FK. Get the ID of the inserted row.
2. Insert in the other table (say Class). In the FK you use the ID you got from step #1. Once inserted, you get the ID of this new row.
3. Update the FK on the first row (Instructor) with the ID you got from step #2.
4. Commit.
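The steps above can be sketched as follows, assuming Instructor.PreferredClassID is the nullable side. The sketch uses Python's sqlite3 as a stand-in; the Name and Title columns are hypothetical. SQLite accepts a REFERENCES clause to a table that is created later, whereas on other DBMSs you would typically add that FK with ALTER TABLE once both tables exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # FK enforcement is opt-in in SQLite
conn.executescript("""
CREATE TABLE Instructor (
    InstructorID INTEGER PRIMARY KEY,
    Name TEXT NOT NULL,
    PreferredClassID INTEGER REFERENCES Class (ClassID)  -- the nullable side
);
CREATE TABLE Class (
    ClassID INTEGER PRIMARY KEY,
    Title TEXT NOT NULL,
    CurrentInstructorID INTEGER NOT NULL REFERENCES Instructor (InstructorID)
);
""")

with conn:  # one transaction; commits at the end of the block
    # Step 1: insert the row whose FK may be NULL and remember its ID.
    instructor_id = conn.execute(
        "INSERT INTO Instructor (Name, PreferredClassID) VALUES (?, NULL)", ("Ada",)
    ).lastrowid
    # Step 2: insert the other row, pointing at the row from step 1.
    class_id = conn.execute(
        "INSERT INTO Class (Title, CurrentInstructorID) VALUES (?, ?)",
        ("Databases", instructor_id),
    ).lastrowid
    # Step 3: fill in the FK that was left NULL in step 1.
    conn.execute(
        "UPDATE Instructor SET PreferredClassID = ? WHERE InstructorID = ?",
        (class_id, instructor_id),
    )

preferred = conn.execute(
    "SELECT PreferredClassID FROM Instructor WHERE InstructorID = ?", (instructor_id,)
).fetchone()[0]
```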
Alternatively, if both FKs are NOT NULL then you have a bit of a problem. The options I see for this last case are:
Use a deferrable FK integrity check. Some databases do allow you to insert without checking integrity until the COMMIT happens. This is really tricky, and enabling this is looking for trouble.
Disable the FK for a short period of time. Some databases allow you to enable/disable constraints. You are not deleting them, just temporarily disabling them. If you do this, don't forget to re-enable them.
Drop the constraint temporarily while you do the insert, and then add it again. This is really a workaround of last resort. Adding and dropping constraints are DDL statements, and in many databases they cannot participate in a transaction. Do this at your own peril.
Something to consider (as per user7396598's answer) is looking at how normal forms apply to your data as it fits within your relational model.
In this case, it might be worth looking at the following:
With your Instructor table, is the PreferredClassID a necessary component? Does an instructor -need- to have a preferred class, or is it okay to say "Hey, I'm creating an entry for a new instructor, I don't know their preferred class."
(if they're new, they might not have a preferred class that your school offers)
This is a case where you definitely want to have a foreign key, but it should be okay to say 'I don't necessarily know the value I want to put there.'
In a similar vein, does a Class need to have an instructor when it's created? Is it possible to create a Class that an instructor has not been assigned to yet?
Again, both of these points are really a case of 'I don't know what I want to put here, but when I do, it should be a specific instance that exists in another table.'

SQL Server database design with foreign keys

I have the following partial database design:
All the tables are dependent on each other, so the table bvd_docflow_subdocuments is dependent on the table bvd_docflow_subsets.
So I thought I could be smart and use foreign keys on every table (with ON DELETE CASCADE). However, the FK columns get carried along the further down I go in the tables.
The problem is that there is no point in the table bvd_docflow_documents having a reference to the docflow_documentset_id PK / FK. Is there a way (and maybe my design is crappy) for a table to have an FK relationship only with the table directly above it, and not with all the tables above it?
Edit:
More explanation:
The bvd_docflow_subsets table stores information about objects used to create documents. There is a relation between that table and the bvd_docflow_subdocuments table, which stores master data about all the documents for a subset (docflow_subset_id is in both tables). This is the link between those two tables.
Going further down, we also have the table bvd_docflow_documents; this table contains the actual document data. The link between bvd_docflow_documents and bvd_docflow_subdocuments is bvd_docflow_subdocument_id.
On every table I have a foreign key defined, so when data is removed from a table, all the data linked to it is also removed.
However, when we look at the bvd_docflow_documents table, it has all the foreign keys from the other tables (docflow_subset_id and docflow_documentset_id), and that is the problem. The only foreign key needed for the bvd_docflow_documents table is docflow_subdocument_id and no other.
Edit 2
I have changed my design further and removed information that I don't need after initial import of the data.
See the following link for the (total) database design:
https://sqldbm.com/Project/SQLServer/Share/_AUedvNutCEV2DGLJleUWA
The tables subsets, subdocuments and documents have a many-to-many relationship, so I thought a table in between those three, documents_subdocuments, is the way to go, where I define all the different keys for those tables.
I am not used to designing the database first and then building it. But there is a first time for everything, and I am trying to make a database that follows standards and uses the power of SQL Server the correct way.
I'll address the bottom-most table and ignore the rest for the most part.
But first some comments. Your schema is simply a model of a system. To provide feedback, one must understand this "system" and how it actually works to evaluate your model. In addition, it is important to understand your entities and your reasons for choosing them and modelling them in the specified manner. Without that understanding, all of this is guessing based on experience.
And another comment. Slapping an identity column into every table is just lazy modelling IMO. Others will disagree, but you need to also enforce all natural keys. Do you have natural keys? It is rare not to have any. Enforce those that do exist.
And one last comment. Stop the ridiculous pattern of prepending the column names with the table names. And you should really think long and hard about using very long table names. Given what you have, I sense you need a schema for your docflow stuff.
For the documents table, your current PK makes no sense. Again, you've slapped an identity column into the table. By itself, this column is a key for the table. The inclusion of any other columns does not make the key any more "unique" - that inclusion is logical nonsense. Following your pattern, you would designate the identity column as the primary key. But ...
According to your image, the documents table is related to one and only one subdocument. You added a foreign key to that table, which matches the image. You also added additional columns and foreign keys to the "higher" tables. So now a document "points" to a specific subdocument. It also points to a specific subset, which may have no relationship to the subdocument. The same thought applies to the other FK. I doubt that this is logically correct. So why do these columns (and related FKs) exist? Perhaps this is the result of premature optimization, which everyone knows is the root of all evil coding. Again, it is impossible to know if this is "right" or even "useful" for your model.
To answer your question "... is there a way", the answer is obviously yes. You remove the columns of which you complain. You added them - Why? Is this perhaps a problem with the tool you are using?
And some last comments. There is nothing special about "varchar(50)". Perhaps this is a placeholder that will be updated later. It may also be another sign of laziness. And generally speaking, columns with names like "type" and "code" tend to be foreign keys to "lookup" tables, because people like to add, modify, or remove these sorts of categorization values over time. I'm also concerned about the column name overlap among the tables. "Location" exists in multiple tables, as do action_code and action_id. And a column named "id" (action_id) suggests a lookup to another table. Is it? Should it be? Is there a relationship between action_id and action_code? From a distance it is impossible to answer any of these questions.
But designing a database is more art than science. Sometimes you just need to create something, populate it with some sample data, and then determine if it works for your needs. Everyone will get something wrong in the first try. That is expected; that is how you learn. The most difficult part is actually completing your first attempt.

Is it possible to implement a TRUE one-to-one relation?

Consider the following model where a Customer should have one and only one Address and an Address should belong to one and only one Customer:
To implement it, as almost everybody in DB field says, Shared PK is the solution:
But I think it is a fake one-to-one relationship, because nothing in terms of database relationships actually prevents deleting any row in table Address. So truly, it is 1..[0..1], not 1..1.
Am I right? Is there any other way to implement a true 1..1 relation?
Update:
Why cascade delete is not a solution:
If we consider cascade delete as a solution, we should put it on either of the tables. Let's say that if a row is deleted from table Address, it causes the corresponding row in table Customer to be deleted. That's okay, but it is only half of the solution. If a row in Customer is deleted, the corresponding row in Address should be deleted as well. This is the second half of the solution, and it obviously makes a cycle.
Besides my comment (you could implement DELETE CASCADE), I realize there is also the problem of insert.
You have to insert Customer first and then Address.
So I think the best way, if you really want a 1:1, is to create a single table instead.
Customer
CustomerID
Name
Address
City
Sorry, is this meant to be a real-world database relationship? In all of the many databases I have ever built with customer data, there have always been real cases of either customers with multiple addresses, or more than one organisation at the same address.
I wouldn't want to lead you into a database modelling fallacy by suggesting anything different.
Yes, the "shared PK" idiom you show is for 1-to-0-or-1.
The straightforward way to have a true 1-to-1 correspondence is to have one table with Customer and Address as CKs (candidate keys). (Via UNIQUE NOT NULL and/or PRIMARY KEY.) You could offer the separate tables as views. Unfortunately typical DBMSs have restrictions on what you can do via the views, in particular re updating.
The relational way is to have separate CUSTOMER and ADDRESS tables and a third table/association/relationship with Customer and Address columns as CKs plus FKs on Customer to and from CUSTOMER and on Address to and from ADDRESS (or equivalent constraint(s)). Unfortunately most DBMSs needlessly won't let you declare cycles in FKs and you cannot impose the constraints without triggers/complexity. (Ultimately, if you want to have proper integrity in a typical SQL database you need to use triggers and complex idioms.)
Entity-oriented design methods unfortunately artificially distinguish between entities, associations and properties. Here is an example where if you consider the simplest design to simply be the one table with PKs then you don't want to always have to have distinct tables for each entity. Or if you consider the simplest design to be the three tables (or even two) with the PKs and FKs (or some other constraint(s) for 1-to-1) then unfortunately typical DBMSs just don't declaratively/ergonomically support that particular design situation.
(Straightforward relational design is to have values (that are sometimes used as ids) 1-to-1 with application things but then just have whatever relevant application relationships/associations/relations and corresponding/representing tables/relations as needed to describe your application situations.)
It's possible in principle to implement a true 1-1 data structure in some DBMSs. It's very difficult to add data or modify data in such a structure using standard SQL however. Standard SQL only permits one table to be updated at a time and therefore as soon as you insert a row into one or other table the intended constraint is broken.
Here are two examples. First using Tutorial D. Note that the comma between the two INSERT statements ensures that the 1-1 constraint is never broken:
VAR CUSTOMER REAL RELATION {
id INTEGER} KEY{id};
VAR ADDRESS REAL RELATION {
id INTEGER} KEY{id};
CONSTRAINT one_to_one (CUSTOMER{id} = ADDRESS{id});
INSERT CUSTOMER RELATION {
TUPLE {id 1234}
},
INSERT ADDRESS RELATION {
TUPLE {id 1234}
};
Now the same thing in SQL.
CREATE TABLE CUSTOMER (
id INTEGER NOT NULL PRIMARY KEY);
CREATE TABLE ADDRESS (
id INTEGER NOT NULL PRIMARY KEY);
INSERT INTO CUSTOMER (id)
VALUES (1234);
INSERT INTO ADDRESS (id)
VALUES (1234);
ALTER TABLE CUSTOMER ADD CONSTRAINT one_to_one_1
FOREIGN KEY (id) REFERENCES ADDRESS (id);
ALTER TABLE ADDRESS ADD CONSTRAINT one_to_one_2
FOREIGN KEY (id) REFERENCES CUSTOMER (id);
The SQL version uses two foreign key constraints, which is the only kind of multi-table constraint supported by most SQL DBMSs. It requires two INSERT statements which means I could only insert a row before adding the constraints, not after.
A strict one-to-one constraint probably isn't very useful in practice but it's actually just a special case of something more important and interesting: join dependency. A join dependency is effectively an "at least one" constraint between tables rather than "exactly one". In the world outside databases it is common to encounter examples of business rules that ought to be implemented as join dependencies ("each customer must have AT LEAST ONE address", "each order must have AT LEAST ONE item in it"). In SQL DBMSs it's hard or impossible to implement join dependencies. The usual solution is simply to ignore such business rules, thus weakening the data integrity value of the database.
Yes, what you say is true, the dependent side of a 1:1 relationship may not exist -- if only for the time it takes to create the dependent entity after creating the independent entity. In fact, all relationships may have a zero on one side or the other. You can even turn the relationship into a 1:m by placing the FK of the address in the Customer row and making the field not null. You can still have addresses that aren't referenced by any customer.
At first glance, a m:n may look like an exception. The intersection entry is generally defined so that neither FK can be null. But there can be customers and addresses both that have no entry referring to them. So this is really a 0..m:0..n relationship.
What of it? Everyone I've ever worked with has understood that "one" (as in 1:1) or "many" (as in 1:m or m:n) means "no more than this." There is no "exactly this, no more or less." For example, we can design a 1:3 relationship on paper. We cannot strictly enforce it in any database. We have to use triggers, stored procedures and/or scheduled tasks to seek out and call our attention to deviations. Execute a stored procedure weekly, for instance, that will seek out and flag or delete any such orphaned addresses.
Think of it like a "frictionless surface." It exists only on paper.
I see this question as a conceptual misunderstanding. Relations are between different things. Things with a "true 1-to-1 relation" are by definition aspects or attributes of the same thing, and belong in the same table. No, of course a person and an address are not the same, but if they are inseparable, and must always be inserted, deleted, or otherwise acted upon as a unit, then as data they are "the same thing". This is exactly what is described here.
Yes, and it's actually quite easy: just put both entities in the same table!
OTOH, if you need to keep them in separate tables for some reason, then you need a key in one table referencing [1] a key in another, and vice-versa. This, of course, represents a "chicken and egg" problem [2] which can be resolved by deferring the enforcement of FKs to the end of the transaction [3]. This works only on DBMSes that support deferred constraints (such as Oracle and PostgreSQL).
[1] Via a foreign key.
[2] Inserting a row in the first table is impossible because that would violate the referential integrity towards the second table, but inserting a row in the second table is impossible because that would violate the referential integrity towards the first table, etc... Ditto for deletion.
[3] So you simply insert both rows, and then check both FKs.
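As a rough sketch of the deferred-constraint approach, the example below uses SQLite through Python's sqlite3, since SQLite also accepts DEFERRABLE INITIALLY DEFERRED on foreign keys; the Oracle and PostgreSQL syntax differs in detail. The helper insert_pair is a hypothetical name. Both FKs are checked only at COMMIT, so the two rows can be inserted in either order inside one transaction.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# BEGIN / COMMIT / ROLLBACK statements below are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")  # FK enforcement is opt-in in SQLite
conn.executescript("""
CREATE TABLE CUSTOMER (
    id INTEGER NOT NULL PRIMARY KEY
        REFERENCES ADDRESS (id) DEFERRABLE INITIALLY DEFERRED
);
CREATE TABLE ADDRESS (
    id INTEGER NOT NULL PRIMARY KEY
        REFERENCES CUSTOMER (id) DEFERRABLE INITIALLY DEFERRED
);
""")


def insert_pair(customer_id, address_id):
    """Insert one row into each table in a single transaction.

    Returns True if COMMIT succeeds, False if a constraint is violated.
    """
    conn.execute("BEGIN")
    try:
        conn.execute("INSERT INTO CUSTOMER (id) VALUES (?)", (customer_id,))
        conn.execute("INSERT INTO ADDRESS (id) VALUES (?)", (address_id,))
        conn.execute("COMMIT")  # the deferred FK checks run here
        return True
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")  # a failed COMMIT leaves the transaction open
        return False
```

With matching ids the pair is committed; with mismatched ids each row lacks its counterpart, the COMMIT fails, and nothing is stored.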

Designing Tables Sql Server

Good Morning,
In the design of a database, I have a table (let's call it TabA) that could have relationships with four other tables. That is, this table can be linked to any of the four tables: it may have no links at all, one link (with any of the tables), two links (with any two of them), and so on.
To table TabA I added four fields that refer to the four tables; a field can be null when there is no link to that table.
Is this kind of design (the four fields in TabA) optimal, or is there a better design for this type of situation?
Many thanks for your reply.
dave
In answer to the question and clarification in your comment, the answer is that your design can't be improved in terms of the number of foreign key columns. Having a specific foreign key column for every potential foreign key relationship is a best practice design.
However, the schema design itself seems questionable. I don't have enough information to tell whether the "Distributori_[N]_Livello" tables are a truly hierarchical structure or not. If it is, it is often possible to use a self-referential table for hierarchical structures rather than a set of N tables, as the diagram you linked seems to use. If you are able to refactor your design in such a way, it might be possible to reduce the number of foreign key columns required.
Whether this is possible or not is not for me to say given the data provided.
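For illustration only, here is what a self-referential table for a hierarchy could look like, again using Python's sqlite3 as a stand-in DBMS. The table distributor, its columns, and the sample rows are hypothetical and not derived from the linked diagram. Each row points at its parent through parent_id, and a recursive query walks up the levels.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # FK enforcement is opt-in in SQLite
conn.executescript("""
CREATE TABLE distributor (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    parent_id INTEGER REFERENCES distributor (id)  -- NULL for a top-level row
);
INSERT INTO distributor (id, name, parent_id) VALUES
    (1, 'National', NULL),
    (2, 'Regional North', 1),
    (3, 'Local A', 2),
    (4, 'Regional South', 1);
""")


def ancestors(distributor_id):
    """Return the names from the given row up to the top of the hierarchy."""
    rows = conn.execute(
        """
        WITH RECURSIVE chain (id, name, parent_id, depth) AS (
            SELECT id, name, parent_id, 0 FROM distributor WHERE id = ?
            UNION ALL
            SELECT d.id, d.name, d.parent_id, c.depth + 1
            FROM distributor AS d
            JOIN chain AS c ON d.id = c.parent_id
        )
        SELECT name FROM chain ORDER BY depth
        """,
        (distributor_id,),
    ).fetchall()
    return [name for (name,) in rows]
```

With a layout like this, a table such as TabA would need only one FK column to distributor, regardless of how many levels the hierarchy has.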

Recommended structure for table that has FK for 3 other tables

I have a table that will contain information for 3 other tables. The design I have is that this table will have one column holding the object's ID and another column holding the object's type (and thus the table that the row refers to).
Two questions:
a) Is that the best design or is there something else more widely accepted?
b) What is the recommended procedure to ensure that IDs are valid for the given object's type?
If I understood your question correctly, each row in your table links to exactly one of the three other tables.
Your approach (type field + one foreign key field) is a valid design, and it's useful if you want to create a general-purpose table that contains meta-information about your data (e.g. a list of records that should be retransmitted for replication).
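Since a plain foreign key cannot validate a type + ID pair, one way to check it is with triggers. The sketch below is an assumption-heavy illustration in SQLite (through Python's sqlite3): the tables customer, invoice and note and the helper run are hypothetical, and trigger syntax differs considerably on other DBMSs. It covers both INSERT and UPDATE on the referencing table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE invoice (id INTEGER PRIMARY KEY, total_cents INTEGER NOT NULL);
CREATE TABLE note (
    id INTEGER PRIMARY KEY,
    object_type TEXT NOT NULL CHECK (object_type IN ('customer', 'invoice')),
    object_id INTEGER NOT NULL,
    body TEXT NOT NULL
);
-- Reject new rows whose object_id does not exist in the table named by object_type.
CREATE TRIGGER note_check_insert BEFORE INSERT ON note
BEGIN
    SELECT RAISE(ABORT, 'object_id not found for object_type')
    WHERE (NEW.object_type = 'customer'
           AND NOT EXISTS (SELECT 1 FROM customer WHERE id = NEW.object_id))
       OR (NEW.object_type = 'invoice'
           AND NOT EXISTS (SELECT 1 FROM invoice WHERE id = NEW.object_id));
END;
-- The same rule for changes to either column.
CREATE TRIGGER note_check_update BEFORE UPDATE OF object_type, object_id ON note
BEGIN
    SELECT RAISE(ABORT, 'object_id not found for object_type')
    WHERE (NEW.object_type = 'customer'
           AND NOT EXISTS (SELECT 1 FROM customer WHERE id = NEW.object_id))
       OR (NEW.object_type = 'invoice'
           AND NOT EXISTS (SELECT 1 FROM invoice WHERE id = NEW.object_id));
END;
INSERT INTO customer (id, name) VALUES (1, 'Acme');
""")


def run(sql, params=()):
    """Execute one statement; return True on success, False if the database rejects it."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.DatabaseError:
        return False
```

This covers only the referencing side. Deleting a customer or invoice that a note still points at would go unnoticed unless further triggers are added on those tables, which is part of why declared foreign keys are usually preferred.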
Another approach, which might be more suitable for real application-level data, would be to have three columns, each being a foreign key to one of the three tables, and to add a constraint that requires exactly two of those fields to be null. This has the following advantages:
The three FKs do not need to have the same data type.
The JOIN syntax becomes more natural (not involving the type field).
You can add referential integrity constraints on those FK columns.
You don't need to ensure correctness of the type field -- in fact, you don't need the type field at all. The type is determined implicitly by the one FK column which is not null.
a) I'm supposing you have a one-to-many relationship between objects and object types. In a normal design you'd have a reference from the objecttype column in the objects table to the primary key of the object types table.
b) I would enforce referential integrity in the relationship properties (this depends on the DBMS you are using). It's also up to you whether to use cascading on updates and deletes. That way, an update or a delete of the primary key in the object types table would be reflected in the objects table, updating its foreign key column (the object type column) or deleting the rows that have that object type.
The basics of DB schema design are easy, but in more complicated situations it can be really hard to figure out what's best. There is a lot of personal subjectivity that can come into play here, and even performance can be a factor in denormalizing a design.
Disclaimer aside, my personal recommendation is to never use a column to store more than one kind of FK, i.e. a column for FKs should store FKs that point only to a single table. If you don't do this, you have to map the cascade of that column's data into multiple sub-select queries inside your code, and it can begin to get more messy than you expected. Your given "Problem No. 2, ensuring validity between type and FK" is just the beginning of a whole world of pain that will cascade throughout your source code.
Assuming you change the design to use one field per FK reference, I would also check whether each FK field in your main "information-holding table" will be fully valid for each record. If not, I would move out the FK columns that will only be applicable some of the time to a separate table.