Can a foreign key have a constant instead of a field name? Relate FK to STI subclass - sql

Setup
So here's a scenario which I'm finding is rather common once you decide to play with STI (Single Table Inheritance).
You have some base type with various subtypes.
Person < (Teacher,Student,Staff,etc)
User < (Member,Admin)
Member < (Buyer,Seller)
Vehicle < (Car,Boat,Plane)
etc.
There are two major approaches to modelling that in the database:
Single Table Inheritance
One big table with a type field and a bunch of nullable fields
Class Table Inheritance
One table per type with shared PK (FK'd from the children to the parent)
While there are several issues with STI, I do like how it manages to cut down on the number of joins you have to make, as well as some of the support in frameworks like Rails, but I am running into an issue on how to relate subclass-specific tables.
For example:
Certifications should only reference Teacher-Persons
Profiles should only reference Member-Users
WingInformation should not be related to a car or boat (unless you are Batman maybe)
Advertisements are owned by Seller-Members not Buyer-Members
With CTI, these relationships are trivial - just slap a Foreign Key on the related table and you're done:
ALTER TABLE advertisements
ADD FOREIGN KEY (seller_id) REFERENCES sellers (id)
But with STI, the similar thing wouldn't capture the subtype restriction.
ALTER TABLE advertisements
ADD FOREIGN KEY (seller_id) REFERENCES members (id)
What I would like to see is something like:
* Does not work in most (all?) databases *
ALTER TABLE advertisements
ADD FOREIGN KEY (seller_id, 'seller') REFERENCES members (id, type)
All I have been able to find is a dirty hack requiring adding a computed column to the related table:
ALTER TABLE advertisements
ADD seller_type VARCHAR(20) NOT NULL DEFAULT 'seller';
ALTER TABLE advertisements
ADD FOREIGN KEY (seller_id, seller_type) REFERENCES members (id, type);
This strikes me as odd (not to mention inelegant).
The real questions
Is there a RDBMS out there which will allow me to do this?
Is there a reason why this isn't even possible?
Is this just one more reason why NOT to use STI except in the most trivial of cases?

There's no standard way to declare a constant in the foreign key declaration. You have to name columns.
But you could force the column to have a fixed value, using one of the following methods:
Computed column
CHECK constraint
Trigger before INSERT/UPDATE to overwrite any user-supplied value with the default value.
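As a rough illustration of the CHECK constraint option, here is a minimal sketch using Python's sqlite3 module. The table and column names come from the question; the choice of SQLite, the UNIQUE (id, type) key on members, and the insert_ad helper are my assumptions, and other databases will need slightly different syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE members (
    id   INTEGER PRIMARY KEY,
    type TEXT NOT NULL,
    UNIQUE (id, type)              -- assumed: gives the composite FK a key to reference
);
CREATE TABLE advertisements (
    id          INTEGER PRIMARY KEY,
    seller_id   INTEGER NOT NULL,
    -- The "constant": the default supplies it, the CHECK pins it.
    seller_type TEXT NOT NULL DEFAULT 'seller' CHECK (seller_type = 'seller'),
    FOREIGN KEY (seller_id, seller_type) REFERENCES members (id, type)
);
INSERT INTO members (id, type) VALUES (1, 'seller'), (2, 'buyer');
""")


def insert_ad(seller_id, seller_type=None):
    """Hypothetical helper: True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            if seller_type is None:
                conn.execute(
                    "INSERT INTO advertisements (seller_id) VALUES (?)",
                    (seller_id,),
                )
            else:
                conn.execute(
                    "INSERT INTO advertisements (seller_id, seller_type) VALUES (?, ?)",
                    (seller_id, seller_type),
                )
        return True
    except sqlite3.IntegrityError:
        return False


seller_ok = insert_ad(1)           # member 1 is a seller
buyer_ok = insert_ad(2)            # member 2 is a buyer, so (2, 'seller') has no parent row
forced_ok = insert_ad(2, "buyer")  # trying to override the constant trips the CHECK
```

The extra column is still there, but nothing can put a value other than 'seller' into it, so the composite foreign key behaves like the constant-in-FK syntax the question asks for.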

Related

Questionable SQL Relationship

I am going through a pluralsight course that is currently going through building an MVC application using an entity framework code-first approach. I was confused about the Database schema used for the project.
As you can see, the relationship between Securities and its related tables seems to be one-to-one, but the confusion comes when I realize there is no foreign key to relate the two sub-tables and that they appear to share the same primary key column.
The video before made the Securities model class abstract in order for the "Stock" and "MutualFund" model classes to inherit from it and contain all relating data. To me however, it seems that same thing could be done using a couple of foreign keys.
I guess my question is does this method of linking tables serve any useful purpose in SQL or EF? It seems to me in order to create a new record for one table, all tables would need a new record which is where I really get confused.
In ORM and EF terminology, this setup is referred to as the "Table per Type" inheritance paradigm, where there is a table per subclass, a base class table, and the primary key is shared between the subclasses and the base class.
e.g. In this case, Securities_Stock and Securities_MutualFund are two subclasses of the Securities base class / table (possibly abstract).
The relationship will be 0..1 (subclass) to 1 (base class) - i.e. only one of the records in Securities_MutualFund or Securities_Stock will exist for each base table Securities row.
There's also often a discriminator column on the base table to indicate which subclass table to join to, but that doesn't seem to be the case here.
It is also common to enforce referential integrity between the subclasses to the base table with a foreign key.
To answer your question, the reason why there's no FK between the two subclass instance tables is because each instance (with a unique Id) will only ever be in ONE of the sub class tables - it is NOT possible for the same Security to be both a mutual fund and a share.
You are right: to add a new concrete Security record, a row is needed in the base Securities table (it must be inserted first, as there are FKs from the subclass tables to the base table), and then a row is inserted into one of the subclass tables with the rest of the 'specific' data.
If a Foreign Key was added between Stock and Mutual Fund, it would be impossible to insert new rows into the tables.
The full pattern often looks like this:
CREATE TABLE BaseTable
(
Id INT PRIMARY KEY, -- Can also be Identity
-- Common columns here
Discriminator INT -- Type usually has a small range, so `INT` or `CHAR` are common
);
CREATE TABLE SubClassTable
(
Id INT PRIMARY KEY, -- Not identity, must be manually inserted
-- Specialized SubClass columns here
FOREIGN KEY (Id) REFERENCES BaseTable(Id)
);
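To make the insertion order concrete, here is a small sketch of the same pattern run against SQLite through Python's sqlite3 module. Securities and Securities_Stock are the table names from the question; the Name and Ticker columns and the sample values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE Securities (
    Id   INTEGER PRIMARY KEY,
    Name TEXT NOT NULL              -- hypothetical common column
);
CREATE TABLE Securities_Stock (
    Id     INTEGER PRIMARY KEY,     -- same value as Securities.Id, never generated here
    Ticker TEXT NOT NULL,           -- hypothetical stock-only column
    FOREIGN KEY (Id) REFERENCES Securities (Id)
);
""")

# A subclass row without its base row is rejected by the foreign key.
try:
    with conn:
        conn.execute("INSERT INTO Securities_Stock (Id, Ticker) VALUES (1, 'ABC')")
    orphan_ok = True
except sqlite3.IntegrityError:
    orphan_ok = False

# Base row first, then the subclass row reusing the generated key.
with conn:
    cur = conn.execute("INSERT INTO Securities (Name) VALUES ('Example Corp')")
    conn.execute(
        "INSERT INTO Securities_Stock (Id, Ticker) VALUES (?, 'ABC')",
        (cur.lastrowid,),
    )

# Reading a concrete Stock back means joining on the shared primary key.
row = conn.execute("""
    SELECT s.Id, s.Name, st.Ticker
    FROM Securities AS s
    JOIN Securities_Stock AS st ON st.Id = s.Id
""").fetchone()
```

An ORM such as EF performs these two inserts for you when you save a Stock entity; the sketch only shows what ends up happening at the SQL level.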

exclusive/disjoint inheritance in SQLite

I was wondering how to implement exclusive inheritance in SQlite. By doing simply
create table Class (id integer primary key);
create table Sub1(id integer primary key references Class(id));
create table Sub2(id integer primary key references Class(id));
I have simple inheritance which does not prevent a Class from being both Sub1 and Sub2. I am looking for a way to enforce that a Class cannot be both (and optionally, enforce it to be at least one of them).
In theory this could be possible with checks, e.g. for Sub2, something like
create table Sub2(id integer primary key references Class(id)
check(not exists(select 1 from Sub1 where Sub1.id = id limit 1)));
but this has the drawback that it would require maintenance as subclasses are added, and also that it is not accepted by SQLite (subqueries prohibited in CHECK constraints). This does not work when the check is at the table level either.
EDIT
Found a similar question (and related answers) on SO here.
You could try to use triggers (http://www.sqlite.org/lang_createtrigger.html).
For instance, you could create a trigger on each table Sub(n) that, when a record is inserted into Sub(n), checks that its primary key is not already present in Class. If it is present, the trigger fails, since this means another record with the same primary key already exists in another Sub(k) table; otherwise it inserts the (primary key of the) record into Class.
In this way, you can add tables corresponding to subclasses without modifying the code of the previous tables.
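A possible shape for those triggers, sketched with Python's sqlite3 module. The trigger names and error message are mine, and the sketch assumes rows are only ever created through the Sub tables (never inserted directly into Class) and does not handle deletes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Class (id INTEGER PRIMARY KEY);
CREATE TABLE Sub1 (id INTEGER PRIMARY KEY REFERENCES Class(id));
CREATE TABLE Sub2 (id INTEGER PRIMARY KEY REFERENCES Class(id));

-- One trigger per subclass table; each only ever looks at Class.
CREATE TRIGGER Sub1_claims_id BEFORE INSERT ON Sub1
BEGIN
    SELECT CASE
        WHEN EXISTS (SELECT 1 FROM Class WHERE Class.id = NEW.id)
        THEN RAISE(ABORT, 'id already belongs to a subclass')
    END;
    INSERT INTO Class (id) VALUES (NEW.id);
END;

CREATE TRIGGER Sub2_claims_id BEFORE INSERT ON Sub2
BEGIN
    SELECT CASE
        WHEN EXISTS (SELECT 1 FROM Class WHERE Class.id = NEW.id)
        THEN RAISE(ABORT, 'id already belongs to a subclass')
    END;
    INSERT INTO Class (id) VALUES (NEW.id);
END;
""")


def try_insert(table, row_id):
    """Hypothetical helper: True if the insert succeeds, False if a trigger or constraint aborts it."""
    try:
        with conn:
            conn.execute(f"INSERT INTO {table} (id) VALUES (?)", (row_id,))
        return True
    except sqlite3.DatabaseError:
        return False


results = [
    try_insert("Sub1", 1),  # new id: the trigger registers it in Class
    try_insert("Sub2", 1),  # same id in another subclass: aborted
    try_insert("Sub2", 2),  # different id: fine
]
class_ids = conn.execute("SELECT id FROM Class ORDER BY id").fetchall()
```

Because every Class row is created by a subclass insert, this also covers the "at least one of them" requirement as long as nobody writes to Class directly.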

SQL Foreign Key, is my example an appropriate use?

Can someone please confirm if this is an appropriate use of a foreign key(this is just an example):
Application to book a meeting room;
tblBooking -> pkName,Time,fkRoomName;
tblRoom -> pkRoomName, RoomNumber;
The UI will populate a dropdown menu using the pkRoomName data, when the booking is made the selected pkRoomName will then go to tblBooking fkRoomName.
Have I understood this correctly?
Yes, if you want to ensure that any booking that specifies a room refers to a known room, then this is an appropriate use of a foreign key.
Note, however, that just declaring the foreign key relationship is not the same thing as requiring that all bookings specify a room. Records in tblBooking with a NULL value for fkRoomName will be permitted. If you want to require a room is specified, you must also use the NOT NULL constraint on the fkRoomName field.
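Here is a quick way to see the NULL behaviour for yourself, sketched with Python's sqlite3 module. I dropped the pk/fk prefixes from the column names, and tblBookingStrict, the sample rows, and the accepted helper are only there for the comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE tblRoom (
    RoomName   TEXT PRIMARY KEY,
    RoomNumber INTEGER
);
CREATE TABLE tblBooking (
    Name     TEXT PRIMARY KEY,
    Time     TEXT,
    RoomName TEXT REFERENCES tblRoom (RoomName)            -- nullable
);
CREATE TABLE tblBookingStrict (                            -- hypothetical variant
    Name     TEXT PRIMARY KEY,
    Time     TEXT,
    RoomName TEXT NOT NULL REFERENCES tblRoom (RoomName)   -- room is mandatory
);
INSERT INTO tblRoom (RoomName, RoomNumber) VALUES ('Blue', 101);
""")


def accepted(table, name, room):
    """Hypothetical helper: True if the booking is stored, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(
                f"INSERT INTO {table} (Name, Time, RoomName) VALUES (?, '09:00', ?)",
                (name, room),
            )
        return True
    except sqlite3.IntegrityError:
        return False


known_room = accepted("tblBooking", "standup", "Blue")      # room exists
unknown_room = accepted("tblBooking", "review", "Green")    # no such room: FK rejects it
null_room = accepted("tblBooking", "retro", None)           # NULL slips past the FK
null_strict = accepted("tblBookingStrict", "retro", None)   # NOT NULL catches it
```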
Finally, a small matter of semantics. I would not use "fk" in the column name for the RoomName. This is because a foreign key is a different entity from the columns it includes. It is not uncommon to have foreign keys that include multiple columns and it is also not uncommon to have several different foreign key relationships out of a single table. Therefore fkRoomName is an appropriate name for the foreign key itself, but not so much for the column.

database design pattern: many to many relationship across tables?

I have the following tables:
Section and Content
And I want to relate them.
My current approach is the following table:
In which I would store
Section to Section
Section to Content
Content to Section
Content to Content
Now, while I clearly can do that by adding a pair of fields that indicate whether the source is a section or a content, and whether the target is a section or a content, I'd like to know if there's a cleaner way to do this, if possible using just one table for the relationship, which would be the cleanest in my opinion. I'd also like the table to be somehow related to the Section and Content tables so I can avoid manually adding constraints, or triggers that delete the relationships when a Section or Content is deleted...
Thanks as usual for the input! <3
Here's how I would design it:
CREATE TABLE Pairables (
PairableID INT IDENTITY PRIMARY KEY,
...other columns common to both Section and Content...
);
CREATE TABLE Sections (
SectionID INT PRIMARY KEY,
...other columns specific to sections...
FOREIGN KEY (SectionID) REFERENCES Pairables(PairableID)
);
CREATE TABLE Contents (
ContentID INT PRIMARY KEY,
...other columns specific to contents...
FOREIGN KEY (ContentID) REFERENCES Pairables(PairableID)
);
CREATE TABLE Pairs (
PairID INT NOT NULL,
PairableID INT NOT NULL,
IsSource BIT NOT NULL,
PRIMARY KEY (PairID, PairableID),
FOREIGN KEY (PairableID) REFERENCES Pairables(PairableID)
);
You would insert two rows in Pairs for each pair.
Now it's easy to search for either type of pairable entity, you can search for either source or target in the same column, and you still only need one many-to-many intersection table.
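A small sketch of how the two-rows-per-pair layout is used, run on SQLite through Python's sqlite3 module. The IDs, the partners helper, and the CHECK on IsSource are illustrative additions; SQLite has no IDENTITY or BIT, so INTEGER stands in for both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Pairables (PairableID INTEGER PRIMARY KEY);
CREATE TABLE Sections (SectionID INTEGER PRIMARY KEY REFERENCES Pairables (PairableID));
CREATE TABLE Contents (ContentID INTEGER PRIMARY KEY REFERENCES Pairables (PairableID));
CREATE TABLE Pairs (
    PairID     INTEGER NOT NULL,
    PairableID INTEGER NOT NULL REFERENCES Pairables (PairableID),
    IsSource   INTEGER NOT NULL CHECK (IsSource IN (0, 1)),
    PRIMARY KEY (PairID, PairableID)
);
INSERT INTO Pairables (PairableID) VALUES (1), (2);
INSERT INTO Sections (SectionID) VALUES (1);
INSERT INTO Contents (ContentID) VALUES (2);
-- Pair 10 links section 1 (source) to content 2 (target): one row per side.
INSERT INTO Pairs (PairID, PairableID, IsSource) VALUES (10, 1, 1), (10, 2, 0);
""")


def partners(pairable_id):
    """Hypothetical helper: everything paired with the given entity, in either direction."""
    rows = conn.execute(
        """
        SELECT other.PairableID
        FROM Pairs AS mine
        JOIN Pairs AS other
          ON other.PairID = mine.PairID
         AND other.PairableID <> mine.PairableID
        WHERE mine.PairableID = ?
        ORDER BY other.PairableID
        """,
        (pairable_id,),
    ).fetchall()
    return [r[0] for r in rows]


# A pair row pointing at a non-existent entity is rejected by the foreign key.
try:
    with conn:
        conn.execute("INSERT INTO Pairs (PairID, PairableID, IsSource) VALUES (11, 99, 1)")
    dangling_ok = True
except sqlite3.IntegrityError:
    dangling_ok = False

section_partners = partners(1)
content_partners = partners(2)
```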
Yes, there is a much cleaner way to do this:
one table tracks the relations from Section to Section and enforces them as foreign key constraints
one table tracks the relations from Section to Content and enforces them as foreign key constraints
one table tracks the relations from Content to Section and enforces them as foreign key constraints
one table tracks the relations from Content to Content and enforces them as foreign key constraints
This is much cleaner than a single table with overloaded IDs that cannot be enforced by foreign key constraints. The fact that neither data modeling nor domain modeling patterns ever mention a pattern like the one you describe should be your first alarm bell. The second alarm should be that the engine cannot enforce the constraints you envision and you have to delve into triggers.
Having four distinct relationships modeled in one table brings no elegance to the model; it only adds obfuscation. The relational model is not C++: it has no inheritance, it has no polymorphism, it has no overloading. Trying to force an OO mindset onto data modeling has led many a fine developer into a mud of unmaintainable trigger mesh of on-disk table-like bits vaguely resembling 'data'.

How to design a database schema to support tagging with categories?

I am trying to do something like Database Design for Tagging, except each of my tags is grouped into a category.
For example, let's say I have a database about vehicles. Let's say we actually don't know very much about vehicles, so we can't specify the columns all vehicles will have. Therefore we shall "tag" vehicles with information.
1. manufacture: Mercedes
model: SLK32 AMG
convertible: hardtop
2. manufacture: Ford
model: GT90
production phase: prototype
3. manufacture: Mazda
model: MX-5
convertible: softtop
Now as you can see all cars are tagged with their manufacture and model, but the other categories don't all match. Note that a car can only have one of each category. IE. A car can only have one manufacturer.
I want to design a database to support a search for all Mercedes, or to be able to list all manufactures.
My current design is something like this:
vehicles
int vid
String vin
vehicleTags
int vid
int tid
tags
int tid
String tag
int cid
categories
int cid
String category
I have all the right primary and foreign keys in place, except I can't handle the case where each car can only have one manufacturer. Or can I?
Can I add a foreign key constraint to the composite primary key in vehicleTags? I.e., could I add a constraint such that a row (vid, tid) can be added to vehicleTags only if there isn't already a row for the same vid whose tid belongs to the same cid?
My guess is no. I think the solution to this problem is add a cid column to vehicleTags, and make the new composite primary key (vid, cid). It would look like:
vehicleTags
int vid
int cid
int tid
This would prevent a car from having two manufacturers, but now I have duplicated the information that tid is in cid.
What should my schema be?
Tom noticed this problem in my database schema in my previous question, How do you do many to many table outer joins?
EDIT
I know that in the example manufacture should really be a column in the vehicle table, but let's say you can't do that. The example is just an example.
This is yet another variation on the Entity-Attribute-Value design.
A more recognizable EAV table looks like the following:
CREATE TABLE vehicleEAV (
vid INTEGER,
attr_name VARCHAR(20),
attr_value VARCHAR(100),
PRIMARY KEY (vid, attr_name),
FOREIGN KEY (vid) REFERENCES vehicles (vid)
);
Some people force attr_name to reference a lookup table of predefined attribute names, to limit the chaos.
What you've done is simply spread an EAV table over three tables, but without improving the order of your metadata:
CREATE TABLE vehicleTag (
vid INTEGER,
cid INTEGER,
tid INTEGER,
PRIMARY KEY (vid, cid),
FOREIGN KEY (vid) REFERENCES vehicles(vid),
FOREIGN KEY (cid) REFERENCES categories(cid),
FOREIGN KEY (tid) REFERENCES tags(tid)
);
CREATE TABLE categories (
cid INTEGER PRIMARY KEY,
category VARCHAR(20) -- "attr_name"
);
CREATE TABLE tags (
tid INTEGER PRIMARY KEY,
tag VARCHAR(100) -- "attr_value"
);
If you're going to use the EAV design, you only need the vehicleTags and categories tables.
CREATE TABLE vehicleTag (
vid INTEGER,
cid INTEGER, -- reference to "attr_name" lookup table
tag VARCHAR(100), -- "attr_value"
PRIMARY KEY (vid, cid),
FOREIGN KEY (vid) REFERENCES vehicles(vid),
FOREIGN KEY (cid) REFERENCES categories(cid)
);
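For what it's worth, here is a sketch of that two-table version in use, run on SQLite through Python's sqlite3 module. The schema follows the tables above; the VINs and category IDs are made up. It shows the two searches the question asks for and the one-manufacturer-per-vehicle rule falling out of the (vid, cid) primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE vehicles (vid INTEGER PRIMARY KEY, vin TEXT);
CREATE TABLE categories (
    cid      INTEGER PRIMARY KEY,
    category VARCHAR(20)            -- "attr_name"
);
CREATE TABLE vehicleTag (
    vid INTEGER,
    cid INTEGER,                    -- reference to "attr_name" lookup table
    tag VARCHAR(100),               -- "attr_value"
    PRIMARY KEY (vid, cid),
    FOREIGN KEY (vid) REFERENCES vehicles(vid),
    FOREIGN KEY (cid) REFERENCES categories(cid)
);
INSERT INTO vehicles (vid, vin) VALUES (1, 'VIN-1'), (2, 'VIN-2'), (3, 'VIN-3');
INSERT INTO categories (cid, category) VALUES (1, 'manufacture'), (2, 'model');
INSERT INTO vehicleTag (vid, cid, tag) VALUES
    (1, 1, 'Mercedes'), (1, 2, 'SLK32 AMG'),
    (2, 1, 'Ford'),     (2, 2, 'GT90'),
    (3, 1, 'Mazda'),    (3, 2, 'MX-5');
""")

# Search for all Mercedes.
mercedes = [r[0] for r in conn.execute("""
    SELECT vt.vid
    FROM vehicleTag AS vt
    JOIN categories AS c ON c.cid = vt.cid
    WHERE c.category = 'manufacture' AND vt.tag = 'Mercedes'
    ORDER BY vt.vid
""")]

# List all manufacturers.
manufacturers = [r[0] for r in conn.execute("""
    SELECT DISTINCT vt.tag
    FROM vehicleTag AS vt
    JOIN categories AS c ON c.cid = vt.cid
    WHERE c.category = 'manufacture'
    ORDER BY vt.tag
""")]

# A second manufacturer for vehicle 1 collides with PRIMARY KEY (vid, cid).
try:
    with conn:
        conn.execute("INSERT INTO vehicleTag (vid, cid, tag) VALUES (1, 1, 'Ford')")
    second_manufacturer_ok = True
except sqlite3.IntegrityError:
    second_manufacturer_ok = False
```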
But keep in mind that you're mixing data with metadata. You lose the ability to apply certain constraints to your data model.
How can you make one of the categories mandatory (a conventional column uses a NOT NULL constraint)?
How can you use SQL data types to validate some of your tag values? You can't, because you're using a long string for every tag value. Is this string long enough for every tag you'll need in the future? You can't tell.
How can you constrain some of your tags to a set of permitted values (a conventional table uses a foreign key to a lookup table)? This is your "softtop" vs. "soft top" example. But you can't make a constraint on the tag column because that constraint would apply to all other tag values for other categories. You'd effectively restrict engine size and paint color to "soft top" as well.
SQL databases don't work well with this model. It's extremely difficult to get right, and querying it becomes very complex. If you do continue to use SQL, you will be better off modeling the tables conventionally, with one column per attribute. If you have need to have "subtypes" then define a subordinate table per subtype (Class-Table Inheritance), or else use Single-Table Inheritance. If you have an unlimited variation in the attributes per entity, then use Serialized LOB.
Another technology that is designed for these kinds of fluid, non-relational data models is a Semantic Database, storing data in RDF and queried with SPARQL. One free solution is RDF4J (formerly Sesame).
What you describe are not tags, tags are only values, they do not have an associated key.
Tags are normally implemented as a string column whose value is a delimited list of values.
For example #1, a tag field would contain a value such as:
"manufacture_Mercedes,model_SLK32 AMG,convertible_hardtop"
The user then would normally be able to easily filter entries, by the existence of one or more tags. It is essentially schemaless data from a database perspective. There are downsides to tags, but they also avoid the extreme complications that come from using an EAV model. If you really need an EAV model, it also might be worth considering an attributes field, which contains JSON data. It's more painful to query, but still not as horrible as querying EAV across multiple tables.
I think your solution is to simply add a manufacturer column to your vehicles table. It's an attribute that you know all the vehicles will have (i.e. cars don't spontaneously appear by themselves) and by making it a column in your vehicle table you solve the issue of having one and only one manufacturer for each vehicle. This approach would apply to any attributes that you know will be shared by all vehicles. You can then implement the tagging system for the other attributes that aren't universal.
So taking from your example the vehicle table would be something like:
vehicle
vid
vin
make
model
One way would be to slightly rethink your schema, normalising tag keys away from values:
vehicles
int vid
string vin
tags
int tid
int cid
string key
categories
int cid
string category
vehicleTags
int vid
int tid
string value
Now all you need is a unique constraint on vehicleTags(vid, tid).
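Roughly, that could look like the following, sketched on SQLite through Python's sqlite3 module. The table layout is the one above; the sample rows and the tag_vehicle helper are assumptions for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE vehicles (vid INTEGER PRIMARY KEY, vin TEXT);
CREATE TABLE categories (cid INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE tags (
    tid   INTEGER PRIMARY KEY,
    cid   INTEGER REFERENCES categories (cid),
    "key" TEXT                      -- quoted because KEY is an SQL keyword
);
CREATE TABLE vehicleTags (
    vid   INTEGER NOT NULL REFERENCES vehicles (vid),
    tid   INTEGER NOT NULL REFERENCES tags (tid),
    value TEXT,
    UNIQUE (vid, tid)               -- one value per tag key per vehicle
);
INSERT INTO vehicles (vid, vin) VALUES (1, 'VIN-1');
INSERT INTO categories (cid, category) VALUES (1, 'identity');
INSERT INTO tags (tid, cid, "key") VALUES (1, 1, 'manufacture'), (2, 1, 'model');
""")


def tag_vehicle(vid, tid, value):
    """Hypothetical helper: True if the tag value is stored, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO vehicleTags (vid, tid, value) VALUES (?, ?, ?)",
                (vid, tid, value),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first_make = tag_vehicle(1, 1, "Mercedes")   # first manufacturer: accepted
second_make = tag_vehicle(1, 1, "Ford")      # second manufacturer: rejected by UNIQUE
model = tag_vehicle(1, 2, "SLK32 AMG")       # a different key on the same vehicle: accepted
```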
Alternatively, there are ways to create constraints beyond simple foreign keys: depending on your database, can you write a custom constraint or an insert/update trigger to enforce vehicle-tag uniqueness?
I needed to solve this exact problem (same general domain and everything — auto parts). I found that the best solution to the problem was to use Lucene/Xapian/Ferret/Sphinx or whichever full-text indexer you prefer. Much better performance than what SQL can offer.
These days, I almost never end up building a database-backed web app that doesn't involve a full-text indexer. This problem and the general issue of search just come up way too often to omit indexers from your toolbox.