Inheritance in SQL. How to guarantee it? - sql

I am trying to create inheritance as in a C# object using SQL Server and I have:
create table dbo.Evaluations
(
Id int not null constraint primary key clustered (Id),
Created datetime not null
);
create table dbo.Exams
(
Id int not null,
Value int not null
// Some other fields
);
create table dbo.Tests
(
Id int not null,
Order int not null
// Some other fields
);
alter table dbo.Exams
add constraint FK_Exams_Id foreign key (Id) references dbo.Evaluations(Id);
alter table dbo.Tests
add constraint FK_Tests_Id foreign key (Id) references dbo.Evaluations(Id);
Which would translate to:
public class Evaluation {}
public class Exam : Evaluation {}
public class Test : Evaluation {}
I think this is the way to go but I have a problem:
How to force that an Evaluation has only one Test or one Exam but not both?
To find which type of evaluation I have I can check exam or test for null. But should I have an EvaluationType in Evaluations table instead?
NOTE:
In reality I have 4 subtypes each one with around 40 to 60 different columns.
And in Evaluations table I have around 20 common columns which are also the ones which i use more often to query so I get lists.

First, don't use reserved words such as order for column names.
You have a couple of choices on what to do. For this simple example, I would suggest just having the two foreign key references in the evaluation table, along with some constraints and computed columns. Something like this:
create table dbo.Evaluations
(
EvaluationId int not null constraint primary key clustered (Id),
ExamId int references exams(ExamId),
TestId int references tests(TestId),
Created datetime not null,
EvaluationType as (case when ExamId is not null then 'Exam' when TestId is not null then 'Test' end),
check (not (ExamId is not null and TestId is not null))
);
This approach gets less practical if you have lots of subtypes. For your case, though, it provides the following:
Foreign key references to the subtables.
A column specifying the type.
A validation that at most one type is set for each evaluation.
It does have a slight overhead of storing the extra, unused id, but that is a small overhead.
EDIT:
With four subtypes, you can go in the other direction of having a single reference and type in the parent table and then using conditional columns and indexes to enforce the constraints:
create table dbo.Evaluations
(
EvaluationId int not null constraint primary key clustered (Id),
EvaluationType varchar(255) not null,
ChildId int not null,
CreatedAt datetime not null,
EvaluationType as (case when ExamId is not null then 'Exam' when TestId is not null then 'Test' end),
ExamId as (case when EvaluationType = 'Exam' then ChildId end),
TestId as (case when EvaluationType = 'Test' then ChildId end),
Other1Id as (case when EvaluationType = 'Other1' then ChildId end),
Other2Id as (case when EvaluationType = 'Other2' then ChildId end),
Foreign Key (ExamId) int references exams(ExamId),
Foreign Key (TestId) int references tests(TestId),
Foreign Key (Other1Id) int references other1(Other1Id),
Foreign Key (Other2Id) int references other2(Other2Id)
);
In some ways, this is the better solution to the problem. It minimizes storage and is extensible for additional types. Note that it is using computed columns for the foreign key references, so it is still maintaining relational integrity.

My best experience is include all columns in one table.
Relation model is not much friendly with object oriented design.
If you treat every class as one table, you can get performance problems with high number of rows in "base-table" (base class) or you can suffer from a lot of joins if you have level of inheritance.
If you want minimalize amount of work to get correct structure, create your own tool, which can genrate create/alter scripts of tables for chosen classes. It's in fact pretty easy. Then you can generate also your data access layer. In result you will get automatic worker and you can focus on complex tasks and delegate work for "trained monkeys" to computer not humans.

Related

Foreign key to table with composite primary key

We are making a translation system and we're struggling finding the best way to model our database.
What we have right now is:
CREATE TABLE Translation
(
id INT NOT NULL PRIMARY KEY
EN VARCHAR(MAX) NULL
DE VARCHAR(MAX) NULL
FR VARCHAR(MAX) NULL
...
);
This solution combines all translations to one entry. Downside is that if you have to add a language, you have to add a column. The upside is that you have a primary key which can be used for foreign keys.
Alternative solution:
CREATE TABLE TranslationId
(
id INT NOT NULL PRIMARY KEY
);
CREATE TABLE Translation
(
id INT NOT NULL
Language VARCHAR(2) NOT NULL
Translation VARCHAR(MAX) NULL
);
id in Translation has a foreign key to the id of TranslationId (and is not unique in the Translation table). This solution doesn't have the disadvantage of the first solution. The disadvantage is that this may be overengineered. To get all the translations for a certain id, you need to pass through an extra table.
Both solutions will work. Any thoughts on either solution?

Database Schema - Many-to-Many Normalisation

I'm designing a schema where a case can have many forms attached and a form can be used for many cases. The Form table basically holds the structure of a html form which gets rendered on the client side. When the form is submitted the name/value pairs for the fields are stored separately. Is there any value in keeping the name/value attributes seperate from the join table as follows?
CREATE TABLE Case (
ID int NOT NULL PRIMARY KEY,
...
);
CREATE TABLE CaseForm (
CaseID int NOT NULL FOREIGN KEY REFERENCES Case (ID),
FormID int NOT NULL FOREIGN KEY REFERENCES Form (ID),
CONSTRAINT PK_CaseForm PRIMARY KEY (CaseID, FormID)
);
CREATE TABLE CaseFormAttribute (
ID int NOT NULL PRIMARY KEY,
CaseID int NOT NULL FOREIGN KEY REFERENCES CaseForm (CaseID),
FormID int NOT NULL FOREIGN KEY REFERENCES CaseForm (FormID),
Name varchar(255) NOT NULL,
Value varchar(max)
);
CREATE TABLE Form (
ID int NOT NULL PRIMARY KEY,
FieldsJson varchar (max) NOT NULL
);
I'm I overcomplicating the schema since the same many to many relationship can by achieved by turning the CaseFormAttribute table into the join table and getting rid of the CaseForm table altogether as follows?
CREATE TABLE CaseFormAttribute (
ID int NOT NULL PRIMARY KEY,
CaseID int NOT NULL FOREIGN KEY REFERENCES Case (ID),
FormID int NOT NULL FOREIGN KEY REFERENCES Form (ID),
Name varchar(255) NOT NULL,
Value varchar(max) NULL
);
Basically what I'm trying to ask is which is the better design?
The main benefit of splitting up the two would depend on whether or not additional fields would ever be added to the CaseForm table. For instance, say that you want to record if a Form is incomplete. You may add an Incomplete bit field to that effect. Now, you have two main options for retrieving that information:
Clustered index scan on CaseForm
Create a nonclustered index on CaseForm.Incomplete which includes CaseID, FormID, and scan that
If you didn't split the tables, your two main options would be:
Clustered index scan on CaseFormAttribute
Create a nonclustered index on CaseFormAttribute.Incomplete which includes CaseID, FormID, and scan that
For the purposes of this example, query options 1 and 2 are roughly the same in terms of performance. Introducing the nonclustered index adds overhead in multiple ways. It's a little less streamlined than the clustered index (it may take more reads to scan in this particular example), it's additional storage space that CaseForm will take up, and the index has to be maintained for updates to the table. Option 4 will also perform similarly, with the same caveats as option 2. Option 3 will be your worst performer, as a clustered index scan will include reading all of the BLOB data in your Value field, even though it only needs the bit in Incomplete to determine whether or not to return that (Case, Form) pair.
So it really does depend on what direction you're going in the future.
Also, if you stay with the split approach, consider shifting CaseFormAttribute.ID to CaseForm, and then use CaseForm.ID as your PK/FK in CaseFormAttribute. The caveat here is that we're assuming that all Forms will be inserted at the same time for a given Case. If that's not true, then you would invite some page splits because your inserts will be somewhat random, though still generally increasing.

Complex Foreign Key Constraint in SQL

Is there a way to define a constraint using SQL Server 2005 to not only ensure a foreign key exists in another table, but also meets a certain criteria?
For example, say I have two tables:
Table A
--------
Id - int
FK_BId - int
Table B
--------
Id - int
Name - string
SomeBoolean - bit
Can I define a constraint that sayd FK_BId must point to a record in Table B, AND that record in Table B must have SomeBoolean = true? Thanks in advance for any help you can provide.
You can enforce the business rule using a composite key on (Id, SomeBoolean), reference this in table A with a CHECK constraint on FK_BSomeBoolean to ensure it is always TRUE. BTW I'd recommend avoiding BIT and instead using CHAR(1) with domain checking e.g.
CHECK (SomeBoolean IN ('F', 'T'))
The table structure could look like this:
CREATE TABLE B
(
Id INTEGER NOT NULL UNIQUE, -- candidate key 1
Name VARCHAR(20) NOT NULL UNIQUE, -- candidate key 2
SomeBoolean CHAR(1) DEFAULT 'F' NOT NULL
CHECK (SomeBoolean IN ('F', 'T')),
UNIQUE (Id, SomeBoolean) -- superkey
);
CREATE TABLE A
(
Ib INTEGER NOT NULL UNIQUE,
FK_BId CHAR(1) NOT NULL,
FK_BSomeBoolean CHAR(1) DEFAULT 'T' NOT NULL
CHECK (FK_BSomeBoolean = 'T')
FOREIGN KEY (FK_BId, FK_BSomeBoolean)
REFERENCES B (Id, SomeBoolean)
);
I think what you're looking for is out of the scope of foreign keys, but you could do the check in triggers, stored procedures, or your code.
If it is possible to do, I'd say that you would make it a compound foreign key, using ID and SomeBoolean, but I don't think it actually cares what the value is.
In some databases (I can't check SQL Server) you can add a check constraint that references other tables.
ALTER TABLE a ADD CONSTRAINT fancy_fk
CHECK (FK_BId IN (SELECT Id FROM b WHERE SomeBoolean));
I don’t believe this behavior is standard.

Where do you store ad-hoc properties in a relational database?

Lets say you have a relational DB table like INVENTORY_ITEM. It's generic in the sense that anything that's in inventory needs a record here. Now lets say there are tons of different types of inventory and each different type might have unique fields that they want to keep track of (e.g. forks might track the number of tines, but refrigerators wouldn't have a use for that field). These fields must be user-definable per category type.
There are many ways to solve this:
Use ALTER TABLE statements to actually add nullable columns on the fly (yuk)
Have two tables with a one-to-one mapping, INVENTORY_ITEM, and INVENTORY_ITEM_USER, and use ALTER TABLE statements to add and remove nullable columns from the latter table on the fly (a bit nicer).
Add a CUSTOM_PROPERTY table, and a CUSTOM_PROPERTY_VALUE table, and add/remove rows in CUSTOM_PROPERTY when the user adds and removes rows, and store the values in the latter table. This is nice and generic, but the performance would suffer. If you had an average of 20 values per item, the number of rows in CUSTOM_PROPERTY_VALUE goes up at 20 times the rate, and you still need to include columns in CUSTOM_PROPERTY_VALUE for every different data type that you might want to store.
Have one big varchar(MAX) field on INVENTORY_ITEM to store custom properties as XML.
I guess you could have individual tables for each category type that hangs off the INVENTORY_ITEM table, and these get created/destroyed on the fly when the user creates inventory types, and the columns get updated when they add/remove properties to those types. Seems messy though.
Is there a best-practice for this? It seems to me that option 4 is clean, but doesn't allow you to easily search by the metadata. I've used a variant of 3 before, but only on a table that had a really small number of rows, so performance wasn't an issue. It always seemed to me that 2 was a good idea, but it doesn't fit well with auto-generated entity frameworks, so you'd have to exclude the custom properties table from the entity generation and just write your own custom data access code to handle it.
Am I missing any alternatives? Is there a way for SQL server to "look into" XML data in a column so it could actually do stuff with option 4 now?
I am using the xml type column for this kind of situations...
http://msdn.microsoft.com/en-us/library/ms189887.aspx
Before xml we had to use the option 3. Which in my point of view is still a good way to do it. Espacialy if you have a Data Access Layer that is able to handle the type conversion properly for you. We stored everything as string values and defined a column that held the orignial data type for the conversion.
Options 1 and 2 are a no-go. Don't change the database schema in production on the fly.
Option 5 could be done in a separate database... But still no control over the schema and the user would need the rights to create tables etc.
Definitely the 3.
Sometimes 4 if you have a very good reason to do so.
Do not ever dynamically modify database structure to accommodate for incoming data. One day something could break and damage your database. It is simply not done this way.
3 or 4 are the only ones I would consider - you don't want to be changing the schema on the fly, especially if you're using some kind of mapping layer.
I've generally gone with option 3. As a bit of sanity, I always have a type column in the CUSTOM_PROPERTY table, which is repeated in the CUSTOM_PROPERTY_VALUE table. By adding a superkey to the CUSTOM_PROPERTY table of <Primary Key, Type>, you can then have a foreign key that references this (as well as the simpler foreign key to just the primary key). And finally, a check constraint that ensures that only the relevant column in CUSTOM_PROPERTY_VALUE is not null, based on this type column.
In this way, you know that if someone has defined a CUSTOM_PROPERTY, say, Tine count, of type int, that you're actually only ever going to find an int stored in the CUSTOM_PROPERTY_VALUE table, for all instances of this property.
Edit
If you need it to reference multiple entity tables, then it can get more complex, especially if you want full referential integrity. For instance (with two distinct entity types in the database):
create table dbo.Entities (
EntityID uniqueidentifier not null,
EntityType varchar(10) not null,
constraint PK_Entities PRIMARY KEY (EntityID),
constraint CK_Entities_KnownTypes CHECK (
EntityType in ('Foo','Bar')),
constraint UQ_Entities_KnownTypes UNIQUE (EntityID,EntityType)
)
go
create table dbo.Foos (
EntityID uniqueidentifier not null,
EntityType as CAST('Foo' as varchar(10)) persisted,
FooFixedProperty1 int not null,
FooFixedProperty2 varchar(150) not null,
constraint PK_Foos PRIMARY KEY (EntityID),
constraint FK_Foos_Entities FOREIGN KEY (EntityID) references dbo.Entities (EntityID) on delete cascade,
constraint FK_Foos_Entities_Type FOREIGN KEY (EntityID,EntityType) references dbo.Entities (EntityID,EntityType)
)
go
create table dbo.Bars (
EntityID uniqueidentifier not null,
EntityType as CAST('Bar' as varchar(10)) persisted,
BarFixedProperty1 float not null,
BarFixedProperty2 int not null,
constraint PK_Bars PRIMARY KEY (EntityID),
constraint FK_Bars_Entities FOREIGN KEY (EntityID) references dbo.Entities (EntityID) on delete cascade,
constraint FK_Bars_Entities_Type FOREIGN KEY (EntityID,EntityType) references dbo.Entities (EntityID,EntityType)
)
go
create table dbo.ExtendedProperties (
PropertyID uniqueidentifier not null,
PropertyName varchar(100) not null,
PropertyType int not null,
constraint PK_ExtendedProperties PRIMARY KEY (PropertyID),
constraint CK_ExtendedProperties CHECK (
PropertyType between 1 and 4), --Or make type a varchar, and change check to IN('int', 'float'), etc
constraint UQ_ExtendedProperty_Names UNIQUE (PropertyName),
constraint UQ_ExtendedProperties_Types UNIQUE (PropertyID,PropertyType)
)
go
create table dbo.PropertyValues (
EntityID uniqueidentifier not null,
PropertyID uniqueidentifier not null,
PropertyType int not null,
IntValue int null,
FloatValue float null,
DecimalValue decimal(15,2) null,
CharValue varchar(max) null,
EntityType varchar(10) not null,
constraint PK_PropertyValues PRIMARY KEY (EntityID,PropertyID),
constraint FK_PropertyValues_ExtendedProperties FOREIGN KEY (PropertyID) references dbo.ExtendedProperties (PropertyID) on delete cascade,
constraint FK_PropertyValues_ExtendedProperty_Types FOREIGN KEY (PropertyID,PropertyType) references dbo.ExtendedProperties (PropertyID,PropertyType),
constraint FK_PropertyValues_Entities FOREIGN KEY (EntityID) references dbo.Entities (EntityID) on delete cascade,
constraint FK_PropertyValues_Entitiy_Types FOREIGN KEY (EntityID,EntityType) references dbo.Entities (EntityID,EntityType),
constraint CK_PropertyValues_OfType CHECK (
(IntValue is null or PropertyType = 1) and
(FloatValue is null or PropertyType = 2) and
(DecimalValue is null or PropertyType = 3) and
(CharValue is null or PropertyType = 4)),
--Shoot for bonus points
FooID as CASE WHEN EntityType='Foo' THEN EntityID END persisted,
constraint FK_PropertyValues_Foos FOREIGN KEY (FooID) references dbo.Foos (EntityID),
BarID as CASE WHEN EntityType='Bar' THEN EntityID END persisted,
constraint FK_PropertyValues_Bars FOREIGN KEY (BarID) references dbo.Bars (EntityID)
)
go
--Now we wrap up inserts into the Foos, Bars and PropertyValues tables as either Stored Procs, or instead of triggers
--To get the proper additional columns and/or base tables populated
My inclination would be to store things as XML if the database supports that nicely, or else have a small number of different tables for different data types (try to format data so it will fit one of a small number of types--don't use one table for VARCHAR(15), another for VARCHAR(20), etc.) Something like #5, but with all tables pre-created, and everything shoehorned into the existing tables. Each row should hold a main-record ID, record-type indicator, and a piece of data. Set up an index based on record-type, subsorted by data, and it will be possible to query for particular field values (where RecType==19 and Data=='Fred'). Querying for records that match multiple field values would be harder, but such is life.

How to emulate tagged union in a database?

What is the best way to emulate Tagged union in databases?
I'm talking about something like this:
create table t1 {
vehicle_id INTEGER NOT NULL REFERENCES car(id) OR motor(id) -- not valid
...
}
where vehicle_id would be id in car table OR motor table, and it would know which.
(assume that motor and car tables have nothing in common0
Some people use a design called Polymorphic Associations to do this, allowing vehicle_id to contain a value that exists either in car or motor tables. Then add a vehicle_type that names the table which the given row in t1 references.
The trouble is that you can't declare a real SQL foreign key constraint if you do this. There's no support in SQL for a foreign key that has multiple reference targets. There are other problems, too, but the lack of referential integrity is already a deal-breaker.
A better design is to borrow a concept from OO design of a common supertype of both car and motor:
CREATE TABLE Identifiable (
id SERIAL PRIMARY KEY
);
Then make t1 reference this super-type table:
CREATE TABLE t1 (
vehicle_id INTEGER NOT NULL,
FOREIGN KEY (vehicle_id) REFERENCES identifiable(id)
...
);
And also make the sub-types reference their parent supertype. Note that the primary key of the sub-types is not auto-incrementing. The parent supertype takes care of allocating a new id value, and the children only reference that value.
CREATE TABLE car (
id INTEGER NOT NULL,
FOREIGN KEY (id) REFERENCES identifiable(id)
...
);
CREATE TABLE motor (
id INTEGER NOT NULL,
FOREIGN KEY (id) REFERENCES identifiable(id)
...
);
Now you can have true referential integrity, but also support multiple subtype tables with their own attributes.
The answer by #Quassnoi also shows a method to enforce disjoint subtypes. That is, you want to prevent both car and motor from referencing the same row in their parent supertype table. When I do this, I use a single-column primary key for Identifiable.id but also declare a UNIQUE key over Identifiable.(id, type). The foreign keys in car and motor can reference the two-column unique key instead of the primary key.
CREATE TABLE vehicle (type INT NOT NULL, id INT NOT NULL,
PRIMARY KEY (type, id)
)
CREATE TABLE car (type INT NOT NULL DEFAULT 1, id INT NOT NULL PRIMARY KEY,
CHECK(type = 1),
FOREIGN KEY (type, id) REFERENCES vehicle
)
CREATE TABLE motorcycle (type INT NOT NULL DEFAULT 2, id INT NOT NULL PRIMARY KEY,
CHECK(type = 2),
FOREIGN KEY (type, id) REFERENCES vehicle
)
CREATE TABLE t1 (
...
vehicle_type INT NOT NULL,
vehicle_id INT NOT NULL,
FOREIGN KEY (vehicle_type, vehicle_id) REFERENCES vehicle
...
)
I think the least-boilerplate solution is to use constraint and check.
For example, consider this ADT in Haskell:
data Shape = Circle {radius::Float} | Rectangle {width::Float, height::Float}
The equivalent in MySQL/MariaDB would be (tested on 10.5.11-MariaDB):
CREATE TABLE shape (
type ENUM('circle', 'rectangle') NOT NULL,
radius FLOAT,
width FLOAT,
height FLOAT,
CONSTRAINT constraint_circle CHECK
(type <> 'circle' OR radius IS NOT NULL),
CONSTRAINT constraint_rectangle CHECK
(type <> 'rectangle' OR (width IS NOT NULL AND height IS NOT NULL))
);
INSERT INTO shape(type, radius, width, height)
VALUES ('circle', 1, NULL, NULL); -- ok
INSERT INTO shape(type, radius, width, height)
VALUES ('circle', NULL, 1, NULL); -- error, constraint_circle violated
Note that the above uses type <> x OR y instead of type = x AND y. This is because the latter essentially means that all rows must have type of x, which defeats the purpose of tagged union.
Also, note that the solution above only check for required columns, but does not check for extraneous columns.
For example, you can insert a rectangle which has defined radius.
This can be easily mitigated by adding another condition for constraint_rectangle, namely radius is null.
However, I will not recommend doing so as it makes adding new type tedious.
For example, to add a new type triangle with one new column base, not only we will need to add a new constraint, but we also need to modify the existing constraints to ensure that their base is null.
I think you could model such a reference by using table inheritance in PostgreSQL.
If you really need to know where a row comes from in a Query, you could use a simple UNION ALL statment like (this possibility has nothing to do with table inheritance):
SELECT car.*, 'car' table_name
UNION ALL
SELECT motor.*, 'motor' table_name