Is it semantically correct to use a foreign key to indicate a range? - sql

I'm trying to create SQL tables to represent a series of codes used by a third-party API. So far, I have the following tables:
CREATE TABLE ApiMinorCodeRange (
    Id int NOT NULL IDENTITY(1, 1) PRIMARY KEY,
    FromMinorCode char(4) NOT NULL,
    ThruMinorCode char(4) NOT NULL
)
CREATE TABLE ApiCode (
    Id int NOT NULL IDENTITY(1, 1) PRIMARY KEY,
    ResponseCode char(1) NOT NULL,
    ResponseSubCode char(1) NOT NULL,
    ResponseSubSubCode char(1) NULL,
    MinorCodeRangeId int NULL REFERENCES ApiMinorCodeRange,
    Description nvarchar(500)
)
CREATE TABLE ApiMinorCode (
    Code char(4) NOT NULL PRIMARY KEY,
    Description nvarchar(500)
)
The problem is, FromMinorCode and ThruMinorCode can reference codes that don't exist. For example: a range can indicate "5000 - 5ZZZ", but MinorCode might only have entries defined for "5000 - 500A". New codes are added every few months, so the ApiMinorCodeRange table needs to reference the entire range defined in the specs.
I was planning to create foreign keys anyway and mark them as NOCHECK:
ALTER TABLE ApiMinorCodeRange ADD CONSTRAINT FK_FromMinorCode FOREIGN KEY ( FromMinorCode ) REFERENCES ApiMinorCode
ALTER TABLE ApiMinorCodeRange NOCHECK CONSTRAINT FK_FromMinorCode
ALTER TABLE ApiMinorCodeRange ADD CONSTRAINT FK_ThruMinorCode FOREIGN KEY ( ThruMinorCode ) REFERENCES ApiMinorCode
ALTER TABLE ApiMinorCodeRange NOCHECK CONSTRAINT FK_ThruMinorCode
Is this semantically correct?
Will SQL Server's query optimizer be OK with foreign keys that reference an imaginary row?
Should I create a dummy value "5ZZZ - Reserved for future use" instead of setting "NoCheck"?

You are trying to implement a business rule that will apply to codes that you have not yet seen. There is not necessarily a "right" way to do this.
Does a range relationship have to include valid codes? I don't see why. For instance, a set of encyclopedias (remember those?) labels each volume with a range, such as:
A-B
C
Sto-Zyg
I don't assume that "sto" is a valid entry in that volume. I do assume that "stochastic process" would be in the volume.
Why should your codes be different? More pertinently, the range in your case could (possibly) be '5' to '5ZZZ', even though '5' might not be a valid code.
And, your rules could end up extending beyond mere ranges. Perhaps some major code has all minor codes that start with "5" and end with "Z".
My conclusion for the ranges is that requiring a foreign key relationship isn't necessary.
That said, there is another problem that you might want to deal with. What prevents a code from being in multiple ranges? I suspect that you would need a trigger to enforce this rule.
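For example, in SQL Server a trigger along these lines could reject overlapping ranges (a rough sketch; the trigger name and overlap test are assumptions, and the char(4) codes compare as fixed-width strings):
CREATE TRIGGER trg_ApiMinorCodeRange_NoOverlap ON ApiMinorCodeRange
AFTER INSERT, UPDATE
AS
BEGIN
    IF EXISTS (
        SELECT 1
        FROM inserted i
        JOIN ApiMinorCodeRange r
            ON r.Id <> i.Id
           AND i.FromMinorCode <= r.ThruMinorCode
           AND r.FromMinorCode <= i.ThruMinorCode
    )
    BEGIN
        -- two ranges overlap when each one starts before the other ends
        RAISERROR('Minor code ranges may not overlap.', 16, 1);
        ROLLBACK TRANSACTION;
    END
END;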

SQL How to not insert duplicated values

I'm trying to create a procedure that inserts data into a table of registrations, but I don't want to repeat the second parameter. This is the table:
CREATE TABLE Inscription
(
idClass INT references tb_class,
idStudent INT references tb_student
)
The idea is that a student (idStudent) can register for various classes, but not for the same class (idClass) twice. I tried to add a unique constraint on the idStudent column, but that only allows a student to register for one single class.
I always suggest that all tables have a numeric primary key. In addition, your foreign key references are not correct. And what you want to do is add a unique constraint.
The exact syntax depends on the database. The following is for SQL Server:
CREATE TABLE Inscriptions (
    idInscription int identity(1, 1) primary key,
    idClass int references tb_classes(idClass),
    idStudent int references tb_students(idStudent),
    unique (idClass, idStudent)
);
Notice that I name the tables as the plural of the entity, but the id using the singular.
The Inscriptions table probably wants other columns as well, such as the date/time of the inscription, the method, and other related information.
You are looking to create a constraint on your table that includes both columns idClass and idStudent.
Once that constraint is created, an attempt to insert duplicate class/student will result in an error being raised.
As your table does not seem to include a primary key, you would be better off making that constraint your primary key.
NB: you did not say which RDBMS you are using, so I cannot give you the exact syntax to use...
Your unique key needs to encompass both idClass and idStudent, so any particular combination cannot repeat itself.
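For illustration, in SQL Server the constraint described in the answers above could be added like this (a sketch assuming the Inscription table from the question; the constraint names are made up):
-- make the (idClass, idStudent) pair unique so a student cannot register for the same class twice
ALTER TABLE Inscription
    ADD CONSTRAINT UQ_Inscription_Class_Student UNIQUE (idClass, idStudent);
-- or, since the table has no primary key yet, make the pair the key instead
-- (both columns must be NOT NULL for this):
-- ALTER TABLE Inscription
--     ADD CONSTRAINT PK_Inscription PRIMARY KEY (idClass, idStudent);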

Foreign Key column mapped to multiple primary keys

I have a database which has three tables
Messages - PK = MessageId
Drafts - PK = DraftId
History - FK = RelatedItemId
The History table has a single foreign Key [RelatedItemId] which maps to one of the two Primary keys in Messages and Drafts.
Is there a name for this relationship?
Is it just bad design?
Is there a better way to design this relationship?
Here are the CREATE TABLE statements for this question:
CREATE TABLE [dbo].[History](
[HistoryId] [uniqueidentifier] NOT NULL,
[RelatedItemId] [uniqueidentifier] NULL,
CONSTRAINT [PK_History] PRIMARY KEY CLUSTERED ( [HistoryId] ASC )
)
CREATE TABLE [dbo].[Messages](
[MessageId] [uniqueidentifier] NOT NULL,
CONSTRAINT [PK_Messages] PRIMARY KEY CLUSTERED ( [MessageId] ASC )
)
CREATE TABLE [dbo].[Drafts](
[DraftId] [uniqueidentifier] NOT NULL,
CONSTRAINT [PK_Drafts] PRIMARY KEY CLUSTERED ( [DraftId] ASC )
)
In short, the solution you have used is called:
Polymorphic Association
Objective: Reference Multiple Parents
Resulting anti-pattern: using a dual-purpose foreign key, violating first normal form (atomicity) and losing referential integrity
Solution: Simplify the Relationship
You can find more information about the problem by searching for that anti-pattern name. By the way, creating a common super-table will help you here:
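For instance, a rough sketch of what a common super-table could look like (the Items table and the extra foreign keys are additions, not your current schema, and these are not migration scripts):
CREATE TABLE [dbo].[Items](
[ItemId] [uniqueidentifier] NOT NULL,
CONSTRAINT [PK_Items] PRIMARY KEY CLUSTERED ( [ItemId] ASC )
)
CREATE TABLE [dbo].[Messages](
[MessageId] [uniqueidentifier] NOT NULL,
CONSTRAINT [PK_Messages] PRIMARY KEY CLUSTERED ( [MessageId] ASC ),
CONSTRAINT [FK_Messages_Items] FOREIGN KEY ([MessageId]) REFERENCES [dbo].[Items] ([ItemId])
)
CREATE TABLE [dbo].[Drafts](
[DraftId] [uniqueidentifier] NOT NULL,
CONSTRAINT [PK_Drafts] PRIMARY KEY CLUSTERED ( [DraftId] ASC ),
CONSTRAINT [FK_Drafts_Items] FOREIGN KEY ([DraftId]) REFERENCES [dbo].[Items] ([ItemId])
)
CREATE TABLE [dbo].[History](
[HistoryId] [uniqueidentifier] NOT NULL,
[RelatedItemId] [uniqueidentifier] NULL,
CONSTRAINT [PK_History] PRIMARY KEY CLUSTERED ( [HistoryId] ASC ),
CONSTRAINT [FK_History_Items] FOREIGN KEY ([RelatedItemId]) REFERENCES [dbo].[Items] ([ItemId])
)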
Is there a name for this relationship?
There is no standard name that I'm aware of, but I've heard people using the term "generic FKs" or even "inner-platform effect".
Is it just bad design?
Yes.
The reason: it prevents you from declaring a FOREIGN KEY, and therefore prevents the DBMS from enforcing referential integrity directly. You must instead enforce it through imperative code, which is surprisingly difficult.
Is there a better way to design this relationship?
Yes.
Create separate FOREIGN KEY for each referenced table. Make them NULL-able, but make sure exactly one of them is non-NULL, through a CHECK constraint.
Alternatively, take a look at inheritance.
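A minimal sketch of the separate-FK-plus-CHECK approach for the History table (the constraint names are assumptions):
CREATE TABLE [dbo].[History](
[HistoryId] [uniqueidentifier] NOT NULL,
[MessageId] [uniqueidentifier] NULL,
[DraftId] [uniqueidentifier] NULL,
CONSTRAINT [PK_History] PRIMARY KEY CLUSTERED ( [HistoryId] ASC ),
CONSTRAINT [FK_History_Messages] FOREIGN KEY ([MessageId]) REFERENCES [dbo].[Messages] ([MessageId]),
CONSTRAINT [FK_History_Drafts] FOREIGN KEY ([DraftId]) REFERENCES [dbo].[Drafts] ([DraftId]),
-- exactly one of the two references must be set
CONSTRAINT [CK_History_ExactlyOneParent] CHECK (
    ([MessageId] IS NOT NULL AND [DraftId] IS NULL) OR
    ([MessageId] IS NULL AND [DraftId] IS NOT NULL))
)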
The best practice I have found is to create a function that returns whether the passed-in value exists in either of your Messages and Drafts PK columns. You can then add a check constraint on the column in the History table that calls this function, so an insert only succeeds if the value exists.
Example code (SQL Server syntax):
CREATE FUNCTION dbo.is_related_there ( @value uniqueidentifier )
RETURNS tinyint
AS
BEGIN
    -- NULL is allowed because History.RelatedItemId is NULL-able
    IF @value IS NULL
        RETURN 1;
    IF EXISTS (SELECT 1 FROM dbo.Drafts WHERE DraftId = @value)
       OR EXISTS (SELECT 1 FROM dbo.Messages WHERE MessageId = @value)
        RETURN 1;
    RETURN 0;
END;
GO
ALTER TABLE dbo.History ADD CONSTRAINT
    CK_HistoryExists CHECK (dbo.is_related_there(RelatedItemId) = 1)
Hope that runs and helps lol

Correct way to create a table that references variables from another table

I have these relationships:
User(uid:integer,uname:varchar), key is uid
Recipe(rid:integer,content:text), key is rid
Rating(rid:integer, uid:integer, rating:integer) , key is (uid,rid).
I built the table in the following way:
CREATE TABLE User(
uid INTEGER PRIMARY KEY ,
uname VARCHAR NOT NULL
);
CREATE TABLE Recipes(
rid INTEGER PRIMARY KEY,
content VARCHAR NOT NULL
);
Now for the Rating table: I want it to be impossible to insert a uid\rid that does not exist in User\Recipe.
My question is: which of the following is the correct way to do it? Or please suggest the correct way if none of them are correct. Moreover, I would really appreciate if someone could explain to me what is the difference between the two.
First:
CREATE TABLE Rating(
rid INTEGER,
uid INTEGER,
rating INTEGER CHECK (0<=rating and rating<=5) NOT NULL,
PRIMARY KEY(rid,uid),
FOREIGN KEY (rid) REFERENCES Recipes,
FOREIGN KEY (uid) REFERENCES User
);
Second:
CREATE TABLE Rating(
rid INTEGER REFERENCES Recipes,
uid INTEGER REFERENCES User,
rating INTEGER CHECK (0<=rating and rating<=5) NOT NULL,
PRIMARY KEY(rid,uid)
);
EDIT:
I think User is problematic as a name for a table so ignore the name.
Technically both versions are the same in Postgres. The docs for CREATE TABLE say so quite clearly:
There are two ways to define constraints: table constraints and column constraints. A column constraint is defined as part of a column definition. A table constraint definition is not tied to a particular column, and it can encompass more than one column. Every column constraint can also be written as a table constraint; a column constraint is only a notational convenience for use when the constraint only affects one column.
So when you have to reference a compound key, a table constraint is the only way to go.
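For example (hypothetical tables, not from the question), a two-column foreign key can only be written as a table constraint:
CREATE TABLE order_lines (
    order_id integer,
    line_no  integer,
    PRIMARY KEY (order_id, line_no)
);
CREATE TABLE shipments (
    shipment_id serial PRIMARY KEY,
    order_id    integer,
    line_no     integer,
    -- a compound foreign key has to be declared at table level
    FOREIGN KEY (order_id, line_no) REFERENCES order_lines (order_id, line_no)
);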
But for every other case I prefer the shortest and most concise form where I don't need to give names to stuff I'm not really interested in. So my version would be like this:
CREATE TABLE usr(
uid SERIAL PRIMARY KEY ,
uname TEXT NOT NULL
);
CREATE TABLE recipes(
rid SERIAL PRIMARY KEY,
content TEXT NOT NULL
);
CREATE TABLE rating(
rid INTEGER REFERENCES recipes,
uid INTEGER REFERENCES usr,
rating INTEGER NOT NULL CHECK (rating between 0 and 5),
PRIMARY KEY(rid,uid)
);
This is a SQL Server based solution, but the concept applies to most any RDBMS.
Like so:
CREATE TABLE Rating (
rid int NOT NULL,
uid int NOT NULL,
CONSTRAINT PK_Rating PRIMARY KEY (rid, uid)
);
ALTER TABLE Rating ADD CONSTRAINT FK_Rating_Recipes FOREIGN KEY(rid)
REFERENCES Recipes (rid);
ALTER TABLE Rating ADD CONSTRAINT FK_Rating_User FOREIGN KEY(uid)
REFERENCES [User] (uid);
This ensures that the values inside of Rating are only valid values inside of both the Users table and the Recipes table. Please note, in the Rating table I didn't include the other fields you had, just add those.
Assume in the users table you have 3 users: Joe, Bob and Bill respective ID's 1,2,3. And in the recipes table you had cookies, chicken pot pie, and pumpkin pie respective ID's are 1,2,3. Then inserting into Rating table will only allow for these values, the minute you enter 4 for a RID or a UID SQL throws an error and does not commit the transaction.
Try it yourself, it's a good learning experience.
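For instance, assuming the three users and three recipes described above already exist:
INSERT INTO Rating (rid, uid) VALUES (1, 2);  -- OK: recipe 1 (cookies) rated by user 2 (Bob)
INSERT INTO Rating (rid, uid) VALUES (4, 1);  -- fails: no recipe with rid = 4, so FK_Rating_Recipes is violated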
In PostgreSQL a correct way to implement these tables is (note that user is a reserved word, so the table name has to be quoted):
CREATE SEQUENCE uid_seq;
CREATE SEQUENCE rid_seq;
CREATE TABLE "User"(
uid INTEGER PRIMARY KEY DEFAULT nextval('uid_seq'),
uname VARCHAR NOT NULL
);
CREATE TABLE Recipes(
rid INTEGER PRIMARY KEY DEFAULT nextval('rid_seq'),
content VARCHAR NOT NULL
);
CREATE TABLE Rating(
rid INTEGER NOT NULL REFERENCES Recipes(rid),
uid INTEGER NOT NULL REFERENCES "User"(uid),
rating INTEGER CHECK (0<=rating and rating<=5) NOT NULL,
PRIMARY KEY(rid,uid)
);
There is no real difference between the two options that you have written.
A simple (i.e. single-column) foreign key may be declared in-line with the column declaration or not. It's merely a question of style. A third way would be to omit foreign key declarations from the CREATE TABLE entirely and add them later using ALTER TABLE statements; done in a transaction (presumably along with all the other tables, constraints, etc.), the table would never exist without its required constraints. Choose whichever you think is easiest for a human coder to read and understand, i.e. easiest to maintain.
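To illustrate that third way, here is a sketch in Postgres syntax, reusing the usr and recipes tables from the earlier answer:
BEGIN;
CREATE TABLE rating(
    rid INTEGER NOT NULL,
    uid INTEGER NOT NULL,
    rating INTEGER NOT NULL CHECK (rating BETWEEN 0 AND 5),
    PRIMARY KEY (rid, uid)
);
-- the constraints are added separately, but inside the same transaction,
-- so the table is never visible without them
ALTER TABLE rating ADD FOREIGN KEY (rid) REFERENCES recipes;
ALTER TABLE rating ADD FOREIGN KEY (uid) REFERENCES usr;
COMMIT;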
EDIT: I overlooked the REFERENCES clause in the second version when I wrote my original answer. The two versions are identical in terms of referential integrity; they are just two syntaxes for the same thing.

Complex Foreign Key Constraint in SQL

Is there a way to define a constraint using SQL Server 2005 to not only ensure a foreign key exists in another table, but also meets a certain criteria?
For example, say I have two tables:
Table A
--------
Id - int
FK_BId - int
Table B
--------
Id - int
Name - string
SomeBoolean - bit
Can I define a constraint that says FK_BId must point to a record in Table B, AND that record in Table B must have SomeBoolean = true? Thanks in advance for any help you can provide.
You can enforce the business rule by creating a composite key on (Id, SomeBoolean) in table B, referencing it from table A, and adding a CHECK constraint on FK_BSomeBoolean to ensure it is always TRUE. BTW I'd recommend avoiding BIT and instead using CHAR(1) with domain checking, e.g.
CHECK (SomeBoolean IN ('F', 'T'))
The table structure could look like this:
CREATE TABLE B
(
Id INTEGER NOT NULL UNIQUE, -- candidate key 1
Name VARCHAR(20) NOT NULL UNIQUE, -- candidate key 2
SomeBoolean CHAR(1) DEFAULT 'F' NOT NULL
CHECK (SomeBoolean IN ('F', 'T')),
UNIQUE (Id, SomeBoolean) -- superkey
);
CREATE TABLE A
(
Id INTEGER NOT NULL UNIQUE,
FK_BId INTEGER NOT NULL,
FK_BSomeBoolean CHAR(1) DEFAULT 'T' NOT NULL
CHECK (FK_BSomeBoolean = 'T'),
FOREIGN KEY (FK_BId, FK_BSomeBoolean)
REFERENCES B (Id, SomeBoolean)
);
I think what you're looking for is out of the scope of foreign keys, but you could do the check in triggers, stored procedures, or your code.
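For example, a rough trigger sketch in SQL Server syntax (trigger name assumed; tables A and B as in the question, with SomeBoolean kept as a bit):
CREATE TRIGGER trg_A_RequireSomeBoolean ON A
AFTER INSERT, UPDATE
AS
BEGIN
    IF EXISTS (
        SELECT 1
        FROM inserted i
        WHERE NOT EXISTS (
            SELECT 1 FROM B
            WHERE B.Id = i.FK_BId AND B.SomeBoolean = 1
        )
    )
    BEGIN
        RAISERROR('FK_BId must reference a row in B with SomeBoolean = 1.', 16, 1);
        ROLLBACK TRANSACTION;
    END
END;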
If it is possible to do, I'd say that you would make it a compound foreign key, using ID and SomeBoolean, but I don't think it actually cares what the value is.
In some databases (I can't check SQL Server) you can add a check constraint that references other tables.
ALTER TABLE a ADD CONSTRAINT fancy_fk
CHECK (FK_BId IN (SELECT Id FROM b WHERE SomeBoolean));
I don’t believe this behavior is standard.

Where do you store ad-hoc properties in a relational database?

Let's say you have a relational DB table like INVENTORY_ITEM. It's generic in the sense that anything that's in inventory needs a record here. Now let's say there are tons of different types of inventory and each different type might have unique fields that they want to keep track of (e.g. forks might track the number of tines, but refrigerators wouldn't have a use for that field). These fields must be user-definable per category type.
There are many ways to solve this:
1. Use ALTER TABLE statements to actually add nullable columns on the fly (yuk).
2. Have two tables with a one-to-one mapping, INVENTORY_ITEM and INVENTORY_ITEM_USER, and use ALTER TABLE statements to add and remove nullable columns from the latter table on the fly (a bit nicer).
3. Add a CUSTOM_PROPERTY table and a CUSTOM_PROPERTY_VALUE table; add/remove rows in CUSTOM_PROPERTY when the user adds and removes property definitions, and store the values in the latter table. This is nice and generic, but the performance would suffer. If you had an average of 20 values per item, the number of rows in CUSTOM_PROPERTY_VALUE grows at 20 times the rate, and you still need to include columns in CUSTOM_PROPERTY_VALUE for every different data type that you might want to store.
4. Have one big varchar(MAX) field on INVENTORY_ITEM to store custom properties as XML.
5. Have individual tables for each category type that hang off the INVENTORY_ITEM table; these get created/destroyed on the fly when the user creates inventory types, and the columns get updated when they add/remove properties to those types. Seems messy though.
Is there a best-practice for this? It seems to me that option 4 is clean, but doesn't allow you to easily search by the metadata. I've used a variant of 3 before, but only on a table that had a really small number of rows, so performance wasn't an issue. It always seemed to me that 2 was a good idea, but it doesn't fit well with auto-generated entity frameworks, so you'd have to exclude the custom properties table from the entity generation and just write your own custom data access code to handle it.
Am I missing any alternatives? Is there a way for SQL Server to "look into" XML data in a column so it could actually do stuff with option 4 now?
I am using an xml type column for this kind of situation...
http://msdn.microsoft.com/en-us/library/ms189887.aspx
Before xml we had to use option 3, which in my point of view is still a good way to do it, especially if you have a Data Access Layer that is able to handle the type conversion properly for you. We stored everything as string values and defined a column that held the original data type for the conversion.
Options 1 and 2 are a no-go. Don't change the database schema in production on the fly.
Option 5 could be done in a separate database... But still no control over the schema and the user would need the rights to create tables etc.
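For example, a rough sketch of option 4 with a typed xml column (hypothetical column names) that SQL Server can query directly:
CREATE TABLE INVENTORY_ITEM (
    ItemId int IDENTITY(1, 1) PRIMARY KEY,
    Name nvarchar(100) NOT NULL,
    CustomProperties xml NULL
);
INSERT INTO INVENTORY_ITEM (Name, CustomProperties)
VALUES (N'Fork', N'<props><tines>4</tines></props>');
-- "look into" the XML: find all items with more than 3 tines
SELECT ItemId, Name
FROM INVENTORY_ITEM
WHERE CustomProperties.value('(/props/tines)[1]', 'int') > 3;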
Definitely option 3.
Sometimes option 4, if you have a very good reason to do so.
Do not ever dynamically modify the database structure to accommodate incoming data. One day something could break and damage your database. It is simply not done this way.
3 or 4 are the only ones I would consider - you don't want to be changing the schema on the fly, especially if you're using some kind of mapping layer.
I've generally gone with option 3. As a bit of sanity, I always have a type column in the CUSTOM_PROPERTY table, which is repeated in the CUSTOM_PROPERTY_VALUE table. By adding a superkey to the CUSTOM_PROPERTY table of <Primary Key, Type>, you can then have a foreign key that references this (as well as the simpler foreign key to just the primary key). And finally, a check constraint that ensures that only the relevant column in CUSTOM_PROPERTY_VALUE is not null, based on this type column.
In this way, you know that if someone has defined a CUSTOM_PROPERTY, say, Tine count, of type int, that you're actually only ever going to find an int stored in the CUSTOM_PROPERTY_VALUE table, for all instances of this property.
Edit
If you need it to reference multiple entity tables, then it can get more complex, especially if you want full referential integrity. For instance (with two distinct entity types in the database):
create table dbo.Entities (
EntityID uniqueidentifier not null,
EntityType varchar(10) not null,
constraint PK_Entities PRIMARY KEY (EntityID),
constraint CK_Entities_KnownTypes CHECK (
EntityType in ('Foo','Bar')),
constraint UQ_Entities_KnownTypes UNIQUE (EntityID,EntityType)
)
go
create table dbo.Foos (
EntityID uniqueidentifier not null,
EntityType as CAST('Foo' as varchar(10)) persisted,
FooFixedProperty1 int not null,
FooFixedProperty2 varchar(150) not null,
constraint PK_Foos PRIMARY KEY (EntityID),
constraint FK_Foos_Entities FOREIGN KEY (EntityID) references dbo.Entities (EntityID) on delete cascade,
constraint FK_Foos_Entities_Type FOREIGN KEY (EntityID,EntityType) references dbo.Entities (EntityID,EntityType)
)
go
create table dbo.Bars (
EntityID uniqueidentifier not null,
EntityType as CAST('Bar' as varchar(10)) persisted,
BarFixedProperty1 float not null,
BarFixedProperty2 int not null,
constraint PK_Bars PRIMARY KEY (EntityID),
constraint FK_Bars_Entities FOREIGN KEY (EntityID) references dbo.Entities (EntityID) on delete cascade,
constraint FK_Bars_Entities_Type FOREIGN KEY (EntityID,EntityType) references dbo.Entities (EntityID,EntityType)
)
go
create table dbo.ExtendedProperties (
PropertyID uniqueidentifier not null,
PropertyName varchar(100) not null,
PropertyType int not null,
constraint PK_ExtendedProperties PRIMARY KEY (PropertyID),
constraint CK_ExtendedProperties CHECK (
PropertyType between 1 and 4), --Or make type a varchar, and change check to IN('int', 'float'), etc
constraint UQ_ExtendedProperty_Names UNIQUE (PropertyName),
constraint UQ_ExtendedProperties_Types UNIQUE (PropertyID,PropertyType)
)
go
create table dbo.PropertyValues (
EntityID uniqueidentifier not null,
PropertyID uniqueidentifier not null,
PropertyType int not null,
IntValue int null,
FloatValue float null,
DecimalValue decimal(15,2) null,
CharValue varchar(max) null,
EntityType varchar(10) not null,
constraint PK_PropertyValues PRIMARY KEY (EntityID,PropertyID),
constraint FK_PropertyValues_ExtendedProperties FOREIGN KEY (PropertyID) references dbo.ExtendedProperties (PropertyID) on delete cascade,
constraint FK_PropertyValues_ExtendedProperty_Types FOREIGN KEY (PropertyID,PropertyType) references dbo.ExtendedProperties (PropertyID,PropertyType),
constraint FK_PropertyValues_Entities FOREIGN KEY (EntityID) references dbo.Entities (EntityID) on delete cascade,
constraint FK_PropertyValues_Entitiy_Types FOREIGN KEY (EntityID,EntityType) references dbo.Entities (EntityID,EntityType),
constraint CK_PropertyValues_OfType CHECK (
(IntValue is null or PropertyType = 1) and
(FloatValue is null or PropertyType = 2) and
(DecimalValue is null or PropertyType = 3) and
(CharValue is null or PropertyType = 4)),
--Shoot for bonus points
FooID as CASE WHEN EntityType='Foo' THEN EntityID END persisted,
constraint FK_PropertyValues_Foos FOREIGN KEY (FooID) references dbo.Foos (EntityID),
BarID as CASE WHEN EntityType='Bar' THEN EntityID END persisted,
constraint FK_PropertyValues_Bars FOREIGN KEY (BarID) references dbo.Bars (EntityID)
)
go
--Now we wrap up inserts into the Foos, Bars and PropertyValues tables as either Stored Procs, or instead of triggers
--To get the proper additional columns and/or base tables populated
My inclination would be to store things as XML if the database supports that nicely, or else have a small number of different tables for different data types (try to format data so it will fit one of a small number of types--don't use one table for VARCHAR(15), another for VARCHAR(20), etc.) Something like #5, but with all tables pre-created, and everything shoehorned into the existing tables. Each row should hold a main-record ID, record-type indicator, and a piece of data. Set up an index based on record-type, subsorted by data, and it will be possible to query for particular field values (where RecType==19 and Data=='Fred'). Querying for records that match multiple field values would be harder, but such is life.
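A rough sketch of that idea (hypothetical names; you would have one such table per broad data type):
CREATE TABLE ItemPropertyString (
    ItemId int NOT NULL,
    RecType int NOT NULL,         -- which user-defined property this row holds
    Data varchar(255) NOT NULL,
    PRIMARY KEY (ItemId, RecType)
);
CREATE INDEX IX_ItemPropertyString_Type_Data
    ON ItemPropertyString (RecType, Data);
-- find items whose property 19 has the value 'Fred'
SELECT ItemId
FROM ItemPropertyString
WHERE RecType = 19 AND Data = 'Fred';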