I have a database which has three tables
Messages - PK = MessageId
Drafts - PK = DraftId
History - FK = RelatedItemId
The History table has a single foreign key, [RelatedItemId], which maps to one of the two primary keys in Messages or Drafts.
Is there a name for this relationship?
Is it just bad design?
Is there a better way to design this relationship?
Here are the CREATE TABLE statements for this question:
CREATE TABLE [dbo].[History](
[HistoryId] [uniqueidentifier] NOT NULL,
[RelatedItemId] [uniqueidentifier] NULL,
CONSTRAINT [PK_History] PRIMARY KEY CLUSTERED ( [HistoryId] ASC )
)
CREATE TABLE [dbo].[Messages](
[MessageId] [uniqueidentifier] NOT NULL,
CONSTRAINT [PK_Messages] PRIMARY KEY CLUSTERED ( [MessageId] ASC )
)
CREATE TABLE [dbo].[Drafts](
[DraftId] [uniqueidentifier] NOT NULL,
CONSTRAINT [PK_Drafts] PRIMARY KEY CLUSTERED ( [DraftId] ASC )
)
In short, the solution you have used is called:
Polymorphic Association
Objective: Reference Multiple Parents
Resulting anti-pattern: a dual-purpose foreign key, violating first normal form (the values are not atomic) and losing referential integrity
Solution: Simplify the Relationship
BTW, creating a common super-table will help you:
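A rough sketch of that super-table design (the Items table and the constraint names are illustrative, not part of the original schema):
CREATE TABLE [dbo].[Items](
    [ItemId] [uniqueidentifier] NOT NULL,
    CONSTRAINT [PK_Items] PRIMARY KEY CLUSTERED ( [ItemId] ASC )
)
-- Every Message and Draft also gets a row in Items, sharing its key:
ALTER TABLE [dbo].[Messages] ADD CONSTRAINT [FK_Messages_Items]
    FOREIGN KEY ([MessageId]) REFERENCES [dbo].[Items] ([ItemId])
ALTER TABLE [dbo].[Drafts] ADD CONSTRAINT [FK_Drafts_Items]
    FOREIGN KEY ([DraftId]) REFERENCES [dbo].[Items] ([ItemId])
-- History now points at the super-table, so the DBMS enforces integrity:
ALTER TABLE [dbo].[History] ADD CONSTRAINT [FK_History_Items]
    FOREIGN KEY ([RelatedItemId]) REFERENCES [dbo].[Items] ([ItemId])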
Is there a name for this relationship?
There is no standard name that I'm aware of, but I've heard people using the term "generic FKs" or even "inner-platform effect".
Is it just bad design?
Yes.
The reason: it prevents you from declaring a FOREIGN KEY, and therefore prevents the DBMS from enforcing referential integrity directly. You must instead enforce it through imperative code, which is surprisingly difficult.
Is there a better way to design this relationship?
Yes.
Create a separate FOREIGN KEY column for each referenced table. Make them NULL-able, but ensure exactly one of them is non-NULL through a CHECK constraint.
Alternatively, take a look at inheritance.
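For instance, a minimal sketch of that separate-FK design (the revised History layout and constraint names are illustrative):
CREATE TABLE [dbo].[History](
    [HistoryId] [uniqueidentifier] NOT NULL,
    [MessageId] [uniqueidentifier] NULL,
    [DraftId] [uniqueidentifier] NULL,
    CONSTRAINT [PK_History] PRIMARY KEY CLUSTERED ( [HistoryId] ASC ),
    CONSTRAINT [FK_History_Messages] FOREIGN KEY ([MessageId]) REFERENCES [dbo].[Messages] ([MessageId]),
    CONSTRAINT [FK_History_Drafts] FOREIGN KEY ([DraftId]) REFERENCES [dbo].[Drafts] ([DraftId]),
    -- Exactly one parent reference must be set:
    CONSTRAINT [CK_History_OneParent] CHECK (
        ([MessageId] IS NOT NULL AND [DraftId] IS NULL) OR
        ([MessageId] IS NULL AND [DraftId] IS NOT NULL))
)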
The best practice I have found is to create a function that returns whether the passed-in value exists in either the Messages or the Drafts PK column. You can then add a CHECK constraint on the History column that calls this function, so a row can only be inserted if it passes (i.e. the value exists).
Example code (untested):
CREATE FUNCTION dbo.is_related_there ( @value uniqueidentifier )
RETURNS TINYINT
AS
BEGIN
    -- Return 1 if the value exists as a PK in either Drafts or Messages
    IF EXISTS (SELECT 1 FROM dbo.Drafts WHERE DraftId = @value)
        OR EXISTS (SELECT 1 FROM dbo.Messages WHERE MessageId = @value)
        RETURN 1;
    RETURN 0;
END;
-- Note: this CHECK fires only when History rows are inserted or updated;
-- deleting a Message or Draft afterwards will not be caught.
ALTER TABLE History ADD CONSTRAINT
CK_HistoryExists CHECK (dbo.is_related_there(RelatedItemId) = 1)
Hope that runs and helps.
I'm trying to create a many-to-many relationship between a hypertable named 'measurement_ms' and a table named 'recipe'.
A measurement can have multiple recipes and a recipe can be connected to multiple measurements.
DROP TABLE IF EXISTS measurement_ms;
CREATE TABLE IF NOT EXISTS measurement_ms
(
id SERIAL,
value VARCHAR(255) NULL,
timestamp TIMESTAMP(6) NOT NULL,
machine_id INT NOT NULL,
measurement_type_id INT NOT NULL,
point_of_measurement_id INT NOT NULL,
FOREIGN KEY (machine_id) REFERENCES machine (id),
FOREIGN KEY (measurement_type_id) REFERENCES measurement_type (id),
FOREIGN KEY (point_of_measurement_id) REFERENCES point_of_measurement (id),
PRIMARY KEY (id, timestamp)
);
CREATE INDEX ON measurement_ms (machine_id, timestamp ASC);
CREATE INDEX ON measurement_ms (measurement_type_id, timestamp ASC);
CREATE INDEX ON measurement_ms (point_of_measurement_id, timestamp ASC);
-- --------------------------------------------------------------------------
-- Create timescale hypertable
-- --------------------------------------------------------------------------
SELECT create_hypertable('measurement_ms', 'timestamp', chunk_time_interval => interval '1 day');
DROP TABLE IF EXISTS recipe;
CREATE TABLE IF NOT EXISTS recipe
(
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
type VARCHAR(255) NOT NULL,
code INT NOT NULL
);
DROP TABLE IF EXISTS measurement_recipe;
CREATE TABLE IF NOT EXISTS measurement_recipe
(
id SERIAL PRIMARY KEY,
measurement_id INT NOT NULL,
recipe_id INT NOT NULL,
FOREIGN KEY (recipe_id) REFERENCES recipe(id),
FOREIGN KEY (measurement_id) REFERENCES measurement_ms(id)
);
CREATE INDEX fk_measurement_recipe_measurement ON measurement_recipe (measurement_id ASC);
CREATE INDEX fk_measurement_recipe_recipe ON measurement_recipe (recipe_id ASC);
The SQL script above shows the tables that I want to connect. This solution doesn't work because of a restriction imposed by Timescale: you can't reference hypertable columns with a foreign key.
Is there an alternative solution for creating a many-to-many relationship between these tables without actually using a many-to-many relation?
TimescaleDB is designed for time-series data, where each point is usually attached to some moment in time and contains all relevant data. It is common to link each point to metadata that is already present; doing the opposite, however, is uncommon. TimescaleDB optimises time-series workloads by chunking data, so DML and many SELECT queries don't need to touch all chunks. Maintaining a foreign key constraint into a hypertable, by contrast, might require touching all chunks on every insert into the referencing table measurement_recipe.
The use case in the question is time series with complex measurements. The proposed schema seems to be a normalisation of the original schema; I guess it simplifies querying the measurement data. I see two approaches to dealing with complex measurements:
Keep the data denormalised and store both recipes and measurements in the measurement table, in a single row or a few rows, with the help of complex structures such as JSONB or arrays. The drawback is that some queries will be difficult to write, and defining some continuous aggregates might not be possible.
Normalise as proposed in the question, but don't enforce foreign key constraints. You can still store the referencing values and use them to join the tables. Since the normalisation is done automatically as a step in transforming the incoming complex data, the constraints will hold as long as there are no bugs in the transformation code; such bugs can be prevented through regression testing. Still, with the normalised schema it will not be possible to use continuous aggregates, since joins are not allowed (maintaining continuous aggregates with joins might require touching all chunks).
My suggestion is to go for option 1 and try to be smart there. I don't have a concrete proposal, as it is unclear what the original JSON data structure is and what the queries are, but a rough sketch follows.
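A minimal sketch of option 1, assuming recipe references are embedded in the hypertable as a JSONB array (the recipes column is hypothetical):
-- Hypothetical: embed recipe references in each measurement row
ALTER TABLE measurement_ms ADD COLUMN recipes JSONB NULL;
-- A GIN index supports containment queries on the JSONB column
CREATE INDEX ON measurement_ms USING GIN (recipes);
-- Find measurements linked to recipe 42
SELECT * FROM measurement_ms WHERE recipes @> '[{"recipe_id": 42}]';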
I have two tables on a SQL database:
MODEL (ID_MODEL, MODEL, ID_manu, ID_CLASS) and CLASS ( ID_CLASS, CLASS)
They are linked using the ID_CLASS.
The relationship's delete rule is set to None, so it's supposed to give an error when I try to delete a row from CLASS if its ID_CLASS is being used in the MODEL table.
But it's being ignored: the row is deleted from the CLASS table, and the corresponding row in the MODEL table disappears from the DataGridView but is kept in the table.
The ID_CLASS column on MODEL is NOT NULL, as you can see below:
CREATE TABLE [dbo].[MODEL] (
[ID_MODEL] INT IDENTITY (1, 1) NOT NULL,
[MODEL] VARCHAR (50) NOT NULL,
[ID_manu] INT NOT NULL,
[ID_CLASS] INT NOT NULL,
PRIMARY KEY CLUSTERED ([ID_MODEL] ASC),
UNIQUE NONCLUSTERED ([MODEL] ASC)
);
Even after deleting and recreating the relationship, it's still happening.
There are also two other tables with a similar configuration (same rule and data type) where it works as expected: an error is raised every time the delete query is run.
You have to add a foreign key to dbo.Model:
alter table dbo.Model
add constraint FK_Model_Class foreign key (ID_Class) references dbo.Class(ID_Class);
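With the constraint in place, a delete that would orphan MODEL rows now fails; for example (the ID value here is hypothetical):
-- Raises a foreign key violation (error 547) if any MODEL row references it
DELETE FROM dbo.Class WHERE ID_CLASS = 1;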
I'm trying to create SQL tables to represent a series of codes used by a third-party API. So far, I have the following tables:
CREATE TABLE ApiMinorCodeRange (
    Id int NOT NULL IDENTITY(1, 1) PRIMARY KEY,
    FromMinorCode char(4) NOT NULL,
    ThruMinorCode char(4) NOT NULL
)
CREATE TABLE ApiCode (
    Id int NOT NULL IDENTITY(1, 1) PRIMARY KEY,
    ResponseCode char(1) NOT NULL,
    ResponseSubCode char(1) NOT NULL,
    ResponseSubSubCode char(1) NULL,
    MinorCodeRangeId int NULL REFERENCES ApiMinorCodeRange,
    Description nvarchar(500)
)
CREATE TABLE ApiMinorCode (
Code char(4) NOT NULL PRIMARY KEY,
Description nvarchar(500)
)
The problem is, FromMinorCode and ThruMinorCode can reference codes that don't exist. For example: a range can indicate "5000 - 5ZZZ", but MinorCode might only have entries defined for "5000 - 500A". New codes are added every few months, so the ApiMinorCodeRange table needs to reference the entire range defined in the specs.
I was planning to create foreign keys anyway and mark them as NOCHECK:
ALTER TABLE ApiMinorCodeRange ADD CONSTRAINT FK_FromMinorCode FOREIGN KEY ( FromMinorCode ) REFERENCES ApiMinorCode
ALTER TABLE ApiMinorCodeRange NOCHECK CONSTRAINT FK_FromMinorCode
ALTER TABLE ApiMinorCodeRange ADD CONSTRAINT FK_ThruMinorCode FOREIGN KEY ( ThruMinorCode ) REFERENCES ApiMinorCode
ALTER TABLE ApiMinorCodeRange NOCHECK CONSTRAINT FK_ThruMinorCode
Is this semantically correct?
Will SQL Server's query optimizer be OK with foreign keys that reference a nonexistent row?
Should I create a dummy value "5ZZZ - Reserved for future use" instead of setting "NoCheck"?
You are trying to implement a business rule that will apply to codes that you have not yet seen. There is not necessarily a "right" way to do this.
Does a range relationship have to include valid codes? I don't see why. For instance, a set of encyclopedias (remember those?) has ranges on each volume, such as:
A-B
C
Sto-Zyg
I don't assume that "sto" is a valid entry in that volume. I do assume that "stochastic process" would be in the volume.
Why should your codes be different? More pertinently, the range in your case could (possibly) be '5' to '5ZZZ', even though '5' might not be a valid code.
And, your rules could end up extending beyond mere ranges. Perhaps some major code has all minor codes that start with "5" and end with "Z".
My conclusion for the ranges is that requiring a foreign key relationship isn't necessary.
That said, there is another problem that you might want to deal with: what prevents a code from being in multiple ranges? I suspect you would need a trigger to enforce this rule, along the lines of the sketch below.
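A rough, untested sketch of such a trigger, using the table and column names from the question (the trigger name and error number are made up):
CREATE TRIGGER TR_ApiMinorCodeRange_NoOverlap
ON ApiMinorCodeRange
AFTER INSERT, UPDATE
AS
BEGIN
    -- Reject the change if any inserted/updated range overlaps another range
    IF EXISTS (
        SELECT 1
        FROM inserted i
        JOIN ApiMinorCodeRange r
            ON r.Id <> i.Id
            AND i.FromMinorCode <= r.ThruMinorCode
            AND r.FromMinorCode <= i.ThruMinorCode
    )
    BEGIN
        ROLLBACK TRANSACTION;
        THROW 50000, 'Minor code ranges must not overlap.', 1;
    END
END;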
I'm reading a book on EF4 and I came across this problem situation:
So I was wondering how to create this database so I can follow along with the example in the book.
How would I create these tables, using simple TSQL commands? Forget about creating the database, imagine it already exists.
You've been given the code. I want to share some information on why you might want to have two tables in a relationship like that.
When two tables have the same primary key and a foreign key relationship, they have a one-to-one relationship. So why not just put them in the same table? There are several reasons why you might split some information out into a separate table.
First, the information may be conceptually separate. If the information contained in the second table relates to a separate, specific concern, it is easier to work with when the data is in its own table. For instance, in your example they have separated out images even though they only intend to have one record per SKU. This gives you the flexibility to easily change the table later to a one-to-many relationship if you decide you need multiple images. It also means that when you query just for images, you don't have to touch the other (perhaps significantly larger) table.
Which brings us to the second reason to do this. Suppose you currently have a one-to-one relationship, but you know that a future release is already scheduled to turn it into a one-to-many relationship. In that case it's easier to design it as a separate table up front, so you won't break all your code when you move to the new structure. If I were planning to do this, I would go ahead and create a surrogate key as the PK and create a unique index on the FK (see the sketch after this answer). This way, when you move to the one-to-many relationship, all you have to do is drop the unique index and replace it with a regular index.
Another reason to separate out a one-to-one relationship is if the table is getting too wide. Sometimes you just have too much information about an entity to fit it easily within the maximum size a record can have. In this case, you tend to take the least-used fields (or those that conceptually fit together) and move them to a separate table.
Another reason to separate them is that although you have a one-to-one relationship, most records in the parent table may not need a record in the child table. So rather than having a lot of null values in the parent table, you split it out.
The code shown by the others assumes a character-based PK. If you want a relationship of this sort with an auto-generated int or GUID, you need to do the auto-generation only on the parent table. Then you store that value in the child table rather than generating a new one there.
When it says the tables share the same primary key, it just means that there is a field with the same name in each table, both set as Primary Keys.
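A hedged sketch of the surrogate-key variant described above (all names assumed; it presumes a Product table keyed on SKU, like the ones in the answers below):
CREATE TABLE ProductWebInfo (
    ProductWebInfoId int NOT NULL IDENTITY(1, 1) PRIMARY KEY,
    SKU varchar(50) NOT NULL,
    ImageURL varchar(50) NULL,
    CONSTRAINT FK_ProductWebInfo_Product FOREIGN KEY (SKU) REFERENCES Product (SKU)
)
-- The unique index keeps the relationship one-to-one for now; drop it
-- (and add a regular index) when you move to one-to-many.
CREATE UNIQUE INDEX UQ_ProductWebInfo_SKU ON ProductWebInfo (SKU)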
Create Tables
CREATE TABLE [Product (Chapter 2)](
SKU varchar(50) NOT NULL,
Description varchar(50) NULL,
Price numeric(18, 2) NULL,
CONSTRAINT [PK_Product (Chapter 2)] PRIMARY KEY CLUSTERED
(
SKU ASC
)
)
CREATE TABLE [ProductWebInfo (Chapter 2)](
SKU varchar(50) NOT NULL,
ImageURL varchar(50) NULL,
CONSTRAINT [PK_ProductWebInfo (Chapter 2)] PRIMARY KEY CLUSTERED
(
SKU ASC
)
)
Create Relationships
ALTER TABLE [ProductWebInfo (Chapter 2)]
ADD CONSTRAINT fk_SKU
FOREIGN KEY(SKU)
REFERENCES [Product (Chapter 2)] (SKU)
It may look a bit simpler if the table names are just single words (and not key words, either), for example, if the table names were just Product and ProductWebInfo, without the (Chapter 2) appended:
ALTER TABLE ProductWebInfo
ADD CONSTRAINT fk_SKU
FOREIGN KEY(SKU)
REFERENCES Product(SKU)
This is simply an example that I threw together using the table designer in SSMS, but it should give you an idea (note the foreign key constraint at the end):
CREATE TABLE dbo.Product
(
SKU int NOT NULL IDENTITY (1, 1),
Description varchar(50) NOT NULL,
Price numeric(18, 2) NOT NULL
) ON [PRIMARY]
ALTER TABLE dbo.Product ADD CONSTRAINT
PK_Product PRIMARY KEY CLUSTERED
(
SKU
)
CREATE TABLE dbo.ProductWebInfo
(
SKU int NOT NULL,
ImageUrl varchar(50) NULL
) ON [PRIMARY]
ALTER TABLE dbo.ProductWebInfo ADD CONSTRAINT
FK_ProductWebInfo_Product FOREIGN KEY
(
SKU
) REFERENCES dbo.Product
(
SKU
) ON UPDATE NO ACTION
ON DELETE NO ACTION
See how to create a foreign key constraint: http://msdn.microsoft.com/en-us/library/ms175464.aspx. That page also has links on creating tables. You'll need to create the database as well.
To answer your question:
ALTER TABLE ProductWebInfo
ADD CONSTRAINT fk_SKU
FOREIGN KEY (SKU)
REFERENCES Product(SKU)
Let's say you have a relational DB table like INVENTORY_ITEM. It's generic in the sense that anything in inventory needs a record here. Now let's say there are tons of different types of inventory, and each type might have unique fields it wants to track (e.g. forks might track the number of tines, but refrigerators would have no use for that field). These fields must be user-definable per category type.
There are many ways to solve this:
Use ALTER TABLE statements to actually add nullable columns on the fly (yuk)
Have two tables with a one-to-one mapping, INVENTORY_ITEM, and INVENTORY_ITEM_USER, and use ALTER TABLE statements to add and remove nullable columns from the latter table on the fly (a bit nicer).
Add a CUSTOM_PROPERTY table and a CUSTOM_PROPERTY_VALUE table; add/remove rows in CUSTOM_PROPERTY when the user adds and removes custom properties, and store the values in the latter table. This is nice and generic, but the performance would suffer. If you had an average of 20 values per item, the number of rows in CUSTOM_PROPERTY_VALUE goes up at 20 times the rate, and you still need to include columns in CUSTOM_PROPERTY_VALUE for every different data type that you might want to store.
Have one big varchar(MAX) field on INVENTORY_ITEM to store custom properties as XML.
I guess you could have individual tables for each category type that hang off the INVENTORY_ITEM table; these get created/destroyed on the fly when the user creates inventory types, and their columns get updated when the user adds/removes properties on those types. Seems messy though.
Is there a best practice for this? It seems to me that option 4 is clean but doesn't allow you to easily search by the metadata. I've used a variant of option 3 before, but only on a table that had a really small number of rows, so performance wasn't an issue. It always seemed to me that option 2 was a good idea, but it doesn't fit well with auto-generated entity frameworks, so you'd have to exclude the custom-properties table from the entity generation and write your own custom data access code to handle it.
Am I missing any alternatives? Is there a way for SQL Server to "look into" XML data in a column so it could actually do something with option 4 now?
I am using the xml type column for this kind of situation...
http://msdn.microsoft.com/en-us/library/ms189887.aspx
Before xml, we had to use option 3, which from my point of view is still a good way to do it, especially if you have a data access layer that can handle the type conversions properly for you. We stored everything as string values and defined a column that held the original data type for the conversion.
Options 1 and 2 are a no-go: don't change the database schema in production on the fly.
Option 5 could be done in a separate database... but you'd still have no control over the schema, and the user would need rights to create tables, etc.
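For illustration, a minimal sketch of the xml-column approach mentioned above (table and column names are hypothetical):
-- Hypothetical: keep user-defined properties in an xml column
CREATE TABLE INVENTORY_ITEM (
    InventoryItemId int NOT NULL IDENTITY(1, 1) PRIMARY KEY,
    Name varchar(100) NOT NULL,
    CustomProperties xml NULL  -- e.g. '<props><tines>4</tines></props>'
)
-- SQL Server can query inside the xml with XQuery methods:
SELECT InventoryItemId
FROM INVENTORY_ITEM
WHERE CustomProperties.value('(/props/tines)[1]', 'int') = 4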
Definitely option 3.
Sometimes option 4, if you have a very good reason to do so.
Do not ever dynamically modify the database structure to accommodate incoming data. One day something could break and damage your database. It is simply not done this way.
Options 3 or 4 are the only ones I would consider - you don't want to be changing the schema on the fly, especially if you're using some kind of mapping layer.
I've generally gone with option 3. As a bit of sanity, I always have a type column in the CUSTOM_PROPERTY table, which is repeated in the CUSTOM_PROPERTY_VALUE table. By adding a superkey on <Primary Key, Type> to the CUSTOM_PROPERTY table, you can then have a foreign key that references it (as well as the simpler foreign key to just the primary key). And finally, a check constraint ensures that only the relevant column in CUSTOM_PROPERTY_VALUE is non-null, based on this type column.
In this way, you know that if someone has defined a CUSTOM_PROPERTY, say "Tine count", of type int, you're only ever going to find an int stored in the CUSTOM_PROPERTY_VALUE table for all instances of this property.
Edit
If you need it to reference multiple entity tables, then it can get more complex, especially if you want full referential integrity. For instance (with two distinct entity types in the database):
create table dbo.Entities (
EntityID uniqueidentifier not null,
EntityType varchar(10) not null,
constraint PK_Entities PRIMARY KEY (EntityID),
constraint CK_Entities_KnownTypes CHECK (
EntityType in ('Foo','Bar')),
constraint UQ_Entities_KnownTypes UNIQUE (EntityID,EntityType)
)
go
create table dbo.Foos (
EntityID uniqueidentifier not null,
EntityType as CAST('Foo' as varchar(10)) persisted,
FooFixedProperty1 int not null,
FooFixedProperty2 varchar(150) not null,
constraint PK_Foos PRIMARY KEY (EntityID),
constraint FK_Foos_Entities FOREIGN KEY (EntityID) references dbo.Entities (EntityID) on delete cascade,
constraint FK_Foos_Entities_Type FOREIGN KEY (EntityID,EntityType) references dbo.Entities (EntityID,EntityType)
)
go
create table dbo.Bars (
EntityID uniqueidentifier not null,
EntityType as CAST('Bar' as varchar(10)) persisted,
BarFixedProperty1 float not null,
BarFixedProperty2 int not null,
constraint PK_Bars PRIMARY KEY (EntityID),
constraint FK_Bars_Entities FOREIGN KEY (EntityID) references dbo.Entities (EntityID) on delete cascade,
constraint FK_Bars_Entities_Type FOREIGN KEY (EntityID,EntityType) references dbo.Entities (EntityID,EntityType)
)
go
create table dbo.ExtendedProperties (
PropertyID uniqueidentifier not null,
PropertyName varchar(100) not null,
PropertyType int not null,
constraint PK_ExtendedProperties PRIMARY KEY (PropertyID),
constraint CK_ExtendedProperties CHECK (
PropertyType between 1 and 4), --Or make type a varchar, and change check to IN('int', 'float'), etc
constraint UQ_ExtendedProperty_Names UNIQUE (PropertyName),
constraint UQ_ExtendedProperties_Types UNIQUE (PropertyID,PropertyType)
)
go
create table dbo.PropertyValues (
EntityID uniqueidentifier not null,
PropertyID uniqueidentifier not null,
PropertyType int not null,
IntValue int null,
FloatValue float null,
DecimalValue decimal(15,2) null,
CharValue varchar(max) null,
EntityType varchar(10) not null,
constraint PK_PropertyValues PRIMARY KEY (EntityID,PropertyID),
constraint FK_PropertyValues_ExtendedProperties FOREIGN KEY (PropertyID) references dbo.ExtendedProperties (PropertyID) on delete cascade,
constraint FK_PropertyValues_ExtendedProperty_Types FOREIGN KEY (PropertyID,PropertyType) references dbo.ExtendedProperties (PropertyID,PropertyType),
constraint FK_PropertyValues_Entities FOREIGN KEY (EntityID) references dbo.Entities (EntityID) on delete cascade,
constraint FK_PropertyValues_Entitiy_Types FOREIGN KEY (EntityID,EntityType) references dbo.Entities (EntityID,EntityType),
constraint CK_PropertyValues_OfType CHECK (
(IntValue is null or PropertyType = 1) and
(FloatValue is null or PropertyType = 2) and
(DecimalValue is null or PropertyType = 3) and
(CharValue is null or PropertyType = 4)),
--Shoot for bonus points
FooID as CASE WHEN EntityType='Foo' THEN EntityID END persisted,
constraint FK_PropertyValues_Foos FOREIGN KEY (FooID) references dbo.Foos (EntityID),
BarID as CASE WHEN EntityType='Bar' THEN EntityID END persisted,
constraint FK_PropertyValues_Bars FOREIGN KEY (BarID) references dbo.Bars (EntityID)
)
go
--Now we wrap up inserts into the Foos, Bars and PropertyValues tables as either Stored Procs, or instead of triggers
--To get the proper additional columns and/or base tables populated
My inclination would be to store things as XML if the database supports that nicely, or else to have a small number of different tables for different data types (format the data so it fits one of a small number of types; don't use one table for VARCHAR(15), another for VARCHAR(20), etc.). Something like option 5, but with all tables pre-created and everything shoehorned into the existing tables. Each row should hold a main-record ID, a record-type indicator, and a piece of data. Set up an index on record-type, subsorted by data, and it will be possible to query for particular field values (where RecType == 19 and Data == 'Fred'). Querying for records that match multiple field values would be harder, but such is life.
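A minimal sketch of that layout (all names hypothetical):
CREATE TABLE CustomFieldValues (
    MainRecordId int NOT NULL,   -- ID of the main record this value belongs to
    RecType int NOT NULL,        -- which user-defined field this row holds
    Data varchar(100) NULL,      -- the value, shoehorned into a common type
    PRIMARY KEY (MainRecordId, RecType)
)
-- Index by field type, subsorted by data, to find records by field value
CREATE INDEX IX_CustomFieldValues_RecType_Data
    ON CustomFieldValues (RecType, Data)
SELECT MainRecordId FROM CustomFieldValues WHERE RecType = 19 AND Data = 'Fred'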