How to maintain database integrity when table inheritance cannot be used

I have several different component types that each have drastically different data specs to store, so each component type needs its own table, but they all share some common columns. I'm most concerned with [component.ID], which must be a unique identifier for a component regardless of component type (unique across many tables).
First Option
My first idea was inheritance where the table for each component type inherits a generic [component] table.
create table if not exists component (
ID bigint primary key default nextval('component_id_seq'),
typeID bigint not null references componentType (ID),
manufacturerID bigint not null references manufacturer (ID),
retailPrice numeric check (retailPrice >= 0.0),
purchasePrice numeric check (purchasePrice >= 0.0),
manufacturerPartNum varchar(255) not null,
isLegacy boolean default false,
check (retailPrice >= purchasePrice)
);
create table if not exists motherboard (
foo bigint,
bar bigint
) inherits (component); -- <-- this guy right here!!
/* there would be many other tables with different specific types of components
which each inherit the [component] table*/
PostgreSQL inheritance has some caveats that seem to make this a bad idea.
Constraints like unique or primary key are not respected by the inheriting table. Even if you specify unique in the inheriting table it would only be unique in that table and could duplicate values in the parent table or other inheriting tables.
References do not carry over from the parent table. So the references for typeID or manufacturerID would not apply to the inheriting table.
References to the parent table would not include data in the inheriting tables. This is the worst deal breaker for me with inheritance, because I need to be able to reference all components regardless of type.
Second Option
If I don't use inheritance, I can use the component table as a master component list holding data common to components of every type, and then have a table for each type of component where each entry refers to a component.ID. That works fine, but how do I enforce it?
How do I enforce that each entry in the component table has one and only one corresponding entry in only one of many other tables? The part that baffles me is that there are many tables and the corresponding entry could be in any of them.
A simple reference back to the component table will ensure that each row in the many specific component type tables has one valid component.id to which it belongs.
Third Option
Last of all I could forego a master component table altogether and just have each table for a specific component type have those same columns. Then I am left with the conundrum of how to enforce a unique component ID across many tables and also how to search across all these many tables (which may very well grow or shrink) in queries. I don't want a huge unwieldy UNION between all these tables. That would bog any select query to frozen molasses speed.
Fourth Option
This strikes me as a problem that comes up from time to time in DB design. There is probably a name for it that I don't know, and perhaps a solution that is entirely different from the three options above.

The foreign key should include the type of the subcomponent; the example speaks for itself.
create table component(
id int generated always as identity primary key,
-- or
-- id serial primary key,
type_id int not null,
general_info text,
unique (type_id, id)
);
create table subcomponent_1 (
id int primary key,
type_id int generated always as (1) stored,
-- or
-- type_id int default 1 check(type_id = 1),
specific_info text,
foreign key(type_id, id) references component(type_id, id)
);
insert into component (type_id, general_info)
values (1, 'component type 1');
insert into subcomponent_1 (id, specific_info)
values (1, 'specific info');
Note that a component's type cannot be changed while a subcomponent row still references it:
update component
set type_id = 2
where id = 1;
ERROR: update or delete on table "component" violates foreign key constraint "subcomponent_1_type_id_id_fkey" on table "subcomponent_1"
DETAIL: Key (type_id, id)=(1, 1) is still referenced from table "subcomponent_1".
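To check the composite-key pattern outside PostgreSQL, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. It rests on three assumptions: SQLite instead of PostgreSQL, a CHECK-with-DEFAULT column in place of the generated column, and a hypothetical second table named subcomponent_2. The sketch shows that a type-1 component can get a row in subcomponent_1 but not in subcomponent_2, and that its type cannot be changed while that row exists. The pattern does not, by itself, force every component to have a subtype row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys unless asked
conn.executescript("""
    CREATE TABLE component (
        id           INTEGER PRIMARY KEY,   -- auto-numbered, like identity/serial
        type_id      INTEGER NOT NULL,
        general_info TEXT,
        UNIQUE (type_id, id)                -- target of the composite foreign keys
    );
    -- Each subtype table pins its type_id to one constant value.
    CREATE TABLE subcomponent_1 (
        id            INTEGER PRIMARY KEY,
        type_id       INTEGER NOT NULL DEFAULT 1 CHECK (type_id = 1),
        specific_info TEXT,
        FOREIGN KEY (type_id, id) REFERENCES component (type_id, id)
    );
    CREATE TABLE subcomponent_2 (           -- hypothetical second subtype
        id            INTEGER PRIMARY KEY,
        type_id       INTEGER NOT NULL DEFAULT 2 CHECK (type_id = 2),
        specific_info TEXT,
        FOREIGN KEY (type_id, id) REFERENCES component (type_id, id)
    );
""")


def try_exec(sql: str) -> bool:
    """Run one statement; return False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


conn.execute("INSERT INTO component (type_id, general_info) VALUES (1, 'component type 1')")
ok_same_type = try_exec("INSERT INTO subcomponent_1 (id, specific_info) VALUES (1, 'specific info')")
ok_wrong_type = try_exec("INSERT INTO subcomponent_2 (id, specific_info) VALUES (1, 'other info')")
ok_type_change = try_exec("UPDATE component SET type_id = 2 WHERE id = 1")
print(ok_same_type, ok_wrong_type, ok_type_change)  # True False False
```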

Related

Can I use identity for primary key in more than one table in the same ER model

As the title says, my question is: can I use int identity(1,1) for the primary key in more than one table in the same ER model? I read online that a primary key needs to have unique values, so, for example, if I set int identity(1,1) for this table:
CREATE TABLE dbo.Persons
(
Personid int IDENTITY(1,1) PRIMARY KEY,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Age int
);
GO
and the other table
CREATE TABLE dbo.Job
(
jobID int IDENTITY(1,1) NOT NULL PRIMARY KEY,
nameJob NVARCHAR(25) NOT NULL,
Personid int FOREIGN KEY REFERENCES dbo.Persons(Personid)
);
Wouldn't Personid and jobID have the same value and because of that cause an error?

Constraints in general are defined and have a scope of one table (object) in the database. The only exception is the FOREIGN KEY which usually has a REFERENCE to another table.
The PRIMARY KEY (or any UNIQUE key) sets a constraint only on the table it is defined on; it neither affects nor is affected by constraints on other tables.
The PRIMARY KEY defines a column or a set of columns which can be used to uniquely identify one record in one table (none of its columns can hold NULL; UNIQUE, on the other hand, allows NULLs, and how they are treated differs between database engines).
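Here is a quick illustration of the NULL point, using SQLite (through Python's sqlite3 module) as an example engine. SQLite, like PostgreSQL by default, treats NULLs as distinct in a UNIQUE column, whereas SQL Server's UNIQUE constraint accepts only one NULL. The table t and column code are made up for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, code TEXT UNIQUE)")

# Two NULLs in a UNIQUE column are both accepted: SQLite treats NULLs as distinct.
conn.execute("INSERT INTO t (code) VALUES (NULL)")
conn.execute("INSERT INTO t (code) VALUES (NULL)")

# Two equal non-NULL values are not.
conn.execute("INSERT INTO t (code) VALUES ('A')")
try:
    conn.execute("INSERT INTO t (code) VALUES ('A')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

null_rows = conn.execute("SELECT COUNT(*) FROM t WHERE code IS NULL").fetchone()[0]
print(null_rows, duplicate_rejected)  # 2 True
```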
So yes, you might have the same value for PersonID and JobID, but their meaning is different. (And to select the one unique record, you will need to tell SQL Server in which table and in which column of that table you are looking for it, this is the table list and the WHERE or JOIN conditions in the query).
The queries SELECT * FROM dbo.Job WHERE JobID = 1; and SELECT * FROM dbo.Persons WHERE PersonID = 1; have different meanings even though the value you are searching for is the same.
You will define the IDENTITY on the table (the table can have only one IDENTITY column). You don't need to have an IDENTITY definition on a column to have the value 1 in it, the IDENTITY just gives you an easy way to generate unique values per table.
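This sketch shows that per-table generated keys do not collide. It builds the two tables in SQLite (via Python's sqlite3), where an INTEGER PRIMARY KEY column is auto-numbered per table, roughly like IDENTITY(1,1). One difference from IDENTITY: SQLite may reuse a value after the highest row is deleted, unless AUTOINCREMENT is used. Both first rows get the value 1 and no error occurs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Persons (
        Personid  INTEGER PRIMARY KEY,   -- numbered independently for this table
        LastName  TEXT NOT NULL,
        FirstName TEXT,
        Age       INTEGER
    );
    CREATE TABLE Job (
        jobID    INTEGER PRIMARY KEY,    -- numbered independently for this table
        nameJob  TEXT NOT NULL,
        Personid INTEGER REFERENCES Persons (Personid)
    );
""")
person_id = conn.execute("INSERT INTO Persons (LastName) VALUES ('Smith')").lastrowid
job_id = conn.execute(
    "INSERT INTO Job (nameJob, Personid) VALUES ('Clerk', ?)", (person_id,)
).lastrowid
print(person_id, job_id)  # 1 1: same value, different meaning, no error
```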
You can share a number generator across tables by using a SEQUENCE object, but that will not prevent you from manually inserting the same values into multiple tables.
In short, the value stored in the column is just a value, the table name, the column name and the business rules and roles will give it a meaning.
To the notion that "every table needs to have a PRIMARY KEY and IDENTITY", I would like to add that in most cases there are multiple (independent) keys in a table. Usually every entity has something you can call a business key, which is, in loose terms, the key the business (humans) uses to identify something. This key has characteristics very similar to, though usually not exactly the same as, a PRIMARY KEY with IDENTITY.
This can be a product's barcode, an employee's ID card number, something generated in another system (say HR), or a code assigned to a customer or partner.
These business keys are useful for humans but not always for computers, although they could serve as a PRIMARY KEY.
In databases, we (the developers, architects) like simplicity, and a business key can be very complex in computer terms. It can consist of multiple columns and can cause performance issues (comparing strings is not the same as comparing numbers, and comparing multiple columns is less efficient than comparing one column). Worst of all, it might change over time. To resolve this, we tend to create our own technical key, which computers can use more easily and over which we have more control, so we use things like IDENTITYs, GUIDs and whatnot.
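The technical-key idea can be sketched in SQLite via Python's sqlite3. The product and order_line tables and the barcode values are made-up examples. The business key stays enforced as UNIQUE, while other tables reference the surrogate key, so changing a barcode does not touch the referencing rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE product (
        product_id INTEGER PRIMARY KEY,      -- technical (surrogate) key
        barcode    TEXT NOT NULL UNIQUE,     -- business key, still unique
        name       TEXT NOT NULL
    );
    CREATE TABLE order_line (
        order_line_id INTEGER PRIMARY KEY,
        product_id    INTEGER NOT NULL REFERENCES product (product_id),
        quantity      INTEGER NOT NULL CHECK (quantity > 0)
    );
""")
pid = conn.execute(
    "INSERT INTO product (barcode, name) VALUES ('4006381333931', 'Pen')"
).lastrowid
conn.execute("INSERT INTO order_line (product_id, quantity) VALUES (?, 3)", (pid,))

# The business key changes; order_line stores product_id, so it is unaffected.
conn.execute("UPDATE product SET barcode = '4006381333948' WHERE product_id = ?", (pid,))
row = conn.execute("""
    SELECT p.barcode, o.quantity
    FROM order_line AS o JOIN product AS p ON p.product_id = o.product_id
""").fetchone()
print(row)  # ('4006381333948', 3)
```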

How to define sql relationship to variable number of tables

Let's hope my explanation is clearer than the title.
I have a set of files. Each file contains a variable number of papers/forms. So I have a table called files, with an fid.
For the sake of simplicity, let's say we have only 3 different forms, each contains its own set of data. So I have 3 tables, FormA, FormB and FormC, with their primary keys Aid, Bid, and Cid respectively.
The file can contain for example, 2 A forms and 1 B form, or 1 of each form, or 3 A forms, 2 B forms, 2 C forms. You get the idea, variable number of forms, and might include more than 1 of the same type.
How do I properly represent such a relationship in SQL? If it matters, I'm using PostgreSQL.

In PostgreSQL here is how I would do this. Note I am using dangerous (non-beginner/advanced) tools, and it is worth understanding the gotchas.
Now since there are a number of tables here, the question is how we manage the constraints. This is a little convoluted but here is what I would do:
CREATE TABLE file (...);
-- add your tables for tracking form data here....
CREATE TABLE file_to_form (
file_id int NOT NULL,
refkey int NOT NULL,
form_class char NOT NULL,
CHECK (file_id IS NULL) NO INHERIT
); -- this table will never have anything in it.
CREATE TABLE file_to_form_a (
PRIMARY KEY (file_id, refkey, form_class),
FOREIGN KEY (refkey) REFERENCES file_a (form_id),
CHECK (form_class = 'a')
) INHERITS (file_to_form);
CREATE TABLE file_to_form_b (
PRIMARY KEY (file_id, refkey, form_class),
FOREIGN KEY (refkey) REFERENCES file_b (form_id),
CHECK (form_class = 'b')
) INHERITS (file_to_form);
-- etc
Now you have a consistent interface for showing which forms are associated with files, and you can find them by searching the file_to_form table (which will function similarly to a read-only view of all tables that inherit it). This is one of those cases where PostgreSQL's table inheritance really helps, if you take the gotchas seriously and put some thought into how to handle them.
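PostgreSQL inheritance cannot be reproduced in SQLite, but the read-only combined interface can be approximated there with a UNION ALL view over the per-form link tables. This is a swapped-in technique, not the answer's method. It is a minimal sketch via Python's sqlite3, and the columns of file_a/file_b and the sample ids are hypothetical. Unlike the inheritance parent, the view must be redefined whenever a new form type is added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE file   (file_id INTEGER PRIMARY KEY);
    CREATE TABLE file_a (form_id INTEGER PRIMARY KEY, a_data TEXT);  -- hypothetical columns
    CREATE TABLE file_b (form_id INTEGER PRIMARY KEY, b_data TEXT);

    -- One link table per form type, each with its own foreign keys.
    CREATE TABLE file_to_form_a (
        file_id    INTEGER NOT NULL REFERENCES file (file_id),
        refkey     INTEGER NOT NULL REFERENCES file_a (form_id),
        form_class TEXT NOT NULL DEFAULT 'a' CHECK (form_class = 'a'),
        PRIMARY KEY (file_id, refkey, form_class)
    );
    CREATE TABLE file_to_form_b (
        file_id    INTEGER NOT NULL REFERENCES file (file_id),
        refkey     INTEGER NOT NULL REFERENCES file_b (form_id),
        form_class TEXT NOT NULL DEFAULT 'b' CHECK (form_class = 'b'),
        PRIMARY KEY (file_id, refkey, form_class)
    );

    -- Read-only combined interface, standing in for the inheritance parent table.
    CREATE VIEW file_to_form AS
        SELECT file_id, refkey, form_class FROM file_to_form_a
        UNION ALL
        SELECT file_id, refkey, form_class FROM file_to_form_b;

    INSERT INTO file (file_id) VALUES (1);
    INSERT INTO file_a (form_id, a_data) VALUES (10, 'first A'), (11, 'second A');
    INSERT INTO file_b (form_id, b_data) VALUES (20, 'only B');
    INSERT INTO file_to_form_a (file_id, refkey) VALUES (1, 10), (1, 11);
    INSERT INTO file_to_form_b (file_id, refkey) VALUES (1, 20);
""")
forms = conn.execute(
    "SELECT form_class, refkey FROM file_to_form WHERE file_id = 1 ORDER BY form_class, refkey"
).fetchall()
print(forms)  # [('a', 10), ('a', 11), ('b', 20)]
```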

Storing arbitrary attributes on tables

I have 3 tables, x, y, and z. I want to be able to attach arbitrary attributes to each row in each table. x, y, and z have nothing in common other than the fact that they all have an integer primary key called id and should be able to have arbitrary attributes attached to them.
Is it better to make a single attributes table, like
create table attributes (
table_name enum('x', 'y', 'z'),
xyz_id integer,
name varchar(50),
value text,
primary key (table_name, xyz_id, name)
);
Or is it best to make separate tables, like
create table x_attributes (
x_id integer,
name varchar(50),
value text,
primary key (x_id, name),
foreign key (x_id) references x (id)
);
create table y_attributes (...);
create table z_attributes (...);
The second option (separate tables) seems to be cleaner, but requires a lot more boilerplate on both the database side and the application side.
I'm also open to suggestions other than those two.
Note: I've considered the possibility of using a document store like MongoDB, but the data I'm working with is fundamentally relational.

Go with one table with an enum column; it will make grabbing all of the attributes for each row easier in the long run.
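Here is a minimal sketch of the single-table approach, using SQLite via Python's sqlite3. SQLite has no ENUM type, so a CHECK constraint stands in for it. The column is called table_name because TABLE is a reserved word, and the sample attributes are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- SQLite has no ENUM, so a CHECK constraint limits the allowed table names.
    CREATE TABLE attributes (
        table_name TEXT NOT NULL CHECK (table_name IN ('x', 'y', 'z')),
        xyz_id     INTEGER NOT NULL,
        name       TEXT NOT NULL,
        value      TEXT,
        PRIMARY KEY (table_name, xyz_id, name)
    );
    INSERT INTO attributes (table_name, xyz_id, name, value) VALUES
        ('x', 1, 'color', 'red'),
        ('x', 1, 'size',  'L'),
        ('y', 1, 'color', 'blue');
""")


def attributes_for(table_name: str, row_id: int) -> dict:
    """Return every attribute attached to one row of one table."""
    rows = conn.execute(
        "SELECT name, value FROM attributes WHERE table_name = ? AND xyz_id = ? ORDER BY name",
        (table_name, row_id),
    ).fetchall()
    return dict(rows)


# The CHECK rejects rows for tables outside the allowed list.
try:
    conn.execute("INSERT INTO attributes VALUES ('w', 1, 'color', 'green')")
    unknown_table_rejected = False
except sqlite3.IntegrityError:
    unknown_table_rejected = True

print(attributes_for("x", 1))  # {'color': 'red', 'size': 'L'}
```

One trade-off is worth knowing. Because xyz_id can point at x, y, or z, it cannot carry a declarative foreign key. Deleting a row from x therefore leaves its attributes behind unless a trigger or the application cleans them up.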

Constraint To Prevent Adding Value Which Exists In Another Table

I would like to add a constraint which prevents adding a value to a column if the value exists in the primary key column of another table. Is this possible?
EDIT:
Table: MasterParts
MasterPartNumber (Primary Key)
Description
....
Table: AlternateParts
MasterPartNumber (Composite Primary Key, Foreign Key to MasterParts.MasterPartNumber)
AlternatePartNumber (Composite Primary Key)
Problem - Alternate part numbers for each master part number must not themselves exist in the master parts table.
EDIT 2:
Here is an example:
MasterParts
MasterPartNumber | Description     | MinLevel | MaxLevel | ReorderLevel
010-00820-50     | Garmin GTN™ 750 | 1        | 5        | 2
AlternateParts
MasterPartNumber | AlternatePartNumber
010-00820-50     | 0100082050
010-00820-50     | GTN750

The only way I could think of solving this would be writing a checking function (not sure what language you are working with), or trying to play around with table relationships to ensure that it's unique.

Why not have a single "part" table with an "is master part" flag and then have an "alternate parts" table that maps a "master" part to one or more "alternate" parts?

Here's one way to do it without procedural code. I've deliberately left out ON UPDATE CASCADE and ON DELETE CASCADE, but in production I might use both. (But I'd severely limit who's allowed to update and delete part numbers.)
-- New tables
create table part_numbers (
pn varchar(50) primary key,
pn_type char(1) not null check (pn_type in ('m', 'a')),
unique (pn, pn_type)
);
create table part_numbers_master (
pn varchar(50) primary key,
pn_type char(1) not null default 'm' check (pn_type = 'm'),
description varchar(100) not null,
foreign key (pn, pn_type) references part_numbers (pn, pn_type)
);
create table part_numbers_alternate (
pn varchar(50) primary key,
pn_type char(1) not null default 'a' check (pn_type = 'a'),
foreign key (pn, pn_type) references part_numbers (pn, pn_type)
);
-- Now, your tables.
create table masterparts (
master_part_number varchar(50) primary key references part_numbers_master,
min_level integer not null default 0 check (min_level >= 0),
max_level integer not null default 0 check (max_level >= min_level),
reorder_level integer not null default 0
check ((reorder_level < max_level) and (reorder_level >= min_level))
);
create table alternateparts (
master_part_number varchar(50) not null references part_numbers_master (pn),
alternate_part_number varchar(50) not null references part_numbers_alternate (pn),
primary key (master_part_number, alternate_part_number)
);
-- Some test data
insert into part_numbers values
('010-00820-50', 'm'),
('0100082050', 'a'),
('GTN750', 'a');
insert into part_numbers_master values
('010-00820-50', 'm', 'Garmin GTN™ 750');
insert into part_numbers_alternate (pn) values
('0100082050'),
('GTN750');
insert into masterparts values
('010-00820-50', 1, 5, 2);
insert into alternateparts values
('010-00820-50', '0100082050'),
('010-00820-50', 'GTN750');
In practice, I'd build updatable views for master parts and for alternate parts, and I'd limit client access to the views. The updatable views would be responsible for managing inserts, updates, and deletes. (Depending on your company's policies, you might use stored procedures instead of updatable views.)
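To see the key mechanism at work, here is a trimmed sketch of the three part_numbers tables in SQLite via Python's sqlite3. As an assumption, SQLite's TEXT stands in for varchar(50), and only the master row is loaded. Each number has exactly one row in part_numbers with one pn_type, so the master number is rejected both as an alternate and as a second part_numbers entry.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE part_numbers (
        pn      TEXT PRIMARY KEY,
        pn_type TEXT NOT NULL CHECK (pn_type IN ('m', 'a')),
        UNIQUE (pn, pn_type)
    );
    CREATE TABLE part_numbers_master (
        pn          TEXT PRIMARY KEY,
        pn_type     TEXT NOT NULL DEFAULT 'm' CHECK (pn_type = 'm'),
        description TEXT NOT NULL,
        FOREIGN KEY (pn, pn_type) REFERENCES part_numbers (pn, pn_type)
    );
    CREATE TABLE part_numbers_alternate (
        pn      TEXT PRIMARY KEY,
        pn_type TEXT NOT NULL DEFAULT 'a' CHECK (pn_type = 'a'),
        FOREIGN KEY (pn, pn_type) REFERENCES part_numbers (pn, pn_type)
    );
    INSERT INTO part_numbers VALUES ('010-00820-50', 'm');
    INSERT INTO part_numbers_master (pn, description) VALUES ('010-00820-50', 'Garmin GTN™ 750');
""")


def rejected(sql: str, params=()) -> bool:
    """Return True if a constraint rejects the statement."""
    try:
        conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True


master_pn = "010-00820-50"
# ('010-00820-50', 'a') does not exist in part_numbers, so the foreign key fails.
as_alternate = rejected("INSERT INTO part_numbers_alternate (pn) VALUES (?)", (master_pn,))
# The number cannot be registered a second time with the other type either.
as_type_a = rejected("INSERT INTO part_numbers VALUES (?, 'a')", (master_pn,))
print(as_alternate, as_type_a)  # True True
```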

Your design is perfect.
But SQL isn't very helpful when you try to implement such a design. There is no declarative way in SQL to enforce your business rule. You'll have to write two triggers. The first handles inserts into masterparts and checks that the new master part identifier doesn't already exist as an alias. The second handles inserts of aliases and checks that the new alias identifier doesn't already identify a master part.
Or you can do this in the application, which is worse than triggers from the data integrity point of view.
(If you want to read up on how to enforce constraints of arbitrary complexity within an SQL engine, the best coverage of the topic I have seen is in the book "Applied Mathematics for Database Professionals".)
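The two insert triggers described above might look like this in SQLite via Python's sqlite3. The trigger names and the trimmed table definitions are assumptions. A complete solution would also need matching UPDATE triggers. In multi-user engines such as PostgreSQL, two concurrent transactions may each pass the check unless you add locking or use serializable isolation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE masterparts (
        master_part_number TEXT NOT NULL PRIMARY KEY,
        description        TEXT
    );
    CREATE TABLE alternateparts (
        master_part_number    TEXT NOT NULL REFERENCES masterparts (master_part_number),
        alternate_part_number TEXT NOT NULL,
        PRIMARY KEY (master_part_number, alternate_part_number)
    );

    -- Trigger 1: a new master part number must not already be used as an alias.
    CREATE TRIGGER masterparts_not_alias
    BEFORE INSERT ON masterparts
    WHEN EXISTS (SELECT 1 FROM alternateparts
                 WHERE alternate_part_number = NEW.master_part_number)
    BEGIN
        SELECT RAISE(ABORT, 'part number already exists as an alternate');
    END;

    -- Trigger 2: a new alias must not already identify a master part.
    CREATE TRIGGER alternateparts_not_master
    BEFORE INSERT ON alternateparts
    WHEN EXISTS (SELECT 1 FROM masterparts
                 WHERE master_part_number = NEW.alternate_part_number)
    BEGIN
        SELECT RAISE(ABORT, 'part number already exists as a master part');
    END;
""")
conn.execute("INSERT INTO masterparts VALUES ('010-00820-50', 'Garmin GTN 750')")
conn.execute("INSERT INTO alternateparts VALUES ('010-00820-50', 'GTN750')")


def rejected(sql: str) -> bool:
    """Return True if a trigger or constraint rejects the statement."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:  # RAISE(ABORT, ...) reports SQLITE_CONSTRAINT
        return True


master_as_alias = rejected("INSERT INTO masterparts VALUES ('GTN750', 'duplicate')")
alias_as_master = rejected("INSERT INTO alternateparts VALUES ('010-00820-50', '010-00820-50')")
valid_alias = rejected("INSERT INTO alternateparts VALUES ('010-00820-50', '0100082050')")
print(master_as_alias, alias_as_master, valid_alias)  # True True False
```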

Apart from the fact that this sounds like a possibly poor design:
In essence, you want values spanning two columns in different tables to be unique.
To use the database's native capability to check for uniqueness, you can create a third, helper column that contains a copy of all the values in the two target columns, and give that column a uniqueness constraint. For each new value added to one of your target columns, you add the same value to the helper column. To make this an internal database constraint, you can do the copying in a trigger.
And again, needing to do the above sounds like evidence of a poor design.
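Here is a sketch of the helper-column idea in SQLite via Python's sqlite3. It assumes each alternate number belongs to only one master part; otherwise the helper table would reject the second use of the same alternate. The helper table all_part_numbers and the trigger names are hypothetical. DELETE and UPDATE triggers that keep the helper in sync are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE masterparts (master_part_number TEXT NOT NULL PRIMARY KEY);
    CREATE TABLE alternateparts (
        master_part_number    TEXT NOT NULL,
        alternate_part_number TEXT NOT NULL,
        PRIMARY KEY (master_part_number, alternate_part_number)
    );
    -- Helper: one copy of every value from both target columns, kept unique.
    CREATE TABLE all_part_numbers (pn TEXT NOT NULL PRIMARY KEY);

    CREATE TRIGGER copy_master AFTER INSERT ON masterparts
    BEGIN
        INSERT INTO all_part_numbers (pn) VALUES (NEW.master_part_number);
    END;
    CREATE TRIGGER copy_alternate AFTER INSERT ON alternateparts
    BEGIN
        INSERT INTO all_part_numbers (pn) VALUES (NEW.alternate_part_number);
    END;
""")
conn.execute("INSERT INTO masterparts VALUES ('010-00820-50')")
conn.execute("INSERT INTO alternateparts VALUES ('010-00820-50', 'GTN750')")

try:
    # The helper's uniqueness check fails inside the trigger, so the whole INSERT is undone.
    conn.execute("INSERT INTO masterparts VALUES ('GTN750')")
    clash_rejected = False
except sqlite3.IntegrityError:
    clash_rejected = True

left_behind = conn.execute(
    "SELECT COUNT(*) FROM masterparts WHERE master_part_number = 'GTN750'"
).fetchone()[0]
print(clash_rejected, left_behind)  # True 0
```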
--
Edit:
Regarding your edit:
You say " Alternate part numbers for each master part number must not themselves exist in the master parts table."
This itself is a design decision, which you don't explain.
I don't know enough about the domain of your problem, but:
If you think of master and alternate parts as totally different things, there is no reason to require that "Alternate part numbers for each master part number must not themselves exist in the master parts table". Otherwise, you have a common notion of "parts", be it master or alternate. This means they need to be in the same table and column.
If the second is true, you need something like this:
table "parts"
columns:
id - pk
is_master - boolean (assuming a part can not be master and alternate at the same time)
description - text
This table's role is to list and describe the parts.
Then you have several ways to denote which part is an alternate to which. It depends on whether a part can be an alternate to more than one part. And it sounds like, either way, one master part can have several alternates.
You can do it in the same table, or create another one.
If same: add column: alternate_to, which will be null for master parts, and will have a foreign key into the id column of the same table.
Otherwise create a table, say "alternatives" with: master_id, alternate_id both referencing with a foreign key to the parts table.
(The first above assumes that a part cannot be alternate to more than one other part. If this is not true, the second will work anyway)
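Here is a sketch of the second variant (a separate alternatives table) in SQLite via Python's sqlite3. As an assumption beyond this answer, it borrows the composite-key trick from the first thread. The constant columns master_flag and alt_flag, with composite foreign keys to parts (id, is_master), ensure that master_id points at a master part and alternate_id at a non-master part. The single parts table already guarantees that one number cannot be both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE parts (
        id          TEXT NOT NULL PRIMARY KEY,   -- each part number exists exactly once
        is_master   INTEGER NOT NULL CHECK (is_master IN (0, 1)),
        description TEXT,
        UNIQUE (id, is_master)                   -- target of the composite keys below
    );
    CREATE TABLE alternatives (
        master_id    TEXT NOT NULL,
        master_flag  INTEGER NOT NULL DEFAULT 1 CHECK (master_flag = 1),
        alternate_id TEXT NOT NULL,
        alt_flag     INTEGER NOT NULL DEFAULT 0 CHECK (alt_flag = 0),
        PRIMARY KEY (master_id, alternate_id),
        FOREIGN KEY (master_id, master_flag) REFERENCES parts (id, is_master),
        FOREIGN KEY (alternate_id, alt_flag) REFERENCES parts (id, is_master)
    );
    INSERT INTO parts (id, is_master, description) VALUES
        ('010-00820-50', 1, 'Garmin GTN 750'),
        ('0100082050',   0, NULL),
        ('GTN750',       0, NULL);
    INSERT INTO alternatives (master_id, alternate_id) VALUES
        ('010-00820-50', '0100082050'),
        ('010-00820-50', 'GTN750');
""")


def rejected(sql: str) -> bool:
    """Return True if a constraint rejects the statement."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


dup_part = rejected("INSERT INTO parts (id, is_master) VALUES ('GTN750', 1)")
master_as_alt = rejected(
    "INSERT INTO alternatives (master_id, alternate_id) VALUES ('GTN750', '010-00820-50')"
)
n_alternates = conn.execute("SELECT COUNT(*) FROM alternatives").fetchone()[0]
print(dup_part, master_as_alt, n_alternates)  # True True 2
```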

How can I share the same primary key across two tables?

I'm reading a book on EF4 and I came across this problem situation:
So I was wondering how to create this database so I can follow along with the example in the book.
How would I create these tables, using simple TSQL commands? Forget about creating the database, imagine it already exists.

You've been given the code. I want to share some information on why you might want to have two tables in a relationship like that.
First when two tables have the same Primary Key and have a foreign key relationship, that means they have a one-to-one relationship. So why not just put them in the same table? There are several reasons why you might split some information out to a separate table.
First, the information is conceptually separate. If the information contained in the second table relates to a separate, specific concern, it is easier to work with when the data is in a separate table. For instance, in your example they have separated out images even though they only intend to have one record per SKU. This gives you the flexibility to easily change the table later to a one-to-many relationship if you decide you need multiple images. It also means that when you query just for images, you don't have to actually hit the other (perhaps significantly larger) table.
Which brings us to the second reason to do this. You currently have a one-to-one relationship, but you know that a future release is already scheduled to turn it into a one-to-many relationship. In this case it's easier to design a separate table now, so that you won't break all your code when you move to that structure. If I were planning to do this, I would go ahead and create a surrogate key as the PK and create a unique index on the FK. This way, when you go to the one-to-many relationship, all you have to do is drop the unique index and replace it with a regular index.
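The surrogate-key-plus-unique-index plan can be sketched in SQLite via Python's sqlite3. ProductImage, ImageID, and the index names are hypothetical. While the unique index exists, a second image for the same SKU is rejected; after swapping it for a plain index, the same insert is accepted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Product (SKU TEXT NOT NULL PRIMARY KEY, Description TEXT);
    CREATE TABLE ProductImage (
        ImageID  INTEGER PRIMARY KEY,                    -- surrogate key
        SKU      TEXT NOT NULL REFERENCES Product (SKU),
        ImageURL TEXT
    );
    -- One-to-one for now: at most one image per SKU.
    CREATE UNIQUE INDEX ux_ProductImage_SKU ON ProductImage (SKU);
    INSERT INTO Product VALUES ('ABC-1', 'Widget');
    INSERT INTO ProductImage (SKU, ImageURL) VALUES ('ABC-1', 'front.png');
""")


def add_image(url: str) -> bool:
    """Try to attach another image to SKU 'ABC-1'; return whether it was accepted."""
    try:
        conn.execute("INSERT INTO ProductImage (SKU, ImageURL) VALUES ('ABC-1', ?)", (url,))
        return True
    except sqlite3.IntegrityError:
        return False


before = add_image("back.png")   # rejected by the unique index
# Moving to one-to-many: replace the unique index with a regular one.
conn.executescript("""
    DROP INDEX ux_ProductImage_SKU;
    CREATE INDEX ix_ProductImage_SKU ON ProductImage (SKU);
""")
after = add_image("back.png")    # now accepted
print(before, after)  # False True
```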
Another reason to separate out a one-one relationship is if the table is getting too wide. Sometimes you just have too much information about an entity to easily fit it in the maximum size a record can have. In this case, you tend to take the least used fields (or those that conceptually fit together) and move them to a separate table.
Another reason to separate them out is that although you have a one-one relationship, you may not need a record of what is in the child table for most records in the parent table. So rather than having a lot of null values in the parent table, you split it out.
The code shown by the others assumes a character-based PK. If you want a relationship of this sort when you have an auto-generating Int or GUID, you need to do the autogeneration only on the parent table. Then you store that value in the child table rather than generating a new one on that table.
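For an auto-generated key, here is a minimal sketch in SQLite via Python's sqlite3. ProductID and add_product are hypothetical names. The key is generated only in Product, read back with lastrowid, and written explicitly into ProductWebInfo inside one transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Product (
        ProductID   INTEGER PRIMARY KEY,   -- generated here only
        Description TEXT NOT NULL
    );
    CREATE TABLE ProductWebInfo (
        ProductID INTEGER PRIMARY KEY REFERENCES Product (ProductID),  -- supplied from Product
        ImageURL  TEXT
    );
""")


def add_product(description: str, image_url: str) -> int:
    """Insert the parent, then the child with the parent's generated key, atomically."""
    with conn:  # commits on success, rolls back if either insert fails
        new_id = conn.execute(
            "INSERT INTO Product (Description) VALUES (?)", (description,)
        ).lastrowid
        conn.execute(
            "INSERT INTO ProductWebInfo (ProductID, ImageURL) VALUES (?, ?)",
            (new_id, image_url),
        )
    return new_id


first_id = add_product("Widget", "widget.png")
second_id = add_product("Gadget", "gadget.png")
linked = conn.execute(
    "SELECT COUNT(*) FROM Product JOIN ProductWebInfo USING (ProductID)"
).fetchone()[0]
print(first_id, second_id, linked)  # 1 2 2
```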

When it says the tables share the same primary key, it just means that there is a field with the same name in each table, both set as Primary Keys.
Create Tables
CREATE TABLE [Product (Chapter 2)](
SKU varchar(50) NOT NULL,
Description varchar(50) NULL,
Price numeric(18, 2) NULL,
CONSTRAINT [PK_Product (Chapter 2)] PRIMARY KEY CLUSTERED
(
SKU ASC
)
)
CREATE TABLE [ProductWebInfo (Chapter 2)](
SKU varchar(50) NOT NULL,
ImageURL varchar(50) NULL,
CONSTRAINT [PK_ProductWebInfo (Chapter 2)] PRIMARY KEY CLUSTERED
(
SKU ASC
)
)
Create Relationships
ALTER TABLE [ProductWebInfo (Chapter 2)]
ADD CONSTRAINT fk_SKU
FOREIGN KEY(SKU)
REFERENCES [Product (Chapter 2)] (SKU)
It may look a bit simpler if the table names are just single words (and not key words, either), for example, if the table names were just Product and ProductWebInfo, without the (Chapter 2) appended:
ALTER TABLE ProductWebInfo
ADD CONSTRAINT fk_SKU
FOREIGN KEY(SKU)
REFERENCES Product(SKU)

This is simply an example that I threw together using the table designer in SSMS, but it should give you an idea (note the foreign key constraint at the end):
CREATE TABLE dbo.Product
(
SKU int NOT NULL IDENTITY (1, 1),
Description varchar(50) NOT NULL,
Price numeric(18, 2) NOT NULL
) ON [PRIMARY]
ALTER TABLE dbo.Product ADD CONSTRAINT
PK_Product PRIMARY KEY CLUSTERED
(
SKU
)
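CREATE TABLE dbo.ProductWebInfo
(
SKU int NOT NULL,
ImageUrl varchar(50) NULL
) ON [PRIMARY]
-- SKU is also the primary key here, so both tables share the same key (one-to-one)
ALTER TABLE dbo.ProductWebInfo ADD CONSTRAINT
PK_ProductWebInfo PRIMARY KEY CLUSTERED
(
SKU
)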
ALTER TABLE dbo.ProductWebInfo ADD CONSTRAINT
FK_ProductWebInfo_Product FOREIGN KEY
(
SKU
) REFERENCES dbo.Product
(
SKU
) ON UPDATE NO ACTION
ON DELETE NO ACTION

See the documentation on how to create a foreign key constraint at http://msdn.microsoft.com/en-us/library/ms175464.aspx; it also has links to creating tables. You'll need to create the database as well.
To answer your question:
ALTER TABLE ProductWebInfo
ADD CONSTRAINT fk_SKU
FOREIGN KEY (SKU)
REFERENCES Product(SKU)
