What's the best way to have a "repeating field" in SQL? - sql

I'm trying to set up a table that links two records from a different table. These links themselves need to be correlated to another table. So at the moment, my table looks like this:
link_id (primary key)
item_id_1 (foreign key)
item_id_2 (foreign key)
link_type (metadata)
However, the links between items are not directional (i.e. it should make no difference whether an item is the first or second listed in a link). Ideally, I'd like for the item_id field to just appear twice; as it is I'll have to be careful to always be checking for duplicates to make sure that there's never a record created linking 12 to 14 if 14 to 12 already exists.
Is there an elegant database design solution to this, or should I just adopt a convention (e.g. id_1 is always the smaller id number) and police duplication within the application?
Thanks in advance!

You could use a join table.
Table1: link_id (PK), link_type
JoinTable: table1_link_id, item_id (composite primary key composed of both ids)

Benzado already pointed it out - add a constraint that enforces that item_id_1 < item_id2:
ALTER TABLE t ADD CONSTRAINT CH_ITEM1_LESSTHAN_ITEM2 CHECK (item_id_1 < item_id_2)
So this will prevent the wrong data from being entered, rejecting such updates/inserts.
If you want to automatically correct it any situation where item_id_1 > item_id_2, you could add a trigger instead (technically you could have both, but then you might have some hassle getting it to work right, as check constraints could be checked before the trigger fires). Exact syntax for triggers depends on you RDBMS.

There's a couple of ways to implement this. One would be with a combination of a check constraint and a unique constraint.
alter table t23
add constraint c1_c2_ck check (c1 < c2)
/
alter table t23
add constraint t23_uk unique (c1, c2)
/
That would work in most DBMS flavours. An alternative approach, which would work in Oracle at least, would be to use a function-based index ....
create unique index t23_uidx on t23
(least(c1,c2), greatest(c1,c2))
/

Related

RDBMS primary key design for row versioning

I want to design primary key for my table with row versioning. My table contains 2 main fields : ID and Timestamp, and bunch of other fields. For a unique "ID" , I want to store previous versions of a record. Hence I am creating primary key for the table to be combination of ID and timestamp fields.
Hence to see all the versions of a particular ID, I can give,
Select * from table_name where ID=<ID_value>
To return the most recent version of a ID, I can use
Select * from table_name where ID=<ID_value> ORDER BY timestamp desc
and get the first element.
My question here is, will this query be efficient and run in O(1) instead of scanning the entire table to get all entries matching same ID considering ID field was a part of primary key fields? Ideally to get a result in O(1), I should have provided the entire primary key. If it does need to do entire table scan, then how else can I design my primary key so that I get this request done in O(1)?
Thanks,
Sriram
The canonical reference on this subject is Effective Timestamping in Databases:
https://www.cs.arizona.edu/~rts/pubs/VLDBJ99.pdf
I usually design with a subset of this paper's recommendations, using a table containing a primary key only, with another referencing table that has that key as well change_user, valid_from and valid_until colums with appropriate defaults. This makes referential integrity easy, as well as future value insertion and history retention. Index as appropriate, and consider check constraints or triggers to prevent overlaps and gaps if you expose these fields to the application for direct modification. These have an obvious performance overhead.
We then make a "current values view" which is exposed to developers, and is also insertable via an "instead of" trigger.
It's far easier and better to use the History Table pattern for this.
create table foo (
foo_id int primary key,
name text
);
create table foo_history (
foo_id int,
version int,
name text,
operation char(1) check ( operation in ('u','d') ),
modified_at timestamp,
modified_by text
primary key (foo_id, version)
);
Create a trigger to copy a foo row to foo_history on update or delete.
https://wiki.postgresql.org/wiki/Audit_trigger_91plus for a full example with postgres

Is it possible to create a foreign key constraint using "NOT IN" logic

Is it possible to add a foreign key constraint on a table which will allow values which do NOT exist in another table?
In the example below, both tables contain the field USER_ID. The constraint is that a customer and and an employee cannot have the same USER_ID value.
I am very limited in adding new tables or altering one of the tables in any way.
CUSTOMER
--------------------------
USER_ID varchar2(10)
EMPLOYEE
--------------------------
USER_ID varchar2(10)
I thought of a few workarounds, such as a view which contains the data from both tables or adding an insert trigger on the table I can modify.
No, no such thing exists, though it is possible to fake.
If you want to do this relationally (which would be a lot better than views/triggers) the easy option is to add a E to all employee IDs and a C to all customer IDs. However, this won't work if you have other attributes and you want to ensure they're not the same person (i.e. you're not just interested in the ID).
If this is the case you need to create a third table, let's call it PEOPLE:
create table people (
user_id varchar2(10) not null
, user_type varchar2(1) not null
, constraint pk_people primary key (user_id)
, constraint chk_people_user_types check ( user_type in ('C','E') )
);
C would stand for customer and E for employee in the check constraint. You then need to create a unique index/constraint on PEOPLE:
create index ui_people_id_type on people ( user_id, user_type );
Personally, I'd stop here and completely drop your CUSTOMER and EMPLOYEE tables; they no longer have any use and your problem has been solved.
If you don't have the ability to add new columns/tables you need to speak to the people who do and convince them to change it. Over-complicating things only leads to errors in logic and confusion (believe me - using a view means you need a lot of triggers to maintain your tables and you'll have to ensure that someone only ever updates the view). It's a lot easier to do things properly, even if they take longer.
However, if you really want to continue you alter your CUSTOMER and EMPLOYEE tables to include their USER_TYPE and ensure that it's always the same for every row in the table, i.e.:
alter table customers add user_type default 'C' not null;
alter table customers add constraint chk_customers_type
check ( user_type is not null and user_type = 'C' );
Unless you are willing to change the data model as someone else has suggested, the simplest way to proceed with the existing structure while maintaining mutual exclusion is to issue check constraints on the user_ids of both tables such that they validate only to mutually exclusive series.
For example, you could issue checks to ensure that only even numbers are assigned to customers and odd numbers to employees ( or vice-versa).
Or, since both IDS are varchar, stipulate using your check constraint that the ID begins with a known substring, such as 'EMP' or 'CUST'.
But these are only tricks and are hardly relational. Ideally, one would revise the data model. Hope this helps.

ORACLE Table design: M:N table best practice

I'd like to hear your suggestions on this very basic question:
Imagine these three tables:
--DROP TABLE a_to_b;
--DROP TABLE a;
--DROP TABLE b;
CREATE TABLE A
(
ID NUMBER NOT NULL ,
NAME VARCHAR2(20) NOT NULL ,
CONSTRAINT A_PK PRIMARY KEY ( ID ) ENABLE
);
CREATE TABLE B
(
ID NUMBER NOT NULL ,
NAME VARCHAR2(20) NOT NULL ,
CONSTRAINT B_PK PRIMARY KEY ( ID ) ENABLE
);
CREATE TABLE A_TO_B
(
id NUMBER NOT NULL,
a_id NUMBER NOT NULL,
b_id NUMBER NOT NULL,
somevalue1 VARCHAR2(20) NOT NULL,
somevalue2 VARCHAR2(20) NOT NULL,
somevalue3 VARCHAR2(20) NOT NULL
) ;
How would you design table a_to_b?
I'll give some discussion starters:
synthetic id-PK column or combined a_id,b_id-PK (dropping the "id" column)
When synthetic: What other indices/constraints?
When combined: Also index on b_id? Or even b_id,a_id (don't think so)?
Also combined when these entries are referenced themselves?
Also combined when these entries perhaps are referenced themselves in the future?
Heap or Index-organized table
Always or only up to x "somevalue"-columns?
I know that the decision for one of the designs is closely related to the question how the table will be used (read/write ratio, density, etc.), but perhaps we get a 20/80 solution as blueprint for future readers.
I'm looking forward to your ideas!
Blama
I have always made the PK be the combination of the two FKs, a_id and b_id in your example. Adding a synthetic id field to this table does no good, since you never end up looking for a row based on a knowledge of its id.
Using the compound PK gives you a constraint that prevents the same instance of the relationship between a and b from being inserted twice. If duplicate entries need to be permitted, there's something wrong with your data model at the conceptual level.
The index you get behind the scenes (for every DBMS I know of) will be useful to speed up common joins. An extra index on b_id is sometimes useful, depending on the kinds of joins you do frequently.
Just as a side note, I don't use the name "id" for all my synthetic pk columns. I prefer a_id, b_id. It makes it easier to manage the metadata, even though it's a little extra typing.
CREATE TABLE A_TO_B
(
a_id NUMBER NOT NULL REFERENCES A (a_id),
b_id NUMBER NOT NULL REFERENCES B (b_id),
PRIMARY KEY (a_id, b_id),
...
) ;
It's not unusual for ORMs to require (or, in more clueful ORMs, hope for) an integer column named "id" in addition to whatever other keys you have. Apart from that, there's no need for it. An id number like that makes the table wider (which usually degrades I/O performance just slightly), and adds an index that is, strictly speaking, unnecessary. It isn't necessary to identify the entity--the existing key does that--and it leads new developers into bad habits. (Specifically, giving every table an integer column named "id", and believing that that column alone is the only key you need.)
You're likely to need one or more of these indexed.
a_id
b_id
{a_id, b_id}
{b_id, a_id}
I believe Oracle should automatically index {a_id, b_id}, because that's the primary key. Oracle doesn't automatically index foreign keys. Oracle's indexing guidelines are online.
In general, you need to think carefully about whether you need ON UPDATE CASCADE or ON DELETE CASCADE. In Oracle, you only need to think carefully about whether you need ON DELETE CASCADE. (Oracle doesn't support ON UPDATE CASCADE.)
the other comments so far are good.
also consider adding begin_dt and end_dt to the relationship. in this way, you can manage a good number of questions about each relationship through time. (consider baseline issues)

Constraint To Prevent Adding Value Which Exists In Another Table

I would like to add a constraint which prevents adding a value to a column if the value exists in the primary key column of another table. Is this possible?
EDIT:
Table: MasterParts
MasterPartNumber (Primary Key)
Description
....
Table: AlternateParts
MasterPartNumber (Composite Primary Key, Foreign Key to MasterParts.MasterPartNumber)
AlternatePartNumber (Composite Primary Key)
Problem - Alternate part numbers for each master part number must not themselves exist in the master parts table.
EDIT 2:
Here is an example:
MasterParts
MasterPartNumber Decription MinLevel MaxLevel ReOderLevel
010-00820-50 Garmin GTN™ 750 1 5 2
AlternateParts
MasterPartNumber AlternatePartNumber
010-00820-50 0100082050
010-00820-50 GTN750
only way I could think of solving this would be writing a checking function(not sure what language you are working with), or trying to play around with table relationships to ensure that it's unique
Why not have a single "part" table with an "is master part" flag and then have an "alternate parts" table that maps a "master" part to one or more "alternate" parts?
Here's one way to do it without procedural code. I've deliberately left out ON UPDATE CASCADE and ON DELETE CASCADE, but in production I'd might use both. (But I'd severely limit who's allowed to update and delete part numbers.)
-- New tables
create table part_numbers (
pn varchar(50) primary key,
pn_type char(1) not null check (pn_type in ('m', 'a')),
unique (pn, pn_type)
);
create table part_numbers_master (
pn varchar(50) primary key,
pn_type char(1) not null default 'm' check (pn_type = 'm'),
description varchar(100) not null,
foreign key (pn, pn_type) references part_numbers (pn, pn_type)
);
create table part_numbers_alternate (
pn varchar(50) primary key,
pn_type char(1) not null default 'a' check (pn_type = 'a'),
foreign key (pn, pn_type) references part_numbers (pn, pn_type)
);
-- Now, your tables.
create table masterparts (
master_part_number varchar(50) primary key references part_numbers_master,
min_level integer not null default 0 check (min_level >= 0),
max_level integer not null default 0 check (max_level >= min_level),
reorder_level integer not null default 0
check ((reorder_level < max_level) and (reorder_level >= min_level))
);
create table alternateparts (
master_part_number varchar(50) not null references part_numbers_master (pn),
alternate_part_number varchar(50) not null references part_numbers_alternate (pn),
primary key (master_part_number, alternate_part_number)
);
-- Some test data
insert into part_numbers values
('010-00820-50', 'm'),
('0100082050', 'a'),
('GTN750', 'a');
insert into part_numbers_master values
('010-00820-50', 'm', 'Garmin GTN™ 750');
insert into part_numbers_alternate (pn) values
('0100082050'),
('GTN750');
insert into masterparts values
('010-00820-50', 1, 5, 2);
insert into alternateparts values
('010-00820-50', '0100082050'),
('010-00820-50', 'GTN750');
In practice, I'd build updatable views for master parts and for alternate parts, and I'd limit client access to the views. The updatable views would be responsible for managing inserts, updates, and deletes. (Depending on your company's policies, you might use stored procedures instead of updatable views.)
Your design is perfect.
But SQL isn't very helpful when you try to implement such a design. There is no declarative way in SQL to enforce your business rule. You'll have to write two triggers, one for inserts into masterparts, checking the new masterpart identifier doesn't yet exist as an alias, and the other one for inserts of aliases checking that the new alias identifier doesn't yet identiy a masterpart.
Or you can do this in the application, which is worse than triggers, from the data integrity point of view.
(If you want to read up on how to enforce constraints of arbitrary complexity within an SQL engine, best coverage I have seen of the topic is in the book "Applied Mathematics for Database Professionals")
Apart that it sounds like a possibly poor design,
You in essence want values spanning two columns in different tables, to be unique.
In order to utilize DBs native capability to check for uniqueness, you can create a 3rd, helper column, which will contain a copy of all the values inside the wanted two columns. And that column will have uniqueness constraint. So for each new value added to one of your target columns, you need to add the same value to the helper column. In order for this to be an inner DB constraint, you can add this by a trigger.
And again, needing to do the above, sounds like an evidence for a poor design.
--
Edit:
Regarding your edit:
You say " Alternate part numbers for each master part number must not themselves exist in the master parts table."
This itself is a design decision, which you don't explain.
I don't know enough about the domain of your problem, but:
If you think of master and alternate parts, as totally different things, there is no reason why you may want "Alternate part numbers for each master part number must not themselves exist in the master parts table". Otherwise, you have a common notion of "parts" be it master or alternate. This means they need to be in the same table, and column.
If the second is true, you need something like this:
table "parts"
columns:
id - pk
is_master - boolean (assuming a part can not be master and alternate at the same time)
description - text
This tables role is to list and describe the parts.
Then you have several ways to denote which part is alternate to which. It depends on whether a part can be alternate to more than one part. And it sounds that anyway one master part can have several alternates.
You can do it in the same table, or create another one.
If same: add column: alternate_to, which will be null for master parts, and will have a foreign key into the id column of the same table.
Otherwise create a table, say "alternatives" with: master_id, alternate_id both referencing with a foreign key to the parts table.
(The first above assumes that a part cannot be alternate to more than one other part. If this is not true, the second will work anyway)

Database table id-key Null value and referential integrity

I'm learning databases, using SQLce. Got some problems, with this error:
A foreign key value cannot be inserted because a corresponding primary key value does not exist.
How does the integrity and acceptance of data work when attempting to save a data row that does not have specified one foreign key. Isn't it possible to set it to NULL in some way, meaning it will not reference the other table? In case, how would I do that? (For an integer key field)
Also, what if you save a row with a valid foreign key that corresponds to an existing primary key in other table. But then decide to delete that entry in this other table. So the foreign key will no longer be valid. Will I be allowed to delete? How does it work? I would think it should then be simply reset to a null value.. But maybe it's not that simple?
What you need to do is insert your data starting from the parent down.
So if you have an orders table and an items table that refers to orders, you have to create the new order first before adding all the children to the list.
Many of the data access libraries that you can get (in C# there is Linq to SQL) which will try and abstract this problem.
If you need to delete data you actually have to go the other way, delete the items before you delete the parent order record.
Of course, this assumes you are enforcing the foreign key, it is possible to not enforce the key, which might be useful during a bulk delete.
This is because of "bad data" you have in the tables. Check if you have all corresponding values in the primary table.
DBMS checks the referential integrity for ensuring the "correctness" of data within database.
For example, if you have a column called some_id in TableA with values 1 through 10 and a column called some_id in TableB with values 1 through 11 then TableA has no corresponding value (11) for that which you have already in TableB.
You can make a foreign key nullable but I don't recommend it. There are too many problems and inconsistencies that can arise. Redesign your tables so that you don't need to populate the foreign key for values that don't exist. Usually you can do that by moving the column to a new table for example.