Should I use a unique constraint in a table even though it isn't necessarily required? - sql

In Microsoft SQL Server, when creating tables, are there any downsides to using a unique constraint on a column even though you don't really need it to be unique?
An example would be descriptions for say a role in a user management system:
CREATE TABLE Role
(
ID TINYINT PRIMARY KEY NOT NULL IDENTITY(0, 1),
Title CHARACTER VARYING(32) NOT NULL UNIQUE,
Description CHARACTER VARYING(MAX) NOT NULL UNIQUE
)
My fear is that validating this constraint when doing frequent insertions in other tables will be a very time-consuming process. I am unsure how this constraint is validated, but I feel it could be done either very efficiently or as a linear comparison against existing rows.

Your fear is justified: UNIQUE constraints are implemented as indexes, and that costs both time and space.
So whenever you insert a new row, the database has to update the table, plus one index for each unique constraint.
So, regarding your question:
using a unique constraint on a column even though you don't really need it to be unique
the answer is no, don't use it: there are time and space downsides.
Your sample table would need a clustered index for the ID, plus 2 extra indexes, one for each unique constraint. That takes up space, and time to update the 3 indexes on every insert.
This would only be justified if you made queries filtering by those fields.
BY THE WAY:
The sample table in the original post has several flaws:
that syntax is not SQL Server syntax (and you tagged this as SQL Server)
you cannot create an index on a varchar(max) column
If you correct the syntax and create this table:
CREATE TABLE Role
(
ID tinyint IDENTITY(0, 1) NOT NULL PRIMARY KEY,
Title varchar(32) NOT NULL UNIQUE,
Description varchar(32) NOT NULL UNIQUE
)
You can then execute sp_help Role and you'll find the 3 indices.

The database creates an index to back the UNIQUE constraint, so the uniqueness check should be very cheap.
http://msdn.microsoft.com/en-us/library/ms177420.aspx
The Database Engine automatically creates a UNIQUE index to enforce the uniqueness requirement of the UNIQUE constraint. Therefore, if an attempt to insert a duplicate row is made, the Database Engine returns an error message that states the UNIQUE constraint has been violated and does not add the row to the table. Unless a clustered index is explicitly specified, a unique, nonclustered index is created by default to enforce the UNIQUE constraint.
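For illustration, here is what declaring that constraint explicitly could look like in T-SQL. This is only a sketch: the constraint name UQ_Role_Title is an assumption, and NONCLUSTERED is the default anyway (UNIQUE CLUSTERED would only be possible if the table had no clustered index yet).
-- explicit, named unique constraint; NONCLUSTERED is the default and could be omitted
ALTER TABLE Role
    ADD CONSTRAINT UQ_Role_Title UNIQUE NONCLUSTERED (Title);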

Is it typically a good practice to constrain it if you know the data
will always be unique but it doesn't necessarily need to be unique for
the application to function correctly?
My question to you: would it make sense for two roles to have different titles but the same description? e.g.
INSERT INTO Role ( Title , Description )
VALUES ( 'CEO' , 'Senior manager' ),
( 'CTO' , 'Senior manager' );
To me it would seem to devalue the use of the description; if there were many duplications then it might make more sense to do something more like this:
INSERT INTO Role ( Title )
VALUES ( 'CEO' ),
( 'CTO' );
INSERT INTO SeniorManagers ( Title )
VALUES ( 'CEO' ),
( 'CTO' );
But then again you are not expecting duplicates.
I assume this is a low activity table. You say you fear validating this constraint when doing frequent insertions in other tables. Well, that will not happen (unless there is a trigger we cannot see that might update this table when another table is updated).
Personally, I would ask the designer (business analyst, whatever) to justify not applying a unique constraint. If they cannot, then I would impose the unique constraint based on common sense. As is usual for such a text column, I would also apply CHECK constraints, e.g. to disallow leading/trailing/double spaces, zero-length strings, etc.
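For example, such CHECK constraints might look like this in T-SQL; this is only a sketch, the constraint names are made up, and the exact rules are up to the business:
-- LEN() ignores trailing spaces, so this also rejects strings of only spaces
ALTER TABLE Role ADD CONSTRAINT CK_Role_Title_NotBlank
    CHECK (LEN(Title) > 0);
-- no leading or trailing spaces
ALTER TABLE Role ADD CONSTRAINT CK_Role_Title_NoOuterSpaces
    CHECK (Title = LTRIM(RTRIM(Title)));
-- no double spaces
ALTER TABLE Role ADD CONSTRAINT CK_Role_Title_NoDoubleSpaces
    CHECK (CHARINDEX('  ', Title) = 0);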

On SQL Server, the data type tinyint only gives you 256 distinct values. No matter what you do outside of the id column, you're not going to end up with a very big table. It will surely perform quickly even with a dozen indexed columns.
You usually need at least one unique constraint besides the surrogate key, though. If you don't have one, you're liable to end up with data like this.
1 First title First description
2 First title First description
3 First title First description
...
17 Third title Third description
18 First title First description
Tables that permit data like that are usually wrong. Any table that uses foreign key references to this table won't be able to correctly report, say, how many times "First title" is used.
I'd argue that allowing multiple, identical titles for roles in a user management system is a design error. I'd probably argue that "title" is a really bad name for that column, too.

Related

Do I really need PRIMARY KEY when using UNIQUE NOT NULL columns?

My knowledge of SQL is limited and I would appreciate someone who could help me clarify the use of PRIMARY KEY in the following circumstances. I created a table to hold ISO country information. I'm using MariaDB 10, but I believe that is not relevant to the kind of questions I have.
CREATE TABLE IF NOT EXISTS python.country
(
iso_code INTEGER( 3) NOT NULL ,
iso_2_alpha VARCHAR( 2) NOT NULL ,
iso_3_alpha VARCHAR( 3) NOT NULL ,
short_name VARCHAR( 32) NOT NULL ,
long_name VARCHAR( 64) NOT NULL ,
flag_link VARCHAR(2000) DEFAULT(NULL),
CONSTRAINT CK_iso_code CHECK (iso_code > 0 AND iso_code <= 999) ,
CONSTRAINT CK_iso_alpha CHECK (
iso_2_alpha RLIKE BINARY '^[A-Z]+$' AND LENGTH(iso_2_alpha) = 2
AND
iso_3_alpha RLIKE BINARY '^[A-Z]+$' AND LENGTH(iso_3_alpha) = 3
) ,
CONSTRAINT CK_names CHECK (
short_name RLIKE '^\\p{L}+(\\.?[[:blank:]]\\p{L}+)*\\p{L}+$'
AND
long_name RLIKE '^\\p{L}+(\\.?[[:blank:]]\\p{L}+)*\\p{L}+$'
) ,
CONSTRAINT UN_short_name UNIQUE (short_name) ,
CONSTRAINT UN_long_name UNIQUE (long_name) ,
CONSTRAINT UN_iso_2_alpha UNIQUE (iso_2_alpha) ,
CONSTRAINT UN_iso_3_alpha UNIQUE (iso_3_alpha)
-- ???
-- CONSTRAINT PK_country PRIMARY KEY (iso_code,iso_2_alpha,iso_3_alpha)
); -- ENGINE = 'InnoDB';
Question 1: Since all the main columns (iso_code, iso_2_alpha, iso_3_alpha) are NOT NULL and UNIQUE, does it make sense to create a composite PRIMARY KEY? I "believe" it's a waste of space and time when inserting new elements?
Question 2: Can I safely use iso_code as the FOREIGN KEY in another table?
Since all the main columns (iso_code, iso_2_alpha, iso_3_alpha) are NOT NULL and UNIQUE, does it make sense to create a composite PRIMARY KEY? I "believe" it's a waste of space and time when inserting new elements?
Your proposed PK is a superkey over existing keys. It's not necessary in and of itself. You could choose to declare one of your unique key constraints as a PK instead but it's not necessary.
Can I safely use iso_code as the FOREIGN KEY in another table?
If you also mark iso_code as a unique key in this table, that should work fine.
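A minimal sketch of that in MariaDB syntax; the extra UN_iso_code constraint and the referencing city table are assumptions for illustration only:
ALTER TABLE python.country
    ADD CONSTRAINT UN_iso_code UNIQUE (iso_code);

CREATE TABLE python.city
(
    city_id  INT NOT NULL PRIMARY KEY,
    name     VARCHAR(64) NOT NULL,
    iso_code INTEGER(3) NOT NULL,
    -- the unique key on country.iso_code lets it serve as the referenced column
    CONSTRAINT FK_city_country FOREIGN KEY (iso_code) REFERENCES python.country (iso_code)
);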
Some people would recommend that every table always have an autogenerated column marked as PK. That's fine so long as you also enforce the logical keys. Unfortunately, many people will just create that auto-PK and no other keys, which means your data is nonsense.
You've chosen (currently) to just have the logical keys. I think that's fine in this case, especially as several of them (iso_code, iso_2_alpha and iso_3_alpha) are likely to be more compact than the recommended autogenerated column.
Can't comment on performance and efficiency, but one thing with composite keys is that when you use them as a primary key, you have to repeat them in your foreign keys. That is, a PK of (iso_code, iso_2_alpha, iso_3_alpha) means additional FK columns in all the related tables. You also then have to query by these 3 columns in your SQL queries. A bit of a PITA IMO when you can simply use a generic, unique, self-generating column.
If you can use iso_code and you are sure you will never need to insert a duplicate iso_code with a different iso_2_alpha or iso_3_alpha, then go ahead. But to future-proof the table, make it more robust, and anticipate the unexpected, use a new dedicated id column unrelated to the business, IMHO.

RDBMS primary key design for row versioning

I want to design a primary key for my table with row versioning. My table contains 2 main fields, ID and Timestamp, plus a bunch of other fields. For a unique ID, I want to store previous versions of the record. Hence I am making the table's primary key a combination of the ID and timestamp fields.
Hence, to see all the versions of a particular ID, I can run:
Select * from table_name where ID=<ID_value>
To return the most recent version of an ID, I can use
Select * from table_name where ID=<ID_value> ORDER BY timestamp desc
and get the first element.
My question here is: will this query be efficient and run in O(1) instead of scanning the entire table to get all entries matching the same ID, considering the ID field is part of the primary key? Ideally, to get a result in O(1), I should have provided the entire primary key. If it does need to do a full table scan, how else can I design my primary key so that this request is done in O(1)?
The canonical reference on this subject is Effective Timestamping in Databases:
https://www.cs.arizona.edu/~rts/pubs/VLDBJ99.pdf
I usually design with a subset of this paper's recommendations, using a table containing a primary key only, with another referencing table that has that key as well as change_user, valid_from and valid_until columns with appropriate defaults. This makes referential integrity easy, as well as future value insertion and history retention. Index as appropriate, and consider check constraints or triggers to prevent overlaps and gaps if you expose these fields to the application for direct modification. These have an obvious performance overhead.
We then make a "current values view" which is exposed to developers, and is also insertable via an "instead of" trigger.
It's far easier and better to use the History Table pattern for this.
create table foo (
foo_id int primary key,
name text
);
create table foo_history (
foo_id int,
version int,
name text,
operation char(1) check ( operation in ('u','d') ),
modified_at timestamp,
modified_by text,
primary key (foo_id, version)
);
Create a trigger to copy a foo row to foo_history on update or delete.
https://wiki.postgresql.org/wiki/Audit_trigger_91plus for a full example with postgres
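A minimal sketch of such a trigger in PL/pgSQL (the linked wiki page has a more complete version; the function and trigger names here are assumptions):
CREATE OR REPLACE FUNCTION foo_history_copy() RETURNS trigger AS $$
BEGIN
    -- the row lock taken on foo serialises concurrent changes to the same row,
    -- so MAX(version) + 1 is good enough for this sketch
    INSERT INTO foo_history (foo_id, version, name, operation, modified_at, modified_by)
    SELECT OLD.foo_id,
           COALESCE((SELECT MAX(version) FROM foo_history WHERE foo_id = OLD.foo_id), 0) + 1,
           OLD.name,
           CASE TG_OP WHEN 'UPDATE' THEN 'u' ELSE 'd' END,
           now(),
           current_user;
    RETURN OLD;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER foo_history_trg
AFTER UPDATE OR DELETE ON foo
FOR EACH ROW EXECUTE PROCEDURE foo_history_copy();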

Create unique constraint with null columns

I have a table with this layout:
CREATE TABLE Favorites (
FavoriteId uuid NOT NULL PRIMARY KEY,
UserId uuid NOT NULL,
RecipeId uuid NOT NULL,
MenuId uuid
);
I want to create a unique constraint similar to this:
ALTER TABLE Favorites
ADD CONSTRAINT Favorites_UniqueFavorite UNIQUE(UserId, MenuId, RecipeId);
However, this will allow multiple rows with the same (UserId, RecipeId), if MenuId IS NULL. I want to allow NULL in MenuId to store a favorite that has no associated menu, but I only want at most one of these rows per user/recipe pair.
The ideas I have so far are:
Use some hard-coded UUID (such as all zeros) instead of null.
However, MenuId has a FK constraint on each user's menus, so I'd then have to create a special "null" menu for every user which is a hassle.
Check for existence of a null entry using a trigger instead.
I think this is a hassle and I like avoiding triggers wherever possible. Plus, I don't trust them to guarantee my data is never in a bad state.
Just forget about it and check for the previous existence of a null entry in the middle-ware or in a insert function, and don't have this constraint.
I'm using Postgres 9.0. Is there any method I'm overlooking?
Postgres 15 or newer
Postgres 15 adds the clause NULLS NOT DISTINCT. The release notes:
Allow unique constraints and indexes to treat NULL values as not distinct (Peter Eisentraut)
Previously NULL values were always indexed as distinct values, but
this can now be changed by creating constraints and indexes using
UNIQUE NULLS NOT DISTINCT.
With this clause null is treated like just another value, and a UNIQUE constraint does not allow more than one row with the same null value. The task is simple now:
ALTER TABLE favorites
ADD CONSTRAINT favo_uni UNIQUE NULLS NOT DISTINCT (user_id, menu_id, recipe_id);
There are examples in the manual chapter "Unique Constraints".
The clause switches behavior for all keys of the same index. You can't treat null as equal for one key, but not for another.
NULLS DISTINCT remains the default (in line with standard SQL) and does not have to be spelled out.
The same clause works for a UNIQUE index, too:
CREATE UNIQUE INDEX favo_uni_idx
ON favorites (user_id, menu_id, recipe_id) NULLS NOT DISTINCT;
Note the position of the new clause after the key fields.
Postgres 14 or older
Create two partial indexes:
CREATE UNIQUE INDEX favo_3col_uni_idx ON favorites (user_id, menu_id, recipe_id)
WHERE menu_id IS NOT NULL;
CREATE UNIQUE INDEX favo_2col_uni_idx ON favorites (user_id, recipe_id)
WHERE menu_id IS NULL;
This way, there can only be one combination of (user_id, recipe_id) where menu_id IS NULL, effectively implementing the desired constraint.
Possible drawbacks:
You cannot have a foreign key referencing (user_id, menu_id, recipe_id). (It seems unlikely you'd want a FK reference three columns wide - use the PK column instead!)
You cannot base CLUSTER on a partial index.
Queries without a matching WHERE condition cannot use the partial index.
If you need a complete index, you can alternatively drop the WHERE condition from favo_3col_uni_idx and your requirements are still enforced.
The index, now comprising the whole table, overlaps with the other one and gets bigger. Depending on typical queries and the percentage of null values, this may or may not be useful. In extreme situations it may even help to maintain all three indexes (the two partial ones and a total on top).
This is a good solution for a single nullable column, maybe for two. But it gets out of hand quickly for more, since you need a separate partial index for every combination of nullable columns, so the number of indexes grows exponentially with the number of nullable columns. For multiple nullable columns, see instead:
Why doesn't my UNIQUE constraint trigger?
Aside: I advise not to use mixed case identifiers in PostgreSQL.
You could create a unique index with a coalesce on the MenuId:
CREATE UNIQUE INDEX
Favorites_UniqueFavorite ON Favorites
(UserId, COALESCE(MenuId, '00000000-0000-0000-0000-000000000000'), RecipeId);
You'd just need to pick a UUID for the COALESCE that will never occur in "real life". You'd probably never see a zero UUID in real life but you could add a CHECK constraint if you are paranoid (and since they really are out to get you...):
alter table Favorites
add constraint chk_menu_id_not_all_zeros
check (MenuId <> '00000000-0000-0000-0000-000000000000');
You can store favourites with no associated menu in a separate table:
CREATE TABLE FavoriteWithoutMenu
(
FavoriteWithoutMenuId uuid NOT NULL PRIMARY KEY,
UserId uuid NOT NULL,
RecipeId uuid NOT NULL,
UNIQUE (UserId, RecipeId)
)
I believe there is an option that combines the previous answers into a more optimal solution.
create table unique_with_nulls (
id serial not null,
name varchar not null,
age int2 not null,
email varchar,
email_discriminator varchar not null generated always as ( coalesce(email::varchar, 0::varchar) ) stored,
constraint uwn_pkey primary key (id)
);
create unique index uwn_name_age_email_uidx on unique_with_nulls(name, age, email_discriminator);
What happens here is that the column email_discriminator is generated at insert-or-update time, as either the actual email or "0" if the email is null. Your unique index then targets the discriminator column.
This way we don't have to create two partial indexes, and we don't lose the ability to use indexed scans on name and age selection only.
Also, you can keep the type of the email column and we don't have any problems with the coalesce function, because email_discriminator is not a foreign key. And you don't have to worry about this column receiving unexpected values because generated columns cannot be written to.
I can see three opinionated drawbacks in this solution, but they are all fine for my needs:
the duplication of data between the email and email_discriminator.
the fact that I must write to a column and read from another.
the need to find a value that is outside the set of acceptable values of email to be the fallback one (and sometimes this could be hard to find or even subjective).
I think there is a semantic problem here. In my view, a user can have a (but only one) favourite recipe to prepare a specific menu. (The OP has menu and recipe mixed up; if I am wrong: please interchange MenuId and RecipeId below)
That implies that {user, menu} should be a unique key in this table, and it should point to exactly one recipe. If the user has no favourite recipe for this specific menu, no row should exist for this {user, menu} key pair. Also: the surrogate key (FavoriteId) is superfluous: composite primary keys are perfectly valid for relational-mapping tables.
That would lead to the reduced table definition:
CREATE TABLE Favorites
( UserId uuid NOT NULL REFERENCES users(id)
, MenuId uuid NOT NULL REFERENCES menus(id)
, RecipeId uuid NOT NULL REFERENCES recipes(id)
, PRIMARY KEY (UserId, MenuId)
);

Constraint To Prevent Adding Value Which Exists In Another Table

I would like to add a constraint which prevents adding a value to a column if the value exists in the primary key column of another table. Is this possible?
EDIT:
Table: MasterParts
MasterPartNumber (Primary Key)
Description
....
Table: AlternateParts
MasterPartNumber (Composite Primary Key, Foreign Key to MasterParts.MasterPartNumber)
AlternatePartNumber (Composite Primary Key)
Problem - Alternate part numbers for each master part number must not themselves exist in the master parts table.
EDIT 2:
Here is an example:
MasterParts
MasterPartNumber Description MinLevel MaxLevel ReOrderLevel
010-00820-50 Garmin GTN™ 750 1 5 2
AlternateParts
MasterPartNumber AlternatePartNumber
010-00820-50 0100082050
010-00820-50 GTN750
The only way I could think of to solve this would be writing a checking function (not sure what language you are working with), or playing around with table relationships to ensure that it's unique.
Why not have a single "part" table with an "is master part" flag and then have an "alternate parts" table that maps a "master" part to one or more "alternate" parts?
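A rough sketch of that layout (generic SQL; the table and column names are mine):
CREATE TABLE part (
    part_number    VARCHAR(50) PRIMARY KEY,
    is_master_part BOOLEAN NOT NULL,
    description    VARCHAR(100)
);

CREATE TABLE alternate_part (
    master_part_number    VARCHAR(50) NOT NULL REFERENCES part (part_number),
    alternate_part_number VARCHAR(50) NOT NULL REFERENCES part (part_number),
    PRIMARY KEY (master_part_number, alternate_part_number)
);
By itself this still doesn't stop a master part number from also appearing as an alternate; the answers below show ways to enforce that part.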
Here's one way to do it without procedural code. I've deliberately left out ON UPDATE CASCADE and ON DELETE CASCADE, but in production I might use both. (But I'd severely limit who's allowed to update and delete part numbers.)
-- New tables
create table part_numbers (
pn varchar(50) primary key,
pn_type char(1) not null check (pn_type in ('m', 'a')),
unique (pn, pn_type)
);
create table part_numbers_master (
pn varchar(50) primary key,
pn_type char(1) not null default 'm' check (pn_type = 'm'),
description varchar(100) not null,
foreign key (pn, pn_type) references part_numbers (pn, pn_type)
);
create table part_numbers_alternate (
pn varchar(50) primary key,
pn_type char(1) not null default 'a' check (pn_type = 'a'),
foreign key (pn, pn_type) references part_numbers (pn, pn_type)
);
-- Now, your tables.
create table masterparts (
master_part_number varchar(50) primary key references part_numbers_master,
min_level integer not null default 0 check (min_level >= 0),
max_level integer not null default 0 check (max_level >= min_level),
reorder_level integer not null default 0
check ((reorder_level < max_level) and (reorder_level >= min_level))
);
create table alternateparts (
master_part_number varchar(50) not null references part_numbers_master (pn),
alternate_part_number varchar(50) not null references part_numbers_alternate (pn),
primary key (master_part_number, alternate_part_number)
);
-- Some test data
insert into part_numbers values
('010-00820-50', 'm'),
('0100082050', 'a'),
('GTN750', 'a');
insert into part_numbers_master values
('010-00820-50', 'm', 'Garmin GTN™ 750');
insert into part_numbers_alternate (pn) values
('0100082050'),
('GTN750');
insert into masterparts values
('010-00820-50', 1, 5, 2);
insert into alternateparts values
('010-00820-50', '0100082050'),
('010-00820-50', 'GTN750');
In practice, I'd build updatable views for master parts and for alternate parts, and I'd limit client access to the views. The updatable views would be responsible for managing inserts, updates, and deletes. (Depending on your company's policies, you might use stored procedures instead of updatable views.)
Your design is perfect.
But SQL isn't very helpful when you try to implement such a design. There is no declarative way in SQL to enforce your business rule. You'll have to write two triggers: one for inserts into masterparts, checking that the new master part identifier doesn't yet exist as an alias, and one for inserts of aliases, checking that the new alias identifier doesn't yet identify a master part.
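A sketch of what such a pair of triggers could look like in T-SQL, using the table names from the question; the trigger names and error messages are assumptions, not a tested implementation:
CREATE TRIGGER trg_MasterParts_NoAliasClash
ON MasterParts
AFTER INSERT, UPDATE
AS
BEGIN
    -- reject master part numbers that already exist as alternates
    IF EXISTS (SELECT 1 FROM inserted i
               JOIN AlternateParts a ON a.AlternatePartNumber = i.MasterPartNumber)
    BEGIN
        RAISERROR ('Master part number already exists as an alternate part number.', 16, 1);
        ROLLBACK TRANSACTION;
    END
END;
GO
CREATE TRIGGER trg_AlternateParts_NoMasterClash
ON AlternateParts
AFTER INSERT, UPDATE
AS
BEGIN
    -- reject alternate part numbers that already exist as masters
    IF EXISTS (SELECT 1 FROM inserted i
               JOIN MasterParts m ON m.MasterPartNumber = i.AlternatePartNumber)
    BEGIN
        RAISERROR ('Alternate part number already exists as a master part number.', 16, 1);
        ROLLBACK TRANSACTION;
    END
END;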
Or you can do this in the application, which is worse than triggers, from the data integrity point of view.
(If you want to read up on how to enforce constraints of arbitrary complexity within an SQL engine, best coverage I have seen of the topic is in the book "Applied Mathematics for Database Professionals")
Apart from the fact that it sounds like a possibly poor design:
You in essence want values spanning two columns in different tables, to be unique.
In order to utilize the DB's native capability to check for uniqueness, you can create a third, helper column that contains a copy of the values from both target columns, and put the uniqueness constraint on that column. So for each new value added to one of your target columns, you need to add the same value to the helper column. To make this an internal DB constraint, you can do it with a trigger.
And again, needing to do the above sounds like evidence of a poor design.
--
Edit:
Regarding your edit:
You say " Alternate part numbers for each master part number must not themselves exist in the master parts table."
This itself is a design decision, which you don't explain.
I don't know enough about the domain of your problem, but:
If you think of master and alternate parts as totally different things, there is no reason why you would want "Alternate part numbers for each master part number must not themselves exist in the master parts table". Otherwise, you have a common notion of "parts", be it master or alternate. This means they need to be in the same table and column.
If the second is true, you need something like this:
table "parts"
columns:
id - pk
is_master - boolean (assuming a part can not be master and alternate at the same time)
description - text
This table's role is to list and describe the parts.
Then you have several ways to denote which part is an alternate to which. It depends on whether a part can be an alternate to more than one part, and it sounds like, in any case, one master part can have several alternates.
You can do it in the same table, or create another one.
If the same table: add a column alternate_to, which will be null for master parts and will have a foreign key to the id column of the same table (see the sketch below).
Otherwise, create a table, say "alternatives", with master_id and alternate_id, both referencing the parts table with a foreign key.
(The first above assumes that a part cannot be alternate to more than one other part. If this is not true, the second will work anyway)
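A minimal sketch of the same-table option (generic SQL; the table name and the CHECK are my additions):
CREATE TABLE parts (
    id           INT PRIMARY KEY,
    is_master    BOOLEAN NOT NULL,
    description  VARCHAR(100),
    alternate_to INT NULL REFERENCES parts (id),   -- null for master parts
    CHECK (NOT is_master OR alternate_to IS NULL)  -- a master part cannot itself be an alternate
);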

Can I put constraint on column without referring to another table?

I have a text column that should only have 1 of 3 possible strings. To put a constraint on it, I would have to reference another table. Can I instead put the values of the constraint directly on the column without referring to another table?
If this is SQL Server, Oracle, or PostgreSQL, yes, you can use a check constraint.
If it's MySQL, check constraints are recognized but not enforced. You can use an enum, though. If you need a comma-separated list, you can use a set.
However, this is generally frowned upon, since it's definitely not easy to maintain. Just best to create a lookup table and ensure referential integrity through that.
In addition to the CHECK constraint and ENUM data type that others mention, you could also write a trigger to enforce your desired restriction.
I don't necessarily recommend a trigger as a good solution, I'm just pointing out another option that meets your criteria of not referencing a lookup table.
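For example, in MySQL (where CHECK constraints were historically parsed but not enforced), such a trigger might look like the sketch below; the trigger name and the example values are assumptions:
DELIMITER //
CREATE TRIGGER trg_mytable_check_myfield
BEFORE INSERT ON mytable
FOR EACH ROW
BEGIN
    -- a matching BEFORE UPDATE trigger would be needed as well
    IF NEW.myfield NOT IN ('first', 'second', 'third') THEN
        SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Invalid value for myfield';
    END IF;
END//
DELIMITER ;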
My habit is to define lookup tables instead of using constraints or triggers, when the rule is simply to restrict a column to a finite set of values. The performance impact of checking against a lookup table is no worse than using CHECK constraints or triggers, and it's a lot easier to manage when the set of values might change from time to time.
Also, a common task is to query the set of permitted values, for instance to populate a form field in the user interface. When the permitted values are in a lookup table, this is a lot easier than when they're defined in a list of literal values in a CHECK constraint or ENUM definition.
Re comment "how exactly to do lookup without id"
CREATE TABLE LookupStrings (
string VARCHAR(20) PRIMARY KEY
);
CREATE TABLE MainTable (
main_id INT PRIMARY KEY,
string VARCHAR(20) NOT NULL,
FOREIGN KEY (string) REFERENCES LookupStrings (string)
);
Now you can be assured that no value in MainTable.string is invalid, since the referential integrity prevents that. But you don't have to join to the LookupStrings table to get the string, when you query MainTable:
SELECT main_id, string FROM MainTable;
See? No join! But you get the string value.
Re comment about multiple foreign key columns:
You can have two individual foreign keys, each potentially pointing to different rows in the lookup table. The foreign key column doesn't have to be named the same as the column in the referenced table.
My common example is a bug-tracking database, where a bug was reported by one user, but assigned to be fixed by a different user. Both reported_by and assigned_to are foreign keys referencing the Accounts table.
CREATE TABLE Bugs (
bug_id INT PRIMARY KEY,
reported_by INT NOT NULL,
assigned_to INT,
FOREIGN KEY (reported_by) REFERENCES Accounts (account_id),
FOREIGN KEY (assigned_to) REFERENCES Accounts (account_id)
);
In Oracle, SQL Server and PostgreSQL, use CHECK constraint.
CREATE TABLE mytable (myfield VARCHAR(50) CHECK (myfield IN ('first', 'second', 'third')));
In MySQL, use ENUM datatype:
CREATE TABLE mytable (myfield ENUM ('first', 'second', 'third'))