How to enforce a uniqueness constraint in a SQL database based on a certain condition - sql

I'm working on a SQL database, and I have the following table:
Workout Routine
id serial PRIMARY KEY,
user_id integer
REFERENCES users (id)
ON DELETE CASCADE
NOT NULL,
name varchar(255) NOT NULL,
active boolean DEFAULT TRUE,
UNIQUE(user_id, name)
Currently, the combination of user_id, and name are supposed to be unique, meaning that a user cannot have two workout routines with the same name. However, this is not quite what I want.
Instead, I would want the combination user_id and name to be unique only in cases where active = true. In other words, a user should be able to have multiple workouts with the same name that are inactive, but should not be allowed to have duplicates that are active.
Is there a way to enforce this in this table?

A partial index can be used for this:
CREATE UNIQUE INDEX ON table_name (user_id, name) WHERE active;
The fiddle

You can use a partial index to achieve this. The index will only be used for queries that include the active column, and will only be used for queries that include the active column with a value of true. This means that queries that do not include the active column will not use the index, and queries that include the active column with a value of false will not use the index.
CREATE UNIQUE INDEX workout_routine_user_id_name_active
ON workout_routine (user_id, name)
WHERE active = true;

I'm an Oracle guy - but perhaps I can be of help here. Yes, you can model what you want in several ways.
One way is to create two tables, one historical, the other current. The historical would have no unique index other than the PK on the surrogate key ID, whereas the current would also have a unique index on user_id and name.
The second way, using a single table, is to add a nullable date field that represents the closed/inactive date. NULL means active, non-NULL (a date value) would mean inactive. Create a unique index (not a PK) on user_id,name,inactive_date. If SQL Server is like Oracle and allows NULL values in a unique constraint but not multiple NULL values, that will enforce that there can be only one instance of a name for a user_id that is current (having a NULL inactive_date), but allows there to be many inactive rows since they would all have different date values.
If SQL Server acts differently than Oracle then check out the "NULLS NOT DISTINCT" option.

Related

Foreign and Primary Key Conceptual Questions

I am a newbie at SQL/PostgreSQL, and I had a conceptual question about foreign keys and keys in general:
Let's say I have two tables: Table A and Table B.
A has a bunch of columns, two of which are A.id, A.seq. The primary key is btree(A.id, A.seq) and it has a foreign key constraint that A.id references B.id. Note that A.seq is a sequential column (first row has value 1, second has 2, etc).
Now say B has a bunch of columns, one of which is the above mentioned B.id. The primary key is btree(B.id).
I have the following questions:
What exactly does btree do? What is the significance of having two column names in the btree rather than just one (as in btree(B.id)).
Why is it important that A references B instead of B referencing A? Why does order matter when it comes to foreign keys??
Thanks so much! Please correct me if I used incorrect terminology for anything.
EDIT: I am using postgres
A btree index stored values in sorted order, which means you can not only search for a single primary key value, but you can also efficiently search for a range of values:
SELECT ... WHERE id between 6060842 AND 8675309
PostgreSQL also supports other index types, but only btree is supported for a unique index (for example, the primary key).
In your B table, the primary key being a single column id means that only one row can exist for each value in id. In other words, it is unique, and if you search for a value by primary key, it will find at most one row (it may also find zero rows if you don't have a row with that value).
The primary key in your A table is for (id, seq). This means you can have multiple rows for each value of id. You can also have multiple rows for each value of seq as long as they are for different id values. The combination must be unique though. You can't have more than one row with the same pair of values.
When a foreign key in A references B, it means that the row must exist in B before you are allowed to store the row in A with the same id value. But the reverse is not necessary.
Example:
Suppose B is for users and A is for phones. You must store a user before you can store a phone record for that user. You can store one or more phones for that user, one per row in the A table. We say that a row in A therefore references the user row in B, meaning, "this phone belongs to user #1234."
But the reverse is not restricted. A user might have no phones (at least not known to this database), so there is no requirement for B to reference A. In other words, it is not required for a user to have a phone. You can store a row in B (the user) even if there is no row in A (the phones) for that user.
The reference also means you are not allowed to DELETE FROM B WHERE id = ? if there are rows in another table that reference that given row in B. Deleting that user would cause those other rows to become orphaned. No one would be able to know who those phones belonged to, if the user row they reference is deleted.
To your questions:
There are several strategies to implement unique keys. The most common one is to use an index that is "unique" using a "b-tree" strategy. That's what "btree" means in PostgreSQL.
Having two columns in a key just depends on how you want to design your table. When you have a key with more than one column that is called a "composite key".
When A references B, the columns in B must represent a "key". The columns in A do not represent a key, but just a reference to one. In fact the values in A for that column can be repeated; that is, multiple rows in A can point to the same row in B.
Your data structure makes no sense. Why would the primary key of A haver both id and name? Normally it would just be id. In some data models, you might have a version or timestamp added. I can't think of a reasonable data model where name would also be included.
In addition, B's foreign key would have to be to both id and name.
But, your question is what is btree for? Most databases don't have such an option. A primary key would typically be expressed as:
id int primary key;
constraint unq_t_id primary key (id);
btree is a type of index -- in fact the default type of index in all databases that I'm aware of. Databases that have a plethora of available types of indexes -- such as Postgres -- you can specify the index type associated with the primary key.

Constraint to limit count of value to one per group

I have a table with an Employee_ID, Position_ID, and Active_Status. The Employee_ID and Position_ID are both foreign keys and a combined key in this table. An employee may hold several positions but they should never have more than one active position at any given time. Is there a constraint that can achive this limitation?
Clearly incorect code below but something like,
CONSTRAINT chkStatus CHECK ((SELECT COUNT(ACTIVE) FROM EMPLOYEE_DETAIL WHERE ACTIVE = 'Y' GROUP BY EMPLOYEE_ID) = 1)
Multirow check constraints (aka assertions) are not supported by any major RDBMS.
SQL Assertions / Declarative multi-row constraints
You could use filtered/partial UNIQUE index instead:
CREATE UNIQUE INDEX idx ON EMPLOYEE_DETAIL(Employee_id) WHERE ACTIVE = 'Y';
Unique Partial Indexes
A partial index definition may include the UNIQUE keyword. If it does, then SQLite requires every entry in the index to be unique. This provides a mechanism for enforcing uniqueness across some subset of the rows in a table.

Allow Foreign Key creation without values but enforce not null on update or add

I need to add a foreign key to a table.
The column value will be added by the users eventually over time as each record is modified.
So I created a FK constraint with NOCHECK. But the column can't be left empty, so I declared it as NOT NULL. SQL Server won't let me do this on a nullable column, it must have a value or a default.
How can I create a FK that at first will have no entries but should it be modified a Key value must be provided?
It sounds like you have an optional relationship - modelling the FK as NULLable while still enforcing the constraint is a common approach here (NULLs FK's will not be checked by the RDBMS). e.g. In the below, we have a Person table, but initially during Person capture, we do not know in which City the user lives, so we make the CityId foreign key nullable:
CREATE TABLE City
(
CityId INT NOT NULL PRIMARY KEY,
Name NVARCHAR(200)
);
CREATE TABLE Person
(
PersonId INT NOT NULL PRIMARY KEY,
CityId INT NULL, -- Nullable
Name NVARCHAR(200),
CONSTRAINT FK_PersonCity FOREIGN KEY(CityID) REFERENCES City(CityID)
);
Person will now allow CityID to be NULL (meaning not known), or if non-null, the CityID must be a valid and exist in City. The only 'gotcha' here is that you will need to LEFT OUTER JOIN Person back to City to consider persons with unknown City.
Another strategy I've seen is to add a DUMMY row into the primary table (e.g. City(ID = 0, Name = 'Unknown') and then use this as a default for foreign keys - it will guarantee an INNER JOIN.
If you need to enforce the requirement that at a later step that the Foreign Key (City classification) becomes mandatory, I would suggest you do this in your business logic - either in code, or from a stored procedure.
When you add a column to an existing table, the new column is created with NULL values. So you can't add a not null column to a table unless you also specify a default value. In that case, the new field will contain the default value.
If you want to add a new column which will be a mandatory FK, you really only have two options:
Create the column as nullable, wait until all the fields have been filled, then change the definition of the column to not null.
Create the column as nullable, execute an update filling the fields with the correct value, then change the definition of the column to not null.
The advantage of the first option is that you're not holding locks on the table for very long. The disadvantage is that it could take weeks or months to finally getting around to populating every row.
The advantage of the second option is that the field is created and filled very quickly and all of it can be placed in one transaction. The disadvantage is that you are locking the entire table for the duration. It could be very difficult obtaining the lock in the first place and your DBA might have a few choice words about locking it for more than a few seconds.
Get with the aforementioned DBA and decide which way is best for your particular situation.

Should I use a unique constraint in a table even though it isn't necessarily required?

In Microsoft SQL Server, when creating tables, are there any downsides to using a unique constraint on a column even though you don't really need it to be unique?
An example would be descriptions for say a role in a user management system:
CREATE TABLE Role
(
ID TINYINT PRIMARY KEY NOT NULL IDENTITY(0, 1),
Title CHARACTER VARYING(32) NOT NULL UNIQUE,
Description CHARACTER VARYING(MAX) NOT NULL UNIQUE
)
My fear is that validating this constraint when doing frequent insertions in other tables will be a very time consuming process. I am unsure as to how this constraint is validated, but I feel like it could be done in a very efficient way or as a linear comparison.
Your fear becomes true: UNIQUE constraint are implemented as indices, and this is time and space consuming.
So, whenever you insert a new row, the database have to update the table, and also one index for each unique constraint.
So, according to you:
using a unique constraint on a column even though you don't really need it to be unique
the answer is no, don't use it. there are time and space downsides.
Your sample table would need a clustered index for the Id, and 2 extra indices, one for each unique constraint. This takes up space, and time to update the 3 indices on the inserts.
This would only be justified if you made queries filtering by those fields.
BY THE WAY:
The original post sample table have several flaws:
that syntax is not SQL Server syntax (and you tagged this as SQL Server)
you cannot create an index in a varchar(max) column
If you correct the syntax and create this table:
CREATE TABLE Role
(
ID tinyint PRIMARY KEY NOT NULL IDENTITY(0, 1),
Title varchar(32) NOT NULL UNIQUE,
Description varchar(32) NOT NULL UNIQUE
)
You can then execute sp_help Role and you'll find the 3 indices.
The database creates an index which backs up the UNIQUE constraint, so it should be very low-cost to do the uniqueness check.
http://msdn.microsoft.com/en-us/library/ms177420.aspx
The Database Engine automatically creates a UNIQUE index to enforce the uniqueness requirement of the UNIQUE constraint. Therefore, if an attempt to insert a duplicate row is made, the Database Engine returns an error message that states the UNIQUE constraint has been violated and does not add the row to the table. Unless a clustered index is explicitly specified, a unique, nonclustered index is created by default to enforce the UNIQUE constraint.
Is it typically a good practice to constrain it if you know the data
will always be unique but it doesn't necessarily need to be unique for
the application to function correctly?
My question to you: would it make sense for two roles to have different titles but the same description? e.g.
INSERT INTO Role ( Title , Description )
VALUES ( 'CEO' , 'Senior manager' ),
( 'CTO' , 'Senior manager' );
To me it would seem to devalue the use of the description; if there were many duplications then it might make more sense to do something more like this:
INSERT INTO Role ( Title )
VALUES ( 'CEO' ),
( 'CTO' );
INSERT INTO SeniorManagers ( Title )
VALUES ( 'CEO' ),
( 'CTO' );
But then again you are not expecting duplicates.
I assume this is a low activity table. You say you fear validating this constraint when doing frequent insertions in other tables. Well, that will not happen (unless there is a trigger we cannot see that might update this table when another table is updated).
Personally, I would ask the designer (business analyst, whatever) to justify not applying a unique constraint. If they cannot then I would impose the unqiue constraint based on common sense. As is usual for such a text column, I would also apply CHECK constraints e.g. to disallow leading/trailing/double spaces, zero-length string, etc.
On SQL Server, the data type tinyint only gives you 256 distinct values. No matter what you do outside of the id column, you're not going to end up with a very big table. It will surely perform quickly even with a dozen indexed columns.
You usually need at least one unique constraint besides the surrogate key, though. If you don't have one, you're liable to end up with data like this.
1 First title First description
2 First title First description
3 First title First description
...
17 Third title Third description
18 First title First description
Tables that permit data like that are usually wrong. Any table that uses foreign key references to this table won't be able to report correctly, say, the number of "First title" used.
I'd argue that allowing multiple, identical titles for roles in a user management system is a design error. I'd probably argue that "title" is a really bad name for that column, too.

Create unique constraint with null columns

I have a table with this layout:
CREATE TABLE Favorites (
FavoriteId uuid NOT NULL PRIMARY KEY,
UserId uuid NOT NULL,
RecipeId uuid NOT NULL,
MenuId uuid
);
I want to create a unique constraint similar to this:
ALTER TABLE Favorites
ADD CONSTRAINT Favorites_UniqueFavorite UNIQUE(UserId, MenuId, RecipeId);
However, this will allow multiple rows with the same (UserId, RecipeId), if MenuId IS NULL. I want to allow NULL in MenuId to store a favorite that has no associated menu, but I only want at most one of these rows per user/recipe pair.
The ideas I have so far are:
Use some hard-coded UUID (such as all zeros) instead of null.
However, MenuId has a FK constraint on each user's menus, so I'd then have to create a special "null" menu for every user which is a hassle.
Check for existence of a null entry using a trigger instead.
I think this is a hassle and I like avoiding triggers wherever possible. Plus, I don't trust them to guarantee my data is never in a bad state.
Just forget about it and check for the previous existence of a null entry in the middle-ware or in a insert function, and don't have this constraint.
I'm using Postgres 9.0. Is there any method I'm overlooking?
Postgres 15 or newer
Postgres 15 adds the clause NULLS NOT DISTINCT. The release notes:
Allow unique constraints and indexes to treat NULL values as not distinct (Peter Eisentraut)
Previously NULL values were always indexed as distinct values, but
this can now be changed by creating constraints and indexes using
UNIQUE NULLS NOT DISTINCT.
With this clause null is treated like just another value, and a UNIQUE constraint does not allow more than one row with the same null value. The task is simple now:
ALTER TABLE favorites
ADD CONSTRAINT favo_uni UNIQUE NULLS NOT DISTINCT (user_id, menu_id, recipe_id);
There are examples in the manual chapter "Unique Constraints".
The clause switches behavior for all keys of the same index. You can't treat null as equal for one key, but not for another.
NULLS DISTINCT remains the default (in line with standard SQL) and does not have to be spelled out.
The same clause works for a UNIQUE index, too:
CREATE UNIQUE INDEX favo_uni_idx
ON favorites (user_id, menu_id, recipe_id) NULLS NOT DISTINCT;
Note the position of the new clause after the key fields.
Postgres 14 or older
Create two partial indexes:
CREATE UNIQUE INDEX favo_3col_uni_idx ON favorites (user_id, menu_id, recipe_id)
WHERE menu_id IS NOT NULL;
CREATE UNIQUE INDEX favo_2col_uni_idx ON favorites (user_id, recipe_id)
WHERE menu_id IS NULL;
This way, there can only be one combination of (user_id, recipe_id) where menu_id IS NULL, effectively implementing the desired constraint.
Possible drawbacks:
You cannot have a foreign key referencing (user_id, menu_id, recipe_id). (It seems unlikely you'd want a FK reference three columns wide - use the PK column instead!)
You cannot base CLUSTER on a partial index.
Queries without a matching WHERE condition cannot use the partial index.
If you need a complete index, you can alternatively drop the WHERE condition from favo_3col_uni_idx and your requirements are still enforced.
The index, now comprising the whole table, overlaps with the other one and gets bigger. Depending on typical queries and the percentage of null values, this may or may not be useful. In extreme situations it may even help to maintain all three indexes (the two partial ones and a total on top).
This is a good solution for a single nullable column, maybe for two. But it gets out of hands quickly for more as you need a separate partial index for every combination of nullable columns, so the number grows binomially. For multiple nullable columns, see instead:
Why doesn't my UNIQUE constraint trigger?
Aside: I advise not to use mixed case identifiers in PostgreSQL.
You could create a unique index with a coalesce on the MenuId:
CREATE UNIQUE INDEX
Favorites_UniqueFavorite ON Favorites
(UserId, COALESCE(MenuId, '00000000-0000-0000-0000-000000000000'), RecipeId);
You'd just need to pick a UUID for the COALESCE that will never occur in "real life". You'd probably never see a zero UUID in real life but you could add a CHECK constraint if you are paranoid (and since they really are out to get you...):
alter table Favorites
add constraint check
(MenuId <> '00000000-0000-0000-0000-000000000000')
You can store favourites with no associated menu in a separate table:
CREATE TABLE FavoriteWithoutMenu
(
FavoriteWithoutMenuId uuid NOT NULL, --Primary key
UserId uuid NOT NULL,
RecipeId uuid NOT NULL,
UNIQUE KEY (UserId, RecipeId)
)
I believe there is an option that combines the previous answers into a more optimal solution.
create table unique_with_nulls (
id serial not null,
name varchar not null,
age int2 not null,
email varchar,
email_discriminator varchar not null generated always as ( coalesce(email::varchar, 0::varchar) ) stored,
constraint uwn_pkey primary key (id)
);
create unique index uwn_name_age_email_uidx on unique_with_nulls(name, age, email_discriminator);
What happens here is that the column email_discriminator will be generated at "insert-or-update-time", as either an actual email, or "0" if the former one is null. Then, your unique index must target the discriminator column.
This way we don't have to create two partial indexes, and we don't loose the ability to use indexed scans on name and age selection only.
Also, you can keep the type of the email column and we don't have any problems with the coalesce function, because email_discriminator is not a foreign key. And you don't have to worry about this column receiving unexpected values because generated columns cannot be written to.
I can see three opinionated drawbacks in this solution, but they are all fine for my needs:
the duplication of data between the email and email_discriminator.
the fact that I must write to a column and read from another.
the need to find a value that is outside the set of acceptable values of email to be the fallback one (and sometimes this could be hard to find or even subjective).
I think there is a semantic problem here. In my view, a user can have a (but only one) favourite recipe to prepare a specific menu. (The OP has menu and recipe mixed up; if I am wrong: please interchange MenuId and RecipeId below)
That implies that {user,menu} should be a unique key in this table. And it should point to exactly one recipe. If the user has no favourite recipe for this specific menu no row should exist for this {user,menu} key pair. Also: the surrogate key (FaVouRiteId) is superfluous: composite primary keys are perfectly valid for relational-mapping tables.
That would lead to the reduced table definition:
CREATE TABLE Favorites
( UserId uuid NOT NULL REFERENCES users(id)
, MenuId uuid NOT NULL REFERENCES menus(id)
, RecipeId uuid NOT NULL REFERENCES recipes(id)
, PRIMARY KEY (UserId, MenuId)
);