SQL can I have a "conditionally unique" constraint on a table? - sql

I've had this come up a couple times in my career, and none of my local peers seems to be able to answer it. Say I have a table that has a "Description" field which is a candidate key, except that sometimes a user will stop halfway through the process. So for maybe 25% of the records this value is null, but for all that are not NULL, it must be unique.
Another example might be a table which must maintain multiple "versions" of a record, and a bit value indicates which one is the "active" one. So the "candidate key" is always populated, but there may be three versions that are identical (with 0 in the active bit) and only one that is active (1 in the active bit).
I have alternate methods to solve these problems (in the first case, enforce the rule code, either in the stored procedure or business layer, and in the second, populate an archive table with a trigger and UNION the tables when I need a history). I don't want alternatives (unless there are demonstrably better solutions), I'm just wondering if any flavor of SQL can express "conditional uniqueness" in this way. I'm using MS SQL, so if there's a way to do it in that, great. I'm mostly just academically interested in the problem.

If you are using SQL Server 2008 a Index filter would maybe your solution:
http://msdn.microsoft.com/en-us/library/ms188783.aspx
This is how I enforce a Unique Index with multiple NULL values
CREATE UNIQUE INDEX [IDX_Blah] ON [tblBlah] ([MyCol]) WHERE [MyCol] IS NOT NULL

In the case of descriptions which are not yet completed, I wouldn't have those in the same table as the finalized descriptions. The final table would then have a unique index or primary key on the description.
In the case of the active/inactive, again I might have separate tables as you did with an "archive" or "history" table, but another possible way to do it in MS SQL Server at least is through the use of an indexed view:
CREATE TABLE Test_Conditionally_Unique
(
my_id INT NOT NULL,
active BIT NOT NULL DEFAULT 0
)
GO
CREATE VIEW dbo.Test_Conditionally_Unique_View
WITH SCHEMABINDING
AS
SELECT
my_id
FROM
dbo.Test_Conditionally_Unique
WHERE
active = 1
GO
CREATE UNIQUE CLUSTERED INDEX IDX1 ON Test_Conditionally_Unique_View (my_id)
GO
INSERT INTO dbo.Test_Conditionally_Unique (my_id, active)
VALUES (1, 0)
INSERT INTO dbo.Test_Conditionally_Unique (my_id, active)
VALUES (1, 0)
INSERT INTO dbo.Test_Conditionally_Unique (my_id, active)
VALUES (1, 0)
INSERT INTO dbo.Test_Conditionally_Unique (my_id, active)
VALUES (1, 1)
INSERT INTO dbo.Test_Conditionally_Unique (my_id, active)
VALUES (2, 0)
INSERT INTO dbo.Test_Conditionally_Unique (my_id, active)
VALUES (2, 1)
INSERT INTO dbo.Test_Conditionally_Unique (my_id, active)
VALUES (2, 1) -- This insert will fail
You could use this same method for the NULL/Valued descriptions as well.

Thanks for the comments, the initial version of this answer was wrong.
Here's a trick using a computed column that effectively allows a nullable unique constraint in SQL Server:
create table NullAndUnique
(
id int identity,
name varchar(50),
uniqueName as case
when name is null then cast(id as varchar(51))
else name + '_' end,
unique(uniqueName)
)
insert into NullAndUnique default values
insert into NullAndUnique default values -- Works
insert into NullAndUnique default values -- not accidentally :)
insert into NullAndUnique (name) values ('Joel')
insert into NullAndUnique (name) values ('Joel') -- Boom!
It basically uses the id when the name is null. The + '_' is to avoid cases where name might be numeric, like 1, which could collide with the id.

I'm not entirely aware of your intended use or your tables, but you could try using a one to one relationship. Split out this "sometimes" unique column into a new table, create the UNIQUE index on that column in the new table and FK back to the original table using the original tables PK. Only have a row in this new table when the "unique" data is supposed to exist.
OLD tables:
TableA
ID pk
Col1 sometimes unique
Col...
NEW tables:
TableA
ID
Col...
TableB
ID PK, FK to TableA.ID
Col1 unique index

Oracle does. A fully null key is not indexed by a Btree in index in Oracle, and Oracle uses Btree indexes to enforce unique constraints.
Assuming one wished to version ID_COLUMN based on the ACTIVE_FLAG being set to 1:
CREATE UNIQUE INDEX idx_versioning_id ON mytable
(CASE active_flag WHEN 0 THEN NULL ELSE active_flag END,
CASE active_flag WHEN 0 THEN NULL ELSE id_column END);

Related

SQL Server: Unique Index on single values of two columns (!!! Not Combination)

I have a table for teams where each team has two codes. A code for teammembers and a code for the teamleader.
TeamId Name MemberCode LeaderCode
--------------------------------------------
1 Team1 CodeXY CodeXYZ
2 Team2 CodeAB CodeBC
...
There are two unique indexes, one on MemberCode and one on LeaderCode securing that MemberCodes and LeaderCodes are unique.
But how can I define the not only MemberCodes itself are unqiue, but MemberCodes and LeaderCodes?
No MemberCode should be a LeaderCode.
Someone got an idea?
P.S.: A unique index on the two columns like Create Unique index UIDX_12 On tbl (MemberCode, LeaderCode) is no option!
With this data structure, I think you would have to have a trigger.
You can reformat the data, so you have one table and (at least) three columns:
TeamId
Code
CodeType
Then you can add constraints:
codetype is only 'member' or 'leader'
code is unique
teamid is in the teamid table
teamid/codetype is unique
This will allow you to store exactly one of each of these values for each team (assuming that the values are not NULL).
In a create table statement, this might look something like:
create table . . .
check codetype in ('member', 'leader'),
unique(code),
teamid references teams(teamid),
unique (teamid, codetype)
. . .
You can enforce this constraint with an indexed view. Something like:
create table dbo.MColumnUnique (
MemberName int not null,
LeaderName int not null
)
go
create table dbo.Two (ID int not null primary key,constraint CK_Two_ID CHECK (ID in (1,2)))
go
insert into dbo.Two(ID) values (1),(2)
go
create view dbo.MColumnUnique_Enforcer (Name)
with schemabinding
as
select
CASE WHEN ID = 1 THEN MemberName ELSE LeaderName END
from
dbo.MColumnUnique
cross join
dbo.Two
go
create unique clustered index IX_MColumnUnique_Enforcer on dbo.MColumnUnique_Enforcer (Name)
go
insert into dbo.MColumnUnique (MemberName,LeaderName) values (1,2),(3,4) --Works
go
insert into dbo.MColumnUnique (MemberName,LeaderName) values (4,5) --Fails
go
insert into dbo.MColumnUnique (MemberName,LeaderName) values (6,6) --Fails
Where hopefully you can see the parallels between my above structure and your tables.
dbo.Two is just a generally helpful helper table that contains exactly two rows, and is used to perform a limited unpivot on the data into a single column.
You could do it with a trigger, but I would use a CHECK CONSTRAINT.
Create a function that takes a varchar parameter (or whatever the datatype you use for MemberCode and LeaderCode), and returns a bit: 0 if there is no LeaderCode or MemberCode that matches the parameter value, or 1 if there is a match.
Then put a check constraint on the table that specifies:
MemberCode <> LeaderCode AND
YourFunction(MemberCode) = 0 AND
YourFunction(LeaderCode) = 0
EDIT based on Damien's comment:
To prevent the function from including the row you just added, you need to also pass the [code] column (which you say is UNIQUE), and not count the row with that value for [code].

Database Normalization using Foreign Key

I have a sample table like below where Course Completion Status of a Student is being stored:
Create Table StudentCourseCompletionStatus
(
CourseCompletionID int primary key identity(1,1),
StudentID int not null,
AlgorithmCourseStatus nvarchar(30),
DatabaseCourseStatus nvarchar(30),
NetworkingCourseStatus nvarchar(30),
MathematicsCourseStatus nvarchar(30),
ProgrammingCourseStatus nvarchar(30)
)
Insert into StudentCourseCompletionStatus Values (1, 'In Progress', 'In Progress', 'Not Started', 'Completed', 'Completed')
Insert into StudentCourseCompletionStatus Values (2, 'Not Started', 'In Progress', 'Not Started', 'Not Applicable', 'Completed')
Now as part of normalizing the schema I have created two other tables - CourseStatusType and Status for storing the Course Status names and Status.
Create Table CourseStatusType
(
CourseStatusTypeID int primary key identity(1,1),
CourseStatusType nvarchar(100) not null
)
Insert into CourseStatusType Values ('AlgorithmCourseStatus')
Insert into CourseStatusType Values ('DatabaseCourseStatus')
Insert into CourseStatusType Values ('NetworkingCourseStatus')
Insert into CourseStatusType Values ('MathematicsCourseStatus')
Insert into CourseStatusType Values ('ProgrammingCourseStatus')
Insert into CourseStatusType Values ('OperatingSystemsCourseStatus')
Insert into CourseStatusType Values ('CompilerCourseStatus')
Create Table Status
(
StatusID int primary key identity(1,1),
StatusName nvarchar (100) not null
)
Insert into Status Values ('Completed')
Insert into Status Values ('Not Started')
Insert into Status Values ('In Progress')
Insert into Status Values ('Not Applicable')
The modified table is as below:
Create Table StudentCourseCompletionStatus1
(
CourseCompletionID int primary key identity(1,1),
StudentID int not null,
CourseStatusTypeID int not null CONSTRAINT [FK_StudentCourseCompletionStatus1_CourseStatusType] FOREIGN KEY (CourseStatusTypeID) REFERENCES dbo.CourseStatusType (CourseStatusTypeID),
StatusID int not null CONSTRAINT [FK_StudentCourseCompletionStatus1_Status] FOREIGN KEY (StatusID) REFERENCES Status (StatusID),
)
I have few question on this:
Is this the correct way to normalize it ? The old table was very helpful to get data easily - I can store a student's course status in a single row, but now 5 rows are required. Is there a better way to do it?
Moving the data from the old table to this new table seems to be not an easy task. Can I achieve this using a query or I have to manually to do this ?
Any help is appreciated.
vou could also consider storing results in flat table like this:
studentID,courseID,status
1,1,"completed"
1,2,"not started"
2,1,"not started"
2,3,"in progress"
you will also need additional Courses table like this
courserId,courseName
1, math
2, programming
3, networking
and a students table
students
1 "john smith"
2 "perry clam"
3 "john deere"
etc..you could also optionally create a status table to store the distinct statusstrings statusstings and refer to their PK instead ofthestrings
studentID,courseID,status
1,1,1
1,2,2
2,1,2
2,3,3
... etc
and status table
id,status
1,"completed"
2,"not started"
3,"in progress"
the beauty of this representation is: it is quite easy to filter and aggregate data , i.e it is easy to query which subjects a particular person have completed, how many subjects are completed by an average student, etc. this things are much more difficult in the columnar design like you had. you can also easily add new subjects without the need to adapt your tables or even queries they,will just work.
you can also always usin SQLs PIVOT query to get it to a familiar columnar presentation like
name,mathstatus,programmingstatus,networkingstatus,etc..
but now 5 rows are required
No, it's still just one row. That row simply contains identifiers for values stored in other tables.
There are pros and cons to this. One of the main reasons to normalize in this way is to protect the integrity of the data. If a column is just a string then anything can be stored there. But if there's a foreign key relationship to a table containing a finite set of values then only one of those options can be stored there. Additionally, if you ever want to change the text of an option or add/remove options, you do it in a centralized place.
Moving the data from the old table to this new table seems to be not an easy task.
No problem at all. Create your new numeric columns on the data table and populate them with the identifiers of the lookup table records associated with each data table record. If they're nullable, you can make them foreign keys right away. If they're not nullable then you need to populate them before you can make them foreign keys. Once you've verified that the data is correct, remove the old de-normalized columns. Done.
In StudentCourseCompletionStatus1 you still need 2 associations to Status and CourseStatusType. So I think you should consider following variant of normalization:
It means, that your StudentCourseCompletionStatus would hold only one CourseStatusID and another table CourseStatus would hold the associations to CourseType and Status.
To move your data you can surely use a query.

SQL : Is it possible to store duplicate null values in a column with unique constraint

How do you define unique constraint in terms of nulls ?
Is there any standard followed by all the databases to allow nulls in unique column.
Quoted from a blog, here is the amount of NULL values allowed in a column with a UNIQUE constraint, in a few DBMS :
What other DBMSs do
Trudy Pelzer and I wrote this on page 260 of our book, SQL Performance
Tuning: DBMS Maximum number of NULLs when there is a UNIQUE
constraint
IBM (DB2) One
Informix One
Ingres Zero
InterBase Zero
Microsoft (SQL Server) One
MySQL Many [although the BDB storage engine was an exception]
Oracle Many
Sybase One
Trying the following code in Sql Server (2012 Express, at least) fails, as expected :
DECLARE #T TABLE
(
value int unique
)
INSERT INTO #T VALUES(1)
INSERT INTO #T VALUES(2)
INSERT INTO #T VALUES(NULL)
INSERT INTO #T VALUES(NULL)
The article mentions that the correct rule about handling null values in a column with a unique constraint should be that the value of the unique column of any two rows, if both are not NULL, should not be equal.
In other words, non null values must be unique, and multiple null values are allowed.
But not all DBMS seems to apply that rule, so it's better to confirm it with a simple test.
"...a UNIQUE column can have multiple NULLs in a row unless you explicitly add a NOT NULL constraint to each column."
Joe Celko's "SQL for Smarties", 2nd edition, Page 16.
For some fun with NULL-s, try these:
SELECT 1 = NULL
(returns NULL)
SELECT 1 IS NULL
(returns FALSE)
It depends on the database engine. MySQL allow null duplicates but SQL Server doesn't.
You can try this on SQLFiddle. MySQL return an exception for the user 'ddd' while MySQL return an exception for the user 'ccc'
CREATE TABLE users (
username varchar(20) NOT NULL,
id int unique
);
insert into users values ('aaa', null);
insert into users values ('bbb', 1);
insert into users values ('ccc', null);
insert into users values ('ddd', 1);

How to add a DB restriction - Check constraint or trigger

I want to know how can I add a DB restriction on a table. I want to simplify the problem with a table in Oracle Database as
CREATE TABLE TEST_STUDENT
(
STUDENT VARCHAR2(30 CHAR),
SUBJECT VARCHAR2(38) ,
IS_LANG NUMBER(1,0)
);
A student can have any number of subjects but only one of them can be a language (IS_LANG).
Valid data would be
Insert into TEST_STUDENT (STUDENT,SUBJECT,IS_LANG) values ('John','Math',);
Insert into TEST_STUDENT (STUDENT,SUBJECT,IS_LANG) values ('John','Science',);
Insert into TEST_STUDENT (STUDENT,SUBJECT,IS_LANG) values ('John','French',1);
Insert into TEST_STUDENT (STUDENT,SUBJECT,IS_LANG) values ('Lily','Math',);
Insert into TEST_STUDENT (STUDENT,SUBJECT,IS_LANG) values ('Lily','English',1);
however, I should not be able to insert fresh data like up the table, something like
Insert into TEST_STUDENT (STUDENT,SUBJECT,IS_LANG) values ('John','English',1);
or
Insert into TEST_STUDENT (STUDENT,SUBJECT,IS_LANG) values ('Lily','French',1);
I don't want to introduce triggers here, unless it is the only way around. I want to have this restrictions because in the actual software would have multiple client implementations trying to insert data into this table.
This one of the good examples for a partial index.
Unfortunately in Oracle you need a workaround to implement a partial index (other DBMS simply allow a WHERE clause to be applied):
create unique index idx_one_language
on test_student
(
case when is_lang = 1 then student else null end
);
This exploits the fact that Oracle does not index tuples where all columns are null. With the above expression only rows where IS_LANG = 1 will be indexed for each student. As the index is defined as unique, only one such row can exist.
Here is a SQLFiddle example: http://sqlfiddle.com/#!4/43394d/1

conditional unique constraint

I have a situation where i need to enforce a unique constraint on a set of columns, but only for one value of a column.
So for example I have a table like Table(ID, Name, RecordStatus).
RecordStatus can only have a value 1 or 2 (active or deleted), and I want to create a unique constraint on (ID, RecordStatus) only when RecordStatus = 1, since I don't care if there are multiple deleted records with the same ID.
Apart from writing triggers, can I do that?
I am using SQL Server 2005.
Behold, the filtered index. From the documentation (emphasis mine):
A filtered index is an optimized nonclustered index especially suited to cover queries that select from a well-defined subset of data. It uses a filter predicate to index a portion of rows in the table. A well-designed filtered index can improve query performance as well as reduce index maintenance and storage costs compared with full-table indexes.
And here's an example combining a unique index with a filter predicate:
create unique index MyIndex
on MyTable(ID)
where RecordStatus = 1;
This essentially enforces uniqueness of ID when RecordStatus is 1.
Following the creation of that index, a uniqueness violation will raise an arror:
Msg 2601, Level 14, State 1, Line 13
Cannot insert duplicate key row in object 'dbo.MyTable' with unique index 'MyIndex'. The duplicate key value is (9999).
Note: the filtered index was introduced in SQL Server 2008. For earlier versions of SQL Server, please see this answer.
Add a check constraint like this. The difference is, you'll return false if Status = 1 and Count > 0.
http://msdn.microsoft.com/en-us/library/ms188258.aspx
CREATE TABLE CheckConstraint
(
Id TINYINT,
Name VARCHAR(50),
RecordStatus TINYINT
)
GO
CREATE FUNCTION CheckActiveCount(
#Id INT
) RETURNS INT AS BEGIN
DECLARE #ret INT;
SELECT #ret = COUNT(*) FROM CheckConstraint WHERE Id = #Id AND RecordStatus = 1;
RETURN #ret;
END;
GO
ALTER TABLE CheckConstraint
ADD CONSTRAINT CheckActiveCountConstraint CHECK (NOT (dbo.CheckActiveCount(Id) > 1 AND RecordStatus = 1));
INSERT INTO CheckConstraint VALUES (1, 'No Problems', 2);
INSERT INTO CheckConstraint VALUES (1, 'No Problems', 2);
INSERT INTO CheckConstraint VALUES (1, 'No Problems', 2);
INSERT INTO CheckConstraint VALUES (1, 'No Problems', 1);
INSERT INTO CheckConstraint VALUES (2, 'Oh no!', 1);
INSERT INTO CheckConstraint VALUES (2, 'Oh no!', 2);
-- Msg 547, Level 16, State 0, Line 14
-- The INSERT statement conflicted with the CHECK constraint "CheckActiveCountConstraint". The conflict occurred in database "TestSchema", table "dbo.CheckConstraint".
INSERT INTO CheckConstraint VALUES (2, 'Oh no!', 1);
SELECT * FROM CheckConstraint;
-- Id Name RecordStatus
-- ---- ------------ ------------
-- 1 No Problems 2
-- 1 No Problems 2
-- 1 No Problems 2
-- 1 No Problems 1
-- 2 Oh no! 1
-- 2 Oh no! 2
ALTER TABLE CheckConstraint
DROP CONSTRAINT CheckActiveCountConstraint;
DROP FUNCTION CheckActiveCount;
DROP TABLE CheckConstraint;
You could move the deleted records to a table that lacks the constraint, and perhaps use a view with UNION of the two tables to preserve the appearance of a single table.
You can do this in a really hacky way...
Create an schemabound view on your table.
CREATE VIEW Whatever
SELECT * FROM Table
WHERE RecordStatus = 1
Now create a unique constraint on the view with the fields you want.
One note about schemabound views though, if you change the underlying tables you will have to recreate the view. Plenty of gotchas because of that.
For those still searching for a solution, I came accross a nice answer, to a similar question and I think this can be still useful for many. While moving deleted records to another table may be a better solution, for those who don't want to move the record can use the idea in the linked answer which is as follows.
Set deleted=0 when the record is available/active.
Set deleted=<row_id or some other unique value> when marking the row
as deleted.
If you can't use NULL as a RecordStatus as Bill's suggested, you could combine his idea with a function-based index. Create a function that returns NULL if the RecordStatus is not one of the values you want to consider in your constraint (and the RecordStatus otherwise) and create an index over that.
That'll have the advantage that you don't have to explicitly examine other rows in the table in your constraint, which could cause you performance issues.
I should say I don't know SQL server at all, but I have successfully used this approach in Oracle.
Because, you are going to allow duplicates, a unique constraint will not work. You can create a check constraint for RecordStatus column and a stored procedure for INSERT that checks the existing active records before inserting duplicate IDs.