MySQL - Prevent duplicate records in table via index? - sql

Using MySQL 5
I have a table like this:
date (varchar)
door (varchar)
shift (varchar)
route (varchar)
trailer (varchar)
+ other fields
This table contains user generated content (copied in from another 'master' table) and to prevent the users from creating the same data more than 1x the table has a unique index created based on the fields specified above.
The problem is that the "duplicate prevention" index doesn't work.
Users can still add in duplicate records with no errors being reported.
Is this problem due to my not understanding something about how indexes work?
Or
Is it a possible conflict with the primary key field (autoincrementing int)?
The CREATE TABLE looks like this:
CREATE TABLE /*!32312 IF NOT EXISTS*/ "tableA" (
"Date" varchar(12) default NULL,
"door" varchar(12) default NULL,
"Shift" varchar(45) default NULL,
"route" varchar(20) default NULL,
"trailer" varchar(45) default NULL,
"fieldA" varchar(45) default NULL,
"fieldB" varchar(45) default NULL,
"fieldC" varchar(45) default NULL,
"id" int(10) unsigned NOT NULL auto_increment,
PRIMARY KEY ("id"),
UNIQUE KEY "duplicate_preventer" ("Date","door","Shift","route","trailer"),
A row duplicated is:
date door shift route trailer
10/4/2009 W17 1st Shift TEST-01 NULL
10/4/2009 W17 1st Shift TEST-01 NULL

Users can still add in duplicate records with no errors being reported.
What do you mean by "duplicate records"?
Depending on collation, case, accent etc. may matter, and 'test' and 'TEST' will not be considered duplicates.
Could you please post the results of SHOW CREATE TABLE mytable?
Also, could you please run this query:
SELECT date, door, shift, route, trailer
FROM mytable
GROUP BY
date, door, shift, route, trailer
HAVING COUNT(*) > 1
If it returns the rows, the problem is with the index; if it does not, the problem is with your definition of a "duplicate".
Update:
Your columns allow NULLs.
NULL values in MySQL are not considered duplicate from the point of view of a UNIQUE index:
CREATE TABLE testtable (door VARCHAR(20), shift VARCHAR(15), UNIQUE KEY (door, shift));
INSERT
INTO testtable
VALUES
('door', NULL),
('door', NULL);
SELECT door, shift
FROM testtable
GROUP BY
door, shift
HAVING COUNT(*) > 1;
From documentation:
A UNIQUE index creates a constraint such that all values in the index must be distinct. An error occurs if you try to add a new row with a key value that matches an existing row. This constraint does not apply to NULL values except for the BDB storage engine. For other engines, a UNIQUE index allows multiple NULL values for columns that can contain NULL. If you specify a prefix value for a column in a UNIQUE index, the column values must be unique within the prefix.

Are you sure that you are using unique index instead of a normal index?
create unique index uix on my_table (date, door, shift, route, trailer);
Also that kind of index only makes sure that combination of fields is unique, you can for example have several duplicate dates if, for example, field door is different on every row. The difference could something that is hard to spot, for example a space in end of the value or lowercase/uppercase difference.
Update: your unique index seems to be in order. The problem is elsewhere.

I think you'd want to create a unique constraint on the fields you don't want duplicated. This will in turn create a unique index.
Like this:
ALTER TABLE YourTable
ADD CONSTRAINT uc_yourconstraintname UNIQUE (date, door, shift, route, trailer)

Related

Create unique constraint with null columns

I have a table with this layout:
CREATE TABLE Favorites (
FavoriteId uuid NOT NULL PRIMARY KEY,
UserId uuid NOT NULL,
RecipeId uuid NOT NULL,
MenuId uuid
);
I want to create a unique constraint similar to this:
ALTER TABLE Favorites
ADD CONSTRAINT Favorites_UniqueFavorite UNIQUE(UserId, MenuId, RecipeId);
However, this will allow multiple rows with the same (UserId, RecipeId), if MenuId IS NULL. I want to allow NULL in MenuId to store a favorite that has no associated menu, but I only want at most one of these rows per user/recipe pair.
The ideas I have so far are:
Use some hard-coded UUID (such as all zeros) instead of null.
However, MenuId has a FK constraint on each user's menus, so I'd then have to create a special "null" menu for every user which is a hassle.
Check for existence of a null entry using a trigger instead.
I think this is a hassle and I like avoiding triggers wherever possible. Plus, I don't trust them to guarantee my data is never in a bad state.
Just forget about it and check for the previous existence of a null entry in the middle-ware or in a insert function, and don't have this constraint.
I'm using Postgres 9.0. Is there any method I'm overlooking?
Postgres 15 or newer
Postgres 15 adds the clause NULLS NOT DISTINCT. The release notes:
Allow unique constraints and indexes to treat NULL values as not distinct (Peter Eisentraut)
Previously NULL values were always indexed as distinct values, but
this can now be changed by creating constraints and indexes using
UNIQUE NULLS NOT DISTINCT.
With this clause null is treated like just another value, and a UNIQUE constraint does not allow more than one row with the same null value. The task is simple now:
ALTER TABLE favorites
ADD CONSTRAINT favo_uni UNIQUE NULLS NOT DISTINCT (user_id, menu_id, recipe_id);
There are examples in the manual chapter "Unique Constraints".
The clause switches behavior for all keys of the same index. You can't treat null as equal for one key, but not for another.
NULLS DISTINCT remains the default (in line with standard SQL) and does not have to be spelled out.
The same clause works for a UNIQUE index, too:
CREATE UNIQUE INDEX favo_uni_idx
ON favorites (user_id, menu_id, recipe_id) NULLS NOT DISTINCT;
Note the position of the new clause after the key fields.
Postgres 14 or older
Create two partial indexes:
CREATE UNIQUE INDEX favo_3col_uni_idx ON favorites (user_id, menu_id, recipe_id)
WHERE menu_id IS NOT NULL;
CREATE UNIQUE INDEX favo_2col_uni_idx ON favorites (user_id, recipe_id)
WHERE menu_id IS NULL;
This way, there can only be one combination of (user_id, recipe_id) where menu_id IS NULL, effectively implementing the desired constraint.
Possible drawbacks:
You cannot have a foreign key referencing (user_id, menu_id, recipe_id). (It seems unlikely you'd want a FK reference three columns wide - use the PK column instead!)
You cannot base CLUSTER on a partial index.
Queries without a matching WHERE condition cannot use the partial index.
If you need a complete index, you can alternatively drop the WHERE condition from favo_3col_uni_idx and your requirements are still enforced.
The index, now comprising the whole table, overlaps with the other one and gets bigger. Depending on typical queries and the percentage of null values, this may or may not be useful. In extreme situations it may even help to maintain all three indexes (the two partial ones and a total on top).
This is a good solution for a single nullable column, maybe for two. But it gets out of hands quickly for more as you need a separate partial index for every combination of nullable columns, so the number grows binomially. For multiple nullable columns, see instead:
Why doesn't my UNIQUE constraint trigger?
Aside: I advise not to use mixed case identifiers in PostgreSQL.
You could create a unique index with a coalesce on the MenuId:
CREATE UNIQUE INDEX
Favorites_UniqueFavorite ON Favorites
(UserId, COALESCE(MenuId, '00000000-0000-0000-0000-000000000000'), RecipeId);
You'd just need to pick a UUID for the COALESCE that will never occur in "real life". You'd probably never see a zero UUID in real life but you could add a CHECK constraint if you are paranoid (and since they really are out to get you...):
alter table Favorites
add constraint check
(MenuId <> '00000000-0000-0000-0000-000000000000')
You can store favourites with no associated menu in a separate table:
CREATE TABLE FavoriteWithoutMenu
(
FavoriteWithoutMenuId uuid NOT NULL, --Primary key
UserId uuid NOT NULL,
RecipeId uuid NOT NULL,
UNIQUE KEY (UserId, RecipeId)
)
I believe there is an option that combines the previous answers into a more optimal solution.
create table unique_with_nulls (
id serial not null,
name varchar not null,
age int2 not null,
email varchar,
email_discriminator varchar not null generated always as ( coalesce(email::varchar, 0::varchar) ) stored,
constraint uwn_pkey primary key (id)
);
create unique index uwn_name_age_email_uidx on unique_with_nulls(name, age, email_discriminator);
What happens here is that the column email_discriminator will be generated at "insert-or-update-time", as either an actual email, or "0" if the former one is null. Then, your unique index must target the discriminator column.
This way we don't have to create two partial indexes, and we don't loose the ability to use indexed scans on name and age selection only.
Also, you can keep the type of the email column and we don't have any problems with the coalesce function, because email_discriminator is not a foreign key. And you don't have to worry about this column receiving unexpected values because generated columns cannot be written to.
I can see three opinionated drawbacks in this solution, but they are all fine for my needs:
the duplication of data between the email and email_discriminator.
the fact that I must write to a column and read from another.
the need to find a value that is outside the set of acceptable values of email to be the fallback one (and sometimes this could be hard to find or even subjective).
I think there is a semantic problem here. In my view, a user can have a (but only one) favourite recipe to prepare a specific menu. (The OP has menu and recipe mixed up; if I am wrong: please interchange MenuId and RecipeId below)
That implies that {user,menu} should be a unique key in this table. And it should point to exactly one recipe. If the user has no favourite recipe for this specific menu no row should exist for this {user,menu} key pair. Also: the surrogate key (FaVouRiteId) is superfluous: composite primary keys are perfectly valid for relational-mapping tables.
That would lead to the reduced table definition:
CREATE TABLE Favorites
( UserId uuid NOT NULL REFERENCES users(id)
, MenuId uuid NOT NULL REFERENCES menus(id)
, RecipeId uuid NOT NULL REFERENCES recipes(id)
, PRIMARY KEY (UserId, MenuId)
);

Will multiply insert requests to the same table with direct query and store-procedure cause collision?

Multiply users can call store procedure(SP), that will make some changes to mytable in SQL Server. This SP should insert some rows to mytable that has reference to itself through parentid column.
TABLE mytable(
id int identity(1,1) primary key,
name varchar(20) not null,
parentId int not null foreign key references mytable(id)
)
in order to insert row to such table, accordingly to other posts, I have 2 ways:
Allow null to parentid column by ALTER TABLE mytable alter column parentid int null;, insert the row, update parentid and than disable null to parentid
Allow IDENTITY by set identity_insert maytable on, insert dummy row with id=-1 and parentid=-1, insert the correct row with reference to -1, update the parentid to SCOPE_IDENTITY() and in the end set IDENTITY to off
The case:
Assume I take the 2nd way. SP managed to set identity_insert mytable on BUT didn't yet finished the execution of the rest SP. At this time, there are other INSERT requests(NOT through SP) to the mytable table like INSERT INTO mytable(name,parentid) VALUES('theateist', -1). No id is specified because they assumed that IDENTITY is off and therefore id is auto-incremental.
The Question:
Will this cause errors while inserting because IDENTITY, in this period of time, is ON and not auto-incremental any more and therefore it will require id specification? If yes, it will be better to use the 1st way, isn't it?
Thank you
identity_insert is a per-connection setting - you won't affect other connections/statements running against this table.
I definitely wouldn't suggest going the first way, if it could be avoided, since it could impact other users of the table - e.g. some other connection could do a broken insert (parentid=null) while the column definition allows it, and then your stored proc will break. Also, setting a column not null forces a full table scan to occur, so this won't work well as the table grows.
If you did stick with method 2, you've still got an issue with what happens if two connections run this stored proc simultaneously - they'll both want to insert the -1 row, at different times, and delete it also. You'll have conflicts.
I'm guessing the problem you're having is inserting the "roots" of the tree(s), since they have no parent, and so you're attempting to have them self referencing. I'd instead probably make the roots have a null parentid permanently. If there's some other key column(s), these could be used in a filtered index or indexed view to ensure that only one root exists for each key.
Imagine that we're building some form of family trees, and ignoring most of the realities of such beasts (such as most families requiring children to have two parents):
CREATE TABLE People (
PersonID int IDENTITY(1,1) not null,
Surname varchar(30) not null,
Forename varchar(30) not null,
ParentID int null,
constraint PK_People PRIMARY KEY (PersonID),
constraint FK_People_Parents FOREIGN KEY (ParentID) references People (PersonID)
)
CREATE UNIQUE INDEX IX_SoleFamilyRoot ON People (Surname) WHERE (ParentID is null)
This ensures that, within each family (as identified by the surname), exactly one person has a null ParentID. Hopefully, you can modify this example to fit your model.
On SQL Server 2005 and earlier, you have to use an indexed view instead.

difference between key column and non key one

Sorry for the newbie question.
I have define a table in a following way
CREATE TABLE `class1` (
`name` varchar(20) NOT NULL,
`familyname` varchar(20) NOT NULL,
`id` int(11) DEFAULT NULL,
KEY `class1` (`name`,`familyname`));
Can you explain me what is the difference here that i have here a name and family name as key.
I have to enter values to the table and run queries but i can't understand what it gives me ? If i not define name and family name as keys i get the same results .
A key, otherwise known as an index, will not change the results of a query, but sometimes it can make the query faster.
If you're interested, you can read about different kinds of keys here.
The KEY keyword in the create table syntax in mysql is a synonym for INDEX and will create an index on the 'name', 'familyname' columns.
This will not effect any constrains on the table but will make a query on the table faster when using the 'name' and 'familyname' columns in the where clause of a select statement.
If you wanted to create a primary key on those 2 columns you should use:
CREATE TABLE `class1` (
`name` varchar(20) NOT NULL,
`familyname` varchar(20) NOT NULL,
`id` int(11) DEFAULT NULL,
PRIMARY KEY (`name`,`familyname`));
This will stop you being able to insert multiple rows with the same 'name' and 'familyname' combination.
If you want to get data from your table, based upon the key or index fields (as said above), your query will execute faster, because the table will keep indexes about those values.
But, there is a downside to this as well. Adding unneeded indexes will have a negative effect on your performance. So to sum this all up: Indexes can speed up your database, but always check if they are really needed before adding them.

SQL - How can I apply a "semi-unique" constraint?

I have a (simplified) table consisting of three columns:
id INT PRIMARY KEY NOT NULL AUTO_INCREMENT,
foreignID INT NOT NULL,
name VARCHAR NOT NULL
Basically, I would like to add a constraint (at the database level rather than at the application level) where it only possible for one unique 'name' to exist per foreignID. For example, given the data (id, foreignid, name):
1,1,Name1
2,1,Name2
3,1,Name3
4,2,Name1
5,2,Name2
I want the constraint to fail if the user tries to insert another 'Name3' under foreignId 1, but succeed if the user tries to insert 'Name3' under foreignId 2. For this reason I cannot simply make the whole column UNIQUE.
I am having difficulty coming up with a SQL expression to achieve this, can anybody help me?
Thanks
Add a unique constraint on (ForeignID, Name). The exact syntax for that depends on your DBMS. For SQL Server:
CREATE UNIQUE INDEX IndexName ON YourTable (ForeignID, Name)
(The terms "unique constraint", "unique key" and "unique index" mean roughly the same.)
create a composite (multiple - column ) key... on those two columns
Create Unique Index MyIndexName On TableName(ForeignID, Name)

How to create a unique index on a NULL column?

I am using SQL Server 2005. I want to constrain the values in a column to be unique, while allowing NULLS.
My current solution involves a unique index on a view like so:
CREATE VIEW vw_unq WITH SCHEMABINDING AS
SELECT Column1
FROM MyTable
WHERE Column1 IS NOT NULL
CREATE UNIQUE CLUSTERED INDEX unq_idx ON vw_unq (Column1)
Any better ideas?
Using SQL Server 2008, you can create a filtered index.
CREATE UNIQUE INDEX AK_MyTable_Column1 ON MyTable (Column1) WHERE Column1 IS NOT NULL
Another option is a trigger to check uniqueness, but this could affect performance.
The calculated column trick is widely known as a "nullbuster"; my notes credit Steve Kass:
CREATE TABLE dupNulls (
pk int identity(1,1) primary key,
X int NULL,
nullbuster as (case when X is null then pk else 0 end),
CONSTRAINT dupNulls_uqX UNIQUE (X,nullbuster)
)
Pretty sure you can't do that, as it violates the purpose of uniques.
However, this person seems to have a decent work around:
http://sqlservercodebook.blogspot.com/2008/04/multiple-null-values-in-unique-index-in.html
It is possible to use filter predicates to specify which rows to include in the index.
From the documentation:
WHERE <filter_predicate> Creates a filtered index by specifying which
rows to include in the index. The filtered index must be a
nonclustered index on a table. Creates filtered statistics for the
data rows in the filtered index.
Example:
CREATE TABLE Table1 (
NullableCol int NULL
)
CREATE UNIQUE INDEX IX_Table1 ON Table1 (NullableCol) WHERE NullableCol IS NOT NULL;
Strictly speaking, a unique nullable column (or set of columns) can be NULL (or a record of NULLs) only once, since having the same value (and this includes NULL) more than once obviously violates the unique constraint.
However, that doesn't mean the concept of "unique nullable columns" is valid; to actually implement it in any relational database we just have to bear in mind that this kind of databases are meant to be normalized to properly work, and normalization usually involves the addition of several (non-entity) extra tables to establish relationships between the entities.
Let's work a basic example considering only one "unique nullable column", it's easy to expand it to more such columns.
Suppose we the information represented by a table like this:
create table the_entity_incorrect
(
id integer,
uniqnull integer null, /* we want this to be "unique and nullable" */
primary key (id)
);
We can do it by putting uniqnull apart and adding a second table to establish a relationship between uniqnull values and the_entity (rather than having uniqnull "inside" the_entity):
create table the_entity
(
id integer,
primary key(id)
);
create table the_relation
(
the_entity_id integer not null,
uniqnull integer not null,
unique(the_entity_id),
unique(uniqnull),
/* primary key can be both or either of the_entity_id or uniqnull */
primary key (the_entity_id, uniqnull),
foreign key (the_entity_id) references the_entity(id)
);
To associate a value of uniqnull to a row in the_entity we need to also add a row in the_relation.
For rows in the_entity were no uniqnull values are associated (i.e. for the ones we would put NULL in the_entity_incorrect) we simply do not add a row in the_relation.
Note that values for uniqnull will be unique for all the_relation, and also notice that for each value in the_entity there can be at most one value in the_relation, since the primary and foreign keys on it enforce this.
Then, if a value of 5 for uniqnull is to be associated with an the_entity id of 3, we need to:
start transaction;
insert into the_entity (id) values (3);
insert into the_relation (the_entity_id, uniqnull) values (3, 5);
commit;
And, if an id value of 10 for the_entity has no uniqnull counterpart, we only do:
start transaction;
insert into the_entity (id) values (10);
commit;
To denormalize this information and obtain the data a table like the_entity_incorrect would hold, we need to:
select
id, uniqnull
from
the_entity left outer join the_relation
on
the_entity.id = the_relation.the_entity_id
;
The "left outer join" operator ensures all rows from the_entity will appear in the result, putting NULL in the uniqnull column when no matching columns are present in the_relation.
Remember, any effort spent for some days (or weeks or months) in designing a well normalized database (and the corresponding denormalizing views and procedures) will save you years (or decades) of pain and wasted resources.