SQL: dealing with unique values in a table when a UNIQUE key constraint isn't applicable

What do you do when you need to maintain a table with unique values but can't use a UNIQUE constraint?
For example, I use MySQL and want to map my urls to ids. So I create a table:
CREATE TABLE url (id INTEGER PRIMARY KEY AUTO_INCREMENT, url VARCHAR(2048));
The problem is that MySQL doesn't allow a unique index on a field larger than 1000 bytes.
In general, how do you atomically insert a row only if it doesn't already exist?

You could create an extra field which is the hash of the URL, e.g. md5, and make that hash field unique. You can then be sure the URL column stays unique, and with almost 100% certainty (barring hash collisions) you can insert a new URL if it isn't already there.
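A minimal sketch of that idea, reworking the table from the question (the hash column name is an assumption):
CREATE TABLE url (
    id INTEGER PRIMARY KEY AUTO_INCREMENT,
    url VARCHAR(2048) NOT NULL,
    url_hash CHAR(32) NOT NULL,
    UNIQUE KEY uq_url_hash (url_hash) -- 32 bytes, well under the index size limit
);

-- compute the hash on insert; a duplicate URL now fails the unique check
INSERT INTO url (url, url_hash)
VALUES ('http://example.com/', MD5('http://example.com/'));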
It is tempting to use a table lock; however, LOCK TABLES implicitly commits the transaction you are working on: http://www.databasesandlife.com/mysql-lock-tables-does-an-implicit-commit/
You could create a single-row InnoDB table (e.g. named mutex), insert a row into it, and do a SELECT ... FOR UPDATE on that row to create a lock which is compatible with transactions. It's nasty, but that's the way I do table locks in MySQL in my applications :(
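A rough sketch of that pattern (the table name and row value are assumptions):
CREATE TABLE mutex (id INT PRIMARY KEY) ENGINE=InnoDB;
INSERT INTO mutex VALUES (1);

START TRANSACTION;
-- other sessions block here until this transaction commits or rolls back
SELECT * FROM mutex WHERE id = 1 FOR UPDATE;
-- ... check whether the URL exists and insert it if not ...
COMMIT;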

You could use a NOT EXISTS condition. Note that INSERT ... VALUES cannot take a WHERE clause, so this has to be written as INSERT ... SELECT:
INSERT INTO YourTable (url)
SELECT 'blah blah blah'
WHERE NOT EXISTS
(
    SELECT *
    FROM YourTable
    WHERE url = 'blah blah blah'
);

In my opinion the best way to handle it is to write a trigger. The trigger checks each new value against the existing rows and raises an error on a duplicate. That said, I don't think a URL will go beyond 1000 characters, but if it does in your case, a trigger is a way to handle the uniqueness.
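A rough sketch of such a trigger, assuming MySQL 5.5+ (for SIGNAL) and the url table from the question:
DELIMITER //
CREATE TRIGGER url_no_dups BEFORE INSERT ON url
FOR EACH ROW
BEGIN
    -- raise an error if the URL is already present
    IF EXISTS (SELECT 1 FROM url WHERE url = NEW.url) THEN
        SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Duplicate URL';
    END IF;
END//
DELIMITER ;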

Related

How can I prevent having the same values for one user

I have these tables:
https://ibb.co/sPn5zT7
In the UserPl table, ProgrammingLanguageId and KnowledgeId are foreign keys, connected to the primary keys of the Knowledge and ProgrammingLanguage tables.
I want to ensure that when I insert, for example,
insert into userPLs values(1,'a7ac3486-e852-42c0-a458-9075eb5ed7d7','Doe',1,1)
where Doe says that he knows C# with basic knowledge, it is then impossible to insert something like this for Doe:
insert into userPLs values(1,'a7ac3486-e852-42c0-a458-9075eb5ed7d7','Doe',1,2)
or
insert into userPLs values(2,'a7ac3486-e852-42c0-a458-9075eb5ed7d7','Doe',1,2)
because he once said that his knowledge of C# is basic.
I AM USING MS SQL SERVER
How can I achieve this?
Try setting a unique index where required.
You can prevent the insert with a constraint.
alter table UserPl
add constraint UserLanguageSkillLevel
unique (UserId, ProgrammingLanguageId);
You'll still have to catch failed inserts, or modify the front end to eliminate the opportunity to enter contradictory information in the first place.
A uniqueness constraint is ultimately enforced with an index. If you create a unique index directly, rather than via a constraint, you can also apply the IGNORE_DUP_KEY index setting and let the engine silently discard bad inserts. I won't endorse that as an ideal approach, but it might be useful as a temporary stopgap.
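For illustration, a sketch of that variant (the index name is an assumption):
CREATE UNIQUE NONCLUSTERED INDEX IX_UserPl_UserLanguage
ON UserPl (UserId, ProgrammingLanguageId)
WITH (IGNORE_DUP_KEY = ON); -- duplicate rows are discarded with a warning instead of an error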
Having a primary key / unique clustered index on the table UserPl would enforce whichever combination you need, i.e.:
If a user cannot know multiple programming languages, then the key is
Create unique clustered index CLU_UserPL on UserPl (UserID)
If a user can know multiple programming languages, but cannot have multiple levels of knowledge in one language, then the key is
Create unique clustered index CLU_UserPL on UserPl (UserID, ProgrammingLanguageID)
If a user can know multiple programming languages and can also have multiple levels of knowledge in them, then the key is
Create unique clustered index CLU_UserPL on UserPl (RecID) -- might be a new identity column
or
Create unique clustered index CLU_UserPL on UserPl (UserPLID)
Note that the UNIQUE keyword is what enforces the constraint; a plain clustered index does not.
This can be achieved by using a UNIQUE constraint.
Here is a detailed article about the UNIQUE constraint: W3School UNIQUE Article
In simple words, UNIQUE is a constraint that ensures no duplicate values are allowed in the selected field(s).
If you want another way to prevent Doe from inserting duplicate values into the table, you can use IF EXISTS:
IF EXISTS (SELECT * FROM userPLs WHERE UserId = 'THE USER ID')
BEGIN
PRINT 'Data Already Exists! Insert will be ignored!'
END
ELSE
BEGIN
PRINT 'Data doesn''t exist! Proceeding to insert the data!'
--Start inserting the data
END
UPDATED ANSWER
Here is the modified SQL query with IF EXISTS, but with an additional condition.
IF EXISTS (SELECT * FROM userPLs WHERE UserId = 'THE USER ID' AND ProgrammingLanguageId = 'The ID')
BEGIN
PRINT 'Data Already Exists! Insert will be ignored!'
END
ELSE
BEGIN
PRINT 'Data doesn''t exist! Proceeding to insert the data!'
--Start inserting the data
END
The query above will solve your issue. If you are wondering how it works, here is a simple explanation:
The query first checks the UserId: has this UserId already been registered in the database?
Next, it checks whether the ProgrammingLanguageId that is about to be inserted already exists in the database for the selected user.
If the UserId is already registered and has the same ProgrammingLanguageId as the one being inserted, the insert is ignored and "Data Already Exists! Insert will be ignored!" is shown.
But if the UserId is already registered and has NO ProgrammingLanguageId matching the incoming data, the insert proceeds.
For better robustness, I think you should create a trigger that fires whenever an insert is executed, as sketched below.
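A rough sketch of such a trigger, assuming the table shape from the question (trigger and column names are assumptions):
CREATE TRIGGER trg_userPLs_NoDuplicates
ON userPLs
INSTEAD OF INSERT
AS
BEGIN
    IF EXISTS (SELECT 1
               FROM inserted i
               JOIN userPLs u
                 ON u.UserId = i.UserId
                AND u.ProgrammingLanguageId = i.ProgrammingLanguageId)
        -- reject contradictory rows
        RAISERROR('User already has a knowledge level for this language', 16, 1);
    ELSE
        INSERT INTO userPLs SELECT * FROM inserted;
END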

Making a column unique with one exception

We have an application whose workflow involves submitting information to an outside group and then inputting the user's id number into the system.
For that reason we allow a default placeholder value, "00000000", in the id field as a tentative value before the entry is approved and a permanent id is put in.
What I'm looking for is essentially a way to ensure that the column remains unique except for that one value.
In other words, I want a UNIQUE constraint where "00000000" plays the role that NULL normally does. I've considered doing it as part of a CHECK constraint, but that seems like it'd be a big performance hit. (I'm assuming UNIQUE does some kind of indexing.)
Use a filtered index, as follows:
CREATE UNIQUE NONCLUSTERED INDEX idx_yourcolumn_notspecificvalue
ON YourTable(yourcolumn)
WHERE yourcolumn != '00000000';
Example:
-- Create Table
Create table Test (id int identity, code varchar (100))
-- Create Unique Filtered Index
CREATE UNIQUE NONCLUSTERED INDEX idx_MyCol_Filtered
ON Test(code)
WHERE code != '00000000';
-- Insert dummy data >> '00000000' is repeated and '0101' appears once
insert into Test (code)
Values ('00000000'),
('00000000'),
('00000000'),
('0101')
select * from Test
The result: all four rows are inserted, including the three '00000000' duplicates, since the filtered index ignores that value.
-- Now try inserting '0101' again
insert into Test (code) Values ('0101')
The result: the insert fails with a duplicate key violation on idx_MyCol_Filtered, because '0101' is covered by the filtered index.
For more details:
Create Filtered Indexes
Approving the user entry through a workflow sounds like crucial business logic. I would suggest generating a random but unique value (like a timestamp) and inserting it with the new user entry. Keep an additional column which differentiates (flags) approved entries from unapproved ones. Once the user gets approval from the workflow, update the id and the flag.
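A rough sketch of that suggestion (all table, column, and variable names are assumptions):
-- flag column to separate tentative entries from approved ones
ALTER TABLE entries ADD approved BIT NOT NULL DEFAULT 0;

-- tentative row: a unique placeholder instead of the shared '00000000'
INSERT INTO entries (external_id, approved)
VALUES (CONVERT(VARCHAR(36), NEWID()), 0);

-- once the workflow approves: swap in the permanent id and flip the flag
UPDATE entries
SET external_id = @permanent_id, approved = 1
WHERE entry_pk = @entry_pk;
The plain UNIQUE constraint on the id column can then stay in force, since no two tentative rows share a value.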

Adding Row in existing table (SQL Server 2005)

I want to add another row to my existing table, and I'm a bit hesitant about whether I'm doing the right thing, because it might skew the database. I have my script below and would like to hear your thoughts about it.
I want to add another row for 'Jane' in the table, which will be 'SKATING' in the ACT column.
Table: [Emp_table].[ACT].[LIST_EMP]
My script is:
INSERT INTO [Emp_table].[ACT].[LIST_EMP]
([ENTITY],[TYPE],[EMP_COD],[DATE],[LINE_NO],[ACT],[NAME])
VALUES
('REG','EMP','45233','2016-06-20 00:00:00:00','2','SKATING','JANE')
Will this do the trick?
Your statement looks ok. If the database has a problem with it (for example, due to a foreign key constraint violation), it will reject the statement.
If any of the fields in your table are numeric (and not varchar or char), just remove the quotes around the corresponding field. For example, if emp_cod and line_no are int, insert the following values instead:
('REG','EMP',45233,'2016-06-20 00:00:00:00',2,'SKATING','JANE')
Inserting records into a database has always been the most common reason why I've lost a lot of my hairs on my head!
SQL is great when it comes to SELECTs or even UPDATEs, but when it comes to INSERTs it's like someone from another planet joined the SQL standards committee and managed to get their way of doing it into the final standard!
If your table does not have a primary key that gets generated automatically on every insert, then you have to manage avoiding duplicates yourself.
Start by writing a normal SELECT to see if the record(s) you're going to add don't already exist. But as Robert implied, your table may not have a primary key because it looks like a LOG table to me. So insert away!
If it does need a unique record every time, then I strongly suggest you create a primary key for the table, either an auto-generated one or a combination of your existing columns.
Assuming the first five combined columns make a unique key, this select will determine if your data you're inserting does not already exist...
SELECT COUNT(*) AS FoundRec FROM [Emp_table].[ACT].[LIST_EMP]
WHERE [ENTITY] = wsEntity AND [TYPE] = wsType AND [EMP_COD] = wsEmpCod AND [DATE] = wsDate AND [LINE_NO] = wsLineno
You will have to replace the wsXXX placeholders with direct values or DECLARE them earlier in your script.
If you ran this alone and received a value of 1 or more, then the data already exists in your table, at least in those first five columns. A true duplicate test would require you to test EVERY column in your table, but this should give you an idea.
For the INSERT, note that INSERT ... VALUES cannot take a WHERE clause; to do it all as one statement, use INSERT ... SELECT instead:
INSERT INTO [Emp_table].[ACT].[LIST_EMP]
([ENTITY],[TYPE],[EMP_COD],[DATE],[LINE_NO],[ACT],[NAME])
SELECT 'REG','EMP','45233','2016-06-20 00:00:00:00','2','SKATING','JANE'
WHERE NOT EXISTS
(SELECT * FROM [Emp_table].[ACT].[LIST_EMP]
WHERE [ENTITY] = wsEntity AND [TYPE] = wsType AND
[EMP_COD] = wsEmpCod AND [DATE] = wsDate AND
[LINE_NO] = wsLineno)
Just replace the wsXXX variables with the values you want to insert.
I hope that made sense.

Why are sequences not updated when COPY is performed in PostgreSQL?

I'm inserting bulk records using the COPY statement in PostgreSQL. What I've realized is that the sequence IDs are not getting updated, and when I try to insert a record later, it throws a duplicate key error. Should I manually update the sequence number after performing COPY? Isn't there a way to make COPY increment the sequence, that is, the primary key field of the table? Please clarify this for me. Thanks in advance!
For instance, if I insert 200 records, COPY works fine and my table shows all the records. When I manually insert a record later, it reports a duplicate key error. This clearly implies that COPY didn't increment the sequence IDs, while a normal INSERT works fine. Instead of having to set the sequence to the max id manually, isn't there a mechanism to tell COPY to increment the sequence IDs during its bulk load?
You ask:
Should I manually update the sequence number after performing COPY?
Yes, you should, as documented here:
Update the sequence value after a COPY FROM:
BEGIN;
COPY distributors FROM 'input_file';
SELECT setval('serial', max(id)) FROM distributors;
END;
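If you'd rather not hard-code the sequence name, pg_get_serial_sequence can look it up from the table and column (names taken from the example above):
-- set the sequence behind distributors.id to the current max id
SELECT setval(pg_get_serial_sequence('distributors', 'id'), max(id))
FROM distributors;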
You write:
COPY didn't increment the sequence IDs, while a normal INSERT works fine
But that is not so! :) When you perform a normal INSERT, you typically do not specify an explicit value for the SEQUENCE-backed primary key. If you did, you would run into the same problems as you are having now:
postgres=> create table uh_oh (id serial not null primary key, data char(1));
NOTICE: CREATE TABLE will create implicit sequence "uh_oh_id_seq" for serial column "uh_oh.id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "uh_oh_pkey" for table "uh_oh"
CREATE TABLE
postgres=> insert into uh_oh (id, data) values (1, 'x');
INSERT 0 1
postgres=> insert into uh_oh (data) values ('a');
ERROR: duplicate key value violates unique constraint "uh_oh_pkey"
DETAIL: Key (id)=(1) already exists.
Your COPY command, of course, is supplying an explicit id value, just like the example INSERT above.
I realize this is a bit old, but maybe someone is still looking for the answer.
As others said, COPY works in a similar way to INSERT: for inserting into a table that has a sequence, you simply don't mention the sequence field at all and it is taken care of for you. COPY works in exactly the same way. But doesn't COPY require ALL fields in the table to be present in the text file? The correct answer is NO, it doesn't, but it is the default behavior.
To COPY and leave the sequence out do the following:
COPY $YOURSCHEMA.$YOURTABLE(col1,col2,col3,col4) FROM '$your_input_file' DELIMITER ',' CSV HEADER;
No need to manually update the sequence afterwards; it works as intended, and in my testing is just about as fast.
You could copy to a sister table, then insert into mytable select * from sister - that would increment the sequence.
If your loaded data has the id field, don't select it for the insert: insert into mytable (col1, col2, col3) select col1, col2, col3 from sister
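A minimal sketch of that approach (table and column names are assumptions):
-- staging table with the same shape as mytable
CREATE TABLE sister (LIKE mytable INCLUDING DEFAULTS);
COPY sister (col1, col2, col3) FROM 'input_file';

-- re-inserting without the id column lets the sequence assign fresh values
INSERT INTO mytable (col1, col2, col3)
SELECT col1, col2, col3 FROM sister;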

Including multiple columns in a single index in Postgres

I have a 'users' table with two columns, 'email' and 'new_email'. I need:
A case-insensitive uniqueness constraint covering both columns - i.e., if "Bob@Example.com" appears in one row's 'email' column, then inserting "bob@example.com" into another row's (or even the same row's) 'new_email' column should fail.
Fast case-insensitive searching for a given email address in either the 'email' or 'new_email' fields - i.e. find the row where new_email OR email is "Bob@example.com", case-insensitively.
I know that I could do this more easily by creating a related 'emails' table, but I'm expecting to be looking up users in this table (by primary key) from several applications, and I'd like to avoid duplicating the join logic in various places to also retrieve their emails. So I think some kind of expression index would be best, if that's possible.
If this isn't possible, I suppose my next best option would be to create a view that the other applications could use to easily fetch a user's emails along with their other information, but I'm not sure how to do that either.
I'm using Postgres 8.4. Thank you!
I think you'll have to use a trigger to enforce your cross-column uniqueness constraint. Add unique indexes on each column and then a trigger something like this (untested, off-the-top-of-my-head code):
CREATE FUNCTION no_dups_allowed() RETURNS trigger AS $$
BEGIN
    -- check the new values against both columns of every existing row
    PERFORM 1
    FROM users
    WHERE LOWER(email) = LOWER(NEW.new_email)
       OR LOWER(new_email) = LOWER(NEW.email);
    IF FOUND THEN
        -- Found a duplicate so it is time for a hissy fit!
        RAISE 'Duplicate email address found' USING ERRCODE = 'unique_violation';
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
You'd want something like that as a BEFORE INSERT and BEFORE UPDATE trigger. That trigger would take care of catching cross-column duplicates and the unique indexes would take care of in-column duplicates.
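For illustration, the wiring might look like this (the trigger name is an assumption; EXECUTE PROCEDURE is the spelling Postgres 8.4 uses):
CREATE TRIGGER users_no_dups
BEFORE INSERT OR UPDATE ON users
FOR EACH ROW EXECUTE PROCEDURE no_dups_allowed();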
Some useful references:
FOUND
RAISE
Triggers
Trigger Procedures
You'll want the individual indexes for your queries anyway, and using the uniqueness half of those indexes simplifies your trigger by leaving it to deal only with the cross-column part; if you try to do it all in the trigger, you'll have to watch out for updates that don't really change the email or new_email columns.
For the querying half, you could create a view that uses a UNION to combine the two columns. You could also create a function to merge the user's email addresses into one list. It's hard to say which would be best without knowing more details of those other queries, but I suspect that fixing the other queries to know about email and new_email would be the best approach; you'll have to update them to use the view or function anyway, so why build a view or function at all? Still, a sketch follows below.
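A sketch of the UNION view idea for reference (the view name is an assumption):
CREATE VIEW user_emails AS
    SELECT id, LOWER(email) AS email FROM users WHERE email IS NOT NULL
    UNION
    SELECT id, LOWER(new_email) FROM users WHERE new_email IS NOT NULL;

-- look up a user by either address, case-insensitively:
-- SELECT id FROM user_emails WHERE email = LOWER('Bob@example.com');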
No need for triggers. Try this:
create table et (email text, email2 text);
create unique index et_u on et (coalesce(lower(email),lower(email2)));
insert into et (email,email2) values ('scott@gmail.com',NULL);
insert into et (email,email2) values ('scott@gmail.com',NULL);
ERROR: duplicate key value violates unique constraint "et_u"
insert into et (email,email2) values (NULL,'scott@gmail.com');
ERROR: duplicate key value violates unique constraint "et_u"
insert into et (email,email2) values (NULL,'Scott@gmail.com');
ERROR: duplicate key value violates unique constraint "et_u"