Update duplicate varchars to be unique in SQL database - sql

I need to change a database to add a unique constraint on a table column, but the VARCHAR data in it is not unique.
How can I update those duplicate records so that each value is unique by adding a sequential number at the end of the existing data?
e.g. I would like to change 'name' to 'name1', 'name2', 'name3'

Here are two examples using the MS SQL Server flavor of SQL.
Setup Example:
create table test (id int identity primary key, val varchar(20) )
--id is a pk for the cursor so it can update using "where current of"
-- name a is not duplicated
-- name b is duplicated 3 times
-- name c is duplicated 2 times
insert test values('name a')
insert test values('name b')
insert test values('name c')
insert test values('name b')
insert test values('name b')
insert test values('name c')
SQL 2005/2008: (Common Table Expression)
begin tran; -- Common table expressions require the preceding statement to end with ;
with cte(val,row) as (
select val, row_number() over (partition by val order by val) row
--partition is important: it resets the row_number on a new val
from test
where val in ( -- only return values that are duplicated
select val
from test
group by val
having count(val)>1
)
)
update cte set val = val + ltrim(str(row))
--ltrim(str(row)) = converting the int to a string and removing the padding from the str command.
select * from test
rollback
Sql 2000: (Cursor example)
begin tran
declare @row int, @last varchar(20), @current varchar(20)
set @last = ''
declare dupes cursor
for
select val
from test
where val in ( -- only return values that are duplicated
select val
from test
group by val
having count(val)>1
)
order by val
for update of val
open dupes
fetch next from dupes into @current
while @@fetch_status = 0
begin
--new set of dupes, like the partition by in the 2005 example
if @last != @current
set @row = 1
update test
--@last is being set during the update statement
set val = val + ltrim(str(@row)), @last = val
where current of dupes
set @row = @row + 1
fetch next from dupes into @current
end
close dupes
deallocate dupes
select * from test
rollback
I rolled back each of the updates because my script file contains both examples. This allowed me to test the functionality without resetting the rows on the table.

Open a cursor on the table, ordered by that column. Keep a previous-value variable, initialized to null, and an index variable initialized to 0. If the current value = the previous value, increment the index and append the index to the field value. If the current value <> the previous value, reset the index to 0 and keep the field value as is. Set the previous-value variable = the current value. Move on to the next row and repeat.
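The loop described above can be sketched outside the database too. This is only a minimal illustration in Python with the standard-library sqlite3 module and an in-memory table; the table test and column val are borrowed from the setup example earlier in this thread, and the function name dedupe_by_suffix is made up for the sketch. It follows the description literally, so the first occurrence of each value stays unchanged and later ones get 1, 2, and so on. It does not check whether a suffixed value (say 'name b1') already exists in the table.

```python
import sqlite3

def dedupe_by_suffix(conn):
    """Append a running index to every repeat of a value in test.val."""
    # Read everything up front so the updates cannot disturb the iteration.
    rows = conn.execute("SELECT id, val FROM test ORDER BY val, id").fetchall()
    previous = None  # value seen on the previous row
    index = 0        # number of repeats of the current value so far
    for row_id, current in rows:
        if current == previous:
            index += 1
            conn.execute(
                "UPDATE test SET val = ? WHERE id = ?",
                (f"{current}{index}", row_id),
            )
        else:
            index = 0  # new value: keep the field as is
        previous = current
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany(
    "INSERT INTO test (val) VALUES (?)",
    [("name a",), ("name b",), ("name c",), ("name b",), ("name b",), ("name c",)],
)
dedupe_by_suffix(conn)
result = [v for (v,) in conn.execute("SELECT val FROM test ORDER BY id")]
print(result)
```

After the call every value in the column is distinct, so a unique constraint could then be added.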

You could add another column to it... like
update mytable set mycolumn = concat(mycolumn, id)
where id in (<select duplicate records>);
replace id with whatever column makes mycolumn unique
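As a rough illustration of this idea, here is a small Python sketch using the standard-library sqlite3 module with an in-memory table. The names mytable, mycolumn and id come from the snippet above; the sample rows are made up, and the duplicate-finding subquery is one plausible way to fill in the placeholder. SQLite writes concatenation as ||, where other engines use concat() or +.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, mycolumn TEXT)")
conn.executemany(
    "INSERT INTO mytable (mycolumn) VALUES (?)",
    [("name a",), ("name b",), ("name c",), ("name b",), ("name b",), ("name c",)],
)

# Append the row's id to every value that occurs more than once.
conn.execute(
    """
    UPDATE mytable
    SET mycolumn = mycolumn || id
    WHERE mycolumn IN (
        SELECT mycolumn FROM mytable GROUP BY mycolumn HAVING COUNT(*) > 1
    )
    """
)
conn.commit()
values = [v for (v,) in conn.execute("SELECT mycolumn FROM mytable ORDER BY id")]
print(values)
```

Unlike the sequential-number approaches, the suffixes here are the ids themselves, so they are unique but not consecutive.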

What database are you using?
In Oracle there is a:
NOVALIDATE Validates changes but does not validate data previously existing in the table
Example:
ALTER TABLE <table_name> ENABLE NOVALIDATE UNIQUE;
If you are not using Oracle then check the SQL reference for your respective database.

Related

In SQL Server 2008 R2, is there a way to create a custom auto increment identity field without using IDENTITY(1,1)?

I would like to be able to pull the custom key value from a table, but would also like it to perform like SQL Server's IDENTITY(1,1) column on inserts.
The custom key is for another application and will need to be used by different functions so the value will need to be pulled from a table and available for other areas.
Here are some of my attempts:
I tried a trigger on the table. It works well on single-row inserts but failed when inserting with a SQL INSERT statement (I had forgotten that triggers fire per batch, not per row).
ALTER TRIGGER [sales].[trg_NextInvoiceDocNo]
ON [sales].[Invoice]
AFTER INSERT
AS
BEGIN
DECLARE @ResultVar VARCHAR(25)
DECLARE @Key VARCHAR(25)
EXEC [dbo].[usp_GetNextKeyCounterChar]
@tcForTbl = 'docNbr', @tcForGrp = 'docNbr', @NewKey = @ResultVar OUTPUT
UPDATE sales.InvoiceRET
SET DocNbr = @ResultVar
FROM sales.InvoiceRET
JOIN inserted ON inserted.id = sales.InvoiceRET.id;
END;
Thought about a scalar function, but functions cannot exec stored procedures or update statements in order to set the new key value in the lookup table.
Thanks
You can use ROW_NUMBER() depending on the type of concurrency you are dealing with. Here is some sample data and a demo you can run locally.
-- Sample table
USE tempdb
GO
IF OBJECT_ID('dbo.sometable','U') IS NOT NULL DROP TABLE dbo.sometable;
GO
CREATE TABLE dbo.sometable
(
SomeId INT NULL,
Col1 INT NOT NULL
);
GO
-- Stored Proc to insert data
CREATE PROC dbo.InsertProc @output BIT AS
BEGIN -- Your proc starts here
INSERT dbo.sometable(Col1)
SELECT datasource.[value]
FROM (VALUES(CHECKSUM(NEWID())%100)) AS datasource([value]) -- simulating data from somewhere
CROSS APPLY (VALUES(1),(1),(1)) AS x(x);
WITH
id(MaxId) AS (SELECT ISNULL(MAX(t.SomeId),0) FROM dbo.sometable AS t),
xx AS
(
SELECT s.SomeId, RN = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))+id.MaxId, s.Col1, id.MaxId
FROM id AS id
CROSS JOIN dbo.sometable AS s
WHERE s.SomeId IS NULL
)
UPDATE xx SET xx.SomeId = xx.RN;
IF @output = 1
SELECT t.* FROM dbo.sometable AS t;
END
GO
Each time I run: EXEC dbo.InsertProc 1; it returns 3 more rows with the correct ID col. Each time I execute it, it adds more rows and auto-increments as needed.
SomeId Col1
-------- ------
1 62
2 73
3 -17

Generate a unique column sequence value based on a query handling concurrency

I have a requirement to automatically generate a column's value based on another query's result. Because this column value must be unique, I need to take into consideration concurrent requests. This query needs to generate a unique value for a support ticket generator.
The template for the unique value is CustomerName-Month-Year-SupportTicketForThisMonthCount.
So the script should automatically generate:
AcmeCo-10-2019-1
AcmeCo-10-2019-2
AcmeCo-10-2019-3
and so on as support tickets are created. How can I ensure that AcmeCo-10-2019-1 is not generated twice if two support tickets are created at the same time for AcmeCo?
insert into SupportTickets (name)
select concat_ws('-', @CustomerName, @Month, @Year, COUNT(*))
from SupportTickets
where customerName = @CustomerName
and CreatedDate between @MonthStart and @MonthEnd;
One possibility:
Create a counter table:
create table Counter (
Id int identity(1,1),
Name varchar(64),
Count1 int
)
Name is a unique identifier for the sequence, and in your case name would be CustomerName-Month-Year i.e. you would end up with a row in this table for every Customer/Year/Month combination.
Then write a stored procedure similar to the following to allocate a new sequence number:
create procedure [dbo].[Counter_Next]
(
@Name varchar(64)
, @Value int out -- Value to be used
)
as
begin
set nocount, xact_abort on;
declare @Temp int;
begin tran;
-- Ensure we have an exclusive lock before changing variables
select top 1 1 from dbo.Counter with (tablockx);
set @Value = null; -- if a value is passed in it stuffs us up, so null it
-- Attempt an update and assignment in a single statement
update dbo.[Counter] set
@Value = Count1 = Count1 + 1
where [Name] = @Name;
if @@rowcount = 0 begin
set @Value = 10001; -- Some starting value
-- Create a new record if none exists
insert into dbo.[Counter] ([Name], Count1)
select @Name, @Value;
end;
commit tran;
return 0;
end;
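To see the allocate-or-create logic of this procedure in a form that runs without a SQL Server instance, here is a hedged sketch in Python with the standard-library sqlite3 module. It is not the procedure itself: SQLite has no tablockx hint, so BEGIN IMMEDIATE stands in for the exclusive lock, and the function name next_counter and the second key OtherCo-10-2019 are made up for illustration. The Counter table, the Count1 column and the starting value 10001 are taken from the answer above.

```python
import sqlite3

def next_counter(conn, name, start=10001):
    """Return the next number for `name`, creating the row on first use."""
    # BEGIN IMMEDIATE takes SQLite's write lock up front, so two callers
    # cannot both read the same Count1 before either one updates it.
    conn.execute("BEGIN IMMEDIATE")
    try:
        cur = conn.execute(
            "UPDATE Counter SET Count1 = Count1 + 1 WHERE Name = ?", (name,)
        )
        if cur.rowcount == 0:
            # No row yet for this name: create it with the starting value.
            conn.execute(
                "INSERT INTO Counter (Name, Count1) VALUES (?, ?)", (name, start)
            )
            value = start
        else:
            value = conn.execute(
                "SELECT Count1 FROM Counter WHERE Name = ?", (name,)
            ).fetchone()[0]
        conn.execute("COMMIT")
        return value
    except Exception:
        conn.execute("ROLLBACK")
        raise

# isolation_level=None means autocommit, so the explicit BEGIN/COMMIT
# above are the only transaction control in play.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE Counter (Id INTEGER PRIMARY KEY, Name TEXT UNIQUE, Count1 INTEGER)"
)
first = next_counter(conn, "AcmeCo-10-2019")
second = next_counter(conn, "AcmeCo-10-2019")
other = next_counter(conn, "OtherCo-10-2019")
print(first, second, other)
```

Each customer/month key gets its own independent sequence, which is the property the ticket naming scheme relies on.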
You could look into using a TIME type instead of COUNT(*) to create unique values; that makes duplicates much less likely. Hope that helps.

Microsoft SQL Server - default value provided by stored procedure

Is it possible to have a non-null column where the value is generated at insert by calling a stored procedure the parameters of which are values passed to insert into the row?
For example, I have table User:
| username | name | surname | id |
Insert looks like this:
INSERT INTO USER (username, name, surname)
VALUES ('myusername', 'myname', 'mysurname');
The id column is populated with an (integer) value retrieved by calling stored procedure mystoredproc with parameters myusername, myname, mysurname.
A further question is, would this stored procedure be called on each row, or can it be called in a grouped fashion? For example, I'd like my stored procedure to take the name and append a random integer to it so that if I insert 100 users with the name 'David', they will get the same id and the stored procedure will be called only once. A bit of a bad example on the second point.
Good day,
Is it possible to have a non-null column where the value is generated at insert by calling a stored procedure
Option 1: please check if this works for you
1) Specify a default value for the column and use "NOT NULL"
2) Create a trigger on the table AFTER INSERT
3) Inside the trigger, you can use the virtual table "inserted" in order to get the inserted values.
4) Using these values (from the inserted table) you can update the column using the logic you need for all the rows at once
** there is no need to use external SP probably, but you can execute SP from trigger if needed
** All executed by a trigger is in the same transaction as the original query.
would this stored procedure be called on each row
NO! The trigger will be executed once for all rows you insert in the same statement. The inserted table includes all the rows which were inserted. In your update section (step 4) you can update all the inserted rows at once, so there is no need to execute anything for each row.
** If you do use external SP which is executed from the trigger then you can pass it all the inserted table as one using Table-Valued Parameter
------------------- update ---------------
Here is a full example of using this logic:
drop table if exists T;
CREATE TABLE T (id int identity(2,2), c int NOT NULL default 1)
GO
CREATE TRIGGER tr ON T AFTER INSERT
AS BEGIN
SET NOCOUNT ON;
UPDATE T SET T.c = T2.C + 1
FROM inserted T2
INNER JOIN T T1 ON T1.id = T2.id
END
INSERT T(c) values (1) -- I insert the value 1 but the trigger will change it to 1+1=2
select * from T
GO
-- test multiple rows:
INSERT T(c) values (10),(20),(30),(40)
select * from T
GO
DECLARE @rc INT = 0,
@UserID INT = ABS(CHECKSUM(NEWID())) % 1000000 + 1;
WHILE @rc = 0
BEGIN
IF NOT EXISTS (SELECT 1 FROM dbo.Users WHERE UserId = @UserId)
BEGIN
-- INSERT takes no WHERE clause; insert the freshly generated id directly
INSERT dbo.Users(UserId) SELECT @UserId;
SET @rc = 1;
END
ELSE
BEGIN
SELECT @UserId = ABS(CHECKSUM(NEWID())) % 1000000 + 1,
@rc = 0;
END
END

using MERGE for incremental insert

I have a scenario where one column of the target table needs to be auto-incremented. I do not have identity enabled on this column, so I need to pick the last number and add 1 to it each time an insert is done.
http://sqlfiddle.com/#!6/61eb4/5
A similar scenario is given in the fiddle link. I do not want the productid of the ProductChanges table to be inserted. Instead, I need the last id to be picked, incremented, and inserted for each new row.
Code to get this working
DECLARE @intctr int
SELECT @intctr = MAX(productid)+1 from products
DECLARE @strQry varchar(200)
SET @strQry =
'CREATE SEQUENCE dbo.seq_key_prd
START WITH ' +convert( varchar(12),@intctr) +' INCREMENT BY 1 ;'
print @strQry
exec( @strQry)
alter table Products
add default next value for seq_key_prd
for ProductId;
GO
--Merge statement for data sync
MERGE Products USING ProductChanges ON (Products.Productid = ProductChanges.Productid)
WHEN MATCHED AND Products.VendorlD =0 THEN DELETE
WHEN NOT MATCHED by target then insert (productid,Productname,VendorlD)
values(default,productname,VendorlD)
WHEN MATCHED THEN UPDATE SET
Products.ProductName = ProductChanges.ProductName ,
Products.VendorlD = ProductChanges.VendorlD;
1) Create a sequence for the target table.
Example:
CREATE SEQUENCE table_seq
MINVALUE 1
START WITH 1
INCREMENT BY 1
CACHE 20;
2) Create a trigger that uses the sequence to populate the table's id column:
CREATE OR REPLACE TRIGGER my_trigger
BEFORE INSERT
ON myTable
FOR EACH ROW
WHEN (new.id is null)
DECLARE
v_id myTable.id%TYPE;
BEGIN
SELECT table_seq.nextval INTO v_id FROM DUAL;
:new.id := v_id;
END my_trigger;

Generating the Next Id when Id is non-AutoNumber

I have a table called Employee. The EmpId column serves as the primary key. In my scenario, I cannot make it AutoNumber.
What would be the best way of generating the next EmpId for the new row that I want to insert in the table?
I am using SQL Server 2008 with C#.
Here is the code that I am currently using, though it was written to generate IDs for key-value pair tables or link tables (m*n relations):
Create PROCEDURE [dbo].[mSP_GetNEXTID]
@NEXTID int out,
@TABLENAME varchar(100),
@UPDATE CHAR(1) = NULL
AS
BEGIN
DECLARE @QUERY VARCHAR(500)
BEGIN
IF EXISTS (SELECT LASTID FROM LASTIDS WHERE TABLENAME = @TABLENAME and active=1)
BEGIN
SELECT @NEXTID = LASTID FROM LASTIDS WHERE TABLENAME = @TABLENAME and active=1
IF(@UPDATE IS NULL OR @UPDATE = '')
BEGIN
UPDATE LASTIDS
SET LASTID = LASTID + 1
WHERE TABLENAME = @TABLENAME
and active=1
END
END
ELSE
BEGIN
SET @NEXTID = 1
INSERT INTO LASTIDS(LASTID,TABLENAME, ACTIVE)
VALUES(@NEXTID+1,@TABLENAME, 1)
END
END
END
Using MAX(id) + 1 is a bad idea both performance and concurrency wise.
Instead you should resort to sequences, which were designed specifically for this kind of problem.
CREATE SEQUENCE EmpIdSeq AS bigint
START WITH 1
INCREMENT BY 1;
And to generate the next id use:
SELECT NEXT VALUE FOR EmpIdSeq;
You can use the generated value in an insert statement:
INSERT Emp (EmpId, X, Y)
VALUES (NEXT VALUE FOR EmpIdSeq, 'x', 'y');
And even use it as default for your column:
CREATE TABLE Emp
(
EmpId bigint PRIMARY KEY CLUSTERED
DEFAULT (NEXT VALUE FOR EmpIdSeq),
X nvarchar(255) NULL,
Y nvarchar(255) NULL
);
Update: The above solution is only applicable to SQL Server 2012+. For older versions you can simulate the sequence behavior using dummy tables with identity fields:
CREATE TABLE EmpIdSeq (
SeqID bigint IDENTITY PRIMARY KEY CLUSTERED
);
And a procedure that emulates NEXT VALUE:
CREATE PROCEDURE GetNewSeqVal_Emp
@NewSeqVal bigint OUTPUT
AS
BEGIN
SET NOCOUNT ON
INSERT EmpIdSeq DEFAULT VALUES
SET @NewSeqVal = scope_identity()
DELETE FROM EmpIdSeq WITH (READPAST)
END;
Usage example:
DECLARE @NewSeqVal bigint
EXEC GetNewSeqVal_Emp @NewSeqVal OUTPUT
The performance overhead of deleting the last inserted element will be minimal; still, as pointed out by the original author, you can optionally remove the delete statement and schedule a maintenance job to delete the table contents off-hour (trading space for performance).
Adapted from SQL Server Customer Advisory Team Blog.
The above
select max(empid) + 1 from employee
is the way to get the next number, but if there are multiple users inserting into the database, then context switching might cause two users to read the same value for empid, each add 1 to it, and end up with duplicate ids. If you do have multiple users, you may have to lock the table while inserting. This is not best practice, which is why auto increment exists for database tables.
I hope this works for you, assuming your ID field is an integer:
INSERT INTO Table WITH (TABLOCK)
(SELECT CASE WHEN MAX(ID) IS NULL
THEN 1 ELSE MAX(ID)+1 END FROM Table), VALUE_1, VALUE_2....
Try following query
INSERT INTO Table VALUES
((SELECT isnull(MAX(ID),0)+1 FROM Table), VALUE_1, VALUE_2....)
You have to apply isnull to the max value; otherwise the final result will be null when the table contains no rows.
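The empty-table case is easy to check for yourself. The sketch below uses Python's standard-library sqlite3 module with an in-memory table as a stand-in for SQL Server, so COALESCE plays the role of ISNULL. Employee and EmpId come from the question; the Name column and the sample row are made up for illustration. It only shows the NULL handling and does nothing about the concurrency problem discussed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmpId INTEGER PRIMARY KEY, Name TEXT)")

# On an empty table MAX() yields NULL, and NULL + 1 is still NULL.
naive = conn.execute("SELECT MAX(EmpId) + 1 FROM Employee").fetchone()[0]

# COALESCE (ISNULL in SQL Server) replaces the NULL with 0 before adding 1.
guarded = conn.execute(
    "SELECT COALESCE(MAX(EmpId), 0) + 1 FROM Employee"
).fetchone()[0]

conn.execute("INSERT INTO Employee (EmpId, Name) VALUES (?, ?)", (guarded, "first"))
after_insert = conn.execute(
    "SELECT COALESCE(MAX(EmpId), 0) + 1 FROM Employee"
).fetchone()[0]
print(naive, guarded, after_insert)
```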