T-SQL: How to make cell-values unique?

If I have a table where the cells in a column should not have the same values, how do I check this and update? (I know I can set constraints in the settings, but I don't want to do that.)
Say the column name is called unique hash name and contains
Peter
Peter
Peter
Dave
Dave
and so on. I want that to transform to:
Peter
Peter1
Peter2
Dave
Dave1
What is the T-SQL for SQL Server to do that?
Update: For clarity's sake, let's call the table "Persons" and the column I want unique "UniqueName". Could you make it a SELECT statement, so I can test the result before updating? I am using SQL Server 2005 and above.

EDIT: I've changed the query to use your field names and added a "select-only" query for you to preview.
This is actually pretty easy to do... just use ROW_NUMBER() with a PARTITION clause:
UPDATE Persons
SET UniqueName = temp.DeDuped
FROM (SELECT ID,
             -- ORDER BY ID makes the numbering deterministic within each name
             CASE WHEN ROW_NUMBER() OVER (PARTITION BY UniqueName ORDER BY ID) = 1
                  THEN UniqueName
                  ELSE UniqueName + CONVERT(VARCHAR(10),
                           ROW_NUMBER() OVER (PARTITION BY UniqueName ORDER BY ID) - 1)
             END AS DeDuped
      FROM Persons) temp
WHERE Persons.ID = temp.ID
If you want a "select-only", then here you go:
SELECT ID,
       -- ORDER BY ID makes the numbering deterministic within each name
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY UniqueName ORDER BY ID) = 1
            THEN UniqueName
            ELSE UniqueName + CONVERT(VARCHAR(10),
                     ROW_NUMBER() OVER (PARTITION BY UniqueName ORDER BY ID) - 1)
       END AS DeDuped
FROM Persons
EDIT Again: If you're looking for a SQL Server 2000 Solution...
CREATE TABLE #Persons ( ID INT IDENTITY(1, 1), UniqueName VARCHAR(100) )
INSERT INTO #Persons VALUES ('Bob')
INSERT INTO #Persons VALUES ('Bob')
INSERT INTO #Persons VALUES ('Bob')
INSERT INTO #Persons VALUES ('John')
INSERT INTO #Persons VALUES ('John')
SELECT
    ID,
    CASE WHEN Position = 0 THEN UniqueName
         ELSE UniqueName + (CONVERT(VARCHAR, Position))
    END AS UniqueName
FROM
    (SELECT
         ID,
         UniqueName,
         (SELECT COUNT(*) FROM #Persons p2
          WHERE p1.UniqueName = p2.UniqueName AND p1.ID > p2.ID) AS Position
     FROM #Persons p1) _temp
DROP TABLE #Persons
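The question also asks how to check for duplicates before changing anything. A minimal sketch of that check, assuming the Persons table and UniqueName column named in the question; it uses only GROUP BY and HAVING, so it should also run on SQL Server 2000:

```sql
-- Lists every value that occurs more than once, with its number of occurrences.
-- An empty result means the column is already unique.
SELECT UniqueName, COUNT(*) AS Occurrences
FROM Persons
GROUP BY UniqueName
HAVING COUNT(*) > 1
ORDER BY UniqueName;
```

Running it again after the UPDATE is a quick way to confirm the rename did not produce new collisions (for example, if a row named 'Peter1' already existed).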

This feels like a pretty clear use-case for a trigger (insert,update).

Related

SQL Dynamically update duplicate row values to be unique

The Problem
I need to update a table so that any duplicate rows are updated to have unique values.
The Catch
I need to dynamically ensure that the value I am updating the duplicate row to is also unique.
My Solution So Far (with test case)
CREATE TABLE #temp (name nvarchar(100), ID uniqueidentifier)
INSERT INTO #temp (Name, ID)
VALUES ('Duplicate', '32208C09-C0C3-408C-AB60-273811722194')
INSERT INTO #temp (Name, ID)
VALUES ('Duplicate', '32208C09-C0C3-408C-AB60-273811722194')
INSERT INTO #temp (Name, ID)
VALUES ('Duplicate (2)', '32208C09-C0C3-408C-AB60-273811722194')
;WITH cte AS (
SELECT Name
, ROW_NUMBER() OVER (PARTITION BY Name, ID ORDER BY Name) RowNum
FROM #temp
)
UPDATE cte
SET Name = CONCAT(Name, ' (', RowNum, ')')
WHERE RowNum > 1
SELECT * FROM #temp
DROP TABLE #temp
As you can tell, this will update the table so there is only one row with the name 'Duplicate' but two rows with the name 'Duplicate (2)'. How can I check and account for duplicates in the value I am updating to?
You could use another CTE that finds the highest number already in use and then uses it to generate the "next" number.
The pattern below only matches single-digit suffixes; for two or more digits you need to adapt it.
CREATE TABLE #temp (name nvarchar(100), AssetMakeID uniqueidentifier)
INSERT INTO #temp (Name, AssetMakeID)
VALUES ('Duplicate', '32208C09-C0C3-408C-AB60-273811722194')
INSERT INTO #temp (Name, AssetMakeID)
VALUES ('Duplicate', '32208C09-C0C3-408C-AB60-273811722194')
INSERT INTO #temp (Name, AssetMakeID)
VALUES ('Duplicate (2)', '32208C09-C0C3-408C-AB60-273811722194')
;WITH CTE1 AS (
    SELECT MAX(COALESCE(REPLACE(REPLACE(SUBSTRING(name, PATINDEX('%([0-9])%', name),
               PATINDEX('%)%', name + 't') - PATINDEX('%(%', name) + 1), '(', ''), ')', ''), 0)) hinum
         , SUBSTRING(name, 1, PATINDEX('% ([0-9])%', name)) name
    FROM #temp
    WHERE SUBSTRING(name, 1, PATINDEX('% ([0-9])%', name)) IS NOT NULL
    GROUP BY SUBSTRING(name, 1, PATINDEX('% ([0-9])%', name))
),
cte AS (
    SELECT #temp.Name
         , CASE WHEN ROW_NUMBER() OVER (PARTITION BY #temp.Name, AssetMakeID ORDER BY #temp.Name) > 1
                THEN ROW_NUMBER() OVER (PARTITION BY #temp.Name, AssetMakeID ORDER BY #temp.Name) + hinum - 1
                ELSE ROW_NUMBER() OVER (PARTITION BY #temp.Name, AssetMakeID ORDER BY #temp.Name)
           END RowNum
    FROM #temp LEFT JOIN CTE1 ON #temp.name = CTE1.name
)
UPDATE cte
SET Name = CONCAT(Name, ' (', RowNum, ')')
WHERE RowNum > 1
SELECT * FROM #temp
name           AssetMakeID
Duplicate      32208c09-c0c3-408c-ab60-273811722194
Duplicate (3)  32208c09-c0c3-408c-ab60-273811722194
Duplicate (2)  32208c09-c0c3-408c-ab60-273811722194
Well, the easy way is to append a unique string in the update, so there is no way your update can cause a duplicate. The current timestamp (with milliseconds) works well. Like this:
UPDATE cte
-- style 126 yields yyyy-mm-ddThh:mi:ss.mmm, which needs 23 characters
SET Name = CONCAT(Name, ' (', RowNum, ') at ', CONVERT(varchar(23), GETDATE(), 126))
WHERE RowNum > 1
This will cope with one level of duplication, e.g. 'Duplicate (2)', but not two, e.g. 'Duplicate (2) (2)'.
Essentially, just apply the same logic again in a second CTE. In fact, you should be able to use a recursive CTE to make it work for all levels.
That said, you could use a more reliably unique method of de-duplicating names, e.g. just append a GUID and the name will be unique.
WITH cte1 AS (
    SELECT Name, Id
         , ROW_NUMBER() OVER (PARTITION BY Name, ID ORDER BY Name) RowNum
         -- You should already have one, but if not generate it
         , ROW_NUMBER() OVER (ORDER BY Name) UniqueId
    FROM #temp
), cte2 AS (
    SELECT NewName Name, RowNum, UniqueId
         , ROW_NUMBER() OVER (PARTITION BY NewName, ID ORDER BY NewName) RowNum2
    FROM cte1
    CROSS APPLY (
        VALUES (CASE WHEN RowNum = 1 THEN Name ELSE CONCAT(Name, ' (', RowNum, ')') END)
    ) n (NewName)
)
UPDATE c1 SET
    Name = CASE WHEN RowNum2 = 1 THEN c2.Name ELSE CONCAT(c2.Name, ' (', RowNum2, ')') END
FROM cte1 c1
INNER JOIN cte2 c2 ON c2.UniqueId = c1.UniqueId
WHERE c1.RowNum > 1 OR RowNum2 > 1;
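The GUID idea mentioned in this answer could look roughly like the following. This is only a sketch, assuming the #temp table (Name, ID) from the question; NEWID() is evaluated per row, so each renamed row gets its own suffix:

```sql
;WITH cte AS (
    SELECT Name
         , ROW_NUMBER() OVER (PARTITION BY Name, ID ORDER BY Name) AS RowNum
    FROM #temp
)
UPDATE cte
-- Keep the first row of each group as-is; give every other row a GUID suffix.
SET Name = CONCAT(Name, ' (', CONVERT(char(36), NEWID()), ')')
WHERE RowNum > 1;
```

The trade-off is readability: the names are practically guaranteed not to collide with existing values, but they are much less friendly than 'Duplicate (3)'. The suffix adds 39 characters, so the column must be wide enough to hold it.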
I am going to choose another answer as the correct answer since I personally prefer it, but I thought I'd post what I ended up doing myself.
DROP TABLE IF EXISTS #temp
CREATE TABLE #temp (name nvarchar(100), ID uniqueidentifier)
INSERT INTO #temp (Name, ID)
VALUES ('Duplicate', '32208C09-C0C3-408C-AB60-273811722194')
INSERT INTO #temp (Name, ID)
VALUES ('Duplicate', '32208C09-C0C3-408C-AB60-273811722194')
INSERT INTO #temp (Name, ID)
VALUES ('Duplicate (2)', '32208C09-C0C3-408C-AB60-273811722194')
DECLARE @doWhileTrueFlag bit = 1
WHILE (@doWhileTrueFlag = 1)
BEGIN
    ;WITH cte AS (
        SELECT
            Name,
            ROW_NUMBER() OVER (PARTITION BY Name, ID ORDER BY Name) RowNum
        FROM #temp
    )
    UPDATE cte
    SET Name = CONCAT(Name, ' (', RowNum, ')')
    WHERE RowNum > 1
    -- @@ROWCOUNT must be read immediately after the UPDATE
    SET @doWhileTrueFlag = CASE
        WHEN @@ROWCOUNT > 0 THEN 1
        ELSE 0
    END
END
SELECT * FROM #temp
DROP TABLE #temp
This performs the update I was already doing in a loop until no more updates are done. A rather inelegant solution, but the names created are prettier for the clients.

How to get the each record with some condition

I have the following data:
DECLARE @temp TABLE (
    ID int
    ,sn varchar(200)
    ,comment varchar(2000)
    ,rownumber int
)
insert into @temp values(1,'sn1',NULL,1)
insert into @temp values(2,'sn1','aaa',2)
insert into @temp values(3,'sn1','bbb',3)
insert into @temp values(4,'sn1',NULL,4)
insert into @temp values(5,'sn2',NULL,1)
insert into @temp values(6,'sn2',NULL,2)
insert into @temp values(7,'sn2',NULL,3)
select * from @temp
And I want the output to look like this:
ID  sn   comment  rownumber
2   sn1  aaa      2
5   sn2  NULL     1
For each sn: if any comment has a value, take the record with the lowest rownumber among those that have a comment. sn1 has two records with a comment value, so take the record with rownumber = 2.
If no comment has a value, take the record with the lowest rownumber. For sn2, that is the record with rownumber = 1.
How do I write this SQL?
This is a prioritization query. I think row_number() is the simplest method:
select t.*
from (select t.*,
             row_number() over (partition by sn
                                order by (case when comment is not null then 1 else 2 end),
                                         rownumber
                               ) as seqnum
      from @temp t
     ) t
where seqnum = 1;
Here is a db<>fiddle.

SQL to combine results into one group in the where clause

I have a query
SELECT name,
       COUNT(name)
FROM employee
WHERE location LIKE '%NY%'
GROUP BY name
which returns
name     count
alex m   10
alex.m   5
alex.ma  1
alex     500
How can I combine all the alex's into just one alex, so that I get the output as
name  count
alex  516
I need something like: if it matches alex%, then consider it as alex.
Here is a dynamic solution for SQL Server.
First, let's see the sample data I worked on:
create table #temp
(name varchar(20))
insert into #temp values ('jack')
insert into #temp values ('jack rx')
insert into #temp values ('jack.a')
insert into #temp values ('jack.bb')
insert into #temp values ('jack.xy')
insert into #temp values ('brandon.12')
insert into #temp values ('brandon')
insert into #temp values ('brandon.k7s')
insert into #temp values ('brandon.bg')
insert into #temp values ('Jonathan')
Then, we need to employ string operators:
;with cte (name, charin, charin_space) as
(
    select name, CHARINDEX('.',name,0) as charin, CHARINDEX(' ',name,0) as charin_space
    from #temp
)
select name, (case when charin = 0 and charin_space = 0 then name
                   when charin = 0 and charin_space <> 0 then SUBSTRING(name,0,charin_space)
                   when charin <> 0 and charin_space = 0 then SUBSTRING(name,0,charin)
                   -- both separators present: cut at whichever comes first
                   else SUBSTRING(name,0,case when charin < charin_space then charin else charin_space end)
              end) as mainName
into #temp2
from cte
The temp table #temp2 now maps each name to its main name (jack, brandon, Jonathan). All we need to do now is join the two tables and group by the main name:
select t2.mainName, COUNT(t2.mainName) as nameCount
from #temp t1
inner join #temp2 t2 on t1.name = t2.name
group by t2.mainName
I hope it helps!
You need to group by part of the name. You don't specify which DBMS you are using; this version uses LEFT, which SQL Server supports. The query works with your example, but it will also pick up Alexa, Alexander, ...
SELECT LEFT(name, 4) AS short_name,
       COUNT(name) AS name_count
FROM employee
WHERE location LIKE '%NY%'
GROUP BY LEFT(name, 4)
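If names such as Alexa or Alexander must stay separate, one option is to cut each name at its first dot or space instead of at a fixed length. A rough sketch, assuming the employee table and location filter from the question; appending '.' guarantees that PATINDEX always finds a separator, even for a plain 'alex':

```sql
-- 'alex m', 'alex.m', 'alex.ma' and 'alex' all reduce to 'alex';
-- 'alexander' stays 'alexander' because it contains no separator.
SELECT LEFT(name, PATINDEX('%[. ]%', name + '.') - 1) AS main_name,
       COUNT(*) AS name_count
FROM employee
WHERE location LIKE '%NY%'
GROUP BY LEFT(name, PATINDEX('%[. ]%', name + '.') - 1);
```

This assumes the dot and the space are the only separators in use; other separators would need to be added to the `[. ]` character class.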

SQL Select Counter by Group

Here is the code I've written to create a scenario:
USE tempdb
GO
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'dbo.Emp') AND type in (N'U'))
DROP TABLE Emp
GO
CREATE TABLE Emp(
EmpID Int Identity(10,1) Primary Key,
EmpGroupID Int)
GO
INSERT INTO Emp(EmpGroupID) VALUES(1000)
INSERT INTO Emp(EmpGroupID) VALUES(1000)
INSERT INTO Emp(EmpGroupID) VALUES(1000)
INSERT INTO Emp(EmpGroupID) VALUES(2000)
INSERT INTO Emp(EmpGroupID) VALUES(2000)
INSERT INTO Emp(EmpGroupID) VALUES(2000)
INSERT INTO Emp(EmpGroupID) VALUES(3000)
GO
SELECT * FROM Emp
ORDER BY EmpGroupID,EmpID
What I need is for each group to have a counter variable, incrementing by 1, such that all the rows for Group 1000 have counter=1, groupid=2000 has counter=2, groupid=3000 has counter=3.
SELECT ?,EmpID,EmpGroupID
FROM Emp
ORDER BY EmpGroupID,EmpID
-- The result I'm looking for is:
1,10,1000
1,11,1000
1,12,1000
2,13,2000
2,14,2000
2,15,2000
3,16,3000
You're describing a dense ranking of groups:
SELECT
DENSE_RANK() OVER (ORDER BY EmpGroupID) as Counter,
EmpID,
EmpGroupID
FROM Emp
ORDER BY EmpGroupID,EmpID
And here's some reference material: http://msdn.microsoft.com/en-us/library/ms189798.aspx
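To see why DENSE_RANK fits here rather than the other ranking functions, it can help to run them side by side. A small sketch against the Emp table from the question; the value comments reflect the seven sample rows shown there:

```sql
SELECT
    ROW_NUMBER() OVER (ORDER BY EmpGroupID, EmpID) AS RowNumber,    -- 1,2,3,4,5,6,7
    RANK()       OVER (ORDER BY EmpGroupID)        AS RankWithGaps, -- 1,1,1,4,4,4,7
    DENSE_RANK() OVER (ORDER BY EmpGroupID)        AS Counter,      -- 1,1,1,2,2,2,3
    EmpID,
    EmpGroupID
FROM Emp
ORDER BY EmpGroupID, EmpID;
```

RANK leaves gaps after ties, while DENSE_RANK numbers the distinct EmpGroupID values consecutively, which is the per-group counter the question describes.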
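You mean, you need a query that produces textual output with the commas as shown?
Try (the integer columns must be cast to strings before concatenating):
SELECT CAST(Counter AS varchar(10)) + ',' + CAST(EmpGroupID AS varchar(10)) + ',' + CAST(EmpID AS varchar(10))
FROM [Table]
ORDER BY EmpGroupID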
ORDER BY can take more than one column.
Try
SELECT Counter, EmpGroupID, EmpID
FROM [Table]
ORDER BY Counter, EmpGroupID, EmpID
Guessing from your description, do you want something like
SELECT EmpGroupID, EmpID, COUNT(1) AS Counter
FROM some-table-name
GROUP BY EmpGroupID, EmpID
ORDER BY COUNT(1), EmpGroupID, EmpID
That's for SQL Server - in other cases you may be able to say
ORDER BY Counter, EmpGroupID, EmpID
It took me a while to understand what you were asking. As I understand it, you want to create and populate the 'Counter' column based on the EmpGroupID? If so, then something like this:
SELECT EmpGroupID, EmpID,
       -- number of distinct groups with a smaller ID, plus one
       (SELECT COUNT(DISTINCT t2.EmpGroupID) + 1
        FROM [table] t2
        WHERE t2.EmpGroupID < t1.EmpGroupID
       ) AS Counter
FROM [table] t1
ORDER BY EmpGroupID, EmpID
Try this:
SELECT DENSE_RANK() OVER (ORDER BY EmpGroupID) AS Counter, EmpID, EmpGroupID
FROM Emp
ORDER BY Counter, EmpID

Pivot String SQL

I am trying to pivot this table, whose name is #salida:
IDJOB  NAME     DATE
1      Michael  NULL
1      Aaron    NULL
The result I want to obtain is
IDJOB  DATE  NAME1    NAME2
1      NULL  Michael  Aaron
My code is this
SELECT *
FROM #salida
PIVOT
(
MAX([Name]) FOR [Name] IN ([Name1],[Name2])
) PVT GROUP BY IdJob,Date,Name1,Name2 ;
SELECT * FROM #salida
The result I obtain is
IDJOB  DATE  NAME1  NAME2
1      NULL  NULL   NULL
@XabiIparra, see a mock-up below. You need to partition by IdJob, generate a column name per row, and then pivot on that generated column.
DECLARE @salida TABLE(idjob VARCHAR(100),[Name] VARCHAR(100),[DATE] DATE);
INSERT INTO @salida VALUES
(1,'Michael', NULL)
,(1,'Aaron', NULL)
,(2,'Banabas', NULL)
SELECT p.*
FROM
(
    SELECT *
    ,'NAME'+CAST(ROW_NUMBER() OVER(PARTITION BY [idjob] ORDER BY NAME) AS varchar(100)) ColumnName
    FROM @salida
)t
PIVOT
(
    MAX([Name]) FOR ColumnName IN (NAME1,NAME2,NAME3,NAME4,NAME5 /*add as many as you need*/)
)p;
How about just using aggregation with min() and max()?
select idjob, date, min(name) as name1, max(name) as name2
from #salida
group by idjob, date;
SQL tables represent unordered sets, so there is no ordering to the values (unless another column specifies the ordering). So, this is probably the simplest way to get two different values in the same row.
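When a job can have more than two names, min() and max() are no longer enough. One alternative to PIVOT is conditional aggregation over a row number. This is only a sketch, assuming the #salida table from the question with columns IdJob, Name and Date; the positions are assigned alphabetically because the table has no column that defines another order:

```sql
SELECT IdJob,
       [Date],
       MAX(CASE WHEN rn = 1 THEN [Name] END) AS Name1,
       MAX(CASE WHEN rn = 2 THEN [Name] END) AS Name2,
       MAX(CASE WHEN rn = 3 THEN [Name] END) AS Name3  -- add more positions as needed
FROM (
    SELECT IdJob, [Date], [Name],
           ROW_NUMBER() OVER (PARTITION BY IdJob ORDER BY [Name]) AS rn
    FROM #salida
) AS t
GROUP BY IdJob, [Date];
```

With the sample rows this puts Aaron in Name1 and Michael in Name2, since alphabetical order is the only ordering available.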