Rename duplicate rows - sql-server-2005

Here's a simplified example of my problem. I have a table where there's a "Name" column with duplicate entries:
ID Name
--- ----
1 AAA
2 AAA
3 AAA
4 BBB
5 CCC
6 CCC
7 DDD
8 DDD
9 DDD
10 DDD
Doing a GROUP BY like SELECT Name, COUNT(*) AS [Count] FROM Table GROUP BY Name results in this:
Name Count
---- -----
AAA 3
BBB 1
CCC 2
DDD 4
I'm only concerned about the duplicates, so I'll add a HAVING clause, SELECT Name, COUNT(*) AS [Count] FROM Table GROUP BY Name HAVING COUNT(*) > 1:
Name Count
---- -----
AAA 3
CCC 2
DDD 4
Trivial so far, but now things get tricky: I need a query to get me all the duplicate records, but with a nice incrementing indicator added to the Name column. The result should look something like this:
ID Name
--- --------
1 AAA
2 AAA (2)
3 AAA (3)
5 CCC
6 CCC (2)
7 DDD
8 DDD (2)
9 DDD (3)
10 DDD (4)
Note row 4 with "BBB" is excluded, and the first duplicate keeps the original Name.
Using an EXISTS statement gives me all the records I need, but how do I go about creating the new Name value?
SELECT * FROM Table AS T1
WHERE EXISTS (
SELECT Name, COUNT(*) AS [Count]
FROM Table
GROUP BY Name
HAVING (COUNT(*) > 1) AND (Name = T1.Name))
ORDER BY Name
I need to create an UPDATE statement that will fix all the duplicates, i.e. change the Name as per this pattern.
Update:
Figured it out now. It was the PARTITION BY clause I was missing.

With Dups As
(
Select Id, Name
, Row_Number() Over ( Partition By Name Order By Id ) As Rnk
From Table
)
Select D.Id
, D.Name + Case
When D.Rnk > 1 Then ' (' + Cast(D.Rnk As varchar(10)) + ')'
Else ''
End As Name
From Dups As D
If you want an update statement you can use pretty much the same structure:
With Dups As
(
Select Id, Name
, Row_Number() Over ( Partition By Name Order By Id ) As Rnk
From Table
)
Update Table
Set Name = T.Name + Case
When D.Rnk > 1 Then ' (' + Cast(D.Rnk As varchar(10)) + ')'
Else ''
End
From Table As T
Join Dups As D
On D.Id = T.Id
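
For anyone who wants to try the ROW_NUMBER() / PARTITION BY idea without a SQL Server instance, here is a minimal sketch using SQLite through Python's sqlite3 module. It is only a stand-in, not the T-SQL above: the table name items is made up (TABLE is a reserved word), SQLite concatenates with || instead of + and CAST, and the COUNT(*) OVER column is my own addition to leave out names that occur only once, as the question asks.

```python
import sqlite3

# Hypothetical in-memory copy of the question's data; "items" is an assumed name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
names = ["AAA", "AAA", "AAA", "BBB", "CCC", "CCC", "DDD", "DDD", "DDD", "DDD"]
conn.executemany("INSERT INTO items (id, name) VALUES (?, ?)",
                 list(enumerate(names, start=1)))

# Number the rows inside each name (ordered by id) and count the group size,
# then keep only the groups that really contain duplicates.
RENAME_SQL = """
WITH dups AS (
    SELECT id, name,
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY id) AS rnk,
           COUNT(*)     OVER (PARTITION BY name)             AS cnt
    FROM items
)
SELECT id,
       name || CASE WHEN rnk > 1 THEN ' (' || rnk || ')' ELSE '' END AS name
FROM dups
WHERE cnt > 1
ORDER BY id
"""

renamed = conn.execute(RENAME_SQL).fetchall()
for row in renamed:
    print(row)
```

The first row of each group keeps its original name because its row number is 1, and row 4 ("BBB") drops out because its group size is 1.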

Just update the subquery directly (ordering by ID so the lowest ID in each group keeps the original Name):
update d
set Name = Name + ' (' + cast(r as varchar(10)) + ')'
from ( select Name,
row_number() over (partition by Name order by ID) as r
from [table]
) d
where r > 1

SELECT ROW_NUMBER() OVER(ORDER BY Name) AS RowNum,
Name,
Name + '(' + CAST(ROW_NUMBER() OVER(PARTITION BY Name ORDER BY Name) AS VARCHAR(10)) + ')' concatenatedName
FROM Table
WHERE Name IN
(
SELECT Name
FROM Table
GROUP BY Name
HAVING COUNT(*) > 1
)
This will get you what you originally asked for. For the UPDATE statement, you'll want a WHILE loop that updates the top 1 row on each pass:
DECLARE @Pointer VARCHAR(20), @Count INT
WHILE EXISTS(SELECT Name FROM Table GROUP BY Name HAVING COUNT(1) > 1)
BEGIN
SELECT TOP 1 @Pointer = Name, @Count = COUNT(1) FROM Table GROUP BY Name HAVING COUNT(1) > 1
UPDATE TOP (1) Table
SET Name = Name + '(' + CAST(@Count AS VARCHAR(10)) + ')'
WHERE Name = @Pointer
END

There's no need to do an UPDATE at all. The following will create the table for INSERT as desired
SELECT
ROW_NUMBER() OVER(ORDER BY tb2.Id) Id,
tb2.Name + CASE WHEN COUNT(*) > 1 THEN ' (' + CONVERT(VARCHAR, Count(*)) + ')' ELSE '' END [Name]
FROM
tb tb1,
tb tb2
WHERE
tb1.Name = tb2.Name AND
tb1.Id <= tb2.Id
GROUP BY
tb2.Name,
tb2.Id

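Here's an even simpler UPDATE statement. Windowed functions are only allowed in the SELECT and ORDER BY clauses, so the row number is computed in a CTE and the CTE is updated:
WITH numbered AS
(
SELECT [Name], ROW_NUMBER() OVER (PARTITION BY [Name] ORDER BY Id) AS rn
FROM tb
)
UPDATE numbered
SET [Name] = [Name] + ' (' + CONVERT(VARCHAR(10), rn) + ')'
WHERE rn > 1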

Related

Get top 5 records for each group and Concatenate them in a Row per group

I have a table Contacts that basically looks like the following:
Id | Name | ContactId | Contact | Amount
---------------------------------------------
1 | A | 1 | 12323432 | 555
---------------------------------------------
1 | A | 2 | 23432434 | 349
---------------------------------------------
2 | B | 3 | 98867665 | 297
--------------------------------------------
2 | B | 4 | 88867662 | 142
--------------------------------------------
2 | B | 5 | null | 698
--------------------------------------------
Here, ContactId is unique throughout the table. Contact can be NULL, and I would like to exclude those rows.
Now I want to select the top 5 contacts for each Id based on their Amount. I accomplished that with the following query:
WITH cte AS (
SELECT id, Contact, amount, ROW_NUMBER()
over (
PARTITION BY id
order by amount desc
) AS RowNo
FROM contacts
where contact is not null
)
select * from cte where RowNo <= 5
It works fine up to this point. Now I want to concatenate these (<= 5) records for each group and show them in a single row.
Expected Result :
Id | Name | Contact
-------------------------------
1 | A | 12323432;23432434
-------------------------------
2 | B | 98867665;88867662
I am using the following query to achieve this, but it still returns every record in a separate row and also includes the NULL values:
WITH cte AS (
SELECT id, Contact, amount,contactid, ROW_NUMBER()
over (
PARTITION BY id
order by amount desc
) AS RowNo
FROM contacts
where contact is not null
)
select *from id, name,
STUFF ((
SELECT distinct '; ' + isnull(contact,'') FROM cte
WHERE co.id= cte.id and co.contactid= cte.contactid
and RowNo <= 5
FOR XML PATH('')),1, 1, '')as contact
from contacts co inner join cte where cte.id = co.id and co.contactid= cte.contactid
The above query still gives me all top 5 contacts in different rows, including the NULLs.
Is it a good idea to use a CTE and STUFF together? Please suggest a better approach if there is one.
I found the problem with my final query:
I don't need the original Contacts table in my final SELECT, since the CTE already has everything I need. Also, inside STUFF() I was joining on contactid, which belongs to the very rows I'm trying to concatenate; because of that join condition, each contact came back in its own row. I removed these two conditions and it worked.
WITH cte AS (
SELECT id, name, Contact, amount, contactid, ROW_NUMBER()
over (
PARTITION BY id
order by amount desc
) AS RowNo
FROM contacts
where contact is not null
)
select distinct co.id, co.name,
STUFF ((
SELECT ';' + cte.contact FROM cte
WHERE co.id = cte.id
and cte.RowNo <= 5
order by cte.RowNo
FOR XML PATH('')), 1, 1, '') as contact
from cte co where co.RowNo <= 5
You can use conditional aggregation:
select id, name,
concat(max(case when seqnum = 1 then contact + ';' end),
max(case when seqnum = 2 then contact + ';' end),
max(case when seqnum = 3 then contact + ';' end),
max(case when seqnum = 4 then contact + ';' end),
max(case when seqnum = 5 then contact + ';' end)
) as contacts
from (select c.*,
row_number() over (partition by id order by amount desc) as seqnum
from contacts c
where contact is not null
) c
group by id, name;
If you are running SQL Server 2017 or higher, you can use string_agg(). Like most other aggregate functions, it ignores null values by design.
select id, name, string_agg(contact, ',') within group (order by rn) all_contacts
from (
select id, name, contact,
row_number() over (partition by id order by amount desc) as rn
from contacts
where contact is not null
) t
where rn <= 5
group by id, name
Note that you don't strictly need a CTE here; you can return the columns you need from the subquery, and use them directly in the outer query.
In earlier versions, one approach using stuff() and for xml path is:
with cte as (
select id, name, contact,
row_number() over (partition by id order by amount desc) as rn
from contacts
where contact is not null
)
select id, name,
stuff(
(
select ', ' + c1.contact
from cte c1
where c1.id = c.id and c1.rn <= 5
order by c1.rn
for xml path (''), type
).value('.', 'varchar(max)'), 1, 2, ''
) all_contacts
from cte c
group by id, name
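
If you just want to check the "rank inside each group, keep the top N, then join" logic without SQL Server, here is a rough sketch that runs the ranking in SQLite (through Python's sqlite3) and does the string join in Python. This is a stand-in I put together, not any of the T-SQL queries above: the helper name top_contacts and its limit and sep parameters are made up for illustration, and Contact is stored as text here.

```python
import sqlite3
from itertools import groupby

# In-memory copy of the sample Contacts table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE contacts
                (id INTEGER, name TEXT, contactid INTEGER, contact TEXT, amount INTEGER)""")
conn.executemany("INSERT INTO contacts VALUES (?, ?, ?, ?, ?)", [
    (1, "A", 1, "12323432", 555),
    (1, "A", 2, "23432434", 349),
    (2, "B", 3, "98867665", 297),
    (2, "B", 4, "88867662", 142),
    (2, "B", 5, None, 698),
])

# Rank the non-NULL contacts inside each id by amount (highest first)
# and keep only the first N of every group.
TOP_N_SQL = """
SELECT id, name, contact
FROM (
    SELECT id, name, contact,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY amount DESC) AS rn
    FROM contacts
    WHERE contact IS NOT NULL
) AS ranked
WHERE rn <= ?
ORDER BY id, rn
"""

def top_contacts(limit=5, sep=";"):
    """Return one (id, name, joined_contacts) tuple per group (hypothetical helper)."""
    rows = conn.execute(TOP_N_SQL, (limit,)).fetchall()
    # Rows arrive sorted by id, so consecutive rows with the same (id, name) form a group.
    return [(key[0], key[1], sep.join(r[2] for r in group))
            for key, group in groupby(rows, key=lambda r: (r[0], r[1]))]

result = top_contacts()
for row in result:
    print(row)
```

Filtering the NULL contacts before ranking matters: otherwise the NULL row for Id 2 (Amount 698) would take rank 1 and use up one of the five slots.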
I agree with @GMB. STRING_AGG() is what you need ...
WITH
contacts(Id,nm,ContactId,Contact,Amount) AS (
SELECT 1,'A',1,12323432,555
UNION ALL SELECT 1,'A',2,23432434,349
UNION ALL SELECT 2,'B',3,98867665,297
UNION ALL SELECT 2,'B',4,88867662,142
UNION ALL SELECT 2,'B',5,NULL ,698
)
,
with_filter_val AS (
SELECT
*
, ROW_NUMBER() OVER(PARTITION BY id ORDER BY amount DESC) AS rn
FROM contacts
)
SELECT
id
, nm
, STRING_AGG(CAST(contact AS CHAR(8)),',') AS contact_list
FROM with_filter_val
WHERE rn <=5
GROUP BY
id
, nm
-- out id | nm | contact_list
-- out ----+----+-------------------
-- out 1 | A | 12323432,23432434
-- out 2 | B | 98867665,88867662

Select non existing Numbers from Table each ID

I'm new to T-SQL and I'm struggling to get the numbers that don't exist in my table for each Group.
Example:
CustomerID Group
1 1
3 1
6 1
4 2
7 2
I want to get the IDs that do not exist and select them like this:
CustomerID Group
2 1
4 1
5 1
5 2
6 2
....
..
A solution using a CTE doesn't work well, and neither does inserting the data first and then filtering with a NOT EXISTS clause.
Any ideas?
If you can live with ranges rather than a list with each one, then an efficient method uses lead():
select group_id, (customer_id + 1) as first_missing_customer_id,
(next_ci - 1) as last_missing_customer_id
from (select t.*,
lead(customer_id) over (partition by group_id order by customer_id) as next_ci
from t
) t
where next_ci <> customer_id + 1
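
If you do need the individual values after all, the ranges from the lead() query are easy to expand afterwards. Below is a small sketch of that idea, run against SQLite through Python's sqlite3 purely so it can be tried without a SQL Server instance. The table name t and the columns customer_id and group_id follow the query above; the expansion step in Python is my own addition.

```python
import sqlite3

# In-memory copy of the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (customer_id INTEGER, group_id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(1, 1), (3, 1), (6, 1), (4, 2), (7, 2)])

# For every row, look at the next customer_id in the same group; a gap exists
# whenever the next value is not the immediate successor.
GAPS_SQL = """
SELECT group_id,
       customer_id + 1 AS first_missing,
       next_ci - 1     AS last_missing
FROM (
    SELECT group_id, customer_id,
           LEAD(customer_id) OVER (PARTITION BY group_id ORDER BY customer_id) AS next_ci
    FROM t
) AS s
WHERE next_ci <> customer_id + 1
ORDER BY group_id, customer_id
"""

gaps = conn.execute(GAPS_SQL).fetchall()

# Expand each (first_missing, last_missing) range into individual IDs.
missing = [(customer_id, group_id)
           for group_id, first, last in gaps
           for customer_id in range(first, last + 1)]
for row in missing:
    print(row)
```

The last row of each group has no successor, so its next_ci is NULL and the comparison filters it out; only gaps between existing IDs are reported.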
Cross join 2 recursive CTEs to get all the possible combinations of [CustomerID] and [Group] and then LEFT join to the table:
declare @c int = (select max([CustomerID]) from tablename);
declare @g int = (select max([Group]) from tablename);
with
customers as (
select 1 as cust
union all
select cust + 1
from customers where cust < @c
),
groups as (
select 1 as gr
union all
select gr + 1
from groups where gr < @g
),
cte as (
select *
from customers cross join groups
)
select c.cust as [CustomerID], c.gr as [Group]
from cte c left join tablename t
on t.[CustomerID] = c.cust and t.[Group] = c.gr
where t.[CustomerID] is null
and c.cust > (select min([CustomerID]) from tablename where [Group] = c.gr)
and c.cust < (select max([CustomerID]) from tablename where [Group] = c.gr)
See the demo.
Results:
> CustomerID | Group
> ---------: | ----:
> 2 | 1
> 4 | 1
> 5 | 1
> 5 | 2
> 6 | 2

Find Common Rows for some Row Values in SQL

I have a table with ID and SubID columns, and a user-defined table type that holds a list of SubIDs. I want all the IDs that have every SubID present in my user-defined table. For example:
The table is:
ID SubID
1 2
1 3
1 4
2 3
2 4
2 2
3 3
3 2
and the data type is
CREATE TYPE SubIds AS TABLE
( SubId INT );
GO
With Value
SubID
3
4
I want the output to be
ID
1
2
Because only IDs 1 and 2 contain both SubIDs 3 and 4.
Note: the combination of ID and SubID will always be unique, if that is of any use.
Let's assume that @s is your table of ids:
select t.ID
from t
Where t.SubId in (select SubId from @s)
group by t.Id
having count(*) = (select count(*) from @s);
This assumes that the two tables do not have duplicates. If duplicates are present, you can use:
select t.ID
from t
Where t.SubId in (select SubId from @s)
group by t.Id
having count(distinct t.SubId) = (select count(distinct s.SubId) from @s s);
Try this way
select ID
from yourtable
Where SubID in (3,4)
Group by ID
having Count(distinct SubID)=2
Another more flexible approach
select ID
from yourtable
Group by ID
having sum(case when SubID = 3 then 1 else 0 end) >= 1
and sum(case when SubID = 4 then 1 else 0 end) >= 1
If you want to pull SubId's from SubIds table type then,
SELECT ID
FROM yourtable T
JOIN (SELECT SubID,
Count(1) OVER() AS cnt
FROM SubIds) S
ON T.SubID = S.SubID
GROUP BY ID,Cnt
HAVING Count(DISTINCT T.SubID) = s.cnt
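
To see the "filter to the wanted SubIDs, then compare counts" pattern in action, here is a small sketch run against SQLite through Python's sqlite3. It is only an illustration under a few assumptions: SQLite has no table types or table variables, so an ordinary table I called wanted stands in for the SubIds list, and the table name t is borrowed from the first answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Sample data from the question.
conn.execute("CREATE TABLE t (id INTEGER, subid INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (2, 2), (3, 3), (3, 2)])

# "wanted" is a hypothetical stand-in for the user-defined table type's contents.
conn.execute("CREATE TABLE wanted (subid INTEGER)")
conn.executemany("INSERT INTO wanted VALUES (?)", [(3,), (4,)])

# Keep only rows whose SubID is wanted, then require that each ID matched
# as many distinct SubIDs as there are distinct wanted SubIDs.
MATCH_ALL_SQL = """
SELECT t.id
FROM t
WHERE t.subid IN (SELECT subid FROM wanted)
GROUP BY t.id
HAVING COUNT(DISTINCT t.subid) = (SELECT COUNT(DISTINCT subid) FROM wanted)
ORDER BY t.id
"""

matching_ids = [row[0] for row in conn.execute(MATCH_ALL_SQL)]
print(matching_ids)
```

ID 3 is dropped because it only matches SubID 3, so its distinct count is 1 while the wanted list has 2 entries.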

Modify record using stored procedure

I am new to stored procedures.
I have 4 million records, so I cannot do this manually and want to use a stored procedure instead.
I have a table like:
Id Name
-----------------
1 abc
2 xyz
3 abc
4 pqr
5 abc
6 pqr
The table has a field called Name. Some records share the same name, so I want to modify those records to look like this:
Id Name
---------------------
1 abc
2 xyz
3 abc-1
4 pqr
5 abc-2
6 pqr-1
and insert the result into another table that has the same schema.
You can do this using an updatable CTE:
with toupdate as (
select t.*, row_number() over (partition by name order by id) as seqnum
from onetable t
)
update toupdate
set name = name + '-' + cast(seqnum - 1 as varchar(255))
where seqnum > 1;
Actually, that updates it in place. To put this into another table:
with toinsert as (
select t.*, row_number() over (partition by name order by id) as seqnum
from onetable t
)
select id,
(case when seqnum = 1 then name
else name + '-' + cast(seqnum - 1 as varchar(255))
end) as name
into newtable
from toinsert;
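This will update the table (rno - 1 is used so the second occurrence becomes abc-1, as in the expected output):
;WITH cte AS
(
SELECT id,
ROW_NUMBER() OVER(PARTITION BY Name ORDER BY Id) AS rno
FROM table1
)
update t
set t.Name = t.Name + '-' + cast(c.rno - 1 as varchar(10))
from table1 t
join cte c on c.id = t.id
where c.rno > 1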
To insert, simply use SELECT ... INTO with CHARINDEX:
select * into Table2 from table1
where CHARINDEX('-',name) > 1

Group and tally values for each record in SQL [duplicate]

I'm trying to write a SELECT statement that groups records with the same ID and also combines the values from another column for each master ID, as in the example below. Each result line should contain the unique ID once, followed by the names from its records separated by semicolons. Thanks in advance.
Current set
ID Name Cnt
-------------------------------- ----------------- ---
0001D72BA5F664BE129B6AB5744E2BE0 Talati, Shilpa 1
0001D72BA5F664BE129B6AB5744E2BE0 Weaver, Larry 1
0007EAB7CE9A3F2F95D2D63D0BBD08A9 St-Hilaire, Edith 1
0007EAB7CE9A3F2F95D2D63D0BBD08A9 Talati, Shilpa 1
Result:
0001D72BA5F664BE129B6AB5744E2BE0 Talati, Shilpa; Weaver, Larry
The easiest way to solve this in SQL Server, provided each ID has exactly two names, is:
select masterId, min(name) + '; ' + max(name)
from table t
group by masterId;
Here's one way using a recursive common table expression. Given a table like this:
create table dbo.Fizzbuzz
(
id int not null identity(1,1) primary key clustered ,
group_id int not null ,
name varchar(50) not null ,
cnt int not null ,
)
containing this data
id group_id name cnt
-- -------- ------ ---
1 1 Bob 3
2 1 Carol 5
3 1 Ted 6
4 1 Alice 16
5 2 Harold 72
6 2 Maude 28
This query
with recursive_cte as
(
select group_id = t.group_id ,
row = t.row ,
name = convert(varchar(8000),t.name) ,
cnt = t.cnt
from ( select * ,
row = row_number() over (
partition by group_id
order by id
)
from dbo.Fizzbuzz
) t
where t.row = 1
UNION ALL
select group_id = prv.group_id ,
row = nxt.row ,
name = convert(varchar(8000), prv.name + ' and ' + nxt.name ) ,
cnt = prv.cnt + nxt.cnt
from recursive_cte prv
join ( select * ,
row = row_number() over (
partition by group_id
order by id
)
from dbo.Fizzbuzz
) nxt on nxt.group_id = prv.group_id
and nxt.row = prv.row + 1
)
select group_id = t.group_id ,
total = t.cnt ,
names = t.name
from ( select * ,
rank = rank() over (
partition by group_id
order by row desc
)
from recursive_cte
) t
where rank = 1
order by group_id
produces the following output
group_id total names
-------- --- -------------------------------
1 30 Bob and Carol and Ted and Alice
2 100 Harold and Maude
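Note, however, that the depth of recursion is bounded in SQL Server: the default limit is 100 levels. It can be changed per query with OPTION (MAXRECURSION n), where 0 removes the limit.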
SELECT
t1.ID,
(SELECT Name + '; '
FROM yourtable t2
WHERE t1.ID = t2.ID
for xml path('')) as Name
FROM yourtable t1
GROUP BY t1.ID