Modify record using stored procedure - sql

I am new with the stored procedure.
I have 4 million records so that manually cannot do that so use stored procedure.
I have a table like:
Id Name
-----------------
1 abc
2 xyz
3 abc
4 pqr
5 abc
6 pqr
And in that table one filed is called Name. In Name column, some are record are same name so I want to modify records and want like:
Id Name
---------------------
1 abc
2 xyz
3 abc-1
4 pqr
5 abc-2
6 pqr-1
& Insert it into another table which have same schema.

You can do this using an updatable CTE:
with toupdate as (
select t.*, row_number() over (partition by name order by id) as seqnum
from onetable t
)
update toupdate
set name = name + '-' + cast(seqnum - 1 as varchar(255))
where seqnum > 1;
Actually, that updates it in place. To put this into another table:
with toinsert as (
select t.*, row_number() over (partition by name order by id) as seqnum
from onetable t
)
select id,
(case when seqnum = 1 then name
else name + '-' + cast(seqnum - 1 as varchar(255))
end) as name
into newtable
from toinsert;

This will update the table
;WITH cte AS
(
SELECT id,
ROW_NUMBER() OVER(PARTITION BY Name ORDER BY Id) AS rno,
FROM table1
)
update t.Name = t.Name + '-'+ c.rno
from table1 t
join cte c on c.id = t.id
where c.rno >1
To insert simply use select with charindex
select * into Table2 from table1
where CHARINDEX('-',name) > 1

Related

Get top 5 records for each group and Concate them in a Row per group

I have a table Contacts that basically looks like following:
Id | Name | ContactId | Contact | Amount
---------------------------------------------
1 | A | 1 | 12323432 | 555
---------------------------------------------
1 | A | 2 | 23432434 | 349
---------------------------------------------
2 | B | 3 | 98867665 | 297
--------------------------------------------
2 | B | 4 | 88867662 | 142
--------------------------------------------
2 | B | 5 | null | 698
--------------------------------------------
Here, ContactId is unique throughout the table. Contact can be NULL & I would like to exclude those.
Now, I want to select top 5 contacts for each Id based on their Amount. I am accomplished that by following query:
WITH cte AS (
SELECT id, Contact, amount, ROW_NUMBER()
over (
PARTITION BY id
order by amount desc
) AS RowNo
FROM contacts
where contact is not null
)
select *from cte where RowNo <= 5
It's working fine upto this point. Now I want to concate these (<=5) record for each group & show them in a single row by concatenating them.
Expected Result :
Id | Name | Contact
-------------------------------
1 | A | 12323432;23432434
-------------------------------
2 | B | 98867665;88867662
I am using following query to achieve this but it still gives all records in separate rows and also including Null values too:
WITH cte AS (
SELECT id, Contact, amount,contactid, ROW_NUMBER()
over (
PARTITION BY id
order by amount desc
) AS RowNo
FROM contacts
where contact is not null
)
select *from id, name,
STUFF ((
SELECT distinct '; ' + isnull(contact,'') FROM cte
WHERE co.id= cte.id and co.contactid= cte.contactid
and RowNo <= 5
FOR XML PATH('')),1, 1, '')as contact
from contacts co inner join cte where cte.id = co.id and co.contactid= cte.contactid
Above query still gives me all top 5 contacts in diff rows & including null too.
Is it a good idea to use CTE and STUFF togather? Please suggest if there is any better approach than this.
I got the problem with my final query:
I don't need original Contact table in my final Select, since I already have everything I needed in CTE. Also, Inside STUFF(), I'm using contactid to join which is what actually I'm trying to concat here. Since I'm using that condition for join, I am getting records in diff rows. I've removed these 2 condition and it worked.
WITH cte AS (
SELECT id, Contact, amount,contactid, ROW_NUMBER()
over (
PARTITION BY id
order by amount desc
) AS RowNo
FROM contacts
where contact is not null
)
select *from id, name,
STUFF ((
SELECT distinct '; ' + isnull(contact,'') FROM cte
WHERE co.id= cte.id
and RowNo <= 5
FOR XML PATH('')),1, 1, '')as contact
from cte where rowno <= 5
You can use conditional aggregation:
id, name, contact,
select id, name,
concat(max(case when seqnum = 1 then contact + ';' end),
max(case when seqnum = 2 then contact + ';' end),
max(case when seqnum = 3 then contact + ';' end),
max(case when seqnum = 4 then contact + ';' end),
max(case when seqnum = 5 then contact + ';' end)
) as contacts
from (select c.*
row_number() over (partition by id order by amount desc) as seqnum
from contacts c
where contact is not null
) c
group by id, name;
If you are running SQL Server 2017 or higher, you can use string_agg(): as most other aggregate functions, it ignores null values by design.
select id, name, string_agg(contact, ',') within group (order by rn) all_contacts
from (
select id, name, contact
row_number() over (partition by id order by amount desc) as rn
from contacts
where contact is not null
) t
where rn <= 5
group by id, name
Note that you don't strictly need a CTE here; you can return the columns you need from the subquery, and use them directly in the outer query.
In earlier versions, one approach using stuff() and for xml path is:
with cte as (
select id, name, contact,
row_number() over (partition by id order by amount desc) as rn
from contacts
where contact is not null
)
select id, name,
stuff(
(
select ', ' + c1.concat
from cte c1
where c1.id = c.id and c1.rn <= 5
order by c1.rn
for xml path (''), type
).value('.', 'varchar(max)'), 1, 2, ''
) all_contacts
from cte
group by id, name
I agree with #GMB. STRING_AGG() is what you need ...
WITH
contacts(Id,nm,ContactId,Contact,Amount) AS (
SELECT 1,'A',1,12323432,555
UNION ALL SELECT 1,'A',2,23432434,349
UNION ALL SELECT 2,'B',3,98867665,297
UNION ALL SELECT 2,'B',4,88867662,142
UNION ALL SELECT 2,'B',5,NULL ,698
)
,
with_filter_val AS (
SELECT
*
, ROW_NUMBER() OVER(PARTITION BY id ORDER BY amount DESC) AS rn
FROM contacts
)
SELECT
id
, nm
, STRING_AGG(CAST(contact AS CHAR(8)),',') AS contact_list
FROM with_filter_val
WHERE rn <=5
GROUP BY
id
, nm
-- out id | nm | contact_list
-- out ----+----+-------------------
-- out 1 | A | 12323432,23432434
-- out 2 | B | 98867665,88867662

Need to delete duplicate records from the table using row_number()

I am having a table test having data as follows and I want to delete the trsid 124 and I have millions entry in my DB it is just a scenarion. Concept is to delete the duplicate entry from the table
--------------------------------------------
TrsId | ID | Name |
--------------------------------------------
123 | 1 | ABC |
124 | 1 | ABC |
I am trying something like
delete from test
select T.* from
(
select ROW_NUMBER() over (partition by ID order by name) as r,
Trsid,
ID,
name
from test
) t
where r = 2
Even if I update the query which is Ok for me
update test set id=NULL
select T.* from
(
select ROW_NUMBER() over (partition by ID order by name) as r,
Trsid,
ID,
name
from test
) t
where r = 2
But if i run both this query it deletes all the records from table test. And if i update it update both the records.
I dont know what I am doing wrong here
WITH cte AS
(
SELECT ROW_NUMBER() OVER(PARTITION by ID ORDER BY name) AS Row
FROM test
)
DELETE FROM cte
WHERE Row > 1
Use the below query.
;WITH cte_1
AS (SELECT ROW_NUMBER() OVER(PARTITION BY ID,NAME ORDER BY TrsId ) Rno,*
FROM YourTable)
DELETE
FROM cte_1
WHERE RNO>1
WITH cte_DUP AS (
SELECT * FROM (
select <col1,col2,col3..coln>, row_number()
over(partition by <col1,col2,col3..coln>
order by <col1,col2,col3..coln> ) rownumber
from <your table> ) AB WHERE rownumber > 1)
DELETE FROM cte_DUP WHERE ROWNUMBER > 1
To find duplicate records we can write like below query,
;WITH dup_val
AS (SELECT a,
b,
Row_number()
OVER(
partition BY a, b
ORDER BY b, NAME)AS [RANK]
FROM table_name)
SELECT *
FROM dup_val
WHERE [rank] <> 1;

Any other alternative to write this SQL query

I need to select data base upon three conditions
Find the latest date (StorageDate Column) from the table for each record
See if there is more then one entry for date (StorageDate Column) found in first step for same ID (ID Column)
and then see if DuplicateID is = 2
So if table has following data:
ID |StorageDate | DuplicateTypeID
1 |2014-10-22 | 1
1 |2014-10-22 | 2
1 |2014-10-18 | 1
2 |2014-10-12 | 1
3 |2014-10-11 | 1
4 |2014-09-02 | 1
4 |2014-09-02 | 2
Then I should get following results
ID
1
4
I have written following query but it is really slow, I was wondering if anyone has better way to write it.
SELECT DISTINCT(TD.RecordID)
FROM dbo.MyTable TD
JOIN (
SELECT T1.RecordID, T2.MaxDate,COUNT(*) AS RecordCount
FROM MyTable T1 WITH (nolock)
JOIN (
SELECT RecordID, MAX(StorageDate) AS MaxDate
FROM MyTable WITH (nolock)
GROUP BY RecordID)T2
ON T1.RecordID = T2.RecordID AND T1.StorageDate = T2.MaxDate
GROUP BY T1.RecordID, T2.MaxDate
HAVING COUNT(*) > 1
)PT ON TD.RecordID = PT.RecordID AND TD.StorageDate = PT.MaxDate
WHERE TD.DuplicateTypeID = 2
Try this and see how the performance goes:
;WITH
tmp AS
(
SELECT *,
RANK() OVER (PARTITION BY ID ORDER BY StorageDate DESC) AS StorageDateRank,
COUNT(ID) OVER (PARTITION BY ID, StorageDate) AS StorageDateCount
FROM MyTable
)
SELECT DISTINCT ID
FROM tmp
WHERE StorageDateRank = 1 -- latest date for each ID
AND StorageDateCount > 1 -- more than 1 entry for date
AND DuplicateTypeID = 2 -- DuplicateTypeID = 2
You can use analytic function rank , can you try this query ?
Select recordId from
(
select *, rank() over ( partition by recordId order by [StorageDate] desc) as rn
from mytable
) T
where rn =1
group by recordId
having count(*) >1
and sum( case when duplicatetypeid =2 then 1 else 0 end) >=1

Group and tally values for each record in SQL [duplicate]

This question already has answers here:
How to use GROUP BY to concatenate strings in SQL Server?
(22 answers)
Closed 8 years ago.
Im trying to run a select statement to group records having similar IDs but also tally the values from another column for each master ID. So for example below. The result for each line will be the first instance unique ID and the 2 names shown from each record separated by semi colon. Thanks in advance.
Current set
ID Name Cnt
-------------------------------- ----------------- ---
0001D72BA5F664BE129B6AB5744E2BE0 Talati, Shilpa 1
0001D72BA5F664BE129B6AB5744E2BE0 Weaver, Larry 1
0007EAB7CE9A3F2F95D2D63D0BBD08A9 St-Hilaire, Edith 1
0007EAB7CE9A3F2F95D2D63D0BBD08A9 Talati, Shilpa 1
Result:
0001D72BA5F664BE129B6AB5744E2BE0 Talati, Shilpa; Weaver, Larry
The easiest way to solve this in SQL Server is:
select masterId, min(name) + '; ' + max(name)
from table t
group by masterId;
Here's one way using a recursive common table expression. Given a table like this:
create table dbo.Fizzbuzz
(
id int not null identity(1,1) primary key clustered ,
group_id int not null ,
name varchar(50) not null ,
cnt int not null ,
)
containing this data
id group_id name cnt
-- -------- ------ ---
1 1 Bob 3
2 1 Carol 5
3 1 Ted 6
4 1 Alice 16
5 2 Harold 72
6 2 Maude 28
This query
with recursive_cte as
(
select group_id = t.group_id ,
row = t.row ,
name = convert(varchar(8000),t.name) ,
cnt = t.cnt
from ( select * ,
row = row_number() over (
partition by group_id
order by id
)
from dbo.Fizzbuzz
) t
where t.row = 1
UNION ALL
select group_id = prv.group_id ,
row = nxt.row ,
name = convert(varchar(8000), prv.name + ' and ' + nxt.name ) ,
cnt = prv.cnt + nxt.cnt
from recursive_cte prv
join ( select * ,
row = row_number() over (
partition by group_id
order by id
)
from dbo.Fizzbuzz
) nxt on nxt.group_id = prv.group_id
and nxt.row = prv.row + 1
)
select group_id = t.group_id ,
total = t.cnt ,
names = t.name
from ( select * ,
rank = rank() over (
partition by group_id
order by row desc
)
from recursive_cte
) t
where rank = 1
order by group_id
produces the following output
group_id cnt name
-------- --- -------------------------------
1 30 Bob and Carol and Ted and Alice
2 100 Harold and Maude
One should note however, that the depth of recursion is bounded in SQL Server.
SELECT
t1.ID,
(SELECT Name + '; '
FROM yourtable t2
WHERE t1.ID = t2.ID
for xml path('')) as Name
FROM yourtable t1
GROUP BY t1.ID

Rename duplicate rows

Here's a simplified example of my problem. I have a table where there's a "Name" column with duplicate entries:
ID Name
--- ----
1 AAA
2 AAA
3 AAA
4 BBB
5 CCC
6 CCC
7 DDD
8 DDD
9 DDD
10 DDD
Doing a GROUP BY like SELECT Name, COUNT(*) AS [Count] FROM Table GROUP BY Name results in this:
Name Count
---- -----
AAA 3
BBB 1
CCC 2
DDD 4
I'm only concerned about the duplicates, so I'll add a HAVING clause, SELECT Name, COUNT(*) AS [Count] FROM Table GROUP BY Name HAVING COUNT(*) > 1:
Name Count
---- -----
AAA 3
CCC 2
DDD 4
Trivial so far, but now things get tricky: I need a query to get me all the duplicate records, but with a nice incrementing indicator added to the Name column. The result should look something like this:
ID Name
--- --------
1 AAA
2 AAA (2)
3 AAA (3)
5 CCC
6 CCC (2)
7 DDD
8 DDD (2)
9 DDD (3)
10 DDD (4)
Note row 4 with "BBB" is excluded, and the first duplicate keeps the original Name.
Using an EXISTS statement gives me all the records I need, but how do I go about creating the new Name value?
SELECT * FROM Table AS T1
WHERE EXISTS (
SELECT Name, COUNT(*) AS [Count]
FROM Table
GROUP BY Name
HAVING (COUNT(*) > 1) AND (Name = T1.Name))
ORDER BY Name
I need to create an UPDATE statement that will fix all the duplicates, i.e. change the Name as per this pattern.
Update:
Figured it out now. It was the PARTITION BY clause I was missing.
With Dups As
(
Select Id, Name
, Row_Number() Over ( Partition By Name Order By Id ) As Rnk
From Table
)
Select D.Id
, D.Name + Case
When D.Rnk > 1 Then ' (' + Cast(D.Rnk As varchar(10)) + ')'
Else ''
End As Name
From Dups As D
If you want an update statement you can use pretty much the same structure:
With Dups As
(
Select Id, Name
, Row_Number() Over ( Partition By Name Order By Id ) As Rnk
From Table
)
Update Table
Set Name = T.Name + Case
When D.Rnk > 1 Then ' (' + Cast(D.Rnk As varchar(10)) + ')'
Else ''
End
From Table As T
Join Dups As D
On D.Id = T.Id
Just update the subquery directly:
update d
set Name = Name+'('+cast(r as varchar(10))+')'
from ( select Name,
row_number() over (partition by Name order by Name) as r
from [table]
) d
where r > 1
SELECT ROW_NUMBER() OVER(ORDER BY Name) AS RowNum,
Name,
Name + '(' + ROW_NUMBER() OVER(PARTITION BY Name ORDER BY Name) + ')' concatenatedName
FROM Table
WHERE Name IN
(
SELECT Name
FROM Table
GROUP BY Name
HAVING COUNT(*) > 1
)
This will get you what you originally asked for. For the update statement, you'll want to do a while and update the top 1
DECLARE #Pointer VARCHAR(20), #Count INT
WHILE EXISTS(SELECT Name FROM Table GROUP BY Name HAVING COUNT(1) > 1)
BEGIN
SELECT TOP 1 #Pointer = Name, #Count = COUNT(1) FROM Table GROUP BY Name HAVING COUNT(1) > 1
UPDATE TOP (1) TABLE
SET Name = Name + '(' + #Count + ')'
WHERE Name = #Pointer
END
There's no need to do an UPDATE at all. The following will create the table for INSERT as desired
SELECT
ROW_NUMBER() OVER(ORDER BY tb2.Id) Id,
tb2.Name + CASE WHEN COUNT(*) > 1 THEN ' (' + CONVERT(VARCHAR, Count(*)) + ')' ELSE '' END [Name]
FROM
tb tb1,
tb tb2
WHERE
tb1.Name = tb2.Name AND
tb1.Id <= tb2.Id
GROUP BY
tb2.Name,
tb2.Id
Here's an even simpler UPDATE statement:
UPDATE
tb
SET
[Name] = [Name] + ' (' + CONVERT(VARCHAR, ROW_NUMBER () OVER (PARTITION BY [Name] ORDER BY Id)) + ')'
WHERE
ROW_NUMBER () OVER (PARTITION BY [Name] ORDER BY Id) > 1