Group and tally values for each record in SQL [duplicate]

This question already has answers here: How to use GROUP BY to concatenate strings in SQL Server? (22 answers)
Closed 8 years ago.
I'm trying to run a SELECT statement that groups records with the same ID and also combines the values from another column for each master ID, as in the example below. Each result row should contain the unique ID followed by the names from that ID's records (two in this example), separated by a semicolon. Thanks in advance.
Current data:
ID Name Cnt
-------------------------------- ----------------- ---
0001D72BA5F664BE129B6AB5744E2BE0 Talati, Shilpa 1
0001D72BA5F664BE129B6AB5744E2BE0 Weaver, Larry 1
0007EAB7CE9A3F2F95D2D63D0BBD08A9 St-Hilaire, Edith 1
0007EAB7CE9A3F2F95D2D63D0BBD08A9 Talati, Shilpa 1
Desired result:
0001D72BA5F664BE129B6AB5744E2BE0 Talati, Shilpa; Weaver, Larry

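If each ID has at most two names, the easiest way to solve this in SQL Server is:
select ID, min(Name) + '; ' + max(Name) as Names
from yourtable t
group by ID;
(An ID with only one name comes out as "Name; Name", and any names between the minimum and the maximum are dropped.)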
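A minimal sketch of the min/max trick and its limit, run against an in-memory SQLite database through Python's built-in sqlite3 module (SQLite stands in for SQL Server here, so concatenation uses || instead of +). The ID HYPOTHETICAL-SINGLE-NAME and its row are invented to show what happens when an ID has only one name.

```python
import sqlite3

# In-memory SQLite stand-in for the question's table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE yourtable (ID TEXT, Name TEXT, Cnt INTEGER);
INSERT INTO yourtable VALUES
    ('0001D72BA5F664BE129B6AB5744E2BE0', 'Talati, Shilpa', 1),
    ('0001D72BA5F664BE129B6AB5744E2BE0', 'Weaver, Larry', 1),
    ('0007EAB7CE9A3F2F95D2D63D0BBD08A9', 'St-Hilaire, Edith', 1),
    ('0007EAB7CE9A3F2F95D2D63D0BBD08A9', 'Talati, Shilpa', 1),
    ('HYPOTHETICAL-SINGLE-NAME', 'Doe, Jane', 1);  -- invented row
""")

# Smallest and largest name per ID, glued together with '; '.
rows = conn.execute("""
    SELECT ID, min(Name) || '; ' || max(Name) AS Names
    FROM yourtable
    GROUP BY ID
    ORDER BY ID
""").fetchall()

for row in rows:
    print(row)
```

The single-name ID shows the name twice, which is why this shortcut only fits data with exactly two names per ID.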

Here's one way using a recursive common table expression. Given a table like this:
create table dbo.Fizzbuzz
(
id int not null identity(1,1) primary key clustered ,
group_id int not null ,
name varchar(50) not null ,
cnt int not null ,
)
containing this data
id group_id name cnt
-- -------- ------ ---
1 1 Bob 3
2 1 Carol 5
3 1 Ted 6
4 1 Alice 16
5 2 Harold 72
6 2 Maude 28
This query
with recursive_cte as
(
select group_id = t.group_id ,
row = t.row ,
name = convert(varchar(8000),t.name) ,
cnt = t.cnt
from ( select * ,
row = row_number() over (
partition by group_id
order by id
)
from dbo.Fizzbuzz
) t
where t.row = 1
UNION ALL
select group_id = prv.group_id ,
row = nxt.row ,
name = convert(varchar(8000), prv.name + ' and ' + nxt.name ) ,
cnt = prv.cnt + nxt.cnt
from recursive_cte prv
join ( select * ,
row = row_number() over (
partition by group_id
order by id
)
from dbo.Fizzbuzz
) nxt on nxt.group_id = prv.group_id
and nxt.row = prv.row + 1
)
select group_id = t.group_id ,
total = t.cnt ,
names = t.name
from ( select * ,
rank = rank() over (
partition by group_id
order by row desc
)
from recursive_cte
) t
where rank = 1
order by group_id
produces the following output:
group_id total names
-------- ----- -------------------------------
1 30 Bob and Carol and Ted and Alice
2 100 Harold and Maude
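Note, however, that recursion depth is bounded in SQL Server: by default a recursive CTE fails after 100 levels of recursion, so groups with more than about 100 rows need OPTION (MAXRECURSION n) on the outer query, where n can be up to 32767, or 0 for no limit.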
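A minimal sketch of the same recursive-CTE idea, run against an in-memory SQLite database through Python's sqlite3 module so it can be executed without SQL Server (it assumes SQLite 3.25 or later for window functions). SQLite needs WITH RECURSIVE, concatenates with || rather than +, and has no MAXRECURSION hint; the final filter keeps each group's deepest row with a max(rn) subquery instead of RANK().

```python
import sqlite3

# In-memory SQLite stand-in for dbo.Fizzbuzz.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Fizzbuzz (
    id       INTEGER PRIMARY KEY,
    group_id INTEGER NOT NULL,
    name     TEXT    NOT NULL,
    cnt      INTEGER NOT NULL
);
INSERT INTO Fizzbuzz (group_id, name, cnt) VALUES
    (1, 'Bob', 3), (1, 'Carol', 5), (1, 'Ted', 6), (1, 'Alice', 16),
    (2, 'Harold', 72), (2, 'Maude', 28);
""")

rows = conn.execute("""
WITH RECURSIVE
numbered AS (
    -- number the rows of each group in id order
    SELECT group_id, name, cnt,
           row_number() OVER (PARTITION BY group_id ORDER BY id) AS rn
    FROM Fizzbuzz
),
walk AS (
    -- anchor: first row of every group
    SELECT group_id, rn, name, cnt FROM numbered WHERE rn = 1
    UNION ALL
    -- recursive step: append the next row of the same group
    SELECT w.group_id, n.rn, w.name || ' and ' || n.name, w.cnt + n.cnt
    FROM walk AS w
    JOIN numbered AS n ON n.group_id = w.group_id AND n.rn = w.rn + 1
)
-- keep only the last (deepest) step of each group
SELECT group_id, cnt AS total, name AS names
FROM walk AS w
WHERE rn = (SELECT max(rn) FROM numbered AS n WHERE n.group_id = w.group_id)
ORDER BY group_id
""").fetchall()

for row in rows:
    print(row)
```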

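For any number of names per ID, you can use FOR XML PATH, with STUFF removing the leading separator:
SELECT
t1.ID,
STUFF((SELECT '; ' + t2.Name
FROM yourtable t2
WHERE t2.ID = t1.ID
FOR XML PATH(''), TYPE).value('.', 'varchar(max)'), 1, 2, '') AS Name
FROM yourtable t1
GROUP BY t1.ID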
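SQLite has no FOR XML PATH; its nearest built-in is group_concat(X, separator), which does the same job as STRING_AGG in SQL Server 2017 and later. This sketch runs it through Python's sqlite3 module; the third name ('Hypothetical, Third') is invented to show more than two names per ID. group_concat does not guarantee element order (SQLite 3.44+ accepts ORDER BY inside the call), so the result is compared after sorting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE yourtable (ID TEXT, Name TEXT);
INSERT INTO yourtable VALUES
    ('0001D72BA5F664BE129B6AB5744E2BE0', 'Talati, Shilpa'),
    ('0001D72BA5F664BE129B6AB5744E2BE0', 'Weaver, Larry'),
    ('0007EAB7CE9A3F2F95D2D63D0BBD08A9', 'St-Hilaire, Edith'),
    ('0007EAB7CE9A3F2F95D2D63D0BBD08A9', 'Talati, Shilpa'),
    ('0007EAB7CE9A3F2F95D2D63D0BBD08A9', 'Hypothetical, Third');  -- invented row
""")

# One row per ID; group_concat puts the separator only between names.
rows = conn.execute("""
    SELECT ID, group_concat(Name, '; ') AS Names
    FROM yourtable
    GROUP BY ID
    ORDER BY ID
""").fetchall()

names_by_id = {row_id: names.split("; ") for row_id, names in rows}
for row_id, names in rows:
    print(row_id, names)
```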


Get top 5 records for each group and concatenate them in a row per group

I have a table Contacts that basically looks like the following:
Id | Name | ContactId | Contact  | Amount
---+------+-----------+----------+-------
1  | A    | 1         | 12323432 | 555
1  | A    | 2         | 23432434 | 349
2  | B    | 3         | 98867665 | 297
2  | B    | 4         | 88867662 | 142
2  | B    | 5         | null     | 698
Here, ContactId is unique throughout the table. Contact can be NULL, and I would like to exclude those rows.
Now, I want to select the top 5 contacts for each Id based on their Amount. I accomplished that with the following query:
WITH cte AS (
SELECT id, Contact, amount, ROW_NUMBER()
over (
PARTITION BY id
order by amount desc
) AS RowNo
FROM contacts
where contact is not null
)
select * from cte where RowNo <= 5
This works fine up to this point. Now I want to concatenate these (up to 5) records for each group and show them in a single row.
Expected result:
Id | Name | Contact
---+------+------------------
1  | A    | 12323432;23432434
2  | B    | 98867665;88867662
I am using the following query to achieve this, but it still returns each record in a separate row and also includes NULL values:
WITH cte AS (
SELECT id, Contact, amount,contactid, ROW_NUMBER()
over (
PARTITION BY id
order by amount desc
) AS RowNo
FROM contacts
where contact is not null
)
select *from id, name,
STUFF ((
SELECT distinct '; ' + isnull(contact,'') FROM cte
WHERE co.id= cte.id and co.contactid= cte.contactid
and RowNo <= 5
FOR XML PATH('')),1, 1, '')as contact
from contacts co inner join cte where cte.id = co.id and co.contactid= cte.contactid
The above query still returns the top 5 contacts in separate rows and includes NULLs too.
Is it a good idea to use a CTE and STUFF together? Please suggest a better approach if there is one.
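A sketch of the question's top-N-per-group step using SQLite through Python's sqlite3 module, assuming Contact can be stored as text (its real type isn't stated). Filtering out NULL contacts inside the CTE, before ROW_NUMBER runs, matters: otherwise the NULL row for Id 2 (Amount 698) would take rank 1. The n parameter of the hypothetical top_n helper replaces the hard-coded 5.

```python
import sqlite3

# In-memory SQLite stand-in for the Contacts table (Contact stored as TEXT).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts (Id INTEGER, Name TEXT, ContactId INTEGER PRIMARY KEY,
                       Contact TEXT, Amount INTEGER);
INSERT INTO contacts VALUES
    (1, 'A', 1, '12323432', 555),
    (1, 'A', 2, '23432434', 349),
    (2, 'B', 3, '98867665', 297),
    (2, 'B', 4, '88867662', 142),
    (2, 'B', 5, NULL, 698);
""")

# Number rows per Id by Amount (highest first) after removing NULL contacts,
# then keep the first n of each group.
TOP_N_SQL = """
WITH cte AS (
    SELECT Id, Contact, Amount,
           row_number() OVER (PARTITION BY Id ORDER BY Amount DESC) AS RowNo
    FROM contacts
    WHERE Contact IS NOT NULL
)
SELECT Id, Contact, Amount, RowNo
FROM cte
WHERE RowNo <= ?
ORDER BY Id, RowNo
"""

def top_n(n):
    """Hypothetical helper: up to n (Id, Contact, Amount, RowNo) rows per Id."""
    return conn.execute(TOP_N_SQL, (n,)).fetchall()

print(top_n(5))
print(top_n(1))
```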
I found the problem with my final query:
I don't need the original Contacts table in my final SELECT, since I already have everything I need in the CTE. Also, inside STUFF() I was joining on contactid, which is the very value I'm trying to concatenate; because of that join condition I was getting the records in separate rows. I removed these two conditions and it worked.
WITH cte AS (
SELECT id, name, contact, amount, contactid, ROW_NUMBER()
over (
PARTITION BY id
order by amount desc
) AS RowNo
FROM contacts
where contact is not null
)
select c.id, c.name,
STUFF ((
SELECT ';' + cast(c2.contact as varchar(50)) FROM cte c2
WHERE c2.id = c.id
and c2.RowNo <= 5
ORDER BY c2.RowNo
FOR XML PATH('')), 1, 1, '') as contact
from cte c
where c.RowNo <= 5
group by c.id, c.name
You can use conditional aggregation:
select id, name,
concat(max(case when seqnum = 1 then contact + ';' end),
max(case when seqnum = 2 then contact + ';' end),
max(case when seqnum = 3 then contact + ';' end),
max(case when seqnum = 4 then contact + ';' end),
max(case when seqnum = 5 then contact + ';' end)
) as contacts
from (select c.*,
row_number() over (partition by id order by amount desc) as seqnum
from contacts c
where contact is not null
) c
group by id, name;
If you are running SQL Server 2017 or higher, you can use string_agg(): like most other aggregate functions, it ignores NULL values by design.
select id, name, string_agg(contact, ',') within group (order by rn) all_contacts
from (
select id, name, contact,
row_number() over (partition by id order by amount desc) as rn
from contacts
where contact is not null
) t
where rn <= 5
group by id, name
Note that you don't strictly need a CTE here; you can return the columns you need from the subquery, and use them directly in the outer query.
In earlier versions, one approach using stuff() and for xml path is:
with cte as (
select id, name, contact,
row_number() over (partition by id order by amount desc) as rn
from contacts
where contact is not null
)
select id, name,
stuff(
(
select ', ' + c1.contact
from cte c1
where c1.id = c.id and c1.rn <= 5
order by c1.rn
for xml path (''), type
).value('.', 'varchar(max)'), 1, 2, ''
) all_contacts
from cte
group by id, name
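The STRING_AGG approach can be tried out in SQLite, whose group_concat(X, separator) is the closest equivalent; this sketch runs it through Python's sqlite3 module with the question's sample data (Contact assumed to be text). group_concat's element order isn't guaranteed in older SQLite versions, so the check compares the contacts after sorting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts (Id INTEGER, Name TEXT, ContactId INTEGER PRIMARY KEY,
                       Contact TEXT, Amount INTEGER);
INSERT INTO contacts VALUES
    (1, 'A', 1, '12323432', 555),
    (1, 'A', 2, '23432434', 349),
    (2, 'B', 3, '98867665', 297),
    (2, 'B', 4, '88867662', 142),
    (2, 'B', 5, NULL, 698);
""")

rows = conn.execute("""
    SELECT Id, Name, group_concat(Contact, ';') AS Contacts
    FROM (
        SELECT Id, Name, Contact,
               row_number() OVER (PARTITION BY Id ORDER BY Amount DESC) AS rn
        FROM contacts
        -- group_concat skips NULLs, but without this filter a NULL row
        -- would still use up one of the 5 row_number slots
        WHERE Contact IS NOT NULL
    ) AS t
    WHERE rn <= 5
    GROUP BY Id, Name
    ORDER BY Id
""").fetchall()

for row in rows:
    print(row)
```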
I agree with @GMB. STRING_AGG() is what you need ...
WITH
contacts(Id,nm,ContactId,Contact,Amount) AS (
SELECT 1,'A',1,12323432,555
UNION ALL SELECT 1,'A',2,23432434,349
UNION ALL SELECT 2,'B',3,98867665,297
UNION ALL SELECT 2,'B',4,88867662,142
UNION ALL SELECT 2,'B',5,NULL ,698
)
,
with_filter_val AS (
SELECT
*
, ROW_NUMBER() OVER(PARTITION BY id ORDER BY amount DESC) AS rn
FROM contacts
WHERE contact IS NOT NULL
)
SELECT
id
, nm
, STRING_AGG(CAST(contact AS CHAR(8)),',') WITHIN GROUP (ORDER BY rn) AS contact_list
FROM with_filter_val
WHERE rn <=5
GROUP BY
id
, nm
-- out id | nm | contact_list
-- out ----+----+-------------------
-- out 1 | A | 12323432,23432434
-- out 2 | B | 98867665,88867662

MS-SQL max ID with inner join

Can't see the wood for the trees on this and I'm sure it's simple.
I'm trying to return the max ID for a related record in a joined table
Table1
NiD | Name
----+-------
1   | Peter
2   | John
3   | Arthur

Table2
ID | NiD | Value
---+-----+------
1  | 1   | 5
2  | 2   | 10
3  | 3   | 10
4  | 1   | 20
5  | 2   | 15

Max results
NiD | ID | Value
----+----+------
1   | 4  | 20
2   | 5  | 15
3   | 3  | 10
You can use row_number() for this:
select NiD, ID, Value
from (select t2.*,
row_number() over (partition by NiD order by ID desc) as seqnum
from table2 t2
) t2
where seqnum = 1;
As the question is stated, you do not need table1, because table2 has all the ids.
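A minimal sketch of the ROW_NUMBER answer against SQLite (via Python's sqlite3 module) using the sample Table2 data; it returns the expected max-ID rows without touching Table1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table2 (ID INTEGER PRIMARY KEY, NiD INTEGER, Value INTEGER);
INSERT INTO Table2 VALUES (1, 1, 5), (2, 2, 10), (3, 3, 10), (4, 1, 20), (5, 2, 15);
""")

# Number each NiD's rows from the highest ID down and keep the first one.
rows = conn.execute("""
    SELECT NiD, ID, Value
    FROM (SELECT t2.*,
                 row_number() OVER (PARTITION BY NiD ORDER BY ID DESC) AS seqnum
          FROM Table2 AS t2) AS t2
    WHERE seqnum = 1
    ORDER BY NiD
""").fetchall()

print(rows)
```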
This is how I'd do it; ID and Value will be NULL when Table2 has no corresponding entry for a Table1 record:
SELECT NiD, ID, [Value]
FROM Table1
OUTER APPLY (
SELECT TOP 1 ID, [Value]
FROM Table2
WHERE Table1.NiD = Table2.NiD
ORDER BY ID DESC
) AS Top_Table2
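SQLite has no OUTER APPLY, so this sketch substitutes a LEFT JOIN to a ROW_NUMBER-ranked derived table, which likewise keeps every Table1 row and fills ID and Value with NULL where nothing matches. It orders by ID (the question asks for the max ID); the Table1 row with NiD 4 is hypothetical, added to show the NULL case. It runs through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (NiD INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Table2 (ID INTEGER PRIMARY KEY, NiD INTEGER, Value INTEGER);
INSERT INTO Table1 VALUES (1, 'Peter'), (2, 'John'), (3, 'Arthur'),
                          (4, 'Hypothetical');  -- no Table2 rows for NiD 4
INSERT INTO Table2 VALUES (1, 1, 5), (2, 2, 10), (3, 3, 10), (4, 1, 20), (5, 2, 15);
""")

# LEFT JOIN keeps every Table1 row, like OUTER APPLY; unmatched rows get NULLs.
rows = conn.execute("""
    SELECT t1.NiD, t1.Name, latest.ID, latest.Value
    FROM Table1 AS t1
    LEFT JOIN (SELECT NiD, ID, Value,
                      row_number() OVER (PARTITION BY NiD ORDER BY ID DESC) AS seqnum
               FROM Table2) AS latest
           ON latest.NiD = t1.NiD AND latest.seqnum = 1
    ORDER BY t1.NiD
""").fetchall()

for row in rows:
    print(row)
```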
CREATE TABLE Names
(
NID INT,
[Name] VARCHAR(MAX)
)
CREATE TABLE Results
(
ID INT,
NID INT,
VALUE INT
)
INSERT INTO Names VALUES (1,'Peter'),(2,'John'),(3,'Arthur')
INSERT INTO Results VALUES (1,1,5),(2,2,10),(3,3,10),(4,1,20),(5,2,15)
SELECT a.NID,
r.ID,
a.MaxVal
FROM (
SELECT NID,
MAX(VALUE) as MaxVal
FROM Results r
GROUP BY NID
) a
JOIN Results r
ON a.NID = r.NID AND a.MaxVal = r.VALUE
ORDER BY NID
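Joining back on the maximum value returns every row that ties for that maximum, so a NID can appear more than once. The sketch below shows this in SQLite via Python's sqlite3 module; the row (6, 1, 20) is hypothetical, added to create a tie with ID 4.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Results (ID INTEGER, NID INTEGER, VALUE INTEGER);
INSERT INTO Results VALUES (1, 1, 5), (2, 2, 10), (3, 3, 10), (4, 1, 20), (5, 2, 15),
                           (6, 1, 20);  -- hypothetical tie with ID 4
""")

# Per-NID maximum, joined back to find the row(s) holding it.
rows = conn.execute("""
    SELECT a.NID, r.ID, a.MaxVal
    FROM (SELECT NID, MAX(VALUE) AS MaxVal FROM Results GROUP BY NID) AS a
    JOIN Results AS r ON a.NID = r.NID AND a.MaxVal = r.VALUE
    ORDER BY a.NID, r.ID
""").fetchall()

for row in rows:
    print(row)
```

If exactly one row per NID is required, ranking with ROW_NUMBER and a deterministic tie-breaker avoids the duplicate.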
Here's what I have used in similar situations; performance was fine as long as the data set wasn't too large (under 1M rows). It matches on the maximum value, which gives the same rows as the maximum ID for this sample data.
SELECT
table1.nid
,table2.id
,table2.value
FROM table1
INNER JOIN table2 ON table1.nid = table2.nid
WHERE table2.value = (
SELECT MAX(value)
FROM table2
WHERE nid = table1.nid)
ORDER BY 1

Modify record using stored procedure

I am new to stored procedures.
I have 4 million records, so I can't do this manually and want to use a stored procedure.
I have a table like:
Id Name
-----------------
1 abc
2 xyz
3 abc
4 pqr
5 abc
6 pqr
One field in that table is called Name. Some records in the Name column have the same name, so I want to modify those records to look like this:
Id Name
---------------------
1 abc
2 xyz
3 abc-1
4 pqr
5 abc-2
6 pqr-1
Then I want to insert them into another table that has the same schema.
You can do this using an updatable CTE:
with toupdate as (
select t.*, row_number() over (partition by name order by id) as seqnum
from onetable t
)
update toupdate
set name = name + '-' + cast(seqnum - 1 as varchar(255))
where seqnum > 1;
Actually, that updates it in place. To put this into another table:
with toinsert as (
select t.*, row_number() over (partition by name order by id) as seqnum
from onetable t
)
select id,
(case when seqnum = 1 then name
else name + '-' + cast(seqnum - 1 as varchar(255))
end) as name
into newtable
from toinsert;
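A sketch of the copy-into-another-table approach in SQLite via Python's sqlite3 module. SQLite has no SELECT ... INTO, so CREATE TABLE ... AS SELECT stands in for it, and || replaces + with CAST; onetable and newtable are the placeholder names from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE onetable (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO onetable VALUES
    (1, 'abc'), (2, 'xyz'), (3, 'abc'), (4, 'pqr'), (5, 'abc'), (6, 'pqr');

-- CREATE TABLE ... AS SELECT plays the role of SQL Server's SELECT ... INTO.
CREATE TABLE newtable AS
SELECT id,
       CASE WHEN seqnum = 1 THEN name
            ELSE name || '-' || (seqnum - 1)  -- 2nd 'abc' becomes 'abc-1'
       END AS name
FROM (SELECT t.*,
             row_number() OVER (PARTITION BY name ORDER BY id) AS seqnum
      FROM onetable AS t) AS toinsert;
""")

rows = conn.execute("SELECT id, name FROM newtable ORDER BY id").fetchall()
for row in rows:
    print(row)
```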
This will update the table
;WITH cte AS
(
SELECT id,
ROW_NUMBER() OVER(PARTITION BY Name ORDER BY Id) AS rno
FROM table1
)
update t
set t.Name = t.Name + '-' + cast(c.rno - 1 as varchar(20))
from table1 t
join cte c on c.id = t.id
where c.rno > 1
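To copy the renamed rows into a new table, select them with CHARINDEX (this copies only names with a hyphen after the first character, including any that already had one before the update):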
select * into Table2 from table1
where CHARINDEX('-',name) > 1

Any other alternative to write this SQL query

I need to select data based on three conditions:
1. Find the latest date (StorageDate column) in the table for each ID.
2. See if there is more than one entry on that date for the same ID (ID column).
3. Then see if DuplicateTypeID = 2.
So if the table has the following data:
ID |StorageDate | DuplicateTypeID
1 |2014-10-22 | 1
1 |2014-10-22 | 2
1 |2014-10-18 | 1
2 |2014-10-12 | 1
3 |2014-10-11 | 1
4 |2014-09-02 | 1
4 |2014-09-02 | 2
Then I should get the following results:
ID
1
4
I have written the following query, but it is really slow. Is there a better way to write it?
SELECT DISTINCT(TD.RecordID)
FROM dbo.MyTable TD
JOIN (
SELECT T1.RecordID, T2.MaxDate,COUNT(*) AS RecordCount
FROM MyTable T1 WITH (nolock)
JOIN (
SELECT RecordID, MAX(StorageDate) AS MaxDate
FROM MyTable WITH (nolock)
GROUP BY RecordID)T2
ON T1.RecordID = T2.RecordID AND T1.StorageDate = T2.MaxDate
GROUP BY T1.RecordID, T2.MaxDate
HAVING COUNT(*) > 1
)PT ON TD.RecordID = PT.RecordID AND TD.StorageDate = PT.MaxDate
WHERE TD.DuplicateTypeID = 2
Try this and see how the performance goes:
;WITH
tmp AS
(
SELECT *,
RANK() OVER (PARTITION BY ID ORDER BY StorageDate DESC) AS StorageDateRank,
COUNT(ID) OVER (PARTITION BY ID, StorageDate) AS StorageDateCount
FROM MyTable
)
SELECT DISTINCT ID
FROM tmp
WHERE StorageDateRank = 1 -- latest date for each ID
AND StorageDateCount > 1 -- more than 1 entry for date
AND DuplicateTypeID = 2 -- DuplicateTypeID = 2
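A sketch of the RANK/COUNT OVER query against the question's sample data, run in SQLite via Python's sqlite3 module (dates are stored as ISO-8601 text, which sorts chronologically). It returns IDs 1 and 4, matching the expected result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MyTable (ID INTEGER, StorageDate TEXT, DuplicateTypeID INTEGER);
INSERT INTO MyTable VALUES
    (1, '2014-10-22', 1), (1, '2014-10-22', 2), (1, '2014-10-18', 1),
    (2, '2014-10-12', 1), (3, '2014-10-11', 1),
    (4, '2014-09-02', 1), (4, '2014-09-02', 2);
""")

rows = conn.execute("""
WITH tmp AS (
    SELECT *,
           RANK() OVER (PARTITION BY ID ORDER BY StorageDate DESC) AS StorageDateRank,
           COUNT(ID) OVER (PARTITION BY ID, StorageDate) AS StorageDateCount
    FROM MyTable
)
SELECT DISTINCT ID
FROM tmp
WHERE StorageDateRank = 1      -- latest date for each ID
  AND StorageDateCount > 1     -- more than one entry on that date
  AND DuplicateTypeID = 2
ORDER BY ID
""").fetchall()

print(rows)
```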
You can use the analytic function RANK(); can you try this query?
Select recordId from
(
select *, rank() over ( partition by recordId order by [StorageDate] desc) as rn
from mytable
) T
where rn =1
group by recordId
having count(*) >1
and sum( case when duplicatetypeid =2 then 1 else 0 end) >=1

Help me with my select query

I have this table in SQL Server 2005:
id student active
1 Bob 1
3 Rob 0
5 Steve 1
7 John 1
8 Mark 0
10 Dave 0
16 Nick 1
My select query returns an active student by a given id.
But I also want to return the ids of the previous and next active students. If there is no previous one, it should be 0 or NULL; the same goes for next.
Example: for id=5, my select would return
id student prev_id next_id
5 steve 1 7
Example: for id=7, my select would return
id student prev_id next_id
7 John 5 16
Example: for id=16, my select would return
id student prev_id next_id
16 Nick 7 0
How do I write this select query?
I have a query, but I can't get the previous id right: it always returns the first active id.
Thanks
EDIT:
Here is the query I have right now.
select id, student,
(select top 1 id from test where id<7 and active=1) as prev,
(select top 1 id from test where id>7 and active=1) as next
from test where id=7--I used 7 just as an example. it will be a parameter
Try something like this:
SELECT ID,
Student,
( SELECT TOP 1
ID
FROM dbo.test AS pT
WHERE pT.ID < T.ID And pT.Active = 1
ORDER BY ID DESC
) AS PrevID,
( SELECT TOP 1
ID
FROM dbo.test AS pT
WHERE pT.ID > T.ID And pT.Active = 1
ORDER BY ID
) AS NextID
FROM dbo.test AS T
Working sample:
DECLARE @T TABLE (id int, student varchar(10), active bit)
insert @T select
1 ,'Bob', 1 union all select
3 ,'Rob', 0 union all select
5 ,'Steve', 1 union all select
7 ,'John', 1 union all select
8 ,'Mark', 0 union all select
10 ,'Dave', 0 union all select
16 ,'Nick', 1
---- your query starts below this line
declare @id int set @id = 5
select id, student,
isnull((select top(1) Prev.id from @T Prev
where Prev.id < T.id and Prev.active=1
order by Prev.id desc),0) Prev,
isnull((select top(1) Next.id from @T Next
where Next.id > T.id and Next.active=1
order by Next.id),0) Next
from @T T
where id = @id
The ISNULL calls return 0 when there is no match; NULL would have worked fine, but your question shows 0 when there is no next student.
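The same correlated-subquery idea in SQLite via Python's sqlite3 module: SQLite uses LIMIT 1 instead of TOP(1) and coalesce instead of ISNULL. The neighbours helper is hypothetical; it takes the student id as a bound parameter in place of @id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test (id INTEGER PRIMARY KEY, student TEXT, active INTEGER);
INSERT INTO test VALUES (1, 'Bob', 1), (3, 'Rob', 0), (5, 'Steve', 1),
    (7, 'John', 1), (8, 'Mark', 0), (10, 'Dave', 0), (16, 'Nick', 1);
""")

# Nearest active id below and above the requested one; 0 when there is none.
NEIGHBOURS_SQL = """
SELECT t.id, t.student,
       coalesce((SELECT p.id FROM test AS p
                 WHERE p.id < t.id AND p.active = 1
                 ORDER BY p.id DESC LIMIT 1), 0) AS prev_id,
       coalesce((SELECT n.id FROM test AS n
                 WHERE n.id > t.id AND n.active = 1
                 ORDER BY n.id LIMIT 1), 0) AS next_id
FROM test AS t
WHERE t.id = ?
"""

def neighbours(student_id):
    """Hypothetical helper: (id, student, prev_id, next_id) for one student."""
    return conn.execute(NEIGHBOURS_SQL, (student_id,)).fetchone()

for sid in (5, 7, 16):
    print(neighbours(sid))
```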
You may want to take a look at common table expressions (CTEs), which SQL Server supports from SQL Server 2005 onward and which can also be used for recursive queries; you can find a link here
But this sounds like homework, and probably not the right forum to ask it in.
Regards
You could use a nested query. I obviously can't test this out, but you should get the idea.
SELECT id, student,
(SELECT TOP 1 S1.id FROM students S1 WHERE S1.active = 1 AND S1.id < S.id ORDER BY S1.id DESC) AS beforeActive,
(SELECT TOP 1 S2.id FROM students S2 WHERE S2.active = 1 AND S2.id > S.id ORDER BY S2.id) AS afterActive
FROM students S
Efficiency-wise, I've no idea how well this query will perform.
This will give you a little more control, especially since you are paginating.
WITH NumberedSet AS (
SELECT s.id,
s.student,
row_number() OVER (ORDER BY s.id) AS rownum
FROM dbo.students AS s
WHERE s.active = 1
)
SELECT cur.id,
cur.student,
isnull(prv.id,0) AS prev_id,
isnull(nxt.id,0) AS next_id
FROM NumberedSet AS cur
LEFT JOIN NumberedSet AS prv ON cur.rownum - 1 = prv.rownum
LEFT JOIN NumberedSet AS nxt ON cur.rownum + 1 = nxt.rownum
;
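A sketch of the ROW_NUMBER self-join in SQLite via Python's sqlite3 module, using the question's data. On SQL Server 2012 and later, LAG(id) and LEAD(id) over (ORDER BY id) could replace the two self-joins, but they are not available in SQL Server 2005.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (id INTEGER PRIMARY KEY, student TEXT, active INTEGER);
INSERT INTO students VALUES (1, 'Bob', 1), (3, 'Rob', 0), (5, 'Steve', 1),
    (7, 'John', 1), (8, 'Mark', 0), (10, 'Dave', 0), (16, 'Nick', 1);
""")

rows = conn.execute("""
WITH NumberedSet AS (
    -- consecutive numbers over the active students only
    SELECT id, student, row_number() OVER (ORDER BY id) AS rownum
    FROM students
    WHERE active = 1
)
SELECT cur.id, cur.student,
       coalesce(prv.id, 0) AS prev_id,   -- 0 when there is no previous row
       coalesce(nxt.id, 0) AS next_id    -- 0 when there is no next row
FROM NumberedSet AS cur
LEFT JOIN NumberedSet AS prv ON prv.rownum = cur.rownum - 1
LEFT JOIN NumberedSet AS nxt ON nxt.rownum = cur.rownum + 1
ORDER BY cur.id
""").fetchall()

for row in rows:
    print(row)
```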