Concatenate multiple rows from inside a correlated 'group by' subquery into a single text string - sql

Similar questions have been asked before but I am specifically looking for an answer to do much the same with a correlated subquery.
I am doing this on SQL Server, and I cannot utilize stored procedure or temp table creation approach.
For those familiar with client-matter billing: I have written a 'group by' query that uses the row_number technique to rank the performers for each unique clientmatter by the sum of their amounts over a period of time, so that I can pick the top 3.
This gives me something like this:
clientmatterno attorneyname amount seq_num
111111.00001 John Doe $30,000 1
111111.00001 Mark Tim $23,000 2
111111.00001 Jane Sue $15,000 3
111111.00001 Mary Ann $5,000 4
222221.00501 John Doe $35,000 1
222221.00501 David Hu $30,000 2
444444.00003 Shelly Y $50,000 1
I think I first have to group by attorney to sum up the amounts, so that the totals produce the correct seq_num for each row.
I am now trying to use this subquery results to do the string concatenation such that I get the following results:
111111.00001 John Doe|Mark Tim|Jane Sue
222221.00501 John Doe|David Hu
444444.00003 Shelly Y
The Query that I think will work, seeing past questions on this topic:
select subq.clientmatterno as [Id],
STUFF(
(SELECT DISTINCT ',' + subq.attorneyname
FROM ????
WHERE ????
FOR XML PATH (''))
, 1, 1, '') AS TopPerformers
from (
SELECT clientmatterno, attorneyname, sum(amount) as amount,
row_number() over (partition by clientmatterno order by sum(amount) desc) as seq_num
FROM ...
WHERE ...
GROUP BY clientmatterno, attorneyname
) as subq
where seq_num <= 3
group by clientmatterno
My problem is how to connect and build up the STUFF function. The error is very simple: I cannot reference the derived table 'subq' in the FROM clause of the subquery inside the STUFF function.
I have not tried the FOR XML AUTO approach.

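Try using a common table expression instead of a derived table. Unlike a derived table, a CTE can be referenced by name more than once in the same statement, including from inside the correlated subquery: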
with cte as (
SELECT
clientmatterno,
attorneyname,
sum(amount) amount,
seq = row_number() over (partition by clientmatterno order by sum(amount) desc)
FROM ...
WHERE ...
GROUP BY clientmatterno, attorneyname
)
SELECT
clientmatterno,
STUFF(
(
SELECT '|' + attorneyname
FROM cte
WHERE clientmatterno = a.clientmatterno
AND seq <= 3
ORDER BY seq -- keep the names in ranking order
FOR XML PATH ('')
), 1, 1, ''
) AS Attorneynames
FROM cte AS a
GROUP BY clientmatterno
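On SQL Server 2017 or later, the same result can probably be reached without FOR XML by using STRING_AGG. The following is only a sketch under assumptions: the `billing` rows are hypothetical sample data standing in for the elided FROM/WHERE clauses, and the `ranked` CTE name is made up for illustration.

```sql
-- Hypothetical sample data; the real source table is elided in the question.
WITH billing (clientmatterno, attorneyname, amount) AS (
    SELECT *
    FROM (VALUES
        ('111111.00001', 'John Doe', 30000),
        ('111111.00001', 'Mark Tim', 23000),
        ('111111.00001', 'Jane Sue', 15000),
        ('111111.00001', 'Mary Ann',  5000),
        ('222221.00501', 'John Doe', 35000),
        ('222221.00501', 'David Hu', 30000),
        ('444444.00003', 'Shelly Y', 50000)
    ) AS v (clientmatterno, attorneyname, amount)
),
ranked AS (
    -- Sum per attorney, then rank attorneys within each client matter.
    SELECT clientmatterno,
           attorneyname,
           SUM(amount) AS amount,
           ROW_NUMBER() OVER (PARTITION BY clientmatterno
                              ORDER BY SUM(amount) DESC) AS seq
    FROM billing
    GROUP BY clientmatterno, attorneyname
)
SELECT clientmatterno,
       -- WITHIN GROUP keeps the names in ranking order.
       STRING_AGG(attorneyname, '|') WITHIN GROUP (ORDER BY seq) AS Attorneynames
FROM ranked
WHERE seq <= 3
GROUP BY clientmatterno;
```

Because the filter on seq happens before the aggregation, no correlated subquery is needed and each client matter is scanned only once.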

Related

SQL Select, is it possible to merge specific dupe rows into 1 based on a key?

I'm a little stumped on this. I have a table that looks like the following:
Group_Key Trigger_Type Event_Type Result_Id
1 A A 1
2 B B 2
3 C C 3
3 C C 4
4 E E 5
5 F F 6
5 F F 7
Some rows share the same group key (all columns are the same apart from Result_Id) but have different Result_Id values. Is it possible to select from the table so that, instead of returning 2 rows because of the Result_Id, the duplicates are grouped into a single row with the Result_Id values as a concatenated string? So for instance, return this:
Group_Key Trigger_Type Event_Type Result_Id
1 A A 1
2 B B 2
3 C C 3,4
4 E E 5
5 F F 6,7
Is this possible?
Thank you,
Here's an example using a recursive CTE to replicate the functionality of string_agg(). This example is from the upsert scripts for execsql, and was written by Elizabeth Shea. It will have to be modified for your particular use, substituting your own column names for the execsql variable references.
if object_id('tempdb..#agg_string') is not null drop table #agg_string;
with enum as
(
select
cast(!!#string_col!! as varchar(max)) as agg_string,
row_number() over (order by !!#order_col!!) as row_num
from
!!#table_name!!
),
agg as
(
select
one.agg_string,
one.row_num
from
enum as one
where
one.row_num=1
UNION ALL
select
agg.agg_string + '!!#delimiter!!' + enum.agg_string as agg_string,
enum.row_num
from
agg, enum
where
enum.row_num=agg.row_num+1
)
select
agg_string
into #agg_string
from agg
where row_num=(select max(row_num) from agg);
Using Gordon Linoff's hint you can GROUP BY the values that should be the same and concatenate the other values in one row using STRING_AGG:
SELECT Group_Key, Trigger_Type, Event_Type, STRING_AGG(Result_Id, ',') as ResultId
FROM myTable
GROUP BY Group_Key, Trigger_Type, Event_Type
I would add the ordering of values to MicSim's solutions as follows:
select Group_Key, Trigger_Type, Event_Type,
string_agg(Result_Id, ',') within group (order by Result_Id)
from survey
group by Group_Key, Trigger_Type, Event_Type
SQL Server versions before 2017 will not recognise the STRING_AGG function. On those versions, try STUFF() with FOR XML PATH as shown below:
SELECT Group_Key, Trigger_Type, Event_Type,
STUFF(
(SELECT ',' + CAST(t2.Result_Id AS varchar(20)) AS [text()]
FROM [dbo].[TestTable] t2
WHERE t1.Group_Key = t2.Group_Key
AND t1.Trigger_Type = t2.Trigger_Type
AND t1.Event_Type = t2.Event_Type
ORDER BY t2.Result_Id
FOR XML PATH('')), 1, 1, '') AS Result_Id -- STUFF removes the leading comma
FROM [dbo].[TestTable] t1
GROUP BY Group_Key, Trigger_Type, Event_Type
Hope this helps.

Complex SQL query or queries

I looked at other examples, but I don't know enough about SQL to adapt it to my needs. I have a table that looks like this:
ID Month NAME COUNT First LAST TOTAL
------------------------------------------------------
1 JAN2013 fred 4
2 MAR2013 fred 5
3 APR2014 fred 1
4 JAN2013 Tom 6
5 MAR2014 Tom 1
6 APR2014 Tom 1
This could be in separate queries, but I need 'First' to equal the first month in which a particular name is used, so every row with fred would have JAN2013 in the First field, for example. I need the 'Last' column to equal the month of the last record for each name, and finally I need the 'Total' column to be the sum of all the counts for each name, so in each row that has fred the total would be 10 in this sample data. This is over my head. Can one of you assist?
This is crude but should do the trick. I renamed your fields a bit because you are using a bunch of "RESERVED" sql words and that is bad form.
;WITH cte as
(
Select
[NAME]
,[nmCOUNT]
,[txtMONTH]
-- Note: if txtMONTH is stored as text such as 'JAN2013' it sorts alphabetically;
-- convert it to a date to order chronologically.
,ROW_NUMBER() over (partition by [NAME] order by [txtMONTH] ASC) as FirstMonth
,ROW_NUMBER() over (partition by [NAME] order by [txtMONTH] DESC) as LastMonth
,SUM([nmCOUNT]) over (partition by [NAME]) as TotNameCount
From [Table]
)
,cteFirst as
(
Select
[NAME]
,[nmCOUNT]
,[TotNameCount]
,[txtMONTH] as ansFirst
From cte
Where FirstMonth = 1
)
,cteLast as
(
Select
[NAME]
,[txtMONTH] as ansLast
From cte
Where LastMonth = 1
)
Select c.NAME, c.nmCount, c.ansFirst, l.ansLast, c.TotNameCount
From cteFirst c
LEFT JOIN cteLast l on c.NAME = l.NAME
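If every row should carry the first month, last month and total for its name, as the question asks, window functions may be a simpler route on SQL Server 2012 or later. This is a sketch only: the sample rows are copied from the question, the column names txtMONTH and nmCOUNT follow the renaming used above, and it assumes that ID order reflects chronological order, which the question does not confirm.

```sql
-- Sample rows from the question; assumes ID increases chronologically per name.
WITH t (ID, txtMONTH, NAME, nmCOUNT) AS (
    SELECT *
    FROM (VALUES
        (1, 'JAN2013', 'fred', 4),
        (2, 'MAR2013', 'fred', 5),
        (3, 'APR2014', 'fred', 1),
        (4, 'JAN2013', 'Tom',  6),
        (5, 'MAR2014', 'Tom',  1),
        (6, 'APR2014', 'Tom',  1)
    ) AS v (ID, txtMONTH, NAME, nmCOUNT)
)
SELECT ID,
       txtMONTH,
       NAME,
       nmCOUNT,
       -- Month of the earliest row for this name.
       FIRST_VALUE(txtMONTH) OVER (PARTITION BY NAME ORDER BY ID) AS ansFirst,
       -- LAST_VALUE needs an explicit frame, otherwise it stops at the current row.
       LAST_VALUE(txtMONTH) OVER (PARTITION BY NAME ORDER BY ID
                                  ROWS BETWEEN UNBOUNDED PRECEDING
                                           AND UNBOUNDED FOLLOWING) AS ansLast,
       -- Total of all counts for this name, repeated on every row.
       SUM(nmCOUNT) OVER (PARTITION BY NAME) AS TotNameCount
FROM t
ORDER BY ID;
```

Ordering by ID sidesteps the problem that month labels such as 'JAN2013' do not sort chronologically as text.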

How to concatenate rows delimited with comma using standard SQL?

Let's suppose we have a table T1 and a table T2. There is a relation of 1:n between T1 and T2. I would like to select all T1 along with all their T2, every row corresponding to T1 records with T2 values concatenated, using only SQL-standard operations.
Example:
T1 = Person
T2 = Popularity (by year)
for each year a person has a certain popularity
I would like to write a selection using SQL-standard operations, resulting something like this:
Person.Name Popularity.Value
John Smith 1.2,5,4.2
John Doe NULL
Jane Smith 8
where there are 3 records in the popularity table for John Smith, none for John Doe and one for Jane Smith, their values being the values represented above. Is this possible? How?
I'm using Oracle but would like to do this using only standard SQL.
Here's one technique, using recursive Common Table Expressions. Unfortunately, I'm not confident on its performance.
I'm sure that there are ways to improve this code, but it shows that there doesn't seem to be an easy way to do something like this using just the SQL standard.
As far as I can see, there really should be some kind of STRINGJOIN aggregate function that would be used with GROUP BY. That would make things like this much easier...
This query assumes that there is some kind of PersonID that joins the two relations, but the Name would work too.
WITH cte (id, Name, Value, ValueCount) AS (
SELECT id,
Name,
CAST(Value AS VARCHAR(MAX)) AS Value,
1 AS ValueCount
FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Name) AS id,
Name,
Value
FROM Person AS per
INNER JOIN Popularity AS pop
ON per.PersonID = pop.PersonID
) AS e
WHERE id = 1
UNION ALL
SELECT e.id,
e.Name,
cte.Value + ',' + CAST(e.Value AS VARCHAR(MAX)) AS Value,
cte.ValueCount + 1 AS ValueCount
FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Name) AS id,
Name,
Value
FROM Person AS per
INNER JOIN Popularity AS pop
ON per.PersonID = pop.PersonID
) AS e
INNER JOIN cte
ON e.id = cte.id + 1
AND e.Name = cte.Name
)
SELECT p.Name, agg.Value
FROM Person p
LEFT JOIN (
SELECT Name, Value
FROM (
SELECT Name,
Value,
ROW_NUMBER() OVER (PARTITION BY Name ORDER BY ValueCount DESC)AS id
FROM cte
) AS p
WHERE id = 1
) AS agg
ON p.Name = agg.Name
This is an example result:
--------------------------------
| Name | Value |
--------------------------------
| John Smith | 1.2,5,4.2 |
--------------------------------
| John Doe | NULL |
--------------------------------
| Jane Smith | 8 |
--------------------------------
In Oracle you can use listagg to achieve this:
select t1.Person_Name, listagg(t2.Popularity_Value, ',')
within group(order by t2.Popularity_Value)
from t1, t2
where t1.Person_Name = t2.Person_Name (+)
group by t1.Person_Name
I hope this will solve your problem.
But given the comment you left after @DavidJashi's question: this is not standard SQL, and I think he is correct. I agree with David that you cannot achieve this in a pure standard SQL statement.
I know that I'm SUPER late to the party, but for anyone else that might find this, I don't believe that this is possible using pure SQL92. As I discovered in the last few months fighting with NetSuite to try to figure out what Oracle methods I can and cannot use with their ODBC driver, I discovered that they only "support and guarantee" SQL92 standard.
I discovered this, because I had a need to perform a LISTAGG(). Once I found out I was restricted to SQL92, I did some digging through the historical records, and LISTAGG() and recursive queries (common table expressions) are NOT supported in SQL92, at all.
LISTAGG() was added in Oracle Database 11g Release 2 (2009 – 11 years ago: reference https://oracle-base.com/articles/misc/string-aggregation-techniques#listagg). The WITH clause (non-recursive CTEs) was added in Oracle 9i Release 2, and recursive CTEs arrived in 11g Release 2 (reference https://www.databasestar.com/sql-cte-with/). Both are well after SQL92.
VERY frustrating that it's completely impossible to accomplish this kind of effect in pure SQL92, so I had to solve the problem in my C# code after I pulled a ton of extra unnecessary data. Very frustrating.

SQL Query - Display Count & All ID's With Same Name

I'm trying to display the amount of table entries with the same name and the unique ID's associated with each of those entries.
So I have a table like so...
Table Names
------------------------------
ID Name
0 John
1 Mike
2 John
3 Mike
4 Adam
5 Mike
I would like the output to be something like:
Name | Count | IDs
---------------------
Mike 3 1,3,5
John 2 0,2
Adam 1 4
I have the following query which does this except display all the unique ID's:
select name, count(*) as ct from names group by name order by ct desc;
select name,
count(id) as ct,
group_concat(id) as IDs
from names
group by name
order by ct desc;
You can use GROUP_CONCAT for that
Depending on version of MSSQL you are using (2005+), you can use the FOR XML PATH option.
SELECT
Name,
COUNT(*) AS ct,
STUFF((SELECT ',' + CAST(ID AS varchar(MAX))
FROM names i
WHERE i.Name = n.Name FOR XML PATH(''))
, 1, 1, '') as IDs
FROM names n
GROUP BY Name
ORDER BY ct DESC
Closest thing to group_concat you'll get on MSSQL unless you use the SQLCLR option (which I have no experience doing). The STUFF function takes care of the leading comma. Also, you don't want to alias the inner SELECT as it will wrap the element you're selecting in an XML element (alias of TD causes each element to return as <TD>value</TD>).
Given the input above, here's the result I get:
Name ct IDs
Mike 3 1,3,5
John 2 0,2
Adam 1 4
EDIT: DISCLAIMER
This technique will not work as intended for string fields that could possibly contain special characters (like ampersands &, less than <, greater than >, and any number of other formatting characters). As such, this technique is most beneficial for simple integer values, although can still be used for text if you are ABSOLUTELY SURE there are no special characters that would need to be escaped. As such, read the solution posted HERE to ensure these characters get properly escaped.
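One commonly used way to handle the escaping problem is to add the TYPE directive and read the result back with the xml .value() method, which returns the unescaped text. The sketch below assumes a hypothetical `tags` data set (not from the question) whose values contain an ampersand and a less-than sign.

```sql
-- Hypothetical data with characters that FOR XML would otherwise escape.
WITH tags (ID, Tag) AS (
    SELECT *
    FROM (VALUES
        (1, 'R&D'),
        (1, 'a<b'),
        (2, 'plain')
    ) AS v (ID, Tag)
)
SELECT t.ID,
       STUFF(
           (SELECT ',' + i.Tag
            FROM tags AS i
            WHERE i.ID = t.ID
            ORDER BY i.Tag
            -- TYPE returns xml; .value() converts it back to unescaped text.
            FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'),
           1, 1, '') AS Tags
FROM tags AS t
GROUP BY t.ID;
```

Without `TYPE` and `.value()`, the first row would come back with entities such as `&amp;` and `&lt;` in place of the original characters.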
Here is another SQL Server method, using recursive CTE:
; with MyCTE(name,ids, name_id, seq)
as(
select name, CAST( '' AS VARCHAR(8000) ), -1, 0
from Data
group by name
union all
select d.name,
CAST( ids + CASE WHEN seq = 0 THEN '' ELSE ', ' END + cast(id as varchar) AS VARCHAR(8000) ),
CAST( id AS int),
seq + 1
from MyCTE cte
join Data d
on cte.name = d.name
where d.id > cte.name_id
)
SELECT name, ids
FROM ( SELECT name, ids,
RANK() OVER ( PARTITION BY name ORDER BY seq DESC )
FROM MyCTE ) D ( name, ids, rank )
WHERE rank = 1

How can I SELECT distinct data based on a date field?

I have table that stores a log of changes to objects in another table. Here are my table contents:
ObjID Color Date User
------- ------- ------------------------ --------
1 Red 2010-01-01 12:22:00.000 Joe
1 Blue 2010-01-02 15:22:00.000 Jill
1 Green 2010-01-03 16:22:00.000 Joe
1 White 2010-01-10 09:22:00.000 Mike
2 Red 2010-01-09 10:22:00.000 Mike
2 Blue 2010-01-12 09:22:00.000 Jill
2 Orange 2010-01-12 15:22:00.000 Joe
I want to select the most recent date for each Object, as well as the Color and User on the date of that record.
Basically, I want this result set:
ObjID Color Date User
------- ------- ------------------------ --------
1 White 2010-01-10 09:22:00.000 Mike
2 Orange 2010-01-12 15:22:00.000 Joe
I'm having trouble wrapping my head around the SQL query I need to write to get this data...
I am retrieving data via ODBC from an iSeries DB2 database (AS/400).
Hey there, I think you want the following (where ColorTable is your table name):
SELECT Color.*
FROM ColorTable as Color
INNER JOIN
(
SELECT ObjID, MAX(Date) as Date
FROM ColorTable
GROUP BY ObjID
) as MaxDateByColor
ON Color.ObjID = MaxDateByColor.ObjID
AND Color.Date = MaxDateByColor.Date
Assuming at least SQL Server 2005
DECLARE @T TABLE (ObjID INT,Color VARCHAR(10),[Date] DATETIME,[User] VARCHAR(50))
INSERT INTO @T
SELECT 1,'Red','2010-01-01 12:22:00.000','Joe' UNION ALL
SELECT 1,'Blue','2010-01-02 15:22:00.000','Jill' UNION ALL
SELECT 1,'Green','2010-01-03 16:22:00.000','Joe' UNION ALL
SELECT 1,'White','2010-01-10 09:22:00.000','Mike' UNION ALL
SELECT 2,'Red','2010-01-09 10:22:00.000','Mike' UNION ALL
SELECT 2,'Blue','2010-01-12 09:22:00.000','Jill' UNION ALL
SELECT 2,'Orange','2010-01-12 15:22:00.000','Joe'
;WITH T AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY ObjID ORDER BY [Date] DESC) AS RN
FROM @T
)
SELECT ObjID,
Color,
[Date],
[User]
FROM T
WHERE RN=1
Or a SQL Server 2000 method from the article linked to in the comments
-- Pack date, color and user into one fixed-width string, take the MAX per object, then unpack it
SELECT ObjID,
CAST(SUBSTRING(string, 24, 10) AS VARCHAR(10)) AS Color,
CAST(SUBSTRING(string, 1, 23) AS DATETIME ) AS [Date],
CAST(SUBSTRING(string, 34, 50) AS VARCHAR(50)) AS [User]
FROM
(
SELECT ObjID,
MAX((CONVERT(CHAR(23), [Date], 126)
+ CAST(Color AS CHAR(10))
+ CAST([User] AS CHAR(50))) COLLATE Latin1_General_BIN) AS string
FROM @T
GROUP BY ObjID) T;
If you have an Objects table and your ObjectHistory table has an index on ObjID and date, then this could perform better than other queries given so far:
SELECT
X.*
FROM
Objects O
CROSS APPLY (
SELECT TOP 1 *
FROM ObjectHistory H
WHERE H.ObjID = O.ObjID
ORDER BY H.[Date] DESC
) X
The performance improvement may only come if you're pulling columns from the Objects table, too, but it's worth a shot.
If you want all Objects regardless of whether they have a history entry, switch to OUTER APPLY (and of course use O.ObjID instead of H.ObjID).
The neat thing about this query is that:
- It solves for situations where the Date value can have duplicates
- It can support an arbitrary number of items per group (say, the top 5 instead of the top 1)
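To illustrate the "arbitrary number of items per group" point, here is a hedged sketch that returns the two most recent history rows per object. The table variables and their rows are assumptions built from the question's sample data, and the tie-breaker on Color is an arbitrary choice to make the output deterministic.

```sql
-- Hypothetical tables; rows are taken from the question's sample data.
DECLARE @Objects TABLE (ObjID INT PRIMARY KEY);
DECLARE @ObjectHistory TABLE (
    ObjID INT, Color VARCHAR(10), [Date] DATETIME, [User] VARCHAR(50)
);

INSERT INTO @Objects (ObjID) VALUES (1), (2);
INSERT INTO @ObjectHistory (ObjID, Color, [Date], [User]) VALUES
    (1, 'Red',    '2010-01-01T12:22:00', 'Joe'),
    (1, 'Blue',   '2010-01-02T15:22:00', 'Jill'),
    (1, 'Green',  '2010-01-03T16:22:00', 'Joe'),
    (1, 'White',  '2010-01-10T09:22:00', 'Mike'),
    (2, 'Red',    '2010-01-09T10:22:00', 'Mike'),
    (2, 'Blue',   '2010-01-12T09:22:00', 'Jill'),
    (2, 'Orange', '2010-01-12T15:22:00', 'Joe');

SELECT O.ObjID, X.Color, X.[Date], X.[User]
FROM @Objects AS O
CROSS APPLY (
    -- TOP (2) picks the two newest rows for the current object;
    -- Color breaks ties when two rows share the same Date.
    SELECT TOP (2) H.Color, H.[Date], H.[User]
    FROM @ObjectHistory AS H
    WHERE H.ObjID = O.ObjID
    ORDER BY H.[Date] DESC, H.Color
) AS X
ORDER BY O.ObjID, X.[Date] DESC;
```

Changing `TOP (2)` to `TOP (1)` gives the single latest row per object, and swapping CROSS APPLY for OUTER APPLY keeps objects that have no history rows.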
See these two related questions:
SQL/mysql - Select distinct/UNIQUE but return all columns?
And:
How to efficiently determine changes between rows using SQL
SELECT t1.* FROM Table_name as t1
INNER JOIN (
SELECT MAX(Date) as MaxDate, ObjID FROM Table_name
GROUP BY ObjID
) as t2
ON t1.ObjID = t2.ObjID AND t1.Date = t2.MaxDate
You can find out, per object, its most recent change like this:
select objectid, max(changedate) as LatestChange
from LOG
group by objectid
You can then get the color and user columns by linking the set returned above, instantiated as an inline view that has been given an alias, to the same table again:
select color, user, FOO.objectid, FOO.LatestChange
from LOG
inner join
(
select objectid, max(changedate) as LatestChange
from LOG
group by objectid
) as FOO
on LOG.objectid = FOO.objectid and LOG.changedate = FOO.LatestChange
Like Martin Smith's answer above, simply do a row number over a partition on the object and pick the most recent row in each partition, like:
SELECT ObjID, Color, [Date], [User]
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY ObjID ORDER BY [Date] DESC) AS RowNum
FROM [tablename]
) AS Ranked
WHERE
RowNum = 1