SQL conditional aggregation?

Let's say I have the following table:
name virtual message
--------------------------
a 1 'm1'
a 1 'm2'
a 0 'm3'
a 0 'm4'
b 1 'm5'
b 0 'm6'
c 0 'm7'
I want to group by name but only concat the message if virtual is 1.
The result I am looking for is:
name concat_message
---------------------
a 'm1,m2'
b 'm5'
c ''
I couldn't find a way to conditionally aggregate using string_agg.

Standard SQL offers listagg() to aggregate strings. So this looks something like:
select name,
listagg(case when virtual = 1 then message end, ',') within group (order by message)
from t
group by name;
However, most databases have different names (and syntax) for string aggregation, such as string_agg() or group_concat().
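Where the FILTER clause is supported, the condition can also be attached to the aggregate itself instead of being pushed into a CASE expression. A minimal sketch for PostgreSQL 9.4 or later follows; PostgreSQL is an assumption here, since the question does not name that database, and the inline VALUES list only reproduces the sample data.

```sql
-- PostgreSQL sketch: aggregate only the rows where virtual = 1.
-- COALESCE turns the NULL produced for groups with no matching rows into ''.
SELECT name,
       COALESCE(string_agg(message, ',' ORDER BY message)
                  FILTER (WHERE virtual = 1), '') AS concat_message
FROM (VALUES ('a', 1, 'm1'), ('a', 1, 'm2'), ('a', 0, 'm3'), ('a', 0, 'm4'),
             ('b', 1, 'm5'), ('b', 0, 'm6'), ('c', 0, 'm7')
     ) AS t(name, virtual, message)
GROUP BY name
ORDER BY name;
```

For the sample data this should give 'm1,m2' for a, 'm5' for b and an empty string for c.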
EDIT:
In BQ the syntax would be:
select name,
string_agg(case when virtual = 1 then message end, ',')
from t
group by name;
That said, I would recommend array_agg() rather than string_agg().
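A sketch of the array_agg() variant in BigQuery, with the sample rows inlined as t. IGNORE NULLS is used because the CASE expression yields NULL for non-virtual rows and BigQuery does not allow NULL elements in an array returned by a query. The column alias messages is made up for illustration.

```sql
-- BigQuery sketch: collect the virtual messages into one array per name.
WITH t AS (
  SELECT 'a' AS name, 1 AS virtual, 'm1' AS message UNION ALL
  SELECT 'a', 1, 'm2' UNION ALL
  SELECT 'a', 0, 'm3' UNION ALL
  SELECT 'a', 0, 'm4' UNION ALL
  SELECT 'b', 1, 'm5' UNION ALL
  SELECT 'b', 0, 'm6' UNION ALL
  SELECT 'c', 0, 'm7'
)
SELECT name,
       -- non-virtual rows become NULL and are dropped by IGNORE NULLS
       ARRAY_AGG(CASE WHEN virtual = 1 THEN message END
                 IGNORE NULLS ORDER BY message) AS messages
FROM t
GROUP BY name;
```

An array keeps the individual messages intact even if a message itself contains a comma, which is one reason to prefer it over a delimited string.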

Consider below
select name,
ifnull(string_agg(if(virtual=1,message,null)), '') as concat_message
from your_table
group by name
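Applied to the sample data in your question, this returns 'm1,m2' for a, 'm5' for b and an empty string for c (without an ORDER BY inside string_agg, the order of the messages is not guaranteed).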

Use FOR XML PATH to concatenate the row data into a single column:
declare @temp table(name varchar(1), virtual int, message varchar(2))
insert into @temp
values('a' ,1 , 'm1'),
('a', 1 , 'm2'),
('a', 0 , 'm3'),
('a', 0 , 'm4'),
('b', 1 , 'm5'),
('b', 0 , 'm6'),
('c', 0 , 'm7')
select tmp2.name, stuff((select ','+message from @temp tmp1
where
tmp1.virtual=1
and tmp1.name=tmp2.name
for xml path('')),1,1,'') result
from @temp tmp2
where tmp2.virtual=1
group by tmp2.name
output:
name result
a m1,m2
b m5
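Note that the query above filters on virtual = 1 in the outer query, so name c is missing from its output. On SQL Server 2017 or later (an assumption, since the question does not state a version), STRING_AGG() skips NULL inputs, so a sketch like the following keeps every name and returns an empty string for c:

```sql
-- T-SQL sketch (SQL Server 2017+): conditional STRING_AGG that keeps all names.
DECLARE @temp TABLE (name varchar(1), virtual int, message varchar(2));
INSERT INTO @temp VALUES
('a', 1, 'm1'), ('a', 1, 'm2'), ('a', 0, 'm3'), ('a', 0, 'm4'),
('b', 1, 'm5'), ('b', 0, 'm6'), ('c', 0, 'm7');

SELECT name,
       -- rows with virtual = 0 become NULL and are ignored by STRING_AGG;
       -- ISNULL replaces the all-NULL result for c with ''
       ISNULL(STRING_AGG(CASE WHEN virtual = 1 THEN message END, ',')
                WITHIN GROUP (ORDER BY message), '') AS result
FROM @temp
GROUP BY name;
```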

Related

SQL Comma separated values comparisons

I am having trouble comparing the values in the Available column with the values in the Required column. Both are comma separated.
+-----------------+-------------+-------+
| Available       | Required    | Match |
+-----------------+-------------+-------+
| One, Two, Three | One, Three  | 1     |
| One, Three      | Three, Five | 0     |
| One, Two, Three | Two         | 1     |
+-----------------+-------------+-------+
What I want to achieve: if all values in the Required column are found in the Available column, Match should be 1; if one or more of the Required values are missing from the Available column, Match should be 0.
I want to achieve this in SQL.
If I understand the question correctly, an approach based on STRING_SPLIT() and an appropriate JOIN is an option:
Sample data:
SELECT *
INTO Data
FROM (VALUES
('One, Two, Three', 'One, Three'),
('One, Three', 'Three, Five'),
('One, Two, Three', 'Two')
) v (Available, Required)
Statement:
SELECT
Available, Required,
CASE
WHEN EXISTS (
SELECT 1
FROM STRING_SPLIT(Required, ',') s1
LEFT JOIN STRING_SPLIT(Available, ',') s2 ON TRIM(s1.[value]) = TRIM(s2.[value])
WHERE s2.[value] IS NULL
) THEN 0
ELSE 1
END AS Match
FROM Data
Result:
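+-----------------+-------------+-------+
| Available       | Required    | Match |
+-----------------+-------------+-------+
| One, Two, Three | One, Three  | 1     |
| One, Three      | Three, Five | 0     |
| One, Two, Three | Two         | 1     |
+-----------------+-------------+-------+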
A variation of Zhorov's solution that uses the set-based operator EXCEPT.
-- DDL and sample data population, start
DECLARE @tbl TABLE (Available VARCHAR(100), Required VARCHAR(100));
INSERT INTO @tbl (Available, Required) VALUES
('One, Two, Three', 'One, Three'),
('One, Three', 'Three, Five'),
('One, Two, Three', 'Two');
-- DDL and sample data population, end
SELECT t.*
, [Match] = CASE
WHEN EXISTS (
SELECT TRIM([value]) FROM STRING_SPLIT(Required, ',')
EXCEPT
SELECT TRIM([value]) FROM STRING_SPLIT(Available, ',')
) THEN 0
ELSE 1
END
FROM @tbl AS t;
Output
+-----------------+-------------+-------+
| Available | Required | Match |
+-----------------+-------------+-------+
| One, Two, Three | One, Three | 1 |
| One, Three | Three, Five | 0 |
| One, Two, Three | Two | 1 |
+-----------------+-------------+-------+
You need a cross join to look in all available values. Your query would be:
SELECT t.*
,case when SUM(CASE
WHEN t1.Available LIKE '%' + t.Required + '%'
THEN 1
ELSE 0
END) > 0 THEN 1 ELSE 0 END AS [Match_Calculated]
FROM YOUR_TABLE t
CROSS JOIN YOUR_TABLE t1
GROUP BY t.Available
,t.Required
,t.Match
Here's a dbfiddle
You can use STRING_SPLIT to achieve this:
;with Source as
(
select 1 id,'One,Two,Three' Available,'One,Three' Required
union all
select 2 id,'One,Three' Available,'Three,Five' Required
union all
select 3 id,'One,Two,Three' Available,'Two' Required
)
,AvailableTmp as
(
SELECT t.id,
x.value
FROM Source t
CROSS APPLY (SELECT trim(value) value
FROM string_split(t.Available, ',')) x
)
,RequiredTmp as
(
SELECT t.id,
x.value
FROM Source t
CROSS APPLY (SELECT trim(value) value
FROM string_split(t.Required, ',')) x
)
,AllMatchTmp as
(
select a.id
,1 Match
From RequiredTmp a
left join AvailableTmp b on a.id=b.id and a.value = b.value
group by a.id
having max(case when b.value is null then 1 else 0 end ) = 0
)
select a.id
,a.Available
,a.Required
,ISNULL(b.Match,0) Match
from Source a
left join AllMatchTmp b on a.id = b.id
Another way using STRING_SPLIT:
DECLARE @data TABLE (Available VARCHAR(100), [Required] VARCHAR(100),
INDEX IX_data(Available,[Required]));
INSERT @data
VALUES ('One, Two, Three', 'One, Three'),('One, Three', 'Three, Five'),
('One, Two, Three', 'Two');
SELECT
Available = d.Available,
[Required] = d.[Required],
[Match] = MIN(f.X)
FROM @data AS d
CROSS APPLY STRING_SPLIT(REPLACE(d.[Required],' ',''),',') AS split
CROSS APPLY (VALUES(REPLACE(d.[Available],' ',''))) AS cleaned(String)
CROSS APPLY (VALUES(IIF(split.[value] NOT IN
(SELECT s.[value] FROM STRING_SPLIT(cleaned.String,',') AS s),0,1))) AS f(X)
GROUP BY d.Available, d.[Required];

Group by absorb NULL unless it's the only value

I'm trying to group by a primary column and a secondary column. I want to ignore NULL in the secondary column unless it's the only value.
CREATE TABLE #tempx1 ( Id INT, [Foo] VARCHAR(10), OtherKeyId INT );
INSERT INTO #tempx1 ([Id],[Foo],[OtherKeyId]) VALUES
(1, 'A', NULL),
(2, 'B', NULL),
(3, 'B', 1),
(4, 'C', NULL),
(5, 'C', 1),
(6, 'C', 2);
I'm trying to get output like
Foo OtherKeyId
A NULL
B 1
C 1
C 2
This question is similar, but takes the MAX of the column I want, so it ignores other non-NULL values and won't work.
I tried to work out something based on this question, but I don't quite understand what that query does and can't get my output to work
-- Doesn't include Foo='A', creates duplicates for 'B' and 'C'
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY [Foo] ORDER BY [OtherKeyId]) rn1
FROM #tempx1
)
SELECT c1.[Foo], c1.[OtherKeyId], c1.rn1
FROM cte c1
INNER JOIN cte c2 ON c2.[OtherKeyId] = c1.[OtherKeyId] AND c2.rn1 = c1.rn1
This is for a modern SQL Server: Microsoft SQL Server 2019
You can use GROUP BY with a HAVING clause like the one below:
SELECT [Foo],[OtherKeyId]
FROM #tempx1 t
GROUP BY [Foo],[OtherKeyId]
HAVING SUM(CASE WHEN [OtherKeyId] IS NULL THEN 0 END) IS NULL
OR ( SELECT COUNT(*) FROM #tempx1 WHERE [Foo] = t.[Foo] ) = 1
Demo
Hmmm . . . I think you want filtering:
select t.*
from #tempx1 t
where t.otherkeyid is not null or
not exists (select 1
from #tempx1 t2
where t2.foo = t.foo and t2.otherkeyid is not null
);
My actual problem is a bit more complicated than presented here. I ended up using the idea from Barbaros Özhan's solution to count the number of items. This results in two inner queries on the data set, each with a different GROUP BY. I'm able to get the results I need on my real dataset using a query like the following:
SELECT
a.[Foo],
b.[OtherKeyId]
FROM (
SELECT
[Foo],
COUNT([OtherKeyId]) [C]
FROM #tempx1 t
GROUP BY [Foo]
) a
JOIN (
SELECT
[Foo],
[OtherKeyId]
FROM #tempx1 t
GROUP BY [Foo], [OtherKeyId]
) b ON b.[Foo] = a.[Foo]
WHERE
(b.[OtherKeyId] IS NULL AND a.[C] = 0)
OR (b.[OtherKeyId] IS NOT NULL AND a.[C] > 0)
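The same counting idea can probably be written with a window function instead of two grouped subqueries. The following is only a sketch against the sample table from the question, not the query used on the real dataset; the alias NonNullCount is made up for illustration.

```sql
-- T-SQL sketch: keep NULL only for Foo values that have no non-NULL OtherKeyId.
-- Skip the CREATE/INSERT if #tempx1 from the question already exists.
CREATE TABLE #tempx1 (Id INT, Foo VARCHAR(10), OtherKeyId INT);
INSERT INTO #tempx1 (Id, Foo, OtherKeyId) VALUES
(1, 'A', NULL), (2, 'B', NULL), (3, 'B', 1),
(4, 'C', NULL), (5, 'C', 1), (6, 'C', 2);

SELECT DISTINCT x.Foo, x.OtherKeyId
FROM (
    SELECT Foo, OtherKeyId,
           -- COUNT(column) ignores NULLs, so this is the number of real keys per Foo
           COUNT(OtherKeyId) OVER (PARTITION BY Foo) AS NonNullCount
    FROM #tempx1
) AS x
WHERE x.OtherKeyId IS NOT NULL   -- always keep real keys
   OR x.NonNullCount = 0         -- keep NULL only when it is the only value
ORDER BY x.Foo, x.OtherKeyId;

DROP TABLE #tempx1;
```

For the sample rows this should return (A, NULL), (B, 1), (C, 1) and (C, 2), matching the requested output.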

sql generate code based on three column values

I have three columns
suppose
row no column1 column2 column3
1 A B C
2 A B C
3 D E F
4 G H I
5 G H C
I want to generate a code by combining these three column values.
For example:
1)ABC001
2)ABC002
3)DEF001
4)GHI001
5)GHC001
The logic is based on the combination of the three columns: the first time a combination of values appears it gets 'ABC001', and the second time the same combination appears it gets 'ABC002'.
You can try this.
I don't know what logic you want for the 00 padding, but you can add the zeros manually or let rn decide for you:
declare @mytable table (rowno int,col1 nvarchar(50),col2 nvarchar(50),col3 nvarchar(50)
)
insert into @mytable
values
(1,'A', 'B', 'C'),
(2,'A', 'B', 'C'),
(3,'D', 'E', 'F'),
(4,'G', 'H', 'I'),
(5,'G', 'H', 'C')
Select rowno,col1,col2,col3,
case when rn >= 10 and rn < 100 then concatcol+'0'+cast(rn as nvarchar(50))
when rn >= 100 then concatcol+cast(rn as nvarchar(50))
else concatcol+'00'+cast(rn as nvarchar(50)) end as ConcatCol from (
select rowno,col1,col2,col3
,Col1+col2+col3 as ConcatCol,ROW_NUMBER() over(partition by col1,col2,col3 order by rowno) as rn from @mytable
) x
order by rowno
The CASE expression makes sure that rn = 10 is written as ABC010, rn = 100 as ABC100, and anything under 10 as ABC001, ABC002 and so on.
Result
TSQL: CONCAT(column1,column2,column3,RIGHT(REPLICATE('0', 3) + LEFT(row_no, 3), 3))
You should combine your columns like below :
SELECT CONVERT(VARCHAR(MAX), ROW_NUMBER() OVER(ORDER BY
(
SELECT NULL
)))+') '+DATA AS Data
FROM
(
SELECT column1+column2+column3+'00'+CONVERT(VARCHAR(MAX), ROW_NUMBER() OVER(PARTITION BY column1,
column2,
column3 ORDER BY
(
SELECT NULL
))) DATA
FROM <table_name>
) T;
Result :
1)ABC001
2)ABC002
3)DEF001
4)GHI001
5)GHC001
MySQL:
CONCAT(column1,column2,column3,LPAD(row_no, 3, '0'))
[You will need to enclose row no in backticks if the field name contains a space instead of an underscore.]
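For SQL Server, the per-combination counter and the zero padding discussed in the answers above can be combined into a single expression. This is a sketch under the assumption that a three-digit suffix is enough; the table variable and the column alias code are illustrative only.

```sql
-- T-SQL sketch: per-combination counter, zero-padded to three digits.
DECLARE @mytable TABLE (rowno int, col1 nvarchar(50), col2 nvarchar(50), col3 nvarchar(50));
INSERT INTO @mytable VALUES
(1, 'A', 'B', 'C'), (2, 'A', 'B', 'C'), (3, 'D', 'E', 'F'),
(4, 'G', 'H', 'I'), (5, 'G', 'H', 'C');

SELECT rowno,
       col1 + col2 + col3
       -- number the rows within each (col1, col2, col3) combination,
       -- then left-pad with zeros and keep the last three characters
       + RIGHT('000' + CAST(ROW_NUMBER() OVER (PARTITION BY col1, col2, col3
                                               ORDER BY rowno) AS varchar(10)), 3) AS code
FROM @mytable
ORDER BY rowno;
```

For the sample rows this should yield ABC001, ABC002, DEF001, GHI001 and GHC001. RIGHT(..., 3) would cut off the leading digits of counters above 999, so the width needs to be raised if a combination can repeat that often.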

How can I use PIVOT to show average and count simultaneously in its cells?

Looking at the syntax, I get the strong impression that PIVOT doesn't support anything beyond a single aggregate function per cell.
From a statistical point of view, showing averages without the number of cases each average refers to is very unsatisfying (that is the polite version).
Is there a nice pattern to evaluate pivots based on avg and pivots based on count and combine them into one result?
Yes, you need to use the old-style cross tab for this. PIVOT is just syntactic sugar that resolves to pretty much the same approach.
-- val stands for the numeric column being aggregated (not named in the question)
SELECT AVG(CASE WHEN col='foo' THEN val END) AS AvgFoo,
COUNT(CASE WHEN col='foo' THEN val END) AS CountFoo,...
If you have many aggregates you could always use a CTE
WITH cte As
(
SELECT CASE WHEN col='foo' THEN val END AS Foo...
)
SELECT MAX(Foo),MIN(Foo), COUNT(Foo), STDEV(Foo)
FROM cte
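To make the cross-tab pattern concrete, here is a sketch for SQL Server that uses the test_data rows from the answers further down (columns MODULE, modus, duration), inlined as a VALUES list. The column aliases are chosen to mirror the pivoted column names used there.

```sql
-- T-SQL sketch: cross tab with AVG and COUNT for each modus value.
SELECT MODULE,
       -- 1.0 * duration forces decimal rather than integer averaging
       AVG(CASE WHEN modus = 'A' THEN 1.0 * duration END) AS A_AVG,
       COUNT(CASE WHEN modus = 'A' THEN duration END)     AS A_COUNT,
       AVG(CASE WHEN modus = 'B' THEN 1.0 * duration END) AS B_AVG,
       COUNT(CASE WHEN modus = 'B' THEN duration END)     AS B_COUNT
FROM (VALUES ('M1', 'A', 5), ('M1', 'A', 5), ('M1', 'B', 3),
             ('M2', 'A', 1), ('M2', 'A', 4)
     ) AS test_data (MODULE, modus, duration)
GROUP BY MODULE
ORDER BY MODULE;
```

Each pivoted value needs its own pair of expressions, so the column list has to be extended by hand (or generated) when new modus values appear. For M2 the B average is NULL and the B count is 0, because no rows match.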
Simultaneous.. in its cells. So you mean within the same cell, therefore as a varchar?
You could calc the avg and count values in an aggregate query before using the pivot, and concatenate them together as text.
The role of the PIVOT operator here would only be to transform rows to columns, and some aggregate function (e.g. MAX/MIN) would be used only because it is required by the syntax - your pre-calculated aggregate query would only have one value per pivoted column.
EDIT
Following bernd_k's oracle/mssql solution, I would like to point out another way to do this in SQL Server. It requires streamlining the multiple columns into a single column.
SELECT MODULE,
modus + '_' + case which when 1 then 'AVG' else 'COUNT' end AS modus,
case which when 1 then AVG(duration) else COUNT(duration) end AS value
FROM test_data, (select 1 as which union all select 2) x
GROUP BY MODULE, modus, which
SELECT *
FROM (
SELECT MODULE,
modus + '_' + case which when 1 then 'AVG' else 'COUNT' end AS modus,
case which when 1 then CAST(AVG(1.0*duration) AS NUMERIC(10,2)) else COUNT(duration) end AS value
FROM test_data, (select 1 as which union all select 2) x
GROUP BY MODULE, modus, which
) P
PIVOT (MAX(value) FOR modus in ([A_AVG], [A_COUNT], [B_AVG], [B_COUNT])
) AS pvt
ORDER BY pvt.MODULE
In the example above, AVG and COUNT are compatible (count - int => numeric). If they are not, convert both explicitly to a compatible type.
Note - The first query shows AVG for M2/A as 2, due to integer averaging. The 2nd (pivoted) query shows the actual average taking into account decimals.
Solution for Oracle 11g + :
create table test_data (
module varchar2(30),
modus varchar2(30),
duration Number(10)
);
insert into test_data values ('M1', 'A', 5);
insert into test_data values ('M1', 'A', 5);
insert into test_data values ('M1', 'B', 3);
insert into test_data values ('M2', 'A', 1);
insert into test_data values ('M2', 'A', 4);
select *
FROM (
select *
from test_data
)
PIVOT (
AVG(duration) avg , count(duration) count
FOR modus in ( 'A', 'B')
) pvt
ORDER BY pvt.module;
I do not like the column names containing apostrophes, but the result contains what I want:
MODULE                         'A'_AVG    'A'_COUNT  'B'_AVG    'B'_COUNT
------------------------------ ---------- ---------- ---------- ----------
M1                                      5          2          3          1
M2                                    2.5          2                     0
I really wonder what the Microsoft boys were thinking when they only allowed one aggregate function within PIVOT. I call reporting averages without accompanying counts statistical lies.
SQL-Server 2005 + (based on Cyberwiki):
CREATE TABLE test_data (
MODULE VARCHAR(30),
modus VARCHAR(30),
duration INTEGER
);
INSERT INTO test_data VALUES ('M1', 'A', 5);
INSERT INTO test_data VALUES ('M1', 'A', 5);
INSERT INTO test_data VALUES ('M1', 'B', 3);
INSERT INTO test_data VALUES ('M2', 'A', 1);
INSERT INTO test_data VALUES ('M2', 'A', 4);
SELECT MODULE, modus, ISNULL(LTRIM(STR(AVG(duration))), '') + '|' + ISNULL(LTRIM(STR(COUNT(duration))), '') RESULT
FROM test_data
GROUP BY MODULE, modus;
SELECT *
FROM (
SELECT MODULE, modus, ISNULL(LTRIM(STR(AVG(duration))), '') + '|' + ISNULL(LTRIM(STR(COUNT(duration))), '') RESULT
FROM test_data
GROUP BY MODULE, modus
) T
PIVOT (
MAX(RESULT)
FOR modus in ( [A], [B])
) AS pvt
ORDER BY pvt.MODULE
result:
MODULE                         A                     B
------------------------------ --------------------- ---------------------
M1                             5|2                   3|1
M2                             2|2                   NULL

SQL 2005 Merge / concatenate multiple rows to one column

We have a bit of a SQL quandary. Say I have results that look like this...
61E77D90-D53D-4E2E-A09E-9D6F012EB59C | A
61E77D90-D53D-4E2E-A09E-9D6F012EB59C | B
61E77D90-D53D-4E2E-A09E-9D6F012EB59C | C
61E77D90-D53D-4E2E-A09E-9D6F012EB59C | D
7ce953ca-a55b-4c55-a52c-9d6f012ea903 | E
7ce953ca-a55b-4c55-a52c-9d6f012ea903 | F
is there a way I can group these results within SQL to return as
61E77D90-D53D-4E2E-A09E-9D6F012EB59C | A B C D
7ce953ca-a55b-4c55-a52c-9d6f012ea903 | E F
Any ideas people?
Many thanks
Dave
Try this:
set nocount on;
declare @t table (id char(36), x char(1))
insert into @t (id, x)
select '61E77D90-D53D-4E2E-A09E-9D6F012EB59C' , 'A' union
select '61E77D90-D53D-4E2E-A09E-9D6F012EB59C' , 'B' union
select '61E77D90-D53D-4E2E-A09E-9D6F012EB59C' , 'C' union
select '61E77D90-D53D-4E2E-A09E-9D6F012EB59C' , 'D' union
select '7ce953ca-a55b-4c55-a52c-9d6f012ea903' , 'E' union
select '7ce953ca-a55b-4c55-a52c-9d6f012ea903' , 'F'
set nocount off
SELECT p1.id,
stuff(
(SELECT
' ' + x
FROM @t p2
WHERE p2.id=p1.id
ORDER BY id, x
FOR XML PATH('')
)
,1,1, ''
) AS YourValues
FROM @t p1
GROUP BY id
OUTPUT:
id YourValues
------------------------------------ --------------
61E77D90-D53D-4E2E-A09E-9D6F012EB59C A B C D
7ce953ca-a55b-4c55-a52c-9d6f012ea903 E F
(2 row(s) affected)
EDIT
based on OP's comment about this needing to run for an existing query, try this:
;WITH YourBugQuery AS
(
--replace this with your own query
select '61E77D90-D53D-4E2E-A09E-9D6F012EB59C' AS ColID , 'A' AS ColX
union select '61E77D90-D53D-4E2E-A09E-9D6F012EB59C' , 'B'
union select '61E77D90-D53D-4E2E-A09E-9D6F012EB59C' , 'C'
union select '61E77D90-D53D-4E2E-A09E-9D6F012EB59C' , 'D'
union select '7ce953ca-a55b-4c55-a52c-9d6f012ea903' , 'E'
union select '7ce953ca-a55b-4c55-a52c-9d6f012ea903' , 'F'
)
SELECT p1.ColID,
stuff(
(SELECT
' ' + ColX
FROM YourBugQuery p2
WHERE p2.ColID=p1.ColID
ORDER BY ColID, ColX
FOR XML PATH('')
)
,1,1, ''
) AS YourValues
FROM YourBugQuery p1
GROUP BY ColID
This returns the same result set as displayed above.
I prefer to define a custom user-defined aggregate. Here's an example of a UDA which will accomplish something very close to what you're asking.
Why use a user-defined aggregate instead of a nested SELECT? It's all about performance, and what you are willing to put up with. For a small amount of elements, you can most certainly get away with a nested SELECT, but for large "n", you'll notice that the query plan essentially runs the nested SELECT once for every row in the output list. This can be the kiss of death if you're talking about a large number of rows. With a UDA, it's possible to aggregate these values in a single pass.
The tradeoff, of course, is that the UDA requires you to use the CLR to deploy it, and that's something not a lot of people do often. In Oracle, this particular situation is a bit nicer as you can use PL/SQL directly to create your user-defined aggregate, but I digress...
Another way of doing it is to use the FOR XML PATH option
SELECT
[ID],
(
SELECT
[Value] + ' '
FROM
[YourTable] [YourTable2]
WHERE
[YourTable2].[ID] = [YourTable].[ID]
ORDER BY
[Value]
FOR XML PATH('')
) [Values]
FROM
[YourTable]
GROUP BY
[YourTable].[ID]
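One caveat with the FOR XML PATH approaches above: characters such as & and < in the data are returned XML-escaped (for example as &amp;). A commonly used variation adds the TYPE directive and reads the text back with .value(). The sketch below uses made-up sample rows with integer ids purely to show the effect.

```sql
-- T-SQL sketch: FOR XML PATH with TYPE/.value() so special characters survive.
DECLARE @t TABLE (id int, x varchar(10));
INSERT INTO @t (id, x)
SELECT 1, 'A&B' UNION ALL
SELECT 1, 'C<D' UNION ALL
SELECT 2, 'E';

SELECT p1.id,
       STUFF((SELECT ' ' + p2.x
              FROM @t p2
              WHERE p2.id = p1.id
              ORDER BY p2.x
              FOR XML PATH(''), TYPE        -- return an xml value instead of a string
             ).value('.', 'varchar(max)'),  -- extract the unescaped text
             1, 1, '') AS YourValues        -- strip the leading separator
FROM @t p1
GROUP BY p1.id;
```

With the plain FOR XML PATH('') form the first row would come back as 'A&amp;B C&lt;D'; with TYPE and .value() it should come back as 'A&B C<D'.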