Transpose many aggregates in SQL - sql

I have two tables, cases, my main table, and activities, which shows work being done against certain cases.
CREATE TABLE cases
([caseno] int, [case_detail] varchar(8), [date_received] datetime)
;
INSERT INTO cases
([caseno], [case_detail], [date_received])
VALUES
(1, 'DETAIL A', '2018-04-01 00:00:00'),
(2, 'DETAIL B', '2018-05-01 00:00:00'),
(3, 'DETAIL C', '2018-06-01 00:00:00')
;
CREATE TABLE activities
([caseno] int, [activity] int, [team] varchar(1))
;
INSERT INTO activities
([caseno], [activity], [team])
VALUES
(1, 00, 'A'),
(1, 10, 'A'),
(1, 00, 'A'),
(1, 00, 'B'),
(1, 90, 'C'),
(1, 00, 'C'),
(1, 00, 'A'),
(2, 10, 'A'),
(2, 00, 'A'),
(2, 00, 'B'),
(3, 90, 'C'),
(3, 00, 'C')
;
I'm interested in aggregating the activities data, for activity = '00', split by team, and attaching to the cases data.
I've achieved this in the following way, but I suspect it is not optimal. The cases table has about 1 million rows and the activities table about 200 million rows.
SELECT T.*, A.A, B.B, C.C FROM cases T
LEFT JOIN (SELECT caseno, COUNT(*) AS A FROM activities WHERE activity = '00' AND team = 'A' GROUP BY caseno) A ON T.[caseno] = A.[caseno]
LEFT JOIN (SELECT caseno, COUNT(*) AS B FROM activities WHERE activity = '00' AND team = 'B' GROUP BY caseno) B ON T.[caseno] = B.[caseno]
LEFT JOIN (SELECT caseno, COUNT(*) AS C FROM activities WHERE activity = '00' AND team = 'C' GROUP BY caseno) C ON T.[caseno] = C.[caseno]
https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=92632c9af821935790a7986e6f654b13

You could use conditional aggregation:
SELECT c.caseno, case_detail, date_received,
COUNT(CASE WHEN team = 'A' THEN 1 END) AS a,
COUNT(CASE WHEN team = 'B' THEN 1 END) AS b,
COUNT(CASE WHEN team = 'C' THEN 1 END) AS c
FROM cases c
LEFT JOIN activities a
ON c.caseno = a.caseno
AND a.activity = '00'
GROUP BY c.caseno, case_detail, date_received;
EDIT
Without typing all columns in GROUP BY:
WITH cte AS (
SELECT c.caseno,
COUNT(CASE WHEN team = 'A' THEN 1 END) AS a,
COUNT(CASE WHEN team = 'B' THEN 1 END) AS b,
COUNT(CASE WHEN team = 'C' THEN 1 END) AS c
FROM cases c
LEFT JOIN activities a
ON c.caseno = a.caseno
AND a.activity = '00'
GROUP BY c.caseno -- only PK
)
SELECT * FROM cte JOIN cases c ON cte.caseno = c.caseno;
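To try the conditional aggregation on the sample data without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. The use of SQLite, the aliases act, team_a, team_b and team_c, and the shortened date strings are my own assumptions for the demo; the query shape is the same as above.

```python
import sqlite3

# In-memory SQLite stand-in for the SQL Server tables in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cases (caseno INTEGER PRIMARY KEY, case_detail TEXT, date_received TEXT);
INSERT INTO cases VALUES
  (1, 'DETAIL A', '2018-04-01'),
  (2, 'DETAIL B', '2018-05-01'),
  (3, 'DETAIL C', '2018-06-01');
CREATE TABLE activities (caseno INTEGER, activity INTEGER, team TEXT);
INSERT INTO activities VALUES
  (1, 0, 'A'), (1, 10, 'A'), (1, 0, 'A'), (1, 0, 'B'), (1, 90, 'C'), (1, 0, 'C'),
  (1, 0, 'A'), (2, 10, 'A'), (2, 0, 'A'), (2, 0, 'B'), (3, 90, 'C'), (3, 0, 'C');
""")

# One pass over activities: each COUNT only sees rows where its CASE is non-NULL.
rows = conn.execute("""
SELECT c.caseno,
       COUNT(CASE WHEN act.team = 'A' THEN 1 END) AS team_a,
       COUNT(CASE WHEN act.team = 'B' THEN 1 END) AS team_b,
       COUNT(CASE WHEN act.team = 'C' THEN 1 END) AS team_c
FROM cases c
LEFT JOIN activities act
       ON c.caseno = act.caseno
      AND act.activity = 0
GROUP BY c.caseno
ORDER BY c.caseno
""").fetchall()

for row in rows:
    print(row)
```

Note that cases with no matching activity for a team get 0 here, whereas the three LEFT JOIN subqueries in the question return NULL for those cells.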

Pivot solution
select *
from
( select cs.*,A.team
from cases cs
join activities a on cs.caseno=a.caseno and a.activity = 00
) C
pivot
(count(team)
for team in ([A],[B],[C])
) pvt
This gives the same result.

Related

Case when with aggregation in BigQuery

I have data of how much users spend in several games in BigQuery:
CREATE TABLE if not EXISTS user_values (
user_id int,
value float,
game char
);
INSERT INTO user_values VALUES
(1, 10, 'A'),
(1, 10, 'A'),
(1, 2, 'A'),
(1, 4, 'B'),
(1, 5, 'B'),
(2, 0, 'A'),
(2, 10, 'B'),
(2, 6, 'B');
I want to check, for every user, if they've spent more than 20 in game A and more than 15 in game B. In this case, the output table should be:
user_id,game,spent_more_than_cutoff
1,A,TRUE
1,B,FALSE
2,A,FALSE
2,B,TRUE
I want to do this for an arbitrary number of users and 5-10 games. I've tried this:
select
game,
user_id,
case
when sum(value) > 20 and game = 'A' then TRUE
when sum(value) > 15 and game = 'B' then TRUE
else FALSE
end as spent_more_than_cutoff,
from user_values
group by 1, 2
but I get thrown the following error:
Column 3 contains an aggregation function, which is not allowed in GROUP BY at [19:20]
What's the simplest way of doing this in BigQuery without needing to do different queries for different games?
Is there an all function that can help to do something like this?
select
game,
user_id,
case
when sum(value) > 20 and all(game) = 'A' then TRUE
when sum(value) > 15 and all(game) = 'B' then TRUE
else FALSE
end as spent_more_than_cutoff,
from user_values
group by 1, 2
I want to do this for an arbitrary number of users and 5-10 games
Consider below approach
with cutoffs as (
select 'A' game, 20 cutoff union all
select 'B', 15
)
select user_id, game,
sum(value) > any_value(cutoff) spent_more_than_cutoff
from user_values
left join cutoffs using(game)
group by user_id, game
Applied to the sample data for user_values in your question, this returns the expected output listed there.
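The cutoff-table idea can be checked locally with a small sketch. This uses Python's sqlite3 rather than BigQuery, so some details are assumptions on my part: cutoffs is a real table instead of a CTE, MAX(cutoff) stands in for any_value(cutoff), and SQLite returns 1/0 which the script converts to booleans.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_values (user_id INTEGER, value REAL, game TEXT);
INSERT INTO user_values VALUES
  (1, 10, 'A'), (1, 10, 'A'), (1, 2, 'A'), (1, 4, 'B'), (1, 5, 'B'),
  (2, 0, 'A'), (2, 10, 'B'), (2, 6, 'B');
-- One row per game; adding a game means adding a row here, not editing the query.
CREATE TABLE cutoffs (game TEXT PRIMARY KEY, cutoff REAL);
INSERT INTO cutoffs VALUES ('A', 20), ('B', 15);
""")

raw = conn.execute("""
SELECT u.user_id, u.game,
       SUM(u.value) > MAX(k.cutoff) AS spent_more_than_cutoff
FROM user_values u
LEFT JOIN cutoffs k ON k.game = u.game
GROUP BY u.user_id, u.game
ORDER BY u.user_id, u.game
""").fetchall()

# SQLite has no boolean type, so convert the 1/0 flag for display.
flags = [(user_id, game, bool(flag)) for user_id, game, flag in raw]
for line in flags:
    print(line)
```

With 5-10 games only the cutoffs rows change, which is the main reason to prefer this over a growing CASE expression.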
The expression for filtering on the game needs to be the argument to the sum():
select game, user_id,
(sum(case when game = 'A' then value else 0 end) > 20 or
sum(case when game = 'B' then value else 0 end) > 15
) as spent_more_than_cutoff
from user_values
group by 1, 2;
Note that you are returning a boolean so no case is needed.
Try this one:
select game,
user_id,
sum(if(game = 'A', value, 0)) > 20 or sum(if(game = 'B', value, 0)) > 15 as spent_more_than_cutoff
from user_values
group by 1, 2;

Joining two tables and finding the earliest date SQL

I have the following two tables and I need to get the following result:
Table 1
(A, 1, 01/01/2015),
(A, 1, 10/01/2015),
(A, 2, 20/01/2015),
(A, 2, 01/05/2015),
(B, 1, 20/02/2014),
(B, 1, 20/02/2015),
(B, 2, 20/02/2016),
(B, 2, 06/05/2015)
Table 2
(A, 1, 123),
(A, 1, 123),
(A, 2, 234),
(A, 2, 234),
(B, 1, 123),
(B, 2, 123),
I want to return the earliest date of each distinct combo:
(A, 123, 01/01/2015),
(A, 234, 20/01/2015),
(B, 123, 20/02/2014)
Code I have tried:
CREATE TABLE #table1 (letter1 CHAR(1), num1 INT, date1 INT)
CREATE TABLE #table2 (letter1 CHAR(1), num1 INT, num2 INT)
INSERT INTO #table1 VALUES
('A', 1, 01012015),
('A', 1, 10012015),
('A', 2, 20012015),
('A', 2, 01052015),
('B', 1, 20022014),
('B', 1, 20022015),
('B', 2, 20022016),
('B', 2, 06052015)
INSERT INTO #table2 VALUES
('A', 1, 123),
('A', 1, 123),
('A', 2, 234),
('A', 2, 234),
('B', 1, 123),
('B', 2, 123)
SELECT DISTINCT [#table1].letter1, num2, MIN(date1) FROM #table1
INNER JOIN #table2 ON [#table1].letter1 = [#table2].letter1 AND [#table1].num1 = [#table2].num1
GROUP BY [#table1].letter1, [#table1].num1, num2
You can use the row_number() function:
select top (1) with ties t.letter1, t2.num2, t.date1
from table1 t inner join
table2 t2
on t2.letter1 = t.letter1 AND t2.num1 = t.num1
order by row_number() over (partition by t2.letter1, t2.num2 order by t.date1);
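For comparison, the same "earliest date per (letter1, num2)" result can also be had with a plain GROUP BY and MIN instead of row_number(). The sketch below is an assumption-laden demo, not the answer's exact query: it runs on SQLite through Python's sqlite3, and it stores the dates as ISO yyyy-mm-dd text so that MIN orders them chronologically (integers like 01012015 lose the leading zero and sort by day first).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (letter1 TEXT, num1 INTEGER, date1 TEXT);
INSERT INTO table1 VALUES
  ('A', 1, '2015-01-01'), ('A', 1, '2015-01-10'),
  ('A', 2, '2015-01-20'), ('A', 2, '2015-05-01'),
  ('B', 1, '2014-02-20'), ('B', 1, '2015-02-20'),
  ('B', 2, '2016-02-20'), ('B', 2, '2015-05-06');
CREATE TABLE table2 (letter1 TEXT, num1 INTEGER, num2 INTEGER);
INSERT INTO table2 VALUES
  ('A', 1, 123), ('A', 1, 123), ('A', 2, 234), ('A', 2, 234),
  ('B', 1, 123), ('B', 2, 123);
""")

# Group by the output combination only; num1 is just the join key.
earliest = conn.execute("""
SELECT t1.letter1, t2.num2, MIN(t1.date1) AS earliest_date
FROM table1 t1
JOIN table2 t2
  ON t2.letter1 = t1.letter1
 AND t2.num1 = t1.num1
GROUP BY t1.letter1, t2.num2
ORDER BY t1.letter1, t2.num2
""").fetchall()

for row in earliest:
    print(row)
```

The duplicate rows in table2 do not matter here because MIN is unaffected by repeated join matches.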
Just a guess: maybe add date1 to the GROUP BY clause.
SELECT DISTINCT [#table1].letter1, num2, MIN(date1)
FROM #table1
INNER JOIN #table2 ON [#table1].letter1 = [#table2].letter1 AND [#table1].num1 = [#table2].num1
GROUP BY [#table1].letter1, [#table1].num1, num2, date1
;with cte as
(
select name, IIF(c = 1, 0, id) id, value from -- Here we separate the rows into two groups.
(
select name, id, value, count(*) c from #table2 group by name, id, value
) ct -- One group (B) has the same value (123) for every id.
-- The other group (A) has different values (123, 234).
)
select c.name, c.value, min(t1.yyymmdd) Earlier_date from cte c
join #table1 t1
on c.name = t1.name
and c.id = t1.id -- The id join is required because this group has two different ids.
and c.id <> 0 -- This joins the 'A' group.
group by c.name, value
union
select c.name, c.value, min(t1.yyymmdd) Earlier_date from cte c
join #table1 t1
on c.name = t1.name -- No id join is needed because this group has a single value.
and c.id = 0 -- This joins the 'B' group.
group by c.name, value

Logic to check if exact ids (3+ records) are present in a group in SQL Server

I have some sample data like:
INSERT INTO mytable
([FK_ID], [TYPE_ID])
VALUES
(10, 1),
(11, 1), (11, 2),
(12, 1), (12, 2), (12, 3),
(14, 1), (14, 2), (14, 3), (14, 4),
(15, 1), (15, 2), (15, 4)
Here I am trying to check whether each FK_ID group has an exact match of TYPE_ID values 1, 2 and 3.
So, the expected output is like:
(10, 1) this should fail
As in group FK_ID = 10 we only have one record
(11, 1), (11, 2) this should also fail
As in group FK_ID = 11 we have two records.
(12, 1), (12, 2), (12, 3) this should pass
As in group FK_ID = 12 we have three records.
And all the TYPE_ID are exactly matching 1, 2 & 3 values.
(14, 1), (14, 2), (14, 3), (14, 4) this should also fail
As we have 4 records here.
(15, 1), (15, 2), (15, 4) this should also fail
Even though we have three records, it should fail as the TYPE_ID here (1, 2, 4) are not matching with required match (1, 2, 3).
Here is my attempt:
select * from mytable t1
where exists (select COUNT(t2.TYPE_ID)
from mytable t2 where t2.FK_ID = t1.FK_ID
and t2.TYPE_ID IN (1, 2, 3)
group by t2.FK_ID having COUNT(t2.TYPE_ID) = 3);
This is not working as expected, because it also passes for FK_ID = 14, which has four records.
Demo: SQL Fiddle
Also, how can we make it generic, so that checking for 4 or more TYPE_ID values such as (1,2,3,4) or (1,2,3,4,5) only requires updating a few values?
The following query will do what you want:
select fk_id
from t
group by fk_id
having sum(case when type_id in (1, 2, 3) then 1 else 0 end) = 3 and
sum(case when type_id not in (1, 2, 3) then 1 else 0 end) = 0;
This assumes that you have no duplicate pairs (although depending on how you want to handle duplicates, it might be as easy as using, from (select distinct * from t) t).
As for "genericness", you need to update the in lists and the 3.
If you want something more generic:
with vals as (
select id
from (values (1), (2), (3)) v(id)
)
select fk_id
from t
group by fk_id
having sum(case when type_id in (select id from vals) then 1 else 0 end) = (select count(*) from vals) and
sum(case when type_id not in (select id from vals) then 1 else 0 end) = 0;
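To see how the "all required values present, nothing else present" check generalizes, here is a small sketch that builds the same HAVING clause for any list of required values. It runs on SQLite through Python's sqlite3; the function name groups_with_exact_types is hypothetical, and like the query above it assumes there are no duplicate (FK_ID, TYPE_ID) pairs.

```python
import sqlite3

def groups_with_exact_types(conn, required):
    """Return the fk_id values whose type_id set is exactly `required`."""
    required = sorted(set(required))
    marks = ", ".join("?" for _ in required)
    sql = f"""
        SELECT fk_id
        FROM mytable
        GROUP BY fk_id
        HAVING SUM(CASE WHEN type_id IN ({marks}) THEN 1 ELSE 0 END) = ?
           AND SUM(CASE WHEN type_id NOT IN ({marks}) THEN 1 ELSE 0 END) = 0
        ORDER BY fk_id
    """
    # Parameter order follows the placeholders: IN list, expected count, NOT IN list.
    params = [*required, len(required), *required]
    return [row[0] for row in conn.execute(sql, params)]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (fk_id INTEGER, type_id INTEGER);
INSERT INTO mytable VALUES
  (10, 1),
  (11, 1), (11, 2),
  (12, 1), (12, 2), (12, 3),
  (14, 1), (14, 2), (14, 3), (14, 4),
  (15, 1), (15, 2), (15, 4);
""")

print(groups_with_exact_types(conn, [1, 2, 3]))
print(groups_with_exact_types(conn, [1, 2, 3, 4]))
```

Only the list passed in changes between the two calls; the expected count is derived from it.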
You can use this code:
SELECT y.fk_id FROM
(SELECT x.fk_id, COUNT(x.type_id) AS count, SUM(x.type_id) AS sum
FROM mytable x GROUP BY (x.fk_id)) AS y
WHERE y.count = 3 AND y.sum = 6
To make it generic, compare y.count with N and y.sum with N*(N+1)/2, where N is the largest value in the sequence you are looking for (1, 2, ..., N).
You can try this query. COUNT with DISTINCT is used to ignore duplicate records.
SELECT
[FK_ID]
FROM
#mytable T
GROUP BY
[FK_ID]
HAVING
COUNT(DISTINCT CASE WHEN [TYPE_ID] IN (1,2,3) THEN [TYPE_ID] END) = 3
AND COUNT(CASE WHEN [TYPE_ID] NOT IN (1,2,3) THEN [TYPE_ID] END) = 0
Try this:
select FK_ID,count(distinct TYPE_ID) from mytable
group by FK_ID
having count(distinct TYPE_ID)=3 and min(TYPE_ID)=1 and max(TYPE_ID)=3
You can use a CTE and pass in the required count dynamically, as you mentioned in the question.
WITH CTE
AS (
SELECT FK_ID,
COUNT(*) CNT
FROM #mytable
GROUP BY FK_ID
HAVING COUNT(*) = 3), -- pass the required number of records here
CTE1
AS (
SELECT T.[ID],
T.[FK_ID],
T.[TYPE_ID],
ROW_NUMBER() OVER(PARTITION BY T.[FK_ID] ORDER BY
(
SELECT NULL
)) RN
FROM #mytable T
INNER JOIN CTE C ON C.FK_ID = T.FK_ID),
CTE2
AS (
SELECT C1.FK_ID
FROM CTE1 C1
GROUP BY C1.FK_ID
HAVING SUM(C1.TYPE_ID) = SUM(C1.RN))
SELECT TT1.*
FROM CTE2 C2
INNER JOIN #mytable TT1 ON TT1.FK_ID = C2.FK_ID;
The query above produces this result (I passed 3):
ID FK_ID TYPE_ID
4 12 1
5 12 2
6 12 3

SQL count multiple cells as combination

I have the following SQL table, which shows case numbers and their values. Each case number always appears as a group of two rows. I want to count how many times each combination of values appears across the case numbers. Beware that the order can differ: see cases A and C, which should both be counted as the same combination.
case value
A 1992
A 1956
B 2000
B 2001
C 1956
C 1992
The goal is to get the total count of each combination, so the output format doesn't matter. One possible result:
Seq value frequency
1 1992 2
1 1956 2
2 2000 1
2 2001 1
What about if there are 3 cases as a combination?
This works with any number of values per case. It only increments the frequency count when cases have the same number of values and every value has a match, regardless of order.
CREATE TABLE #Table1
([Case] varchar(1), [Value] int)
;
INSERT INTO #Table1
([Case], [Value])
VALUES
('A', 1992), ('A', 1956), ('A', 1997), ('B', 2000), ('B', 2001), ('C', 1956),
('C', 1992), ('C', 1997), /*('C',1993),*/ ('D', 2005), ('D', 2008), ('E', 1956),
('E', 1992) , ('F', 1956), ('F', 1992), ('G', 1956), ('G', 1992) ;
--Query
select min(a.[Case]) [Case], [Values], count(*) frequency
from (
SELECT t.[Case],
stuff(
(
select ',' + cast (t1.[Value] as varchar(20))
from #table1 t1
where t1.[Case] = t.[Case]
order by t1.[Value]
for xml path('')
),1,1,'') [Values]
FROM #table1 t
GROUP BY t.[Case]
)a
group by [Values]
order by [Case]
Result, with values sorted in ascending order:
Case Values frequency
A 1956,1992,1997 2
B 2000,2001 1
D 2005,2008 1
E 1956,1992 3
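The idea behind the query, building an order-independent signature per case and then counting identical signatures, can be sketched outside the database as well. The following is plain Python rather than T-SQL, purely as an illustration; the names values_by_case and frequency are my own.

```python
from collections import Counter, defaultdict

# Same sample rows as the INSERT above (without the commented-out value).
rows = [
    ("A", 1992), ("A", 1956), ("A", 1997), ("B", 2000), ("B", 2001), ("C", 1956),
    ("C", 1992), ("C", 1997), ("D", 2005), ("D", 2008), ("E", 1956),
    ("E", 1992), ("F", 1956), ("F", 1992), ("G", 1956), ("G", 1992),
]

# Collect the values of each case.
values_by_case = defaultdict(list)
for case, value in rows:
    values_by_case[case].append(value)

# Sorting makes the signature independent of row order, like ORDER BY t1.[Value]
# inside the FOR XML PATH subquery.
frequency = Counter(tuple(sorted(values)) for values in values_by_case.values())

for signature, count in sorted(frequency.items()):
    print(",".join(str(v) for v in signature), count)
```

Two cases only share a signature when they have the same number of values and every value matches, which is the same rule the SQL version applies.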
Data sample, SQL Server 2014
CREATE TABLE Table1
([case] varchar(1), [value] int)
;
INSERT INTO Table1
([case], [value])
VALUES
('A', 1992),
('A', 1956),
('B', 2000),
('B', 2001),
('C', 1956),
('C', 1992)
;
Query
select dense_rank() over (ORDER BY min(a.[case])) seq, a.value, count(*) freq
from table1 a left join table1 b
on a.value=b.value and a.[case]<>b.[case]
group by a.value
order by a.value
http://sqlfiddle.com/#!6/40a87/3
This does not exactly match the expected results you posted, but it answers what you asked for in your previous comment.
select min, max, count (*) freq
from (
select a.[case] [case], min(a.value) min, max(a.value) max
from table1 a
group by a.[case]) b
group by min, max
order by min, max
http://sqlfiddle.com/#!6/40a87/18

Delete rows in table that are sum of other rows per group

Group rows by T. In each group, find the row whose value is the sum of the other rows in that group (this is the largest value, or the smallest if the values are negative) and delete that row, one per group. If a group does not have enough rows to form a sum, or has enough rows but none of them equals the sum of the others, nothing happens.
CREATE TABLE Test (
T varchar(10),
V int
);
INSERT INTO Test
VALUES ('A', 4),
('B', -5),
('C', 5),
('A', 2),
('B', -1),
('C', 10),
('A', 2),
('B', -4),
('C', 5),
('D', 0);
expected result:
A 2
A 2
B -1
B -4
C 5
C 5
D 0
As the comments say, the requirements seem strange. The code below assumes that the sum row is already present in each group and merely removes the largest/smallest value, as long as the highest absolute value is not 0.
if object_id('tempdb..#test') is not null
drop table #test
CREATE TABLE #Test (
T varchar(10),
V int
);
INSERT INTO #Test
VALUES ('A', 4), ('B', -5), ('C', 5), ('A', 2), ('B', -1), ('C', 10), ('A', 2), ('B', -4), ('C', 5), ('D', 0);
if object_id('tempdb..#test2') is not null
drop table #test2
SELECT
T,
V,
ABS(V) as absV
INTO #TEST2
FROM #TEST
SELECT * FROM #TEST2
if object_id('tempdb..#max') is not null
drop table #max
SELECT
T,
MAX(absV) AS MaxAbsV
INTO #Max
FROM #TEST2
GROUP BY T
HAVING MAX(AbsV) != 0
DELETE #TEST2
FROM #TEST2
INNER JOIN #MAX ON #TEST2.T = #MAX.T AND #TEST2.absV = #Max.MaxAbsV
SELECT * FROM #TEST2
ORDER BY T ASC
; with cte as
(
select T, V,
R = row_number() over (partition by T order by ABS(V) desc),
C = count(*) over (partition by T)
from Test
)
delete c
from cte c
inner join
(
select T, S = sum(V)
from cte
where R <> 1
group by T
) s on c.T = s.T
where c.C >= 3
and c.R = 1
and c.V = s.S
Using ABS and NOT Exists
CREATE TABLE #Test (
T varchar(10),
V int
);
INSERT INTO #Test
VALUES ('A', 4), ('B', -5), ('C', 5), ('A', 2), ('B', -1), ('C', 10), ('A', 2), ('B', -4), ('C', 5), ('D', 0);
;WITH CTE as (
select T,max(ABS(v ))v from #Test
WHERE V <> 0
GROUP BY T )
SELECT T,V FROM #Test T where NOT exists (Select 1 FROM cte WHERE T = T.T AND v = ABS(T.V) )
ORDER BY T.T
Determine first if the rows are positive or negative by checking if SUM(V) is positive. And then determine if the smallest or largest value is equal to the SUM of the other rows, by subtracting from SUM(V) the MIN(V) if negative or MAX(V) if positive:
DELETE t
FROM Test t
INNER JOIN (
SELECT
T,
SUM(V) - CASE WHEN SUM(V) >= 0 THEN MAX(V) ELSE MIN(V) END AS ToDelete
FROM Test
GROUP BY T
HAVING COUNT(*) >= 3
) a
ON a.T = t.T
AND a.ToDelete = t.V
You can use the query below to get the required output:
select * into #t1 from test
select * from
(
select TT.T as T,TT.V as V
from test TT
JOIN
(select T,max(abs(V)) as V from #t1
group by T) P
on TT.T=P.T
where abs(TT.V) <> P.V
UNION ALL
select A.T as T,A.V as V from test A
JOIN(
select T,count(T) as Tcount from test
group by T
having count(T)=1) B on A.T=B.T
) X order by T
drop table #t1
You are looking for a value per group that is the sum of all the group's other values. E.g. 4 of (2,2,4) or -5 of (-5,-4,-1).
This is usually only one record per group. But it can be multiple times the same number. Here are examples for ties: (0,0) or (-2,2,4,4), or (-2,-2,4,4,4) or (-10,3,3,3,3,4).
As you see, you are looking in any way for values that equal half of the group's total sum. (Of course. We are looking for n+n, where one n is in one record and the other n is the sum of all the other records.)
The only special case is when there is only one value in the group which is zero. That we don't want to delete of course.
Here is a delete statement that cannot deal with ties; it would delete all of the tied values instead of just one:
delete from test
where 2 * v =
(
select case when count(*) = 1 then null else sum(v) end
from test fullgroup
where fullgroup.t = test.t
);
In order to deal with ties you would need artificial row numbers, so as to delete only one record of all candidates.
with candidates as
(
select t, v, row_number() over (partition by t order by t) as rn
from
(
select
t, v,
sum(v) over (partition by t) as sumv,
count(*) over (partition by t) as cnt
from test
) comparables
where sumv = 2 * v and cnt > 1
)
delete
from candidates
where rn = 1;
SQL fiddle: http://sqlfiddle.com/#!6/6d97e/1
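The "value equals half of the group's sum" rule can be tried on the sample data with the sketch below. It is an adaptation, not the statement above: it runs on SQLite through Python's sqlite3, and since deleting through a CTE is a SQL Server feature, it picks one candidate per group with MIN(rowid) instead of row_number(). The aliases x and g are my own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test (t TEXT, v INTEGER);
INSERT INTO test VALUES
  ('A', 4), ('B', -5), ('C', 5), ('A', 2), ('B', -1),
  ('C', 10), ('A', 2), ('B', -4), ('C', 5), ('D', 0);
""")

# A row is a candidate when twice its value equals the group total,
# i.e. the row equals the sum of all other rows in its group.
# MIN(rowid) keeps the delete to one row per group if several rows tie.
conn.execute("""
DELETE FROM test
WHERE rowid IN (
    SELECT MIN(x.rowid)
    FROM test AS x
    WHERE 2 * x.v = (SELECT SUM(g.v) FROM test AS g WHERE g.t = x.t)
      AND (SELECT COUNT(*) FROM test AS g WHERE g.t = x.t) > 1
    GROUP BY x.t
)
""")

remaining = conn.execute("SELECT t, v FROM test ORDER BY t, v").fetchall()
for row in remaining:
    print(row)
```

The COUNT(*) > 1 guard is what keeps the single zero row of group D from being deleted.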
See if the query below helps:
DELETE [Audit].[dbo].[Test] FROM [Audit].[dbo].[Test] as AA
INNER JOIN (select T,
CASE
WHEN MAX(V) < 0 THEN MIN(V)
WHEN MIN(V) > 0 THEN MAX(V) ELSE MAX(V)
END as MAX_V,
CASE
WHEN SUM(V) > 0 THEN SUM(V) - MAX(V)
WHEN SUM(V) < 0 THEN SUM(V) - MIN(V) ELSE SUM(V)
END as SUM_V_REST
from [Audit].[dbo].[Test]
Group by T
Having Count(V) > 1) as BB ON AA.T = BB.T and AA.V = BB.MAX_V and BB.MAX_V = BB.SUM_V_REST