Case when with aggregation in BigQuery - sql

I have data of how much users spend in several games in BigQuery:
CREATE TABLE if not EXISTS user_values (
user_id int,
value float,
game char
);
INSERT INTO user_values VALUES
(1, 10, 'A'),
(1, 10, 'A'),
(1, 2, 'A'),
(1, 4, 'B'),
(1, 5, 'B'),
(2, 0, 'A'),
(2, 10, 'B'),
(2, 6, 'B');
I want to check, for every user, if they've spent more than 20 in game A and more than 15 in game B. In this case, the output table should be:
user_id,game,spent_more_than_cutoff
1,A,TRUE
1,B,FALSE
2,A,FALSE
2,B,TRUE
I want to do this for an arbitrary number of users and 5-10 games. I've tried this:
select
game,
user_id,
case
when sum(value) > 20 and game = 'A' then TRUE
when sum(value) > 15 and game = 'B' then TRUE
else FALSE
end as spent_more_than_cutoff,
from user_values
group by 1, 2
but I get thrown the following error:
Column 3 contains an aggregation function, which is not allowed in GROUP BY at [19:20]
What's the simplest way of doing this in BigQuery without needing to do different queries for different games?
Is there an all function that can help to do something like this?
select
game,
user_id,
case
when sum(value) > 20 and all(game) = 'A' then TRUE
when sum(value) > 15 and all(game) = 'B' then TRUE
else FALSE
end as spent_more_than_cutoff,
from user_values
group by 1, 2

I want to do this for an arbitrary number of users and 5-10 games
Consider the approach below:
with cutoffs as (
select 'A' game, 20 cutoff union all
select 'B', 15
)
select user_id, game,
sum(value) > any_value(cutoff) spent_more_than_cutoff
from user_values
left join cutoffs using(game)
group by user_id, game
Applied to the sample data for user_values in your question, this returns the expected output shown there.
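As a rough check of the cutoff-table idea above, here is a minimal sketch that runs the same join-and-aggregate logic in SQLite through Python's sqlite3 module. SQLite is an assumption made only so the example can run locally; it is not BigQuery, so any_value is swapped for MAX, and the cutoffs table and the function name are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_values (user_id INTEGER, value REAL, game TEXT);
INSERT INTO user_values VALUES
  (1, 10, 'A'), (1, 10, 'A'), (1, 2, 'A'), (1, 4, 'B'), (1, 5, 'B'),
  (2, 0, 'A'), (2, 10, 'B'), (2, 6, 'B');
-- One row per game; supporting another game means adding a row here.
CREATE TABLE cutoffs (game TEXT, cutoff REAL);
INSERT INTO cutoffs VALUES ('A', 20), ('B', 15);
""")

def spent_more_than_cutoff(conn):
    """Return (user_id, game, flag) rows; flag is None if a game has no cutoff."""
    query = """
        SELECT u.user_id, u.game,
               SUM(u.value) > MAX(c.cutoff) AS spent_more_than_cutoff
        FROM user_values AS u
        LEFT JOIN cutoffs AS c ON c.game = u.game
        GROUP BY u.user_id, u.game
        ORDER BY u.user_id, u.game
    """
    # SQLite returns 1/0 for comparisons, so convert to Python booleans.
    return [(uid, game, None if flag is None else bool(flag))
            for uid, game, flag in conn.execute(query)]

rows = spent_more_than_cutoff(conn)
for row in rows:
    print(row)
```

Keeping the cutoffs in their own table (or CTE) means the query itself does not change when the number of games grows.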

The condition on the game needs to go inside the argument to sum(). Because each group contains only one game, the two comparisons are combined with or:
select game, user_id,
(sum(case when game = 'A' then value else 0 end) > 20 or
sum(case when game = 'B' then value else 0 end) > 15
) as spent_more_than_cutoff
from user_values
group by 1, 2;
Note that the expression already returns a boolean, so no outer case is needed.

Try this one:
select game,
user_id,
sum(if(game = 'A', value, 0)) > 20 or sum(if(game = 'B', value, 0)) > 15 as spent_more_than_cutoff
from user_values
group by 1, 2;

Related

Group elements of a column into multiple subgroups SQL

I am looking at different breeds of cattle and their AnimalTypeCode, BreedCategoryID and resultant Growth.
I have the following query
SELECT DATEPART(yyyy,[KillDate])
,[AnimalTypeCode]
,AVG([Growth])
,[BreedCategoryID]
FROM [dbo].[tblAnimal]
WHERE (AnimalTypeCode='C'
or AnimalTypeCode= 'E')
GROUP BY DATEPART(yyyy,[KillDate])
,[AnimalTypeCode]
,[BreedCategoryID]
GO
This query is good and gives me almost what I want, but BreedCategoryID is numbered 1 through 7 and I would like to group them:
(1 = Pure Dairy),
(2 and 3 = Dairy)
(4, 5, 6 and 7 = Beef)
So instead of getting the mean Growth for each BreedCategoryID, I would like to get the average for Pure Dairy, Dairy, and Beef.
Any help greatly appreciated!
You can assign a new "variable" using cross apply in the from clause:
SELECT YEAR(a.KillDate), a.AnimalTypeCode, v.grp,
AVG(a.Growth)
FROM [dbo].[tblAnimal] a CROSS APPLY
(VALUES (CASE WHEN a.BreedCategoryID IN (1) THEN 'Pure Dairy'
WHEN a.BreedCategoryID IN (2, 3) THEN 'Dairy'
WHEN a.BreedCategoryID IN (4, 5, 6, 7) THEN 'Beef'
END)
) as v(grp)
WHERE a.AnimalTypeCode IN ('C', 'E')
GROUP BY YEAR(a.KillDate), a.AnimalTypeCode, v.grp;
Note that I also introduced table aliases and qualified all the column references.
Do the calculations in a derived table (the subquery). GROUP BY its result:
select killyear, [AnimalTypeCode], AVG([Growth]), BreedCat
from
(
SELECT DATEPART(yyyy,[KillDate]) killyear
,[AnimalTypeCode]
,[Growth]
,case when [BreedCategoryID] = 1 then 'Pure Dairy'
when [BreedCategoryID] in (2, 3) then 'Dairy'
when [BreedCategoryID] in (4, 5, 6, 7) then 'Beef'
end BreedCat
FROM [dbo].[tblAnimal]
WHERE (AnimalTypeCode='C'
or AnimalTypeCode= 'E')
) dt
GROUP BY killyear
,[AnimalTypeCode]
,BreedCat
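To see the derived-table grouping in action, here is a small sketch run against SQLite from Python. The rows are made up purely for illustration, and SQLite's strftime stands in for SQL Server's DATEPART; the CASE mapping is the one from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblAnimal (KillDate TEXT, AnimalTypeCode TEXT,
                        Growth REAL, BreedCategoryID INTEGER);
-- Hypothetical sample rows, not taken from the question.
INSERT INTO tblAnimal VALUES
  ('2020-03-01', 'C', 1.0, 1),
  ('2020-04-01', 'C', 2.0, 2),
  ('2020-05-01', 'C', 4.0, 3),
  ('2020-06-01', 'C', 5.0, 4),
  ('2020-07-01', 'C', 7.0, 7),
  ('2020-08-01', 'X', 9.0, 5);
""")

query = """
SELECT killyear, AnimalTypeCode, BreedCat, AVG(Growth) AS avg_growth
FROM (
    -- Map each BreedCategoryID to its group before aggregating.
    SELECT strftime('%Y', KillDate) AS killyear, AnimalTypeCode, Growth,
           CASE WHEN BreedCategoryID = 1 THEN 'Pure Dairy'
                WHEN BreedCategoryID IN (2, 3) THEN 'Dairy'
                WHEN BreedCategoryID IN (4, 5, 6, 7) THEN 'Beef'
           END AS BreedCat
    FROM tblAnimal
    WHERE AnimalTypeCode IN ('C', 'E')
) AS dt
GROUP BY killyear, AnimalTypeCode, BreedCat
ORDER BY BreedCat
"""
breed_rows = conn.execute(query).fetchall()
for row in breed_rows:
    print(row)
```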

Conditional group by with window function in Snowflake query

I have a table in Snowflake in following format:
create table temp_test(name string, split string, value int);
insert into temp_test
values ('A','a', 100), ('A','b', 200), ('A','c',300), ('A', 'd', 400), ('A', 'e',500), ('B', 'a', 1000), ('B','b', 2000), ('B','c', 3000), ('B', 'd',4000), ('B','e', 5000)
As a first step, I needed only the top 2 values per name (sorted by value), so I used the following query:
select name, split, value,
row_number() over (PARTITION BY (name) order by value desc) as row_num
from temp_test
qualify row_num <= 2
Which gives me the following result set:
NAME SPLIT VALUE ROW_NUM
A e 500 1
A d 400 2
B e 5000 1
B d 4000 2
Now, I need to sum the values outside the top 2 and put the total in a separate split named "Others", like this:
NAME SPLIT VALUE
A e 500
A d 400
A Others 600
B e 5000
B d 4000
B Others 6000
How can I do that in Snowflake, or in SQL in general?
with data as (
select name, split, value,
row_number() over (partition by (name) order by value desc) as row_num
from temp_test
)
select
name,
case when row_num <= 2 then split else 'Others' end as split,
sum(value) as value
from data
group by name, case when row_num <= 2 then split else 'Others' end
Shawnt00's answer is good, but for the record, this can be written more simply in Snowflake.
First, the GROUP BY at the end can refer to the output columns by position or by name:
GROUP BY 1,2
or
GROUP BY name, split
Also, since the CASE has only two branches, an IFF can be used instead, and since you are already using a CTE to add the row_number, you can push the IFF into the CTE as well:
WITH data AS (
SELECT name, value,
ROW_NUMBER() OVER (PARTITION BY name ORDER BY value DESC) AS row_num,
IFF(row_num < 3, split, 'Others') as n_split
FROM VALUES ('A','a', 100), ('A','b', 200), ('A','c',300), ('A', 'd', 400),
('A', 'e',500), ('B', 'a', 1000), ('B','b', 2000), ('B','c', 3000),
('B', 'd',4000), ('B','e', 5000)
v(name, split, value)
)
SELECT
name,
n_split,
SUM(value) AS value
FROM data
GROUP BY name, n_split;
And if you are keen on making the SQL as short as possible, push the ROW_NUMBER into the IFF:
WITH data AS (
SELECT name, value,
IFF(ROW_NUMBER() OVER (PARTITION BY name ORDER BY value DESC) < 3, split, 'Others') as n_split
FROM VALUES ('A','a', 100), ('A','b', 200), ('A','c',300), ('A', 'd', 400),
('A', 'e',500), ('B', 'a', 1000), ('B','b', 2000), ('B','c', 3000),
('B', 'd',4000), ('B','e', 5000)
v(name, split, value)
)
SELECT
name,
n_split AS split,
SUM(value) AS value
FROM data
GROUP BY name, n_split;
gives:
NAME SPLIT VALUE
A e 500
A d 400
A Others 600
B e 5000
B d 4000
B Others 6000
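For anyone without a Snowflake account, the same top-2-plus-"Others" logic can be tried out in SQLite from Python. This is only a sketch under that assumption: it needs SQLite 3.25 or later for window functions, uses CASE instead of Snowflake's IFF, and the split_label alias is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE temp_test (name TEXT, split TEXT, value INTEGER);
INSERT INTO temp_test VALUES
  ('A','a',100), ('A','b',200), ('A','c',300), ('A','d',400), ('A','e',500),
  ('B','a',1000), ('B','b',2000), ('B','c',3000), ('B','d',4000), ('B','e',5000);
""")

query = """
WITH ranked AS (
    SELECT name, split, value,
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY value DESC) AS row_num
    FROM temp_test
)
SELECT name,
       CASE WHEN row_num <= 2 THEN split ELSE 'Others' END AS split_label,
       SUM(value) AS value
FROM ranked
-- Group on the same expression that is selected, so every row outside
-- the top 2 collapses into the single 'Others' bucket.
GROUP BY name, CASE WHEN row_num <= 2 THEN split ELSE 'Others' END
ORDER BY name, MIN(row_num)
"""
top2_rows = conn.execute(query).fetchall()
for row in top2_rows:
    print(row)
```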

Repeat previous value of the column in SQL

I want to know if it is possible to repeat the value of the current row until a different value is found, and then repeat that one.
When DATE = 201903, the result is always the value of CODE, and I want to repeat that value until a different value appears and then repeat that one.
CREATE TABLE ForgeRock
(
[id] VARCHAR(13),
[code] VARCHAR(57),
[date] VARCHAR(13),
[code2] VARCHAR(7)
);
INSERT INTO ForgeRock ([id], [code], [date], [code2])
VALUES (1, 21, 201903, 0),
(1, 21, 201902, 0),
(1, 21, 201901, 0),
(1, 21, 201812, 0),
(1, 21, 201811, 0),
(1, 21, 201810, 22),
(1, 21, 201809, 0),
(1, 21, 201808, 0),
(1, 21, 201807, 0);
SELECT
*,
result = (CASE WHEN date = 201903 THEN code
WHEN date <> 201903 AND code2 = 0 THEN code
ELSE code2
END)
FROM
ForgeRock
But the number 22 appears only once, and I want that number to be used for every row from the moment 22 first appears.
Yes, you can do this. The ideal way would be to use LAG() with IGNORE NULLS, but SQL Server did not support that before SQL Server 2022. So here is another method:
select fr.*,
(case when grp = 0 then code else max(code2) over (partition by grp) end) as result
from (select fr.*,
sum(case when code2 <> 0 then 1 else 0 end) over (order by date desc) as grp
from ForgeRock fr
) fr
order by date desc;
This assigns a "group" to each row by counting the non-zero code2 values seen so far, in descending date order. Every non-zero code2 starts a new group. Then, we can use max() over this group to spread the value over all the rows in the group.
Finally, the outer query chooses between code and code2.
Here is a db<>fiddle.
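The running-count grouping can be checked locally with a sketch like the following, which runs the same query in SQLite (3.25 or later, for window functions) from Python. Two things are assumptions made for the example: SQLite is not SQL Server, and the columns are declared as integers rather than the VARCHARs in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ForgeRock (id INTEGER, code INTEGER, date INTEGER, code2 INTEGER);
INSERT INTO ForgeRock VALUES
  (1, 21, 201903, 0), (1, 21, 201902, 0), (1, 21, 201901, 0),
  (1, 21, 201812, 0), (1, 21, 201811, 0), (1, 21, 201810, 22),
  (1, 21, 201809, 0), (1, 21, 201808, 0), (1, 21, 201807, 0);
""")

query = """
SELECT date,
       CASE WHEN grp = 0 THEN code
            ELSE MAX(code2) OVER (PARTITION BY grp)
       END AS result
FROM (
    -- grp counts the non-zero code2 values seen so far, newest date first.
    SELECT fr.*,
           SUM(CASE WHEN code2 <> 0 THEN 1 ELSE 0 END)
               OVER (ORDER BY date DESC) AS grp
    FROM ForgeRock AS fr
) AS fr
ORDER BY date DESC
"""
filled = conn.execute(query).fetchall()
for row in filled:
    print(row)
```

Rows before the first non-zero code2 stay in group 0 and keep code; every later row picks up the code2 value that opened its group.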

Logic to check if exact ids (3+ records) are present in a group in SQL Server

I have some sample data like:
INSERT INTO mytable
([FK_ID], [TYPE_ID])
VALUES
(10, 1),
(11, 1), (11, 2),
(12, 1), (12, 2), (12, 3),
(14, 1), (14, 2), (14, 3), (14, 4),
(15, 1), (15, 2), (15, 4)
Now, I am trying to check whether, within each FK_ID group, the TYPE_ID values exactly match 1, 2 & 3.
So, the expected output is like:
(10, 1) this should fail
As in group FK_ID = 10 we only have one record
(11, 1), (11, 2) this should also fail
As in group FK_ID = 11 we have two records.
(12, 1), (12, 2), (12, 3) this should pass
As in group FK_ID = 12 we have three records.
And all the TYPE_ID are exactly matching 1, 2 & 3 values.
(14, 1), (14, 2), (14, 3), (14, 4) this should also fail
As we have 4 records here.
(15, 1), (15, 2), (15, 4) this should also fail
Even though we have three records, it should fail as the TYPE_ID here (1, 2, 4) are not matching with required match (1, 2, 3).
Here is my attempt:
select * from mytable t1
where exists (select COUNT(t2.TYPE_ID)
from mytable t2 where t2.FK_ID = t1.FK_ID
and t2.TYPE_ID IN (1, 2, 3)
group by t2.FK_ID having COUNT(t2.TYPE_ID) = 3);
This is not working as expected, because it also passes for FK_ID = 14, which has four records.
Demo: SQL Fiddle
Also, how can we make it generic, so that if we need to check for 4 or more TYPE_ID values like (1,2,3,4) or (1,2,3,4,5), we can do that easily by updating a few values?
The following query will do what you want:
select fk_id
from t
group by fk_id
having sum(case when type_id in (1, 2, 3) then 1 else 0 end) = 3 and
sum(case when type_id not in (1, 2, 3) then 1 else 0 end) = 0;
This assumes that you have no duplicate pairs. If duplicates are possible, then depending on how you want to handle them, it might be as easy as using from (select distinct * from t) t.
As for "genericness", you need to update the in lists and the 3.
If you want something more generic:
with vals as (
select id
from (values (1), (2), (3)) v(id)
)
select fk_id
from t
group by fk_id
having sum(case when type_id in (select id from vals) then 1 else 0 end) = (select count(*) from vals) and
sum(case when type_id not in (select id from vals) then 1 else 0 end) = 0;
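As a way to try the generic version with different required sets, here is a minimal sketch in Python using SQLite as a stand-in for SQL Server. The helper name groups_matching_exactly is hypothetical, and the required values are passed as bound parameters instead of a vals CTE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (FK_ID INTEGER, TYPE_ID INTEGER);
INSERT INTO mytable VALUES
  (10, 1),
  (11, 1), (11, 2),
  (12, 1), (12, 2), (12, 3),
  (14, 1), (14, 2), (14, 3), (14, 4),
  (15, 1), (15, 2), (15, 4);
""")

def groups_matching_exactly(conn, required):
    """Return FK_IDs whose TYPE_IDs are exactly the required values.

    Like the query above, this assumes no duplicate (FK_ID, TYPE_ID) pairs.
    """
    placeholders = ", ".join("?" for _ in required)
    query = f"""
        SELECT FK_ID
        FROM mytable
        GROUP BY FK_ID
        HAVING SUM(CASE WHEN TYPE_ID IN ({placeholders}) THEN 1 ELSE 0 END) = ?
           AND SUM(CASE WHEN TYPE_ID NOT IN ({placeholders}) THEN 1 ELSE 0 END) = 0
        ORDER BY FK_ID
    """
    # Parameters follow the order of the ? markers in the query text.
    params = [*required, len(required), *required]
    return [row[0] for row in conn.execute(query, params)]

print(groups_matching_exactly(conn, [1, 2, 3]))
print(groups_matching_exactly(conn, [1, 2, 3, 4]))
```

Only the list passed to the helper changes when the required set changes; the count is derived from it.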
You can use this code:
SELECT y.fk_id FROM
(SELECT x.fk_id, COUNT(x.type_id) AS count, SUM(x.type_id) AS sum
FROM mytable x GROUP BY (x.fk_id)) AS y
WHERE y.count = 3 AND y.sum = 6
To make it generic, compare y.count with N and y.sum with N*(N+1)/2, where N is the largest value you are looking for (1, 2, ..., N). This assumes the TYPE_ID values within each group are distinct positive integers.
You can try this query. COUNT with DISTINCT is used to eliminate duplicate records.
SELECT
[FK_ID]
FROM
#mytable T
GROUP BY
[FK_ID]
HAVING
COUNT(DISTINCT CASE WHEN [TYPE_ID] IN (1,2,3) THEN [TYPE_ID] END) = 3
AND COUNT(CASE WHEN [TYPE_ID] NOT IN (1,2,3) THEN [TYPE_ID] END) = 0
Try this:
select FK_ID,count(distinct TYPE_ID) from mytable
group by FK_ID
having count(distinct TYPE_ID)=3
and min(TYPE_ID)=1 and max(TYPE_ID)=3
You can use CTEs and pass in the value dynamically, as you mentioned in the question.
WITH CTE
AS (
SELECT FK_ID,
COUNT(*) CNT
FROM #mytable
GROUP BY FK_ID
HAVING COUNT(*) = 3), -- pass the required number of TYPE_ID values here
CTE1
AS (
SELECT T.[ID],
T.[FK_ID],
T.[TYPE_ID],
ROW_NUMBER() OVER(PARTITION BY T.[FK_ID] ORDER BY
(
SELECT NULL
)) RN
FROM #mytable T
INNER JOIN CTE C ON C.FK_ID = T.FK_ID),
CTE2
AS (
SELECT C1.FK_ID
FROM CTE1 C1
GROUP BY C1.FK_ID
HAVING SUM(C1.TYPE_ID) = SUM(C1.RN))
SELECT TT1.*
FROM CTE2 C2
INNER JOIN #mytable TT1 ON TT1.FK_ID = C2.FK_ID;
The SQL above produces the following result (I passed 3):
ID FK_ID TYPE_ID
4 12 1
5 12 2
6 12 3

SQL Server- Return Items Only When All Sub-Items Are Available

I have an Item table (denormalized for this example) containing a list of items, parts and whether the part is available. I want to return all the items for which all the parts are available. Each item can have a varying number of parts. For example:
Item Part Available
A 1 Y
A 2 N
A 3 N
B 1 Y
B 4 Y
C 2 N
C 5 Y
D 4 Y
D 6 Y
D 7 Y
The query should return the following:
Item Part
B 1
B 4
D 4
D 6
D 7
Thanks in advance for any assistance.
Here is one trick using the Min() Over() window aggregate function. Since 'N' sorts before 'Y', the minimum is 'Y' only when every part is available:
SELECT Item,
Part
FROM (SELECT Min([Available])OVER(partition BY [Item]) m_av,*
FROM yourtable) a
WHERE m_av = 'Y'
Or using GROUP BY and HAVING with an IN clause:
SELECT Item,
Part
FROM yourtable
WHERE Item IN (SELECT Item
FROM yourtable
GROUP BY Item
HAVING Count(*) = Sum(Iif(Available = 'Y', 1, 0)))
using Exists
SELECT Item,
Part
FROM yourtable A
WHERE EXISTS (SELECT 1
FROM yourtable B
WHERE A.Item = B.Item
HAVING Count(*) = Sum(Iif(Available = 'Y', 1, 0)))
using NOT EXISTS
SELECT Item,
Part
FROM yourtable A
WHERE NOT EXISTS (SELECT *
FROM yourtable B
WHERE A.Item = B.Item
AND B.Available = 'N')
I'd start with rephrasing the requirement - you want to return the items that don't have any parts that are not available. Once you put it like that, it's easy to translate the requirement to SQL using the not exists operator:
SELECT item, part
FROM parts a
WHERE NOT EXISTS (SELECT *
FROM parts b
WHERE a.item = b.item AND b.available = 'N')
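The NOT EXISTS query can be run against the question's sample data with a short sketch like this one. It uses SQLite from Python as an assumption for local testing; the parts table name follows the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parts (item TEXT, part INTEGER, available TEXT);
INSERT INTO parts VALUES
  ('A', 1, 'Y'), ('A', 2, 'N'), ('A', 3, 'N'),
  ('B', 1, 'Y'), ('B', 4, 'Y'),
  ('C', 2, 'N'), ('C', 5, 'Y'),
  ('D', 4, 'Y'), ('D', 6, 'Y'), ('D', 7, 'Y');
""")

query = """
SELECT a.item, a.part
FROM parts AS a
-- Keep a row only if its item has no unavailable part at all.
WHERE NOT EXISTS (SELECT 1
                  FROM parts AS b
                  WHERE b.item = a.item AND b.available = 'N')
ORDER BY a.item, a.part
"""
available_parts = conn.execute(query).fetchall()
for row in available_parts:
    print(row)
```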
Using a window function requires only a single read of the table.
MIN and MAX window function
select *
from (
select
t.*,
max(available) over (partition by item) a,
min(available) over (partition by item) b
from your_table t
) t where a = b and a = 'Y';
COUNT window function:
select *
from (
select
t.*,
count(*) over (partition by item) n1,
count(case when available = 'Y' then 1 end) over (partition by item) n2
from your_table t
) t where n1 = n2;
You can use NOT IN or NOT EXISTS to achieve this.
NOT EXISTS
Select item, part
from tbl as t1
where not exists( select 1 from tbl where item = t1.item and available = 'N')
NOT IN
Select item, part
from tbl
where item not in( select item from tbl where available = 'N')
I want to point out that the question in the text is: "I want to return all the items for which all the parts are available". However, your example results include the parts.
If the question is indeed that you want the items only, then you can use simple aggregation:
select item
from parts
group by item
having min(available) = max(available) and min(available) = 'Y';
If you indeed want the detail on the parts as well, then the other answers provide that information.
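The items-only aggregation can be sketched the same way. This example assumes SQLite run from Python rather than SQL Server, and relies on available holding only 'Y' or 'N', so that min and max are both 'Y' exactly when every part is available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parts (item TEXT, part INTEGER, available TEXT);
INSERT INTO parts VALUES
  ('A', 1, 'Y'), ('A', 2, 'N'), ('A', 3, 'N'),
  ('B', 1, 'Y'), ('B', 4, 'Y'),
  ('C', 2, 'N'), ('C', 5, 'Y'),
  ('D', 4, 'Y'), ('D', 6, 'Y'), ('D', 7, 'Y');
""")

query = """
SELECT item
FROM parts
GROUP BY item
-- All values in the group are equal, and that value is 'Y'.
HAVING MIN(available) = MAX(available) AND MIN(available) = 'Y'
ORDER BY item
"""
complete_items = [row[0] for row in conn.execute(query)]
print(complete_items)
```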
I do like it when problems lend themselves well to being solved by infrequently used language features:
with cte as (
select * from (values
('A', 1, 'Y'),
('A', 2, 'N'),
('A', 3, 'N'),
('B', 1, 'Y'),
('B', 4, 'Y'),
('C', 2, 'N'),
('C', 5, 'Y'),
('D', 4, 'Y'),
('D', 6, 'Y'),
('D', 7, 'Y')
) as x(Item, Part, Available)
)
select *
into #t
from cte as c;
select *
from #t as c
where 'Y' = all (
select Available
from #t as a
where c.Item = a.Item
)
Here, we use a correlated subquery and the all keyword to see if all of the parts are available. My understanding is that, like exists, this will stop if it finds a counter-example.