Making a conditional aggregate - sql

I have a tricky grouping problem that comes from our business rules. I have a table with values like this:
----------------------------
| NAME | TYPE | VALUE |
----------------------------
| N1 | T1 | V1 |
| N1 | T2 | V2 |
| N1 | NULL | V3 |
| N2 | T2 | V4 |
| N2 | NULL | V5 |
| N3 | NULL | V6 |
-----------------------------
I need to group it as follows.
The first-level grouping is by name.
At the second level:
When the available types are T1, T2 and NULL, group T1 and NULL together and keep T2 separate.
When the available types are T2 and NULL, group NULL with T2.
When NULL is the only available type, leave it as it is.
The expected output for the above table is:
----------------------------
| N1 | T1 | V1+V3 |
| N1 | T2 | V2 |
| N2 | T2 | V4+V5 |
| N3 | NULL | V6 |
-----------------------------
How can I achieve this in Snowflake SQL? A solution for any other server is fine too, as long as I can find an equivalent in Snowflake.

The following query should work:
SELECT t1.NAME, COALESCE(TYPE, MIN_TYPE), SUM(VALUE)
FROM mytable AS t1
JOIN (
SELECT NAME, MIN(TYPE) AS MIN_TYPE
FROM mytable
GROUP BY NAME
) AS t2 ON t1.NAME = t2.NAME
GROUP BY t1.NAME, COALESCE(TYPE, MIN_TYPE)
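The query uses a derived table to extract the MIN(TYPE) value per NAME. MIN ignores NULLs, so MIN_TYPE is T1 when the name has a T1 row, T2 when it only has T2 rows, and NULL only when every type is NULL. COALESCE then converts each NULL type to that value.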
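To see the derived-table idea run end to end, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for Snowflake. The numeric values replacing V1..V6 are made up for illustration, and the alias grp_type is not part of the original query.

```python
import sqlite3

# Sample rows from the question; the numbers are placeholders for V1..V6.
rows_in = [
    ("N1", "T1", 1), ("N1", "T2", 2), ("N1", None, 4),
    ("N2", "T2", 8), ("N2", None, 16),
    ("N3", None, 32),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (name TEXT, type TEXT, value INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows_in)

# NULL types are replaced by the smallest non-NULL type of the same name.
grouped = conn.execute("""
    SELECT t1.name,
           COALESCE(t1.type, t2.min_type) AS grp_type,
           SUM(t1.value) AS total
    FROM mytable AS t1
    JOIN (SELECT name, MIN(type) AS min_type
          FROM mytable
          GROUP BY name) AS t2 ON t1.name = t2.name
    GROUP BY t1.name, COALESCE(t1.type, t2.min_type)
    ORDER BY t1.name, grp_type
""").fetchall()

for row in grouped:
    print(row)
```

With these placeholder values, N1/T1 sums V1+V3, N2/T2 sums V4+V5, and N3 keeps its NULL type.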
Edit:
You can create a pivoted version of the expected result set using the following query:
SELECT NAME,
CASE
WHEN T1SUM IS NULL THEN 0
ELSE COALESCE(T1SUM, 0) + COALESCE(NULLSUM,0)
END AS T1SUM,
CASE
WHEN T1SUM IS NULL AND T2SUM IS NOT NULL
THEN COALESCE(T2SUM, 0) + COALESCE(NULLSUM,0)
ELSE COALESCE(T2SUM, 0)
END AS T2SUM,
CASE
WHEN T1SUM IS NULL AND T2SUM IS NULL THEN COALESCE(NULLSUM,0)
ELSE 0
END AS NULLSUM
FROM (
SELECT NAME,
SUM(CASE WHEN TYPE = 'T1' THEN VALUE END) AS T1SUM,
SUM(CASE WHEN TYPE = 'T2' THEN VALUE END) AS T2SUM,
SUM(CASE WHEN TYPE IS NULL THEN VALUE END) AS NULLSUM
FROM mytable
GROUP BY NAME) AS t

Giorgos's answer returns the totals in pivoted form, i.e. one row per name rather than one row per case. In Snowflake this can be written more simply.
With this data:
WITH data_table(name, type, value) AS (
SELECT * FROM VALUES
(10, 1, 100 ),
(10, 2, 200 ),
(10, null, 400 ),
(11, 2, 100 ),
(11, null, 200 ),
(12, null, 100 )
)
and this SQL
SELECT name
,SUM(IFF(type=1, value, null)) as t1_val
,SUM(IFF(type=2, value, null)) as t2_val
,SUM(IFF(type is null, value, null)) as tnull_val
,IFF(t1_val is not null, t1_val + zeroifnull(tnull_val), null) as c1_sum
,IFF(t1_val is not null, t2_val, t2_val + zeroifnull(tnull_val)) as c2_sum
,IFF(t1_val is null AND t2_val is null, tnull_val, null) as c3_sum
FROM data_table
GROUP BY 1;
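we get:
-----------------------------------------------------------------
| NAME | T1_VAL | T2_VAL | TNULL_VAL | C1_SUM | C2_SUM | C3_SUM |
-----------------------------------------------------------------
| 10 | 100 | 200 | 400 | 500 | 200 | null |
| 11 | null | 100 | 200 | null | 300 | null |
| 12 | null | null | 100 | null | null | 100 |
-----------------------------------------------------------------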
This shows that for name 10 the NULL sum is added to the type 1 sum, for name 11 it is added to the type 2 sum, and for name 12 the NULL sum stands by itself.
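The binding rule can be checked outside Snowflake as well. The sketch below uses Python's sqlite3 module. SQLite does not let one select-list alias reference another, so the conditional sums move into a subquery, and IFF/ZEROIFNULL are replaced by CASE and COALESCE. These substitutions are assumptions made for portability, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_table (name INTEGER, type INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO data_table VALUES (?, ?, ?)",
    [(10, 1, 100), (10, 2, 200), (10, None, 400),
     (11, 2, 100), (11, None, 200),
     (12, None, 100)],
)

# Inner query: one conditional sum per type.
# Outer query: decide which sum the NULL total is added to.
pivoted = conn.execute("""
    SELECT name,
           CASE WHEN t1_val IS NOT NULL
                THEN t1_val + COALESCE(tnull_val, 0) END AS c1_sum,
           CASE WHEN t1_val IS NOT NULL
                THEN t2_val
                ELSE t2_val + COALESCE(tnull_val, 0) END AS c2_sum,
           CASE WHEN t1_val IS NULL AND t2_val IS NULL
                THEN tnull_val END AS c3_sum
    FROM (SELECT name,
                 SUM(CASE WHEN type = 1 THEN value END) AS t1_val,
                 SUM(CASE WHEN type = 2 THEN value END) AS t2_val,
                 SUM(CASE WHEN type IS NULL THEN value END) AS tnull_val
          FROM data_table
          GROUP BY name) AS p
    ORDER BY name
""").fetchall()

for row in pivoted:
    print(row)
```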
We can unpivot these values if we wish, by joining to a mini table with 3 rows like so:
SELECT d.name,
p.c2 as type,
case p.c1
WHEN 1 then d.c1_sum
WHEN 2 then d.c2_sum
ELSE d.c3_sum
end as value
FROM (
SELECT name
,SUM(IFF(type=1, value, null)) as t1_val
,SUM(IFF(type=2, value, null)) as t2_val
,SUM(IFF(type is null, value, null)) as tnull_val
,IFF(t1_val is not null, t1_val + zeroifnull(tnull_val), null) as c1_sum
,IFF(t1_val is not null, t2_val, t2_val + zeroifnull(tnull_val)) as c2_sum
,IFF(t1_val is null AND t2_val is null, tnull_val, null) as c3_sum
FROM data_table
GROUP BY 1
) AS d
JOIN (
SELECT column1 as c1, column2 as c2
FROM VALUES (1,'T1'),(2,'T2'),(null,'null')
) AS p
ON ((d.c1_sum is not null AND p.c1 = 1)
OR (d.c2_sum is not null AND p.c1 = 2)
OR (d.c3_sum is not null AND p.c1 is null))
ORDER BY 1,2;
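which gives the originally requested output:
--------------------------
| NAME | TYPE | VALUE |
--------------------------
| 10 | T1 | 500 |
| 10 | T2 | 200 |
| 11 | T2 | 300 |
| 12 | null | 100 |
--------------------------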

SQL How to update previous and next row value without using lead and lag?

I am trying to flag some rows based on a value in a column, but I also need to set the same flag on the previous and next rows, based on the current row's value.
Below is my table:
-- create a table
CREATE TABLE table1 (
id INTEGER PRIMARY KEY,
time INTEGER,
event varchar NOT NULL
);
-- insert some values
INSERT INTO table1 VALUES (1, '1', 'r');
INSERT INTO table1 VALUES (2, '2', 'r');
INSERT INTO table1 VALUES (3, '3', 's');
INSERT INTO table1 VALUES (4, '4', 'r');
INSERT INTO table1 VALUES (5, '5', 'r');
INSERT INTO table1 VALUES (6, '6', 'r');
INSERT INTO table1 VALUES (7, '7', 's');
INSERT INTO table1 VALUES (8, '8', 'r');
INSERT INTO table1 VALUES (9, '9', 'r');
INSERT INTO table1 VALUES (10, '10', 's');
I want to add a column flag that contains 0 for event='s' and also for its previous and next rows. I cannot use LEAD, LAG or a temp table due to system constraints.
My final output should look like this:
+-----------+--------+------+
| timestamp | events | flag |
+-----------+--------+------+
| 1 | r | 1 |
| 2 | r | 0 |
| 3 | s | 0 |
| 4 | r | 0 |
| 5 | r | 1 |
| 6 | r | 1 |
| 7 | r | 0 |
| 8 | s | 0 |
| 9 | r | 0 |
| 10 | r | 0 |
| 11 | s | 0 |
+-----------+--------+------+
What I have tried so far is the following:
SELECT a.time, a.event, 0 as flag
FROM table1 AS a
JOIN table1 AS b
ON b.event = 's' AND abs(a.id - b.id) <= 1
This returns all the rows that need flag 0, but it misses the rows that should be flagged 1.
The timestamp is an ordered time; I converted it to an integer to keep the example simple.
Try the following:
With CTE As
(
Select id, time, event,
Case
When event='r' then -10 else id
End as f
From table1
)
Select id, time, event,
Case
when id in
(select f from cte where f<>-10
union
select f+1 from cte where f<>-10
union
select f-1 from cte where f<>-10) then 0 else 1
End As flag
From CTE
Here the -10 in When event='r' then -10 else id stands for any integer that does not occur in the id column, even after adding or subtracting 1.
See a demo from db<>fiddle.
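A related way to express the same neighbour test, shown here only as a sketch, is to wrap the self-join condition from the question in EXISTS so that unmatched rows get flag 1. It assumes consecutive ids without gaps and runs on Python's sqlite3 module rather than the asker's system; the time column is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, event TEXT NOT NULL)")
# Same events as the INSERT statements in the question (ids 1..10).
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    list(enumerate("rrsrrrsrrs", start=1)),
)

# A row gets flag 0 when some 's' row sits at id-1, id or id+1.
flags = conn.execute("""
    SELECT a.id, a.event,
           CASE WHEN EXISTS (SELECT 1
                             FROM table1 AS b
                             WHERE b.event = 's'
                               AND abs(a.id - b.id) <= 1)
                THEN 0 ELSE 1 END AS flag
    FROM table1 AS a
    ORDER BY a.id
""").fetchall()

for row in flags:
    print(row)
```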
Update to cover the gaps in the id column:
With CTE As
(
Select M.id, M.time, M.event,
Case
When M.event='r' then -10 else id
End as f,
Case
when M.event='s' then
(select top 1 T.id from table1 T where T.id > M.id order by T.id)
else -10
End As Lead_val,
Case
when M.event='s' then
(select top 1 T.id from table1 T where T.id < M.id order by T.id desc)
else -10
End As Lag_val
From table1 M
)
Select T.id, T.time, T.event,
Case
when T.id in (
select f from cte
union
select Lead_val from cte
union
select Lag_val from cte
)
then 0 else 1
End as flag
From table1 T
See a demo from db<>fiddle.
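The gap-tolerant idea can be sketched compactly: look up the nearest lower and nearest higher id with correlated subqueries and test their event. The example uses Python's sqlite3 module, so TOP 1 becomes LIMIT 1, and the sample ids with gaps are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, event TEXT NOT NULL)")
# Hypothetical data with gaps in the id column.
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(1, "r"), (2, "r"), (5, "s"), (6, "r"),
     (9, "r"), (12, "s"), (14, "r"), (20, "r")],
)

# Flag 0 for 's' rows and for rows whose nearest neighbour by id is 's'.
flags = conn.execute("""
    SELECT a.id, a.event,
           CASE WHEN a.event = 's'
                  OR (SELECT p.event FROM table1 AS p
                      WHERE p.id < a.id ORDER BY p.id DESC LIMIT 1) = 's'
                  OR (SELECT n.event FROM table1 AS n
                      WHERE n.id > a.id ORDER BY n.id LIMIT 1) = 's'
                THEN 0 ELSE 1 END AS flag
    FROM table1 AS a
    ORDER BY a.id
""").fetchall()

for row in flags:
    print(row)
```

Rows without a previous or next neighbour produce NULL in the subquery, which the CASE treats as "not matched".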
Another alternative. To address the possibility of gaps, I use row_number to generate a gap-less sequence and then use a self-join to avoid LEAD and LAG.
with cte as (select *, ROW_NUMBER() over (order by time) as rno from table1
)
select main.*,
case when main.event = 's' then 0
when main.event <> 's' and after.event = 's' then 0
when main.event <> 's' and prior.event = 's' then 0
else 1 end as [flag],
prior.rno as [r-1], prior.id as [prior id], prior.event as [prior event],
after.rno as [r+1], after.id as [after id], after.event as [after event]
from cte as main
left join cte as prior on main.rno = prior.rno + 1
left join cte as after on main.rno = after.rno - 1
order by main.rno;
fiddle to demonstrate - containing some extra rows with gaps to illustrate. It is not clear what logic is most appropriate for choosing the prior/next rows so I used the "time" column.

SQL return second max date for each id, date and channel

I have the following table:
id channel_id date
1 | 1 | 2017-01-10
1 | 2 | 2018-02-05
1 | 1 | 2019-03-07
1 | 2 | 2020-03-15
2 | 1 | 2018-01-17
2 | 1 | 2019-07-20
2 | 1 | 2020-01-10
For each row I want the latest earlier date for the same id, split into two columns by channel_id: one column holds the latest earlier date with channel_id equal to 1, the other the latest earlier date with channel_id equal to 2. The result I want is shown below:
id channel_id date prev_date_channel_id1 prev_date_channel_id2
1 | 1 | 2017-01-10 | NULL | NULL |
1 | 2 | 2018-02-05 | 2017-01-10 | NULL |
1 | 1 | 2019-03-07 | 2017-01-10 | 2018-02-05 |
1 | 2 | 2020-03-15 | 2019-03-07 | 2018-02-05 |
2 | 1 | 2018-01-17 | NULL | NULL |
2 | 1 | 2019-07-20 | 2018-01-17 | NULL |
2 | 1 | 2020-01-10 | 2019-07-20 | NULL |
The query below returns what I want but takes too much time. I'd appreciate any optimization suggestions!
SELECT
a.id,
a.date,
MAX(c.date) AS prev_date_channel_id1,
MAX(d.date) AS prev_date_channel_id2
FROM
table a
LEFT JOIN
table c ON a.id=c.id AND a.date>c.date AND c.channel_id=1
LEFT JOIN
table d ON a.id=d.id AND a.date>d.date AND d.channel_id=2
GROUP BY a.id, a.date
Use a cumulative conditional max over the preceding rows, one per channel (a plain lag() would return the previous date regardless of channel):
select t.*,
max(case when channel_id = 1 then date end) over
(partition by id
order by date
rows between unbounded preceding and 1 preceding
) as prev_date_channel_id1,
max(case when channel_id = 2 then date end) over
(partition by id
order by date
rows between unbounded preceding and 1 preceding
) as prev_date_channel_id2
from t;
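As a runnable sketch of the cumulative conditional max, the snippet below uses Python's sqlite3 module and assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added. The named window w is only a convenience, and dates are stored as ISO strings so that text comparison matches date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, channel_id INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1, "2017-01-10"), (1, 2, "2018-02-05"), (1, 1, "2019-03-07"),
     (1, 2, "2020-03-15"), (2, 1, "2018-01-17"), (2, 1, "2019-07-20"),
     (2, 1, "2020-01-10")],
)

# The frame ends one row before the current row, so each MAX only
# sees strictly earlier rows of the same id.
prev_dates = conn.execute("""
    SELECT id, channel_id, date,
           MAX(CASE WHEN channel_id = 1 THEN date END) OVER w AS prev1,
           MAX(CASE WHEN channel_id = 2 THEN date END) OVER w AS prev2
    FROM t
    WINDOW w AS (PARTITION BY id ORDER BY date
                 ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
    ORDER BY id, date
""").fetchall()

for row in prev_dates:
    print(row)
```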
I think there's an error in your "expected output" for the value of prev_date_channel_id1 on the last row (it should be 2019-07-20).
In any case, with appropriate indexing an outer apply top 1 construct might serve you better:
create table t
(
id int,
channel_id int,
[date] date,
constraint pk_t primary key clustered (id, channel_id, [date])
);
insert t values
(1, 1, '2017-01-10'),
(1, 2, '2018-02-05'),
(1, 1, '2019-03-07'),
(1, 2, '2020-03-15'),
(2, 1, '2018-01-17'),
(2, 1, '2019-07-20'),
(2, 1, '2020-01-10');
select t1.id,
t1.channel_id,
t1.[date],
prev_date_channel_id1 = c1.dt,
prev_date_channel_id2 = c2.dt
from t t1
outer apply (
select top 1 [date]
from t
where id = t1.id
and channel_id = 1
and [date] < t1.[date]
order by date desc
) c1(dt)
outer apply (
select top 1 [date]
from t
where id = t1.id
and channel_id = 2
and [date] < t1.[date]
order by date desc
) c2(dt)
order by t1.id, t1.[date];
Or possibly faster still, especially with the key changed to constraint pk_t primary key clustered (id, [date], [channel_id]):
select t1.id,
t1.channel_id,
t1.[date],
prev_date_channel_id1 = prev.c1,
prev_date_channel_id2 = prev.c2
from t t1
outer apply (
select c1 = max(iif(channel_id = 1, [date], null)),
c2 = max(iif(channel_id = 2, [date], null))
from t
where id = t1.id
and [date] < t1.[date]
) prev
Assuming you have an index on those three columns, you can use subqueries:
SELECT [T0].[id],
[T0].[channel_id],
[T0].[date],
[prev_date_channel_id1] = (
SELECT MAX([T1].[date])
FROM [t] [T1]
WHERE [T1].[id] = [T0].[id]
AND [T1].[date] < [T0].[date]
AND [T1].[channel_id] = 1
),
[prev_date_channel_id2] = (
SELECT MAX([T1].[date])
FROM [t] [T1]
WHERE [T1].[id] = [T0].[id]
AND [T1].[date] < [T0].[date]
AND [T1].[channel_id] = 2
)
FROM [t] [T0];
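The correlated-subquery form is plain SQL, so it can be tried on almost any engine. Here is a minimal sketch on Python's sqlite3 module with the sample rows from the question. The index name ix_t is hypothetical, and whether the index actually helps depends on the real engine and data volume.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, channel_id INTEGER, date TEXT)")
# Hypothetical covering index on the three columns used by the subqueries.
conn.execute("CREATE INDEX ix_t ON t (id, channel_id, date)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1, "2017-01-10"), (1, 2, "2018-02-05"), (1, 1, "2019-03-07"),
     (1, 2, "2020-03-15"), (2, 1, "2018-01-17"), (2, 1, "2019-07-20"),
     (2, 1, "2020-01-10")],
)

# One scalar subquery per channel: latest earlier date for the same id.
result = conn.execute("""
    SELECT t0.id, t0.channel_id, t0.date,
           (SELECT MAX(t1.date) FROM t AS t1
            WHERE t1.id = t0.id AND t1.date < t0.date
              AND t1.channel_id = 1) AS prev_date_channel_id1,
           (SELECT MAX(t1.date) FROM t AS t1
            WHERE t1.id = t0.id AND t1.date < t0.date
              AND t1.channel_id = 2) AS prev_date_channel_id2
    FROM t AS t0
    ORDER BY t0.id, t0.date
""").fetchall()

for row in result:
    print(row)
```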

Select only the "most complete" record

I need to solve the following problem.
Let's suppose I have a table with 4 fields called a, b, c, d.
I have the following records:
-------------------------------------
a | b | c | d
-------------------------------------
1 | 2 | | row 1
1 | 2 | 3 | 4 row 2
1 | 2 | | 4 row 3
1 | 2 | 3 | row 4
As you can see, rows 1, 3 and 4 are "sub-records" of row 2.
What I would like to do is extract only the 2nd row.
Could you help me please?
Thanks in advance for the answer
EDIT: I need to be more specific.
I could have also the cases:
-------------------------------------
a | b | c | d
-------------------------------------
1 | 2 | | row 1
1 | 2 | | 4 row 2
1 | | | 4 row 3
where I need to extract the 2nd row,
-------------------------------------
a | b | c | d
-------------------------------------
1 | 2 | | row 1
1 | 2 | 3 | row 2
1 | | 3 | row 3
and again I need to extract the 2nd row.
Same for couples,
a | b | c | d
-------------------------------------
1 | | | row 1
1 | | 3 | row 2
| | 3 | row 3
and so on for the other examples.
(Of course, it's not always the 2nd row.)
Records that have a more complete duplicate can be filtered out with NOT EXISTS:
create table abcd (
a int,
b int,
c int,
d int
);
insert into abcd (a, b, c, d) values
(1, 2, null, null)
,(1, 2, 3, 4)
,(1, 2, null, 4)
,(1, 2, 3, null)
,(2, 3, null,null)
,(2, 3, null, 5)
,(2, null, null, 5)
,(3, null, null, null)
,(3, null, 5, null)
,(null, null, 5, null);
SELECT *
FROM abcd AS t
WHERE NOT EXISTS
(
select 1
from abcd as d
where (t.a is null or d.a = t.a)
and (t.b is null or d.b = t.b)
and (t.c is null or d.c = t.c)
and (t.d is null or d.d = t.d)
and (case when t.a is null then 0 else 1 end +
case when t.b is null then 0 else 1 end +
case when t.c is null then 0 else 1 end +
case when t.d is null then 0 else 1 end) <
(case when d.a is null then 0 else 1 end +
case when d.b is null then 0 else 1 end +
case when d.c is null then 0 else 1 end +
case when d.d is null then 0 else 1 end)
);
a | b | c | d
-: | ---: | ---: | ---:
1 | 2 | 3 | 4
2 | 3 | null | 5
3 | null | 5 | null
db<>fiddle here
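For readers without access to the fiddle, the NOT EXISTS filter can be reproduced with Python's sqlite3 module. This is a sketch on the same ten sample rows. The non-NULL count is written as a sum of IS NOT NULL tests, a SQLite shorthand for the CASE expressions above, and an ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE abcd (a INTEGER, b INTEGER, c INTEGER, d INTEGER)")
conn.executemany(
    "INSERT INTO abcd VALUES (?, ?, ?, ?)",
    [(1, 2, None, None), (1, 2, 3, 4), (1, 2, None, 4), (1, 2, 3, None),
     (2, 3, None, None), (2, 3, None, 5), (2, None, None, 5),
     (3, None, None, None), (3, None, 5, None), (None, None, 5, None)],
)

# Keep a row only if no other row agrees with all of its non-NULL
# columns while having more non-NULL columns overall.
kept = conn.execute("""
    SELECT *
    FROM abcd AS t
    WHERE NOT EXISTS (
        SELECT 1
        FROM abcd AS d
        WHERE (t.a IS NULL OR d.a = t.a)
          AND (t.b IS NULL OR d.b = t.b)
          AND (t.c IS NULL OR d.c = t.c)
          AND (t.d IS NULL OR d.d = t.d)
          AND (t.a IS NOT NULL) + (t.b IS NOT NULL)
            + (t.c IS NOT NULL) + (t.d IS NOT NULL)
            < (d.a IS NOT NULL) + (d.b IS NOT NULL)
            + (d.c IS NOT NULL) + (d.d IS NOT NULL)
    )
    ORDER BY t.a
""").fetchall()

for row in kept:
    print(row)
```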
You will need to compute a "completion index" for each row. In the example you provided, you might use something along the lines of:
(CASE WHEN a IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN b IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN c IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN d IS NULL THEN 0 ELSE 1 END) AS CompletionIndex
Then SELECT the top 1 ordered by CompletionIndex in descending order.
This is obviously not very scalable across a large number of columns. But if you have a large number of sparsely populated columns you might consider a row-based rather than column-based structure for your data. That design would make it much easier to count the number of non-NULL values for each entity.
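A minimal sketch of the completion index, run with Python's sqlite3 module on the four rows of the first example. The table name abcd is borrowed from another answer, and LIMIT 1 plays the role of TOP 1. Note that this picks a single row for the whole table, so it does not separate unrelated groups of records.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE abcd (a INTEGER, b INTEGER, c INTEGER, d INTEGER)")
conn.executemany(
    "INSERT INTO abcd VALUES (?, ?, ?, ?)",
    [(1, 2, None, None), (1, 2, 3, 4), (1, 2, None, 4), (1, 2, 3, None)],
)

# Count the non-NULL columns per row and keep the row with the highest count.
best = conn.execute("""
    SELECT a, b, c, d,
           (CASE WHEN a IS NULL THEN 0 ELSE 1 END) +
           (CASE WHEN b IS NULL THEN 0 ELSE 1 END) +
           (CASE WHEN c IS NULL THEN 0 ELSE 1 END) +
           (CASE WHEN d IS NULL THEN 0 ELSE 1 END) AS completion_index
    FROM abcd
    ORDER BY completion_index DESC
    LIMIT 1
""").fetchone()

print(best)
```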
The most complete rows, by your definition, are the ones with the fewest null columns:
SELECT * FROM tablename
WHERE (
(CASE WHEN a IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN b IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN c IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN d IS NULL THEN 0 ELSE 1 END)
) =
(SELECT MAX(
(CASE WHEN a IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN b IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN c IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN d IS NULL THEN 0 ELSE 1 END))
FROM tablename)
Hmmm . . . I think you can use not exists:
with t as (
select t.*, row_number() over (order by a) as id
from t
)
select t.*
from t
where not exists (select 1
from t t2
where ((t2.a is not distinct from t.a or t2.a is not null and t.a is null) and
(t2.b is not distinct from t.b or t2.b is not null and t.b is null) and
(t2.c is not distinct from t.c or t2.c is not null and t.c is null) and
(t2.d is not distinct from t.d or t2.d is not null and t.d is null)
) and
t2.id <> t.id
);
The logic is that a row is kept only when no more specific row exists whose values match.
Here is a db<>fiddle.
As mentioned by Gordon Linoff, we need something like NOT EXISTS here too.
Edit: using EXCEPT also helps.
This might work:
SELECT * from table1
EXCEPT
(
SELECT t1.*
FROM table1 t1
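JOIN table1 t2
ON COALESCE(t1.a, t2.a, -1) = COALESCE(t2.a, -1)
AND COALESCE(t1.b, t2.b, -1) = COALESCE(t2.b, -1)
AND COALESCE(t1.c, t2.c, -1) = COALESCE(t2.c, -1)
AND COALESCE(t1.d, t2.d, -1) = COALESCE(t2.d, -1)
-- require t2 to fill at least one column that t1 leaves NULL;
-- without this every row matches itself and EXCEPT removes everything
AND ((t1.a IS NULL AND t2.a IS NOT NULL)
OR (t1.b IS NULL AND t2.b IS NOT NULL)
OR (t1.c IS NULL AND t2.c IS NOT NULL)
OR (t1.d IS NULL AND t2.d IS NOT NULL))
)
Here, t1 is every row that is a strict subset of some other row t2.
Note: We assume -1 as a sentinel value that does not occur in any column.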

SQL Order By On two columns but same priority

I'm stuck on this simple select and don't know what to do.
I Have this:
ID | Group
===========
1 | NULL
2 | 100
3 | 100
4 | 100
5 | 200
6 | 200
7 | 100
8 | NULL
and want this:
ID | Group
===========
1 | NULL
2 | 100
3 | 100
4 | 100
7 | 100
5 | 200
6 | 200
8 | NULL
All group members should stay together, while the other rows are ordered by ID.
I cannot write this query because of the NULL records. NULL means that the record does not belong to any group.
First you want to order your rows by the minimum ID of their group, or by their own ID in case they belong to no group. Then you want to order by ID. That is:
order by min(id) over (partition by case when grp is null then id else grp end), id
If IDs and groups can overlap (i.e. the same number can be used for an ID and for a group, e.g. add a record for ID 9 / group 1 to your sample data) you should change the partition clause to something like
order by min(id) over (partition by case when grp is null
then 'ID' + cast(id as varchar)
else 'GRP' + cast(grp as varchar) end),
id;
Rextester demo: http://rextester.com/GPHBW5600
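In case the Rextester link is unavailable, the ordering can be sketched with Python's sqlite3 module. This assumes SQLite 3.25 or newer for window functions. The table name t and the helper column first_id are made up, and the window expression is computed in a subquery so the ORDER BY stays simple. As noted above, this plain partition key relies on ids and group numbers not overlapping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, grp INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, None), (2, 100), (3, 100), (4, 100),
     (5, 200), (6, 200), (7, 100), (8, None)],
)

# first_id is the smallest id of the row's group, or the row's own id
# when it has no group; sorting by it keeps group members together.
ordered = conn.execute("""
    SELECT id, grp
    FROM (SELECT id, grp,
                 MIN(id) OVER (PARTITION BY CASE WHEN grp IS NULL
                                                 THEN id ELSE grp END) AS first_id
          FROM t) AS s
    ORDER BY first_id, id
""").fetchall()

print([row[0] for row in ordered])
```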
What about data after a null? In a comment you said don't sort the null.
declare #T table (ID int primary key, grp int);
insert into #T values
(1, NULL)
, (3, 100)
, (5, 200)
, (6, 200)
, (7, 100)
, (8, NULL)
, (9, 200)
, (10, 100)
, (11, NULL)
, (12, 150);
select ttt.*
from ( select tt.*
, sum(ff) over (order by tt.ID) as sGrp
from ( select t.*
, iif(grp is null or lag(grp) over (order by id) is null, 1, 0) as ff
from #T t
) tt
) ttt
order by ttt.sGrp, ttt.grp, ttt.id
ID grp ff sGrp
----------- ----------- ----------- -----------
1 NULL 1 1
3 100 1 2
7 100 0 2
5 200 0 2
6 200 0 2
8 NULL 1 3
10 100 0 4
9 200 1 4
11 NULL 1 5
12 150 1 6

How to write a query to allow null in minimum function

I need a query that returns the rows with the minimum value of a column, and if the value is null I still want to include that row. I wrote the following query, but it ignores the null values. How can I modify it to include null values in the result?
select * from TABLE where COLUMN = (select min(COLUMN) from TABLE );
If the table is like below
|ID | VALUE | NAME
101 1 John
101 null John
102 1 Bill
103 1 Tina
103 null Tina
104 null James
Result Should be
|ID | VALUE | NAME
101 1 John
102 1 Bill
103 1 Tina
104 null James
You need distinct on:
with my_table(id, value, name) as (
values
(101, 1, 'John'),
(101, null, 'John'),
(102, 1, 'Bill'),
(103, 1, 'Tina'),
(103, null, 'Tina'),
(104, null, 'James')
)
select distinct on (id) *
from my_table
order by id, value
id | value | name
-----+-------+-------
101 | 1 | John
102 | 1 | Bill
103 | 1 | Tina
104 | | James
(4 rows)
Distinct on is a fantastic feature specific to Postgres. An alternative in other RDBMSs may be:
select t.id, t.value, t.name
from my_table t
join (
select id, min(value) as value
from my_table
group by id
) u on u.id = t.id and u.value is not distinct from t.value;
Note that you should use is not distinct from because value may be null, and a plain = never matches null.
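The null-safe join can be tried with Python's sqlite3 module. One assumption to flag: SQLite spells null-safe equality as the IS operator (IS NOT DISTINCT FROM is only accepted by newer SQLite releases), so the sketch uses IS in the join condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, value INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(101, 1, "John"), (101, None, "John"), (102, 1, "Bill"),
     (103, 1, "Tina"), (103, None, "Tina"), (104, None, "James")],
)

# MIN ignores NULLs, so u.value is NULL only when every value of the id
# is NULL; the null-safe comparison then still matches that row.
picked = conn.execute("""
    SELECT t.id, t.value, t.name
    FROM my_table AS t
    JOIN (SELECT id, MIN(value) AS value
          FROM my_table
          GROUP BY id) AS u
      ON u.id = t.id AND u.value IS t.value
    ORDER BY t.id
""").fetchall()

for row in picked:
    print(row)
```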
SQL SERVER
select DISTINCT j.ID,j.VALUE,j.NAME from Table1 j
join (
select id, MIN(VALUE) VALUE from Table1
group by id
) as t
on t.ID = j.ID and (t.VALUE = j.VALUE or t.VALUE is null)
You cannot compare a null value with equals (=); you have to check it with is null or similar. One simple solution is to replace null with a default number that would not otherwise be used:
select * from TABLE where coalesce(COLUMN, -9999) = (select min(coalesce(COLUMN,-9999)) from TABLE );
The coalesce function returns the first non-null value passed to it.
with c as (
select column as c
from table
order by column nulls first
limit 1
)
select *
from table cross join c
where column = c or column is null
If you want to use order by:
select t.*
from t
order by t.column asc nulls first
limit 1;
Alternatively, use rank():
select t.*
from (select t.*,
rank() over (order by col asc nulls first) as seqnum
from t
) t
where seqnum = 1;
I hope this solves your problem.
SELECT id,
CASE WHEN MIN(
CASE WHEN value IS NULL THEN 0 ELSE 1 END) = 0 THEN null
ELSE MIN(value) END
FROM tableName
GROUP BY id
or using COALESCE.
SELECT id,
CASE WHEN MIN(COALESCE(value, 0)) = 0 THEN null
ELSE MIN(value) END
FROM tableName
GROUP BY id
I am on mobile phone now, so I cannot test.