group by and select non null value if present - sql

Is there a way I can perform a group by and use the non-null value for a column if one is present? For example:
a | b | c | d | e | f |
---------------------------------------------------
1 | 2 | 3 | x | test1 | 2019-07-01 07:17:01 |
1 | 2 | 3 | NULL | test2 | 2019-07-01 10:23:11 |
1 | 2 | 3 | NULL | test3 | 2019-07-01 22:00:51 |
1 | 2 | 7 | NULL | testTet | 2019-07-01 23:00:00 |
In my case above, if d is present for, say, a=1, b=2, c=3, it will always be x; otherwise it is null. So my query would be like
select a,
b,
c,
d,
count(distinct e) as something
from tableX
where f between '2019-07-01 00:00:00' and '2019-07-01 23:59:59.999'
group by a,
b,
c,
d
the results would be:
a | b | c | d | something |
------------------------------|
1 | 2 | 3 | x | 1 |
1 | 2 | 3 | NULL | 2 |
1 | 2 | 7 | NULL | 1 |
whereas it would be wonderful to get this instead (since for each group-by combination I know d is either null or that one unique value, if present):
a | b | c | d | something |
------------------------------|
1 | 2 | 3 | x | 3 |
1 | 2 | 7 | NULL | 1 |

From your sample data I think that you don't need d in the group by clause.
Aggregate functions such as max() ignore nulls, so get its max:
select
a, b, c,
max(d) d,
count(distinct e) as something
from tableX
where f between '2019-07-01 00:00:00' and '2019-07-01 23:59:59.999'
group by a, b, c
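To check this against the sample rows, here is a minimal sketch that runs the same query in an in-memory SQLite database from Python. The choice of SQLite and the column types are assumptions made only for illustration; the question does not say which database is used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are assumed; f is stored as ISO-8601 text so BETWEEN compares correctly.
conn.execute("CREATE TABLE tableX (a INT, b INT, c INT, d TEXT, e TEXT, f TEXT)")
conn.executemany(
    "INSERT INTO tableX VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 2, 3, "x", "test1", "2019-07-01 07:17:01"),
        (1, 2, 3, None, "test2", "2019-07-01 10:23:11"),
        (1, 2, 3, None, "test3", "2019-07-01 22:00:51"),
        (1, 2, 7, None, "testTet", "2019-07-01 23:00:00"),
    ],
)

# MAX skips NULLs, so each (a, b, c) group reports 'x' when it exists
# and NULL only when every d in the group is NULL.
rows = conn.execute(
    """
    SELECT a, b, c, MAX(d) AS d, COUNT(DISTINCT e) AS something
    FROM tableX
    WHERE f BETWEEN '2019-07-01 00:00:00' AND '2019-07-01 23:59:59.999'
    GROUP BY a, b, c
    ORDER BY a, b, c
    """
).fetchall()
print(rows)  # [(1, 2, 3, 'x', 3), (1, 2, 7, None, 1)]
```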

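Try it like below. Note that summing the per-d counts equals count(distinct e) only when no value of e appears under both a null and a non-null d: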
with cte as (select a,
b,
c,
d,
count(distinct e) as something
from tableX
where f between '2019-07-01 00:00:00' and '2019-07-01 23:59:59.999'
group by a,
b,
c,
d) select a,b,c,max(d) as d,sum(something) from cte group by a,b,c
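The two-step version and a direct count(distinct e) agree on the sample data, but they can diverge. The sketch below assumes SQLite and adds one hypothetical row (not in the question) where test1 also occurs with a null d, so the same e is counted once in each d group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableX (a INT, b INT, c INT, d TEXT, e TEXT)")
conn.executemany(
    "INSERT INTO tableX VALUES (?, ?, ?, ?, ?)",
    [
        (1, 2, 3, "x", "test1"),
        (1, 2, 3, None, "test1"),  # hypothetical row: same e, but d is NULL
        (1, 2, 3, None, "test2"),
        (1, 2, 3, None, "test3"),
    ],
)

# Two-step approach: distinct count per (a, b, c, d), then summed per (a, b, c).
two_step = conn.execute(
    """
    WITH cte AS (
        SELECT a, b, c, d, COUNT(DISTINCT e) AS something
        FROM tableX
        GROUP BY a, b, c, d
    )
    SELECT a, b, c, MAX(d) AS d, SUM(something) AS something
    FROM cte
    GROUP BY a, b, c
    """
).fetchall()

# Direct approach: one distinct count per (a, b, c).
direct = conn.execute(
    """
    SELECT a, b, c, MAX(d) AS d, COUNT(DISTINCT e) AS something
    FROM tableX
    GROUP BY a, b, c
    """
).fetchall()

print(two_step)  # [(1, 2, 3, 'x', 4)]  test1 is counted in both d groups
print(direct)    # [(1, 2, 3, 'x', 3)]
```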

Related

Querying last non-null values of time-series table in Postgres

I have a time-series table which looks like the following:
time | a | b | c | d
--------------------+---------+----------+---------+---------
2016-05-15 00:08:22 | | | |
2016-05-15 01:50:56 | | | 26.8301 |
2016-05-15 02:41:58 | | | |
2016-05-15 03:01:37 | | | |
2016-05-15 04:45:18 | | | |
2016-05-15 05:45:32 | | | 26.9688 |
2016-05-15 06:01:48 | | | |
2016-05-15 07:47:56 | | | | 27.1269
2016-05-15 08:01:22 | | | |
2016-05-15 09:35:36 | 26.7441 | 29.8398 | | 26.9981
2016-05-15 10:08:53 | | | |
2016-05-15 11:08:30 | | | |
2016-05-15 12:14:59 | | | |
2016-05-15 13:33:36 | 27.4277 | 29.7695 | |
2016-05-15 14:36:36 | 27.4688 | 29.6836 | |
2016-05-15 15:37:36 | 27.1016 | | |
I want to return last non-null values of every column:
like this (best option):
time | column | value
--------------------+--------- +-------
2016-05-15 15:37:36 | a | 27.1016
2016-05-15 14:36:36 | b | 29.6836
2016-05-15 05:45:32 | c | 26.9688
2016-05-15 09:35:36 | d | 26.9981
like this:
column | value
-------- +-------
a | 27.1016
b | 29.6836
c | 26.9688
d | 26.9981
or at least like this:
a | b | c | d
--------+----------+---------+---------
27.1016 | 29.6836 | 26.9688 | 26.9981
Thanks!
You can unpivot and select the last row:
select distinct on (v.which) t.time, v.which, v.val
from t cross join lateral
(values (a, 'a'), (b, 'b'), (c, 'c'), (d, 'd')) v(val, which)
where v.val is not null
order by v.which, t.time desc;
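distinct on and lateral are Postgres features. As a rough, portable illustration of the same idea (unpivot, then keep the latest non-null row per column), the sketch below uses SQLite from Python and swaps in a UNION ALL unpivot plus a join on MAX(time). It is a different formulation from the query above, and it loads only a subset of the sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (time TEXT, a REAL, b REAL, c REAL, d REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        ("2016-05-15 05:45:32", None, None, 26.9688, None),
        ("2016-05-15 07:47:56", None, None, None, 27.1269),
        ("2016-05-15 09:35:36", 26.7441, 29.8398, None, 26.9981),
        ("2016-05-15 12:14:59", None, None, None, None),
        ("2016-05-15 14:36:36", 27.4688, 29.6836, None, None),
        ("2016-05-15 15:37:36", 27.1016, None, None, None),
    ],
)

rows = conn.execute(
    """
    WITH u AS (
        -- Unpivot: one (time, column name, value) row per non-null cell.
        SELECT time, 'a' AS which, a AS val FROM t WHERE a IS NOT NULL
        UNION ALL SELECT time, 'b', b FROM t WHERE b IS NOT NULL
        UNION ALL SELECT time, 'c', c FROM t WHERE c IS NOT NULL
        UNION ALL SELECT time, 'd', d FROM t WHERE d IS NOT NULL
    )
    SELECT u.time, u.which, u.val
    FROM u
    JOIN (SELECT which, MAX(time) AS last_time FROM u GROUP BY which) m
      ON m.which = u.which AND m.last_time = u.time
    ORDER BY u.which
    """
).fetchall()

for row in rows:
    print(row)
```

If two rows for the same column could share a timestamp, the join would return both; the sample data has unique timestamps.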
I suggest another answer, but I see now that @GordonLinoff's answer is better.
with src as (
select '0' as pos, 1 as a, 2 as b, null as c
union all select '1', null as a, null as b, 7 as c
union all select '2', 2 as a, null as b, 3 as c
union all select '3', null as a, null as b, null as c
union all select '4', null as a, 4 as b, null as c
),
n as (
select row_number() over() as rn, src.* from src
)
(select last_value(pos) over (order by rn desc) as timestamp, 'a' as column, last_value(a) over (order by rn desc) as value
from n
where a is not null
limit 1)
union all
(select last_value(pos) over (order by rn desc) as timestamp, 'b' as column, last_value(b) over (order by rn desc) as value
from n
where b is not null
limit 1)
union all
(select last_value(pos) over (order by rn desc) as timestamp, 'c' as column, last_value(c) over (order by rn desc) as value
from n
where c is not null
limit 1)
timestamp | column | value
:-------- | :----- | ----:
2 | a | 2
4 | b | 4
2 | c | 3

Flatten multiple arrays with uneven lengths in BigQuery

I'm trying to flatten arrays in different columns with different lengths without duplicating the results.
For example (using standard SQL):
WITH
x AS (
SELECT
ARRAY[1,
2,
3] AS a,
ARRAY[1,
2] AS b)
SELECT
a,
b
FROM
x,
x.a,
x.b
Produces:
+-----++-----+
| a | b |
+-----++-----+
| 1 | 1 |
| 1 | 2 |
| 2 | 1 |
| 2 | 2 |
| 3 | 1 |
| 3 | 2 |
+-----++-----+
It should look like this:
+-----++-----+
| a | b |
+-----++-----+
| 1 | 1 |
| 2 | 2 |
| 3 | null |
+-----++-----+
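You can use JOIN. Note that the ON clause matches elements by value rather than by position, which gives the desired output for arrays like the ones in your example: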
SELECT a, b
FROM x LEFT JOIN
UNNEST(x.a) a left join
unnest(x.b) b
ON a = b;
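The difference between pairing by position and pairing by value can be sketched outside BigQuery. The Python below is only an illustration of the two semantics, not BigQuery code; pair_by_position and pair_by_value are hypothetical helper names, and pair_by_value assumes the arrays contain no duplicate elements.

```python
from itertools import zip_longest


def pair_by_position(a, b):
    """Pair the i-th element of a with the i-th element of b, padding with None."""
    return list(zip_longest(a, b))


def pair_by_value(a, b):
    """Mimic a left join on equal values (assumes no duplicates in a or b)."""
    return [(x, x if x in b else None) for x in a]


# On the question's arrays both strategies agree.
print(pair_by_position([1, 2, 3], [1, 2]))  # [(1, 1), (2, 2), (3, None)]
print(pair_by_value([1, 2, 3], [1, 2]))     # [(1, 1), (2, 2), (3, None)]

# They differ as soon as values and positions stop lining up.
print(pair_by_position([1, 2, 3], [2, 1]))  # [(1, 2), (2, 1), (3, None)]
print(pair_by_value([1, 2, 3], [2, 1]))     # [(1, 1), (2, 2), (3, None)]
```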

SQL - Results based partially on aggregate of particular column

Thanks in advance for any assistance. I have a situation where I need a snapshot of SQL data but part of the results need to be based on the aggregate of one column. Here's a tiny subset of my data:
| A | B | last_date | next_date | C | D |
| 1 | 3 | 01/01/2000 | 01/01/2003 | 1 | 1 |
| 1 | 3 | 01/01/2001 | 01/01/2004 | 1 | 2 |
| 2 | 3 | 01/01/2002 | 01/01/2005 | 2 | 3 |
| 2 | 4 | 01/01/2003 | 01/01/2006 | 3 | 4 |
My results need to be grouped by columns A and B, the MAX of last_date and the MIN of next date. But the kicker is that the values for columns C and D should be the values that correspond to the MIN of next date. So for the above data subset my results would be:
| A | B | last_date | next_date | C | D |
| 1 | 3 | 01/01/2001 | 01/01/2003 | 1 | 1 |
| 2 | 3 | 01/01/2002 | 01/01/2005 | 2 | 3 |
| 2 | 4 | 01/01/2003 | 01/01/2006 | 3 | 4 |
Note how the first row of results has the value of last_date from the 2nd row of the initial data, but the values for columns C and D correspond to the first row from the initial data. In the case where there is an exact duplication of columns A, B, max(last_date), and min(next_date) but the values for columns C and D don't match, then I don't care which one is returned - but I must only return one row, not multiples.
You can use row_number and get the results as below:
Select A, B, MaxLast_date, MinNext_date, C, D from (
select *, max(last_date) over(partition by A, B) as MaxLast_date, Min(next_date) over(partition by A, B) as MinNext_date,
next_rn = Row_number() over(partition by A, B order by next_date) from #yourtable
) a
Where a.next_rn = 1
Another way is with top (1) with ties, as below:
Select top(1) with ties *, max(last_date) over(partition by A, B) as MaxLast_date, Min(next_date) over(partition by A, B) as MinNext_date
from #yourtable
Order by Row_number() over(partition by A, B order by next_date)
Output:
+---+---+--------------+--------------+---+---+
| A | B | MaxLast_date | MinNext_date | C | D |
+---+---+--------------+--------------+---+---+
| 1 | 3 | 2001-01-01 | 2003-01-01 | 1 | 1 |
| 2 | 3 | 2002-01-01 | 2005-01-01 | 2 | 3 |
| 2 | 4 | 2003-01-01 | 2006-01-01 | 3 | 4 |
+---+---+--------------+--------------+---+---+
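The row_number approach also carries over to other engines with window functions. Here is a minimal sketch in SQLite (version 3.25 or later is assumed for window function support), run from Python. The table name t and the ISO date strings are assumptions; the dates are rewritten as YYYY-MM-DD so that text ordering matches date ordering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (A INT, B INT, last_date TEXT, next_date TEXT, C INT, D INT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 3, "2000-01-01", "2003-01-01", 1, 1),
        (1, 3, "2001-01-01", "2004-01-01", 1, 2),
        (2, 3, "2002-01-01", "2005-01-01", 2, 3),
        (2, 4, "2003-01-01", "2006-01-01", 3, 4),
    ],
)

result = conn.execute(
    """
    SELECT A, B, max_last_date, min_next_date, C, D
    FROM (
        SELECT t.*,
               MAX(last_date) OVER (PARTITION BY A, B) AS max_last_date,
               MIN(next_date) OVER (PARTITION BY A, B) AS min_next_date,
               -- rn = 1 marks the row holding the earliest next_date per (A, B);
               -- ties get an arbitrary but single winner.
               ROW_NUMBER() OVER (PARTITION BY A, B ORDER BY next_date) AS next_rn
        FROM t
    ) AS ranked
    WHERE next_rn = 1
    ORDER BY A, B
    """
).fetchall()

for row in result:
    print(row)
```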

hive top K sum() records per group by key

For a table TBL with columns A, B, C, I want to group by A, B and, for each A, keep only the top K values of B ranked by sum(C).
Without the top limit, this is:
select A, B, sum(C) from TBL group by A, B
with the values
A | B | C
--+---+----
a | 1 | 10
a | 2 | 20
a | 1 | 5
a | 3 | 12
b | 3 | 100
b | 2 | 90
b | 1 | 120
c | 5 | 10
and limit of 2, the results will be
A | B | sum(C)
--+---+-------
a | 1 | 15
a | 2 | 20
b | 1 | 120
b | 3 | 100
c | 5 | 10
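You can number the grouped sums within each A using row_number() and keep the first K rows (here K = 2):
select A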
,B
,sum_C
from (select A
,B
,sum(C) as sum_C
,row_number () over
(
partition by A
order by sum(C) desc
) as rn
from TBL
group by A
,B
) t
where rn <= 2
+---+---+-------+
| a | b | sum_c |
+---+---+-------+
| a | 2 | 20 |
| a | 1 | 15 |
| b | 1 | 120 |
| b | 3 | 100 |
| c | 5 | 10 |
+---+---+-------+
You can use windowing functions to achieve this.
Query:
SELECT a, b, c
FROM (
SELECT *
, ROW_NUMBER() OVER (PARTITION BY a ORDER BY c DESC) AS rank
FROM (
SELECT A AS a
, B AS b
, SUM(C) AS c
FROM db.table
GROUP BY A, B ) x ) y
WHERE rank < 3
Output:
a b c
a 2 20
a 1 15
b 1 120
b 3 100
c 5 10
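Both answers rely on the same pattern: aggregate first, then rank within each A and filter on the rank. As a rough check on the sample data, the sketch below runs that pattern in SQLite (3.25 or later assumed for window functions) rather than Hive, with K passed as a bound parameter; the variable k is introduced here only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TBL (A TEXT, B INT, C INT)")
conn.executemany(
    "INSERT INTO TBL VALUES (?, ?, ?)",
    [
        ("a", 1, 10), ("a", 2, 20), ("a", 1, 5), ("a", 3, 12),
        ("b", 3, 100), ("b", 2, 90), ("b", 1, 120),
        ("c", 5, 10),
    ],
)

k = 2  # number of B values to keep per A
rows = conn.execute(
    """
    SELECT A, B, sum_C
    FROM (
        SELECT A, B, sum_C,
               ROW_NUMBER() OVER (PARTITION BY A ORDER BY sum_C DESC) AS rn
        FROM (
            -- Step 1: one row per (A, B) with its total.
            SELECT A, B, SUM(C) AS sum_C
            FROM TBL
            GROUP BY A, B
        ) AS sums
    ) AS ranked
    WHERE rn <= ?          -- Step 2: keep the k largest totals per A
    ORDER BY A, sum_C DESC
    """,
    (k,),
).fetchall()

for row in rows:
    print(row)
```

row_number() breaks ties arbitrarily; rank() or dense_rank() would keep tied sums together at the cost of possibly returning more than K rows per group.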

Combining the result of two queries using SQLite

Let's say I have the following tables:
Table A Table B
| a | b | c | | a | b | c |
|---|---|---| |---|---|---|
| a | b | c | | a | b | c |
| a | c | d | | a | c | d |
| c | d | e | | a | b | c |
| f | g | h | | o | p | q |
| a | a | a | | a | b | c |
and I want to find all the rows that differ between the two tables.
This gives me the rows from A that are not in B:
SELECT a, b, c FROM A
EXCEPT
SELECT a, b, c FROM B;
| a | b | c |
|---|---|---|
| a | a | a |
| c | d | e |
| f | g | h |
and this gives me the rows from B that are not in A:
SELECT a, b, c FROM B
EXCEPT
SELECT a, b, c FROM A;
| a | b | c |
|---|---|---|
| o | p | q |
So to combine the two I tried using
SELECT a, b, c FROM A
EXCEPT
SELECT a, b, c FROM B
UNION
SELECT a, b, c FROM B
EXCEPT
SELECT a, b, c FROM A;
but it still only gives me the rows from B.
How do I get rows from both tables, like this?
| a | b | c |
|---|---|---|
| a | a | a |
| c | d | e |
| f | g | h |
| o | p | q |
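SQLite evaluates chained compound operators left to right, so your query is read as ((A EXCEPT B) UNION B) EXCEPT A, which leaves only the rows unique to B. Try enclosing each individual except in its own subquery and then do the union: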
select * from (
select a, b, c from a
except
select a, b, c from b
)
union
select * from (
select a, b, c from b
except
select a, b, c from a
);
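The grouping effect can be reproduced with Python's built-in sqlite3 module. This is a small sketch using the sample tables from the question; the TEXT column types are an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (a TEXT, b TEXT, c TEXT)")
conn.execute("CREATE TABLE B (a TEXT, b TEXT, c TEXT)")
conn.executemany(
    "INSERT INTO A VALUES (?, ?, ?)",
    [("a", "b", "c"), ("a", "c", "d"), ("c", "d", "e"), ("f", "g", "h"), ("a", "a", "a")],
)
conn.executemany(
    "INSERT INTO B VALUES (?, ?, ?)",
    [("a", "b", "c"), ("a", "c", "d"), ("a", "b", "c"), ("o", "p", "q"), ("a", "b", "c")],
)

# Chained operators group left to right: ((A EXCEPT B) UNION B) EXCEPT A.
ungrouped = conn.execute(
    """
    SELECT a, b, c FROM A
    EXCEPT
    SELECT a, b, c FROM B
    UNION
    SELECT a, b, c FROM B
    EXCEPT
    SELECT a, b, c FROM A
    """
).fetchall()

# Subqueries force each EXCEPT to be evaluated before the UNION.
grouped = sorted(
    conn.execute(
        """
        SELECT * FROM (SELECT a, b, c FROM A EXCEPT SELECT a, b, c FROM B)
        UNION
        SELECT * FROM (SELECT a, b, c FROM B EXCEPT SELECT a, b, c FROM A)
        """
    ).fetchall()
)

print(ungrouped)  # [('o', 'p', 'q')]
print(grouped)    # [('a', 'a', 'a'), ('c', 'd', 'e'), ('f', 'g', 'h'), ('o', 'p', 'q')]
```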