Using window function in redshift to aggregate conditionally


I have a table with the following data:
Link to test data: http://sqlfiddle.com/#!15/dce01/1/0
I want to aggregate the items column (using listagg) within each gid group, in the order given by the seq column. An aggregation should end when pid becomes 0 again within the group.
For example, group g1 would produce 2 aggregations: one for seq 1-3 and another for seq 4-6, because pid becomes 0 again at seq 4.
I expect the result for the given example to be as follows (note that seq in the result is the minimum seq of each aggregated run, i.e. the row where pid is 0):

I understand your question as a gaps-and-islands problem, where you want to group together adjacent rows having the same gid until a row with a pid of 0 is met.
Here is one way to solve it, using a window sum to define the groups: a new island starts every time a pid of 0 is met, so the running count of zeros within each gid identifies the island. The rest is just aggregation:
select
    gid,
    min(seq) seq,
    listagg(items, ',') within group(order by seq) items
from (
    select
        t.*,
        sum(case when pid = 0 then 1 else 0 end) over(partition by gid order by seq) grp
    from mytable t
) t
group by gid, grp
order by gid, grp
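
To make the running-count idea concrete, here is a minimal Python sketch of the same logic. The sample rows are made up for illustration (the question's actual data is not reproduced here), and `aggregate_islands` is a hypothetical helper name, not part of any library.

```python
from itertools import groupby

# Hypothetical rows (gid, seq, pid, items); not the question's real data.
rows = [
    ("g1", 1, 0, "a"), ("g1", 2, 1, "b"), ("g1", 3, 2, "c"),
    ("g1", 4, 0, "d"), ("g1", 5, 4, "e"), ("g1", 6, 5, "f"),
    ("g2", 1, 0, "x"), ("g2", 2, 1, "y"),
]

def aggregate_islands(rows):
    """Return (gid, min seq, comma-joined items) for each island."""
    out = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))  # partition by gid, order by seq
    for gid, members in groupby(ordered, key=lambda r: r[0]):
        grp = 0        # running count of pid = 0 rows, like the window SUM
        islands = {}   # grp -> list of (seq, item), kept in seq order
        for _, seq, pid, item in members:
            if pid == 0:
                grp += 1
            islands.setdefault(grp, []).append((seq, item))
        for grp_rows in islands.values():
            out.append((gid, grp_rows[0][0], ",".join(i for _, i in grp_rows)))
    return out

result = aggregate_islands(rows)
```

With these made-up rows, g1 splits into two islands starting at seq 1 and seq 4, which is the shape of result the question asks for.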

It's a gaps-and-islands problem:
with
subgroup_ids as (
    select *, sum(case when pid=0 then 1 else 0 end) over (partition by gid order by seq) as subgroup_id
    from tablename
)
select gid, subgroup_id, listagg(items,',') within group (order by seq)
from subgroup_ids
group by 1,2

Related

SUM a specific column in next rows until a condition is true

Here is a table of articles. I want to store the sum of the mass column from the following rows in the sumNext column, based on a condition.
If the next row has the same floor (floorNo column) as the current row, add up the mass of the following rows until the floor changes.
E.g.: row three has sumNext = 2. That is computed by adding the mass from rows four and five, because both rows have the same floor number as row three.
id | mass | symbol | floorNo | sumNext
2891176 | 1 | D | 1 | 0
2891177 | 1 | L | 8 | 0
2891178 | 1 | L | 1 | 2
2891179 | 1 | L | 1 | 1
2891180 | 1 |  | 1 | 0
2891181 | 1 |  | 5 | 2
2891182 | 1 |  | 5 | 1
2891183 | 1 |  | 5 | 0
Here is the query that generates this table. I just want to add the sumNext column with the right value in it.
WITH items AS (SELECT
SP.id,
SP.mass,
SP.symbol,
SP.floorNo
FROM articles SP
ORDER BY
DECODE(SP.symbol,
'P',1,
'D',2,
'L',3,
4 ) asc)
SELECT CLS.*
FROM items CLS;

You could use the solution below. It uses a recursive common table expression (CTE) to put all consecutive rows with the same FLOORNO value in the same group (the new grp column).
It then uses the analytic version of the SUM function to sum the MASS of all following rows within each grp, as required.
with Items_RowsNumbered (id, mass, symbol, floorNo, rnb) as (
select ID, MASS, SYMBOL, FLOORNO
, row_number()over(
order by DECODE(symbol, 'P',1, 'D',2, 'L',3, 4 ) asc, ID )
/*
You need to add ID column (or any others columns that can identify each row uniquely)
in the "order by" clause to make the result deterministic
*/
from (Your source query)Items
)
, cte(id, mass, symbol, floorNo, rnb, grp) as (
select id, mass, symbol, floorNo, rnb, 1 grp
from Items_RowsNumbered
where rnb = 1
union all
select t.id, t.mass, t.symbol, t.floorNo, t.rnb
, case when t.floorNo = c.floorNo then c.grp else c.grp + 1 end grp
from Items_RowsNumbered t
join cte c on (c.rnb + 1 = t.rnb)
)
select
ID, MASS, SYMBOL, FLOORNO
/*, RNB, GRP*/
, nvl(
sum(MASS)over(
partition by grp
order by rnb
ROWS BETWEEN 1 FOLLOWING and UNBOUNDED FOLLOWING)
, 0
) sumNext
from cte
;
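
The grouping-then-suffix-sum idea can be checked outside the database. Below is a minimal Python sketch, assuming the rows arrive in the order shown in the question's table; `sum_next` is a hypothetical helper name. It groups consecutive rows with the same floorNo and, for each row, sums the mass of the rows that follow it in the same run.

```python
from itertools import groupby

# (id, mass, floorNo) in the order shown in the question's table.
rows = [
    (2891176, 1, 1), (2891177, 1, 8), (2891178, 1, 1), (2891179, 1, 1),
    (2891180, 1, 1), (2891181, 1, 5), (2891182, 1, 5), (2891183, 1, 5),
]

def sum_next(rows):
    """Return (id, sumNext) where sumNext is the mass of later rows in the same run."""
    out = []
    # groupby only merges adjacent equal keys, which is exactly a run of equal floors
    for _, run in groupby(rows, key=lambda r: r[2]):
        run = list(run)
        remaining = sum(mass for _, mass, _ in run)
        for row_id, mass, _ in run:
            remaining -= mass  # exclude the current row, keep only following rows
            out.append((row_id, remaining))
    return out

result = sum_next(rows)
```

The values produced for these rows agree with the sumNext column in the question.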

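This is a typical gaps-and-islands problem. You can use the difference between two ROW_NUMBER() values to determine the partitions (the difference stays constant across a run of consecutive rows with the same floorNo), and then apply the SUM() analytic function, such as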
WITH ii AS
(
SELECT i.*,
ROW_NUMBER() OVER (ORDER BY id DESC) AS rn2,
ROW_NUMBER() OVER (PARTITION BY floorNo ORDER BY id DESC) AS rn1
FROM items i
)
SELECT id,mass,symbol, floorNo,
SUM(mass) OVER (PARTITION BY floorNo, rn2-rn1 ORDER BY id DESC) - mass AS sumNext
FROM ii
ORDER BY id
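
To see why the row-number difference identifies an island, here is a small Python sketch that computes rn2, rn1 and their difference for the question's rows. `island_keys` is a hypothetical helper name. The key pairs the difference with floorNo, because the difference alone is not guaranteed to be unique across different floors.

```python
from collections import defaultdict

# (id, floorNo) taken from the question's table.
rows = [
    (2891176, 1), (2891177, 8), (2891178, 1), (2891179, 1),
    (2891180, 1), (2891181, 5), (2891182, 5), (2891183, 5),
]

def island_keys(rows):
    """Map id -> (floorNo, rn2 - rn1), numbering rows by id descending."""
    by_id_desc = sorted(rows, key=lambda r: r[0], reverse=True)
    seen = defaultdict(int)  # rows seen so far per floor, i.e. rn1
    keys = {}
    for rn2, (row_id, floor) in enumerate(by_id_desc, start=1):
        seen[floor] += 1
        rn1 = seen[floor]
        keys[row_id] = (floor, rn2 - rn1)
    return keys

keys = island_keys(rows)
```

Rows 2891178 to 2891180 share one key, while 2891176 (also on floor 1, but separated by the floor 8 row) gets a different one.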

RANK in SQL but start at 1 again when number is greater than

I need SQL code for the case below. I want to RANK the rows, but whenever DSLR >= 60 I want the rank to start again at 1, like below.
Thanks

Assuming that you have a column that defines the ordering of the rows, say id, you can address this as a gaps-and-islands problem. Islands are groups of adjacent records that start with a dslr of 60 or more. We can identify them with a window sum, then rank within each island:
select dslr, rank() over(partition by grp order by id) as rn
from (
select t.*,
sum(case when dslr >= 60 then 1 else 0 end) over(order by id) as grp
from mytable t
) t
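
As a quick check of the restart behaviour, here is a minimal Python sketch under the same assumption of an ordering column id. The sample rows and the `rank_with_restart` name are made up for illustration. Because id is unique, the rank within an island is simply the row's position in it.

```python
# Hypothetical (id, dslr) rows; the question's data is not shown.
rows = [(1, 5), (2, 10), (3, 61), (4, 3), (5, 70), (6, 8), (7, 9)]

def rank_with_restart(rows, threshold=60):
    """Return (id, dslr, rn) where rn restarts at 1 on every dslr >= threshold."""
    out = []
    rn = 0
    for row_id, dslr in sorted(rows):  # order by id
        # a row at or above the threshold opens a new island
        rn = 1 if dslr >= threshold else rn + 1
        out.append((row_id, dslr, rn))
    return out

ranked = rank_with_restart(rows)
```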

How do I create a new SQL table with custom column names and populate these columns

I currently have an SQL statement that returns the most frequently occurring value and the least frequently occurring value in a table. However, the result has 2 rows, one per value. I need a result with 2 custom columns, min and max, and a single row holding one value for each.
(SELECT name, COUNT(name) AS frequency
FROM firefighter_certifications
GROUP BY name
ORDER BY frequency DESC limit 1)
UNION
(SELECT name, COUNT(name) AS frequency
FROM firefighter_certifications
GROUP BY name
ORDER BY frequency ASC limit 1);
So for the above query, I need the names with the min and max frequency in one row. I also need to be able to define the names of the new columns in the result.
Min_Name | Max_Name
Certif_1 | Certif_2

I think this query should give you the results you want. It ranks each name according to the number of times it appears in the table, then uses conditional aggregation to select the min and max frequency names in one row:
with cte as (
select name,
row_number() over (order by count(*) desc) as maxr,
row_number() over (order by count(*)) as minr
from firefighter_certifications
group by name
)
select max(case when minr = 1 then name end) as Min_Name,
max(case when maxr = 1 then name end) as Max_Name
from cte

Postgres doesn't offer "first" and "last" aggregation functions. But there are other, similar methods:
select distinct first_value(name) over (order by cnt desc, name) as name_at_max,
first_value(name) over (order by cnt asc, name) as name_at_min
from (select name, count(*) as cnt
from firefighter_certifications
group by name
) n;
Or without any subquery at all:
select first_value(name) over (order by count(*) desc, name) as name_at_max,
first_value(name) over (order by count(*) asc, name) as name_at_min
from firefighter_certifications
group by name
limit 1;
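
For comparison, the same "name at min count, name at max count in one row" result can be sketched in Python. The certification names below are made up, and `min_max_names` is a hypothetical helper. Ties are broken by name, mirroring the `order by cnt, name` used in the first_value queries above.

```python
from collections import Counter

# Hypothetical values of firefighter_certifications.name
names = ["EMT", "EMT", "EMT", "Hazmat", "Hazmat", "Rescue"]

def min_max_names(names):
    """Return one row holding the least and most frequent name."""
    counts = Counter(names)  # name -> frequency, like GROUP BY name
    # sort key: frequency first, then name as a deterministic tie-break
    max_name = min(counts, key=lambda n: (-counts[n], n))
    min_name = min(counts, key=lambda n: (counts[n], n))
    return {"Min_Name": min_name, "Max_Name": max_name}

row = min_max_names(names)
```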

Presto SQL - Rank Multiple Conditions for Multiple Columns

I am trying to write a single query (if possible) to rank ids based on multiple conditions.
My table is like this:
id group subgroup value
1 A Q 12
2 A Z 10
3 B Z 14
4 A Z 20
5 B W 20
I tried this query:
SELECT id,
CASE WHEN group = 'A' THEN ROW_NUMBER() OVER (PARTITION BY group ORDER BY SUM(value) DESC) AS rank_group
CASE WHEN group = 'A' AND subgroup = 'Z' THEN ROW_NUMBER() OVER (PARTITION BY group, subgroup ORDER BY SUM(value) DESC) AS rank_subgroup
FROM table
GROUP BY group, subgroup
But ended up with something like this:
id rank_group rank_subgroup
1 1 1
1 2 2
I would like to get each distinct id and return the rank based on the conditions of the case statement, but it looks like adding the needed partition causes a multiplication as the group by is necessary. I could write individual queries for each column, but I'd like to avoid if possible.

Do you want something like this?
-- group is a reserved word, so it may need to be written as "group"
select t.*,
       dense_rank() over (order by sumg, group) as rank_group,
       dense_rank() over (partition by group order by sumsg, subgroup) as rank_subgroup
from (select t.*,
             sum(value) over (partition by group) as sumg,
             sum(value) over (partition by group, subgroup) as sumsg
      from t
     ) t;
This is my best guess at interpreting what you might want.
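
To show what this interpretation produces on the question's five rows, here is a minimal Python sketch. It follows the same reading as the query (rank groups by their total value, then rank subgroups by their total within each group, both ascending), which may not be what the asker intended. The helper names are hypothetical.

```python
from collections import defaultdict

# (id, group, subgroup, value) from the question's table.
rows = [(1, "A", "Q", 12), (2, "A", "Z", 10), (3, "B", "Z", 14),
        (4, "A", "Z", 20), (5, "B", "W", 20)]

def dense_rank(values):
    """Map each distinct value to its 1-based dense rank, ascending."""
    return {v: i for i, v in enumerate(sorted(set(values)), start=1)}

def rank_rows(rows):
    """Return (id, rank_group, rank_subgroup) for every row."""
    sumg = defaultdict(int)   # total value per group
    sumsg = defaultdict(int)  # total value per (group, subgroup)
    for _, g, sg, v in rows:
        sumg[g] += v
        sumsg[(g, sg)] += v
    group_rank = dense_rank((sumg[g], g) for g in sumg)
    sub_rank = {}
    for g in sumg:  # rank subgroups separately inside each group
        keys = [(sumsg[(gg, sg)], sg) for (gg, sg) in sumsg if gg == g]
        for key, r in dense_rank(keys).items():
            sub_rank[(g, key[1])] = r
    return [(i, group_rank[(sumg[g], g)], sub_rank[(g, sg)])
            for i, g, sg, _ in rows]

ranked = rank_rows(rows)
```

Every id appears exactly once in the output, so the row multiplication described in the question does not occur.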

Find the nth value in hive

I am trying to identify the Nth score value, which is also dependent on another variable.
For example, I want to see the nth transaction amount of each person. The issue I currently have is that my RANK does not restart the count of n at each name; it just continues down the output like a row count:
Syntax example:
SELECT name, txn_amount, dense_rank() over (order by name,txn_amount desc ) as nth_value FROM payment_table
Any help is greatly appreciated.
P.S I am using HIVE to run this if it helps

You need to partition by one value and order by the other:
SELECT name, txn_amount
FROM (SELECT pt.*,
dense_rank() over (partition by name order by txn_amount desc ) as nth_value
FROM payment_table pt
) pt
WHERE nth_value = X;
The subquery is needed to get a particular value. If you want multiple values in the same row, you can use GROUP BY:
SELECT name,
MAX(CASE WHEN nth_value = 1 THEN txn_amount END) as value_1,
MAX(CASE WHEN nth_value = 2 THEN txn_amount END) as value_2
FROM (SELECT pt.*,
dense_rank() over (partition by name order by txn_amount desc ) as nth_value
FROM payment_table pt
) pt
WHERE nth_value IN (1, 2)
GROUP BY name;
Note: DENSE_RANK() gives duplicate amounts the same rank, so duplicates do not use up a position. If you want duplicates counted separately (so the second value could be equal to the first), use ROW_NUMBER() instead.
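
The partition-then-rank behaviour, including how DENSE_RANK() treats duplicates, can be sketched in Python as follows. The payment rows are made up, and `dense_rank_per_name` and `nth_amount` are hypothetical helper names.

```python
from itertools import groupby

# Hypothetical (name, txn_amount) rows from payment_table
payments = [("ann", 50), ("ann", 50), ("ann", 30), ("bob", 20), ("bob", 10)]

def dense_rank_per_name(payments):
    """Return (name, amount, rank), ranking amounts descending within each name."""
    out = []
    ordered = sorted(payments, key=lambda p: (p[0], -p[1]))
    for name, txns in groupby(ordered, key=lambda p: p[0]):  # partition by name
        rank, prev = 0, None  # the rank restarts for every name
        for _, amount in txns:
            if amount != prev:  # equal amounts share a rank (dense rank)
                rank += 1
                prev = amount
            out.append((name, amount, rank))
    return out

def nth_amount(payments, n):
    """Return name -> nth highest distinct amount, for names that have one."""
    return {name: amount
            for name, amount, rank in dense_rank_per_name(payments)
            if rank == n}

second = nth_amount(payments, 2)
```

Here ann's two 50s both get rank 1, so her second value is 30.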
