postgresql cumsum by condition - sql

I have the table
I need to calculate cumsum group by id for every row with type="end".
Can anyone see the problem?
Output result

This is a little tricky. One method is to assign a grouping by reverse counting the ends. Then use dense_rank():
select t.*,
       dense_rank() over (order by grp desc) as result
from (select t.*,
             count(*) filter (where type = 'end') over (order by created desc) as grp
      from t
     ) t;
You can also do this without a subquery:
select t.*,
       (count(*) filter (where type = 'end') over () -
        count(*) filter (where type = 'end') over (order by created desc) +
        1
       ) as result
from t;
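To see how the reverse count forms the groups, here is a minimal PostgreSQL sketch with made-up inline data. The question's table is not shown, so the rows, the 'start'/'step' values and the dates are assumptions; only the type and created columns come from the queries above.

```sql
-- Hypothetical sample data: six rows, two of them marked 'end'.
with t (type, created) as (
    values ('start', date '2021-01-01'),
           ('end',   date '2021-01-02'),
           ('start', date '2021-01-03'),
           ('step',  date '2021-01-04'),
           ('end',   date '2021-01-05'),
           ('start', date '2021-01-06')
)
select s.*,
       -- turn the descending counts into consecutive group numbers
       dense_rank() over (order by grp desc) as result
from (select t.*,
             -- number of 'end' rows at or after the current row
             count(*) filter (where type = 'end') over (order by created desc) as grp
      from t
     ) s
order by created;
-- grp    (oldest to newest): 2, 2, 1, 1, 1, 0
-- result (oldest to newest): 1, 1, 2, 2, 2, 3
```

Every row up to and including an 'end' row gets the same number as that 'end' row, and rows after the last 'end' form a final group of their own.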

Related

SQL filter query results based on analytic function

I'd like to find an efficient way to filter my RANK() OVER function in SQL.
I have the following query:
SELECT
base.ITEM_SKU_NBR,
RANK() OVER (ORDER BY SUM(base.NET_SLS_AMT) DESC) AS SLS_rank,
RANK() OVER (ORDER BY COUNT(DISTINCT base.txn_id) DESC) AS txn_rank
FROM
`my_table` base
GROUP BY
1
Which returns this result set:
Now I'd like to filter for items where the SLS_rank is < 10 OR the txn_rank is < 10. Ideally I'd like to do this in the HAVING clause, like this:
SELECT
base.ITEM_SKU_NBR,
RANK() OVER (ORDER BY SUM(base.NET_SLS_AMT) DESC) AS SLS_rank,
RANK() OVER (ORDER BY COUNT(DISTINCT base.txn_id) DESC) AS txn_rank
FROM
`my_table` base
GROUP BY
1
HAVING
SLS_rank < 10 OR txn_rank < 10
But bigquery throws an error:
Column SLS_rank contains an analytic function, which is not allowed in HAVING clause at [9:8]
The only option I can think of is to create this as a separate table and select from there, but that doesn't seem very pretty. Any other ideas on how to do this?
Update June 2021.
BigQuery announced support for the QUALIFY clause on the 10th of May, 2021.
The QUALIFY clause filters the results of analytic functions. An analytic function is required to be present in the QUALIFY clause or the SELECT list.
What you need can be achieved with QUALIFY in the following way:
SELECT
base.ITEM_SKU_NBR,
RANK() OVER (ORDER BY SUM(base.NET_SLS_AMT) DESC) AS SLS_rank,
RANK() OVER (ORDER BY COUNT(DISTINCT base.txn_id) DESC) AS txn_rank
FROM `my_table` base
GROUP BY 1
QUALIFY SLS_rank < 10 OR txn_rank < 10
Find more examples in the documentation.
SELECT * FROM (
SELECT
base.ITEM_SKU_NBR,
RANK() OVER (ORDER BY SUM(base.NET_SLS_AMT) DESC) AS SLS_rank,
RANK() OVER (ORDER BY COUNT(DISTINCT base.txn_id) DESC) AS txn_rank
FROM `my_table` base
GROUP BY 1
)
WHERE SLS_rank < 300 OR txn_rank < 300

How to find most frequent value in SQL column and return that value?

I was trying to do something like this:
select nume_produs
from incasari
group by id
having count(nume_produs) = max(count(nume_produs));
but it doesn't work
Do a GROUP BY. Order by count descending. Fetch the first row (highest count) only.
select nume_produs, count(*) as cnt
from incasari
group by nume_produs
order by cnt desc
fetch first 1 row with ties
For the most common value in the column:
select nume_produs
from (select nume_produs, count(*) as cnt,
             row_number() over (order by count(*) desc) as seqnum
      from incasari
      group by nume_produs
     ) i
where seqnum = 1;
If you want multiple values in the event of duplicates, use rank() instead of row_number().
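As a sketch of that rank() variant, using the table and column names from the question and ordering by the count in descending order so that the most frequent values get rank 1:

```sql
select nume_produs, cnt
from (select nume_produs, count(*) as cnt,
             -- rank() gives every value tied for the highest count the same rank
             rank() over (order by count(*) desc) as seqnum
      from incasari
      group by nume_produs
     ) i
where seqnum = 1;
```

If two products share the highest count, both rows come back, whereas row_number() would pick one of them arbitrarily.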
If you want the most common value per id, then add partition by:
select id, nume_produs
from (select id, nume_produs, count(*) as cnt,
             row_number() over (partition by id order by count(*) desc) as seqnum
      from incasari
      group by id, nume_produs
     ) i
where seqnum = 1;
SELECT `nume_produs`,
COUNT(`nume_produs`) AS `value_occurrence`
FROM `incasari`
GROUP BY `nume_produs`
ORDER BY `value_occurrence` DESC
LIMIT 1;
Increase the LIMIT value if you want to see the N most common values of the column.

listagg(distinct column) over()

Any ideas for supported alternatives to listagg(distinct column) over()? I'd like something that does not require grouping by the rest of the columns; I have 20+.
You can use a subquery with row_number() to identify the first value to include in the listagg(), such as:
select listagg(case when seqnum = 1 then column end) within group (order by column) over (order by ?)
from (select t.*, row_number() over (partition by column order by column) as seqnum
from t
) t

SQL Finding five largest numbers instead of one Max in a table

I have a table and I need to run a query that contains some aggregate functions such as maximum, average, standard deviation, and so on,
but instead of a single maximum it should return the 5 largest values.
The simplified query is something like this:
SELECT OSI_KEY , MAX(VALUE) , AVG(VALUE) , STDDEV(VALUE), variance(VALUE)
FROM DATA_VALUES_5MIN_6_2013
GROUP BY OSI_KEY
ORDER BY OSI_KEY
and I need some Magical ;) Query like this:
SELECT OSI_KEY , MAX1(VALUE) ,MAX2(VALUE) ,MAX3(VALUE) ,MAX4(VALUE) , MAX5(VALUE) ,
AVG(VALUE) , STDDEV(VALUE), variance(VALUE)
FROM DATA_VALUES_5MIN_6_2013
GROUP BY OSI_KEY
ORDER BY OSI_KEY
I appreciate your considerations.
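Oracle has an NTH_VALUE() function. Unfortunately, it is only an analytic function and not an aggregation function. This leads to the strange construct of SELECT DISTINCT with a bunch of analytic functions. Note the explicit window frame: with an ORDER BY, the default frame ends at the current row, so NTH_VALUE() would return NULL on the first rows of each partition and SELECT DISTINCT would no longer collapse them into one row per key:
SELECT DISTINCT OSI_KEY,
       MAX(VALUE) OVER (PARTITION BY OSI_KEY) as MAX_1,
       NTH_VALUE(VALUE, 2) OVER (PARTITION BY OSI_KEY ORDER BY VALUE DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as MAX_2,
       NTH_VALUE(VALUE, 3) OVER (PARTITION BY OSI_KEY ORDER BY VALUE DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as MAX_3,
       NTH_VALUE(VALUE, 4) OVER (PARTITION BY OSI_KEY ORDER BY VALUE DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as MAX_4,
       NTH_VALUE(VALUE, 5) OVER (PARTITION BY OSI_KEY ORDER BY VALUE DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as MAX_5,
       AVG(VALUE) OVER (PARTITION BY OSI_KEY),
       STDDEV(VALUE) OVER (PARTITION BY OSI_KEY),
       variance(VALUE) OVER (PARTITION BY OSI_KEY)
FROM DATA_VALUES_5MIN_6_2013
ORDER BY OSI_KEY;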
You can also do this using conditional aggregation, with a row_number() or dense_rank() in a subquery.
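A minimal sketch of that conditional-aggregation approach, using the table and column names from the question. The MAX_1 to MAX_5 and seqnum names are illustrative, and NULLS LAST is an added assumption so that NULL values (which Oracle sorts first in descending order) do not take the top slots:

```sql
SELECT OSI_KEY,
       -- each CASE picks out exactly one ranked row per key
       MAX(CASE WHEN seqnum = 1 THEN VALUE END) AS MAX_1,
       MAX(CASE WHEN seqnum = 2 THEN VALUE END) AS MAX_2,
       MAX(CASE WHEN seqnum = 3 THEN VALUE END) AS MAX_3,
       MAX(CASE WHEN seqnum = 4 THEN VALUE END) AS MAX_4,
       MAX(CASE WHEN seqnum = 5 THEN VALUE END) AS MAX_5,
       -- the ordinary aggregates still see every row of the group
       AVG(VALUE), STDDEV(VALUE), VARIANCE(VALUE)
FROM (SELECT d.*,
             ROW_NUMBER() OVER (PARTITION BY OSI_KEY
                                ORDER BY VALUE DESC NULLS LAST) AS seqnum
      FROM DATA_VALUES_5MIN_6_2013 d
     ) d
GROUP BY OSI_KEY
ORDER BY OSI_KEY;
```

With ROW_NUMBER() a value that occurs twice can fill two of the five slots; swapping in DENSE_RANK() returns the five largest distinct values instead.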
SELECT OSI_KEY, MaxValue FROM (
    SELECT OSI_KEY, MAX(VALUE) AS MaxValue FROM DATA_VALUES_5MIN_6_2013 GROUP BY OSI_KEY
)
ORDER BY MaxValue DESC
FETCH FIRST 5 ROWS ONLY;

How to get single closest value for each column type in DB2

I have this query:
SELECT * FROM TABLE1 WHERE KEY_COLUMN='NJCRF' AND TYPE_COLUMN IN ('SCORE1', 'SCORE2', 'SCORE3') AND DATE_EFFECTIVE_COLUMN<='2016-09-17'
I get about 12 records (rows) as a result.
How do I get the result closest to DATE_EFFECTIVE_COLUMN for each TYPE_COLUMN? In this case, how do I get three records, one for each type, that are closest to the effective date?
UPDATE: I could use TOP if I only had a single type, but I have three at the moment, and for each of them I need the result closest in time.
I hope I made it clear; let me know if you need more info.
If I understand correctly, you can use ROW_NUMBER():
SELECT t.*
FROM (SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY TYPE_COLUMN ORDER BY DATE_EFFECTIVE_COLUMN DESC) as seqnum
FROM TABLE1 t
WHERE KEY_COLUMN = 'NJCRF' AND
TYPE_COLUMN IN ('SCORE1', 'SCORE2', 'SCORE3') AND
DATE_EFFECTIVE_COLUMN <= '2016-09-17'
) t
WHERE seqnum = 1;
If you want three records per type, just use seqnum <= 3.
I like ROW_NUMBER() for this. You want to partition by TYPE, which will start the row count over for each type, then order by DATE_EFFECTIVE desc, and take only the highest date (the first row):
SELECT *
FROM (
    SELECT T.*,
           ROW_NUMBER() OVER (PARTITION BY TYPE_COLUMN ORDER BY DATE_EFFECTIVE_COLUMN DESC) AS RN
    FROM TABLE1 T
    WHERE KEY_COLUMN = 'NJCRF'
      AND TYPE_COLUMN IN ('SCORE1', 'SCORE2', 'SCORE3')
      AND DATE_EFFECTIVE_COLUMN <= '2016-09-17'
) A
WHERE RN = 1