Rank rows in a column under conditions on a different column - sql

I have the following dataset:
id | date | state
-----------------------
1 | 01/01/17 | high
1 | 02/01/17 | high
1 | 03/01/17 | high
1 | 04/01/17 | miss
1 | 05/01/17 | high
2 | 01/01/17 | miss
2 | 02/01/17 | high
2 | 03/01/17 | high
2 | 04/01/17 | miss
2 | 05/01/17 | miss
2 | 06/01/17 | high
I want to create a column rank_state which, within each group of id, ranks the entries in increasing date order (starting from rank 0), counting only entries whose state is not "miss". An entry with state "miss" repeats the rank of the previous entry. The output should look like:
id | date | state | rank_state
------------------------------------
1 | 01/01/17 | high | 0
1 | 02/01/17 | high | 1
1 | 03/01/17 | high | 2
1 | 04/01/17 | miss | 2
1 | 05/01/17 | high | 3
2 | 01/01/17 | miss | 0
2 | 02/01/17 | high | 0
2 | 03/01/17 | high | 1
2 | 04/01/17 | miss | 1
2 | 05/01/17 | miss | 1
2 | 06/01/17 | high | 2
For example, the 4th row has a rank of 2 since its state is "miss", i.e. it repeats the rank of row 3 (the same applies to rows 9 and 10). Please note that rows 6 and 7 should have rank 0.
I have tried the following:
,(case when state is not in ('miss') then (rank() over (partition by id order by date desc) - 1) end) as state_rank
and
,rank() over (partition by id order by case when state is not in ('miss') then date end) as state_rank
but neither gives me the desired result. Any ideas would be very helpful.

More than likely you want:
SELECT *,
GREATEST(
COUNT(case when state != 'miss' then 1 else null end)
OVER(PARTITION BY id ORDER BY date) - 1,
0
) as "state_rank"
FROM tbl;
Basically:
partition the window over id
count only the rows whose state isn't 'miss'
because the count minus one can be negative at the start of a partition (when it begins with 'miss' rows), wrap it in GREATEST to floor the result at 0
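The steps above can be checked end to end. Below is a minimal, runnable sketch using SQLite (3.25+ for window functions) as a stand-in; SQLite has no GREATEST, so its two-argument MAX() scalar plays the same role. Table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl (id INTEGER, date TEXT, state TEXT);
    INSERT INTO tbl VALUES
      (1,'2017-01-01','high'),(1,'2017-01-02','high'),(1,'2017-01-03','high'),
      (1,'2017-01-04','miss'),(1,'2017-01-05','high'),
      (2,'2017-01-01','miss'),(2,'2017-01-02','high'),(2,'2017-01-03','high'),
      (2,'2017-01-04','miss'),(2,'2017-01-05','miss'),(2,'2017-01-06','high');
""")
rows = conn.execute("""
    SELECT id, date, state, MAX(c - 1, 0) AS state_rank
    FROM (
        SELECT id, date, state,
               -- running count of non-'miss' rows per id, in date order
               COUNT(CASE WHEN state != 'miss' THEN 1 END)
                   OVER (PARTITION BY id ORDER BY date) AS c
        FROM tbl
    )
    ORDER BY id, date
""").fetchall()
# state_rank comes out as 0,1,2,2,3 for id 1 and 0,0,1,1,1,2 for id 2
```

The inner query computes the conditional running count; the outer query subtracts one and floors it at zero.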

Just add a frame clause to vol7ron's answer, since Redshift requires it:
select *,
       GREATEST(COUNT(case when state != 'miss' then 1 else null end)
                OVER (PARTITION BY id ORDER BY date
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) - 1, 0) as state_rank
from tbl;

Related

Running maths over an entire database and ranking all users

I have a database of bets. Each bet has a 'Win', 'Loss', or 'Pending' state. What I want is an SQL statement that gets the last, say, 20 bets a user has placed and finds their ROI (Total profit / Total staked * 100).
So I'm just wondering if there is a better way to do this. Do I basically have to fetch the users table, loop over every user, get their last 20 bets, find the ROI, and then order the results? If my User table gets huge, this process is going to take ages, right?
Is creating a 'View' going to save on this time?
Is there a way to do this in one statement that won't cost my life in processing time?
Here are the tables
Users
| ID | User |
| 1 | Test1 |
| 2 | Test2 |
| 3 | Test3 |
| 4 | Test4 |
Bets
| ID | User | Amount | Odds | Result |
| 1 | 1 | 10 | 1.35 | Win |
| 2 | 1 | 25 | 2.55 | Win |
| 3 | 3 | 15 | 1.65 | Loss |
| 4 | 2 | 11 | 2.12 | Pending |
So essentially I would like a table that ranks them by ROI.
| User | AmountBet | AmountWon | ROI |
| 1 | 35 | 77 | 215 |
| 2 | 11 | 0 | 0 |
| 3 | 15 | 0 | 0 |
| 4 | 0 | 0 | 0 |
Assuming the ID of the bets table represents increasing time, such that it can be used to identify the "last 20", then:
WITH b AS (
    SELECT id,
           user,
           CASE WHEN result = 'Pending' THEN 0 ELSE amount END AS amount,
           CASE WHEN result = 'Win' THEN amount * odds ELSE 0 END AS winnings,
           ROW_NUMBER() OVER (PARTITION BY user ORDER BY id DESC) AS rownum
    FROM bets
)
SELECT user,
       SUM(amount) AS amount_bet,
       SUM(winnings) AS amount_won,
       CASE
           WHEN SUM(amount) > 0
           THEN SUM(winnings) * 100 / SUM(amount)
           ELSE 0
       END AS roi
FROM b
WHERE rownum < 21
GROUP BY user;
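A runnable sketch of this query, using SQLite as a stand-in and the sample data from the question. One caveat: users with no bets (user 4) don't appear in the result; including them with zeros, as in the desired output, would take a LEFT JOIN from the users table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bets (id INTEGER, user INTEGER, amount REAL, odds REAL, result TEXT);
    INSERT INTO bets VALUES
      (1, 1, 10, 1.35, 'Win'),
      (2, 1, 25, 2.55, 'Win'),
      (3, 3, 15, 1.65, 'Loss'),
      (4, 2, 11, 2.12, 'Pending');
""")
rows = conn.execute("""
    WITH b AS (
        SELECT user,
               CASE WHEN result = 'Pending' THEN 0 ELSE amount END AS amount,
               CASE WHEN result = 'Win' THEN amount * odds ELSE 0 END AS winnings,
               ROW_NUMBER() OVER (PARTITION BY user ORDER BY id DESC) AS rownum
        FROM bets
    )
    SELECT user,
           SUM(amount) AS amount_bet,
           SUM(winnings) AS amount_won,
           CASE WHEN SUM(amount) > 0
                THEN SUM(winnings) * 100 / SUM(amount)
                ELSE 0 END AS roi
    FROM b
    WHERE rownum < 21
    GROUP BY user
    ORDER BY user
""").fetchall()
# user 1: bet 35, won 77.25 (10*1.35 + 25*2.55); users 2 and 3 won nothing
```

Note that the exact figures differ slightly from the rounded numbers in the question's example table.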

How to find longest subsequence based on conditions in Impala SQL

I have a SQL table on Impala that contains ID, dt (monthly, with no skipped months), and the status of each person ID. I want to check how long each ID has been in each status (my expected answer is shown in the expected column).
I tried to solve this problem on the value column by using
count(status) over (partition by ID, status order by dt)
but it doesn't reset the value when the status is changed.
+------+------------+--------+-------+----------+
| ID | dt | status | value | expected |
+------+------------+--------+-------+----------+
| 0001 | 01/01/2020 | 0 | 1 | 1 |
| 0001 | 01/02/2020 | 0 | 2 | 2 |
| 0001 | 01/03/2020 | 1 | 1 | 1 |
| 0001 | 01/04/2020 | 1 | 2 | 2 |
| 0001 | 01/05/2020 | 1 | 3 | 3 |
| 0001 | 01/06/2020 | 0 | 3 | 1 |
| 0001 | 01/07/2020 | 1 | 4 | 1 |
| 0001 | 01/08/2020 | 1 | 5 | 2 |
+------+------------+--------+-------+----------+
Is there any way to reset the counter when the status changes?
When you partition by ID and status, two groups are formed for the status values 0 and 1. The months 1, 2, and 6 go into the first group (status 0) and the months 3, 4, 5, 7, and 8 go into the second group (status 1). The count function then counts rows within each of those groups, so the first group counts from 1 to 3 and the second from 1 to 5. In other words, this query doesn't account for changes in status; it simply splits the record set by distinct status values.
One approach would be to divide the records into different blocks where each status change starts a new block. The below query follows this approach and gives the expected result:
SELECT ID, dt, status,
       COUNT(status) OVER (PARTITION BY ID, block_number ORDER BY dt) AS value
FROM (
    SELECT ID, dt, status,
           SUM(change_in_status) OVER (PARTITION BY ID ORDER BY dt) AS block_number
    FROM (
        SELECT ID, dt, status,
               CASE
                   WHEN status <> LAG(status) OVER (PARTITION BY ID ORDER BY dt)
                        OR LAG(status) OVER (PARTITION BY ID ORDER BY dt) IS NULL
                   THEN 1
                   ELSE 0
               END AS change_in_status
        FROM statuses
    ) derive_status_changes
) derive_blocks;
Here is a working example in DB Fiddle.
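The same block-numbering query can be exercised locally; here it is run on SQLite as a stand-in for Impala (both support LAG, SUM, and COUNT as window functions), with dates rewritten as ISO strings so they sort correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE statuses (ID TEXT, dt TEXT, status INTEGER);
    INSERT INTO statuses VALUES
      ('0001','2020-01-01',0),('0001','2020-02-01',0),
      ('0001','2020-03-01',1),('0001','2020-04-01',1),('0001','2020-05-01',1),
      ('0001','2020-06-01',0),('0001','2020-07-01',1),('0001','2020-08-01',1);
""")
rows = conn.execute("""
    SELECT ID, dt, status,
           -- count within each (ID, block) island
           COUNT(status) OVER (PARTITION BY ID, block_number ORDER BY dt) AS value
    FROM (
        SELECT ID, dt, status,
               -- running total of status changes = block number
               SUM(change_in_status) OVER (PARTITION BY ID ORDER BY dt) AS block_number
        FROM (
            SELECT ID, dt, status,
                   CASE WHEN status <> LAG(status) OVER (PARTITION BY ID ORDER BY dt)
                          OR LAG(status) OVER (PARTITION BY ID ORDER BY dt) IS NULL
                        THEN 1 ELSE 0 END AS change_in_status
            FROM statuses
        ) derive_status_changes
    ) derive_blocks
    ORDER BY ID, dt
""").fetchall()
# value column comes out as 1,2,1,2,3,1,1,2 -- matching the expected column
```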

In Redshift, how do I run the opposite of a SUM function

Assuming I have a data table
date | user_id | user_last_name | order_id | is_new_session
------------+------------+----------------+-----------+---------------
2014-09-01 | A | B | 1 | t
2014-09-01 | A | B | 5 | f
2014-09-02 | A | B | 8 | t
2014-09-01 | B | B | 2 | t
2014-09-02 | B | test | 3 | t
2014-09-03 | B | test | 4 | t
2014-09-04 | B | test | 6 | t
2014-09-04 | B | test | 7 | f
2014-09-05 | B | test | 9 | t
2014-09-05 | B | test | 10 | f
I want to get another column in Redshift which basically assigns session numbers to each users session. It starts at 1 for the first record for each user and as you move further down, if it encounters a true in the "is_new_session" column, it increments. Stays the same if it encounters a false. If it hits a new user, the value resets to 1. The ideal output for this table would be:
1
1
2
1
2
3
4
4
5
5
In my mind it's kind of the opposite of a SUM(1) over (Partition BY user_id, is_new_session ORDER BY user_id, date ASC)
Any ideas?
Thanks!
I think you want an incremental sum:
select t.*,
sum(case when is_new_session then 1 else 0 end) over (partition by user_id order by date) as session_number
from t;
In Redshift, you might need an explicit window frame clause:
select t.*,
sum(case when is_new_session then 1 else 0 end) over
(partition by user_id
order by date
rows between unbounded preceding and current row
) as session_number
from t;
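A runnable sketch of the incremental sum, on SQLite with the question's data. One assumption worth flagging: order_id is added to the ORDER BY as a tie-breaker, since several rows share a date and ordering by date alone would leave ties nondeterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (date TEXT, user_id TEXT, order_id INTEGER, is_new_session INTEGER);
    INSERT INTO t VALUES
      ('2014-09-01','A',1,1),('2014-09-01','A',5,0),('2014-09-02','A',8,1),
      ('2014-09-01','B',2,1),('2014-09-02','B',3,1),('2014-09-03','B',4,1),
      ('2014-09-04','B',6,1),('2014-09-04','B',7,0),
      ('2014-09-05','B',9,1),('2014-09-05','B',10,0);
""")
rows = conn.execute("""
    SELECT t.*,
           SUM(CASE WHEN is_new_session THEN 1 ELSE 0 END) OVER (
               PARTITION BY user_id
               ORDER BY date, order_id
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS session_number
    FROM t
    ORDER BY user_id, date, order_id
""").fetchall()
# session_number: 1,1,2 for user A, then 1,2,3,4,4,5,5 for user B
```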

Select dynamic couples of lines in SQL (PostgreSQL)

My objective is to make dynamic groups of rows (a product being a TYPE and a COLOR).
I don't know if it's possible with just one select query.
But: I want to build groups of rows as per the NB_PER_GROUP column, assigning rows to groups in date order (ORDER BY DATE).
A product left alone in its group (e.g. NB_PER_GROUP = 2 but only one row remaining) is excluded from the final result.
Table :
-----------------------------------------------
NUM | TYPE | COLOR | NB_PER_GROUP | DATE
-----------------------------------------------
0 | 1 | 1 | 2 | ...
1 | 1 | 1 | 2 |
2 | 1 | 2 | 2 |
3 | 1 | 2 | 2 |
4 | 1 | 1 | 2 |
5 | 1 | 1 | 2 |
6 | 4 | 1 | 3 |
7 | 1 | 1 | 2 |
8 | 4 | 1 | 3 |
9 | 4 | 1 | 3 |
10 | 5 | 1 | 2 |
Results :
------------------------
GROUP_NUMBER | NUM |
------------------------
0 | 0 |
0 | 1 |
~~~~~~~~~~~~~~~~~~~~~~~~
1 | 2 |
1 | 3 |
~~~~~~~~~~~~~~~~~~~~~~~~
2 | 4 |
2 | 5 |
~~~~~~~~~~~~~~~~~~~~~~~~
3 | 6 |
3 | 8 |
3 | 9 |
If you have another way to solve this problem, I will accept it.
What about something like this?
select max(gn.group_number) group_number, ip.num
from products ip
join (
select date, type, color, row_number() over (order by date) - 1 group_number
from (
select op.num, op.type, op.color, op.nb_per_group, op.date, (row_number() over (partition by op.type, op.color order by op.date) - 1) % nb_per_group group_order
from products op
) sq
where sq.group_order = 0
) gn
on ip.type = gn.type
and ip.color = gn.color
and ip.date >= gn.date
group by ip.num
order by group_number, ip.num
This may only work if your nb_per_group values are the same for each combination of type and color. It may also require unique dates, but that could probably be worked around if required.
The innermost subquery partitions the rows by type and color, orders them by date, then calculates the row numbers modulo nb_per_group; this forms a 0-based count for the group that resets to 0 each time nb_per_group is exceeded.
The next-level subquery finds all of the 0 values we mapped in the lower subquery and assigns group numbers to them.
Finally, the outermost query ties each row in the products table to a group number, calculated as the highest group number that split off before this product's date.
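Here is a runnable sketch of that query on SQLite. The question elides the DATE values, so num doubles as the date here (any unique, increasing values work). Note a caveat: the trailing incomplete groups (num 7 and num 10) still receive group numbers 4 and 5; filtering out groups smaller than NB_PER_GROUP, as the question requires, would need an extra counting step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (num INTEGER, type INTEGER, color INTEGER,
                           nb_per_group INTEGER, date INTEGER);
    INSERT INTO products VALUES
      (0,1,1,2,0),(1,1,1,2,1),(2,1,2,2,2),(3,1,2,2,3),(4,1,1,2,4),
      (5,1,1,2,5),(6,4,1,3,6),(7,1,1,2,7),(8,4,1,3,8),(9,4,1,3,9),
      (10,5,1,2,10);
""")
rows = conn.execute("""
    SELECT MAX(gn.group_number) AS group_number, ip.num
    FROM products ip
    JOIN (
        -- rows where a new group starts, numbered 0,1,2,... by date
        SELECT date, type, color,
               ROW_NUMBER() OVER (ORDER BY date) - 1 AS group_number
        FROM (
            SELECT op.type, op.color, op.date,
                   (ROW_NUMBER() OVER (PARTITION BY op.type, op.color
                                       ORDER BY op.date) - 1)
                       % op.nb_per_group AS group_order
            FROM products op
        ) sq
        WHERE sq.group_order = 0
    ) gn ON ip.type = gn.type AND ip.color = gn.color AND ip.date >= gn.date
    GROUP BY ip.num
    ORDER BY group_number, ip.num
""").fetchall()
# groups 0..3 match the expected output; 4 and 5 are the incomplete leftovers
```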

SQL rank with priority

Still learning SQL and would greatly appreciate any help or advice on this one. I have a table with a value column and two ID columns that specify which group that row belongs to, i.e:
value | GroupA | GroupB
12 | 1 | 0
16 | 1 | 0
19 | 0 | 1
11 | 1 | 0
30 | 0 | 1
16 | 0 | 1
I would like to order this table in a descending order, but give ranking priority to those rows with 1 in group A before ranking those in group B. The output should look something like this.
value | GroupA | GroupB | Rank
12 | 1 | 0 | 2
16 | 1 | 0 | 1
19 | 0 | 1 | 5
11 | 1 | 0 | 3
30 | 0 | 1 | 4
16 | 0 | 1 | 6
I fully agree with TimSchmelter: you shouldn't store groups in bit columns. With your current schema the query could look like
select
Value, GroupA, GroupB,
row_number() over(order by GroupA desc, value desc) as [Rank]
from Table1
but if you add more groups in the future, you will have to write a CASE expression inside the OVER clause.
Try this:
select *, ROW_NUMBER() OVER (ORDER BY groupA*100 + value DESC) as Rank
from Ranks
order by groupA desc, value desc
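Both answers boil down to ordering by GroupA first. Here is a sketch of the first (and safer) variant on SQLite; note that the groupA*100+value trick in the second answer only works while value stays below 100.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (value INTEGER, GroupA INTEGER, GroupB INTEGER);
    INSERT INTO Table1 VALUES
      (12,1,0),(16,1,0),(19,0,1),(11,1,0),(30,0,1),(16,0,1);
""")
rows = conn.execute("""
    SELECT value, GroupA, GroupB,
           -- GroupA rows first, then by value, both descending
           ROW_NUMBER() OVER (ORDER BY GroupA DESC, value DESC) AS rnk
    FROM Table1
    ORDER BY rnk
""").fetchall()
# ranks 1-3 go to the GroupA rows (16, 12, 11), 4-6 to GroupB (30, 19, 16)
```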