I'm trying to rank two columns. Both columns are numeric: one represents an X coordinate and the other a Y coordinate.
My select statement is as follows:
SELECT
    DENSE_RANK() OVER (PARTITION BY X, uniqueid ORDER BY Y ASC) AS Y_Rank,
    DENSE_RANK() OVER (PARTITION BY Y, uniqueid ORDER BY X ASC) AS X_Rank,
    [uniqueid],
    [X],
    [Y]
FROM xxxx.xxxxx
The idea is that the statement above would generate an X/Y coordinate when the two ranks are concatenated. However, my result set is not producing unique coordinates: if two rows have the same X but different Y values, I get X=1, Y=1 for both rows rather than X=1, Y=1 and X=1, Y=2.
My first question is why is this happening, or what am I doing wrong? Secondly, it appears that RANK()/DENSE_RANK() doesn't look beyond the decimal point; is that true? I've cast the values to int to rule out any float comparison issues.
+--------+--------+-----+----+--------------------------------------+------------------+
| X_Rank | Y_Rank | X | Y | uniqueid | Slot_Side |
+--------+--------+-----+----+--------------------------------------+------------------+
| 1 | 1 | 29 | 4 | 00000000-0000-0000-0000-fffff27fdf27 | 1 |
| 1 | 2 | 29 | 45 | 00000000-0000-0000-0000-fffff27fdf27 | 1 |
| 1 | 1 | 52 | 6 | 00000000-0000-0000-0000-fffff27fdf2d | 1 |
| 2 | 1 | 236 | 6 | 00000000-0000-0000-0000-fffff27fdf2d | 1 |
| 1 | 1 | 33 | 45 | 00000000-0000-0000-0000-fffff27fe073 | 0 |
| 1 | 1 | 5 | 3 | 00000000-0000-0000-0000-fffff27fe073 | 0 |
| 2 | 1 | 55 | 3 | 00000000-0000-0000-0000-fffff27fe073 | 0 |
| 2 | 1 | 83 | 45 | 00000000-0000-0000-0000-fffff27fe073 | 0 |
| 3 | 1 | 133 | 45 | 00000000-0000-0000-0000-fffff27fe073 | 0 |
| 3 | 1 | 105 | 3 | 00000000-0000-0000-0000-fffff27fe073 | 0 |
| 4 | 1 | 155 | 3 | 00000000-0000-0000-0000-fffff27fe073 | 0 |
| 4 | 1 | 183 | 45 | 00000000-0000-0000-0000-fffff27fe073 | 0 |
| 5 | 1 | 233 | 45 | 00000000-0000-0000-0000-fffff27fe073 | 0 |
| 5 | 1 | 205 | 3 | 00000000-0000-0000-0000-fffff27fe073 | 0 |
| 6 | 1 | 255 | 3 | 00000000-0000-0000-0000-fffff27fe073 | 0 |
| 6 | 1 | 283 | 45 | 00000000-0000-0000-0000-fffff27fe073 | 0 |
| 7 | 1 | 333 | 45 | 00000000-0000-0000-0000-fffff27fe073 | 0 |
| 7 | 1 | 305 | 3 | 00000000-0000-0000-0000-fffff27fe073 | 0 |
| 8 | 1 | 355 | 3 | 00000000-0000-0000-0000-fffff27fe073 | 0 |
| 8 | 1 | 383 | 45 | 00000000-0000-0000-0000-fffff27fe073 | 0 |
| 1 | 1 | 5 | 2 | 00000000-0000-0000-0000-fffff27fe074 | 0 |
| 2 | 1 | 41 | 2 | 00000000-0000-0000-0000-fffff27fe074 | 0 |
| 3 | 1 | 77 | 2 | 00000000-0000-0000-0000-fffff27fe074 | 0 |
| 4 | 1 | 113 | 2 | 00000000-0000-0000-0000-fffff27fe074 | 0 |
| 5 | 1 | 149 | 2 | 00000000-0000-0000-0000-fffff27fe074 | 0 |
| 6 | 1 | 185 | 2 | 00000000-0000-0000-0000-fffff27fe074 | 0 |
| 7 | 1 | 221 | 2 | 00000000-0000-0000-0000-fffff27fe074 | 0 |
| 8 | 1 | 257 | 2 | 00000000-0000-0000-0000-fffff27fe074 | 0 |
| 9 | 1 | 293 | 2 | 00000000-0000-0000-0000-fffff27fe074 | 0 |
+--------+--------+-----+----+--------------------------------------+------------------+
So I found the flaw in my logic.
Here is the corrected statement:
SELECT
    DENSE_RANK() OVER (PARTITION BY Y, uniqueid ORDER BY X) AS X_Rank,
    DENSE_RANK() OVER (PARTITION BY uniqueid ORDER BY Y) AS Y_Rank,
    [uniqueid],
    [X],
    [Y]
FROM xxxx.xxxxx
I was tricked into thinking my X_Rank expression was working, when really it was a coincidence given the data I was using. The corrected statement gives X_Rank the right meaning by partitioning on Y: what I really care about is where the Y value changes in relation to X, as that indicates the start of a new "row". My original statement treated X and Y symmetrically, which returned undesired results. Once X_Rank captured that relationship, Y_Rank only needs to order by Y within each uniqueid rather than consider what Y means in relation to X. Sorry for the poor explanation, but at least it works. Thanks for the help regardless!
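As a sanity check (a minimal sketch of my own, reusing the placeholder table name from the question), grouping on the two ranks should return no rows once every (X_Rank, Y_Rank) pair is unique per uniqueid:
SELECT uniqueid, X_Rank, Y_Rank, COUNT(*) AS dupes
FROM (
    SELECT
        DENSE_RANK() OVER (PARTITION BY Y, uniqueid ORDER BY X) AS X_Rank,
        DENSE_RANK() OVER (PARTITION BY uniqueid ORDER BY Y) AS Y_Rank,
        [uniqueid]
    FROM xxxx.xxxxx
) r
GROUP BY uniqueid, X_Rank, Y_Rank
HAVING COUNT(*) > 1;   -- any row returned here is a duplicated coordinate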
Related
I would like to calculate the Theil–Sen estimator per ID for the value column in the sample table below using Hive. The Theil–Sen estimator is defined here: https://en.wikipedia.org/wiki/Theil%E2%80%93Sen_estimator. I tried to use arrays but could not figure out a solution. Any help is appreciated.
+----+-------+-------+
| id |   x   | value |
+----+-------+-------+
| 1 | 1 | 10 |
| 1 | 2 | 20 |
| 1 | 3 | 30 |
| 1 | 4 | 40 |
| 1 | 5 | 50 |
| 2 | 1 | 100 |
| 2 | 2 | 90 |
| 2 | 3 | 102 |
| 2 | 4 | 75 |
| 2 | 5 | 70 |
| 2 | 6 | 50 |
| 2 | 7 | 100 |
| 2 | 8 | 80 |
| 2 | 9 | 60 |
| 2 | 10 | 50 |
| 2 | 11 | 40 |
| 2 | 12 | 40 |
+----+-------+-------+
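Not from the original thread, but since the Theil–Sen slope is just the median of all pairwise slopes, one possible Hive sketch is a per-id self-join (assuming the columns are named id, x, and value and the table is called sample_table; percentile_approx is Hive's built-in approximate median):
SELECT s.id,
       PERCENTILE_APPROX(s.slope, 0.5) AS theil_sen_slope   -- median of all pairwise slopes
FROM (
    SELECT a.id,
           (b.value - a.value) / (b.x - a.x) AS slope
    FROM sample_table a
    JOIN sample_table b ON a.id = b.id
    WHERE a.x < b.x    -- each unordered pair once; also avoids division by zero
) s
GROUP BY s.id;
The self-join is quadratic in the number of points per id, which is fine for small groups like the sample above but worth keeping in mind for large ones.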
Good day. I am running a Redshift DB and have an events-log table with a condition column. I split the events into session starts (bool = 1) and session continuations (bool = 0), like this:
=======================
| ID | BOOL |
=======================
| 1 | 0 |
| 2 | 1 |
| 3 | 0 |
| 4 | 0 |
| 5 | 0 |
| 6 | 0 |
| 7 | 0 |
| 8 | 0 |
| 9 | 0 |
| 10 | 0 |
| 11 | 1 |
| 12 | 0 |
| 13 | 0 |
| 14 | 1 |
| 15 | 0 |
| 16 | 0 |
=======================
I need to create a session_id column with something like DENSE_RANK:
================================
| ID | BOOL | D_RANK |
================================
| 1 | 0 | 1 |
| 2 | 1 | 2 |
| 3 | 0 | 2 |
| 4 | 0 | 2 |
| 5 | 0 | 2 |
| 6 | 0 | 2 |
| 7 | 0 | 2 |
| 8 | 0 | 2 |
| 9 | 0 | 2 |
| 10 | 0 | 2 |
| 11 | 1 | 3 |
| 12 | 0 | 3 |
| 13 | 0 | 3 |
| 14 | 1 | 4 |
| 15 | 0 | 4 |
| 16 | 0 | 4 |
================================
Is there any option to do this? Would appreciate any help.
Use a cumulative sum. Assuming that bool = 1 marks the start of a new session (note that Redshift requires an explicit frame clause when a window aggregate has an ORDER BY):
select t.*,
       sum(bool) over (order by id rows unbounded preceding) as session_id
from t;
Note: This will start at 0 for rows before the first session start. You can add 1 if you need.
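To match the expected D_RANK column exactly (it starts at 1 even before the first bool = 1 row), the same query with the +1 offset applied:
select t.*,
       sum(bool) over (order by id rows unbounded preceding) + 1 as session_id
from t;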
In the query below, I don't get the results I would expect. Any insights why? How could I reformulate the query to get the desired results?
Schema (SQLite v3.30)
WITH RECURSIVE
cnt(x, y) AS (VALUES (0, ABS(RANDOM() % 3))
    UNION ALL SELECT x + 1, ABS(RANDOM() % 3) FROM cnt WHERE x < 10),
i_rnd AS (SELECT r1.x, r1.y,
    (SELECT COUNT(*) FROM cnt AS r2 WHERE r2.y <= r1.y) AS idx FROM cnt AS r1)
SELECT * FROM i_rnd ORDER BY y;
result:
| x | y | idx |
| --- | --- | --- |
| 1 | 0 | 3 |
| 5 | 0 | 6 |
| 8 | 0 | 5 |
| 9 | 0 | 4 |
| 10 | 0 | 2 |
| 3 | 1 | 4 |
| 0 | 2 | 11 |
| 2 | 2 | 11 |
| 4 | 2 | 11 |
| 6 | 2 | 11 |
| 7 | 2 | 11 |
expected result:
| x | y | idx |
| --- | --- | --- |
| 1 | 0 | 5 |
| 5 | 0 | 5 |
| 8 | 0 | 5 |
| 9 | 0 | 5 |
| 10 | 0 | 5 |
| 3 | 1 | 6 |
| 0 | 2 | 11 |
| 2 | 2 | 11 |
| 4 | 2 | 11 |
| 6 | 2 | 11 |
| 7 | 2 | 11 |
In other words, idx should indicate how many rows have y less than or equal to the y of the row under consideration.
I would just use:
select cnt.*,
count(*) over (order by y)
from cnt;
Here is a db<>fiddle.
The issue with your code is probably that the CTE is re-evaluated each time it is called, so the values are not consistent -- a problem with volatile functions in CTEs.
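Putting it together with the question's setup, a sketch of the full reformulation (the key point is that the default RANGE frame of COUNT(*) OVER (ORDER BY y) counts rows tied on y together, which is exactly "how many rows have y less than or equal"):
WITH RECURSIVE
cnt(x, y) AS (VALUES (0, ABS(RANDOM() % 3))
    UNION ALL SELECT x + 1, ABS(RANDOM() % 3) FROM cnt WHERE x < 10)
SELECT x, y,
       COUNT(*) OVER (ORDER BY y) AS idx   -- cnt is scanned once, so values stay consistent
FROM cnt
ORDER BY y;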
I need to increase a counter by 1 on each successive row, and reset it when a row meets another condition. This is probably easiest explained with an example:
+---------+------------+------------+-----------+----------------+
| Acct_ID | Ins_Date | Acct_RowID | indicator | Desired_Output |
+---------+------------+------------+-----------+----------------+
| 5841 | 07/11/2019 | 1 | 1 | 1 |
| 5841 | 08/11/2019 | 2 | 0 | 2 |
| 5841 | 09/11/2019 | 3 | 0 | 3 |
| 5841 | 10/11/2019 | 4 | 0 | 4 |
| 5841 | 11/11/2019 | 5 | 1 | 1 |
| 5841 | 12/11/2019 | 6 | 0 | 2 |
| 5841 | 13/11/2019 | 7 | 1 | 1 |
| 5841 | 14/11/2019 | 8 | 0 | 2 |
| 5841 | 15/11/2019 | 9 | 0 | 3 |
| 5841 | 16/11/2019 | 10 | 0 | 4 |
| 5841 | 17/11/2019 | 11 | 0 | 5 |
| 5841 | 18/11/2019 | 12 | 0 | 6 |
| 5132 | 11/03/2019 | 1 | 1 | 1 |
| 5132 | 12/03/2019 | 2 | 0 | 2 |
| 5132 | 13/03/2019 | 3 | 0 | 3 |
| 5132 | 14/03/2019 | 4 | 1 | 1 |
| 5132 | 15/03/2019 | 5 | 0 | 2 |
| 5132 | 16/03/2019 | 6 | 0 | 3 |
| 5132 | 17/03/2019 | 7 | 0 | 4 |
| 5132 | 18/03/2019 | 8 | 0 | 5 |
| 5132 | 19/03/2019 | 9 | 1 | 1 |
| 5132 | 20/03/2019 | 10 | 0 | 2 |
+---------+------------+------------+-----------+----------------+
The column I want to create is 'Desired_Output'. As the table shows, I need to use the 'indicator' column: each following row should be n+1, and the counter resets to 1 whenever indicator = 1 is encountered again.
I have tried to use a loop method of some sort but this did not produce the desired results.
Is this possible in some way?
The trick is to identify each group of consecutive rows running from one indicator = 1 row up to (but not including) the next. This is achieved with CROSS APPLY: find the latest Acct_RowID with indicator = 1 and use it as a Grp_RowID in the PARTITION BY of the ROW_NUMBER() window function:
select *,
Desired_Output = row_number() over (partition by t.Acct_ID, Grp_RowID
order by Acct_RowID)
from your_table t
cross apply
(
select Grp_RowID = max(Acct_RowID)
from your_table x
where x.Acct_ID = t.Acct_ID
and x.Acct_RowID <= t.Acct_RowID
and x.indicator = 1
) g
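An equivalent window-only sketch (my variant, not part of the answer above) derives the same group key with a running sum of indicator, avoiding the correlated CROSS APPLY:
select t.*,
       row_number() over (partition by Acct_ID, Grp
                          order by Acct_RowID) as Desired_Output
from (
    select *,
           -- running count of indicator = 1 rows identifies each reset group
           sum(indicator) over (partition by Acct_ID
                                order by Acct_RowID
                                rows unbounded preceding) as Grp
    from your_table
) t
order by Acct_ID, Acct_RowID;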
I have this table
+----+--------+------------+-----------+
| Id | day_id | subject_id | period_Id |
+----+--------+------------+-----------+
| 1 | 1 | 1 | 1 |
| 2 | 1 | 2 | 2 |
| 8 | 2 | 6 | 1 |
| 9 | 2 | 7 | 2 |
| 15 | 3 | 3 | 1 |
| 16 | 3 | 4 | 2 |
| 22 | 4 | 5 | 1 |
| 23 | 4 | 5 | 2 |
| 24 | 4 | 6 | 3 |
| 29 | 5 | 8 | 1 |
| 30 | 5 | 1 | 2 |
+----+--------+------------+-----------+
to something like this:
+----+--------+------------+-----------+
| Id | day_id | subject_id | period_Id |
+----+--------+------------+-----------+
| 1 | 1 | 1 | 1 |
| 8 | 2 | 6 | 1 |
| 15 | 3 | 3 | 1 |
| 22 | 4 | 5 | 1 |
| 29 | 5 | 8 | 1 |
| 2 | 1 | 2 | 2 |
| 9 | 2 | 7 | 2 |
| 16 | 3 | 4 | 2 |
| 23 | 4 | 5 | 2 |
| 30 | 5 | 1 | 2 |
+----+--------+------------+-----------+
So, I want to choose one period with a different subject each day, and to do this for a number of weeks, so that the first subject does not come up again until all subjects have been chosen.
You can ORDER BY period_id first and then by day_id:
SELECT *
FROM your_table
ORDER BY period_Id, day_Id
LiveDemo