Complex Ranking in SQL (Teradata) - sql

I have a peculiar problem at hand. I need to rank in the following manner:
Each ID gets a new rank.
Rank #1 is assigned to the ID with the lowest date. Subsequent dates for that ID may be later than the dates of other IDs, but they still get the next incremental ranks before any other ID is ranked.
(E.g. the ADF32 series is ranked first because it has the lowest date; although it ends with 09-Nov and RT659 starts with 13-Aug, RT659 is ranked after it.)
For a particular ID, if the days are consecutive the ranks are the same; otherwise the rank increases by 1.
For a particular ID, ranks are given in date ASC.
How to formulate a query?

You need two steps:
select
id_col
,dt_col
,dense_rank()
over (order by min_dt, id_col, dt_col - rnk) as part_col
from
(
select
id_col
,dt_col
,min(dt_col)
over (partition by id_col) as min_dt
,rank()
over (partition by id_col
order by dt_col) as rnk
from tab
) as dt
dt_col - rnk calculates the same value for consecutive dates, so those rows get the same rank.
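To see why the `dt_col - rnk` trick works, here is a small Python sketch (not Teradata SQL) that mimics the two steps of the query. The IDs are borrowed from the question, but the dates are invented for illustration, and the sketch assumes no duplicate dates per ID, so that rank() and plain row numbering coincide.

```python
from datetime import date, timedelta

# Hypothetical sample rows (id_col, dt_col); the dates are invented.
rows = [
    ("ADF32", date(2017, 8, 1)),
    ("ADF32", date(2017, 8, 2)),
    ("ADF32", date(2017, 11, 9)),
    ("RT659", date(2017, 8, 13)),
    ("RT659", date(2017, 8, 14)),
]

def rank_rows(rows):
    # Step 1 (inner query): per ID, find min_dt and number the dates ascending.
    by_id = {}
    for id_col, dt_col in rows:
        by_id.setdefault(id_col, []).append(dt_col)

    keyed = []
    for id_col, dates in by_id.items():
        dates.sort()
        min_dt = dates[0]
        for rnk, dt_col in enumerate(dates, start=1):
            # Consecutive dates give the same value for dt_col - rnk.
            island = dt_col - timedelta(days=rnk)
            keyed.append(((min_dt, id_col, island), id_col, dt_col))

    # Step 2 (outer query): dense_rank over (min_dt, id_col, dt_col - rnk).
    distinct_keys = sorted({key for key, _, _ in keyed})
    dense = {key: i for i, key in enumerate(distinct_keys, start=1)}
    return sorted((id_col, dt_col, dense[key]) for key, id_col, dt_col in keyed)

for row in rank_rows(rows):
    print(row)
```

Here 01-Aug and 02-Aug both map to 31-Jul after subtracting their rank, so they share a rank, while 09-Nov maps to 06-Nov and starts a new one.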

Try datediff on lead/lag and then perform partitioned ranking
select t.ID_COL,t.dt_col,
rank() over(partition by t.ID_COL, t.date_diff order by t.dt_col desc) as rankk
from ( SELECT ID_COL,dt_col,
DATEDIFF(day, Lag(dt_col, 1) OVER(ORDER BY dt_col),dt_col) as date_diff FROM table1 ) t

One way to think about this problem is "when to add 1 to the rank". Well, that occurs when the previous value on a row with the same id_col differs by more than one day. Or when the row is the earliest day for an id.
This turns the problem into a cumulative sum:
select t.*,
sum(case when prev_dt_col = dt_col - 1 then 0 else 1
end) over
(order by min_dt_col, id_col, dt_col) as ranking
from (select t.*,
lag(dt_col) over (partition by id_col order by dt_col) as prev_dt_col,
min(dt_col) over (partition by id_col) as min_dt_col
from t
) t;
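The cumulative-sum idea can be checked outside the database with a short Python sketch. The rows below are hypothetical, and the function name `cumulative_rank` is made up for this illustration; it follows the same logic as the query (add 1 unless the previous date for the same ID is exactly one day earlier).

```python
from datetime import date, timedelta

# Hypothetical rows (id_col, dt_col), deliberately unsorted.
rows = [
    ("B", date(2017, 8, 13)),
    ("A", date(2017, 8, 1)),
    ("A", date(2017, 11, 9)),
    ("B", date(2017, 8, 14)),
    ("A", date(2017, 8, 2)),
    ("B", date(2017, 8, 20)),
]

def cumulative_rank(rows):
    # min(dt_col) over (partition by id_col)
    min_dt = {}
    for id_col, dt_col in rows:
        min_dt[id_col] = min(dt_col, min_dt.get(id_col, dt_col))

    # order by min_dt_col, id_col, dt_col
    ordered = sorted(rows, key=lambda r: (min_dt[r[0]], r[0], r[1]))

    prev_dt = {}   # lag(dt_col) per id_col
    ranking = 0    # running sum of the 0/1 flags
    out = []
    for id_col, dt_col in ordered:
        # First row of an ID has no previous date, so it also adds 1.
        if prev_dt.get(id_col) != dt_col - timedelta(days=1):
            ranking += 1
        prev_dt[id_col] = dt_col
        out.append((id_col, dt_col, ranking))
    return out

for row in cumulative_rank(rows):
    print(row)
```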

Related

SQL Server : using CTE row partition to serialize sequential timestamps

I think I just need a little help with this but is there a way to incrementally count steps in SQL using some type of CTE row partition? I'm using SQL Server 2008 so won't be able to use the LAG function.
In the example below, I am trying to calculate the Step Number, as pictured, for each unique ITEM in my table (in this case G43251). The process Step_Number should be based on the Date (timestamp) and the process type. Rows with the same timestamp and process_type should both get the same Step_Number, as there are other fields that could cause the timestamp to repeat.
Right now I am playing around with this below and seeing how maybe I could fit in a DISTINCT timestamp methodology ? So that it doesn't count each row as something new.
WITH cte AS
(
SELECT
*,
ROW_NUMBER() OVER (ORDER BY Timestamp_Posted DESC)
- ROW_NUMBER() OVER (PARTITION BY Item ORDER BY Timestamp_Posted Desc) rn
FROM
#t1
)
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY Item, rn ORDER BY Timestamp_Posted DESC) rn2
FROM
cte
ORDER BY
Timestamp_Posted DESC
Please use dense_rank() instead of row_number()
SELECT *, dense_rank() OVER(Partition By Item ORDER BY Timestamp_Posted, Process_Type ) Step_Number
FROM #t1
ORDER BY Timestamp_Posted DESC
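As a rough illustration of what dense_rank() does here, the following Python sketch numbers the distinct (timestamp, process type) pairs per item. The item G43251 comes from the question; the timestamps and process types are invented, and `step_numbers` is a hypothetical helper name.

```python
# Hypothetical rows: (item, timestamp_posted, process_type).
rows = [
    ("G43251", "2020-01-01 08:00", "Receive"),
    ("G43251", "2020-01-01 08:00", "Receive"),  # repeated timestamp and type
    ("G43251", "2020-01-01 09:30", "Inspect"),
    ("G43251", "2020-01-02 10:00", "Ship"),
]

def step_numbers(rows):
    # Collect the distinct ordering keys per item (the PARTITION BY Item part).
    keys_by_item = {}
    for item, ts, ptype in rows:
        keys_by_item.setdefault(item, set()).add((ts, ptype))

    # dense_rank: equal keys share a number and no numbers are skipped.
    rank = {
        item: {key: i for i, key in enumerate(sorted(keys), start=1)}
        for item, keys in keys_by_item.items()
    }
    return [(item, ts, ptype, rank[item][(ts, ptype)]) for item, ts, ptype in rows]

for row in step_numbers(rows):
    print(row)
```

With row_number() the two identical rows would get different numbers, which is what the question wanted to avoid.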

Generate custom group ranking in sql

As posted, I am trying to generate a group ranking based on the Is_True_Mode column. Each group should run until the next 1 appears. Please find the expected output below. In the expected output, rows are grouped based on the Is_True_Mode column. The regular ranking is shown for reference (the ordering by ranking should be kept).
You can identify the groups using a cumulative sum. Then you can use row_number() to enumerate the rows:
select t.*,
row_number() over (partition by grp order by regularranking) as expected_output
from (select t.*,
sum(is_true_mode) over (order by regularranking) as grp
from t
) t;
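The same two steps can be sketched in Python, assuming the flags are already ordered by the regular ranking. The flag values are invented and `group_ranking` is a hypothetical name.

```python
from itertools import accumulate

# Hypothetical is_true_mode flags, already in regularranking order.
is_true_mode = [1, 0, 0, 1, 0, 1, 1, 0]

def group_ranking(flags):
    # Cumulative sum: every 1 starts a new group, 0s stay in the current one.
    grp = list(accumulate(flags))

    # row_number() within each group.
    counts = {}
    numbering = []
    for g in grp:
        counts[g] = counts.get(g, 0) + 1
        numbering.append(counts[g])
    return grp, numbering

grp, expected_output = group_ranking(is_true_mode)
print(grp)              # group id per row
print(expected_output)  # position of each row inside its group
```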

SQL query for backfilling register read values

I have a table with ID,timestamp,register reads for a day, the register reads are like running totals starts at 12.00 at midnight and ends at 11.00 at night.
The problem is that there are some random time intervals in which the cumulative reads may not be present, and I need to backfill those.
The picture below gives a snapshot of the problem. KWH_RDNG is the difference between two cumulative readings divided by 1000, but the 4th column, 5.851, is actually the accumulation of 3 missing hours along with the 4th hour's value. It's fine if I simply divide 5.851 by 4 and distribute it.
The challenge is they can happen at random intervals and it can be different for different meters (1st column). I am using SQL Server 2016.
Please help.!!
This is a gaps and islands problem -- sort of. You need to group each run of NULL values with the value that follows it. One method is to use a cumulative count of the non-NULL values on or after each row. This defines the groups.
Then, you need the count and the reading. So, this should do the calculation:
select t.*,
(max_kwh_rding / cnt) as new_kwh_rding
from (select t.*, count(*) over (partition by meter_serial, grp) as cnt,
max(kwh_rding) over (partition by meter_serial, grp) as max_kwh_rding
from (select t.*,
count(kwh_rding) over (partition by meter_serial order by read_utc desc rows between unbounded preceding and current row) as grp
from t
) t
) t
where cnt > 1;
You can incorporate this into an update:
with toupdate as (
select t.*,
(max_kwh_rding / cnt) as new_kwh_rding
from (select t.*, count(*) over (partition by meter_serial, grp) as cnt,
max(kwh_rding) over (partition by meter_serial, grp) as max_kwh_rding
from (select t.*,
count(kwh_rding) over (partition by meter_serial order by read_utc desc rows between unbounded preceding and current row) as grp
from t
) t
) t
where cnt > 1
)
update toupdate
set kwh_rding = new_kwh_rding;
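For readers who want to check the grouping logic outside SQL Server, here is a minimal Python sketch for a single meter. It assumes the readings are ordered by read time ascending and that a missing reading is `None`; the values are invented and `backfill` is a hypothetical name.

```python
# Hypothetical hourly readings for one meter, oldest first.
# The 6.0 carries its own hour plus the three missing hours before it.
readings = [1.0, None, None, None, 6.0, 2.0]

def backfill(readings):
    """Spread each accumulated reading evenly over itself and the gap before it."""
    out = list(readings)
    start = 0  # index where the current group (gap + closing value) begins
    for i, value in enumerate(readings):
        if value is not None:
            cnt = i - start + 1          # rows in this group
            if cnt > 1:                  # same filter as "where cnt > 1"
                share = value / cnt
                for j in range(start, i + 1):
                    out[j] = share
            start = i + 1                # next group starts after this value
    return out

print(backfill(readings))
```

Missing readings at the very end stay `None`, since no later value exists to distribute over them.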

Finding consecutive patterns (with SQL)

A table consecutive in PostgreSQL:
Each se_id has an idx
from 0 up to 100 - here 0 to 9.
The search pattern:
SELECT *
FROM consecutive
WHERE val_3_bool = 1
AND val_1_dur > 4100 AND val_1_dur < 5900
Now I'm looking for the longest consecutive appearance of this pattern
for each p_id - and the AVG of the counted val_1_dur.
Is it possible to calculate this in pure SQL?
table as txt
"Result" as txt
One method is the difference of row numbers approach to get the sequences for each:
select pid, count(*) as in_a_row, sum(val1_dur) as dur
from (select t.*,
row_number() over (partition by pid order by idx) as seqnum,
row_number() over (partition by pid, val3_bool order by idx) as seqnum_d
from consecutive t
) t
group by (seqnum - seqnum_d), pid, val3_bool;
If you are looking specifically for "1" values, then add where val3_bool = 1 to the outer query. To understand why this works, I would suggest that you stare at the results of the subquery, so you can understand why the difference defines the consecutive values.
You can then get the max using distinct on:
select distinct on (pid) t.*
from (select pid, count(*) as in_a_row, sum(val1_dur) as dur
from (select t.*,
row_number() over (partition by pid order by idx) as seqnum,
row_number() over (partition by pid, val3_bool order by idx) as seqnum_d
from consecutive t
) t
group by (seqnum - seqnum_d), pid, val3_bool
) t
order by pid, in_a_row desc;
The distinct on does not require an additional level of subquery, but I think that makes the logic clearer.
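To make the difference-of-row-numbers idea concrete, here is a Python sketch on invented data. Each row is (pid, idx, flag, dur), where flag stands in for val3_bool and dur for val1_dur; `longest_runs` is a hypothetical name, and only runs with flag = 1 are considered.

```python
# Hypothetical rows: (pid, idx, flag, dur).
rows = [
    (1, 0, 1, 10), (1, 1, 1, 20), (1, 2, 0, 30), (1, 3, 1, 40),
    (1, 4, 1, 50), (1, 5, 1, 60), (1, 6, 0, 70),
    (2, 0, 0, 5), (2, 1, 1, 6), (2, 2, 0, 7),
]

def longest_runs(rows):
    rows = sorted(rows, key=lambda r: (r[0], r[1]))
    seqnum = {}    # row_number() over (partition by pid order by idx)
    seqnum_d = {}  # row_number() over (partition by pid, flag order by idx)
    groups = {}    # (pid, flag, seqnum - seqnum_d) -> [count, sum of dur]
    for pid, idx, flag, dur in rows:
        seqnum[pid] = seqnum.get(pid, 0) + 1
        seqnum_d[(pid, flag)] = seqnum_d.get((pid, flag), 0) + 1
        # The difference stays constant inside a run of equal flags.
        key = (pid, flag, seqnum[pid] - seqnum_d[(pid, flag)])
        group = groups.setdefault(key, [0, 0])
        group[0] += 1
        group[1] += dur

    # Keep the longest flag = 1 run per pid (the "distinct on" step).
    best = {}
    for (pid, flag, _), (cnt, total) in groups.items():
        if flag == 1 and cnt > best.get(pid, (0, 0))[0]:
            best[pid] = (cnt, total)
    return best

print(longest_runs(rows))
```

Dividing the summed duration by the count gives the average the question asks for.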
There are Window Functions, that enable you to compare one line with the previous and next one.
https://community.modeanalytics.com/sql/tutorial/sql-window-functions/
https://www.postgresql.org/docs/current/static/tutorial-window.html
As seen on How to compare the current row with next and previous row in PostgreSQL? and Filtering by window function result in Postgresql

SQL Find the minimum date based on consecutive values

I'm having trouble constructing a query that can find consecutive values meeting a condition. Example data below, note that Date is sorted DESC and is grouped by ID.
To be selected, for each ID, the most recent RESULT must be 'Fail', and what I need back is the earliest date in that run of 'Fails'. For ID==1, only the 1st two values are of interest (the last doesn't count due to the prior 'Complete'). ID==2 doesn't count at all, failing the first condition, and for ID==3, only the first value matters.
A result table might be:
The trick seems to be doing some type of run-length encoding, but even with several attempts manipulating ROW_NUM and an attempt at the tabibitosan method for grouping consecutive values, I've been unable to gain traction.
Any help would be appreciated.
If your database supports window functions, you can do
select id, case when result='Fail' then earliest_fail_date end earliest_fail_date
from (
select t.*
,row_number() over(partition by id order by dt desc) rn
,min(case when result = 'Fail' then dt end) over(partition by id) earliest_fail_date
from tablename t
) x
where rn=1
Use row_number to get the latest row in the table. min() over() to get the earliest fail date for each id. If the first row has status Fail, you select the earliest_fail_date or else it would be null.
It should be noted that the expected result for id=1 is wrong. It should be 2016-09-20 as it is the earliest fail date.
Edit: Having re-read the question, I think this is what you might be looking for: the minimum Fail date from the latest consecutive group of Fail rows.
with grps as (
select t.*,row_number() over(partition by id order by dt desc) rn
,row_number() over(partition by id order by dt)-row_number() over(partition by id,result order by dt) grp
from tablename t
)
,maxfailgrp as (
select g.*,
max(case when result = 'Fail' then grp end) over(partition by id) maxgrp
from grps g
)
select id,
case when result = 'Fail' then (select min(dt) from maxfailgrp where id = m.id and grp=m.maxgrp) end earliest_fail_date
from maxfailgrp m
where rn=1
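The intended result can also be described procedurally. The Python sketch below uses a direct loop rather than the row-number difference from the query: per ID it walks from the most recent row backwards and stops at the first non-Fail result. The data is invented to resemble the question, and the function name is hypothetical.

```python
# Hypothetical rows: (id, dt, result); ISO date strings sort chronologically.
rows = [
    (1, "2016-09-30", "Fail"),
    (1, "2016-09-28", "Fail"),
    (1, "2016-09-25", "Complete"),
    (1, "2016-09-20", "Fail"),
    (2, "2016-10-02", "Complete"),
    (2, "2016-10-01", "Fail"),
    (3, "2016-10-05", "Fail"),
    (3, "2016-10-03", "Complete"),
]

def earliest_fail_in_latest_run(rows):
    by_id = {}
    for id_, dt, result in rows:
        by_id.setdefault(id_, []).append((dt, result))

    out = {}
    for id_, items in by_id.items():
        items.sort(reverse=True)   # most recent first
        earliest = None            # stays None if the latest row is not a Fail
        for dt, result in items:
            if result != "Fail":
                break              # the latest run of Fails ends here
            earliest = dt
        out[id_] = earliest
    return out

print(earliest_fail_in_latest_run(rows))
```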