SQL Server: using CTE row partition to serialize sequential timestamps

I think I just need a little help with this: is there a way to incrementally count steps in SQL using some type of CTE row partition? I'm using SQL Server 2008, so I can't use the LAG function.
I am trying to calculate the Step Number as pictured below: for each unique ITEM in my table, in this case G43251, it should give the process Step_Number based on the Date (timestamp) and the process type. Rows with the same timestamp and process_type should both get the same Step_Number, as there are other fields that can cause a timestamp to appear twice.
Right now I am playing around with the query below and wondering how I could fit in a DISTINCT timestamp approach, so that it doesn't count each row as a new step.
WITH cte AS
(
SELECT
*,
ROW_NUMBER() OVER (ORDER BY Timestamp_Posted DESC)
- ROW_NUMBER() OVER (PARTITION BY Item ORDER BY Timestamp_Posted Desc) rn
FROM
#t1
)
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY Item, rn ORDER BY Timestamp_Posted DESC) rn2
FROM
cte
ORDER BY
Timestamp_Posted DESC

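Use dense_rank() instead of row_number(). dense_rank() gives rows that tie on the ORDER BY columns the same number and leaves no gaps, so rows sharing a timestamp and process type get the same Step_Number. It is available in SQL Server 2008: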
SELECT *, dense_rank() OVER(Partition By Item ORDER BY Timestamp_Posted, Process_Type ) Step_Number
FROM #t1
ORDER BY Timestamp_Posted DESC
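To see the tie handling in action, here is a minimal sketch that runs the same DENSE_RANK() query against an in-memory SQLite database from Python. The sample rows, the process type names, and the table name t1 (standing in for #t1) are made up for illustration, and it assumes a SQLite build with window function support (3.25 or later).

```python
import sqlite3

# Hypothetical sample rows: (Item, Timestamp_Posted, Process_Type).
rows = [
    ("G43251", "2020-01-01 08:00:00", "RECEIVE"),
    ("G43251", "2020-01-01 08:00:00", "RECEIVE"),  # same timestamp and type
    ("G43251", "2020-01-02 09:30:00", "INSPECT"),
    ("G43251", "2020-01-03 10:15:00", "SHIP"),
    ("H10000", "2020-01-01 12:00:00", "RECEIVE"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t1 (Item TEXT, Timestamp_Posted TEXT, Process_Type TEXT)"
)
conn.executemany("INSERT INTO t1 VALUES (?, ?, ?)", rows)

# Rows that tie on (Timestamp_Posted, Process_Type) share a Step_Number,
# and numbering restarts for every Item.
step_numbers = conn.execute(
    """
    SELECT Item, Timestamp_Posted, Process_Type,
           DENSE_RANK() OVER (PARTITION BY Item
                              ORDER BY Timestamp_Posted, Process_Type) AS Step_Number
    FROM t1
    ORDER BY Item, Timestamp_Posted
    """
).fetchall()

for row in step_numbers:
    print(row)
```

The two rows that share a timestamp and process type both come back as step 1, and the next distinct timestamp is step 2 rather than step 3, which is the difference from ROW_NUMBER() and RANK().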

Related

Get last two rows from a row_number() window function in snowflake

Hopefully, someone can help me...
I'm trying to get the last two values from a row_number() window function. Let's say my results contain row numbers up to 6, for example. How would it be possible to get the rows where the row number is 5 and 6?
Let me know if it can be done with another window function or in another way.
Kind regards,
Using QUALIFY:
SELECT *
FROM tab
QUALIFY ROW_NUMBER() OVER(ORDER BY ... DESC) <= 2;
This approach could be further extended to get two rows per each partition:
SELECT *
FROM tab
QUALIFY ROW_NUMBER() OVER(PARTITION BY ... ORDER BY ... DESC) <= 2;
You can use top with order by desc like:
select top 2 row_number() over([partition by] [order by]) as rn
from table
order by rn desc
I'd say @Shmiel's answer is the formal and elegant way. Just in case, it would be the same as:
WITH CTE AS
(SELECT product,
user_id,
ROW_NUMBER() OVER (PARTITION BY user_id order by product desc)
as RN
FROM Mytable)
SELECT product, user_id
FROM CTE
WHERE RN < 3;
Use ORDER BY [order_condition] with DESC so that the last rows are numbered first, and then filter on RN (the row number) to select as many rows as you want.
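As a rough illustration of the CTE variant, the sketch below runs it on an in-memory SQLite database from Python. SQLite has no QUALIFY, so only the ROW_NUMBER() plus WHERE form is shown; the sample data is invented, and window functions require SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Mytable (user_id INTEGER, product TEXT)")
# Hypothetical data: user 1 has four products, user 2 has three.
conn.executemany(
    "INSERT INTO Mytable VALUES (?, ?)",
    [(1, "a"), (1, "b"), (1, "c"), (1, "d"), (2, "x"), (2, "y"), (2, "z")],
)

# Number rows from the end (DESC), then keep row numbers 1 and 2,
# which are the last two rows of each partition in ascending order.
last_two = conn.execute(
    """
    WITH cte AS (
        SELECT product, user_id,
               ROW_NUMBER() OVER (PARTITION BY user_id
                                  ORDER BY product DESC) AS rn
        FROM Mytable
    )
    SELECT user_id, product
    FROM cte
    WHERE rn <= 2
    ORDER BY user_id, product
    """
).fetchall()

print(last_two)
```

Numbering in descending order avoids having to know the highest row number (6 in the question) in advance.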

Complex Ranking in SQL (Teradata)

I have a peculiar problem at hand. I need to rank in the following manner:
Each ID gets a new rank.
rank #1 is assigned to the ID with the lowest date. However, the subsequent dates for that particular ID can be higher but they will get the incremental rank w.r.t other IDs.
(E.g. ADF32 series will be considered to be ranked first as it had the lowest date, although it ends with dates 09-Nov, and RT659 starts with 13-Aug it will be ranked subsequently)
For a particular ID, if the days are consecutive then ranks are same, else they add by 1.
For a particular ID, ranks are given in date ASC.
How to formulate a query?
You need two steps:
select
id_col
,dt_col
,dense_rank()
over (order by min_dt, id_col, dt_col - rnk) as part_col
from
(
select
id_col
,dt_col
,min(dt_col)
over (partition by id_col) as min_dt
,rank()
over (partition by id_col
order by dt_col) as rnk
from tab
) as dt
dt_col - rnk calculates the same result for consecutive dates, so they get the same rank.
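The date-minus-rank trick can be tried out with a small sketch on SQLite from Python. This is not Teradata syntax: dates are stored as text here, so the sketch converts them to a day number with julianday() before subtracting the rank, and the IDs and dates are invented to resemble the question. It assumes SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (id_col TEXT, dt_col TEXT)")
# Hypothetical data: ADF32 starts earliest; each ID has one gap in its dates.
conn.executemany(
    "INSERT INTO tab VALUES (?, ?)",
    [
        ("ADF32", "2020-08-01"), ("ADF32", "2020-08-02"),
        ("ADF32", "2020-08-03"), ("ADF32", "2020-11-09"),
        ("RT659", "2020-08-13"), ("RT659", "2020-08-14"),
        ("RT659", "2020-08-20"),
    ],
)

# day_no - rnk is constant within a run of consecutive days, so it
# identifies each run; DENSE_RANK then numbers the runs across all IDs,
# ordered by each ID's earliest date first.
ranked = conn.execute(
    """
    SELECT id_col, dt_col,
           DENSE_RANK() OVER (ORDER BY min_dt, id_col, day_no - rnk) AS part_col
    FROM (
        SELECT id_col, dt_col,
               CAST(julianday(dt_col) AS INTEGER) AS day_no,
               MIN(dt_col) OVER (PARTITION BY id_col) AS min_dt,
               RANK() OVER (PARTITION BY id_col ORDER BY dt_col) AS rnk
        FROM tab
    ) AS dt
    ORDER BY part_col, dt_col
    """
).fetchall()

for row in ranked:
    print(row)
```

All of ADF32 is ranked before RT659 even though its last date is later, because min_dt is the leading sort key.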
Try datediff on lead/lag and then perform partitioned ranking
select t.ID_COL,t.dt_col,
rank() over(partition by t.ID_COL, t.date_diff order by t.dt_col desc) as rankk
from ( SELECT ID_COL,dt_col,
DATEDIFF(day, Lag(dt_col, 1) OVER(ORDER BY dt_col),dt_col) as date_diff FROM table1 ) t
One way to think about this problem is "when to add 1 to the rank". Well, that occurs when the previous value on a row with the same id_col differs by more than one day. Or when the row is the earliest day for an id.
This turns the problem into a cumulative sum:
select t.*,
sum(case when prev_dt_col = dt_col - 1 then 0 else 1
end) over
(order by min_dt_col, id_col, dt_col) as ranking
from (select t.*,
lag(dt_col) over (partition by id_col order by dt_col) as prev_dt_col,
min(dt_col) over (partition by id_col) as min_dt_col
from t
) t;
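The cumulative-sum formulation can be sketched the same way. The version below is an adaptation for SQLite run from Python, not the Teradata query itself: the "previous day" test uses a julianday() difference because the dates are stored as text, and the table name tab and its rows are hypothetical. It assumes SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (id_col TEXT, dt_col TEXT)")
# Hypothetical data with one gap per ID.
conn.executemany(
    "INSERT INTO tab VALUES (?, ?)",
    [
        ("ADF32", "2020-08-01"), ("ADF32", "2020-08-02"),
        ("ADF32", "2020-08-03"), ("ADF32", "2020-11-09"),
        ("RT659", "2020-08-13"), ("RT659", "2020-08-14"),
        ("RT659", "2020-08-20"),
    ],
)

# A row adds 1 to the running total unless the previous row of the same
# ID is exactly one day earlier. The first row of an ID has a NULL
# previous date, so it falls into the ELSE branch and also adds 1.
cumulative = conn.execute(
    """
    SELECT id_col, dt_col,
           SUM(CASE WHEN julianday(dt_col) - julianday(prev_dt_col) = 1
                    THEN 0 ELSE 1 END)
               OVER (ORDER BY min_dt_col, id_col, dt_col) AS ranking
    FROM (
        SELECT id_col, dt_col,
               LAG(dt_col) OVER (PARTITION BY id_col
                                 ORDER BY dt_col) AS prev_dt_col,
               MIN(dt_col) OVER (PARTITION BY id_col) AS min_dt_col
        FROM tab
    )
    ORDER BY min_dt_col, id_col, dt_col
    """
).fetchall()

for row in cumulative:
    print(row)
```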

SQL - How to sort values in special order?

For example, I have a table ordered by column "code". Also, I know the exact number of rows in my table (6 in this case).
I need to create one more column with rank using next rules:
The first value has the first code (1)
The second value has the last code (6)
The third value has the second code (2)
The fourth value has the penultimate code (5), etc.
How can I create this order? Even if you have just an idea without query, share it with me, please.
You could use:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER(ORDER BY code ASC) rn1,
ROW_NUMBER() OVER(ORDER BY code DESC) rn2
FROM tab
)
SELECT *
FROM cte
ORDER BY ABS(rn2 - rn1) DESC, code;
How it works: two counters over code, one ascending and one descending. Their absolute difference is the same for the first and last rows (and for each symmetric pair), and ties are broken by code.
I would use row_number() too, but I think the logic you want is more:
select *
from (
select t.*,
row_number() over(order by code asc ) rn_asc,
row_number() over(order by code desc) rn_desc
from tab t
) t
order by case when rn_asc <= rn_desc then rn_asc else rn_desc end, rn_asc;
This ranks records in both directions and then uses the smaller of the two ranks for ordering. The second sort criterion ensures that, within each pair, the row with the smaller code consistently comes first.
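Both orderings can be checked quickly on a six-row table. The sketch below runs the two queries on an in-memory SQLite database from Python; the table contents (codes 1 to 6) follow the question, and everything else is illustrative. It assumes SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (code INTEGER)")
conn.executemany("INSERT INTO tab VALUES (?)", [(c,) for c in range(1, 7)])

# Approach 1: order by the distance between the two counters.
by_abs_diff = [
    row[0]
    for row in conn.execute(
        """
        WITH cte AS (
            SELECT code,
                   ROW_NUMBER() OVER (ORDER BY code ASC)  AS rn1,
                   ROW_NUMBER() OVER (ORDER BY code DESC) AS rn2
            FROM tab
        )
        SELECT code FROM cte
        ORDER BY ABS(rn2 - rn1) DESC, code
        """
    )
]

# Approach 2: order by the smaller of the two counters, then by rn_asc.
by_smaller_rank = [
    row[0]
    for row in conn.execute(
        """
        SELECT code
        FROM (
            SELECT code,
                   ROW_NUMBER() OVER (ORDER BY code ASC)  AS rn_asc,
                   ROW_NUMBER() OVER (ORDER BY code DESC) AS rn_desc
            FROM tab
        ) t
        ORDER BY CASE WHEN rn_asc <= rn_desc THEN rn_asc ELSE rn_desc END,
                 rn_asc
        """
    )
]

print(by_abs_diff)
print(by_smaller_rank)
```

On this input both queries produce the order 1, 6, 2, 5, 3, 4 that the question asks for.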

convert timestamp to date in hive and get data for that date

I have a query like this
select distinct emp,phno,addrs,email from cdv.emp;
Now I want to get only the data created on the latest date, not older data.
I have an audit column created_on, which is a timestamp and the unique key.
I expect the latest data based on the created_on (timestamp) column, that is, rows generated within the last 24 hours or on the max date.
Use the rank() analytic function. It will work much faster than an IN subquery:
select distinct emp,phno,addrs,email
from
(
select emp,phno,addrs,email,
rank() over(order by to_date(c.created_on) desc) rn
from cdv.emp c
)s
where rn=1;
If you want latest record per emp,phno,addrs,email, then you can use row_number() without distinct. If this method is applicable, this will be even faster:
select emp,phno,addrs,email
from
(
select emp,phno,addrs,email,
row_number() over(partition by emp,phno,addrs,email order by to_date(c.created_on) desc) rn
from cdv.emp c
)s
where rn=1;
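To illustrate why rank() over the date part keeps every row from the latest day, here is a small sketch using SQLite from Python. SQLite's date() stands in for Hive's to_date(), and the table name emp and its rows are invented. It assumes SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp (emp TEXT, phno TEXT, addrs TEXT, email TEXT, created_on TEXT)"
)
# Hypothetical rows: two were created on the latest date, at different times.
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?, ?, ?)",
    [
        ("alice", "111", "addr1", "alice@example.com", "2019-05-01 10:00:00"),
        ("bob", "222", "addr2", "bob@example.com", "2019-05-02 08:00:00"),
        ("carol", "333", "addr3", "carol@example.com", "2019-05-02 17:30:00"),
    ],
)

# Ranking by the date part only makes all rows of the latest day tie at
# rank 1, regardless of their time of day.
latest = conn.execute(
    """
    SELECT DISTINCT emp, phno, addrs, email
    FROM (
        SELECT emp, phno, addrs, email,
               RANK() OVER (ORDER BY date(created_on) DESC) AS rn
        FROM emp
    ) s
    WHERE rn = 1
    ORDER BY emp
    """
).fetchall()

print(latest)
```

Ranking by the full timestamp instead would return only the single most recent row, which is why the timestamp is reduced to a date first.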

SQL Finding five largest numbers instead of one Max in a table

I have a table and I need to run a query that contains some aggregate functions such as maximum, average, standard deviation, and so on,
but instead of a single maximum it should return the 5 largest values.
The simplified query is something like this:
SELECT OSI_KEY , MAX(VALUE) , AVG(VALUE) , STDDEV(VALUE), variance(VALUE)
FROM DATA_VALUES_5MIN_6_2013
GROUP BY OSI_KEY
ORDER BY OSI_KEY
and I need some Magical ;) Query like this:
SELECT OSI_KEY , MAX1(VALUE) ,MAX2(VALUE) ,MAX3(VALUE) ,MAX4(VALUE) , MAX5(VALUE) ,
AVG(VALUE) , STDDEV(VALUE), variance(VALUE)
FROM DATA_VALUES_5MIN_6_2013
GROUP BY OSI_KEY
ORDER BY OSI_KEY
I appreciate your considerations.
Oracle has an NTH_VALUE() function. Unfortunately, it is only an analytic function and not an aggregate function, so it cannot be used with GROUP BY. This leads to the strange construct of SELECT DISTINCT with a bunch of analytic functions:
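-- With ORDER BY, the default window frame ends at the current row, so NTH_VALUE
-- would be NULL on the first rows; the explicit frame covers the whole partition.
SELECT DISTINCT OSI_KEY,
MAX(VALUE) OVER (PARTITION BY OSI_KEY) as MAX_1,
NTH_VALUE(VALUE, 2) OVER (PARTITION BY OSI_KEY ORDER BY VALUE DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as MAX_2,
NTH_VALUE(VALUE, 3) OVER (PARTITION BY OSI_KEY ORDER BY VALUE DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as MAX_3,
NTH_VALUE(VALUE, 4) OVER (PARTITION BY OSI_KEY ORDER BY VALUE DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as MAX_4,
NTH_VALUE(VALUE, 5) OVER (PARTITION BY OSI_KEY ORDER BY VALUE DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as MAX_5,
AVG(VALUE) OVER (PARTITION BY OSI_KEY),
STDDEV(VALUE) OVER (PARTITION BY OSI_KEY),
variance(VALUE) OVER (PARTITION BY OSI_KEY)
FROM DATA_VALUES_5MIN_6_2013
ORDER BY OSI_KEY;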
You can also do this using conditional aggregation, with a row_number() or dense_rank() in a subquery.
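The conditional aggregation idea could look roughly like the sketch below, run here on SQLite from Python rather than Oracle. The table name data_values and its rows are hypothetical, and STDDEV and VARIANCE are left out because SQLite does not ship them; in Oracle they would sit next to AVG in the same GROUP BY. It assumes SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_values (OSI_KEY TEXT, VALUE INTEGER)")
# Hypothetical data: key A has six values, key B only two.
conn.executemany(
    "INSERT INTO data_values VALUES (?, ?)",
    [("A", v) for v in (10, 20, 30, 40, 50, 60)] + [("B", 5), ("B", 7)],
)

# The subquery numbers values per key from largest to smallest; the outer
# query picks rows 1..5 with CASE inside MAX, alongside ordinary aggregates.
top_five = conn.execute(
    """
    SELECT OSI_KEY,
           MAX(CASE WHEN rn = 1 THEN VALUE END) AS MAX_1,
           MAX(CASE WHEN rn = 2 THEN VALUE END) AS MAX_2,
           MAX(CASE WHEN rn = 3 THEN VALUE END) AS MAX_3,
           MAX(CASE WHEN rn = 4 THEN VALUE END) AS MAX_4,
           MAX(CASE WHEN rn = 5 THEN VALUE END) AS MAX_5,
           AVG(VALUE) AS AVG_VALUE
    FROM (
        SELECT OSI_KEY, VALUE,
               ROW_NUMBER() OVER (PARTITION BY OSI_KEY
                                  ORDER BY VALUE DESC) AS rn
        FROM data_values
    ) t
    GROUP BY OSI_KEY
    ORDER BY OSI_KEY
    """
).fetchall()

for row in top_five:
    print(row)
```

Keys with fewer than five values get NULL in the missing positions. With row_number(), repeated values occupy separate positions; swapping in dense_rank() would instead return the five largest distinct values.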
SELECT OSI_KEY, MaxValue FROM (
SELECT OSI_KEY, MAX(value) AS MaxValue FROM table GROUP BY OSI_KEY
)
ORDER BY MaxValue DESC
FETCH FIRST 5 ROWS ONLY;