Hive/Spark Repeat Dense Rank N Times

I have a table and I need to repeat the rank/dense rank value n times. I've seen some posts where the numbering restarts by some partition, but in my case I do not have a column to partition by.
I am looking for something like this
This is my current code:
WITH d_rank_tbl AS (
  SELECT
    id,
    1 + (dense_rank() over (order by id) - 1) % 10 as d_rank
  FROM id_bucket
)
SELECT
  id,
  dense_rank() over (partition by d_rank order by rand())
FROM d_rank_tbl

How about arithmetic instead?
select t.*,
floor((row_number() over (order by id) + 2) / 3) as d_rank
from id_bucket;
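The + 2 (that is, n - 1 for groups of n = 3) makes the numbering start at 1 instead of 0. The floor() is needed because / returns a double in Hive and Spark SQL.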
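The arithmetic can be checked with a minimal sketch that runs it in an in-memory SQLite database from Python. The id_bucket contents (ids 1 to 8) are hypothetical sample data, and n is a parameter added for illustration. SQLite truncates integer division, so floor() is not needed in this sketch.

```python
import sqlite3

# Hypothetical sample data: a one-column id_bucket table with ids 1..8.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE id_bucket (id INTEGER)")
conn.executemany("INSERT INTO id_bucket VALUES (?)", [(i,) for i in range(1, 9)])

n = 3  # how many times each rank value should repeat (illustrative parameter)

# (row_number + n - 1) / n with integer division gives 1,1,1,2,2,2,3,...
rows = conn.execute(
    """
    SELECT id, (row_number() OVER (ORDER BY id) + ? - 1) / ? AS d_rank
    FROM id_bucket
    ORDER BY id
    """,
    (n, n),
).fetchall()
print(rows)
```

Changing n to 10 would repeat each value ten times instead of three.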

Related

How to increment the dense rank based on condition

Below is my requirement.
I want the dense rank to increment after every 5 line items within each partition of seller_state and warehouse_id. For more clarification I have attached sample data of my requirement. Kindly help me with this.
Below are my attempts:
CASE
  WHEN icta_amount < 0 THEN (DENSE_RANK() OVER (PARTITION BY seller_state ORDER BY seller_state, warehouse_id)) % 5
  WHEN icta_amount >= 0 THEN (DENSE_RANK() OVER (PARTITION BY seller_state ORDER BY seller_state, warehouse_id)) % 5
END AS DENSE_RANK,
If I add warehouse_id to the PARTITION BY clause everywhere, I get only 1, and I don't understand why.
Thank you in advance.
I'd start with a row_number partitioned by the seller_state and warehouse_id, floor that into groups of five, and then dense_rank over it:
SELECT seller_state, warehouse_id,
       DENSE_RANK() OVER (PARTITION BY seller_state, warehouse_id
                          ORDER BY seller_state, warehouse_id, FLOOR((rn - 1) / 5.0))
FROM (SELECT seller_state, warehouse_id,
             ROW_NUMBER() OVER (PARTITION BY seller_state, warehouse_id) AS rn
      FROM mytable) t
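A minimal sketch of this per-partition version in an in-memory SQLite database from Python. The line_id column and the sample rows (7 items for KA/1, 3 for KA/2, 6 for TN/1) are hypothetical; line_id only makes the ROW_NUMBER order deterministic. SQLite's integer (rn - 1) / 5 is used in place of FLOOR((rn - 1) / 5.0), which gives the same block number for positive rn. Because the DENSE_RANK is partitioned, the counter restarts at 1 for every state and warehouse.

```python
import sqlite3

# Hypothetical line items: 7 for (KA, 1), 3 for (KA, 2), 6 for (TN, 1).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (line_id INTEGER, seller_state TEXT, warehouse_id INTEGER)")
items = [("KA", 1)] * 7 + [("KA", 2)] * 3 + [("TN", 1)] * 6
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(i, s, w) for i, (s, w) in enumerate(items, start=1)],
)

# Inside one partition seller_state and warehouse_id are constant,
# so ordering by the block number alone is enough.
rows = conn.execute(
    """
    SELECT seller_state, warehouse_id,
           DENSE_RANK() OVER (PARTITION BY seller_state, warehouse_id
                              ORDER BY (rn - 1) / 5) AS rnk
    FROM (SELECT line_id, seller_state, warehouse_id,
                 ROW_NUMBER() OVER (PARTITION BY seller_state, warehouse_id
                                    ORDER BY line_id) AS rn
          FROM mytable) t
    ORDER BY line_id
    """
).fetchall()
print([r[2] for r in rows])
```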
If you prefer to use dense_rank() then you'd want to use integer division to mark off the blocks:
with data as (
  select seller_state, warehouse_id,
         row_number() over (partition by seller_state, warehouse_id
                            order by seller_state, warehouse_id) as rn
  from T
)
select seller_state, warehouse_id,
       dense_rank() over (order by seller_state, warehouse_id, (rn - 1) / 5) as rnk
from data;
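A minimal sketch of this unpartitioned dense_rank() in SQLite from Python, using hypothetical rows (7 items for KA/1, 3 for KA/2, 6 for TN/1) and a hypothetical line_id column so the row_number() order is deterministic. Since the dense_rank() has no PARTITION BY, the counter keeps increasing across states and warehouses instead of restarting.

```python
import sqlite3

# Hypothetical line items: 7 for (KA, 1), 3 for (KA, 2), 6 for (TN, 1).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (line_id INTEGER, seller_state TEXT, warehouse_id INTEGER)")
items = [("KA", 1)] * 7 + [("KA", 2)] * 3 + [("TN", 1)] * 6
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [(i, s, w) for i, (s, w) in enumerate(items, start=1)],
)

# (rn - 1) / 5 is integer division in SQLite: block 0 for rows 1-5, block 1 for 6-10, ...
rows = conn.execute(
    """
    WITH data AS (
      SELECT line_id, seller_state, warehouse_id,
             row_number() OVER (PARTITION BY seller_state, warehouse_id
                                ORDER BY line_id) AS rn
      FROM T
    )
    SELECT seller_state, warehouse_id,
           dense_rank() OVER (ORDER BY seller_state, warehouse_id, (rn - 1) / 5) AS rnk
    FROM data
    ORDER BY line_id
    """
).fetchall()
print([r[2] for r in rows])
```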
You could also mark the spots where the counter should increase and then accumulate them. Count off each of the rows in the same state and warehouse. When you find one where the row number mod 5 = 1, mark it as a step for your ranking counter. Because the row number restarts at 1 for each state and warehouse, a new step also begins immediately on any change of state or warehouse, as desired:
sum(case when rn % 5 = 1 then 1 end)
  over (order by seller_state, warehouse_id, rn) as rnk
Some platforms do not require an ORDER BY that merely repeats the PARTITION BY columns, while others do.
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=d32dba8fccbfb8b85a7eb26f4c8f4849
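A minimal sketch of the running-sum approach in SQLite from Python, with the window ordered by seller_state, warehouse_id, rn. The sample rows (7 items for KA/1, 3 for KA/2, 6 for TN/1) and the line_id column are hypothetical. The CASE yields 1 only on rows 1, 6, 11, ... of each group and NULL elsewhere, and SUM ignores NULLs, so the running total steps up once per block of five.

```python
import sqlite3

# Hypothetical line items: 7 for (KA, 1), 3 for (KA, 2), 6 for (TN, 1).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (line_id INTEGER, seller_state TEXT, warehouse_id INTEGER)")
items = [("KA", 1)] * 7 + [("KA", 2)] * 3 + [("TN", 1)] * 6
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [(i, s, w) for i, (s, w) in enumerate(items, start=1)],
)

# (seller_state, warehouse_id, rn) is unique, so the default RANGE frame
# behaves like a plain running total.
rows = conn.execute(
    """
    WITH data AS (
      SELECT line_id, seller_state, warehouse_id,
             row_number() OVER (PARTITION BY seller_state, warehouse_id
                                ORDER BY line_id) AS rn
      FROM T
    )
    SELECT seller_state, warehouse_id, rn,
           sum(CASE WHEN rn % 5 = 1 THEN 1 END)
               OVER (ORDER BY seller_state, warehouse_id, rn) AS rnk
    FROM data
    ORDER BY line_id
    """
).fetchall()
print([r[3] for r in rows])
```

On this sample the result matches the dense_rank() over (rn - 1) / 5 version.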

Complex Ranking in SQL (Teradata)

I have a peculiar problem at hand. I need to rank in the following manner:
Each ID gets a new rank.
Rank #1 is assigned to the ID with the lowest date. The subsequent dates for that ID can be higher than other IDs' dates, but they still get the next incremental ranks before the other IDs.
(E.g. the ADF32 series is ranked first because it has the lowest date. Although it ends with 09-Nov and RT659 starts with 13-Aug, RT659 is ranked after it.)
For a particular ID, if the days are consecutive then the ranks are the same; otherwise the rank increases by 1.
For a particular ID, ranks are given in date ASC.
How can I formulate this query?
You need two steps:
select
id_col
,dt_col
,dense_rank()
over (order by min_dt, id_col, dt_col - rnk) as part_col
from
(
select
id_col
,dt_col
,min(dt_col)
over (partition by id_col) as min_dt
,rank()
over (partition by id_col
order by dt_col) as rnk
from tab
) as dt
dt_col - rnk calculates the same result for consecutive dates, so they get the same rank.
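A minimal sketch of the same two-step query in SQLite from Python. The rows are hypothetical (the IDs come from the question, the year and the exact dates are made up), and julianday(dt_col) - rnk stands in for Teradata's date minus integer. If an ID can contain the same date twice, RANK() leaves a gap after the tie and splits the run, so DENSE_RANK() would be the safer choice for rnk in that case.

```python
import sqlite3

# Hypothetical rows for the two ids named in the question (dates are made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (id_col TEXT, dt_col TEXT)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?)",
    [
        ("ADF32", "2019-08-01"), ("ADF32", "2019-08-02"),
        ("ADF32", "2019-08-05"), ("ADF32", "2019-11-09"),
        ("RT659", "2019-08-13"), ("RT659", "2019-08-14"),
        ("RT659", "2019-08-20"),
    ],
)

# julianday(dt_col) - rnk is constant along a run of consecutive dates of one id;
# min_dt first puts the id with the earliest date ahead of the others.
rows = conn.execute(
    """
    SELECT id_col, dt_col,
           DENSE_RANK() OVER (ORDER BY min_dt, id_col, julianday(dt_col) - rnk) AS part_col
    FROM (SELECT id_col, dt_col,
                 MIN(dt_col) OVER (PARTITION BY id_col) AS min_dt,
                 RANK() OVER (PARTITION BY id_col ORDER BY dt_col) AS rnk
          FROM tab) dt
    ORDER BY min_dt, id_col, dt_col
    """
).fetchall()
for row in rows:
    print(row)
```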
Try DATEDIFF on LEAD/LAG and then perform a partitioned ranking:
select t.ID_COL,t.dt_col,
rank() over(partition by t.ID_COL, t.date_diff order by t.dt_col desc) as rankk
from ( SELECT ID_COL,dt_col,
DATEDIFF(day, Lag(dt_col, 1) OVER(ORDER BY dt_col),dt_col) as date_diff FROM table1 ) t
One way to think about this problem is "when to add 1 to the rank". That occurs when the previous value on a row with the same id_col differs by more than one day, or when the row is the earliest date for an id.
This turns the problem into a cumulative sum:
select t.*,
sum(case when prev_dt_col = dt_col - 1 then 0 else 1
end) over
(order by min_dt_col, id_col, dt_col) as ranking
from (select t.*,
lag(dt_col) over (partition by id_col order by dt_col) as prev_dt_col,
min(dt_col) over (partition by id_col) as min_dt_col
from t
) t;
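A minimal sketch of this cumulative sum in SQLite from Python, on hypothetical rows (the IDs come from the question, the dates are made up). date(dt_col, '-1 day') stands in for dt_col - 1. On the first row of each id, prev_dt_col is NULL, so the comparison is not true and the CASE falls through to 1.

```python
import sqlite3

# Hypothetical rows for the two ids named in the question (dates are made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id_col TEXT, dt_col TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        ("ADF32", "2019-08-01"), ("ADF32", "2019-08-02"),
        ("ADF32", "2019-08-05"), ("ADF32", "2019-11-09"),
        ("RT659", "2019-08-13"), ("RT659", "2019-08-14"),
        ("RT659", "2019-08-20"),
    ],
)

# Add 1 whenever the previous date of the same id is not exactly one day earlier.
rows = conn.execute(
    """
    SELECT id_col, dt_col,
           SUM(CASE WHEN prev_dt_col = date(dt_col, '-1 day') THEN 0 ELSE 1 END)
               OVER (ORDER BY min_dt_col, id_col, dt_col) AS ranking
    FROM (SELECT t.*,
                 LAG(dt_col) OVER (PARTITION BY id_col ORDER BY dt_col) AS prev_dt_col,
                 MIN(dt_col) OVER (PARTITION BY id_col) AS min_dt_col
          FROM t) t
    ORDER BY min_dt_col, id_col, dt_col
    """
).fetchall()
for row in rows:
    print(row)
```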

I want to generate continuous numbers by 2 columns, batch-wise

I want to generate continuous numbers for each combination of 2 columns, in batches of 5. Can anybody help me solve this?
An adaptation of @GordonLinoff's answer:
SELECT
name,
rank,
DENSE_RANK() OVER (ORDER BY name DESC, Rank, ((seqnum - 1) / 5)) AS rno
FROM
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY name, rank ORDER BY (SELECT null)) AS seqnum
FROM
yourTable
)
sequenced
ORDER BY
3
You can use row_number() and arithmetic:
select name, rank,
       ((seqnum - 1) / 5) + 1 as rno
from (select t.*,
             row_number() over (partition by name, rank order by (select null)) as seqnum
      from t
     ) t
order by seqnum;
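A minimal sketch comparing the two answers above in SQLite from Python, assuming a hypothetical yourTable with seven ('A', 1) rows and three ('B', 1) rows. The DENSE_RANK version keeps counting across (name, rank) groups, while the plain arithmetic restarts at 1 for every group. ORDER BY (SELECT null) is dropped because SQLite allows ROW_NUMBER() without an ORDER BY; "rank" is quoted only to keep it clearly a column name.

```python
import sqlite3

# Hypothetical table: 7 rows for ('A', 1) and 3 rows for ('B', 1).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE yourTable (name TEXT, "rank" INTEGER)')
conn.executemany("INSERT INTO yourTable VALUES (?, ?)", [("A", 1)] * 7 + [("B", 1)] * 3)

seq = """
    SELECT name, "rank",
           ROW_NUMBER() OVER (PARTITION BY name, "rank") AS seqnum
    FROM yourTable
"""

# DENSE_RANK keeps counting across groups: B -> 1, A -> 2 (first 5 rows) and 3.
continuing = conn.execute(
    f"""SELECT name, "rank",
               DENSE_RANK() OVER (ORDER BY name DESC, "rank", (seqnum - 1) / 5) AS rno
        FROM ({seq}) s"""
).fetchall()

# Plain arithmetic restarts at 1 for every (name, rank) group.
restarting = conn.execute(
    f"""SELECT name, "rank", (seqnum - 1) / 5 + 1 AS rno
        FROM ({seq}) s"""
).fetchall()

print(sorted(continuing))
print(sorted(restarting))
```

Which one fits depends on whether the batch number should continue across combinations or restart for each one.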

How to group numbered rows into groups

In SQL Server I need to take repeating sets of row numbers and group them into segments or subgroups. I'm trying to produce column B using SQL. I've produced column A using the row_number() function, but I'm not sure how to get to column B.
Here is the logic for row_number():
1 + ((row_number() over (order by TimeStamp, FileName, OrderID) - 1) % 5) AS [Row_Number]
Your row_number() is of the form:
row_number() over (partition by colA order by colB)
What you seem to want is:
dense_rank() over (order by colA)
That is, the partition key(s) used for the row_number() should be the order by keys for the dense_rank().
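A minimal sketch of that pairing in SQLite from Python, with a hypothetical table t whose colA is the grouping key and colB the order inside a group. Column a is the position within the group and column b is the group number.

```python
import sqlite3

# Hypothetical rows: colA is the grouping key, colB orders rows inside a group.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (colA TEXT, colB INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("x", 1), ("x", 2), ("y", 1), ("y", 2), ("y", 3), ("z", 1)],
)

rows = conn.execute(
    """
    SELECT colA, colB,
           ROW_NUMBER() OVER (PARTITION BY colA ORDER BY colB) AS a,  -- position in group
           DENSE_RANK() OVER (ORDER BY colA) AS b                     -- group number
    FROM t
    ORDER BY colA, colB
    """
).fetchall()
for row in rows:
    print(row)
```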
EDIT:
Your code is:
1 + ((row_number() over (order by TimeStamp, FileName, OrderID) - 1) % 5) AS [Row_Number]
In this case, there is no PARTITION BY. What you really want is simply integer division:
1 + ((row_number() over (order by TimeStamp, FileName, OrderID) - 1) / 5) AS [Row_Number]
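A minimal sketch that puts the question's modulo expression (column A) next to the integer-division expression (column B) in SQLite from Python. It assumes a hypothetical orders table with 12 rows ordered by OrderID alone, instead of the real TimeStamp, FileName, OrderID ordering. SQLite, like SQL Server, truncates integer / integer.

```python
import sqlite3

# Hypothetical stand-in for the real table: only an ordering column is needed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (OrderID INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?)", [(i,) for i in range(1, 13)])

rows = conn.execute(
    """
    SELECT OrderID,
           1 + ((row_number() OVER (ORDER BY OrderID) - 1) % 5) AS col_a,  -- cycles 1..5
           1 + ((row_number() OVER (ORDER BY OrderID) - 1) / 5) AS col_b   -- block number
    FROM orders
    ORDER BY OrderID
    """
).fetchall()
col_a = [r[1] for r in rows]
col_b = [r[2] for r in rows]
print(col_a)
print(col_b)
```

The only difference between the two columns is % versus /.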
I would go with a simple solution:
SELECT [Row_Number], GroupNumber
FROM (
SELECT [Row_Number]
, row_number()over(partition by [Row_Number] order by [Row_Number]) as GroupNumber
--Note: The order by clause above should be replaced with however you are ordering the groups of row numbers
FROM YourTableOrInlineView
) z
ORDER BY GroupNumber, [Row_Number]
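A minimal sketch of this approach in SQLite from Python. It assumes a hypothetical orders table whose OrderID gives the order in which the groups of row numbers were produced, which is the replacement the note in the query asks for. [Row_Number] is computed with the question's modulo expression; numbering rows within each [Row_Number] value then yields the group number. SQLite accepts the bracketed identifier.

```python
import sqlite3

# Hypothetical stand-in table: OrderID is assumed to be the real row order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (OrderID INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?)", [(i,) for i in range(1, 13)])

rows = conn.execute(
    """
    SELECT OrderID, [Row_Number],
           row_number() OVER (PARTITION BY [Row_Number] ORDER BY OrderID) AS GroupNumber
    FROM (SELECT OrderID,
                 1 + ((row_number() OVER (ORDER BY OrderID) - 1) % 5) AS [Row_Number]
          FROM orders) z
    ORDER BY OrderID
    """
).fetchall()
print([r[2] for r in rows])
```

If the ORDER BY inside the window does not follow the real order, rows with the same [Row_Number] can be assigned to the wrong group.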

Row number in SQL having an issue with generation

I am facing an issue with the query below:
CREATE TABLE #tmp(rowid int,settle_id int)
insert into #tmp
select top 100
  case when row_number() over (order by settle_id) > 10
       then row_number() over (order by settle_id) - 10
       else row_number() over (order by settle_id)
  end as rowid,
  settle_id
from student_id(nolock)
select * from #tmp
drop table #tmp
I want rowid to go from 1 to 10 every time, but only the first two sets go from 1 to 10; after that it starts at 11.
Please let me know what I am missing.
Use the query below to get the expected result.
SELECT
CASE WHEN ((row_number() over(order by settle_id) % 10) = 0)
THEN 10
ELSE (row_number() over (ORDER BY settle_id) % 10)
END AS RowID, settle_id
FROM student
Try using modulo arithmetic:
select ((row_number() over (order by settle_id) - 1) % 10) + 1 as rowid, settle_id
from student;
Some databases use the mod() function instead of %.
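A minimal sketch in SQLite from Python showing why the original CASE fails and the modulo version does not. It assumes a hypothetical student table with 25 settle_id values. Subtracting 10 once only fixes rows 11 to 20, so row 21 becomes 11, while ((rn - 1) % 10) + 1 wraps every 10 rows.

```python
import sqlite3

# Hypothetical student table with 25 settle_id values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (settle_id INTEGER)")
conn.executemany("INSERT INTO student VALUES (?)", [(i,) for i in range(101, 126)])

# The question's CASE subtracts 10 only once, so row 21 becomes 11.
subtract_once = [r[0] for r in conn.execute(
    """
    SELECT CASE WHEN row_number() OVER (ORDER BY settle_id) > 10
                THEN row_number() OVER (ORDER BY settle_id) - 10
                ELSE row_number() OVER (ORDER BY settle_id) END
    FROM student ORDER BY settle_id
    """)]

# Modulo wraps every 10 rows: 1..10, 1..10, 1..5.
modulo = [r[0] for r in conn.execute(
    """
    SELECT ((row_number() OVER (ORDER BY settle_id) - 1) % 10) + 1
    FROM student ORDER BY settle_id
    """)]

print(subtract_once)
print(modulo)
```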