Change the result of RANK() based on conditions in other columns - sql

I have a table in Redshift like this:
Table Project_team
Employee_ID Employee_Name Start_date Ranking Is_leader Is_Parttime_Staff
Emp001 John 2014-04-01 1 No No
Emp002 Mary 2015-02-01 2 No Yes
Emp003 Terry 2015-02-15 3 Yes No
Emp004 Peter 2016-02-05 4 No No
Emp004 Morris 2016-05-01 5 No No
Initially the staff have no ranking.
I assign one with the RANK() function like this:
RANK() over (partition by Employee_ID,Employee_Name order by Start_date) as page_seq
However, I now want to adjust the ranking based on each employee's status. A leader should be ranked first, and part-time staff should be ranked last. The table should look something like this:
Employee_ID Employee_Name Start_date Ranking Is_leader Is_Parttime_Staff
Emp003 Terry 2015-02-15 1 Yes No
Emp001 John 2014-04-01 2 No No
Emp004 Peter 2016-02-05 3 No No
Emp004 Morris 2016-05-01 4 No No
Emp002 Mary 2015-02-01 5 No Yes
I tried to use a CASE expression to do this, like
Case when Is_leader = true then Ranking = 1 else RANK() over (partition by Employee_ID,Employee_Name order by Start_date) End as page_seq.
However, it does not work.
How can I change the ranking based on conditions in other columns?
Many thanks!

Use dense_rank() and put the conditions in its ORDER BY clause.
Demo:
select *, dense_rank() over (order by case when leader='yes' then 1 else 0 end desc, case when parmanent='yes' then 1 else 0 end) as employeerank
from cte1
output:
id name leader parmanent employeerank
1 A yes no 1
3 C no no 2
2 B no yes 3
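Applied to the Project_team data from the question, the same idea can be sketched as follows. This is a minimal sketch, not a Redshift-tested query: it assumes SQLite 3.25+ (through Python's sqlite3 module) as a stand-in engine, stores the flags as 'Yes'/'No' text, and adds start_date as a final tiebreaker so that employees with the same status are still ordered by seniority. The window deliberately has no PARTITION BY, because partitioning by Employee_ID and Employee_Name would rank each employee only against themselves.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Redshift; column names are lower-cased.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE project_team (
        employee_id TEXT, employee_name TEXT, start_date TEXT,
        is_leader TEXT, is_parttime_staff TEXT
    )
""")
conn.executemany(
    "INSERT INTO project_team VALUES (?, ?, ?, ?, ?)",
    [
        ("Emp001", "John", "2014-04-01", "No", "No"),
        ("Emp002", "Mary", "2015-02-01", "No", "Yes"),
        ("Emp003", "Terry", "2015-02-15", "Yes", "No"),
        ("Emp004", "Peter", "2016-02-05", "No", "No"),
        ("Emp004", "Morris", "2016-05-01", "No", "No"),
    ],
)

# Leaders sort first, part-time staff sort last, start_date breaks remaining ties.
ranking = conn.execute("""
    SELECT employee_name,
           RANK() OVER (
               ORDER BY CASE WHEN is_leader = 'Yes' THEN 0 ELSE 1 END,
                        CASE WHEN is_parttime_staff = 'Yes' THEN 1 ELSE 0 END,
                        start_date
           ) AS page_seq
    FROM project_team
    ORDER BY page_seq
""").fetchall()

for name, seq in ranking:
    print(seq, name)
```

With this data the order comes out as Terry, John, Peter, Morris, Mary, which matches the table the question asks for.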

Related

Oracle Query to find the Nth oldest visit of a person

I have the following Oracle table
PersonID VisitedOn
1 1/1/2017
1 1/1/2018
1 1/1/2019
1 1/1/2020
1 2/1/2020
1 3/1/2020
1 5/1/2021
1 6/1/2022
2 1/1/2015
2 1/1/2017
2 1/1/2018
2 1/1/2019
2 1/1/2020
2 2/1/2020
3 1/1/2017
3 1/1/2018
3 1/1/2019
3 1/1/2020
3 2/1/2020
3 3/1/2020
3 5/1/2021
I am trying to write a query that returns the Nth oldest visit of each person.
For instance, if I want to return the 5th oldest visit (N=5), the result would be
PersonID VisitDate
1 1/1/2020
2 1/1/2017
3 1/1/2019
I think this will work. I tested it with this data:
create table test (PersonID number, VisitedOn date);
-- ANSI date literals avoid depending on the session's NLS_DATE_FORMAT
insert into test values (1, date '2000-01-01');
insert into test values (1, date '2001-01-01');
insert into test values (1, date '2002-01-01');
insert into test values (1, date '2003-01-01');
insert into test values (2, date '2000-01-01');
insert into test values (2, date '2001-01-01');
-- N = 5 here; the sample data has at most 4 visits per person, so try a smaller N to see rows
select personid, visitedon
from (
  select personid,
         visitedon,
         row_number() over (partition by personid order by visitedon) rn
  from test
)
where rn = 5;
What this does is use an analytic function to assign a row number to each set of records partitioned by the person id, then pick the Nth row from each partitioned group, where the rows in each group are sorted by date. If you run the inner query by itself, you will see where the row_number is assigned:
PERSONID VISITEDON RN
1 01-JAN-00 1
1 01-JAN-01 2
1 01-JAN-02 3
1 01-JAN-03 4
2 01-JAN-00 1
2 01-JAN-01 2
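One detail worth checking against the question's expected output: the rows shown there (1/1/2020, 1/1/2017, 1/1/2019) are the 5th visit counting back from the most recent one, which corresponds to ordering by the date descending. The sketch below runs the same ROW_NUMBER approach on the question's data with that ordering. It assumes SQLite 3.25+ via Python's sqlite3 module rather than Oracle, reads the dates as month/day/year (the relative order is the same under day/month/year), and uses the hypothetical names visits, person_id and visited_on.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; dates are stored as ISO-8601 text,
# which sorts chronologically as plain strings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (person_id INTEGER, visited_on TEXT)")
visits = {
    1: ["2017-01-01", "2018-01-01", "2019-01-01", "2020-01-01",
        "2020-02-01", "2020-03-01", "2021-05-01", "2022-06-01"],
    2: ["2015-01-01", "2017-01-01", "2018-01-01", "2019-01-01",
        "2020-01-01", "2020-02-01"],
    3: ["2017-01-01", "2018-01-01", "2019-01-01", "2020-01-01",
        "2020-02-01", "2020-03-01", "2021-05-01"],
}
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [(pid, day) for pid, days in visits.items() for day in days],
)

n = 5  # which visit to pick, counting from the most recent
nth_visits = conn.execute("""
    SELECT person_id, visited_on
    FROM (
        SELECT person_id, visited_on,
               ROW_NUMBER() OVER (
                   PARTITION BY person_id ORDER BY visited_on DESC
               ) AS rn
        FROM visits
    )
    WHERE rn = ?
    ORDER BY person_id
""", (n,)).fetchall()

for row in nth_visits:
    print(row)
```

Dropping DESC gives the 5th visit counting from the earliest one instead, so the direction of the ORDER BY is the part to adjust to the intended meaning of "Nth oldest".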

Display the latest modified record for each employee

The emp table looks like this:
id name date_modified
1 Ram 2017-01-05
2 Kishore 2017-02-04
3 John 2017-04-22
1 Ram K 2017-04-25
1 Ram Kumar 2017-05-01
2 Kishore Babu 2017-05-05
3 John B 2017-06-01
Assuming you're using a reasonable rdbms that supports window functions, row_number should do the trick:
SELECT id, name, date_modified
FROM   (SELECT id, name, date_modified,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY date_modified DESC) rn
        FROM   emp) t
WHERE  rn = 1
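To see the query at work on the sample rows, here is a small sketch that loads them into an in-memory database and runs the same statement. It assumes SQLite 3.25+ through Python's sqlite3 module as the engine and stores date_modified as ISO text; neither detail comes from the question.

```python
import sqlite3

# Assumption: SQLite as the RDBMS; ISO-8601 text dates sort chronologically.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER, name TEXT, date_modified TEXT)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [
        (1, "Ram", "2017-01-05"),
        (2, "Kishore", "2017-02-04"),
        (3, "John", "2017-04-22"),
        (1, "Ram K", "2017-04-25"),
        (1, "Ram Kumar", "2017-05-01"),
        (2, "Kishore Babu", "2017-05-05"),
        (3, "John B", "2017-06-01"),
    ],
)

# rn = 1 marks the most recently modified row within each id.
latest = conn.execute("""
    SELECT id, name, date_modified
    FROM (SELECT id, name, date_modified,
                 ROW_NUMBER() OVER (PARTITION BY id ORDER BY date_modified DESC) AS rn
          FROM emp) t
    WHERE rn = 1
    ORDER BY id
""").fetchall()

for row in latest:
    print(row)
```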

Why can't we use the rank() analytic function to delete duplicates in a table?

I have created an emp table with the following records in it.
create table emp(
EMPNO integer,
EMPNAME varchar2(20),
SALARY number);
select * from emp;
empno empname salary
10 bill 2000
11 bill 2000
12 mark 3000
12 mark 3000
12 mark 3000
12 philip 3000
12 john 3000
13 tom 4000
14 tom 4000
14 jerry 5000
14 matt 5000
15 susan 5000
To delete duplicates I have been using the row_number() function with partition by and order by clauses, as follows:
delete from emp where rowid in
(
select rid from
(
select rowid rid,
row_number() over(partition by empno order by empno) rn
from emp
)
where rn > 1
);
--6 rows deleted
The query deletes the employee records with duplicate empno values, keeping one row per empno, and the result looks something like this:
empno empname salary
10 bill 2000
11 bill 2000
12 mark 3000
13 tom 4000
14 tom 4000
15 susan 5000
When I use the inner query to fetch the row numbers for all the rows in the table, it gives me the following result:
select rowid as rid,empno,empname,
row_number() over(partition by empno order by empno) rn
from emp;
rowid empno empname rn
AACDJUAAPAAGLlTAAA 10 bill 1
AACDJUAAPAAGLlTAAB 11 bill 1
AACDJUAAPAAGLlTAAE 12 mark 1
AACDJUAAPAAGLlTAAD 12 mark 2
AACDJUAAPAAGLlTAAC 12 mark 3
AACDJUAAPAAGLlTAAF 12 philip 4
AACDJUAAPAAGLlTAAG 12 john 5
AACDJUAAPAAGLlTAAH 13 tom 1
AACDJUAAPAAGLlTAAI 14 tom 1
AACDJUAAPAAGLlTAAJ 14 jerry 2
AACDJUAAPAAGLlTAAK 14 matt 3
AACDJUAAPAAGLlTAAL 15 susan 1
But when I use rank() in place of the row_number() function, it gives me the following result:
select rowid as rid,empno,empname,
rank() over(partition by empno order by empno) rn
from emp;
rowid empno empname rn
AACDJUAAPAAGLlTAAA 10 bill 1
AACDJUAAPAAGLlTAAB 11 bill 1
AACDJUAAPAAGLlTAAE 12 mark 1
AACDJUAAPAAGLlTAAD 12 mark 1
AACDJUAAPAAGLlTAAC 12 mark 1
AACDJUAAPAAGLlTAAF 12 philip 1
AACDJUAAPAAGLlTAAG 12 john 1
AACDJUAAPAAGLlTAAH 13 tom 1
AACDJUAAPAAGLlTAAI 14 tom 1
AACDJUAAPAAGLlTAAJ 14 jerry 1
AACDJUAAPAAGLlTAAK 14 matt 1
AACDJUAAPAAGLlTAAL 15 susan 1
So my question is: why does rank() give the same value to all the records in the table even though there are duplicate empno values?
That's the way RANK() works: rows that tie on the ORDER BY expression get the same rank, so it would be rather surprising to get different RANK values for equal-ranking rows within a partition. The ORDER BY clause is what drives RANK within a partition, and since you're using the same column for partitioning as for ordering, every row ties for first within its partition (its empno is the only ordering value in that partition).
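The difference is easy to reproduce on the emp rows from the question. The following is a minimal sketch that assumes SQLite 3.25+ via Python's sqlite3 module in place of Oracle; it computes both functions over the same window and shows that only ROW_NUMBER produces values above 1.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; rowid is not needed for this comparison.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER, empname TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [
        (10, "bill", 2000), (11, "bill", 2000),
        (12, "mark", 3000), (12, "mark", 3000), (12, "mark", 3000),
        (12, "philip", 3000), (12, "john", 3000),
        (13, "tom", 4000), (14, "tom", 4000),
        (14, "jerry", 5000), (14, "matt", 5000), (15, "susan", 5000),
    ],
)

# Same window for both functions: partitioned and ordered by empno,
# so every row in a partition ties on the ordering value.
numbered = conn.execute("""
    SELECT empno,
           ROW_NUMBER() OVER (PARTITION BY empno ORDER BY empno) AS rn,
           RANK()       OVER (PARTITION BY empno ORDER BY empno) AS rk
    FROM emp
    ORDER BY empno, rn
""").fetchall()

for empno, rn, rk in numbered:
    print(empno, rn, rk)
```

Every rk is 1, so a filter such as rk > 1 selects nothing, while rn > 1 selects the six extra rows that the delete statement removes.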
See an explanation in this blog post, where this SQL (PostgreSQL syntax)
SELECT
  v,
  ROW_NUMBER() OVER (w) row_number,
  RANK()       OVER (w) rank,
  DENSE_RANK() OVER (w) dense_rank
FROM t
WINDOW w AS (ORDER BY v)
ORDER BY v
... produces this output
+---+------------+------+------------+
| V | ROW_NUMBER | RANK | DENSE_RANK |
+---+------------+------+------------+
| a | 1 | 1 | 1 |
| a | 2 | 1 | 1 |
| a | 3 | 1 | 1 |
| b | 4 | 4 | 2 |
| c | 5 | 5 | 3 |
| c | 6 | 5 | 3 |
| d | 7 | 7 | 4 |
| e | 8 | 8 | 5 |
+---+------------+------+------------+
There are three "ranking" analytic functions: row_number(), rank(), and dense_rank().
These all work very similarly. They assign numbers, in order, to rows within a group. The group is defined by the partition by clause. The ordering is defined by the order by clause. The difference between the three is how they handle duplicate values.
row_number() always returns sequential numbers within the group. When there are ties, the tied rows still get distinct, consecutive numbers, assigned in an arbitrary order among the ties.
dense_rank() assigns sequential values with no gaps. Equal-valued rows are given the same value, and the next distinct value gets a rank one higher.
rank() assigns sequential values with gaps. Equal-valued rows have the same value, but the next distinct value skips ahead by the number of tied rows.
Here is an example:
value row_number dense_rank rank
a 1 1 1
b 2 2 2
b 3 2 2
b 4 2 2
c 5 3 5
d 6 4 6
d 7 4 6
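The example table can be reproduced with a short script. This is a sketch under the assumption that SQLite 3.25+ (via Python's sqlite3 module) is an acceptable stand-in for the database in question; the table name t and column name v are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (v TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [(v,) for v in ["a", "b", "b", "b", "c", "d", "d"]],
)

# One named window shared by all three ranking functions.
comparison = conn.execute("""
    SELECT v,
           ROW_NUMBER() OVER w AS rn,
           DENSE_RANK() OVER w AS dr,
           RANK()       OVER w AS rk
    FROM t
    WINDOW w AS (ORDER BY v)
    ORDER BY rn
""").fetchall()

print("value row_number dense_rank rank")
for row in comparison:
    print(*row)
```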

SQL Server Count number of overlaps (date ranges)

I have a table which stores vehicles and the dates they are rented out. I would like to find out if the dates overlap and the count of overlaps for a vehicle in SQL Server 2008. The result I am expecting is as follows.
ID Vehicle StartDate EndDate Overlap
==============================================================
1 Ford Focus 01/01/2014 31/01/2014 1
2 Ford Focus 20/01/2014 20/02/2014 1
3 Ford Focus 01/03/2014 28/03/2014 0
4 Mercedes 18/03/2014 24/03/2014 0
5 Mercedes 01/07/2014 31/07/2014 2
6 Mercedes 15/07/2014 31/07/2014 2
7 Mercedes 25/07/2014 25/08/2014 2
You can try this query:
select *, (select count(*) from test
where not (v.StartDate > EndDate or v.EndDate < StartDate)
and Vehicle = v.Vehicle and ID != v.ID) as Overlap
from test v
Sql fiddle demo.
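As a check of the query against the expected result, the sketch below loads the seven rentals and counts overlaps per row. It assumes SQLite through Python's sqlite3 module instead of SQL Server 2008, stores the dates as ISO text, and writes the overlap test in the equivalent positive form (the other rental starts on or before this one ends, and ends on or after this one starts). The inner alias o is an illustrative name.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; ISO-8601 text dates compare correctly.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (ID INTEGER, Vehicle TEXT, StartDate TEXT, EndDate TEXT)"
)
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?, ?)",
    [
        (1, "Ford Focus", "2014-01-01", "2014-01-31"),
        (2, "Ford Focus", "2014-01-20", "2014-02-20"),
        (3, "Ford Focus", "2014-03-01", "2014-03-28"),
        (4, "Mercedes", "2014-03-18", "2014-03-24"),
        (5, "Mercedes", "2014-07-01", "2014-07-31"),
        (6, "Mercedes", "2014-07-15", "2014-07-31"),
        (7, "Mercedes", "2014-07-25", "2014-08-25"),
    ],
)

# For each rental v, count the other rentals o of the same vehicle whose
# date range intersects v's range.
overlaps = conn.execute("""
    SELECT v.ID,
           (SELECT COUNT(*)
            FROM test o
            WHERE o.Vehicle = v.Vehicle
              AND o.ID != v.ID
              AND o.StartDate <= v.EndDate
              AND o.EndDate >= v.StartDate) AS Overlap
    FROM test v
    ORDER BY v.ID
""").fetchall()

for row in overlaps:
    print(row)
```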

How to refine last but one?

I have the following table. I need to get the last-but-one event associate for each event.
event_id event_date event_associate
1 2/14/2014 ben
1 2/15/2014 ben
1 2/16/2014 steve
1 2/17/2014 steve // this associate is the last but one for event 1
1 2/18/2014 paul
2 2/19/2014 paul
2 2/20/2014 paul // this associate is the last but one for event 2
2 2/21/2014 ben
3 2/22/2014 paul
3 2/23/2014 paul
3 2/24/2014 ben
3 2/25/2014 steve // this associate is the last but one for event 3
3 2/26/2014 ben
I need to find out who was the last-but-one event_associate for each event. The result should be
event_id event_associate
1 steve
2 paul
3 steve
I know that to do this I need to find the maximum event_date and exclude the last event_associate.
So I tried
SELECT event_id , event_associate
WHERE NOT EXISTS (
SELECT *
FROM mytable
WHERE event_date = MAX(event_date)
)
QUALIFY ROW_NUMBER() OVER ( PARTITION BY event_id ORDER BY event_date DESC) = 1
But I do not know how to use EXISTS in this case.
You are quite close; you just need the 2nd row based on ROW_NUMBER:
select t.*,
   row_number()
   over (partition by event_id
         order by event_date desc) as rn
from tab as t
qualify
   row_number()
   over (partition by event_id
         order by event_date desc) = 2
-- or simply
-- qualify rn = 2
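For engines without QUALIFY, the same filter can be written with a derived table. The following sketch assumes SQLite 3.25+ via Python's sqlite3 module (which has no QUALIFY clause), keeps the table name tab from the answer, and stores the dates as ISO text so they sort chronologically.

```python
import sqlite3

# Assumption: SQLite stands in for the original engine; the filter on rn
# moves to an outer query because QUALIFY is not available here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tab (event_id INTEGER, event_date TEXT, event_associate TEXT)"
)
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?)",
    [
        (1, "2014-02-14", "ben"), (1, "2014-02-15", "ben"),
        (1, "2014-02-16", "steve"), (1, "2014-02-17", "steve"),
        (1, "2014-02-18", "paul"),
        (2, "2014-02-19", "paul"), (2, "2014-02-20", "paul"),
        (2, "2014-02-21", "ben"),
        (3, "2014-02-22", "paul"), (3, "2014-02-23", "paul"),
        (3, "2014-02-24", "ben"), (3, "2014-02-25", "steve"),
        (3, "2014-02-26", "ben"),
    ],
)

# rn = 1 is the latest row per event, so rn = 2 is the last but one.
last_but_one = conn.execute("""
    SELECT event_id, event_associate
    FROM (SELECT event_id, event_associate,
                 ROW_NUMBER() OVER (
                     PARTITION BY event_id ORDER BY event_date DESC
                 ) AS rn
          FROM tab) t
    WHERE rn = 2
    ORDER BY event_id
""").fetchall()

for row in last_but_one:
    print(row)
```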