Add a counter based consecutive dates - sql

I have an employee table with employee name and the dates when the employee was on leave. My task is to identify employees who have takes 3 or 5 consecutive days of leave. I tried to add a row_number but it wouldn't restart correct based on the consecutive dates. The desired counter I am after is shown below. Any suggestions please?
Employee Leave Date Desired Counter
John 25-Jan-20 1
John 26-Jan-20 2
John 27-Jan-20 3
John 28-Jan-20 4
John 15-Mar-20 1
John 16-Mar-20 2
Mary 12-Feb-20 1
Mary 13-Feb-20 2
Mary 20-Apr-20 1
Desired output (same as in text)

This is a gaps and island problem: islands represents consecutive days of leaves, and you want to enumerate the rows of each island.
Here is an approach that uses the date difference against a monotonically increasing counter to build the groups:
select t.*,
row_number() over(
partition by employee, dateadd(day, -rn, leave_date)
order by leave_date
) counter
from (
select t.*,
row_number() over(partition by employee order by leave_date) rn
from mytable t
) t
order by employee, leave_date
Demo on DB Fiddle

Related

Display DISTINCT value on SQL statement by column condition

i'm introducing you the problem with DISTINCT values by column condition i have dealt with and can't provide
any idea how i can solve it.
So. The problem is i have two Stephen here declared , but i don't want duplicates:
**
The problem:
**
id vehicle_id worker_id user_type user_fullname
9 1 NULL external_users John Dalton
10 1 16 employees Mike
11 1 1 employees Stephen
12 2 173 employee Nicholas
13 2 1 employee Stephen
14 1 NULL external_users Peter
**
The desired output:**
id vehicle_id worker_id user_type user_fullname
9 1 NULL external_users John Dalton
10 1 16 employees Mike
12 2 173 employee Nicholas
13 2 1 employee Stephen
14 1 NULL external_users Peter
I have tried CASE statements but without success. When i group by it by worker_id,
it removes another duplicates, so i figured out it needs to be grouped by some special condition?
If anyone can provide me some hint how i can solve this problem , i will be very grateful.
Thank's!
There are no duplicate rows in this table. Just because Stephen appears twice doesn't make them duplicates because the ID, VEHICLE_ID, and USER_TYPE are different.
What you need to do is decide how you want to identify the Stephen record you wish to see in the output. Is it the one with the highest VEHICLE_ID? The "latest" record, i.e. the one with the highest ID?
You will use that rule in a window function to order the rows within your criteria, and then use that row number to filter down to the results you want. Something like this:
select id, vehicle_id, worker_id, user_type, user_fullname
from (
select id, vehicle_id, worker_id, user_type, user_fullname,
row_number() over (partition by worker_id, user_fullname order by id desc) n
from user_vehicle
) t
where t.n = 1

Count distinct over partition by

I am trying to do a distinct count of names partitioned over their roles. So, in the example below: I have a table with the names and the person's role.
I would like a role count column that gives the total number of distinct people in that role. For example, the role manager comes up four times but there are only 3 distinct people for that role - Sam comes up again on a different date.
If I remove the date column, it works fine using:
select
a.date,
a.Name,
a.Role,
count(a.Role) over (partition by a.Role) as Role_Count
from table a
group by a.date, a.name, a.role
Including the date column then makes it count the total roles rather than by distinct name (which I know I haven't identified in the partition). Giving 4 managers and 3 analysts.
How do I fix this?
Desired output:
Date
Name
Role
Role_Count
01/01
Sam
Manager
3
02/01
Sam
Manager
3
01/01
John
Manager
3
01/01
Dan
Manager
3
01/01
Bob
Analyst
2
02/01
Bob
Analyst
2
01/01
Mike
Analyst
2
Current output:
Date
Name
Role
Role_Count
01/01
Sam
Manager
4
02/01
Sam
Manager
4
01/01
John
Manager
4
01/01
Dan
Manager
4
01/01
Bob
Analyst
3
02/01
Bob
Analyst
3
01/01
Mike
Analyst
3
Unfortunately, SQL Server (and other databases as well) don't support COUNT(DISTINCT) as a window function. Fortunately, there is a simple trick to work around this -- the sum of DENSE_RANK()s minus one:
select a.Name, a.Role,
(dense_rank() over (partition by a.Role order by a.Name asc) +
dense_rank() over (partition by a.Role order by a.Name desc) -
1
) as distinct_names_in_role
from table a
group by a.name, a.role
Unfortunately, COUNT(DISTINCT is not available as a window aggregate. But we can use a combination of DENSE_RANK and MAX to simulate it:
select
a.Name,
a.Role,
MAX(rnk) OVER (PARTITION BY date, Role) as Role_Count
from (
SELECT *,
DENSE_RANK() OVER (PARTITION BY date, Role ORDER BY Name) AS rnk
FROM table
) a
If Name may have nulls then we need to take that into account:
select
a.Name,
a.Role,
MAX(CASE WHEN Name IS NOT NULL THEN rnk END) OVER (PARTITION BY date, Role) as Role_Count
from (
SELECT *,
DENSE_RANK() OVER (PARTITION BY date, Role, CASE WHEN Name IS NULL THEN 0 ELSE 1 END ORDER BY Name) AS rnk
FROM table
) a

SQL find and group consecutive number in rows without duplicate

So I have a table like this:
Taxi Client Time
Tom A 1
Tom A 2
Tom B 3
Tom A 4
Tom A 5
Tom A 6
Tom B 7
Tom B 8
Bob A 1
Bob A 2
Bob A 3
and the expected result will be like this:
Tom 3
Bob 1
I have used the partition function to count the consecutive value but the result become this:
Tom A 2
Tom A 3
Tom B 2
Bob A 2
Please help, I am not good in English, thanks!
This is a variation of a gaps-and-islands problem. You can solve it using window functions:
select taxi, count(*)
from (select t.taxi, t.client, count(*) as num_times
from (select t.*,
row_number() over (partition by taxi order by time) as seqnum,
row_number() over (partition by taxi, client order by time) as seqnum_c
from t
) t
group by t.taxi, t.client, (seqnum - seqnum_c)
having count(*) >= 2
)
group by taxi;
use distinct count
select taxi ,count( distinct cient)
from table_name
group by taxi
It seems your expected output is wrong
I don't see where you get the number 3 from. If you're trying to do what your question says and group by client in consecutive order only and then get the number of different groups, I can help you out with the following query. Bob has 1 group and Tom has 4.
Partition by taxi, ORDER BY taxi, time and check if this client matches the previous client for this taxi. If yes, do not count this row. If no, count this row, this is a new group.
SELECT FEE.taxi,
SUM(FEE.clientNotSameAsPreviousInSequence)
FROM
(
SELECT taxi,
CASE
WHEN PreviousClient IS NULL THEN
1
WHEN PreviousClient <> client THEN
1
ELSE
0
END AS clientNotSameAsPreviousInSequence
FROM
(
SELECT *,
LAG(client) OVER (PARTITION BY taxi ORDER BY taxi, time) AS PreviousClient
FROM table
) taxisWithPreviousClient
) FEE
GROUP BY FEE.taxi;

How to partition the following table in DB2

I am trying to partition and order the following table, where I have used all sorts of row_number() over() and dense_rank() over() combinations but am not getting what I need.
The MWE table is as follows:
Person Visit Last_Visit Gap_1_yr
------ ----- -------- --------
1 01/01/2001 01/01/2000 NULL
1 01/01/2003 01/01/2001 gap
1 01/01/2004 01/01/2003 NULL
1 01/01/2006 01/01/2004 gap
2 01/01/2005 01/01/2002 gap
2 01/01/2010 01/01/2005 gap
where a person turns up for an appointment, and if the persons next appointment is > 365 days from their previous appointment (I used a lag function for this).
What I want is, whenever there is a gap, I want to partition so I have the following:
Person Visit Last_Visit Gap_1_yr SEQ
------ ----- -------- -------- ---
1 01/01/2001 01/01/2000 NULL 1
1 01/01/2003 01/01/2001 gap 2
1 01/01/2004 01/01/2003 NULL 2
1 01/01/2006 01/01/2004 gap 3
2 01/01/2005 01/01/2002 gap 1
2 01/01/2010 01/01/2005 gap 2
You see that when there is a gap, the sequence iterates by one until the next gap - all per person.
I have tried:
row_number() over(partition by person order by gap)
but this iterates for every cell in SEQ until finding a new person -ignores gaps
and have tried:
dense_rank() over(partition by person order by gap)
returns 1's in every cell in SEQ
dense_rank() over(partition by person,gap order by gap)
also returns all 1's.
does anyone have any suggestions?
Convert the gap to a flag. Then use sum() to do a cumulative sum of the flag:
select mwe.*,
sum(case when gap_1_yr = 'gap' then 1 else 0 end) over
(partition by person order by visit)
) as seq
from mwe;

Retrieve highest value from sql table

How can retrieve that data:
Name Title Profit
Peter CEO 2
Robert A.D 3
Michael Vice 5
Peter CEO 4
Robert Admin 5
Robert CEO 13
Adrin Promotion 8
Michael Vice 21
Peter CEO 3
Robert Admin 15
to get this:
Peter........4
Robert.......15
Michael......21
Adrin........8
I want to get the highest profit value from each name.
If there are multiple equal names always take the highest value.
select name,max(profit) from table group by name
Since this type of request almost always follows with "now can I include the title?" - here is a query that gets the highest profit for each name but can include all the other columns without grouping or applying arbitrary aggregates to those other columns:
;WITH x AS
(
SELECT Name, Title, Profit, rn = ROW_NUMBER()
OVER (PARTITION BY Name ORDER BY Profit DESC)
FROM dbo.table
)
SELECT Name, Title, Profit
FROM x
WHERE rn = 1;