Oracle Query to find the Nth oldest visit of a person - sql

I have the following Oracle table
PersonID VisitedOn
1 1/1/2017
1 1/1/2018
1 1/1/2019
1 1/1/2020
1 2/1/2020
1 3/1/2020
1 5/1/2021
1 6/1/2022
2 1/1/2015
2 1/1/2017
2 1/1/2018
2 1/1/2019
2 1/1/2020
2 2/1/2020
3 1/1/2017
3 1/1/2018
3 1/1/2019
3 1/1/2020
3 2/1/2020
3 3/1/2020
3 5/1/2021
I am trying to write a query that returns the Nth oldest visit of each person.
For instance, if I want to return the 5th oldest visit (N=5), the result would be:
PersonID VisitDate
1 1/1/2020
2 1/1/2017
3 1/1/2019

I think this will work. I ran a test with this data:
create table test (PersonID number, VisitedOn date);
-- ANSI date literals avoid depending on the session's NLS_DATE_FORMAT
insert into test values (1, date '2000-01-01');
insert into test values (1, date '2001-01-01');
insert into test values (1, date '2002-01-01');
insert into test values (1, date '2003-01-01');
insert into test values (2, date '2000-01-01');
insert into test values (2, date '2001-01-01');
-- rn numbers each person's visits from oldest to newest
select personid, visitedon
from (
  select personid,
         visitedon,
         row_number() over (partition by personid order by visitedon) rn
  from test
)
where rn = 5;  -- N; no person in this test data has 5 rows, so try a smaller N such as 2
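What this does is use an analytic function to assign a row number to each record within its person id partition, with the rows in each partition sorted by date, and then pick the Nth row from each partition. Ordering ascending counts from the oldest visit. The expected output in the question (1/1/2020, 1/1/2017, 1/1/2019 for N=5) corresponds to counting from the most recent visit instead, which needs order by visitedon desc. If you run the inner query by itself, you will see how the row number is assigned: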
PERSONID VISITEDON RN
1 01-JAN-00 1
1 01-JAN-01 2
1 01-JAN-02 3
1 01-JAN-03 4
2 01-JAN-00 1
2 01-JAN-01 2
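To check the approach against the question's own data, here is a minimal sketch that runs the same row_number() query in SQLite through Python's sqlite3 module. This is an assumption made for the sake of a runnable example, not Oracle itself. The dates are stored as ISO text and read as month/day/year, and the helper name nth_visit is hypothetical.

```python
import sqlite3

# Visits from the question, as ISO text so that text order equals date order.
VISITS = {
    1: ["2017-01-01", "2018-01-01", "2019-01-01", "2020-01-01",
        "2020-02-01", "2020-03-01", "2021-05-01", "2022-06-01"],
    2: ["2015-01-01", "2017-01-01", "2018-01-01", "2019-01-01",
        "2020-01-01", "2020-02-01"],
    3: ["2017-01-01", "2018-01-01", "2019-01-01", "2020-01-01",
        "2020-02-01", "2020-03-01", "2021-05-01"],
}

def nth_visit(n, newest_first):
    """Return (personid, visitedon) for the n-th visit of each person."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table test (personid integer, visitedon text)")
    conn.executemany(
        "insert into test values (?, ?)",
        [(p, d) for p, dates in VISITS.items() for d in dates],
    )
    direction = "desc" if newest_first else "asc"
    sql = f"""
        select personid, visitedon
        from (select personid, visitedon,
                     row_number() over (partition by personid
                                        order by visitedon {direction}) rn
              from test)
        where rn = ?
        order by personid
    """
    rows = conn.execute(sql, (n,)).fetchall()
    conn.close()
    return rows

print(nth_visit(5, newest_first=False))  # 5th counting from the oldest visit
print(nth_visit(5, newest_first=True))   # 5th counting from the newest visit
```

With this data, only the newest-first ordering returns the three dates listed as the expected result in the question, so the sort direction is worth double-checking before using the query.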

Related

Select max date for each register, null if does not exists

I have these tables: Employee (id, name, number), Configuration (id, years, licence_days), Periods (id, start_date, end_date, configuration_id, employee_id, period_type):
Employee table:
id name number
---- ----- -------
1 Bob 355
2 John 467
3 Maria 568
4 Josh 871
configuration table:
id years licence_days
---- ----- ------------
1 1 8
2 3 16
3 5 24
Periods table:
id start_date end_date configuration_id employee_id period_type
---- ---------- ------- ---------------- ----------- -----------
1 2021-05-23 2021-05-31 1 1 vaccation
2 2021-05-24 2021-06-01 1 2 vaccation
3 2021-03-01 2021-03-17 2 2 vaccation
4 2021-05-05 2021-05-21 2 2 vaccation
5 2021-01-01 2021-01-17 2 4 vaccation
I want this result:
Result:
employee_id years licence_days max(end_date)
1 1 8 2021-05-31
1 3 16 null
1 5 24 null
2 1 8 2021-06-01
2 3 16 2021-05-21
2 5 24 null
3 1 8 null
3 3 16 null
3 5 24 null
4 1 8 null
4 3 16 2021-01-17
4 5 24 null
i.e., I want to select all employees combined with all configurations and, for each combination, the max end_date of the "vaccation" type (or null if it does not exist).
How can I do that?
Oracle supports cross joins, right? So maybe something like this?
SELECT e.id AS employee_id, c.years, c.licence_days, MAX(p.end_date)
FROM Employee e
CROSS JOIN Configuration c
LEFT JOIN Periods p
  ON e.id = p.employee_id
 AND c.id = p.configuration_id
 AND p.period_type = 'vaccation'  -- only the period type asked for in the question
GROUP BY e.id, c.years, c.licence_days
ORDER BY e.id, c.years
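As a quick check, the sketch below loads the question's three tables into an in-memory SQLite database (an assumption for the sake of a runnable example; the question targets Oracle) and runs a cross join followed by a left join. It uses the column names from the question's table definitions, so the key columns are Employee.id and Configuration.id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Employee (id integer, name text, number integer);
    create table Configuration (id integer, years integer, licence_days integer);
    create table Periods (id integer, start_date text, end_date text,
                          configuration_id integer, employee_id integer,
                          period_type text);
    insert into Employee values
        (1, 'Bob', 355), (2, 'John', 467), (3, 'Maria', 568), (4, 'Josh', 871);
    insert into Configuration values (1, 1, 8), (2, 3, 16), (3, 5, 24);
    insert into Periods values
        (1, '2021-05-23', '2021-05-31', 1, 1, 'vaccation'),
        (2, '2021-05-24', '2021-06-01', 1, 2, 'vaccation'),
        (3, '2021-03-01', '2021-03-17', 2, 2, 'vaccation'),
        (4, '2021-05-05', '2021-05-21', 2, 2, 'vaccation'),
        (5, '2021-01-01', '2021-01-17', 2, 4, 'vaccation');
""")

# Every employee paired with every configuration; periods are optional.
rows = conn.execute("""
    select e.id, c.years, c.licence_days, max(p.end_date)
    from Employee e
    cross join Configuration c
    left join Periods p
      on p.employee_id = e.id
     and p.configuration_id = c.id
     and p.period_type = 'vaccation'
    group by e.id, c.years, c.licence_days
    order by e.id, c.years
""").fetchall()

for row in rows:
    print(row)
```

The filter on period_type sits in the ON clause rather than in a WHERE clause. Putting it in WHERE would discard the combinations that have no period at all, which are exactly the rows that should show null.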
@umberto-petrov chooses wisely with the ANSI CROSS JOIN syntax for a Cartesian join. However, in the unlikely event that you need the configurations in the output even when there are no employees, you can go with something like this:
EDIT: Filtering the Periods join with 'vaccation', as asked in the comments.
If you have to filter for some employee ids, replace ON 1 = 1 with ON Employee.id IN (id1, id2, ...). It still keeps every configuration but only takes the employees that match the ids.
SELECT Employee.id AS employee_id,
       Configuration.years,
       Configuration.licence_days,
       MAX(Periods.end_date) AS max_end_date  -- end_date lives in Periods, not Configuration
FROM Configuration
LEFT JOIN Employee ON 1 = 1
LEFT JOIN Periods ON Periods.configuration_id = Configuration.id
                 AND Periods.employee_id = Employee.id
                 AND Periods.period_type = 'vaccation'
GROUP BY Employee.id,
         Configuration.years,
         Configuration.licence_days
ORDER BY Employee.id,
         Configuration.years,
         Configuration.licence_days
We start from Configuration so that every record from it appears at least once, then make a LEFT Cartesian join with Employee, and finally a LEFT JOIN on Periods against both. That way, if there are no employees, this will output years and licence_days for each configuration, with NULL for employee_id and max_end_date.
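The edge case described here can be tried out with an empty Employee table. The following is a minimal sketch in SQLite via Python's sqlite3 module (an assumption for the sake of a runnable example, not Oracle), using the column names from the question's table definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Employee and Periods are left empty on purpose
    create table Employee (id integer, name text, number integer);
    create table Configuration (id integer, years integer, licence_days integer);
    create table Periods (id integer, start_date text, end_date text,
                          configuration_id integer, employee_id integer,
                          period_type text);
    insert into Configuration values (1, 1, 8), (2, 3, 16), (3, 5, 24);
""")

# Configuration drives the query, so its rows survive even without employees.
rows_no_employees = conn.execute("""
    select Employee.id, Configuration.years, Configuration.licence_days,
           max(Periods.end_date) as max_end_date
    from Configuration
    left join Employee on 1 = 1
    left join Periods on Periods.configuration_id = Configuration.id
                     and Periods.employee_id = Employee.id
                     and Periods.period_type = 'vaccation'
    group by Employee.id, Configuration.years, Configuration.licence_days
    order by Employee.id, Configuration.years, Configuration.licence_days
""").fetchall()

for row in rows_no_employees:
    print(row)
```

Each configuration comes back once, with None in the employee and end date positions. A plain CROSS JOIN against the empty Employee table would return no rows at all.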

t-sql to summarize range of dates from flat list of dates, grouped by other columns

Suppose I had the following table:
UserId AttributeId DateStart
1 3 1/1/2020
1 4 1/9/2020
1 3 2/2/2020
2 3 3/5/2020
2 3 4/1/2020
2 3 5/1/2020
For each unique UserId/AttributeId pair, it is assumed that the DateEnd is the day prior to the next DateStart for that pair, otherwise it is null (or some default like crazy far into the future - 12/31/3000).
Applying this operation to the above table would yield:
UserId AttributeId DateStart DateEnd
1 3 1/1/2020 2/1/2020
1 4 1/9/2020 <null>
1 3 2/2/2020 <null>
2 3 3/5/2020 3/31/2020
2 3 4/1/2020 4/30/2020
2 3 5/1/2020 <null>
What T-SQL, executing in SQL Server 2008 R2, would accomplish this?
I have changed the query. Please try this:
SELECT UserId, AttributeId, DateStart, MIN(DateEnd) AS DateEnd
FROM
(
    -- pair each row with every later row of the same UserId/AttributeId
    SELECT X.UserId, X.AttributeId, X.DateStart,
           DATEADD(DD, -1, Y.DateStart) AS DateEnd
    FROM TAB X
    LEFT JOIN TAB Y
      ON X.UserId = Y.UserId
     AND X.AttributeId = Y.AttributeId
     AND X.DateStart < Y.DateStart
) T
GROUP BY UserId, AttributeId, DateStart
ORDER BY DateStart
You are describing lead(). Note that lead() was introduced in SQL Server 2012, so it is not available in 2008 R2:
select t.*,
dateadd(day, -1, lead(dateStart) over (partition by userId, attributeId order by dateStart)) as dateEnd
from t;
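To compare the two approaches, here is a minimal sketch that runs both a self-join version and a lead() version on the question's data. It uses SQLite through Python's sqlite3 module, which is an assumption for the sake of a runnable example: SQLite's date(x, '-1 day') stands in for T-SQL's DATEADD, dates are stored as ISO text, and the table name tab is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table tab (UserId integer, AttributeId integer, DateStart text);
    insert into tab values
        (1, 3, '2020-01-01'), (1, 4, '2020-01-09'), (1, 3, '2020-02-02'),
        (2, 3, '2020-03-05'), (2, 3, '2020-04-01'), (2, 3, '2020-05-01');
""")

# Window-function version: the next DateStart of the same pair, minus one day.
lead_sql = """
    select UserId, AttributeId, DateStart,
           date(lead(DateStart) over (partition by UserId, AttributeId
                                      order by DateStart), '-1 day') as DateEnd
    from tab
    order by UserId, DateStart
"""

# Self-join version: the earliest later DateStart of the same pair, minus one day.
self_join_sql = """
    select x.UserId, x.AttributeId, x.DateStart,
           date(min(y.DateStart), '-1 day') as DateEnd
    from tab x
    left join tab y
      on x.UserId = y.UserId
     and x.AttributeId = y.AttributeId
     and x.DateStart < y.DateStart
    group by x.UserId, x.AttributeId, x.DateStart
    order by x.UserId, x.DateStart
"""

lead_rows = conn.execute(lead_sql).fetchall()
self_join_rows = conn.execute(self_join_sql).fetchall()

for row in lead_rows:
    print(row)
print(lead_rows == self_join_rows)
```

Both queries give the same rows on this data. The self-join form only needs features that predate window functions, while the lead() form avoids comparing every row with every later row of its pair.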

Use Calendar table to generate historical view of the data

I have a created_date (timestamp) column on one of my tables, which also holds the duration of a project, and I need to join it with another table that only has a first_day_of_month column, containing the first day of each month, plus other relevant information.
Table 1
id project_id created_date duration
1 12345 01/01/2015 10
2 12345 20/10/2015 11
3 12345 10/04/2016 13
4 12345 10/08/2016 15
Table 2
project_id month_start_date
12345 01/01/2015
12345 01/02/2015
12345 01/03/2015
12345 01/04/2015
...
12345 01/08/2016
Expected result
project_id month_start_date duration
12345 01/01/2015 10
12345 01/02/2015 10
...
12345 01/10/2015 11
12345 01/11/2015 11
...
12345 01/04/2016 13
12345 01/05/2016 13
12345 01/06/2016 13
...
12345 01/08/2016 15
I want to be able to present the data listed in my second table historically. So, basically I want the query to return the same duration related to the month_start_date, so that values will repeat until another dateadd(month,datediff(month,0,created_date),0) = first_day_of_month is met... and so forth.
This is my query:
select table2.project_name,
table2.month_start_date,
table1.duration,
table1.created_date
from table1 left outer join table2
on table1.project_id=table2.project_id
where dateadd(month,datediff(month,0,table1.created_date),0)<=table2.month_start_date
group by table2.project_name,table2.month_start_date,table1.duration,table1.created_date
order by table2.month_start_date asc
but I get repeated records on this:
Result I'm getting
project_id month_start_date duration
12345 01/01/2015 10
12345 01/02/2015 10
...
12345 01/10/2015 10
12345 01/10/2015 11
...
12345 01/04/2016 10
12345 01/04/2016 11
12345 01/04/2016 13
...
12345 01/08/2016 10
12345 01/08/2016 11
12345 01/08/2016 13
12345 01/08/2016 15
Can anyone help?
Thank you!
I'd use the CROSS/OUTER APPLY operator.
Here is one possible variant. For each row in your calendar table Table2 (that is, for each month), the correlated subquery inside the CROSS APPLY finds one row from Table1: the most recent row with the same project_id whose created_date is earlier than month_start_date plus 1 month.
SELECT
Table2.project_id
,Table2.month_start_date
,Durations.duration
FROM
Table2
CROSS APPLY
(
SELECT TOP(1) Table1.duration
FROM Table1
WHERE
Table1.project_id = Table2.project_id
AND Table1.created_date < DATEADD(month, 1, Table2.month_start_date)
ORDER BY Table1.created_date DESC
) AS Durations
;
Make sure that Table1 has an index on (project_id, created_date) that includes (duration). Otherwise, performance will be poor.
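The lookup rule can be checked on the question's data with the sketch below. SQLite has no CROSS APPLY, so this swaps in a correlated scalar subquery with ORDER BY ... LIMIT 1, which behaves like OUTER APPLY (a month with no matching row gets NULL instead of being dropped). Running it through Python's sqlite3 module and storing dates as ISO text are assumptions made for the sake of a runnable example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Table1 (id integer, project_id integer,
                         created_date text, duration integer);
    create table Table2 (project_id integer, month_start_date text);
    insert into Table1 values
        (1, 12345, '2015-01-01', 10), (2, 12345, '2015-10-20', 11),
        (3, 12345, '2016-04-10', 13), (4, 12345, '2016-08-10', 15);
""")

# Calendar rows: the first day of each month from 2015-01 through 2016-08.
months = [f"{y}-{m:02d}-01" for y in (2015, 2016) for m in range(1, 13)
          if (y, m) <= (2016, 8)]
conn.executemany("insert into Table2 values (12345, ?)", [(d,) for d in months])

# For each month, take the latest Table1 row created before the next month starts.
rows = conn.execute("""
    select t2.project_id, t2.month_start_date,
           (select t1.duration
            from Table1 t1
            where t1.project_id = t2.project_id
              and t1.created_date < date(t2.month_start_date, '+1 month')
            order by t1.created_date desc
            limit 1) as duration
    from Table2 t2
    order by t2.month_start_date
""").fetchall()

duration_by_month = {month: duration for _, month, duration in rows}
for row in rows:
    print(row)
```

Each month appears exactly once because the subquery returns at most one row, which is what removes the repeated records produced by the plain join in the question.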

Generate sequence based on the value in the previous row and current row

I have the below table having student information.
S_ID Group_ID Date Score
12345 1 1/1/2015 1
12345 1 2/1/2015 2
12345 1 3/1/2015 4
12345 1 4/1/2015 5
12345 1 9/1/2015 3
12345 1 10/1/2015 8
12345 2 1/1/2015 2
12345 2 2/1/2015 4
12345 2 3/1/2015 6
I want to generate a new table for a few students, adding a sequence column as shown below:
S_ID Group_ID Date Score Sequence
12345 1 1/1/2015 1 1
12345 1 2/1/2015 2 2
12345 1 3/1/2015 4 3
12345 1 4/1/2015 5 4
12345 1 9/1/2015 3 3
12345 1 10/1/2015 8 4
12345 2 1/1/2015 2 2
12345 2 2/1/2015 4 3
12345 2 3/1/2015 6 4
Rules:
Sequence should be generated for each combination of S_ID and Group_ID.
For the first record, the sequence number is the same as the Score.
From the 2nd record onwards, it is 1 + the previous sequence number.
If the difference between the date of the previous row and the current row is more than 100 days, the sequence number restarts (same as the Score for that record).
This is a large table and I am looking for the most optimized SQL. Any help would be greatly appreciated.
The trick here is to find where the sequence numbers start over. This is for new students, groups, and when the previous date has too big a gap. For the latter, you can use lag() to calculate a "new dates start flag" and then aggregate this to get a grouping.
select t.*,
(first_value(score) over (partition by s_id, group_id, grp order by date) +
row_number() over (partition by s_id, group_id, grp order by date) - 1
) as sequence
from (select t.*,
sum(case when prev_date is null or prev_date < date - 100
then 1 else 0
end) over (partition by s_id, group_id order by date) as grp
from (select t.*,
lag(date) over (partition by s_id, group_id order by date) as prev_date
from t
) t
) t;
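Here is a minimal sketch of the same three-step idea (lag, running sum of restart flags, then first_value plus row_number) run on the question's data. It uses SQLite through Python's sqlite3 module, which is an assumption for the sake of a runnable example. The date column is renamed dt, the dates are read as month/day/year and stored as ISO text, and julianday() replaces the date arithmetic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table t (s_id integer, group_id integer, dt text, score integer);
    insert into t values
        (12345, 1, '2015-01-01', 1), (12345, 1, '2015-02-01', 2),
        (12345, 1, '2015-03-01', 4), (12345, 1, '2015-04-01', 5),
        (12345, 1, '2015-09-01', 3), (12345, 1, '2015-10-01', 8),
        (12345, 2, '2015-01-01', 2), (12345, 2, '2015-02-01', 4),
        (12345, 2, '2015-03-01', 6);
""")

rows = conn.execute("""
    select s_id, group_id, dt, score,
           first_value(score) over (partition by s_id, group_id, grp order by dt)
           + row_number() over (partition by s_id, group_id, grp order by dt)
           - 1 as sequence
    from (select b.*,
                 -- running count of restarts identifies each island
                 sum(case when prev_dt is null
                            or julianday(dt) - julianday(prev_dt) > 100
                          then 1 else 0 end)
                     over (partition by s_id, group_id order by dt) as grp
          from (select a.*,
                       lag(dt) over (partition by s_id, group_id
                                     order by dt) as prev_dt
                from t a) b) c
    order by group_id, dt
""").fetchall()

sequences = [r[4] for r in rows]
for row in rows:
    print(row)
```

The 153-day gap between 2015-04-01 and 2015-09-01 starts a new island in group 1, so the sequence restarts from that row's score.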

How many Days each item was in each State, the full value of the period

This post is really similar to my question:
SQL Server : how many days each item was in each state
but I don't have the Revision column to see which is the previous state, and I also want to get the full time of a status, I b
....
I want to get how long an item has been in each status overall. My table looks like this:
ID DATE STATUS
3D56B7B1-FCB3-4897-BAEB-004796E0DC8D 2016-04-05 11:30:00.000 1
3D56B7B1-FCB3-4897-BAEB-004796E0DC8D 2016-04-08 11:30:00.000 13
274C5DA9-9C38-4A54-A697-009933BB7B7F 2016-04-29 08:00:00.000 5
274C5DA9-9C38-4A54-A697-009933BB7B7F 2016-05-04 08:00:00.000 4
A70A66DC-9D9E-49BE-93CF-00F9E3E06CE2 2016-04-14 07:50:00.000 1
A70A66DC-9D9E-49BE-93CF-00F9E3E06CE2 2016-04-21 14:00:00.000 2
A70A66DC-9D9E-49BE-93CF-00F9E3E06CE2 2016-04-23 12:15:00.000 3
A70A66DC-9D9E-49BE-93CF-00F9E3E06CE2 2016-04-23 16:15:00.000 1
BF122AE1-CB39-4967-8F37-012DC55E92A7 2016-04-05 10:30:00.000 1
BF122AE1-CB39-4967-8F37-012DC55E92A7 2016-04-20 17:00:00.000 5
I want to get this:
Column 1: ID, Column 2: Status, Column 3: Time with the status
Column 3 (time with the status) = NextDate - PreviousDate + 1
If it is the last status, it counts as 1.
If there is more than one status on the same day, I take the last one (only the last status of the day matters).
Per ID, Status must be unique.
It should look like this:
ID STATUS TIME
3D56B7B1-FCB3-4897-BAEB-004796E0DC8D 1 3
3D56B7B1-FCB3-4897-BAEB-004796E0DC8D 13 1
274C5DA9-9C38-4A54-A697-009933BB7B7F 5 5
274C5DA9-9C38-4A54-A697-009933BB7B7F 4 1
A70A66DC-9D9E-49BE-93CF-00F9E3E06CE2 1 8
A70A66DC-9D9E-49BE-93CF-00F9E3E06CE2 2 2
BF122AE1-CB39-4967-8F37-012DC55E92A7 1 15
BF122AE1-CB39-4967-8F37-012DC55E92A 5 1
Thanks to @ConradFrix's comments, this is how it works:
WITH CTE
AS
(
SELECT
ID,
STATUS,
DATE,
LEAD(DATE, 1) over (partition by ID order by DATE) LEAD,
ISNULL(DATEDIFF(DAYOFYEAR, DATE,
LEAD(DATE, 1) over (partition by ID order by DATE)), 1) DIF_BY_LEAD
FROM TABLE_NAME
)
SELECT ID, STATUS, SUM(DIF_BY_LEAD) AS TIME_STATUS
FROM CTE GROUP BY ID, STATUS
ORDER BY ID, STATUS
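The query can be tried on two of the sample IDs with the sketch below. It uses SQLite through Python's sqlite3 module, which is an assumption for the sake of a runnable example: the difference of julianday(date(...)) values stands in for DATEDIFF counting calendar-day boundaries, and the table name status_log and column name ts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table status_log (id text, ts text, status integer);
    insert into status_log values
        ('3D56B7B1-FCB3-4897-BAEB-004796E0DC8D', '2016-04-05 11:30:00.000', 1),
        ('3D56B7B1-FCB3-4897-BAEB-004796E0DC8D', '2016-04-08 11:30:00.000', 13),
        ('A70A66DC-9D9E-49BE-93CF-00F9E3E06CE2', '2016-04-14 07:50:00.000', 1),
        ('A70A66DC-9D9E-49BE-93CF-00F9E3E06CE2', '2016-04-21 14:00:00.000', 2),
        ('A70A66DC-9D9E-49BE-93CF-00F9E3E06CE2', '2016-04-23 12:15:00.000', 3),
        ('A70A66DC-9D9E-49BE-93CF-00F9E3E06CE2', '2016-04-23 16:15:00.000', 1);
""")

rows = conn.execute("""
    with cte as (
        select id, status, ts,
               -- whole calendar days until the next status change; 1 for the last row
               coalesce(
                   cast(julianday(date(lead(ts) over (partition by id order by ts)))
                        - julianday(date(ts)) as integer),
                   1) as dif_by_lead
        from status_log
    )
    select id, status, sum(dif_by_lead) as time_status
    from cte
    group by id, status
    order by id, status
""").fetchall()

for row in rows:
    print(row)
```

On this data the totals for statuses 1, 2 and 13 match the expected table. Status 3, which was replaced later on the same day, still appears with a total of 0, so a filter such as HAVING SUM(...) > 0 would be needed if such rows should be hidden.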