SQL - Create multiple records from 1 record based on days between 2 dates

I have a table that holds employees' leave. If an employee takes more than one day off in a row, for example 22-05-2020 to 26-05-2020, it is stored as a single record. I am trying to display this as 5 records, one for each day they were off.
My table is called: Emp_Annual_Leave
and has the following fields
emp_no leave_type leave_year half_day start_date end_date days_of_leave
12345 Annual 2020 N 22/05/2020 26/05/2020 5
above is how it is currently displayed and I am trying to display the above record like below:
emp_no leave_type leave_year half_day start_date end_date days_of_leave leave_date
12345 Annual 2020 N 22/05/2020 26/05/2020 1 22/05/2020
12345 Annual 2020 N 22/05/2020 26/05/2020 1 23/05/2020
12345 Annual 2020 N 22/05/2020 26/05/2020 1 24/05/2020
12345 Annual 2020 N 22/05/2020 26/05/2020 1 25/05/2020
12345 Annual 2020 N 22/05/2020 26/05/2020 1 26/05/2020
Does anyone know how I would go about doing this? I have a feeling I need to use ROW_NUMBER() OVER(PARTITION BY), but none of my attempts have worked well.
Thanks in advance,
EDIT:
The table I need to create here is a subquery in a bigger query and needs to be joined back to other queries and tables in my DB. I didn't include this in my original question; I have added it now in case it affects the method I need to use.

You could use a recursive query:
with cte as (
select emp_no, leave_type, leave_year, half_day, start_date, end_date, days_of_leave, start_date as leave_date from emp_annual_leave
union all
select emp_no, leave_type, leave_year, half_day, start_date, end_date, days_of_leave, dateadd(day, 1, leave_date)
from cte
where leave_date < end_date
)
select * from cte
If a given leave may span more than 100 days, you need to add option (maxrecursion 0) at the end of the query, because the default recursion limit is 100 levels.
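The recursive expansion can be tried out locally. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in database; SQLite uses date(x, '+1 day') where the answer uses dateadd, and the ISO-8601 text dates and the trimmed column list are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp_annual_leave ("
    "emp_no INTEGER, leave_type TEXT, start_date TEXT, end_date TEXT)"
)
# Dates are stored as ISO-8601 text so SQLite's date() can do arithmetic on them
# and plain string comparison orders them correctly.
conn.execute(
    "INSERT INTO emp_annual_leave VALUES (12345, 'Annual', '2020-05-22', '2020-05-26')"
)

EXPAND_SQL = """
WITH RECURSIVE cte AS (
    -- anchor: one row per leave record, starting at its first day
    SELECT emp_no, leave_type, start_date, end_date, start_date AS leave_date
    FROM emp_annual_leave
    UNION ALL
    -- recursive step: add one day until the end date is reached
    SELECT emp_no, leave_type, start_date, end_date, date(leave_date, '+1 day')
    FROM cte
    WHERE leave_date < end_date
)
SELECT emp_no, leave_date FROM cte ORDER BY emp_no, leave_date
"""

leave_days = conn.execute(EXPAND_SQL).fetchall()
for row in leave_days:
    print(row)
```

Because the expansion is an ordinary SELECT, it can be wrapped as a subquery or CTE and joined to other tables, which is what the edit to the question asks for.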

Related

Oracle SQL: How to fill Null value with data from most recent previous date that is not null?

Essentially, the date field is updated every month along with other fields; however, one field is only updated ~6 times throughout the year. For months where that field is not updated, I want to show the most recent previous value.
Date Emp_no Sales Group
Jan 1234 100 Med
Feb 1234 200 ---
Mar 1234 170 ---
Apr 1234 150 Low
May 1234 180 ---
Jun 1234 90 High
Jul 1234 100 ---
Need it to show:
Date Emp_no Sales Group
Jan 1234 100 Med
Feb 1234 200 Med
Mar 1234 170 Med
Apr 1234 150 Low
May 1234 180 Low
Jun 1234 90 High
Jul 1234 100 High
This field is not updated at set intervals; there could be 1-4 months of NULLs in a row.
I tried something like this to get the previous month's value, but I am unsure how to deal with the fact that I could need to look back anywhere between 1 and 4 months:
LAG(Group)
OVER(PARTITION BY emp_no
ORDER BY date)
Thanks!
This is the traditional "gaps and islands" problem.
There are various ways to solve it, a simple version will work for you.
First, create a new identifier that splits the rows into "groups", where only the first row in each group is NOT NULL.
SUM(CASE WHEN "group" IS NOT NULL THEN 1 ELSE 0 END) OVER (PARTITION BY emp_no ORDER BY "date") AS emp_group_id
Then you can use MAX() in another window function, as all "groups" will only have one NOT NULL value.
WITH
gaps
AS
(
SELECT
t.*,
SUM(
CASE WHEN "group" IS NOT NULL
THEN 1
ELSE 0
END
)
OVER (
PARTITION BY emp_no
ORDER BY "date"
)
AS emp_group_id
FROM
your_table t
)
SELECT
"date",
emp_no,
sales,
MAX("group")
OVER (
PARTITION BY emp_no, emp_group_id
)
AS "group"
FROM
gaps
Edit
Ignore all that.
Oracle has IGNORE NULLS.
LAST_VALUE("group" IGNORE NULLS)
OVER (
PARTITION BY emp_no
ORDER BY "date"
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
)
AS "group"
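For experimenting without an Oracle instance, here is a hedged sketch of the running-count variant using Python's sqlite3 module. SQLite has no IGNORE NULLS, so only the SUM/MAX approach is shown; it needs SQLite 3.25 or later for window functions. The table name sales and the columns month_no and grp are illustrative substitutes for the quoted "date" and "group" columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (month_no INTEGER, emp_no INTEGER, sales INTEGER, grp TEXT)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (1, 1234, 100, "Med"),
        (2, 1234, 200, None),
        (3, 1234, 170, None),
        (4, 1234, 150, "Low"),
        (5, 1234, 180, None),
        (6, 1234, 90, "High"),
        (7, 1234, 100, None),
    ],
)

FILL_SQL = """
WITH gaps AS (
    SELECT s.*,
           -- running count of non-NULL values: it increases by one on every
           -- row that carries a value, so each value starts a new group
           SUM(CASE WHEN grp IS NOT NULL THEN 1 ELSE 0 END)
               OVER (PARTITION BY emp_no ORDER BY month_no) AS emp_group_id
    FROM sales s
)
SELECT month_no, emp_no, sales,
       -- each group holds exactly one non-NULL value, and MAX ignores NULLs
       MAX(grp) OVER (PARTITION BY emp_no, emp_group_id) AS grp
FROM gaps
ORDER BY emp_no, month_no
"""

filled = conn.execute(FILL_SQL).fetchall()
for row in filled:
    print(row)
```

Rows that precede the first non-NULL value for an employee fall into group 0 and stay NULL, which is usually the desired behaviour.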

Find most visited Hotel by month in PostgreSQL

I have a table of customers who stayed in hotels for one or more months. I need to find the 3 most visited hotels per month. If a customer stayed in a hotel across three months, the stay counts for each of those three months. To be more precise, below is the hotel table I have:
id usr_id srch_ci srch_co hotel_id
1 13 2021-10-01 2021-11-22 200
2 12 2021-10-11 2021-10-22 300
3 11 2021-10-28 2021-11-05 200
4 10 2021-10-28 2021-12-03 100
Result should look like below:
mnth hotel_id rnk visits
2021-10 200 1 2
2021-10 100 2 1
2021-10 300 2 1
2021-11 200 1 2
2021-11 100 2 1
2021-12 100 1 1
As we can see above, usr_id = 10 stayed in hotel_id = 100 across 3 different months, so that stay counts as 1 visit to the hotel in each of those months. In 2021-12 only usr_id = 10 stayed anywhere, which is why hotel_id = 100 is ranked 1st for 2021-12.
I solved the problem using the generate_series function in Postgres, which is what I was looking for. This link helped me: Splitting single row into multiple rows based on date
-- RANK() rather than ROW_NUMBER(), so hotels tied on visits share a rank as in the expected result
SELECT hotel_id,mnth,visits,
RANK() OVER (PARTITION BY mnth ORDER BY visits DESC) AS rnk FROM (
SELECT hotel_id,to_char(live_mnth,'YYYY-MM') AS mnth,count(*) AS visits FROM (
SELECT id,usr_id,hotel_id,date_in,date_out,
generate_series(date_in, date_out, '1 MONTH')::DATE AS live_mnth
FROM (
SELECT *,TO_CHAR(srch_ci, 'yyyy-mm-01')::date AS date_in,
TO_CHAR(srch_co, 'yyyy-mm-01')::date AS date_out
FROM hotels
) s
) s GROUP BY hotel_id,to_char(live_mnth,'YYYY-MM')
) t
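The month expansion and ranking can be checked against the sample rows with a small sketch. It assumes Python's sqlite3 module as a stand-in for Postgres: a recursive CTE replaces generate_series, date(x, 'start of month') replaces the TO_CHAR(...)::date truncation, and RANK() is used so that ties share a rank as in the expected output. The CTE names stay_months and monthly are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hotels ("
    "id INTEGER, usr_id INTEGER, srch_ci TEXT, srch_co TEXT, hotel_id INTEGER)"
)
conn.executemany(
    "INSERT INTO hotels VALUES (?, ?, ?, ?, ?)",
    [
        (1, 13, "2021-10-01", "2021-11-22", 200),
        (2, 12, "2021-10-11", "2021-10-22", 300),
        (3, 11, "2021-10-28", "2021-11-05", 200),
        (4, 10, "2021-10-28", "2021-12-03", 100),
    ],
)

RANK_SQL = """
WITH RECURSIVE stay_months AS (
    -- one row per stay, truncated to the first day of the check-in month
    SELECT hotel_id,
           date(srch_ci, 'start of month') AS live_mnth,
           date(srch_co, 'start of month') AS last_mnth
    FROM hotels
    UNION ALL
    -- step forward one month at a time until the check-out month
    SELECT hotel_id, date(live_mnth, '+1 month'), last_mnth
    FROM stay_months
    WHERE live_mnth < last_mnth
),
monthly AS (
    SELECT strftime('%Y-%m', live_mnth) AS mnth, hotel_id, COUNT(*) AS visits
    FROM stay_months
    GROUP BY mnth, hotel_id
)
SELECT mnth, hotel_id,
       RANK() OVER (PARTITION BY mnth ORDER BY visits DESC) AS rnk,
       visits
FROM monthly
ORDER BY mnth, rnk, hotel_id
"""

ranking = conn.execute(RANK_SQL).fetchall()
for row in ranking:
    print(row)
```

To keep only the top 3 hotels per month, the final SELECT could be wrapped in another query that filters on rnk <= 3.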

Count distinct customers who bought in previous period and not in next period Bigquery

I have a dataset in bigquery which contains order_date: DATE and customer_id.
order_date | CustomerID
2019-01-01 | 111
2019-02-01 | 112
2020-01-01 | 111
2020-02-01 | 113
2021-01-01 | 115
2021-02-01 | 119
I am trying to count distinct customer_id values between each month of the previous year and the same month of the current year, for example from 2019-01-01 to 2020-01-01, then from 2019-02-01 to 2020-02-01, and then count those who did not buy in the same period of the next year (2020-01-01 to 2021-01-01, then 2020-02-01 to 2021-02-01).
The output I expect:
order_date| count distinct CustomerID|who not buy in the next period
2020-01-01| 5191 |250
2020-02-01| 4859 |500
2020-03-01| 3567 |349
..........| .... |......
and the next periods shouldn't include the previous ones.
I tried the code below, but it works differently:
with customers as (
select distinct date_trunc(date(order_date),month) as dates,
CUSTOMER_WID
from t
where date(order_date) between '2018-01-01' and current_date()-1
)
select
dates,
customers_previous,
customers_next_period
from
(
select dates,
count(CUSTOMER_WID) as customers_previous,
count(case when customer_wid_next is null then 1 end) as customers_next_period,
from (
select prev.dates,
prev.CUSTOMER_WID,
next.dates as next_dates,
next.CUSTOMER_WID as customer_wid_next
from customers as prev
left join customers
as next on next.dates=date_add(prev.dates,interval 1 year)
and prev.CUSTOMER_WID=next.CUSTOMER_WID
) as t2
group by dates
)
order by 1,2
Thanks in advance.
If I understand correctly, you are trying to count values over a window of time, and for that I recommend using window functions (docs here, and here is a great article explaining how they work).
That said, my recommendation would be:
SELECT DISTINCT
periods,
COUNT(DISTINCT customer_id) OVER last_12mos AS count_customers_last_12_mos
FROM (
SELECT
order_date,
FORMAT_DATE('%Y%m', order_date) AS periods,
customer_id
FROM dataset
)
WINDOW last_12mos AS ( # window of last 12 months without current month
PARTITION BY periods ORDER BY periods DESC
ROWS BETWEEN 12 PRECEDING AND 1 PRECEDING
)
I believe from this you can build some customizations to improve the aggregations you want.
You can generate the periods using unnest(generate_date_array()). Then use joins to bring in the customers from the previous 12 months and the next 12 months. Finally, aggregate and count the customers:
select period,
count(distinct c_prev.customer_wid),
count(distinct c_next.customer_wid)
from unnest(generate_date_array(date '2020-01-01', date '2021-01-01', interval 1 month)) period join
customers c_prev
on c_prev.order_date <= period and
c_prev.order_date > date_add(period, interval -12 month) left join
customers c_next
on c_next.customer_wid = c_prev.customer_wid and
c_next.order_date > period and
c_next.order_date <= date_add(period, interval 12 month)
group by period;
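One way to sanity-check the join-based approach is on a handful of rows. The sketch below assumes Python's sqlite3 module instead of BigQuery, a hypothetical orders table, and a hard-coded list of two periods in place of generate_date_array. It also assumes one particular reading of the question: customers who ordered in the 12 months up to a period, and how many of them did not order in the following 12 months.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_date TEXT, customer_id INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        ("2019-03-10", 111),
        ("2019-06-05", 112),
        ("2020-02-01", 111),
        ("2020-05-20", 113),
    ],
)

CHURN_SQL = """
WITH periods(period) AS (VALUES ('2020-01-01'), ('2021-01-01'))
SELECT p.period,
       COUNT(DISTINCT c_prev.customer_id) AS customers_prev,
       -- customers seen before the period minus those seen again after it
       COUNT(DISTINCT c_prev.customer_id)
         - COUNT(DISTINCT c_next.customer_id) AS not_in_next
FROM periods p
JOIN orders c_prev
  ON c_prev.order_date <= p.period
 AND c_prev.order_date > date(p.period, '-12 months')
LEFT JOIN orders c_next
  ON c_next.customer_id = c_prev.customer_id
 AND c_next.order_date > p.period
 AND c_next.order_date <= date(p.period, '+12 months')
GROUP BY p.period
ORDER BY p.period
"""

churn = conn.execute(CHURN_SQL).fetchall()
for row in churn:
    print(row)
```

For the period 2020-01-01, customers 111 and 112 ordered in the preceding 12 months and only 111 ordered again afterwards, so one customer is counted as not buying in the next period.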

Add N business days to a given date skipping holidays, exceptions and weekends in SQL DB2

I'm facing a challenging task here. I spent a day on it and was only able to solve it through a procedure, but it takes too long to run for all projects.
I would like to solve it in a single query if possible (no functions or procedures).
There are already some questions here that do this in programming languages or with SQL functions/procedures (which is also how I solved mine), so I'm asking whether it is possible to solve it with just SQL.
The background info is:
A project table
A phase table
A holiday table
A dayexception table, which cancels a holiday or a weekend day (makes that date a working day) and is associated with a project
A project may have 0-N phases
A phase has a start date, a duration and a draworder (needed by the system)
Working days are all days that are not weekend days and not holidays (unless that date is in the dayexception table)
Consider this following scenario:
project | phase(s) | Dayexception | Holiday
id | id pid start duration draworder | pid date | date
1 | 1 1 2014-01-20 10 0 | 1 2014-01-25 | 2014-01-25
| 2 1 2014-02-17 14 2 | |
The ENDDATE for project id 1 and phase id 1 is actually 2014-01-31; see the generated data below.
Dates in the data below (and from now on) are formatted as dd/mm/yyyy (Brazilian format), and the value N means null.
proj pha start day weekday dayexcp holiday workday
1 1 20/01/2014 20/01/2014 2 N N 1
1 1 20/01/2014 21/01/2014 3 N N 1
1 1 20/01/2014 22/01/2014 4 N N 1
1 1 20/01/2014 23/01/2014 5 N N 1
1 1 20/01/2014 24/01/2014 6 N N 1
1 1 20/01/2014 25/01/2014 7 25/01/2014 25/01/2014 1
1 1 20/01/2014 26/01/2014 1 N N 0
1 1 20/01/2014 27/01/2014 2 N 27/01/2014 0
1 1 20/01/2014 28/01/2014 3 N N 1
1 1 20/01/2014 29/01/2014 4 N N 1
To generate the above data I created a view, daysOfYear, with all days from 2014 and 2015 (it can be bigger or smaller; I used two years to cover phases that cross a year boundary) using a CTE query. If you want to see it, let me know and I will add it here. I then used the following select statement:
select ph.project_id proj,
ph.id phase_id pha,
ph.start,
dy.curday day,
dy.weekday, /*weekday here is a calling to the weekday function of db2*/
doe.exceptiondate dayexcp,
h.date holiday,
case when exceptiondate is not null or (weekday not in (1,7) and h.date is null)
then 1 else 0 end as workday
from phase ph
inner join daysofyear dy
on (year(ph.start) = dy.year)
left join dayexception doe
on (ph.project_id = doe.project_id
and dy.curday = truncate(doe.exceptiondate))
left join holiday h
on (dy.curday = truncate(h.date))
where ph.project_id = 1
and ph.id = 1
and dy.year in (year(ph.start),year(ph.start)+1)
and dy.curday>=ph.start
and dy.curday<=ph.start + ((duration - 1) days)
order by ph.project_id, start, dy.curday, draworder
To solve this scenario I created the following query:
select project_id,
min(start),
max(day) + sum(case when workday=0 then 1 else 0 end) days as enddate
from project_phase_days /*(view to the above select)*/
group by project_id /*needed because project_id is selected next to the aggregates*/
This will return correctly:
proj start enddate
1 20/01/2014 31/01/2014
The problem I couldn't solve is what happens when the days I'm adding (the non-workdays, sum(case when workday=0 then 1 else 0 end) days) to the last day (max(day)) are themselves weekend days, holidays or exceptions.
See the following scenario (The duration for the below phase is 7):
proj pha start day weekday dayexcp holiday workday
81 578 14/04/2014 14/04/2014 2 N N 1
81 578 14/04/2014 15/04/2014 3 N N 1
81 578 14/04/2014 16/04/2014 4 N N 1
81 578 14/04/2014 17/04/2014 5 N N 1
81 578 14/04/2014 18/04/2014 6 N 18/04/2014 0
81 578 14/04/2014 19/04/2014 7 N 0
81 578 14/04/2014 20/04/2014 1 N 20/04/2014 0
/*the below data I added to show the problem*/
81 578 14/04/2014 21/04/2014 2 N 21/04/2014 0
81 578 14/04/2014 22/04/2014 3 N 1
81 578 14/04/2014 23/04/2014 4 N 1
81 578 14/04/2014 24/04/2014 5 N 1
With the above data my query will return
proj start enddate
81 14/04/2014 23/04/2014
But the correct enddate would be 24/04/2014. That's because my query doesn't take into account whether the days after the last day are weekend days or holidays (or exceptions, for that matter). As you can see in the dataset above, 21/04/2014, which is outside my duration, is also a holiday.
I also tried to create a CTE on the phase table that adds a day on each iteration until the duration is over, but I couldn't bring in the exceptions or the holidays because DB2 won't let me use a left join inside the recursive part of the CTE. Like this:
with CTE (projectid, start, enddate, duration, level) as (
select projectid, start, start as enddate, duration, 1
from phase
where project_id=1
and phase_id=1
UNION ALL
select projectid, start, enddate + (level days), duration,
case when isWorkDay(enddate + (level days)) then level+1 else level end as level
from CTE left join dayexception on ...
left join holiday on ...
where level < duration
) select * from CTE
PS: the above query doesn't work because of the DB2 limitations, and isWorkDay is just an example (it would be a case expression on the dayexception and holiday table values).
If you have any questions, please ask in the comments.
Any help would be greatly appreciated. Thanks.
How to count business days forward and backward.
Background: last century I worked at a company that used this technique, so this is a pseudocode answer. It worked great for their purposes.
What you need is a table that contains a date column and an id column that increments by one. Fill the table with business dates only. That's the tricky part, because some holidays are observed on a different date. For example, 2017-01-02 was a holiday where I work, but it's not really a recognized holiday AFAIK.
How to get 200 business days in the future:
Select the min(id) where date >= the current date.
Select the date where id = that id + 200.
How to get 200 business days in the past:
Select the min(id) from the table where date >= the current date.
Select the date where id = that id - 200.
Business days between:
select count(*) from myBusinessDays where "date" between startdate and enddate
Good luck, as this is pseudocode.
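The business-day table idea can be sketched in a few lines. This is an illustration only, assuming Python's sqlite3 module rather than DB2, a hypothetical business_days table, and a made-up holiday set taken from the April 2014 dates in the question; project-specific day exceptions are left out.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical holiday list for the example (weekday holidays from the question).
HOLIDAYS = {date(2014, 4, 18), date(2014, 4, 21)}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE business_days (id INTEGER PRIMARY KEY, day TEXT)")

# Fill the table with business dates only; ids increase by one per business day.
current = date(2014, 4, 1)
next_id = 1
while current <= date(2014, 5, 31):
    if current.weekday() < 5 and current not in HOLIDAYS:
        conn.execute(
            "INSERT INTO business_days VALUES (?, ?)", (next_id, current.isoformat())
        )
        next_id += 1
    current += timedelta(days=1)


def add_business_days(start, n):
    """Return the date n business days after (or before, if n < 0) start."""
    row = conn.execute(
        """
        SELECT day FROM business_days
        WHERE id = (SELECT MIN(id) FROM business_days WHERE day >= ?) + ?
        """,
        (start, n),
    ).fetchone()
    return row[0] if row else None


def business_days_between(start, end):
    """Count business days in the inclusive range start..end."""
    return conn.execute(
        "SELECT COUNT(*) FROM business_days WHERE day BETWEEN ? AND ?", (start, end)
    ).fetchone()[0]


shifted = add_business_days("2014-04-14", 6)
print(shifted, business_days_between("2014-04-14", shifted))
```

A phase of duration 7 that starts on 2014-04-14 ends 6 business days later, which lands on 2014-04-24, the end date the question asks for.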
So, using the idea from @danny117's answer I was able to create a query that solves my problem. It is not exactly his idea, but it pointed me in the right direction, so I will mark it as the correct answer; this answer shares the actual code.
First let me share the view I created for the periods. As I said, I created a view daysofyear with the data for 2014 and 2015 (in my final solution I used a considerably bigger interval, which does not affect the end result). PS: the date format here is the Brazilian format, dd/mm/yyyy.
create or replace view daysofyear as
with CTE (curday, year, weekday) as (
select a1.firstday, year(a1.firstday), dayofweek(a1.firstday)
from (select to_date('01/01/1990', 'dd/mm/yyyy') firstday
from sysibm.sysdummy1) as a1
union all
select a.curday + 1 day as sumday,
year(a.curday + 1 day),
dayofweek(a.curday + 1 day)
from CTE a
where a.curday < to_date('31/12/2050', 'dd/mm/yyyy')
)
select * from cte;
With that view I then created another view from the query in my question, extending the range by a number of days based on my historical data (longest phase plus a considerable margin). Here it is:
create or replace view project_phase_days as
select ph.project_id proj,
ph.id phase_id pha,
ph.start,
dy.curday day,
dy.weekday, /*weekday here is a calling to the weekday function of db2*/
doe.exceptiondate dayexcp,
h.date holiday,
ph.duration,
case when exceptiondate is not null or (weekday not in (1,7) and h.date is null)
then 1 else 0 end as workday
from phase ph
inner join daysofyear dy
on (year(ph.start) = dy.year)
left join dayexception doe
on (ph.project_id = doe.project_id
and dy.curday = truncate(doe.exceptiondate))
left join holiday h
on (dy.curday = truncate(h.date))
where dy.year in (year(ph.start),year(ph.start)+1)
and dy.curday>=ph.start
and dy.curday<=ph.start + ((duration - 1) days) + 200 days
/*max duration in database is 110*/
After that I then created this query:
select p.id,
a.start,
a.curday as enddate
from project p left join
(
select p1.project_id,
p1.duration,
p1.start,
p1.curday,
row_number() over (partition by p1.project_id
order by p1.project_id, p1.start, p1.curday) rorder
from project_phase_days p1
where p1.validday=1
) as a
on (p.id = a.project_id
and a.rorder = a.duration)
order by p.id, a.start
What it does is select all workdays from my view (joined with my other days view), number the rows per project_id ordered by project_id, start date and current day (curday), and then join with the project table. The tricky part that solved the problem is the join condition a.rorder = a.duration: the workday whose row number equals the phase duration is the end date.
If you need more explanation, I will be glad to provide it.
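The row-number trick can be reproduced on the two scenarios from the question. The sketch below is an approximation under stated assumptions: it uses Python's sqlite3 module instead of DB2, simplified table layouts, renamed columns (start_date, cal_day), a calendar limited to 2014, and it partitions by phase as well as project so that projects with several phases are numbered separately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE phase (project_id INTEGER, id INTEGER, start_date TEXT, duration INTEGER);
    CREATE TABLE holiday (cal_day TEXT);
    CREATE TABLE dayexception (project_id INTEGER, cal_day TEXT);
    INSERT INTO phase VALUES (1, 1, '2014-01-20', 10), (81, 578, '2014-04-14', 7);
    INSERT INTO holiday VALUES ('2014-01-25'), ('2014-01-27'),
                               ('2014-04-18'), ('2014-04-20'), ('2014-04-21');
    INSERT INTO dayexception VALUES (1, '2014-01-25');
    """
)

END_DATE_SQL = """
WITH RECURSIVE days(cal_day) AS (
    SELECT '2014-01-01'
    UNION ALL
    SELECT date(cal_day, '+1 day') FROM days WHERE cal_day < '2014-12-31'
),
phase_days AS (
    SELECT ph.project_id, ph.id AS phase_id, ph.start_date, ph.duration, d.cal_day,
           -- strftime('%w') returns '0' for Sunday and '6' for Saturday
           CASE WHEN ex.cal_day IS NOT NULL
                  OR (strftime('%w', d.cal_day) NOT IN ('0', '6') AND h.cal_day IS NULL)
                THEN 1 ELSE 0 END AS workday
    FROM phase ph
    JOIN days d ON d.cal_day >= ph.start_date
    LEFT JOIN dayexception ex
           ON ex.project_id = ph.project_id AND ex.cal_day = d.cal_day
    LEFT JOIN holiday h ON h.cal_day = d.cal_day
),
numbered AS (
    -- number only the workdays of each phase, in calendar order
    SELECT project_id, phase_id, start_date, duration, cal_day,
           ROW_NUMBER() OVER (PARTITION BY project_id, phase_id ORDER BY cal_day) AS rorder
    FROM phase_days
    WHERE workday = 1
)
SELECT project_id, phase_id, start_date, cal_day AS enddate
FROM numbered
WHERE rorder = duration   -- the Nth workday is the end date
ORDER BY project_id, phase_id
"""

end_dates = conn.execute(END_DATE_SQL).fetchall()
for row in end_dates:
    print(row)
```

Because non-workdays are filtered out before numbering, holidays and weekends that fall after the nominal duration are skipped automatically, which is the case the original sum-based query missed.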

How to identify and aggregate sequence from start and end dates

I'm trying to identify a consecutive sequence in dates, per person, as well as sum amount for that sequence. My records table looks like this:
person start_date end_date amount
1 2015-09-10 2015-09-11 500
1 2015-09-11 2015-09-12 100
1 2015-09-13 2015-09-14 200
1 2015-10-05 2015-10-07 2000
2 2015-10-05 2015-10-05 300
2 2015-10-06 2015-10-06 1000
3 2015-04-23 2015-04-23 900
The resulting query should be this:
person sequence_start_date sequence_end_date amount
1 2015-09-10 2015-09-14 800
1 2015-10-05 2015-10-07 2000
2 2015-10-05 2015-10-06 1300
3 2015-04-23 2015-04-23 900
Below, I can use LAG and LEAD to identify the sequence start_date and end_date, but I don't have a way to aggregate the amount. I'm assuming the answer will involve some sort of ROW_NUMBER() window function that will partition by sequence, I just can't figure out how to make the sequence identifiable to the function.
SELECT
person
,COALESCE(sequence_start_date, LAG(sequence_start_date, 1) OVER (ORDER BY person, start_date)) AS "sequence_start_date"
,COALESCE(sequence_end_date, LEAD(sequence_end_date, 1) OVER (ORDER BY person, start_date)) AS "sequence_end_date"
FROM
(
SELECT
person
,start_date
,end_date
,CASE WHEN LAG(end_date, 1) OVER (PARTITION BY person ORDER BY start_date) + interval '1 day' = start_date
THEN NULL
ELSE start_date
END AS "sequence_start_date"
,CASE WHEN LEAD(start_date, 1) OVER (PARTITION BY person ORDER BY start_date) - interval '1 day' = end_date
THEN NULL
ELSE end_date
END AS "sequence_end_date"
,amount
FROM records
) sq
Even your updated (sub)query still isn't quite right for the data you've presented, which is inconsistent about whether the start date of the second and subsequent rows in a sequence should be equal to their previous rows' end date or one day later. The query can be updated pretty easily to accommodate both, if that's needed.
In any case, you cannot use COALESCE as a window function. Aggregate functions may be used as window functions by providing an OVER clause, but not ordinary functions. There are nevertheless ways to apply window function to this task. Here's a way to identify the sequences in your data (as presented):
SELECT
person
,MAX(sequence_start_date)
OVER (
PARTITION BY person
ORDER BY start_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS "sequence_start_date"
,MIN(sequence_end_date)
OVER (
PARTITION BY person
ORDER BY start_date
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
AS "sequence_end_date"
,amount
FROM
(
SELECT
person
,start_date
,end_date
,CASE WHEN LAG(end_date, 1) OVER (PARTITION BY person ORDER BY start_date) + interval '1 day' >= start_date
THEN date '0001-01-01'
ELSE start_date
END AS "sequence_start_date"
,CASE WHEN LEAD(start_date, 1) OVER (PARTITION BY person ORDER BY start_date) - interval '1 day' <= end_date
THEN NULL
ELSE end_date
END AS "sequence_end_date"
,amount
FROM records
order by person, start_date
) sq_part
ORDER BY person, sequence_start_date
That relies on MAX() and MIN() instead of COALESCE(), and it applies window framing to get the appropriate scope for each of those within each partition. Results:
person sequence_start_date sequence_end_date amount
1 September, 10 2015 00:00:00 September, 12 2015 00:00:00 500
1 September, 10 2015 00:00:00 September, 12 2015 00:00:00 100
1 October, 05 2015 00:00:00 October, 07 2015 00:00:00 2000
2 October, 05 2015 00:00:00 October, 06 2015 00:00:00 300
2 October, 05 2015 00:00:00 October, 06 2015 00:00:00 1000
3 April, 23 2015 00:00:00 April, 23 2015 00:00:00 900
Do note that that does not require an exact match of end date with subsequent start date; all rows for each person that abut or overlap will be assigned to the same sequence. If (person, start_date) cannot be relied upon to be unique, however, then you probably need to order the partitions by end date as well.
And now you have a way to identify the sequences: they are characterized by the triple person, sequence_start_date, sequence_end_date. (Or actually, you need only the person and one of those dates for identification purposes, but read on.) You can wrap the above query as an inline view of an outer aggregate query to produce your desired result:
SELECT
person,
sequence_start_date,
sequence_end_date,
SUM(amount) AS "amount"
FROM ( <above query> ) sq
GROUP BY person, sequence_start_date, sequence_end_date
Of course you need both dates as grouping columns if you're going to select them.
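An alternative way to make each sequence identifiable is to flag the rows that start a new sequence and take a running sum of those flags. This is a different technique from the MAX()/MIN() framing above, sketched here with Python's sqlite3 module instead of PostgreSQL. It assumes rows abut or overlap their immediate predecessor, so a long row that fully contains several later rows would need a running MAX of end_date instead of LAG.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (person INTEGER, start_date TEXT, end_date TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?)",
    [
        (1, "2015-09-10", "2015-09-11", 500),
        (1, "2015-09-11", "2015-09-12", 100),
        (1, "2015-09-13", "2015-09-14", 200),
        (1, "2015-10-05", "2015-10-07", 2000),
        (2, "2015-10-05", "2015-10-05", 300),
        (2, "2015-10-06", "2015-10-06", 1000),
        (3, "2015-04-23", "2015-04-23", 900),
    ],
)

ISLANDS_SQL = """
WITH flagged AS (
    SELECT person, start_date, end_date, amount,
           -- 0 when this row starts no later than one day after the previous
           -- row's end; 1 otherwise (including each person's first row)
           CASE WHEN start_date <= date(
                         LAG(end_date) OVER (PARTITION BY person
                                             ORDER BY start_date, end_date),
                         '+1 day')
                THEN 0 ELSE 1 END AS new_seq
    FROM records
),
numbered AS (
    SELECT *,
           -- running sum of the flags gives every sequence its own id
           SUM(new_seq) OVER (PARTITION BY person
                              ORDER BY start_date, end_date
                              ROWS UNBOUNDED PRECEDING) AS seq_id
    FROM flagged
)
SELECT person,
       MIN(start_date) AS sequence_start_date,
       MAX(end_date)   AS sequence_end_date,
       SUM(amount)     AS amount
FROM numbered
GROUP BY person, seq_id
ORDER BY person, sequence_start_date
"""

sequences = conn.execute(ISLANDS_SQL).fetchall()
for row in sequences:
    print(row)
```

With the sample rows, person 2's two single-day records merge into one sequence whose amount is 300 + 1000 = 1300.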
Why not:
select a1.person, a1.sequence_start_date, a1.sequence_end_date,
sum(rx.amount)
as amount
from (EXISTING_QUERY) a1
left join records rx
on rx.person = a1.person
and rx.start_date >= a1.start_date
and rx.end_date <= a1.end_date
group by a1.person, a1.sequence_start_date, a1.sequence_end_date