Counting Sick days over the weekend - sql

I'm trying to solve a problem in the following (simplified) dataset:
Name     Date        Workday  Calenderday  Leave
PersonA  2023-01-01  0        1            NULL
PersonA  2023-01-07  0        1            NULL
PersonA  2023-01-08  0        1            NULL
PersonA  2023-01-13  1        1            Sick
PersonA  2023-01-14  0        1            NULL
PersonA  2023-01-15  0        1            NULL
PersonA  2023-01-16  1        1            Sick
PersonA  2023-01-20  1        1            Holiday
PersonA  2023-01-21  0        1            NULL
PersonA  2023-01-22  0        1            NULL
PersonA  2023-01-23  1        1            Holiday
PersonB  2023-01-01  0        1            NULL
PersonB  2023-01-02  1        1            Sick
PersonB  2023-01-03  1        1            Sick
The rows with NULL in [Leave] are weekend days.
What I want is a result looking like this:
Name     Leave    PeriodStartDate  PeriodEndDate  Workdays  Weekdays
PersonA  Sick     2023-01-13       2023-01-16     2         4
PersonA  Holiday  2023-01-20       2023-01-23     2         4
PersonB  Sick     2023-01-02       2023-01-03     2         2
The difference between [Workdays] and [Weekdays] is that [Weekdays] also counts the weekend days inside the period.
What I have tried is to first number the rows (in two different ways):
ROW_NUMBER() OVER (PARTITION BY [Name] ORDER BY [Date]) as RowNo1
ROW_NUMBER() OVER (PARTITION BY [Name], [Leave] ORDER BY [Date]) as RowNo2
and after that to make a period base date:
DATEADD(DAY, 0 - [RowNo1], [Date]) as PeriodBaseDate1
,DATEADD(DAY, 0 - [RowNo2], [Date]) as PeriodBaseDate2
and after that do something like this:
MIN([Date]) as PeriodStartDate
,MAX([Date]) as PeriodEndDate
,SUM([Calenderday]) as Weekdays
,SUM([Workday]) as Workdays
GROUP BY [PeriodBaseDate (1 or 2?)], [Leave], [Name]
But whatever I do I can't seem to get it to count the weekends in the periods.
A solution doesn't have to build on my attempt with RowNo, PeriodBaseDate, etc.

As we don't have your full attempt, I've provided a complete working solution. I first use LAST_VALUE to give every row a value for Leave (provided there was a non-NULL value on an earlier row).
Once that is done, you have a gaps-and-islands problem and can aggregate on that basis.
No version details are given, so I assume you are using SQL Server 2022, the latest version of SQL Server at the time of writing, and thus have access to the IGNORE NULLS syntax.
SELECT *
INTO dbo.YourTable
FROM (VALUES('PersonA',CONVERT(date,'2023-01-01'),0,1,NULL),
('PersonA',CONVERT(date,'2023-01-07'),0,1,NULL),
('PersonA',CONVERT(date,'2023-01-08'),0,1,NULL),
('PersonA',CONVERT(date,'2023-01-13'),1,1,'Sick'),
('PersonA',CONVERT(date,'2023-01-14'),0,1,NULL),
('PersonA',CONVERT(date,'2023-01-15'),0,1,NULL),
('PersonA',CONVERT(date,'2023-01-16'),1,1,'Sick'),
('PersonA',CONVERT(date,'2023-01-20'),1,1,'Holiday'),
('PersonA',CONVERT(date,'2023-01-21'),0,1,NULL),
('PersonA',CONVERT(date,'2023-01-22'),0,1,NULL),
('PersonA',CONVERT(date,'2023-01-23'),1,1,'Holiday'),
('PersonB',CONVERT(date,'2023-01-01'),0,1,NULL),
('PersonB',CONVERT(date,'2023-01-02'),1,1,'Sick'),
('PersonB',CONVERT(date,'2023-01-03'),1,1,'Sick'))V(Name,Date,Workday,Calenderday,Leave);
GO
WITH Leaves AS(
SELECT Name,
[Date],
Workday,
Calenderday, --It's spelt Calendar; you should correct this typographical error, as objects with typos lead to further problems.
--Leave,
LAST_VALUE(Leave) IGNORE NULLS OVER (PARTITION BY Name ORDER BY Date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Leave
FROM dbo.YourTable YT),
LeaveGroups AS(
SELECT Name,
[Date],
Workday,
CalenderDay,
Leave,
ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Date) -
ROW_NUMBER() OVER (PARTITION BY Name, Leave ORDER BY Date) AS Grp
FROM Leaves)
SELECT Name,
Leave,
MIN([Date]) AS PeriodStartDate,
MAX([Date]) AS PeriodEndDate,
SUM(WorkDay) AS WorkDays, --Assumes Workday is not a bit, if it is, CAST or CONVERT it to a int
DATEDIFF(DAY,MIN([Date]), MAX([Date]))+1 AS Weekdays
--SUM(CASE WHEN (DATEPART(WEEKDAY,[Date]) + @@DATEFIRST + 5) % 7 BETWEEN 0 AND 4 THEN 1 END) AS Weekdays --This method is language agnostic
FROM LeaveGroups
WHERE Leave IS NOT NULL
GROUP BY Name,
Leave,
Grp
ORDER BY Name,
PeriodStartDate;
GO
DROP TABLE dbo.YourTable;
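The two steps in this answer (carry the last non-NULL Leave forward, then collapse consecutive rows with the same Leave into one period) can also be sketched outside SQL. The following is a minimal Python illustration, not part of the original answer; the function name `leave_periods` and the tuple layout are assumptions made for the example.

```python
from datetime import date
from itertools import groupby

# Sample rows from the question: (name, day, workday, leave)
rows = [
    ("PersonA", date(2023, 1, 1), 0, None),
    ("PersonA", date(2023, 1, 7), 0, None),
    ("PersonA", date(2023, 1, 8), 0, None),
    ("PersonA", date(2023, 1, 13), 1, "Sick"),
    ("PersonA", date(2023, 1, 14), 0, None),
    ("PersonA", date(2023, 1, 15), 0, None),
    ("PersonA", date(2023, 1, 16), 1, "Sick"),
    ("PersonA", date(2023, 1, 20), 1, "Holiday"),
    ("PersonA", date(2023, 1, 21), 0, None),
    ("PersonA", date(2023, 1, 22), 0, None),
    ("PersonA", date(2023, 1, 23), 1, "Holiday"),
    ("PersonB", date(2023, 1, 1), 0, None),
    ("PersonB", date(2023, 1, 2), 1, "Sick"),
    ("PersonB", date(2023, 1, 3), 1, "Sick"),
]


def leave_periods(rows):
    """Forward-fill leave per person, then collapse runs of equal leave (hypothetical helper)."""
    periods = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for name, person_rows in groupby(ordered, key=lambda r: r[0]):
        # Step 1: carry the last non-NULL leave forward (the LAST_VALUE ... IGNORE NULLS step).
        filled, last_leave = [], None
        for _, day, workday, leave in person_rows:
            if leave is not None:
                last_leave = leave
            filled.append((day, workday, last_leave))
        # Step 2: each run of consecutive rows with the same leave is one island.
        for leave, island in groupby(filled, key=lambda r: r[2]):
            island = list(island)
            if leave is None:
                continue  # rows before the first leave have nothing to carry forward
            start, end = island[0][0], island[-1][0]
            workdays = sum(workday for _, workday, _ in island)
            weekdays = (end - start).days + 1  # calendar days, weekend included
            periods.append((name, leave, start, end, workdays, weekdays))
    return periods


periods = leave_periods(rows)
```

As in the SQL, weekend rows that directly follow a leave day inherit that leave, so a period only ends where the next different leave starts.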

I am not sure what you are trying to do. Based on what I understood, the script below gives the expected output.
SELECT Name, Leave, MIN(Date) PeriodStartDate, MAX(Date) PeriodEndDate, SUM(Workday) Workdays, DATEDIFF(DAY, MIN(Date), MAX(Date)) + 1 Weekdays
FROM YourTable
WHERE Leave IS NOT NULL
GROUP BY Name, Leave

Related

How to subtract next row from first one for each account id in SQL?

The question I am trying to answer is: how can I return the correct week number for each date within each ID? The first date for each ID always falls in week 1 (it's the first week in the series), but the following date in the series may also fall within the first week (so it should return 1 again), or it may fall in the 3rd week (so it should return 3).
The code I've written so far is:
select distinct
row_number() over (partition by ID order by date) row_nums
,ID
,date
from table_a
Which simply returns the running tally of dates by ID, and doesn't take into account what week number that date falls in.
But what I'm looking for is this:
Here's some setup code to assist:
CREATE TABLE random_table
(
ID VarChar(50),
date DATETIME
);
-- Dates are quoted as YYYYMMDD strings; an unquoted 5/14/2021 would be evaluated as integer division.
INSERT INTO random_table
VALUES
('AAA','20210514'),
('AAA','20210602'),
('AAA','20210709'),
('BBB','20210525'),
('CCC','20201202'),
('CCC','20201206'),
('CCC','20201210'),
('CCC','20201214'),
('CCC','20201218'),
('CCC','20201222'),
('CCC','20201226'),
('CCC','20201230'),
('CCC','20210103'),
('DDD','20210107'),
('DDD','20210111');
with adj as (
select *, dateadd(day, -1, "date") as adj_dt
from table_a
)
select
datediff(week,
min(adj_dt) over (partition by id),
adj_dt) + 1 as week_logic,
id, "date"
from adj
This assumes that your idea of weeks corresponds with @@datefirst set as Sunday. For a Sunday to Saturday definition you would find 12/06/2020 and 12/10/2020 in the same week, so presumably you want something like a Monday start instead (which also seems to line up with the numbering for 12/02/2020, 12/14/2020 and 12/18/2020.) I'm compensating by sliding backward a day in the weeks calculation. That step could be handled inline without a CTE but perhaps it illustrates the approach more clearly.
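The "slide back a day" idea amounts to counting Monday-based week boundaries between each date and the first date for the same ID. Here is a small Python sketch of that arithmetic, assuming Monday-start weeks as the answer suggests; `monday_of` and `week_numbers` are hypothetical helper names, and only two of the IDs from the setup code are used.

```python
from datetime import date, timedelta


def monday_of(day):
    """Return the Monday that starts the week containing `day`."""
    return day - timedelta(days=day.weekday())


def week_numbers(dates_by_id):
    """Map each ID to (date, week number counted from the week of its first date)."""
    result = {}
    for key, days in dates_by_id.items():
        ordered = sorted(days)
        first_monday = monday_of(ordered[0])
        result[key] = [
            (d, (monday_of(d) - first_monday).days // 7 + 1) for d in ordered
        ]
    return result


sample = {
    "CCC": [
        date(2020, 12, 2), date(2020, 12, 6), date(2020, 12, 10),
        date(2020, 12, 14), date(2020, 12, 18), date(2020, 12, 22),
        date(2020, 12, 26), date(2020, 12, 30), date(2021, 1, 3),
    ],
    "DDD": [date(2021, 1, 7), date(2021, 1, 11)],
}
weeks = week_numbers(sample)
```

Because the calculation works on whole dates rather than on a week-of-year number, it is not affected by the year rollover between 2020-12-30 and 2021-01-03.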
Your objective isn't clear but I think you would benefit from a Tally-Table of the weeks and then LEFT JOIN to your source data.
This will give you a row for each week AND source data if it exists
SELECT
CASE WHEN ROW_NUMBER() OVER (PARTITION BY ID ORDER BY [date])=1 THEN 1
ELSE DATEPART(WK, (DATE) ) - DATEPART(WK, FIRST_VALUE([DATE]) OVER (PARTITION BY ID ORDER BY [date])) END PD,
ID,
CONVERT(VARCHAR(10), [date],120)
FROM random_table rt
ORDER BY ID,[date]
DBFIDDLE
output:
PD   ID   (No column name)
1    AAA  2021-05-14
3    AAA  2021-06-02
8    AAA  2021-07-09
1    BBB  2021-05-25
1    CCC  2020-12-02
1    CCC  2020-12-06
1    CCC  2020-12-10
2    CCC  2020-12-14
2    CCC  2020-12-18
3    CCC  2020-12-22
3    CCC  2020-12-26
4    CCC  2020-12-30
-47  CCC  2021-01-03
1    DDD  2021-01-07
1    DDD  2021-01-11
Dates are in the format YYYY-MM-DD.
I will leave the -47 in here, so you can fix it yourself (as an exercise) 😁😉

SQL Troubleshooting Help on Table Structure

I'm attempting to calculate average number of days between a customer's 1st and 3rd purchase, but struggling to get the data ordered in a way that will allow me to calculate.
I currently have the below data table. (Note: Order sequence number refers to the number order for that customer.)
Order Date  Customer Number  Order Sequence Number
2020-09-20  1                1
2021-01-20  1                2
2021-01-21  1                3
2020-10-01  2                1
2020-08-06  3                1
2020-09-06  3                2
2020-09-09  3                3
I've been trying to get the data to look like the following table. [To then be able to calculate datediff on the last two columns.]
Customer Number  Order Count  First Order Date  Third Order Date
1                3            2020-09-20        2021-01-21
2                1            2020-10-01        Null
3                3            2020-08-06        2020-09-09
I've completely messed up the code, but here's what I've been trying.
CREATE TABLE X2 as
SELECT
customer_number,
max(order_sequence_number) as order_count,
CASE
WHEN order_sequence_number = 1 then order_date
ELSE null
END as first_order_date,
CASE
WHEN order_sequence_number = 3 then order_date
ELSE null
END as third_order_date
FROM X1
GROUP BY customer_number;
Can someone please tell me what I'm missing? Thanks in advance!
You are on the right track but you need aggregation functions:
SELECT customer_number,
max(order_sequence_number) as order_count,
MAX(CASE WHEN order_sequence_number = 1 THEN order_date END) as first_order_date,
MAX(CASE WHEN order_sequence_number = 3 THEN order_date END) as third_order_date
FROM X1
GROUP BY customer_number;
To get the difference in days, you would just subtract the two expressions using whatever date arithmetic is supported in your database.
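As a hedged end-to-end illustration, the same conditional aggregation can be run against an in-memory SQLite database from Python. SQLite and its `julianday()` function are assumptions chosen for this sketch (the question does not name a database); the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE X1 (order_date TEXT, customer_number INTEGER, order_sequence_number INTEGER)"
)
conn.executemany(
    "INSERT INTO X1 VALUES (?, ?, ?)",
    [
        ("2020-09-20", 1, 1), ("2021-01-20", 1, 2), ("2021-01-21", 1, 3),
        ("2020-10-01", 2, 1),
        ("2020-08-06", 3, 1), ("2020-09-06", 3, 2), ("2020-09-09", 3, 3),
    ],
)

# MAX(CASE ...) picks the single matching order date per customer;
# the date difference is NULL when a customer has no third order.
query = """
SELECT customer_number,
       MAX(order_sequence_number) AS order_count,
       MAX(CASE WHEN order_sequence_number = 1 THEN order_date END) AS first_order_date,
       MAX(CASE WHEN order_sequence_number = 3 THEN order_date END) AS third_order_date,
       CAST(julianday(MAX(CASE WHEN order_sequence_number = 3 THEN order_date END))
          - julianday(MAX(CASE WHEN order_sequence_number = 1 THEN order_date END)) AS INTEGER)
         AS days_between
FROM X1
GROUP BY customer_number
ORDER BY customer_number
"""
result = conn.execute(query).fetchall()

# Average over customers that actually reached a third order.
gaps = [row[4] for row in result if row[4] is not None]
average_days = sum(gaps) / len(gaps)
```

In other databases the subtraction would be replaced by the local date-difference function; the aggregation itself is unchanged.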

T-SQL filtering records based on dates and time difference with other records

I have a table for which I have to perform a rather complex filter: first a filter by date is applied, but then records from the previous and next days should be included if their time difference does not exceed 8 hours compared to its prev or next record (depending if the date is less or greater than filter date).
For those adjacent days the selection should stop at the first record that does not satisfy this condition.
This is what my raw data looks like:
Id  Desc          EntryDate
1   Event type 1  2021-03-12 21:55:00.000
2   Event type 1  2021-03-12 01:10:00.000
3   Event type 1  2021-03-11 20:17:00.000
4   Event type 1  2021-03-11 05:04:00.000
5   Event type 1  2021-03-10 23:58:00.000
6   Event type 1  2021-03-10 11:01:00.000
7   Event type 1  2021-03-10 10:00:00.000
In this example set, if my filter date is '2021-03-11', my expected result set should be all records from that day plus adjacent records from 03-12 and 03-10 that satisfy the 8-hour condition. Note how the record with Id 7 is not included because the record with Id 6 does not comply:
Id  EntryDate
2   2021-03-12 01:10:00.000
3   2021-03-11 20:17:00.000
4   2021-03-11 05:04:00.000
5   2021-03-10 23:58:00.000
Need advice how to write this complex query
This is a variant of gaps-and-islands. Compute the difference from the previous row, and then define groups based on those differences:
with e as (
select t.*,
-- a new group starts whenever the previous entry is 8 hours or more earlier (or missing)
sum(case when prev_entrydate > dateadd(hour, -8, entrydate) then 0 else 1 end) over (order by entrydate) as grp
from (select t.*,
lag(entrydate) over (order by entrydate) as prev_entrydate
from t
) t
)
select e.*
from e
where e.grp in (select e2.grp
from e e2
where cast(e2.entrydate as date) = @filterdate
);
Note: I'm not sure exactly how filter date is applied. This assumes that it is any events on the entire day, which means that there might be multiple groups. If there is only one group (say the first group on the day), the query can be simplified a bit from a performance perspective.
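To make the grouping rule concrete, here is a minimal Python sketch of the same idea on the sample data: start a new group whenever the gap to the previous entry is 8 hours or more, then keep every group that has at least one entry on the filter date. The function name `events_near` and the `max_gap` parameter are assumptions for illustration, and the strict "less than 8 hours" comparison mirrors the query rather than the wording of the question.

```python
from datetime import date, datetime, timedelta

# (Id, EntryDate) pairs from the question
events = [
    (1, datetime(2021, 3, 12, 21, 55)),
    (2, datetime(2021, 3, 12, 1, 10)),
    (3, datetime(2021, 3, 11, 20, 17)),
    (4, datetime(2021, 3, 11, 5, 4)),
    (5, datetime(2021, 3, 10, 23, 58)),
    (6, datetime(2021, 3, 10, 11, 1)),
    (7, datetime(2021, 3, 10, 10, 0)),
]


def events_near(events, filter_date, max_gap=timedelta(hours=8)):
    """Return events whose gap-based group touches `filter_date`, oldest first."""
    labelled = []
    group, previous = 0, None
    for event_id, stamp in sorted(events, key=lambda e: e[1]):
        # A gap of max_gap or more (or no previous row) opens a new group.
        if previous is None or stamp - previous >= max_gap:
            group += 1
        labelled.append((group, event_id, stamp))
        previous = stamp
    wanted = {g for g, _, stamp in labelled if stamp.date() == filter_date}
    return [(event_id, stamp) for g, event_id, stamp in labelled if g in wanted]


selected = events_near(events, date(2021, 3, 11))
selected_ids = [event_id for event_id, _ in selected]
```

On this data the entries with Id 6 and 7 form their own group, which is why neither is selected even though they are less than 8 hours apart from each other.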
declare @DateTime datetime = '2021-03-11'
select *
from t
where t.EntryDate between DATEADD(hour , -8 , @DateTime) and DATEADD(hour , 32 , @DateTime)

Expanding/changing my query to find more entries using (potentially) IFELSE

My question will use this dataset as an example. I have a query setup (I have changed variables to more generic variables for the sake of posting this on the internet so the query may not make perfect sense) that picks the most recent date for a given account. So the query returns values with a reason_type of 1 with the most recent date. This query has effective_date set to is not null.
account date effective_date value reason_type
123456 4/20/2017 5/1/2017 5 1
123456 1/20/2017 2/1/2017 10 1
987654 2/5/2018 3/1/2018 15 1
987654 12/31/2017 2/1/2018 20 1
456789 4/27/2018 5/1/2018 50 1
456789 1/24/2018 2/1/2018 60 1
456123 4/25/2017 null 15 2
789123 5/1/2017 null 16 2
666888 2/1/2018 null 31 2
333222 1/1/2018 null 20 2
What I am looking to do now is to basically use that logic to only apply to reason_type
if there is an entry for it, otherwise have it default to reason_type
I think I should be using an IFELSE, but I'm admittedly not knowledgeable about how I would go about that.
Here is the code that I currently have to return the reason_type 1s most recent entry.
I hope my question is clear.
SELECT account, date, effective_date, value, reason_type
from
(
SELECT account, date, effective_date, value, reason_type,
ROW_NUMBER() over (partition by account order by date desc) rn
from mytable
WHERE value is not null
AND effective_date is not null
)
WHERE rn =1
I think you might want something like this (do you really have a column named date by the way? That seems like a bad idea):
SELECT account, date, effective_date, value, reason_type
FROM (
SELECT account, date, effective_date, value, reason_type
, ROW_NUMBER() OVER ( PARTITION BY account ORDER BY date DESC ) AS rn
FROM mytable
WHERE value IS NOT NULL
) WHERE rn = 1
-- effective_date IS NULL or is on or before today's date
AND ( effective_date IS NULL OR effective_date < TRUNC(SYSDATE+1) );
Hope this helps.

Get MAX count but keep the repeated calculated value if highest

I have the following table, I am using SQL Server 2008
BayNo FixDateTime FixType
1 04/05/2015 16:15:00 tyre change
1 12/05/2015 00:15:00 oil change
1 12/05/2015 08:15:00 engine tuning
1 04/05/2016 08:11:00 car tuning
2 13/05/2015 19:30:00 puncture
2 14/05/2015 08:00:00 light repair
2 15/05/2015 10:30:00 super op
2 20/05/2015 12:30:00 wiper change
2 12/05/2016 09:30:00 denting
2 12/05/2016 10:30:00 wiper repair
2 12/06/2016 10:30:00 exhaust repair
4 12/05/2016 05:30:00 stereo unlock
4 17/05/2016 15:05:00 door handle repair
For each bay number I need to find the day with the highest number of fixes, and if that highest count is repeated on another day, that day should also appear in the result set.
So I would like to see the result set as follows:
BayNo FixDateTime noOfFixes
1 12/05/2015 00:15:00 2
2 12/05/2016 09:30:00 2
4 12/05/2016 05:30:00 1
4 17/05/2016 15:05:00 1
I managed to get the counts for each, but I am struggling to get the max and keep the repeated highest values. Can someone help, please?
Use window functions.
Get the count for each day by bayno and also find the min fixdatetime for each day per bayno.
Then use dense_rank to compute the highest ranked row for each bayno based on the number of fixes.
Finally get the highest ranked rows.
select distinct bayno,minfixdatetime,no_of_fixes
from (
select bayno,minfixdatetime,no_of_fixes
,dense_rank() over(partition by bayno order by no_of_fixes desc) rnk
from (
select t.*,
count(*) over(partition by bayno,cast(fixdatetime as date)) no_of_fixes,
min(fixdatetime) over(partition by bayno,cast(fixdatetime as date)) minfixdatetime
from tablename t
) x
) y
where rnk = 1
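The steps in this answer (count fixes per bay and day, note the earliest fix time of each day, then keep the days tied for the highest count within each bay) can be sketched in plain Python as follows. This is an illustration only; `busiest_days` is a hypothetical helper, and the dates are assumed to be in day/month/year order as in the question's table.

```python
from collections import defaultdict
from datetime import datetime

# (BayNo, FixDateTime) from the question
fixes = [
    (1, "04/05/2015 16:15:00"), (1, "12/05/2015 00:15:00"),
    (1, "12/05/2015 08:15:00"), (1, "04/05/2016 08:11:00"),
    (2, "13/05/2015 19:30:00"), (2, "14/05/2015 08:00:00"),
    (2, "15/05/2015 10:30:00"), (2, "20/05/2015 12:30:00"),
    (2, "12/05/2016 09:30:00"), (2, "12/05/2016 10:30:00"),
    (2, "12/06/2016 10:30:00"),
    (4, "12/05/2016 05:30:00"), (4, "17/05/2016 15:05:00"),
]


def busiest_days(fixes):
    """Return (bay, first fix time of the day, fix count) for each bay's top-count days."""
    per_day = defaultdict(list)
    for bay, text in fixes:
        stamp = datetime.strptime(text, "%d/%m/%Y %H:%M:%S")
        per_day[(bay, stamp.date())].append(stamp)
    # Highest daily count per bay (the row that would get dense_rank = 1).
    best = defaultdict(int)
    for (bay, _), stamps in per_day.items():
        best[bay] = max(best[bay], len(stamps))
    # Keep every day that ties with the bay's best count.
    return sorted(
        (bay, min(stamps), len(stamps))
        for (bay, _), stamps in per_day.items()
        if len(stamps) == best[bay]
    )


top_days = busiest_days(fixes)
```

Comparing each day against the per-bay maximum plays the role of filtering on rank 1, so ties such as the two days for bay 4 are both kept.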
You are looking for rank() or dense_rank(). I would write the query like this:
select bayno, thedate, numFixes
from (select bayno, cast(fixdatetime as date) as thedate,
count(*) as numFixes,
rank() over (partition by bayno order by count(*) desc) as seqnum
from t
group by bayno, cast(fixdatetime as date)
) b
where seqnum = 1;
Note that this returns the date in question. The date does not have a time component.