I am using SQL Server.
select DISTINCT caseNumber, dateStarted,dateStopped from patientView where dateStarted !='' and dateStopped != '';
We get the following output,
CaseNumber  dateStarted  dateStopped
----------  -----------  -----------
1           2022-01-01   2022-01-04
1           2022-01-05   2022-01-19
2           2022-01-03   2022-01-10
4           2022-01-05   2022-01-11
4           2022-01-13   2022-01-14
4           2022-01-21   2022-01-23
5           2022-01-15   2022-01-16
5           2022-01-17   2022-01-24
5           2022-01-24   2022-01-26
8           2022-01-17   2022-01-20
8           2022-01-21   2022-01-28
11          2022-01-18   2022-01-25
11          2022-01-26   2022-01-27
I want to calculate the total duration for each caseNumber. For example, caseNumber 1 has 2 rows, and hence its total duration would be 18 days.
I would suggest using the GROUP BY clause to group the duplicate case numbers and take the MIN of the start dates and the MAX of the stop dates. You can do something like:
SELECT caseNumber, DATEDIFF(day, min(dateStarted), max(dateStopped)) AS duration
from patientView
where dateStarted != '' and dateStopped != ''
GROUP BY caseNumber;
It is not clear whether you want the sum of the durations for individual patientView records or the duration from the earliest start to the latest end. It is also not clear whether the stop date is inclusive or exclusive. Is 2022-01-01 to 2022-01-04 considered 3 days or 4 days?
Here is code that shows 4 different calculations:
DECLARE @patientView TABLE (CaseNumber INT, dateStarted DATETIME, dateStopped DATETIME)
INSERT @patientView
VALUES
(1, '2022-01-01', '2022-01-04'),
(1, '2022-01-05', '2022-01-19'),
(2, '2022-01-03', '2022-01-10'),
(4, '2022-01-05', '2022-01-11'),
(4, '2022-01-13', '2022-01-14'),
(4, '2022-01-21', '2022-01-23'),
(5, '2022-01-15', '2022-01-16'),
(5, '2022-01-17', '2022-01-24'),
(5, '2022-01-24', '2022-01-26'),
(8, '2022-01-17', '2022-01-20'),
(8, '2022-01-21', '2022-01-28'),
(11, '2022-01-18', '2022-01-25'),
(11, '2022-01-26', '2022-01-27')
SELECT
CaseNumber,
SumDaysExclusive = SUM(DATEDIFF(day, dateStarted, dateStopped)),
SumDaysInclusive = SUM(DATEDIFF(day, dateStarted, dateStopped) + 1),
RangeDaysExclusive = DATEDIFF(day, MIN(dateStarted), MAX(dateStopped)),
RangeDaysInclusive = DATEDIFF(day, MIN(dateStarted), MAX(dateStopped)) + 1
FROM @patientView
GROUP BY CaseNumber
ORDER BY CaseNumber
Results:
CaseNumber  SumDaysExclusive  SumDaysInclusive  RangeDaysExclusive  RangeDaysInclusive
----------  ----------------  ----------------  ------------------  ------------------
1           17                19                18                  19
2           7                 8                 7                   8
4           9                 12                18                  19
5           10                13                11                  12
8           10                12                11                  12
11          8                 10                9                   10
db<>fiddle
The test data above uses DATETIME types. (DATE would also work.) If you have dates stored as character data (not a good practice), you may need to add CAST or CONVERT statements.
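For example, a minimal sketch of the range calculation against varchar columns might look like this (it assumes the yyyy-MM-dd format shown in the question; TRY_CONVERT returns NULL for anything that is not a valid date):
SELECT
    CaseNumber,
    RangeDaysExclusive = DATEDIFF(day,
        MIN(TRY_CONVERT(date, dateStarted, 23)),   -- style 23 = yyyy-mm-dd
        MAX(TRY_CONVERT(date, dateStopped, 23)))
FROM patientView
WHERE dateStarted <> '' AND dateStopped <> ''
GROUP BY CaseNumber;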
I'm trying to generate a report that displays the number of failed login attempts that happen within 30 minutes of each other. The data for this report is in a SQL database.
This is the query I'm using to pull the data out.
SELECT
A.LoginID,
A.LogDatetime AS firstAttempt,
MAX(B.LogDatetime) AS lastAttempt,
COUNT(B.LoginID) + 1 AS attempts
FROM
UserLoginHistory A
JOIN UserLoginHistory B ON A.LoginID = B.LoginID
WHERE
A.SuccessfulFlag = 0
AND B.SuccessfulFlag = 0
AND A.LogDatetime < B.LogDatetime
AND B.LogDatetime <= DATEADD(minute, 30, A.LogDatetime)
GROUP BY
A.LoginID, A.LogDatetime
ORDER BY
A.LoginID, A.LogDatetime
This returns results that look something like this:
Row  LoginID  firstAttempt      lastAttempt       attempts
---  -------  ----------------  ----------------  --------
1    1        2022-05-01 00:00  2022-05-01 00:29  6
2    1        2022-05-01 00:06  2022-05-01 00:33  6
3    1        2022-05-01 00:13  2022-05-01 00:39  6
4    1        2022-05-01 00:15  2022-05-01 00:45  6
5    1        2022-05-01 00:20  2022-05-01 00:50  6
6    1        2022-05-01 00:29  2022-05-01 00:55  6
7    1        2022-05-01 00:33  2022-05-01 01:01  6
8    1        2022-05-01 00:39  2022-05-01 01:04  6
...  ...      ...               ...               ...
However, you can see that the rows overlap a lot. For example, row 1 shows attempts from 00:00 to 00:29, which overlaps with row 2 showing attempts from 00:06 to 00:33. Row 2 ought to be like row 7 (00:33 - 01:01), since that row's firstAttempt is the next one after row 1's lastAttempt.
You might need to use a recursive CTE, or insert your data into a temp table and loop over it with updates to remove the overlaps.
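For what it's worth, a rough sketch of the temp-table-and-loop idea might look like this (it assumes the UserLoginHistory columns from your query; each pass takes the earliest still-unassigned failed attempt per LoginID as a window anchor and assigns every attempt within 30 minutes of it to that window):
SELECT LoginID, LogDatetime, CAST(NULL AS DATETIME) AS windowStart
INTO #attempts
FROM UserLoginHistory
WHERE SuccessfulFlag = 0;

WHILE EXISTS (SELECT 1 FROM #attempts WHERE windowStart IS NULL)
BEGIN
    -- the earliest unassigned attempt per LoginID becomes the next window anchor
    UPDATE a
    SET a.windowStart = s.anchor
    FROM #attempts a
    JOIN (SELECT LoginID, MIN(LogDatetime) AS anchor
          FROM #attempts
          WHERE windowStart IS NULL
          GROUP BY LoginID) s
      ON s.LoginID = a.LoginID
    WHERE a.windowStart IS NULL
      AND a.LogDatetime <= DATEADD(minute, 30, s.anchor);
END;

SELECT LoginID,
       windowStart      AS firstAttempt,
       MAX(LogDatetime) AS lastAttempt,
       COUNT(*)         AS attempts
FROM #attempts
GROUP BY LoginID, windowStart
ORDER BY LoginID, windowStart;
Each pass assigns at least the anchor row itself, so the loop terminates once every failed attempt belongs to a window.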
Do you need to have set starting times? As a quick workaround you could round the DATETIME down to 30-minute intervals. That would ensure the windows don't overlap, but it will only group the attempts into fixed 30-minute buckets.
SELECT
A.LoginID,
DATEADD(MINUTE, (DATEDIFF(MINUTE, '2022-01-01', A.LogDatetime) / 30) * 30, '2022-01-01') AS LoginInterval,
MIN(A.LogDatetime) AS firstAttempt,
MAX(A.LogDatetime) AS lastAttempt,
COUNT(*) attempts
FROM
UserLoginHistory A
WHERE
A.SuccessfulFlag = 0
GROUP BY
A.LoginID, DATEADD(MINUTE, (DATEDIFF(MINUTE, '2022-01-01', A.LogDatetime) / 30) * 30, '2022-01-01')
ORDER BY
A.LoginID, LoginInterval
I have 2 dimension tables and 1 fact table as follows:
user_dim
user_id  user_name  user_joining_date
-------  ---------  -----------------
1        Steve      2013-01-04
2        Adam       2012-11-01
3        John       2013-05-05
4        Tony       2012-01-01
5        Dan        2010-01-01
6        Alex       2019-01-01
7        Kim        2019-01-01
bundle_dim
bundle_id  bundle_name            bundle_type  bundle_cost_per_day
---------  ---------------------  -----------  -------------------
101        movies and TV          prime        5.5
102        TV and sports          prime        6.5
103        Cooking                prime        7
104        Sports and news        prime        5
105        kids movie             extra        2
106        kids educative         extra        3.5
107        spanish news           extra        2.5
108        Spanish TV and sports  extra        3.5
109        Travel                 extra        2
plans_fact
user_id  bundle_id  bundle_start_date  bundle_end_date
-------  ---------  -----------------  ---------------
1        101        2019-10-10         2020-10-10
2        107        2020-01-15         (null)
2        106        2020-01-15         2020-12-31
2        101        2020-01-15         (null)
2        103        2020-01-15         2020-02-15
1        101        2020-10-11         (null)
1        107        2019-10-10         2020-10-10
1        105        2019-10-10         2020-10-10
4        101        2021-01-01         2021-02-01
3        104        2020-02-17         2020-03-17
2        108        2020-01-15         (null)
4        102        2021-01-01         (null)
4        103        2021-01-01         (null)
4        108        2021-01-01         (null)
5        103        2020-01-15         (null)
5        101        2020-01-15         2020-02-15
6        101        2021-01-01         2021-01-17
6        101        2021-01-20         (null)
6        108        2021-01-01         (null)
7        104        2020-02-17         (null)
7        103        2020-01-17         2020-01-18
1        102        2020-12-11         (null)
2        106        2021-01-01         (null)
7        107        2020-01-15         (null)
Note: a NULL bundle_end_date means the subscription is still active.
User active days can be calculated as: bundle_end_date - bundle_start_date (for the given bundle).
Total revenue per user can be calculated as: total number of active days * bundle rate per day.
I am looking to write a query to find revenue generated per user per year.
Here is what I have for the overall revenue per user:
select pf.user_id
, sum(datediff(day, pf.bundle_start_date, coalesce(pf.bundle_end_date, getdate())) * bd.bundle_cost_per_day) total_cost_per_bundle
from plans_fact pf
inner join bundle_dim bd on bd.bundle_id = pf.bundle_id
group by pf.user_id
order by pf.user_id;
You need a 'year' table to help parse out each multi-year spanning row into its separate years. For each year, you also need to recalculate the start and end dates. That's what I do in the yearParsed cte in the code below. I hard-code the years into the join statement that creates y. You will probably do it differently, but however you get those values will work.
After that, pretty much sum as you did before, just adding the year column to your grouping.
Aside from that, all I did was move the null coalesce logic to the cte to make the overall logic simpler.
with yearParsed as (
select pf.*,
y.year,
startDt = iif(pf.bundle_start_date > y.startDt, pf.bundle_start_date, y.startDt),
endDt = iif(ap.bundle_end_date < y.endDt, ap.bundle_end_date, y.endDt)
from plans_fact pf
cross apply (select bundle_end_date = isnull(pf.bundle_end_date, getdate())) ap
join (values
(2019, '2019-01-01', '2019-12-31'),
(2020, '2020-01-01', '2020-12-31'),
(2021, '2021-01-01', '2021-12-31')
) y (year, startDt, endDt)
on pf.bundle_start_date <= y.endDt
and ap.bundle_end_date >= y.startDt
)
select yp.user_id,
yp.year,
total_cost_per_bundle = sum(datediff(day, yp.startDt, yp.endDt) * bd.bundle_cost_per_day)
from yearParsed yp
join bundle_dim bd on bd.bundle_id = yp.bundle_id
group by yp.user_id,
yp.year
order by yp.user_id,
yp.year;
Now, if this is a common need, you should probably create a base table for your 'year' data. But if it's not common, and for this report you don't want to keep coming back to hard-code the year information into the y table, you can do this:
declare #yearTable table (
year int,
startDt char(10),
endDt char(10)
);
with y as (
select year = year(min(pf.bundle_start_date))
from plans_fact pf
union all
select year + 1
from y
where year < year(getdate())
)
insert @yearTable
select year,
startDt = convert(char(4),year) + '-01-01',
endDt = convert(char(4),year) + '-12-31'
from y;
and it will create the appropriate years for you. But you can see why creating a base table may be preferred if you have this or a similar need often.
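If you do go the base-table route, a minimal sketch might look like this (dbo.YearDim is just a placeholder name; populate it once, from a VALUES list as below or from the recursive CTE above, for whatever range you need):
-- hypothetical permanent 'year' table; adjust the name and the year range
CREATE TABLE dbo.YearDim (
    [year]  int  NOT NULL PRIMARY KEY,
    startDt date NOT NULL,
    endDt   date NOT NULL
);

INSERT dbo.YearDim ([year], startDt, endDt)
SELECT v.[year],
       DATEFROMPARTS(v.[year], 1, 1),
       DATEFROMPARTS(v.[year], 12, 31)
FROM (VALUES (2019), (2020), (2021), (2022)) v ([year]);
The yearParsed cte would then join to dbo.YearDim instead of the hard-coded VALUES list.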
I have a transaction table like this:
Trandate    channelID  branch  amount
----------  ---------  ------  ------
01/05/2019  1          2       2000
11/05/2019  1          2       2200
09/03/2020  1          2       5600
15/03/2020  1          2       600
12/10/2019  2          10      12000
12/10/2019  2          10      12000
15/11/2019  4          7       4400
15/02/2020  4          2       2500
I need to sum amount and count transactions by year and month. I tried this:
select DISTINCT
DATEPART(YEAR,a.TranDate) as [YearT],
DATEPART(MONTH,a.TranDate) as [monthT],
count(*) as [countoftran],
sum(a.Amount) as [amount],
a.Name as [branch],
a.ChannelName as [channelID]
from transactions as a
where a.TranDate>'20181231'
group by a.Name, a.ChannelName, DATEPART(YEAR,a.TranDate), DATEPART(MONTH,a.TranDate)
order by a.Name, YearT, MonthT
It works like a charm. However, I will use this data in Power BI, and I cannot show these results in a line chart because the year and month info are in separate columns.
I tried changing the format in SQL to 'YYYYMM', but Power BI doesn't recognise that column as a date.
So, in the end, I need a result table looks like this:
YearT       channelID  branch  Tamount  TranT
----------  ---------  ------  -------  -----
31/05/2019  1          2       4400     2
30/03/2020  1          2       7800     2
31/10/2019  2          10      24000    2
30/11/2019  4          7       4400     1
29/02/2020  4          2       2500     1
I have tried several little changes with no result.
Help is much appreciated.
You may try the following statement:
SELECT
EOMONTH(DATEFROMPARTS(YEAR(Trandate), MONTH(Trandate), 1)) AS YearT,
branch, channelID,
SUM(amount) AS TAmount,
COUNT(*) AS TranT
FROM (VALUES
('20190501', 1, 2, 2000),
('20190511', 1, 2, 2200),
('20200309', 1, 2, 5600),
('20200315', 1, 2, 600),
('20191012', 2, 10, 12000),
('20191012', 2, 10, 12000),
('20191115', 4, 7, 4400),
('20200215', 4, 2, 2500)
) v (Trandate, channelID, branch, amount)
GROUP BY DATEFROMPARTS(YEAR(Trandate), MONTH(Trandate), 1), branch, channelID
ORDER BY DATEFROMPARTS(YEAR(Trandate), MONTH(Trandate), 1)
Result:
YearT       branch  channelID  TAmount  TranT
----------  ------  ---------  -------  -----
2019-05-31  2       1          4200     2
2019-10-31  10      2          24000    2
2019-11-30  7       4          4400     1
2020-02-29  2       4          2500     1
2020-03-31  2       1          6200     2
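Applied to your actual transactions table, the same idea might look like the sketch below (it assumes TranDate really is a date/datetime column and keeps the Name/ChannelName aliases from your original query); Power BI should then recognise YearT as a date and accept it on the axis of a line chart:
SELECT
    EOMONTH(a.TranDate) AS YearT,   -- last day of the transaction's month
    a.Name              AS branch,
    a.ChannelName       AS channelID,
    SUM(a.Amount)       AS TAmount,
    COUNT(*)            AS TranT
FROM transactions AS a
WHERE a.TranDate > '20181231'
GROUP BY EOMONTH(a.TranDate), a.Name, a.ChannelName
ORDER BY a.Name, YearT;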
How can I count the number of employees working in each hour, based on their in-time and out-time?
I have the below table with the in_time and out_time of each employee.
My table:
emp_reader_id  att_date    in_time                  out_time                 Shift_In_Time     Shift_Out_Time
-------------  ----------  -----------------------  -----------------------  ----------------  ----------------
111            2020-03-01  2020-03-01 08:55:24.000  2020-03-01 10:26:56.000  09:00:00.0000000  10:30:00.0000000
112            2020-03-01  2020-03-01 08:45:49.000  2020-03-01 11:36:14.000  09:00:00.0000000  11:30:00.0000000
113            2020-03-01  2020-03-01 10:58:19.000  2020-03-01 13:36:31.000  09:00:00.0000000  12:00:00.0000000
I need to count the employees in the below format.
Expected Output:
Period   Working Employee Count
-------  ----------------------
0 - 1    0
1 - 2    0
2 - 3    0
3 - 4    0
4 - 5    0
5 - 6    0
6 - 7    0
7 - 8    0
8 - 9    2
9 - 10   2
10 - 11  3
11 - 12  2
12 - 13  1
13 - 14  1
14 - 15  0
15 - 16  0
16 - 17  0
17 - 18  0
18 - 19  0
19 - 20  0
20 - 21  0
21 - 22  0
22 - 23  0
23 - 0   0
I tried the below query with my raw data, but it does not work; I need it to work from the table above.
SELECT
(DATENAME(hour, C.DT) + ' - ' + DATENAME(hour, DATEADD(hour, 2, C.DT))) as PERIOD,
Count(C.EVENTID) as Emp_Work_On_Time
FROM
trnevents C
WHERE convert(varchar(50),C.DT,23) ='2020-03-01'
GROUP BY (DATENAME(hour, C.DT) + ' - ' +
DATENAME(hour, DATEADD(hour, 2, C.DT)))
You need to have a list of hours (0 to 23) and then LEFT JOIN it to your table.
The following query uses a recursive CTE to generate that list. You could also use a VALUES constructor or a tally table, which gives the same effect (a VALUES version is sketched after the demo link).
; with hours as
(
select hour = 0
union all
select hour = hour + 1
from hours
where hour < 23
)
select convert(varchar(2), h.hour) + ' - ' + convert(varchar(2), (h.hour + 1) % 24) as [Period],
count(t.emp_reader_id) as [Working Employee Count]
from hours h
left join timesheet t on h.hour >= datepart(hour, in_time)
and h.hour <= datepart(hour, out_time)
group by h.hour
Demo : db<>fiddle
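For completeness, here is the same hour list built with a VALUES constructor instead of recursion (one of the alternatives mentioned above, against the same hypothetical timesheet table):
-- 0-23 hour list from a VALUES constructor, then the same left join and grouping
select convert(varchar(2), h.hour) + ' - ' + convert(varchar(2), (h.hour + 1) % 24) as [Period],
       count(t.emp_reader_id) as [Working Employee Count]
from (values ( 0),( 1),( 2),( 3),( 4),( 5),( 6),( 7),( 8),( 9),(10),(11),
             (12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23)) h(hour)
left join timesheet t on h.hour >= datepart(hour, t.in_time)
                     and h.hour <= datepart(hour, t.out_time)
group by h.hour
order by h.hour;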
Hope this might help, but take a look at how Shift_In_Time and Shift_Out_Time appear in your data... they seem to be populated automatically, so they could already have all you need.
SELECT COUNT(Idemp) from aaShiftCountEmp WHERE in_time<'2020-03-01 09:00:00.000' AND out_time>'2020-03-01 10:00:00.000'
This is just an example for the 9h to 10h window, but you can automate it.
By the way, are you sure this shouldn't show the count of people per shift? I mean, are you sure you want 0-1, 1-2, etc., instead of 0-1:30, 1:30-3, and so on?
I have a table which holds transactions. Each transaction is represented by a row. The row has a field TranCode indicating the type of transaction, and the date of the transaction is also recorded. Following are the table definition and the corresponding data.
create table t
(
id int identity(1,1),
TranDate datetime,
TranCode int,
BatchNo int
)
GO
insert into t (TranDate, TranCode)
VALUES(GETDATE(), 1),
(DATEADD(MINUTE, 1, GETDATE()), 1),
(DATEADD(MINUTE, 2, GETDATE()), 1),
(DATEADD(MINUTE, 3, GETDATE()), 1),
(DATEADD(MINUTE, 4, GETDATE()), 2),
(DATEADD(MINUTE, 5, GETDATE()), 2),
(DATEADD(MINUTE, 6, GETDATE()), 2),
(DATEADD(MINUTE, 7, GETDATE()), 2),
(DATEADD(MINUTE, 8, GETDATE()), 2),
(DATEADD(MINUTE, 9, GETDATE()), 1),
(DATEADD(MINUTE, 10, GETDATE()), 1),
(DATEADD(MINUTE, 11, GETDATE()), 1),
(DATEADD(MINUTE, 12, GETDATE()), 2),
(DATEADD(MINUTE, 13, GETDATE()), 2),
(DATEADD(MINUTE, 14, GETDATE()), 1),
(DATEADD(MINUTE, 15, GETDATE()), 1),
(DATEADD(MINUTE, 16, GETDATE()), 1),
(DATEADD(MINUTE, 17, GETDATE()), 2),
(DATEADD(MINUTE, 18, GETDATE()), 2),
(DATEADD(MINUTE, 19, GETDATE()), 1),
(DATEADD(MINUTE, 20, GETDATE()), 1),
(DATEADD(MINUTE, 21, GETDATE()), 1),
(DATEADD(MINUTE, 21, GETDATE()), 1)
After the above code runs, the table contains the following data. The values in the TranDate field will be different for you, but that is fine.
id TranDate TranCode BatchNo
----------- ----------------------- ----------- -----------
1 2015-02-12 20:40:47.547 1 NULL
2 2015-02-12 20:41:47.547 1 NULL
3 2015-02-12 20:42:47.547 1 NULL
4 2015-02-12 20:43:47.547 1 NULL
5 2015-02-12 20:44:47.547 2 NULL
6 2015-02-12 20:45:47.547 2 NULL
7 2015-02-12 20:46:47.547 2 NULL
8 2015-02-12 20:47:47.547 2 NULL
9 2015-02-12 20:48:47.547 2 NULL
10 2015-02-12 20:49:47.547 1 NULL
11 2015-02-12 20:50:47.547 1 NULL
12 2015-02-12 20:51:47.547 1 NULL
13 2015-02-12 20:52:47.547 2 NULL
14 2015-02-12 20:53:47.547 2 NULL
15 2015-02-12 20:54:47.547 1 NULL
16 2015-02-12 20:55:47.547 1 NULL
17 2015-02-12 20:56:47.547 1 NULL
18 2015-02-12 20:57:47.547 2 NULL
19 2015-02-12 20:58:47.547 2 NULL
20 2015-02-12 20:59:47.547 1 NULL
21 2015-02-12 21:00:47.547 1 NULL
22 2015-02-12 21:01:47.547 1 NULL
23 2015-02-12 21:01:47.547 1 NULL
I want a set-based solution, not a cursor or row-by-row solution, to update the batch number for the rows. For example, the first 4 records should get a batchNo of 1 as they have a TranCode of 1, the next 5 (which have a TranCode of 2 and are close to each other in time) should have batchNo 2, the next 3 should have 3, and so on. Following is the expected output.
id TranDate TranCode BatchNo
----------- ----------------------- ----------- -----------
1 2015-02-12 20:43:59.123 1 1
2 2015-02-12 20:44:59.123 1 1
3 2015-02-12 20:45:59.123 1 1
4 2015-02-12 20:46:59.123 1 1
5 2015-02-12 20:47:59.123 2 2
6 2015-02-12 20:48:59.123 2 2
7 2015-02-12 20:49:59.123 2 2
8 2015-02-12 20:50:59.123 2 2
9 2015-02-12 20:51:59.123 2 2
10 2015-02-12 20:52:59.123 1 3
11 2015-02-12 20:53:59.123 1 3
12 2015-02-12 20:54:59.123 1 3
13 2015-02-12 20:55:59.123 2 4
14 2015-02-12 20:56:59.123 2 4
15 2015-02-12 20:57:59.123 1 5
16 2015-02-12 20:58:59.123 1 5
17 2015-02-12 20:59:59.123 1 5
18 2015-02-12 21:00:59.123 2 6
19 2015-02-12 21:01:59.123 2 6
20 2015-02-12 21:02:59.123 1 7
21 2015-02-12 21:03:59.123 1 7
22 2015-02-12 21:04:59.123 1 7
23 2015-02-12 21:04:59.123 1 7
I have tried very hard with ROW_NUMBER, RANK and DENSE_RANK, and none of them came to my rescue. I am looking for a set-based solution as I want really good performance.
Your help is very much appreciated.
You could do this using a recursive CTE. I also used the LEAD function to check the next row and determine whether the TranCode changed.
Query:
WITH A
AS (
SELECT id
,trancode
,trandate
,lead(trancode) OVER (ORDER BY id,trancode) leadcode
FROM t
)
,cte
AS (
SELECT id
,trandate
,trancode
,lead(trancode) OVER (ORDER BY id,trancode) leadcode
,1 batchnum
,1 nextbatchnum
,id + 1 nxtId
FROM t
WHERE id = 1
UNION ALL
SELECT A.id
,A.trandate
,A.trancode
,A.leadcode
,nextbatchnum
,CASE
WHEN A.trancode <> A.leadcode THEN nextbatchnum + 1 ELSE nextbatchnum END nextbatchnum
,A.id + 1 nxtid
FROM A
INNER JOIN CTE B ON A.id = B.nxtId
)
SELECT id
,trandate
,trancode
,batchnum
FROM CTE
OPTION (MAXRECURSION 100)
Result:
id  trandate                 trancode  batchnum
--  -----------------------  --------  --------
1   2015-02-12 10:19:06.717  1         1
2   2015-02-12 10:20:06.717  1         1
3   2015-02-12 10:21:06.717  1         1
4   2015-02-12 10:22:06.717  1         1
5   2015-02-12 10:23:06.717  2         2
6   2015-02-12 10:24:06.717  2         2
7   2015-02-12 10:25:06.717  2         2
8   2015-02-12 10:26:06.717  2         2
9   2015-02-12 10:27:06.717  2         2
10  2015-02-12 10:28:06.717  1         3
11  2015-02-12 10:29:06.717  1         3
12  2015-02-12 10:30:06.717  1         3
13  2015-02-12 10:31:06.717  2         4
14  2015-02-12 10:32:06.717  2         4
15  2015-02-12 10:33:06.717  1         5
16  2015-02-12 10:34:06.717  1         5
17  2015-02-12 10:35:06.717  1         5
18  2015-02-12 10:36:06.717  2         6
19  2015-02-12 10:37:06.717  2         6
20  2015-02-12 10:38:06.717  1         7
21  2015-02-12 10:39:06.717  1         7
22  2015-02-12 10:40:06.717  1         7
23  2015-02-12 10:40:06.717  1         7
I think that ultimately the operation you wish to perform on the data is not relational, so a nice set-based solution doesn't exist. What you are trying to do relies on the order, and on the row sequentially before/after it, and so needs to use a cursor somewhere.
I've managed to get your desired output using a recursive CTE. It's not optimised, but I thought it might be useful to post what I've done to give you something to work with.
The issue I have with this is the GROUP BY and MAX I'm using on the result set to get the correct values. I'm sure it can be done in a better way.
;WITH cte
AS ( SELECT ID ,
TranDate ,
TranCode ,
1 AS BatchNumber
FROM t
UNION ALL
SELECT t.ID ,
t.TranDate ,
t.TranCode ,
CASE WHEN t.TranCode != cte.TranCode
THEN cte.BatchNumber + 1
ELSE cte.BatchNumber
END AS BatchNumber
FROM t
INNER JOIN cte ON t.id = cte.Id + 1
)
SELECT id ,
trandate ,
trancode ,
MAX(cte.BatchNumber) AS BatchNumber
FROM cte
GROUP BY id ,
tranDate ,
trancode
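That said, if you are on SQL Server 2012 or later, the window functions the question mentions trying can avoid the recursion altogether. The following is only a sketch of that idea (flag each row where TranCode differs from the previous row using LAG, then take a running SUM of the flags as the batch number), not a tuned solution:
WITH flagged AS (
    SELECT id,
           CASE WHEN TranCode = LAG(TranCode) OVER (ORDER BY id)
                THEN 0 ELSE 1 END AS isNewBatch   -- 1 marks the start of a new batch
    FROM t
),
numbered AS (
    SELECT id, SUM(isNewBatch) OVER (ORDER BY id) AS newBatchNo   -- running total of batch starts
    FROM flagged
)
UPDATE trg
SET trg.BatchNo = n.newBatchNo
FROM t AS trg
JOIN numbered AS n ON n.id = trg.id;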