SQL: Differences between times and counting the frequency

I have the following data ordered by events, ID and then start_time:
EVENT ID START_TIME END_TIME
1 101 1:00 2:00
1 101 3:00 3:30
1 102 1:00 4:00
1 102 5:00 6:00
2 103 10:00 11:00
2 103 12:00 13:00
2 103 13:30 14:00
2 103 14:30 15:00
And I want to end up with the following:
Difference_hour Frequency
1 3
0,5 2
I would like a query that looks at the difference between the END_TIME of one row and the START_TIME of the next row for the same ID within the same EVENT. (To be specific, I am not interested in the difference between the START_TIME and END_TIME of the same row.)
Example: in event 1 we have two rows for ID 101, and I would like the difference between the first END_TIME (2:00) and the following START_TIME on the second row (3:00). The difference is 1 hour. If we do the same for ID 102, we end up with another difference of 1 hour.
In the end, I would like to count the frequency of each of the differences, which can be seen in the second table.

select diff_hour, count(*)
from
(
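-- subtracting dates gives days (Oracle-style DATE arithmetic assumed), so multiply by 24 to get hours
select (next_start - end_time)*24 as diff_hour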
from
(
select end_time, lead(start_time) over (partition by event, id order by start_time) next_start
from MyTable
) x1
where next_start is not null
) x2
group by diff_hour
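A quick way to sanity-check the LEAD approach against the sample data is to run it in SQLite from Python. This is only a sketch under a few assumptions: SQLite has no date type, so the times are stored as zero-padded 'HH:MM' text and the gap is computed from strftime('%s', ...) instead of date subtraction; MyTable is the table name used in the answer, while ROWS, GAP_QUERY and gap_frequencies are names made up for this illustration.

```python
import sqlite3

# Sample rows from the question: (event, id, start_time, end_time).
# Times are zero-padded so SQLite's date functions can parse them.
ROWS = [
    (1, 101, "01:00", "02:00"),
    (1, 101, "03:00", "03:30"),
    (1, 102, "01:00", "04:00"),
    (1, 102, "05:00", "06:00"),
    (2, 103, "10:00", "11:00"),
    (2, 103, "12:00", "13:00"),
    (2, 103, "13:30", "14:00"),
    (2, 103, "14:30", "15:00"),
]

# LEAD fetches the next START_TIME within the same (event, id);
# the outer query turns each gap into hours and counts how often it occurs.
GAP_QUERY = """
SELECT (strftime('%s', next_start) - strftime('%s', end_time)) / 3600.0 AS diff_hour,
       COUNT(*) AS frequency
FROM (
    SELECT end_time,
           LEAD(start_time) OVER (PARTITION BY event, id ORDER BY start_time) AS next_start
    FROM MyTable
) AS gaps
WHERE next_start IS NOT NULL
GROUP BY diff_hour
ORDER BY frequency DESC
"""


def gap_frequencies(rows):
    """Load the rows into an in-memory table and return (diff_hour, frequency) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE MyTable (event INTEGER, id INTEGER, start_time TEXT, end_time TEXT)"
    )
    conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(GAP_QUERY).fetchall()
    conn.close()
    return result
```

Calling gap_frequencies(ROWS) returns one row per distinct gap length, which can be compared directly with the desired Difference_hour / Frequency table.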

Related

How do I calculate the amount of time between multiple datetimes in multiple rows in sql

I've searched but can't find anything that is exactly what I need. I need to calculate the amount of time that someone has been in the building over time in a SQL query (T-SQL on SQL Server). The data looks like this:
UserId Clocking Status
------------------------------
1 01/12/2020 09:00 In
2 01/12/2020 09:12 In
1 01/12/2020 09:25 Out
3 01/12/2020 10:00 In
2 01/12/2020 10:45 Out
3 01/12/2020 13:11 Out
1 03/12/2020 11:14 In
2 03/12/2020 15:56 In
1 03/12/2020 16:04 Out
2 03/12/2020 17:00 Out
I want the output to look like this:
UserId TimeInBuilding
----------------------
1 03:35
2 05:25
3 03:11
Assuming that the ins/outs are perfectly interleaved, you can do this by assigning the next "out" time to the "in" time and aggregating:
select userid,
sum(datediff(second, clocking, out_time)) / (60.0 * 60) as decimal_hours
from (select t.*,
lead(clocking) over (partition by userid order by clocking) as out_time
from t
) t
where status = 'In'
group by userid;
You can convert this to HH:MM format using:
select userid,
convert(varchar(5),
convert(time,
dateadd(second,
sum(datediff(second, clocking, out_time)),
0)
)
) as hhmm
from (select t.*,
lead(clocking) over (partition by userid order by clocking) as out_time
from t
) t
where status = 'In'
group by userid;
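The pairing of each "In" with the following clocking can be tried out in SQLite from Python. This is a sketch, not the T-SQL above: it assumes the sample dates are day/month/year and rewrites them as ISO text, it uses strftime('%s', ...) in place of datediff, and the names CLOCKINGS, PAIRING_QUERY, hhmm and time_in_building are invented for the example. The HH:MM formatting is done with integer arithmetic, which also keeps totals of 24 hours or more intact, whereas a value converted through the time type can only represent less than 24 hours.

```python
import sqlite3

# Sample rows from the question as (userid, clocking, status),
# with dates rewritten as ISO text (day/month/year assumed).
CLOCKINGS = [
    (1, "2020-12-01 09:00", "In"),
    (2, "2020-12-01 09:12", "In"),
    (1, "2020-12-01 09:25", "Out"),
    (3, "2020-12-01 10:00", "In"),
    (2, "2020-12-01 10:45", "Out"),
    (3, "2020-12-01 13:11", "Out"),
    (1, "2020-12-03 11:14", "In"),
    (2, "2020-12-03 15:56", "In"),
    (1, "2020-12-03 16:04", "Out"),
    (2, "2020-12-03 17:00", "Out"),
]

# For every row, LEAD picks up the user's next clocking; keeping only the
# "In" rows leaves one (in, out) pair per visit, assuming perfect interleaving.
PAIRING_QUERY = """
SELECT userid,
       SUM(strftime('%s', out_time) - strftime('%s', clocking)) AS total_seconds
FROM (
    SELECT t.*,
           LEAD(clocking) OVER (PARTITION BY userid ORDER BY clocking) AS out_time
    FROM t
) AS paired
WHERE status = 'In'
GROUP BY userid
ORDER BY userid
"""


def hhmm(total_seconds):
    """Format a number of seconds as HH:MM without wrapping at 24 hours."""
    hours, remainder = divmod(total_seconds, 3600)
    return f"{hours:02d}:{remainder // 60:02d}"


def time_in_building(rows):
    """Return {userid: 'HH:MM'} for the given clocking rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (userid INTEGER, clocking TEXT, status TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    totals = conn.execute(PAIRING_QUERY).fetchall()
    conn.close()
    return {userid: hhmm(seconds) for userid, seconds in totals}
```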

How to generate series using start and end date and quarters on postgres

I have a table like the one shown below, where I want to use the start and end date to evenly distribute each row's value over the 3 months of its quarter, for every quarter between the start and end date (last two columns).
I am familiar with generate_series and intervals in Postgres, but I am having a hard time getting what I want.
My table has an ID column that groups rows together, a quarter column that indicates which quarter the row references for the ID, a value column that is the value for the whole quarter (and every quarter in the date range), and start_date and end_date columns indicating the date range. Here is a sample:
ID quarter value start_date end_date
1 2 152 2019-11-07 2050-12-30
1 1 785 2019-11-07 2050-12-30
2 2 152 2019-03-05 2050-12-30
2 1 785 2019-03-05 2050-12-30
3 4 41 2018-06-12 2050-12-30
3 3 50 2018-06-12 2050-12-30
3 2 88 2018-06-12 2050-12-30
3 1 29 2018-06-12 2050-12-30
4 2 1607 2018-12-17 2050-12-30
4 1 4803 2018-12-17 2050-12-30
Here is my desired output (for ID 1):
ID quarter value start_date end_date
1 2 152/3 2020-04-01 2020-07-01
1 1 785/3 2020-01-01 2020-04-01
1 2 152/3 2021-04-01 2021-07-01
1 1 785/3 2021-01-01 2021-04-01
The start_date in the output is the start of the next quarter after the start_date in the first table. I need the series to be generated from the start_date to the end_date of the first table.
You can do this by using the GENERATE_SERIES function and passing in the start and end date for each unique (by ID) row and setting the interval to 3 months. Then join the result back with your original table on both ID and quarter.
Here's an example (note original_data is what I've called your first table):
WITH
quarters_table AS (
SELECT
t.ID,
(EXTRACT('month' FROM t.quarter_date) - 1)::INT / 3 + 1 AS quarter,
t.quarter_date::DATE AS start_date,
COALESCE(
LEAD(t.quarter_date) OVER (PARTITION BY t.ID ORDER BY t.quarter_date),
DATE_TRUNC('quarter', t.original_end_date) + INTERVAL '3 months'
)::DATE AS end_date
FROM (
SELECT
original_record.ID,
original_record.end_date AS original_end_date,
GENERATE_SERIES(
DATE_TRUNC('quarter', original_record.start_date),
DATE_TRUNC('quarter', original_record.end_date),
INTERVAL '3 months'
) AS quarter_date
FROM (
SELECT DISTINCT ON (original_data.ID)
original_data.ID,
original_data.start_date,
original_data.end_date
FROM
original_data
ORDER BY
original_data.ID
) AS original_record
) AS t
)
SELECT
quarters_table.ID,
quarters_table.quarter,
original_data.value::DOUBLE PRECISION / 3 AS value,
quarters_table.start_date,
quarters_table.end_date
FROM
quarters_table
INNER JOIN
original_data
ON
quarters_table.ID = original_data.ID
AND quarters_table.quarter = original_data.quarter;
Sample output:
id | quarter | value | start_date | end_date
----+---------+------------------+------------+------------
1 | 1 | 261.666666666667 | 2020-01-01 | 2020-04-01
1 | 2 | 50.6666666666667 | 2020-04-01 | 2020-07-01
1 | 1 | 261.666666666667 | 2021-01-01 | 2021-04-01
1 | 2 | 50.6666666666667 | 2021-04-01 | 2021-07-01
For completeness, here's the original_data table I've used in testing:
WITH
original_data AS (
SELECT
1 AS ID,
2 AS quarter,
152 AS value,
'2019-11-07'::DATE AS start_date,
'2050-12-30'::DATE AS end_date
UNION ALL
SELECT
1 AS ID,
1 AS quarter,
785 AS value,
'2019-11-07'::DATE AS start_date,
'2050-12-30'::DATE AS end_date
UNION ALL
SELECT
2 AS ID,
2 AS quarter,
152 AS value,
'2019-03-05'::DATE AS start_date,
'2050-12-30'::DATE AS end_date
-- ...
)
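To make the quarter arithmetic behind the GENERATE_SERIES call concrete, here is a small Python sketch of the same stepping logic. It is an illustration only, not a replacement for the query: quarter_start, next_quarter and quarter_ranges are hypothetical helper names, and the end date in the usage note is shortened from 2050-12-30 so the result stays readable.

```python
from datetime import date


def quarter_start(d):
    """First day of the quarter containing d (what DATE_TRUNC('quarter', d) returns)."""
    return date(d.year, 3 * ((d.month - 1) // 3) + 1, 1)


def next_quarter(d):
    """First day of the quarter that follows the quarter starting at d."""
    if d.month == 10:
        return date(d.year + 1, 1, 1)
    return date(d.year, d.month + 3, 1)


def quarter_ranges(start_date, end_date):
    """Return (quarter_number, range_start, range_end) for each quarter
    from the one containing start_date to the one containing end_date."""
    ranges = []
    current = quarter_start(start_date)
    last = quarter_start(end_date)
    while current <= last:
        following = next_quarter(current)
        quarter_number = (current.month - 1) // 3 + 1
        ranges.append((quarter_number, current, following))
        current = following
    return ranges
```

For example, quarter_ranges(date(2019, 11, 7), date(2020, 5, 15)) steps through Q4 2019, Q1 2020 and Q2 2020. Joining such rows back to the original data on ID and quarter number, and dividing value by 3, gives the per-month figure; quarters with no matching row (Q4 for ID 1 here) drop out of an inner join.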
This is one way to go about it, shown as an example based on the output you've outlined. You can then add more conditions to the CASE/WHEN for additional quarters.
SELECT
ID,
Quarter,
Value/3 AS "Value",
CASE
WHEN Quarter = 1 THEN '2020-01-01'
WHEN Quarter = 2 THEN '2020-04-01'
END AS "Start_Date",
CASE
WHEN Quarter = 1 THEN '2020-04-01'
WHEN Quarter = 2 THEN '2020-07-01'
END AS "End_Date"
FROM
Table

Giving a common value to groups of consecutive hours in SQL

I am using Netezza.
Let's say I have a table with two fields: one field is a timestamp corresponding to every hour in the day, the other is an indicator for whether or not a patient took an antacid during the hour. The table looks as follows:
Timestamp Antacid?
11/23/2016 08:00 1
11/23/2016 09:00 1
11/23/2016 10:00 1
11/23/2016 11:00 0
11/23/2016 12:00 0
11/23/2016 13:00 1
11/23/2016 14:00 1
11/23/2016 15:00 0
Is there a way to assign a common partition value to each set of consecutive hour intervals? Something like this...
Timestamp Antacid? Group
11/23/2016 08:00 1 1
11/23/2016 09:00 1 1
11/23/2016 10:00 1 1
11/23/2016 11:00 0 NULL
11/23/2016 12:00 0 NULL
11/23/2016 13:00 1 2
11/23/2016 14:00 1 2
11/23/2016 15:00 0 NULL
I would ultimately like to figure out the start date and end date for all consecutive hours of antacid usage (so the start and end dates for the first group would be 11/23/2016 08:00 and 11/23/2016 10:00 respectively, and the start/end dates for the second group would be 11/23/2016 13:00 and 11/23/2016 14:00, respectively). I have done this before with consecutive days using extract(epoch from date - row_number()) but I'm not sure how to handle hours.
I assume this has to be done for each patient (id in the query here). You can use
select id,antacid,min(dt) startdate,max(dt) enddate from (
select t.*,
-row_number() over(partition by id,antacid order by dt)
+ row_number() over(partition by id order by dt) grp
from t
) x
where antacid = 1
group by id,antacid,grp
order by 1,3
The inner query gets you the continuous groups of 0 and 1 for antacid for a given patient id. Because you only need the start and end dates for antacid=1, you can use a where clause to filter.
Add partition by date if this has to be done for each day.
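The difference-of-row-numbers trick can be checked against the sample data with SQLite from Python. This is a sketch rather than Netezza code: the timestamps are rewritten as ISO text, a patient id of 1 is assumed for every row (the question's table has no id column), and READINGS, ISLANDS_QUERY and antacid_runs are names invented for the example.

```python
import sqlite3

# Sample rows from the question as (id, dt, antacid); id 1 is an assumed patient.
READINGS = [
    (1, "2016-11-23 08:00", 1),
    (1, "2016-11-23 09:00", 1),
    (1, "2016-11-23 10:00", 1),
    (1, "2016-11-23 11:00", 0),
    (1, "2016-11-23 12:00", 0),
    (1, "2016-11-23 13:00", 1),
    (1, "2016-11-23 14:00", 1),
    (1, "2016-11-23 15:00", 0),
]

# Within a run of equal antacid values both row numbers advance together,
# so their difference stays constant and can serve as a group key.
ISLANDS_QUERY = """
SELECT id, MIN(dt) AS startdate, MAX(dt) AS enddate
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY dt)
         - ROW_NUMBER() OVER (PARTITION BY id, antacid ORDER BY dt) AS grp
    FROM t
) AS x
WHERE antacid = 1
GROUP BY id, antacid, grp
ORDER BY id, startdate
"""


def antacid_runs(rows):
    """Return (id, startdate, enddate) for each run of consecutive antacid = 1 rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, dt TEXT, antacid INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    runs = conn.execute(ISLANDS_QUERY).fetchall()
    conn.close()
    return runs
```

Note that this version groups by position in the ordered rows only; it does not check that neighbouring rows are exactly one hour apart.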
Edit: Grouping rows only if the difference between the current row and the next row is one hour.
select id,antacid,min(dt) startdate,max(dt) enddate from (
select t.*,
--change dateadd as per Netezza functions so you add -row_number hours
dateadd(hour,-row_number() over(partition by id,antacid order by dt),dt) grp
from t
) x
where antacid = 1
group by id,antacid,grp
order by 1,3

SQL Server Query to Get Available Employee based on Schedule

I have two tables, parent table Employees and child table Employees_Schedule, like this:
Employees table:
EmployeesID Name Group Availability_Order Available
--------------------------------------------------------------
1 Steve Sales 1 TRUE
2 Ann Sales 2 TRUE
3 Jack Sales 3 FALSE
4 Sandy Support 4 TRUE
5 Bill Support 5 TRUE
6 John Support 6 TRUE
Employees_Schedule table:
EmployeesID Day From To
----------------------------------------------
1 Monday 8:00 12:00
1 Monday 13:00 17:00
2 Monday 12:00 13:00
3 Tuesday 7:30 11:30
3 Wednesday 7:30 11:30
3 Friday 14:30 16:30
4 Tuesday 11:30 17:00
5 Wednesday 8:00 12:00
5 Wednesday 13:00 17:00
5 Thursday 12:00 13:00
5 Friday 7:30 11:30
6 Friday 12:00 13:00
How can I create a query that, given a date/time and a Group, returns the first available employee? I am using SQL Server 2012. Here is what I started doing, but I got stuck:
Select top 1
Name
from
Empolyees e join? Employees_Schedule s
on
e.employeesID = s.EmployeesID
where
e.group = 'Sales'
and DATENAME(Weekday,'5/24/2016 10:00') = s.Day
and CAST('5/24/2016 10:00' AS TIME) 'hh:mm' >= CAST(s.from AS TIME)
and CAST('5/24/2016 10:00' AS TIME) 'hh:mm' <= CAST(s.to AS TIME)
order by
e.availability_order
Thanks
Have you looked into window functions and CTEs? You could easily achieve this with, for example:
Row_Number() OVER(PARTITION BY day ORDER BY starttime ASC) as ColumnName
Combined with the predicate
WHERE columnName = 1 AND groupName = 'groupname'
For details, see the SQL Server documentation (Books Online) on the OVER clause and on common table expressions.
It looks like you're close. If you wrap the main part of your SQL in a Common Table Expression and use the row_number() window function then you can find the first available:
;with cte as (
Select
e.Name,
-- rank the matching employees by their availability order, as in the question's ORDER BY
row_number() over (order by e.Availability_Order) PrioritySequence
from
Employees e join Employees_Schedule s
on
e.EmployeesID = s.EmployeesID
where
-- Group, From and To are reserved words, so they need brackets
e.[Group] = 'Sales'
and DATENAME(Weekday,'5/24/2016 10:00') = s.Day
and CAST('5/24/2016 10:00' AS TIME) >= CAST(s.[From] AS TIME)
and CAST('5/24/2016 10:00' AS TIME) <= CAST(s.[To] AS TIME)
)
select *
from cte
where PrioritySequence = 1

SQL getting datediff from same field

I have a problem. I need to get the date difference in hours from my table, but the time in and time out are saved in the same field. This is what my table looks like:
RecNo. Employeeno recorddate recordtime recordval
1 001 8/22/2014 8:15 AM 1
2 001 8/22/2014 5:00 PM 2
3 001 8/24/2014 8:01 AM 1
4 001 8/24/2014 5:01 PM 2
1 indicates time in and 2 indicates time out. How can I get the number of hours worked for each day? What I want to get is something like this:
Date hoursworked
8/22/2014 8
8/24/2014 8
I am using VS 2010 and SQL server 2005
You could self-join each "in" record with its corresponding "out" record and use datediff to subtract them:
SELECT time_in.employeeno AS "Employee No",
time_in.recorddate AS "Date",
DATEDIFF (hour, time_in.recordtime, time_out.recordtime)
AS "Hours Worked"
FROM (SELECT *
FROM my_table
WHERE recordval = 1) time_in
INNER JOIN (SELECT *
FROM my_table
WHERE recordval = 2) time_out
ON time_in.employeeno = time_out.employeeno AND
time_in.recorddate = time_out.recorddate
If you always record time in and time out for every employee, and just one per day, using a self-join should work:
SELECT
t1.Employeeno,
t1.recorddate,
t1.recordtime AS [TimeIn],
t2.recordtime AS [TimeOut],
DATEDIFF(HOUR,t1.recordtime, t2.recordtime) AS [HoursWorked]
FROM Table1 t1
INNER JOIN Table1 t2 ON
t1.Employeeno = t2.Employeeno
AND t1.recorddate = t2.recorddate
WHERE t1.recordval = 1 AND t2.recordval = 2
I included the recordtime fields as time in, time out, if you don't want them just remove them.
Note that this datediff calculation gives 9 hours, and not 8 as you suggested.
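The reason for the 9 is that DATEDIFF(HOUR, ...) counts the hour boundaries crossed between the two values rather than the elapsed time. The following Python sketch mimics that rule next to a plain elapsed-time calculation; hour_boundaries_crossed and elapsed_hours_minutes are hypothetical helper names used only for this illustration.

```python
from datetime import datetime


def hour_boundaries_crossed(start, end):
    """Mimic DATEDIFF(HOUR, start, end): truncate both values to the hour
    and count how many hour boundaries lie between them."""
    start_hour = start.replace(minute=0, second=0, microsecond=0)
    end_hour = end.replace(minute=0, second=0, microsecond=0)
    return int((end_hour - start_hour).total_seconds() // 3600)


def elapsed_hours_minutes(start, end):
    """Return the actual elapsed time as (whole hours, remaining minutes)."""
    total_minutes = int((end - start).total_seconds() // 60)
    return divmod(total_minutes, 60)


# First sample day from the question: in at 8:15 AM, out at 5:00 PM.
time_in = datetime(2014, 8, 22, 8, 15)
time_out = datetime(2014, 8, 22, 17, 0)
```

With these values the boundary count is 17 - 8 = 9, while the elapsed time is 8 hours 45 minutes. If whole elapsed hours are wanted, taking the difference in minutes and dividing by 60 avoids the overcount.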
Using this sample data:
with table1 as (
select * from ( values
(1,'001', cast('20140822' as datetime),cast('08:15:00 am' as time),1)
,(2,'001', cast('20140822' as datetime),cast('05:00:00 pm' as time),2)
,(3,'001', cast('20140824' as datetime),cast('08:01:00 am' as time),1)
,(4,'001', cast('20140824' as datetime),cast('04:59:00 pm' as time),2)
,(5,'001', cast('20140825' as datetime),cast('10:01:00 pm' as time),1)
,(6,'001', cast('20140826' as datetime),cast('05:59:00 am' as time),2)
)data(RecNo,EmployeeNo,recordDate,recordTime,recordVal)
)
this query
SELECT
Employeeno
,convert(char(10),recorddate,120) as DateStart
,convert(char(5),cast(TimeIn as time)) as TimeIn
,convert(char(5),cast(TimeOut as time)) as TimeOut
,DATEDIFF(minute,timeIn, timeOut) / 60 AS [HoursWorked]
,DATEDIFF(minute,timeIn, timeOut) % 60 AS [MinutesWorked]
FROM (
SELECT
tIn.Employeeno,
tIn.recorddate,
dateadd(minute, datediff(minute,0,tIn.recordTime), tIn.recordDate)
as TimeIn,
( SELECT TOP 1
dateadd(minute, datediff(minute,0,tOut.recordTime), tOut.recordDate)
as TimeOut
FROM Table1 tOut
WHERE tOut.RecordVal = 2
AND tOut.EmployeeNo = tIn.EmployeeNo
AND tOut.RecNo > tIn.RecNo
ORDER BY tOut.EmployeeNo, tOut.RecNo
) as TimeOut
FROM Table1 tIn
WHERE tIn.recordval = 1
) T
yields (as desired)
Employeeno DateStart TimeIn TimeOut HoursWorked MinutesWorked
---------- ---------- ------ ------- ----------- -------------
001 2014-08-22 08:15 17:00 8 45
001 2014-08-24 08:01 16:59 8 58
001 2014-08-25 22:01 05:59 7 58
No assumptions are made about shifts not running across midnight (see case 3).
This particular implementation may not be the most performant way to construct this correlated subquery, so if there is a performance problem come back and we can look at it again. However running those tests requires a large dataset which I don't feel like constructing just now.