I am using Netezza.
Let's say I have a table with two fields: one field is a timestamp corresponding to every hour in the day, the other is an indicator for whether or not a patient took an antacid during the hour. The table looks as follows:
Timestamp Antacid?
11/23/2016 08:00 1
11/23/2016 09:00 1
11/23/2016 10:00 1
11/23/2016 11:00 0
11/23/2016 12:00 0
11/23/2016 13:00 1
11/23/2016 14:00 1
11/23/2016 15:00 0
Is there a way to assign a common partition value to each set of consecutive hour intervals? Something like this...
Timestamp Antacid? Group
11/23/2016 08:00 1 1
11/23/2016 09:00 1 1
11/23/2016 10:00 1 1
11/23/2016 11:00 0 NULL
11/23/2016 12:00 0 NULL
11/23/2016 13:00 1 2
11/23/2016 14:00 1 2
11/23/2016 15:00 0 NULL
I would ultimately like to figure out the start date and end date for all consecutive hours of antacid usage (so the start and end dates for the first group would be 11/23/2016 08:00 and 11/23/2016 10:00 respectively, and the start/end dates for the second group would be 11/23/2016 13:00 and 11/23/2016 14:00, respectively). I have done this before with consecutive days using extract(epoch from date - row_number()) but I'm not sure how to handle hours.
I assume this has to be done for each patient (id in the query here). You can use
select id,antacid,min(dt) startdate,max(dt) enddate from (
select t.*,
-row_number() over(partition by id,antacid order by dt)
+ row_number() over(partition by id order by dt) grp
from t
) x
where antacid = 1
group by id,antacid,grp
order by 1,3
The inner query gets you the continuous groups of 0 and 1 for antacid for a given patient id. Because you only need the start and end dates for antacid=1, you can use a where clause to filter.
Add partition by date if this has to be done for each day.
Edit: Grouping rows only if the difference between the current row and the next row is one hour.
select id,antacid,min(dt) startdate,max(dt) enddate from (
select t.*,
--change dateadd as per Netezza functions so you add -row_number hours
dateadd(hour,-row_number() over(partition by id,antacid order by dt),dt) grp
from t
) x
where antacid = 1
group by id,antacid,grp
order by 1,3
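To check the hour-subtraction idea without a Netezza instance, here is a minimal sketch that runs the same pattern in SQLite through Python. SQLite's `datetime(dt, '-N hours')` stands in for the `dateadd` call, the patient id of 1 is an assumption, and window functions need SQLite 3.25 or later.

```python
import sqlite3

# Sample rows from the question; the single patient id (1) is an assumption.
rows = [
    (1, "2016-11-23 08:00:00", 1),
    (1, "2016-11-23 09:00:00", 1),
    (1, "2016-11-23 10:00:00", 1),
    (1, "2016-11-23 11:00:00", 0),
    (1, "2016-11-23 12:00:00", 0),
    (1, "2016-11-23 13:00:00", 1),
    (1, "2016-11-23 14:00:00", 1),
    (1, "2016-11-23 15:00:00", 0),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, dt text, antacid integer)")
conn.executemany("insert into t values (?, ?, ?)", rows)

# Subtracting row_number() hours from dt gives the same value for every row
# of a run of consecutive hours, so that value can serve as the group key.
query = """
select id, min(dt) as startdate, max(dt) as enddate
from (
    select t.*,
           datetime(dt, '-' || row_number() over (partition by id, antacid order by dt) || ' hours') as grp
    from t
) x
where antacid = 1
group by id, grp
order by id, startdate
"""
islands = conn.execute(query).fetchall()
for island in islands:
    print(island)
```

With the sample data this yields one row for 08:00 to 10:00 and one for 13:00 to 14:00.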
Related
I have a table in SQL Server that records people going in and out of a building.
user_id datetime direction
1 27.09.2022 10:30 in
1 27.09.2022 12:30 out
1 27.09.2022 14:30 in
1 27.09.2022 15:35 out
2 27.09.2022 11:30 in
2 27.09.2022 13:20 out
2 27.09.2022 15:00 in
2 27.09.2022 15:40 out
3 27.09.2022 11:45 in
3 27.09.2022 11:46 in
3 27.09.2022 15:40 out
3 27.09.2022 15:47 in
3 27.09.2022 18:00 out
I need to calculate how much time each user spent inside the building by days.
For example, on 27th Sep user #1 spent 3 hours 5 minutes. User #2 spent 2 hours 30 minutes.
There is also a bug that may spoil the results: sometimes there are two 'in' or two 'out' rows in a row, as in the case of user #3. I understand the nature of this bug, and I know I only have to keep the last of two identical consecutive rows (in fact user #3 entered at 11:46, not 11:45). Does anyone have an idea how to solve that?
select user_id
,sum(time_spent) as time_spent_minutes
from (
select *
,datediff(minute, lag(case when direction = 'in' then datetime end) over(partition by user_id order by datetime), datetime) as time_spent
from t
) t
group by user_id
user_id time_spent_minutes
1 185
2 150
The window functions would be a nice fit here.
Select user_id
,Duration = convert(time(0),dateadd(second,sum(Secs),0))
From (
Select user_id
,Secs = datediff(second,case when direction ='in'
and lead([direction],1) over (partition by user_id order by datetime)='out'
then [datetime]
end
,lead([datetime],1) over (partition by user_id order by datetime))
From YourTable
) A
Group By user_id
Results
user_id Duration
1 03:05:00 -- << Check your desired results
2 02:30:00
3 06:07:00
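The pairing rule behind the lead-based query (count an 'in' row only when the very next row for that user is an 'out') can be sketched in plain Python. This is only an illustration of the logic, not a replacement for the SQL; the function name `minutes_inside` is made up here, and it totals per user rather than per user and day.

```python
from collections import defaultdict
from datetime import datetime

# Rows from the question: (user_id, datetime, direction).
events = [
    (1, "27.09.2022 10:30", "in"), (1, "27.09.2022 12:30", "out"),
    (1, "27.09.2022 14:30", "in"), (1, "27.09.2022 15:35", "out"),
    (2, "27.09.2022 11:30", "in"), (2, "27.09.2022 13:20", "out"),
    (2, "27.09.2022 15:00", "in"), (2, "27.09.2022 15:40", "out"),
    (3, "27.09.2022 11:45", "in"), (3, "27.09.2022 11:46", "in"),
    (3, "27.09.2022 15:40", "out"), (3, "27.09.2022 15:47", "in"),
    (3, "27.09.2022 18:00", "out"),
]

def minutes_inside(events):
    """Total minutes per user, pairing each 'in' with an immediately following 'out'."""
    by_user = defaultdict(list)
    for user_id, stamp, direction in events:
        by_user[user_id].append((datetime.strptime(stamp, "%d.%m.%Y %H:%M"), direction))

    totals = {}
    for user_id, rows in by_user.items():
        rows.sort()  # chronological order per user
        total = 0
        # Compare each row with its successor, like lead() over (order by datetime).
        for (t1, d1), (t2, d2) in zip(rows, rows[1:]):
            if d1 == "in" and d2 == "out":
                total += int((t2 - t1).total_seconds() // 60)
        totals[user_id] = total
    return totals

totals = minutes_inside(events)
print(totals)
```

Because the earlier of two consecutive 'in' rows is followed by another 'in', it contributes nothing, which is how the 11:45 entry for user 3 gets ignored.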
What I want to do is calculate the number of shifts and hours worked by each employee in any given 7-day period. To achieve this, I need to identify and group 'islands' of shifts. Note that this 7-day period is not tied to a calendar week, and its beginning and end vary from employee to employee. This sets it apart from other similar questions asked here in the past.
I have a table like this:
Person ID Start Date End Date Start time End time Hours Worked
12345 06-07-20 06-07-20 6:00 AM 7:45 AM 1.75
12345 06-07-20 06-07-20 8:15 AM 8:45 AM 0.50
12345 06-07-20 06-07-20 9:19 AM 9:43 AM 0.40
12345 08-07-20 08-07-20 12:00 AM 12:39 AM 0.65
12345 09-07-20 09-07-20 10:05 PM 11:59 PM 1.90
12345 11-07-20 11-07-20 4:39 PM 4:54 PM 0.25
12345 22-07-20 22-07-20 7:00 AM 7:30 AM 0.50
12345 23-07-20 23-07-20 1:00 PM 3:00 PM 2.00
12345 24-07-20 24-07-20 9:14 AM 9:35 AM 0.35
12345 27-07-20 27-07-20 4:00 PM 6:00 PM 2.00
12345 27-07-20 27-07-20 2:00 PM 4:00 PM 2.00
12345 28-07-20 28-07-20 9:00 AM 10:00 AM 1.00
12345 28-07-20 28-07-20 4:39 AM 4:59 AM 0.34
I want to group and summarise the data above like this:
Person ID From To Number of shifts Number of Hours
12345 06-07-20 11-07-20 6 5.45
12345 22-07-20 28-07-20 7 8.19
Note that the first grouping for employee 12345 starts on 06-07-20 and ends on 11-07-20 because these shifts fall within the 06-07-20 - 13-07-20 7-day window.
The next 7-day window is from 22-07-20 to 28-07-20, which means that the start date of each 7-day window has to be dynamic and based on the data, i.e. not constant, which makes this a complex task.
Also note that an employee may work multiple shifts in a day and that the shifts may not be consecutive.
I was playing around with using DATEDIFF() with LAG() and LEAD() but was unable to get to where I want. Any help would be appreciated.
I think you need a recursive CTE for this. The idea is to enumerate the shifts of each person, and then iteratively walk the dataset while keeping track of the first date of the period: when there are more than 7 days between the start of a period and the current date, the start date resets and a new group starts.
with recursive
data as (select t.*, row_number() over(partition by personid order by start_date) rn from mytable t),
cte as (
select personid, start_date, start_date end_date, hours_worked, rn
from data
where rn = 1
union all
select
c.personid,
case when d.start_date > dateadd(day, 7, c.start_date) then d.start_date else c.start_date end,
d.start_date,
d.hours_worked,
d.rn
from cte c
inner join data d on d.personid = c.personid and d.rn = c.rn + 1
)
select personid, start_date, max(end_date) end_date, count(*) no_shifts, sum(hours_worked)
from cte
group by personid, start_date
This assumes that:
dates do not span over multiple days, as shown in your sample data
dates are stored as date datatype, and times as time
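The walk that the recursive CTE performs can be sketched in Python to make the reset rule easier to follow. This is an illustration under the same assumptions, for a single person, with a hypothetical function name `group_shifts`; a new group starts when a shift date is more than 7 days after the start of the current group.

```python
from datetime import date, timedelta

# (start date, hours worked) for person 12345, taken from the question.
shifts = [
    (date(2020, 7, 6), 1.75), (date(2020, 7, 6), 0.50), (date(2020, 7, 6), 0.40),
    (date(2020, 7, 8), 0.65), (date(2020, 7, 9), 1.90), (date(2020, 7, 11), 0.25),
    (date(2020, 7, 22), 0.50), (date(2020, 7, 23), 2.00), (date(2020, 7, 24), 0.35),
    (date(2020, 7, 27), 2.00), (date(2020, 7, 27), 2.00), (date(2020, 7, 28), 1.00),
    (date(2020, 7, 28), 0.34),
]

def group_shifts(shifts, window_days=7):
    """Walk the shifts in date order, starting a new group when the window is exceeded."""
    groups = []
    for day, hours in sorted(shifts):
        # Same test as the CTE: d.start_date > dateadd(day, 7, c.start_date)
        if not groups or day > groups[-1]["from"] + timedelta(days=window_days):
            groups.append({"from": day, "to": day, "shifts": 0, "hours": 0.0})
        current = groups[-1]
        current["to"] = day
        current["shifts"] += 1
        current["hours"] += hours
    return groups

groups = group_shifts(shifts)
for g in groups:
    print(g["from"], g["to"], g["shifts"], round(g["hours"], 2))
```

For the sample data this produces the two groups shown in the desired output: 6 shifts from 06-07-20 to 11-07-20 and 7 shifts from 22-07-20 to 28-07-20.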
I have been trying to separate groups in data stored in my Oracle database for more accurate analysis.
Current Output
Time Location
10:00 A111
11:00 A112
12:00 S111
13:00 S234
17:00 A234
18:00 S747
19:00 A878
Desired Output
Time Location Group Number
10:00 A111 1
11:00 A112 1
12:00 S111 1
13:00 S234 1
17:00 A234 2
18:00 S747 2
19:00 A878 3
I have been trying to use over and partition by to assign the values, but I can only get the number to increment on every row, not only on a change. I also tried using lag but struggled to make use of it.
I only need the value in the second column to start from 1 and increment when the first letter of field 1 changes (using substr).
This is my attempt using row_number but I am far off I think. There would be a time column in the output as well not shown above.
select event_time, st_location, Row_Number() over(partition by
SUBSTR(location,1,1) order
by event_time)
as groupnumber from pic
Any help would be really appreciated!
Edit:
Time Location Group Number
10:00 A-10112 1
11:00 A-10421 1
12:00 ST-10621 1
13:00 ST-23412 1
17:00 A-19112 2
18:00 ST-74712 2
19:00 A-87812 3
It is a gap and island problem. Use the following code:
select location,
dense_rank() over (partition by SUBSTR(location,1,1) order by grp)
from
(
select (row_number() over (order by time)) -
(row_number() over (partition by SUBSTR(location,1,1) order by time)) grp,
location,
time
from data
) t
order by time
The main idea is in the subquery which isolates consecutive sequences of items (computation of grp column). The rest is simple once you have the grp column.
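The numbering that the grp subquery and dense_rank produce can be checked with a small Python sketch: each first letter gets its own counter, and the counter advances whenever a new run of that letter begins. The function name `group_numbers` is hypothetical, and the input is assumed to be already sorted by time.

```python
def group_numbers(locations):
    """Number each run of a first letter, counting runs separately per letter."""
    runs_seen = {}   # first letter -> number of runs of that letter so far
    result = []
    previous = None
    for location in locations:
        letter = location[0]
        if letter != previous:
            # A new run of this letter begins here.
            runs_seen[letter] = runs_seen.get(letter, 0) + 1
            previous = letter
        result.append(runs_seen[letter])
    return result

# Locations from the question, in time order.
locations = ["A111", "A112", "S111", "S234", "A234", "S747", "A878"]
numbers = group_numbers(locations)
print(list(zip(locations, numbers)))
```

This matches the desired output in the question, including the edited sample where the prefixes are A- and ST-.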
select DENSE_RANK() over(partition by SUBSTR("location",1,1) ORDER BY SUBSTR("location",1,2))
as Rownumber,
"location" from Table1;
Demo
http://sqlfiddle.com/#!4/21120/16
I have the following data ordered by events, ID and then start_time:
EVENT ID START_TIME END_TIME
1 101 1:00 2:00
1 101 3:00 3:30
1 102 1:00 4:00
1 102 5:00 6:00
2 103 10:00 11:00
2 103 12:00 13:00
2 103 13:30 14:00
2 103 14:30 15:00
And I want to end up with the following:
Difference_hour Frequency
1 3
0,5 2
I would like to obtain a query that is looking at the difference between the END_TIME of an ID and the START_TIME of the same ID within the same EVENT (to mention specifically, i am not interested in the difference between the START_TIME and END_TIME of the same row).
Example: in event 1 we have two rows for ID 101, and I would like the difference between the first END_TIME (2:00) and the following START_TIME on the second row (3:00). The difference is 1 hour. If we do the same for ID 102, we end up with another difference of 1 hour.
In the end, I would like to count the frequency of each of the differences, which can be seen in the second table.
select diff_hour, count(*)
from
(
select (next_start - end_time)*24 as diff_hour -- assumes date subtraction yields days (as in Oracle); *24 converts to hours
from
(
select end_time, lead(start_time) over (partition by event, id order by start_time) next_start
from MyTable
) x1
where next_start is not null
) x2
group by diff_hour
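As a cross-check of the lead-based query, the following Python sketch computes the same gaps from the sample rows. It is only an illustration: the function name `gap_frequencies` is made up, and times are parsed as plain hours and minutes on a single day.

```python
from collections import Counter
from datetime import datetime

# (event, id, start_time, end_time) rows from the question.
rows = [
    (1, 101, "1:00", "2:00"), (1, 101, "3:00", "3:30"),
    (1, 102, "1:00", "4:00"), (1, 102, "5:00", "6:00"),
    (2, 103, "10:00", "11:00"), (2, 103, "12:00", "13:00"),
    (2, 103, "13:30", "14:00"), (2, 103, "14:30", "15:00"),
]

def gap_frequencies(rows):
    """Count gaps (in hours) between one row's end and the next row's start per event and id."""
    parsed = sorted(
        (event, id_, datetime.strptime(start, "%H:%M"), datetime.strptime(end, "%H:%M"))
        for event, id_, start, end in rows
    )
    counts = Counter()
    for prev, cur in zip(parsed, parsed[1:]):
        # Only compare rows within the same (event, id) partition.
        if prev[:2] == cur[:2]:
            gap_hours = (cur[2] - prev[3]).total_seconds() / 3600
            counts[gap_hours] += 1
    return dict(counts)

frequencies = gap_frequencies(rows)
print(frequencies)
```

The sample data gives three gaps of 1 hour and two gaps of 0.5 hours, as in the desired table.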
I have two tables, parent table Employees and child table Employees_Schedule, like this:
Employees table:
EmployeesID Name Group Availability_Order Available
--------------------------------------------------------------
1 Steve Sales 1 TRUE
2 Ann Sales 2 TRUE
3 Jack Sales 3 FALSE
4 Sandy Support 4 TRUE
5 Bill Support 5 TRUE
6 John Support 6 TRUE
Employees_Schedule table:
EmployeesID Day From To
----------------------------------------------
1 Monday 8:00 12:00
1 Monday 13:00 17:00
2 Monday 12:00 13:00
3 Tuesday 7:30 11:30
3 Wednesday 7:30 11:30
3 Friday 14:30 16:30
4 Tuesday 11:30 17:00
5 Wednesday 8:00 12:00
5 Wednesday 13:00 17:00
5 Thursday 12:00 13:00
5 Friday 7:30 11:30
6 Friday 12:00 13:00
How can I create a query that, given a date/time and a Group, returns the first available employee? I am using SQL Server 2012. Here is what I started doing before I got stuck:
Select top 1
Name
from
Empolyees e join? Employees_Schedule s
on
e.employeesID = s.EmployeesID
where
e.group = 'Sales'
and DATENAME(Weekday,'5/24/2016 10:00') = s.Day
and CAST('5/24/2016 10:00' AS TIME) 'hh:mm' >= CAST(s.from AS TIME)
and CAST('5/24/2016 10:00' AS TIME) 'hh:mm' <= CAST(s.to AS TIME)
order by
e.availability_order
Thanks
Have you looked into Window Function and CTE? You could easily achieve this with, for example..
Row_Number() OVER(PARTITION BY day ORDER BY starttime ASC) as ColumnName
Combined with predicate
WHERE columnName = 1 AND groupName = 'groupname'
For detail, read BOL on OVER()Clause here, and CTE here.
It looks like you're close. If you wrap the main part of your SQL in a Common Table Expression and use the row_number() window function then you can find the first available:
;with cte as (
Select
e.Name,
-- rank the matching employees by their availability order
row_number() over (order by e.Availability_Order) PrioritySequence
from
Employees e join Employees_Schedule s
on
e.EmployeesID = s.EmployeesID
where
e.[Group] = 'Sales'
and DATENAME(Weekday,'5/24/2016 10:00') = s.[Day]
and CAST('5/24/2016 10:00' AS TIME) >= CAST(s.[From] AS TIME)
and CAST('5/24/2016 10:00' AS TIME) <= CAST(s.[To] AS TIME)
)
select *
from cte
where PrioritySequence = 1
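The lookup can also be sketched in Python to make the filtering and ordering steps explicit. This is an illustration only: the function `first_available` and its `only_available` flag are hypothetical, and the ordering by Availability_Order is an assumption taken from the ORDER BY in the question. Like the query, it ignores the Available column unless asked.

```python
from datetime import datetime, time

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# (EmployeesID, Name, Group, Availability_Order, Available) from the question.
employees = [
    (1, "Steve", "Sales", 1, True), (2, "Ann", "Sales", 2, True),
    (3, "Jack", "Sales", 3, False), (4, "Sandy", "Support", 4, True),
    (5, "Bill", "Support", 5, True), (6, "John", "Support", 6, True),
]

# (EmployeesID, Day, From, To) from the question.
schedule = [
    (1, "Monday", time(8, 0), time(12, 0)), (1, "Monday", time(13, 0), time(17, 0)),
    (2, "Monday", time(12, 0), time(13, 0)), (3, "Tuesday", time(7, 30), time(11, 30)),
    (3, "Wednesday", time(7, 30), time(11, 30)), (3, "Friday", time(14, 30), time(16, 30)),
    (4, "Tuesday", time(11, 30), time(17, 0)), (5, "Wednesday", time(8, 0), time(12, 0)),
    (5, "Wednesday", time(13, 0), time(17, 0)), (5, "Thursday", time(12, 0), time(13, 0)),
    (5, "Friday", time(7, 30), time(11, 30)), (6, "Friday", time(12, 0), time(13, 0)),
]

def first_available(group, when, only_available=False):
    """Return the name of the scheduled employee with the lowest availability order, or None."""
    day = DAYS[when.weekday()]
    on_shift = {
        emp_id for emp_id, sched_day, start, end in schedule
        if sched_day == day and start <= when.time() <= end
    }
    candidates = [
        (order, name) for emp_id, name, emp_group, order, available in employees
        if emp_group == group and emp_id in on_shift and (available or not only_available)
    ]
    return min(candidates)[1] if candidates else None

print(first_available("Sales", datetime(2016, 5, 24, 10, 0)))
```

For 5/24/2016 10:00 (a Tuesday) and the Sales group, the only scheduled employee is Jack, whose Available flag is FALSE, so whether that flag should be part of the filter is worth deciding before relying on either version.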