Summing field in other rows conditionally - sql

I have a table in the form below:
Pilot | Leg | Duration | Takeoff
================================
John  | 1   | 60       | 9:00:00
John  | 2   | 60       | 9:00:00
John  | 3   | 30       | 9:00:00
Paul  | 1   | 60       | 12:00:00
Paul  | 2   | 30       | 12:00:00
Paul  | 3   | 30       | 12:00:00
Paul  | 4   | 60       | 12:00:00
What I am trying to figure out is a query to get the following:
Pilot | Leg | Duration | Takeoff  | LegStart
===========================================
John  | 1   | 60       | 9:00:00  | 9:00:00
John  | 2   | 60       | 9:00:00  | 10:00:00
John  | 3   | 30       | 9:00:00  | 10:30:00
Paul  | 1   | 60       | 12:00:00 | 12:00:00
Paul  | 2   | 30       | 12:00:00 | 13:00:00
Paul  | 3   | 30       | 12:00:00 | 13:30:00
Paul  | 4   | 60       | 12:00:00 | 14:00:00
So the 'LegStart' time is the 'TakeOff' time, plus the duration of prior legs for that pilot.
Now, to do this in SQL, I need to somehow add up the durations of the prior legs for the same pilot. But for the life of me, I cannot figure out how to do this, because the pilots can have a variable number of legs, so joining doesn't get you anywhere.

You can use a cumulative sum. The trick is including it in the datetime_add() call, minus the current leg's own duration:
select t.*,
sum(duration) over (partition by pilot order by leg) as running_duration,
datetime_add(takeoff,
interval (sum(duration) over (partition by pilot order by leg) - duration) minute
) as leg_start
from t;
Note: This assumes that takeoff is a datetime.
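To check the arithmetic outside the database, here is a minimal Python sketch of the same idea: a per-pilot running total of earlier durations added to the takeoff time. The function name `leg_starts` and the tuple layout are illustrative assumptions, not part of the original answer.

```python
from datetime import datetime, timedelta
from itertools import groupby

# Sample rows from the question: (pilot, leg, duration in minutes, takeoff).
rows = [
    ("John", 1, 60, "9:00:00"),
    ("John", 2, 60, "9:00:00"),
    ("John", 3, 30, "9:00:00"),
    ("Paul", 1, 60, "12:00:00"),
    ("Paul", 2, 30, "12:00:00"),
    ("Paul", 3, 30, "12:00:00"),
    ("Paul", 4, 60, "12:00:00"),
]


def leg_starts(rows):
    """Return (pilot, leg, leg_start) where leg_start is the takeoff time
    plus the summed durations of that pilot's earlier legs."""
    out = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))  # partition by pilot, order by leg
    for pilot, legs in groupby(ordered, key=lambda r: r[0]):
        elapsed = 0  # minutes spent in this pilot's earlier legs
        for _, leg, duration, takeoff in legs:
            start = datetime.strptime(takeoff, "%H:%M:%S") + timedelta(minutes=elapsed)
            out.append((pilot, leg, start.strftime("%H:%M:%S")))
            elapsed += duration  # the current leg only counts for later legs
    return out


for row in leg_starts(rows):
    print(row)
```

With the durations exactly as listed in the question, John's third leg comes out at 11:00:00 (9:00 plus 60 plus 60 minutes) rather than the 10:30:00 shown in the desired output; Paul's rows match the desired output.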

Try the analytic SUM, sum(duration) over (partition by pilot order by leg), restricted to the preceding rows:
with mytable as (
select 'John' as pilot, 1 as leg, 60 as duration, time '9:00:00' as takeoff union all
select 'John', 2, 60, '9:00:00' union all
select 'John', 3, 30, '9:00:00' union all
select 'Paul', 1, 60, '12:00:00' union all
select 'Paul', 2, 30, '12:00:00' union all
select 'Paul', 3, 30, '12:00:00' union all
select 'Paul', 4, 60, '12:00:00'
)
select
*,
time_add(takeoff, interval ifnull(sum(duration) over (partition by pilot order by leg ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) minute) as legstart
from mytable

Related

How to create a sql statement or anonymous plsql block to increase count when only having a start time and an end time

I have an ask for a count of number of guests in a venue broken down to the minute. The data set I have available to me is the venue, the date/time the guest entered the venue, and the date/time the guest exited the venue. The business is asking for a breakdown by minute of the count of guests in the venue.
For example, guest A enters the venue at 12:00 and exits at 13:00. Guest B enters the venue at 12:30 and exits at 13:30. The expected output would show a count of 1 from 12:00 to 12:29, a count of two from 12:30 to 13:00, and back to a count of one from 13:00 to 13:30.
I'm struggling with the ask due to restrictions placed upon me. I am not authorized to make any structural changes, so no DDL, which means I am restricted to SQL or anonymous PL/SQL blocks. In case it is relevant: the database version is 12.2c and it is running on AIX.
I do have a workaround where I extract the dataset as a csv and import it into a C# console application, which I wrote, but I would prefer if the ask can be conducted within the Oracle ecosystem.
I appreciate any help or insight you can share about my problem.
You can solve this problem with a combination of several tricks: connect by level <= 91 to create the 91 minutes for the time frame, a left join to include all minutes even if there isn't an event at that minute, a case and sum to count and sum arrivals and departures, and finally an analytic function to generate the running total of guests by adding arrivals and subtracting departures.
--The number of guests present per minute.
select
the_minute,
sum(arrive_counter + depart_counter) over (order by the_minute) guest_count
from
(
--Join time and visits and count arrivals and departures.
select
the_minute,
sum(case when the_minute = arrive_date then 1 else 0 end) arrive_counter,
sum(case when the_minute = depart_date then -1 else 0 end) depart_counter
from
(
--Every minute for a time period. (Change to 1441 for an entire day.)
select timestamp '2022-01-24 12:00:00' + (level - 1) * interval '1' minute the_minute
from dual
connect by level <= 91
) minutes
left join visit
on minutes.the_minute = arrive_date
or minutes.the_minute = depart_date
group by the_minute
order by the_minute
)
order by the_minute;
Results:
THE_MINUTE GUEST_COUNT
24-JAN-22 12.00.00.000000000 PM 1
24-JAN-22 12.01.00.000000000 PM 1
...
24-JAN-22 12.28.00.000000000 PM 1
24-JAN-22 12.29.00.000000000 PM 1
24-JAN-22 12.30.00.000000000 PM 2
24-JAN-22 12.31.00.000000000 PM 2
...
24-JAN-22 12.58.00.000000000 PM 2
24-JAN-22 12.59.00.000000000 PM 2
24-JAN-22 01.00.00.000000000 PM 1
24-JAN-22 01.01.00.000000000 PM 1
...
24-JAN-22 01.28.00.000000000 PM 1
24-JAN-22 01.29.00.000000000 PM 1
24-JAN-22 01.30.00.000000000 PM 0
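The running-total logic of that query can be sketched in Python as a sanity check. This is only an illustration under the same assumption the query makes, namely that arrivals and departures fall exactly on minute boundaries inside the window; `guests_per_minute` and the tuple layout are hypothetical names.

```python
from datetime import datetime, timedelta

# (arrive, depart) pairs for guests A and B from the question.
visits = [
    (datetime(2022, 1, 24, 12, 0), datetime(2022, 1, 24, 13, 0)),
    (datetime(2022, 1, 24, 12, 30), datetime(2022, 1, 24, 13, 30)),
]


def guests_per_minute(visits, start, minutes):
    """Walk a minute-by-minute grid and keep a running total:
    +1 for each arrival at that minute, -1 for each departure."""
    counts = []
    present = 0
    for i in range(minutes):
        the_minute = start + timedelta(minutes=i)
        present += sum(1 for arrive, _ in visits if arrive == the_minute)
        present -= sum(1 for _, depart in visits if depart == the_minute)
        counts.append((the_minute, present))
    return counts


per_minute = guests_per_minute(visits, datetime(2022, 1, 24, 12, 0), 91)
for the_minute, count in per_minute[28:32]:
    print(the_minute, count)
```

As in the SQL, a guest counts as gone in the minute they depart, so the count drops to 1 at 13:00 and to 0 at 13:30.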
You can use:
SELECT timestamp AS time_from,
LEAD(timestamp) OVER(ORDER BY timestamp) AS time_to,
SUM(SUM(change_in_guests)) OVER (ORDER BY timestamp) AS guests
FROM guests
UNPIVOT(
timestamp FOR change_in_guests IN (
entry AS +1,
exit AS -1
)
)
GROUP BY timestamp;
Which, for the sample data:
CREATE TABLE guests (id, entry, exit) AS
SELECT 'A', DATE '2022-01-25' + INTERVAL '12:00' HOUR TO MINUTE, DATE '2022-01-25' + INTERVAL '13:00' HOUR TO MINUTE FROM DUAL UNION ALL
SELECT 'B', DATE '2022-01-25' + INTERVAL '12:30' HOUR TO MINUTE, DATE '2022-01-25' + INTERVAL '13:30' HOUR TO MINUTE FROM DUAL;
Outputs:
TIME_FROM           | TIME_TO             | GUESTS
==================================================
2022-01-25 12:00:00 | 2022-01-25 12:30:00 | 1
2022-01-25 12:30:00 | 2022-01-25 13:00:00 | 2
2022-01-25 13:00:00 | 2022-01-25 13:30:00 | 1
2022-01-25 13:30:00 | null                | 0
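The idea behind the UNPIVOT query (turn every entry into a +1 event and every exit into a -1 event, then take a running sum over the event times) can be sketched in Python as follows. The function name `occupancy_intervals` is an assumption made for illustration.

```python
from collections import defaultdict
from datetime import datetime

# (entry, exit) pairs for guests A and B, matching the sample table.
guests = [
    (datetime(2022, 1, 25, 12, 0), datetime(2022, 1, 25, 13, 0)),
    (datetime(2022, 1, 25, 12, 30), datetime(2022, 1, 25, 13, 30)),
]


def occupancy_intervals(guests):
    """Return (time_from, time_to, guests) rows, one per distinct event time."""
    changes = defaultdict(int)
    for entry, exit_ in guests:
        changes[entry] += 1   # like "entry AS +1"
        changes[exit_] -= 1   # like "exit AS -1"
    times = sorted(changes)   # GROUP BY timestamp, ordered
    rows = []
    running = 0
    for i, t in enumerate(times):
        running += changes[t]                                 # running sum of net change
        time_to = times[i + 1] if i + 1 < len(times) else None  # like LEAD(timestamp)
        rows.append((t, time_to, running))
    return rows


for row in occupancy_intervals(guests):
    print(row)
```

This produces one row per change point rather than one row per minute, which keeps the result small when guests stay for long periods.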
If you want it minute-by-minute then:
WITH minutes (minute, time_to, guests) AS (
SELECT timestamp,
LEAD(timestamp) OVER(ORDER BY timestamp),
SUM(SUM(change_in_guests)) OVER (ORDER BY timestamp)
FROM guests
UNPIVOT(
timestamp FOR change_in_guests IN (
entry AS +1,
exit AS -1
)
)
GROUP BY timestamp
UNION ALL
SELECT minute + INTERVAL '1' MINUTE,
time_to,
guests
FROM minutes
WHERE minute + INTERVAL '1' MINUTE < time_to
)
SEARCH DEPTH FIRST BY minute SET order_rn
SELECT minute,
guests
FROM minutes;
Which outputs:
MINUTE              | GUESTS
============================
2022-01-25 12:00:00 | 1
2022-01-25 12:01:00 | 1
2022-01-25 12:02:00 | 1
...
2022-01-25 12:28:00 | 1
2022-01-25 12:29:00 | 1
2022-01-25 12:30:00 | 2
2022-01-25 12:31:00 | 2
...
2022-01-25 12:58:00 | 2
2022-01-25 12:59:00 | 2
2022-01-25 13:00:00 | 1
2022-01-25 13:01:00 | 1
...
2022-01-25 13:28:00 | 1
2022-01-25 13:29:00 | 1
2022-01-25 13:30:00 | 0

Merge several rows into one if they have a gap of less than 5 seconds

I'm trying to find a SQL query that lets me merge rows of a table into one when the gap between them is less than 5 seconds. For example, I have a table like the following:
Name | Time
==============================
John 2021-02-01 13:08:10
John 2021-02-01 13:08:12
John 2021-02-01 17:35:23
John 2021-02-07 11:16:31
Walt 2021-01-14 10:23:48
Joseph 2021-01-23 07:04:33
Walt 2021-01-14 10:23:51
Walt 2021-01-04 09:22:45
So, I want to have a result like this:
Name | Time
==============================
John 2021-02-01
John 2021-02-01
John 2021-02-07
Walt 2021-01-14
Walt 2021-01-04
Joseph 2021-01-23
For John there are two rows with a gap of less than 5 seconds, so they are merged into one row for the same day. The same happens with Walt.
Can I do this with a SQL query?
Thank you in advance.
You just need to check whether the next row for the same name falls within 5 seconds of the current row and, if so, drop the current row. This can be done with the LEAD analytic function.
with a as (
select 'John' as name, convert(datetime, '2021-02-01 13:08:10', 120) as dt union all
select 'John' as name, convert(datetime, '2021-02-01 13:08:12', 120) as dt union all
select 'John' as name, convert(datetime, '2021-02-01 13:08:15', 120) as dt union all
select 'John' as name, convert(datetime, '2021-02-01 17:35:23', 120) as dt union all
select 'John' as name, convert(datetime, '2021-02-07 11:16:31', 120) as dt union all
select 'Walt' as name, convert(datetime, '2021-01-14 10:23:48', 120) as dt union all
select 'Joseph' as name, convert(datetime, '2021-01-23 07:04:33', 120) as dt union all
select 'Walt' as name, convert(datetime, '2021-01-14 10:23:51', 120) as dt union all
select 'Walt' as name, convert(datetime, '2021-01-04 09:22:45', 120) as dt
)
, gap_size as (
select
name,
dt,
/*Check the difference between current row and the next row per name*/
datediff(s,
dt,
lead(dt) over(partition by name order by dt asc)
) as within_5_secs_with_next
from a
)
select
name,
cast(dt as date) as dt_date
from gap_size
where coalesce(within_5_secs_with_next, 10) >= 5
order by name, dt asc
GO
name | dt_date
:----- | :---------
John | 2021-02-01
John | 2021-02-01
John | 2021-02-07
Joseph | 2021-01-23
Walt | 2021-01-04
Walt | 2021-01-14
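For comparison, the same filter can be sketched in Python: sort per name, look at the next timestamp, and keep a row only when no row for that name follows within the threshold. The function name `drop_rows_followed_within` is hypothetical, and the data is the answer's sample (including the extra 13:08:15 row).

```python
from datetime import datetime, timedelta
from itertools import groupby

rows = [
    ("John", datetime(2021, 2, 1, 13, 8, 10)),
    ("John", datetime(2021, 2, 1, 13, 8, 12)),
    ("John", datetime(2021, 2, 1, 13, 8, 15)),
    ("John", datetime(2021, 2, 1, 17, 35, 23)),
    ("John", datetime(2021, 2, 7, 11, 16, 31)),
    ("Walt", datetime(2021, 1, 14, 10, 23, 48)),
    ("Joseph", datetime(2021, 1, 23, 7, 4, 33)),
    ("Walt", datetime(2021, 1, 14, 10, 23, 51)),
    ("Walt", datetime(2021, 1, 4, 9, 22, 45)),
]


def drop_rows_followed_within(rows, seconds=5):
    """Keep a row unless the next row for the same name follows
    less than `seconds` later; return (name, date) pairs."""
    kept = []
    ordered = sorted(rows)  # by name, then timestamp
    for name, group in groupby(ordered, key=lambda r: r[0]):
        times = [ts for _, ts in group]
        for i, ts in enumerate(times):
            nxt = times[i + 1] if i + 1 < len(times) else None  # like LEAD(dt)
            if nxt is None or nxt - ts >= timedelta(seconds=seconds):
                kept.append((name, ts.date()))
    return kept


merged = drop_rows_followed_within(rows)
for name, day in merged:
    print(name, day)
```

Because each row is compared only with its immediate successor, a long chain of rows spaced a few seconds apart collapses into the last row of the chain.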

BigQuery: Computing the timestamp diff in time ordered rows in a group

Given a table like this, I would like to compute the time duration of each state before changing to a different state:
id state timestamp
1 1 2018-08-17 10:40:00
1 2 2018-08-17 12:40:00
1 1 2018-08-17 14:40:00
2 1 2018-08-17 09:00:00
2 2 2018-08-17 12:00:00
The output I want is:
id state date duration
1 1 2018-08-17 2 hours
1 2 2018-08-17 2 hours
1 1 2018-08-17 9 hours 20 minutes (until the end of the day in this case)
2 1 2018-08-17 3 hours
2 2 2018-08-17 12 hours (until the end of the day in this case)
I am not so sure whether this is doable in SQL. I feel like I have to write a UDF against aggregated state and timestamp (grouped by id and ordered by ts) which outputs an array of struct (id, state, date, and duration). This array can be flattened.
Below is for BigQuery Standard SQL
#standardSQL
SELECT id, state,
IFNULL(
TIMESTAMP_DIFF(LEAD(ts) OVER(PARTITION BY id ORDER BY ts), ts, MINUTE),
24*60 - TIMESTAMP_DIFF(ts, TIMESTAMP_TRUNC(ts, DAY), MINUTE)
) AS duration_minutes
FROM `project.dataset.table`
You can test and play with the above using the dummy data from your question:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, 1 state, TIMESTAMP('2018-08-17 10:40:00') ts UNION ALL
SELECT 1, 2, '2018-08-17 12:40:00' UNION ALL
SELECT 1, 1, '2018-08-17 14:40:00' UNION ALL
SELECT 2, 1, '2018-08-17 09:00:00' UNION ALL
SELECT 2, 2, '2018-08-17 12:00:00'
)
SELECT id, state,
IFNULL(
TIMESTAMP_DIFF(LEAD(ts) OVER(PARTITION BY id ORDER BY ts), ts, MINUTE),
24*60 - TIMESTAMP_DIFF(ts, TIMESTAMP_TRUNC(ts, DAY), MINUTE)
) AS duration_minutes
FROM `project.dataset.table`
-- ORDER BY id, ts
with the result below:
Row id state duration_minutes
1 1 1 120
2 1 2 120
3 1 1 560
4 2 1 180
5 2 2 720
If you need your output formatted exactly the way you showed in the question, use the query below:
#standardSQL
SELECT id, state, ts, duration_minutes,
FORMAT('%i hours %i minutes', DIV(duration_minutes, 60), MOD(duration_minutes, 60)) duration
FROM (
SELECT id, state, ts,
IFNULL(
TIMESTAMP_DIFF(LEAD(ts) OVER(PARTITION BY id ORDER BY ts), ts, MINUTE),
24*60 - TIMESTAMP_DIFF(ts, TIMESTAMP_TRUNC(ts, DAY), MINUTE)
) AS duration_minutes
FROM `project.dataset.table`
)
In this case your output will look like this:
Row id state ts duration_minutes duration
1 1 1 2018-08-17 10:40:00 UTC 120 2 hours 0 minutes
2 1 2 2018-08-17 12:40:00 UTC 120 2 hours 0 minutes
3 1 1 2018-08-17 14:40:00 UTC 560 9 hours 20 minutes
4 2 1 2018-08-17 09:00:00 UTC 180 3 hours 0 minutes
5 2 2 2018-08-17 12:00:00 UTC 720 12 hours 0 minutes
Of course, you will most likely still need to adjust the above to your particular case, but I think it gives you a good start.
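The fallback rule used in these queries (time until the next row for the same id, or until midnight when there is no next row) can be sketched in Python to verify the numbers. The function name `state_durations` is an illustrative assumption.

```python
from datetime import datetime, timedelta
from itertools import groupby

# (id, state, ts) rows from the question.
rows = [
    (1, 1, datetime(2018, 8, 17, 10, 40)),
    (1, 2, datetime(2018, 8, 17, 12, 40)),
    (1, 1, datetime(2018, 8, 17, 14, 40)),
    (2, 1, datetime(2018, 8, 17, 9, 0)),
    (2, 2, datetime(2018, 8, 17, 12, 0)),
]


def state_durations(rows):
    """Return (id, state, minutes, formatted) for each row."""
    out = []
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))  # partition by id, order by ts
    for id_, group in groupby(ordered, key=lambda r: r[0]):
        group = list(group)
        for i, (_, state, ts) in enumerate(group):
            if i + 1 < len(group):
                end = group[i + 1][2]  # like LEAD(ts)
            else:
                # No next row: count until the end of the same day.
                end = datetime(ts.year, ts.month, ts.day) + timedelta(days=1)
            minutes = int((end - ts).total_seconds() // 60)
            label = f"{minutes // 60} hours {minutes % 60} minutes"
            out.append((id_, state, minutes, label))
    return out


durations = state_durations(rows)
for row in durations:
    print(row)
```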

Dense_rank and sum

I have this common table expression
WITH total_hour
AS (
SELECT
employee_id,
SUM(ROUND(CAST(DATEDIFF(MINUTE, start_time, finish_time) AS NUMERIC(18, 0)) / 60, 2)) AS total_h
FROM Timesheet t
WHERE t.employee_id = @employee_id
AND DENSE_RANK() OVER (
ORDER BY DATEDIFF(DAY, '20130925', date_worked) / 7 DESC ) = @rank
GROUP BY t.personnel_id
)
This is the sample data:
ID employee_id worked_date start_time finish_time
1 1 2013-09-25 09:00:00 17:30:00
2 1 2013-09-26 07:00:00 17:00:00
8 1 2013-10-01 09:00:00 17:00:00
9 1 2013-10-04 09:00:00 17:00:00
12 1 2013-10-07 09:00:00 17:00:00
13 1 2013-10-30 09:00:00 17:00:00
14 1 2013-10-28 09:00:00 17:00:00
15 1 2013-11-01 09:00:00 17:00:00
Suppose Wednesday is the first day of the week and my base date is 2013-09-25. I want to get the total number of hours worked from 09-25 to 10-01 when @rank is 1, the total hours from 10-02 to 10-08 when @rank is 2, and so on.
Thanks
To get the number of hours worked by an employee within a particular week, just use suitable WHERE criteria. There is no need for DENSE_RANK or similar window functions here.
Assuming you have a @Week parameter that contains an integer (0 for the current week, 1 for last week, 2 for the week before that, etc.):
SELECT
employee_id,
SUM(ROUND(CAST(DATEDIFF(MINUTE, start_time, finish_time) AS NUMERIC(18, 0)) / 60, 2)) AS total_h
FROM
Timesheet t
WHERE
t.employee_id = @employee_id AND
date_worked BETWEEN DATEADD(ww, DATEDIFF(ww,0,GETDATE()) - @Week, 0)
AND DATEADD(ww, DATEDIFF(ww,0,GETDATE()) - @Week, 0) + 7
GROUP BY
employee_id
Here, I've used the current date (GETDATE()) as the base date, but you could just replace it with '20130925', if that's what you need.
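For the fixed base date in the question, the week boundaries can also be computed directly: week N starts 7 * (N - 1) days after the base date. The Python sketch below illustrates that range filter on the sample data; `total_hours` and `minutes_of` are hypothetical helpers, and the week number here is a plain count from the base date, not a DENSE_RANK.

```python
from datetime import date, timedelta

BASE = date(2013, 9, 25)  # a Wednesday, the base date from the question

# (worked_date, start_time, finish_time) for employee 1.
timesheet = [
    (date(2013, 9, 25), "09:00", "17:30"),
    (date(2013, 9, 26), "07:00", "17:00"),
    (date(2013, 10, 1), "09:00", "17:00"),
    (date(2013, 10, 4), "09:00", "17:00"),
    (date(2013, 10, 7), "09:00", "17:00"),
    (date(2013, 10, 30), "09:00", "17:00"),
    (date(2013, 10, 28), "09:00", "17:00"),
    (date(2013, 11, 1), "09:00", "17:00"),
]


def minutes_of(hhmm):
    """Convert 'HH:MM' to minutes since midnight."""
    hours, minutes = map(int, hhmm.split(":"))
    return hours * 60 + minutes


def total_hours(rows, week, base=BASE):
    """Sum hours for rows whose date falls in the given 1-based week."""
    start = base + timedelta(days=7 * (week - 1))
    end = start + timedelta(days=7)  # exclusive upper bound
    total_minutes = sum(
        minutes_of(finish) - minutes_of(begin)
        for worked, begin, finish in rows
        if start <= worked < end
    )
    return round(total_minutes / 60, 2)


print(total_hours(timesheet, 1))  # 09-25 to 10-01
print(total_hours(timesheet, 2))  # 10-02 to 10-08
```

The sketch uses a half-open range (start inclusive, end exclusive) so that a date on a week boundary is counted in exactly one week.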

how to query the count of records from first day to last day of the month

I would like to get the number of records for each day from my table.
For example I have a table “Employee” with the following fields ID, EmpNo, DateHired.
And I have the following records
ID EmpNo DateHired
1 000001 3/2/2013 12:00:00 AM
2 000002 3/14/2013 12:00:00 AM
3 000003 3/14/2013 12:00:00 AM
4 000004 3/21/2013 12:00:00 AM
5 000005 4/2/2013 12:00:00 AM
6 000006 4/3/2013 12:00:00 AM
7 000007 4/3/2013 12:00:00 AM
8 000008 4/3/2013 12:00:00 AM
9 000009 4/3/2013 12:00:00 AM
10 000010 4/4/2013 12:00:00 AM
11 000011 4/5/2013 12:00:00 AM
12 000012 5/1/2013 12:00:00 AM
If the current month is April, how can I get this result:
Count Day
0 4/1/2013 12:00:00 AM
1 4/2/2013 12:00:00 AM
4 4/3/2013 12:00:00 AM
1 4/4/2013 12:00:00 AM
1 4/5/2013 12:00:00 AM
0 4/6/2013 12:00:00 AM
0 4/7/2013 12:00:00 AM
0 4/8/2013 12:00:00 AM
Up to
0 4/30/2013 12:00:00 AM
You need to create a calendar for the whole month of April in order to get every date of the month. You can build it with a recursive Common Table Expression.
After creating the calendar, join it with table Employee using LEFT JOIN so that dates with no matches in table Employee are still included in the result.
WITH April_Calendar
AS
(
SELECT CAST('20130401' as datetime) AS [date]
UNION ALL
SELECT DATEADD(dd, 1, [date])
FROM April_Calendar
WHERE DATEADD(dd, 1, [date]) <= '20130430'
)
SELECT a.date, COUNT(b.DateHired) totalCount
FROM April_Calendar a
LEFT JOIN Employee b
ON a.date = b.DateHired
GROUP BY a.date
ORDER BY a.date
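The calendar-plus-left-join idea can be sketched in Python as well: generate every day of the month, then look up the count for each day and fall back to zero when nothing matches. The function name `daily_counts` is an assumption for illustration.

```python
from collections import Counter
from datetime import date, timedelta

# DateHired values from the question's sample data.
hired = [
    date(2013, 3, 2), date(2013, 3, 14), date(2013, 3, 14), date(2013, 3, 21),
    date(2013, 4, 2), date(2013, 4, 3), date(2013, 4, 3), date(2013, 4, 3),
    date(2013, 4, 3), date(2013, 4, 4), date(2013, 4, 5), date(2013, 5, 1),
]


def daily_counts(hired, year, month):
    """Return (day, count) for every day of the month, including days with no hires."""
    counts = Counter(hired)          # plays the role of GROUP BY DateHired
    day = date(year, month, 1)
    result = []
    while day.month == month:        # the "calendar": walk day by day
        result.append((day, counts[day]))  # Counter gives 0 for missing days
        day += timedelta(days=1)
    return result


april = daily_counts(hired, 2013, 4)
for day, count in april[:8]:
    print(count, day)
```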
Try
SELECT COUNT(*) as 'Count', DateHired as 'Day'
FROM Employee
WHERE DateHired BETWEEN %date1 AND %date2
GROUP BY DateHired
Untested, should work though.
This could be the query...
SELECT COUNT(ID) as Count, DateHired FROM Employee GROUP BY DateHired
The following may be helpful in this case...
http://www.sqlite.org/lang_datefunc.html
Hope it helps..