Given start time, end time & session length (secs), obtain the seconds spent in a specific hour - SQL

I'm using the Vertica database. I am trying to get the total seconds spent in each particular hour from the following example session data. Any sample SQL code would be very helpful - thanks.
start time             end time               session length (secs)
2010-02-21 20:30:00    2010-02-21 23:30:00    10800
2010-02-21 21:30:00    2010-02-21 22:30:00     3600
2010-02-21 21:45:00    2010-02-21 21:59:00      840
2010-02-21 22:00:00    2010-02-21 22:20:00     1200
2010-02-21 22:30:00    2010-02-21 23:30:00     3600
Desired Output
hour secs_in_that_hour
20 1800
21 6240
22 8400
23 3600

You would need a table containing every hour, so that you could join against it. The join condition would be that the hour falls within the start and end times; you can then compute the overlap as least(hour_end, end_time) - greatest(hour_start, start_time), group by the hour, and sum.
Since I don't know Vertica, I can't give a complete answer.

Vertica is based on PostgreSQL, especially language-wise. The best thing you could do is look up Postgres's date/time functions and related tutorials. I haven't found an instance where a Postgres time function does not work in Vertica.
http://www.postgresql.org/docs/8.0/interactive/functions-datetime.html
There is probably a datediff-type function you can use. (Sorry, I don't have time to look it up.)

See the Vertica TIMESERIES clause:
Provides gap-filling and interpolation (GFI) computation, an important component of time series analytics computation. See Using Time Series Analytics in the Programmer's Guide for details and examples.
Syntax
TIMESERIES slice_time AS 'length_and_time_unit_expression' OVER (
... [ window_partition_clause [ , ... ] ]
... ORDER BY time_expression )

The simplest way is to just extract the epoch (number of seconds) from the interval (the difference between the timestamps).
As for the overlapping sums, you'll need to first break them out by hour. Some of these hours don't exist in the data, so you'll need to generate them using a TIMESERIES clause.
The idea is to first create your hourly time slices, then theta join to find (and fan out) all possible matches between sessions and slices. This is just looking for any overlap of the time ranges, which luckily is simple: anywhere the start time is before the end of the slice and the end time is after the start of the slice.
Then use greatest and least to find the actual start and stop within each slice, subtract them, convert the interval to seconds, and you're done.
See below for the example.
with slices as (
  select slice_time as slice_time_start,
         slice_time + interval '1 hour' as slice_time_end
  from (
    select min(start_time) as time_range from mytest
    union all
    select max(end_time) from mytest
  ) range
  timeseries slice_time as '1 HOUR' over (order by range.time_range)
)
select slice_time_start as "hour",
       extract(epoch from sum(least(end_time, slice_time_end) - greatest(slice_time_start, start_time))) as secs_in_that_hour
from slices
join mytest on (start_time < slice_time_end and end_time > slice_time_start)
group by 1
order by 1
There may be some edge cases or some additional filtering needed if your data isn't so clean.
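The least/greatest clipping logic is easy to sanity-check outside the database. Below is a minimal Python sketch (not Vertica-specific; the session list is taken from the sample data above) that walks each session's hourly slices and sums the clipped overlaps, reproducing the desired output:

```python
from datetime import datetime, timedelta
from collections import defaultdict

sessions = [
    ("2010-02-21 20:30:00", "2010-02-21 23:30:00"),
    ("2010-02-21 21:30:00", "2010-02-21 22:30:00"),
    ("2010-02-21 21:45:00", "2010-02-21 21:59:00"),
    ("2010-02-21 22:00:00", "2010-02-21 22:20:00"),
    ("2010-02-21 22:30:00", "2010-02-21 23:30:00"),
]

def secs_per_hour(sessions):
    totals = defaultdict(int)
    for start_s, end_s in sessions:
        start = datetime.strptime(start_s, "%Y-%m-%d %H:%M:%S")
        end = datetime.strptime(end_s, "%Y-%m-%d %H:%M:%S")
        # walk the hourly slices that the session touches
        slice_start = start.replace(minute=0, second=0)
        while slice_start < end:
            slice_end = slice_start + timedelta(hours=1)
            # overlap = least(end, slice_end) - greatest(start, slice_start)
            overlap = min(end, slice_end) - max(start, slice_start)
            totals[slice_start.hour] += int(overlap.total_seconds())
            slice_start = slice_end
    return dict(totals)

print(secs_per_hour(sessions))  # {20: 1800, 21: 6240, 22: 8400, 23: 3600}
```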

Related

Count overnight hours as one day

I have a dataset where certain operations occur during the overnight hours which I'd like to attribute to the day before.
For example, anything happening between 2/23 8pm and 2/24 6am should be included in 2/23's metrics rather than 2/24. Anything from 6:01 am to 7:59pm should be counted in 2/24's metrics.
I've seen a few posts about decrementing time by 6 hours but that doesn't work in this case.
Is there a way to use an If function to specify that midnight-6am should be counted as date-1 rather than date without affecting the metrics for the 6am - 7:59pm hours?
Thanks in advance! Also, a SQL newbie here so apologies if I have lots of followup questions.
You can use date_add with -6 hours and then optionally cast the timestamp as a date.
create table t (dcol datetime);
insert into t values
('2022-02-25 06:01:00'),
('2022-02-25 06:00:00'),
('2022-02-25 05:59:00');
SELECT CAST(DATE_ADD(dcol, INTERVAL -6 HOUR)AS DATE) FROM t;
| CAST(DATE_ADD(dcol, INTERVAL -6 HOUR)AS DATE) |
| :-------------------------------------------- |
| 2022-02-25 |
| 2022-02-25 |
| 2022-02-24 |
db<>fiddle here
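The same shift-and-cast idea can be checked in plain Python; this is a small sketch using the sample timestamps from the fiddle above:

```python
from datetime import datetime, timedelta

def business_date(ts, cutoff_hours=6):
    """Attribute a timestamp to the previous calendar day when it
    falls before the cutoff (here 6 AM): shift back, then take the date."""
    return (ts - timedelta(hours=cutoff_hours)).date()

for ts in [datetime(2022, 2, 25, 6, 1),
           datetime(2022, 2, 25, 6, 0),
           datetime(2022, 2, 25, 5, 59)]:
    print(ts, "->", business_date(ts))
```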
As said in the comments, your requirement is to count occurrences in a 6 AM to 6 AM day instead of a midnight-to-midnight day. You can achieve this by decrementing the time by 6 hours, as shown in @Kendle's answer. Another way is to use an IF condition, as shown below. Here, the date is decremented if the time is at or before 6 AM on each day, and the new date is put in a new column.
Query:
SELECT
IF
(TIME(eventTime) <= "06:00:00",
DATE_ADD(DATE(eventTime), INTERVAL -1 DAY),
DATE(eventTime)) AS newEventTime
FROM
`project.dataset.table`
ORDER BY
eventTime;
Output from sample data:
As seen in the output, timestamps before 6 AM are considered for the previous day while the ones after are considered in the current day.

Get count of matching time ranges for every minute of the day in Postgres

Problem
I have a table of records each containing id, in_datetime, and out_datetime. A record is considered "open" during the time between the in_datetime and out_datetime. I want to know how many time records were "open" for each minute of the day (regardless of date). For example, for the last 90 days I want to know how many records were "open" at 3:14 am, then 3:15 am, then 3:16 am, then... If no records were "open" at 2:00 am the query should return 0 or null instead of excluding the row, thus 1440 rows should always be returned (the number of minutes in a day). Datetimes are stored in UTC and need to be cast to a time zone.
Simplified example graphic
record_id | time_range
| 0123456789 (these are minutes past midnight)
1 | =========
2 | ===
3 | =======
4 | ===
5 | ==
______________________
result 3323343210
Desired output
time | count of open records at this time
00:00 120
00:01 135
00:02 132
...
23:57 57
23:58 62
23:59 60
No more than 1440 records would ever be returned as there are only 1440 minutes in the day.
What I've tried
1.) In a subquery, I currently generate a minutely series of times for the entire range of each time record. I then group those by time and get a count of the records per minute.
Here is a db-fiddle using my current query:
select
trs.minutes,
count(trs.minutes)
from (
select
generate_series(
DATE_TRUNC('minute', (time_records.in_datetime::timestamptz AT TIME ZONE 'America/Denver')),
DATE_TRUNC('minute', (time_records.out_datetime::timestamptz AT TIME ZONE 'America/Denver')),
interval '1 min'
)::time as minutes
from
time_records
) trs
group by
trs.minutes
This works but is quite inefficient and takes several seconds to run due to the size of my table. Additionally, it excludes times when no records were open. I think somehow I could use window functions to count the number of overlapping time records for each minute of the day, but I don't quite understand how to do that.
2.) Modifying Gordon Linoff's query in his answer below, I came to this (db-fiddle link):
with tr as (
select
date_trunc('minute', (tr.in_datetime::timestamptz AT TIME ZONE 'America/Denver'))::time as m,
1 as inc
from
time_records tr
union all
select
(date_trunc('minute', (tr.out_datetime::timestamptz AT TIME ZONE 'America/Denver')) + interval '1 minute')::time as m,
-1 as inc
from
time_records tr
union all
select
minutes::time,
0
from
generate_series(timestamp '2000-01-01 00:00', timestamp '2000-01-01 23:59', interval '1 min') as minutes
)
select
m,
sum(inc) as changes_at_inc,
sum(sum(inc)) over (order by m) as running_count
from
tr
where
m is not null
group by
m
order by
m;
This runs reasonably quickly, but towards the end of the day (about 22:00 onwards in the linked example) the values turn negative for some reason. Additionally, this query doesn't seem to work correctly with records with time ranges that cross over midnight. It's a step in the right direction, but I unfortunately don't understand it enough to improve on it further.
Here is a faster method. Generate "in" and "out" records for when something gets counted. Then aggregate and use a running sum.
To get all minutes, throw in a generate_series() for the time period in question:
with tr as (
select date_trunc('minute', (tr.in_datetime::timestamptz AT TIME ZONE 'America/Denver')) as m,
1 as inc
from time_records tr
union all
select date_trunc('minute', (tr.out_datetime::timestamptz AT TIME ZONE 'America/Denver')) + interval '1 minute' as m,
-1 as inc
from time_records tr
union all
select generate_series(date_trunc('minute',
min(tr.in_datetime::timestamptz AT TIME ZONE 'America/Denver')),
date_trunc('minute',
max(tr.out_datetime::timestamptz AT TIME ZONE 'America/Denver')),
interval '1 minute'
), 0
from time_records tr
)
select m,
sum(inc) as changes_at_inc,
sum(sum(inc)) over (order by m) as running_count
from tr
group by m
order by m;
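The +1/−1 event trick generalizes beyond SQL. Here is a small Python sketch of the same sweep over hypothetical minute-granularity intervals: emit an "in" event at the first open minute, an "out" event one past the last open minute, then take a running sum across every minute:

```python
from collections import Counter
from itertools import accumulate

def open_counts(intervals, n_minutes):
    """intervals: (first_minute, last_minute) pairs, inclusive on both ends.
    Returns the count of open records for each of the n_minutes minutes."""
    inc = Counter()
    for first, last in intervals:
        inc[first] += 1       # record opens
        inc[last + 1] -= 1    # record closes one minute later
    # running sum over every minute, including minutes with no events
    return list(accumulate(inc[m] for m in range(n_minutes)))

print(open_counts([(0, 3), (2, 5)], 8))  # [1, 1, 2, 2, 1, 1, 0, 0]
```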

Subtracting timestamps in sql

I have the following two cases
Case1
Table1
Name  Start_Sub1           End_Sub1             Start_Sub2           End_Sub2
A     2018-09-19 07:42:00  2018-09-19 09:12:00  2018-09-23 04:02:00  2018-09-23 05:09:00
I want to find the total time the student has spent in the exam, i.e. in both subjects. Which function should I use to get this?
Case 2:
Due to human error, the data has been documented like this:
Name  Start_Sub1           End_Sub1             Start_Sub2           End_Sub2
A     2018-09-19 07:42:00  2018-09-19 09:12:00  2018-09-19 08:02:00  2018-09-19 02:09:00
In this case, the time is overlapping in both the timestamps. Can the total time spent in the exam be calculated in such a scenario?
You can convert the timestamps to seconds using the EXTRACT() function and determine whether the segments overlap. The query would look like:
select
*,
case
when extract(epoch from End_Sub1) < extract(epoch from Start_Sub2)
then extract(epoch from End_Sub1) - extract(epoch from Start_Sub1) +
extract(epoch from End_Sub2) - extract(epoch from Start_Sub2)
else extract(epoch from End_Sub2) - extract(epoch from Start_Sub1)
end as diff
from table1
For details on how to get the time difference see Time difference in seconds using Netezza.
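When segments may overlap (Case 2), a robust way to total the time is to merge overlapping intervals first and then sum their lengths. Here is a Python sketch of that approach, under the assumption that each range has a valid start and end (the sample Case 2 row itself has an end before its start, which no formula can repair):

```python
from datetime import datetime

def total_exam_seconds(intervals):
    """Sum the lengths of intervals after merging any overlaps."""
    intervals = sorted(intervals)
    total = 0
    cur_start, cur_end = intervals[0]
    for start, end in intervals[1:]:
        if start <= cur_end:            # overlaps: extend the current block
            cur_end = max(cur_end, end)
        else:                           # disjoint: close the current block
            total += (cur_end - cur_start).total_seconds()
            cur_start, cur_end = start, end
    total += (cur_end - cur_start).total_seconds()
    return int(total)

# Case 1: disjoint subjects -> 90 min + 67 min = 157 min
case1 = [
    (datetime(2018, 9, 19, 7, 42), datetime(2018, 9, 19, 9, 12)),
    (datetime(2018, 9, 23, 4, 2), datetime(2018, 9, 23, 5, 9)),
]
print(total_exam_seconds(case1) / 60)  # 157.0
```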

how to write a sql to calculate working hours minus rest time

I have a table of rest time in work shift
Begin end
12:00 12:30
17:30 18:30
Now I want to write SQL to calculate the actual working hours given a start and end time. For example, if the shift starts at 9:00 and ends at 15:00, the actual hours are 6 - rest time = 5.5 hours; and if it starts at 9:00 and ends at 20:00, the actual hours are 10. How do I write a procedure to check this in SQL Server? Thanks.
There are no schema details to work with here, which means the following SQL is generic and will have to be altered to fit your db.
SELECT
(datediff(minute, shiftStartTime, shiftEndTime)
- datediff(minute,breakStartTime,breakEndTime)) / 60.0
FROM yourTable
Notes:
If they can have multiple breaks, you need to sum up all the break times in minutes before deducting them from the shift period.
The calculation is deliberately in minutes because datediff counts the number of boundaries crossed: the datediff in hours between 11:59 and 12:01 is 1 even though the gap is only 2 minutes, so counting hours directly would treat that 2-minute break as a full hour.
If you can provide more schema details, we would be able to craft a more complete statement.
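The sum-up-the-breaks note can be sketched outside SQL. Here is a Python version in which each break is clipped to the shift window before being deducted (an assumption on my part: a break falling outside the shift should not reduce the worked time):

```python
from datetime import datetime, timedelta

BREAKS = [("12:00", "12:30"), ("17:30", "18:30")]

def t(s):
    return datetime.strptime(s, "%H:%M")

def working_hours(shift_start, shift_end, breaks=BREAKS):
    start, end = t(shift_start), t(shift_end)
    worked = (end - start).total_seconds() / 60  # shift length in minutes
    for b_start, b_end in breaks:
        bs, be = t(b_start), t(b_end)
        # deduct only the part of the break that falls inside the shift
        overlap = min(end, be) - max(start, bs)
        if overlap > timedelta(0):
            worked -= overlap.total_seconds() / 60
    return worked / 60.0

print(working_hours("9:00", "15:00"))  # 5.5
```

Note that for a 9:00 to 20:00 shift this clipping approach deducts both breaks in full, which may or may not match your business rule.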
You can try the approach below using DATEDIFF:
select *, CONVERT(time(7),DATEADD(s, DATEDIFF(s,S,E),'00:00:00')) from QQ
http://sqlfiddle.com/#!18/01213d/1
For your case the column names would be as follows (note that Begin and End are reserved words in SQL Server, so they need to be bracketed):
select *, CONVERT(time(7),DATEADD(s, DATEDIFF(s,[Begin],[End]),'00:00:00')) from yourtable

Vertica date series is starting one month before specified date

I work with a Vertica database and I needed to make a query that, given two dates, would give me a list of all months between said dates. For example, if I were to give the query 2015-01-01 and 2015-12-31, it would output me the following list:
2015-01-01
2015-02-01
2015-03-01
2015-04-01
2015-05-01
2015-06-01
2015-07-01
2015-08-01
2015-09-01
2015-10-01
2015-11-01
2015-12-01
After a bit of digging, I was able to discover the following query:
SELECT date_trunc('MONTH', ts)::date as Mois
FROM
(
SELECT '2015-01-01'::TIMESTAMP as tm
UNION
SELECT '2015-12-31'::TIMESTAMP as tm
) as t
TIMESERIES ts as '1 month' OVER (ORDER BY tm)
This query works and gives me the following output:
2014-12-01
2015-01-01
2015-02-01
2015-03-01
2015-04-01
2015-05-01
2015-06-01
2015-07-01
2015-08-01
2015-09-01
2015-10-01
2015-11-01
2015-12-01
As you can see, by giving the query a starting date of '2015-01-01' (or anywhere in January, for that matter), I end up with an extra entry, namely 2014-12-01. In itself, the bug (or whatever you want to call this unexpected behavior) is easy to circumvent (just start in February), but I have to admit my curiosity is piqued. Why exactly is the series starting one month BEFORE the date I specified?
EDIT: Alright, after reading Kimbo's warning and confirming that indeed, long periods will eventually cause problems, I was able to come up with the following query that readjusts the dates correctly.
SELECT ts as originalMonth,
ts +
(
mod
(
day(first_value(ts) over (order by ts)) - day(ts) + day(last_day(ts)),
day(last_day(ts))
)
) as adjustedMonth
FROM
(
SELECT ts
FROM
(
SELECT '2015-01-01'::TIMESTAMP as tm
UNION
SELECT '2018-12-31'::TIMESTAMP as tm
) as t
TIMESERIES ts as '1 month' OVER (ORDER BY tm)
) as temp
The only problem I have is that I have no control over the day of the first record of the series. It's set automatically by Vertica to the current day. So if I run this query on the 31st of the month, I wonder how it'll behave. I guess I'll just have to wait for December to see, unless someone knows how to get TIMESERIES to behave in a way that would allow me to test it.
EDIT: Okay, so after trying out many different date combinations, I was able to determine that the day which the series starts changes depending on the date you specify. This caused a whole lot of problems... until we decided to go the simple way. Instead of using a month interval, we used a day interval and only selected one specific day per month. WAY simpler and it works all the time. Here's the final query:
SELECT ts as originalMonth
FROM
(
SELECT ts
FROM
(
SELECT '2000-02-01'::TIMESTAMP as tm
UNION
SELECT '2018-12-31'::TIMESTAMP as tm
) as t
TIMESERIES ts as '1 day' OVER (ORDER BY tm)
) as temp
where day(ts) = 1
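The day-interval-then-filter workaround is easy to mirror in Python over a hypothetical date range, to confirm it always yields exactly the first of each month:

```python
from datetime import date, timedelta

def month_starts(start, end):
    """Generate a daily series and keep only the first of each month,
    mirroring the TIMESERIES '1 day' + day(ts) = 1 trick."""
    out = []
    d = start
    while d <= end:
        if d.day == 1:
            out.append(d)
        d += timedelta(days=1)
    return out

months = month_starts(date(2015, 1, 1), date(2015, 12, 31))
print(months[0], months[-1], len(months))  # 2015-01-01 2015-12-01 12
```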
I think it boils down to this statement from the doc: http://my.vertica.com/docs/7.1.x/HTML/index.htm#Authoring/SQLReferenceManual/Statements/SELECT/TIMESERIESClause.htm
TIME_SLICE can return the start or end time of a time slice, depending
on the value of its fourth input parameter (start_or_end). TIMESERIES,
on the other hand, always returns the start time of each time slice.
When you define a time interval with some start date (2015-01-01, for example), then TIMESERIES ts AS '1 month' will create as its first time slice a slice that starts 1 month before that first data point, so 2014-12-01. When you do DATE_TRUNC('MON', ts), that of course sets the first date value to 2014-12-01 even if your start date is 2015-01-03, or whatever.
Edit: I want to throw out one more warning. Your use of DATE_TRUNC achieves what you need, I think. But, from the doc: Unlike TIME_SLICE, the time slice length and time unit expressed in [TIMESERIES] length_and_time_unit_expr must be constants so gaps in the time slices are well-defined. This means that '1 month' is actually 30 days exactly, which obviously causes problems if you're spanning more than a couple of years.