Calculate total time without vacations in postgres - sql

I have a database table that represents activities and for each activity, how long it took.
It looks something like this :
activity_id | name | status | start_date | end_date
=================================================================
1 | name1 | WIP | 2019-07-24 ... | 2019-07-24 ...
start_date and end_date are timestamps. I use a view with a column total_time that is defined like this:
date_part('day'::text,
COALESCE(sprint_activity.end_date::timestamp with time zone, CURRENT_TIMESTAMP)
- sprint_activity.start_date::timestamp with time zone
) + date_part('hour'::text,
COALESCE(sprint_activity.end_date::timestamp with time zone, CURRENT_TIMESTAMP)
- sprint_activity.start_date::timestamp with time zone
) / 24::double precision AS total_time
I would like to create a table for vacation or half day vacations that looks like:
date | work_percentage
=================================================
2019-07-24 | 0.4
2019-07-23 | 0.7
And then, I would like to calculate total_time in a way that uses this vacations table such that:
If a date is not in the table, it is treated as having work_percentage == 1.
For every date that is in the table, only that fraction of the day counts toward total_time.
So let's take an example:
Activity - "Write report" started at 11-July-2019 14:00 and ended at 15-July-2019 19:00 - so the time diff is 4 days and 5 hours.
The 13th and 14th were a weekend, so I'd like the vacations table to hold a row for 2019-07-13 with work_percentage == 0, and the same for the 14th.
Deducting those vacations, the time diff would be 2 days and 5 hours as the 13th and 14th are not workdays.
Hope this example explains it better.

You can take this example and adapt it to your database. Note that it assumes a user_id column in both tables, so that vacations apply per user.
DDL statements to test the script:
create table activities (
user_id int,
activity_id int,
name text,
status text,
start_date timestamp,
end_date timestamp
);
create table vacations (
user_id int,
date date,
work_percentage numeric
);
insert into activities
values
(1, 1, 'name1', 'WIP', timestamp'2019-07-20 10:00:00', timestamp'2019-07-25 8:00:00'),
(2, 2, 'name2', 'DONE', timestamp'2019-07-28 19:00:00', timestamp'2019-08-01 7:00:00'),
(1, 3, 'name3', 'DONE', timestamp'2019-07-21 12:00:00', timestamp'2019-07-21 15:00:00'),
(-1, 4, 'Write report', 'DONE', timestamp'2019-07-11 14:00:00', timestamp'2019-07-15 19:00:00');
insert into vacations
values
(1, date'2019-07-21', 0.5),
(1, date'2019-07-22', 0),
(1, date'2019-07-23', 0.25),
(2, date'2019-07-29', 0),
(2, date'2019-07-30', 0),
(-1, date'2019-07-13', 0),
(-1, date'2019-07-14', 0);
SQL script:
with
daily_activity as (
select
*,
date(
generate_series(
date(start_date),
date(end_date),
interval'1 day')
) as date_key
from
activities
),
raw_data as (
select
da.*,
v.work_percentage,
case
when date(start_date) = date(end_date)
then (end_date - start_date) * coalesce(work_percentage, 1)
when date(start_date) = date_key
then (date(start_date) + 1 - start_date) * coalesce(work_percentage, 1)
when date(end_date) = date_key
then (end_date - date(end_date)) * coalesce(work_percentage, 1)
else interval'24 hours' * coalesce(work_percentage, 1)
end as activity_coverage
from
daily_activity as da
left join vacations as v on da.user_id = v.user_id
and da.date_key = v.date
)
select
user_id,
activity_id,
name,
status,
start_date,
end_date,
justify_interval(sum(activity_coverage)) as total_activity_time
from
raw_data
group by
1, 2, 3, 4, 5, 6
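The per-day weighting that the query performs can also be checked outside the database. The following is a minimal Python sketch of the same idea, not a replacement for the SQL: it walks the calendar days an activity touches, takes the overlap with each day, and scales it by that day's work percentage. The function name worked_time and the dictionary of vacations are assumptions made for illustration.

```python
from datetime import date, datetime, time, timedelta


def worked_time(start, end, work_percentage):
    """Sum the part of [start, end) that falls on each calendar day,
    scaled by that day's work percentage (1.0 when the day is not listed)."""
    total = timedelta(0)
    day = start.date()
    while day <= end.date():
        day_start = datetime.combine(day, time.min)
        day_end = day_start + timedelta(days=1)
        # Overlap between the activity and this calendar day.
        overlap = min(end, day_end) - max(start, day_start)
        if overlap > timedelta(0):
            total += overlap * work_percentage.get(day, 1.0)
        day += timedelta(days=1)
    return total


# The "Write report" example: the weekend days count for nothing.
vacations = {date(2019, 7, 13): 0.0, date(2019, 7, 14): 0.0}
total = worked_time(datetime(2019, 7, 11, 14), datetime(2019, 7, 15, 19), vacations)
print(total)  # 2 days, 5:00:00
```

For the "Write report" activity this gives 10 hours on the 11th, 24 on the 12th, nothing on the weekend and 19 on the 15th, which is the 2 days and 5 hours the question expects.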

Related

Calculating time between lines - How to use an extra timestamp for the last line

I'm having trouble calculating time between lines.
I would like to calculate how much time a user spent at each station on a specific day.
The first problem is a one-row offset: each result appears on the row below the one it belongs to.
The second problem is how to use the end-of-shift time for the user's last row.
CREATE TABLE adata (
id serial PRIMARY KEY,
user_id INT NOT NULL,
station_id INT NOT NULL,
shift_stop TIMESTAMP NOT NULL,
shift_date DATE NOT NULL,
created_at TIMESTAMP NOT NULL,
shift_start TIMESTAMP NOT NULL
);
insert into adata (id,user_id,station_id,shift_stop,shift_date,created_at, shift_start) values
(1, 1, 1, '2022-01-01 15:00:00', '2022-01-01','2022-01-01 10:00:00','2022-01-01 10:00:00'),
(2, 2, 1, '2022-01-01 15:00:00', '2022-01-01','2022-01-01 10:01:00','2022-01-01 10:00:00'),
(3, 1, 2, '2022-01-01 15:00:00', '2022-01-01','2022-01-01 11:00:00','2022-01-01 10:00:00'),
(4, 2, 2, '2022-01-01 15:00:00', '2022-01-01','2022-01-01 12:00:00','2022-01-01 10:00:00'),
(5, 2, 3, '2022-01-01 15:00:00', '2022-01-01','2022-01-01 12:30:00','2022-01-01 10:00:00');
select
t.user_id,
t.shift_stop,
t.created_at,
EXTRACT(EPOCH FROM (lag(t.created_at) over (partition by t.user_id order by t.created_at ) - t.created_at )) as time, t.station_id,
t.id
FROM adata t
where DATE(t.shift_date) = '2022-01-01'
Example: http://sqlfiddle.com/#!17/a8979/1
Use LEAD to get the user's next created_at, and COALESCE to fall back to shift_stop for the user's last row:
SELECT
*,
EXTRACT(EPOCH FROM (COALESCE(LEAD(created_at) OVER (PARTITION BY user_id, shift_date ORDER BY created_at), shift_stop) - created_at)) AS time
FROM adata
WHERE DATE(shift_date) = '2022-01-01'
ORDER BY user_id, id
Alternatively, pass shift_stop as the default (third) argument of LEAD:
SELECT
*,
EXTRACT(EPOCH FROM (LEAD(created_at, 1, shift_stop) OVER (PARTITION BY user_id, shift_date ORDER BY created_at) - created_at)) AS time
FROM adata
WHERE DATE(shift_date) = '2022-01-01'
ORDER BY user_id, id
Both queries return the same results:
id | user_id | station_id | shift_stop          | shift_date | created_at          | time
---+---------+------------+---------------------+------------+---------------------+--------------
1  | 1       | 1          | 2022-01-01 15:00:00 | 2022-01-01 | 2022-01-01 10:00:00 | 3600.000000
3  | 1       | 2          | 2022-01-01 15:00:00 | 2022-01-01 | 2022-01-01 11:00:00 | 14400.000000
2  | 2       | 1          | 2022-01-01 15:00:00 | 2022-01-01 | 2022-01-01 10:01:00 | 7140.000000
4  | 2       | 2          | 2022-01-01 15:00:00 | 2022-01-01 | 2022-01-01 12:00:00 | 1800.000000
5  | 2       | 3          | 2022-01-01 15:00:00 | 2022-01-01 | 2022-01-01 12:30:00 | 9000.000000
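To make the LEAD-with-default logic concrete, here is a small Python sketch that reproduces the same numbers. It is only an illustration under assumptions: the function name seconds_at_station is hypothetical, and it groups by user_id alone because the sample data covers a single shift date.

```python
from collections import defaultdict
from datetime import datetime


def seconds_at_station(rows):
    """Return {id: seconds until the same user's next row,
    or until shift_stop when the row is the user's last one}."""
    by_user = defaultdict(list)
    for row in rows:
        by_user[row["user_id"]].append(row)

    result = {}
    for user_rows in by_user.values():
        user_rows.sort(key=lambda r: r["created_at"])
        for i, row in enumerate(user_rows):
            # Same idea as LEAD(created_at, 1, shift_stop):
            # take the next row's time, else fall back to the shift end.
            if i + 1 < len(user_rows):
                next_time = user_rows[i + 1]["created_at"]
            else:
                next_time = row["shift_stop"]
            result[row["id"]] = (next_time - row["created_at"]).total_seconds()
    return result


stop = datetime(2022, 1, 1, 15, 0)
rows = [
    {"id": 1, "user_id": 1, "created_at": datetime(2022, 1, 1, 10, 0), "shift_stop": stop},
    {"id": 2, "user_id": 2, "created_at": datetime(2022, 1, 1, 10, 1), "shift_stop": stop},
    {"id": 3, "user_id": 1, "created_at": datetime(2022, 1, 1, 11, 0), "shift_stop": stop},
    {"id": 4, "user_id": 2, "created_at": datetime(2022, 1, 1, 12, 0), "shift_stop": stop},
    {"id": 5, "user_id": 2, "created_at": datetime(2022, 1, 1, 12, 30), "shift_stop": stop},
]
times = seconds_at_station(rows)
print(times[1], times[3])  # 3600.0 14400.0
```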

Calculating averages by quarters

I have a table in Presto with two columns: date and value.
I want to calculate the average of the 2nd quarter's values, so the expected result is 15.
How can I do this in presto?
date value
2021-01-01 10
2021-01-30 20
2021-02-10 10
2021-04-01 20
2021-04-02 10
2021-07-10 20
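You can map each month to a zero-based quarter index with (month - 1) / 3 (integer division) and group by the result; index 1 is the 2nd quarter. Dividing the month itself by 3 would select March to May instead of April to June:
-- sample data
WITH dataset (date, value) AS (
VALUES (date '2021-01-01' , 10),
(date '2021-01-30' , 20),
(date '2021-02-10' , 10),
(date '2021-04-01' , 20),
(date '2021-04-02' , 10),
(date '2021-07-10', 20)
)
-- query: (month - 1) / 3 is 0 for Jan-Mar, 1 for Apr-Jun, and so on
SELECT avg(value)
FROM dataset
WHERE (month(date) - 1) / 3 = 1
GROUP BY (month(date) - 1) / 3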
Output:
_col0
15.0
Use quarter function:
with mytable as (
SELECT * FROM (
VALUES
(date '2021-01-01', 10),
(date '2021-01-30', 20),
(date '2021-02-10', 10),
(date '2021-04-01', 20),
(date '2021-04-02', 10),
(date '2021-07-10', 20)
) AS t (date, value)
)
select quarter(date) as qt, avg(value) as avg
from mytable
where quarter(date)=2
group by quarter(date)
Result:
qt avg
2 15.0
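As a quick sanity check of the month-to-quarter mapping, the arithmetic can be sketched in Python. This only illustrates the calculation; the helper name quarter is an assumption here and simply mirrors what a quarter function returns for a date.

```python
from datetime import date


def quarter(d):
    """Quarter of the year (1-4): months 1-3 -> 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4."""
    return (d.month - 1) // 3 + 1


rows = [
    (date(2021, 1, 1), 10),
    (date(2021, 1, 30), 20),
    (date(2021, 2, 10), 10),
    (date(2021, 4, 1), 20),
    (date(2021, 4, 2), 10),
    (date(2021, 7, 10), 20),
]

# Keep only 2nd-quarter values and average them.
q2_values = [value for d, value in rows if quarter(d) == 2]
q2_average = sum(q2_values) / len(q2_values)
print(q2_average)  # 15.0
```

The boundary months are the ones worth testing: March must still be quarter 1 and June must still be quarter 2.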

Add missing month in result with values from previous month

I have a result set with the month as the first column. Some months are missing from the result. For each missing month, I need to add a copy of the previous month's record, up to the last month.
Current data:
Desired Output:
I have a query, but instead of filling in only the missing months, it expands every row:
select
to_char(generate_series(date_trunc('MONTH',to_date(period,'YYYYMMDD')+interval '1' month),
date_trunc('MONTH',now()+interval '1' day),
interval '1' month) - interval '1 day','YYYYMMDD') as period,
name,age,salary,rating
from( values ('20201205','Alex',35,100,'A+'),
('20210110','Alex',35,110,'A'),
('20210512','Alex',35,999,'A+'),
('20210625','Jhon',20,175,'B-'),
('20210922','Jhon',20,200,'B+')) v (period,name,age,salary,rating) order by 2,3,4,5,1;
Output of this query:
Can someone help in getting desired output.
Regards!!
You can achieve this with a recursive cte like this:
with RECURSIVE ctetest as (SELECT * FROM (values ('2020-12-31'::date,'Alex',35,100,'A+'),
('2021-01-31'::date,'Alex',35,110,'A'),
('2021-05-31'::date,'Alex',35,999,'A+'),
('2021-06-30'::date,'Jhon',20,175,'B-'),
('2021-09-30'::date,'Jhon',20,200,'B+')) v (mth, emp, age, salary, rating)),
cte AS (
SELECT MIN(mth) AS mth, emp, age, salary, rating
FROM ctetest
GROUP BY emp, age, salary, rating
UNION
SELECT COALESCE(n.mth, (l.mth + interval '1 day' + interval '1 month' - interval '1 day')::date), COALESCE(n.emp, l.emp),
COALESCE(n.age, l.age), COALESCE(n.salary, l.salary), COALESCE(n.rating, l.rating)
FROM cte l
LEFT OUTER JOIN ctetest n ON n.mth = (l.mth + interval '1 day' + interval '1 month' - interval '1 day')::date
AND n.emp = l.emp
WHERE (l.mth + interval '1 day' + interval '1 month' - interval '1 day')::date <= (SELECT MAX(mth) FROM ctetest)
)
SELECT * FROM cte order by 2, 1;
Note that ctetest is not itself recursive; it only supplies the test data. However, if any CTE in a WITH list is recursive, the RECURSIVE keyword must follow WITH.
You can use cross join lateral to fill the gaps and then union all with the original data.
WITH the_table (period, name, age, salary, rating) as ( values
('2020-12-01'::date, 'Alex', 35, 100, 'A+'),
('2021-01-01'::date, 'Alex', 35, 110, 'A'),
('2021-05-01'::date, 'Alex', 35, 999, 'A+'),
('2021-06-01'::date, 'Jhon', 20, 100, 'B-'),
('2021-09-01'::date, 'Jhon', 20, 200, 'B+')
),
t as (
select *, coalesce(
lead(period) over (partition by name order by period) - interval 'P1M',
max(period) over ()
) last_period
from the_table
)
SELECT lat::date period, name, age, salary, rating
from t
cross join lateral generate_series
(period + interval 'P1M', last_period, interval 'P1M') lat
UNION ALL
SELECT * from the_table
ORDER BY name, period;
Please note that using integer data type for a date column is sub-optimal. Better review your data design and use date data type instead. You can then present it as integer if necessary.
period     | name | age | salary | rating
-----------+------+-----+--------+-------
2020-12-01 | Alex | 35  | 100    | A+
2021-01-01 | Alex | 35  | 110    | A
2021-02-01 | Alex | 35  | 110    | A
2021-03-01 | Alex | 35  | 110    | A
2021-04-01 | Alex | 35  | 110    | A
2021-05-01 | Alex | 35  | 999    | A+
2021-06-01 | Alex | 35  | 999    | A+
2021-07-01 | Alex | 35  | 999    | A+
2021-08-01 | Alex | 35  | 999    | A+
2021-09-01 | Alex | 35  | 999    | A+
2021-06-01 | Jhon | 20  | 100    | B-
2021-07-01 | Jhon | 20  | 100    | B-
2021-08-01 | Jhon | 20  | 100    | B-
2021-09-01 | Jhon | 20  | 200    | B+
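The gap-filling rule behind this output (repeat each row month by month until the same person's next row, and repeat the last row until the latest period in the data) can be sketched in Python as follows. This is an illustration under assumptions, not the SQL itself: periods are first-of-month dates, rows are reduced to (period, name, salary) tuples, and the names next_month and fill_months are hypothetical.

```python
from datetime import date


def next_month(d):
    """First day of the month after d (d is assumed to be a first-of-month date)."""
    return date(d.year + d.month // 12, d.month % 12 + 1, 1)


def fill_months(rows):
    """rows: (period, name, salary) tuples. Repeat each row for every month
    up to the same name's next row, or up to the latest period overall."""
    last_period = max(row[0] for row in rows)
    by_name = {}
    for row in sorted(rows, key=lambda r: (r[1], r[0])):
        by_name.setdefault(row[1], []).append(row)

    filled = []
    for name_rows in by_name.values():
        for i, row in enumerate(name_rows):
            # Stop before the next known row, or run through the overall last period.
            if i + 1 < len(name_rows):
                stop = name_rows[i + 1][0]
            else:
                stop = next_month(last_period)
            period = row[0]
            while period < stop:
                filled.append((period,) + row[1:])
                period = next_month(period)
    return filled


rows = [
    (date(2020, 12, 1), "Alex", 100),
    (date(2021, 1, 1), "Alex", 110),
    (date(2021, 5, 1), "Alex", 999),
    (date(2021, 6, 1), "Jhon", 100),
    (date(2021, 9, 1), "Jhon", 200),
]
filled = fill_months(rows)
print(len(filled))  # 14
```

As in the table above, Alex's last known row is carried through 2021-09-01 because that is the latest period present for anyone.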

Calculate total time worked in a day with multiple stops and starts

I can use DATEDIFF to find the difference between one set of dates like this
DATEDIFF(MINUTE, @startdate, @enddate)
but how would I find the total time span between multiple sets of dates? I don't know how many sets (stops and starts) I will have.
The data is on multiple rows with start and stops.
ID TimeStamp StartOrStop TimeCode
----------------------------------------------------------------
1 2017-01-01 07:00:00 Start 1
2 2017-01-01 08:15:00 Stop 2
3 2017-01-01 10:00:00 Start 1
4 2017-01-01 11:00:00 Stop 2
5 2017-01-01 10:30:00 Start 1
6 2017-01-01 12:00:00 Stop 2
This code works assuming that your table only stores data for one person and that the rows strictly alternate Start/Stop/Start/Stop:
WITH StartTime AS (
SELECT
TimeStamp
, ROW_NUMBER() OVER (ORDER BY TimeStamp) AS RowNum
FROM
<<table>>
WHERE
TimeCode = 1
), StopTime AS (
SELECT
TimeStamp
, ROW_NUMBER() OVER (ORDER BY TimeStamp) AS RowNum
FROM
<<table>>
WHERE
TimeCode = 2
)
SELECT
SUM (DATEDIFF( MINUTE, StartTime.TimeStamp, StopTime.TimeStamp )) As TotalTime
FROM
StartTime
JOIN StopTime ON StartTime.RowNum = StopTime.RowNum
This will work if your starts and stops are reliable. Your sample has two consecutive starts in timestamp order (10:00 and 10:30), so I assume that in production you will have an employee ID to group on; I added one to the sample data in place of the identity column.
In production you would also narrow the CTEs with a date parameter. If there are overnight shifts, the stops CTE should use dateadd(day, 1, @startDate) as its upper bound when retrieving the end date.
Set up sample:
create table #temp (
EmpId int,
TimeStamp datetime,
StartOrStop varchar(55),
TimeCode int
);
insert into #temp
values
(1, '2017-01-01 07:00:00', 'Start', 1),
(1, '2017-01-01 08:15:00', 'Stop', 2),
(1, '2017-01-01 10:00:00', 'Start', 1),
(1, '2017-01-01 11:00:00', 'Stop', 2),
(2, '2017-01-01 10:30:00', 'Start', 1),
(2, '2017-01-01 12:00:00', 'Stop', 2)
Query:
;with starts as (
select t.EmpId,
t.TimeStamp as StartTime,
row_number() over (partition by t.EmpId order by t.TimeStamp asc) as rn
from #temp t
where Timecode = 1 --Start time code?
),
stops as (
select t.EmpId,
t.TimeStamp as EndTime,
row_number() over (partition by t.EmpId order by t.TimeStamp asc) as rn
from #temp t
where Timecode = 2 --Stop time code?
)
select cast(min(sub.StartTime) as date) as WorkDay,
sub.EmpId as Employee,
min(sub.StartTime) as ClockIn,
max(sub.EndTime) as ClockOut,
sum(sub.MinutesWorked) as MinutesWorked
from
(
select strt.EmpId,
strt.StartTime,
stp.EndTime,
datediff(minute, strt.StartTime, stp.EndTime) as MinutesWorked
from starts strt
inner join stops stp
on strt.EmpId = stp.EmpId
and strt.rn = stp.rn
)sub
group by sub.EmpId
This works assuming your table has an incremental ID and interleaving start/stop records
--Data sample as provided
create table #temp (
Id int,
TimeStamp datetime,
StartOrStop varchar(55),
TimeCode int
);
insert into #temp
values
(1, '2017-01-01 07:00:00', 'Start', 1),
(2, '2017-01-01 08:15:00', 'Stop', 2),
(3, '2017-01-01 10:00:00', 'Start', 1),
(4, '2017-01-01 11:00:00', 'Stop', 2),
(5, '2017-01-01 10:30:00', 'Start', 1),
(6, '2017-01-01 12:00:00', 'Stop', 2)
--let's see every pair start/stop and discard stop/start
select start.timestamp start, stop.timestamp stop,
datediff(mi,start.timestamp,stop.timestamp) minutes
from #temp start inner join #temp stop
on start.id+1= stop.id and start.timecode=1
--Sum all for required result
select sum(datediff(mi,start.timestamp,stop.timestamp) ) totalMinutes
from #temp start inner join #temp stop
on start.id+1= stop.id and start.timecode=1
Results
+-------------------------+-------------------------+---------+
| start | stop | minutes |
+-------------------------+-------------------------+---------+
| 2017-01-01 07:00:00.000 | 2017-01-01 08:15:00.000 | 75 |
| 2017-01-01 10:00:00.000 | 2017-01-01 11:00:00.000 | 60 |
| 2017-01-01 10:30:00.000 | 2017-01-01 12:00:00.000 | 90 |
+-------------------------+-------------------------+---------+
+--------------+
| totalMinutes |
+--------------+
| 225 |
+--------------+
The tricky part is the join clause. We join #temp with itself at an offset of one ID, which is what on start.id+1= stop.id does.
To exclude stop/start pairs, we filter on start.timecode=1. If there is no column with this information, something like stop.id%2=0 works just as well.
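The ID-offset pairing can be pictured as a single pass over the rows in ID order, remembering the most recent Start and closing it at the next Stop. The Python sketch below is only an illustration of that pairing, assuming rows arrive in ID order and alternate Start/Stop; the function name total_minutes is hypothetical.

```python
from datetime import datetime


def total_minutes(events):
    """events: (timestamp, kind) tuples in ID order, alternating 'Start' and 'Stop'.
    Each Stop is paired with the Start on the row just before it."""
    total = 0
    start = None
    for timestamp, kind in events:
        if kind == "Start":
            start = timestamp
        elif kind == "Stop" and start is not None:
            total += int((timestamp - start).total_seconds() // 60)
            start = None  # a Stop without a fresh Start is ignored
    return total


events = [
    (datetime(2017, 1, 1, 7, 0), "Start"),
    (datetime(2017, 1, 1, 8, 15), "Stop"),
    (datetime(2017, 1, 1, 10, 0), "Start"),
    (datetime(2017, 1, 1, 11, 0), "Stop"),
    (datetime(2017, 1, 1, 10, 30), "Start"),
    (datetime(2017, 1, 1, 12, 0), "Stop"),
]
minutes = total_minutes(events)
print(minutes)  # 225
```

The three pairs contribute 75, 60 and 90 minutes, matching the 225 in the result above. Note that the 10:00 to 11:00 and 10:30 to 12:00 ranges overlap, so this is a sum of pair durations, not of distinct wall-clock minutes.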

Get time difference between row values grouped by event

I am using Postgres 9.3.3
I have a table with multiple events, two of them are "AVAILABLE" and "UNAVAILABLE". These events are assigned to a specific object. There are also other object ids in this table (removed for clarity):
What I need is the "available" time per day, something like that:
SQL Fiddle
select
object_id, day,
sum(upper(available) - lower(available)) as available
from (
select
g.object_id, date_trunc('day', d) as day,
(
available *
tsrange(date_trunc('day', d), date_trunc('day', d)::date + 1, '[)')
) as available
from
(
select
object_id, event,
tsrange(
timestamp,
lead(timestamp) over(
partition by object_id order by timestamp
),
'[)'
) as available
from events
where event in ('AVAILABLE', 'UNAVAILABLE')
) s
right join
(
generate_series(
(select min(timestamp) from events),
(select max(timestamp) from events),
'1 day'
) g (d)
cross join
(select distinct object_id from events) s
) g on
tsrange(date_trunc('day', d), date_trunc('day', d)::date + 1, '[)') && available and
(event = 'AVAILABLE' or event is null) and
g.object_id = s.object_id
) s
group by 1, 2
order by 1, 2
psql output
object_id | day | available
-----------+---------------------+-----------
1 | 1970-01-02 00:00:00 | 12:00:00
1 | 1970-01-03 00:00:00 | 12:00:00
1 | 1970-01-04 00:00:00 |
1 | 1970-01-05 00:00:00 | 1 day
1 | 1970-01-06 00:00:00 | 1 day
1 | 1970-01-07 00:00:00 | 12:00:00
Table DDL
create table events (
object_id int,
event text,
timestamp timestamp
);
insert into events (object_id, event, timestamp) values
(1, 'AVAILABLE', '1970-01-02 12:00:00'),
(1, 'UNAVAILABLE', '1970-01-03 12:00:00'),
(1, 'AVAILABLE', '1970-01-05 00:00:00'),
(1, 'UNAVAILABLE', '1970-01-07 12:00:00');
Your example output suggests that you want all of your objects returned, grouped by object and day. If that is the case, this query does that:
select object_id, day, sum(upper(tsrange) - lower(tsrange))
from (
select object_id, date(day) as day, e.tsrange * tsrange(day, day + interval '1' day) tsrange
from generate_series(timestamp '1970-01-01', '1970-01-07', interval '1' day) day
left join (
select object_id,
case event
when 'AVAILABLE' then tsrange(timestamp, lead(timestamp) over (partition by object_id order by timestamp))
else null
end tsrange
from events
where event in ('AVAILABLE', 'UNAVAILABLE')
) e on e.tsrange && tsrange(day, day + interval '1' day)
) d
group by object_id, day
order by day, object_id
But with multiple object_ids, the output will look something like this:
object_id | day | sum
-----------+--------------+-----------
| '1970-01-01' |
1 | '1970-01-02' | '12:00:00'
1 | '1970-01-03' | '12:00:00'
| '1970-01-04' |
1 | '1970-01-05' | '1 day'
1 | '1970-01-06' | '1 day'
2 | '1970-01-06' | '12:00:00'
1 | '1970-01-07' | '12:00:00'
In my opinion, it makes much more sense to query just one object at a time:
select day, sum(upper(tsrange) - lower(tsrange))
from (
select date(day) as day, e.tsrange * tsrange(day, day + interval '1' day) tsrange
from generate_series(timestamp '1970-01-01', '1970-01-07', interval '1' day) day
left join (
select case event
when 'AVAILABLE' then tsrange(timestamp, lead(timestamp) over (partition by object_id order by timestamp))
else null
end tsrange
from events
where event in ('AVAILABLE', 'UNAVAILABLE')
and object_id = 1
) e on e.tsrange && tsrange(day, day + interval '1' day)
) d
group by day
order by day
This will output something like:
day | sum
--------------+----------
'1970-01-01' |
'1970-01-02' | '12:00:00'
'1970-01-03' | '12:00:00'
'1970-01-04' |
'1970-01-05' | '1 day'
'1970-01-06' | '1 day'
'1970-01-07' | '12:00:00'
I used this schema/data for my outputs:
create table events (
object_id int,
event text,
timestamp timestamp
);
insert into events (object_id, event, timestamp)
values (1, 'AVAILABLE', '1970-01-02 12:00:00'),
(1, 'UNAVAILABLE', '1970-01-03 12:00:00'),
(1, 'AVAILABLE', '1970-01-05 00:00:00'),
(1, 'UNAVAILABLE', '1970-01-07 12:00:00'),
(2, 'AVAILABLE', '1970-01-06 00:00:00'),
(2, 'UNAVAILABLE', '1970-01-06 06:00:00'),
(2, 'AVAILABLE', '1970-01-06 12:00:00'),
(2, 'UNAVAILABLE', '1970-01-06 18:00:00');
This is a partial answer. If we assume that the next event after available is unavailable, then lead() comes to the rescue and the following is a start:
select object_id, to_char(timestamp, 'YYYY-MM-DD') as day,
to_char(sum(nextts - timestamp), 'HH24:MI') as interval
from (select t.*,
lead(timestamp) over (partition by object_id order by timestamp) as nextts
from table t -- "table" is a placeholder; use your own table name here
where event in ('AVAILABLE', 'UNAVAILABLE')
) t
where event = 'AVAILABLE'
group by object_id, to_char(timestamp, 'YYYY-MM-DD');
I suspect, though, that when the interval spans multiple days, you want to split the days into separate parts. This becomes more of a challenge.
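To show what splitting an interval across days involves, here is a minimal Python sketch under stated assumptions: events belong to a single object, are sorted by timestamp, and an AVAILABLE event with no following event is ignored, much as a NULL from lead() would be. The function name available_per_day is hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime, time, timedelta


def available_per_day(events):
    """events: (timestamp, event) tuples for one object, sorted by timestamp.
    Returns {day: timedelta} of availability, cut at midnight boundaries."""
    per_day = defaultdict(timedelta)
    for (ts, event), (next_ts, _) in zip(events, events[1:]):
        if event != "AVAILABLE":
            continue
        cursor = ts
        while cursor < next_ts:
            # The piece ends at the next midnight or at the end of the interval.
            midnight = datetime.combine(cursor.date(), time.min) + timedelta(days=1)
            piece_end = min(midnight, next_ts)
            per_day[cursor.date()] += piece_end - cursor
            cursor = piece_end
    return dict(per_day)


events = [
    (datetime(1970, 1, 2, 12, 0), "AVAILABLE"),
    (datetime(1970, 1, 3, 12, 0), "UNAVAILABLE"),
    (datetime(1970, 1, 5, 0, 0), "AVAILABLE"),
    (datetime(1970, 1, 7, 12, 0), "UNAVAILABLE"),
]
per_day = available_per_day(events)
print(per_day[date(1970, 1, 5)])  # 1 day, 0:00:00
```

Days with no availability at all, such as 1970-01-04, simply do not appear in the result; producing a row for them is what the generate_series calendar in the other answers is for.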