SQL/Postgres datetime division / normalizing

I have this activity table
+--------------+------------------+
| Field | Type |
+--------------+------------------+
| id | int(11) unsigned |
| start_date | timestamp |
| end_date | timestamp |
| ... | |
+--------------+------------------+
I need a view that groups these activities by the DAY of start_date, but in such a way that, if the end_date is not on the same day as the start_date, the view contains the entry again with the start_date set to 00:00 of the next day (and so on, repeated as many times as needed until the start_date is on the same day as the end_date).
As an example:
if the activity table contains:
+--------------+----------------------------+----------------------------+
| id | start_date | end_date |
+--------------+----------------------------+----------------------------+
| 1 | 2014-12-02 14:12:00+00 | 2014-12-03 06:45:00+00 |
| 2 | 2014-12-05 15:25:00+00 | 2014-12-05 07:29:00+00 |
+--------------+----------------------------+----------------------------+
The view should contain:
+--------------+----------------------------+----------------------------+
| activity_id | start_date | end_date |
+--------------+----------------------------+----------------------------+
| 1 | 2014-12-02 14:12:00+00 | 2014-12-02 23:59:59+00 |
| 1 | 2014-12-03 00:00:00+00 | 2014-12-03 06:45:00+00 |
| 2 | 2014-12-05 15:25:00+00 | 2014-12-05 07:29:00+00 |
+--------------+----------------------------+----------------------------+
Any help would be greatly appreciated!
PS: I'm using postgresql

To get the needed rows, start with a set-returning function (generate_series) in a lateral join. From there, use CASE expressions and date arithmetic to pull out the relevant values.
Here's an example to get you started:
with data as (
select id, start_date, end_date
from (values
(1, '2014-12-02 14:12:00+00'::timestamptz, '2014-12-03 06:45:00+00'::timestamptz),
(2, '2014-12-05 15:25:00+00'::timestamptz, '2014-12-05 07:29:00+00'::timestamptz)
) as rows (id, start_date, end_date)
)
select data.id,
case days.d = date_trunc('day', data.start_date)
when true then data.start_date
else days.d
end as start_date,
case days.d = date_trunc('day', data.end_date)
when true then data.end_date
else days.d + interval '1 day' - interval '1 sec'
end as end_date
from data
join generate_series(
date_trunc('day', data.start_date),
date_trunc('day', data.end_date),
'1 day'
) as days (d)
on days.d >= date_trunc('day', data.start_date)
and days.d <= date_trunc('day', data.end_date)
id | start_date | end_date
----+------------------------+------------------------
1 | 2014-12-02 15:12:00+01 | 2014-12-02 23:59:59+01
1 | 2014-12-03 00:00:00+01 | 2014-12-03 07:45:00+01
2 | 2014-12-05 16:25:00+01 | 2014-12-05 08:29:00+01
(3 rows)
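If you want to check the per-day logic outside the database, here is a minimal Python sketch of the same idea: one piece per calendar day touched, keeping the real start on the first day and the real end on the last day. The function name split_by_day is made up for illustration, and the sketch assumes days are taken in the time zone carried by the datetime values (UTC here), whereas date_trunc in Postgres uses the session time zone.

```python
from datetime import datetime, time, timedelta, timezone

def split_by_day(start, end):
    """Return (start, end) pieces, one per calendar day between start and end."""
    pieces = []
    day = start.date()
    # Like generate_series: nothing is produced if end's day precedes start's day.
    while day <= end.date():
        midnight = datetime.combine(day, time.min, tzinfo=start.tzinfo)
        # First day keeps the real start, later days begin at 00:00.
        piece_start = start if day == start.date() else midnight
        # Last day keeps the real end, earlier days stop at 23:59:59.
        piece_end = end if day == end.date() else midnight + timedelta(days=1, seconds=-1)
        pieces.append((piece_start, piece_end))
        day += timedelta(days=1)
    return pieces

utc = timezone.utc
rows = split_by_day(datetime(2014, 12, 2, 14, 12, tzinfo=utc),
                    datetime(2014, 12, 3, 6, 45, tzinfo=utc))
for piece_start, piece_end in rows:
    print(piece_start.isoformat(), piece_end.isoformat())
```

An activity that starts and ends on the same day comes back as a single unchanged piece, which is also what the query does for id 2.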
As an aside, depending on what you're doing, it might make more sense for you to use a range type such as tstzrange:
with data as (
select id, start_date, end_date
from (values
(1, '2014-12-02 14:12:00+00'::timestamptz, '2014-12-03 06:45:00+00'::timestamptz),
(2, '2014-12-05 07:25:00+00'::timestamptz, '2014-12-05 15:29:00+00'::timestamptz)
) as rows (id, start_date, end_date)
)
select data.id,
tstzrange(data.start_date, data.end_date)
from data;
id | tstzrange
----+-----------------------------------------------------
1 | ["2014-12-02 15:12:00+01","2014-12-03 07:45:00+01")
2 | ["2014-12-05 08:25:00+01","2014-12-05 16:29:00+01")
(2 rows)

Related

How can I insert data in table form into another table provided some specific conditions are satisfied

Logic: If today is Monday (reference 'time' table), data present in S should be inserted into M (along with a sent_day column which will have today's date).
If today is not Monday, dates corresponding to current week (unique week_id) should be checked in M table. If any of these dates are available in M then S should not be inserted into M. If these dates are not available in M then S should be inserted into M
time
+------------+------------+----------------+
| cal_dt | cal_day | week_id |
+------------+------------+----------------+
| 2020-03-23 | Monday | 123 |
| 2020-03-24 | Tuesday | 123 |
| 2020-03-25 | Wednesday | 123 |
| 2020-03-26 | Thursday | 123 |
| 2020-03-27 | Friday | 123 |
| 2020-03-30 | Monday | 124 |
| 2020-03-31 | Tuesday | 124 |
+------------+------------+----------------+
M
+------------+----------+-------+
| sent_day | item | price |
+------------+----------+-------+
| 2020-03-11 | pen | 10 |
| 2020-03-11 | book | 50 |
| 2020-03-13 | Eraser | 5 |
| 2020-03-13 | sharpner | 5 |
+------------+----------+-------+
S
+----------+-------+
| item | price |
+----------+-------+
| pen | 25 |
| book | 20 |
| Eraser | 10 |
| sharpner | 3 |
+----------+-------+
Insert INTO M
SELECT
CASE WHEN(SELECT cal_day FROM time WHERE cal_dt = current_date) = 'Monday' THEN s.*
ELSE
(CASE WHEN(SELECT cal_dt FROM time WHERE wk_id =(SELECT wk_id FROM time WHERE cal_dt = current_date ) NOT IN (SELECT DISTINCT sent_day FROM M) THEN 1 ELSE 0 END)
THEN s.* ELSE END
FROM s
I would do this in two separate INSERT statements:
The first condition ("if today is monday") is quite easy:
insert into m (sent_day, item, price)
select current_date, item, price
from s
where exists (select *
from "time"
where cal_dt = current_date
and cal_day = 'Monday');
I find storing both the date and the weekday a bit confusing, as the weekday can easily be extracted from the date. For the test "if today is Monday" it's actually not necessary to consult the "time" table at all:
insert into m (sent_day, item, price)
select current_date, item, price
from s
where extract(dow from current_date) = 1;
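The second part is a bit more complicated, but if I understand it correctly, it should be something like this (insert only if today is not Monday and no date of the current week is already present in m):
insert into m (sent_day, item, price)
select current_date, item, price
from s
where extract(dow from current_date) <> 1 -- today is not Monday
and not exists (select *
from m
where m.sent_day in (select t.cal_dt
from "time" t
-- all dates that share today's week_id
where t.week_id = (select week_id
from "time"
where cal_dt = current_date)));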
If you just want a single INSERT statement, you could simply do a UNION ALL between the two selects:
insert into m (sent_day, item, price)
select current_date, item, price
from s
where extract(dow from current_date) = 1
union all
select current_date, item, price
from s
where extract(dow from current_date) <> 1 -- keeps the two branches mutually exclusive
and not exists (select *
from m
where m.sent_day in (select t.cal_dt
from "time" t
-- all dates that share today's week_id
where t.week_id = (select week_id
from "time"
where cal_dt = current_date)));
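To test the rule from the question independently of the SQL, it can be written out as a small Python function. This is only a sketch under assumptions: should_insert, calendar and sent are hypothetical names, the calendar dict stands in for the "time" table, and the set stands in for the sent_day column of M.

```python
from datetime import date

def should_insert(today, calendar, sent_days):
    """calendar maps date -> (day_name, week_id); sent_days is the set of dates already in M."""
    day_name, week_id = calendar[today]
    if day_name == "Monday":
        return True  # Mondays always insert
    # Otherwise insert only if no date of the current week is already in M.
    week_dates = {d for d, (_, w) in calendar.items() if w == week_id}
    return not (week_dates & sent_days)

calendar = {
    date(2020, 3, 23): ("Monday", 123),
    date(2020, 3, 24): ("Tuesday", 123),
    date(2020, 3, 25): ("Wednesday", 123),
    date(2020, 3, 30): ("Monday", 124),
}
sent = {date(2020, 3, 11), date(2020, 3, 13)}

print(should_insert(date(2020, 3, 24), calendar, sent))                         # True
print(should_insert(date(2020, 3, 24), calendar, sent | {date(2020, 3, 23)}))   # False
```

The second call returns False because a row for Monday of the same week (week_id 123) is already present.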

Query to get user count for every month and previous month

We have the following activity table and would like to query it to get the number of unique users for each month and the previous month. The date field (createdat) is a timestamp. The query needs to work in PostgreSQL.
Activity table:
| id | userid | createdat | username |
|--------|--------|-------------------------|----------------|
| 1d658a | 4957f3 | 2016-12-06 21:16:35:942 | Tom Jones |
| 3a86e3 | 684edf | 2016-12-03 21:16:35:943 | Harry Smith |
| 595756 | 582107 | 2016-12-26 21:16:35:944 | William Hanson |
| 2c87fe | 784723 | 2016-12-07 21:16:35:945 | April Cordon |
| 32509a | 4957f3 | 2016-12-20 21:16:35:946 | Tom Jones |
| 72e703 | 582107 | 2017-01-01 21:16:35:947 | William Hanson |
| 6d658a | 582107 | 2016-12-06 21:16:35:948 | William Hanson |
| 5c077c | 5934c4 | 2016-12-06 21:16:35:949 | Sandra Holmes |
| 92142b | 57ea5c | 2016-12-15 21:16:35:950 | Lucy Lawless |
| 3dd0a6 | 5934c4 | 2016-12-04 21:16:35:951 | Sandra Holmes |
| 43509a | 4957f3 | 2016-11-20 21:16:35:946 | Tom Jones |
| 85142b | 57ea5c | 2016-11-15 21:16:35:950 | Lucy Lawless |
| 7c87fe | 784723 | 2017-01-07 21:16:35:945 | April Cordon |
| 9c87fe | 784723 | 2017-02-07 21:16:35:946 | April Cordon |
Results:
| Month | UserThisMonth | UserPreviousMonth |
|----------|----------------|-------------------|
| Dec 2016 | 6 | 2 |
| Jan 2017 | 2 | 6 |
| Feb 2017 | 1 | 2 |
You can try this query. It uses to_char to format the month as MON YYYY, and a subquery with the lag window function to get the UserPreviousMonth count.
SELECT *
FROM (SELECT To_char(createdat, 'MON YYYY') Months,
Count(DISTINCT username) UserThisMonth,
Lag(Count(DISTINCT username)) OVER (
ORDER BY Date_part('year', createdat),
Date_part('month',createdat)
) UserPreviousMonth
FROM t
GROUP BY Date_part('year', createdat),
To_char(createdat, 'MON YYYY'),
Date_part('month', createdat)) t
WHERE userpreviousmonth IS NOT NULL
sqlfiddle:http://sqlfiddle.com/#!15/45e52/2
| months | userthismonth | userpreviousmonth |
|----------|---------------|-------------------|
| DEC 2016 | 6 | 2 |
| JAN 2017 | 2 | 6 |
| FEB 2017 | 1 | 2 |
EDIT
Values such as Dec 2016 and Jan 2017 have to be strings, because a date type needs a full date like 2017-01-01. If the result needs to be sorted and used on a graph, I suggest returning the year and month columns as in this query, sorting on them, and then building the date string on the front end.
SELECT *
FROM (SELECT Date_part('year', createdat) years,
Date_part('month', createdat) months,
Count(DISTINCT username) UserThisMonth,
Lag(Count(DISTINCT username)) OVER (
ORDER BY Date_part('year', createdat),
Date_part('month',createdat)
) UserPreviousMonth
FROM user_activity
GROUP BY Date_part('year', createdat),
Date_part('month', createdat)) t
WHERE userpreviousmonth IS NOT NULL
sqlfiddle:http://sqlfiddle.com/#!15/2da2b/4
| years | months | userthismonth | userpreviousmonth |
|-------|--------|---------------|-------------------|
| 2016 | 12 | 6 | 2 |
| 2017 | 1 | 2 | 6 |
| 2017 | 2 | 1 | 2 |
Fastest and simplest with date_trunc(). Use to_char() once to display the month in preferred format:
WITH cte AS (
SELECT date_trunc('month', createdat) AS mon
, count(DISTINCT username) AS ct
FROM activity
GROUP BY 1
)
SELECT to_char(t1.mon, 'MON YYYY') AS month
, t1.ct AS users_this_month
, t2.ct AS users_previous_month
FROM cte t1
LEFT JOIN cte t2 ON t2.mon = t1.mon - interval '1 mon'
ORDER BY t1.mon;
db<>fiddle here
You commented:
the "Month" field in the results table needs to be a "date" data type so it can be sorted and used on the graph.
For this, simply cast in the final SELECT:
SELECT t1.mon::date AS month ...
Grouping and ordering by a (truncated) timestamp value is more efficient (and reliable) than by multiple values or a text representation.
The result includes the first month ('NOV 2016' in your demo), showing NULL for users_previous_month - like for any previous month without entries. You might want to display 0 instead or drop the row ...
Related:
How to get the date and time from timestamp in PostgreSQL select query?
PostgreSQL: running count of rows for a query 'by minute'
Aside: usernames in the form of "Tom Jones" are typically not unique. You'll want to operate with a unique ID instead.
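For comparison, the same "truncate to month, count distinct, look up the previous month" logic can be sketched in plain Python. The helper names (month_start, previous_month, monthly_users) are made up for illustration, and the rows are the userid/date pairs from the question.

```python
from collections import defaultdict
from datetime import date

def month_start(d):
    """Equivalent of date_trunc('month', ...) for a date."""
    return date(d.year, d.month, 1)

def previous_month(m):
    return date(m.year - 1, 12, 1) if m.month == 1 else date(m.year, m.month - 1, 1)

def monthly_users(events):
    """events: iterable of (user_id, date). Returns (month, users, users_previous_month)."""
    users = defaultdict(set)
    for user_id, created in events:
        users[month_start(created)].add(user_id)
    result = []
    for m in sorted(users):
        prev = users.get(previous_month(m))  # None when the previous month has no rows
        result.append((m, len(users[m]), len(prev) if prev is not None else None))
    return result

events = [
    ("4957f3", date(2016, 12, 6)), ("684edf", date(2016, 12, 3)),
    ("582107", date(2016, 12, 26)), ("784723", date(2016, 12, 7)),
    ("4957f3", date(2016, 12, 20)), ("582107", date(2017, 1, 1)),
    ("582107", date(2016, 12, 6)), ("5934c4", date(2016, 12, 6)),
    ("57ea5c", date(2016, 12, 15)), ("5934c4", date(2016, 12, 4)),
    ("4957f3", date(2016, 11, 20)), ("57ea5c", date(2016, 11, 15)),
    ("784723", date(2017, 1, 7)), ("784723", date(2017, 2, 7)),
]
for month, this_month, last_month in monthly_users(events):
    print(month.strftime("%b %Y"), this_month, last_month)
```

As with the LEFT JOIN, the first month (Nov 2016) is kept and gets None for the previous month.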
Edit:
Shamelessly using @D-Shih's superior method of generating year/month combinations.
A couple of solutions:
WITH ua AS (
SELECT
TO_CHAR(createdate, 'YYYYMM') AS year_month,
COUNT(DISTINCT userid) distinct_users
FROM user_activity
GROUP BY
TO_CHAR(createdate, 'YYYYMM')
)
SELECT * FROM (
SELECT
TO_DATE(ua.year_month || '01', 'YYYYMMDD')
+ INTERVAL '1 month'
- INTERVAL '1 day'
AS month_end,
ua.distinct_users,
LAG(ua.distinct_users) OVER (ORDER BY ua.year_month) distinct_users_last_month
FROM ua
) uas WHERE uas.distinct_users_last_month IS NOT NULL
ORDER BY month_end DESC;
No windowing required:
WITH ua AS (
SELECT
TO_CHAR(createdate, 'YYYYMM') AS year_month,
TO_CHAR(createdate - INTERVAL '1 MONTH', 'YYYYMM') AS last_month,
COUNT(DISTINCT userid) AS distinct_users
FROM user_activity
GROUP BY
TO_CHAR(createdate, 'YYYYMM'),
TO_CHAR(createdate - INTERVAL '1 MONTH', 'YYYYMM')
)
SELECT
TO_DATE(ua1.year_month || '01', 'YYYYMMDD')
+ INTERVAL '1 month'
- INTERVAL '1 day'
AS month_end,
ua1.distinct_users,
ua2.distinct_users AS last_distinct_users
FROM
ua ua1 LEFT OUTER JOIN ua ua2
ON ua2.year_month = ua1.last_month
WHERE ua2.distinct_users IS NOT NULL
ORDER BY ua1.year_month DESC;
DDL:
CREATE TABLE user_activity (
id varchar(50),
userid varchar(50),
createdate timestamp,
username varchar(50)
);
COMMIT;
Data:
INSERT INTO user_activity VALUES ('1d658a','4957f3','20161206 21:16:35'::timestamp,'Tom Jones');
INSERT INTO user_activity VALUES ('3a86e3','684edf','20161203 21:16:35'::timestamp,'Harry Smith');
INSERT INTO user_activity VALUES ('595756','582107','20161226 21:16:35'::timestamp,'William Hanson');
INSERT INTO user_activity VALUES ('2c87fe','784723','20161207 21:16:35'::timestamp,'April Cordon');
INSERT INTO user_activity VALUES ('32509a','4957f3','20161220 21:16:35'::timestamp,'Tom Jones');
INSERT INTO user_activity VALUES ('72e703','582107','20170101 21:16:35'::timestamp,'William Hanson');
INSERT INTO user_activity VALUES ('6d658a','582107','20161206 21:16:35'::timestamp,'William Hanson');
INSERT INTO user_activity VALUES ('5c077c','5934c4','20161206 21:16:35'::timestamp,'Sandra Holmes');
INSERT INTO user_activity VALUES ('92142b','57ea5c','20161215 21:16:35'::timestamp,'Lucy Lawless');
INSERT INTO user_activity VALUES ('3dd0a6','5934c4','20161204 21:16:35'::timestamp,'Sandra Holmes');
INSERT INTO user_activity VALUES ('43509a','4957f3','20161120 21:16:35'::timestamp,'Tom Jones');
INSERT INTO user_activity VALUES ('85142b','57ea5c','20161115 21:16:35'::timestamp,'Lucy Lawless');
INSERT INTO user_activity VALUES ('7c87fe','784723','20170107 21:16:35'::timestamp,'April Cordon');
INSERT INTO user_activity VALUES ('9c87fe','784723','20170207 21:16:35'::timestamp,'April Cordon');
COMMIT;

Split rows on different days if summing hours value to given day exceeds midnight

I have a structure like this
+-----+-----+------------+----------+------+----------------------+---+
| Row | id | date | time | hour | description | |
+-----+-----+------------+----------+------+----------------------+---+
| 1 | foo | 2018-03-02 | 19:00:00 | 8 | across single day | |
| 2 | bar | 2018-03-02 | 23:00:00 | 1 | end at midnight | |
| 3 | qux | 2018-03-02 | 10:00:00 | 3 | inside single day | |
| 4 | quz | 2018-03-02 | 23:15:00 | 2 | with minutes | |
+-----+-----+------------+----------+------+----------------------+---+
(I added the description column only to explain the context; it is not needed for the analysis.)
Here is the statement to generate table
WITH table AS (
SELECT "foo" as id, CURRENT_dATE() AS date, TIME(19,0,0) AS time,8 AS hour
UNION ALL
SELECT "bar", CURRENT_dATE(), TIME(23,0,0), 1
UNION ALL
SELECT "qux", CURRENT_dATE(), TIME(10,0,0), 3
UNION ALL
SELECT "quz", CURRENT_dATE(), TIME(23,15,0), 2
)
SELECT * FROM table
After adding the hour value to the given time, I need to split the row into multiple rows if the result falls on the next day.
Jumps across multiple days, such as +27 hours, do NOT need to be considered (this should simplify the scenario).
My initial idea was to add the hours value to a datetime built from the date and time fields, in order to obtain the start and end limits of the interval
SELECT
id,
DATETIME(date, time) AS date_start,
DATETIME_ADD(DATETIME(date, time), INTERVAL hour HOUR) AS date_end
FROM table
here is the result
+-----+-----+---------------------+---------------------+---+
| Row | id | date_start | date_end | |
+-----+-----+---------------------+---------------------+---+
| 1 | foo | 2018-03-02T19:00:00 | 2018-03-03T03:00:00 | |
| 2 | bar | 2018-03-02T23:00:00 | 2018-03-03T00:00:00 | |
| 3 | qux | 2018-03-02T10:00:00 | 2018-03-02T13:00:00 | |
| 4 | quz | 2018-03-02T23:15:00 | 2018-03-03T01:15:00 | |
+-----+-----+---------------------+---------------------+---+
but now I'm stuck on how to proceed from the resulting interval.
Starting from this table, the rows should be split if the day changes, like
+-----+-----+------------+-------------+----------+-------+--+
| Row | id | date | hour_start | hour_end | hours | |
+-----+-----+------------+-------------+----------+-------+--+
| 1 | foo | 2018-03-02 | 19:00:00 | 00:00:00 | 5 | |
| 2 | foo | 2018-03-03 | 00:00:00 | 03:00:00 | 3 | |
| 3 | bar | 2018-03-02 | 23:00:00 | 00:00:00 | 1 | |
| 4 | qux | 2018-03-02 | 10:00:00 | 13:00:00 | 3 | |
| 5 | quz | 2018-03-02 | 23:15:00 | 00:00:00 | 0.75 | |
| 6 | quz | 2018-03-03 | 00:00:00 | 01:15:00 | 1.25 | |
+-----+-----+------------+-------------+----------+-------+--+
I tried to adapt the approach from a similar, already answered question, but I was unable to make it handle the day component as well.
My final scenario will combine both this approach and the one from the other question (split into single days, then split on given hour breaks), but I can tackle the two separately: first the split by day (this question), then the split on time breaks (the other question).
Interesting problem ... I tried the following:
Create a second table creating all the new rows starting at midnight
UNION ALL it with source table while correcting hours of old rows accordingly
Commented Result:
WITH table AS (
SELECT "foo" as id, CURRENT_dATE() AS date, TIME(19,0,0) AS time,8 AS hour
UNION ALL
SELECT "bar", CURRENT_dATE(), TIME(23,0,0), 1
UNION ALL
SELECT "qux", CURRENT_dATE(), TIME(10,0,0), 3
)
,table2 AS (
SELECT
id,
-- create datetime, add hours, then cast as date again
CAST( datetime_add( datetime(date, time), INTERVAL hour HOUR) AS date) date,
time(0,0,0) AS time -- losing minutes and seconds
-- substract hours to midnight
,hour - (24-EXTRACT(HOUR FROM time)) hour
FROM
table
WHERE
date != CAST( datetime_add( datetime(date,time), INTERVAL hour HOUR) AS date) )
SELECT
id
,date
,time
-- correct hour if midnight split
,IF(EXTRACT(hour from time)+hour > 24,24-EXTRACT(hour from time),hour) hour
FROM
table
UNION ALL
SELECT
*
FROM
table2
Hope, it makes sense.
Of course, if you need to consider jumps over multiple days, the correction fails :)
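To make the multi-day case concrete, here is a small Python sketch (not BigQuery) that cuts a start datetime plus a number of hours at every midnight it crosses. The function name split_hours_by_day is hypothetical, and it uses naive datetimes on the assumption that the values behave like BigQuery DATETIME (no time zone).

```python
from datetime import datetime, time, timedelta

def split_hours_by_day(start, hours):
    """Return (date, start_time, end_time, hours) rows, cut at every midnight."""
    end = start + timedelta(hours=hours)
    rows = []
    cursor = start
    while cursor < end:
        next_midnight = datetime.combine(cursor.date() + timedelta(days=1), time.min)
        piece_end = min(end, next_midnight)  # stop at midnight or at the real end
        rows.append((cursor.date(), cursor.time(), piece_end.time(),
                     (piece_end - cursor).total_seconds() / 3600))
        cursor = piece_end
    return rows

for row in split_hours_by_day(datetime(2018, 3, 2, 23, 15), 2):
    print(row)
```

Because the loop simply advances to the next midnight, a 32-hour entry starting at 19:00 yields three rows (5, 24 and 3 hours), and an entry that ends exactly at midnight yields a single row.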
Here is a possible solution I came up with, starting from @Martin Weitzmann's approach.
I handled two cases separately:
ids where there is a "jump" to the next day
ids which stay within the same day
and a final UNION ALL of the two results
I forgot to mention the first time that the hours value of the input can be a float (fractions of hours), so I added that too.
#standardSQL
WITH
input AS (
-- change of day
SELECT "bap" as id, CURRENT_dATE() AS date, TIME(19,0,0) AS time, 8.0 AS hour UNION ALL
-- end at midnight
SELECT "bar", CURRENT_dATE(), TIME(23,0,0), 1.0 UNION ALL
-- inside single day
SELECT "foo", CURRENT_dATE(), TIME(10,0,0), 3.0 UNION ALL
-- change of day with minutes and float hours
SELECT "qux", CURRENT_dATE(), TIME(23,15,0), 2.5 UNION ALL
-- start from midnight
SELECT "quz",CURRENT_dATE(), TIME(0,0,0), 4.5
),
-- Calculate end_date and end_time summing hours value
table AS (
SELECT
id,
date AS start_date,
time AS start_time,
EXTRACT(DATE FROM DATETIME_ADD(DATETIME(date,time), INTERVAL CAST(hour*3600 AS INT64) SECOND)) AS end_date,
EXTRACT(TIME FROM DATETIME_ADD(DATETIME(date,time), INTERVAL CAST(hour*3600 AS INT64) SECOND)) AS end_time
FROM input
),
-- portion that starts at start_time and ends at midnight
start_to_midnight AS (
SELECT
id,
start_time,
start_date,
TIME(23,59,59) as end_time,
start_date as end_date
FROM
table
WHERE end_date > start_date
),
-- portion that starts at midnight and ends at end_time
midnight_to_end AS (
SELECT
id,
TIME(0,0,0) as start_time,
end_date as start_date,
end_time,
end_date
FROM
table
WHERE
end_date > start_date
-- Avoid rows that starts from 0:0:0 and ends to 0:0:0 (original row ends at 0:0:0)
AND end_time != TIME(0,0,0)
)
-- Union of the 3 tables
SELECT
id,
start_date,
start_time,
end_time
FROM (
SELECT id, start_time, end_time, start_date FROM table WHERE start_date = end_date
UNION ALL
SELECT id, start_time, end_time, start_date FROM start_to_midnight
UNION ALL
SELECT id, start_time, end_time, start_date FROM midnight_to_end
)
ORDER BY id,start_date,start_time
Here is the resulting output
+-----+-----+------------+------------+----------+---+
| Row | id | start_date | start_time | end_time | |
+-----+-----+------------+------------+----------+---+
| 1 | bap | 2018-03-03 | 19:00:00 | 23:59:59 | |
| 2 | bap | 2018-03-04 | 00:00:00 | 03:00:00 | |
| 3 | bar | 2018-03-03 | 23:00:00 | 23:59:59 | |
| 4 | foo | 2018-03-03 | 10:00:00 | 13:00:00 | |
| 5 | qux | 2018-03-03 | 23:15:00 | 23:59:59 | |
| 6 | qux | 2018-03-04 | 00:00:00 | 01:45:00 | |
| 7 | quz | 2018-03-03 | 00:00:00 | 04:30:00 | |
+-----+-----+------------+------------+----------+---+

ORDER BY datediff() reversed with unnamed column in mariaDB

Example queries below, can you tell me why they return a different result? Specifically, why the order is reversed.
There's only one difference between the two: in the second query, the datediff in the select clause is named and re-used in the ORDER BY, while in the first one it is not named.
This is with mariadb 10.1.18 as well as 10.2.12.
First query:
select Start_Date, min(End_Date), datediff(min(End_Date), Start_Date)
from (
select Start_Date
from Projects
where Start_Date
not in (select End_Date from Projects)
) a,
(select End_Date
from Projects
where End_Date
not in (select Start_Date from Projects)
) b
where Start_Date < End_Date
group by Start_Date
order by datediff(min(End_Date), Start_Date)
;
+------------+---------------+-------------------------------------+
| Start_Date | min(End_Date) | datediff(min(End_Date), Start_Date) |
+------------+---------------+-------------------------------------+
| 2015-10-01 | 2015-10-04 | 3 |
| 2015-10-13 | 2015-10-15 | 2 |
| 2015-10-28 | 2015-10-29 | 1 |
| 2015-10-30 | 2015-10-31 | 1 |
+------------+---------------+-------------------------------------+
Second query:
select Start_Date, min(End_Date), datediff(min(End_Date), Start_Date) as 'test_diff'
from (
select Start_Date
from Projects
where Start_Date
not in (select End_Date from Projects)
) a,
(select End_Date
from Projects
where End_Date
not in (select Start_Date from Projects)
) b
where Start_Date < End_Date
group by Start_Date
order by test_diff
;
+------------+---------------+-----------+
| Start_Date | min(End_Date) | test_diff |
+------------+---------------+-----------+
| 2015-10-28 | 2015-10-29 | 1 |
| 2015-10-30 | 2015-10-31 | 1 |
| 2015-10-13 | 2015-10-15 | 2 |
| 2015-10-01 | 2015-10-04 | 3 |
+------------+---------------+-----------+
Your second query has
order by test_diff
and your first does not. If you add this line to the first, it will show the same order as the second.
If you change the ORDER BY on the second query to
order by test_diff DESC
it will look like the first, putting the result in descending order.
Sounds like a bug. Please file a bug report.
Meanwhile, the problem can probably be worked around by making a subquery of most of the query, then doing the ORDER BY in the outside query.

SQL: how to select the time range of an employee and group by timeframe interval 30mins

I have three columns, time-in(timestamp), time-out(timestamp) and employee.
I need to get the number of employees that work in a specific timeframe (30min interval). For example:
employee_id timein timeout
101 10:10 12:59
102 9:07 12:16
103 11:16 12:08
I need a query that will give me this result
timeframe count(employee_id)
09:00 1
09:30 1
10:00 2
10:30 2
11:00 3
11:30 3
12:00 3
12:30 1
I really hope I made it clear. Thanks
See this demo: http://sqlfiddle.com/#!17/2477f/1
SELECT x.timeframe, count(employee_id)
FROM (
select time '8:00' + x * interval '30 minute' as timeframe,
time '8:00' + (x+1) * interval '30 minute' as timeframe_end
from generate_series(0,10) x
) x
LEFT JOIN employee t
/* (StartA <= EndB) and (EndA >= StartB) */
ON x.timeframe <= t.timeout
AND x.timeframe_end >= t.timein
GROUP BY x.timeframe
ORDER BY 1
SELECT x.timeframe, count(employee_id)
FROM (
select time '8:00' + x * interval '30 minute' as timeframe,
time '8:00' + (x+1) * interval '30 minute' as timeframe_end
from generate_series(0,12) x
) x
LEFT JOIN employee t
/* (StartA < EndB) and (EndA > StartB) */
ON x.timeframe < t.timeout
AND x.timeframe_end > t.timein
GROUP BY x.timeframe
ORDER BY 1
| timeframe | count |
|-----------|-------|
| 08:00:00 | 0 |
| 08:30:00 | 0 |
| 09:00:00 | 1 |
| 09:30:00 | 1 |
| 10:00:00 | 2 |
| 10:30:00 | 2 |
| 11:00:00 | 3 |
| 11:30:00 | 3 |
| 12:00:00 | 3 |
| 12:30:00 | 1 |
| 13:00:00 | 1 |
| 13:30:00 | 1 |
| 14:00:00 | 0 |
The join condition uses a formula from this answer for checking whether two ranges overlap or not:
(StartA < EndB) and (EndA > StartB)
The demo also shows how the query behaves for edge cases:
(113, '13:00', '13:01'),
(115, '13:30', '14:00')
The latter employee started at 13:30 and finished at 14:00, so they are counted in the 13:30 timeframe but not in the 14:00 timeframe.
| 13:00:00 | 1 |
| 13:30:00 | 1 |
| 14:00:00 | 0 |
A problem might arise with employees who start and finish work multiple times within the same timeframe (workers who take frequent coffee breaks), for example:
(113, '13:00', '13:01'),
(113, '13:12', '13:15'),
(113, '13:22', '13:26')
For such cases you need to count distinct employees, using: count(DISTINCT employee_id)
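The overlap test and the distinct count can also be checked with a few lines of Python. This is just a sketch: count_per_slot and at are made-up helper names, times are parsed onto a dummy date, and the data combines the three employees from the question with the coffee-break employee 113.

```python
from datetime import datetime, timedelta

def at(hhmm):
    """Parse 'HH:MM' onto a dummy date so times can be compared and added."""
    return datetime.strptime(hhmm, "%H:%M")

def count_per_slot(shifts, first_slot, last_slot, minutes=30):
    """shifts: iterable of (employee_id, time_in, time_out)."""
    step = timedelta(minutes=minutes)
    counts = []
    slot = first_slot
    while slot <= last_slot:
        slot_end = slot + step
        # (StartA < EndB) and (EndA > StartB); a set counts each employee once.
        present = {emp for emp, t_in, t_out in shifts
                   if slot < t_out and slot_end > t_in}
        counts.append((slot.strftime("%H:%M"), len(present)))
        slot += step
    return counts

shifts = [
    (101, at("10:10"), at("12:59")),
    (102, at("09:07"), at("12:16")),
    (103, at("11:16"), at("12:08")),
    (113, at("13:00"), at("13:01")),
    (113, at("13:12"), at("13:15")),
]
for timeframe, count in count_per_slot(shifts, at("09:00"), at("13:00")):
    print(timeframe, count)
```

Employee 113 appears twice inside the 13:00 slot but is counted once, which is the effect of count(DISTINCT employee_id).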
Try something like this.
SELECT timeframe,
COUNT (employee_id)
FROM employee a
RIGHT JOIN
(SELECT *
FROM generate_series (TIMESTAMP '2017-09-01 09:00:00',
TIMESTAMP '2017-09-01 17:00:00',
INTERVAL '0.5 HOUR' ) AS timeframe) b
ON b.timeframe >= timein
AND b.timeframe <= timeout
GROUP BY timeframe
ORDER BY timeframe ;
SELECT out_time-in_time time_frame, count(*) FROM
TABLE_NAME GROUP BY out_time-in_time
I tested against some local sample data.
employee_id | in_time | out_time
-------------+----------+----------
101 | 09:07:00 | 12:08:00
102 | 10:07:00 | 17:08:00
103 | 12:07:00 | 17:08:00
104 | 12:07:00 | 17:08:00
105 | 10:07:00 | 17:08:00
Output from the query.
time_frame | count
------------+-------
07:01:00 | 2
03:01:00 | 1
05:01:00 | 2
You can add rounding logic to the difference as needed.
SQL Fiddle
PostgreSQL 9.6 Schema Setup:
CREATE TABLE emp_time
("employee_id" int, "timein" time, "timeout" time)
;
INSERT INTO emp_time
("employee_id", "timein", "timeout")
VALUES
(101, '10:10', '12:59'),
(102, '9:07', '12:16'),
(103, '11:16', '12:08')
;
Query 1:
SELECT
slot_start
, slot_end
, count(employee_id)
FROM (
SELECT slot_start, slot_start + INTERVAL '30 MINUTE' slot_end
FROM generate_series (TIMESTAMP '2017-01-01 09:00:00', TIMESTAMP '2017-01-01 16:30:00', INTERVAL '30 MINUTE' ) AS slot_start
) t
LEFT JOIN emp_time et ON et.timein < t.slot_end::time and et.timeout > t.slot_start::time
GROUP BY
slot_start
, slot_end
ORDER BY
slot_start
, slot_end
;
Results:
| slot_start | slot_end | count |
|----------------------|----------------------|-------|
| 2017-01-01T09:00:00Z | 2017-01-01T09:30:00Z | 1 |
| 2017-01-01T09:30:00Z | 2017-01-01T10:00:00Z | 1 |
| 2017-01-01T10:00:00Z | 2017-01-01T10:30:00Z | 2 |
| 2017-01-01T10:30:00Z | 2017-01-01T11:00:00Z | 2 |
| 2017-01-01T11:00:00Z | 2017-01-01T11:30:00Z | 3 |
| 2017-01-01T11:30:00Z | 2017-01-01T12:00:00Z | 3 |
| 2017-01-01T12:00:00Z | 2017-01-01T12:30:00Z | 3 |
| 2017-01-01T12:30:00Z | 2017-01-01T13:00:00Z | 1 |
| 2017-01-01T13:00:00Z | 2017-01-01T13:30:00Z | 0 |
| 2017-01-01T13:30:00Z | 2017-01-01T14:00:00Z | 0 |
| 2017-01-01T14:00:00Z | 2017-01-01T14:30:00Z | 0 |
| 2017-01-01T14:30:00Z | 2017-01-01T15:00:00Z | 0 |
| 2017-01-01T15:00:00Z | 2017-01-01T15:30:00Z | 0 |
| 2017-01-01T15:30:00Z | 2017-01-01T16:00:00Z | 0 |
| 2017-01-01T16:00:00Z | 2017-01-01T16:30:00Z | 0 |
| 2017-01-01T16:30:00Z | 2017-01-01T17:00:00Z | 0 |