Rolling 12 Month Sum in PostgreSQL

I need to be able to create a Trailing Twelve Month report using SQL (PostgreSQL) - essentially a window/rolling 12 month sum that sums up the current month's totals + the previous 11 months for each month.
I have this table:
CREATE TABLE order_test(
order_id text,
sale_date date,
delivery_date date,
customer_id text,
vendor_id text,
order_total float);
with these values:
insert into order_test
values ('1', '2016-06-01', '2016-06-10', '2', '3', 200.10),
('2', '2016-06-02', '2016-06-11', '2', '4', 150.50),
('3', '2016-07-02', '2016-07-11', '5', '4', 100.50),
('4', '2016-07-02', '2016-07-11', '1', '4', 150.50),
('5', '2016-07-02', '2016-07-11', '1', '4', 150.50),
('6', '2016-08-02', '2016-08-11', '6', '4', 300.50),
('7', '2016-08-02', '2016-08-11', '6', '4', 150.50),
('8', '2016-09-02', '2016-09-11', '1', '4', 150.50),
('9', '2016-10-02', '2016-10-11', '1', '4', 150.50),
('10', '2016-11-02', '2016-11-11', '1', '4', 150.50),
('11', '2016-12-02', '2016-12-11', '6', '4', 150.50),
('12', '2017-01-02', '2017-01-11', '7', '4', 150.50),
('13', '2017-01-02', '2017-01-11', '1', '4', 150.50),
('14', '2017-01-02', '2017-01-11', '1', '4', 100.50),
('15', '2017-02-02', '2017-02-11', '1', '4', 150.50),
('16', '2017-02-02', '2017-02-11', '1', '4', 150.50),
('17', '2017-03-02', '2017-03-11', '2', '4', 150.50),
('18', '2017-03-02', '2017-03-11', '2', '4', 150.50),
('19', '2017-04-02', '2017-04-11', '6', '4', 120.50),
('20', '2017-05-02', '2017-05-11', '1', '4', 150.50),
('21', '2017-06-02', '2017-06-11', '2', '4', 150.50),
('22', '2017-06-02', '2017-06-11', '1', '4', 130.50),
('23', '2017-07-02', '2017-07-11', '1', '4', 150.50),
('24', '2017-07-02', '2017-07-11', '5', '4', 200.50),
('25', '2017-08-02', '2017-08-11', '1', '4', 150.50),
('26', '2017-09-02', '2017-09-11', '2', '4', 100.50),
('27', '2017-09-02', '2017-10-11', '1', '4', 150.50);
These are individual sales. For each month, I need the previous 11 months + that month's total (sale month).
I've tried a window calculation like this:
select date_trunc('month', sale_date) as sale_month,
sum(order_total) over w as total_sales
from order_test
where (delivery_date < current_date) and
(sale_date >= (date_trunc('month', current_date) - interval '1 year'))
window w as (Partition by date_trunc('month', sale_date)
order by sale_date
rows between current row and 11 following)
but it's giving me this:
sale_month           total_sales
2016-09-01 00:00:00  150.5
2016-10-01 00:00:00  150.5
2016-11-01 00:00:00  150.5
2016-12-01 00:00:00  150.5
2017-01-01 00:00:00  401.5
2017-01-01 00:00:00  251
2017-01-01 00:00:00  100.5
2017-02-01 00:00:00  301
2017-02-01 00:00:00  150.5
2017-03-01 00:00:00  301
2017-03-01 00:00:00  150.5
2017-04-01 00:00:00  120.5
2017-05-01 00:00:00  150.5
2017-06-01 00:00:00  281
2017-06-01 00:00:00  130.5
2017-07-01 00:00:00  351
2017-07-01 00:00:00  200.5
2017-08-01 00:00:00  150.5
2017-09-01 00:00:00  100.5
where there should only be one row per month.

In an inner derived table, truncate the sale_date column to month precision using date_trunc and group by the resulting column to get each month's total sales. Then, in the outer query, apply a cumulative window SUM over those monthly totals, ordered by sale_month, to get your desired result:
SELECT sale_month,
       month_total,
       SUM(month_total) OVER (
           ORDER BY sale_month ASC
           ROWS BETWEEN 11 PRECEDING AND CURRENT ROW
       ) AS sum_series
FROM (
    SELECT date_trunc('month', sale_date) AS sale_month,
           SUM(order_total) AS month_total
    FROM order_test
    GROUP BY 1
    ORDER BY 1
) t
Note that AND CURRENT ROW is optional: a frame that specifies only a start bound ends at the current row by default, so the query can be rewritten as:
SELECT sale_month,
       month_total,
       SUM(month_total) OVER (
           ORDER BY sale_month ASC
           ROWS 11 PRECEDING
       ) AS sum_series
FROM (
    SELECT date_trunc('month', sale_date) AS sale_month,
           SUM(order_total) AS month_total
    FROM order_test
    GROUP BY 1
    ORDER BY 1
) t
Result:
sale_month month_total sum_series
----------------------------------------------
2016-06-01T00:00:00Z 350.6 350.6
2016-07-01T00:00:00Z 401.5 752.1
2016-08-01T00:00:00Z 451 1203.1
2016-09-01T00:00:00Z 150.5 1353.6
2016-10-01T00:00:00Z 150.5 1504.1
2016-11-01T00:00:00Z 150.5 1654.6
2016-12-01T00:00:00Z 150.5 1805.1
2017-01-01T00:00:00Z 401.5 2206.6
2017-02-01T00:00:00Z 301 2507.6
2017-03-01T00:00:00Z 301 2808.6
2017-04-01T00:00:00Z 120.5 2929.1
2017-05-01T00:00:00Z 150.5 3079.6
2017-06-01T00:00:00Z 281 3010
2017-07-01T00:00:00Z 351 2959.5
2017-08-01T00:00:00Z 150.5 2659
2017-09-01T00:00:00Z 251 2759.5

If I understand correctly, you want every month to have a rolling total, even though the first 11 months don't have 11 preceding entries; for those months the window simply sums whatever rows are available. So I believe you are looking for something like this:
with x as (
    select date_trunc('month', sale_date) as sale_month,
           sum(order_total) as monthly_order_total
    from order_test
    group by 1
    order by 1 asc
)
select sale_month,
       monthly_order_total,
       sum(monthly_order_total) over (
           order by sale_month asc
           rows between 11 preceding and current row
       )
from x
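One caveat worth noting (not stated in the answers above): ROWS BETWEEN 11 PRECEDING AND CURRENT ROW counts rows, not calendar months, so a month with no sales at all would silently stretch the window beyond 12 calendar months. A sketch that guards against gaps by generating a complete month series first (the date bounds here are hardcoded for the sample data and would normally be derived from the table):

```sql
with months as (
    -- one row per calendar month, with or without sales
    select generate_series(date '2016-06-01', date '2017-09-01',
                           interval '1 month') as sale_month
),
totals as (
    select date_trunc('month', sale_date) as sale_month,
           sum(order_total) as month_total
    from order_test
    group by 1
)
select m.sale_month,
       coalesce(t.month_total, 0) as month_total,
       sum(coalesce(t.month_total, 0)) over (
           order by m.sale_month
           rows between 11 preceding and current row
       ) as sum_series
from months m
left join totals t using (sale_month)
order by m.sale_month;
```

With the sample data every month from 2016-06 to 2017-09 has sales, so this returns the same result as the queries above; it only differs once a month is missing.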

Related

Fetching start date and end date from a timestamp column, partitioned by mode, in SQL

I am trying to solve this: given table1 below, I want the output shown after it, but I can't work out the logic for deriving a start date and an end date from the same `timestemp` column in SQL.
CREATE TABLE table1 (
`batch` INTEGER,
`timestemp` VARCHAR(8),
`mo` INTEGER,
`speed` INTEGER
);
INSERT INTO table1
(`batch`, `timestemp`, `mo`, `speed`)
VALUES
('1', '00:18:00', '0', '0'),
('1', '01:18:00', '0', '0'),
('1', '02:18:00', '0', '0'),
('1', '03:18:00', '1', '5'),
('1', '04:18:00', '1', '6'),
('1', '05:18:00', '1', '7'),
('1', '06:18:00', '2', '10'),
('1', '07:18:00', '2', '9'),
('1', '08:18:00', '2', '8'),
('1', '09:18:00', '3', '12'),
('1', '10:18:00', '3', '23'),
('1', '11:18:00', '3', '21'),
('1', '12:18:00', '4', '20'),
('1', '13:18:00', '4', '22');
(mo = mode)

batch  timestemp  mo  speed
-----  ---------  --  -----
1      00:18:00   0   0
1      01:18:00   0   0
1      02:18:00   0   0
1      03:18:00   1   5
1      04:18:00   1   6
1      05:18:00   1   7
1      06:18:00   2   10
1      07:18:00   2   9
1      08:18:00   2   8
1      09:18:00   3   12
1      10:18:00   3   23
1      11:18:00   3   21
1      12:18:00   4   20
1      13:18:00   4   22
Output:

batch  start time  end time  mode
-----  ----------  --------  ----
1      00:18:00    03:17:00  0
1      03:18:00    06:17:00  1
1      06:18:00    09:17:00  2
1      09:18:00    12:17:00  3
1      12:18:00    13:18:00  4
Schema (MySQL v8.0)
CREATE TABLE table1 (
`batch` INTEGER,
`timestemp` TIME,
`mo` INTEGER,
`speed` INTEGER
);
INSERT INTO table1
(`batch`, `timestemp`, `mo`, `speed`)
VALUES
('1', '00:18:00', '0', '0'),
('1', '01:18:00', '0', '0'),
('1', '02:18:00', '0', '0'),
('1', '03:18:00', '1', '5'),
('1', '04:18:00', '1', '6'),
('1', '05:18:00', '1', '7'),
('1', '06:18:00', '2', '10'),
('1', '07:18:00', '2', '9'),
('1', '08:18:00', '2', '8'),
('1', '09:18:00', '3', '12'),
('1', '10:18:00', '3', '23'),
('1', '11:18:00', '3', '21'),
('1', '12:18:00', '4', '20'),
('1', '13:18:00', '4', '22');
Query (group by batch and mode to get each mode's first and last timestamps, then use LEAD to end each interval one minute before the next mode starts, falling back to the group's own max timestamp for the last mode):
SELECT batch
, mode
, start_time
, COALESCE(SUBTIME(LEAD(start_time) OVER (ORDER BY start_time), '00:01:00'), end_time) end_time
FROM (
SELECT batch
, min(timestemp) start_time
, max(timestemp) end_time
, mo mode
FROM table1
GROUP BY batch, mo
) min_max;
batch  mode  start_time  end_time
-----  ----  ----------  --------
1      0     00:18:00    03:17:00
1      1     03:18:00    06:17:00
1      2     06:18:00    09:17:00
1      3     09:18:00    12:17:00
1      4     12:18:00    13:18:00

How can I get a query displayed monthly with a subquery

I have a query with a subquery. It gives me the complete result for the year. How do I get it broken down by month? I've tried a few things but always get an error message.
Here is my query:
SELECT
masch_nr, SUM(dauer) AS Prozess_Verfügbarkeit,
(SELECT SUM(dauer)
FROM [hydra1].[hydadm].[ereignis]
WHERE YEAR(begin_ts) = YEAR(CURRENT_TIMESTAMP)
AND masch_nr = 'FIMI1'
AND bmktonr IN ('1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11')) AS Verfügbarkeit
FROM
[hydra1].[hydadm].[ereignis]
WHERE
YEAR(begin_ts) = YEAR(CURRENT_TIMESTAMP)
AND masch_nr = 'FIMI1'
AND bmktonr IN ('7', '11')
GROUP BY
masch_nr
The result should look like this:
Month | Prozess_Verfügbarkeit | Verfügbarkeit
------+-----------------------+--------------
1 | 344 | 4556
2 | 445 | 5654
Thank you
You can probably simplify this by using conditional aggregation. Note that the WHERE clause below filters begin_ts with a plain date range rather than wrapping it in YEAR(), which keeps the predicate index-friendly:
SELECT
YEAR(begin_ts) AS [Year]
, MONTH(begin_ts) AS [Month]
, masch_nr
, SUM(CASE WHEN bmktonr IN ('7', '11')
THEN dauer END) AS Prozess_Verfügbarkeit
, SUM(CASE WHEN bmktonr IN ('1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11')
THEN dauer END) AS Verfügbarkeit
FROM [hydra1].[hydadm].[ereignis]
WHERE masch_nr = 'FIMI1'
AND begin_ts >= DATEADD(YEAR,DATEDIFF(YEAR,0,CURRENT_TIMESTAMP),0)
AND begin_ts < DATEADD(YEAR,DATEDIFF(YEAR,0,CURRENT_TIMESTAMP)+1,0)
GROUP BY YEAR(begin_ts), MONTH(begin_ts), masch_nr
ORDER BY [Year], [Month], masch_nr

How to get distinct count for last x weeks data but group by week in redshift?

I have the query below; when I run it, it gives me a single count for the current month, where dates_for_week holds all the dates for last week, Sunday through Saturday.
select COUNT(DISTINCT(CLIENTID))
FROM process.data
where type = 'pots'
and stype= 'kites'
and tires IN ('abc', 'def', 'ghi', 'jkl')
and comp IN ('data', 'hello', 'world')
AND year = '2020'
-- this is for month october but week 43
and (month = '10' and dates_for_week IN ('18', '19', '20', '21', '22', '23', '24'))
As of now the output I see is this -
Count
-----
982
Now I am trying to make this query dynamic such that it can give me count for past 6 weeks something like below as an output:
Count Week
------------
982 W43
123 W42
126 W41
127 W40
128 W39
129 W38
I was able to make the query dynamic so that it gives me the count for the current month (October) and the previous week (43), and it works fine, as shown below. But I am not sure how to change it so it returns data for all of the past 6 weeks in the output format above. It looks like I also need to change the month dynamically for some weeks.
select COUNT(DISTINCT(CLIENTID))
FROM process.data
where type = 'pots'
and stype= 'kites'
and tires IN ('abc', 'def', 'ghi', 'jkl')
and comp IN ('data', 'hello', 'world')
AND year = '2020'
-- this is for month october but week 43
and (
month = extract(month from current_date)
and dates_for_week IN (
select
date_part('d',((DATE_TRUNC('week', CURRENT_DATE) - 9) + row_number() over (order by true))::date)
from process.data
limit 7
)
)
So what I need is the same count for each of the last 6 weeks, grouped by week, as shown above. Is this possible to do by any chance? Hard-coded, the six weekly filters would look like this:
and (month = '10' and dates_for_week IN ('18', '19', '20', '21', '22', '23', '24'))
and (month = '10' and dates_for_week IN ('11', '12', '13', '14', '15', '16', '17'))
and (month = '10' and dates_for_week IN ('4', '5', '6', '7', '8', '9', '10'))
and (month IN ('9', '10') and dates_for_week IN ('27', '28', '29', '30', '1', '2', '3'))
and (month = '9' and dates_for_week IN ('20', '21', '22', '23', '24', '25', '26'))
and (month = '9' and dates_for_week IN ('13', '14', '15', '16', '17', '18', '19'))
You have years, months and days in separate columns, if I understand correctly. I think the easiest way is to build a proper date column and then work with that column.
The following query should give you the last 6 weeks including the current week.
select
EXTRACT(week from TO_DATE(year||'-'||month||'-'|| dates_for_week,'YYYY-MM-DD')) week_num
,COUNT(DISTINCT(CLIENTID))
FROM process.data
where type = 'pots'
and stype= 'kites'
and tires IN ('abc', 'def', 'ghi', 'jkl')
and comp IN ('data', 'hello', 'world')
and TO_DATE(year||'-'||month||'-'|| dates_for_week,'YYYY-MM-DD') >= DATEADD(day,-42,DATE_TRUNC('week', sysdate))
GROUP BY 1
ORDER BY 1 desc
However, there might be a challenge, as weeks start on a Monday in Redshift, so you might need a slight manipulation (adding one day):
select
EXTRACT(week from DATEADD(day,1,TO_DATE(year||'-'||month||'-'|| dates_for_week,'YYYY-MM-DD'))) week_num
,COUNT(DISTINCT(CLIENTID))
FROM process.data
where type = 'pots'
and stype= 'kites'
and tires IN ('abc', 'def', 'ghi', 'jkl')
and comp IN ('data', 'hello', 'world')
and DATEADD(day,1,TO_DATE(year||'-'||month||'-'|| dates_for_week,'YYYY-MM-DD')) BETWEEN DATEADD(day,-42,DATE_TRUNC('week', sysdate)) AND DATEADD(day,-1,DATE_TRUNC('week', sysdate))
GROUP BY 1
ORDER BY 1 desc
Debugging:
I would start by running this query, to check that the date is built correctly:
select COUNT(DISTINCT(CLIENTID))
FROM process.data
where type = 'pots'
and stype= 'kites'
and tires IN ('abc', 'def', 'ghi', 'jkl')
and comp IN ('data', 'hello', 'world')
AND year = '2020'
-- this is for month october but week 43
and TO_DATE(year||'-'||month||'-'|| dates_for_week,'YYYY-MM-DD') between '2020-10-18' and '2020-10-24'
Afterwards I would see if the week is calculated correctly:
select
EXTRACT(week from DATEADD(day,1,TO_DATE(year||'-'||month||'-'|| dates_for_week,'YYYY-MM-DD'))) week_num
,COUNT(DISTINCT(CLIENTID))
FROM process.data
where type = 'pots'
and stype= 'kites'
and tires IN ('abc', 'def', 'ghi', 'jkl')
and comp IN ('data', 'hello', 'world')
AND year = '2020'
-- this is for month october but week 43
and TO_DATE(year||'-'||month||'-'|| dates_for_week,'YYYY-MM-DD') between '2020-10-18' and '2020-10-24'
group by 1
order by 1
And last but not least I would extend the timeframe and make it dynamic:
select
EXTRACT(week from DATEADD(day,1,TO_DATE(year||'-'||month||'-'|| dates_for_week,'YYYY-MM-DD'))) week_num
,COUNT(DISTINCT(CLIENTID))
FROM process.data
where type = 'pots'
and stype= 'kites'
and tires IN ('abc', 'def', 'ghi', 'jkl')
and comp IN ('data', 'hello', 'world')
AND year = '2020'
-- this is for month october but week 43
and DATEADD(day,1,TO_DATE(year||'-'||month||'-'|| dates_for_week,'YYYY-MM-DD')) Between DATEADD(day,-42,DATE_TRUNC('week', sysdate)) and DATEADD(day,-1,DATE_TRUNC('week', sysdate))
group by 1
order by 1
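If you also want the W43-style labels from the desired output, a small final tweak (a sketch, assuming the same derived date expression as in the query above; the week_label alias is mine) would be:

```sql
select 'W' || EXTRACT(week from DATEADD(day, 1,
           TO_DATE(year||'-'||month||'-'||dates_for_week, 'YYYY-MM-DD')))::varchar as week_label,
       COUNT(DISTINCT CLIENTID)
FROM process.data
where type = 'pots'
  and stype = 'kites'
  and tires IN ('abc', 'def', 'ghi', 'jkl')
  and comp IN ('data', 'hello', 'world')
  and DATEADD(day, 1, TO_DATE(year||'-'||month||'-'||dates_for_week, 'YYYY-MM-DD'))
      BETWEEN DATEADD(day, -42, DATE_TRUNC('week', sysdate))
          AND DATEADD(day, -1, DATE_TRUNC('week', sysdate))
group by 1
order by 1 desc
```

Note the year = '2020' filter is intentionally dropped here, since a 6-week span can cross a year boundary.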
Assuming that you have some kind of date column, you can simply use something like this:
select date_part(w, your_date_column) as week_number,
COUNT(DISTINCT(CLIENTID))
FROM process.data
where type = 'pots'
and stype= 'kites'
and tires IN ('abc', 'def', 'ghi', 'jkl')
and comp IN ('data', 'hello', 'world')
AND year = '2020'
group by 1
You could use order by and limit:
select year, week, COUNT(DISTINCT CLIENTID)
from process.data
where type = 'pots' and
stype= 'kites' and
tires IN ('abc', 'def', 'ghi', 'jkl') and
comp IN ('data', 'hello', 'world')
group by year, week
order by year desc, week desc
limit 6;
This is assuming that you have a week column, which seems like a reasonable assumption.
This is a simple way to accomplish what you want to do. I am guessing that on Redshift it should have decent performance.

Oracle - Distinct with a Group by date

I have a table like this in an Oracle 11g database:
TABLE REC
REC_ID NUMBER
CARD_ID NUMBER
REC_STATUS NUMBER
REC_DATE TIMESTAMP
I want to know how many cards have recharged in a given period, but I want this information grouped by date.
If a card_id has already been counted on one date, it should not be counted again on a later date (distinct).
Here some test data
with
REC ( REC_ID, CARD_ID, REC_STATUS, REC_DATE ) as (
select '1', '100', '1', SYSTIMESTAMP - 5 from dual union all
select '2', '100', '1', SYSTIMESTAMP - 5 from dual union all
select '3', '200', '1', SYSTIMESTAMP - 5 from dual union all
select '4', '100', '1', SYSTIMESTAMP - 4 from dual union all
select '5', '300', '1', SYSTIMESTAMP - 4 from dual union all
select '6', '200', '1', SYSTIMESTAMP - 4 from dual union all
select '7', '100', '1', SYSTIMESTAMP - 3 from dual union all
select '8', '400', '1', SYSTIMESTAMP - 3 from dual union all
select '9', '400', '1', SYSTIMESTAMP - 3 from dual union all
select '10', '400', '1', SYSTIMESTAMP - 2 from dual union all
select '11', '300', '1', SYSTIMESTAMP - 2 from dual union all
select '12', '100', '1', SYSTIMESTAMP - 2 from dual union all
select '13', '400', '1', SYSTIMESTAMP - 2 from dual
)
-- end of test data
When I execute the query like this, I get a total count of 4, which is correct.
SELECT
COUNT(DISTINCT CARD_ID) COUNT
FROM REC
WHERE REC_STATUS = 1
Result
|COUNT|
| 4 |
But when I try this way, I get a total count of 10. This is because with a GROUP BY on the date, the DISTINCT applies only within each grouped date, not across the whole period.
SELECT
TRUNC(REC_DATE) DAT,
COUNT(DISTINCT CARD_ID) COUNT
FROM REC
WHERE REC_STATUS = 1
GROUP BY TRUNC(REC_DATE)
Result
DAT | COUNT
14/06/19 | 2
13/06/19 | 3
15/06/19 | 3
12/06/19 | 2
What can I do to apply the distinct to the period total and still keep the grouping by date?
Perhaps you just want the earliest date for each card:
SELECT TRUNC(MIN_REC_DATE) as DAT,
COUNT(*) as COUNT
FROM (SELECT CARD_ID, MIN(REC_DATE) as MIN_REC_DATE
FROM REC
WHERE REC_STATUS = 1
GROUP BY CARD_ID
) R
GROUP BY TRUNC(MIN_REC_DATE);
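An equivalent sketch using ROW_NUMBER instead of the MIN/GROUP BY subquery, keeping only each card's first recharge row (the RN alias is mine):

```sql
SELECT TRUNC(REC_DATE) AS DAT,
       COUNT(*) AS COUNT
FROM (
    SELECT REC_DATE,
           -- number each card's recharges from the earliest onward
           ROW_NUMBER() OVER (PARTITION BY CARD_ID ORDER BY REC_DATE) AS RN
    FROM REC
    WHERE REC_STATUS = 1
)
WHERE RN = 1
GROUP BY TRUNC(REC_DATE);
```

Either way, each card is attributed to exactly one date, so the per-date counts sum to the overall distinct total of 4.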

Postgresql. Merge and split date ranges from two tables by set of keys

I'm trying to combine multiple date ranges from two tables with the same structure and the same or different data (PostgreSQL 9.*).
Table structure:
CREATE TABLE "first_activities" (
"id" int4 NOT NULL DEFAULT nextval('first_activities_id_seq'::regclass),
"start_time" timestamptz,
"end_time" timestamptz,
"activity_type" int2,
"user_id" int4
)
WITH (OIDS=FALSE);
ALTER TABLE "first_activities" ADD PRIMARY KEY ("id") NOT DEFERRABLE INITIALLY IMMEDIATE;
CREATE TABLE "second_activities" (
"id" int4 NOT NULL DEFAULT nextval('second_activities_id_seq'::regclass),
"start_time" timestamptz,
"end_time" timestamptz,
"activity_type" int2,
"user_id" int4
)
WITH (OIDS=FALSE);
ALTER TABLE "second_activities" ADD PRIMARY KEY ("id") NOT DEFERRABLE INITIALLY IMMEDIATE;
Data in First table:
INSERT INTO "first_activities" VALUES
(NULL, '2014-10-31 01:00:00', '2014-10-31 02:00:00', '3', '1'),
(NULL, '2014-10-31 02:00:00', '2014-10-31 03:00:00', '4', '1'),
(NULL, '2014-10-31 03:00:00', '2014-10-31 04:00:00', '2', '1'),
(NULL, '2014-10-31 04:30:00', '2014-10-31 05:00:00', '3', '1'),
(NULL, '2014-10-31 05:30:00', '2014-11-01 06:00:00', '4', '1'),
(NULL, '2014-11-01 06:30:00', '2014-11-01 07:00:00', '2', '1'),
(NULL, '2014-11-01 07:30:00', '2014-11-01 08:00:00', '1', '1'),
(NULL, '2014-11-01 08:00:00', '2014-11-01 09:00:00', '3', '1'),
(NULL, '2014-11-01 09:00:00', '2014-11-02 10:00:00', '4', '1'),
(NULL, '2014-08-27 10:00:00', '2014-08-27 11:00:00', '2', '1'),
(NULL, '2014-08-27 11:00:00', '2014-08-27 12:00:00', '1', '1');
Data in Second table:
INSERT INTO "second_activities" VALUES
(NULL, '2014-10-31 01:00:00', '2014-10-31 02:00:00', '3', '1'),
(NULL, '2014-10-31 02:00:00', '2014-10-31 03:00:00', '4', '1'),
-- Differece from first table
(NULL, '2014-10-31 03:30:00', '2014-10-31 04:00:00', '1', '1'),
(NULL, '2014-10-31 04:25:00', '2014-10-31 04:35:00', '3', '1'),
(NULL, '2014-10-31 04:45:00', '2014-10-31 05:35:00', '3', '1'),
-- End of Difference from first table
(NULL, '2014-08-27 10:00:00', '2014-08-27 11:00:00', '2', '1'),
(NULL, '2014-08-27 11:00:00', '2014-08-27 12:00:00', '1', '1');
How can I filter the result set, starting from this query:
SELECT * FROM first_activities UNION ALL SELECT * FROM second_activities
ORDER BY start_time ASC;
to get the final result below?
Final Result:
-- merge rows that share the same user_id and activity_type,
-- and split intersecting ranges whose user_id or activity_type differ
-- start_time end_time type user_id
'2014-10-31 01:00:00', '2014-10-31 02:00:00', '3', '1');
'2014-10-31 02:00:00', '2014-10-31 03:00:00', '4', '1');
-- rows don't merge; split at the range intersection
'2014-10-31 03:00:00', '2014-10-31 03:30:00', '2', '1'); -- from first table
'2014-10-31 03:30:00', '2014-10-31 04:00:00', '1', '1'); -- from second table
-- data merged by same user_id and activity_type
'2014-10-31 04:25:00', '2014-10-31 05:35:00', '3', '1');
'2014-10-31 05:30:00', '2014-11-01 06:00:00', '4', '1');
'2014-11-01 06:30:00', '2014-11-01 07:00:00', '2', '1');
'2014-11-01 07:30:00', '2014-11-01 08:00:00', '1', '1');
'2014-11-01 08:00:00', '2014-11-01 09:00:00', '3', '1');
'2014-11-01 09:00:00', '2014-11-02 10:00:00', '4', '1');
'2014-08-27 10:00:00', '2014-08-27 11:00:00', '2', '1');
'2014-08-27 11:00:00', '2014-08-27 12:00:00', '1', '1');
The issue reduces to the question of how to combine (compact) a group of adjacent or overlapping ranges into one. I had to deal with this some time ago and found it a bit complicated in plain SQL. There is a simple solution using a loop in plpgsql, but I also found a general solution using a custom aggregate.
The function compact_ranges(anyrange, anyrange) returns the union of the two ranges if they are adjacent or overlapping, and the second range otherwise:
create or replace function compact_ranges(anyrange, anyrange)
returns anyrange language sql as $$
select case
when $1 && $2 or $1 -|- $2 then $1 + $2
else $2
end
$$;
create aggregate compact_ranges_agg (anyrange) (
sfunc = compact_ranges,
stype = anyrange
);
The aggregate has a narrow scope of use: it must be called as a running window aggregate, as in this example:
with test(rng) as (
values
('[ 1, 2)'::int4range),
('[ 3, 7)'), -- group 1
('[ 5, 10)'), -- group 1
('[ 6, 8)'), -- group 1
('[11, 17)'), -- group 2
('[12, 16)'), -- group 2
('[15, 16)'), -- group 2
('[18, 19)')
)
select distinct on (lower(new_rng)) new_rng
from (
select *, compact_ranges_agg(rng) over (order by rng) new_rng
from test
) s
order by lower(new_rng), new_rng desc;
new_rng
---------
[1,2)
[3,10)
[11,17)
[18,19)
(4 rows)
In the same way you can use it for your tables:
with merged as (
select tstzrange(start_time, end_time) rng, activity_type, user_id
from first_activities
union
select tstzrange(start_time, end_time) rng, activity_type, user_id
from second_activities
),
compacted as (
select distinct on (user_id, activity_type, lower(new_rng))
lower(new_rng) start_time,
upper(new_rng) end_time,
activity_type,
user_id
from (
select
user_id, activity_type,
compact_ranges_agg(rng) over (partition by user_id, activity_type order by rng) new_rng
from merged
) s
order by user_id, activity_type, lower(new_rng), new_rng desc
)
select
start_time,
case when end_time > lead(start_time) over w then lead(start_time) over w else end_time end,
activity_type,
user_id
from compacted
window w as (order by start_time)
order by start_time;
The result:
start_time | end_time | activity_type | user_id
------------------------+------------------------+---------------+---------
2014-08-27 10:00:00+02 | 2014-08-27 11:00:00+02 | 2 | 1
2014-08-27 11:00:00+02 | 2014-08-27 12:00:00+02 | 1 | 1
2014-10-31 01:00:00+01 | 2014-10-31 02:00:00+01 | 3 | 1
2014-10-31 02:00:00+01 | 2014-10-31 03:00:00+01 | 4 | 1
2014-10-31 03:00:00+01 | 2014-10-31 03:30:00+01 | 2 | 1
2014-10-31 03:30:00+01 | 2014-10-31 04:00:00+01 | 1 | 1
2014-10-31 04:25:00+01 | 2014-10-31 05:30:00+01 | 3 | 1
2014-10-31 05:30:00+01 | 2014-11-01 06:00:00+01 | 4 | 1
2014-11-01 06:30:00+01 | 2014-11-01 07:00:00+01 | 2 | 1
2014-11-01 07:30:00+01 | 2014-11-01 08:00:00+01 | 1 | 1
2014-11-01 08:00:00+01 | 2014-11-01 09:00:00+01 | 3 | 1
2014-11-01 09:00:00+01 | 2014-11-02 10:00:00+01 | 4 | 1
(12 rows)