Group rows with start/end date by month in PostgreSQL - sql

I have a database with a tbl_registration with rows that look like
ID | start_date_time | end_date_time | ...
1 | 2021-01-01 14:00:15 | 2021-01-01 14:00:15
2 | 2021-02-01 14:00:15 | null
4 | 2021-05-15 14:00:15 | 2024-01-01 14:00:15
5 | 2019--15 14:00:15 | 2024-01-01 14:00:15
endDate can be null
It contains 500,000 - 1,000,000 records
We want to create an overview of the year, grouped by month, showing the number of records active in each month. A registration is counted for a month if it lies (at least partially) in that month, based on its start and end date.
I can do a query per month like this
select count(id)
from tbl_registration r
where
(r.end_date_time >= to_timestamp('01/01/2021 00:00:00', 'DD/MM/YYYY HH24:MI:SS') or r.end_date_time is null)
and r.start_date_time < to_timestamp('01/02/2021 00:00:00', 'DD/MM/YYYY HH24:MI:SS');
But that forces me to repeat this query 12 times.
I don't see a creative way to solve this in one query that would give me as a result 12 rows, one for each month
I've been looking at the generate_series function, but I don't see how I can group on the comparison of those start- and end dates

Postgres supports generate_series() . . . so generate the dates you want and then construct the query. One method is:
select gs.mon, x.cnt
from generate_series('2021-01-01'::date, '2021-12-01'::date, interval '1 month') gs(mon) left join lateral
     (select count(*) as cnt
      from tbl_registration r
      where (r.end_date_time >= gs.mon or r.end_date_time is null) and
            r.start_date_time < gs.mon + interval '1 month'
     ) x
     on 1=1;
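The overlap predicate in the lateral subquery is easy to get wrong, so it can help to check it outside the database first. Below is a rough Python sketch (not part of the answer; the rows and helper names are made up for illustration) of the same test: a registration counts in a month if it starts before the next month and has not ended before the month begins.

```python
from datetime import date

# Sample registrations: (id, start_date, end_date); end_date may be None.
rows = [
    (1, date(2021, 1, 1), date(2021, 1, 1)),
    (2, date(2021, 2, 1), None),
    (4, date(2021, 5, 15), date(2024, 1, 1)),
]

def next_month(d):
    # First day of the month after d.
    return date(d.year + (d.month == 12), d.month % 12 + 1, 1)

def active_in_month(start, end, month_start):
    # Same predicate as the SQL: the registration overlaps the month
    # if it starts before the next month and has not ended before
    # the month begins (a None end date means "still active").
    return start < next_month(month_start) and (end is None or end >= month_start)

counts = {m: sum(active_in_month(s, e, date(2021, m, 1)) for _, s, e in rows)
          for m in range(1, 13)}
```

Row 2 has a null end date and therefore stays active from February onward, which is exactly what the `or r.end_date_time is null` branch handles in the SQL version.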

Related

How can I split 1 row into multiple rows in SQL

I want to split something like this:
Value | Startdate | Enddate
XXXX | 2.July | 16 August
Into this:
Value | Startdate | Enddate
XXXX | 2.July | 31 July
XXXX | 1.August | 16 August
The value is not important for now.
If I understand correctly, you want to split your range into different months. A convenient method uses generate_series():
select value, greatest(startdate, gs.mon), least(enddate, gs.mon + interval '1 month - 1 day')
from t cross join lateral
generate_series(date_trunc('month', startdate), date_trunc('month', enddate), interval '1 month'
) gs(mon)
Here is a db<>fiddle.
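The greatest()/least() clipping in this answer can be mirrored in plain Python, which makes it easier to see what each generated month contributes. A small sketch (the helper names are hypothetical), using the question's July-August example:

```python
from datetime import date, timedelta

def month_start(d):
    # First day of d's month.
    return d.replace(day=1)

def next_month(d):
    # First day of the month after d.
    return date(d.year + (d.month == 12), d.month % 12 + 1, 1)

def split_by_month(start, end):
    # For each month the range touches, clip the range to that month:
    # the same effect as greatest(startdate, gs.mon) and
    # least(enddate, gs.mon + interval '1 month - 1 day').
    m = month_start(start)
    while m <= end:
        seg_start = max(start, m)
        seg_end = min(end, next_month(m) - timedelta(days=1))
        yield seg_start, seg_end
        m = next_month(m)

segments = list(split_by_month(date(2018, 7, 2), date(2018, 8, 16)))
```

For the July 2 - August 16 range this yields two segments, (July 2, July 31) and (August 1, August 16), matching the desired output above.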

For each month, count entries with interval

I'm having a hard time creating statistics with the sum of ongoing subscriptions per month
i have table subscriptions
id | created_at | cancelled_at
----------------------------------------
1 | 2020-12-29 13:56:12 | null
2 | 2021-02-15 01:06:25 | 2021-04-21 19:35:31
3 | 2021-03-22 02:42:19 | null
4 | 2021-04-21 19:35:31 | null
and statistics should look as follows:
month | count
---------------
12/2020 | 1 -- #1
01/2021 | 1 -- #1
02/2021 | 2 -- #1 + #2
03/2021 | 3 -- #1 + #2 + #3
04/2021 | 3 -- #1 + #3 + #4, not #2 since it ends that month
05/2021 | 3 -- #1 + #3 + #4
So far I was able to make a list of all the months I need the stats for:
select generate_series(min, max, '1 month') as "month"
from (
select date_trunc('month', min(created_at)) as min,
now() as max
from subscriptions
) months;
and get the right number of subscriptions for a specific month:
select sum(
case
when
make_date(2021, 04, 1) >= date_trunc('month', created_at)
and make_date(2021, 04, 1) < date_trunc('month', coalesce(cancelled_at, now() + interval '1 month'))
then 1
else 0
end
) as total
from subscriptions
-- returns 3
But I am struggling to combine those together... would OVER (which I am inexperienced with) be of any use for me? I found Count cumulative total in Postgresql, but that's a different case (the dates are fixed)... or is the proper approach to use a function with FOR somehow?
You can use generate_series() to generate the months and then a correlated subquery to calculate the actives:
select yyyymm,
(select count(*)
from subscriptions s
where s.created_at < gs.yyyymm + interval '1 month' and
(s.cancelled_at > gs.yyyymm + interval '1 month' or s.cancelled_at is null)
) as count
from generate_series('2020-12-01'::date, '2021-05-01'::date, interval '1 month'
) gs(yyyymm);
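To double-check the correlated subquery against the expected table, here is a small Python sketch (not part of the answer) that applies the same "created before the month ends and not cancelled before the month ends" predicate to the four sample subscriptions:

```python
from datetime import datetime

# Sample subscriptions from the question: (id, created_at, cancelled_at).
subs = [
    (1, datetime(2020, 12, 29, 13, 56, 12), None),
    (2, datetime(2021, 2, 15, 1, 6, 25), datetime(2021, 4, 21, 19, 35, 31)),
    (3, datetime(2021, 3, 22, 2, 42, 19), None),
    (4, datetime(2021, 4, 21, 19, 35, 31), None),
]

def active(created, cancelled, year, month):
    # Same predicate as the correlated subquery: created before the
    # month ends, and cancellation (if any) strictly after the month ends.
    month_end = datetime(year + (month == 12), month % 12 + 1, 1)
    return created < month_end and (cancelled is None or cancelled > month_end)

months = [(2020, 12), (2021, 1), (2021, 2), (2021, 3), (2021, 4), (2021, 5)]
counts = [sum(active(c, x, y, m) for _, c, x in subs) for y, m in months]
```

This reproduces the expected column: subscription #2 drops out of 04/2021 because its cancellation date is not after the end of that month.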

Database Query to generate a Time-based Chart

I have a logins table in the following (simplified) structure:
id | login_time
---------
1 | 2019-02-04 18:14:30.026361+00
2 | 2019-02-04 22:10:19.720065+00
3 | 2019-02-06 15:51:53.799014+00
Now I want to generate chart like this:
https://prnt.sc/mifz6y
Basically I want to show the logins within the past 48 hours.
My current query:
SELECT count(*), date_trunc('hour', login_time) as time_trunced FROM user_logins
WHERE login_time > now() - interval '48' hour
GROUP BY time_trunced
ORDER BY time_trunced DESC
This works as long as there are entries for every hour. However, if in some hour there were no logins, there will be no entry selected, like this:
time_trunced | count
---------------------
12:00 | 1
13:00 | 2
15:00 | 3
16:00 | 5
I would need a continuous query, so that I can simply put the count values into an array:
time_trunced | count
---------------------
12:00 | 1
13:00 | 2
14:00 | 0 <-- This is missing
15:00 | 3
16:00 | 5
Based on that I can simply transform the query result into an array like [1, 2, 0, 3, 5] and pass that to my frontend.
Is this possible with PostgreSQL? Or do I need to implement my own logic?
I think I would do:
select gs.h, count(ul.login_time)
from generate_series(
date_trunc('hour', now() - interval '48 hour'),
date_trunc('hour', now()),
interval '1 hour'
) gs(h) left join
user_logins ul
on ul.login_time >= gs.h and
ul.login_time < gs.h + interval '1 hour'
group by gs.h
order by gs.h;
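The effect of the generate_series() LEFT JOIN above, namely that empty hours still appear with a count of 0, can be imitated in Python with a dictionary and an hourly loop. A rough sketch with made-up login times (one at 12:xx, two at 13:xx, none at 14:xx, three at 15:xx):

```python
from datetime import datetime, timedelta

# Hypothetical login timestamps for illustration.
logins = [
    datetime(2019, 2, 4, 12, 14), datetime(2019, 2, 4, 13, 10),
    datetime(2019, 2, 4, 13, 40), datetime(2019, 2, 4, 15, 51),
    datetime(2019, 2, 4, 15, 52), datetime(2019, 2, 4, 15, 53),
]

def hour_trunc(t):
    # Equivalent of date_trunc('hour', ...).
    return t.replace(minute=0, second=0, microsecond=0)

# Count logins per truncated hour.
per_hour = {}
for t in logins:
    h = hour_trunc(t)
    per_hour[h] = per_hour.get(h, 0) + 1

# Walk every hour in the window so missing hours become 0 --
# the same gap-filling the generate_series() LEFT JOIN performs.
series = []
h = hour_trunc(min(logins))
end = hour_trunc(max(logins))
while h <= end:
    series.append(per_hour.get(h, 0))
    h += timedelta(hours=1)
```

The 14:00 bucket shows up as 0 even though no row exists for it, which is exactly the behavior the GROUP BY alone could not provide.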
This can almost certainly be tidied up a bit but should give you some ideas. Props to clamp for the generate_series() tip:
SELECT t.time_trunced,coalesce(l.login_count,0) as logins
FROM
(
-- Generate an inline view with all hours between the min & max values in user_logins table
SELECT date_trunc('hour',a.min_time)+ interval '1h' * b.hr_offset as time_trunced
FROM (select min(login_time) as min_time from user_logins) a
JOIN (select generate_series(0,(select ceil((EXTRACT(EPOCH FROM max(login_time))-EXTRACT(EPOCH FROM min(login_time)))/3600) from user_logins)::int) as hr_offset) b on true
) t
LEFT JOIN
(
-- OP's original query tweaked a bit
SELECT count(*) as login_count, date_trunc('hour', login_time) as time_trunced
FROM user_logins
GROUP BY time_trunced
) l on t.time_trunced=l.time_trunced
order BY 1 desc;

SQLite: Sum of differences between two dates group by every date

I have a SQLite database with start and stop datetimes
With the following SQL query I get the difference hours between start and stop:
SELECT starttime, stoptime, cast((strftime('%s',stoptime)-strftime('%s',starttime)) AS real)/60/60 AS diffHours FROM tracktime;
I need a SQL query, which delivers the sum of multiple timestamps, grouped by every day (also whole dates between timestamps).
The result should be something like this:
2018-08-01: 12 hours
2018-08-02: 24 hours
2018-08-03: 12 hours
2018-08-04: 0 hours
2018-08-05: 1 hours
2018-08-06: 14 hours
2018-08-07: 8 hours
You can try this: use a recursive CTE (WITH RECURSIVE) to build a calendar table containing a row for every date between each start time and stop time, then do the calculation per day.
Schema (SQLite v3.18)
CREATE TABLE tracktime(
id int,
starttime timestamp,
stoptime timestamp
);
insert into tracktime values
(11,'2018-08-01 12:00:00','2018-08-03 12:00:00');
insert into tracktime values
(12,'2018-09-05 18:00:00','2018-09-05 19:00:00');
Query #1
WITH RECURSIVE cte AS (
select id,starttime,date(starttime,'+1 day') totime,stoptime
from tracktime
UNION ALL
SELECT id,
date(starttime,'+1 day'),
date(totime,'+1 day'),
stoptime
FROM cte
WHERE date(starttime,'+1 day') < stoptime
)
SELECT strftime('%Y-%m-%d', starttime),(strftime('%s',CASE
WHEN totime > stoptime THEN stoptime
ELSE totime
END) -strftime('%s',starttime))/3600 diffHour
FROM cte;
| strftime('%Y-%m-%d', starttime) | diffHour |
| ------------------------------- | -------- |
| 2018-08-01 | 12 |
| 2018-09-05 | 1 |
| 2018-08-02 | 24 |
| 2018-08-03 | 12 |
View on DB Fiddle
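Since SQLite ships with Python's standard library, the recursive CTE above can be verified end to end without a separate database. The script below loads the sample schema and runs essentially the same query, only reformatted and with an ORDER BY added so the output order is deterministic:

```python
import sqlite3

# In-memory SQLite database with the sample schema and rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tracktime(id int, starttime timestamp, stoptime timestamp);
INSERT INTO tracktime VALUES (11,'2018-08-01 12:00:00','2018-08-03 12:00:00');
INSERT INTO tracktime VALUES (12,'2018-09-05 18:00:00','2018-09-05 19:00:00');
""")

# The answer's recursive CTE: each recursion step advances the
# start by one day until it reaches the stop time, producing one
# row per calendar day the interval touches.
rows = conn.execute("""
WITH RECURSIVE cte AS (
  SELECT id, starttime, date(starttime,'+1 day') AS totime, stoptime
  FROM tracktime
  UNION ALL
  SELECT id, date(starttime,'+1 day'), date(totime,'+1 day'), stoptime
  FROM cte
  WHERE date(starttime,'+1 day') < stoptime
)
SELECT strftime('%Y-%m-%d', starttime) AS day,
       (strftime('%s', CASE WHEN totime > stoptime THEN stoptime ELSE totime END)
        - strftime('%s', starttime)) / 3600 AS diffHour
FROM cte
ORDER BY day
""").fetchall()
```

Run this way, the query returns 12, 24, and 12 hours for August 1-3 and 1 hour for September 5, matching the fiddle's result table.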

Postgres SQL: New table with unique rows for all dates in date range

Looking for some guidance on where to begin with a query that is hurting my head.
I have a table that shows trial start & end dates for each account over time, looks something like this:
account_id trial_start trial_end
========== =========== =========
123 1/2/2017 1/9/17
234 1/8/2017 1/21/17
456 1/15/2017 5/10/17
The trial start and end dates vary and I want a resulting table that shows me each of the account IDs that were in a trial each week of the year. This way I can say how many active trials I had in each week of the year and do things like see how many of those accounts were actually being logged into during that week of their trial. Perhaps something like:
week account_id
=========== =========
1/1/2017 123
1/8/2017 123
1/8/2017 234
1/15/2017 234
1/15/2017 456
1/22/2017 456
...
5/7/2017 456
I have a reference table that has a row for each week of the year and I feel like I need to somehow join my account IDs to each row in that table, but I can't figure out how I might map each week in between the start and end date to each week's row such that I'm capturing the dates in between :/
Form a Cartesian product of accounts and weeks, then place the week starting date amongst the trial dates (this will pick up each week involved).
SQL Fiddle
PostgreSQL 9.6 Schema Setup:
CREATE TABLE TrialDates
(account_id int, trial_start timestamp, trial_end timestamp)
;
INSERT INTO TrialDates
("account_id", "trial_start", "trial_end")
VALUES
(678, '2017-01-04 00:00:00', '2017-01-05 00:00:00'),
(123, '2017-01-02 00:00:00', '2017-01-09 00:00:00'),
(234, '2017-01-08 00:00:00', '2017-01-21 00:00:00'),
(456, '2017-01-15 00:00:00', '2017-05-10 00:00:00')
;
CREATE TABLE Weeks
("week_start" timestamp)
;
INSERT INTO Weeks
("week_start")
VALUES
('2017-01-16 00:00:00'),
('2017-01-09 00:00:00'),
('2017-01-02 00:00:00')
;
Query 1: (edited)
select w.week_start, td.account_id
from (select distinct account_id from TrialDates) a
cross join Weeks w
inner join TrialDates td on a.account_id = td.account_id
and (
w.week_start >= td.trial_start and w.week_start < td.trial_end
OR
td.trial_start >= w.week_start and td.trial_start < w.week_start + interval '7 days'
)
order by w.week_start, td.account_id
Results:
| week_start | account_id |
|----------------------|------------|
| 2017-01-02T00:00:00Z | 123 |
| 2017-01-02T00:00:00Z | 234 |
| 2017-01-02T00:00:00Z | 678 |
| 2017-01-09T00:00:00Z | 234 |
| 2017-01-09T00:00:00Z | 456 |
| 2017-01-16T00:00:00Z | 234 |
| 2017-01-16T00:00:00Z | 456 |
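The two-sided join condition is the subtle part of Query 1, so here is a quick Python sketch (not part of the answer, using the sample trial rows and week starts from the schema above) that applies the same test and reproduces the result set:

```python
from datetime import date, timedelta

# Sample data from the schema setup: (account_id, trial_start, trial_end).
trials = [
    (678, date(2017, 1, 4), date(2017, 1, 5)),
    (123, date(2017, 1, 2), date(2017, 1, 9)),
    (234, date(2017, 1, 8), date(2017, 1, 21)),
    (456, date(2017, 1, 15), date(2017, 5, 10)),
]
weeks = [date(2017, 1, 2), date(2017, 1, 9), date(2017, 1, 16)]

def in_week(start, end, week_start):
    # Same two-sided test as the join condition: either the week start
    # falls inside the trial, or the trial starts inside the week.
    week_end = week_start + timedelta(days=7)
    return (start <= week_start < end) or (week_start <= start < week_end)

pairs = sorted((w, a) for a, s, e in trials for w in weeks if in_week(s, e, w))
```

The second branch is what picks up account 678, whose whole trial fits inside the week of 2017-01-02 without ever containing a week start.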
added
For readers who might not have an existing weeks table, the same result can be achieved by using the really excellent generate_series functionality of PostgreSQL. So below the cross join of weeks is dynamically generated with the lower and upper boundary dates of the series also dynamically determined.
select w.week_start, td.account_id
from (select distinct account_id from TrialDates) a
cross join (
select
date_trunc('week', dates.d) as week_start
from generate_series(
(select date_trunc('week',min(trial_start)) from TrialDates)
,
(select date_trunc('week',max(trial_start) + interval '6 days') from TrialDates)
, '1 day'
) as dates(d)
group by 1
) w
inner join TrialDates td on a.account_id = td.account_id
and (
w.week_start >= td.trial_start and w.week_start < td.trial_end
OR
td.trial_start >= w.week_start and td.trial_start < w.week_start + interval '7 days'
)
order by w.week_start, td.account_id
;