for each month, count entries with interval - sql

i'm having a hard time creating statistics with the count of ongoing subscriptions per month
i have table subscriptions
id | created_at | cancelled_at
----------------------------------------
1 | 2020-12-29 13:56:12 | null
2 | 2021-02-15 01:06:25 | 2021-04-21 19:35:31
3 | 2021-03-22 02:42:19 | null
4 | 2021-04-21 19:35:31 | null
and statistics should look as follows:
month | count
---------------
12/2020 | 1 -- #1
01/2021 | 1 -- #1
02/2021 | 2 -- #1 + #2
03/2021 | 3 -- #1 + #2 + #3
04/2021 | 3 -- #1 + #3 + #4, not #2 since it ends that month
05/2021 | 3 -- #1 + #3 + #4
so far i was able to make list of all months i need the stats for:
select generate_series(min, max, '1 month') as "month"
from (
select date_trunc('month', min(created_at)) as min,
now() as max
from subscriptions
) months;
and get the right number of subscriptions for specific month
select sum(
case
when
make_date(2021, 04, 1) >= date_trunc('month', created_at)
and make_date(2021, 04, 1) < date_trunc('month', coalesce(cancelled_at, now() + interval '1 month'))
then 1
else 0
end
) as total
from subscriptions
-- returns 3
but i am struggling to combine those two... would OVER (which i am inexperienced with) be of any use for me? i found Count cumulative total in Postgresql but it's a different case (dates are fixed)... or is the proper approach to use a function with FOR somehow?

You can use generate_series() to generate the months and then a correlated subquery to calculate the actives:
select yyyymm,
(select count(*)
from subscriptions s
where s.created_at < gs.yyyymm + interval '1 month' and
(s.cancelled_at > gs.yyyymm + interval '1 month' or s.cancelled_at is null)
) as count
from generate_series('2020-12-01'::date, '2021-05-01'::date, interval '1 month'
) gs(yyyymm);
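The filter in the correlated subquery can be checked outside the database. The following is a minimal Python sketch of the same condition, run against the four sample rows from the question; the helper names (`next_month`, `month_starts`, `active_counts`) are made up for illustration and are not part of any library.

```python
from datetime import datetime

# Sample rows from the question: (id, created_at, cancelled_at or None).
SUBSCRIPTIONS = [
    (1, datetime(2020, 12, 29, 13, 56, 12), None),
    (2, datetime(2021, 2, 15, 1, 6, 25), datetime(2021, 4, 21, 19, 35, 31)),
    (3, datetime(2021, 3, 22, 2, 42, 19), None),
    (4, datetime(2021, 4, 21, 19, 35, 31), None),
]

def next_month(d):
    """Return the first instant of the month following month start d."""
    return datetime(d.year + d.month // 12, d.month % 12 + 1, 1)

def month_starts(first, last):
    """Yield month starts from first to last inclusive, like generate_series()."""
    current = first
    while current <= last:
        yield current
        current = next_month(current)

def active_counts(rows, first, last):
    """Count rows created before a month ends and not cancelled by then."""
    result = []
    for start in month_starts(first, last):
        end = next_month(start)  # same as gs.yyyymm + interval '1 month'
        count = sum(
            1
            for _id, created, cancelled in rows
            if created < end and (cancelled is None or cancelled > end)
        )
        result.append((start.strftime("%m/%Y"), count))
    return result

counts = active_counts(SUBSCRIPTIONS, datetime(2020, 12, 1), datetime(2021, 5, 1))
for month, count in counts:
    print(month, count)
```

With the sample data this prints 1, 1, 2, 3, 3, 3 for 12/2020 through 05/2021, which matches the table the question asks for.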

Related

How can I split 1 row into multiple rows in SQL

I want to split something like this:
Value | Startdate | Enddate
XXXX | 2.July | 16 August
Into this:
Value | Startdate | Enddate
XXXX | 2.July | 31 July
XXXX | 1.August | 16 August
The value is not important for now.
If I understand correctly, you want to split your range into different months. A convenient method uses generate_series():
select value, greatest(startdate, gs.mon), least(enddate, gs.mon + interval '1 month - 1 day')
from t cross join lateral
generate_series(date_trunc('month', startdate), date_trunc('month', enddate), interval '1 month'
) gs(mon)
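The greatest()/least() clamping can be illustrated with a small Python sketch. The function name `split_by_month` is hypothetical, and the year 2021 is an assumption, since the question only gives day and month.

```python
from datetime import date, timedelta

def split_by_month(start, end):
    """Split the inclusive range [start, end] into one piece per calendar month."""
    pieces = []
    month_start = start.replace(day=1)  # like date_trunc('month', startdate)
    while month_start <= end:
        # First day of the following month.
        following = date(
            month_start.year + month_start.month // 12,
            month_start.month % 12 + 1,
            1,
        )
        month_end = following - timedelta(days=1)
        # Clamp the month to the range, like greatest(...) and least(...).
        pieces.append((max(start, month_start), min(end, month_end)))
        month_start = following
    return pieces

# Assumed year; the question shows "2.July" to "16 August" only.
pieces = split_by_month(date(2021, 7, 2), date(2021, 8, 16))
for piece_start, piece_end in pieces:
    print(piece_start, piece_end)
```

For the example range this gives two pieces: 2 July to 31 July and 1 August to 16 August.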

Group rows with start- end date by month on PostgreSQL

I have a database with a tbl_registration with rows that look like
ID | start_date_time | end_date_time | ...
1 | 2021-01-01 14:00:15 | 2021-01-01 14:00:15
2 | 2021-02-01 14:00:15 | null
4 | 2021-05-15 14:00:15 | 2024-01-01 14:00:15
5 | 2019--15 14:00:15 | 2024-01-01 14:00:15
endDate can be null
It contains 500,000 to 1,000,000 records
We want to create an overview of year grouped by month that shows the amount of records that are active in that month. So a registration is counted per month if it lies (partially) in that month based on start and end date.
I can do a query per month like this
select count(r.id)
from tbl_registration r
where
(r.end_date_time >= to_timestamp('01/01/2021 00:00:00', 'DD/MM/YYYY HH24:MI:SS') or r.end_date_time is null )
and r.start_date_time < to_timestamp('01/02/2021 00:00:00', 'DD/MM/YYYY HH24:MI:SS');
But that forces me to repeat this query 12 times.
I don't see a creative way to solve this in one query that would give me as a result 12 rows, one for each month
I've been looking at the generate_series function, but I don't see how I can group on the comparison of those start- and end dates
Postgres supports generate_series() . . . so generate the dates you want and then construct the query. One method is:
select gs.mon, x.cnt
from generate_series('2021-01-01'::date, '2021-12-01'::date, interval '1 month') gs(mon) left join lateral
(select count(*) as cnt
from tbl_registration r
where (r.end_date_time >= gs.mon or r.end_date_time is null) and
r.start_date_time < gs.mon + interval '1 month'
) x
on 1=1;

Get a rolling count of timestamps in SQL

I have a table (in an Oracle DB) that looks something like what is shown below with about 4000 records. This is just an example of how the table is designed. The timestamps range for several years.
| Time | Action |
| 9/25/2019 4:24:32 PM | Yes |
| 9/25/2019 4:28:56 PM | No |
| 9/28/2019 7:48:16 PM | Yes |
| .... | .... |
I want to be able to get a count of timestamps that occur on a rolling 15 minute interval. My main goal is to identify the maximum number of timestamps that appear for any 15 minute interval. I would like this done by looking at each timestamp and getting a count of timestamps that appear within 15 minutes of that timestamp.
My goal would be to have something like
| Interval | Count |
| 9/25/2019 4:24:00 PM - 9/25/2019 4:39:00 | 2 |
| 9/25/2019 4:25:00 PM - 9/25/2019 4:40:00 | 2 |
| ..... | ..... |
| 9/25/2019 4:39:00 PM - 9/25/2019 4:54:00 | 0 |
I am not sure how I would be able to do this, if at all. Any ideas or advice would be much appreciated.
If you want any 15 minute interval in the data, then you can use:
select t.*,
count(*) over (order by timestamp
range between interval '15' minute preceding and current row
) as cnt_15
from t;
If you want the maximum, then use rank() on this:
select t.*
from (select t.*, rank() over (order by cnt_15 desc) as seqnum
from (select t.*,
count(*) over (order by timestamp
range between interval '15' minute preceding and current row
) as cnt_15
from t
) t
) t
where seqnum = 1;
This doesn't produce exactly the results you specify in the question. But it does answer the question:
I want to be able to get a count of timestamps that occur on a rolling 15 minute interval. My main goal is to identify the maximum number of timestamps that appear for any 15 minute interval.
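To see what the window frame counts, here is a rough Python sketch of the same idea: for each timestamp t, count the timestamps in the closed range [t - 15 minutes, t]. The function name `rolling_counts` is made up for illustration, and the data is just the three sample rows shown in the question.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta

def rolling_counts(times, window=timedelta(minutes=15)):
    """For each timestamp t, count timestamps in [t - window, t]."""
    ordered = sorted(times)
    return [
        # bisect_right includes ties at t; bisect_left includes t - window itself.
        (t, bisect_right(ordered, t) - bisect_left(ordered, t - window))
        for t in ordered
    ]

# The three sample timestamps from the question.
times = [
    datetime(2019, 9, 25, 16, 24, 32),
    datetime(2019, 9, 25, 16, 28, 56),
    datetime(2019, 9, 28, 19, 48, 16),
]

counts = rolling_counts(times)
busiest = max(count for _t, count in counts)
for t, count in counts:
    print(t, count)
print("max in any 15 minute window ending at a row:", busiest)
```

Because each window ends at an actual timestamp, the largest of these counts is also the largest count over any 15 minute interval: any interval can be slid so that it ends on its last contained timestamp without losing rows.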
You could enumerate the minutes with a recursive query, then bring the table with a left join:
with cte (start_dt, max_dt) as (
select trunc(min(time), 'mi'), max(time) from mytable
union all
select start_dt + interval '1' minute, max_dt from cte where start_dt < max_dt
)
select
c.start_dt,
c.start_dt + interval '15' minute end_dt,
count(t.time) cnt
from cte c
left join mytable t
on t.time >= c.start_dt
and t.time < c.start_dt + interval '15' minute
group by c.start_dt

How can I aggregate values based on an arbitrary monthly cycle date range in SQL?

Given a table as such:
# SELECT * FROM payments ORDER BY payment_date DESC;
id | payment_type_id | payment_date | amount
----+-----------------+--------------+---------
4 | 1 | 2019-11-18 | 300.00
3 | 1 | 2019-11-17 | 1000.00
2 | 1 | 2019-11-16 | 250.00
1 | 1 | 2019-11-15 | 300.00
14 | 1 | 2019-10-18 | 130.00
13 | 1 | 2019-10-18 | 100.00
15 | 1 | 2019-09-18 | 1300.00
16 | 1 | 2019-09-17 | 1300.00
17 | 1 | 2019-09-01 | 400.00
18 | 1 | 2019-08-25 | 400.00
(10 rows)
How can I SUM the amount column based on an arbitrary date range, not simply a date truncation?
Taking the example of a date range beginning on the 15th of a month, and ending on the 14th of the following month, the output I would expect to see is:
payment_type_id | payment_date | amount
-----------------+--------------+---------
1 | 2019-11-15 | 1850.00
1 | 2019-10-15 | 230.00
1 | 2019-09-15 | 2600.00
1 | 2019-08-15 | 800.00
Can this be done in SQL, or is this something that's better handled in code? I would traditionally do this in code, but I'm looking to extend my knowledge of SQL (which at this stage isn't much!)
You can use a combination of the CASE clause and the date_trunc() function:
SELECT
payment_type_id,
CASE
WHEN date_part('day', payment_date) < 15 THEN
date_trunc('month', payment_date) + interval '-1month 14 days'
ELSE date_trunc('month', payment_date) + interval '14 days'
END AS payment_date,
SUM(amount) AS amount
FROM
payments
GROUP BY 1,2
date_part('day', ...) returns the day of the month.
The CASE clause separates the dates before the 15th of the month from those on or after it.
date_trunc('month', ...) converts every date in a month to the first of that month.
So, if the date is before the 15th of the current month, it is grouped to the 15th of the previous month (this is what + interval '-1month 14 days' calculates: +14 because date_trunc() truncates to the 1st of the month, and 1 + 14 = 15). Otherwise it is grouped to the 15th of the current month.
After calculating these payment_days, you can use them for simple grouping.
I would simply subtract 14 days, truncate the month, and add 14 days back:
select payment_type_id,
date_trunc('month', payment_date - interval '14 day') + interval '14 day' as month_15,
sum(amount)
from payments
group by payment_type_id, month_15
order by payment_type_id, month_15;
No conditional logic is actually needed for this.
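As a sanity check on the shift-truncate-shift trick, here is a small Python sketch that applies it to the sample payments from the question. The names `cycle_start` and `totals` are hypothetical; the amounts are kept as Decimal to avoid float rounding.

```python
from collections import defaultdict
from datetime import date, timedelta
from decimal import Decimal

# Sample rows from the question: (payment_type_id, payment_date, amount).
PAYMENTS = [
    (1, date(2019, 11, 18), Decimal("300.00")),
    (1, date(2019, 11, 17), Decimal("1000.00")),
    (1, date(2019, 11, 16), Decimal("250.00")),
    (1, date(2019, 11, 15), Decimal("300.00")),
    (1, date(2019, 10, 18), Decimal("130.00")),
    (1, date(2019, 10, 18), Decimal("100.00")),
    (1, date(2019, 9, 18), Decimal("1300.00")),
    (1, date(2019, 9, 17), Decimal("1300.00")),
    (1, date(2019, 9, 1), Decimal("400.00")),
    (1, date(2019, 8, 25), Decimal("400.00")),
]

def cycle_start(d, offset_days=14):
    """Subtract the offset, truncate to the 1st of the month, add the offset back."""
    shifted = d - timedelta(days=offset_days)
    return shifted.replace(day=1) + timedelta(days=offset_days)

totals = defaultdict(Decimal)
for type_id, paid_on, amount in PAYMENTS:
    totals[(type_id, cycle_start(paid_on))] += amount

for (type_id, start), amount in sorted(totals.items(), reverse=True):
    print(type_id, start, amount)
```

On the sample data this reproduces the expected output from the question: 1850.00, 230.00, 2600.00 and 800.00 for the cycles starting 2019-11-15, 2019-10-15, 2019-09-15 and 2019-08-15.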
You can use the generate_series() function and make an inner join comparing month and year, like this:
SELECT specific_date_on_month, SUM(amount)
FROM (SELECT generate_series('2015-01-15'::date, '2015-12-15'::date, '1 month'::interval) AS specific_date_on_month) AS gs
INNER JOIN payments
ON (TO_CHAR(payment_date, 'yyyymm')=TO_CHAR(specific_date_on_month, 'yyyymm'))
GROUP BY specific_date_on_month;
The generate_series(<begin>, <end>, <interval>) function generates a series of values from begin to end, stepping by the given interval.

Database Query to generate a Time-based Chart

I have a logins table in the following (simplified) structure:
id | login_time
---------
1 | 2019-02-04 18:14:30.026361+00
2 | 2019-02-04 22:10:19.720065+00
3 | 2019-02-06 15:51:53.799014+00
Now I want to generate chart like this:
https://prnt.sc/mifz6y
Basically I want to show the logins within the past 48 hours.
My current query:
SELECT count(*), date_trunc('hour', login_time) as time_trunced FROM user_logins
WHERE login_time > now() - interval '48' hour
GROUP BY time_trunced
ORDER BY time_trunced DESC
This works as long as there are entries for every hour. However, if in some hour there were no logins, there will be no entry selected, like this:
time_trunced | count
---------------------
12:00 | 1
13:00 | 2
15:00 | 3
16:00 | 5
I would need a continuous result, so that I can simply put the count values into an array:
time_trunced | count
---------------------
12:00 | 1
13:00 | 2
14:00 | 0 <-- This is missing
15:00 | 3
16:00 | 5
Based on that I can simply transform the query result into an array like [1, 2, 0, 3, 5] and pass that to my frontend.
Is this possible with postgresql? Or do I need to implement my own logic?
I think I would do:
select gs.h, count(ul.login_time)
from generate_series(
date_trunc('hour', now() - interval '48 hour'),
date_trunc('hour', now()),
interval '1 hour'
) gs(h) left join
user_logins ul
on ul.login_time >= gs.h and
ul.login_time < gs.h + interval '1 hour'
group by gs.h
order by gs.h;
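The idea behind the left join is that the generated hours drive the result, so hours without logins still appear with a count of 0. A minimal Python sketch of the same gap filling follows; `hourly_counts` is a hypothetical helper, and the login times are invented to reproduce the 1, 2, 0, 3, 5 example from the question.

```python
from collections import Counter
from datetime import datetime, timedelta

def hourly_counts(login_times, first_hour, last_hour):
    """Return (hour, count) for every hour in the range, including empty hours."""
    # Equivalent of date_trunc('hour', login_time) followed by a count per hour.
    per_hour = Counter(
        t.replace(minute=0, second=0, microsecond=0) for t in login_times
    )
    series = []
    hour = first_hour
    while hour <= last_hour:
        # Missing hours fall back to 0, like count() over an unmatched left join.
        series.append((hour, per_hour.get(hour, 0)))
        hour += timedelta(hours=1)
    return series

# Invented login times: 1 at 12h, 2 at 13h, none at 14h, 3 at 15h, 5 at 16h.
day = datetime(2019, 2, 4)
logins = (
    [day.replace(hour=12, minute=5)]
    + [day.replace(hour=13, minute=m) for m in (10, 40)]
    + [day.replace(hour=15, minute=m) for m in (1, 2, 3)]
    + [day.replace(hour=16, minute=m) for m in (5, 15, 25, 35, 45)]
)

series = hourly_counts(logins, day.replace(hour=12), day.replace(hour=16))
values = [count for _hour, count in series]
print(values)
```

The resulting list is [1, 2, 0, 3, 5], which is the array shape the question wants to pass to the frontend.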
This can almost certainly be tidied up a bit but should give you some ideas. Props to clamp for the generate_series() tip:
SELECT t.time_trunced,coalesce(l.login_count,0) as logins
FROM
(
-- Generate an inline view with all hours between the min & max values in user_logins table
SELECT date_trunc('hour',a.min_time)+ interval '1h' * b.hr_offset as time_trunced
FROM (select min(login_time) as min_time from user_logins) a
JOIN (select generate_series(0,(select ceil((EXTRACT(EPOCH FROM max(login_time))-EXTRACT(EPOCH FROM min(login_time)))/3600) from user_logins)::int) as hr_offset) b on true
) t
LEFT JOIN
(
-- OP's original query tweaked a bit
SELECT count(*) as login_count, date_trunc('hour', login_time) as time_trunced
FROM user_logins
GROUP BY time_trunced
) l on t.time_trunced=l.time_trunced
order BY 1 desc;