How to count users grouped by time interval - sql

I have a table with a user id and a created_at column of type timestamp. I want to count how many users created their account in each 3-hour interval of a given day. So far I have written this query, but I am not able to get the count for each three-hour interval:
with time_cte AS (
SELECT time_sample from
generate_series('2021-12-01'::date, '2021-12-01'::date + interval '1 day', interval '3 hour')
as time_sample
) SELECT time_sample, count(u.id) FROM time_cte
join users u ON u.created_at::date = '2021-12-01'::date
GROUP BY time_sample;
I do get the series and a count, but every row shows the total number of users for that day.
The output I got:
time_sample count
2021-12-01 00:00:00.000000, 4
2021-12-01 03:00:00.000000, 4
2021-12-01 06:00:00.000000, 4
2021-12-01 09:00:00.000000, 4
2021-12-01 12:00:00.000000, 4
2021-12-01 15:00:00.000000, 4
2021-12-01 18:00:00.000000, 4
2021-12-01 21:00:00.000000, 4
2021-12-02 00:00:00.000000, 4
The output I expect is
time_sample count
2021-12-01 00:00:00.000000, 0
2021-12-01 03:00:00.000000, 0
2021-12-01 06:00:00.000000, 3
2021-12-01 09:00:00.000000, 1
2021-12-01 12:00:00.000000, 0
2021-12-01 15:00:00.000000, 0
2021-12-01 18:00:00.000000, 0
2021-12-01 21:00:00.000000, 0
2021-12-02 00:00:00.000000, 0

In PostgreSQL 14 and later you can use the built-in date_bin function (slots without any users do not appear in the result):
select
    date_bin(interval '3 hours', created_at, date_trunc('day', created_at)) as time_slot,
    count(*) as cnt
from users
group by time_slot
order by time_slot;
For PostgreSQL versions before 14 you may use this implementation of date_bin.
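The expected output in the question also lists empty slots with a count of 0, which a GROUP BY over users alone cannot produce. One possible way to get them is to generate the slots first and LEFT JOIN the users onto them. The sketch below checks that logic with Python's built-in sqlite3 module, so it uses SQLite syntax (a recursive CTE instead of generate_series, datetime() instead of interval arithmetic) rather than PostgreSQL. The four sample rows are made up to match the counts the question expects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, created_at TEXT)")
# Hypothetical sample rows: three sign-ups between 06:00 and 09:00, one after 09:00.
conn.executemany(
    "INSERT INTO users (created_at) VALUES (?)",
    [("2021-12-01 06:10:00",), ("2021-12-01 07:00:00",),
     ("2021-12-01 08:59:59",), ("2021-12-01 09:30:00",)],
)

QUERY = """
WITH RECURSIVE slots(slot) AS (
    SELECT '2021-12-01 00:00:00'
    UNION ALL
    SELECT datetime(slot, '+3 hours') FROM slots
    WHERE slot < '2021-12-01 21:00:00'
)
SELECT s.slot, COUNT(u.id)          -- COUNT(u.id) skips the NULLs of empty slots
FROM slots s
LEFT JOIN users u
       ON u.created_at >= s.slot
      AND u.created_at <  datetime(s.slot, '+3 hours')
GROUP BY s.slot
ORDER BY s.slot
"""

rows = conn.execute(QUERY).fetchall()
for slot, cnt in rows:
    print(slot, cnt)
```

The half-open range (>= slot start, < next slot start) keeps a sign-up at exactly 09:00:00 out of the 06:00 slot. In PostgreSQL the same shape should work with the generate_series CTE from the question in place of the recursive CTE.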

Related

How to count consecutive days in a table where days are duplicated "PostgreSQL"

Hello I would like to know the highest count of consecutive days a user has trained for.
My logs table that stores the records looks like this:
id  user_id  day  ground_id  created_at
------------------------------------------------
1   1        1    1          2023-01-24 10:00:00
2   1        2    1          2023-01-25 10:00:00
3   1        3    1          2023-01-26 10:00:00
4   1        4    1          2023-01-27 10:00:00
5   1        5    1          2023-01-28 10:00:00
The closest I could get is this query, which only works if the user has trained on a single ground per day.
SELECT COUNT(*) AS days_in_row
FROM (SELECT row_number() OVER (ORDER BY day) - day AS grp
      FROM logs
      WHERE created_at >= '2023-01-24 00:00:00'
        AND user_id = 1) x
GROUP BY grp
logs table:
id  user_id  day  ground_id  created_at
------------------------------------------------
1   1        1    1          2023-01-24 10:00:00
2   1        2    1          2023-01-25 10:00:00
3   1        3    1          2023-01-26 10:00:00
4   1        4    1          2023-01-27 10:00:00
5   1        5    1          2023-01-28 10:00:00
This query would return a count of 5 consecutive days which is correct.
However my query doesn't work once a user trains multiple times on different training grounds in one day:
logs table:
id  user_id  day  ground_id  created_at
------------------------------------------------
1   1        1    1          2023-01-24 10:00:00
2   1        2    1          2023-01-25 10:00:00
3   1        3    1          2023-01-26 10:00:00
4   1        3    2          2023-01-26 10:00:00
5   1        4    1          2023-01-27 10:00:00
Then the query above returns a count of 2 consecutive days, which is not what I expect. I would expect 4, because the user has trained on the following days in a row (1,2,3,4).
Thank you for reading.
Select only the distinct data of interest first:
SELECT min(created_at) AS start, COUNT(*) AS days_in_row
FROM (SELECT created_at, row_number() OVER (ORDER BY day) - day AS grp
      FROM (
        -- one row per day, even if created_at differs between grounds
        SELECT day, min(created_at) AS created_at
        FROM logs
        WHERE created_at >= '2023-01-24 00:00:00'
          AND user_id = 1
        GROUP BY day) t
     ) x
GROUP BY grp
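To see why deduplicating the days first fixes the count, the same idea can be replayed in plain Python. This is only an illustrative sketch: the `logs` list is a hypothetical in-memory copy of the (user_id, day) pairs from the second table in the question, and `longest_streak` is a made-up helper name.

```python
from itertools import groupby

# Hypothetical copy of (user_id, day) pairs; day 3 appears twice because the
# user trained on two grounds that day.
logs = [(1, 1), (1, 2), (1, 3), (1, 3), (1, 4)]

def longest_streak(rows, user_id):
    # Deduplicate first, so each day is counted once.
    days = sorted({day for uid, day in rows if uid == user_id})
    # day - position stays constant inside a run of consecutive days, which is
    # what row_number() OVER (ORDER BY day) - day relies on in the SQL.
    keyed = [(day - pos, day) for pos, day in enumerate(days)]
    runs = [len(list(group)) for _, group in groupby(keyed, key=lambda kd: kd[0])]
    return max(runs, default=0)

print(longest_streak(logs, 1))
```

Without the deduplication step, the second row for day 3 shifts the row numbers and splits the run, which is the behaviour described in the question.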

Count the number of transactions the user had within the previous seven days

I have a transactions table as source
transaction_id                        date        user_id                               is_blocked  transaction_amount  transaction_category_id
------------------------------------------------------------------------------------------------------------------------------------------------
02506723-1dd4-4f6b-af18-5c793f758b71  2022-04-30  97526bdd-12fa-4529-b4db-a95532ea7b6d  True        11.17               2
920a00cf-91b7-41f7-b255-a0caff61d867  2022-04-30  27f043f1-0b03-4eb1-960e-6de67f06eb79  True        21.62               6
cac92b31-8847-465f-ab63-0bc93cfe88e8  2022-04-09  2858b0b7-55f1-4f38-91e5-ad938ff861ab  True        63.40               6
2e306f57-5c52-4e6b-8567-ef3c196b82a7  2022-05-30  3e401e63-ca5c-4ec9-ba42-9c85b1fe6c12  True        31.53               3
cccb1a90-b1b8-4cff-9069-07f070e91687  2022-05-27  3fa7d28f-e8e7-4580-8117-cc6106ba9b35  True        89.40               10
02b9a570-cfc2-40bb-8703-ee895e39617b  2022-02-27  0705b115-030f-4c7d-95f9-da607985f405  True        21.05               5
f18f459c-02a5-487e-a722-667db7cc05d0  2022-05-06  327964a9-4e6f-4d4b-ba67-480c5af305cc  True        23.95               4
77056e5d-3e5f-4538-9b19-1e905205a640  2022-03-02  2e67800b-8002-464c-b376-1331aa72af08  True        52.40               1
4b4dc3c8-c877-45e4-8472-9d7405076793  2022-05-22  54465a1c-97a0-4acb-9a58-de4356efbeea  True        78.63               9
5da6fbb9-0de0-42c0-ab26-386ac611ce35  2022-02-27  e17b7d98-5ed5-44a3-8319-6a7562ebb358  True        60.66               4
6f157f6e-99c1-41d4-bab8-575e151cf1d4  2022-03-05  b5c58d7c-b779-449f-be9d-fa98d807a436  True        43.11               10
313b3ca5-7135-40b8-a538-2515440a4327  2022-04-28  dbb70729-c52e-4ed8-9ee9-39bea1f97634  True        58.00               1
2b3325ae-e958-4c12-bfe3-1da1a1e19b8d  2022-03-13  4592d896-057d-4e3b-8c2e-3bf9384092b5
I am currently using the query below to count the number of transactions each user had within the last seven days:
SELECT
user_id,
COUNT(*) AS 'Transaction within 7 Days',
FROM
transactions
WHERE
timestamp BETWEEN CURRENT_DATE()
AND DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAYS)
GROUP BY
user_id
Destination table
transaction_id  user_id    date        Transaction within 7 Days
-----------------------------------------------------------------
ef05-4247       becf-457e  2020-01-01  0
c8d1-40ca       becf-457e  2020-01-05  1
fc2b-4b36       becf-457e  2020-01-07  2
3725-48c4       becf-457e  2020-01-15  0
5f2a-47c2       becf-457e  2020-01-16  1
7541-412c       5728-4f1c  2020-01-01  0
3deb-47d7       5728-4f1c  2020-01-12  0
I am looking to optimize the above query. Is there a way to optimize it using window functions?
SELECT
    user_id,
    COUNT(transaction_id) AS 'Transaction within 7 Days'
    -- date,
    -- transaction_id
FROM
    transactions
WHERE
    -- BETWEEN expects the lower bound first
    timestamp BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAYS)
    AND CURRENT_DATE()
GROUP BY
    user_id
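The destination table in the question asks for something slightly different from a single count per user: for every transaction, the number of earlier transactions by the same user in the preceding seven days. A rough sketch of that logic follows, run through Python's built-in sqlite3 module with a correlated subquery. It uses SQLite date functions, not the dialect of the question. The column alias `tx_within_7_days` and two boundary choices are assumptions: same-day transactions are not counted, and a transaction exactly 7 days earlier is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (transaction_id TEXT, user_id TEXT, date TEXT)")
# Rows taken from the destination table in the question.
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", [
    ("ef05-4247", "becf-457e", "2020-01-01"),
    ("c8d1-40ca", "becf-457e", "2020-01-05"),
    ("fc2b-4b36", "becf-457e", "2020-01-07"),
    ("3725-48c4", "becf-457e", "2020-01-15"),
    ("5f2a-47c2", "becf-457e", "2020-01-16"),
    ("7541-412c", "5728-4f1c", "2020-01-01"),
    ("3deb-47d7", "5728-4f1c", "2020-01-12"),
])

QUERY = """
SELECT t.transaction_id, t.user_id, t.date,
       (SELECT COUNT(*)
        FROM transactions p
        WHERE p.user_id = t.user_id
          AND p.date <  t.date                    -- strictly earlier transactions
          AND p.date >= date(t.date, '-7 days')   -- at most 7 days back
       ) AS tx_within_7_days
FROM transactions t
ORDER BY t.user_id DESC, t.date
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

On engines that support RANGE frames with date offsets, the same count can presumably be written as a window function partitioned by user_id and ordered by date. The correlated subquery is used here only because it runs on any SQLite version.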

Rolling Sum Calculation Based on 2 Date Fields

Giving up after a few hours of failed attempts.
My data is in the following format; create_date can never be later than event_date.
I need to calculate, on a rolling n-day basis (let's say 3), the sum of units where both the create_date and the event_date fall within the same 3-day window. The data is illustrative: each event_date can have more than 500 different create_dates associated with it, and the number isn't constant. Some event_dates may be missing.
So let's say for 2022-02-03, I only want to sum units where both the event_date and create_date values were between 2022-02-01 and 2022-02-03.
event_date  create_date  rowid  units
-------------------------------------
2022-02-01  2022-01-20   1      100
2022-02-01  2022-02-01   2      100
2022-02-02  2022-01-21   3      100
2022-02-02  2022-01-23   4      100
2022-02-02  2022-01-31   5      100
2022-02-02  2022-02-02   6      100
2022-02-03  2022-01-30   7      100
2022-02-03  2022-02-01   8      100
2022-02-03  2022-02-03   9      100
2022-02-05  2022-02-01   10     100
2022-02-05  2022-02-03   11     100
The output I need is shown below. I have added in brackets the rows that should be included in the calculation for each date, but my result only needs to contain the numerical sum. I tried calculating with either date, but neither attempt returned the results I needed.
date        units
----------------------------
2022-02-01  100 (Row 2)
2022-02-02  300 (Row 2,5,6)
2022-02-03  300 (Row 2,6,8,9)
2022-02-04  200 (Row 6,9)
2022-02-05  200 (Row 9,11)
In Python I solved the above with a function that loops over the dates and filters a dataframe for each one, but I am struggling to do the same in SQL.
Thank you!
Consider the approach below:
with events_dates as (
  select date from (
    select min(event_date) min_date, max(event_date) max_date
    from your_table
  ), unnest(generate_date_array(min_date, max_date)) date
)
select date, sum(units) as units, string_agg('' || rowid) rows_included
from events_dates
left join your_table
  on create_date between date - 2 and date
  and event_date between date - 2 and date
group by date
if applied to sample data in your question - output is
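As a rough cross-check of the join condition above, the same filter can be replayed in plain Python over the question's sample rows. This is a sketch, not the answer's method: `rolling_units` is a made-up helper, and it returns 0 where the SQL left join would return NULL for a date with no matching rows.

```python
from datetime import date, timedelta

# (event_date, create_date, rowid, units) copied from the question's sample table.
rows = [
    (date(2022, 2, 1), date(2022, 1, 20), 1, 100),
    (date(2022, 2, 1), date(2022, 2, 1), 2, 100),
    (date(2022, 2, 2), date(2022, 1, 21), 3, 100),
    (date(2022, 2, 2), date(2022, 1, 23), 4, 100),
    (date(2022, 2, 2), date(2022, 1, 31), 5, 100),
    (date(2022, 2, 2), date(2022, 2, 2), 6, 100),
    (date(2022, 2, 3), date(2022, 1, 30), 7, 100),
    (date(2022, 2, 3), date(2022, 2, 1), 8, 100),
    (date(2022, 2, 3), date(2022, 2, 3), 9, 100),
    (date(2022, 2, 5), date(2022, 2, 1), 10, 100),
    (date(2022, 2, 5), date(2022, 2, 3), 11, 100),
]

def rolling_units(rows, window_days=3):
    # Walk every calendar day between the first and last event_date,
    # so days without events (2022-02-04 here) still get a row.
    day = min(r[0] for r in rows)
    end = max(r[0] for r in rows)
    out = {}
    while day <= end:
        lo = day - timedelta(days=window_days - 1)
        # Both dates must fall inside [lo, day], mirroring the two BETWEEN conditions.
        out[day] = sum(u for ev, cr, _, u in rows if lo <= ev <= day and lo <= cr <= day)
        day += timedelta(days=1)
    return out

totals = rolling_units(rows)
for day, units in totals.items():
    print(day.isoformat(), units)
```

For 2022-02-03 the qualifying rows are 2, 6, 8 and 9, which add up to 400 units.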

How to calculate range in 1 week using Postgres?

tanggal | product
2021-01-01 bag 1
2021-01-05 bag 5
2021-01-08 bag 8
2021-01-11 bag 11
2021-01-12 bag 12
2021-01-13 bag 13
2021-01-14 bag 14
Here I have a product table. It contains input dates and product names.
I want to calculate the products on a weekly basis. How do I write a query that calculates the data over a range of 7 days?
This is my query:
select tanggal, product from tbl_product
where tanggal > current_date + interval '7' day
You could solve this for arbitrary dates using a generated time series.
For example:
SELECT series::date
FROM generate_series(
    (now() - interval '1 week')::date,
    now()::date,
    '1 day'::interval
) series;
Would result in:
2021-05-26
2021-05-27
2021-05-28
2021-05-29
2021-05-30
2021-05-31
2021-06-01
2021-06-02
which you can join with other tables as you see fit.
For further information on generate_series() and other set-returning functions, check out the documentation.
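If the goal is simply to count the rows of the last 7 days, a plain date filter may be enough. Note that the filter in the question adds 7 days to current_date, so it only matches dates more than a week in the future; subtracting is probably what was intended. The sketch below checks the filter against the question's sample rows using Python's built-in sqlite3 module. It uses SQLite's date() function rather than PostgreSQL interval syntax, and `REFERENCE_DAY` is a fixed stand-in for current_date so that the result is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_product (tanggal TEXT, product TEXT)")
# Sample rows from the question.
conn.executemany("INSERT INTO tbl_product VALUES (?, ?)", [
    ("2021-01-01", "bag 1"), ("2021-01-05", "bag 5"), ("2021-01-08", "bag 8"),
    ("2021-01-11", "bag 11"), ("2021-01-12", "bag 12"), ("2021-01-13", "bag 13"),
    ("2021-01-14", "bag 14"),
])

REFERENCE_DAY = "2021-01-14"  # assumption: stands in for current_date

QUERY = """
SELECT COUNT(*)
FROM tbl_product
WHERE tanggal >  date(?, '-7 days')   -- later than one week before the reference day
  AND tanggal <= ?                    -- and not after the reference day
"""

(count_last_week,) = conn.execute(QUERY, (REFERENCE_DAY, REFERENCE_DAY)).fetchone()
print(count_last_week)
```

In PostgreSQL the corresponding condition would presumably be tanggal > current_date - interval '7 day' AND tanggal <= current_date.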

How do I use SQL window to sum rows with a condition

Assume this is my table:
id start_date event_date sales
------------------------------------
1 2020-09-09 2020-08-30 27.9
1 2020-09-09 2020-09-01 15
1 2020-09-09 2020-09-05 25
1 2020-09-09 2020-09-06 20.75
2 2020-09-09 2020-01-30 5
2 2020-09-09 2020-09-03 12
I'm trying to use a window function, where I want to sum sales in event_date for 7 days prior to the start date for each id, so the output I'm trying to reach looks like this...
id start_date event_date sales sales_7_days
-------------------------------------------------
1 2020-09-09 2020-08-30 27.9 0
1 2020-09-09 2020-09-01 15 0 <---- this is not within 7 days of start_date
1 2020-09-09 2020-09-05 25 25 <---- this is within 7 days of start_date
1 2020-09-09 2020-09-06 20.75 40.75 <---- this is within 7 days of start_date
2 2020-09-09 2020-01-30 5 0
2 2020-09-09 2020-09-03 12 12
This is what I've tried so far, but the problem is it seems to start summing from 7 days previous to event_date rather than start_date.
SELECT
id,
start_date,
event_date,
sales,
CASE WHEN event_date >= DATE_ADD(start_date, -7) THEN SUM(sales) \
OVER(PARTITION BY id ORDER BY event_date RANGE BETWEEN INTERVAL 7 DAYS PRECEDING AND CURRENT ROW) ELSE 0 END AS sales_7_days
FROM
sample_df
ORDER BY
id,
start_date,
event_date
So the query above is producing the below (which I don't want, because the window sum starts from event_date rather than start_date)
id start_date event_date sales sales_7_days
-------------------------------------------------
1 2020-09-09 2020-08-30 27.9 0
1 2020-09-09 2020-09-01 15 0
1 2020-09-09 2020-09-05 25 67.9
1 2020-09-09 2020-09-06 20.75 60.75
2 2020-09-09 2020-01-30 5 0
2 2020-09-09 2020-09-03 12 17
Does anybody have any tips here?
"where I want to sum sales in event_date for 7 days prior to the start date for each id"
Because the start date is constant for each id, this sum is a single value per id. You can calculate it as:
select s.*,
       sum(case when event_date <= start_date and event_date >= start_date - interval 7 day
                then sales
           end) over (partition by id)
from sample_df s;
Your results suggest, though, that you really want a cumulative sum based on the event_date. That's fine, but a different question. The answer for that is to tweak the SQL:
select s.*,
       sum(case when event_date <= start_date and event_date >= start_date - interval 7 day
                then sales
           end) over (partition by id order by event_date)
from sample_df s;
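The cumulative variant can be traced by hand with a small Python sketch: a running total per id that only grows when the row's event_date lies within the 7 days before start_date. The helper name `sales_7_days` is made up, the rows come from the question (using 2020-09-03 for the last row, as the question's output tables do), and rows before the first match show 0.0 here where the SQL without an ELSE branch would show NULL.

```python
from collections import defaultdict
from datetime import date, timedelta

# (id, start_date, event_date, sales) from the question's sample data.
rows = [
    (1, date(2020, 9, 9), date(2020, 8, 30), 27.9),
    (1, date(2020, 9, 9), date(2020, 9, 1), 15.0),
    (1, date(2020, 9, 9), date(2020, 9, 5), 25.0),
    (1, date(2020, 9, 9), date(2020, 9, 6), 20.75),
    (2, date(2020, 9, 9), date(2020, 1, 30), 5.0),
    (2, date(2020, 9, 9), date(2020, 9, 3), 12.0),
]

def sales_7_days(rows):
    running = defaultdict(float)   # one running total per id (the PARTITION BY)
    out = []
    # Sorting by (id, event_date) plays the role of ORDER BY event_date in the window.
    for rid, start, event, sales in sorted(rows, key=lambda r: (r[0], r[2])):
        # Same test as the CASE expression: only sales inside the window are added.
        if start - timedelta(days=7) <= event <= start:
            running[rid] += sales
        out.append((rid, event, running[rid]))
    return out

result = sales_7_days(rows)
for rid, event, total in result:
    print(rid, event.isoformat(), total)
```

With these rows the last line for id 1 comes out as 25 + 20.75 = 45.75, and the 2020-09-01 row stays at 0 because it falls one day before the window that starts on 2020-09-02.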