How do I use a SQL window function to sum rows with a condition

Assume this is my table:
id start_date event_date sales
------------------------------------
1 2020-09-09 2020-08-30 27.9
1 2020-09-09 2020-09-01 15
1 2020-09-09 2020-09-05 25
1 2020-09-09 2020-09-06 20.75
2 2020-09-09 2020-01-30 5
2 2020-09-09 2020-09-03 12
I'm trying to use a window function, where I want to sum sales in event_date for 7 days prior to the start date for each id, so the output I'm trying to reach looks like this...
id start_date event_date sales sales_7_days
-------------------------------------------------
1 2020-09-09 2020-08-30 27.9 0
1 2020-09-09 2020-09-01 15 0 <---- this is not within 7 days of start_date
1 2020-09-09 2020-09-05 25 25 <---- this is within 7 days of start_date
1 2020-09-09 2020-09-06 20.75 45.75 <---- this is within 7 days of start_date
2 2020-09-09 2020-01-30 5 0
2 2020-09-09 2020-09-03 12 12
This is what I've tried so far, but the problem is it seems to start summing from 7 days previous to event_date rather than start_date.
SELECT
id,
start_date,
event_date,
sales,
CASE WHEN event_date >= DATE_ADD(start_date, -7)
THEN SUM(sales) OVER (PARTITION BY id ORDER BY event_date RANGE BETWEEN INTERVAL 7 DAYS PRECEDING AND CURRENT ROW)
ELSE 0 END AS sales_7_days
FROM
sample_df
ORDER BY
id,
start_date,
event_date
So the query above is producing the below (which I don't want, because the window sum starts from event_date rather than start_date)
id start_date event_date sales sales_7_days
-------------------------------------------------
1 2020-09-09 2020-08-30 27.9 0
1 2020-09-09 2020-09-01 15 0
1 2020-09-09 2020-09-05 25 67.9
1 2020-09-09 2020-09-06 20.75 60.75
2 2020-09-09 2020-01-30 5 0
2 2020-09-09 2020-09-03 12 17
Does anybody have any tips here?

"where I want to sum sales in event_date for 7 days prior to the start date for each id"
Because the start date is constant for each id, this sum is a single constant per id. You can calculate it as:
select s.*,
       sum(case when event_date <= start_date and event_date >= start_date - interval 7 day
                then sales
                else 0  -- ids with no qualifying rows show 0 rather than NULL
           end) over (partition by id) as sales_7_days
from sample_df s;
Your expected results suggest, though, that you really want a cumulative sum ordered by event_date. That's fine, but it is a different question. For that, add an ORDER BY to the window so the sum accumulates row by row:
select s.*,
       sum(case when event_date <= start_date and event_date >= start_date - interval 7 day
                then sales
                else 0  -- rows outside the 7-day range contribute nothing
           end) over (partition by id order by event_date) as sales_7_days
from sample_df s;
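To see how the two variants differ, here is a minimal sketch that runs both against the sample rows. It uses Python's built-in sqlite3 module rather than the asker's engine, so the date arithmetic is written as SQLite's date(start_date, '-7 days'), which is an assumption of this sketch and not part of the answer above. The rows for id 2 follow the dates shown in the expected output. The helper name sales_7_days is hypothetical.

```python
import sqlite3

# Sample rows: (id, start_date, event_date, sales)
ROWS = [
    (1, "2020-09-09", "2020-08-30", 27.9),
    (1, "2020-09-09", "2020-09-01", 15.0),
    (1, "2020-09-09", "2020-09-05", 25.0),
    (1, "2020-09-09", "2020-09-06", 20.75),
    (2, "2020-09-09", "2020-01-30", 5.0),
    (2, "2020-09-09", "2020-09-03", 12.0),
]

# {order_by} is either empty (one constant per id) or an ORDER BY (running sum).
QUERY = """
SELECT id, event_date,
       SUM(CASE WHEN event_date <= start_date
                 AND event_date >= date(start_date, '-7 days')
                THEN sales ELSE 0 END)
           OVER (PARTITION BY id {order_by}) AS sales_7_days
FROM sample_df
ORDER BY id, event_date
"""


def sales_7_days(cumulative):
    """Return (id, event_date, sales_7_days) rows for either variant."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sample_df "
        "(id INTEGER, start_date TEXT, event_date TEXT, sales REAL)"
    )
    conn.executemany("INSERT INTO sample_df VALUES (?, ?, ?, ?)", ROWS)
    order_by = "ORDER BY event_date" if cumulative else ""
    rows = conn.execute(QUERY.format(order_by=order_by)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in sales_7_days(cumulative=True):
        print(row)
```

With the ORDER BY, the value for id 1 grows from 0 to 25 to 25 + 20.75 = 45.75; without it, every row of id 1 carries the full 45.75.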

Related

How to count consecutive days in a table where days are duplicated (PostgreSQL)

Hello I would like to know the highest count of consecutive days a user has trained for.
My logs table that stores the records looks like this:
id  user_id  day  ground_id  created_at
-------------------------------------------------
1   1        1    1          2023-01-24 10:00:00
2   1        2    1          2023-01-25 10:00:00
3   1        3    1          2023-01-26 10:00:00
4   1        4    1          2023-01-27 10:00:00
5   1        5    1          2023-01-28 10:00:00
The closest I could get is this query, which works only if the user has trained on a single ground per day.
SELECT COUNT(*) AS days_in_row
FROM (SELECT row_number() OVER (ORDER BY day) - day AS grp
FROM logs
WHERE created_at >= '2023-01-24 00:00:00'
AND user_id = 1) x
GROUP BY grp
logs table:
id  user_id  day  ground_id  created_at
-------------------------------------------------
1   1        1    1          2023-01-24 10:00:00
2   1        2    1          2023-01-25 10:00:00
3   1        3    1          2023-01-26 10:00:00
4   1        4    1          2023-01-27 10:00:00
5   1        5    1          2023-01-28 10:00:00
This query would return a count of 5 consecutive days, which is correct.
However, my query stops working once a user trains on several different training grounds in one day:
logs table:
id  user_id  day  ground_id  created_at
-------------------------------------------------
1   1        1    1          2023-01-24 10:00:00
2   1        2    1          2023-01-25 10:00:00
3   1        3    1          2023-01-26 10:00:00
4   1        3    2          2023-01-26 10:00:00
5   1        4    1          2023-01-27 10:00:00
Then the query above would return a count of 2 consecutive days, which is not what I expect. I would expect 4, because the user has trained on the following days in a row: 1, 2, 3, 4.
Thank you for reading.
Select only the distinct data of interest first, so that each day appears once:
SELECT min(created_at) AS start, COUNT(*) AS days_in_row
FROM (SELECT created_at, row_number() OVER (ORDER BY day) - day AS grp
      FROM (
            -- one row per training day, however many grounds were used that day
            SELECT day, min(created_at) AS created_at
            FROM logs
            WHERE created_at >= '2023-01-24 00:00:00'
              AND user_id = 1
            GROUP BY day) t
     ) x
GROUP BY grp
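The de-duplicate-then-number idea can be checked end to end with a small sketch. It runs on SQLite through Python's sqlite3 module instead of PostgreSQL (an assumption made only so the example is self-contained), drops the created_at filter for brevity, and wraps the grouping in MAX() to return just the longest streak. The function name longest_streak is hypothetical.

```python
import sqlite3

# The logs rows from the question: (id, user_id, day, ground_id, created_at).
# Day 3 appears twice because the user trained on two grounds.
LOGS = [
    (1, 1, 1, 1, "2023-01-24 10:00:00"),
    (2, 1, 2, 1, "2023-01-25 10:00:00"),
    (3, 1, 3, 1, "2023-01-26 10:00:00"),
    (4, 1, 3, 2, "2023-01-26 10:00:00"),
    (5, 1, 4, 1, "2023-01-27 10:00:00"),
]

# Consecutive days share the same (day - row_number) value once duplicates
# are removed, so each distinct value of grp is one streak.
QUERY = """
SELECT MAX(days_in_row)
FROM (SELECT COUNT(*) AS days_in_row
      FROM (SELECT day - ROW_NUMBER() OVER (ORDER BY day) AS grp
            FROM (SELECT DISTINCT day FROM logs WHERE user_id = ?) AS d) AS x
      GROUP BY grp) AS streaks
"""


def longest_streak(rows, user_id=1):
    """Return the longest run of consecutive days for one user (None if no rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE logs (id INTEGER, user_id INTEGER, day INTEGER, "
        "ground_id INTEGER, created_at TEXT)"
    )
    conn.executemany("INSERT INTO logs VALUES (?, ?, ?, ?, ?)", rows)
    (result,) = conn.execute(QUERY, (user_id,)).fetchone()
    conn.close()
    return result


if __name__ == "__main__":
    print(longest_streak(LOGS))
```

For the rows above this yields 4, the count the question expects for days 1 to 4.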

Count the number of transactions the user had within the previous seven days

I have a transactions table as source
transaction_id                        date        user_id                               is_blocked  transaction_amount  transaction_category_id
----------------------------------------------------------------------------------------------------------------------------------------------
02506723-1dd4-4f6b-af18-5c793f758b71  2022-04-30  97526bdd-12fa-4529-b4db-a95532ea7b6d  True        11.17               2
920a00cf-91b7-41f7-b255-a0caff61d867  2022-04-30  27f043f1-0b03-4eb1-960e-6de67f06eb79  True        21.62               6
cac92b31-8847-465f-ab63-0bc93cfe88e8  2022-04-09  2858b0b7-55f1-4f38-91e5-ad938ff861ab  True        63.40               6
2e306f57-5c52-4e6b-8567-ef3c196b82a7  2022-05-30  3e401e63-ca5c-4ec9-ba42-9c85b1fe6c12  True        31.53               3
cccb1a90-b1b8-4cff-9069-07f070e91687  2022-05-27  3fa7d28f-e8e7-4580-8117-cc6106ba9b35  True        89.40               10
02b9a570-cfc2-40bb-8703-ee895e39617b  2022-02-27  0705b115-030f-4c7d-95f9-da607985f405  True        21.05               5
f18f459c-02a5-487e-a722-667db7cc05d0  2022-05-06  327964a9-4e6f-4d4b-ba67-480c5af305cc  True        23.95               4
77056e5d-3e5f-4538-9b19-1e905205a640  2022-03-02  2e67800b-8002-464c-b376-1331aa72af08  True        52.40               1
4b4dc3c8-c877-45e4-8472-9d7405076793  2022-05-22  54465a1c-97a0-4acb-9a58-de4356efbeea  True        78.63               9
5da6fbb9-0de0-42c0-ab26-386ac611ce35  2022-02-27  e17b7d98-5ed5-44a3-8319-6a7562ebb358  True        60.66               4
6f157f6e-99c1-41d4-bab8-575e151cf1d4  2022-03-05  b5c58d7c-b779-449f-be9d-fa98d807a436  True        43.11               10
313b3ca5-7135-40b8-a538-2515440a4327  2022-04-28  dbb70729-c52e-4ed8-9ee9-39bea1f97634  True        58.00               1
2b3325ae-e958-4c12-bfe3-1da1a1e19b8d  2022-03-13  4592d896-057d-4e3b-8c2e-3bf9384092b5
I am currently using the query below to count the number of transactions each user had within the last seven days:
SELECT
    user_id,
    COUNT(*) AS 'Transaction within 7 Days'
FROM
    transactions
WHERE
    -- BETWEEN needs the earlier bound first, otherwise no row matches
    timestamp BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
    AND CURRENT_DATE()
GROUP BY
    user_id
Destination table
transaction_id  user_id    date        Transaction within 7 Days
-----------------------------------------------------------------
ef05-4247       becf-457e  2020-01-01  0
c8d1-40ca       becf-457e  2020-01-05  1
fc2b-4b36       becf-457e  2020-01-07  2
3725-48c4       becf-457e  2020-01-15  0
5f2a-47c2       becf-457e  2020-01-16  1
7541-412c       5728-4f1c  2020-01-01  0
3deb-47d7       5728-4f1c  2020-01-12  0
I am looking to optimize the above query. Is there a way to do this using window functions?
SELECT
    user_id,
    COUNT(transaction_id) AS 'Transaction within 7 Days'
    -- date,
    -- transaction_id
FROM
    transactions
WHERE
    -- earlier bound first: BETWEEN low AND high
    timestamp BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
    AND CURRENT_DATE()
GROUP BY
    user_id
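The destination table asks for a per-transaction count relative to each row's own date, which a GROUP BY against CURRENT_DATE() cannot produce. One way to get it with a window function is a RANGE frame over the date, sketched below on SQLite via Python's sqlite3 module. Several things here are assumptions of the sketch rather than part of the question: the engine (numeric RANGE offsets need SQLite 3.28 or later), ordering by julianday(date) to turn dates into day numbers, the output column name tx_prev_7_days, and the choice to subtract 1 so the current transaction is excluded while other transactions on the same day still count.

```python
import sqlite3

# Rows taken from the destination table: (transaction_id, user_id, date).
TRANSACTIONS = [
    ("ef05-4247", "becf-457e", "2020-01-01"),
    ("c8d1-40ca", "becf-457e", "2020-01-05"),
    ("fc2b-4b36", "becf-457e", "2020-01-07"),
    ("3725-48c4", "becf-457e", "2020-01-15"),
    ("5f2a-47c2", "becf-457e", "2020-01-16"),
    ("7541-412c", "5728-4f1c", "2020-01-01"),
    ("3deb-47d7", "5728-4f1c", "2020-01-12"),
]

# The frame covers the current row plus the same user's rows dated up to
# 7 days earlier; subtracting 1 removes the current row from the count.
QUERY = """
SELECT transaction_id,
       COUNT(*) OVER (
           PARTITION BY user_id
           ORDER BY julianday(date)
           RANGE BETWEEN 7 PRECEDING AND CURRENT ROW
       ) - 1 AS tx_prev_7_days
FROM transactions
"""


def previous_7_day_counts(rows):
    """Map each transaction_id to the user's transaction count in the prior 7 days."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions (transaction_id TEXT, user_id TEXT, date TEXT)"
    )
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)
    result = dict(conn.execute(QUERY).fetchall())
    conn.close()
    return result


if __name__ == "__main__":
    for tx_id, count in previous_7_day_counts(TRANSACTIONS).items():
        print(tx_id, count)
```

On these rows the counts match the last column of the destination table.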

How to count users group by time interval

I have a table with a user id and a created_at column of type timestamp. I want to count how many users created their account in each 3-hour interval of a given day. So far I have written this query, but I'm not able to get the count for each 3-hour interval:
with time_cte AS (
SELECT time_sample from
generate_series('2021-12-01'::date, '2021-12-01'::date + interval '1 day', interval '3 hour')
as time_sample
) SELECT time_sample, count(u.id) FROM time_cte
join users u ON u.created_at::date = '2021-12-01'::date
GROUP BY time_sample;
I get the series and a count, but every row shows the total user count for that day.
The output I got
time_sample count
2021-12-01 00:00:00.000000, 4
2021-12-01 03:00:00.000000, 4
2021-12-01 06:00:00.000000, 4
2021-12-01 09:00:00.000000, 4
2021-12-01 12:00:00.000000, 4
2021-12-01 15:00:00.000000, 4
2021-12-01 18:00:00.000000, 4
2021-12-01 21:00:00.000000, 4
2021-12-02 00:00:00.000000, 4
The output I expect is
time_sample count
2021-12-01 00:00:00.000000, 0
2021-12-01 03:00:00.000000, 0
2021-12-01 06:00:00.000000, 3
2021-12-01 09:00:00.000000, 1
2021-12-01 12:00:00.000000, 0
2021-12-01 15:00:00.000000, 0
2021-12-01 18:00:00.000000, 0
2021-12-01 21:00:00.000000, 0
2021-12-02 00:00:00.000000, 0
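In PostgreSQL 14 and later you can use the built-in date_bin function. Note that the query below only returns slots that contain at least one user; add a WHERE clause on created_at to restrict it to a single day.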
select
date_bin(interval '3 hours', created_at, date_trunc('day', created_at)) as time_slot,
count(*) as cnt
from users
group by time_slot
order by time_slot;
For PostgreSQL versions before 14 you may use this implementation of date_bin.
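To make the binning arithmetic concrete, here is a small pure-Python sketch of what a 3-hour date_bin anchored at midnight does: floor each timestamp's offset from the start of the day to a multiple of the slot width, then count per slot. Unlike the SQL above, it also emits the empty slots with a count of 0. The sample timestamps and the function name count_by_slot are hypothetical, chosen so that the counts match the 06:00 and 09:00 rows of the expected output; the sketch lists the eight slots of the day and does not include the extra 2021-12-02 00:00 row.

```python
from collections import Counter
from datetime import date, datetime, timedelta

# Hypothetical created_at values; the last one falls on another day.
SIGNUPS = [
    datetime(2021, 12, 1, 6, 10),
    datetime(2021, 12, 1, 7, 30),
    datetime(2021, 12, 1, 8, 59),
    datetime(2021, 12, 1, 9, 0),
    datetime(2021, 12, 2, 1, 0),
]


def count_by_slot(timestamps, day, width=timedelta(hours=3)):
    """Count timestamps in each `width`-wide slot of `day`, including empty slots."""
    day_start = datetime.combine(day, datetime.min.time())
    day_end = day_start + timedelta(days=1)
    counts = Counter()
    for ts in timestamps:
        if day_start <= ts < day_end:
            # Floor the offset from midnight to a whole number of slots.
            counts[day_start + ((ts - day_start) // width) * width] += 1
    slots = []
    slot = day_start
    while slot < day_end:
        slots.append((slot, counts[slot]))  # Counter returns 0 for empty slots
        slot += width
    return slots


if __name__ == "__main__":
    for slot, n in count_by_slot(SIGNUPS, date(2021, 12, 1)):
        print(slot, n)
```

In SQL, the same zero-filling is usually done by generating the slots and LEFT JOINing the users onto them.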

Period and Quarter Sequence

I'm trying to find a way to build a sequence number for date periods and quarters (not sure if "sequence" is the correct term).
Basically this will help people to navigate dates based on weeks, periods, and quarters once I join this to our sales data. For example, if I just want to know the sales from last week, I could just use WHERE WeekSequence = -1... Another example is, a manager wants to get the sales data for the past quarter, I could just use WHERE QuarterSequence = -1... something like that.
My current table:
WeekStartDate WeekEndDate CurrentWeek Period Quarter WeekSequence
----------------------------------------------------------------------
2020-08-03 2020-08-09 0 2 1 -5
2020-08-10 2020-08-16 0 2 1 -4
2020-08-17 2020-08-23 0 2 1 -3
2020-08-24 2020-08-30 0 2 1 -2
2020-08-31 2020-09-06 0 2 1 -1
2020-09-07 2020-09-13 1 3 1 0
2020-09-14 2020-09-20 0 3 1 1
2020-09-21 2020-09-27 0 3 1 2
2020-09-28 2020-10-04 0 3 1 3
2020-10-05 2020-10-11 0 4 2 4
2020-10-12 2020-10-18 0 4 2 5
What I want it to look like (highlighted):
If I understand correctly, just use window functions:
select t.*,
(period -
max(case when currentweek = 1 then period end) over ()
) as periodsequence,
(quarter -
max(case when currentweek = 1 then quarter end) over ()
) as quartersequence
from t;
You can include this in a view rather than putting it in a table.
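As a quick check of the query above, the following sketch loads the sample weeks into an in-memory SQLite database through Python's sqlite3 module and runs the same window expressions. SQLite is an assumption here, as is the helper name sequences; the question does not say which database is in use.

```python
import sqlite3

# (WeekStartDate, WeekEndDate, CurrentWeek, Period, Quarter, WeekSequence)
WEEKS = [
    ("2020-08-03", "2020-08-09", 0, 2, 1, -5),
    ("2020-08-10", "2020-08-16", 0, 2, 1, -4),
    ("2020-08-17", "2020-08-23", 0, 2, 1, -3),
    ("2020-08-24", "2020-08-30", 0, 2, 1, -2),
    ("2020-08-31", "2020-09-06", 0, 2, 1, -1),
    ("2020-09-07", "2020-09-13", 1, 3, 1, 0),
    ("2020-09-14", "2020-09-20", 0, 3, 1, 1),
    ("2020-09-21", "2020-09-27", 0, 3, 1, 2),
    ("2020-09-28", "2020-10-04", 0, 3, 1, 3),
    ("2020-10-05", "2020-10-11", 0, 4, 2, 4),
    ("2020-10-12", "2020-10-18", 0, 4, 2, 5),
]

# MAX(...) OVER () picks the current week's period/quarter for every row;
# subtracting it gives an offset relative to "now".
QUERY = """
SELECT weekstartdate,
       period  - MAX(CASE WHEN currentweek = 1 THEN period  END) OVER () AS periodsequence,
       quarter - MAX(CASE WHEN currentweek = 1 THEN quarter END) OVER () AS quartersequence
FROM t
ORDER BY weekstartdate
"""


def sequences(rows):
    """Return (weekstartdate, periodsequence, quartersequence) per week."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (weekstartdate TEXT, weekenddate TEXT, currentweek INTEGER, "
        "period INTEGER, quarter INTEGER, weeksequence INTEGER)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in sequences(WEEKS):
        print(row)
```

Weeks in period 2 get periodsequence -1, the current period gets 0, and period 4 gets 1, so a filter such as WHERE periodsequence = -1 selects the previous period.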

Select data where days between two dates are part of a given month

My data looks like the table below, and I need to show the ids whose interval between date1 and date2 overlaps a given month/year parameter.
Eg.: for July 2018 I need ids from 1 to 7.
date1 date2 id
---------- ---------- --------
2017-11-01 2018-08-28 1
2018-06-05 2018-07-05 2
2018-06-05 2019-05-07 3
2018-06-05 2018-08-08 4
2018-07-01 2018-07-31 5
2018-07-07 2018-07-15 6
2018-07-27 2018-08-05 7
2018-06-01 2018-06-07 8
2018-08-03 2018-09-01 9
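The solution is quite simple: an interval overlaps the month when it starts on or before the last day of the month and ends on or after the first day.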
SELECT
id
FROM
YOUR_TABLE
WHERE
date1<=YOUR_DATE_END_OF_MONTH AND date2>=YOUR_DATE_START_OF_MONTH
e.g. for July 2018
SELECT
id
FROM
YOUR_TABLE
WHERE
date1<='2018-07-31' AND date2>='2018-07-01'
Alternatively, if you would rather not calculate the first and last day of the month, compare year and month numbers directly (note that this cannot use any indexes that exist on date1 and date2):
SELECT
id
FROM
YOUR_TABLE
WHERE
EXTRACT(YEAR FROM date1)*12 + EXTRACT(MONTH FROM date1)<=2018*12 + 7
AND EXTRACT(YEAR FROM date2)*12 + EXTRACT(MONTH FROM date2)>=2018*12 + 7
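The overlap test can be verified against the sample data with a short sketch. It uses Python's sqlite3 module with ISO-formatted date strings, which compare correctly as text; the table name intervals and the helper ids_overlapping_month are hypothetical, and the month boundaries are computed with calendar.monthrange instead of being hard-coded.

```python
import calendar
import sqlite3

# Sample rows from the question: (date1, date2, id)
INTERVALS = [
    ("2017-11-01", "2018-08-28", 1),
    ("2018-06-05", "2018-07-05", 2),
    ("2018-06-05", "2019-05-07", 3),
    ("2018-06-05", "2018-08-08", 4),
    ("2018-07-01", "2018-07-31", 5),
    ("2018-07-07", "2018-07-15", 6),
    ("2018-07-27", "2018-08-05", 7),
    ("2018-06-01", "2018-06-07", 8),
    ("2018-08-03", "2018-09-01", 9),
]


def ids_overlapping_month(rows, year, month):
    """Return ids whose [date1, date2] interval overlaps the given month."""
    last_day = calendar.monthrange(year, month)[1]
    month_start = f"{year:04d}-{month:02d}-01"
    month_end = f"{year:04d}-{month:02d}-{last_day:02d}"
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE intervals (date1 TEXT, date2 TEXT, id INTEGER)")
    conn.executemany("INSERT INTO intervals VALUES (?, ?, ?)", rows)
    # Overlap: starts on or before month end AND ends on or after month start.
    result = conn.execute(
        "SELECT id FROM intervals WHERE date1 <= ? AND date2 >= ? ORDER BY id",
        (month_end, month_start),
    ).fetchall()
    conn.close()
    return [row[0] for row in result]


if __name__ == "__main__":
    print(ids_overlapping_month(INTERVALS, 2018, 7))
```

For July 2018 this returns ids 1 to 7, as requested in the question.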