Count the number of transactions the user had within the previous seven days - sql

I have a transactions table as source
transaction_id                        date        user_id                               is_blocked  transaction_amount  transaction_category_id
-----------------------------------------------------------------------------------------------------------------------------------------------
02506723-1dd4-4f6b-af18-5c793f758b71  2022-04-30  97526bdd-12fa-4529-b4db-a95532ea7b6d  True        11.17               2
920a00cf-91b7-41f7-b255-a0caff61d867  2022-04-30  27f043f1-0b03-4eb1-960e-6de67f06eb79  True        21.62               6
cac92b31-8847-465f-ab63-0bc93cfe88e8  2022-04-09  2858b0b7-55f1-4f38-91e5-ad938ff861ab  True        63.40               6
2e306f57-5c52-4e6b-8567-ef3c196b82a7  2022-05-30  3e401e63-ca5c-4ec9-ba42-9c85b1fe6c12  True        31.53               3
cccb1a90-b1b8-4cff-9069-07f070e91687  2022-05-27  3fa7d28f-e8e7-4580-8117-cc6106ba9b35  True        89.40               10
02b9a570-cfc2-40bb-8703-ee895e39617b  2022-02-27  0705b115-030f-4c7d-95f9-da607985f405  True        21.05               5
f18f459c-02a5-487e-a722-667db7cc05d0  2022-05-06  327964a9-4e6f-4d4b-ba67-480c5af305cc  True        23.95               4
77056e5d-3e5f-4538-9b19-1e905205a640  2022-03-02  2e67800b-8002-464c-b376-1331aa72af08  True        52.40               1
4b4dc3c8-c877-45e4-8472-9d7405076793  2022-05-22  54465a1c-97a0-4acb-9a58-de4356efbeea  True        78.63               9
5da6fbb9-0de0-42c0-ab26-386ac611ce35  2022-02-27  e17b7d98-5ed5-44a3-8319-6a7562ebb358  True        60.66               4
6f157f6e-99c1-41d4-bab8-575e151cf1d4  2022-03-05  b5c58d7c-b779-449f-be9d-fa98d807a436  True        43.11               10
313b3ca5-7135-40b8-a538-2515440a4327  2022-04-28  dbb70729-c52e-4ed8-9ee9-39bea1f97634  True        58.00               1
2b3325ae-e958-4c12-bfe3-1da1a1e19b8d  2022-03-13  4592d896-057d-4e3b-8c2e-3bf9384092b5
I am currently using the query below to count the number of transactions each user had within the last seven days:
SELECT
    user_id,
    COUNT(*) AS 'Transaction within 7 Days'
FROM
    transactions
WHERE
    -- BETWEEN expects the lower bound first; the table's date column is named date
    date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
    AND CURRENT_DATE()
GROUP BY
    user_id
Destination table:
transaction_id  user_id    date        Transaction within 7 Days
----------------------------------------------------------------
ef05-4247       becf-457e  2020-01-01  0
c8d1-40ca       becf-457e  2020-01-05  1
fc2b-4b36       becf-457e  2020-01-07  2
3725-48c4       becf-457e  2020-01-15  0
5f2a-47c2       becf-457e  2020-01-16  1
7541-412c       5728-4f1c  2020-01-01  0
3deb-47d7       5728-4f1c  2020-01-12  0
I am looking to optimize the above query. Is there a way to optimize it using window functions?

SELECT
    user_id,
    COUNT(transaction_id) AS 'Transaction within 7 Days'
    -- date,
    -- transaction_id
FROM
    transactions
WHERE
    -- BETWEEN expects the lower bound first; the table's date column is named date
    date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
    AND CURRENT_DATE()
GROUP BY
    user_id
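
The destination table needs one row per transaction, so a window function may be a closer fit than GROUP BY. The following is a minimal sketch, not a definitive answer. It assumes MySQL 8.0 or later, that date is a DATE column, and that transactions made on the same day as the current row should not be counted. The alias transactions_within_7_days is an illustrative name.

```sql
SELECT
    transaction_id,
    user_id,
    `date`,
    -- Count the same user's earlier rows dated 7 to 1 days before this row.
    -- COUNT(*) over an empty frame yields 0.
    COUNT(*) OVER (
        PARTITION BY user_id
        ORDER BY `date`
        RANGE BETWEEN INTERVAL 7 DAY PRECEDING AND INTERVAL 1 DAY PRECEDING
    ) AS transactions_within_7_days
FROM transactions
ORDER BY user_id, `date`;
```

With the destination sample, the row dated 2020-01-07 sees the transactions from 2020-01-01 and 2020-01-05, while the row dated 2020-01-15 sees none because 2020-01-07 is eight days earlier. In PostgreSQL 11 or later the same frame would be written with INTERVAL '7 days' PRECEDING and INTERVAL '1 day' PRECEDING.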

Related

How to count consecutive days in a table where days are duplicated "PostgresSQL"

Hello I would like to know the highest count of consecutive days a user has trained for.
My logs table that stores the records looks like this:
id  user_id  day  ground_id  created_at
---------------------------------------------------
1   1        1    1          2023-01-24 10:00:00
2   1        2    1          2023-01-25 10:00:00
3   1        3    1          2023-01-26 10:00:00
4   1        4    1          2023-01-27 10:00:00
5   1        5    1          2023-01-28 10:00:00
The closest I could get is this query, which works only if the user trained on a single ground each day:
SELECT COUNT(*) AS days_in_row
FROM (SELECT row_number() OVER (ORDER BY day) - day AS grp
      FROM logs
      WHERE created_at >= '2023-01-24 00:00:00'
        AND user_id = 1) x
GROUP BY grp
logs table:
id  user_id  day  ground_id  created_at
---------------------------------------------------
1   1        1    1          2023-01-24 10:00:00
2   1        2    1          2023-01-25 10:00:00
3   1        3    1          2023-01-26 10:00:00
4   1        4    1          2023-01-27 10:00:00
5   1        5    1          2023-01-28 10:00:00
This query would return a count of 5 consecutive days which is correct.
However my query doesn't work once a user trains multiple times on different training grounds in one day:
logs table:
id  user_id  day  ground_id  created_at
---------------------------------------------------
1   1        1    1          2023-01-24 10:00:00
2   1        2    1          2023-01-25 10:00:00
3   1        3    1          2023-01-26 10:00:00
4   1        3    2          2023-01-26 10:00:00
5   1        4    1          2023-01-27 10:00:00
Then the query above returns a count of 2 consecutive days, which is not what I expect. I would expect 4, because the user trained on the following days in a row: 1, 2, 3, 4.
Thank you for reading.
Select only the distinct data of interest first:
SELECT min(created_at) AS start, COUNT(*) AS days_in_row
FROM (SELECT created_at, row_number() OVER (ORDER BY day) - day AS grp
      FROM (
          -- one row per day, even when sessions on that day have different timestamps
          SELECT day, min(created_at) AS created_at
          FROM logs
          WHERE created_at >= '2023-01-24 00:00:00'
            AND user_id = 1
          GROUP BY day) t
     ) x
GROUP BY grp
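
The question asks for the highest streak, so the per-group counts still need to be reduced to one number. One possible variant, sketched here under the assumption of PostgreSQL and an integer day column, uses DENSE_RANK so that duplicated days share a rank and no separate de-duplication step is needed. The aliases ranked, streaks, and longest_streak are illustrative names.

```sql
SELECT MAX(days_in_row) AS longest_streak
FROM (
    SELECT COUNT(DISTINCT day) AS days_in_row
    FROM (
        SELECT day,
               -- Duplicate days get the same dense rank, so day - rank
               -- stays constant for every row of one consecutive streak.
               day - DENSE_RANK() OVER (ORDER BY day) AS grp
        FROM logs
        WHERE created_at >= '2023-01-24 00:00:00'
          AND user_id = 1
    ) ranked
    GROUP BY grp
) streaks;
```

For the sample where day 3 appears twice, every row falls into the same group and the distinct-day count is 4.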

CASE in WHERE Clause in Snowflake

I am trying to do a case statement within the where clause in snowflake but I’m not quite sure how should I go about doing it.
What I’m trying to do is, if my current month is Jan, then the where clause for date is between start of previous year and today. If not, the where clause for date would be between start of current year and today.
WHERE
CASE MONTH(CURRENT_DATE()) = 1 THEN DATE BETWEEN DATE_TRUNC('YEAR', DATEADD(YEAR, -1, CURRENT_DATE())) AND CURRENT_DATE()
CASE MONTH(CURRENT_DATE()) != 1 THEN DATE BETWEEN DATE_TRUNC('YEAR', CURRENT_DATE()) AND CURRENT_DATE()
END
Appreciate any help on this!
Use a CASE expression that returns -1 if the current month is January or 0 for any other month, so that you can get with DATEADD() a date of the previous or the current year to use in DATE_TRUNC():
WHERE DATE BETWEEN
DATE_TRUNC('YEAR', DATEADD(YEAR, CASE WHEN MONTH(CURRENT_DATE()) = 1 THEN -1 ELSE 0 END, CURRENT_DATE()))
AND
CURRENT_DATE()
I suspect that you don't even need to use CASE here:
WHERE
    (MONTH(CURRENT_DATE()) = 1 AND
     DATE BETWEEN DATE_TRUNC('YEAR', DATEADD(YEAR, -1, CURRENT_DATE())) AND
                  CURRENT_DATE()) OR
    (MONTH(CURRENT_DATE()) != 1 AND
     DATE BETWEEN DATE_TRUNC('YEAR', CURRENT_DATE()) AND CURRENT_DATE())
So the other answers are quite good, but the answer can be even simpler.
Here is a little table to break down what is happening:
select
row_number() over (order by null) - 1 as rn,
dateadd('day', rn * 5, date_trunc('year',current_date())) as pretend_current_date,
DATEADD(YEAR, -1, pretend_current_date) as pcd_sub1,
month(pretend_current_date) as pcd_month,
DATE_TRUNC(year, iff(pcd_month = 1, pcd_sub1, pretend_current_date)) as _from,
pretend_current_date as _to
from table(generator(ROWCOUNT => 30))
order by rn;
This shows:
RN  PRETEND_CURRENT_DATE  PCD_SUB1    PCD_MONTH  _FROM       _TO
----------------------------------------------------------------------
0   2022-01-01            2021-01-01  1          2021-01-01  2022-01-01
1   2022-01-06            2021-01-06  1          2021-01-01  2022-01-06
2   2022-01-11            2021-01-11  1          2021-01-01  2022-01-11
3   2022-01-16            2021-01-16  1          2021-01-01  2022-01-16
4   2022-01-21            2021-01-21  1          2021-01-01  2022-01-21
5   2022-01-26            2021-01-26  1          2021-01-01  2022-01-26
6   2022-01-31            2021-01-31  1          2021-01-01  2022-01-31
7   2022-02-05            2021-02-05  2          2022-01-01  2022-02-05
8   2022-02-10            2021-02-10  2          2022-01-01  2022-02-10
9   2022-02-15            2021-02-15  2          2022-01-01  2022-02-15
10  2022-02-20            2021-02-20  2          2022-01-01  2022-02-20
11  2022-02-25            2021-02-25  2          2022-01-01  2022-02-25
12  2022-03-02            2021-03-02  3          2022-01-01  2022-03-02
13  2022-03-07            2021-03-07  3          2022-01-01  2022-03-07
14  2022-03-12            2021-03-12  3          2022-01-01  2022-03-12
15  2022-03-17            2021-03-17  3          2022-01-01  2022-03-17
16  2022-03-22            2021-03-22  3          2022-01-01  2022-03-22
17  2022-03-27            2021-03-27  3          2022-01-01  2022-03-27
18  2022-04-01            2021-04-01  4          2022-01-01  2022-04-01
19  2022-04-06            2021-04-06  4          2022-01-01  2022-04-06
20  2022-04-11            2021-04-11  4          2022-01-01  2022-04-11
21  2022-04-16            2021-04-16  4          2022-01-01  2022-04-16
22  2022-04-21            2021-04-21  4          2022-01-01  2022-04-21
23  2022-04-26            2021-04-26  4          2022-01-01  2022-04-26
24  2022-05-01            2021-05-01  5          2022-01-01  2022-05-01
25  2022-05-06            2021-05-06  5          2022-01-01  2022-05-06
26  2022-05-11            2021-05-11  5          2022-01-01  2022-05-11
27  2022-05-16            2021-05-16  5          2022-01-01  2022-05-16
28  2022-05-21            2021-05-21  5          2022-01-01  2022-05-21
29  2022-05-26            2021-05-26  5          2022-01-01  2022-05-26
Your logic asks: "Is the current date in January?" If so, take the prior year and truncate to the start of that year; otherwise truncate the current date to the start of its year. The result is the lower bound of the BETWEEN test.
This is the same as subtracting one month from the current date and truncating the result to the year: in January that lands in December of the prior year, and in any other month it stays in the current year.
Thus there is no need for any IFF or CASE:
WHERE date BETWEEN DATE_TRUNC(year, DATEADD(month,-1, CURRENT_DATE())) AND CURRENT_DATE()
And if you would like to drop some parentheses, CURRENT_DATE can be called without them, so it can be even shorter:
WHERE date BETWEEN DATE_TRUNC(year, DATEADD(month,-1, CURRENT_DATE)) AND CURRENT_DATE
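
To check that subtracting one month gives the same lower bound as the conditional version, a few boundary dates can be compared side by side. This is only a sketch for Snowflake; the CTE sample_dates, the column date_text, and the output aliases are illustrative names, and the four dates are picked by hand to cover both ends of January and of the year.

```sql
WITH sample_dates AS (
    SELECT TO_DATE(date_text) AS d
    FROM (VALUES ('2022-01-01'), ('2022-01-31'), ('2022-02-01'), ('2022-12-31')) AS v (date_text)
)
SELECT
    d,
    -- Conditional form: step back a year only in January
    DATE_TRUNC('YEAR', DATEADD(YEAR, IFF(MONTH(d) = 1, -1, 0), d)) AS from_with_iff,
    -- Shortcut: step back one month, then truncate to the year
    DATE_TRUNC('YEAR', DATEADD(MONTH, -1, d))                      AS from_with_shortcut
FROM sample_dates
ORDER BY d;
```

Both columns should show 2021-01-01 for the two January dates and 2022-01-01 for the other two.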

How to count users group by time interval

I have a table with a user id and a created_at column of type timestamp. I want to count how many users created their account in each 3-hour interval of a given day. So far I have written this query, but I am not able to get the count for each three-hour interval:
with time_cte AS (
SELECT time_sample from
generate_series('2021-12-01'::date, '2021-12-01'::date + interval '1 day', interval '3 hour')
as time_sample
) SELECT time_sample, count(u.id) FROM time_cte
join users u ON u.created_at::date = '2021-12-01'::date
GROUP BY time_sample;
I get the series and a count, but every row shows the total number of users for that day.
The output I got:
time_sample count
2021-12-01 00:00:00.000000, 4
2021-12-01 03:00:00.000000, 4
2021-12-01 06:00:00.000000, 4
2021-12-01 09:00:00.000000, 4
2021-12-01 12:00:00.000000, 4
2021-12-01 15:00:00.000000, 4
2021-12-01 18:00:00.000000, 4
2021-12-01 21:00:00.000000, 4
2021-12-02 00:00:00.000000, 4
The output I expect is
time_sample count
2021-12-01 00:00:00.000000, 0
2021-12-01 03:00:00.000000, 0
2021-12-01 06:00:00.000000, 3
2021-12-01 09:00:00.000000, 1
2021-12-01 12:00:00.000000, 0
2021-12-01 15:00:00.000000, 0
2021-12-01 18:00:00.000000, 0
2021-12-01 21:00:00.000000, 0
2021-12-02 00:00:00.000000, 0
In PostgreSQL 14 and later you can use the built-in date_bin function:
select
date_bin(interval '3 hours', created_at, date_trunc('day', created_at)) as time_slot,
count(*) as cnt
from users
group by time_slot
order by time_slot;
For PostgreSQL versions before 14 you may use this implementation of date_bin.
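
Grouping by a computed slot returns only the slots that contain at least one user, while the expected output also lists the empty slots with a count of 0. One way to get those rows, sketched here under the assumption that created_at is a timestamp without time zone, is to generate the slots first and LEFT JOIN the users onto them. It relies only on generate_series, so it should also run on versions before 14. The alias s and the hard-coded day are illustrative.

```sql
SELECT
    s.time_sample,
    count(u.id) AS cnt          -- count(u.id) ignores the NULLs of unmatched slots
FROM generate_series(
         timestamp '2021-12-01 00:00',
         timestamp '2021-12-01 21:00',
         interval '3 hours'
     ) AS s(time_sample)
LEFT JOIN users u
       ON u.created_at >= s.time_sample
      AND u.created_at <  s.time_sample + interval '3 hours'
GROUP BY s.time_sample
ORDER BY s.time_sample;
```

The series stops at 21:00 so that the last slot ends at midnight. Extending it to 2021-12-02 00:00 would add a ninth row, as in the expected output, but that row would count users created in the first three hours of the next day.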

How do I use SQL window to sum rows with a condition

Assume this is my table:
id start_date event_date sales
------------------------------------
1 2020-09-09 2020-08-30 27.9
1 2020-09-09 2020-09-01 15
1 2020-09-09 2020-09-05 25
1 2020-09-09 2020-09-06 20.75
2 2020-09-09 2020-01-30 5
2 2020-09-09 2020-09-03 12
I'm trying to use a window function, where I want to sum sales in event_date for 7 days prior to the start date for each id, so the output I'm trying to reach looks like this...
id start_date event_date sales sales_7_days
-------------------------------------------------
1 2020-09-09 2020-08-30 27.9 0
1 2020-09-09 2020-09-01 15 0 <---- this is not within 7 days of start_date
1 2020-09-09 2020-09-05 25 25 <---- this is within 7 days of start_date
1 2020-09-09 2020-09-06 20.75 45.75 <---- this is within 7 days of start_date
2 2020-09-09 2020-01-30 5 0
2 2020-09-09 2020-09-03 12 12
This is what I've tried so far, but the problem is it seems to start summing from 7 days previous to event_date rather than start_date.
SELECT
    id,
    start_date,
    event_date,
    sales,
    CASE WHEN event_date >= DATE_ADD(start_date, -7) THEN SUM(sales)
        OVER (PARTITION BY id ORDER BY event_date RANGE BETWEEN INTERVAL 7 DAYS PRECEDING AND CURRENT ROW) ELSE 0 END AS sales_7_days
FROM
    sample_df
ORDER BY
    id,
    start_date,
    event_date
So the query above is producing the below (which I don't want, because the window sum starts from event_date rather than start_date)
id start_date event_date sales sales_7_days
-------------------------------------------------
1 2020-09-09 2020-08-30 27.9 0
1 2020-09-09 2020-09-01 15 0
1 2020-09-09 2020-09-05 25 67.9
1 2020-09-09 2020-09-06 20.75 60.75
2 2020-09-09 2020-01-30 5 0
2 2020-09-09 2020-09-03 12 17
Does anybody have any tips here?
Quoting the question: "where I want to sum sales in event_date for 7 days prior to the start date for each id".
Because the start date is constant for each id, this sum is a single value per id. You can calculate it as:
select s.*,
sum(case when event_date <= start_date and event_date >= start_date - interval 7 day
then sales
end) over (partition by id)
from sample_df s;
Your results suggest, though, that you really want a cumulative sum based on the event_date. That's fine, but a different question. The answer for that is to tweak the SQL:
select s.*,
sum(case when event_date <= start_date and event_date >= start_date - interval 7 day
then sales
end) over (partition by id order by event_date)
from sample_df s;
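
A SUM over a CASE without ELSE returns NULL until the first qualifying row is reached, whereas the desired output shows 0 there. A possible adjustment, sketched under the assumption that the dialect accepts start_date - INTERVAL 7 DAY (as MySQL 8 and Spark SQL do), wraps the running sum in COALESCE. The alias sales_7_days is taken from the question.

```sql
SELECT
    s.*,
    COALESCE(
        SUM(CASE
                WHEN event_date <= start_date
                 AND event_date >= start_date - INTERVAL 7 DAY
                THEN sales
            END) OVER (PARTITION BY id ORDER BY event_date),
        0                       -- rows before the first qualifying event show 0
    ) AS sales_7_days
FROM sample_df s
ORDER BY id, event_date;
```

Because this is a running total, a non-qualifying row that comes after a qualifying one keeps showing the total reached so far instead of 0. If such rows should show 0 as well, the whole expression would need an outer CASE on the same date condition.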

Period and Quarter Sequence

I'm trying to find a way to build a sequence for date periods and quarters (not sure if this is the correct term).
Basically this will help people to navigate dates based on weeks, periods, and quarters once I join this to our sales data. For example, if I just want to know the sales from last week, I could just use WHERE WeekSequence = -1... Another example is, a manager wants to get the sales data for the past quarter, I could just use WHERE QuarterSequence = -1... something like that.
My current table:
WeekStartDate WeekEndDate CurrentWeek Period Quarter WeekSequence
----------------------------------------------------------------------
2020-08-03 2020-08-09 0 2 1 -5
2020-08-10 2020-08-16 0 2 1 -4
2020-08-17 2020-08-23 0 2 1 -3
2020-08-24 2020-08-30 0 2 1 -2
2020-08-31 2020-09-06 0 2 1 -1
2020-09-07 2020-09-13 1 3 1 0
2020-09-14 2020-09-20 0 3 1 1
2020-09-21 2020-09-27 0 3 1 2
2020-09-28 2020-10-04 0 3 1 3
2020-10-05 2020-10-11 0 4 2 4
2020-10-12 2020-10-18 0 4 2 5
What I want it to look like (highlighted):
If I understand correctly, just use window functions:
select t.*,
(period -
max(case when currentweek = 1 then period end) over ()
) as periodsequence,
(quarter -
max(case when currentweek = 1 then quarter end) over ()
) as quartersequence
from t;
You can include this in a view rather than putting it in a table.
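
As a sketch of the view suggestion, the query can be wrapped in a view and then filtered the way the question describes. The view name week_sequences is hypothetical, t stands for the calendar table as in the answer, and the approach assumes that period and quarter numbers do not restart within the table (for example at a year boundary).

```sql
CREATE VIEW week_sequences AS
SELECT t.*,
       (period -
        MAX(CASE WHEN currentweek = 1 THEN period END) OVER ()
       ) AS periodsequence,
       (quarter -
        MAX(CASE WHEN currentweek = 1 THEN quarter END) OVER ()
       ) AS quartersequence
FROM t;

-- All weeks of the previous period
SELECT weekstartdate, weekenddate
FROM week_sequences
WHERE periodsequence = -1;
```

With the sample rows, the current week is in period 3, so the second query would return the five weeks of period 2 (2020-08-03 through 2020-08-31).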