Daily count of user_ids who have visited my store 4 or more times on a given day - sql

I have a table of user_ids who have visited my platform. For each of the last 10 days, I want a count of only those user IDs who visited my store 4 or more times on that day.
To achieve this I am using this query:
select date(arrival_timestamp), count(user_id)
from mytable
where date(arrival_timestamp) >= current_date-10
and date(arrival_timestamp) < current_date
group by 1
having count(user_id)>=4
order by 2 desc
limit 10;
But this query is effectively taking all the users having a count value greater than 4, and not on a daily basis, which covers almost every user, and hence I am not able to segregate only those users who visit my store more than once on a particular day. Any help in this regard is appreciated.
Thanks
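For the literal requirement (a per-day count of the users who made 4 or more visits on that day), the grouping has to happen per user per day first. A minimal, untested sketch along those lines, reusing the table and column names from the question:
select visit_day, count(*) as users_with_4_plus_visits
from (
    -- one row per user per day, kept only if that user visited 4 or more times that day
    select user_id, date(arrival_timestamp) as visit_day
    from mytable
    where date(arrival_timestamp) >= current_date - 10
      and date(arrival_timestamp) < current_date
    group by user_id, date(arrival_timestamp)
    having count(*) >= 4
) per_user_per_day
group by visit_day
order by visit_day;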

You can try this:
with list as (
select user_id, count(*) as user_count, array_agg(arrival_timestamp) as arrival_timestamp
from mytable
where date(arrival_timestamp) >= current_date-10
and date(arrival_timestamp) < current_date
group by user_id)
select user_id, unnest(arrival_timestamp)
from list
where user_count >= 4

From the list of users that have visited your store 4 or more times on a day within the last 10 days (the inner query), select those that have 10 occurrences, i.e. one for every day.
select user_id
from
(
select user_id
from the_table
where arrival_timestamp::date between current_date - 10 and current_date - 1
group by user_id, arrival_timestamp::date
having count(*) >= 4
) t
group by user_id
having count(*) = 10;

Related

How to get the average of the number of actions per day

I have written this SQL query:
SELECT id,
date_diff("day", create_date, date) as day,
action_type
FROM "my_database"
It returns this:
id day action_type
1 0 upload
1 0 upload
1 0 upload
1 1 upload
1 1 upload
2 0 upload
2 0 upload
2 1 upload
How do I change my query to get a table with unique days in the day column and the average number of "upload" actions across all ids? The desired result must look like this:
day avg_num_action
0 2.5
1 1.5
It is 2.5 because (3+2)/2 (3 uploads for id 1 and 2 uploads for id 2 on day 0); the same logic gives 1.5 for day 1.
Please try this. Consider your given query as a table. If any WHERE condition is needed then keep the WHERE clause, otherwise drop it.
SELECT t.day
, COUNT(*) * 1.0 / COUNT(DISTINCT t.id) avg_num_action
FROM (SELECT id,
date_diff("day", create_date, date) as day,
action_type
FROM "my_database") t
WHERE t.action_type = 'upload'
GROUP BY t.day
Or create a table from your given result set and write the query based on that:
SELECT t.tday
, COUNT(*) * 1.0 / COUNT(DISTINCT t.id) avg_num_action
FROM my_database t
GROUP BY t.tday
You can check the demo at https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=871935ea2b919c4e24eb83fcbce78973
Update: I think my two-step approach is more complicated than needed. Rahul Biswas shows how this can be done in one step. I suggest you use and accept his answer.
Original answer:
Two steps:
Count entries per ID and day
Take the average count per day
The query:
with rows as (select id, date_diff('day', create_date, date) as day from mytable)
, per_id_and_day as (select id, day, count(*) as cnt from rows group by id, day)
select day, avg(cnt)
from per_id_and_day
group by day
order by day;
You don't need a subquery for this logic:
SELECT date_diff("day", create_date, date) as day,
COUNT(*) * 1.0 / COUNT(DISTINCT id)
FROM "my_database"
GROUP BY date_diff("day", create_date, date)

Daily Rolling Count of Distinct Users on Different time periods

I am trying to find the most efficient way to run the following query, which I need to connect to Tableau and visualise. The idea is to count 7-day, 30-day and 90-day active users for each day: for any given date, I want the distinct users who were active within each of those trailing windows ending on that date.
To clarify, users can be active multiple times within my time frames.
A count of 7-day active users would be the number of distinct users who had a session between today's date and today's date - 6. I need to calculate this for every date within the last 6 months.
This is the query I have.
with dau as (
select date_trunc('day', created_date) created_at,
count(distinct customer_id) dau
from sessions
where created_date >= date_trunc('day', dateadd('month', -6, getdate()))
group by date_trunc('day', created_date)
)
select created_at,
dau,
(select count(distinct customer_id)
from sessions
where date_trunc('day', created_date) between created_at - 6 and created_at) wau,
(select count(distinct customer_id)
from sessions
where date_trunc('day', created_date) between created_at - 29 and created_at) as mau,
(select count(distinct customer_id)
from sessions
where date_trunc('day', created_date) between created_at - 89 and created_at) as three_mau
from dau
It takes 30 minutes to run, which seems crazy. Is there a better way to do it? I am also looking into the use of materialised views as a faster way to use this in a dashboard. Would this work?
The result I am looking for would be a table where the rows are dates within the last 6 months and the columns are the counts of distinct users over the 7-, 30- and 90-day windows ending on that date.
Thanks in advance!
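One common rework (a sketch only, not tested against your data; table and column names are taken from the question) is to reduce sessions to one row per customer per day first, then join each reporting day to the customer-days inside its trailing 90-day window and compute all three counts in a single pass with conditional aggregation:
with daily as (
    -- one row per customer per calendar day, going back far enough to cover the 90-day lookback
    select distinct date_trunc('day', created_date)::date as activity_day, customer_id
    from sessions
    where created_date >= dateadd('day', -89, date_trunc('day', dateadd('month', -6, getdate())))
),
days as (
    -- the reporting date spine: every active day in the last 6 months
    select distinct activity_day as created_at
    from daily
    where activity_day >= date_trunc('day', dateadd('month', -6, getdate()))::date
)
select d.created_at,
       count(distinct case when a.activity_day = d.created_at then a.customer_id end) as dau,
       count(distinct case when a.activity_day >= d.created_at - 6 then a.customer_id end) as wau,
       count(distinct case when a.activity_day >= d.created_at - 29 then a.customer_id end) as mau,
       count(distinct a.customer_id) as three_mau
from days d
join daily a
  on a.activity_day between d.created_at - 89 and d.created_at
group by d.created_at
order by d.created_at;
Materialising the daily step (one row per customer per day) is also a reasonable way to speed up a dashboard, since that is where most of the scan cost sits.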

How to find the number of purchases over time intervals SQL

I'm using Redshift (Postgres) and Pandas to do my work. I'm trying to get the number of user actions; let's say purchases, to make it easier to understand. I have a table, purchases, that holds the following data:
user_id, timestamp, price
1, 2015-02-01, 200
1, 2015-02-02, 50
1, 2015-02-10, 75
Ultimately I would like the number of purchases per user over certain time windows, such as:
userid, 28-14_days, 14-7_days, 7
Here is what I have so far; I'm aware I don't have an upper limit on the dates:
SELECT DISTINCT x_days.user_id, SUM(x_days.purchases) AS x_num, SUM(y_days.purchases) AS y_num,
x_days.x_date, y_days.y_date
FROM
(
SELECT purchases.user_id, COUNT(purchases.user_id) as purchases,
DATE(purchases.timestamp) as x_date
FROM purchases
WHERE purchases.timestamp > (current_date - INTERVAL '%(x_days_ago)s day') AND
purchases.max_value > 200
GROUP BY DATE(purchases.timestamp), purchases.user_id
) AS x_days
JOIN
(
SELECT purchases.user_id, COUNT(purchases.user_id) as purchases,
DATE(purchases.timestamp) as y_date
FROM purchases
WHERE purchases.timestamp > (current_date - INTERVAL '%(y_days_ago)s day') AND
purchases.max_value > 200
GROUP BY DATE(purchases.timestamp), purchases.user_id) AS y_days
ON
x_days.user_id = y_days.user_id
GROUP BY
x_days.user_id, x_days.x_date, y_days.y_date
params={'x_days_ago':x_days_ago, 'y_days_ago':y_days_ago}
where these are set in Python/pandas:
x_days_ago = 14
y_days_ago = 7
But this didn't work out exactly as planned:
user_id x_num y_num x_date y_date
0 5451772 1 1 2015-02-10 2015-02-10
1 5026678 1 1 2015-02-09 2015-02-09
2 6337993 2 1 2015-02-14 2015-02-13
3 6204432 1 3 2015-02-10 2015-02-11
4 3417539 1 1 2015-02-11 2015-02-11
Even though I don't have an upper date to look between (so x is effectively searching from 14 days ago to now and y from 7 days ago to now, meaning they overlap), in some cases y_num is higher than x_num.
Can anyone help me either fix this or give me a better way?
Thanks!
It might not be the most efficient answer, but you can generate each sum with a sub-select:
WITH
summed AS (
SELECT user_id, day, COUNT(1) AS purchases
FROM (SELECT user_id, DATE(timestamp) AS day FROM purchases) AS _
GROUP BY user_id, day
),
users AS (SELECT DISTINCT user_id FROM purchases)
SELECT user_id,
(SELECT SUM(purchases) FROM summed
WHERE summed.user_id = users.user_id
AND day >= DATE(NOW() - interval ' 7 days')) AS days_7,
(SELECT SUM(purchases) FROM summed
WHERE summed.user_id = users.user_id
AND day >= DATE(NOW() - interval '14 days')) AS days_14
FROM users;
(This was tested in Postgres, not in Redshift; but the Redshift documentation suggests that both WITH and DISTINCT are supported.) I would have liked to do this with a window, to obtain rolling sums; but it's a little onerous without generate_series().
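If the goal really is the non-overlapping buckets from the question (28-14 days ago, 14-7 days ago, last 7 days), conditional aggregation avoids the self-join entirely. A sketch reusing the question's table and columns; the exact boundary conditions and the omission of the max_value filter are assumptions to adjust:
SELECT user_id,
       -- count of purchase rows in each trailing bucket
       SUM(CASE WHEN purchases.timestamp::date >  current_date - 28
                 AND purchases.timestamp::date <= current_date - 14 THEN 1 ELSE 0 END) AS days_28_14,
       SUM(CASE WHEN purchases.timestamp::date >  current_date - 14
                 AND purchases.timestamp::date <= current_date - 7  THEN 1 ELSE 0 END) AS days_14_7,
       SUM(CASE WHEN purchases.timestamp::date >  current_date - 7  THEN 1 ELSE 0 END) AS days_7
FROM purchases
WHERE purchases.timestamp::date > current_date - 28
GROUP BY user_id;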

sql - find the number of days a user was using the app

I'd like to write a SQL query that counts the number of days each user used the application and how many consecutive days. A user can enter the app several times a day, but that should count as 1.
My table looks like this:
id | bigint
user_id | bigint
action_date | timestamp without time zone
To count the number of days per user:
SELECT user_id, count(DISTINCT action_date::date) AS days
FROM user_action_tbl
GROUP BY user_id;
One way to do it:
SELECT user_id, COUNT(*) days_total, SUM(consecutive) days_consecutive
FROM
(
SELECT user_id,
CASE WHEN LEAD(date, 1) OVER (PARTITION BY user_id ORDER BY date) - date = 1 THEN 1 ELSE 0 END consecutive
FROM
(
SELECT user_id, action_date::date date
FROM table1
GROUP BY user_id, action_date::date
) q
) p
GROUP BY user_id
Here is a SQLFiddle demo
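If "consecutive days" is meant as the length of the longest streak per user (rather than a per-day flag as above), the usual gaps-and-islands trick is to subtract a row number from each distinct day, so days in an unbroken run share the same anchor value. A sketch, assuming the table and columns from the question:
SELECT user_id, MAX(streak_len) AS longest_consecutive_days
FROM (
    SELECT user_id, grp, COUNT(*) AS streak_len
    FROM (
        SELECT user_id,
               day,
               -- consecutive days get the same (day - row_number) anchor
               day - (ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY day))::int AS grp
        FROM (SELECT DISTINCT user_id, action_date::date AS day FROM user_action_tbl) d
    ) marked
    GROUP BY user_id, grp
) streaks
GROUP BY user_id;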

SQL: List of users that meet a condition N times

I have data that looks something like:
Date UserID Visits
2012-01-01 2 5
...
I would like to output a list of users who have > x visits on at least y dates (e.g., the users who have >5 visits for at least 3 dates from January 3 to January 10).
Try this:
SELECT SUB.UserId, COUNT(*) FROM (
SELECT VL.UserId FROM VisitLog VL
WHERE VL.Visits > 5
AND VL.Date BETWEEN '2014-01-03' AND '2014-01-10') SUB
GROUP BY SUB.UserId
HAVING COUNT(*) >= 3
The subquery returns all rows where the number of Visits > 5 within your sample date range.
The results of this are then counted to return only users where this condition has been matched at least 3 times.
You don't give much information, but if you have multiple records per date per user then use this query (exactly the same principle, just with an inner grouping to sum by user and date):
SELECT SUB.UserId, COUNT(*) FROM (
SELECT VL.UserId, VL.Date FROM VisitLog VL
WHERE VL.Date BETWEEN '2014-01-03' AND '2014-01-10'
GROUP BY VL.UserId, VL.Date
HAVING SUM(VL.Visits) > 5) SUB
GROUP BY SUB.UserId
HAVING COUNT(*) >= 3
Select UserID,SUM(Visits) from TableOfData
where Date>#HereTheDate
group by UserID
having SUM(Visits)>5
Try this:
select * from users
where id in (
select UserID
from userVisits
where date between '2014-01-03' and '2014-01-10'
and visits >= 5
group by userid
having count(*) >= 3)