how to find consecutive user login across week - sql

I'm fairly new to SQL, and the complexity of this report may be above my pay grade.
I need help finding the users who logged in to the app in every consecutive week of a chosen time period (this logic eventually needs to be extended to month, quarter and year, but a week is good for now).
Table structure for ref
events: User_id int, login_date timestamp
The events table can have one or more entries per user, since a user can log in to the app multiple times. To illustrate, if we focus on Jan 2020 to Mar 2020, I need the following in the output:
user_id of each user who logged into the app at least once every week from 2020Wk1 to 2020Wk14
the week they logged in
number of times they logged in that week
I'm also okay if the output of the query is just the user_id. The thing is I'm unable to make sense out of the output that I'm seeing on my end after trying the following SQL code, perhaps working on this problem for so long might be the reason for that!
SQL code tried so far:
SELECT DISTINCT user_id
,extract('year' FROM timestamp)||'Wk'|| extract('week' FROM timestamp)
,lead(extract('week' FROM timestamp)) over (partition by user_id, extract('week' FROM timestamp) order by extract('week' FROM timestamp))
FROM events
WHERE user_id = 'Anything that u wish to enter'

You can get the summary you want as:
select user_id, date_trunc('week', timestamp) as week, count(*)
from events
group by user_id, week;
But the filtering is trickier. It is better to go with dates rather than week numbers:
select user_id, date_trunc('week', timestamp) as week, count(*) as cnt,
count(*) over (partition by user_id) as num_weeks
from events
where timestamp >= ? and timestamp < ?
group by user_id, week;
Then you can use a subquery:
select uw.*
from (select user_id, date_trunc('week', timestamp) as week, count(*) as cnt,
count(*) over (partition by user_id) as num_weeks
from events
where timestamp >= ? and timestamp < ?
group by user_id, week
) uw
where num_weeks = ? -- 14 in your example
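To make the filtering step concrete, here is a minimal sketch of the same idea run from Python against an in-memory SQLite database. SQLite is only a stand-in for the asker's database, so `date(login_date, 'weekday 0', '-6 days')` replaces `date_trunc('week', ...)`, and an outer `HAVING COUNT(*) = ?` replaces the `num_weeks` window column. The sample rows are invented for illustration, and the column is named `login_date` as in the table definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, login_date TEXT)")
rows = [
    # user 1 logs in during each of three consecutive weeks (twice in the first)
    (1, "2020-01-06 09:00:00"), (1, "2020-01-08 10:00:00"),
    (1, "2020-01-14 09:00:00"), (1, "2020-01-26 23:00:00"),
    # user 2 skips the middle week
    (2, "2020-01-07 09:00:00"), (2, "2020-01-21 09:00:00"),
]
conn.executemany("INSERT INTO events VALUES (?, ?)", rows)

# Monday of the week containing login_date:
# move forward to Sunday (weekday 0), then back six days.
WEEK_START = "date(login_date, 'weekday 0', '-6 days')"

query = f"""
    SELECT user_id
    FROM (SELECT user_id, {WEEK_START} AS week, COUNT(*) AS cnt
          FROM events
          WHERE login_date >= ? AND login_date < ?
          GROUP BY user_id, week) uw
    GROUP BY user_id
    HAVING COUNT(*) = ?      -- one row per active week, so this counts weeks
    ORDER BY user_id
"""
# Three full weeks: Monday 2020-01-06 up to (not including) Monday 2020-01-27.
params = ("2020-01-06", "2020-01-27", 3)
every_week_users = [r[0] for r in conn.execute(query, params)]
print(every_week_users)
```

The half-open date range and the expected week count have to agree, so the range should start and end on week boundaries.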

Related

SQL list number of seen occurrences per month higher than 15

I'm not an SQL expert, so I'm requesting your help to list the MACs that appear on more than 15 days in a month.
I made the following query, but is very complex and most probably not efficient. Any suggestions on how to make it simpler and efficient?
I'm using Google BigQuery, if that helps.
SELECT
macDays.macAddress AS macAddress,
macDays.days AS days
FROM (
SELECT
list_mac.macAddress AS macAddress,
COUNT( list_mac.macAddress) AS days
FROM (
SELECT
macAddress,
TIMESTAMP_TRUNC(time, DAY) date,
FROM
`my_table`
WHERE
time BETWEEN '2021-06-01 00:00:00'
AND '2021-06-30 23:59:00.000059'
GROUP BY
macAddress,
date
ORDER BY
macAddress) AS list_mac
GROUP BY
macAddress ) AS macDays
WHERE
macDays.days > 15
GROUP BY
macAddress,
days
The problem is that you are stripping off the time component from the date in your SELECT but grouping with the time portion left in, so you will get one row for every appearance rather than one for every day.
You can probably get rid of the inner subquery by using COUNT(DISTINCT field).
Try something like:
SELECT
macAddress AS macAddress,
COUNT(DISTINCT TIMESTAMP_TRUNC(time, DAY)) AS days
FROM
`my_table`
WHERE
time BETWEEN '2021-06-01 00:00:00'
AND '2021-06-30 23:59:00.000059'
GROUP BY
macAddress
HAVING
COUNT(DISTINCT TIMESTAMP_TRUNC(time, DAY)) > 15
ORDER BY
macAddress
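As a quick check of the COUNT(DISTINCT ...) plus HAVING approach, here is a small sketch run from Python against in-memory SQLite rather than BigQuery. The assumptions: `date(time)` stands in for `TIMESTAMP_TRUNC(time, DAY)`, timestamps are stored as text, the sample rows are made up, and a half-open range is used for the month instead of BETWEEN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (macAddress TEXT, time TEXT)")

rows = []
for day in range(1, 21):
    # "aa" is seen twice a day on 20 different days
    rows.append(("aa", f"2021-06-{day:02d} 08:00:00"))
    rows.append(("aa", f"2021-06-{day:02d} 18:00:00"))
for _ in range(30):
    # "bb" is seen 30 times, but always on the same day
    rows.append(("bb", "2021-06-05 12:00:00"))
conn.executemany("INSERT INTO my_table VALUES (?, ?)", rows)

query = """
    SELECT macAddress, COUNT(DISTINCT date(time)) AS days
    FROM my_table
    WHERE time >= '2021-06-01' AND time < '2021-07-01'
    GROUP BY macAddress
    HAVING COUNT(DISTINCT date(time)) > 15
    ORDER BY macAddress
"""
result = conn.execute(query).fetchall()
print(result)
```

Only "aa" survives the filter: "bb" has more rows than 15 but only one distinct day, which is the difference between counting appearances and counting days.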
You can do this with a subquery. It first reduces the data to one row per MAC per day, and the outer query then keeps only the MACs that appeared on more than 15 days in a month.
I have not added any filter, so you can add one as needed. If you need how many times MACs appear in the database on a single day, group by dt instead. And if you want the total number of appearances in the whole month, just remove DISTINCT.
SELECT COUNT(*) cnt,
macAddress,
mnth
FROM
(SELECT DISTINCT -- This will select only unique MACs on a day
macAddress,
TIMESTAMP_TRUNC(TIME, DAY) dt,
TIMESTAMP_TRUNC(TIME, MONTH) mnth
FROM `my_table`) q
GROUP BY macAddress, mnth
HAVING COUNT(*)>15
I would go with HAVING, which filters on aggregated values after the GROUP BY:
select substr(time, 1, 7) yrmonth, macAddress, count(distinct substr(time, 1, 10)) macDays
from my_table
where substr(time, 1, 7) = '2021-06'
group by macAddress, substr(time, 1, 7)
having count(distinct substr(time, 1, 10)) > 15
order by yrmonth desc
I have not tried this on Google BigQuery; here is the example on SQLite: SQL Fiddler

Daily Rolling Count of Distinct Users on Different time periods

I am trying to find the most efficient way to run the following query, which I need to connect to Tableau and visualise. The idea is to count 7-day, 30-day and 90-day active users for each day. So for today I want everyone who was active within those timeframes ending today, and likewise for yesterday, and so on.
To clarify, users can be active multiple times within my time frames.
The count of 7-day active users is the distinct number of users who had a session between today's date minus 6 and today's date. I need to calculate this for every date within the last 6 months.
This is the query I have.
with dau as (
select date_trunc('day', created_date) created_at,
count(distinct customer_id) dau
from sessions
where created_date >= date_trunc('day', dateadd('month', -6, getdate()))
group by date_trunc('day', created_date)
)
select created_at,
dau,
(select count(distinct customer_id)
from sessions
where date_trunc('day', created_date) between created_at - 6 and created_at) wau,
(select count(distinct customer_id)
from sessions
where date_trunc('day', created_date) between created_at - 29 and created_at) as mau,
(select count(distinct customer_id)
from sessions
where date_trunc('day', created_date) between created_at - 89 and created_at) as three_mau
from dau
It takes 30 min to run which seems crazy. Is there a better way to do it? I am also looking into the use of materialised views as a faster way to use this in a dashboard. Would this work?
The result I am looking to get would be a table where the rows are dates within the last 6 months and each column is the count of distinct users on 7, 30 and 90 periods from that date.
Thanks in advance!
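One possible restructuring, offered as a sketch and not benchmarked on the asker's warehouse: reduce sessions to one row per customer per day, join each reporting date to the previous 90 days once, and derive all three counts from that single join with conditional COUNT(DISTINCT ...). The example below runs from Python on in-memory SQLite, so `date(...)` replaces `date_trunc('day', ...)`; the CTE names `user_days` and `days` and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (customer_id INTEGER, created_date TEXT)")
conn.executemany("INSERT INTO sessions VALUES (?, ?)", [
    (4, "2023-11-20 08:00:00"),
    (1, "2024-01-01 10:00:00"),
    (2, "2024-01-03 11:00:00"),
    (1, "2024-01-08 09:00:00"),
    (3, "2024-01-08 12:00:00"),
])

query = """
    WITH user_days AS (      -- one row per customer per active day
        SELECT DISTINCT customer_id, date(created_date) AS day
        FROM sessions
    ),
    days AS (                -- reporting dates (here: days with any activity)
        SELECT DISTINCT day FROM user_days
    )
    SELECT d.day,
           COUNT(DISTINCT CASE WHEN u.day >= date(d.day, '-6 days')
                               THEN u.customer_id END) AS wau,
           COUNT(DISTINCT CASE WHEN u.day >= date(d.day, '-29 days')
                               THEN u.customer_id END) AS mau,
           COUNT(DISTINCT u.customer_id) AS three_mau
    FROM days d
    JOIN user_days u
      ON u.day BETWEEN date(d.day, '-89 days') AND d.day
    GROUP BY d.day
    ORDER BY d.day
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Days with no sessions at all do not appear in this output; a generated calendar of dates in place of `days` would be needed to report them.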

Month over Month percent change in user registrations

I am trying to write a query to find month over month percent change in user registration.
Users table has the logs for user registrations
user_id - pk, integer
created_at - account created date, varchar
activated_at - account activated date, varchar
state - active or pending, varchar
I found the number of users for each year and month. How do I find month over month percent change in user registration? I think I need a window function?
SELECT
EXTRACT(month from created_at::timestamp) as created_month
,EXTRACT(year from created_at::timestamp) as created_year
,count(distinct user_id) as number_of_registration
FROM users
GROUP BY 1,2
ORDER BY 1,2
This is the output of above query:
Then I wrote this to find the difference in user registration in the previous year.
SELECT
*
,number_of_registration - lag(number_of_registration) over (partition by created_month) as difference_in_previous_year
FROM (
SELECT
EXTRACT(month from created_at::timestamp) as created_month
,EXTRACT(year from created_at::timestamp) as created_year
,count( user_id) as number_of_registration
FROM users as u
GROUP BY 1,2
ORDER BY 1,2) as temp
The output is this:
You want an order by clause that contains created_year.
number_of_registration
- lag(number_of_registration) over (partition by created_month order by created_year) as difference_in_previous_year
Note that you don't actually need a subquery for this. You can do:
select
extract(year from created_at) as created_year,
extract(month from created_at) as created_month,
count(*) as number_of_registration,
count(*) - lag(count(*)) over(partition by extract(month from created_at) order by extract(year from created_at)) as difference_in_previous_year
from users as u
group by created_year, created_month
order by created_year, created_month
I used count(*) instead of count(user_id), because I assume that user_id is not nullable (in which case count(*) is equivalent, and more efficient). Casting to a timestamp is also probably superfluous.
These queries work as long as you have data for every month. If you have gaps, then the problem should be addressed differently - but this is not the question you asked here.
I can get the registrations from each year as two tables and join them. But it is not that effective
SELECT
t1.created_year as year_2013
,t2.created_year as year_2014
,t1.created_month as month_of_year
,t1.number_of_registration_2013
,t2.number_of_registration_2014
,100.0 * (t2.number_of_registration_2014 - t1.number_of_registration_2013) / t1.number_of_registration_2013 as percent_change_in_previous_year_month
FROM
(select
extract(year from created_at) as created_year
,extract(month from created_at) as created_month
,count(*) as number_of_registration_2013
from users
where extract(year from created_at) = '2013'
group by 1,2) t1
inner join
(select
extract(year from created_at) as created_year
,extract(month from created_at) as created_month
,count(*) as number_of_registration_2014
from users
where extract(year from created_at) = '2014'
group by 1,2) t2
on t1.created_month = t2.created_month
First off, why are you using strings to hold date/time values? Your first step should be to define created_at and activated_at as proper timestamps. The resulting query assumes this correction. If you do not correct it, cast the string to a timestamp in the CTE generating the date range. But keep in mind that if you leave it as text you will at some point get a conversion exception.
To calculate month-over-month use the formula "100*(Nt - Nl)/Nl" where Nt is the number of users this month and Nl is the number of users last month. There are 2 potential issues:
There are gaps in the data.
Nl is 0 (would incur divide by 0 exception)
The following handles both by first generating the months from the earliest date to the latest date, then outer joining monthly counts to the generated dates. When Nl = 0 the query returns NULL, indicating that the percent change could not be calculated.
with full_range(the_month) as
(select generate_series(low_month, high_month, interval '1 month')
from (select min(date_trunc('month',created_at)) low_month
, max(date_trunc('month',created_at)) high_month
from users
) m
)
select to_char(the_month,'yyyy-mm')
, users_this_month
, case when users_last_month = 0
then null::float
else round((100.00*(users_this_month-users_last_month)/users_last_month),2)
end percent_change
from (
select the_month, users_this_month , lag(users_this_month) over(order by the_month) users_last_month
from ( select f.the_month, count(u.created_at) users_this_month
from full_range f
left join users u on date_trunc('month',u.created_at) = f.the_month
group by f.the_month
) mc
) pc
order by the_month;
NOTE: There are several places where the above can be shortened. But the longer form is intentional to show how the final values are derived.
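For readers who want to trace the arithmetic outside the database, here is a minimal Python sketch of the same three steps: fill the month gaps, look up the previous month, and apply 100*(Nt - Nl)/Nl with a null result when Nl is 0 or missing. The helper names `month_range` and `pct_change` and the sample dates are hypothetical, not part of the query above.

```python
from collections import Counter
from datetime import date


def month_range(first, last):
    """Yield the first day of every month from first to last, inclusive."""
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def pct_change(this_month, last_month):
    """100*(Nt - Nl)/Nl, or None when there is no usable previous month."""
    if not last_month:  # covers both None (first row) and 0
        return None
    return round(100.0 * (this_month - last_month) / last_month, 2)


# Invented registrations; note that March 2013 has no rows at all.
signups = [date(2013, 1, 5), date(2013, 1, 20), date(2013, 2, 3),
           date(2013, 2, 9), date(2013, 2, 27), date(2013, 4, 1)]
counts = Counter(d.replace(day=1) for d in signups)

report, previous = [], None
for month in month_range(min(counts), max(counts)):
    current = counts.get(month, 0)  # gap months count as 0 users
    report.append((month.strftime("%Y-%m"), current,
                   pct_change(current, previous)))
    previous = current

for line in report:
    print(line)
```

The gap month shows up as a drop to 0, and the month after it gets no percentage because its previous value is 0.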

Fetch records from current week only

I made a discord bot where I record contributions/posts of members in a database, in the following table.
CREATE TABLE IF NOT EXISTS posts (
id integer GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
...
post_date timestamptz NOT NULL DEFAULT CURRENT_TIMESTAMP
);
At the end of each week I'd manually run a command to fetch the member who had the most contributions, so I could give him/her an award. To do that I created the following view, which worked fine since it was for personal use only and I ran it on the same day every week.
CREATE VIEW vw_posts AS
SELECT guild_id, account_id, COUNT(*) AS posts
FROM public.posts
WHERE post_date > CURRENT_TIMESTAMP - INTERVAL '1 week'
GROUP BY(guild_id, account_id)
ORDER BY posts DESC;
Now I'm doing a new command to show a weekly leaderboard. So after creating the command I quickly realized that my view is fetching in a 7-days interval rather than fetching the current week, so it's fetching data from the previous week.
I'm getting results like the red line, but I'd like the view to act as the green one.
I did a bit of research but most posts would suggest using date_trunc() or functions alike that wouldn't let me get the rest of the data, I'm definitely struggling to do the query even after reading the documentation.
Thanks for any advice!
For the current week use:
CREATE VIEW vw_posts AS
SELECT guild_id, account_id, COUNT(*) AS posts
FROM public.posts
WHERE post_date >= DATE_TRUNC('week', CURRENT_TIMESTAMP)
GROUP BY guild_id, account_id
ORDER BY posts DESC;
For the previous week:
CREATE VIEW vw_posts AS
SELECT guild_id, account_id, COUNT(*) AS posts
FROM public.posts
WHERE post_date >= DATE_TRUNC('week', CURRENT_TIMESTAMP) - INTERVAL '1 week' AND
post_date < DATE_TRUNC('week', CURRENT_TIMESTAMP)
GROUP BY guild_id, account_id
ORDER BY posts DESC;
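To see why the two filters differ, here is a small Python sketch that computes both boundaries for a fixed "now". It assumes weeks start on Monday at 00:00, which is what DATE_TRUNC('week', ...) returns in PostgreSQL, and it uses UTC throughout; in the database the truncation follows the session time zone. The function name `week_start` and the sample timestamps are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def week_start(now):
    """Monday 00:00 of the week containing now."""
    monday = now - timedelta(days=now.weekday())  # weekday(): Monday == 0
    return monday.replace(hour=0, minute=0, second=0, microsecond=0)


now = datetime(2021, 3, 18, 15, 30, tzinfo=timezone.utc)  # a Thursday
start = week_start(now)             # boundary used by the date_trunc filter
rolling = now - timedelta(weeks=1)  # boundary used by the original view

# A post from Friday of the previous week.
post = datetime(2021, 3, 12, 9, 0, tzinfo=timezone.utc)
in_rolling_window = post > rolling
in_current_week = post >= start

print(start.isoformat(), in_rolling_window, in_current_week)
```

The Friday post falls inside the rolling 7-day window but not inside the current week, which is the red-line versus green-line behaviour described in the question.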

how to perform query in PostgreSQL that returns a data count created grouped by month?

In PostgreSQL, how do I perform a query that returns the number of rows created in a particular table, by month? I would like the result to be something like:
month: January
count: 67
month: February
count: 85
....
....
Let's suppose I have a table, users. This table has a primary key, id, and a created_at column with time stored in ISO8601 formatting. Last year n number of users were created, and now I want to know how many were created by month, and I want the data returned to me in the above format -- grouped by month and an associated count reflecting how many users were created that month.
Does anyone know how to perform the above SQL query in postgresql?
The query would look something like this:
select date_trunc('month', created_at) as mm, count(*)
from users u
where subscribed = true and
created_at >= '2016-01-01' and
created_at < '2017-01-01'
group by date_trunc('month', created_at);
I don't know where the constant '2017-03-20 13:38:46.688-04' is coming from.
Of course you can make the year comparison dynamic:
select date_trunc('month', created_at) as mm, count(*)
from users u
where subscribed = true and
created_at >= date_trunc('year', now()) - interval '1 year' and
created_at < date_trunc('year', now())
group by date_trunc('month', created_at);
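A runnable sketch of the same grouping, executed from Python on in-memory SQLite as a stand-in for PostgreSQL. Assumptions: `strftime('%Y-%m', created_at)` plays the role of `date_trunc('month', created_at)`, the `subscribed` filter is left out, created_at is stored as ISO 8601 text, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, created_at TEXT)")
created = [
    "2015-12-31T23:59:59",  # before the range
    "2016-01-03T10:00:00",
    "2016-01-15T12:30:00",
    "2016-02-01T00:00:00",
    "2017-01-01T00:00:00",  # at the exclusive upper bound, so excluded
]
conn.executemany("INSERT INTO users (created_at) VALUES (?)",
                 [(c,) for c in created])

query = """
    SELECT strftime('%Y-%m', created_at) AS mm, COUNT(*) AS cnt
    FROM users
    WHERE created_at >= '2016-01-01' AND created_at < '2017-01-01'
    GROUP BY mm
    ORDER BY mm
"""
per_month = conn.execute(query).fetchall()
for mm, cnt in per_month:
    print(f"month: {mm}")
    print(f"count: {cnt}")
```

In PostgreSQL itself, wrapping the truncated date in to_char(..., 'Month') is one way to get the month name shown in the question.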