SELF JOIN a query to obtain the number of reactivated users - sql

Assume you have the table given below containing information on Facebook user logins. Write a query to obtain the number of reactivated users (which are dormant users who did not log in the previous month, who then logged in during the current month). Output the current month and number of reactivated users.
I have tried to solve this by first joining each user's current month to the previous month with this code:
WITH CTE as
(SELECT user_id,
EXTRACT(month from login_date) as current_month,
EXTRACT(month from login_date)-1 as prev_month
FROM user_logins)
SELECT a.user_id as user_id, a.current_month, a.prev_month,
b.user_id as prev_month_user
FROM CTE a LEFT JOIN CTE b
ON a.prev_month = b.current_month
My idea is to use a CASE expression:
CASE WHEN a.user_id IN
(SELECT b.user_id
WHERE b.current_month = a.prev_month)
THEN 0 ELSE 1 END
But that gives me the wrong output for user_id 245 in current_month 4.
https://drive.google.com/file/d/1dOQQxaJWv7j7o7M1Q98nlj77KCzIHxKl/view?usp=sharing
How to fix this?

This gets you the first day of the current month:
select date_trunc('month', current_date)
You can add or subtract an interval of one month to get the previous or next month's starting date.
The complete query:
select *
from users
where user_id in
(
select user_id
from user_logins
where login_date >= date_trunc('month', current_date)
and login_date < date_trunc('month', current_date) + interval '1 month'
)
and user_id not in
(
select user_id
from user_logins
where login_date >= date_trunc('month', current_date) - interval '1 month'
and login_date < date_trunc('month', current_date)
)
Well, admittedly
and login_date < date_trunc('month', current_date) + interval '1 month'
is probably unnecessary here, because the table won't contain future logins :-) So, keep it or remove it, as you like.
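The task asks for the current month together with the number of reactivated users. Below is a minimal sketch of that output under the same definition as the query above (logged in this month, not in the previous one), written with NOT EXISTS instead of NOT IN. It assumes a user_logins(user_id, login_date) table; the inline rows are hypothetical and only make the query self-contained.

```sql
-- Hypothetical sample rows standing in for the real user_logins table.
WITH user_logins (user_id, login_date) AS (
  VALUES
    (1, (date_trunc('month', current_date) - interval '2 months')::date), -- absent last month
    (1, date_trunc('month', current_date)::date),                         -- back this month
    (2, (date_trunc('month', current_date) - interval '1 month')::date),  -- active last month
    (2, date_trunc('month', current_date)::date),                         -- and this month
    (3, date_trunc('month', current_date)::date)                          -- first login ever
)
SELECT date_trunc('month', current_date)::date AS current_month,
       count(DISTINCT cur.user_id)            AS reactivated_users
FROM user_logins cur
WHERE cur.login_date >= date_trunc('month', current_date)
  AND NOT EXISTS (   -- no login at all during the previous month
        SELECT 1
        FROM user_logins prev
        WHERE prev.user_id = cur.user_id
          AND prev.login_date >= date_trunc('month', current_date) - interval '1 month'
          AND prev.login_date <  date_trunc('month', current_date)
      );
```

With these rows the count is 2 (users 1 and 3). NOT EXISTS also avoids the NOT IN pitfall where a single NULL user_id returned by the subquery makes the condition filter out every row.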
If you want a self join, you should get distinct user/month pairs first. Then, since you want the user/month pairs for which no user/month-1 pair exists (a case where NOT EXISTS would be appropriate), your join must be an anti join. This means you outer join the user/month-1 pair and keep only the outer-joined rows, i.e. the non-matches.
WITH cte AS
(
SELECT DISTINCT user_id, DATE_TRUNC('month', login_date) AS month
FROM user_logins
)
SELECT mon.month, mon.user_id
FROM cte mon
LEFT JOIN cte prev ON prev.user_id = mon.user_id
AND prev.month = mon.month - INTERVAL '1 month'
WHERE prev.month IS NULL -- anti join
ORDER BY mon.month, mon.user_id;
I don't find anti joins very readable and would use NOT EXISTS instead, but that's a matter of personal preference, I guess. The query gives you all users who logged in during a month but not during the previous month. You can of course limit this to the current month. Or you can aggregate per month and count. Or you can remove the WHERE clause and count repeating users vs. new ones (COUNT(*) = all users that month, COUNT(prev.month) = all repeating users, COUNT(*) - COUNT(prev.month) = all new users).
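The last variant (dropping the WHERE clause and comparing COUNT(*) with COUNT(prev.month)) could look like this sketch. It assumes the same user_logins(user_id, login_date) columns; the sample rows are hypothetical, and "new" here means "not active in the previous month", exactly as in the anti join.

```sql
-- Hypothetical sample rows standing in for the real user_logins table.
WITH user_logins (user_id, login_date) AS (
  VALUES (1, date '2022-01-05'), (1, date '2022-01-20'), (1, date '2022-02-03'),
         (2, date '2022-02-10'), (2, date '2022-03-01'),
         (3, date '2022-01-15'), (3, date '2022-03-15')
),
cte AS (   -- one row per user and month
  SELECT DISTINCT user_id, date_trunc('month', login_date) AS month
  FROM user_logins
)
SELECT mon.month,
       count(*)                     AS active_users,    -- everyone active that month
       count(prev.month)            AS repeating_users, -- also active the month before
       count(*) - count(prev.month) AS new_users        -- not active the month before
FROM cte mon
LEFT JOIN cte prev ON prev.user_id = mon.user_id
                  AND prev.month = mon.month - interval '1 month'
GROUP BY mon.month
ORDER BY mon.month;
```

For the sample rows this yields 2 new users in January 2022 and 1 repeating plus 1 new user in both February and March.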
Well, having said this, wasn't the task about reactivated users? Then you are looking for users who were active once, then paused for a month, then became active again. Here is a simple query to get this for users who paused last month:
select user_id
from user_logins
group by user_id
having min(login_date) < date_trunc('month', current_date) - interval '1 month'
and max(login_date) >= date_trunc('month', current_date)
and count(*) filter (where login_date >= date_trunc('month', current_date) - interval '1 month'
and login_date < date_trunc('month', current_date)) = 0;
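To get the output the task asks for (month plus number of reactivated users) for every month rather than just the current one, the same "active, paused, active again" idea can be sketched on distinct user/month pairs. This assumes user_logins(user_id, login_date) and treats a user as reactivated in month m if they were active in m, inactive in m - 1, and active in some month before m - 1; the sample rows are hypothetical.

```sql
-- Hypothetical sample rows standing in for the real user_logins table.
WITH user_logins (user_id, login_date) AS (
  VALUES (1, date '2022-01-05'), (1, date '2022-03-07'),
         (2, date '2022-01-10'), (2, date '2022-02-11'), (2, date '2022-03-12'),
         (3, date '2022-03-20'),
         (4, date '2021-11-30'), (4, date '2022-03-02')
),
months AS (   -- one row per user and active month
  SELECT DISTINCT user_id, date_trunc('month', login_date) AS month
  FROM user_logins
)
SELECT cur.month, count(*) AS reactivated_users
FROM months cur
WHERE NOT EXISTS (   -- inactive in the previous month ...
        SELECT 1 FROM months p
        WHERE p.user_id = cur.user_id
          AND p.month = cur.month - interval '1 month')
  AND EXISTS (       -- ... but active at some point before that
        SELECT 1 FROM months e
        WHERE e.user_id = cur.user_id
          AND e.month < cur.month - interval '1 month')
GROUP BY cur.month
ORDER BY cur.month;
```

Here only March 2022 appears, with 2 reactivated users (1 and 4); user 3 is brand new and user 2 never paused. Months without any reactivated user produce no row at all; joining against a generate_series of months would turn those into zeros.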

Related

Can we use dynamic SQL or loops to automate this process?

I have a base dataset that is updated monthly. This contains information about employees such as Employee ID. I would like to create a table where we can see the leavers and joiners for each month.
The logic for this is as follows: if employee ID appears in latest month but not prior, then it is a joiner. If ID appears in prior but not latest, then it is a leaver.
The base data is appended and we also have a date variable, so I am able to produce a table of joiners/leavers with either CTEs or CREATE TABLE by specifying date(s) in where clause and merging.
I was wondering whether there is a way I could do this without manually creating multiple tables/CTEs, i.e. something that repeats the logic for a date range.
I'm aware it's fairly simple to do in other programming languages, but I'm not sure how to go about it in SQL. Any help is greatly appreciated.
Self-join the table: same employee, adjacent months. I multiply the year by twelve and add the month, so as to get a continuous month numbering (e.g. 12/2020 = 24252, 01/2021 = 24253). I use a full outer join and keep only the outer-joined rows, thus getting the leavers and the joiners.
select
extract(year from coalesce(m_next.date, date_trunc('month', m_prev.date) + interval '1 month')) as year,
extract(month from coalesce(m_next.date, date_trunc('month', m_prev.date) + interval '1 month')) as month,
count(m_next.date) as joiners,
count(m_prev.date) as leavers
from mytable m_next
full outer join mytable m_prev
on m_prev.employee_id = m_next.employee_id
and extract(year from m_prev.date) * 12 + extract(month from m_prev.date) =
extract(year from m_next.date) * 12 + extract(month from m_next.date) - 1
where m_next.date is null or m_prev.date is null
group by
extract(year from coalesce(m_next.date, date_trunc('month', m_prev.date) + interval '1 month')),
extract(month from coalesce(m_next.date, date_trunc('month', m_prev.date) + interval '1 month'))
order by
extract(year from coalesce(m_next.date, date_trunc('month', m_prev.date) + interval '1 month')),
extract(month from coalesce(m_next.date, date_trunc('month', m_prev.date) + interval '1 month'));
Demo: https://dbfiddle.uk/?rdbms=postgres_14&fiddle=1c66b00a71d484cd3951baa0956ace63
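As an alternative to the year * 12 + month numbering, the adjacent-month condition can also be written with date_trunc('month', ...) and interval arithmetic. The sketch below assumes mytable(employee_id, date) as in the answer, reduces it to one row per employee and month first, and uses hypothetical sample rows.

```sql
-- Hypothetical sample rows standing in for mytable.
WITH mytable (employee_id, date) AS (
  VALUES (1, date '2020-11-30'), (2, date '2020-11-30'),
         (1, date '2020-12-31'), (3, date '2020-12-31'),
         (1, date '2021-01-31'), (3, date '2021-01-31'), (4, date '2021-01-31')
),
m AS (   -- one row per employee and month
  SELECT DISTINCT t.employee_id, date_trunc('month', t.date) AS month
  FROM mytable t
)
SELECT coalesce(m_next.month, m_prev.month + interval '1 month') AS month,
       count(m_next.month) AS joiners,   -- present now, absent the month before
       count(m_prev.month) AS leavers    -- present the month before, absent now
FROM m m_next
FULL OUTER JOIN m m_prev
  ON m_prev.employee_id = m_next.employee_id
 AND m_prev.month = m_next.month - interval '1 month'
WHERE m_next.month IS NULL OR m_prev.month IS NULL   -- keep only non-matches
GROUP BY 1
ORDER BY 1;
```

For the sample rows: November 2020 has 2 joiners, December 1 joiner and 1 leaver, January 2021 1 joiner, and February 2021 3 leavers. As with the original query, the first month in the data counts everyone as a joiner and the month after the last one counts everyone as a leaver, so you may want to filter those edges out.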

Returning customers

Let's say I have a table web.orders with two columns:
user_id (integer) unique identifier of the user
created_at (integer) epoch timestamp
And I want to know how many returning customers there are in the last 7 days or the previous month.
So first of all, I guess I have to list all the unique user_ids from the last 7 days, then search for each of them in the earlier part of the table, and then sum up the hits.
I examined these two questions on the subject, but had no luck adapting them to my case:
Find number of repeating visitors in a month - PostgreSQL
PostgreSQL: Identifying return visitors based on date - joins or window functions?
Does anyone have a solution for this?
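You could use exists. The query below counts the users who ordered in the 7 days before the start of the current month and who also have an earlier order: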
select count(distinct o.user_id) no_returning_visitors
from web.orders o
where
created_at >= extract(epoch from date_trunc('month', current_date) - interval '7' day)
and created_at < extract(epoch from date_trunc('month', current_date))
and exists (
select 1
from web.orders o1
where
o1.user_id = o.user_id
and o1.created_at < extract(epoch from date_trunc('month', current_date) - interval '7' day)
)
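If "the last 7 days" should be measured back from the current moment rather than from the start of the month, the same EXISTS pattern works with now() as the anchor. This is a sketch: because a CTE cannot carry the schema-qualified name web.orders, it uses a hypothetical CTE called orders with made-up rows; against the real table you would drop the CTE and query web.orders directly.

```sql
-- Hypothetical stand-in for web.orders; created_at holds epoch seconds.
WITH orders (user_id, created_at) AS (
  VALUES (1, extract(epoch from now() - interval '30 days')::bigint),  -- earlier order
         (1, extract(epoch from now() - interval '2 days')::bigint),   -- recent order
         (2, extract(epoch from now() - interval '3 days')::bigint),   -- recent, first order
         (3, extract(epoch from now() - interval '40 days')::bigint)   -- earlier order only
)
SELECT count(DISTINCT o.user_id) AS no_returning_visitors
FROM orders o
WHERE o.created_at >= extract(epoch from now() - interval '7 days')   -- ordered recently
  AND EXISTS (                                                         -- and ordered before
        SELECT 1
        FROM orders o1
        WHERE o1.user_id = o.user_id
          AND o1.created_at < extract(epoch from now() - interval '7 days')
      );
```

Only user 1 qualifies here, so the result is 1. For "the previous month", the bounds become date_trunc('month', current_date) - interval '1 month' (inclusive) and date_trunc('month', current_date) (exclusive), with the EXISTS subquery looking before the lower bound.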

PostgreSQL generate_series with WHERE clause

I'm having an issue generating a series of dates and then returning the COUNT of rows matching each date in the series.
SELECT generate_series(current_date - interval '30 days', current_date, '1 day':: interval) AS i, COUNT(*)
FROM download
WHERE product_uuid = 'someUUID'
AND created_at = i
GROUP BY created_at::date
ORDER BY created_at::date ASC
I want the output to be the number of rows that match the current date in the series.
05-05-2018, 35
05-06-2018, 23
05-07-2018, 0
05-08-2018, 10
...
The schema has the following columns: id, product_uuid, created_at. Any help would be greatly appreciated. I can add more detail if needed.
Put the table generating function in the from and use a join:
SELECT g.dte, COUNT(d.product_uuid)
FROM generate_series(current_date - interval '30 days', current_date, '1 day'::interval
) g(dte) left join
download d
on d.product_uuid = 'someUUID' AND
d.created_at::date = g.dte
GROUP BY g.dte
ORDER BY g.dte;
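To get output in the MM-DD-YYYY shape shown in the question, the generated timestamps can be formatted with to_char. Here is a self-contained sketch; the download rows are hypothetical and stand in for the real table (id, product_uuid, created_at).

```sql
-- Hypothetical sample rows standing in for the real download table.
WITH download (id, product_uuid, created_at) AS (
  VALUES (1, 'someUUID',  now() - interval '1 day'),
         (2, 'someUUID',  now() - interval '1 day'),
         (3, 'otherUUID', now() - interval '1 day'),   -- different product, not counted
         (4, 'someUUID',  now() - interval '3 days')
)
SELECT to_char(g.dte, 'MM-DD-YYYY') AS day,
       count(d.id)                  AS downloads   -- counts matched rows only, so 0 on empty days
FROM generate_series(current_date - interval '30 days', current_date, interval '1 day') AS g(dte)
LEFT JOIN download d
       ON d.product_uuid = 'someUUID'              -- filter in ON, not WHERE, to keep empty days
      AND d.created_at::date = g.dte::date
GROUP BY g.dte
ORDER BY g.dte;
```

This returns 31 rows (30 days ago through today): 2 downloads yesterday, 1 three days ago, and 0 on every other day.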

Generate series of months in a column at postgresql

In postgresql, how can I generate a series of monthly dates by the format 'YYYY-MM', with the oldest being the creation month of the user up to the current month?
Something like:
select to_char(dt, 'YYYY-MM')
from generate_series(
date_trunc('month', (select created_at::date from users where id=1234)),
now(),
'1 month'::interval) dt;
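You can even do it in a single query level. Truncate to the first of the month, as above; otherwise the current month is skipped whenever the creation day of the month is later than today's day of the month:
SELECT to_char(generate_series(date_trunc('month', created_at)
,now(), interval '1 mon'), 'YYYY-MM') AS month
FROM users
WHERE id = 1234 -- id is unique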
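A small check of why the truncation matters, using fixed, hypothetical timestamps in place of the user's created_at and now(): creation on 2023-01-15, "now" on 2023-03-10.

```sql
-- Truncated start: 2023-01-01, 2023-02-01, 2023-03-01 are all <= the stop value.
SELECT to_char(m, 'YYYY-MM') AS month
FROM generate_series(date_trunc('month', timestamp '2023-01-15 12:00'),
                     timestamp '2023-03-10 09:00',
                     interval '1 month') AS m;
-- 2023-01, 2023-02, 2023-03

-- Untruncated start: the third step (2023-03-15 12:00) is past the stop value,
-- so the current month is missing.
SELECT to_char(m, 'YYYY-MM') AS month
FROM generate_series(timestamp '2023-01-15 12:00',
                     timestamp '2023-03-10 09:00',
                     interval '1 month') AS m;
-- 2023-01, 2023-02
```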

Include count 0 in my group by request

I have a COUNT + GROUP BY query in PostgreSQL:
SELECT date_trunc('day', created_at) AS "Day" ,
count(*) AS "No. of actions"
FROM events
WHERE created_at > now() - interval '2 weeks'
AND "events"."application_id" = 7
AND ("what" LIKE 'ACTION%')
GROUP BY 1
ORDER BY 1
My query counts the number of "ACTION*" events per day in my events table (a log table) over the last 2 weeks for the application with id 7. The problem is that days without any recorded actions don't show up at all.
I know it is because of my WHERE clause, so I tried some things with JOINs, but nothing gave me the right answer.
Thank you for your help
Make a date table:
CREATE TABLE "myDates" (
"DateValue" date NOT NULL
);
INSERT INTO "myDates" ("DateValue")
select to_date('20000101', 'YYYYMMDD') + s.a as dates from generate_series(0,36524,1) as s(a);
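Then left join on it. Keep the event filters in the ON clause (conditions on events in the WHERE clause would turn the outer join back into an inner join) and count an events column rather than *, so that days without events show 0. Since the table and column were created with quoted mixed-case names, they must also be quoted when referenced:
SELECT d."DateValue" AS "Day" ,
count(e.created_at) AS "No. of actions"
FROM "myDates" d left join events e
on date_trunc('day', e.created_at) = d."DateValue"
AND e.application_id = 7
AND e.what LIKE 'ACTION%'
WHERE d."DateValue" > now() - interval '2 weeks'
AND d."DateValue" < now()
GROUP BY 1 ORDER BY 1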
OK, a friend helped me; here is the answer:
SELECT "myDates"."DateValue" AS "Day" ,
(select count(*) from events WHERE date_trunc('day', "events"."created_at") = "myDates"."DateValue" AND
("events"."application_id" = 7) AND
("events"."what" LIKE 'ACTION%')) AS "No. of actions"
FROM "myDates"
where ("myDates"."DateValue" > now() - interval '2 weeks') AND ("myDates"."DateValue" < now())
ORDER BY 1
So we select every date from the "myDates" table and compute the count in a correlated subquery in the second column; a day without matching events gets a count of 0.
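The permanent date table is not strictly required: as in the generate_series answers earlier on this page, the days can be generated on the fly and outer joined to the events. This is a sketch assuming events has created_at, application_id and what as in the question; the sample rows are hypothetical.

```sql
-- Hypothetical sample rows standing in for the real events table.
WITH events (created_at, application_id, what) AS (
  VALUES (now() - interval '1 day',  7, 'ACTION_CLICK'),
         (now() - interval '1 day',  7, 'ACTION_VIEW'),
         (now() - interval '1 day',  4, 'ACTION_CLICK'),   -- other application
         (now() - interval '3 days', 7, 'LOGIN')           -- not an action
)
SELECT g.dte::date         AS "Day",
       count(e.created_at) AS "No. of actions"   -- 0 when no event matched the day
FROM generate_series(date_trunc('day', now() - interval '2 weeks'),
                     date_trunc('day', now()),
                     interval '1 day') AS g(dte)
LEFT JOIN events e
       ON e.created_at >= g.dte                  -- event falls on this day ...
      AND e.created_at <  g.dte + interval '1 day'
      AND e.application_id = 7                   -- ... and passes the filters
      AND e.what LIKE 'ACTION%'
GROUP BY g.dte
ORDER BY g.dte;
```

Every day of the window appears; with the sample rows, yesterday shows 2 and all other days show 0. The half-open range on created_at compares the raw column instead of date_trunc('day', created_at), which lets an index on created_at be used.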