Let's say I have a table web.orders with two columns:
user_id (integer) unique identifier of the user
created_at (integer) timestamp in epoch
And I want to know how many returning customers there were in the last 7 days or in the previous month.
So first of all, I guess I have to list all the unique user_id values from the last 7 days, then search for each of them in the earlier part of the table, and then count the hits.
I examined these two questions on the subject, but had no luck adapting them to my case:
Find number of repeating visitors in a month - PostgreSQL
PostgreSQL: Identifying return visitors based on date - joins or window functions?
Does anyone have a solution for this?
You could use exists:
select count(distinct o.user_id) no_returning_visitors
from web.orders o
where
created_at >= extract(epoch from date_trunc('month', current_date) - interval '7' day)
and created_at < extract(epoch from date_trunc('month', current_date))
and exists (
select 1
from web.orders o1
where
o1.user_id = o.user_id
and o1.created_at < extract(epoch from date_trunc('month', current_date) - interval '7' day)
)
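To check the EXISTS logic on a tiny data set without touching the real table, here is a minimal sketch that runs the same shape of query against an in-memory SQLite database from Python. The table name orders, the helper count_returning, the fixed NOW value and the sample rows are all made up for illustration; the window bounds are passed in as epoch seconds, so you can plug in "last 7 days" or "previous month" as you need.

```python
import sqlite3

DAY = 86_400                 # seconds per day
NOW = 1_700_000_000          # hypothetical fixed "now" in epoch seconds


def count_returning(rows, window_start, window_end):
    """Count distinct users with an order in [window_start, window_end)
    who also have at least one order before window_start."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (user_id INTEGER, created_at INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    (n,) = conn.execute(
        """
        SELECT COUNT(DISTINCT o.user_id)
        FROM orders o
        WHERE o.created_at >= :lo
          AND o.created_at < :hi
          AND EXISTS (
              SELECT 1
              FROM orders o1
              WHERE o1.user_id = o.user_id
                AND o1.created_at < :lo
          )
        """,
        {"lo": window_start, "hi": window_end},
    ).fetchone()
    conn.close()
    return n


# Illustrative rows: (user_id, created_at)
SAMPLE = [
    (1, NOW - 30 * DAY), (1, NOW - 2 * DAY),                        # returning
    (2, NOW - 1 * DAY),                                             # new customer
    (3, NOW - 40 * DAY),                                            # no order in the window
    (4, NOW - 10 * DAY), (4, NOW - 3 * DAY), (4, NOW - 1 * DAY),    # returning, counted once
]

RETURNING = count_returning(SAMPLE, NOW - 7 * DAY, NOW)
```

Users 1 and 4 are the only ones with an order both inside and before the window, and COUNT(DISTINCT ...) keeps user 4 from being counted twice.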
Assume you have the table given below containing information on Facebook user logins. Write a query to obtain the number of reactivated users (which are dormant users who did not log in the previous month, who then logged in during the current month). Output the current month and number of reactivated users.
I have tried this question by first making an inner join combining a user's previous month to current month with this code.
WITH CTE as
(SELECT user_id,
EXTRACT(month from login_date) as current_month,
EXTRACT(month from login_date)-1 as prev_month
FROM user_logins)
SELECT a.user_id as user_id, a.current_month, a.prev_month,
b.user_id as prev_month_user
FROM CTE a LEFT JOIN CTE b
ON a.prev_month = b.current_month
My idea is to use a case statement
CASE WHEN a.user_id IN
(SELECT b.user_id
WHERE b.current_month = a.prev_month)
THEN 0 ELSE 1 END
But that is giving me the wrong output for user_id 245 in current_month 4.
https://drive.google.com/file/d/1dOQQxaJWv7j7o7M1Q98nlj77KCzIHxKl/view?usp=sharing
How to fix this?
This gets you the first day of the current month:
select date_trunc('month', current_date)
You can add or subtract an interval of one month to get the previous or next month's starting date.
The complete query:
select *
from users
where user_id in
(
select user_id
from user_logins
where login_date >= date_trunc('month', current_date)
and login_date < date_trunc('month', current_date) + interval '1 month'
)
and user_id not in
(
select user_id
from user_logins
where login_date >= date_trunc('month', current_date) - interval '1 month'
and login_date < date_trunc('month', current_date)
)
Well, admittedly
and login_date < date_trunc('month', current_date) + interval '1 month'
is probably unnecessary here, because the table won't contain future logins :-) So, keep it or remove it, as you like.
If you want a self join, you should get distinct user/month pairs first. Then, as you want the user/month pairs for which no user/month-1 pair exists (a case where NOT EXISTS would be appropriate), your join must be an anti join. This means you outer join the user/month-1 pair and only keep the outer joined rows, i.e. the non-matches.
WITH cte AS
(
SELECT DISTINCT user_id, DATE_TRUNC('month', login_date) AS month
FROM user_logins
)
SELECT mon.month, mon.user_id
FROM cte mon
LEFT JOIN cte prev ON prev.user_id = mon.user_id
AND prev.month = mon.month - INTERVAL '1 month'
WHERE prev.month IS NULL -- anti join
ORDER BY mon.month, mon.user_id;
I don't find anti joins very readable and would use NOT EXISTS instead. But that's a matter of personal preference, I guess. The query gives you all users who logged in during a month, but not in the previous month. You can of course limit this to the current month. Or you can aggregate per month and count. Or remove the WHERE clause and count repeating users vs. new ones (COUNT(*) = all users that month, COUNT(prev.month) = all repeating users, COUNT(*) - COUNT(prev.month) = all new users).
Well having said this, ... wasn't the task about reactivated users? Then you are looking for users who were active once, then paused a month, then became active again. Here is a simple query to get this for users who paused last month:
select user_id
from user_logins
group by user_id
having min(login_date) < date_trunc('month', current_date) - interval '1 month'
and max(login_date) >= date_trunc('month', current_date)
and count(*) filter (where login_date >= date_trunc('month', current_date) - interval '1 month'
and login_date < date_trunc('month', current_date)) = 0;
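If the three HAVING conditions are hard to read, the same rule can be spelled out in plain Python. This is only a sketch for reasoning about the logic, not a replacement for the query; reactivated_users, the helper functions and the sample logins are invented for the example, and "today" is passed in so the result is reproducible.

```python
from datetime import date


def month_start(d):
    """First day of d's month (what date_trunc('month', ...) gives you)."""
    return d.replace(day=1)


def prev_month_start(d):
    """First day of the month before d's month."""
    first = month_start(d)
    if first.month == 1:
        return first.replace(year=first.year - 1, month=12)
    return first.replace(month=first.month - 1)


def reactivated_users(logins, today):
    """logins: iterable of (user_id, login_date) pairs."""
    cur = month_start(today)
    prev = prev_month_start(today)
    by_user = {}
    for user_id, d in logins:
        by_user.setdefault(user_id, []).append(d)

    result = set()
    for user_id, dates in by_user.items():
        was_active_before = min(dates) < prev                    # min(login_date) < previous month start
        active_now = max(dates) >= cur                           # max(login_date) >= current month start
        skipped_prev = not any(prev <= d < cur for d in dates)   # filtered count = 0
        if was_active_before and active_now and skipped_prev:
            result.add(user_id)
    return result


# Illustrative data, evaluated as if today were 2022-04-15
LOGINS = [
    (1, date(2022, 2, 10)), (1, date(2022, 4, 2)),   # paused in March, back in April
    (2, date(2022, 3, 5)), (2, date(2022, 4, 3)),    # active in March, so not reactivated
    (3, date(2022, 4, 1)),                           # brand new user
    (4, date(2022, 1, 3)), (4, date(2022, 2, 2)),    # still dormant
]
REACTIVATED = reactivated_users(LOGINS, date(2022, 4, 15))
```

Only user 1 satisfies all three conditions here.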
I have a task that is about producing statistics of the sales per day over the last 30 days. I've found a way to show only the last month:
SELECT *
FROM purchase
WHERE date >= date('01-05-2021', current_date - interval '1 month')
and date < date('01-05-2021', current_date)
The columns in purchase are just id, value, date, cashier and store id. What do you think is the best way to do this?
I have this and I don't know why it is not working. I'm new to PostgreSQL, so please bear with me.
Group by date and use sum to find the total.
select date,sum(value)
from purchase
where date between current_date - interval '1 month' and current_date - 1
group by date
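For comparison, here is roughly what that GROUP BY does, written as a small Python sketch. It assumes a fixed 30-day window (the question says "last 30 days", while the query above uses a calendar month), and daily_totals plus the sample purchases are made up for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta


def daily_totals(purchases, today, days=30):
    """purchases: iterable of (purchase_date, value) pairs.
    Sums value per day for the window [today - days, today - 1], both inclusive."""
    start = today - timedelta(days=days)
    end = today - timedelta(days=1)
    totals = defaultdict(float)
    for day, value in purchases:
        if start <= day <= end:       # same idea as BETWEEN ... AND ...
            totals[day] += value      # same idea as sum(value) ... group by date
    return dict(sorted(totals.items()))


# Illustrative data, evaluated as if today were 2021-06-01
PURCHASES = [
    (date(2021, 5, 31), 10.0),
    (date(2021, 5, 31), 5.5),
    (date(2021, 5, 15), 3.0),
    (date(2021, 6, 1), 99.0),   # today itself is excluded
    (date(2021, 4, 1), 7.0),    # older than 30 days
]
TOTALS = daily_totals(PURCHASES, date(2021, 6, 1))
```

Like the SQL, this only returns days that actually have purchases; days without sales do not show up with a zero.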
My database table looks like this:
CREATE TABLE record
(
id INT,
status INT,
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (id)
);
And I want to create a generic query to get the count of records created in each 3-hour interval of the last day.
For example, I want to know how many records were created in every 3-hour slice of the last 24 hours.
What I have so far: with a little help from Stack Overflow, I am able to create a query to calculate the count for a single full day.
SELECT
DATE(created_at) AS day, COUNT(1)
FROM
record
WHERE
created_at >= current_date - 1
GROUP BY
DATE(created_at)
This tells me that, say, 24 records were created over the full day, but I want to know how many were created in each 3-hour interval.
If you want the count for the last three hours of data:
select count(*)
from record
where created_at >= now() - interval '3 hour';
If you want the last day minus 3 hours, that would be 21 hours:
select count(*)
from record
where created_at >= now() - interval '21 hour';
EDIT:
You want intervals of 3 hours for the last 24 hours. The simplest method is probably generate_series():
select gs.ts, count(r.created_at)
from generate_series(now() - interval '24 hour', now() - interval '3 hour', interval '3 hour') gs(ts) left join
record r
on r.created_at >= gs.ts and
r.created_at < gs.ts + interval '3 hour'
group by gs.ts
order by gs.ts;
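To see what the generate_series() plus LEFT JOIN combination produces, here is a minimal Python sketch of the same bucketing. The function name bucket_counts and the sample timestamps are hypothetical, and "now" is passed in explicitly so the result is repeatable.

```python
from datetime import datetime, timedelta


def bucket_counts(timestamps, now, span=timedelta(hours=24), step=timedelta(hours=3)):
    """Count timestamps per bucket [start, start + step) over the last `span`.
    Empty buckets are kept with a count of 0, like the LEFT JOIN does."""
    origin = now - span
    starts = []
    t = origin
    while t < now:                    # now-24h, now-21h, ..., now-3h
        starts.append(t)
        t += step
    counts = {s: 0 for s in starts}
    for ts in timestamps:
        offset = ts - origin
        if timedelta(0) <= offset < span:
            counts[starts[offset // step]] += 1   # timedelta // timedelta is an int
    return counts


# Illustrative data
NOW = datetime(2024, 1, 2, 12, 0)
STAMPS = [
    NOW - timedelta(hours=1),
    NOW - timedelta(hours=3),              # exactly on a bucket boundary
    NOW - timedelta(hours=4),
    NOW - timedelta(hours=23, minutes=30),
    NOW - timedelta(hours=25),             # outside the last 24 hours
]
BUCKETS = bucket_counts(STAMPS, NOW)
```

You get eight buckets; the one starting 3 hours ago holds two records, and the record from 25 hours ago is ignored.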
I'm trying to grab people out of a table who have an abandon date between 20 minutes ago and 2 hours ago. This seems to grab the right amount of time, but is all 4 hours old:
SELECT *
FROM $A$
WHERE ABANDONDATE >= SYSDATE - INTERVAL '2' HOUR
AND ABANDONDATE < SYSDATE - INTERVAL '20' MINUTE
AND EMAIL_ADDRESS_ NOT IN(SELECT EMAIL_ADDRESS_ FROM $B$ WHERE ORDERDATE >= sysdate - 4)
Also, it grabs every record for everyone and I only want the most recent product abandoned (highest abandondate) for each email address. I can't seem to figure this one out.
If the results are EXACTLY four hours old, it is possible that there is a time zone mismatch. What is the EXACT data type of ABANDONDATE in your database? Perhaps TIMESTAMP WITH TIME ZONE? Four hours seems like the difference between UTC and EDT (Eastern U.S. with daylight saving time offset).
For your other question, did you EXPECT your query to only pick up the most recent product abandoned? Which part of your query would do that? Instead, you need to add row_number() over (partition by [whatever identifies clients etc.] order by abandondate desc), make the resulting query into a subquery and wrap it within an outer query where you filter by (WHERE clause) rn = 1. We can help with this if you show us the table structure (name and data type of columns in the table - only the relevant columns - including which is or are Primary Key).
Try
SELECT * FROM (
SELECT t.*,
row_number()
over (PARTITION BY EMAIL_ADDRESS_ ORDER BY ABANDONDATE DESC) As RN
FROM $A$ t
WHERE ABANDONDATE >= SYSDATE - INTERVAL '2' HOUR
AND ABANDONDATE < SYSDATE - INTERVAL '20' MINUTE
AND EMAIL_ADDRESS_ NOT IN(
SELECT EMAIL_ADDRESS_ FROM $B$
WHERE ORDERDATE >= sysdate - 4)
)
WHERE rn = 1
Another approach:
SELECT *
FROM $A$
WHERE (EMAIL_ADDRESS_, ABANDONDATE) IN (
SELECT EMAIL_ADDRESS_, MAX( ABANDONDATE )
FROM $A$
WHERE ABANDONDATE >= SYSDATE - INTERVAL '2' HOUR
AND ABANDONDATE < SYSDATE - INTERVAL '20' MINUTE
AND EMAIL_ADDRESS_ NOT IN(
SELECT EMAIL_ADDRESS_ FROM $B$
WHERE ORDERDATE >= sysdate - 4)
GROUP BY EMAIL_ADDRESS_
)
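Both queries boil down to "filter by the time window, then keep the newest row per email address". A rough Python sketch of that idea follows; latest_abandoned and the sample rows are invented for illustration, and the NOT IN check against recent orders is left out to keep it short.

```python
from datetime import datetime, timedelta


def latest_abandoned(rows, now):
    """rows: iterable of (email, product, abandoned_at) tuples.
    Keeps, per email, the newest row abandoned between 2 hours and 20 minutes ago."""
    lo = now - timedelta(hours=2)
    hi = now - timedelta(minutes=20)
    latest = {}
    for email, product, abandoned_at in rows:
        if not (lo <= abandoned_at < hi):
            continue
        best = latest.get(email)
        if best is None or abandoned_at > best[2]:
            latest[email] = (email, product, abandoned_at)
    return latest


# Illustrative data
NOW = datetime(2024, 5, 1, 12, 0)
ROWS = [
    ("a@x.test", "shoes", NOW - timedelta(minutes=90)),
    ("a@x.test", "hat", NOW - timedelta(minutes=30)),
    ("a@x.test", "bag", NOW - timedelta(minutes=10)),   # too recent
    ("b@x.test", "coat", NOW - timedelta(hours=3)),     # too old
    ("c@x.test", "sock", NOW - timedelta(minutes=60)),
]
LATEST = latest_abandoned(ROWS, NOW)
```

One difference to keep in mind: if two rows for the same email share the same ABANDONDATE, the row_number() version returns one of them, the (EMAIL_ADDRESS_, MAX(ABANDONDATE)) version returns both, and this sketch keeps the first one it sees.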
In postgresql, how can I generate a series of monthly dates by the format 'YYYY-MM', with the oldest being the creation month of the user up to the current month?
Something like:
select to_char(dt, 'YYYY-MM')
from generate_series(
date_trunc('month', (select created_at::date from users where id=1234)),
now(),
'1 month'::interval) dt;
You can even do it in a single query level:
SELECT to_char(generate_series(date_trunc('month', created_at)
,now(), interval '1 mon'), 'YYYY-MM') AS month
FROM users
WHERE users_id = 123 -- users_id is unique
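If you ever need the same list on the application side, here is a small Python sketch that produces the 'YYYY-MM' labels from the creation date up to the current month. The function name month_labels and the example dates are made up. Stepping by (year, month) instead of adding a month to the raw date avoids the case where a creation date late in the month makes the series stop before the current month.

```python
from datetime import date


def month_labels(created, today):
    """Return 'YYYY-MM' labels from created's month through today's month, inclusive."""
    labels = []
    year, month = created.year, created.month
    while (year, month) <= (today.year, today.month):
        labels.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:            # roll over into the next year
            year, month = year + 1, 1
    return labels


# Illustrative dates
LABELS = month_labels(date(2023, 11, 20), date(2024, 2, 3))
```

With these dates the list runs from 2023-11 to 2024-02, including the current month even though the 20th has not been reached yet.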