In PostgreSQL, how can I generate a series of monthly dates in the format 'YYYY-MM', starting from the month the user was created up to the current month?
Something like:
select to_char(dt, 'YYYY-MM')
from generate_series(
date_trunc('month', (select created_at::date from users where id=1234)),
now(),
'1 month'::interval) dt;
You can even do it in a single query level:
SELECT to_char(generate_series(date_trunc('month', created_at)
                             , now(), interval '1 mon'), 'YYYY-MM') AS month
FROM   users
WHERE  users_id = 123;  -- users_id is unique
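To see why truncating the start value to the month matters, here is a self-contained sketch. It assumes the made-up timestamp '2023-11-20 14:30' in place of users.created_at and a fixed end point '2024-02-10' in place of now(), so the result is reproducible:

```sql
-- Hypothetical values: '2023-11-20 14:30' stands in for created_at,
-- '2024-02-10' stands in for now().
SELECT to_char(m, 'YYYY-MM') AS month
FROM   generate_series(date_trunc('month', timestamp '2023-11-20 14:30')
                     , timestamp '2024-02-10'
                     , interval '1 month') AS m;
-- 2023-11, 2023-12, 2024-01, 2024-02

-- Without date_trunc() the series steps from the 20th of each month,
-- so 2024-02-20 lies past the end point and the current month is missing:
SELECT to_char(m, 'YYYY-MM') AS month
FROM   generate_series(timestamp '2023-11-20 14:30'
                     , timestamp '2024-02-10'
                     , interval '1 month') AS m;
-- 2023-11, 2023-12, 2024-01
```

Starting on the first of the month means the first day of the current month is always at or before now(), whatever day the user was created on.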
Assume you have the table given below containing information on Facebook user logins. Write a query to obtain the number of reactivated users (which are dormant users who did not log in the previous month, who then logged in during the current month). Output the current month and number of reactivated users.
I tried to solve this by first joining each user's current month to the previous month with this code:
WITH CTE as
(SELECT user_id,
EXTRACT(month from login_date) as current_month,
EXTRACT(month from login_date)-1 as prev_month
FROM user_logins)
SELECT a.user_id as user_id, a.current_month, a.prev_month,
b.user_id as prev_month_user
FROM CTE a LEFT JOIN CTE b
ON a.prev_month = b.current_month
My idea is to use a CASE expression:
CASE WHEN a.user_id IN
(SELECT b.user_id
WHERE b.current_month = a.prev_month)
THEN 0 ELSE 1 END
But it gives me the wrong output for user_id 245 in current_month 4.
https://drive.google.com/file/d/1dOQQxaJWv7j7o7M1Q98nlj77KCzIHxKl/view?usp=sharing
How to fix this?
This gets you the first day of the current month:
select date_trunc('month', current_date)
You can add or subtract an interval of one month to get the previous or next month's starting date.
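For example, with date '2024-03-15' standing in for current_date (an assumption, so the output is fixed), the three month boundaries come out like this:

```sql
-- date '2024-03-15' is a hypothetical stand-in for current_date.
-- date_trunc() returns a timestamp here; ::date casts the results back to dates.
SELECT date_trunc('month', date '2024-03-15')::date                         AS this_month  -- 2024-03-01
     , (date_trunc('month', date '2024-03-15') - interval '1 month')::date AS prev_month  -- 2024-02-01
     , (date_trunc('month', date '2024-03-15') + interval '1 month')::date AS next_month; -- 2024-04-01
```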
The complete query:
select *
from users
where user_id in
(
select user_id
from user_logins
where login_date >= date_trunc('month', current_date)
and login_date < date_trunc('month', current_date) + interval '1 month'
)
and user_id not in
(
select user_id
from user_logins
where login_date >= date_trunc('month', current_date) - interval '1 month'
and login_date < date_trunc('month', current_date)
)
Well, admittedly
and login_date < date_trunc('month', current_date) + interval '1 month'
is probably unnecessary here, because the table won't contain future logins :-) So, keep it or remove it, as you like.
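Here is a self-contained sketch of the same IN / NOT IN logic, aggregated to the output the task asks for (current month and number of reactivated users). The user_logins rows are made up, user_logins alone replaces the users table, and date '2024-03-15' stands in for current_date:

```sql
WITH user_logins(user_id, login_date) AS (   -- hypothetical sample data
  VALUES (1, date '2024-01-10'), (1, date '2024-03-02')   -- no login in February
       , (2, date '2024-02-05'), (2, date '2024-03-07')   -- logged in in February
       , (3, date '2024-03-09')                           -- first login in March
), bounds AS (
  -- first day of the "current" month; date '2024-03-15' stands in for current_date
  SELECT date_trunc('month', date '2024-03-15')::date AS cur_month
)
SELECT to_char(b.cur_month, 'YYYY-MM') AS current_month
     , count(DISTINCT l.user_id)       AS reactivated_users
FROM   bounds b
JOIN   user_logins l ON l.login_date >= b.cur_month
                    AND l.login_date <  b.cur_month + interval '1 month'
WHERE  l.user_id NOT IN (               -- not seen in the previous month
         SELECT p.user_id
         FROM   user_logins p
         WHERE  p.login_date >= b.cur_month - interval '1 month'
         AND    p.login_date <  b.cur_month)
GROUP  BY b.cur_month;
-- current_month = 2024-03, reactivated_users = 2 (users 1 and 3)
```

As with the original query, a user whose very first login falls in the current month (user 3) is counted too. Also, NOT IN filters out every row if the subquery returns a NULL user_id, so NOT EXISTS is the safer choice if that column is nullable.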
If you want a self-join, you should get distinct user/month pairs first. Then, since you want the user/month pairs for which no user/month-1 pair exists (a case for which NOT EXISTS would be appropriate), your join must be an anti-join. This means you outer join the user/month-1 pair and keep only the outer-joined rows, i.e. the non-matches.
WITH cte AS
(
SELECT DISTINCT user_id, DATE_TRUNC('month', login_date) AS month
FROM user_logins
)
SELECT mon.month, mon.user_id
FROM cte mon
LEFT JOIN cte prev ON prev.user_id = mon.user_id
AND prev.month = mon.month - INTERVAL '1 month'
WHERE prev.month IS NULL -- anti join
ORDER BY mon.month, mon.user_id;
I don't find anti-joins very readable and would use NOT EXISTS instead, but that's a matter of personal preference, I guess. The query gives you all users who logged in during a month but not during the previous month. You can of course limit this to the current month. Or you can aggregate per month and count. Or remove the WHERE clause and count repeating users vs. new ones (COUNT(*) = all users that month, COUNT(prev.month) = all repeating users, COUNT(*) - COUNT(prev.month) = all new users).
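A sketch of the NOT EXISTS form, aggregated per month. It counts, for each month, the users who logged in that month but not in the previous one; the sample rows are hypothetical:

```sql
WITH user_logins(user_id, login_date) AS (   -- hypothetical sample data
  VALUES (1, date '2024-01-10'), (1, date '2024-03-02')
       , (2, date '2024-02-05'), (2, date '2024-03-07')
       , (3, date '2024-03-09')
), cte AS (                                   -- distinct user/month pairs
  SELECT DISTINCT user_id, date_trunc('month', login_date) AS month
  FROM   user_logins
)
SELECT mon.month::date AS month
     , count(*)        AS not_active_prev_month
FROM   cte mon
WHERE  NOT EXISTS (                           -- no pair for the same user one month earlier
         SELECT 1
         FROM   cte prev
         WHERE  prev.user_id = mon.user_id
         AND    prev.month   = mon.month - interval '1 month')
GROUP  BY mon.month
ORDER  BY mon.month;
-- 2024-01-01 | 1
-- 2024-02-01 | 1
-- 2024-03-01 | 2
```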
Well, having said this ... wasn't the task about reactivated users? Then you are looking for users who were active at some point, then paused for a month, then became active again. Here is a simple query to get them for users who paused last month:
select user_id
from user_logins
group by user_id
having min(login_date) < date_trunc('month', current_date) - interval '1 month'
and max(login_date) >= date_trunc('month', current_date)
and count(*) filter (where login_date >= date_trunc('month', current_date) - interval '1 month'
and login_date < date_trunc('month', current_date)) = 0;
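To try the HAVING conditions above, here is a sketch on made-up rows, with date '2024-03-15' substituted for current_date:

```sql
WITH user_logins(user_id, login_date) AS (   -- hypothetical sample data
  VALUES (1, date '2024-01-10'), (1, date '2024-03-02')   -- active, paused in Feb, back in Mar
       , (2, date '2024-02-05'), (2, date '2024-03-07')   -- active last month: not reactivated
       , (3, date '2024-03-09')                           -- first login this month: new user
)
SELECT user_id
FROM   user_logins
GROUP  BY user_id
HAVING min(login_date) <  date_trunc('month', date '2024-03-15') - interval '1 month'  -- active before last month
AND    max(login_date) >= date_trunc('month', date '2024-03-15')                       -- active this month
AND    count(*) FILTER (WHERE login_date >= date_trunc('month', date '2024-03-15') - interval '1 month'
                          AND login_date <  date_trunc('month', date '2024-03-15')) = 0;  -- not last month
-- user_id 1 only
```

Unlike the previous queries, user 3 is excluded here because there is no activity before last month.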
I'm trying to write a query but can't figure out how. I have a claims table.
claims table:
- id
- date_received
I want to get the count of new claims for each day over the last 30 days. The naive solution I came up with was this:
select count(*)
from claims
where date_received
BETWEEN CURRENT_DATE - interval '1 month'
AND CURRENT_DATE
group by date_received;
This works, but it groups by the exact timestamp rather than by day. How can I make it group by day?
EDIT
I was able to figure it out. Updated query:
select date_received::date as date, count(id) as new_claims
from claims
where date_received BETWEEN CURRENT_DATE - interval '1 month'
AND CURRENT_DATE
group by date_received::date
order by date;
Presumably, claims arrive in the past, not the future, so you would want:
select date_received::date, count(*)
from claims
where date_received >= CURRENT_DATE - interval '1 month' and
date_received < CURRENT_DATE
group by date_received::date
order by date_received::date;
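A self-contained sketch of this half-open range, with made-up claims rows and date '2024-03-15' standing in for CURRENT_DATE:

```sql
WITH claims(id, date_received) AS (           -- hypothetical sample data
  VALUES (1, timestamp '2024-02-10 09:00')    -- more than a month ago: excluded
       , (2, timestamp '2024-03-13 08:15')
       , (3, timestamp '2024-03-13 17:40')    -- same day as id 2
       , (4, timestamp '2024-03-14 23:59')
       , (5, timestamp '2024-03-15 10:00')    -- "today": excluded by the < bound
)
SELECT date_received::date AS claim_day, count(*) AS new_claims
FROM   claims
WHERE  date_received >= date '2024-03-15' - interval '1 month'
AND    date_received <  date '2024-03-15'
GROUP  BY date_received::date
ORDER  BY claim_day;
-- 2024-03-13 | 2
-- 2024-03-14 | 1
```

The < CURRENT_DATE bound leaves out today's claims; < CURRENT_DATE + 1 would include them.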
Building on Gordon's answer, you can use date_trunc(precision, timestamp) to get what you want:
select date_received::date, count(*)
from claims
where date_received >=
date_trunc('day', CURRENT_DATE - interval '1 month')
and date_received < CURRENT_DATE
group by date_received::date
order by date_received::date;
See date_trunc() for details.
Suppose my current date range is 01-Jan-2015 to 17-Feb-2015.
I also need the data for 01-Jan-2014 to 17-Feb-2014.
How do I write the query in SQL?
I want to get the data for both of these ranges, CYD and PYD.
Here is an example in Oracle:
select sysdate - interval '1' year from dual;
The keyword INTERVAL also exists in other SQL dialects, such as PostgreSQL.
To select those ranges of dates from, say, a calendar table, I'd probably write this in standard SQL.
select cal_date from calendar
where cal_date between date '2015-01-01'
and current_date
or cal_date between date '2015-01-01' - interval '1' year
and current_date - interval '1' year
order by cal_date;
If I wanted to identify rows in each range, I'd add literal values.
select 'cyd', cal_date
from calendar
where cal_date between date '2015-01-01'
and current_date
union all
select 'pyd', cal_date
from calendar
where cal_date between date '2015-01-01' - interval '1' year
and current_date - interval '1' year
order by cal_date;
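Both queries assume an existing calendar table. As a self-contained sketch, the CTE below builds a hypothetical calendar with PostgreSQL's generate_series() and fixes date '2015-02-17' in place of current_date, then counts the days in each labelled range:

```sql
WITH calendar(cal_date) AS (                  -- hypothetical calendar, one row per day
  SELECT d::date
  FROM   generate_series(date '2013-01-01', date '2016-12-31', interval '1 day') AS d
)
SELECT 'cyd' AS period, count(*) AS days
FROM   calendar
WHERE  cal_date BETWEEN date '2015-01-01' AND date '2015-02-17'
UNION ALL
SELECT 'pyd', count(*)
FROM   calendar
WHERE  cal_date BETWEEN date '2015-01-01' - interval '1' year
                    AND date '2015-02-17' - interval '1' year;
-- cyd | 48  (2015-01-01 through 2015-02-17)
-- pyd | 48  (2014-01-01 through 2014-02-17)
```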
I have a table as follows:
CREATE TABLE counts
(
T TIMESTAMP NOT NULL,
C INTEGER NOT NULL
);
I create the following views from it:
CREATE VIEW micounts AS
SELECT DATE_TRUNC('minute',t) AS t,SUM(c) AS c FROM counts GROUP BY 1;
CREATE VIEW hrcounts AS
SELECT DATE_TRUNC('hour',t) AS t,SUM(c) AS c,SUM(c)/60 AS a
FROM micounts GROUP BY 1;
CREATE VIEW dycounts AS
SELECT DATE_TRUNC('day',t) AS t,SUM(c) AS c,SUM(c)/24 AS a
FROM hrcounts GROUP BY 1;
The problem comes in when I want to create the monthly counts: to get the average column a, I need to know what to divide the daily sums by, i.e. the number of days in the specific month.
I know that in PostgreSQL you can get the number of days in the current month with:
SELECT DATE_PART('days',DATE_TRUNC('month',now())+'1 MONTH'::INTERVAL-DATE_TRUNC('month',now()))
But I can't use now(); I somehow have to let it know what the month is when the grouping is done. Any suggestions, i.e. what should replace ??? in this view:
CREATE VIEW mocounts AS
SELECT DATE_TRUNC('month',t) AS t,SUM(c) AS c,SUM(c)/(???) AS a
FROM dycounts
GROUP BY 1;
A bit shorter and faster and you get the number of days instead of an interval:
SELECT EXTRACT(day FROM date_trunc('month', now()) + interval '1 month'
- interval '1 day')
It's possible to combine multiple units in a single interval value, so we can use '1 mon - 1 day':
SELECT EXTRACT(day FROM date_trunc('month', now()) + interval '1 mon - 1 day')
(mon, month or months work all the same for month units.)
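A quick check of the '1 mon - 1 day' expression on a few arbitrary sample dates, including a leap-year February:

```sql
SELECT d AS sample_date
     , EXTRACT(day FROM date_trunc('month', d) + interval '1 mon - 1 day') AS days_in_month
FROM   (VALUES (date '2023-02-10'), (date '2024-02-10')
             , (date '2024-04-30'), (date '2024-12-31')) AS v(d);
-- days_in_month: 28, 29, 30, 31
```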
To divide the daily sum by the number of days in the current month (original question):
SELECT t::date AS the_date
, SUM(c) AS c
, SUM(c) / EXTRACT(day FROM date_trunc('month', t::date)
+ interval '1 mon - 1 day') AS a
FROM dycounts
GROUP BY 1;
To divide monthly sum by the number of days in the current month (updated question):
SELECT DATE_TRUNC('month', t)::date AS t
,SUM(c) AS c
,SUM(c) / EXTRACT(day FROM date_trunc('month', t)::date
+ interval '1 mon - 1 day') AS a
FROM dycounts
GROUP BY 1;
You have to repeat the GROUP BY expression if you want to use a single query level.
Or use a subquery:
SELECT *, c / EXTRACT(day FROM t + interval '1 mon - 1 day') AS a
FROM (
SELECT date_trunc('month', t)::date AS t, SUM(c) AS c
FROM dycounts
GROUP BY 1
) sub;
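A self-contained run of the subquery variant, with a few made-up daily rows standing in for the dycounts view:

```sql
WITH dycounts(t, c) AS (                      -- hypothetical daily sums
  VALUES (timestamp '2024-02-01', 29), (timestamp '2024-02-02', 29)
       , (timestamp '2024-03-05', 62)
)
SELECT *, c / EXTRACT(day FROM t + interval '1 mon - 1 day') AS a
FROM  (
   SELECT date_trunc('month', t)::date AS t, SUM(c) AS c
   FROM   dycounts
   GROUP  BY 1
   ) sub
ORDER  BY t;
-- 2024-02-01: c = 58, a = 58 / 29 days = 2
-- 2024-03-01: c = 62, a = 62 / 31 days = 2
```

Because EXTRACT() returns a non-integer type, the division is not truncated the way an integer division like SUM(c)/24 is.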
I'm trying to generate a series of monthly dates from a starting date, which happens to be the date of the oldest user in my users table.
Whilst I can select some dates quite easily:
SELECT generate_series(
now(),
now() + '5 months'::interval,
'1 month'::interval);
and can select the date I need to start at:
SELECT to_date( to_char(CAST(min(created_at) AS DATE),'yyyy-MM') || '-01','yyyy-mm-dd') from users
How can I combine the two so that I'm selecting every month up until now?
Turns out, it can be even simpler. :)
SELECT generate_series(
          date_trunc('month', min(created_at))
        , now()
        , interval '1 month') AS month
FROM   users;
More about date_trunc in the manual.
Or, if you actually want the data type date instead of timestamp with time zone:
SELECT generate_series(
          date_trunc('month', min(created_at))
        , now()
        , interval '1 month')::date AS month
FROM   users;
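A self-contained sketch of this approach, truncating to the month so the series starts on the first day of the oldest user's creation month. The two users rows are made up and timestamptz '2024-02-10' stands in for now():

```sql
WITH users(created_at) AS (                   -- hypothetical sample rows
  VALUES (timestamptz '2023-12-05 08:00'), (timestamptz '2023-11-20 14:30')
)
SELECT generate_series(date_trunc('month', min(created_at))   -- first day of the oldest month
                     , timestamptz '2024-02-10'               -- stand-in for now()
                     , interval '1 month')::date AS month
FROM   users;
-- 2023-11-01, 2023-12-01, 2024-01-01, 2024-02-01
```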
Turns out it's pretty simple:
SELECT generate_series(
(SELECT to_date( to_char(CAST(min(created_at) AS DATE),'yyyy-MM') || '-01','yyyy-mm-dd') from users),
now(),
'1 month'::interval) as month;