I have a base dataset that is updated monthly. This contains information about employees such as Employer ID. I would like to create a table where we can see the leavers and joiners for each month.
The logic for this is as follows: if employee ID appears in latest month but not prior, then it is a joiner. If ID appears in prior but not latest, then it is a leaver.
The base data is appended and we also have a date variable, so I am able to produce a table of joiners/leavers with either CTEs or CREATE TABLE by specifying date(s) in where clause and merging.
I was wondering whether there was a way I could do this without manually creating multiple tables/CTES ? I.E. something that repeats the logic for a date range.
Aware it’s fairly simple to do in other coding languages but not sure how to go about it in SQL. Any help is greatly appreciated.
Self-join the table. Same employee, adjancent months. I am multiplying a year be twelve and add the month, so as to get a continues month numbering (e.g. 12/2020 = 24252, 01/2021 = 24253). I am using a full outer join and only keep the outer joined rows, thus getting the leavers and the joiners.
select
extract(year from coalesce(m_next.date, date_trunc('month', m_prev.date) + interval '1 month')) as year,
extract(month from coalesce(m_next.date, date_trunc('month', m_prev.date) + interval '1 month')) as month,
count(m_next.date) as joiners,
count(m_prev.date) as leavers
from mytable m_next
full outer join mytable m_prev
on m_prev.employee_id = m_next.employee_id
and extract(year from m_prev.date) * 12 + extract(month from m_prev.date) =
extract(year from m_next.date) * 12 + extract(month from m_next.date) - 1
where m_next.date is null or m_prev.date is null
group by
extract(year from coalesce(m_next.date, date_trunc('month', m_prev.date) + interval '1 month')),
extract(month from coalesce(m_next.date, date_trunc('month', m_prev.date) + interval '1 month'))
order by
extract(year from coalesce(m_next.date, date_trunc('month', m_prev.date) + interval '1 month')),
extract(month from coalesce(m_next.date, date_trunc('month', m_prev.date) + interval '1 month'));
Demo: https://dbfiddle.uk/?rdbms=postgres_14&fiddle=1c66b00a71d484cd3951baa0956ace63
Related
Assume you have the table given below containing information on Facebook user logins. Write a query to obtain the number of reactivated users (which are dormant users who did not log in the previous month, who then logged in during the current month). Output the current month and number of reactivated users.
I have tried this question by first making an inner join combining a user's previous month to current month with this code.
WITH CTE as
(SELECT user_id,
EXTRACT(month from login_date) as current_month,
EXTRACT(month from login_date)-1 as prev_month
FROM user_logins)
SELECT a.user_id as user_id, a.current_month, a.prev_month,
b.user_id as prev_month_user
FROM CTE a LEFT JOIN CTE b
ON a.prev_month = b.current_month
My idea is to use a case statement
CASE WHEN a.user_id IN
(SELECT b.user_id
WHERE b.current_month = a.prev_month)
THEN 0 ELSE 1 END
BUT that is giving me wrong output for user_id 245 in current_month 4.
https://drive.google.com/file/d/1dOQQxaJWv7j7o7M1Q98nlj77KCzIHxKl/view?usp=sharing
How to fix this?
This gets you the first day of the current month:
select date_trunc('month', current_date)
You can add or subtract an interval of one month to get the previous or next month's starting date.
The complete query:
select *
from users
where user_id in
(
select user_id
from user_logins
where login_date >= date_trunc('month', current_date)
and login_date < date_trunc('month', current_date) + interval '1 month'
)
and user_id not in
(
select user_id
from user_logins
where login_date >= date_trunc('month', current_date) - interval '1 month'
and login_date < date_trunc('month', current_date)
)
Well, admittedly
and login_date < date_trunc('month', current_date) + interval '1 month'
is probably unnecessary here, because the table won't contain future logins :-) So, keep it or remove it, as you like.
If you want a self join, you should get distinct user/month pairs first. Then, as you want to get user/month pairs for which not exists a user/month-1 pair (and for which NOT EXISTS would be appropriate) your join must be an anti join. This means you outer join the user/month-1 pair and only keep the outer joined rows, i.e. the non-matches.
WITH cte AS
(
SELECT DISTINCT user_id, DATE_TRUNC('month', login_date) AS month
FROM user_logins
)
SELECT mon.month, mon.user_id
FROM cte mon
LEFT JOIN cte prev ON prev.user_id = mon.user_id
AND prev.month = mon.month - INTERVAL '1 month'
WHERE prev.month IS NULL -- anti join
ORDER BY mon.month, mon.user_id;
I don't find anti joins very readable and would use NOT EXISTS instead. But that's a matter of personal preference, I guess. The query gives you all users who logged in a month, but not the previous month. You can of course limit this to the cutrent month. Or you can aggregate per month and count. Or remove the WHERE clause and count repeating users vs. new ones (COUNT(*) = all that month, COUNT(prev.month) = all repeating users, COUNT(*) - COUNT(prev.month) = all new users).
Well having said this, ... wasn't the task about reactivated users? Then you are looking for users who were active once, then paused a month, then became active again. Here is a simple query to get this for users who paused last month:
select user_id
from user_logins
group by user_id
having min(login_date) < date_trunc('month', current_date) - interval '1 month'
and max(login_date) >= date_trunc('month', current_date)
and count(*) filter (where login_date >= date_trunc('month', current_date) - interval '1 month'
and login_date < date_trunc('month', current_date)) = 0;
select count(*)
from table
where EXTRACT(MONTH FROM addondatetime) = EXTRACT(MONTH FROM current_date)
and EXTRACT(year FROM addondatetime) = EXTRACT(year FROM current_date)
this is my query. i want to extract month from table which is equal to current month but this query is taking almost 2 min
Try:
select count(*)
from table
where date_trunc('month', addondatetime) = date_trunc('month', current_date);
Also create a function based index:
create index test_just_month on test (date_trunc('month', addondatetime));
The below query returns a distinct count of 'members' for a given month and brand (see image below).
select to_char(transaction_date, 'YYYY-MM') as month, brand,
count(distinct UNIQUE_MEM_ID) as distinct_count
from source.table
group by to_char(transaction_date, 'YYYY-MM'), brand;
The data is collected with a 15 day lag after the month closes (meaning September 2016 MONTHLY data won't be 100% until October 15). I am only concerned with monthly data.
The query I would like to build: Until the 15th of this month (October), last month's data (September) should reflect August's data. The current partial month (October) should default to the prior month and thus also to the above logic.
After the 15th of this month, last month's data (September) is now 100% and thus September should reflect September (and October will reflect September until November 15th, and so on).
The current partial month will always = the prior month. The complexity of the query is how to calc prior month.
This query will be ran on a rolling basis so needs to be dynamic.
To be clear, I am trying to build a query where distinct_count for the prior month (until end of current month + 15 days) should reflect (current month - 2) value (for each respective brand). After 15 days of the close of the month, prior month = (current month - 1).
Partial current month defaults to prior month's data. The 15 day value should be variable/modifiable.
First, simplify the query to:
select to_char(transaction_date, 'YYYY-MM') as month, brand,
count(distinct members) as distinct_count
from source.table
group by members, to_char(transaction_date, 'YYYY-MM'), brand;
Then, you are going to have a problem. The problem is that one row (say from Aug 20th) needs to go into two groups. A simple group by won't handle this. So, let's use union all. I think the result is something like this:
select date_trunc('month', transaction_date) as month, brand,
count(distinct members) as distinct_count
from source.table
where (date_trunc('month', transaction_date) < date_trunc('month' current_date) - interval '1 month') or
(day(current_date) > 15 and date_trunc('month', transaction_date) = date_trunc('month' current_date) - interval '1 month')
group by date_trunc('month', transaction_date), brand
union all
select date_trunc('month' current_date) - interval '1 month' as month, brand,
count(distinct members) as distinct_count
from source.table
where (day(current_date) < 15 and date_trunc('month', transaction_date) = date_trunc('month' current_date) - interval '1 month')
group by brand;
Since you already have a working query, I concentrate on the subselect. The condition you can use here is CASE, especially "Searched CASE"
case
when extract(day from current_date) < 15 then
extract(month from current_date - interval '2 months')
else
extract(month from current_date - interval '1 month')
end case
This may be used as part of a where clause, for example.
Here is some sudo code to get the begin date and the end date for your interval.
Begin date:
date DATE_TRUNC('month', CURRENT_DATE - integer 15) - interval '1 month'
This will return the current month only after the 15th day, from there you can subtract a full month to get your starting point.
End Date:
To calculate this, grab the begin date, plus a month, minus a day.
If the source table is partitioned by transaction_date, this syntax (not masking transaction_date with expression) enables partitions eliminatation.
select to_char(transaction_date, 'YYYY-MM') as month
,count (distinct members) as distinct_count
,brand as brand
FROM source.table
where transaction_date between date_trunc('month', current_date) - case when extract (day from current_date) >= 15 then 1 else 2 end * interval '1' month
and date_trunc('month', current_date) - case when extract (day from current_date) >= 15 then 0 else 1 end * interval '1' month - interval '1' day
group by to_char(transaction_date, 'YYYY-MM')
,brand
;
i do have a query which works fine but I was just wondering if there are other ways or alternate method to bettter this.
I have a table where i am fetching those records exceeding or do not fall between 1 year time interval however there is only the year and ISO week number column in the table (integer values).
basically the logic is to check ISO WEEK - YEAR falls between 'current_date - interval '1 year' AND current_date.
My query is as below :
select * from raj_weekly_records where
(date_dimension_week > extract(week from current_date) and date_dimension_year = extract(year from current_date) )
or (date_dimension_week < extract(week from current_date) and (extract(year from current_date)-date_dimension_year=1) )
or(extract(year from current_date)-date_dimension_year>1);
Here date_dimension_week and date_dimension_year are the only integer parameters by which I need to check is there any other alternate or better way?.This code is working fine no issues here.
Here is an idea. Convert the year/week to a numeric format: YYYYWW. That is, the year times 100 plus the week number. Then you can do the logic with a single comparison:
select *
from raj_weekly_records
where date_dimension_year * 100 + date_dimension_week
not between (extract(year from current_date) - 1) * 100 + extract(week from current_date) and
extract(year from current_date) * 100 + extract(week from current_date)
(There might be an off-by one error, depending on whether the weeks at the ends are included or excluded.)
select *
from raj_weekly_records
where
date_trunc('week',
'0001-01-01 BC'::date + date_dimension_year * interval '1 year'
)
+ (date_dimension_week + 1) * interval '1 week'
- interval '1 day'
not between
current_date - interval '1 year' and current_date
I have a table as follow:
CREATE TABLE counts
(
T TIMESTAMP NOT NULL,
C INTEGER NOT NULL
);
I create the following views from it:
CREATE VIEW micounts AS
SELECT DATE_TRUNC('minute',t) AS t,SUM(c) AS c FROM counts GROUP BY 1;
CREATE VIEW hrcounts AS
SELECT DATE_TRUNC('hour',t) AS t,SUM(c) AS c,SUM(c)/60 AS a
FROM micounts GROUP BY 1;
CREATE VIEW dycounts AS
SELECT DATE_TRUNC('day',t) AS t,SUM(c) AS c,SUM(c)/24 AS a
FROM hrcounts GROUP BY 1;
The problem now comes in when I want to create the monthly counts to know what to divide the daily sums by to get the average column a i.e. the number of days in the specific month.
I know to get the days in PostgreSQL you can do:
SELECT DATE_PART('days',DATE_TRUNC('month',now())+'1 MONTH'::INTERVAL-DATE_TRUNC('month',now()))
But I can't use now(), I have to somehow let it know what the month is when the grouping gets done. Any suggestions i.e. what should replace ??? in this view:
CREATE VIEW mocounts AS
SELECT DATE_TRUNC('month',t) AS t,SUM(c) AS c,SUM(c)/(???) AS a
FROM dycounts
GROUP BY 1;
A bit shorter and faster and you get the number of days instead of an interval:
SELECT EXTRACT(day FROM date_trunc('month', now()) + interval '1 month'
- interval '1 day')
It's possible to combine multiple units in a single interval value . So we can use '1 mon - 1 day':
SELECT EXTRACT(day FROM date_trunc('month', now()) + interval '1 mon - 1 day')
(mon, month or months work all the same for month units.)
To divide the daily sum by the number of days in the current month (orig. question):
SELECT t::date AS the_date
, SUM(c) AS c
, SUM(c) / EXTRACT(day FROM date_trunc('month', t::date)
+ interval '1 mon - 1 day') AS a
FROM dycounts
GROUP BY 1;
To divide monthly sum by the number of days in the current month (updated question):
SELECT DATE_TRUNC('month', t)::date AS t
,SUM(c) AS c
,SUM(c) / EXTRACT(day FROM date_trunc('month', t)::date
+ interval '1 mon - 1 day') AS a
FROM dycounts
GROUP BY 1;
You have to repeat the GROUP BY expression if you want to use a single query level.
Or use a subquery:
SELECT *, c / EXTRACT(day FROM t + interval '1 mon - 1 day') AS a
FROM (
SELECT date_trunc('month', t)::date AS t, SUM(c) AS c
FROM dycounts
GROUP BY 1
) sub;