Google BigQuery (GBQ) churn / retention rate from previous month

I'm using Google BigQuery and trying to understand monthly churn from the following table, e.g. in Jan 2022, 1 user (C) churned and 1 user (D) is new.
My thought was to self join and treat a user who shows 'null' in the current month as churned, and a user who shows 'null' in the previous month as new. But the query returns the same result as a left join even though I'm using a full join. How should I fix it? I'm open to other approaches as well. Thank you!
userid active_month
A Jan 2022
B Jan 2022
D Jan 2022
A Dec 2021
B Dec 2021
C Dec 2021
A Nov 2021
B Nov 2021
select
t1.active_month,
count(distinct t1.userid) recent_user,
count(distinct t2.userid) previous_user,
count(distinct case when t1.userid is null then t2.userid end) as churned_user,
count(distinct case when t2.userid is null then t1.userid end) as new_user
from table t1
full outer join table t2
on t1.active_month = date_sub(t2.active_month, interval 1 month)
and t1.userid=t2.userid

The two problems I see with your query are:
the join condition should be reversed, i.e. subtract 1 month from t1.active_month instead of t2.active_month
when you select t1.active_month you need to coalesce it with t2.active_month, because in a full outer join t1.active_month is NULL for rows that exist only in t2
I have slightly changed your query and added one more condition to the calculation of churned_user, which excludes any t2.userid whose t2.active_month equals the most recent active_month (those users cannot have churned yet). Now it should be correct.
select
coalesce(t1.active_month, t2.active_month) as active_month,
count(distinct t1.userid) recent_user,
count(distinct IF(t2.userid is not null and t2.active_month is not null and t2.active_month <> (select max(active_month) from table), t2.userid, NULL)) previous_user,
count(distinct case when t1.userid is null and t2.active_month <> (select max(active_month) from table) then t2.userid end) as churned_user,
count(distinct case when t2.userid is null then t1.userid end) as new_user
from table t1
full outer join table t2
on date_sub(t1.active_month, interval 1 month) = t2.active_month and t1.userid = t2.userid
group by 1
order by 1
As an alternative approach, the following also calculates for each month:
recent_user: the number of users active in that month
previous_user: the number of users active in that month who were also active in the previous month
churned_user: the number of users who were last active in that month
new_user: the number of users who are first seen in that month
The window functions LAG and LEAD come in handy for calculating the previous and next active month for each userid:
WITH
table AS (
SELECT
MAX(active_month) OVER() AS current_month,
userid,
active_month,
LAG(active_month) OVER(PARTITION BY userid ORDER BY active_month) AS previous_month,
LEAD(active_month) OVER(PARTITION BY userid ORDER BY active_month) AS next_month
FROM
original_table
ORDER BY
userid,
active_month)
SELECT
active_month,
COUNT(userid) AS recent_user,
COUNTIF(DATE_DIFF(active_month, previous_month, MONTH) = 1) AS previous_user,
COUNTIF(next_month IS NULL AND active_month <> current_month) AS churned_user,
COUNTIF(previous_month IS NULL) AS new_user
FROM
TABLE
GROUP BY
active_month
ORDER BY
active_month
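To sanity-check either query against the sample data, the same month-over-month comparison can be sketched outside BigQuery. This is a minimal Python sketch, not BigQuery code: the names `rows`, `previous_month` and `monthly_churn` are made up for illustration, each month is represented by its first day, and it assumes churn is attributed to the month in which the user is missing (so C counts as churned in Jan 2022, as described in the question).

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (userid, first day of active_month).
rows = [
    ("A", date(2022, 1, 1)), ("B", date(2022, 1, 1)), ("D", date(2022, 1, 1)),
    ("A", date(2021, 12, 1)), ("B", date(2021, 12, 1)), ("C", date(2021, 12, 1)),
    ("A", date(2021, 11, 1)), ("B", date(2021, 11, 1)),
]


def previous_month(d):
    """Return the first day of the month before d."""
    if d.month == 1:
        return date(d.year - 1, 12, 1)
    return date(d.year, d.month - 1, 1)


def monthly_churn(rows):
    """Compare each month's set of users with the previous month's set."""
    users_by_month = defaultdict(set)
    for userid, month in rows:
        users_by_month[month].add(userid)

    report = {}
    for month in sorted(users_by_month):
        current = users_by_month[month]
        # .get() avoids creating an empty entry for a month with no data.
        previous = users_by_month.get(previous_month(month), set())
        report[month] = {
            "recent_user": len(current),
            "previous_user": len(previous),
            "churned_user": len(previous - current),  # in previous, not in current
            "new_user": len(current - previous),      # in current, not in previous
        }
    return report


report = monthly_churn(rows)
for month, stats in report.items():
    print(month.strftime("%b %Y"), stats)
```

Note that this attributes churn differently from the two queries above, which count a churned user in the last month in which the user was active; pick whichever convention matches your reporting.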

Related

How to add a condition for the query so that it can calculate the unique months and years in PostgreSQL

How can I write the condition so that it counts the months for a given client (redirect) and source (source)? I need to know how many invoices were issued, and invoices are counted by month: for example, January and February give 2 invoices; March, April and June give 3 invoices, etc. I could use max instead of count, but that is not correct, since a client may first appear in the middle of the year, for example in May, and would then get the value of the maximum month.
Here is my query:
select TA.redirect,
count(case when TA.source='zlat1' then extract(month from TA.date) else 0 end) number_of_accounts_zlat1,
count(case when TA.source='zlat2' then extract(month from TA.date) else 0 end) number_of_accounts_zlat2,
sum(TA.result_for_the_day) accrued
from total_accounts TA
group by TA.redirect
Here are tables and data + query and result ---->
https://dbfiddle.uk/?rdbms=postgres_14&fiddle=0bc8002e59b03afedeac8d1b8dfc98d1
insert into finace_base (redirect)
select distinct TA.redirect /* selects the redirects that are not yet present
in FB; if other columns must be inserted as well,
add them after redirect, separated by commas */
from total_accounts TA
left join finace_base FB on TA.redirect=FB.redirect
where FB.redirect is null;
update finace_base FB
set zlat1=TA.zlat1,
zlat2=TA.zlat2,
accrued=TA.accrued
from (select TA.redirect,
count(*) filter ( where TA.source='zlat1' ) as zlat1,
count(*) filter ( where TA.source='zlat2' ) as zlat2,
sum(TA.accrued) as accrued
from(
select sum(TA.accrued) as accrued,
TA.date,
TA.redirect,
TA.source
from (select TA.result_for_the_day as accrued,
to_char(TA.date, 'yyyy-mm') as date,
TA.redirect,
TA.source
from total_accounts TA) TA
group by TA.redirect, TA.date, TA.source) TA
group by TA.redirect) TA
where FB.redirect=TA.redirect
I could not add this as a comment because it was too long. Essentially, you first run the insert statement and then the update; the insert only adds redirects that are not in finace_base yet. The select on its own is:
select TA.redirect,
count(*) filter ( where TA.source='zlat1' ) as zlat1,
count(*) filter ( where TA.source='zlat2' ) as zlat2,
sum(TA.accrued)
from(
select sum(TA.accrued) as accrued,
TA.date,
TA.redirect,
TA.source
from (select TA.result_for_the_day as accrued,
to_char(TA.date, 'yyyy-mm') as date,
TA.redirect,
TA.source
from total_accounts TA) TA
group by TA.redirect, TA.date, TA.source) TA
group by TA.redirect
There you go, that's the answer. Giving back to the community what I have taken :D
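A note on why the query in the question over-counts: `count(...)` skips only NULL, so `else 0` makes every row count. A shorter alternative to the nested subqueries is to drop the `else` branch and count distinct month labels. The sketch below runs that idea through Python's built-in `sqlite3` so it can be tried locally; the sample rows are made up (they are not the fiddle data), and `strftime('%Y-%m', date)` is the SQLite stand-in for PostgreSQL's `to_char(TA.date, 'yyyy-mm')`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table total_accounts "
    "(redirect text, source text, date text, result_for_the_day real)"
)
# Made-up sample rows for illustration only.
conn.executemany(
    "insert into total_accounts values (?, ?, ?, ?)",
    [
        ("client_a", "zlat1", "2022-01-05", 10.0),
        ("client_a", "zlat1", "2022-01-20", 5.0),  # same month, counted once
        ("client_a", "zlat1", "2022-02-03", 7.0),
        ("client_a", "zlat2", "2022-03-11", 2.0),
        ("client_b", "zlat2", "2022-05-01", 4.0),
        ("client_b", "zlat2", "2022-06-01", 6.0),
    ],
)

# No ELSE branch: rows from other sources yield NULL, which count() ignores.
query = """
select redirect,
       count(distinct case when source = 'zlat1' then strftime('%Y-%m', date) end) as zlat1,
       count(distinct case when source = 'zlat2' then strftime('%Y-%m', date) end) as zlat2,
       sum(result_for_the_day) as accrued
from total_accounts
group by redirect
order by redirect
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Using the year-month label rather than `extract(month from ...)` keeps January 2022 and January 2023 apart if the data spans several years.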

Calculating difference in rows for many columns in SQL (Access)

What's up guys. I have another question about using SQL for analysis. I have a table built like this:
ID Date Value
1 31.01.2019 10
1 30.01.2019 5
2 31.01.2019 20
2 30.01.2019 10
3 31.01.2019 30
3 30.01.2019 20
There are many different IDs and many different dates. As output I would like an additional column that gives the difference to the previous date for each ID, so that I can analyze the change in value between days for each category (ID). The difference must never be computed across IDs, e.g. last day of ID = 1 minus first day of ID = 2.
Desired Output:
ID Date Difference to previous Days
1 31.01.2019 5
2 31.01.2019 10
3 31.01.2019 10
In the end I want to find outliers, i.e. days where the difference in value between two days is very large. Does anyone have a solution? If it is not possible with Access, I am open to solutions with Excel, but Access should be the first choice as it is more scalable.
Greetings and thanks in advance!!
With a self join:
select t1.ID, t1.[Date],
t1.[Value] - t2.[Value] as [Difference to previous Day]
from tablename t1 inner join tablename t2
on t2.[ID] = t1.[ID] and t2.[Date] = t1.[Date] - 1
Results:
ID Date Difference to previous Day
1 31/1/2019 5
2 31/1/2019 10
3 31/1/2019 10
Edit.
For the case that there are gaps between your dates:
select
t1.ID, t1.[Date], t1.[Value] - t2.[Value] as [Difference to previous Day]
from (
select t.ID, t.[Date], t.[Value],
(select max(tt.[Date]) from tablename as tt where ID = t.ID and tt.[Date] < t.[Date]) as prevdate
from tablename as t
) as t1 inner join tablename as t2
on t2.ID = t1.ID and t2.[Date] = t1.prevdate
In your example data, each id has the same two rows and the values are increasing. If this is generally true, then you can simply use aggregation:
select id, max(date), max(value) - min(value)
from t
group by id;
If the values might not be increasing, but the dates are the same, then you can use conditional aggregation:
select id,
max(date),
(max(iif(date = "31.01.2019", value, null)) -
max(iif(date = "30.01.2019", value, null))
) as diff
from t
group by id;
Note: Your date looks like it is using a bespoke format, so I am just doing the comparison as a string.
If previous date is exactly one day before, you can use a join:
select t.*,
(t.value - tprev.value) as diff
from t left join
t as tprev
on t.id = tprev.id and t.date = dateadd("d", 1, tprev.date);
If the previous date is simply the closest earlier date in the table, then you can use a correlated subquery:
select t.*,
(t.value -
(select top (1) tprev.value
from t as tprev
where tprev.id = t.id and tprev.date < t.date
order by tprev.date desc
)
) as diff
from t;
You can use a self join with an additional condition using a sub-query to determine the previous date
SELECT t.ID, t.Date, t.Value - prev.Value AS Diff
FROM
dtvalues AS t
INNER JOIN dtvalues AS prev
ON t.ID = prev.ID
WHERE
prev.[Date] = (SELECT MAX(x.[Date]) FROM dtvalues x WHERE x.ID=t.ID AND x.[Date]<t.[Date])
ORDER BY t.ID, t.[Date];
You could also include the where condition into the join condition, but the query designer would not be able to handle the query anymore. Like this, you can still edit the query in the query designer.
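To try the "previous date via sub-query" idea without Access, here is a small sketch using Python's built-in `sqlite3`. It mirrors the last query above on the question's sample data, with the dates rewritten as ISO strings so that string comparison matches date order. The `threshold` value and the outlier filter at the end are hypothetical additions for illustration, not part of any answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table dtvalues (id integer, date text, value integer)")
# Sample data from the question, dates stored as ISO strings (yyyy-mm-dd).
conn.executemany(
    "insert into dtvalues values (?, ?, ?)",
    [
        (1, "2019-01-31", 10), (1, "2019-01-30", 5),
        (2, "2019-01-31", 20), (2, "2019-01-30", 10),
        (3, "2019-01-31", 30), (3, "2019-01-30", 20),
    ],
)

# For each row, join to the row of the same id with the latest earlier date.
query = """
select t.id, t.date, t.value - prev.value as diff
from dtvalues as t
inner join dtvalues as prev
        on t.id = prev.id
where prev.date = (select max(x.date) from dtvalues as x
                   where x.id = t.id and x.date < t.date)
order by t.id, t.date
"""
diffs = conn.execute(query).fetchall()
print(diffs)

# Hypothetical outlier rule: flag rows whose absolute change reaches a threshold.
threshold = 10
outliers = [row for row in diffs if abs(row[2]) >= threshold]
print(outliers)
```

Because the join is restricted to rows with the same id, a difference is never taken between the last day of one ID and the first day of another.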

Bring through a newly created calculated column in another query

I have 2 separate queries below, which both run correctly. In the second one I've created a calculated column that counts the working days per YMs, and I would like to bring this column through to query 1 (the join would be query1.Period = query2.YMs).
Please see the queries and outputs below.
SELECT Client, ClientGroup, Type, Value, Period, PeriodName, PeriodNumber, ClientName
FROM metrics.dbo.vw_KPI_001_Invoice
select YMs,sum(case when IsWorkDay = 'X' then 1 else 0 end) from IESAONLINE.Dbo.DS_Dates
where Year > '2013'
group by YMs
Query 1
Client ClientGroup Type Value Period PeriodName PeriodNumber ClientName
0LG0 KarroFoods Stock 5691.68 201506 Week 06 2015 35 Karro Foods Scunthorpe
Query 2
YMs (No column name)
201401 23
Would the following work:
SELECT Client, ClientGroup, Type, Value, Period, PeriodName, PeriodNumber, ClientName, cnt
FROM metrics.dbo.vw_KPI_001_Invoice q1
INNER JOIN (select YMs,sum(case when IsWorkDay = 'X' then 1 else 0 end) as cnt from IESAONLINE.Dbo.DS_Dates
where Year > '2013'
group by YMs ) q2 ON q1.Period = q2.YMs
If a value isn't always available then you might consider changing the INNER JOIN to an OUTER JOIN.
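The difference between the two join types is easy to see on a tiny example. The sketch below uses Python's built-in `sqlite3` with made-up table names (`invoice`, `ds_dates`) and made-up rows standing in for the real view and dates table; it is only meant to show what happens to a period that has no matching row in the working-day subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table invoice (client text, period text, value real);
create table ds_dates (yms text, isworkday text);
-- Period 201507 deliberately has no rows in ds_dates.
insert into invoice values ('0LG0', '201506', 5691.68), ('0LG0', '201507', 100.0);
insert into ds_dates values ('201506', 'X'), ('201506', 'X'), ('201506', '');
""")

# The calculated column lives in a derived table that is joined like any table.
derived = """
(select yms, sum(case when isworkday = 'X' then 1 else 0 end) as cnt
 from ds_dates
 group by yms) q2
"""

inner_rows = conn.execute(
    f"select q1.client, q1.period, q2.cnt from invoice q1 "
    f"inner join {derived} on q1.period = q2.yms order by q1.period"
).fetchall()

left_rows = conn.execute(
    f"select q1.client, q1.period, q2.cnt from invoice q1 "
    f"left join {derived} on q1.period = q2.yms order by q1.period"
).fetchall()

print(inner_rows)  # the unmatched period is dropped
print(left_rows)   # the unmatched period is kept with cnt = None (NULL)
```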

Aggregating a sub query within query

I am currently working on aggregating the sum qty of "OUT" and "OUT+IN".
Current query is the following:
Select
a.Date
,a.DepartmentID
from
(Select
dris.Date
,dris.RentalItemKey
,dris.WarehouseKey
,ISNULL((Select TOP 1 dris.Date where OutQty=1 order by Date DESC),(Select ri.ReceiveDate from RentalItem ri where ri.RentalItemKey=dris.RentalItemKey)) as LastOutDate
,(Select d.DepartmentKey from Department d where d.Department=i.Department)as DepartmentID
, (CASE WHEN OutQty=1 OR (RepairQty=1 AND RentedQty=1) THEN 'IN' ELSE 'OUT' END) as Status
from DailyRentalItemStatus dris
inner join Inventory i on i.InventoryKey=dris.InventoryKey
where dris.Date='2014-08-02'
and i.ICode='3223700'
and i.Classification IN ('ITEM', 'ACCESSORY')
and i.AvailFor='RENT'
and i.AvailFrom='WAREHOUSE'
and dris.Warehouse='TORONTO')a
and I would like the result to be the following:
Date WarehouseID DepartmentID ICode Owned NotRedundant Out
2014-08-02 001T A00G 3223700 30 30 19
Here Owned is the number of items with status "OUT" or "IN" (OUT+IN), Out is the number with status "OUT", and NotRedundant is the number whose last out date is within the 2 years before the date.
Help would be greatly appreciated.
I think this is close to what you're looking for. Your description of Not Redundant is hard to understand: which dates are you comparing? The same trick used for OUT can be applied to it, though.
My query also assumes that you always have a department connecting to the inventory table and that there's always a rentalitem.receivedate.
;WITH LastOut as
(Select Max(Date) as LastOutDate, rentalItemKey
from DailyRentalItemStatus
WHERE OutQty=1
GROUP BY rentalItemKey -- required because rentalItemKey is selected next to an aggregate
)
Select
dris.Date
,dris.WarehouseKey as WarehouseID
,d.DepartmentKey as DepartmentID
, i.Icode
--,ISNULL((Select TOP 1 dris.Date where OutQty=1 order by Date DESC),(Select ri.ReceiveDate from RentalItem ri where ri.RentalItemKey=dris.RentalItemKey)) as LastOutDate
, Count(1) as Owned
, Sum(CASE WHEN NOT (OutQty=1 OR (RepairQty=1 AND RentedQty=1)) THEN 1 ELSE 0 END) as OUT
, Sum(CASE WHEN DateAdd(yy, 2,dris.[date]) >= ISNULL(lastout.lastoutdate, ri.ReceiveDate) then 1 else 0 end) as NonRedundent
from DailyRentalItemStatus dris
inner join Inventory i on i.InventoryKey=dris.InventoryKey
INNER JOIN Department d ON d.Department=i.Department
INNER JOIN RentalItem ri ON ri.RentalItemKey=dris.RentalItemKey
LEFT OUTER JOIN LastOUT ON LastOut.rentalItemKey=dris.RentalItemKey
where dris.Date='2014-08-02'
and i.ICode='3223700'
and i.Classification IN ('ITEM', 'ACCESSORY')
and i.AvailFor='RENT'
and i.AvailFrom='WAREHOUSE'
and dris.Warehouse='TORONTO'
Group BY dris.Date, d.DepartmentKey, Dris.WarehouseKey , i.icode

sql db2 select records from either table

I have an order file with order id and ship date. Orders can only be shipped Monday to Friday, which means no records are selected for Saturday and Sunday.
I use the same order file to get all order dates, with the date in the same format (yyyymmdd).
I want to select a count of all the records from the order file based on order date, and (I believe) full outer join (or maybe right join?) the date file, because I would like to see
20120330 293
20120331 0
20120401 0
20120402 920
20120403 430
20120404 827
etc...
However, my SQL statement is still not returning a zero record for the 31st and the 1st.
with DatesTable as (
select ohordt "Date" from kivalib.orhdrpf
where ohordt between 20120315 and 20120406
group by ohordt order by ohordt
)
SELECT ohscdt, count(OHTXN#) "Count"
FROM KIVALIB.ORHDRPF full outer join DatesTable dts on dts."Date" = ohordt
--/*order status = filled & order type = 1 & date between (some fill date range)*/
WHERE OHSTAT = 'F' AND OHTYP = 1 and ohscdt between 20120401 and 20120406
GROUP BY ohscdt ORDER BY ohscdt
Any ideas what I'm doing wrong?
Thanks!
Because there is no data for those days, they do not show up as rows. You can use a recursive CTE to build a contiguous list of dates between two values that the query can join on.
It will look something like:
WITH dates (val) AS (
SELECT CAST('2012-04-01' AS DATE)
FROM SYSIBM.SYSDUMMY1
UNION ALL
SELECT Val + 1 DAYS
FROM dates
WHERE Val < CAST('2012-04-06' AS DATE)
)
SELECT d.val AS "Date", o.ohscdt, COALESCE(COUNT(o.ohtxn#), 0) AS "Count"
FROM dates AS d
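LEFT JOIN KIVALIB.ORHDRPF AS o
ON o.ohordt = TO_CHAR(d.val, 'YYYYMMDD')
-- Filters on o belong in the ON clause; in a WHERE clause they would discard the dates that have no orders
AND o.ohstat = 'F'
AND o.ohtyp = 1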