Cohort retention with SQL BigQuery - sql

I am trying to create a retention table like the following using SQL in BigQuery, but with MONTHLY cohorts.
I have the following columns in my dataset. I am only using one table, and its name is 'curious-furnace-341507.TEST.Test_Dataset_-_Orders':
order_date   order_id   customer_id
2020-01-02   12345      6789
I do not need the new user column, and the data goes through June 2020. I think the ideal output is a cohort month column that lists the January-June cohorts, with 5 periods across.
I have tried many different things and keep getting errors in BigQuery, so I think I am approaching it all wrong. The online queries I am trying to adapt seem to use dates rather than months, which is also causing some confusion. I think I need to truncate my date column to months in the query?
Does anyone have a go-to query that will work in BigQuery for a retention table or can help me approach this? Thanks!

This may help you:
-- First order date per customer defines the cohort
WITH cohorts AS (
  SELECT
    customer_id,
    MIN(DATE(order_date)) AS cohort_date
  FROM `curious-furnace-341507.TEST.Test_Dataset_-_Orders`  -- backticks, not single quotes
  GROUP BY 1
)
SELECT
  FORMAT_DATE("%Y-%m", c.cohort_date) AS cohort_mth,
  t.customer_id AS cust_id,
  DATE_DIFF(t.order_date, c.cohort_date, MONTH) AS order_period
FROM `curious-furnace-341507.TEST.Test_Dataset_-_Orders` t
JOIN cohorts c ON t.customer_id = c.customer_id
WHERE c.cohort_date >= DATE '2020-01-01'
  AND DATE_DIFF(t.order_date, c.cohort_date, MONTH) <= 5
GROUP BY 1, 2, 3
I typically do pivots and % calcs in Excel/Sheets, so this will give you just the input data you need for that.
NOTE:
This returns one row per customer per period, so counting rows per cohort and period gives the number of unique customers who ordered in period X (repeat orders within a period are ignored).
It also includes period 0 (orders placed in cohort_mth itself), which you may wish to keep or exclude.
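If you would rather not pivot in a spreadsheet, the long output above can be turned into a retention table in a few lines. This is only a minimal sketch in Python: the sample rows are made up, and the helper name retention_table is hypothetical. It assumes every customer in a cohort has a period 0 row, which holds when the cohort is defined by the first order.

```python
from collections import defaultdict

# Hypothetical sample of the query output: (cohort_mth, cust_id, order_period).
rows = [
    ("2020-01", 1, 0), ("2020-01", 1, 1), ("2020-01", 2, 0),
    ("2020-01", 3, 0), ("2020-01", 3, 2), ("2020-02", 4, 0),
    ("2020-02", 5, 0), ("2020-02", 5, 1),
]

def retention_table(rows, max_period=5):
    """Return {cohort_mth: [share of the cohort ordering in period 0..max_period]}."""
    customers = defaultdict(lambda: defaultdict(set))
    for cohort, cust, period in rows:
        customers[cohort][period].add(cust)
    table = {}
    for cohort in sorted(customers):
        size = len(customers[cohort][0])  # cohort size = customers seen in period 0
        table[cohort] = [
            round(len(customers[cohort][p]) / size, 2)
            for p in range(max_period + 1)
        ]
    return table

table = retention_table(rows)
for cohort, shares in table.items():
    print(cohort, shares)
```

Each printed row is one cohort month, and each column is the share of that cohort that ordered in the given period.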

Related

Adding column based on dynamic criteria that changes for every row in snowflake

I am trying to add a column in Snowflake that counts distinct customers based on criteria that change for every row, i.e. it needs to count customers between 52 weeks before the current week_ending date and the current week_ending date.
The query goes like this:
select week_ending, sales, last_year_cust_count
from table where year = 2022
Now I want last_year_cust_count to hold the distinct customers between 52 weeks before week_ending and the current week_ending, so that it shows results like the following example:
Week_ending   Sales   last_year_cust_count
02/01/22      $300    3479
09/01/22      $350    3400
16/01/22      $450    3500
... and so on
The optimal way to solve this over a complex structure is to use a bitmap and then roll that up to the projections you are after.
You should read "Using Bitmaps to Compute Distinct Values for Hierarchical Aggregations".
The simple, non-performant way is to self join and throw processing power at it.
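select a.week_ending, a.sales, count(distinct b.customer) as last_year_cust_count
from table_a as a
join table_a as b
on <filter that I cannot be bothered writing to select last 52 weeks based on years and weeks>
where a.year = 2022
-- the non-aggregated columns must be grouped for count(distinct ...) to be valid
group by a.week_ending, a.sales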
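To make the self-join idea concrete, here is a small runnable sketch using Python's built-in sqlite3 rather than Snowflake. Everything in it is an assumption for illustration: the visits table, its columns, the sample rows, and the choice to treat "52 weeks" as the 364 days ending on week_ending (lower bound exclusive). The date arithmetic would need translating to the target database's own functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per customer per week in which they purchased.
conn.execute("CREATE TABLE visits (week_ending TEXT, customer TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [
        ("2021-01-03", "A"),
        ("2021-01-10", "B"),
        ("2022-01-02", "A"),
        ("2022-01-02", "C"),
        ("2022-01-09", "C"),
    ],
)

query = """
SELECT a.week_ending,
       COUNT(DISTINCT b.customer) AS last_year_cust_count
FROM (SELECT DISTINCT week_ending FROM visits) AS a
JOIN visits AS b
  ON b.week_ending >  date(a.week_ending, '-364 days')  -- 52 weeks back, exclusive
 AND b.week_ending <= a.week_ending
WHERE a.week_ending >= '2022-01-01'
GROUP BY a.week_ending
ORDER BY a.week_ending
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

ISO-formatted date strings sort correctly as text, which is why the range comparison works here without casting.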

Sum dates with different timestamps and picking the min date?

Beginner here. I want to have only one row for each delivery date but it is important to keep the hours and the minutes. I have the following table in Oracle (left):
As you can see there are days that a certain SKU (e.g SKU A) was delivered twice in the same day. The table on the right is the desired result. Essentially, I want to have the quantities that arrived on the 28th summed up and in the Supplier_delivery column I want to have the earliest delivery timestamp.
I need to keep the hours and the minutes; otherwise I know I could achieve this by writing something like: SELECT SKU, TRUNC(TO_DATE(SUPPLIER_DELIVERY), 'DDD'), SUM(QTY) FROM TABLE GROUP BY SKU, TRUNC(TO_DATE(SUPPLIER_DELIVERY), 'DDD')
Any ideas?
You can use MIN():
SELECT SKU, MIN(SUPPLIER_DELIVERY), SUM(QTY)
FROM TABLE
GROUP BY SKU, TRUNC(SUPPLIER_DELIVERY);
This assumes that SUPPLIER_DELIVERY is a date and does not need to be converted to one. But it would work with TO_DATE() in the GROUP BY as well.
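The same pattern (group by the truncated day, but select the minimum full timestamp) can be tried out quickly with Python's built-in sqlite3. This is only a sketch: the deliveries table and its rows are invented, and SQLite's date() stands in for Oracle's TRUNC().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: SKU A is delivered twice on the 28th.
conn.execute("CREATE TABLE deliveries (sku TEXT, supplier_delivery TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO deliveries VALUES (?, ?, ?)",
    [
        ("A", "2021-06-28 08:15", 10),
        ("A", "2021-06-28 14:40", 5),
        ("A", "2021-06-29 09:00", 7),
        ("B", "2021-06-28 10:30", 3),
    ],
)

query = """
SELECT sku,
       MIN(supplier_delivery) AS first_delivery,  -- keeps hours and minutes
       SUM(qty)               AS total_qty
FROM deliveries
GROUP BY sku, date(supplier_delivery)             -- one group per SKU per day
ORDER BY sku, first_delivery
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The two deliveries of SKU A on the 28th collapse into one row carrying the earlier timestamp and the summed quantity.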

Trying to summarize loan origination by month

I am trying to summarize loan originations by month from a table that contains loan-level data going back to the late 90s. Every month, the most recent loan-level data are added to the table; the month_key field is used to identify the most recent records. I want to group the loan origination dates by month and sum the total loan commitments originated in each month. The table attached depicts how I want to summarize the data in my query, and the code below is what I have written thus far. It outputs data summarized by month dating back to the 90s. Thanks for the help.
Edit: JMB's solution worked. Once I added the month_key field back in, and sorted for the latest month on record, and summed the original loan balance I received the output I needed.
select SUM(INDIVIDUAL_LOAN_BALANCE) AS MONTHLY_LOAN_ORIGINATIONS, ltrim(TO_CHAR(ORIG_OBGN_DATE,'mm-yyyy'), '0') AS ORIG_MONTH
FROM LOAN_TABLE
WHERE MONTH_KEY = 202002
group by ltrim(TO_CHAR(ORIG_OBGN_DATE,'mm-yyyy'), '0')
HAVING ltrim(TO_CHAR(ORIG_OBGN_DATE,'mm-yyyy'), '0') >= '12-2018'
ORDER BY ltrim(TO_CHAR(ORIG_OBGN_DATE,'mm-yyyy'), '0') DESC
Example for how I want the query to depict the data:
How the query currently outputs the data
I would suggest leveraging the date datatype to filter, aggregate and sort: this makes things much easier and safer (for instance, your having clause compares strings, which does not do what you expect). You can handle the formatting in the select clause.
select
sum(individual_loan_balance) as monthly_loan_originations,
ltrim(to_char(trunc(orig_obgn_date, 'mm'),'mm-yyyy'), '0') as orig_month
from loan_table
where orig_obgn_date >= date'2018-01-01'
group by trunc(orig_obgn_date, 'mm')
order by trunc(orig_obgn_date, 'mm')
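To see why filtering on the formatted 'mm-yyyy' string goes wrong, here is a small Python illustration. It is only an analogy, assuming plain character-by-character string comparison; the label helper is hypothetical and mimics ltrim(to_char(d, 'mm-yyyy'), '0').

```python
from datetime import date

months = [date(2018, 11, 1), date(2018, 12, 1), date(2019, 1, 1), date(2019, 2, 1)]

def label(d):
    # Month without a leading zero, then the year, e.g. "1-2019".
    return f"{d.month}-{d.year}"

# Filtering on the formatted string compares characters, not dates.
kept_by_string = [label(d) for d in months if label(d) >= "12-2018"]

# Filtering on the date value keeps every month from December 2018 onward.
kept_by_date = [label(d) for d in months if d >= date(2018, 12, 1)]

print(kept_by_string)
print(kept_by_date)
```

The string filter silently drops January 2019, because "1-2019" sorts before "12-2018" as text, while the date filter keeps it.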

Is there a simple line (or two) of code that will pull records before a minimum date in another table?

I want to pull emergency room visits before a member's first treatment date. Everyone has a different first treatment date and none occur before Jan 01 2012.
So if a member has a first treatment date of Feb 24 2013, I want to know how many times they visited the ER in the year prior to that date.
These min dates are located in another table and I cannot use the min date in my DATEADD function. Thoughts?
One possible solution is to use a CTE to capture the visits between the dates you're interested in and then join to that with your select.
Here is an example:
Rextester
Edit:
I just completely updated my answer. Sorry for the confusion.
So you have at least two tables:
Emergency room visits
Treatment information
Let's call these two tables [ERVisits] and [Treatments].
I suppose both tables have some id-field for the patient/member. Let's call it [MemberId].
How about this conceptual query:
WITH [FirstTreatments] AS
(
SELECT [MemberId], MIN([TreatmentDate]) AS [FirstTreatmentDate]
FROM [Treatments]
GROUP BY [MemberId]
)
SELECT V.[MemberId], T.[FirstTreatmentDate], COUNT(*) AS [ERVisitCount]
FROM [ERVisits] AS V INNER JOIN [FirstTreatments] AS T ON T.[MemberId] = V.[MemberId]
WHERE DATEDIFF(DAY, V.[VisitDate], T.[FirstTreatmentDate]) BETWEEN 0 AND 365
GROUP BY V.[MemberId], T.[FirstTreatmentDate]
This query should show the number of times a patient/member has visited the ER in the year before his/her first treatment date.
Here is a tester: https://rextester.com/UXIE4263
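For a quick local check of the same idea, here is a sketch using Python's built-in sqlite3. The sample rows are invented, and SQLite has no DATEDIFF, so the difference of two julianday() values stands in for it; the table and column names follow the conceptual query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Treatments (MemberId INTEGER, TreatmentDate TEXT)")
conn.execute("CREATE TABLE ERVisits (MemberId INTEGER, VisitDate TEXT)")
conn.executemany("INSERT INTO Treatments VALUES (?, ?)", [
    (1, "2013-02-24"), (1, "2013-06-01"), (2, "2012-05-10"),
])
conn.executemany("INSERT INTO ERVisits VALUES (?, ?)", [
    (1, "2012-03-01"),  # within the year before first treatment
    (1, "2012-12-15"),  # within the year before first treatment
    (1, "2011-01-01"),  # more than a year before: excluded
    (1, "2013-03-01"),  # after first treatment: excluded
    (2, "2012-05-01"),
])

query = """
WITH FirstTreatments AS (
    SELECT MemberId, MIN(TreatmentDate) AS FirstTreatmentDate
    FROM Treatments
    GROUP BY MemberId
)
SELECT V.MemberId, T.FirstTreatmentDate, COUNT(*) AS ERVisitCount
FROM ERVisits AS V
JOIN FirstTreatments AS T ON T.MemberId = V.MemberId
WHERE julianday(T.FirstTreatmentDate) - julianday(V.VisitDate) BETWEEN 0 AND 365
GROUP BY V.MemberId, T.FirstTreatmentDate
ORDER BY V.MemberId
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Because of the inner join and the WHERE filter, members with no ER visits in the window do not appear in the output at all; a LEFT JOIN from FirstTreatments would be needed to report them with a count of zero.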

Querying SQLITE DB for Data from One Column Based On Another Column

I hope the title of this post makes sense.
The db in question has two columns that are related to my issue: a date column that follows the format xx/xx/xxxx and a price column. What I want to do is get a sum of the prices in the price column based on the month and year in which they occurred, but that data is in the date column. Doing so will allow me to determine the total for a given month of a given year. The problem is I have no idea how to construct a query that would do what I need. I have done some reading on the web, but I'm not really sure how to go about this. Can anyone provide some advice/tips?
Thanks for your time!
Mike
I was able to find a solution using a LIKE clause:
SELECT sum(price) FROM purchases WHERE date LIKE '11%1234%'
The "11" could be any 2-digit month and the "1234" is any 4 digit year. The % sign acts as a wildcard. This query, for example, returns the sum of any prices that were from month 11 of year 1234 in the db.
Thanks for your input!
You cannot use the built-in date functions on these date values because you have stored them formatted for displaying instead of in one of the supported date formats.
If the month and day fields always have two digits, you can use substr:
SELECT substr(MyDate, 7, 4) AS Year,
substr(MyDate, 1, 2) AS Month,
sum(Price)
FROM Purchases
GROUP BY Year,
Month
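The substr approach can be run end to end with Python's built-in sqlite3. This is a minimal sketch with made-up rows, assuming the dates are stored as MM/DD/YYYY with two-digit month and day; the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Purchases (MyDate TEXT, Price REAL)")
conn.executemany("INSERT INTO Purchases VALUES (?, ?)", [
    ("11/05/2019", 10.0),
    ("11/20/2019", 5.5),
    ("12/01/2019", 3.0),
    ("11/02/2020", 1.0),
])

query = """
SELECT substr(MyDate, 7, 4) AS Year,   -- characters 7-10: the year
       substr(MyDate, 1, 2) AS Month,  -- characters 1-2: the month
       sum(Price)
FROM Purchases
GROUP BY Year, Month
ORDER BY Year, Month
"""
totals = conn.execute(query).fetchall()
for row in totals:
    print(row)

# The built-in date functions do not understand this display format:
unsupported = conn.execute("SELECT strftime('%m', '11/05/2019')").fetchone()[0]
print(unsupported)  # None, i.e. SQL NULL
```

Storing the dates as YYYY-MM-DD instead would let strftime and the other date functions work directly.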
So, the goal is to get an aggregate grouping by the month?
select strftime('%m', mydate), sum(price)
from mytable
group by strftime('%m', mydate)
Look into group by