SQL cohort query output - sql

This is a sample query and I do not have the data associated with these values, but I just need to know what kind of chart/information would be the output of this request:
SELECT
cohorts.cohortyear,
YEAR(orders.time) AS purchaseyear,
SUM(orders.amount) AS amount
FROM orders,
(SELECT id AS customer_id, YEAR(created) AS cohortyear FROM customers) AS cohorts
WHERE orders.customer_id = cohorts.customer_id
GROUP BY cohortyear, purchaseyear;

Not sure I understand the question. By looking at the projections in the SELECT clause I am going to guess your output columns will look like:
cohortyear (integer) | purchaseyear (integer) | amount (some numeric)

The query would be better written and more intelligible if it used explicit join syntax:
SELECT co.cohortyear, YEAR(o.time) AS purchaseyear, SUM(o.amount) AS amount
FROM orders o join
(SELECT id AS customer_id, YEAR(created) AS cohortyear
FROM customers cu
) co
on o.customer_id = co.customer_id
GROUP BY co.cohortyear, purchaseyear;
It takes customers and assigns them to cohorts based on the year the customer record was created (assuming created does what it sounds like). So, all customers created in the same year are in the same group. Then it sums the amount each group spent in each year on or after its cohort year. If you ran the query, the output would be pretty self-explanatory. Something like:
CohortYear PurchaseYear Amount
2014 2014 $1000
2013 2014 $2000
2013 2013 $3000
. . .
The first row says that customers who started in 2014 spent $1000 in 2014. By contrast, customers who started in 2013 spent $2000 in 2014, and that same group spent $3000 in 2013. However, the comparisons aren't very good, because the query doesn't take into account the number of customers in each cohort. But that is another matter.
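To see the shape of the output without real data, here is a minimal sketch that runs the same query against a few made-up rows. Everything in it is an assumption for illustration: the sample customers and orders are hypothetical, SQLite stands in for the original database, and strftime('%Y', ...) replaces YEAR() because SQLite has no YEAR() function. The extra customers column is not in the original query; it is only there to show the cohort sizes that the comparison above is missing.

```python
import sqlite3

# Hypothetical sample rows; the question supplied no data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, created TEXT);
CREATE TABLE orders (customer_id INTEGER, time TEXT, amount REAL);
INSERT INTO customers VALUES
  (1, '2013-03-01'), (2, '2013-07-15'), (3, '2014-02-10');
INSERT INTO orders VALUES
  (1, '2013-04-01', 1000), (2, '2013-08-01', 2000),
  (1, '2014-01-05', 1500), (2, '2014-06-01', 500),
  (3, '2014-03-01', 1000);
""")

# Same logic as the query in the question, with strftime() instead of YEAR().
rows = conn.execute("""
SELECT co.cohortyear,
       CAST(strftime('%Y', o.time) AS INTEGER) AS purchaseyear,
       SUM(o.amount) AS amount,
       COUNT(DISTINCT o.customer_id) AS customers  -- added: buyers per cell
FROM orders o
JOIN (SELECT id AS customer_id,
             CAST(strftime('%Y', created) AS INTEGER) AS cohortyear
      FROM customers) co
  ON o.customer_id = co.customer_id
GROUP BY co.cohortyear, purchaseyear
ORDER BY co.cohortyear, purchaseyear
""").fetchall()

# One row per (cohort year, purchase year) combination.
for row in rows:
    print(row)
```

Each printed tuple is (cohortyear, purchaseyear, amount, customers), so dividing amount by customers gives a spend per buying customer that can be compared across cohorts.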

Related

Adding column based on dynamic criteria that changes for every row in snowflake

Trying to add a column that counts distinct customers in Snowflake based on criteria that change for every row, i.e. it needs to count customers between 52 weeks before the current week_ending date and the current week_ending date.
The query goes like
select week_ending, sales, last_year_cust_count
from table where year = 2022
Now I want last_year_cust_count to hold the distinct customers between 52 weeks before week_ending and the current week_ending, showing results like the following example:
Week_ending | Sales | last_year_cust_count
02/01/22 | $300 | 3479
09/01/22 | $350 | 3400
16/01/22 | $450 | 3500
... and so on
The optimal way to solve this over a complex structure is to use a bitmap, and then roll that up to the projections you want.
You should read Using Bitmaps to Compute Distinct Values for Hierarchical Aggregations
The simple, non-performant way is to self join and throw processing power at it.
select a.week_ending, a.sales, count(distinct b.customer) as last_year_cust_count
from table_a as a
join table_a as b
on <filter that I cannot be bothered writing, to select the last 52 weeks based on years and weeks>
where a.year = 2022
group by a.week_ending, a.sales
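As a rough sketch of the self-join approach, the example below runs it end to end on made-up data. The assumptions are mine, not the asker's: table_a is taken to hold one row per customer per week with an ISO-formatted week_ending, the window is read as the 52 weekly buckets ending at the current week (so the lower bound is exclusive), and SQLite's date(..., '-364 days') stands in for the date arithmetic. In Snowflake, something like DATEADD(week, -52, a.week_ending) would presumably play the same role.

```python
import sqlite3

# Hypothetical table: one row per customer per week (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (week_ending TEXT, customer TEXT, sales REAL);
INSERT INTO table_a VALUES
  ('2021-01-03', 'A', 10), ('2021-06-06', 'B', 20),
  ('2022-01-02', 'C', 30), ('2022-01-09', 'A', 40),
  ('2022-01-09', 'C', 5);
""")

rows = conn.execute("""
SELECT w.week_ending,
       w.sales,
       COUNT(DISTINCT b.customer) AS last_year_cust_count
FROM (SELECT week_ending, SUM(sales) AS sales  -- weekly sales totals
      FROM table_a
      GROUP BY week_ending) w
JOIN table_a b
  ON b.week_ending >  date(w.week_ending, '-364 days')  -- 52 weeks back, exclusive
 AND b.week_ending <= w.week_ending                     -- up to the current week
WHERE w.week_ending >= '2022-01-01' AND w.week_ending < '2023-01-01'
GROUP BY w.week_ending, w.sales
ORDER BY w.week_ending
""").fetchall()

for row in rows:
    print(row)
```

Customer A's purchase on 2021-01-03 falls just outside the window for the week ending 2022-01-02, so A is only counted again from the week ending 2022-01-09. Whether that boundary week should be included is a judgment call to confirm against the real requirement.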

Tying in public purchase records

I'm trying to get a list of purchases for work. Currently, what we're doing is tracing product orders based on what our customers have said they've bought. For instance, Steve says he has bought iPhones on the 3rd Jan 2022, the 15th Dec 2021 and the 1st of Nov 2021.
What I'm currently doing is a very arduous task: three separate queries, extracting them all, then removing duplicates in Excel to then analyze the account IDs that have bought on those three dates.
This in and of itself takes time, but some customers also buy 5, 7, 10 products etc., so it can become quite difficult.
select tt.order_time, tt.from, tt.to, tt.value/1e9, tt.orderID
from transactiontable tt
where right(tt.data::varchar,8) = '9092fcb5'
and left(tt.data::varchar, 10) in ('\xfb3bdb41', '\x7ff36ab5')
and tt.contract_address = '\9092fcb5'
and tt.order_time between '2022-05-09 00:00' and '2022-05-09 23:59'
Does anyone have any help on how I can essentially take X number of queries and then only show the distinct "from" addresses for each overall query?
Thanks!

Cohort retention with SQL BigQuery

I am trying to create a retention table like the following using SQL in BigQuery, but with MONTHLY cohorts.
I have the following columns to use in my dataset. I am only using one table, and its name is 'curious-furnace-341507.TEST.Test_Dataset_-_Orders'
order_date | order_id | customer_id
2020-01-02 | 12345 | 6789
I do not need the new user column, and the data goes through June 2020. I think ideally there would be a cohort month column that lists the January-June cohorts and then 5 periods across.
I have tried so many different things and keep getting errors in BigQuery, so I think I am approaching it all wrong. The online queries I am trying to pull from seem to use dates rather than months, which is also causing some confusion, as I think I need to truncate my date column to months only in the query?
Does anyone have a go-to query that will work in BigQuery for a retention table or can help me approach this? Thanks!
This may help you:
With cohorts AS (
SELECT
customer_id,
MIN(DATE(order_date)) AS cohort_date
FROM `curious-furnace-341507.TEST.Test_Dataset_-_Orders`
GROUP BY 1)
SELECT
FORMAT_DATE("%Y-%m", c.cohort_date) AS cohort_mth,
t.customer_id AS cust_id,
DATE_DIFF(t.order_date, c.cohort_date, month) AS order_period,
FROM `curious-furnace-341507.TEST.Test_Dataset_-_Orders` t
JOIN cohorts c ON t.customer_id = c.customer_id
WHERE cohort_date >= ('2020-01-01')
AND DATE_DIFF(t.order_date, c.cohort_date, month) <=5
GROUP BY 1, 2, 3
I typically do pivots and % calcs in Excel/Sheets, so this will give you just the input data you need for that.
NOTE:
The query returns one row per customer per period, so counting the rows gives you the unique customers who ordered in period X (repeat orders within a period are ignored).
The output also includes period 0 (orders placed in the cohort month itself), which you may wish to keep or exclude.
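If you would rather skip the spreadsheet step, the pivot and percentage calculation can be sketched in a few lines of Python. This is only an illustration under assumptions: the rows below are hypothetical and merely shaped like the query output (cohort_mth, cust_id, order_period), and retention is taken to mean the share of each cohort's period-0 customers who ordered again in period X.

```python
from collections import defaultdict

# Hypothetical rows shaped like the query output:
# (cohort_mth, cust_id, order_period)
rows = [
    ("2020-01", 1, 0), ("2020-01", 1, 1), ("2020-01", 2, 0),
    ("2020-01", 2, 2), ("2020-02", 3, 0), ("2020-02", 3, 1),
    ("2020-02", 4, 0),
]

# Collect the distinct customers seen in each (cohort, period) cell.
cells = defaultdict(set)
for cohort, cust, period in rows:
    cells[(cohort, period)].add(cust)

cohorts = sorted({cohort for cohort, _ in cells})
periods = range(0, 6)  # period 0 plus 5 follow-up periods

retention = {}
for cohort in cohorts:
    # Every customer has a period-0 row, because the cohort date
    # is that customer's first order date.
    size = len(cells[(cohort, 0)])
    retention[cohort] = [
        len(cells.get((cohort, p), ())) / size for p in periods
    ]

# One line per cohort, one percentage per period.
for cohort, shares in retention.items():
    print(cohort, " ".join(f"{s:6.1%}" for s in shares))
```

Dropping the first element of each list excludes period 0, which is always 100% by construction.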

Optimize Average of Averages SQL Query

I have a table where each row is a vendor with a sale made on some date.
I'm trying to compute average daily sales per vendor for the year 2019, and get a single number. Which I think means I want to compute an average of averages.
This is the query I'm considering, but it takes a very long time on this large table. Is there a smarter way to compute this average without this much nesting? I have a feeling I'm scanning rows more times than I need to.
-- Average of all vendors' average daily sale counts
SELECT AVG(vendor_avgs.avg_daily_sales) avg_of_avgs
FROM (
    -- Get average number of daily sales for each vendor
    SELECT vendor_daily_totals.vendorid, AVG(vendor_daily_totals.cnt) avg_daily_sales
    FROM (
        -- Get total number of sales for each vendor on each day
        SELECT vendorid, COUNT(*) cnt
        FROM vendor_sales
        WHERE year = 2019
        GROUP BY vendorid, month, day
    ) vendor_daily_totals
    GROUP BY vendor_daily_totals.vendorid
) vendor_avgs;
I'm curious if there is in general a way to compute an average of averages more efficiently.
This is running in Impala, by the way.
I think you can just do the calculation in one shot:
SELECT AVG(t.avgs)
FROM (
SELECT vendorid,
COUNT(*) * 1.0 / COUNT(DISTINCT month, day) as avgs
FROM vendor_sales
WHERE year = 2019
GROUP BY vendorid
) t
This gets the total and divides by the number of days. However, COUNT(DISTINCT) might be even slower than nested GROUP BYs in Impala, so you need to test this.
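A quick way to convince yourself that the two forms agree is to run both on a tiny made-up table. The sketch below does that with SQLite rather than Impala, so it says nothing about performance. The rows are hypothetical, and because SQLite's COUNT(DISTINCT ...) takes a single expression, month * 100 + day is used as a stand-in for COUNT(DISTINCT month, day).

```python
import sqlite3

# Hypothetical data: vendor 1 sells 2 + 1 items over two days,
# vendor 2 sells 4 items on a single day.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vendor_sales (vendorid INTEGER, year INTEGER,
                           month INTEGER, day INTEGER);
INSERT INTO vendor_sales VALUES
  (1, 2019, 1, 1), (1, 2019, 1, 1), (1, 2019, 1, 2),
  (2, 2019, 3, 5), (2, 2019, 3, 5), (2, 2019, 3, 5), (2, 2019, 3, 5);
""")

# Nested form: daily counts, then per-vendor average, then overall average.
nested = conn.execute("""
SELECT AVG(avg_daily_sales)
FROM (SELECT vendorid, AVG(cnt) AS avg_daily_sales
      FROM (SELECT vendorid, COUNT(*) AS cnt
            FROM vendor_sales
            WHERE year = 2019
            GROUP BY vendorid, month, day)
      GROUP BY vendorid)
""").fetchone()[0]

# One-shot form: total sales divided by the number of distinct sale days.
one_shot = conn.execute("""
SELECT AVG(avgs)
FROM (SELECT vendorid,
             COUNT(*) * 1.0 / COUNT(DISTINCT month * 100 + day) AS avgs
      FROM vendor_sales
      WHERE year = 2019
      GROUP BY vendorid)
""").fetchone()[0]

print(nested, one_shot)
```

Note that both forms average only over days on which a vendor made at least one sale. If days with zero sales should count, the divisor would need to be the number of calendar days instead.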

sql average of a sum divided by a count

I am trying to do an SQL query to get the average spend for a specific customer.
I have written the following SQL (this is slightly cut down for this example):
SELECT SUM(price) as sumPrice, COUNT(transactionId) as transactionCount, customerName
FROM customers, transactions
WHERE customers.customerId = transactions.customerId
AND transactiontypeId = 1
GROUP BY customers.customerId
This gives me the sum of the transaction and the count. With this I can then divide the sum by the count to get the average spend. However I would like to be able to get the Average as a value straight out of the database rather than manipulate the data once I have got it out.
Is there any way to do this? I have played around with writing a select within a select but haven't had much luck as of yet, hence asking on here.
Thanks in advance
MySQL has a mean average function built-in.
SELECT AVG(price) AS averageSpend, customerName
FROM customers, transactions
WHERE customers.customerId = transactions.customerId
AND transactiontypeId = 1
GROUP BY customers.customerId
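To check that AVG(price) matches the manual SUM / COUNT calculation, here is a small sketch on made-up rows. The sample data is hypothetical, SQLite stands in for MySQL, and an explicit JOIN is used in place of the comma join. One caveat: the two only agree when price is never NULL, since AVG skips NULL prices while COUNT(transactionId) still counts those rows.

```python
import sqlite3

# Hypothetical customers and transactions for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customerId INTEGER PRIMARY KEY, customerName TEXT);
CREATE TABLE transactions (transactionId INTEGER PRIMARY KEY,
                           customerId INTEGER,
                           transactiontypeId INTEGER,
                           price REAL);
INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO transactions VALUES
  (1, 1, 1, 10), (2, 1, 1, 30),
  (3, 2, 1, 5),  (4, 2, 2, 100);  -- type 2 is filtered out below
""")

rows = conn.execute("""
SELECT customerName,
       SUM(price) / COUNT(transactionId) AS manualAverage,
       AVG(price) AS averageSpend
FROM customers
JOIN transactions ON customers.customerId = transactions.customerId
WHERE transactiontypeId = 1
GROUP BY customers.customerId
ORDER BY customers.customerId
""").fetchall()

# manualAverage and averageSpend should match for every customer.
for row in rows:
    print(row)
```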