How to calculate churn and retention rate with BigQuery? - sql

I have the following tables:
user
| Column name : user_id (PK), register_date, age, gender
sessions
| Column name : sessions_id (PK), user_id (FK), traffic_medium, traffic_source, visits_timestamps
events
| Column name : event_id (PK), sessions_id (FK), event, timestamp
transactions
| Column name : transactions_id (PK), sessions_id (FK), payment_method_id (FK), total_amount, transactions_timestamps, status
payment_methods
| Column name : payment_method_id (PK), payment_method
transaction_items
| Column name : transaction_item_id (PK), transactions_id (FK), product_id (FK), product_qty, product_amount, product_price
products
| Column name : product_id (PK), product_name, product_category
I want to calculate the churn rate/retention rate per quarter or per month for the year 2020.
...
WITH customer_data AS (
    SELECT
        s.user_id,
        u.register_date,
        COUNT(DISTINCT t.transactions_id) AS transactions
    FROM
        `da-labs-b4-ecommerce.b4_ecommerce_dataset.users` u
        JOIN `da-labs-b4-ecommerce.b4_ecommerce_dataset.sessions` s ON u.user_id = s.user_id
        JOIN `da-labs-b4-ecommerce.b4_ecommerce_dataset.transactions` t ON s.sessions_id = t.sessions_id
    WHERE
        transactions_timestamps BETWEEN '2020-01-01' AND '2020-12-31'
    GROUP BY
        1, 2
),
churned_customers AS (
    SELECT
        user_id,
        register_date,
        transactions
    FROM customer_data
    WHERE
        transactions = 0
)
SELECT
    100 * COUNT(churned_customers.user_id) / COUNT(customer_data.user_id) AS churn_rate
FROM customer_data
JOIN churned_customers ON customer_data.user_id = churned_customers.user_id
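One thing to note about the attempt above: because it uses inner joins, a user without any 2020 transaction never reaches customer_data, so transactions = 0 cannot occur and churned_customers stays empty. A possible alternative is to compare consecutive months. The sketch below is only an illustration under assumptions: it defines "retained" as "bought in month m and again in month m + 1" (one of several common definitions), it uses made-up sample rows, and it runs on SQLite through Python so that it is self-contained. In BigQuery the strftime call would be replaced by something like EXTRACT(MONTH FROM ...) or DATE_TRUNC.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sessions (sessions_id INTEGER PRIMARY KEY, user_id INTEGER);
CREATE TABLE transactions (
    transactions_id INTEGER PRIMARY KEY,
    sessions_id INTEGER,
    transactions_timestamps TEXT
);
-- Made-up sample data: users 1 and 2 buy in January,
-- users 1 and 3 buy in February (so only user 1 is retained).
INSERT INTO sessions VALUES (1, 1), (2, 2), (3, 1), (4, 3);
INSERT INTO transactions VALUES
    (10, 1, '2020-01-05 10:00:00'),
    (11, 2, '2020-01-20 12:00:00'),
    (12, 3, '2020-02-03 09:00:00'),
    (13, 4, '2020-02-14 18:00:00');
""")

RETENTION_SQL = """
WITH monthly_buyers AS (
    -- one row per user per month with at least one transaction in 2020
    SELECT DISTINCT
        s.user_id,
        CAST(strftime('%m', t.transactions_timestamps) AS INTEGER) AS month_no
    FROM sessions s
    JOIN transactions t ON t.sessions_id = s.sessions_id
    WHERE t.transactions_timestamps >= '2020-01-01'
      AND t.transactions_timestamps <  '2021-01-01'
)
SELECT
    prev.month_no,
    COUNT(*)            AS buyers,
    COUNT(cur.user_id)  AS retained_next_month,
    ROUND(100.0 * COUNT(cur.user_id) / COUNT(*), 1)         AS retention_pct,
    ROUND(100.0 - 100.0 * COUNT(cur.user_id) / COUNT(*), 1) AS churn_pct
FROM monthly_buyers prev
LEFT JOIN monthly_buyers cur
       ON cur.user_id = prev.user_id
      AND cur.month_no = prev.month_no + 1   -- same user, following month
GROUP BY prev.month_no
ORDER BY prev.month_no
"""

rows = conn.execute(RETENTION_SQL).fetchall()
for row in rows:
    print(row)
```

The last month in the range always shows 0% retention here, simply because there is no following month in the data, so it is probably best ignored. For quarters, the same idea should work with a quarter number in place of month_no.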

Related

Union all customers from two tables and assign date values from second table in case customer does not exist in first table

DB-Fiddle
CREATE TABLE sales
(
id SERIAL PRIMARY KEY,
last_order DATE,
customer VARCHAR(255)
);
INSERT INTO sales (last_order, customer)
VALUES
('2020-09-10', 'user_01'),
('2020-10-15', 'user_02'),
('2020-11-26', 'user_03');
CREATE TABLE customers
(
id SERIAL PRIMARY KEY,
first_order DATE,
customer VARCHAR(255)
);
INSERT INTO customers (first_order, customer)
VALUES
('2020-08-10', 'user_01'),
('2020-09-15', 'user_02'),
('2020-10-17', 'user_03'),
('2020-05-03', 'user_04'),
('2020-04-12', 'user_05');
Expected result:
customer | used_date |
-----------|----------------|------
user_01 | 2020-09-10 |
user_02 | 2020-10-15 |
user_03 | 2020-11-26 |
user_04 | 2020-05-03 |
user_05 | 2020-04-12 |
I have two tables and I want to query all customers in both tables.
Additionally, in case the customer does not exist in the table sales I want to apply the first_order from table customers as used_date.
So far I came up with this query:
SELECT
t1.customer,
(CASE WHEN t1.last_order IS NULL THEN t1.first_order ELSE t1.last_order END) AS used_date
FROM
(SELECT
s.customer,
s.last_order AS last_order,
NULL::date AS first_order
FROM
sales s
GROUP BY
1, 2, 3
UNION ALL
SELECT
c.customer,
NULL::date AS last_order,
c.first_order AS first_order
FROM
customers c
GROUP BY
1, 2, 3) t1
GROUP BY
1, 2
ORDER BY
1;
How do I need to modify the CASE WHEN statement to get the expected result?
This looks like a left join:
select c.*,
coalesce(s.last_order, c.first_order) as used_date
from customers c left join
sales s
on c.customer = s.customer;
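To check the left join against the expected result, here is a small self-contained sketch. It loads the sample rows from the question into an in-memory SQLite database (an assumption made only to keep it runnable; dates are stored as text, and SERIAL is replaced by INTEGER PRIMARY KEY). It also assumes every customer in sales exists in customers, which holds for the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (id INTEGER PRIMARY KEY, last_order TEXT, customer TEXT);
CREATE TABLE customers (id INTEGER PRIMARY KEY, first_order TEXT, customer TEXT);
INSERT INTO sales (last_order, customer) VALUES
    ('2020-09-10', 'user_01'), ('2020-10-15', 'user_02'), ('2020-11-26', 'user_03');
INSERT INTO customers (first_order, customer) VALUES
    ('2020-08-10', 'user_01'), ('2020-09-15', 'user_02'), ('2020-10-17', 'user_03'),
    ('2020-05-03', 'user_04'), ('2020-04-12', 'user_05');
""")

# COALESCE picks last_order when the customer has a sales row,
# and falls back to first_order when the left join found no match.
result = conn.execute("""
SELECT c.customer, COALESCE(s.last_order, c.first_order) AS used_date
FROM customers c
LEFT JOIN sales s ON s.customer = c.customer
ORDER BY c.customer
""").fetchall()

for customer, used_date in result:
    print(customer, used_date)
```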

Generating a unique ID from a SQL subquery

I'm trying to insert the result of a subquery into a table, but the table I want to insert into has a unique primary ID. I want to take the max of the EID number and add 1 to any entry I'm trying to insert to the table.
Table I want to Insert Into:
----------------------------------------------------
EID (pk) | First Name | Employment Date
----------------------------------------------------
1 | John | 2016-01-01
2 | Joe | 2013-01-01
3 | Jill | 2012-01-01
4 | Jen | 2017-01-01
My subquery statement:
(SELECT FIRSTNAME, ORDERDATE as EMPLOYMENTDATE
FROM CUSTOMER, ORDER
WHERE CUSTOMER.id = ORDER.id
AND ORDERDATE >= DATE '2017-01-01')
The problem is the insert, as I don't have a unique ID to use. This is on SQL Server.
I am trying to insert something like this:
INSERT INTO EMPLOYEE(EID, FIRSTNAME, EMPLOYMENTDATE)
SELECT ??????WHAT GOES HERE??????, FIRSTNAME, ORDERDATE as EMPLOYMENTDATE
FROM CUSTOMER, ORDER
WHERE CUSTOMER.id = ORDER.id
AND ORDERDATE >= DATE '2017-01-01'
Assuming EID is an identity column, leave it out of the column list and let SQL Server generate it:
INSERT INTO EMPLOYEE(FIRSTNAME, EMPLOYMENTDATE)
SELECT FIRSTNAME, ORDERDATE as EMPLOYMENTDATE
FROM CUSTOMER, ORDER
WHERE CUSTOMER.id = ORDER.id
AND ORDERDATE >= DATE '2017-01-01'
You're right, since you're using a subquery it would be like this:
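INSERT INTO EMPLOYEE (EID, FIRSTNAME, EMPLOYMENTDATE)
SELECT
    -- current maximum plus a running number, so each new row gets its own EID
    (SELECT ISNULL(MAX(EID), 0) FROM EMPLOYEE)
        + ROW_NUMBER() OVER (ORDER BY CustOrd.EMPLOYMENTDATE) AS EID,
    CustOrd.FIRSTNAME,
    CustOrd.EMPLOYMENTDATE
FROM (
    -- ORDER is a reserved word in SQL Server, hence the brackets
    SELECT FIRSTNAME, ORDERDATE AS EMPLOYMENTDATE
    FROM CUSTOMER
    JOIN [ORDER] ON CUSTOMER.id = [ORDER].id
    WHERE ORDERDATE >= '2017-01-01'
) CustOrd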
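When several rows are inserted at once, each one needs a different ID, so one option is to add a running row number to the current maximum. Below is a minimal sketch of that idea, run on SQLite from Python purely so it is self-contained (window functions need SQLite 3.25 or newer). The new_hires table and its rows are hypothetical stand-ins for the CUSTOMER/ORDER subquery; on SQL Server, ISNULL would take the place of COALESCE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (eid INTEGER PRIMARY KEY, firstname TEXT, employmentdate TEXT);
INSERT INTO employee VALUES
    (1, 'John', '2016-01-01'), (2, 'Joe', '2013-01-01'),
    (3, 'Jill', '2012-01-01'), (4, 'Jen', '2017-01-01');
-- Hypothetical rows standing in for the CUSTOMER/ORDER subquery result.
CREATE TABLE new_hires (firstname TEXT, employmentdate TEXT);
INSERT INTO new_hires VALUES ('Ann', '2017-03-01'), ('Bob', '2017-02-01');
""")

# MAX(eid) is the highest existing ID; ROW_NUMBER() counts 1, 2, 3, ...
# over the rows being inserted, so the sum is unique per new row.
conn.execute("""
INSERT INTO employee (eid, firstname, employmentdate)
SELECT
    (SELECT COALESCE(MAX(eid), 0) FROM employee)
        + ROW_NUMBER() OVER (ORDER BY employmentdate, firstname),
    firstname,
    employmentdate
FROM new_hires
""")

employees = conn.execute(
    "SELECT eid, firstname FROM employee ORDER BY eid"
).fetchall()
for eid, firstname in employees:
    print(eid, firstname)
```

Keep in mind that MAX + 1 schemes can hand out duplicate IDs when two sessions insert at the same time, so an identity column or a sequence is usually the more robust choice.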

Get the results of a subquery in SQL

How do you create a join to get the latest invoice for all customers?
Tables:
- Invoices
- Customers
Customers table has: id, last_invoice_sent_at, last_invoice_guid
Invoices table has: id, customer_id, sent_at, guid
I'd like to fetch the latest invoice for every customer and, with that data, update last_invoice_sent_at and last_invoice_guid in the Customers table.
You want to use DISTINCT ON. For a query sorted by customer_id and then by sent_at in descending order, it returns the first row for each distinct value of the expression listed in DISTINCT ON. Those are the rows marked with * below:
customer_id | sent_at |
1 | 2014-07-12 | *
1 | 2014-07-10 |
1 | 2014-07-09 |
2 | 2014-07-11 | *
2 | 2014-07-10 |
So your update query could look like:
update customers
set last_invoice_sent_at = sub.sent_at
from (
    select distinct on (customer_id)
        customer_id,
        sent_at
    from invoices
    order by customer_id, sent_at desc
) sub
where sub.customer_id = customers.id
@Konrad provided a flawless SQL statement. But since we are only interested in a single column, GROUP BY will be more efficient than DISTINCT ON (which is great for retrieving multiple columns from the same row):
UPDATE customers c
SET last_invoice_sent_at = sub.last_sent
FROM (
    SELECT customer_id, max(sent_at) AS last_sent
    FROM invoices
    GROUP BY 1
) sub
WHERE sub.customer_id = c.id;
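The question also asks for last_invoice_guid, which has to come from the same row as the latest sent_at. As a rough, portable illustration (not the DISTINCT ON technique itself, which is PostgreSQL-specific), the sketch below uses correlated subqueries on SQLite via Python. The guid values and the third customer without invoices are made up for the example; the invoice dates are the ones from the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    last_invoice_sent_at TEXT,
    last_invoice_guid TEXT
);
CREATE TABLE invoices (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    sent_at TEXT,
    guid TEXT
);
INSERT INTO customers (id) VALUES (1), (2), (3);
-- guid values are invented for illustration only
INSERT INTO invoices VALUES
    (1, 1, '2014-07-12', 'g1'), (2, 1, '2014-07-10', 'g2'), (3, 1, '2014-07-09', 'g3'),
    (4, 2, '2014-07-11', 'g4'), (5, 2, '2014-07-10', 'g5');
""")

# Each subquery picks the newest invoice of the customer being updated.
# The EXISTS guard leaves customers without invoices untouched.
conn.execute("""
UPDATE customers
SET last_invoice_sent_at = (
        SELECT i.sent_at FROM invoices i
        WHERE i.customer_id = customers.id
        ORDER BY i.sent_at DESC LIMIT 1),
    last_invoice_guid = (
        SELECT i.guid FROM invoices i
        WHERE i.customer_id = customers.id
        ORDER BY i.sent_at DESC LIMIT 1)
WHERE EXISTS (SELECT 1 FROM invoices i WHERE i.customer_id = customers.id)
""")

updated = conn.execute(
    "SELECT id, last_invoice_sent_at, last_invoice_guid FROM customers ORDER BY id"
).fetchall()
for row in updated:
    print(row)
```

In PostgreSQL, the DISTINCT ON subquery can return sent_at and guid in one pass, which is likely the better fit there.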

How can I translate this unscalable query into a window function for dynamic columns?

I have a query that I use to generate the total order amount for customers and group them into columns by month alongside another column that represents total order amount.
Here's the schema:
temp=# \d+ customers;
Table "pg_temp_2.customers"
Column | Type | Modifiers | Storage | Description
------------+-----------------------------+-----------+----------+-------------
id | integer | not null | plain |
created_at | timestamp without time zone | | plain |
name | text | | extended |
Indexes:
"customers_pkey" PRIMARY KEY, btree (id)
Referenced by:
TABLE "orders" CONSTRAINT "orders_customer_id_fkey" FOREIGN KEY (customer_id) REFERENCES customers(id)
Has OIDs: no
Table "pg_temp_2.orders"
Column | Type | Modifiers | Storage | Description
-------------+-----------------------------+---------------+---------+-------------
id | integer | not null | plain |
created_at | timestamp without time zone | default now() | plain |
customer_id | integer | | plain |
amount | integer | | plain |
Indexes:
"orders_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
"orders_customer_id_fkey" FOREIGN KEY (customer_id) REFERENCES customers(id)
Has OIDs: no
For convenience, I've added the create table statements:
create temporary table customers ( id integer primary key, created_at timestamp without time zone, name text);
create temporary table orders ( id integer primary key, created_at timestamp without time zone default now(), customer_id integer references customers(id), amount integer);
Here's the query I'm using:
SELECT
c.name,
sum(o.amount),
CAST(SUM(
CASE
WHEN date_trunc('month', o.created_at) BETWEEN '2012-10-01' AND ('2012-11-01'::date - '1 day'::interval)
THEN o.amount
ELSE 0
END
) / 100.0 AS MONEY) october2012,
CAST(SUM(
CASE
WHEN date_trunc('month', o.created_at) BETWEEN '2012-11-01' AND ('2012-12-01'::date - '1 day'::interval)
THEN o.amount
ELSE 0
END
) / 100.0 AS MONEY) as november2012
FROM orders o
INNER JOIN customers c ON o.customer_id = c.id
WHERE o.created_at >= '01 October 2012'
AND o.created_At < '01 December 2012'
GROUP BY
c.name
ORDER BY
october2012 desc;
How can I get rid of that ugly case statement? There MUST be a more elegant way that rolls up these queries over a certain time slice. I tried to use window functions, but I've failed miserably. Any assistance would be appreciated!
I'm using postgresql 9.1
This is not simpler but is more scalable:
select *
from crosstab($$
with orders as (
select
customer_id,
date_trunc('month', created_at) created_at,
cast(amount / 100.0 as money) amount
from orders
where
created_at >= '2012-10-01'
and created_at < '2012-12-01'
), months as (
select
c.name,
to_char(o.created_at, 'YYYY-MM') created_at,
sum(o.amount) amount
from
orders o
inner join
customers c on c.id = o.customer_id
group by 1, 2
)
select name, created_at, amount
from months
union all
select name, 'Total', sum(amount)
from months
group by 1, 2
order by 1, 2
$$, $$
select distinct to_char(created_at, 'YYYY-MM')
from orders
where
created_at >= '2012-10-01'
and created_at < '2012-12-01'
union select 'Total'
order by 1
$$
) as (name text, "2012-10" money, "2012-11" money, "Total" money)
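If the list of months has to be dynamic, another option is to generate the column list in application code rather than in SQL. The following is only a sketch of that idea: pivot_query is a hypothetical helper, the sample rows are invented, and it runs on SQLite through Python (so strftime replaces date_trunc/to_char, and amounts come back as plain numbers rather than money).

```python
import re
import sqlite3

def pivot_query(months):
    """Build a query with one SUM(CASE ...) column per 'YYYY-MM' label."""
    for m in months:
        # labels are interpolated into the SQL text, so validate them strictly
        if not re.fullmatch(r"\d{4}-\d{2}", m):
            raise ValueError(f"bad month label: {m!r}")
    month_cols = ",\n       ".join(
        f"SUM(CASE WHEN strftime('%Y-%m', o.created_at) = '{m}' "
        f'THEN o.amount ELSE 0 END) / 100.0 AS "{m}"'
        for m in months
    )
    return f"""
SELECT c.name,
       {month_cols},
       SUM(o.amount) / 100.0 AS total
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE strftime('%Y-%m', o.created_at) BETWEEN ? AND ?
GROUP BY c.name
ORDER BY c.name
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, created_at TEXT, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT,
                     customer_id INTEGER REFERENCES customers(id), amount INTEGER);
INSERT INTO customers (id, name) VALUES (1, 'Alice'), (2, 'Bob');
-- amounts are in cents; the December order falls outside the requested range
INSERT INTO orders VALUES
    (1, '2012-10-05 00:00:00', 1, 1000),
    (2, '2012-11-10 00:00:00', 1, 2500),
    (3, '2012-10-20 00:00:00', 2, 500),
    (4, '2012-12-01 00:00:00', 2, 900);
""")

months = ["2012-10", "2012-11"]
rows = conn.execute(pivot_query(months), (months[0], months[-1])).fetchall()
for row in rows:
    print(row)
```

Adding a month then means adding one entry to the months list; the CASE expressions are still there, but nobody has to write them by hand.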

MySQL: multiple grouping

So I have an example table called items with the following columns:
item_id (int)
person_id (int)
item_name (varchar)
item_type (varchar) - examples: "news", "event", "document"
item_date (datetime)
...and a table person with the following columns: "person_id", "person_name".
I was hoping to display a list of the top 2 submitters (+ the COUNT() of items submitted) in a given time period for each item_type. Here's basically what I was hoping the MySQL output would look like:
person_name | item_type | item_count
Steve Jobs | document | 11
Bill Gates | document | 6
John Doe | event | 4
John Smith | event | 2
Bill Jones | news | 24
Bill Nye | news | 21
How is this possible without making a separate query for each item_type? Thanks in advance!
SELECT item_type, person_name, item_count
FROM (
    SELECT item_type, person_name, item_count,
           -- restart the counter at 1 whenever item_type changes
           @r := IF(@_item_type = item_type, @r + 1, 1) AS rc,
           @_item_type := item_type AS prev_type
    FROM (
        SELECT @r := 0,
               @_item_type := NULL
    ) vars,
    (
        SELECT i.item_type, p.person_name, COUNT(*) AS item_count
        FROM items i
        JOIN person p ON p.person_id = i.person_id
        GROUP BY
            i.item_type, p.person_name
        ORDER BY
            i.item_type, item_count DESC
    ) vo
) voi
WHERE rc < 3
Something like this should work:
SELECT
    p.person_name, i.item_type, COUNT(1) AS item_count
FROM
    person p
    LEFT JOIN items i
        ON p.person_id = i.person_id
GROUP BY
    p.person_id,
    i.item_type
HAVING
    COUNT(1) >= (
        -- second-highest count per person for this item_type
        SELECT
            COUNT(1)
        FROM
            items i2
        WHERE
            i2.item_type = i.item_type
        GROUP BY
            i2.person_id
        ORDER BY
            COUNT(1) DESC
        LIMIT 1,1
    )
I think this should do it:
SELECT person_name,item_type,count(item_id) AS item_count
FROM person
LEFT JOIN items USING (person_id)
GROUP BY person_id
The "item_type" column is going to be dodgy though, each row represents multiple items, and you're only showing the item_type from one of them. You can list all of them with "GROUP_CONCAT", that's a lot of fun.
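On engines with window functions (MySQL 8.0 and later, for instance), a "top 2 per item_type" list can probably be written more directly with ROW_NUMBER(). Here is a minimal sketch of that approach, run on SQLite 3.25+ via Python so it is self-contained; the sample people and items are made up, and ties are broken by person_name as an arbitrary assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, person_name TEXT);
CREATE TABLE items (item_id INTEGER PRIMARY KEY, person_id INTEGER, item_type TEXT);
INSERT INTO person VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cy');
INSERT INTO items (person_id, item_type) VALUES
    (1, 'news'), (1, 'news'), (1, 'news'),
    (2, 'news'), (2, 'news'),
    (3, 'news'),
    (2, 'event'), (2, 'event'),
    (3, 'event');
""")

TOP2_SQL = """
WITH counts AS (
    -- items submitted per person and item_type
    SELECT p.person_name, i.item_type, COUNT(*) AS item_count
    FROM items i
    JOIN person p ON p.person_id = i.person_id
    GROUP BY p.person_id, p.person_name, i.item_type
),
ranked AS (
    -- rank submitters inside each item_type, highest count first
    SELECT person_name, item_type, item_count,
           ROW_NUMBER() OVER (
               PARTITION BY item_type
               ORDER BY item_count DESC, person_name
           ) AS rn
    FROM counts
)
SELECT person_name, item_type, item_count
FROM ranked
WHERE rn <= 2
ORDER BY item_type, item_count DESC, person_name
"""

top2 = conn.execute(TOP2_SQL).fetchall()
for row in top2:
    print(row)
```

A filter on item_date for the time period would go into the counts step as a WHERE clause.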