PostgreSQL: cumulative sum of active customers by month (removing churn)

I want to create a query to get the cumulative sum by month of our active customers. The tricky thing here is that (unfortunately) some customers churn and so I need to remove them from the cumulative sum on the month they leave us.
Here is a sample of my customers table :
customer_id | begin_date | end_date
-----------------------------------------
1 | 15/09/2017 |
2 | 15/09/2017 |
3 | 19/09/2017 |
4 | 23/09/2017 |
5 | 27/09/2017 |
6 | 28/09/2017 | 15/10/2017
7 | 29/09/2017 | 16/10/2017
8 | 04/10/2017 |
9 | 04/10/2017 |
10 | 05/10/2017 |
11 | 07/10/2017 |
12 | 09/10/2017 |
13 | 11/10/2017 |
14 | 12/10/2017 |
15 | 14/10/2017 |
Here is what I am looking to achieve :
month | active customers
-----------------------------------------
2017-09 | 7
2017-10 | 6
I've managed to achieve it with the following query. However, I'd like to know if there is a better way.
select
"begin_date" as "date",
sum((new_customers.new_customers-COALESCE(churn_customers.churn_customers,0))) OVER (ORDER BY new_customers."begin_date") as active_customers
FROM (
select
date_trunc('month',begin_date)::date as "begin_date",
count(id) as new_customers
from customers
group by 1
) as new_customers
LEFT JOIN(
select
date_trunc('month',end_date)::date as "end_date",
count(id) as churn_customers
from customers
where
end_date is not null
group by 1
) as churn_customers on new_customers."begin_date" = churn_customers."end_date"
order by 1
;

You can use a CTE to count the end_dates per month, and then subtract that count from the count of begin_dates for the same month using a LEFT JOIN.
Query 1:
WITH edt
AS (
SELECT to_char(end_date, 'yyyy-mm') AS mon
,count(*) AS ct
FROM customers
WHERE end_date IS NOT NULL
GROUP BY to_char(end_date, 'yyyy-mm')
)
SELECT to_char(c.begin_date, 'yyyy-mm') as month
,COUNT(*) - MAX(COALESCE(ct, 0)) AS active_customers
FROM customers c
LEFT JOIN edt ON to_char(c.begin_date, 'yyyy-mm') = edt.mon
GROUP BY to_char(begin_date, 'yyyy-mm')
ORDER BY month;
Results:
| month | active_customers |
|---------|------------------|
| 2017-09 | 7 |
| 2017-10 | 6 |
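One limitation of joining churn onto signup months is that a month with churn but no new customers would not appear at all. A minimal sketch of an alternative follows, assuming SQLite (driven from Python's sqlite3 module) instead of PostgreSQL, and assuming the dates are stored as ISO strings. It treats each signup as a +1 event and each churn as a -1 event, sums them per month, and then accumulates the monthly net change into a running total. The table layout is taken from the question; the names NET_CHANGE_SQL, net_by_month and active_by_month are illustrative only.

```python
import sqlite3
from itertools import accumulate

# Sample rows from the question, with dates rewritten as ISO strings (assumption).
CUSTOMERS = [
    (1, "2017-09-15", None), (2, "2017-09-15", None), (3, "2017-09-19", None),
    (4, "2017-09-23", None), (5, "2017-09-27", None),
    (6, "2017-09-28", "2017-10-15"), (7, "2017-09-29", "2017-10-16"),
    (8, "2017-10-04", None), (9, "2017-10-04", None), (10, "2017-10-05", None),
    (11, "2017-10-07", None), (12, "2017-10-09", None), (13, "2017-10-11", None),
    (14, "2017-10-12", None), (15, "2017-10-14", None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "customer_id INTEGER PRIMARY KEY, begin_date TEXT NOT NULL, end_date TEXT)"
)
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", CUSTOMERS)

# +1 for every signup in its begin month, -1 for every churn in its end month.
NET_CHANGE_SQL = """
SELECT month, SUM(delta) AS net_change
FROM (
    SELECT strftime('%Y-%m', begin_date) AS month, 1 AS delta
    FROM customers
    UNION ALL
    SELECT strftime('%Y-%m', end_date) AS month, -1 AS delta
    FROM customers
    WHERE end_date IS NOT NULL
) AS events
GROUP BY month
ORDER BY month
"""
net_by_month = conn.execute(NET_CHANGE_SQL).fetchall()

# Accumulate the monthly net change into a running total of active customers.
running = accumulate(net for _, net in net_by_month)
active_by_month = [(month, total) for (month, _), total in zip(net_by_month, running)]

print(net_by_month)
print(active_by_month)
```

On the sample data the monthly net change is 7 and 6, which matches the table in the question, while the running total is 7 and 13. In PostgreSQL the accumulation step could be done with SUM(...) OVER (ORDER BY month) instead of in Python.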

Related

Find the first order of a supplier in a day using SQL

I am trying to write a query to return supplier ID (sup_id), order date and the order ID of the first order (based on earliest time).
+--------+--------+------------+--------+-----------------+
|orderid | sup_id | items | sales | order_ts |
+--------+--------+------------+--------+-----------------+
|1111132 | 3 | 1 | 27,0 | 24/04/17 13:00 |
|1111137 | 3 | 2 | 69,0 | 02/02/17 16:30 |
|1111147 | 1 | 1 | 87,0 | 25/04/17 08:25 |
|1111153 | 1 | 3 | 82,0 | 05/11/17 10:30 |
|1111155 | 2 | 1 | 29,0 | 03/07/17 02:30 |
|1111160 | 2 | 2 | 44,0 | 30/01/17 20:45 |
|....... | ... | ... | ... | ... ... |
+--------+--------+------------+--------+-----------------+
Output I am looking for:
+--------+--------+------------+
| sup_id | date | order_id |
+--------+--------+------------+
|....... | ... | ... |
+--------+--------+------------+
I tried using a subquery in the join clause as below but didn't know how to join it without having selected order_id.
SELECT sup_id, date(order_ts), order_id
FROM sales s
JOIN
(
SELECT sup_id, date(order_ts) as date, min(time(order_date))
FROM sales
GROUP BY merchant_id, date
) m
on ...
Kindly assist.
You can use not exists:
select *
from sales
where not exists (
-- find sales for same supplier, earlier date, same day
select *
from sales as older
where older.sup_id = sales.sup_id
and older.order_ts < sales.order_ts
and older.order_ts >= cast(sales.order_ts as date)
)
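A minimal sketch of the NOT EXISTS idea, assuming SQLite (via Python's sqlite3 module) and timestamps stored as ISO strings rather than the dd/mm/yy format shown in the question. The first six rows come from the question; the seventh is a hypothetical row added only to show a supplier with two orders on the same day. The name FIRST_ORDER_SQL is illustrative.

```python
import sqlite3

ORDERS = [
    (1111132, 3, "2017-04-24 13:00"),
    (1111137, 3, "2017-02-02 16:30"),
    (1111147, 1, "2017-04-25 08:25"),
    (1111153, 1, "2017-11-05 10:30"),
    (1111155, 2, "2017-07-03 02:30"),
    (1111160, 2, "2017-01-30 20:45"),
    # Hypothetical extra row (not in the question): same supplier and day as
    # order 1111132, earlier in time but with a larger orderid.
    (1111199, 3, "2017-04-24 09:15"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (orderid INTEGER PRIMARY KEY, sup_id INTEGER, order_ts TEXT)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", ORDERS)

# Keep a row only if no earlier order exists for the same supplier on the same day.
FIRST_ORDER_SQL = """
SELECT s.sup_id, date(s.order_ts) AS order_date, s.orderid
FROM sales AS s
WHERE NOT EXISTS (
    SELECT 1
    FROM sales AS older
    WHERE older.sup_id = s.sup_id
      AND date(older.order_ts) = date(s.order_ts)
      AND older.order_ts < s.order_ts
)
ORDER BY s.sup_id, order_date
"""
first_orders = conn.execute(FIRST_ORDER_SQL).fetchall()

for row in first_orders:
    print(row)
```

For supplier 3 on 2017-04-24 this picks order 1111199, the earlier one in time, even though its orderid is larger. That is the case where taking the smallest orderid of the day would give a different answer.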
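The query below might not be the fastest in the world, but it should give you all the information you need. The subquery is correlated on both the supplier and the calendar day, so it returns each supplier's first order of each day.
select orderid, sup_id, items, sales, order_ts
from sales s
where order_ts = (
select min(m.order_ts)
from sales m
where m.sup_id = s.sup_id
and cast(m.order_ts as date) = cast(s.order_ts as date)
)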
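Alternatively, aggregate per supplier for a single day:
select sup_id, min(order_ts) as first_order_ts, min(orderid) as orderid
from sales
where cast(order_ts as date) = '2022-03-15'
group by sup_id
This assumes orderid is an identity / auto-increment column, so that the smallest orderid of the day belongs to the earliest order.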

Count item after joining two table

I have two tables,
First is Product table
+----+------+------+-------+------------+
| id | pnum | year | month | date       |
+----+------+------+-------+------------+
| 12 | S5   | 2021 | 2     | 2021-02-21 |
| 12 | S5   | 2021 | 2     | 2021-02-22 |
| 12 | S5   | 2021 | 2     | 2021-02-23 |
| 33 | A55  | 2021 | 3     | 2021-03-01 |
| 44 | B1   | 2021 | 6     | 2021-06-01 |
+----+------+------+-------+------------+
Second is Deal table
+----+------+------+-------+------------+
| id | pnum | year | month | date       |
+----+------+------+-------+------------+
| 12 | S5   | 2021 | 2     | 2021-02-28 |
| 12 | S5   | 2021 | 2     | 2021-02-01 |
| 33 | A55  | 2021 | 3     | 2021-03-01 |
+----+------+------+-------+------------+
I need a result that tells me how many products were launched in each year-month, together with the number of deals in the first 15 days of the month and the number of deals after the first 15 days.
+------+-------+------------+-----------------+--------------------+
| num  | count | year-month | deal_in_first15 | deal_after_first15 |
+------+-------+------------+-----------------+--------------------+
| S5   | 3     | 2021-02    | 1               | 1                  |
| A55  | 1     | 2021-03    | 1               | 0                  |
+------+-------+------------+-----------------+--------------------+
I was trying to do it like below
select * from Product p inner join Deal d on
p.num=d.num AND p.id=d.id AND p.month=d.month
but it does not give me the intended result.
I have some Java and Python background but am not an expert in SQL, so I have not managed to get COUNT with a CASE expression to work.
You can use conditional aggregation in a subquery and then JOIN to it:
select p.pnum ,
COUNT(*) count,
FORMAT(p.[date],'yyyy-MM') 'year-month',
deal_in_first15 ,
deal_after_first15
from Product p
inner join (
SELECT id ,
pnum ,
month,
year ,
COUNT(CASE WHEN DATEPART(day,[date]) <= 15 THEN 1 END) deal_in_first15 ,
COUNT(CASE WHEN DATEPART(day,[date]) > 15 THEN 1 END) deal_after_first15
FROM Deal
GROUP BY id ,pnum ,month,year
) d on
p.pnum=d.pnum AND p.id=d.id AND p.month=d.month
group by FORMAT(p.[date],'yyyy-MM') ,
p.pnum,
deal_in_first15 ,
deal_after_first15
There is another way that might suit you better: aggregate each table in its own subquery, then JOIN the two results.
select p.pnum ,
p.cnt 'count',
CONCAT(p.year,'-',FORMAT(p.month,'0#')) 'year-month',
deal_in_first15 ,
deal_after_first15
from (
SELECT id ,
pnum ,
month,
year,
count(*) cnt
FROM Product
GROUP BY id ,
pnum ,
month,
year
) p
inner join (
SELECT id ,
pnum ,
month,
year ,
COUNT(CASE WHEN DATEPART(day,[date]) <= 15 THEN 1 END) deal_in_first15 ,
COUNT(CASE WHEN DATEPART(day,[date]) > 15 THEN 1 END) deal_after_first15
FROM Deal
GROUP BY id ,pnum ,month,year
) d on
p.pnum=d.pnum
AND p.id=d.id
AND p.month=d.month
AND p.year=d.year
Note
I would include the year column in the JOIN condition. Otherwise, the result will be wrong when the same month number occurs in different years.
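A minimal sketch of the two-subquery approach on the question's sample rows, assuming SQLite (via Python's sqlite3 module) rather than SQL Server, so strftime and printf stand in for DATEPART and FORMAT. Day 15 is counted as part of the first 15 days, which is an interpretation of the question. The name DEALS_SQL is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER, pnum TEXT, year INTEGER, month INTEGER, "date" TEXT);
CREATE TABLE deal    (id INTEGER, pnum TEXT, year INTEGER, month INTEGER, "date" TEXT);
""")
conn.executemany("INSERT INTO product VALUES (?, ?, ?, ?, ?)", [
    (12, "S5", 2021, 2, "2021-02-21"),
    (12, "S5", 2021, 2, "2021-02-22"),
    (12, "S5", 2021, 2, "2021-02-23"),
    (33, "A55", 2021, 3, "2021-03-01"),
    (44, "B1", 2021, 6, "2021-06-01"),
])
conn.executemany("INSERT INTO deal VALUES (?, ?, ?, ?, ?)", [
    (12, "S5", 2021, 2, "2021-02-28"),
    (12, "S5", 2021, 2, "2021-02-01"),
    (33, "A55", 2021, 3, "2021-03-01"),
])

# Each side is reduced to one row per (id, pnum, year, month) before joining,
# so the join cannot multiply rows.
DEALS_SQL = """
SELECT p.pnum,
       p.cnt,
       printf('%04d-%02d', p.year, p.month) AS year_month,
       d.deal_in_first15,
       d.deal_after_first15
FROM (
    SELECT id, pnum, year, month, COUNT(*) AS cnt
    FROM product
    GROUP BY id, pnum, year, month
) AS p
JOIN (
    SELECT id, pnum, year, month,
           SUM(CASE WHEN CAST(strftime('%d', "date") AS INTEGER) <= 15 THEN 1 ELSE 0 END) AS deal_in_first15,
           SUM(CASE WHEN CAST(strftime('%d', "date") AS INTEGER) >  15 THEN 1 ELSE 0 END) AS deal_after_first15
    FROM deal
    GROUP BY id, pnum, year, month
) AS d
  ON d.id = p.id AND d.pnum = p.pnum AND d.year = p.year AND d.month = p.month
ORDER BY year_month
"""
result = conn.execute(DEALS_SQL).fetchall()

for row in result:
    print(row)
```

Because this is an inner join, product B1 (which has no deals) is left out, as in the expected output of the question. A LEFT JOIN with COALESCE would keep it with zero counts.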
This is it:
select pnum, cnt, sum(mn1), sum(mn2)
from (
select d."pnum",
(select count(*) from Product p where p."pnum" = d."pnum") cnt,
case when EXTRACT(DAY FROM d."date") <= 15 then sum(1) else sum(0) end mn1,
case when EXTRACT(DAY FROM d."date") > 15 then sum(1) else sum(0) end mn2,
d."year"||'-'||d."month"
from deal d
group by 1, d."year"||'-'||d."month", d."date"
) abc
group by 1, 2;
Please check it at: http://sqlfiddle.com/#!17/3bad9/18

Getting duplication WITH partition by

I want to know which customers who ordered in June 2020 also ordered in June 2021. My code returns the correct DISTINCT orders, but discounted sales is incorrect for customers who placed more than one order in either year. For example, a customer who placed one order in 2020 and four orders in 2021 has 2020 discounted sales at 4x the actual amount. The four orders in 2021 have four rows, and the one 2020 order populates against each. I saw this by using ROW_NUMBER () which exposed the underlying problem. I cannot use DISTINCT with discounted sales because customers do place multiple orders for identical dollar amounts. How do I get the exact discounted sales using standard SQL for BQ?
SELECT
DISTINCT ly.cuid AS cuid,
COUNT(DISTINCT ly.order_id) OVER (PARTITION BY ly.cuid) AS ly_orders,
SUM(ly.discounted_sales) OVER (PARTITION BY ly.cuid) AS ly_demand,
COUNT(DISTINCT ty.order_id) OVER (PARTITION BY ty.cuid) AS ty_orders,
SUM(ty.discounted_sales) OVER (PARTITION BY ly.cuid) AS ty_demand
FROM table ly
LEFT JOIN table ty
ON ly.cuid = ty.cuid
WHERE ly.order_date BETWEEN '2020-06-01' AND '2020-06-30'
AND ty.order_date BETWEEN '2021-06-01'AND '2021-06-30'
AND ly.financial_status <> 'credit'
AND ty.financial_status <> 'credit'
AND ly.discounted_sales >0
AND ty.discounted_sales >0
AND ly.channel = 'b2b'
AND ty.channel = 'b2b'
ORDER BY ly.cuid asc
[Results]
cuid ly_orders ly_demand ty_orders ty_demand comments
D 1 22,466.40 4 154,596.24 ly is 4x actual
F 2 2,573.20 1 1,944.40 ty is 2x actual
G 1 32,134.40 4 1,632.00 ly is 4x actual
I 2 757.56 1 730.56 ty is 2x actual
J 2 54,859.00 2 23,822.32 both are 2x actual
THIS WORKED:
WITH prior_period AS (
SELECT
DISTINCT cuid AS ly_cuid,
COUNT(DISTINCT order_id) OVER (PARTITION BY cuid) AS ly_orders,
SUM(discounted_sales) OVER (PARTITION BY cuid) AS ly_demand
FROM TABLE
WHERE EXTRACT (YEAR FROM order_date) = 2020 AND EXTRACT(MONTH FROM order_date) = 6
AND financial_status <> 'credit'
AND discounted_sales >0
AND channel = 'b2b'
GROUP BY cuid, order_id, discounted_sales
ORDER BY cuid asc),
this_period AS (
SELECT
DISTINCT cuid AS ty_cuid,
COUNT(DISTINCT order_id) OVER (PARTITION BY cuid) AS ty_orders,
SUM(discounted_sales) OVER (PARTITION BY cuid) AS ty_demand
FROM TABLE
WHERE EXTRACT (YEAR FROM order_date) = 2021 AND EXTRACT(MONTH FROM order_date) = 6
AND financial_status <> 'credit'
AND discounted_sales >0
AND channel = 'b2b'
GROUP BY cuid, order_id, discounted_sales
ORDER BY cuid asc)
SELECT *
FROM prior_period ly
JOIN this_period ty ON ly.ly_cuid = ty.ty_cuid
ORDER BY ly.ly_cuid
Updated with your schema and approximate data:
Try this...
WITH periods AS (
SELECT cuid AS cuid
, COUNT(*) AS orders
, SUM(discounted_sales) AS demand
, EXTRACT(YEAR FROM order_date) AS yr
FROM demand2
WHERE EXTRACT(YEAR FROM order_date) IN (2020, 2021) AND EXTRACT(MONTH FROM order_date) = 6
AND financial_status <> 'credit'
AND discounted_sales > 0
AND channel = 'b2b'
GROUP BY cuid, EXTRACT(YEAR FROM order_date)
)
SELECT ly.cuid
, ly.orders AS ly_orders
, ly.demand AS ly_demand
, ty.orders AS ty_orders
, ty.demand AS ty_demand
FROM periods AS ly
JOIN periods AS ty
ON ly.cuid = ty.cuid
AND ly.yr = 2020
AND ty.yr = 2021
ORDER BY ly.cuid
;
The result:
+------+-----------+-----------+-----------+-----------+
| cuid | ly_orders | ly_demand | ty_orders | ty_demand |
+------+-----------+-----------+-----------+-----------+
| D | 1 | 5616.60 | 4 | 154596.24 |
| F | 2 | 2573.20 | 1 | 972.20 |
| G | 1 | 8033.60 | 4 | 1632.56 |
| I | 2 | 757.56 | 1 | 365.28 |
| J | 2 | 27429.50 | 2 | 11911.16 |
+------+-----------+-----------+-----------+-----------+
Here's a similar example with data, SQL and results to show both the incorrect result and the correct result.
The data:
SELECT * FROM demand ORDER BY account_id, period;
+----+------------+--------+--------+
| id | account_id | period | demand |
+----+------------+--------+--------+
| 1 | 1 | 202005 | 100 |
| 2 | 1 | 202005 | 120 |
| 3 | 1 | 202105 | 105 |
| 4 | 1 | 202105 | 125 |
| 5 | 1 | 202105 | 30 |
| 6 | 2 | 202005 | 200 |
| 7 | 2 | 202105 | 240 |
+----+------------+--------+--------+
The incorrect SQL, without SUMs to just show the join behavior:
SELECT t1.id, t1.account_id, t1.period, t1.demand AS demand1
, t2.id, t2.period, t2.demand AS demand2
FROM demand AS t1
LEFT JOIN demand AS t2
ON t1.account_id = t2.account_id
AND t2.period = 202105
WHERE t1.period = 202005
ORDER BY t1.account_id, t1.period, demand1, demand2
;
+----+------------+--------+---------+------+--------+---------+
| id | account_id | period | demand1 | id | period | demand2 |
+----+------------+--------+---------+------+--------+---------+
| 1 | 1 | 202005 | 100 | 5 | 202105 | 30 |
| 1 | 1 | 202005 | 100 | 3 | 202105 | 105 |
| 1 | 1 | 202005 | 100 | 4 | 202105 | 125 |
| 2 | 1 | 202005 | 120 | 5 | 202105 | 30 |
| 2 | 1 | 202005 | 120 | 3 | 202105 | 105 |
| 2 | 1 | 202005 | 120 | 4 | 202105 | 125 |
| 6 | 2 | 202005 | 200 | 7 | 202105 | 240 |
+----+------------+--------+---------+------+--------+---------+
Notice account 2 doesn't have a problem because only one demand was found last year and this year.
But account 1 found 2 demand rows for last year and 3 demand rows for this year, leading to (2 x 3) = 6 rows in the joined result. This is the source of the problem.
To correct this, we aggregate before the JOIN, so that we have at most one (1) row per account / period to be joined.
One form of the correct solution, with SUMs derived in the CTE term:
WITH demands AS (
SELECT account_id, period
, SUM(demand) AS demand
, COUNT(*) AS orders
FROM demand
GROUP BY account_id, period
)
SELECT ly.account_id, ly.period
, ly.orders AS ly_orders
, ly.demand AS ly_demand
, ty.orders AS ty_orders
, ty.demand AS ty_demand
FROM demands AS ly
LEFT JOIN demands AS ty
ON ly.account_id = ty.account_id
AND ty.period = 202105
WHERE ly.period = 202005
ORDER BY ly.account_id, ly.period, ly_demand, ty_demand
;
The result:
+------------+--------+-----------+-----------+-----------+-----------+
| account_id | period | ly_orders | ly_demand | ty_orders | ty_demand |
+------------+--------+-----------+-----------+-----------+-----------+
| 1 | 202005 | 2 | 220 | 3 | 260 |
| 2 | 202005 | 1 | 200 | 1 | 240 |
+------------+--------+-----------+-----------+-----------+-----------+
Since we performed aggregation in the CTE term (demands), the join found at most one row for each period for each account.
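The fan-out can be checked numerically. The sketch below loads the demand rows shown above into an in-memory SQLite database (an assumption made for portability; the question targets BigQuery) and compares a SUM taken after a row-level self-join with a SUM taken before the join. The names NAIVE_SQL and CORRECT_SQL are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demand (id INTEGER PRIMARY KEY, account_id INTEGER, period INTEGER, demand INTEGER)")
conn.executemany("INSERT INTO demand VALUES (?, ?, ?, ?)", [
    (1, 1, 202005, 100), (2, 1, 202005, 120),
    (3, 1, 202105, 105), (4, 1, 202105, 125), (5, 1, 202105, 30),
    (6, 2, 202005, 200), (7, 2, 202105, 240),
])

# Summing after a row-level self-join: every last-year row is repeated once
# per this-year row (and vice versa), so both sums are inflated.
NAIVE_SQL = """
SELECT t1.account_id, SUM(t1.demand) AS ly_demand, SUM(t2.demand) AS ty_demand
FROM demand AS t1
JOIN demand AS t2
  ON t1.account_id = t2.account_id AND t2.period = 202105
WHERE t1.period = 202005
GROUP BY t1.account_id
ORDER BY t1.account_id
"""

# Aggregating first leaves at most one row per account and period to join.
CORRECT_SQL = """
WITH demands AS (
    SELECT account_id, period, SUM(demand) AS demand
    FROM demand
    GROUP BY account_id, period
)
SELECT ly.account_id, ly.demand AS ly_demand, ty.demand AS ty_demand
FROM demands AS ly
JOIN demands AS ty
  ON ly.account_id = ty.account_id AND ty.period = 202105
WHERE ly.period = 202005
ORDER BY ly.account_id
"""

naive = conn.execute(NAIVE_SQL).fetchall()
correct = conn.execute(CORRECT_SQL).fetchall()
print("naive:  ", naive)
print("correct:", correct)
```

For account 1 the naive query reports 660 and 520, which is 3x the real last-year total of 220 and 2x the real this-year total of 260. Account 2 has one row per period, so both queries agree.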

SQL query grouping by range

I have a table A with the following data:
+------+-------+----+--------+
| YEAR | MONTH | PA | AMOUNT |
+------+-------+----+--------+
| 2020 | 1 | N | 100 |
+------+-------+----+--------+
| 2020 | 2 | N | 100 |
+------+-------+----+--------+
| 2020 | 3 | O | 100 |
+------+-------+----+--------+
| 2020 | 4 | N | 100 |
+------+-------+----+--------+
| 2020 | 5 | N | 100 |
+------+-------+----+--------+
| 2020 | 6 | O | 100 |
+------+-------+----+--------+
I'd like to have the following result:
+---------+---------+--------+
| FROM | TO | AMOUNT |
+---------+---------+--------+
| 2020-01 | 2020-02 | 200 |
+---------+---------+--------+
| 2020-03 | 2020-03 | 100 |
+---------+---------+--------+
| 2020-04 | 2020-05 | 200 |
+---------+---------+--------+
| 2020-06 | 2020-06 | 100 |
+---------+---------+--------+
My DB is DB2/400.
I have tried with ROW_NUMBER partitioning, subqueries but I can't figure out how to solve this.
I understand this as a gaps-and-islands problem, where you want to group together adjacent rows that have the same PA.
Here is an approach using the difference between row numbers to build the groups:
select min(year_month) year_month_start, max(year_month) year_month_end, sum(amount) amount
from (
select a.*, year * 100 + month year_month,
row_number() over(order by year, month) rn1,
row_number() over(partition by pa order by year, month) rn2
from a
) a
group by pa, rn1 - rn2
order by year_month_start
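A minimal sketch of the row-number difference technique on the question's data, assuming SQLite 3.25 or later (for window functions) driven from Python's sqlite3 module, rather than DB2/400. The name ISLANDS_SQL and the zero-padded year_month label are illustrative choices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (year INTEGER, month INTEGER, pa TEXT, amount INTEGER)")
conn.executemany("INSERT INTO a VALUES (?, ?, ?, ?)", [
    (2020, 1, "N", 100), (2020, 2, "N", 100), (2020, 3, "O", 100),
    (2020, 4, "N", 100), (2020, 5, "N", 100), (2020, 6, "O", 100),
])

# rn1 numbers all rows in date order; rn2 numbers rows within each PA.
# Inside a run of consecutive rows with the same PA both counters advance
# together, so rn1 - rn2 stays constant and identifies the run.
ISLANDS_SQL = """
SELECT MIN(year_month) AS from_ym,
       MAX(year_month) AS to_ym,
       SUM(amount)     AS amount
FROM (
    SELECT printf('%04d-%02d', year, month) AS year_month,
           pa,
           amount,
           ROW_NUMBER() OVER (ORDER BY year, month)                 AS rn1,
           ROW_NUMBER() OVER (PARTITION BY pa ORDER BY year, month) AS rn2
    FROM a
) AS numbered
GROUP BY pa, rn1 - rn2
ORDER BY from_ym
"""
islands = conn.execute(ISLANDS_SQL).fetchall()

for row in islands:
    print(row)
```

Grouping by pa as well as rn1 - rn2 matters in general: two runs with different PA values can produce the same difference, and grouping by the difference alone would then merge them.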
You can try the following:
select min(year)||'-'||min(month) as from_date, max(year)||'-'||max(month) as to_date, sum(amount) as amount
from (
select t1.*,
row_number() over(order by year, month) -
row_number() over(partition by pa order by year, month) as grprn
from t1
) A
group by grprn, pa
order by min(year), min(month)
This works in T-SQL; I guess you can adapt it to DB2/400?
SELECT MIN(Dte) [From]
, MAX(Dte) [To]
-- ,PA
, SUM(Amount)
FROM (
SELECT year * 100 + month Dte
, Pa
, Amount
, ROW_NUMBER() OVER (PARTITION BY pa ORDER BY year * 100 + month) +
10000- (YEar*100+Month) rn
FROM tabA a
) b
GROUP BY Pa
, rn
ORDER BY [From]
, [To]
The trick is the ROW_NUMBER function partitioned by PA and ordered by date. It counts up by one for each month within the same PA; adding it to a value that counts down by one per month (10000 minus year * 100 + month) gives the same number for consecutive months with the same PA. You then group by PA and that computed value, rn, to get the groups, and Bob's your uncle. Note that year * 100 + month jumps at a year boundary (from 202012 to 202101), so as written this only holds within a single year.

How do I create cohorts of users from month of first order, then count information about those orders in SQL?

I'm trying to use SQL to:
Create user cohorts by the month of their first order
Sum the total of all the order amounts bought by that cohort all-time
Output the cohort name (its month), the cohort size (total users who made first purchase in that month), total_revenue (all order revenue from the users in that cohort), and avg_revenue (the total_revenue divided by the cohort size)
Please see below for a SQL Fiddle, with sample tables, and the expected output:
http://www.sqlfiddle.com/#!15/b5937
Thanks!!
Users Table
+-----+---------+
| id | name |
+-----+---------+
| 1 | Adam |
| 2 | Bob |
| 3 | Charles |
| 4 | David |
+-----+---------+
Orders Table
+----+--------------+-------+---------+
| id | date | total | user_id |
+----+--------------+-------+---------+
| 1 | '2020-01-01' | 100 | 1 |
| 2 | '2020-01-02' | 200 | 2 |
| 3 | '2020-03-01' | 300 | 3 |
| 4 | '2020-04-01' | 400 | 1 |
+----+--------------+-------+---------+
Desired Output
+--------------+--------------+----------------+-------------+
| cohort | cohort_size | total_revenue | avg_revenue |
+--------------+--------------+----------------+-------------+
| '2020-01-01' | 2 | 700 | 350 |
| '2020-03-01' | 1 | 300 | 300 |
+--------------+--------------+----------------+-------------+
You can find the minimum date for every user and aggregate for them. Then you can aggregate for every such date:
with first_orders(user_id, cohort, total) as (
select user_id, min(ordered_at), sum(total)
from orders
group by user_id
)
select to_char(date_trunc('month', fo.cohort), 'YYYY-MM-DD'), count(fo.user_id), sum(fo.total), avg(fo.total)
from first_orders fo
group by date_trunc('month', fo.cohort)
You can use window functions to get the first date. The rest is then aggregation:
select date_trunc('month', first_date) as yyyymm,
count(distinct user_id), sum(total), sum(total)/ count(distinct user_id)
from (select o.*, min(o.ordered_at) over (partition by o.user_id) as first_date
from orders o
) o
group by date_trunc('month', first_date)
order by yyyymm;
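A minimal sketch of the first approach (one row per user, then one row per cohort) on the question's sample orders, assuming SQLite via Python's sqlite3 module instead of PostgreSQL, so strftime stands in for date_trunc. The column name ordered_at follows the answers above (the question's table shows it as date), and COHORT_SQL is an illustrative name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, ordered_at TEXT, total INTEGER, user_id INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    (1, "2020-01-01", 100, 1),
    (2, "2020-01-02", 200, 2),
    (3, "2020-03-01", 300, 3),
    (4, "2020-04-01", 400, 1),
])

# Step 1: one row per user with the first order date and all-time revenue.
# Step 2: bucket users by the month of that first order and aggregate.
COHORT_SQL = """
WITH per_user AS (
    SELECT user_id,
           MIN(ordered_at) AS first_order,
           SUM(total)      AS revenue
    FROM orders
    GROUP BY user_id
)
SELECT strftime('%Y-%m-01', first_order) AS cohort,
       COUNT(*)                          AS cohort_size,
       SUM(revenue)                      AS total_revenue,
       SUM(revenue) * 1.0 / COUNT(*)     AS avg_revenue
FROM per_user
GROUP BY cohort
ORDER BY cohort
"""
cohorts = conn.execute(COHORT_SQL).fetchall()

for row in cohorts:
    print(row)
```

User 1 is counted once, in the 2020-01 cohort, but both of that user's orders (100 and 400) contribute to the cohort's revenue. Users without any orders, such as David in the sample, do not appear in any cohort.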