Cumulated Cohorts in SQL


I have the following table:
cohort  | month cohort | orders | cumulated orders
2021-01 | 0            | 126    | 126
2021-01 | 1            | 5      | 131
2021-01 | 2            | 4      | 135
2021-02 | 0            | 131    | 131
2021-02 | 1            | 9      | 140
2021-02 | 2            | 8      | 148
Now I want the following table, where each cohort's cumulated orders are divided by that cohort's number of orders in month 0:
cohort  | month cohort | orders | cumulated orders | cumulated in %
2021-01 | 0            | 126    | 126              | 100%
2021-01 | 1            | 5      | 131              | 104%
2021-01 | 2            | 4      | 135              | 107%
2021-02 | 0            | 131    | 131              | 100%
2021-02 | 1            | 9      | 140              | 107%
2021-02 | 2            | 8      | 148              | 113%
My only idea is a CASE expression, but I don't want to update the query every month by adding a line such as
WHEN cohort = "2021-08" THEN cumulated_orders / 143
where 143 is the number of orders of cohort 2021-08 at month cohort = 0.
Does anyone have an idea how to get this table?

A case expression isn't needed. You can use first_value():
select t.*,
       ( cumulated_orders /
         first_value(orders) over (partition by cohort order by month_cohort)
       ) as ratio
from t;
If you really wanted a case, you could use:
select t.*,
       ( cumulated_orders /
         max(case when month_cohort = 0 then orders end) over (partition by cohort)
       ) as ratio
from t;

Consider the query below:
select *,
  round(100 * cumulated_orders /
    sum(if(month_cohort = 0, orders, 0)) over(partition by cohort)
  ) as cumulated_in_percent
from `project.dataset.table`
Applied to the sample data in your question, the cumulated_in_percent column is 100, 104, 107 for cohort 2021-01 and 100, 107, 113 for cohort 2021-02.
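The window-function approach from the answers above can be checked end to end with a minimal sketch. It assumes Python's built-in sqlite3 module (linked against SQLite 3.25 or later, which introduced window functions) rather than the asker's database; the table name t and the underscore column names are illustrative.

```python
import sqlite3

# Sample rows from the question: (cohort, month_cohort, orders, cumulated_orders).
rows = [
    ("2021-01", 0, 126, 126),
    ("2021-01", 1, 5, 131),
    ("2021-01", 2, 4, 135),
    ("2021-02", 0, 131, 131),
    ("2021-02", 1, 9, 140),
    ("2021-02", 2, 8, 148),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (cohort TEXT, month_cohort INTEGER,"
    " orders INTEGER, cumulated_orders INTEGER)"
)
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)

# Divide each cumulated value by the month-0 orders of the same cohort.
# 100.0 forces floating-point division (SQLite divides two integers as integers).
query = """
SELECT cohort,
       month_cohort,
       ROUND(100.0 * cumulated_orders /
             MAX(CASE WHEN month_cohort = 0 THEN orders END)
                 OVER (PARTITION BY cohort)) AS cumulated_in_percent
FROM t
ORDER BY cohort, month_cohort
"""
percents = [int(pct) for _, _, pct in conn.execute(query)]
print(percents)
```

Because the denominator is looked up per cohort by the window, no per-cohort CASE branch has to be added when a new cohort appears.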

Related

SQL Server: how to number rows so that a repeated value reuses the number of its first occurrence

Table A
shop   amount  count  sameShopCount
shop5  100     1      1
shop2  99      2      1
shop3  98      3      1
shop4  97      4      1
shop1  96      5      1
shop2  95      6      2
shop4  94      7      2
shop5  93      8      2
shop5  92      9      3
shop1  91      10     2
shop5  90      11     4
shop3  89      12     2
Expected Result (order by amount desc):
shop   amount  expected result
shop5  100     1
shop2  99      2
shop3  98      3
shop4  97      4
shop1  96      5
shop2  95      2
shop4  94      4
shop5  93      1
shop5  92      1
shop1  91      5
shop5  90      1
shop3  89      3
I want to number the shops the way the count column in Table A numbers the rows, except that a shop appearing more than once should reuse the number from its first appearance.
How can I achieve this in SQL Server, both with and without a temp table? (SQL Server 2014, build v12.0.6108.1)
I have tried something like:
ROW_NUMBER() OVER (ORDER BY amount DESC)
DENSE_RANK() OVER (PARTITION BY shop ORDER BY amount DESC)

Try using the max and dense_rank window functions as follows:
with max_shop_amount as
(
  select *,
         max(amount) over (partition by shop) as mx
  from table_name
)
select shop, amount,
       dense_rank() over (order by mx desc) expected
from max_shop_amount
order by amount desc
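As a rough check of the max plus dense_rank idea, the sketch below runs the same query shape against the sample rows. It assumes Python's built-in sqlite3 module (SQLite 3.25 or later) in place of SQL Server, and the table name table_a is made up for the example.

```python
import sqlite3

# (shop, amount) pairs from Table A in the question.
rows = [
    ("shop5", 100), ("shop2", 99), ("shop3", 98), ("shop4", 97),
    ("shop1", 96), ("shop2", 95), ("shop4", 94), ("shop5", 93),
    ("shop5", 92), ("shop1", 91), ("shop5", 90), ("shop3", 89),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (shop TEXT, amount INTEGER)")
conn.executemany("INSERT INTO table_a VALUES (?, ?)", rows)

# Step 1: attach each shop's highest amount (its first row in amount-desc order).
# Step 2: rank the shops by that maximum, so every row of a shop shares one rank.
query = """
WITH max_shop_amount AS (
    SELECT shop, amount,
           MAX(amount) OVER (PARTITION BY shop) AS mx
    FROM table_a
)
SELECT shop, amount,
       DENSE_RANK() OVER (ORDER BY mx DESC) AS expected
FROM max_shop_amount
ORDER BY amount DESC
"""
expected = [rank for _, _, rank in conn.execute(query)]
print(expected)
```

One caveat of this design: two different shops with the same maximum amount would share a rank, which the sample data does not exercise.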

range between interval that does not include the first row

I have this expression:
sum(purchases) over(partition by category order by value_day range between interval '1' month preceding and current row)
If value_day is Aug 21, it returns the sum from July 21 through Aug 21 (both inclusive), but I need the sum from July 22 through Aug 21 (both inclusive).
How can I do that?

You can use an expression to define the starting point of the window. So you can:
1. Subtract a month from the current date
2. Add a day to it
Giving something like:
sum ( purchases ) over (
  partition by category
  order by value_day
  range between ( value_day - ( add_months ( value_day, -1 ) + 1 ) ) preceding
        and current row
)

You can either:
1. Use two windowed functions: your one to add everything from the past month, minus a second one that covers just the range you do not want to include; or
2. Use a correlated sub-query rather than windowed analytic functions.
SELECT t.*,
       sum(purchases) over(
         partition by category
         order by value_day
         range between interval '1' month preceding and current row
       ) -
       COALESCE(
         sum(purchases) over(
           partition by category
           order by value_day
           range between interval '1' month preceding and interval '1' month preceding
         ),
         0
       ) AS total1,
       ( SELECT SUM(s.purchases)
         FROM   table_name s
         WHERE  t.category = s.category
         AND    ADD_MONTHS(t.value_day, -1) + INTERVAL '1' DAY <= s.value_day
         AND    s.value_day <= t.value_day
       ) AS total2
FROM   table_name t;
Which, for the sample data:
CREATE TABLE table_name (category, value_day, purchases) AS
SELECT 1, DATE '2022-01-01' + LEVEL - 1, LEVEL
FROM DUAL
CONNECT BY LEVEL <= 50;
Outputs:
CATEGORY  VALUE_DAY  PURCHASES  TOTAL1  TOTAL2
...       ...        ...        ...     ...
1         01-FEB-22  32         527     527
1         02-FEB-22  33         558     558
1         03-FEB-22  34         589     589
1         04-FEB-22  35         620     620
1         05-FEB-22  36         651     651
1         06-FEB-22  37         682     682
1         07-FEB-22  38         713     713
1         08-FEB-22  39         744     744
1         09-FEB-22  40         775     775
1         10-FEB-22  41         806     806
1         11-FEB-22  42         837     837
1         12-FEB-22  43         868     868
1         13-FEB-22  44         899     899
1         14-FEB-22  45         930     930
1         15-FEB-22  46         961     961
1         16-FEB-22  47         992     992
1         17-FEB-22  48         1023    1023
1         18-FEB-22  49         1054    1054
1         19-FEB-22  50         1085    1085
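The correlated sub-query variant can be sketched outside Oracle as well. The example below is only an approximation: it assumes Python's built-in sqlite3 module, stores dates as ISO strings, and uses SQLite's date(value_day, '-1 month', '+1 day') in place of ADD_MONTHS. SQLite's month arithmetic handles month ends differently from Oracle's, so results can differ for dates such as the 31st; the sample dates checked here are not affected.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_name (category INTEGER, value_day TEXT, purchases INTEGER)"
)

# Same sample data as above: 50 consecutive days from 2022-01-01, purchases 1..50.
start = date(2022, 1, 1)
conn.executemany(
    "INSERT INTO table_name VALUES (1, ?, ?)",
    [((start + timedelta(days=i)).isoformat(), i + 1) for i in range(50)],
)

# For each row, sum purchases from (value_day - 1 month + 1 day) through value_day.
# ISO-formatted date strings compare correctly as plain text.
query = """
SELECT t.value_day,
       (SELECT SUM(s.purchases)
        FROM   table_name s
        WHERE  s.category = t.category
        AND    s.value_day >= date(t.value_day, '-1 month', '+1 day')
        AND    s.value_day <= t.value_day) AS total
FROM   table_name t
ORDER BY t.value_day
"""
totals = dict(conn.execute(query).fetchall())
print(totals["2022-02-01"], totals["2022-02-19"])
```

For 1 February the window runs from 2 January (purchases 2) through 1 February (purchases 32), which matches the 527 in the output table.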

Oracle Cumulative Calculation using Window (Analytic) Functions

I am new to window functions (lag, lead, etc.) and I am using Oracle. I did some research and tried a few solutions, but I couldn't get the desired result.
I have an inventory table, and I want to find the items used and the items remaining on a given day.
The data comes in as follows:
Dates | Items | Total_Inv_Items | Damaged_Items | Sellable_Items | Sold_Items | Remaining_Items
11/13/2020 Pen 999 15 984 109 875
11/13/2020 Book 401 6 386 109 277
11/14/2020 Pen 0 0 0 121 0
11/14/2020 Book 0 0 0 121 0
11/15/2020 Pen 0 0 0 31 0
11/15/2020 Book 0 0 0 31 0
11/16/2020 Pen 201 3 198 33 165
11/16/2020 Book 301 5 296 33 263
Sellable_Items and Remaining_Items are calculated columns:
Sellable_Items = Total_Inv_Items - Damaged_Items
Remaining_Items = Sellable_Items - Sold_Items
Total_Inv_Items is the number of manufactured or new items that became available in the store.
The desired output:
Dates | Items | Total_Inv_Items | Damaged_Items | Sellable_Items | Sold_Items | Remaining_Items
11/13/2020 Pen 999 15 984 109 875
11/13/2020 Book 401 6 386 109 277
11/14/2020 Pen 0 0 875 121 754
11/14/2020 Book 0 0 277 121 156
11/15/2020 Pen 0 0 754 31 721
11/15/2020 Book 0 0 156 31 125
11/16/2020 Pen 201 3 919 33 886
11/16/2020 Book 301 5 421 33 388
Remaining_Items is cumulative, and Sellable_Items changes whenever new items are added to the inventory; otherwise it equals the previous day's Remaining_Items.
The query that gives me the first data set:
SELECT
  Date,
  Items,
  NVL(Total_Inv_Items, 0) as Total_Inv_Items,
  NVL(Round(Total_Inv_Items * DamagePercentage/100), 0) as Damaged_Item,
  Round(Total_Inv_Items - (Total_Inv_Items * DamagePercentage/100)) as Sellable_Items,
  Sold_Items,
  Round(Total_Inv_Items - (Total_Inv_Items * DamagePercentage/100)) - Sold_Items as Remaining_Items
FROM
  Table
Order By
  Date
Note:
On Nov 16th new items became available in the store, so they are added to the previous day's remaining items (e.g. Pen: 201 - 3 + 721).

You can use the lag function as follows:
SELECT Date,
       Items,
       Total_Inv_Items,
       Damaged_Item,
       -- The following calculated expression is based on your formula: Pen 201-3+721
       Total_Inv_Items
         - Damaged_Item
         + COALESCE(LAG(Remaining_Items) OVER (PARTITION BY Items ORDER BY Date), 0) AS Sellable_Items,
       -- Expression ends here
       Sold_Items,
       Remaining_Items
FROM
  (SELECT DISTINCT
          Date,
          Items,
          NVL(Total_Inv_Items, 0) as Total_Inv_Items,
          NVL(Round(Total_Inv_Items * DamagePercentage/100), 0) as Damaged_Item,
          Round(Total_Inv_Items - (Total_Inv_Items * DamagePercentage/100)) as Sellable_Items,
          Sold_Items,
          Round(Total_Inv_Items - (Total_Inv_Items * DamagePercentage/100)) - Sold_Items as Remaining_Items
   FROM Table)
Order By Date
This expression should be placed in the outer query. The inner query should be your query which is giving the first data.

I think you just want cumulative sums and arithmetic:
select t.*,
       sum(total_inv_items - damaged_items) over (partition by item order by date) as sellable_items,
       sum(total_inv_items - damaged_items - sold_items) over (partition by item order by date) as remaining_items
from t
order by date;
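To see what the running sum for remaining_items produces, here is a small sketch using only the Pen rows from the question. It assumes Python's built-in sqlite3 module (SQLite 3.25 or later) instead of Oracle; the table name inventory and the column name dates are hypothetical, chosen to avoid reserved words.

```python
import sqlite3

# Pen rows from the question: (dates, item, total_inv_items, damaged_items, sold_items).
rows = [
    ("2020-11-13", "Pen", 999, 15, 109),
    ("2020-11-14", "Pen", 0, 0, 121),
    ("2020-11-15", "Pen", 0, 0, 31),
    ("2020-11-16", "Pen", 201, 3, 33),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (dates TEXT, item TEXT, total_inv_items INTEGER,"
    " damaged_items INTEGER, sold_items INTEGER)"
)
conn.executemany("INSERT INTO inventory VALUES (?, ?, ?, ?, ?)", rows)

# Running balance per item: everything received, minus damaged, minus sold so far.
query = """
SELECT dates,
       SUM(total_inv_items - damaged_items - sold_items)
           OVER (PARTITION BY item ORDER BY dates) AS remaining_items
FROM inventory
ORDER BY dates
"""
remaining = [value for _, value in conn.execute(query)]
print(remaining)
```

For these rows the running balance is 875, 754, 723, 888. Since 754 - 31 is 723, the last two values are 2 higher than the 721 and 886 listed in the question's desired output.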

Thanks @Gordon Linoff and @Tejash, your comments helped me resolve this issue. The following code gave me the desired result.
NOTE: 1.5 is the percentage of damaged items, which I have included directly in the query; most of the time it comes from the DB.
SELECT
  Dates, Items,
  Total_Inv_Items,
  Damaged_Item,
  CASE
    WHEN Total_Inv_Items = 0 THEN LAG(Remaining_Items) OVER (partition by items order by dates)
    ELSE Sellable_Items + NVL(LAG(Remaining_Items) OVER (partition by items order by dates), 0)
  END as Sellable_Items,
  Sold_Items,
  Remaining_Items
FROM
(
  SELECT
    Dates, Items,
    Total_Inv_Items,
    Damaged_Item,
    Sellable_Items,
    Sold_Items,
    SUM(Sellable_Items - Sold_Items) OVER (partition by items order by dates) as Remaining_Items
  FROM
  (
    SELECT
      Dates,
      Items,
      NVL(Total_Inv_Items, 0) as Total_Inv_Items,
      NVL(Round(Total_Inv_Items * 1.5/100), 0) as Damaged_Item,
      Round(Total_Inv_Items - (Total_Inv_Items * 1.5/100)) as Sellable_Items,
      Sold_Items,
      Round(Total_Inv_Items - (Total_Inv_Items * 1.5/100)) - Sold_Items as Remaining_Items
    FROM
      Table T
  ) A
) B
Order By
  Dates

Calculate Sub Query Column Based On Calculated Column

I have a table ScheduleRotationDetail that contains these as columns:
ScheduleRotationID ScheduleID Ordinal Duration
379 61 1 1
379 379 2 20
379 512 3 1
379 89 4 20
I have the following query to get the day of the year on which each schedule is supposed to start:
SELECT ScheduleID, Ordinal, Duration
    ,Duration * 7 AS DurationDays
    ,( SELECT ( ISNULL( SUM(ISNULL( Duration, 0 )), 0 ) - 1 ) * 7
       FROM ScheduleRotationDetail WHERE ScheduleRotationID = srd.ScheduleRotationID
       AND Ordinal <= srd.Ordinal ) AS StartDay
FROM ScheduleRotationDetail srd
WHERE srd.ScheduleRotationID = 379
That outputs this as the result set:
ScheduleID Ordinal Duration DurationDays StartDay
61 1 1 7 0
379 2 20 140 140
512 3 1 7 147
89 4 20 140 287
But the StartDay values I need are:
0
7
147
154
I have tried CTEs but can't get them to work, so I've come here for advice.

It looks like you want a cumulative sum. In SQL Server 2012+, you can do:
SELECT ScheduleID, Ordinal, Duration,
       SUM(Duration*7) OVER (ORDER BY Ordinal) - Duration*7 AS StartDay
FROM ScheduleRotationDetail srd
WHERE srd.ScheduleRotationID = 379;
In earlier versions, you can use APPLY for this purpose (or a correlated subquery).
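The "running total minus the current row" idea can be verified with a short sketch. It assumes Python's built-in sqlite3 module (SQLite 3.25 or later) rather than SQL Server, and adds a PARTITION BY on ScheduleRotationID as an illustrative guard in case several rotations share the table.

```python
import sqlite3

# (ScheduleRotationID, ScheduleID, Ordinal, Duration) rows from the question.
rows = [
    (379, 61, 1, 1),
    (379, 379, 2, 20),
    (379, 512, 3, 1),
    (379, 89, 4, 20),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ScheduleRotationDetail (ScheduleRotationID INTEGER,"
    " ScheduleID INTEGER, Ordinal INTEGER, Duration INTEGER)"
)
conn.executemany("INSERT INTO ScheduleRotationDetail VALUES (?, ?, ?, ?)", rows)

# Start day = days consumed by all earlier schedules
#           = running total of days up to this row, minus this row's own days.
query = """
SELECT ScheduleID,
       SUM(Duration * 7) OVER (PARTITION BY ScheduleRotationID ORDER BY Ordinal)
           - Duration * 7 AS StartDay
FROM ScheduleRotationDetail
WHERE ScheduleRotationID = 379
ORDER BY Ordinal
"""
start_days = [start_day for _, start_day in conn.execute(query)]
print(start_days)
```

The running totals are 7, 147, 154, 294 days; subtracting each row's own duration gives the requested 0, 7, 147, 154.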

SQLite: subtract sums (with GROUP BY) with JOIN and duplicates

I previously found the solution to my problem, but unfortunately I lost the files on my hard drive and I can't find the statement I managed to produce.
I have 2 tables, T2REQ and T2STOCK. Both have 2 columns (typeID and quantity), and my problem lies in the fact that the SAME typeID can occur multiple times in BOTH tables.
What I'm trying to do is SUM(quantity) grouped by typeID and subtract the values of T2STOCK from T2REQ, but since the same typeID occurs multiple times in both tables, the SUM I get is multiplied by the number of occurrences of that typeID.
Here's a sample of T2REQ (take typeID 11399 for example):
typeID quantity
---------- ----------
34 102900
35 10500
36 3220
37 840
11399 700
563 140
9848 140
11486 28
11688 700
11399 390
4393 130
9840 390
9842 390
11399 390
11483 19.5
11541 780
And this is a sample of T2STOCK table :
typeID quantity
---------- ----------
9842 1921
9848 2400
11399 1700
11475 165
11476 27
11478 28
11481 34
11483 122
11476 2
This is where I'm at for now. I know that SUM(t2stock.quantity) is inflated (multiplied) because the JOIN is not one-to-one, but whatever I try, I'm not doing things in the right order:
SELECT
  t2req.typeID, sum(t2req.quantity), sum(t2stock.quantity),
  sum(t2req.quantity) - sum(t2stock.quantity) as diff
FROM t2req JOIN t2stock ON t2req.typeID = t2stock.typeID
GROUP BY t2req.typeID
ORDER BY diff DESC;
typeID sum(t2req.quantity) sum(t2stock.quantity) diff
---------- ------------------- --------------------- ----------
563 140 30 110
11541 780 780 0
11486 28 40 -12
11483 19.5 122 -102.5
9840 390 1000 -610
40 260 940 -680
9842 390 1921 -1531
9848 140 2400 -2260
11399 1480 5100 -3620
39 650 7650 -7000
37 1230 116336 -115106
36 28570 967098 -938528
35 33770 2477820 -2444050
34 102900 2798355 -2695455
You can see that SUM(t2req.quantity) for typeID 11399 is correct: 1480.
You can also see that SUM(t2stock.quantity) for typeID 11399 is not correct: 5100 instead of 1700 (5100 is 1700 multiplied by 3, the number of occurrences in t2req).
What would be the best way to avoid this multiplication, caused by repeated typeIDs in both tables, when joining to subtract the sums?
Sorry for the wall of text, I'm just trying to explain as best I can since English is not my mother tongue.
Thanks a lot for your help.

You can aggregate before the join:
SELECT
  t2req.typeID,
  t2req.quantity,
  t2stock.quantity,
  t2req.quantity - t2stock.quantity as diff
FROM
  (SELECT TypeID, SUM(Quantity) Quantity FROM t2req GROUP BY TypeID) t2req JOIN
  (SELECT TypeID, SUM(Quantity) Quantity FROM t2stock GROUP BY TypeID) t2stock
  ON t2req.typeID = t2stock.typeID
ORDER BY diff DESC;
Fiddle sample: http://sqlfiddle.com/#!7/06711/5
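The difference between joining first and aggregating first can be reproduced with a subset of the sample rows. This is a minimal sketch using Python's built-in sqlite3 module; the aliases r and s and the chosen subset of rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t2req (typeID INTEGER, quantity REAL)")
conn.execute("CREATE TABLE t2stock (typeID INTEGER, quantity REAL)")

# Subset of the question's rows; typeID 11399 occurs three times in t2req.
conn.executemany("INSERT INTO t2req VALUES (?, ?)", [
    (11399, 700), (11399, 390), (11399, 390),
    (9842, 390), (9848, 140), (11483, 19.5),
])
conn.executemany("INSERT INTO t2stock VALUES (?, ?)", [
    (9842, 1921), (9848, 2400), (11399, 1700),
    (11483, 122), (11476, 27), (11476, 2),
])

# Join first, aggregate second: each stock row repeats once per matching t2req row.
naive = {
    type_id: (req, stock)
    for type_id, req, stock in conn.execute("""
        SELECT r.typeID, SUM(r.quantity), SUM(s.quantity)
        FROM t2req r JOIN t2stock s ON r.typeID = s.typeID
        GROUP BY r.typeID
    """)
}

# Aggregate first, join second: one row per typeID on each side, so nothing repeats.
diff = dict(conn.execute("""
    SELECT r.typeID, r.quantity - s.quantity
    FROM (SELECT typeID, SUM(quantity) AS quantity FROM t2req GROUP BY typeID) r
    JOIN (SELECT typeID, SUM(quantity) AS quantity FROM t2stock GROUP BY typeID) s
      ON r.typeID = s.typeID
"""))

print(naive[11399], diff[11399])
```

With the join done first, the stock total for 11399 comes out as 5100 (1700 counted three times); with the aggregation done first, the difference is 1480 - 1700 = -220.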

You can't do this in a single aggregation. Aggregate each table separately, then join the results (a LEFT JOIN keeps required types that have no stock at all):
SELECT
  COALESCE(r.typeID, s.typeID) AS typeID,
  COALESCE(r.quantity, 0) AS req_quantity,
  COALESCE(s.quantity, 0) AS stock_quantity,
  COALESCE(r.quantity, 0) - COALESCE(s.quantity, 0) AS diff
FROM (
  SELECT rr.typeID, SUM(rr.quantity) AS quantity
  FROM t2req rr
  GROUP BY rr.typeID
) r
LEFT JOIN (
  SELECT ss.typeID, SUM(ss.quantity) AS quantity
  FROM t2stock ss
  GROUP BY ss.typeID
) s ON r.typeID = s.typeID
ORDER BY 4 DESC;
