Use Oracle LAG function data in calculations - SQL

I want to get STOCK and the weighted average unit cost (WAUC).
TABLE T1:

ROW  ITEM  IN    OUT  PRICE  STOCK  WAUC
---  ----  ----  ---  -----  -----  ----
1    A     1000  -    20     1000   20
2    A     2000  -    25     -      -
3    A     1500  -    15     -      -
4    A     500   -    20     -      -
I have the first five columns, plus STOCK and WAUC for the first record, and I want to compute STOCK and WAUC for the remaining records.
WAUC = ((PREVIOUS_PRICE * PREVIOUS_STOCK) + (CURRENT_IN * CURRENT_PRICE)) / CURRENT_STOCK
So I wrote this query:
SELECT ROW,
       SUM(IN - OUT) OVER (ORDER BY ROW) STOCK,
       ((LAG(STOCK * WAUC) OVER (ORDER BY ROW)) + (IN * PRICE)) / STOCK AS WAUC
FROM T1
What I want is:

ROW  ITEM  IN    OUT  PRICE  STOCK  WAUC
---  ----  ----  ---  -----  -----  -----
1    A     1000  -    20     1000   20
2    A     2000  -    25     3000   23.33
3    A     1500  -    15     4500   20.55
4    A     500   -    20     5000   20.49
In other words, I want to use LAG results in subsequent calculations.

Your formula should be:
WAUC = (
PREVIOUS_WAUC * PREVIOUS_STOCK
+ (CURRENT_IN - CURRENT_OUT) * CURRENT_PRICE
)
/ CURRENT_STOCK
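For example, plugging row 2 of the sample data into that formula gives (a quick arithmetic check, runnable in Oracle):

-- previous WAUC 20, previous stock 1000; buy 2000 at price 25; new stock 3000
SELECT (20 * 1000 + 2000 * 25) / 3000 AS wauc FROM DUAL;
-- => 23.333...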
You can use a MODEL clause (with some extra measures to make the calculation simpler):
SELECT "ROW", item, "IN", "OUT", price, stock, wauc
FROM t1
MODEL
DIMENSION BY ("ROW")
MEASURES (item, "IN", "OUT", price, 0 AS change, 0 AS stock, 0 AS total, 0 AS wauc)
RULES (
change["ROW"] = COALESCE("IN"[cv()], 0) - COALESCE("OUT"[cv()], 0),
stock["ROW"] = change[cv()] + COALESCE(stock[cv()-1], 0),
total["ROW"] = change[cv()] * price[cv()] + COALESCE(total[cv()-1], 0),
wauc["ROW"] = total[cv()] / stock[cv()]
);
Or, from Oracle 12, using MATCH_RECOGNIZE (with ALL ROWS PER MATCH, the aggregates in MEASURES act as running aggregates):
SELECT "ROW",
item,
"IN",
"OUT",
price,
total_stock AS stock,
total_cost / total_stock AS wauc
FROM t1
MATCH_RECOGNIZE(
ORDER BY "ROW"
MEASURES
SUM(COALESCE("IN", 0) - COALESCE("OUT", 0)) AS total_stock,
SUM((COALESCE("IN", 0) - COALESCE("OUT", 0))*price) AS total_cost
ALL ROWS PER MATCH
PATTERN (all_rows+)
DEFINE
all_rows AS 1 = 1
)
Or analytic functions:
SELECT "ROW",
item,
"IN",
"OUT",
price,
SUM(COALESCE("IN",0) - COALESCE("OUT", 0)) OVER (ORDER BY "ROW")
AS stock,
SUM((COALESCE("IN",0) - COALESCE("OUT", 0))*price) OVER (ORDER BY "ROW")
/ SUM(COALESCE("IN",0) - COALESCE("OUT", 0)) OVER (ORDER BY "ROW")
AS wauc
FROM t1
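Unrolling the recurrence shows why running sums are enough: WAUC × STOCK telescopes to the running total of (IN − OUT) × PRICE, so no row needs the previous row's WAUC. A quick check against row 3 of the expected output (arithmetic only, Oracle syntax):

-- cumulative cost 1000*20 + 2000*25 + 1500*15 = 92500; cumulative stock 4500
SELECT (1000*20 + 2000*25 + 1500*15) / 4500 AS wauc FROM DUAL;
-- => 20.5555...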
Which, for the sample data:
CREATE TABLE t1 ("ROW", ITEM, "IN", "OUT", PRICE, STOCK, WAUC) AS
SELECT 1, 'A', 1000, CAST(NULL AS NUMBER), 20, CAST(NULL AS NUMBER), CAST(NULL AS NUMBER) FROM DUAL UNION ALL
SELECT 2, 'A', 2000, NULL, 25, NULL, NULL FROM DUAL UNION ALL
SELECT 3, 'A', 1500, NULL, 15, NULL, NULL FROM DUAL UNION ALL
SELECT 4, 'A', 500, NULL, 20, NULL, NULL FROM DUAL;
All output:

ROW  ITEM  IN    OUT  PRICE  STOCK  WAUC
---  ----  ----  ---  -----  -----  ----------------------------------------
1    A     1000       20     1000   20
2    A     2000       25     3000   23.33333333333333333333333333333333333333
3    A     1500       15     4500   20.55555555555555555555555555555555555556
4    A     500        20     5000   20.5
Note: ROW, IN and OUT are keywords, and you should not use them as identifiers, as you would have to use quoted identifiers everywhere they occur.
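For example, a table using non-reserved names (hypothetical renames) needs no quoting at all:

CREATE TABLE t1_clean (rn NUMBER, item VARCHAR2(10), qty_in NUMBER, qty_out NUMBER, price NUMBER);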
db<>fiddle here

Your problem has nothing to do with "lag".
Using MT0's sample data:
select "ROW", item, "IN", "OUT", price,
sum(nvl("IN", 0) - nvl("OUT", 0))
over (partition by item order by "ROW") as stock,
round(sum((nvl("IN", 0) - nvl("OUT", 0)) * price)
over (partition by item order by "ROW")
/ sum(nvl("IN", 0) - nvl("OUT", 0))
over (partition by item order by "ROW"), 2 ) as wauc
from t1
;
ROW ITEM IN OUT PRICE STOCK WAUC
------ ---- ------ ------ ------ ------ ------
1 A 1000 20 1000 20
2 A 2000 25 3000 23.33
3 A 1500 15 4500 20.56
4 A 500 20 5000 20.5

Related

SQL decreasing sum by a percentage

I have a table like:

timestamp   type  value
----------  ----  -----
08.01.2023  1     5
07.01.2023  0     20
06.01.2023  1     1
05.01.2023  0     50
04.01.2023  0     50
03.01.2023  1     1
02.01.2023  1     1
01.01.2023  1     1
Type 1 means a deposit, type 0 means a withdrawal.
When the type is 1, the value is the exact amount the user deposited, so we can simply sum it; when the type is 0, the value is a withdrawal expressed as a percentage of the current total.
What I'm looking for is to create another column with the current deposited amount. For the example above it would look like this:
timestamp   type  value  deposited
----------  ----  -----  ---------
08.01.2023  1     5      5.4
07.01.2023  0     20     1.4
06.01.2023  1     1      1.75
05.01.2023  0     50     0.75
04.01.2023  0     50     1.5
03.01.2023  1     1      3
02.01.2023  1     1      2
01.01.2023  1     1      1
I can't figure out how to write a sum like this that subtracts a percentage of the previous total.
You are trying to carry state over time, so you either need a UDTF to do the carrying for you, or a recursive CTE:
with data(transaction_date, type, value) as (
select to_date(column1, 'dd.mm.yyyy'), column2, column3
from values
('08.01.2023', 1, 5),
('07.01.2023', 0, 20),
('06.01.2023', 1, 1),
('05.01.2023', 0, 50),
('04.01.2023', 0, 50),
('03.01.2023', 1, 1),
('02.01.2023', 1, 1),
('01.01.2023', 1, 1)
), pre_process_data as (
select *
,iff(type = 0, 0, value)::number as add
,iff(type = 0, value, 0)::number as per
,row_number()over(order by transaction_date asc) as rn
from data
), rec_cte_block as (
with recursive rec_sub_cte as (
select
p.*,
p.add::number(20,4) as deposited
from pre_process_data as p
where p.rn = 1
union all
select
p.*,
round(div0((r.deposited + p.add)*(100-p.per), 100), 2) as deposited
from rec_sub_cte as r
join pre_process_data as p
  on p.rn = r.rn + 1
)
select *
from rec_sub_cte
)
select * exclude(add, per, rn)
from rec_cte_block
order by 1;
I wrote the recursive CTE this way because there is currently an open incident when IFF or CASE is used inside the recursive CTE.
TRANSACTION_DATE  TYPE  VALUE  DEPOSITED
----------------  ----  -----  ---------
2023-01-01        1     1      1
2023-01-02        1     1      2
2023-01-03        1     1      3
2023-01-04        0     50     1.5
2023-01-05        0     50     0.75
2023-01-06        1     1      1.75
2023-01-07        0     20     1.4
2023-01-08        1     5      6.4
A solution without recursion or a UDTF:
create table depo (timestamp date,type int, value float);
insert into depo values
(cast('01.01.2023' as date),1, 1.0)
,(cast('02.01.2023' as date),1, 1.0)
,(cast('03.01.2023' as date),1, 1.0)
,(cast('04.01.2023' as date),0, 50.0)
,(cast('05.01.2023' as date),0, 50.0)
,(cast('06.01.2023' as date),1, 1.0)
,(cast('07.01.2023' as date),0, 20.0)
,(cast('08.01.2023' as date),1, 5.0)
;
with t0 as(
select *
,sum(case when type=0 and value>=100 then 1 else 0 end)over(order by timestamp) gr
from depo
)
,t1 as (select timestamp as dt,type,gr
,case when type=1 then value else 0 end depo
,case when type=0 then ((100.0-value)/100.0) else 0.0 end pct
,sum(case when type=0 and value<100 then log((100.0-value)/100.0,2.0)
when type=0 and value>=100 then null
else 0.0
end)
over(partition by gr order by timestamp ROWS BETWEEN CURRENT ROW
AND UNBOUNDED FOLLOWING) totLog
from t0
)
,t2 as(
select *
,case when type=1 then
isnull(sum(depo*power(cast(2.0 as float),totLog))
over(partition by gr order by dt rows between unbounded preceding and 1 preceding)
,0)/power(cast(2.0 as float),totLog)
+depo
else
isnull(sum(depo*power(cast(2.0 as float),totLog))
over(partition by gr order by dt rows between unbounded preceding and 1 preceding)
,0)/power(cast(2.0 as float),totLog)*pct
end rest
from t1
)
select dt,type,depo,pct*100 pct
,rest-lag(rest,1,0)over(order by dt) movement
,rest
from t2
order by dt
dt          type  depo  pct  movement  rest
----------  ----  ----  ---  --------  ----
2023-01-01  1     1     0    1         1
2023-02-01  1     1     0    1         2
2023-03-01  1     1     0    1         3
2023-04-01  0     0     50   -1.5      1.5
2023-05-01  0     0     50   -0.75     0.75
2023-06-01  1     1     0    1         1.75
2023-07-01  0     0     80   -0.35     1.4
2023-08-01  1     5     0    5         6.4
I think it is better to perform this kind of calculation on the client side or in a middle tier. Sequential calculations are difficult to implement in SQL. In some special cases you can use logarithmic expressions, but it is clearer and easier to implement through recursion, as @Simeon showed.
To expand on @ValNik's answer:
The first simple step is to change "deduct 20%, then deduct 50%, then deduct 30%" into a multiplication...
X - 20% - 50% - 30%
=>
x * 0.8 * 0.5 * 0.7
=>
x * 0.28
The second trick is understanding how to calculate a cumulative PRODUCT() when you only have a cumulative SUM() OVER (), using the properties of logarithms...
a * b == exp( log(a) + log(b) )
0.8 * 0.5 * 0.7
=>
exp( log(0.8) + log(0.5) + log(0.7) )
=>
exp( -0.2231 + -0.6931 + -0.3567 )
=>
exp( -1.2730 )
=>
0.28
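That identity lets you build a cumulative product out of SUM() OVER (); a minimal sketch (SQL Server style, matching the dialect used above, with the three multipliers from the example):

SELECT id, factor,
       EXP(SUM(LOG(factor)) OVER (ORDER BY id)) AS running_product
FROM (VALUES (1, 0.8), (2, 0.5), (3, 0.7)) AS m(id, factor);
-- running_product: 0.8, 0.4, 0.28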
The next trick is easier to explain with integers rather than percentages: break the original problem down into one that can be solved using "cumulative sum" and "cumulative product"...
Current working:

row_id  type  value  equation                      result
------  ----  -----  ----------------------------  ------
1       +     10     0 + 10                        10
2       +     20     (0 + 10 + 20)                 30
3       *     2      (0 + 10 + 20) * 2             60
4       +     30     (0 + 10 + 20) * 2 + 30        90
5       *     3      ((0 + 10 + 20) * 2 + 30) * 3  270
Rearranged working:

row_id  type  value  CUMPROD  new equation              result
------  ----  -----  -------  ------------------------  ------
1       +     10     2*3=6    (10*6) / 6                10
2       +     20     2*3=6    (10*6 + 20*6) / 6         30
3       *     2      3=3      (10*6 + 20*6) / 3         60
4       +     30     3=3      (10*6 + 20*6 + 30*3) / 3  90
5       *     3      =1       (10*6 + 20*6 + 30*3) / 1  270
CUMPROD is the "cumulative product" of all future "multiplication values".
The equation is then the "cumulative sum" of value * CUMPROD divided by the current CUMPROD.
So...
row 1 : SUM(10*6 ) / 6 => SUM(10 )
row 2 : SUM(10*6, 20*6 ) / 6 => SUM(10, 20)
row 3 : SUM(10*6, 20*6 ) / 3 => SUM(10, 20) * 2
row 4 : SUM(10*6, 20*6, 30*3) / 3 => SUM(10, 20) * 2 + SUM(30)
row 5 : SUM(10*6, 20*6, 30*3) / 1 => SUM(10, 20) * 2*3 + SUM(30) * 3
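A sketch of that worked example in SQL (SQL Server style, to match the dialect used above; the CUMPROD of future multipliers is built with the LOG/EXP trick):

WITH ops(row_id, op, val) AS (
    SELECT * FROM (VALUES (1, '+', 10), (2, '+', 20), (3, '*', 2), (4, '+', 30), (5, '*', 3)) v(row_id, op, val)
),
cum AS (
    SELECT *,
           -- product of all FUTURE multiplication values (1 when there are none)
           EXP(ISNULL(SUM(CASE WHEN op = '*' THEN LOG(val) END)
                          OVER (ORDER BY row_id ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING), 0)) AS cumprod
    FROM ops
)
SELECT row_id, op, val,
       -- cumulative sum of val * CUMPROD, divided by the current CUMPROD
       SUM(CASE WHEN op = '+' THEN val * cumprod ELSE 0 END) OVER (ORDER BY row_id) / cumprod AS result
FROM cum;
-- result: 10, 30, 60, 90, 270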
The only things to be cautious of are:
LOG(0) is undefined (it tends to negative infinity), which would happen when deducting 100%
Deducting more than 100% makes no sense
So, I copied @ValNik's code that creates a new partition every time 100% or more is deducted (forcing everything in the next partition to start at zero again).
This gives the following SQL (a re-arranged version of #ValNik's code):
WITH
partition_when_deduct_everything AS
(
SELECT
*,
SUM(
CASE WHEN type = 0 AND value >= 100 THEN 1 ELSE 0 END
)
OVER (
ORDER BY timestamp
)
AS deduct_everything_id,
CASE WHEN type = 1 THEN value
ELSE 0
END
AS deposit,
CASE WHEN type = 1 THEN 1.0 -- Deposits == Deduct 0%
WHEN value >= 100 THEN 1.0 -- Treat "deduct everything" as a special case
ELSE (100.0-value)/100.0 -- Change "deduct 20%" to "multiply by 0.8"
END
AS multiplier
FROM
your_table
)
,
cumulative_product_of_multipliers as
(
SELECT
*,
EXP(
ISNULL(
SUM(
LOG(multiplier)
)
OVER (
PARTITION BY deduct_everything_id
ORDER BY timestamp
ROWS BETWEEN 1 FOLLOWING
AND UNBOUNDED FOLLOWING
)
, 0
)
)
AS future_multiplier
FROM
partition_when_deduct_everything
)
SELECT
*,
ISNULL(
SUM(
deposit * future_multiplier
)
OVER (
PARTITION BY deduct_everything_id
ORDER BY timestamp
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
),
0
)
/
future_multiplier
AS rest
FROM
cumulative_product_of_multipliers
Demo : https://dbfiddle.uk/mrioIMiB
So, how this should really be solved is with a UDTF, because it requires sorting the data once and traversing it only once, and if you have different partitions (user_id, etc.) the work can be done in parallel:
create or replace function carry_value_state(_TYPE float, _VALUE float)
returns table (DEPOSITED float)
language javascript
as
$$
{
    initialize: function (argumentInfo, context) {
        // running balance carried across the rows of the partition
        this.carried_value = 0.0;
    },
    processRow: function (row, rowWriter, context) {
        if (row._TYPE === 1) {
            // deposit: add the amount
            this.carried_value += row._VALUE;
        } else {
            // withdrawal: remove a percentage of the balance, clamped to [0, 100]
            let limited = Math.max(Math.min(row._VALUE, 100.0), 0.0);
            this.carried_value -= (this.carried_value * limited) / 100;
        }
        rowWriter.writeRow({DEPOSITED: this.carried_value});
    }
}
$$;
which then gets used like:
select d.*,
c.*
from data as d
,table(carry_value_state(d.type::float, d.value::float) over (order by transaction_date)) as c
order by 1;
so for the data we have been using in the example, that gives:
TRANSACTION_DATE  TYPE  VALUE  DEPOSITED
----------------  ----  -----  ---------
2023-01-01        1     1      1
2023-01-02        1     1      2
2023-01-03        1     1      3
2023-01-04        0     50     1.5
2023-01-05        0     50     0.75
2023-01-06        1     1      1.75
2023-01-07        0     20     1.4
2023-01-08        1     5      6.4
Yes, the results are now in floating point, so you should double-round to avoid floating-point representation problems, like:
round(round(c.deposited, 6), 2) as deposited
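Putting that together with the earlier query (same assumed data CTE and UDTF as above):

select d.*,
       round(round(c.deposited, 6), 2) as deposited
from data as d,
     table(carry_value_state(d.type::float, d.value::float)
           over (order by transaction_date)) as c
order by 1;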
An alternative approach uses MATCH_RECOGNIZE(), POW() and SUM(): the pattern collects each run of consecutive deposits or consecutive withdrawals into one match and summarizes it. I would not recommend MATCH_RECOGNIZE unless you have to, as it's fiddly and can waste time, but it does look elegant.
with data(transaction_date, type, value) as (
    select
        to_date(column1, 'dd.mm.yyyy'),
        column2,
        column3
    from values
        ('08.01.2023', 1, 5),
        ('07.01.2023', 0, 20),
        ('06.01.2023', 1, 1),
        ('05.01.2023', 0, 50),
        ('04.01.2023', 0, 50),
        ('03.01.2023', 1, 1),
        ('02.01.2023', 1, 1),
        ('01.01.2023', 1, 1)
)
select *
from data match_recognize(
    order by transaction_date
    measures
        sum(iff(CLASSIFIER() = 'ROW_WITH_DEPOSIT', value, 0)) DEPOSITS,
        pow(iff(CLASSIFIER() = 'ROW_WITH_WITHDRAWL', value / 100, 1), count(row_with_withdrawl.*)) DISCOUNT_FROM_WITHDRAWL,
        CLASSIFIER() TRANS_TYPE,
        first(transaction_date) as start_date,
        last(transaction_date) as end_date,
        count(*) as rows_in_sequence,
        count(row_with_deposit.*) as num_deposits,
        count(row_with_withdrawl.*) as num_withdrawls
    after match skip past last row
    pattern ((row_with_deposit+ | row_with_withdrawl+))
    define
        row_with_deposit as type = 1,
        row_with_withdrawl as type = 0
);

How To Calculate Running balance using SQL

I have total qty = 100, and it has been shipped in 4 phases of 40, 10, 25 and 25, which adds up to 100.
Someone helped me with this query (MySQL); I want the same runnable on DB2. When I run this query:
SET @totalQty = -1;
SELECT
    IF(@totalQty < 0, pl.quantity, @totalQty) AS totalQty,
    pr.invoiceqty,
    @totalQty := (@totalQty - pr.invoiceqty) AS balance
FROM
    purchaseorderline pl, replenishmentrequisition pr
I am getting a result like this:
--total qty-- --invoice qty-- --balance qty--
100 40 60
100 10 90
100 25 75
100 25 70
The result I want:
--total qty-- --invoice qty-- --balance qty--
100 40 60
60 10 50
50 25 25
25 25 0
It would be better if you provided some sample data in table form, and not just the output you get.
WITH MYTAB (PHASE_ID, QTY) AS
(
-- Your initial data as the result of
-- your base SELECT statement
VALUES
(1, 40)
, (2, 10)
, (3, 25)
, (4, 25)
)
SELECT
QTY + QTY_TOT - QTY_RTOT AS "total qty"
, QTY AS "invoice qty"
, QTY_TOT - QTY_RTOT AS "balance qty"
FROM
(
SELECT
PHASE_ID
, QTY
-- Running total sum
, SUM (QTY) OVER (ORDER BY PHASE_ID) AS QTY_RTOT
-- Total sum
, SUM (QTY) OVER () AS QTY_TOT
FROM MYTAB
)
ORDER BY PHASE_ID
total qty  invoice qty  balance qty
---------  -----------  -----------
100        40           60
60         10           50
50         25           25
25         25           0
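The arithmetic behind those two expressions, checked for phase 2 (grand total 100, running total after phase 2 = 50, so "total qty" = QTY + (total − running) and "balance qty" = total − running), can be evaluated directly with DB2's standalone VALUES:

VALUES (10 + (100 - 50), 100 - 50);
-- => 60, 50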
A variation of Mark's answer is:
WITH MYTAB (PHASE_ID, QTY) AS
(
-- Your initial data as the result of
-- your base SELECT statement
VALUES (1, 40)
, (2, 10)
, (3, 25)
, (4, 25)
)
SELECT QTY_TOT AS "total qty"
, QTY AS "invoice qty"
, coalesce(lead(QTY_TOT) over (order by phase_id),0) AS "balance qty"
FROM
( SELECT PHASE_ID
, QTY
-- Running total sum
, SUM (QTY) OVER (ORDER BY PHASE_ID desc) AS qty_tot
FROM MYTAB
)
ORDER BY PHASE_ID
It uses LEAD at the outer level instead of summing over the entire window at the inner level.
Fiddle

Calculating compound interest with deposits/withdrawals

I'm trying to calculate the total in an interest-bearing account, accounting for deposits/withdrawals, with BigQuery.
Example scenario:
Daily interest rate = 10%
Value added/removed on every day: [100, 0, 29, 0, -100] (negative means amount removed)
The totals for each day are:
Day 1: 0*1.1 + 100 = 100
Day 2: 100*1.1 + 0 = 110
Day 3: 110*1.1 + 29 = 150
Day 4: 150*1.1 + 0 = 165
Day 5: 165*1.1 - 100 = 81.5
This would be trivial to implement in a language like Python:
daily_changes = [100, 0, 29, 0, -100]
interest_rate = 0.1
result = []
for day, change in enumerate(daily_changes):
if day == 0:
result.append(change)
else:
result.append(result[day-1]*(1+interest_rate) + change)
print(result)
# Result: [100, 110.00000000000001, 150.00000000000003, 165.00000000000006, 81.50000000000009]
My difficulty lies in calculating values for row N when they depend on row N-1 (the usual SUM(...) OVER (ORDER BY...) solution does not suffice here).
Here's a CTE to test with the mock data in this example.
with raw_data as (
select 1 as day, numeric '100' as change union all
select 2 as day, numeric '0' as change union all
select 3 as day, numeric '29' as change union all
select 4 as day, numeric '0' as change union all
select 5 as day, numeric '-100' as change
)
select * from raw_data
You may try the query below:
SELECT day,
ROUND((SELECT SUM(c * POW(1.1, day - o - 1))
FROM t.changes c WITH OFFSET o), 2) AS totals
FROM (
SELECT *, ARRAY_AGG(change) OVER (ORDER BY day) changes
FROM raw_data
) t;
+-----+--------+
| day | totals |
+-----+--------+
| 1 | 100.0 |
| 2 | 110.0 |
| 3 | 150.0 |
| 4 | 165.0 |
| 5 | 81.5 |
+-----+--------+
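This works because unrolling the recurrence gives total_n = SUM over i of change_i * 1.1^(n-i), which is exactly what the correlated subquery over the aggregated array computes. Day 5 can be checked directly (arithmetic only, BigQuery syntax):

SELECT 100*POW(1.1, 4) + 0*POW(1.1, 3) + 29*POW(1.1, 2) + 0*POW(1.1, 1) - 100*POW(1.1, 0) AS day5_total;
-- => 81.5 (within floating-point rounding)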
Another option, with the use of a recursive CTE:
with recursive raw_data as (
select 1 as day, numeric '100' as change union all
select 2 as day, numeric '0' as change union all
select 3 as day, numeric '29' as change union all
select 4 as day, numeric '0' as change union all
select 5 as day, numeric '-100' as change
), iterations as (
select *, change as total
from raw_data where day = 1
union all
select r.day, r.change, 1.1 * i.total + r.change
from iterations i join raw_data r
on r.day = i.day + 1
)
select *
from iterations
with the same output as the Python loop in the question (100, 110, 150, 165, 81.5, up to floating-point noise).

Redshift: Grouping rows by range and adding to output columns

I have data like this:
Table 1: lots of items (denoted by 1, 2, 3, etc.), with the sales date in epoch milliseconds and the number of sales on the given date as Number. The data only covers the last 12 weeks of sales.
Item  Sales_Date     Number
----  -------------  ------
1     1587633401000  2
1     1587374201000  3
1     1585732601000  4
1     1583054201000  1
1     1582190201000  2
1     1580548601000  3
What I want as the output is a single line per item, with each column showing the total sales for each individual month:
Output:
Item  Month_1_Sales  Month_2_Sales  Month_3_Sales
----  -------------  -------------  -------------
1     3              3              9
The only Month 1 sale occurred at 1580548601000 (sales = 3), while 1583054201000 (sales = 1) and 1582190201000 (sales = 2) both occur in Month 2, etc.
So I need to split the sales dates into groups by month, sum their sales numbers, and then present these sums in columns. I am very new to SQL, so I don't know where to start. Would anyone be able to help?
You can extract the months from the timestamp using:
select extract(month from (timestamp 'epoch' + sales_date / 1000 * interval '1 second'))
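For example, the first sample date converts as follows (a quick check; 1587633401000 ms is 2020-04-23 UTC):

select extract(month from (timestamp 'epoch' + 1587633401000 / 1000 * interval '1 second'));
-- => 4 (April)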
However, I am guessing that you really want 4-week periods, because 12 weeks of data is not 3 complete months. That would make more sense to me. For the calculation, use the difference from the earliest date and then use arithmetic and conditional aggregation:
select item,
       sum(case when floor((sales_date - min_sales_date) / (1000 * 60 * 60 * 24 * 4 * 7)) = 2
                then number
           end) as month_3_sales,
       sum(case when floor((sales_date - min_sales_date) / (1000 * 60 * 60 * 24 * 4 * 7)) = 1
                then number
           end) as month_2_sales,
       sum(case when floor((sales_date - min_sales_date) / (1000 * 60 * 60 * 24 * 4 * 7)) = 0
                then number
           end) as month_1_sales
from (select t1.*,
             min(sales_date) over () as min_sales_date
      from table1 t1
     ) t1
group by item;

SQL: Select all items from all clients with their last price

I have a table with all purchases made. with these columns:
clientnumber,
articlenumber,
datepurchased,
price,
qty
Sample data (I have more than 1000 clients and more than 50 products):
client1 - article1 - price 100 - qty 2 - date xx-xx-xxxx
client1 - article1 - price 111 - qty 5 - date xx-xx-xxxx
client1 - article2 - price 1 - qty 5 - date xx-xx-xxxx
client2 - article1 - price 114 - qty 5 - date xx-xx-xxxx
client2 - article1 - price 500 - qty 6 - date xx-xx-xxxx
etc..
I want to get back a list of all articles purchased by each client, with the latest price for each article and client, like this:
client 1, article 1, 50 USD (this price should be the one from the newest datepurchased)
client 1, article 5, 30 USD
client 2, article 1, 30 USD
client 2, article 2, 20 USD
...
You want to rank your records per client and item and show only the best ranked row (here: the latest purchase). Use ROW_NUMBER to do that.
select clientnumber, articlenumber, price
from
(
select
clientnumber, articlenumber, price,
row_number() over (partition by clientnumber, articlenumber
order by datepurchased desc) as rn
from purchases
) ranked
where rn = 1;
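On databases that support QUALIFY (such as Snowflake, Teradata or BigQuery), the same idea is shorter; a sketch, assuming the same purchases table:

select clientnumber, articlenumber, price
from purchases
qualify row_number() over (partition by clientnumber, articlenumber
                           order by datepurchased desc) = 1;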
Another option is a correlated subquery that looks up the latest unit price; note that it must correlate on both clientnumber and articlenumber:
SELECT clientnumber,
       articlenumber,
       datepurchased,
       (SELECT yt.price / yt.qty
        FROM yourtable yt
        WHERE yt.articlenumber = yourtable.articlenumber
          AND yt.clientnumber = yourtable.clientnumber
        ORDER BY yt.datepurchased DESC
        LIMIT 1) AS latestPrice,
       qty
FROM yourtable
ORDER BY datepurchased DESC
Or, if you only want the single most recent purchase overall (SQL Server style TOP):
SELECT TOP 1 articlenumber,
clientnumber,
datepurchased,
(yourtable.price / yourtable.qty) as pricePerArticle ,
qty
FROM yourtable
ORDER BY datepurchased DESC