I want to compute inventory costs using average value, and I'm somewhat stuck here...
Consider a simple transaction table tr: (ids are autoincrement, negative volume indicates a sell transaction)
order_id | volume | price | type
1 | 1000 | 100 | B
2 | -500 | 110 | S
3 | 1500 | 80 | B
4 | -100 | 150 | S
5 | -600 | 110 | S
6 | 700 | 105 | B
Now I want to know the total volume and total costs after each transaction. The difficulty is getting the sells right: sells are always valued at the average cost at that point (i.e. the sell price is not relevant here), so transaction order matters.
Optimally, the result would look like this:
order_id | volume | price | total_vol | total_costs | unit_costs
1 | 1000 | 100 | 1000 | 100000 | 100
2 | -500 | 110 | 500 | 50000 | 100
3 | 1500 | 80 | 2000 | 170000 | 85
4 | -100 | 150 | 1900 | 161500 | 85
5 | -600 | 110 | 1300 | 110500 | 85
6 | 700 | 105 | 2000 | 184000 | 92
Now, total_vol is easy with a sum(volume) over (...); total_costs is another matter. I've played around with window functions, but unless I'm missing something totally obvious (or very clever), I don't think it can be done with window functions alone...
Any help would be appreciated. :)
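For reference, here is the running calculation the expected table encodes, sketched procedurally in Python (outside SQL, just to make the recurrence explicit): each sell is valued at the current average unit cost, each buy adds volume * price.

```python
# Running average-cost inventory: buys add volume * price to total cost;
# sells are valued at the current average unit cost, not the sell price.
def running_costs(transactions):
    total_vol, total_costs, unit_costs = 0, 0.0, 0.0
    result = []
    for order_id, volume, price in transactions:
        total_vol += volume
        if volume < 0:                      # sell: revalue at average cost
            total_costs = total_vol * unit_costs
        else:                               # buy: add the purchase cost
            total_costs += volume * price
        if total_vol != 0:
            unit_costs = total_costs / total_vol
        result.append((order_id, total_vol, total_costs, unit_costs))
    return result

tr = [(1, 1000, 100), (2, -500, 110), (3, 1500, 80),
      (4, -100, 150), (5, -600, 110), (6, 700, 105)]
for row in running_costs(tr):
    print(row)
```

Each row's result depends on the previous row's unit cost, which is exactly why plain window functions fall short and the MODEL-clause answers below work row by row.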
UPDATE:
This is the code I finally used, a combination of both answers (the data model is a bit more complex than my simplified example above, but you get the idea):
select ser_num
, tr_id
, tr_date
, action_typ
, volume
, price
, total_vol
, trunc(total_costs,0) total_costs
, trunc(unit_costs,4) unit_costs
from itt
model
partition by (ser_num)
dimension by (row_number() over (partition by ser_num order by tr_date, tr_id) rn)
measures (tr_id, tr_date, volume, price, action_typ, 0 total_vol, 0 total_costs, 0 unit_costs)
rules automatic order
( total_vol[ANY] order by rn
= nvl(total_vol[cv()-1],0) +
decode(action_typ[cv()], 'Buy', 1, 'Sell', -1) * volume[cv()]
, total_costs[ANY] order by rn
= case action_typ[cv()]
when 'Buy' then volume[cv()] * price[cv()] + nvl(total_costs[cv()-1],0)
when 'Sell' then total_vol[cv()] * nvl(unit_costs[cv()-1],price[cv()])
end
, unit_costs[ANY] order by rn
= decode(total_vol[cv()], 0, unit_costs[cv()-1],
total_costs[cv()] / total_vol[cv()])
)
order by ser_num, tr_date, tr_id
Some observations:
When using partitions and references to the previous cell (cv()-1), the dimension has to be partitioned in the same way as the whole model clause (this is also why using iteration_number can be tricky)
No iteration is needed here as long as you specify the correct evaluation order on the rules (order by rn; edit: automatic order handles this by itself)
Automatic order is probably not necessary here, but it can't hurt.
You can use the MODEL clause to do this recursive calculation
Create sample table and insert data
create table costs (order_id int, volume int, price numeric(16,4), type char(1));
insert into costs (order_id, volume, price, type) values (1,1000,100,'B');
insert into costs (order_id, volume, price, type) values (2,-500,110,'S');
insert into costs (order_id, volume, price, type) values (3,1500,80,'B');
insert into costs (order_id, volume, price, type) values (4,-100,150,'S');
insert into costs (order_id, volume, price, type) values (5,-600,110,'S');
insert into costs (order_id, volume, price, type) values (6,700,105,'B');
The query (EDITED: changing rules iterate(1000) to rules automatic order implements the MODEL clause as it is intended to function, i.e. top to bottom sequentially. It also took the query from 0.44s to 0.01s!)
select order_id, volume, price, total_vol, total_costs, unit_costs
from (select order_id, volume, price,
volume total_vol,
0.0 total_costs,
0.0 unit_costs,
row_number() over (order by order_id) rn
from costs order by order_id)
model
dimension by (order_id)
measures (volume, price, total_vol, total_costs, unit_costs)
rules automatic order -- iterate(1000)
( total_vol[any] = volume[cv()] + nvl(total_vol[cv()-1],0.0),
total_costs[any] =
case SIGN(volume[cv()])
when -1 then total_vol[cv()] * nvl(unit_costs[cv()-1],0.0)
else volume[cv()] * price[cv()] + nvl(total_costs[cv()-1],0.0)
end,
unit_costs[any] = total_costs[cv()] / total_vol[cv()]
)
order by order_id
Output
ORDER_ID VOLUME PRICE TOTAL_VOL TOTAL_COSTS UNIT_COSTS
1 1000 100 1000 100000 100
2 -500 110 500 50000 100
3 1500 80 2000 170000 85
4 -100 150 1900 161500 85
5 -600 110 1300 110500 85
6 700 105 2000 184000 92
This site has a good tutorial on the MODEL clause
http://www.sqlsnippets.com/en/topic-11663.html
The EXCEL sheet for the data above would look like this, with the formulas extended downwards (order_id in column B, volume in C, price in D, total_vol in E, total_costs in F, unit_costs in G):
       B         C       D      E          F                          G
  ---------------------------------------------------------------------------
1| order_id  volume  price  total_vol  total_costs                unit_costs
2|                          0          0                          0
3| 1         1000    100    =C3+E2     =IF(C3<0,G2*E3,F2+C3*D3)   =F3/E3
4| 2         -500    110    =C4+E3     =IF(C4<0,G3*E4,F3+C4*D4)   =F4/E4
5| 3         1500    80     =C5+E4     =IF(C5<0,G4*E5,F4+C5*D5)   =F5/E5
6| 4         -100    150    =C6+E5     =IF(C6<0,G5*E6,F5+C6*D6)   =F6/E6
7| 5         -600    110    =C7+E6     =IF(C7<0,G6*E7,F6+C7*D7)   =F7/E7
8| 6         700     105    =C8+E7     =IF(C8<0,G7*E8,F7+C8*D8)   =F8/E8
There is a problem with Richard's MODEL clause query: it does 1000 iterations without an UNTIL clause. After four iterations the end result is already achieved; the remaining 996 iterations consume CPU but accomplish nothing.
Here you can see that the query is done processing after 4 iterations with the current data set:
SQL> select order_id
2 , volume
3 , price
4 , total_vol
5 , total_costs
6 , unit_costs
7 from ( select order_id
8 , volume
9 , price
10 , volume total_vol
11 , 0.0 total_costs
12 , 0.0 unit_costs
13 , row_number() over (order by order_id) rn
14 from costs
15 order by order_id
16 )
17 model
18 dimension by (order_id)
19 measures (volume, price, total_vol, total_costs, unit_costs)
20 rules iterate (4)
21 ( total_vol[any] = volume[cv()] + nvl(total_vol[cv()-1],0.0)
22 , total_costs[any]
23 = case SIGN(volume[cv()])
24 when -1 then total_vol[cv()] * nvl(unit_costs[cv()-1],0.0)
25 else volume[cv()] * price[cv()] + nvl(total_costs[cv()-1],0.0)
26 end
27 , unit_costs[any] = total_costs[cv()] / total_vol[cv()]
28 )
29 order by order_id
30 /
ORDER_ID VOLUME PRICE TOTAL_VOL TOTAL_COSTS UNIT_COSTS
---------- ---------- ---------- ---------- ----------- ----------
1 1000 100 1000 100000 100
2 -500 110 500 50000 100
3 1500 80 2000 170000 85
4 -100 150 1900 161500 85
5 -600 110 1300 110500 85
6 700 105 2000 184000 92
6 rows selected.
It needs 4 iterations and not 6 because each iteration evaluates the rules for all 6 rows, so results propagate over several rows per pass.
The query performs far better if you use just as many iterations as there are rows, with each iteration adjusting just one row. You can also skip the subquery, and the final query becomes:
SQL> select order_id
2 , volume
3 , price
4 , total_vol
5 , total_costs
6 , unit_costs
7 from costs
8 model
9 dimension by (row_number() over (order by order_id) rn)
10 measures (order_id, volume, price, type, 0 total_vol, 0 total_costs, 0 unit_costs)
11 rules iterate (1000) until (order_id[iteration_number+2] is null)
12 ( total_vol[iteration_number+1]
13 = nvl(total_vol[iteration_number],0) + volume[iteration_number+1]
14 , total_costs[iteration_number+1]
15 = case type[iteration_number+1]
16 when 'B' then volume[iteration_number+1] * price[iteration_number+1] + nvl(total_costs[iteration_number],0)
17 when 'S' then total_vol[iteration_number+1] * nvl(unit_costs[iteration_number],0)
18 end
19 , unit_costs[iteration_number+1]
20 = total_costs[iteration_number+1] / total_vol[iteration_number+1]
21 )
22 order by order_id
23 /
ORDER_ID VOLUME PRICE TOTAL_VOL TOTAL_COSTS UNIT_COSTS
---------- ---------- ---------- ---------- ----------- ----------
1 1000 100 1000 100000 100
2 -500 110 500 50000 100
3 1500 80 2000 170000 85
4 -100 150 1900 161500 85
5 -600 110 1300 110500 85
6 700 105 2000 184000 92
6 rows selected.
Hope this helps.
Regards,
Rob.
EDIT
Some proof to backup my claim:
SQL> create procedure p1 (p_number_of_iterations in number)
2 is
3 begin
4 for x in 1 .. p_number_of_iterations
5 loop
6 for r in
7 ( select order_id
8 , volume
9 , price
10 , total_vol
11 , total_costs
12 , unit_costs
13 from ( select order_id
14 , volume
15 , price
16 , volume total_vol
17 , 0.0 total_costs
18 , 0.0 unit_costs
19 , row_number() over (order by order_id) rn
20 from costs
21 order by order_id
22 )
23 model
24 dimension by (order_id)
25 measures (volume, price, total_vol, total_costs, unit_costs)
26 rules iterate (4)
27 ( total_vol[any] = volume[cv()] + nvl(total_vol[cv()-1],0.0)
28 , total_costs[any]
29 = case SIGN(volume[cv()])
30 when -1 then total_vol[cv()] * nvl(unit_costs[cv()-1],0.0)
31 else volume[cv()] * price[cv()] + nvl(total_costs[cv()-1],0.0)
32 end
33 , unit_costs[any] = total_costs[cv()] / total_vol[cv()]
34 )
35 order by order_id
36 )
37 loop
38 null;
39 end loop;
40 end loop;
41 end p1;
42 /
Procedure created.
SQL> create procedure p2 (p_number_of_iterations in number)
2 is
3 begin
4 for x in 1 .. p_number_of_iterations
5 loop
6 for r in
7 ( select order_id
8 , volume
9 , price
10 , total_vol
11 , total_costs
12 , unit_costs
13 from costs
14 model
15 dimension by (row_number() over (order by order_id) rn)
16 measures (order_id, volume, price, type, 0 total_vol, 0 total_costs, 0 unit_costs)
17 rules iterate (1000) until (order_id[iteration_number+2] is null)
18 ( total_vol[iteration_number+1]
19 = nvl(total_vol[iteration_number],0) + volume[iteration_number+1]
20 , total_costs[iteration_number+1]
21 = case type[iteration_number+1]
22 when 'B' then volume[iteration_number+1] * price[iteration_number+1] + nvl(total_costs[iteration_number],0)
23 when 'S' then total_vol[iteration_number+1] * nvl(unit_costs[iteration_number],0)
24 end
25 , unit_costs[iteration_number+1]
26 = total_costs[iteration_number+1] / total_vol[iteration_number+1]
27 )
28 order by order_id
29 )
30 loop
31 null;
32 end loop;
33 end loop;
34 end p2;
35 /
Procedure created.
SQL> set timing on
SQL> exec p1(1000)
PL/SQL procedure successfully completed.
Elapsed: 00:00:01.32
SQL> exec p2(1000)
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.45
SQL> exec p1(1000)
PL/SQL procedure successfully completed.
Elapsed: 00:00:01.28
SQL> exec p2(1000)
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.43
I need to select all orders' data, including orders which led to a transaction and those which didn't.
Knowing that:
SELECT * FROM Buy_Orders
OrderID OrderQuantity OrderPrice OrderPlacementDate
-----------------------------------------------------------------
12 11 103 2021-10-12 14:02:22.703
14 6 100 2021-10-12 14:04:24.700
14 0 100 2021-10-12 14:07:27.206
17 3 80 2021-10-12 14:08:22.703
12 0 103 2021-10-12 14:09:21.501
20 20 23 2021-10-12 14:11:23.705
SELECT * FROM Sell_Orders
OrderID OrderQuantity OrderPrice OrderPlacementDate
--------------------------------------------------------------
9 2 13 2021-10-12 14:05:25.705
23 7 100 2021-10-12 14:07:27.205
23 1 100 2021-10-12 14:07:27.206
33 9 90 2021-10-12 14:08:28.403
90 1 103 2021-10-12 14:09:21.500
90 0 103 2021-10-12 14:09:21.501
SELECT * FROM Transactions
TransactionID TransactionQuantity TransactionPrice SellOrderID BuyOrderID
---------------------------------------------------------------------------------------
113 6 100 23 14
123 1 103 90 12
Logic for TransactionID 113 (SellOrderID 23 + BuyOrderID 14): the transaction is created when order 23 enters the order book at 2021-10-12 14:07:27.205 and matches order 14 (partial fill). That's why there is an update on both impacted orders (23 & 14) at 2021-10-12 14:07:27.206 in tables Sell_Orders and Buy_Orders: the match with quantity=6 creates an update on order 23 to re-enter the order book with quantity=1 and an update on order 14 to re-enter the order book with quantity=0 at 2021-10-12 14:07:27.206.
I have tried the following SQL query but with no luck. I assume I'm not fluent enough with SQL. Please help!
SELECT
o.OrderID,
o.OrderQuantity,
o.OrderPlacementDate,
t.TransactionID
FROM (
SELECT *
from
Sell_Orders
UNION
SELECT *
from
Buy_Orders ) o
LEFT JOIN (
SELECT
TransactionID
FROM
Transactions ) t on t.SellOrderID = o.OrderID or t.BuyOrderID = o.OrderID
I expect to have this table as an output:
OrderID TransactionID OrderQuantity OrderPrice OrderPlacementDate
---------------------------------------------------------------------------
12 NULL 1 103 2021-10-12 14:02:22.703
14 NULL 6 100 2021-10-12 14:04:24.700
9 NULL 2 13 2021-10-12 14:05:25.705
23 NULL 7 100 2021-10-12 14:07:27.205 -----> 1st Transaction
23 113 1 100 2021-10-12 14:07:27.206
14 113 0 100 2021-10-12 14:07:27.206
17 NULL 3 80 2021-10-12 14:08:22.703
33 NULL 9 90 2021-10-12 14:08:28.403
90 NULL 1 103 2021-10-12 14:09:21.500 -----> 2nd Transaction
90 123 0 103 2021-10-12 14:09:21.501
12 123 0 103 2021-10-12 14:09:21.501
20 NULL 20 23 2021-10-12 14:11:23.705
Your last subquery doesn't include the order IDs, so there's nothing to join t on. Just don't use a subquery.
LEFT JOIN (
SELECT
TransactionID
FROM
Transactions ) t on t.SellOrderID = o.OrderID or t.BuyOrderID = o.OrderID
Becomes...
LEFT JOIN
Transactions t
ON t.SellOrderID = o.OrderID
OR t.BuyOrderID = o.OrderID
EDIT:
You also want each transaction to join on just one buy order, and one sell order, which requires adding a ranking id to each order.
Provided that (OrderID, OrderPlacementDate) is guaranteed to be unique, that can be accomplished with...
SELECT
o.*,
t.TransactionID
FROM
(
SELECT
*,
'Sell' AS OrderType,
ROW_NUMBER() OVER (PARTITION BY OrderID ORDER BY OrderPlacementDate DESC) AS OrderIDRank
FROM
Sell_Orders
UNION ALL -- ALWAYS use ALL unless you KNOW a reason otherwise
SELECT
*,
'Buy' AS OrderType,
ROW_NUMBER() OVER (PARTITION BY OrderID ORDER BY OrderPlacementDate DESC) AS OrderIDRank
FROM
Buy_Orders
)
AS o
LEFT JOIN
Transactions AS t
ON o.OrderIDRank = 1
AND o.OrderID IN (t.BuyOrderID, t.SellOrderID)
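The intent of that ranking join, only the latest row of each order carries the TransactionID, can be sketched procedurally. This Python illustration uses truncated timestamps and a simplified order-to-transaction dict (both sides collapsed into one mapping), so it is a sketch of the logic, not the actual tables:

```python
# Rank each order's rows newest-first; only the newest row of an order
# is joined to its transaction, so earlier rows keep TransactionID = None.
orders = [  # (OrderID, OrderQuantity, OrderPlacementDate)
    (12, 11, '14:02:22.703'), (14, 6, '14:04:24.700'),
    (23, 7, '14:07:27.205'), (23, 1, '14:07:27.206'),
    (14, 0, '14:07:27.206'), (12, 0, '14:09:21.501'),
]
transactions = {23: 113, 14: 113, 90: 123, 12: 123}  # OrderID -> TransactionID

def latest_dates(rows):
    # Latest placement date per order id (the rank-1 row of the DESC ranking).
    latest = {}
    for oid, _, date in rows:
        latest[oid] = max(latest.get(oid, ''), date)
    return latest

latest = latest_dates(orders)
joined = [(oid, transactions.get(oid) if date == latest[oid] else None, qty, date)
          for oid, qty, date in orders]
```

Orders with no transaction at all (like order 20 in the question) would simply get None on every row, which matches the LEFT JOIN behaviour.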
This query gets the expected table:
with All_Orders as(
SELECT *,'S' as side from Sell_Orders
UNION ALL
SELECT *,'B' as side from Buy_Orders
)
SELECT
o.OrderID
,case when TransactionQuantity<=OrderQuantity then null
else TransactionId
end TransactionId
,o.OrderQuantity
,o.OrderPlacementDate
-- for info
,t.TransactionID as realTranId
,o.Side,TransactionQuantity
FROM All_Orders o LEFT JOIN Transactions t on t.SellOrderID = o.OrderID or t.BuyOrderID = o.OrderID
order by OrderPlacementDate
Note that the source row
12, 11, 103, 2021-10-12 14:02:22.703
should perhaps be
12, 1, 103, 2021-10-12 14:02:22.703
DBFiddle example
I have 2 tables, DATA and MAIN. The DATA table holds the raw data extracted from Excel, and the MAIN table holds the data after validation (a few rules) has been applied.
Rule:
If Amount1<>'' and Amount2<>'':
Insert 2 rows into the MAIN table. The first row takes Amount and Percentage from Amount2 & Percentage2, with TaxRateType = TAXABLE.
The second row takes Amount and Percentage from Amount1 & Percentage1, with TaxRateType = EXEMPT. The InvoiceNo is also suffixed with '_1'.
If Amount2<>'' and Amount1='':
Insert 1 row with Amount and Percentage from Amount2 & Percentage2, with TaxRateType = TAXABLE.
Else:
Insert 1 row with Amount and Percentage from Amount1 & Percentage1, with TaxRateType = EXEMPT.
The example is as below table:
**DATA**
InvoiceNo | TotalAmount | Percentage1 | Amount1 | Percentage2 | Amount2
abc123    | 100         | 5           | 45      | 20          | 55
abc124    | 60          | 5           | 60      | 20          |
abc125    | 50          | 5           |         | 22          | 50
**MAIN**
InvoiceNo | Percentage | Amount | TaxRateType | ReferenceValue
abc123    | 20         | 55     | TAXABLE     | 2
abc123_1  | 5          | 45     | EXEMPT      | 1
abc124    | 5          | 60     | EXEMPT      | 1
abc125    | 22         | 50     | TAXABLE     | 2
I've been stuck here for 4 hours searching for which method to use. Currently my idea is to use IF EXISTS, but it's still not correct, and it doesn't feel like a good method anyway.
IF EXISTS (SELECT ID from [alcatel].[Main_Temp] where Amount1<>'' and Amount2<>'')
BEGIN
INSERT INTO [alcatel].[Main]
( [Country],[InvoiceNo],[Amount],[Percentage],[TaxRateType],[Reference Value])
SELECT
[Country],[InvoiceNo],[Amount2],[Percentage2],'TAXABLE' as [TaxRateType],2 as [Reference Value]
FROM [alcatel].[Data];
INSERT INTO [alcatel].[Main]
( [Country],[InvoiceNo],[Amount],[Percentage],[TaxRateType],[Reference Value])
SELECT
[Country],[InvoiceNo]+'_1' as InvoiceNo,[Amount1],[Percentage1],'EXEMPT' as [TaxRateType],'1' as [Reference Value]
FROM [alcatel].[Data];
END
Followed with other condition.
I think you just want to unpivot the data with some logic:
select invoiceno + v.suffix, v.percentage, v.amount,
       v.taxratetype, v.referencevalue
from data d cross apply
     (values (1, d.Percentage1, d.Amount1, 'EXEMPT', (case when d.amount2 is not null then '_1' else '' end)),
             (2, d.Percentage2, d.Amount2, 'TAXABLE', '')
     ) v(ReferenceValue, Percentage, Amount, TaxRateType, Suffix)
where v.amount is not null;
Here is a db<>fiddle.
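A procedural sketch of the same unpivot-with-logic may make the row generation explicit (Python rather than T-SQL, assuming empty amounts arrive as None; per the rules, the EXEMPT row gets the '_1' suffix only when both amounts are present):

```python
# For each DATA row, emit up to two MAIN rows: an EXEMPT row from
# (Percentage1, Amount1) and a TAXABLE row from (Percentage2, Amount2).
# Rows whose amount is missing are filtered out, mirroring the WHERE clause.
def unpivot(invoiceno, pct1, amt1, pct2, amt2):
    candidates = [
        (invoiceno + ('_1' if amt2 is not None else ''), pct1, amt1, 'EXEMPT', 1),
        (invoiceno, pct2, amt2, 'TAXABLE', 2),
    ]
    return [row for row in candidates if row[2] is not None]

data = [('abc123', 5, 45, 20, 55),
        ('abc124', 5, 60, 20, None),
        ('abc125', 5, None, 22, 50)]
main = [row for d in data for row in unpivot(*d)]
```

This reproduces the four expected MAIN rows: abc123 and abc123_1 (both amounts present), plus one row each for abc124 and abc125.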
I'm working with Oracle and cannot achieve the query I need for the moment.
Suppose I have the following table:
ID  Date        Type  Value
1   01/12/2016  prod  1
2   01/01/2017  test  10
3   01/06/2017  test  20
4   01/12/2017  prod  30
5   15/12/2017  test  40
6   01/01/2018  test  50
7   01/06/2018  test  60
8   01/12/2018  prod  70
I need to sum the VALUEs between "prod" rows, plus the closing "prod" VALUE itself.
The results should be:
1   01/12/2016  1
2   01/01/2017  60
3   01/06/2017  60
4   01/12/2017  60
5   15/12/2017  220
6   01/01/2018  220
7   01/06/2018  220
8   01/12/2018  220
I first had to sum VALUES by YEAR without taking TYPES into account.
The requirement changed, and now I don't see how to identify, for each line, the previous "prod" DATE and sum each VALUE up to and including the next "prod" row.
Thanks
You can define the groups using a cumulative sum on type = 'prod' -- in reverse -- then use a window function for the final summation:
select t.*,
       sum(value) over (partition by grp) as total
from (select t.*,
             sum(case when type = 'prod' then 1 else 0 end) over (order by id desc) as grp
      from t
     ) t
order by id;
To see the grouping logic, look at:
ID Date Type Value Grp
1 01/12/2016 prod 1 3
2 01/01/2017 test 10 2
3 01/06/2017 test 20 2
4 01/12/2017 prod 30 2
5 15/12/2017 test 40 1
6 01/01/2018 test 50 1
7 01/06/2018 test 60 1
8 01/12/2018 prod 70 1
This identifies the groups that need to be summed. The DESC is because "prod" ends a group. If "prod" started a group (i.e. was included with the sum on the next row), then ASC would be used.
Rextester Demo
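The reverse cumulative-sum trick is easy to verify procedurally; here is a Python sketch of the same grouping logic:

```python
# Walk the rows in reverse id order, counting 'prod' rows to form groups
# (each 'prod' row closes a group); then sum each group's values.
rows = [(1, 'prod', 1), (2, 'test', 10), (3, 'test', 20), (4, 'prod', 30),
        (5, 'test', 40), (6, 'test', 50), (7, 'test', 60), (8, 'prod', 70)]

grp, groups = 0, {}
for rid, typ, val in reversed(rows):   # cumulative sum over id desc
    if typ == 'prod':
        grp += 1
    groups[rid] = grp

totals = {}
for rid, typ, val in rows:
    totals[groups[rid]] = totals.get(groups[rid], 0) + val

result = [(rid, totals[groups[rid]]) for rid, _, _ in rows]
```

Rows 2-4 land in one group (total 60) and rows 5-8 in another (total 220), matching the expected output.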
Gordon Linoff's answer is great.
This below is just for a bit of a different flavor(12c+)
Setup:
ALTER SESSION SET NLS_DATE_FORMAT = 'DD/MM/YYYY';
CREATE TABLE TEST_TABLE(
THE_ID INTEGER,
THE_DATE DATE,
THE_TYPE CHAR(4),
THE_VALUE INTEGER);
INSERT INTO TEST_TABLE VALUES (1,TO_DATE('01/12/2016'),'prod',1);
INSERT INTO TEST_TABLE VALUES (2,TO_DATE('01/01/2017'),'test',10);
INSERT INTO TEST_TABLE VALUES (3,TO_DATE('01/06/2017'),'test',20);
INSERT INTO TEST_TABLE VALUES (4,TO_DATE('01/12/2017'),'prod',30);
INSERT INTO TEST_TABLE VALUES (5,TO_DATE('15/12/2017'),'test',40);
INSERT INTO TEST_TABLE VALUES (6,TO_DATE('01/01/2018'),'test',50);
INSERT INTO TEST_TABLE VALUES (7,TO_DATE('01/06/2018'),'test',60);
INSERT INTO TEST_TABLE VALUES (8,TO_DATE('01/12/2018'),'prod',70);
COMMIT;
Query:
SELECT
THE_ID, THE_DATE, MAX(RUNNING_GROUP_SUM) OVER (PARTITION BY THE_MATCH_NUMBER) AS GROUP_SUM
FROM TEST_TABLE
MATCH_RECOGNIZE (
ORDER BY THE_ID
MEASURES
MATCH_NUMBER() AS THE_MATCH_NUMBER,
RUNNING SUM(THE_VALUE) AS RUNNING_GROUP_SUM
ALL ROWS PER MATCH
AFTER MATCH SKIP PAST LAST ROW
PATTERN (TEST_TARGET{0,} PROD_TARGET)
DEFINE TEST_TARGET AS THE_TYPE = 'test',
PROD_TARGET AS THE_TYPE = 'prod')
ORDER BY THE_ID ASC;
Result:
THE_ID THE_DATE GROUP_SUM
---------- ---------- ----------
1 01/12/2016 1
2 01/01/2017 60
3 01/06/2017 60
4 01/12/2017 60
5 15/12/2017 220
6 01/01/2018 220
7 01/06/2018 220
8 01/12/2018 220
I need to make a counter that increments every time a value changes over time.
I have this table:
Date | Quantity
2017-02-01 | 10000
2017-02-02 | 20000
2017-02-03 | 20000
2017-02-04 | 20000
2017-02-05 | 10000
2017-02-06 | 10000
I want to make something like this:
Date | Quantity | Counter
2017-02-01 | 10000 | 1
2017-02-02 | 20000 | 2
2017-02-03 | 20000 | 2
2017-02-04 | 20000 | 2
2017-02-05 | 10000 | 3
2017-02-06 | 10000 | 3
I tried using dense_rank and other functions, but I couldn't get this result because they assign the same counter number whenever the quantity returns to 10000.
Is what I'm asking even possible?
Thank you!
A simple method is to use lag() and a cumulative sum:
select t.date, t.quantity,
sum(case when quantity = prev_quantity then 0 else 1 end) over (order by date) as counter
from (select t.*, lag(quantity) over (order by date) as prev_quantity
from t
) t;
These are ANSI standard functions and available in most databases.
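The same lag-plus-running-sum idea, sketched procedurally in Python:

```python
# Counter increments whenever the quantity differs from the previous row's
# quantity (the lag); the running sum of those change flags is the counter.
rows = [('2017-02-01', 10000), ('2017-02-02', 20000), ('2017-02-03', 20000),
        ('2017-02-04', 20000), ('2017-02-05', 10000), ('2017-02-06', 10000)]

counter, prev, result = 0, None, []
for date, qty in rows:
    if qty != prev:   # lag comparison; the first row always counts as a change
        counter += 1
    result.append((date, qty, counter))
    prev = qty
```

This yields counters 1, 2, 2, 2, 3, 3 for the sample data, matching the expected table.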
Simple solution for Oracle 12 and above only, using the MATCH_RECOGNIZE clause:
with
test_data ( dt, quantity ) as (
select date '2017-02-01', 10000 from dual union all
select date '2017-02-02', 20000 from dual union all
select date '2017-02-03', 20000 from dual union all
select date '2017-02-04', 20000 from dual union all
select date '2017-02-05', 10000 from dual union all
select date '2017-02-06', 10000 from dual
)
-- end of test data, for illustration only; WITH clause is NOT part of the query
-- solution (SQL query) begins BELOW THIS LINE
select dt, quantity, mn as counter
from test_data
match_recognize (
order by dt
measures match_number() as mn
all rows per match
pattern ( a b* )
define b as b.quantity = a.quantity
)
;
DT QUANTITY COUNTER
---------- ---------- ----------
2017-02-01 10000 1
2017-02-02 20000 2
2017-02-03 20000 2
2017-02-04 20000 2
2017-02-05 10000 3
2017-02-06 10000 3
6 rows selected.
In Oracle, the LISTAGG function can be used analytically with an OVER (PARTITION BY column..) clause. However, it does not support windowing with the ROWS or RANGE keywords.
I have a data set from a store register (simplified for the question). Note that the register table's quantity is always 1 - one item, one transaction line.
TranID TranLine ItemId OrderID Dollars Quantity
------ -------- ------ ------- ------- --------
1 101 23845 23 2.99 1
1 102 23845 23 2.99 1
1 103 23845 23 2.99 1
1 104 23845 23 2.99 1
1 105 23845 23 2.99 1
I have to "match" this data to a table in an special order system where items are grouped by quantity. Note that the system can have the same item ID on multiple lines (components ordered may be different even if the item is the same).
ItemId OrderID Order Line Dollars Quantity
------ ------- ---------- ------- --------
23845 23 1 8.97 3
23845 23 2 5.98 2
The only way I can match this data is by order id, item id and dollar amount.
Essentially I need to get to the following result.
ItemId OrderID Order Line Dollars Quantity Tran ID Tran Lines
------ ------- ---------- ------- -------- ------- ----------
23845 23 1 8.97 3 1 101;102;103
23845 23 2 5.98 2 1 104;105
I don't specifically care if the tran lines are ordered in any way, all I care is that the dollar amounts match and that I don't "re-use" a line from the register in computing the total on the special order. I don't need the tran lines broken out into a table - this is for reporting purposes and the granularity never goes back down to the register transaction line level.
My initial thinking was that I could use analytic functions to do a "best match", identifying the first set of rows that sum to the dollar amount and quantity in the ordering system, giving me a result set like:
TranID TranLine ItemId OrderID Dollars Quantity CumDollar CumQty
------ -------- ------ ------- ------- -------- -------- ------
1 101 23845 23 2.99 1 2.99 1
1 102 23845 23 2.99 1 5.98 2
1 103 23845 23 2.99 1 8.97 3
1 104 23845 23 2.99 1 11.96 4
1 105 23845 23 2.99 1 14.95 5
So far so good. But I then try to add LISTAGG to my query:
SELECT tranid, tranline, itemid, orderid, dollars, quantity,
SUM(dollars) OVER (partition by tranid, itemid, orderid order by tranline) cumdollar,
SUM(quantity) OVER (partition by tranid, itemid, orderid order by tranline) cumqty,
LISTAGG (tranline, ';') within group (order by tranid, itemid, orderid, tranline) OVER (partition by tranid, itemid, orderid)
FROM table
I discover that it always returns a full agg instead of a cumulative agg:
TranID TranLine ItemId OrderID Dollars Quantity CumDollar CumQty ListAgg
------ -------- ------ ------- ------- -------- -------- ------ -------
1 101 23845 23 2.99 1 2.99 1 101;102;103;104;105
1 102 23845 23 2.99 1 5.98 2 101;102;103;104;105
1 103 23845 23 2.99 1 8.97 3 101;102;103;104;105
1 104 23845 23 2.99 1 11.96 4 101;102;103;104;105
1 105 23845 23 2.99 1 14.95 5 101;102;103;104;105
So this isn't useful.
I would much prefer to do this in SQL if at all possible. I am aware that I can do this with cursors & procedural logic.
Is there any way to do windowing with the LISTAGG analytic function, or perhaps another analytic function which would support this?
I'm on 11gR2.
The only way I can think of to achieve this is with a correlated subquery:
WITH CTE AS
( SELECT TranID,
TranLine,
ItemID,
OrderID,
Dollars,
Quantity,
SUM(dollars) OVER (PARTITION BY TranID, ItemID, OrderID ORDER BY TranLine) AS CumDollar,
SUM(Quantity) OVER (PARTITION BY TranID, ItemID, OrderID ORDER BY TranLine) AS CumQuantity
FROM T
)
SELECT TranID,
TranLine,
ItemID,
OrderID,
Dollars,
Quantity,
CumDollar,
CumQuantity,
( SELECT LISTAGG(Tranline, ';') WITHIN GROUP(ORDER BY CumQuantity)
FROM CTE T2
WHERE T1.CumQuantity >= T2.CumQuantity
AND T1.ItemID = T2.ItemID
AND T1.OrderID = T2.OrderID
AND T1.TranID = T2.TranID
GROUP BY tranid, itemid, orderid
) AS ListAgg
FROM CTE T1;
I realise this doesn't give the exact output you were asking for, but hopefully it is enough to overcome the problem of the cumulative LISTAGG and get you on your way.
I've set up an SQL Fiddle to demonstrate the solution.
In your example, your store register table contains 5 rows and your special order system table contains 2 rows. Your expected result set contains the two rows from your special order system table, and all "tranlines" of your store register table should be mentioned in the "Tran Lines" column.
This means you need to aggregate those 5 rows into 2 rows, so you don't need the LISTAGG analytic function but the LISTAGG aggregate function.
Your challenge is to join the rows of the store register table to the right row in the special order system table. You were well on your way by calculating the running sums of dollars and quantities. The only step missing is to define ranges of dollars and quantities by which you can assign each store register row to a special order system row.
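Before the SQL, the range-assignment idea can be sketched procedurally (Python, using the question's numbers): each order line covers a range of cumulative quantities, and a register row joins the order line whose range contains its running total.

```python
# Bucket each register row by where its running quantity falls among the
# order lines' cumulative quantity ranges, then aggregate the tran lines.
register = [(101, 2.99, 1), (102, 2.99, 1), (103, 2.99, 1),
            (104, 2.99, 1), (105, 2.99, 1)]   # (tranline, dollars, qty)
order_lines = [(1, 8.97, 3), (2, 5.98, 2)]    # (order_line, dollars, qty)

# Build (order_line, begin_qty, end_qty) ranges from running quantities.
ranges, running = [], 0
for line, dollars, qty in order_lines:
    ranges.append((line, running + 1, running + qty))
    running += qty

matched, cum_qty = {}, 0
for tranline, dollars, qty in register:
    cum_qty += qty
    for line, lo, hi in ranges:
        if lo <= cum_qty <= hi:
            matched.setdefault(line, []).append(tranline)

tran_lines = {line: ';'.join(map(str, tls)) for line, tls in matched.items()}
```

This assigns lines 101-103 to order line 1 and 104-105 to order line 2, exactly the grouping the SQL below produces (the SQL additionally ranges over dollars, which matters when quantities alone are ambiguous).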
Here is an example. First define the tables:
SQL> create table store_register_table (tranid,tranline,itemid,orderid,dollars,quantity)
2 as
3 select 1, 101, 23845, 23, 2.99, 1 from dual union all
4 select 1, 102, 23845, 23, 2.99, 1 from dual union all
5 select 1, 103, 23845, 23, 2.99, 1 from dual union all
6 select 1, 104, 23845, 23, 2.99, 1 from dual union all
7 select 1, 105, 23845, 23, 2.99, 1 from dual
8 /
Table created.
SQL> create table special_order_system_table (itemid,orderid,order_line,dollars,quantity)
2 as
3 select 23845, 23, 1, 8.97, 3 from dual union all
4 select 23845, 23, 2, 5.98, 2 from dual
5 /
Table created.
And the query:
SQL> with t as
2 ( select tranid
3 , tranline
4 , itemid
5 , orderid
6 , sum(dollars) over (partition by itemid,orderid order by tranline) running_sum_dollars
7 , sum(quantity) over (partition by itemid,orderid order by tranline) running_sum_quantity
8 from store_register_table srt
9 )
10 , t2 as
11 ( select itemid
12 , orderid
13 , order_line
14 , dollars
15 , quantity
16 , sum(dollars) over (partition by itemid,orderid order by order_line) running_sum_dollars
17 , sum(quantity) over (partition by itemid,orderid order by order_line) running_sum_quantity
18 from special_order_system_table
19 )
20 , t3 as
21 ( select itemid
22 , orderid
23 , order_line
24 , dollars
25 , quantity
26 , 1 + lag(running_sum_dollars,1,0) over (partition by itemid,orderid order by order_line) begin_sum_dollars
27 , running_sum_dollars end_sum_dollars
28 , 1 + lag(running_sum_quantity,1,0) over (partition by itemid,orderid order by order_line) begin_sum_quantity
29 , running_sum_quantity end_sum_quantity
30 from t2
31 )
32 select t3.itemid "ItemID"
33 , t3.orderid "OrderID"
34 , t3.order_line "Order Line"
35 , t3.dollars "Dollars"
36 , t3.quantity "Quantity"
37 , t.tranid "Tran ID"
38 , listagg(t.tranline,';') within group (order by t3.itemid,t3.orderid) "Tran Lines"
39 from t3
40 inner join t
41 on ( t.itemid = t3.itemid
42 and t.orderid = t3.orderid
43 and t.running_sum_dollars between t3.begin_sum_dollars and t3.end_sum_dollars
44 and t.running_sum_quantity between t3.begin_sum_quantity and t3.end_sum_quantity
45 )
46 group by t3.itemid
47 , t3.orderid
48 , t3.order_line
49 , t3.dollars
50 , t3.quantity
51 , t.tranid
52 /
ItemID OrderID Order Line Dollars Quantity Tran ID Tran Lines
---------- ---------- ---------- ---------- ---------- ---------- --------------------
23845 23 1 8.97 3 1 101;102;103
23845 23 2 5.98 2 1 104;105
2 rows selected.
Regards,
Rob.