LISTAGG equivalent with windowing clause

LISTAGG equivalent with windowing clause - sql

In oracle, the LISTAGG function allows me to use it analytically with a OVER (PARTITION BY column..) clause. However, it does not support use of windowing with the ROWS or RANGE keywords.
I have a data set from a store register (simplified for the question). Note that the register table's quantity is always 1 - one item, one transaction line.
TranID TranLine ItemId OrderID Dollars Quantity
------ -------- ------ ------- ------- --------
1 101 23845 23 2.99 1
1 102 23845 23 2.99 1
1 103 23845 23 2.99 1
1 104 23845 23 2.99 1
1 105 23845 23 2.99 1
I have to "match" this data to a table in an special order system where items are grouped by quantity. Note that the system can have the same item ID on multiple lines (components ordered may be different even if the item is the same).
ItemId OrderID Order Line Dollars Quantity
------ ------- ---------- ------- --------
23845 23 1 8.97 3
23845 23 2 5.98 2
The only way I can match this data is by order id, item id and dollar amount.
Essentially I need to get to the following result.
ItemId OrderID Order Line Dollars Quantity Tran ID Tran Lines
------ ------- ---------- ------- -------- ------- ----------
23845 23 1 8.97 3 1 101;102;103
23845 23 2 5.98 2 1 104;105
I don't specifically care if the tran lines are ordered in any way, all I care is that the dollar amounts match and that I don't "re-use" a line from the register in computing the total on the special order. I don't need the tran lines broken out into a table - this is for reporting purposes and the granularity never goes back down to the register transaction line level.
My initial thinking was that I can do this with analytic functions to do a "best match" to identify the the first set of rows that match to the dollar amount and quantity in the ordering system, giving me a result set like:
TranID TranLine ItemId OrderID Dollars Quantity CumDollar CumQty
------ -------- ------ ------- ------- -------- -------- ------
1 101 23845 23 2.99 1 2.99 1
1 102 23845 23 2.99 1 5.98 2
1 103 23845 23 2.99 1 8.97 3
1 104 23845 23 2.99 1 11.96 4
1 105 23845 23 2.99 1 14.95 5
So far so good. But I then try to add LISTAGG to my query:
SELECT tranid, tranline, itemid, orderid, dollars, quantity,
SUM(dollars) OVER (partition by tranid, itemid, orderid order by tranline) cumdollar,
SUM(quantity) OVER (partition by tranid, itemid, orderid order by tranline) cumqty
LISTAGG (tranline) within group (order by tranid, itemid, orderid, tranline) OVER (partition by tranid, itemid, orderid)
FROM table
I discover that it always returns a full agg instead of a cumulative agg:
TranID TranLine ItemId OrderID Dollars Quantity CumDollar CumQty ListAgg
------ -------- ------ ------- ------- -------- -------- ------ -------
1 101 23845 23 2.99 1 2.99 1 101;102;103;104;105
1 102 23845 23 2.99 1 5.98 2 101;102;103;104;105
1 103 23845 23 2.99 1 8.97 3 101;102;103;104;105
1 104 23845 23 2.99 1 11.96 4 101;102;103;104;105
1 105 23845 23 2.99 1 14.95 5 101;102;103;104;105
So this isn't useful.
I would much prefer to do this in SQL if at all possible. I am aware that I can do this with cursors & procedural logic.
Is there any way to do windowing with the LISTAGG analytic function, or perhaps another analytic function which would support this?
I'm on 11gR2.

The only way I can think of to achieve this is with a correlated subquery:
WITH CTE AS
( SELECT TranID,
TranLine,
ItemID,
OrderID,
Dollars,
Quantity,
SUM(dollars) OVER (PARTITION BY TranID, ItemID, OrderID ORDER BY TranLine) AS CumDollar,
SUM(Quantity) OVER (PARTITION BY TranID, ItemID, OrderID ORDER BY TranLine) AS CumQuantity
FROM T
)
SELECT TranID,
TranLine,
ItemID,
OrderID,
Dollars,
Quantity,
CumDollar,
CumQuantity,
( SELECT LISTAGG(Tranline, ';') WITHIN GROUP(ORDER BY CumQuantity)
FROM CTE T2
WHERE T1.CumQuantity >= T2.CumQuantity
AND T1.ItemID = T2.ItemID
AND T1.OrderID = T2.OrderID
AND T1.TranID = T2.TranID
GROUP BY tranid, itemid, orderid
) AS ListAgg
FROM CTE T1;
I realise this doesn't give the exact output you were asking for, but hopefully it is enough to overcome the problem of the cumulative LISTAGG and get you on your way.
I've set up an SQL Fiddle to demonstrate the solution.

In your example, your store register table contains 5 rows and your special order system table contains 2 rows. Your expected result set contains the two rows from your special order system table and all "tranlines" of your store register table should be mentioned in the "Tran Line" column.
This means you need to aggregate those 5 rows to 2 rows. Meaning you don't need the LISTAGG analytic function, but the LISTAGG aggregate function.
Your challenge is to join the rows of the store register table to the right row in the special order system table. You were well on your way by calculating the running sum of dollars and quantities. The only step missing is to define ranges of dollars and quantities by which you can assign each store register row to each special order system row.
Here is an example. First define the tables:
SQL> create table store_register_table (tranid,tranline,itemid,orderid,dollars,quantity)
2 as
3 select 1, 101, 23845, 23, 2.99, 1 from dual union all
4 select 1, 102, 23845, 23, 2.99, 1 from dual union all
5 select 1, 103, 23845, 23, 2.99, 1 from dual union all
6 select 1, 104, 23845, 23, 2.99, 1 from dual union all
7 select 1, 105, 23845, 23, 2.99, 1 from dual
8 /
Table created.
SQL> create table special_order_system_table (itemid,orderid,order_line,dollars,quantity)
2 as
3 select 23845, 23, 1, 8.97, 3 from dual union all
4 select 23845, 23, 2, 5.98, 2 from dual
5 /
Table created.
And the query:
SQL> with t as
2 ( select tranid
3 , tranline
4 , itemid
5 , orderid
6 , sum(dollars) over (partition by itemid,orderid order by tranline) running_sum_dollars
7 , sum(quantity) over (partition by itemid,orderid order by tranline) running_sum_quantity
8 from store_register_table srt
9 )
10 , t2 as
11 ( select itemid
12 , orderid
13 , order_line
14 , dollars
15 , quantity
16 , sum(dollars) over (partition by itemid,orderid order by order_line) running_sum_dollars
17 , sum(quantity) over (partition by itemid,orderid order by order_line) running_sum_quantity
18 from special_order_system_table
19 )
20 , t3 as
21 ( select itemid
22 , orderid
23 , order_line
24 , dollars
25 , quantity
26 , 1 + lag(running_sum_dollars,1,0) over (partition by itemid,orderid order by order_line) begin_sum_dollars
27 , running_sum_dollars end_sum_dollars
28 , 1 + lag(running_sum_quantity,1,0) over (partition by itemid,orderid order by order_line) begin_sum_quantity
29 , running_sum_quantity end_sum_quantity
30 from t2
31 )
32 select t3.itemid "ItemID"
33 , t3.orderid "OrderID"
34 , t3.order_line "Order Line"
35 , t3.dollars "Dollars"
36 , t3.quantity "Quantity"
37 , t.tranid "Tran ID"
38 , listagg(t.tranline,';') within group (order by t3.itemid,t3.orderid) "Tran Lines"
39 from t3
40 inner join t
41 on ( t.itemid = t3.itemid
42 and t.orderid = t3.orderid
43 and t.running_sum_dollars between t3.begin_sum_dollars and t3.end_sum_dollars
44 and t.running_sum_quantity between t3.begin_sum_quantity and t3.end_sum_quantity
45 )
46 group by t3.itemid
47 , t3.orderid
48 , t3.order_line
49 , t3.dollars
50 , t3.quantity
51 , t.tranid
52 /
ItemID OrderID Order Line Dollars Quantity Tran ID Tran Lines
---------- ---------- ---------- ---------- ---------- ---------- --------------------
23845 23 1 8.97 3 1 101;102;103
23845 23 2 5.98 2 1 104;105
2 rows selected.
Regards,
Rob.

Related

analytical functions

good afternoon, a question, how can I optimize the code, I don't know, maybe using oracle analytical functions :
-- tabledeuda : this table contains 2 months 202212 and 202211
SELECT B.*,
NVL(B.DEUDAPRESTAMO_PAGPER,0)-NVL(A.DEUDAPRESTAMO_PAGPER,0) AS SALE_CT -- current month - previous month
FROM tabledeuda B
LEFT JOIN tabledeuda A ON (A.CODLLAVE = B.CODLLAVE
AND A.CODMES = TO_NUMBER(TO_CHAR(ADD_MONTHS(TO_DATE(B.CODMES,'YYYYMM'),-1),'YYYYMM'))
AND A.financial_company = B.financial_company
AND A.CODMONEY=B.CODMONEY)
WHERE NVL(B.DEUDAPRESTAMO_PAGPER,0)>NVL(A.DEUDAPRESTAMO_PAGPER,0)
AND B.CODMES = &CODMES; ---> &CODMES 202212
OUTPUT

Looks like a candidate for lag analytic function.
Sample data is rather poor so it is unclear what happens when there's more data, but - that's the general idea.
Sample data:
SQL> with test (codmes, customer, deudaprestamo_pagper) as
2 (select 202212, 'T1009', 200 from dual union all
3 select 202211, 'T1009', 150 from dual
4 )
Query:
5 select codmes, customer,
6 deudaprestamo_pagper,
7 deudaprestamo_pagper -
8 lag(deudaprestamo_pagper) over (partition by customer order by codmes) sale_ct
9 from test;
CODMES CUSTO DEUDAPRESTAMO_PAGPER SALE_CT
---------- ----- -------------------- ----------
202211 T1009 150
202212 T1009 200 50
SQL>
If you want to fetch only the last row (sorted by codmes), you could e.g.
6 with temp as
7 (select codmes, customer,
8 deudaprestamo_pagper,
9 deudaprestamo_pagper -
10 lag(deudaprestamo_pagper) over (partition by customer order by codmes) sale_ct,
11 --
12 row_number() over (partition by customer order by codmes desc) rn
13 from test
14 )
15 select codmes, customer, deudaprestamo_pagper, sale_ct
16 from temp
17 where rn = 1;
CODMES CUSTO DEUDAPRESTAMO_PAGPER SALE_CT
---------- ----- -------------------- ----------
202212 T1009 200 50
SQL>

Query to pull data from column based off max value of second column

I have a table that has [Order], [Yield], [Scrap], [OpAc] columns. I need to pull the yield based on the max value of [OpAc].
Order
Yield
Scrap
OpAc
1234
140
0
10
1234
140
0
20
1234
130
10
30
1234
130
0
40
1234
125
5
50
1234
110
15
60
1235
140
0
10
1235
138
2
20
1235
138
0
30
1235
138
0
40
1235
138
0
50
1235
137
1
60
1235
137
0
70
Expected Results
Order
Yield
1234
110
1235
137
The query that I have tried is
select [Order], [Yield], MAX([OpAc]) as Max_OpAc
from SCRAP
GROUP BY [Order], [Yield]
order by [order]
This produces
Order
Yield
Max_OpAc
1234
110
60
1234
125
50
1234
130
40
1234
140
20
1235
137
70
1235
138
50
1235
140
10
I've tried setting up some CTE queries to break it down into separate functions but I keep getting caught at this step.
WITH CTE1 AS(
SELECT ROW_NUMBER() OVER(PARTITION BY [Order] ORDER BY [Order],[OpAc]) AS RN , *
FROM SAP_SCRAP
),
This proved to be redundant due to the fact that the [OpAc] field is sequential for each step.
Thanks in advance for any help

You almost got it!
WITH Orders_By_OpAc_Desc AS (
SELECT
[Order],
[Yield].
ROW_NUMBER() OVER (PARTITION BY [Order] ORDER BY OpAc DESC) AS [rn],
FROM
SCRAP
)
SELECT [Order],
[Yield]
FROM
Orders_By_OpAc_Desc
WHERE
rn = 1
The trick here is ROW_NUMBER() OVER (PARTITION BY [Order] ORDER BY OpAc DESC) AS [rn]. It might be confusing to understand in SQL, but when expressed in words it's a bit clearer.
This statement takes each group of rows with the same Order value (PARTITION BY [Order]), orders each group by OpAc in descending order so that the higher OpAc values end up "on top" of the group (ORDER BY OpAc DESC), and numbers each row in the group "top" to "bottom", starting with 1 (ROW_NUMBER()).
Meaning, each row with this number set to 1 has the highest OpAc value for the OrderId.
Wrap that into a CTE and then select just the rows with this number (rn) set to 1. Voi-la.

You definitely want the OVER (PARTITION BY) but MAX() is also an option here. You want something like:
SELECT
*
FROM
(
SELECT
t3.*
, MAX(OpAc) OVER (PARTITION BY [Order]) max1
FROM
SCRAP t3
) a
WHERE
a.Max1 = a.OpAc
for MAX()
Depending on your SQL Server edition, version, and query needs, you may be able to use FIRST_VALUE() as well:
SELECT
DISTINCT
t3.[Order],
FIRST_VALUE(Yield) OVER(PARTITION BY [Order] ORDER BY OpAc DESC) Yield
FROM
SCRAP t3

You were so close. Just missing an ORDER BY OpAc DESC in your ROW_NUMBER function.
SQL Fiddle
MS SQL Server 2017 Schema Setup:
CREATE TABLE orders (
[Order] int null
, Yield int null
, Scrap int null
, OpAc int null
);
INSERT INTO orders ([Order], Yield, Scrap, OpAc)
VALUES (1234,140,0,10)
, (1234,140,0,20)
, (1234,130,10,30)
, (1234,130,0,40)
, (1234,125,5,50)
, (1234,110,15,60)
, (1235,140,0,10)
, (1235,138,2,20)
, (1235,138,0,30)
, (1235,138,0,40)
, (1235,138,0,50)
, (1235,137,1,60)
, (1235,137,0,70)
;
Query 1:
WITH CTE1 AS (
SELECT *
, ROW_NUMBER() OVER(PARTITION BY [Order] ORDER BY OpAc DESC) as row_num
FROM orders
)
SELECT *
FROM CTE1 as c
WHERE c.row_num = 1
Results:
| Order | Yield | Scrap | OpAc | row_num |
|-------|-------|-------|------|---------|
| 1234 | 110 | 15 | 60 | 1 |
| 1235 | 137 | 0 | 70 | 1 |

Getting latest price of different products from control table

I have a control table, where Prices with Item number are tracked date wise.
id ItemNo Price Date
---------------------------
1 a001 100 1/1/2003
2 a001 105 1/2/2003
3 a001 110 1/3/2003
4 b100 50 1/1/2003
5 b100 55 1/2/2003
6 b100 60 1/3/2003
7 c501 35 1/1/2003
8 c501 38 1/2/2003
9 c501 42 1/3/2003
10 a001 95 1/1/2004
This is the query I am running.
SELECT pr.*
FROM prices pr
INNER JOIN
(
SELECT ItemNo, max(date) max_date
FROM prices
GROUP BY ItemNo
) p ON pr.ItemNo = p.ItemNo AND
pr.date = p.max_date
order by ItemNo ASC
I am getting below values
id ItemNo Price Date
------------------------------
10 a001 95 2004-01-01
6 b100 60 2003-01-03
9 c501 42 2003-01-03
Question is, is my query right or wrong? though I am getting my desired result.

Your query does what you want, and is a valid approach to solve your problem.
An alternative option would be to use a correlated subquery for filtering:
select p.*
from prices p
where p.date = (select max(p1.date) from prices where p1.itemno = p.itemno)
The upside of this query is that it can take advantage of an index on (itemno, date).
You can also use window functions:
select *
from (
select p.*, rank() over(partition by itemno order by date desc) rn
from prices p
) p
where rn = 1
I would recommend benchmarking the three options against your real data to assess which one performs better.

How to get latest records based on two columns of max

I have a table called Inventory with the below columns
item warehouse date sequence number value
111 100 2019-09-25 12:29:41.000 1 10
111 100 2019-09-26 12:29:41.000 1 20
222 200 2019-09-21 16:07:10.000 1 5
222 200 2019-09-21 16:07:10.000 2 10
333 300 2020-01-19 12:05:23.000 1 4
333 300 2020-01-20 12:05:23.000 1 5
Expected Output:
item warehouse date sequence number value
111 100 2019-09-26 12:29:41.000 1 20
222 200 2019-09-21 16:07:10.000 2 10
333 300 2020-01-20 12:05:23.000 1 5
Based on item and warehouse, i need to pick latest date and latest sequence number of value.
I tried with below code
select item,warehouse,sequencenumber,sum(value),max(date) as date1
from Inventory t1
where
t1.date IN (select max(date) from Inventory t2
where t1.warehouse=t2.warehouse
and t1.item = t2.item
group by t2.item,t2.warehouse)
group by t1.item,t1.warehouse,t1.sequencenumber
Its working for latest date but not for latest sequence number.
Can you please suggest how to write a query to get my expected output.

You can use row_number() for this:
select *
from (
select
t.*,
row_number() over(
partition by item, warehouse
order by date desc, sequence_number desc, value desc
) rn
from mytable t
) t
where rn = 1

Split a Table into 2 or more Tables based on Column value

I have a Table called "MIVTable" which has the following records,
MIVID Quantity Value
------ ---------- --------
14 10 3000
14 20 3500
14 15 2000
15 20 3000
15 50 7500
16 25 2000
Here, I need to store the above Table into two tables such as "HeaderTbl" and "DetailTbl" based on the MIVID as follows:
HeaderTbl:
HID MIVID TotalQuantity TotalValue
----- ------- ------------- -----------
1 14 45 8500
2 15 70 10500
3 16 25 2000
Here HID is the Primary Key with Identity Column.
DetailTbl:
HID MIVID Quantity Value
----- ------- ------------ -------
1 14 10 3000
1 14 20 3500
1 14 15 2000
2 15 20 3000
2 15 50 7500
3 16 25 2000
Suppose, if the MIVTable contains 4 different MIVID means, then 4 row should be created based on the MIVID on the HeaderTbl. How to do this?

To insert records in HeaderTbl from MIVTable use this: (HID should be auto increment)
INSERT INTO HeaderTbl
([MIVID], [TotalQuantity], [TotalValue])
SELECT MIVID, SUM(Quantity), SUM(Value) FROM MIVTable GROUP BY MIVID;
To insert records in DetailTbl from HeaderTbl and MIVTable use this:
INSERT INTO DetailTbl
([HID], [MIVID], [Quantity], [Value])
SELECT H.HID, M.*
FROM HeaderTbl H
INNER JOIN MIVTable M
ON H.MIVID = M.MIVID;
Look at this SQLFiddle
Here you need to use INSERT INTO SELECT statement to insert data from one table to another. You can also use JOIN in such statement as I did it for DetailTbl.

You would generate the HeaderTbl using RANK() SQL Server function, as follows:
SELECT RANK() OVER (ORDER BY MIVID) as HID, MIVID, TotalQuantity, TotalValue
FROM
(
SELECT
MIVID,
SUM(Quantity) as TotalQuantity,
SUM(Value) as TotalValue
FROM MIVTable GROUP BY MIVID
) AS A
and the Detail table using the ROW_NUMBER() SQL Server function, as follows:
SELECT
ROW_NUMBER() OVER (ORDER BY MIVID) AS HID,
MIVID,
Quantity,
Value
FROM MIVTable

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

LISTAGG equivalent with windowing clause - sql

Related

analytical functions

Query to pull data from column based off max value of second column

Getting latest price of different products from control table

How to get latest records based on two columns of max

Split a Table into 2 or more Tables based on Column value

Categories

Resources