SQL query to calculate allocation / netting

Here is my source data,
Group | Item | Capacity
-----------------------
1 | A | 100
1 | B | 80
1 | C | 20
2 | A | 90
2 | B | 40
2 | C | 20
The above data shows the capacity to consume "something" for each item.
Now suppose each group has a maximum of 100 to allocate. I want to distribute this 100 across the items of each group, in item order, up to each item's maximum capacity. So my desired output is like this:
Group | Item | Capacity | consumption
-------------------------------------
1 | A | 100 | 100
1 | B | 80 | 0
1 | C | 20 | 0
2 | A | 90 | 90
2 | B | 40 | 10
2 | C | 20 | 0
My question is how do I do it in a single SQL query (preferably avoiding any subquery construct). Please note, number of items in each group is not fixed.
I was trying LAG() with running SUM(), but could not quite produce the desired output...
select
group, item, capacity,
sum (capacity) over (partition by group order by item range between UNBOUNDED PRECEDING AND CURRENT ROW) run_tot
from table_name

Without a subquery using just the analytic SUM function:
SQL> create table mytable (group_id,item,capacity)
as
select 1, 'A' , 100 from dual union all
select 1, 'B' , 80 from dual union all
select 1, 'C' , 20 from dual union all
select 2, 'A' , 90 from dual union all
select 2, 'B' , 40 from dual union all
select 2, 'C' , 20 from dual
/
Table created.
SQL> select group_id
, item
, capacity
, case
when sum(capacity) over (partition by group_id order by item) > 100 then 100
else sum(capacity) over (partition by group_id order by item)
end -
case
when nvl(sum(capacity) over (partition by group_id order by item rows between unbounded preceding and 1 preceding),0) > 100 then 100
else nvl(sum(capacity) over (partition by group_id order by item rows between unbounded preceding and 1 preceding),0)
end consumption
from mytable
/
GROUP_ID I CAPACITY CONSUMPTION
---------- - ---------- -----------
1 A 100 100
1 B 80 0
1 C 20 0
2 A 90 90
2 B 40 10
2 C 20 0
6 rows selected.
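The idea behind the query above is that each row's consumption is the capped running total including that row minus the capped running total before it. Here is a minimal sketch of the same logic that can be run outside Oracle, assuming Python's bundled SQLite (3.25 or later, for window functions). SQLite's two-argument MIN stands in for the CASE expressions, COALESCE stands in for NVL, and the LIMIT name is my own:

```python
import sqlite3

# In-memory copy of the sample data; SQLite is an assumption here, the answer targets Oracle.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (group_id INTEGER, item TEXT, capacity INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, "A", 100), (1, "B", 80), (1, "C", 20),
     (2, "A", 90), (2, "B", 40), (2, "C", 20)],
)

LIMIT = 100  # per-group allocation from the question

# consumption = min(LIMIT, running total incl. this row)
#             - min(LIMIT, running total of the rows before it)
query = """
SELECT group_id, item, capacity,
       MIN(:lim, SUM(capacity) OVER (PARTITION BY group_id ORDER BY item
                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW))
     - MIN(:lim, COALESCE(SUM(capacity) OVER (PARTITION BY group_id ORDER BY item
                 ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0))
       AS consumption
FROM mytable
ORDER BY group_id, item
"""
rows = conn.execute(query, {"lim": LIMIT}).fetchall()
for row in rows:
    print(row)
```

The COALESCE matters for the first row of each group, where the "before" window is empty and the sum is NULL.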

Here's a solution using recursive subquery factoring. This clearly ignores your preference to avoid subqueries, but doing this in one pass might be impossible.
Probably the only way to do this in one pass is to use MODEL, which I'm not allowed to code after midnight. Maybe someone waking up in Europe can figure it out.
with ranked_items as
(
--Rank the items. row_number() should also randomly break ties.
select group_id, item, capacity,
row_number() over (partition by group_id order by item) consumer_rank
from consumption
),
consumer(group_id, item, consumer_rank, capacity, consumption, left_over) as
(
--Get the first item and distribute as much of the 100 as possible.
select
group_id,
item,
consumer_rank,
capacity,
least(100, capacity) consumption,
100 - least(100, capacity) left_over
from ranked_items
where consumer_rank = 1
union all
--Find the next row by the GROUP_ID and the artificial CONSUMER_RANK.
--Distribute as much left-over from previous consumption as possible.
select
ranked_items.group_id,
ranked_items.item,
ranked_items.consumer_rank,
ranked_items.capacity,
least(left_over, ranked_items.capacity) consumption,
left_over - least(left_over, ranked_items.capacity) left_over
from ranked_items
join consumer
on ranked_items.group_id = consumer.group_id
and ranked_items.consumer_rank = consumer.consumer_rank + 1
)
select group_id, item, capacity, consumption
from consumer
order by group_id, item;
Sample data:
create table consumption(group_id number, item varchar2(1), capacity number);
insert into consumption
select 1, 'A' , 100 from dual union all
select 1, 'B' , 80 from dual union all
select 1, 'C' , 20 from dual union all
select 2, 'A' , 90 from dual union all
select 2, 'B' , 40 from dual union all
select 2, 'C' , 20 from dual;
commit;
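What the recursive query does per group is a simple carry-forward: give each item as much as is left, then pass the remainder on to the next item. A rough sketch of that loop in plain Python, just to make the logic visible (the function name allocate is made up for illustration):

```python
def allocate(capacities, limit=100):
    """Distribute `limit` over `capacities` in order, never exceeding a capacity."""
    left_over = limit
    consumption = []
    for capacity in capacities:
        used = min(left_over, capacity)  # same role as LEAST(left_over, capacity)
        consumption.append(used)
        left_over -= used                # carried into the next "recursion" step
    return consumption


group_1 = allocate([100, 80, 20])
group_2 = allocate([90, 40, 20])
print(group_1, group_2)
```

Each loop iteration corresponds to one level of the recursive member, with left_over playing the part of the LEFT_OVER column.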

Does this work as expected?
WITH t AS
(SELECT GROUP_ID, item, capacity,
SUM(capacity) OVER (PARTITION BY GROUP_ID ORDER BY item RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) sum_run,
GREATEST(100-SUM(capacity) OVER (PARTITION BY GROUP_ID ORDER BY item RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), 0) AS remain
FROM table_name)
SELECT t.*,
LEAST(capacity,lag(remain, 1, 100) OVER (PARTITION BY GROUP_ID ORDER BY item)) AS consumption
FROM t

select group_id,item,capacity,(case when rn=1 then capacity else 0 end) consumption
from
(select group_id,item,capacity,
row_number() over (partition by group_id order by capacity desc) rn from mytable)

Related

Select with limited join

I have two tables: products and products_prices.
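products table:
id | name | user_id
-------------------
1 | Headphones | 1
2 | Phone | 1
products_prices table:
id | product_id | price | time
------------------------------
1 | 1 | 10 | 1
2 | 1 | 15 | 2
3 | 1 | 20 | 3
4 | 2 | 10 | 4
5 | 2 | 15 | 5
6 | 2 | 20 | 6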
I have a simple query:
SELECT * FROM products WHERE (user_id = 1) LIMIT 1 OFFSET 1
So I need to get a limited number of rows from the products table, and for each of those rows only the two latest prices from the products_prices table, ordered by time.
(I need to get each product with its two latest prices.)
This is an example of what I want to get:
id | user_id | name | curr_price | prev_price
---------------------------------------------
2 | 1 | Phone | 20 | 15
And an example of my query:
select products.*,
(SELECT price FROM products_prices WHERE product_id = products.id ORDER BY time desc LIMIT 1 OFFSET 0) as curr_price,
(SELECT price FROM products_prices WHERE product_id = products.id ORDER BY time desc LIMIT 1 OFFSET 1) as prev_price
from "products"
where (products."user_id" = 1)
limit 1 offset 1
Is it possible to do it without subqueries?
Not sure I find any of these easier to read...
0th approach using window functions and a CTE:
With products as (SELECT 1 ID, 'Headphones' name, 1 user_id UNION ALL
SELECT 2 ID, 'Phone' name, 1 user_id ),
products_Prices as (SELECT 1 ID, 1 Product_ID, 10 price, 1 time UNION ALL
SELECT 2 ID, 1 Product_ID, 15 price, 2 time UNION ALL
SELECT 3 ID, 1 Product_ID, 20 price, 3 time UNION ALL
SELECT 4 ID, 2 Product_ID, 33 price, 4 time UNION ALL
SELECT 5 ID, 2 Product_ID, 22 price, 5 time UNION ALL
SELECT 6 ID, 2 Product_ID, 11 price, 6 time),
STEP1 as (
SELECT P.ID, P.Name, P.user_ID,
price as CurrentPrice, lead(price) over (partition by P.ID order by time desc) Prev_Price, time,
row_number() over (Partition by P.ID order by time Desc) RN
FROM Products P
LEFT JOIN Products_Prices Z
on Z.Product_ID = P.ID)
SELECT Id, Name, User_ID, CurrentPRice, PRev_Price
From Step1 where RN = 1
Giving us:
+----+------------+---------+--------------+------------+
| id | name | user_id | currentprice | prev_price |
+----+------------+---------+--------------+------------+
| 1 | Headphones | 1 | 20 | 15 |
| 2 | Phone | 1 | 11 | 22 |
+----+------------+---------+--------------+------------+
1st approach using analytics and a CTE: note I changed price numbers to show variance.
With products as (SELECT 1 ID, 'Headphones' name, 1 user_id UNION ALL
SELECT 2 ID, 'Phone' name, 1 user_id ),
products_Prices as (SELECT 1 ID, 1 Product_ID, 10 price, 1 time UNION ALL
SELECT 2 ID, 1 Product_ID, 15 price, 2 time UNION ALL
SELECT 3 ID, 1 Product_ID, 20 price, 3 time UNION ALL
SELECT 4 ID, 2 Product_ID, 33 price, 4 time UNION ALL
SELECT 5 ID, 2 Product_ID, 22 price, 5 time UNION ALL
SELECT 6 ID, 2 Product_ID, 11 price, 6 time),
STEP1 as (SELECT P.ID, P.Name, P.user_ID, PP.price, row_number() over (partition by PP.product_ID order by time desc) RN
FROM Products P
LEFT JOIN products_prices PP
on P.ID = PP.Product_ID)
SELECT ID, Name, User_ID, max(case when RN = 1 then Price end) as Current_price, max(case when RN=2 then price end) as Last_price
FROM STEP1
WHERE RN <=2
GROUP BY ID, name, User_ID
Giving us:
+----+------------+---------+---------------+------------+
| id | name | user_id | current_price | last_price |
+----+------------+---------+---------------+------------+
| 2 | Phone | 1 | 11 | 22 |
| 1 | Headphones | 1 | 20 | 15 |
+----+------------+---------+---------------+------------+
Option 2 using lateral.
With products as (SELECT 1 ID, 'Headphones' name, 1 user_id UNION ALL
SELECT 2 ID, 'Phone' name, 1 user_id ),
products_Prices as (SELECT 1 ID, 1 Product_ID, 10 price, 1 time UNION ALL
SELECT 2 ID, 1 Product_ID, 15 price, 2 time UNION ALL
SELECT 3 ID, 1 Product_ID, 20 price, 3 time UNION ALL
SELECT 4 ID, 2 Product_ID, 33 price, 4 time UNION ALL
SELECT 5 ID, 2 Product_ID, 22 price, 5 time UNION ALL
SELECT 6 ID, 2 Product_ID, 11 price, 6 time)
SELECT P.ID, P.Name, P.user_ID, PP.price, time
FROM Products P
LEFT JOIN lateral (SELECT Product_ID, Price, time
FROM Products_Prices Z
WHERE Z.Product_ID = P.ID
ORDER BY Time Desc LIMIT 2) PP
on TRUE
ORDER BY TIME DESC;
Giving us the following (unpivoted); using the row number logic above we could pivot it.
+----+------------+---------+-------+------+
| id | name | user_id | price | time |
+----+------------+---------+-------+------+
| 2 | Phone | 1 | 11 | 6 |
| 2 | Phone | 1 | 22 | 5 |
| 1 | Headphones | 1 | 20 | 3 |
| 1 | Headphones | 1 | 15 | 2 |
+----+------------+---------+-------+------+

Oracle: recursively calculate total based on tax

I have a temp table like this:
id d tax_rate money
1 20210101 5 100
1 20210201 15 0
1 20210301 20 0
1 20210401 5 0
This is the output I want to select:
id d tax_rate money total
1 20210101 5 100 105
1 20210201 15 105 120.75
1 20210301 20 120.75 144.9
1 20210401 5 144.9 152.145
This means that I need to recursively calculate the total based on tax_rate and the previous total (on the first day, previous total = money).
total = previous total (by date) * (1 + tax_rate / 100) (tax_rate is a percentage)
I tried using LAG() OVER(), but LAG only sees the previous row's stored value, not a recursively computed one, so from the 3rd day onward the calculated total is wrong.
In my case, if I could use LAG or any other function to multiply all the previous tax rates (e.g. 1.05 * 1.15 * 1.2 = 1.449), then I could calculate the right previous total, but I had no luck finding a function that does that.
WITH tmp AS
(
SELECT 1 AS id, 20210101 AS d, 5 AS tax_rate, 100 AS money FROM dual UNION ALL
SELECT 1 AS id, 20210201 AS d, 15 AS tax_rate, 0 AS money FROM dual UNION ALL
SELECT 1 AS id, 20210301 AS d, 20 AS tax_rate, 0 AS money FROM dual UNION ALL
SELECT 1 AS id, 20210401 AS d, 5 AS tax_rate, 0 AS money FROM dual
)
SELECT *
FROM tmp;
You can accumulate a running product with logarithms, using the identity x1 * x2 * ... * xn = EXP(LN(x1) + LN(x2) + ... + LN(xn)), which holds for positive values.
Then multiply money by that running product to get the total.
Query 1:
SELECT ID, D, tax_rate,
SUM(money) OVER(PARTITION BY ID ORDER BY ID) * EXP(SUM(LN(CAST(tax_rate AS DECIMAL(5,2))/100 + 1))over(PARTITION BY ID ORDER BY d)) total
FROM tmp
Results:
| ID | D | TAX_RATE | TOTAL |
|----|----------|----------|---------|
| 1 | 20210101 | 5 | 105 |
| 1 | 20210201 | 15 | 120.75 |
| 1 | 20210301 | 20 | 144.9 |
| 1 | 20210401 | 5 | 152.145 |
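To see why the EXP(SUM(LN(...))) trick gives a running product, here is a small sketch of the same arithmetic in Python. The function name running_totals is made up for illustration, and the rates and starting amount are the ones from the question:

```python
import math


def running_totals(initial, rates_percent):
    """Running total where each step multiplies by (1 + rate/100).

    The product is accumulated as a sum of logarithms, mirroring
    EXP(SUM(LN(tax_rate/100 + 1)) OVER (ORDER BY d)) in the query.
    """
    totals = []
    log_sum = 0.0
    for rate in rates_percent:
        log_sum += math.log(1 + rate / 100)   # LN(1 + rate/100), added to the running SUM
        totals.append(initial * math.exp(log_sum))
    return totals


totals = running_totals(100, [5, 15, 20, 5])
print([round(t, 3) for t in totals])
```

Two caveats: the results are floating point, so they can differ from the exact decimals in the last digits, and the logarithm only exists for positive factors, so a tax_rate of -100 or lower would break this approach.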
One option would be something like this
WITH tmp AS
(
SELECT 1 AS id, 20210101 AS d, 5 AS tax_rate, 100 AS money FROM dual UNION ALL
SELECT 1 AS id, 20210201 AS d, 15 AS tax_rate, 0 AS money FROM dual UNION ALL
SELECT 1 AS id, 20210301 AS d, 20 AS tax_rate, 0 AS money FROM dual UNION ALL
SELECT 1 AS id, 20210401 AS d, 5 AS tax_rate, 0 AS money FROM dual
),
running_total( id, d, tax_rate, money, total )
as (
select id, d, tax_rate, money, money * (1 + tax_rate/100) total
from tmp
where money != 0
union all
select t.id, t.d, t.tax_rate, t.money, rt.total * (1 + t.tax_rate/100)
from tmp t
join running_total rt
on t.id = rt.id
and to_date( rt.d, 'yyyyddmm' ) = to_date( t.d, 'yyyyddmm' ) - 1
)
select *
from running_total;
See this dbfiddle.
I am assuming that the first row, which forms the base of the recursive CTE, is the row where money != 0 (so there would be only one such row per id). You could change that to pick the row with the earliest date per id or whatever other "first row" logic your actual data supports.
Note that life will be easier for you if you use actual dates for dates rather than using numbers that represent dates. For a 4 row virtual table, it won't matter much that you have to do a to_date on both sides of the join in the running_total recursive CTE. But for a real table with a decent number of rows, you'd want to be able to have an index on (id, d) to get decent performance. You could, of course, create a function-based index but then you'd either need to explicitly specify things like the NLS environment in your to_date call or deal with the potential for sessions not to use your index if their NLS environment doesn't match the NLS settings used to create the index.

What is the most efficient SQL query to find the max N values for every entity in a table

I wrote these 2 queries; the first one keeps duplicates and the second one drops them.
Does anyone know a more efficient way to achieve this?
The queries are for MSSQL and return the top 3 values.
1-
SELECT TMP.entity_id, TMP.value
FROM(
SELECT TAB.entity_id, LEAD(TAB.entity_id, 3, 0) OVER(ORDER BY TAB.entity_id, TAB.value) AS next_id, TAB.value
FROM mytable TAB
) TMP
WHERE TMP.entity_id <> TMP.next_id
2-
SELECT TMP.entity_id, TMP.value
FROM(
SELECT TMX.entity_id, LEAD(TMX.entity_id, 3, 0) OVER(ORDER BY TMX.entity_id, TMX.value) AS next_id, TMX.value
FROM(
SELECT TAB.entity_id, LEAD(TAB.entity_id, 1, 0) OVER(ORDER BY TAB.entity_id, TAB.value) AS next_id, TAB.value, LEAD(TAB.value, 1, 0) OVER(ORDER BY TAB.entity_id, TAB.value) AS next_value
FROM mytable TAB
) TMX
WHERE TMX.entity_id <> TMX.next_id OR TMX.value <> TMX.next_value
) TMP
WHERE TMP.entity_id <> TMP.next_id
Example:
Table:
entity_id value
--------- -----
1 9
1 11
1 12
1 3
2 25
2 25
2 5
2 37
3 24
3 9
3 2
3 15
Result Query 1 (25 appears twice for entity_id 2):
entity_id value
--------- -----
1 9
1 11
1 12
2 25
2 25
2 37
3 9
3 15
3 24
Result Query 2 (25 appears only once for entity_id 2):
entity_id value
--------- -----
1 9
1 11
1 12
2 5
2 25
2 37
3 9
3 15
3 24
You can use ROW_NUMBER, which keeps duplicates, as follows:
select entity_id, value from
(select t.*, row_number() over (partition by entity_id order by value desc) as rn
from your_Table t) x where rn <= 3
You can use DENSE_RANK with DISTINCT to remove the duplicates as follows (plain RANK leaves a gap after tied values, so it can return fewer than 3 distinct values):
select distinct entity_id, value from
(select t.*, dense_rank() over (partition by entity_id order by value desc) as rn
from your_Table t) x where rn <= 3
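The difference between the ranking functions shows up only when values tie, as with the two 25s for entity_id 2. A quick way to compare them on the sample data is the sketch below, assuming Python's bundled SQLite (3.25+) rather than MSSQL; the helper top3 is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (entity_id INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, 9), (1, 11), (1, 12), (1, 3),
     (2, 25), (2, 25), (2, 5), (2, 37),
     (3, 24), (3, 9), (3, 2), (3, 15)],
)


def top3(rank_fn, distinct):
    """Top 3 values per entity using the given ranking function."""
    if rank_fn not in ("ROW_NUMBER", "RANK", "DENSE_RANK"):
        raise ValueError(rank_fn)  # only known names are interpolated into the SQL
    sql = f"""
        SELECT {'DISTINCT' if distinct else ''} entity_id, value
        FROM (SELECT entity_id, value,
                     {rank_fn}() OVER (PARTITION BY entity_id ORDER BY value DESC) AS rn
              FROM mytable) AS ranked
        WHERE rn <= 3
        ORDER BY entity_id, value
    """
    return conn.execute(sql).fetchall()


def values_for_entity_2(rows):
    return [value for entity_id, value in rows if entity_id == 2]


keep_dups = values_for_entity_2(top3("ROW_NUMBER", distinct=False))
with_rank = values_for_entity_2(top3("RANK", distinct=True))
with_dense = values_for_entity_2(top3("DENSE_RANK", distinct=True))
print(keep_dups, with_rank, with_dense)
```

For entity_id 2, ROW_NUMBER keeps both 25s, RANK with DISTINCT yields only two values because the tie uses up rank 3, and DENSE_RANK with DISTINCT yields three distinct values.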

How to find the last non null value of a column and recursively find the sum value of another column

Suppose I have a column A, and the value of A in the current row is null. I need to go back through the previous rows and find the most recent non-null value of column A. Then I need to sum another column B from the row where that non-null value was seen up to the current row. Finally I need to add that sum of B to A, which gives the new value of A.
For finding the column A non null value I have written the query as
nvl(last_value(nullif(A,0)) ignore nulls over (order by A),0)
But I need to do the calculation of B as mentioned above.
Can anyone please help me out ?
Sample data
A B date
null 20 14/06/2019
null 40 13/06/2019
10 50 12/06/2019
Here the value of A on 14/06/2019 should be replaced by the sum of B plus the value of A on 12/06/2019 (which is the first non-null value of A found when going back) = 20 + 40 + 50 + 10 = 120.
If you have version 12c or higher:
with t( A,B, dte ) as
(
select null, 20, date'2019-06-14' from dual union all
select null, 40, date'2019-06-13' from dual union all
select 10 ,50, date'2019-06-12' from dual
)
select * from t
match_recognize(
order by dte desc
measures
nvl(
first(a),
y.a + sum(b)
) as a,
first(b) as b,
first(dte) as dte
after match skip to next row
pattern(x* y{0,1})
define x as a is null,
y as a is not null
);
A B DTE
------ ---------- ----------
120 20 2019-14-06
100 40 2019-13-06
10 50 2019-12-06
Use conditional count to divide data into separate groups, then use this group for analytical calculation:
select a, b, dt, grp, sum(nvl(a, 0) + nvl(b, 0)) over (partition by grp order by dt) val
from (
select a, b, dt, count(case when a is not null then 1 end) over (order by dt) grp
from t order by dt desc)
order by dt desc
Sample result:
A    B  DT         GRP VAL
---- -- ---------- --- ---
null 20 2019-06-14 4   120
null 40 2019-06-13 4   100
10   50 2019-06-12 4   60
5    2  2019-06-11 3   7
6    1  2019-06-10 2   7
null 3  2019-06-09 1   14
7    4  2019-06-08 1   11
I think what you want can be handled by using sum(<column>) over (...) together with the last_value over (...) function, as below:
with t( A,B, "date" ) as
(
select null, 20, date'2019-06-14' from dual union all
select null, 40, date'2019-06-13' from dual union all
select 10 ,50, date'2019-06-12' from dual
)
select nvl(a,sum(b) over (order by 1)+
last_value(a) ignore nulls
over (order by 1 desc)
) as a,
b, "date"
from t;
A B date
--- -- ----------
120 20 14.06.2019
120 40 13.06.2019
10 50 12.06.2019

sql - count one column based on another column

I have a dataset
case_id subcase_id
1 | 1-1
1 | 1-2
1 | 1-3
1 | 1-6
2 | 2-1
2 | 2-7
I want the following output. The idea is to keep a running count of the subcases within each case.
case_id subcase_id
1 | 1-1 | 1
1 | 1-2 | 2
1 | 1-3 | 3
1 | 1-6 | 4
2 | 2-1 | 1
2 | 2-7 | 2
You can try using row_number() function
select
case_id,
subcase_id,
row_number() over(partition by case_id
order by
cast(SUBSTR(subcase_id, 1,INSTR(subcase_id, '-') -1) as number),
cast(SUBSTR(subcase_id, INSTR(subcase_id, '-') +1) as number)) as rn
from tablename
You may use count() over (partition by .. order by ..) clause as :
with t(case_id,subcase_id) as
(
select 1,'1-1' from dual union all
select 1,'1-2' from dual union all
select 1,'1-3' from dual union all
select 1,'1-6' from dual union all
select 2,'2-1' from dual union all
select 2,'2-7' from dual
)
select t.*,
count(*) over (partition by case_id order by subcase_id)
as result
from t;
CASE_ID SUBCASE_ID RESULT
------- ---------- ------
1 1-1 1
1 1-2 2
1 1-3 3
1 1-6 4
2 2-1 1
2 2-7 2
This works when subcase_id changes frequently and is distinct for all values, while case_id changes rarely.
Here is a query which should behave as you want. We have to isolate the two numeric components of the subcase_id, and then cast them to integers, to avoid sorting this column as text.
SELECT
case_id,
subcase_id,
ROW_NUMBER() OVER (PARTITION BY case_id
ORDER BY TO_NUMBER(SUBSTR(subcase_id, 1, INSTR(subcase_id, '-') - 1)),
TO_NUMBER(SUBSTR(subcase_id, INSTR(subcase_id, '-') + 1))) rn
FROM yourTable
ORDER BY
case_id,
TO_NUMBER(SUBSTR(subcase_id, 1, INSTR(subcase_id, '-') - 1)),
TO_NUMBER(SUBSTR(subcase_id, INSTR(subcase_id, '-') + 1));
It is not a good idea to treat the subcase_id column as both text and numbers. If you really have a long term need to sort on this column, then I suggest breaking out the two numeric components as separate number columns.
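To illustrate why the text sort is risky, here is a small sketch comparing the two orderings, assuming Python's bundled SQLite (3.25+) instead of Oracle and a made-up subcase '1-10' that is not in the question's data. SQLite's substr/instr and CAST ... AS INTEGER stand in for SUBSTR/INSTR and TO_NUMBER:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cases (case_id INTEGER, subcase_id TEXT)")
conn.executemany(
    "INSERT INTO cases VALUES (?, ?)",
    # '1-10' is a hypothetical value added to expose the text-sort problem
    [(1, "1-1"), (1, "1-2"), (1, "1-10"), (2, "2-1"), (2, "2-7")],
)

sql = """
SELECT subcase_id,
       -- numbering by plain text order: '1-10' sorts before '1-2'
       ROW_NUMBER() OVER (PARTITION BY case_id ORDER BY subcase_id) AS rn_text,
       -- numbering by the numeric part after the dash
       ROW_NUMBER() OVER (PARTITION BY case_id
                          ORDER BY CAST(substr(subcase_id, instr(subcase_id, '-') + 1)
                                        AS INTEGER)) AS rn_numeric
FROM cases
ORDER BY case_id, rn_numeric
"""
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

As long as every suffix has a single digit the two numberings agree, which is why the simpler queries appear to work on the sample data.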