Oracle: recursively calculate total based on tax rate - SQL

I have a temp table like this:
id d tax_rate money
1 20210101 5 100
1 20210201 15 0
1 20210301 20 0
1 20210401 5 0
This is the output I want to select:
id d tax_rate money total
1 20210101 5 100 105
1 20210201 15 105 120.75
1 20210301 20 120.75 144.9
1 20210401 5 144.9 152.145
This means that I need to recursively calculate the total based on tax_rate and the previous total (on the first day the previous total equals money).
total = previous total (by date) * (1 + tax_rate), where tax_rate is a percentage.
I tried using LAG() OVER(), but LAG only looks at the previous row, not recursively, so from the 3rd day onward the calculation returns the wrong total.
In my case, if I could use LAG or any other function to multiply all the previous tax rates (e.g. 1.05 * 1.15 * 1.2 = 1.449), then I could calculate the right previous total, but I have had no luck finding such a function.
WITH tmp AS
(
SELECT 1 AS id, 20210101 AS d, 5 AS tax_rate, 100 AS money FROM dual UNION ALL
SELECT 1 AS id, 20210201 AS d, 15 AS tax_rate, 0 AS money FROM dual UNION ALL
SELECT 1 AS id, 20210301 AS d, 20 AS tax_rate, 0 AS money FROM dual UNION ALL
SELECT 1 AS id, 20210401 AS d, 5 AS tax_rate, 0 AS money FROM dual
)
SELECT *
FROM tmp;

You can use a mathematical identity to turn a running SUM into accumulated multiplication (a cumulative product).
Then calculate the total by multiplying money by that accumulated factor.
Query 1:
SELECT id, d, tax_rate,
       SUM(money) OVER (PARTITION BY id ORDER BY id)
         * EXP(SUM(LN(CAST(tax_rate AS DECIMAL(5,2)) / 100 + 1)) OVER (PARTITION BY id ORDER BY d)) AS total
FROM tmp
Results:
| ID | D | TAX_RATE | TOTAL |
|----|----------|----------|---------|
| 1 | 20210101 | 5 | 105 |
| 1 | 20210201 | 15 | 120.75 |
| 1 | 20210301 | 20 | 144.9 |
| 1 | 20210401 | 5 | 152.145 |
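The trick is the logarithm identity EXP(SUM(LN(x))) = x1 * x2 * ... * xn (valid for positive values), which turns the windowed SUM into a cumulative product. A minimal standalone check (not part of the original answer) against the sample rates; the result may differ in the last decimal places due to floating-point rounding:
-- 1.05 * 1.15 * 1.20 = 1.449, reproduced via EXP(SUM(LN(x)))
SELECT EXP(SUM(LN(factor))) AS cumulative_factor
FROM (
  SELECT 1.05 AS factor FROM dual UNION ALL
  SELECT 1.15 FROM dual UNION ALL
  SELECT 1.20 FROM dual
);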

One option would be something like this:
WITH tmp AS
(
SELECT 1 AS id, 20210101 AS d, 5 AS tax_rate, 100 AS money FROM dual UNION ALL
SELECT 1 AS id, 20210201 AS d, 15 AS tax_rate, 0 AS money FROM dual UNION ALL
SELECT 1 AS id, 20210301 AS d, 20 AS tax_rate, 0 AS money FROM dual UNION ALL
SELECT 1 AS id, 20210401 AS d, 5 AS tax_rate, 0 AS money FROM dual
),
running_total( id, d, tax_rate, money, total )
as (
select id, d, tax_rate, money, money * (1 + tax_rate/100) total
from tmp
where money != 0
union all
select t.id, t.d, t.tax_rate, t.money, rt.total * (1 + t.tax_rate/100)
from tmp t
join running_total rt
on t.id = rt.id
and to_date( rt.d, 'yyyymmdd' ) = add_months( to_date( t.d, 'yyyymmdd' ), -1 ) -- periods are monthly in the sample data
)
select *
from running_total;
See this dbfiddle.
I am assuming that the first row, which forms the base of the recursive CTE, is the row where money != 0 (so there would be only one such row per id). You could change that to pick the row with the earliest date per id or whatever other "first row" logic your actual data supports.
Note that life will be easier for you if you use actual dates for dates rather than using numbers that represent dates. For a 4 row virtual table, it won't matter much that you have to do a to_date on both sides of the join in the running_total recursive CTE. But for a real table with a decent number of rows, you'd want to be able to have an index on (id, d) to get decent performance. You could, of course, create a function-based index but then you'd either need to explicitly specify things like the NLS environment in your to_date call or deal with the potential for sessions not to use your index if their NLS environment doesn't match the NLS settings used to create the index.
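For illustration, here is a sketch (table and constraint names invented) of the same data with a real DATE column; the recursive join then needs no conversions, and the primary key gives you the (id, d) index for free:
create table tax_periods (
  id        number,
  d         date,
  tax_rate  number,
  money     number,
  constraint tax_periods_pk primary key (id, d)
);
-- the recursive member's join in running_total then becomes simply:
--   on t.id = rt.id
--  and rt.d = add_months(t.d, -1)   -- periods are monthly in the sample data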

Related

multiple top n aggregates query defined as a view (or function)?

I couldn't find a past question exactly like this problem. I have an orders table, containing a customer id, order date, and several numeric columns (how many of a particular item were ordered on that date). Removing some of the numerics, it looks like this:
customer_id date a b c d
0001 07/01/22 0 3 3 5
0001 07/12/22 12 0 50 0
0002 06/30/22 5 65 0 30
0002 07/20/22 1 0 19 2
0003 08/01/22 0 0 99 0
I need to sum each numeric column by customer_id, then return the top n customers for each column. Obviously that means a single customer may appear multiple times, once for each column. Assuming top 2, the desired output would look something like this:
column_ranked customer_id sum rank
'a' 0001 12 1
'a' 0002 6 2
'b' 0002 65 1
'b' 0001 3 2
'c' 0003 99 1
'c' 0001 53 2
'd' 0002 32 1
'd' 0001 5 2
(this assumes no date range filter)
My first thought was a CTE to collapse the table into its per-customer sums, then a union from the CTE, with a limit n clause, once for each summed column. That works if the date range is hard-coded into the CTE .... but I want to define this as a view, so it can be called by users something like this:
SELECT * from top_customers_view WHERE date_range BETWEEN ( date1 and date2 )
How can I pass the date restriction down to the CTE? Or am I taking the wrong approach entirely? If a view isn't possible, can it be done as a function? (without using a costly cursor, that is.)
Since the possible date ranges produce a massive number of combinations, you cannot bake them all into a view. You can write a query, however, as shown below:
with
p as (select cast ('2022-01-01' as date) as ds, cast ('2022-12-31' as date) as de),
a as (
select top 10 customer_id, 'a' as col, sum(a) as s
from t cross join p where date between ds and de
group by customer_id order by s desc
),
b as (
select top 10 customer_id, 'b' as col, sum(b) as s
from t cross join p where date between ds and de
group by customer_id order by s desc
),
c as (
select top 10 customer_id, 'c' as col, sum(c) as s
from t cross join p where date between ds and de
group by customer_id order by s desc
),
d as (
select top 10 customer_id, 'd' as col, sum(d) as s
from t cross join p where date between ds and de
group by customer_id order by s desc
)
select * from a
union all select * from b
union all select * from c
union all select * from d
order by customer_id, col, s desc
The date range is in the second line.
See db<>fiddle.
Alternatively, you could create a data warehousing solution, but it would require much more effort to make it work.
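The question also asks whether this could be done as a function. One possibility (a sketch only, not part of the answer above; the function name is invented and it assumes an inline table-valued function is acceptable) is to let callers pass the date range as parameters:
CREATE FUNCTION dbo.top_customers_by_column (@date1 date, @date2 date)
RETURNS TABLE
AS
RETURN
(
    SELECT TOP 10 customer_id, 'a' AS col, SUM(a) AS s
    FROM t
    WHERE date BETWEEN @date1 AND @date2
    GROUP BY customer_id
    ORDER BY s DESC
    -- repeat per column with UNION ALL, each branch wrapped as in the query above
);
-- Usage: SELECT * FROM dbo.top_customers_by_column('2022-01-01', '2022-12-31');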

How to find the last non null value of a column and recursively find the sum value of another column

Suppose I have a column A and the currently fetched value of A is null. I need to go back to previous rows and find the last non-null value of column A. Then I need to sum another column B from the point where that non-null value occurs up to the current point. After that I need to add the sum of B to that value of A, which becomes the new value of A.
To find the last non-null value of column A I have written the query as
nvl(last_value(nullif(A,0)) ignore nulls over (order by A),0)
But I need to do the calculation of B as mentioned above.
Can anyone please help me out ?
Sample data
A B date
null 20 14/06/2019
null 40 13/06/2019
10 50 12/06/2019
Here the value of A on 14/06/2019 should be replaced by the sum of B plus the value of A on 12/06/2019 (which is the first non-null value of A): 20+40+50+10 = 120.
If you have version 12c or higher:
with t( A,B, dte ) as
(
select null, 20, date'2019-06-14' from dual union all
select null, 40, date'2019-06-13' from dual union all
select 10 ,50, date'2019-06-12' from dual
)
select * from t
match_recognize(
order by dte desc
measures
nvl(
first(a),
y.a + sum(b)
) as a,
first(b) as b,
first(dte) as dte
after match skip to next row
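-- x matches a (possibly empty) run of rows whose a is NULL; y matches the first
-- row with a non-NULL a. Rows are ordered by dte desc, so each match runs from
-- the current row back to the nearest earlier date where a is known, and sum(b)
-- accumulates b over exactly that span.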
pattern(x* y{0,1})
define x as a is null,
y as a is not null
);
A B DTE
------ ---------- ----------
120 20 2019-06-14
100 40 2019-06-13
10 50 2019-06-12
Use a conditional count to divide the data into separate groups, then use that group in the analytic calculation:
select a, b, dt, grp, sum(nvl(a, 0) + nvl(b, 0)) over (partition by grp order by dt) val
from (
select a, b, dt, count(case when a is not null then 1 end) over (order by dt) grp
from t order by dt desc)
order by dt desc
Sample result:
A B DT GRP VAL
------ ---------- ----------- ---------- ----------
20 2019-06-14 4 120
40 2019-06-13 4 100
10 50 2019-06-12 4 60
5 2 2019-06-11 3 7
6 1 2019-06-10 2 7
3 2019-06-09 1 14
7 4 2019-06-08 1 11
demo
I think what you want can be handled by using sum(<column>) over (...) together with last_value(...) over (...), as below:
with t( A,B, "date" ) as
(
select null, 20, date'2019-06-14' from dual union all
select null, 40, date'2019-06-13' from dual union all
select 10 ,50, date'2019-06-12' from dual
)
select nvl(a,sum(b) over (order by 1)+
last_value(a) ignore nulls
over (order by 1 desc)
) as a,
b, "date"
from t;
A B date
--- -- ----------
120 20 14.06.2019
120 40 13.06.2019
10 50 12.06.2019
Demo

Select except where different in SQL

I need a bit of help with a SQL query.
Imagine I've got the following table
id | date | price
1 | 1999-01-01 | 10
2 | 1999-01-01 | 10
3 | 2000-02-02 | 15
4 | 2011-03-03 | 15
5 | 2011-04-04 | 16
6 | 2011-04-04 | 20
7 | 2017-08-15 | 20
What I need is all dates where only one price is present.
In this example I need to get rid of rows 5 and 6 (because there are two different prices for the same date) and either 1 or 2 (because they're duplicates).
How do I do that?
select date,
count(distinct price) as prices -- included to test
from MyTable
group by date
having count(distinct price) = 1 -- distinct for the duplicate pricing
The following should work with any DBMS
SELECT id, date, price
FROM TheTable o
WHERE NOT EXISTS (
SELECT *
FROM TheTable i
WHERE i.date = o.date
AND (
i.price <> o.price
OR (i.price = o.price AND i.id < o.id)
)
)
;
JohnHC's answer is more readable and delivers the information the OP asked for ("[...] I need all the dates [...]").
My answer, though less readable at first, is more general (it allows for more complex tie-breaking criteria) and is also capable of returning the full row (with id and price, not just date).
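Applied to the sample table above, the NOT EXISTS query should return the surviving rows in full (row 2 is dropped as the higher-id duplicate; rows 5 and 6 are dropped because their date carries two different prices):
id | date | price
1 | 1999-01-01 | 10
3 | 2000-02-02 | 15
4 | 2011-03-03 | 15
7 | 2017-08-15 | 20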
;WITH CTE_1(ID ,DATE,PRICE)
AS
(
SELECT 1 , '1999-01-01',10 UNION ALL
SELECT 2 , '1999-01-01',10 UNION ALL
SELECT 3 , '2000-02-02',15 UNION ALL
SELECT 4 , '2011-03-03',15 UNION ALL
SELECT 5 , '2011-04-04',16 UNION ALL
SELECT 6 , '2011-04-04',20 UNION ALL
SELECT 7 , '2017-08-15',20
)
,CTE2
AS
(
SELECT A.*
FROM CTE_1 A
INNER JOIN
CTE_1 B
ON A.DATE=B.DATE AND A.PRICE!=B.PRICE
)
SELECT * FROM CTE_1 WHERE ID NOT IN (SELECT ID FROM CTE2)

SQL query to Calculate allocation / netting

Here is my source data,
Group | Item | Capacity
-----------------------
1 | A | 100
1 | B | 80
1 | C | 20
2 | A | 90
2 | B | 40
2 | C | 20
The above data shows the capacity to consume "something" for each item.
Now suppose I have a maximum of 100 allocated to each group. I want to distribute this "100" within each group up to each item's maximum capacity. So my desired output is like this:
Group | Item | Capacity | consumption
-------------------------------------
1 | A | 100 | 100
1 | B | 80 | 0
1 | C | 20 | 0
2 | A | 90 | 90
2 | B | 40 | 10
2 | C | 20 | 0
My question is how do I do it in a single SQL query (preferably avoiding any subquery construct). Please note, the number of items in each group is not fixed.
I was trying LAG() with running SUM(), but could not quite produce the desired output...
select
group, item, capacity,
sum (capacity) over (partition by group order by item range between UNBOUNDED PRECEDING AND CURRENT ROW) run_tot
from table_name
Without a subquery, using just the analytic SUM function:
SQL> create table mytable (group_id,item,capacity)
2 as
3 select 1, 'A' , 100 from dual union all
4 select 1, 'B' , 80 from dual union all
5 select 1, 'C' , 20 from dual union all
6 select 2, 'A' , 90 from dual union all
7 select 2, 'B' , 40 from dual union all
8 select 2, 'C' , 20 from dual
9 /
Table created.
SQL> select group_id
2 , item
3 , capacity
4 , case
5 when sum(capacity) over (partition by group_id order by item) > 100 then 100
6 else sum(capacity) over (partition by group_id order by item)
7 end -
8 case
9 when nvl(sum(capacity) over (partition by group_id order by item rows between unbounded preceding and 1 preceding),0) > 100 then 100
10 else nvl(sum(capacity) over (partition by group_id order by item rows between unbounded preceding and 1 preceding),0)
11 end consumption
12 from mytable
13 /
GROUP_ID I CAPACITY CONSUMPTION
---------- - ---------- -----------
1 A 100 100
1 B 80 0
1 C 20 0
2 A 90 90
2 B 40 10
2 C 20 0
6 rows selected.
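The two CASE expressions are just LEAST(running_total, 100), so the same idea can also be written a little more compactly (a sketch only, not from the answer above, against the same mytable):
select group_id, item, capacity,
       least(sum(capacity) over (partition by group_id order by item), 100)
     - least(nvl(sum(capacity) over (partition by group_id order by item
                  rows between unbounded preceding and 1 preceding), 0), 100) as consumption
from mytable;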
Here's a solution using recursive subquery factoring. This clearly ignores your preference to avoid subqueries, but doing this in one pass might be impossible.
Probably the only way to do this in one pass is to use MODEL, which I'm not allowed to code after midnight. Maybe someone waking up in Europe can figure it out.
with ranked_items as
(
--Rank the items. row_number() should also randomly break ties.
select group_id, item, capacity,
row_number() over (partition by group_id order by item) consumer_rank
from consumption
),
consumer(group_id, item, consumer_rank, capacity, consumption, left_over) as
(
--Get the first item and distribute as much of the 100 as possible.
select
group_id,
item,
consumer_rank,
capacity,
least(100, capacity) consumption,
100 - least(100, capacity) left_over
from ranked_items
where consumer_rank = 1
union all
--Find the next row by GROUP_ID and the artificial CONSUMER_RANK.
--Distribute as much left-over from previous consumption as possible.
select
ranked_items.group_id,
ranked_items.item,
ranked_items.consumer_rank,
ranked_items.capacity,
least(left_over, ranked_items.capacity) consumption,
left_over - least(left_over, ranked_items.capacity) left_over
from ranked_items
join consumer
on ranked_items.group_id = consumer.group_id
and ranked_items.consumer_rank = consumer.consumer_rank + 1
)
select group_id, item, capacity, consumption
from consumer
order by group_id, item;
Sample data:
create table consumption(group_id number, item varchar2(1), capacity number);
insert into consumption
select 1, 'A' , 100 from dual union all
select 1, 'B' , 80 from dual union all
select 1, 'C' , 20 from dual union all
select 2, 'A' , 90 from dual union all
select 2, 'B' , 40 from dual union all
select 2, 'C' , 20 from dual;
commit;
Does this work as expected?
WITH t AS
(SELECT GROUP_ID, item, capacity,
SUM(capacity) OVER (PARTITION BY GROUP_ID ORDER BY item RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) sum_run,
GREATEST(100-SUM(capacity) OVER (PARTITION BY GROUP_ID ORDER BY item RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), 0) AS remain
FROM table_name)
SELECT t.*,
LEAST(sum_run,lag(remain, 1, 100) OVER (PARTITION BY GROUP_ID ORDER BY item)) AS run_tot
FROM t
select group_id,item,capacity,(case when rn=1 then capacity else 0 end) consumption
from
(select group_id,item,capacity,
row_number() over (partition by group_id order by capacity desc) rn from mytable)

Simple SQL Server COUNT query (counting changes to values in a column)

I have a table with columns: MONTH, YEAR, PROJECT_ID, STATUS.
Status can be:
R (red).
A (amber).
G (green).
N (not started).
C (completed).
I want to know how many projects completed in a given month, i.e.:
where STATUS changed from anything that is NOT C to C;
It sounds simple...!
It's easy to find when any given project completed with:
SELECT TOP 1 MONTH,YEAR,PROJECT_ID FROM Table WHERE PROJECT_ID=9236 AND STATUS='C'
ORDER BY YEAR ASC, MONTH ASC
But given year = 2011 and month = 8 (for example), I have no idea how to find the number of projects that had status='C' for the first time that month. Any ideas?
Edit: projects are still included as rows with status='C' after they complete, so I can't just count the Cs as that will return the number of projects that completed in this AND previous months (hence the chronological ordering and select top 1).
Sample data for 10/2010 to 01/2011 months:
Month | Year | Project | Status
-------------------------------
10 | 2010 | A | G
11 | 2010 | A | C
12 | 2010 | A | C
1 | 2011 | A | C
10 | 2010 | B | R
11 | 2010 | B | R
12 | 2010 | B | R
1 | 2011 | B | R
10 | 2010 | C | G
11 | 2010 | C | G
12 | 2010 | C | G
1 | 2011 | C | C
10 | 2010 | D | A
11 | 2010 | D | C
12 | 2010 | D | C
1 | 2011 | D | C
^ Projects A and D were completed in 11/2010. Project B hasn't changed to completed in any of the four months shown. Project C was completed in 01/2011. {Month,Year,Project} is the primary key.
So, inputs and outputs would be:
10/2010 => 0
11/2010 => 2 (because of A and D)
12/2010 => 0
1/2011 => 1 (because of C)
This will give you the counts you are looking for
select p1.mm,p1.yyyy,COUNT(*)
from projs p1
join (select projid,MIN(yyyy*100+mm) as closedOn from projs
where stat='c' group by projId) xx
on xx.projId=p1.projId and p1.yyyy*100+p1.mm=xx.closedOn
where p1.stat='c'
group by p1.mm,p1.yyyy
The inner query determines the date the project closed, so you are finding all projects which closed this month...
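For the sample data in the question (using this answer's column names), the inner query resolves to one closing month per completed project; project B never appears because it has no 'C' row, so it never gets counted:
projid | closedOn
A | 201011
C | 201101
D | 201011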
There you go
WITH
src(month, year, project, status) AS (
SELECT 10,2010,'A','G' UNION ALL
SELECT 11,2010,'A','C' UNION ALL
SELECT 12,2010,'A','C' UNION ALL
SELECT 1,2011,'A','C' UNION ALL
SELECT 10,2010,'B','R' UNION ALL
SELECT 11,2010,'B','R' UNION ALL
SELECT 12,2010,'B','R' UNION ALL
SELECT 1,2011,'B','R' UNION ALL
SELECT 10,2010,'C','G' UNION ALL
SELECT 11,2010,'C','G' UNION ALL
SELECT 12,2010,'C','G' UNION ALL
SELECT 1,2011,'C','C' UNION ALL
SELECT 10,2010,'D','A' UNION ALL
SELECT 11,2010,'D','C' UNION ALL
SELECT 12,2010,'D','C' UNION ALL
SELECT 1,2011,'D','C'),
src_date (date, project, status) AS (
SELECT date = CONVERT(DATETIME, CONVERT(VARCHAR, year * 100 + month) + '01'), project, status
FROM src
)
SELECT month = CONVERT(VARCHAR, YEAR(alldates.date)) + '/' + CONVERT(VARCHAR, MONTH(alldates.date)),
projects = ISNULL(cnt.value,0)
FROM (
SELECT DISTINCT date
FROM src_date
) alldates
LEFT JOIN
(
SELECT date = min_date, value = COUNT(*)
FROM
(
SELECT project, min_date = MIN(date)
FROM src_date
WHERE status = 'C'
GROUP BY project
) mins
GROUP BY min_date
) cnt
ON alldates.date = cnt.date
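Run against the src data above, this should produce the per-month counts the question asks for (the query has no final ORDER BY, so row order is not guaranteed):
month | projects
2010/10 | 0
2010/11 | 2
2010/12 | 0
2011/1 | 1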
SELECT
distinctMonths.month,
distinctMonths.year,
count(countProjects.project) as numChanges
FROM
(
SELECT DISTINCT
month, year
FROM
Table
) as distinctMonths -- need to get all months available, independent of the project status, in case there were no completed ones during a given month
LEFT OUTER JOIN
(
SELECT
Month, Year, Project
FROM
Table
WHERE
status = 'C' AND
NOT EXISTS ( -- this will filter out our result set to only include the earliest instance of the given project's complete status
SELECT
1
FROM
Table t2
WHERE
t2.project = Table.project AND
t2.status = 'C' AND
( -- this will convert the date fragments into proper date values, that can be compared easily
cast(
cast(t2.year as varchar) + '-' + cast(t2.month as varchar) + '-1'
as datetime)
<
cast(
cast(table.year as varchar) + '-' + cast(table.month as varchar) + '-1'
as datetime)
)
)
) as countProjects ON
distinctMonths.month = countProjects.month AND
distinctMonths.year = countProjects.year
GROUP BY
distinctMonths.month,
distinctMonths.year
ORDER BY
distinctMonths.year,
distinctMonths.month
I like to use this function: lead() over().
If you have, for example, this select:
select Month, Year, Project, Status
from youTable
where 1 = 1 --if you have any condition
I find the next value of the "status" column with the lead() function and compare it with the current one, like so:
select count(1) as number from
(select lead(Status) over(partition by Project order by Year, Month) as nextStatus, Month, Year, Project, Status
from youTable
where 1=1) as tmp
where tmp.nextStatus <> tmp.Status
Now, number holds the count of changes to the "Status" column.