Cumulative sum based on same column calculated result - sql

I have the following table, for which I am trying to calculate a running balance, and remaining value, but the remaining value is the function of the previously calculated row, as such:
date        PR  amount  total    balance  remaining_value
----------------------------------------------------------
'2020-1-1'  1    1.0     100.0   1.0      100    -- 100 (initial total)
'2020-1-2'  1    2.0     220.0   3.0      320    -- 100 (previous row) + 220
'2020-1-3'  1   -1.5    -172.5   1.5      160    -- 320 - 160 (see explanation 1)
'2020-1-4'  1    3.0     270.0   4.5      430    -- 160 + 270
'2020-1-5'  1    1.0      85.0   5.5      515    -- 430 + 85
'2020-1-6'  1    2.0     202.0   7.5      717    -- 515 + 202
'2020-1-7'  1   -4.0    -463.0   3.5      334.6  -- 717 - 382.4 (see explanation 2)
'2020-1-8'  1   -0.5     -55.0   3.0      ...
'2020-1-9'  1    2.0     214.0   5.0
'2020-1-1'  2    1.0     100     1.0      100    -- different PR: start new running total
The logic is as follows:
For positive amount rows, the remaining value is simply the value from the previous row in column remaining_value + the value in column total from that row.
For negative amount rows, it gets trickier:
Explanation 1: We start with 320 (the previous row's remaining_value). We take 1.5/3.0 (the absolute value of the current row's amount divided by the previous row's balance), multiply it by the previous row's remaining_value (320), and subtract the result. The calculation gives:
320 - (1.5/3 * 320) = 160
Explanation 2: Same logic as above. 717 - (4/7.5 * 717) = 717 - 382.4
4/7.5 here represents the current row's absolute amount divided by the previous row's balance.
I tried the window function sum() but did not manage to get the desired result. Is there a way to get this done in PostgreSQL without having to resort to a loop?
Extra complexity: There are multiple products identified by PR (product id): 1, 2, etc. Each needs its own running total and calculation.

You could create a custom aggregate function:
CREATE OR REPLACE FUNCTION f_special_running_sum (_state numeric, _total numeric, _amount numeric, _prev_balance numeric)
RETURNS numeric
LANGUAGE sql IMMUTABLE AS
'SELECT CASE WHEN _amount > 0 THEN _state + _total
ELSE _state * (1 + _amount / _prev_balance) END';
CREATE OR REPLACE AGGREGATE special_running_sum (_total numeric, _amount numeric, _prev_balance numeric) (
sfunc = f_special_running_sum
, stype = numeric
, initcond = '0'
);
The CASE expression does the split: If amount is positive, just add total, else apply your (simplified) formula:
320 * (1 + -1.5 / 3.0) instead of 320 - (1.5/3 * 320), i.e.:
_state * (1 + _amount / _prev_balance)
Function and aggregate parameter names are only for documentation.
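As a quick cross-check of the state transition outside the database, here is a minimal Python sketch that applies the same CASE logic to the first seven PR = 1 rows from the question. The helper name `step` and the row list are illustrative only; exact fractions stand in for PostgreSQL's numeric type.

```python
from fractions import Fraction

def step(state, total, amount, prev_balance):
    """One transition of the running value, mirroring the CASE expression."""
    if amount > 0:
        return state + total
    return state * (1 + amount / prev_balance)

# (amount, total, balance) for PR = 1, first seven rows of the question's table
rows = [("1.0", "100.0", "1.0"), ("2.0", "220.0", "3.0"), ("-1.5", "-172.5", "1.5"),
        ("3.0", "270.0", "4.5"), ("1.0", "85.0", "5.5"), ("2.0", "202.0", "7.5"),
        ("-4.0", "-463.0", "3.5")]

state = Fraction(0)         # plays the role of initcond = '0'
prev_balance = Fraction(1)  # plays the role of the lag() default of 1
remaining = []
for amount, total, balance in rows:
    amount, total, balance = Fraction(amount), Fraction(total), Fraction(balance)
    state = step(state, total, amount, prev_balance)
    remaining.append(float(state))
    prev_balance = balance  # becomes "previous balance" for the next row

print(remaining)
```

The printed values should match the remaining_value column of the question for those rows (100, 320, 160, 430, 515, 717, 334.6).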
Then your query can look like this:
SELECT *
, special_running_sum(total, amount, prev_balance) OVER (PARTITION BY pr ORDER BY date)
FROM (
SELECT pr, date, amount, total
, lag(balance, 1, '1') OVER (PARTITION BY pr ORDER BY date) AS prev_balance
FROM tbl
) t;
db<>fiddle here
We need a subquery to apply the first window function lag() and fetch the previous balance into the current row (prev_balance). I default to 1 if there is no previous row to avoid NULL values.
Caveats:
If the first row has a negative total, the result is undefined. My aggregate function defaults to 0.
You did not declare data types, nor requirements regarding precision. I assume numeric and aim for maximum precision. The calculation with numeric is precise. But your formula produces fractional decimal numbers. Without rounding, there will be a lot of fractional digits after a couple of divisions, and the calculation will quickly degrade in performance. You'll have to strike a compromise between precision and performance. For example, doing the same with double precision has constant performance.
Related:
Cumulative adding with dynamic base in Postgres

Calculate dollars to millions returns integer

I am trying to convert a dollar amount (stored in the DB as Decimal(15, 0) NOT NULL) to millions by dividing, but the result is always an integer.
desired result:
AMOUNT AMOUNT IN MIL
123000 0.1
1123000 1.123
I have this, but it returns only 0, 1...
SELECT ... AMOUNT / 1000000 AS "AMOUNT IN MIL" FROM ....
Try casting the value to a type with 6 fractional digits before the calculation:
cast(amount as decimal(28,6)) / 1000000
You should familiarize yourself with rules of such arithmetic operations. Refer to the Expressions article.
Briefly:
INT / INT = INT
DEC(p, s) / DEC(p', s') = DEC(31, 31-p+s-s')
So, if you want to get DEC(31, X) from INT / INT, you can explicitly cast the numerator to DEC(31-X).
In your case (X=6):
> db2 describe dec(1, 25)/1000000
Column Information
Number of columns: 1
SQL type              Type length  Column name                     Name length
--------------------  -----------  ------------------------------  -----------
484   DECIMAL            31, 6     1                                         1
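The scale rule quoted above can be evaluated by hand, or with a tiny Python sketch like the one below. It only restates the formula 31-p+s-s' from this answer; it does not query DB2, and the function name is made up for illustration.

```python
def db2_division_scale(p, s, s2):
    """Result scale of DEC(p, s) / DEC(p', s2) per the rule quoted above."""
    return 31 - p + s - s2

# dec(1, 25) / 1000000: numerator DEC(25, 0), integer divisor with scale 0
print(db2_division_scale(25, 0, 0))

# cast(amount as decimal(28,6)) / 1000000
print(db2_division_scale(28, 6, 0))
```

The first call gives 6, which agrees with the DECIMAL 31, 6 reported by db2 describe.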

Calculate average of a row in SQL Server

I have the table below. The percent columns are of type nvarchar.
Data Percent1 Percent2 Percent3
1 3% 4% 6%
2 6% 8% 7%
3 8% 6% 8%
I have to calculate the Avg per line so I get results like
Data Avg
1 4.33%
I was trying to convert the %column into decimal so I can apply the average function
select
Case WHEN Isnumeric([Percent1]) = 1
THEN CONVERT(DECIMAL(18,2),Replace([Percent1],'%',''))
ELSE 0 END AS Percent1
from DashboardData
but I am just getting 0 values. I am guessing the outer function is running before the inner one for some reason. Can someone please tell me how I can achieve this?
I know the IsNumeric check is what turns the values into 0, but before adding it I was getting an exception saying that the type is not a number.
Thanks
SELECT ISNUMERIC('3%') will return 0, as will all the rest of your values, so your else condition will always be the result.
Just drop the %
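select
data,
-- cast before adding: on strings, + concatenates ('3' + '4' + '6' = '346')
(cast(replace(Percent1,'%','') as decimal(18,2)) + cast(replace(Percent2,'%','') as decimal(18,2))
+ cast(replace(Percent3,'%','') as decimal(18,2))) / 3
from DashboardData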
Note, if any of these values are NULL you need to account for that because NULL + anything IS NULL.
Also, you don't want to lean on ISNUMERIC too heavily. It can produce some results you probably aren't expecting:
select
ISNUMERIC('$') --money which is a numeric value
,ISNUMERIC('1e4') --scientific notation
,ISNUMERIC('45D-1') --old scientific notation
,ISNUMERIC('.') --just the decimal portion of a float / decimal
Is this what you want?
select dd.*, s.average
from dashboarddata dd cross apply
(select avg(try_convert(numeric(10, 2), replace(pc, '%', ''))) as average
from (values (percent1), (percent2), (percent3)) as v(pc)
) s;
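To sanity-check the expected per-row averages outside SQL Server, here is a small Python sketch using the sample rows from the question. The function name `row_average` is hypothetical; skipping missing or non-numeric entries is an assumption chosen to resemble what AVG over TRY_CONVERT results does with NULLs.

```python
def row_average(percent_strings):
    """Average '3%'-style strings, skipping None and non-numeric entries."""
    values = []
    for text in percent_strings:
        if text is None:
            continue
        try:
            values.append(float(text.replace("%", "")))
        except ValueError:
            continue
    return sum(values) / len(values) if values else None

rows = {1: ["3%", "4%", "6%"], 2: ["6%", "8%", "7%"], 3: ["8%", "6%", "8%"]}
for data, percents in rows.items():
    print(data, f"{row_average(percents):.2f}%")
```

For Data = 1 this prints 4.33%, the value the question asks for.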

Redshift division result does not include decimals

I'm trying to do something really quite basic to calculate a kind of percentage between two columns in Redshift. However, when I run the query with an example the result is simply zero because the decimals are not being covered.
code:
select 1701 / 84936;
Output:
I tried :
select cast(1701 / 84936 as numeric (10,10));
but the result was 0.0000000000.
How could I solve this silly thing?
This is integer division. Make sure that at least one operand is NUMERIC (an exact data type) or FLOAT (caution: an approximate data type):
/ division (integer division truncates the result)
select 1701.0 / 84936;
-- or
SELECT 1.0 * 1701 / 84936;
-- or
SELECT CAST(1701 AS NUMERIC(10,4))/84936;
DBFiddle Demo
When mixing data types the order counts
Note that the order of the elements in a math expression counts for the data type of the result.
Let's assume that we intend to calculate the percentage unit_sales/total_sales where both columns (or numbers) are integers.
See and try with this code here.
-- Some dummy table
drop table if exists sales;
create table sales as
select 3 as unit_sales, 9 as total_sales;
-- The calculations
select
unit_sales/total_sales*100, --> 0 (integer)
unit_sales/total_sales*100.0, --> 0.0 (float)
100.0*unit_sales/total_sales --> 33.3 (float and expected result)
from sales;
The output
0 | 0.0 | 33.33
The first column is 0 (integer) because 3/9 = 0 in integer division.
The second column is 0.0 because SQL first computes the integer 0 (3/9) and only then converts it to a non-integer type to perform the multiplication by 100.0.
The third column is the expected result: the non-integer 100.0 at the beginning of the expression forces a non-integer calculation from the start.
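The same ordering effect can be reproduced locally with Python's built-in sqlite3 module. SQLite is only a stand-in here (an assumption, not Redshift), but it also truncates integer / integer, so the three expressions behave the same way.

```python
import sqlite3

# SQLite stands in for Redshift; both truncate integer / integer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (unit_sales INTEGER, total_sales INTEGER)")
conn.execute("INSERT INTO sales VALUES (3, 9)")
row = conn.execute(
    "SELECT unit_sales / total_sales * 100, "
    "       unit_sales / total_sales * 100.0, "
    "       100.0 * unit_sales / total_sales "
    "FROM sales"
).fetchone()
conn.close()

print(row)
```

The first two values come out as zero because 3/9 is evaluated as an integer division before anything else; only the third expression yields roughly 33.33.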

SQL Rounding Percentages to make the sum 100% - 1/3 as 0.34, 0.33, 0.33

I am trying to split one value using a percentage column. Because most of the percentage values are 1/3, I cannot get exactly 100% with two decimal places in the value. For example:
Product   Supplier   percentage      totalvalue     customer_split
                     decimal(15,14)  decimal(18,2)  decimal(18,2)
--------  ---------  --------------  -------------  --------------
Product1  Supplier1  0.33            10.00          3.33
Product1  Supplier2  0.33            10.00          3.33
Product1  Supplier3  0.33            10.00          3.33
So here we are missing 0.01 in the value column, and the suppliers would like this missing 0.01 to be assigned to any one of the suppliers at random. I have been trying to do this with two sets of SQL statements and temporary tables, but is there a simpler way? If possible, how can I get 0.34 in the percentage column itself for one of the rows above? 0.01 is a negligible value, but when the value column is 1000000000 it becomes significant.
It sounds like you're doing some type of "allocation" here. This is a common problem any time you are trying to allocate something from a higher granularity to a lower granularity, and you need to be able to re-aggregate to the total value correctly.
This becomes a much bigger problem when dealing with larger fractions.
For example, if I try to divide a total value of, say $55.30 by eight, I get a decimal value of $6.9125 for each of the eight buckets. Should I round one to $6.92 and the rest to $6.91? If I do, I will lose a cent. I would have to round one to $6.93 and the others to $6.91. This gets worse as you add more buckets to divide by.
In addition, when you start to round, you introduce problems like "Should 33.339 be rounded to 33.34 or 33.33?"
If your business logic is such that you just want to take whatever remainder exists beyond 2 decimal places and add it to one of the dollar values "randomly" so you don't lose any cents, @Diego is on the right track with this.
Doing it in pure SQL is a bit more difficult. For starters, your percentage isn't 1/3, it's .33, which will yield a total value of 9.9, not 10. I would either store this as a ratio or as a high-precision decimal field (.33333333333333).
P S PCT Total
-- -- ------------ ------
P1 S1 .33333333333 10.00
P2 S2 .33333333333 10.00
P3 S3 .33333333333 10.00
SELECT
    BaseTable.P, BaseTable.S,
    CASE WHEN BaseTable.S = TotalTable.MinS
         THEN BaseTable.BaseAllocatedValue + TotalTable.Remainder
         ELSE BaseTable.BaseAllocatedValue
    END AS AllocatedValue
FROM
    (SELECT
        P, S, FLOOR(PCT * Total * 100) / 100 AS BaseAllocatedValue
     FROM dataTable) BaseTable
INNER JOIN
    (SELECT
        P, MIN(S) AS MinS,
        -- round the summed remainder to whole cents (.0099999... becomes .01)
        ROUND(SUM((PCT * Total) - FLOOR(PCT * Total * 100) / 100), 2) AS Remainder
     FROM dataTable
     GROUP BY P) AS TotalTable
ON (BaseTable.P = TotalTable.P)
It appears your calculation is an equal distribution based on the total number of products per supplier. If it is, it may be advantageous to remove the percentage and instead just store the count of items per supplier in the table.
If it is also possible to store a flag indicating the row that should get the remainder value applied to it, you could assign based on that flag instead of randomly.
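The floor-then-assign-the-remainder idea can be sketched in a few lines of Python with exact decimals. This is an illustration under stated assumptions, not a translation of the query above: it assumes an equal split, and the function name `allocate_evenly` and the choice of the first bucket as the recipient are arbitrary.

```python
from decimal import Decimal, ROUND_DOWN

CENT = Decimal("0.01")

def allocate_evenly(total, buckets):
    """Split total into equal shares floored to cents; the first bucket
    absorbs whatever cents the flooring left over."""
    share = (total / buckets).quantize(CENT, rounding=ROUND_DOWN)
    shares = [share] * buckets
    shares[0] += total - share * buckets
    return shares

print(allocate_evenly(Decimal("10.00"), 3))
print(allocate_evenly(Decimal("55.30"), 8))
```

For 10.00 over three buckets this gives 3.34, 3.33, 3.33; for 55.30 over eight it gives one bucket of 6.93 and seven of 6.91, matching the arithmetic described earlier in this answer.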
Run this; it should give you an idea of how you can solve your problem.
I created a table called orders with just an ID to keep it easy to understand:
create table orders(
customerID int)
insert into orders values(1)
go 3
insert into orders values(2)
go 3
insert into orders values(3)
go 3
these values represent the 33% you have
1 33.33
2 33.33
3 33.33
now:
create table #tempOrders(
customerID int,
percentage numeric(10,2))
declare @maxOrder int
declare @maxOrderID int
select @maxOrderID = max(customerID) from orders
declare @total numeric(10,2)
select @total = count(*) from orders
insert into #tempOrders
select customerID, cast(100*count(*)/@total as numeric(10,2)) as Percentage
from orders
group by customerID
update #tempOrders set percentage = percentage + (select 100-sum(Percentage) from #tempOrders)
where customerID = @maxOrderID
This code calculates each percentage and finds the order with the max ID. It then takes the difference between 100 and the sum of the percentages and adds it to the order with the max ID (your "random" order).
select * from #tempOrders
1 33.33
2 33.33
3 33.34
This should be an easy task using Windowed Aggregate Functions. You probably use them already for the calculation of customer_split:
totalvalue / COUNT(*) OVER (PARTITION BY Product) as customer_split
Now sum up the customer_splits, and if there is a difference from the total value, add it to (or subtract it from) one arbitrary row.
SELECT
Product
,Supplier
,totalvalue
,customer_split
+ CASE
WHEN COUNT(*)
OVER (PARTITION BY Product
ROWS UNBOUNDED PRECEDING) = 1 -- get a random row, using row_number/order you might define a specific row
THEN totalvalue - SUM(customer_split)
OVER (PARTITION BY Product)
ELSE 0
END
FROM
(
SELECT
Product
,Supplier
,totalvalue
,totalvalue / COUNT(*) OVER (PARTITION BY Product) AS customer_split
FROM dropme
) AS dt
After some trial and error, I think I found a better solution.
Idea
Get the count of all rows (Count(*)) based on your conditions
Get Row_Number()
Check if (Row_Number() value < Count(*))
Then select round(curr_percentage,2)
Else
Get the sum of all other percentages (rounded) and subtract it from 100
These steps select the current percentage for every row EXCEPT the last one, which will be
100 - the sum of all other percentages
This is part of my code:
Select your_cols
,(Select count(*) from [tbl_Partner_Entity] pa_et where [E_ID] = @E_ID)
AS cnt_all
,(ROW_NUMBER() over ( order by pe.p_id)) as row_num
,Case when (
(ROW_NUMBER() over ( order by pe.p_id)) <
(Select count(*) from [tbl_Partner_Entity] pa_et where [E_ID] = @E_ID))
then round(([partnership_partners_perc]*100),2)
else
100-
((select sum(round(([partnership_partners_perc]*100),2)) FROM [dbo].
[tbl_Partner_Entity] PEE where [E_ID] = @E_ID and pee.P_ID != pe.P_ID))
end AS [partnership_partners_perc_Last]
FROM [dbo].[tbl_Partner_Entity] PE
where [E_ID] = @E_ID
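The idea behind this query (round every share except the last, which takes 100 minus the sum of the others) can be sketched in Python as follows. The function name is hypothetical and the input is assumed to be a list of fractions such as 1/3.

```python
def percentages_summing_to_100(fractions):
    """Round each share to 2 decimals as a percentage; the last one is
    set to 100 minus the sum of the others."""
    rounded = [round(f * 100, 2) for f in fractions[:-1]]
    rounded.append(round(100 - sum(rounded), 2))
    return rounded

print(percentages_summing_to_100([1 / 3, 1 / 3, 1 / 3]))
```

With three equal shares, the first two come out as 33.33 and the last as 33.34.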

How can I round a column in a single SQL request without changing the overall sum?

I've got a table defined like this :
create table #tbFoo
(bar float)
And I'm looking for a way to round every value contained in column bar without changing the total sum (which is known to be an integer, or very close to an integer because of float number precision).
Rounding every value to the nearest integer won't work (ex : 1,5;1,5 will be rounded to 1;1 or 2;2)
It's quite easy to do this using several requests (eg storing the original sum, rounding, computing the new sum, and updating as many rows as needed to go back to the original sum), but this is not a very elegant solution.
Is there a way to do this using a single SQL request?
I'm using SQL Server 2008, so solutions taking advantage of this specific vendor are welcome.
Edit : I'm looking for a request minimizing the differences between the old values and the new ones. In other words, a value should never be rounded up if a greater value has been rounded down, and vice-versa
Update:
See this solution explained in more details in the article in my blog:
Rounding numbers preserving their sum
You need to keep cumulative offset for each value:
1.2 (1 + 0.0) ~ 1 1 1.2 +0.2
1.2 (1 + 0.2) ~ 1 2 2.4 +0.4
1.2 (1 + 0.4) ~ 1 3 3.6 +0.6
1.2 (1 + 0.6) ~ 2 5 4.8 -0.2
1.2 (1 - 0.2) ~ 1 6 6.0 0.0
This is easily done in MySQL, but in SQL Server you will have to write a cursor or use cumulative subselects (which are less efficient).
Update:
The query below selects the difference between the sums of the values and of those rounded down to the nearest smaller integer.
This gives us the number (N) of values we should round up.
Then we order the values by their fractional part (ones that are closer to their ceiling go first) and round the first N up, the others down.
SELECT value,
FLOOR(value) + CASE WHEN ROW_NUMBER() OVER (ORDER BY value - FLOOR(value) DESC) <= cs THEN 1 ELSE 0 END AS nvalue
FROM (
SELECT cs, value
FROM (
SELECT SUM(value) - SUM(FLOOR(value)) AS cs
FROM @mytable
) c
CROSS JOIN
@mytable
) q
Here's the script for the test data:
SET NOCOUNT ON
GO
SELECT RAND(0.20090917)
DECLARE @mytable TABLE (value FLOAT NOT NULL)
DECLARE @cnt INT;
SET @cnt = 0;
WHILE @cnt < 100
BEGIN
INSERT
INTO @mytable
VALUES (FLOOR(RAND() * 100) / 10)
SET @cnt = @cnt + 1
END
INSERT
INTO @mytable
SELECT 600 - SUM(value)
FROM @mytable
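The rounding rule used by the query above (floor everything, then round up the N values with the largest fractional parts) can be sketched in Python for a quick check. The function name is made up; rounding the float difference to the nearest integer to obtain N is an assumption made to absorb floating-point noise.

```python
import math

def round_preserving_sum(values):
    """Floor every value, then add 1 to the N values with the largest
    fractional parts, where N = sum(values) - sum(floors)."""
    floors = [math.floor(v) for v in values]
    # number of values that must be rounded up to keep the total
    n_up = round(sum(values) - sum(floors))
    # indices ordered by fractional part, largest first
    order = sorted(range(len(values)),
                   key=lambda i: values[i] - floors[i], reverse=True)
    result = floors[:]
    for i in order[:n_up]:
        result[i] += 1
    return result

print(round_preserving_sum([1.5, 1.5]))
print(round_preserving_sum([2.7, 1.1, 0.2]))
```

For the 1,5;1,5 case from the question, one value is rounded up and the other down, so the total of 3 is preserved.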
If you have a list of n values whose elements are accurate only to within an integer value (+-0.5), then any sum of those elements will have a cumulative error of up to +-(n*0.5). If you have 6 elements in your list which should add up to some number, then your worst case scenario is that you're off by 3 if you just add the integer values.
If you find some way of showing 10.2 as 11 in order to make the sum work, you've changed the precision of that element from +-0.5 to +-0.8, which is counterintuitive when looking at integers.
One possible solution to think about is to round your number during display only (using some format string on your output), not already at the retrieval stage. Each number will be as close as possible to the actual value, but the sum will be more correct too.
Example: If you have 3 values of 1/3 each, displayed as whole-numbered percentages, then you should be showing 33, 33 and 33. To do anything else is to create a margin of error greater than +-0.5 for any individual value. Your total should still be displayed as 100%, because that is the best possible value (as opposed to working with sums of already rounded values)
Also, be aware that by using a float, you've already introduced a limitation on your precision because you have no way of accurately representing 0.1. For more on that, read
What Every Computer Scientist Should Know About Floating-Point Arithmetic
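The point about 0.1 is easy to see in any language that uses binary floating point. A short Python illustration (Python floats are IEEE 754 doubles, as is SQL Server's default float):

```python
from decimal import Decimal

# Decimal(float) shows the exact binary value actually stored for 0.1
print(Decimal(0.1))

# Sums of such values drift slightly from the decimal result
print(0.1 + 0.2 == 0.3)
```

The first line prints a long decimal expansion slightly above 0.1, and the comparison prints False.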
First get the difference between the rounded sum and the actual sum, and the number of records:
declare @Sum float, @RoundedSum float, @Cnt int
select @Sum = sum(bar), @RoundedSum = sum(round(bar, 0)), @Cnt = count(*)
from #tbFoo
Then you spread the difference equally on all values before rounding:
declare @Offset float
set @Offset = (@Sum - @RoundedSum) / @Cnt
select bar = round(bar + @Offset, 0)
from #tbFoo