Allocation via SQL - Retaining repeating decimals for the sum() - sql

I am allocating a single unit across multiple rows using a calculation and storing the results in a table. When I then sum() the allocations, the sums are not whole numbers. Some of the allocations end up as numbers with repeating decimals, and the sum of those does not add back up to the whole number (à la 1/3 + 1/3 + 1/3 != 1).
I have tried casting the numbers to different types; however, Athena keeps rounding the decimals at some arbitrary precision, which causes the problem.
I would like the sum of the allocations to equal the sum of the original units.
My database is AWS Athena, which I understand uses the Presto SQL dialect.
Example of my allocation:
case
    when count_of_visits = 1 then 1
    when count_of_visits = 2 then .5
    when count_of_visits >= 3 then
        case
            when visit_seq_number = min_visit_seq_number then .4
            when visit_seq_number = max_visit_seq_number then .4
            else .2 / (count_of_visits - 2)
        end
    else 0
end as u_shp_alloc_leads
In this allocation, the first and last visits each get 40% of the allocation and all visits in between split the remaining 20%.
A unit that is allocated across 29 visits ends up dividing the 20% by 27, which equals 0.007407 repeating. The table stores 0.007407407407407408, and when I sum the numbers the result is 1.0000000000000004. I would like the result to be 1.

This is a limitation of binary floating-point arithmetic, and of fixed-precision arithmetic on computers in general. When you work with fractions like that, some sort of rounding will always take place.
I would round the sums you retrieve from your table to a reasonable number of decimal places; that cuts off these residual digits at the end.
If that's not sufficient for you, something you can do to at least theoretically have full precision is to store the numerator and denominator separately in two columns. When computing sum( numerator_column/denominator_column ) directly you will see the same rounding effects, so summing up the numbers has to be a little more complicated, like this:
SELECT sum(numerator_sum/denominator)
FROM (
    SELECT
        denominator,
        sum(numerator) as numerator_sum
    FROM your_allocation_table
    GROUP BY denominator
)
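The effect and both suggested remedies can be reproduced outside the database. The following is a minimal Python sketch, not Athena code: the `allocate` helper is a hypothetical stand-in for the CASE expression in the question, and `fractions.Fraction` plays the role of the separate numerator and denominator columns.

```python
from fractions import Fraction


def allocate(count_of_visits):
    """Exact per-visit shares for the 40/20/40 rule described in the question."""
    if count_of_visits < 1:
        return []
    if count_of_visits == 1:
        return [Fraction(1)]
    if count_of_visits == 2:
        return [Fraction(1, 2), Fraction(1, 2)]
    # First and last visits get 2/5 each; the middle visits split 1/5.
    middle = Fraction(1, 5) / (count_of_visits - 2)
    return [Fraction(2, 5)] + [middle] * (count_of_visits - 2) + [Fraction(2, 5)]


shares = allocate(29)

# Exact rational arithmetic: numerators and denominators are kept separately.
exact_total = sum(shares)

# The same shares as binary doubles: each share and each addition may be rounded.
float_total = sum(float(s) for s in shares)

# Rounding the final sum removes the residual digits.
rounded_total = round(float_total, 10)

print(exact_total, float_total, rounded_total)
```

The exact total is 1 by construction, while the floating-point total may differ from 1 in the last digits until it is rounded.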

Related

How to stop Snowflake from forcing the very small result of an integer division to zero

I'm writing a Snowflake query that calculates 1/2940744, and the result I get is 0.
How can I get the actual result of the calculation?
From docs:
Division
When performing division:
The leading digits for the output is the sum of the leading digits of the numerator and the scale of the denominator.
Snowflake minimizes potential overflow in the output (due to chained division) and loss of scale by adding 6 digits to the scale of the numerator, up to a maximum threshold of 12 digits, unless the scale of the numerator is larger than 12, in which case the numerator scale is used as the output scale.
In other words, assuming a division operation with numerator L1.S1 and denominator L2.S2, the maximum number of digits in the output are calculated as follows:
Scale S = max(S1, min(S1 + 6, 12))
If the result of the division operation exceeds the output scale, Snowflake rounds the output (rather than truncating the output).
Returning to example:
SELECT 1/2940744;
-- 0
DESC RESULT LAST_QUERY_ID();
The value 0.00000034005 was rounded to 0. To change this behaviour, one of the arguments can be cast explicitly:
SELECT 1::NUMBER(38,12)/2940744;
-- 0.00000034005
DESC RESULT LAST_QUERY_ID();
-- 1::NUMBER(38,12)/2940744 NUMBER(38,12)
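The scale rule quoted from the docs can be tried out with a small Python sketch. This only mimics the arithmetic with the `decimal` module; `division_scale` and `divide_at_scale` are hypothetical helper names, and the sketch assumes an integer literal has scale 0.

```python
from decimal import Decimal, ROUND_HALF_UP, localcontext


def division_scale(numerator_scale):
    """Output scale from the quoted rule: S = max(S1, min(S1 + 6, 12))."""
    return max(numerator_scale, min(numerator_scale + 6, 12))


def divide_at_scale(numerator, denominator, numerator_scale):
    """Divide, then round the quotient to the scale the rule predicts."""
    scale = division_scale(numerator_scale)
    with localcontext() as ctx:
        ctx.prec = 50  # plenty of working digits before the final rounding
        quotient = Decimal(numerator) / Decimal(denominator)
        # Decimal(1).scaleb(-6) is 0.000001, i.e. a template with 6 decimals.
        return quotient.quantize(Decimal(1).scaleb(-scale), rounding=ROUND_HALF_UP)


plain = divide_at_scale(1, 2940744, numerator_scale=0)     # like 1/2940744
widened = divide_at_scale(1, 2940744, numerator_scale=12)  # like 1::NUMBER(38,12)/2940744
print(plain, widened)
```

With a numerator scale of 0 the quotient keeps only 6 decimals and rounds to zero; with a numerator scale of 12 the significant digits survive.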
Thanks for the answer above. I saw it late and had already solved the problem myself by casting to ::double -> 1/5000000::double

How do I calculate the sum efficiently?

Given an integer n such that (1<=n<=10^18)
We need to calculate f(1)+f(2)+f(3)+f(4)+....+f(n).
f(x) is given as :-
Say, x = 1112222333,
then f(x)=1002000300.
Whenever we see a contiguous run of the same digit, we replace it with its first digit followed by zeroes.
Formally, f(x) = sum over all runs of (first digit of the run * 10^i), where i is the index of the run's first (leftmost) digit, with indices counted from the right.
f(x)=1*10^9 + 2*10^6 + 3*10^2 = 1002000300.
In x=1112222333, the element at index 9 is 1, and so on.
We follow zero-based indexing :-)
For x=1234, the element at index 0 is 4, the element at index 1 is 3, the element at index 2 is 2, and the element at index 3 is 1.
How to calculate f(1)+f(2)+f(3)+....+f(n)?
I want to generate an algorithm which calculates this sum efficiently.
There is nothing to calculate.
Multiplying each position in the array of numbers will yield the same number.
So all you want to do is end up with 0s on a repeated number.
I.e., let's populate some static values in an array in pseudocode:
```
$As[1]='0'
$As[2]='00'
$As[3]='000'
...etc
$As[18]='000000000000000000'
```
These are the "results" of 10^index.
Given a value n of `1234`
`1&000 + 2&00 + 3&0 + 4`
results in `1234`.
So, if you are putting this on a chip, then probably your most efficient method is to do a bitwise XOR between each register and the next up the line as a single operation.
Then you will have 0s in all the spots you care about, and you just retrieve the values in the registers with a 1.
In code, I think it would be most efficient to do the following:
```
$n = arbitrary value 11223334
$x=$n*10
$zeros=($x-$n)/10
```
Okay yeah we can just do bit shifting to get a value like 100200300400 etc.
To approach this problem, it could help to begin with one-digit numbers and see what sum you get.
I mean like this:
Let's say we define F(k) as the sum of f(x) over all x with at most k digits (0 <= x < 10^k, so leading zeros are allowed), then we have:
F(1)= 45 # =10*9/2 by Gauss's sum formula
F(2)= F(1)*9 + F(1)*100 # F(1)*9 is the part that comes from the last digit
                        # because for each of the 10 possible digits in the
                        # first position, we have 9 digits in the last
                        # because both can't be equal and so one out of ten
                        # becomes zero. F(1)*100 comes from the leading digit
                        # which is multiplied by 100 (10 because we add the
                        # second digit and another factor of 10 because we
                        # get the digit ten times in that position)
If you now continue with this scheme, for k>=1 in general you get
F(k+1)= F(k)*100+10^(k-1)*45*9
The rest is probably straightforward.
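The recurrence can be checked against a brute-force sum for small k. The sketch below is in Python and assumes that F(k) means f(0) + f(1) + ... + f(10^k - 1), which is the reading that matches F(1) = 45 and the "10 possible digits in the first position" argument. The function names are hypothetical. It only covers n of the form 10^k - 1; a general n would still need a digit-by-digit extension of the same idea.

```python
def f(x):
    """Keep the first digit of every run of equal digits; zero out the rest."""
    digits = str(x)
    total = 0
    for pos, ch in enumerate(digits):
        if pos == 0 or ch != digits[pos - 1]:
            # pos counts from the left, so the place value is 10^(len - 1 - pos).
            total += int(ch) * 10 ** (len(digits) - 1 - pos)
    return total


def prefix_sum_brute(k):
    """F(k) by direct summation: f(0) + f(1) + ... + f(10^k - 1)."""
    return sum(f(x) for x in range(10 ** k))


def prefix_sum_recurrence(k):
    """F(k) from F(1) = 45 and F(j+1) = 100*F(j) + 10^(j-1) * 45 * 9."""
    total = 45
    for j in range(1, k):
        total = total * 100 + 10 ** (j - 1) * 45 * 9
    return total


for k in range(1, 5):
    print(k, prefix_sum_brute(k), prefix_sum_recurrence(k))
```

The brute-force loop is only practical for small k, but it is enough to confirm that the two columns agree before relying on the recurrence for large inputs.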
Can you tell me which HackerRank task this is? I guess it is one of the Project Euler tasks, right?

Round number up until it is evenly divisible into another number - not limited to integers

As an example I have two numbers (40.25 & 1.88001). In reality the two numbers could be any number with up to five decimal places. However, the second number will always be <= the first number.
I need to round 1.88001 up until it is evenly divisible into 40.25. I need to find a factor of 40.25 that is greater than and closest to 1.88001.
I need to maintain the decimal precision if necessary, meaning I can't just round up to the nearest integer unless a whole number is indeed the closest factor.
I have found other solutions/questions that are similar but have found none that are applicable and/or SQL related.
So far I have a brute force solution that is too slow to be of any real use:
Declare @SheetLength as money = 40.25
Declare @PartLength as money = 1.88001
WHILE @SheetLength % @PartLength > 0
BEGIN
    SET @PartLength = @PartLength + .00001
END
Select @PartLength
There must be a faster/more efficient way to do this ...
What do you mean by "round up"? The simplest solution is to take the ratio of the two numbers, take the floor, and divide that into the larger number:
select @SheetLength / floor(@SheetLength / @PartLength)
For your example, this gives 1.91666667.
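The same floor-and-divide idea can be sketched in Python with the `decimal` module, which avoids binary floating-point noise in the inputs. The function name `smallest_even_part` is hypothetical, and the sketch assumes both lengths are positive with the part no longer than the sheet.

```python
from decimal import Decimal


def smallest_even_part(sheet_length, part_length):
    """Smallest length >= part_length that divides sheet_length a whole number of times."""
    sheet = Decimal(sheet_length)
    part = Decimal(part_length)
    pieces = int(sheet // part)  # how many whole parts fit into the sheet
    return sheet / pieces, pieces


length, pieces = smallest_even_part("40.25", "1.88001")
print(pieces, length)
```

This replaces the loop with two divisions: the floor gives the number of pieces that fit, and dividing the sheet by that count gives the smallest part length that is not below the requested one.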

Error taking int of logs in VBA

When I calculate log(8) / log(2) I get 3 as one would expect:
?log(8)/log(2)
3
However, if I take the int of this calculation like this the result is 2 and thus wrong:
?int(log(8)/log(2))
2
How and why does this happen?
Likely because the actual number returned is of type Double. Floating-point types cannot represent most values exactly (log(8) and log(2) are irrational to begin with), so the quotient is something like 2.99999999999 rather than exactly 3. The value is rounded to 3 when it is printed, but when you apply Int() the .999999999 is truncated and you get 2.
How a floating-point number works: it dedicates one bit to the sign, a few bits to the exponent, and the rest to the actual fraction. This leads to numbers being represented in a form similar to 1.45 * 10^4, except that the base is two instead of ten.
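The truncation effect is easy to reproduce in Python, which also uses IEEE doubles. The sketch below builds the largest double below 3 directly rather than relying on a particular logarithm result. `int_with_tolerance` is a hypothetical helper, and the tolerance of 1e-9 is an arbitrary assumption to tune for your data.

```python
import math


def int_with_tolerance(value, tol=1e-9):
    """Floor a value, but snap to the nearest integer when it is within tol of one."""
    nearest = round(value)
    if abs(value - nearest) <= tol:
        return nearest
    return math.floor(value)


# The largest double below 3.0: doubles in [2, 4) are spaced 2**-51 apart.
just_below_three = 3.0 - 2.0 ** -51

print(f"{just_below_three:.10g}")            # shown with 10 significant digits
print(int(just_below_three))                 # truncation drops the fraction
print(int_with_tolerance(just_below_three))  # snaps to the nearby integer
```

Printing with a limited number of digits hides the difference, which is why the unrounded result looks like 3 while the truncated one is 2.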

Why does decimal behave differently?

I am doing this small exercise.
declare @No decimal(38,5);
set @No=12345678910111213.14151;
select @No*1000/1000,@No/1000*1000,@No;
Results are:
12345678910111213.141510
12345678910111213.141000
12345678910111213.14151
Why are the results of the first two expressions different when mathematically they should be the same?
It is not going to do algebra to convert 1000/1000 to 1. It is going to actually follow the order of operations and do each step.
@No*1000/1000
yields: @No*1000 = 12345678910111213141.51000
then /1000 = 12345678910111213.141510
and
@No/1000*1000
yields: @No/1000 = 12345678910111.213141
then *1000 = 12345678910111213.141000
By dividing first you lose decimal digits.
Because of rounding: the second expression first divides by 1000, which gives 12345678910111.21314151, but your decimal is only (38,5), so you lose the last two decimal digits.
Because when you divide first you get:
12345678910111.21314151
then only six decimal digits are left after the point:
12345678910111.213141
then *1000
12345678910111213.141
Because the intermediary type is the same as the argument's, in this case decimal(38,5), dividing first gives you a loss of precision that's reflected in the truncated answer. Multiplying by 1000 first doesn't give any loss of precision because that doesn't overflow 38 digits.
It's probably because you lose part of the data by doing the division first. Notice that @No has 5 decimal places, so when you divide this number by 1000 you suddenly need 8 digits for the decimal part:
123.12345 / 1000 = 0.12312345
So the value has to be rounded (0.12312) and then this value is multiplied by 1000 -> 123.12 (you lose 0.00345).
I think that's why the result is what it is...
The first does @No*1000 then divides it by 1000. The intermediate values are always able to represent all the decimal places. The second expression first divides by 1000, which throws away the last two decimal places, before multiplying back to the original value.
You can get around the problem by using CONVERT or CAST on the first value in your expression to increase the number of decimal places and avoid a loss of precision.
DECLARE @num decimal(38,5)
SET @num = 12345678910111213.14151
SELECT CAST(@num AS decimal(38,8)) / 1000 * 1000
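The order-of-operations effect discussed in the answers above can be imitated in Python with the `decimal` module. This is only a sketch of the arithmetic, not of SQL Server's type rules: the six-decimal scale of the intermediate quotient and the use of truncation (`ROUND_DOWN`) are assumptions chosen because they reproduce the output shown in the question.

```python
from decimal import Decimal, ROUND_DOWN, localcontext

value = Decimal("12345678910111213.14151")
six_places = Decimal("0.000001")  # assumed scale of the intermediate quotient

with localcontext() as ctx:
    ctx.prec = 50  # enough working digits that only quantize() drops anything
    # Divide first: the quotient needs 8 decimals but only 6 are kept.
    divide_first = (value / 1000).quantize(six_places, rounding=ROUND_DOWN) * 1000
    # Multiply first: every intermediate value still fits, nothing is dropped.
    multiply_first = (value * 1000) / 1000

print(divide_first)
print(multiply_first)
print(value - divide_first)  # the digits lost by dividing first
```

Widening the scale before dividing, as the CAST above does, corresponds to quantizing to eight or more places here, at which point nothing is lost.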