Precise Multiplication

first post!
I have a problem with a program I'm writing for a numerical simulation; specifically, with the multiplication. Basically, I am trying to calculate:
result1 = (a + b)*c
and this loops thousands of times. I need to expand this code to be
result2 = a*c + b*c
However, when I do that I start to get significant errors in my results. I used a high precision library, which did improve things, but the simulation ran horribly slowly (it took 50 times longer), so it isn't a practical solution. From this I realised that it isn't really the precision of the variables a, b, & c that is hurting me, but something in the way the multiplication is done.
My question is: how can I multiply out these brackets in such a way that result1 = result2?
Thanks.
SOLVED!!!!!!!!!
It was a problem with the addition. So I reordered the terms and applied Kahan addition by writing the following piece of code:
double Modelsimple::sum(double a, double b, double c, double d) {
    // sort the four values from smallest to greatest
    double tempone   = (a < b ? a : b);
    double temptwo   = (c < d ? c : d);
    double tempthree = (a > b ? a : b);
    double tempfour  = (c > d ? c : d);
    double one  = (tempone < temptwo ? tempone : temptwo);
    double four = (tempthree > tempfour ? tempthree : tempfour);
    double tempfive = (tempone > temptwo ? tempone : temptwo);
    double tempsix  = (tempthree < tempfour ? tempthree : tempfour);
    double two   = (tempfive < tempsix ? tempfive : tempsix);
    double three = (tempfive > tempsix ? tempfive : tempsix);

    // Kahan addition: carry each add's rounding error into the next add
    double total = one;
    double tempsum = one + two;
    double error = (tempsum - one) - two;
    total = tempsum;
    // first iteration complete

    double tempadd = three - error;
    tempsum = total + tempadd;
    error = (tempsum - total) - tempadd;
    total = tempsum;
    // second iteration complete

    tempadd = four - error;
    total += tempadd;
    return total;
}
This gives me results that are as close to the precise answer as makes no difference. However, in a fictitious simulation of a mine collapse, the code with the Kahan addition takes 2 minutes, whereas the high precision library takes over a day to finish!
Thanks to all the help here. This problem was really a pain in the a$$.
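(For anyone landing here later: the same compensated-summation trick generalizes from four terms to any number of terms. A minimal sketch in plain C, with names of my own choosing:)

#include <stddef.h>

/* Kahan (compensated) summation over an array: the error term recovers
   the low-order bits that a plain += would discard on every step. */
double kahan_sum(const double *values, size_t n) {
    double total = 0.0;
    double error = 0.0;                        /* running compensation */
    for (size_t i = 0; i < n; i++) {
        double adjusted = values[i] - error;
        double tempsum  = total + adjusted;
        error = (tempsum - total) - adjusted;  /* lost low-order part */
        total = tempsum;
    }
    return total;
}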

I am presuming your numbers are all floating point values.
You should not expect result1 to equal result2, due to limitations in the scale of the numbers and the precision of the calculations. Which one to use will depend upon the numbers you are dealing with. More important than result1 and result2 being the same is that they are close enough to the real answer (e.g. the one you would have calculated by hand) for your application.
Imagine that a and b are both very large, and c much less than 1. (a + b) might overflow so that result1 will be incorrect. result2 would not overflow because it scales everything down before adding.
There are also problems with loss of precision when combining numbers of widely differing size, as the smaller number has significant digits reduced when it is converted to use the same exponent as the larger number it is added to.
If you give some specific examples of a, b and c which are causing you issues it might be possible to suggest further improvements.
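To make the overflow point concrete, here is a small demo of my own (float is used so the limits are easy to hit; FLT_MAX is about 3.4e38):

#include <stdio.h>

int main(void) {
    float a = 3e38f, b = 3e38f, c = 0.5f;
    printf("(a+b)*c   = %g\n", (a + b) * c);   /* inf: a + b overflows first    */
    printf("a*c + b*c = %g\n", a * c + b * c); /* 3e38: scaling first avoids it */
    return 0;
}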

I have been using the following program as a test, using values for a and b between 10^5 and 10^10, and c around 10^-5, but so far cannot find any differences.
Thinking about the storage of 10^5 vs 10^10, I think they need about 17 bits vs 34 bits, so you may lose about 17 bits of precision when you add a and b together in result1.
But multiplying them by the same value c essentially reduces the exponent and leaves the significand the same, so it should also lose about 17 bits of precision in result2.
A double significand usually stores 53 bits, so I suspect your results will still retain about 36 bits, or roughly 10 decimal digits of precision.
#include <stdio.h>

int main(void)
{
    double a = 13584.9484893449;
    double b = 43719848748.3911;
    double c = 0.00001483394434;
    double result1 = (a + b) * c;
    double result2 = a * c + b * c;
    double diff = result1 - result2;
    printf("size of double is %zu\n", sizeof(double)); /* %zu: sizeof yields size_t */
    printf("a=%f\nb=%f\nc=%f\nr1=%f\nr2=%f\ndiff=%.17g\n", a, b, c, result1, result2, diff);
    return 0;
}
However, I do find a difference if I change all the doubles to float and use c=0.00001083394434. Are you sure that you are using 64- (or 80-) bit doubles when doing your calculations?
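The float variant referred to is just the same program with the narrower type, for example:

#include <stdio.h>

int main(void) {
    /* Same test, but float's 24-bit significand makes the difference visible. */
    float a = 13584.9484893449f;
    float b = 43719848748.3911f;
    float c = 0.00001083394434f;
    float result1 = (a + b) * c;
    float result2 = a * c + b * c;
    printf("r1=%.9g\nr2=%.9g\ndiff=%.9g\n", result1, result2, result1 - result2);
    return 0;
}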

Usually "loss of precision" in these kinds of calculations can be traced to "poorly formulated problem". For example, when you have to add a series of numbers of very different sizes, you will get a different answer depending on the order in which you sum them. The problem is even more acute when you subtract numbers.
The best approach in your case is to look not simply at this one line, but at the way result1 is used in your subsequent calculations. In principle, an engineering calculation should not require precision in the final result beyond about three significant figures; but in many instances (for example, finite element methods) you end up subtracting two numbers that are very similar in magnitude, in which case you may lose many significant figures and get a meaningless answer. Given that you are talking about "materials properties" and "strain", I suspect that is actually at the heart of your problem.
One approach is to look at places where you compute a difference and see whether you can reformulate the problem (for example, if you can differentiate your function, you can replace Y(x+dx) - Y(x) with dx * Y'(x)).
There are many excellent references on the subject of numerical stability. It is a complicated subject. Just "throwing more significant figures at the problem" is almost never the best solution.
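As a concrete illustration of such a reformulation (my example, not the answerer's): computing sqrt(x+1) - sqrt(x) directly cancels most of the significant digits, while the algebraically identical 1/(sqrt(x+1) + sqrt(x)) keeps them all.

#include <stdio.h>
#include <math.h>

int main(void) {
    double x = 1e12;
    double naive  = sqrt(x + 1) - sqrt(x);          /* catastrophic cancellation */
    double stable = 1.0 / (sqrt(x + 1) + sqrt(x));  /* same value, reformulated  */
    printf("naive  = %.17g\n", naive);
    printf("stable = %.17g\n", stable);
    return 0;
}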

Related

Kotlin: Why do these two implementations of log base 10 give different results on specific inputs?

println(log(it.toDouble(), 10.0).toInt()+1) // n1
println(log10(it.toDouble()).toInt() + 1) // n2
I had to count the "length" of a number in base n for needs unrelated to the question, and stumbled upon a bug (or rather unexpected behavior): for it == 1000 these two functions give different results.
n1(1000) = 3,
n2(1000) = 4.
Checking values before conversion to int resulted in:
n1_double(1000) = 3.9999999999999996,
n2_double(1000) = 4.0
I understand that some floating-point arithmetic magic is involved, but what is especially weird to me is that for 100, 10000 and the other inputs I checked, n1 == n2.
What is special about it == 1000? How do I ensure that log gives me the intended result (4, not 3.99..)? Right now I can't even figure out which cases I need to double-check, since it is not just powers of 10; it is 1000 (and probably some other numbers) specifically.
I looked into the implementations of log() and log10(). log is implemented as
if (base <= 0.0 || base == 1.0) return Double.NaN
return nativeMath.log(x) / nativeMath.log(base) //log() here is a natural logarithm
while log10 is implemented as
return nativeMath.log10(x)
I suspect the division in the first case is the reason for the error, but I can't figure out why it causes an error only in specific cases.
I also found this question:
Python math.log and math.log10 giving different results
But I already know that one is more precise than the other. However, there is no log10 analogue for an arbitrary base n, so I'm curious about the reason WHY it is specifically 1000 that goes wrong.
PS: I understand there are methods of calculating the length of a number without fp arithmetic and logs of base n, but at this point it is scientific curiosity.
but I can't figure out why it causes an error only in specific cases.
return nativeMath.log(x) / nativeMath.log(base)
//log() here is a natural logarithm
Consider x = 1000 and nativeMath.log(x). The natural logarithm is not exactly representable. It is near
6.90775527898213_681... (Double answer)
6.90775527898213_705... (closer answer)
Consider base = 10 and nativeMath.log(base). The natural logarithm is not exactly representable. It is near
2.302585092994045_901... (Double)
2.302585092994045_684... (closer answer)
The only exactly correct nativeMath.log(x) for a finite x is when x == 1.0.
The quotient 6.90775527898213681... / 2.302585092994045901... is not exactly representable. It is near 2.9999999999999995559...
The conversion of the quotient to text is not exact.
So we have 4 computation errors with the system giving us a close (rounded) result instead at each step.
Sometimes these rounding errors cancel out in a way we find acceptable and the value of "3.0" is reported. Sometimes not.
Performed with higher-precision math, it is easy to see that log(1000) came out below the higher-precision answer and log(10) came out above it. These two round-off errors in opposite directions, fed into the division, pushed the quotient extra low, by 1 ULP more than hoped.
When log(x, 10) is computed for another x that is a power of 10, and log(x) happens to land slightly above the higher-precision answer, I'd expect the quotient to suffer a 1 ULP error less often. Perhaps it is about 50/50 over all powers of 10.
log10(x) is designed to compute the logarithm in a different fashion, exploiting the fact that the base is 10.0, and it is certainly exact for powers of 10.
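The effect is easy to reproduce outside Kotlin, since nativeMath delegates to the platform's math library. A C equivalent (the question's own figures say the quotient comes out as 2.9999999999999996; the exact output can vary with the libm):

#include <stdio.h>
#include <math.h>

int main(void) {
    /* log(x)/log(base) accumulates several rounding errors; log10 is
       computed directly and is exact for powers of ten. */
    printf("log(1000)/log(10) = %.17g\n", log(1000.0) / log(10.0));
    printf("log10(1000)       = %.17g\n", log10(1000.0));
    return 0;
}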

How to handle precision problems of floating point numbers?

I am using Firebird 3.0.4 (both in Windows and Linux) and I have the following procedure that clearly demonstrates my problem with floating point numbers, and that also demonstrates a possible workaround:
create or alter procedure test_float returns (
    res double precision,
    res1 double precision,
    res2 double precision)
as
    declare variable z1 double precision;
    declare variable z2 double precision;
    declare variable z3 double precision;
begin
    z1 = 15;
    z2 = 1.1;
    z3 = 0.49;
    res = z1*z2*z3; /* one expects res to be 8.085, but internally, inside the
                       procedure, it is represented as 8.084999999999.
                       The procedure-internal representation is repaired when
                       res is sent to the output of the procedure, but the
                       (wrong) internal representation impacts the further
                       calculations */
    res1 = round(res, 2);
    res2 = round(round(res, 8), 2);
    suspend;
end
One can see the result of the procedure with:
select proc.res, proc.res1, proc.res2
from test_float proc
The result is
RES      RES1     RES2
8,085    8,08     8,09
But one can expect that RES1 should also be 8.09.
One can clearly see that the internal representation of res contains 8.0849999 (e.g. one can assign res to an exception message and then raise that exception). It is repaired during output, but it leads to failed calculations when such a variable is used in further calculations.
RES2 demonstrates the repair: I can always apply ROUND(..., 8) to fix the internal representation. I am ready to go with this solution, but my question is: is this an acceptable workaround (when the outer ROUND uses strictly fewer than 5 decimal places), or is there a better one?
All my tests pass with this workaround, but it feels bad.
Of course, I know the minimum that every programmer should know about floats (there is an article about that), and I know that one should not use double for business calculations.
This is an inherent problem with calculating with floating point numbers, and is not specific to Firebird. The problem is that the calculation of 15 * 1.1 * 0.49 using double precision numbers is not exactly 8.085. In fact, if you would do 8.085 - RES, you'd get a value that is (approximately) 1.776356839400251e-015 (although likely your client will just present it as 0.00000000).
You would get similar results in different languages. For example, in Java
DecimalFormat df = new DecimalFormat("#.00");
System.out.println(df.format(15 * 1.1 * 0.49));
will also produce 8.08 for exactly the same reason.
Also, if you would change the order of operations, you would get a different result. For example using 15 * 0.49 * 1.1 would produce 8.085 and round to 8.09, so the actual results would match your expectations.
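That reordering is easy to check outside SQL as well; a C sketch of the same products (per the above, on typical IEEE platforms the first rounds below 8.085 and the second does not):

#include <stdio.h>

int main(void) {
    double r1 = 15 * 1.1 * 0.49;  /* evaluated left to right: (15*1.1)*0.49 */
    double r2 = 15 * 0.49 * 1.1;  /* same factors, different rounding path  */
    printf("r1 = %.17g -> %.2f\n", r1, r1);
    printf("r2 = %.17g -> %.2f\n", r2, r2);
    return 0;
}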
Given that round itself also returns a double precision value, this isn't really a good way to handle it in your SQL code: the value rounded to a higher number of decimals might still come out slightly below what you expect, because of how floating point numbers work, so the double round may still fail for some numbers even when the presentation in your client 'looks' correct.
If you purely want this for presentation purposes, it might be better to do this in your frontend, but alternatively you could try tricks like adding a small value and casting to decimal, for example something like:
cast(RES + 1e-10 as decimal(18,2))
However, this still has rounding issues, because it is impossible to distinguish between values that genuinely are 8.08499999999 (and should be rounded down to 8.08) and values where the result of the calculation just happens to be 8.08499999999 in floating point while it would be 8.085 in exact numerics (and therefore needs to be rounded up to 8.09).
In a similar vein, you could try a double cast to decimal (e.g. cast(cast(res as decimal(18,3)) as decimal(18,2))), or a cast to decimal followed by rounding (e.g. round(cast(res as decimal(18,3)), 2)). This would be a bit more consistent than double rounding, because the first cast converts to exact numerics, but again it has the same downsides as mentioned above.
Although you don't want to hear this answer, if you want exact numeric semantics, you shouldn't be using floating point types.

Why write 1,000,000,000 as 1000*1000*1000 in C?

In code created by Apple, there is this line:
CMTimeMakeWithSeconds( newDurationSeconds, 1000*1000*1000 )
Is there any reason to express 1,000,000,000 as 1000*1000*1000?
Why not 1000^3 for that matter?
One reason to write constants in a multiplicative way is to improve readability, while run-time performance is not affected. Another is to indicate that the writer was thinking in a multiplicative manner about the number.
Consider this:
double memoryBytes = 1024 * 1024 * 1024;
It's clearly better than:
double memoryBytes = 1073741824;
as the latter doesn't look, at first glance, like the third power of 1024.
As Amin Negm-Awad mentioned, the ^ operator is the binary XOR. Many languages lack the built-in, compile-time exponentiation operator, hence the multiplication.
There are reasons not to use 1000 * 1000 * 1000.
With 16-bit int, 1000 * 1000 overflows. So using 1000 * 1000 * 1000 reduces portability.
With 32-bit int, the following first line of code overflows.
long long Duration = 1000 * 1000 * 1000 * 1000; // overflow
long long Duration = 1000000000000; // no overflow, hard to read
I suggest that the lead value match the type of the destination, for readability, portability and correctness.
double Duration = 1000.0 * 1000 * 1000;
long long Duration = 1000LL * 1000 * 1000 * 1000;
Also, code could simply use e notation for values that are exactly representable as a double. Of course, this requires knowing whether double can exactly represent the whole-number value, something of concern with values greater than 1e9. (See DBL_EPSILON and DBL_DIG.)
long Duration = 1000000000;
// vs.
long Duration = 1e9;
Why not 1000^3?
The result of 1000^3 is 1003. ^ is the bit-XOR operator.
Even though it does not deal with the question itself, I'll add a clarification. x^y does not always evaluate to x+y the way it happens to in the question's example; you have to XOR every bit. In the case of the example:
  1111101000₂ (1000₁₀)
^ 0000000011₂ (3₁₀)
= 1111101011₂ (1003₁₀)
But
  1111101001₂ (1001₁₀)
^ 0000000011₂ (3₁₀)
= 1111101010₂ (1002₁₀)
For readability.
Placing spaces between the zeros (1 000 000 000) would be a syntax error, commas (1,000,000,000) would silently compile to something else entirely (the comma operator; see the test program further down), and a bare 1000000000 makes it hard to see exactly how many zeros are there.
1000*1000*1000 makes it apparent that it's 10^9, because our eyes can process the chunks more easily. Also, there's no runtime cost, because the compiler will replace it with the constant 1000000000.
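One way to convince yourself the arithmetic happens at translation time is a C11 static assertion (my sketch): _Static_assert requires an integer constant expression, so this would not even compile if the product were a run-time computation.

/* C11: the product must be evaluated by the compiler for this to build. */
_Static_assert(1000 * 1000 * 1000 == 1000000000,
               "the multiplication folds to a constant");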
For readability. For comparison, Java supports _ in numbers to improve readability (first proposed by Stephen Colebourne as a reply to Derek Foster's PROPOSAL: Binary Literals for Project Coin/JSR 334). One would write 1_000_000_000 here.
In roughly chronological order, from oldest support to newest:
XPL: "(1)1111 1111" (apparently not for decimal values, only for bitstrings representing binary, quartal, octal or hexadecimal values)
PL/M: 1$000$000
Ada: 1_000_000_000
Perl: likewise
Ruby: likewise
Fantom (previously Fan): likewise
Java 7: likewise
Swift: (same?)
Python 3.6
C++14: 1'000'000'000
It's a relatively recent realization for languages that they ought to support this (and then there's Perl). As in chux's excellent answer, 1000*1000... is a partial solution, but it opens the programmer up to bugs from overflow in the multiplication even if the final result is a large type.
It might be simpler to read and carries an association with the 1,000,000,000 form.
From a technical aspect I guess there is no difference between the direct number and the multiplication; the compiler will generate the constant billion either way.
If you're speaking about Objective-C, then 1000^3 won't work, because there is no such syntax for pow (^ is XOR). Instead, the pow() function can be used, but in that case it will not be optimal: it will be a runtime function call, not a compiler-generated constant.
To illustrate the reasons consider the following test program:
$ cat comma-expr.c && gcc -o comma-expr comma-expr.c && ./comma-expr
#include <stdio.h>

#define BILLION1 (1,000,000,000)
#define BILLION2 (1000^3)

int main()
{
    printf("%d, %d\n", BILLION1, BILLION2);
}
0, 1003
$
Another way to achieve a similar effect in C for decimal numbers is to use literal floating point notation -- so long as a double can represent the number you want without any loss of precision.
IEEE 754 64-bit double can represent any non-negative integer <= 2^53 without problem. Typically, long double (80 or 128 bits) can go even further than that. The conversions will be done at compile time, so there is no runtime overhead and you will likely get warnings if there is an unexpected loss of precision and you have a good compiler.
long lots_of_secs = 1e9;

Objective C, division between floats not giving an exact answer

Right now I have a line of code like this:
float x = (([self.machine micSensitivity] - 0.0075f) / 0.00025f);
Where [self.machine micSensitivity] is a float containing the value 0.010000
So,
0.01 - 0.0075 = 0.0025
0.0025 / 0.00025 = 10.0
But in this case, it keeps returning 9.999999
I'm assuming there's some kind of rounding error, but I can't seem to find a clean way of fixing it. micSensitivity is incremented/decremented by 0.00025, and that formula is meant to return a clean integer value for the user to reference, so I'd rather get the programming right than just add 0.000000000001.
Thanks.
that formula is meant to return a clean integer value for the user to reference
If that is really important to you, then why not multiply all the numbers in this story by 4000 (the reciprocal of 0.00025), coerce to int, and do integer arithmetic?
Or, if you know that the answer is arbitrarily close to an integer, round to that integer and present it.
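A sketch of that scaled-integer idea in C (the names and the 4000 scale factor, i.e. 1/0.00025, are mine):

#include <stdio.h>

int main(void) {
    /* Store the sensitivity as an integer count of 0.00025 steps:
       0.010000 -> 40 ticks, 0.0075 -> 30 ticks. Everything is exact. */
    int micSensitivityTicks = 40;
    int offsetTicks         = 30;
    int x = micSensitivityTicks - offsetTicks;
    printf("%d\n", x);               /* exactly 10, every time */
    return 0;
}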
Floating-point arithmetic is binary, not decimal, and it will almost always give rounding errors; you need to take that into account. float has about six digits of precision; double has about 15. You are throwing away nine digits of precision for no reason.
Now think: What do you want to display? What do you want to display if the result of your calculation is 9.999999999? What would you want to display if the result is 9.538105712?
None of the numbers in your question, except 10.0, can be exactly represented in a float or a double on iOS. If you want to do float math with those numbers, you will have rounding errors.
You can round your result to the nearest integer easily enough:
float x = rintf((self.machine.micSensitivity - 0.0075f) / 0.00025f);
Or you can just multiply all your numbers, including the allowed values of micSensitivity, by 4000 (which is 1/0.00025), and thus work entirely with integers.
Or you can change the allowed values of micSensitivity so that its increment is a fraction whose denominator is a power of 2. For example, if you use an increment of 0.000244140625 (which is 2^-12), and change 0.0075 to 0.00732421875 (which is 30 * 2^-12), you should get exact results, as long as your micSensitivity is within the range ±4096 (since 4096 is 2^12 and a float has 24 bits of significand).
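A quick check of that power-of-two scheme (my sketch; every constant below is an exact binary fraction, so each operation is exact in float):

#include <stdio.h>

int main(void) {
    const float step   = 0.000244140625f;    /* 2^-12, exactly representable */
    const float offset = 0.00732421875f;     /* 30 * 2^-12, also exact       */
    float sensitivity  = offset + 10 * step; /* 40 * 2^-12, still exact      */
    float x = (sensitivity - offset) / step;
    printf("%g\n", x);                       /* prints exactly 10            */
    return 0;
}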
The code you have posted is correct and functioning properly. This is a known side effect of using floating-point arithmetic; see the Wikipedia article on floating-point accuracy problems for a full explanation as to why.
There are several ways to work around the problem depending on what you need to use the number for.
If you need to compare two floats, then almost everything works OK: less than and greater than do what you would expect. The only trouble is testing whether two floats are equal.
// If x and y are within a very small number from each other then they are equal.
if (fabs(x - y) < verySmallNumber) { // verySmallNumber is usually called epsilon.
// x and y are equal (or at least close enough)
}
If you want to print a float, then you can specify a precision to round to.
// Get a string of x rounded to five digits of precision.
NSString *xAsAString = [NSString stringWithFormat:@"%.5f", x];
9.999... (repeating) is equal to 10. There is a proof:
let x = 9.999..., then 10x = 99.999..., then 10x - x = 9x = 90, so x = 10

Determining the longest repeating cycle in a decimal expansion

Today I encountered this article about decimal expansion, and I was instantly inspired to rework my solution to Project Euler Problem 26 using this new knowledge of math for a more efficient solution (no brute forcing). In short, the problem is to find the value of d, ranging 1-1000, that maximizes the length of the repeating cycle in the expression 1/d.
Without making any further assumptions about the problem that could improve the efficiency further, I decided to stick with
10^s ≡ 10^(s+t) (mod n)
which allows me, for any value of D, to find the longest repeating cycle (t) and the starting point of the cycle (s).
The problem is the exponential part of the equation, since it generates extremely large values before they are reduced by the modulus. No integral type can handle such large values, and the floating-point data types seem to calculate incorrectly.
I'm using this code currently:
Private Function solveDiscreteLogarithm(ByVal D As Integer) As Integer
    Dim NumberToIndex As New Dictionary(Of Long, Long)()
    Dim maxCheck As Integer = 1000
    For index As Integer = 1 To maxCheck
        If (Not NumberToIndex.ContainsKey((10 ^ index) Mod D)) Then
            NumberToIndex.Add((10 ^ index) Mod D, index)
        Else
            Return index - NumberToIndex((10 ^ index) Mod D)
        End If
    Next
    Return -1
End Function
which at some point will compute "(10^47) mod 983" resulting in 783 which is not the correct result. The correct result should have been 732. I'm assuming it's because I'm using integral data types and it's causing overflow. I tried using double instead, but that gave even stranger results.
So what are my options?
Instead of using ^ to do your powers, I would do a loop using multiplication, taking the mod of the number as you go along (reducing whenever the running value exceeds the modulus). This keeps the numbers small and within range of your mod number; see the sketch below.
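Something like this (C for illustration; the function name is mine). The running value never exceeds 10 * d, so a plain long is ample for d up to 1000:

/* Computes (10^exponent) mod d without ever forming 10^exponent. */
long powmod10(int exponent, int d) {
    long r = 1;
    for (int i = 0; i < exponent; i++)
        r = (r * 10) % d;   /* reduce at every step */
    return r;
}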
I'll give you a hint from my own solution to this.
With each decimal expansion of the fraction, you end up with a remainder, which if multiplied by the current decimal place, is an integer. Since this remainder is all you need to determine the next decimal expansion, you can use it to make predictions about the subsequent expansion.
See my post for this other question, getting the nth digit of a fraction, you may find some useful leads on what to try. (Methinks the answer is the largest prime less than 1000.) (Correction: the largest prime or Carmichael number less than 1000.)
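For what it's worth, the remainder observation above leads directly to a compact cycle finder; a C sketch under the problem's d <= 1000 assumption:

/* Length of the repeating cycle of 1/d, by long division: the digits
   repeat as soon as a remainder repeats, so record first appearances. */
int cycle_length(int d) {
    int first_seen[1000] = {0};  /* step at which each remainder appeared */
    int r = 1, step = 1;
    while (r != 0 && first_seen[r] == 0) {
        first_seen[r] = step++;
        r = (r * 10) % d;
    }
    return (r == 0) ? 0 : step - first_seen[r];  /* 0: expansion terminates */
}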