How to maintain high decimal precision in VBA Excel - vba

I am converting Fortran code that maintains at least 16 decimal digits of precision.
I am facing a division-by-zero problem in VBA Excel, but if I try the code below on an online compiler, I do not get a zero.
Any help is appreciated. Thanks in advance.
This is the Fortran code:
program sum
IMPLICIT DOUBLE PRECISION (A-H,O-Z)
x = 3.14159265358979
y = 1.24325643454325
z = (x*y)/dtan(0.0D0)
print *, datan(.04D0*z)
end program sum
This is the VBA code:
Public Function dosomething()
Dim X As Double
Dim Y As Double
Dim Z As Double
X = 3.14159265358979
Y = 1.24325643454325
Z = (X * Y) / Tan(0#)
End Function

Without going into too many details, Fortran can support real values (floats) of +/-Infinity and NaN, depending on how the value is calculated. For example, your original post contained two uninitialized variables which you then used to calculate (v1 * v2)/dtan(0.0d0). Since uninitialized vars are often (but not always) set to 0, this calculation becomes 0.0/0.0, which is mathematically undefined, and the result is NaN.[1]
Now, if the numerator is positive, z=(x*y)/dtan(0.0D0) results in z=Infinity, regardless of what x and y are. If your system cannot represent Infinity, then it uses "a very large number". That is evidently the case with VBA.
Finally, you calculate datan(.04D0*z). Mathematically, this is arctangent(Infinity)=PI/2. And again, the correctly computed Fortran results match this, returning a double-precision value of 1.57079632679490.[2]
Now, I don't know much about VBA, but it does not seem to support +/-Infinity or NaN. If a "very large number" results in significant error compared to what you are expecting in your final result, then it appears there are workarounds as described at this SO question.
[1] Note that in Fortran with double precision you should get dtan(0.0d0) = 0.000000000000000E+000.
[2] In order to maintain double precision in the Fortran x and y variables, you must append d0. Otherwise they become single-precision values by default and store only the first 7 significant figures from your original assignment, and it is up to the compiler what the remaining digits in the double-precision value become (usually just garbage).
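For reference, the same chain of events can be reproduced in C on any system with IEEE 754 arithmetic, which may help confirm what the Fortran compiler is doing (a minimal sketch; how infinities and NaNs are printed varies by C library):
#include <math.h>
#include <stdio.h>

int main(void) {
    double zero = 0.0;
    double pos  = 3.14159265358979 * 1.24325643454325;

    printf("%g\n", pos / zero);            /* positive / zero -> +Inf         */
    printf("%g\n", zero / zero);           /* 0/0 is undefined -> NaN         */
    printf("%.15f\n", atan(pos / zero));   /* atan(+Inf) = pi/2 = 1.5707963...*/
    return 0;
}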

Z = (X * Y) / Tan(0#)
The type hint on the 0 literal is superfluous; the Tan function takes a Double and returns a Double. But Tan(0) returns 0, so you are dividing by zero.
Seems your online Fortran compiler is doing something funky.
It should not be zero, though; it should be tan(1E-16).
No. That's mathematically wrong; VBA is doing it right. If you need your VBA code to be just as broken as that Fortran, then you need to handle the situation explicitly:
Z = (X * Y) / Tan(1E-16)
But just know that this is mathematically wrong. I've no idea how the Fortran code manages to output 1.5707963267948966. This VBA code outputs 3.90580528128931E+16.

Related

How do I test binary floating point math functions

I implemented the powf(float x, float y) math function. This function is a binary floating-point operation. I need to test it for correctness, but the test can't iterate over all floating-point values. What should I do?
Consider 2 questions:
How do I test binary floating point math functions?
Break FP values into groups:
Largest set: Normal values including +/- values near 1.0 and near the extremes as well as randomly selected ones.
Subnormals
Zeros: +0.0, -0.0
NANs
Use at least combinations of 100,000s+ sample values from the first set (including +/-min, +/-1.0, +/-max), 1,000s from the second set (including +/-min, +/-max) and -0.0, +0.0, -NAN, +NAN.
Additional tests for the function's edge cases.
How do I test powf()?
How: Test powf() against pow() for result correctness.
Values to test against: powf() has many concerns.
pow(x,y) functions are notoriously difficult to code well. Errors in small sub-calculations propagate into large final errors.
pow() includes expected integral results with integral-value arguments. E.g. pow(2, y) is expected to be exact for all in-range results. pow(10, y) is expected to be within 0.5 unit in the last place for all y in range.
pow() includes expected integer results with negative x.
There is little need to test every x, y combination. Consider that every x < 0 with a non-whole-number y leads to a NaN.
z = powf(x,y) readily underflows to 0.0. Testing of x, y values near a result of z == 0 needs some attention.
z = powf(x,y) readily overflows to ∞. Testing of x, y values near a result of z == FLT_MAX needs more attention, as a slight error results in FLT_MAX vs. INF. Since overflow is so rampant with powf(x,y), this reduces the number of combinations needed; it is the edge that matters, and larger values need only light testing. A minimal harness along these lines is sketched below.
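A sketch of such a harness in C (the sample set and the error threshold here are illustrative placeholders; a real harness would use ULP-based comparison and far more sample values, as described above):
#include <float.h>
#include <math.h>
#include <stdio.h>

/* Compare powf() against pow() used as a higher-precision reference. */
static void test_one(float x, float y) {
    float  got = powf(x, y);
    double ref = pow((double)x, (double)y);
    /* Crude metric: relative error; a real harness would measure ULPs. */
    double err = (isfinite(ref) && ref != 0.0) ? fabs((got - ref) / ref) : 0.0;
    if (err > 4 * FLT_EPSILON || !!isnan(got) != !!isnan(ref))
        printf("suspect: powf(%g, %g) = %g, reference %g\n", x, y, got, ref);
}

int main(void) {
    /* A tiny sample of the groups described above: normals near 1.0,
       extremes, zeros, and a NaN. */
    float samples[] = { 1.0f, -1.0f, 0.5f, 2.0f, 10.0f, FLT_MIN, FLT_MAX,
                        0.0f, -0.0f, NAN };
    int n = (int)(sizeof samples / sizeof samples[0]);
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            test_one(samples[i], samples[j]);
    return 0;
}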

Units conversion on a PIC 18F2431

I have the following conversion between pounds per square inch (PSI) and megapascals (MPa):
psi = MPa*1.45038;
I need the lowest value possible after conversion to be 1 PSI. An example of what I am looking for is:
psi = ((long)MPa*145)/100
Is there any way to optimize this for memory and speed by not using float or long? I will be implementing this conversion on a microcontroller (PIC 18F2431).
You should divide by powers of 2 instead, which is far cheaper than dividing by any other value. And if the type can't be negative then use an unsigned type instead. Depending on the type of MPa and its maximum value you can choose a different denominator to suit your needs. There is no need to cast to a wider type if the multiplication won't overflow.
For example if MPa is of type uint16_t you can do psi = MPa*95052/(1UL << 16); (95052/65536 ≈ 1.450378)
If MPa is not larger than 1024 (2^10) then you can use a scale factor of 2^21 without overflowing, thus you can increase the numerator/denominator for more precision:
psi = MPa*3041667/(1UL << 21);
Edit:
On the PIC 18F2431 int is a 16-bit type. That means 95052 will be of type long and MPa will be promoted to long in the expression. If you don't need much precision then change the scaling to fit in an int/int16_t to avoid dealing with long.
In case MPa is not larger than 20 you can divide it by 2048, which is the largest power of 2 that is less than or equal to 2^16/20.
psi = MPa*2970/(1U << 11);
Note that the * and / have equal precedence so it'll be evaluated from left to right, and the above equation will be identical to
psi = (MPa*2970)/2048; // = MPa*1.4501953125
No need for such excessive parentheses.
Edit 2:
Unfortunately, if the range of MPa is [0, 2000] then you can only multiply it by 32 without overflowing a 16-bit unsigned int. The closest ratio that you can achieve is 46/32 = 1.4375, so if you need more precision there's no way other than using long. Anyway, integer math with long is still a lot faster than floating-point math on the PIC MCU, and costs significantly less code space.
Calculate the largest N such that MPa*1.45038*2^N < 2^32
Calculate the constant value of K = floor(1.45038*2^N) once
Calculate the value of psi = (MPa*K)>>N for every value of MPa
Since 0 <= MPa <= 2000, you must choose N such that 2000*1.45038*2^N < 2^32:
2^N < 2^32/(2000*1.45038)
N < log2(2^32/(2000*1.45038))
N < 20.497
N = 20
Therefore, K = floor(1.45038*2^N) = floor(1.45038*2^20) = 1520833.
So for every value of MPa, you can calculate psi = (MPa*1520833)>>20.
You'll need to make sure that MPa is unsigned (or cast it accordingly).
Using this method will allow you to avoid floating-point operations.
For better accuracy, you can use round instead of floor, giving you K = 1520834.
In this specific case it will work out fine, because 2000*1520834 is smaller than 2^32.
But with a different maximum value of MPa or a different conversion scalar, it might not.
In any case, the difference in the outcome of psi for each value of K is negligible.
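Put together as C for the PIC, the whole conversion might look as follows (a minimal sketch; I'm assuming MPa is held in an unsigned 16-bit variable in the range [0, 2000] and that one 32-bit multiply is acceptable):
#include <stdint.h>

/* K = floor(1.45038 * 2^20) = 1520833 (use 1520834 to round instead), N = 20.
   2000 * 1520834 = 3,041,668,000 < 2^32, so the product cannot overflow. */
uint16_t mpa_to_psi(uint16_t mpa)
{
    return (uint16_t)(((uint32_t)mpa * 1520833UL) >> 20);
}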
BTW, if you don't mind the additional memory, then you can use a look-up table instead:
Add a pre-calculated global variable const unsigned short lut[2001] = {...}
For every value of MPa, calculate psi = lut[MPa] instead of psi = (MPa*K)>>N
So instead of mul + shift operations, your program will perform a load operation
Please note, however, that whether or not this is more efficient in terms of runtime performance depends on several things, such as the accessibility of the memory segment in which you allocate the look-up table, the architecture of the processor at hand, runtime caching heuristics, etc.
So you will need to apply some profiling on your program in order to decide which approach is better.
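If you go the look-up-table route, the table itself would be generated offline rather than on the PIC. A small host-side C program along these lines could emit the initializer (a hypothetical helper, not meant to run on the microcontroller):
#include <stdio.h>

/* Prints a C initializer for lut[0..2000], where lut[MPa] is
   MPa * 1.45038 rounded to the nearest whole PSI. */
int main(void)
{
    printf("const unsigned short lut[2001] = {\n");
    for (int mpa = 0; mpa <= 2000; mpa++)
        printf("%u,%c", (unsigned)(mpa * 1.45038 + 0.5),
               (mpa % 10 == 9) ? '\n' : ' ');
    printf("};\n");
    return 0;
}
On the target, the conversion then reduces to psi = lut[MPa];.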

Differences between mult and div operations on floating point numbers

Is any difference in computation precision for these 2 cases:
1) x = y / 1000d;
2) x = y * 0.001d;
Edit: I shouldn't have added the C# tag. The question is only from the 'floating-point' point of view. I don't want to know which is faster; I need to know which case will give me better precision.
No, they're not the same - at least not with C#, using the version I have on my machine (just standard .NET 4.5.1) on my processor - there are enough subtleties involved that I wouldn't like to claim it'll do the same on all machines, or with all languages. This may very well be a language-specific question after all.
Using my DoubleConverter class to show the exact value of a double, and after a few bits of trial and error, here's a C# program which at least on my machine shows a difference:
using System;
class Program
{
    static void Main(string[] args)
    {
        double input = 9;
        double x1 = input / 1000d;
        double x2 = input * 0.001d;
        Console.WriteLine(x1 == x2);
        Console.WriteLine(DoubleConverter.ToExactString(x1));
        Console.WriteLine(DoubleConverter.ToExactString(x2));
    }
}
Output:
False
0.00899999999999999931998839741709161899052560329437255859375
0.009000000000000001054711873393898713402450084686279296875
I can reproduce this in C with the Microsoft C compiler - apologies if it's horrendous C style, but I think it at least demonstrates the differences:
#include <stdio.h>
int main(int argc, char **argv) {
    double input = 9;
    double x1 = input / 1000;
    double x2 = input * 0.001;
    printf("%s\r\n", x1 == x2 ? "Same" : "Not same");
    printf("%.18f\r\n", x1);
    printf("%.18f\r\n", x2);
}
Output:
Not same
0.008999999999999999
0.009000000000000001
I haven't looked into the exact details, but it makes sense to me that there is a difference, because dividing by 1000 and multiplying by "the nearest double to 0.001" aren't the same logical operation... because 0.001 can't be exactly represented as a double. The nearest double to 0.001 is actually:
0.001000000000000000020816681711721685132943093776702880859375
... so that's what you end up multiplying by. You're losing information early, and hoping that it corresponds to the same information that you lose otherwise by dividing by 1000. It looks like in some cases it isn't.
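You can see that representation error directly by printing the constants with more digits than a double actually stores (a small C sketch; the exact trailing digits depend on the quality of your printf conversion):
#include <stdio.h>

int main(void)
{
    /* 0.001 is rounded to the nearest double at compile time;
       printing with many digits exposes the stored value. */
    printf("%.60f\n", 0.001);
    printf("%.60f\n", 9.0 / 1000.0);
    printf("%.60f\n", 9.0 * 0.001);
    return 0;
}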
You are programming in base 10 but the floating point is base 2. You CAN represent 1000 in base 2 but cannot represent 0.001 in base 2, so you have chosen bad numbers for your question: on a computer, x/1000 != x*0.001. You might get lucky most of the time with rounding and extra precision, but it is not a mathematical identity.
Now maybe that was your question, maybe you wanted to know why x/1000 != x*0.001. And the answer to that question is because this is a binary computer and it uses base 2 not base 10, there are conversion problems with 0.001 when going to base 2, you cannot exactly represent that fraction in an IEEE floating point number.
In base 10 we know that if we have a fraction with a factor of 3 in the denominator (and lacking one in the numerator to cancel it out) we end up with an infinitely repeating pattern; basically, we cannot accurately represent that number with a finite set of digits.
1/3 = 0.33333...
The same problem occurs when you try to represent 1/10 in base 2. Since 10 = 2*5, the factor of 2 is fine (1/2), but the factor of 5 is the real problem (1/5).
1/10th (1/1000 works the same way). Elementary long division:
        0.000110011...
       ------------
1010 | 1.000000
         1010
         ----
          1100
          1010
          ----
           10000
            1010
            ----
             1100
             1010
             ----
               10
We have to keep pulling down zeros until we get 10000 (decimal 16). 1010 (decimal 10) goes into it one time, remainder 6; drop the next zero and 10 goes into 12 one time, remainder 2; and the pattern repeats. So you end up with 001100110011 repeated forever. Floating point has a fixed number of bits, so we cannot represent an infinite pattern.
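You can see the truncated repeating pattern by dumping the bits of the double closest to 0.1 (a small C sketch assuming a 64-bit IEEE 754 double):
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
    double d = 0.1;
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);          /* reinterpret the double's bytes */
    /* Print sign bit, 11 exponent bits, then the 52 mantissa bits. */
    for (int i = 63; i >= 0; i--) {
        putchar(((bits >> i) & 1) ? '1' : '0');
        if (i == 63 || i == 52)
            putchar(' ');
    }
    putchar('\n');
    /* The mantissa reads 1001100110011... - the repeating pattern,
       cut off and rounded at 52 bits. */
    return 0;
}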
Now, if your question has to do with something like whether dividing by 4 is the same as multiplying by 1/4th, that is a different question. The answer is that it should be the same; a divide consumes more cycles and/or logic than a multiply, but works out to the same answer in the end.
Probably not. The compiler (or the JIT) is likely to convert the first case to the second anyway, since multiplication is typically faster than division. You would have to check this by compiling the code (with or without optimizations enabled) and then examining the generated IL with a tool like IL Disassembler or .NET Reflector, and/or examining the native code with a debugger at runtime.
No, there is no difference, except if you set a custom rounding mode.
gcc produces ((double)0.001 - (double)1.0/1000) == 0.0e0
When the compiler converts 0.001 to binary it divides 1 by 1000. It uses a software floating-point simulation compatible with the target architecture to do this.
For higher precision there are long double (80-bit) and software simulation of arbitrary precision.
PS: I used gcc on a 64-bit machine, with both SSE and the x87 FPU.
PPS: With some optimizations 1/1000.0 could be more precise on x87, since x87 uses an 80-bit internal representation and 1000 == 1000.0. That is true if you use the result in subsequent calculations promptly; if you return it or write it to memory, the 80-bit value is computed and then rounded to 64 bits. But SSE is the more common choice for double.
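To illustrate that last point, on a typical x86-64 gcc build long double is the 80-bit x87 format (an assumption; on other compilers or ABIs long double may simply be a 64-bit double), and you can compare the stored constants directly:
#include <stdio.h>

int main(void)
{
    printf("sizeof(double) = %zu, sizeof(long double) = %zu\n",
           sizeof(double), sizeof(long double));
    printf("double      0.001  = %.30f\n", 0.001);
    printf("long double 0.001L = %.30Lf\n", 0.001L);
    return 0;
}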

sqrtf(6)*sqrtf(6) != 6 in xcode

I'm trying to create a piece of software that shows the angle between two vectors, and it's not working when they are both equal to (1,1,2). The modulus of this vector is sqrtf(6), which is rounding to 2.449490 when it should be 2.44948974278318.
Is there a way to increase precision of this operation?
In the next steps of my software I make this operation:
float angle = acos(dot/(modulus1*modulus2));
If modulus1 == modulus2, then modulus1*modulus2 should equal dot, but that's not happening for some values.
I hope I made myself clear.
Thanks in advance,
Gruber
You can use double if you want greater precision. However, note that the == operation on floating-point numbers never works the way it does with integral types. Use an epsilon to allow for minor differences.
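A small sketch of both suggestions in plain C (the epsilon value is illustrative; the right tolerance depends on the magnitude of your data):
#include <math.h>
#include <stdio.h>

/* Approximate comparison instead of ==. */
static int nearly_equal(double a, double b, double eps)
{
    return fabs(a - b) <= eps * fmax(fabs(a), fabs(b));
}

int main(void)
{
    double v[3] = {1, 1, 2};
    double dot      = v[0]*v[0] + v[1]*v[1] + v[2]*v[2];
    double modulus1 = sqrt(dot);
    double modulus2 = sqrt(dot);

    double c = dot / (modulus1 * modulus2);
    if (c >  1.0) c =  1.0;   /* clamp: rounding can push the ratio slightly */
    if (c < -1.0) c = -1.0;   /* outside [-1, 1], which would make acos NaN  */

    printf("exactly equal: %d, nearly equal: %d, angle = %f\n",
           modulus1 * modulus2 == dot,
           nearly_equal(modulus1 * modulus2, dot, 1e-12),
           acos(c));
    return 0;
}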

Normal Distribution function

edit
So based on the answers so far (thanks for taking your time) I'm getting the sense that I'm probably NOT looking for a Normal Distribution function. Perhaps I'll try to re-describe what I'm looking to do.
Let's say I have an object that returns a number from 0 to 10, and that number controls "speed". However, instead of 10 being the top speed, I need 5 to be the top speed, and anything lower or higher would slow down accordingly (with easing, thus the bell curve).
I hope that's clearer ;/
-original question
These are the times I wish I remembered something from math class.
I'm trying to figure out how to write a function in Obj-C where I define the boundaries, e.g. (0 - 10), and then if x = foo, y = ?, where x runs something like 0,1,2,3,4,5,6,7,8,9,10 and y runs 0,1,2,3,4,5,4,3,2,1,0 but on a curve.
Something like the attached image.
I tried googling for Normal Distribution but it's way over my head. I was hoping to find some site that lists useful algorithms like these, but wasn't very successful.
So can anyone help me out here ? And if there is some good sites which shows useful mathematical functions, I'd love to check them out.
TIA!!!
-added
I'm not looking for a random number, I'm looking for.. ex: if x=0 y should be 0, if x=5 y should be 5, if x=10 y should be 0.... and all those other not so obvious in between numbers
(image: http://dizy.cc/slider.gif)
Okay, your edit really clarifies things. You're not looking for anything to do with the normal distribution, just a nice smooth little ramp function. The one Paul provides will do nicely, but is tricky to modify for other values. It can be made a little more flexible (my code examples are in Python, which should be very easy to translate to any other language):
def quarticRamp(x, b=10, peak=5):
    if not 0 <= x <= b:
        raise ValueError  # or return 0
    return peak*x*x*(x-b)*(x-b)*16/(b*b*b*b)
Parameter b is the upper bound for the region you want to have a slope on (10, in your example), and peak is how high you want it to go (5, in the example).
Personally I like a quadratic spline approach, which is marginally cheaper computationally and has a different curve to it (this curve is really nice to use in a couple of special applications that don't happen to matter at all for you):
def quadraticSplineRamp(x, a=0, b=10, peak=5):
    if not a <= x <= b:
        raise ValueError  # or return 0
    if x > (b+a)/2:
        x = a + b - x          # mirror the right half onto the left
    z = 2*(x-a)/(b-a)          # normalize so z = 1 at the midpoint
    if z > 0.5:
        return peak * (1 - 2*(z-1)*(z-1))
    else:
        return peak * (2*z*z)
This is similar to the other function, but takes a lower bound a (0 in your example). The logic is a little more complex because it's a somewhat-optimized implementation of a piecewise function.
The two curves have slightly different shapes; you probably don't care what the exact shape is, and so could pick either. There are an infinite number of ramp functions meeting your criteria; these are two simple ones, but they can get as baroque as you want.
The thing you want to plot is the probability density function (pdf) of the normal distribution. You can find it on the mighty Wikipedia.
Luckily, the pdf for a normal distribution is not difficult to implement - some of the other related functions are considerably worse because they require the error function.
To get a plot like you showed, you want a mean of 5 and a standard deviation of about 1.5. The median is obviously the centre, and figuring out an appropriate standard deviation given the left & right boundaries isn't particularly difficult.
A function to calculate the y value of the pdf given the x coordinate, standard deviation and mean might look something like:
#include <math.h>   /* for sqrt(), exp() and M_PI */

double normal_pdf(double x, double mean, double std_dev) {
    return 1.0/(sqrt(2*M_PI)*std_dev) *
           exp(-(x-mean)*(x-mean)/(2*std_dev*std_dev));
}
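A quick driver to see the shape (assuming the normal_pdf function above is in the same source file; the rescaling note reflects the question's requirement that the peak be 5, which is my addition):
#include <stdio.h>

int main(void) {
    /* The pdf peaks at x = mean (about 0.266 for sigma = 1.5); multiply by
       5.0 / normal_pdf(5.0, 5.0, 1.5) if you want the curve to top out at 5. */
    for (int x = 0; x <= 10; x++)
        printf("x = %2d  pdf = %f\n", x, normal_pdf(x, 5.0, 1.5));
    return 0;
}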
A normal distribution is never equal to 0. Please make sure that what you want to plot is indeed a normal distribution.
If you're only looking for this bell shape (with the tangent and everything) you can use the following formula:
x^2*(x-10)^2 for x between 0 and 10
0 elsewhere
(Divide by 125 if you need your peak to be at 5.)
double bell(double x) {
    if ((x < 10) && (x > 0))
        return x*x*(x-10.)*(x-10.)/125.;
    else
        return 0.;
}
Well, there's good old Wikipedia, of course. And Mathworld.
What you want is a random number generator for "generating normally distributed random deviates". Since Objective C can call regular C libraries, you either need a C-callable library like the GNU Scientific Library, or for this, you can write it yourself following the description here.
Try simulating rolls of dice by generating random numbers between 1 and 6. If you add up the rolls from 5 independent dice rolls, you'll get a surprisingly good approximation to the normal distribution. You can roll more dice if you'd like and you'll get a better approximation.
Here's an article that explains why this works. It's probably more mathematical detail than you want, but you could show it to someone to justify your approach.
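A quick sketch of that dice experiment in C (rand() is fine for a demonstration like this, though not for serious statistical work):
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
    enum { DICE = 5, TRIALS = 100000 };
    long histogram[6 * DICE + 1] = {0};      /* sums range from 5 to 30 */

    srand((unsigned)time(NULL));
    for (int t = 0; t < TRIALS; t++) {
        int sum = 0;
        for (int d = 0; d < DICE; d++)
            sum += rand() % 6 + 1;           /* one die roll, 1..6 */
        histogram[sum]++;
    }
    /* The counts form a rough bell curve centred on 17.5. */
    for (int s = DICE; s <= 6 * DICE; s++)
        printf("%2d: %ld\n", s, histogram[s]);
    return 0;
}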
If what you want is the value of the probability density function, p(x), of a normal (Gaussian) distribution of mean mu and standard deviation sigma at x, the formula is
p(x) = exp( -((x-mu)^2)/(2*sigma^2) ) / (sigma * sqrt(2*pi))
where pi is the area of a circle divided by the square of its radius (approximately 3.14159...). Using the C standard library math.h, this is:
#include <math.h>
double normal_pdf(double x, double mu, double sigma) {
    double n = sigma * sqrt(2 * M_PI);                        // normalization factor
    double p = exp( -pow(x - mu, 2) / (2 * pow(sigma, 2)) );  // unnormalized pdf
    return p / n;
}
Of course, you can do the same in Objective-C.
For reference, see the Wikipedia or MathWorld articles.
It sounds like you want to write a function that yields a curve of a specific shape. Something like y = f(x), for x in [0:10]. You have a constraint on the max value of y, and a general idea of what you want the curve to look like (somewhat bell-shaped, y=0 at the edges of the x range, y=5 when x=5). So roughly, you would call your function iteratively with the x range, with a step that gives you enough points to make your curve look nice.
So you really don't need random numbers, and this has nothing to do with probability unless you want it to (as in, you want your curve to look like the outline of a normal distribution or something along those lines).
If you have a clear idea of what function will yield your desired curve, the code is trivial - a function to compute f(x) and a for loop to call it the desired number of times for the desired values of x. Plot the x,y pairs and you're done. So that's your algorithm - call a function in a for loop.
The contents of the routine implementing the function will depend on the specifics of what you want the curve to look like. If you need help on functions that might return a curve resembling your sample, I would direct you to the reading material in the other answers. :) However, I suspect that this is actually an assignment of some sort, and that you have been given a function already. If you are actually doing this on your own to learn, then I again echo the other reading suggestions.
y=-1*abs(x-5)+5