Why does floating point addition take longer than multiplication - optimization

I am working with a PIC18F4550 and the program is speed-critical. When I multiply two floating-point variables, it takes the PIC about 140 cycles to perform the multiplication. I am measuring it with the PIC18F4550's Timer1.
variable_1 = variable_2 * variable_3; // takes about 140 cycles
On the other hand, when I add the same two variables, the PIC takes 280 cycles to perform the addition.
variable_1 = variable_2 + variable_3; // takes about 280 cycles
I have also seen that the number of cycles varies when the variables change, depending on their exponents.
What is the reason for the extra cycles? I would have thought addition was simpler than multiplication.
Is there any solution?

For floating-point addition, the operands need to be adjusted so that they have the same exponent before the add, and that involves shifting one of the mantissas across byte boundaries. A multiply, by contrast, is basically multiplying the mantissas and adding the exponents.
Since the PIC apparently has a small hardware multiplier, it may not be surprising that sometimes the multiply can be faster than doing a multi-byte shift (especially if the PIC only has single bit shift instructions).
Unless a processor has direct support for it, floating point is always slow, and you should certainly consider arranging your code to use fixed point if at all possible. Getting rid of the floating point library would probably free up a lot of code space as well.
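If you do go the fixed-point route, here is a minimal sketch of what that can look like in C, assuming a Q8.8 format (8 integer bits, 8 fraction bits, so roughly a ±128 range at 1/256 resolution) fits your data; the type and macro names are illustrative, not from any particular library:
#include <stdint.h>

/* Q8.8 fixed point: the int16_t holds value * 256. */
typedef int16_t q8_8;

#define Q8_8_FROM_FLOAT(x) ((q8_8)((x) * 256.0f))
#define Q8_8_TO_FLOAT(x)   ((x) / 256.0f)

/* Addition is a plain integer add: no exponent alignment, no shifting. */
static inline q8_8 q8_8_add(q8_8 a, q8_8 b) {
    return a + b;
}

/* Multiplication: widen, multiply, shift back down by the 8 fraction bits.
   The PIC18's 8x8 hardware multiplier helps the compiler here. */
static inline q8_8 q8_8_mul(q8_8 a, q8_8 b) {
    return (q8_8)(((int32_t)a * b) >> 8);
}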

Related

MPFR - Loss of precision after addition

First, sorry if this question seems silly; I'm new to MPFR.
I have two mpfr_t variables with a precision of 1024 bits, holding the values 0.2 and 0.06.
But when I add these variables, things go wrong and the result (also an mpfr_t variable) comes out as 0.2599999...
This is strange, because the MPFR library should maintain the precision (shouldn't it?).
Could you please help me with this? Thanks so much in advance.
MPFR numbers are represented in binary (base 2). In this system, the only numbers that can be represented exactly have the form N·2^k, where N and k are integers. Neither 0.2 = 1/5 nor 0.06 = 3/50 has this form, so they are approximated with some small error. When you add these variables, you are seeing a consequence of this error (there may also be another rounding error in the addition itself, since in binary these numbers have many nonzero digits, unlike in decimal).
This is the same issue as the one described in: Is floating point math broken?
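For what it's worth, here is a minimal C program (illustrative, not the asker's code) that reproduces the effect; with 1024 bits of precision the first ~308 decimal digits are exact, so you have to print past that point to expose the binary representation error:
#include <stdio.h>
#include <mpfr.h>

int main(void) {
    mpfr_t a, b, sum;
    mpfr_inits2(1024, a, b, sum, (mpfr_ptr) 0);
    mpfr_set_str(a, "0.2", 10, MPFR_RNDN);   /* rounded to the nearest 1024-bit binary value */
    mpfr_set_str(b, "0.06", 10, MPFR_RNDN);
    mpfr_add(sum, a, b, MPFR_RNDN);
    mpfr_printf("%.330Rf\n", sum);           /* digits beyond ~308 reveal the error */
    mpfr_clears(a, b, sum, (mpfr_ptr) 0);
    return 0;
}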
EDIT:
To answer the question in the comments ("Is there a way to avoid this situation?"): no, there is no way to avoid this situation in practice, except in very specific cases. For instance, if all your numbers (inputs and results of each intermediate operation) are decimal numbers representable with a small enough number of digits, you can use decimal arithmetic (but MPFR can't do that). Computer algebra systems may help in some cases. There's also iRRAM; I'll come back to it later.
However, there are solutions that attempt to hide issues with numerical errors. You need to estimate the maximum possible error on a computed value. With an error analysis, you can obtain rigorous bounds, but this may be difficult or take time to do. Note that rigorous bounds are pessimistic in general, but if you use arbitrary precision (e.g. with MPFR), this is less of an issue. The analysis can also be done dynamically with interval arithmetic (still pessimistic, and often more so). But perhaps a simple estimate is sufficient for you. Once you have an estimate of the maximum error:
For the output, choose the number of displayed digits so that the error is less than the weight of the last displayed digit.
For discontinuous functions (e.g. equality test, floor, ceil): if the distance between the computed value and a discontinuity point is less than the maximum error, assume that the actual value is equal to the discontinuity point. Note that this is just a heuristic, but if it fails (this may remain unnoticed and will probably invalidate your estimate), this means that you have not done your computations with enough precision.
Note: MPFR won't do that for you. But you can write code to take these rules into account.
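As a minimal illustration of the second rule, here is a C sketch of an equality test that snaps the computed value to a discontinuity point; maxerr is assumed to come from your own error analysis:
#include <math.h>

/* Heuristic equality: treat x as equal to the discontinuity point d
   whenever their distance is within the estimated maximum error.
   If this fires wrongly, the computation's precision was too low. */
static int snaps_to(double x, double d, double maxerr) {
    return fabs(x - d) <= maxerr;
}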
The iRRAM package, which is based on MPFR, can track the error in a rigorous way (like with interval arithmetic) and automatically redo all the computations in a higher precision if it notices that the accuracy is too low. However, if some mathematical result is a discontinuity point, iRRAM won't help. In particular, it cannot provide a rigorous equality test.
Finally, I suggest that you have a look at Goldberg's paper What Every Computer Scientist Should Know About Floating-Point Arithmetic, in particular the notion of cancellation.

Big Oh! algorithms running in O(4^N)

For algorithms running in O(4^N): if we triple the input size, the time is multiplied by what number?
This is an interesting question because while equivalent questions for runtimes like Θ(n) or Θ(n^3) have clean answers, the answer here is a bit more nuanced.
Let's start with a simpler question. We have an algorithm whose runtime is Θ(n^2), and on a "sufficiently large" input the runtime is T seconds. What should we expect the runtime to be once we triple the size of the input? To answer this, let's imagine, just for simplicity's sake, that the actual runtime of this function is closely approximated by cn^2, and let's have k be the "sufficiently large" input we plugged into it. Then, plugging in 3k, we see that the runtime is
c(3k)^2 = 9ck^2 = 9(ck^2) = 9T.
That last step follows because the cost of running the algorithm on an input of size k is T, meaning that ck^2 = T.
Something important to notice here - tripling the size of the input does not change the fact that the runtime here is Θ(n^2). The runtime is still quadratic; we're just changing how big the input is.
More generally, for any algorithm whose runtime is Θ(n^m) for some fixed constant m, the runtime will grow by roughly a factor of 3^m if you triple the size of the input. That's because
c(3k)^m = 3^m · ck^m = 3^m · T.
But something interesting happens if we try performing this same analysis on a function whose runtime is Θ(4^n). Let's imagine that we ran this algorithm on some input k and it took T time units to finish. Then running this algorithm on an input of size 3k will take time roughly
c · 4^(3k) = c · 4^k · 4^(2k) = T · 4^(2k) = 16^k · T.
Notice how we aren't left with a constant multiple of the original cost, but rather something that's 16^k times bigger. In particular, that means that the amount by which the algorithm slows down will depend on how big the input is. For example, the slowdown going from input size 10 to input size 30 is a factor of 16^10, while the slowdown going from input size 30 to input size 90 is a staggering 16^30. For what it's worth, 16^30 = 2^120, which is around 10^36.
And, intuitively, that makes sense. Exponential functions grow at a rate proportional to how big they already are. That means that the slowdown from doubling or tripling the size of the input will itself depend on the size of that input.
And, as above, notice that the runtime is not now Θ(4^(3n)). The runtime is still Θ(4^n); we're just changing which inputs we're plugging in.
So, to summarize:
The runtime of the function slows down by a factor of 4^(2n) = 16^n if you triple the size of the input n. This means that the slowdown depends on how big the input is.
The runtime of the function stays at Θ(4^n) when we do this. All that's changing is where we're evaluating the 4^n.
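To see the 16^n factor concretely, here is a small C demo (my own sketch, not part of the original question) that counts the calls made by a function fanning out into four recursive calls per level, so its work grows as Θ(4^n), and compares inputs n and 3n:
#include <stdio.h>
#include <math.h>

static unsigned long long calls;

/* Four recursive calls per level: total call count grows as Θ(4^n). */
static void work(int n) {
    calls++;
    if (n == 0) return;
    for (int i = 0; i < 4; i++)
        work(n - 1);
}

int main(void) {
    for (int n = 1; n <= 4; n++) {
        calls = 0; work(n);     unsigned long long t1 = calls;
        calls = 0; work(3 * n); unsigned long long t3 = calls;
        /* The ratio approaches 16^n as n grows. */
        printf("n=%d: T(3n)/T(n) = %.1f, 16^n = %.0f\n",
               n, (double)t3 / t1, pow(16.0, n));
    }
    return 0;
}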
Hope this helps!
The time complexity of an algorithm describes how its running time grows with respect to the input size. So if our input size triples, that just means we now have a new value for the input size.
Hence, the time complexity of the algorithm remains the same, i.e. O(4^N).

Why does my simulation compute only a certain number of digits before only changing the power magnitude?

I am using another person's code to try to demonstrate this problem in physics:
A large mass M collides with a smaller mass m, which then rebounds off a wall and returns to collide with the larger mass M. This process repeats until the larger mass has turned around and the sign of its velocity flips. If the larger block is 16*100^n times more massive than the small block (where n is an integer), the number of collisions between the large block and the small block computes the first (n+1) digits of pi. For example: when the block is 1600 times bigger there are 31 collisions; if it is 16000000 times bigger there are 3141 collisions.
I wrote my code in vPython and it works, but only up to a point. With the original code I was able to get 31415 collisions. When I make N=5, the simulation completely fails and the screen turns black. Apparently this is because the time step is not small enough, so I tried making it smaller to see if it could compute more digits, and it does: I was able to count 314159 collisions by changing the time step to 0.00001. But then I input N=6 and again it collapses. So I decrease the time step to 0.000001 and it works, but it only gives me the number 3.14159e+6 without the extra digit of pi.
Can someone please tell me why this is? Why do I not get the next digit? Is my computer not powerful enough? I do not need to actually fix this problem; that is not the point. I just need to understand the limitations of my simulation and my computer, and why they cannot compute the next digit.

Numerical Accuracy: to scale or not?

I am working on an n-body gravitational simulator that takes input and produces output in metric MKS units. This involves dealing with some very large numbers (like solar masses expressed in kilograms, semimajor axes of planetary orbits expressed in meters, and timescales of years expressed in seconds), which get multiplied by some very small numbers (notably the gravitational constant, which is 6.67384e-11 in MKS units), and also the occasional very small number getting added to or subtracted from a very large number (mainly when summing up pairwise accelerations), which gets me concerned about the effects of rounding errors.
I've already taken the step of replacing all masses m by Gm (premultiplying by the gravitational constant), which significantly reduces the total number of multiplies, and makes the mass numbers much smaller, and that seems to have had a positive effect on both efficiency and accuracy, as judged by how well the simulator conserves energy.
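For illustration, the substitution in the pairwise-acceleration loop looks roughly like this (a sketch with illustrative names, not my actual code):
#include <math.h>

/* Accumulate the acceleration on body i due to body j, using the
   premultiplied mu = G*m (the standard gravitational parameter),
   so G never appears inside the hot loop. */
static void accumulate_accel(const double pos_i[3], const double pos_j[3],
                             double mu_j, double acc_i[3]) {
    double d[3] = { pos_j[0] - pos_i[0],
                    pos_j[1] - pos_i[1],
                    pos_j[2] - pos_i[2] };
    double r2 = d[0]*d[0] + d[1]*d[1] + d[2]*d[2];
    double inv_r3 = 1.0 / (r2 * sqrt(r2));
    for (int k = 0; k < 3; k++)
        acc_i[k] += mu_j * d[k] * inv_r3;   /* a = mu * r / |r|^3 */
}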
I am wondering, however: is it potentially worth doing some internal rescaling into different units to further minimize floating-point error? And if so, what kind of range (for double-precision floats) should I be trying to get my numbers centered on for maximum accuracy?
In general, if you want precise results in physics-based simulation, you don't want to use floats or doubles, since they have massive rounding problems and thus introduce errors into your simulation.
If you need or want to stick with floats/doubles, you should probably rescale around zero. The reason is that floating-point representations have a higher "density" of values around this point and tend to have fewer toward the min/max ends of the range.
I would suggest that you change all values to integer-based number variables. This eliminates rounding error (though overflow/underflow can still happen!) and speeds up the calculation process by an order of magnitude, because normal CPUs work faster with integer operations. In the case of a GPU it's basically the same, but that's another story all by itself...
But before you go to such effort to further improve your accuracy, I would strongly advise trying an arbitrary-precision number library. This may come with a performance loss, but it should be far easier and yield better results than rescaling your values.
Most numerical mathematicians come across this problem.
First, let me remind you that you cannot deal with numbers (or physical values) smaller than the machine epsilon for each calculation. Unfortunately, the epsilon depends on the value around which you are working. You can try eps(a) for any value of a in MATLAB; eps(1.0) ≈ 2.2e-16, while eps(0) ≈ 4.9e-324.
That's why in numerical methods you avoid calculations that mix numbers of very different scales: the smaller one is simply ignored by the larger (it falls below the larger value's epsilon), and rounding errors are inevitable.
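A one-line demonstration of that absorption in C:
#include <stdio.h>

int main(void) {
    double big = 1.0e20, small = 1.0;
    /* eps(1e20) is about 16384 for doubles, so adding 1.0 changes nothing: */
    printf("%g\n", (big + small) - big);   /* prints 0, not 1 */
    return 0;
}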
But what else do people do? When mathematicians encounter such physical problems, they analyse the problem theoretically before coding and make simplifications so that they can work with similarly scaled numbers.

Mathematical analysis of a sound sample (as an array of numbers)

I need to find the frequency of a sample, stored (in VB) as an array of bytes. The sample is a sine wave of known frequency (so I can check), but the numbers are a bit odd, and my maths-foo is weak.
Full range of values: 0-255. 99% of the numbers are in the range 235 to 245, but there are some outliers down to 0 and 1, and up to 255, in the remaining 1%.
How do I normalise this to remove the outliers (calculating the 235-245 interval, as it may change with different samples), and how do I then calculate zero crossings to get the frequency?
Apologies if this description is rubbish!
The FFT is probably the best answer, but if you really want to do it by your method, try this:
To normalize, first make a histogram counting the occurrences of each value from 0 to 255. Then throw out X percent of the values from each end with something like:
for (i = lower = 0; i < N*X/100; lower++)  // note N*X/100, not N*(X/100): integer division truncates X/100 to 0
    i += count[lower];
// repeat in the other direction for upper
Now normalize with
A[i] = 255*(A[i]-lower)/(upper-lower)-128
Throw away results outside the -128..127 range.
Now you can count zero crossings. To make sure you are not fooled by noise, you might want to keep track of the slope over the last several points, and only count crossings when the average slope is going the right way.
The standard method of attacking this problem is to take one block of data covering at least two periods of the signal (taking more data isn't bad, so it's good to overestimate a bit), take the FFT, and guess that the frequency corresponds to the largest peak in the resulting FFT spectrum.
By the way, very similar problems have been asked here before - you could search for those answers as well.
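If you want to see the idea without pulling in a library, a naive O(n^2) DFT peak-pick is enough for experimenting (a sketch; a real FFT library is the right tool once performance matters):
#include <math.h>
#include <stddef.h>

/* Return the frequency (Hz) of the strongest bin of a naive DFT.
   Bin k corresponds to frequency k * sample_rate / n. */
static double dominant_freq(const double *x, size_t n, double sample_rate) {
    const double PI = 3.14159265358979323846;
    size_t best = 1;
    double best_mag = 0.0;
    for (size_t k = 1; k < n / 2; k++) {          /* skip DC, stop at Nyquist */
        double re = 0.0, im = 0.0;
        for (size_t t = 0; t < n; t++) {
            double ang = 2.0 * PI * (double)k * (double)t / (double)n;
            re += x[t] * cos(ang);
            im -= x[t] * sin(ang);
        }
        double mag = re * re + im * im;
        if (mag > best_mag) { best_mag = mag; best = k; }
    }
    return (double)best * sample_rate / (double)n;
}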
Use the Fourier transform; it's much less sensitive to noise than counting zero crossings.
Edit: @WaveyDavey
I found an F# library to do an FFT: from here:
As it turns out, the best free implementation that I've found for F# users so far is still the fantastic FFTW library. Their site has a precompiled Windows DLL. I've written minimal bindings that allow thread-safe access to FFTW from F#, with both guru and simple interfaces. Performance is excellent; 32-bit Windows XP Pro is only up to 35% slower than 64-bit Linux.
Now, I'm sure you can call an F# library from VB.NET, C#, etc.; that should be in their docs.
If I understood your description correctly, what you have is a signal that is a combination of a sine plus a constant plus some random glitches. Say, like
x[n] = A*sin(f*n + phi) + B + N[n]
where N[n] is the "glitch" noise you want to get rid of.
If the glitches are one sample long, you can remove them with a median filter, which has to be wider than the glitch on both sides. For glitches of length 1, a median over 3 samples is enough:
y[n] = median3(x[n])
The median is computed like this: take the samples of x you want to filter (x[n-1], x[n], x[n+1]), sort them, and your output is the middle one.
Now that the glitch noise is gone, get rid of the constant signal. I understand the buffer is of a limited and known length, so you can just compute the mean of the whole buffer and subtract it.
Now you have your clean sine signal, and you can compute the fundamental frequency by counting zero crossings: count the number of samples above 0 where the previous sample was below 0. The period is the total number of samples in your buffer divided by this count, and the frequency is the reciprocal (1/x) of the period.
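A sketch of that whole pipeline in C (buffer handling and names are illustrative):
#include <stdlib.h>

/* Median of three values. */
static double median3(double a, double b, double c) {
    double t;
    if (a > b) { t = a; a = b; b = t; }
    if (b > c) { t = b; b = c; c = t; }
    if (a > b) { t = a; a = b; b = t; }
    return b;
}

/* Estimate the frequency of n samples taken at sample_rate Hz:
   median-filter the glitches away, find the mean (the constant B),
   then count rising crossings of that mean. */
static double estimate_freq(const double *x, size_t n, double sample_rate) {
    double *y = malloc(n * sizeof *y);
    y[0] = x[0];
    y[n - 1] = x[n - 1];
    for (size_t i = 1; i + 1 < n; i++)           /* 1. remove glitches */
        y[i] = median3(x[i - 1], x[i], x[i + 1]);
    double mean = 0.0;                           /* 2. find the constant B */
    for (size_t i = 0; i < n; i++)
        mean += y[i];
    mean /= (double)n;
    size_t crossings = 0;                        /* 3. rising crossings */
    for (size_t i = 1; i < n; i++)
        if (y[i - 1] < mean && y[i] >= mean)
            crossings++;
    free(y);
    /* crossings full cycles over n / sample_rate seconds */
    return (double)crossings * sample_rate / (double)n;
}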
Although I would go with the majority and say that what you want is an FFT solution (the FFT algorithm is pretty quick), if the FFT is not the answer for whatever reason, you may want to try fitting a sine curve to the data with a fitting program and reading off the fitted frequency.
Using Fityk, you can load the data and fit it to a*sin(b*x - c), where 2*pi/b will give you the frequency after fitting.
Fityk can be used from a GUI or from the command line for scripting, and it has a C++ API, so it could be included in your programs directly.
I googled for "basic fft". Visual Basic FFT Your question screams FFT, but be careful, using FFT without understanding even a little bit about DSP can lead results that you don't understand or don't know where they come from.
Get the Frequency Analyzer at http://www.relisoft.com/Freeware/index.htm, run it, and look at the code.