Why is the closed form for the Fibonacci sequence not used in practice?

There is a closed form for the Fibonacci sequence that can be obtained via generating functions. It is:
f_n = (phi^n - psi^n) / sqrt(5)
For what the terms mean, see the link above or here.
However, it is discussed here that this closed form isn't really used in practice because it starts producing wrong answers once n gets to around a hundred or larger.
But in the answer here, it seems one of the methods employed is fast matrix exponentiation which can be used to get the nth Fibonacci number very efficiently in O(log(n)) time.
But then, the closed form expression involves a bunch of terms that are raised to the nth power. So, you could calculate all those terms with fast exponentiation and get the result efficiently that way. Why would fast exponentiation on a matrix be better than doing it on scalars that show up in the closed-form expression? And besides, looking for how to do fast exponentiation of a matrix efficiently, the accepted answer here suggests we convert to the diagonal form and do it on scalars anyway.
The question then is - if fast exponentiation of a matrix is good for calculating the nth Fibonacci number in O(log(n)) time, why isn't the closed form a good way to do it when it involves fast exponentiation on scalars?

In the "closed form" formula for computing Fibonacci numbers, you need to raise irrational numbers to the power n, which means you have to accept working with approximations (typically, double-precision floating-point arithmetic) and therefore inaccurate results for large n.
By contrast, in the "matrix exponentiation" formula for computing Fibonacci numbers, the matrix you raise to the power n is an integer matrix, so you can do the whole calculation with no loss of precision, using a "big int" library for arbitrarily large integers (or, in a language like Python, "big ints" are the default).
So the difference is that you can't do exact arithmetic with irrational numbers but you can with integers.
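As a minimal illustration of that difference (the function names below are mine, not from the answer), here is a Python sketch that computes F_n exactly with built-in big integers and with the double-precision closed form, and reports the first n where the rounded closed-form result goes wrong:

import math

def fib_exact(n):
    # Iteratively compute F_n with Python's arbitrary-precision integers.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def fib_binet(n):
    # Closed form in double precision: (phi^n - psi^n) / sqrt(5).
    sqrt5 = math.sqrt(5)
    phi = (1 + sqrt5) / 2
    psi = (1 - sqrt5) / 2
    return round((phi**n - psi**n) / sqrt5)

# Find the first n where the floating-point closed form disagrees with the exact value.
n = 1
while fib_binet(n) == fib_exact(n):
    n += 1
print("closed form first disagrees at n =", n)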

Note that "in practice" here refers to competitive programming (in reality, you basically never need to compute massive Fibonacci numbers). So the first reason is that the straightforward way of calculating Fibonacci numbers is less code and much quicker to type without making errors. Plus, it will be faster than the fancy method for small n.
When it comes to big numbers, matrix exponentiation takes O(log n) multiplications if you pretend each multiplication is constant time. However, in competitive programming we almost always care about getting the exact answer, and the nth Fibonacci number itself has on the order of n bits (roughly 0.69·n), so exact arithmetic needs about n bits, not a constant amount. The O(log n) matrix multiplications therefore operate on ever-larger integers, and the total cost is dominated by the big-integer multiplications near the end: roughly O(n^2) bit operations with schoolbook multiplication (less with faster multiplication algorithms), which is a far cry from a flat O(log n). Not to mention, it is harder to code and slow in practice because you are multiplying arbitrary-precision numbers.
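To make that concrete, here is a hedged Python sketch (my own, not from the answer) of 2x2 matrix exponentiation by repeated squaring with exact big integers; the bit_length printout shows that F_n grows to on the order of n bits, so the later multiplications are nowhere near constant time.

def mat_mul(A, B):
    # 2x2 integer matrix product.
    return [[A[0][0]*B[0][0] + A[0][1]*B[1][0], A[0][0]*B[0][1] + A[0][1]*B[1][1]],
            [A[1][0]*B[0][0] + A[1][1]*B[1][0], A[1][0]*B[0][1] + A[1][1]*B[1][1]]]

def fib_matrix(n):
    # Raise [[1,1],[1,0]] to the nth power by repeated squaring; F_n is entry (0,1).
    result = [[1, 0], [0, 1]]
    base = [[1, 1], [1, 0]]
    while n > 0:
        if n & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        n >>= 1
    return result[0][1]

for n in (1000, 10000, 100000):
    print(n, fib_matrix(n).bit_length())  # bit length grows linearly with n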

Related

Computing GCD on sorted array

Is it possible to get some optimization on any algorithm used for getting the gcd of numbers in an array if the array is sorted?
Thanks!
So, let's see. The general method of finding the GCD of an array of numbers is:
result = a[0]
for i = 1 to length(a)-1
    result = gcd(result, a[i])
So what's the complexity of the gcd algorithm? Well, that's a rather involved question. See, for example, Time complexity of Euclid's Algorithm
If we pretend, as posited in the accepted answer, that the GCD algorithm is constant time (i.e. O(1)), then the complexity of the loop above is O(n). That's a reasonable assumption for numbers that fit into computer registers. And if that's the case then spending O(n log n) time to sort the array would almost certainly be a loser.
But in reality the GCD calculation is linear in the number of digits in the two numbers. If your input data consists of lots of large numbers, it's possible that sorting the array first will give you an advantage. The reasoning is that the result of gcd(a, b) will by definition give you a number that's no larger than min(a,b). So by getting the GCD of the two smallest numbers first, you limit the number of digits you have to deal with. Whether that limiting will overcome the cost of sorting the array is unclear.
If the numbers are larger than will fit into a computer register (hundreds of digits), then the GCD calculation is more expensive. But then again, so is sorting.
So the answer to your question is that sorting will almost certainly increase the speed of calculating the GCD of an array of numbers, but whether the performance improvement will offset the cost of sorting is unclear.
I think the only way you'll know for sure is to test it with representative data.
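If it helps, here is a small Python sketch of the straightforward fold, with an optional sort-first step so the running result is bounded by the smallest elements early on (the helper name is mine):

from math import gcd
from functools import reduce

def gcd_of_array(values, sort_first=False):
    # Fold gcd over the array; sorting first means the running result
    # is bounded by the smallest elements as early as possible.
    if sort_first:
        values = sorted(values)
    return reduce(gcd, values)

data = [1234567890123456789, 987654321987654321, 192837465, 1029384756]
print(gcd_of_array(data), gcd_of_array(data, sort_first=True))  # same result either way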

Complexity of division

The article Computational complexity of mathematical operations mentions that the complexity of division is in O(M(n)), and that "M(n) below stands in for the complexity of the chosen multiplication algorithm".
But I'm not sure how to read that M(n) embedded in O(M(n)): does it mean that the division has the same complexity as multiplication?
If I use, say, Karatsuba multiplication algorithm, will the division also take O(n^1.585)?
does it mean that the division has the same complexity as multiplication?
Formally, it means division cannot have a complexity worse than multiplication. But in practice, the notation is often used to say they have the same complexity.
If I use, say, Karatsuba multiplication algorithm, will the division also take O(n^1.585)?
According to the statement, yes.
However, I am not sure that statement is obvious. Looking at the Newton–Raphson method, for example, it is an iterative process which has to be repeated on the order of log(n) times to get n correct digits (see the discussion about S here).
Run naively, with every iteration carried out at full precision, that would give a complexity of O(log(n)·M(n)). The usual refinement, though, is to run each iteration only at the precision it has reached so far; the costs then form a geometric series dominated by the last iteration, bringing the total back down to O(M(n)).
In any case, if it is not a problem for you to have only a fixed precision (i.e. a fixed number of correct digits) whatever the size of the operands, you can use a constant number of iterations, resulting in O(M(n)) complexity.
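As an illustration of the Newton–Raphson behaviour (a rough sketch in ordinary double precision, not arbitrary precision): the iteration x ← x·(2 − d·x) converges to 1/d and roughly doubles the number of correct digits per step, which is where the log(n) iteration count comes from.

def reciprocal_newton(d, iterations):
    # Newton-Raphson for f(x) = 1/x - d, giving x_{k+1} = x_k * (2 - d * x_k).
    x = 0.1  # crude initial guess, good enough for d around 1..10
    for _ in range(iterations):
        x = x * (2 - d * x)
    return x

d = 7.0
for k in range(1, 7):
    approx = reciprocal_newton(d, k)
    print(k, abs(approx - 1 / d))  # the error roughly squares at each step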

Is multiplying y by 2^x and subtracting y faster than multiplying y by [(2^x)-1] directly?

I have a rather theoretical question:
Is multiplying y by 2^x and subtracting y faster than
multiplying y by [(2^x)-1] directly?
(y*(2^x) - y) vs (y*((2^x)-1))
I implemented a moving average filter on some data I get from a sensor. The basic idea is that I want to average the last 2^x values by taking the old average, multiplying that by [(2^x)-1], adding the new value, and dividing again by 2^x. But because I have to do this more than 500 times a second, I want to optimize it as much as possible.
I know that floating-point numbers are represented in IEEE 754 and therefore multiplying and dividing by a power of 2 should be rather fast (basically just adjusting the exponent), but how do I do that most efficiently? Should I simply stick with just multiplying by ((2^x)-1), or is multiplying by 2.0f and subtracting y better, or could I even do it more efficiently by manipulating the exponent bits directly? And if that is possible, how do I implement it properly?
Thank you very much!
I don't think that multiplying a floating-point number by a power of two is faster in practice than a generic multiplication (though I agree that in theory it should be faster, assuming no overflow/underflow). Put differently, I don't think there is a hardware optimization for it.
Now, I will assume that you have a modern processor, i.e. one with FMA support. In this case, (y*(2^x) - y) is faster if performed as fma(y, 2^x, -y) (how you have to write the expression depends on your language and implementation): an FMA should be as fast as a single multiplication in practice.
Note also that the speed may depend on the context. For instance, I've observed on simple code that doing more work can surprisingly yield faster code! So you need to test (on your real code, not with an arbitrary benchmark).
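For what it's worth, here is a tiny Python sketch (variable names are mine) of the two algebraically equivalent ways to write the running-average update from the question; the first is the multiply-then-subtract form the answer suggests mapping onto an FMA in C:

def update_v1(avg, value, x):
    # "multiply by 2^x, subtract, add new sample, divide": (avg*2^x - avg + value) / 2^x
    return (avg * 2.0**x - avg + value) / 2.0**x

def update_v2(avg, value, x):
    # "multiply by (2^x - 1), add new sample, divide": (avg*(2^x - 1) + value) / 2^x
    return (avg * (2.0**x - 1.0) + value) / 2.0**x

avg1 = avg2 = 0.0
for sample in [1.0, 2.0, 4.0, 8.0, 3.5]:
    avg1 = update_v1(avg1, sample, 4)
    avg2 = update_v2(avg2, sample, 4)
print(avg1, avg2)  # agree up to floating-point rounding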

Does every algorithm have Big Omega?

Does every algorithm have a Big Omega?
Is it possible for algorithms to have both a Big O and a Big Omega (but not equal to each other, i.e. not a Big Theta)?
For instance, Quicksort's Big O is O(n log n). But does it have a Big Omega? If it does, how do I calculate it?
First, it is of paramount importance that one not confuse the bound with the case. A bound - like Big-Oh, Big-Omega, Big-Theta, etc. - says something about a rate of growth. A case says something about the kinds of input you're currently considering being processed by your algorithm.
Let's consider a very simple example to illustrate the distinction above. Consider the canonical "linear search" algorithm:
LinearSearch(list[1...n], target)
1. for i := 1 to n do
2. if list[i] = target then return i
3. return -1
There are three broad kinds of cases one might consider: best, worst, and average cases for inputs of size n. In the best case, what you're looking for is the first element in the list (really, within any fixed number of positions from the start of the list). In such cases, it will take no more than some constant amount of time to find the element and return from the function. Therefore, the Big-Oh and Big-Omega happen to be the same for the best case: O(1) and Omega(1). When both O and Omega apply, we also say Theta, so this is Theta(1) as well.
In the worst case, the element is not in the list, and the algorithm must go through all n entries. Since f(n) = n happens to be a function that is bounded from above and from below by the same class of functions (linear ones), this is Theta(n).
Average case analysis is usually a bit trickier. We need to define a probability space for viable inputs of length n. One might say that all valid inputs (where integers can be represented using 32 bits in unsigned mode, for instance) are equally probable. From that, one could work out the average performance of the algorithm as follows:
1. Find the probability that target is not represented in the list. Multiply by n.
2. Given that target is in the list at least once, find the probability that it appears at position k for each 1 <= k <= n. Multiply each P(k) by k.
3. Add up all of the above to get a function in terms of n.
Notice that in step 1 above, if the probability is non-zero, we will definitely get at least a linear function (exercise: we can never get more than a linear function). However, if the probability in step 1 is indeed zero, then the assignment of probabilities in step 2 makes all the difference in determining the complexity: you can have best-case behavior for some assignments, worst-case for others, and possibly end up with behavior that isn't the same as best (constant) or worst (linear).
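As a tiny worked instance of steps 2 and 3 (under the assumed model that the target is always present and equally likely to be at each of the n positions, so step 1 contributes nothing), the average works out to (n + 1)/2 comparisons; a quick Python check:

from fractions import Fraction

def expected_comparisons(n):
    # Step 2: position k has probability 1/n and costs k comparisons.
    # Step 3: sum them up; the result is (n + 1) / 2.
    return sum(Fraction(k, n) for k in range(1, n + 1))

for n in (5, 10, 100):
    print(n, expected_comparisons(n))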
Sometimes, we might speak loosely of a "general" or "universal" case, which considers all kinds of input (not just the best or the worst), but that doesn't give any particular weighting to inputs and doesn't take averages. In other words, you consider the performance of the algorithm in terms of an upper-bound on the worst-case, and a lower-bound on the best-case. This seems to be what you're doing.
Phew. Now, back to your question.
Are there functions which have different O and Omega bounds? Definitely. Consider the following function:
f(n) = 1 if n is odd, n if n is even.
The best case is "n is odd", in which case f is Theta(1); the worst case is "n is even", in which case f is Theta(n); and if we assume for the average case that we're talking about 32-bit unsigned integers, then f is Theta(n) in the average case, as well. However, if we talk about the "universal" case, then f is O(n) and Omega(1), and not Theta of anything. An algorithm whose runtime behaves according to f might be the following:
Strange(list[1...n], target)
1. if n is odd then return target
2. else return LinearSearch(list, target)
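A direct Python transcription of those two sketches, just to make the runtime behaviour concrete:

def linear_search(lst, target):
    # Theta(1) best case, Theta(n) worst case.
    for i, value in enumerate(lst):
        if value == target:
            return i
    return -1

def strange(lst, target):
    # Theta(1) whenever the length is odd, linear-search behaviour otherwise.
    if len(lst) % 2 == 1:
        return target
    return linear_search(lst, target)

print(strange([1, 2, 3], 99))    # odd length: constant time, returns target
print(strange([1, 2, 3, 4], 3))  # even length: falls back to linear search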
Now, a more interesting question might be whether there are algorithms for which some case (besides the "universal" case) cannot be assigned some valid Theta bound. This is interesting, but not overly so. The reason is that you, during your analysis, are allowed to choose the cases that constitute best- and worst-case behavior. If your first choice of case turns out not to have a Theta bound, you can simply exclude the inputs that are "abnormal" for your purposes. The case and the bound aren't completely independent, in that sense: you can often choose a case such that it has "good" bounds.
But can you always do it?
I don't know, but that's an interesting question.
Does every algorithm have a Big Omega?
Yes. Big Omega is a lower bound. Any algorithm can be said to take at least constant time, so any algorithm is Ω(1).
Does every algorithm have a Big O?
No. Big O is an upper bound. Algorithms that don't (reliably) terminate don't have a Big O.
An algorithm has an upper bound if we can say that, in the absolute worst case, the algorithm will not take longer than this. I'm pretty sure O(∞) is not valid notation.
When will the Big O and Big Omega of an algorithm be equal?
There is actually a special notation for when they can be equal: Big Theta (Θ).
They will be equal if the algorithm scales perfectly with the size of the input (meaning there aren't input sizes where the algorithm is suddenly a lot more efficient).
This is assuming we take Big O to be the smallest possible upper bound and Big Omega to be the largest possible lower bound. This is not actually required from the definition, but they're commonly informally treated as such. If you drop this assumption, you can find a Big O and Big Omega that aren't equal for any algorithm.
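For reference, the standard definitions behind these bounds (nothing specific to this answer), written out formally:

f(n) \in O(g(n)) \iff \exists\, c > 0,\ n_0 \text{ such that } f(n) \le c \cdot g(n) \text{ for all } n \ge n_0
f(n) \in \Omega(g(n)) \iff \exists\, c > 0,\ n_0 \text{ such that } f(n) \ge c \cdot g(n) \text{ for all } n \ge n_0
f(n) \in \Theta(g(n)) \iff f(n) \in O(g(n)) \text{ and } f(n) \in \Omega(g(n))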
Brute force prime number checking (where we just loop through all smaller numbers and try to divide them into the target number) is perhaps a good example of when the smallest upper bound and largest lower bound are not equal.
Assume you have some number n. Let's also for the time being ignore the fact that bigger numbers take longer to divide (a similar argument holds when we take this into account, although the actual complexities would be different). And I'm also calculating the complexity based on the number itself instead of the size of the number (which can be the number of bits, and could change the analysis here quite a bit).
If n is divisible by 2 (or some other small prime), we can very quickly check whether it's prime with 1 division (or a constant number of divisions). So the largest lower bound would be Ω(1).
Now if n is prime, we'll need to try to divide n by each of the numbers up to sqrt(n) (I'll leave the reason we don't need to go higher than this as an exercise). This would take O(sqrt(n)), which would also then be our smallest upper bound.
So the algorithm would be Ω(1) and O(sqrt(n)).
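A plain Python version of that brute-force check (a sketch; the division counter is only there to make the best-case/worst-case gap visible):

import math

def is_prime_trial_division(n):
    # Returns (is_prime, number_of_trial_divisions_performed).
    if n < 2:
        return False, 0
    divisions = 0
    for d in range(2, math.isqrt(n) + 1):
        divisions += 1
        if n % d == 0:
            return False, divisions
    return True, divisions

print(is_prime_trial_division(999984))  # even: rejected after 1 trial division
print(is_prime_trial_division(999983))  # prime: about sqrt(n) trial divisions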
Exact complexity also may be hard to calculate for some particularly complex algorithms. In such cases it may be much easier and acceptable to simply calculate some reasonably close lower and upper bounds and leave it at that. I don't however have an example on hand for this.
How does this relate to best case and worst case?
Do not confuse upper and lower bounds for best and worst case. This is a common mistake, and a bit confusing, but they're not the same. This is a whole other topic, but as a brief explanation:
The best and worst (and average) cases can be calculated for every single input size. The upper and lower bounds can then be used for each of those 3 cases (separately). You can think of each of those cases as a line on a graph with input size on the x-axis and time on the y-axis and then, for each of those lines, the upper and lower bounds are lines which need to be strictly above or below that line as the input size tends to infinity (this isn't 100% accurate, but it's a good basic idea).
Quick-sort has a worst-case of Θ(n^2) (when we pick the worst possible pivot at every step) and a best-case of Θ(n log n) (when we pick good pivots). Note the use of Big Theta, meaning each of those are both lower and upper bounds.
Let's compare quick-sort with the above prime checking algorithm:
Say you have a given number n, and n is 53. Since it's prime, it will (always) take around sqrt(53) steps to determine whether it's prime. So the best and worst cases are all the same.
Say you want to sort some array of size n, and n is 53. Now those 53 elements can be arranged such that quick-sort ends up picking really bad pivots and running in around 53^2 steps (the worst case), or really good pivots and running in around 53 log 53 steps (the best case). So the best and worst cases are different.
Now take n as 54 for each of the above:
For prime checking, it will only take around 1 step to determine that 54 is not prime (it's even). The best and worst cases are the same again, but they're different from what they were for 53.
For quick-sort, you'll again have a worst case of around 54^2 steps and a best case of around 54 log 54 steps.
So for quick-sort the worst case always takes around n^2 steps and the best case always takes around n log n steps. So the lower and upper (or "tight") bound of the worst case is Θ(n^2) and the tight bound of the best case is Θ(n log n).
For our prime checking, sometimes the worst case takes around sqrt(n) steps and sometimes it takes around 1 step. So the lower bound for the worst case would be Ω(1) and the upper bound would be O(sqrt(n)). It would be the same for the best case.
Note that above I simply said "the algorithm would be Ω(1) and O(sqrt(n))". This is slightly ambiguous, as it's not clear whether the algorithm always takes the same amount of time for some input size, or the statement is referring to one of the best, average or worst case.
How do I calculate this?
It's hard to give general advice for this since proofs of bounds are greatly dependent on the algorithm. You'd need to analyse the algorithm similar to what I did above to figure out the worst and best cases.
Big O and Big Omega can be calculated for every algorithm, as you can see in Big-oh vs big-theta

Approximating log10[x^k0 + k1]

Greetings. I'm trying to approximate the function
Log10[x^k0 + k1], where .21 < k0 < 21, 0 < k1 < ~2000, and x is an integer < 2^14.
k0 & k1 are constant. For practical purposes, you can assume k0 = 2.12, k1 = 2660. The desired accuracy is 5*10^-4 relative error.
This function is virtually identical to Log[x], except near 0, where it differs a lot.
I have already come up with a SIMD implementation that is ~1.15x faster than a simple lookup table, but I would like to improve it if possible, which I think is very hard due to the lack of efficient instructions.
My SIMD implementation uses 16-bit fixed-point arithmetic to evaluate a 3rd-degree polynomial (I use a least-squares fit). The polynomial uses different coefficients for different input ranges. There are 8 ranges, and range i spans (64)2^i to (64)2^(i + 1).
The rationale behind this is that the derivatives of Log[x] drop rapidly with x, so a polynomial fits it more and more accurately; a polynomial is an exact fit for any function whose derivatives vanish beyond a certain order.
SIMD table lookups are done very efficiently with a single _mm_shuffle_epi8(). I use SSE's float-to-int conversion to get the exponent and significand used for the fixed-point approximation. I also software-pipelined the loop to get a ~1.25x speedup, so further low-level code optimizations are probably not going to help much.
What I'm asking is if there's a more efficient approximation at a higher level?
For example:
Can this function be decomposed into functions with a limited domain like
log2((2^x) * significand) = x + log2(significand)
hence eliminating the need to deal with different ranges (table lookups)? The main problem, I think, is that adding the k1 term kills all those nice log properties that we know and love, making this impossible. Or is it?
Iterative method? I don't think so, because the Newton iteration for log[x] is already a complicated expression.
Exploiting locality of neighboring pixels? If all 8 inputs fall in the same approximation range, then I can look up a single coefficient instead of looking up separate coefficients for each element. Thus, I could use this as a fast common case, and use a slower, general code path when it doesn't apply. But for my data, the range needs to be ~2000 before this property holds 70% of the time, which doesn't seem to make this method competitive.
Please give me your opinion, especially if you're an applied mathematician, even if you say it can't be done. Thanks.
You should be able to improve on least-squares fitting by using Chebyshev approximation. (The idea is, you're looking for the approximation whose worst-case deviation in a range is least; least-squares instead looks for the one whose summed squared difference is least.) I would guess this doesn't make a huge difference for your problem, but I'm not sure -- hopefully it could reduce the number of ranges you need to split into, somewhat.
If there's already a fast implementation of log(x), maybe compute P(x) * log(x) where P(x) is a polynomial chosen by Chebyshev approximation. (Instead of trying to do the whole function as a polynomial approx -- to need less range-reduction.)
I'm an amateur here -- just dipping my toe in as there aren't a lot of answers already.
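To make the suggestion concrete, here is a rough numpy sketch (mine, using interpolation at Chebyshev points as a stand-in for a true minimax/Remez fit) that fits a cubic to Log10[x^k0 + k1] on one of the question's ranges and reports the worst relative error:

import numpy as np
from numpy.polynomial import chebyshev as C

k0, k1 = 2.12, 2660.0

def f(x):
    return np.log10(x**k0 + k1)

lo, hi = 64.0, 128.0  # one of the question's table ranges, as an example

def g(t):
    # Map the Chebyshev domain [-1, 1] onto [lo, hi] before evaluating f.
    return f((t + 1.0) / 2.0 * (hi - lo) + lo)

coeffs = C.chebinterpolate(g, 3)  # degree-3 interpolation at Chebyshev points

x = np.linspace(lo, hi, 10001)
t = (x - lo) / (hi - lo) * 2.0 - 1.0
approx = C.chebval(t, coeffs)
print("max relative error:", np.max(np.abs(approx - f(x)) / f(x)))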
One observation:
You can find an expression for how large x needs to be as a function of k0 and k1, such that the term x^k0 dominates k1 enough for the approximation:
x^k0 + k1 ~= x^k0, allowing you to approximately evaluate the function as
k0*Log10(x).
This would take care of all x's above some value.
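A quick numerical check of where that kicks in for the constants given in the question (a sketch; it scans integer x until dropping k1 stays within the stated 5*10^-4 relative error):

import math

k0, k1 = 2.12, 2660.0
tol = 5e-4  # the question's target relative error

def rel_err(x):
    # Relative error of dropping the k1 term entirely.
    exact = math.log10(x**k0 + k1)
    approx = k0 * math.log10(x)
    return abs(approx - exact) / exact

# The error only shrinks as x grows, so the first x that meets the tolerance
# is the threshold above which k0*log10(x) alone is accurate enough.
threshold = next(x for x in range(2, 2**14) if rel_err(x) <= tol)
print(threshold, rel_err(threshold))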
I recently read how the sRGB model compresses physical tristimulus values into stored RGB values.
It is basically very similar to the function I'm trying to approximate, except that it's defined piecewise:
k0 x, x < 0.0031308
k1 x^0.417 - k2 otherwise
I was told the constant addition in Log[x^k0 + k1] was there to make the beginning of the function more linear. But that can easily be achieved with a piecewise approximation instead. That would make the approximation a lot more "uniform", with only 2 approximation ranges. This should be cheaper to compute, since there is no longer any need to compute an approximation-range index (an integer log) and do a SIMD coefficient lookup.
For now, I conclude this will be the best approach, even though it doesn't approximate the function precisely. The hard part will be proposing this change and convincing people to use it.
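For reference, a small Python sketch of that kind of two-piece curve, using the actual sRGB constants purely as an illustration of the shape being described:

def srgb_encode(x):
    # Piecewise sRGB transfer function: linear near 0, power law elsewhere.
    if x <= 0.0031308:
        return 12.92 * x
    return 1.055 * x ** (1 / 2.4) - 0.055

for x in (0.0, 0.001, 0.01, 0.1, 1.0):
    print(x, srgb_encode(x))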