Function call faster than on the fly calculation? - optimization

I am now seriously confused. I have a function creating a table with a random number of entries, and I tried two different methods to choose that number (which is somewhat weighted):
Method 1, separate function
local function n()
local n = math.random()
if n < .7 then return 0
elseif n < .8 then return 1
end
return 2
end
local function final()
for i = 1, n() do
...
end
end
Method 2, direct calculation
local function final()
local n = math.random()
if n < .7 then n = 0
elseif n < .8 then n = 1
else n = 2
end
for i = 1, n do
...
end
end
The problem is: for some reason, the first method performs 30% faster than the second. Why is this?

No, a call will never be faster than plainly inlining the same code. All the difference in the first method is the extra work of setting up a stack frame and tearing it down; the rest of the code, both source and compiled, is exactly the same, so it is only natural that "just the calculation" will be faster than "just the calculation + some extra work".
Your benchmark seems to be imprecise. For such a lightweight function, the for loop and the os.clock call themselves take almost as much time as the function itself, so combined with os.clock's inherently low resolution and the small number of loops, your data is not really statistically significant and you are mostly seeing random hiccups in your hardware. Use a better timer and increase the number of loops to at least 1000000.
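As a rough illustration of that benchmarking advice (not Lua-specific: a minimal Python sketch, where work() is a hypothetical stand-in for the code under test):
import time

def work():
    # hypothetical stand-in for the snippet being measured
    pass

ITERATIONS = 1_000_000          # enough repetitions that timer noise stops mattering

start = time.perf_counter()     # high-resolution timer (unlike a coarse wall clock)
for _ in range(ITERATIONS):
    work()
elapsed = time.perf_counter() - start

print(f"{elapsed / ITERATIONS * 1e9:.1f} ns per call (loop overhead included)")
Measuring an empty work() first gives you the loop-plus-call overhead, which you can then subtract from later measurements.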

How is the complexity of the following code O(n log n)?

for(i=1;i<=n;i=i*2)
{
for(j=1;j<=i;j++)
{
}
}
Time complexity in terms of what? If you want to know how many inner-loop operations the algorithm performs, it is not O(n log n). If you also want to take into account the arithmetic operations of the loops themselves, see further below. If you literally plug that code into a compiler, chances are it will notice that the code does nothing and optimise the loops away, resulting in constant O(1) time complexity.
Based only on what you've given us, I would interpret the question as time complexity in terms of whatever might be inside the inner loop, not counting the arithmetic operations of the loops themselves. If so:
Consider an iteration of your inner loop a constant-time operation; then we just need to count how many iterations the inner loop will make.
You will find that it will make
1 + 2 + 4 + 8 + ... + n
iterations, if n is a power of two. If it is not, the loop stops a bit sooner, but this gives us our upper limit.
We can write this more generally as
the sum of 2^i where i ranges from 0 to log2(n).
Now, if you do the math, e.g. using the formula for geometric sums, you will find that this sum equals
2n - 1.
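Spelled out with the geometric series formula (taking n to be a power of two so that log2(n) is an integer):
\sum_{i=0}^{\log_2 n} 2^i = 2^{\log_2 n + 1} - 1 = 2 \cdot 2^{\log_2 n} - 1 = 2n - 1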
So we have a time complexity of O(2n - 1) = O(n), if we don't take the arithmetic operations of the loops into account.
If you wish to verify this experimentally, the best way is to write code that counts how many times the inner loop runs. In JavaScript, you could write it like this:
function f(n) {
  let c = 0;
  for (let i = 1; i <= n; i = i * 2) {
    for (let j = 1; j <= i; j++) {
      ++c;
    }
  }
  console.log(c);
}
f(2);
f(4);
f(32);
f(1024);
f(1 << 20);
If you do want to take the arithmetic operations into account, then it depends a bit on your assumptions, but you can indeed get some logarithmic factors to account for. It depends on how you formulate the question and how you define an operation.
First, we need to estimate the number of high-level operations executed for different n. In this case the inner loop body is the operation you want to count, if I understood the question right.
If doing this by hand is difficult, you can automate it. I used Matlab for the example code since no specific language was tagged. The testing code looks like this:
% Reasonable amount of input elements placed in array, change it to fit your needs
x = 1:1:100;
% Plot linear function
plot(x,x,'DisplayName','O(n)', 'LineWidth', 2);
hold on;
% Plot n*log(n) function
plot(x, x.*log(x), 'DisplayName','O(nln(n))','LineWidth', 2);
hold on;
% Apply our function to each element of x
measured = arrayfun(@(v) test(v),x);
% Plot number of high level operations performed by our function for each element of x
plot(x,measured, 'DisplayName','Measured','LineWidth', 2);
legend
% Our function
function k = test(n)
% Counter for operations
k = 0;
% Outer loop, same as for(i=1;i<=n;i=i*2)
i = 1;
while i <= n
% Inner loop
for j=1:1:i
% Count operations
k=k+1;
end
i = i*2;
end
end
The resulting plot (not reproduced here) shows the measured operation count against the O(n) and O(n*ln(n)) reference curves.
Our complexity is worse than linear but not worse than O(nlogn), so we choose O(nlogn) as an upper bound.
Furthermore the upper bound should be:
O(n*log2(n))
The worst case is n being a power of two, i.e. n = 2^x for a whole number x.
The inner loop is evaluated up to n times, the outer loop log2(n) times (logarithm to base 2).

Optimising Sparse Array Math

I have a sparse array: term_doc
its size is 622256x715 of Float64. It is very sparse:
Of its ~444,913,040 cells, only about 22,215 are normally nonempty.
Of the 622256 rows, only 4,699 are occupied,
though all of the 715 columns are occupied.
The operation I would like to perform can be described as returning the row-normalized and column-normalized versions of this matrix.
The naive non-sparse version I wrote is:
function doUnsparseWay()
gc() #Force garbage collection before I start (and periodically during). This uses a lot of memory
term_doc
N = term_doc./sum(term_doc,1)
println("N done")
gc()
P = term_doc./sum(term_doc,2)
println("P done")
gc()
N[isnan(N)] = 0.0
P[isnan(P)] = 0.0
N,P,term_doc
end
Running this:
> @time N,P,term_doc= doUnsparseWay()
outputs:
N done
P done
elapsed time: 30.97332475 seconds (14466 MB allocated, 5.15% gc time in 13 pauses with 3 full sweep)
It is fairly simple.
It chews memory, and will crash if the garbage collection does not occur at the right times (Thus I call it manually).
But it is fairly fast
I wanted to get it to work on the sparse matrix.
So as not to chew my memory out,
and because logically it is a faster operation -- fewer cells need operating on.
I followed suggestions from this post and from the performance page of the docs.
function doSparseWay()
term_doc::SparseMatrixCSC{Float64,Int64}
N= spzeros(size(term_doc)...)
N::SparseMatrixCSC{Float64,Int64}
for (doc,total_terms::Float64) in enumerate(sum(term_doc,1))
if total_terms == 0
continue
end
@fastmath @inbounds N[:,doc] = term_doc[:,doc]./total_terms
end
println("N done")
P = spzeros(size(term_doc)...)'
P::SparseMatrixCSC{Float64,Int64}
gfs = sum(term_doc,2)[:]
gfs::Array{Float64,1}
nterms = size(term_doc,1)
nterms::Int64
term_doc = term_doc'
@inbounds @simd for term in 1:nterms
@fastmath @inbounds P[:,term] = term_doc[:,term]/gfs[term]
end
println("P done")
P=P'
N[isnan(N)] = 0.0
P[isnan(P)] = 0.0
N,P,term_doc
end
It never completes.
It gets up to outputting "N Done",
but never outputs "P Done".
I have left it running for several hours.
How can I optimize it so it can complete in reasonable time?
Or if this is not possible, explain why.
First, you're making term_doc a global variable, which is a big problem for performance. Pass it as an argument, doSparseWay(term_doc::SparseMatrixCSC). (The type annotation at the beginning of your function does not do anything useful.)
You want to use an approach similar to the answer by walnuss:
function doSparseWay(term_doc::SparseMatrixCSC)
I, J, V = findnz(term_doc)
normI = sum(term_doc, 1)
normJ = sum(term_doc, 2)
NV = similar(V)
PV = similar(V)
for idx = 1:length(V)
NV[idx] = V[idx]/normI[J[idx]]
PV[idx] = V[idx]/normJ[I[idx]]
end
m, n = size(term_doc)
sparse(I, J, NV, m, n), sparse(I, J, PV, m, n), term_doc
end
This is a general pattern: when you want to optimize something for sparse matrices, extract the I, J, V and perform all your computations on V.
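For readers who prefer Python, here is a rough sketch of the same I/J/V pattern using SciPy's COO format (an illustration only, with a made-up function name; it assumes nonnegative entries, so no stored value is divided by a zero total):
import numpy as np
from scipy import sparse

def normalize_rows_and_cols(term_doc):
    coo = term_doc.tocoo()                                # explicit (row, col, value) triplets
    I, J, V = coo.row, coo.col, coo.data
    col_sums = np.asarray(term_doc.sum(axis=0)).ravel()   # per-column totals
    row_sums = np.asarray(term_doc.sum(axis=1)).ravel()   # per-row totals
    NV = V / col_sums[J]                                  # column-normalized values
    PV = V / row_sums[I]                                  # row-normalized values
    m, n = term_doc.shape
    N = sparse.coo_matrix((NV, (I, J)), shape=(m, n))
    P = sparse.coo_matrix((PV, (I, J)), shape=(m, n))
    return N, P, term_doc
As in the Julia version, only the stored values are touched, so the cost scales with the number of nonzeros rather than with the full 622256x715 shape.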

Optimization of If statement in a Loop

I have the following code that runs inside a loop that is executed 100 000 times per frame (it is a game):
If (_vertices(vertexIndex).X > _currentPosition.X - 100) And (_vertices(vertexIndex).X < _currentPosition.X + 100) And (_vertices(vertexIndex).X Mod 4) And (_vertices(vertexIndex).Z Mod 4) Then
_grassPatches(i Mod 9).Render(_vertices(vertexIndex))
End If
With this code my game runs at around 8 FPS. If I comment out the Render line, the game runs at around 100 FPS; however, if I comment out the whole If block, the framerate increases to around 400 FPS. I don't understand why this If ... And ... And ... And ... Then statement slows down my game so much. Is it because of the multiple Ands?
Any help would be appreciated.
EDIT 1:
Here is one of the ways I tried to improve the performance (also includes some extra code to show context):
Dim i As Integer = 0
Dim vertex As Vector3
Dim curPosX As Integer
For vertexIndex As Integer = _startIndex To _endIndex
vertex = _vertices(vertexIndex)
curPosX = _currentPosition.X
If (vertex.X > curPosX - 100) And (vertex.X < curPosX + 100) And (vertex.X Mod 4) And (vertex.Z Mod 4) Then
_grassPatches(i Mod 9).Render(_vertices(vertexIndex))
End If
i += 1
Next
EDIT 2: could my problem be due to a Branch Prediction fail? (Why is it faster to process a sorted array than an unsorted array?)
EDIT 3: I also tried replacing all the Ands with AndAlsos. This didn't lead to any performance benefit.
Your problem probably comes from the use of the Mod operator. If you can avoid using it, or find another way of getting to your result, it will make your loop faster.
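For example, when the divisor is a power of two and the operand is a nonnegative integer, a Mod can be replaced by a bitwise mask. A small Python sketch of the idea (illustrative only, not the original VB.NET code; note that X and Z in the question are floating-point coordinates, so they would have to be truncated to integers first):
# x % 4 == 0 is equivalent to (x & 3) == 0 for nonnegative integers,
# because 4 is a power of two and the mask is divisor - 1.
def divisible_by_4_mod(x: int) -> bool:
    return x % 4 == 0

def divisible_by_4_mask(x: int) -> bool:
    return (x & 3) == 0

assert all(divisible_by_4_mod(x) == divisible_by_4_mask(x) for x in range(100_000))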
Cheers

Optimization of "static" loops

I'm writing a compiled language for fun, and I've recently gotten on a kick for making my optimizing compiler very robust. I've figured out several ways to optimize certain things; for instance, 2 + 2 is always 4, so we can do that math at compile time, and if(false){ ... } can be removed entirely. But now I've gotten to loops. After some research, I think that what I'm trying to do isn't exactly loop unrolling, but it is still an optimization technique. Let me explain.
Take the following code.
String s = "";
for(int i = 0; i < 5; i++){
s += "x";
}
output(s);
As a human, I can sit here and tell you that this is 100% of the time going to be equivalent to
output("xxxxx");
So, in other words, this loop can be "compiled out" entirely. It's not loop unrolling, but what I'm calling "fully static": there are no inputs that would change the behavior of the segment. My idea is that anything fully static can be resolved to a single value, while anything that relies on input or produces conditional output of course can't be optimized further. So, from the machine's point of view, what do I need to consider? What makes a loop "fully static"?
I've come up with three types of loops that I need to figure out how to categorize: loops that always end up with the same machine state after every run, regardless of inputs; loops that will NEVER complete; and loops that I can't figure out one way or the other. In the case where I can't figure it out (the loop conditionally changes how many times it runs based on dynamic inputs), I'm not worried about optimizing. Infinite loops will be a compile error/warning unless specifically suppressed by the programmer, and loops that are the same every time should just skip directly to putting the machine in the proper state, without looping.
The main case to optimize is, of course, the loop with static iterations whose function calls inside are all static as well. Determining whether a loop has dynamic components is easy enough, and if it's not dynamic, I guess it has to be static. The thing I can't figure out is how to detect whether it will be infinite or not. Does anyone have any thoughts on this? I know this is a subset of the halting problem, but I feel it's solvable: the halting problem is only a problem because for some programs you simply can't tell whether they will run forever or not. I don't want to consider those cases; I just want to consider the cases where the loop WILL halt or WILL NOT halt, but first I have to distinguish between the three states.
This calls for a kind of symbolic solver that can be defined for several classes of loops, but not in general.
Let's restrict the requirements a bit: no numeric overflow, just for loops (a while can sometimes be transformed into a full for loop, except when using continue etc.), no breaks, and no modification of the control variable inside the loop body.
for (var i = S; E(i); i = U(i)) ...
where E(i) and U(i) are expressions that can be symbolically manipulated. There are several classes that are relatively easy:
U(i) = i + CONSTANT : n-th cycle the value of i is S + n * CONSTANT
U(i) = i * CONSTANT : n-th cycle the value of i is S * CONSTANT^n
U(i) = i / CONSTANT : n-th cycle the value of i is S * CONSTANT^-n
U(i) = (i + CONSTANT) % M : n-th cycle the value of i is (S + n * CONSTANT) % M
and some other quite easy combinations (and some very difficult ones)
Determining whether the loop terminates is searching for n where E(i(n)) is false.
This can be done by some symbolic manipulation for a lot of cases, but there is a lot of work involved in making the solver.
E.g.
for(int i = 0; i < 5; i++),
i(n) = 0 + n * 1 = n, E(i(n)) => not(n < 5) =>
n >= 5 => stops for n = 5
for(int i = 0; i < 5; i--),
i(n) = 0 + n * -1 = -n, E(i(n)) => not(-n < 5) => -n >= 5 =>
n <= -5 - since n is a non-negative whole number this is never true - never stops
for(int i = 0; i < 5; i = (i + 1) % 3),
E(i(n)) => not(n % 3 < 5) => n % 3 >= 5 => this is never true => never stops
for(int i = 10; i + 10 < 500; i = i + 2 * i) =>
for(int i = 10; i < 490; i = 3 * i),
i(n) = 10 * 3^n,
E(i(n)) => not(10 * 3^n < 490) => 10 * 3^n >= 490 => 3^n >= 49 => n >= log3(49) => n >= 3.5... =>
since n is whole => it will stop for n = 4
for other cases it would be good if they can get transformed to the ones you can already solve...
Many tricks for symbolic manipulation come from the Lisp era and are not too difficult. Although the ones described (or variants of them) are the most common types in practice, there are many more scenarios that are difficult or outright impossible to solve.
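To make the additive class above concrete, here is a minimal Python sketch (trip_count_additive is a made-up helper name, and exact integer arithmetic with no overflow is assumed):
import math

def trip_count_additive(S, C, B):
    # Iteration count of: for (i = S; i < B; i = i + C)
    # i.e. the U(i) = i + CONSTANT class with condition E(i) = (i < B).
    # Returns math.inf when the loop provably never terminates.
    if S >= B:          # E(i) is false on entry
        return 0
    if C <= 0:          # i never grows, so the condition stays true forever
        return math.inf
    return math.ceil((B - S) / C)   # smallest n with S + n*C >= B

print(trip_count_additive(0, 1, 5))    # the first example above: stops after 5 iterations
print(trip_count_additive(0, -1, 5))   # the second example: inf, i.e. never stops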

Prime number checker unbelievably slow

I have this piece of code which checks whether a given number is prime:
If x Mod 2 = 0 Then
Return False
End If
For i = 3 To x / 2 + 1 Step 2
If x Mod i = 0 Then
Return False
End If
Next
Return True
I only use it for numbers 1E7 <= x <= 2E7. However, it is extremely slow - I can hardly check 300 numbers a minute, so checking all x's would take more than 23 days...
Could someone give some improvements tips or say what I might be doing redundantly this way?
That is the general algorithm for checking whether a number is prime.
If you want to check primes in bulk, use the Sieve of Eratosthenes: http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes
Look up the term "Sieve of Eratosthenes". It's a two-thousand-year-old algorithm that is way better than yours. It's often taught in school.
You should definitely change your algorithm! You can try the Sieve of Eratosthenes or the more advanced Fermat primality test. Beware that your code will become more complicated, as you would need to implement modular arithmetic. Look here for a list of some even more mathematically advanced methods.
You can also look at the AKS primality test.
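As a sketch of the Fermat idea mentioned above (in Python, where the built-in pow does the modular exponentiation; the test is probabilistic and is fooled by Carmichael numbers, so treat it as a fast filter rather than a proof):
import random

def fermat_probably_prime(n, rounds=20):
    # Fermat test: if n is prime, then a^(n-1) == 1 (mod n) for every a not divisible by n.
    if n < 4:
        return n in (2, 3)
    if n % 2 == 0:
        return False
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        if pow(a, n - 1, n) != 1:
            return False            # a is a Fermat witness: n is definitely composite
    return True                     # probably prime (or a rare pseudoprime)

print([x for x in range(2, 50) if fermat_probably_prime(x)])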
This is a good algorithm for checking primality.
Since x/2 + 1 is constant throughout the loop, compute it into a separate variable before the For loop, saving a division and an addition on every iteration. This will only slightly improve performance, though.
Use the Sieve of Eratosthenes to create a Set that contains all the prime numbers up to the largest number you need to check. It will take a while to set up the Set, but then checking if a number exists in it will be very fast.
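A minimal Python sketch of that approach (illustrative only; the limit and the test value are placeholders):
def primes_up_to(limit):
    # Sieve of Eratosthenes: mark multiples of each prime up to sqrt(limit) as composite.
    sieve = bytearray([1]) * (limit + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = b"\x00" * len(range(p * p, limit + 1, p))
    return {n for n in range(2, limit + 1) if sieve[n]}

primes = primes_up_to(20_000_000)   # covers the 1E7 <= x <= 2E7 range from the question
print(10_000_019 in primes)         # each check is now a constant-time set lookup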
Split the range into chunks and do the checks in two or more threads if you have a multicore CPU, or use Parallel.For.
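A rough Python sketch of the chunking idea (illustrative only; is_prime stands in for the original check, and the chunk size is arbitrary):
from concurrent.futures import ProcessPoolExecutor

def is_prime(x):
    # plain trial division up to sqrt(x); stands in for the original check
    if x < 2:
        return False
    if x % 2 == 0:
        return x == 2
    i = 3
    while i * i <= x:
        if x % i == 0:
            return False
        i += 2
    return True

def count_primes(lo, hi):
    return sum(1 for x in range(lo, hi) if is_prime(x))

if __name__ == "__main__":
    # split the 1E7..2E7 range into chunks and check them in parallel worker processes
    edges = list(range(10_000_000, 20_000_001, 2_500_000))
    chunks = list(zip(edges[:-1], edges[1:]))
    with ProcessPoolExecutor() as pool:
        total = sum(pool.map(count_primes, *zip(*chunks)))
    print(total)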
To check whether a number is prime, you only have to check that it is not divisible by any prime less than it.
Please check the following snippet:
Sub Main()
Dim primes As New List(Of Integer)
primes.Add(1)
For x As Integer = 2 To 1000 ' start at 2 so that 1 is not reported as prime
If IsPrime(x, primes) Then
primes.Add(x)
Console.WriteLine(x)
End If
Next
End Sub
Private Function IsPrime(x As Integer, primes As IEnumerable(Of Integer)) As Boolean
For Each prime In primes
If prime <> 1 AndAlso prime <> x Then
If x Mod prime = 0 Then
Return False
End If
End If
Next
Return True
End Function
It is slow because you loop all the way up to x/2; you only need to test divisors up to the square root of x. I modified your code a little bit. (I don't know the syntax of VB, so you may have to adjust it.)
If x < 2 Then
Return False
End If
If x = 2 Then
Return True
End If
If x Mod 2 = 0 Then
Return False
End If
For i = 3 To CInt(Math.Sqrt(x)) Step 2
If x Mod i = 0 Then
Return False
End If
Next
Return True