Determining when to stop a possibly infinite iteration? - iteration

As a beginner programmer, a common problem I encounter is when to stop an iteration. For example, if I were to program a function to determine if an integer was happy or not (by brute-force ), when would I stop? Another example, concerning something like the Mandelbrot set, how would I know to stop an iteration and firmly say that a number diverges or converges? Does it depend on the problem you're dealing with, or is there a method to do things like this?

Brute-force method you have to find your base case to terminate your program as in case of recursion. For Happy Prime number the base case is finding loop in your iteration.
Code
# sum of square of digit of n
def numSquareSum(n):
squareSum = 0
while(n):
squareSum += (n % 10) * (n % 10)
n = int(n / 10)
return squareSum
#method return true if n is Happy Number
def isHappyNumber(n):
li = []
while (1):
n = numSquareSum(n)
if (n == 1):
return True
if (n in li):
return False
li.append(n)
# Main Code
n = 7;
if (isHappyNumber(n)):
print(n , "is a Happy number")
else:
print(n , "is not a Happy number")
Hope this will be helpful.

I believe what you are describing is the Halting Problem. The main takeaway to Halting Problem is that computers cannot solve every problem. One of which being detecting generic infinite loops. It depends on the specific problem you are trying to solve, whether there is a solution to detecting infinite loops.
If you have anymore questions please refer to Wiki on the Halting Problem

In NASA's The Power of 10: Rules for Developing Safety-Critical Code there is a rule "All loops must have fixed bounds. This prevents runaway code."
In this trivial example this would be:
# sum of square of digit of n
def numSquareSum(n):
count = 0
squareSum = 0
while(n):
squareSum += (n % 10) * (n % 10)
n = int(n / 10)
count += 1
if count > 100000000:
raise Exception('loop bound exceeded')
return squareSum

Related

What is Pseudo-polynomial complexity?

Yes, I've seen this answer - What is pseudopolynomial time? How does it differ from polynomial time? - but I still don't understand.
Why does the representation in bits make a difference only sometimes?
For this program for example
function isPrime(n):
for i from 2 to n - 1:
if (n mod i) = 0, return false
return true
it says the complexity is not polynomial, because n requires log n bits to write out so the complexity is O(2^(4*log n)) but if i use that on every other problem then it could also be pseudopolynomial, right? (unless im getting it all wrong here). What makes this program so special to be measured in the amount of bits required to write out n?
You have linked to other questions where this is explained fairly well for someone who understands the concept, so here comes a very brief version.
for i from 2 to n - 1:
can be rewritten as
i = 2
while(i < n - 1):
if (n mod i) == 0:
return false
i = i + 1
Very often, we assume that the operations i < n - 1, i = i + 1 and n mod i are O(1). But this is not necessarily true. It is usually true for small values. And on a 32 bit machine, a "small value" is in the order of a billion.
Number that requires more than 32 bits to be represented will take more time to perform operations on than a number that fits in 32 bit. And it will take even more if it required more than 64 bit.
In practice, this rarely matters.
A very simple way to visualize this is to imagine that you get the task to implement the common mathematical operations where the operands are represented as strings. Here is a simple python function that takes two strings representing binary numbers and returns the sum as a string. It was quickly hacked together and assumes both strings has the same length. It may contain bugs and can most likely be refined. But it demonstrate the point. This function adds two numbers, but it will take longer time for longer numbers.
def binadd(a, b):
carry = '0'
result = list('0'*(len(a)+1))
for i in range(len(a)-1,-1, -1):
xor = '1' if (a[i] == '1') != (b[i] == '1') else '0'
val = '1' if (xor == '1') != (carry == '1') else '0'
carry = '1' if (carry == '1' and xor == '1') or (a[i] == '1' and b[i] == '1') else '0'
result[i] = val
result[0]=carry
return ''.join(result)
What makes this program so special to be measured in the amount of bits required to write out n?
There's nothing special about this particular program. At least not theoretical. In practice it is special in the sense that determining if a VERY big number is a prime is a common problem. Or to be more accurate, it would have been a much more common problem if there existed a very fast algorithm to do it. If it did, it would basically break encryption as we know it today.

Why does the following algorithm have runtime log(log(n))?

I don't understand how the runtime of the algorithm can be log(log(n)). Can someone help me?
s=1
while s <= (log n)^2 do
s=3s
Notation note: log(n) indicates log2(n) throughout the solution.
Well, I suppose (log n)^2 indicates the square of log(n), which means log(n)*log(n). Let us try to analyze the algorithm.
It starts from s=1 and goes like 1,3,9,27...
Since it goes by the exponents of 3, after each iteration s can be shown as 3^m, m being the number of iterations starting from 1.
We will do these iterations until s becomes bigger than log(n)*log(n). So at some point 3^m will be equal to log(n)*log(n).
Solve the equation:
3^m = log(n) * log(n)
m = log3(log(n) * log(n))
Time complexity of the algorithm can be shown as O(m). We have to express m in terms of n.
log3(log(n) * log(n)) = log3(log(n)) + log3(log(n))
= 2 * log3(log(n)) For Big-Oh notation, constants do not matter. So let us get rid of 2.
Time complexity = O(log3(log(n)))
Well ok, here is the deal: By the definition of Big-Oh notation, it represents an upper bound runtime for our function. Therefore O(n) ⊆ O(n^2).
Notice that log3(a) < log2(a) after a point.
By the same logic we can conclude that O(log3(log(n)) ⊆ O(log(log(n)).
So the time complexity of the algorithm : O(log(logn))
Not the most scientific explanation, but I hope you got the point.
This follows as a special case of a more general principle. Consider the following loop:
s = 1
while s < k:
s = 3s
How many times will this loop run? Well, the values of s taken on will be equal to 1, 3, 9, 27, 81, ... = 30, 31, 32, 33, ... . And more generally, on the ith iteration of the loop, the value of s will be 3i.
This loop stops running at soon as 3i overshoots k. To figure out where that is, we can equate and solve:
3i = k
i = log3 k
So this loop will run a total of log3 k times.
Now, what do you think would happen if we used this loop instead?
s = 1
while s < k:
s = 4s
Using similar logic, the number of loop iterations would be log4 k. And more generally, if we have the following loop:
s = 1
while s < k:
s = c * s
Then assuming c > 1, the number of iterations will be logc k.
Given this, let's look at your loop:
s = 1
while s <= (log n)^2 do
s = 3s
Using the reasoning from above, the number of iterations of this loop works out to log3 (log n)2. Using properties of logarithms, we can simplify this to
log3 (log n)2
= 2 log3 log n
= O(log log n).

Time complexity of for loops, I cannot really understand a thing

So these are the for loops that I have to find the time complexity, but I am not really clearly understood how to calculate.
for (int i = n; i > 1; i /= 3) {
for (int j = 0; j < n; j += 2) {
... ...
}
for (int k = 2; k < n; k = (k * k) {
...
}
For the first line, (int i = n; i > 1; i /= 3), keeps diving i by 3 and if i is less than 1 then the loop stops there, right?
But what is the time complexity of that? I think it is n, but I am not really sure.
The reason why I am thinking it is n is, if I assume that n is 30 then i will be like 30, 10, 3, 1 then the loop stops. It runs n times, doesn't it?
And for the last for loop, I think its time complexity is also n because what it does is
k starts as 2 and keeps multiplying itself to itself until k is greater than n.
So if n is 20, k will be like 2, 4, 16 then stop. It runs n times too.
I don't really think I am understanding this kind of questions because time complexity can be log(n) or n^2 or etc but all I see is n.
I don't really know when it comes to log or square. Or anything else.
Every for loop runs n times, I think. How can log or square be involved?
Can anyone help me understanding this? Please.
Since all three loops are independent of each other, we can analyse them separately and multiply the results at the end.
1. i loop
A classic logarithmic loop. There are countless examples on SO, this being a similar one. Using the result given on that page and replacing the division constant:
The exact number of times that this loop will execute is ceil(log3(n)).
2. j loop
As you correctly figured, this runs O(n / 2) times;
The exact number is floor(n / 2).
3. k loop
Another classic known result - the log-log loop. The code just happens to be an exact replicate of this SO post;
The exact number is ceil(log2(log2(n)))
Combining the above steps, the total time complexity is given by
Note that the j-loop overshadows the k-loop.
Numerical tests for confirmation
JavaScript code:
T = function(n) {
var m = 0;
for (var i = n; i > 1; i /= 3) {
for (var j = 0; j < n; j += 2)
m++;
for (var k = 2; k < n; k = k * k)
m++;
}
return m;
}
M = function(n) {
return ceil(log(n)/log(3)) * (floor(n/2) + ceil(log2(log2(n))));
}
M(n) is what the math predicts that T(n) will exactly be (the number of inner loop executions):
n T(n) M(n)
-----------------------
100000 550055 550055
105000 577555 577555
110000 605055 605055
115000 632555 632555
120000 660055 660055
125000 687555 687555
130000 715055 715055
135000 742555 742555
140000 770055 770055
145000 797555 797555
150000 825055 825055
M(n) matches T(n) perfectly as expected. A plot of T(n) against n log n (the predicted time complexity):
I'd say that is a convincing straight line.
tl;dr; I describe a couple of examples first, I analyze the complexity of the stated problem of OP at the bottom of this post
In short, the big O notation tells you something about how a program is going to perform if you scale the input.
Imagine a program (P0) that counts to 100. No matter how often you run the program, it's going to count to 100 as fast each time (give or take). Obviously right?
Now imagine a program (P1) that counts to a number that is variable, i.e. it takes a number as an input to which it counts. We call this variable n. Now each time P1 runs, the performance of P1 is dependent on the size of n. If we make n a 100, P1 will run very quickly. If we make n equal to a googleplex, it's going to take a little longer.
Basically, the performance of P1 is dependent on how big n is, and this is what we mean when we say that P1 has time-complexity O(n).
Now imagine a program (P2) where we count to the square root of n, rather than to itself. Clearly the performance of P2 is going to be worse than P1, because the number to which they count differs immensely (especially for larger n's (= scaling)). You'll know by intuition that P2's time-complexity is equal to O(n^2) if P1's complexity is equal to O(n).
Now consider a program (P3) that looks like this:
var length= input.length;
for(var i = 0; i < length; i++) {
for (var j = 0; j < length; j++) {
Console.WriteLine($"Product is {input[i] * input[j]}");
}
}
There's no n to be found here, but as you might realise, this program still depends on an input called input here. Simply because the program depends on some kind of input, we declare this input as n if we talk about time-complexity. If a program takes multiple inputs, we simply call those different names so that a time-complexity could be expressed as O(n * n2 + m * n3) where this hypothetical program would take 4 inputs.
For P3, we can discover it's time-complexity by first analyzing the number of different inputs, and then by analyzing in what way it's performance depends on the input.
P3 has 3 variables that it's using, called length, i and j. The first line of code does a simple assignment, which' performance is not dependent on any input, meaning the time-complexity of that line of code is equal to O(1) meaning constant time.
The second line of code is a for loop, implying we're going to do something that might depend on the length of something. And indeed we can tell that this first for loop (and everything in it) will be executed length times. If we increase the size of length, this line of code will do linearly more, thus this line of code's time complexity is O(length) (called linear time).
The next line of code will take O(length) time again, following the same logic as before, however since we are executing this every time execute the for loop around it, the time complexity will be multiplied by it: which results in O(length) * O(length) = O(length^2).
The insides of the second for loop do not depend on the size of the input (even though the input is necessary) because indexing on the input (for arrays!!) will not become slower if we increase the size of the input. This means that the insides will be constant time = O(1). Since this runs in side of the other for loop, we again have to multiply it to obtain the total time complexity of the nested lines of code: `outside for-loops * current block of code = O(length^2) * O(1) = O(length^2).
The total time-complexity of the program is just the sum of everything we've calculated: O(1) + O(length^2) = O(length^2) = O(n^2). The first line of code was O(1) and the for loops were analyzed to be O(length^2). You will notice 2 things:
We rename length to n: We do this because we express
time-complexity based on generic parameters and not on the ones that
happen to live within the program.
We removed O(1) from the equation. We do this because we're only
interested in the biggest terms (= fastest growing). Since O(n^2)
is way 'bigger' than O(1), the time-complexity is defined equal to
it (this only works like that for terms (e.g. split by +), not for
factors (e.g. split by *).
OP's problem
Now we can consider your program (P4) which is a little trickier because the variables within the program are defined a little cloudier than the ones in my examples.
for (int i = n; i > 1; i /= 3) {
for (int j = 0; j < n; j += 2) {
... ...
}
for (int k = 2; k < n; k = (k * k) {
...
}
}
If we analyze we can say this:
The first line of code is executed O(cbrt(3)) times where cbrt is the cubic root of it's input. Since i is divided by 3 every loop, the cubic root of n is the number of times the loop needs to be executed before i is smaller or equal to 1.
The second for loop is linear in time because j is executed
O(n / 2) times because it is increased by 2 rather than 1 which
would be 'normal'. Since we know that O(n/2) = O(n), we can say
that this for loop is executed O(cbrt(3)) * O(n) = O(n * cbrt(n)) times (first for * the nested for).
The third for is also nested in the first for, but since it is not nested in the second for, we're not going to multiply it by the second one (obviously because it is only executed each time the first for is executed). Here, k is bound by n, however since it is increased by a factor of itself each time, we cannot say it is linear, i.e. it's increase is defined by a variable rather than by a constant. Since we increase k by a factor of itself (we square it), it will reach n in 2log(n) steps. Deducing this is easy if you understand how log works, if you don't get this you need to understand that first. In any case, since we analyze that this for loop will be run O(2log(n)) time, the total complexity of the third for is O(cbrt(3)) * O(2log(n)) = O(cbrt(n) *2log(n))
The total time-complexity of the program is now calculated by the sum of the different sub-timecomplexities: O(n * cbrt(n)) + O(cbrt(n) *2log(n))
As we saw before, we only care about the fastest growing term if we talk about big O notation, so we say that the time-complexity of your program is equal to O(n * cbrt(n)).

Optimization of If statement in a Loop

I have the following code that runs inside a loop that is executed 100 000 times per frame (it is a game):
If (_vertices(vertexIndex).X > _currentPosition.X - 100) And (_vertices(vertexIndex).X < _currentPosition.X + 100) And (_vertices(vertexIndex).X Mod 4) And (_vertices(vertexIndex).Z Mod 4) Then
_grassPatches(i Mod 9).Render(_vertices(vertexIndex))
End If
With this code my game runs at around 8 FPS. If I comment out the Render line, the game runs at around 100 FPS, however, if I comment out the whole If loop, the framerate increases to around 400 FPS. I don't understand why does this If ... And ... And ... And ... Then loop slows down my game so much. Is it because of the multiple Ands?
Any help would be appreciated.
EDIT 1:
Here is one of the ways I tried to improve the performance (also includes some extra code to show context):
Dim i As Integer = 0
Dim vertex As Vector3
Dim curPosX As Integer
For vertexIndex As Integer = _startIndex To _endIndex
vertex = _vertices(vertexIndex)
curPosX = _currentPosition.X
If (vertex.X > curPosX - 100) And (vertex.X < curPosX + 100) And (vertex.X Mod 4) And (vertex.Z Mod 4) Then
_grassPatches(i Mod 9).Render(_vertices(vertexIndex))
End If
i += 1
Next
EDIT 2: could my problem be due to a Branch Prediction fail? (Why is it faster to process a sorted array than an unsorted array?)
EDIT 3: I also tried replacing all the Ands with AndAlsos. This didn't lead to any performance benefit.
Your problem probably comes from the use of the Mod operator. If you can avoid using it, or find another way of getting to your result, it will make your loop faster.
Cheers

Optimization of "static" loops

I'm writing a compiled language for fun, and I've recently gotten on a kick for making my optimizing compiler very robust. I've figured out several ways to optimize some things, for instance, 2 + 2 is always 4, so we can do that math at compile time, if(false){ ... } can be removed entirely, etc, but now I've gotten to loops. After some research, I think that what I'm trying to do isn't exactly loop unrolling, but it is still an optimization technique. Let me explain.
Take the following code.
String s = "";
for(int i = 0; i < 5; i++){
s += "x";
}
output(s);
As a human, I can sit here and tell you that this is 100% of the time going to be equivalent to
output("xxxxx");
So, in other words, this loop can be "compiled out" entirely. It's not loop unrolling, but what I'm calling "fully static", that is, there are no inputs that would change the behavior of the segment. My idea is that anything that is fully static can be resolved to a single value, anything that relies on input or makes conditional output of course can't be optimized further. So, from the machine's point of view, what do I need to consider? What makes a loop "fully static?"
I've come up with three types of loops that I need to figure out how to categorize. Loops that will always end up with the same machine state after every run, regardless of inputs, loops that WILL NEVER complete, and loops that I can't figure out one way or the other. In the case that I can't figure it out (it conditionally changes how many times it will run based on dynamic inputs), I'm not worried about optimizing. Loops that are infinite will be a compile error/warning unless specifically suppressed by the programmer, and loops that are the same every time should just skip directly to putting the machine in the proper state, without looping.
The main case of course to optimize is the static loop iterations, when all the function calls inside are also static. Determining if a loop has dynamic components is easy enough, and if it's not dynamic, I guess it has to be static. The thing I can't figure out is how to detect if it's going to be infinite or not. Does anyone have any thoughts on this? I know this is a subset of the halting problem, but I feel it's solvable; the halting problem is a problem due to the fact that for some subsets of programs, you just can't tell it may run forever, it may not, but I don't want to consider those cases, I just want to consider the cases where it WILL halt, or it WILL NOT halt, but first I have to distinguish between the three states.
This looks like a kind of a symbolic solver that can be defined for several classes, but not generally.
Let's restrict the requirements a bit: no number overflow, just for loops (while can be sometimes transformed to full for loop, except when using continue etc.), no breaks, no modifications of the control variable inside the for loop.
for (var i = S; E(i); i = U(i)) ...
where E(i) and U(i) are expressions that can be symbolically manipulated. There are several classes that are relatively easy:
U(i) = i + CONSTANT : n-th cycle the value of i is S + n * CONSTANT
U(i) = i * CONSTANT : n-th cycle the value of i is S * CONSTANT^n
U(i) = i / CONSTANT : n-th cycle the value of i is S * CONSTANT^-n
U(i) = (i + CONSTANT) % M : n-th cycle the value of i is (S + n * CONSTANT) % M
and some other quite easy combinations (and some very difficult ones)
Determining whether the loop terminates is searching for n where E(i(n)) is false.
This can be done by some symbolic manipulation for a lot of cases, but there is a lot of work involved in making the solver.
E.g.
for(int i = 0; i < 5; i++),
i(n) = 0 + n * 1 = n, E(i(n)) => not(n < 5) =>
n >= 5 => stops for n = 5
for(int i = 0; i < 5; i--),
i(n) = 0 + n * -1 = -n, E(i(n)) => not(-n < 5) => -n >= 5 =>
n < -5 - since n is a non-negative whole number this is never true - never stops
for(int i = 0; i < 5; i = (i + 1) % 3),
E(i(n)) => not(n % 3 < 5) => n % 3 >= 5 => this is never true => never stops
for(int i = 10; i + 10 < 500; i = i + 2 * i) =>
for(int i = 10; i < 480; i = 3 * i),
i(n) = 10 * 3^n,
E(i(n)) => not(10 * 3^n < 480) => 10 * 3^n >= 480 => 3^n >= 48 => n >= log3(48) => n >= 3.5... =>
since n is whole => it will stop for n = 4
for other cases it would be good if they can get transformed to the ones you can already solve...
Many tricks for symbolic manipulation come from Lisp era, and are not too difficult. Although the ones described (or variants) are the most common types practice, there are many more difficult and/or impossible to solve scenarios.