Can one prove correctness of a function with side-effects - testing

I'm reading a lot of things about "correctness proof"* in algorithms.
Some say that proofs apply to algorithms and not implementations, but the demonstrations are done with code source most of the time, not math. And code source may have side effects. So, i would like to know if any basic principle prevent someone to prove an impure function correct.
I feel it's true, but cannot say why. If such principle exists, could you explain it ?
Thanks
* Sorry if wording is incorrect, not sure what the good one would be.

The answer is that, although there are no side effects in math, it is possible to mathematically model code that has side effects.
In fact, we can even pull this trick to turn impure code into pure code (without having to go to math in the first place. So, instead of the (psuedocode) function:
f(x) = {
y := y + x
return y
}
...we could write:
f(x, state_before) = {
let old_y = lookup_y(state_before)
let state_after = update_y(state_before, old_y + x)
let new_y = lookup_y(state_after)
return (new_y, state_after)
}
...which can accomplish the same thing with no side effects. Of course, the entire program would have to be rewritten to explicitly pass these state values around, and you'd need to write appropriate lookup_ and update_ functions for all mutable variables, but it's a theoretically straightforward process.
Of course, no one wants to program this way. (Haskell does something similar to simulate side effects without having them be part of the language, but a lot of work went into making it more ergonomic than this.) But because it can be done, we know that side-effects are a well-defined concept.
This means that we can prove things about languages with side-effects, provided that their specifications provide us with enough information to know how to rewrite programs in them into state-passing style.

Related

Set union in prolog with variables

I am searching some SWI-Prolog function which is able to make some set union with variables as parameters inside. My aim is to make the union first and define the parameters at further on in source code.
Means eg. I have some function union and the call union(A, B, A_UNION_B) makes sense. Means further more the call:
union(A, [1,2], C), A=[3].
would give me as result
C = [3, 1, 2].
(What you call union/3 is most probably just concatenation, so I will use append/3 for keeping this answer short.)
What you expect is impossible without delayed goals or constraints. To see this, consider the following failure-slice
?- append(A, [1,2], C), false, A=[3].
loops, unexpected. % observed, but for us unexpected
false. % expected, but not the case
This query must terminate, in order to make the entire question useful. But there are infinitely many lists of different length for A. So in order to describe all possible solutions, we would need infinitely many answer substitutions, like
?- append(A, [1,2], C).
A = [], C = [1,2]
; A = [_A], C = [_A,1,2]
; A = [_A,_B], C = [_A,_B,1,2]
; A = [_A,_B,_C], C = [_A,_B,_C,1,2]
; ... .
The only way around is to describe that set of solutions with finitely many answers. One possibility could be:
?- when((ground(A);ground(C)), append(A,B,C)).
when((ground(A);ground(C)),append(A,B,C)).
Essentially it reads: Yes, the query is true, provided the query is true.
While this solves your exact problem, it will now delay many otherwise succeeding goals, think of A = [X], B = [].
A more elaborate version could provide more complex tests. But it would require a somehow different definition than append/3 is. Some systems like sicstus-prolog provide block declarations to make this more smoothly (SWI has a coarse emulation for that).
So it is possible to make this even better, but the question remains whether or not this makes much sense. After all, debugging delayed goals becomes more and more difficult with larger programs.
In many situations it is preferable to prevent this and produce an instantiation error in its stead as iwhen/2 does:
?- iwhen((ground(A);ground(C)),append(A,B,C)).
error(instantiation_error,iwhen/2).
That error is not the nicest answer possible, but at least it is not incorrect. It says: You need to provide more instantiations.
If you really want to solve this problem for the general case you have to delve into E-unification. That is an area with most trivial problem statements and extremely evolved answers. Often, just decidability is non-trivial let alone an effective algorithm. For your particular question, either ACI (for sets) or ANlr (for concatenation) are of interest. Where ACI requires solving Diophantine Equations and associative unification alone is even more complex than that. I am unaware of any such implementation for a Prolog system that solves the general problem.
Prolog IV offered an associative infix operator for concatenation but simply delayed more complex cases. So debugging these remains non-trivial.

If I come from an imperative programming background, how do I wrap my head around the idea of no dynamic variables to keep track of things in Haskell?

So I'm trying to teach myself Haskell. I am currently on the 11th chapter of Learn You a Haskell for Great Good and am doing the 99 Haskell Problems as well as the Project Euler Problems.
Things are going alright, but I find myself constantly doing something whenever I need to keep track of "variables". I just create another function that accepts those "variables" as parameters and recursively feed it different values depending on the situation. To illustrate with an example, here's my solution to Problem 7 of Project Euler, Find the 10001st prime:
answer :: Integer
answer = nthPrime 10001
nthPrime :: Integer -> Integer
nthPrime n
| n < 1 = -1
| otherwise = nthPrime' n 1 2 []
nthPrime' :: Integer -> Integer -> Integer -> [Integer] -> Integer
nthPrime' n currentIndex possiblePrime previousPrimes
| isFactorOfAnyInThisList possiblePrime previousPrimes = nthPrime' n currentIndex theNextPossiblePrime previousPrimes
| otherwise =
if currentIndex == n
then possiblePrime
else nthPrime' n currentIndexPlusOne theNextPossiblePrime previousPrimesPlusCurrentPrime
where currentIndexPlusOne = currentIndex + 1
theNextPossiblePrime = nextPossiblePrime possiblePrime
previousPrimesPlusCurrentPrime = possiblePrime : previousPrimes
I think you get the idea. Let's also just ignore the fact that this solution can be made to be more efficient, I'm aware of this.
So my question is kind of a two-part question. First, am I going about Haskell all wrong? Am I stuck in the imperative programming mindset and not embracing Haskell as I should? And if so, as I feel I am, how do avoid this? Is there a book or source you can point me to that might help me think more Haskell-like?
Your help is much appreciated,
-Asaf
Am I stuck in the imperative programming mindset and not embracing
Haskell as I should?
You are not stuck, at least I don't hope so. What you experience is absolutely normal. While you were working with imperative languages you learned (maybe without knowing) to see programming problems from a very specific perspective - namely in terms of the van Neumann machine.
If you have the problem of, say, making a list that contains some sequence of numbers (lets say we want the first 1000 even numbers), you immediately think of: a linked list implementation (perhaps from the standard library of your programming language), a loop and a variable that you'd set to a starting value and then you would loop for a while, updating the variable by adding 2 and putting it to the end of the list.
See how you mostly think to serve the machine? Memory locations, loops, etc.!
In imperative programming, one thinks about how to manipulate certain memory cells in a certain order to arrive at the solution all the time. (This is, btw, one reason why beginners find learning (imperative) programming hard. Non programmers are simply not used to solve problems by reducing it to a sequence of memory operations. Why should they? But once you've learned that, you have the power - in the imperative world. For functional programming you need to unlearn that.)
In functional programming, and especially in Haskell, you merely state the construction law of the list. Because a list is a recursive data structure, this law is of course also recursive. In our case, we could, for example say the following:
constructStartingWith n = n : constructStartingWith (n+2)
And almost done! To arrive at our final list we only have to say where to start and how many we want:
result = take 1000 (constructStartingWith 0)
Note that a more general version of constructStartingWith is available in the library, it is called iterate and it takes not only the starting value but also the function that makes the next list element from the current one:
iterate f n = n : iterate f (f n)
constructStartingWith = iterate (2+) -- defined in terms of iterate
Another approach is to assume that we had another list our list could be made from easily. For example, if we had the list of the first n integers we could make it easily into the list of even integers by multiplying each element with 2. Now, the list of the first 1000 (non-negative) integers in Haskell is simply
[0..999]
And there is a function map that transforms lists by applying a given function to each argument. The function we want is to double the elements:
double n = 2*n
Hence:
result = map double [0..999]
Later you'll learn more shortcuts. For example, we don't need to define double, but can use a section: (2*) or we could write our list directly as a sequence [0,2..1998]
But not knowing these tricks yet should not make you feel bad! The main challenge you are facing now is to develop a mentality where you see that the problem of constructing the list of the first 1000 even numbers is a two staged one: a) define how the list of all even numbers looks like and b) take a certain portion of that list. Once you start thinking that way you're done even if you still use hand written versions of iterate and take.
Back to the Euler problem: Here we can use the top down method (and a few basic list manipulation functions one should indeed know about: head, drop, filter, any). First, if we had the list of primes already, we can just drop the first 1000 and take the head of the rest to get the 1001th one:
result = head (drop 1000 primes)
We know that after dropping any number of elements form an infinite list, there will still remain a nonempty list to pick the head from, hence, the use of head is justified here. When you're unsure if there are more than 1000 primes, you should write something like:
result = case drop 1000 primes of
[] -> error "The ancient greeks were wrong! There are less than 1001 primes!"
(r:_) -> r
Now for the hard part. Not knowing how to proceed, we could write some pseudo code:
primes = 2 : {-an infinite list of numbers that are prime-}
We know for sure that 2 is the first prime, the base case, so to speak, thus we can write it down. The unfilled part gives us something to think about. For example, the list should start at some value that is greater 2 for obvious reason. Hence, refined:
primes = 2 : {- something like [3..] but only the ones that are prime -}
Now, this is the point where there emerges a pattern that one needs to learn to recognize. This is surely a list filtered by a predicate, namely prime-ness (it does not matter that we don't know yet how to check prime-ness, the logical structure is the important point. (And, we can be sure that a test for prime-ness is possible!)). This allows us to write more code:
primes = 2 : filter isPrime [3..]
See? We are almost done. In 3 steps, we have reduced a fairly complex problem in such a way that all that is left to write is a quite simple predicate.
Again, we can write in pseudocode:
isPrime n = {- false if any number in 2..n-1 divides n, otherwise true -}
and can refine that. Since this is almost haskell already, it is too easy:
isPrime n = not (any (divides n) [2..n-1])
divides n p = n `rem` p == 0
Note that we did not do optimization yet. For example we can construct the list to be filtered right away to contain only odd numbers, since we know that even ones are not prime. More important, we want to reduce the number of candidates we have to try in isPrime. And here, some mathematical knowledge is needed (the same would be true if you programmed this in C++ or Java, of course), that tells us that it suffices to check if the n we are testing is divisible by any prime number, and that we do not need to check divisibility by prime numbers whose square is greater than n. Fortunately, we have already defined the list of prime numbers and can pick the set of candidates from there! I leave this as exercise.
You'll learn later how to use the standard library and the syntactic sugar like sections, list comprehensions, etc. and you will gradually give up to write your own basic functions.
Even later, when you have to do something in an imperative programming language again, you'll find it very hard to live without infinte lists, higher order functions, immutable data etc.
This will be as hard as going back from C to Assembler.
Have fun!
It's ok to have an imperative mindset at first. With time you will get more used to things and start seeing the places where you can have more functional programs. Practice makes perfect.
As for working with mutable variables you can kind of keep them for now if you follow the rule of thumb of converting variables into function parameters and iteration into tail recursion.
Off the top of my head:
Typeclassopedia. The official v1 of the document is a pdf, but the author has moved his v2 efforts to the Haskell wiki.
What is a monad? This SO Q&A is the best reference I can find.
What is a Monad Transformer? Monad Transformers Step by Step.
Learn from masters: Good Haskell source to read and learn from.
More advanced topics such as GADTs. There's a video, which does a great job explaining it.
And last but not least, #haskell IRC channel. Nothing can even come close to talk to real people.
I think the big change from your code to more haskell like code is using higher order functions, pattern matching and laziness better. For example, you could write the nthPrime function like this (using a similar algorithm to what you did, again ignoring efficiency):
nthPrime n = primes !! (n - 1) where
primes = filter isPrime [2..]
isPrime p = isPrime' p [2..p - 1]
isPrime' p [] = True
isPrime' p (x:xs)
| (p `mod` x == 0) = False
| otherwise = isPrime' p xs
Eg nthPrime 4 returns 7. A few things to note:
The isPrime' function uses pattern matching to implement the function, rather than relying on if statements.
the primes value is an infinite list of all primes. Since haskell is lazy, this is perfectly acceptable.
filter is used rather than reimplemented that behaviour using recursion.
With more experience you will find you will write more idiomatic haskell code - it sortof happens automatically with experience. So don't worry about it, just keep practicing, and reading other people's code.
Another approach, just for variety! Strong use of laziness...
module Main where
nonmults :: Int -> Int -> [Int] -> [Int]
nonmults n next [] = []
nonmults n next l#(x:xs)
| x < next = x : nonmults n next xs
| x == next = nonmults n (next + n) xs
| otherwise = nonmults n (next + n) l
select_primes :: [Int] -> [Int]
select_primes [] = []
select_primes (x:xs) =
x : (select_primes $ nonmults x (x + x) xs)
main :: IO ()
main = do
let primes = select_primes [2 ..]
putStrLn $ show $ primes !! 10000 -- the first prime is index 0 ...
I want to try to answer your question without using ANY functional programming or math, not because I don't think you will understand it, but because your question is very common and maybe others will benefit from the mindset I will try to describe. I'll preface this by saying I an not a Haskell expert by any means, but I have gotten past the mental block you have described by realizing the following:
1. Haskell is simple
Haskell, and other functional languages that I'm not so familiar with, are certainly very different from your 'normal' languages, like C, Java, Python, etc. Unfortunately, the way our psyche works, humans prematurely conclude that if something is different, then A) they don't understand it, and B) it's more complicated than what they already know. If we look at Haskell very objectively, we will see that these two conjectures are totally false:
"But I don't understand it :("
Actually you do. Everything in Haskell and other functional languages is defined in terms of logic and patterns. If you can answer a question as simple as "If all Meeps are Moops, and all Moops are Moors, are all Meeps Moors?", then you could probably write the Haskell Prelude yourself. To further support this point, consider that Haskell lists are defined in Haskell terms, and are not special voodoo magic.
"But it's complicated"
It's actually the opposite. It's simplicity is so naked and bare that our brains have trouble figuring out what to do with it at first. Compared to other languages, Haskell actually has considerably fewer "features" and much less syntax. When you read through Haskell code, you'll notice that almost all the function definitions look the same stylistically. This is very different than say Java for example, which has constructs like Classes, Interfaces, for loops, try/catch blocks, anonymous functions, etc... each with their own syntax and idioms.
You mentioned $ and ., again, just remember they are defined just like any other Haskell function and don't necessarily ever need to be used. However, if you didn't have these available to you, over time, you would likely implement these functions yourself when you notice how convenient they can be.
2. There is no Haskell version of anything
This is actually a great thing, because in Haskell, we have the freedom to define things exactly how we want them. Most other languages provide building blocks that people string together into a program. Haskell leaves it up to you to first define what a building block is, before building with it.
Many beginners ask questions like "How do I do a For loop in Haskell?" and innocent people who are just trying to help will give an unfortunate answer, probably involving a helper function, and extra Int parameter, and tail recursing until you get to 0. Sure, this construct can compute something like a for loop, but in no way is it a for loop, it's not a replacement for a for loop, and in no way is it really even similar to a for loop if you consider the flow of execution. Similar is the State monad for simulating state. It can be used to accomplish similar things as static variables do in other languages, but in no way is it the same thing. Most people leave off the last tidbit about it not being the same when they answer these kinds of questions and I think that only confuses people more until they realize it on their own.
3. Haskell is a logic engine, not a programming language
This is probably least true point I'm trying to make, but hear me out. In imperative programming languages, we are concerned with making our machines do stuff, perform actions, change state, and so on. In Haskell, we try to define what things are, and how are they supposed to behave. We are usually not concerned with what something is doing at any particular time. This certainly has benefits and drawbacks, but that's just how it is. This is very different than what most people think of when you say "programming language".
So that's my take how how to leave an imperative mindset and move to a more functional mindset. Realizing how sensible Haskell is will help you not look at your own code funny anymore. Hopefully thinking about Haskell in these ways will help you become a more productive Haskeller.

Does the order of expression to check in boolean statement affect performance

If I have a Boolean expression to check
(A && B)
If A is found to be false will the language bother to check B? Does this vary from language to language?
The reason I ask is that I'm wondering if it's the case that B is checked even if A is false then wouldn't
if (A) {
if(B) {
} else {
// code x
}
} else {
// code x
}
be marginally quicker than
if (A && B) {
} else {
// code x
}
This depends on the language. Most languages will implement A && B as a short-circuit operator, meaning that if A evaluates to false, B will never be evaluated. There's a detailed list on Wikipedia.
Almost every language implements something called short-circuit evaluation, which means that yes, (A && B) will not evaluate B if A is false. This also takes effect if you write:
if (A || B) {
...
}
and A is true. This is worth remembering if B may take a long time to load, but generally it's not something to worry about.
As a bit of history, in my mind this is a bit of a sore part of LISP because code like this:
(if (and (= x 5) (my-expensive-query y)) "Yes" "No")
is not made of functions, but rather so-called "special forms" (that is, "and" could not be defun'd here).
This would depend 100% on how the language compiles said code. Anything is possible :)
Is there a specific language you're wondering about?
In short, no. A double branch involves various forms of branch prediction. If A and B are simple to evaluate, it may be faster to do if (A && B) in a non-short-circuit way than if (A) if (B). In addition, you've duplicated code in the second form. This is virtually always (exception to every rule .. I guess) bad and far worse than any gain.
Secondly, this is the kind of micro-optimization that you give to the language interpreter, JIT or compiler.
Many languages (including almost all curly-brace languages, like C/C++/Java/C#) offer short-circuit boolean evaluation. In those languages, if A is false then B won't be evaluated. You'll need to see (or ask) whether this is true for your specific language, or whether there's a way to do it (VB has AndAlso, for example).
If you find your language doesn't support it, you'll also need to consider whether the cost of evaluating B is worth having to maintain two identical pieces of code -- and the potential doubling in cache footprint (not to mention all the extra branching) that'd come from doing that duplication every time.
As others have said, it depends on the language and/or compiler. For me, I don't care how fast or slow it might be to short-circuit or not, the need to duplicate code is a deal-killer.
If A and B are actually calls that have side-effects (i.e. they do more than simply return a value suitable for comparison), then I would argue that those calls should be made into variable assignments that are then used in your comparison. It doesn't matter whether or not you always require those side-effects or only require them conditionally, the code will be more readable if you don't depend on whether or not short-circuit exists.
That last bit about readability is based on my feeling that reducing the need to refer to external documentation improves readability. Reading a book with a bunch of new words that require dictionary look-ups is much more challenging than reading that same book when you already have the necessary vocabulary. In this case, short-circuit is invisible, so anybody that needs to look it up won't even know that they need to look it up.

Is it possible to compare two Objective-C blocks by content?

float pi = 3.14;
float (^piSquare)(void) = ^(void){ return pi * pi; };
float (^piSquare2)(void) = ^(void){ return pi * pi; };
[piSquare isEqualTo: piSquare2]; // -> want it to behave like -isEqualToString...
To expand on Laurent's answer.
A Block is a combination of implementation and data. For two blocks to be equal, they would need to have both the exact same implementation and have captured the exact same data. Comparison, thus, requires comparing both the implementation and the data.
One might think comparing the implementation would be easy. It actually isn't because of the way the compiler's optimizer works.
While comparing simple data is fairly straightforward, blocks can capture objects-- including C++ objects (which might actually work someday)-- and comparison may or may not need to take that into account. A naive implementation would simply do a byte level comparison of the captured contents. However, one might also desire to test equality of objects using the object level comparators.
Then there is the issue of __block variables. A block, itself, doesn't actually have any metadata related to __block captured variables as it doesn't need it to fulfill the requirements of said variables. Thus, comparison couldn't compare __block values without significantly changing compiler codegen.
All of this is to say that, no, it isn't currently possible to compare blocks and to outline some of the reasons why. If you feel that this would be useful, file a bug via http://bugreport.apple.com/ and provide a use case.
Putting aside issues of compiler implementation and language design, what you're asking for is provably undecidable (unless you only care about detecting 100% identical programs). Deciding if two programs compute the same function is equivalent to solving the halting problem. This is a classic consequence of Rice's Theorem: Any "interesting" property of Turing machines is undecidable, where "interesting" just means that it's true for some machines and false for others.
Just for fun, here's the proof. Assume we can create a function to decide if two blocks are equivalent, called EQ(b1, b2). Now we'll use that function to solve the halting problem. We create a new function HALT(M, I) that tells us if Turing machine M will halt on input I like so:
BOOL HALT(M,I) {
return EQ(
^(int) {return 0;},
^(int) {M(I); return 0;}
);
}
If M(I) halts then the blocks are equivalent, so HALT(M,I) returns YES. If M(I) doesn't halt then the blocks are not equivalent, so HALT(M,I) returns NO. Note that we don't have to execute the blocks -- our hypothetical EQ function can compute their equivalence just by looking at them.
We have now solved the halting problem, which we know is not possible. Therefore, EQ cannot exist.
I don't think this is possible. Blocks can be roughly seen as advanced functions (with access to global or local variables). The same way you cannot compare functions' content, you cannot compare blocks' content.
All you can do is to compare their low-level implementation, but I doubt that the compiler will guarantee that two blocks with the same content share their implementation.

Is while (true) with break bad programming practice?

I often use this code pattern:
while(true) {
//do something
if(<some condition>) {
break;
}
}
Another programmer told me that this was bad practice and that I should replace it with the more standard:
while(!<some condition>) {
//do something
}
His reasoning was that you could "forget the break" too easily and have an endless loop. I told him that in the second example you could just as easily put in a condition which never returned true and so just as easily have an endless loop, so both are equally valid practices.
Further, I often prefer the former as it makes the code easier to read when you have multiple break points, i.e. multiple conditions which get out of the loop.
Can anyone enrichen this argument by adding evidence for one side or the other?
There is a discrepancy between the two examples. The first will execute the "do something" at least once every time even if the statement is never true. The second will only "do something" when the statement evaluates to true.
I think what you are looking for is a do-while loop. I 100% agree that while (true) is not a good idea because it makes it hard to maintain this code and the way you are escaping the loop is very goto esque which is considered bad practice.
Try:
do {
//do something
} while (!something);
Check your individual language documentation for the exact syntax. But look at this code, it basically does what is in the do, then checks the while portion to see if it should do it again.
To quote that noted developer of days gone by, Wordsworth:
...
In truth the prison, unto which we doom
Ourselves, no prison is; and hence for me,
In sundry moods, 'twas pastime to be bound
Within the Sonnet's scanty plot of ground;
Pleased if some souls (for such their needs must be)
Who have felt the weight of too much liberty,
Should find brief solace there, as I have found.
Wordsworth accepted the strict requirements of the sonnet as a liberating frame, rather than as a straightjacket. I'd suggest that the heart of "structured programming" is about giving up the freedom to build arbitrarily-complex flow graphs in favor of a liberating ease of understanding.
I freely agree that sometimes an early exit is the simplest way to express an action. However, my experience has been that when I force myself to use the simplest possible control structures (and really think about designing within those constraints), I most often find that the result is simpler, clearer code. The drawback with
while (true) {
action0;
if (test0) break;
action1;
}
is that it's easy to let action0 and action1 become larger and larger chunks of code, or to add "just one more" test-break-action sequence, until it becomes difficult to point to a specific line and answer the question, "What conditions do I know hold at this point?" So, without making rules for other programmers, I try to avoid the while (true) {...} idiom in my own code whenever possible.
When you can write your code in the form
while (condition) { ... }
or
while (!condition) { ... }
with no exits (break, continue, or goto) in the body, that form is preferred, because someone can read the code and understand the termination condition just by looking at the header. That's good.
But lots of loops don't fit this model, and the infinite loop with explicit exit(s) in the middle is an honorable model. (Loops with continue are usually harder to understand than loops with break.) If you want some evidence or authority to cite, look no further than Don Knuth's famous paper on Structured Programming with Goto Statements; you will find all the examples, arguments, and explanations you could want.
A minor point of idiom: writing while (true) { ... } brands you as an old Pascal programmer or perhaps these days a Java programmer. If you are writing in C or C++, the preferred idiom is
for (;;) { ... }
There's no good reason for this, but you should write it this way because this is the way C programmers expect to see it.
I prefer
while(!<some condition>) {
//do something
}
but I think it's more a matter of readability, rather than the potential to "forget the break." I think that forgetting the break is a rather weak argument, as that would be a bug and you'd find and fix it right away.
The argument I have against using a break to get out of an endless loop is that you're essentially using the break statement as a goto. I'm not religiously against using goto (if the language supports it, it's fair game), but I do try to replace it if there's a more readable alternative.
In the case of many break points I would replace them with
while( !<some condition> ||
!<some other condition> ||
!<something completely different> ) {
//do something
}
Consolidating all of the stop conditions this way makes it a lot easier to see what's going to end this loop. break statements could be sprinkled around, and that's anything but readable.
while (true) might make sense if you have many statements and you want to stop if any fail
while (true) {
if (!function1() ) return;
if (!function2() ) return;
if (!function3() ) return;
if (!function4() ) return;
}
is better than
while (!fail) {
if (!fail) {
fail = function1()
}
if (!fail) {
fail = function2()
}
........
}
Javier made an interesting comment on my earlier answer (the one quoting Wordsworth):
I think while(true){} is a more 'pure' construct than while(condition){}.
and I couldn't respond adequately in 300 characters (sorry!)
In my teaching and mentoring, I've informally defined "complexity" as "How much of the rest of the code I need to have in my head to be able to understand this single line or expression?" The more stuff I have to bear in mind, the more complex the code is. The more the code tells me explicitly, the less complex.
So, with the goal of reducing complexity, let me reply to Javier in terms of completeness and strength rather than purity.
I think of this code fragment:
while (c1) {
// p1
a1;
// p2
...
// pz
az;
}
as expressing two things simultaneously:
the (entire) body will be repeated as long as c1 remains true, and
at point 1, where a1 is performed, c1 is guaranteed to hold.
The difference is one of perspective; the first of these has to do with the outer, dynamic behavior of the entire loop in general, while the second is useful to understanding the inner, static guarantee which I can count on while thinking about a1 in particular. Of course the net effect of a1 may invalidate c1, requiring that I think harder about what I can count on at point 2, etc.
Let's put a specific (tiny) example in place to think about the condition and first action:
while (index < length(someString)) {
// p1
char c = someString.charAt(index++);
// p2
...
}
The "outer" issue is that the loop is clearly doing something within someString that can only be done as long as index is positioned in the someString. This sets up an expectation that we'll be modifying either index or someString within the body (at a location and manner not known until I examine the body) so that termination eventually occurs. That gives me both context and expectation for thinking about the body.
The "inner" issue is that we're guaranteed that the action following point 1 will be legal, so while reading the code at point 2 I can think about what is being done with a char value I know has been legally obtained. (We can't even evaluate the condition if someString is a null ref, but I'm also assuming we've guarded against that in the context around this example!)
In contrast, a loop of the form:
while (true) {
// p1
a1;
// p2
...
}
lets me down on both issues. At the outer level, I am left wondering whether this means that I really should expect this loop to cycle forever (e.g. the main event dispatch loop of an operating system), or whether there's something else going on. This gives me neither an explicit context for reading the body, nor an expectation of what constitutes progress toward (uncertain) termination.
At the inner level, I have absolutely no explicit guarantee about any circumstances that may hold at point 1. The condition true, which is of course true everywhere, is the weakest possible statement about what we can know at any point in the program. Understanding the preconditions of an action are very valuable information when trying to think about what the action accomplishes!
So, I suggest that the while (true) ... idiom is much more incomplete and weak, and therefore more complex, than while (c1) ... according to the logic I've described above.
The problem is that not every algorithm sticks to the "while(cond){action}" model.
The general loop model is like this :
loop_prepare
loop:
action_A
if(cond) exit_loop
action_B
goto loop
after_loop_code
When there is no action_A you can replace it by :
loop_prepare
while(cond)
action_B
after_loop_code
When there is no action_B you can replace it by :
loop_prepare
do action_A
while(cond)
after_loop_code
In the general case, action_A will be executed n times and action_B will be executed (n-1) times.
A real life example is : print all the elements of a table separated by commas.
We want all the n elements with (n-1) commas.
You always can do some tricks to stick to the while-loop model, but this will always repeat code or check twice the same condition (for every loops) or add a new variable. So you will always be less efficient and less readable than the while-true-break loop model.
Example of (bad) "trick" : add variable and condition
loop_prepare
b=true // one more local variable : more complex code
while(b): // one more condition on every loop : less efficient
action_A
if(cond) b=false // the real condition is here
else action_B
after_loop_code
Example of (bad) "trick" : repeat the code. The repeated code must not be forgotten while modifying one of the two sections.
loop_prepare
action_A
while(cond):
action_B
action_A
after_loop_code
Note : in the last example, the programmer can obfuscate (willingly or not) the code by mixing the "loop_prepare" with the first "action_A", and action_B with the second action_A. So he can have the feeling he is not doing this.
The first is OK if there are many ways to break from the loop, or if the break condition cannot be expressed easily at the top of the loop (for example, the content of the loop needs to run halfway but the other half must not run, on the last iteration).
But if you can avoid it, you should, because programming should be about writing very complex things in the most obvious way possible, while also implementing features correctly and performantly. That's why your friend is, in the general case, correct. Your friend's way of writing loop constructs is much more obvious (assuming the conditions described in the preceding paragraph do not obtain).
There's a substantially identical question already in SO at Is WHILE TRUE…BREAK…END WHILE a good design?. #Glomek answered (in an underrated post):
Sometimes it's very good design. See Structured Programing With Goto Statements by Donald Knuth for some examples. I use this basic idea often for loops that run "n and a half times," especially read/process loops. However, I generally try to have only one break statement. This makes it easier to reason about the state of the program after the loop terminates.
Somewhat later, I responded with the related, and also woefully underrated, comment (in part because I didn't notice Glomek's the first time round, I think):
One fascinating article is Knuth's "Structured Programming with go to Statements" from 1974 (available in his book 'Literate Programming', and probably elsewhere too). It discusses, amongst other things, controlled ways of breaking out of loops, and (not using the term) the loop-and-a-half statement.
Ada also provides looping constructs, including
loopname:
loop
...
exit loopname when ...condition...;
...
end loop loopname;
The original question's code is similar to this in intent.
One difference between the referenced SO item and this is the 'final break'; that is a single-shot loop which uses break to exit the loop early. There have been questions on whether that is a good style too - I don't have the cross-reference at hand.
Sometime you need infinite loop, for example listening on port or waiting for connection.
So while(true)... should not categorized as good or bad, let situation decide what to use
It depends on what you’re trying to do, but in general I prefer putting the conditional in the while.
It’s simpler, since you don't need another test in the code.
It’s easier to read, since you don’t have to go hunting for a break inside the loop.
You’re reinventing the wheel. The whole point of while is to do something as long as a test is true. Why subvert that by putting the break condition somewhere else?
I’d use a while(true) loop if I was writing a daemon or other process that should run until it gets killed.
If there's one (and only one) non-exceptional break condition, putting that condition directly into the control-flow construct (the while) is preferable. Seeing while(true) { ... } makes me as a code-reader think that there's no simple way to enumerate the break conditions and makes me think "look carefully at this and think about carefully about the break conditions (what is set before them in the current loop and what might have been set in the previous loop)"
In short, I'm with your colleague in the simplest case, but while(true){ ... } is not uncommon.
The perfect consultant's answer: it depends. Most cases, the right thing to do is either use a while loop
while (condition is true ) {
// do something
}
or a "repeat until" which is done in a C-like language with
do {
// do something
} while ( condition is true);
If either of these cases works, use them.
Sometimes, like in the inner loop of a server, you really mean that a program should keep going until something external interrupts it. (Consider, eg, an httpd daemon -- it isn't going to stop unless it crashes or it's stopped by a shutdown.)
THEN AND ONLY THEN use a while(1):
while(1) {
accept connection
fork child process
}
Final case is the rare occasion where you want to do some part of the function before terminating. In that case, use:
while(1) { // or for(;;)
// do some stuff
if (condition met) break;
// otherwise do more stuff.
}
I think the benefit of using "while(true)" is probably to let multiple exit condition easier to write especially if these exit condition has to appear in different location within the code block. However, for me, it could be chaotic when I have to dry-run the code to see how the code interacts.
Personally I will try to avoid while(true). The reason is that whenever I look back at the code written previously, I usually find that I need to figure out when it runs/terminates more than what it actually does. Therefore, having to locate the "breaks" first is a bit troublesome for me.
If there is a need for multiple exit condition, I tend to refactor the condition determining logic into a separate function so that the loop block looks clean and easier to understand.
No, that's not bad since you may not always know the exit condition when you setup the loop or may have multiple exit conditions. However it does require more care to prevent an infinite loop.
He is probably correct.
Functionally the two can be identical.
However, for readability and understanding program flow, the while(condition) is better. The break smacks more of a goto of sorts. The while (condition) is very clear on the conditions which continue the loop, etc. That doesn't mean break is wrong, just can be less readable.
A few advantages of using the latter construct that come to my mind:
it's easier to understand what the loop is doing without looking for breaks in the loop's code.
if you don't use other breaks in the loop code, there's only one exit point in your loop and that's the while() condition.
generally ends up being less code, which adds to readability.
I prefer the while(!) approach because it more clearly and immediately conveys the intent of the loop.
There has been much talk about readability here and its very well constructed but as with all loops that are not fixed in size (ie. do while and while) you run at a risk.
His reasoning was that you could "forget the break" too easily and have an endless loop.
Within a while loop you are in fact asking for a process that runs indefinitely unless something happens, and if that something does not happen within a certain parameter, you will get exactly what you wanted... an endless loop.
What your friend recommend is different from what you did. Your own code is more akin to
do{
// do something
}while(!<some condition>);
which always run the loop at least once, regardless of the condition.
But there are times breaks are perfectly okay, as mentioned by others. In response to your friend's worry of "forget the break", I often write in the following form:
while(true){
// do something
if(<some condition>) break;
// continue do something
}
By good indentation, the break point is clear to first time reader of the code, look as structural as codes which break at the beginning or bottom of a loop.
It's not so much the while(true) part that's bad, but the fact that you have to break or goto out of it that is the problem. break and goto are not really acceptable methods of flow control.
I also don't really see the point. Even in something that loops through the entire duration of a program, you can at least have like a boolean called Quit or something that you set to true to get out of the loop properly in a loop like while(!Quit)... Not just calling break at some arbitrary point and jumping out,
using loops like
while(1) { do stuff }
is necessary in some situations. If you do any embedded systems programming (think microcontrollers like PICs, MSP430, and DSP programming) then almost all your code will be in a while(1) loop. When coding for DSPs sometimes you just need a while(1){} and the rest of the code is an interrupt service routine (ISR).
If you loop over an external condition (not being changed inside the loop), you use while(t), where t is the condition. However, if the loop stops when the condition changes inside the loop, it's more convenient to have the exit point explicitly marked with break, instead of waiting for it to happen on the next iteration of the loop:
while (true) {
...
a := a + 1;
if (a > 10) break; // right here!
...
}
As was already mentioned in a few other answers, the less code you have to keep in your head while reading a particular line, the better.