I have an algorithm that works perfectly, but it uses recursion. I know there are patterns for just about everything, but I could not find one for this case.
I just need some simple examples that show how to modify an algorithm, specifically the part where a method or function calls itself. I've seen iterative algorithms that do the equivalent work with a while loop, so there must be a simple checklist to follow to convert a recursive algorithm into an iterative one.
You can definitely model recursion with iteration and a custom call stack. Since recursion is nothing but executing the same instructions in a new environment, you can model your own environment using a simple stack structure and wrap your algorithm in a loop: push your current mini-environment wherever the original code would have made a recursive call, and pop one at the top of each loop iteration to resume that pending piece of work.
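A minimal sketch in Python (Node is an assumed type with value, left, and right attributes, not anything from your code) showing the same computation written recursively and then with an explicit stack:

    # Summing the values in a binary tree; Node is a hypothetical type
    # with .value, .left and .right attributes.
    def sum_tree_recursive(node):
        if node is None:
            return 0
        return (node.value
                + sum_tree_recursive(node.left)
                + sum_tree_recursive(node.right))

    def sum_tree_iterative(root):
        total = 0
        stack = [root]              # the hand-rolled "call stack"
        while stack:
            node = stack.pop()      # pop one pending mini-environment
            if node is None:
                continue
            total += node.value
            stack.append(node.left)   # push the would-be recursive calls
            stack.append(node.right)  # instead of making them
        return total

The iterative version visits the nodes in a different order than the recursive one, but for an order-insensitive computation like a sum that makes no difference.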
Tail recursion, where the recursive call happens at the very end of the function, is pretty trivial to make non-recursive with a loop. Some compilers even do that for you automatically.
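For example (a sketch in Python; gcd is tail-recursive because the recursive call's result is returned unchanged):

    def gcd_recursive(a, b):
        if b == 0:
            return a
        return gcd_recursive(b, a % b)   # tail call: nothing runs after it

    def gcd_iterative(a, b):
        while b != 0:           # the tail call becomes reassignment of the
            a, b = b, a % b     # parameters inside a loop
        return a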
Converting an arbitrary recursive function into an iterative one in a systematic way isn't so simple: in all likelihood you'd end up creating your own call stack, which largely defeats the purpose of having a non-recursive algorithm anyway.
Also see: Can every recursion be converted into iteration?
BTW, if you are using GCC or LLVM, then compiling your non-debug code with -O2 or -O3 will perform tail-recursion elimination for you. (In case you don't know, tail recursion is when the recursive call is the last thing in the function and its result is simply returned, i.e. it is not part of a larger expression. See http://en.wikipedia.org/wiki/Tail_call.)
So, if the recursive write-up is clearer to read, it's probably better to stick with that.
Related
I want to check the definition of «iterative» in expansion regions in activity diagrams. For me personally this was never in question, because I understand it as letting me do a For loop, e.g.,
For i = 1 to 10
    Do-Something // so it does it 10 times
End For
However, while I was presenting my UML diagram to an audience, an engineering team leader (not a UML maven) objected to the term 'iterative', because he understood 'iterative' to mean an 'iterative process', one in which each step improves on the result of the previous one. I am also aware of this definition, but I assume the UML definition is not that, and rather means a simple For loop.
Please confirm that the UML definition of «iterative» and iteration means a simple For loop, or correct me if it does not.
No, it has a different meaning. UML 2.5 states on p. 480:
The mode of an ExpansionRegion controls how its expansion executions proceed.
If the value is iterative, the expansion executions must occur in an iterative sequence, with one completing before another can begin. The first expansion execution begins immediately when the ExpansionRegion starts executing, with subsequent executions starting when the previous execution is completed. If the input collections are ordered, then the expansion executions are sequenced in the order induced by the input collection. Otherwise, the order of the expansion executions is not defined.
Other values for this keyword are parallel and stream. As you can guess, the behavior defined in a parallel region may be executed in parallel. stream is a bit more complicated; you can read about it on that page of the UML spec.
The for-loop itself comes from the input collection you pass to the region; that collection can be processed in any of the above modes.
tl;dr
So rather than denoting a for loop, the keyword «iterative» on the region says that its behavior may not be handled in parallel.
Ahhh, semantics...
First a disclaimer - I am not a native English speaker. Yet I believe both my level of English and my IT experience are sufficient to answer this question.
Let's have a look at the dictionary definition of iterative first:
iterative adjective
/ˈɪtərətɪv/
/ˈɪtəreɪtɪv/, /ˈɪtərətɪv/
(of a process) that involves repeating a process or set of instructions again and again, *each time applying it to the result of the previous stage*
We used an iterative process of refinement and modification.
an iterative procedure/method/approach
The italics are mine.
Of course this is a pure word definition, not in the context of software development.
In real life a process can quite easily be repetitive without really being iterative. Imagine an assembly line in a mass-production factory. At one of the stations a particular screw or set of screws is applied to join two or more elements. On every run, the same type and number of screws is applied to an identical set of elements. There is a virtually endless stream of similar part sets, each consisting of the same types of parts as the previous one and requiring the same kind of connection. From that station's perspective, joining the elements is a repetitive process, but it is not iterative: each join is applied to a fresh set of elements, never to those already joined.
Code is somewhat different, though. When you apply a loop, you almost always have some resulting set impacted by it, and one can argue that every loop step changes that resulting set further, meaning the next step is applied to the result of the previous one. From this perspective almost every loop is iterative.
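For instance, a running sum is iterative in that dictionary sense, because each step applies the operation to the result of the previous step (a trivial Python illustration):

    total = 0
    for x in [3, 1, 4, 1, 5]:
        total = total + x   # each step operates on the previous result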
On the other hand, you can have a loop like this:
loop
    wait 10
while buffer is empty
read buffer
You can clearly say it is a loop, yet nothing is being changed; all the code does is wait for a buffer to fill. So it is not iterative.
For UML specifically though the precise meaning is included in qwerty_so's answer so I will not repeat it here.
I'm working on a streaming rules engine, and some of my customers have a few hundred rules they'd like to evaluate on every event that arrives at the system. The rules are pure (i.e. non-side-effecting) Boolean expressions, and they can be nested arbitrarily deeply.
Customers are creating, updating and deleting rules at runtime, and I need to detect and adapt to the population of rules dynamically. At the moment, the expression evaluation uses an interpreter over the internal AST, and I haven't started thinking about codegen yet.
As always, some of the predicates in the tree are MUCH cheaper to evaluate than others, and I've been looking for an algorithm or data structure that makes it easier to find the predicates that are cheap, and that are validly interpretable as controlling the entire expression. My mental headline for this pattern is "ANDs all the way to the root", i.e. any predicate for which all ancestors are ANDs can be interpreted as controlling.
Despite several days of literature search, reading about ROBDDs, CNF, DNF, etc., I haven't been able to close the loop from what might be common practice in the industry to my particular use case. One thing I've found that seems related is Analysis and optimization for boolean expression indexing
but it's not clear how I could apply it without implementing the BE-Tree data structure myself, as there doesn't seem to be an open source implementation.
I keep half-jokingly mentioning to my team that we're going to need a SAT solver one of these days. 😅 I guess it would probably suffice to write a recursive algorithm that traverses the tree and keeps track of whether every ancestor is an AND or an OR, but I keep getting the "surely this is a solved problem" feeling. :)
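In case it helps to make the idea concrete, here is roughly what I have in mind, as a minimal Python sketch over a hypothetical And/Or/Predicate AST (our real AST is more involved):

    class And:
        def __init__(self, *children): self.children = children

    class Or:
        def __init__(self, *children): self.children = children

    class Predicate:
        def __init__(self, name, cost): self.name, self.cost = name, cost

    def controlling_predicates(node):
        """Yield predicates whose ancestors are all ANDs, i.e. predicates
        whose False short-circuits the entire expression."""
        if isinstance(node, Predicate):
            yield node
        elif isinstance(node, And):
            for child in node.children:
                yield from controlling_predicates(child)
        # An Or (or anything else) breaks the all-AND path, so don't descend.

    # Evaluate the cheap controlling predicates first:
    # for p in sorted(controlling_predicates(root), key=lambda p: p.cost): ...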
Edit: After talking to a couple of friends, I think I may have a sketch of a solution!
Transform the expressions into Conjunctive Normal Form, in which, by definition, every node is in a valid short-circuit position.
Use the Tseitin algorithm to try to avoid exponential blowups in expression size as a result of the CNF transform
For each AND in the tree, sort its children in ascending order of cost (i.e. cheapest to the left); a sketch of this step follows below.
???
Profit!^Weval as usual :)
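A sketch of step 3, reusing the hypothetical classes from the sketch above: recursively sort every composite node's children by an estimated cost, then rely on short-circuit evaluation.

    def estimated_cost(node):
        # Crude assumed cost model: a composite costs at most the sum
        # of its children's costs.
        if isinstance(node, Predicate):
            return node.cost
        return sum(estimated_cost(c) for c in node.children)

    def sort_by_cost(node):
        """Reorder AND/OR children in place so the cheapest come first."""
        if isinstance(node, (And, Or)):
            for child in node.children:
                sort_by_cost(child)
            node.children = sorted(node.children, key=estimated_cost)

    def evaluate(node, event):
        # Predicate.eval(event) is an assumed per-predicate hook.
        if isinstance(node, Predicate):
            return node.eval(event)
        if isinstance(node, And):
            return all(evaluate(c, event) for c in node.children)  # short-circuits
        return any(evaluate(c, event) for c in node.children)      # Or short-circuits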
You should seriously consider compiling the rules (and the predicates). An interpreter is 10-50x slower than machine code for the same thing. Compilation is a good idea if the rule set doesn't change very often. It's even a good idea if the rules can change dynamically, because in practice they still don't change very fast, although now your rule compiler has to be online. Eh, it just makes for a bigger application program, and memory isn't much of an issue anymore.
Evaluating Boolean expressions with individual machine instructions is even better. Any complex Boolean equation can be compiled into a branchless sequence of individual machine instructions over the leaf values. No branches, no cache misses; stuff runs pretty damn fast. However, if you have expensive predicates, you probably want to compile code with branches that skip the subtrees which don't affect the result of the expression whenever those subtrees contain expensive predicates.
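Even without generating machine code, compiling each rule once instead of re-walking the AST per event captures a large part of that win. A hedged Python sketch, assuming the kind of And/Or/Predicate AST described in the question (all names are made up, and Python's and/or still branch, so this shows the compile-once idea rather than the branchless form):

    def to_source(node):
        """Translate the AST into one Python expression string."""
        if isinstance(node, Predicate):
            # predicates is an assumed table mapping names to callables.
            return f"predicates[{node.name!r}](event)"
        op = " and " if isinstance(node, And) else " or "
        return "(" + op.join(to_source(c) for c in node.children) + ")"

    def compile_rule(root):
        code = compile(to_source(root), "<rule>", "eval")   # compiled once
        def run(event, predicates):
            return eval(code, {"event": event, "predicates": predicates})
        return run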
Within reason, you can generate any equivalent form (I'd run screaming into the night at the idea of using CNF, because it always blows up on you). What you really want is the shortest boolean equation (deepest expression tree) equivalent to what the clients provided, because that will take the fewest machine instructions to execute. This may sound crazy, but you might consider exhaustive-search code generation, i.e., literally trying every combination that has a chance of working, especially if the number of operators in the equation is relatively small. The VLSI world has been working hard on various optimizations when synthesizing boolean equations into gates. You should look into the Espresso heuristic boolean logic optimizer (https://en.wikipedia.org/wiki/Espresso_heuristic_logic_minimizer).
One thing that might drive your expression evaluation is, literally, the cost of the predicates. If I have the formula A AND B, and I know that A is expensive to evaluate and usually returns true, then clearly I want to evaluate B AND A instead.
You should also consider common subexpression elimination, so that any shared subterm is computed only once. This is especially important when one has expensive predicates; you never want to evaluate the same expensive predicate twice.
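A sketch of that, again over the question's hypothetical AST: give each evaluation a per-event cache keyed by predicate name, so a subterm shared between branches is computed at most once.

    def evaluate_cached(node, event, cache):
        if isinstance(node, Predicate):
            if node.name not in cache:
                cache[node.name] = node.eval(event)   # assumed eval hook
            return cache[node.name]
        if isinstance(node, And):
            return all(evaluate_cached(c, event, cache) for c in node.children)
        return any(evaluate_cached(c, event, cache) for c in node.children)

    # Use one fresh cache per incoming event:
    # result = evaluate_cached(root, event, cache={})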
I implemented these tricks in a PLC emulator (PLCs are basically machines that evaluate buckets [like hundreds of thousands] of boolean equations telling factory actuators when to move), using x86 machine instructions for AND/OR/NOT, for Rockwell Automation some 20 years ago. It outran Rockwell's "premier" PLC, which had custom hardware but was essentially an interpreter.
You might also consider incremental evaluation of the equations. The basic idea is not to re-evaluate all the equations over and over, but rather to re-evaluate only those equations whose input changed. Details are too long to include here, but a patent I did back then explains how to do it. See https://patents.google.com/patent/US5623401A/en?inventor=Ira+D+Baxter&oq=Ira+D+Baxter
If, within a for loop (running, say, n times), I make a call to a library function which I know runs another loop in the back end, does that affect my overall complexity? Or does it remain O(n)?
It does affect your overall complexity. Imagine someone calling the function you're writing from within another loop - you can't ignore a function's inherent runtime just because the call looks like a single statement.
Now, exactly HOW it affects your complexity depends on what you're doing with it, and what it does, but you certainly can't ignore it.
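A concrete illustration (Python, hypothetical names): if the library call is itself O(m), the loop costs O(n*m), not O(n).

    def library_function(items):          # imagine this is the library call,
        return sum(x * x for x in items)  # internally O(m)

    def process(events, items):
        results = []
        for e in events:                  # n iterations...
            results.append(e + library_function(items))  # ...each O(m)
        return results                    # total: O(n * m)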
When you have a function that accepts an array as an argument and calls another function with that array, which calls another function with it, and so forth, the stack will contain many copies of the pointer to that array. I just thought of an interesting way to alleviate this problem, but I'm wondering whether it is worth implementing.
Does anyone have any idea how often stacks contain duplicate pointers in practice?
EDIT
Just to clarify, I am not optimizing a given program but, rather, am considering writing a new kind of optimization pass for my VM. My benchmarks have indicated that my current solution causes up to 70% of the total running time to be spent in stack manipulations. The optimization pass I am thinking of would generate code at compile time that would perform the same actions but pointers would (potentially) be duplicated on the stack less often. I am interested in any prior studies that have measured the number of duplicates on the stack because this would help me to quantify my optimization's potential. For example, if it is known that real programs do not push pointers already on the stack in practice then my optimization is worthless.
Moreover, these stack manipulations are due to the code generated by my VM making sure locally-held pointers are visible to the garbage collector and not due only to function parameters as both answerers have currently assumed. And they are actually operations on a shadow stack rather than the main stack.
First of all, the answer will depend on your application.
Secondly, even with high duplication, I doubt there is much sense in implementing the mechanism you describe, or even that it is possible in the general case. If you call a method and pass it parameters, you must do so one way or another.
There may be advantages to doing it in some specific way - for example, there are several function calling conventions, and many C/C++ compilers (e.g. gcc) let you choose between passing parameters on the stack or via registers. In certain cases the latter may be faster - you can benchmark whether it helps your application.
But in the general case, the cost of detecting duplicated values on the stack and "reusing" them would probably far exceed any gains from having a smaller stack. The code for pushing and popping values is really simple (just a few CPU instructions in the optimized case); code for finding and reusing duplicates, hardly so. You would also have to somehow store the information about which values are already on the stack and how to find them - a nontrivial data structure. Except for some really weird cases, I don't think this structure would be smaller than the copied data itself.
What you could do is rewrite your algorithm so that some function calls are eliminated. For example, if your function's result depends only on its input arguments, you could cache or memoize the results, thus avoiding repeated calls with the same values. This may indeed bring some gains, though it's usually a memory vs. CPU time tradeoff; getting an advantage in both memory and CPU time is rarely possible. Also, rewriting your algorithm is not really "avoiding duplication of data on the stack".
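For example, in Python a pure function can be memoized with functools.lru_cache, a straightforward instance of that memory-for-time tradeoff:

    from functools import lru_cache

    @lru_cache(maxsize=None)       # results cached by argument value
    def expensive(n):
        # stand-in for a costly pure computation
        return sum(i * i for i in range(n))

    expensive(10_000)   # computed once
    expensive(10_000)   # served from the cache, no recomputation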
Anyway, for the original question, I think the idea is not viable and you should look for optimizations elsewhere.
PS: Your use case may somewhat resemble tail-call optimization, so perhaps that's a direction worth looking at - but if you implement it yourself, I would also consider that to fall into the "change your algorithm" category. Maybe changing from a recursive algorithm to an iterative one could also help.
Can I suggest getting some exposure to actual performance tuning?
(Here's my canonical example.)
Between the time a program starts and the time it ends, of the cycles it uses, it obviously uses 100% of those cycles.
If it goes in and out of functions, and passes pointers to an array, but does nothing else, then it's no surprise that a high percentage of its time goes into function entry and exit and passing arguments.
If a program P is written to do task T, there are a multitude of other programs P' which could also do task T. Some of them take fewer cycles than all the others, and those are the optimal ones.
The way the optimal ones differ from the non-optimal ones is that the non-optimal ones are doing things that can be done without.
So, to optimize any program, find out what cycles are being spent that don't have to be, and get rid of those activities. That link shows in great detail how I do it.
Trying to pass fewer arguments to functions might or might not be necessary, depending on what your diagnostics tell you.
This question concerns optimization. Suppose I need the length of an array A at two places in my code. Should I call the function a.length() in both places, or is it faster to assign the value of a.length() to a local variable and use that at the two places?
By "faster" I mean in terms of running time. Moreover, i am talking asymptotically.
The asymptotic complexity of calling the function twice is the same - any constant number of calls to the same (pure) function on the same arguments has the same asymptotic complexity as a single call to that function, since you can just roll the constant number of calls into the big-O's hidden constant.
As for what will be faster, there's no guarantee which one will be faster. It depends on the language and compiler. I'd suggest just writing it both ways and timing the result to see if there's an appreciable difference. That said, if you are writing something that is so performance-critical that you can't afford to call .length() twice, you may need to reconsider your approach in general to see if there's a better global solution to the problem. Microoptimizations are rarely worth the effort unless you have a compelling reason to believe that your program is markedly slower in the unoptimized version.
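For illustration, here are the two variants in Python (where the analogue of .length() is len()); timing them, e.g. with the timeit module, is the only answer that matters:

    def last_and_len_twice(a):
        return a[len(a) - 1], len(a)    # call len() at both places

    def last_and_len_hoisted(a):
        n = len(a)                      # hoist into a local variable
        return a[n - 1], n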
If you have to ask the question, you're not at a point where it matters yet. If you were, you'd already have code that you've profiled, and you could just try it and see. This kind of thing depends heavily on your language and compiler, and the only results that matter are the ones you see.
Don't worry about micro-optimizations until you find you need to shave cycles, and even then the algorithm is the first thing to check.
What language? In many languages, such calls are optimized away (either at compile time or by a JIT compiler) into direct access to the length field of the array object.