How can be bytecode used for optimizing the execution time of dynamic languages? - optimization

I am interested in some optimization methods or general bytecode designs, which might help speed up execution using VM in comparison to interpretation of an AST.

The main win in AST interpretation vs. bytecode is operation dispatch cost, for highly optimised interpreters this starts to become a real problem. "Dispatch" is the term used to describe the overhead required to start executing an operation (such as arithmetic, property access, etc).
A fairly normal AST based interpreter would look something like this:
class ASTNode {
virtual double execute() = 0;
}
class NumberNode {
virtual double execute() { return m_value; }
double m_value;
}
class AddNode {
virtual double execute() { return left->execute() + right->execute(); }
}
So executing the code for something as simple as 1+1 requires 3 virtual calls. Virtual calls a very expensive (in the grand scheme of things) due to the multiple indirections to make the call, and the general cost of making a call in the first place.
In a bytecode interpreter you have you a different dispatch model -- rather than virtual calls you have an execution loop, akin to:
while (1) {
switch (op.type) {
case op_add:
// Efficient interpreters use "registers" rather than
// a stack these days, but the example code would be more
// complicated
push(pop() + pop());
continue;
case op_end:
return pop();
}
}
This still has a reasonably expensive dispatch cost vs native code, but is much faster than virtual dispatch. You can further improve perf using a gcc extension called "computed goto" which allows you to remove the switch dispatch, reducing total dispatch cost to basically a single indirect branch.
In addition to improving dispatch costs bytecode based interpreters have a number of additional advantages over AST interpreters, mostly due to the ability of the bytecode to "directly" jump to other locations as a real machine would, for example imagine a snippet of code like this:
while (1) {
...statements...
if (a)
break;
else
continue;
}
To implement this correctly everytime a statement is executed you would need to indicate whether execution is meant to stay in the loop or stop, so the execution loop becomes something like:
while (condition->execute() == true) {
for (i = 0; i < statements->length(); i++) {
result = statements[i]->execute();
if (result.type == BREAK)
break;
if (result.type == CONTINUE)
i = 0;
}
}
As you add more forms of flow control this signalling becomes more and more expensive. Once you add exceptions (eg. flow control that can happen everywhere) you start needing to check for these things in the middle of even basic arithmetic, leading to ever increasing overhead. If you want to see this in the real world I encourage you to look at the ECMAScript spec, where they describe the execution model in terms of an AST interpreter.
In a bytecode interpreter these problems basically go away, as the bytecode is able to directly express control flow rather than indirectly through signalling, eg. continue is simply converted into a jump instruction, and you only get that cost if it's actually hit.
Finally an AST interpreter by definition is recursive, and so has to be prevented from overflowing the system stack, which puts very heavy restrictions on how much you can recurse in your code, something like:
1+(1+(1+(1+(1+(1+(1+(1+1)))))))
Has 8 levels of recursion (at least) in the interpreter -- this can be a very significant cost; older versions of Safari (pre-SquirrelFish) used an AST interpreter, and for this reason JS was allowed only a couple of hundred levels of recursion vs 1000's allowed in modern browsers.

Perhaps you could look at the various methods which the llvm "opt" tool provides. Those are bytecode-to-bytecode optimisations, and the tool itself will provide analysis on the benefits of applying a particular optimisation.

Related

How to disable all optimization when using COSMIC compiler?

I am using the COSMIC compiler in the STVD ide and even though optimization is turned of with -no (documentation says "-no: do not use optimizer") some lines of code get removed and cannot have a breakpoint placed upon them, nor are they to be found in the disassembly.
I tried to set -oc (leave removed instructions as comments) which resulted in not even showing the removed lines as comment.
bool foo(void)
{
uint8_t val;
if (globalvar > 5)
val = 0;
for (val = 0; val < 8; val++)
{
some code...
}
return true;
}
I do know it seems idiotic to set val to 0 prior to the for loop but lets just assume it is for some reason necessary. When I set no optimization I expect it to be not optimized but insted the val = 0; gets removed without any traces.
I am not looking for a workaround like declaring val volatile whitch solves the problem. I am rather looking for a way to prevent the optimization or at least understand/know what changes are made to my code when compiling.
It is not clear from the manual, but it seems that the -no option prevents assembly level optimisation. It seems possible that the code generator stage that runs before assembly optimisation may perform higher level optimisation such as redundant code removal.
From the manual:
-cp
disable the constant propagation optimization. By default,
when a variable is assigned with a constant, any subsequent access to that variable is replaced by the constant
itself until the variable is modified or a flow break is
encountered (function call, loop, label ...).
It seems that it is this constant propagation feature that you must explicitly disable.
It is unusual perhaps, but it appears that this compiler optimises by default, and distinguishes between compiler optimisations and assembler optimisations (performed as the compilation stage), and them makes you switch off each individual optimisation separately.
To avoid this in the code, rather than switching it off globally, you could initialise val to a non-zero value in this case:
int val = -1 ;
Then the later assignment to zero will require explicit code. This has the advantage over volatile perhaps in that it will not block optimisations when you do enable them.
I believe that this behaviour is allowed by the C language specification.
You are effectively writing the same value either once or twice to the same variable on successive lines of code. The compiler could assign this value to either a processor register or a memory location as it sees fit and knows that the value following the initial assignment in the for loop is the same as the value assigned when the if clause is actioned. As a result the language spec allows the compiler to throw the redundant code away.
The way to force the compiler to perform all read and write accesses to the variable is to use the volatile keyword. That is what it is for.

Mono.Defer() vs Mono.create() vs Mono.just()?

Could someone help me to understand the difference between:
Mono.defer()
Mono.create()
Mono.just()
How to use it properly?
Mono.just(value) is the most primitive - once you have a value you can wrap it into a Mono and subscribers down the line will get it.
Mono.defer(monoSupplier) lets you provide the whole expression that supplies the resulting Mono instance. The evaluation of this expression is deferred until somebody subscribes. Inside of this expression you can additionally use control structures like Mono.error(throwable) to signal an error condition (you cannot do this with Mono.just).
Mono.create(monoSinkConsumer) is the most advanced method that gives you the full control over the emitted values. Instead of the need to return Mono instance from the callback (as in Mono.defer), you get control over the MonoSink<T> that lets you emit values through MonoSink.success(), MonoSink.success(value), MonoSink.error(throwable) methods.
Reactor documentation contains a few good examples of possible Mono.create use cases: link to doc.
The general advice is to use the least powerful abstraction to do the job: Mono.just -> Mono.defer -> Mono.create.
Although in general I agree with (and praise) #IlyaZinkovich's answer, I would be careful with the advice
The general advice is to use the least powerful abstraction to do the job: Mono.just -> Mono.defer -> Mono.create.
In the reactive approach, especially if we are beginners, it's very easy to overlook which the "least powerful abstraction" actually is. I am not saying anything else than #IlyaZinkovich, just depicting one detailed aspect.
Here is one specific use case where the more powerful abstraction Mono.defer() is preferable over Mono.just() but which might not be visible at the first glance.
See also:
https://stackoverflow.com/a/54412779/2886891
https://stackoverflow.com/a/57877616/2886891
We use switchIfEmpty() as a subscription-time branching:
// First ask provider1
provider1.provide1(someData)
// If provider1 did not provide the result, ask the fallback provider provider2
.switchIfEmpty(provider2.provide2(someData))
public Mono<MyResponse> provide2(MyRequest someData) {
// The Mono assembly is needed only in some corner cases
// but in fact it is always happening
return Mono.just(someData)
// expensive data processing which might even fail in the assemble time
.map(...)
.map(...)
...
}
provider2.provide2() accepts someData only when provider1.provide1() does not return any result, and/or the method assembly of the Mono returned by provider2.provide2() is expensive and even fails when called on wrong data.
It this case defer() is preferable, even if it might not be obvious at the first glance:
provider1.provide1(someData)
// ONLY IF provider1 did not provide the result, assemble another Mono with provider2.provide()
.switchIfEmpty(Mono.defer(() -> provider2.provide2(someData)))

Confusion regarding reentrant functions

My understanding of "reentrant function" is that it's a function that can be interrupted (e.g by an ISR or a recursive call) and later resumed such that the overall output of the function isn't affected in any way by the interruption.
Following is an example of a reentrant function from Wikipedia https://en.wikipedia.org/wiki/Reentrancy_(computing)
int t;
void swap(int *x, int *y)
{
int s;
s = t; // save global variable
t = *x;
*x = *y;
// hardware interrupt might invoke isr() here!
*y = t;
t = s; // restore global variable
}
void isr()
{
int x = 1, y = 2;
swap(&x, &y);
}
I was thinking, what if we modify the ISR like this:
void isr()
{
t=0;
}
And let's say, then, that the main function calls the swap function, but then suddenly an interrupt occurs, then the output would surely get distorted as the swap wouldn't be proper, which in my mind makes this function non-reentrant.
Is my thinking right or wrong? Is there some mistake in my understanding of reentrancy?
The answer to your question:
that the main function calls the swap function, but then suddenly an interrupt occurs, then the output would surely get distorted as the swap wouldn't be proper, which in my mind makes this function non-reentrant.
Is no, it does not, because re-entrancy is (by definition) defined with respect to self. If isr calls swap, the other swap would be safe. However, swap is thread-unsafe, though.
The correct way of thinking depends on the precise definition of re-entrancy and thread-safety (See, say Threadsafe vs re-entrant)
Wikipedia, the source of the code in question, selected the definition of reentrant function to be "if it can be interrupted in the middle of its execution and then safely called again ("re-entered") before its previous invocations complete execution".
I have never heard the term re-entrancy used in the context of interrupt service routines. It is generally the responsibility of the ISR (and/or the operating system) to maintain consistency - application code should not need to know anything about what an interrupt might do.
That a function is re-entrant usually means that it can be called from multiple threads simultaneously - or by itself recursively (either directly or through a more elaborate call chain) - and still maintain internal consistency.
For functions to be re-entrant they must generally avoid using static variables and of course avoid calls to other functions that are not themselves re-entrant.

State Machine - Is enum the choice for states?

I would like to have class static variables as states, but Objective C disallowed it
I tried +(int)LOOPING_STATE for state class, but it will fail in
switch (myCurrentState) {
case [STATE_CLASS LOOPING_STATE]: <== received an error of "expression can't be put here"
return;
}
Is enum generally the choice for writing state codes?
Are there any other options, and under what condition should those options be used?
Thanks in advance.
This has little to do with Objective-C but more with the C in Objective-C. In general, using an enum to represent the states of your state machine should be preferred over plain integers.
The reason you can't use classes in a switch is that the value of the expressions used in the case labels of a switch statement need to be known at compile time. Assuming [STATE_CLASS LOOPING_STATE] is an invocation of a class method, the compiler can't safely know the result of that expression at compile time, and will thus refuse generating a switch statement.
Why does the compiler require knowing the result of expressions used for case labels at compile time? The idea behind a switch statement is to be more efficient than a series of semantically equivalent if/else if blocks. This is achieved by translating a switch statement into a dispatch table with an unconditional jump, whereas the if/else if solution requires lots of conditional jumps. As one can easily guess, conditional jumps are fundamentally at odds with modern pipelined CPU designs since they may cause the entire pipeline to be flushed. (Modern CPUs try to compensate with sophisticated branch prediction, but it would be better if we can avoid the issue altogether, right?)
But then, getting it right comes first, making it fast second.
If this state machine needs to be fast, enums are the way to go. However, if you want an object oriented way of doing this, the functionality of each state would be a method of the state object itself. Thus you would do away with the switch/if statement altogether. Your state machine's loop would look something like this:
-(void) run
{
State* currentState;
currentState = [self startState];
while (currentState != [self stopState])
{
currentState = [currentState transitionWitInput: inputs
actions: actions];
}
}
inputs is the input data for the state transition, actions is a block or a selector or an NSInvocation or something that tells the state what to do during the transition.

Can you write any algorithm without an if statement?

This site tickled my sense of humour - http://www.antiifcampaign.com/ but can polymorphism work in every case where you would use an if statement?
Smalltalk, which is considered as a "truly" object oriented language, has no "if" statement, and it has no "for" statement, no "while" statement. There are other examples (like Haskell) but this is a good one.
Quoting Smalltalk has no “if” statement:
Some of the audience may be thinking
that this is evidence confirming their
suspicions that Smalltalk is weird,
but what I’m going to tell you is
this:
An “if” statement is an abomination in an Object Oriented language.
Why? Well, an OO language is composed
of classes, objects and methods, and
an “if” statement is inescapably none
of those. You can’t write “if” in an
OO way. It shouldn’t exist.
Conditional execution, like everything
else, should be a method. A method of
what? Boolean.
Now, funnily enough, in Smalltalk,
Boolean has a method called
ifTrue:ifFalse: (that name will look
pretty odd now, but pass over it for
now). It’s abstract in Boolean, but
Boolean has two subclasses: True and
False. The method is passed two blocks
of code. In True, the method simply
runs the code for the true case. In
False, it runs the code for the false
case. Here’s an example that hopefully
explains:
(x >= 0) ifTrue: [
'Positive'
] ifFalse: [
'Negative'
]
You should be able to see ifTrue: and
ifFalse: in there. Don’t worry that
they’re not together.
The expression (x >= 0) evaluates to
true or false. Say it’s true, then we
have:
true ifTrue: [
'Positive'
] ifFalse: [
'Negative'
]
I hope that it’s fairly obvious that
that will produce ‘Positive’.
If it was false, we’d have:
false ifTrue: [
'Positive'
] ifFalse: [
'Negative'
]
That produces ‘Negative’.
OK, that’s how it’s done. What’s so
great about it? Well, in what other
language can you do this? More
seriously, the answer is that there
aren’t any special cases in this
language. Everything can be done in an
OO way, and everything is done in an
OO way.
I definitely recommend reading the whole post and Code is an object from the same author as well.
That website is against using if statements for checking if an object has a specific type. This is completely different from if (foo == 5). It's bad to use ifs like if (foo instanceof pickle). The alternative, using polymorphism instead, promotes encapsulation, making code infinitely easier to debug, maintain, and extend.
Being against ifs in general (doing a certain thing based on a condition) will gain you nothing. Notice how all the other answers here still make decisions, so what's really the difference?
Explanation of the why behind polymorphism:
Take this situation:
void draw(Shape s) {
if (s instanceof Rectangle)
//treat s as rectangle
if (s instanceof Circle)
//treat s as circle
}
It's much better if you don't have to worry about the specific type of an object, generalizing how objects are processed:
void draw(Shape s) {
s.draw();
}
This moves the logic of how to draw a shape into the shape class itself, so we can now treat all shapes the same. This way if we want to add a new type of shape, all we have to do is write the class and give it a draw method instead of modifying every conditional list in the whole program.
This idea is everywhere in programming today, the whole concept of interfaces is all about polymorphism. (Shape is an interface defining a certain behavior, allowing us to process any type that implements the Shape interface in our method.) Dynamic programming languages take this even further, allowing us to pass any type that supports the necessary actions into a method. Which looks better to you? (Python-style pseudo-code)
def multiply(a,b):
if (a is string and b is int):
//repeat a b times.
if (a is int and b is int):
//multiply a and b
or using polymorphism:
def multiply(a,b):
return a*b
You can now use any 2 types that support the * operator, allowing you to use the method with types that haven't event been created yet.
See polymorphism and what is polymorhism.
Though not OOP-related: In Prolog, the only way to write your whole application is without if statements.
Yes actually, you can have a turing-complete language that has no "if" per se and only allows "while" statements:
http://cseweb.ucsd.edu/classes/fa08/cse200/while.html
As for OO design, it makes sense to use an inheritance pattern rather than switches based on a type field in certain cases... That's not always feasible or necessarily desirable though.
#ennuikiller: conditionals would just be a matter of syntactic sugar:
if (test) body; is equivalent to x=test; while (x) {x=nil; body;}
if-then-else is a little more verbose:
if (test) ifBody; else elseBody;
is equivalent to
x = test; y = true;
while (x) {x = nil; y = nil; ifBody;}
while (y) {y = nil; elseBody;}
the primitive data structure is a list of lists. you could say 2 scalars are equal if they are lists of the same length. you would loop over them simultaneously using the head/tail operators and see if they stop at the same point.
of course that could all be wrapped up in macros.
The simplest turing complete language is probably iota. It contains only 2 symbols ('i' and '*').
Yep. if statements imply branches which can be very costly on a lot of modern processors - particularly PowerPC. Many modern PCs do a lot of pipeline re-ordering and so branch mis-predictions can cost an order of >30 cycles per branch miss.
On console programming it's sometimes faster to just execute the code and ignore it than check if you should execute it!
Simple branch avoidance in C:
if (++i >= 15)
{
i = 0;
)
can be re-written as
i = (i + 1) & 15;
However, if you want to see some real anti-if fu then read this
Oh and on the OOP question - I'll replace a branch mis-prediction with a virtual function call? No thanks....
The reasoning behind the "anti-if" campaign is similar to what Kent Beck said:
Good code invariably has small methods and
small objects. Only by factoring the system into many small pieces of state
and function can you hope to satisfy the “once and only once” rule. I get lots
of resistance to this idea, especially from experienced developers, but no one
thing I do to systems provides as much help as breaking it into more pieces.
If you don't know how to factor a program with composition and inheritance, then your classes and methods will tend to grow bigger over time. When you need to make a change, the easiest thing will be to add an IF somewhere. Add too many IFs, and your program will become less and less maintainable, and still the easiest thing will be to add more IFs.
You don't have to turn every IF into an object collaboration; but it's a very good thing when you know how to :-)
You can define True and False with objects (in a pseudo-python):
class True:
def if(then,else):
return then
def or(a):
return True()
def and(a):
return a
def not():
return False()
class False:
def if(then,else):
return false
def or(a):
return a
def and(a):
return False()
def not():
return True()
I think it is an elegant way to construct booleans, and it proves that you can replace every if by polymorphism, but that's not the point of the anti-if campaign. The goal is to avoid writing things such as (in a pathfinding algorithm) :
if type == Block or type == Player:
# You can't pass through this
else:
# You can
But rather call a is_traversable method on each object. In a sense, that's exactly the inverse of pattern matching. "if" is useful, but in some cases, it is not the best solution.
I assume you are actually asking about replacing if statements that check types, as opposed to replacing all if statements.
To replace an if with polymorphism requires a method in a common supertype you can use for dispatching, either by overriding it directly, or by reusing overridden methods as in the visitor pattern.
But what if there is no such method, and you can't add one to a common supertype because the super types are not maintained by you? Would you really go to the lengths of introducing a new supertype along with subtypes just to get rid of a single if? That would be taking purity a bit far in my opinion.
Also, both approaches (direct overriding and the visitor pattern) have their disadvantages: Overriding the method directly requires that you implement your method in the classes you want to switch on, which might not help cohesion. On the other hand, the visitor pattern is awkward if several cases share the same code. With an if you can do:
if (o instanceof OneType || o instanceof AnotherType) {
// complicated logic goes here
}
How would you share the code with the visitor pattern? Call a common method? Where would you put that method?
So no, I don't think replacing such if statements is always an improvement. It often is, but not always.
I used to write code a lot as the recommend in the anti-if campaign, using either callbacks in a delegate dictionary or polymorphism.
It's quite a beguiling argument, especially if you are dealing with messy code bases but to be honest, although it's great for a plugin model or simplifying large nested if statements, it does make navigating and readability a bit of a pain.
For example F12 (Go To Definition) in visual studio will take you to an abstract class (or, in my case an interface definition).
It also makes quick visual scanning of a class very cumbersome, and adds an overhead in setting up the delegates and lookup hashes.
Using the recommendations put forward in the anti-if campaign as much as they appear to be recommending looks like 'ooh, new shiny thing' programming to me.
As for the other constructs put forward in this thread, albeit it has been done in the spirit of a fun challenge, are just substitutes for an if statement, and don't really address what the underlying beliefs of the anti-if campaign.
You can avoid ifs in your business logic code if you keep them in your construction code (Factories, builders, Providers etc.). Your business logic code would be much more readable, easier to understand or easier to maintain or extend. See: http://www.youtube.com/watch?v=4F72VULWFvc
Haskell doesn't even have if statements, being pure functional. ;D
You can do it without if per se, but you can't do it without a mechanism that allows you to make a decision based on some condition.
In assembly, there's no if statement. There are conditional jumps.
In Haskell for instance, there's no explicit if, instead, you define a function multiple times, I forgot the exact syntax, but it's something like this:
pseudo-haskell:
def posNeg(x < 0):
return "negative"
def posNeg(x == 0):
return "zero"
def posNeg(x):
return "positive"
When you call posNeg(a), the interpreter will look at the value of a, if it's < 0 then it will choose the first definition, if it's == 0 then it will choose the second definition, otherwise it will default to the third definition.
So while languages like Haskell and SmallTalk don't have the usual C-style if statement, they have other means of allowing you to make decisions.
This is actually a coding game I like to play with programming languages. It's called "if we had no if" which has its origins at: http://wiki.tcl.tk/4821
Basically, if we disallow the use of conditional constructs in the language: no if, no while, no for, no unless, no switch etc.. can we recreate our own IF function. The answer depends on the language and what language features we can exploit (remember using regular conditional constructs is cheating co no ternary operators!)
For example, in tcl, a function name is just a string and any string (including the empty string) is allowed for anything (function names, variable names etc.). So, exploiting this we can do:
proc 0 {true false} {uplevel 1 $false; # execute false code block, ignore true}
proc 1 {true false} {uplevel 1 $true; # execute true code block, ignore flase}
proc _IF {boolean true false} {
$boolean $true $false
}
#usage:
_IF [expr {1<2}] {
puts "this is true"
} {
#else:
puts "this is false"
}
or in javascript we can abuse the loose typing and the fact that almost anything can be cast into a string and combine that with its functional nature:
function fail (discard,execute) {execute()}
function pass (execute,discard) {execute()}
var truth_table = {
'false' : fail,
'true' : pass
}
function _IF (expr) {
return truth_table[!!expr];
}
//usage:
_IF(3==2)(
function(){alert('this is true')},
//else
function(){alert('this is false')}
);
Not all languages can do this sort of thing. But languages I like tend to be able to.
The idea of polymorphism is to call an object without to first verify the class of that object.
That doesn't mean the if statement should not be used at all; you should avoid to write
if (object.isArray()) {
// Code to execute when the object is an array.
} else if (object.inString()) {
// Code to execute if the object is a string.
}
It depends on the language.
Statically typed languages should be able to handle all of the type checking by sharing common interfaces and overloading functions/methods.
Dynamically typed languages might need to approach the problem differently since type is not checked when a message is passed, only when an object is being accessed (more or less). Using common interfaces is still good practice and can eliminate many of the type checking if statements.
While some constructs are usually a sign of code smell, I am hesitant to eliminate any approach to a problem apriori. There may be times when type checking via if is the expedient solution.
Note: Others have suggested using switch instead, but that is just a clever way of writing more legible if statements.
Well, if you're writing in Perl, it's easy!
Instead of
if (x) {
# ...
}
you can use
unless (!x){
# ...
}
;-)
In answer to the question, and as suggested by the last respondent, you need some if statements to detect state in a factory. At that point you then instantiate a set of collaborating classes that solve the state specific problem. Of course, other conditionals would be required as needed, but they would be minimized.
What would be removed of course would be the endless procedural state checking rife in so much service based code.
Interesting smalltalk is mentioned, as that's the language I used before being dragged across into Java. I don't get home as early as I used to.
I thought about adding my two cents: you can optimize away ifs in many languages where the second part of a boolean expression is not evaluated when it won't affect the result.
With the and operator, if the first operand evaluates to false, then there is no need to evaluate the second one. With the or operator, it's the opposite - there's no need to evaluate the second operand if the first one is true. Some languages always behave like that, others offer an alternative syntax.
Here's an if - elseif - else code made in JavaScript by only using operators and anonymous functions.
document.getElementById("myinput").addEventListener("change", function(e) {
(e.target.value == 1 && !function() {
alert('if 1');
}()) || (e.target.value == 2 && !function() {
alert('else if 2');
}()) || (e.target.value == 3 && !function() {
alert('else if 3');
}()) || (function() {
alert('else');
}());
});
<input type="text" id="myinput" />
This makes me want to try defining an esoteric language where blocks implicitly behave like self-executing anonymous functions and return true, so that you would write it like this:
(condition && {
action
}) || (condition && {
action
}) || {
action
}