Coroutine vs Fiber difference clarification

In the book Linux System Programming, 2nd Edition, the difference between coroutines and fibers is explained as follows:
Coroutines and fibers provide a unit of execution even lighter in weight than the thread (with the former being their name when they are a programming language construct, and the latter when they are a system construct).
I have some examples of coroutines (language constructs) but I am unable to find an example of fibers.
Can anyone provide me with an example of a fiber (a system construct)?

You could take a look at boost.coroutine2 and boost.fiber (C++ libraries) - both use the same context switching mechanism (callcc()/continuation) from boost.context.
In short, the difference between coroutines and fibers is that context switches between fibers are managed by a scheduler (which selects the next fiber to run, etc.). Coroutines have no concept of a scheduler.
A more detailed explanation of the difference between coroutines and fibers can be read in N4024: Distinguishing coroutines and fibers.
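If you want to see a fiber-like facility exposed as a system construct rather than as a library, a minimal sketch using the POSIX ucontext API (getcontext/makecontext/swapcontext, still shipped by glibc although marked obsolescent in newer POSIX) might look like this; the stack size and function names are arbitrary choices for illustration:

/* Minimal fiber-style sketch: one extra execution context that main
   switches into and out of explicitly. Error handling omitted. */
#include <stdio.h>
#include <ucontext.h>

static ucontext_t main_ctx, fiber_ctx;
static char fiber_stack[64 * 1024];

static void fiber_func(void)
{
    printf("fiber: first run\n");
    swapcontext(&fiber_ctx, &main_ctx);   /* yield back to main */
    printf("fiber: resumed\n");
}                                         /* returning follows uc_link */

int main(void)
{
    getcontext(&fiber_ctx);
    fiber_ctx.uc_stack.ss_sp = fiber_stack;
    fiber_ctx.uc_stack.ss_size = sizeof(fiber_stack);
    fiber_ctx.uc_link = &main_ctx;        /* where to go when the fiber ends */
    makecontext(&fiber_ctx, fiber_func, 0);

    printf("main: switching to fiber\n");
    swapcontext(&main_ctx, &fiber_ctx);
    printf("main: back, resuming fiber\n");
    swapcontext(&main_ctx, &fiber_ctx);
    printf("main: done\n");
    return 0;
}

Windows exposes a comparable system construct directly via CreateFiber/SwitchToFiber. What a fiber library such as boost.fiber adds on top of this raw context switching is exactly the scheduler mentioned above.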

Related

What is the difference between processes/messages in Erlang and objects/messages in Smalltalk?

I'm trying to understand the difference between objects/messages in Smalltalk and processes/messages in Erlang. I read the following post on the topic.
As far as I understand, in Smalltalk everything is an object, and everything has the same "object/message" abstraction - even the number 1 is an object that can only be interacted with through message passing.
Is 1 a process in Erlang/Elixir? Does everything in Erlang follow the same process/message paradigm? Can you send a message to a number in Erlang?
Thanks a lot.
Processes in Erlang and Objects in Smalltalk are indeed the same thing.
At first glance, this is not terribly surprising: Erlang is an Actor Model language. The Actor Model was invented by Carl Hewitt, who based its message-driven evaluation model on Smalltalk's message-driven evaluation model. (Really, Actors and Objects are the same thing; they only differ in some details.) Alan Kay, in turn, was influenced by Carl Hewitt's PLANNER when he designed Smalltalk.
So, there is a close relationship between Actors and Objects, and therefore, it should not be surprising that Erlang's Processes and Smalltalk's Objects are so similar.
Except for one thing: the designers of Erlang didn't know about the Actor Model!!! They only learned about it later, particularly when Joe Armstrong wrote his PhD Thesis under Seif Haridi (co-author of the definitive book on Programming Paradigms) in the late 1990s.
Joe Armstrong wrote an article in which he strongly advocated against OO (Why OO Sucks), but he later changed his mind when he realized that Erlang is actually very object-oriented. In fact, he even went so far as to claim that Erlang is the only object-oriented language in this interview with Joe Armstrong and Ralph Johnson.
This is an interesting case of what evolutionary biologists would call convergent evolution, i.e. two unrelated species evolving to be similar in response to similar external pressures.
There are still a lot of relationships between Erlang and Smalltalk, though:
Erlang started out as a concurrency extension to Prolog (and even when Erlang became its own separate language, the first implementations were written in Prolog) and is still to this day heavily rooted in Prolog. Prolog is heavily influenced by Carl Hewitt's PLANNER.
Smalltalk was also heavily influenced by what would later become the ARPANet (and even later the Internet); Erlang was designed for networked systems.
However, one of the important differences between Erlang and Smalltalk is that in Erlang not everything is a Process. 1 is a number, not a process. You can't send a message to a number.
There are multiple "layers" of Erlang:
Functional Erlang: a mostly typical, dynamically-typed functional language with some "oddities" inherited from Prolog, e.g. unification.
Concurrent Erlang: Functional Erlang + Processes and Messages.
Distributed Erlang: Concurrent Erlang + Remote Processes.
Fault-Tolerant Erlang: Distributed Erlang + certain Design Patterns codified in the OTP libraries, e.g. supervisor trees and gen_server.
A Fault-Tolerant system written in Erlang/OTP will typically look like something we might recognize as "Object-Oriented". But the insides of those objects will often be implemented in a more functional than object-oriented style.
Interestingly, the "evolutionary pressure" that Erlang was under, in other words, the problem Erlang's designers were trying to solve (reliability, replication, redundancy, …) is the same pressure that led to the evolution of cells. Alan Kay minored in microbiology, and explicitly modeled OO on biological cells. This is another parallel between Erlang and Smalltalk.
I wrote a little bit about this in another answer of mine.

ForkJoinPool and Kotlin Coroutines

As I understand it, by default, if you start a Kotlin coroutine via launch or async (or if you use GlobalScope), it will run on CommonPool. And CommonPool is a ForkJoinPool, which by default is in non-async mode, so it executes tasks in LIFO order. That seems like a very bad choice for something like asynchronous web server applications, where we'd want fair scheduling: we don't want the poor sucker who hit our web server first to wait for all the calls that came later.
However, Kotlin coroutines add an additional wrinkle here in that there's some bit of code from the Kotlin standard library that will arrange to have those coroutines executed (some variation of the standard async select/epoll loop, as I understand it). So maybe the LIFO thing isn't a concern?
I could certainly run some experiments and/or step into the code in a debugger to see how this works, but I suspect others have the same question and I bet somebody "just knows" the answer...
Per discussion on Kotlin Discuss, CommonPool is no longer the default; Kotlin now defaults to a "mostly fair" scheduler. Details are in the linked discussion.
This shouldn't be a concern, because ForkJoinPool is not really LIFO.
That is, it's LIFO for a single thread in the pool, but that's where the "work stealing" part becomes interesting. Each thread's task queue is double-ended. So what is LIFO for one thread is FIFO for another thread that has become free and steals from it.
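To make the "LIFO for the owner, FIFO for a thief" idea concrete, here is a heavily simplified, mutex-based sketch of a work-stealing deque in C; all the names are invented, and real implementations (including ForkJoinPool's) use lock-free Chase-Lev-style deques, so this only shows which end each side takes from:

/* Heavily simplified work-stealing deque: the owner pushes and pops at
   the bottom (LIFO), idle workers steal from the top (FIFO). */
#include <pthread.h>
#include <stdio.h>

#define DEQUE_CAP 64

typedef struct {
    pthread_mutex_t mu;
    int tasks[DEQUE_CAP];
    int top;        /* steal end (oldest tasks) */
    int bottom;     /* owner end (newest tasks) */
} deque_t;

static void deque_init(deque_t *d)
{
    pthread_mutex_init(&d->mu, NULL);
    d->top = d->bottom = 0;
}

/* Owner: push a new task at the bottom. */
static void push_bottom(deque_t *d, int task)
{
    pthread_mutex_lock(&d->mu);
    d->tasks[d->bottom++ % DEQUE_CAP] = task;
    pthread_mutex_unlock(&d->mu);
}

/* Owner: pop the most recently pushed task (LIFO). Returns -1 if empty. */
static int pop_bottom(deque_t *d)
{
    pthread_mutex_lock(&d->mu);
    int task = (d->bottom > d->top) ? d->tasks[--d->bottom % DEQUE_CAP] : -1;
    pthread_mutex_unlock(&d->mu);
    return task;
}

/* Thief: steal the oldest task (FIFO). Returns -1 if empty. */
static int steal_top(deque_t *d)
{
    pthread_mutex_lock(&d->mu);
    int task = (d->bottom > d->top) ? d->tasks[d->top++ % DEQUE_CAP] : -1;
    pthread_mutex_unlock(&d->mu);
    return task;
}

int main(void)
{
    deque_t d;
    deque_init(&d);
    for (int i = 1; i <= 3; i++)
        push_bottom(&d, i);
    printf("owner pops:   %d\n", pop_bottom(&d));  /* 3 - newest first */
    printf("thief steals: %d\n", steal_top(&d));   /* 1 - oldest first */
    return 0;
}

The owning worker treats the bottom of its deque as a stack (good locality for freshly forked subtasks), while idle workers steal the oldest tasks from the top, which is what keeps the pool as a whole roughly fair.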
In general, ForkJoinPool is a great solution for small tasks, and your coroutines are usually small if you use suspending functions wisely.
Also, you can read more about asyncMode in the documentation, as it's not that "async": https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ForkJoinPool.html
asyncMode - if true, establishes local first-in-first-out scheduling mode for forked tasks that are never joined. This mode may be more appropriate than default locally stack-based mode in applications in which worker threads only process event-style asynchronous tasks. For default value, use false.

How to implement Go-style channels (CSP) in Objective-C?

I wonder how to create a CSP library for Objective-C that works like Go's channels/goroutines but with idiomatic Objective-C (and less boilerplate than the current approaches).
In other languages with native coroutines and/or generators it is possible to model this easily, but I don't grasp how to do the same with the several ways of doing concurrent programming in Objective-C (plus, the idea is to have "cheap" threads).
Any hint about what I need to do?
I would look at the State Threads library, as it implements roughly the same idea which underlies the goroutine switching algorithm of Go: a goroutine surrenders control to the scheduler when it's about to sleep in a syscall, and so the ST library wraps OS-level file descriptors to provide their own FD-like objects which can be read from (and/or written to), but instead of blocking the whole process these operations transfer control to other light-weight threads managed by the library.
Then you might need a scheduler more advanced than that of the ST library to keep OS threads busy running your SPs. A no-brainer introduction to the Go 1.2 scheduler is here, and it contains a link to a more hard-core design document. The rest is in Go's source code.
See also this answer on SO.
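If it helps to see the channel semantics in isolation, here is a minimal sketch of a blocking, single-slot channel built on plain pthreads (usable from Objective-C, since it is just C); the chan_t type and function names are invented for illustration, and a real CSP library would add buffering, select, and the cheap threads discussed above:

/* Single-slot blocking channel: a send blocks while a previously sent
   value has not yet been received; a receive blocks until a value arrives. */
#include <pthread.h>
#include <stdio.h>

typedef struct {
    pthread_mutex_t mu;
    pthread_cond_t  cv;
    int             value;
    int             full;    /* 1 if a value is waiting to be received */
} chan_t;

static void chan_init(chan_t *c)
{
    pthread_mutex_init(&c->mu, NULL);
    pthread_cond_init(&c->cv, NULL);
    c->full = 0;
}

static void chan_send(chan_t *c, int v)
{
    pthread_mutex_lock(&c->mu);
    while (c->full)                      /* wait until the slot is free */
        pthread_cond_wait(&c->cv, &c->mu);
    c->value = v;
    c->full = 1;
    pthread_cond_broadcast(&c->cv);
    pthread_mutex_unlock(&c->mu);
}

static int chan_recv(chan_t *c)
{
    pthread_mutex_lock(&c->mu);
    while (!c->full)                     /* wait until a value arrives */
        pthread_cond_wait(&c->cv, &c->mu);
    int v = c->value;
    c->full = 0;
    pthread_cond_broadcast(&c->cv);
    pthread_mutex_unlock(&c->mu);
    return v;
}

static chan_t ch;

static void *producer(void *arg)
{
    (void)arg;
    for (int i = 0; i < 3; i++)
        chan_send(&ch, i);
    chan_send(&ch, -1);                  /* sentinel: done */
    return NULL;
}

int main(void)
{
    chan_init(&ch);
    pthread_t t;
    pthread_create(&t, NULL, producer, NULL);
    for (int v; (v = chan_recv(&ch)) != -1; )
        printf("received %d\n", v);
    pthread_join(&t, NULL);
    return 0;
}

This only demonstrates the blocking send/receive behaviour; making the senders and receivers cheap (fibers or green threads rather than OS threads) is the part that libraries like State Threads, or a GCD-based design, have to solve.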
Create operations; for example, consider this process:
process x takes a number from east, transforms it into a string, and gives it to west.
I could model that with an object that keeps the internal state of x (consisting of a number and a string) and the following operations:
east-output, an operation defined somewhere else by east's process logic
x-input, an operation that depends on east-output. It copies the number from east-output's data structure into x's data structure
x-output, an operation that depends on x-input. Its content is defined as a purely internal transformation - in our example, stringWithFormat...
west-input, an operation that depends on x-output, etc.
Then you dump the operations into an NSOperationQueue and see what happens (does it work, or are there conflicting dependencies...)

Matching a virtual machine design with its primary programming language

As background for a side project, I've been reading about different virtual machine designs, with the JVM of course getting the most press. I've also looked at BEAM (Erlang), GHC's RTS (kind of but not quite a VM) and some of the JavaScript implementations. Python also has a bytecode interpreter, which I know exists but have not read much about.
What I have not found is a good explanation of why particular virtual machine design choices are made for a particular language. I'm particularly interested in design choices that would fit with concurrent and/or very dynamic (Ruby, JavaScript, Lisp) languages.
Edit: In response to a comment asking for specificity, here is an example. The JVM uses a stack machine rather than a register machine, which was very controversial when Java was first introduced. It turned out that the engineers who designed the JVM had done so intending platform portability, and converting a stack machine back into a register machine was easier and more efficient than overcoming an impedance mismatch where there were too many or too few virtual registers.
Here's another example: for Haskell, the paper to look at is Implementing lazy functional languages on stock hardware: the Spineless Tagless G-machine. This is very different from any other type of VM I know about. And in point of fact GHC (the premier implementation of Haskell) does not run the STG machine live, but uses it as an intermediate step in compilation. Peyton-Jones lists no fewer than 8 other virtual machines that didn't work. I would like to understand why some VMs succeed where others fail.
I'll answer your question from a different tack: what is a VM? A VM is just a specification for an "interpreter" of a lower-level language than the source language. Here I'm using the black-box meaning of the word "interpreter". I don't care how a VM gets implemented (as a bytecode interpreter, a JIT compiler, whatever). When phrased that way, from a design point of view the VM isn't the interesting thing; the low-level language is.
The ideal VM language will do two things. One, it will make it easy to compile the source language into it. And two, it will also make it easy to interpret on the target platform(s) (where again the interpreter could be implemented very naively or could be some really sophisticated JIT like HotSpot or V8).
Obviously there's a tension between those two desirable properties, but they do more or less form two end points on a line through the design space of all possible VMs. (Or perhaps some more complicated shape than a line, because this isn't a flat Euclidean space, but you get the idea.) If you build your VM language far outside of that line, then it won't be very useful. That's what constrains VM design: putting it somewhere on that ideal line.
That line is also why high-level VMs tend to be very language specific while low-level VMs are more language agnostic but don't provide many services. A high-level VM is by its nature close to the source language, which makes it far from other, different source languages. A low-level VM is by its nature close to the target platform, thus close to the platform end of the ideal lines for many languages, but that low-level VM will also be pretty far from the "easy to compile to" end of the ideal line for most source languages.
Now, more broadly, conceptually any compiler can be seen as a series of transformations from the source language to intermediate forms that themselves can be seen as languages for VMs. VMs for the intermediate languages may never be built, but they could be. A compiler eventually emits the final form. And that final form will itself be a language for a VM. We might call that VM "JVM", "V8"...or we might call that VM "x86", "ARM", etc.
Hope that helps.
One of the techniques for deriving a VM is to just go down the compilation chain, transforming your source language into more and more low-level intermediate languages. Once you spot a language low-level enough to be suitable for a flat representation (i.e., one which can be serialised into a sequence of "instructions"), that is pretty much your VM. And your VM interpreter or JIT compiler would just continue your transformation chain from the point you selected for serialisation.
Some serialisation techniques are very common - e.g., using a pseudo-stack representation for expression trees (like in .NET CLR, which is not a "real" stack machine at all). Otherwise you may want to use an SSA-form for serialisation, as in LLVM, or simply a 3-address VM with an infinite number of registers (as in Dalvik). It does not really matter which way you take, since it is only a serialisation and it would be de-serialised later to carry on with your normal way of compilation.
It is a bit of a different story if you intend to interpret your VM code immediately instead of compiling it. There is currently no consensus on what kind of VM is better suited for interpretation. Both stack-based (or, I'd dare to say, Forth-based) VMs and register-based VMs have proven to be efficient.
I found this book to be helpful. It discusses many of the points you are asking about. (Note: I'm not in any way affiliated with Amazon, nor am I promoting Amazon; it was just the easiest place to link from.)
http://www.amazon.com/dp/1852339691/

What opcode dispatch strategies are used in efficient interpreters?

What techniques promote efficient opcode dispatch to make a fast interpreter? Are there some techniques that only work well on modern hardware and others that don't work well anymore due to hardware advances? What trade-offs must be made between ease of implementation, speed, and portability?
I'm pleased that Python's C implementation is finally moving beyond a simple switch (opcode) {...} implementation for opcode dispatch to indirect threading as a compile-time option, but I'm less pleased that it took them 20 years to get there. Maybe if we document these strategies on Stack Overflow the next language will get fast faster.
There are a number of papers on different kinds of dispatch:
M. Anton Ertl and David Gregg, Optimizing Indirect Branch Prediction Accuracy in Virtual Machine Interpreters, in Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation (PLDI 03), pp. 278-288, San Diego, California, June 2003.
M. Anton Ertl and David Gregg, The behaviour of efficient virtual machine interpreters on modern architectures, in Proceedings of the 7th European Conference on Parallel Computing (Europar 2001), pp. 403-412, LNCS 2150, Manchester, August 2001.
An excellent summary is provided by Yunhe Shi in his PhD thesis.
Also, someone discovered a new technique a few years ago which is valid ANSI C.
Before you start anything, check Lua.
It's small (~150 KB), pure ANSI C, and works on anything that has a C compiler. Very fast.
And most important - source code is clean and readable. Worth checking out.
Indirect threading is a strategy where each opcode implementation has its own JMP to the next opcode. The patch to the Python interpreter looks something like this:
add:
result = a + b;
goto *opcode_targets[*next_instruction++];
opcode_targets maps the instruction in the language's bytecode to the location in memory of the opcode implementation. This is faster because the processor's branch predictor can make a different prediction for each bytecode, in contrast to a switch statement that has only one branch instruction.
The compiler must support computed goto for this to work, which mostly means gcc.
Direct threading is similar, but in direct threading the array of opcodes is replaced with pointers to the opcode implementations, like so:
goto *next_opcode_target++;
These techniques are only useful because modern processors are pipelined and must clear their pipelines (slow) on a mispredicted branch. The processor designers put in branch prediction to avoid having to clear the pipeline as often, but branch prediction only works for branches that are more likely to take a particular path.
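To make the mechanism above concrete, here is a minimal, self-contained sketch of indirect-threaded dispatch using GCC's labels-as-values extension; the opcode set and bytecode program are invented for illustration:

/* Tiny indirect-threaded interpreter: every handler ends with its own
   indirect goto, so each bytecode gets its own branch for the predictor. */
#include <stdio.h>

enum { OP_PUSH1, OP_ADD, OP_PRINT, OP_HALT };

int main(void)
{
    static const unsigned char code[] = { OP_PUSH1, OP_PUSH1, OP_ADD, OP_PRINT, OP_HALT };
    /* One label per opcode; the table is indexed by the opcode value. */
    static void *dispatch[] = { &&op_push1, &&op_add, &&op_print, &&op_halt };

    const unsigned char *ip = code;
    int stack[16], *sp = stack;

#define NEXT() goto *dispatch[*ip++]

    NEXT();
op_push1:
    *sp++ = 1;
    NEXT();
op_add:
    sp[-2] += sp[-1];
    --sp;
    NEXT();
op_print:
    printf("%d\n", sp[-1]);   /* prints 2 for the program above */
    NEXT();
op_halt:
    return 0;
}

For direct threading, the code array itself would hold the label addresses (e.g. static void *code[] = { &&op_push1, &&op_push1, ... };) and dispatch becomes goto *(*ip++);, removing the table lookup.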
Just-in-time compilation is one.
One big win is to store the source code in an intermediate form, rather than redoing lexical analysis and parsing during execution.
This can range all the way from just storing the tokens, through Forth style threaded code and on to JIT compilation.
Benchmarking is a good technique for making anything fast on a given platform. Test, refine, test again, improve.
I don't think you can get a better answer. There are lots of techniques for making interpreters. But I'll give you a tip: do not make trade-offs, just choose what you really need and pursue those goals.
I found a blog post on threaded interpreter implementation that was useful.
The author describes GCC label-based threading and also how to do this in Visual Studio using inline assembler.
http://abepralle.wordpress.com/2009/01/25/how-not-to-make-a-virtual-machine-label-based-threading/
The results are interesting. He reports a 33% performance improvement when using GCC, but surprisingly the Visual Studio inline-assembly implementation is 3 times slower!
The question is a bit vague. But, it seems you're asking about writing an interpreter.
Interpreters typically utilize traditional parsing components: lexer, parser, and abstract syntax tree (AST). This allows the designer to read and interpret valid syntax and build a tree structure of commands with associated operators, parameters, etc.
Once the entire input has been tokenized and parsed into AST form, the interpreter can begin executing by traversing the tree.
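As a sketch of that last step, here is a tiny AST-walking evaluator in C; the node layout and operators are invented for illustration, and a real interpreter would of course build the tree with a parser rather than by hand:

/* Minimal AST-walking evaluator: each node is either a literal number
   or a binary operation over two child nodes. */
#include <stdio.h>

typedef enum { NODE_NUM, NODE_ADD, NODE_MUL } node_kind;

typedef struct node {
    node_kind kind;
    int value;                 /* used by NODE_NUM */
    struct node *lhs, *rhs;    /* used by NODE_ADD / NODE_MUL */
} node;

static int eval(const node *n)
{
    switch (n->kind) {
    case NODE_NUM: return n->value;
    case NODE_ADD: return eval(n->lhs) + eval(n->rhs);
    case NODE_MUL: return eval(n->lhs) * eval(n->rhs);
    }
    return 0;
}

int main(void)
{
    /* (2 + 3) * 4, built by hand instead of by a parser */
    node two   = { NODE_NUM, 2, NULL, NULL };
    node three = { NODE_NUM, 3, NULL, NULL };
    node four  = { NODE_NUM, 4, NULL, NULL };
    node sum   = { NODE_ADD, 0, &two, &three };
    node prod  = { NODE_MUL, 0, &sum, &four };

    printf("%d\n", eval(&prod));   /* prints 20 */
    return 0;
}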
There are many options, but I've recently used ANTLR as a parser generator that can build parsers in various target languages, including C/C++ and C#.