What are the specifics about the continuations upon which Raku(do) relies? - raku

The topic of delimited continuations was barely discussed among programming language enthusiasts in the 1990s and 2000s. It has recently been re-emerging as a major thing in programming language discussions.
My hope is that someone can at least authoritatively say whether the continuations underlying Rakudo (as contrasted with Raku) do or don't have each of the six characteristics listed below. I say a bit more about the sort of answer I'm hoping for after the list.
Quoting verbatim (with a formatting touch up) from an online message[1] written by the person driving the work on adding continuations to the JVM:
Asymmetric: When the continuation suspends or yields, the execution returns to the caller (of Continuation.run()). Symmetric continuations don't have the notion of a caller. When they yield, they must specify another continuation to transfer the execution to. Neither symmetric nor asymetric continuations are more powerful than one another, and each could be used to simulate the other.
Stackful: The continuation can be suspended at any depth in the call-stack, rather than in the same subroutine where the delimited context begins when the continuation is stackless (as is the case in C#). I.e the continuation has its own stack rather than just a single subroutine frame. Stackful continuations are more powerful than stackless ones.
Delimited: The continuation captures the execution context that starts with a specific call (in our case, the body of a certain runnable) rather than the entire execution state all the way up to main(). Delimited continuations are strictly more powerful than undelimited ones (http://okmij.org/ftp/continuations/undelimited.html), the latter considered "not practically useful" (http://okmij.org/ftp/continuations/against-callcc.html).
Multi-prompt: Continuations can be nested, and anywhere in the call stack, any of the enclosing continutions can be suspended. This is similar to nesting of try/catch blocks, and throwing an exception of a certain type that unwinds the stack up to the nearest catch that handles it rather than just the nearest catch. An example of nested continuations can be using a Python-like generator inside a virtual thread. The generator code can do a blocking IO call, which will suspend the enclosing thread continuation, and not just the generator: https://youtu.be/9vupFNsND6o?t=2188
One-shot/non-reentrant: Every time we continue a suspended continuation its state is mutated, and we cannot continue it from the same suspension state multiple times (i.e we can't go back in time). This is unlike reentrant continuations where every time we suspend them, a new immutable continuation object that represents a particular suspension point is returned. I.e. the continuation is a single point in time, and every time we continue it we go back to that state. Reentrant continuations are strictly more powerful than non-reentrant ones; i.e. they can do things that are strictly impossible with just one-shot continuations.
Cloneable: If we are able to clone a one-shot continuation we can provide the same ability as reentrant continuations. Even though the continuation is mutated every time we continue it, we can clone its state before continuing to create a snapshot of that point in time that we can return to later.
Aiui continuations aren't directly exposed in Raku, so perhaps the correct answer related to Raku (as against Rakudo) would be "there are no continuations". But that's not clear to me so in the following, in which I describe what I'm hoping might be in an answer if I'm very lucky, I'll pretend it makes some sense to talk about them in the context of both Raku and Rakudo as two distinct realms.
Here's the sort of answer I'm imagining would be possible (though I'm just somewhat wildly guessing at what is actually true):
"As a "100 year" language design, Raku's current underlying semantic [execution?] model requires, at minimum, stackless one-shot multi prompt delimited continuations.
From a theoretic pov, Raku's design can never expand to require that continuations are cloneable but it could theoretically expand to require they are stackful.
Rakudo implements the currently required continuation semantics.
MoarVM has support for these semantics built in, and could realistically track the theoretically possible expansions of requirements if Raku's design ever so expands.
The JVM and JS backends have suitable shims that achieve the same thing, albeit at a cost to performance. It seems plausible that the JVM backend could switch to using continuations that are native to the JVM if it comes to pass that it gets them, provided of course that they meet requirements, but my current impression is that it would likely realistically be perhaps a decade away, or more, before we would need to consider crossing that bridge."
(Or something vaguely like that.)
If an answer also provided a bit more detail on something like the above, perhaps some code links, that would be a particularly awesome addition.
Similarly, if an answer included a couple brief examples of how this continuation power surfaces in current Raku features, and a speculation about how it might one day, say 10 years from now, surface in other features, that would make an answer an over-the-top brilliant one.
PS. Thank you to #Larry who understood things deeply enough to know continuations needed to be part of the picture; to Stefan O'Rear for his contributions, including the initial implementations of what I think are one-shot multi prompt delimited continuations; and to jnthn for making the dream come true.
Footnotes
1 There is work underway to introduce continuations as a first class construct to the JVM. A key driver of this effort is Ron Pressler. The above is based on a message he wrote in November.

Rakudo uses continuations as an implementation strategy for two features:
gather/take - for implementing lazy iterators
Making await on the thread pool non-blocking
The characteristics of the continuations implemented follow the requirements of these language features. I'll go through them in a slightly different order than above because it eases explaining.
Stackful - yes, because we need to be able to do the take or await at any depth in the callstack relative to the gather or the thread pool worker's work loop. For example, you could write a recursive graph traversal algorithm inside of a gather and then take each encountered node. For await, this is at the heart of the difference between Raku's await and await as seen in many other languages: you don't have to refactor all the way up the call stack.
Delimited - yes. The continuation reset operation installs a tag (or "prompt"), and when we do a continuation control operation, we slice the stack at this delimiter. I can't imagine how you'd implement the Raku features involved without them being delimited.
Multi-prompt - yes, this is required because you can be iterating one data source provided by a gather inside of another gather's implementation, or do an await inside of a gather.
Asymmetric - after the continuation has been taken, execution continues after the reset instruction. In the await case, we go and find another task in the worker task queue, and in the take case we're back in the pull-one method of the iterator and can return the taken value. I think this approach fits well in a language where only a few features use continuations.
One-shot/non-reentrant - yes, and at least in MoarVM the memory safety of the runtime depends on this property. It is enforced by an atomic compare and swap operation, so if two threads were to race to invoke the continuation, only one could ever succeed. No Raku features need the additional complexity that reentrant continuations would imply.
Cloneable - no, because no Raku features need it. In theory this isn't too awful to implement in MoarVM in terms of saying "yes, we can do it", but I suspect it raises a lot of questions like "how deep should be clone". If you just cloned all the invocation records and similar, you'd still share Scalar containers, Arrays, etc. between the clones.
As I understand it - though I follow from a distance - the JVM continuations are at least partly aimed at the same design space that the Raku await mechanism is in, and so I'd be surprised if they didn't end up providing what Raku needs. This would clearly simplify compilation of Raku code to the JVM (currently it does the global CPS transform as it does code generation, which curiously turned out simpler than I expected), and it'd almost certainly perform much better too, because the transform required probably obscures quite a few things from the perspective of the JIT compiler.
So far as code goes, you can see the current continuations implementation, which uses the continuation data structure which in turn has various bits of memory management. At the time of writing, these have all been significantly refactored as part of the new callstack representation required by ongoing dispatcher work; those changes do make working with continuations more efficient, but don't change the overall set of operations.

Related

Synchronized collection that blocks on every method

I have a collection that is commonly used between different threads. In one thread I need to add items, remove items, retrieve items and iterate over the list of items. What I am looking for is a collection that blocks access to any of its read/write/remove methods whenever any of these methods are already being called. So if one thread retrieves an item, another thread has to wait until the reading has completed before it can remove an item from the collection.
Kotlin doesn't appear to provide this. However, I could create a wrapper class that provides the synchronization I'm looking for. Java does appear to offer the synchronizedList class but from what I read, this is really for blocking calls on a single method, meaning that no two threads can remove an item at the same time but one can remove while the other reads an item (which is what I am trying to avoid).
Are there any other solutions?
A wrapper such as the one returned by synchronizedList
synchronizes calls to every method, using the wrapper itself as the lock. So one thread would be blocked from calling get(), say, while another thread is currently calling put(). (This is what the question seems to ask for.)
However, as the docs to that method point out, this does nothing to protect sequences of calls, such as you might use when iterating through a collection. If another thread changes the collection in between your calls to next(), then anything could happen. (This is what I think the question is really about!)
To handle that safely, your options include:
Manual synchronization. Surround each sequence of calls to the collection in a synchronized block that synchronises on the collection, e.g.:
val list = Collections.synchronizedList(mutableListOf<String>())
// …
synchronized (list) {
for (i in list) {
// …
}
}
This is straightforward, and relatively easy to do if the collection is under your control. But if you miss any sequences, then you could get unexpected behaviour. Also, you'll need to keep your sequences short, to avoid holding the lock for an extended time and affecting performance.
Use a concurrent collection implementation which provides primitives letting you do all the processing you need in a single call, avoiding iteration and other sequences.
For maps, Java provides very good support with its ConcurrentMap interface, and high-performance implementations such as ConcurrentHashMap. These have methods allowing you to iterate, update single or multiple mappings, search, reduce, and many other whole-map operations in a single call, avoiding any concurrency problems.
For sets (as per this question) you can use a ConcurrentSkipListSet, or you can create one from a ConcurrentHashMap with newKeySet().
For lists (as per this question), there are fewer options. (I think concurrent lists are much less commonly needed.) If you don't need random access, ConcurrentLinkedQueue may suffice. Or if modification is much less common than iteration, CopyOnWriteArrayList could work.
There are many other concurrent classes in the java.util.concurrent package, so it's well worth looking through to see if any of those is a better match for your particular case.
If you have specialised requirements, you could write your own collection implementation which supports them. Obviously this is more work, and only worthwhile if none of the above approaches does what you want.
In general, I think it's well worth stepping back and seeing whether iteration is really needed. Historically, in imperative languages all the way from FORTRAN through BASIC and C up to Java, the for loop has traditionally been the tool of choice (sometimes the only structure) for operating on collections of data — and for those of us who grew up on those languages, it's what we reach for instinctively. But the functional programming paradigm provides alternative tools, and so in languages like Kotlin which provide some of them, it's good to stop and ask ourselves “What am I ultimately trying to achieve here?” (Often what we want is actually to update all entries, or map to a new structure, or search for an element, or find the maximum — all of which have better approaches in Kotlin than low-level iteration.)
After all, if you can tell the compiler what you want to do, instead of how to do it, then your program is likely to be shorter and easier to read and maintain, freeing you to think about more important things!

Do we need to lock the immutable list in kotlin?

var list = listOf("one", "two", "three")
fun One() {
list.forEach { result ->
/// Does something here
}
}
fun Two() {
list = listOf("four", "five", "six")
}
Can function One() and Two() run simultaneously? Do they need to be protected by locks?
No, you dont need to lock the variable. Even if the function One() still runs while you change the variable, the forEach function is running for the first list. What could happen is that the assignment in Two() happens before the forEach function is called, but the forEach would either loop over one or the other list and not switch due to the assignment
if you had a println(result) in your forEach, your program would output either
one
two
three
or
four
five
six
dependent on if the assignment happens first or the forEach method is started.
what will NOT happen is something like
one
two
five
six
Can function One() and Two() run simultaneously?
There are two ways that that could happen:
One of those functions could call the other.  This could happen directly (where the code represented by // Does something here in One()⁽¹⁾ explicitly calls Two()), or indirectly (it could call something else which ends up calling Two() — or maybe the list property has a custom setter which does something that calls One()).
One thread could be running One() while a different thread is running Two().  This could happen if your program launches a new thread directly, or a library or framework could do so.  For example, GUI frameworks tend to have one thread for dispatching events, and others for doing work that could take time; and web server frameworks tend to use different threads for servicing different requests.
If neither of those could apply, then there would be no opportunity for the functions to run simultaneously.
Do they need to be protected by locks?
If there's any possibility of them being run on multiple threads, then yes, they need to be protected somehow.
99.999% of the time, the code would do exactly what you'd expect; you'd either see the old list or the new one.  However, there's a tiny but non-zero chance that it would behave strangely — anything from giving slightly wrong results to crashing.  (The risk depends on things like the OS, CPU/cache topology, and how heavily loaded the system is.)
Explaining exactly why is hard, though, because at a low level the Java Virtual Machine⁽²⁾ does an awful lot of stuff that you don't see.  In particular, to improve performance it can re-order operations within certain limits, as long as the end result is the same — as seen from that thread.  Things may look very different from other threads — which can make it really hard to reason about multi-threaded code!
Let me try to describe one possible scenario…
Suppose Thread A is running One() on one CPU core, and Thread B is running Two() on another core, and that each core has its own cache memory.⁽³⁾
Thread B will create a List instance (holding references to strings from the constant pool), and assign it to the list property; both the object and the property are likely to be written to its cache first.  Those cache lines will then get flushed back to main memory — but there's no guarantee about when, nor about the order in which that happens.  Suppose the list reference gets flushed first; at that point, main memory will have the new list reference pointing to a fresh area of memory where the new object will go — but since the new object itself hasn't been flushed yet, who knows what's there now?
So if Thread A starts running One() at that precise moment, it will get the new list reference⁽⁴⁾, but when it tries to iterate through the list, it won't see the new strings.  It might see the initial (empty) state of the list object before it was constructed, or part-way through construction⁽⁵⁾.  (I don't know whether it's possible for it to see any of the values that were in those memory locations before the list was created; if so, those might represent an entirely different type of object, or even not a valid object at all, which would be likely to cause an exception or error of some kind.)
In any case, if multiple threads are involved, it's possible for one to see list holding neither the original list nor the new one.
So, if you want your code to be robust and not fail occasionally⁽⁶⁾, then you have to protect against such concurrency issues.
Using #Synchronized and #Volatile is traditional, as is using explicit locks.  (In this particular case, I think that making list volatile would fix the problem.)
But those low-level constructs are fiddly and hard to use well; luckily, in many situations there are better options.  The example in this question has been simplified too much to judge what might work well (that's the down-side of minimal examples!), but work queues, actors, executors, latches, semaphores, and of course Kotlin's coroutines are all useful abstractions for handling concurrency more safely.
Ultimately, concurrency is a hard topic, with a lot of gotchas and things that don't behave as you'd expect.
There are many source of further information, such as:
These other questions cover some of the issues.
Chapter 17: Threads And Locks from the Java Language Specification is the ultimate reference on how the JVM behaves.  In particular, it describes what's needed to ensure a happens-before relationship that will ensure full visibility.
Oracle has a tutorial on concurrency in Java; much of this applies to Kotlin too.
The java.util.concurrent package has many useful classes, and its summary discusses some of these issues.
Concurrent Programming In Java: Design Principles And Patterns by Doug Lea was at one time the best guide to handling concurrency, and these excerpts discuss the Java memory model.
Wikipedia also covers the Java memory model
(1) According to Kotlin coding conventions, function names should start with a lower-case letter; that makes them easier to distinguish from class/object names.
(2) In this answer I'm assuming Kotlin/JVM.  Similar risks are likely apply to other platforms too, though the details differ.
(3) This is of course a simplification; there may be multiple levels of caching, some of which may be shared between cores/processors; and some systems have hardware which tries to ensure that the caches are consistent…
(4) References themselves are atomic, so a thread will either see the old reference or the new one — it can't see a bit-pattern comprising parts of the old and new ones, pointing somewhere completely random.  So that's one problem we don't have!
(5) Although the reference is immutable, the object gets mutated during construction, so it might be in an inconsistent state.
(6) And the more heavily loaded your system is, the more likely it is for concurrency issues to occur, which means that things will probably fail at the worst possible time!

Perfomance in audio processing with objective-c ++

I'm currently developing an audio application and the performance is one of my main concerns.
There are really good articles like Four common mistakes in audio development or Real-time audio programming 101: time waits for nothing.
I understood that the c++ is the way to go for audio processing but I still have a question: Does Objective-C++ slow down the performance?
For example with a code like this
#implementation MyObjectiveC++Class
- (float*) objCMethodWithOnlyC++:(float*) input {
// Process in full c++ code here
}
#end
Will this code be less efficient than the same one in a cpp file?
Bonus question: What will happen if I use GrandCentralDispatch inside this method in order to to parallelize the process?
Calling an obj C method is slower than calling a pure c or c++ method as the obj C runtime is invoked at every call. If it matters in your case dependent on the number of samples processed in each call. If you are only processing one sample at a time, then this might be a problem. If you process a large buffer then I wouldn't worry too much.
The best thing to do is to profile it and then evaluate the results against your requirements for performance.
And for your bonus question the answer is somewhat the same. GCD comes at a cost, and if that cost is larger than what you gain by parallelising it, then it is not worth. so again it depends on the amount of work you plan to do per call.
Regards
Klaus
To simplify, in the end ObjC and C++ code goes through the same compile and optimize chain as C. So the performance characteristics of identical code inside an ObjC or C++ method are identical.
That said, calling ObjC or C++ methods has different performance characteristics. ObjC provides a dynamically modifiable, binary stable ABI with its methods. These additional guarantees (which are great if you are exposing methods via public API from e.g. a framework) come at a slight performance cost.
If you are doing lots of calls to the same method (e.g. per sample or per pixel), that tiny performance penalty may add up. The same applies to ivar access, which carries another load on top of how ivar access would be in C++ (on the modern runtime) to guarantee binary stability.
Similar considerations apply to GCD. GCD parallelizes operations, so you pay a penalty for thread switches (like with regular threads) and for each new block you dispatch. But the actual code in them runs at just the same speed as it would anywhere else. Also, different from starting your own threads, GCD will re-use threads from a pool, so you don't pay the overhead for creating new threads repeatedly.
It's a trade-off. Depending on what your code does and how it does it and how long individual operations take, either approach may be faster. You'll just have to profile and see.
The worst thing you can probably do is do things that don't need to be real-time (like updating your volume meter) in one of the real-time CoreAudio threads, or make calls that block.
PS - In most cases, performance differences will be negligible. So another aspect to focus on would be readability and easy maintenance of your code. E.g. using blocks can make code that has lots of asynchronicity easier to read because you can set up the blocks in the correct order in one method of your source file, making the flow clearer than if you split them across several methods that all just do a tiny thing and then start an asynchronous process.

Functional data-structures, OO notions of dispatched equality and comparison, StructuralEquality, and referential transparency

I have a very CPU intensive F# program that depends on persistent data-structures - about 40% of the total CPU time is spent in the Map module. So I thought I'd try out the PersistentHashMap in FSharpX collections. (BTW, this is already a big improvement over the previous version of F# in VS2013 where the same program spent 70% of its time in Map. I also notice that running programs with the debugger attached doesn't have the huge penalty it did before - good work guys...) There is also a hot-spot where I'm re-sorting all the time, where instead I should be adding to a Heap, so I thought I'd give that a go as well.
Two issue became immediately apparent:
(1) Swapping out one for the other from an interface perspective proved harder than it seems it should - I.e., making a shim that let me switch from a Map to a PersistentMap, preserving both the needed module-based let-bound functions and Types necessary to use the each map. I know that having full HM type-inference (and no type-classes) is orthogonal to LSP-style referential transparency for the most part - but maybe I was missing some way to do this better with a minimal amount of code.
(2) The biggest problem (which I'd like to focus on here) is the reliance of the F# functional data-structs on oo-style dispatched equality and comparison via the IComparison (when 't : comparison), etc., family of interfaces.
Even for OO programs ISTM that the idea of dispatching equality and comparison is a bad idea -- an object "knows" how to perform its own domain-specific tasks, but it doesn't "know" for the most part what notion of equality is going to be necessary at various points in the program for various purposes -- so equality/comparison should not be part of the object's interface, but when these concepts are needed, they should always be mentioned explicitly. For example, there should never be a .Sort(), only a .SortWith(...). One could argue that even something as basic as structural equality in F# could be explicit a.StructEq(b) or a ~= b - otherwise you always get object.Equals -- but even stipulating that doing things this way is the best for a multi-paradigm language that's a first-class .Net citizen, it seems like there should at least be the option of using passed-in comparison and equality functions, but this is not the case.
This means that: (a) type constraints are enforced even if you don't want them, causing ripples of broken inferred typing (and hundreds of wavy red lines with it being unclear where the actual "problem" is) and (b), that by implementing a notion of equality or comparison that makes one container type happy in one part of your program (and in my case I want to use the same container and item type with two different notions of ordering in two different places), it is likely to silently break (or cause inefficiency, if one subsumes the other) in other parts of the code that depended on the default/previous implementation.
The only way around this that I could think of is wrapping each item a adapter object using new...with object expression - but I really don't want to create so much garbage just to get the code to work.
So, ISTM that we could have a "pure" version of each persistent data struct that could be loaded if desired (even basics like List, etc.) that do not depend on dispatched equality/comparison/hashing and do not impose type constraints - all such needs should be via a passed in fn's at the time of the call. (Dispatched eq/cmp would be only for used for interop with BCL collections that don't accept delegates.) Then we could have a [EqCmpHashThrowNotImplemented] attribute, and I could be sure that there were no default operations happening at all, and I would feel better about the efficiency and predictability of my code. (And this also let's one change from a Record to a Class or visa-versa w/o worrying about any changes in behavior due to default implementations.) Again, this would be optional, but done by with a simple import. (Which does mean that each base core collection type would have to be broken out into its own module, which isn't really a bad idea anyway.)
If I've overlooked a better way to do things or there are some patterns people are using here, I'd be interested.

What does it really mean that a programming language is stackless?

According to this answer
https://stackoverflow.com/questions/551950/what-stackless-programming-languages-are-available/671296#671296
all of these programming languages are stackless
Stackless Python
PyPy
Lisp
Scheme
Tcl
Lua
Parrot VM
What does it really mean for them to be stackless? Does it mean they don't use a call stack? If they don't use a call stack, what do they use?
What does it really mean for them to be stackless? Does it mean they don't use a call stack?
Yes, that's about right.
If they don't use a call stack, what do they use?
The exact implementation will, of course, vary from language to language. In Stackless Python, there's a dispatcher which starts the Python interpreter using the topmost frame and its results. The interpreter processes opcodes as needed one at a time until it reaches a CALL_FUNCTION opcode, the signal that you're about to enter into a function. This causes the dispatcher to build a new frame with the relevant information and return to the dispatcher with the unwind flag. From there, the dispatcher begins anew, pointing the interpreter at the topmost frame.
Stackless languages eschew call stacks for a number of reasons, but in many cases it's used so that certain programming constructs become much easier to implement. The canonical one is continuations. Continuations are very powerful, very simple control structures that can represent any of the usual control structures you're probably already familiar with (while, do, if, switch, et cetera).
If that's confusing, you may want to try wrapping your head around the Wikipedia article, and in particular the cutesy continuation sandwich analogy:
Say you're in the kitchen in front of the refrigerator, thinking about a sandwich. You take a continuation right there and stick it in your pocket. Then you get some turkey and bread out of the refrigerator and make yourself a sandwich, which is now sitting on the counter. You invoke the continuation in your pocket, and you find yourself standing in front of the refrigerator again, thinking about a sandwich. But fortunately, there's a sandwich on the counter, and all the materials used to make it are gone. So you eat it.
They don't use a call stack, because they operate in continuation-passing style. If you're not familiar with tail call optimization, that's probably a good first step to understanding what this means.
To emulate the traditional call/return on this model, instead of pushing a return address and expecting the remainder of the frame to remain untouched the caller closes over the remainder of its code and any variables that are still needed (the rest are freed). It then performs a tail call to the callee, passing this continuation as an argument. When the callee "returns", it does so by calling this continuation, passing the return value as an argument to it.
As far as the above goes, it's merely a complicated way to do function calls. However, it generalizes very nicely to more complex scenarios:
exception/finally/etc blocks are very easily modeled - if you can pass one "return" continuation as an argument, you can pass 2 (or more) just as easily. lisp-y "condition handler" blocks (which may or may not return control to the caller) are also easy - pass a continuation for the remainder of this function, that might or might not be called.
Multiple return values are similarly made easy - pass several arguments on to the continuation.
Return temporaries/copying are no longer different than function argument passing. This often makes it easier to eliminate temporaries.
Tail recursion optimization is trivial - the caller simply passes on the "return" continuation it received rather than capturing a new one.