What justifies the use of coroutines instead of subroutines?

I am currently exploring coroutines in Python, but I have difficulty determining when to use them instead of normal subroutines. Let me explain my problem with an example.
The task is to iterate over tabular data and format each row's content according to its entry in the type field. Once formatted, the result is written to a common output file.
index | type | time | content
------+------+------+--------
  0   |  A   |   4  | ...
  1   |  B   |   6  | ...
  2   |  C   |   9  | ...
  3   |  B   |  11  | ...
 ...
Normally, I would check the type, dispatch on it with some sort of switch/case, and delegate the data to a specific subroutine (i.e., a plain function), like so:
outfile = open('test.txt', 'w')
for row in infile:                      # infile yields the parsed rows of the table
    if row.type == 'A':
        format_a(row.content, outfile)  # subroutine that formats and writes data of type A
    elif row.type == 'B':
        format_b(row.content, outfile)  # same for type B ...
    elif row.type == 'C':
        format_c(row.content, outfile)  # ... and type C
    else:
        raise ValueError(f'unknown type {row.type}')  # handle unknown type
outfile.close()
The question is: would I gain anything by implementing this with coroutines? I don't think so, and let me explain why: once I have determined the type of a row, I pass its content to the respective coroutine. While that coroutine runs, the other coroutines and the calling function are paused. The coroutine formats the data, writes it to the file, and passes control back to the calling function, which fetches the next row, and so on. This repeats until I run out of rows. So the workflow looks exactly like the one using subroutines.
One argument for using coroutines here would be the need to keep track of some state. Say I am interested in the time difference to the previous row of the same type. In that case, the coroutine for B would save the time of its first call (6). When it is called the second time with the new value (11), it can compute the difference (11 - 6 = 5). This would be much harder to do with subroutines.
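A minimal sketch of what I have in mind, using one generator-based coroutine per type (Row and the sample data below just mimic the table above):
from collections import namedtuple

Row = namedtuple("Row", "index type time content")

def time_delta_tracker(label):
    """Coroutine that remembers the time of the previous row of one type."""
    last_time = None
    while True:
        row = yield                      # suspended here until .send(row)
        if last_time is not None:
            print(f"{label}: delta = {row.time - last_time}")
        last_time = row.time

rows = [Row(0, "A", 4, "..."), Row(1, "B", 6, "..."),
        Row(2, "C", 9, "..."), Row(3, "B", 11, "...")]

trackers = {t: time_delta_tracker(t) for t in "ABC"}
for tracker in trackers.values():
    next(tracker)                        # prime each coroutine

for row in rows:
    trackers[row.type].send(row)         # row 3 prints "B: delta = 5"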
But is keeping track of some state the only argument for using coroutines? I am looking for a rule of thumb, not a rule that covers every possible case.

Related

How to make Idris warn of incomplete cases/matches?

For example when implementing a Show instance for the following type:
data Shape = Circle Double
           | Box Vector2D
           | Polygon (List Vector2D)
           | Chain (List Vector2D)
...and omitting the Chain case, Idris will successfully type check this file.
Similar issues arise when implementing other functions.
Adding %default total at the beginning of the file doesn't seem to help this, but my impression was that it should.
With this:
%default total

data DataType = A | B | C

Show DataType where
  show A = "A"
I get
$ idris Testme.idr
Type checking ./Testme.idr
Testme.idr:5:1-13:
  |
5 | Show DataType where
  | ~~~~~~~~~~~~~
Main.DataType implementation of Prelude.Show.Show is possibly not total due to: Prelude.Show.Main.DataType implementation of Prelude.Show.Show, method show
Please note that you will not see this kind of warning in Atom; it does not seem to be part of the compiler/editor protocol.
There is an option you can call idris with:
--total               Require functions to be total by default
I'm still not sure if this is the only way, why %default total seems to do nothing, and whether it's possible to only get a warning when a function is not total.

Menhir - associate AST nodes with token locations in the source file

I am using Menhir to parse a DSL. My parser builds an AST using an elaborate collection of nested types. During typechecking and other later passes, error reports are generated for the user, and I would like them to refer to the source-file position where the problem occurred. These are not parsing errors; they are generated after parsing is complete.
A naive solution would be to equip all AST types with additional location information, but that would make working with them (e.g. constructing or matching) unnecessarily clumsy. What are the established practices for doing this?
I don't know if it's a best practice, but I like the approach taken in the abstract syntax tree of the Frama-C system; see https://github.com/Frama-C/Frama-C-snapshot/blob/master/src/kernel_services/ast_data/cil_types.mli
This approach uses "layers" of records and algebraic types nested in each other. The records hold meta-information like source locations, as well as the algebraic "node" you can match on.
For example, here is a part of the representation of expressions:
type ...

and exp = {
  eid: int;           (** unique identifier *)
  enode: exp_node;    (** the expression itself *)
  eloc: location;     (** location of the expression. *)
}

and exp_node =
  | Const of constant   (** Constant *)
  | Lval of lval        (** Lvalue *)
  | UnOp of unop * exp * typ
  | BinOp of binop * exp * exp * typ
  ...
So given a variable e of type exp, you can access its source location with e.eloc, and pattern match on its abstract syntax tree in e.enode.
So simple, "top-level" matches on syntax are very easy:
let rec is_const_expr e =
  match e.enode with
  | Const _ -> true
  | Lval _ -> false
  | UnOp (_op, e', _typ) -> is_const_expr e'
  | BinOp (_op, l, r, _typ) -> is_const_expr l && is_const_expr r
To match deeper in an expression, you have to go through a record at each level. This adds some syntactic clutter, but not too much, as you can pattern match on only the one record field that interests you:
let optimize_double_negation e =
  match e.enode with
  | UnOp (Neg, { enode = UnOp (Neg, e', _) }, _) -> e'
  | _ -> e
For comparison, on a pure AST without metadata, this would be something like:
let optimize_double_negation e =
  match e with
  | UnOp (Neg, UnOp (Neg, e', _), _) -> e'
  | _ -> e
I find that Frama-C's approach works well in practice.
You need to somehow attach the location information to your nodes. The usual solution is to encode your AST node as a record, e.g.,
type node =
  | Typedef of typdef
  | Typeexp of typeexp
  | Literal of string
  | Constant of int
  | ...

type annotated_node = { node : node; loc : loc }
Since you're using records, you can still pattern match without too much syntactic overhead, e.g.,
match node with
| {node=Typedef t} -> pp_typedef t
| ...
Depending on your representation, you may choose between wrapping each branch of your type individually, wrapping the whole type, or doing it recursively, as in the Frama-C example by Isabelle Newbie.
A similar but more general approach is to extend a node not with the location but just with a unique identifier, and to use a finite map to add arbitrary data to nodes. The benefit of this approach is that, since node attributes are externalized, you can extend your nodes with arbitrary data. The drawback is that you can't actually guarantee the totality of an attribute, since finite maps are not total. Thus it is harder to preserve an invariant that, for example, all nodes have a location.
Since every heap-allocated object already has an implicit unique identifier, its address, it is possible to attach data to heap-allocated objects without actually wrapping them in another type. For example, we can still use type node as it is and use finite maps to attach arbitrary pieces of information to nodes, as long as each node is a heap object, i.e., the node definition doesn't contain constant constructors (if it does, you can work around that by adding a bogus unit value, e.g., | End can be represented as | End of unit).
Of course, by saying an address, I do not literally mean the physical or virtual address of an object. OCaml uses a moving GC so an actual address of an OCaml object may change during a program execution. Moreover, an address, in general, is not unique, as once an object is deallocated its address can be grabbed by a completely different entity.
Fortunately, since ephemerons were added in recent versions of OCaml, this is no longer a problem. Moreover, an ephemeron plays nicely with the GC, so that if a node is no longer reachable, its attributes (like file locations) will be collected by the GC. So, let's ground this with a concrete example. Suppose we have two nodes c1 and c2:
let c1 = Literal "hello"
let c2 = Constant 42
Now we can create a location mapping from nodes to locations (we will represent the latter as just strings)
module Locations = Ephemeron.K1.Make(struct
  type t = node
  let hash = Hashtbl.hash   (* or your own hash if you have one *)
  let equal = (=)           (* or a specialized equal operator *)
end)
The Locations module provides the interface of a typical imperative hash table. So let's use it. In the parser, whenever you create a new node you should register its location in the global locations value, e.g.,
let locations = Locations.create 1337

(* somewhere in the semantic actions, where c1 and c2 are created *)
Locations.add locations c1 "hello.ml:12:32"
Locations.add locations c2 "hello.ml:13:56"
And later, you can extract the location:
# Locations.find locations c1;;
- : string = "hello.ml:12:32"
As you can see, although this solution is nice in the sense that it doesn't touch the node data type, so the rest of your code can pattern match on it easily, it is still a little bit dirty, as it requires global mutable state that is hard to maintain. Also, since we are using an object's address as the key, every newly created object, even if it was logically derived from the original object, will have a different identity. For example, suppose you have a function that normalizes all literals:
let normalize = function
| Literal str -> Literal (normalize_literal str)
| node -> node
It will create a new Literal node from the original node, so all the location information will be lost. That means you need to update the location information every time you derive one node from another.
Another issue with ephemerons is that they don't survive marshaling or serialization, i.e., if you store your AST in a file and then restore it, all nodes will lose their identity and the location table will become empty.
Speaking of the "monadic approach" that you mentioned in the comments: though monads are magic, they still can't magically solve all problems; they are not silver bullets :) In order to attach something to a node we still need to extend it with an extra attribute - either the location information directly, or an identity through which we can attach properties indirectly. The monad can be useful for the latter, though: instead of having a global reference to the last assigned identifier, we can use a state monad to encapsulate our id generator. And for the sake of completeness, instead of using a state monad or a global reference to generate unique identifiers, you can use UUIDs and get identifiers that are not only unique within one program run but also universally unique, in the sense that no other object in the world has the same identifier, no matter how often you run your program (in a sane world). And although it looks like generating a UUID doesn't use any state, under the hood it still uses an imperative random number generator, so it is sort of cheating, but it can still be seen as purely functional, as it has no observable effects.

Block types in GNU Radio

I am still learning GNU Radio and I have trouble understanding something about signal processing block types. I understand that if I create a block taking, let's say, 2 samples at the input and producing 4 samples at the output, it will be an interpolator with a factor of 2.
But now I would like to create a block which will be a framer. So it will have two inputs and one output. The block will receive n samples from the first input, then take m samples from the second input, append them to the samples received from input one, and then output them. In this case, my samples are supposed to be bytes.
How should I proceed in this case? Am I taking the right path? Does anyone know how to proceed with this type of scenario?
Your case (input 0 and input 1 having different relative rates to the output) is not covered by the sync_block/interpolator/decimator "templates" that GNU Radio has, so you have to use the general block approach.
Assuming you're familiar with gr_modtool¹, you can use it to add things like interpolators (relative rate >1), decimators (<1) and sync (=1) blocks:
-t BLOCK_TYPE, --block-type=BLOCK_TYPE
                      One of sink, source, sync, decimator, interpolator,
                      general, tagged_stream, hier, noblock.
But also note the general type. Using that, you can implement a block that doesn't have any restrictions on the relation between input and output. That means that
- you will have to manually consume() items from the inputs, because the number of items you took from the input can no longer be derived from the number of output items, and
- you will have to implement a forecast method to tell the GNU Radio scheduler how many items you'll need for a given output.
gr_modtool will give you a stub where you'll only have to add the right code; a rough sketch of what such a block can look like follows below.
¹ If you're not: it's introduced in the GNU Radio Guided Tutorials, part 3 or so, something that I think will be a quick and fun read for you.
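For illustration, here is a minimal sketch of such a general block in Python, using the method signatures gr_modtool generates for GNU Radio 3.7/3.8. The block name and the fixed counts n and m are assumptions for this example, not something the question prescribes:
import numpy as np
from gnuradio import gr

class framer(gr.basic_block):
    """Take n bytes from input 0, then m bytes from input 1, and emit them back to back."""

    def __init__(self, n=4, m=2):
        gr.basic_block.__init__(self,
                                name="framer",
                                in_sig=[np.uint8, np.uint8],
                                out_sig=[np.uint8])
        self.n = n
        self.m = m

    def forecast(self, noutput_items, ninput_items_required):
        # Tell the scheduler how many items each input must offer
        # before general_work can produce noutput_items.
        frames = max(1, noutput_items // (self.n + self.m))
        ninput_items_required[0] = frames * self.n
        ninput_items_required[1] = frames * self.m

    def general_work(self, input_items, output_items):
        in0, in1, out = input_items[0], input_items[1], output_items[0]
        frames = min(len(in0) // self.n, len(in1) // self.m,
                     len(out) // (self.n + self.m))
        if frames == 0:
            return 0
        produced = 0
        for k in range(frames):
            out[produced:produced + self.n] = in0[k * self.n:(k + 1) * self.n]
            produced += self.n
            out[produced:produced + self.m] = in1[k * self.m:(k + 1) * self.m]
            produced += self.m
        # Consume the inputs explicitly; with unequal rates the scheduler
        # cannot derive this from the number of produced output items.
        self.consume(0, frames * self.n)
        self.consume(1, frames * self.m)
        return produced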
Considering that the question was asked 4 years ago and that there have been many changes in GNU Radio since then, I want to add that this is now possible to do with the Patterned Interleaver block.
[image: the Patterned Interleaver block]
This block works the following way: it receives inputs on the ports to the left and outputs a single interleaved stream on the port to the right. So let's imagine a block with 2 inputs, V1 and V2:
V1 = [0,1,0,0,1,1]
V2 = [1,1,1,0,1,0]
Suppose we want the output to be the first 2 bits of V1 followed by the first 2 bits of V2 followed by the next 2 bits of V1 and then the next 2 bits of V2 and so on...that is, we want the output to be
Vo = [0,1,1,1,0,0,1,0,1,1,1,0].
In order to accomplish this we go to the properties of the Patterned Interleaver block which looks like this:
patterned_interleaver_properties
The Pattern field allows us to control the order in which the bits from the input ports are interleaved. By default it is [0,0,1,1], meaning that the block will take 2 bits from input port 0 followed by 2 bits from input port 1. The corresponding output will be
[0,1,1,1,0,0,1,0,1,1,1,0],
that is, the first 2 bits of V1 followed by the first 2 bits of V2 and then the next 2 bits of V1, etc.
Let's see another example. If the Pattern field is set to [0,0,1,1,1,0], the output will be 2 bits from input port 0, followed by 3 bits from input port 1, and then 1 bit from input port 0. In the output we will obtain [0,1,1,1,1,0,0,1,0,1,0,1].
Lastly, the Pattern field is also used to determine the number of input ports. If the Pattern field is [0,0,1,2] we will see that another input port is added to the block.
[image: the Patterned Interleaver block with three input ports]
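If you want to check the behaviour from Python rather than GRC, a small flowgraph along these lines should reproduce the first example. The class name blocks.patterned_interleaver_bb is my assumption for the byte variant of the GRC block; verify it against your GNU Radio version:
from gnuradio import gr, blocks

V1 = [0, 1, 0, 0, 1, 1]
V2 = [1, 1, 1, 0, 1, 0]

tb = gr.top_block()
src0 = blocks.vector_source_b(V1, False)
src1 = blocks.vector_source_b(V2, False)
interleave = blocks.patterned_interleaver_bb([0, 0, 1, 1])  # 2 items from port 0, then 2 from port 1
sink = blocks.vector_sink_b()

tb.connect(src0, (interleave, 0))
tb.connect(src1, (interleave, 1))
tb.connect(interleave, sink)
tb.run()

print(list(sink.data()))  # expected: [0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]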

Dynamically rewriting n nodes from a list in ANTLR 3

I am using ANTLR 3 to parse and rewrite Answer-Set Programs (ASP). What I want to do is parse an ASP program and output an AST with some rewriting. I can easily add and remove nodes to/from the AST but what I need to do is add nodes to the root dynamically (effectively, adding new rules to the ASP program). Which nodes to add and how many is based on the input ASP program.
Below I have an example from my lexer and parser which outputs an AST. r_rule returns a LinkedHashMap that is filled based on what it matches. For each member of the LinkedHashMap, in the rewrite for r_program I want to add a new node to the root node PROGRAM. However, I cannot seem to find a way to iterate through the LinkedHashMap and add new nodes.
@members {
    int rID = 0;
}

r_program
    : (a=r_rule)* -> ^(PROGRAM r_rule*)
    ;

r_rule returns [LinkedHashMap<String, String> somehm]
@init {
    $somehm = new LinkedHashMap<String, String>();
    String strrID = Integer.toString(++rID);
}
    : (head=r_head) ':-' body=r_body[strrID] {$vartypes.putAll($body.vartypes);}
      -> ^(LIMPL $head ^(EXTENSION ^(NUMBER[strrID] $head)) $body)
    ;
I can use a semantic predicate but only to check a property of the LinkedHashMap. I can arbitrarily loop through the HashMap with inserted code, but I can't then, for each iteration, add child nodes or trigger a rewrite. The code generated is in fact put in the wrong place to even do this in an ugly way using Java (I can't access the root node PARENT).
What can I do about this? A completely different approach is also welcome. Many thanks!
Update 1
An example input is:
head_pred(X, Y, Z) :- body_1(X), body_1(Y), body_1(Z).
An example AST is, apologies for the drawing (n.b. strictly an example used for readability, in reality many more nodes are used in the rewrite)...
PROGRAM
|
|____:-
| |____head_pred(X, Y, X)
| |____body_1(X)
| |____body_1(Y)
| |____body_1(Z)
| |____X == Y
|
|____:-
| |____head_pred(X, Y, X)
| |____body_1(X)
| |____body_1(Y)
| |____body_1(Z)
| |____X == Z
I could go on; the idea is that each rule binds the variables differently, if they can be bound. Different inputs change the number and content of the children of PROGRAM.

How does multiple assignment work?

From Section 4.1 of Programming in Lua.
In a multiple assignment, Lua first evaluates all values and only then
executes the assignments. Therefore, we can use a multiple assignment
to swap two values, as in
x, y = y, x -- swap 'x' for 'y'
How does the assignment actually work?
How multiple assignment gets implemented depends on what implementation of Lua you are using. The implementation is free to do things anyway it likes as long as it preserves the semantics. That is, no matter how things get implemented, you should get the same result as if you had saved all the values in the RHS before assigning them to the LHS, as the Lua book explains.
If you are still curious about the actual implementation, one thing you can do is see what is the bytecode that gets produced for a certain program. For example, taking the following program
local x,y = 10, 11
x,y = y,x
and passing it to the bytecode compiler (luac -l) for Lua 5.2 gives
main <lop.lua:0,0> (6 instructions at 0x9b36b50)
0+ params, 3 slots, 1 upvalue, 2 locals, 2 constants, 0 functions
        1   [1]   LOADK    0 -1   ; 10
        2   [1]   LOADK    1 -2   ; 11
        3   [2]   MOVE     2 1
        4   [2]   MOVE     1 0
        5   [2]   MOVE     0 2
        6   [2]   RETURN   0 1
The MOVE opcode assigns the value in the right (source) register to the left (destination) register (see lopcodes.h in the Lua source for more details). Apparently, what is going on is that registers 0 and 1 are being used for x and y, and register 2 is being used as a temporary extra slot. x and y get initialized with constants in the first two opcodes, and in the next three opcodes a swap is performed using the "temporary" slot 2, kind of like you would do by hand:
tmp = y -- MOVE 2 1
y = x -- MOVE 1 0
x = tmp -- MOVE 0 2
Given how Lua used a different approach when doing a swapping assignment and a static initialization, I wouldn't be surprised if you got different results for different kinds of multiple assignments (setting table fields is probably going to look very different, specially since then the order should matter due to metamethods...). We would need to find the part in the source where the bytecode gets emitted to be 100% sure though. And as I mentioned before, all of this might vary between Lua versions and implementations, specially if you look at LuaJIT vs PUC Lua.