What's the minimum code required to make a NativeCall to the md_parse function in the md4c library? - raku

Note: This post is similar, but not quite the same as a more open-ended questions asked on Reddit: https://www.reddit.com/r/rakulang/comments/vvpikh/looking_for_guidance_on_getting_nativecall/
I'm trying to use the md4c c library to process a markdown file with its md_parse function. I'm having no success, and the program just quietly dies. I don't think I'm calling it with the right arguments.
Documentation for the function is here: https://github.com/mity/md4c/wiki/Embedding-Parser%3A-Calling-MD4C
I'd like to at least figure out the minimum amount of code needed to do this without error. This is my latest attempt, though I've tried many:
use v6.d;
use NativeCall;
sub md_parse(str, int32, Pointer is rw ) is native('md4c') returns int32 { * }
md_parse('hello', 5, Pointer.new());
say 'hi'; # this never gets printed

md4c is a SAX-like streaming parser that calls your functions when it encounters markdown elements. If you call it with an uninitialised Pointer, or with an uninitialised CStruct then the code will SEGV when the md4c library tries to call a null function pointer.
The README says:
The main provided function is md_parse(). It takes a text in the
Markdown syntax and a pointer to a structure which provides pointers
to several callback functions.
As md_parse() processes the input, it calls the callbacks (when
entering or leaving any Markdown block or span; and when outputting
any textual content of the document), allowing application to convert
it into another format or render it onto the screen.
The function signature of md_parse is:
int md_parse(const MD_CHAR* text, MD_SIZE size, const MD_PARSER* parser, void* userdata);
In order for md_parse() to work, you will need to:
define a native CStruct that matches the MD_PARSER type definition
create an instance of this CStruct
initialise all the function pointers with Raku functions that have the right function signature
call md_parse() with the initialised CStruct instance as the third parameter
The 4th parameter to md_parse() is void* userdata which is a pointer that you provide which gets passed back to you as the last parameter of each of the callback functions. My guess is that it's optional and if you pass a null value then you'll get called back with a null userdata parameter in each callback.
Followup
This turned into an interesting rabbit hole to fall down.
The code that makes it possible to pass a Raku sub as a callback parameter to a native function is quite complex and relies on MoarVM ops to build and cache the FFI callback trampoline. This is a piece of code that marshals the C calling convention parameters into a call that MoarVM can dispatch to a Raku sub.
It will be a sizeable task to implement equivalent functionality to provide some kind of nativecast that will generate the required callback trampoline and return a Pointer that can be assigned into a CStruct.
But we can cheat
We can use a simple C function to return the pointer to a generated callback trampoline as if it was for a normal callback sub. We can then store this pointer in our CStruct and our problem is solved. The generated trampoline is specific to the function signature of the Raku sub we want to call, so we need to generate a different NativeCall binding for each function signature we need.
The C function:
void* get_pointer(void* p)
{
return p;
}
Binding a NativeCall sub for the function signature we need:
sub get_enter_leave_fn(&func (uint32, Pointer, Pointer))
is native('./getpointer') is symbol('get_pointer') returns Pointer { * }
Initialising a CStruct attribute:
$!enter_block := get_enter_leave_fn(&enter_block);
Putting it all together:
use NativeCall;
enum BlockType < DOC QUOTE UL OL LI HR H CODE HTML P TABLE THEAD TBODY TR TH TD >;
enum SpanType < EM STRONG A IMG SPAN_CODE DEL SPAN_LATEXMATH LATEXMATH_DISPLAY WIKILINK SPAN_U >;
enum TextType < NORMAL NULLCHAR BR SOFTBR ENTITY TEXT_CODE TEXT_HTML TEXT_LATEXMATH >;
sub enter_block(uint32 $type, Pointer $detail, Pointer $userdata --> int32) {
say "enter block { BlockType($type) }";
}
sub leave_block(uint32 $type, Pointer $detail, Pointer $userdata --> int32) {
say "leave block { BlockType($type) }";
}
sub enter_span(uint32 $type, Pointer $detail, Pointer $userdata --> int32) {
say "enter span { SpanType($type) }";
}
sub leave_span(uint32 $type, Pointer $detail, Pointer $userdata --> int32) {
say "leave span { SpanType($type) }";
}
sub text(uint32 $type, str $text, uint32 $size, Pointer $userdata --> int32) {
say "text '{$text.substr(0..^$size)}'";
}
sub debug_log(str $msg, Pointer $userdata --> int32) {
note $msg;
}
#
# Cast functions that are specific to the required function signature.
#
# Makes use of a utility C function that returns its `void*` parameter, compiled
# into a shared library called libgetpointer.dylib (on MacOS)
#
# gcc -shared -o libgetpointer.dylib get_pointer.c
#
# void* get_pointer(void* p)
# {
# return p;
# }
#
# Each cast function uses NativeCall to build an FFI callback trampoline that gets
# cached in an MVMThreadContext. The generated callback code is specific to the
# function signature of the Raku function that will be called.
#
sub get_enter_leave_fn(&func (uint32, Pointer, Pointer))
is native('./getpointer') is symbol('get_pointer') returns Pointer { * }
sub get_text_fn(&func (uint32, str, uint32, Pointer))
is native('./getpointer') is symbol('get_pointer') returns Pointer { * }
sub get_debug_fn(&func (str, Pointer))
is native('./getpointer') is symbol('get_pointer') returns Pointer { * }
class MD_PARSER is repr('CStruct') {
has uint32 $!abi_version; # unsigned int abi_version
has uint32 $!flags; # unsigned int flags
has Pointer $!enter_block; # F:int ( )* enter_block
has Pointer $!leave_block; # F:int ( )* leave_block
has Pointer $!enter_span; # F:int ( )* enter_span
has Pointer $!leave_span; # F:int ( )* leave_span
has Pointer $!text; # F:int ( )* text
has Pointer $!debug_log; # F:void ( )* debug_log
has Pointer $!syntax; # F:void ( )* syntax
submethod TWEAK() {
$!abi_version = 0;
$!flags = 0;
$!enter_block := get_enter_leave_fn(&enter_block);
$!leave_block := get_enter_leave_fn(&leave_block);
$!enter_span := get_enter_leave_fn(&enter_span);
$!leave_span := get_enter_leave_fn(&leave_span);
$!text := get_text_fn(&text);
$!debug_log := get_debug_fn(&debug_log);
}
}
sub md_parse(str, uint32, MD_PARSER, Pointer is rw) is native('md4c') returns int { * }
my $parser = MD_PARSER.new;
my $md = '
# Heading
## Sub Heading
hello *world*
';
md_parse($md, $md.chars, $parser, Pointer.new);
The output:
./md4c.raku
enter block DOC
enter block H
text 'Heading'
leave block H
enter block H
text 'Sub Heading'
leave block H
enter block P
text 'hello '
enter span EM
text 'world'
leave span EM
leave block P
leave block DOC
In summary, it's possible. I'm not sure if I'm proud of this or horrified by it. I think a long-term solution will require refactoring the callback trampoline generator into a separate nqp op that can be exposed to Raku as a nativewrap style operation.

Related

In Perl 6, is there a way to get the Pod declarator block that is attached to a specific multi sub candidate?

Perl 6 has a cool feature which allows one to get any Pod declarator block that is attached to a subroutine (or class, role, etc.), using the WHY method:
#|(Some enlightening words about myfunc.)
sub myfunc (Int $i) { say "You provided an integer: $i"; };
#=(Some more words about myfunc.)
say &myfunc.WHY;
This displays:
Some enlightening words about myfunc.
Some more words about myfunc.
Unfortunately, when one has multiple candidates for a subroutine, one can't just invoke .WHY on the subroutine name:
#|(myfunc accepts an integer.)
multi myfunc (Int $i) { say "You provided an integer $i"; };
#|(myfunc accepts a string.)
multi myfunc (Str $s) { say "You provided a string $s"; };
say &myfunc.WHY;
The result:
No documentation available for type 'Sub'.
Perhaps it can be found at https://docs.perl6.org/type/Sub
Is there a way to get the Pod declarator block that is attached to a specific multi sub candidate? Is there a way to do so for all a subroutine's candidates?
You look up the multi with candidates or cando.
When initially posted I couldn't find a canned method for looking up a multi sub by signature but Christoph remedied that.
#| Initiate a specified spell normally
multi sub cast(Str $spell) {
say "casting spell $spell";
}
#= (do not use for class 7 spells)
#| Cast a heavy rock etc in irritation
multi sub cast(Str $heavy-item, Int $n) {
say "chucking $n heavy $heavy-item";
}
say "doc for cast spell";
say &cast.candidates[0].WHY;
say "doc for throwing rocks";
say &cast.candidates[1].WHY;
say "find doc for throwing things";
for &cast.candidates {
if .signature ~~ :( Str, Int ) {
say .WHY;
}
}
# more advanced
say &cast.cando(\(Str, Int))>>.WHY; # thanks to Christoph
&cast.candidates.first: { .signature ~~ :(Str, Int) } andthen .WHY.say;
OUTPUT:
doc for cast spell
Initiate a specified spell normally
(do not use for class 7 spells)
doc for throwing rocks
Cast a heavy rock etc in irritation
find doc for throwing things
Cast a heavy rock etc in irritation
... repeated for variants ...
Get all documentation via candidates:
&myfunc.candidates>>.WHY
Get documentation of narrowest matching candidate via cando:
&myfunc.cando(\(42)).first.WHY
This does not really answer your question, but tries to explain why using WHY on a multi does not work; it's mainly because it points to the proto of the multi
#|(my-multi-func accepts either an integer or a string)
proto my-multi-func (|) {*}
#|(myfunc accepts an integer.)
multi my-multi-func (Int $i) { say "You provided an integer $i"; };
#|(myfunc accepts a string.)
multi my-multi-func (Str $s) { say "You provided a string $s"; };
say "my-multi-func is a {&my-multi-func.perl} and does {&my-multi-func.WHY}";
I attach the {&my-multi-func.perl} here because that is what gave me the hint. If you don't define a proto, it returns
my-multi-func is a sub my-multi-func (;; Mu | is raw) { #`(Sub|59650976) ... }
, which is none of the defined multis, ergo the proto. Of course if you want to access those particular definitions of the candidates, #Christopher Bottoms answer is just perfect.
This is a little indirect, but ...
You can store each multi myfunc in a variable and call WHY on that variable, yet still call myfunc as before:
#!/bin/env perl6
#|(myfunc accepts an integer.)
my $func_int = multi myfunc (Int $i) { say "You provided an integer $i"; }
#=(More about Int version of myfunc)
#|(myfunc accepts a string.)
my $func_string = multi myfunc (Str $s) { say "You provided a string $s"; }
#=(More about Str version of myfunc)
myfunc(10); # myfunc works as normal
say $func_int.WHY; # show POD declarator block
say ''; # Blank line to separate output into two groups
myfunc("bar");
say $func_string.WHY;
Resulting in this output:
You provided an integer 10
myfunc accepts an integer.
More about Int version of myfunc
You provided a string bar
myfunc accepts a string.
More about Str version of myfunc
This is using Rakudo Star 2018.01 on CentOS 6.7.

How to use int[] type in Objective-C [duplicate]

I wrote a function containing array as argument,
and call it by passing value of array as follows.
void arraytest(int a[])
{
// changed the array a
a[0] = a[0] + a[1];
a[1] = a[0] - a[1];
a[0] = a[0] - a[1];
}
void main()
{
int arr[] = {1, 2};
printf("%d \t %d", arr[0], arr[1]);
arraytest(arr);
printf("\n After calling fun arr contains: %d\t %d", arr[0], arr[1]);
}
What I found is though I am calling arraytest() function by passing values, the original copy of int arr[] is changed.
Can you please explain why?
When passing an array as a parameter, this
void arraytest(int a[])
means exactly the same as
void arraytest(int *a)
so you are modifying the values in main.
For historical reasons, arrays are not first class citizens and cannot be passed by value.
For passing 2D (or higher multidimensional) arrays instead, see my other answers here:
How to pass a multidimensional [C-style] array to a function in C and C++, and here:
How to pass a multidimensional array to a function in C++ only, via std::vector<std::vector<int>>&
Passing 1D arrays as function parameters in C (and C++)
1. Standard array usage in C with natural type decay (adjustment) from array to ptr
#Bo Persson correctly states in his great answer here:
When passing an array as a parameter, this
void arraytest(int a[])
means exactly the same as
void arraytest(int *a)
Let me add some comments to add clarity to those two code snippets:
// param is array of ints; the arg passed automatically "adjusts" (frequently said
// informally as "decays") from `int []` (array of ints) to `int *`
// (ptr to int)
void arraytest(int a[])
// ptr to int
void arraytest(int *a)
However, let me add also that the above two forms also:
mean exactly the same as
// array of 0 ints; automatically adjusts (decays) from `int [0]`
// (array of zero ints) to `int *` (ptr to int)
void arraytest(int a[0])
which means exactly the same as
// array of 1 int; automatically adjusts (decays) from `int [1]`
// (array of 1 int) to `int *` (ptr to int)
void arraytest(int a[1])
which means exactly the same as
// array of 2 ints; automatically adjusts (decays) from `int [2]`
// (array of 2 ints) to `int *` (ptr to int)
void arraytest(int a[2])
which means exactly the same as
// array of 1000 ints; automatically adjusts (decays) from `int [1000]`
// (array of 1000 ints) to `int *` (ptr to int)
void arraytest(int a[1000])
etc.
In every single one of the array examples above, and as shown in the example calls in the code just below, the input parameter type adjusts (decays) to an int *, and can be called with no warnings and no errors, even with build options -Wall -Wextra -Werror turned on (see my repo here for details on these 3 build options), like this:
int array1[2];
int * array2 = array1;
// works fine because `array1` automatically decays from an array type
// to a pointer type: `int *`
arraytest(array1);
// works fine because `array2` is already an `int *`
arraytest(array2);
As a matter of fact, the "size" value ([0], [1], [2], [1000], etc.) inside the array parameter here is apparently just for aesthetic/self-documentation purposes, and can be any positive integer (size_t type I think) you want!
In practice, however, you should use it to specify the minimum size of the array you expect the function to receive, so that when writing code it's easy for you to track and verify. The MISRA-C-2012 standard (buy/download the 236-pg 2012-version PDF of the standard for £15.00 here) goes so far as to state (emphasis added):
Rule 17.5 The function argument corresponding to a parameter declared to have an array type shall have an appropriate number of elements.
...
If a parameter is declared as an array with a specified size, the corresponding argument in each function call should point into an object that has at least as many elements as the array.
...
The use of an array declarator for a function parameter specifies the function interface more clearly than using a pointer. The minimum number of elements expected by the function is explicitly stated, whereas this is not possible with a pointer.
In other words, they recommend using the explicit size format, even though the C standard technically doesn't enforce it--it at least helps clarify to you as a developer, and to others using the code, what size array the function is expecting you to pass in.
2. Forcing type safety on arrays in C
(Not recommended (correction: sometimes recommended, especially for fixed-size multi-dimensional arrays), but possible. See my brief argument against doing this at the end. Also, for my multi-dimensional-array [ex: 2D array] version of this, see my answer here.)
As #Winger Sendon points out in a comment below my answer, we can force C to treat an array type to be different based on the array size!
First, you must recognize that in my example just above, using the int array1[2]; like this: arraytest(array1); causes array1 to automatically decay into an int *. HOWEVER, if you take the address of array1 instead and call arraytest(&array1), you get completely different behavior! Now, it does NOT decay into an int *! This is because if you take the address of an array then you already have a pointer type, and pointer types do NOT adjust to other pointer types. Only array types adjust to pointer types. So instead, the type of &array1 is int (*)[2], which means "pointer to an array of size 2 of int", or "pointer to an array of size 2 of type int", or said also as "pointer to an array of 2 ints". So, you can FORCE C to check for type safety on an array by passing explicit pointers to arrays, like this:
// `a` is of type `int (*)[2]`, which means "pointer to array of 2 ints";
// since it is already a ptr, it can NOT automatically decay further
// to any other type of ptr
void arraytest(int (*a)[2])
{
// my function here
}
This syntax is hard to read, but similar to that of a function pointer. The online tool, cdecl, tells us that int (*a)[2] means: "declare a as pointer to array 2 of int" (pointer to array of 2 ints). Do NOT confuse this with the version withOUT parenthesis: int * a[2], which means: "declare a as array 2 of pointer to int" (AKA: array of 2 pointers to int, AKA: array of 2 int*s).
Now, this function REQUIRES you to call it with the address operator (&) like this, using as an input parameter a POINTER TO AN ARRAY OF THE CORRECT SIZE!:
int array1[2];
// ok, since the type of `array1` is `int (*)[2]` (ptr to array of
// 2 ints)
arraytest(&array1); // you must use the & operator here to prevent
// `array1` from otherwise automatically decaying
// into `int *`, which is the WRONG input type here!
This, however, will produce a warning:
int array1[2];
// WARNING! Wrong type since the type of `array1` decays to `int *`:
// main.c:32:15: warning: passing argument 1 of ‘arraytest’ from
// incompatible pointer type [-Wincompatible-pointer-types]
// main.c:22:6: note: expected ‘int (*)[2]’ but argument is of type ‘int *’
arraytest(array1); // (missing & operator)
You may test this code here.
To force the C compiler to turn this warning into an error, so that you MUST always call arraytest(&array1); using only an input array of the corrrect size and type (int array1[2]; in this case), add -Werror to your build options. If running the test code above on onlinegdb.com, do this by clicking the gear icon in the top-right and click on "Extra Compiler Flags" to type this option in. Now, this warning:
main.c:34:15: warning: passing argument 1 of ‘arraytest’ from incompatible pointer type [-Wincompatible-pointer-types]
main.c:24:6: note: expected ‘int (*)[2]’ but argument is of type ‘int *’
will turn into this build error:
main.c: In function ‘main’:
main.c:34:15: error: passing argument 1 of ‘arraytest’ from incompatible pointer type [-Werror=incompatible-pointer-types]
arraytest(array1); // warning!
^~~~~~
main.c:24:6: note: expected ‘int (*)[2]’ but argument is of type ‘int *’
void arraytest(int (*a)[2])
^~~~~~~~~
cc1: all warnings being treated as errors
Note that you can also create "type safe" pointers to arrays of a given size, like this:
int array[2]; // variable `array` is of type `int [2]`, or "array of 2 ints"
// `array_p` is a "type safe" ptr to array of size 2 of int; ie: its type
// is `int (*)[2]`, which can also be stated: "ptr to array of 2 ints"
int (*array_p)[2] = &array;
...but I do NOT necessarily recommend this (using these "type safe" arrays in C), as it reminds me a lot of the C++ antics used to force type safety everywhere, at the exceptionally high cost of language syntax complexity, verbosity, and difficulty architecting code, and which I dislike and have ranted about many times before (ex: see "My Thoughts on C++" here).
For additional tests and experimentation, see also the link just below.
References
See links above. Also:
My code experimentation online: https://onlinegdb.com/B1RsrBDFD
See also:
My answer on multi-dimensional arrays (ex: 2D arrays) which expounds upon the above, and uses the "type safety" approach for multi-dimensional arrays where it makes sense: How to pass a multidimensional array to a function in C and C++
If you want to pass a single-dimension array as an argument in a function, you would have to declare a formal parameter in one of following three ways and all three declaration methods produce similar results because each tells the compiler that an integer pointer is going to be received.
int func(int arr[], ...){
.
.
.
}
int func(int arr[SIZE], ...){
.
.
.
}
int func(int* arr, ...){
.
.
.
}
So, you are modifying the original values.
Thanks !!!
Passing a multidimensional array as argument to a function.
Passing an one dim array as argument is more or less trivial.
Let's take a look on more interesting case of passing a 2 dim array.
In C you can't use a pointer to pointer construct (int **) instead of 2 dim array.
Let's make an example:
void assignZeros(int(*arr)[5], const int rows) {
for (int i = 0; i < rows; i++) {
for (int j = 0; j < 5; j++) {
*(*(arr + i) + j) = 0;
// or equivalent assignment
arr[i][j] = 0;
}
}
Here I have specified a function that takes as first argument a pointer to an array of 5 integers.
I can pass as argument any 2 dim array that has 5 columns:
int arr1[1][5]
int arr1[2][5]
...
int arr1[20][5]
...
You may come to an idea to define a more general function that can accept any 2 dim array and change the function signature as follows:
void assignZeros(int ** arr, const int rows, const int cols) {
for (int i = 0; i < rows; i++) {
for (int j = 0; j < cols; j++) {
*(*(arr + i) + j) = 0;
}
}
}
This code would compile but you will get a runtime error when trying to assign the values in the same way as in the first function.
So in C a multidimensional arrays are not the same as pointers to pointers ... to pointers. An int(*arr)[5] is a pointer to array of 5 elements,
an int(*arr)[6] is a pointer to array of 6 elements, and they are a pointers to different types!
Well, how to define functions arguments for higher dimensions? Simple, we just follow the pattern!
Here is the same function adjusted to take an array of 3 dimensions:
void assignZeros2(int(*arr)[4][5], const int dim1, const int dim2, const int dim3) {
for (int i = 0; i < dim1; i++) {
for (int j = 0; j < dim2; j++) {
for (int k = 0; k < dim3; k++) {
*(*(*(arr + i) + j) + k) = 0;
// or equivalent assignment
arr[i][j][k] = 0;
}
}
}
}
How you would expect, it can take as argument any 3 dim arrays that have in the second dimensions 4 elements and in the third dimension 5 elements. Anything like this would be OK:
arr[1][4][5]
arr[2][4][5]
...
arr[10][4][5]
...
But we have to specify all dimensions sizes up to the first one.
You are not passing the array as copy. It is only a pointer pointing to the address where the first element of the array is in memory.
You are passing the address of the first element of the array
You are passing the value of the memory location of the first member of the array.
Therefore when you start modifying the array inside the function, you are modifying the original array.
Remember that a[1] is *(a+1).
Arrays in C are converted, in most of the cases, to a pointer to the first element of the array itself. And more in detail arrays passed into functions are always converted into pointers.
Here a quote from K&R2nd:
When an array name is passed to a function, what is passed is the
location of the initial element. Within the called function, this
argument is a local variable, and so an array name parameter is a
pointer, that is, a variable containing an address.
Writing:
void arraytest(int a[])
has the same meaning as writing:
void arraytest(int *a)
So despite you are not writing it explicitly it is as you are passing a pointer and so you are modifying the values in the main.
For more I really suggest reading this.
Moreover, you can find other answers on SO here
In C, except for a few special cases, an array reference always "decays" to a pointer to the first element of the array. Therefore, it isn't possible to pass an array "by value". An array in a function call will be passed to the function as a pointer, which is analogous to passing the array by reference.
EDIT: There are three such special cases where an array does not decay to a pointer to it's first element:
sizeof a is not the same as sizeof (&a[0]).
&a is not the same as &(&a[0]) (and not quite the same as &a[0]).
char b[] = "foo" is not the same as char b[] = &("foo").
Arrays are always passed by reference if you use a[] or *a:
int* printSquares(int a[], int size, int e[]) {
for(int i = 0; i < size; i++) {
e[i] = i * i;
}
return e;
}
int* printSquares(int *a, int size, int e[]) {
for(int i = 0; i < size; i++) {
e[i] = i * i;
}
return e;
}
An array can also be called as a decay pointer.
Usually when we put a variable name in the printf statement the value gets printed in case of an array it decays to the address of the first element, Therefore calling it as a decay pointer.
And we can only pass the decay pointer to a function.
Array as a formal parameter like Mr.Bo said int arr[] or int arr[10] is equivalent to the int *arr;
They will have there own 4 bytes of memory space and storing the decay pointer received.and we do pointer arithmetic on them.

What is the point of coercions like Int(Cool)?

The Perl 6 Web site on functions says
Coercion types can help you to have a specific type inside a routine, but accept wider input. When the routine is called, the argument is automatically converted to the narrower type.
sub double(Int(Cool) $x) {
2 * $x
}
say double '21'; # 42
say double Any; # Type check failed in binding $x; expected 'Cool' but got 'Any'
Here the Int is the target type to which the argument will be coerced, and Cool is the type that the routine accepts as input.
But what is the point for the sub? Isn't $x just an Int? Why would you restrict the caller to implement Cool for the argument?
I'm doubly confused by the example because Int already is Cool. So I did an example where the types don't share a hierarchy:
class Foo { method foomethod { say 'foomethod' } }
class Bar {}
class Quux is Foo {
# class Quux { # compile error
method Bar { Bar.new }
}
sub foo(Bar(Foo) $c) {
say $c.WHAT; # (Bar)
# $c.foomethod # fails if uncommented: Method 'foomethod' not found for invocant of class 'Bar'
}
foo(Quux.new)
Here the invocant of foo is restricted to provide a Foo that can be converted to a Bar but foo cannot even call a method of Foo on $c because its type is Bar. So why would foo care that the to-be-coerced type is a Foo in the first place?
Could someone shed some light on this? Links to appropriate documentation and parts of the spec are appreciated as well. I couldn't find anything useful there.
Update Having reviewed this answer today I've concluded I had completely misunderstood what #musiKk was getting at. This was revealed most clearly in #darch's question and #musiKk's response:
#darch: Or is your question why one might prefer Int(Cool) over Int(Any)? If that's the case, that would be the question to ask.
#musiKk: That is exactly my question. :)
Reviewing the many other answers I see none have addressed it the way I now think it warrants addressing.
I might be wrong of course so what I've decided to do is leave the original question as is, in particular leaving the title as is, and leave this answer as it was, and instead write a new answer addressing #darch's reformulation.
Specify parameter type, with no coercion: Int $x
We could declare:
sub double (Int $x) { ... } # Accept only Int. (No coercion.)
Then this would work:
double(42);
But unfortunately typing 42 in response to this:
double(prompt('')); # `prompt` returns the string the user types
causes the double call to fail with Type check failed in binding $x; expected Int but got Str ("42") because 42, while looking like a number, is technically a string of type Str, and we've asked for no coercion.
Specify parameter type, with blanket coercion: Int() $x
We can introduce blanket coercion of Any value in the sub's signature:
sub double (Int(Any) $x) { ... } # Take Any value. Coerce to an Int.
Or:
sub double (Int() $x) { ... } # Same -- `Int()` coerces from Any.
Now, if you type 42 when prompted by the double(prompt('')); statement, the run-time type-check failure no longer applies and instead the run-time attempts to coerce the string to an Int. If the user types a well-formed number the code just works. If they type 123abc the coercion will fail at run-time with a nice error message:
Cannot convert string to number: trailing characters after number in '123⏏abc'
One problem with blanket coercion of Any value is that code like this:
class City { ... } # City has no Int coercion
my City $city;
double($city);
fails at run-time with the message: "Method 'Int' not found for invocant of class 'City'".
Specify parameter type, with coercion from Cool values: Int(Cool) $x
We can choose a point of balance between no coercion and blanket coercion of Any value.
The best class to coerce from is often the Cool class, because Cool values are guaranteed to either coerce nicely to other basic types or generate a nice error message:
# Accept argument of type Cool or a subclass and coerce to Int:
sub double (Int(Cool) $x) { ... }
With this definition, the following:
double(42);
double(prompt(''));
works as nicely as it can, and:
double($city);
fails with "Type check failed in binding $x; expected Cool but got City (City)" which is arguably a little better diagnostically for the programmer than "Method 'Int' not found for invocant of class 'City'".
why would foo care that the to-be-coerced type is a Foo in the first place?
Hopefully it's now obvious that the only reason it's worth limiting the coerce-from-type to Foo is because that's a type expected to successfully coerce to a Bar value (or, perhaps, fail with a friendly message).
Could someone shed some light on this? Links to appropriate documentation and parts of the spec are appreciated as well. I couldn't find anything useful there.
The document you originally quoted is pretty much all there is for enduser doc. Hopefully it makes sense now and you're all set. If not please comment and we'll go from there.
What this does is accept a value that is a subtype of Cool, and tries to transform it into an Int. At that point it is an Int no matter what it was before.
So
sub double ( Int(Cool) $n ) { $n * 2 }
can really be thought of as ( I think this is how it was actually implemented in Rakudo )
# Int is a subtype of Cool otherwise it would be Any or Mu
proto sub double ( Cool $n ) {*}
# this has the interior parts that you write
multi sub double ( Int $n ) { $n * 2 }
# this is what the compiler writes for you
multi sub double ( Cool $n ) {
# calls the other multi since it is now an Int
samewith Int($n);
}
So this accepts any of Int, Str, Rat, FatRat, Num, Array, Hash, etc. and tries to convert it into an Int before calling &infix:<*> with it, and 2.
say double ' 5 '; # 25
say double 2.5; # 4
say double [0,0,0]; # 6
say double { a => 0, b => 0 }; # 4
You might restrict it to a Cool instead of Any as all Cool values are essentially required to provide a coercion to Int.
( :( Int(Any) $ ) can be shortened to just :( Int() $ ) )
The reason you might do this is that you need it to be an Int inside the sub because you are calling other code that does different things with different types.
sub example ( Int(Cool) $n ) returns Int {
other-multi( $n ) * $n;
}
multi sub other-multi ( Int $ ) { 10 }
multi sub other-multi ( Any $ ) { 1 }
say example 5; # 50
say example 4.5; # 40
In this particular case you could have written it as one of these
sub example ( Cool $n ) returns Int {
other-multi( Int($n) ) * Int($n);
}
sub example ( Cool $n ) returns Int {
my $temp = Int($n);
other-multi( $temp ) * $temp;
}
sub example ( Cool $n is copy ) returns Int {
$n = Int($n);
other-multi( $n ) * $n;
}
None of them are as clear as the one that uses the signature to coerce it for you.
Normally for such a simple function you can use one of these and it will probably do what you want.
my &double = * * 2; # WhateverCode
my &double = * × 2; # ditto
my &double = { $_ * 2 }; # bare block
my &double = { $^n * 2 }; # block with positional placeholder
my &double = -> $n { $n * 2 }; # pointy block
my &double = sub ( $n ) { $n * 2 } # anon sub
my &double = anon sub double ( $n ) { $n * 2 } # anon sub with name
my &double = &infix:<*>.assuming(*,2); # curried
my &double = &infix:<*>.assuming(2);
sub double ( $n ) { $n * 2 } # same as :( Any $n )
Am I missing something? I'm not a Perl 6 expert, but it appears the syntax allows one to specify independently both what input types are permissible and how the input will be presented to the function.
Restricting the allowable input is useful because it means the code will result in an error, rather than a silent (useless) type conversion when the function is called with a nonsensical parameter.
I don't think an example where the two types are not in a hierarchical relationship makes sense.
Per comments on the original question, a better version of #musiKk's question "What is the point of coercions like Int(Cool)?" turned out to be:
Why might one prefer Int(Cool) over Int(Any)?
A corollary, which I'll also address in this answer, is:
Why might one prefer Int(Any) over Int(Cool)?
First, a list of various related options:
sub _Int_strong (Int $) {} # Argument must be Int
sub _Int_cool (Int(Cool) $) {} # Argument must be Cool; Int invoked
sub _Int_weak (Int(Any) $) {} # Argument must be Any; Int invoked
sub _Int_weak2 (Int() $) {} # same
sub _Any (Any $) {} # Argument must be Any
sub _Any2 ( $) {} # same
sub _Mu (Mu $) {} # Weakest typing - just memory safe (Mu)
_Int_strong val; # Fails to bind if val is not an Int
_Int_cool val; # Fails to bind if val is not Cool. Int invoked.
_Int_weak val; # Fails to bind if val is not Any. Int invoked.
_Any val; # Fails to bind if val is Mu
_Mu val; # Will always bind. If val is a native value, boxes it.
Why might one prefer Int(Cool) over Int(Any)?
Because Int(Cool) is slightly stronger typing. The argument must be of type Cool rather than the broader Any and:
Static analysis will reject binding code written to pass an argument that isn't Cool to a routine whose corresponding parameter has the type constraint Int(Cool). If static analysis shows there is no other routine candidate able to accept the call then the compiler will reject it at compile time. This is one of the meanings of "strong typing" explained in the last section of this answer.
If a value is Cool then it is guaranteed to have a well behaved .Int conversion method. So it will not yield a Method not found error at run-time and can be relied on to provide a good error message if it fails to produce a converted to integer value.
Why might one prefer Int(Any) over Int(Cool)?
Because Int(Any) is slightly weaker typing in that the argument can be of any regular type and P6 will just try and make it work:
.Int will be called on an argument that's passed to a routine whose corresponding parameter has the type constraint Int(...) no matter what the ... is. Provided the passed argument has an .Int method the call and subsequent conversion has a chance of succeeding.
If the .Int fails then the error message will be whatever the .Int method produces. If the argument is actually Cool then the .Int method will produce a good error message if it fails to convert to an Int. Otherwise the .Int method is presumably not a built in one and the result will be pot luck.
Why Foo(Bar) in the first place?
And what's all this about weak and strong typing?
An Int(...) constraint on a function parameter is going to result in either:
A failure to type check; or
An.Int conversion of the corresponding argument that forces it to its integer value -- or fails, leaving the corresponding parameter containing a Failure.
Using Wikipedia definitions as they were at the time of writing this answer (2019) this type checking and attempted conversion will be:
strong typing in the sense that a type constraint like Int(...) is "use of programming language types in order to both capture invariants of the code, and ensure its correctness, and definitely exclude certain classes of programming errors";
Currently weak typing in Rakudo in the sense that Rakudo does not check the ... in Int(...) at compile time even though in theory it could. That is, sub double (Int $x) {}; double Date; yields a compile time error (Calling double(Date) will never work) whereas sub double (Int(Cool) $x) {}; double Date; yields a run time error (Type check failed in binding).
type conversion;
weak typing in the sense that it's implicit type conversion in the sense that the compiler will handle the .Int coercion as part of carrying out the call;
explicit type conversion in the sense that the Int(...) constraint is explicitly directing the compiler to do the conversion as part of binding a call;
checked explicit type conversion -- P6 only does type safe conversions/coercions.
I believe the answer is as simple as you may not want to restrict the argument to Int even though you will be treating it as Int within the sub. say for some reason you want to be able to multiply an Array by a Hash, but fail if the args can't be treated as Int (i.e. is not Cool).
my #a = 1,2,3;
my %h = 'a' => 1, 'b' => 2;
say #a.Int; # 3 (List types coerced to the equivalent of .elems when treated as Int)
say %h.Int; # 2
sub m1(Int $x, Int $y) {return $x * $y}
say m1(3,2); # 6
say m1(#a,%h); # does not match
sub m2(Int(Cool) $x, Int(Cool) $y) {return $x * $y}
say m2('3',2); # 6
say m2(#a,%h); # 6
say m2('foo',2); # does not match
of course, you could also do this without the signature because the math operation will coerce the type automatically:
sub m3($x,$y) {return $x * $y}
say m3(#a,%h); # 6
however, this defers your type check to the inside of the sub, which kind of defeats the purpose of a signature and prevents you from making the sub a multi
All subtypes of Cool will be (as Cool requires them to) coerced to an Int. So if an operator or routine internal to your sub only works with Int arguments, you don't have to add an extra statement/expression converting to an Int nor does that operator/routine's code need to account for other subtypes of Cool. It enforces that the argument will be an Int inside of your sub wherever you use it.
Your example is backwards:
class Foo { method foomethod { say 'foomethod' } }
class Bar {}
class Quux is Bar {
method Foo { Foo.new }
}
sub foo(Foo(Bar) $c) {
#= converts $c of type Bar to type Foo
#= returns result of foomethod
say $c.WHAT; #-> (Foo)
$c.foomethod #-> foomethod
}
foo(Quux.new)

Structure of a block declaration

When declaring a block what's the rationale behind using this syntax (i.e. surrounding brackets and caret on the left)?
(^myBlock)
For example:
int (^myBlock)(int) = ^(int num) {
return num * multiplier;
};
C BLOCKS: Syntax and Usage
Variables pointing to blocks take on the exact same syntax as variables pointing to functions, except * is substituted for ^. For example, this is a function pointer to a function taking an int and returning a float:
float (*myfuncptr)(int);
and this is a block pointer to a block taking an int and returning a float:
float (^myblockptr)(int);
As with function pointers, you'll likely want to typedef those types, as it can get relatively hairy otherwise. For example, a pointer to a block returning a block taking a block would be something like void (^(^myblockptr)(void (^)()))();, which is nigh impossible to read. A simple typedef later, and it's much simpler:
typedef void (^Block)();
Block (^myblockptr)(Block);
Declaring blocks themselves is where we get into the unknown, as it doesn't really look like C, although they resemble function declarations. Let's start with the basics:
myvar1 = ^ returntype (type arg1, type arg2, and so on) {
block contents;
like in a function;
return returnvalue;
};
This defines a block literal (from after = to and including }), explicitly mentions its return type, an argument list, the block body, a return statement, and assigns this literal to the variable myvar1.
A literal is a value that can be built at compile-time. An integer literal (The 3 in int a = 3;) and a string literal (The "foobar" in const char *b = "foobar";) are other examples of literals. The fact that a block declaration is a literal is important later when we get into memory management.
Finding a return statement in a block like this is vexing to some. Does it return from the enclosing function, you may ask? No, it returns a value that can be used by the caller of the block. See 'Calling blocks'. Note: If the block has multiple return statements, they must return the same type.
Finally, some parts of a block declaration are optional. These are:
The argument list. If the block takes no arguments, the argument list can be skipped entirely.
Examples:
myblock1 = ^ int (void) { return 3; }; // may be written as:
myblock2 = ^ int { return 3; }
The return type. If the block has no return statement, void is assumed. If the block has a return statement, the return type is inferred from it. This means you can almost always just skip the return type from the declaration, except in cases where it might be ambiguous.
Examples:
myblock3 = ^ void { printf("Hello.\n"); }; // may be written as:
myblock4 = ^ { printf("Hello.\n"); };
// Both succeed ONLY if myblock5 and myblock6 are of type int(^)(void)
myblock5 = ^ int { return 3; }; // can be written as:
myblock6 = ^ { return 3; };
source: http://thirdcog.eu/pwcblocks/
I think the rationale is that it looks like a function pointer:
void (*foo)(int);
Which should be familiar to any C programmer.

Dynamic/Static scope with Deep/Shallow binding (exercises)

I'm studying dynamic/static scope with deep/shallow binding and running code manually to see how these different scopes/bindings actually work. I read the theory and googled some example exercises and the ones I found are very simple (like this one which was very helpful with dynamic scoping) But I'm having trouble understanding how static scope works.
Here I post an exercise I did to check if I got the right solution:
considering the following program written in pseudocode:
int u = 42;
int v = 69;
int w = 17;
proc add( z:int )
u := v + u + z
proc bar( fun:proc )
int u := w;
fun(v)
proc foo( x:int, w:int )
int v := x;
bar(add)
main
foo(u,13)
print(u)
end;
What is printed to screen
a) using static scope? answer=180
b) using dynamic scope and deep binding? answer=69 (sum for u = 126 but it's foo's local v, right?)
c) using dynamic scope and shallow binding? answer=69 (sum for u = 101 but it's foo's local v, right?)
PS: I'm trying to practice doing some exercises like this if you know where I can find these types of problems (preferable with solutions) please give the link, thanks!
Your answer for lexical (static) scope is correct. Your answers for dynamic scope are wrong, but if I'm reading your explanations right, it's because you got confused between u and v, rather than because of any real misunderstanding about how deep and shallow binding work. (I'm assuming that your u/v confusion was just accidental, and not due to a strange confusion about values vs. references in the call to foo.)
a) using static scope? answer=180
Correct.
b) using dynamic scope and deep binding? answer=69 (sum for u = 126 but it's foo's local v, right?)
Your parenthetical explanation is right, but your answer is wrong: u is indeed set to 126, and foo indeed localizes v, but since main prints u, not v, the answer is 126.
c) using dynamic scope and shallow binding? answer=69 (sum for u = 101 but it's foo's local v, right?)
The sum for u is actually 97 (42+13+42), but since bar localizes u, the answer is 42. (Your parenthetical explanation is wrong for this one — you seem to have used the global variable w, which is 17, in interpreting the statement int u := w in the definition of bar; but that statement actually refers to foo's local variable w, its second parameter, which is 13. But that doesn't actually affect the answer. Your answer is wrong for this one only because main prints u, not v.)
For lexical scope, it's pretty easy to check your answers by translating the pseudo-code into a language with lexical scope. Likewise dynamic scope with shallow binding. (In fact, if you use Perl, you can test both ways almost at once, since it supports both; just use my for lexical scope, then do a find-and-replace to change it to local for dynamic scope. But even if you use, say, JavaScript for lexical scope and Bash for dynamic scope, it should be quick to test both.)
Dynamic scope with deep binding is much trickier, since few widely-deployed languages support it. If you use Perl, you can implement it manually by using a hash (an associative array) that maps from variable-names to scalar-refs, and passing this hash from function to function. Everywhere that the pseudocode declares a local variable, you save the existing scalar-reference in a Perl lexical variable, then put the new mapping in the hash; and at the end of the function, you restore the original scalar-reference. To support the binding, you create a wrapper function that creates a copy of the hash, and passes that to its wrapped function. Here is a dynamically-scoped, deeply-binding implementation of your program in Perl, using that approach:
#!/usr/bin/perl -w
use warnings;
use strict;
# Create a new scalar, initialize it to the specified value,
# and return a reference to it:
sub new_scalar($)
{ return \(shift); }
# Bind the specified procedure to the specified environment:
sub bind_proc(\%$)
{
my $V = { %{+shift} };
my $f = shift;
return sub { $f->($V, #_); };
}
my $V = {};
$V->{u} = new_scalar 42; # int u := 42
$V->{v} = new_scalar 69; # int v := 69
$V->{w} = new_scalar 17; # int w := 17
sub add(\%$)
{
my $V = shift;
my $z = $V->{z}; # save existing z
$V->{z} = new_scalar shift; # create & initialize new z
${$V->{u}} = ${$V->{v}} + ${$V->{u}} + ${$V->{z}};
$V->{z} = $z; # restore old z
}
sub bar(\%$)
{
my $V = shift;
my $fun = shift;
my $u = $V->{u}; # save existing u
$V->{u} = new_scalar ${$V->{w}}; # create & initialize new u
$fun->(${$V->{v}});
$V->{u} = $u; # restore old u
}
sub foo(\%$$)
{
my $V = shift;
my $x = $V->{x}; # save existing x
$V->{x} = new_scalar shift; # create & initialize new x
my $w = $V->{w}; # save existing w
$V->{w} = new_scalar shift; # create & initialize new w
my $v = $V->{v}; # save existing v
$V->{v} = new_scalar ${$V->{x}}; # create & initialize new v
bar %$V, bind_proc %$V, \&add;
$V->{v} = $v; # restore old v
$V->{w} = $w; # restore old w
$V->{x} = $x; # restore old x
}
foo %$V, ${$V->{u}}, 13;
print "${$V->{u}}\n";
__END__
and indeed it prints 126. It's obviously messy and error-prone, but it also really helps you understand what's going on, so for educational purposes I think it's worth it!
Simple and deep binding are Lisp interpreter viewpoints of the pseudocode. Scoping is just pointer arithmetic. Dynamic scope and static scope are the same if there are no free variables.
Static scope relies on a pointer to memory. Empty environments hold no symbol to value associations; denoted by word "End." Each time the interpreter reads an assignment, it makes space for association between a symbol and value.
The environment pointer is updated to point to the last association constructed.
env = End
env = [u,42] -> End
env = [v,69] -> [u,42] -> End
env = [w,17] -> [v,69] -> [u,42] -> End
Let me record this environment memory location as AAA. In my Lisp interpreter, when meeting a procedure, we take the environment pointer and put it our pocket.
env = [add,[closure,(lambda(z)(setq u (+ v u z)),*AAA*]]->[w,17]->[v,69]->[u,42]->End.
That's pretty much all there is until the procedure add is called. Interestingly, if add is never called, you just cost yourself a pointer.
Suppose the program calls add(8). OK, let's roll. The environment AAA is made current. Environment is ->[w,17]->[v,69]->[u,42]->End.
Procedure parameters of add are added to the front of the environment. The environment becomes [z,8]->[w,17]->[v,69]->[u,42]->End.
Now the procedure body of add is executed. Free variable v will have value 69. Free variable u will have value 42. z will have the value 8.
u := v + u + z
u will be assigned the value of 69 + 42 + 8 becomeing 119.
The environment will reflect this: [z,8]->[w,17]->[v,69]->[u,119]->End.
Assume procedure add has completed its task. Now the environment gets restored to its previous value.
env = [add,[closure,(lambda(z)(setq u (+ v u z)),*AAA*]]->[w,17]->[v,69]->[u,119]->End.
Notice how the procedure add has had a side effect of changing the value of free variable u. Awesome!
Regarding dynamic scoping: it just ensures closure leaves out dynamic symbols, thereby avoiding being captured and becoming dynamic.
Then put assignment to dynamic at top of code. If dynamic is same as parameter name, it gets masked by parameter value passed in.
Suppose I had a dynamic variable called z. When I called add(8), z would have been set to 8 regardless of what I wanted. That's probably why dynamic variables have longer names.
Rumour has it that dynamic variables are useful for things like backtracking, using let Lisp constructs.
Static binding, also known as lexical scope, refers to the scoping mechanism found in most modern languages.
In "lexical scope", the final value for u is neither 180 or 119, which are wrong answers.
The correct answer is u=101.
Please see standard Perl code below to understand why.
use strict;
use warnings;
my $u = 42;
my $v = 69;
my $w = 17;
sub add {
my $z = shift;
$u = $v + $u + $z;
}
sub bar {
my $fun = shift;
$u = $w;
$fun->($v);
}
sub foo {
my ($x, $w) = #_;
$v = $x;
bar( \&add );
}
foo($u,13);
print "u: $u\n";
Regarding shallow binding versus deep binding, both mechanisms date from the former LISP era.
Both mechanisms are meant to achieve dynamic binding (versus lexical scope binding) and therefore they produce identical results !
The differences between shallow binding and deep binding do not reside in semantics, which are identical, but in the implementation of dynamic binding.
With deep binding, variable bindings are set within a stack as "varname => varvalue" pairs.
The value of a given variable is retrieved from traversing the stack from top to bottom until a binding for the given variable is found.
Updating the variable consists in finding the binding in the stack and updating the associated value.
On entering a subroutine, a new binding for each actual parameter is pushed onto the stack, potentially hiding an older binding which is therefore no longer accessible wrt the retrieving mechanism described above (that stops at the 1st retrieved binding).
On leaving the subroutine, bindings for these parameters are simply popped from the binding stack, thus re-enabling access to the former bindings.
Please see the the code below for a Perl implementation of deep-binding dynamic scope.
use strict;
use warnings;
use utf8;
##
# Dynamic-scope deep-binding implementation
my #stack = ();
sub bindv {
my ($varname, $varval);
unshift #stack, [ $varname => $varval ]
while ($varname, $varval) = splice #_, 0, 2;
return $varval;
}
sub unbindv {
my $n = shift || 1;
shift #stack while $n-- > 0;
}
sub getv {
my $varname = shift;
for (my $i=0; $i < #stack; $i++) {
return $stack[$i][1]
if $varname eq $stack[$i][0];
}
return undef;
}
sub setv {
my ($varname, $varval) = #_;
for (my $i=0; $i < #stack; $i++) {
return $stack[$i][1] = $varval
if $varname eq $stack[$i][0];
}
return bindv($varname, $varval);
}
##
# EXERCICE
bindv( u => 42,
v => 69,
w => 17,
);
sub add {
bindv(z => shift);
setv(u => getv('v')
+ getv('u')
+ getv('z')
);
unbindv();
}
sub bar {
bindv(fun => shift);
setv(u => getv('w'));
getv('fun')->(getv('v'));
unbindv();
}
sub foo {
bindv(x => shift,
w => shift,
);
setv(v => getv('x'));
bar( \&add );
unbindv(2);
}
foo( getv('u'), 13);
print "u: ", getv('u'), "\n";
The result is u=97
Nevertheless, this constant traversal of the binding stack is costly : 0(n) complexity !
Shallow binding brings a wonderful O(1) enhanced performance over the previous implementation !
Shallow binding is improving the former mechanism by assigning each variable its own "cell", storing the value of the variable within the cell.
The value of a given variable is simply retrieved from the variable's
cell (using a hash table on variable names, we achieve a
0(1) complexity for accessing variable's values!)
Updating the variable's value is simply storing the value into the
variable's cell.
Creating a new binding (entering subs) works by pushing the old value
of the variable (a previous binding) onto the stack, and storing the
new local value in the value cell.
Eliminating a binding (leaving subs) works by popping the old value
off the stack into the variable's value cell.
Please see the the code below for a trivial Perl implementation of shallow-binding dynamic scope.
use strict;
use warnings;
our $u = 42;
our $v = 69;
our $w = 17;
our $z;
our $fun;
our $x;
sub add {
local $z = shift;
$u = $v + $u + $z;
}
sub bar {
local $fun = shift;
$u = $w;
$fun->($v);
}
sub foo {
local $x = shift;
local $w = shift;
$v = $x;
bar( \&add );
}
foo($u,13);
print "u: $u\n";
As you shall see, the result is still u=97
As a conclusion, remember two things :
shallow binding produces the same results as deep binding, but runs faster, since there is never a need to search for a binding.
The problem is not shallow binding versus deep binding versus
static binding BUT lexical scope versus dynamic scope (implemented either with deep or shallow binding).