Raku grammar with custom lexer - raku

Is there a possibility to define a custom lexer for a raku grammar, i.e. one that
converts a string to a stream of int id + value?
I was playing around with the grammar construct.
Rules seem
intuitive as they are probably converted to functions in a recursie descent
parser. However tokens and regex I would expect to be able to be broken out with explicit token-ids and an interface to map these to a name so that I can write my own lexer?

Raku grammars are a form of scannerless parsing, where the lexical structure and parse structure are specified together.
While it's true that rules form a recursive descent parser, that's only half of the story. When protoregexes or alternations (the | kind, not the || kind) are used, the declarative prefixes of these are gathered and a NFA is formed. It is then used to determine which of the alternation branches should be explored, if any; if there are multiple, they are ranked according to longest first, with longest literal and inheritance depth used as a tie-breaker.
Forming the declarative prefix involves looking down through the subrule calls to find lexical elements - effectively, then, the tokens. Thus, we could say that Raku grammars derive the tokenizer (actually many tokenizers) for us. These are typically generally generated at compile time, however, for things like custom operators, which are done by mixing in to the grammar, further NFAs will have to be produced at runtime too in order to account for the new tokens.
There's currently no way to hook into grammar compilation and do things differently (at least, not without playing with compiler internals). However, there probably will be in the next major language release, where the AST of a Raku program will be made available to the language user, and it will thus be possible to write modules that affect the compilation of different program constructs.

Related

Is Apples goto-bug possible in Kotlin if-statement without braces?

Imagine in Kotlin:
if (this) doThis()
else if(that) doThat()
else doWhatEver()
I've read using braces always (see Apples goto fail)!
Rule 1.3.a
Braces shall always surround the blocks of code (a.k.a., compound
statements), following if, else, switch, while, do, and for
statements; single statements and empty statements following these
keywords shall also always be surrounded by braces.
How does the Kotlin compiler handle the lack of braces in the above code? I thought Kotlin might be intelligent enough to avoid failures on that?
The example you give is not ambiguous; it could only have one reasonable meaning.  And it's rather different from the issue you link to (which doesn't involve else clauses at all.)  So I'm not sure what you're asking.
Kotlin is similar to most C-like languages in how it interprets if (and else).  So strictly speaking, an error of that type is still possible.  But Kotlin has two features which can reduce the risk of such problems.
First is that, unlike C and Java and similar languages, if can be used as an expression (returning a value).  When used this way, the compiler ensure that every branch returns a value; this will usually result in a compiler error if there's any confusion around multiple branches.
Second is the when structure, which functions like the C/Java switch statement, but avoids fall-through and hence the need for breaks; it can also be used as an expression, enforcing a single path through and a single return value.
So in Kotlin, the linked code would best have been written with a when, which would have been simpler as well as preventing that type of error.
Ultimately, I don't think it's really comparable, though.  The linked code is low-level C, which has very different practices and restrictions from general application code.  In particular, the use of goto for error clean-up is inherently error-prone.  And if they'd used else branches properly, it would have made the code rather clearer as well as preventing this error.
It's possible to write bad code in any language if you're determined enough!  A good language is one which makes it easier to write good code, and harder to write bad code.  (And I think Kotlin scores pretty well in that respect.) 

Why is adding methods to a type different than adding a sub or an operator in perl6?

Making subs/procedures available for reuse is one core function of modules, and I would argue that it is the fundamental way how a language can be composable and therefore efficient with programmer time:
if you create a type in your module, I can create my own module that adds a sub that operates on your type. I do not have to extend your module to do that.
# your module
class Foo {
has $.id;
has $.name;
}
# my module
sub foo-str(Foo:D $f) is export {
return "[{$f.id}-{$f.name}]"
}
# someone else using yours and mine together for profit
my $f = Foo.new(:id(1234), :name("brclge"));
say foo-str($f);
As seen in Overloading operators for a class this composability of modules works equally well for operators, which to me makes sense since operators are just some kinda syntactic sugar for subs anyway (in my head at least). Note that the definition of such an operator does not cause any surprising change of behavior of existing code, you need to import it into your code explicitly to get access to it, just like the sub above.
Given this, I find it very odd that we do not have a similar mechanism for methods, see e.g. the discussion at How do you add a method to an existing class in Perl 6?, especially since perl6 is such a method-happy language. If I want to extend the usage of an existing type, I would want to do that in the same style as the original module was written in. If there is a .is-prime on Int, it must be possible for me to add a .is-semi-prime as well, right?
I read the discussion at the link above, but don't quite buy the "action at a distance" argument: how is that different from me exporting another multi sub from a module? for example the rust way of making this a lexical change (Trait + impl ... for) seems quite hygienic to me, and would be very much in line with the operator approach above.
More interesting (to me at least) than the technicalities is the question if language design: isn't the ability to provide new verbs (subs, operators, methods) for existing nouns (types) a core design goal for a language like perl6? If it is, why would it treat methods differently? And if it does treat them differently for a good reason, does that not mean we are using way to many non-composable methods as nouns where we should be using subs instead?
From a language design perspective, it all comes down to a simple question: which language are we speaking? In Perl 6, this is a question about which we always try to be very clear.
The notion of ones current language in Perl 6 is defined entirely in terms of lexical scope. Sub declarations are lexically scoped. When we import symbols from a module, including extra multi candidates, those are lexically scoped. When we perform language tweaks - such as introducing new operators - those are lexically scoped. Verbs in our current language - that is, subroutine calls - are those with a lexical definition. (Operators are simply sub calls with more interesting parsing.) Since lexical scopes are closed at the end of compile time, the compiler has a complete view of the current language. That's why sub calls to non-existent subs, or references to undeclared variables, are detected and reported at compile time, as well as some basic compile-time type checking; future Perl 6 versions are likely to extend the set of compile-time checks that can be expected. The current language is the static, early-bound, part of Perl 6.
By contrast, a method call is a verb to be interpreted in the target object's language. This is the dynamic, late-bound, part of Perl 6. While the most immediate result of that is the typical polymorphism found in various forms in implementations of OO, thanks to meta-programming even the manner in which a verb is interpreted is up for grabs. For example, a monitor will acquire a lock while it interprets the verb and release it afterwards. Other objects might have been constructed based on things other than Perl 6 code, and so the interpretation of a verb doesn't mean invoking code written as a Perl 6 method. Or the code might be somewhere over the network. Who knows? Well, certainly not the caller, and that's the point, and the power, and the risk, of late binding.
The Perl 6 answer to "I want to extend the range of verbs I can use with this object in my current language" is very simple: use language features that relate to extending the current language! There's even a special syntax, $obj.&foo, that allows for a verb foo to be defined in the current language - by writing a sub - and then invoked much as if it's a method on the object. However, the small syntactic distinction makes it clear to the reader - and to the compiler - what is going on, and which language is getting to define that verb.
Through the use of augment it is possible to extend the language defined by some type of objects. However, it's rarely the best way to do things, given that it will have global effect, and also scatter the definition of the language of the object.
Much of what we do in programming is about building languages. By that I don't mean new syntax; most of our new languages - even in a language as open to mutation as Perl 6 - are just nouns and verbs defined using standard language features. However, in any non-trivial program, we can't keep every detail of every language in mind at once. When I go to the restaurant and order a schnitzel, I don't know how the order will be transported to the kitchen, what the kitchen looks like, whether the schnitzel is hammered out, breaded, and cooked on demand, or just served from a (hopefully not too stale) cache of prepared schnitzels. The kitchen and I have just enough shared meaning to make the right kind of thing happen, but I don't know how they'll precisely react to my request and they need not know what I'll do in the meantime. This kind of thinking is acknowledged by OO itself - at least when we fully embrace it - and at a larger scale by concepts such as bounded contexts, as found in Domain Driven Design.
In summary, Perl 6 tries to help us keep our languages straight: to know what is in our current language, and what we express with only limited understanding. That distinction is encoded by the sub/method distinction, which also turns out to be a sensible place to hang a static/dynamic distinction too.

Differences between Red's 5 function types, and why does it distinguish them?

In Red, there are functions of datatypes function!, op!, native!, routine! and action!. What are the differences between them? As far as I know function! is used for user-defined functions and op! for infix operators, and routine! for functions defined in Red/System, but why is there a need for the other two?
function!
As you've guessed yourself, function!s are user-defined functions that support refinements and typechecking, and can also contain embedded docstrings.
Typically, function! values are created with func, function, does and has constructors, and utilize so-called spec dialect; but, in theory, nothing stops you from making your own constructors or devising your own spec formats.
It's also worth noting that function!s fully support reflection.
op!
op!s are infix wrappers on top of other 4 types of functions - they take one value on the left and result of an expression on the right, and they also take precedence other functions during evaluation.
op! values are limited to two arguments, don't support refinements, and have a limited support for reflection (e.g. you can't inspect their bodies with body-of).
routine!
routines! exist in both realms of Red and Red/System (low-level dialect on top of which Red runtime is build). Their specs are written in spec dialect, but their bodies contain Red/System code. Oh, and they support reflection.
Usually they are used for library bindings (like the SQL lib you've mentioned), interaction with runtime, or for performance bottlenecks (Red/System is a compiled language, so rewriting perfomance-critial parts of your app as a set of routine!s will give you a significant boost, at the cost of mandatory compilation).
native!
native!s are functions written in Red/System (for perfomance, simplicity or feasibility reasons) and compiled down to native code (hence the name). Not sure what else can be said about them, aside from implementation details. native! aren't very user-facing, so you might want to study Red's source code in case you have any questions left.
action!
action!s are a standardized set of function written in Red/System (just like native!s) that each datatype implements (or inherits) as its "method". action! are polymorphic in a sense that they dispatch on their first argument:
>> add 1 2%
== 1.02
>> add 2% 1
== 102%
>> append [1] "2"
== [1 "2"]
>> append "1" [2]
== "12"
In mainstream languages this typically looks like "1".append([2]) or something like that.
Distinction between action!s and native!s boils down to a design choice:
you can have as many native! as you want, but action!s, for efficiency, have a fixed-size dispatch table (which means that maximum number of action!s per datatype is limited; minimum number is two: make [to create value] and mold [to serialize value to string!]).
logically, action!s are organized around datatype to which they belong, in one file, while native!s aren't really concerned with datatypes, and implement control flow, trigonometric functions, operations on sets, etc.
Coincidentially, just recently we have a similar discussion about action!s and native!s in our community chat, which you might want to read. I can also recommend to skim thru Rudolf Meijer's Red specification draft, and, of course, official reference documentation.
As for "why" in your question - distinction between 5 types is just an implementation detail, inherited from Rebol. Logically, they all implement what you might call a "function" from conceptual standpoint, and fall into any-function! camp.
While to a caller it may seem similar to run a function whose body is a BLOCK! of code to one which is implemented as native instructions...the implementation has to go down a different branch.
I don't know precisely what Red does in the compilation case, the interpreter case for Rebol2 and Red are similar. These different types are effectively part of a big switch() statement. If it looks in the cell describing the "function" and finds TYPE_NATIVE it knows to interpret the cell's contents as containing a native function pointer. If it finds TYPE_FUNCTION, it knows to pick apart the cell as containing a pointer to a block of code to execute:
https://github.com/red/red/blob/cb39b45f90585c8f6392dc4ccfc82ebaa2e312f7/runtime/interpreter.reds#L752
Now I myself would agree with your line of questioning. e.g. is this leaking an implementation detail to the user--who shouldn't be concerned with this facet in the type system?
But for what it is worth, there is a catch-all typeset called ANY-FUNCTION!:
>> any-function!
== make typeset! [native! action! op! function! routine!]
And you might think of that as "anything that obeys a function-like interface for calling". There are some complexities however, as OP! gets its first argument from the left...so that really is a matter of concern from an interface perspective.
Anyway... a NATIVE! (body is built as native code into the executable) vs. a FUNCTION! (body is a block of Red code run by interpretation or compilation) is just one distinction. A ROUTINE! is a facade built to interact with a DLL/library a la FFI that did not have a-priori knowledge of Red. An ACTION! is a very oversimplified attempt at what are called in other languages Generics. An OP! just gets its first argument from the left.
Point being that each of these might feel the same to a caller (except OP!), but the implementation has to do something different. The way it knows to do something different is via a type byte in a value cell. That's how Rebol2 did it--and Red followed Rebol2 fairly closely--so that's how it also does it. It means that any novel concept of what provides the implementation behind a function requires a new datatype, and it's probably not the greatest idea.
Red is based on Rebol an so has the same types.
function! is an user defined function defined in red
native! is an function in machinecode
op! is an infix operator written in machinecode
action! is an polymorphic function in machinecode
routine! is an function in imported from dynamic library

What is the exhaustive list of guidelines/practices/rules to fully conform with functional paradigm?

I've started playing around with Kotlin, but I sense my own limitation in the way I program. My problem is that I still think Java therefore the style is still imperative, my question is to all functional programming zealots , which I believe would be very useful to all people who at the very beginning stage and also need to 'brake' their brain to start building it again; to leave comfort zone and start thinking pseudo and not in "whatever is your first language". I believe it is possible for highly experienced polyglot developers to chew the concepts down to plain advices of what makes your program being written in entirely functional way and what violates the paradigm. I don't know all the quirks but please don't hesitate to include universally accepted terms which might be unknown to me(I can always lookup). At this point I need this set of rules to make myself suffer at first and not break them but then I know I will feel it, analyze guidelines and understand how they are worse/better which of course is my own homework.
So example of these guidelines, would be something like:
Never change state, this can be avoided by using x, y, z
Operate using higher order functions only (I maybe wrong, just example)
I hope the answer will give me long term reference to put myself in extreme conditions where I stop escaping to OOP whenever I feel uncomfortable. And now when I look at Kotlin I understand how I've should've been thinking about problems, it is about intention not about the structure imposed by one language or another. Intention can always be converted to a language of your choice and backed up by design patterns applicable to the language, but to find that middle ground I need to jail myself first from the comfort zone.
Avoid mutable state like the plague.
One of the main points of using functional programming, possibly the main one, is to avoid all the little pitfalls, bugs, issues one needs to deal with when using mutable state. You should do everything you can in order to avoid mutating state. For instance, instead of using C-style for-loops where you need to keep a counter variable updated, use map and other higher-order functions in order to abstract away your iteration patterns. This also means that you should never change the value of a variable if you can avoid that. Instead, you should be defining almost all of your variables, preferrably all of them, as constants, and using functions to compute new values from them instead of mutating them.
Avoid side-effects like the plague.
Mutable state's ugly cousin, side-effects. Side effects mean anything other than taking a value and returning a value in a function. If that function prints data, mutates global variables, sends messages to threads, or anything, anything other than simply taking its parameters, computing a value from them, and returning a value, that function has side-effects. Side-effects are important (see next bullet point), but if you use them a lot, they get impossible to track. Just think of how everyone tells you to avoid global variables in imperative programming. Functional programming goes a step further and tries to avoid all side-effects. The bulk of your program should be made of pure functions. (See ahead)
When you need to use side-effects, keep them contained.
Yes, I just told you to run away from side-effects. However, no program is useful without side-effects of some kind. Graphical User Interface? Side-effect. Audio output? Side-effect. Printing to a shell? Side-effect. So you can't really get rid of side-effects if you want to build useful stuff.
What you should do instead is write your code so that all your side-effecting code lives in a thin layer which mostly calls pure functions and then does the required side-effects using the result of these pure function calls.
Use pure functions for everything you can.
This is sort of the flipside of the previous point. A pure function is a function which has no side-effects and does not mutate anything. It can only take in parameters and return a value. You should use these a lot. For instance, instead of doing your logging within functions which are computing stuff, you should be constructing your log strings using pure functions, and then letting your side-effects layer call these pure functions, call more pure functions in order to format the log strings into a full log, and then output the log itself from your side-effects layer.
Use higher-order functions to structure your code.
Higher-order functions are, in a way, the glue that makes functional programming work. A higher-order function is a function which takes one or more functions as parameters and/or returns a function. The power of higher-order functions is that they can encapsulate many of the patterns which you would use in an imperative-style program in a declarative manner. For instance, let's take a look at the three most common higher-order functions:
map is a function which takes a function and a list of values, applies its function argument to each of those values, and returns a new list with the results. map encapsulates the whole pattern of iterating over a list doing an operation on each value in a declarative manner.
filter is a function which takes a function which returns a boolean and a list of values, applies its function argument to each of those values and returns a list containing only those values for which its function argument returns true. It encapsulates the whole pattern of selecting results from a list in a declarative manner.
reduce, also known as fold, takes an initial value, a binary function and a list of values. It uses its function argument to combine the initial value with the first value of the list, then combines the result with the next value of the list and keeps on doing this until it has reduced the list to just one single value. It encapsulates the entire pattern of obtaining an aggregate value from a list of values.
This is in no way an exhaustive list of higher-order functions, but these three are the most common ones. I hope this has been enough to show how you can structure code which would require a lot of tracking variables using only functions in a declarative manner. If you use these higher-order functions well, it's likely you won't ever need a for or while loop again.
This is definitely not an exhaustive list of functional programming practices, but I think most functional programmers would agree these five guidelines form the core of what functional programming is about. If you want to really learn how to apply these, my advice would be to learn a pure functional programming language such as Haskell, so you are forced to abandon the imperative paradigm and to learn how to structure things functionally instead. I would recommend the fantastic Haskell Programming from First Principles as a starting resource if you choose to go this way. In case you don't want to/can't put down the cash, Brent Yorgey's Haskell course at UPenn is also a great free resource.

Reasoning for Language-Required Variable Name Prefixes

The browser-based software StudyTRAX ( http://wiki.studytrax.com ), used for research data management, allows for custom form and form variable management via JavaScript. However, a StudyTRAX "variable" (essentially, a representation of both an element of a form [HTML properties included] and its corresponding parameter, with some data typing/etc.) must be referred to with #<varname>, while regular JavaScript variables will just be <varname>.
Is this sort of thing done to make parsing easier, or is it just to distinguish between the two so that researchers who aren't so technologically-inclined won't have as much trouble figuring out what they're doing? Given the nature of JavaScript, I would think the StudyTRAX "variables" are just regular JavaScript objects defined in such a way to make form design and customization simpler, and thus the latter would make more sense, but am I wrong?
Also, I know that there are other programming languages that do require specific variable prefixes (though I can't think of some off the top of my head at the moment); what is/was the usual reasoning for that choice in language design?
Two part answer, StudyTRAX is almost certainly using a preprocessor to do some magic. JavaScript makes this relativity easy, but not as easy as a Lisp would. You still need to parse the code. By prefixing, the parser can ignore a lot of the complicated syntax of JavaScript and get to the good part without needing a "picture perfect" compiler. Actually, a lot of templeting systems do this. It is an implementation of Lisp's quasi-quote (see Greenspun's Tenth Rule).
As for prefixes in general, the best way to understand them is to try to write a parser for a language without them. For very dynamic and pure languages like Lisp and JavaScript where everything is a List / object it is not too bad. When you get languages where methods are distinct from objects, or functions are not first class the parser begins having to ask itself what type of thing doe "foo" refer to? An annoying example from Ruby: an unprefixed identifier is either a local variable or a method implicitly on self. In Rails there are a few functions that are implemented with method_missing. Person.find_first_by_rank works fine, but
Class Person < ActiveRecord::Base
def promotion(name)
p = find_first_by_rank
[...]
end
end
gives an error because find_first_by_rank looks like it might be a local variable and Ruby is scared to call method_missing on something that might just be a misspelled local variable.
Now imagine trying to distinguish between instance variables (prefix-#), class-variables (prefix-##), global variables (prefix-$), Constants (first letter Capitol), method names and local variables (no prefix small case) by context alone.
(From a Compiler & Language Hobbyst Designer).
Your question is more especific to the "StudyTRAX" software.
In early days of programming, variables in Basic used prefixes as $ (for strings, "a$"), to difference from numeric values. Today, some programming languages such as PHP prefixes variables with "$". COBNOL used variables starting with A to I, for integers, and later letters for floats.
Transforming, and later, executing some code, its a complex task, that's why many programmers, use shortcuts like adding prefixes or suffixes to programming languages.
In many Collegues or Universities, exist specialized classes / courses for transforming code from a programming language, to something that the computer does, like "Compilers", "Automatons", "Language Design", because its not an easy task.
Perl requires different variable prefixes, depending on the type of data:
$scalar = 4.2;
#array = (1, 4, 9, 16);
%map = ("foo" => 42, "bar" => 17, "baz" => 137);
As I understand it, this is so the reader can immediately identify what kind of object they're dealing with. It's not a matter of whether the reader is technologically inclined or not: if you reduce the programmer's cognitive load, he can use his brainpower for more important things than figuring out fiddly syntactic details.
Whether Perl's design is successful in this respect is another question, but I believe that's the reasoning behind the feature.