Why not have operators as both keywords and functions?

I saw this question and it got me wondering.
Ignoring the fact that pretty much all languages have to be backwards compatible, is there any reason we cannot use operators as both keywords and functions, depending on whether they're immediately followed by a parenthesis? Would it make the grammar harder?
I'm thinking mostly of Python, but also of C-like languages.

Perl does something very similar to this, and the results are sometimes surprising. You'll find warnings about this in many Perl texts; for example, this one comes from the standard distributed Perl documentation (man perlfunc):
Any function in the list below may be used either with or without parentheses around its arguments. (The syntax descriptions omit the parentheses.) If you use parentheses, the simple but occasionally surprising rule is this: It looks like a function, therefore it is a function, and precedence doesn't matter. Otherwise it's a list operator or unary operator, and precedence does matter. Whitespace between the function and left parenthesis doesn't count, so sometimes you need to be careful:
print 1+2+4; # Prints 7.
print(1+2) + 4; # Prints 3.
print (1+2)+4; # Also prints 3!
print +(1+2)+4; # Prints 7.
print ((1+2)+4); # Prints 7.
An even more surprising case, which often bites newcomers:
print
($a % 7 == 0 || $a % 7 == 1) ? "good" : "bad";
will print the truth value of the comparison rather than "good" or "bad": the parenthesized expression becomes print's argument list, and the ternary applies to print's return value.
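For what it's worth, Raku (formerly Perl 6) resolved this particular trap by making the whitespace significant in the opposite direction: a space before the parens turns the call into a list operator whose argument is the entire following expression. A quick sketch of my understanding, worth double-checking against current Rakudo:
say(1 + 2) + 4;    # prints 3 (and warns): parens right after the name are the argument list
say (1 + 2) + 4;   # prints 7: with the space, say takes the whole expression (1 + 2) + 4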
In short, it depends on your theory of parsing. Many people believe that parsing should be precise and predictable, even when that results in surprising parses (as in the Python example in the linked question, or even more famously, C++'s most vexing parse). Others lean towards Perl's "Do What I Mean" philosophy, even though the result -- as above -- is sometimes rather different from what the programmer actually meant.
C, C++ and Python all tend towards the "precise and predictable" philosophy, and they are unlikely to change now.

Depending on the language, not() may simply not be defined, and if it is not defined you cannot use it. Why would a language leave it undefined? Probably because its creator saw no need for that construction, and because it is better to keep things simple.

Related

Precedence inside a function call

Using the defined-or operator ( // ) in a function call produces the result I'd expect:
say( 'nan'.Int // 42); # OUTPUT: «42»
However, using the lower-precedence orelse operator instead throws an error:
say( 'nan'.Int orelse 42);
# OUTPUT: «Error: Unable to parse expression in argument list;
# couldn't find final ')'
# (corresponding starter was at line 1)»
What am I missing about how precedence works?
(Or is the error a bug and I'm just overthinking this?)
I'd say it's a grammar bug, as
say ("nan".Int orelse 42); # 42
works.
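Another workaround along the same lines, presumably, is an extra pair of plain expression parens inside the call; the inner parens form an ordinary term, inside which loose operators such as orelse parse as usual:
say(('nan'.Int orelse 42));    # OUTPUT: «42␤»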
TL;DR My super useful naanswer (not-an-answer / non-authoritative answer / food for thought) is it might be a bug or it might not. :)
Other examples:
say(42 and 42);
say(42 ==> 99);
yield the same error.
What am I missing about how precedence works?
Perhaps nothing. Perhaps it will be desirable and possible to fix the grammar so these function-call-arg-list-signifying parens determine precedence just like plain expression parens do.
If so, perhaps fixing it would best wait, or perhaps realistically must wait, until or after RakuAST lands (6.e?). Or perhaps even later, if/when grammar cleanup/slangs land (6.f?).
Or perhaps it's going to always stay as it is for reasons such as good usability (despite the initial "huh?") and/or expediency and/or single-pass parsing and/or whatever.
I've dug a little to see if I could find relevant commentary. Here are some (juicy?) bits:
the OPP is a bit more complex than a standard binary-operator OPP
(from a comment on #perl6)
If you scroll backwards from Larry's comment you'll see he said this in the context of Raku's extraordinary seamless parsing (no delimiters introduced) in a single pass of nested sub-languages that each can have arbitrary grammars.
(Btw, one thought I had: did std parse say(42 and 42) fine? I'm not sure if there's a running std anywhere these days.)
While we do have complete control of stock Raku, I'm not convinced there's anything compelling about bending over backwards to fix every wrinkle of this sort (foo(... op ...) in this case). In the general case (..... where the middle ... inside the outer pair of .s has arbitrary syntax), we'll be hitting limits in how "perfect" it can all be once there's a huge amount of anarchic language / syntax mixing going on in userland/module space, as I anticipate will emerge in years to come.
So, imo, if it's reasonably easy to fix, without unduly cramping or burdening user slang freedom, great. If not, I think the current situation is fair enough (though perhaps it'll be desirable, viable and reasonable to improve the error message).
Perhaps consider the foregoing in combination with:
Raku borrows many concepts from human language ...
(from the doc)
in combination with:
☞ Self-clocking code produces better syntax error messages
(from Seeing Wrong Right)
in combination with:
Break that clock and your error messages will turn to mush
(from a mailing list comment)
But then again:
Please don't assume that rakudo's idiosyncrasies and design fossils are canonical.
Do you mean this, maybe...?
> say ( NaN.Int orelse 42 )
42
since
> say( NaN.Int orelse 42 )
===SORRY!=== Error while compiling:
Unable to parse expression in argument list; couldn't find final ')' (corresponding starter was at line 1)
------> say( NaN.Int⏏ orelse 42 )
expecting any of:
infix
infix stopper
I would tend to agree with @lizmat that there is a grammar bug in the compiler.

Regex/token/rule to match nested curly braces?

I need to match the values of key = value pairs in BibTeX files, which can contain arbitrarily nested braces, delimited by braces. I've got as far as matching at most two deep nested curly braces, like {some {stuff} like {this}} with the kludgey:
token brace-value {
    '{' <-[{}]>* ['{' <-[}]>* '}' <-[{}]>* ]* '}'
}
I shudder at the idea of going one level further down... but proper parsing of my BibTeX stuff needs at least three levels deep.
Yes, I know there are BibTeX parsers around, but I need to grab the complete entry for further processing, and peek at a few keys meanwhile. My *.bib files are rather tame (and I wouldn't mind handling a few stray entries by hand); the problem is that I have a lot of them, with much overlap. But some of the "same" entries have different keys, or extra data. I want to consolidate them into a few master files (the whole idea behind BibTeX, right?). Not fun by hand when bibtool's supposedly duplicate-free output (ha!) still runs to some 20 thousand lines...
After perusing Lenz' "Parsing with Perl 6 Regexes and Grammars" (Apress, 2017), I realized the "regex" machinery (based on backtracking) might actually be a lot more capable than officially admitted, as a regex can call another, and nowhere do I see a prohibition on recursive calls.
Before digging in, a bit of context-free grammar: one way to describe nested braces (and nothing else) is with the grammar:
S -> { S } S | <nothing>
I.e., nested braces are either an opening brace, nested braces, a closing brace, more nested braces; or nothing at all. This translates more or less directly to Raku (there is no empty regex, so fake it by making the construction optional):
my regex nb {
    [ '{' <nb> '}' <nb> ]?
}
Lo and behold, this works. Need to fix up to avoid captures, kill backtracking (if it doesn't match on the first try, it won't ever match), and decorate with "anything else" fillers.
my regex nested-braces {
    :ratchet
    <-[{}]>*
    [ '{' <.nested-braces> '}' <.nested-braces> ]?
    <-[{}]>*
};
This checks out with my test cases.
For not-so-adventurous souls, there is the Text::Balanced module for Perl (callable from Raku using Inline::Perl5). Not directly useful to me inside a grammar, unfortunately.
Solution
A way to describe nested braces (and nothing else)
Presuming a rule named &R, I'd likely write the following pattern if I was writing a quick small one-off script:
\{ <&R>* \}
If I was writing a larger program that should be maintainable I'd likely be writing a grammar and, using a rule named R the pattern would be:
'{' ~ '}' <R>*
The latter avoids leaning-toothpick syndrome and uses the regex ~ operator.
These will both parse arbitrarily deeply nested paired braces, eg:
say '{{{{}}}}' ~~ token { \{ <&?ROUTINE>* \} } # 「{{{{}}}}」
(&?ROUTINE refers to the routine in which it appears; a regex is a routine. Note that you can't use <&?ROUTINE> in a regex declared with / ... / syntax.)
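And the grammar form of the second pattern might look like this (a sketch; the grammar and rule names are mine):
grammar Braces {
    rule TOP { <R> }
    rule R   { '{' ~ '}' <R>* }
}
say so Braces.parse('{ {} {{}} }');   # True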
regex vs token
kill backtracking
my regex nested-braces {
    :ratchet
The only difference between patterns declared with regex and token is that the former turns ratcheting off. So using it and then immediately turning ratcheting on is notably unidiomatic. Instead:
my token nested-braces {
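Spelled out in full, the question's regex then becomes (a sketch; the body is unchanged, only the declarator differs):
my token nested-braces {
    <-[{}]>*
    [ '{' <.nested-braces> '}' <.nested-braces> ]?
    <-[{}]>*
}
say so '{some {stuff} like {this}}' ~~ /^ <nested-braces> $/;   # True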
Backtracking
the "regex" machinery (based on backtracking)
The grammar/regex engine does include backtracking as an optional feature because that's occasionally exactly what one wants.
But the engine is not "based on backtracking", and many grammars/parsers make little or no use of backtracking.
Recursion
a regex can call another, and nowhere do I see a prohibition on recursive calls.
This alone is nothing special for contemporary regex engines.
PCRE has supported recursion since 2000, and named regexes since 2003. Perl's default regex engine has supported both since 2007.
Their support for deeper levels of recursion and more named regexes being stored at once has been increasing over time.
Damian Conway's PPR uses these features of regexes to build non-trivial (but still small) parse trees.
Capabilities
a lot more capable
Raku "regexes" can be viewed as a cleaned up take on the unfolding regex evolution. To the degree this helps someone understand them, great.
But really, it's a whole new deal. For example, they're turing complete, in a sensible way, and thus able to parse anything.
than officially admitted
Well that's an odd thing to say! Raku's Grammars are frequently touted as one of Raku's most innovative features.
There are three major caveats:
Performance: the primary current caveat is that a well-written C parser will blow the socks off a well-written Raku Grammar based parser.
Pay-off: it's often not worth the effort of writing a fully correct parser for a non-trivial format if there's an existing parser.
Left recursion: Raku does not automatically rewrite left-recursive rules, which simply loop forever; see the sketch after this list.
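To make the left recursion caveat concrete: the commented-out rule below calls itself before consuming any input, so matching it would never terminate. The standard fix is to rewrite the recursion as iteration, here with the % separator quantifier (a sketch; the rule names are mine):
# my regex expr { <expr> '+' <term> | <term> }   # left-recursive: loops forever
my token term { \d+ }
my token expr { <term>+ % '+' }                  # same language, iterative
say so '1+2+3' ~~ /^ <expr> $/;                  # True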
Using existing parsers
I know there are BibTeX parsers around, but I need to grab the complete entry for further processing, and peek at a few keys meanwhile.
Using a foreign module in Raku can be a bit of a revelation. It is not necessarily like anything you'll have experienced before. Raku's foreign language adaptors can do smart marshaling for you so it can be like you're using native Raku features.
Two of the available foreign language adaptors are already sufficiently polished to be amazing -- the ones for Perl and for C.
I'm pretty sure there's a BibTeX package for Perl that wraps a C BibTeX parser. If you used that you'd hopefully get parsing results all nicely wrapped up into Raku objects as if it was all Raku in the first place, but retaining much of the high performance of the C code.
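I haven't verified the details, but the shape of such code would presumably be something like this sketch. Text::BibTeX is a real Perl wrapper around the btparse C library; the exact calls here are assumptions to check against its documentation:
use Text::BibTeX:from<Perl5>;
my $file = Text::BibTeX::File.new('master.bib');
while Text::BibTeX::Entry.new($file) -> $entry {
    say $entry.key if $entry.parse_ok;   # assumed API; see the Perl module's docs
}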
A Raku BibTeX Grammar?
Perhaps your needs do call for creating and using a small Raku Grammar.
(Maybe you're doing this partly as an exercise to familiarize yourself with Raku, or the regex/grammar aspect of Raku. For that it sounds pretty ideal.)
As soon as you begin to use multiple regexes together -- even just two -- you are closing in on grammar territory. After all, they're just an easy-to-use construct for using multiple regexes together.
So if you decide you want to stick with writing parsing code in Raku, expect to write it something like this:
grammar BiBTeX {
    token TOP { ... }
    token ...
    token ...
}
BiBTeX.parse: my-bib-file
For more details, see the official doc's Grammar tutorial or read Moritz's book.
OK, just (re)checked. The documentation of '{' ~ '}' leaves a lot to be desired; it is not at all clear that it is meant to handle balanced, correctly nested delimiters.
So my final solution is really just along these lines:
my token nested-braces {
    '{' ~ '}' [ <-[{}]> | <.nested-braces> ]*
}
Thanks everyone! Learned quite a bit today.

Is it acceptable to use `to` to create a `Pair`?

to is an infix function within the standard library. It can be used to create Pairs concisely:
0 to "hero"
in comparison with:
Pair(0, "hero")
Typically, it is used to initialize Maps concisely:
mapOf(0 to "hero", 1 to "one", 2 to "two")
However, there are other situations in which one needs to create a Pair. For instance:
"to be or not" to "be"
(0..10).map { it to it * it }
Is it acceptable, stylistically, to (ab)use to in this manner?
Just because a language feature is provided does not mean it is always the better choice. A Pair can be used instead of to and vice versa. The real issue is whether your code stays simple: would a reader need the whole preceding story to understand this line? Your last map example gives no hint of what it's doing. Imagine someone reading { it to it * it }; they would most likely be confused. I would say this is an abuse.
The to infix function offers nice syntactic sugar. IMHO it should be used in conjunction with a nicely named variable that tells the reader what this something-to-something is. For example:
val heroPair = Ironman to Spiderman //including a 'pair' in the variable name tells the story what 'to' is doing.
Or you could use scoping functions
(Ironman to Spiderman).let { heroPair -> }
I don't think there's an authoritative answer to this.  The only examples in the Kotlin docs are for creating simple constant maps with mapOf(), but there's no hint that to shouldn't be used elsewhere.
So it'll come down to a matter of personal taste…
For me, I'd be happy to use it anywhere it represents a mapping of some kind, so in a map{…} expression would seem clear to me, just as much as in a mapOf(…) list.  Though (as mentioned elsewhere) it's not often used in complex expressions, so I might use parentheses to keep the precedence clear, and/or simplify the expression so they're not needed.
Where it doesn't indicate a mapping, I'd be much more hesitant to use it.  For example, if you have a method that returns two values, it'd probably be clearer to use an explicit Pair.  (Though in that case, it'd be clearer still to define a simple data class for the return value.)
You asked for personal perspective so here is mine.
I find this syntax a huge win for simple code, especially when reading it. Code with lots of parentheses causes mental strain; imagine having to review/read a thousand lines of code a day ;(

Where is contains(Junction) defined?

This code works:
(3,6...66).contains( 9|21 ).say # OUTPUT: «any(True, True)␤»
And returns a Junction. It's also tested, but not documented.
The problem is I can't find its implementation anywhere. The Str code, which is also called from Cool, never returns a Junction (it does not take a Junction, either). There are no other contains methods in the source.
Since it's autothreaded, it's probably specially defined somewhere. I have no idea where, though. Any help?
TL;DR Junction autothreading is handled by a single central mechanism. I have a go at explaining it below.
(The body of your question starts with you falling into a trap, one I think you documented a year or two back. It seems pretty irrelevant to what you're really asking but I cover that too.)
How junctions get handled
Where is contains(Junction) defined? ... The problem is I can't find [the Junctional] implementation anywhere. ... Since it's autothreaded, it's probably specially defined somewhere.
Yes. There's a generic mechanism that automatically applies autothreading to all P6 routines (methods, operators etc.) that don't have signatures that explicitly control what happens with Junction arguments.
Only a tiny handful of built-in routines have these explicit Junction handling signatures -- print is perhaps the most notable. The same is true of user-defined routines.
.contains does not have any special handling. So it is handled automatically by the generic mechanism.
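In user-space terms the rule is all about signatures. A parameter typed Junction (or Mu) receives the junction itself, whereas the default Any constraint excludes Junction, so passing one triggers the fallback: the routine is called once per eigenstate and the results are recombined into a junction. A minimal sketch (the sub names are mine):
sub takes-whole(Junction $j) { $j.^name }   # Junction-typed: gets the junction itself
sub threaded($x)             { $x * 2 }     # Any-typed: autothreaded per eigenstate
say takes-whole(9|21);   # Junction
say threaded(9|21);      # any(18, 42)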
Perhaps the section The magic of Junctions of my answer to an earlier SO Filtering elements matching two regexes will be helpful as a high level description of the low level details that follow below. Just substitute your 9|21 for the foo & bar in that SO, and your .contains for the grep, and it hopefully makes sense.
Spelunking the code
I'll focus on methods. Other routines are handled in a similar fashion.
method AUTOTHREAD does the work for full P6 methods.
This is set up in this code that sets up handling for both nqp and full P6 code.
The above linked P6 setup code in turn calls setup_junction_fallback.
When a method call occurs in a user's program, it involves calling find_method (modulo cache hits as explained in the comment above that code; note that the use of the word "fallback" in that comment is about a cache miss -- which is technically unrelated to the other fallback mechanisms evident in this code we're spelunking thru).
The bit of code near the end of this find_method handles (non-cache-miss) fallbacks.
Which arrives at find_method_fallback which starts off with the actual junction handling stuff.
A trap
This code works:
(3,6...66).contains( 9|21 ).say # OUTPUT: «any(True, True)␤»
It "works" to the degree this does too:
(3,6...66).contains( 2 | '9 1' ).say # OUTPUT: «any(True, True)␤»
See Lists become strings, so beware .contains() and/or discussion of the underlying issues such as pmichaud's comment.
Routines like print, put, infix ~, and .contains are string routines. That means they coerce their arguments to Str. By default the .Str coercion of a listy value is its elements separated by spaces:
put 3,6...18; # 3 6 9 12 15 18
put (3,6...18).contains: '9 1'; # True
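So if element membership is what's wanted, here are a couple of sketches that avoid the one-big-Str coercion:
say (3, 6 ... 66).contains('9 1');   # True -- substring match against "3 6 9 12 ..."
say so 21 == (3, 6 ... 66).any;      # True -- numeric membership via a junction
say so (3, 6 ... 66).grep(9|21);     # True -- smartmatch each element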
It's also tested
Presumably you mean the two tests with a *.contains argument passed to classify:
my $m := @l.classify: *.contains: any 'a'..'f';
my $s := classify *.contains( any 'a'..'f'), @l;
Routines like classify are list routines. While some list routines do a single operation on their list argument/invocant, eg push, most of them, including classify, iterate over their list doing something with/to each element within the list.
Given a sequence invocant/argument, classify will iterate it and pass each element to the test, in this case a *.contains.
The latter will then coerce individual elements to Str. This is a fundamental difference compared to your example, which coerces a sequence to Str in one go.

can a variable have multiple values

In algebra, if I make the statement x + y = 3, the variables I used could hold, for example, the values 2 and 1, or 1 and 2. I know that assignment in programming is not the same thing, but I got to wondering. If I wanted to represent the value of, say, a quantumly weird particle, I would want my variable to have two values at the same time and to have it resolve into one or the other later. Or maybe I'm just dreaming?
Is it possible to say something like i = 3 or 2;?
This is one of the features planned for Perl 6 (junctions), with syntax that should look like my $a = 1|2|3;
If ever implemented, it would work intuitively, like $a==1 being true at the same time as $a==2. Also, for example, $a+1 would give you a value of 2|3|4.
This feature is actually available in Perl 5 as well, through the Perl6::Junction and Quantum::Superpositions modules, but without the syntactic sugar (via the 'functions' all and any).
At least for comparison (b < any(1,2,3)), it was also available in Microsoft's experimental Cω language, although it was not documented anywhere (I just tried it when I was looking at Cω, and it just worked).
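Junctions did eventually land in what became Raku; a quick sketch of the behaviour there:
my $a = 1 | 2 | 3;   # an any-junction
say so $a == 1;      # True
say so $a == 2;      # True
say $a + 1;          # any(2, 3, 4)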
You can't do this with native types, but there's nothing stopping you from creating a variable object (presuming you are using an OO language) which has a range of values or even a probability density function rather than an actual value.
You will also need to define all the mathematical operators between your variables, and between your variables and native scalars. The same goes for the equality and assignment operators.
numpy arrays do something similar for vectors and matrices.
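As a toy sketch of that idea in Raku (the class name and operator choices are mine): a value that carries a set of possibilities, with arithmetic and comparison defined over all of them:
class MultiVal {
    has @.values;   # the set of possible values
}
multi infix:<+>(MultiVal $a, Numeric $n) {
    # add $n to every possibility
    MultiVal.new(values => $a.values.map(* + $n))
}
multi infix:<==>(MultiVal $a, Numeric $n) {
    # true if any possibility equals $n
    so $a.values.any == $n
}
my $x = MultiVal.new(values => (2, 3));
say ($x + 1).values;   # [3 4]
say $x == 3;           # True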
That's also the kind of thing you can do in Prolog. You define rules that constrain your variables and then let Prolog resolve them ...
It takes some time to get used to it, but it is wonderful for certain problems once you know how to use it ...
Damian Conway's Quantum::Superpositions might do what you want:
https://metacpan.org/pod/Quantum::Superpositions
You might need your crack pipe, however.
What you're asking seems to be how to implement a Fuzzy Logic system. These have been around for some time and you can undoubtedly pick up a library for the common programming languages quite easily.
You could use a struct and handle the operations manually. Otherwise, no: a variable only has one value at a time.
A variable is nothing more than an address into memory. That means a variable describes exactly one place in memory (with a length depending on the type). So as long as we have no "quantum memory" (we don't, and it doesn't look like we will in the near future), the answer is no.
If you want to program and model this behaviour, you could use an array (with length equal to the maximum number of simultaneous values). This comes with increased runtime, since the computations must be done for each combination of values (e.g. x + y with two values each means computing x1+y1, x1+y2, x2+y1 and x2+y2).
In Perl, you can.
If you use Scalar::Util (its dualvar function), you can have a variable take two values: one when it's used in string context, and another when it's used in numeric context.
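Raku, for comparison, builds this in as "allomorphs": a single value that is simultaneously a number and a string. A quick sketch:
my $n = <42>;                        # angle-bracket quoting yields an IntStr
say $n + 1;                          # 43 (numeric context)
say $n.raku;                         # IntStr.new(42, "42")
my $dual = IntStr.new(7, "seven");
say $dual + 1;                       # 8
say ~$dual;                          # seven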