How to add a small bit of context in a grammar? - lark-parser

I am tasked with parsing (and transforming) code in a programming language that has a slight quirk in its rules, at least as I see it. To be exact, the compiler treats newlines (as well as semicolons) as statement separators, but otherwise (e.g. inside a statement) it treats them as whitespace.
As an example, this code:
try
local x = 5 / 0
catch (i)
print(i + "\n")
is proved to be equivalent to this:
try local x = 5 / 0 catch (i) print(i + "\n")
I don't see how I can express such a rule in EBNF, or specifically in Lark EBNF dialect. I mean in a sensible way. I probably could define all possible newline positions inside all statements, but it would be cumbersome and error-prone.
I wish to find a way to treat newlines contextually. Is there a proven method for this, preferably within Python/Lark domain? If I have to modify the parser for that purpose, then where should I start?
Or if I misunderstood something in this language in particular or in machine language parsing in general, or my statement of the problem is wrong, I'd also be happy to get educated.
(As you may guess, the language in question has a well proven implementation, but no officially defined grammar. Also, it is Squirrel, for all that it matters.)

The relevant quote from the "specification" is this:
A squirrel program is a simple sequence of statements.:
stats := stat [';'|'\n'] stats
[...] Statements can be separated with a new line or ‘;’ (or with the keywords case or default if inside a switch/case statement), both symbols are not required if the statement is followed by ‘}’.
These are relatively complex rules and, taken together, not context free if newlines can also be ignored everywhere else. Note however that in my understanding the text implies that ; or \n are required when none of the other cases apply. That would make your example illegal. That probably means that the BNF as written is correct, i.e. both ; and \n are optional everywhere. In that case you can (for lark) just put an %ignore "\n" statement and it should work fine.
Also, lark should not complain if you both ignore the \n and use it in a rule: Where useful it will match it in a rule, otherwise it will just ignore it. Note however that this breaks if you use a Terminal that includes the \n (e.g. WS or /\s/). Just have \n as an extra case.
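To illustrate the idea of ignoring newlines outside of any rule, here is a minimal sketch in plain Python rather than Lark. The token pattern and the tokenize function are hypothetical and cover only the tokens in the example above; they are not the real Squirrel lexer. With separators dropped, the multi-line and single-line versions of the try/catch example produce the same token stream, which is roughly the effect that ignoring "\n" has on what the parser sees.

```python
import re

# Hypothetical token pattern for a tiny Squirrel-like subset (an assumption,
# not the real lexer). Only the tokens used in the example are covered.
TOKEN = re.compile(r'''
    (?P<STRING>"(?:\\.|[^"\\])*")   # double-quoted string with backslash escapes
  | (?P<NUMBER>\d+)
  | (?P<NAME>[A-Za-z_]\w*)
  | (?P<OP>[-+*/=(){}])
  | (?P<SEP>[;\n])                  # statement separators
  | (?P<WS>[ \t\r]+)                # plain whitespace
''', re.VERBOSE)


def tokenize(source, ignore_separators=True):
    """Return (kind, text) pairs; optionally drop ';' and newline tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        pos = match.end()
        kind = match.lastgroup
        if kind == "WS":
            continue  # whitespace is always skipped
        if kind == "SEP" and ignore_separators:
            continue  # separators are treated like whitespace
        tokens.append((kind, match.group()))
    return tokens


multi_line = 'try\nlocal x = 5 / 0\ncatch (i)\nprint(i + "\\n")'
one_line = 'try local x = 5 / 0 catch (i) print(i + "\\n")'

# True when separators are ignored, False when they are kept as tokens.
print(tokenize(multi_line) == tokenize(one_line))
print(tokenize(multi_line, False) == tokenize(one_line, False))
```

Note that the newline inside the string literal is untouched, because the string is matched as a single token before the separator rule gets a chance.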
(For the future: You will probably get faster response for lark questions if you ask over on gitter or at least put a link to SO there.)

Related

Brace Delimiters with qq Don't Interpolate Code in Raku

Sorry if this is documented somewhere, but I haven't been able to find it. When using brace delimiters with qq, code is not interpolated:
qq.raku
#!/usr/bin/env raku
say qq{"Two plus two": { 2 + 2 }};
say qq["Two plus two": { 2 + 2 }];
$ ./qq.raku
"Two plus two": { 2 + 2 }
"Two plus two": 4
Obviously, this isn't a big deal since I can use a different set of delimiters, but I ran across it and thought I'd ask.
Update
As @raiph pointed out, I forgot to put the actual question: Is this the way it's supposed to work?
The quote language "nibbler" (the bit of the grammar that eats its way through a quoted string) looks like this:
[
<!stopper>
[
|| <starter> <nibbler> <stopper>
|| <escape>
|| .
]
]*
That is, until we see a stopper, eat whichever comes first of:
A starter (the opening { in your case), followed by some internal stuff, followed by a stopper (the }); this allows for nesting of the construct inside of the string
An escape (and closure interpolation is considered a kind of escape)
Any other character
This ordering in the grammar means that a nesting of the chosen quote starter/stopper will always win over an escape. This issue was discussed during the language design; we could, after all, have reordered the alternation in the grammar to have escapes win. On balance, however, it was felt that the choice of starter/stopper was the more local decision than the general properties of the quoting language, and so should take precedence. (This is also consistent with how quote languages are constructed: we take the base quoted string grammar and mix starter/stopper methods into it.)
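The effect of that ordering can be imitated with a small Python sketch. This is not Rakudo's implementation: the names nibble, qq, evaluate and PAIRS are made up for illustration, the only "escape" modelled is a braced closure, and "running code" is reduced to summing integers. The point is only that the nested starter/stopper branch is tried before the escape branch.

```python
PAIRS = {"{": "}", "[": "]"}  # delimiter pairs supported by this sketch


def evaluate(code):
    # Stand-in for running real code: only sums integers separated by "+".
    return str(sum(int(part) for part in code.split("+")))


def nibble(text, pos, starter, stopper):
    """Consume text from pos up to an unnested stopper.

    Returns (processed text, position of the stopper).
    """
    out = []
    while pos < len(text) and text[pos] != stopper:
        ch = text[pos]
        if ch == starter:
            # First alternative: starter, nested nibble, stopper. Kept literally.
            inner, pos = nibble(text, pos + 1, starter, stopper)
            if pos >= len(text):
                raise SyntaxError("unterminated nested delimiter")
            out.append(starter + inner + stopper)
            pos += 1  # step over the nested stopper
        elif ch == "{":
            # Second alternative: an escape, here only closure interpolation.
            end = text.index("}", pos)
            out.append(evaluate(text[pos + 1:end]))
            pos = end + 1
        else:
            # Third alternative: any other character.
            out.append(ch)
            pos += 1
    return "".join(out), pos


def qq(quoted):
    """Process a quoted string given with its delimiters, e.g. '[...]'."""
    starter = quoted[0]
    stopper = PAIRS[starter]
    result, pos = nibble(quoted, 1, starter, stopper)
    if pos != len(quoted) - 1:
        raise SyntaxError("missing or misplaced closing delimiter")
    return result


print(qq('{"Two plus two": { 2 + 2 }}'))  # braces nest, nothing is evaluated
print(qq('["Two plus two": { 2 + 2 }]'))  # braces are an escape, code runs
```

Swapping the first two branches in nibble would make the escape win, which is the alternative design mentioned above.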
Obviously, this isn't a big deal since I can use a different set of delimiters, but I ran across it and thought I'd ask.
You didn't ask anything. :)
Let's say you've got some text. And you want to use double quote processing to get interpolation, except you don't want braced text to be interpolated as code. You could write, say, qq:!c '...'. But don't you think it's a lot easier to remember, write, and read qq{ ... }?
Nice little touch, right?
Which is why it's the way it is -- it's a very nice touch.
And, perhaps, why it's not documented -- it's little, and, once you encounter it, obvious what you need to do.
That said, the Q lang escapes include ones to recursively re-enter the Q lang:
say qq{"Two plus two": \qq[{ 2 + 2 }] }; # "Two plus two": 4
Does that answer your question? :)

Modifying ANTLR v4 auto-generated lexer?

I am writing a small language using ANTLR v4. ANTLR autogenerates lexer and parser files when you compile your grammar file (.g4); I am using javac, by the way. I want my language to have no semicolons, and the way I want to do this is: if an identifier or ")" is the last token on a line, the lexer automatically inserts a semicolon (similar to what Go does). How would I approach something like this? The lexer file also contains things like the ATN (which I think is an augmented transition network) and the DFA (which I think is a deterministic finite automaton) that I don't understand; how do they relate to the lexing process? Any help is appreciated. (I am still working on the grammar file, so it isn't complete yet.)
Several points here: the ATN and the DFA are internal structures for parser + lexer and not something you would touch to change parsing behavior. Also, it's not clear to me why you want to have the lexer insert a semicolon at some point. What exactly do you want to accomplish by that (don't say: to make semicolons optional in the parser, I mean the underlying reason).
If you want to accept a command without a trailing semicolon you can make that optional:
assignment: (simpleAssignment | complexAssignment) SEMI?;
The parser will give you the content of the assignment rule regardless whether there is a trailing semicolon or not. Is that what you want?
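For reference, the rule described in the question (insert a semicolon when a line ends in an identifier or ")") can be sketched independently of ANTLR as a pass over the token stream. The Python below is only an illustration under that assumption: the token pattern and the function names are hypothetical, and none of it is ANTLR API. In an ANTLR setup the comparable place to do this would be around the lexer's nextToken() method, not in the generated ATN/DFA tables.

```python
import re

# Hypothetical token pattern for illustration; a real grammar would define more.
TOKEN = re.compile(
    r"\s*(?:(?P<IDENT>[A-Za-z_]\w*)|(?P<NUMBER>\d+)|(?P<PUNCT>[-+*/=();]))"
)


def tokenize_line(line):
    """Split one source line into (kind, text) pairs."""
    line = line.rstrip()
    tokens = []
    pos = 0
    while pos < len(line):
        match = TOKEN.match(line, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at column {pos}: {line[pos:]!r}")
        kind = match.lastgroup
        tokens.append((kind, match.group(kind)))
        pos = match.end()
    return tokens


def insert_semicolons(source):
    """Append a ';' token after each line whose last token is an identifier or ')'."""
    out = []
    for line in source.splitlines():
        tokens = tokenize_line(line)
        out.extend(tokens)
        if tokens and (tokens[-1][0] == "IDENT" or tokens[-1][1] == ")"):
            out.append(("PUNCT", ";"))
    return out


source = "x = y\nprint(x)\nz = 1 +\n2\n"
print([text for _, text in insert_semicolons(source)])
```

With the rule exactly as stated, a line ending in a number gets no semicolon, so a real implementation would probably extend the set of line-ending tokens that trigger insertion.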

What is Nim's approach to distinguish between commands?

I'm trying to understand what kind of approach is used by Nim to distinguish between commands.
There's the "separatist approach" where a semicolon just separates commands (used in Pascal for example), the "terminist approach" where a semicolon completely terminates the command (used in C, C++, Java, etc.) and the "liberal approach" where the programmer can decide whether or not to use a semicolon.
My thoughts are that Nim belongs to the liberal approach, but that would mean that semicolons could be added at the end of commands and Nim doesn't support that.
Any other thoughts?
I'm trying to understand what kind of approach is used by Nim to distinguish between commands.
Why? This doesn't help in any way ... Nim has a complex syntax that doesn't readily fit into such boxes.
Your question is confused in several ways. First, what is a "command"? Semicolons separate statements or expressions. The difference between your categories matters mostly in expression languages--it determines whether the value of a block ending with a semicolon is the bottom value, or the value of the previous expression. "separatist" languages are confusing, error-prone, bad design, and obsolete--the mistakes of Algol are ancient history. Second, the categories don't make a lot of sense in languages like Nim where end-of-line is syntactically significant--a "missing" semicolon before a newline isn't really missing because the newline serves the same function. Third, Nim most certainly does allow semicolons at the ends of expressions or statements (but it doesn't allow empty statements or expressions, so ;; is disallowed).
Consider:
proc a: int = 5 # returns 5
proc b: int = 5; # syntax error
proc c: int = # returns 5
5
proc d: int = # returns 5
5;
proc e: int = # syntax error
5;;
Since the ; that differentiates c and d makes no semantic difference, one could say that it's closer to "liberal" than to "separatist" or "terminist", but it isn't very liberal ... you can't just put semicolons anywhere.
Nim, like Python, is a whitespace-aware language. It uses newlines as statement separators and indentation to produce block structures.
Not all languages have visible statement separators, although some allow a visible statement separator in some circumstances. (For example, in Python simple statements can be separated by semicolons, but not compound statements.)
"There are more things in heaven and earth, Horatio, Than are dreamt of in your philosophy" (Hamlet I.5:159–167)

How can I make Perl6 (MoarVM / Rakudo) warn about all missing semicolons?

In Perl 5, it's best to use
use strict;
use warnings;
to ask the compiler to complain about missing semicolons, undeclared variables, etc.
I have been informed by citizens of the Perl community here on SO that Perl 6 uses strict by default, and this seems after testing to be the case.
Semicolons aren't required for the last statement in a block, but if I extend the block later, I'll be chagrined when my code doesn't work because it's the same block (and also I want semicolons everywhere because it's, like, consistent and stuff).
My assumption is that Perl 6 doesn't even look at semicolons for the last statement in a block, but I'm still curious: is there a way to make it stricter yet?
Rather than enforce the extra semi-colon, Rakudo does try to give you a good error/hint if you do add to your block and forget to separate statements.
Typically I get "Two terms in a row across lines (missing semicolon or comma?)" when this happens.

Variable evaluation in LaTeX

I have the following piece of LaTeX code:
\def\a{1}
\def\b{2}
\def\c{\a+\b}
\def\d{\c/2}
I expected \d to have the value 1.5, but it did not. Adding parentheses to the definition of \c, like
\def\c{(\a+\b)}
doesn't work either, because if I use \c somewhere, it complains about the parentheses. Is there a way to evaluate \c before dividing it by 2 in the definition of \d? Like:
\def\d{\eval{\c}/2}
(I made that \eval up to show what I mean)
You could use the calc package for arithmetic operations. The package fp works with real numbers.
For discussing LaTeX problems you're kindly invited to visit tex.stackexchange.com.
You need to remember that \def is about creating replacement text. It will always give you back what you put in, quite apart from not knowing anything about maths. If we assume you are using e-TeX (likely), then for integer expressions you might do
\def\a{1}
\def\b{2}
\edef\c{\number\numexpr \a + \b \relax}
\edef\d{\number\numexpr \c / 2 \relax}
This uses the e-TeX primitive \numexpr, which does integer arithmetic (the division is rounded, so \d ends up as 2 here, not 1.5). For real numbers, Stefan is right that the fp package is the best approach.
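The "replacement text" point can be made concrete with a small Python sketch that treats macros as plain string substitution. It is only an analogy, not how TeX is implemented: the expand function and the macros dictionaries are made up for illustration, and Python's eval stands in for whatever would finally do the arithmetic.

```python
import re

MACRO = re.compile(r"\\([A-Za-z]+)")  # a backslash followed by a macro name


def expand(text, macros):
    """Replace \\name with its stored replacement text until no macros remain."""
    while MACRO.search(text):
        text = MACRO.sub(lambda m: macros[m.group(1)], text)
    return text


# The definitions from the question, stored as replacement text only.
macros = {"a": "1", "b": "2", "c": r"\a+\b", "d": r"\c/2"}
expanded = expand(r"\d", macros)
print(expanded)        # the text a later arithmetic step would see
print(eval(expanded))  # ordinary precedence: the division binds first

# With parentheses in the replacement text of \c the grouping survives expansion.
grouped = dict(macros, c=r"(\a+\b)")
print(expand(r"\d", grouped))
print(eval(expand(r"\d", grouped)))
```

Nothing is computed at definition time in this model: \d simply becomes the text 1+2/2, so any later evaluation sees the division before the addition. Evaluating \c first, as the \edef lines above do, avoids that.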