Handling invalid input in the Lexer / Parser - antlr

so, I am parsing Hayes modem AT commands. Not read from a file, but passed as char * (I am using C).
1) what happens if I get something that I totally don't recognize? How do I handle that?
2) what if I have something like
my_token: "cmd param=" ("value_1" | "value_2");
and receive an invalid value for "param"?
I see some advice to let the back-end program (in C) handle it, but that goes against the grain for me. Catch teh problem as early as you can, is my motto.
Is there any way to catch "else" conditions in lexer/parser rules?
Thanks in advance ...

That's the thing: the whole point of your parser and lexer is to blow up if you get bad input, then you catch the blow up and present a pretty error message to the user.

I think you're looking for Custom Syntax Error Recovery to embed in your grammar.
EDIT
I've no experience with ANTLR and C (or C alone for that matter), so follow this advice with caution! :)
Looking at the page: http://www.antlr.org/api/C/using.html, perhaps the part at the bottom, Implementing Customized Methods is what you're after.
HTH

Related

Precedence inside a function call

Using the defined-or operator ( // ) in a function call produces the result I'd expect:
say( 'nan'.Int // 42); # OUTPUT: «42»
However, using the lower-precedence orelse operator instead throws an error:
say( 'nan'.Int orelse 42);
# OUTPUT: «Error: Unable to parse expression in argument list;
# couldn't find final ')'
# (corresponding starter was at line 1)»
What am I missing about how precedence works?
(Or is the error a bug and I'm just overthinking this?)
I'd say, it's a grammar bug, as
say ("nan".Int orelse 42); # 42
works.
TL;DR My super useful naanswer (not-an-answer / non-authoritative answer / food for thought) is it might be a bug or it might not. :)
Other examples:
say(42 and 42);
say(42 ==> 99);
yield the same error.
What am I missing about how precedence works?
Perhaps nothing. Perhaps it will be desirable and possible to fix the grammar so these function-call-arg-list-signifying parens determine precedence just like plain expression parens do.
If so, perhaps fixing it would best wait, or perhaps realistically must wait, until when or after RakuAST lands (6.e?). Or perhaps even later, lf/when grammar cleanup/slangs lands (6.f?).
Or perhaps it's going to always stay as it is for reasons such as good usability (despite the initial "huh?") and/or expediency and/or single-pass parsing and/or whatever.
I've dug a little to see if I could find relevant commentary. Here are some (juicy?) bits:
the OPP is a bit more complex than a standard binary-operator OPP
(from a comment on #perl6)
If you scroll backwards from Larry's comment you'll see he said this in the context of Raku's extraordinary seamless parsing (no delimiters introduced) in a single pass of nested sub-languages that each can have arbitrary grammars.
(Btw, one thought I had: did std parse say(42 and 42) fine? I'm not sure if there's a running std anywhere these days.)
While we do have complete control of stock Raku, I'm not convinced there's anything compelling about bending over backwards to fix every wrinkle of this sort (foo(... op ...) in this case) when the general case (..... where the middle ... inside the outer pair of .s has arbitrary syntax) means we'll be hitting limits in how "perfect" it can all be when there's a huge amount of anarchic language / syntax mixing going on in userland/module space, as I anticipate will emerge in years to come.
So, imo, if it's reasonably easy to fix, without unduly cramping or burdening user slang freedom, great. If not, I think the current situation is fair enough (though perhaps it'll be desirable, viable and reasonable to improve the error message).
Perhaps consider the foregoing in combination with:
Raku borrows many concepts from human language ...
(from the doc)
in combination with:
☞ Self-clocking code produces better syntax error messages
(from Seeing Wrong Right)
in combination with:
Break that clock and your error messages will turn to mush
(from a mailing list comment)
But then again:
Please don't assume that rakudo's idiosyncracies and design fossils are canonical.
Do you mean this, maybe...?
> say ( NaN.Int orelse 42 )
42
since
> say( NaN.Int orelse 42 )
===SORRY!=== Error while compiling:
Unable to parse expression in argument list; couldn't find final ')' (corresponding starter was at line 1)
------> say( '42'.Int⏏ orelse 42 )
expecting any of:
infix
infix stopper
I would tend to agree with #lizmat that there is a grammar bug in the compiler.

Drop use of = in parsing arguments in raku

The fact that you can write in raku the following
unit sub MAIN(Int $j = 2);
say $j
is amazing, and the fact that the argument parsing is done for you is beyond
useful. However I find personally extremely unergonomic
that for such arguments you habe to write a = to set the value, i.e.
./script.raku -j=5
I was wondering if there is a way to tell the parser that it should allow options without
the = so that I can write
./script.raku -j 5
I haven't seen this in the docs and this would really be much more intuitive for some people
like me. If it is not currently possible, I think it would be a useful add-on.
You could also use SuperMAIN, a library for CLI processing. This add some new superpowers to MAIN
There has been a lot of discussion of how command line parameters should be parsed. At the moment there are no plans of adding more functionality to what Raku provides out of the box.
If you want more tweakability, you should probably look at the Getopt::Long module by Leon Timmermans

Dollar and exclamation mark (bang) symbols in VTL

I've recently encountered these two variables in some Velocity code:
$!variable1
!$variable2
I was surprised by the similarity of these so I became suspicious about the correctness of the code and become interested in finding the difference between two.
Is it possible that velocity allows any order of these two symbols or do they have different purpose? Do you know the answer?
#Jr. Here is the guide I followed when doing VM R&D: http://velocity.apache.org/engine/1.7/user-guide.html
Velocity uses the !$ and $! annotations for different things. If you use !$ it will basically be the same as a normal "!" operator, but the $! is used as a basic check to see if the variable is blank and if so it prints it out as an empty string. If your variable is empty or null and you don't use the $! annotation it will print the actual variable name as a string.
I googled and stackoverflowed a lot before I finally found the answer at people.apache.org.
According to that:
It is very easy to confuse the quiet reference notation with the
boolean not-Operator. Using the not-Operator, you use !${foo}, while
the quiet reference notation is $!{foo}. And yes, you will end up
sometimes with !$!{foo}...
Easy after all, shame it didn't struck me immediately. Hope this helps someone.

Grammar-Kit: how to handle comment tokens

From the documentation provided for grammar-kit, I cannot figure out how I am supposed to correctly handle something like comments. My lexer currently returns TokenType.WHITE_SPACE for any comment blocks, but then no unique IElementType is generated for me to do syntax highlighting on.
If I create an IElementType and tell flex to return that for comments, I can perform syntax highlighting, but then that token is not a part of my language spec in the BNF, and so it is considered invalid.
What is the correct way to pass comments through as white space, but perform syntax highlighting on them in Intellij/grammar-kit/jflex?
You can use Grammar-Kit implementation as a reference:
Lexer
Grammar
ParserDefinition
Using TokenType.WHITE_SPACE for comments is a bad idea.
More details can be found here.

Declare a variable

I didn't touch Prolog since high-school, and even though I've tried to find the info, it didn't help. Below is the example that has to illustrate my problem:
%% everybody():- [dana, cody, bess, abby].
%% Everybody = [dana, cody, bess, abby].
likes(dana, cody).
hates(bess, dana).
hates(cody, abby).
hates(X, Y):- \+ likes(X, Y).
likes_somebody(_, []):- fail.
likes_somebody(X, [girl | others]):-
likes(X, girl) ; likes_somebody(X, others).
likes_everybody(_, []):- true.
likes_everybody(X, [girl | others]):-
likes(X, girl) , likes_everybody(X, others).
maplist(likes_somebody, [dana, cody, bess, abby], [dana, cody, bess, abby]).
How do I declare everybody to just be the list of girls? The commented lines are those which I've tried, but I've got bizarre error messages back.
This is the tutorial I followed more or less so far. I'm using GProlog, if it makes any difference. Sorry for such a basic question. GProlog's manual doesn't deal with language syntax, but I've certainly looked there. As an aside, I would be grateful for information on where to look for language documentation (as opposed to implementation documentation).
Every variable in Prolog must begin with an uppercase letter. So for starters, you want Everybody, not everybody.
Second problem, variables in Prolog are not assignables. So probably what you want to do is make a fact and use that instead:
everybody([dana, cody, bess, abby]).
Your bottom line of code is actually a fact definition and will attempt to overwrite maplist/3. What you probably want to do is put everything above that line into a file (say, called likes.pl) and then consult it ([likes].). Then you can run a query like this:
?- everybody(Everybody), maplist(likes_somebody, Everybody, Everybody).
This won't work, because likes_somebody/2 processes a list in the second argument. The predicate you have for likes_somebody/2 could be written:
likes_somebody(_, []).
but still won't mean much. It simply unifies anything with the empty list:
?- likes_somebody(chicken_tacos, []).
true.
You really need a predicate to tell you if someone is a girl, like this:
girl(dana).
girl(cody).
girl(bess).
girl(abby).
Then you could do what I think you're trying to do, which is something closer to this:
likes_somebody(X) :- girl(X).
Then the maplist construction would work like this:
everybody(Everybody), maplist(likes_somebody, Everybody).
Which would simply return true. You could simplify and eliminate everybody/1 by instead using findall(Girl, girl(X), Everybody) but it's getting weird.
You're trying to do list processing with likes_everybody/2, but it's broken because girl is literally girl, not a variable, and others is literally others, not a list of some kind that could be the tail of another list.
I think you still have some old ideas you need to cleanse. Read some more, write some more, and your code will start to make a lot more sense.