What do the numbers mean in antlr4 error and warning messages ?

What do the numbers mean in antlr4 error and warning messages ? - antlr

Can someone point to the antlr 4 documentation or tell me about the numbers in error and warning messages ?
I have a lexer file and a parser file that is generating this warning:
warning(125): Sybase\SybTSqlParser.g4:1084:158: implicit definition of token R in parser
The numbers "1084:158" do not seem to correspond to a line number or character count.

After some inspiration by Bart Kiers jogging some old memory cells this is the explanation:
When compiling independent lexer and parser files, the lexer files lines are concatenated to the parser file line numbers.
If the lexer has 10 lines and the error is detected on parser file line 25, the error is reported on line 35.

Related

Antlr4 (java) tries to match all input to first token

my antlr (I'm using IntelliJ plugin) matches all my input to the first expression in my parser rule, which obviously causes an error.
Simple example:
grammar test;
rule : WORD '+' WORD;
WORD : [a-z]+;
Now testing:
input = 'faefae' gets me:
line 1:6 mismatched input '' expecting '+'
(so far it makes sense)
input = 'faefae+':
line 1:0 mismatched input 'faefae+' expecting WORD.
input = 'faefae+faefae':
line 1:0 mismatched input 'faefae+faefae' expecting WORD.
Last input should work, why doesn't it?
Help is much appreciated,
wish you all a nice day!

faefae+faefae will parse just fine.
You probably haven't regenerated the lexer/parser classes.
With IntelliJ and the ANTLR4 plugin, I get this:

ANTLR 4.5 - Mismatched Input 'x' expecting 'x'

I have been starting to use ANTLR and have noticed that it is pretty fickle with its lexer rules. An extremely frustrating example is the following:
grammar output;
test: FILEPATH NEWLINE TITLE ;
FILEPATH: ('A'..'Z'|'a'..'z'|'0'..'9'|':'|'\\'|'/'|' '|'-'|'_'|'.')+ ;
NEWLINE: '\r'? '\n' ;
TITLE: ('A'..'Z'|'a'..'z'|' ')+ ;
This grammar will not match something like:
c:\test.txt
x
Oddly if I change TITLE to be TITLE: 'x' ; it still fails this time giving an error message saying "mismatched input 'x' expecting 'x'" which is highly confusing. Even more oddly if I replace the usage of TITLE in test with FILEPATH the whole thing works (although FILEPATH will match more than I am looking to match so in general it isn't a valid solution for me).
I am highly confused as to why ANTLR is giving such extremely strange errors and then suddenly working for no apparent reason when shuffling things around.

This seems to be a common misunderstanding of ANTLR:
Language Processing in ANTLR:
The Language Processing is done in two strictly separated phases:
Lexing, i.e. partitioning the text into tokens
Parsing, i.e. building a parse tree from the tokens
Since lexing must preceed parsing there is a consequence: The lexer is independent of the parser, the parser cannot influence lexing.
Lexing
Lexing in ANTLR works as following:
all rules with uppercase first character are lexer rules
the lexer starts at the beginning and tries to find a rule that matches best to the current input
a best match is a match that has maximum length, i.e. the token that results from appending the next input character to the maximum length match is not matched by any lexer rule
tokens are generated from matches:
if one rule matches the maximum length match the corresponding token is pushed into the token stream
if multiple rules match the maximum length match the first defined token in the grammar is pushed to the token stream
Example: What is wrong with your grammar
Your grammar has two rules that are critical:
FILEPATH: ('A'..'Z'|'a'..'z'|'0'..'9'|':'|'\\'|'/'|' '|'-'|'_'|'.')+ ;
TITLE: ('A'..'Z'|'a'..'z'|' ')+ ;
Each match, that is matched by TITLE will also be matched by FILEPATH. And FILEPATH is defined before TITLE: So each token that you expect to be a title would be a FILEPATH.
There are two hints for that:
keep your lexer rules disjunct (no token should match a superset of another).
if your tokens intentionally match the same strings, then put them into the right order (in your case this will be sufficient).
if you need a parser driven lexer you have to change to another parser generator: PEG-Parsers or GLR-Parsers will do that (but of course this can produce other problems).

This was not directly OP's problem, but for those who have the same error message, here is something you could check.
I had the same Mismatched Input 'x' expecting 'x' vague error message when I introduced a new keyword. The reason for me was that I had placed the new key word after my VARNAME lexer rule, which assigned it as a variable name instead of as the new keyword. I fixed it by putting the keywords before the VARNAME rule.

Internal EBCDIC support for ANTLR 3.1.3?

I'm trying to use ANTLR 3.1.3 on a system with local EBCDIC char set
Even a simple grammar like this:
lexer grammar test;
GENERIC_ID
: (LETTER)*
;
fragment LETTER
: 'a' .. 'z'
;
results in these errors during the initial compile (java org.antlr.Tool test.g):
error(10): internal error: problem parsing group <unknown>: line 1:1: unexpected char: 0x7 : line 1:1: unexpected char: 0x7
org.antlr.stringtemplate.language.GroupLexer.nextToken(GroupLexer.java:233)
antlr.TokenBuffer.fill(TokenBuffer.java:69)
antlr.TokenBuffer.LA(TokenBuffer.java:80)
antlr.LLkParser.LA(LLkParser.java:52)
antlr.Parser.match(Parser.java:210)
org.antlr.stringtemplate.language.GroupParser.group(GroupParser.java:120)
org.antlr.stringtemplate.StringTemplateGroup.parseGroup(StringTemplateGroup.java:792)
org.antlr.stringtemplate.StringTemplateGroup.<init>(StringTemplateGroup.java:274)
org.antlr.stringtemplate.PathGroupLoader.loadGroup(PathGroupLoader.java:67)
org.antlr.stringtemplate.StringTemplateGroup.loadGroup(StringTemplateGroup.java:969)
org.antlr.stringtemplate.StringTemplateGroup.loadGroup(StringTemplateGroup.java:955)
org.antlr.codegen.CodeGenerator.loadTemplates(CodeGenerator.java:198)
org.antlr.codegen.CodeGenerator.genRecognizer(CodeGenerator.java:292)
org.antlr.Tool.generateRecognizer(Tool.java:607)
org.antlr.Tool.process(Tool.java:429)
org.antlr.Tool.main(Tool.java:91)
error(10): internal error: test.g : java.lang.IllegalArgumentException: Can't find template outputFile.st; group hierarchy is [null]
org.antlr.stringtemplate.StringTemplateGroup.lookupTemplate(StringTemplateGroup.java:507)
org.antlr.stringtemplate.StringTemplateGroup.getInstanceOf(StringTemplateGroup.java:392)
org.antlr.stringtemplate.StringTemplateGroup.getInstanceOf(StringTemplateGroup.java:404)
org.antlr.codegen.CodeGenerator.genRecognizer(CodeGenerator.java:314)
org.antlr.Tool.generateRecognizer(Tool.java:607)
org.antlr.Tool.process(Tool.java:429)
org.antlr.Tool.main(Tool.java:91)
the grammar file seems to be processed appropriately, but it appears something internal is causing some problems. No matter what chars I use in the grammar file, the illegal char always seems to be 0x7.
Can I not compile ANTLR on a system with local EBCDIC char set? Any suggestions?
Update: it appears the problem lies in the template files (.stg files). If I convert the files in the codegen/templates directory to EBCDIC (also ANTLRCore.sti) then the compilation seems to complete. Is there a way to tell java/antlr to not read these files in local encoding? Either that or are these template files available in other encodings? Otherwise I am forced to convert by hand and replace each one

ANTLR error when not enough, or too many, newlines

ANTLR gives me the following error when my input file has either no newline at the EOF, or more than one.
line 0:-1 mismatched input '' expecting NEWLINE
How would I go about taking into account the possibilities of having multiple or no newlines at the end of the input file. Preferably I'd like to account for this in the grammar.

The rule:
parse
: (Token LineBreak)+ EOF
;
only parses a stream of tokens, separated by exactly one line break, ending with exactly one line break.
While the rule:
parse
: Token (LineBreak+ Token)* LineBreak* EOF
;
parses a stream of tokens separated by one or more line breaks, ending with zero, one or more line breaks.
But do you really need to make the line breaks visible in the parser? Couldn't you put them on a "hidden channel" instead?
If this doesn't answer your question, you'll have to post your grammar (you can edit your original question for that).
HTH

Can a parser tell the lexer to ignore a newline?

I'm writing a preprocessor for my language. In the preprocessor I've output a line that wasn't in the source file. This causes any error messages that Anltr creates to be incremented by one line.
The Lexer handles the line count so I'm wondering if there is a way for the parser to tell the lexer to decrement the line count, or to ignore a specific newline.
I'm also open to other suggestions on how to work around this.
The only constraint I have is putting the extra line inline with the existing code. I'd prefer to keep it on it's own line to keep my parsing sane.

What if you remove the pre-processor line and replace it with the new line instead of adding a new one? Then parse the output of the pre-processor.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas