antlr grammar multiple alternatives

antlr grammar multiple alternatives - antlr

I have this simple grammar for a C# like syntax. I can't figure out any way to separate fields and methods. All the examples I've seen for parsing C# combine fields and methods in the same rule. I would like to split them up as my synatx is pretty simple.
grammar test;
options
{
language =CSharp2;
k = 3;
output = AST;
}
SEMI : ';' ;
LCURLY : '{' ;
RCURLY : '}' ;
LPAREN : '(' ;
RPAREN : ')' ;
DOT :'.';
IDENTIFIER
: ( 'a'..'z' | 'A'..'Z' | '_' )
( 'a'..'z' | 'A'..'Z' | '_' | '0'..'9' )*
;
namespaceName
: IDENTIFIER (DOT IDENTIFIER)*
;
classDecl
: 'class' IDENTIFIER LCURLY (fieldDecl | methodDecl)* RCURLY
;
fieldDecl
: namespaceName IDENTIFIER SEMI;
methodDecl
: namespaceName IDENTIFIER LPAREN RPAREN SEMI;
I always end up wit this warning
Decision can match input such as "IDENTIFIER DOT IDENTIFIER" using multiple alternatives: 1, 2

Since namespaceName can be IDENTIFIER DOT IDENTIFIER DOT IDENTIFIER ... I think you have problems with k=3 in your options.
Can you remove the K option, ANTLR will default to K=*.

Related

How to allow an identifer which can start with a digit without causing MismatchedTokenException

I want to match the following input:
statement span=1m 0_dur=12
with the following grammar:
options {
language = Java;
output=AST;
ASTLabelType=CommonTree;
}
statement :'statement' 'span' '=' INTEGER 'm' ident '=' INTEGER;
INTEGER
: DIGIT+
;
ident : IDENT | 'AVG' | 'COUNT';
IDENT
: (LETTER | DIGIT | '_')+ ;
WHITESPACE
: ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
fragment
LETTER : ('a'..'z' | 'A'..'Z') ;
fragment
DIGIT : '0'..'9';
but it cause an error:
MismatchedTokenException : line 1:15 mismatched input '1m' expecting '\u0004'
Does anyone has any idea how to solve this?
THanks
Charles

I think your grammar is context sensitive, even at the lexical analyser(Tokenizer) level. The string "1m" is recognized as IDENT, not INTEGER followed by 'm'. You either redefine your syntax, or use predicated parsing, or embed Java code in your grammar to detect the context (e.g. If the number is presented after "span" followed by "=", then parse it as INTEGER).

fatal error in grammar - piecewise definition

I am translating a grammar from LALR to ANTLR and I am having trouble with translating this one rule, piecewise expression.
Attached is the sample grammar:
grammar Test;
options {
language = Java;
output = AST;
}
parse : expression ';'
;
expression : binaryExpression
| piecesExpression
;
binaryExpression : addingExpression (('=='|'!='|'<='|'>='|'>'|'<') addingExpression)*
;
addingExpression : multiplyingExpression (('+'|'-') multiplyingExpression)*
;
multiplyingExpression : unaryExpression
(('*'|'/') unaryExpression)*
;
unaryExpression: ('!'|'-')* primitiveElement;
primitiveElement : literalExpression
| id
| '(' expression ')'
;
literalExpression : INT
;
id : IDENTIFIER
;
piecesExpression : 'piecewise' '{' piece expression '}' ('(' expression ',' expression ')')? expression?
;
piece : expression '->' expression ';' (expression '->' expression ';')*
;
// L E X I C A L R U L E S
INT : DIGITS ;
IDENTIFIER : LETTER (LETTER | DIGIT)*;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
fragment LETTER : ('a'..'z' | 'A'..'Z' | '_') ;
fragment DIGITS: DIGIT+;
fragment DIGIT : '0'..'9';
ANTLR v3.5 is complaining about the piecesExpression rule. It has 2 fatal errors and I would rather not use backtrack option.
Expected results:
piecewise {t -> s; t -> x; 100}
piecewise {t -> s; t -> x; 100} (0, x+1)
piecewise {t -> s; t -> x; 100} (0, x+1) y+5
How can piecesExpression be able to capture the above results?
Thanks in advance!

ANTLR has problems determining which alternatives to take in (at least) 2 cases:
piece starts with a expression but inside the piecewise{...}, it should also end with an expression
piecesExpression ends with '(' expression ... but also has an optional trailing expression (and an primitiveElement also matches '(' expression ... in its turn)
There's no need to use global backtracking, but without rewriting many rules, you do need to add some predicates (the (...)=> in the example below) to fix the two issues outlined above.
Try this:
piecesExpression
: 'piecewise' '{' ((expression '->')=> piece)+ expression '}'
( ('(' expression ',')=> '(' expression ',' expression ')' expression?
| expression
)
;
piece
: expression '->' expression ';'
;

Why is this grammar giving me a "non LL(*) decision" error?

I am trying to add support for expressions in my grammar. I am following the example given by Scott Stanchfield's Antlr Tutorial. For some reason the add rule is causing an error. It is causing a non-LL(*) error saying, "Decision can match input such as "'+'..'-' IDENT" using multiple alternatives"
Simple input like:
a.b.c + 4
causes the error. I am using the AntlrWorks Interpreter to test my grammar as I go. There seems to be a problem with how the tree is built for the unary +/- and the add rule. I don't understand why there are two possible parses.
Here's the grammar:
path : (IDENT)('.'IDENT)* //(NAME | LCSTNAME)('.'(NAME | LCSTNAME))*
;
term : path
| '(' expression ')'
| NUMBER
;
negation
: '!'* term
;
unary : ('+' | '-')* negation
;
mult : unary (('*' | '/' | '%') unary)*
;
add : mult (( '+' | '-' ) mult)*
;
relation
: add (('==' | '!=' | '<' | '>' | '>=' | '<=') add)*
;
expression
: relation (('&&' | '||') relation)*
;
multiFunc
: IDENT expression+
;
NUMBER : DIGIT+ ('.'DIGIT+)?
;
IDENT : (LCLETTER|UCLETTER)(LCLETTER|UCLETTER|DIGIT|'_')*
;
COMMENT
: '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
| '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
;
WS : (' ' | '\t' | '\r' | '\n' | '\f')+ {$channel = HIDDEN;}
;
fragment
LCLETTER
: 'a'..'z'
;
fragment
UCLETTER: 'A'..'Z'
;
fragment
DIGIT : '0'..'9'
;
I need an extra set of eyes. What am I missing?

The fact that you let one or more expressions match in:
multiFunc
: IDENT expression+
;
makes your grammar ambiguous. Let's say you're trying to match "a 1 - - 2" using the multiFunc rule. The parser now has 2 possible ways to parse this: a is matched by IDENT, but the 2 minus signs 1 - - 2 cause trouble for expression+. The following 2 parses are possible:
parse 1
parse 2

Your grammar in rule multiFunc has a list of expressions. An expression can begin with + or - on behalf of unary, thus due to the list, it can also be followed by the same tokens. This is in conflict with the add rule: there is a problem deciding between continuation and termination.

ANTLR - Semantic predicate and LL(1)

I want to make a LL(1) grammer in ANTLR that allows a multiple assigment, like:
x = y = 5;
I think semantic predicate are usefull in this situation, but the following rules won't work :(
tokens {
BECOMES = '='
}
assignment_statement
: IDENTIFIER BECOMES expr
;
expr
: (IDENTIFIER BECOMES)=> IDENTIFIER BECOMES expr
| expr_or
;
IDENTIFIER
: LETTER (LETTER | DIGIT)*
;
ANTLRWORKS gives a NoViableAltException.
Do you know what I did wrong and how to make this work?
Thank you!

A grammar with a syntactic (not semantic) predicate that looks ahead 2 tokens isn't LL(1), of course.
But, you don't need a predicate, simply do something like this:
grammar T;
options {
output=AST;
}
tokens {
BECOMES = '=';
}
assignment_statement
: (IDENTIFIER BECOMES)+ expr ';'
;
expr
: IDENTIFIER
| NUMBER
;
IDENTIFIER
: LETTER (LETTER | DIGIT)*
;
NUMBER
: DIGIT+
;
fragment LETTER : 'a'..'z' | 'A'..'Z';
fragment DIGIT : '0'..'9';
which would parse the input "x=y=5;" as follows:
but would reject input like "x=2=3;".
Also, ANTLRWorks' interpreter doesn't work with any kind of predicate: use ANTLRWorks' debugger instead.

Parse sentences with different word types

I'm looking for a grammar for analyzing two type of sentences, that
means words separated by white spaces:
ID1: sentences with words not beginning with numbers
ID2: sentences with words not beginning with numbers and numbers
Basically, the structure of the grammar should look like
ID1 separator ID2
ID1: Word can contain number like Var1234 but not start with a number
ID2: Same as above but 1234 is allowed
separator: e. g. '='
#Bart
I just tried to add two tokens '_' and '"' as lexer-rule Special for later use in lexer-rule Word.
Even I haven't used Special in the following grammar, I get the following error in ANTLRWorks 1.4.2:
The following token definitions can never be matched because prior tokens match the same input: Special
But when I add fragment before Special, I don't get that error. Why?
grammar Sentence1b1;
tokens
{
TCUnderscore = '_' ;
TCQuote = '"' ;
}
assignment
: id1 '=' id2
;
id1
: Word+
;
id2
: ( Word | Int )+
;
Int
: Digit+
;
// A word must start with a letter
Word
: ( 'a'..'z' | 'A'..'Z') ('a'..'z' | 'A'..'Z' | Digit )*
;
Special
: ( TCUnderscore | TCQuote )
;
Space
: ( ' ' | '\t' | '\r' | '\n' ) { $channel = HIDDEN; }
;
fragment Digit
: '0'..'9'
;
Lexer-rule Special shall then be used in lexer-rule Word:
Word
: ( 'a'..'z' | 'A'..'Z' | Special ) ('a'..'z' | 'A'..'Z' | Special | Digit )*
;

I'd go for something like this:
grammar Sentence;
assignment
: id1 '=' id2
;
id1
: Word+
;
id2
: (Word | Int)+
;
Int
: Digit+
;
// A word must start with a letter
Word
: ('a'..'z' | 'A'..'Z') ('a'..'z' | 'A'..'Z' | Digit)*
;
Space
: (' ' | '\t' | '\r' | '\n') {skip();}
;
fragment Digit
: '0'..'9'
;
which will parse the input:
Word can contain number like Var1234 but not start with a number = Same as above but 1234 is allowed
as follows:
EDIT
To keep lexer rule nicely packed together, I'd keep them all at the bottom of the grammar instead of partly in the tokens { ... } block, which I only use for defining "imaginary tokens" (used in AST creation):
// wrong!
Special : (TCUnderscore | TCQuote);
TCUnderscore : '_';
TCQuote : '"';
Now, with the rules above, TCUnderscore and TCQuote can never become a token because when the lexer stumbles upon a _ or ", a Special token is created. Or in this case:
// wrong!
TCUnderscore : '_';
TCQuote : '"';
Special : (TCUnderscore | TCQuote);
the Special token can never be created because the lexer would first create TCUnderscore and TCQuote tokens. Hence the error:
The following token definitions can never be matched because prior tokens match the same input: ...
If you make TCUnderscore and TCQuote a fragment rule, you don't have that problem because fragment rules only "serve" other lexer rules. So this works:
// good!
Special : (TCUnderscore | TCQuote);
fragment TCUnderscore : '_';
fragment TCQuote : '"';
Also, fragment rules can therefor never be "visible" in any of your parser rules (the lexer will never create a TCUnderscore or TCQuote token!).
// wrong!
parse : TCUnderscore;
Special : (TCUnderscore | TCQuote);
fragment TCUnderscore : '_';
fragment TCQuote : '"';

I'm not sure if that fits your needs but with Bart's help in my post
ANTLR - identifier with whitespace
i came to this grammar:
grammar PropertyAssignment;
assignment
: id_nodigitstart '=' id_digitstart EOF
;
id_nodigitstart
: ID_NODIGITSTART+
;
id_digitstart
: (ID_DIGITSTART|ID_NODIGITSTART)+
;
ID_NODIGITSTART
: ('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'0'..'9')*
;
ID_DIGITSTART
: ('0'..'9'|'a'..'z'|'A'..'Z')+
;
WS : (' ')+ {skip();}
;
"a name = my 4value" works while "4a name = my 4value" causes an exception.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

antlr grammar multiple alternatives - antlr

Since namespaceName can be IDENTIFIER DOT IDENTIFIER DOT IDENTIFIER ... I think you have problems with k=3 in your options. Can you remove the K option, ANTLR will default to K=*.

Related

How to allow an identifer which can start with a digit without causing MismatchedTokenException

fatal error in grammar - piecewise definition

Why is this grammar giving me a "non LL(*) decision" error?

ANTLR - Semantic predicate and LL(1)

Parse sentences with different word types

Categories

Resources