Small grammar that doesn't work; what am I missing (antlr4)

I have the following grammar. It's supposed to accept the strings shown in the comment in the header. It does not. I must be missing something fundamental. Hints on how to debug this would also be appreciated.
/*
Should accept:
b
a:b
a:b^10
b^10
Should not accept:
:b
a:
a:^10
*/
grammar test;
filter:
boostedField EOF
;
boostedField
: qualifiedField (CARET NUMBER)?
;
qualifiedField
: (FIELDNAME COLON)? term
;
term
: TERM
;
FIELDNAME: (LETTER | UNDERSCORE) (ALPHANUM | UNDERSCORE)* ;
NUMBER : NUM_CHAR+ ('.' NUM_CHAR+)? ;
COLON : ':' ;
CARET : '^' ;
WS : (' ' | '\t' | '\n' | '\r' | '\u3000') -> skip ;
UNDERSCORE: '_' ;
// a term may not have a colon or a caret (unless escaped)
TERM : TERM_START_CHAR TERM_CHAR*;
fragment TERM_START_CHAR
: ~( ' ' | '\t' | '\n' | '\r' | '\u3000' | ':' | '^' ) ;
fragment TERM_CHAR : (TERM_START_CHAR | ESCAPED_CHAR) ;
fragment ESCAPED_CHAR : ( '\\' . ) ;
fragment NUM_CHAR: '0'..'9';
fragment LETTER: 'a'..'z' | 'A'..'Z' ;
fragment ALPHANUM: LETTER | NUM_CHAR;
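One way to start debugging is to look at which lexer rule wins for each piece of input. The following is only a rough Python model, not ANTLR itself: it assumes the ANTLR 4 behaviour that the longest match wins and that ties go to the rule defined first, and it re-expresses a few of the rules above as regular expressions (the `tokenize` helper and `RULES` table are made up for this illustration).

```python
import re

# Lexer rules from the grammar above, in definition order, approximated as regexes.
RULES = [
    ("FIELDNAME", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("NUMBER", r"[0-9]+(?:\.[0-9]+)?"),
    ("COLON", r":"),
    ("CARET", r"\^"),
    ("WS", r"[ \t\n\r\u3000]"),
    ("UNDERSCORE", r"_"),
    # TERM_START_CHAR followed by TERM_CHAR*; the escape alternative is tried first
    ("TERM", r"[^ \t\n\r\u3000:^](?:\\.|[^ \t\n\r\u3000:^])*"),
]

def tokenize(rules, text):
    """Return (rule_name, lexeme) pairs; longest match wins, ties go to the earliest rule."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            if m and m.end() - pos > best_len:  # strict '>' keeps the earlier rule on a tie
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name != "WS":  # WS is skipped, as in the grammar
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

print(tokenize(RULES, "b"))       # [('FIELDNAME', 'b')]
print(tokenize(RULES, "a:b^10"))
print(tokenize(RULES, "b-1"))     # [('TERM', 'b-1')]
```

Under this model a plain `b` comes out as FIELDNAME, never as TERM, because both rules match one character and FIELDNAME is defined first. The parser rule `term : TERM` would then never see the token it expects for such input. One possible direction (untested here) is to let `term` also accept FIELDNAME.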

Related

grammar does not separate '123 and ] though the rule is set for it

I am new to antlr. I am trying to parse some queries like [network-traffic:src_port = '123] and [network-traffic:src_port =] and [network-traffic:src_port = ] and ... I have a grammar as follows:
grammar STIXPattern;
pattern
: observationExpressions EOF
;
observationExpressions
: <assoc=left> observationExpressions FOLLOWEDBY observationExpressions #observationExpressionsFollowedBY
| observationExpressionOr #observationExpressionOr_
;
observationExpressionOr
: <assoc=left> observationExpressionOr OR observationExpressionOr #observationExpressionOred
| observationExpressionAnd #observationExpressionAnd_
;
observationExpressionAnd
: <assoc=left> observationExpressionAnd AND observationExpressionAnd #observationExpressionAnded
| observationExpression #observationExpression_
;
observationExpression
: LBRACK comparisonExpression RBRACK # observationExpressionSimple
| LPAREN observationExpressions RPAREN # observationExpressionCompound
| observationExpression startStopQualifier # observationExpressionStartStop
| observationExpression withinQualifier # observationExpressionWithin
| observationExpression repeatedQualifier # observationExpressionRepeated
;
comparisonExpression
: <assoc=left> comparisonExpression OR comparisonExpression #comparisonExpressionOred
| comparisonExpressionAnd #comparisonExpressionAnd_
;
comparisonExpressionAnd
: <assoc=left> comparisonExpressionAnd AND comparisonExpressionAnd #comparisonExpressionAnded
| propTest #comparisonExpressionAndpropTest
;
propTest
: objectPath NOT? (EQ|NEQ) primitiveLiteral # propTestEqual
| objectPath NOT? (GT|LT|GE|LE) orderableLiteral # propTestOrder
| objectPath NOT? IN setLiteral # propTestSet
| objectPath NOT? LIKE StringLiteral # propTestLike
| objectPath NOT? MATCHES StringLiteral # propTestRegex
| objectPath NOT? ISSUBSET StringLiteral # propTestIsSubset
| objectPath NOT? ISSUPERSET StringLiteral # propTestIsSuperset
| LPAREN comparisonExpression RPAREN # propTestParen
| objectPath NOT? (EQ|NEQ) objectPathThl # propTestThlEqual
;
startStopQualifier
: START TimestampLiteral STOP TimestampLiteral
;
withinQualifier
: WITHIN (IntPosLiteral|FloatPosLiteral) SECONDS
;
repeatedQualifier
: REPEATS IntPosLiteral TIMES
;
objectPath
: objectType COLON firstPathComponent objectPathComponent?
;
objectPathThl
: varThlType DOT firstPathComponent objectPathComponent?
;
objectType
: IdentifierWithoutHyphen
| IdentifierWithHyphen
;
varThlType
: IdentifierWithoutHyphen
| IdentifierWithHyphen
;
firstPathComponent
: IdentifierWithoutHyphen
| StringLiteral
;
objectPathComponent
: <assoc=left> objectPathComponent objectPathComponent # pathStep
| '.' (IdentifierWithoutHyphen | StringLiteral) # keyPathStep
| LBRACK (IntPosLiteral|IntNegLiteral|ASTERISK) RBRACK # indexPathStep
;
setLiteral
: LPAREN RPAREN
| LPAREN primitiveLiteral (COMMA primitiveLiteral)* RPAREN
;
primitiveLiteral
: orderableLiteral
| BoolLiteral
| edgeCases
;
edgeCases
: QUOTE (IdentifierWithHyphen | IdentifierWithoutHyphen | IntNoSign) RBRACK
| RBRACK
;
orderableLiteral
: IntPosLiteral
| IntNegLiteral
| FloatPosLiteral
| FloatNegLiteral
| StringLiteral
| BinaryLiteral
| HexLiteral
| TimestampLiteral
;
IntNegLiteral :
'-' ('0' | [1-9] [0-9]*)
;
IntNoSign :
('0' | [1-9] [0-9]*)
;
IntPosLiteral :
'+'? ('0' | [1-9] [0-9]*)
;
FloatNegLiteral :
'-' [0-9]* '.' [0-9]+
;
FloatPosLiteral :
'+'? [0-9]* '.' [0-9]+
;
HexLiteral :
'h' QUOTE TwoHexDigits* QUOTE
;
BinaryLiteral :
'b' QUOTE
( Base64Char Base64Char Base64Char Base64Char )*
( (Base64Char Base64Char Base64Char Base64Char )
| (Base64Char Base64Char Base64Char ) '='
| (Base64Char Base64Char ) '=='
)
QUOTE
;
StringLiteral :
QUOTE ( ~['\\] | '\\\'' | '\\\\' )* QUOTE
;
BoolLiteral :
TRUE | FALSE
;
TimestampLiteral :
't' QUOTE
[0-9] [0-9] [0-9] [0-9] HYPHEN
( ('0' [1-9]) | ('1' [012]) ) HYPHEN
( ('0' [1-9]) | ([12] [0-9]) | ('3' [01]) )
'T'
( ([01] [0-9]) | ('2' [0-3]) ) COLON
[0-5] [0-9] COLON
([0-5] [0-9] | '60')
(DOT [0-9]+)?
'Z'
QUOTE
;
//////////////////////////////////////////////
// Keywords
AND: 'AND' ;
OR: 'OR' ;
NOT: 'NOT' ;
FOLLOWEDBY: 'FOLLOWEDBY';
LIKE: 'LIKE' ;
MATCHES: 'MATCHES' ;
ISSUPERSET: 'ISSUPERSET' ;
ISSUBSET: 'ISSUBSET' ;
LAST: 'LAST' ;
IN: 'IN' ;
START: 'START' ;
STOP: 'STOP' ;
SECONDS: 'SECONDS' ;
TRUE: 'true' ;
FALSE: 'false' ;
WITHIN: 'WITHIN' ;
REPEATS: 'REPEATS' ;
TIMES: 'TIMES' ;
// After keywords, so the lexer doesn't tokenize them as identifiers.
// Object types may have unquoted hyphens, but property names
// (in object paths) cannot.
IdentifierWithoutHyphen :
[a-zA-Z_] [a-zA-Z0-9_]*
;
IdentifierWithHyphen :
[a-zA-Z_] [a-zA-Z0-9_-]*
;
EQ : '=' | '==';
NEQ : '!=' | '<>';
LT : '<';
LE : '<=';
GT : '>';
GE : '>=';
QUOTE : '\'';
COLON : ':' ;
DOT : '.' ;
COMMA : ',' ;
RPAREN : ')' ;
LPAREN : '(' ;
RBRACK : ']' ;
LBRACK : '[' ;
PLUS : '+' ;
HYPHEN : MINUS ;
MINUS : '-' ;
POWER_OP : '^' ;
DIVIDE : '/' ;
ASTERISK : '*';
EQRBRAC : ']';
fragment HexDigit: [A-Fa-f0-9];
fragment TwoHexDigits: HexDigit HexDigit;
fragment Base64Char: [A-Za-z0-9+/];
// Whitespace and comments
//
WS : [ \t\r\n\u000B\u000C\u0085\u00a0\u1680\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u2028\u2029\u202f\u205f\u3000]+ -> skip
;
COMMENT
: '/*' .*? '*/' -> skip
;
LINE_COMMENT
: '//' ~[\r\n]* -> skip
;
// Catch-all to prevent lexer from silently eating unusable characters.
InvalidCharacter
: .
;
Now when I feed it [network-traffic:src_port = '123] I expect ANTLR to split the input into '123 and ].
However, the grammar returns '123] and is not able to separate '123 and ].
Am I missing anything?
"grammar does not separate '123 and ] though the rule is set for it"
That is not true. The quote and 123 are separate tokens. As demonstrated/suggested in your previous ANTLR question: start by printing all the tokens to your console to see what tokens are being created. This should always be the first thing you do when trying to debug an ANTLR grammar. It will save you a lot of time and headache.
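To show what such a token dump looks like, here is a rough Python model of the lexer for a subset of the rules in the question. It is a sketch, not the generated ANTLR lexer: it assumes longest match wins with ties going to the earlier rule, leaves out the keyword, float, hex, binary and timestamp rules, and the `tokenize` helper and `STIX_RULES` table are names invented for this example. With the real tool you would get the same information from the generated lexer, for instance with the TestRig's `-tokens` option.

```python
import re

# Subset of the lexer rules from the question, kept in their original order.
STIX_RULES = [
    ("IntNegLiteral", r"-(?:0|[1-9][0-9]*)"),
    ("IntNoSign", r"0|[1-9][0-9]*"),
    ("IntPosLiteral", r"\+?(?:0|[1-9][0-9]*)"),
    ("StringLiteral", r"'(?:[^'\\]|\\'|\\\\)*'"),
    ("IdentifierWithoutHyphen", r"[a-zA-Z_][a-zA-Z0-9_]*"),
    ("IdentifierWithHyphen", r"[a-zA-Z_][a-zA-Z0-9_-]*"),
    ("EQ", r"==?"),
    ("QUOTE", r"'"),
    ("COLON", r":"),
    ("RBRACK", r"\]"),
    ("LBRACK", r"\["),
    ("WS", r"\s+"),
]

def tokenize(rules, text):
    """Return (rule_name, lexeme) pairs; longest match wins, ties go to the earliest rule."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            if m and m.end() - pos > best_len:  # strict '>' keeps the earlier rule on a tie
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name != "WS":  # whitespace is skipped
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

for name, lexeme in tokenize(STIX_RULES, "[network-traffic:src_port = '123]"):
    print(f"{name:24} {lexeme!r}")
```

In this model the unterminated `'123]` cannot become a StringLiteral, so the quote, the number and the closing bracket each end up as their own token (QUOTE, IntNoSign, RBRACK).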
The reason [network-traffic:src_port = '123] is not parsed properly is that the ] (RBRACK) is being consumed by the alternative observationExpressionSimple:
observationExpression
: LBRACK comparisonExpression RBRACK # observationExpressionSimple
| LPAREN observationExpressions RPAREN # observationExpressionCompound
| observationExpression startStopQualifier # observationExpressionStartStop
| observationExpression withinQualifier # observationExpressionWithin
| observationExpression repeatedQualifier # observationExpressionRepeated
;
Because RBRACK was already consumed by a parser rule, the edgeCases rule can't consume this RBRACK token as well.
To fix this, change your rule:
edgeCases
: QUOTE (IdentifierWithHyphen | IdentifierWithoutHyphen | IntNoSign) RBRACK
| RBRACK
;
into this:
edgeCases
: QUOTE (IdentifierWithHyphen | IdentifierWithoutHyphen | IntNoSign)
;
Now [network-traffic:src_port = '123] will be parsed properly.
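The underlying point is that a single ] token cannot satisfy both edgeCases and observationExpressionSimple. A toy Python check makes that visible. It is only a hand-written stand-in for the two parser rules over token type names, assuming the token sequence QUOTE, IntNoSign, RBRACK for `'123]`; the `Cursor` class and the function names are hypothetical.

```python
TOKENS = ["LBRACK", "IdentifierWithHyphen", "COLON", "IdentifierWithoutHyphen",
          "EQ", "QUOTE", "IntNoSign", "RBRACK"]

class Cursor:
    """Walks a list of token type names, consuming one token per successful take()."""
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def take(self, name):
        if self.pos < len(self.tokens) and self.tokens[self.pos] == name:
            self.pos += 1
            return True
        return False

def edge_cases(cur, takes_rbrack):
    # Old rule: QUOTE (...) RBRACK; new rule: QUOTE (...)
    ok = cur.take("QUOTE") and cur.take("IntNoSign")
    if takes_rbrack:
        ok = ok and cur.take("RBRACK")
    return ok

def observation_expression(tokens, takes_rbrack):
    # Simplified: LBRACK objectPath EQ edgeCases RBRACK, then end of input
    cur = Cursor(tokens)
    return (cur.take("LBRACK") and cur.take("IdentifierWithHyphen") and cur.take("COLON")
            and cur.take("IdentifierWithoutHyphen") and cur.take("EQ")
            and edge_cases(cur, takes_rbrack)
            and cur.take("RBRACK") and cur.pos == len(tokens))

print(observation_expression(TOKENS, takes_rbrack=True))   # False: no ']' left for the outer rule
print(observation_expression(TOKENS, takes_rbrack=False))  # True
```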

Parsing DECAF grammar in ANTLR

I am creating a parser for DECAF with ANTLR:
grammar DECAF ;
//********* LEXER ******************
LETTER: ('a'..'z'|'A'..'Z') ;
DIGIT : '0'..'9' ;
ID : LETTER( LETTER | DIGIT)* ;
NUM: DIGIT(DIGIT)* ;
COMMENTS: '//' ~('\r' | '\n' )* -> channel(HIDDEN);
WS : [ \t\r\n\f | ' '| '\r' | '\n' | '\t']+ ->channel(HIDDEN);
CHAR: (LETTER|DIGIT|' '| '!' | '"' | '#' | '$' | '%' | '&' | '\'' | '(' | ')' | '*' | '+'
| ',' | '-' | '.' | '/' | ':' | ';' | '<' | '=' | '>' | '?' | '#' | '[' | '\\' | ']' | '^' | '_' | '`'| '{' | '|' | '}' | '~'
'\t'| '\n' | '\"' | '\'');
// ********** PARSER *****************
program : 'class' 'Program' '{' (declaration)* '}' ;
declaration: structDeclaration| varDeclaration | methodDeclaration ;
varDeclaration: varType ID ';' | varType ID '[' NUM ']' ';' ;
structDeclaration : 'struct' ID '{' (varDeclaration)* '}' ;
varType: 'int' | 'char' | 'boolean' | 'struct' ID | structDeclaration | 'void' ;
methodDeclaration : methodType ID '(' (parameter (',' parameter)*)* ')' block ;
methodType : 'int' | 'char' | 'boolean' | 'void' ;
parameter : parameterType ID | parameterType ID '[' ']' ;
parameterType: 'int' | 'char' | 'boolean' ;
block : '{' (varDeclaration)* (statement)* '}' ;
statement : 'if' '(' expression ')' block ( 'else' block )?
| 'while' '(' expression ')' block
|'return' expressionA ';'
| methodCall ';'
| block
| location '=' expression
| (expression)? ';' ;
expressionA: expression | ;
location : (ID|ID '[' expression ']') ('.' location)? ;
expression : location | methodCall | literal | expression op expression | '-' expression | '!' expression | '('expression')' ;
methodCall : ID '(' arg1 ')' ;
arg1 : arg2 | ;
arg2 : (arg) (',' arg)* ;
arg : expression;
op: arith_op | rel_op | eq_op | cond_op ;
arith_op : '+' | '-' | '*' | '/' | '%' ;
rel_op : '<' | '>' | '<=' | '>=' ;
eq_op : '==' | '!=' ;
cond_op : '&&' | '||' ;
literal : int_literal | char_literal | bool_literal ;
int_literal : NUM ;
char_literal : '\'' CHAR '\'' ;
bool_literal : 'true' | 'false' ;
When I give it the input:
class Program {
void main(){
return 3+5 ;
}
}
The parse tree is not building correctly since it is not recognizing the 3+5 as an expression. Is there anything wrong with my grammar that is causing the problem?
Lexer rules are matched from top to bottom. When two or more lexer rules match the same number of characters, the one defined first wins. Because of that, a single-digit integer gets matched as a DIGIT instead of a NUM.
Try parsing the following instead:
class Program {
void main(){
return 33 + 55 ;
}
}
which will be parsed just fine. This is because 33 and 55 are matched as NUMs, because NUM can now match 2 characters (DIGIT only 1, so NUM wins).
To fix it, make DIGIT a fragment (and LETTER as well):
fragment LETTER: ('a'..'z'|'A'..'Z') ;
fragment DIGIT : '0'..'9' ;
ID : LETTER( LETTER | DIGIT)* ;
NUM: DIGIT(DIGIT)* ;
Lexer fragments are only used internally by other lexer rules, and will never become tokens of their own.
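The tie-breaking can be mimicked in a few lines of Python. This is only a model of the rule described above (longest match first, then definition order), with the four lexer rules rewritten as regular expressions; `first_token`, `ORIGINAL` and `WITH_FRAGMENTS` are names invented for the illustration.

```python
import re

def first_token(rules, text):
    """Return (rule_name, lexeme) for the token at the start of text."""
    best_name, best_len = None, 0
    for name, pattern in rules:
        m = re.match(pattern, text)
        if m and m.end() > best_len:  # strict '>' keeps the earlier rule on a tie
            best_name, best_len = name, m.end()
    return best_name, text[:best_len]

# LETTER and DIGIT as real token rules, in the order of the question.
ORIGINAL = [
    ("LETTER", r"[a-zA-Z]"),
    ("DIGIT", r"[0-9]"),
    ("ID", r"[a-zA-Z][a-zA-Z0-9]*"),
    ("NUM", r"[0-9]+"),
]

# LETTER and DIGIT as fragments: they no longer compete as tokens.
WITH_FRAGMENTS = [
    ("ID", r"[a-zA-Z][a-zA-Z0-9]*"),
    ("NUM", r"[0-9]+"),
]

print(first_token(ORIGINAL, "3"))        # ('DIGIT', '3')
print(first_token(ORIGINAL, "33"))       # ('NUM', '33')
print(first_token(WITH_FRAGMENTS, "3"))  # ('NUM', '3')
```

The same effect applies to single letters: with the original rules a one-letter name is tokenized as LETTER rather than ID.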
A couple of other things: your WS rule matches way too much (it now also matches a | and a '). It should be:
WS : [ \t\r\n\f]+ ->channel(HIDDEN);
and you shouldn't match a char literal in your parser: do it in the lexer:
CHAR : '\'' ( ~['\r\n\\] | '\\' ['\\] ) '\'';
If you don't, the following will not get parsed properly:
class Program {
void main(){
return '1';
}
}
because the 1 will be tokenized as a NUM and not as a CHAR.
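To get a feel for what the suggested CHAR rule accepts, it can be transcribed into a Python regular expression. This is an approximation for experimenting, not something ANTLR generates, and `CHAR` and `is_char_literal` are names chosen for this sketch.

```python
import re

# '\'' ( ~['\r\n\\] | '\\' ['\\] ) '\''  rewritten as a Python regex:
# a quote, then either one ordinary character or an escaped quote/backslash, then a quote.
CHAR = re.compile(r"'(?:[^'\r\n\\]|\\['\\])'")

def is_char_literal(text):
    """True if the whole text is a single CHAR literal under the rule above."""
    return CHAR.fullmatch(text) is not None

print(is_char_literal("'1'"))     # True
print(is_char_literal("'\\''"))   # True: an escaped quote
print(is_char_literal("'ab'"))    # False: more than one character
print(is_char_literal("''"))      # False: empty literal
```

Because the quotes are part of the token, '1' reaches the parser as one CHAR token instead of a quote, a number and another quote.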

ANTLR - Field that accept attributes with more than one word

My Grammar file (see below) parses queries of the type:
(name = Jon AND age != 16 OR city = NY);
However, it doesn't allow something like:
(name = 'Jon Smith' AND age != 16);
i.e., it doesn't allow assigning a field a value of more than one word, separated by whitespace. How can I modify my grammar file to accept that?
options
{
language = Java;
output = AST;
}
tokens {
BLOCK;
RETURN;
QUERY;
ASSIGNMENT;
INDEXES;
}
@parser::header {
package pt.ptinovacao.agorang.antlr;
}
@lexer::header {
package pt.ptinovacao.agorang.antlr;
}
query
: expr ('ORDER BY' NAME AD)? ';' EOF
-> ^(QUERY expr ^('ORDER BY' NAME AD)?)
;
expr
: logical_expr
;
logical_expr
: equality_expr (logical_op^ equality_expr)*
;
equality_expr
: NAME equality_op atom -> ^(equality_op NAME atom)
| '(' expr ')' -> ^('(' expr)
;
atom
: ID
| id_list
| Int
| Number
;
id_list
: '(' ID (',' ID)* ')'
-> ID+
;
NAME
: 'equipType'
| 'equipment'
| 'IP'
| 'site'
| 'managedDomain'
| 'adminState'
| 'dataType'
;
AD : 'ASC' | 'DESC' ;
equality_op
: '='
| '!='
| 'IN'
| 'NOT IN'
;
logical_op
: 'AND'
| 'OR'
;
Number
: Int ('.' Digit*)?
;
ID
: ('a'..'z' | 'A'..'Z' | '_' | '.' | '-' | Digit)*
;
String
@after {
setText(getText().substring(1, getText().length()-1).replaceAll("\\\\(.)", "$1"));
}
: '"' (~('"' | '\\') | '\\' ('\\' | '"'))* '"'
| '\'' (~('\'' | '\\') | '\\' ('\\' | '\''))* '\''
;
Comment
: '//' ~('\r' | '\n')* {skip();}
| '/*' .* '*/' {skip();}
;
Space
: (' ' | '\t' | '\r' | '\n' | '\u000C') {skip();}
;
fragment Int
: '1'..'9' Digit*
| '0'
;
fragment Digit
: '0'..'9'
;
indexes
: ('[' expr ']')+ -> ^(INDEXES expr+)
;
Include the String token as an alternative in your atom rule:
atom
: ID
| id_list
| Int
| Number
| String
;
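For reference, this is roughly what the String rule and its @after action do to a quoted value, redone in Python. It is a sketch of the single-quoted alternative only, assuming the same quote-stripping and unescaping as the Java action in the grammar; `SINGLE_QUOTED` and `string_value` are names made up here.

```python
import re

# Single-quoted alternative of the String rule: '\'' (~('\'' | '\\') | '\\' ('\\' | '\''))* '\''
SINGLE_QUOTED = re.compile(r"'(?:[^'\\]|\\[\\'])*'")

def string_value(lexeme):
    """Mirror the @after action: drop the surrounding quotes, then unescape \\x to x."""
    return re.sub(r"\\(.)", r"\1", lexeme[1:-1])

match = SINGLE_QUOTED.search("(name = 'Jon Smith' AND age != 16);")
print(match.group())                # 'Jon Smith'  (one token, whitespace included)
print(string_value(match.group()))  # Jon Smith
```

The whitespace survives because it is consumed inside the String token, so the Space rule never gets to skip it.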

ANTLR 4: whitespace in strings is being eliminated

I'm using ANTLR 4 to build a compiler for a made-up language. I'm having problems eliminating whitespace properly. It gets rid of whitespace between tokens, but it also deletes whitespace within string tokens, which is obviously not what I want. I've tried using modes to clear this issue up, to no avail.
Lexer.g4
lexer grammar WaccLexer;
SEMICOLON: ';' ;
WS: [ \n\t\r\u000C]+ -> skip;
EOL: '\n' ;
BEGIN: 'begin' ;
END: 'end' ;
SKIP: 'skip' ;
READ: 'read' ;
FREE: 'free' ;
RETURN: 'return' ;
EXIT: 'exit' ;
IS: 'is' ;
PRINT: 'print' ;
PRINTLN: 'println' ;
IF: 'if' ;
THEN: 'then' ;
ELSE: 'else' ;
FI: 'fi' ;
WHILE: 'while' ;
DO: 'do' ;
DONE: 'done' ;
NEWPAIR: 'newpair' ;
CALL: 'call' ;
FST: 'fst' ;
SND: 'snd' ;
INT: 'int' ;
BOOL: 'bool' ;
CHAR: 'char' ;
STRING: 'string' ;
PAIR: 'pair' ;
EXCLAMATION: '!' ;
LEN: 'len' ;
ORD: 'ord' ;
TOINT: 'toInt' ;
DIGIT: '0'..'9' ;
LOWCHAR: 'a'..'z' ;
R: 'r' ;
F: 'f' ;
N: 'n' ;
T: 't' ;
B: 'b' ;
ZERO: '0' ;
MULTI: '*' ;
DIVIDE: '/' ;
MOD: '%' ;
PLUS: '+' ;
MINUS: '-' ;
GT: '>' ;
GTE: '>=' ;
LT: '<' ;
LTE: '<=' ;
DOUBLEEQUAL: '==' ;
EQUAL: '=' ;
NOTEQUAL: '!=' ;
AND: '&&' ;
OR: '||' ;
UNDERSCORE: '_' ;
UPCHAR: 'A'..'Z' ;
OPENSQUARE: '[' ;
CLOSESQUARE: ']' ;
OPENPARENTHESIS: '(' ;
CLOSEPARENTHESIS: ')' ;
TRUE: 'true' ;
FALSE: 'false' ;
SINGLEQUOT: '\'' ;
DOUBLEQUOT: '\"' ;
BACKSLASH: '\\' ;
COMMA: ',' ;
NULL: 'null' ;
OPENSTRING : DOUBLEQUOT -> pushMode(STRINGMODE) ;
COMMENT: '#' ~[\r\n]* '\r'? '\n' -> skip ;
mode STRINGMODE ;
CLOSESTRING : DOUBLEQUOT -> popMode ;
CHARACTER : ~[\"\'\\] | (BACKSLASH ESCAPEDCHAR) ;
STRLIT : (CHARACTER)* ;
ESCAPEDCHAR : ZERO
| B
| T
| N
| F
| R
| DOUBLEQUOT
| SINGLEQUOT
| BACKSLASH
;
Parser.g4
parser grammar WaccParser;
options {
tokenVocab=WaccLexer;
}
program : BEGIN (func)* stat END EOF;
func : type ident OPENPARENTHESIS (paramlist)? CLOSEPARENTHESIS IS stat END ;
paramlist : param (COMMA param)* ;
param : type ident ;
stat : SKIP
| type ident EQUAL assignrhs
| assignlhs EQUAL assignrhs
| READ assignlhs
| FREE expr
| RETURN expr
| EXIT expr
| PRINT expr
| PRINTLN expr
| IF expr THEN stat ELSE stat FI
| WHILE expr DO stat DONE
| BEGIN stat END
| stat SEMICOLON stat
;
assignlhs : ident
| expr OPENSQUARE expr CLOSESQUARE
| pairelem
;
assignrhs : expr
| arrayliter
| NEWPAIR OPENPARENTHESIS expr COMMA expr CLOSEPARENTHESIS
| pairelem
| CALL ident OPENPARENTHESIS (arglist)? CLOSEPARENTHESIS
;
arglist : expr (COMMA expr)* ;
pairelem : FST expr
| SND expr
;
type : basetype
| type OPENSQUARE CLOSESQUARE
| pairtype
;
basetype : INT
| BOOL
| CHAR
| STRING
;
pairtype : PAIR OPENPARENTHESIS pairelemtype COMMA pairelemtype CLOSEPARENTHESIS ;
pairelemtype : basetype
| type OPENSQUARE CLOSESQUARE
| PAIR
;
expr : intliter
| boolliter
| charliter
| strliter
| pairliter
| ident
| expr OPENSQUARE expr CLOSESQUARE
| unaryoper expr
| expr binaryoper expr
| OPENPARENTHESIS expr CLOSEPARENTHESIS
;
unaryoper : EXCLAMATION
| MINUS
| LEN
| ORD
| TOINT
;
binaryoper : MULTI
| DIVIDE
| MOD
| PLUS
| MINUS
| GT
| GTE
| LT
| LTE
| DOUBLEEQUAL
| NOTEQUAL
| AND
| OR
;
ident : (UNDERSCORE | LOWCHAR | UPCHAR) (UNDERSCORE | LOWCHAR | UPCHAR | DIGIT)* ;
intliter : (intsign)? (digit)+ ;
digit : DIGIT ;
intsign : PLUS
| MINUS
;
boolliter : TRUE
| FALSE
;
charliter : CHARACTER;
strliter : OPENSTRING STRLIT CLOSESTRING;
arrayliter : OPENSQUARE (expr (COMMA expr)*)? CLOSESQUARE ;
Please also remember that comments starting with # need to be ignored. Thanks in advance.
The OPENSTRING lexer rule will never be matched in your grammar because the DOUBLEQUOT rule matches exactly the same input sequence and appears before it in the grammar. If you want to define a lexer rule, but you do not actually want that lexer rule to create a token on its own, then you need to define the rule with the fragment modifier.
fragment DOUBLEQUOT : '"';
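The effect on the whitespace can be reproduced with a small mode-aware lexer model in Python. This is a heavily reduced sketch, not the generated lexer: it keeps only WS, LOWCHAR, DOUBLEQUOT and OPENSTRING in the default mode, simplifies STRLIT to one or more plain characters (an assumption, the original allows an empty match), and `lex` is a hypothetical helper.

```python
import re

def lex(text, doublequot_is_token):
    """Tokenize with a mode stack; longest match wins, ties go to the earliest rule."""
    default = [("WS", r"[ \n\t\r\f]+", None), ("LOWCHAR", r"[a-z]", None)]
    if doublequot_is_token:  # original grammar: DOUBLEQUOT is defined before OPENSTRING
        default.append(("DOUBLEQUOT", r'"', None))
    default.append(("OPENSTRING", r'"', "push"))
    string_mode = [("CLOSESTRING", r'"', "pop"), ("STRLIT", r"[^\"'\\]+", None)]

    modes, tokens, pos = [default], [], 0
    while pos < len(text):
        best = None
        for name, pattern, action in modes[-1]:
            m = re.compile(pattern).match(text, pos)
            if m and m.end() > pos and (best is None or m.end() > best[1]):
                best = (name, m.end(), action)
        if best is None:
            raise ValueError(f"no rule matches at position {pos}")
        name, end, action = best
        if name != "WS":  # WS is skipped, but only the default mode has a WS rule
            tokens.append((name, text[pos:end]))
        if action == "push":
            modes.append(string_mode)
        elif action == "pop":
            modes.pop()
        pos = end
    return tokens

print(lex('"a b"', doublequot_is_token=True))
print(lex('"a b"', doublequot_is_token=False))
```

With DOUBLEQUOT as a normal token, the string mode is never entered, so the space between a and b is skipped by WS. With DOUBLEQUOT as a fragment, OPENSTRING matches and `a b` comes through as a single STRLIT.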
In addition, you need to correct the warnings that appear when you generate code for your grammar. At least one of them (defined as EPSILON_TOKEN) indicates a major mistake that you made that used to be an error in ANTLR 4.0 but was changed to a warning in ANTLR 4.1 since there is an edge case where it can be used without problems.

antlr gated predicate

This is a follow-up question from Antlr superfluous Predicate required? where I stated my problem in a simplified way, however it could not be solved there.
I have the following grammar and when I delete the {true}?=> predicates, the text is not recognized anymore. The input string is MODULE main LTLSPEC H {} {} {o} FALSE;. Note that the trailing ; is not tokenized as EOC, but as IGNORE. When I add {true}?=> to the EOC rule ; is tokenized as EOC.
I tried this from command-line with antlr-v3.3 and v3.4 without difference. Thanks in advance, I appreciate your help.
grammar NusmvInput;
options {
language = Java;
}
@parser::members{
public static void main(String[] args) throws Exception {
NusmvInputLexer lexer = new NusmvInputLexer(new ANTLRStringStream("MODULE main LTLSPEC H {} {} {o} FALSE;"));
NusmvInputParser parser = new NusmvInputParser(new CommonTokenStream(lexer));
parser.specification();
}
}
@lexer::members{
private boolean inLTL = false;
}
specification :
module+ EOF
;
module :
MODULE module_decl
;
module_decl :
NAME parameter_list ;
parameter_list
: ( LP (parameter ( COMMA parameter )*)? RP )?
;
parameter
: (NAME | INTEGER )
;
/**************
*** LEXER
**************/
COMMA
:{!inLTL}?=> ','
;
OTHER
: {!inLTL}?=>( '&' | '|' | 'xor' | 'xnor' | '=' | '!' |
'<' | '>' | '-' | '+' | '*' | '/' |
'mod' | '[' | ']' | '?')
;
RCP
: {!inLTL}?=>'}'
;
LCP
: {!inLTL}?=>'{'
;
LP
: {!inLTL}?=>'('
;
RP
: {!inLTL}?=>')'
;
MODULE
: {true}?=> 'MODULE' {inLTL = false;}
;
LTLSPEC
: {true}?=> 'LTLSPEC'
{inLTL = true; skip(); }
;
EOC
: ';'
{
if (inLTL){
inLTL = false;
skip();
}
}
;
WS
: (' ' | '\t' | '\n' | '\r')+ {$channel = HIDDEN;}
;
COMMENT
: '--' .* ('\n' | '\r') {$channel = HIDDEN;}
;
INTEGER
: {!inLTL}?=> ('0'..'9')+
;
NAME
:{!inLTL}?=> ('A'..'Z' | 'a'..'z') ('a'..'z' | 'A'..'Z' | '0'..'9' | '_' | '$' | '#' | '-')*
;
IGNORE
: {inLTL}?=> . {skip();}
;
It seems that without a predicate before MODULE and LTLSPEC, the NAME gets precedence over them even if these tokens are defined before the NAME token. Whether this is by design or a bug, I don't know.
However, the way you're trying to solve it seems rather complicated. As far as I can see, you want to ignore (skip) input starting with LTLSPEC and ending with a semicolon. Why not do something like this instead:
specification : module+ EOF;
module : MODULE module_decl;
module_decl : NAME parameter_list;
parameter_list : (LP (parameter ( COMMA parameter )*)? RP)?;
parameter : (NAME | INTEGER);
MODULE : 'MODULE';
LTLSPEC : 'LTLSPEC' ~';'* ';' {skip();};
COMMA : ',';
OTHER : ( '&' | '|' | 'xor' | 'xnor' | '=' | '!' |
'<' | '>' | '-' | '+' | '*' | '/' |
'mod' | '[' | ']' | '?')
;
RCP : '}';
LCP : '{';
LP : '(';
RP : ')';
EOC : ';';
WS : (' ' | '\t' | '\n' | '\r')+ {$channel = HIDDEN;};
COMMENT : '--' .* ('\n' | '\r') {$channel = HIDDEN;};
INTEGER : ('0'..'9')+;
NAME : ('A'..'Z' | 'a'..'z') ('a'..'z' | 'A'..'Z' | '0'..'9' | '_' | '$' | '#' | '-')*;
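What the single LTLSPEC rule skips can be previewed with an equivalent regular expression in Python. This only approximates the lexer rule `'LTLSPEC' ~';'* ';'` as a text substitution (a real lexer would only try it at a token boundary), and `strip_ltlspec` is a name made up for the sketch.

```python
import re

# 'LTLSPEC' ~';'* ';'  as a regex: the keyword, anything up to the next ';', and the ';' itself.
LTLSPEC = re.compile(r"LTLSPEC[^;]*;")

def strip_ltlspec(text):
    """Remove every LTLSPEC ... ; span, as the skip() action would."""
    return LTLSPEC.sub("", text)

remaining = strip_ltlspec("MODULE main LTLSPEC H {} {} {o} FALSE;")
print(remaining.split())  # ['MODULE', 'main']
```

Everything between LTLSPEC and the terminating semicolon disappears in one step, so no lexer state such as inLTL is needed.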