I am using ANTLR 2.7.6.
I am writing a parser for the IEC 61131-3 Structured Text (ST) PLC language, and I can't resolve an issue with my grammar.
The grammar is:
case_Stmt : 'CASE' expression 'OF' case_Selection + ( 'ELSE' stmt_List )? 'END_CASE';
case_Selection : case_List ':' stmt_List;
case_List : case_List_Elem ( ',' case_List_Elem )*;
case_List_Elem : subrange | constant_Expr;
constant_Expr : constant | enum_Value;
stmt_List : ( stmt ? ';' )*;
stmt : assign_Stmt | subprog_Ctrl_Stmt | selection_Stmt | iteration_Stmt;
assign_Stmt : ( variable ':=' expression );
enum_Value : ( identifier '#' )? identifier;
variable : identifier | ...
The problem occurs when an "enum_Value" starts a "case_Selection": the parser treats it as the start of a new "stmt" instead of the new "case_Selection" it was supposed to be.
Example:
CASE (enumVariable) OF
enum#literal1: Variable1 := 1;
enum#liteal2: Variable1 := 2;
enum#liteal3: Variable1 := 3;
ELSE
Variable1 := 4;
END_CASE;
In the example above, instead of taking "enum#liteal2" as the start of a new "case_Selection", the parser interprets it as an "assign_Stmt" and reports an error because it does not find the ':='.
Is there a way to look ahead as far as the next ':' or ':=' to decide whether we really have a new "stmt" or not?
Thank you!
Edit1: better syntax;
I've just started using ANTLR, so I'd really appreciate the help! I'm trying to write a variable declaration rule, but it's not working. I've put the files I'm working with below; please let me know if you need anything else.
INPUT CODE:
var test;
GRAMMAR G4 FILE:
grammar treetwo;
program : (declaration | statement)+ EOF;
declaration :
variable_declaration
| variable_assignment
;
statement:
expression
| ifstmnt
;
variable_declaration:
VAR NAME SEMICOLON
;
variable_assignment:
NAME '=' NUM SEMICOLON
| NAME '=' STRING SEMICOLON
| NAME '=' BOOLEAN SEMICOLON
;
expression:
operand operation operand SEMICOLON
| expression operation expression SEMICOLON
| operand operation expression SEMICOLON
| expression operation operand SEMICOLON
;
ifstmnt:
IF LPAREN term RPAREN LCURLY
(declaration | statement)+
RCURLY
;
term:
| NUM EQUALITY NUM
| NAME EQUALITY NUM
| NUM EQUALITY NAME
| NAME EQUALITY NAME
;
/*Tokens*/
NUM : '0' | '-'?[1-9][0-9]*;
STRING: [a-zA-Z]+;
BOOLEAN: 'true' | 'false';
VAR : 'var';
NAME : [a-zA-Z]+;
SEMICOLON : ';';
LPAREN: '(';
RPAREN: ')';
LCURLY: '{';
RCURLY: '}';
EQUALITY: '==' | '<' | '>' | '<=' | '>=' | '!=' ;
operation: '+' | '-' | '*' | '/';
operand: NUM;
IF: 'if';
WS : [ \t\r\n]+ -> skip;
Error I'm getting:
(line 1,char 0): mismatched input 'var' expecting {NUM, 'var', NAME, 'if'}
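Your STRING rule is the same as your NAME rule, and both also match the keywords var, true, false and if.
With the ANTLR lexer, if two lexer rules match the same input with the same length, the one declared first is used. STRING is declared before BOOLEAN, VAR, NAME and IF, so every run of letters, including var, comes out as a STRING token. As a result, you'll never see a NAME token (or a VAR token), which is why the parser rejects 'var' even though 'var' is in the expected set.
Most tutorials will show you how to dump out the token stream. It's usually a good idea to view the token stream and verify your lexer rules before getting too far into your parser rules.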
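The tie-breaking behaviour described above can be sketched outside ANTLR. The following is only a rough Python imitation, not ANTLR's actual implementation: the tokenize helper and the two rule lists are made up for illustration, and the quoted STRING pattern in FIXED_ORDER is an assumption about what a string literal was meant to look like.

```python
import re

def tokenize(text, rules):
    """Longest match wins; on a tie, the rule listed first wins (ANTLR-like)."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            # Strict '>' keeps the earlier rule when two rules match the same length.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name != "WS":  # imitate '-> skip'
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

# Same relative order as in the question: STRING comes before VAR and NAME.
ORIGINAL_ORDER = [
    ("STRING", r"[a-zA-Z]+"),
    ("VAR", r"var"),
    ("NAME", r"[a-zA-Z]+"),
    ("SEMICOLON", r";"),
    ("WS", r"[ \t\r\n]+"),
]

# Hypothetical fix: keywords first, and STRING requires quotes (an assumption).
FIXED_ORDER = [
    ("VAR", r"var"),
    ("NAME", r"[a-zA-Z]+"),
    ("STRING", r'"[^"]*"'),
    ("SEMICOLON", r";"),
    ("WS", r"[ \t\r\n]+"),
]

print(tokenize("var test;", ORIGINAL_ORDER))
print(tokenize("var test;", FIXED_ORDER))
```

With the original order, both words are reported as STRING, so a parser waiting for VAR NAME SEMICOLON has nothing to match. With the reordered rules the same input yields VAR, NAME, SEMICOLON.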
I am creating a DSL with ANTLR and I want to define the following syntax
// study without parameters
study()
// study with a single parameter
study(x = 1)
// study with several parameters
study(x = 1, x = 2)
Here is my grammar; it allows the following input: study(x=1x=2)
study: 'study' '(' ( assign* | ( assign (',' assign)*) ) ')' NEWLINE;
assign: ID '=' (INT | DATA );
INT : [0-9]+ ;
DATA : '"' ID '"' | '"' INT '"';
ID : [a-zA-Z]+ ;
Your grammar allows study(x=1x=2) because assign* matches x=1x=2. If you don't want to allow input like that, you should remove the assign* alternative. To allow empty parameter lists, you can just make everything between the parentheses optional:
study: 'study' '(' (assign (',' assign)*)? ')' NEWLINE;
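To see the difference between the two versions of the study rule without running ANTLR, here is a small Python sketch that uses regular expressions as a stand-in for the grammar. It is an approximation under stated assumptions: whitespace is stripped before matching, the trailing NEWLINE is ignored, and the names ASSIGN, ORIGINAL, FIXED and accepts are made up for this example.

```python
import re

# assign: ID '=' (INT | DATA), written as one regex over whitespace-free text.
ASSIGN = r'[a-zA-Z]+=(?:[0-9]+|"[a-zA-Z]+"|"[0-9]+")'

# Original rule: '(' ( assign* | assign (',' assign)* ) ')'
ORIGINAL = re.compile(rf'study\((?:(?:{ASSIGN})*|{ASSIGN}(?:,{ASSIGN})*)\)')

# Corrected rule: '(' ( assign (',' assign)* )? ')'
FIXED = re.compile(rf'study\((?:{ASSIGN}(?:,{ASSIGN})*)?\)')

def accepts(rule, text):
    """Return True if the whole input (ignoring whitespace) matches the rule."""
    compact = re.sub(r"\s+", "", text)
    return rule.fullmatch(compact) is not None

for sample in ["study()", "study(x = 1)", "study(x = 1, x = 2)", "study(x=1x=2)"]:
    print(sample, accepts(ORIGINAL, sample), accepts(FIXED, sample))
```

Both versions accept the three intended forms; only the original accepts study(x=1x=2), because its assign* alternative allows assignments with no separator.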
I'm new to ANTLR and I have the following simplified language:
grammar Hello;
sentence : targetAttributeName EQUALS expression+ (IF relationedExpression (logicalRelation relationedExpression)*)?;
expression :
'(' expression ')' |
expression ('*'|'/') expression |
expression ('+'|'-') expression |
function |
targetAttributeName |
NUMBER;
filterExpression :
'(' filterExpression ')' |
filterExpression ('*'|'/') filterExpression |
filterExpression ('+'|'-') filterExpression |
function |
filterAttributeName |
NUMBER |
DATE;
relationedExpression :
filterExpression ('<'|'<='|'>'|'>='|'=') filterExpression |
filterAttributeName '=' STRING |
STRING '=' filterAttributeName
;
logicalRelation :
'AND' |
'OR'
;
targetAttributeName :
'x'|
'y'
;
filterAttributeName :
'a' |
'a' '1' |
targetAttributeName;
function:
simpleFunction |
complexFunction ;
simpleFunction :
'simpleFunction' '(' expression ')' |
'simpleFunction2' '(' expression ')'
;
complexFunction :
'complexFunction' '(' expression ')' |
'complexFunction2' '(' expression ')'
;
EQUALS : '=';
IF : 'IF';
STRING : '"' [a-zA-z0-9]* '"';
NUMBER : [-]?[0-9]+('.'[0-9]+)?;
DATE: NUMBER NUMBER NUMBER NUMBER '.' NUMBER NUMBER? '.' NUMBER NUMBER? '.';
WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines
It works with x = y * 2, but it doesn't work with x = y * 1.
The error message is the following:
Hello::sentence:1:7: mismatched input '1' expecting {'simpleFunction', 'complexFunction', 'x', 'y', 'complexFunction2', '(', 'simpleFunction2', NUMBER}
This seems very strange to me, because 1 is a NUMBER...
If I change the filterAttributeName alternative from 'a' '1' to 'a1', then it works with x = y * 1, but I don't understand the difference between the two cases. Could somebody explain it to me?
Thanks.
By doing this:
filterAttributeName :
'a' |
'a' '1' |
targetAttributeName;
ANTLR creates lexer rules from these inline tokens. So you really have a lexer grammar that looks like this:
T_1 : '1'; // the rule name will probably be different though
T_a : 'a';
...
NUMBER : [-]?[0-9]+('.'[0-9]+)?;
In other words, the input 1 will be tokenized as T_1, not as a NUMBER.
EDIT
Whenever some input can be matched by two or more lexer rules (with the same length), ANTLR chooses the rule defined first. The lexer does not "listen" to the parser to see what it needs at a particular time: lexing and parsing are two distinct phases. This is simply how ANTLR works, as do many other parser generators. If this is not acceptable for you, you should search for "scanner-less parsing" or "packrat parsers".
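The effect of the implicit literal tokens can be imitated with a few lines of Python. This is only a sketch of the idea, not ANTLR's lexer: the lex helper, the rule lists SPLIT and MERGED, and the quoted token names are hypothetical, and only a handful of the grammar's literals are included.

```python
import re

def lex(text, rules):
    """Return token names; longest match wins, ties go to the earlier rule."""
    names, pos = [], 0
    while pos < len(text):
        matches = []
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            if m:
                matches.append((name, m.group()))
        if not matches:
            raise ValueError(f"no rule matches at position {pos}")
        # max() returns the first maximal item, so earlier rules win ties.
        name, lexeme = max(matches, key=lambda item: len(item[1]))
        if name != "WS":
            names.append(name)
        pos += len(lexeme)
    return names

NUMBER = r"-?[0-9]+(?:\.[0-9]+)?"
WS = r"[ \t\r\n]+"

# Literals taken from parser rules come first, as in the answer above.
# 'a' '1' in the grammar creates a separate literal token for '1'.
SPLIT = [("'x'", r"x"), ("'y'", r"y"), ("'a'", r"a"), ("'1'", r"1"),
         ("'*'", r"\*"), ("EQUALS", r"="), ("NUMBER", NUMBER), ("WS", WS)]

# With 'a1' as a single literal there is no token for a bare '1' any more.
MERGED = [("'x'", r"x"), ("'y'", r"y"), ("'a'", r"a"), ("'a1'", r"a1"),
          ("'*'", r"\*"), ("EQUALS", r"="), ("NUMBER", NUMBER), ("WS", WS)]

print(lex("x = y * 1", SPLIT))   # last token is the literal '1', not NUMBER
print(lex("x = y * 2", SPLIT))   # last token is NUMBER
print(lex("x = y * 1", MERGED))  # last token is NUMBER
```

In this imitation the input 1 never reaches NUMBER while the '1' literal exists, which mirrors why the parser reports a mismatched '1' where it expects NUMBER.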
I'm using a cut-down version of a Pascal grammar to create a compiler that converts Pascal to JavaScript, but I keep running into these errors:
line 3:4 no viable alternative at input 'PROCEDURE'
line 3:38 extraneous input ':' expecting {'END', ';'}
line 5:4 no viable alternative at input 'VAR'
The following are the relevant parts of my grammar:
grammar pascal;
program
: programHeading ('INTERFACE')?
block
DOT
;
programHeading
: 'PROGRAM' identifier (LPAREN identifierList RPAREN)? SEMI
| 'UNIT' identifier SEMI
;
identifier
: IDENT
;
block
: ( labelDeclarationPart
| constantDefinitionPart
| typeDefinitionPart
| variableDeclarationPart
| procedureAndFunctionDeclarationPart
| usesUnitsPart
| 'IMPLEMENTATION'
)*
| compoundStatement
;
procedureAndFunctionDeclarationPart
: procedureOrFunctionDeclaration SEMI
;
procedureOrFunctionDeclaration
: procedureDeclaration
| functionDeclaration
;
procedureDeclaration
: 'PROCEDURE' identifier (formalParameterList)? SEMI
( block | directive )
;
functionDeclaration
: 'FUNCTION' identifier (formalParameterList)? COLON resultType SEMI
( block | directive )
;
compoundStatement
: 'BEGIN'
statements
'END'
;
statements
: statement ( SEMI statement )*
;
statement
: label COLON unlabelledStatement
| unlabelledStatement
;
I'm using antlr-4.5-complete and was hoping someone could shed some light on this.
This is the program I'm trying to compile:
PROGRAM Lesson1_PROGRAM3;
BEGIN
PROCEDURE DrawLine(X : Integer; Y : Integer);
VAR
Num1, Num2, Sum : Integer;
BEGIN
Write('Input number 1:');
Readln(Num1);
Writeln('Input number 2:');
Readln(Num2);
Sum := Num1 + Num2;
Writeln(Sum);
Readln;
IF Sel = '1' THEN
BEGIN
Total := N1 + N2;
Write('Press any key TO continue...');
Readkey;
GOTO 1;
END;
FOR Counter := 1 TO 7 DO
writeln('for loop');
Readln;
END;
END.
Hopefully this is just the right amount of information to help me solve this problem.
Given the following ANTLR3 syntax
grammar mygrammar;
program : statement* | function*;
function : ID '(' args ')' '->' statement+ (','statement+) '.' ;
args : arg (',' arg)*;
arg : ID ('->' expression)?;
statement : assignment
| number
| string
;
assignment : ID '->' expression;
string : UNICODE_STRING;
number : HEX_NUMBER | INTEGER ( '.' INTEGER )?;
// ================================================================
HEX_NUMBER : '0x' HEX_DIGIT+;
INTEGER : DIGIT+;
fragment
DIGIT : ('0'..'9');
Here is the line that is causing problems in the parser.
my_function(x, y, z -> 42) -> 10001.
ANTLRWorks highlights the last . after the 10001 in red as being a problem with the following error.
How can I make this stop throwing org.antlr.runtime.EarlyExitException?
I am sure this is because of some ambiguity between my number parser rule and trying to use the . as a EOL delimiter.
There is another ambiguity that also needs fixing. Change:
program : statement* | function*;
into:
program : (statement | function)*;
(although the 2 are not equivalent, I'm guessing you want the latter)
And in your function rule, you have now defined there to be at least 2 statements:
function : ID '(' args ')' '->' statement (','statement)+ '.' ;
while I'm guessing you really want at least one:
function : ID '(' args ')' '->' statement (','statement)* '.' ;
Now, your real problem: since you're constructing floats in a parser rule, when the parser reaches the end of your input, 10001., it sees INTEGER followed by . and tries to build a floating-point number from them, while you want it to match an INTEGER followed by the terminating ., as you already said in your question.
To fix this, you need to give the parser a bit of extra look-ahead to "see" beyond this ambiguity. Do that by adding the predicate (INTEGER '.' INTEGER)=> before actually matching said input:
number
: HEX_NUMBER
| (INTEGER '.' INTEGER)=> INTEGER '.' INTEGER
| INTEGER
;
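What the predicate does can be sketched as a hand-written parser function in Python. This is an analogue for illustration, not the code ANTLR generates: the token representation as (kind, text) pairs and the name parse_number are assumptions made for this example.

```python
def parse_number(tokens, pos):
    """Parse a number starting at tokens[pos]; return (text, next_pos).

    tokens is a list of (kind, text) pairs, e.g. ("INTEGER", "10001").
    """
    kind, text = tokens[pos]
    if kind == "HEX_NUMBER":
        return text, pos + 1
    if kind != "INTEGER":
        raise SyntaxError(f"expected a number at token {pos}, got {kind}")

    # Look ahead like (INTEGER '.' INTEGER)=> : only treat the '.' as a
    # decimal point if another INTEGER really follows it.
    has_fraction = (
        pos + 2 < len(tokens)
        and tokens[pos + 1][0] == "."
        and tokens[pos + 2][0] == "INTEGER"
    )
    if has_fraction:
        return text + "." + tokens[pos + 2][1], pos + 3

    # Otherwise consume just the INTEGER and leave the '.' for the caller,
    # where it can end the function definition.
    return text, pos + 1

end_of_function = [("INTEGER", "10001"), (".", ".")]
decimal_value = [("INTEGER", "3"), (".", "."), ("INTEGER", "14")]
print(parse_number(end_of_function, 0))
print(parse_number(decimal_value, 0))
```

For the tokens of 10001. the function consumes only the INTEGER, so the trailing . is still available as the terminator; for 3.14 it consumes all three tokens.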
Now your input will generate the expected parse tree.
Perhaps unrelated, but I'm curious nonetheless:
function : ID '(' args ')' '->' statement+ (','statement+) '.' ;
Should this instead be:
function : ID '(' args ')' '->' statement (',' statement)* '.' ;
I think the first one would require exactly one comma in a function definition, while the second one would use the comma as a statement separator.
Also, does the rule for args allow z -> 42 correctly?