I was trying to write an operation that takes a variable number of parameters, so that if a user leaves one of the parameters out, the operation changes its behaviour.
oper
gen_NP = overload{
gen_NP : N -> NP =
\noun ->
mkNP(noun);
gen_NP : Str -> N -> NP =
\mdfir, noun ->
mkNP(mkN(mdfir) (noun));
....
}
But writing it this way would require a huge number of overloads, growing with each new optional parameter.
So I used this method instead:
oper
gen_NP : {noun : N ; mdfir : Str ; ....} -> NP =
\obj
case eqStr (obj.mdfir) ("") of {
PFalse =>
mkNP(mkN(mdfir) (noun));
PTrue =>
mkNP(noun);
};
}
When I tried the second method, the program kept reporting:
Applying Predef.eqStr: Expected a value of type String, got VP (VGen 1 []) (LIdent(Id{rawId2utf8 = "mdfir"}))
Is there a way to fix this problem, or is there a better way to deal with a variable number of parameters?
Thank you
Best practices for overloading opers
A huge number of overloads is the intended way of doing things. Look at any category in the RGL synopsis and you will easily find over 20 overloads for a single function name. It may be annoying to define them, but you only need to do that once. Afterwards, it's much nicer to use your overloads like this:
myRegularNoun = mkN "dog" ;
myIrregNoun = mkN "fish" "fish" ;
rather than being forced to give two arguments to everything:
myRegularNoun = mkN "dog" "" ;
myIrregNoun = mkN "fish" "fish" ;
So having several mkN instances is a feature, not a bug.
How to fix your code
I don't recommend using the Predef functions like eqStr, unless you really know what you're doing. For most cases when you need to check strings, you can use the standard pattern matching syntax. This is how to fix your function:
oper
gen_NP : {noun : N ; mdfir : Str} -> NP = \obj ->
case obj.mdfir of {
"" => mkNP obj.noun ;
_ => mkNP (mkN obj.mdfir obj.noun)
} ;
Testing in the GF shell, first with mdfir="":
> cc -unqual -table gen_NP {noun = mkN "dog" ; mdfir = ""}
s . NCase Nom => dog
s . NCase Gen => dog's
s . NPAcc => dog
s . NPNomPoss => dog
a . AgP3Sg Neutr
And now some non-empty string in mdfir:
> cc -unqual -table gen_NP {noun = mkN "dog" ; mdfir = "hello"}
s . NCase Nom => hello dog
s . NCase Gen => hello dog's
s . NPAcc => hello dog
s . NPNomPoss => hello dog
a . AgP3Sg Neutr
I have tried to write a grammar to recognize expressions like:
(A + MAX(B) ) / ( C - AVERAGE(A) )
IF( A > AVERAGE(A), 0, 1 )
X / (MAX(X)
Unfortunately antlr3 fails with these errors:
error(210): The following sets of rules are mutually left-recursive [unaryExpression, additiveExpression, primaryExpression, formula, multiplicativeExpression]
error(211): DerivedKeywords.g:110:13: [fatal] rule booleanTerm has non-LL(*) decision due to recursive rule invocations reachable from alts 1,2. Resolve by left-factoring or using syntactic predicates or using backtrack=true option.
error(206): DerivedKeywords.g:110:13: Alternative 1: after matching input such as decision cannot predict what comes next due to recursion overflow to additiveExpression from formula
I have spent some hours trying to fix these. It would be great if anyone could at least help me fix the first problem. Thanks
Code:
grammar DerivedKeywords;
options {
output=AST;
//backtrack=true;
}
WS : ( ' ' | '\t' | '\n' | '\r' )
{ skip(); }
;
//for numbers
DIGIT
: '0'..'9'
;
//for both integer and real number
NUMBER
: (DIGIT)+ ( '.' (DIGIT)+ )?( ('E'|'e')('+'|'-')?(DIGIT)+ )?
;
// Boolean operators
AND : 'AND';
OR : 'OR';
NOT : 'NOT';
EQ : '=';
NEQ : '!=';
GT : '>';
LT : '<';
GTE : '>=';
LTE : '<=';
COMMA : ',';
// Token for Functions
IF : 'IF';
MAX : 'MAX';
MIN : 'MIN';
AVERAGE : 'AVERAGE';
VARIABLE : 'A'..'Z' ('A'..'Z' | '0'..'9')*
;
// OPERATORS
LPAREN : '(' ;
RPAREN : ')' ;
DIV : '/' ;
PLUS : '+' ;
MINUS : '-' ;
STAR : '*' ;
expression : formula;
formula
: functionExpression
| additiveExpression
| LPAREN! a=formula RPAREN! // First Problem
;
additiveExpression
: a=multiplicativeExpression ( (MINUS^ | PLUS^ ) b=multiplicativeExpression )*
;
multiplicativeExpression
: a=unaryExpression ( (STAR^ | DIV^ ) b=unaryExpression )*
;
unaryExpression
: MINUS^ u=unaryExpression
| primaryExpression
;
functionExpression
: f=functionOperator LPAREN e=formula RPAREN
| IF LPAREN b=booleanExpression COMMA p=formula COMMA s=formula RPAREN
;
functionOperator :
MAX | MIN | AVERAGE;
primaryExpression
: NUMBER
// Used for scientific numbers
| DIGIT
| VARIABLE
| formula
;
// Boolean stuff
booleanExpression
: orExpression;
orExpression : a=andExpression (OR^ b=andExpression )*
;
andExpression
: a=notExpression (AND^ b=notExpression )*
;
notExpression
: NOT^ t=booleanTerm
| booleanTerm
;
booleanOperator :
GT | LT | EQ | GTE | LTE | NEQ;
booleanTerm : a=formula op=booleanOperator b=formula
| LPAREN! booleanTerm RPAREN! // Second problem
;
error(210): The following sets of rules are mutually left-recursive [unaryExpression, additiveExpression, primaryExpression, formula, multiplicativeExpression]
This means that once the parser enters the unaryExpression rule, it can reach additiveExpression, primaryExpression, formula, multiplicativeExpression and then unaryExpression again without consuming a single token from the input. It therefore cannot decide whether to follow those rules, because the input looks exactly the same after following them.
You are probably trying to allow subexpressions inside expressions with this chain of rules. If so, you need to make sure that the path consumes the left parenthesis of the subexpression. The formula alternative in primaryExpression should probably be changed to LPAREN formula RPAREN, with the rest of the grammar adjusted accordingly.
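To illustrate why consuming the parenthesis breaks the cycle, here is a hand-written recursive-descent sketch in Python. It is not ANTLR output. It covers only the arithmetic part of the grammar (no IF or boolean rules), and the names Parser, tokenize and parse are made up for this example. The point is in primary(): it eats "(" before calling formula() again, so every trip around the rule chain consumes at least one token.

```python
import re

# Simplified token set: numbers, upper-case names, operators, parentheses, comma.
TOKEN = re.compile(r"\s*(\d+(?:\.\d+)?|[A-Z][A-Z0-9]*|[-+*/(),])")
OPERAND = re.compile(r"\d+(?:\.\d+)?|[A-Z][A-Z0-9]*")
FUNCTIONS = {"MAX", "MIN", "AVERAGE"}


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"bad input at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, text):
        self.toks = tokenize(text)
        self.i = 0

    def peek(self):
        return self.toks[self.i] if self.i < len(self.toks) else None

    def eat(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.i += 1
        return tok

    def formula(self):  # plays the role of additiveExpression
        node = self.multiplicative()
        while self.peek() in ("+", "-"):
            op = self.eat()
            node = (op, node, self.multiplicative())
        return node

    def multiplicative(self):
        node = self.unary()
        while self.peek() in ("*", "/"):
            op = self.eat()
            node = (op, node, self.unary())
        return node

    def unary(self):
        if self.peek() == "-":
            self.eat("-")
            return ("neg", self.unary())
        return self.primary()

    def primary(self):
        if self.peek() == "(":
            self.eat("(")  # a token is consumed before recursing
            node = self.formula()
            self.eat(")")
            return node
        if self.peek() in FUNCTIONS:
            name = self.eat()
            self.eat("(")
            arg = self.formula()
            self.eat(")")
            return (name, arg)
        tok = self.eat()
        if not OPERAND.fullmatch(tok):
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok


def parse(text):
    parser = Parser(text)
    tree = parser.formula()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing input: {parser.peek()!r}")
    return tree


print(parse("(A + MAX(B) ) / ( C - AVERAGE(A) )"))
```

If primary() called formula() directly without eating "(" first, the call chain would recurse forever on the same token, which is the hand-written equivalent of error(210).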
I defined the following grammar, following Scott Stanchfield's tutorial.
grammar SampleScript;
program
:
declaration+
;
declaration
: macrodeclaration
;
macrodeclaration
:
MACRO STRING (LEFTPAREN parameters RIGHTPAREN)?
statement*
ENDMACRO
;
statement
: assignmentStatement
| ifStatement
| iterationStatement
| jumpStatement
| procedureCallStatement
| dimStatement
| labeledStatement
;
actualParameters
: expression (',' expression?)*
;
parameters
: ID (',' ID)*
;
assignmentStatement
: ID ASSIGN expression
| ID MATRIXASSIGN expression
;
ifStatement
: IF expression THEN (statement|compoundStatement)
(ELSE expression (statement|compoundStatement))?
;
iterationStatement
: WHILE expression compoundStatement
| FOR ID '=' expression TO expression (STEP expression)? compoundStatement
;
jumpStatement
: BREAK
| CONTINUE
| GOTO ID
| RETURN LEFTPAREN expression RIGHTPAREN
;
procedureCallStatement //todo: expression statement
: ID LEFTPAREN actualParameters? RIGHTPAREN
;
dimStatement
: DIM ID LEFTBRACKET expression(',' expression)* RIGHTBRACKET (',' ID LEFTBRACKET expression(',' expression)* RIGHTBRACKET)*
;
labeledStatement
: ID ':' statement
;
compoundStatement
: DO statement* END
;
term
: NUMBER
| STRING
| ID
| LEFTPAREN expression RIGHTPAREN //( )
| ID LEFTPAREN actualParameters RIGHTPAREN //Procedure Call
| ID (LEFTBRACKET expression RIGHTBRACKET)+ //Array Arr[3]
| ID ('.' expression)+ //Array Arr.Length
| LEFTBRACE (expression)? (',' expression)* RIGHTBRACE //{"OK","False"}
;
negation
: 'not'* term
;
unary
: ('-')* negation
;
mult
: unary (('*' | '/') unary)*
;
add
: mult (('+' | '-') mult)*
;
relation
: add (('=' | '/=' | '<' | '<=' | '>=' | '>') add)*
;
expression
: relation (('and' | 'or') relation)*
;
//Keywords
DIM: D I M;
RETURN: R E T U R N;
FOR: F O R;
STEP: S T E P;
TO: T O;
WHILE: W H I L E;
DO: D O;
END: E N D;
GOTO: G O T O;
BREAK: B R E A K;
CONTINUE: C O N T I N U E;
IF: I F;
THEN: T H E N;
ELSE: E L S E;
MACRO :M A C R O;
ENDMACRO :E N D M A C R O;
ID : ('_'|LETTER) ('_'|LETTER|DIGIT)*;
ASSIGN: '=';
MATRIXASSIGN: ':=';
LEFTPAREN : '(';
RIGHTPAREN : ')';
LEFTBRACKET : '[';
RIGHTBRACKET : ']';
LEFTBRACE : '{';
RIGHTBRACE : '}';
//STRING : '"' .*? '"' ; // match anything in "..."
STRING
: '"' (STRING_ESCAPE_SEQ|~('\n'|'\r'))*? '"'
| '\'' (STRING_ESCAPE_SEQ|~('\n'|'\r'))*? '\''
;
/// stringescapeseq ::= "\" <any source character>
fragment STRING_ESCAPE_SEQ //'\\"'
: '\\' .
;
UNSIGNED_INT : DIGIT+; //('0' | '1'..'9' '0'..'9'*);
UNSIGNED_FLOAT: DIGIT+ '.' DIGIT* Exponent?
| '.' DIGIT+ Exponent?
| DIGIT+ Exponent
;
NUMBER
: UNSIGNED_INT
| UNSIGNED_FLOAT
;
fragment DIGIT : [0-9] ; // not a token by itself
fragment Exponent : ('e'|'E') ('+'|'-')? (DIGIT)+ ;
LINE_COMMENT : '//' .*? '\r'? '\n' -> skip ; // Match "//" stuff '\n'
COMMENT : '/*' .*? '*/' -> skip ; // Match "/*" stuff "*/"
fragment A:('a'|'A');
fragment B:('b'|'B');
fragment C:('c'|'C');
fragment D:('d'|'D');
fragment E:('e'|'E');
fragment F:('f'|'F');
fragment G:('g'|'G');
fragment H:('h'|'H');
fragment I:('i'|'I');
fragment J:('j'|'J');
fragment K:('k'|'K');
fragment L:('l'|'L');
fragment M:('m'|'M');
fragment N:('n'|'N');
fragment O:('o'|'O');
fragment P:('p'|'P');
fragment Q:('q'|'Q');
fragment R:('r'|'R');
fragment S:('s'|'S');
fragment T:('t'|'T');
fragment U:('u'|'U');
fragment V:('v'|'V');
fragment W:('w'|'W');
fragment X:('x'|'X');
fragment Y:('y'|'Y');
fragment Z:('z'|'Z');
fragment LETTER : [A-Za-z];
WS : [ \t\n\r]+ -> skip ; // skip spaces, tabs, newlines
I am trying to parse the following code:
Macro 'test' (x)
a=1
b=2
c={}
d = x(3,4)
matrixinfo_skim = GetMatrixInfo(m_skim)
showmessage (i2s(a))
showarray(c)
endmacro
and get the errors below. I have spent over 2 days on this and cannot figure out why it fails to parse the assignment statements, starting with a=1. Could someone please help me?
[#0,0:4='Macro',<30>,1:0]
[#1,6:11=''test'',<41>,1:6]
[#2,13:13='(',<35>,1:13]
[#3,14:14='x',<32>,1:14]
[#4,15:15=')',<36>,1:15]
[#5,20:20='a',<32>,3:0]
[#6,21:21='=',<33>,3:1]
[#7,22:22='1',<42>,3:2]
[#8,25:25='b',<32>,4:0]
[#9,26:26='=',<33>,4:1]
[#10,27:27='2',<42>,4:2]
[#11,30:30='c',<32>,5:0]
[#12,31:31='=',<33>,5:1]
[#13,32:32='{',<39>,5:2]
[#14,33:33='}',<40>,5:3]
[#15,36:36='d',<32>,6:0]
[#16,38:38='=',<33>,6:2]
[#17,40:40='x',<32>,6:4]
[#18,41:41='(',<35>,6:5]
[#19,42:42='3',<42>,6:6]
[#20,43:43=',',<2>,6:7]
[#21,44:44='4',<42>,6:8]
[#22,45:45=')',<36>,6:9]
[#23,48:62='matrixinfo_skim',<32>,7:0]
[#24,64:64='=',<33>,7:16]
[#25,66:78='GetMatrixInfo',<32>,7:18]
[#26,79:79='(',<35>,7:31]
[#27,80:85='m_skim',<32>,7:32]
[#28,86:86=')',<36>,7:38]
[#29,91:101='showmessage',<32>,9:0]
[#30,103:103='(',<35>,9:12]
[#31,104:106='i2s',<32>,9:13]
[#32,107:107='(',<35>,9:16]
[#33,108:108='a',<32>,9:17]
[#34,109:109=')',<36>,9:18]
[#35,110:110=')',<36>,9:19]
[#36,113:121='showarray',<32>,10:0]
[#37,122:122='(',<35>,10:9]
[#38,123:123='c',<32>,10:10]
[#39,124:124=')',<36>,10:11]
[#40,127:134='endmacro',<31>,11:0]
[#41,140:139='<EOF>',<-1>,13:0]
line 3:2 extraneous input '1' expecting {'-', 'not', ID, '(', '{', STRING, NUMBER}
line 4:2 extraneous input '2' expecting {'-', 'not', ID, '(', '{', STRING, NUMBER}
line 6:6 mismatched input '3' expecting {'-', 'not', ID, '(', '{', STRING, NUMBER}
line 6:8 extraneous input '4' expecting {',', ')'}
(program (declaration (macrodeclaration Macro 'test' ( (parameters x) ) (statement (assignmentStatement a = (expression (relation (add (mult (unary 1 (negation (term b))))) = (add (mult (unary 2 (negation (term c))))) = (add (mult (unary (negation (term { }))))))))) (statement (assignmentStatement d = (expression (relation (add (mult (unary (negation (term x ( (actualParameters (expression (relation (add (mult (unary 3))))) , 4) )))))))))) (statement (assignmentStatement matrixinfo_skim = (expression (relation (add (mult (unary (negation (term GetMatrixInfo ( (actualParameters (expression (relation (add (mult (unary (negation (term m_skim)))))))) )))))))))) (statement (procedureCallStatement showmessage ( (actualParameters (expression (relation (add (mult (unary (negation (term i2s ( (actualParameters (expression (relation (add (mult (unary (negation (term a)))))))) ))))))))) ))) (statement (procedureCallStatement showarray ( (actualParameters (expression (relation (add (mult (unary (negation (term c)))))))) ))) endmacro)))
As the error messages indicate, things go wrong with the numbers, which are matched by the expression in the assignmentStatement rule and should ultimately be matched as a NUMBER in the term rule.
Looking at the lexer rules responsible for the creation of a NUMBER token:
UNSIGNED_INT : DIGIT+;
UNSIGNED_FLOAT: DIGIT+ '.' DIGIT* Exponent?
| '.' DIGIT+ Exponent?
| DIGIT+ Exponent
;
NUMBER
: UNSIGNED_INT
| UNSIGNED_FLOAT
;
it appears that a NUMBER token is never created. Any input that NUMBER can match is also matched, at the same length, by either UNSIGNED_INT or UNSIGNED_FLOAT. Because those two rules are defined before NUMBER, the lexer creates UNSIGNED_INT and UNSIGNED_FLOAT tokens instead of NUMBER tokens.
You need to change UNSIGNED_INT and UNSIGNED_FLOAT into fragment rules instead:
fragment UNSIGNED_INT : DIGIT+;
fragment UNSIGNED_FLOAT: DIGIT+ '.' DIGIT* Exponent?
| '.' DIGIT+ Exponent?
| DIGIT+ Exponent
;
NUMBER
: UNSIGNED_INT
| UNSIGNED_FLOAT
;
Be sure to understand what a fragment is: What does "fragment" mean in ANTLR?
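As a rough illustration of that tie-break, here is a small Python simulation. It is a sketch, not ANTLR itself: lex_one is a made-up helper, the regular expressions leave out the Exponent part, and it assumes only the documented ANTLR 4 behaviour that the longest match wins and that ties go to the rule defined first.

```python
import re

INT = r"[0-9]+"
FLOAT = r"[0-9]+\.[0-9]*|\.[0-9]+"   # Exponent left out for brevity
NUMBER = f"(?:{FLOAT}|{INT})"


def lex_one(rules, text):
    """Return the name of the rule that wins at the start of `text`.

    `rules` is a list of (name, regex) pairs in definition order.
    The longest match wins; on a tie the earlier rule is kept.
    """
    best_name, best_len = None, 0
    for name, pattern in rules:
        m = re.match(pattern, text)
        if m and len(m.group(0)) > best_len:  # strictly longer only
            best_name, best_len = name, len(m.group(0))
    return best_name


# Original grammar: all three are token rules, NUMBER comes last.
before = [("UNSIGNED_INT", INT), ("UNSIGNED_FLOAT", FLOAT), ("NUMBER", NUMBER)]
# Fixed grammar: fragments never produce tokens, so only NUMBER competes.
after = [("NUMBER", NUMBER)]

for text in ("1", "2.5"):
    print(text, lex_one(before, text), lex_one(after, text))
```

With the original rule order, "1" comes out as UNSIGNED_INT, a token type the parser's term rule never mentions, which matches the "extraneous input '1'" errors above.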
If I have expressions like:
(name = Paul AND age = 16) OR country = china;
And I want to get:
           QUERY
             |
      |-------------|
      ()            |
      |             |
     AND            OR
      |             |
  |-------|         |
 name    age     country
  |       |         |
 Paul    16       china
How can I print the () and the condition (AND/OR) before the fields name, age and country?
My grammar file is something like this:
parse
: block EOF -> block
;
block
: (statement)* (Return ID ';')?
-> ^(QUERY statement*)
;
statement
: assignment ';'
-> assignment
;
assignment
: expression (condition expression)*
-> ^(condition expression*)
| '(' expression (condition expression)* ')' (condition expression)*
-> ^(Brackets ^(condition expression*))
;
condition
: AND
| OR
;
Brackets: '()' ;
OR : 'OR' ;
AND : 'AND' ;
..
But it only prints the first condition that appears in the expression ('AND' in this example), and I can't group what is between brackets, and what is not...
Your grammar looks odd to me, and there are errors in it: if the parser does not match "()", you can't use Brackets inside a rewrite rule. And why would you ever want to have the token "()" inside your AST?
Given your example input:
(name = Paul AND age = 16) OR country = china;
here's a possible way to construct an AST:
grammar T;
options {
output=AST;
}
query
: expr ';' EOF -> expr
;
expr
: logical_expr
;
logical_expr
: equality_expr ( logical_op^ equality_expr )*
;
equality_expr
: atom ( equality_op^ atom )*
;
atom
: ID
| INT
| '(' expr ')' -> expr
;
equality_op
: '='
| 'IS' 'NOT'?
;
logical_op
: 'AND'
| 'OR'
;
ID : ('a'..'z' | 'A'..'Z')+;
INT : '0'..'9'+;
WS : (' ' | '\t' | '\r' | '\n')+ {skip();};
which would result in this:
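To make the shape of that tree concrete, here is a hand-written Python sketch that mirrors the logical_expr, equality_expr and atom rules above. It is not generated by ANTLR. The names parse_query and to_sexpr are invented for the example, and the 'IS' 'NOT'? alternative is left out. Each operator becomes the root of its two operands, just as the ^ suffix does in the grammar, and the parentheses only steer the parse without appearing in the tree.

```python
import re

KEYWORDS = {"AND", "OR"}


def parse_query(text):
    toks = re.findall(r"[A-Za-z]+|[0-9]+|[=();]", text)
    pos = 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def eat(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        pos += 1
        return tok

    def logical_expr():
        node = equality_expr()
        while peek() in KEYWORDS:
            op = eat()
            node = (op, node, equality_expr())  # operator becomes the root
        return node

    def equality_expr():
        node = atom()
        while peek() == "=":
            op = eat()
            node = (op, node, atom())
        return node

    def atom():
        if peek() == "(":
            eat("(")
            node = logical_expr()
            eat(")")
            return node  # parentheses are dropped from the tree
        tok = eat()
        if tok in KEYWORDS or not tok.isalnum():
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok

    tree = logical_expr()
    eat(";")
    if peek() is not None:
        raise SyntaxError(f"trailing input: {peek()!r}")
    return tree


def to_sexpr(node):
    if isinstance(node, str):
        return node
    return "(" + " ".join(to_sexpr(child) for child in node) + ")"


print(to_sexpr(parse_query("(name = Paul AND age = 16) OR country = china;")))
```

For the example input this prints (OR (AND (= name Paul) (= age 16)) (= country china)), so the grouping given by the brackets survives as nesting even though no "()" node exists.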
Is there any way to specify a grammar which allows the following syntax:
f(x)(g, (1-(-2))*3, 1+2*3)[0]
which is transformed into (in pseudo-lisp to show order):
(index
((f x)
g
(* (- 1 -2) 3)
(+ (* 2 3) 1)
)
0
)
along with things like limited operator precedence etc.
The following grammar works with backtrack = true, but I'd like to avoid that:
grammar T;
options {
output=AST;
backtrack=true;
memoize=true;
}
tokens {
CALL;
INDEX;
LOOKUP;
}
prog: (expr '\n')* ;
expr : boolExpr;
boolExpr
: relExpr (boolop^ relExpr)?
;
relExpr
: addExpr (relop^ addExpr)?
| a=addExpr oa=relop b=addExpr ob=relop c=addExpr
-> ^(LAND ^($oa $a $b) ^($ob $b $c))
;
addExpr
: mulExpr (addop^ mulExpr)?
;
mulExpr
: atomExpr (mulop^ atomExpr)?
;
atomExpr
: INT
| ID
| OPAREN expr CPAREN -> expr
| call
;
call
: callable ( OPAREN (expr (COMMA expr)*)? CPAREN -> ^(CALL callable expr*)
| OBRACK expr CBRACK -> ^(INDEX callable expr)
| DOT ID -> ^(INDEX callable ID)
)
;
fragment
callable
: ID
| OPAREN expr CPAREN
;
fragment
boolop
: LAND | LOR
;
fragment
relop
: (EQ|GT|LT|GTE|LTE)
;
fragment
addop
: (PLUS|MINUS)
;
fragment
mulop
: (TIMES|DIVIDE)
;
EQ : '==' ;
GT : '>' ;
LT : '<' ;
GTE : '>=' ;
LTE : '<=' ;
LAND : '&&' ;
LOR : '||' ;
PLUS : '+' ;
MINUS : '-' ;
TIMES : '*' ;
DIVIDE : '/' ;
ID : ('a'..'z')+ ;
INT : '0'..'9' ;
OPAREN : '(' ;
CPAREN : ')' ;
OBRACK : '[' ;
CBRACK : ']' ;
DOT : '.' ;
COMMA : ',' ;
There are a couple of things wrong with your grammar:
1
Only lexer rules can be fragments, not parser rules. Some ANTLR targets (like the Java target) simply ignore the fragment keyword in front of parser rules, but it is better to remove it from your grammar: if you decide to create a parser for a different target language, you may run into problems because of it.
2
Without backtrack=true, you cannot mix tree operators (^ and !) with rewrite rules (->) here, because you need a single alternative inside relExpr instead of the two alternatives you have now (this is to eliminate an ambiguity).
In your case, you can't create the desired AST with just ^ (inside a single alternative), so you'll need to do it like this:
relExpr
: (a=addExpr -> $a) ( (oa=relOp b=addExpr -> ^($oa $a $b))
( ob=relOp c=addExpr -> ^(LAND ^($oa $a $b) ^($ob $b $c))
)?
)?
;
(yes, I know, it's not particularly pretty, but that can't be helped AFAIK)
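To see what those nested rewrites build up step by step, here is a tiny Python sketch. It is only an illustration: rel_expr is a made-up function, and it takes an already-split list of operands and operators instead of doing any real parsing. Each assignment to tree corresponds to one of the -> rewrites in the rule above.

```python
def rel_expr(tokens):
    """Build the tree for: operand (op operand (op operand)?)?

    `tokens` is a flat list such as ["a", ">=", "b", "<", "c"].
    """
    if len(tokens) not in (1, 3, 5):
        raise SyntaxError("expected 1, 3 or 5 tokens")

    a = tokens[0]
    tree = a                                      # (a=addExpr -> $a)
    if len(tokens) >= 3:
        oa, b = tokens[1], tokens[2]
        tree = (oa, a, b)                         # -> ^($oa $a $b)
        if len(tokens) == 5:
            ob, c = tokens[3], tokens[4]
            tree = ("LAND", (oa, a, b), (ob, b, c))  # -> ^(LAND ^($oa $a $b) ^($ob $b $c))
    return tree


print(rel_expr(["a"]))
print(rel_expr(["a", ">=", "b"]))
print(rel_expr(["a", ">=", "b", "<", "c"]))
```

Note how the middle operand b appears in both comparisons of the chained form, which is why a single ^ operator cannot express it.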
Also, you can only put the LAND token in the rewrite rules if it is defined in the tokens { ... } block:
tokens {
// literal tokens
LAND='&&';
...
// imaginary tokens
CALL;
...
}
Otherwise you can only use tokens (and other parser rules) in rewrite rules if they really occur inside the parser rule itself.
3
You did not account for the unary minus in your grammar. Implement it like this:
mulExpr
: unaryExpr ((TIMES | DIVIDE)^ unaryExpr)*
;
unaryExpr
: MINUS atomExpr -> ^(UNARY_MINUS atomExpr)
| atomExpr
;
Now, to create a grammar that does not need backtrack=true, remove the ID and '(' expr ')' from your atomExpr rule:
atomExpr
: INT
| call
;
and make everything past callable optional inside your call rule:
call
: (callable -> callable) ( OPAREN params CPAREN -> ^(CALL $call params)
| OBRACK expr CBRACK -> ^(INDEX $call expr)
| DOT ID -> ^(INDEX $call ID)
)*
;
That way, ID and '(' expr ')' are already matched by call (and there's no ambiguity).
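The loop in that call rule can be sketched by hand in Python as follows. This is an assumption-laden illustration rather than ANTLR output: parse_call is a made-up name, expr is cut down to "an INT or a call" with no operators, and the tree is built from plain tuples. The part that matters is call(): it parses one callable and then keeps wrapping the tree built so far for every (...), [...] or .name suffix, which is what $call does in the rewrite rules.

```python
import re


def parse_call(text):
    toks = re.findall(r"[a-z]+|[0-9]+|[()\[\].,]", text)
    pos = 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def eat(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        pos += 1
        return tok

    def expr():  # simplified: INT or call, no operators
        tok = peek()
        if tok is not None and tok.isdigit():
            return eat()
        return call()

    def callable_():
        if peek() == "(":
            eat("(")
            node = expr()
            eat(")")
            return node
        tok = eat()
        if not tok.isalpha():
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok

    def params():
        args = []
        if peek() != ")":
            args.append(expr())
            while peek() == ",":
                eat(",")
                args.append(expr())
        return ("PARAMS", *args)

    def call():
        node = callable_()
        while True:  # zero or more suffixes, like ( ... )*
            if peek() == "(":
                eat("(")
                args = params()
                eat(")")
                node = ("CALL", node, args)
            elif peek() == "[":
                eat("[")
                index = expr()
                eat("]")
                node = ("INDEX", node, index)
            elif peek() == ".":
                eat(".")
                node = ("INDEX", node, eat())
            else:
                return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"trailing input: {peek()!r}")
    return tree


print(parse_call("f(x)(g)[0]"))
```

Because a bare ID is just a call with zero suffixes, there is only one way to parse it, so no backtracking is needed.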
Taking all the remarks above into account, you could end up with the following grammar:
grammar T;
options {
output=AST;
}
tokens {
// literal tokens
EQ = '==' ;
GT = '>' ;
LT = '<' ;
GTE = '>=' ;
LTE = '<=' ;
LAND = '&&' ;
LOR = '||' ;
PLUS = '+' ;
MINUS = '-' ;
TIMES = '*' ;
DIVIDE = '/' ;
OPAREN = '(' ;
CPAREN = ')' ;
OBRACK = '[' ;
CBRACK = ']' ;
DOT = '.' ;
COMMA = ',' ;
// imaginary tokens
CALL;
INDEX;
LOOKUP;
UNARY_MINUS;
PARAMS;
}
prog
: expr EOF -> expr
;
expr
: boolExpr
;
boolExpr
: relExpr ((LAND | LOR)^ relExpr)?
;
relExpr
: (a=addExpr -> $a) ( (oa=relOp b=addExpr -> ^($oa $a $b))
( ob=relOp c=addExpr -> ^(LAND ^($oa $a $b) ^($ob $b $c))
)?
)?
;
addExpr
: mulExpr ((PLUS | MINUS)^ mulExpr)*
;
mulExpr
: unaryExpr ((TIMES | DIVIDE)^ unaryExpr)*
;
unaryExpr
: MINUS atomExpr -> ^(UNARY_MINUS atomExpr)
| atomExpr
;
atomExpr
: INT
| call
;
call
: (callable -> callable) ( OPAREN params CPAREN -> ^(CALL $call params)
| OBRACK expr CBRACK -> ^(INDEX $call expr)
| DOT ID -> ^(INDEX $call ID)
)*
;
callable
: ID
| OPAREN expr CPAREN -> expr
;
params
: (expr (COMMA expr)*)? -> ^(PARAMS expr*)
;
relOp
: EQ | GT | LT | GTE | LTE
;
ID : 'a'..'z'+ ;
INT : '0'..'9'+ ;
SPACE : (' ' | '\t') {skip();};
which would parse the input "a >= b < c" into the following AST:
and the input "f(x)(g, (1-(-2))*3, 1+2*3)[0]" as follows: