Referencing operator :: - operators

I'm used to using ( + ) to reference the pervasive integer addition operator, but this doesn't work for ( :: ):
OCaml version 4.01.0
# (+);;
- : int -> int -> int = <fun>
# ( :: );;
Error: Syntax error: operator expected.
# ( := );;
- : 'a ref -> 'a -> unit = <fun>
The expression grammar says
expr ::= ...
| expr :: expr
...
∣ [ expr { ; expr } [;] ]
...
| expr infix-op expr
...
and lexical conventions says
infix-symbol ::= (= ∣ < ∣ > ∣ # ∣ ^ ∣ | ∣ & ∣ + ∣ - ∣ * ∣ / ∣ $ ∣ %) { operator-char }
which seems to exclude both :: and := as infix operators, even though ( := ) works just fine.
What is ::'s status as an operator?
Is there a convenient handle for the list prepend operator or is (fun el ls -> el::ls) the best one can do?

:: is a base construction that can't be implemented with other constructions. := is an operator that can be implemented as let (:=) r v = r.contents <- v. But I agree that this contradicts the lexical conventions described in the manual.
For your problem of using (::), the best you can do is to give it a short name if you want to use it multiple times. let cons h t = h :: t

The cons operator :: is a constructor, and it can't be applied as an infix operator.
Have a look at the pervasive module for a list of all the ones you can use like that.

which seems to exclude both :: and := as infix operators, even though
( := ) works just fine.
You made a jump there. You cite
expr ::= ...
| expr infix-op expr
But then you did not look at infix-op, which is defined as
infix-op ::= infix-symbol
∣ * ∣ + ∣ - ∣ -. ∣ = ∣ != ∣ < ∣ > ∣ or ∣ || ∣ & ∣ && ∣ :=
∣ mod ∣ land ∣ lor ∣ lxor ∣ lsl ∣ lsr ∣ asr
So there, := is a infix operator, along with other ones like mod, etc. infix-symbol is only for custom infix operators.

Related

How do I convert this Antlr3 AST to Antlr4?

I'm trying to convert my existing Antlr3 project to Antlr4 to get more functionality. I have this grammar that wouldn't compile with Antlr4.9
expr
: term ( OR^ term )* ;
and
factor
: ava | NOT^ factor | (LPAREN! expr RPAREN!) ;
Mostly because Antlr4 doesn't support ^ and ! anymore. From the documentation it seems like those are
AST root operator. When generating abstract syntax trees (ASTs), token
references suffixed with the "^" root operator force AST nodes to be
created and added as the root of the current tree. This symbol is only
effective when the buildAST option is set. More information about ASTs
is also available.
AST exclude operator. When generating abstract syntax trees, token
references suffixed with the "!" exclude operator are not included in
the AST constructed for that rule. Rule references can also be
suffixed with the exclude operator, which implies that, while the tree
for the referenced rule is constructed, it is not linked into the tree
for the referencing rule. This symbol is only effective when the
buildAST option is set. More information about ASTs is also available.
If I took those out it would compile but I'm not sure what do those mean and how would Antlr4 supports it.
LPAREN and RPAREN is tokens
tokens {
EQUALS = '=';
LPAREN = '(';
RPAREN = ')';
}
which Antlr4 kindly provides the way to convert that in the error messages but not ^ and !. The grammar is for parsing boolean expression for example (a=b AND b=c)
I think this is the rule
targetingexpr returns [boolean value]
: expr { $value = $expr.value; } ;
expr returns [boolean value]
: ^(NOT a=expr) { $value = !a; }
| ^(AND a=expr b=expr) { $value = a && b; }
| ^(OR a=expr b=expr) { $value = a || b; }
| ^(EQUALS A=ALPHANUM B=ALPHANUM) { $value = targetingContext.contains($A.text,$B.text); }
;
The v3 grammar:
...
tokens {
EQUALS = '=';
LPAREN = '(';
RPAREN = ')';
}
...
expr
: term ( OR^ term )* ;
factor
: ava | NOT^ factor | (LPAREN! expr RPAREN!) ;
in v4 would look like this:
...
expr
: term ( OR term )* ;
factor
: ava | NOT factor | (LPAREN expr RPAREN) ;
EQUALS : '=';
LPAREN : '(';
RPAREN : ')';
So, just remove the inline ^ and ! operators (tree rewriting is no longer available in ANTLR4), and move the literal tokens in the tokens { ... } sections into own lexer rules.
I think this is the rule
targetingexpr returns [boolean value]
: expr { $value = $expr.value; } ;
expr returns [boolean value]
: ^(NOT a=expr) { $value = !a; }
| ^(AND a=expr b=expr) { $value = a && b; }
| ^(OR a=expr b=expr) { $value = a || b; }
| ^(EQUALS A=ALPHANUM B=ALPHANUM) { $value = targetingContext.contains($A.text,$B.text); }
;
What you posted there is part of a tree grammar for which there is no equivalent. In ANTLR4 you'd use a visitor to evaluate your expressions instead of inside a tree grammar.

Antlr4 Grammar for Function Application

I'm trying to write a simple lambda calculus grammar (show below). The issue I am having is that function application seems to be treated as right associative instead of left associative e.g. "f 1 2" is parsed as (f (1 2)) instead of ((f 1) 2). ANTLR has an assoc option for tokens, but I don't see how that helps here since there is no operator for function application. Does anyone see a solution?
LAMBDA : '\\';
DOT : '.';
OPEN_PAREN : '(';
CLOSE_PAREN : ')';
fragment ID_START : [A-Za-z+\-*/_];
fragment ID_BODY : ID_START | DIGIT;
fragment DIGIT : [0-9];
ID : ID_START ID_BODY*;
NUMBER : DIGIT+ (DOT DIGIT+)?;
WS : [ \t\r\n]+ -> skip;
parse : expr EOF;
expr : variable #VariableExpr
| number #ConstantExpr
| function_def #FunctionDefinition
| expr expr #FunctionApplication
| OPEN_PAREN expr CLOSE_PAREN #ParenExpr
;
function_def : LAMBDA ID DOT expr;
number : NUMBER;
variable : ID;
Thanks!
this breaks 4.1's pattern matcher for left-recursion. cleaned up in main branch I believe. try downloading last master and build. CUrrently 4.1 generates:
expr[int _p]
: ( {} variable
| number
| function_def
| OPEN_PAREN expr CLOSE_PAREN
)
(
{2 >= $_p}? expr
)*
;
for that rule. expr ref in loop is expr[0] actually, which isn't right.

How to handle the precedence of assignment operator in a PHP parser?

I wrote a PHP5 parser in ANTLR 3.4, which is almost ready, but I can not handle one of the tricky feature of PHP. My problem is with the precedence of assignment operator. As the PHP manual says the precedence of assignment is almost at the end of the list. Only and, xor, or and , are after it in the list.
But there is a note on this the manual page which says:
Although = has a lower precedence than most other operators, PHP will
still allow expressions similar to the following: if (!$a = foo()), in
which case the return value of foo() is put into $a.
The small example in the note isn't a problem for my parser, I can handle this as a special case in the assigment rule.
But there are more complex codes eg:
if ($a && $b = func()) {}
My parser fails here, because it recognizes first $a && $b and can not deal with the rest of the conditioin. This is because the && has higher precedence, than =.
If I put brackets around the right side of &&:
if ($a && ($b = func())) {}
In this way the parser recognizes the structure well.
The operators are built in the way that the ANTLR book recommends: there are the base exressions at the first step and each level of operators are coming after each other.
Is there any way to handle this precedence jumping?
Don't look at it as an assignment, but let's name it an assignment expression. Put this assignment expression "below" the unary expressions (so they have a higher precedence than the unary ones):
grammar T;
options {
output=AST;
}
tokens {
BLOCK;
FUNC_CALL;
EXPR_LIST;
}
parse
: stat* EOF!
;
stat
: assignment ';'!
| if_stat
;
assignment
: Var '='^ expr
;
if_stat
: If '(' expr ')' block -> ^(If expr block)
;
block
: '{' stat* '}' -> ^(BLOCK stat*)
;
expr
: or_expr
;
or_expr
: and_expr ('||'^ and_expr)*
;
and_expr
: unary_expr ('&&'^ unary_expr)*
;
unary_expr
: '!'^ assign_expr
| '-'^ assign_expr
| assign_expr
;
assign_expr
: Var ('='^ atom)*
| atom
;
atom
: Num
| func_call
;
func_call
: Id '(' expr_list ')' -> ^(FUNC_CALL Id expr_list)
;
expr_list
: (expr (',' expr)*)? -> ^(EXPR_LIST expr*)
;
If : 'if';
Num : '0'..'9'+;
Var : '$' Id;
Id : ('a'..'z')+;
Space : (' ' | '\t' | '\r' | '\n')+ {skip();};
If you'd now parse the source:
if (!$a = foo()) { $a = 1 && 2; }
if ($a && $b = func()) { $b = 2 && 3; }
if ($a = baz() && $b) { $c = 3 && 4; }
the following AST would get constructed:

adding (...) {...} function literals while abstaining from backtracking

Building off the answer found in How to have both function calls and parenthetical grouping without backtrack, I'd like to add function literals which are in a non LL(*) means implemented like
...
tokens {
...
FN;
ID_LIST;
}
stmt
: expr SEMI // SEMI=';'
;
callable
: ...
| fn
;
fn
: OPAREN opt_id_list CPAREN compound_stmt
-> ^(FN opt_id_list compound_stmt)
;
compound_stmt
: OBRACE stmt* CBRACE
opt_id_list
: (ID (COMMA ID)*)? -> ^(ID_LIST ID*)
;
What I'd like to do is allow anonymous function literals that have an argument list (e.g. () or (a) or (a, b, c)) followed by a compound_stmt. So (a, b, c){...} is good. But (x)(y){} not so much. (Of course (x) * (y){} is "valid" in terms of the parser, just as ((y){})()[1].x would be.)
The parser needs a bit of extra look ahead. I guess it could be done without it, but it would definitely result in some horrible looking parser rule(s) that are a pain to maintain and a parser that would accept (a, 2, 3){...} (a function literal with an expression-list instead of an id-list), for example. This would cause you to do quite a bit of semantic checking after the AST has been created.
The (IMO) best way to solve this is by adding the function literal rule in the callable and adding a syntactic predicate in front of it which will tell the parser to make sure there really is such an alternative before actually matching it.
callable
: (fn_literal)=> fn_literal
| OPAREN expr CPAREN -> expr
| ID
;
A demo:
grammar T;
options {
output=AST;
}
tokens {
// literal tokens
EQ = '==' ;
GT = '>' ;
LT = '<' ;
GTE = '>=' ;
LTE = '<=' ;
LAND = '&&' ;
LOR = '||' ;
PLUS = '+' ;
MINUS = '-' ;
TIMES = '*' ;
DIVIDE = '/' ;
OPAREN = '(' ;
CPAREN = ')' ;
OBRACK = '[' ;
CBRACK = ']' ;
DOT = '.' ;
COMMA = ',' ;
OBRACE = '{' ;
CBRACE = '}' ;
SEMI = ';' ;
// imaginary tokens
CALL;
INDEX;
LOOKUP;
UNARY_MINUS;
PARAMS;
FN;
ID_LIST;
STATS;
}
prog
: expr EOF -> expr
;
expr
: boolExpr
;
boolExpr
: relExpr ((LAND | LOR)^ relExpr)?
;
relExpr
: (a=addExpr -> $a) ( (oa=relOp b=addExpr -> ^($oa $a $b))
( ob=relOp c=addExpr -> ^(LAND ^($oa $a $b) ^($ob $b $c))
)?
)?
;
addExpr
: mulExpr ((PLUS | MINUS)^ mulExpr)*
;
mulExpr
: unaryExpr ((TIMES | DIVIDE)^ unaryExpr)*
;
unaryExpr
: MINUS atomExpr -> ^(UNARY_MINUS atomExpr)
| atomExpr
;
atomExpr
: INT
| call
;
call
: (callable -> callable) ( OPAREN params CPAREN -> ^(CALL $call params)
| OBRACK expr CBRACK -> ^(INDEX $call expr)
| DOT ID -> ^(INDEX $call ID)
)*
;
callable
: (fn_literal)=> fn_literal
| OPAREN expr CPAREN -> expr
| ID
;
fn_literal
: OPAREN id_list CPAREN compound_stmt -> ^(FN id_list compound_stmt)
;
id_list
: (ID (COMMA ID)*)? -> ^(ID_LIST ID*)
;
params
: (expr (COMMA expr)*)? -> ^(PARAMS expr*)
;
compound_stmt
: OBRACE stmt* CBRACE -> ^(STATS stmt*)
;
stmt
: expr SEMI
;
relOp
: EQ | GT | LT | GTE | LTE
;
ID : 'a'..'z'+ ;
INT : '0'..'9'+ ;
SPACE : (' ' | '\t') {skip();};
A parser generated by the grammar above would reject the input (x)(y){} while it properly parses the following 3 snippets of code:
1
(a, b, c){ a+b*c; }
2
(x) * (y){ x.y; }
3
((y){})()[1].x

yacc associativity of nonterminal symbols?

Say I have a grammar like this:
expr : expr '+' expr { $$ = operation('+', $1, $3); }
| expr '-' expr { $$ = operation('-', $1, $3); }
| expr '*' expr { $$ = operation('*', $1, $3); }
| expr '/' expr { $$ = operation('/', $1, $3); }
| num
;
Where each of those operators has a precedence attached and is marked as left associative.
Then I want to refactor my grammar such that:
op : '+' | '-' | '*' | '/' ;
expr : expr op expr { $$ = operation($2, $1, $3); }
| num
;
How does yacc (if even at all) determine the associativity and precedence of op in this case? Will it trace its way through all the possible precedences/associativities of +, -, * and / when evaluating op, or does defining an associativity for nonterminal symbols make no sense?
AFAIK, with precedence order for nonterminals, it uses the precedence of the rightmost terminal symbol, but I can't find any documentation on the associativity rules themselves for nonterminals.
The "normal" way to do this (as far as I'm aware) is to define a different expr type for each operator, that way you get very explicit control over what's happening.
Python's grammar is a good example of this: http://docs.python.org/reference/grammar.html.