Problem with the lookup in a Map to get the value of a stored variable - kframework

I am facing an issue when I try to look up the value of a variable stored in an env configuration cell, which is a Map (a common pattern). Here is the extracted code with the syntax and the rewrite rule that cause the issue:
syntax Exp
::= Id
| Value
| "read" "(" Exp ")"
syntax Value
::= ...
| #loc(Int)
rule <k> read ( X:Id ) => read ( #loc( L ) ) ... </k>
<env> ... X |-> L ... </env>
When I krun a small program, it gets stuck in this configuration:
<k>
read ( x ) ~> #freezer_;_OSL-SYNTAX_Stmt_Exp0_ ( ) ~> .Stmts ~> .
</k>
<env>
x |-> 0
y |-> 2
</env>
I expect read(x) to be rewritten to read(#loc(0)), but the rule is not applied. If I comment out the env cell requirement in the rule and replace L with the constant 0, the rule is applied:
rule <k> read ( X:Id ) => read ( #loc( 0 ) ) ... </k>
// <env> ... X |-> L ... </env>
and the program terminates with . in the k cell (because another rule handles read(#loc(0)) after this one):
<k>
.
</k>
<env>
x |-> 0
y |-> 2
</env>
I also tried this map-lookup syntax from the documentation:
rule <k> read ( X:Id ) => read ( #loc( Env[X] ) ) ... </k>
<env> Env:Map </env>
but I get a parse error:
[Error] Inner Parser: Parse error: unexpected token '['.
Source(/home/alessio/Project/osl/model/./osl.k)
Location(156,43,156,44)
Do you have any idea how to debug this?

Related

how to resolve an ambiguity

I have a grammar:
grammar Test;
s : ID OP (NUMBER | ID);
ID : [a-z]+ ;
NUMBER : '.'? [0-9]+ ;
OP : '/.' | '/' ;
WS : [ \t\r\n]+ -> skip ;
An expression like x/.123 can either be parsed as (s x /. 123), or as (s x / .123). With the grammar above I get the first variant.
Is there a way to get both parse trees? Is there a way to control how it is parsed? Say, if there is a number after the /. then I emit the / otherwise I emit /. in the tree.
I am new to ANTLR.
An expression like x/.123 can either be parsed as (s x /. 123), or as (s x / .123)
I'm not sure about that. The Possible Issues paragraph of the ReplaceAll page (*) says that "Periods bind to numbers more strongly than to slash", so /.123 is always interpreted as a division by the number .123. It then says that, to avoid this, you must insert a space between the /. operator and the number if you want the input to be understood as a replacement.
So there is only one possible parse tree (otherwise, how could the Wolfram parser decide how to interpret the statement?).
The ANTLR4 lexer and parser are greedy: the lexer (parser) consumes as many input characters (tokens) as it can while matching a rule. With your rule OP : '/.' | '/' ;, the lexer always matches the input /. with the /. alternative (even if you write the rule as OP : '/' | '/.' ;). So there is no ambiguity, and the input can never be tokenized as OP=/ followed by NUMBER=.123.
Given my limited experience with ANTLR, the only solution I have found is to split the ReplaceAll operator into two tokens.
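To make the longest-match behavior concrete, here is a minimal Java sketch of a maximal-munch tokenizer for the question's ID, NUMBER, OP and WS rules. It is not how ANTLR is implemented; it only mimics the policy described above (the longest match wins, and on a tie the rule listed first wins). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: mimics ANTLR's longest-match lexing policy for
// ID : [a-z]+ ;  NUMBER : '.'? [0-9]+ ;  OP : '/.' | '/' ;  WS -> skip
public class GreedyLexer {

    // Length of the ID match at position i, or 0 if none.
    static int matchId(String s, int i) {
        int j = i;
        while (j < s.length() && s.charAt(j) >= 'a' && s.charAt(j) <= 'z') j++;
        return j - i;
    }

    // Length of the NUMBER match ('.'? [0-9]+) at position i, or 0 if none.
    static int matchNumber(String s, int i) {
        int j = i;
        if (j < s.length() && s.charAt(j) == '.') j++;
        int digitsStart = j;
        while (j < s.length() && s.charAt(j) >= '0' && s.charAt(j) <= '9') j++;
        return j > digitsStart ? j - i : 0;
    }

    // OP: the longer alternative wins no matter which order it is written in.
    static int matchOp(String s, int i) {
        if (s.startsWith("/.", i)) return 2;
        if (s.startsWith("/", i)) return 1;
        return 0;
    }

    static int matchWs(String s, int i) {
        int j = i;
        while (j < s.length() && " \t\r\n".indexOf(s.charAt(j)) >= 0) j++;
        return j - i;
    }

    static List<String> tokenize(String s) {
        String[] names = {"ID", "NUMBER", "OP", "WS"}; // grammar order
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int[] lengths = {matchId(s, i), matchNumber(s, i), matchOp(s, i), matchWs(s, i)};
            int best = -1;
            for (int r = 0; r < lengths.length; r++) {
                // Strictly longer match wins; on a tie the earlier rule is kept.
                if (lengths[r] > 0 && (best < 0 || lengths[r] > lengths[best])) best = r;
            }
            if (best < 0) throw new IllegalArgumentException("no rule matches at " + i);
            String text = s.substring(i, i + lengths[best]);
            if (!names[best].equals("WS")) tokens.add(names[best] + ":" + text); // WS is skipped
            i += lengths[best];
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("x/.123")); // [ID:x, OP:/., NUMBER:123]
    }
}
```

Whichever order the OP alternatives are written in, matchOp returns 2 for "/.", so x/.123 always comes out as ID, OP("/."), NUMBER("123") and never as OP("/") followed by NUMBER(".123").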
Grammar Question.g4:
grammar Question;
/* Parse Wolfram ReplaceAll. */
question
@init {System.out.println("Question last update 0851");}
: s+ EOF
;
s : division
| replace_all
;
division
: expr '/' NUMBER
{System.out.println("found division " + $expr.text + " by " + $NUMBER.text);}
;
replace_all
: expr '/' '.' replacement
{System.out.println("found ReplaceAll " + $expr.text + " with " + $replacement.text);}
;
expr
: ID
| '"' ID '"'
| NUMBER
| '{' expr ( ',' expr )* '}'
;
replacement
: expr '->' expr
| '{' replacement ( ',' replacement )* '}'
;
ID : [a-z]+ ;
NUMBER : '.'? [0-9]+ ;
WS : [ \t\r\n]+ -> skip ;
Input file t.text:
x/.123
x/.x -> 1
{x, y}/.{x -> 1, y -> 2}
{0, 1}/.0 -> "zero"
{0, 1}/. 0 -> "zero"
Execution:
$ export CLASSPATH=".:/usr/local/lib/antlr-4.6-complete.jar"
$ alias a4='java -jar /usr/local/lib/antlr-4.6-complete.jar'
$ alias grun='java org.antlr.v4.gui.TestRig'
$ a4 Question.g4
$ javac Q*.java
$ grun Question question -tokens -diagnostics t.text
[#0,0:0='x',<ID>,1:0]
[#1,1:1='/',<'/'>,1:1]
[#2,2:5='.123',<NUMBER>,1:2]
[#3,7:7='x',<ID>,2:0]
[#4,8:8='/',<'/'>,2:1]
[#5,9:9='.',<'.'>,2:2]
[#6,10:10='x',<ID>,2:3]
[#7,12:13='->',<'->'>,2:5]
[#8,15:15='1',<NUMBER>,2:8]
[#9,17:17='{',<'{'>,3:0]
...
[#29,47:47='}',<'}'>,4:5]
[#30,48:48='/',<'/'>,4:6]
[#31,49:50='.0',<NUMBER>,4:7]
...
[#40,67:67='}',<'}'>,5:5]
[#41,68:68='/',<'/'>,5:6]
[#42,69:69='.',<'.'>,5:7]
[#43,71:71='0',<NUMBER>,5:9]
...
[#48,83:82='<EOF>',<EOF>,6:0]
Question last update 0851
found division x by .123
found ReplaceAll x with x->1
found ReplaceAll {x,y} with {x->1,y->2}
found division {0,1} by .0
line 4:10 extraneous input '->' expecting {<EOF>, '"', '{', ID, NUMBER}
found ReplaceAll {0,1} with 0->"zero"
The input x/.123 is ambiguous up to the slash. At that point the parser has two choices: '/' NUMBER in the division rule, or '/' '.' replacement in the replace_all rule. I think the lexer's NUMBER rule absorbs the '.' (producing the token .123), so the ambiguity disappears.
(*) Yesterday the link was in a comment that has since disappeared: Wolfram Language & System, ReplaceAll.
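The following Java sketch (a hypothetical class, not ANTLR-generated code) approximates the Question.g4 lexer just enough to show the decision: because '/' and '.' are separate tokens and NUMBER is matched greedily, the token right after '/' determines whether division or replace_all applies. It only inspects the first '/' in the input, an assumption that holds for the sample lines.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the split-token approach: '/' and '.' are separate
// tokens, NUMBER ('.'? [0-9]+) is matched greedily, and the token after '/'
// selects the rule.
public class SlashDotDecision {

    static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == ' ' || c == '\t' || c == '\r' || c == '\n') { i++; continue; } // WS -> skip
            // NUMBER is tried first: when it matches, it is longer than a lone '.'
            int j = (c == '.') ? i + 1 : i;
            int digitsStart = j;
            while (j < s.length() && isDigit(s.charAt(j))) j++;
            if (j > digitsStart) { tokens.add("NUMBER"); i = j; continue; }
            if (c >= 'a' && c <= 'z') { // ID : [a-z]+
                while (i < s.length() && s.charAt(i) >= 'a' && s.charAt(i) <= 'z') i++;
                tokens.add("ID");
                continue;
            }
            if (s.startsWith("->", i)) { tokens.add("'->'"); i += 2; continue; }
            tokens.add("'" + c + "'"); // any other single-character literal: / . { } , "
            i++;
        }
        return tokens;
    }

    // Chooses the parser rule by looking at the token that follows '/'.
    static String classify(String input) {
        List<String> t = tokenize(input);
        int slash = t.indexOf("'/'");
        if (slash < 0 || slash + 1 >= t.size()) return "no match";
        String next = t.get(slash + 1);
        if (next.equals("NUMBER")) return "division";
        if (next.equals("'.'")) return "replace_all";
        return "no match";
    }

    public static void main(String[] args) {
        for (String in : new String[] {"x/.123", "x/.x -> 1", "{0, 1}/.0", "{0, 1}/. 0"}) {
            System.out.println(in + " => " + classify(in));
        }
    }
}
```

The classifications for these four sample lines agree with the grun output: x/.123 and {0, 1}/.0 are treated as divisions, while x/.x -> 1 and {0, 1}/. 0 are treated as replacements.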

ANTLR4 strange behavior in precedence

I have a very simple test grammar, as follows:
grammar Test;
statement: expression EOF;
expression
: Identifier
| expression binary_op expression
| expression assignment_operator expression
| expression '.' Identifier
;
binary_op: '+';
assignment_operator : '=' ;
Identifier : [a-zA-Z]+ ;
WS : [ \n\r\t]+ -> channel(HIDDEN) ;
With this version of the grammar, I get the expected behavior for the following input:
b.x + b.y
I get the tree (+ (. b x) (. b y))
However, if I replace expression binary_op expression with expression '+' expression, I get a very different tree: (. (+ (. b x) b) y)
Is there any explanation for this?
Thanks
You have to set the precedence explicitly, using something like this:
expr : expr2 (assignment_operator expr)? # Equals ;
expr2 : expr1 (binary_op expr2)? # Add ;
expr1 : Identifier
| expr1 '.' Identifier
;
This removes all ambiguity about operator precedence.
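As an illustration of how the layered rules fix precedence, here is a hand-written recursive-descent sketch in Java with one method per rule. It assumes expr3 in the answer's first rule is meant to be expr, and that binary_op and assignment_operator are the '+' and '=' from the question. The left-recursive expr1 rule is written as a loop, and output uses the question's (op left right) notation. The class is hypothetical and is not ANTLR-generated code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical recursive-descent sketch: each precedence level is its own
// method, so '.' binds tighter than '+', and '+' tighter than '='.
public class LayeredPrecedence {
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    LayeredPrecedence(String input) {
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                                    // hidden whitespace
            } else if (Character.isLetter(c)) {         // Identifier : [a-zA-Z]+
                int start = i;
                while (i < input.length() && Character.isLetter(input.charAt(i))) i++;
                tokens.add(input.substring(start, i));
            } else {
                tokens.add(String.valueOf(c));          // '.', '+', '='
                i++;
            }
        }
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : null; }

    // expr : expr2 ('=' expr)?
    String expr() {
        String left = expr2();
        if ("=".equals(peek())) { pos++; return "(= " + left + " " + expr() + ")"; }
        return left;
    }

    // expr2 : expr1 ('+' expr2)?
    String expr2() {
        String left = expr1();
        if ("+".equals(peek())) { pos++; return "(+ " + left + " " + expr2() + ")"; }
        return left;
    }

    // expr1 : Identifier | expr1 '.' Identifier   (left recursion as a loop)
    String expr1() {
        String result = identifier();
        while (".".equals(peek())) {
            pos++;
            result = "(. " + result + " " + identifier() + ")";
        }
        return result;
    }

    String identifier() {
        String t = peek();
        if (t == null || !Character.isLetter(t.charAt(0)))
            throw new IllegalStateException("expected identifier at token " + pos);
        pos++;
        return t;
    }

    static String parse(String input) {
        LayeredPrecedence p = new LayeredPrecedence(input);
        String tree = p.expr();
        if (p.peek() != null) throw new IllegalStateException("trailing input: " + p.peek());
        return tree;
    }

    public static void main(String[] args) {
        System.out.println(parse("b.x + b.y"));   // (+ (. b x) (. b y))
        System.out.println(parse("a = b.x + c")); // (= a (+ (. b x) c))
    }
}
```

Because expr2 and expr call themselves on their right-hand side, both + and = come out right-associative in this sketch (a+b+c gives (+ a (+ b c))); a loop like the one in expr1 would make + left-associative instead.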
Literals in the parser rules can confuse matters. Check and fix the errors/warnings reported when generating the parser. You likely need to move the literals from the parser rules to lexer rules.
You can verify that the lexer is operating as intended by dumping the token stream. That will provide a clear basis for understanding the path that the parser is taking.
Update
Neither of the parse-tree representations you list looks like a proper ANTLR4 parse tree. Nonetheless, I tried both variants of your grammar and consistently get:
Token dump:
Identifier: [#0,0:0='b',<4>,1:0]
Dot: [#1,1:1='.',<3>,1:1]
Identifier: [#2,2:2='x',<4>,1:2]
null: [#4,4:4='+',<1>,1:4]
Identifier: [#6,6:6='b',<4>,1:6]
Dot: [#7,7:7='.',<3>,1:7]
Identifier: [#8,8:8='y',<4>,1:8]
Tree dump:
(statement (expression (expression (expression (expression b) . x) + (expression b)) . y) <EOF>)
using
ParseTree tree = parser.statement();
System.out.print(tree.toStringTree(parser));
The nulls in this particular token dump appear because the symbol ('+') is first defined, as a literal, in a parser rule, so it has no lexer rule name.

Rascal error when specifying grammar

I have a simple file in Rascal for specifying a toy grammar:
module temp
import IO;
import ParseTree;
layout LAYOUT = [\t-\n\r\ ]*;
start syntax Simple
= A B ;
syntax A = "Hello"+ ("joe" "pok")* ;
syntax A= "Hi";
syntax B = "world"*|"wembly";
syntax B = C | C C* ;
public void main () {
println("hello");
iprint(parse(#start[Simple], "Hello Hello world world world"));
}
This works fine; however, I didn't want to write
syntax B = C | C C* ;
I wanted to write
syntax B = ( C | C C* )?
but Rascal rejected it with a parse error, even though all of
syntax B = ( C C C* )? ;
syntax B = ( C | C* )? ;
syntax B = C | C C* ;
are accepted fine. Can anyone explain to me what I'm doing wrong?
The sequence symbol (nested sequence) always requires brackets in Rascal. The meta-notation is defined as:
syntax Sym = sequence: "(" Sym+ ")" | opt: Sym "?" | alternative: "(" Sym "|" {Sym "|"}+ ")" | ... ;
So, in your example you should have written:
syntax B = (C | (C C*))?;
What is perhaps confusing is that Rascal uses the | sign in two ways: once for separating top-level alternatives, and once for nested alternatives:
syntax X = "a" | "b"; // top-level
syntax Y = ("c" | "d"); // nested, will internally generate a new rule:
syntax ("c" | "d") = "c" | "d";
Finally, normal alternatives have sequences without brackets, as in:
syntax B
= C
| C C*
;
// or less abstractly:
syntax Exp = left Exp "*" Exp
> left Exp "+" Exp
;
BTW, we generally avoid the use of too many nested regular expressions because they are so anonymous and therefore make interpreting parse trees harder. The best usage of regular expressions is for expressing lexical syntax where we are not so much interested in the internal structure anyhow.

How to get rid of useless nodes from this AST tree?

I have already looked at this question, and even though the titles seem to be the same, it doesn't answer my question, at least not in any way that I can understand.
Parsing Math
Here is what I am parsing:
PI -> 3.14.
Number area(Number radius) -> PI * radius^2.
This is how I want my AST tree to look, minus all the useless root nodes.
how it should look http://vertigrated.com/images/How%20I%20want%20the%20tree%20to%20look.png
Here are what I hope are the relevant fragments of my grammar:
term : '(' expression ')'
| number -> ^(NUMBER number)
| (function_invocation)=> function_invocation
| ATOM
| ID
;
power : term ('^' term)* -> ^(POWER term (term)* ) ;
unary : ('+'! | '-'^)* power ;
multiply : unary ('*' unary)* -> ^(MULTIPLY unary (unary)* ) ;
divide : multiply ('/' multiply)* -> ^(DIVIDE multiply (multiply)* );
modulo : divide ('%' divide)* -> ^(MODULO divide (divide)*) ;
subtract : modulo ('-' modulo)* -> ^(SUBTRACT modulo (modulo)* ) ;
add : subtract ('+' subtract)* -> ^(ADDITION subtract (subtract)*) ;
relation : add (('=' | '!=' | '<' | '<=' | '>=' | '>') add)* ;
expression : relation (and_or relation)*
| string
| container_access
;
and_or : '&' | '|' ;
Precedence
I still want to keep the precedence as illustrated in the following diagrams, but want to eliminate the useless nodes if at all possible.
Source: Number a(x) -> 0 - 1 + 2 * 3 / 4 % 5 ^ 6.
Here are the nodes I want to eliminate:
how I want the precedence tree to look http://vertigrated.com/images/example%202%20desired%20result.png
Basically, I want to eliminate every node that does not directly branch into a binary operation.
You must realize that the two rules:
add : sub ( ('+' sub)+ -> ^(ADD sub (sub)*) | -> sub ) ;
and
add : sub ('+'^ sub)* ;
do not produce the same AST. Given the input 1+2+3, the first rule will produce:
    ADD
     |
 .---+---.
 |   |   |
 1   2   3
where the second rule produces:
      (+)
       |
    .--+--.
    |     |
   (+)    3
    |
 .--+--.
 |     |
 1     2
The latter makes more sense: infix expressions are expected to have 2 child nodes, not more.
Why not simply remove the literals in your parser rules and just do:
add : sub (ADD^ sub)*;
ADD : '+';
Creating the same AST using a rewrite rule would look like this:
add : (sub -> sub) ('+' s=sub -> ^(ADD $add $s))*;
Also see Chapter 7, Tree Construction, of The Definitive ANTLR Reference, especially the sections Rewrite Rules in Subrules (page 173) and Referencing Previous Rule ASTs in Rewrite Rules (pages 174-175).
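Here is a minimal Java sketch of the tree that add : (sub -> sub) ('+' s=sub -> ^(ADD $add $s))*; builds, assuming a hypothetical Node record as a stand-in for ANTLR's tree class and plain leaves as operands: the first operand becomes the tree, and every '+' wraps the tree built so far ($add) and the next operand ($s) in a new ADD node.

```java
import java.util.List;

// Hypothetical sketch of the "reference the previous rule AST" rewrite.
public class LeftNestedAst {

    // Stand-in for an ANTLR tree node: a label plus child nodes.
    record Node(String label, List<Node> children) {
        static Node leaf(String text) { return new Node(text, List.of()); }

        @Override
        public String toString() {
            if (children.isEmpty()) return label;
            StringBuilder sb = new StringBuilder("(").append(label);
            for (Node c : children) sb.append(' ').append(c);
            return sb.append(')').toString();
        }
    }

    // Operands are plain leaves (no nested sub-expressions) to keep this short.
    static Node add(String input) {
        String[] operands = input.split("\\+");
        Node tree = Node.leaf(operands[0].trim());        // (sub -> sub)
        for (int k = 1; k < operands.length; k++) {
            Node s = Node.leaf(operands[k].trim());       // s=sub
            tree = new Node("ADD", List.of(tree, s));     // ^(ADD $add $s)
        }
        return tree;
    }

    public static void main(String[] args) {
        System.out.println(add("1+2+3")); // (ADD (ADD 1 2) 3)
        System.out.println(add("1"));     // 1
    }
}
```

A lone operand is returned unchanged, so no useless ADD node appears, and a chain of additions nests to the left with exactly two children per node.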
Your rule (and others like it)
add : subtract ('+' subtract)* -> ^(ADDITION subtract (subtract)*) ;
produces a useless node when there is no sequence of add operations.
I'm not an ANTLR expert, but I'd guess you need two cases: one for a sequence of operands, which generates your standard tree, and one for a single operand, which simply passes the child tree up to the parent without creating a new node.
add : subtract ( ('+' subtract)+ -> ^(ADDITION subtract (subtract)*)
| -> subtract ) ;
Make similar changes to the other rules that apply an operator to a sequence of operands.
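For comparison, here is a Java sketch of this two-case rule. It renders trees as strings rather than real ANTLR trees (an assumption for brevity), and the class name is hypothetical: two or more operands produce one flat ADDITION node, while a single operand is passed up unchanged.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of
//   add : subtract ( ('+' subtract)+ -> ^(ADDITION subtract (subtract)*)
//                  | -> subtract ) ;
public class PassThroughAst {

    static String add(String input) {
        List<String> operands = new ArrayList<>();
        for (String part : input.split("\\+")) operands.add(part.trim());
        if (operands.size() == 1) {
            return operands.get(0);                              // | -> subtract
        }
        return "(ADDITION " + String.join(" ", operands) + ")";  // one flat n-ary node
    }

    public static void main(String[] args) {
        System.out.println(add("1+2+3")); // (ADDITION 1 2 3)
        System.out.println(add("7"));     // 7
    }
}
```

This removes the useless nodes but keeps the flat n-ary shape, which differs from the nested two-child trees produced by ('+'^ sub)* style rules.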
To get rid of the irrelevant nodes, just be explicit:
subtract
:
modulo
(
( '-' modulo)+ -> ^(SUBTRACT modulo+) // no need for parentheses or asterisk
|
() -> modulo
)
;
Even though I accepted Bart's answer as correct, I wanted to post my own complete answer, with the example code I got working, for completeness.
Here is what I did based on Bart's answer:
unary : ('+'! | '-'^)? term ;
pow : (unary -> unary) ('^' s=unary -> ^(POWER $pow $s))*;
mod : (pow -> pow) ('%' s=pow -> ^(MODULO $mod $s))*;
mult : (mod -> mod) ('*' s=mod -> ^(MULTIPLY $mult $s))*;
div : (mult -> mult) ('/' s=mult -> ^(DIVIDE $div $s))*;
sub : (div -> div) ('-' s=div -> ^(SUBTRACT $sub $s))*;
add : (sub -> sub) ('+' s=sub -> ^(ADD $add $s))*;
And here is what the resulting tree looks like:
working answer http://vertigrated.com/images/working_answer.png
An alternative solution is to skip the rewrites and promote the operator symbols themselves to roots, but I want descriptive labels in my tree wherever possible. I am just being picky about how the tree is represented so that my tree-walking code will be as clean as possible!
power : unary ('^'^ unary)* ;
mod : power ('%'^ power)* ;
mult : mod ('*'^ mod)* ;
div : mult ('/'^ mult)* ;
sub : div ('-'^ div)* ;
add : sub ('+'^ sub)* ;
This version produces the following tree:
without rewrites http://vertigrated.com/images/without_the_rewrites.png

simple antlr template rewrite

I am totally new to ANTLR. I am trying to use ANTLR for some language translation. I think I am using the right syntax, but I get an exception.
The following is part of the grammar:
primary
: parExpression
| 'this' ('.' Identifier)* identifierSuffix?
| 'super' superSuffix
| literal
| 'new' creator
| Identifier ('.' Identifier)* identifierSuffix?
| primitiveType ('[' ']')* '.' 'class'
| 'void' '.' 'class'
;
I added a rewrite rule, for example:
| 'new' creator -> 'mynew' creator
This exception occurs:
[11:11:48] error(100): rjava_new_rewrite.g:851:26: syntax error: antlr: NoViableAltException(58#[921:1: rewrite_alternative options {k=1; } : ({...}? => rewrite_template | {...}? => ( rewrite_element )+ -> {!stream_rewrite_element.hasNext()}? ^( ALT[LT(1),"ALT"] EPSILON["epsilon"] EOA["<end-of-alt>"] ) -> ^( ALT[LT(1),"ALT"] ( rewrite_element )+ EOA["<end-of-alt>"] ) | -> ^( ALT[LT(1),"ALT"] EPSILON["epsilon"] EOA["<end-of-alt>"] ) | {...}? ETC );])
[11:11:48] error(100): rjava_new_rewrite.g:851:34: syntax error: antlr: MissingTokenException(inserted [#-1,0:0='<missing SEMI>',<52>,851:33] at creator)
[11:11:48] error(100): rjava_new_rewrite.g:852:5: syntax error: antlr: MissingTokenException(inserted [#-1,0:0='<missing COLON>',<54>,852:4] at |)
[11:11:48] error(100): rjava_new_rewrite.g:0:1: syntax error: assign.types: MismatchedTreeNodeException(0!=3)
[11:11:48] error(100): rjava_new_rewrite.g:0:1: syntax error: assign.types: MismatchedTreeNodeException(3!=28)
[11:11:48] error(100): rjava_new_rewrite.g:0:1: syntax error: assign.types: MismatchedTreeNodeException(3!=27)
[11:11:48] java.util.NoSuchElementException: can't look backwards more than one token in this stream
at org.antlr.runtime.misc.LookaheadStream.LB(LookaheadStream.java:159)
at org.antlr.runtime.misc.LookaheadStream.LT(LookaheadStream.java:120)
at org.antlr.runtime.RecognitionException.extractInformationFromTreeNodeStream(RecognitionException.java:144)
at org.antlr.runtime.RecognitionException.<init>(RecognitionException.java:111)
at org.antlr.runtime.MismatchedTreeNodeException.<init>(MismatchedTreeNodeException.java:42)
at org.antlr.runtime.tree.TreeParser.recoverFromMismatchedToken(TreeParser.java:135)
at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
at org.antlr.grammar.v3.AssignTokenTypesWalker.grammar_(AssignTokenTypesWalker.java:388)
at org.antlr.tool.CompositeGrammar.assignTokenTypes(CompositeGrammar.java:337)
at org.antlr.tool.Grammar.setGrammarContent(Grammar.java:605)
at org.antlr.works.grammar.antlr.ANTLRGrammarEngineImpl.createNewGrammar(ANTLRGrammarEngineImpl.java:192)
at org.antlr.works.grammar.antlr.ANTLRGrammarEngineImpl.createParserGrammar(ANTLRGrammarEngineImpl.java:225)
at org.antlr.works.grammar.antlr.ANTLRGrammarEngineImpl.createCombinedGrammar(ANTLRGrammarEngineImpl.java:203)
at org.antlr.works.grammar.antlr.ANTLRGrammarEngineImpl.createGrammars(ANTLRGrammarEngineImpl.java:165)
at org.antlr.works.grammar.engine.GrammarEngineImpl.getGrammarLanguage(GrammarEngineImpl.java:115)
at org.antlr.works.components.GrammarWindowMenu.getEditTestRigTitle(GrammarWindowMenu.java:244)
at org.antlr.works.components.GrammarWindowMenu.menuItemState(GrammarWindowMenu.java:529)
at org.antlr.works.components.GrammarWindow.menuItemState(GrammarWindow.java:440)
at org.antlr.xjlib.appkit.menu.XJMainMenuBar.refreshMenuItemState(XJMainMenuBar.java:175)
at org.antlr.xjlib.appkit.menu.XJMainMenuBar.refreshMenuState(XJMainMenuBar.java:169)
at org.antlr.xjlib.appkit.menu.XJMainMenuBar.refreshState(XJMainMenuBar.java:153)
at org.antlr.xjlib.appkit.menu.XJMainMenuBar.refresh(XJMainMenuBar.java:145)
at org.antlr.works.grammar.decisiondfa.DecisionDFAEngine.refreshMenu(DecisionDFAEngine.java:203)
at org.antlr.works.components.GrammarWindow.afterParseOperations(GrammarWindow.java:1179)
at org.antlr.works.components.GrammarWindow.access$200(GrammarWindow.java:96)
at org.antlr.works.components.GrammarWindow$AfterParseOperations.threadRun(GrammarWindow.java:1553)
at org.antlr.works.ate.syntax.misc.ATEThread.run(ATEThread.java:152)
at java.lang.Thread.run(Thread.java:680)
Can anyone give any idea?
qinsoon wrote:
I think I use the right syntax, but I got exception.
You thought wrong. :)
You cannot put things in a rewrite rule that were not matched by the parser. So in your case:
| 'new' creator -> 'mynew' creator
'mynew' is wrong because the parser never encountered such a token/rule. If you want to insert tokens into your AST that the parser did not encounter (called imaginary tokens in ANTLR), you need to define them in the tokens {...} section of your grammar, like this:
grammar YourGrammarName;
options {
output=AST;
}
tokens {
MYNEW;
}
primary
: ...
| 'new' creator -> ^(MYNEW creator)
| ...
;
This will insert a node with the type MYNEW and the inner text "MYNEW". If you want to associate custom text with the node, do it like this:
primary
: ...
| 'new' creator -> ^(MYNEW["mynew"] creator)
| ...
;
As you can see from the above, I created an AST whose root node is MYNEW. If I instead write:
| 'new' creator -> MYNEW creator
ANTLR will return 2 nodes from that rule. You'd get into trouble if that rule ever became the root of another (sub)tree: you could end up with an AST with 2 roots! Always strive to make rewrite rules produce either a single AST/node:
rule
: subrule ';' -> subrule // omit the semi-colon
;
or when more nodes need to be created, make a proper AST with a single root-node:
rule
: AToken subrule1 subrule2 ';' -> ^(AToken subrule1 subrule2) // AToken is the root
;