K Framework: Cannot convert to subtype

I'm trying to evaluate Expressions to Values (Exps ::= Values) for function calls.
Here's a simple example:
module ERL-SYNTAX
  imports INT-SYNTAX
  imports STRING

  syntax Atom ::= "main" | "f"
  syntax Exp  ::= Atom | Int
  syntax Exp  ::= Exp "(" Exps ")" [seqstrict]
  syntax Exps ::= List{Exp, ","} [seqstrict]
endmodule
module ERL-CONFIGURATION
  imports ERL-SYNTAX
  imports MAP

  syntax Value  ::= Atom | Int | "{" Values "}"
  syntax Values ::= List{Value, ","}
  syntax Exp  ::= Value
  syntax Exps ::= Values
  syntax KResult ::= Value
  syntax KResult ::= Values

  configuration <cfg color="yellow">
                  <k color="green"> $PGM:Exp </k>
                  <fundefs> // some default function definitions
                    .Map (f |-> 5 , .Exps
                          main |-> f ( 2 , 3 , .Exps ) , .Exps )
                  </fundefs>
                </cfg>
endmodule
module ERL
  imports ERL-SYNTAX
  imports ERL-CONFIGURATION

  // rule .Exps => .Values
  rule <k> F:Atom(_:Values) => L ...</k>
       <fundefs>... F |-> L ...</fundefs>
endmodule
This gets stuck at
.Exps ~> #freezer_(_)ERL-SYNTAX1 ( main )
So I added the rule .Exps => .Values to evaluate main().
The strange thing to me is that this time heating 3 works:
.Values ~> #freezer_,_ERL-SYNTAX1 ( 3 ) ~> #freezer_,_ERL-SYNTAX1 ( 2 ) ~> ...
becomes
3 , .Values ~> #freezer_,_ERL-SYNTAX1 ( 2 ) ~> ...
but here it gets stuck again.
How should I approach this problem?

Put the productions for Exps and Values in the same module and give them the same klabel attribute. This will make them overload one another, at which point the fact that .Values is a KResult should solve your problem.
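For example, a minimal sketch of that change (my adaptation, untested against this exact definition; it mirrors the klabel fix shown for the Scheme question below):

module ERL-SYNTAX
  imports INT-SYNTAX

  syntax Atom ::= "main" | "f"
  syntax Exp  ::= Atom | Int
  syntax Exp  ::= Exp "(" Exps ")" [seqstrict]

  // Same module, same klabel: the Exps and Values lists now overload,
  // so a fully evaluated Exps (e.g. .Exps) is recognized as a Values,
  // which is a KResult and can therefore cool.
  syntax Value  ::= Atom | Int | "{" Values "}"
  syntax Exps   ::= List{Exp, ","}   [seqstrict, klabel(exps)]
  syntax Values ::= List{Value, ","} [klabel(exps)]

  syntax Exp  ::= Value
  syntax Exps ::= Values
endmodule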

Why does a list of values not cool on the LLVM backend of K?

When trying to define the syntax for a Scheme-like language, I found that the kompiled definition behaves differently with the Java backend
kompile --backend java scheme.k -d .
than with the LLVM backend
kompile --backend llvm scheme.k -d .
Here's my code for scheme.k:
module SCHEME-COMMON
  imports DOMAINS-SYNTAX

  syntax Name ::= "+" | "-" | "*" | "/"
                | "display" | "newline"
  syntax Names ::= List{Name," "}
  syntax Exp ::= Int | Bool | String | Name
               | "[" Name Exps "]" [strict(2)]
  syntax Exps ::= List{Exp," "} [strict]
  syntax Val
  syntax Vals ::= List{Val," "}
  syntax Bottom
  syntax Bottoms ::= List{Bottom," "}
  syntax Pgm ::= Exp Pgm [strict(1)]
               | "eof"
endmodule
module SCHEME-SYNTAX
  imports SCHEME-COMMON
  imports BUILTIN-ID-TOKENS

  syntax Name ::= r"[a-z][_a-zA-Z0-9]*" [token, prec(2)]
                | #LowerId [token]
endmodule

module SCHEME-MACROS
  imports SCHEME-COMMON
endmodule
module SCHEME
  imports SCHEME-COMMON
  imports SCHEME-MACROS
  imports DOMAINS

  configuration <T color="yellow">
                  <k color="green"> $PGM:Pgm </k>
                  <env color="violet"> .Map </env>
                  <store color="white"> .Map </store>
                  <input color="magenta" stream="stdin"> .List </input>
                  <output color="brown" stream="stdout"> .List </output>
                </T>

  syntax Val ::= Int | Bool | String
  syntax Exp ::= Val
  syntax Exps ::= Vals
  syntax Vals ::= Bottoms
  syntax Exps ::= Names
  syntax Names ::= Bottoms
  syntax KResult ::= Vals | Val

  rule _:Val P:Pgm => P
    when notBool(P ==K eof)
  rule V:Val eof => V

  rule [+ I1 I2 Vals] => [+ (I1 +Int I2) Vals] [arith]
  rule [+ I .Vals] => I [arith]
  rule [- I1 I2 Vals] => [- (I1 -Int I2) Vals] [arith]
  rule [- I .Vals] => I [arith]
  rule [* I1 I2 Vals] => [* (I1 *Int I2) Vals] [arith]
  rule [* I .Vals] => I [arith]
  rule [/ I1 I2 Vals] => [/ (I1 /Int I2) Vals]
    when I2 =/=K 0 [arith]
  rule [/ I .Vals] => I [arith]

  rule <k> [newline .Exps] => "" ...</k>
       <output>... .List => ListItem("\n") </output> [io]
  rule <k> [display V:Val] => "" ...</k>
       <output>... .List => ListItem(V) </output> [io]
endmodule
and this is the test file I'm trying to run:
[display 8]
eof
Strangely, the version kompiled with the Java backend runs this test case normally, while the version kompiled with the LLVM backend gets stuck at
<k>
8 .Bottoms ~> #freezer[__]_SCHEME-COMMON_Exp_Name_Exps0_ ( display ) ~> #freezer___SCHEME-COMMON_Pgm_Exp_Pgm1_ ( eof )
</k>
What might be a possible reason? The version information for kompile is
RV-K version 1.0-SNAPSHOT
Git revision: a7c2937
Git branch: UNKNOWN
Build date: Wed Feb 12 09:46:03 CST 2020
In the LLVM and Haskell backends, two productions are said to overload one another when they share the same arity and klabel attribute, all the argument sorts of one production are less than or equal to the argument sorts of the other, and the result sort of the first is less than the result sort of the other. Special consideration is given during matching to terms that overload: in your example, if the list of Exps and the list of Vals overloaded, then a pattern V:Vals would match the term V:Val, .Exps of sort Exps.
By default, the Java backend assumes that all List productions between sorts that have a subsort relationship overload. However, the LLVM and Haskell backends do not make this assumption. Thus, your example will work if you give the same klabel attribute to your Exps list and your Vals list. We do not make the same assumption in the LLVM backend because we have found that it tends to lead to serious ambiguity in your grammar in places where you do not expect it.
For example:
module SCHEME-COMMON
  imports DOMAINS-SYNTAX

  syntax Name ::= "+" | "-" | "*" | "/"
                | "display" | "newline"
  syntax Names ::= List{Name," "} [klabel(exps)]
  syntax Exp ::= Int | Bool | String | Name
               | "[" Name Exps "]" [strict(2)]
  syntax Exps ::= List{Exp," "} [strict, klabel(exps)]
  syntax Val
  syntax Vals ::= List{Val," "} [klabel(exps)]
  syntax Bottom
  syntax Bottoms ::= List{Bottom," "} [klabel(exps)]
  syntax Pgm ::= Exp Pgm [strict(1)]
               | "eof"
endmodule
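With the shared klabel(exps), the stuck term above should now cool: 8 .Bottoms is recognized as a Vals, hence a KResult, and the LLVM backend can plug it back into the freezers. (That is my reading of the overloading behavior described above; I have not re-run this exact definition.)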

Grammar for string interpolation where malformed interpolations are treated as normal strings

Here is a subset of the language I want to parse:
A program consists of statements
A statement is an assignment: A = "b"
Assignment's left side is an identifier (all caps)
Assignment's right side is a string enclosed by quotation marks
A string supports string interpolation by inserting a bracket-enclosed identifier (A = "b[C]d")
So far this is straightforward enough. Here is what works:
Lexer:
lexer grammar string_testLexer;
STRING_START: '"' -> pushMode(STRING);
WS: [ \t\r\n]+ -> skip ;
ID: [A-Z]+;
EQ: '=';
mode STRING;
VAR_START: '[' -> pushMode(INTERPOLATION);
DOUBLE_QUOTE_INSIDE: '"' -> popMode;
REGULAR_STRING_INSIDE: ~('"'|'[')+;
mode INTERPOLATION;
ID_INSIDE: [A-Z]+;
CLOSE_BRACKET_INSIDE: ']' -> popMode;
Parser:
parser grammar string_testParser;
options { tokenVocab=string_testLexer; }
mainz: stat *;
stat: ID EQ string;
string: STRING_START string_part* DOUBLE_QUOTE_INSIDE;
string_part: interpolated_var | REGULAR_STRING_INSIDE;
interpolated_var: VAR_START ID_INSIDE CLOSE_BRACKET_INSIDE;
So far so good. However there is one more language feature:
if there is no valid identifier (that is all caps) in the brackets, treat as normal string.
Eg:
A = "hello" => "hello"
B = "h[A]a" => "h", A, "a"
C="h [A] a" => "h ", A, " a"
D="h [A][V] a" => "h ", A, V, " a"
E = "h [A] [V] a" => "h ", A, " ", V, " a"
F = "h [aVd] a" => "h [aVd] a"
G = "h [Va][VC] a" => "h [Va]", VC, " a"
H = "h [V][][ff[Z]" => "h ", V, "[][ff", Z
I tried to replace REGULAR_STRING_INSIDE: ~('"'|'[')+; with just REGULAR_STRING_INSIDE: ~('"')+;, but that does not work in ANTLR: it results in matching all the lines above as plain strings.
Since in ANTLR4 there is no backtracking to enable, I'm not sure how to overcome this and tell ANTLR that if it did not match the interpolated_var rule it should go ahead and match REGULAR_STRING_INSIDE instead; it seems to always choose the latter.
I read that the lexer always matches the longest token, so I tried to lift REGULAR_STRING_INSIDE and VAR_START into parser rules, hoping that the order of alternatives in the parser would be honoured:
r: REGULAR_STRING_INSIDE
v: VAR_START
string: STRING_START string_part* DOUBLE_QUOTE_INSIDE;
string_part: v ID_INSIDE CLOSE_BRACKET_INSIDE | r;
That did not seem to make any difference at all.
I also read that ANTLR4 semantic predicates could help, but I have trouble coming up with the ones that need to be applied in this case.
How do I modify this grammar above so that it can match both interpolated bits, or treat them as strings if they are malformed?
Test input:
A = "hello"
B = "h[A]a"
C="h [A] a"
D="h [A][V] a"
E = "h [A] [V] a"
F = "h [aVd] a"
G = "h [Va][VC] a"
H = "h [V][][ff[Z]"
How I compile / test:
antlr4 string_testLexer.g4
antlr4 string_testParser.g4
javac *.java
grun string_test mainz st.txt -tree
I tried to replace REGULAR_STRING_INSIDE: ~('"'|'[')+; with just REGULAR_STRING_INSIDE: ~('"')+;, but that does not work in ANTLR: it results in matching all the lines above as plain strings.
Correct, ANTLR tries to match as much as possible. So ~('"')+ will be far too greedy.
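For instance, with B = "h[A]a", the greedy rule would consume all of h[A]a as a single REGULAR_STRING_INSIDE token, so VAR_START never gets a chance to match (my illustration of that maximal-munch behavior, not from the original answer).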
I also read that ANTLR4 semantic predicates could help.
Only use predicates as a last resort. It introduces target specific code in your grammar. If it's not needed (which in this case it isn't), then don't use them.
Try something like this:
REGULAR_STRING_INSIDE
  : ( ~( '"' | '[' )+
    | '[' [A-Z]* ~( ']' | [A-Z] )
    | '[]'
    )+
  ;
The rule above would read as:
match any char other than " or [ once or more
OR match a [ followed by zero or more capitals, followed by any char other than ] or a capital (your [Va and [aVd cases)
OR match an empty block, []
And match one of these 3 alternatives above once or more to create a single REGULAR_STRING_INSIDE.
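As a sanity check, here is case G from the test input hand-traced against these three alternatives (my own trace, not grun output):

G = "h [Va][VC] a"

REGULAR_STRING_INSIDE "h [Va]"  // "h " via alt 1; "[Va" via alt 2 ('[', capital "V", then non-capital 'a'); "]" via alt 1
VAR_START "["                   // alt 2 cannot match "[VC]" because ']' follows the capitals, so the token ends before '['
ID_INSIDE "VC"
CLOSE_BRACKET_INSIDE "]"
REGULAR_STRING_INSIDE " a"

which is exactly the expected "h [Va]", VC, " a".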
And if a string can end with one or more [, you may also want to do this:
DOUBLE_QUOTE_INSIDE
: '['* '"' -> popMode
;

Antlr4 Grammar for Function Application

I'm trying to write a simple lambda calculus grammar (shown below). The issue I am having is that function application seems to be treated as right-associative instead of left-associative, e.g. "f 1 2" is parsed as (f (1 2)) instead of ((f 1) 2). ANTLR has an assoc option for tokens, but I don't see how that helps here since there is no operator for function application. Does anyone see a solution?
LAMBDA : '\\';
DOT : '.';
OPEN_PAREN : '(';
CLOSE_PAREN : ')';
fragment ID_START : [A-Za-z+\-*/_];
fragment ID_BODY : ID_START | DIGIT;
fragment DIGIT : [0-9];
ID : ID_START ID_BODY*;
NUMBER : DIGIT+ (DOT DIGIT+)?;
WS : [ \t\r\n]+ -> skip;
parse : expr EOF;
expr : variable #VariableExpr
| number #ConstantExpr
| function_def #FunctionDefinition
| expr expr #FunctionApplication
| OPEN_PAREN expr CLOSE_PAREN #ParenExpr
;
function_def : LAMBDA ID DOT expr;
number : NUMBER;
variable : ID;
Thanks!
This breaks 4.1's pattern matcher for left-recursion. It was cleaned up in the main branch, I believe; try downloading the latest master and building it. Currently 4.1 generates:
expr[int _p]
  : ( {} variable
    | number
    | function_def
    | OPEN_PAREN expr CLOSE_PAREN
    )
    ( {2 >= $_p}? expr
    )*
  ;
for that rule. The expr reference in the loop is actually expr[0], which isn't right.
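Until then, one workaround (my sketch, not part of the original answer) is to make application an explicitly left-recursive alternative over a separate atom rule; ANTLR 4 rewrites direct left recursion left-associatively, so "f 1 2" parses as ((f 1) 2):

expr : expr atom                     #FunctionApplication
     | atom                          #AtomExpr
     ;

atom : variable                      #VariableExpr
     | number                        #ConstantExpr
     | function_def                  #FunctionDefinition
     | OPEN_PAREN expr CLOSE_PAREN   #ParenExpr
     ;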

GOLD Parser comment grammar

I'm having some trouble with comment blocks in my grammar. The syntax is fine, but step 3 (building the DFA scanner) complains about the way I'm going about it.
The language I'm trying to parse looks like this:
{statement}{statement} etc.
Within each statement there can be a couple of different types of comments:
{% This is a comment.
It can contain multiple lines
and continues until the statement end}
{statement REM This is a comment.
It can contain multiple lines
and continues until the statement end}
This is a simplified grammar that displays the problem I'm running into:
"Start Symbol" = <Program>
{String Chars} = {Printable} + {HT} - ["\]
StringLiteral = '"' ( {String Chars} | '\' {Printable} )* '"'
Comment Start = '{%'
Comment End = '}'
Comment Block #= { Ending = Closed } ! Eat the } and produce an empty statement
!Comment #= { Type = Noise } !Implied by GOLD
Remark Start = 'REM'
Remark End = '}'
Remark Block #= { Ending = Open } ! Don't eat the }, the statements expects it
Remark #= { Type = Noise }
<Program> ::= <Statements>
<Statements> ::= '{' <Statement> '}' <Statements> | <>
<Statement> ::= StringLiteral
Step 3 complains about the } in <Statements> and the } that ends the lexical group.
Anyone know how to accomplish what I need?
[Edit]
I got the REM portion working with the following:
{Remark Chars} = {Printable} + {WhiteSpace} - [}]
Remark = 'REM' {Remark Chars}* '}'
<Statements> ::= <Statements> '{' <Statement> '}'
               | <Statements> '{' <Statement> <Remark Stmt>
               | <>
<Remark Stmt> ::= Remark
This is actually ideal, since Remarks are not necessarily noise to me.
Still having issues with the comment lexical group. I'll look at solving it in the same way.
I don't think capturing the REM comment with a lexical group is possible.
I think you need to define a new terminal like this:
Remark = 'REM' ({Printable} - '}')*
This, however, means that you need to be able to handle this new terminal in your productions...
E.g., from:
<CurlyStatement> ::= '{' <Statement> '}'
To:
<CurlyStatement> ::= '{' <Statement> '}'
| '{' <Statement> Remark '}'
I haven't checked the syntax in the above examples, but I hope you get my idea.

ANTLR: how to parse a region within matching brackets with a lexer

I want to parse something like this in my lexer:
( begin expression )
where expressions are also surrounded by brackets. It isn't important what is in the expression; I just want to have everything between the (begin and the matching ) as a token. An example would be:
(begin
(define x (+ 1 2)))
so the text of the token should be (define x (+ 1 2)))
Something like
PROGRAM : LPAREN BEGIN .* RPAREN;
does (obviously) not work because as soon as the lexer sees a ")", it thinks the rule is over, but I need the matching bracket for this.
How can I do that?
Inside lexer rules, you can invoke rules recursively, so that's one way to solve this. Another approach would be to keep track of the number of open and close parentheses and let a gated semantic predicate loop as long as your counter is more than zero.
A demo:
T.g
grammar T;
parse
: BeginToken {System.out.println("parsed :: " + $BeginToken.text);} EOF
;
BeginToken
@init{int open = 1;}
  : '(' 'begin' ( {open > 0}?=>       // keep repeating `( ... )*` as long as open > 0
                  ( ~('(' | ')')      // match anything other than parentheses
                  | '(' {open++;}     // match a '(' and increment the var `open`
                  | ')' {open--;}     // match a ')' and decrement the var `open`
                  )
                )*
  ;
Main.java

import org.antlr.runtime.*;

public class Main {
  public static void main(String[] args) throws Exception {
    String input = "(begin (define x (+ (- 1 3) 2)))";
    TLexer lexer = new TLexer(new ANTLRStringStream(input));
    TParser parser = new TParser(new CommonTokenStream(lexer));
    parser.parse();
  }
}
java -cp antlr-3.3-complete.jar org.antlr.Tool T.g
javac -cp antlr-3.3-complete.jar *.java
java -cp .:antlr-3.3-complete.jar Main

which prints:

parsed :: (begin (define x (+ (- 1 3) 2)))
Note that you'll need to beware of string literals inside your source that might include parenthesis:
BeginToken
@init{int open = 1;}
  : '(' 'begin' ( {open > 0}?=>          // ...
                  ( ~('(' | ')' | '"')   // ...
                  | '(' {open++;}        // ...
                  | ')' {open--;}        // ...
                  | '"' ...              // TODO: define a string literal here
                  )
                )*
  ;
or comments that may contain parenthesis.
The suggestion with the predicate uses some language-specific code (Java, in this case). An advantage of calling a lexer rule recursively is that you don't have custom code in your lexer:
BeginToken
  : '(' Spaces? 'begin' Spaces? NestedParens Spaces? ')'
  ;

fragment NestedParens
  : '(' ( ~('(' | ')') | NestedParens )* ')'
  ;

fragment Spaces
  : (' ' | '\t')+
  ;
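For the demo input used earlier, this variant should likewise yield a single BeginToken (I have not re-run it), with the same caveat as before: string literals or comments containing parentheses would still need their own alternatives inside NestedParens.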