K Framework: problem with semantic cast in function declarations

I'm trying to update an Erlang semantics from K 3.6 to K 5.0 and ran into the following issue.
When I write a function declaration rule without a semantic cast on the arguments, it works fine:
rule Name:Atom(Args) -> Body . =>. ... [structural]
But when I add a cast to the arguments, as in the following rule, kompile reports [Error] Inner Parser: Parse error: unexpected token ')'.
rule Name:Atom(Args:Values) -> Body => . ... [structural]
To reproduce, here is my simplified syntax:
imports STRING
syntax UnquotedAtom ::= r"[a-z][_a-zA-Z0-9#]*" [token]
syntax Atom ::= UnquotedAtom | Bool
syntax Exp ::= Atom
syntax Exps ::= List{Exp, ","} [strict, klabel("exps"), prefer, listexps]
syntax FunCl ::= Atom"("Exps")" "->" Exps "." [funcl1]
syntax Value ::= Atom
syntax Values ::= List{Value, ","}
syntax Exp ::= Value
syntax KResult ::= Value
// Function declaration
//ok
rule <k>Name:Atom(Args) -> Body . =>. ...</k> [structural]
// unexpected token ')'
rule <k>Name:Atom(Args:Values) -> Body => . ...</k> [structural]
My K version is:
RV-K version 1.0-SNAPSHOT
Git revision: adf2f2d
Git branch: UNKNOWN
Build date: Tue Mar 16 16:43:04 CET 2021

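One of the changes from K3 to K5 is that list sorts are no longer automatically subsorted when their element sorts are. Value is a subsort of Exp, but Values is not a subsort of Exps, so Args:Values cannot appear where the FunCl production expects Exps. If you manually add
syntax Exps ::= Values
then your rule will kompile again.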
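To make the subsorting point concrete, here is a toy model in Python. It is only a sketch of the idea, not how K's parser is implemented: DECLARED and is_subsort are names invented for this example, and the edges are read off the syntax declarations in the question.

```python
# Toy model of subsorting, NOT K's implementation.
# Each pair is (subsort, supersort), taken from the question's `syntax` lines.
DECLARED = {
    ("UnquotedAtom", "Atom"),
    ("Bool", "Atom"),
    ("Atom", "Exp"),
    ("Atom", "Value"),
    ("Value", "Exp"),
    ("Value", "KResult"),
}


def is_subsort(sub, sup, edges):
    """Return True if `sub` reaches `sup` through zero or more subsort edges."""
    seen, frontier = set(), [sub]
    while frontier:
        sort = frontier.pop()
        if sort == sup:
            return True
        if sort in seen:
            continue
        seen.add(sort)
        frontier.extend(b for a, b in edges if a == sort)
    return False


# Element sorts are related, but nothing relates the list sorts.
before = is_subsort("Values", "Exps", DECLARED)
# Adding the edge for `syntax Exps ::= Values` relates them explicitly.
after = is_subsort("Values", "Exps", DECLARED | {("Values", "Exps")})
print(before, after)  # False True
```

In this model, Value reaches Exp, but Values reaches Exps only once the extra edge is declared, which mirrors why the cast Args:Values is rejected until the subsort is added by hand.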

Related

ANTLR4 grammar rule that cannot be reached from start rule affects language

The following minimal grammar shows the issue:
grammar test;
call : exp LP exp RP ;
exp : exp LP exp RP | ID;
ID : [a-z] ;
LP : '(' ;
RP : ')' ;
Newline : '\r\n' | '\n' ;
If I use call as the start rule, then the generated parser will gladly parse the following input:
f(x)
(I tried it in ANTLR lab, which probably uses the Java target, and locally using the C++ target with ANTLR 4.9.3.)
If I now add the following rule to the grammar, but keep call as the start rule, then the same input does not match call anymore.
callWithNewline : call Newline;
Why does callWithNewline affect whether call matches?
If I append a newline character to the input, it suddenly matches call in ANTLR lab (even though the newline is, of course, not part of the match), but not with the C++ target, so the targets behave slightly differently here.
I ran into this behavior while unit testing subrules. Parsing with a full grammar that contains this kind of subgrammar somewhere lower in the hierarchy does not appear to cause issues.
Edit
The issue still occurs if I remove the ambiguity:
grammar test;
callWithNewline : call Newline ;
call : exp LP ID RP ;
exp : exp LP ID RP | ID;
ID : [a-z] ;
LP : '(' ;
RP : ')' ;
Newline : '\r\n' | '\n' ;

K Framework: Substitution not substituting in simple terms?

I have the following K file:
require "substitution.k"
module PURE
imports DOMAINS
imports SUBSTITUTION
syntax PSort ::= "$Type" [token]
| "$Kind" [token]
syntax Type ::= PSort
| KVar
| "Pi" KVar ":" Term "." Term [binder]
syntax Term ::= Type
| "(" Term ")" [bracket]
> Term Term [left]
> "declare" KVar ":" Term "in" Term
syntax KResult ::= Type
configuration
<T>
<k> typeof($PGM:Term, ?T) ~> ?T </k>
<typeEnv> .Map </typeEnv>
</T>
syntax KItem ::= typeof(Term, Term)
rule <k> typeof(declare X : T in E, T2) => typeof(E, T2) ... </k>
<typeEnv> TEnv => TEnv[X <- T] </typeEnv>
// VAR
rule <k> typeof(X:KVar, T) => . ... </k>
<typeEnv> ... X |-> T ... </typeEnv>
// APP
syntax KItem ::= Term "=" Term
rule T = T => .
rule typeof(M N, T) =>
typeof(M, Pi ?X : ?T1. ?T2) ~>
typeof(N, ?T1) ~>
?T2[N/?X] = T
endmodule
When I compile it with the Java backend and run the following file:
declare nat : $Type in
declare Z : nat in
declare Vector : Pi n : nat . $Type in
declare blah : Pi n : nat . (Vector n) in
blah Z
I get:
<T>
<k>
Vector n
</k>
<typeEnv>
Vector |-> Pi n : nat . $Type
Z |-> nat
blah |-> Pi n : nat . ( Vector n )
nat |-> $Type
</typeEnv>
</T>
But I want it to substitute Z for n and get Vector Z.
This appears to be a bug in the Java backend, which applies the substitution operator prematurely, while its arguments are still symbolic variables. As a result, the substitution operator disappears too early, and when the term it was applied to is later instantiated via unification, the substitution has not been performed, which leads to the problem you describe. Here is an issue tracking the problem: https://github.com/kframework/k/issues/1165
I took a stab at fixing it, but it proved to be nontrivial and I don't have time to dig deeper right now. You are welcome to try to fix it in a pull request if you want, although I am unsure why the fix I wrote makes other things break. Your best option is probably to rewrite your typing rules so that they don't try to perform substitution on a variable. One way to do this would be to make the rule for application modify the type environment and then restore it once the term has been fully typed. You can take a look at the K tutorial folder 1_k/5_types for some examples of how to type a lambda-calculus-like language.
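The premature substitution described in this answer can be imitated with a toy term model in Python. This is a sketch under heavy simplification and not the Java backend's code: substitute, instantiate and the tuple encoding of terms are all invented for illustration.

```python
# Toy term model (an assumption for illustration): a term is either a string
# (a constant, an object variable, or a metavariable starting with "?")
# or a tuple of terms.

def substitute(term, var, value):
    """Replace every occurrence of `var` in `term` with `value`."""
    if isinstance(term, str):
        return value if term == var else term
    return tuple(substitute(t, var, value) for t in term)


def instantiate(term, bindings):
    """Fill in metavariables once unification has found their values."""
    if isinstance(term, str):
        return bindings.get(term, term)
    return tuple(instantiate(t, bindings) for t in term)


# What unification eventually learns from blah |-> Pi n : nat . (Vector n).
bindings = {"?X": "n", "?T2": ("Vector", "n")}

# Eager: ?T2[Z/?X] is evaluated while ?T2 and ?X are still unknown, so the
# substitution finds nothing to replace and vanishes.
eager = instantiate(substitute("?T2", "?X", "Z"), bindings)

# Deferred: wait until the metavariables are known, then substitute.
deferred = substitute(
    instantiate("?T2", bindings), instantiate("?X", bindings), "Z"
)

print(eager)     # ('Vector', 'n')
print(deferred)  # ('Vector', 'Z')
```

The eager result corresponds to the Vector n seen in the output above, and the deferred one to the Vector Z that was expected.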

Why isn't antlr 4 breaking my tokens up as expected?

So I am fairly new to ANTLR 4. I have stripped down the grammar as much as I can to show the problem:
grammar DumbGrammar;
equation
: expression (AND expression)*
;
expression
: ID
;
ID : LETTER(LETTER|DIGIT)* ;
AND: 'and';
LETTER: [a-zA-Z_];
DIGIT : [0-9];
WS : [ \r\n\t] + -> channel (HIDDEN);
If I use this grammar with the sample text abc and d, I get a tree with an unexpected structure (using IntelliJ and the ANTLR4 plugin).
If I simply change the terminal rule AND: 'and'; to read AND: '&&'; and then submit abc && d as input, I get the expected tree.
I cannot figure out why it isn't parsing "and" correctly, but does parse '&&' correctly.
The input "and" is being tokenized as an ID token. Both ID and AND match "and" with the same length, so ANTLR has to pick one, and it takes ID because ID is defined before AND.
The solution: define AND before ID:
AND: 'and';
ID : LETTER(LETTER|DIGIT)* ;
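The tie-breaking behaviour can be imitated in a few lines of Python. This is a rough sketch of the rule "longest match wins, and on a tie the rule defined first wins", not ANTLR's actual lexer. The tokenize function and the regular expressions are assumptions written for this example, with ID expanded by hand from LETTER(LETTER|DIGIT)*.

```python
import re

ID = ("ID", r"[a-zA-Z_][a-zA-Z_0-9]*")
AND = ("AND", r"and")
WS = ("WS", r"[ \r\n\t]+")


def tokenize(text, rules):
    """Tokenize `text` with an ordered list of (name, regex) rules."""
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, regex in compiled:
            match = regex.match(text, pos)
            # Strict '>' means a later rule cannot displace an earlier
            # rule that matched the same number of characters.
            if match and len(match.group()) > best_len:
                best_name, best_len = name, len(match.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name != "WS":  # whitespace is hidden from the parser
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


print(tokenize("abc and d", [ID, AND, WS]))
# [('ID', 'abc'), ('ID', 'and'), ('ID', 'd')]
print(tokenize("abc and d", [AND, ID, WS]))
# [('ID', 'abc'), ('AND', 'and'), ('ID', 'd')]
```

With AND listed first, an input such as android still comes out as a single ID, because the longer match takes priority over rule order.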

Trouble migrating antlr grammar

I have never used ANTLR before, but I now have to migrate a grammar written for an older version to the latest one. I am trying to generate a lexer and parser for the C# target. I am stuck on migrating the start rule shown below.
grammar expr;
DQUOTE: '\"';
SQUOTE: '\'';
NEG : '-';
PLUS : '+';
OPEN : '(';
CLOSE : ')';
PERIOD: '.';
COMMA : ',';
start returns [Expression value]
:
expression EOF { $value = $expression.value; }
;
expression returns [Expression value]
:
literal { $value = $literal.value; }
| name { $value = $name.value; }
| functionCall { $value = $functionCall.value; }
;
I get the following error.
syntax error:
mismatched input '[Expression value]' expecting ARG_ACTION while
matching a rule.
I have already come across the post "Troubles with returns declaration on the first parser rule in an ANTLR4 grammar", but Sam's response has not helped me figure out what I should change in my case.
I would appreciate it if anyone could show me the equivalent of the start rule in the latest grammar syntax.
The answer you linked appears to be applicable to your case. Move the lexer rules (i.e. those whose names start with an uppercase letter, DQUOTE and so on) after the parser rules such as start.

How to solve a shift/reduce conflict?

I'm using CUP to create a parser that I need for my thesis. I have a shift/reduce conflict in my grammar. I have this production rule:
command ::= IDENTIFIER | IDENTIFIER LPAREN parlist RPAREN;
and I have this warning:
Warning : *** Shift/Reduce conflict found in state #3
between command ::= IDENTIFIER (*)
and command ::= IDENTIFIER (*) LPAREN parlist RPAREN
under symbol LPAREN
Now, I actually want the parser to shift here, so I'm fine with the default, but my professor told me to find a way to resolve the conflict. I can't see how. I have read about the if/else conflict, but this doesn't seem to be that case.
Can you help me?
P.S.: IDENTIFIER, LPAREN "(" and RPAREN ")" are terminal, parlist and command are not.
Your problem is not in those rules at all. Although Michael Mrozek's answer is the correct approach to resolving the "dangling else" problem, it does not address the problem at hand.
If you look at the error message, you see that the shift/reduce conflict occurs when the lookahead symbol is LPAREN. I am pretty sure that these two rules alone will not create a conflict.
I can't see the rest of your grammar, so I can't point at the exact cause. But the conflict probably arises where a command is followed by a different rule that starts with LPAREN.
Look at any other rules that can come after command and start with LPAREN. You will then have to consolidate the rules. There is a very good chance that your grammar is ambiguous for a specific input.
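Whether LPAREN can come after command can be checked mechanically by computing FOLLOW sets. The Python sketch below does this for a made-up grammar: only the two command productions come from the question, while program, parlist and COMMA are invented to complete the example. FOLLOW sets are also a coarser, SLR-style approximation of the LALR(1) lookaheads that CUP computes, so treat this as a diagnostic aid rather than a reproduction of CUP's analysis.

```python
# Hypothetical grammar without empty productions. Only the two `command`
# productions are from the question; everything else is assumed.
GRAMMAR = {
    "program": [["command"], ["command", "LPAREN", "program", "RPAREN"]],
    "command": [["IDENTIFIER"], ["IDENTIFIER", "LPAREN", "parlist", "RPAREN"]],
    "parlist": [["IDENTIFIER"], ["parlist", "COMMA", "IDENTIFIER"]],
}


def first_sets(grammar):
    """FIRST sets for a grammar with no empty productions."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, productions in grammar.items():
            for production in productions:
                head = production[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def follow_sets(grammar, start):
    """FOLLOW sets, iterated to a fixed point; '$' marks end of input."""
    first = first_sets(grammar)
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for lhs, productions in grammar.items():
            for production in productions:
                for i, symbol in enumerate(production):
                    if symbol not in grammar:
                        continue  # terminals have no FOLLOW set
                    if i + 1 < len(production):
                        nxt = production[i + 1]
                        new = first[nxt] if nxt in grammar else {nxt}
                    else:
                        new = follow[lhs]  # symbol ends the production
                    if not new <= follow[symbol]:
                        follow[symbol] |= new
                        changed = True
    return follow


follow = follow_sets(GRAMMAR, "program")
print(sorted(follow["command"]))  # ['$', 'LPAREN', 'RPAREN']
```

Because LPAREN is in FOLLOW(command) for this grammar, a parser that has just read IDENTIFIER and sees LPAREN cannot tell whether to reduce to command or to shift, which is the shape of the warning in the question.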
You have two productions:
command ::= IDENTIFIER
command ::= IDENTIFIER LPAREN parlist RPAREN;
It's a shift/reduce conflict when the input tokens are IDENTIFIER LPAREN, because:
LPAREN could be the start of a new production you haven't listed, in which case the parser should reduce the IDENTIFIER already on the stack into command, and have command LPAREN remaining
They could both be the start of the second production, so it should shift the LPAREN onto the stack next to IDENTIFIER and keep reading, trying to find a parlist.
You can fix it by doing something like this:
command ::= IDENTIFIER command2;
command2 ::= LPAREN parlist RPAREN |;
Try to set a precedence:
precedence left LPAREN, RPAREN;
It forces CUP to decide the conflict, taking the left match.