When are K configuration cells type-checked? - kframework

It is a common K idiom to define a programming language's syntax with a top-sort of well-formed programs (e.g. Pgm) and then to restrict the <k> cell to have this sort in the configuration declaration using the special $PGM variable which is passed automatically by krun. This prevents users from executing programs with krun that are not well-formed. My question is:
1. Are the sorts of cells checked only at start-up time or after each rule evaluation?
2. Do different cells behave differently depending on their identity (e.g. the <k> cell) or on how they are typed (e.g. user-defined types versus builtin types)?
Here is a partial example to show what I mean:
configuration
<mylang>
<k> $PGM:Pgm </k>
<env> .Env:Env </env> // Env is a custom map structure defined for environments
<store> .Map </store> // For the store we use the K builtin Map
...
</mylang>
For the <k> cell, I conclude that it is definitely only checked at start-up time, since program evaluation typically tears a program apart into an expression and a continuation (e.g. ADD ~> ...) which cannot have the sort Pgm anymore because ~> is builtin.
So, elaborating on questions (1-2) above, is the <k> cell exceptional in this sense?

Each rule is sort-checked at kompile time to be sort-preserving, so there is no need to check this at runtime. If something of the correct sort goes in, something of the correct sort comes out.
The <k> cell gets sort K. For example, this definition: https://github.com/kframework/evm-semantics/blob/272608d70f363ed3d8d921887b98a26102a03032/evm.md#configuration
results in a compiled.txt (found at .build/defn/java/driver-kompiled/compiled.txt) which looks like:
...
syntax KCell ::= "project:KCell" "(" K ")" [function, projection]
syntax KCell ::= "initKCell" "(" Map ")" [function, initializer, noThread]
syntax KCell ::= "<k>" K "</k>" [cell, cellName(k), contentStartColumn(7), contentStartLine(31), format(%1%i%n%2%d%n%3), maincell, org.kframework.definition.Production(syntax #RuleContent ::= #RuleBody [klabel(#ruleNoConditions), symbol])]
...
But other cells get more specific sorts:
...
syntax JumpDestsCell ::= "project:JumpDestsCell" "(" K ")" [function, projection]
syntax JumpDestsCell ::= "initJumpDestsCell" [function, initializer, noThread]
syntax JumpDestsCell ::= "<jumpDests>" Set "</jumpDests>" [cell, cellName(jumpDests), contentStartColumn(7), contentStartLine(31), format(%1%i%n%2%d%n%3), org.kframework.definition.Production(syntax #RuleContent ::= #RuleBody [klabel(#ruleNoConditions), symbol])]
...
I'm not sure how K decides that the <k> cell needs to get sort K, but I don't think it's based on analyzing the rules. I think it's likely that it sees $PGM in that cell, so it adds the maincell attribute you see and gives it sort K. Everything is a subsort of K.
I'm fairly certain it's not any $ variable in the configuration that gives it sort K, because the <chainID> cell in KEVM gets these declarations:
...
syntax ChainIDCell ::= "project:ChainIDCell" "(" K ")" [function, projection]
syntax ChainIDCell ::= "initChainIDCell" "(" Map ")" [function, initializer, noThread]
syntax ChainIDCell ::= "<chainID>" Int "</chainID>" [cell, cellName(chainID), contentStartColumn(7), contentStartLine(31), format(%1%i%n%2%d%n%3), org.kframework.definition.Production(syntax #RuleContent ::= #RuleBody [klabel(#ruleNoConditions), symbol])]
...
Note that there isn't very much special about the _~>_ operator. It's declared here: https://github.com/kframework/k/blob/135469ea0ebea96dacf0f9a49261ff1171440c20/k-distribution/include/kframework/builtin/kast.k#L57
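The sort-preservation argument above can be illustrated with a toy model. This is only a sketch: the sort names, the SUBSORTS table and the helper functions are hypothetical and are not how kompile represents sorts internally. It shows why a cell of sort K accepts any content, while a cell restricted to Pgm would reject an arbitrary expression.

```python
# Toy model only: the sort names and this table are illustrative assumptions,
# not K's real data structures.
SUBSORTS = {
    "Int": {"Exp"},
    "Exp": {"KItem"},
    "Pgm": {"KItem"},
    "KItem": {"K"},
}


def is_subsort(s, t):
    """Return True if sort s equals t or is (transitively) a subsort of t."""
    if s == t:
        return True
    return any(is_subsort(parent, t) for parent in SUBSORTS.get(s, ()))


def rule_preserves_sort(lhs_sort, rhs_sort):
    """A rewrite lhs => rhs is accepted if the result still fits where lhs stood."""
    return is_subsort(rhs_sort, lhs_sort)


def cell_accepts(cell_sort, content_sort):
    """A cell declared with cell_sort can hold content of any subsort of it."""
    return is_subsort(content_sort, cell_sort)
```

In this model, a check done once per rule is enough: if every rule passes rule_preserves_sort, no rewrite can put ill-sorted content into a cell.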

Related

How do I make an item optional or repeatable in K syntax rule?

How can I express the EBNF rules below in the K Framework?
An element can be repeated zero or more times:
items ::= {"," item}*
For now, I am using a List from the Domain module. But inline List declarations are not allowed, like this one:
syntax Foo ::= Stmt List{Id, ""}
For now, I have to create a new syntax rule for the listed item to counter the problem:
syntax Ids ::= List{Id, ""}
syntax Foo ::= Stmt Ids
Is there a way to avoid creating this new rule?
An element can appear zero or one time. In other words it can be optional:
array-decl ::= <variable> "[" {Int}? "]"
Where we want to accept: a[4] and a[]. For now, to work around this I create two rules, where one branch has the item and the other does not. But this solution duplicates rules unnecessarily, in my opinion.
An element can appear one or more times:
e ::= {a-z}+
Where we accept any non-zero-length sequence of lower case letters. So far I haven't found a way to simulate this.
Thank you in advance!
Inline zero-or-more productions have been restricted in the K-framework because the backend doesn't support terms with a variable number of arguments.
Therefore we ask that each list is declared as a separate production which will produce a cons list. Typical functional style matching can be used to process the AST.
Your typical EBNF extensions would look something like this:
{Id ","}* - syntax Ids ::= List{Id, ","}
{Id ","}+ - syntax Ids ::= NeList{Id, ","}
Id? - syntax OptionalId ::= "" [klabel(none)] | Id [klabel(some)]
The optional (?) production has the same problem. So we ask the user to specify labels that can be referenced by rules. Note that the empty production is not allowed in the semantics module because it may interfere with parsing the concrete syntax in rules. So you will need to create a COMMON module with most of the syntax, and a *-SYNTAX module with the productions that can interfere with rule parsing (empty productions and tokens that can conflict with K variables).
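To make "cons list" and "functional style matching" concrete, here is a minimal Python sketch of the shape such a list takes. The Cons class and the helper names are hypothetical illustrations, not part of K; None stands in for the empty list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cons:
    """One cell of a cons list: an element plus the rest of the list."""
    head: str
    tail: Optional["Cons"]  # None plays the role of the empty list


def from_items(items):
    """Build a cons list from a Python list, right to left."""
    result = None
    for item in reversed(items):
        result = Cons(item, result)
    return result


def length(ids):
    """Process the list with two cases, empty and non-empty, as rules would."""
    if ids is None:
        return 0
    return 1 + length(ids.tail)


def to_items(ids):
    """Flatten a cons list back into a Python list."""
    out = []
    while ids is not None:
        out.append(ids.head)
        ids = ids.tail
    return out
```

Every node has a fixed number of children (a head and a tail), which is the point of the restriction: no term ever needs a variable number of arguments.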
No, there is currently no mechanism to do this without the extra production.
I typically do this as follows:
syntax MaybeFoo ::= ".MaybeFoo" | Foo
syntax ArrayDecl ::= Variable "[" MaybeFoo "]"
Non-empty lists may be declared similar to lists:
syntax Bars ::= NeList{Bar, ","}

How to use [binder] in K?

I have syntax:
syntax Process ::= KVar "(" KVar ")" "." Process [binder]
| "new" KVar "." Process [binder]
syntax Program ::= KVar "(" KVarVec ")" "=" Process [binder]
syntax KVarVec ::= KVar | KVar "," KVarVec
These syntax declarations contain three productions that bind differently:
a(x).P, where x is bound in P, but a is a name that isn't being bound by that term.
new a.P binds a in P like a lambda.
f(a,b,c) = P binds a vector a,b,c of KVar in P. Each KVar in the vector is supposed to be bound in P.
How can I tell binder to bind specific variables in a production? Is there something like binder(2) to tell it that the second KVar is supposed to be bound? What if it's several KVars defined by another syntax?
Currently one of the limitations of the binder attribute is that the variable bound must be the first nonterminal in the production, and the term that it is bound in must be the last nonterminal. Feel free to make a feature request for the generalization you propose on GitHub and I'll get to it at some point. Might not be right away though.
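The effect of this convention can be sketched with a toy free-variable computation. The term encoding and the var, node and free_vars helpers below are hypothetical, and treating the children between the first and last nonterminal as outside the binder's scope is an assumption of this sketch, not a statement about K's implementation.

```python
def var(name):
    """A variable occurrence."""
    return ("var", name)


def node(label, children, binder=False):
    """An application of a production to its nonterminal children."""
    return (label, binder, tuple(children))


def free_vars(term):
    """Free variables, binding the FIRST child in the LAST child for binders."""
    if term[0] == "var":
        return {term[1]}
    _label, binder, children = term
    if not binder:
        return set().union(*(free_vars(c) for c in children))
    bound = children[0][1]      # first nonterminal: the bound variable
    body = children[-1]         # last nonterminal: the scope
    result = free_vars(body) - {bound}
    for middle in children[1:-1]:
        # Assumption of this sketch: middle children are not in the scope.
        result |= free_vars(middle)
    return result


# new a.P behaves as intended: a is bound in P.
new_term = node("new", [var("a"), node("par", [var("a"), var("b")])], binder=True)

# a(x).P does not: the first nonterminal is a, so a is bound instead of x.
input_term = node(
    "input",
    [var("a"), var("x"), node("par", [var("a"), var("x")])],
    binder=True,
)
```

Under this model, free_vars(new_term) contains only b, while free_vars(input_term) contains x but not a, which is the opposite of what the a(x).P production needs.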

K Framework: Substitution not substituting in simple terms?

I have the following K file:
require "substitution.k"
module PURE
imports DOMAINS
imports SUBSTITUTION
syntax PSort ::= "$Type" [token]
| "$Kind" [token]
syntax Type ::= PSort
| KVar
| "Pi" KVar ":" Term "." Term [binder]
syntax Term ::= Type
| "(" Term ")" [bracket]
> Term Term [left]
> "declare" KVar ":" Term "in" Term
syntax KResult ::= Type
configuration
<T>
<k> typeof($PGM:Term, ?T) ~> ?T </k>
<typeEnv> .Map </typeEnv>
</T>
syntax KItem ::= typeof(Term, Term)
rule <k> typeof(declare X : T in E, T2) => typeof(E, T2) ... </k>
<typeEnv> TEnv => TEnv[X <- T] </typeEnv>
// VAR
rule <k> typeof(X:KVar, T) => . ... </k>
<typeEnv> ... X |-> T ... </typeEnv>
// APP
syntax KItem ::= Term "=" Term
rule T = T => .
rule typeof(M N, T) =>
typeof(M, Pi ?X : ?T1. ?T2) ~>
typeof(N, ?T1) ~>
?T2[N/?X] = T
endmodule
When I compile it with the Java backend and run the following file:
declare nat : $Type in
declare Z : nat in
declare Vector : Pi n : nat . $Type in
declare blah : Pi n : nat . (Vector n) in
blah Z
I get:
<T>
<k>
Vector n
</k>
<typeEnv>
Vector |-> Pi n : nat . $Type
Z |-> nat
blah |-> Pi n : nat . ( Vector n )
nat |-> $Type
</typeEnv>
</T>
But I want it to substitute Z for n and get Vector Z.
This appears to be a bug in the Java backend that prematurely applies the substitution operator while its arguments are still symbolic variables. As a result, the substitution operator disappears prematurely, and then when the term that was substituted is instantiated via unification, it has not been substituted, which leads to the problem that you describe. Here is an issue tracking the problem: https://github.com/kframework/k/issues/1165
I took a stab at fixing it, but it proved to be nontrivial and I don't have time to dig deeper right now. You are welcome to try to fix it in a pull request if you want, although I am unsure why the fix I wrote is making other things break. Your better choice is probably to rewrite your typing rules so that they don't try to perform substitution on a variable. One way to do this would be to make the rule for application modify the type environment and then restore it when it's been fully typed. You can take a look at the K tutorial folder 1_k/5_types for some examples of how you can type a lambda-calculus-like language.
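The ordering problem described above can be reproduced in a toy model. This is not the Java backend's code: the term encoding and the substitute and instantiate helpers are hypothetical, and the sketch simplifies the real rule by assuming the bound name n is already known.

```python
def substitute(term, name, value):
    """Replace variable `name` by `value` wherever it is visible in `term`."""
    kind = term[0]
    if kind == "var":
        return value if term[1] == name else term
    if kind == "meta":
        return term  # unresolved symbolic variable: nothing visible to replace
    if kind == "app":
        return ("app",
                substitute(term[1], name, value),
                substitute(term[2], name, value))
    raise ValueError(f"unknown term kind: {kind}")


def instantiate(term, bindings):
    """Fill in symbolic variables once unification has found their values."""
    kind = term[0]
    if kind == "meta":
        return bindings.get(term[1], term)
    if kind == "app":
        return ("app",
                instantiate(term[1], bindings),
                instantiate(term[2], bindings))
    return term


T2 = ("meta", "T2")                                    # plays the role of ?T2
bindings = {"T2": ("app", ("var", "Vector"), ("var", "n"))}
Z = ("var", "Z")

# Substitution applied too early: it sees only ?T2 and does nothing.
premature = instantiate(substitute(T2, "n", Z), bindings)

# Substitution applied after ?T2 is known: n is replaced by Z.
delayed = substitute(instantiate(T2, bindings), "n", Z)
```

Here premature is the term Vector n and delayed is Vector Z, matching the observed and the desired output respectively.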

Reduce/Reduce conflict when introducing pointers in my grammar

I'm working on a small compiler in order to get a greater appreciation of the difficulties of creating one's own language. Right now I'm at the stage of adding pointer functionality to my grammar but I got a reduce/reduce conflict by doing it.
Here is a simplified version of my grammar that is compilable by bnfc. I use the happy parser generator and that's the program telling me there is a reduce/reduce conflict.
entrypoints Stmt ;
-- Statements
-------------
SDecl. Stmt ::= Type Ident; -- ex: "int my_var;"
SExpr. Stmt ::= Expr; -- ex: "printInt(123); "
-- Types
-------------
TInt. Type ::= "int" ;
TPointer. Type ::= Type "*" ;
TAlias. Type ::= Ident ; -- This is how I implement typedefs
-- Expressions
--------------
EMult. Expr1 ::= Expr1 "*" Expr2 ;
ELitInt. Expr2 ::= Integer ;
EVariable. Expr2 ::= Ident ;
-- and the standard coercions
_. Expr ::= Expr1 ;
_. Expr1 ::= Expr2 ;
I'm in a learning stage of how grammars work. But I think I know what happens. Consider these two programs
main(){
int a;
int b;
a * b;
}
and
typedef int my_type;
main(){
my_type * my_type_pointer_variable;
}
(The typedef and main(){} parts aren't relevant and aren't in my grammar, but they give some context.)
In the first program I wish it would parse a "*" b as Stmt ==(SExpr)==> Expr ==(EMult)==> Expr * Expr ==(..)==> Ident "*" Ident, that is to essentially start stepping using the SExpr rule.
At the same time I would like my_type * my_type_pointer_variable to be expanded using the rules. Stmt ==(SDecl)==> Type Ident ==(TPointer)==> Type "*" Ident ==(TAlias)==> Ident "*" Ident.
But the grammar stage has no idea whether an identifier was originally a type alias or a variable.
(1) How can I get rid of the reduce/reduce conflict and (2) am I the only one having this issue? Is there an obvious solution and how does the C grammar resolve this issue?
So far the only thing that has worked is changing the syntax of my language to use "&" or some other symbol instead of "*", but that's very undesirable. I have also looked at various public C grammars to see why they don't have this issue, but I couldn't make sense of them.
And last, how do I resolve issues like these on my own? All I understood from happy's more verbose output is how the conflict happens. Is cleverness the only way to work around these conflicts? I'm afraid I'll stumble on even more issues, for example when introducing EIndir. Expr = '*' Expr;
The usual way this problem is dealt with in C parsers is something generally called "the lexer feedback hack". It's a 'hack' in the sense that it doesn't deal with it in the grammar at all; instead, when the lexer recognizes an identifier, it classifies that identifier as either a typename or a non-typename, and returns a different token for each case (usually designated 'TypeIdent' for an identifier that is a typename and simply 'Ident' for any other). The lexer makes this selection by looking at the current state of the symbol table, so it sees all the typedefs that have occurred prior to the current point in the parse, but not typedefs that are after the current point. This is why C requires that you declare typedefs before their first use in each compilation unit.
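A minimal sketch of that idea in Python follows. The tokenize and classify functions, the token kinds and the tiny statement language are assumptions made for illustration; a real implementation would sit inside the lexer and parser generated for the grammar, with the parser updating the table as it reduces typedef declarations.

```python
import re

TOKEN_RE = re.compile(r"\s*(?:(?P<word>[A-Za-z_]\w*)|(?P<int>\d+)|(?P<sym>[*;]))")
KEYWORDS = {"int", "typedef"}


def tokenize(stmt, typedefs):
    """Split one statement into (kind, text) pairs, consulting the typedef table."""
    tokens, pos = [], 0
    stmt = stmt.strip()
    while pos < len(stmt):
        m = TOKEN_RE.match(stmt, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at {stmt[pos:]!r}")
        pos = m.end()
        if m.group("word"):
            text = m.group("word")
            if text in KEYWORDS:
                kind = text
            elif text in typedefs:
                kind = "TypeIdent"   # identifier already declared as a type
            else:
                kind = "Ident"       # any other identifier
        elif m.group("int"):
            kind, text = "Integer", m.group("int")
        else:
            kind, text = m.group("sym"), m.group("sym")
        tokens.append((kind, text))
    return tokens


def classify(source):
    """Classify each ';'-terminated statement, updating the table as we go."""
    typedefs = set()
    results = []
    for stmt in source.split(";"):
        if not stmt.strip():
            continue
        tokens = tokenize(stmt, typedefs)
        first_kind = tokens[0][0]
        if first_kind == "typedef":
            typedefs.add(tokens[-1][1])   # feedback: later lexing sees this name
            results.append("typedef")
        elif first_kind in ("int", "TypeIdent"):
            results.append("declaration")
        else:
            results.append("expression")
    return results
```

With this feedback, "a * b" lexes as Ident * Ident when a is a variable and as TypeIdent * Ident when a was introduced by an earlier typedef, so the grammar never has to choose between the two readings.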

BNF grammar for sequence of statements

If I am making a grammar for a c-like language that has a sequence of statements, what is the most standard way for defining the grammar?
My thought is to do something like this:
<program> ::= <statement>
<statement> ::= <statement-head><statement-tail>
<statement-head> ::= <if-statement> | <var-declaration> | <assignment> | <whatever>
<statement-tail> ::= ; | ;<statement>
but that feels a little clunky to me. I've also considered making
<program> ::= <statement>*
or
<statement> ::= <statement-head> ; | <sequence>
<sequence> ::= <statement> <statement>
type productions.
Is there a standard or accepted way to do this? I want my AST to be as clean as possible.
A very common way would be:
<block-statement> ::= '{' <statement-list> '}' ;
<statement-list> ::= /* empty */ | <statement-list> <statement> ;
<statement> ::= <whatever> ';' ;
And then you define actual statements instead of typing <whatever>. It seems much cleaner to include trailing semicolons as part of individual statements rather than putting them in the definition for the list non-terminal.
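As a rough illustration of how this grammar maps to a clean AST, here is a hand-written Python sketch in which the left-recursive <statement-list> becomes a loop that appends to a flat list. The parse_block and parse_statement names, the pre-split token list and the tuple-based AST are assumptions of this sketch; a parser generator would build the same list in its semantic actions.

```python
def parse_statement(tokens, pos):
    """<statement> ::= <whatever> ';'  (here: any non-empty run of tokens)."""
    start = pos
    while pos < len(tokens) and tokens[pos] != ";":
        if tokens[pos] in ("{", "}"):
            raise SyntaxError("unexpected brace inside statement")
        pos += 1
    if pos >= len(tokens):
        raise SyntaxError("expected ';'")
    if pos == start:
        raise SyntaxError("empty statement")
    return ("stmt", tokens[start:pos]), pos + 1  # skip the ';'


def parse_block(tokens):
    """<block-statement> ::= '{' <statement-list> '}'."""
    if not tokens or tokens[0] != "{":
        raise SyntaxError("expected '{'")
    pos = 1
    statements = []                      # <statement-list> ::= /* empty */
    while pos < len(tokens) and tokens[pos] != "}":
        stmt, pos = parse_statement(tokens, pos)
        statements.append(stmt)          # ... | <statement-list> <statement>
    if pos >= len(tokens):
        raise SyntaxError("expected '}'")
    return ("block", statements)
```

Because each statement consumes its own semicolon, the list node holds nothing but statements, which keeps the AST flat.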
You can find the BNF for C here and I think it was taken from K&R, which you could check out. You could also check out the SQL BNF here which may provide more information on formulating good sequences.
This will provide some convention information.
In terms of AST generation, it really doesn't matter how 'clunky' your definition is, provided it parses the source correctly for all permutations. Then just add the actions to build your AST.
Just make sure you are constructing your grammar for the right parser generator such as an LL or LR parser as you may run into problems with reduction, which will mean some rules need rewriting in a new way. See this on eliminating left recursion.
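For reference, the textbook transformation for immediate left recursion rewrites A ::= A α | β into A ::= β A' and A' ::= α A' | ε. The Python sketch below applies it to a single nonterminal; the function name and the list-of-symbols encoding are hypothetical, and it does not handle indirect left recursion.

```python
def eliminate_left_recursion(nonterminal, alternatives):
    """Rewrite A ::= A alpha | beta as A ::= beta A' ; A' ::= alpha A' | empty.

    `alternatives` is a list of right-hand sides, each a list of symbols;
    the empty list stands for the empty production.
    """
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nonterminal]
    other = [alt for alt in alternatives if not alt or alt[0] != nonterminal]
    if not recursive:
        return {nonterminal: alternatives}   # nothing to do
    tail = nonterminal + "'"                  # fresh helper nonterminal
    return {
        nonterminal: [beta + [tail] for beta in other],
        tail: [alpha + [tail] for alpha in recursive] + [[]],
    }


statement_list = eliminate_left_recursion(
    "statement-list",
    [[], ["statement-list", "statement"]],
)
```

Applied to a left-recursive statement list, this yields a right-recursive form that an LL parser can handle; an LR parser generator accepts the left-recursive original directly.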
You may also want to check out Bison/Yacc examples such as these or these. Also check out the Dragon Book and a book called "Modern Compiler Implementation in C"