What is a concatenated term and in what contexts are they unexpected? - kframework

My main question is what an "unexpected concatenated term" is. I have some context for the error below, though there seem to be a slew of other issues that might be confounding this one.
I have a term foo of sort Map and a term bar also of sort Map. foo takes a list of expressions and the goal is to place them all into a Map. To that end for each expression I call bar on it and concatenate. From the syntax of Map I believe it should be enough to then say bar(exp) foo(exps) to get the entire Map.
This compiles fine however when I try to run it as soon as I rewrite into bar(exp) foo(exps) I get a
[Error] Critical: unexpected concatenated termfoo(...) while evaluating function _Map_. I removed the expressions themselves for brevity.
I believe the issue may be that the Map union has a higher priority than my foo and bar, so I tried declaring bar as a function. However, because bar is strict in its arguments, this caused an error about sorts KItem and Exp being incompatible.

@ALee we would need to see your code to be sure, but I think the confusion here is that you are trying to union two maps together, and K actually doesn't support map union.
We have the constructor _Map_, which lets you, for example, build the Map which looks like this: "a" |-> 3 "b" |-> 4. But you cannot take two arbitrary maps and use this constructor to put them together.
The reason for this is that the resulting term may not be well-defined (what if both maps define the same key, but have different values?).
Instead, if you want this behavior, you need to write your own map union operator, which makes a choice about what to do when there are overlapping keys in the two maps. This example prefers the keys from the second Map argument (over-writing the value of the first map if the same key exists there):
syntax Map ::= union ( Map , Map ) [function]
rule union(M, .Map) => M
rule union(M, K |-> V M') => union(M [ K <- V ], M')
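To make the overlapping-key choice concrete, here is a minimal Python sketch of the same right-biased union. It is only an analogy for the K rules above, not K code; the function name union and the sample maps are illustrative assumptions.

```python
def union(first, second):
    """Right-biased union: on a key clash, the value from `second` wins.

    Mirrors the K rules above: start from the first map and write each
    binding of the second map into it, one at a time.
    """
    result = dict(first)           # copy, so the inputs stay untouched
    for key, value in second.items():
        result[key] = value        # corresponds to M [ K <- V ]
    return result


left = {"a": 3, "b": 4}
right = {"b": 5, "c": 6}
merged = union(left, right)        # "b" is taken from `right`
```

Folding this operator over the per-expression maps (the role bar plays in the question) would then build the whole map without relying on plain concatenation.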

display working out of order in Chez Scheme

I'm using chez 9.5.4 on a Mac.
The following code:
;; demo.ss
(map display (list "this " "is " "weird "))
does this:
$ chez --script demo.ss
weird this is
Why the accidental Yoda?
How do I prevent this?
It works as expected in Chicken Scheme.
As answered by u/bjoli on reddit:
Your code is relying on unspecified behaviour: The order of map is unspecified.
You want for-each.
Chez has no stack overflow (like Racket or Guile). Doing a right fold with map means no (reverse ...) at the end. It is faster, except in some continuation-heavy code.
Schemes without the expanding stack optimization all do a left fold. Like chicken.
The short answer is that this is just what map does.
According to the r7rs-small specification, on page 51 of https://small.r7rs.org/attachment/r7rs.pdf :
The dynamic order in which proc is applied to the elements of the lists is unspecified.
That's because map is intended for transforming lists by applying a pure function to each of their elements. The only effect of map should be its result list.
As divs1210 quotes u/bjoli in pointing out, Scheme also defines a procedure that does the thing you want. In fact, for-each is described on the very same page of the r7rs-small pdf! It says:
The arguments to for-each are like the arguments to map, but for-each calls proc for its side effects rather than for its values. Unlike map, for-each is guaranteed to call proc on the elements of the lists in order from the first element(s) to the last, and the value returned by for-each is unspecified.

Difficulty writing PEG recursive expression grammar with Arpeggio

My input text might have a simple statement like this:
aircraft
In my language I call this a name which represents a set of instances with various properties.
It yields an instance_set of all aircraft instances in this example.
I can apply a filter in parentheses to any instance_set:
aircraft(Altitude < ceiling)
It yields another, possibly reduced instance_set.
And since it is an instance set, I can filter it yet again:
aircraft(Altitude < ceiling)(Speed > min_speed)
I foolishly thought I could do something like this in my grammar:
instance_set = expr
expr = source / instance_set
source = name filter?
It parses my first two cases correctly, but chokes on the last one:
aircraft(Altitude < ceiling)(Speed > min_speed)
The error is reported just before the second open paren.
Why doesn't Arpeggio see that there is just a filtered instance_set which is itself a filtered instance set?
I humbly submit my appeal to the peg parsing gods and await insight...
Your first two cases both match source. Once source is matched, it's matched; that's the PEG contract. So the parser isn't going to explore the alternative.
But suppose it did. How could that help? The rule says that if an expr is not a source, then it's an instance_set. But an instance_set is just an expr. In other words, an expr is either a source or it's an expr. Clearly the alternative doesn't get us anywhere.
I'm pretty sure Arpeggio has repetitions, which is what you really want here:
source = name filter*
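To illustrate what the repetition buys you, here is a small hand-written Python sketch of the rule source = name filter*. It does not use Arpeggio's API; parse_source and the two regular expressions are hypothetical, and filter bodies are assumed to contain no nested parentheses.

```python
import re

NAME = re.compile(r"[A-Za-z_]\w*")
FILTER = re.compile(r"\(([^()]*)\)")   # assumption: no nested parentheses


def parse_source(text):
    """Parse `name filter*` and return (name, [filter bodies])."""
    match = NAME.match(text)
    if match is None:
        raise ValueError("expected a name")
    name, pos = match.group(), match.end()

    filters = []
    while True:                        # the `filter*` repetition
        found = FILTER.match(text, pos)
        if found is None:
            break
        filters.append(found.group(1).strip())
        pos = found.end()

    if pos != len(text):
        raise ValueError(f"unexpected input at position {pos}")
    return name, filters


result = parse_source("aircraft(Altitude < ceiling)(Speed > min_speed)")
```

The loop consumes as many filters as are present, so the unfiltered, singly filtered and doubly filtered inputs all go through the same rule.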

Fast lookup of tree with placeholders?

For an application I'm considering, there would be a large (100,000+) 'database' of trees (think expressions in a programming language, or S-expressions), and I would need to query that database for expressions that match a specific given expression.
Before giving the details of what I'd like to have, note that I'd appreciate any information related to indexing a large set of trees for optimizing lookup by a subtree.
In my specific situation (which would be for a backend to be used by Metamath proof assistants), expressions have the following structure (in Haskell-like notation):
data Expression = Placeholder Id | VarName Id | ConstName Id [Expression]
or as a BNF for an S-expression form:
Expression = '?' Id | Id | '(' Id Expression* ')'
where Id is some kind of identifier.
For example, I could have a database with expressions like
(equiv ?ph ?ps)
(not (in (appl (sqrt) (2)) (Q)))
(equiv (eq ?A ?B) (forall ?x (equiv (in ?x ?A) (in ?x ?B))))
In this context, two expressions match if they can be made equal by substitution of expressions for placeholders. So looking up (equiv (eq A (emptyset)) ?ph) in the above mini-database would result in the first and last expressions.
So again: how would I implement fast lookups in a large set of (expression) trees with placeholders? What kind of index data structure could I use?
I would implement the lookup with a trie. Each key would consist of one of the following:
ConstName Identifier
Variable w/ context info
ConstValue
Placeholder
These should be ordered in some fashion- possibly Placeholder, then all ConstNames (alphabetical), then variables (scope ordering, then argument order), then ConstValues (numerical order). As long as there's a concrete ordering for usage in the trie, you're fine.
Traverse the expression's tree, injecting the appropriate keys into the trie as they are encountered. Do this for all the expressions you want to insert into your data structure. When it comes time to query it, you can traverse the trie in a similar fashion, but with a few new rules.
Everything matches a placeholder node. If it matches some other key as well, then you'll need to explore both branches (easily done via a recursive DFS-like approach).
A placeholder matches everything. This is not equivalent to the previous point- we are talking about placeholders in the query here, the previous bullet is regarding placeholders as trie keys.
Now, this does mean that the search space can somewhat "explode" as you encounter placeholders, but there is one thing you can do to try to mitigate this in practice. Traverse the expression's tree in a breadth-first fashion (both in construction of the trie, and querying). This means if one of the arguments is a placeholder, you won't have to full-depth search every single subtree that matches that expression so far- instead you jump ahead to the next argument- which may not be a placeholder, and will thus greatly prune the search space (compared to matching "everything").
For completeness' sake, let's take one of your examples
(not (in (appl (sqrt) (2)) (Q)))
and make a trie entry from that-
not -> in -> appl -> "Q" -> sqrt -> 2
adding (not (in ?ph E)) to this would result in-
not -> in -> appl -> "Q" -> sqrt -> 2
          \-> ?ph -> "E"
Continue in this fashion injecting expressions into the trie. Also traverse in this fashion for querying until you reach the ends of your searches into the trie, and return those that matched.
Note- the uniqueness of these entries is based on the assumption you do not have to support variadic functions. If you do, attach to each key some context info (read the next paragraphs for info on how to do this) to distinguish which arguments go to which functions.
There is one detail I glossed over- variables. If you only want it to match if they are the exact same variable name, then no work is necessary. But this likely isn't what you want; you probably want it to match generic variables as long as they are "consistent" with each other. The way to do this is to assign each variable an identifier that represents the scope of which it was first defined.
The easiest way to do this is just compose an identifier from the concatenation of the argument ordering of its ancestors. That is, if a variable is first defined as the second argument to a function which is the fifth argument to the root function, then we might label it as (5, 2) or (2, 5), whichever makes more sense intuitively. Either way, this will ensure the variable is given a consistent identifier regardless of other variables / functions elsewhere. Then proceed as normal with this new variable name.
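The trie lookup described in this answer could be sketched in Python roughly as follows. This is an illustration under several assumptions, not a tuned implementation: expressions are nested tuples such as ("equiv", "?ph", "?ps"), strings starting with "?" are placeholders, other strings are variables matched by exact name, each constant key carries its arity (so no variadic functions), and repeated placeholders are not checked for consistent substitution, so the result is a candidate list. All names (Trie, bfs_keys, WILD) are hypothetical.

```python
from collections import deque

PLACEHOLDER = "?"
WILD = object()   # stands for "anything" below a query placeholder


def is_placeholder(expr):
    return isinstance(expr, str) and expr.startswith("?")


def key_of(expr):
    if is_placeholder(expr):
        return PLACEHOLDER
    if isinstance(expr, str):
        return ("var", expr)
    return ("const", expr[0], len(expr) - 1)   # name plus arity


def bfs_keys(expr):
    """Flatten an expression into trie keys in breadth-first order."""
    keys, queue = [], deque([expr])
    while queue:
        node = queue.popleft()
        keys.append(key_of(node))
        if not isinstance(node, str):
            queue.extend(node[1:])
    return keys


class Trie:
    def __init__(self):
        self.children = {}
        self.entries = []

    def insert(self, expr):
        node = self
        for key in bfs_keys(expr):
            node = node.children.setdefault(key, Trie())
        node.entries.append(expr)

    def lookup(self, query):
        results = []
        self._walk((query,), results)
        return results

    def _walk(self, pending, results):
        # `pending` is the breadth-first queue of query subtrees still to match.
        if not pending:
            results.extend(self.entries)
            return
        head, rest = pending[0], pending[1:]
        for key, child in self.children.items():
            if key == PLACEHOLDER:
                # A stored placeholder absorbs the whole query subtree.
                child._walk(rest, results)
            elif head is WILD or is_placeholder(head):
                # A query placeholder matches any stored key; the stored
                # node's arguments are then matched as wildcards.
                arity = key[2] if key[0] == "const" else 0
                child._walk(rest + (WILD,) * arity, results)
            elif key == key_of(head):
                kids = () if isinstance(head, str) else tuple(head[1:])
                child._walk(rest + kids, results)


db = Trie()
db.insert(("equiv", "?ph", "?ps"))
db.insert(("not", ("in", ("appl", ("sqrt",), ("2",)), ("Q",))))
candidates = db.lookup(("equiv", ("eq", "A", ("emptyset",)), "?ph"))
```

Because siblings are visited before grandchildren, a placeholder argument costs only one step before the next, possibly concrete, argument prunes the search. A real unifier would still need to run over the returned candidates to enforce consistent substitutions.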

How would you structure a spreadsheet app in elm?

I've been looking at elm and I really enjoy learning the language. I've been thinking about doing a spreadsheet application, but I can't wrap my head around how it would be structured.
Let's say we have three cells; A, B and C.
If I enter 4 in cell A and =A in cell B, how would I get cell B to always equal cell A? If I then enter =A+B in cell C, can that be evaluated to 8, and also be updated when A or B changes?
Not sure how to leverage Signals for such dynamic behavior.
Regards Oskar
First you need to decide how to represent your spreadsheet grid. If you come from a C background, you may want to use a 2D array, but I've found that a dictionary actually works better in Elm. So you can define type alias Grid a = Dict (Int, Int) a.
As for the a, what each cell holds... this is an opportunity to define a domain-specific language. So something like
type Expr = Lit Float | Ref (Int, Int) | Op2 (Float -> Float -> Float) Expr Expr
This means an expression is either a literal float, a reference to another cell location, or an operator. An operator can be any function on two floats, and two other expressions which get recursively evaluated. Depending on what you're going for, you can instead define specific tags for each operation, like Plus Expr Expr | Times Expr Expr, or you can add extra opN tags for operations of different arity (like negate).
So then you might define type alias Spreadsheet = Grid Expr, and if you want to alias (Int, Int) to something, that might help too. I'm also assuming you only want floats in your spreadsheet.
Now you need functions to convert strings to expressions and back. The traditional names for these functions are parse and eval.
parse : String -> Maybe Expr -- Result can also work
eval : Spreadsheet -> Grid Float
evalOne : Expr -> Spreadsheet -> Maybe Float
Parse will be a little tricky; the String module is your friend. Eval will involve chasing references through the spreadsheet and filling in the results, recursively. At first you'll want to ignore the possibility of catching infinite loops. Also, this is just a sketch, if you find that different type signatures work better, use them.
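As a language-neutral illustration of the evaluator, here is a small Python sketch of the same Expr shape and the reference chasing. It is not Elm, the class and function names only mirror the sketch above, and, as suggested, it ignores cyclic references entirely.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple, Union

Cell = Tuple[int, int]


@dataclass(frozen=True)
class Lit:
    value: float


@dataclass(frozen=True)
class Ref:
    cell: Cell


@dataclass(frozen=True)
class Op2:
    fn: Callable[[float, float], float]
    left: "Expr"
    right: "Expr"


Expr = Union[Lit, Ref, Op2]
Spreadsheet = Dict[Cell, Expr]


def eval_one(expr: Expr, sheet: Spreadsheet) -> Optional[float]:
    """Evaluate one expression, chasing references; None if a cell is missing."""
    if isinstance(expr, Lit):
        return expr.value
    if isinstance(expr, Ref):
        target = sheet.get(expr.cell)
        return None if target is None else eval_one(target, sheet)
    left = eval_one(expr.left, sheet)
    right = eval_one(expr.right, sheet)
    if left is None or right is None:
        return None
    return expr.fn(left, right)


def eval_sheet(sheet: Spreadsheet) -> Dict[Cell, Optional[float]]:
    """Recompute every cell from scratch (no caching, no cycle detection)."""
    return {cell: eval_one(expr, sheet) for cell, expr in sheet.items()}


A, B, C = (0, 0), (0, 1), (0, 2)
sheet = {A: Lit(4.0), B: Ref(A), C: Op2(operator.add, Ref(A), Ref(B))}
values = eval_sheet(sheet)   # C evaluates to 8.0
```

Changing cell A and calling eval_sheet again updates B and C, which is the "just rerun the evaluator" approach from the question's A, B, C example.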
As for the view, I'd start with read-only, so you can verify hard-coded spreadsheets are evaluated properly. Then you can worry about editing, with the idea being that you just rerun the parser and evaluator and get a new spreadsheet to render. It should work because a spreadsheet has no state other than the contents of each cell. (Minimizing the recomputed work is one of many different ways you can extend this.) If you're using elm-html, table elements ought to be fine.
Hope this sets you off in the right direction. This is an ambitious project and I'd love to see it when you're done (post it to the mailing list). Good luck!

How to tell if an identifier is being assigned or referenced? (FLEX/BISON)

So, I'm writing a language using flex/bison and I'm having difficulty with implementing identifiers, specifically when it comes to knowing when you're looking at an assignment or a reference,
for example:
1) A = 1+2
2) B + C (where B and C have already been assigned values)
Example one I can work out by returning an ID token from flex to bison, and just following a grammar that recognizes that 1+2 is an integer expression, putting A into the symbol table, and setting its value.
Examples two and three are more difficult for me because, after going through my lexer, what's being returned in ex. 2 to bison is "ID PLUS ID". I have a grammar that recognizes arithmetic expressions for numerical values, like INT PLUS INT (which would produce an INT), or DOUBLE MINUS INT (which would produce a DOUBLE). If I have "ID PLUS ID", how do I know what type the return value is?
Here's the best idea that I've come up with so far: When tokenizing, every time an ID comes up, I search for its value and type in the symbol table and switch out the ID token with its respective information; for example: while tokenizing, I come across B, which has a regex that matches it as being an ID. I look in my symbol table and see that it has a value of 51.2 and is a DOUBLE. So instead of returning ID, with a value of B to bison, I'm returning DOUBLE with a value of 51.2
I have two different solutions that contradict each other. Here's why: if I want to assign a value to an ID, I would say to my compiler A = 5. In this situation, if I'm using my previously described solution, what I'm going to get after everything is tokenized might be INT ASGN INT, or STRING ASGN INT, etc. So, in this case, I would use the former solution, as opposed to the latter.
My question would be: what kind of logical device do I use to help my compiler know which solution to use?
NOTE: I didn't think it necessary to post source code to describe my conundrum, but I will if anyone could use it effectively as a reference to help me understand their input on this topic.
Thank you.
The usual way is to have a yacc/bison rule like:
expr: ID { $$ = lookupId($1); }
where the lookupId function looks up a symbol in the symbol table and returns its type and value (or type and storage location if you're writing a compiler rather than a strict interpreter). Then, your other expr rules don't need to care whether their operands come from constants or symbols or other expressions:
expr: expr '+' expr { $$ = DoAddition($1, $3); }
The function DoAddition takes the types and values (or locations) for its two operands and either adds them, producing a result, or produces code to do the addition at run time.
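The same idea can be sketched outside of bison. The Python toy interpreter below is an illustration under stated assumptions, not generated parser code: the lexer always returns ID for identifiers, the position in the grammar decides whether an ID is an assignment target or a reference, and references go through a lookup_id helper (a stand-in for lookupId). Only + is supported, and the result type follows from the operand values.

```python
import re

TOKEN = re.compile(r"\s*(?:(\d+\.\d+)|(\d+)|([A-Za-z_]\w*)|(.))")


def tokenize(text):
    tokens = []
    for dbl, integer, ident, op in TOKEN.findall(text.strip()):
        if dbl:
            tokens.append(("DOUBLE", float(dbl)))
        elif integer:
            tokens.append(("INT", int(integer)))
        elif ident:
            tokens.append(("ID", ident))      # never replaced by its value here
        else:
            tokens.append((op, op))
    return tokens


def lookup_id(symbols, name):
    if name not in symbols:
        raise NameError(f"{name} used before assignment")
    return symbols[name]


def parse_operand(tokens, pos, symbols):
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    kind, value = tokens[pos]
    if kind == "ID":                           # expr: ID  -> reference
        return lookup_id(symbols, value), pos + 1
    if kind in ("INT", "DOUBLE"):
        return value, pos + 1
    raise SyntaxError(f"unexpected token {value!r}")


def parse_expr(tokens, pos, symbols):
    total, pos = parse_operand(tokens, pos, symbols)
    while pos < len(tokens) and tokens[pos][0] == "+":
        rhs, pos = parse_operand(tokens, pos + 1, symbols)
        total = total + rhs                    # int + float yields float
    return total, pos


def run(line, symbols):
    tokens = tokenize(line)
    # stmt: ID '=' expr  -> the leading ID is an assignment target
    if len(tokens) >= 2 and tokens[0][0] == "ID" and tokens[1][0] == "=":
        value, pos = parse_expr(tokens, 2, symbols)
        symbols[tokens[0][1]] = value
    else:
        value, pos = parse_expr(tokens, 0, symbols)
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return value


symbols = {}
run("A = 1+2", symbols)
run("B = 51.2", symbols)
run("C = 4", symbols)
total = run("B + C", symbols)
```

The lexer never needs to know whether an identifier is being assigned or read; only the rule that consumes the ID token decides.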
If possible, redesign your language so that the situation is unambiguous. This is why even Javascript has var.
Otherwise you're going to need to disambiguate via semantic rules, for example that the first use of an identifier is its declaration. I don't see what the problem is with your case (2): just generate the appropriate code. If B and C haven't been used yet, a value-reading use like this should be illegal, but that involves you in control flow analysis if taken to the Nth degree of accuracy, so you might prefer to assume initial values of zero.
In any case you can see that it's fundamentally a language design problem rather than a coding problem.