First and follow of the non-terminals in two grammars - grammar

Given the following grammar:
S -> L=L
S -> L
L -> *L
L -> id
What are the first and follow for the non-terminals?
If the grammar is changed into:
S -> L=R
S -> R
L -> *R
L -> id
R -> L
What will be the first and follow ?

When I took a compiler course in college I didn't understand FIRST and FOLLOWS at all. I implemented the algorithms described in the Dragon book, but I had no clue what was going on. I think I do now.
I assume you have some book that gives a formal definition of these two sets, and the book is completely incomprehensible. I'll try to give an informal description of them, and hopefully that will help you make sense of what's in your book.
The FIRST set is the set of terminals you could possibly see as the first part of the expansion of a non-terminal. The FOLLOWS set is the set of terminals you could possibly see following the expansion of a non-terminal.
In your first grammar, there are only three kinds of terminals: =, *, and id. (You might also consider $, the end-of-input symbol, to be a terminal.) The only non-terminals are S (a statement) and L (an Lvalue -- a "thing" you can assign to).
Think of FIRST(S) as the set of terminals that could possibly start a statement. Intuitively, you know you do not start a statement with =. So you wouldn't expect that to show up in FIRST(S).
So how does a statement start? There are two production rules that define what an S looks like, and they both start with L. So to figure out what's in FIRST(S), you really have to look at what's in FIRST(L). There are two production rules that define what an Lvalue looks like: it either starts with a * or with an id. So FIRST(S) = FIRST(L) = { *, id }.
FOLLOWS(S) is easy. Nothing follows S because it is the start symbol. So the only thing in FOLLOWS(S) is $, the end-of-input symbol.
FOLLOWS(L) is a little trickier. You have to look at every production rule where L appears, and see what comes after it. In the first rule, you see that = may follow L. So = is in FOLLOWS(L). But you also notice in that rule that there is another L at the end of the production rule. So another thing that could follow L is anything that could follow that production. We already figured out that the only thing that can follow the S production is the end-of-input. So FOLLOWS(L) = { =, $ }. (If you look at the other production rules, L always appears at the end of them, so you just get $ from those.)
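If it helps to see the reasoning above mechanically, here is a minimal Python sketch that computes both sets for the first grammar by repeating the rules until nothing changes. It assumes a grammar with no empty productions (true for this grammar), and the names GRAMMAR, first_sets and follow_sets are made up for this illustration.

```python
# Grammar 1 from the question; each right-hand side is a list of symbols.
GRAMMAR = [
    ("S", ["L", "=", "L"]),
    ("S", ["L"]),
    ("L", ["*", "L"]),
    ("L", ["id"]),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
START = "S"


def first_sets(grammar):
    """FIRST sets, assuming no production derives the empty string."""
    first = {nt: set() for nt in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            head = rhs[0]
            # A terminal starts itself; a non-terminal contributes its FIRST.
            new = first[head] if head in NONTERMINALS else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def follow_sets(grammar, first):
    """FOLLOW sets, again assuming there are no empty productions."""
    follow = {nt: set() for nt in NONTERMINALS}
    follow[START].add("$")  # only end-of-input follows the start symbol
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            for i, sym in enumerate(rhs):
                if sym not in NONTERMINALS:
                    continue
                if i + 1 < len(rhs):
                    # Something comes right after sym in this production.
                    nxt = rhs[i + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:
                    # sym ends the production: whatever follows lhs follows sym.
                    new = follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow


FIRST = first_sets(GRAMMAR)
FOLLOW = follow_sets(GRAMMAR, FIRST)
print(sorted(FIRST["S"]), sorted(FIRST["L"]))    # ['*', 'id'] ['*', 'id']
print(sorted(FOLLOW["S"]), sorted(FOLLOW["L"]))  # ['$'] ['$', '=']
```

The loop structure is the important part: each pass applies the same "look at every production" step described above, and the sets stop growing after a couple of passes.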
Take a look at this Easy Explanation, and for now ignore all the stuff about ϵ, because you don't have any productions which contain the empty-string. Under "Rules for First Sets", rules #1, #3, and #4.1 should make sense. Under "Rules for Follows Sets", rules #1, #2, and #3 should make sense.
Things get more complicated when you have ϵ in your production rules. Suppose you have something like this:
D -> S C T id = V // Declaration is [Static] [Const] Type id = Value
S -> static | ϵ // The 'static' keyword is optional
C -> const | ϵ // The 'const' keyword is optional
T -> int | float // The Type is mandatory and is either 'int' or 'float'
V -> ... // The Value gets complicated, not important here.
Now if you want to compute FIRST(D) you can't just look at FIRST(S), because S may be "empty". You know intuitively that FIRST(D) is { static, const, int, float }. That intuition is codified in rule #4.2. Think of SCT in this example as Y1Y2Y3 in the "Easy Explanation" rules.
If you want to compute FOLLOWS(S), you can't just look at FIRST(C), because that may be empty, so you also have to look at FIRST(T). So FOLLOWS(S) = { const, int, float }. You get that by applying "Rules for follow sets" #2 and #4 (more or less).
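Here is a rough Python sketch of how the ϵ case changes the computation. The only real difference is a helper that walks a sequence of symbols and keeps going as long as each one can vanish. The production V -> value is a placeholder I made up because V is elided above, and the function names are hypothetical.

```python
EPS = "ε"

GRAMMAR = {
    "D": [["S", "C", "T", "id", "=", "V"]],
    "S": [["static"], []],      # [] stands for the empty production
    "C": [["const"], []],
    "T": [["int"], ["float"]],
    "V": [["value"]],           # placeholder: the real V is not given
}


def first_of_seq(seq, first):
    """FIRST of a symbol sequence; contains EPS only if every symbol can vanish."""
    out = set()
    for sym in seq:
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out          # this symbol cannot be empty, so stop here
    out.add(EPS)
    return out


def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in GRAMMAR.items():
            for alt in alternatives:
                new = first_of_seq(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(first, start="D"):
    follow = {nt: set() for nt in GRAMMAR}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in GRAMMAR.items():
            for alt in alternatives:
                for i, sym in enumerate(alt):
                    if sym not in GRAMMAR:
                        continue
                    rest = first_of_seq(alt[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:
                        # Everything after sym can vanish, so FOLLOW(lhs) applies.
                        new |= follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
print(sorted(FIRST["D"]))   # ['const', 'float', 'int', 'static']
print(sorted(FOLLOW["S"]))  # ['const', 'float', 'int']
```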
I hope that helps and that you can figure out FIRST and FOLLOWS for the second grammar on your own.
If it helps, R represents an Rvalue -- a "thing" you can't assign to, such as a constant or a literal. An Lvalue can also act as an Rvalue (but not the other way around).
a = 2; // a is an lvalue, 2 is an rvalue
a = b; // a is an lvalue, b is an lvalue, but in this context it's an rvalue
2 = a; // invalid because 2 cannot be an lvalue
2 = 3; // invalid, same reason.
*4 = b; // Valid! You would almost never write code like this, but it is
// grammatically correct: dereferencing an Rvalue gives you an Lvalue.

menhir - associate AST nodes with token locations in source file

I am using Menhir to parse a DSL. My parser builds an AST using an elaborate collection of nested types. During later type checking and other passes, I would like error reports generated for the user to refer to the source file position where the error occurred. These are not parsing errors, and they are generated after parsing is completed.
A naive solution would be to equip all AST types with additional location information, but that would make working with them (e.g. constructing or matching) unnecessarily clumsy. What are the established practices for doing that?
I don't know if it's a best practice, but I like the approach taken in the abstract syntax tree of the Frama-C system; see https://github.com/Frama-C/Frama-C-snapshot/blob/master/src/kernel_services/ast_data/cil_types.mli
This approach uses "layers" of records and algebraic types nested in each other. The records hold meta-information like source locations, as well as the algebraic "node" you can match on.
For example, here is a part of the representation of expressions:
type ...
and exp = {
eid: int; (** unique identifier *)
enode: exp_node; (** the expression itself *)
eloc: location; (** location of the expression. *)
}
and exp_node =
| Const of constant (** Constant *)
| Lval of lval (** Lvalue *)
| UnOp of unop * exp * typ
| BinOp of binop * exp * exp * typ
...
So given a variable e of type exp, you can access its source location with e.eloc, and pattern match on its abstract syntax tree in e.enode.
So simple, "top-level" matches on syntax are very easy:
let rec is_const_expr e =
match e.enode with
| Const _ -> true
| Lval _ -> false
| UnOp (_op, e', _typ) -> is_const_expr e'
| BinOp (_op, l, r, _typ) -> is_const_expr l && is_const_expr r
To match deeper in an expression, you have to go through a record at each level. This adds some syntactic clutter, but not too much, as you can pattern match on only the one record field that interests you:
let optimize_double_negation e =
match e.enode with
| UnOp (Neg, { enode = UnOp (Neg, e', _) }, _) -> e'
| _ -> e
For comparison, on a pure AST without metadata, this would be something like:
let optimize_double_negation e =
match e with
| UnOp (Neg, UnOp (Neg, e', _), _) -> e'
| _ -> e
I find that Frama-C's approach works well in practice.
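For readers who don't use OCaml, the same layering can be sketched in Python. This is only an illustration of the idea, not Frama-C code: the class names and the Location fields are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Location:
    file: str
    line: int
    column: int


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class UnOp:
    op: str
    operand: "Exp"


@dataclass(frozen=True)
class BinOp:
    op: str
    left: "Exp"
    right: "Exp"


ExpNode = Union[Const, UnOp, BinOp]


@dataclass(frozen=True)
class Exp:
    """Outer layer: metadata plus the node you actually match on."""
    enode: ExpNode
    eloc: Location


def optimize_double_negation(e: Exp) -> Exp:
    """Rewrite neg(neg(x)) to x, going through one record per level."""
    outer = e.enode
    if isinstance(outer, UnOp) and outer.op == "neg":
        inner = outer.operand.enode
        if isinstance(inner, UnOp) and inner.op == "neg":
            return inner.operand
    return e


x = Exp(Const(5), Location("demo.c", 1, 3))
neg1 = Exp(UnOp("neg", x), Location("demo.c", 1, 2))
neg2 = Exp(UnOp("neg", neg1), Location("demo.c", 1, 1))
result = optimize_double_negation(neg2)
print(result.eloc.column)  # 3
```

The rewritten expression keeps the location of the inner operand, which is usually what you want in an error message that talks about x.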
You need somehow to attach the location information to your nodes. The usual solution is to encode your AST node as a record, e.g.,
type node =
| Typedef of typdef
| Typeexp of typeexp
| Literal of string
| Constant of int
| ...
type annotated_node = { node : node; loc : loc}
Since you're using records, you can still pattern match without too much syntactic overhead, e.g.,
match node with
| {node=Typedef t} -> pp_typedef t
| ...
Depending on your representation, you may choose between wrapping each branch of your type individually, wrapping the whole type, or wrapping recursively, as in the Frama-C example by @Isabelle Newbie.
A similar but more general approach is to extend a node not with the location, but just with a unique identifier, and to use a finite map to add arbitrary data to nodes. The benefit of this approach is that you can extend your nodes with arbitrary data, as you actually externalize node attributes. The drawback is that you can't actually guarantee the totality of an attribute, since finite maps are not total. Thus it is harder to preserve an invariant that, for example, all nodes have a location.
Since every heap-allocated object already has an implicit unique identifier, the address, it is possible to attach data to heap-allocated objects without actually wrapping them in another type. For example, we can still use type node as it is and use finite maps to attach arbitrary pieces of information to them, as long as each node is a heap object, i.e., the node definition doesn't contain constant constructors (if it does, you can work around it by adding a bogus unit value, e.g., | End can be represented as | End of unit).
Of course, by saying an address, I do not literally mean the physical or virtual address of an object. OCaml uses a moving GC so an actual address of an OCaml object may change during a program execution. Moreover, an address, in general, is not unique, as once an object is deallocated its address can be grabbed by a completely different entity.
Fortunately, after ephemera were added to the recent version of OCaml it is no longer a problem. Moreover, an ephemeron will play nicely with the GC, so that if a node is no longer reachable its attributes (like file locations) will be collected by the GC. So, let's ground this with a concrete example. Suppose we have two nodes c1 and c2:
let c1 = Literal "hello"
let c2 = Constant 42
Now we can create a location mapping from nodes to locations (we will represent the latter as just strings)
module Locations = Ephemeron.K1.Make(struct
type t = node
let hash = Hashtbl.hash (* or your own hash if you have one *)
let equal = (=) (* or a specialized equal operator *)
end)
The Locations module provides an interface of a typical imperative hash table. So let's use it. In the parser, whenever you create a new node you should register its locations in the global locations value, e.g.,
let locations = Locations.create 1337
(* somewhere in the semantics actions, where c1 and c2 are created *)
Locations.add locations c1 "hello.ml:12:32"
Locations.add locations c2 "hello.ml:13:56"
And later, you can extract the location:
# Locations.find locations c1;;
- : string = "hello.ml:12:32"
As you can see, the solution is nice in the sense that it doesn't touch the node data type, so the rest of your code can pattern match on it nice and easy. It is still a little bit dirty, though, as it requires global mutable state, which is hard to maintain. Also, since we are using an object address as a key, every newly created object, even if it was logically derived from the original object, will have a different identity. For example, suppose you have a function that normalizes all literals:
let normalize = function
| Literal str -> Literal (normalize_literal str)
| node -> node
It will create a new Literal node from the original nodes, so all the location information will be lost. That means, that you need to update the location information, every time you derive one node from another.
Another issue with ephemera is that they can't survive marshaling or serialization. I.e., if you store your AST somewhere in a file and then restore it, all nodes will lose their identity, and the location table will become empty.
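The side-table idea and its identity pitfall are not specific to OCaml. As a rough analogue, here is a Python sketch that uses weakref.WeakKeyDictionary from the standard library in place of an ephemeron table. The Literal and Constant classes and the lower-casing normalize are assumptions made up for the example; plain Python objects hash by identity, which plays the role of the "address" here.

```python
import weakref


class Literal:
    def __init__(self, text):
        self.text = text


class Constant:
    def __init__(self, value):
        self.value = value


# Side table from nodes to locations. Keys are held weakly, so an entry
# disappears once its node is no longer reachable.
locations = weakref.WeakKeyDictionary()

c1 = Literal("Hello")
c2 = Constant(42)
locations[c1] = "hello.ml:12:32"
locations[c2] = "hello.ml:13:56"


def normalize(node):
    """Hypothetical pass that lower-cases literals, producing a NEW node."""
    if isinstance(node, Literal):
        return Literal(node.text.lower())
    return node


n1 = normalize(c1)
print(locations[c1])      # hello.ml:12:32
print(n1 in locations)    # False: the derived node has a new identity

# The location has to be carried over by hand whenever a node is derived.
locations[n1] = locations[c1]
```

The node classes stay free of location fields, at the price of having to remember the last line after every transformation.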
Speaking of the "monadic approach" that you mentioned in the comments: though monads are magic, they still can't magically solve all the problems. They are not silver bullets :) In order to attach something to a node we still need to extend it with an extra attribute, either the location information directly or an identity through which we can attach properties indirectly. The monad can be useful for the latter, though: instead of having a global reference to the last assigned identifier, we can use a state monad to encapsulate our id generator. And for the sake of completeness, instead of using a state monad or a global reference to generate unique identifiers, you can use UUIDs and get identifiers that are not only unique in a program run, but are also universally unique, in the sense that there are no other objects in the world with the same identifier, no matter how often you run your program (in a sane world). And although it looks like generating a UUID doesn't use any state, under the hood it still uses an imperative random number generator, so it is sort of cheating, but it can still be seen as purely functional, as it doesn't have observable effects.

How to modify parsing grammar to allow assignment and non-assignment statements?

So the question is about the grammar below. I'm working on a mini-interpreted language for fun (we learned about some compiler design in class, so I want to take it to the next level and try something on my own). I'm stuck trying to make the non-terminal symbol Expr.
Statement ::= Expr SC
Expr ::= /* I need help here */
Assign ::= Name EQUAL Expr
AddSub ::= MulDiv {(+|-) AddSub}
MulDiv ::= Primary {(*|/) MulDiv}
Primary ::= INT | FLOAT | STR | LP Expr RP | Name
Name ::= ID {. Name}
Expr has to be made such that Statement must allow for the two cases:
x = 789; (regular assignment, followed by semicolon)
x+2; (no assignment, just calculation, discarded; followed by a semicolon)
The purpose of the second case is to setup the foundation for more changes in the future. I was thinking about unary increment and decrement operators, and also function calls; both of which don't require assignment to be meaningful.
I've looked at other grammars (C# namely), but it was too complicated and lengthy to understand. Naturally I'm not looking for solutions, but only for guidance on how I could modify my grammar.
All help is appreciated.
EDIT: I should say that my initial thought was Expr ::= Assign | AddSub, but that wouldn't work since it would create ambiguity since both could start with the non-terminal symbol Name. I have made my tokenizer such that it allows one token look ahead (peek), but I have not made such a thing for the non terminals, since it would be trying to fix a problem that could be avoided (ambiguity). In the grammar, the terminals are the ones that are all-caps.
The simplest solution is the one actually taken by the designers of C, and thus by the various C derivatives: treat assignment simply as yet another operator, without restricting it to being at the top-level of a statement. Hence, in C, the following is unproblematic:
while ((ch = getchar()) != EOF) { ... }
Not everyone will consider that good style, but it is certainly common (particularly in the clauses of the for statement, whose syntax more or less requires that assignment be an expression).
There are two small complications, which are relatively easy to handle:
Logically, and unlike most operators, assignment associates to the right so that a = b = 0 is parsed as a = (b = 0) and not (a = b) = 0 (which would be highly unexpected). It also binds very weakly, at least to the right.
Opinions vary as to how tightly it should bind to the left. In C, for the most part a strict precedence model is followed so that a = 2 + b = 3 is rejected since it is parsed as a = ((2 + b) = 3). a = 2 + b = 3 might seem like terrible style, but consider also a < b ? (x = a) : (y = a). In C++, where the result of the ternary operator can be a reference, you could write that as (a < b ? x : y) = a in which the parentheses are required even though assignment has lower precedence than the ternary operator.
None of these options are difficult to implement in a grammar, though.
In many languages, the left-hand side of an assignment has a restricted syntax. In C++, which has reference values, the restriction could be considered semantic, and I believe it is usually implemented with a semantic check, but in many C derivatives lvalue can be defined syntactically. Such definitions are unambiguous, but they are often not amenable to parsing with a top-down grammar, and they can create complications even for a bottom-up grammar. Doing the check post-parse is always a simple solution.
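To make the two points concrete, here is a small Python sketch of a recursive descent parser that treats = as a right-associative, lowest-precedence operator and checks the assignment target only after it has been parsed as an ordinary expression. The tokenizer, the tuple-based tree and all names are assumptions for the sake of the example, and only + is supported besides =.

```python
import re


def tokenize(src):
    # Sketch only: anything that is not a number, a name or one of = + ( )
    # is silently skipped.
    return re.findall(r"\d+|[A-Za-z_]\w*|[=+()]", src)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def assignment(self):
        # assignment ::= additive [ "=" assignment ]   (right-associative)
        left = self.additive()
        if self.peek() == "=":
            self.take()
            right = self.assignment()
            # Target check done after parsing: only a plain name is allowed.
            if not (isinstance(left, str) and left.isidentifier()):
                raise SyntaxError(f"cannot assign to {left!r}")
            return ("=", left, right)
        return left

    def additive(self):
        # additive ::= primary { "+" primary }   (left-associative loop)
        left = self.primary()
        while self.peek() == "+":
            self.take()
            left = ("+", left, self.primary())
        return left

    def primary(self):
        tok = self.take()
        if tok == "(":
            inner = self.assignment()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return inner
        if tok is None or tok in {"=", "+", ")"}:
            raise SyntaxError(f"unexpected {tok!r}")
        return tok  # a number or a name, kept as a string


def parse(src):
    parser = Parser(tokenize(src))
    tree = parser.assignment()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected {parser.peek()!r}")
    return tree


print(parse("a = b = 0"))  # ('=', 'a', ('=', 'b', '0'))
```

With this arrangement, a = 2 + b = 3 is grouped as a = ((2 + b) = 3) and then rejected by the target check, which mirrors the C behaviour described above.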
If you really want to distinguish assignment statements from expression statements, then you indeed run into the problem of prediction failure (not ambiguity) if you use a top-down parsing technique such as recursive descent. Since the grammar is not ambiguous, a simple solution is to use an LALR(1) parser generator such as bison/yacc, which has no problems parsing such a grammar since it does not require an early decision as to which kind of statement is being parsed. On the whole, the use of LALR(1) or even GLR parser generators simplifies implementation of a parser by allowing you to specify a grammar in a form which is easily readable and corresponds to the syntactic analysis. (For example, an LALR(1) parser can handle left-associative operators naturally, while a LL(1) grammar can only produce right-associative parses and therefore requires some kind of reconstruction of the syntax tree.)
A recursive descent parser is a computer program, not a grammar, and its expressiveness is thus not limited by the formal constraints of LL(1) grammars. That is both a strength and a weakness: the strength is that you can find solutions which are not limited by the limitations of LL(1) grammars; the weakness is that it is much more complicated (even, sometimes, impossible) to extract a clear statement about the precise syntax of the language. This power, for example, allows recursive descent grammars to handle left associativity in a more-or-less natural way despite the restriction mentioned above.
If you want to go down this road, then the solution is simple enough. You will have some sort of function:
/* This function parses and returns a single expression */
Node expr() {
Node left = value();
while (true) {
switch (lookahead) {
/* handle each possible operator token. I left out
* the detail of handling operator precedence since it's
* not relevant here
*/
case OP_PLUS: {
accept(lookahead);
left = MakeNode(OP_PLUS, left, value());
break;
}
/* If no operator found, return the current expression */
default:
return left;
}
}
}
That can easily be modified to parse both expressions and statements. First, refactor the function so that it parses the "rest" of an expression, given the first operand. (The only change is a new prototype and the deletion of the first line in the body.)
/* This function parses and returns a single expression
* after the first value has been parsed. The value must be
* passed as an argument.
*/
Node expr_rest(Node left) {
while (true) {
switch (lookahead) {
/* handle each possible operator token. I left out
* the detail of handling operator precedence since it's
* not relevant here
*/
case OP_PLUS: {
accept(lookahead);
left = MakeNode(OP_PLUS, left, value());
break;
}
/* If no operator found, return the current expression */
default:
return left;
}
}
}
With that in place, it is straightforward to implement both expr and stmt:
Node expr() {
return expr_rest(value());
}
Node stmt() {
/* Check lookahead for statements which start with
* a keyword. Omitted for simplicity.
*/
/* either first value in an expr or target of assignment */
Node left = value();
switch (lookahead) {
case OP_ASSIGN: {
accept(lookahead);
return MakeAssignment(left, expr());
}
/* Handle += and other mutating assignments if desired */
default: {
/* Not an assignment, just an expression */
return MakeExpressionStatement(expr_rest(left));
}
}
}
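For anyone who wants to run the idea, here is the same shape as a small Python sketch, adapted to the two statements from the question (x = 789; and x+2;). It is an assumption-laden toy: the tokenizer, the tuple nodes and the method names are made up, only + is handled, and, as in the code above, nothing checks that the assignment target is really a name.

```python
import re


def tokenize(src):
    # Sketch only: recognizes numbers, names and the symbols = + ;
    return re.findall(r"\d+|[A-Za-z_]\w*|[=+;]", src)


class Parser:
    def __init__(self, src):
        self.tokens = tokenize(src)
        self.pos = 0

    def lookahead(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def accept(self, expected=None):
        tok = self.lookahead()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def value(self):
        tok = self.accept()
        if tok in {"=", "+", ";"}:
            raise SyntaxError(f"unexpected {tok!r}")
        return tok

    def expr_rest(self, left):
        """Parse the rest of an expression whose first value is already parsed."""
        while self.lookahead() == "+":
            self.accept("+")
            left = ("+", left, self.value())
        return left

    def expr(self):
        return self.expr_rest(self.value())

    def stmt(self):
        # Either the first value of an expression or the target of an assignment.
        left = self.value()
        if self.lookahead() == "=":
            self.accept("=")
            node = ("assign", left, self.expr())
        else:
            node = ("expr_stmt", self.expr_rest(left))
        self.accept(";")
        return node


print(Parser("x = 789;").stmt())  # ('assign', 'x', '789')
print(Parser("x+2;").stmt())      # ('expr_stmt', ('+', 'x', '2'))
```

The decision between the two statement kinds is made after the first value has been consumed, so one token of lookahead is enough.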

Check if variable is empty or filled

I have the following problem:
prolog prog:
man(thomas, 2010).
man(leon, 2011).
man(thomas, 2012).
man(Man) :- once(man(Man, _)).
problem:
?- man(thomas).
true ; %I want only one true even if there are more "thomas" *working because of once()*
?- man(X).
X = thomas ; %I want all men to be listed *isn't working*
goal:
?- man(thomas).
true ;
?- man(X).
X = thomas ;
X = leon ;
X = thomas ;
I do understand why this happens, but I still want to get the names of all men.
So my solution would be to check whether "Man" is instantiated: if yes, then "once..", else ... something like this:
man(Man) :- (->check<-, once(man(Man, _))) ; man(Man, _).
"check" should be the code snippet that checks whether the variable "Man" is bound.
Is this possible?
One way to achieve this is as follows:
man(X) :-
(nonvar(X), man(X, _)), !
;
man(X, _).
Or, more preferred, would be:
man(X) :-
( var(X)
-> man(X, _)
; once(man(X, _))
).
The cut will ensure only one solution (at most) to an instantiated X, whereas the non-instantiated case will run its course. Note that, with the cut, you don't need once/1. The reason once/1 doesn't work as expected without the cut is that backtracking will still come back and take the "or" condition and succeed there as well.
man(X) :-
setof(t,Y^man(X,Y),_).
In addition to what you are asking for, this removes redundant answers/solutions.
The built-in setof/3 describes in its last argument the sorted list of solutions found in the first argument, and it does so for each different instantiation of the free variables of the goal.
Free variables are those which neither occur in the first argument nor as an existential variable – the term on the left of (^)/2.
In our case this means that the last argument will always be [t] which is uninteresting. Therefore the _.
Two variables occurring in the goal are X and Y. Or, to be more precise the variables contained in X and Y. Y is an existential variable.
The only free variable is X. So all solutions for X are enumerated without redundancies. Note that you cannot depend on the precise order of the answers, which happens to be sorted in this concrete case in many implementations.

Why can't a LL grammar be left-recursive?

In the dragon book, LL grammar is defined as follows:
A grammar is LL if and only if for any production A -> a|b, the following two conditions apply.
FIRST(a) and FIRST(b) are disjoint. This implies that they cannot both derive EMPTY.
If b can derive EMPTY, then a cannot derive any string that begins with a terminal in FOLLOW(A); that is, FIRST(a) and FOLLOW(A) must be disjoint.
And I know that an LL grammar can't be left-recursive, but what is the formal reason? I guess a left-recursive grammar will contradict rule 2, right? E.g., I've written the following grammar:
S->SA|empty
A->a
Because FIRST(SA) = {a} (S can derive empty, but A cannot) and FOLLOW(S) = {$, a}, FIRST(SA) and FOLLOW(S) are not disjoint, so this grammar is not LL. But I don't know whether it is the left recursion that makes FIRST(SA) and FOLLOW(S) not disjoint, or whether there is some other reason. To put it another way, is it true that every left-recursive grammar will have a production that violates condition 2 of the LL definition?
OK, I figured it out. If a grammar contains a left-recursive production, like:
S->SA
Then somehow it must contain another production to "finish" the recursion, say:
S->B
And since FIRST(B) is a subset of FIRST(SA), the two sets are not disjoint, which violates condition 1: there must be a conflict when filling the parse table entries corresponding to terminals that are in both FIRST(B) and FIRST(SA). To summarize, a left-recursive grammar can cause the FIRST sets of two or more productions to have common terminals, thus violating condition 1.
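The table conflict can be checked mechanically. The following Python sketch fills LL(1) table cells from FIRST sets for the grammar S -> SA | B and reports every cell that receives more than one production. The productions A -> a and B -> b are assumptions added to make the grammar complete, and the sketch assumes no production derives EMPTY, so FOLLOW sets are not needed.

```python
# S -> S A | B, with hypothetical terminal productions for A and B.
GRAMMAR = [
    ("S", ("S", "A")),
    ("S", ("B",)),
    ("A", ("a",)),
    ("B", ("b",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_of_rhs(rhs, first):
    """FIRST of a right-hand side when nothing derives EMPTY."""
    head = rhs[0]
    return first[head] if head in NONTERMINALS else {head}


def first_sets():
    first = {nt: set() for nt in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            new = first_of_rhs(rhs, first)
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def ll1_conflicts():
    first = first_sets()
    table = {}  # (non-terminal, lookahead terminal) -> predicted productions
    for lhs, rhs in GRAMMAR:
        for terminal in first_of_rhs(rhs, first):
            table.setdefault((lhs, terminal), []).append(rhs)
    return {cell: prods for cell, prods in table.items() if len(prods) > 1}


print(ll1_conflicts())  # {('S', 'b'): [('S', 'A'), ('B',)]}
```

On lookahead b the parser would have to choose between S -> SA and S -> B, which is exactly the violation of condition 1 described above.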
Consider your grammar:
S->SA|empty
A->a
This is a shorthand for the three rules:
S -> SA
S -> empty
A -> a
Now consider the string aaa. How was it produced? You can only read one character at a time if you have no lookahead, so you start off like this (you have S as start symbol):
S -> SA
S -> empty
A -> a
Fine, you have produced the first a. But now you cannot apply any more rules because there are no more non-terminals. You are stuck!
What you should have done was this:
S -> SA
S -> SA
S -> SA
S -> empty
A -> a
A -> a
A -> a
But you don't know this without reading the entire string. You would need an infinite amount of lookahead.
In a general sense, yes: without infinite lookahead, every left-recursive grammar has strings for which the parser cannot tell which rule to apply. Look at the example again: there are two different rules for S. Which one should we use?
An LL(k) grammar is one that allows the construction of a deterministic, descent parser with only k symbols of lookahead. The problem with left recursion is that it makes it impossible to determine which rule to apply until the complete input string is examined, which makes the required k potentially infinite.
Using your example, choose a k, and give the parser an input sequence of length n >= k:
aaaaaaa...
A parser cannot decide if it should apply S->SA or S->empty by looking at the k symbols ahead because the decision would depend on how many times S->SA has been chosen before, and that is information the parser does not have.
The parser would have to choose S->SA exactly n times and S->empty once, and it's impossible to decide which is right by looking at the first k symbols in the input stream.
To know, a parser would have to both examine the complete input sequence, and keep count of how many times S->SA has been chosen, but such a parser would fall outside of the definition of LL(k).
Note that unlimited lookahead is not a solution because a parser runs on limited resources, so there will always be a finite input sequence of a length large enough to make the parser crash before producing any output.
In the book "The Theory of Parsing", Volume 2, by Aho and Ullman, page 681 you can find Lemma 8.3 that states: "No LL(k) grammar is left-recursive".
The proof says:
Suppose that G = (N, T, P, S) has a left-recursive nonterminal A. Then there is a derivation A -> Aw. If w -> e then it is easy to show that G is ambiguous and hence cannot be LL. Thus, assume that w -> v for some v in T+ (a non empty string of terminals). We can further assume that A -> u, being u some string of terminals and that there exists a derivation
Hence, there is another derivation:

Optional named arguments without wrapping them all in "OptionValue"

Suppose I have a function with optional named arguments but I insist on referring to the arguments by their unadorned names.
Consider this function that adds its two named arguments, a and b:
Options[f] = {a->0, b->0}; (* The default values. *)
f[OptionsPattern[]] :=
OptionValue[a] + OptionValue[b]
How can I write a version of that function where that last line is replaced with simply a+b?
(Imagine that that a+b is a whole slew of code.)
The answers to the following question show how to abbreviate OptionValue (easier said than done) but not how to get rid of it altogether: Optional named arguments in Mathematica
Philosophical Addendum: It seems like if Mathematica is going to have this magic with OptionsPattern and OptionValue it might as well go all the way and have a language construct for doing named arguments properly where you can just refer to them by, you know, their names. Like every other language with named arguments does. (And in the meantime, I'm curious what workarounds are possible...)
Why not just use something like:
Options[f] = {a->0, b->0};
f[args___] := (a+b) /. Flatten[{args, Options[f]}]
For more complicated code I'd probably use something like:
Options[f] = {a->0, b->0};
f[OptionsPattern[]] := Block[{a,b}, {a,b} = OptionValue[{a,b}]; a+b]
and use a single call to OptionValue to get all the values at once. (Main reason is that this cuts down on messages if there are unknown options present.)
Update, to programmatically generate the variables from the options list:
Options[f] = {a -> 0, b -> 0};
f[OptionsPattern[]] :=
With[{names = Options[f][[All, 1]]},
Block[names, names = OptionValue[names]; a + b]]
Here is the final version of my answer, containing the contributions from the answer by Brett Champion.
ClearAll[def];
SetAttributes[def, HoldAll];
def[lhs : f_[args___] :> rhs_] /; !FreeQ[Unevaluated[lhs], OptionsPattern] :=
With[{optionNames = Options[f][[All, 1]]},
lhs := Block[optionNames, optionNames = OptionValue[optionNames]; rhs]];
def[lhs : f_[args___] :> rhs_] := lhs := rhs;
The reason why the definition is given as a delayed rule in the argument is that this way we can benefit from the syntax highlighting. The Block trick is used because it fits the problem: it does not interfere with possible nested lexical scoping constructs inside your function, and therefore there is no danger of inadvertent variable capture. We check for the presence of OptionsPattern since this code will not be correct for definitions without it, and we want def to also work in that case.
Example of use:
Clear[f, a, b, c, d];
Options[f] = {a -> c, b -> d};
(*The default values.*)
def[f[n_, OptionsPattern[]] :> (a + b)^n]
You can look now at the definition:
Global`f
f[n$_,OptionsPattern[]]:=Block[{a,b},{a,b}=OptionValue[{a,b}];(a+b)^n$]
f[n_,m_]:=m+n
Options[f]={a->c,b->d}
We can test it now:
In[10]:= f[2]
Out[10]= (c+d)^2
In[11]:= f[2,a->e,b->q]
Out[11]= (e+q)^2
The modifications are done at "compile-time" and are pretty transparent. While this solution saves some typing w.r.t. Brett's, it determines the set of option names at "compile-time", while Brett's does so at "run-time". Therefore, it is a bit more fragile than Brett's: if you add some new option to the function after it has been defined with def, you must Clear it and rerun def. In practice, however, it is customary to start with ClearAll and put all definitions in one piece (cell), so this does not seem to be a real problem. Also, it cannot work with string option names, but your original concept also assumes they are Symbols. Also, they should not have global values, at least not at the time when def executes.
Here's a kind of horrific solution:
Options[f] = {a->0, b->0};
f[OptionsPattern[]] := Module[{vars, tmp, ret},
vars = Options[f][[All,1]];
tmp = cat[vars];
each[{var_, val_}, Transpose[{vars, OptionValue[Automatic,#]& /@ vars}],
var = val];
ret =
a + b; (* finally! *)
eval["ClearAll[", StringTake[tmp, {2,-2}], "]"];
ret]
It uses the following convenience functions:
cat = StringJoin@@(ToString/@{##})&; (* Like sprintf/strout in C/C++. *)
eval = ToExpression[cat[##]]&; (* Like eval in every other lang. *)
SetAttributes[each, HoldAll]; (* each[pattern, list, body] *)
each[pat_, lst_, bod_] := ReleaseHold[ (* converts pattern to body for *)
Hold[Cases[Evaluate@lst, pat:>bod];]]; (* each element of list. *)
Note that this doesn't work if a or b has a global value when the function is called. But that was always the case for named arguments in Mathematica anyway.