How to use hyperoperators with Scalars that aren't really scalar? - raku

I want to make a hash of sets. Well, SetHashes, since they need to be mutable.
In fact, I would like to initialize my Hash with multiple identical copies of the same SetHash.
I have an array containing the keys for the new hash: @keys
And I have my SetHash already initialized in a scalar variable: $set
I'm looking for a clean way to initialize the hash.
This works:
my %hash = ({ $_ => $set.clone } for @keys);
(The parens are needed for precedence; without them, the assignment to %hash is part of the body of the for loop. I could change it to a non-postfix for loop or make any of several other minor changes to get the same result in a slightly different way, but that's not what I'm interested in here.)
Instead, I was kind of hoping I could use one of Raku's nifty hyper-operators, maybe like this:
my %hash = @keys »=>» $set;
That expression works a treat when $set is a simple string or number, but a SetHash?
Array >>=>>> SetHash can never work reliably: order of keys in SetHash is indeterminate
Good to know, but I don't want it to hyper over the RHS, in any order. That's why I used the right-pointing version of the hyperop: so it would instead replicate the RHS as needed to match it up to the LHS. In this sort of expression, is there any way to say "Yo, Raku, treat this as a scalar. No, really."?
I tried an explicit Scalar wrapper (which would make the values harder to get at, but it was an experiment):
my %map = @keys »=>» $($set,)
And that got me this message:
Lists on either side of non-dwimmy hyperop of infix:«=>» are not of the same length while recursing
left: 1 elements, right: 4 elements
So it has apparently recursed into the list on the left and found a single key and is trying to map it to a set on the right which has 4 elements. Which is what I want - the key mapped to the set. But instead it's mapping it to the elements of the set, and the hyperoperator is pointing the wrong way for that combination of sizes.
So why is it recursing on the right at all? I thought a Scalar container would prevent that. The documentation says it prevents flattening; how is this recursion not flattening? What's the distinction being drawn?
The error message says the version of the hyperoperator I'm using is "non-dwimmy", which may explain why it's not in fact doing what I mean, but is there maybe an even-less-dwimmy version that lets me be even more explicit? I still haven't gotten my brain aligned well enough with the way Raku works for it to be able to tell WIM reliably.

I'm looking for a clean way to initialize the hash.
One idiomatic option:
my %hash = @keys X=> $set;
See X metaoperator.
The documentation says ... a Scalar container ... prevents flattening; how is this recursion not flattening? What's the distinction being drawn?
A cat is an animal, but an animal is not necessarily a cat. Flattening may act recursively, but some operations that act recursively don't flatten. Recursive flattening stops if it sees a Scalar. But hyperoperation isn't flattening. I get where you're coming from, but this is not the real problem, or at least not a solution.
I had thought that hyperoperation had two tests controlling recursing:
Is it hyperoperating a nodal operation (eg .elems)? If so, just apply it like a parallel shallow map (so don't recurse). (The current doc quite strongly implies that nodal can only be usefully applied to a method, and only a List one (or augmentation thereof) rather than any routine that might get hyperoperated. That is much more restrictive than I was expecting, and I'm sceptical of its truth.)
Otherwise, is a value Iterable? If so, then recurse into that value. In general the value of a Scalar automatically behaves as the value it contains, and that applies here. So Scalars won't help.
A SetHash doesn't do the Iterable role. So I think this refusal to hyperoperate with it is something else.
I just searched the source and that yields two matches in the current Rakudo source, both in the Hyper module, with this one being the specific one we're dealing with:
multi method infix(List:D \left, Associative:D \right) {
die "{left.^name} $.name {right.^name} can never work reliably..."
}
For some reason hyperoperation explicitly rejects use of Associatives on either the right or left when coupled with the other side being a List value.
Having pursued the "blame" (tracking who made what changes) I arrived at the commit "Die on Associative <<op>> Iterable" which says:
This can never work due to the random order of keys in the Associative.
This used to die before, but with a very LTA error about a Pair.new()
not finding a suitable candidate.
Perhaps this behaviour could be refined so that the determining factor is, first, whether an operand does the Iterable role, and then if it does, and is Associative, it dies, but if it isn't, it's accepted as a single item?
A search for "can never work reliably" in GH/rakudo/rakudo issues yields zero matches.
Maybe file an issue? (Update: I filed "RFC: Allow use of hyperoperators with an Associative that does not do Iterable role instead of dying with 'can never work reliably'".)
For now we need to find some other technique to stop a non-Iterable Associative being rejected. Here I use a Capture literal:
my %hash = @keys »=>» \($set);
This yields: {a => \(SetHash.new("b","a","c")), b => \(SetHash.new("b","a","c")), ....
Adding a custom op unwraps en passant:
sub infix:« my=> » ($lhs, $rhs) { $lhs => $rhs[0] }
my %hash = @keys »my=>» \($set);
This yields the desired outcome: {a => SetHash(a b c), b => SetHash(a b c), ....
my %hash = ({ $_ => $set.clone } for @keys);
(The parens seem to be needed so it can tell that the curlies are a block instead of a Hash literal...)
No. That particular code in curlies is a Block regardless of whether it's in parens or not.
More generally, Raku code of the form {...} in term position is almost always a Block.
For an explanation of when a {...} sequence is a Hash, and how to force it to be one, see my answer to the Raku SO Is that a Hash or a Block?.
Without the parens you've written this:
my %hash = { block of code } for @keys
which attempts to iterate @keys, running the code my %hash = { block of code } for each iteration. The code fails because you can't assign a block of code to a hash.
Putting parens around the ({ block of code } for @keys) part completely alters the meaning of the code.
Now it runs the block of code for each iteration. And it concatenates the result of each run into a list of results, each of which is a Pair generated by the code $_ => $set.clone. Then, when the for iteration has completed, that resulting list of pairs is assigned, once, to my %hash.

Related

How do I remember the root of a binary search tree in Haskell

I am new to Functional programming.
The challenge I have is regarding the mental map of how a binary search tree works in Haskell.
In other programs (C,C++) we have something called root. We store it in a variable. We insert elements into it and do balancing etc..
The program takes a break, does other things (maybe processes user input, creates threads), and then figures out that it needs to insert a new element into the already created tree. It knows the root (stored as a variable) and invokes the insert function with the root and the new value.
So far so good in other languages. But how do I mimic such a thing in Haskell, i.e.
I see functions implementing converting a list to a Binary Tree, inserting a value etc.. That's all good
I want this functionality to be part of a bigger program, so I need to know what the root is so that I can use it to insert again. Is that possible? If so, how?
Note: Is it not possible at all because data structures are immutable and so we cannot use the root at all to insert something. in such a case how is the above situation handled in Haskell?
It all happens in the same way, really, except that instead of mutating the existing tree variable we derive a new tree from it and remember that new tree instead of the old one.
For example, a sketch in C++ of the process you describe might look like:
int main(void) {
    Tree<string> root;
    while (true) {
        string next;
        cin >> next;
        if (next == "quit") exit(0);
        root.insert(next);
        doSomethingWith(root);
    }
}
A variable, a read action, and a loop with a mutate step. In Haskell, we do the same thing, but using recursion for looping and a recursion variable instead of mutating a local.
import Control.Monad (when)

main = loop Empty
  where loop t = do
          next <- getLine
          when (next /= "quit") $ do
            let t' = insert next t
            doSomethingWith t'
            loop t'
If you need doSomethingWith to be able to "mutate" t as well as read it, you can lift your program into State:
import Control.Monad (when)
import Control.Monad.State (execState)

main = loop Empty
  where loop t = do
          next <- getLine
          when (next /= "quit") $ do
            loop (execState doSomethingWith (insert next t))
Writing an example with a BST would take too much time but I give you an analogous example using lists.
Let's invent a function updateListN which replaces the n-th element of a list (counting from 1).
updateListN :: Int -> a -> [a] -> [a]
updateListN i n l = take (i - 1) l ++ n : drop i l
Now for our program:
list = [1,2,3,4,5,6,7,8,9,10] -- The big data structure we might want to use multiple times
main = do
  -- only for show
  print $ updateListN 3 30 list -- [1,2,30,4,5,6,7,8,9,10]
  print $ updateListN 8 80 list -- [1,2,3,4,5,6,7,80,9,10]
  -- now some illustrative complicated processing
  let list' = foldr (\i l -> updateListN i (i*10) l) list list
  -- list' = [10,20,30,40,50,60,70,80,90,100]
  -- Our crazily complicated illustrative algorithm still needs `list`
  print $ zipWith (-) list' list
  -- [9,18,27,36,45,54,63,72,81,90]
See how we "updated" list but it was still available? Most data structures in Haskell are persistent, so updates are non-destructive. As long as we keep a reference to the old data around, we can use it.
As for your comment:
My program is trying the following a) Convert a list to a Binary Search Tree b) do some I/O operation c) Ask for a user input to insert a new value in the created Binary Search Tree d) Insert it into the already created list. This is what the program intends to do. Not sure how to get this done in Haskell (or) is am i stuck in the old mindset. Any ideas/hints welcome.
We can sketch a program:
data BST
readInt :: IO Int; readInt = undefined
toBST :: [Int] -> BST; toBST = undefined
printBST :: BST -> IO (); printBST = undefined
loop :: [Int] -> IO ()
loop list = do
  int <- readInt
  let newList = int : list
  let bst = toBST newList
  printBST bst
  loop newList
main = loop []
"do balancing" ... "It knows the root" nope. After re-balancing the root is new. The function balance_bst must return the new root.
Same in Haskell, but also with insert_bst. It too will return the new root, and you will use that new root from that point forward.
Even if the new root's value is the same, in Haskell it's a new root, since one of its children has changed.
See ''How to "think functional"'' here.
Even in C++ (or other imperative languages), it would usually be considered a poor idea to have a single global variable holding the root of the binary search tree.
Instead code that needs access to a tree should normally be parameterised on the particular tree it operates on. That's a fancy way of saying: it should be a function/method/procedure that takes the tree as an argument.
So if you're doing that, then it doesn't take much imagination to figure out how several different sections of code (or one section, on several occasions) could get access to different versions of an immutable tree. Instead of passing the same tree to each of these functions (with modifications in between), you just pass a different tree each time.
It's only a little more work to imagine what your code needs to do to "modify" an immutable tree. Obviously you won't produce a new version of the tree by directly mutating it, you'll instead produce a new value (probably by calling methods on the class implementing the tree for you, but if necessary by manually assembling new nodes yourself), and then you'll return it so your caller can pass it on - by returning it to its own caller, by giving it to another function, or even calling you again.
Putting that all together, you can have your whole program manipulate (successive versions of) this binary tree without ever having it stored in a global variable that is "the" tree. An early function (possibly even main) creates the first version of the tree, passes it to the first thing that uses it, gets back a new version of the tree and passes it to the next user, and so on. And each user of the tree can call other subfunctions as needed, with possibly many new versions of the tree produced internally before it gets returned to the top level.
Note that I haven't actually described any special features of Haskell here. You can do all of this in just about any programming language, including C++. This is what people mean when they say that learning other types of programming makes them better programmers even in imperative languages they already knew. You can see that your habits of thought are drastically more limited than they need to be; you could not imagine how you could deal with a structure "changing" over the course of your program without having a single variable holding a structure that is mutated, when in fact that is just a small part of the tools that even C++ gives you for approaching the problem. If you can only imagine this one way of dealing with it then you'll never notice when other ways would be more helpful.
Haskell also has a variety of tools it can bring to this problem that are less common in imperative languages, such as (but not limited to):
Using the State monad to automate and hide much of the boilerplate of passing around successive versions of the tree.
Function arguments allow a function to be given an unknown "tree-consumer" function, to which it can give a tree, without any one place both having the tree and knowing which function it's passing it to.
Lazy evaluation sometimes negates the need to even have successive versions of the tree; if the modifications are expanding branches of the tree as you discover they are needed (like a move-tree for a game, say), then you could alternatively generate "the whole tree" up front even if it's infinite, and rely on lazy evaluation to limit how much work is done generating the tree to exactly the amount you need to look at.
Haskell does in fact have mutable variables, it just doesn't have functions that can access mutable variables without exposing in their type that they might have side effects. So if you really want to structure your program exactly as you would in C++ you can; it just won't really "feel like" you're writing Haskell, won't help you learn Haskell properly, and won't allow you to benefit from many of the useful features of Haskell's type system.

Assign a Seq(Seq) into arrays

What is the correct syntax to assign a Seq(Seq) into multiple typed arrays without assigning the Seq to a scalar first? Does the Seq have to be flattened somehow? This fails:
class A { has Int $.r }
my A (@ra1, @ra2);
# create two arrays with 5 random numbers below a certain limit
# Fails: Type check failed in assignment to @ra1; expected A but got Seq($((A.new(r => 3), A.n...)
(@ra1, @ra2) =
    <10 20>.map( -> $up_limit {
        (^5).map({A.new( r => (^$up_limit).pick ) })
    });
TL;DR Binding is faster than assignment, so perhaps this is the best practice solution to your problem:
:(@ra1, @ra2) := <10 20>.map(...);
While uglier than the solution in the accepted answer, this is algorithmically faster because binding is O(1) in contrast to assignment's O(N) in the length of the list(s) being bound.
Assigning / copying
Simplifying, your non-working code is:
(@listvar1, @listvar2) = list1, list2;
In Raku infix = means assignment / copying from the right of the = into one or more of the container variables on the left of the =.
If a variable on the left is bound to a Scalar container, then it will assign one of the values on the right. Then the assignment process starts over with the next container variable on the left and the next value on the right.
If a variable on the left is bound to an Array container, then it uses up all remaining values on the right. So your first array variable receives both list1 and list2. This is not what you want.
Simplifying, here's Christoph's answer:
@listvar1, @listvar2 Z= list1, list2;
Putting the = aside for a moment, Z is an infix version of the zip routine. It's like a physical zip, pairing up consecutive arguments on its left and right. When used with an operator it applies that operator to each pair. So you can read the above Z= as:
@listvar1 = list1;
@listvar2 = list2;
Job done?
Assignment into Array containers entails:
Individually copying as many individual items as there are in each list into the containers. (In the code in your example list1 and list2 contain 5 elements each, so there would be 10 copying operations in total.)
Forcing the containers to resize as necessary to accommodate the items.
Doubling up the memory used by the items (the original list elements and the duplicates copied into the Array elements).
Checking that the type of each item matches the element type constraint.
Assignment is in general much slower and more memory intensive than binding...
Binding
:(@listvar1, @listvar2) := list1, list2;
The := operator binds whatever's on its left to the arguments on its right.
If there's a single variable on the left then things are especially simple. After binding, the variable now refers precisely to what's on the right. (This is especially simple and fast -- a quick type check and it's done.)
But that's not so in our case.
Binding also accepts a standalone signature literal on its left. The :(...) in my answer is a standalone Signature literal.
(Signatures are typically attached to a routine without the colon prefix. For example, in sub foo (@var1, @var2) {} the (@var1, @var2) part is a signature attached to the routine foo. But as you can see, one can write a signature separately and let Raku know it's a signature by prefixing a pair of parens with a colon. A key difference is that any variables listed in the signature must have already been declared.)
When there's a signature literal on the left then binding happens according to the same logic as binding arguments in routine calls to a receiving routine's signature.
So the net result is that the variables get the values they'd have inside this sub:
sub foo (@listvar1, @listvar2) { }
foo list1, list2;
which is to say the effect is the same as:
@listvar1 := list1;
@listvar2 := list2;
Again, as with Christoph's answer, job done.
But this way we'll have avoided assignment overhead.
Not entirely sure if it's by design, but what seems to happen is that both of your sequences are getting stored into @ra1, while @ra2 remains empty. This violates the type constraint.
What does work is
@ra1, @ra2 Z= <10 20>.map(...);

How to assign the .lines Seq to a variable and iterate over it?

Assigning an iterator to a variable apparently changes how the Seq behaves. E.g.
use v6;
my $i = '/etc/lsb-release'.IO.lines;
say $i.WHAT;
say '/etc/lsb-release'.IO.lines.WHAT;
.say for $i;
.say for '/etc/lsb-release'.IO.lines;
results in:
(Seq)
(Seq)
(DISTRIB_ID=Ubuntu DISTRIB_RELEASE=18.04 DISTRIB_CODENAME=bionic DISTRIB_DESCRIPTION="Ubuntu 18.04.1 LTS")
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.1 LTS"
So once assigned I get only the string representation of the sequence. I know I can use .say for $i.lines to get the same output but I do not understand the difference between the assigned and unassigned iterator/Seq.
Assignment in Perl 6 is always about putting something into something else.
Assignment into a Scalar ($ sigil) stores the thing being assigned into a Scalar container object, meaning it will be treated as a single item; this is why for $item { } will not do an iteration. There are various ways to overcome this; the most conceptually simple way is to use the <> postfix operator, which strips away any Scalar container:
my $i = '/etc/lsb-release'.IO.lines;
.say for $i<>;
There's also the slip operator ("flatten into"), which will achieve the same:
my $i = '/etc/lsb-release'.IO.lines;
.say for |$i;
Assignment into an Array will - unless the right-hand side is marked lazy - iterate it and store each element into the Array. Thus:
my @i = '/etc/lsb-release'.IO.lines;
.say for @i;
Will work, but it will eagerly read all the lines into @i before the loop starts. This is OK for a small file, but less ideal for a large file, where we might prefer to work lazily (that is, only pulling a bit of the file into memory at a time). One might try:
my @i = lazy '/etc/lsb-release'.IO.lines;
.say for @i;
But that won't help with the retention problem; it just means the array will be populated lazily from the file as the iteration takes place. Of course, sometimes we might want to go through the lines multiple times, in which case assignment into an Array would be the best choice.
By contrast, declaring a symbol and binding it to that:
my \i = '/etc/lsb-release'.IO.lines;
.say for i;
Is not a "put into" operation at all; it just makes the symbol i refer to exactly what lines returns. This is rather clearer than putting it into a Scalar container only to take it out again. It's also a little easier on the reader, since a my \foo = ... can never be rebound, and so the reader doesn't need to be on the lookout for any potential changes later on in the code.
As a final note, it's worth knowing that the my \foo = ... form is actually a binding, rather than an assignment. Perl 6 allows us to write it with the = operator rather than forcing :=, even if in this case the semantics are := semantics. This is just one of a number of cases where a declaration with an initializer differs a bit from a normal assignment, e.g. has $!foo = rand actually runs the assignment on every object instantiation, while state $foo = rand runs it only if we're on the first entry to the current closure clone.
If you want to be able to iterate over the sequence you need to either assign it to a positional:
my @i = '/etc/lsb-release'.IO.lines;
.say for @i;
Or you can tell the iterator that you want to treat the given thing as iterable:
.say for @$i
Or you can Slip it into a list for the iterator:
.say for |$i

Clearing numerical values in Mathematica

I am working on fairly large Mathematica projects and the problem arises that I have to intermittently check numerical results but want to easily revert to having all my constructs in analytical form.
The code is fairly fluid, and I don't want to use scoping constructs everywhere as they add work overhead. Is there an easy way of identifying and clearing all assignments that are numerical?
EDIT: I really do know that scoping is the way to do this correctly ;-). However, for my workflow I am really just looking for a dirty trick to nix all numerical assignments after the fact instead of having the foresight to put down a Block.
If your assignments are on the top level, you can use something like this:
a = 1;
b = c;
d = 3;
e = d + b;
Cases[DownValues[In],
HoldPattern[lhs_ = rhs_?NumericQ] |
HoldPattern[(lhs_ = rhs_?NumericQ;)] :> Unset[lhs],
3]
This will work if you have a sufficient history length $HistoryLength (defaults to infinity). Note however that, in the above example, e was assigned 3+c, and 3 here was not undone. So, the problem is really ambiguous in formulation, because some numbers could make it into definitions. One way to avoid this is to use SetDelayed for assignments, rather than Set.
Another alternative would be to analyze the names in, say, the Global` context (if that is the context where your symbols live), and then, say, the OwnValues and DownValues of the symbols, in a fashion similar to the above, and remove definitions with a purely numerical r.h.s.
But IMO neither of these approaches is robust. I'd still use scoping constructs and try to isolate numerics. One possibility is to wrap your final code in Block, and assign numerical values inside this Block. This seems a much cleaner approach. The work overhead is minimal - you just have to remember which symbols you want to assign the values to. Block will automatically ensure that outside it, the symbols will have no definitions.
EDIT
Yet another possibility is to use local rules. For example, one could define rule[a] = a->1; rule[d]=d->3 instead of the assignments above. You could then apply these rules, extracting them as, say, DownValues[rule][[All, 2]], whenever you want to test with some numerical arguments.
Building on Andrew Moylan's solution, one can construct a Block-like function that takes rules:
SetAttributes[BlockRules, HoldRest]
BlockRules[rules_, expr_] :=
Block @@ Append[Apply[Set, Hold@rules, {2}], Unevaluated[expr]]
You can then save your numeric rules in a variable, and use BlockRules[ savedrules, code ], or even define a function that would apply a fixed set of rules, kind of like so:
In[76]:= NumericCheck =
Function[body, BlockRules[{a -> 3, b -> 2`}, body], HoldAll];
In[78]:= a + b // NumericCheck
Out[78]= 5.
EDIT In response to Timo's comment, it might be possible to use NotebookEvaluate (new in 8) to achieve the requested effect.
SetAttributes[BlockRules, HoldRest]
BlockRules[rules_, expr_] :=
Block @@ Append[Apply[Set, Hold@rules, {2}], Unevaluated[expr]]
nb = CreateDocument[{ExpressionCell[
Defer[Plot[Sin[a x], {x, 0, 2 Pi}]], "Input"],
ExpressionCell[Defer[Integrate[Sin[a x^2], {x, 0, 2 Pi}]],
"Input"]}];
BlockRules[{a -> 4}, NotebookEvaluate[nb, InsertResults -> "True"];]
As the result of this evaluation you get a notebook with your commands evaluated when a was locally set to 4. In order to take it further, you would have to take the notebook with your code, open a new notebook, evaluate Notebooks[] to identify the notebook of interest and then do:
BlockRules[variablerules,
NotebookEvaluate[NotebookPut[NotebookGet[nbobj]],
InsertResults -> "True"]]
I hope you can make this idea work.

Reliable clean-up in Mathematica

For better or worse, Mathematica provides a wealth of constructs that allow you to do non-local transfers of control, including Return, Catch/Throw, Abort and Goto. However, these kinds of non-local transfers of control often conflict with writing robust programs that need to ensure that clean-up code (like closing streams) gets run. Many languages provide ways of ensuring that clean-up code gets run in a wide variety of circumstances; Java has its finally blocks, C++ has destructors, Common Lisp has UNWIND-PROTECT, and so on.
In Mathematica, I don't know how to accomplish the same thing. I have a partial solution that looks like this:
Attributes[CleanUp] = {HoldAll};
CleanUp[body_, form_] :=
Module[{return, aborted = False},
Catch[
CheckAbort[
return = body,
aborted = True];
form;
If[aborted,
Abort[],
return],
_, (form; Throw[##]) &]];
This certainly isn't going to win any beauty contests, but it also only handles Abort and Throw. In particular, it fails in the presence of Return; I figure if you're using Goto to do this kind of non-local control in Mathematica you deserve what you get.
I don't see a good way around this. There's no CheckReturn for instance, and when you get right down to it, Return has pretty murky semantics. Is there a trick I'm missing?
EDIT: The problem with Return, and the vagueness in its definition, has to do with its interaction with conditionals (which somehow aren't "control structures" in Mathematica). An example, using my CleanUp form:
CleanUp[
If[2 == 2,
If[3 == 3,
Return["foo"]]];
Print["bar"],
Print["cleanup"]]
This will return "foo" without printing "cleanup". Likewise,
CleanUp[
baz /.
{bar :> Return["wongle"],
baz :> Return["bongle"]},
Print["cleanup"]]
will return "bongle" without printing cleanup. I don't see a way around this without tedious, error-prone and maybe impossible code-walking or somehow locally redefining Return using Block, which is heinously hacky and doesn't actually seem to work (though experimenting with it is a great way to totally wedge a kernel!)
Great question, but I don't agree that the semantics of Return are murky; they are documented in the link you provide. In short, Return exits the innermost construct (namely, a control structure or function definition) in which it is invoked.
The only case in which your CleanUp function above fails to clean up after a Return is when you pass a single Return or a CompoundExpression (e.g. (one;two;three)) directly as input to it.
Return exits the function f:
In[28]:= f[] := Return["ret"]
In[29]:= CleanUp[f[], Print["cleaned"]]
During evaluation of In[29]:= cleaned
Out[29]= "ret"
Return exits x:
In[31]:= x = Return["foo"]
In[32]:= CleanUp[x, Print["cleaned"]]
During evaluation of In[32]:= cleaned
Out[32]= "foo"
Return exits the Do loop:
In[33]:= g[] := (x = 0; Do[x++; Return["blah"], {10}]; x)
In[34]:= CleanUp[g[], Print["cleaned"]]
During evaluation of In[34]:= cleaned
Out[34]= 1
Returns from the body of CleanUp at the point where body is evaluated (since CleanUp is HoldAll):
In[35]:= CleanUp[Return["ret"], Print["cleaned"]];
Out[35]= "ret"
In[36]:= CleanUp[(Print["before"]; Return["ret"]; Print["after"]),
Print["cleaned"]]
During evaluation of In[36]:= before
Out[36]= "ret"
As I noted above, the latter two examples are the only problematic cases I can contrive (although I could be wrong) but they can be handled by adding a definition to CleanUp:
In[44]:= CleanUp[CompoundExpression[before___, Return[ret_], ___], form_] :=
(before; form; ret)
In[45]:= CleanUp[Return["ret"], Print["cleaned"]]
During evaluation of In[46]:= cleaned
Out[45]= "ret"
In[46]:= CleanUp[(Print["before"]; Return["ret"]; Print["after"]),
Print["cleaned"]]
During evaluation of In[46]:= before
During evaluation of In[46]:= cleaned
Out[46]= "ret"
As you said, not going to win any beauty contests, but hopefully this helps solve your problem!
Response to your update
I would argue that using Return inside If is unnecessary, and even an abuse of Return, given that If already returns either the second or third argument based on the state of the condition in the first argument. While I realize your example is probably contrived, If[3==3, Return["foo"]] is functionally identical to If[3==3, "foo"].
If you have a more complicated If statement, you're better off using Throw and Catch to break out of the evaluation and "return" something to the point you want it to be returned to.
That said, I realize you might not always have control over the code you have to clean up after, so you could always wrap the expression in CleanUp in a no-op control structure, such as:
ret1 = Do[ret2 = expr, {1}]
... by abusing Do to force a Return not contained within a control structure in expr to return out of the Do loop. The only tricky part (I think, not having tried this) is having to deal with two different return values above: ret1 will contain the value of an uncontained Return, but ret2 would have the value of any other evaluation of expr. There's probably a cleaner way to handle that, but I can't see it right now.
HTH!
Pillsy's later version of CleanUp is a good one. At the risk of being pedantic, I must point out a troublesome use case:
Catch[CleanUp[Throw[23], Print["cleanup"]]]
The problem is due to the fact that one cannot explicitly specify a tag pattern for Catch that will match an untagged Throw.
The following version of CleanUp addresses that problem:
SetAttributes[CleanUp, HoldAll]
CleanUp[expr_, cleanup_] :=
Module[{exprFn, result, abort = False, rethrow = True, seq},
exprFn[] := expr;
result = CheckAbort[
Catch[
Catch[result = exprFn[]; rethrow = False; result],
_,
seq[##]&
],
abort = True
];
cleanup;
If[abort, Abort[]];
If[rethrow, Throw[result /. seq -> Sequence]];
result
]
Alas, this code is even less likely to be competitive in a beauty contest. Furthermore, it wouldn't surprise me if someone jumped in with yet another non-local control flow that this code will not handle. Even in the unlikely event that it handles all possible cases now, problematic cases could be introduced in Mathematica X (where X > 7.01).
I fear that there cannot be a definitive answer to this problem until Wolfram introduces a new control structure expressly for this purpose. UnwindProtect would be a fine name for such a facility.
Michael Pilat provided the key trick for "catching" returns, but I ended up using it in a slightly different way, using the fact that Return forces the return value of a named function as well as control structures like Do. I made the expression that is being cleaned up after into the down-value of a local symbol, like so:
Attributes[CleanUp] = {HoldAll};
CleanUp[expr_, form_] :=
Module[{body, value, aborted = False},
body[] := expr;
Catch[
CheckAbort[
value = body[],
aborted = True];
form;
If[aborted,
Abort[],
value],
_, (form; Throw[##]) &]];