Binding of private attributes: nqp::bindattr vs := - raku

I'm trying to find how the binding operation works on attributes and what makes it so different from nqp::bindattr. Consider the following example:
class Foo {
has #!foo;
submethod TWEAK {
my $fval = [<a b c>];
use nqp;
nqp::bindattr( nqp::decont(self), $?CLASS, '#!foo',
##!foo :=
Proxy.new(
FETCH => -> $ { $fval },
STORE => -> $, $v { $fval = $v }
)
);
}
method check {
say #!foo.perl;
}
}
my $inst = Foo.new;
$inst.check;
It prints:
$["a", "b", "c"]
Replacing nqp::bindattr with the binding operator from the comment gives correct output:
["a", "b", "c"]
Similarly, if foo is a public attribute and accessor is used the output would be correct too due to deconterisation taking place within the accessor.
I use similar code in my AttrX::Mooish module where use of := would overcomplicate the implementation. So far, nqp::bindattr did the good job for me until the above problem arised.
I tried tracing down Rakudo's internals looking for := implementation but without any success so far. I would ask here either for an advise as to how to simulate the operator or where in the source to look for its implementation.

Before I dig into the answer: most things in this post are implementation-defined, and the implementation is free to define them differently in the future.
To find out what something (naively) compiles into under Rakudo Perl 6, use the --target=ast option (perl6 --target=ast foo.p6). For example, the bind in:
class C {
has $!a;
submethod BUILD() {
my $x = [1,2,3];
$!a := $x
}
}
Comes out as:
- QAST::Op(bind) :statement_id<7>
- QAST::Var(attribute $!a) <wanted> $!a
- QAST::Var(lexical self)
- QAST::WVal(C)
- QAST::Var(lexical $x) $x
While switching it for #!a like here:
class C {
has #!a;
submethod BUILD() {
my $x = [1,2,3];
#!a := $x
}
}
Comes out as:
- QAST::Op(bind) :statement_id<7>
- QAST::Var(attribute #!a) <wanted> #!a
- QAST::Var(lexical self)
- QAST::WVal(C)
- QAST::Op(p6bindassert)
- QAST::Op(decont)
- QAST::Var(lexical $x) $x
- QAST::WVal(Positional)
The decont instruction is the big difference here, and it will take the contents of the Proxy by calling its FETCH, thus why the containerization is gone. Thus, you can replicate the behavior by inserting nqp::decont around the Proxy, although that rather begs the question of what the Proxy is doing there if the correct answer is obtained without it!
Both := and = are compiled using case analysis (namely, by looking at what is on the left hand side). := only works for a limited range of simple expressions on the left; it is a decidedly low-level operator. By contrast, = falls back to a sub call if the case analysis doesn't come up with a more efficient form to emit, though in most cases it manages something better.
The case analysis for := inserts a decont when the target is a lexical or attribute with sigil # or %, since - at a Perl 6 level - having an item bound to an # or % makes no sense. Using nqp::bindattr is going a level below Perl 6 semantics, and so it's possible to end up with the Proxy bound directly there using that. However, it also violates expectations elsewhere. Don't expect that to go well (but it seems you don't want to do that anyway.)

Related

Raku Ambiguous call to infix(Hyper: Dan::Series, Int)

I am writing a model Series class (kinda like the one in pandas) - and it should be both Positional and Associative.
class Series does Positional does Iterable does Associative {
has Array $.data is required;
has Array $.index;
### Construction ###
method TWEAK {
# sort out data-index dependencies
$!index = gather {
my $i = 0;
for |$!data -> $d {
take ( $!index[$i++] => $d )
}
}.Array
}
### Output ###
method Str {
$!index
}
### Role Support ###
# Positional role support
# viz. https://docs.raku.org/type/Positional
method of {
Mu
}
method elems {
$!data.elems
}
method AT-POS( $p ) {
$!data[$p]
}
method EXISTS-POS( $p ) {
0 <= $p < $!data.elems ?? True !! False
}
# Iterable role support
# viz. https://docs.raku.org/type/Iterable
method iterator {
$!data.iterator
}
method flat {
$!data.flat
}
method lazy {
$!data.lazy
}
method hyper {
$!data.hyper
}
# Associative role support
# viz. https://docs.raku.org/type/Associative
method keyof {
Str(Any)
}
method AT-KEY( $k ) {
for |$!index -> $p {
return $p.value if $p.key ~~ $k
}
}
method EXISTS-KEY( $k ) {
for |$!index -> $p {
return True if $p.key ~~ $k
}
}
#`[ solution attempt #1 does NOT get called
multi method infix(Hyper: Series, Int) is default {
die "I was called"
}
#]
}
my $s = Series.new(data => [rand xx 5], index => [<a b c d e>]);
say ~$s;
say $s[2];
say $s<b>;
So far pretty darn cool.
I can go dd $s.hyper and get this
HyperSeq.new(configuration => HyperConfiguration.new(batch => 64, degree => 1))
BUT (there had to be a but coming), I want to be able to do hyper math on my Series' elements, something like:
say $s >>+>> 2;
But that yields:
Ambiguous call to 'infix(Hyper: Dan::Series, Int)'; these signatures all match:
(Hyper: Associative:D \left, \right, *%_)
(Hyper: Positional:D \left, \right, *%_)
in block <unit> at ./synopsis-dan.raku line 63
How can I tell my class Series not to offer the Associative hyper candidate...?
Note: edited example to be a runnable MRE per #raiph's comment ... I have thus left in the minimum requirements for the 3 roles in play per docs.raku.org
After some experimentation (and new directions to consider from the very helpful comments to this SO along the way), I think I have found a solution:
drop the does Associative role from the class declaration like this:
class Series does Positional does Iterable {...}
BUT
leave the Associative role support methods in the body of the class:
# Associative role support
# viz. https://docs.raku.org/type/Associative
method keyof {
Str(Any)
}
method AT-KEY( $k ) {
for |$!index -> $p {
return $p.value if $p.key ~~ $k
}
}
method EXISTS-KEY( $k ) {
for |$!index -> $p {
return True if $p.key ~~ $k
}
}
This gives me the Positional and Associative accessors, and functional hyper math operators:
my $s = Series.new(data => [rand xx 5], index => [<a b c d e>]);
say ~$s; #([a => 0.6137271559776396 b => 0.7942959887386045 c => 0.5768018697817604 d => 0.8964323560788711 e => 0.025740150933493577] , dtype: Num)
say $s[2]; #0.7942959887386045
say $s<b>; #0.5768018697817604
say $s >>+>> 2; #(2.6137271559776396 2.7942959887386047 2.5768018697817605 2.896432356078871 2.0257401509334936)
While this feels a bit thin (and probably lacks the full set of Associative functions) I am fairly confident that the basic methods will give me slimmed down access like a hash from a key capability that I seek. And it no longer creates the ambiguous call.
This solution may be cheating a bit in that I know the level of compromise that I will accept ;-).
Take #1
First, an MRE with an emphasis on the M1:
class foo does Positional does Associative { method of {} }
sub infix:<baz> (\l,\r) { say 'baz' }
foo.new >>baz>> 42;
yields:
Ambiguous call to 'infix(Hyper: foo, Int)'; these signatures all match:
(Hyper: Associative:D \left, \right, *%_)
(Hyper: Positional:D \left, \right, *%_)
in block <unit> at ./synopsis-dan.raku line 63
The error message shows it's A) a call to a method named infix with an invocant matching Hyper, and B) there are two methods that potentially match that call.
Given that there's no class Hyper in your MRE, these methods and the Hyper class must be either built-ins or internal details that are leaking out.
A search of the doc finds no such class. So Hyper is undocumented Given that the doc has fairly broad coverage these days, this suggests Hyper is an internal detail. But regardless, it looks like you can't solve your problem using official/documented features.
Hopefully this bad news is still better than none.2
Take #2
Where's the fun in letting little details like "not an official feature" stop us doing what we want to do?
There's a core.c module named Hyper.pm6 in the Rakudo source repo.
A few seconds browsing that, and clicks on its History and Blame, and I can instantly see it really is time for me to conclude this SO answer, with a recommendation for your next move.
To wit, I suggest you start another SO, using this answer as its heart (but reversing my presentation order, ie starting by mentioning Hyper, and that it's not doc'd), and namechecking Liz (per Hyper's History/Blame), with a link back to your Q here as its background. I'm pretty sure that will get you a good answer, or at least an authoritative one.
Footnotes
1 I also tried this:
class foo does Positional does Associative { method of {} }
sub postfix:<bar>(\arg) { say 'bar' }
foo.new>>bar;
but that worked (displayed bar).
2 If you didn't get to my Take #1 conclusion yourself, perhaps that was was because your MRE wasn't very M? If you did arrive at the same point (cf "solution attempt #1 does NOT get called" in your MRE) then please read and, for future SOs, take to heart, the wisdom of "Explain ... any difficulties that have prevented you from solving it yourself".

Variable getting overwritten in for loop

In a for loop, a different variable is assigned a value. The variable which has already been assigned a value is getting assigned the value from next iteration. At the end, both variable have the same value.
The code is for validating data in a file. When I print the values, it prints correct value for first iteration but in the next iteration, the value assigned in first iteration is changed.
When I print the value of $value3 and $value4 in the for loop, it shows null for $value4 and some value for $value3 but in the next iteration, the value of $value3 is overwritten by the value of $value4
I have tried on rakudo perl 6.c
my $fh= $!FileName.IO.open;
my $fileObject = FileValidation.new( file => $fh );
for (3,4).list {
put "Iteration: ", $_;
if ($_ == 4) {
$value4 := $fileObject.FileValidationFunction(%.ValidationRules{4}<ValidationFunction>, %.ValidationRules{4}<Arguments>);
}
if ($_ == 3) {
$value3 := $fileObject.FileValidationFunction(%.ValidationRules{3}<ValidationFunction>, %.ValidationRules{3}<Arguments>);
}
$fh.seek: SeekFromBeginning;
}
TL;DR It's not possible to confidently answer your question as it stands. This is a nanswer -- an answer in the sense I'm writing it as one but also quite possibly not an answer in the sense of helping you fix your problem.
Is it is rw? A first look.
The is rw trait on a routine or class attribute means it returns a container that contains a value rather than just returning a value.
If you then alias that container then you can get the behavior you've described.
For example:
my $foo;
sub bar is rw { $foo = rand }
my ($value3, $value4);
$value3 := bar;
.say for $value3, $value4;
$value4 := bar;
.say for $value3, $value4;
displays:
0.14168492246366005
(Any)
0.31843665763839857
0.31843665763839857
This isn't a bug in the language or compiler. It's just P6 code doing what it's supposed to do.
A longer version of the same thing
Perhaps the above is so far from your code it's disorienting. So here's the same thing wrapped in something like the code you provided.
spurt 'junk', 'junk';
class FileValidation {
has $.file;
has $!foo;
method FileValidationFunction ($,$) is rw { $!foo = rand }
}
class bar {
has $!FileName = 'junk';
has %.ValidationRules =
{ 3 => { ValidationFunction => {;}, Arguments => () },
4 => { ValidationFunction => {;}, Arguments => () } }
my ($value3, $value4);
method baz {
my $fh= $!FileName.IO.open;
my $fileObject = FileValidation.new( file => $fh );
my ($value3, $value4);
for (3,4).list {
put "Iteration: ", $_;
if ($_ == 4) {
$value4 := $fileObject.FileValidationFunction(
%.ValidationRules{4}<ValidationFunction>, %.ValidationRules{4}<Arguments>);
}
if ($_ == 3) {
$value3 := $fileObject.FileValidationFunction(
%.ValidationRules{3}<ValidationFunction>, %.ValidationRules{3}<Arguments>);
}
$fh.seek: SeekFromBeginning;
.say for $value3, $value4
}
}
}
bar.new.baz
This outputs:
Iteration: 3
0.5779679442816953
(Any)
Iteration: 4
0.8650280000277686
0.8650280000277686
Is it is rw? A second look.
Brad and I came up with essentially the same answer (at the same time; I was a minute ahead of Brad but who's counting? I mean besides me? :)) but Brad nicely nails the fix:
One way to avoid aliasing a container is to just use =.
(This is no doubt also why #ElizabethMattijsen++ asked about trying = instead of :=.)
You've commented that changing from := to = made no difference.
But presumably you didn't change from := to = throughout your entire codebase but rather just (the equivalent of) the two in the code you've shared.
So perhaps the problem can still be fixed by switching from := to =, but in some of your code elsewhere. (That said, don't just globally replace := with =. Instead, make sure you understand their difference and then change them as appropriate. You've got a test suite, right? ;))
How to move forward if you're still stuck
Right now your question has received several upvotes and no downvotes and you've got two answers (that point to the same problem).
But maybe our answers aren't good enough.
If so...
The addition of the reddit comment, and trying = instead of :=, and trying the latest compiler, and commenting on those things, leaves me glad I didn't downvote your question, but I haven't upvoted it yet and there's a reason for that. It's because your question is still missing a Minimal Reproducible Example.
You responded to my suggestion about producing an MRE with:
The problem is that I am not able to replicate this in a simpler environment
I presumed that's your situation, but as you can imagine, that means we can't confidently replicate it at all. That may be the way you prefer to go for reasons but it goes against SO guidance (in the link above) and if the current answers aren't adequate then the sensible way forward is for you to do what it takes to share code that reproduces your problem.
If it's large, please don't just paste it into your question but instead link to it. Perhaps you can set it up on glot.io using the + button to use multiple files (up to 6 I think, plus there's a standard input too). If not, perhaps gist it via, say, gist.github.com, and if I can I'll set it up on glot.io for you.
What is probably happening is that you are returning a container rather than a value, then aliasing the container to a variable.
class Foo {
has $.a is rw;
}
my $o = Foo.new( a => 1 );
my $old := $o.a;
say $old; # 1
$o.a = 2;
say $old; # 2
One way to avoid aliasing a container is to just use =.
my $old = $o.a;
say $old; # 1
$o.a = 2;
say $old; # 1
You could also decontainerize the value using either .self or .<>
my $old := $o.a.<>;
say $old; # 1
$o.a = 2;
say $old; # 1
(Note that .<> above could be .self or just <>.)

(Identifier) terms vs. constants vs. null signature routines

Identifier terms are defined in the documentation alongside constants, with pretty much the same use case, although terms compute their value in run time while constants get it in compile time. Potentially, that could make terms use global variables, but that's action at a distance and ugly, so I guess that's not their use case.
OTOH, they could be simply routines with null signature:
sub term:<þor> { "Is mighty" }
sub Þor { "Is mighty" }
say þor, Þor;
But you can already define routines with null signature. You can sabe, however, the error when you write:
say Þor ~ Þor;
Which would produce a many positionals passed; expected 0 arguments but got 1, unlike the term. That seems however a bit farfetched and you can save the trouble by just adding () at the end.
Another possible use case is defying the rules of normal identifiers
sub term:<✔> { True }
say ✔; # True
Are there any other use cases I have missed?
Making zero-argument subs work as terms will break the possibility to post-declare subs, since finding a sub after having parsed usages of it would require re-parsing of earlier code (which the perl 6 language refuses to do, "one-pass parsing" and all that) if the sub takes no arguments.
Terms are useful in combination with the ternary operator:
$ perl6 -e 'sub a() { "foo" }; say 42 ?? a !! 666'
===SORRY!=== Error while compiling -e
Your !! was gobbled by the expression in the middle; please parenthesize
$ perl6 -e 'sub term:<a> { "foo" }; say 42 ?? a !! 666'
foo
Constants are basically terms. So of course they are grouped together.
constant foo = 12;
say foo;
constant term:<bar> = 36;
say bar;
There is a slight difference because term:<…> works by modifying the parser. So it takes precedence.
constant fubar = 38;
constant term:<fubar> = 45;
say fubar; # 45
The above will print 45 regardless of which constant definition comes first.
Since term:<…> takes precedence the only way to get at the other value is to use ::<fubar> to directly access the symbol table.
say ::<fubar>; # 38
say ::<term:<fubar>>; # 45
There are two main use-cases for term:<…>.
One is to get a subroutine to be parsed similarly to a constant or sigilless variable.
sub fubar () { 'fubar'.comb.roll }
# say( fubar( prefix:<~>( 4 ) ) );
say fubar ~ 4; # ERROR
sub term:<fubar> () { 'fubar'.comb.roll }
# say( infix:<~>( fubar, 4 ) );
say fubar ~ 4;
The other is to have a constant or sigiless variable be something other than an a normal identifier.
my \✔ = True; # ERROR: Malformed my
my \term:<✔> = True;
say ✔;
Of course both use-cases can be combined.
sub term:<✔> () { True }
Perl 5 allows subroutines to have an empty prototype (different than a signature) which will alter how it gets parsed. The main purpose of prototypes in Perl 5 is to alter how the code gets parsed.
use v5;
sub fubar () { ord [split('','fubar')]->[rand 5] }
# say( fubar() + 4 );
say fubar + 4; # infix +
use v5;
sub fubar { ord [split('','fubar')]->[rand 5] }
# say( fubar( +4 ) );
say fubar + 4; # prefix +
Perl 6 doesn't use signatures the way Perl 5 uses prototypes. The main way to alter how Perl 6 parses code is by using the namespace.
use v6;
sub fubar ( $_ ) { .comb.roll }
sub term:<fubar> () { 'fubar'.comb.roll }
say fubar( 'zoo' ); # `z` or `o` (`o` is twice as likely)
say fubar; # `f` or `u` or `b` or `a` or `r`
sub prefix:<✔> ( $_ ) { "checked $_" }
say ✔ 'under the bed'; # checked under the bed
Note that Perl 5 doesn't really have constants, they are just subroutines with an empty prototype.
use v5;
use constant foo => 12;
use v5;
sub foo () { 12 } # ditto
(This became less true after 5.16)
As far as I know all of the other uses of prototypes have been superseded by design decisions in Perl 6.
use v5;
sub foo (&$) { $_[0]->($_[1]) }
say foo { 100 + $_[0] } 5; # 105;
That block is seen as a sub lambda because of the prototype of the foo subroutine.
use v6;
# sub foo ( &f, $v ) { f $v }
sub foo { #_[0].( #_[1] ) }
say foo { 100 + #_[0] }, 5; # 105
In Perl 6 a block is seen as a lambda if a term is expected. So there is no need to alter the parser with a feature like a prototype.
You are asking for exactly one use of prototypes to be brought back even though there is already a feature that covers that use-case.
Doing so would be a special-case. Part of the design ethos of Perl 6 is to limit the number of special-cases.
Other versions of Perl had a wide variety of special-cases, and it isn't always easy to remember them all.
Don't get me wrong; the special-cases in Perl 5 are useful, but Perl 6 has for the most part made them general-cases.

Dynamic/Static scope with Deep/Shallow binding (exercises)

I'm studying dynamic/static scope with deep/shallow binding and running code manually to see how these different scopes/bindings actually work. I read the theory and googled some example exercises and the ones I found are very simple (like this one which was very helpful with dynamic scoping) But I'm having trouble understanding how static scope works.
Here I post an exercise I did to check if I got the right solution:
considering the following program written in pseudocode:
int u = 42;
int v = 69;
int w = 17;
proc add( z:int )
u := v + u + z
proc bar( fun:proc )
int u := w;
fun(v)
proc foo( x:int, w:int )
int v := x;
bar(add)
main
foo(u,13)
print(u)
end;
What is printed to screen
a) using static scope? answer=180
b) using dynamic scope and deep binding? answer=69 (sum for u = 126 but it's foo's local v, right?)
c) using dynamic scope and shallow binding? answer=69 (sum for u = 101 but it's foo's local v, right?)
PS: I'm trying to practice doing some exercises like this if you know where I can find these types of problems (preferable with solutions) please give the link, thanks!
Your answer for lexical (static) scope is correct. Your answers for dynamic scope are wrong, but if I'm reading your explanations right, it's because you got confused between u and v, rather than because of any real misunderstanding about how deep and shallow binding work. (I'm assuming that your u/v confusion was just accidental, and not due to a strange confusion about values vs. references in the call to foo.)
a) using static scope? answer=180
Correct.
b) using dynamic scope and deep binding? answer=69 (sum for u = 126 but it's foo's local v, right?)
Your parenthetical explanation is right, but your answer is wrong: u is indeed set to 126, and foo indeed localizes v, but since main prints u, not v, the answer is 126.
c) using dynamic scope and shallow binding? answer=69 (sum for u = 101 but it's foo's local v, right?)
The sum for u is actually 97 (42+13+42), but since bar localizes u, the answer is 42. (Your parenthetical explanation is wrong for this one — you seem to have used the global variable w, which is 17, in interpreting the statement int u := w in the definition of bar; but that statement actually refers to foo's local variable w, its second parameter, which is 13. But that doesn't actually affect the answer. Your answer is wrong for this one only because main prints u, not v.)
For lexical scope, it's pretty easy to check your answers by translating the pseudo-code into a language with lexical scope. Likewise dynamic scope with shallow binding. (In fact, if you use Perl, you can test both ways almost at once, since it supports both; just use my for lexical scope, then do a find-and-replace to change it to local for dynamic scope. But even if you use, say, JavaScript for lexical scope and Bash for dynamic scope, it should be quick to test both.)
Dynamic scope with deep binding is much trickier, since few widely-deployed languages support it. If you use Perl, you can implement it manually by using a hash (an associative array) that maps from variable-names to scalar-refs, and passing this hash from function to function. Everywhere that the pseudocode declares a local variable, you save the existing scalar-reference in a Perl lexical variable, then put the new mapping in the hash; and at the end of the function, you restore the original scalar-reference. To support the binding, you create a wrapper function that creates a copy of the hash, and passes that to its wrapped function. Here is a dynamically-scoped, deeply-binding implementation of your program in Perl, using that approach:
#!/usr/bin/perl -w
use warnings;
use strict;
# Create a new scalar, initialize it to the specified value,
# and return a reference to it:
sub new_scalar($)
{ return \(shift); }
# Bind the specified procedure to the specified environment:
sub bind_proc(\%$)
{
my $V = { %{+shift} };
my $f = shift;
return sub { $f->($V, #_); };
}
my $V = {};
$V->{u} = new_scalar 42; # int u := 42
$V->{v} = new_scalar 69; # int v := 69
$V->{w} = new_scalar 17; # int w := 17
sub add(\%$)
{
my $V = shift;
my $z = $V->{z}; # save existing z
$V->{z} = new_scalar shift; # create & initialize new z
${$V->{u}} = ${$V->{v}} + ${$V->{u}} + ${$V->{z}};
$V->{z} = $z; # restore old z
}
sub bar(\%$)
{
my $V = shift;
my $fun = shift;
my $u = $V->{u}; # save existing u
$V->{u} = new_scalar ${$V->{w}}; # create & initialize new u
$fun->(${$V->{v}});
$V->{u} = $u; # restore old u
}
sub foo(\%$$)
{
my $V = shift;
my $x = $V->{x}; # save existing x
$V->{x} = new_scalar shift; # create & initialize new x
my $w = $V->{w}; # save existing w
$V->{w} = new_scalar shift; # create & initialize new w
my $v = $V->{v}; # save existing v
$V->{v} = new_scalar ${$V->{x}}; # create & initialize new v
bar %$V, bind_proc %$V, \&add;
$V->{v} = $v; # restore old v
$V->{w} = $w; # restore old w
$V->{x} = $x; # restore old x
}
foo %$V, ${$V->{u}}, 13;
print "${$V->{u}}\n";
__END__
and indeed it prints 126. It's obviously messy and error-prone, but it also really helps you understand what's going on, so for educational purposes I think it's worth it!
Simple and deep binding are Lisp interpreter viewpoints of the pseudocode. Scoping is just pointer arithmetic. Dynamic scope and static scope are the same if there are no free variables.
Static scope relies on a pointer to memory. Empty environments hold no symbol to value associations; denoted by word "End." Each time the interpreter reads an assignment, it makes space for association between a symbol and value.
The environment pointer is updated to point to the last association constructed.
env = End
env = [u,42] -> End
env = [v,69] -> [u,42] -> End
env = [w,17] -> [v,69] -> [u,42] -> End
Let me record this environment memory location as AAA. In my Lisp interpreter, when meeting a procedure, we take the environment pointer and put it our pocket.
env = [add,[closure,(lambda(z)(setq u (+ v u z)),*AAA*]]->[w,17]->[v,69]->[u,42]->End.
That's pretty much all there is until the procedure add is called. Interestingly, if add is never called, you just cost yourself a pointer.
Suppose the program calls add(8). OK, let's roll. The environment AAA is made current. Environment is ->[w,17]->[v,69]->[u,42]->End.
Procedure parameters of add are added to the front of the environment. The environment becomes [z,8]->[w,17]->[v,69]->[u,42]->End.
Now the procedure body of add is executed. Free variable v will have value 69. Free variable u will have value 42. z will have the value 8.
u := v + u + z
u will be assigned the value of 69 + 42 + 8 becomeing 119.
The environment will reflect this: [z,8]->[w,17]->[v,69]->[u,119]->End.
Assume procedure add has completed its task. Now the environment gets restored to its previous value.
env = [add,[closure,(lambda(z)(setq u (+ v u z)),*AAA*]]->[w,17]->[v,69]->[u,119]->End.
Notice how the procedure add has had a side effect of changing the value of free variable u. Awesome!
Regarding dynamic scoping: it just ensures closure leaves out dynamic symbols, thereby avoiding being captured and becoming dynamic.
Then put assignment to dynamic at top of code. If dynamic is same as parameter name, it gets masked by parameter value passed in.
Suppose I had a dynamic variable called z. When I called add(8), z would have been set to 8 regardless of what I wanted. That's probably why dynamic variables have longer names.
Rumour has it that dynamic variables are useful for things like backtracking, using let Lisp constructs.
Static binding, also known as lexical scope, refers to the scoping mechanism found in most modern languages.
In "lexical scope", the final value for u is neither 180 or 119, which are wrong answers.
The correct answer is u=101.
Please see standard Perl code below to understand why.
use strict;
use warnings;
my $u = 42;
my $v = 69;
my $w = 17;
sub add {
my $z = shift;
$u = $v + $u + $z;
}
sub bar {
my $fun = shift;
$u = $w;
$fun->($v);
}
sub foo {
my ($x, $w) = #_;
$v = $x;
bar( \&add );
}
foo($u,13);
print "u: $u\n";
Regarding shallow binding versus deep binding, both mechanisms date from the former LISP era.
Both mechanisms are meant to achieve dynamic binding (versus lexical scope binding) and therefore they produce identical results !
The differences between shallow binding and deep binding do not reside in semantics, which are identical, but in the implementation of dynamic binding.
With deep binding, variable bindings are set within a stack as "varname => varvalue" pairs.
The value of a given variable is retrieved from traversing the stack from top to bottom until a binding for the given variable is found.
Updating the variable consists in finding the binding in the stack and updating the associated value.
On entering a subroutine, a new binding for each actual parameter is pushed onto the stack, potentially hiding an older binding which is therefore no longer accessible wrt the retrieving mechanism described above (that stops at the 1st retrieved binding).
On leaving the subroutine, bindings for these parameters are simply popped from the binding stack, thus re-enabling access to the former bindings.
Please see the the code below for a Perl implementation of deep-binding dynamic scope.
use strict;
use warnings;
use utf8;
##
# Dynamic-scope deep-binding implementation
my #stack = ();
sub bindv {
my ($varname, $varval);
unshift #stack, [ $varname => $varval ]
while ($varname, $varval) = splice #_, 0, 2;
return $varval;
}
sub unbindv {
my $n = shift || 1;
shift #stack while $n-- > 0;
}
sub getv {
my $varname = shift;
for (my $i=0; $i < #stack; $i++) {
return $stack[$i][1]
if $varname eq $stack[$i][0];
}
return undef;
}
sub setv {
my ($varname, $varval) = #_;
for (my $i=0; $i < #stack; $i++) {
return $stack[$i][1] = $varval
if $varname eq $stack[$i][0];
}
return bindv($varname, $varval);
}
##
# EXERCICE
bindv( u => 42,
v => 69,
w => 17,
);
sub add {
bindv(z => shift);
setv(u => getv('v')
+ getv('u')
+ getv('z')
);
unbindv();
}
sub bar {
bindv(fun => shift);
setv(u => getv('w'));
getv('fun')->(getv('v'));
unbindv();
}
sub foo {
bindv(x => shift,
w => shift,
);
setv(v => getv('x'));
bar( \&add );
unbindv(2);
}
foo( getv('u'), 13);
print "u: ", getv('u'), "\n";
The result is u=97
Nevertheless, this constant traversal of the binding stack is costly : 0(n) complexity !
Shallow binding brings a wonderful O(1) enhanced performance over the previous implementation !
Shallow binding is improving the former mechanism by assigning each variable its own "cell", storing the value of the variable within the cell.
The value of a given variable is simply retrieved from the variable's
cell (using a hash table on variable names, we achieve a
0(1) complexity for accessing variable's values!)
Updating the variable's value is simply storing the value into the
variable's cell.
Creating a new binding (entering subs) works by pushing the old value
of the variable (a previous binding) onto the stack, and storing the
new local value in the value cell.
Eliminating a binding (leaving subs) works by popping the old value
off the stack into the variable's value cell.
Please see the the code below for a trivial Perl implementation of shallow-binding dynamic scope.
use strict;
use warnings;
our $u = 42;
our $v = 69;
our $w = 17;
our $z;
our $fun;
our $x;
sub add {
local $z = shift;
$u = $v + $u + $z;
}
sub bar {
local $fun = shift;
$u = $w;
$fun->($v);
}
sub foo {
local $x = shift;
local $w = shift;
$v = $x;
bar( \&add );
}
foo($u,13);
print "u: $u\n";
As you shall see, the result is still u=97
As a conclusion, remember two things :
shallow binding produces the same results as deep binding, but runs faster, since there is never a need to search for a binding.
The problem is not shallow binding versus deep binding versus
static binding BUT lexical scope versus dynamic scope (implemented either with deep or shallow binding).

Expression Evaluation and Tree Walking using polymorphism? (ala Steve Yegge)

This morning, I was reading Steve Yegge's: When Polymorphism Fails, when I came across a question that a co-worker of his used to ask potential employees when they came for their interview at Amazon.
As an example of polymorphism in
action, let's look at the classic
"eval" interview question, which (as
far as I know) was brought to Amazon
by Ron Braunstein. The question is
quite a rich one, as it manages to
probe a wide variety of important
skills: OOP design, recursion, binary
trees, polymorphism and runtime
typing, general coding skills, and (if
you want to make it extra hard)
parsing theory.
At some point, the candidate hopefully
realizes that you can represent an
arithmetic expression as a binary
tree, assuming you're only using
binary operators such as "+", "-",
"*", "/". The leaf nodes are all
numbers, and the internal nodes are
all operators. Evaluating the
expression means walking the tree. If
the candidate doesn't realize this,
you can gently lead them to it, or if
necessary, just tell them.
Even if you tell them, it's still an
interesting problem.
The first half of the question, which
some people (whose names I will
protect to my dying breath, but their
initials are Willie Lewis) feel is a
Job Requirement If You Want To Call
Yourself A Developer And Work At
Amazon, is actually kinda hard. The
question is: how do you go from an
arithmetic expression (e.g. in a
string) such as "2 + (2)" to an
expression tree. We may have an ADJ
challenge on this question at some
point.
The second half is: let's say this is
a 2-person project, and your partner,
who we'll call "Willie", is
responsible for transforming the
string expression into a tree. You get
the easy part: you need to decide what
classes Willie is to construct the
tree with. You can do it in any
language, but make sure you pick one,
or Willie will hand you assembly
language. If he's feeling ornery, it
will be for a processor that is no
longer manufactured in production.
You'd be amazed at how many candidates
boff this one.
I won't give away the answer, but a
Standard Bad Solution involves the use
of a switch or case statment (or just
good old-fashioned cascaded-ifs). A
Slightly Better Solution involves
using a table of function pointers,
and the Probably Best Solution
involves using polymorphism. I
encourage you to work through it
sometime. Fun stuff!
So, let's try to tackle the problem all three ways. How do you go from an arithmetic expression (e.g. in a string) such as "2 + (2)" to an expression tree using cascaded-if's, a table of function pointers, and/or polymorphism?
Feel free to tackle one, two, or all three.
[update: title modified to better match what most of the answers have been.]
Polymorphic Tree Walking, Python version
#!/usr/bin/python
class Node:
"""base class, you should not process one of these"""
def process(self):
raise('you should not be processing a node')
class BinaryNode(Node):
"""base class for binary nodes"""
def __init__(self, _left, _right):
self.left = _left
self.right = _right
def process(self):
raise('you should not be processing a binarynode')
class Plus(BinaryNode):
def process(self):
return self.left.process() + self.right.process()
class Minus(BinaryNode):
def process(self):
return self.left.process() - self.right.process()
class Mul(BinaryNode):
def process(self):
return self.left.process() * self.right.process()
class Div(BinaryNode):
def process(self):
return self.left.process() / self.right.process()
class Num(Node):
def __init__(self, _value):
self.value = _value
def process(self):
return self.value
def demo(n):
print n.process()
demo(Num(2)) # 2
demo(Plus(Num(2),Num(5))) # 2 + 3
demo(Plus(Mul(Num(2),Num(3)),Div(Num(10),Num(5)))) # (2 * 3) + (10 / 2)
The tests are just building up the binary trees by using constructors.
program structure:
abstract base class: Node
all Nodes inherit from this class
abstract base class: BinaryNode
all binary operators inherit from this class
process method does the work of evaluting the expression and returning the result
binary operator classes: Plus,Minus,Mul,Div
two child nodes, one each for left side and right side subexpressions
number class: Num
holds a leaf-node numeric value, e.g. 17 or 42
The problem, I think, is that we need to parse perentheses, and yet they are not a binary operator? Should we take (2) as a single token, that evaluates to 2?
The parens don't need to show up in the expression tree, but they do affect its shape. E.g., the tree for (1+2)+3 is different from 1+(2+3):
+
/ \
+ 3
/ \
1 2
versus
+
/ \
1 +
/ \
2 3
The parentheses are a "hint" to the parser (e.g., per superjoe30, to "recursively descend")
This gets into parsing/compiler theory, which is kind of a rabbit hole... The Dragon Book is the standard text for compiler construction, and takes this to extremes. In this particular case, you want to construct a context-free grammar for basic arithmetic, then use that grammar to parse out an abstract syntax tree. You can then iterate over the tree, reducing it from the bottom up (it's at this point you'd apply the polymorphism/function pointers/switch statement to reduce the tree).
I've found these notes to be incredibly helpful in compiler and parsing theory.
Representing the Nodes
If we want to include parentheses, we need 5 kinds of nodes:
the binary nodes: Add Minus Mul Divthese have two children, a left and right side
+
/ \
node node
a node to hold a value: Valno children nodes, just a numeric value
a node to keep track of the parens: Parena single child node for the subexpression
( )
|
node
For a polymorphic solution, we need to have this kind of class relationship:
Node
BinaryNode : inherit from Node
Plus : inherit from Binary Node
Minus : inherit from Binary Node
Mul : inherit from Binary Node
Div : inherit from Binary Node
Value : inherit from Node
Paren : inherit from node
There is a virtual function for all nodes called eval(). If you call that function, it will return the value of that subexpression.
String Tokenizer + LL(1) Parser will give you an expression tree... the polymorphism way might involve an abstract Arithmetic class with an "evaluate(a,b)" function, which is overridden for each of the operators involved (Addition, Subtraction etc) to return the appropriate value, and the tree contains Integers and Arithmetic operators, which can be evaluated by a post(?)-order traversal of the tree.
I won't give away the answer, but a
Standard Bad Solution involves the use
of a switch or case statment (or just
good old-fashioned cascaded-ifs). A
Slightly Better Solution involves
using a table of function pointers,
and the Probably Best Solution
involves using polymorphism.
The last twenty years of evolution in interpreters can be seen as going the other way - polymorphism (eg naive Smalltalk metacircular interpreters) to function pointers (naive lisp implementations, threaded code, C++) to switch (naive byte code interpreters), and then onwards to JITs and so on - which either require very big classes, or (in singly polymorphic languages) double-dispatch, which reduces the polymorphism to a type-case, and you're back at stage one. What definition of 'best' is in use here?
For simple stuff a polymorphic solution is OK - here's one I made earlier, but either stack and bytecode/switch or exploiting the runtime's compiler is usually better if you're, say, plotting a function with a few thousand data points.
Hm... I don't think you can write a top-down parser for this without backtracking, so it has to be some sort of a shift-reduce parser. LR(1) or even LALR will of course work just fine with the following (ad-hoc) language definition:
Start -> E1
E1 -> E1+E1 | E1-E1
E1 -> E2*E2 | E2/E2 | E2
E2 -> number | (E1)
Separating it out into E1 and E2 is necessary to maintain the precedence of * and / over + and -.
But this is how I would do it if I had to write the parser by hand:
Two stacks, one storing nodes of the tree as operands and one storing operators
Read the input left to right, make leaf nodes of the numbers and push them into the operand stack.
If you have >= 2 operands on the stack, pop 2, combine them with the topmost operator in the operator stack and push this structure back to the operand tree, unless
The next operator has higher precedence that the one currently on top of the stack.
This leaves us the problem of handling brackets. One elegant (I thought) solution is to store the precedence of each operator as a number in a variable. So initially,
int plus, minus = 1;
int mul, div = 2;
Now every time you see a a left bracket increment all these variables by 2, and every time you see a right bracket, decrement all the variables by 2.
This will ensure that the + in 3*(4+5) has higher precedence than the *, and 3*4 will not be pushed onto the stack. Instead it will wait for 5, push 4+5, then push 3*(4+5).
Re: Justin
I think the tree would look something like this:
+
/ \
2 ( )
|
2
Basically, you'd have an "eval" node, that just evaluates the tree below it. That would then be optimized out to just being:
+
/ \
2 2
In this case the parens aren't required and don't add anything. They don't add anything logically, so they'd just go away.
I think the question is about how to write a parser, not the evaluator. Or rather, how to create the expression tree from a string.
Case statements that return a base class don't exactly count.
The basic structure of a "polymorphic" solution (which is another way of saying, I don't care what you build this with, I just want to extend it with rewriting the least amount of code possible) is deserializing an object hierarchy from a stream with a (dynamic) set of known types.
The crux of the implementation of the polymorphic solution is to have a way to create an expression object from a pattern matcher, likely recursive. I.e., map a BNF or similar syntax to an object factory.
Or maybe this is the real question:
how can you represent (2) as a BST?
That is the part that is tripping me
up.
Recursion.
#Justin:
Look at my note on representing the nodes. If you use that scheme, then
2 + (2)
can be represented as
.
/ \
2 ( )
|
2
should use a functional language imo. Trees are harder to represent and manipulate in OO languages.
As people have been mentioning previously, when you use expression trees parens are not necessary. The order of operations becomes trivial and obvious when you're looking at an expression tree. The parens are hints to the parser.
While the accepted answer is the solution to one half of the problem, the other half - actually parsing the expression - is still unsolved. Typically, these sorts of problems can be solved using a recursive descent parser. Writing such a parser is often a fun exercise, but most modern tools for language parsing will abstract that away for you.
The parser is also significantly harder if you allow floating point numbers in your string. I had to create a DFA to accept floating point numbers in C -- it was a very painstaking and detailed task. Remember, valid floating points include: 10, 10., 10.123, 9.876e-5, 1.0f, .025, etc. I assume some dispensation from this (in favor of simplicty and brevity) was made in the interview.
I've written such a parser with some basic techniques like
Infix -> RPN and
Shunting Yard and
Tree Traversals.
Here is the implementation I've came up with.
It's written in C++ and compiles on both Linux and Windows.
Any suggestions/questions are welcomed.
So, let's try to tackle the problem all three ways. How do you go from an arithmetic expression (e.g. in a string) such as "2 + (2)" to an expression tree using cascaded-if's, a table of function pointers, and/or polymorphism?
This is interesting,but I don't think this belongs to the realm of object-oriented programming...I think it has more to do with parsing techniques.
I've kind of chucked this c# console app together as a bit of a proof of concept. Have a feeling it could be a lot better (that switch statement in GetNode is kind of clunky (it's there coz I hit a blank trying to map a class name to an operator)). Any suggestions on how it could be improved very welcome.
using System;
class Program
{
static void Main(string[] args)
{
string expression = "(((3.5 * 4.5) / (1 + 2)) + 5)";
Console.WriteLine(string.Format("{0} = {1}", expression, new Expression.ExpressionTree(expression).Value));
Console.WriteLine("\nShow's over folks, press a key to exit");
Console.ReadKey(false);
}
}
namespace Expression
{
// -------------------------------------------------------
abstract class NodeBase
{
public abstract double Value { get; }
}
// -------------------------------------------------------
class ValueNode : NodeBase
{
public ValueNode(double value)
{
_double = value;
}
private double _double;
public override double Value
{
get
{
return _double;
}
}
}
// -------------------------------------------------------
abstract class ExpressionNodeBase : NodeBase
{
protected NodeBase GetNode(string expression)
{
// Remove parenthesis
expression = RemoveParenthesis(expression);
// Is expression just a number?
double value = 0;
if (double.TryParse(expression, out value))
{
return new ValueNode(value);
}
else
{
int pos = ParseExpression(expression);
if (pos > 0)
{
string leftExpression = expression.Substring(0, pos - 1).Trim();
string rightExpression = expression.Substring(pos).Trim();
switch (expression.Substring(pos - 1, 1))
{
case "+":
return new Add(leftExpression, rightExpression);
case "-":
return new Subtract(leftExpression, rightExpression);
case "*":
return new Multiply(leftExpression, rightExpression);
case "/":
return new Divide(leftExpression, rightExpression);
default:
throw new Exception("Unknown operator");
}
}
else
{
throw new Exception("Unable to parse expression");
}
}
}
private string RemoveParenthesis(string expression)
{
if (expression.Contains("("))
{
expression = expression.Trim();
int level = 0;
int pos = 0;
foreach (char token in expression.ToCharArray())
{
pos++;
switch (token)
{
case '(':
level++;
break;
case ')':
level--;
break;
}
if (level == 0)
{
break;
}
}
if (level == 0 && pos == expression.Length)
{
expression = expression.Substring(1, expression.Length - 2);
expression = RemoveParenthesis(expression);
}
}
return expression;
}
private int ParseExpression(string expression)
{
int winningLevel = 0;
byte winningTokenWeight = 0;
int winningPos = 0;
int level = 0;
int pos = 0;
foreach (char token in expression.ToCharArray())
{
pos++;
switch (token)
{
case '(':
level++;
break;
case ')':
level--;
break;
}
if (level <= winningLevel)
{
if (OperatorWeight(token) > winningTokenWeight)
{
winningLevel = level;
winningTokenWeight = OperatorWeight(token);
winningPos = pos;
}
}
}
return winningPos;
}
private byte OperatorWeight(char value)
{
switch (value)
{
case '+':
case '-':
return 3;
case '*':
return 2;
case '/':
return 1;
default:
return 0;
}
}
}
// -------------------------------------------------------
class ExpressionTree : ExpressionNodeBase
{
protected NodeBase _rootNode;
public ExpressionTree(string expression)
{
_rootNode = GetNode(expression);
}
public override double Value
{
get
{
return _rootNode.Value;
}
}
}
// -------------------------------------------------------
abstract class OperatorNodeBase : ExpressionNodeBase
{
protected NodeBase _leftNode;
protected NodeBase _rightNode;
public OperatorNodeBase(string leftExpression, string rightExpression)
{
_leftNode = GetNode(leftExpression);
_rightNode = GetNode(rightExpression);
}
}
// -------------------------------------------------------
class Add : OperatorNodeBase
{
public Add(string leftExpression, string rightExpression)
: base(leftExpression, rightExpression)
{
}
public override double Value
{
get
{
return _leftNode.Value + _rightNode.Value;
}
}
}
// -------------------------------------------------------
class Subtract : OperatorNodeBase
{
public Subtract(string leftExpression, string rightExpression)
: base(leftExpression, rightExpression)
{
}
public override double Value
{
get
{
return _leftNode.Value - _rightNode.Value;
}
}
}
// -------------------------------------------------------
class Divide : OperatorNodeBase
{
public Divide(string leftExpression, string rightExpression)
: base(leftExpression, rightExpression)
{
}
public override double Value
{
get
{
return _leftNode.Value / _rightNode.Value;
}
}
}
// -------------------------------------------------------
class Multiply : OperatorNodeBase
{
public Multiply(string leftExpression, string rightExpression)
: base(leftExpression, rightExpression)
{
}
public override double Value
{
get
{
return _leftNode.Value * _rightNode.Value;
}
}
}
}
Ok, here is my naive implementation. Sorry, I did not feel to use objects for that one but it is easy to convert. I feel a bit like evil Willy (from Steve's story).
#!/usr/bin/env python
#tree structure [left argument, operator, right argument, priority level]
tree_root = [None, None, None, None]
#count of parethesis nesting
parenthesis_level = 0
#current node with empty right argument
current_node = tree_root
#indices in tree_root nodes Left, Operator, Right, PRiority
L, O, R, PR = 0, 1, 2, 3
#functions that realise operators
def sum(a, b):
return a + b
def diff(a, b):
return a - b
def mul(a, b):
return a * b
def div(a, b):
return a / b
#tree evaluator
def process_node(n):
try:
len(n)
except TypeError:
return n
left = process_node(n[L])
right = process_node(n[R])
return n[O](left, right)
#mapping operators to relevant functions
o2f = {'+': sum, '-': diff, '*': mul, '/': div, '(': None, ')': None}
#converts token to a node in tree
def convert_token(t):
global current_node, tree_root, parenthesis_level
if t == '(':
parenthesis_level += 2
return
if t == ')':
parenthesis_level -= 2
return
try: #assumption that we have just an integer
l = int(t)
except (ValueError, TypeError):
pass #if not, no problem
else:
if tree_root[L] is None: #if it is first number, put it on the left of root node
tree_root[L] = l
else: #put on the right of current_node
current_node[R] = l
return
priority = (1 if t in '+-' else 2) + parenthesis_level
#if tree_root does not have operator put it there
if tree_root[O] is None and t in o2f:
tree_root[O] = o2f[t]
tree_root[PR] = priority
return
#if new node has less or equals priority, put it on the top of tree
if tree_root[PR] >= priority:
temp = [tree_root, o2f[t], None, priority]
tree_root = current_node = temp
return
#starting from root search for a place with higher priority in hierarchy
current_node = tree_root
while type(current_node[R]) != type(1) and priority > current_node[R][PR]:
current_node = current_node[R]
#insert new node
temp = [current_node[R], o2f[t], None, priority]
current_node[R] = temp
current_node = temp
def parse(e):
token = ''
for c in e:
if c <= '9' and c >='0':
token += c
continue
if c == ' ':
if token != '':
convert_token(token)
token = ''
continue
if c in o2f:
if token != '':
convert_token(token)
convert_token(c)
token = ''
continue
print "Unrecognized character:", c
if token != '':
convert_token(token)
def main():
parse('(((3 * 4) / (1 + 2)) + 5)')
print tree_root
print process_node(tree_root)
if __name__ == '__main__':
main()