How can I implement 2d subscripts via AT-POS for different classes? - raku

here is an MRE (showing two attempts, with debug left in to be helpful) to try and get 2d subscripting working with AT-POS across a DataFrame that has columns of Series...
class Series does Positional {
has Real #.data = [0.1,0.2,0.3];
method AT-POS( $p ) {
#!data[$p]
}
}
class DataFrame does Positional {
has Series #.series;
#`[ ATTEMPT #1
method AT-POS( $p, $q? ) {
given $q {
when Int { #say 'Int';
#!series[$p][$q]
}
when Whatever { #say '*';
#!series[$p].data
}
default { #say 'default';
#!series[$p]
}
}
}
#]
# ATTEMPT #2
method AT-POS(|c) is raw { #`[dd c;] #!series.AT-POS(|c) }
}
my $df = DataFrame.new( series => [Series.new xx 3] );
say $df[1].data; #[0.1 0.2 0.3]
say $df[1][2]; #0.3
say $df[0,1]; #(Series.new(data => $[0.1, 0.2, 0.3]) Series.new(data => $[0.1, 0.2, 0.3]))
say $df[1;2]; #0.3
say $df[1;*]; #got (0.1) ... expected [0.1 0.2 0.3]
say $df[*;1]; #got (0.2) ... wanted [0.2 0.2 0.2]
I already researched on SO and found three related questions here, here and here ... and the attempt #2 in my code seeks to apply #lizmats Answer to the third one. Encouragingly both the attempts in my MRE have the same behaviour. But I cannot workout
why the when Whatever {} option is not entered (attempt #1)
what the |c is doing - even though I can see that it works in the single subscript case (attempt #2)
I have done some experimenting with multi postcircumfix:<[ ]>( DataFrame:D $df, #slicer where Range|List ) is export {} but this seems to overcomplicate matters.
==================
Great answer from #jonathan building on the original from #Lizmat - thanks! Here is the final, working code:
class Series does Positional {
has Real #.data = [0.1,0.2,0.3];
method elems {
#!data.elems
}
method AT-POS( |p ) is raw {
#!data.AT-POS( |p )
}
}
class DataFrame does Positional {
has Series #.series;
method elems {
#!series.elems
}
method AT-POS( |p ) is raw {
#!series.AT-POS( |p )
}
}
my $df = DataFrame.new( series => Series.new xx 3 );
say $df[1].data; #[0.1 0.2 0.3]
say $df[1][2]; #0.3
say $df[0,1]; #(Series.new(data => $[0.1, 0.2, 0.3]) Series.new(data => $[0.1, 0.2, 0.3]))
say $df[1;2]; #0.3
say $df[1;*]; #(0.1 0.2 0.3)
say $df[*;1]; #(0.2 0.2 0.2)

The AT-POS method is only ever passed integer array indices.
The logic to handle slicing (with *, ranges, other iterables, the zen slice) is located in the array indexing operator, which is implemented as the multiple-dispatch subroutine postcircumfix:<[ ]> for single-dimension indexing and postcircumfix:<[; ]> for multi-dimension indexing. The idea is that a class that wants to act as an array-alike need not worry about re-implementing all of the slicing behavior and, further, that the slicing behavior will behave consistently over different user-defined types.
For slicing to work, one must implement elems as well as AT-POS. Adding:
method elems() { #!data.elems }
Is Series and:
method elems() { #!series.elems }
In DataFrame gives the results you're looking for.
If one really wants different slicing semantics, or a far more efficient implementation than the standard one is possible, one can also add multi candidates for the indexing operator (remembering to mark them is export).

This answer is simply an elaboration of the point #raiph made in a comment:
You may be able to simplify the code by using handles
Indeed you can – so much so that I thought it was worth showing what that would look like in a code block without a comment's formatting limits.
Using handles, you can simplify each of the two classes from 9 non-whitespace lines to 3:
class Series does Positional {
has Real #.data handles <elems AT-POS> = [0.1,0.2,0.3];
}
class DataFrame does Positional {
has Series #.series handles <elems AT-POS>;
}
(or you could even have each class on 1 line, if you format them the way I'd be tempted to.)
This code produces all the same results to the say statements from the code in the question.

Related

Raku Ambiguous call to infix(Hyper: Dan::Series, Int)

I am writing a model Series class (kinda like the one in pandas) - and it should be both Positional and Associative.
class Series does Positional does Iterable does Associative {
has Array $.data is required;
has Array $.index;
### Construction ###
method TWEAK {
# sort out data-index dependencies
$!index = gather {
my $i = 0;
for |$!data -> $d {
take ( $!index[$i++] => $d )
}
}.Array
}
### Output ###
method Str {
$!index
}
### Role Support ###
# Positional role support
# viz. https://docs.raku.org/type/Positional
method of {
Mu
}
method elems {
$!data.elems
}
method AT-POS( $p ) {
$!data[$p]
}
method EXISTS-POS( $p ) {
0 <= $p < $!data.elems ?? True !! False
}
# Iterable role support
# viz. https://docs.raku.org/type/Iterable
method iterator {
$!data.iterator
}
method flat {
$!data.flat
}
method lazy {
$!data.lazy
}
method hyper {
$!data.hyper
}
# Associative role support
# viz. https://docs.raku.org/type/Associative
method keyof {
Str(Any)
}
method AT-KEY( $k ) {
for |$!index -> $p {
return $p.value if $p.key ~~ $k
}
}
method EXISTS-KEY( $k ) {
for |$!index -> $p {
return True if $p.key ~~ $k
}
}
#`[ solution attempt #1 does NOT get called
multi method infix(Hyper: Series, Int) is default {
die "I was called"
}
#]
}
my $s = Series.new(data => [rand xx 5], index => [<a b c d e>]);
say ~$s;
say $s[2];
say $s<b>;
So far pretty darn cool.
I can go dd $s.hyper and get this
HyperSeq.new(configuration => HyperConfiguration.new(batch => 64, degree => 1))
BUT (there had to be a but coming), I want to be able to do hyper math on my Series' elements, something like:
say $s >>+>> 2;
But that yields:
Ambiguous call to 'infix(Hyper: Dan::Series, Int)'; these signatures all match:
(Hyper: Associative:D \left, \right, *%_)
(Hyper: Positional:D \left, \right, *%_)
in block <unit> at ./synopsis-dan.raku line 63
How can I tell my class Series not to offer the Associative hyper candidate...?
Note: edited example to be a runnable MRE per #raiph's comment ... I have thus left in the minimum requirements for the 3 roles in play per docs.raku.org
After some experimentation (and new directions to consider from the very helpful comments to this SO along the way), I think I have found a solution:
drop the does Associative role from the class declaration like this:
class Series does Positional does Iterable {...}
BUT
leave the Associative role support methods in the body of the class:
# Associative role support
# viz. https://docs.raku.org/type/Associative
method keyof {
Str(Any)
}
method AT-KEY( $k ) {
for |$!index -> $p {
return $p.value if $p.key ~~ $k
}
}
method EXISTS-KEY( $k ) {
for |$!index -> $p {
return True if $p.key ~~ $k
}
}
This gives me the Positional and Associative accessors, and functional hyper math operators:
my $s = Series.new(data => [rand xx 5], index => [<a b c d e>]);
say ~$s; #([a => 0.6137271559776396 b => 0.7942959887386045 c => 0.5768018697817604 d => 0.8964323560788711 e => 0.025740150933493577] , dtype: Num)
say $s[2]; #0.7942959887386045
say $s<b>; #0.5768018697817604
say $s >>+>> 2; #(2.6137271559776396 2.7942959887386047 2.5768018697817605 2.896432356078871 2.0257401509334936)
While this feels a bit thin (and probably lacks the full set of Associative functions) I am fairly confident that the basic methods will give me slimmed down access like a hash from a key capability that I seek. And it no longer creates the ambiguous call.
This solution may be cheating a bit in that I know the level of compromise that I will accept ;-).
Take #1
First, an MRE with an emphasis on the M1:
class foo does Positional does Associative { method of {} }
sub infix:<baz> (\l,\r) { say 'baz' }
foo.new >>baz>> 42;
yields:
Ambiguous call to 'infix(Hyper: foo, Int)'; these signatures all match:
(Hyper: Associative:D \left, \right, *%_)
(Hyper: Positional:D \left, \right, *%_)
in block <unit> at ./synopsis-dan.raku line 63
The error message shows it's A) a call to a method named infix with an invocant matching Hyper, and B) there are two methods that potentially match that call.
Given that there's no class Hyper in your MRE, these methods and the Hyper class must be either built-ins or internal details that are leaking out.
A search of the doc finds no such class. So Hyper is undocumented Given that the doc has fairly broad coverage these days, this suggests Hyper is an internal detail. But regardless, it looks like you can't solve your problem using official/documented features.
Hopefully this bad news is still better than none.2
Take #2
Where's the fun in letting little details like "not an official feature" stop us doing what we want to do?
There's a core.c module named Hyper.pm6 in the Rakudo source repo.
A few seconds browsing that, and clicks on its History and Blame, and I can instantly see it really is time for me to conclude this SO answer, with a recommendation for your next move.
To wit, I suggest you start another SO, using this answer as its heart (but reversing my presentation order, ie starting by mentioning Hyper, and that it's not doc'd), and namechecking Liz (per Hyper's History/Blame), with a link back to your Q here as its background. I'm pretty sure that will get you a good answer, or at least an authoritative one.
Footnotes
1 I also tried this:
class foo does Positional does Associative { method of {} }
sub postfix:<bar>(\arg) { say 'bar' }
foo.new>>bar;
but that worked (displayed bar).
2 If you didn't get to my Take #1 conclusion yourself, perhaps that was was because your MRE wasn't very M? If you did arrive at the same point (cf "solution attempt #1 does NOT get called" in your MRE) then please read and, for future SOs, take to heart, the wisdom of "Explain ... any difficulties that have prevented you from solving it yourself".

Binding of private attributes: nqp::bindattr vs :=

I'm trying to find how the binding operation works on attributes and what makes it so different from nqp::bindattr. Consider the following example:
class Foo {
has #!foo;
submethod TWEAK {
my $fval = [<a b c>];
use nqp;
nqp::bindattr( nqp::decont(self), $?CLASS, '#!foo',
##!foo :=
Proxy.new(
FETCH => -> $ { $fval },
STORE => -> $, $v { $fval = $v }
)
);
}
method check {
say #!foo.perl;
}
}
my $inst = Foo.new;
$inst.check;
It prints:
$["a", "b", "c"]
Replacing nqp::bindattr with the binding operator from the comment gives correct output:
["a", "b", "c"]
Similarly, if foo is a public attribute and accessor is used the output would be correct too due to deconterisation taking place within the accessor.
I use similar code in my AttrX::Mooish module where use of := would overcomplicate the implementation. So far, nqp::bindattr did the good job for me until the above problem arised.
I tried tracing down Rakudo's internals looking for := implementation but without any success so far. I would ask here either for an advise as to how to simulate the operator or where in the source to look for its implementation.
Before I dig into the answer: most things in this post are implementation-defined, and the implementation is free to define them differently in the future.
To find out what something (naively) compiles into under Rakudo Perl 6, use the --target=ast option (perl6 --target=ast foo.p6). For example, the bind in:
class C {
has $!a;
submethod BUILD() {
my $x = [1,2,3];
$!a := $x
}
}
Comes out as:
- QAST::Op(bind) :statement_id<7>
- QAST::Var(attribute $!a) <wanted> $!a
- QAST::Var(lexical self)
- QAST::WVal(C)
- QAST::Var(lexical $x) $x
While switching it for #!a like here:
class C {
has #!a;
submethod BUILD() {
my $x = [1,2,3];
#!a := $x
}
}
Comes out as:
- QAST::Op(bind) :statement_id<7>
- QAST::Var(attribute #!a) <wanted> #!a
- QAST::Var(lexical self)
- QAST::WVal(C)
- QAST::Op(p6bindassert)
- QAST::Op(decont)
- QAST::Var(lexical $x) $x
- QAST::WVal(Positional)
The decont instruction is the big difference here, and it will take the contents of the Proxy by calling its FETCH, thus why the containerization is gone. Thus, you can replicate the behavior by inserting nqp::decont around the Proxy, although that rather begs the question of what the Proxy is doing there if the correct answer is obtained without it!
Both := and = are compiled using case analysis (namely, by looking at what is on the left hand side). := only works for a limited range of simple expressions on the left; it is a decidedly low-level operator. By contrast, = falls back to a sub call if the case analysis doesn't come up with a more efficient form to emit, though in most cases it manages something better.
The case analysis for := inserts a decont when the target is a lexical or attribute with sigil # or %, since - at a Perl 6 level - having an item bound to an # or % makes no sense. Using nqp::bindattr is going a level below Perl 6 semantics, and so it's possible to end up with the Proxy bound directly there using that. However, it also violates expectations elsewhere. Don't expect that to go well (but it seems you don't want to do that anyway.)

Alternative to nested if for conditional logic

Is there a design pattern or methodology or language that allows you to write complex conditional logic beyond just nested Ifs?
At the very least, does this kind of question or problem have a name? I was unable to find anything here or through Google that described what I was trying to solve that wasn't just, replace your IF with a Switch statement.
I'm playing around with a script to generate a bunch of data. As part of this, I'd like to add in a lot of branching conditional logic that should provide variety as well as block off certain combinations.
Something like, If User is part of group A, then they can't be part of group B, and if they have Attribute C, then that limits them to characteristic 5 or 6, but nothing below or above that.
The answer is simple: refactoring.
Let's take an example (pseudo-code):
if (a) {
if (b) {
if (c) {
// do something
}
}
}
can be replaced by:
if (a && b && c) {
// do something
}
Now, say that a, b and c are complex predicates which makes the code hard to read, for example:
if (visitorIsInActiveTestCell(visitor) &&
!specialOptOutConditionsApply(request, visitor) &&
whatEverWeWantToCheckHere(bla, blabla)) {
// do something
}
we can refactor it as well and create a new method:
def shouldDoSomething(request, visitor, bla, blabla) {
return visitorIsInActiveTestCell(visitor) &&
!specialOptOutConditionsApply(request, visitor) &&
whatEverWeWantToCheckHere(bla, blabla)
}
and now our if condition isn't nested and becomes easier to read and understand:
if (shouldDoSomething(request, visitor, bla, blabla)) {
// do something
}
Sometimes it's not straightforward to extract such logic and refactor, and it may require taking some time to think about it, but I haven't yet ran into an example in which it was impossible.
All of the foregoing answers seem to miss the question.
One of the patterns that frequently occurs in hardware-interface looks like this:
if (something) {
step1;
if ( the result of step1) {
step2;
if (the result of step2) {
step3;
... and so on
}}}...
This structure cannot be collapsed into a logical conjunction, as each step is dependent on the result of the previous one, and may itself have internal conditions.
In assembly code, it would be a simple matter of test and branch to a common target; i.e., the dreaded "go to". In C, you end up with a pile of indented code that after about 8 levels is very difficult to read.
About the best that I've been able to come up with is:
while( true) {
if ( !something)
break;
step1
if ( ! result of step1)
break;
step2
if ( ! result of step2)
break;
step3
...
break;
}
Does anyone have a better solution?
It is possible you want to replace your conditional logic with polymorphism, assuming you are using an object-oriented language.
That is, instead of:
class Bird:
#...
def getSpeed(self):
if self.type == EUROPEAN:
return self.getBaseSpeed();
elif self.type == AFRICAN:
return self.getBaseSpeed() - self.getLoadFactor() * self.numberOfCoconuts;
elif self.type == NORWEGIAN_BLUE:
return 0 if isNailed else self.getBaseSpeed(self.voltage)
else:
raise Exception("Should be unreachable")
You can say:
class Bird:
#...
def getSpeed(self):
pass
class European(Bird):
def getSpeed(self):
return self.getBaseSpeed()
class African(Bird):
def getSpeed(self):
return self.getBaseSpeed() - self.getLoadFactor() * self.numberOfCoconuts
class NorwegianBlue(Bird):
def getSpeed():
return 0 if self.isNailed else self.getBaseSpeed(self.voltage)
# Somewhere in client code
speed = bird.getSpeed()
Taken from here.

Conflicting lifetime requirement for iterator returned from function

This may be a duplicate. I don't know. I couldn't understand the other answers well enough to know that. :)
Rust version: rustc 1.0.0-nightly (b47aebe3f 2015-02-26) (built 2015-02-27)
Basically, I'm passing a bool to this function that's supposed to build an iterator that filters one way for true and another way for false. Then it kind of craps itself because it doesn't know how to keep that boolean value handy, I guess. I don't know. There are actually multiple lifetime problems here, which is discouraging because this is a really common pattern for me, since I come from a .NET background.
fn main() {
for n in values(true) {
println!("{}", n);
}
}
fn values(even: bool) -> Box<Iterator<Item=usize>> {
Box::new([3usize, 4, 2, 1].iter()
.map(|n| n * 2)
.filter(|n| if even {
n % 2 == 0
} else {
true
}))
}
Is there a way to make this work?
You have two conflicting issues, so let break down a few representative pieces:
[3usize, 4, 2, 1].iter()
.map(|n| n * 2)
.filter(|n| n % 2 == 0))
Here, we create an array in the stack frame of the method, then get an iterator to it. Since we aren't allowed to consume the array, the iterator item is &usize. We then map from the &usize to a usize. Then we filter against a &usize - we aren't allowed to consume the filtered item, otherwise the iterator wouldn't have it to return!
The problem here is that we are ultimately rooted to the stack frame of the function. We can't return this iterator, because the array won't exist after the call returns!
To work around this for now, let's just make it static. Now we can focus on the issue with even.
filter takes a closure. Closures capture any variable used that isn't provided as an argument to the closure. By default, these variables are captured by reference. However, even is again a variable located on the stack frame. This time however, we can give it to the closure by using the move keyword. Here's everything put together:
fn main() {
for n in values(true) {
println!("{}", n);
}
}
static ITEMS: [usize; 4] = [3, 4, 2, 1];
fn values(even: bool) -> Box<Iterator<Item=usize>> {
Box::new(ITEMS.iter()
.map(|n| n * 2)
.filter(move |n| if even {
n % 2 == 0
} else {
true
}))
}

Expression Evaluation and Tree Walking using polymorphism? (ala Steve Yegge)

This morning, I was reading Steve Yegge's: When Polymorphism Fails, when I came across a question that a co-worker of his used to ask potential employees when they came for their interview at Amazon.
As an example of polymorphism in
action, let's look at the classic
"eval" interview question, which (as
far as I know) was brought to Amazon
by Ron Braunstein. The question is
quite a rich one, as it manages to
probe a wide variety of important
skills: OOP design, recursion, binary
trees, polymorphism and runtime
typing, general coding skills, and (if
you want to make it extra hard)
parsing theory.
At some point, the candidate hopefully
realizes that you can represent an
arithmetic expression as a binary
tree, assuming you're only using
binary operators such as "+", "-",
"*", "/". The leaf nodes are all
numbers, and the internal nodes are
all operators. Evaluating the
expression means walking the tree. If
the candidate doesn't realize this,
you can gently lead them to it, or if
necessary, just tell them.
Even if you tell them, it's still an
interesting problem.
The first half of the question, which
some people (whose names I will
protect to my dying breath, but their
initials are Willie Lewis) feel is a
Job Requirement If You Want To Call
Yourself A Developer And Work At
Amazon, is actually kinda hard. The
question is: how do you go from an
arithmetic expression (e.g. in a
string) such as "2 + (2)" to an
expression tree. We may have an ADJ
challenge on this question at some
point.
The second half is: let's say this is
a 2-person project, and your partner,
who we'll call "Willie", is
responsible for transforming the
string expression into a tree. You get
the easy part: you need to decide what
classes Willie is to construct the
tree with. You can do it in any
language, but make sure you pick one,
or Willie will hand you assembly
language. If he's feeling ornery, it
will be for a processor that is no
longer manufactured in production.
You'd be amazed at how many candidates
boff this one.
I won't give away the answer, but a
Standard Bad Solution involves the use
of a switch or case statment (or just
good old-fashioned cascaded-ifs). A
Slightly Better Solution involves
using a table of function pointers,
and the Probably Best Solution
involves using polymorphism. I
encourage you to work through it
sometime. Fun stuff!
So, let's try to tackle the problem all three ways. How do you go from an arithmetic expression (e.g. in a string) such as "2 + (2)" to an expression tree using cascaded-if's, a table of function pointers, and/or polymorphism?
Feel free to tackle one, two, or all three.
[update: title modified to better match what most of the answers have been.]
Polymorphic Tree Walking, Python version
#!/usr/bin/python
class Node:
"""base class, you should not process one of these"""
def process(self):
raise('you should not be processing a node')
class BinaryNode(Node):
"""base class for binary nodes"""
def __init__(self, _left, _right):
self.left = _left
self.right = _right
def process(self):
raise('you should not be processing a binarynode')
class Plus(BinaryNode):
def process(self):
return self.left.process() + self.right.process()
class Minus(BinaryNode):
def process(self):
return self.left.process() - self.right.process()
class Mul(BinaryNode):
def process(self):
return self.left.process() * self.right.process()
class Div(BinaryNode):
def process(self):
return self.left.process() / self.right.process()
class Num(Node):
def __init__(self, _value):
self.value = _value
def process(self):
return self.value
def demo(n):
print n.process()
demo(Num(2)) # 2
demo(Plus(Num(2),Num(5))) # 2 + 3
demo(Plus(Mul(Num(2),Num(3)),Div(Num(10),Num(5)))) # (2 * 3) + (10 / 2)
The tests are just building up the binary trees by using constructors.
program structure:
abstract base class: Node
all Nodes inherit from this class
abstract base class: BinaryNode
all binary operators inherit from this class
process method does the work of evaluting the expression and returning the result
binary operator classes: Plus,Minus,Mul,Div
two child nodes, one each for left side and right side subexpressions
number class: Num
holds a leaf-node numeric value, e.g. 17 or 42
The problem, I think, is that we need to parse perentheses, and yet they are not a binary operator? Should we take (2) as a single token, that evaluates to 2?
The parens don't need to show up in the expression tree, but they do affect its shape. E.g., the tree for (1+2)+3 is different from 1+(2+3):
+
/ \
+ 3
/ \
1 2
versus
+
/ \
1 +
/ \
2 3
The parentheses are a "hint" to the parser (e.g., per superjoe30, to "recursively descend")
This gets into parsing/compiler theory, which is kind of a rabbit hole... The Dragon Book is the standard text for compiler construction, and takes this to extremes. In this particular case, you want to construct a context-free grammar for basic arithmetic, then use that grammar to parse out an abstract syntax tree. You can then iterate over the tree, reducing it from the bottom up (it's at this point you'd apply the polymorphism/function pointers/switch statement to reduce the tree).
I've found these notes to be incredibly helpful in compiler and parsing theory.
Representing the Nodes
If we want to include parentheses, we need 5 kinds of nodes:
the binary nodes: Add Minus Mul Divthese have two children, a left and right side
+
/ \
node node
a node to hold a value: Valno children nodes, just a numeric value
a node to keep track of the parens: Parena single child node for the subexpression
( )
|
node
For a polymorphic solution, we need to have this kind of class relationship:
Node
BinaryNode : inherit from Node
Plus : inherit from Binary Node
Minus : inherit from Binary Node
Mul : inherit from Binary Node
Div : inherit from Binary Node
Value : inherit from Node
Paren : inherit from node
There is a virtual function for all nodes called eval(). If you call that function, it will return the value of that subexpression.
String Tokenizer + LL(1) Parser will give you an expression tree... the polymorphism way might involve an abstract Arithmetic class with an "evaluate(a,b)" function, which is overridden for each of the operators involved (Addition, Subtraction etc) to return the appropriate value, and the tree contains Integers and Arithmetic operators, which can be evaluated by a post(?)-order traversal of the tree.
I won't give away the answer, but a
Standard Bad Solution involves the use
of a switch or case statment (or just
good old-fashioned cascaded-ifs). A
Slightly Better Solution involves
using a table of function pointers,
and the Probably Best Solution
involves using polymorphism.
The last twenty years of evolution in interpreters can be seen as going the other way - polymorphism (eg naive Smalltalk metacircular interpreters) to function pointers (naive lisp implementations, threaded code, C++) to switch (naive byte code interpreters), and then onwards to JITs and so on - which either require very big classes, or (in singly polymorphic languages) double-dispatch, which reduces the polymorphism to a type-case, and you're back at stage one. What definition of 'best' is in use here?
For simple stuff a polymorphic solution is OK - here's one I made earlier, but either stack and bytecode/switch or exploiting the runtime's compiler is usually better if you're, say, plotting a function with a few thousand data points.
Hm... I don't think you can write a top-down parser for this without backtracking, so it has to be some sort of a shift-reduce parser. LR(1) or even LALR will of course work just fine with the following (ad-hoc) language definition:
Start -> E1
E1 -> E1+E1 | E1-E1
E1 -> E2*E2 | E2/E2 | E2
E2 -> number | (E1)
Separating it out into E1 and E2 is necessary to maintain the precedence of * and / over + and -.
But this is how I would do it if I had to write the parser by hand:
Two stacks, one storing nodes of the tree as operands and one storing operators
Read the input left to right, make leaf nodes of the numbers and push them into the operand stack.
If you have >= 2 operands on the stack, pop 2, combine them with the topmost operator in the operator stack and push this structure back to the operand tree, unless
The next operator has higher precedence that the one currently on top of the stack.
This leaves us the problem of handling brackets. One elegant (I thought) solution is to store the precedence of each operator as a number in a variable. So initially,
int plus, minus = 1;
int mul, div = 2;
Now every time you see a a left bracket increment all these variables by 2, and every time you see a right bracket, decrement all the variables by 2.
This will ensure that the + in 3*(4+5) has higher precedence than the *, and 3*4 will not be pushed onto the stack. Instead it will wait for 5, push 4+5, then push 3*(4+5).
Re: Justin
I think the tree would look something like this:
+
/ \
2 ( )
|
2
Basically, you'd have an "eval" node, that just evaluates the tree below it. That would then be optimized out to just being:
+
/ \
2 2
In this case the parens aren't required and don't add anything. They don't add anything logically, so they'd just go away.
I think the question is about how to write a parser, not the evaluator. Or rather, how to create the expression tree from a string.
Case statements that return a base class don't exactly count.
The basic structure of a "polymorphic" solution (which is another way of saying, I don't care what you build this with, I just want to extend it with rewriting the least amount of code possible) is deserializing an object hierarchy from a stream with a (dynamic) set of known types.
The crux of the implementation of the polymorphic solution is to have a way to create an expression object from a pattern matcher, likely recursive. I.e., map a BNF or similar syntax to an object factory.
Or maybe this is the real question:
how can you represent (2) as a BST?
That is the part that is tripping me
up.
Recursion.
#Justin:
Look at my note on representing the nodes. If you use that scheme, then
2 + (2)
can be represented as
.
/ \
2 ( )
|
2
should use a functional language imo. Trees are harder to represent and manipulate in OO languages.
As people have been mentioning previously, when you use expression trees parens are not necessary. The order of operations becomes trivial and obvious when you're looking at an expression tree. The parens are hints to the parser.
While the accepted answer is the solution to one half of the problem, the other half - actually parsing the expression - is still unsolved. Typically, these sorts of problems can be solved using a recursive descent parser. Writing such a parser is often a fun exercise, but most modern tools for language parsing will abstract that away for you.
The parser is also significantly harder if you allow floating point numbers in your string. I had to create a DFA to accept floating point numbers in C -- it was a very painstaking and detailed task. Remember, valid floating points include: 10, 10., 10.123, 9.876e-5, 1.0f, .025, etc. I assume some dispensation from this (in favor of simplicty and brevity) was made in the interview.
I've written such a parser with some basic techniques like
Infix -> RPN and
Shunting Yard and
Tree Traversals.
Here is the implementation I've came up with.
It's written in C++ and compiles on both Linux and Windows.
Any suggestions/questions are welcomed.
So, let's try to tackle the problem all three ways. How do you go from an arithmetic expression (e.g. in a string) such as "2 + (2)" to an expression tree using cascaded-if's, a table of function pointers, and/or polymorphism?
This is interesting,but I don't think this belongs to the realm of object-oriented programming...I think it has more to do with parsing techniques.
I've kind of chucked this c# console app together as a bit of a proof of concept. Have a feeling it could be a lot better (that switch statement in GetNode is kind of clunky (it's there coz I hit a blank trying to map a class name to an operator)). Any suggestions on how it could be improved very welcome.
using System;
class Program
{
static void Main(string[] args)
{
string expression = "(((3.5 * 4.5) / (1 + 2)) + 5)";
Console.WriteLine(string.Format("{0} = {1}", expression, new Expression.ExpressionTree(expression).Value));
Console.WriteLine("\nShow's over folks, press a key to exit");
Console.ReadKey(false);
}
}
namespace Expression
{
// -------------------------------------------------------
abstract class NodeBase
{
public abstract double Value { get; }
}
// -------------------------------------------------------
class ValueNode : NodeBase
{
public ValueNode(double value)
{
_double = value;
}
private double _double;
public override double Value
{
get
{
return _double;
}
}
}
// -------------------------------------------------------
abstract class ExpressionNodeBase : NodeBase
{
protected NodeBase GetNode(string expression)
{
// Remove parenthesis
expression = RemoveParenthesis(expression);
// Is expression just a number?
double value = 0;
if (double.TryParse(expression, out value))
{
return new ValueNode(value);
}
else
{
int pos = ParseExpression(expression);
if (pos > 0)
{
string leftExpression = expression.Substring(0, pos - 1).Trim();
string rightExpression = expression.Substring(pos).Trim();
switch (expression.Substring(pos - 1, 1))
{
case "+":
return new Add(leftExpression, rightExpression);
case "-":
return new Subtract(leftExpression, rightExpression);
case "*":
return new Multiply(leftExpression, rightExpression);
case "/":
return new Divide(leftExpression, rightExpression);
default:
throw new Exception("Unknown operator");
}
}
else
{
throw new Exception("Unable to parse expression");
}
}
}
private string RemoveParenthesis(string expression)
{
if (expression.Contains("("))
{
expression = expression.Trim();
int level = 0;
int pos = 0;
foreach (char token in expression.ToCharArray())
{
pos++;
switch (token)
{
case '(':
level++;
break;
case ')':
level--;
break;
}
if (level == 0)
{
break;
}
}
if (level == 0 && pos == expression.Length)
{
expression = expression.Substring(1, expression.Length - 2);
expression = RemoveParenthesis(expression);
}
}
return expression;
}
private int ParseExpression(string expression)
{
int winningLevel = 0;
byte winningTokenWeight = 0;
int winningPos = 0;
int level = 0;
int pos = 0;
foreach (char token in expression.ToCharArray())
{
pos++;
switch (token)
{
case '(':
level++;
break;
case ')':
level--;
break;
}
if (level <= winningLevel)
{
if (OperatorWeight(token) > winningTokenWeight)
{
winningLevel = level;
winningTokenWeight = OperatorWeight(token);
winningPos = pos;
}
}
}
return winningPos;
}
private byte OperatorWeight(char value)
{
switch (value)
{
case '+':
case '-':
return 3;
case '*':
return 2;
case '/':
return 1;
default:
return 0;
}
}
}
// -------------------------------------------------------
class ExpressionTree : ExpressionNodeBase
{
protected NodeBase _rootNode;
public ExpressionTree(string expression)
{
_rootNode = GetNode(expression);
}
public override double Value
{
get
{
return _rootNode.Value;
}
}
}
// -------------------------------------------------------
abstract class OperatorNodeBase : ExpressionNodeBase
{
protected NodeBase _leftNode;
protected NodeBase _rightNode;
public OperatorNodeBase(string leftExpression, string rightExpression)
{
_leftNode = GetNode(leftExpression);
_rightNode = GetNode(rightExpression);
}
}
// -------------------------------------------------------
class Add : OperatorNodeBase
{
public Add(string leftExpression, string rightExpression)
: base(leftExpression, rightExpression)
{
}
public override double Value
{
get
{
return _leftNode.Value + _rightNode.Value;
}
}
}
// -------------------------------------------------------
class Subtract : OperatorNodeBase
{
public Subtract(string leftExpression, string rightExpression)
: base(leftExpression, rightExpression)
{
}
public override double Value
{
get
{
return _leftNode.Value - _rightNode.Value;
}
}
}
// -------------------------------------------------------
class Divide : OperatorNodeBase
{
public Divide(string leftExpression, string rightExpression)
: base(leftExpression, rightExpression)
{
}
public override double Value
{
get
{
return _leftNode.Value / _rightNode.Value;
}
}
}
// -------------------------------------------------------
class Multiply : OperatorNodeBase
{
public Multiply(string leftExpression, string rightExpression)
: base(leftExpression, rightExpression)
{
}
public override double Value
{
get
{
return _leftNode.Value * _rightNode.Value;
}
}
}
}
Ok, here is my naive implementation. Sorry, I did not feel to use objects for that one but it is easy to convert. I feel a bit like evil Willy (from Steve's story).
#!/usr/bin/env python
#tree structure [left argument, operator, right argument, priority level]
tree_root = [None, None, None, None]
#count of parethesis nesting
parenthesis_level = 0
#current node with empty right argument
current_node = tree_root
#indices in tree_root nodes Left, Operator, Right, PRiority
L, O, R, PR = 0, 1, 2, 3
#functions that realise operators
def sum(a, b):
return a + b
def diff(a, b):
return a - b
def mul(a, b):
return a * b
def div(a, b):
return a / b
#tree evaluator
def process_node(n):
try:
len(n)
except TypeError:
return n
left = process_node(n[L])
right = process_node(n[R])
return n[O](left, right)
#mapping operators to relevant functions
o2f = {'+': sum, '-': diff, '*': mul, '/': div, '(': None, ')': None}
#converts token to a node in tree
def convert_token(t):
global current_node, tree_root, parenthesis_level
if t == '(':
parenthesis_level += 2
return
if t == ')':
parenthesis_level -= 2
return
try: #assumption that we have just an integer
l = int(t)
except (ValueError, TypeError):
pass #if not, no problem
else:
if tree_root[L] is None: #if it is first number, put it on the left of root node
tree_root[L] = l
else: #put on the right of current_node
current_node[R] = l
return
priority = (1 if t in '+-' else 2) + parenthesis_level
#if tree_root does not have operator put it there
if tree_root[O] is None and t in o2f:
tree_root[O] = o2f[t]
tree_root[PR] = priority
return
#if new node has less or equals priority, put it on the top of tree
if tree_root[PR] >= priority:
temp = [tree_root, o2f[t], None, priority]
tree_root = current_node = temp
return
#starting from root search for a place with higher priority in hierarchy
current_node = tree_root
while type(current_node[R]) != type(1) and priority > current_node[R][PR]:
current_node = current_node[R]
#insert new node
temp = [current_node[R], o2f[t], None, priority]
current_node[R] = temp
current_node = temp
def parse(e):
token = ''
for c in e:
if c <= '9' and c >='0':
token += c
continue
if c == ' ':
if token != '':
convert_token(token)
token = ''
continue
if c in o2f:
if token != '':
convert_token(token)
convert_token(c)
token = ''
continue
print "Unrecognized character:", c
if token != '':
convert_token(token)
def main():
parse('(((3 * 4) / (1 + 2)) + 5)')
print tree_root
print process_node(tree_root)
if __name__ == '__main__':
main()