Python 3.10 - How to properly type hint __add__ in a dataclass that inherits Generic[TypeVar]? - oop

I have this package called totemp that provides convertions between temperature scales. It's currently in development and I just don't know how to type hint the add magic method.
There are 8 dataclasses, Celsius, Fahrenheit, Delisle, Kelvin, Newton, Rankine, Réaumur and Romer.
Celsius class with dunder add and one of the convertion methods (#no_type_check to prevent mypy from not letting me commit).
from dataclasses import dataclass, field
from typing import Generic, TypeVar, no_type_check
TEMP = TypeVar('TEMP', int, float) # TEMP must be int or float
#dataclass
class Celsius(Generic[TEMP]):
"""..."""
__value: TEMP
__symbol: str = field(compare=False, repr=False, default='ºC')
def __str__(self) -> str:
return f'{self.__value}{self.__symbol}'
def __repr__(self) -> str:
return f'Celsius({self.__value})'
#no_type_check
def __add__(self, other):
match str(type(other)):
case "<class 'totemp.temperature_types.Celsius'>":
return Celsius(self.__value + other.value)
case _:
try:
other = other.to_celsius()
return Celsius(self.__value + other.value)
except AttributeError as error:
print(
f'\033[31;1mAttributeError: {error}.\n ' +
f"Cause: '{other}' is not a temperature scale.\033[m"
)
def to_fahrenheit(self) -> 'Fahrenheit[TEMP]':
"""
Returns a Fahrenheit object which contains the class attribute "value"
with the result from the conversion typed the same as the attribute.
Returns
-------
Fahrenheit[TEMP]
"""
fahrenheit = type(self.__value)(self.__value * 9 / 5 + 32)
return Fahrenheit(fahrenheit)
I tried creating an abstract class, but that didn't worked well, didn't even know what I was doing tbh. Tried read Python docs, but I had no response for what i'm trying to do, couldn't find any solution for myself. Thought about NewType, but after reading about in the docs, didn't understood it very well too.
Thought about doing something like:
GenericType = NewType('Temp', Union[Celsius, Fahrenheit, ...])
But ended up with more mypy errors that I couldn't deal with.
How do I properly type hint dunder add and other magic methods (such as gt, lt, e.g.)?

I got lost in more and more things I thought might be optimized with the code, so I ended up placing a merge request for your current dev branch.
To quickly answer the basic idea relevant for this particular question:
I don't think the class should be generic. I don't understand the purpose of that.
A base class is definitely a good idea (considering the DRY principle). Making it abstract, you could enforce a convert_to method for all subclasses, which you could then utilize in your numeric operand dunder methods like __add__.
Then typing would become fairly easy with bounded type variables:
from abc import ABCMeta, abstractmethod
from typing import Any, ClassVar, TypeVar
T = TypeVar("T", bound="AbstractTemperature")
class AbstractTemperature(metaclass=ABCMeta):
def __init__(self, value: float) -> None:
self._value = value
#property
def value(self) -> float:
return self._value
#abstractmethod
def convert_to(self, temp_cls: type[T]) -> T:
...
def __add__(self: T, other: Any) -> T:
cls = self.__class__
try:
if isinstance(other, AbstractTemperature):
return cls(self._value + other.convert_to(cls).value)
return cls(self._value + other)
except TypeError:
return NotImplemented

Related

Confusion on when to use T.let vs. T.cast

Based on: https://sorbet.org/docs/type-assertions
In my mind, T.cast seems like a more powerful version of T.let. T.cast(expr, type) forces Sorbet to accept expr as type, even if expr cannot be statically verified as type. T.let(expr, type) hints to Sorbet that expr has type, and Sorbet will be able to typecheck that hint is valid. Based on this understanding, I would assume that T.cast could always be used in place of T.let, but T.let is preferred because Sorbet is providing the most type safety. However, I recently found a case where that is not true (see below).
This example is based on: https://sorbet.org/docs/error-reference#7001
# typed: true
class Main
extend T::Sig
sig { params(elem: Integer).returns(T::Boolean) }
def valid?(elem)
elem == 0
end
sig { params(list: T::Array[Integer]).returns(T::Boolean) }
def works(list)
# Declare `all_valid` with type `T::Boolean`
all_valid = T.let(false, T::Boolean)
list.each do |elem|
# Does not change the type of `all_valid`
all_valid &&= valid?(elem) # ok
end
all_valid
end
sig { params(list: T::Array[Integer]).returns(T::Boolean) }
def does_not_work(list)
all_valid = T.cast(false, T::Boolean)
list.each do |elem|
# ERROR! Changing the type of a variable in a loop is not permitted
all_valid &&= valid?(elem)
end
all_valid
end
end
Why does this work with T.let but not with T.cast?
The T.let annotation explicitly widens the type from FalseClass to T:Boolean
T.cast on the other hand simply relaxes Sorbet’s static checks, it doesn’t silence type errors at runtime that occur from treating TrueClass or FalseClass as T::Boolean.

Implementing map & min that takes the tables.keys iterator as argument in Nim

I would like to define overloads of map and min/max (as originally defined in sequtils) that works for tables.keys. Specifically, I want to be able to write something like the following:
import sequtils, sugar, tables
# A mapping from coordinates (x, y) to values.
var locations = initTable[(int, int), int]()
# Put in some random values.
locations[(1, 2)] = 1
locations[(2, 1)] = 2
locations[(-2, 5)] = 3
# Get the minimum X coordinate.
let minX = locations.keys.map(xy => xy[0]).min
echo minX
Now this fails with:
/usercode/in.nim(12, 24) Error: type mismatch: got <iterable[lent (int, int)], proc (xy: GenericParam): untyped>
but expected one of:
proc map[T, S](s: openArray[T]; op: proc (x: T): S {.closure.}): seq[S]
first type mismatch at position: 1
required type for s: openArray[T]
but expression 'keys(locations)' is of type: iterable[lent (int, int)]
expression: map(keys(locations), proc (xy: auto): auto = xy[0])
Below are my three attempts at writing a map that works (code on Nim playground: https://play.nim-lang.org/#ix=3Heq). Attempts 1 & 2 failed and attempt 3 succeeded. Similarly, I implemented min using both attempt 1 & attempt 2, and attempt 1 failed while attempt 2 succeeded.
However, I'm confused as to why the previous attempts fail, and what the best practice is:
Why does attempt 1 fail when the actual return type of the iterators is iterable[T]?
Why does attempt 2 fail for tables.keys? Is tables.keys implemented differently?
Is attempt 2 the canonical way of taking iterators / iterables as function arguments? Are there alternatives to this?
Attempt 1: Function that takes an iterable[T].
Since the Nim manual seems to imply that the result type of calling an iterator is iterable[T], I tried defining map for iterable[T] like this:
iterator map[A, B](iter: iterable[A], fn: A -> B): B =
for x in iter:
yield fn(x)
But it failed with a pretty long and confusing message:
/usercode/in.nim(16, 24) template/generic instantiation of `map` from here
/usercode/in.nim(11, 12) Error: type mismatch: got <iterable[(int, int)]>
but expected one of:
iterator items(a: cstring): char
first type mismatch at position: 1
required type for a: cstring
but expression 'iter' is of type: iterable[(int, int)]
... (more output like this)
From my understanding it seems to say that items is not defined for iterable[T], which seems weird to me because I think items is exactly what's need for an object to be iterable?
Attempt 2: Function that returns an iterator.
I basically copied the implementation in def-/nim-itertools and defined a map function that takes an iterator and returns a new closure iterator:
type Iterable[T] = (iterator: T)
func map[A, B](iter: Iterable[A], fn: A -> B): iterator: B =
(iterator: B =
for x in iter():
yield fn(x))
but this failed with:
/usercode/in.nim(25, 24) Error: type mismatch: got <iterable[lent (int, int)], proc (xy: GenericParam): untyped>
but expected one of:
func map[A, B](iter: Iterable[A]; fn: A -> B): B
first type mismatch at position: 1
required type for iter: Iterable[map.A]
but expression 'keys(locations)' is of type: iterable[lent (int, int)]
proc map[T, S](s: openArray[T]; op: proc (x: T): S {.closure.}): seq[S]
first type mismatch at position: 1
required type for s: openArray[T]
but expression 'keys(locations)' is of type: iterable[lent (int, int)]
expression: map(keys(locations), proc (xy: auto): auto = xy[0])
which hints that maybe tables.keys doesn't return an iterator?
Attempt 3: Rewrite keys using attempt 2.
This replaces tables.keys using a custom myKeys that's implemented in a similar fashion to the version of map in attempt 2. Combined with map in attempt 2, this works:
func myKeys[K, V](table: Table[K, V]): iterator: K =
(iterator: K =
for x in table.keys:
yield x)
Explanation of errors in first attempts
which hints that maybe tables.keys doesn't return an iterator
You are right. It does not return an iterator, it is an iterator that returns elements of the type of your Table keys. Unlike in python3, there seems to be no difference between type(locations.keys) and type(locations.keys()). They both return (int, int).
Here is keys prototype:
iterator keys[A, B](t: Table[A, B]): lent A
The lent keyword avoids copies from the Table elements.
Hence you get a type mismatch for your first and second attempt:
locations.keys.map(xy => xy[0]) has an incorrect first parameter, since you get a (int, int) element where you expect a iterable[A].
Proposals
As for a solution, you can either first convert your keys to a sequence (which is heavy), like hola suggested.
You can directly rewrite a procedure for your specific application, mixing both the copy in the sequence and your operation, gaining a bit in performance.
import tables
# A mapping from coordinates (x, y) to values.
var locations = initTable[(int, int), int]()
# Put in some random values.
locations[(1, 2)] = 1
locations[(2, 1)] = 2
locations[(-2, 5)] = 3
func firstCoordinate[X, Y, V](table: Table[(X, Y), V]): seq[X] =
result = #[]
for x in table.keys:
result.add(x[0])
let minX = locations.firstCoordinate.min
echo minX
This is not strictly adhering your API, but should be more efficient.

Can a map be used in the 'def __init__' ? And how to read errors

I've created a class for a Counter that I'm wanting to build, and I want this object to figure out player scores. I'm expecting to see 3 when I test it, however what I get is the following error:
TypeError: unsupported operand type(s) for += 'method' and 'int' I'm not sure what that means, so I'm not sure if the issue is the map, my object itself, or my syntax. Any and all advice is welcome!
class Counter:
def __init__(self):
self.pointValue = {'Item_1': 3,
'Item_2': 5,
'Item_3': -3
}
def playerScore(self, itemFound ):
self.playerScore += (self.pointValue[itemFound])
return self.playerScore
gameCounter = Counter()
x = 'Item_1'
print (gameCounter.playerScore(x))
You can absolutely use a map (in Python it's called a dict) in __init__; it is not that part of the code that is throwing the error.
The message unsupported operand type(s) for += 'method' and 'int' refers to the line
self.playerScore += (self.pointValue[itemFound])
where self.playerScore is the method you are currently calling; it makes no sense to add an integer to a method, so an error is raised.
I think you actually want something like:
class ScoreCounter:
POINT_VALUE = {'Item_1': 3,
'Item_2': 5,
'Item_3': -3}
def __init__(self):
self.score = 0
def find_item(self, item):
self.score += self.POINT_VALUE.get(item, 0)
return self.score
Which you could use like:
>>> c = ScoreCounter()
>>> c.find_item("Item_1")
3
Note that the POINT_VALUE is a class attribute, shared by all ScoreCounter instances, whereas the score is an instance attribute, separate for each one. Also note that score is a different name to find_item, so you don't end up trying to assign to the method, and I have avoided the name Counter as it's used elsewhere. Finally, I have adopted naming conventions from the Python style guide.

How can i rewrite or convert this C++ code to Haskell code

The c++ code which i wanted to rewrite or convert is:
class numberClass
{
private:
int value;
public:
int read()
{
return value;
}
void load(int x)
{
value = x;
}
void increment()
{
value= value +1;
}
};
int main()
{
numberClass num;
num.load(5);
int x=num.read();
cout<<x<<endl;
num.increment();
x=num.read();
cout<<x;
}
I do not know how to make any entity(like variable in C++) that can hold value throughout the program in haskell.
Please help.
Thanks
Basically, you can't. Values are immutable, and Haskell has no variables in the sense of boxes where you store values, like C++ and similar. You can do something similar using IORefs (which are boxes you can store values in), but it's almost always a wrong design to use them.
Haskell is a very different programming language, it's not a good idea to try to translate code from a language like C, C++, Java or so to Haskell. One has to view the tasks from different angles and approach it in a different way.
That being said:
module Main (main) where
import Data.IORef
main :: IO ()
main = do
num <- newIORef 5 :: IO (IORef Int)
x <- readIORef num
print x
modifyIORef num (+1)
x <- readIORef num
print x
Well, assuming that it's the wrapping, not the mutability, you can easily have a type that only allows constructing constant values and incrementation:
module Incr (Incr, incr, fromIncr, toIncr) where
newtype Incr a = Incr a deriving (Read, Show)
fromIncr :: Incr a -> a
fromIncr (Incr x) = x
incr :: (Enum a) => Incr a -> Incr a
incr (Incr x) = Incr (succ x)
toIncr :: a -> Incr a
toIncr = Incr
As Daniel pointed out, mutability is out of the question, but another purpose of your class is encapsulation, which this module provides just like the C++ class. Of course to a Haskell programmer, this module might not seem very useful, but perhaps you have use cases in mind, where you want to statically prevent library users from using regular addition or multiplication.
A direct translation of your code to haskell is rather stupid but of course possible (as shown in Daniel's answer).
Usually when you are working with state in haskell you might be able to work with the State Monad. As long as you are executing inside the State Monad you can query and update your state. If you want to be able to do some IO in addition (as in your example), you need to stack your State Monad on top of IO.
Using this approach your code might look like this:
import Control.Monad.State
import Prelude hiding(read)
increment = modify (+1)
load = put
read = get
normal :: StateT Int IO ()
normal = do
load 5
x <- read
lift (print x)
increment
x <- read
lift (print x)
main = evalStateT normal 0
But here you don't have an explicit type for your numberClass. If you want this there is a nice library on hackage that you could use: data-lenses.
Using lenses the code might be a little closer to your C++ version:
{-# LANGUAGE TemplateHaskell #-}
import Control.Monad.State(StateT,evalStateT,lift)
import Prelude hiding(read)
import Data.Lens.Lazy((~=),access,(%=))
import Data.Lens.Template(makeLenses)
data Number = Number {
_value :: Int
} deriving (Show)
$( makeLenses [''Number] )
increment = value %= succ
load x = value ~= x
read = access value
withLens :: StateT Number IO ()
withLens = do
load 5
x <- read
lift $ print x
increment
x <- read
lift $ print x
main = evalStateT withLens (Number 0)
Still not exactly your code...but well, it's haskell and not yet another OO-language.

Expression Evaluation and Tree Walking using polymorphism? (ala Steve Yegge)

This morning, I was reading Steve Yegge's: When Polymorphism Fails, when I came across a question that a co-worker of his used to ask potential employees when they came for their interview at Amazon.
As an example of polymorphism in
action, let's look at the classic
"eval" interview question, which (as
far as I know) was brought to Amazon
by Ron Braunstein. The question is
quite a rich one, as it manages to
probe a wide variety of important
skills: OOP design, recursion, binary
trees, polymorphism and runtime
typing, general coding skills, and (if
you want to make it extra hard)
parsing theory.
At some point, the candidate hopefully
realizes that you can represent an
arithmetic expression as a binary
tree, assuming you're only using
binary operators such as "+", "-",
"*", "/". The leaf nodes are all
numbers, and the internal nodes are
all operators. Evaluating the
expression means walking the tree. If
the candidate doesn't realize this,
you can gently lead them to it, or if
necessary, just tell them.
Even if you tell them, it's still an
interesting problem.
The first half of the question, which
some people (whose names I will
protect to my dying breath, but their
initials are Willie Lewis) feel is a
Job Requirement If You Want To Call
Yourself A Developer And Work At
Amazon, is actually kinda hard. The
question is: how do you go from an
arithmetic expression (e.g. in a
string) such as "2 + (2)" to an
expression tree. We may have an ADJ
challenge on this question at some
point.
The second half is: let's say this is
a 2-person project, and your partner,
who we'll call "Willie", is
responsible for transforming the
string expression into a tree. You get
the easy part: you need to decide what
classes Willie is to construct the
tree with. You can do it in any
language, but make sure you pick one,
or Willie will hand you assembly
language. If he's feeling ornery, it
will be for a processor that is no
longer manufactured in production.
You'd be amazed at how many candidates
boff this one.
I won't give away the answer, but a
Standard Bad Solution involves the use
of a switch or case statment (or just
good old-fashioned cascaded-ifs). A
Slightly Better Solution involves
using a table of function pointers,
and the Probably Best Solution
involves using polymorphism. I
encourage you to work through it
sometime. Fun stuff!
So, let's try to tackle the problem all three ways. How do you go from an arithmetic expression (e.g. in a string) such as "2 + (2)" to an expression tree using cascaded-if's, a table of function pointers, and/or polymorphism?
Feel free to tackle one, two, or all three.
[update: title modified to better match what most of the answers have been.]
Polymorphic Tree Walking, Python version
#!/usr/bin/python
class Node:
"""base class, you should not process one of these"""
def process(self):
raise('you should not be processing a node')
class BinaryNode(Node):
"""base class for binary nodes"""
def __init__(self, _left, _right):
self.left = _left
self.right = _right
def process(self):
raise('you should not be processing a binarynode')
class Plus(BinaryNode):
def process(self):
return self.left.process() + self.right.process()
class Minus(BinaryNode):
def process(self):
return self.left.process() - self.right.process()
class Mul(BinaryNode):
def process(self):
return self.left.process() * self.right.process()
class Div(BinaryNode):
def process(self):
return self.left.process() / self.right.process()
class Num(Node):
def __init__(self, _value):
self.value = _value
def process(self):
return self.value
def demo(n):
print n.process()
demo(Num(2)) # 2
demo(Plus(Num(2),Num(5))) # 2 + 3
demo(Plus(Mul(Num(2),Num(3)),Div(Num(10),Num(5)))) # (2 * 3) + (10 / 2)
The tests are just building up the binary trees by using constructors.
program structure:
abstract base class: Node
all Nodes inherit from this class
abstract base class: BinaryNode
all binary operators inherit from this class
process method does the work of evaluting the expression and returning the result
binary operator classes: Plus,Minus,Mul,Div
two child nodes, one each for left side and right side subexpressions
number class: Num
holds a leaf-node numeric value, e.g. 17 or 42
The problem, I think, is that we need to parse perentheses, and yet they are not a binary operator? Should we take (2) as a single token, that evaluates to 2?
The parens don't need to show up in the expression tree, but they do affect its shape. E.g., the tree for (1+2)+3 is different from 1+(2+3):
+
/ \
+ 3
/ \
1 2
versus
+
/ \
1 +
/ \
2 3
The parentheses are a "hint" to the parser (e.g., per superjoe30, to "recursively descend")
This gets into parsing/compiler theory, which is kind of a rabbit hole... The Dragon Book is the standard text for compiler construction, and takes this to extremes. In this particular case, you want to construct a context-free grammar for basic arithmetic, then use that grammar to parse out an abstract syntax tree. You can then iterate over the tree, reducing it from the bottom up (it's at this point you'd apply the polymorphism/function pointers/switch statement to reduce the tree).
I've found these notes to be incredibly helpful in compiler and parsing theory.
Representing the Nodes
If we want to include parentheses, we need 5 kinds of nodes:
the binary nodes: Add Minus Mul Divthese have two children, a left and right side
+
/ \
node node
a node to hold a value: Valno children nodes, just a numeric value
a node to keep track of the parens: Parena single child node for the subexpression
( )
|
node
For a polymorphic solution, we need to have this kind of class relationship:
Node
BinaryNode : inherit from Node
Plus : inherit from Binary Node
Minus : inherit from Binary Node
Mul : inherit from Binary Node
Div : inherit from Binary Node
Value : inherit from Node
Paren : inherit from node
There is a virtual function for all nodes called eval(). If you call that function, it will return the value of that subexpression.
String Tokenizer + LL(1) Parser will give you an expression tree... the polymorphism way might involve an abstract Arithmetic class with an "evaluate(a,b)" function, which is overridden for each of the operators involved (Addition, Subtraction etc) to return the appropriate value, and the tree contains Integers and Arithmetic operators, which can be evaluated by a post(?)-order traversal of the tree.
I won't give away the answer, but a
Standard Bad Solution involves the use
of a switch or case statment (or just
good old-fashioned cascaded-ifs). A
Slightly Better Solution involves
using a table of function pointers,
and the Probably Best Solution
involves using polymorphism.
The last twenty years of evolution in interpreters can be seen as going the other way - polymorphism (eg naive Smalltalk metacircular interpreters) to function pointers (naive lisp implementations, threaded code, C++) to switch (naive byte code interpreters), and then onwards to JITs and so on - which either require very big classes, or (in singly polymorphic languages) double-dispatch, which reduces the polymorphism to a type-case, and you're back at stage one. What definition of 'best' is in use here?
For simple stuff a polymorphic solution is OK - here's one I made earlier, but either stack and bytecode/switch or exploiting the runtime's compiler is usually better if you're, say, plotting a function with a few thousand data points.
Hm... I don't think you can write a top-down parser for this without backtracking, so it has to be some sort of a shift-reduce parser. LR(1) or even LALR will of course work just fine with the following (ad-hoc) language definition:
Start -> E1
E1 -> E1+E1 | E1-E1
E1 -> E2*E2 | E2/E2 | E2
E2 -> number | (E1)
Separating it out into E1 and E2 is necessary to maintain the precedence of * and / over + and -.
But this is how I would do it if I had to write the parser by hand:
Two stacks, one storing nodes of the tree as operands and one storing operators
Read the input left to right, make leaf nodes of the numbers and push them into the operand stack.
If you have >= 2 operands on the stack, pop 2, combine them with the topmost operator in the operator stack and push this structure back to the operand tree, unless
The next operator has higher precedence that the one currently on top of the stack.
This leaves us the problem of handling brackets. One elegant (I thought) solution is to store the precedence of each operator as a number in a variable. So initially,
int plus, minus = 1;
int mul, div = 2;
Now every time you see a a left bracket increment all these variables by 2, and every time you see a right bracket, decrement all the variables by 2.
This will ensure that the + in 3*(4+5) has higher precedence than the *, and 3*4 will not be pushed onto the stack. Instead it will wait for 5, push 4+5, then push 3*(4+5).
Re: Justin
I think the tree would look something like this:
+
/ \
2 ( )
|
2
Basically, you'd have an "eval" node, that just evaluates the tree below it. That would then be optimized out to just being:
+
/ \
2 2
In this case the parens aren't required and don't add anything. They don't add anything logically, so they'd just go away.
I think the question is about how to write a parser, not the evaluator. Or rather, how to create the expression tree from a string.
Case statements that return a base class don't exactly count.
The basic structure of a "polymorphic" solution (which is another way of saying, I don't care what you build this with, I just want to extend it with rewriting the least amount of code possible) is deserializing an object hierarchy from a stream with a (dynamic) set of known types.
The crux of the implementation of the polymorphic solution is to have a way to create an expression object from a pattern matcher, likely recursive. I.e., map a BNF or similar syntax to an object factory.
Or maybe this is the real question:
how can you represent (2) as a BST?
That is the part that is tripping me
up.
Recursion.
#Justin:
Look at my note on representing the nodes. If you use that scheme, then
2 + (2)
can be represented as
.
/ \
2 ( )
|
2
should use a functional language imo. Trees are harder to represent and manipulate in OO languages.
As people have been mentioning previously, when you use expression trees parens are not necessary. The order of operations becomes trivial and obvious when you're looking at an expression tree. The parens are hints to the parser.
While the accepted answer is the solution to one half of the problem, the other half - actually parsing the expression - is still unsolved. Typically, these sorts of problems can be solved using a recursive descent parser. Writing such a parser is often a fun exercise, but most modern tools for language parsing will abstract that away for you.
The parser is also significantly harder if you allow floating point numbers in your string. I had to create a DFA to accept floating point numbers in C -- it was a very painstaking and detailed task. Remember, valid floating points include: 10, 10., 10.123, 9.876e-5, 1.0f, .025, etc. I assume some dispensation from this (in favor of simplicty and brevity) was made in the interview.
I've written such a parser with some basic techniques like
Infix -> RPN and
Shunting Yard and
Tree Traversals.
Here is the implementation I've came up with.
It's written in C++ and compiles on both Linux and Windows.
Any suggestions/questions are welcomed.
So, let's try to tackle the problem all three ways. How do you go from an arithmetic expression (e.g. in a string) such as "2 + (2)" to an expression tree using cascaded-if's, a table of function pointers, and/or polymorphism?
This is interesting,but I don't think this belongs to the realm of object-oriented programming...I think it has more to do with parsing techniques.
I've kind of chucked this c# console app together as a bit of a proof of concept. Have a feeling it could be a lot better (that switch statement in GetNode is kind of clunky (it's there coz I hit a blank trying to map a class name to an operator)). Any suggestions on how it could be improved very welcome.
using System;
class Program
{
static void Main(string[] args)
{
string expression = "(((3.5 * 4.5) / (1 + 2)) + 5)";
Console.WriteLine(string.Format("{0} = {1}", expression, new Expression.ExpressionTree(expression).Value));
Console.WriteLine("\nShow's over folks, press a key to exit");
Console.ReadKey(false);
}
}
namespace Expression
{
// -------------------------------------------------------
abstract class NodeBase
{
public abstract double Value { get; }
}
// -------------------------------------------------------
class ValueNode : NodeBase
{
public ValueNode(double value)
{
_double = value;
}
private double _double;
public override double Value
{
get
{
return _double;
}
}
}
// -------------------------------------------------------
abstract class ExpressionNodeBase : NodeBase
{
protected NodeBase GetNode(string expression)
{
// Remove parenthesis
expression = RemoveParenthesis(expression);
// Is expression just a number?
double value = 0;
if (double.TryParse(expression, out value))
{
return new ValueNode(value);
}
else
{
int pos = ParseExpression(expression);
if (pos > 0)
{
string leftExpression = expression.Substring(0, pos - 1).Trim();
string rightExpression = expression.Substring(pos).Trim();
switch (expression.Substring(pos - 1, 1))
{
case "+":
return new Add(leftExpression, rightExpression);
case "-":
return new Subtract(leftExpression, rightExpression);
case "*":
return new Multiply(leftExpression, rightExpression);
case "/":
return new Divide(leftExpression, rightExpression);
default:
throw new Exception("Unknown operator");
}
}
else
{
throw new Exception("Unable to parse expression");
}
}
}
private string RemoveParenthesis(string expression)
{
if (expression.Contains("("))
{
expression = expression.Trim();
int level = 0;
int pos = 0;
foreach (char token in expression.ToCharArray())
{
pos++;
switch (token)
{
case '(':
level++;
break;
case ')':
level--;
break;
}
if (level == 0)
{
break;
}
}
if (level == 0 && pos == expression.Length)
{
expression = expression.Substring(1, expression.Length - 2);
expression = RemoveParenthesis(expression);
}
}
return expression;
}
private int ParseExpression(string expression)
{
int winningLevel = 0;
byte winningTokenWeight = 0;
int winningPos = 0;
int level = 0;
int pos = 0;
foreach (char token in expression.ToCharArray())
{
pos++;
switch (token)
{
case '(':
level++;
break;
case ')':
level--;
break;
}
if (level <= winningLevel)
{
if (OperatorWeight(token) > winningTokenWeight)
{
winningLevel = level;
winningTokenWeight = OperatorWeight(token);
winningPos = pos;
}
}
}
return winningPos;
}
private byte OperatorWeight(char value)
{
switch (value)
{
case '+':
case '-':
return 3;
case '*':
return 2;
case '/':
return 1;
default:
return 0;
}
}
}
// -------------------------------------------------------
class ExpressionTree : ExpressionNodeBase
{
protected NodeBase _rootNode;
public ExpressionTree(string expression)
{
_rootNode = GetNode(expression);
}
public override double Value
{
get
{
return _rootNode.Value;
}
}
}
// -------------------------------------------------------
abstract class OperatorNodeBase : ExpressionNodeBase
{
protected NodeBase _leftNode;
protected NodeBase _rightNode;
public OperatorNodeBase(string leftExpression, string rightExpression)
{
_leftNode = GetNode(leftExpression);
_rightNode = GetNode(rightExpression);
}
}
// -------------------------------------------------------
class Add : OperatorNodeBase
{
public Add(string leftExpression, string rightExpression)
: base(leftExpression, rightExpression)
{
}
public override double Value
{
get
{
return _leftNode.Value + _rightNode.Value;
}
}
}
// -------------------------------------------------------
class Subtract : OperatorNodeBase
{
public Subtract(string leftExpression, string rightExpression)
: base(leftExpression, rightExpression)
{
}
public override double Value
{
get
{
return _leftNode.Value - _rightNode.Value;
}
}
}
// -------------------------------------------------------
class Divide : OperatorNodeBase
{
public Divide(string leftExpression, string rightExpression)
: base(leftExpression, rightExpression)
{
}
public override double Value
{
get
{
return _leftNode.Value / _rightNode.Value;
}
}
}
// -------------------------------------------------------
class Multiply : OperatorNodeBase
{
public Multiply(string leftExpression, string rightExpression)
: base(leftExpression, rightExpression)
{
}
public override double Value
{
get
{
return _leftNode.Value * _rightNode.Value;
}
}
}
}
Ok, here is my naive implementation. Sorry, I did not feel to use objects for that one but it is easy to convert. I feel a bit like evil Willy (from Steve's story).
#!/usr/bin/env python
#tree structure [left argument, operator, right argument, priority level]
tree_root = [None, None, None, None]
#count of parethesis nesting
parenthesis_level = 0
#current node with empty right argument
current_node = tree_root
#indices in tree_root nodes Left, Operator, Right, PRiority
L, O, R, PR = 0, 1, 2, 3
#functions that realise operators
def sum(a, b):
return a + b
def diff(a, b):
return a - b
def mul(a, b):
return a * b
def div(a, b):
return a / b
#tree evaluator
def process_node(n):
try:
len(n)
except TypeError:
return n
left = process_node(n[L])
right = process_node(n[R])
return n[O](left, right)
#mapping operators to relevant functions
o2f = {'+': sum, '-': diff, '*': mul, '/': div, '(': None, ')': None}
#converts token to a node in tree
def convert_token(t):
global current_node, tree_root, parenthesis_level
if t == '(':
parenthesis_level += 2
return
if t == ')':
parenthesis_level -= 2
return
try: #assumption that we have just an integer
l = int(t)
except (ValueError, TypeError):
pass #if not, no problem
else:
if tree_root[L] is None: #if it is first number, put it on the left of root node
tree_root[L] = l
else: #put on the right of current_node
current_node[R] = l
return
priority = (1 if t in '+-' else 2) + parenthesis_level
#if tree_root does not have operator put it there
if tree_root[O] is None and t in o2f:
tree_root[O] = o2f[t]
tree_root[PR] = priority
return
#if new node has less or equals priority, put it on the top of tree
if tree_root[PR] >= priority:
temp = [tree_root, o2f[t], None, priority]
tree_root = current_node = temp
return
#starting from root search for a place with higher priority in hierarchy
current_node = tree_root
while type(current_node[R]) != type(1) and priority > current_node[R][PR]:
current_node = current_node[R]
#insert new node
temp = [current_node[R], o2f[t], None, priority]
current_node[R] = temp
current_node = temp
def parse(e):
token = ''
for c in e:
if c <= '9' and c >='0':
token += c
continue
if c == ' ':
if token != '':
convert_token(token)
token = ''
continue
if c in o2f:
if token != '':
convert_token(token)
convert_token(c)
token = ''
continue
print "Unrecognized character:", c
if token != '':
convert_token(token)
def main():
parse('(((3 * 4) / (1 + 2)) + 5)')
print tree_root
print process_node(tree_root)
if __name__ == '__main__':
main()