In ANTLR how should I implement a Visitor with state / context? - antlr

I'm trying to write a simple ANTLR parser to handle date adjustment, so for example I could write: +1w+1d to mean "traverse a week and a day", or MIN(+30d, +1m) to mean "30 days from the input date, or 1 month, whichever is sooner". These rules should be composable, so MIN(+30d, +1m)+1d means "the day after (30 days from the input date, or 1 month from the input date, whichever is sooner)" and +1dMIN(+30d, +1m) means "30 days from (the day after the input date) or 1 month after (the day after the input date) whichever is sooner".
[I appreciate the examples here are relatively trite - the real grammar needs to understand about weekends, holidays, month boundaries etc, so it might be things like "one month after(the end of the input date's month or the Friday after the input date - whichever comes first)" etc etc]
The code I want to write is:
DateAdjustParser parser = buildFromString("MAX(+30d,+1m)");
ParseTree tree = parser.rootNode();
return new MyVisitor().visit(tree, LocalDate.of(2020,4,23)); //Not allowed extra parameters here.
The problem is how exactly can I pass the "context Date"? I can't store it in the MyVisitor class as a member since the visit() call is recursive and that would overwrite the context. I could build up a parallel set of objects that did have the right methods, but that seems a lot of boilerplate.
Is there an ANTLR solution?
More details:
This is the Visitor I'd like to write:
public class MyVisitor extends DateAdjustBaseVisitor<LocalDate> {
    @Override
    public LocalDate visitOffsetRule(DateAdjustParser.OffsetRuleContext ctx) {
        LocalDate contextDate = ???; //
        // Note: ChronoUnit.valueOf expects names such as "DAYS", so the single-letter unit still needs mapping
        return contextDate.plus(Integer.valueOf(ctx.num.getText()), ChronoUnit.valueOf(ctx.unit.getText()));
    }
    @Override
    public LocalDate visitMinMaxRule(DateAdjustParser.MinMaxRuleContext ctx) {
        LocalDate contextDate = ???; //
        LocalDate left = this.visitChildren(ctx.left, contextDate);
        LocalDate right = this.visitChildren(ctx.right, contextDate);
        if (ctx.type.getText().equals("MIN")) {
            // MIN: whichever date is sooner
            return left.compareTo(right) < 0 ? left : right;
        } else {
            // MAX: whichever date is later
            return left.compareTo(right) > 0 ? left : right;
        }
    }
}
here's my grammar:
grammar DateAdjust;
rootNode: offset+;
offset
: num=NUMBER unit=UNIT #OffsetRule
| type=( MIN | MAX ) '(' left=offset ',' right=offset ')' #MinMaxRule
;
UNIT: [dwmy]; //Days Weeks Months Years
NUMBER: [+-]?[0-9]+;
MAX: 'MAX';
MIN: 'MIN';

Not an Antlr-specific solution, but a typical DSL solution is to use a scoped state table (aka symbol table) to accumulate the results of an AST/CST walk.
See this answer for an implementation. Another exists in the Antlr repository.
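A minimal sketch of the scoped-state idea, written in Python rather than Java for brevity. The node classes (`Offset`, `MinMax`), the `Evaluator` class and the restriction to day and week units are assumptions made for illustration; none of this is ANTLR API. The evaluator keeps a stack of context dates: each MIN/MAX branch is evaluated in its own scope against the date on top of the stack, so the recursion never overwrites the caller's context.

```python
from dataclasses import dataclass
from datetime import date, timedelta

DAYS_PER_UNIT = {"d": 1, "w": 7}  # months and years are left out of this sketch


@dataclass
class Offset:
    amount: int
    unit: str


@dataclass
class MinMax:
    kind: str  # "MIN" or "MAX"
    left: object
    right: object


class Evaluator:
    def __init__(self, start):
        self._context = [start]  # stack of context dates, innermost scope last

    def evaluate(self, rules):
        for rule in rules:
            # each top-level rule starts from the result of the previous one
            self._context[-1] = self.visit(rule)
        return self._context[-1]

    def visit(self, node):
        return getattr(self, "visit_" + type(node).__name__)(node)

    def visit_Offset(self, node):
        days = node.amount * DAYS_PER_UNIT[node.unit]
        return self._context[-1] + timedelta(days=days)

    def visit_MinMax(self, node):
        results = []
        for child in (node.left, node.right):
            self._context.append(self._context[-1])  # open a scope for the branch
            results.append(self.visit(child))
            self._context.pop()                      # discard the branch's scope
        return min(results) if node.kind == "MIN" else max(results)
```

With this shape, `Evaluator(date(2020, 4, 23)).evaluate([Offset(1, "d"), MinMax("MIN", Offset(30, "d"), Offset(5, "w"))])` corresponds to `+1dMIN(+30d,+5w)`. In a generated ANTLR visitor the same stack would live as a field of the visitor and be pushed and popped inside the `visit...` methods.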

Related

Getting Time range between non intersecting ranges

I have the following timelines :
7 a.m. --------------------- 12 p.m.    2 p.m. .................. 10 p.m.
            10-------11                          3------5
             closed                               closed
The output should be the non-intersecting time ranges:
7-10 a.m., 11 a.m.-12 p.m., 2-3 p.m., 5-10 p.m.
I tried the minus and subtract methods for ranges, but they didn't work.
A tricky part could be the following case
7 a.m. --------------------- 12 p.m.    2 p.m. .................. 10 p.m.
            10----------------------------------------5
                             closed
The output should be the non-intersecting time ranges:
7-10 a.m., 5-10 p.m.
Any idea for a Kotlin implementation?
Sounds like a pretty common case and I suspect there are some existing algorithms for it, but nothing comes to mind off the top of my head.
My idea is to first transform both lists of ranges into a single list of opening/closing "events", ordered by time. The start of an opening range increases the "openness" by 1, while its end decreases it by 1. The start of a closing range also decreases the "openness", while its end increases it. Then we iterate over the events in time order, keeping track of the current "openness" level. Whenever the "openness" level is 1, we are in the middle of an opening range but not inside a closing range, so we are entirely open.
Assuming both lists of ranges are initially properly ordered, as in your example, I believe it should be doable in linear time and even without this intermediary list of events. However, such an implementation would be pretty complicated if it had to cover all possible states, so I decided to go with a simpler solution, which I believe is O(n * log(n)). Also, this implementation requires that opening ranges do not overlap with each other, and the same for closing ranges:
fun main() {
    // your first example
    println(listOf(Range(7, 12), Range(14, 22)) - listOf(Range(10, 11), Range(15, 17)))
    // second example
    println(listOf(Range(7, 12), Range(14, 22)) - listOf(Range(10, 17)))
    // two closed ranges "touch" each other
    println(listOf(Range(8, 16)) - listOf(Range(10, 11), Range(11, 13)))
    // both open and closed range start at the same time
    println(listOf(Range(8, 16)) - listOf(Range(8, 12)))
}
data class Range(val start: Int, val end: Int)
operator fun List<Range>.minus(other: List<Range>): List<Range> {
    // key is the time, value is the change of "openness" at this time
    val events = sortedMapOf<Int, Int>()
    forEach { (start, end) ->
        events.merge(start, 1, Int::plus)
        events.merge(end, -1, Int::plus)
    }
    other.forEach { (start, end) ->
        events.merge(start, -1, Int::plus)
        events.merge(end, 1, Int::plus)
    }
    val result = mutableListOf<Range>()
    var currOpeness = 0
    var currStart = 0
    for ((time, change) in events) {
        // we were open and now closing
        if (currOpeness == 1 && change < 0) {
            result += Range(currStart, time)
        }
        currOpeness += change
        // we were closed and now opening
        if (currOpeness == 1 && change > 0) {
            currStart = time
        }
    }
    return result
}
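For comparison, the linear-time variant mentioned above can be sketched with two pointers. This is a Python sketch rather than Kotlin, and it is not the answer's implementation: `subtract_ranges` and the `(start, end)` tuple representation are names chosen for illustration. It assumes both lists are sorted by start time and that ranges within each list do not overlap.

```python
def subtract_ranges(opens, closes):
    """Return the parts of `opens` that are not covered by any range in `closes`."""
    result = []
    first = 0  # index of the first closed range that can still affect later open ranges
    for start, end in opens:
        cursor = start  # everything before `cursor` in this open range is already handled
        # closed ranges ending before this open range starts can never matter again
        while first < len(closes) and closes[first][1] <= start:
            first += 1
        k = first
        while k < len(closes) and closes[k][0] < end:
            c_start, c_end = closes[k]
            if c_start > cursor:
                result.append((cursor, c_start))  # open gap before this closed range
            cursor = max(cursor, c_end)
            k += 1
        if cursor < end:
            result.append((cursor, end))  # open tail after the last closed range
    return result
```

A closed range that spans several open ranges is revisited once per open range it touches, but under the non-overlap assumption at most one closed range can straddle the end of a given open range, so the total work stays proportional to the combined length of the two lists.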

Kotlin: How to utilize Kotlin data structure to write efficient solutions during interview

I got this question in an interview and flopped on the last part, the Kotlin. I am trying to recreate fun taskCompleted() to see how I could implement it, just to play with it. This is what I was given:
data class Article(val title: String, val content: String, val duration: Int, val date: Int)
data class StreakInfo(val goalsMet: Int, val breakDown: List<Boolean>)
interface ReadingGoalService{
fun getDailyReadingGoal(): Int
}
interface TaskService{
fun allTasks(): List<Article>
}
class MockInterview{
fun taskCompleted(){}
//temporary
private fun today(): Int{
return 1
}
}
fun getAllSteaks(streakInfo: StreakInfo){
//don't implement
}
The ask was to implement fun taskCompleted(), where I was expected to do the following:
//TODO 1 current day is today
//TODO 2 take all the articles that were read on the current day
//TODO 3 sum readingDurations of articles for current Day
//TODO 4 check if sum of readingDurations is >= than reading goal
// TODO 5 if sum of readingDuration is >= goal repeat 2..3.. and 4. for currentDay-1
//TODO 6 if sum of readingDuration is < goal - stop
Any ideas how I can do this in Kotlin? I plan on using this as a learning process.
As I mentioned, I highly suggest you take a look at the Kotlin Koans since they walk you through many examples that I think would certainly teach you how to perform almost all of those TODO's! (and they have the solutions!)
The interview was testing whether you knew how to use the aggregate and mapping functions from the Kotlin Standard Library, and whether you would see that a lot of the code could be reused by turning it into functions with the right parameters.
Here's how I would approach it:
fun today() = 1 // TODO 1: Current day is today
fun tasksDoneByDay(taskService: TaskService, day: Int = today()): List<Article> {
    // TODO 2: Here I have all tasks that were done at "today()"
    return taskService.allTasks().filter { task -> task.date == day }
}
// TODO 3
fun durationsForDay(taskService: TaskService, day: Int = today()) = tasksDoneByDay(taskService, day).sumBy { it.duration }
// TODO 4
fun durationsForDayMeetGoal(taskService: TaskService, goal: Int, day: Int = today()) = durationsForDay(taskService, day) >= goal
// TODO 5 if sum of readingDuration is >= goal repeat 2..3.. and 4. for currentDay-1
// (the explicit Boolean return type is required: a block-body function without one returns Unit)
fun todo5(taskService: TaskService, goal: Int): Boolean {
    val today = today()
    return when (val durationsForTodayMeetGoal = durationsForDayMeetGoal(taskService, goal, today)) {
        false -> durationsForDayMeetGoal(taskService, goal, today - 1)
        true -> durationsForTodayMeetGoal
    }
}
// TODO 6 I don't understand...
// essentially the same as todo five but when it's false don't do anything?
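Read together, TODOs 2 to 6 describe a streak count: walk backwards from today for as long as each day's total reading duration meets the goal, and stop at the first day that falls short. A minimal sketch of that loop in Python (the thread's code is Kotlin; `count_streak` and the `Article` tuple are hypothetical names that mirror the data class from the question):

```python
from collections import namedtuple

# Mirrors the Article data class from the question.
Article = namedtuple("Article", "title content duration date")


def count_streak(articles, goal, today):
    """Count consecutive days, ending at `today`, whose total duration meets `goal`."""
    if goal <= 0:
        raise ValueError("goal must be positive, otherwise the streak never ends")
    total_by_day = {}
    for article in articles:
        total_by_day[article.date] = total_by_day.get(article.date, 0) + article.duration
    streak = 0
    day = today
    while total_by_day.get(day, 0) >= goal:  # TODO 4 and 5: goal met, go back one day
        streak += 1
        day -= 1
    return streak                            # TODO 6: goal missed, stop
```

Grouping the durations by day once avoids filtering the full article list again for every day of the streak.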

Period of weeks to string with weeks instead of days only

var period = Period.ofWeeks(2)
println("period of two weeks: $period")
gives
period of two weeks: P14D
Unfortunately for my purpose I need P2W as output, so directly the weeks instead of weeks converted to days. Is there any elegant way to do this, besides building my Period string manually?
Your observation is true. java.time.Period does not really conserve the week value but automatically converts it to days - already in the factory method during construction. Reason is that a Period only has days and months/years as inner state.
Possible workarounds or alternatives in the order of increasing complexity and count of features:
You write your own class implementing the interface java.time.temporal.TemporalAmount with weeks as inner state.
You use the small library threeten-extra which offers the class Weeks. But be aware of the odd style of printing negative durations like P-2W instead of -P2W. Example:
Weeks w1 = Weeks.of(2);
Weeks w2 = Weeks.parse("P2W");
System.out.println(w1.toString()); // P2W
System.out.println(w2.toString()); // P2W
Or you use my library Time4J which offers the class net.time4j.Duration. This class does not implement the interface TemporalAmount but offers a conversion method named toTemporalAmount(). Various normalization features and formatting (ISO, via patterns and via net.time4j.PrettyTime) and parsing (ISO and via patterns) capabilities are offered, too. Example:
Duration<CalendarUnit> d1 = Duration.of(2, CalendarUnit.WEEKS);
Duration<IsoDateUnit> d2 = Duration.parseWeekBasedPeriod("P2W"); // also for week-based-years
System.out.println(d1.toString()); // P2W
System.out.println(PrettyTime.of(Locale.US).print(d2)); // 2 weeks
As an extra, the same library also offers the class net.time4j.range.Weeks as a simplified week-only duration.
The Period toString method only handles days, months and years, as you can see below in the toString() method from class java.time.Period.
So unfortunately I think you need to create it yourself.
/**
 * Outputs this period as a {@code String}, such as {@code P6Y3M1D}.
 * <p>
 * The output will be in the ISO-8601 period format.
 * A zero period will be represented as zero days, 'P0D'.
 *
 * @return a string representation of this period, not null
 */
@Override
public String toString() {
    if (this == ZERO) {
        return "P0D";
    } else {
        StringBuilder buf = new StringBuilder();
        buf.append('P');
        if (years != 0) {
            buf.append(years).append('Y');
        }
        if (months != 0) {
            buf.append(months).append('M');
        }
        if (days != 0) {
            buf.append(days).append('D');
        }
        return buf.toString();
    }
}
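If you do end up building the string yourself, one possible shape is to follow the same branching as the toString() method above and add a single extra branch for weeks. The sketch below is in Python for brevity; `format_period` is a hypothetical helper, not part of java.time, and the rule "emit weeks only when there are no years or months and the day count is a multiple of 7" is an assumption about what output is wanted.

```python
def format_period(years=0, months=0, days=0):
    """Format a date-based period as an ISO-8601-style string, preferring whole weeks."""
    if years == 0 and months == 0 and days == 0:
        return "P0D"
    if years == 0 and months == 0 and days % 7 == 0:
        return "P%dW" % (days // 7)  # days only and a whole number of weeks
    parts = ["P"]
    if years != 0:
        parts.append("%dY" % years)
    if months != 0:
        parts.append("%dM" % months)
    if days != 0:
        parts.append("%dD" % days)
    return "".join(parts)
```

Weeks are never mixed with years or months here, so a period of one month and 14 days still prints as `P1M14D`.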

OptaPlanner- TSPTW minimizing total time

I am using OptaPlanner to solve what is effectively the Traveling Salesman Problem with Time Windows (TSPTW). I have a working initial solution based on the OptaPlanner provided VRPTW example.
I am now trying to address my requirements that deviate from the standard TSPTW, which are:
I am trying to minimize the total time spent rather than the total distance traveled. Because of this, idle time counts against me.
In addition to the standard time-windowed visits, I must also support no-later-than (NLT) visits (i.e. don't visit after X time) and no-earlier-than (NET) visits (i.e. don't visit before X time).
My current solution always sets the first visit's arrival time to that visit's start time. This has the following problems with respect to my requirements:
This can introduce unnecessary idle time that could be avoided if the visit was arrived at sometime later in its time window.
The behavior with NLT is problematic. If I define an NLT with the start time set to Long.MIN_VALUE (to represent that it is unbounded without resorting to nulls) then that is the time the NLT visit is arrived at (the same problem as #1). I tried addressing this by setting the start time to the NLT time. This resulted in arriving just in time for the NLT visit but overshooting the time windows of subsequent visits.
How should I address this/these problems? I suspect a solution will involve ArrivalTimeUpdatingVariableListener but I don't know what that solution should look like.
In case it's relevant, I've pasted in my current scoring rules below. One thing to note is that "distance" is really travel time. Also, for domain reasons, I am encouraging NLT and NET arrival times to be close to the cutoff time (end time for NLT, start time for NET).
import org.optaplanner.core.api.score.buildin.hardsoftlong.HardSoftLongScoreHolder;
global HardSoftLongScoreHolder scoreHolder;
// Hard Constraints
rule "ArrivalAfterWindowEnd"
when
Visit(arrivalTime > maxStartTime, $arrivalTime : arrivalTime, $maxStartTime : maxStartTime)
then
scoreHolder.addHardConstraintMatch(kcontext, $maxStartTime - $arrivalTime);
end
// Soft Constraints
rule "MinimizeDistanceToPreviousEvent"
when
Visit(previousRouteEvent != null, $distanceFromPreviousRouteEvent : distanceFromPreviousRouteEvent)
then
scoreHolder.addSoftConstraintMatch(kcontext, -$distanceFromPreviousRouteEvent);
end
rule "MinimizeDistanceFromLastEventToHome"
when
$visit : Visit(previousRouteEvent != null)
not Visit(previousRouteEvent == $visit)
$home : Home()
then
scoreHolder.addSoftConstraintMatch(kcontext, -$visit.getDistanceTo($home));
end
rule "MinimizeIdle"
when
Visit(scheduleType != ScheduleType.NLT, arrivalTime < minStartTime, $minStartTime : minStartTime, $arrivalTime : arrivalTime)
then
scoreHolder.addSoftConstraintMatch(kcontext, $arrivalTime - $minStartTime);
end
rule "PreferLatestNLT"
when
Visit(scheduleType == ScheduleType.NLT, arrivalTime < maxStartTime, $maxStartTime : maxStartTime, $arrivalTime : arrivalTime)
then
scoreHolder.addSoftConstraintMatch(kcontext, $arrivalTime - $maxStartTime);
end
rule "PreferEarliestNET"
when
Visit(scheduleType == ScheduleType.NET, arrivalTime > minStartTime, $minStartTime : minStartTime, $arrivalTime : arrivalTime)
then
scoreHolder.addSoftConstraintMatch(kcontext, $minStartTime - $arrivalTime);
end
To see an example that uses real road times instead of road distances: In the examples app, open Vehicle Routing, click button Import, load the file roaddistance/capacitated/belgium-road-time-n50-k10.vrp. Those times were calculated with GraphHopper.
To see an example that uses Time Windows, open the Vehicle Routing and quick open a dataset that is called cvrptw (tw stands for Time Windows). If you look at the academic spec (linked from docs chapter 3 IIRC) for CVRPTW, you'll see it already has a hard constraint "Do not arrive after time window closes" - so you'll see that one in score rules drl. As for arriving too early (and therefore losing the idle time): copy paste that hard constraint, make it a soft, make it use readyTime instead of dueTime and reverse its comparison and penalty calculation. I actually originally implemented that (as it's the logical thing to have), but because I followed the academic spec (to compare with results of the academics) I had to remove it.
I was able to solve my problem by modifying ArrivalTimeUpdatingVariableListener's updateArrivalTime method to reach backwards and (attempt to) shift the previous arrival time. Additionally, I introduced a getPreferredStartTime() method to support NLT events defaulting to as late as possible. Finally, just for code cleanliness, I moved the updateArrivalTime method from ArrivalTimeUpdatingVariableListener into the Visit class.
Here is the relevant code from the Visit class:
public long getPreferredStartTime()
{
switch(scheduleType)
{
case NLT:
return getMaxStartTime();
default:
return getMinStartTime();
}
}
public Long getStartTime()
{
Long arrivalTime = getArrivalTime();
if (arrivalTime == null)
{
return null;
}
switch(scheduleType)
{
case NLT:
return arrivalTime;
default:
return Math.max(arrivalTime, getMinStartTime());
}
}
public Long getEndTime()
{
Long startTime = getStartTime();
if (startTime == null)
{
return null;
}
return startTime + duration;
}
public void updateArrivalTime(ScoreDirector scoreDirector)
{
if(previousRouteEvent instanceof Visit)
{
updateArrivalTime(scoreDirector, (Visit)previousRouteEvent);
return;
}
long arrivalTime = getPreferredStartTime();
if(Utilities.equal(this.arrivalTime, arrivalTime))
{
return;
}
setArrivalTime(scoreDirector, arrivalTime);
}
private void updateArrivalTime(ScoreDirector scoreDirector, Visit previousVisit)
{
long departureTime = previousVisit.getEndTime();
long arrivalTime = departureTime + getDistanceFromPreviousRouteEvent();
if(Utilities.equal(this.arrivalTime, arrivalTime))
{
return;
}
if(arrivalTime > maxStartTime)
{
if(previousVisit.shiftTimeLeft(scoreDirector, arrivalTime - maxStartTime))
{
return;
}
}
else if(arrivalTime < minStartTime)
{
if(previousVisit.shiftTimeRight(scoreDirector, minStartTime - arrivalTime))
{
return;
}
}
setArrivalTime(scoreDirector, arrivalTime);
}
/**
* Set the arrival time and propagate the change to any following entities.
*/
private void setArrivalTime(ScoreDirector scoreDirector, long arrivalTime)
{
scoreDirector.beforeVariableChanged(this, "arrivalTime");
this.arrivalTime = arrivalTime;
scoreDirector.afterVariableChanged(this, "arrivalTime");
Visit nextEntity = getNextVisit();
if(nextEntity != null)
{
nextEntity.updateArrivalTime(scoreDirector, this);
}
}
/**
* Attempt to shift the arrival time backward by the specified amount.
* @param requested The amount of time that should be subtracted from the arrival time.
* @return Returns true if the arrival time was changed.
*/
private boolean shiftTimeLeft(ScoreDirector scoreDirector, long requested)
{
long available = arrivalTime - minStartTime;
if(available <= 0)
{
return false;
}
requested = Math.min(requested, available);
if(previousRouteEvent instanceof Visit)
{
//Arrival time is inflexible as this is not the first event. Forward to previous event.
return ((Visit)previousRouteEvent).shiftTimeLeft(scoreDirector, requested);
}
setArrivalTime(scoreDirector, arrivalTime - requested);
return true;
}
/**
* Attempt to shift the arrival time forward by the specified amount.
* @param requested The amount of time that should be added to the arrival time.
* @return Returns true if the arrival time was changed.
*/
private boolean shiftTimeRight(ScoreDirector scoreDirector, long requested)
{
long available = maxStartTime - arrivalTime;
if(available <= 0)
{
return false;
}
requested = Math.min(requested, available);
if(previousRouteEvent instanceof Visit)
{
//Arrival time is inflexible as this is not the first event. Forward to previous event.
//Note, we could start later anyways but that won't decrease idle time, which is the purpose of shifting right
return ((Visit)previousRouteEvent).shiftTimeRight(scoreDirector, requested);
}
setArrivalTime(scoreDirector, arrivalTime + requested);
return true;
}

Expression Evaluation and Tree Walking using polymorphism? (ala Steve Yegge)

This morning, I was reading Steve Yegge's: When Polymorphism Fails, when I came across a question that a co-worker of his used to ask potential employees when they came for their interview at Amazon.
As an example of polymorphism in action, let's look at the classic "eval" interview question, which (as far as I know) was brought to Amazon by Ron Braunstein. The question is quite a rich one, as it manages to probe a wide variety of important skills: OOP design, recursion, binary trees, polymorphism and runtime typing, general coding skills, and (if you want to make it extra hard) parsing theory.
At some point, the candidate hopefully realizes that you can represent an arithmetic expression as a binary tree, assuming you're only using binary operators such as "+", "-", "*", "/". The leaf nodes are all numbers, and the internal nodes are all operators. Evaluating the expression means walking the tree. If the candidate doesn't realize this, you can gently lead them to it, or if necessary, just tell them.
Even if you tell them, it's still an interesting problem.
The first half of the question, which some people (whose names I will protect to my dying breath, but their initials are Willie Lewis) feel is a Job Requirement If You Want To Call Yourself A Developer And Work At Amazon, is actually kinda hard. The question is: how do you go from an arithmetic expression (e.g. in a string) such as "2 + (2)" to an expression tree. We may have an ADJ challenge on this question at some point.
The second half is: let's say this is a 2-person project, and your partner, who we'll call "Willie", is responsible for transforming the string expression into a tree. You get the easy part: you need to decide what classes Willie is to construct the tree with. You can do it in any language, but make sure you pick one, or Willie will hand you assembly language. If he's feeling ornery, it will be for a processor that is no longer manufactured in production.
You'd be amazed at how many candidates boff this one.
I won't give away the answer, but a Standard Bad Solution involves the use of a switch or case statement (or just good old-fashioned cascaded-ifs). A Slightly Better Solution involves using a table of function pointers, and the Probably Best Solution involves using polymorphism. I encourage you to work through it sometime. Fun stuff!
So, let's try to tackle the problem all three ways. How do you go from an arithmetic expression (e.g. in a string) such as "2 + (2)" to an expression tree using cascaded-if's, a table of function pointers, and/or polymorphism?
Feel free to tackle one, two, or all three.
[update: title modified to better match what most of the answers have been.]
Polymorphic Tree Walking, Python version
#!/usr/bin/python
class Node:
    """base class, you should not process one of these"""
    def process(self):
        raise NotImplementedError('you should not be processing a node')
class BinaryNode(Node):
    """base class for binary nodes"""
    def __init__(self, _left, _right):
        self.left = _left
        self.right = _right
    def process(self):
        raise NotImplementedError('you should not be processing a binarynode')
class Plus(BinaryNode):
    def process(self):
        return self.left.process() + self.right.process()
class Minus(BinaryNode):
    def process(self):
        return self.left.process() - self.right.process()
class Mul(BinaryNode):
    def process(self):
        return self.left.process() * self.right.process()
class Div(BinaryNode):
    def process(self):
        return self.left.process() / self.right.process()
class Num(Node):
    def __init__(self, _value):
        self.value = _value
    def process(self):
        return self.value
def demo(n):
    print(n.process())
demo(Num(2)) # 2
demo(Plus(Num(2),Num(5))) # 2 + 5
demo(Plus(Mul(Num(2),Num(3)),Div(Num(10),Num(5)))) # (2 * 3) + (10 / 5)
The tests are just building up the binary trees by using constructors.
program structure:
abstract base class: Node
    all Nodes inherit from this class
abstract base class: BinaryNode
    all binary operators inherit from this class
    process method does the work of evaluating the expression and returning the result
binary operator classes: Plus, Minus, Mul, Div
    two child nodes, one each for left side and right side subexpressions
number class: Num
    holds a leaf-node numeric value, e.g. 17 or 42
The problem, I think, is that we need to parse parentheses, and yet they are not a binary operator? Should we take (2) as a single token that evaluates to 2?
The parens don't need to show up in the expression tree, but they do affect its shape. E.g., the tree for (1+2)+3 is different from 1+(2+3):
    +
   / \
  +   3
 / \
1   2
versus
  +
 / \
1   +
   / \
  2   3
The parentheses are a "hint" to the parser (e.g., per superjoe30, to "recursively descend")
This gets into parsing/compiler theory, which is kind of a rabbit hole... The Dragon Book is the standard text for compiler construction, and takes this to extremes. In this particular case, you want to construct a context-free grammar for basic arithmetic, then use that grammar to parse out an abstract syntax tree. You can then iterate over the tree, reducing it from the bottom up (it's at this point you'd apply the polymorphism/function pointers/switch statement to reduce the tree).
I've found these notes to be incredibly helpful in compiler and parsing theory.
Representing the Nodes
If we want to include parentheses, we need 5 kinds of nodes:
the binary nodes: Add, Minus, Mul, Div. These have two children, a left and a right side.
     +
    / \
node   node
a node to hold a value: Val. It has no child nodes, just a numeric value.
a node to keep track of the parens: Paren. It has a single child node for the subexpression.
( )
 |
node
For a polymorphic solution, we need to have this kind of class relationship:
Node
    BinaryNode : inherit from Node
        Plus : inherit from BinaryNode
        Minus : inherit from BinaryNode
        Mul : inherit from BinaryNode
        Div : inherit from BinaryNode
    Value : inherit from Node
    Paren : inherit from Node
There is a virtual function for all nodes called eval(). If you call that function, it will return the value of that subexpression.
String Tokenizer + LL(1) Parser will give you an expression tree... the polymorphism way might involve an abstract Arithmetic class with an "evaluate(a,b)" function, which is overridden for each of the operators involved (Addition, Subtraction etc) to return the appropriate value, and the tree contains Integers and Arithmetic operators, which can be evaluated by a post(?)-order traversal of the tree.
I won't give away the answer, but a Standard Bad Solution involves the use of a switch or case statement (or just good old-fashioned cascaded-ifs). A Slightly Better Solution involves using a table of function pointers, and the Probably Best Solution involves using polymorphism.
The last twenty years of evolution in interpreters can be seen as going the other way - polymorphism (eg naive Smalltalk metacircular interpreters) to function pointers (naive lisp implementations, threaded code, C++) to switch (naive byte code interpreters), and then onwards to JITs and so on - which either require very big classes, or (in singly polymorphic languages) double-dispatch, which reduces the polymorphism to a type-case, and you're back at stage one. What definition of 'best' is in use here?
For simple stuff a polymorphic solution is OK - here's one I made earlier, but either stack and bytecode/switch or exploiting the runtime's compiler is usually better if you're, say, plotting a function with a few thousand data points.
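To make the "table of function pointers" variant concrete, here is a minimal Python sketch. The tuple-based tree layout `(symbol, left, right)` and the names `OPERATIONS` and `evaluate` are assumptions for illustration, not code from any of the answers. The operator symbol is looked up in a dictionary of functions, so adding an operator means adding a table entry rather than a new class or a new case branch.

```python
import operator

# Dispatch table: operator symbol -> function of two arguments.
OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def evaluate(node):
    """Evaluate a tree whose leaves are numbers and whose inner nodes are (symbol, left, right)."""
    if isinstance(node, (int, float)):
        return node
    symbol, left, right = node
    return OPERATIONS[symbol](evaluate(left), evaluate(right))
```

For example, `evaluate(("+", ("*", 2, 3), ("/", 10, 5)))` walks the tree for (2 * 3) + (10 / 5).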
Hm... I don't think you can write a top-down parser for this without backtracking, so it has to be some sort of a shift-reduce parser. LR(1) or even LALR will of course work just fine with the following (ad-hoc) language definition:
Start -> E1
E1 -> E1+E1 | E1-E1
E1 -> E2*E2 | E2/E2 | E2
E2 -> number | (E1)
Separating it out into E1 and E2 is necessary to maintain the precedence of * and / over + and -.
But this is how I would do it if I had to write the parser by hand:
Two stacks, one storing nodes of the tree as operands and one storing operators
Read the input left to right, make leaf nodes of the numbers and push them into the operand stack.
If you have >= 2 operands on the stack, pop 2, combine them with the topmost operator in the operator stack and push this structure back onto the operand stack, unless
The next operator has higher precedence than the one currently on top of the stack.
This leaves us the problem of handling brackets. One elegant (I thought) solution is to store the precedence of each operator as a number in a variable. So initially,
int plus = 1, minus = 1;
int mul = 2, div = 2;
Now every time you see a left bracket, increment all these variables by 2, and every time you see a right bracket, decrement all the variables by 2.
This will ensure that the + in 3*(4+5) has higher precedence than the *, and 3*4 will not be pushed onto the stack. Instead it will wait for 5, push 4+5, then push 3*(4+5).
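A minimal Python sketch of this two-stack idea, as a variation on the steps above rather than a literal transcription: instead of mutating four precedence variables, it keeps one `bump` value that is added to the base precedence whenever an operator is read. The names `BASE`, `parse` and the tuple tree layout are assumptions, and malformed input (unbalanced brackets, missing operands) is not checked.

```python
import re

BASE = {"+": 1, "-": 1, "*": 2, "/": 2}  # base precedence of each operator


def parse(text):
    """Build a tree of (symbol, left, right) tuples with float leaves."""
    operands = []    # stack of subtrees
    operators = []   # stack of (symbol, effective precedence)
    bump = 0         # grows by 2 for every currently open bracket

    def reduce():
        symbol, _ = operators.pop()
        right = operands.pop()
        left = operands.pop()
        operands.append((symbol, left, right))

    for token in re.findall(r"\d+(?:\.\d+)?|[-+*/()]", text):
        if token == "(":
            bump += 2
        elif token == ")":
            bump -= 2
        elif token in BASE:
            precedence = BASE[token] + bump
            # combine everything on the stack that binds at least as tightly
            while operators and operators[-1][1] >= precedence:
                reduce()
            operators.append((token, precedence))
        else:
            operands.append(float(token))
    while operators:
        reduce()
    return operands[0]
```

Reducing while the stacked operator's precedence is greater than or equal to the incoming one keeps operators of equal precedence left-associative, so `1-2-3` becomes `(1-2)-3`.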
Re: Justin
I think the tree would look something like this:
  +
 / \
2  ( )
    |
    2
Basically, you'd have an "eval" node, that just evaluates the tree below it. That would then be optimized out to just being:
  +
 / \
2   2
In this case the parens aren't required and don't add anything. They don't add anything logically, so they'd just go away.
I think the question is about how to write a parser, not the evaluator. Or rather, how to create the expression tree from a string.
Case statements that return a base class don't exactly count.
The basic structure of a "polymorphic" solution (which is another way of saying, I don't care what you build this with, I just want to extend it with rewriting the least amount of code possible) is deserializing an object hierarchy from a stream with a (dynamic) set of known types.
The crux of the implementation of the polymorphic solution is to have a way to create an expression object from a pattern matcher, likely recursive. I.e., map a BNF or similar syntax to an object factory.
Or maybe this is the real question: how can you represent (2) as a BST? That is the part that is tripping me up.
Recursion.
@Justin:
Look at my note on representing the nodes. If you use that scheme, then
2 + (2)
can be represented as
  .
 / \
2  ( )
    |
    2
should use a functional language imo. Trees are harder to represent and manipulate in OO languages.
As people have been mentioning previously, when you use expression trees parens are not necessary. The order of operations becomes trivial and obvious when you're looking at an expression tree. The parens are hints to the parser.
While the accepted answer is the solution to one half of the problem, the other half - actually parsing the expression - is still unsolved. Typically, these sorts of problems can be solved using a recursive descent parser. Writing such a parser is often a fun exercise, but most modern tools for language parsing will abstract that away for you.
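As a rough illustration of what such a recursive descent parser looks like, here is a minimal Python sketch. It assumes the usual three-level grammar (expr: term (('+' | '-') term)*, term: factor (('*' | '/') factor)*, factor: NUMBER | '(' expr ')'), and the function name `parse_expression` and the tuple tree layout are choices made for this example. The tokenizer silently skips characters it does not recognise, which a real parser should reject.

```python
import re


def parse_expression(text):
    """Parse an arithmetic expression into (symbol, left, right) tuples with float leaves."""
    tokens = re.findall(r"\d+(?:\.\d+)?|[-+*/()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def expr():
        node = term()
        while peek() in ("+", "-"):
            symbol = advance()
            node = (symbol, node, term())  # left-associative
        return node

    def term():
        node = factor()
        while peek() in ("*", "/"):
            symbol = advance()
            node = (symbol, node, factor())
        return node

    def factor():
        if peek() == "(":
            advance()
            node = expr()  # parentheses only shape the tree, they leave no node behind
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        token = peek()
        if token is None or token in "+-*/)":
            raise SyntaxError("expected a number")
        return float(advance())

    tree = expr()
    if peek() is not None:
        raise SyntaxError("unexpected token: " + peek())
    return tree
```

Each grammar rule becomes one function, and operator precedence falls out of which function calls which: `expr` delegates to `term`, which delegates to `factor`.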
The parser is also significantly harder if you allow floating point numbers in your string. I had to create a DFA to accept floating point numbers in C -- it was a very painstaking and detailed task. Remember, valid floating points include: 10, 10., 10.123, 9.876e-5, 1.0f, .025, etc. I assume some dispensation from this (in favor of simplicity and brevity) was made in the interview.
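Where a regular expression library is available, the set of literals listed above can be described by one pattern instead of a hand-built DFA. The pattern below is a sketch in Python that covers only the forms named in the text (optional fraction, optional exponent, optional `f` suffix); it is an assumption about the intended syntax, not the full C grammar for floating constants.

```python
import re

FLOAT_LITERAL = re.compile(
    r"(?:\d+\.?\d*|\.\d+)"   # digits with optional fraction, or a leading-dot fraction
    r"(?:[eE][+-]?\d+)?"     # optional exponent
    r"[fF]?"                 # optional float suffix
)


def is_float_literal(text):
    """Return True if the whole string is one of the accepted literal forms."""
    return FLOAT_LITERAL.fullmatch(text) is not None
```

Using `fullmatch` matters here: a partial match would accept inputs such as `1e`, where the exponent is started but never finished.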
I've written such a parser with some basic techniques like
Infix -> RPN and
Shunting Yard and
Tree Traversals.
Here is the implementation I've come up with.
It's written in C++ and compiles on both Linux and Windows.
Any suggestions/questions are welcomed.
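Since the linked implementation is not reproduced here, the following is an independent minimal sketch of the infix to RPN (shunting yard) approach in Python, not the C++ code referred to above. The names `to_rpn`, `eval_rpn`, `PRECEDENCE` and `APPLY` are chosen for this example, all operators are treated as left-associative, and unbalanced parentheses are not handled gracefully.

```python
import operator
import re

PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}
APPLY = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def to_rpn(text):
    """Convert an infix expression to a list in reverse Polish notation."""
    output, stack = [], []
    for token in re.findall(r"\d+(?:\.\d+)?|[-+*/()]", text):
        if token in PRECEDENCE:
            # pop operators that bind at least as tightly as the incoming one
            while stack and stack[-1] != "(" and PRECEDENCE[stack[-1]] >= PRECEDENCE[token]:
                output.append(stack.pop())
            stack.append(token)
        elif token == "(":
            stack.append(token)
        elif token == ")":
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()  # discard the matching "("
        else:
            output.append(float(token))
    while stack:
        output.append(stack.pop())
    return output


def eval_rpn(rpn):
    """Evaluate an RPN list produced by to_rpn."""
    stack = []
    for item in rpn:
        if isinstance(item, float):
            stack.append(item)
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(APPLY[item](left, right))
    return stack[0]
```

The same RPN list could also be turned into an expression tree by pushing subtrees instead of numbers in `eval_rpn`.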
So, let's try to tackle the problem all three ways. How do you go from an arithmetic expression (e.g. in a string) such as "2 + (2)" to an expression tree using cascaded-if's, a table of function pointers, and/or polymorphism?
This is interesting, but I don't think this belongs to the realm of object-oriented programming... I think it has more to do with parsing techniques.
I've kind of chucked this c# console app together as a bit of a proof of concept. Have a feeling it could be a lot better (that switch statement in GetNode is kind of clunky (it's there coz I hit a blank trying to map a class name to an operator)). Any suggestions on how it could be improved very welcome.
using System;

class Program
{
    static void Main(string[] args)
    {
        string expression = "(((3.5 * 4.5) / (1 + 2)) + 5)";
        Console.WriteLine(string.Format("{0} = {1}", expression, new Expression.ExpressionTree(expression).Value));
        Console.WriteLine("\nShow's over folks, press a key to exit");
        Console.ReadKey(false);
    }
}

namespace Expression
{
    // -------------------------------------------------------
    abstract class NodeBase
    {
        public abstract double Value { get; }
    }

    // -------------------------------------------------------
    class ValueNode : NodeBase
    {
        public ValueNode(double value)
        {
            _double = value;
        }

        private double _double;

        public override double Value
        {
            get
            {
                return _double;
            }
        }
    }

    // -------------------------------------------------------
    abstract class ExpressionNodeBase : NodeBase
    {
        protected NodeBase GetNode(string expression)
        {
            // Remove parentheses that wrap the whole expression
            expression = RemoveParenthesis(expression);
            // Is expression just a number?
            double value = 0;
            if (double.TryParse(expression, out value))
            {
                return new ValueNode(value);
            }
            else
            {
                // pos is the 1-based position of the operator to split on
                int pos = ParseExpression(expression);
                if (pos > 0)
                {
                    string leftExpression = expression.Substring(0, pos - 1).Trim();
                    string rightExpression = expression.Substring(pos).Trim();
                    switch (expression.Substring(pos - 1, 1))
                    {
                        case "+":
                            return new Add(leftExpression, rightExpression);
                        case "-":
                            return new Subtract(leftExpression, rightExpression);
                        case "*":
                            return new Multiply(leftExpression, rightExpression);
                        case "/":
                            return new Divide(leftExpression, rightExpression);
                        default:
                            throw new Exception("Unknown operator");
                    }
                }
                else
                {
                    throw new Exception("Unable to parse expression");
                }
            }
        }

        private string RemoveParenthesis(string expression)
        {
            if (expression.Contains("("))
            {
                expression = expression.Trim();
                int level = 0;
                int pos = 0;
                foreach (char token in expression.ToCharArray())
                {
                    pos++;
                    switch (token)
                    {
                        case '(':
                            level++;
                            break;
                        case ')':
                            level--;
                            break;
                    }
                    if (level == 0)
                    {
                        break;
                    }
                }
                // Only strip when the first "(" closes at the very end
                if (level == 0 && pos == expression.Length)
                {
                    expression = expression.Substring(1, expression.Length - 2);
                    expression = RemoveParenthesis(expression);
                }
            }
            return expression;
        }

        private int ParseExpression(string expression)
        {
            int winningLevel = 0;
            byte winningTokenWeight = 0;
            int winningPos = 0;
            int level = 0;
            int pos = 0;
            foreach (char token in expression.ToCharArray())
            {
                pos++;
                switch (token)
                {
                    case '(':
                        level++;
                        break;
                    case ')':
                        level--;
                        break;
                }
                if (level <= winningLevel)
                {
                    // Use >= so the rightmost operator of the lowest precedence wins.
                    // This keeps "-" and "/" left-associative, e.g. "8 - 2 - 1" is (8 - 2) - 1.
                    byte weight = OperatorWeight(token);
                    if (weight > 0 && weight >= winningTokenWeight)
                    {
                        winningLevel = level;
                        winningTokenWeight = weight;
                        winningPos = pos;
                    }
                }
            }
            return winningPos;
        }

        // Higher weight means lower precedence, so that operator is split on first
        private byte OperatorWeight(char value)
        {
            switch (value)
            {
                case '+':
                case '-':
                    return 3;
                case '*':
                    return 2;
                case '/':
                    return 1;
                default:
                    return 0;
            }
        }
    }

    // -------------------------------------------------------
    class ExpressionTree : ExpressionNodeBase
    {
        protected NodeBase _rootNode;

        public ExpressionTree(string expression)
        {
            _rootNode = GetNode(expression);
        }

        public override double Value
        {
            get
            {
                return _rootNode.Value;
            }
        }
    }

    // -------------------------------------------------------
    abstract class OperatorNodeBase : ExpressionNodeBase
    {
        protected NodeBase _leftNode;
        protected NodeBase _rightNode;

        public OperatorNodeBase(string leftExpression, string rightExpression)
        {
            _leftNode = GetNode(leftExpression);
            _rightNode = GetNode(rightExpression);
        }
    }

    // -------------------------------------------------------
    class Add : OperatorNodeBase
    {
        public Add(string leftExpression, string rightExpression)
            : base(leftExpression, rightExpression)
        {
        }

        public override double Value
        {
            get
            {
                return _leftNode.Value + _rightNode.Value;
            }
        }
    }

    // -------------------------------------------------------
    class Subtract : OperatorNodeBase
    {
        public Subtract(string leftExpression, string rightExpression)
            : base(leftExpression, rightExpression)
        {
        }

        public override double Value
        {
            get
            {
                return _leftNode.Value - _rightNode.Value;
            }
        }
    }

    // -------------------------------------------------------
    class Divide : OperatorNodeBase
    {
        public Divide(string leftExpression, string rightExpression)
            : base(leftExpression, rightExpression)
        {
        }

        public override double Value
        {
            get
            {
                return _leftNode.Value / _rightNode.Value;
            }
        }
    }

    // -------------------------------------------------------
    class Multiply : OperatorNodeBase
    {
        public Multiply(string leftExpression, string rightExpression)
            : base(leftExpression, rightExpression)
        {
        }

        public override double Value
        {
            get
            {
                return _leftNode.Value * _rightNode.Value;
            }
        }
    }
}
OK, here is my naive implementation. Sorry, I did not feel like using objects for this one, but it is easy to convert. I feel a bit like evil Willy (from Steve's story).
#!/usr/bin/env python3
#tree structure [left argument, operator, right argument, priority level]
tree_root = [None, None, None, None]
#count of parenthesis nesting
parenthesis_level = 0
#current node with empty right argument
current_node = tree_root
#indices in tree_root nodes Left, Operator, Right, PRiority
L, O, R, PR = 0, 1, 2, 3

#functions that realise operators
def sum(a, b):
    return a + b

def diff(a, b):
    return a - b

def mul(a, b):
    return a * b

def div(a, b):
    return a / b

#tree evaluator
def process_node(n):
    try:
        len(n)
    except TypeError:
        return n #a leaf: plain number
    left = process_node(n[L])
    right = process_node(n[R])
    return n[O](left, right)

#mapping operators to relevant functions
o2f = {'+': sum, '-': diff, '*': mul, '/': div, '(': None, ')': None}

#converts token to a node in tree
def convert_token(t):
    global current_node, tree_root, parenthesis_level
    if t == '(':
        parenthesis_level += 2
        return
    if t == ')':
        parenthesis_level -= 2
        return
    try: #assumption that we have just an integer
        number = int(t)
    except (ValueError, TypeError):
        pass #if not, no problem
    else:
        if tree_root[L] is None: #if it is first number, put it on the left of root node
            tree_root[L] = number
        else: #put on the right of current_node
            current_node[R] = number
        return
    priority = (1 if t in '+-' else 2) + parenthesis_level
    #if tree_root does not have operator put it there
    if tree_root[O] is None and t in o2f:
        tree_root[O] = o2f[t]
        tree_root[PR] = priority
        return
    #if new node has less or equals priority, put it on the top of tree
    if tree_root[PR] >= priority:
        temp = [tree_root, o2f[t], None, priority]
        tree_root = current_node = temp
        return
    #starting from root search for a place with higher priority in hierarchy
    current_node = tree_root
    while not isinstance(current_node[R], int) and priority > current_node[R][PR]:
        current_node = current_node[R]
    #insert new node
    temp = [current_node[R], o2f[t], None, priority]
    current_node[R] = temp
    current_node = temp

def parse(e):
    token = ''
    for c in e:
        if c <= '9' and c >= '0':
            token += c
            continue
        if c == ' ':
            if token != '':
                convert_token(token)
                token = ''
            continue
        if c in o2f:
            if token != '':
                convert_token(token)
            convert_token(c)
            token = ''
            continue
        print("Unrecognized character:", c)
    if token != '':
        convert_token(token)

def main():
    parse('(((3 * 4) / (1 + 2)) + 5)')
    print(tree_root)
    print(process_node(tree_root))

if __name__ == '__main__':
    main()