Conventions for naming class operations? - naming-conventions

What conventions do you use for naming class operations?

Full word doc : Download C# Coding Standards & Best Practices
Naming Conventions and Standards
Note :
The terms Pascal Casing and Camel Casing are used throughout this document.
Pascal Casing - First character of all words are Upper Case and other characters are lower case.
Example: BackColor
Camel Casing - First character of all words, except the first word are Upper Case and other characters are lower case.
Example: backColor
Use Pascal casing for Class names
public class HelloWorld
{
...
}
Use Pascal casing for Method names
void SayHello(string name)
{
...
}
Use Camel casing for variables and method parameters
int totalCount = 0;
void SayHello(string name)
{
string fullMessage = "Hello " + name;
...
}
Use the prefix “I” with Camel Casing for interfaces ( Example: IEntity )
Do not use Hungarian notation to name variables.
In earlier days most of the programmers liked it - having the data type as a prefix for the variable name and using m_ as prefix for member variables. Eg:
string m_sName;
int nAge;
However, in .NET coding standards, this is not recommended. Usage of data type and m_ to represent member variables should not be used. All variables should use camel casing.
Some programmers still prefer to use the prefix m_ to represent member variables, since there is no other easy way to identify a member variable.
Use Meaningful, descriptive words to name variables. Do not use abbreviations.
Good:
string address
int salary
Not Good:
string nam
string addr
int sal
Do not use single character variable names like i, n, s etc. Use names like index, temp
One exception in this case would be variables used for iterations in loops:
for ( int i = 0; i < count; i++ )
{
...
}
If the variable is used only as a counter for iteration and is not used anywhere else in the loop, many people still like to use a single char variable (i) instead of inventing a different suitable name.
Do not use underscores (_) for local variable names.
All member variables must be prefixed with underscore (_) so that they can be identified from other local variables.
Do not use variable names that resemble keywords.
Prefix boolean variables, properties and methods with “is” or similar prefixes.
Ex: private bool _isFinished
Namespace names should follow the standard pattern
...
Use appropriate prefix for the UI elements so that you can identify them from the rest of the variables.
There are 2 different approaches recommended here.
a. Use a common prefix ( ui_ ) for all UI elements. This will help you group all of the UI elements together and easy to access all of them from the intellisense.
b. Use appropriate prefix for each of the ui element. A brief list is given below. Since .NET has given several controls, you may have to arrive at a complete list of standard prefixes for each of the controls (including third party controls) you are using.
Control Prefix
Label lbl
TextBox txt
DataGrid dtg
Button btn
ImageButton imb
Hyperlink hlk
DropDownList ddl
ListBox lst
DataList dtl
Repeater rep
Checkbox chk
CheckBoxList cbl
RadioButton rdo
RadioButtonList rbl
Image img
Panel pnl
PlaceHolder phd
Table tbl
Validators val
File name should match with class name.
For example, for the class HelloWorld, the file name should be helloworld.cs (or, helloworld.vb)
Use Pascal Case for file names.
Indentation and Spacing
Use TAB for indentation. Do not use SPACES. Define the Tab size as 4.
Comments should be in the same level as the code (use the same level of indentation).
Good:
// Format a message and display
string fullMessage = "Hello " + name;
DateTime currentTime = DateTime.Now;
string message = fullMessage + ", the time is : " + currentTime.ToShortTimeString();
MessageBox.Show ( message );
Not Good:
// Format a message and display
string fullMessage = "Hello " + name;
DateTime currentTime = DateTime.Now;
string message = fullMessage + ", the time is : " + currentTime.ToShortTimeString();
MessageBox.Show ( message );
Curly braces ( {} ) should be in the same level as the code outside the braces.
Use one blank line to separate logical groups of code.
Good:
bool SayHello ( string name )
{
string fullMessage = "Hello " + name;
DateTime currentTime = DateTime.Now;
string message = fullMessage + ", the time is : " + currentTime.ToShortTimeString();
MessageBox.Show ( message );
if ( ... )
{
// Do something
// ...
return false;
}
return true;
}
Not Good:
bool SayHello (string name)
{
string fullMessage = "Hello " + name;
DateTime currentTime = DateTime.Now;
string message = fullMessage + ", the time is : " + currentTime.ToShortTimeString();
MessageBox.Show ( message );
if ( ... )
{
// Do something
// ...
return false;
}
return true;
}
There should be one and only one single blank line between each method inside the class.
The curly braces should be on a separate line and not in the same line as if, for etc.
Good:
if ( ... )
{
// Do something
}
Not Good:
if ( ... ) {
// Do something
}
Use a single space before and after each operator and brackets.
Good:
if ( showResult == true )
{
for ( int i = 0; i < 10; i++ )
{
//
}
}
Not Good:
if(showResult==true)
{
for(int i= 0;i<10;i++)
{
//
}
}
Use #region to group related pieces of code together. If you use proper grouping using #region, the page should like this when all definitions are collapsed.
Keep private member variables, properties and methods in the top of the file and public members in the bottom.

I find it makes everyone's life easier to use the same naming conventions used by the language and framework you are working in.
For example, .Net has a convention. Model what your language does, and the "users" of your code and libraries will be happier. So, the answer may be, it depends on your language and / or platform...

Naming conventions are a controversial topic, because it's an arbitrary distinction.
The two answers above are good ones. My addition is this:
Your goal is readability. Your code tells a story, albeit a sometimes kind of boring one. Make sure the story is clear.
For extra fun see these links:
http://www.joelonsoftware.com/articles/Wrong.html
http://en.wikipedia.org/wiki/Naming_convention_%28programming%29

Related

vuejs - assigning variable with value as variable name [duplicate]

Other than the obvious fact that the first form could use a variable and not just a string literal, is there any reason to use one over the other, and if so under which cases?
In code:
// Given:
var foo = {'bar': 'baz'};
// Then
var x = foo['bar'];
// vs.
var x = foo.bar;
Context: I've written a code generator which produces these expressions and I'm wondering which is preferable.
(Sourced from here.)
Square bracket notation allows the use of characters that can't be used with dot notation:
var foo = myForm.foo[]; // incorrect syntax
var foo = myForm["foo[]"]; // correct syntax
including non-ASCII (UTF-8) characters, as in myForm["ダ"] (more examples).
Secondly, square bracket notation is useful when dealing with
property names which vary in a predictable way:
for (var i = 0; i < 10; i++) {
someFunction(myForm["myControlNumber" + i]);
}
Roundup:
Dot notation is faster to write and clearer to read.
Square bracket notation allows access to properties containing
special characters and selection of
properties using variables
Another example of characters that can't be used with dot notation is property names that themselves contain a dot.
For example a json response could contain a property called bar.Baz.
var foo = myResponse.bar.Baz; // incorrect syntax
var foo = myResponse["bar.Baz"]; // correct syntax
The bracket notation allows you to access properties by name stored in a variable:
var obj = { "abc" : "hello" };
var x = "abc";
var y = obj[x];
console.log(y); //output - hello
obj.x would not work in this case.
The two most common ways to access properties in JavaScript are with a dot and with square brackets. Both value.x and value[x] access a property on value—but not necessarily the same property. The difference is in how x is interpreted. When using a dot, the part after the dot must be a valid variable name, and it directly names the property. When using square brackets, the expression between the brackets is evaluated to get the property name. Whereas value.x fetches the property of value named “x”, value[x] tries to evaluate the expression x and uses the result as the property name.
So if you know that the property you are interested in is called “length”, you say value.length. If you want to extract the property named by the value held in the variable i, you say value[i]. And because property names can be any string, if you want to access a property named “2” or “John Doe”, you must use square brackets: value[2] or value["John Doe"]. This is the case even though you know the precise name of the property in advance, because neither “2” nor “John Doe” is a valid variable name and so cannot be accessed through dot notation.
In case of Arrays
The elements in an array are stored in properties. Because the names of these properties are numbers and we often need to get their name from a variable, we have to use the bracket syntax to access them. The length property of an array tells us how many elements it contains. This property name is a valid variable name, and we know its name in advance, so to find the length of an array, you typically write array.length because that is easier to write than array["length"].
Dot notation does not work with some keywords (like new and class) in internet explorer 8.
I had this code:
//app.users is a hash
app.users.new = {
// some code
}
And this triggers the dreaded "expected indentifier" (at least on IE8 on windows xp, I havn't tried other environments). The simple fix for that is to switch to bracket notation:
app.users['new'] = {
// some code
}
Generally speaking, they do the same job.
Nevertheless, the bracket notation gives you the opportunity to do stuff that you can't do with dot notation, like
var x = elem["foo[]"]; // can't do elem.foo[];
This can be extended to any property containing special characters.
Both foo.bar and foo["bar"] access a property on foo but not necessarily the same property. The difference is in how bar is interpreted. When using a dot, the word after the dot is the literal name of the property. When using square brackets, the expression between the brackets is evaluated to get the property name. Whereas foo.bar fetches the
property of value named “bar” , foo["bar"] tries to evaluate the expression "bar" and uses the result, converted to a string, as the property name
Dot Notation’s Limitation
if we take this oject :
const obj = {
123: 'digit',
123name: 'start with digit',
name123: 'does not start with digit',
$name: '$ sign',
name-123: 'hyphen',
NAME: 'upper case',
name: 'lower case'
};
accessing their propriete using dot notation
obj.123; // ❌ SyntaxError
obj.123name; // ❌ SyntaxError
obj.name123; // ✅ 'does not start with digit'
obj.$name; // ✅ '$ sign'
obj.name-123; // ❌ SyntaxError
obj.'name-123';// ❌ SyntaxError
obj.NAME; // ✅ 'upper case'
obj.name; // ✅ 'lower case'
But none of this is a problem for the Bracket Notation:
obj['123']; // ✅ 'digit'
obj['123name']; // ✅ 'start with digit'
obj['name123']; // ✅ 'does not start with digit'
obj['$name']; // ✅ '$ sign'
obj['name-123']; // ✅ 'does not start with digit'
obj['NAME']; // ✅ 'upper case'
obj['name']; // ✅ 'lower case'
accessing variable using variable :
const variable = 'name';
const obj = {
name: 'value'
};
// Bracket Notation
obj[variable]; // ✅ 'value'
// Dot Notation
obj.variable; // undefined
You need to use brackets if the property name has special characters:
var foo = {
"Hello, world!": true,
}
foo["Hello, world!"] = false;
Other than that, I suppose it's just a matter of taste. IMHO, the dot notation is shorter and it makes it more obvious that it's a property rather than an array element (although of course JavaScript does not have associative arrays anyway).
Be careful while using these notations:
For eg. if we want to access a function present in the parent of a window.
In IE :
window['parent']['func']
is not equivalent to
window.['parent.func']
We may either use:
window['parent']['func']
or
window.parent.func
to access it
You have to use square bracket notation when -
The property name is number.
var ob = {
1: 'One',
7 : 'Seven'
}
ob.7 // SyntaxError
ob[7] // "Seven"
The property name has special character.
var ob = {
'This is one': 1,
'This is seven': 7,
}
ob.'This is one' // SyntaxError
ob['This is one'] // 1
The property name is assigned to a variable and you want to access the
property value by this variable.
var ob = {
'One': 1,
'Seven': 7,
}
var _Seven = 'Seven';
ob._Seven // undefined
ob[_Seven] // 7
Bracket notation can use variables, so it is useful in two instances where dot notation will not work:
1) When the property names are dynamically determined (when the exact names are not known until runtime).
2) When using a for..in loop to go through all the properties of an object.
source: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Working_with_Objects
Case where [] notation is helpful :
If your object is dynamic and there could be some random values in keys like number and []or any other special character, for example -
var a = { 1 : 3 };
Now if you try to access in like a.1 it will through an error, because it is expecting an string over there.
Dot notation is always preferable. If you are using some "smarter" IDE or text editor, it will show undefined names from that object.
Use brackets notation only when you have the name with like dashes or something similar invalid. And also if the name is stored in a variable.
Let me add some more use case of the square-bracket notation. If you want to access a property say x-proxy in a object, then - will be interpreted wrongly. Their are some other cases too like space, dot, etc., where dot operation will not help you. Also if u have the key in a variable then only way to access the value of the key in a object is by bracket notation. Hope you get some more context.
An example where the dot notation fails
json = {
"value:":4,
'help"':2,
"hello'":32,
"data+":2,
"😎":'😴',
"a[]":[
2,
2
]
};
// correct
console.log(json['value:']);
console.log(json['help"']);
console.log(json["help\""]);
console.log(json['hello\'']);
console.log(json["hello'"]);
console.log(json["data+"]);
console.log(json["😎"]);
console.log(json["a[]"]);
// wrong
console.log(json.value:);
console.log(json.help");
console.log(json.hello');
console.log(json.data+);
console.log(json.😎);
console.log(json.a[]);
The property names shouldn't interfere with the syntax rules of javascript for you to be able to access them as json.property_name
Or when you want to dynamically change the classList action for an element:
// Correct
showModal.forEach(node => {
node.addEventListener(
'click',
() => {
changeClass(findHidden, 'remove'); // Correct
},
true
);
});
//correct
function changeClass(findHidden, className) {
for (let item of findHidden) {
console.log(item.classList[className]('hidden'));// Correct
}
}
// Incorrect
function changeClass(findHidden, className) {
for (let item of findHidden) {
console.log(item.classList.className('hidden')); // Doesn't work
}
}
I'm giving another example to understand the usage differences btw them clearly. When using nested array and nested objects
const myArray = [
{
type: "flowers",
list: [ "a", "b", "c" ],
},
{
type: "trees",
list: [ "x", "y", "z" ],
}
];
Now if we want to access the second item from the trees list means y.
We can't use bracket notation all the time
const secondTree = myArray[1]["list"][1]; // incorrect syntex
Instead, we have to use
const secondTree = myArray[1].list[1]; // correct syntex
The dot notation and bracket notation both are used to access the object properties in JavaScript. The dot notation is mostly used as it is easier to read and comprehend. So why should we use bracket notation and what is the difference between then? well, the bracket notation [] allows us to access object properties using variables because it converts the expression inside the square brackets to a string.
const person = {
name: 'John',
age: 30
};
//dot notation
const nameDot = person.name;
console.log(nameDot);
// 'John'
const nameBracket = person['name'];
console.log(nameBracket);
// 'John'
Now, let's look at a variable example:
const person = {
name: 'John',
age: 30
};
const myName = 'name';
console.log(person[myName]);
// 'John'
Another advantage is that dot notation only contain be alphanumeric (and _ and $) so for instance, if you want to access an object like the one below (contains '-', you must use the bracket notation for that)
const person = {
'my-name' : 'John'
}
console.log(person['my-name']); // 'John'
// console.log(person.my-name); // Error

Add a VSA (Vendor Specific Attribute) to Access-Accept reply programmatically in FreeRADIUS C module

I have a FreeRADIUS C language module that implements MOD_AUTHENTICATE and MOD_AUTHORIZE methods for custom auth purpose. I need the ability to programmatically add VSAs to the Access-Accept reply.
I have toyed a bit with radius_pair_create() and fr_pair_add() methods (see snippet below) but that didn’t yield any change to the reply content, possibly because I specified ad-hoc values that don’t exist in a vendor-specific dictionary. Or because I didn’t use them correctly.
My FreeRADIUS version is 3_0_19
Any information, pointers and, especially, syntax samples will be highly appreciated.
void test_vsa(REQUEST *request)
{
VALUE_PAIR *vp = NULL;
vp = radius_pair_create(request->reply, NULL, 18, 0);
if (vp)
{
log("Created VALUE_PAIR");
vp->vp_integer = 96;
fr_pair_add(&request->reply->vps, vp);
}
else
{
log("Failed to create VALUE_PAIR");
}
}
So first off you're writing an integer value to a string attribute, which is wrong. The only reason why the server isn't SEGVing is because the length of the VP has been left at zero, so the RADIUS encoder doesn't bother dereferencing the char * inside the pair that's meant to contain the pair's value.
fr_pair_make is the easier function to use here, as it takes both the attribute name and value as strings, so you don't need to worry about the C types.
The code snippet below should do what you want.
void test_avp(REQUEST *request)
{
VALUE_PAIR *vp = NULL;
vp = fr_pair_make(request->reply, &request->reply->vps, "Reply-Message", "Hello from FreeRADIUS", T_OP_SET);
if (vp)
{
log("Created VALUE_PAIR");
}
else
{
log("Failed to create VALUE_PAIR");
}
}
For a bit more of an explanation, lets look at the doxygen header:
/** Create a VALUE_PAIR from ASCII strings
*
* Converts an attribute string identifier (with an optional tag qualifier)
* and value string into a VALUE_PAIR.
*
* The string value is parsed according to the type of VALUE_PAIR being created.
*
* #param[in] ctx for talloc
* #param[in] vps list where the attribute will be added (optional)
* #param[in] attribute name.
* #param[in] value attribute value (may be NULL if value will be set later).
* #param[in] op to assign to new VALUE_PAIR.
* #return a new VALUE_PAIR.
*/
VALUE_PAIR *fr_pair_make(TALLOC_CTX *ctx, VALUE_PAIR **vps,
char const *attribute, char const *value, FR_TOKEN op)
ctx - This is the packet or request that the vps will belong to. If you're adding attributes to the request it should be request->packet, reply would be request->reply, control would be request.
vps - If specified, this will be which list to insert the new VP into. If this is NULL fr_pair_make will just return the pair and let you insert it into a list.
attribute - The name of the attribute as a string.
value - The value of the attribute as a string. For non-string types, fr_pair_make will attempt to perform a conversion. So, for example, passing "12345" for an integer type, will result in the integer value 12345 being written to an int field in the attribute.
op - You'll usually want to us T_OP_SET which means overwrite existing instances of the same attribute. See the T_OP_* values of FR_TOKEN and the code that uses them, if you want to understand the different operators and what they do.

What could be a reason for `_localctx` being null in an antlr4 semantic predicate?

I'm using list labels to gather tokens and semantic predicates to validate sequences in my parser grammar. E.g.
line
:
(text+=WORD | text+=NUMBER)+ ((BLANK | SKIP)+ (text+=WORD | text+=NUMBER)+)+
{Parser.validateContext(_localctx)}?
(BLANK | SKIP)*
;
where
WORD: [\u0021-\u002F\u003A-\u007E]+; // printable ASCII characters (excluding SP and numbers)
NUMBER: [\u0030-\u0039]+; // printable ASCII number characters
BLANK: '\u0020';
SKIP: '\u0020\u0020' | '\t'; // two SPs or a HT symbol
The part of Parser.validateContext used to validate the line rule would be implemented like this
private static final boolean validateContext(ParserRuleContext context) {
//.. other contexts
if(context instanceof LineContext)
return "<reference-sequence>".equals(Parser.joinTokens(((LineContext) context).text, " "));
return false;}
where Parser.joinTokens is defined as
private static String joinTokens(java.util.List<org.antlr.v4.runtime.Token> tokens, String delimiter) {
StringBuilder builder = new StringBuilder();
int i = 0, n;
if((n = tokens.size()) == 0) return "";
builder.append(tokens.get(0).getText());
while(++i < n) builder.append(delimiter + tokens.get(i).getText());
return builder.toString();}
Both are put in a #parser::members clause a the beginning of the grammar file.
My problem is this: sometimes the _localctx reference is null and I receive "no viable alternative" errors. These are probably caused because the failing predicate guards the respective rule and no alternative is given.
Is there a reason–potentially an error on my part–why _localctx would be null?
UPDATE: The answer to this question seems to suggest that semantic predicates are also called during prediction. Maybe during prediction no context is created and _localctx is set to null.
The semantics of _localctx in a predicate are not defined. Allowable behavior includes, but is not limited to the following (and may change during any release):
Failing to compile (no identifier with that name)
Using the wrong context object
Not having a context object (null)
To reference the context of the current rule from within a predicate, you need to use $ctx instead.
Note that the same applies for rule parameters, locals, and/or return values which are used in a predicate. For example, the parameter a cannot be referenced as a, but must instead be $a.

Char.IsSymbol("*") is false

I'm working on a password validation routine, and am surprised to find that VB does not consider '*' to be a symbol per the Char.IsSymbol() check.
Here is the output from the QuickWatch:
char.IsSymbol("*") False Boolean
The MS documentation does not specify what characters are matched by IsSymbol, but does imply that standard mathematical symbols are included here.
Does anyone have any good ideas for matching all standard US special characters?
Characters that are symbols in this context: UnicodeCategory.MathSymbol, UnicodeCategory.CurrencySymbol, UnicodeCategory.ModifierSymbol and UnicodeCategory.OtherSymbol from the System.Globalization namespace. These are the Unicode characters designated Sm, Sc, Sk and So, respectively. All other characters return False.
From the .Net source:
internal static bool CheckSymbol(UnicodeCategory uc)
{
switch (uc)
{
case UnicodeCategory.MathSymbol:
case UnicodeCategory.CurrencySymbol:
case UnicodeCategory.ModifierSymbol:
case UnicodeCategory.OtherSymbol:
return true;
default:
return false;
}
}
or converted to VB.Net:
Friend Shared Function CheckSymbol(uc As UnicodeCategory) As Boolean
Select Case uc
Case UnicodeCategory.MathSymbol, UnicodeCategory.CurrencySymbol, UnicodeCategory.ModifierSymbol, UnicodeCategory.OtherSymbol
Return True
Case Else
Return False
End Select
End Function
CheckSymbol is called by IsSymbol with the Unicode category of the given char.
Since the * is in the category OtherPunctuation (you can check this with char.GetUnicodeCategory()), it is not considered a symbol, and the method correctly returns False.
To answer your question: use char.GetUnicodeCategory() to check which category the character falls in, and decide to include it or not in your own logic.
If you simply need to know that character is something else than digit or letter,
use just
!char.IsLetterOrDigit(c)
preferably with
&& !char.IsControl(c)
Maybe you have the compiler option "strict" of, because with
Char.IsSymbol("*")
I get a compiler error
BC30512: Option Strict On disallows implicit conversions from 'String' to 'Char'.
To define a Character literal in VB.NET, you must add a c to the string, like this:
Char.IsSymbol("*"c)
IsPunctuation(x) is what you are looking for.
This worked for me in C#:
string Password = "";
ConsoleKeyInfo key;
do
{
key = Console.ReadKey(true);
// Ignore any key out of range.
if (char.IsPunctuation(key.KeyChar) ||char.IsLetterOrDigit(key.KeyChar) || char.IsSymbol(key.KeyChar))
{
// Append the character to the password.
Password += key.KeyChar;
Console.Write("*");
}
// Exit if Enter key is pressed.
} while (key.Key != ConsoleKey.Enter);

Expression Evaluation and Tree Walking using polymorphism? (ala Steve Yegge)

This morning, I was reading Steve Yegge's: When Polymorphism Fails, when I came across a question that a co-worker of his used to ask potential employees when they came for their interview at Amazon.
As an example of polymorphism in
action, let's look at the classic
"eval" interview question, which (as
far as I know) was brought to Amazon
by Ron Braunstein. The question is
quite a rich one, as it manages to
probe a wide variety of important
skills: OOP design, recursion, binary
trees, polymorphism and runtime
typing, general coding skills, and (if
you want to make it extra hard)
parsing theory.
At some point, the candidate hopefully
realizes that you can represent an
arithmetic expression as a binary
tree, assuming you're only using
binary operators such as "+", "-",
"*", "/". The leaf nodes are all
numbers, and the internal nodes are
all operators. Evaluating the
expression means walking the tree. If
the candidate doesn't realize this,
you can gently lead them to it, or if
necessary, just tell them.
Even if you tell them, it's still an
interesting problem.
The first half of the question, which
some people (whose names I will
protect to my dying breath, but their
initials are Willie Lewis) feel is a
Job Requirement If You Want To Call
Yourself A Developer And Work At
Amazon, is actually kinda hard. The
question is: how do you go from an
arithmetic expression (e.g. in a
string) such as "2 + (2)" to an
expression tree. We may have an ADJ
challenge on this question at some
point.
The second half is: let's say this is
a 2-person project, and your partner,
who we'll call "Willie", is
responsible for transforming the
string expression into a tree. You get
the easy part: you need to decide what
classes Willie is to construct the
tree with. You can do it in any
language, but make sure you pick one,
or Willie will hand you assembly
language. If he's feeling ornery, it
will be for a processor that is no
longer manufactured in production.
You'd be amazed at how many candidates
boff this one.
I won't give away the answer, but a
Standard Bad Solution involves the use
of a switch or case statment (or just
good old-fashioned cascaded-ifs). A
Slightly Better Solution involves
using a table of function pointers,
and the Probably Best Solution
involves using polymorphism. I
encourage you to work through it
sometime. Fun stuff!
So, let's try to tackle the problem all three ways. How do you go from an arithmetic expression (e.g. in a string) such as "2 + (2)" to an expression tree using cascaded-if's, a table of function pointers, and/or polymorphism?
Feel free to tackle one, two, or all three.
[update: title modified to better match what most of the answers have been.]
Polymorphic Tree Walking, Python version
#!/usr/bin/python
class Node:
"""base class, you should not process one of these"""
def process(self):
raise('you should not be processing a node')
class BinaryNode(Node):
"""base class for binary nodes"""
def __init__(self, _left, _right):
self.left = _left
self.right = _right
def process(self):
raise('you should not be processing a binarynode')
class Plus(BinaryNode):
def process(self):
return self.left.process() + self.right.process()
class Minus(BinaryNode):
def process(self):
return self.left.process() - self.right.process()
class Mul(BinaryNode):
def process(self):
return self.left.process() * self.right.process()
class Div(BinaryNode):
def process(self):
return self.left.process() / self.right.process()
class Num(Node):
def __init__(self, _value):
self.value = _value
def process(self):
return self.value
def demo(n):
print n.process()
demo(Num(2)) # 2
demo(Plus(Num(2),Num(5))) # 2 + 3
demo(Plus(Mul(Num(2),Num(3)),Div(Num(10),Num(5)))) # (2 * 3) + (10 / 2)
The tests are just building up the binary trees by using constructors.
program structure:
abstract base class: Node
all Nodes inherit from this class
abstract base class: BinaryNode
all binary operators inherit from this class
process method does the work of evaluting the expression and returning the result
binary operator classes: Plus,Minus,Mul,Div
two child nodes, one each for left side and right side subexpressions
number class: Num
holds a leaf-node numeric value, e.g. 17 or 42
The problem, I think, is that we need to parse perentheses, and yet they are not a binary operator? Should we take (2) as a single token, that evaluates to 2?
The parens don't need to show up in the expression tree, but they do affect its shape. E.g., the tree for (1+2)+3 is different from 1+(2+3):
+
/ \
+ 3
/ \
1 2
versus
+
/ \
1 +
/ \
2 3
The parentheses are a "hint" to the parser (e.g., per superjoe30, to "recursively descend")
This gets into parsing/compiler theory, which is kind of a rabbit hole... The Dragon Book is the standard text for compiler construction, and takes this to extremes. In this particular case, you want to construct a context-free grammar for basic arithmetic, then use that grammar to parse out an abstract syntax tree. You can then iterate over the tree, reducing it from the bottom up (it's at this point you'd apply the polymorphism/function pointers/switch statement to reduce the tree).
I've found these notes to be incredibly helpful in compiler and parsing theory.
Representing the Nodes
If we want to include parentheses, we need 5 kinds of nodes:
the binary nodes: Add Minus Mul Divthese have two children, a left and right side
+
/ \
node node
a node to hold a value: Valno children nodes, just a numeric value
a node to keep track of the parens: Parena single child node for the subexpression
( )
|
node
For a polymorphic solution, we need to have this kind of class relationship:
Node
BinaryNode : inherit from Node
Plus : inherit from Binary Node
Minus : inherit from Binary Node
Mul : inherit from Binary Node
Div : inherit from Binary Node
Value : inherit from Node
Paren : inherit from node
There is a virtual function for all nodes called eval(). If you call that function, it will return the value of that subexpression.
String Tokenizer + LL(1) Parser will give you an expression tree... the polymorphism way might involve an abstract Arithmetic class with an "evaluate(a,b)" function, which is overridden for each of the operators involved (Addition, Subtraction etc) to return the appropriate value, and the tree contains Integers and Arithmetic operators, which can be evaluated by a post(?)-order traversal of the tree.
I won't give away the answer, but a
Standard Bad Solution involves the use
of a switch or case statment (or just
good old-fashioned cascaded-ifs). A
Slightly Better Solution involves
using a table of function pointers,
and the Probably Best Solution
involves using polymorphism.
The last twenty years of evolution in interpreters can be seen as going the other way - polymorphism (eg naive Smalltalk metacircular interpreters) to function pointers (naive lisp implementations, threaded code, C++) to switch (naive byte code interpreters), and then onwards to JITs and so on - which either require very big classes, or (in singly polymorphic languages) double-dispatch, which reduces the polymorphism to a type-case, and you're back at stage one. What definition of 'best' is in use here?
For simple stuff a polymorphic solution is OK - here's one I made earlier, but either stack and bytecode/switch or exploiting the runtime's compiler is usually better if you're, say, plotting a function with a few thousand data points.
Hm... I don't think you can write a top-down parser for this without backtracking, so it has to be some sort of a shift-reduce parser. LR(1) or even LALR will of course work just fine with the following (ad-hoc) language definition:
Start -> E1
E1 -> E1+E1 | E1-E1
E1 -> E2*E2 | E2/E2 | E2
E2 -> number | (E1)
Separating it out into E1 and E2 is necessary to maintain the precedence of * and / over + and -.
But this is how I would do it if I had to write the parser by hand:
Two stacks, one storing nodes of the tree as operands and one storing operators
Read the input left to right, make leaf nodes of the numbers and push them into the operand stack.
If you have >= 2 operands on the stack, pop 2, combine them with the topmost operator in the operator stack and push this structure back to the operand tree, unless
The next operator has higher precedence that the one currently on top of the stack.
This leaves us the problem of handling brackets. One elegant (I thought) solution is to store the precedence of each operator as a number in a variable. So initially,
int plus, minus = 1;
int mul, div = 2;
Now every time you see a a left bracket increment all these variables by 2, and every time you see a right bracket, decrement all the variables by 2.
This will ensure that the + in 3*(4+5) has higher precedence than the *, and 3*4 will not be pushed onto the stack. Instead it will wait for 5, push 4+5, then push 3*(4+5).
Re: Justin
I think the tree would look something like this:
+
/ \
2 ( )
|
2
Basically, you'd have an "eval" node, that just evaluates the tree below it. That would then be optimized out to just being:
+
/ \
2 2
In this case the parens aren't required and don't add anything. They don't add anything logically, so they'd just go away.
I think the question is about how to write a parser, not the evaluator. Or rather, how to create the expression tree from a string.
Case statements that return a base class don't exactly count.
The basic structure of a "polymorphic" solution (which is another way of saying, I don't care what you build this with, I just want to extend it with rewriting the least amount of code possible) is deserializing an object hierarchy from a stream with a (dynamic) set of known types.
The crux of the implementation of the polymorphic solution is to have a way to create an expression object from a pattern matcher, likely recursive. I.e., map a BNF or similar syntax to an object factory.
Or maybe this is the real question:
how can you represent (2) as a BST?
That is the part that is tripping me
up.
Recursion.
#Justin:
Look at my note on representing the nodes. If you use that scheme, then
2 + (2)
can be represented as
.
/ \
2 ( )
|
2
should use a functional language imo. Trees are harder to represent and manipulate in OO languages.
As people have been mentioning previously, when you use expression trees parens are not necessary. The order of operations becomes trivial and obvious when you're looking at an expression tree. The parens are hints to the parser.
While the accepted answer is the solution to one half of the problem, the other half - actually parsing the expression - is still unsolved. Typically, these sorts of problems can be solved using a recursive descent parser. Writing such a parser is often a fun exercise, but most modern tools for language parsing will abstract that away for you.
The parser is also significantly harder if you allow floating point numbers in your string. I had to create a DFA to accept floating point numbers in C -- it was a very painstaking and detailed task. Remember, valid floating points include: 10, 10., 10.123, 9.876e-5, 1.0f, .025, etc. I assume some dispensation from this (in favor of simplicty and brevity) was made in the interview.
I've written such a parser with some basic techniques like
Infix -> RPN and
Shunting Yard and
Tree Traversals.
Here is the implementation I've came up with.
It's written in C++ and compiles on both Linux and Windows.
Any suggestions/questions are welcomed.
So, let's try to tackle the problem all three ways. How do you go from an arithmetic expression (e.g. in a string) such as "2 + (2)" to an expression tree using cascaded-if's, a table of function pointers, and/or polymorphism?
This is interesting,but I don't think this belongs to the realm of object-oriented programming...I think it has more to do with parsing techniques.
I've kind of chucked this c# console app together as a bit of a proof of concept. Have a feeling it could be a lot better (that switch statement in GetNode is kind of clunky (it's there coz I hit a blank trying to map a class name to an operator)). Any suggestions on how it could be improved very welcome.
using System;
class Program
{
static void Main(string[] args)
{
string expression = "(((3.5 * 4.5) / (1 + 2)) + 5)";
Console.WriteLine(string.Format("{0} = {1}", expression, new Expression.ExpressionTree(expression).Value));
Console.WriteLine("\nShow's over folks, press a key to exit");
Console.ReadKey(false);
}
}
namespace Expression
{
// -------------------------------------------------------
abstract class NodeBase
{
public abstract double Value { get; }
}
// -------------------------------------------------------
class ValueNode : NodeBase
{
public ValueNode(double value)
{
_double = value;
}
private double _double;
public override double Value
{
get
{
return _double;
}
}
}
// -------------------------------------------------------
abstract class ExpressionNodeBase : NodeBase
{
protected NodeBase GetNode(string expression)
{
// Remove parenthesis
expression = RemoveParenthesis(expression);
// Is expression just a number?
double value = 0;
if (double.TryParse(expression, out value))
{
return new ValueNode(value);
}
else
{
int pos = ParseExpression(expression);
if (pos > 0)
{
string leftExpression = expression.Substring(0, pos - 1).Trim();
string rightExpression = expression.Substring(pos).Trim();
switch (expression.Substring(pos - 1, 1))
{
case "+":
return new Add(leftExpression, rightExpression);
case "-":
return new Subtract(leftExpression, rightExpression);
case "*":
return new Multiply(leftExpression, rightExpression);
case "/":
return new Divide(leftExpression, rightExpression);
default:
throw new Exception("Unknown operator");
}
}
else
{
throw new Exception("Unable to parse expression");
}
}
}
private string RemoveParenthesis(string expression)
{
if (expression.Contains("("))
{
expression = expression.Trim();
int level = 0;
int pos = 0;
foreach (char token in expression.ToCharArray())
{
pos++;
switch (token)
{
case '(':
level++;
break;
case ')':
level--;
break;
}
if (level == 0)
{
break;
}
}
if (level == 0 && pos == expression.Length)
{
expression = expression.Substring(1, expression.Length - 2);
expression = RemoveParenthesis(expression);
}
}
return expression;
}
private int ParseExpression(string expression)
{
int winningLevel = 0;
byte winningTokenWeight = 0;
int winningPos = 0;
int level = 0;
int pos = 0;
foreach (char token in expression.ToCharArray())
{
pos++;
switch (token)
{
case '(':
level++;
break;
case ')':
level--;
break;
}
if (level <= winningLevel)
{
if (OperatorWeight(token) > winningTokenWeight)
{
winningLevel = level;
winningTokenWeight = OperatorWeight(token);
winningPos = pos;
}
}
}
return winningPos;
}
private byte OperatorWeight(char value)
{
switch (value)
{
case '+':
case '-':
return 3;
case '*':
return 2;
case '/':
return 1;
default:
return 0;
}
}
}
// -------------------------------------------------------
class ExpressionTree : ExpressionNodeBase
{
protected NodeBase _rootNode;
public ExpressionTree(string expression)
{
_rootNode = GetNode(expression);
}
public override double Value
{
get
{
return _rootNode.Value;
}
}
}
// -------------------------------------------------------
abstract class OperatorNodeBase : ExpressionNodeBase
{
protected NodeBase _leftNode;
protected NodeBase _rightNode;
public OperatorNodeBase(string leftExpression, string rightExpression)
{
_leftNode = GetNode(leftExpression);
_rightNode = GetNode(rightExpression);
}
}
// -------------------------------------------------------
class Add : OperatorNodeBase
{
public Add(string leftExpression, string rightExpression)
: base(leftExpression, rightExpression)
{
}
public override double Value
{
get
{
return _leftNode.Value + _rightNode.Value;
}
}
}
// -------------------------------------------------------
class Subtract : OperatorNodeBase
{
public Subtract(string leftExpression, string rightExpression)
: base(leftExpression, rightExpression)
{
}
public override double Value
{
get
{
return _leftNode.Value - _rightNode.Value;
}
}
}
// -------------------------------------------------------
class Divide : OperatorNodeBase
{
public Divide(string leftExpression, string rightExpression)
: base(leftExpression, rightExpression)
{
}
public override double Value
{
get
{
return _leftNode.Value / _rightNode.Value;
}
}
}
// -------------------------------------------------------
class Multiply : OperatorNodeBase
{
public Multiply(string leftExpression, string rightExpression)
: base(leftExpression, rightExpression)
{
}
public override double Value
{
get
{
return _leftNode.Value * _rightNode.Value;
}
}
}
}
Ok, here is my naive implementation. Sorry, I did not feel to use objects for that one but it is easy to convert. I feel a bit like evil Willy (from Steve's story).
#!/usr/bin/env python
#tree structure [left argument, operator, right argument, priority level]
tree_root = [None, None, None, None]
#count of parethesis nesting
parenthesis_level = 0
#current node with empty right argument
current_node = tree_root
#indices in tree_root nodes Left, Operator, Right, PRiority
L, O, R, PR = 0, 1, 2, 3
#functions that realise operators
def sum(a, b):
return a + b
def diff(a, b):
return a - b
def mul(a, b):
return a * b
def div(a, b):
return a / b
#tree evaluator
def process_node(n):
try:
len(n)
except TypeError:
return n
left = process_node(n[L])
right = process_node(n[R])
return n[O](left, right)
#mapping operators to relevant functions
o2f = {'+': sum, '-': diff, '*': mul, '/': div, '(': None, ')': None}
#converts token to a node in tree
def convert_token(t):
global current_node, tree_root, parenthesis_level
if t == '(':
parenthesis_level += 2
return
if t == ')':
parenthesis_level -= 2
return
try: #assumption that we have just an integer
l = int(t)
except (ValueError, TypeError):
pass #if not, no problem
else:
if tree_root[L] is None: #if it is first number, put it on the left of root node
tree_root[L] = l
else: #put on the right of current_node
current_node[R] = l
return
priority = (1 if t in '+-' else 2) + parenthesis_level
#if tree_root does not have operator put it there
if tree_root[O] is None and t in o2f:
tree_root[O] = o2f[t]
tree_root[PR] = priority
return
#if new node has less or equals priority, put it on the top of tree
if tree_root[PR] >= priority:
temp = [tree_root, o2f[t], None, priority]
tree_root = current_node = temp
return
#starting from root search for a place with higher priority in hierarchy
current_node = tree_root
while type(current_node[R]) != type(1) and priority > current_node[R][PR]:
current_node = current_node[R]
#insert new node
temp = [current_node[R], o2f[t], None, priority]
current_node[R] = temp
current_node = temp
def parse(e):
token = ''
for c in e:
if c <= '9' and c >='0':
token += c
continue
if c == ' ':
if token != '':
convert_token(token)
token = ''
continue
if c in o2f:
if token != '':
convert_token(token)
convert_token(c)
token = ''
continue
print "Unrecognized character:", c
if token != '':
convert_token(token)
def main():
parse('(((3 * 4) / (1 + 2)) + 5)')
print tree_root
print process_node(tree_root)
if __name__ == '__main__':
main()