Related
I have 4 files all in the same directory: main.rakumod, infix_ops.rakumod, prefix_ops.rakumod and script.raku:
main module has a class definition (class A)
*_ops modules have some operator routine definitions to write, e.g., $a1 + $a2 in an overloaded way.
script.raku tries to instantaniate A object(s) and use those user-defined operators.
Why 3 files not 1? Since class definition might be long and separating overloaded operator definitions in files seemed like a good idea for writing tidier code (easier to manage).
e.g.,
# main.rakumod
class A {
has $.x is rw;
}
# prefix_ops.rakumod
use lib ".";
use main;
multi prefix:<++>(A:D $obj) {
++$obj.x;
$obj;
}
and similar routines in infix_ops.rakumod. Now, in script.raku, my aim is to import main module only and see the overloaded operators also available:
# script.raku
use lib ".";
use main;
my $a = A.new(x => -1);
++$a;
but it naturally doesn't see ++ multi for A objects because main.rakumod doesn't know the *_ops.rakumod files as it stands. Is there a way I can achieve this? If I use prefix_ops in main.rakumod, it says 'use lib' may not be pre-compiled perhaps because of circular dependentness
it says 'use lib' may not be pre-compiled
The word "may" is ambiguous. Actually it cannot be precompiled.
The message would be better if it said something to the effect of "Don't put use lib in a module."
This has now been fixed per #codesections++'s comment below.
perhaps because of circular dependentness
No. use lib can only be used by the main program file, the one directly run by Rakudo.
Is there a way I can achieve this?
Here's one way.
We introduce a new file that's used by the other packages to eliminate the circularity. So now we have four files (I've rationalized the naming to stick to A or variants of it for the packages that contribute to the type A):
A-sawn.rakumod that's a role or class or similar:
unit role A-sawn;
Other packages that are to be separated out into their own files use the new "sawn" package and does or is it as appropriate:
use A-sawn;
unit class A-Ops does A-sawn;
multi prefix:<++>(A-sawn:D $obj) is export { ++($obj.x) }
multi postfix:<++>(A-sawn:D $obj) is export { ($obj.x)++ }
The A.rakumod file for the A type does the same thing. It also uses whatever other packages are to be pulled into the same A namespace; this will import symbols from it according to Raku's standard importing rules. And then relevant symbols are explicitly exported:
use A-sawn;
use A-Ops;
sub EXPORT { Map.new: OUTER:: .grep: /'fix:<'/ }
unit class A does A-sawn;
has $.x is rw;
Finally, with this setup in place, the main program can just use A;:
use lib '.';
use A;
my $a = A.new(x => -1);
say $a++; # A.new(x => -1)
say ++$a; # A.new(x => 1)
say ++$a; # A.new(x => 2)
The two main things here are:
Introducing an (empty) A-sawn package
This type eliminates circularity using the technique shown in #codesection's answer to Best Way to Resolve Circular Module Loading.
Raku culture has a fun generic term/meme for techniques that cut through circular problems: "circular saws". So I've used a -sawn suffix of the "sawn" typename as a convention when using this technique.[1]
Importing symbols into a package and then re-exporting them
This is done via sub EXPORT { Map.new: ... }.[2] See the doc for sub EXPORT.
The Map must contain a list of symbols (Pairs). For this case I've grepped through keys from the OUTER:: pseudopackage that refers to the symbol table of the lexical scope immediately outside the sub EXPORT the OUTER:: appears in. This is of course the lexical scope into which some symbols (for operators) have just been imported by the use Ops; statement. I then grep that symbol table for keys containing fix:<; this will catch all symbol keys with that string in their name (so infix:<..., prefix:<... etc.). Alter this code as needed to suit your needs.[3]
Footnotes
[1] As things stands this technique means coming up with a new name that's different from the one used by the consumer of the new type, one that won't conflict with any other packages. This suggests a suffix. I think -sawn is a reasonable choice for an unusual and distinctive and mnemonic suffix. That said, I imagine someone will eventually package this process up into a new language construct that does the work behind the scenes, generating the name and automating away the manual changes one has to make to packages with the shown technique.
[2] A critically important point is that, if a sub EXPORT is to do what you want, it must be placed outside the package definition to which it applies. And that in turn means it must be before a unit package declaration. And that in turn means any use statement relied on by that sub EXPORT must appear within the same or outer lexical scope. (This is explained in the doc but I think it bears summarizing here to try head off much head scratching because there's no error message if it's in the wrong place.)
[3] As with the circularity saw aspect discussed in footnote 1, I imagine someone will also eventually package up this import-and-export mechanism into a new construct, or, perhaps even better, an enhancement of Raku's built in use statement.
Hi #hanselmann here is how I would write this (in 3 files / same dir):
Define my class(es):
# MyClass.rakumod
unit module MyClass;
class A is export {
has $.x is rw;
}
Define my operators:
# Prefix_Ops.rakumod
unit module Prefix_Ops;
use MyClass;
multi prefix:<++>(A:D $obj) is export {
++$obj.x;
$obj;
}
Run my code:
# script.raku
use lib ".";
use MyClass;
use Prefix_Ops;
my $a = A.new(x => -1);
++$a;
say $a.x; #0
Taking my cue from the Module docs there are a couple of things I am doing different:
Avoiding the use of main (or Main, or MAIN) --- I am wary that MAIN is a reserved name and just want to keep clear of engaging any of that (cool) machinery
Bringing in the unit module declaration at the top of each 'rakumod' file ... it may be possible to use bare files in Raku ... but I have never tried this and would say that it is not obvious from the docs that it is even possible, or supported
Now since I wanted this to work first time you will note that I use the same file name and module name ... again it may be possible to do that differently (multiple modules in one file and so on) ... but I have not tried that either
Using the 'is export' trait where I want my script to be able to use these definitions ... as you will know from close study of the docs ;-) is that each module has it's own namespace (the "stash") and we need export to shove the exported definitions into the namespace of the script
As #raiph mentions you only need the script to define the module library location
Since you want your prefix multi to "know" about class A then you also need to use MyClass in the Prefix_Ops module
Anyway, all-in-all, I think that the raku module system exemplifies the unique combination of "easy things easy and hard thinks doable" ... all I had to do with your code (which was very close) was tweak a few filenames and sprinkle in some concise concepts like 'unit module' and 'is export' and it really does not look much different since raku keeps all the import/export machinery under the surface like the swan gliding over the river...
I was playing with this in 2018.01:
my $proc = Proc.new: :out;
my $f = $proc.clone;
$f.spawn: 'ls';
put $f.out.slurp;
It says it can't do it. It's curious that the error message is about a routine I didn't use and a different class:
Cannot resolve caller stdout(Proc::Async: :bin); none of these signatures match:
(Proc::Async:D $: :$bin!, *%_)
(Proc::Async:D $: :$enc, :$translate-nl, *%_)
in block <unit> at proc-out.p6 line 3
Everything inherits a default clone method from Mu, which does a shallow clone, but that doesn't mean that everything makes sense to clone. This especially goes for objects that might hold references to OS-level things, such as Proc or IO::Handle. As the person who designed Proc::Async, I can say for certain that making it do anything useful on clone was not a design consideration. I didn't design Proc, but I suspect the same applies.
As for the error, keep in mind that the Perl 6 standard library is implemented in Perl 6 (a lot like in Java and .Net, but less like Perl 5 where many things that are provided by default go directly to something written in C). In this particular case, Proc is implemented in terms of Proc::Async. Rakudo tries to trim stack traces somewhat to eliminate calls inside of the setting, which is usually a win for the language user, but in cases like this can be a little less helpful. Running Rakudo with the --ll-exception flag provides the full details, and thus makes clearer what is going on.
This site tickled my sense of humour - http://www.antiifcampaign.com/ but can polymorphism work in every case where you would use an if statement?
Smalltalk, which is considered as a "truly" object oriented language, has no "if" statement, and it has no "for" statement, no "while" statement. There are other examples (like Haskell) but this is a good one.
Quoting Smalltalk has no “if” statement:
Some of the audience may be thinking
that this is evidence confirming their
suspicions that Smalltalk is weird,
but what I’m going to tell you is
this:
An “if” statement is an abomination in an Object Oriented language.
Why? Well, an OO language is composed
of classes, objects and methods, and
an “if” statement is inescapably none
of those. You can’t write “if” in an
OO way. It shouldn’t exist.
Conditional execution, like everything
else, should be a method. A method of
what? Boolean.
Now, funnily enough, in Smalltalk,
Boolean has a method called
ifTrue:ifFalse: (that name will look
pretty odd now, but pass over it for
now). It’s abstract in Boolean, but
Boolean has two subclasses: True and
False. The method is passed two blocks
of code. In True, the method simply
runs the code for the true case. In
False, it runs the code for the false
case. Here’s an example that hopefully
explains:
(x >= 0) ifTrue: [
'Positive'
] ifFalse: [
'Negative'
]
You should be able to see ifTrue: and
ifFalse: in there. Don’t worry that
they’re not together.
The expression (x >= 0) evaluates to
true or false. Say it’s true, then we
have:
true ifTrue: [
'Positive'
] ifFalse: [
'Negative'
]
I hope that it’s fairly obvious that
that will produce ‘Positive’.
If it was false, we’d have:
false ifTrue: [
'Positive'
] ifFalse: [
'Negative'
]
That produces ‘Negative’.
OK, that’s how it’s done. What’s so
great about it? Well, in what other
language can you do this? More
seriously, the answer is that there
aren’t any special cases in this
language. Everything can be done in an
OO way, and everything is done in an
OO way.
I definitely recommend reading the whole post and Code is an object from the same author as well.
That website is against using if statements for checking if an object has a specific type. This is completely different from if (foo == 5). It's bad to use ifs like if (foo instanceof pickle). The alternative, using polymorphism instead, promotes encapsulation, making code infinitely easier to debug, maintain, and extend.
Being against ifs in general (doing a certain thing based on a condition) will gain you nothing. Notice how all the other answers here still make decisions, so what's really the difference?
Explanation of the why behind polymorphism:
Take this situation:
void draw(Shape s) {
if (s instanceof Rectangle)
//treat s as rectangle
if (s instanceof Circle)
//treat s as circle
}
It's much better if you don't have to worry about the specific type of an object, generalizing how objects are processed:
void draw(Shape s) {
s.draw();
}
This moves the logic of how to draw a shape into the shape class itself, so we can now treat all shapes the same. This way if we want to add a new type of shape, all we have to do is write the class and give it a draw method instead of modifying every conditional list in the whole program.
This idea is everywhere in programming today, the whole concept of interfaces is all about polymorphism. (Shape is an interface defining a certain behavior, allowing us to process any type that implements the Shape interface in our method.) Dynamic programming languages take this even further, allowing us to pass any type that supports the necessary actions into a method. Which looks better to you? (Python-style pseudo-code)
def multiply(a,b):
if (a is string and b is int):
//repeat a b times.
if (a is int and b is int):
//multiply a and b
or using polymorphism:
def multiply(a,b):
return a*b
You can now use any 2 types that support the * operator, allowing you to use the method with types that haven't event been created yet.
See polymorphism and what is polymorhism.
Though not OOP-related: In Prolog, the only way to write your whole application is without if statements.
Yes actually, you can have a turing-complete language that has no "if" per se and only allows "while" statements:
http://cseweb.ucsd.edu/classes/fa08/cse200/while.html
As for OO design, it makes sense to use an inheritance pattern rather than switches based on a type field in certain cases... That's not always feasible or necessarily desirable though.
#ennuikiller: conditionals would just be a matter of syntactic sugar:
if (test) body; is equivalent to x=test; while (x) {x=nil; body;}
if-then-else is a little more verbose:
if (test) ifBody; else elseBody;
is equivalent to
x = test; y = true;
while (x) {x = nil; y = nil; ifBody;}
while (y) {y = nil; elseBody;}
the primitive data structure is a list of lists. you could say 2 scalars are equal if they are lists of the same length. you would loop over them simultaneously using the head/tail operators and see if they stop at the same point.
of course that could all be wrapped up in macros.
The simplest turing complete language is probably iota. It contains only 2 symbols ('i' and '*').
Yep. if statements imply branches which can be very costly on a lot of modern processors - particularly PowerPC. Many modern PCs do a lot of pipeline re-ordering and so branch mis-predictions can cost an order of >30 cycles per branch miss.
On console programming it's sometimes faster to just execute the code and ignore it than check if you should execute it!
Simple branch avoidance in C:
if (++i >= 15)
{
i = 0;
)
can be re-written as
i = (i + 1) & 15;
However, if you want to see some real anti-if fu then read this
Oh and on the OOP question - I'll replace a branch mis-prediction with a virtual function call? No thanks....
The reasoning behind the "anti-if" campaign is similar to what Kent Beck said:
Good code invariably has small methods and
small objects. Only by factoring the system into many small pieces of state
and function can you hope to satisfy the “once and only once” rule. I get lots
of resistance to this idea, especially from experienced developers, but no one
thing I do to systems provides as much help as breaking it into more pieces.
If you don't know how to factor a program with composition and inheritance, then your classes and methods will tend to grow bigger over time. When you need to make a change, the easiest thing will be to add an IF somewhere. Add too many IFs, and your program will become less and less maintainable, and still the easiest thing will be to add more IFs.
You don't have to turn every IF into an object collaboration; but it's a very good thing when you know how to :-)
You can define True and False with objects (in a pseudo-python):
class True:
def if(then,else):
return then
def or(a):
return True()
def and(a):
return a
def not():
return False()
class False:
def if(then,else):
return false
def or(a):
return a
def and(a):
return False()
def not():
return True()
I think it is an elegant way to construct booleans, and it proves that you can replace every if by polymorphism, but that's not the point of the anti-if campaign. The goal is to avoid writing things such as (in a pathfinding algorithm) :
if type == Block or type == Player:
# You can't pass through this
else:
# You can
But rather call a is_traversable method on each object. In a sense, that's exactly the inverse of pattern matching. "if" is useful, but in some cases, it is not the best solution.
I assume you are actually asking about replacing if statements that check types, as opposed to replacing all if statements.
To replace an if with polymorphism requires a method in a common supertype you can use for dispatching, either by overriding it directly, or by reusing overridden methods as in the visitor pattern.
But what if there is no such method, and you can't add one to a common supertype because the super types are not maintained by you? Would you really go to the lengths of introducing a new supertype along with subtypes just to get rid of a single if? That would be taking purity a bit far in my opinion.
Also, both approaches (direct overriding and the visitor pattern) have their disadvantages: Overriding the method directly requires that you implement your method in the classes you want to switch on, which might not help cohesion. On the other hand, the visitor pattern is awkward if several cases share the same code. With an if you can do:
if (o instanceof OneType || o instanceof AnotherType) {
// complicated logic goes here
}
How would you share the code with the visitor pattern? Call a common method? Where would you put that method?
So no, I don't think replacing such if statements is always an improvement. It often is, but not always.
I used to write code a lot as the recommend in the anti-if campaign, using either callbacks in a delegate dictionary or polymorphism.
It's quite a beguiling argument, especially if you are dealing with messy code bases but to be honest, although it's great for a plugin model or simplifying large nested if statements, it does make navigating and readability a bit of a pain.
For example F12 (Go To Definition) in visual studio will take you to an abstract class (or, in my case an interface definition).
It also makes quick visual scanning of a class very cumbersome, and adds an overhead in setting up the delegates and lookup hashes.
Using the recommendations put forward in the anti-if campaign as much as they appear to be recommending looks like 'ooh, new shiny thing' programming to me.
As for the other constructs put forward in this thread, albeit it has been done in the spirit of a fun challenge, are just substitutes for an if statement, and don't really address what the underlying beliefs of the anti-if campaign.
You can avoid ifs in your business logic code if you keep them in your construction code (Factories, builders, Providers etc.). Your business logic code would be much more readable, easier to understand or easier to maintain or extend. See: http://www.youtube.com/watch?v=4F72VULWFvc
Haskell doesn't even have if statements, being pure functional. ;D
You can do it without if per se, but you can't do it without a mechanism that allows you to make a decision based on some condition.
In assembly, there's no if statement. There are conditional jumps.
In Haskell for instance, there's no explicit if, instead, you define a function multiple times, I forgot the exact syntax, but it's something like this:
pseudo-haskell:
def posNeg(x < 0):
return "negative"
def posNeg(x == 0):
return "zero"
def posNeg(x):
return "positive"
When you call posNeg(a), the interpreter will look at the value of a, if it's < 0 then it will choose the first definition, if it's == 0 then it will choose the second definition, otherwise it will default to the third definition.
So while languages like Haskell and SmallTalk don't have the usual C-style if statement, they have other means of allowing you to make decisions.
This is actually a coding game I like to play with programming languages. It's called "if we had no if" which has its origins at: http://wiki.tcl.tk/4821
Basically, if we disallow the use of conditional constructs in the language: no if, no while, no for, no unless, no switch etc.. can we recreate our own IF function. The answer depends on the language and what language features we can exploit (remember using regular conditional constructs is cheating co no ternary operators!)
For example, in tcl, a function name is just a string and any string (including the empty string) is allowed for anything (function names, variable names etc.). So, exploiting this we can do:
proc 0 {true false} {uplevel 1 $false; # execute false code block, ignore true}
proc 1 {true false} {uplevel 1 $true; # execute true code block, ignore flase}
proc _IF {boolean true false} {
$boolean $true $false
}
#usage:
_IF [expr {1<2}] {
puts "this is true"
} {
#else:
puts "this is false"
}
or in javascript we can abuse the loose typing and the fact that almost anything can be cast into a string and combine that with its functional nature:
function fail (discard,execute) {execute()}
function pass (execute,discard) {execute()}
var truth_table = {
'false' : fail,
'true' : pass
}
function _IF (expr) {
return truth_table[!!expr];
}
//usage:
_IF(3==2)(
function(){alert('this is true')},
//else
function(){alert('this is false')}
);
Not all languages can do this sort of thing. But languages I like tend to be able to.
The idea of polymorphism is to call an object without to first verify the class of that object.
That doesn't mean the if statement should not be used at all; you should avoid to write
if (object.isArray()) {
// Code to execute when the object is an array.
} else if (object.inString()) {
// Code to execute if the object is a string.
}
It depends on the language.
Statically typed languages should be able to handle all of the type checking by sharing common interfaces and overloading functions/methods.
Dynamically typed languages might need to approach the problem differently since type is not checked when a message is passed, only when an object is being accessed (more or less). Using common interfaces is still good practice and can eliminate many of the type checking if statements.
While some constructs are usually a sign of code smell, I am hesitant to eliminate any approach to a problem apriori. There may be times when type checking via if is the expedient solution.
Note: Others have suggested using switch instead, but that is just a clever way of writing more legible if statements.
Well, if you're writing in Perl, it's easy!
Instead of
if (x) {
# ...
}
you can use
unless (!x){
# ...
}
;-)
In answer to the question, and as suggested by the last respondent, you need some if statements to detect state in a factory. At that point you then instantiate a set of collaborating classes that solve the state specific problem. Of course, other conditionals would be required as needed, but they would be minimized.
What would be removed of course would be the endless procedural state checking rife in so much service based code.
Interesting smalltalk is mentioned, as that's the language I used before being dragged across into Java. I don't get home as early as I used to.
I thought about adding my two cents: you can optimize away ifs in many languages where the second part of a boolean expression is not evaluated when it won't affect the result.
With the and operator, if the first operand evaluates to false, then there is no need to evaluate the second one. With the or operator, it's the opposite - there's no need to evaluate the second operand if the first one is true. Some languages always behave like that, others offer an alternative syntax.
Here's an if - elseif - else code made in JavaScript by only using operators and anonymous functions.
document.getElementById("myinput").addEventListener("change", function(e) {
(e.target.value == 1 && !function() {
alert('if 1');
}()) || (e.target.value == 2 && !function() {
alert('else if 2');
}()) || (e.target.value == 3 && !function() {
alert('else if 3');
}()) || (function() {
alert('else');
}());
});
<input type="text" id="myinput" />
This makes me want to try defining an esoteric language where blocks implicitly behave like self-executing anonymous functions and return true, so that you would write it like this:
(condition && {
action
}) || (condition && {
action
}) || {
action
}
Scheme uses a single namespace for all variables, regardless of whether they are bound to functions or other types of values. Common Lisp separates the two, such that the identifier "hello" may refer to a function in one context, and a string in another.
(Note 1: This question needs an example of the above; feel free to edit it and add one, or e-mail the original author with it and I will do so.)
However, in some contexts, such as passing functions as parameters to other functions, the programmer must explicitly distinguish that he's specifying a function variable, rather than a non-function variable, by using #', as in:
(sort (list '(9 A) '(3 B) '(4 C)) #'< :key #'first)
I have always considered this to be a bit of a wart, but I've recently run across an argument that this is actually a feature:
...the
important distinction actually lies in the syntax of forms, not in the
type of objects. Without knowing anything about the runtime values
involved, it is quite clear that the first element of a function form
must be a function. CL takes this fact and makes it a part of the
language, along with macro and special forms which also can (and must)
be determined statically. So my question is: why would you want the
names of functions and the names of variables to be in the same
namespace, when the primary use of function names is to appear where a
variable name would rarely want to appear?
Consider the case of class names: why should a class named FOO prevent
the use of variables named FOO? The only time I would be referring the
class by the name FOO is in contexts which expect a class name. If, on
the rare occasion I need to get the class object which is bound to the
class name FOO, there is FIND-CLASS.
This argument does make some sense to me from experience; there is a similar case in Haskell with field names, which are also functions used to access the fields. This is a bit awkward:
data Point = Point { x, y :: Double {- lots of other fields as well --} }
isOrigin p = (x p == 0) && (y p == 0)
This is solved by a bit of extra syntax, made especially nice by the NamedFieldPuns extension:
isOrigin2 Point{x,y} = (x == 0) && (y == 0)
So, to the question, beyond consistency, what are the advantages and disadvantages, both for Common Lisp vs. Scheme and in general, of a single namespace for all values versus separate ones for functions and non-function values?
The two different approaches have names: Lisp-1 and Lisp-2. A Lisp-1 has a single namespace for both variables and functions (as in Scheme) while a Lisp-2 has separate namespaces for variables and functions (as in Common Lisp). I mention this because you may not be aware of the terminology since you didn't refer to it in your question.
Wikipedia refers to this debate:
Whether a separate namespace for functions is an advantage is a source of contention in the Lisp community. It is usually referred to as the Lisp-1 vs. Lisp-2 debate. Lisp-1 refers to Scheme's model and Lisp-2 refers to Common Lisp's model. These names were coined in a 1988 paper by Richard P. Gabriel and Kent Pitman, which extensively compares the two approaches.
Gabriel and Pitman's paper titled Technical Issues of Separation in Function Cells and Value Cells addresses this very issue.
Actually, as outlined in the paper by Richard Gabriel and Kent Pitman, the debate is about Lisp-5 against Lisp-6, since there are several other namespaces already there, in the paper are mentioned type names, tag names, block names, and declaration names. edit: this seems to be incorrect, as Rainer points out in the comment: Scheme actually seems to be a Lisp-1. The following is largely unaffected by this error, though.
Whether a symbol denotes something to be executed or something to be referred to is always clear from the context. Throwing functions and variables into the same namespace is primarily a restriction: the programmer cannot use the same name for a thing and an action. What a Lisp-5 gets out of this is just that some syntactic overhead for referencing something from a different namespace than what the current context implies is avoided. edit: this is not the whole picture, just the surface.
I know that Lisp-5 proponents like the fact that functions are data, and that this is expressed in the language core. I like the fact that I can call a list "list" and a car "car" without confusing my compiler, and functions are a fundamentally special kind of data anyway. edit: this is my main point: separate namespaces are not a wart at all.
I also liked what Pascal Constanza had to say about this.
I've met a similar distinction in Python (unified namespace) vs Ruby (distinct namespaces for methods vs non-methods). In that context, I prefer Python's approach -- for example, with that approach, if I want to make a list of things, some of which are functions while others aren't, I don't have to do anything different with their names, depending on their "function-ness", for example. Similar considerations apply to all cases in which function objects are to be bandied around rather than called (arguments to, and return values from, higher-order functions, etc, etc).
Non-functions can be called, too (if their classes define __call__, in the case of Python -- a special case of "operator overloading") so the "contextual distinction" isn't necessarily clear, either.
However, my "lisp-oid" experience is/was mostly with Scheme rather than Common Lisp, so I may be subconsciously biased by the familiarity with the uniform namespace that in the end comes from that experience.
The name of a function in Scheme is just a variable with the function as its value. Whether I do (define x (y) (z y)) or (let ((x (lambda (y) (z y)))), I'm defining a function that I can call. So the idea that "a variable name would rarely want to appear there" is kind of specious as far as Scheme is concerned.
Scheme is a characteristically functional language, so treating functions as data is one of its tenets. Having functions be a type of their own that's stored like all other data is a way of carrying on the idea.
The biggest downside I see, at least for Common Lisp, is understandability. We can all agree that it uses different namespaces for variables and functions, but how many does it have? In PAIP, Norvig showed that it has "at least seven" namespaces.
When one of the language's classic books, written by a highly respected programmer, can't even say for certain in a published book, I think there's a problem. I don't have a problem with multiple namespaces, but I wish the language was, at the least, simple enough that somebody could understand this aspect of it entirely.
I'm comfortable using the same symbol for a variable and for a function, but in the more obscure areas I resort to using different names out of fear (colliding namespaces can be really hard to debug!), and that really should never be the case.
There's good things to both approaches. However, I find that when it matters, I prefer having both a function LIST and a a variable LIST than having to spell one of them incorrectly.
Aside from getting any real work done, I have an itch. My itch is to write a view engine that closely mimics a template system from another language (Template Toolkit/Perl). This is one of those if I had time/do it to learn something new kind of projects.
I've spent time looking at CoCo/R and ANTLR, and honestly, it makes my brain hurt, but some of CoCo/R is sinking in. Unfortunately, most of the examples are about creating a compiler that reads source code, but none seem to cover how to create a processor for templates.
Yes, those are the same thing, but I can't wrap my head around how to define the language for templates where most of the source is the html, rather than actual code being parsed and run.
Are there any good beginner resources out there for this kind of thing? I've taken a ganer at Spark, which didn't appear to have the grammar in the repo.
Maybe that is overkill, and one could just test-replace template syntax with c# in the file and compile it. http://msdn.microsoft.com/en-us/magazine/cc136756.aspx#S2
If you were in my shoes and weren't a language creating expert, where would you start?
The Spark grammar is implemented with a kind-of-fluent domain specific language.
It's declared in a few layers. The rules which recognize the html syntax are declared in MarkupGrammar.cs - those are based on grammar rules copied directly from the xml spec.
The markup rules refer to a limited subset of csharp syntax rules declared in CodeGrammar.cs - those are a subset because Spark only needs to recognize enough csharp to adjust single-quotes around strings to double-quotes, match curley braces, etc.
The individual rules themselves are of type ParseAction<TValue> delegate which accept a Position and return a ParseResult. The ParseResult is a simple class which contains the TValue data item parsed by the action and a new Position instance which has been advanced past the content which produced the TValue.
That isn't very useful on it's own until you introduce a small number of operators, as described in Parsing expression grammar, which can combine single parse actions to build very detailed and robust expressions about the shape of different syntax constructs.
The technique of using a delegate as a parse action came from a Luke H's blog post Monadic Parser Combinators using C# 3.0. I also wrote a post about Creating a Domain Specific Language for Parsing.
It's also entirely possible, if you like, to reference the Spark.dll assembly and inherit a class from the base CharGrammar to create an entirely new grammar for a particular syntax. It's probably the quickest way to start experimenting with this technique, and an example of that can be found in CharGrammarTester.cs.
Step 1. Use regular expressions (regexp substitution) to split your input template string to a token list, for example, split
hel<b>lo[if foo]bar is [bar].[else]baz[end]world</b>!
to
write('hel<b>lo')
if('foo')
write('bar is')
substitute('bar')
write('.')
else()
write('baz')
end()
write('world</b>!')
Step 2. Convert your token list to a syntax tree:
* Sequence
** Write
*** ('hel<b>lo')
** If
*** ('foo')
*** Sequence
**** Write
***** ('bar is')
**** Substitute
***** ('bar')
**** Write
***** ('.')
*** Write
**** ('baz')
** Write
*** ('world</b>!')
class Instruction {
}
class Write : Instruction {
string text;
}
class Substitute : Instruction {
string varname;
}
class Sequence : Instruction {
Instruction[] items;
}
class If : Instruction {
string condition;
Instruction then;
Instruction else;
}
Step 3. Write a recursive function (called the interpreter), which can walk your tree and execute the instructions there.
Another, alternative approach (instead of steps 1--3) if your language supports eval() (such as Perl, Python, Ruby): use a regexp substitution to convert the template to an eval()-able string in the host language, and run eval() to instantiate the template.
There are sooo many thing to do. But it does work for on simple GET statement plus a test. That's a start.
http://github.com/claco/tt.net/
In the end, I already had too much time in ANTLR to give loudejs' method a go. I wanted to spend a little more time on the whole process rather than the parser/lexer. Maybe in version 2 I can have a go at the Spark way when my brain understands things a little more.
Vici Parser (formerly known as LazyParser.NET) is an open-source tokenizer/template parser/expression parser which can help you get started.
If it's not what you're looking for, then you may get some ideas by looking at the source code.