When you define a class in Smalltalk, you can acces the documentation comment like this:
st> Integer comment
'I am the abstract integer class of the GNU Smalltalk system. My
subclasses'' instances can represent signed integers of various
sizes (a subclass is picked according to the size), with varying
efficiency.'
But maybe, I'm to newbie to Smalltalk, but I cannot find how to access to the method/message documentation. i.e.
Let's consider the following method
SomeClass >> #msg: arg
"This is a comment"
^self doThisTo: arg
It is tempting to implement a service for extracting comments such as:
commentOf: aCompiledMethod
^method sourceCode readStream upTo: $"; upTo: $"
In fact, in the example above, we would get the string 'This is a comment'. The problem, however, is that the double quote character is not always a comment delimiter. Consider for instance the following method
String >> #doubleQuoted
^'"', self , '"'
if we tried to use our method #commentOf: above for extracting comments from this method, we would get
' , self, '
which doesn't make any sense.
This means that our parsing should be less naïve. Therefore, the question that we should ask ourselves is whether it is not better to use the Smalltalk Parser of our environment. I don't know how to do this in gnu-smalltalk, so let me show here (as an example) how could we proceed in Pharo:
aCompiledMethod ast comments
The #ast message answers with the Abstract Syntactic Tree of the the method, also known as the Parse Tree, which exposes the syntactic structure of the method's source code. In particular, it detects all comments in the form of CommentNode objects, and that is why the #comments method simply consists in enumerating all parse nodes while collecting the comments.
Related
I am implementing futures in Pharo. I came across this website http://onsmalltalk.com/smalltalk-concurrency-playing-with-futures. I am following this example and trying to replicate it on Pharo. However, I get to this point the last step and I have no idea what ">>" means: This symbol is not also included as part of Smalltalk syntax in http://rigaux.org/language-study/syntax-across-languages-per-language/Smalltalk.html.
BlockClosure>>future
^ SFuture new value: self fixTemps
I can see future is not a variable or a method implemented by BlockClosure. What should I do with this part of the code to make the promises/futures work as indicated at http://onsmalltalk.com/smalltalk-concurrency-playing-with-futures? I cannot add it on the Playground or as a method to my Promise class as it is, or am I missing something?
After adding the future method to BlockClosure, this is the code I try on the PlayGround.
value1 := [200 timesRepeat:[Transcript show: '.']. 6] future.
value2 := [200 timesRepeat:[Transcript show: '+']. 6] future.
Transcript show: 'other work'.
Transcript show: (value1 + value2).
Date today
The transcript displays the below error instead of the expected value of 12.
UndefinedObject>>DoIt (value1 is Undeclared)
UndefinedObject>>DoIt (value2 is Undeclared)
For some reason that it would be nice to learn, there is a traditional notation in Smalltalk to refer to the method with selector, say, m in class C which is C>>m. For example, BlockClosure>>future denotes the method of BlockClosure with selector #future. Interestingly enough, the expression is not an evaluable Smalltalk one, meaning, it is not a Smalltalk expression. It is just a succinct way of saying, "what comes below is the source code of method m in class C". Just that.
In Smalltalk, however, methods are objects too. In fact, they are instances of CompiledMethod. This means that they can be retrieved by sending a message. In this case, the message is methodAt:. The receiver of the message is the class which implements the method and the argument is the selector (respectively, C and #m, or BlockClosure and #future in your example).
Most dialects, therefore, implement a synonym of methodAt: named >>. This is easily done in this way:
>> aSymbol
^self methodAt: aSymbol
This puts the Smalltalk syntax much closer to the traditional notation because now BlockClosure>>future looks like the expression that would send the message >> to BlockClosure with argument future. However, future is not a Symbol unless we prepend it with #, namely #future. So, if we prefix the selector with the # sign, we get the literal Symbol #future, which is a valid Smalltalk object. Now the expression
BlockClosure >> #future
becomes a message, and its result after evaluating it, the CompiledMethod with selector #future in the class BlockClosure.
In sum, BlockClosure>>future is a notation, not a valid Smalltalk expression. However, by tweaking it to be BlockClosure >> #future, it becomes an evaluable expression of the language that returns the method the notation referred to.
I know that everything is an object and you send messages to objects in Smalltalk to do almost everything.
Now how can we implement an object (memory representation and basic operations) to represent a primitive data type? For example how + for integers is implemented?
I looked at the source code for Smalltalk and found this in Smallint.st. Can someone explain this piece of code?
+ arg [
"Sum the receiver and arg and answer another Number"
<category: 'built ins'>
<primitive: VMpr_SmallInteger_plus>
^self generality == arg generality
ifFalse: [self retrySumCoercing: arg]
ifTrue: [(LargeInteger fromInteger: self) + (LargeInteger fromInteger: arg)]
]
Here is the link of above code: https://github.com/gnu-smalltalk/smalltalk/blob/62dab58e5231909c7286f1e61e26c9f503b2b3df/kernel/SmallInt.st
Conceptually speaking primitive methods are pieces of behavior (routines) implemented by the Virtual Machine (VM), not by regular Smalltalk code.
When the Smalltalk compiler finds the statement <primitive: ...> it interprets this as an special type of method whose argument (in your case VMpr_SmallInteger_plus) indicates the integer index of the target routine within the VM.
In this sense a primitive is a global routine not bound to the MethodDictionary of any particular class. The primitive logic is intended for a receiver and arguments of certain classes and that's why it must check that the receiver and the arguments (if any) conform its requirements. If not, the primitive fails and in that case the control flows to the Smalltalk code that follows the <primitive: ...> statement. Otherwise the primitive succeeds and the Smalltalk code below is not executed. Note also that the compiler will not allow for any Smalltalk code other than temporary declaration occurring above the <primitive:...> sentence.
In your example, if the argument arg is not of the expected class (presumably a SmallInteger) the routine gives up trying to sum it to the receiver and delegates the resolution of the operation to the Smalltalk code.
If the argument happens to be a SmallInteger, the primitive will compute the result (using the routine held in the VM) and answer with it.
I haven't seen the code of this primitive but it could also happen that the primitive fails if the result of the sum does not fit in a SmallInteger, in which case both the receiver and the argument would be cast to LargeIntegers and the addition would take place in the #+ method of the appropriate class (LargePositiveInteger or LargeNegativeInteger).
The other branch of the Smalltalk code allows for the implementation of a polymorphic sum between a SmallInteger and any other type of object. For instance this part of the Smalltalk code would take place if you evaluate 3 + 4.0 because in this case the argument is a Float. Something similar happens if you evaluate 3 + (4 / 3), etc.
I'm very new to Smalltalk and would like to understand a few things and confirm others (in order to see if I'm getting the idea or not):
1) In Smalltalk variables are untyped?
2) The only "type check" in Smalltalk occurs when a message is sent and the inheritance hierarchy is climbed up in order to bind the message to a method? And in case the class Object is reached it throws a run time error because the method doesn't exist?
3) There are no coercions because there are no types...?
4) Is it possible to overload methods or operators?
5) Is there some kind of Genericity? I mean, parametric polymorphism?
6) Is there some kind of compatibility/equivalence check for arguments when a message is sent? or when a variable is assigned?
Most questions probably have very short answers (If I'm in the right direction).
1) Variables have no declared types. They are all implicitly references to objects. The objects know what kind they are.
2) There is no implicit type check but you can do explicit checks if you like. Check out the methods isMemberOf: and isKindOf:.
3) Correct. There is no concept of coercion.
4) Operators are just messages. Any object can implement any method so, yes it has overloading.
5) Smalltalk is the ultimate in generic. Variables and collections can contain any object. Languages that have "generics" make the variables and collections more specific. Go figure. Polymorphism is based on the class of the receiver. To do multiple polymorphism use double dispatching.
6) There are no implicit checks. You can add your own explicit checks as needed.
Answer 3) you can change the type of an object using messages like #changeClassTo:, #changeClassToThatOf:, and #adoptInstance:. There are, of course caveats on what can be converted to what. See the method comments.
For the sake of completion, an example from the Squeak image:
Integer>>+ aNumber
"Refer to the comment in Number + "
aNumber isInteger ifTrue:
[self negative == aNumber negative
ifTrue: [^ (self digitAdd: aNumber) normalize]
ifFalse: [^ self digitSubtract: aNumber]].
aNumber isFraction ifTrue:
[^Fraction numerator: self * aNumber denominator + aNumber numerator denominator: aNumber denominator].
^ aNumber adaptToInteger: self andSend: #+
This shows:
that classes work as some kind of 'practical typing', effectively differentiating things that can be summed (see below).
a case of explicitly checking for Type/Class. Of course, if the parameter is not an Integer or Fraction, and does_not_understand #adaptToInteger:andSend:, it will raise a DNU (doesNotUnderstand see below).
some kind of 'coercion' going on, but not implicitly. The last line:
^aNumber adaptToInteger: self andSend: #+
asks the argument to the method to do the appropriate thing to add himself to an integer. This can involve asking the original receiver to return, say, a version of himself as a Float.
(doesn't really show, but insinuates) that #+ is defined in more than one class. Operators are defined as regular methods, they're called binary methods. The difference is some Smalltalk dialects limit them up to two chars length, and their precedence.
an example of dispatching on the type of the receiver and the argument. It uses double dispatch (see 3).
an explicit check where it's needed. Object can be seen as having types (classes), but variables are not. They just hold references to any object, as Smalltalk is dynamically typed.
This also shows that much of Smalltalk is implemented in Smalltalk itself, so the image is always a good place to look for this kind of things.
About DNU errors, they are actually a bit more involved:
When the search reaches the top class in the inheritance chain (presumably ProtoObject) and the method is not found, a #doesNotUndertand: message is sent to the object, with the message not understood as parameter) in case it wants to handle the miss. If #doesNotUnderstand: is not implemented, the lookup once again climbs up to Object, where its implementation is to throw an error.
Note: I'm not sure about the equivalence between Classes and Types, so I tried to be careful about that point.
I'm having a hard time understanding what I'm supposed to do. The only thing I've figured out is I need to use yacc on the cminus.y file. I'm totally confused about everything after that. Can someone explain this to me differently so that I can understand what I need to do?
INTRODUCTION:
We will use lex/flex and yacc/Bison to generate an LALR parser. I will give you a file called cminus.y. This is a yacc format grammar file for a simple C-like language called C-minus, from the book Compiler Construction by Kenneth C. Louden. I think the grammar should be fairly obvious.
The Yahoo group has links to several descriptions of how to use yacc. Now that you know flex it should be fairly easy to learn yacc. The only base type is int. An int is 4 bytes. Booleans are handled as ints, as in C. (Actually the grammar allows you to declare a variable as a type void, but let's not do that.) You can have one-dimensional arrays.
There are no pointers, but references to array elements should be treated as pointers (as in C).
The language provides for assignment, IF-ELSE, WHILE, and function calls and returns.
We want our compiler to output MIPS assembly code, and then we will be able to run it on SPIM. For a simple compiler like this with no optimization, an IR should not be necessary. We can output assembly code directly in one pass. However, our first step is to generate a symbol table.
SYMBOL TABLE:
I like Dr. Barrett’s approach here, which uses a lot of pointers to handle objects of different types. In essence the elements of the symbol table are identifier, type and pointer to an attribute object. The structure of the attribute object will differ according to the type. We only have a small number of types to deal with. I suggest using a linear search to find symbols in the table, at least to start. You can change it to hashing later if you want better performance. (If you want to keep in C, you can do dynamic allocation of objects using malloc.)
First you need to make a list of all the different types of symbols that there are—there are not many—and what attributes would be necessary for each. Be sure to allow for new attributes to be added, because we
have not covered all the issues yet. Looking at the grammar, the question of parameter lists for functions is a place where some thought needs to be put into the design. I suggest more symbol table entries and pointers.
TESTING:
The grammar is correct, so taking the existing grammar as it is and generating a parser, the parser will accept a correct C-minus program but it won’t produce any output, because there are no code snippets associated with the rules.
We want to add code snippets to build the symbol table and print information as it does so.
When an identifier is declared, you should print the information being entered into the symbol table. If a previous declaration of the same symbol in the same scope is found, an error message should be printed.
When an identifier is referenced, you should look it up in the table to make sure it is there. An error message should be printed if it has not been declared in the current scope.
When closing a scope, warnings should be generated for unreferenced identifiers.
Your test input should be a correctly formed C-minus program, but at this point nothing much will happen on most of the production rules.
SCOPING:
The most basic approach has a global scope and a scope for each function declared.
The language allows declarations within any compound statement, i.e. scope nesting. Implementing this will require some kind of scope numbering or stacking scheme. (Stacking works best for a one-pass
compiler, which is what we are building.)
(disclaimer) I don't have much experience with compiler classes (as in school courses on compilers) but here's what I understand:
1) You need to use the tools mentioned to create a parser which, when given input will tell the user if the input is a correct program as to the grammar defined in cminus.y. I've never used yacc/bison so I don't know how it is done, but this is what seems to be done:
(input) file-of-some-sort which represents output to be parsed
(output) reply-of-some-sort which tells if the (input) is correct with respect to the provided grammar.
2) It also seems that the output needs to check for variable consistency (ie, you can't use a variable you haven't declared same as any programming language), which is done via a symbol table. In short, every time something is declared you add it to the symbol table. When you encounter an identifier, if it is not one of the language identifiers (like if or while or for), you'll look it up in the symbol table to determine if it has been declared. If it is there, go on. If it's not - print some-sort-of-error
Note: point(2) there is a simplified take on a symbol table; in reality there's more to them than I just wrote but that should get you started.
I'd start with yacc examples - see what yacc can do and how it does it. I guess there must be some big example-complete-with-symbol-table out there which you can read to understand further.
Example:
Let's take input A:
int main()
{
int a;
a = 5;
return 0;
}
And input B:
int main()
{
int a;
b = 5;
return 0;
}
and assume we're using C syntax for parsing. Your parser should deem Input A all right, but should yell "b is undeclared" for Input B.
Aside from getting any real work done, I have an itch. My itch is to write a view engine that closely mimics a template system from another language (Template Toolkit/Perl). This is one of those if I had time/do it to learn something new kind of projects.
I've spent time looking at CoCo/R and ANTLR, and honestly, it makes my brain hurt, but some of CoCo/R is sinking in. Unfortunately, most of the examples are about creating a compiler that reads source code, but none seem to cover how to create a processor for templates.
Yes, those are the same thing, but I can't wrap my head around how to define the language for templates where most of the source is the html, rather than actual code being parsed and run.
Are there any good beginner resources out there for this kind of thing? I've taken a ganer at Spark, which didn't appear to have the grammar in the repo.
Maybe that is overkill, and one could just test-replace template syntax with c# in the file and compile it. http://msdn.microsoft.com/en-us/magazine/cc136756.aspx#S2
If you were in my shoes and weren't a language creating expert, where would you start?
The Spark grammar is implemented with a kind-of-fluent domain specific language.
It's declared in a few layers. The rules which recognize the html syntax are declared in MarkupGrammar.cs - those are based on grammar rules copied directly from the xml spec.
The markup rules refer to a limited subset of csharp syntax rules declared in CodeGrammar.cs - those are a subset because Spark only needs to recognize enough csharp to adjust single-quotes around strings to double-quotes, match curley braces, etc.
The individual rules themselves are of type ParseAction<TValue> delegate which accept a Position and return a ParseResult. The ParseResult is a simple class which contains the TValue data item parsed by the action and a new Position instance which has been advanced past the content which produced the TValue.
That isn't very useful on it's own until you introduce a small number of operators, as described in Parsing expression grammar, which can combine single parse actions to build very detailed and robust expressions about the shape of different syntax constructs.
The technique of using a delegate as a parse action came from a Luke H's blog post Monadic Parser Combinators using C# 3.0. I also wrote a post about Creating a Domain Specific Language for Parsing.
It's also entirely possible, if you like, to reference the Spark.dll assembly and inherit a class from the base CharGrammar to create an entirely new grammar for a particular syntax. It's probably the quickest way to start experimenting with this technique, and an example of that can be found in CharGrammarTester.cs.
Step 1. Use regular expressions (regexp substitution) to split your input template string to a token list, for example, split
hel<b>lo[if foo]bar is [bar].[else]baz[end]world</b>!
to
write('hel<b>lo')
if('foo')
write('bar is')
substitute('bar')
write('.')
else()
write('baz')
end()
write('world</b>!')
Step 2. Convert your token list to a syntax tree:
* Sequence
** Write
*** ('hel<b>lo')
** If
*** ('foo')
*** Sequence
**** Write
***** ('bar is')
**** Substitute
***** ('bar')
**** Write
***** ('.')
*** Write
**** ('baz')
** Write
*** ('world</b>!')
class Instruction {
}
class Write : Instruction {
string text;
}
class Substitute : Instruction {
string varname;
}
class Sequence : Instruction {
Instruction[] items;
}
class If : Instruction {
string condition;
Instruction then;
Instruction else;
}
Step 3. Write a recursive function (called the interpreter), which can walk your tree and execute the instructions there.
Another, alternative approach (instead of steps 1--3) if your language supports eval() (such as Perl, Python, Ruby): use a regexp substitution to convert the template to an eval()-able string in the host language, and run eval() to instantiate the template.
There are sooo many thing to do. But it does work for on simple GET statement plus a test. That's a start.
http://github.com/claco/tt.net/
In the end, I already had too much time in ANTLR to give loudejs' method a go. I wanted to spend a little more time on the whole process rather than the parser/lexer. Maybe in version 2 I can have a go at the Spark way when my brain understands things a little more.
Vici Parser (formerly known as LazyParser.NET) is an open-source tokenizer/template parser/expression parser which can help you get started.
If it's not what you're looking for, then you may get some ideas by looking at the source code.