I'm new to Smalltalk and I'm impressed by the fact that there are just six keywords in the language (self, super, true, false, nil & thisContext), and by how pure it is in having almost everything happen through message passing, e.g. looping using whileTrue:, if/else using ifTrue:, etc., which is very different from what I'm used to in other languages.
Yet, there are cases where I just cannot make sense of how message passing really fits in, including:
the assignment operator :=
the cascading operator ;
the period operator .
the way to create a set #( ... )
These aren't message passing, right?
As you've discovered, there's still some actual Smalltalk syntax. Block construction, literal strings/symbols/comments, local variable declaration (|...|), and returning (^) are a few things you didn't mention which are also syntax.
Some extensions (e.g. #(...), which typically creates an Array, not a set) are certainly expressible otherwise, for example #(1 2 3) is equivalent to Array with: 1 with: 2 with: 3; they're just there to make the code easier to read and write.
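To see how these conveniences reduce to plain message sends, here is a small workspace sketch (standard Pharo/Squeak, no extra assumptions):
"A cascade sends several messages to the same receiver..."
Transcript show: 'a'; show: 'b'; cr.
"...and is equivalent to naming the receiver in separate statements,
 where the period simply separates statements:"
Transcript show: 'a'.
Transcript show: 'b'.
Transcript cr.
"A literal array is equivalent to building one with messages:"
#(1 2 3) = (Array with: 1 with: 2 with: 3).   "=> true"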
One thing that might help clarify things: self, super, true, false, nil & thisContext are data primitives rather than keywords.
They are the only six such primitives, and they are also known as pseudo-variables. Absolutely everything else is an instance of the class Object or one of its subclasses.
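You can verify this in a workspace; even the pseudo-variables denote ordinary objects that answer messages (results as in Pharo/Squeak):
nil class.     "=> UndefinedObject"
true class.    "=> True"
3 class.       "=> SmallInteger"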
Smalltalk has very few predefined keywords, and its entire syntax can be written in a very condensed form.
A famous example is Smalltalk Syntax on a Postcard (link)
exampleWithNumber: x
    | y |
    true & false not & (nil isNil) ifFalse: [self halt].
    y := self size + super size.
    #($a #a 'a' 1 1.0)
        do: [ :each |
            Transcript show: (each class name);
                show: ' '].
    ^x < y
Here's the comment for this method - which is larger than the method itself:
"A method that illustrates every part of Smalltalk method syntax
except primitives. It has unary, binary, and keyword messages,
declares arguments and temporaries, accesses a global variable
(but not an instance variable), uses literals (array, character,
symbol, string, integer, float), uses the pseudo variables
true, false, nil, self, and super, and has sequence, assignment,
return and cascade. It has both zero argument and one argument blocks."
I am implementing futures in Pharo. I came across this website http://onsmalltalk.com/smalltalk-concurrency-playing-with-futures. I am following this example and trying to replicate it on Pharo. However, I get to the last step and I have no idea what ">>" means: this symbol is also not listed as part of Smalltalk syntax at http://rigaux.org/language-study/syntax-across-languages-per-language/Smalltalk.html.
BlockClosure>>future
    ^ SFuture new value: self fixTemps
I can see future is not a variable or a method implemented by BlockClosure. What should I do with this part of the code to make the promises/futures work as indicated at http://onsmalltalk.com/smalltalk-concurrency-playing-with-futures? I cannot add it on the Playground or as a method to my Promise class as it is, or am I missing something?
After adding the future method to BlockClosure, this is the code I try in the Playground.
value1 := [200 timesRepeat:[Transcript show: '.']. 6] future.
value2 := [200 timesRepeat:[Transcript show: '+']. 6] future.
Transcript show: 'other work'.
Transcript show: (value1 + value2).
Date today
The transcript displays the below error instead of the expected value of 12.
UndefinedObject>>DoIt (value1 is Undeclared)
UndefinedObject>>DoIt (value2 is Undeclared)
For reasons that it would be nice to learn, there is a traditional notation in Smalltalk for referring to the method with selector, say, m, in class C: C>>m. For example, BlockClosure>>future denotes the method of BlockClosure with selector #future. Interestingly enough, this is not an evaluable Smalltalk expression. It is just a succinct way of saying, "what comes below is the source code of method m in class C". Just that.
In Smalltalk, however, methods are objects too. In fact, they are instances of CompiledMethod. This means that they can be retrieved by sending a message. In this case, the message is methodAt:. The receiver of the message is the class which implements the method and the argument is the selector (respectively, C and #m, or BlockClosure and #future in your example).
Most dialects, therefore, implement a synonym of methodAt: named >>. This is easily done in this way:
>> aSymbol
    ^self methodAt: aSymbol
This brings the Smalltalk syntax much closer to the traditional notation, because now BlockClosure>>future looks like an expression that sends the message >> to BlockClosure with argument future. However, future is not a Symbol unless we prepend it with #, namely #future. So, if we prefix the selector with the # sign, we get the literal Symbol #future, which is a valid Smalltalk object. Now the expression
BlockClosure >> #future
becomes a message send, and its result, after evaluation, is the CompiledMethod with selector #future in the class BlockClosure.
In sum, BlockClosure>>future is a notation, not a valid Smalltalk expression. However, by tweaking it into BlockClosure >> #future, it becomes an evaluable expression of the language that returns the method the notation refers to.
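As a quick check in a Pharo Playground (using #value, a selector BlockClosure certainly implements, since #future only exists once you add it):
(BlockClosure >> #value) class.      "=> CompiledMethod"
(BlockClosure >> #value) selector.   "=> #value"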
I want to assign literals to some of the variables at the end of the file containing my program, but use those variables earlier. The only method I've come up with is the following:
my $text;
say $text;
BEGIN {
    $text = "abc";
}
Is there a better / more idiomatic way?
Just go functional.
Create subroutines instead:
say text();
sub text { "abc" }
UPDATE (Thanks raiph! Incorporating your feedback, including reference to using term:<>):
In the above code, I originally omitted the parentheses for the call to text, but it would be more maintainable to always include them to prevent the parser misunderstanding our intent. For example,
say text(); # "abc"
say text() ~ text(); # "abcabc"
say text; # "abc", interpreted as: say text()
say text ~ text; # ERROR, interpreted as: say text(~text())
sub text { "abc" };
To avoid this, you could make text a term, which effectively makes the bareword text behave the same as text():
say text; # "abc", interpreted as: say text()
say text ~ text; # "abcabc", interpreted as: say text() ~ text()
sub term:<text> { "abc" };
For compile-time optimizations and warnings, we can also add the pure trait to it (thanks Brad Gilbert!). is pure asserts that for a given input, the function "always produces the same output without any additional side effects":
say text; # "abc", interpreted as: say text()
say text ~ text; # "abcabc", interpreted as: say text() ~ text()
sub term:<text> is pure { "abc" };
Unlike Perl 5, in Perl 6 a BEGIN does not have to be a block. However, the lexical definition must be seen before it can be used, so the BEGIN must still appear before the say.
BEGIN my $text = "abc";
say $text;
Not sure whether this constitutes an answer to your question or not.
First, a rephrase of your question:
What options are there for succinctly referring to a variable (or constant etc.) whose initialization code appears further down in the same source file?
Post declare a routine
say foo;
sub foo { 'abc' }
When a P6 compiler parses an identifier that has no sigil, it checks to see if it has already seen a declaration of that identifier. If it hasn't, then it assumes that the identifier corresponds to a routine which will be declared later as a "listop" routine (which takes zero or more arguments) and moves on. (If its assumption turns out to be wrong, it fails the compilation.)
So you can use routines as if they were variables as described in Christopher Bottom's answer.
Autodeclare a variable on first use
strict is a "pragma" that controls how a P6 compiler reacts when it parses an as yet undeclared variable/constant that starts with a sigil.
P6 starts programs with strict mode switched on. This means that the compiler will insist on predeclaration of any sigil'd variable/constant. (By predeclaration I mean an explicit declaration that appears textually before the variable/constant is used.)
But you can write use strict or no strict to control whether the strict pragma is on or off in a given lexical scope, so this will work:
no strict;
say $text;
BEGIN {
    $text = "abc";
}
Warning Having no strict in effect (which is unfortunately how most programming languages work) makes accidental misspelling of variable names a bigger nuisance than it is with use strict mode on.
Declare a variable explicitly in the same statement as its first use
You don't have to write a declaration as a separate statement. You can instead declare and use a variable in the same statement or expression:
say my $text;
BEGIN {
    $text = "abc";
}
Warning If you repeat my $bar in the exact same lexical scope, the compiler will emit a warning. In contrast, say my $bar = 42; if foo { say my $bar = 99 } creates two distinct $bar variables without warning.
Initialize at run-time
The BEGIN phaser shown above runs at compile-time (after the my declaration, which also happens at compile-time, but before the say, which happens at run-time).
If you want to initialize variables/constants at run-time instead, use INIT instead:
say my $text;
INIT {
    $text = "abc";
}
INIT code runs before any other run-time code, so the initialization still happens before the say gets executed.
Use a positronic (ym) variable
Given a literal interpretation of your question, a "positronic" or ym variable would be yet another solution. (This feature is not built-in. I'm including it mostly because I encountered it after answering this question and think it belongs here, at the very least for entertainment value.)
Initialization and calculation of such a variable starts in the last statement using it and occurs backwards relative to the textual order of the code.
This is one of the several crazy sounding but actually working and useful concepts that Damian "mad scientist" Conway discusses in his 2011 presentation Temporally Quaquaversal Virtual Nanomachine Programming In Multiple Topologically Connected Quantum-Relativistic Parallel Spacetimes... Made Easy!.
Here's a link to the bit where he focuses on these variables.
(The whole presentation is a delight, especially if you're interested in physics; programming techniques; watching highly creative wunderkinds; and/or enjoy outstanding presentation skills and humor.)
Create a PS pragma?
In terms of coolness, the following pales in comparison to Damian's positronic variable feature that I just covered, but it's an idea I had while pondering this question.
Someone could presumably implement something like the following pragma:
use PS;
say $text;
BEGIN $text = 'abc';
This PS would lexically apply no strict and would additionally require the following, to avoid a compile-time error:
An auto-declared variable/constant must match up with a post declaration in a BEGIN or INIT phaser;
The declaration must include initialization if the first use (textually) of a variable/constant is not a binding or assignment.
I know that everything is an object and you send messages to objects in Smalltalk to do almost everything.
Now how can we implement an object (memory representation and basic operations) to represent a primitive data type? For example how + for integers is implemented?
I looked at the source code of GNU Smalltalk and found this in SmallInt.st. Can someone explain this piece of code?
+ arg [
    "Sum the receiver and arg and answer another Number"
    <category: 'built ins'>
    <primitive: VMpr_SmallInteger_plus>
    ^self generality == arg generality
        ifFalse: [self retrySumCoercing: arg]
        ifTrue: [(LargeInteger fromInteger: self) + (LargeInteger fromInteger: arg)]
]
Here is the link to the above code: https://github.com/gnu-smalltalk/smalltalk/blob/62dab58e5231909c7286f1e61e26c9f503b2b3df/kernel/SmallInt.st
Conceptually speaking, primitive methods are pieces of behavior (routines) implemented by the Virtual Machine (VM), not by regular Smalltalk code.
When the Smalltalk compiler finds the statement <primitive: ...>, it interprets this as a special type of method whose argument (in your case VMpr_SmallInteger_plus) indicates the integer index of the target routine within the VM.
In this sense a primitive is a global routine not bound to the MethodDictionary of any particular class. The primitive logic is intended for a receiver and arguments of certain classes, which is why it must check that the receiver and the arguments (if any) conform to its requirements. If not, the primitive fails, and in that case control flows to the Smalltalk code that follows the <primitive: ...> statement. Otherwise the primitive succeeds and the Smalltalk code below it is not executed. Note also that the compiler will not allow any Smalltalk code other than temporary declarations above the <primitive: ...> statement.
In your example, if the argument arg is not of the expected class (presumably a SmallInteger) the routine gives up trying to sum it to the receiver and delegates the resolution of the operation to the Smalltalk code.
If the argument happens to be a SmallInteger, the primitive will compute the result (using the routine held in the VM) and answer with it.
I haven't seen the code of this primitive but it could also happen that the primitive fails if the result of the sum does not fit in a SmallInteger, in which case both the receiver and the argument would be cast to LargeIntegers and the addition would take place in the #+ method of the appropriate class (LargePositiveInteger or LargeNegativeInteger).
The other branch of the Smalltalk code allows for the implementation of a polymorphic sum between a SmallInteger and any other type of object. For instance this part of the Smalltalk code would take place if you evaluate 3 + 4.0 because in this case the argument is a Float. Something similar happens if you evaluate 3 + (4 / 3), etc.
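As an illustration of these paths, here is a workspace sketch (results as in Pharo/Squeak; the comments describe what the primitive/fallback machinery does, not the VM code itself):
3 + 4.                     "primitive succeeds: both operands are SmallIntegers => 7"
3 + 4.0.                   "primitive fails: the argument is a Float; the fallback code coerces and retries => 7.0"
SmallInteger maxVal + 1.   "primitive fails: the result overflows; arithmetic falls back to LargeIntegers"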
I'm very new to Smalltalk and would like to understand a few things and confirm others (in order to see if I'm getting the idea or not):
1) In Smalltalk variables are untyped?
2) The only "type check" in Smalltalk occurs when a message is sent and the inheritance hierarchy is climbed up in order to bind the message to a method? And in case the class Object is reached it throws a run time error because the method doesn't exist?
3) There are no coercions because there are no types...?
4) Is it possible to overload methods or operators?
5) Is there some kind of Genericity? I mean, parametric polymorphism?
6) Is there some kind of compatibility/equivalence check for arguments when a message is sent? or when a variable is assigned?
Most questions probably have very short answers (If I'm in the right direction).
1) Variables have no declared types. They are all implicitly references to objects. The objects know what kind they are.
2) There is no implicit type check, but you can do explicit checks if you like. Check out the methods isMemberOf: and isKindOf: (a short sketch follows this list).
3) Correct. There is no concept of coercion.
4) Operators are just messages. Any object can implement any method so, yes it has overloading.
5) Smalltalk is the ultimate in generic. Variables and collections can contain any object. Languages that have "generics" make the variables and collections more specific. Go figure. Polymorphism is based on the class of the receiver. To do multiple polymorphism use double dispatching.
6) There are no implicit checks. You can add your own explicit checks as needed.
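Regarding 2), a minimal workspace sketch (Pharo/Squeak) of those explicit checks:
3 isMemberOf: SmallInteger.    "=> true  (exact class only)"
3 isKindOf: Integer.           "=> true  (class or any superclass)"
3 isKindOf: Float.             "=> false"
3 respondsTo: #printString.    "=> true  (checks protocol rather than class)"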
Answer 3) you can change the class of an object using messages like #changeClassTo:, #changeClassToThatOf:, and #adoptInstance:. There are, of course, caveats about what can be converted to what. See the method comments.
For the sake of completion, an example from the Squeak image:
Integer>>+ aNumber
    "Refer to the comment in Number + "
    aNumber isInteger ifTrue:
        [self negative == aNumber negative
            ifTrue: [^ (self digitAdd: aNumber) normalize]
            ifFalse: [^ self digitSubtract: aNumber]].
    aNumber isFraction ifTrue:
        [^ Fraction numerator: self * aNumber denominator + aNumber numerator
            denominator: aNumber denominator].
    ^ aNumber adaptToInteger: self andSend: #+
This shows:
that classes work as some kind of 'practical typing', effectively differentiating things that can be summed (see below).
a case of explicitly checking for type/class. Of course, if the parameter is not an Integer or a Fraction, and does not understand #adaptToInteger:andSend:, it will raise a DNU (doesNotUnderstand, see below).
some kind of 'coercion' going on, but not implicitly. The last line:
^aNumber adaptToInteger: self andSend: #+
asks the method's argument to do the appropriate thing to add itself to an integer. This can involve asking the original receiver to return, say, a version of itself as a Float.
(doesn't really show, but insinuates) that #+ is defined in more than one class. Operators are defined as regular methods; they're called binary methods. The differences are that some Smalltalk dialects limit their selectors to one or two characters, and that they have their own precedence.
an example of dispatching on the type of the receiver and the argument. It uses double dispatch (see 3).
an explicit check where it's needed. Objects can be seen as having types (classes), but variables do not. They just hold references to any object, as Smalltalk is dynamically typed.
This also shows that much of Smalltalk is implemented in Smalltalk itself, so the image is always a good place to look for this kind of things.
About DNU errors, they are actually a bit more involved:
When the search reaches the top class in the inheritance chain (presumably ProtoObject) and the method is not found, a #doesNotUnderstand: message is sent to the object, with the message that was not understood as the argument, in case it wants to handle the miss. If #doesNotUnderstand: is not overridden anywhere, the lookup once again climbs up to Object, whose implementation of #doesNotUnderstand: is to throw an error.
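For instance, a class can take over such misses by overriding #doesNotUnderstand:; here is a minimal sketch, where LoggingProxy is a hypothetical class, not part of any standard image:
LoggingProxy >> doesNotUnderstand: aMessage
    "Log the missed selector instead of signalling an error."
    Transcript show: 'missed: '; show: aMessage selector printString; cr.
    ^ nil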
Note: I'm not sure about the equivalence between Classes and Types, so I tried to be careful about that point.
Which language has dynamic scope and static typing? I know the language exists, but I can't put my finger on it.
We can try to reason about what such a language might look like. Obviously something like this (using a C-like syntax for demonstration purposes) cannot be allowed, or at least not with the obvious meaning:
int x_plus_(int y) {
    return x + y; // requires that x have type int
}

int three_plus_(int y) {
    double x = 3.0;
    return x_plus_(y); // calls x_plus_ when x has type double
}
So, how to avoid this?
I can think of a few approaches offhand:
Commenters above mention that Fortran pre-'77 had this behavior. That worked because a variable's name determined its type; a function like x_plus_ above would be illegal, because x could never have an integer type. (And likewise one like three_plus_, for that matter, because y would have the same restriction.) Integer variables had to have names beginning with i, j, k, l, m, or n.
Perl uses syntax to distinguish a few broad categories of variables, namely scalars vs. arrays (regular arrays) vs. hashes (associative arrays). Variables belonging to the different categories can have the exact same name, because the syntax distinguishes which one is meant. For example, the expression foo $foo, $foo[0], $foo{'foo'} involves the function foo, the scalar $foo, the array @foo ($foo[0] being the first element of @foo), and the hash %foo ($foo{'foo'} being the value in %foo corresponding to the key 'foo'). Now, to be quite clear, Perl is not statically typed, because there are many different scalar types, and these are not distinguished syntactically. (In particular: all references are scalars, even references to functions or arrays or hashes. So if you use the syntax to dereference a reference to an array, Perl has to check at runtime to see if the value really is an array reference.) But this same approach could be used for a bona fide type system, especially if the type system were a very simple one. With that approach, the x_plus_ method would use an x of type int and would completely ignore the x declared by three_plus_. (Instead, it would use an x of type int that had to be provided by whatever scope called three_plus_.) This could either require some type annotations not included above, or use some form of type inference.
A function's signature could indicate the non-local variables it uses, and their expected types. In the above example, x_plus_ would have the signature "takes one argument of type int; uses a calling-scope x of type int; returns a value of type int". Then, just like how a function that calls x_plus_ would have to pass in an argument of type int, it would also have to provide a variable named x of type int — either by declaring it itself, or by inheriting that part of the type-signature (since calling x_plus_ is equivalent to using an x of type int) and propagating this requirement up to its callers. With this approach, the three_plus_ function above would be illegal, because it would violate the signature of the x_plus_ method it invokes — just the same as if it tried to pass a double as its argument.
The above could just have "undefined behavior"; the compiler wouldn't have to explicitly detect and reject it, but the spec wouldn't impose any particular requirements on how it had to handle it. It would be the responsibility of programmers to ensure that they never invoke a function with incorrectly-typed non-local variables.
Your professor was presumably thinking of #1, since pre-'77 Fortran was an actual real-world language with this property. But the other approaches are interesting to think about. :-)
I haven't found it written down elsewhere, but AXIOM CAS (and various forks, including FriCAS, which is still being actively developed) uses a scripting language called SPAD with both a very novel strong static dependent type system and dynamic scoping (although the dynamic scoping is possibly an unintended implementation bug).
Most of the time the user won't notice, but when they start trying to build closures as in other functional languages, SPAD reveals its dynamic scoping nature:
FriCAS Computer Algebra System
Version: FriCAS 2021-03-06
Timestamp: Mon May 17 10:43:08 CST 2021
-----------------------------------------------------------------------------
Issue )copyright to view copyright notices.
Issue )summary for a summary of useful system commands.
Issue )quit to leave FriCAS and return to shell.
-----------------------------------------------------------------------------
(1) -> foo (x,y) == x + y
Type: Void
(2) -> foo (1,2)
Compiling function foo with type (PositiveInteger, PositiveInteger)
-> PositiveInteger
(2) 3
Type: PositiveInteger
(3) -> foo
(3) foo (x, y) == x + y
Type: FunctionCalled(foo)
(4) -> bar x y == x + y
Type: Void
(5) -> bar
(5) bar x == y +-> x + y
Type: FunctionCalled(bar)
(6) -> (bar 1)
Compiling function bar with type PositiveInteger ->
AnonymousFunction
(6) y +-> #1 + y
Type: AnonymousFunction
(7) -> ((bar 1) 2)
(7) #1 + 2
Type: Polynomial(Integer)
Such behavior is similar to what happens when trying to build a closure using (lambda (x) (lambda (y) (+ x y))) in a dynamically scoped Lisp, such as Emacs Lisp. In fact, the underlying representation of functions is essentially the same as in early Lisp, since AXIOM was first developed on top of an early Lisp implementation on an IBM mainframe.
I believe it is, however, a defect (much like the one John McCarthy's team ran into when implementing the first version of LISP), because the implementor made the parser do uncurrying, as seen in the function definition of bar, but this is unlikely to be useful without the ability to build closures in the language.
It is also worth noting that SPAD automatically renames the variables in anonymous functions to avoid capture, so its dynamic scoping could be used as a feature, as in other Lisps.
Dynamic scope means that the type of a variable at a specific line of your code depends on the functions called before reaching it. This means you cannot know the type at a specific line of your code, because you cannot know which code has been executed before it.
Static typing means that you have to know the type at every line of your code before the code starts to run.
The two are irreconcilable.