Finding the Pharo documentation for the compile and evaluate methods, etc. in the compiler class (Smalltalk)

I've got an embarrassingly simple question here. I'm a Smalltalk newbie (I attempt to dabble with it every 5 years or so), and I've got Pharo 6.1 running. How do I go about finding the official standard library documentation? Especially for the compiler class? Things like the compile and evaluate methods? I don't see how to perform a search with the Help Browser, and the method comments in the compiler class are fairly terse and cryptic. I also don't see an obvious link to the standard library API documentation at http://pharo.org/documentation. The books "Pharo by Example" and "Deep into Pharo" don't appear to cover that class either. I imagine the class is probably similar for Squeak and other Smalltalks, so a link to their documentation for the compiler class could be helpful as well.
Thanks!

There are several classes that collaborate in the compilation of a method (or expression) and, given your interest in the subject, I'm tempted to stimulate you even further in their study and understanding.
Generally speaking, the main classes are the Scanner, the Parser, the Compiler and the Encoder. Depending on the dialect these may have slightly different names and implementations but the central idea remains the same.
The Scanner reads the stream of characters of the source code and produces a stream of tokens. These tokens are then parsed by the Parser, which transforms them into the nodes of the AST (Abstract Syntax Tree). Then the Compiler visits the nodes of the AST to analyze them semantically. Here all variable nodes are classified: method arguments, method temporaries, shared, block arguments, block temporaries, etc. It is during this analysis that all variables get bound in their corresponding scope. At this point the AST is no longer "abstract", as it has been annotated with binding information. Finally, the nodes are revisited to generate the literal frame and bytecodes of the compiled method.
Of course, there are lots of things I'm omitting from this summary (pragmas, block closures, etc.) but with these basic ideas in mind you should now be ready to debug a very simple example. For instance, start with
Object compile: 'm ^3'
to internalize the process.
After some stepping into and over, you will reach the first interesting piece of code, which is the method OpalCompiler >> #compile. If we remove the error handling blocks, this method speaks for itself:
compile
    | cm |
    ast := self parse.
    self doSemanticAnalysis.
    self callPlugins.
    cm := ast generate: self compilationContext compiledMethodTrailer.
    ^cm
First we have the #parse message, where the parse nodes are created. Then we have the semantic analysis I mentioned above, and finally #generate: produces the encoding. You should debug each of these methods to understand the compilation process in depth. Given that you are dealing with a tree, be prepared to navigate through a lot of visitors.
Once you become familiar with the main ideas you may want to try more elaborate, yet simple, examples to see other objects entering the scene.
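For instance, the following Playground snippet exercises the same machinery programmatically (a sketch against the Pharo 6-era Opal API; evaluate: and source: are the method names I would expect, but verify them in your image):

"Evaluate an expression through the same compiler the tools use"
Smalltalk compiler evaluate: '3 + 4'.       "Print it => 7"

"Compile the example method into Object, then send it"
Object compile: 'm ^3'.
5 m.                                        "Print it => 3"

"Parse method source into an AST without generating bytecode"
(OpalCompiler new source: 'm ^3 + 4'; yourself) parse inspect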

Here are some simple facts:
Evaluation in Smalltalk is available everywhere: in workspaces, in the Transcript, in browsers, in inspectors, in the debugger, etc. Basically, if you are allowed to edit text, most likely you will also be allowed to evaluate it.
There are four evaluation commands:
Do it (evaluates without showing the answer)
Print it (evaluates and prints the answer next to the expression)
Inspect it (evaluates and opens an inspector on the result)
Debug it (opens a debugger so you can evaluate your expression step by step).
Your expression can contain any literal (numbers, arrays, strings, characters, etc.)
17 "valid expression"
Your expression can contain any message.
3 + 4.
'Hello world' size.
1 bitShift: 28
Your expression can use any Global variable
Object new.
Smalltalk compiler
Your expression can reference self, super, true, nil, false.
SharedRandom globalGenerator next < 0.2 ifTrue: [nil] ifFalse: [self]
Your expression can use any variables declared in the context of the pane where you are writing. For example:
If you are writing in a class browser, self will be bound to the current class
If you are writing in an inspector, self is bound to the object under inspection. You can also use its instance variables in the expression.
If you are in the debugger, your expression can reference self, the instance variables, message arguments, temporaries, etc.
Finally, if you are in a workspace (a.k.a. Playground), you can use any temporaries there, which will be automatically created and remembered, without you having to declare them.
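For example, selecting these lines in a Playground and choosing Print it works without declaring anything (a minimal sketch):

x := 3 + 4.     "x is auto-declared and remembered by the Playground"
y := x * 2.
y               "Print it => 14"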

As near as I can tell, there is no API documentation for the Pharo standard library, like you find with other programming languages. This seems to be confirmed on the Pharo User's mailing list: http://forum.world.st/Essential-Documentation-td4916861.html
...there is a draft version of the ANSI standard available: http://wiki.squeak.org/squeak/uploads/172/standard_v1_9-indexed.pdf
...but that doesn't seem to cover the compiler class.

Related

Using void structs in Raku via NativeCall

I'm trying to link libzip to Raku, and it uses a void struct or a struct with no body, like this:
struct zip;
typedef struct zip zip_t;
I declare it in my Raku program in the same way:
class zip_t is repr('CStruct'){};
This fails with:
Class zip_t has no attributes, which is illegal with the CStruct representation.
The only reference I have found to that error is in this non-addressed issue in MyHTML. That might make it a regression, but I'm really not sure. Any idea?
A google for "no attributes, which is illegal with the CStruct representation" yields three matches. The third leads to the following recent bug/change for module LibZip:
- class zip is repr('CStruct') is export { }
+ class zip is repr('CPointer') is export { }
Before I publish this I see Curt Tilmes has commented to similar effect.
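So the fix appears to be declaring the bodiless struct as an opaque pointer. Here is a sketch of what such a binding might look like (the C signature of zip_open is taken from libzip's documented API; the Raku side is untested and is only meant to show the CPointer idea):

use NativeCall;

# Opaque handle: no attributes, just a typed pointer (CPointer repr)
class zip_t is repr('CPointer') { }

# C: zip_t *zip_open(const char *path, int flags, int *errorp);
sub zip_open(Str $path, int32 $flags, int32 $errorp is rw --> zip_t)
    is native('zip') { * }

my int32 $err = 0;
my $archive = zip_open('example.zip', 0, $err);
say $archive ?? 'opened' !! "zip_open failed with error code $err";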
I know little about C. But I enjoy researching things. This answer is a guess and some notes based on googling.
The error message you've quoted is about NativeCall, which in turn means it's about the Rakudo compiler, not the Raku language. (I presume you know this, and for many folk and situations the distinction isn't typically important, but I think it worth noting in this case.)
The top SO match for a google for "empty struct" is Empty structs in C. The question asks about the semantics of empty structs and using them with a foreign language. The question and its answers seem useful; the next few points are based on excerpts from them.
"Structures with no named members [have undefined behavior]". I'd say this explains why you're getting the NativeCall error message ("no attributes, which is illegal with the CStruct representation".). NativeCall is about having a reliable portable interface to C so it presumably must summarily reject undefined aspects. (Perhaps the error message could hint at what to do instead? Probably not. It's probably better that someone must search for matches to the message, just as you have done. And then presumably they would see this SO.)
I presume you're just trying to bind with libzip's empty struct as part of passing data back and forth without reading or writing it. I suspect that that is the crux of the problem here; how do you bind given that NativeCall (quite reasonably) refuses to do it in the usual manner?
"From the point of view of writing [a foreign language] binding to a C library ... You're never going to do anything with objects of the [empty struct] type except pass them to imported C functions." I presume this is true for your situation and that anything else would be undefined behavior per the C spec and thus at least tied to a specific C compiler for both the C library and the C compiler used to compile Rakudo and quite possibly undefined even then. I presume Curt has asked about your usage in case the binding is doing or requiring something crazy, but I very much doubt that it is.

When is invokedynamic actually useful (besides lazy constants)?

TL;DR
Please provide a piece of code written in some well known dynamic language (e.g. JavaScript), show what that code would look like in Java bytecode using invokedynamic, and explain why the usage of invokedynamic is a step forward here.
Background
I have googled and read quite a lot about the not-that-new-anymore invokedynamic instruction, which everyone on the internet agrees will help speed up dynamic languages on the JVM. Thanks to Stack Overflow I managed to get my own bytecode instructions to run with Sable/Jasmin.
I have understood that invokedynamic is useful for lazy constants and I also think that I understood how the OpenJDK takes advantage of invokedynamic for lambdas.
Oracle has a small example, but as far as I can tell the usage of invokedynamic in this case defeats the purpose, as the "adder" example could be expressed much more simply, faster, and with roughly the same effect with the following bytecode:
aload whereeverAIs
checkcast java/lang/Integer
aload whereeverBIs
checkcast java/lang/Integer
invokestatic IntegerOps/adder(Ljava/lang/Integer;Ljava/lang/Integer;)Ljava/lang/Integer;
because for some reason Oracle's bootstrap method knows that both arguments are integers anyway. They even "admit" that:
[..]it assumes that the arguments [..] will be Integer objects. A bootstrap method requires additional code to properly link invokedynamic [..] if the parameters of the bootstrap method (in this example, callerClass, dynMethodName, and dynMethodType) vary.
Well yes, and without that interesting "additional code" there is no point in using invokedynamic here, is there?
So after that and a couple of further Javadoc and blog entries I think that I have a pretty good grasp on how to use invokedynamic as a poor replacement where invokestatic/invokevirtual/invokeinterface or getfield would work just as well.
Now I am curious how to actually apply the invokedynamic instruction to a real world use case so that it actually is an improvement over what we could do with "traditional" invocations (except lazy constants, I got those...).
Actually, lazy operations are the main advantage of invokedynamic if you take the term “lazy creation” broadly. E.g., the lambda creation feature of Java 8 is a kind of lazy creation that includes the possibility that the actual class containing the code that will be finally invoked by the invokedynamic instruction doesn’t even exist prior to the execution of that instruction.
This can be projected to all kinds of scripting languages delivering code in a different form than Java bytecode (it might even be source code). Here, the code may be compiled right before the first invocation of a method and remains linked afterwards. But it may even become unlinked if the scripting language supports redefinition of methods. This uses the second important feature of invokedynamic: mutable CallSites, which may be changed afterwards while supporting maximal performance when being invoked frequently without redefinition.
This possibility to change an invokedynamic target afterwards allows another option: linking to an interpreted execution on the first invocation, counting the number of executions, and compiling the code only after exceeding a threshold (and relinking to the compiled code then).
Regarding dynamic method dispatch based on a runtime instance, it’s clear that invokedynamic can’t elide the dispatch algorithm. But if you detect at runtime that a particular call site will always call the method of the same concrete type, you may relink the CallSite to optimized code which performs a short check that the target has the expected type, runs the optimized action if so, and branches to the generic code performing the full dynamic dispatch only if that test fails. The implementation may even de-optimize such a call site if it detects that the fast-path check failed a certain number of times.
This is close to how invokevirtual and invokeinterface are optimized internally in the JVM, as for these, too, most call sites see the same concrete type in practice. So with invokedynamic you can use the same technique for arbitrary lookup algorithms.
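As a toy illustration of that counting-and-relinking idea (a sketch, not how any real runtime does it; every name below is made up), you can emulate an invokedynamic call site with a MutableCallSite and swap its target once a threshold is exceeded:

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

public class Relink {
    static final MethodType T = MethodType.methodType(int.class, int.class);
    static final MutableCallSite SITE;
    static {
        try {   // start out linked to the slow, "interpreted" path
            SITE = new MutableCallSite(
                MethodHandles.lookup().findStatic(Relink.class, "slow", T));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }
    static int count;

    static int slow(int x) {
        if (++count > 1000) {   // threshold exceeded: relink to the fast path
            try {
                SITE.setTarget(
                    MethodHandles.lookup().findStatic(Relink.class, "fast", T));
            } catch (ReflectiveOperationException e) {
                throw new AssertionError(e);
            }
        }
        return x + 1;
    }

    static int fast(int x) { return x + 1; }   // the "compiled" target

    public static void main(String[] args) throws Throwable {
        // dynamicInvoker() stands in for what an invokedynamic instruction calls
        MethodHandle invoker = SITE.dynamicInvoker();
        int sum = 0;
        for (int i = 0; i < 2000; i++)
            sum += (int) invoker.invokeExact(i);
        System.out.println(sum);   // same result before and after relinking
    }
}

In real bytecode the call site would be created by a bootstrap method, and the invokeExact call would be the invokedynamic instruction itself; the point is that the site keeps its identity while its behavior changes.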
But if you want an entirely different use case, you can use invokedynamic to implement friend semantics which are not supported by the standard access modifier rules. Suppose you have classes A and B which are meant to have such a friend relationship, in that A is allowed to invoke private methods of B. Then all these invocations may be encoded as invokedynamic instructions with the desired name and signature, pointing to a public bootstrap method in B which may look like this:
public static CallSite bootStrap(Lookup l, String name, MethodType type)
        throws NoSuchMethodException, IllegalAccessException {
    // accept only a full-privilege Lookup object that A itself created
    if (l.lookupClass() != A.class || (l.lookupModes() & 0xf) != 0xf)
        throw new SecurityException("unprivileged caller");
    // relink using B's own Lookup, which has access to B's private methods
    l = MethodHandles.lookup();
    return new ConstantCallSite(l.findStatic(B.class, name, type));
}
It first verifies that the provided Lookup object has full access to A, as only A is capable of constructing such an object. Sneaky attempts by wrong callers are rejected at this point. Then it uses a Lookup object having full access to B to complete the linkage. So each of these invokedynamic instructions is permanently linked to the matching private method of B after the first invocation, running at the same speed as ordinary invocations afterwards.

What is the formal comp. sci. name of this language property?

As a self-taught programmer, my definitions get fuzzy sometimes.
I'm very used to C and ObjC. In both of those your code must adhere to the language "structure". You can only do certain things in certain places. As an example, this is an error:
// beginning of file
NSLog(@"Hello world!"); // can't do this
@implementation MYClass
...
@end
However, in Ruby, anything you put anywhere is executed as the interpreter goes through it. So what is the difference between Ruby and Objective-C that allows this?
At first I thought it was that one was interpreted and the other compiled. Then I read some SO posts and the Wikipedia definitions. Interpreted or compiled is a property of the implementation, not the language. So that would mean there could (theoretically) be an interpreted implementation of Objective-C? In that case, the fact that a statement cannot be outside the implementation can't be a property of compiled languages, and vice versa if there was a compiled implementation of Ruby. Or am I wrong in assuming that different implementations of a language would work the same way?
I'm not sure there's a technical term for it, but in most programming languages the context of the statement is extremely important.
Ruby has a concept of a root or main context where code is allowed. Other scripting languages follow this convention, presumably made popular by languages like Perl which allowed for very concise programming.
This allows things like this to be a complete and valid program:
print "Hello world!\n"
In other languages you need to define an entry point, such as a main routine, that is executed instead. Arbitrary code is not really allowed at the top level, which instead is reserved for things like function, type, constant, structure and class definitions.
A language like Ruby gives you a lot of control over the order in which the code is executed. C, by comparison, is usually composed of separate source files that are then linked together, with no inherent order to the way things are linked. All the modules are simply assembled into the final library or executable. This is why the main entry point is required: it defines which function to run first.
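For contrast, here is a sketch of the C equivalent of the one-line Ruby program above, with the required entry point:

#include <stdio.h>

/* Top level may only contain declarations and definitions;
   execution begins at the designated entry point, main. */
int main(void) {
    printf("Hello world!\n");
    return 0;
}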
In short, it boils down to syntax, context, and language design considerations.
Ruby hides lots of stuff.
Ruby is object-oriented like C++, Objective-C, and Java, and it has a main like C, but you don't see it.
puts(42) is a method call. The implicit receiver is the top-level object, called main. You can see it by typing puts self.
If you don't specify the receiver (receiver.method()), Ruby will use the implicit one, main.
Check available methods:
puts Object.private_methods.sort
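For instance, a quick sketch to try in irb:

puts self          # => main
puts self.class    # => Object

def greet; "hi"; end
# top-level defs become private methods of Object
puts method(:greet).owner    # => Object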
Why can you put anything anywhere?
C and C++ look for a function called main, and when they find it, it is executed.
Ruby, on the other hand, doesn't need a main or any other method/class to run first.
It executes code from the first line until it meets the end of the file (or __END__ on a separate line).
class Strongman
puts "I'm the best!"
end
is (roughly) syntactic sugar for Class.new:
Strongman = Class.new do
puts "I'm the best!"
end
The same goes for module.
for calls each and returns some kind of object, so you may think of it as something similar to a method call.
a = for i in 1..12; 42;end
puts a
# 1..12
In the end, it doesn't matter whether it is a method call or some kind of structure like C's int main(); the programming language decides what it should run first.

Add keyword to Objective-C using Clang

How would I go about adding a relatively trivial keyword to Objective-C using the Clang compiler? For example, adding a literal @yes which maps to [NSNumber numberWithBool:YES].
I have looked at the (excellent) source code for Clang and believe that most of the work I would need to do is in lib/Rewrite/RewriteObjC.cpp. There is the method RewriteObjC::RewriteObjCStringLiteral (see previous link) which does a similar job for literal NSString * instances.
I ask this question as Clang is very modular and I'm not sure which .td (see tablegen) files, .h files and AST visitor passes I would need to modify to achieve my goal.
If I understand the Clang code correctly (I'm still learning, so take this with caution), I think the starting point for this type of addition would be Parser::ParseObjCAtExpression within clang/lib/Parse/ParseObjc.cpp.
One thing to note is that the Parser class is implemented in several files (seemingly separated by input language), but is declared entirely in clang/include/clang/Parse/Parser.h.
Parser has many methods following the pattern of ParseObjCAt, e.g.,
ParseObjCAtExpression
ParseObjCAtStatement
ParseObjCAtDirectives
etc.
Specifically, line 1779 of ParseObjc.cpp appears to be where the parser detects an Objective-C string literal in the form of @"foo". However, it also calls ParsePostfixExpressionSuffix, which I don't fully understand yet. I haven't figured out how it knows to parse a string literal (vs. an @synchronized, for example).
ExprResult Parser::ParseObjCAtExpression(SourceLocation AtLoc) {
...
return ParsePostfixExpressionSuffix(ParseObjCStringLiteral(AtLoc));
...
}
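To the "how does it know" part: the method switches on the token that follows the '@'. Here is a heavily paraphrased sketch of that dispatch (not the verbatim Clang source, and the commented-out tok::objc_yes case is purely hypothetical; a real token kind would first have to be declared in Clang's token definitions):

// Paraphrased shape of Parser::ParseObjCAtExpression (details vary by version)
ExprResult Parser::ParseObjCAtExpression(SourceLocation AtLoc) {
  switch (Tok.getKind()) {
  case tok::string_literal:   // the token after '@' is a string, so: @"foo"
    return ParsePostfixExpressionSuffix(ParseObjCStringLiteral(AtLoc));
  // case tok::objc_yes:      // hypothetical hook for a new @yes literal
  //   return ParseObjCYesLiteral(AtLoc);
  default:
    // @try, @synchronized, etc. are statements, not expressions,
    // and are handled by the statement parser instead
    return ExprError();
  }
}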
If you haven't yet, visit clang's "Getting Started" page to get started with compiling.

What OCaml standard library types cannot be marshalled?

I have a failure marshalling a data structure (error: abstract type (Custom)). There is one known abstract type in use, namely Big_int; however, that marshals fine. There is no custom C code in the application. Apart from Nums, the Unix library is also used (however, I don't believe there are any active objects of that type). We're marshalling with Closures.
Only two third party libraries are in use: OCS Scheme (a Scheme interpreter, pure OCaml) and Dypgen (an extensible GLR parser, also pure OCaml). The problem is with a new feature of Dypgen: saving a dynamically extended parser.
The OCaml error message is next to useless (it doesn't identify which abstract type with the Custom tag is the culprit).
We suspected Lexbuf as the culprit, because it contains a closure over an OCaml channel and can't be marshalled, but it seems this isn't the problem. So my question is:
Which standard library components can't be marshalled?
Weak arrays cannot be marshalled. I am not familiar with OCS Scheme, but I would expect an interpreter for a garbage-collected language written in OCaml to use weak pointers (they let you piggyback on OCaml's memory management).
In OCaml's defense, I do not think that the Custom method block contains the name of the type (retrospectively, that seems like a good thing to have).
EDIT: Yep:
$ grep Weak ~/Downloads/ocs-1.0.3/src/*.ml
/Users/pascal/Downloads/ocs-1.0.3/src/ocs_sym.ml:module SymTable = Weak.Make (HashSymbol)
EDIT2:
As pointed out by ygrek, there is room for a name in the custom method block. I should also clarify that weak arrays are not custom values, since my answer seemed to imply that. Weak arrays have the Abstract tag and are chained using the first word of data so that the garbage collector can traverse them in special weak-pointer-related phases of the collection cycle.
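A minimal repro of the weak-array case (a sketch; I would expect an "abstract value" failure here, though the exact exception and message text may vary between OCaml versions):

(* Marshalling fails on values carrying the Abstract tag, such as weak
   arrays, even when they are buried inside a larger structure. *)
let () =
  let w : int Weak.t = Weak.create 1 in
  try ignore (Marshal.to_string w [Marshal.Closures])
  with Invalid_argument msg | Failure msg -> print_endline msg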