Objective-C: What is the difference between an IMP and a function pointer?

I recently started a project in which I need to do method swizzling.
After going through many tutorials, I have a question: what is the difference between an IMP (implementation) and a function pointer?

From memory, an IMP is a memory address, just like a function pointer, and can be invoked just like an ordinary C function. However, it is guaranteed to use the Objective-C messaging convention, where:
The first argument is the object to operate on (self).
The second argument is _cmd, the selector (SEL) being invoked. I believe this is there to support dynamic features such as Objective-C message forwarding, where we could wrap the original implementation in a proxy, say to start a transaction or perform a security check or, for a Cocoa-specific example, to add some property-observation cruft, by magic, at run time. While we already have the function signature, it could be helpful in some cases to know "how did I get here?" from the message signature.
Any following arguments are those defined by the method's contract.
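To make the convention concrete, here is a minimal sketch in plain C rather than against the real runtime. Counter, Selector and CounterIMP are hypothetical stand-ins for id, SEL and IMP; the point is only that the pointer type fixes the first two parameters, and that the function body can inspect _cmd.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the runtime's id and SEL types. */
typedef struct Counter { int value; } Counter;
typedef const char *Selector;

/* Same shape as an IMP: receiver first, selector second, method arguments after. */
typedef int (*CounterIMP)(Counter *self, Selector _cmd, int amount);

/* An implementation that ignores _cmd, as most methods do. */
static int counter_add(Counter *self, Selector _cmd, int amount) {
    assert(self != NULL);
    (void)_cmd; /* unused here, but always passed */
    self->value += amount;
    return self->value;
}

/* One implementation serving two selectors; _cmd answers "how did I get here?". */
static int counter_step(Counter *self, Selector _cmd, int amount) {
    assert(self != NULL && _cmd != NULL);
    if (strcmp(_cmd, "decrementBy:") == 0)
        amount = -amount;
    self->value += amount;
    return self->value;
}
```

Calling through the pointer looks like any other C call, for example `CounterIMP imp = counter_step; imp(&c, "decrementBy:", 3);`. An ordinary function pointer gives no such guarantee about its first two parameters.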

Related

Smalltalk: about the method "withArgs:executeMethod:"

I'm trying to understand the method "withArgs:executeMethod:" in Smalltalk (Squeak).
1. What is the role of this method?
2. What arguments need to be passed to it for it to be carried out?
A good way to understand this method is by considering it as a syntactic variant of the general expression
object msg: arg (*)
where object is the receiver of the message with selector msg: and arg its argument. There are of course variants with no or multiple arguments, but the idea is the same.
When object receives this message (*), the Virtual Machine (VM) looks up the CompiledMethod with selector msg: in the object's class hierarchy and transfers control to it, binding self to object and the formal argument of the method to arg.
Notice that this invocation is managed by the VM, not by the Virtual Image (VI). So, how could we reflect the same in the VI? Well, there are two steps in this behavior: (1) find the method and (2) bind its formal receiver and arguments to the actual ones and let it run.
Step (1) is the so-called lookup algorithm. It is easily implemented in Smalltalk: just ask the receiver for its class, check whether the class includes the selector #msg: and, if not, go to the superclass and repeat. If all checks fail, issue the doesNotUnderstand: message.
Step (2) requires exactly what #withArgs:executeMethod: provides. It allows us to say
object withArgs: {arg} executeMethod: method
where method is the CompiledMethod found in step (1). [We have to use {arg} rather than arg because the plural in withArgs: suggests that the method expects an Array of arguments.]
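The two steps can be modelled outside Smalltalk as well. The following is only a sketch in plain C under assumed names (Class, Entry, lookup and with_args_execute_method are all hypothetical); it is not how Squeak implements either step, but it shows the split between finding a method and running it with an explicit receiver and argument array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct Object Object;
/* A "compiled method": code that runs with a receiver and an argument array. */
typedef int (*Method)(Object *self, const int *args, size_t nargs);

typedef struct { const char *selector; Method method; } Entry;
typedef struct Class {
    const struct Class *superclass;
    const Entry *entries;
    size_t count;
} Class;
struct Object { const Class *cls; int state; };

/* Step (1): walk the class chain until some class has the selector. */
static Method lookup(const Class *cls, const char *selector) {
    for (; cls != NULL; cls = cls->superclass)
        for (size_t i = 0; i < cls->count; i++)
            if (strcmp(cls->entries[i].selector, selector) == 0)
                return cls->entries[i].method;
    return NULL; /* a real system would send doesNotUnderstand: here */
}

/* Step (2): bind receiver and arguments to a given method and let it run. */
static int with_args_execute_method(Object *receiver, const int *args,
                                    size_t nargs, Method method) {
    assert(receiver != NULL && method != NULL);
    return method(receiver, args, nargs);
}

static int add_method(Object *self, const int *args, size_t nargs) {
    assert(nargs == 1);
    return self->state + args[0];
}

/* Derived inherits add: from Base and defines nothing itself. */
static const Entry base_entries[] = { { "add:", add_method } };
static const Class Base = { NULL, base_entries, 1 };
static const Class Derived = { &Base, NULL, 0 };
```

Note that with_args_execute_method never checks where the method came from, which mirrors the remark that the method need not belong to the receiver's class.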
Why would we want this?
Generally speaking, giving the VI the capability to mimic behavior implemented in the VM is good because it makes metaprogramming easier (and more natural).
More practically, a relevant example of the use of this capability is the implementation of Method Wrappers. Briefly described, given any particular method, you can wrap it (as the wrappee) inside a wrapper method, which also has a preBlock. If you then substitute the original method in the MethodDictionary where it belongs, with the wrapper, you can let the wrapper first execute the preBlock and then the intended method. The first task is easy: just send the message preBlock value. For the second we have the method (the wrappee), the receiver and the arguments (if any). So, to complete the task you only need to send to the receiver withArgs:executeMethod: with the actual argument(s) and the wrappee.
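As a rough illustration of the wrapper idea, here is a sketch in plain C. It assumes a one-slot "method dictionary" (slot_add) and a hypothetical Method struct that carries an optional pre-block and wrappee; real Method Wrappers in Smalltalk are objects installed in a MethodDictionary, so treat this only as a model of the control flow.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Receiver { int total; } Receiver;
typedef struct Method Method;
struct Method {
    int (*run)(const Method *method, Receiver *receiver, int arg);
    void (*pre_block)(void); /* used by wrappers only */
    const Method *wrappee;   /* used by wrappers only */
};

/* The pre-block: here a simple call counter, as a coverage tool might use. */
static int calls_seen = 0;
static void count_call(void) { calls_seen++; }

/* The original method. */
static int run_add(const Method *method, Receiver *receiver, int arg) {
    (void)method;
    receiver->total += arg;
    return receiver->total;
}

/* The wrapper: evaluate the pre-block, then execute the wrappee with the
   same receiver and argument. */
static int run_wrapper(const Method *method, Receiver *receiver, int arg) {
    method->pre_block();
    return method->wrappee->run(method->wrappee, receiver, arg);
}

static const Method add_method = { run_add, NULL, NULL };
static const Method wrapped_add = { run_wrapper, count_call, &add_method };

/* One-slot stand-in for the method dictionary entry of "add:". */
static const Method *slot_add = &add_method;

static int send_add(Receiver *receiver, int arg) {
    return slot_add->run(slot_add, receiver, arg);
}
```

Installing the wrapper is then a single assignment, `slot_add = &wrapped_add;`, after which every send runs count_call first and callers see no other difference.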
Ah! Let's not forget to mention that one of the reasons for having Method Wrappers is to measure testing coverage.
Note also that withArgs:executeMethod: does not require the second argument, i.e., the method to execute, to be in any class, let alone the class of the receiver. In particular, you could create a CompiledMethod on the fly and execute it on any given object. Of course, it is up to you to make sure that the execution will not crash the VM by, say, using the third ivar of the receiver if the receiver has only two etc. A simple way to create a CompiledMethod without installing it in any class is by asking the Smalltalk compiler to do so (look for senders of newCompiler to learn how to do that).

What's so special about message passing in smalltalk

I was going through an introduction to Smalltalk.
In C++, the functions declared inside a class can be called on objects of that class; similarly, in Smalltalk a keyword, termed a message, is written next to the name of the object.
(I don't know much yet, but I would also like to ask: is there a unique method that is executed in response to a message?)
Basically, to my naive mind, this seems to be only a difference in syntax style. But, I wonder if internally in terms of compilation or memory structure this difference in calling holds any significance.
Thanks in advance.
P.S : I bow down to all of you for your time and answers . Thanks a lot.
The fundamental difference is that in Smalltalk, the receiver of the message has complete control over how that message is handled. It's a true object, not a data structure with functions that operate on it.
That means that in Smalltalk you can send any message to any object. The compiler places no restrictions on that, it's all handled at runtime. In C++, you can only invoke functions that the compiler knows about.
Also, Smalltalk messages are simply symbols (unique character strings), not a function address in memory as in C++. That means it's easy to send messages interactively, or over a network connection. There is a perform: method that lets you send a message given its string name.
An object even receives messages it does not implement. The Virtual Machine detects that case, creates a Message object, and then sends the doesNotUnderstand: message to the receiver with that Message as the argument. Again, it is solely the object's responsibility to decide how to handle that unknown message. Most objects simply inherit the default implementation, which raises an error, but an object can also handle it itself. It could, for example, forward those messages to a remote object, or log them to a file, etc.
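The fallback can be sketched in plain C. Everything here is hypothetical (Obj, Message, send_message and the does_not_understand hook); each object implements at most one selector to keep the model short. The default hook reports an error, while the proxy's hook forwards the reified message to another object.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct Obj Obj;
typedef struct { const char *selector; int arg; } Message;
typedef int (*Handler)(Obj *self, int arg);

struct Obj {
    const char *selector; /* the single message this object implements, or NULL */
    Handler handler;
    int (*does_not_understand)(Obj *self, const Message *msg);
    Obj *target;          /* used by the forwarding proxy only */
    int value;
};

/* Send: run the method if the receiver has one, otherwise reify the
   message and hand it to the receiver's own fallback hook. */
static int send_message(Obj *receiver, const char *selector, int arg) {
    if (receiver->selector != NULL && strcmp(receiver->selector, selector) == 0)
        return receiver->handler(receiver, arg);
    Message msg = { selector, arg };
    return receiver->does_not_understand(receiver, &msg);
}

/* Default behaviour: signal an error (modelled here as -1). */
static int default_dnu(Obj *self, const Message *msg) {
    (void)self;
    (void)msg;
    return -1;
}

/* Proxy behaviour: forward the unknown message to another object. */
static int forwarding_dnu(Obj *self, const Message *msg) {
    return send_message(self->target, msg->selector, msg->arg);
}

static int add_value(Obj *self, int arg) { return self->value + arg; }
```

A proxy built as `Obj proxy = { NULL, NULL, forwarding_dnu, &real, 0 };` implements nothing itself, yet answers whatever its target answers.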
You call a function in C++ because at compile time you know which function will be called (or at least you have a finite set of functions defined in a class hierarchy).
Smalltalk is dynamically typed and late bound, so at compile time you have no idea which method is going to be evaluated (if one will be at all). Thus you send a message, and if the object has a method with that selector, it is evaluated. Otherwise, the "message not understood" exception is raised.
There are already good answers here. Let me add some details (originally, part of this was in a comment).
In plain C, the target of each function call is determined at link time (except when you use function pointers). C++ adds virtual functions, for which the actual function that will be invoked by a call is determined at runtime (dynamic dispatch, late binding). Function pointers allow for custom dispatch mechanisms to some degree, but you have to program it yourself.
In Smalltalk, all message sends are dynamically dispatched. In C++ terms this roughly means: All member functions are virtual, and there are no standalone functions (there is always a receiver). Therefore, the Smalltalk compiler never* decides which method will be invoked by a message send. Instead, the invoked method is determined at runtime by the Virtual Machine that implements Smalltalk.
One way to implement virtual function dispatching is virtual function tables. An approximate equivalent in Smalltalk is the method dictionary. However, these dictionaries are mutable, unlike typical virtual function tables, which are generated by the C++ compiler and do not change at runtime. All Smalltalk behaviors (Behavior being a superclass of Class) have such a method dictionary. As @aka.nice pointed out in his answer, the method dictionaries can be queried. But methods can also be added (or removed) while the Smalltalk system runs. When the Smalltalk VM dispatches a message send, it searches the method dictionaries of the receiver's class and its superclass chain for the correct method. There are usually caches in place to avoid the recurring cost of that lookup.
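A small sketch in plain C of a mutable method dictionary with a lookup cache. MethodDict and its functions are hypothetical, the cache holds a single entry, and selectors are assumed to be string literals that outlive the dictionary; real VMs use far more elaborate caches. The slow_lookups counter exists only to make the cache's effect observable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*Method)(int arg);
enum { MAX_METHODS = 8 };

typedef struct {
    const char *selectors[MAX_METHODS];
    Method methods[MAX_METHODS];
    size_t count;
    const char *cached_selector; /* one-entry lookup cache */
    Method cached_method;
    unsigned long slow_lookups;  /* number of full dictionary scans */
} MethodDict;

/* Add or replace a method at runtime; any change invalidates the cache. */
static int dict_set(MethodDict *d, const char *selector, Method m) {
    d->cached_selector = NULL;
    for (size_t i = 0; i < d->count; i++)
        if (strcmp(d->selectors[i], selector) == 0) {
            d->methods[i] = m;
            return 1;
        }
    if (d->count == MAX_METHODS)
        return 0; /* dictionary full */
    d->selectors[d->count] = selector;
    d->methods[d->count++] = m;
    return 1;
}

/* Look up a method, consulting the cache before scanning the dictionary. */
static Method dict_lookup(MethodDict *d, const char *selector) {
    if (d->cached_selector != NULL && strcmp(d->cached_selector, selector) == 0)
        return d->cached_method;
    d->slow_lookups++;
    for (size_t i = 0; i < d->count; i++)
        if (strcmp(d->selectors[i], selector) == 0) {
            d->cached_selector = selector;
            d->cached_method = d->methods[i];
            return d->methods[i];
        }
    return NULL;
}

static int twice(int x) { return 2 * x; }
static int thrice(int x) { return 3 * x; }
```

Replacing "scale:" with `dict_set(&d, "scale:", thrice)` takes effect on the very next lookup, which is the property a compiler-generated virtual function table does not offer.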
Also note that message passing is the only way for objects to communicate in Smalltalk. Two objects cannot access each other's instance variables, even if they belong to the same class. In C++, you can write code that breaks this encapsulation. Hence, message sending is fundamental in Smalltalk, whereas in C++ it is basically an optional feature.
In C++, Java, and similar languages, there is another form of dispatch, called function overloading. It happens exclusively at compile time and selects a function based on the declared types of the arguments at the call site. You cannot influence it at runtime. Smalltalk obviously does not provide this form of dispatch because it does not have static typing of variables. It can be realized nevertheless using idioms such as double dispatch. Other languages, such as Common Lisp's CLOS or Groovy, provide the even more general multiple dispatch, which means that a method will be selected based on both the receiver's type and the runtime types of all the arguments.
* Some special messages such as ifTrue: ifFalse: whileTrue: are usually compiled directly to conditional branches and jumps in the bytecode, instead of message sends. But in most cases it does not influence the semantics.
Here are a few examples of what you would not find in C++:
In Smalltalk, you create a new class by sending a message (either to the superclass, or to the namespace depending on the dialect).
In Smalltalk, you compile a new method by sending a message to a Compiler.
In Smalltalk, a Debugger is opened in response to an unhandled exception by sending a message. All the exception handling is implemented in terms of sending messages.
In Smalltalk you can query the methods of a Class, or gather all its instances by sending messages.
More trivially, all control structures (branch, loops, ...) are performed by sending messages.
It's messages all the way down.

Can Foundation tell me whether an Objective-C method requires a special structure return?

Background as I understand it: Objective-C method invocations are basically C function calls with two hidden parameters (the receiver and the selector). The Objective-C runtime contains a function named objc_msgSend() that allows methods to be invoked that way. Unfortunately, when a function returns a struct, some special treatment may be needed. There are arcane (some might say insane) rules that govern whether the structure is returned like other values or whether it's actually returned by reference in a hidden first argument. For Objective-C there's another function called objc_msgSend_stret() that must be used in these cases.
The question: Given a method, can NSMethodSignature or something else tell me whether I have to use objc_msgSend() or objc_msgSend_stret()? So far we have found out that NSMethodSignature knows this, it prints it in its debug output, but there doesn't seem to be a public API.
In case you want to respond with "why on earth would you want to do that?!", please read the following before you do: https://github.com/erikdoe/ocmock/pull/41
Objective-C uses the same underlying ABI as C on a given architecture, because methods are just C functions with implicit self and _cmd arguments.
In other words, if you have a method:
- (SomeStructType)myMeth:(SomeArgType)arg;
then really this is a plain C function:
SomeStructType myMeth(id self, SEL _cmd, SomeArgType arg);
I'm pretty sure you already know that, but I'm merely mentioning it for other readers.
In other words, you want to ask libffi or any kind of similar library how SomeStructType would be returned for that architecture.
NSMethodSignature has a -methodReturnType that you can inspect to see if the return type is a struct. Is this what you're trying to do?
From http://www.sealiesoftware.com/blog/archive/2008/10/30/objc_explain_objc_msgSend_stret.html:
The rules for which struct types return in registers are always
arcane, sometimes insane. ppc32 is trivial: structs never return in
registers. i386 is straightforward: structs with sizeof exactly equal
to 1, 2, 4, or 8 return in registers. x86_64 is more complicated,
including rules for returning floating-point struct fields in FPU
registers, and ppc64's rules and exceptions will make your head spin.
The gory details are documented in the Mac OS X ABI Guide, though as
usual if the documentation and the compiler disagree then the
documentation is wrong.
If you're calling objc_msgSend directly and need to know whether to
use objc_msgSend_stret for a particular struct type, I recommend the
empirical approach: write a line of code that calls your method,
compile it on each architecture you care about, and look at the
assembly code to see which dispatch function the compiler uses.
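The two simplest rules from the quote can be written down directly. This sketch in plain C encodes only what the quoted post states for i386 and ppc32; the function names are made up for illustration, and the x86_64 and ppc64 rules are deliberately left out because the quote does not spell them out. Given the quote's warning that documentation and compiler may disagree, the empirical check remains the safer route.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Quoted i386 rule: structs whose sizeof is exactly 1, 2, 4 or 8 bytes
   return in registers; all other structs are returned through memory. */
static bool i386_struct_returns_in_registers(size_t size) {
    return size == 1 || size == 2 || size == 4 || size == 8;
}

/* Quoted ppc32 rule: structs never return in registers. */
static bool ppc32_struct_returns_in_registers(size_t size) {
    (void)size;
    return false;
}

/* Hypothetical helper: the struct-return messenger would be needed
   exactly when the struct is not returned in registers. */
static bool i386_needs_stret(size_t size) {
    return !i386_struct_returns_in_registers(size);
}
```

For example, `i386_needs_stret(sizeof(SomeStructType))` would be false for an 8-byte struct and true for a 16-byte one under the quoted rule.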

What is the second parameter in "id (*IMP)(id, SEL, ...)" used for?

My question is as the title says. Obviously, the first parameter is used for the receiver, much like the this pointer in C++. What about the second one? Thank you.
The signature of objc_msgSend() is:
id objc_msgSend(id self, SEL op, ...);
Every method call is compiled down to a call to this function. I.e., if you call:
[anArray objectAtIndex:42];
That will be compiled as if it were:
objc_msgSend(anArray, @selector(objectAtIndex:), 42);
Now, to your question: why do methods get compiled down to a function that has the SEL as the second argument? Or, more specifically, why is this method:
- (id)objectAtIndex:(NSUInteger)index;
exactly equivalent to this C function:
id object_at_index(id object, SEL _cmd, NSUInteger index);
The answer is speed speed speed.
Speed
Specifically, by doing this, objc_msgSend() never has to rewrite the stack frame* and it can also use a tail call optimization to jump directly to the method implementation. This is the same reason why you never see objc_msgSend() in backtraces in the debugger (save for when you actually crash/break in the messenger).
objc_msgSend() uses the object and the _cmd to look up the implementation of the method and then, quite literally, jumps to that implementation.
Very fast. Stack frame untouched.
And, as others have stated, having _cmd around in the method implementation can be handy for a variety of reasons. It also means that the messenger can do neat tricks like proxy support via NSInvocation and the like.
*rewriting the stack frame can be insanely complex and expensive. Some of the arguments might be in registers some of the time, etc... All architecture dependent ABI nastiness. One of the biggest challenges to writing things like imp_implementationWithBlock() was figuring out how to do so without touching the stack because doing so would have been too slow and too bloated to be viable.
The purpose of having the second parameter contain the selector is to enable a common dispatch mechanism. As such, the method dispatch code always expects the second parameter to be the selector, and dispatches based on that, or follows the inheritance chain up, or even creates an NSInvocation and calls forwardInvocation:.
Generally, only system-level routines use the selector argument, although it's rather nice to have it when you hit an exception, or when you are in the debugger trying to figure out which routine is giving you difficulties while using forwardInvocation:.
From the documentation:
Discussion
This data type is a pointer to the start of the function that implements the method. This function uses standard C calling conventions as implemented for the current CPU architecture. The first argument is a pointer to self (that is, the memory for the particular instance of this class, or, for a class method, a pointer to the metaclass). The second argument is the method selector. The method arguments follow.
In Objective-C when you call a method you need to know the target, the selector and the eventual arguments. Let's suppose that you are trying to do this manually: how can you know which method to call if you don't know the selector? Do you call some random method? No, you call the right method because you know the method name.

How to implement an IMP function that returns a large struct type determined at run-time?

Background: CamelBones registers Perl classes with the Objective-C runtime.
To do this, every Perl method is registered with the same IMP
function; that function examines its self & _cmd arguments to find
which Perl method to call.
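For readers unfamiliar with the pattern, here is a sketch in plain C of one shared function serving every bridged method. It is not CamelBones code: Instance, Selector, the bindings table and the perl_* routines are all hypothetical stand-ins, and plain C functions take the place of calls into Perl.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct { const char *class_name; } Instance; /* stand-in for id */
typedef const char *Selector;                        /* stand-in for SEL */

/* Hypothetical bridge table: which script-side routine backs each method. */
typedef struct {
    const char *class_name;
    Selector selector;
    int (*routine)(int);
} Binding;

static int perl_double(int x) { return 2 * x; }
static int perl_negate(int x) { return -x; }

static const Binding bindings[] = {
    { "Calc", "double:", perl_double },
    { "Calc", "negate:", perl_negate },
};

/* The one function registered for every bridged method. It uses self and
   _cmd to decide which routine the message was really meant for. */
static int shared_imp(Instance *self, Selector _cmd, int arg) {
    assert(self != NULL && _cmd != NULL);
    for (size_t i = 0; i < sizeof bindings / sizeof bindings[0]; i++)
        if (strcmp(bindings[i].class_name, self->class_name) == 0 &&
            strcmp(bindings[i].selector, _cmd) == 0)
            return bindings[i].routine(arg);
    return 0; /* unknown method: a real bridge would raise an error */
}
```

This works only because every method shares one C return type here, which is exactly the assumption that breaks down once floating-point and struct returns enter the picture.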
This has worked well enough for several years, for messages that were
dispatched with objc_msgSend. But now I want to add support for
returning floating-point and large struct types from Perl methods.
Floating-point isn't hard; I'll simply write another IMP that returns
double, to handle messages dispatched with objc_msgSend_fpret.
The question is what to do about objc_msgSend_stret. Writing a
separate IMP for every possible struct return type is impractical, for
two reasons: First, because even if I did so only for struct types
that are known at compile-time, that's an absurd number of functions.
And second, because we're talking about a framework that can be linked against any arbitrary Objective-C & Perl code, we don't know all the potential struct types when the framework is being compiled.
What I hope to do is write a single IMP that can handle any return
type that's dispatched via objc_msgSend_stret. Could I write it as
returning void, and taking a pointer argument to a return buffer, like
the old objc_msgSend_stret was declared? Even if that happened to
work for now, could I rely on it continuing to work in the future?
Thanks for any advice - I've been racking my brain on this one. :-)
Update:
Here's the advice I received from one of Apple's runtime engineers, on their objc-language mailing list:
You must write assembly code to handle this case.
Your suggestion fails on some architectures, where ABI for "function returning void with a pointer to a struct as the first argument" differs from "function returning a struct". (On i386, the struct address is popped from the stack by the caller in one case and by the callee in the other case.) That's why the prototype for objc_msgSend_stret was changed.
The assembly code would capture the struct return address, smuggle it into non-struct-return C function call without disturbing the rest of the parameters, and then do the right ABI-specific cleanup on exit (ret $4 on i386). Alternatively, the assembly code can capture all of the parameters. The forwarding machinery does something like this. That code might be in open-source CoreFoundation if you want to see what the techniques look like.
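The two function shapes contrasted in the quote look like this in plain C. BigRect and both functions are hypothetical. At the source level they produce the same result, but they have distinct function types (ByValueFn and IntoBufferFn), and nothing in C permits calling one through a pointer to the other; per the quoted advice, the generated code can differ as well.

```c
#include <assert.h>

typedef struct { double x, y, width, height; } BigRect;

/* Shape 1: "function returning a struct". The compiler decides how the
   value travels back to the caller (registers or a hidden buffer). */
static BigRect make_rect_by_value(double width, double height) {
    BigRect r = { 0.0, 0.0, width, height };
    return r;
}

/* Shape 2: "function returning void with a pointer to a struct as the
   first argument". The buffer is an ordinary, visible parameter. */
static void make_rect_into(BigRect *out, double width, double height) {
    assert(out != 0);
    out->x = 0.0;
    out->y = 0.0;
    out->width = width;
    out->height = height;
}

/* Two distinct pointer types; converting between them and calling the
   result is undefined behaviour in C, whatever a given ABI happens to do. */
typedef BigRect (*ByValueFn)(double, double);
typedef void (*IntoBufferFn)(BigRect *, double, double);
```

So a single IMP declared in the second shape cannot be relied on to stand in for methods of the first shape, which is why the advice points to assembly.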
I'll leave this question open, in case someone brainstorms a better idea, but with this coming directly from Apple's own "runtime wrangler," I figure it's probably as authoritative an answer as I'm likely to get. Time to dust off the x86 reference manuals and knock the rust off my assembler-fu, I guess...
It seems that the Apple engineer is right: the only way to go is assembly code. Here are some useful pointers for getting started:
From the Objective-C runtime code: the i386 and x86_64 hand-crafted messenger assembly stubs for the various messaging methods.
An SO answer that provides an overview of the dispatching.
An in-depth review of the dispatching mechanism with a line-by-line analysis of the assembly code.
Hope it helps.