Assume I have a Cocoa-based Mac or iOS app. I'd like to run a static analyzer on my app's source code or my app's binary to retrieve a list of all Objective-C methods called therein. Is there a tool that can do this?
A few points:
I am looking for a static solution. I am not looking for a dynamic solution.
Something which can be run against either a binary or source code is acceptable.
Ideally the output would just be a massive de-duped list of Objective-C methods like:
…
-[MyClass foo]
…
+[NSMutableString stringWithCapacity:]
…
-[NSString length]
…
(If it's not de-duped that's cool)
If other types of symbols (C functions, static vars, etc) are present, that is fine.
I'm familiar with class-dump, but AFAIK, it dumps the declared Classes in your binary, not the called methods in your binary. That's not what I'm looking for. If I am wrong, and you can do this with class-dump, please correct me.
I'm not entirely sure this is feasible. So if it's not, that's a good answer too. :)
The closest I'm aware of is otx, which is a wrapper around otool and can reconstruct the selectors at objc_msgSend() call sites.
http://otx.osxninja.com/
If you are asking for a COMPLETE list of all methods called, then this is impossible, both statically and dynamically. The reason is that methods may be called in a variety of ways, and selectors can even be assembled dynamically and programmatically.
In addition to regular method invocations using Objective-C message syntax like [object message], you can also dispatch messages using the C API functions from objc/message.h, e.g. objc_msgSend(obj, sel). Or you can dispatch them using the NSInvocation API or with performSelector:withObject: (and similar methods). The selectors used in all these cases can be static strings, or they can be constructed programmatically from strings, using things like NSSelectorFromString.
To make matters worse, Objective-C even supports dynamic message resolution, which allows an object to respond to messages that do not correspond to methods at all!
If you are satisfied with only specific method invocations then parsing the source code for the patterns listed above will give you a minimal list of methods that may be called during execution. But the list may be both incomplete (i.e., not contain methods that may be called) as well as overcomplete (i.e., may contain methods that are not called in practice).
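As a rough illustration of the source-parsing approach, here is a minimal Ruby sketch that collects only the selectors written literally as @selector(...) or NSSelectorFromString(@"...") in a piece of Objective-C source. The method name literal_selectors and the sample input are made up for this example, and the regular expressions are assumptions that ignore comments, macros, and ordinary bracketed message sends, so the result is incomplete in exactly the way described above.

```ruby
# Hypothetical helper: collect selectors that appear literally in Objective-C
# source text. It only recognizes two of the explicit forms; bracketed message
# sends and selectors built at runtime are not detected.
def literal_selectors(source)
  patterns = [
    /@selector\(\s*([A-Za-z_][\w:]*)\s*\)/,        # @selector(foo:bar:)
    /NSSelectorFromString\(\s*@"([^"]+)"\s*\)/     # NSSelectorFromString(@"foo:")
  ]
  # scan returns one single-element array per match, hence the flatten.
  patterns.flat_map { |pattern| source.scan(pattern).flatten }.uniq.sort
end

# Made-up sample input.
sample = <<~'OBJC'
  [view performSelector:@selector(reload)];
  SEL s = NSSelectorFromString(@"setTitle:");
  [other performSelector:@selector(reload) withObject:nil];
OBJC

puts literal_selectors(sample)
```

The output is a sorted, de-duplicated list. Anything assembled from string pieces at run time never shows up in it.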
Another great tool is class-dump, which was always my first choice for static analysis.
otool -oV /path to executable/ | grep name | awk '{print $3}'
As a self-taught programmer, my definitions get fuzzy sometimes.
I'm very used to C and ObjC. In both of those your code must adhere to the language "structure". You can only do certain things in certain places. As an example, this is an error:
// beginning of file
NSLog(@"Hello world!"); // can't do this
@implementation MYClass
...
@end
However, in Ruby, anything you put anywhere is executed as the interpreter goes through it. So what is the difference between Ruby and Objective-C that allows this?
At first I thought it was that one was interpreted and the other compiled. Then I read some SO posts and the Wikipedia definitions. Interpreted or compiled is a property of the implementation, not the language. So that would mean there could (theoretically) be an interpreted implementation of Objective-C? In that case, the fact that a statement cannot be outside the implementation can't be a property of compiled languages, and vice versa if there was a compiled implementation of Ruby. Or am I wrong in assuming that different implementations of a language would work the same way?
I'm not sure there's a technical term for it, but in most programming languages the context of the statement is extremely important.
Ruby has a concept of a root or main context where code is allowed. Other scripting languages follow this convention, presumably made popular by languages like Perl which allowed for very concise programming.
This allows things like this to be a complete and valid program:
print "Hello world!\n"
In other languages you need to define an entry point, such as a main routine, that is executed instead. Arbitrary code is not really allowed at the top level, which instead is reserved for things like function, type, constant, structure and class definitions.
A language like Ruby has a lot of control over the order in which the code is executed. C, by comparison, is usually composed of separate source files that are then linked together, where there's no inherent order to the way things are linked. All the modules are simply assembled into the final library or executable. This is why the main entry point is required: it defines which function to run first.
In short, it boils down to syntax, context, and language design considerations.
Ruby hides a lot of this.
Ruby is object-oriented like C++, Objective-C and Java, and it has an equivalent of C's main, but you don't see it.
puts(42) is a method call. With no explicit receiver it is sent to self, which at the top level is an object called main. You can see it by typing puts self.
If you don't specify the receiver (receiver.method()), Ruby uses the implicit one, self, which is main at the top level.
Check available methods:
puts Object.private_methods.sort
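To make the implicit receiver concrete, here is a small sketch meant to be run as a script file. The variable name receiver is just for illustration.

```ruby
# At the top level of a script, self is the object that prints as "main".
receiver = self
puts receiver.inspect   # shows main when run as a script
puts receiver.class     # the top-level object is a plain Object

# puts is a private instance method defined in Kernel, which Object includes,
# so it can be called with an implicit receiver from almost anywhere.
puts Kernel.private_method_defined?(:puts)
puts Object.include?(Kernel)
```

Because puts is private, it is normally called without naming a receiver, which is why it looks like a bare statement.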
Why can you put code anywhere?
C and C++ look for a function called main and start execution there.
Ruby, on the other hand, doesn't need main or any other method or class to run first.
It executes code from the first line until it reaches the end of the file (or __END__ on a line of its own).
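One consequence of this top-to-bottom execution is that a method only exists once its def has been executed. A small sketch (the method name greet is invented for the example):

```ruby
# Calling greet before its definition has been executed fails at run time.
early_call =
  begin
    greet
  rescue NameError
    :not_defined_yet
  end

# Executing the def is what creates the method.
def greet
  "hello"
end

late_call = greet

puts early_call.inspect
puts late_call
```

In C the equivalent call would be resolved at compile and link time, independent of where the function appears relative to main.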
class Strongman
puts "I'm the best!"
end
is essentially syntactic sugar for the Class.new method:
Strongman = Class.new do
puts "I'm the best!"
end
The same goes for module.
A for loop calls each and evaluates to the object that each returns, so you can think of it as something similar to a method call.
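A short sketch of the class-body point: the body of a class definition is ordinary code that runs while the class is being created. The names events and strongman are illustrative. One difference worth noting is that the Class.new block is a closure and can see surrounding local variables, which a class ... end body cannot.

```ruby
events = []

# The block is evaluated immediately, with self set to the new class.
strongman = Class.new do
  events << "class body ran"

  def boast
    "I'm the best!"
  end
end

events << "after Class.new"

puts events.inspect
puts strongman.new.boast
```

The first entry in events is recorded before Class.new returns, which shows the body is executed rather than merely declared.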
a = for i in 1..12; 42;end
puts a
# 1..12
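To see that for is built on each, here is a sketch with a hand-made object (countdown is invented for this example) whose only iteration method is each. It also shows that, unlike a block parameter, the loop variable is still visible after the loop.

```ruby
countdown = Object.new

# A singleton each method; returning self mirrors what Range#each does.
def countdown.each
  3.downto(1) { |n| yield n }
  self
end

seen = []
result = for n in countdown
  seen << n
end

puts seen.inspect              # [3, 2, 1]
puts result.equal?(countdown)  # the loop evaluates to what each returned
puts n                         # for opens no new scope, so n is still set
```

Any object that responds to each can be used with for in this way.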
In the end, it doesn't matter whether it is a method call or some kind of structure like C's int main(). The programming language decides what runs first.
I need to generate a function name and then call it.
Is it possible to do this like in PHP:
<?php call_user_func_array(array($object, $method));?>?
There are four options:
Make the methods you want to call this way into signals. Signals can be emitted by name with GLib.Signal.emit_by_name (g_signal_emit_by_name). The call is from the GLib mode, but other modes with signal support are likely to have a similar method.
Create a static table/hash table of delegate objects manually in code. The main advantage is that it is type-safe. The disadvantage is that you have to add each method in two places. It will also work in all Vala modes.
Another option is to tell the Vala compiler to build the "gir" binding and use the GObject Introspection library to call the functions. That is much more complicated, but the compiler will maintain the list of available methods for you. This method is specific to the GLib mode.
The last option is to use the GLib.Module.symbol (g_module_symbol) function of GLib to find the symbol. You'll need to know the "mangled" C name of the symbol and it will not be type-safe. You will have to match argument types exactly and mind where the invocant should go. It avoids the overhead of GIR, but unlike GIR it can't tell you which methods exist, only get you a specific one. This method is used when connecting signals in GtkBuilder. I mentioned the function from GLib, but POSIX.dlsym can be used the same way.
ObjC has a rather unusual way of overriding methods. Specifically, you can override functions in OS X's own frameworks, via "categories" or "swizzling". You can even override "buried" functions that are only used internally.
Can someone provide me with an example where there was a good reason to do this? Something you would use in released commercial software and not just some hacked up tool for internal use?
For example, maybe you wanted to improve on some built in method, or maybe there was a bug in a framework method you wanted to fix.
Also, can you explain why this can best be done with features in ObjC, and not in C++ / Java and the like. I mean, I've heard of the ability to load a C library, but allow certain functions to be replaced, with functions of the same name that were previously loaded. How is ObjC better at modifying library behaviour than that?
If you're extending the question from mere swizzling to actual library modification then I can think of useful examples.
As of iOS 5, NSURLConnection provides sendAsynchronousRequest:queue:completionHandler:, which is a block (/closure) driven way to perform an asynchronous load from any resource identifiable with a URL (local or remote). It's a very useful way to be able to proceed as it makes your code cleaner and smaller than the classical delegate alternative and is much more likely to keep the related parts of your code close to one another.
That method isn't supplied in iOS 4. So what I've done in my project is that, when the application is launched (via a suitable + (void)load), I check whether the method is defined. If not I patch an implementation of it onto the class. Henceforth every other part of the program can be written to the iOS 5 specification without performing any sort of version or availability check exactly as if I was targeting iOS 5 only, except that it'll also run on iOS 4.
In Java or C++ I guess the same sort of thing would be achieved by creating your own class to issue URL connections that performs a runtime check each time it is called. That's a worse solution because it's more difficult to step back from. This way around if I decide one day to support iOS 5 only I simply delete the source file that adds my implementation of sendAsynchronousRequest:.... Nothing else changes.
As for method swizzling, the only times I see it suggested are where somebody wants to change the functionality of an existing class and doesn't have access to the code in which the class is created. So you're usually talking about trying to modify logically opaque code from the outside by making assumptions about its implementation. I wouldn't really support that as an idea on any language. I guess it gets recommended more in Objective-C because Apple are more prone to making things opaque (see, e.g. every app that wanted to show a customised camera view prior to iOS 3.1, every app that wanted to perform custom processing on camera input prior to iOS 4.0, etc), rather than because it's a good idea in Objective-C. It isn't.
EDIT: so, in further exposition — I can't post full code because I wrote it as part of my job, but I have a class named NSURLConnectionAsyncForiOS4 with an implementation of sendAsynchronousRequest:queue:completionHandler:. That implementation is actually quite trivial, just dispatching an operation to the nominated queue that does a synchronous load via the old sendSynchronousRequest:... interface and then posts the results from that on to the handler.
That class has a + (void)load, which is the class method you add to a class that will be issued immediately after that class has been loaded into memory, effectively as a global constructor for the metaclass and with all the usual caveats.
In my +load I use the Objective-C runtime directly via its C interface to check whether sendAsynchronousRequest:... is defined on NSURLConnection. If it isn't then I add my implementation to NSURLConnection, so henceforth it is defined. This explicitly isn't swizzling: I'm not adjusting the existing implementation of anything, I'm just adding a user-supplied implementation of something if Apple's isn't available. Relevant runtime calls are objc_getClass, class_getClassMethod and class_addMethod.
In the rest of the code, whenever I want to perform an asynchronous URL connection I just write e.g.
[NSURLConnection sendAsynchronousRequest:request
    queue:[self anyBackgroundOperationQueue]
    completionHandler:
        ^(NSURLResponse *response, NSData *data, NSError *blockError)
        {
            if(blockError)
            {
                // oh dear; was it fatal?
            }
            if(data)
            {
                // hooray! You know, unless this was an HTTP request, in
                // which case I should check the response code, etc.
            }
            /* etc */
        }];
So the rest of my code is just written to the iOS 5 API and neither knows nor cares that I have a shim somewhere else to provide that one microscopic part of the iOS 5 changes on iOS 4. And, as I say, when I stop supporting iOS 4 I'll just delete the shim from the project and all the rest of my code will continue not to know or to care.
I had similar code to supply an alternative partial implementation of NSJSONSerialization (which dynamically created a new class in the runtime and copied methods to it); the one adjustment you need to make is that references to NSJSONSerialization elsewhere will be resolved once at load time by the linker, which you don't really want. So I added a quick #define of NSJSONSerialization to NSClassFromString(@"NSJSONSerialization") in my precompiled header. Which is less functionally neat but a similar line of action in terms of finding a way to keep iOS 4 support for the time being while just writing the rest of the project to the iOS 5 standards.
There are both good and bad cases. Since you didn't mention anything in particular these examples will be all-over-the-place.
It's perfectly normal (good idea) to override framework methods when subclassing:
When subclassing NSView (from the AppKit.framework), it's expected that you override drawRect:(NSRect). It's the mechanism used for drawing views.
When creating a custom NSMenu, you could override insertItemWithTitle:action:keyEquivalent:atIndex: and any other methods...
The main thing when subclassing is whether your behaviour completely redefines the old behaviour or extends it (in which case your override eventually calls [super ...];)
That said, however, you should always steer clear of using (and overriding) any private API methods (those normally have an underscore prefix in their name). This is a bad idea.
You also should not override existing methods via categories. That's also bad. It has undefined behaviour.
If you're talking about categories, you don't override methods with them (because there is no way to call the original method, like calling super when subclassing), but only completely replace them with your own, which makes the whole idea mostly pointless. Categories are only useful for safely extending functionality, and that's the only use I have ever seen (and which is a very good, an excellent idea), although indeed they can be used for dangerous things.
If you mean overriding by subclassing, that is not unique. But in Obj-C you can override everything, even private undocumented methods, not just what was declared 'overridable' like in other languages. Personally, I think it's nice, as I remember in Delphi and C++ I used to “hack” access to private and protected members to work around an internal bug in a framework. This is not a good idea, but at some moments it can be a life saver.
There is also method swizzling, but that's not a standard language feature, that's a hack. Hacking undocumented internals is rarely a good idea.
And regarding “how can you explain why this can best be done with features in ObjC”, the answer is simple: Obj-C is dynamic, and this freedom is common to almost all dynamic languages (JavaScript, Python, Ruby, Io, a lot more). Unless artificially disabled, every dynamic language has it.
Refer to the Wikipedia page on dynamic languages for a longer explanation and more examples. For example, an even more miraculous thing possible in Obj-C and other dynamic languages is that an object can change its type (class) in place, without being recreated.
This guy came up with a pretty neat tool to generate a class dependency graph - however, it relies on parsing your source code and looking for #import directives.
http://seriot.ch/blog.php?article=20110124
https://github.com/nst/objc_dep/blob/master/objc-dep.py
This is neat, but I have a number of problems with this. Not least of which is it doesn't take into account imports of imports nor prefix headers nor whether-or-not the class(es) in the file referenced by the import are actually being used.
I'd like to do something more akin to class-dump and examine the Objective-C metadata stored in the Mach-O file to generate an in-memory representation of the class dependencies.
I'd rather not do this from scratch, so I'm wondering:
Has it already been done?
Is there an open-source library which would provide me with the foundational tools I need to extract this information (a library which examines the Mach-O file and creates a façade of the Objective-C information contained within - such that I could iterate over all of the classes, their methods, properties, ivars, etc and scan for references to other classes) I figure class-dump's source would be a good place to start.
If you have experience in this sort of thing, is what I'm trying to accomplish feasible?
What roadblocks will I need to overcome?
Has it already been done?
Not that I know of.
Is there an open-source library which would provide me with the
foundational tools I need to extract this information?
At the core of class-dump is libMachObjC, which does exactly what you want, i.e. parse all classes/methods/ivars and more. The API is very clean, so it should be very easy to use.
If you have experience in this sort of thing, is what I'm trying to
accomplish feasible?
Unfortunately, no because some classes don't declare the real class but use id instead. For example, here is the information that can be extracted from a class-dump of UIKit:
@interface UITableView : UIScrollView <NSCoding>
{
int _style;
id <UITableViewDataSource> _dataSource;
id _rowData;
...
The _rowData ivar type information is id but if you check at runtime you will see that _rowData is an instance of the UITableViewRowData class. This information is not present in the Mach-O binary so you have no way to find the relation between UITableView and UITableViewRowData. The same applies for method parameters.
Here's a solution that relies on information in mach.o files and generates a dependency graph based on that information: https://github.com/PaulTaykalo/objc-dependency-visualizer
Has it already been done?
yes - but i can't recommend a good public implementation
Is there an open-source library which would provide me with the foundational tools I need to extract this information (a library which examines the Mach-O file and creates a façade of the Objective-C information contained within - such that I could iterate over all of the classes, their methods, properties, ivars, etc and scan for references to other classes) I figure class-dump's source would be a good place to start.
most use cases would benefit by using the objc runtime facilities objc/... rather than examining the binary.
If you have experience in this sort of thing, is what I'm trying to accomplish feasible?
yes. i've done something similar using the objc runtime.
What roadblocks will I need to overcome?
that depends largely on the level of detail you want... implementation time if you find no such implementation, but i figure you will find a few options if you google the more esoteric functions in the objc runtime; perhaps you would find one in an (open) language binding or bridge?
if you do end up writing one yourself, you can get registered objc classes using objc_getClassList, then access the properties/information you want from there.
I have a failure Marshaling a data structure (error abstract type (Custom)). There is one known abstract type in use, namely Big_int. However that Marshals fine. There is no custom C code in the application. Apart from Nums, Unix library is also used (however I don't believe there are any active objects of that type). We're Marshal'ing with Closures.
Two (only) third party libraries are in use: OCS Scheme (Scheme interpreter, pure Ocaml) and Dypgen (extensible GLR parser, also pure Ocaml). The problem is with a new feature of Dypgen, saving a dynamically extended parser.
The Ocaml error message is next to useless (it doesn't identify which abstract type with Custom tag is the culprit).
We suspected Lexbuf as the culprit because it contains a closure over an Ocaml channel, and can't be Marshal'ed, but it seems this isn't the problem. So my question is:
Which standard library components can't be Marshal'ed?
Weak arrays cannot be marshaled. I am not familiar with OCS Scheme, but I would expect an interpreter for a garbage-collected language written in OCaml to use weak pointers (they let you piggy-back on OCaml's memory management).
In OCaml's defense, I do not think that the Custom method block contains the name of the type (retrospectively, that seems like a good thing to have).
EDIT: Yep:
$ grep Weak ~/Downloads/ocs-1.0.3/src/*.ml
/Users/pascal/Downloads/ocs-1.0.3/src/ocs_sym.ml:module SymTable = Weak.Make (HashSymbol)
EDIT2:
As pointed out by ygrek, there is room for a name in the custom method block. I should also clarify that weak arrays are not custom values, since my answer seemed to imply that. Weak arrays have the Abstract tag and are chained using the first word of data so that the garbage collector can traverse them in special weak-pointer-related phases of the collection cycle.