Parsing Objective-C code for static analysis

I love static analysis and compile-time checks, almost to a fault, but most of my day job is in Objective-C. To resolve this tension, I'd like to be able to write my own analysis tools that I can run on my Objective-C projects.
But googling around the Internet suggests that people are having a hard time putting together a complete Objective-C grammar. One site basically recommends giving up.
I did find a grammar on the ANTLR website, but when I fired it up, I couldn't get it to parse anything at all. For example, it responded to the line:
void x();
with src/main/resources/somecode.m line 1:0 no viable alternative at input 'void'
:(
I took a closer look at the grammar and found the following disheartening disclaimer:
it's a work in progress, most of the .h file can be parsed
But I need something that can parse both interface and implementation.
Is there a complete Objective-C 2.0 grammar out there somewhere? I'd prefer something that can work with Scala (so anything Java compatible, like ANTLR, would be perfect), but at this point I'd be willing to adapt something designed for another parser toolkit.

As others mentioned, Clang would be the right solution. You can provide your own AST consumers, i.e. classes that are invoked as Clang walks the AST, so you don't have to worry about parsing or maintaining a grammar.
Clang supports Objective-C in its entirety, and the static analyzer already contains many checker classes (in clang/lib/StaticAnalyzer/Checkers) that you can model your own checks after.
That directory contains a lot of static analyzer checkers, but you can also just create a normal AST consumer. Refer to http://code.google.com/p/chromium/wiki/WritingClangPlugins for more information.

Clang is a static analysis tool that has support for Objective-C. I've found it very useful in the past.
http://clang-analyzer.llvm.org/

Clang is extensible; you can extend its existing static analysis or create your own. LLVM/Clang is architected as a series of libraries you can link to (dynamically or statically). A great starting point is the ARC (automatic reference counting) migrator library, which is responsible for statically analysing and rewriting Objective-C code.
arcmt-test is a small example program that consumes the ARC migrator library.

You can use OCDepend. It's a static analysis tool based on Clang that simplifies managing Objective-C code quality and provides a highly flexible code query framework.

Related

When analyzing a binary compiled from Swift, is it possible to figure out the Swift method name for a function that has no symbol?

I'm new to disassembling and reverse engineering binaries, so forgive me if this question is nonsensical or impossible.
In the past when I've tried reverse engineering macOS binaries, analyzing the ones written in Objective-C yielded a lot of useful information, because generally all of the Objective-C classes and their method names were easily retrievable, making it a lot easier to figure out what any particular method did.
I'm trying to analyze a binary written in Swift (technically a combination of Swift and Objective-C), and most of the functions now have no symbol. There are some Objective-C methods that I can retrieve as usual, and a few functions that have a Swift-style mangled name, but nearly all of the rest have no symbol. I know a lot of those have to be Swift methods.
Is there any way to figure out what this binary's Swift classes are and their associated methods, like I can with Objective-C?
Using a tool like Hopper Disassembler reveals the mangled names of some Swift classes (usually a symbol like _TtC4Something25SomethingElse) and I can get a list of its instance variable names and their offsets, but no method names.
Note: the binary in question is an x64 macOS binary, not an iOS binary.
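For context on the _TtC symbols mentioned above: in the older Swift mangling scheme, a plain class symbol is the prefix _TtC followed by a length-prefixed module name and a length-prefixed class name. The following is a minimal sketch of decoding that one form, not a general demangler. It assumes the simple two-component case (no nested or generic types, and not the newer $s-style mangling). The function name and the sample symbol _TtC5MyApp14ViewController are hypothetical, since the symbol quoted in the question is only a placeholder.

```python
import re


def decode_swift_class_symbol(symbol):
    """Split an old-style `_TtC` class symbol into (module, class_name).

    Assumption: the symbol has exactly two length-prefixed identifiers
    after the `_TtC` prefix. Anything else raises ValueError.
    """
    prefix = "_TtC"
    if not symbol.startswith(prefix):
        raise ValueError("not an old-style Swift class symbol")

    rest = symbol[len(prefix):]
    parts = []
    while rest:
        # Each identifier is written as <decimal length><name>.
        match = re.match(r"\d+", rest)
        if match is None:
            raise ValueError("expected a length prefix in %r" % rest)
        length = int(match.group())
        start = match.end()
        name = rest[start:start + length]
        if len(name) != length:
            raise ValueError("symbol is shorter than its length prefix")
        parts.append(name)
        rest = rest[start + length:]

    if len(parts) != 2:
        raise ValueError("expected exactly a module and a class name")
    return parts[0], parts[1]


if __name__ == "__main__":
    # Hypothetical symbol used purely for illustration.
    print(decode_swift_class_symbol("_TtC5MyApp14ViewController"))
```

This only recovers the class and module names that are already visible in the symbol; it does not recover method names, which is the harder part of the question.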

Reverse engineering is usually the process of extracting meaningful constructs and descriptions from assembly. What you've done so far is usually only the first part of a "normal" reverse engineering task.
This can be a tedious process that involves mapping structures and working out what functions do directly from their assembly code.
There are plenty of reverse engineering tutorials and other sources, and a good understanding of the relevant assembly language is required.
I really recommend this book (it's legally available online; the original version is a CHM released by the author), because this cannot easily be covered in a single SO question.
You might also want to get more specific help on the Reverse Engineering SE beta.
I hope I pointed you in the right direction.

Is it a bad idea to use .mm files instead of .m just in case I use C++ later?

Assume I'm developing a typical Mac or iOS application using Apple's latest Xcode tools. Further assume that I am primarily developing this application using Objective-C and leveraging all of the relevant APIs from Apple's Cocoa or Cocoa Touch frameworks.
Let's say that I don't currently have any plans to use C++ or Objective-C++ in my code base, but I suspect that some time in the future I might want to sprinkle in a little Objective-C++ here and there.
So I'm considering naming all of my .m files as .mm instead, just in case. (This will have the desirable effect of a cleaner history in my SCM system, as I won't have to rename files later.)
Is this a bad idea? Is there any reason why using .mm files is definitely or significantly worse than using .m when the file doesn't actually contain any Objective-C++?
Presumably this file extension flips some switch in the compiler which will then have to parse the source code for not only ObjC, but also C++. Does this have a significant negative effect on build times for moderate-to-large code bases?
Does it have any other negative (or positive) effects that I should keep in mind?
NOTE: please do not respond with any comments about whether ObjC or C++ is better. That is not what this question is about.

It's not the worst idea, but it's not really a good idea, either.
The main purpose of Objective-C++ is to act as a bridge for Objective-C code that needs to use a C++ library. Thus, in most projects, almost all of the code is plain old Objective-C, with maybe a few .mm files to create a "wrapper" object to talk to the C++ library.
Therefore, it is extremely unlikely that you will need to change significant parts of your code over from Objective-C to Objective-C++. You shouldn't have a lot of file renames in your SCM history.
The main problem with using Objective-C++ everywhere is that you will be following "the road less traveled": 99% of the tutorials you read and the open-source code you use and learn from will be written to be compiled by the Obj-C compiler. Using the Obj-C++ compiler will be mostly the same and probably won't make a difference most of the time. Eventually, though, you will run into a problem caused by Obj-C++ being compiled slightly differently. The cause won't be obvious, and you'll spend a lot of time diagnosing it before you realize that it comes from using a less well-tested compiler setup.
If you have a lot of C++ experience and find yourself "needing" features from C++ in your code, you probably don't really need them; you probably need to spend a little more time figuring out how to do the equivalent in Objective-C. When in Rome, do as the Romans do.
In general, "just in case" is not a good reason to stray from standard practice. You often wind up spending a lot of effort on something you aren't going to need.

Quote from Barry Wark:
The major disadvantage to using .mm over .m for "normal" Objective-C
is that compile times are significantly higher for Objective-C++. This
is because the C++ compiler takes longer than the C compiler. With
Xcode 3.2 and higher, Objective-C code can use the Clang frontend tool
chain to significantly speed up Objective-C/C compiling times. Since
Clang does not yet support Objective-C++/C++, this further widens the
gap in compiling times between the two.
BUT
UPDATE Feb 17, 2012 As of Xcode 4.0 (with LLVM 3.0), Clang has
supported Objective-C++. Even C++11 support is quite strong now.
So I think it's OK to use .mm: as long as you only use C features, .mm files should generate code that performs very similarly to .m files.

As I wrote in a comment, C++ is not a strict superset of C, so it's possible you'd run into cases where you use e.g. C99 code which will not compile if you put it in an Objective-C++ file. I had this problem recently using C99 compound literals.

Yes, it's a bad idea.
When I see a .mm file, I expect it to have C++ code (in addition to Objective-C, of course). There are a few things not directly related to OOP that are a bit different in C++ compared to C.
So name all your Objective-C files as .m. As soon as you need any C++ features – rename it to .mm and verify that everything works.
You get bonus points if you keep your header files C++–less.

The .mm extension means an Objective-C++ file. The compiler takes more time to compile C++ code than C code.
So, if it is not required, keep the extension as .m only.

From my experience (at Apple):
1) the Xcode team thinks about C++ last (it took forever to get blocks support in ObjC++)
2) ObjC++ is much slower to compile

Why doesn’t Objective-C have namespaces?

Why doesn’t Objective-C have namespaces? It seems like a simple feature that would make some class names more readable (AVMutableVideoCompositionLayerInstruction anyone?) and axe the silly letter prefixes on class names. Is this mainly because of backwards compatibility? Is it harder to implement namespaces than it seems?

I don't know the answer, but I suspect "it's harder than it looks" is probably it. You would have to introduce support in the compiler and linker in a way that doesn't break existing software. And while this is obviously possible (C++ has already done it), presumably the tool chain team have had higher priorities on their plate. For example, in the recent past we have had garbage collection, GCD, blocks and Objective-C 2.0 appear, so we can't say they have been doing nothing.
Namespace support is the one thing that I would dearly love to see introduced to Objective-C.

I don't know whether you are asking if there was some official decision.
But namespaces, like many other features, are a choice made by the language's contributors.
PHP only recently introduced namespaces; Java, for example, uses packages that act like namespaces, and Python uses modules.
I think there is an overhead in implementing namespaces, mainly because Objective-C is dynamically typed, so at runtime you would have to make some checks to resolve the namespace, resolve default behaviour, etc. I suppose that, because Objective-C is also used in embedded environments (AKA iPhone), speed is very important.
You have to wrap everything I've said in a big IMHO ;D
Update:
I found this very interesting discussion http://clang-developers.42468.n3.nabble.com/Adding-namespaces-to-Objective-C-td1870848.html#a1872744 on the clang developer website explaining why it is definitely non-trivial to implement namespaces in Obj-C.

Programming features missing in C++ and Java

What are the programming features that are missing in C++ and Java?
For example, you can't do recursive programming in QBasic, and you can't dynamically allocate memory in QBasic.
What would be the good-to-have features in C++ and Java?
I think Lisp programmers will be able to add a few.

I miss lambda expressions.

This answer deals only with C++
Things I miss from the syntax, or the standard library:
RegExp as part of the standard library
Threads as part of the standard library
Pointer to member methods (not objects!)
Properties would be nice (I have seen code that emulates this via the C++ preprocessor... not nice-looking code).
Some lower level networking API (sockets!), and higher level API (give me this file from this ftp, submit "this" to this site via POST).
This is the list of things I would like to see, but I assume other people will disagree with me.
Memory garbage collector is nice.
An interface for a GUI toolkit - let MSVC map it to win32, and on Linux... (good question!)
A stable ABI. In C it's a standard - but in C++ we are still missing a few decades. I also want a stable ABI between compilers - I want to compile one library with MinGW, the other with CL, and all should work.
This is the list of things I want to see removed, but I know they will not go away:
Compatibility with C. Really, it's a myth right now. using namespace std killed it.
Include, headers. Most of the information is already available in the DLL/so/a/"library", do we really need to keep this bad decision from 30 years ago? If needed the compilers should keep information in the binaries.
The need for Makefiles - the compiler should be smart enough to know what to do with this code, from the code itself. Pascal does this quite well, and I think D does too.
(I might be wrong, please correct me) The official standard openly and freely available for viewing. Why should I pay for the official papers? Do I need to do it for HTTP? UTF8? Unicode?

I think this is a very subjective question. From a theoretical point of view there's nothing "missing" in Java because you can do everything you want to from the perspective of the outcome as an application.
As with QBasic - recursion may not be possible but that doesn't prevent you from changing your recursive algorithm to an iterative algorithm. Programming language theory tells us that you can do this with every recursive problem. So there's also nothing missing here.
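The recursive-to-iterative rewrite described above can be sketched as follows. This is only an illustration in Python, not QBasic: the same depth-first walk over a nested list is written once with recursion and once with an explicit stack that plays the role of the call stack. The function names and the sample data are hypothetical.

```python
def flatten_recursive(items):
    """Flatten arbitrarily nested lists using recursion."""
    result = []
    for item in items:
        if isinstance(item, list):
            result.extend(flatten_recursive(item))
        else:
            result.append(item)
    return result


def flatten_iterative(items):
    """Same result without recursion: an explicit stack of iterators
    remembers where to resume in each enclosing list."""
    result = []
    stack = [iter(items)]
    while stack:
        for item in stack[-1]:
            if isinstance(item, list):
                # Descend: push the nested list and restart the loop on it.
                stack.append(iter(item))
                break
            result.append(item)
        else:
            # The current list is exhausted: return to the enclosing one.
            stack.pop()
    return result


if __name__ == "__main__":
    nested = [1, [2, [3, 4]], 5]
    print(flatten_recursive(nested))
    print(flatten_iterative(nested))
```

The iterative version needs some form of growable storage for the stack, which is the practical cost of doing without recursion.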
I think what you mean are features that are "nice to have" - and here everyone has to decide for himself. I'd even say there are features in the language which would have been "nice not to have" such as static imports - but again this is my subjective opinion...

Creating your own language

If I were looking to create my own language are there any tools that would help me along? I have heard of yacc but I'm wondering how I would implement features that I want in the language.

Closely related questions (all taken by searching on [compiler] on stackoverflow):
Learning Resources on Parsers, Interpreters, and Compilers
Learning to write a compiler
Constructing a simple interpreter
...
And similar topics (from the same search):
Bootstrapping a language
How much of the compiler should we know?
Writing a compiler in its own language
...
Edit: I know the stackoverflow related question search isn't what we'd like it to be, but
did we really need the nth iteration of this topic? Meh!

The first tool I would recommend is the Dragon Book. It is the reference for building compilers. Designing a language is no easy task, and implementing it is even more difficult; the Dragon Book helps there. The book even refers to the standard Unix tools lex and yacc. The GNU equivalents are called flex and bison. Together they generate a lexer and a parser. There are also more modern tools for generating lexers and parsers, e.g. for Java there is ANTLR (I also remember JavaCC and CUP, but I have only used ANTLR myself). The fact that ANTLR combines parser and lexer, and that an Eclipse plugin is available, makes it very comfortable to use. But to compare them, and to know which type of parser you need and what you need it for, you should read the Dragon Book. There are also other things you have to consider, like the runtime environment, programming paradigm, ....
If you already have certain design ideas and need help with a certain step or detail, the answers could be more helpful.

ANTLR is a very nice parser generator written in Java. There's a terrific book available, too.

I like Flex (Fast Lex) [Lexical scanner]
and Bison (A Hairy Yacc) [Yet another compiler compiler]
Both are free and available on all *NIX installations. For Windows just install cygwin.
But I'm old school.
With these tools you can also find lex rules and yacc grammars for a lot of popular languages on the internet. That gives you a quick way to get up and running, and you can then customize the grammars as you go.
Example: arithmetic expression handling (order of precedence etc.) is a done-to-death problem, so you can quickly get the grammar for it from the web.
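To make the precedence example concrete, here is a minimal sketch of the classic layered expression grammar (expr, term, factor) as a hand-written recursive-descent evaluator in Python. It is not Flex/Bison input, just an illustration of how splitting the grammar into levels gives * and / higher precedence than + and -. All names are hypothetical, and only non-negative integers, the four operators and parentheses are handled.

```python
import re

# One token per match: an integer or a single operator/parenthesis.
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    """Turn the input string into a list of token strings."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError("unexpected character at position %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        # expr := term (('+' | '-') term)*   (lowest precedence)
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        # term := factor (('*' | '/') factor)*   (binds tighter than + and -)
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        # factor := NUMBER | '(' expr ')'
        token = self.take()
        if token == "(":
            value = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token is not None and token.isdigit():
            return int(token)
        raise SyntaxError("expected a number or '('")


def evaluate(text):
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError("unexpected trailing input")
    return value


if __name__ == "__main__":
    print(evaluate("1 + 2 * 3"))
    print(evaluate("(1 + 2) * 3"))
```

The while loops in expr and term make the operators left-associative, which is the other property the standard yacc grammars for this problem encode.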
An alternative to think about is to write a front-end extension to GCC.
Non-trivial, but if you want a compiled language it saves a lot of work in the code generation section (you will still need to know, love and understand flex/bison).

I never finished a complete language, but I used rply and llvmlite to implement a simple FoxBase language in https://github.com/acekingke/foxbase_compiler
So if you want to use Python, rply and llvmlite are helpful.
If you want to use Golang, goyacc may be useful, but you have to write the lexical analyzer by hand. Or you can use https://github.com/acekingke/lexergo to simplify it.
