How can I lift x86_64 assembly code to LLVM IR? - virtual-machine

I'm researching a virus and need to deobfuscate its virtual machine. I chose to do this with LLVM, and I have a question: where can I see a simple example of lifting instructions to LLVM IR? For example, where can I find code that translates a single pop rsp instruction to LLVM IR? I haven't found anything like that.
Does anyone know of articles that describe this, or can someone suggest an example?

Here is a list of similar tools you could try:
McSema relies on IDA Pro to disassemble a binary file and produce a control-flow graph, which it then converts into LLVM IR.
llvm-mctoll is easy to use, but it cannot raise SIMD instructions such as SSE, AVX, and Neon.
RetDec is a retargetable machine-code decompiler based on LLVM.
reopt is a general-purpose decompilation and recompilation tool that supports x86-64 Linux programs.
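As a rough illustration of what these tools do for each instruction, here is a minimal sketch that emits textual LLVM IR (opaque-pointer syntax, LLVM 15 and later) for `pop <reg>`. It assumes a hypothetical register model in which every guest register lives in a memory slot passed in as a `ptr` argument named `%<reg>_ptr`; real lifters such as McSema use their own state layouts. The one subtlety of `pop rsp` is that the popped value overwrites the incremented stack pointer, so the net effect is RSP = [old RSP].

```python
def lift_pop(dst: str) -> str:
    """Emit textual LLVM IR for x86-64 `pop <dst>` (64-bit register only).

    Hypothetical register model: each guest register is kept in a memory
    slot whose address arrives as a function argument named %<reg>_ptr.
    """
    regs = ["rsp"] if dst == "rsp" else ["rsp", dst]
    params = ", ".join(f"ptr %{r}_ptr" for r in regs)
    lines = [
        f"define void @lift_pop_{dst}({params}) {{",
        "entry:",
        # Read the current stack pointer, then the 64-bit value it points to.
        "  %rsp0 = load i64, ptr %rsp_ptr",
        "  %addr = inttoptr i64 %rsp0 to ptr",
        "  %val = load i64, ptr %addr",
    ]
    if dst == "rsp":
        # POP RSP: the +8 increment is overwritten by the popped value,
        # so only the final store to RSP is observable.
        lines.append("  store i64 %val, ptr %rsp_ptr")
    else:
        # Ordinary POP: RSP += 8, then write the popped value to dst.
        lines += [
            "  %rsp1 = add i64 %rsp0, 8",
            "  store i64 %rsp1, ptr %rsp_ptr",
            f"  store i64 %val, ptr %{dst}_ptr",
        ]
    lines += ["  ret void", "}"]
    return "\n".join(lines)


print(lift_pop("rsp"))
```

You can feed the output to `llvm-as` to check that it parses. A real lifter would usually build the IR through LLVM's `IRBuilder` API rather than strings, and would also model flags, partial registers, and memory more carefully.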

Related

PROTEUS: How to add binary firmware built with an unsupported compiler?

The ISIS design tool offers many compiler options, as well as microcontroller families and their variants, with which we can compile and simulate in the Proteus IDE.
However, although it is possible to install other compilers that are not configured by default, there is no option for many others, such as Microchip's PIC32 compiler.
Is there any way to do that?
I am a bit confused, since you mention a binary image in the question title but talk about compilation in the body.
Proteus does support loading precompiled images in HEX or ELF format.
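So the compiler itself does not need to be known to Proteus; in principle only the image it produces matters. As a rough illustration of what a HEX image contains, here is a minimal parser for a single Intel HEX record (`:LLAAAATT<data>CC`). It is a sketch, not a Proteus or Microchip tool, and it only reports the type of extended-address records instead of applying them.

```python
def parse_hex_record(line: str) -> tuple[int, int, bytes]:
    """Parse one Intel HEX record and return (record_type, address, data).

    Layout: ':' LL AAAA TT DD... CC, where LL is the data byte count,
    AAAA the 16-bit load offset, TT the record type, and CC a checksum
    chosen so that all bytes after ':' sum to 0 modulo 256.
    """
    line = line.strip()
    if not line.startswith(":"):
        raise ValueError("record must start with ':'")
    raw = bytes.fromhex(line[1:])
    if len(raw) < 5:
        raise ValueError("record too short")
    count = raw[0]
    # 1 count byte + 2 address bytes + 1 type byte + data + 1 checksum byte
    if len(raw) != count + 5:
        raise ValueError("length field does not match record length")
    if sum(raw) & 0xFF != 0:
        raise ValueError("checksum mismatch")
    address = int.from_bytes(raw[1:3], "big")
    record_type = raw[3]  # 0 = data, 1 = end of file, 2/4 = extended address
    data = raw[4:4 + count]
    return record_type, address, data


rtype, addr, data = parse_hex_record(":10010000214601360121470136007EFE09D2190140")
print(rtype, hex(addr), data.hex())
```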

Is it possible to embed LLVM Interpreter in my software and does it make sense?

Suppose I have some software and I want to support cross-platform plugins: you compile the plugin for a virtual machine, and any platform running my software can run that code.
I am wondering whether it is possible to use the LLVM interpreter and bitcode for this purpose. I am also wondering whether it makes sense to use LLVM for this instead of something else, i.e. is this what LLVM was made for?
I'm not sure that LLVM was designed for it. However, I doubt there is anything that hasn't been done using LLVM.[1]
Other virtual-machine-based script engines are created specifically for the job:
Lua is very popular
Wikipedia lists some other extension/embeddable languages under the Scripting language entry
If you're looking for embeddable virtual machines:
IKVM supports embedding JVM and CLR in a bridged mode (interoperable)
Parrot supports embedding (and includes a Python interpreter; mind you, you can just run Python bytecode images)
Perl has a similar architecture and supports embedding
JavaScript supports embedding (I'm not sure about the architecture of V8, but I guess it uses a virtual machine)
Mono's CLR engine supports embedding: http://www.mono-project.com/Embedding_Mono
[1] Including compiling C++ to JavaScript to run in your browser...
There is VMIR (https://github.com/andoma/vmir), an LLVM bitcode interpreter / JIT engine intended to be embedded into other apps.
Disclaimer: I'm the author. It's still a work in progress, but it works reasonably well.
In theory, there exists a limited subset of LLVM IR that can be portable across platforms. You must not specify alignments, you must not cast pointers to integer types, you must avoid intrinsics, etc. This means you can't directly use code generated by a stock C compiler (llvm-gcc, Clang, whatever) unless you specify a limited target for it and implement sanitising LLVM passes. Another issue is that the bitcode format is not guaranteed to be compatible across LLVM versions.
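To make those restrictions concrete, here is a crude textual lint that flags some of the named constructs in a `.ll` file: explicit alignment, pointer-to-integer casts (`ptrtoint`), and references to `@llvm.*` intrinsics. The rule set is hypothetical and regex-based; a real sanitiser would be an LLVM pass working on the in-memory IR.

```python
import re

# Hypothetical rule set mirroring the restrictions listed above; this is
# not an official LLVM portability check.
RULES = {
    "explicit alignment": re.compile(r",\s*align\s+\d+"),
    "pointer-to-integer cast": re.compile(r"\bptrtoint\b"),
    "intrinsic use": re.compile(r"@llvm\.[\w.]+"),
}


def find_nonportable(ir_text: str) -> list[tuple[int, str]]:
    """Return (line number, rule name) pairs for suspicious IR lines."""
    findings = []
    for lineno, line in enumerate(ir_text.splitlines(), start=1):
        code = line.split(";", 1)[0]  # crudely strip trailing ';' comments
        for name, pattern in RULES.items():
            if pattern.search(code):
                findings.append((lineno, name))
    return findings


sample = """declare void @llvm.trap()

define i64 @f(ptr %p) {
  %v = load i32, ptr %p, align 4
  %i = ptrtoint ptr %p to i64
  call void @llvm.trap()
  ret i64 %i
}"""

for lineno, name in find_nonportable(sample):
    print(f"line {lineno}: {name}")
```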
In practice, I would not go there. Mono is a reasonably small, embeddable, fast VM, and the whole .NET tool stack is available for it. The VM itself is pretty low-level (as long as you do not care about verifiability).
LLVM includes an interpreter, so if you can build it for your target platforms, you can evaluate LLVM bitcode on the fly.
It's apparently not very fast, though.
In their classic discussion about LLVM vs. libJIT (which you do not want to miss if you're a fan of open source, LLVM, or compilers), held long before LLVM became famous and established, the author of libJIT, Rhys Weatherley, raised this particular issue: he stated that LLVM is not suitable for embedding. Chris Lattner, the author of LLVM, argued otherwise: LLVM is modular, and you can use it in any fashion, including embedding only the parts you need.

Where is the VM in LLVM?

Note: marked as community wiki.
Where is the Low Level Virtual Machine in LLVM?
I see that we have llvm-g++ and clang, but to me, an LLVM is something like Valgrind or a simulator, where instructions are executed on it, and I can write programs to instrument the running code / interrupt when certain conditions happen / etc.
Where are the tools like this built on LLVM?
Thanks!
I think you're looking for QEMU, not LLVM.
The "virtual machine" in LLVM refers to this: after converting the higher-level C and C++ input into an internal low-level representation (a stage of the normal compilation process), LLVM can save that representation and later execute it with a JIT compiler, which thus acts somewhat like a virtual machine. This JIT compiler does a substantial amount of optimization, so I expect it would be difficult to instrument in quite the form you're thinking of; in particular, it does not step through the execution instruction by instruction.
QEMU, by contrast, is an open-source emulator that does instruction-by-instruction stepping through of machine code. It already contains a certain amount of ability to instrument code to look for certain conditions, in that it can connect to GDB and set watchpoints and so forth, which are implemented in QEMU itself.
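Why instruction-by-instruction stepping makes this kind of instrumentation easy can be shown with a toy fetch-decode-execute loop. The three-instruction ISA, the register names, and the `watch` hook below are all hypothetical, and this is not QEMU's actual design; it only shows that a stepping emulator has a natural place to check a watchpoint after every write and stop when a condition holds.

```python
from typing import Callable

# Hook signature: (pc, old_value, new_value) -> truthy to interrupt execution.
Hook = Callable[[int, int, int], bool]


def run(program: list[tuple], watch: dict[int, Hook]) -> dict[int, int]:
    """Execute a hypothetical toy ISA one instruction at a time.

    Instructions: ("mov", reg, imm), ("add", dst_reg, src_reg),
    ("store", addr, reg). Returns the final memory contents.
    """
    regs = {"a": 0, "b": 0}
    memory: dict[int, int] = {}
    pc = 0
    while pc < len(program):
        op, x, y = program[pc]  # fetch and decode one instruction
        if op == "mov":
            regs[x] = y
        elif op == "add":
            regs[x] += regs[y]
        elif op == "store":
            old = memory.get(x, 0)
            memory[x] = regs[y]
            # Watchpoint check after every single write, as a debugger would do.
            if x in watch and watch[x](pc, old, memory[x]):
                break  # the hook asked us to stop here
        else:
            raise ValueError(f"unknown opcode {op!r}")
        pc += 1
    return memory


hits = []
prog = [("mov", "a", 2), ("mov", "b", 3), ("add", "a", "b"), ("store", 0x10, "a")]
# list.append returns None, so this hook records the write and keeps going.
mem = run(prog, {0x10: lambda pc, old, new: hits.append((pc, old, new))})
print(hits)  # [(3, 0, 5)]
```

A JIT that compiles and optimizes whole regions has no such per-instruction checkpoint unless the instrumentation is compiled into the generated code.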
To use LLVM for running x86 code, you should check out libCPU or the outdated llvm-qemu.
Also look at the question about running an x86 program _on_ LLVM.

What is a good VM for developing a hobby language?

I'm thinking about writing my own little language.
I found a few options, but feel free to suggest more.
JVM
Parrot
OSA
A lot of languages use the JVM, but unless you write a Java-ish language, all the power the stdlib gives you is going to feel ugly; it's not very good at dynamic stuff either.
Parrot seems a good VM for developing languages, but it has a little abandoned/unfinished/hobby project smell to it.
OSA is what powers AppleScript. It's not a particularly well-known VM, but I use a Mac, and it offers good system integration.
CLR+Mac doesn't seem a good combination...
My language is going to be an object-oriented functional concurrent dataflow language with strong typing and a mix of Python and Lisp syntax.
Sounds good, eh?
[edit]
I accepted Python for now, but I'd like to hear more about OSA and Parrot.
One approach I've played with is to use the Python ast module to build an abstract syntax tree representing the code to run. The Python compile function can compile an AST into Python bytecode, which exec can then run. This is a bit higher level than directly generating bytecode, but you will have to deal with some quirks of the Python language (for example, the fundamental difference between statements and expressions).
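A minimal sketch of that approach for a hypothetical Lisp-flavoured arithmetic language: the source is tokenized and parsed into nested lists, each node is translated into a Python `ast` node, and `compile` produces a code object that `eval` (the expression counterpart of `exec`) runs. Only expressions are handled, which sidesteps the statements/expressions split mentioned above.

```python
import ast

# Map the toy language's operators onto Python AST operator classes.
OPS = {"+": ast.Add, "-": ast.Sub, "*": ast.Mult, "/": ast.Div}


def parse(tokens: list[str]):
    """Turn a token list into nested lists, ints, and operator strings."""
    tok = tokens.pop(0)
    if tok == "(":
        items = []
        while tokens[0] != ")":
            items.append(parse(tokens))
        tokens.pop(0)  # drop the closing ")"
        return items
    return int(tok) if tok.lstrip("-").isdigit() else tok


def to_ast(node) -> ast.expr:
    if isinstance(node, int):
        return ast.Constant(value=node)
    op, *args = node
    # Fold (op a b c ...) into left-nested binary operations: ((a op b) op c).
    result = to_ast(args[0])
    for arg in args[1:]:
        result = ast.BinOp(left=result, op=OPS[op](), right=to_ast(arg))
    return result


def compile_expr(source: str):
    tokens = source.replace("(", " ( ").replace(")", " ) ").split()
    tree = ast.Expression(body=to_ast(parse(tokens)))
    ast.fix_missing_locations(tree)  # compile() requires line/column info
    return compile(tree, "<toy>", "eval")


print(eval(compile_expr("(+ 1 (* 2 3) 4)")))  # 11
```

On Python 3.9 and later, the standard library's `ast.unparse` can also turn such a tree back into source for debugging.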
In doing this I've also written a "deparse" module that attempts to convert an AST back to equivalent Python source code, just for debugging. You can find code in the psil repository if you're interested.
Have a look at LLVM. It's not a pure VM as such, more a framework with its own IR that allows you to build high-level VMs. It has nice features such as static code analysis and JIT support.
Lua has a small, well-written and fast VM
Python VM: you can really attach a new language to it if you want. Or write (use?) something like tinypy, a small and simple implementation of the Python VM.
Both options above have access to useful standard libraries that will save you work, and are coded in relatively clean and modular C, so they shouldn't be hard to connect to.
That said, I disagree that Parrot is abandoned/hobby. It's quite mature and has some very strong developers working on it. Furthermore, it's specifically a VM designed to be targeted by multiple dynamic languages, so it was designed with flexibility in mind.
Have you considered Pypy? From what I've read, in addition to being a Python JIT Compiler, it also has the capability to handle other languages. For example there is a tutorial which explains how to create a Brainfuck JIT compiler using Pypy.
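With PyPy's RPython toolchain you write an ordinary interpreter loop, and the toolchain can generate a JIT from it. A minimal sketch of such a loop for Brainfuck, in plain Python (not RPython, and without any JIT hints), looks like this; the bracket map is precomputed so that `[` and `]` jump in constant time.

```python
def build_bracket_map(program: str) -> dict[int, int]:
    """Map each '[' to its matching ']' and vice versa."""
    stack: list[int] = []
    jumps: dict[int, int] = {}
    for i, ch in enumerate(program):
        if ch == "[":
            stack.append(i)
        elif ch == "]":
            if not stack:
                raise ValueError("unmatched ']'")
            start = stack.pop()
            jumps[start] = i
            jumps[i] = start
    if stack:
        raise ValueError("unmatched '['")
    return jumps


def run_bf(program: str, input_bytes: bytes = b"") -> bytes:
    jumps = build_bracket_map(program)
    tape = [0] * 30000
    ptr = pc = 0
    inp = iter(input_bytes)
    out = bytearray()
    while pc < len(program):
        ch = program[pc]
        if ch == ">":
            ptr += 1
        elif ch == "<":
            ptr -= 1
        elif ch == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif ch == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif ch == ".":
            out.append(tape[ptr])
        elif ch == ",":
            tape[ptr] = next(inp, 0)  # 0 on end of input (one common convention)
        elif ch == "[" and tape[ptr] == 0:
            pc = jumps[pc]  # skip the loop body
        elif ch == "]" and tape[ptr] != 0:
            pc = jumps[pc]  # jump back to the loop start
        pc += 1
    return bytes(out)


print(run_bf("++++++++[>++++++++<-]>+.").decode())  # A
```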

Arm Board Bring Up

Can anybody tell me where I can find information on how to bring up an ARM board? I am looking for an overview, as I am a novice with ARM. Any link or document will do. It would be a great help if I could look at a case study.
Any ARM-based board can be considered. I am just looking for a case study, in a few simple steps.
Every single ARM "board" will be different. Read the datasheet for the ARM chip you have; it should have a section near the start about booting. Also read the datasheet for your board, as it may have flash/boot loaders on it. If there are no loaders on the board, you'll have to either set the jumpers on the ARM (if that type supports it) to read from external ROM, or load the initial boot code into it over JTAG.
Basically: read the datasheets. Programming a device like an ARM isn't the usual compile/run strategy of most software, especially not in the first stage.
edit:
If you don't even have a board yet, try going for this one:
http://beagleboard.org/
It has an ARM on it (as well as a decent GPU).
Check the DLP-2232PB-G evaluation kit from FTDI. It looks great for newbies trying to get into microcontrollers, and it comes with everything you need. It's a PIC controller, not an ARM controller, but it's the easiest starting point I've seen, and development uses the same basic methods.
I would start with any documentation the IC manufacturer may have on "getting started".
http://free-electrons.com/doc/porting-kernel.odp
This link gives a good overview of bringing up a board whose CPU already has a Linux support package available.
The Linux sources in arch/arm have mach-* directories for the CPUs supported by the Linux kernel.
Within each mach-* directory, there are board-specific files, i.e. the board-specific BSPs.
You can take the process elucidated in this article and try using in your case.
Check out the ok6410-h at http://www.arm9board.net/sel/prddetail.aspx?id=348&pid=200
It's quite a nice starter kit that comes with everything you could need: documentation, source code, and example programs.
Recommended for both newbies and experienced developers.