How to do binary instrumentation of the brk syscall? (x86-64 Linux) (maybe valgrind?)

I'd like to instrument the brk syscall (and other calls later, but this one first; it's the most important to me) in a given binary, preferably at the level of the actual syscall/sysenter instruction (x86-64 and x86) that makes the sys_brk call.
Main goal:
A part of a sandbox which gives a fixed amount of memory to a jailed process.
So, I'd like to get rid of brk system calls (and preferably others next) and simulate memory allocations under a fixed limit. The fixed limit is the memory space available to the program. (You can think of it as making a kind of sandbox with a fixed amount of available memory.)
How to implement (one of) these possible example solutions (or your own solution):
Just changing the instructions to NOPs.
As brk returns 0 on success, simulate its success by setting the register state as if brk had been called successfully.
More complex: instrument with code (or a function call) which simulates successful memory allocations under a fixed limit.
Most flexible (maybe overkill in my case): change this syscall into a function call and add the provided function to the binary.
The given binary is code that can be malicious, in one of two forms (most preferably both :) ):
shared library - here I can set up the environment before the function call (for example, do the brk call in a controlled way)
program binary - in this case we need to give the program a fixed amount of memory (from the caller, or at the beginning of the program with "one syscall"), because it cannot allocate on its own. An example of calling such a program should be included in the answer.
As the problem is highly connected with many other aspects, I tried my best to separate it out into a question, but please advise me if I should specify something more or less.
Answers with implementations or links to resources (books, tutorials) are welcome.
(I am most interested in Linux, and in a solution that is reliable, so that people preparing binaries, even in assembler, would not have to worry about the execution of their code.)

LD_PRELOAD will trap C library calls to brk(), but it won't trap the actual system call (the int/syscall instruction). There's no portable way to trap those, but on Linux, ptrace will do it. Memory can also be allocated to a program by mmap(), so you'll need to intercept that call too.
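If you go the ptrace route, a minimal sketch of a tracer that stops the child at every syscall and reports brk and mmap might look like this (x86-64 Linux only; the traced program comes from the command line, and error handling is omitted):
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    pid_t child = fork();
    if (child == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);      /* let the parent trace us */
        execvp(argv[1], &argv[1]);
        _exit(127);
    }
    int status;
    waitpid(child, &status, 0);                     /* child is stopped at exec */
    for (;;) {
        ptrace(PTRACE_SYSCALL, child, NULL, NULL);  /* run to next syscall entry or exit */
        waitpid(child, &status, 0);
        if (WIFEXITED(status))
            break;
        struct user_regs_struct regs;
        ptrace(PTRACE_GETREGS, child, NULL, &regs);
        if (regs.orig_rax == SYS_brk || regs.orig_rax == SYS_mmap)
            fprintf(stderr, "syscall %llu, first arg 0x%llx\n",
                    (unsigned long long)regs.orig_rax,
                    (unsigned long long)regs.rdi);
    }
    return 0;
}
Each syscall is reported twice (once at entry, once at exit). At entry you could rewrite the arguments or the syscall number, and at exit you could rewrite the return value in regs.rax with PTRACE_SETREGS, which is enough to fake "successful" allocations under your own fixed limit.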
Of course, what it seems you're really looking for is setrlimit().
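For the fixed-memory sandbox goal specifically, a minimal sketch of the setrlimit() approach could look like this (the 64 MiB limit and the program path are placeholders):
#include <sys/resource.h>
#include <unistd.h>

int main(void)
{
    struct rlimit lim = { 64 * 1024 * 1024, 64 * 1024 * 1024 };  /* soft and hard limit: 64 MiB */
    setrlimit(RLIMIT_AS, &lim);      /* caps the whole address space: brk, mmap, stack growth */
    execl("./jailed_program", "jailed_program", (char *)NULL);
    return 1;                        /* only reached if exec fails */
}
The limit survives execve(), so the jailed program's brk() and mmap() calls simply start failing once it goes over budget, with no binary rewriting needed.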

Yeah, I don't think you want valgrind for this.
You can use LD_PRELOAD or linker tricks to capture brk(2): see these other discussions:
Function interposition in Linux without dlsym
Overriding 'malloc' using the LD_PRELOAD mechanism
Code might look like this:
#define _GNU_SOURCE      /* for RTLD_NEXT */
#include <unistd.h>
#include <stdio.h>
#include <dlfcn.h>

/* prototype: int brk(void *addr); */
static int (*real_brk)(void *addr) = NULL;

int brk(void *addr) {
    if (real_brk == NULL) {
        real_brk = (int (*)(void *)) dlsym(RTLD_NEXT, "brk");
        if (real_brk == NULL) {
            fprintf(stderr, "error mapping brk: %s\n", dlerror());
            return -1;
        }
    }
    printf("calling brk(2) for %p\n", addr);
    return real_brk(addr);
}
and then LD_PRELOAD that shared object to intercept brk(2).
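Building and using the shim could look like this (the file and library names are just examples):
gcc -shared -fPIC -o libbrkshim.so brk_shim.c -ldl
LD_PRELOAD=./libbrkshim.so ./target_program
Keep in mind this only catches callers that go through the C library's brk(); code that issues the raw syscall instruction itself bypasses the shim entirely, which is why ptrace or setrlimit is more reliable for untrusted binaries.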

Related

OBJC_PRINT_VTABLE_IMAGES and OBJC_PRINT_VTABLE_SETUP do not show any output

I've tried to use the OBJC_PRINT_VTABLE_IMAGES and OBJC_PRINT_VTABLE_SETUP environment variables on an Objective-C executable in order to learn about the vtable mechanism in Objective-C objects. Unfortunately the mentioned environment variables have no effect on console output, despite the fact that the runtime acknowledged that the variables were set:
» OBJC_PRINT_OPTIONS=1 OBJC_PRINT_VTABLE_IMAGES=YES /Applications/TextEdit.app/Contents/MacOS/TextEdit
objc[41098]: OBJC_PRINT_OPTIONS is set
objc[41098]: OBJC_PRINT_VTABLE_IMAGES is set
I've tried to use both variables on executables provided by the system (TextEdit) and on my own, with no effect.
The whole vtable mechanism in Objective-C objects is obscure. It's hard to find information about this mechanism on Apple's pages. There is some info from other sources, but no official documentation:
http://www.sealiesoftware.com/blog/archive/2011/06/17/objc_explain_objc_msgSend_vtable.html
http://cocoasamurai.blogspot.com/2010/01/understanding-objective-c-runtime.html
Why are these variables not working? Are vtables deprecated in the current version of Objective-C?
In this case, the answer is pretty straightforward - vtable dispatch is no longer optimized in the objective-c runtime, and was probably a bad idea in the first place.
vtable-based dispatch was one of the first attempts to speed up frequent calls in the objective-c runtime, but note that it predates the current method-caching solution. Using a fixed set of selectors, as the vtable solution did, not only means increased memory for every class in the runtime; it also means that if your architecture doesn't happen to call, say, isEqualToString: frequently, you now have a completely wasted pointer for EVERY class in the runtime that overrides one of those selectors. Whoops.
Also, note that vtable dispatch by design couldn't work on 32-bit architectures, which meant that once the iOS SDK was released and 32-bit was again a reasonable target for objective-c, that optimization simply couldn't work there.
The relevant documentation that I can find for this is in objc-abi.h:
#if TARGET_OS_OSX && defined(__x86_64__)
// objc_msgSend_fixup() is used for vtable-dispatchable call sites.
OBJC_EXPORT void objc_msgSend_fixup(void)
__OSX_DEPRECATED(10.5, 10.8, "fixup dispatch is no longer optimized")
__IOS_UNAVAILABLE __TVOS_UNAVAILABLE __WATCHOS_UNAVAILABLE;
Nowadays, there aren't many vestigial fragments of vtable dispatch left in the runtime. A quick grep over the codebase shows a few places in objc-runtime-new.mm:
#if SUPPORT_FIXUP
    // Fix up old objc_msgSend_fixup call sites
    for (EACH_HEADER) {
        message_ref_t *refs = _getObjc2MessageRefs(hi, &count);
        if (count == 0) continue;
        if (PrintVtables) {
            _objc_inform("VTABLES: repairing %zu unsupported vtable dispatch "
                         "call sites in %s", count, hi->fname());
        }
        for (i = 0; i < count; i++) {
            fixupMessageRef(refs+i);
        }
    }
    ts.log("IMAGE TIMES: fix up objc_msgSend_fixup");
#endif
And
/***********************************************************************
* fixupMessageRef
* Repairs an old vtable dispatch call site.
* vtable dispatch itself is not supported.
**********************************************************************/
static void
fixupMessageRef(message_ref_t *msg)
Which pretty clearly indicates that it's not supported.
See also, the method stub for it (if you were to do it without a compiler generated call-site), found in objc-msg-x86_64.s:
ENTRY _objc_msgSend_fixup
int3
END_ENTRY _objc_msgSend_fixup
Where int3 is the x86 breakpoint instruction, which raises SIGTRAP and would (usually) cause a crash if a debugger isn't attached.
So, while vtable dispatch is an interesting note in the history of objective-c, it should be looked back on as little more than an experiment from a time when we weren't quite familiar with the best ways to optimize common method calls.

What is the significance of the "volatile" key word with respect to Embedded Systems?

I have recently been working on learning embedded systems programming on my own. I have observed a fairly high usage of the volatile qualifier when declaring variables.
What is the significance of volatile when declaring a variable in embedded systems programming?
Basically, when should the keyword be used? For example, I read this StackOverflow post, but I didn't understand how it applies in an embedded environment. I did read something about compiler optimization and use of the keyword, and also something related to memory-mapped registers, but I don't understand when to use it.
Let's have a look at an example. When you look at C header files for PIC microcontrollers, you will see that many elements are declared volatile:
extern volatile unsigned char PORTB # 0x006;
As you have read, the volatile keyword prevents the compiler from optimizing accesses to that variable. Suppose you write a program that does the following:
PORTB = 0x00; // set all of port B low
while (PORTB == 0x00); // wait for any pin to get high
// do something else
When the compiler optimises this code, it will recognise the second line as an infinite loop: the condition is true and never becomes false within the loop body. Therefore, everything after the infinite loop does not need to be compiled, as it will never be run. Hence, the compiler may decide not to include that part of the code in the generated assembly code.
However, PORTB is actually linked to a physical port. It is a hardware port whose value may be altered by the external circuitry. This means that although the loop seems infinite, it doesn't have to be. The compiler can't possibly know this.
That's where volatile comes in. When PORTB is declared volatile, the compiler won't do any optimisation based on reasoning about PORTB. It will assume that its value may be changed at any time by external factors.
In the embedded systems world, one of the key aspects of the volatile keyword is that it denotes a variable that may change at any time (e.g. an external/hardware data input, such as an ADC), and therefore the compiler must not optimise away accesses to it.
But specifically, when used with a control register, it indicates that a read access may in fact change the data!
As a general rule of thumb, I would recommend the use of the volatile qualifier in all of the following:
All hardware register accesses (read and write)
All variables that are accessible in multiple threads (especially interrupt handlers)
Note: accessing a volatile is not necessarily atomic, so it is imperative that you know your hardware and your code structure.
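To make the second rule concrete, here is a minimal sketch of a flag shared between an interrupt handler and the main loop; the ISR name and the surrounding setup are hypothetical and depend on your part and toolchain:
volatile unsigned char rx_ready = 0;     /* set by the ISR, read and cleared by main() */

void uart_rx_isr(void)                   /* hypothetical UART receive interrupt handler */
{
    rx_ready = 1;
}

int main(void)
{
    for (;;) {
        while (!rx_ready)                /* without volatile, the compiler could read    */
            ;                            /* rx_ready once and spin here forever          */
        rx_ready = 0;
        /* handle the received byte here */
    }
}
As the note above says, volatile only forces the accesses to actually happen; it does not make read-modify-write sequences atomic, so anything more complex than a single flag still needs interrupts disabled around the access, or some other synchronization.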
The volatile keyword is primarily used to tell the compiler that the value of the variable may change at any time. It also tells the compiler not to apply optimizations to the variable. I am not too much of an expert on this, but below is a good reference that I have referred to in the past.
volatile is a qualifier that is applied to a variable when it is declared. It tells the compiler that the value of the variable may change at any time-without any action being taken by the code the compiler finds nearby. The implications of this are quite serious. However, before we examine them, let's take a look at the syntax.
Reference:
Introduction to the volatile keyword
Let me put it in another perspective: it is exactly the opposite of the const keyword.
When the compiler encounters the const qualifier on a variable, it checks whether any function or statement modifies it after it has been initialized, and flags an error if so.
volatile is exactly the opposite: the variable can be changed by any function at any time. Hence the compiler does not apply optimizations to it.
You see this mostly in embedded systems programming due to the use of interrupts, which makes some programming logic constructs look redundant to the compiler.
While the statements about optimization are correct, they seem a little unclear to me. Here is what is really going on.
If you don't use the volatile keyword, the C compiler may optimize that variable into a register it isn't using at the time. This makes for fewer assembly instructions and the code will execute faster.
For example, consider the following...
extern int my_port; // my_port is defined in a different module somewhere
// presumably a memory mapped hardware port
while (my_port > 0) { /* do stuff */ }
The compiler may decide to read my_port into a register only once, before the actual while statement; then each time it tests my_port it will look at the register, not the memory location.
If, however, my_port is a hardware port, the port may change but the register won't, and the while condition will never change.
The loop variable (the register) will be "out of phase" with the actual variable (my_port).
Thus the need for the keyword volatile.
Volatile tells C, "Don't optimize this variable into a reg, but read it each and every time you need it."
More instructions are generated, code is a bit slower, but it is always accurate.
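The fix for the example above is simply to declare the port volatile (the names mirror the hypothetical my_port from the answer):
extern volatile int my_port;   /* memory-mapped hardware port, defined elsewhere */

while (my_port > 0) {
    /* do stuff; my_port is re-read from memory on every iteration */
}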

loading shared library into shared memory

Is there any way I can load a shared library into shared memory in one process, so that some other process can simply map that shared memory (at the same address) and invoke its functions? I understand that the external calls in the shared library need an additional jump through process-specific memory locations to reach the appropriate functions (like the ELF PLT). But is such a thing viable with today's tools?
But is such a thing viable with today's tools?
Not with today's tools, nor ever.
Sure, if your shared library has completely self-contained functions, then it will work. But the moment your library references external data or functions, you will crash and burn.
I understand that the external calls in the shared library need an additional jump through process-specific memory locations to reach the appropriate functions
I don't think you understand. Let's consider an example:
void *foo() { return malloc(1); }
When this is built into a shared library on Linux, the result is:
0x00000000000006d0 <+0>: mov $0x1,%edi
0x00000000000006d5 <+5>: jmpq 0x5c0 <malloc@plt>
and
Dump of assembler code for function malloc@plt:
0x00000000000005c0 <+0>: jmpq *0x200a5a(%rip) # 0x201020 <malloc@got.plt>
0x00000000000005c6 <+6>: pushq $0x1
0x00000000000005cb <+11>: jmpq 0x5a0
So the question is: where will that jmpq *0x200a5a(%rip) go in the second process? Answer: one of two places.
If the first process has already called malloc (very likely), then the jmpq will go to the address of malloc in the first process, which is exceedingly unlikely to be the address of malloc in the second process, and more likely to be unmapped or in the middle of some data. Either way, you crash.
If the first process has not yet called malloc, then the jmpq in the second process will jump to the address of the runtime loader's resolver function (ld-linux.so.2 or similar on Linux, ld.so on Solaris). Again, that address is very unlikely to also be the address of the resolver in the second process, and if it isn't, you crash.
But it gets worse from here. If by some improbable magic you ended up actually calling malloc in the second process, that malloc is itself very likely to crash, because it will try to use data structures it has set up previously, using memory obtained from sbrk or mmap. These data structures are present in the first process, but not in the second, and so you crash again.
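If you want to see this for yourself, the disassembly above comes from an ordinary shared-library build; something like the following should reproduce it (library and source file names are just examples):
gcc -shared -fPIC -o libfoo.so foo.c
objdump -d libfoo.so           # look at foo and the .plt stubs
objdump's formatting differs slightly from the gdb dump shown above, but the PLT/GOT indirection is the same.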

Is BOOL read/write atomic in Objective C?

What happens when two threads set a BOOL to YES "at the same time"?
Here is code for solution suggested by Jacko.
Use volatile uint32_t with OSAtomicOr32Barrier and OSAtomicAnd32Barrier
#import <libkern/OSAtomic.h>
volatile uint32_t _IsRunning;
- (BOOL)isRunning {
return _IsRunning != 0;
}
- (void)setIsRunning:(BOOL)allowed {
if (allowed) {
OSAtomicOr32Barrier(1, & _IsRunning); //Atomic bitwise OR of two 32-bit values with barrier
} else {
OSAtomicAnd32Barrier(0, & _IsRunning); //Atomic bitwise AND of two 32-bit values with barrier.
}
}
No. Without a locking construct, reading/writing any type variable is NOT atomic in Objective C.
If two threads write YES at the same time to a BOOL, the result is YES regardless of which one gets in first.
Please see: Synchronizing Thread Execution
I would have to diverge from the accepted answer. Sorry.
While Objective-C does not guarantee that BOOL properties declared as nonatomic are in fact atomic, I'd have to guess that the hardware you most care about (all iOS and macOS devices) has instructions to perform byte reads and stores atomically. So, unless Apple comes out with Road Light OS running on an IBM microcontroller that has a 5-bit-wide bus to send 10-bit bytes over, you could just as well use nonatomic BOOLs in a situation that calls for atomic BOOLs. The code would not be portable to Road Light OS, but if you can sacrifice the future-proofing of your code, nonatomic is fine for this use case.
I'm sure there are hardened individuals on s.o. who would rise to the challenge of disassembling the synthesized BOOL getter and setter for the atomic/nonatomic cases to see what the difference is. At least on ARM.
Your takeaway from this is likely this:
you can declare BOOL properties as atomic and it won't cost you a dime on all the HW that iOS and macOS intrinsically support.
memory barriers are orthogonal to atomicity.
you most definitely should not use 4-byte properties to store booleans, unless you are into [very] fuzzy logic. It's idiotic and wasteful, and you don't want to be a clone of a Java programmer who can't tell a float from a double, or do you?
BOOL variables (which obviously do not support the atomic/nonatomic decorators) would not be atomic on some narrow-bus architectures that Objective-C would not be used on anyway (microcontrollers, with or without some [very] micro OS, are C & assembly territory I suppose; they don't typically need the luggage the objc runtime would bring).
What happens when two threads set a BOOL to YES "at the same time"?
Then its value will be YES. If two threads write the same value to the same memory location, the memory location will have that value; whether the write is atomic or not plays no role. It would only play a role if two threads wrote different values to the same memory location, or if one thread wrote to it while another one was reading from it.
Is BOOL read/write atomic in Objective C?
It is if your hardware is a Macintosh running macOS. BOOL is uint32_t on PPC systems and char on Intel systems and writing these data types is atomic on their respective systems.
The Obj-C language makes no such guarantee, though. On other systems it depends on the compiler used and how BOOL is defined for that platform. Most compilers (gcc, clang, ...) guarantee that writing a variable of int size is always atomic; whether other sizes are atomic depends on the CPU.
Note that atomic is not the same as thread-safe. Writing a BOOL is not a memory barrier. The compiler and the CPU may reorder instructions around a BOOL write:
a = 10;
b = YES;
c = 20;
There is no guarantee that these instructions are executed in that order. The fact that b is YES does not mean that a is 10. The compiler and CPU are free to shuffle these three instructions around as desired, since they don't depend on each other. Explicit atomic instructions, as well as locks, mutexes and semaphores, are usually memory barriers: they instruct the compiler and CPU not to move instructions located before the operation beyond it, and not to move instructions located after the operation before it (it's a hard border that instructions may not pass).
Also cache consistency is not guaranteed. Even after you set a BOOL to YES, some other thread may still see it as NO for a limited amount of time. Memory barrier operations are usually also operations that ensure cache synchronization among all threads/cores/CPUs in the system.
And to add something really useful here as well, this is how you can ensure that setting a boolean value is atomic and acts as a memory barrier in 2020 using C11 which will also work in Obj-C Code:
#import <stdatomic.h>
// ...
volatile atomic_bool b = true;
// ...
atomic_store(&b, true);
// ...
atomic_store(&b, false);
Not only will this code guarantee atomic writes to the bool (for which the system will choose an appropriate type), it will also act as a memory barrier (Sequentially Consistent).
To read the boolean atomically from another thread, you'd use
bool x = atomic_load(&b);
You can also use atomic_load_explicit and atomic_store_explicit and pass an explicit memory order, which lets you control in a more fine-grained way which kinds of memory reordering are allowed and which are not.
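For example, a release/acquire pair is a common, weaker-than-sequentially-consistent choice. This is just a sketch of what the explicit variants look like; the flag name and the surrounding threads are hypothetical:
#include <stdatomic.h>
#include <stdbool.h>

atomic_bool ready = false;

/* writer thread: publish data, then set the flag with release semantics */
void publish(void)
{
    /* ... fill in the data the reader will consume ... */
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* reader thread: if this returns true, the writer's earlier stores are visible */
bool consume(void)
{
    return atomic_load_explicit(&ready, memory_order_acquire);
}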
Learn more about your possibilities here:
http://llvm.org/docs/Atomics.html
Always read the "Notes for optimizers" sections to see which memory reorderings are allowed. If in doubt, always use sequentially consistent ordering (memory_order_seq_cst, which is the default if not specified). It will not give the fastest performance, but it's the safest option, and you really should only use something else if you know what you are doing.

Using Printf to display on serial port of an ARM microcontroller

I would like to use printf to display text on a serial port of an ARM microcontroller. I am unable to do so. Any help is appreciated.
My init_serial looks like this
void init_serial (void)
{
PINSEL0 = 0x00050000; /* Enable RXD1 TxD1 */
U1LCR = 0x00000083; /*8 bits, 1 Stop bit */
U1DLL = 0x000000C2; /*9600 Baud Rate #12MHz VPB Clock */
U1LCR = 0x00000003; /* DLAB=0*/
}
which is obviously wrong.
For microcontrollers, you typically have to define your own putc function to send bytes to whichever UART you're using. printf will then call your putc.
Check the documentation for the libraries supplied with your compiler.
Note that this is entirely unrelated to how you intialise your UART. All that matters is which UART you're using.
(On an unrelated issue, rather than saying:
PINSEL0 = 0x00050000; /* Enable RXD1 TxD1 */
U1LCR = 0x00000083; /*8 bits, 1 Stop bit */
there are typically #defines for registers which (usually) aid readability, provide a link to the bit names in the documentation, and reduce the need for comments to be added and maintained on every line like these. For example:
PINSEL0 = PINSEL0_RXD1EN | PINSEL0_TXD1EN;
U1LCR = U1LCR_8BITS | U1LCR_1STOPBIT;
..and so on.)
To make printf(), puts() etc. work on an embedded platform, you need to implement some hooks that the C library calls into. Exactly which hooks depends on the C library provided with your compiler, so this is probably compiler-dependent. But in many cases the library just requires you to provide a putc() function (or similar name), which takes a character (generated by the printf() library machinery) and sends it to your chosen output device. That could be a memory buffer, serial port, USB message, whatever.
From the point of view of the C library, the putc() function would be run-to-completion, so it's up to you whether you implement it to be a simple blocking function (waiting until the serial port is free and sending the character), or non-blocking (putting it into a buffer, to be sent by a background interrupt task; but the buffer might fill up if you output enough bytes fast enough, and then you have to either block or discard characters). You can also make it work properly with your RTOS if you have one, implementing a blocking write that sleeps on a semaphore until the serial port is available.
So, in summary, read the documentation for your compiler and its C library, and it should tell you what you need to do to make printf() work.
Example links for AVR micro with GCC compiler:
AVR libc stdio docs
a blog post
ARM GCC compiler using newlib C library:
Newlib C library docs
Defining host interface - syscalls - write() function
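To make the newlib route concrete for a part like the one in the question (an LPC2000-series UART1), the retargeting hook might look roughly like this; the register addresses are assumptions taken from LPC21xx documentation, so check your own part's manual:
/* route printf() output to UART1 by implementing newlib's _write() hook */
#define U1THR (*(volatile unsigned char *)0xE0010000)  /* UART1 transmit holding register (assumed address) */
#define U1LSR (*(volatile unsigned char *)0xE0010014)  /* UART1 line status register (assumed address) */

static void uart1_putc(char c)
{
    while (!(U1LSR & 0x20))        /* wait for THRE: transmit holding register empty */
        ;
    U1THR = c;
}

int _write(int fd, char *buf, int len)
{
    (void)fd;                      /* stdout and stderr both go to the UART here */
    for (int i = 0; i < len; i++)
        uart1_putc(buf[i]);
    return len;                    /* tell newlib every byte was written */
}
With that in place (and the UART initialised as in the question), printf() ends up calling _write(), which pushes the characters out of UART1 one at a time; this is the simple blocking variant described above.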
I'm not sure about ARM in particular...
For some chips, within the IDE, you need to specify that you want a heap in order to use printf, and how big it should be. The toolchain won't automatically put one in.
Check the menus of your programmer/IDE and see if there is a place to specify the heap size.
And I agree with Steve, this is only if you can actually use the printf, otherwise write your own little snippet.