kernel /proc/pid/stack format: what do the addresses mean?

given a stack like the following:
cat /proc/17019/stack
[<0>] futex_wait_queue_me+0xc4/0x120
[<0>] futex_wait+0x10a/0x250
[<0>] do_futex+0x325/0x500
[<0>] SyS_futex+0x13b/0x180
[<0>] do_syscall_64+0x73/0x130
[<0>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[<0>] 0xffffffffffffffff
Take the line futex_wait_queue_me+0xc4/0x120 as an example: what do 0xc4 and 0x120 mean?
And additionally, how can I figure out the line of code corresponding to this address?

futex_wait_queue_me+0xc4/0x120 - in this stack frame, the saved instruction pointer is at offset 0xc4 from the start of futex_wait_queue_me, and 0x120 is the total size of the function; both values are hexadecimal. For kernel functions, you can get the corresponding source line by disassembling a vmlinux built with debug symbols (for example with objdump or gdb) and mapping the offset.
As shown below for system_call_fastpath, the offset 0x22 corresponds to <+34> (decimal) in the disassembled output.
[root@linux ~]# cat /proc/26581/stack
[<ffffffff9f28eace>] ep_poll+0x23e/0x360
[<ffffffff9f28ff9d>] SyS_epoll_wait+0xed/0x120
[<ffffffff9f774ddb>] system_call_fastpath+0x22/0x27
[<ffffffffffffffff>] 0xffffffffffffffff
(gdb) disassemble system_call_fastpath
Dump of assembler code for function system_call_fastpath:
0xffffffff81774db9 <+0>: cmp $0x14c,%rax
0xffffffff81774dbf <+6>: jae 0xffffffff81774f43 <badsys>
0xffffffff81774dc5 <+12>: sbb %rcx,%rcx
0xffffffff81774dc8 <+15>: and %rcx,%rax
0xffffffff81774dcb <+18>: mov %r10,%rcx
0xffffffff81774dce <+21>: mov -0x7e7fd6c0(,%rax,8),%rax
0xffffffff81774dd6 <+29>: callq 0xffffffff81386770 <__x86_indirect_thunk_rax>
0xffffffff81774ddb <+34>: mov %rax,0x50(%rsp)
End of assembler dump.
(gdb)
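If you have a vmlinux with debug symbols for the running kernel, gdb can also map the offset straight to a source line. A minimal sketch (the debug-vmlinux path below is distribution-specific, so treat it as an example):
$ gdb /usr/lib/debug/lib/modules/$(uname -r)/vmlinux
(gdb) list *(futex_wait_queue_me+0xc4)
list *(symbol+offset) (or info line *(symbol+offset)) prints the file and line that the offset falls in. The kernel tree also ships scripts/faddr2line, which accepts the func+0xc4/0x120 form directly.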

Related

How to pass function parameters into inline assembly blocks without assigning them to register variables in c++ [duplicate]

I am trying to write a function which prints a string to stdout without including <cstdio> or <iostream>.
For this I am trying to pass 2 parameters (const char* and const unsigned) into the asm(...) section in C++ code and call the write syscall.
This works fine:
void writeInAsm(const char* str, const unsigned len) {
    register const char* arg3 asm("rsi") = str;
    register const unsigned arg4 asm("rdx") = len;
    asm(
        "mov rax, 1 ;" // write syscall
        "mov rdi, 1 ;" // file descriptor 1 - stdout
        "syscall ;"
    );
}
Is it possible to do this without those first two lines in which I assign parameters to registers?
The following lines don't work:
mov rsi, str;
// error: relocation R_X86_64_32S against undefined symbol `str' can not be used when making a PIE object; recompile with -fPIC
// compiled with -fPIC - still got this error
mov rsi, [str];
// error: relocation R_X86_64_32S against undefined symbol `str' can not be used when making a PIE object; recompile with -fPIC
// compiled with -fPIC - still got this error
mov rsi, dword ptr str;
// incorrect register `rsi' used with `l' suffix
mov rsi, dword ptr [str];
// incorrect register `rsi' used with `l' suffix
I am compiling with g++ -masm=intel. I am on x86_64, Intel® Core™ i7-7700HQ CPU @ 2.80GHz × 8, Ubuntu 19.04, 5.0.0-36-generic kernel (if it matters).
$ g++ --version
g++ (Ubuntu 8.3.0-6ubuntu1) 8.3.0
Edit: According to Compiler Explorer, the following can be used:
void writeInAsm(const char* str, const unsigned len) {
    asm(
        "mov rax, 1 ;"
        "mov rdi, 1 ;"
        "mov rsi, QWORD PTR [rbp-8] ;"
        "mov edx, DWORD PTR [rbp-12] ;"
        "syscall ;"
    );
}
But is it always the rbp register, and how does this change with a larger number of parameters?
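For reference, the usual way to avoid both the register variables and the rbp offsets is extended asm with input constraints, so the compiler itself places each argument in the required register. A minimal sketch (not from the original post; it keeps the question's signature):
void writeInAsm(const char* str, const unsigned len) {
    long ret;
    asm volatile(
        "syscall"                          // only the instruction itself is hand-written
        : "=a"(ret)                        // rax: syscall return value
        : "a"(1),                          // rax = 1 (write on x86-64)
          "D"(1),                          // rdi = 1 (stdout)
          "S"(str),                        // rsi = buffer pointer
          "d"((unsigned long)len)          // rdx = length, widened to the full register
        : "rcx", "r11", "memory");         // syscall clobbers rcx and r11
}
Because the operands go through constraints rather than hand-written mov instructions, this builds the same way with -masm=intel or the default AT&T syntax, and it extends to any number of parameters as long as each one gets a suitable constraint.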

The relationship between _NSConcreteMallocBlock and NSMallocBlock?

I was digging into what a block really is. I found that in _Block_copy_internal() we assign _NSConcreteMallocBlock to result->isa, but _NSConcreteMallocBlock is an array of 32 void * elements, which confused me a lot. Why is _NSConcreteMallocBlock defined as an array, and how does dyld link _NSConcreteMallocBlock to the NSMallocBlock class?
Declaring it as an array of 32 pointers simply reserves space for the class object they will put there later.
If you read the comment in https://opensource.apple.com/source/libclosure/libclosure-65/data.c
These data areas are set up by Foundation to link in as real classes
post facto.
Foundation is closed-source so you cannot see how that works and what content they put into that space.
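In other words, the definitions in that data.c are nothing more than reserved, pointer-sized storage, roughly (paraphrasing the linked file):
void * _NSConcreteMallocBlock[32] = { 0 };  // 32 void* slots: enough room for Foundation
void * _NSConcreteStackBlock[32] = { 0 };   // to build a real class object in place later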
Thanks, I have now understood this by disassembling the CoreFoundation and Foundation frameworks and finding this code:
___CFMakeNSBlockClasses:
0000000000008029 leaq 0x4522b0(%rip), %rdi ## literal pool for: "__NSStackBlock"
0000000000008030 callq 0x1d4858 ## symbol stub for: _objc_lookUpClass
0000000000008035 movq 0x46f07c(%rip), %rdx ## literal pool symbol address: __NSConcreteStackBlock
000000000000803c movq %rdx, %rcx
000000000000803f subq $-0x80, %rcx
0000000000008043 leaq 0x4522a5(%rip), %rsi ## literal pool for: "__NSStackBlock__"
000000000000804a movq %rax, %rdi
This assembly code corresponds to the following Objective-C code:
Class __NSStackBlock = _objc_lookUpClass("__NSStackBlock");
objc_initializeClassPair_internal(__NSStackBlock, "__NSStackBlock__", &__NSConcreteStackBlock, &__NSConcreteStackBlock+0x80);

Casting Failed with Parse

I am using Parse to populate a table view. I am trying to load the table rows. The error is shown below.
libswiftCore.dylib`swift_dynamicCastObjCClassUnconditional:
0x103710991: je 0x1037109ac ; swift_dynamicCastObjCClassUnconditional + 44
0x103710993: movq 0x7f236(%rip), %rsi ; "isKindOfClass:"
0x1037109a0: callq 0x10371346a ; symbol stub for: objc_msgSend
0x1037109aa: je 0x1037109b3 ; swift_dynamicCastObjCClassUnconditional + 51
0x1037109b3: leaq 0xc158(%rip), %rax ; "Swift dynamic cast failed"
0x1037109ba: movq %rax, 0x87427(%rip) ; gCRAnnotations + 8
My code line is:
let array:NSArray = self.cartoonData.reverseObjectEnumerator().allObjects
self.cartoonData = array as NSMutableArray
I think this is the line causing the error, but I don't know how I can fix it.
It seems the error speaks for itself:
Swift dynamic cast failed
which I interpret as: array cannot be cast to NSMutableArray. You should create a mutable copy of it:
self.cartoonData = NSMutableArray(array: array)

Why does GCC insist non-leaf callees copy CBV structs to the stack?

Consider the following C99 code:
#include <stdio.h>
#include <stdint.h>
struct baz { uint64_t x, y; };
uint64_t foo(uint64_t a, uint64_t b, struct baz c)
{
return a + b + c.x + c.y;
}
void bar(uint64_t a, uint64_t b, struct baz c)
{
printf("%lu\n", a);
}
The behavior I expect, when compiled with gcc -O3, is that c is passed in registers to both foo and bar, is accessed using registers in foo, and is entirely ignored in bar. GCC produces code which does this for foo. However, in bar, c is copied from registers to the stack, and is then promptly ignored:
.file "pbv.c"
.text
.p2align 4,,15
.globl foo
.type foo, @function
foo:
.LFB22:
.cfi_startproc
leaq (%rcx,%rdx), %rdx
leaq (%rdx,%rdi), %rdi
leaq (%rdi,%rsi), %rax
ret
.cfi_endproc
.LFE22:
.size foo, .-foo
.section .rodata.str1.1,"aMS",#progbits,1
.LC0:
.string "%lu\n"
.text
.p2align 4,,15
.globl bar
.type bar, @function
bar:
.LFB23:
.cfi_startproc
movq %rdx, -24(%rsp)
movl $.LC0, %esi
movq %rdi, %rdx
xorl %eax, %eax
movl $1, %edi
movq %rcx, -16(%rsp)
jmp __printf_chk
.cfi_endproc
.LFE23:
.size bar, .-bar
.ident "GCC: (Ubuntu/Linaro 4.4.6-11ubuntu2) 4.4.6"
.section .note.GNU-stack,"",#progbits
(Note that a and b are passed in %rdi and %rsi, and c is passed in %rdx and %rcx.)
The only reason I can surmise for this is some sort of ABI requirement (e.g. for interaction with longjmp). I cannot find any optimization (-f) options for GCC, nor GCC-specific annotations which inhibit this behavior. Annotating c with register does not help.
This happens with different targets as well. (Notably, on the TileGX, foo has space allocated and deallocated on the stack, but nothing is stored there.) I have tested both GCC 4.4.6 and 4.6.1.
Is this expected behavior or a bug in GCC? Either way, is there some way to work around it (beside using call-by-reference or ensuring bar can be a leaf)?
This shortcoming is the same one described in GCC bug 44194; the patch for it is present in the very latest version of GCC (4.7.2).
The cause is, roughly, that a call to printf (or to any function) is considered able to access anything in memory, including the caller's stack-based locals, so the stack copy has to be kept. The patch stops stack-based locals whose address does not escape from being considered reachable by the callee.
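To make that concrete, here is a small sketch (not from the original answer) of the case the old analysis was being conservative about: once the address of the by-value stack copy escapes, the callee really can read it, so the copy must live in memory. In bar above the address never escapes, which is exactly what the patched GCC notices.
#include <stdio.h>
#include <stdint.h>
struct baz { uint64_t x, y; };
static struct baz *leaked;               /* set by escapes(), read by observe() */
void observe(void)
{
    if (leaked)
        printf("%lu\n", (unsigned long)leaked->x);  /* legitimately reads the caller's stack copy */
}
void escapes(struct baz c)
{
    leaked = &c;     /* the address of the by-value copy escapes here... */
    observe();       /* ...so this call can reach it, and the stack copy is required */
    leaked = NULL;
}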

Assignment or memcpy? What is the preferred approach to setting an array member variable?

For this example, I am working with objective-c, but answers from the broader C/C++ community are welcome.
@interface BSWidget : NSObject {
    float tre[3];
}
@property(assign) float* tre;
.
- (void)assignToTre:(float*)triplet {
    tre[0] = triplet[0];
    tre[1] = triplet[1];
    tre[2] = triplet[2];
}
.
- (void)copyToTre:(float*)triplet {
    memcpy(tre, triplet, sizeof(tre));
}
So between these two approaches, and considering the fact that these setter functions will generally only handle dimensions of 2, 3, or 4...
What would be the most efficient approach for this situation?
Will gcc generally reduce these to the same basic operations?
Thanks.
A quick test seems to show that the compiler, when optimising, replaces the memcpy call with the instructions to perform the assignment.
Disassembling the following code, compiled both unoptimised and with -O2, shows that in the optimised case the testMemcpy function does not contain a call to memcpy.
/* Headers and an example struct layout added so the snippet compiles on its own;
   the original post omitted them. */
#include <stdlib.h>
#include <string.h>
struct test { int a; char b; };   /* assumed layout: 8 bytes, matching the disassembly below */
struct test src = { .a = 1, .b = 'x' };
void testMemcpy(void)
{
    struct test *dest = malloc(sizeof(struct test));
    memcpy(dest, &src, sizeof(struct test));
}
void testAssign(void)
{
    struct test *dest = malloc(sizeof(struct test));
    *dest = src;
}
Unoptimised testMemcpy, with a memcpy call as expected
(gdb) disassemble testMemcpy
Dump of assembler code for function testMemcpy:
0x08048414 <+0>: push %ebp
0x08048415 <+1>: mov %esp,%ebp
0x08048417 <+3>: sub $0x28,%esp
0x0804841a <+6>: movl $0x8,(%esp)
0x08048421 <+13>: call 0x8048350 <malloc@plt>
0x08048426 <+18>: mov %eax,-0xc(%ebp)
0x08048429 <+21>: movl $0x8,0x8(%esp)
0x08048431 <+29>: movl $0x804a018,0x4(%esp)
0x08048439 <+37>: mov -0xc(%ebp),%eax
0x0804843c <+40>: mov %eax,(%esp)
0x0804843f <+43>: call 0x8048340 <memcpy@plt>
0x08048444 <+48>: leave
0x08048445 <+49>: ret
Optimised testAssign
(gdb) disassemble testAssign
Dump of assembler code for function testAssign:
0x080483f0 <+0>: push %ebp
0x080483f1 <+1>: mov %esp,%ebp
0x080483f3 <+3>: sub $0x18,%esp
0x080483f6 <+6>: movl $0x8,(%esp)
0x080483fd <+13>: call 0x804831c <malloc@plt>
0x08048402 <+18>: mov 0x804a014,%edx
0x08048408 <+24>: mov 0x804a018,%ecx
0x0804840e <+30>: mov %edx,(%eax)
0x08048410 <+32>: mov %ecx,0x4(%eax)
0x08048413 <+35>: leave
0x08048414 <+36>: ret
Optimised testMemcpy does not contain a memcpy call
(gdb) disassemble testMemcpy
Dump of assembler code for function testMemcpy:
0x08048420 <+0>: push %ebp
0x08048421 <+1>: mov %esp,%ebp
0x08048423 <+3>: sub $0x18,%esp
0x08048426 <+6>: movl $0x8,(%esp)
0x0804842d <+13>: call 0x804831c <malloc@plt>
0x08048432 <+18>: mov 0x804a014,%edx
0x08048438 <+24>: mov 0x804a018,%ecx
0x0804843e <+30>: mov %edx,(%eax)
0x08048440 <+32>: mov %ecx,0x4(%eax)
0x08048443 <+35>: leave
0x08048444 <+36>: ret
Speaking from a C background, I recommend using direct assignment. That version of the code is more obvious as to your intent, and less error-prone if your array changes in the future and adds extra indices that your function doesn't need to copy.
The two are not strictly equivalent. memcpy is typically implemented as a loop that copies the data in fixed-size chunks (that may be smaller than a float), so the compiler probably won't generate the same code for the memcpy case. The only way to know for sure is to build it both ways and look at the emitted assembly in a debugger.
Even if the memcpy call is inlined, it will probably result in more code and slower execution time. The direct assignment case should be more efficient (unless your target platform requires special code to handle float datatypes). This is only an educated guess, however; the only way to know for sure is to try it both ways and profile the code.
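If you want to check this for the exact three-float case in the question, here is a small plain-C sketch (not from the answers) that you can compile with -O2 -S and compare:
#include <string.h>

void set_assign(float dst[3], const float src[3])
{
    dst[0] = src[0];
    dst[1] = src[1];
    dst[2] = src[2];
}

void set_memcpy(float dst[3], const float src[3])
{
    memcpy(dst, src, 3 * sizeof(float));   /* the size is a compile-time constant */
}
With a constant 12-byte size, recent GCC and Clang typically inline the memcpy into the same few moves as the assignments, which matches the conclusion drawn from the disassembly above.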
memcpy (when it is not inlined):
1. Do the function prolog.
2. Initialize the counter and the pointers.
3. Check whether there are bytes left to copy.
4. Copy memory.
5. Increment the source pointer.
6. Increment the destination pointer.
7. Increment the counter.
8. Repeat steps 3-7 another 3 or 11 times.
9. Do the function epilog.
Direct assignment:
1. Copy memory.
2. Copy memory.
3. Copy memory.
As you see, direct assignment is much faster.