How to unwind the stack on an ARM Cortex-M3 after a crash

On an ARM Cortex-M3 (STM32), the HardFault_Handler can only get a few register values (r0, r1, r2, r3, r12, lr, pc, xPSR) from the time of the crash. FP and SP are not among the values pushed onto the stack, so I could not unwind the stack.
Is there any solution for this? Thanks a lot.
[update]
Following instructions I found on the web, I added the compiler option "--use_frame_pointer" so that the ARM compiler (armcc, in the Keil uVision IDE) generates a frame pointer, but I could not find the FP in the stack. I am a real newbie here. Below is my demo code:
int test2(int i, int j)
{
return i/j;
}
int main()
{
SCB->CCR |= 0x10; // set DIV_0_TRP so that an integer divide by zero raises a fault
int a = 10;
int b = 0;
int c;
c = test2(a,b);
}
enum { r0 = 0, r1, r2, r3, r11, r12, lr, pc, psr};
void Hard_Fault_Handler(uint32_t *faultStackAddress)
{
uint32_t r0_val = faultStackAddress[r0];
uint32_t r1_val = faultStackAddress[r1];
uint32_t r2_val = faultStackAddress[r2];
uint32_t r3_val = faultStackAddress[r3];
uint32_t r12_val = faultStackAddress[r12];
uint32_t r11_val = faultStackAddress[r11];
uint32_t lr_val = faultStackAddress[lr];
uint32_t pc_val = faultStackAddress[pc];
uint32_t psr_val = faultStackAddress[psr];
}
I have two questions here:
1. I am not sure at which index FP (r11) sits in the stack, or whether it is pushed onto the stack at all. I assumed it comes just before r12, because I compared the assembly output before and after adding the option "--use_frame_pointer". I also checked the values read in Hard_Fault_Handler, and it seems r11 is not in the stack: the address I read for r11 points to a location that is not in my code.
[update] I have confirmed that FP is pushed onto the stack. The second question still needs to be answered.
See the snippets below.
Without the option "--use_frame_pointer":
test2 PROC
MOVS r0,#3
BX lr
ENDP
main PROC
PUSH {lr}
MOVS r0,#0
BL test2
MOVS r0,#0
POP {pc}
ENDP
With the option "--use_frame_pointer":
test2 PROC
PUSH {r11,lr}
ADD r11,sp,#4
MOVS r0,#3
MOV sp,r11
SUB sp,sp,#4
POP {r11,pc}
ENDP
main PROC
PUSH {r11,lr}
ADD r11,sp,#4
MOVS r0,#0
BL test2
MOVS r0,#0
MOV sp,r11
SUB sp,sp,#4
POP {r11,pc}
ENDP
2. It seems FP is not in the faultStackAddress parameter of Hard_Fault_Handler(). Where can I get the caller's FP to unwind the stack?
[update again]
Now I understand that the most recent FP (r11) is not stored in the stack. All I need to do is read the value of the r11 register, and then I can unwind the whole stack.
So my final question is how to read it using the C inline assembler. I tried the code below, but it failed to read the correct value from r11. I was following the reference at http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0472f/Cihfhjhg.html
volatile int top_fp;
__asm
{
mov top_fp, r11
}
r11's value is 0x20009DCC
top_fp's value is 0x00000004
[update 3] Below is my whole code.
int test5(int i, int j, int k)
{
char a[128] = {0} ;
a[0] = 'a';
return i/j;
}
int test2(int i, int j)
{
char a[18] = {0} ;
a[0] = 'a';
return test5(i, j, 0);
}
int main()
{
SCB->CCR |= 0x10;
int a = 10;
int b = 0;
int c;
c = test2(a,b); //create a divide by zero crash
}
/* The fault handler implementation calls a function called Hard_Fault_Handler(). */
#if defined(__CC_ARM)
__asm void HardFault_Handler(void)
{
TST lr, #4
ITE EQ
MRSEQ r0, MSP
MRSNE r0, PSP
B __cpp(Hard_Fault_Handler)
}
#else
void HardFault_Handler(void)
{
__asm("TST lr, #4");
__asm("ITE EQ");
__asm("MRSEQ r0, MSP");
__asm("MRSNE r0, PSP");
__asm("B Hard_Fault_Handler");
}
#endif
void Hard_Fault_Handler(uint32_t *faultStackAddress)
{
volatile int top_fp;
__asm
{
mov top_fp, r11
}
//TODO: use top_fp to unwind the whole stack.
}
[update 4] Finally, I figured it out. My solution:
Note: To access r11, we have to use the embedded assembler (see here), which took me a long time to figure out.
//we have to use embedded assembler.
__asm int getRegisterR11()
{
mov r0,r11
BX LR
}
//call it from Hard_Fault_Handler function.
/*
Function call stack frame:
FP1(r11) -> | lr |(High Address)
| FP2|(prev FP)
| ...|
Current FP(r11) ->| lr |
| FP1|(prev FP)
| ...|(Low Address)
With FP, we can read lr (the link register), which holds the address execution returns to when the current function returns (where you were).
The word just below it, at (current FP - 1) counted in 32-bit words, holds the previous FP.
Thus we can unwind the stack.
*/
void unwindBacktrace(uint32_t topFp, uint16_t* backtrace)
{
uint32_t nextFp = topFp;
int j = 0;
//#define BACK_TRACE_DEPTH 5
//loop backtrace using FP(r11), save lr into an uint16_t array.
for(int i = 0; i < BACK_TRACE_DEPTH; i++)
{
uint32_t lr = *((uint32_t*)nextFp);
if ((lr >= 0x08000000) && (lr <= 0x08FFFFFF))
{
backtrace[j*2] = LOW_16_BITS(lr);
backtrace[j*2 + 1] = HIGH_16_BITS(lr);
j += 1;
}
nextFp = *((uint32_t*)nextFp - 1);
if (nextFp == 0)
{
break;
}
}
}
#if defined(__CC_ARM)
__asm void HardFault_Handler(void)
{
TST lr, #4
ITE EQ
MRSEQ r0, MSP
MRSNE r0, PSP
B __cpp(Hard_Fault_Handler)
}
#else
void HardFault_Handler(void)
{
__asm("TST lr, #4");
__asm("ITE EQ");
__asm("MRSEQ r0, MSP");
__asm("MRSNE r0, PSP");
__asm("B Hard_Fault_Handler");
}
#endif
void Hard_Fault_Handler(uint32_t *faultStackAddress)
{
//get back trace
int topFp = getRegisterR11();
unwindBacktrace(topFp, persistentData.faultStack.back_trace);
}
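The frame-pointer walk above can also be tried out on a PC against a simulated stack, which makes it easier to check the bounds handling before running it inside a fault handler. The following is a minimal sketch, not the code from the question: it assumes the frame layout produced by PUSH {r11,lr} / ADD r11,sp,#4 shown earlier (saved lr at [fp], caller's fp at [fp - 4]). RAM_BASE, read_word and walk_fp_chain are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RAM_BASE 0x20000000u /* assumed start address of the simulated RAM */

/* Read one word of simulated RAM; returns 0 if addr is misaligned or out of range. */
int read_word(const uint32_t *ram, size_t words, uint32_t addr, uint32_t *out)
{
    size_t idx;
    if (addr < RAM_BASE || (addr & 3u) != 0u)
        return 0;
    idx = (size_t)((addr - RAM_BASE) / 4u);
    if (idx >= words)
        return 0;
    *out = ram[idx];
    return 1;
}

/* Follow the chain of saved frame pointers and collect the saved lr values. */
size_t walk_fp_chain(const uint32_t *ram, size_t words, uint32_t fp,
                     uint32_t *lr_out, size_t max_depth)
{
    size_t n = 0;
    assert(ram != NULL && lr_out != NULL);
    while (n < max_depth) {
        uint32_t lr, prev_fp;
        if (!read_word(ram, words, fp, &lr))           /* [fp] holds the saved lr */
            break;
        if (!read_word(ram, words, fp - 4u, &prev_fp)) /* [fp - 4] holds the caller's fp */
            break;
        lr_out[n++] = lr;
        if (prev_fp <= fp) /* callers live at higher addresses; this also stops on 0 */
            break;
        fp = prev_fp;
    }
    return n;
}
```

On the target, read_word would be replaced by a range check against the real stack limits before each dereference, so that a corrupted frame pointer cannot cause a second fault inside the handler.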

A very primitive way to unwind the stack in this case is to read all stack memory above the SP seen at the time of HardFault_Handler and process it with arm-none-eabi-addr2line. Every link register value saved on the stack will be translated into a source line (remember that the actual call site is the instruction just before the address LR points to). Note that if functions in between were called with a plain branch instruction (b) instead of branch and link (bl), you will not see them with this method.
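The scan described above can be sketched as follows. This is only an illustration under stated assumptions: the code region bounds are passed in by the caller, and a word is treated as a candidate return address when it lies in that region and has bit 0 set (return addresses stored by Thumb code have that bit set). The function name scan_for_return_addresses is hypothetical. Ordinary data or function pointers can match too, so the output is a list of candidates to feed to arm-none-eabi-addr2line, not a guaranteed call chain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Collect words from a stack dump that look like Thumb return addresses.
 * Returns the number of candidates written to out (at most max_out). */
size_t scan_for_return_addresses(const uint32_t *stack, size_t words,
                                 uint32_t code_start, uint32_t code_end,
                                 uint32_t *out, size_t max_out)
{
    size_t i;
    size_t n = 0;
    assert(stack != NULL && out != NULL);
    for (i = 0; i < words && n < max_out; i++) {
        uint32_t w = stack[i];
        /* bit 0 set marks Thumb state; the range check filters out RAM pointers */
        if ((w & 1u) != 0u && w >= code_start && w < code_end)
            out[n++] = w & ~1u; /* clear the Thumb bit before symbol lookup */
    }
    return n;
}
```

With the flash range used in the question, the call would pass 0x08000000 as code_start and the end of the programmed flash as code_end.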
(I don't have enough reputation points to write comments, so I'm editing my answer):
UPDATE for question 2:
Why do you expect Hard_Fault_Handler to have any arguments? Hard_Fault_Handler is usually a function whose address is stored in the vector (exception) table. When the processor takes the exception, Hard_Fault_Handler is executed, and no argument passing is involved. Still, all registers at the time of the fault are preserved. Specifically, if you compiled without omit-frame-pointer you can just read the value of R11 (or R7 in Thumb-2 mode). However, to be sure that Hard_Fault_Handler in your code is actually the real hard fault handler, look into the startup.s code and check whether Hard_Fault_Handler is the third exception entry in the vector table (exception number 3, after the initial stack pointer value, Reset and NMI). If another function is there, it means Hard_Fault_Handler is just called explicitly from that function. See this article for details. You can also read my blog :) There is a chapter about the stack which is based on an Android example, but a lot of things are the same in general.
Also note that faultStackAddress most probably holds a stack pointer, not a frame pointer.
UPDATE 2
OK, let's clarify some things. First, please paste the code from which you call Hard_Fault_Handler. Second, I guess you call it from within the real HardFault exception handler. In that case you cannot expect R11 to be at faultStackAddress[r11]. You already mentioned this in the first sentence of your question. There will be only r0-r3, r12, lr, pc and psr.
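For reference, the eight words that the core pushes on exception entry can be described as below. This is a sketch for illustration: exception_frame_t, read_exception_frame and the FRAME_* names are hypothetical, and the layout assumes the basic frame without floating-point context (the Cortex-M3 has no FPU). Note that r11 has no slot here, which is why indexing the frame with an enum that contains r11 shifts every later register by one word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Word offsets inside the frame stacked by the core on exception entry. */
enum {
    FRAME_R0, FRAME_R1, FRAME_R2, FRAME_R3,
    FRAME_R12, FRAME_LR, FRAME_PC, FRAME_XPSR,
    FRAME_WORDS
};

typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} exception_frame_t;

/* Copy the stacked registers out of the frame that sp (MSP or PSP) points to. */
exception_frame_t read_exception_frame(const uint32_t *sp)
{
    exception_frame_t f;
    assert(sp != NULL);
    f.r0   = sp[FRAME_R0];
    f.r1   = sp[FRAME_R1];
    f.r2   = sp[FRAME_R2];
    f.r3   = sp[FRAME_R3];
    f.r12  = sp[FRAME_R12];
    f.lr   = sp[FRAME_LR];   /* lr of the interrupted code, not EXC_RETURN */
    f.pc   = sp[FRAME_PC];   /* address of the faulting instruction */
    f.xpsr = sp[FRAME_XPSR];
    return f;
}
```

In the handler from the question, faultStackAddress would be passed as sp.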
You've also written:
But there is no FP and SP in the stack. Thus I could not unwind the
stack. Is there any solution for this?
The SP is not "in the stack" because you already have it in one of the stack pointer registers (msp or psp). See again THIS ARTICLE. Also, FP is not crucial for unwinding the stack, because you can do it without it (by "navigating" through the saved link registers). Another thing: if you dump the memory below your SP, you can expect the saved FP to be right next to the saved LR, if you really need it.
Answering your last question: I don't know how you're verifying this code or how you're calling it (you need to paste the full code). You can look at the disassembly of that function to see what is happening under the hood. Another thing you can do is to follow this post as a template.

Related

_Unwind_Backtrace for different context on FreeRTOS

Hello, I am trying to implement error handling in a FreeRTOS project. The handler is triggered by the watchdog interrupt, prior to the watchdog reset. The idea is to log the task name and call stack of the failed task.
I have managed to backtrace a call stack, but in the wrong context: that of the interrupt. What I need is the context of the failed task, which is stored in pxCurrentTCB, but I do not know how to tell _Unwind_Backtrace to use it instead of the interrupt context it is called from.
So I want _Unwind_Backtrace to unwind not the context it is called from but the different context found in pxCurrentTCB. I have searched and tried to understand how _Unwind_Backtrace works, but without success, so please help.
Any help will be appreciated, especially sample code. Thank you.
_Unwind_Reason_Code unwind_backtrace_callback(_Unwind_Context * context, void * arg)
{
static uint8_t row = 1;
char str_buff[BUFF_SIZE];
uintptr_t pc = _Unwind_GetIP(context);
if (pc && row < MAX_ROW) {
snprintf(str_buff, sizeof(str_buff), "%d .. 0x%x", row, (unsigned)pc); // cast so the argument matches %x
printString(str_buff, 0, ROW_SIZE * row++);
}
return _URC_NO_REASON;
}
void WDOG1_DriverIRQHandler(void)
{
printString(pxCurrentTCB->pcTaskName, 0, 0);
_Unwind_Backtrace(unwind_backtrace_callback, 0);
while(1) Wdog_Service();
}
As it turns out, OpenMRN implements exactly the solution you are looking for: https://github.com/bakerstu/openmrn/blob/master/src/freertos_drivers/common/cpu_profile.hxx
More information can be found here: Stack Backtrace for ARM core using GCC compiler (when there is a MSP to PSP switch). To quote this post:
This is doable but needs access to internal details of how libgcc implements the _Unwind_Backtrace function. Fortunately the code is open-source, but depending on such internal details is brittle in that it may break in future versions of armgcc without any notice.
Generally, reading through the source of libgcc doing the backtrace, it creates an in-memory virtual representation of the CPU core registers, then uses this representation to walk up the stack, simulating exception throws. The first thing that _Unwind_Backtrace does is fill in this context from the current CPU registers, then call an internal implementation function.
Creating that context manually from the stacked exception structure is sufficient to fake the backtrace going from handler mode upwards through the call stack in most cases. Here is some example code (from https://github.com/bakerstu/openmrn/blob/62683863e8621cef35e94c9dcfe5abcaf996d7a2/src/freertos_drivers/common/cpu_profile.hxx#L162):
/// This struct definition mimics the internal structures of libgcc in
/// arm-none-eabi binary. It's not portable and might break in the future.
struct core_regs
{
unsigned r[16];
};
/// This struct definition mimics the internal structures of libgcc in
/// arm-none-eabi binary. It's not portable and might break in the future.
typedef struct
{
unsigned demand_save_flags;
struct core_regs core;
} phase2_vrs;
/// We store what we know about the external context at interrupt entry in this
/// structure.
phase2_vrs main_context;
/// Saved value of the lr register at the exception entry.
unsigned saved_lr;
/// Takes registers from the core state and the saved exception context and
/// fills in the structure necessary for the LIBGCC unwinder.
void fill_phase2_vrs(volatile unsigned *fault_args)
{
main_context.demand_save_flags = 0;
main_context.core.r[0] = fault_args[0];
main_context.core.r[1] = fault_args[1];
main_context.core.r[2] = fault_args[2];
main_context.core.r[3] = fault_args[3];
main_context.core.r[12] = fault_args[4];
// We add +2 here because first thing libgcc does with the lr value is
// subtract two, presuming that lr points to after a branch
// instruction. However, exception entry's saved PC can point to the first
// instruction of a function and we don't want to have the backtrace end up
// showing the previous function.
main_context.core.r[14] = fault_args[6] + 2;
main_context.core.r[15] = fault_args[6];
saved_lr = fault_args[5];
main_context.core.r[13] = (unsigned)(fault_args + 8); // stack pointer
}
extern "C"
{
_Unwind_Reason_Code __gnu_Unwind_Backtrace(
_Unwind_Trace_Fn trace, void *trace_argument, phase2_vrs *entry_vrs);
}
/// Static variable for trace_func.
void *last_ip;
/// Callback from the unwind backtrace function.
_Unwind_Reason_Code trace_func(struct _Unwind_Context *context, void *arg)
{
void *ip;
ip = (void *)_Unwind_GetIP(context);
if (strace_len == 0)
{
// stacktrace[strace_len++] = ip;
// By taking the beginning of the function for the immediate interrupt
// we will attempt to coalesce more traces.
// ip = (void *)_Unwind_GetRegionStart(context);
}
else if (last_ip == ip)
{
if (strace_len == 1 && saved_lr != _Unwind_GetGR(context, 14))
{
_Unwind_SetGR(context, 14, saved_lr);
allocator.singleLenHack++;
return _URC_NO_REASON;
}
return _URC_END_OF_STACK;
}
if (strace_len >= MAX_STRACE - 1)
{
++allocator.limitReached;
return _URC_END_OF_STACK;
}
// stacktrace[strace_len++] = ip;
last_ip = ip;
ip = (void *)_Unwind_GetRegionStart(context);
stacktrace[strace_len++] = ip;
return _URC_NO_REASON;
}
/// Called from the interrupt handler to take a CPU trace for the current
/// exception.
void take_cpu_trace()
{
memset(stacktrace, 0, sizeof(stacktrace));
strace_len = 0;
last_ip = nullptr;
phase2_vrs first_context = main_context;
__gnu_Unwind_Backtrace(&trace_func, 0, &first_context);
// This is a workaround for the case when the function in which we had the
// exception trigger does not have a stack saved LR. In this case the
// backtrace will fail after the first step. We manually append the second
// step to have at least some idea of what's going on.
if (strace_len == 1)
{
main_context.core.r[14] = saved_lr;
main_context.core.r[15] = saved_lr;
__gnu_Unwind_Backtrace(&trace_func, 0, &main_context);
}
unsigned h = hash_trace(strace_len, (unsigned *)stacktrace);
struct trace *t = find_current_trace(h);
if (!t)
{
t = add_new_trace(h);
}
if (t)
{
t->total_size += 1;
}
}
/// Change this value to runtime disable and enable the CPU profile gathering
/// code.
bool enable_profiling = 0;
/// Helper function to declare the CPU usage tick interrupt.
/// @param irq_handler_name is the name of the interrupt to declare, for example
/// timer4a_interrupt_handler.
/// @param CLEAR_IRQ_FLAG is a c++ statement or statements in { ... } that will
/// be executed before returning from the interrupt to clear the timer IRQ flag.
#define DEFINE_CPU_PROFILE_INTERRUPT_HANDLER(irq_handler_name, CLEAR_IRQ_FLAG) \
extern "C" \
{ \
void __attribute__((__noinline__)) load_monitor_interrupt_handler( \
volatile unsigned *exception_args, unsigned exception_return_code) \
{ \
if (enable_profiling) \
{ \
fill_phase2_vrs(exception_args); \
take_cpu_trace(); \
} \
cpuload_tick(exception_return_code & 4 ? 0 : 255); \
CLEAR_IRQ_FLAG; \
} \
void __attribute__((__naked__)) irq_handler_name(void) \
{ \
__asm volatile("mov r0, %0 \n" \
"str r4, [r0, 4*4] \n" \
"str r5, [r0, 5*4] \n" \
"str r6, [r0, 6*4] \n" \
"str r7, [r0, 7*4] \n" \
"str r8, [r0, 8*4] \n" \
"str r9, [r0, 9*4] \n" \
"str r10, [r0, 10*4] \n" \
"str r11, [r0, 11*4] \n" \
"str r12, [r0, 12*4] \n" \
"str r13, [r0, 13*4] \n" \
"str r14, [r0, 14*4] \n" \
: \
: "r"(main_context.core.r) \
: "r0"); \
__asm volatile(" tst lr, #4 \n" \
" ite eq \n" \
" mrseq r0, msp \n" \
" mrsne r0, psp \n" \
" mov r1, lr \n" \
" ldr r2, =load_monitor_interrupt_handler \n" \
" bx r2 \n" \
: \
: \
: "r0", "r1", "r2"); \
} \
}
This code is designed to take a CPU profile using a timer interrupt, but the backtrace unwinding can be reused from any handler including fault handlers. Read the code from the bottom to the top:
It is important that the IRQ function be defined with the attribute __naked__, otherwise the function entry header of GCC will manipulate the state of the CPU in unpredictable way, modifying the stack pointer for example.
First thing we save all other core registers that are not in the exception entry struct. We need to do this from assembly right at the beginning, because these will be typically modified by later C code when they are used as temporary registers.
Then we reconstruct the stack pointer from before the interrupt; the code will work whether the processor was in handler or thread mode before. This pointer is the exception entry structure. This code does not handle stacks that are not 4-byte aligned, but I never saw armgcc do that anyway.
The rest of the code is in C/C++, we fill in the internal structure we took from libgcc, then call the internal implementation of the unwinding process. There are some adjustments we need to make to work around certain assumptions of libgcc that do not hold upon exception entry.
There is one specific situation where the unwinding does not work, which is if the exception happened in a leaf function that does not save LR to the stack upon entry. This never happens when you try to do a backtrace from process mode, because the backtrace function being called will ensure that the calling function is not a leaf. I tried to apply some workarounds by adjusting the LR register during the backtracing process itself, but I'm not convinced it works every time. I'm interested in suggestions on how to do this better.
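The stack pointer reconstruction mentioned above can be written out in C. This is a rough sketch rather than the OpenMRN code: it assumes the basic 8-word exception frame (no floating-point context), and the function names are hypothetical. Unlike the quoted code, it also accounts for the padding word the core may insert to keep the stack 8-byte aligned, which is recorded in bit 9 of the stacked xPSR.

```c
#include <assert.h>
#include <stdint.h>

/* EXC_RETURN bit 2 tells which stack holds the exception frame:
 * 0 means MSP, 1 means PSP (this is what the TST lr, #4 sequence tests). */
int exc_return_uses_psp(uint32_t exc_return)
{
    return (exc_return & 4u) != 0u;
}

/* Value SP had before the exception was taken, given the address of the
 * stacked frame and the xPSR word stored in it. */
uint32_t pre_exception_sp(uint32_t frame_addr, uint32_t stacked_xpsr)
{
    uint32_t sp;
    assert((frame_addr & 3u) == 0u);   /* frames are always word aligned */
    sp = frame_addr + 8u * 4u;         /* skip r0-r3, r12, lr, pc, xPSR */
    if ((stacked_xpsr & (1u << 9)) != 0u)
        sp += 4u;                      /* skip the alignment padding word */
    return sp;
}
```

The result corresponds to what fill_phase2_vrs stores in r[13]. To unwind a FreeRTOS task other than the interrupted one, the same idea would apply to the stack pointer saved in that task's TCB, with the caveat that the port's context-switch code pushes further registers on top of the hardware frame.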

Direct2D COM calls returning 64-bit structs and C++Builder 2010

I'm trying to get the size of a Direct2D Bitmap and getting an immediate crash.
// props and target etc all set up beforehand.
CComPtr<ID2D1Bitmap> b;
target->CreateBitmap(D2D1::SizeU(1024,1024), frame.p_data, 1024 * 4, &props, &b);
D2D_SIZE_U sz = b->GetPixelSize(); // Crashes here.
All other operations using the bitmap (including drawing it) work correctly. It's just returning the size that seems to be the problem.
Based on articles like this one by Rudy V., my suspicion is that it's some incompatibility between C++Builder 2010 and how COM functions return 64-bit structures. http://rvelthuis.de/articles/articles-convert.html
The Delphi declaration of GetPixelSize looks like this: (from D2D1.pas)
// Returns the size of the bitmap in resolution dependent units, (pixels).
procedure GetPixelSize(out pixelSize: TD2D1SizeU); stdcall;
... and in D2D1.h it's
//
// Returns the size of the bitmap in resolution dependent units, (pixels).
//
STDMETHOD_(D2D1_SIZE_U, GetPixelSize)(
) CONST PURE;
Can I fix this without rewriting the D2D headers?
All suggestions welcome - except upgrading from C++Builder 2010 which is more of a task than I'm ready for at the moment.
"getInfo" is a function derived from Delphi code that can work around the problem.
void getInfo(void* itfc, void* info, int vmtofs)
{
asm {
push info // pass pointer to return result
mov eax,itfc // eax points to interface
push eax // pass pointer to interface
mov eax,[eax] // eax points to VMT
add eax,vmtofs // eax points to address of virtual function
call dword ptr [eax] // call function
}
}
Disassembly of code generated by CBuilder, which results in a crash:
Graphics.cpp.162: size = bmp->GetSize();
00401C10 8B4508 mov eax,[ebp+$08]
00401C13 FF7004 push dword ptr [eax+$04]
00401C16 8D55DC lea edx,[ebp-$24]
00401C19 52 push edx
00401C1A 8B4D08 mov ecx,[ebp+$08]
00401C1D 8B4104 mov eax,[ecx+$04]
00401C20 8B10 mov edx,[eax]
00401C22 FF5210 call dword ptr [edx+$10]
00401C25 8B4DDC mov ecx,[ebp-$24]
00401C28 894DF8 mov [ebp-$08],ecx
00401C2B 8B4DE0 mov ecx,[ebp-$20]
00401C2E 894DFC mov [ebp-$04],ecx
"bmp" is declared as
ID2D1Bitmap* bmp;
Code to call "getInfo":
D2D1_SIZE_F size;
getInfo(bmp,&size,0x10);
You get 0x10 (vmtofs) from the disassembly line "call dword ptr [edx+$10]".
You can call "GetPixelSize", "GetPixelFormat" and others through "getInfo":
D2D1_SIZE_U ps;// = bmp->GetPixelSize();
getInfo(bmp,&ps,0x14);
D2D1_PIXEL_FORMAT pf;// = bmp->GetPixelFormat();
getInfo(bmp,&pf,0x18);
"getInfo" works with methods declared as "STDMETHOD_ ... CONST PURE;" that return a result, such as:
STDMETHOD_(D2D1_SIZE_F, GetSize)(
) CONST PURE;
For this method, CBuilder generates faulty code.
In the case of
STDMETHOD_(void, GetDpi)(
__out FLOAT *dpiX,
__out FLOAT *dpiY
) CONST PURE;
the CBuilder code works fine, because "GetDpi" returns void.

How do I get the current interrupt state (enabled, disabled or current level) on a MC9S12ZVM processor

I'm working on a project using a MC9S12ZVM family processor and need to be able to get, save and restore the current interrupt enable state. This is needed so that main-line code can access variables that may be modified by an interrupt handler and that are larger than a word, and therefore not atomic.
Pseudo code (the variable is 32 bits, and -= isn't atomic anyhow):
state_save = current_interrupt_state();
DisableInterrupt();
variable -= x;
RestoreInterrupts(state_save);
Edit: I found something that works, but has the issue of modifying the stack.
asm(PSH CCW);
asm(SEI);
Variable++;
asm(PUL CCW);
This is ok as long as I don't need to do anything other than a simple variable++, but I don't like exiting a block with the stack modified.
It seems you are referring to the global interrupt mask. If so, then this is one way to disable it and then restore it to previous state:
static const uint8_t CCR_I_MASK = 0x10;
static uint8_t ccr;
void disable_interrupts (void)
{
__asm PSHA;
__asm TPA; // transfer CCR to A
__asm STA ccr; // store CCR in RAM variable
__asm PULA;
__asm SEI;
}
void restore_interrupts (void)
{
if((ccr & CCR_I_MASK) == 0)
{
__asm CLI; // i was not set, clear it
}
else
{
; // i was set, do nothing
}
}
__asm is specific to the Codewarrior compiler, with or without "strict ANSI" option set.
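One property of the code above is that the previous state lives in a single static variable, so a nested disable/restore pair overwrites it and the outer restore can leave interrupts masked. A common alternative is to return the previous state to the caller. The sketch below only shows that calling pattern and can be run on a PC: the condition code register is simulated by an ordinary variable (sim_ccr), which is an assumption for illustration. On the real part, the two marked lines would be replaced by the compiler-specific assembly shown in this thread. The I mask value 0x10 is taken from the answer above.

```c
#include <assert.h>
#include <stdint.h>

#define CCR_I_MASK 0x10u

/* Stand-in for the CPU's condition code register (simulation only). */
volatile uint8_t sim_ccr;

typedef uint8_t irq_state_t;

/* Mask interrupts and return the previous state of the I bit. */
irq_state_t irq_save_and_disable(void)
{
    irq_state_t prev = (irq_state_t)(sim_ccr & CCR_I_MASK); /* real HW: read CCR */
    sim_ccr = (uint8_t)(sim_ccr | CCR_I_MASK);              /* real HW: SEI */
    return prev;
}

/* Unmask interrupts only if they were unmasked when the state was saved. */
void irq_restore(irq_state_t prev)
{
    assert(prev == 0u || prev == CCR_I_MASK);
    if (prev == 0u)
        sim_ccr = (uint8_t)(sim_ccr & ~CCR_I_MASK);         /* real HW: CLI */
}
```

Because each caller keeps its own copy of the previous state, an inner irq_restore leaves interrupts masked and only the outermost one re-enables them.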
Ok, I've found an answer to my problem, with thanks to those who commented.
static volatile uint16_t v = 0u;
void testfunction(void);
void testfunction(void)
{
static uint16_t L_CCR;
asm( PSH D2 );
asm( TFR CCW, D2);
asm( ST D2, L_CCR );
asm( PUL D2 );
asm( SEI );
v++;
asm( PSH D2 );
asm( LD D2, L_CCR );
asm( TFR D2, CCW);
asm( PUL D2 );
}

Debugging unmanaged callback problems

Apologies in advance here for the length of the question.
I have a closed source and undocumented COM object - an unmanaged DLL - that I'm attempting to integrate into a Windows service written in C#. The COM object wraps access to some hardware that the service needs to interact with.
I'm not able to get interface documentation or source for the object. All I have to go on is the object itself, three [closed source undocumented] clients that interact with the COM object, and a fair amount of domain specific knowledge.
So far this has been a very tough nut to crack - one week and counting.
I was able to obtain the object's CLSID from the registry - this allowed me to instantiate it in the service.
The next step was to find the IIDs for the interface(s) that I need to use. The particular methods that I was looking for are not exported. I don't have PDBs. There doesn't appear to be any typelib info and the OLE-COM Object Viewer refuses to open the COM object. IDispatch is not implemented either, so it has been a matter of digging. I eventually succeeded in identifying two IIDs by manually searching the binaries for GUIDs and eliminating unique and/or known GUIDs. At this point I'm confident that the IIDs are correct.
The IIDs are obviously useless without corresponding method info. For that I was forced to resort to reversing with IDA. Correlating references to the GUIDs with my knowledge of the hardware functions and the rough disassembly allowed me to make some educated guesses about the structure and purpose of the interfaces.
Now I'm at the point where I need to attempt to use the interfaces to interact with the hardware... and this is where I'm stuck.
From the disassembly, I know that the first method I have to call looks like this:
HRESULT __stdcall SetStateChangeCallback(LPVOID callback);
The callback signature looks something like this:
HRESULT (__stdcall *callbackType)(LPVOID data1, LPVOID data2)
Here is my service code:
[ComImport, System.Security.SuppressUnmanagedCodeSecurity,
Guid(...),
InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
private interface AccessInterface
{
[PreserveSig]
int SetStateChangeCallback(IntPtr callbackPtr);
...
}
[UnmanagedFunctionPointerAttribute(CallingConvention.StdCall)]
private delegate int OnStateChangeDelegate(IntPtr a, IntPtr b);
private int OnStateChange(IntPtr a, IntPtr b)
{
Debug("***** State change triggered! *****");
return 0;
}
private Guid _typeClsid = new Guid(...);
private Guid _interfaceIid = new Guid(...);
private object _comObj = null;
private AccessInterface _interface = null;
private OnStateChangeDelegate _stateChangeDelegate = null;
private IntPtr _functionPtr = IntPtr.Zero;
private void InitHardware()
{
Type t = Type.GetTypeFromCLSID(_typeClsid);
_comObj = Activator.CreateInstance(t);
if (_comObj == null)
{
throw new NullReferenceException();
}
_interface = _comObj as AccessInterface;
if (_interface == null)
{
throw new NullReferenceException();
}
_stateChangeDelegate = new OnStateChangeDelegate(OnStateChange);
_functionPtr = Marshal.GetFunctionPointerForDelegate(_stateChangeDelegate);
int hr = _interface.SetStateChangeCallback(_functionPtr);
// hr (HRESULT) == 0, indicating success
}
Now, I can run this code successfully, but only if I pass IntPtr.Zero to SetStateChangeCallback(). If I pass a real reference, the service crashes within a matter of seconds after calling SetStateChangeCallback() - presumably when the COM object tries to invoke the callback for the first time - with exception code 0xc0000005.
The fault offset is consistent. With the aid of IDA and the previously generated disassembly I was able to identify the area where the problem occurs:
06B04EF7 loc_6B04EF7: ; CODE XREF: 06B04F49j
06B04EF7 lea eax, [esp+0Ch]
06B04EFB push eax
06B04EFC mov ecx, ebx
06B04EFE call near ptr unk_6B06660
06B04F03 test eax, eax
06B04F05 jl short loc_6B04F4B
06B04F07 mov esi, [esp+0Ch]
06B04F0B test esi, esi
06B04F0D jz short loc_6B04F45
06B04F0F push 36h
06B04F11 lea ecx, [esp+18h]
06B04F15 push 0
06B04F17 push ecx
06B04F18 call near ptr unk_6B0F960
06B04F1D mov edx, [esp+1Ch]
06B04F21 push edx
06B04F22 lea eax, [esp+24h]
06B04F26 push esi
06B04F27 push eax
06B04F28 call near ptr unk_6B0F9E0
06B04F2D push esi
06B04F2E call near ptr unk_6B0C8D2
06B04F33 mov eax, [edi+4]
06B04F36 mov ecx, [eax]
06B04F38 add esp, 1Ch
06B04F3B lea edx, [esp+14h]
06B04F3F push edx
06B04F40 push eax
06B04F41 mov eax, [ecx] ; Crash here!
06B04F43 call eax
06B04F45
06B04F45 loc_6B04F45: ; CODE XREF: 06B04F0Dj
06B04F45 cmp dword ptr [edi+28h], 0
06B04F49 jnz short loc_6B04EF7
06B04F4B
06B04F4B loc_6B04F4B: ; CODE XREF: 06B04F05j
06B04F4B pop esi
06B04F4C pop ebx
06B04F4D pop edi
06B04F4E add esp, 40h
06B04F51 retn
The crash is at offset 0x06B04F41 (ie. "mov eax, [ecx]").
Corresponding pseudo code function from the disassembly (note assembler above starts at the do loop):
void __thiscall sub_10004EE0(int this)
{
int v1; // edi#1
void *v2; // esi#4
void *v3; // [sp+4h] [bp-40h]#3
int v4; // [sp+8h] [bp-3Ch]#5
char v5; // [sp+Ch] [bp-38h]#5
v1 = this;
if ( *(_DWORD *)(this + 4) )
{
if ( *(_DWORD *)(this + 40) )
{
do
{
if ( sub_10006660(v1 + 12, (int)&v3) < 0 )
break;
v2 = v3;
if ( v3 )
{
memset(&v5, 0, 0x36u);
unknown_libname_44(&v5, v2, v4);
j_j__free(v2);
// Crash on this statement!
(*(void (__stdcall **)(_DWORD, char *))**(void (__stdcall ****)(_DWORD, _DWORD))(v1 + 4))(
*(_DWORD *)(v1 + 4),
&v5);
}
}
while ( *(_DWORD *)(v1 + 40) );
}
}
}
I'm convinced that I am not passing the function pointer to the COM object correctly, but I'm stuffed if I can figure out how to do it properly. I've tried [in order of desperation!]:
_functionPtr
_functionPtr.ToPointer() [as void* param]
_functionPtr.ToInt32() [as int param]
_stateChangeDelegate [as OnStateChangeDelegate param]
OnStateChange [as OnStateChangeDelegate param]
using CallingConvention.Cdecl for the delegate
adding static qualifier to variables and functions
changing signature of the callback (including removing the return value, changing the parameters to ints, modifying the number of parameters)
adding a level of indirection [by storing _functionPtr.ToInt32() in a block of memory allocated with Marshal.AllocCoTaskMem()]
In some cases the changes triggered different crash locations... like crashes in ntdll, or at 06B04F36. In most cases the crash is as described above - at 06B04F41.
When I attach IDA Pro to the process it looks like the address of my callback is going into EAX at 06B04F40, and the address that the COM object attempts to use has a fixed offset from that. For example:
EAX (correct address) = 000A1392
ECX (used address) = 0A1378B8
The last 4 digits of ECX are always 78B8.
So again, I think I'm not passing the delegate or function pointer correctly but I'm not sure how to do it. I guess the fact that the service is running in a WOW64 environment could also be having an impact.
My question: what would you suggest I do to (1) get more information about the problem and/or (2) solve the problem?
Keep in mind I don't have access to any source code except the full code for the C# service. I'm using the free version of IDA Pro so I don't seem to be able to do anything more useful than reverse to pseudo code or attach to the process and catch the crash exception. It is not possible to run the service from VS in debug mode so I really only have logging on that side... not that I think it would be much good as the problem is triggering in the unmanaged code where I don't have compilable/easily-readable source. Maybe I'm wrong?
Thank you sincerely for your advice!
Edit:
Well, after another day bashing my head against the problem I figured if I couldn't succeed from C# I would try and create a minimal C++ test application to do what the service has to do... and I was successful!
IAccessInterface : public IUnknown
{
public:
virtual HRESULT STDMETHODCALLTYPE SetCallback(
/* [in] */ LPVOID pCallBack) = 0;
virtual HRESULT STDMETHODCALLTYPE SetDevice(
/* [in] */ char* context1,
/* [in] */ LPVOID context2,
/* [in] */ LPVOID context3) = 0;
virtual HRESULT STDMETHODCALLTYPE CloseDevice() = 0;
};
IAccessInterface* pInterface;
int __stdcall CallbackImpl(char* context, char* data)
{
printf("Callback succeeded!\r\n");
return 0;
}
void CleanUp(bool deviceOpen)
{
if (pInterface != NULL)
{
if (deviceOpen)
{
pInterface->SetCallback(NULL);
pInterface->CloseDevice();
}
pInterface->Release();
pInterface = NULL;
}
CoUninitialize();
}
int _tmain(int argc, _TCHAR* argv[])
{
GUID objClsid = GUID();
GUID interfaceIid = GUID();
CoInitialize(NULL);
int hr = CoCreateInstance(objClsid, 0, 1, interfaceIid, (void**)&pInterface);
if (!pInterface || !SUCCEEDED(hr))
{
CleanUp(false);
return 1;
}
LPVOID ptr = &CallbackImpl;
LPVOID ptr2 = &ptr;
hr = pInterface->SetCallback(&ptr2);
if (!SUCCEEDED(hr))
{
CleanUp(false);
return 1;
}
char* context1 = "a_device_identifier";
hr = pInterface->SetDevice(context1, NULL, NULL);
if (!SUCCEEDED(hr))
{
CleanUp(false);
return 1;
}
Sleep(30000); // give time for device to initialise and trigger callbacks (testing only)
// clean up
CleanUp(true);
return 0;
}
So now I just need to find a way to replicate the following three lines with equivalent C#:
LPVOID ptr = &CallbackImpl;
LPVOID ptr2 = &ptr;
hr = pInterface->SetCallback(&ptr2);
It seems unnecessary (even suspicious) that so many levels of indirection would be required. Maybe I haven't fully understood the disassembly. At this point the most important thing is that it works.
So any comments about how to achieve this from C# would be welcome!
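One possible reading of the three levels of indirection, offered as a guess from the disassembly rather than a confirmed fact about the DLL: the sequence "mov eax,[edi+4] / mov ecx,[eax] / mov eax,[ecx] / call eax", with the stored pointer itself pushed as the first argument, is what a call through an object that starts with a pointer to a function table looks like. In that reading SetCallback expects a pointer to an object, not a bare function pointer. The sketch below shows that layout in portable C; all type and function names are hypothetical, and the __stdcall convention the real code would need on 32-bit Windows is left out so the example compiles anywhere.

```c
#include <assert.h>
#include <stddef.h>

typedef struct CallbackObject CallbackObject;

/* The callback receives the object it was invoked on plus a data buffer. */
typedef int (*StateChangeFn)(CallbackObject *self, char *data);

typedef struct {
    StateChangeFn on_state_change; /* first (and here only) slot in the table */
} CallbackTable;

struct CallbackObject {
    const CallbackTable *table;    /* first word of the object points to the table */
};

int g_calls; /* counts invocations so the effect can be observed */

int on_state_change(CallbackObject *self, char *data)
{
    (void)self;
    (void)data;
    g_calls++;
    return 0;
}

/* Mirrors what the crashing code appears to do with the stored pointer:
 * load the table from the object, load slot 0, call it with (object, data). */
int invoke_like_the_dll(CallbackObject *obj, char *data)
{
    assert(obj != NULL && obj->table != NULL);
    return obj->table->on_state_change(obj, data);
}
```

If this reading is right, the C# side would need to hand over the address of an unmanaged block laid out the same way (object word pointing to a table whose first slot is the delegate's function pointer), which matches the observation that passing the function pointer directly makes the DLL dereference the callback's machine code as if it were a table.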

Are 2^n exponent calculations really less efficient than bit-shifts?

If I do:
int x = 4;
pow(2, x);
Is that really that much less efficient than just doing:
1 << 4
?
Yes. An easy way to show this is to compile the following two functions that do the same thing and then look at the disassembly.
#include <stdint.h>
#include <math.h>
uint32_t foo1(uint32_t shftAmt) {
return pow(2, shftAmt);
}
uint32_t foo2(uint32_t shftAmt) {
return (1 << shftAmt);
}
cc -arch armv7 -O3 -S -o - shift.c (I happen to find ARM asm easier to read but if you want x86 just remove the arch flag)
_foo1:
# BB#0:
push {r7, lr}
vmov s0, r0
mov r7, sp
vcvt.f64.u32 d16, s0
vmov r0, r1, d16
blx _exp2
vmov d16, r0, r1
vcvt.u32.f64 s0, d16
vmov r0, s0
pop {r7, pc}
_foo2:
# BB#0:
movs r1, #1
lsl.w r0, r1, r0
bx lr
You can see that foo2 takes only two instructions (plus the return), whereas foo1 takes many more. It has to move the data to the FP hardware registers (vmov), convert the integer to a double (vcvt.f64.u32), call the exp2 function, then convert the answer back to a uint (vcvt.u32.f64) and move it from the FP hardware back to the general-purpose registers.
Yes. Though by how much I can't say. The easiest way to determine that is to benchmark it.
The pow function uses doubles... At least, if it conforms to the C standard. Even if that function used bitshift when it sees a base of 2, there would still be testing and branching to reach that conclusion, by which time your simple bitshift would be completed. And we haven't even considered the overhead of a function call yet.
For equivalency, I assume you meant to use 1 << x instead of 1 << 4.
Perhaps a compiler could optimize both of these, but it's far less likely to optimize a call to pow. If you need the fastest way to compute a power of 2, do it with shifting.
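A minimal sketch of the shifting approach, with one caveat made explicit: in C the shift count must be smaller than the width of the type, and shifting a signed 1 into the sign bit (1 << 31 with a 32-bit int) is undefined, so an unsigned operand is the safer choice. The helper name pow2_u32 is made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 2^n for 0 <= n <= 31 using a single shift. */
uint32_t pow2_u32(unsigned n)
{
    assert(n < 32u);            /* larger counts would be undefined behaviour */
    return UINT32_C(1) << n;    /* unsigned operand, so n == 31 is well defined */
}
```

For exponents that are not known to be in range, the check has to stay in release builds too, for example by returning 0 instead of asserting.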
Update... Since I mentioned it's easy to benchmark, I decided to do just that. I happen to have Windows and Visual C++ handy so I used that. Results will vary. My program:
#include <Windows.h>
#include <cstdio>
#include <cstdlib> // srand, rand, system
#include <cmath>
#include <ctime>
LARGE_INTEGER liFreq, liStart, liStop;
inline void StartTimer()
{
QueryPerformanceCounter(&liStart);
}
inline double ReportTimer()
{
QueryPerformanceCounter(&liStop);
double milli = 1000.0 * double(liStop.QuadPart - liStart.QuadPart) / double(liFreq.QuadPart);
printf( "%.3f ms\n", milli );
return milli;
}
int main()
{
QueryPerformanceFrequency(&liFreq);
const size_t nTests = 10000000;
int x = 4;
int sumPow = 0;
int sumShift = 0;
double powTime, shiftTime;
// Make an array of random exponents to use in tests.
const size_t nExp = 10000;
int e[nExp];
srand( (unsigned int)time(NULL) );
for( int i = 0; i < nExp; i++ ) e[i] = rand() % 31;
// Test power.
StartTimer();
for( size_t i = 0; i < nTests; i++ )
{
int y = (int)pow(2, (double)e[i%nExp]);
sumPow += y;
}
powTime = ReportTimer();
// Test shifting.
StartTimer();
for( size_t i = 0; i < nTests; i++ )
{
int y = 1 << e[i%nExp];
sumShift += y;
}
shiftTime = ReportTimer();
// The compiler shouldn't optimize out our loops if we need to display a result.
printf( "Sum power: %d\n", sumPow );
printf( "Sum shift: %d\n", sumShift );
printf( "Time ratio of pow versus shift: %.2f\n", powTime / shiftTime );
system("pause");
return 0;
}
My output:
379.466 ms
15.862 ms
Sum power: 157650768
Sum shift: 157650768
Time ratio of pow versus shift: 23.92
That depends on the compiler, but in general (when the compiler is not totally braindead) yes: the shift is one CPU instruction, while the other is a function call, which involves saving the current state and setting up a stack frame and so requires many instructions.
Generally yes, as a bit shift is a very basic operation for the processor.
On the other hand, many compilers optimise the code so that raising to a power is in fact just a bit shift.