Follow-up questions: "When investigating a crash, should I only investigate second chance exceptions? What are the cases when I also need to investigate a first chance exception dump?"
My questions are a bit broad, but I'm curious what the answer really is. I've read numerous articles that say first chance exceptions are unlikely to cause an application crash; it's second chance exceptions that cause it. A simple Google search doesn't answer my question directly.
EDIT: Here are sample articles, but there are many more:
What is a First Chance Exception?:
"For code without exception handling, the debugger will receive a
second chance exception notification and will stop with a unhandled
exception. "
Program crashes, but Debug Diag says it's a first chance exception, is that correct?
Surely by definition, only a 2nd chance exception can make code crash,
i.e. one that has NOT been handled by the code?
I'm having an intermittent issue where my app restarts or crashes (no error in Event Viewer), but before it restarts, ADPlus generates some first chance AccessViolation exceptions. No second chance exceptions.
Below is a snippet of the FULLDUMP_FirstChance_av_AccessViolation dump as analyzed in WinDbg:
PROBLEM_CLASSES:
HEAP_CORRUPTION
Tid [0x16e8]
Frame [0x02]: ntdll!RtlAllocateHeap
HEAP_CORRUPTION
Tid [0x16e8]
Frame [0x02]: ntdll!RtlAllocateHeap
INVALID_POINTER_READ
Tid [0x16e8]
Frame [0x00]: ntdll!ExpInterlockedPopEntrySListFault
NOSOS
Tid [0x16e8]
BUGCHECK_STR: HEAP_CORRUPTION_HEAP_CORRUPTION_INVALID_POINTER_READ_NOSOS
Sample call stacks below:
# ChildEBP RetAddr Args to Child
00 085aec28 7c91020e 00000007 00c407d8 00c40000 ntdll!ExpInterlockedPopEntrySListFault (FPO: [0,2,0])
01 085aec58 7c91019b 00c407d8 00000030 00000000 ntdll!RtlpAllocateFromHeapLookaside+0x1d (FPO: [Non-Fpo])
02 085aee84 78134d83 00c40000 00000000 00000030 ntdll!RtlAllocateHeap+0x1c2 (FPO: [Non-Fpo])
03 085aeea4 78160e30 00000030 0000002f 085aeecc msvcr80!malloc(unsigned int size = 0x30)+0x7a (FPO: [1,0,0]) (CONV: cdecl) [f:\dd\vctools\crt_bld\self_x86\crt\src\malloc.c # 163]
04 085aeebc 7c4221b3 00000030 00000003 7c422f20 msvcr80!operator new(unsigned int size = 0x30)+0x1d (FPO: [Non-Fpo]) (CONV: cdecl) [f:\dd\vctools\crt_bld\self_x86\crt\src\new.cpp # 59]
05 085aeed4 7c423315 00000030 00000000 ae218f51 msvcp80!std::_Allocate<char>(unsigned int _Count = 0x30, char * __formal = 0x00000000 "")+0x15 (FPO: [Non-Fpo]) (CONV: cdecl) [f:\dd\vctools\crt_bld\self_x86\crt\src\xmemory # 44]
06 085aef0c 7c4233c4 0000002a 00000000 085af028 msvcp80!std::basic_string<char,std::char_traits<char>,std::allocator<char> >::_Copy(unsigned int _Newsize = 0x2a, unsigned int _Oldlen = 0)+0x55 (FPO: [Non-Fpo]) (CONV: thiscall) [f:\dd\vctools\crt_bld\self_x86\crt\src\xstring # 2020]
07 085aef20 7c423779 0000002a 00000000 085af200 msvcp80!std::basic_string<char,std::char_traits<char>,std::allocator<char> >::_Grow(unsigned int _Newsize = 0x2a, bool _Trim = false)+0x22 (FPO: [2,0,0]) (CONV: thiscall) [f:\dd\vctools\crt_bld\self_x86\crt\src\xstring # 2050]
08 085aef3c 7c425e55 0000002a 00000000 0000002a msvcp80!std::basic_string<char,std::char_traits<char>,std::allocator<char> >::append(class std::basic_string<char,std::char_traits<char>,std::allocator<char> > * _Right = 0x0000002a, unsigned int _Roff = 0, unsigned int _Count = 0x2a)+0x58 (FPO: [Non-Fpo]) (CONV: thiscall) [f:\dd\vctools\crt_bld\self_x86\crt\src\xstring # 969]
09 085aef4c 60baed1e 085af028 ae262fd2 085af1a4 msvcp80!std::basic_string<char,std::char_traits<char>,std::allocator<char> >::append(class std::basic_string<char,std::char_traits<char>,std::allocator<char> > * _Right = 0x085af028 " S1 S1 Card number: ************8706
")+0xd (FPO: [1,0,0]) (CONV: thiscall) [f:\dd\vctools\crt_bld\self_x86\crt\src\xstring # 956]
0a 085af1a4 7c802662 00000100 00000000 00000000 aipoptrv19!DllUnregisterServer+0x1f15e
0b 085af234 7c42317a 00000000 00000000 0000000f kernel32!WaitForSingleObject+0x12 (FPO: [Non-Fpo])
0c 085af274 60bc1fd8 60baa1cb 0865d680 0000001c msvcp80!std::basic_string<char,std::char_traits<char>,std::allocator<char> >::basic_string<char,std::char_traits<char>,std::allocator<char> >(void)+0x11 (FPO: [0,0,4]) (CONV: thiscall) [f:\dd\vctools\crt_bld\self_x86\crt\src\xstring # 576]
0d 085af278 60baa1cb 0865d680 0000001c 00000002 aipoptrv19!DllUnregisterServer+0x32418
0e 085af2e4 60bb227c 00000001 085af420 0865d648 aipoptrv19!DllUnregisterServer+0x1a60b
0f 085af34c 7c425e45 085af404 00000000 ffffffff aipoptrv19!DllUnregisterServer+0x226bc
10 085af35c 60b97724 72506f44 69746e69 0000676e msvcp80!std::basic_string<char,std::char_traits<char>,std::allocator<char> >::assign(class std::basic_string<char,std::char_traits<char>,std::allocator<char> > * _Right = 0x72506f44)+0xd (FPO: [1,0,0]) (CONV: thiscall) [f:\dd\vctools\crt_bld\self_x86\crt\src\xstring # 1044]
11 085af45c 78261414 00000002 403110f4 7824f516 aipoptrv19!DllUnregisterServer+0x7b64
12 085af468 7824f516 fffffffe 781f2c2e 0000001c mfc80!_AfxDispatchCall(<function> * __formal = 0x40b59c84, void * __formal = 0x085af6b8, unsigned int __formal = 0x85a0003)+0x10 (CONV: stdcall) [f:\dd\vctools\vc7libs\ship\atlmfc\src\mfc\olecall.cpp # 40]
13 085af470 781f2c2e 0000001c 7824f49b 00000008 mfc80!CCmdTarget::CallMemberFunc(struct AFX_DISPMAP_ENTRY * pEntry = 0x6d756e20, unsigned short wFlags = 0x6562, struct tagVARIANT * pvarResult = 0x20202020 Empty, struct tagDISPPARAMS * pDispParams = 0x2a2a2a2a, unsigned int * puArgErr = 0x2a2a2a2a)+0x1ad (CONV: thiscall) [f:\dd\vctools\vc7libs\ship\atlmfc\src\mfc\oledisp1.cpp # 1064]
The errors are about heap corruption and invalid pointers, which I'm still studying. I'm a complete newbie on heaps and mallocs, and I only just learned debugging with WinDbg. I want to know whether I'm wasting my time learning about memory allocation when it's not my priority and will not really fix my issue. (Of course, knowing about heaps is a good thing, but fixing the main issue is the top priority.)
I'm confident in my ADPlus config file, and I'm sure it generates full dumps on all second chance exceptions. I tried it on a sample application.
The app doesn't crash; it just restarts unexpectedly and intermittently, without any Event Viewer error. The problem can be reproduced intermittently when a specific service is used.
Here are my thoughts on what may be happening if the dump files do not show the real cause of the issue:
Another process (not attached to my ADPlus session) caused the restart.
The second chance exception full dumps just weren't generated.
Others (any thoughts?)
PS: Sorry if I didn't specify some details and code samples, etc. as it's confidential. I did my best explaining the issue without compromising company policy.
Thanks in advance!
This MSDN article about exception dispatching explains the process:
When an exception occurs in user-mode code, the system uses the following search order to find an exception handler:
1. If the process is being debugged, the system notifies the debugger. For more information, see Debugger Exception Handling.
2. If the process is not being debugged, or if the associated debugger does not handle the exception, the system attempts to locate a frame-based exception handler by searching the stack frames of the thread in which the exception occurred. The system searches the current stack frame first, then searches through preceding stack frames in reverse order.
3. If no frame-based handler can be found, or no frame-based handler handles the exception, but the process is being debugged, the system notifies the debugger a second time.
4. If the process is not being debugged, or if the associated debugger does not handle the exception, the system provides default handling based on the exception type. For most exceptions, the default action is to call the ExitProcess function.
In step 1 the exception is called a first chance exception, because it's the first chance anyone gets to catch and handle the exception.
In step 3 the same exception is called a second chance exception, because it's the second time the debugger gets the chance to catch and handle the exception.
Only if the process reaches step 4 will the program crash or exit. Therefore yes, only second chance exceptions can crash a process.
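To make the search order concrete, here is a minimal sketch that models the four steps as a plain decision function. It is only a model: the `exception_scenario` struct, the `dispatch_outcome` enum, and `dispatch()` are names made up for this illustration and are not Windows APIs.

```c
#include <stdbool.h>

/* Possible outcomes of the four-step search described above. */
typedef enum {
    HANDLED_BY_DEBUGGER_FIRST_CHANCE,   /* step 1 */
    HANDLED_BY_FRAME_HANDLER,           /* step 2 */
    HANDLED_BY_DEBUGGER_SECOND_CHANCE,  /* step 3 */
    DEFAULT_HANDLING                    /* step 4: usually the process exits */
} dispatch_outcome;

/* Hypothetical description of one exception; not a Windows structure. */
typedef struct {
    bool debugger_attached;
    bool debugger_handles_first_chance;
    bool frame_handler_handles;          /* some try/catch or __except accepts it */
    bool debugger_handles_second_chance;
} exception_scenario;

/* Walks the steps in order and returns where the exception ends up. */
dispatch_outcome dispatch(const exception_scenario *s)
{
    /* Step 1: first chance notification, only if a debugger is attached. */
    if (s->debugger_attached && s->debugger_handles_first_chance)
        return HANDLED_BY_DEBUGGER_FIRST_CHANCE;

    /* Step 2: search the thread's stack frames for a handler. */
    if (s->frame_handler_handles)
        return HANDLED_BY_FRAME_HANDLER;

    /* Step 3: second chance notification, again only with a debugger. */
    if (s->debugger_attached && s->debugger_handles_second_chance)
        return HANDLED_BY_DEBUGGER_SECOND_CHANCE;

    /* Step 4: nobody handled it, so default handling applies. */
    return DEFAULT_HANDLING;
}
```

In this model, an exception that a frame handler catches never gets past step 2, which is why a first chance dump on its own does not prove that the process went down because of that exception.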
Can unmanaged first chance exception cause a crash/restart?
No. See above.
When investigating a crash, should I only investigate second chance exceptions?
Basically yes. That's what most people (>90%) do when analyzing crashes.
What are the cases when I also need to investigate a first chance exception dump?
Case 1:
That second chance exception might be a result of a previous first chance exception. Due to that first chance exception, a value might not be initialized and cause a different second chance exception.
Example code for such a scenario:
SomeObject o = null;
try {
    throw new Exception("First chance"); // consider this in some method
    o = new SomeObject();
}
catch (Exception)
{
    // make sure that the exception does not become a second chance exception
}
o.DoSomething(); // causes NullReferenceException first chance and second chance if uncaught
The application crashes because of a NullReferenceException but the real cause is the Exception before. However, such cases are typically easy to identify without having a look at first chance exceptions.
Case 2:
Exceptions have a high overhead, i.e. they cost CPU cycles and thus performance. If you have a very large number of first chance exceptions, you might want to get rid of them.
I am trying to get CAN communication running via an external SBC (TLE9263) board.
Microcontroller: S32K312
Ext. SBC board: TLE9263_EVB_2
Without SBC, i.e., using a standard external CAN transceiver TJA1057GT, CAN communication is running.
With SBC, some messages are received once, but then the SBC transceiver goes down, and CANIF_E_FATAL Det error occurs (Call Stack).
I configured the SBC registers as follows: SBC registers configuration
The following values are observed on the SPI-MOSI signal:
41 00 - SUP_STAT_1
82 00 - HW_CTRL
81 1F - M_S_CTRL
84 07 - BUS_CTRL_1
85 00 - BUS_CTRL_2
83 82 - WD_CTRL
The above observed values are expected in my opinion. What could be the cause of communication not working?
Additionally (not sure if this is relevant), the Fail Output LEDs on the TLE9263 board, FO1 and FO3 are ON as soon as the board is powered, and FO2 is blinking, and this status of the LEDs remains the same when the software is run.
I am working with MQTT on an ESP32. When I publish a message from the cloud to the microcontroller, the device receives the MQTT message, processes it, and then sends an acknowledgement over MQTT. After sending the acknowledgement, the ESP32 crashes. What does this error mean?
What are the possible reasons for this error?
DEBUG: [mqtt.c:800:handleMqttPayload] ------------------------>line
DEBUG: [mqtt.c:472:_mqttSubscriptionCallback] ------------------------>line
Guru Meditation Error: Core 0 panic'ed (LoadProhibited). Exception was unhandled.
Core 0 register dump:
PC : 0x400957b6 PS : 0x00060933 A0 : 0x80085160 A1 : 0x3ffe2120
0x400957b6: is_free at /home/horsemann/Desktop/WorkSpace/TestingRepo/vendors/espressif/esp-idf/components/heap/multi_heap.c:380
(inlined by) multi_heap_malloc_impl at /home/horsemann/Desktop/WorkSpace/TestingRepo/vendors/espressif/esp-idf/components/heap/multi_heap.c:432
A2 : 0x3ffb9a20 A3 : 0x00000074 A4 : 0x3ffb9bc2 A5 : 0x3ffc24f4
A6 : 0x00000000 A7 : 0x3ffc1930 A8 : 0x62df42e6 A9 : 0x00003ffb
A10 : 0x00000001 A11 : 0x00000001 A12 : 0x62df42e6 A13 : 0x3ffb9b98
A14 : 0x3ffb9bc2 A15 : 0x00000003 SAR : 0x0000001d EXCCAUSE: 0x0000001c
EXCVADDR: 0x00003ffb LBEG : 0x4000c2e0 LEND : 0x4000c2f6 LCOUNT : 0xffffffff
ELF file SHA256: 6ba6c7666cfc3a6affb97ff2c01bc138e861781ba07bfe6d7e01fbf2e790ec91
Backtrace: 0x400957b6:0x3ffe2120 0x4008515d:0x3ffe2140 0x40085456:0x3ffe2160 0x40085671:0x3ffe21a0 0x400817ad:0x3ffe21c0 0x400eeae1:0x3ffe21e0 0x400ef431:0x3ffe2220 0x400ee77f:0x3ffe2240 0x400e2ca4:0x3ffe2270 0x400e2e90:0x3ffe22c0 0x400f2ca9:0x3ffe22e0 0x400f2d18:0x3ffe2300 0x400f3c7a:0x3ffe2320 0x4008fa5d:0x3ffe2350
0x400957b6: is_free at /home/horsemann/Desktop/WorkSpace/TestingRepo/vendors/espressif/esp-idf/components/heap/multi_heap.c:380
(inlined by) multi_heap_malloc_impl at /home/horsemann/Desktop/WorkSpace/TestingRepo/vendors/espressif/esp-idf/components/heap/multi_heap.c:432
0x4008515d: heap_caps_malloc at /home/horsemann/Desktop/WorkSpace/TestingRepo/vendors/espressif/esp-idf/components/heap/heap_caps.c:232
0x40085456: trace_malloc at /home/horsemann/Desktop/WorkSpace/TestingRepo/vendors/espressif/esp-idf/components/heap/heap_trace.c:188
0x40085671: __wrap_heap_caps_malloc at /home/horsemann/Desktop/WorkSpace/TestingRepo/vendors/espressif/esp-idf/components/heap/heap_trace.c:421
0x400817ad: malloc_internal_wrapper at /home/horsemann/Desktop/WorkSpace/TestingRepo/vendors/espressif/esp-idf/components/esp32/esp_adapter.c:407
0x400eeae1: esf_buf_alloc at ??:?
0x400ef431: ic_ebuf_alloc at ??:?
0x400ee77f: ieee80211_getmgtframe at ??:?
0x400e2ca4: ieee80211_encap_null_data at ??:?
0x400e2e90: ieee80211_pm_tx_null_process at ??:?
0x400f2ca9: pm_tx_null_data_done_process at ??:?
0x400f2d18: pm_send_wake_null_cb at ??:?
0x400f3c7a: ppProcTxDone at ??:?
0x4008fa5d: ppTask at ??:?
The call stack is presented in the debug output:
ppTask calls
ppProcTxDone calls
pm_send_wake_null_cb etc.
Since the error is likely to be in your code, look at the last call in the backtrace that is yours, at the place(s) where it calls the next function in the stack dump, and verify the validity of any call parameters.
Another useful piece of information is the value in the EXCCAUSE (Exception Cause) register, 28 (0x1C), which indicates access to an invalid address.
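As a small aid for reading such dumps, here is a sketch of a lookup for a handful of Xtensa EXCCAUSE values. The function name `exccause_name` is made up for this illustration, and the table is deliberately partial; it lists only values I am sure of and returns NULL for everything else.

```c
#include <stddef.h>

/* Returns a name for a few common Xtensa EXCCAUSE values.
 * Partial table for illustration only; unknown values yield NULL. */
const char *exccause_name(unsigned cause)
{
    switch (cause) {
    case 0:  return "IllegalInstruction";
    case 6:  return "IntegerDivideByZero";
    case 9:  return "LoadStoreAlignment";
    case 28: return "LoadProhibited";   /* read from an invalid address */
    case 29: return "StoreProhibited";  /* write to an invalid address */
    default: return NULL;
    }
}
```

For the dump in the question, `exccause_name(0x1c)` gives "LoadProhibited", which matches the text of the panic line, and EXCVADDR (0x00003ffb here) is the address that the faulting load tried to read.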
The location of the exception is:
0x400957b6: is_free at [...]/espressif/esp-idf/components/heap/multi_heap.c:380
That function looks like this:
static inline bool is_free(const block_header_t *block)
{
    return ((block->size & 0x01) != 0);
}
and the most likely cause of an exception there is if block refers to an invalid location or is null or has invalid alignment when it is dereferenced by block->size. The exception rather suggests the first (or maybe second) of these possibilities.
That suggests heap corruption, which could have occurred anywhere at any time previously, not necessarily in the code path indicated by the backtrace. It is typically caused by over-running or under-running an allocated heap block, and is only detected when a later heap operation (malloc, free, new, delete etc.) tries to interpret the already corrupted heap.
You therefore need to review your usage of every dynamically allocated block to ensure, for example, that you:
Have allocated an appropriate size in all cases,
Have not accessed or modified data beyond the bounds of the allocated size,
Have not accessed the memory after it has been free'd / delete'd or otherwise returned to the heap,
Do not have a memory leak.
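One way to catch the second item on that list closer to where it happens is to put guard bytes after each block and check them before the block is freed. The following is a minimal sketch in portable C11, not the ESP-IDF allocator; `guarded_malloc`, `guarded_check`, `guarded_free`, `GUARD_SIZE` and `GUARD_BYTE` are all hypothetical names introduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_SIZE 8      /* number of guard bytes placed after each block */
#define GUARD_BYTE 0xA5   /* pattern expected in an untouched guard */

/* Header in front of each block; the union keeps the user pointer aligned. */
typedef union {
    size_t size;
    max_align_t align;
} guard_header;

/* Allocates size bytes followed by GUARD_SIZE guard bytes. */
void *guarded_malloc(size_t size)
{
    guard_header *h = malloc(sizeof *h + size + GUARD_SIZE);
    if (h == NULL)
        return NULL;
    h->size = size;
    unsigned char *user = (unsigned char *)(h + 1);
    memset(user + size, GUARD_BYTE, GUARD_SIZE);
    return user;
}

/* Returns 1 if the guard after the block is intact, 0 if it was overwritten. */
int guarded_check(const void *ptr)
{
    const guard_header *h = (const guard_header *)ptr - 1;
    const unsigned char *guard = (const unsigned char *)ptr + h->size;
    for (size_t i = 0; i < GUARD_SIZE; i++) {
        if (guard[i] != GUARD_BYTE)
            return 0;
    }
    return 1;
}

/* Frees a guarded block, asserting first that it was not over-run. */
void guarded_free(void *ptr)
{
    if (ptr == NULL)
        return;
    assert(guarded_check(ptr) && "heap block over-run detected");
    free((guard_header *)ptr - 1);
}
```

With this in place, writing even one byte past the end of a block makes `guarded_check()` return 0, so the over-run is reported for the block that was damaged rather than at some unrelated later allocation. ESP-IDF has its own facilities in the same spirit (heap poisoning options in the configuration and `heap_caps_check_integrity_all()`), which are probably the better choice on the target.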
Aside:
"Guru Mediation Error" is a misspelling of "Guru Meditation Error"; it is itself meaningless (a "cute" computing history reference to what was itself a joke, then rendered less cute or funny by misspelling), but it is essentially akin to a kernel panic or BSOD. The critical thing is that an exception occurred.
Here is my code:
#pragma comment (linker, "/INCLUDE:_tls_used")
#pragma comment (linker, "/INCLUDE:p_tls_callback1")
#pragma const_seg(push)
#pragma const_seg(".CRT$XLAAA")
EXTERN_C const PIMAGE_TLS_CALLBACK p_tls_callback1 = tls_start_protect;
#pragma const_seg(pop)
and the following piece of code is called directly by tls_start_protect:
char buf[10];
sprintf_s(buf, 10, "hello\n");
and it crashes.
0:000> k
# Child-SP RetAddr Call Site
00 0000007c`e676eb58 00007ff6`e3b04829 AGTtest!__crtFlsGetValue+0x10 [f:\dd\vctools\crt\crtw32\misc\winapisupp.c # 422]
01 0000007c`e676eb60 00007ff6`e3b047f3 AGTtest!_getptd_noexit+0x1d [f:\dd\vctools\crt\crtw32\startup\tidtable.c # 277]
02 0000007c`e676eb90 00007ff6`e3b03737 AGTtest!_getptd+0xb [f:\dd\vctools\crt\crtw32\startup\tidtable.c # 337]
03 0000007c`e676ebc0 00007ff6`e3b06030 AGTtest!_LocaleUpdate::_LocaleUpdate+0x1b [f:\dd\vctools\crt\crtw32\h\setlocal.h # 248]
04 0000007c`e676ebf0 00007ff6`e3b02816 AGTtest!_output_s_l+0x6c [f:\dd\vctools\crt\crtw32\stdio\output.c # 1028]
05 0000007c`e676ef10 00007ff6`e3b028a8 AGTtest!_vsnprintf_helper+0x92 [f:\dd\vctools\crt\crtw32\stdio\vsprintf.c # 140]
06 0000007c`e676ef80 00007ff6`e3b025a3 AGTtest!_vsprintf_s_l+0x3c [f:\dd\vctools\crt\crtw32\stdio\vsprintf.c # 237]
07 0000007c`e676efc0 00007ff6`e3b0112f AGTtest!sprintf_s+0x1f [f:\dd\vctools\crt\crtw32\stdio\sprintf.c # 216]
08 0000007c`e676f000 00007ffb`bd6a52c8 AGTtest!tls_start_protect+0x1f [d:\repos\antidebug\agt\tls_callback.c # 83]
09 0000007c`e676f040 00007ffb`bd6a1577 ntdll!LdrpCallInitRoutine+0x4c
0a 0000007c`e676f0a0 00007ffb`bd7201cd ntdll!LdrpCallTlsInitializers+0x93
0b 0000007c`e676f120 00007ffb`bd75166d ntdll!LdrpInitializeProcess+0x1c99
0c 0000007c`e676f510 00007ffb`bd706d5e ntdll!_LdrpInitialize+0x4a8b9
0d 0000007c`e676f590 00000000`00000000 ntdll!LdrInitializeThunk+0xe
Similarly, the _vscprintf(format, args) family also crashes in __crtFlsGetValue.
I wonder whether it's too early to call the printf family, because some initialization hasn't been done yet. What I know is that the TLS callback (DLL_PROCESS_ATTACH only) is executed after 'ntdll!Ldr*' loads all dependent modules and before 'EOP'.
Questions:
Are there any details about how _vscprintf is initialized? Is it done by the CRT in some C++ constructors' code?
Are there any other restrictions on TLS callbacks?
If I do need to call _vscprintf in a TLS callback, how can I do it? (I just want to print before main.)
You are using a statically linked CRT; this is visible from your stack trace. A statically linked CRT in an EXE is initialized only after the EXE entry point is called, but the TLS callback for DLL_PROCESS_ATTACH is called before the EXE entry point. At that point your static CRT is not yet initialized, and any call into that CRT code can crash. Solution: use the dynamically linked CRT in a separate DLL; in that case it will already be initialized before the TLS callback runs.
I'm trying to unmount a volume in my Cocoa application using the Disk Arbitration Framework.
Before calling:
DADiskUnmount(disk,
kDADiskUnmountOptionDefault,
unmountCallback,
self );
I register a callback function that gets called afterwards:
void unmountCallback(DADiskRef disk, DADissenterRef dissenter, void *context)
{
    if (dissenter != NULL)
    {
        DAReturn ret = DADissenterGetStatus(dissenter);
        switch (ret) {
            case kDAReturnBusy:
                printf("kDAReturnBusy\n");
                break;
        }
    }
}
In this function I try to interpret the dissenter return value but get stuck. I suppose it should be of type DAReturn and have a value like kDAReturnBusy. But when, e.g., iTunes is using the volume and it cannot be unmounted, "ret" has a value of 0xc010, which I don't quite understand.
If unmounting fails, I'd like to find out why the volume can't be unmounted and, if another application is using it, remind the user to close that application.
But when e.g. iTunes is using the volume and it can not be unmounted "ret" has a value of 0xc010 that I don't quite understand.
The documentation you linked to, for the DAReturn type, lists all the Disk Arbitration constants as looking like this:
kDAReturnError = err_local | err_local_diskarbitration | 0x01, /* ( 0xF8DA0001 ) */
So, DA's error returns are all made of three components, OR'd together.
If you look at the documentation for DADissenterGetStatus, it says:
A BSD return code, if applicable, is encoded with unix_err().
If you then search the headers for unix_err, you find it in /usr/include/mach/error.h, which says:
/* unix errors get lumped into one subsystem */
#define unix_err(errno) (err_kern|err_sub(3)|errno)
and:
/*
* error number layout as follows:
*
* hi lo
* | system(6) | subsystem(12) | code(14) |
*/
There's those three components again. Some other macros in error.h arrange the system and subsystem values (e.g., err_kern and err_sub(3)) into those positions.
So now, let's open the Calculator, press ⌘3 to put it into programmer mode, switch it to base-16, and type in your error code, and see what it says:
0xC010
0000 0000 0000 0000 1100 0000 0001 0000
(bit 31 is leftmost, bit 0 is rightmost)
Breaking that apart according to the above layout, we find:
Bits 31-26: 0000 00
System: 0, which error.h says is err_kern. This error came from the kernel.
Bits 25-14: 00 0000 0000 11
Subsystem: 3 (0b11). This plus the system code matches the aforementioned definition of unix_err. So this is a BSD return code, as DADissenterGetStatus said.
Bits 13-0: 00 0000 0001 0000
Individual error code: 16 (0x10, 0b10000).
UNIX/BSD errors are defined in <sys/errno.h>, which says:
#define EBUSY 16 /* Device / Resource busy */
This suggests to me that you can't unmount that device because it's in use.
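The same decoding can be done in code instead of in Calculator. Here is a minimal sketch that follows the system(6) / subsystem(12) / code(14) layout quoted above; the function names are my own for illustration. On macOS, mach/error.h provides the err_get_system(), err_get_sub() and err_get_code() macros for the same purpose.

```c
#include <stdint.h>

/* Field layout from mach/error.h: | system(6) | subsystem(12) | code(14) | */
uint32_t err_system(uint32_t err)    { return (err >> 26) & 0x3f; }
uint32_t err_subsystem(uint32_t err) { return (err >> 14) & 0xfff; }
uint32_t err_code(uint32_t err)      { return err & 0x3fff; }

/* unix_err() places BSD errno values in system 0 (err_kern), subsystem 3. */
int is_unix_err(uint32_t err)
{
    return err_system(err) == 0 && err_subsystem(err) == 3;
}
```

For 0xC010 this yields system 0, subsystem 3 and code 16, so `is_unix_err()` is true and the code can be compared against the errno constants such as EBUSY. For a native Disk Arbitration value like 0xF8DA0001 the check is false, and the value should be compared against the kDAReturn constants instead.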
The above post nicely explains how to find out information about the error code you are seeing.
However, how do you actually solve the problem of the unmount failing due to EBUSY?
If you don't care about processes that might still be using the mounted volume, you can just force the unmount by changing:
DADiskUnmount(disk, kDADiskUnmountOptionDefault...)
to
DADiskUnmount(disk, kDADiskUnmountOptionForce...)
Your idea of "reminding the user of closing this application" is more complicated to implement. If you really want to go that way, I guess you could parse the output of /usr/sbin/lsof to find the 'offending' process names.
We have a large legacy VB app made up of a number of DLLs (a couple of dozen or so), all installed into a single COM+ Server Application. Every now and then, something happens that causes dllhost.exe to keel over (and automatically restart), leaving this message in the Windows Application Event log...
The system has called a custom component and that component has
failed and generated an exception. This indicates a problem with the
custom component. Notify the developer of this component that a failure has
occurred and provide them with the information below.
Server Application ID: {8CC02F18-2733-4A17-9E5C-1A70CB6B6977}
Server Application Instance ID: {1940A147-8A5E-45FA-86FE-DAF92A822597}
Server Application Name: MyTestApp
The serious nature of this error has caused the process to terminate.
Exception: C0000005
Address: 0x758DA3DA
Source: Complus
Event ID: 4786
Level: Error
Alongside this is another log entry, specifically for dllhost.exe...
Faulting application name: dllhost.exe, version: 6.0.6000.16386, time stamp: 0x4549b14e
Faulting module name: msvcrt.dll, version: 7.0.6002.18005, time stamp: 0x49e0379e
Exception code: 0xc0000005
Fault offset: 0x0000a3da
Faulting process id: 0x83c
Faulting application start time: 0x01cb50c507ee0166
Faulting application path: %11
Faulting module path: %12
Report Id: %13
I know it's flagging a failure in the C runtime (msvcrt), but ideally I need to trace this back into the DLL that's called into msvcrt (probably with bad data/parameters). So without installing a debugger, is there any way to identify the DLL that causes this? I'm trying to see if there's a memory dump anywhere I can use to analyse offline - and thus tie the Address to something specific. But without that, I'm not sure that's possible. Can the COM subsystem be told to generate a minidump when a hosted application crashes? (yes it can [probably] - there's a checkbox on the 'Dump' tab).
This is on Windows Server 2008 R1 32-bit (but I'd also be interested in answers for Server 2003).
It doesn't affect availability of the app -- COM+ simply restarts dllhost and the application continues, but it is an inconvenience that would be useful to fix.
Edit: Okay, I've got a crash dump and I've got WinDbg, but it's not helping. Not sure if I'm being thick (a possibility) or something else :-) The output of !analyze -v is below, but it's not showing me anything in our DLLs, although it looks like it hasn't been able to resolve FAULTING_IP. I'm not sure where to turn next.
I'm wondering if any of my PDBs are dodgy and whether it would be worth generating new ones -- I'm hooked into Microsoft's symbol server, so they shouldn't be, but I'm not sure which module it's (apparently) reporting wrong symbols for (BUGCHECK_STR and PRIMARY_PROBLEM_CLASS) (or are these symbols on the server the code was originally running on?). Would it be better to put the PDBs on the server itself?
If not, any other ideas? I've used windbg briefly before, but I'm no regular user of it, so maybe there's some more incantations I need to type to dig deeper? Guidance welcome :-)
*******************************************************************************
* *
* Exception Analysis *
* *
*******************************************************************************
FAULTING_IP:
+5c112faf02e0d82c
00000000 ?? ???
EXCEPTION_RECORD: ffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 00000000
ExceptionCode: 80000003 (Break instruction exception)
ExceptionFlags: 00000000
NumberParameters: 0
FAULTING_THREAD: 00000f1c
DEFAULT_BUCKET_ID: WRONG_SYMBOLS
PROCESS_NAME: dllhost.exe
ERROR_CODE: (NTSTATUS) 0x80000003 - {EXCEPTION} Breakpoint A breakpoint has been reached.
EXCEPTION_CODE: (HRESULT) 0x80000003 (2147483651) - One or more arguments are invalid
MOD_LIST: <ANALYSIS/>
NTGLOBALFLAG: 0
APPLICATION_VERIFIER_FLAGS: 0
MANAGED_STACK: !dumpstack -EE
OS Thread Id: 0xf1c (0)
Current frame:
ChildEBP RetAddr Caller,Callee
LAST_CONTROL_TRANSFER: from 77b15620 to 77b15e74
PRIMARY_PROBLEM_CLASS: WRONG_SYMBOLS
BUGCHECK_STR: APPLICATION_FAULT_WRONG_SYMBOLS
STACK_TEXT:
0022fa68 77b15620 77429884 00000064 00000000 ntdll!KiFastSystemCallRet
0022fa6c 77429884 00000064 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
0022fadc 774297f2 00000064 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xbe
0022faf0 778e2c44 00000064 ffffffff 00e42374 kernel32!WaitForSingleObject+0x12
0022fb0c 778e2e32 00060848 0022fb5b 00000000 ole32!CSurrogateProcessActivator::WaitForSurrogateTimeout+0x55
0022fb24 00e413a4 0022fb40 00000000 00061d98 ole32!CoRegisterSurrogateEx+0x1e9
0022fcb0 00e41570 00e40000 00000000 00061d98 dllhost!WinMain+0xf2
0022fd40 7742d0e9 7ffde000 0022fd8c 77af19bb dllhost!_initterm_e+0x1a1
0022fd4c 77af19bb 7ffde000 dc2ccd29 00000000 kernel32!BaseThreadInitThunk+0xe
0022fd8c 77af198e 00e416e6 7ffde000 ffffffff ntdll!__RtlUserThreadStart+0x23
0022fda4 00000000 00e416e6 7ffde000 00000000 ntdll!_RtlUserThreadStart+0x1b
STACK_COMMAND: .cxr 00000000 ; kb ; dt ntdll!LdrpLastDllInitializer BaseDllName ; dt ntdll!LdrpFailureData ; ~0s; .ecxr ; kb
FOLLOWUP_IP:
dllhost!WinMain+f2
00e413a4 ff15a410e400 call dword ptr [dllhost!_imp__CoUninitialize (00e410a4)]
SYMBOL_STACK_INDEX: 6
SYMBOL_NAME: dllhost!WinMain+f2
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: dllhost
IMAGE_NAME: dllhost.exe
DEBUG_FLR_IMAGE_TIMESTAMP: 4549b14e
FAILURE_BUCKET_ID: WRONG_SYMBOLS_80000003_dllhost.exe!WinMain
BUCKET_ID: APPLICATION_FAULT_WRONG_SYMBOLS_dllhost!WinMain+f2
Do you have symbols for the VB DLLs? Symbols are important for getting the call stack, so make sure they are the correct ones. You can use ld * and then lme, which should give you the list of modules whose symbols did not match within WinDbg. Also set the symbol path for the Microsoft symbols as well as for your own code using _NT_SYMBOL_PATH.
One of the easiest options is to load the dump in DebugDiag, which should give you the reason for the failure along with the call stack. DebugDiag has debugger extensions for COM+.
Here is a command to print the native call stack for all threads:
~*ek
and this one switches to the context of the current exception:
.ecxr
Debug Mon / WinDbg is the best way to troubleshoot this issue.
You should be able to use the modules list in WinDbg, or the lm command, to list the loaded modules. The stack trace should then tell you which DLLs are involved. This should be possible even without symbols for the process or DLL.
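The arithmetic behind tying an address to a module is simple enough to sketch. The event log in the question gives an absolute address (0x758DA3DA) and a fault offset inside msvcrt.dll (0x0000a3da); subtracting one from the other gives the load address of the module, and a module list then tells you which image contains any other address on the stack. The `module_info` struct and both function names below are hypothetical, and any sizes you fill in would have to come from the real lm output.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry of a loaded-module list, e.g. copied from the lm output. */
typedef struct {
    const char *name;
    uint32_t base;   /* load address of the image */
    uint32_t size;   /* image size in bytes */
} module_info;

/* Load address implied by an absolute address and an offset into the module. */
uint32_t module_base(uint32_t address, uint32_t offset)
{
    return address - offset;
}

/* Returns the module whose address range contains addr, or NULL if none does. */
const module_info *find_module(const module_info *mods, size_t count, uint32_t addr)
{
    for (size_t i = 0; i < count; i++) {
        if (addr >= mods[i].base && addr - mods[i].base < mods[i].size)
            return &mods[i];
    }
    return NULL;
}
```

With the numbers from the event log, `module_base(0x758DA3DA, 0xA3DA)` is 0x758D0000, which should match the base address that lm reports for msvcrt on that machine. Running the return addresses from the faulting thread's stack through `find_module()` is the manual equivalent of what the debugger does when it prints module names in a stack trace without symbols.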