(Apple Silicon) (M1) Inexplicable SIGBUS crash

In some native M1 code I'm working on, calling a particular function raises a SIGBUS fault that makes no sense:
Exception Type: EXC_BAD_ACCESS (SIGBUS)
Exception Codes: KERN_PROTECTION_FAILURE at 0x0000000280dc7da0
Exception Codes: 0x0000000000000002, 0x0000000280dc7da0
Exception Note: EXC_CORPSE_NOTIFY
Termination Reason: Namespace SIGNAL, Code 10 Bus error: 10
Terminating Process: exc handler [12171]
VM Region Info: 0x280dc7da0 is in 0x280d50000-0x280dd0000; bytes after start: 490912 bytes before end: 33375
REGION TYPE START - END [ VSIZE] PRT/MAX SHRMOD REGION DETAIL
VM_ALLOCATE 280cf0000-280d50000 [ 384K] rw-/rwx SM=ZER
---> VM_ALLOCATE 280d50000-280dd0000 [ 512K] rwx/rwx SM=ZER
VM_ALLOCATE 280dd0000-280e50000 [ 512K] rw-/rwx SM=ZER
According to this dump:
The fault address is the same as the function address.
The function address (0x280dc7da0) is properly aligned.
The target region has rwx protection and is therefore executable.
What could possibly be triggering SIGBUS here?
BTW, an Intel (x64) version of this program works fine on x64 Macs and in Rosetta.

The problem here is most likely Thread JIT Write Protection, a feature that only exists on Apple Silicon and operates in addition to conventional memory page permissions. Unfortunately, Apple's crash dumps seem to provide no indication that Thread JIT Write Protection could be the SIGBUS trigger.
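As a rough illustration of how this feature is usually handled: for memory mapped with MAP_JIT, each thread sees the pages as either writable or executable, never both, and pthread_jit_write_protect_np() switches between the two states. Calling into the region while the current thread is still in the writable state faults at the function address, which would match the dump above. The sketch below is a minimal example under that assumption; the helper name jit_copy_code is hypothetical, and on other platforms it falls back to a plain memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__APPLE__) && defined(__arm64__)
#include <pthread.h>
#include <libkern/OSCacheControl.h>
#define HAVE_THREAD_JIT_WP 1
#else
#define HAVE_THREAD_JIT_WP 0
#endif

/* Hypothetical helper: copy generated machine code into a MAP_JIT region.
 * Returns 1 if per-thread JIT write protection was toggled, 0 otherwise. */
static int jit_copy_code(void *dst, const void *src, size_t len)
{
    assert(dst != NULL && src != NULL);
#if HAVE_THREAD_JIT_WP
    pthread_jit_write_protect_np(0);  /* this thread: MAP_JIT pages writable, not executable */
    memcpy(dst, src, len);
    pthread_jit_write_protect_np(1);  /* this thread: MAP_JIT pages executable again */
    sys_icache_invalidate(dst, len);  /* make the new instructions visible to the CPU */
    return 1;
#else
    memcpy(dst, src, len);            /* no per-thread JIT protection on this platform */
    return 0;
#endif
}
```

The state is per thread, so the thread that later calls into the region must itself be in the executable state (pthread_jit_write_protect_np(1)) at that point.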

Related

Debug exception in tRootTask in VxWorks

Using VxWorks A653 v2.5.0.2 I am getting an exception being generated in tRootTask during startup:
data storage
Exception current instruction address: 0x50000120
Machine Status Register: 0x0000fb30
Data Exception Address Register: 0x00000008
Integer Exception Register XER: 0x00000000
Condition Register: 0x40000088
Exception Syndrome Register: 0x01000000
Partition Domain ID: 0x007539a0
Task: 0x521f5920 "tRootTask"
tRootTask does spawn a number of other kernel tasks and also creates a task for the user partition. But the entry point of the user task in the user partition never seems to run (a printf() statement there is not hit). After the exception occurs it is possible to attach a debugger, but tRootTask itself is deleted by the exception. If I access the shell after the exception, attempting to display the contents at 0x50000000 fails, as if the memory were unmapped. This may be the root cause of the exception, but why the memory is inaccessible is unclear.
So I am searching for a way to debug why the exception is happening. I'm new to this OS.
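Look in your linker map for the address reported as
"Exception current instruction address: 0x50000120"
Or, if the shell has "lkAddr" (which looks symbols up by address, whereas "lkup" searches by name):
-> lkAddr 0x50000120
This should give the nearest global function to the instruction that is throwing the exception.
Data Exception Address Register: 0x00000008
looks like a zero-page access (typically a NULL pointer dereferenced at a small offset), but you need to decipher the "Exception Syndrome Register" value using the PowerPC manual for your core to confirm the exact cause.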
"tRootTask" is the first thread in the context, so it's some sort of startup code that's failing. But it's so early, you probably need a JTAG debugger to get a breakpoint on it.
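To make the linker-map lookup concrete, here is a minimal sketch of what an address-to-symbol search does: given a symbol table sorted by address, it returns the last symbol whose address is not above the faulting instruction address. The symbol_t type and nearest_symbol function are hypothetical names for illustration, not part of VxWorks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry of a symbol table, e.g. as read from a linker map. */
typedef struct {
    uint32_t    addr;
    const char *name;
} symbol_t;

/* Return the symbol with the highest address <= pc, or NULL if pc lies
 * below every symbol. The table must be sorted by ascending address. */
static const symbol_t *nearest_symbol(const symbol_t *table, size_t n, uint32_t pc)
{
    assert(table != NULL || n == 0);
    size_t lo = 0, hi = n;                 /* search window is [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid].addr <= pc)
            lo = mid + 1;                  /* candidate; look further right */
        else
            hi = mid;                      /* symbol starts after pc */
    }
    return lo == 0 ? NULL : &table[lo - 1];
}
```

The offset pc - symbol->addr then tells you how far into that function the faulting instruction is, which is only meaningful if the address is actually inside that function rather than in unmapped or stripped code.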

RabbitMQ Crash Explanation

Our RabbitMQ service crashed twice with the following report in the $RABBITMQ_NODENAME-sasl.log:
=CRASH REPORT==== 7-Jun-2016::14:37:25 ===
crasher:
initial call: gen:init_it/6
pid: <0.223.0>
registered_name: []
exception exit: {{badmatch,
{[{msg_location,
<<162,171,39,113,226,229,228,92,227,253,48,186,
45,48,29,98>>,
1,357,0,583},
******************
16000 similar msg_location lines snipped
******************
1795219}},
[{rabbit_msg_store,combine_files,3,[]},
{rabbit_msg_store_gc,attempt_action,3,[]},
{rabbit_msg_store_gc,handle_cast,2,[]},
{gen_server2,handle_msg,2,[]},
{proc_lib,wake_up,3,
[{file,"proc_lib.erl"},{line,250}]}]}
in function gen_server2:terminate/3
ancestors: [msg_store_persistent,rabbit_sup,<0.159.0>]
messages: [{'$gen_cast',{combine,394,380}}]
links: [#Port<0.86370>,<0.218.0>,#Port<0.86369>]
dictionary: [{{"/var/lib/rabbitmq/mnesia/$RABBITMQ_NODENAME/msg_store_persistent/357.rdq",
fhc_file},
{file,1,false}},
{{"/var/lib/rabbitmq/mnesia/$RABBITMQ_NODENAME/msg_store_persistent/340.rdq",
fhc_file},
{file,1,true}},
{fhc_age_tree,{2,
{{1465,346244,764691},
#Ref<0.0.3145729.257998>,nil,
{{1465,346244,891543},
#Ref<0.0.3145729.258001>,nil,nil}}}},
{{#Ref<0.0.3145729.257998>,fhc_handle},
{handle,{file_descriptor,prim_file,{#Port<0.86369>,59}},
0,false,0,1048576,[],false,
"/var/lib/rabbitmq/mnesia/$RABBITMQ_NODENAME/msg_store_persistent/357.rdq",
[raw,binary,read_ahead,read],
[{write_buffer,1048576}],
false,true,
{1465,346244,764691}}},
{{#Ref<0.0.3145729.258001>,fhc_handle},
{handle,{file_descriptor,prim_file,{#Port<0.86370>,64}},
14212552,false,0,1048576,[],false,
"/var/lib/rabbitmq/mnesia/$RABBITMQ_NODENAME/msg_store_persistent/340.rdq",
[raw,binary,read_ahead,read,write],
[{write_buffer,1048576}],
true,true,
{1465,346244,891543}}}]
trap_exit: false
status: running
heap_size: 121536
stack_size: 27
reductions: 835024
neighbours:
We'd like to understand what this crash report means. Does it signify a bad message, RMQ can't find a message, or something completely different? We're using RabbitMQ 3.1.5 with Erlang 18, and while we know we're using an old version, we want to first know what's causing the crash before dedicating resources to an upgrade.
This message means that the RabbitMQ message store process failed to combine files during garbage collection of the message store. In theory, this can cause message loss.
Note that 3.1.5 is no longer supported and has not been tested with OTP 18. The issue may already be fixed in newer versions, though.

php7.0.2 Program terminated with signal 11, Segmentation fault

I am running php-7.0.2 with CodeIgniter (a PHP MVC framework). I got some segmentation faults that caused core dumps, and I found that they occurred randomly when the child php-fpm processes shut down and restarted. I don't know why.
Using gdb "bt" to display the core dump:
Core was generated by `php-fpm: pool www '.
Program terminated with signal 11, Segmentation fault.
#0 zend_string_release (ht=0x114dae0) at /home/smt/phpng/php-7.0.2/Zend/zend_string.h:269
269 /home/smt/phpng/php-7.0.2/Zend/zend_string.h: No such file or directory.
in /home/smt/phpng/php-7.0.2/Zend/zend_string.h
Missing separate debuginfos, use: debuginfo-install php7-7.0.2-20160407105024.x86_64
(gdb) bt
#0 zend_string_release (ht=0x114dae0) at /home/smt/phpng/php-7.0.2/Zend/zend_string.h:269
#1 zend_hash_destroy (ht=0x114dae0) at /home/smt/phpng/php-7.0.2/Zend/zend_hash.c:1273
#2 0x000000000080647b in module_destructor (module=0x14b6ae0)
at /home/smt/phpng/php-7.0.2/Zend/zend_API.c:2509
#3 0x000000000080075c in module_destructor_zval (zv=<value optimized out>)
at /home/smt/phpng/php-7.0.2/Zend/zend.c:615
#4 0x000000000080dcff in _zend_hash_del_el_ex (ht=0x1154780)
at /home/smt/phpng/php-7.0.2/Zend/zend_hash.c:1013
#5 _zend_hash_del_el (ht=0x1154780) at /home/smt/phpng/php-7.0.2/Zend/zend_hash.c:1037
#6 zend_hash_graceful_reverse_destroy (ht=0x1154780) at /home/smt/phpng/php-7.0.2/Zend/zend_hash.c:1489
#7 0x0000000000800096 in zend_shutdown () at /home/smt/phpng/php-7.0.2/Zend/zend.c:840
#8 0x00000000007a2a6a in php_module_shutdown () at /home/smt/phpng/php-7.0.2/main/main.c:2339
#9 0x000000000089e45d in main (argc=<value optimized out>, argv=<value optimized out>)
at /home/smt/phpng/php-7.0.2/sapi/fpm/fpm/fpm_main.c:1997
(gdb) quit
The php-fpm.log is as follows:
[20-Apr-2016 08:00:02] WARNING: [pool www] child 11751 exited on signal 11 (SIGSEGV - core dumped) after 3600.462022 seconds from start
I am very curious about this bug.
So far, I am sure that the core dumps occurred when fpm restarted. The restarts were caused by the command 'kill -10 fpm-master-process-ids'. fpm also restarted a child when it had processed 'pm.max_requests' requests.
However, the core dumps didn't occur at every restart, and the probability of a core dump was very small. I cannot find the pattern.
Fortunately, I have installed version 7.0.5 to replace 7.0.2 in our production environment, and it has run for three days without core dumps.
I cannot find any relevant modification in the changelogs from 7.0.2 to 7.0.5. This is a very strange bug and I want to know the reason. Who can tell me something about this bug?
After updating to 7.0.5, no core dump has occurred for 2 weeks. So, this bug has been fixed in 7.0.5!
I still don't know what caused this bug.
I am a curious cat. #_#

How to get stack trace from System.AccessViolationException?

I have a crash dump (minidump) for an application which includes managed and unmanaged code. The application crashed with the exception System.Reflection.TargetInvocationException during execution of Delegate.DynamicInvoke. The inner exception is System.AccessViolationException, but it doesn't include stack trace information (see below).
Is it possible to get this stack trace somehow?
Windbg output for !analyze -v (I've shortened the stack trace to make it more clear):
EXCEPTION_OBJECT: !pe 6c09b28
Exception object: 0000000006c09b28
Exception type: System.Reflection.TargetInvocationException
Message: Exception has been thrown by the target of an invocation.
InnerException: System.AccessViolationException, Use !PrintException 0000000006c09988 to see more.
StackTrace (generated):
SP IP Function
00000000004FB540 0000000000000000 mscorlib_ni!System.RuntimeMethodHandle.InvokeMethod(System.Object, System.Object[], System.Signature, Boolean)+0x1
00000000004FB540 000007FEF761D28C mscorlib_ni!System.Reflection.RuntimeMethodInfo.UnsafeInvokeInternal(System.Object, System.Object[], System.Object[])+0x4c
00000000004FB5B0 000007FEF754FD3A mscorlib_ni!System.Delegate.DynamicInvokeImpl(System.Object[])+0x6a
00000000004FB610 000007FE99A15010 CCTV_Framework_Utility!CCTV.Framework.Utility.EventsHelper.SafeInvoke(System.Delegate, System.Object[])+0x60
StackTraceString:
HResult: 80131604
EXCEPTION_OBJECT: !pe 6c09988
Exception object: 0000000006c09988
Exception type: System.AccessViolationException
Message: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
InnerException: none
StackTrace (generated): none
StackTraceString: none
HResult: 80004003

COM exception on "custom component" - how to identify DLL?

We have a large legacy VB app made up of a number of DLLs (a couple of dozen or so), all installed into a single COM+ Server Application. Every now and then, something happens that causes dllhost.exe to keel over (and automatically restart), leaving this message in the Windows Application Event log...
The system has called a custom component and that component has
failed and generated an exception. This indicates a problem with the
custom component. Notify the developer of this component that a failure has
occurred and provide them with the information below.
Server Application ID: {8CC02F18-2733-4A17-9E5C-1A70CB6B6977}
Server Application Instance ID: {1940A147-8A5E-45FA-86FE-DAF92A822597}
Server Application Name: MyTestApp
The serious nature of this error has caused the process to terminate.
Exception: C0000005
Address: 0x758DA3DA
Source: Complus
Event ID: 4786
Level: Error
Alongside this is another log entry, specifically for dllhost.exe...
Faulting application name: dllhost.exe, version: 6.0.6000.16386, time stamp: 0x4549b14e
Faulting module name: msvcrt.dll, version: 7.0.6002.18005, time stamp: 0x49e0379e
Exception code: 0xc0000005
Fault offset: 0x0000a3da
Faulting process id: 0x83c
Faulting application start time: 0x01cb50c507ee0166
Faulting application path: %11
Faulting module path: %12
Report Id: %13
I know it's flagging a failure in the C runtime (msvcrt), but ideally I need to trace this back into the DLL that's called into msvcrt (probably with bad data/parameters). So without installing a debugger, is there any way to identify the DLL that causes this? I'm trying to see if there's a memory dump anywhere I can use to analyse offline - and thus tie the Address to something specific. But without that, I'm not sure that's possible. Can the COM subsystem be told to generate a minidump when a hosted application crashes? (yes it can [probably] - there's a checkbox on the 'Dump' tab).
This is on Windows Server 2008 R1 32-bit (but also be interested for Server 2003 as well).
It doesn't affect availability of the app -- COM+ simply restarts dllhost and the application continues, but it is an inconvenience that would be useful to fix.
Edit: Okay, I've got a crash dump, I've got WinDbg, but it's not helping. Not sure if I'm being thick (a possibility) or something else :-) The output of !analyze -v is below, but it's not showing me anything in our DLLs, although it looks like it hasn't been able to resolve FAULTING_IP? I'm not sure where to turn next.
I'm wondering whether any of my PDBs are dodgy and whether it would be worth generating new ones. WinDbg is hooked into Microsoft's symbol server, so the system symbols shouldn't be wrong, but I can't tell which module it is (apparently) reporting wrong symbols for (see BUGCHECK_STR and PRIMARY_PROBLEM_CLASS). Or does it need the symbols from the server the code was originally running on? Would it be better to put the PDBs on the server itself?
If not, any other ideas? I've used WinDbg briefly before, but I'm no regular user of it, so maybe there are some more incantations I need to type to dig deeper? Guidance welcome :-)
*******************************************************************************
* *
* Exception Analysis *
* *
*******************************************************************************
FAULTING_IP:
+5c112faf02e0d82c
00000000 ?? ???
EXCEPTION_RECORD: ffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 00000000
ExceptionCode: 80000003 (Break instruction exception)
ExceptionFlags: 00000000
NumberParameters: 0
FAULTING_THREAD: 00000f1c
DEFAULT_BUCKET_ID: WRONG_SYMBOLS
PROCESS_NAME: dllhost.exe
ERROR_CODE: (NTSTATUS) 0x80000003 - {EXCEPTION} Breakpoint A breakpoint has been reached.
EXCEPTION_CODE: (HRESULT) 0x80000003 (2147483651) - One or more arguments are invalid
MOD_LIST: <ANALYSIS/>
NTGLOBALFLAG: 0
APPLICATION_VERIFIER_FLAGS: 0
MANAGED_STACK: !dumpstack -EE
OS Thread Id: 0xf1c (0)
Current frame:
ChildEBP RetAddr Caller,Callee
LAST_CONTROL_TRANSFER: from 77b15620 to 77b15e74
PRIMARY_PROBLEM_CLASS: WRONG_SYMBOLS
BUGCHECK_STR: APPLICATION_FAULT_WRONG_SYMBOLS
STACK_TEXT:
0022fa68 77b15620 77429884 00000064 00000000 ntdll!KiFastSystemCallRet
0022fa6c 77429884 00000064 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
0022fadc 774297f2 00000064 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xbe
0022faf0 778e2c44 00000064 ffffffff 00e42374 kernel32!WaitForSingleObject+0x12
0022fb0c 778e2e32 00060848 0022fb5b 00000000 ole32!CSurrogateProcessActivator::WaitForSurrogateTimeout+0x55
0022fb24 00e413a4 0022fb40 00000000 00061d98 ole32!CoRegisterSurrogateEx+0x1e9
0022fcb0 00e41570 00e40000 00000000 00061d98 dllhost!WinMain+0xf2
0022fd40 7742d0e9 7ffde000 0022fd8c 77af19bb dllhost!_initterm_e+0x1a1
0022fd4c 77af19bb 7ffde000 dc2ccd29 00000000 kernel32!BaseThreadInitThunk+0xe
0022fd8c 77af198e 00e416e6 7ffde000 ffffffff ntdll!__RtlUserThreadStart+0x23
0022fda4 00000000 00e416e6 7ffde000 00000000 ntdll!_RtlUserThreadStart+0x1b
STACK_COMMAND: .cxr 00000000 ; kb ; dt ntdll!LdrpLastDllInitializer BaseDllName ; dt ntdll!LdrpFailureData ; ~0s; .ecxr ; kb
FOLLOWUP_IP:
dllhost!WinMain+f2
00e413a4 ff15a410e400 call dword ptr [dllhost!_imp__CoUninitialize (00e410a4)]
SYMBOL_STACK_INDEX: 6
SYMBOL_NAME: dllhost!WinMain+f2
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: dllhost
IMAGE_NAME: dllhost.exe
DEBUG_FLR_IMAGE_TIMESTAMP: 4549b14e
FAILURE_BUCKET_ID: WRONG_SYMBOLS_80000003_dllhost.exe!WinMain
BUCKET_ID: APPLICATION_FAULT_WRONG_SYMBOLS_dllhost!WinMain+f2
Do you have symbols for the VB DLLs? Correct symbols are important for getting a usable call stack. Within WinDbg you can run ld * and then lme, which should list the modules whose symbols did not match. Also set the symbol path for the MS symbols as well as for your custom code using _NT_SYMBOL_PATH.
One of the easiest options is to load the dump in DebugDiag, which should give you the reason for the failure along with the call stack. DebugDiag has debugger extensions for COM+.
And here is a command to dump the native call stack for all the threads
~*ek
and this one switches to the context of the current exception
.ecxr
Debug Mon / WinDbg is the best way to troubleshoot this issue.
You should be able to use the modules list in WinDbg, or the lm command, to list the loaded modules. The stack trace should then tell you which DLLs are involved. This should be possible even without symbols for the process or DLLs.
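As a small worked example of tying an address to a module: assuming the two event log entries in the question describe the same fault, the Address (0x758DA3DA) minus the Fault offset (0x0000a3da) gives the load address of msvcrt.dll, and any return address from the stack can be matched against the module ranges that lm prints. The sketch below shows both steps; the module_t type and function names are hypothetical, and module sizes must come from the real lm output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One loaded module, as listed by the debugger (32-bit process). */
typedef struct {
    uint32_t    base;   /* load address */
    uint32_t    size;   /* image size in bytes */
    const char *name;
} module_t;

/* Load address of the faulting module = fault address - fault offset. */
static uint32_t module_base_from_fault(uint32_t fault_address, uint32_t fault_offset)
{
    assert(fault_offset <= fault_address);
    return fault_address - fault_offset;
}

/* Return the module whose [base, base + size) range contains addr, or NULL. */
static const module_t *find_module(const module_t *mods, size_t n, uint32_t addr)
{
    for (size_t i = 0; i < n; i++) {
        if (addr >= mods[i].base && addr - mods[i].base < mods[i].size)
            return &mods[i];
    }
    return NULL;
}
```

With the values from the question, module_base_from_fault(0x758DA3DA, 0x0000A3DA) yields 0x758D0000. Running find_module over each return address on the faulting thread's stack is then a way to spot the first frame that belongs to one of your own DLLs.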