Interpreting data from crash dump using winDbg - wcf

I am trying to debug a WCF service that crashes from time to time. Creating a crash dump file with adplus was the easy part; I used this command:
adplus.exe -crash -pmn myservicehost.exe -o c:\dump
I am opening the .dmp file with WinDbg x64 version 6.2.9200. When I list the threads with !Threads, I see a bunch of threads that are waiting for a callback to complete (I think).
0:031> ~~[1b00]s
ntdll!NtWaitForMultipleObjects+0xa:
00000000`778d18ca c3 ret
0:029> ~~[1b00]s
ntdll!NtWaitForMultipleObjects+0xa:
00000000`778d18ca c3 ret
Eventually it crashes. Here is the output when I look at the call stack from that exception using !PrintException /d -nested 00000002814ad6d0
Exception object: 00000002814af4c8
Exception type: System.Runtime.CallbackException
Message: A user callback threw an exception. Check the exception stack and inner exception to determine the callback that failed.
InnerException: System.ServiceModel.CommunicationObjectAbortedException, Use !PrintException 00000002814ad6d0 to see more.
StackTrace (generated):
SP IP Function
000000000FA89D40 000007FEDD9AD3E2 System_ServiceModel_ni!System.ServiceModel.Channels.CommunicationObject.OnClosed()+0x262
000000000FA8C240 000007FEDD93759D System_ServiceModel_ni!System.ServiceModel.ServiceHostBase.OnClosed()+0x6d
000000000FA8C290 000007FEDD9433D0 System_ServiceModel_ni!System.ServiceModel.ServiceHost.OnClosed()+0x10
000000000FA8C2C0 000007FEDE185B86 System_ServiceModel_ni!System.ServiceModel.Channels.CommunicationObject.Abort()+0x2b6
000000000FA8C3C0 000007FE8ABAC89C MyCompany_WcfApp_WcfAppServiceHost!MyCompany.WcfApp.WcfAppServiceHost.WcfAppServiceHost.FaultedServiceHandler(System.Object, System.EventArgs)+0x26c
000000000FA8C5F0 0000000000000000 mscorlib_ni!System.EventHandler.Invoke(System.Object, System.EventArgs)+0x1
000000000FA8C5F0 000007FEDE184E9A System_ServiceModel_ni!System.ServiceModel.Channels.CommunicationObject.OnFaulted()+0x1ca
000000000FA8C670 000007FEDE184784 System_ServiceModel_ni!System.ServiceModel.Channels.CommunicationObject.Fault()+0x94
000000000FA8C6E0 000007FEDE184E9A System_ServiceModel_ni!System.ServiceModel.Channels.CommunicationObject.OnFaulted()+0x1ca
000000000FA8C760 000007FEDE184784 System_ServiceModel_ni!System.ServiceModel.Channels.CommunicationObject.Fault()+0x94
000000000FA8C7D0 000007FEDE184E9A System_ServiceModel_ni!System.ServiceModel.Channels.CommunicationObject.OnFaulted()+0x1ca
000000000FA8C850 000007FEDE184784 System_ServiceModel_ni!System.ServiceModel.Channels.CommunicationObject.Fault()+0x94
000000000FA8C8C0 000007FEDE475407 System_ServiceModel_ni!System.ServiceModel.Channels.MsmqInputChannelBase.TryReceive(System.TimeSpan, System.ServiceModel.Channels.Message ByRef)+0x4f7
000000000FA8EBF0 000007FEDE5409AE System_ServiceModel_ni!System.ServiceModel.Dispatcher.InputChannelBinder.TryReceive(System.TimeSpan, System.ServiceModel.Channels.RequestContext ByRef)+0x2e
000000000FA8EC50 000007FEDEAC29E2 System_ServiceModel_ni!System.ServiceModel.Dispatcher.ErrorHandlingReceiver.TryReceive(System.TimeSpan, System.ServiceModel.Channels.RequestContext ByRef)+0x646022
000000000FA8ECB0 000007FEDE47C8D6 System_ServiceModel_ni!System.ServiceModel.Dispatcher.ChannelHandler.TryTransactionalReceive(System.Transactions.Transaction, System.ServiceModel.Channels.RequestContext ByRef)+0x396
000000000FA8ED70 000007FEDE47BE07 System_ServiceModel_ni!System.ServiceModel.Dispatcher.ChannelHandler.TransactedLoop()+0xb7
000000000FA8EDF0 000007FEDE47BD31 System_ServiceModel_ni!System.ServiceModel.Dispatcher.ChannelHandler.SyncTransactionalMessagePump()+0x21
000000000FA8EE20 000007FEDE47A829 System_ServiceModel_ni!System.ServiceModel.Dispatcher.ChannelHandler.OnStartSyncMessagePump(System.Object)+0x209
000000000FA8EED0 000007FEDB6DE651 System_ServiceModel_Internals_ni!System.Runtime.IOThreadScheduler+ScheduledOverlapped.IOCallback(UInt32, UInt32, System.Threading.NativeOverlapped*)+0x71
000000000FA8EF30 000007FEDB77A260 System_ServiceModel_Internals_ni!System.Runtime.Fx+IOCompletionThunk.UnhandledExceptionFrame(UInt32, UInt32, System.Threading.NativeOverlapped*)+0x9bcd0
000000000FA8EF90 000007FEDF225C26 mscorlib_ni!System.Threading._IOCompletionCallback.PerformIOCompletionCallback(UInt32, UInt32, System.Threading.NativeOverlapped*)+0x96
The problem is that this isn't really helping me. I need to find out what is creating all these locks. Does anyone have any suggestions on how I might do that? I have never had an issue that required this level of debugging, and to be honest I am not 100% sure what I am doing. If I haven't given enough info, please let me know and I will happily provide whatever else I can.
Thanks

Try the !SyncBlk command from SOS to check for deadlocks, if there are any.
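When reading a long !PrintException stack like the one in the question, it can also help to isolate the frames that come from your own modules. The following is a minimal sketch in Python, assuming the stack text has been copied out of WinDbg into a string; the helper name `user_frames` and the list of framework prefixes are illustrative assumptions, not part of SOS or WinDbg.

```python
import re

# Frame lines look like: "<SP> <IP> <module>!<method>+0x<offset>"
FRAME_RE = re.compile(
    r"^(?P<sp>[0-9A-Fa-f]{16})\s+(?P<ip>[0-9A-Fa-f]{16})\s+"
    r"(?P<module>[^!\s]+)!(?P<method>.+)\+0x[0-9A-Fa-f]+$"
)

# Assumption: modules with these prefixes are framework code, not your own.
FRAMEWORK_PREFIXES = ("System_", "mscorlib")


def user_frames(stack_text):
    """Return (module, method) pairs for frames outside the framework modules."""
    frames = []
    for line in stack_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match and not match.group("module").startswith(FRAMEWORK_PREFIXES):
            frames.append((match.group("module"), match.group("method")))
    return frames


# Three frames copied from the stack trace in the question.
SAMPLE = """\
000000000FA8C2C0 000007FEDE185B86 System_ServiceModel_ni!System.ServiceModel.Channels.CommunicationObject.Abort()+0x2b6
000000000FA8C3C0 000007FE8ABAC89C MyCompany_WcfApp_WcfAppServiceHost!MyCompany.WcfApp.WcfAppServiceHost.WcfAppServiceHost.FaultedServiceHandler(System.Object, System.EventArgs)+0x26c
000000000FA8C5F0 0000000000000000 mscorlib_ni!System.EventHandler.Invoke(System.Object, System.EventArgs)+0x1
"""

for module, method in user_frames(SAMPLE):
    print(module, method)
```

On the posted trace the only non-framework frame is FaultedServiceHandler, which calls Abort() from inside the Faulted event, so that handler is a reasonable place to start looking.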

Debug exception in tRootTask in VxWorks

Using VxWorks A653 v2.5.0.2 I am getting an exception being generated in tRootTask during startup:
data storage
Exception current instruction address: 0x50000120
Machine Status Register: 0x0000fb30
Data Exception Address Register: 0x00000008
Integer Exception Register XER: 0x00000000
Condition Register: 0x40000088
Exception Syndrome Register: 0x01000000
Partition Domain ID: 0x007539a0
Task: 0x521f5920 "tRootTask"
The tRootTask does spawn a number of other kernel tasks and also creates a task for the user partition, but the entry point of the user task in the user partition never seems to run (a printf() statement is not hit). After the exception occurs it is possible to attach a debugger, but the tRootTask itself is deleted by the exception. If I access the shell after the exception, attempting to display the contents at 0x50000000 fails, as if the memory is unmapped. This may be the root cause of the exception, but why it's inaccessible is unclear.
So I am searching for a way to debug why the exception is happening. I'm new to this OS.
Look in your linker map for the address reported as
"Exception current instruction address: 0x50000120"
Or, if the shell has "lkup":
-> lkup 0x50000120
This should give the nearest global function, which is the one throwing the exception.
Data Exception Address Register: 0x00000008
This looks like a zero-page access, but you need to decode the "Exception Syndrome Register" value using the PowerPC manual to confirm that it's the right condition code.
"tRootTask" is the first thread in the context, so it's some sort of startup code that's failing. Because it's so early, you will probably need a JTAG debugger to get a breakpoint on it.
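Finding the function that contains the faulting address in a linker map comes down to locating the last symbol whose start address is at or below that address. Here is a minimal sketch in Python; the symbol names and addresses in `SYMBOLS` are hypothetical placeholders, and the real entries would have to come from your own linker map.

```python
import bisect


def nearest_symbol(symbols, address):
    """Return (name, offset) for the last symbol at or below address, or None."""
    ordered = sorted(symbols, key=lambda item: item[0])
    starts = [start for start, _ in ordered]
    # bisect_right gives the insertion point; the entry before it is the match.
    index = bisect.bisect_right(starts, address) - 1
    if index < 0:
        return None
    start, name = ordered[index]
    return name, address - start


# Hypothetical symbol table; real entries come from your linker map.
SYMBOLS = [
    (0x50000000, "exampleInit"),
    (0x50000100, "exampleStartup"),
    (0x50000200, "exampleMain"),
]

name, offset = nearest_symbol(SYMBOLS, 0x50000120)
print("%s+0x%x" % (name, offset))
```

The offset tells you how far into the function the faulting instruction sits, which you can match against a disassembly listing.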

WebSphere wsadmin testConnection error message

I'm trying to write a script to test all DataSources of a WebSphere Cell/Node/Cluster. While this is possible from the Admin Console, a script is better for certain audiences.
So I found the following article from IBM https://www.ibm.com/support/knowledgecenter/en/SSAW57_8.5.5/com.ibm.websphere.nd.multiplatform.doc/ae/txml_testconnection.html which looks promising as it describes exactly what I need.
I started with a basic script like:
ds_ids = AdminConfig.list("DataSource").splitlines()
for ds_id in ds_ids:
    AdminControl.testConnection(ds_id)
I experienced some undocumented behavior. Contrary to the article above, the testConnection function does not always return a String, but may also throw an exception.
So I simply use a try/except block:
import sys
try:
    AdminControl.testConnection(ds_id)
except:  # it actually is a com.ibm.ws.scripting.ScriptingException
    exc_type, exc_value, exc_traceback = sys.exc_info()
Now when I print exc_value, this is what I get:
com.ibm.ws.scripting.ScriptingException: com.ibm.websphere.management.exception.AdminException: javax.management.MBeanException: Exception thrown in RequiredModelMBean while trying to invoke operation testConnection
Now this error message is always the same no matter what's wrong. I tested authentication errors, missing WebSphere Variables and missing driver classes.
While the Admin Console prints reasonable messages, the script keeps printing the same meaningless message.
The very weird thing is that as long as I don't catch the exception and the script just exits with the error, a descriptive error message is shown.
Accessing the Java exception's cause via exc_value.getCause() gives None.
I've also had a look at the DataSource MBeans, but as they only exist if the servers are started, I quickly gave up on them.
I hope someone knows how to access the error messages I see when not catching the Exception.
thanks in advance
After all the research and testing, AdminControl seems to be nothing more than a convenience facade to some of the commonly used MBeans.
So I tried invoking the Test Connection Service directly (like in the Java example here: https://www.ibm.com/support/knowledgecenter/en/SSEQTP_8.5.5/com.ibm.websphere.base.doc/ae/cdat_testcon.html):
# needed so the except clause below can name the exception type
from com.ibm.ws.scripting import ScriptingException
ds_id = AdminConfig.list("DataSource").splitlines()[0]
# other queries may be 'process=server1' or 'process=dmgr'
ds_cfg_helpers = __wat.AdminControl.queryNames("WebSphere:process=nodeagent,type=DataSourceCfgHelper,*").splitlines()
try:
    # invoke MBean method directly
    warning_cnt = __wat.AdminControl.invoke(ds_cfg_helpers[0], "testConnection", ds_id)
    if warning_cnt == "0":
        print "success"
    else:
        print "%s warning(s)" % warning_cnt
except ScriptingException as exc:
    # get to the root of all evil ignoring exception wrappers
    exc_cause = exc
    while exc_cause.getCause():
        exc_cause = exc_cause.getCause()
    print exc_cause
This works the way I hoped for. The downside is that the code gets much more complicated if one needs to test DataSources that are defined on all kinds of scopes (Cell/Node/Cluster/Server/Application).
I don't need this so I left it out, but I still hope the example is useful to others too.
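The cause-walking loop in the example can be pulled out into a small helper. The sketch below is plain Python that runs outside wsadmin; `FakeJavaException` is a hypothetical stand-in for any Java exception that exposes getCause(), and `root_cause` with its `max_depth` guard is an illustrative name, not a WebSphere API.

```python
class FakeJavaException(Exception):
    """Stand-in for a Java exception exposing getCause(); not a WebSphere class."""

    def __init__(self, message, cause=None):
        Exception.__init__(self, message)
        self._cause = cause

    def getCause(self):
        return self._cause


def root_cause(exc, max_depth=50):
    """Follow getCause() until there is no further cause, then return that exception."""
    current = exc
    for _ in range(max_depth):  # guard against a cyclic cause chain
        cause = current.getCause()
        if cause is None or cause is current:
            break
        current = cause
    return current


# Simulated wrapper chain, innermost exception first.
inner = FakeJavaException("driver class not found")
middle = FakeJavaException("MBeanException wrapper", inner)
outer = FakeJavaException("ScriptingException wrapper", middle)

print(root_cause(outer))
```

Inside wsadmin the same helper could be called on the caught ScriptingException, since it only relies on getCause().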

Replicator failure in service exiting with FailFast

We have a specific stateful service that is unable to fully "go green": the partitions keep reshuffling, and we are not seeing any indication of errors in our own logs. After lots of digging we found something suspicious in the event logs on one of the VMs (pasted below):
Application: (hidden).exe
Framework Version: v4.0.30319
Description: The application requested process termination through System.Environment.FailFast(string message).
Message: ((copyMode & CopyMode.FalseProgress) == 0) || (sourceStartingLsn < targetStartingLsn). Source starting lsn : 2018, target starting lsn :2018
Stack:
at System.Environment.FailFast(System.String)
at Microsoft.ServiceFabric.Replicator.Utility.CodingError(System.String, System.Object[])
at Microsoft.ServiceFabric.Replicator.Utility.Assert(Boolean, System.String, ...)
at Microsoft.ServiceFabric.Replicator.LoggingReplicator.GetLogRecordsToCopy(Microsoft.ServiceFabric.Replicator.ProgressVector, System.Fabric.Epoch, Microsoft.ServiceFabric.Replicator.LogicalSequenceNumber, Microsoft.ServiceFabric.Replicator.LogicalSequenceNumber, Int64, Int64, Microsoft.ServiceFabric.Replicator.LogicalSequenceNumber ByRef, Microsoft.ServiceFabric.Replicator.LogicalSequenceNumber ByRef, Microsoft.ServiceFabric.Data.IAsyncEnumerator'1 ByRef, Microsoft.ServiceFabric.Replicator.BeginCheckpointLogRecord ByRef)
at Microsoft.ServiceFabric.Replicator.LoggingReplicatorCopyStream+d__3.MoveNext()
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[Microsoft.ServiceFabric.Replicator.LoggingReplicatorCopyStream+d__3, Microsoft.ServiceFabric.Data.Impl, Version=5.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]](d__3 ByRef)
at Microsoft.ServiceFabric.Replicator.LoggingReplicatorCopyStream.GetNextAsync(System.Threading.CancellationToken)
at System.Fabric.StateProviderBroker+AsyncEnumerateOperationDataBroker.b__8(System.Threading.CancellationToken)
at System.Fabric.Interop.Utility.WrapNativeAsyncMethodImplementation(System.Func`2, IFabricAsyncOperationCallback, System.String, System.Fabric.Interop.InteropApi)
We are not sure what to make of this. Seems related to state replication but we don't think we've changed anything related to the state of the service. Since the service is exiting with FailFast, we don't get a chance to do anything in our code to remedy this so we are basically stuck in this loop right now (luckily on a non-Live environment but still...)
Does anyone have any idea what this is related to specifically and how we can recover the service and the data?

MQQueueManager Constructor throwing FileNotFoundException

I have the following VB.NET code:
Imports IBM.WMQ
[...]
MQEnvironment.Hostname = hostName
MQEnvironment.Port = portNumber
MQEnvironment.Channel = channelName
queueManager = New MQQueueManager(queueManagerName) ' error here
which is throwing the following error:
System.IO.FileNotFoundException occurred
FileName=C:\Users\User\Documents\Visual Studio 2012\Projects\[...]\bin\Debug\mqclient.ini
HResult=-2147024894
Message=Could not find file 'C:\Users\User\Documents\Visual Studio 2012\Projects\[...]\bin\Debug\mqclient.ini'.
Source=mscorlib
StackTrace:
at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
I am not using any ini files in the construction of my queue manager, so does anyone have any idea what's going on? Why is it even looking for one, and why in the same directory as the program? I have installed the MQ client, and as far as I know I have all the environment variables, etc. set up properly.
Thanks for any help you can give
Is that an unhandled or a first-chance exception? Internally, the MQ .NET layer will try to read an MQClient.ini but should function quite happily without it. It reads the file for compatibility with the C client, and can handle some of the MQClient.ini stanzas. I would not have expected the absence of such a file to cause problems, but it will try to open it internally. Was that the full call stack? I'd have expected some MQ libraries on the stack otherwise.

Return key in process

The code was run as:
u = subprocess.Popen(['process','abc','def','','ghi','jkl'], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
It doesn't work; the following error occurs:
ValueError: I/O operation on closed file
I suggest you try pexpect; it is far better suited to this kind of task (in fact, it is a tool built for exactly this).
You can also browse through its examples to see what its usage looks like.
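If you want to stay with subprocess, here is a minimal sketch of sending a single Return key press to a child process. The failing code is not shown in the question, so the cause is an assumption: one common source of "I/O operation on closed file" is using the pipes again after communicate() has already closed them. The child command below is a hypothetical stand-in that echoes one line of input, and `run_with_return_key` is an illustrative name.

```python
import subprocess
import sys

# Stand-in child process (hypothetical): reads one line from stdin and echoes it.
CHILD = [
    sys.executable,
    "-c",
    "import sys; line = sys.stdin.readline(); print('got ' + repr(line))",
]


def run_with_return_key(cmd):
    """Start cmd, send a single Return key press, and collect its output."""
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # text mode, so the input can be a str
    )
    # communicate() writes the input, closes stdin, and waits for the process
    # to exit. Call it only once; the pipes are closed afterwards, so using
    # proc.stdin or proc.stdout again will fail.
    out, err = proc.communicate(input="\n")
    return proc.returncode, out, err


returncode, out, err = run_with_return_key(CHILD)
print(returncode, out.strip())
```

This only works when all input can be supplied up front; for a back-and-forth dialogue with the process, pexpect remains the better fit.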