What is the 'COM context' referred to in the 'ContextSwitchDeadlock' MDA message? - com

While running unit tests, I'm getting the MDA shown below.
In the error message, what is the hexadecimal value refered to as a 'COM context'?
Can I determine this value for a given STA thread? If so, how?
Managed Debugging Assistant
'ContextSwitchDeadlock' has detected a
problem in 'C:\Program Files\Microsoft
Visual Studio
9.0\Common7\IDE\vstesthost.exe'. Additional Information: The CLR has
been unable to transition from COM
context 0x14cff0 to COM context
0x14d218 for 60 seconds. The thread
that owns the destination
context/apartment is most likely
either doing a non pumping wait or
processing a very long running
operation without pumping Windows
messages. This situation generally has
a negative performance impact and may
even lead to the application becoming
non responsive or memory usage
accumulating continually over time. To
avoid this problem, all single
threaded apartment (STA) threads
should use pumping wait primitives
(such as CoWaitForMultipleHandles) and
routinely pump messages during long
running operations.

From what I can see (looking at mscorwks disassembly) it is a IObjContext*, returned from CoGetContextToken().
Basically it looks like a call is queued up using IContextCallback::ContextCallback() from mscorwks!CtxEntry::EnterContextOle32BugAware(), which in turn calls mscorwks!CtxEntry::EnterContextCallback() once the object context (apartment) processes the message. They use a CLREvent to signal callback completion. For STA threads, not pumping messages would cause the event wait to timeout, which triggers the ContextSwitchDeadlock MDA.
Note: I'm not running this under a debugger so I can't confirm the behavior, but this is probably reasonably accurate.

I haven't seen that before, I suspect it's just an internal pointer. Neither thread IDs nor thread handles are usually that large.
There's been no way of getting the apartment type from the current thread, and I've never seen an apartment ID (other than a GUID representing the source/target apartment when marshalling) in native code.

The unit test is most likely running in MTA mode and you have code in it that displays a UI.
One COM context is visual studio, the other is the UI inside your unit test. You can either not display a UI or turn off the MDA.

It looks like the STA COM application does not spin the message loop. This would cause STA COM to just die. I got no idea what the COM context is but it strikes me that you should be able to reproduce your target app to just hang for long periods of time.
Sounds like there is a function that takes more than 60 seconds to run. Have you been able to isolate it?
EDIT This is the guy that wrote COM interop for .NET.
http://blogs.msdn.com/cbrumme/
Have a look at his blog or check out the boards that he frequents. He's written lots of stuff about why COM interop the way it is. It might help to ask this question on one of the Microsoft boards.

Related

How to debug a freeRTOS application?

How do you debug an RTOS application? I am using KEIL µVision and when I hit debug, the program steps through the main function until the function that initializes the RTOS kernel and then you can't step any further. The code itself works though. It is not mine btw, but I have to work on it. Is this normal behavior with RTOS applications or is this related to the program?
Yes, this is normal. You need to set breakpoints in the source code for the tasks that were created in main(): the only purpose of main() in a FreeRTOS application is to :
initialize the hardware,
create the resources (timers, semaphores...) and tasks your application will need,
start the scheduler
The application should never return from vTaskStartScheduler() if they were enough resources available.
Put break-points at the entry point of each task you need to debug. When you step the over the scheduler start (or simply run) the debugger will halts at the first task that runs. When that task blocks, some other task will be selected to run according to the scheduling rules.
Generally when debugging and you reach a blocking call, step-over it, other tasks may run and the debugger will stop at the next line only when the task becomes ready (depending on the nature of the blocking call). Often you will want to predict what task will run as a result of the call, and put a breakpoint in that task. For example if you issue a message send, you might place a breakpoint after the message receive call of the receiving task.
The point is you cannot "step-through" a context switch unless you have the RTOS source or do it at the assembler level, which is seldom useful or productive, and will not work for preemption.
You get a somewhat better RTOS debug experience and tool support in Keil if you use Keil's own RTX5 RTOS rather then FreeRTOS, but all of the above remains true.
Yes, this is an expected behaviour. The best way to debug a RTOS application is to place breakpoints at all tasks, key function entry points and step debug.
The debugger supports various methods of single-stepping through an application as in below link.
http://www.keil.com/products/uvision/db_exe_step.asp
Typical challenges in debugging RTOS application can be dealing with interrupt handling, synchronization issues and register/memory corruption.
Keil µVision's System Analyzer enables one to view the program execution time frame, status of each thread. It shall also help in viewing interrupts, exceptions if tracer is enabled.

procrun server crashes after few seconds

I have a web application, using Spring-Boot. There is now a need for this application to use a custom dll (in house build dll file). There is nothing wrong with this dll, as we use it on our other applications, and have no problems with it.
To load the library in this new web application I'm writing, I have added the dll file to the procrun directory. This directory is on the library path, so that makes sense.
During startup I put in code to immediately load the dll, and also test some of its functionality. This works fine.
However, I have a timer, that schedules the execution of some functions, which may or may not include function calls to the dll.
At some point, about 10 minutes or so into execution, the service unexpected and seemingly without any valid reason, stops.
Although I try/catch exceptions at the appropriate logical places in code, there are no relevant log entries printed.
The Event Log shows something that reminds me of a null pointer exception:
Another bread crumb is that the event log will print something about the dll_unload. (see picture)
I need some help figuring out why the service is failing/stopping.
Kind Regards.
EDIT: After about three days of debugging and scratching my head, I came upon a forum thread that explained that this problem has something to do with the manner in which the system releases the memory during garbage collection. It seems that the dll in question was being unloaded by the garbage collector, even though it could still be called at some time later - which of course was the cause of the service falling over.
To solve the problem, I put in a timer that would call a method in the dll at three minute intervals (on my system this would not impact performance). I know this solution is a hack, but it works for me.

What causes a program to freeze

From what experience I have programming whenever a program has a problem it crashes, whether it is from an unhanded exception or a piece of code that should have been checked for errors, but was not and threw one. What would cause a program to completely freeze a system to the point of requiring a restart.
Edit: Thanks for the answers. As for the language and OS this question was inspired by me playing Fallout and the game freezing twice in an hour causing me to have to restart the xbox, so I am guessing c++.
A million different things. The most common that come to mind are:
Spawning too many threads or processes, which drowns the OS scheduler.
Gobbling too much RAM, which puts the memory manager into page-fault hell.
In a Dotnet/Java type environment its quite difficult to seize a system up, because the Runtime keeps you code at a distance from the OS.
Closer to the metal say C or C++, Assembly etc you have to play fair with the rest of the system - If you dont have it already grab a copy of Petzold and observe/experiment yourself with the amount of 'boilerplate' code to get a single Window running...
Even closer, down at the driver level all sorts of things can happen...
There are number of reasons, being internal or external that leads to deadlocked application, more general case is when something is being asked for by a program but is not given that leads to infinite waiting, the practical example to this is, a program writes some text to a file, but when it is about to open a file for writing, same file is opened by any other application, so the requesting app will wait (freeze in some cases if not coded properly) until it gets exclusive control of the file.
And a critical freeze that leads to restarting the system is when the file which is asked for is something which very important for the OS. However, you may not need to restart the system in order to get it back to normal, unless the program which was frozen is written in a language that produces native binary, i.e. C/C++ to be precise. So if application is written in a language which works with the concept of managed code, like any .NET language, it will not need a system restart to get things back to normal.
page faults, trying to access inaccessible data or memory(acces violation), incompatible data types etc.

VB.NET Synchronization confusion

VB.NET, .NET 4
Hello all,
I have an application that controls an industrial system. It has a GUI which, once a process is started, principally displays the states of various attached devices. It basically works like this:
A System.Timers.Timer object is always running. At each Elapsed event, it polls the devices for their current values and invokes controls on the GUI, updating them with the new values.
A start button is clicked, a process time Stopwatch object is created and started (Labels on the GUI are now invoked and updated on the System.Timers.Timer's Elapsed event, in addition to the other work that is taken care of on this event)
A new thread is created which runs a Process() subroutine
Some Stopwatch objects are created and started (these Stopwatches are periodically restarted during the process via their Restart() method.
Some logic is executed on the new Stopwatchs' Elapsedmilliseconds properties to determine when to do things like write new setpoints to the devices, update the data log, etc...
Here's my problem: The program occasionally freezes. My ignorant efforts at tracking down the problem have led me to suspect that read/writes to the subset of devices that are RS-232 controlled are the culprits most of the time. However, I occasionally see other strange things upon program freeze, e.g., one of the time Labels whose Text property is determined by a Stopwatch's Elapsedmilliseconds property sometimes will show an impossible value (e.g., -50 hours or something).
For the RS-232 problems, I suspect something like a read event is being executed at the same time as a write event and this causes a freeze(?). I tried to prevent this by making sure that all communication with an RS-232 device is funneled through a Transmit() subroutine which has the following attribute:
Which, as far as my ignorance permits me to understand, should force one Transmit() execution to finish completely before another one can start. Perhaps another risk is the code getting blocked here if one Transmit() never finishes?
Regarding the Stopwatch trouble, I speculate that the problem is that the Timer is trying to update a GUI Label at the same time that the Stopwatch's Restart() method is being executed. I'm unsure if this could cause a problem. All I know is that this problem has only occurred at a point in the process when a Restart() call would be made.
I am wondering if I could use a SyncLock or something to lock a Stopwatch while the Label is being updated (or, conversely, while its being restarted)? Or, perhaps I should stop the Timer, restart the Stopwatch, and then start the timer again, like so?:
Timer.Stop
Stopwatch.Restart
Timer.Start
My trepidation regarding how to proceed is due to my complete lack of understanding of how .NET synchronization objects actually work. I've tried slapping a few SyncLocks in various places, but I really have no idea if they're implemented correctly or not. I'm wondering if, having provided all this context, someone really smart might be able to tell me how I'm stupid and how to do this right. I would really appreciate any input. If it would be useful to provide some code snippets, I'd be happy to, I just worry that everything's so convoluted that it would just detract from what I'm hoping is a conceptual question.
Thanks in advance!
Brian
I would consider a shift to a task scheduling framework instead of relying on manual manipulation of timers if your working on anything SCADA related. A simple starting point would be something similar to the hardcodet.Scheduling classes and you can move to something like the beast that is Quartz. Most of these types of frameworks will provide you with a way to pause and resume scheduled actions.
If I'm working with Modbus, I normally keep a local cache of the register values and make changes to any value fire a change event. This has the benefit of allowing you to implement things like refreshing values manually without interfering with your process scheduling and checking for deadband when evaluating your polled response. This happened to be the side effect of implementing a polled protocol to a subset of the OPC DA interface.

How would I go about taking a snapshot of a process to preserve its state for future investigation? Is this possible?

Whether this is possible I don't know, but it would mighty useful!
I have a process that fails periodically (running in Windows 2000). I then have just one chance to react to it before having to restart it and painfully wait for it to fail again. I didn't write the process so don't have the source to debug. The failure is seemingly random.
With a snapshot of the process I could repeatedly and quickly test reactions to the failure.
I had thought of running inside a VM but this isn't possible in this instance.
EDIT:
#Jon Cage asked:
When you say a snapshot, you mean capturing a process when it's about to fail (including memory, program state etc. etc.) ...and then replaying it's final few seconds repeatedly to see what effect it has on some other component?
This is exactly what I mean!
I think minidump is what you are looking for.
You can also used Userdump:
The User Mode Process Dumper
(userdump) dumps any running Win32
processes memory image (including
system processes such as csrss.exe,
winlogon.exe, services.exe, etc) on
the fly, without attaching a debugger,
or terminating target processes.
Generated dump file can be analyzed or
debugged by using the standard
debugging tools.
This article shows you how to use it.
My best bet is to start the process in a debugger (OllyDbg being my preferred tool).
The process will pause on an exception, and you can try to figure out what happened shortly before that.
This needs some understanding of assembler and does not allow to create a snapshot of the process for later analysis. You would need to write your own debugger for that - it should be theoretically possible.