I'm working on a console app developed by a guy who doesn't work here any longer. While debugging, the ContextSwitchDeadlock exception was thrown (I found this question on the exception). If I ignore it, the app will eventually work through the loop it occurs in. The app runs as a scheduled task every day, but this particular process is not called every single time.
I'm wondering if it is OK to allow this exception to go to production. The author of this app put it in production with this exception, and it's been running ever since. Should I just make my updates (which are unrelated to this exception) and leave the app as is? Or should I try to address the issue? Addressing it seems daunting to me :/
Ben. I would say 'NO'. Unless your exception is a ThreadAbortException (i.e. the user closed a window and so the process is dead) or some such thing, an exception like this could open your code up to cascading failures. Based on what we do where I work:
I think, as a band-aid, you should wrap the offending code in a try-catch and wire it up to send you an email every time it catches something. That gives you documentation of what's going on AND keeps cascading failures from propagating throughout your code (it quarantines the problem).
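A minimal sketch of that log-and-notify quarantine, written in Python rather than the app's own language. The names run_quarantined, step and notify are hypothetical, and notify stands in for whatever sends the email. It shows only the general pattern and says nothing about how this particular exception behaves in .NET.

```python
import logging
import traceback

logger = logging.getLogger("nightly_job")  # hypothetical logger name


def run_quarantined(step, notify):
    """Run one step; on failure, log it, notify someone, and keep going.

    `step` and `notify` are hypothetical callables supplied by the caller.
    Returns True if the step succeeded, False if it raised.
    """
    try:
        step()
        return True
    except Exception:
        report = traceback.format_exc()
        logger.error("Step %r failed:\n%s", getattr(step, "__name__", step), report)
        try:
            notify(report)  # e.g. send an email with the stack trace
        except Exception:
            # A broken notifier must not hide the original failure.
            logger.exception("Notification itself failed")
        return False
```

The caller decides what to do with a False return value, for example skipping the rest of that loop iteration while the remaining work continues.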
Towards a fix (when you have time), debug it and step through to figure out why your main thread is taking so long, and if you can, create a worker thread to handle that (DISCLAIMER: this would be my opening attack angle at this problem, based on the answer from the link you provided. I have NOT tested this, nor do I have experience enough to definitively say this will work).
EDIT: After running into this error myself in a particularly long-running process, I came across this slew of answers on MSDN:
http://social.msdn.microsoft.com/Forums/en/vsto/thread/bf71a6a8-2a6a-4c0a-ab7b-effb09451a89
I resolved my own error (I was reading a System.IO.FileStream into a StringBuilder instead of using a String and the StreamReader ReadToEnd method), and I think the thread might be helpful to you as well.
and subsequently, obviously, to read/take the topic. The problematic topic is published under the BuiltinQosLibExp::Generic.KeepLastReliable.TransientLocal policy, and the message is sent only once, at the startup of the publisher application. A few things to consider:
I'm not using this policy; I am taking the default policy configuration in code
dds::sub::qos::DataReaderQos tempQos = inSubScriber->default_datareader_qos();
m_EntitySpecReader = new dds::sub::DataReader<XXX_ICD::Entity_Specification_DT>(*inSubScriber, topicLocal, tempQos, m_EntitySpecListener);
from subscriber
The problem is not a firewall or some other connection issue, since I receive other cyclic topics without any problem.
Frustratingly, I can see this topic when I monitor with either rtiddsspy or the RTI Administration Console.
Last and most frustrating, and the point where I really felt stuck: I have a listener configured with all available callbacks, and I expected to receive, if not the data, at least some callback hinting at a possible mismatch, a lost sample, something... but it stays silent no matter what I try :)
I would be more than happy if somebody has an answer or a potential direction to check :)
You are using the default QoS for your DataReader. This means that its Durability policy is VOLATILE. Even though the DataWriter is configured as TRANSIENT_LOCAL, it will not deliver "old" samples to your DataReader since it is not requesting those due to its volatile durability. In this context, "old" samples are samples that were written before the DataWriter discovered the DataReader.
Things should start working as expected when you configure your DataReader with a Durability policy of TRANSIENT_LOCAL as well.
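To make the reasoning concrete, here is a toy model in Python of how durability decides whether a late-joining reader gets "old" samples. It is not the RTI API: ToyWriter, ToyReader and match are hypothetical names, and the model covers only durability, ignoring reliability, history depth and discovery.

```python
from dataclasses import dataclass, field

# Durability kinds, ordered from weakest to strongest.
VOLATILE, TRANSIENT_LOCAL = 0, 1


@dataclass
class ToyReader:
    durability: int
    received: list = field(default_factory=list)


@dataclass
class ToyWriter:
    durability: int
    history: list = field(default_factory=list)   # samples already written
    readers: list = field(default_factory=list)   # readers matched so far

    def write(self, sample):
        self.history.append(sample)
        for reader in self.readers:
            reader.received.append(sample)

    def match(self, reader):
        # Requested/offered rule: the writer must offer at least what the reader requests.
        if reader.durability > self.durability:
            return False  # a real reader would see an incompatible-QoS status here
        self.readers.append(reader)
        # Only a reader that itself requests TRANSIENT_LOCAL is sent the old samples.
        if reader.durability >= TRANSIENT_LOCAL:
            reader.received.extend(self.history)
        return True


writer = ToyWriter(TRANSIENT_LOCAL)
writer.write("entity-spec")                # published once, at startup

late_volatile = ToyReader(VOLATILE)        # default reader QoS
late_durable = ToyReader(TRANSIENT_LOCAL)
writer.match(late_volatile)
writer.match(late_durable)
```

In this model both readers match the writer, but only late_durable ends up holding the sample written before it joined, which mirrors the situation described above.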
If you instrumented a Listener on the DataReader, it should show you that a match has taken place though, or that it has failed. If you implemented both the on_subscription_matched and on_requested_incompatible_qos callbacks, then at least one of those two should fire if you have both applications started and if they are able to discover each other.
Since you discovered that the problem was a type mismatch, I wanted to show how the Admin Console tool could have helped you find that. Reproducing your issue, this is what it showed:
If your Play app discovers it is unable to operate, for instance because of missing mandatory configuration items, what is the correct way of handling that?
Log an error and call System.exit()? Or is there a "nicer" way?
From a little research, it seems there is a method for closing down the actual play application but this does not shut down the app server (e.g. Netty) (at least in dev mode). Combining this with System.exit() appears to do a "safe" shutdown by first dealing with Play:
play.api.Play.stop
System.exit(-1)
But it will be interesting to test it in your specific circumstances.
This discussion talks more about the meaning of shutting down safely and has an example of Play.stop being called.
BTW, Netty seems to have a stop method, which does a few other things besides the Play.stop call.
Caveat: have not used this in anger.
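For illustration, the "check mandatory configuration, stop the framework, then exit non-zero" sequence can be sketched as follows. This is Python, not Play code: the key names, missing_keys, start_or_exit and the shutdown hook are all hypothetical, with shutdown standing in for the Play.stop call above.

```python
import sys

REQUIRED_KEYS = ("db.url", "app.secret")  # hypothetical key names


def missing_keys(config, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in `config`."""
    return [key for key in required if not config.get(key)]


def start_or_exit(config, shutdown=lambda: None, exit_fn=sys.exit):
    """Refuse to start when mandatory configuration is missing.

    `shutdown` stands in for the framework's own stop hook; it runs before
    the process exits with a non-zero status code.
    """
    missing = missing_keys(config)
    if missing:
        print("Missing mandatory configuration: " + ", ".join(missing), file=sys.stderr)
        shutdown()
        exit_fn(1)
        return False  # only reached if exit_fn does not really exit (e.g. in tests)
    return True
```

Passing the exit function in as a parameter is a design choice made here so that the check can be exercised without killing the process.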
I am using the UI Automation COM-to-.NET Adapter to read the contents of a target Google Chrome browser that plays Flash content on Windows 7. It works.
I succeeded in getting the content and elements. Everything works fine for some time, but after a few hours the elements become inaccessible.
The (AutomationElement).FindAll() returns 0 children.
Is there any internal, undocumented timeout used by UIAutomation?
According to this IUIAutomation2 interface,
there are two timeouts, but they are not accessible from the IUIAutomation interface.
IUIAutomation2 is supported only on Windows 8 (desktop apps only).
So I believe there is some timeout.
I made a workaround that restarts the searching and monitoring of elements from the beginning of the desktop tree but the elements are still not available.
After some time (not sure how much) the elements are available again.
My requirement is to read the values all the time, as fast as possible, but this behavior undermines the whole architecture.
I read somewhere that there is a timeout of 3 minutes, but I'm not sure.
If there is a timeout, is it possible to change it?
Is it possible to restart something or release/dispose something?
I can't find anything on MSDN.
Does anybody have any idea what is happening and how to resolve it?
Thanks for this nicely put question. I have a similar issue with a much different setup. I'm on Win7, using UIAutomationCore.dll directly from C# to test our application-under-development. After running my sequence of actions & event subscriptions and all the other things, I intermittently observe that the UIA interface stops working (about 8-10min in my case, but I'm heavily using the UIA interface).
Many different things, including dispatching the COM interface and sleeping at different places, failed. The funny revelation was that I managed to run AccEvent.exe (part of the SDK, like inspect.exe) during the test and saw that events stopped flowing to AccEvent, too. So it wasn't my client's interface that stopped; rather, the COM server (or whatever UIAutomationCore does) stopped responding.
As a solution (one that seems to work most of the time, or at least improves the situation a lot), I decided to give the application under test some breathing room, since using UIA puts additional load on it. This could be done with well-placed sleep points in your client, but instead of sleeping for a fixed time, I monitor the processor load of the application and wait until it settles down.
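A rough sketch of that "wait until the load settles" idea, in Python rather than C#. Everything here is an assumption for illustration: sample_load is a hypothetical callable that returns the target process's CPU usage in percent, and the threshold, sample count and timeout are arbitrary values, not the ones I actually use.

```python
import time


def wait_until_settled(sample_load, threshold=5.0, stable_samples=3,
                       interval=0.2, timeout=30.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Block until `sample_load()` stays below `threshold` several times in a row.

    Returns True once the load has settled, False if `timeout` seconds pass first.
    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    calm = 0
    while clock() < deadline:
        if sample_load() < threshold:
            calm += 1
            if calm >= stable_samples:
                return True
        else:
            calm = 0  # a spike resets the streak
        sleep(interval)
    return False
```

Requiring several calm samples in a row, instead of a single one, avoids resuming the UIA calls during a brief dip between two bursts of work.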
One of the intermittent errors I receive when the problem manifests itself is "... was unable to call any of the subscribers..", and my search resulted in an msdn page saying they have improved things on CUIAutomation8 interface, but as this is Windows8 specific, I didn't have the chance to try that yet.
I should add that I also reduced the number of calls to UIA by using more UI caching (FindAllBuildCache), since the less back-and-forth there is, the better it is for UIA. Thanks to Guy's answer to another question: UI Automation events stop being received after a while monitoring an application and then restart after some time
I am working on an application in Silverlight 5. We use WCF for all of our network communication, and it mostly works well. However, we have a couple of Virtual Machines that we use for testing where the app fails, tries to restart itself, fails, etc. in an endless loop. I have added a lot of tracing code, and a lot of try catches, and I have it isolated all the way down to the line of code that is failing, but I still can't get an actual error message from the failure, just the crash. Originally, it was failing on this line of WCF code:
return await Task<List<Instance>>.Factory.FromAsync(Channel.BeginGetInstance, Channel.EndGetInstance, null);
In case it had something to do with the use of async/await, I went back to our old code with callbacks. I still get the same failure, but now I can see that the call to the WCF function completes successfully while the log statement on the first line of the callback never happens, so it seems like it's dying before or outside of the callback.
One other note: it appears the code we have in Application_UnhandledException is not firing, but the code in Application_Exit does run; I see that as the last line in the log file.
I tried to set up remote debugging, but I am unable to connect to the app before it crashes and recycles, so that didn't help either.
I also used TCPView to watch the network traffic, and it looks like communication is happening in both directions.
If anyone has any suggestions of anything else to try, I would greatly appreciate it.
I spent 10 days chasing my tail on this before finally realizing what my problem was. There was a bug in the error logging code. The app was generating an error; I was just not seeing it. Once I realized that and got the error message, the actual underlying bug was fixed in about 5 minutes. Good lesson, though: never assume the underlying code is working, no matter how simple it is.
Scenario:
I have a Distributed-objects-based IPC between a mac application and a launchd daemon (written with Foundation classes). Since I had issues before regarding asynchronous messaging (e.g. I have a registerClient: on the server's root object and whenever there's an event the server's root object notifies / calls a method in the client's proxy object), I did long-polling which meant that the client "harvests" lists of events / notifications from the daemon. This "harvest" is done through a server object method call, which then returns an NSArray instance.
It works pretty well, except that for a few seconds at a time, the server object's process (launched through launchd) is labeled red with the "(Not responding)" tag beside it in Activity Monitor. Like I said, functionally it works well, but we want to get rid of this "Not responding" label.
How can I prevent this "Not responding" tag?
FYI, I have written launchd-based processes before, and this is the first time I have used long-polling. I also tried NSSocketPortNameServer-based connections and NSSocketPort-based ones; they didn't have this problem. Locking doesn't appear to be the issue either, because the only locks used are NSConditions. We logged and debugged the program, and the only locking "issue" seems to be in the harvesting part, which functionally works. Also, the client process is written in PyObjC, while the server process is written in ObjC.
Thanks in advance.
Sample the process to find out what it's doing or waiting on.
Peter's correct in the approach, though you may be able to figure it out through simple inspection. "Not responding" means that you haven't processed events on your event queue for at least 5 seconds (it used to be 2 seconds, but they upped it in 10.4). For a UI process, this would produce a spinning wait cursor; for a non-UI process, the effects are not as easy to see.
If this is a runloop-based program, it probably means you're performing a blocking (synchronous) operation that should instead be done with the run loop and a callback (async). Alternatively, you need a second thread to handle your blocking operations so your main thread can continue to respond to events.
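The second-thread option can be sketched like this. It is Python, not Objective-C, and it is only an illustration of the idea: run_responsive, blocking_call and handle_event are hypothetical names, and the polling loop merely stands in for a run loop servicing its event queue.

```python
import queue
import threading


def run_responsive(blocking_call, handle_event, poll_interval=0.05):
    """Run `blocking_call` on a worker thread while the main loop keeps ticking.

    The main loop calls `handle_event()` on every tick, which stands in for
    processing the event queue, and returns the worker's result when it arrives.
    """
    results = queue.Queue(maxsize=1)

    def worker():
        try:
            results.put(("ok", blocking_call()))
        except Exception as exc:  # hand errors back to the main thread
            results.put(("error", exc))

    threading.Thread(target=worker, daemon=True).start()

    while True:
        handle_event()  # the main thread stays responsive here
        try:
            status, value = results.get(timeout=poll_interval)
        except queue.Empty:
            continue  # nothing yet; go around and service events again
        if status == "error":
            raise value
        return value
```

The main thread never waits longer than poll_interval between ticks, however long blocking_call takes.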
My problem was actually the call for getting a process's PID using the signature FNDR... that part caused the "Not responding" error and it never was the locks or the long-polling part. Sorry about this guys. But thank God I already found the answer.