I am facing an issue in our C-based application where one of the VxWorks tasks (say Task1) crashed for unknown reasons. The crashed task had locked a mutual-exclusion semaphore (say semA).
Now the next task, Task2, is waiting for semA to be unlocked. Since semA is held by a crashed task, Task2 will wait forever to take semA. This has broken the application's functionality.
We cannot put a timeout on the semA lock in Task2 because semA protects a send routine that sends data over sockets. A timeout would result in failed message communication.
After googling I found that Linux has robust mutexes for this kind of problem, but our platform is VxWorks (version 5.5.1).
So can somebody tell me the way by which we can handle this problem in VxWorks?
I have tried the solution below, but I am not sure how safe it is.
1) Task2 waits on semA with a particular timeout.
2) If that fails, check the state of the task that had locked semA.
3) If Task1's state is SUSPENDED, Task2 calls semDelete on semA and then recreates it.
4) If Task1 is not in the SUSPENDED state, keep waiting to take semA.
I have tested this code as a prototype and it works fine. I am not sure how good an idea it is to recreate the semaphore like this, or what risks it introduces.
Please let me know your inputs.
Thanks
I think your prototyped solution is no more risky than having code (Task1) that crashes for unknown reasons.
If I were to work on your problem, I would first try really hard to find out why Task1 is crashing. If I were unable to figure out the root cause, I would then implement your proposed solution. That is, I would query the state of Task1 (the task holding the semaphore) after a certain amount of time, and then recreate the semaphore.
I must say that even if you implement your workaround of recreating the semaphore, you still have a crashed task that consumes resources. If this problem persists, eventually the whole system will stop working.
In the end, the correct and only way to fix this problem is to fix the crash in Task1. You should be able to get a stack trace showing where it crashed and fix it.
I second the previous answers: finding the cause why Task1 crashes is better than implementing a workaround.
Can you post the messages written by VxWorks of the crashed Task1?
One of the first things I try if a task crashes for no good reason is to increase its stack size (let's say double it). If the task runs fine your stack size is too small. Also try to increase the stack size of the task(s) you've modified lately!
If it is a stack problem, it isn't necessarily Task1 that is to blame...
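For comparison, the robust mutex that the question mentions for Linux behaves roughly as sketched below. This is POSIX threads code, not VxWorks, so it does not solve the problem on VxWorks 5.5.1 by itself; it only illustrates the recovery idea behind the prototype. The names sem_a, dying_owner, lock_with_recovery and demo_robust_mutex are made up for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

pthread_mutex_t sem_a;   /* hypothetical stand-in for semA */

/* Create sem_a as a robust mutex. */
void init_robust_mutex(void)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    assert(rc == 0);
    rc = pthread_mutex_init(&sem_a, &attr);
    assert(rc == 0);
    (void)rc;
    pthread_mutexattr_destroy(&attr);
}

/* Stand-in for Task1: takes the lock and ends without releasing it. */
void *dying_owner(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&sem_a);
    return NULL;   /* thread terminates while still holding sem_a */
}

/* Stand-in for Task2. Returns 1 if the lock was recovered from a dead
 * owner, 0 if it was acquired normally, -1 on any other error. */
int lock_with_recovery(void)
{
    int rc = pthread_mutex_lock(&sem_a);
    if (rc == EOWNERDEAD) {
        /* The previous owner died. Repair the protected state here,
         * then mark the mutex usable again. */
        if (pthread_mutex_consistent(&sem_a) != 0)
            return -1;
        return 1;
    }
    return rc == 0 ? 0 : -1;
}

/* Runs the whole scenario once and reports what lock_with_recovery saw. */
int demo_robust_mutex(void)
{
    pthread_t owner;
    init_robust_mutex();
    int rc = pthread_create(&owner, NULL, dying_owner, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_join(owner, NULL);

    int result = lock_with_recovery();
    if (result >= 0)
        pthread_mutex_unlock(&sem_a);
    pthread_mutex_destroy(&sem_a);
    return result;
}
```

The waiter never needs a timeout here: the lock call itself reports that the owner died, which is the information the prototype obtains by polling the task state.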
I'm new to threading, so there are a few things I'm trying to grasp correctly.
I have a windows form application that uses threading to keep my UI responsive while some server shenanigans are going on.
My question is: when I quit my application, what happens to ongoing threads? Will they run to completion or will they be abruptly interrupted?
If they are interrupted, what can I do to make sure they at least don't get interrupted in a way that would corrupt data on my server (that is, force them to run to a safe place in the code where I know it's OK to interrupt execution)?
You will want to keep a reference to those threads and call .Abort() on them when you want to terminate. Then you put your thread's code in a try/catch block and handle ThreadAbortException. This will let you clean up what you are doing and terminate the thread cleanly at your own pace. In the main thread, after you have called .Abort(), you just wait until the thread is no longer running (by polling the .IsAlive property of the Thread object) and close your application afterwards.
A thread needs a process to run in. The process won't be able to terminate until all the non-background threads you have started have terminated. Threads marked as background threads will be aborted.
So, the behavior is entirely up to your implementation. If you want to close the application, you could wait for all threads to terminate by themselves, you could set an event to ask them to terminate and then wait, or you could just kill the threads.
The UI thread will terminate by itself because it runs a message loop that stops when requested by the operating system; also see Wikipedia and this answer.
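The "set an event to ask them to terminate and wait" option can be sketched as follows. The sketch uses C with POSIX threads rather than .NET, purely to show the pattern; stop_requested, server_worker, run_and_shutdown and the two counters are hypothetical names, and the 1 ms sleep stands in for one unit of server work.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

atomic_bool stop_requested;   /* set by the main thread to ask for shutdown */
atomic_int units_started;     /* work units the worker has begun */
atomic_int units_finished;    /* work units the worker has completed */

/* Worker: checks the stop flag only between work units, so a unit that
 * has started always runs to completion. */
void *server_worker(void *arg)
{
    (void)arg;
    struct timespec unit = {0, 1000000};   /* 1 ms of simulated work */
    while (!atomic_load(&stop_requested)) {
        atomic_fetch_add(&units_started, 1);
        nanosleep(&unit, NULL);
        atomic_fetch_add(&units_finished, 1);
        /* Safe point: nothing is half-done here. */
    }
    return NULL;
}

/* Starts the worker, lets it run briefly, requests shutdown and waits.
 * Returns 1 if every started unit was also finished. */
int run_and_shutdown(void)
{
    pthread_t t;
    struct timespec run_for = {0, 20000000};   /* 20 ms */

    atomic_store(&stop_requested, false);
    int rc = pthread_create(&t, NULL, server_worker, NULL);
    assert(rc == 0);
    (void)rc;

    nanosleep(&run_for, NULL);
    atomic_store(&stop_requested, true);   /* ask, do not kill */
    pthread_join(t, NULL);                 /* wait for the safe point */

    return atomic_load(&units_started) == atomic_load(&units_finished);
}
```

Because the worker is never interrupted in the middle of a unit, this avoids the half-finished state that killing a thread can leave behind.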
I am looking for a technique to hold off starting a thread (background worker, Task, etc.) while a previous thread is still processing. The thread has an object writer, and if it is busy I cannot use it in the next thread until it finishes its write.
Note that the processing that occurs before each thread request is long enough that there should not be an issue; this is just precautionary.
I am guessing that how I request the thread here is critical to having some sort of response back that will allow the next thread to get called. But I could use some help on how to set this up. If anyone has a specific scenario of similar design I would be happy researching the recommended technique. Sort of new to this sort of thread processing.
vb.net
I'm not sure how you plan on implementing this, but you should try and use the TPL vs. using Threads directly. With Tasks, you can wait on them to complete.
See the following example https://msdn.microsoft.com/en-us/library/dd537610(v=vs.100).aspx
And read the following on Threads vs. Tasks if you need more information on the differences.
http://blog.slaks.net/2013-10-11/threads-vs-tasks/
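The "wait on the previous one to complete" idea can be sketched as below. This is C with POSIX threads rather than VB.NET and the TPL, so treat it only as an illustration of the pattern; shared_log is a hypothetical stand-in for the shared object writer, and write_job, start_job and finish_jobs are made-up names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

char shared_log[64];          /* hypothetical stand-in for the shared writer */
pthread_t previous_job;       /* handle of the most recently started job */
bool have_previous = false;   /* true while previous_job has not been joined */

/* One write job: appends its text to the shared writer. */
void *write_job(void *arg)
{
    const char *text = arg;
    strncat(shared_log, text, sizeof shared_log - strlen(shared_log) - 1);
    return NULL;
}

/* Start a new job, but only after the previous one has finished, so two
 * jobs never touch the writer at the same time. */
void start_job(const char *text)
{
    if (have_previous)
        pthread_join(previous_job, NULL);   /* blocks until the writer is free */
    int rc = pthread_create(&previous_job, NULL, write_job, (void *)text);
    assert(rc == 0);
    (void)rc;
    have_previous = true;
}

/* Wait for the last outstanding job, for example before shutting down. */
void finish_jobs(void)
{
    if (have_previous) {
        pthread_join(previous_job, NULL);
        have_previous = false;
    }
}
```

Calling start_job("a;"), start_job("b;"), start_job("c;") and then finish_jobs() leaves the writes in shared_log in the order they were requested, because each job is joined before the next one starts.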
Typically mutexes are used for synchronization.
https://msdn.microsoft.com/en-us/library/windows/desktop/ms684266(v=vs.85).aspx
Note that you'll also need to handle WAIT_ABANDONED, which is the status returned when a thread that held the mutex dies instead of releasing it.
Examples and more info for .Net here: https://msdn.microsoft.com/en-us/library/system.threading.mutex(v=vs.110).aspx
Going through the logs generated by my 'CoreBluetooth' state machine, I have noticed that on occasion didDisconnectPeripheral is called while the peripheral is in CBPeripheralStateConnecting and before a didConnectPeripheral. The code is immune to this strangeness; however, I would like to understand what is happening.
Anyone else experienced this or anything similar? I cannot find any logical explanation.
In iOS 6, when CoreBluetooth was rather less mature, I adopted the strategy of requesting a connection and, if it didn't succeed within the next 2 seconds, calling cancelPeripheralConnection and then issuing another connectPeripheral. This cycle would continue 3 more times before terminating and informing the user that something was wrong.
It would appear that the calls to didDisconnectPeripheral, even when not first connected, were a result of the intermediate calls to cancelPeripheralConnection.
Now with the stability of iOS7 and having learned that connectPeripheral never times out I have removed the complexity of intermediate cancelPeripheralConnection & connectPeripheral calls and just wait for the connection, with a timeout.
No more mystery didDisconnectPeripheral calls!
I'm working on a console app developed by a guy who doesn't work here any longer. While debugging, the ContextSwitchDeadlock exception was thrown (I found this question on the exception). If I ignore it, the app will eventually work through the loop it occurs in. The app runs as a scheduled task every day, but this particular process is not called every single time.
I'm wondering if it is OK to allow this exception to go to production. The author of this app put it in production with this exception, and it's been running ever since. Should I just make my (unrelated to this exception) updates and leave the app as is? Or should I try to address the issue? Addressing it seems daunting to me :/
Ben. I would say 'NO'. Unless your exception is a ThreadAbortException (i.e. the user closed a window and so the process is dead) or some such thing, an exception like this could open your code up to cascading failures. Based on what we do where I work:
I think, as a band aid, you should encapsulate the offending code with a Try-Catch, and wire it up to send you an email every time it Catches so you have documentation on what's going on AND so that you prevent cascading failures from propagating throughout your code (quarantine the problem).
Towards a fix (when you have time), debug it and step through to figure out why your main thread is taking so long, and if you can, create a worker thread to handle that (DISCLAIMER: this would be my opening attack angle at this problem, based on the answer from the link you provided. I have NOT tested this, nor do I have experience enough to definitively say this will work).
EDIT: After running into this error for a particularly long running process myself, I came across this slew of answers on msdn:
http://social.msdn.microsoft.com/Forums/en/vsto/thread/bf71a6a8-2a6a-4c0a-ab7b-effb09451a89
While I resolved my error (I was reading a System.IO.FileStream into a String Builder instead of using a String and the StreamReader ReadToEnd method), I think it might be helpful to you.
Scenario:
I have a Distributed-objects-based IPC between a mac application and a launchd daemon (written with Foundation classes). Since I had issues before regarding asynchronous messaging (e.g. I have a registerClient: on the server's root object and whenever there's an event the server's root object notifies / calls a method in the client's proxy object), I did long-polling which meant that the client "harvests" lists of events / notifications from the daemon. This "harvest" is done through a server object method call, which then returns an NSArray instance.
It works pretty well, until for a few seconds, the server object's process (launched thru launchd) starts being labeled red with the "(Not responding)" tag beside it (inside Activity Monitor). Like I said, functionally, it works well, but we just want to get rid of this "Not responding" label.
How can I prevent this "Not responding" tag?
FYI, I already did launchd-based processes before and this is the first time I did long-polling. Also, I tried NSSocketPortNameServer-based connections and also NSSocketPort-based ones. They didn't have this problem. Locking wasn't also an issue 'coz the locks used were only NSCondition's and we logged and debugged the program and it seems like the only locking "issue" is on the harvesting part, which actually, functionally works. Also, client-process is written in PyObjC while server process was written using ObjC.
Thanks in advance.
Sample the process to find out what it's doing or waiting on.
Peter's correct in the approach, though you may be able to figure it out through simple inspection. "Not responding" means that you're not processing events on your event queue for at least 5 seconds (used to be 2 seconds, but they upped it in 10.4). For a UI process, this would create a spinning wait cursor, but for a non-UI process, you're not seeing the effects as easily.
If this is a run-loop-based program, it means you're probably doing something with a blocking (synchronous) operation that should be done with the run loop and a callback (async). Alternatively, you need a second thread to process your blocking operations so your main thread can continue to respond to events.
My problem was actually the call for getting a process's PID using the signature FNDR... that part caused the "Not responding" error and it never was the locks or the long-polling part. Sorry about this guys. But thank God I already found the answer.