Good method to pass messages between embedded RTOS tasks (that can handle message timeouts gracefully)

I'm working with an embedded RTOS (CMX), but I think this applies to any embedded RTOS. I want to pass messages between various tasks. The problem is that one task sometimes 'locks' every other task out for a long period of time (several seconds).
I give up waiting for a message to be ACK'ed after ~100 ms or so. If I send a mailbox message during one of these lock-ups, the task that sent the message will have stopped waiting for its reply by the time the receiving task gets the message and tries to act on it. The problem is that the receiving task has a pointer to the message, but since the sending task has moved on, the pointer no longer points to a valid message, which can cause huge problems.
I have no method of removing messages once they are in the queue. How can I handle this error gracefully?

This question actually covers several different issues / points.
First of all, I'm wondering why one task hogs the CPU for seconds at a time sometimes. Generally this is an indication of a design problem. But I don't know your system, and it could be that there is a reasonable explanation, so I won't go down that rabbit hole.
So from your description, you are enqueueing pointers to messages, not copies of messages. Nothing inherently wrong with that. But you can encounter exactly the problem you describe.
There are at least 2 solutions to this problem. Without knowing more, I cannot say which of these might be better.
The first approach would be to pass a copy of the message, instead of a pointer to it. For example, VxWorks msg queues (not CMX queues obviously) have you enqueue a copy of the message. I don't know if CMX supports such a model, and I don't know if you have the bandwidth / memory to support such an approach. Generally I avoid this approach when I can, but it has its place sometimes.
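For illustration, here is a minimal sketch in C of the copy-in/copy-out idea using a plain ring buffer (the struct, names, and sizes are made up; a real RTOS queue would do the copying and the locking for you):

#include <string.h>

#define QUEUE_DEPTH 8           /* must be a power of two for the index arithmetic below */

typedef struct {
    int  id;
    char payload[32];
} msg_t;

static msg_t    queue[QUEUE_DEPTH];
static unsigned head, tail;     /* guard with the RTOS's lock/critical section in real code */

/* Enqueue copies the message, so the sender's buffer can be reused or go
 * out of scope as soon as this returns. */
int msg_send(const msg_t *m)
{
    if (tail - head == QUEUE_DEPTH)
        return -1;                              /* queue full */
    memcpy(&queue[tail % QUEUE_DEPTH], m, sizeof *m);
    tail++;
    return 0;
}

/* Dequeue copies the message out; the queue slot is immediately reusable. */
int msg_receive(msg_t *out)
{
    if (tail == head)
        return -1;                              /* queue empty */
    memcpy(out, &queue[head % QUEUE_DEPTH], sizeof *out);
    head++;
    return 0;
}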
The second approach, which I use whenever I can in such a situation, is to have the sender allocate a message buffer (usually from my own msg/buffer pools, usually a linked-list of fixed size memory blocks - but that is an implementation detail - see this description of "memory pools" for an illustration of what I'm talking about). Anyway -- after the allocation, the sender fills in the message data, enqueues a pointer to the message, and releases control (ownership) of the memory block (i.e., the message). The receiver is now responsible for freeing/returning the memory after reading the message.
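For illustration, a minimal sketch in C of that ownership-handoff pattern, built on a hypothetical fixed-block pool; the actual mailbox send/receive calls are left as comments because they depend on your RTOS's API:

#include <string.h>

#define POOL_BLOCKS 8

typedef struct msg {
    int         id;
    char        payload[32];
    struct msg *next;           /* free-list link while the block is unused */
} msg_t;

static msg_t  pool[POOL_BLOCKS];
static msg_t *free_list;

/* Call once at startup: chain every block onto the free list. */
void pool_init(void)
{
    for (int i = 0; i < POOL_BLOCKS - 1; i++)
        pool[i].next = &pool[i + 1];
    pool[POOL_BLOCKS - 1].next = NULL;
    free_list = &pool[0];
}

/* In real code, protect alloc/free with the RTOS's critical-section primitive. */
msg_t *msg_alloc(void)
{
    msg_t *m = free_list;
    if (m)
        free_list = m->next;
    return m;
}

void msg_free(msg_t *m)
{
    m->next = free_list;
    free_list = m;
}

/* Sender: allocate a block, fill it, enqueue the pointer, then forget about it.
 * Ownership of the block travels with the pointer. */
void sender_task(void)
{
    msg_t *m = msg_alloc();
    if (!m)
        return;                 /* pool exhausted: drop, retry, or log */
    m->id = 42;
    strcpy(m->payload, "hello");
    /* your_rtos_mailbox_send(mailbox, m);   <- placeholder for the real RTOS call */
}

/* Receiver: 'm' came from the RTOS mailbox receive call. Act on it, then
 * return the block to the pool. */
void receiver_task(msg_t *m)
{
    /* ... use m->id and m->payload ... */
    msg_free(m);
}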
There are other issues that could be raised in this question, for example what if the sender "broadcasts" the msg to more than one receiver? How do the receivers coordinate/communicate so that only the last reader frees the memory (garbage collection)? But hopefully from what you asked, the 2nd solution will work for you.

Related

vkAllocateDescriptorSets returns VK_OUT_OF_HOST_MEMORY

I wrote Vulkan code on my laptop that worked; then I got a new laptop, and now when I run it the program aborts because vkAllocateDescriptorSets() returns VK_OUT_OF_HOST_MEMORY.
I doubt that it is actually out of memory, and I know it can allocate some memory, because vkCreateInstance() doesn't fail like in this Stack Overflow post: Vulkan create instance VK_OUT_OF_HOST_MEMORY.
EDIT: Also, I forgot to mention, vkAllocateDescriptorSets() only returns VK_OUT_OF_HOST_MEMORY the second time I run it.
vkAllocateDescriptorSets allocates descriptors from a pool. So while such allocation could fail due to a lack of host/device memory, there are two other things that could cause failure. There may simply not be enough memory in the pool to allocate the number of descriptors/sets you asked for. Or there could be enough memory, but repeated allocations/deallocations have fragmented the pool such that the allocations cannot be made.
The case of allocating more descriptors/sets than are available should never happen. After all, you know how many descriptors and sets you put into that pool, so you should know exactly when you'll run out. This is an error state that a working application can guarantee it will never encounter. Though the VK_KHR_maintenance1 extension did add an error code for this circumstance: VK_ERROR_OUT_OF_POOL_MEMORY_KHR.
However, if you've screwed up your pool creation in some way, this can still happen. Of course, since there's no error code for it (outside of the extension), the implementation will have to provide a different error code: either host or device memory exhaustion.
But again, this is a programming error on your part, not something you should ever expect to see with working code. In particular, even if you request that extension, do not keep allocating from a pool until it stops giving you memory. That's just bad coding.
For the fragmentation case, they do have an error code: VK_ERROR_FRAGMENTED_POOL. However, the Khronos Group screwed up: the first few releases of Vulkan didn't include this error code; it was added later. Which means that implementations from before the error code was added (and likely some afterwards) had to pick an inappropriate error code to return. Again, either host or device memory.
So you basically have to treat any failure of this function as either fragmentation, a programming error (i.e., you asked for more stuff than you put into the pool), or something else. In all cases, it's not something you can recover from at runtime.
Since it appeared to work once, odds are good that you probably just allocated more stuff than the pool contains. So you should make sure that you add enough stuff to the pool before allocating from it.
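For illustration, a hedged sketch in C of sizing a pool to match exactly what you intend to allocate from it, and checking for the pool-specific errors afterwards (the descriptor types and counts are made up, and the core Vulkan 1.1 spellings of the error enums are used):

#include <vulkan/vulkan.h>

#define MAX_SETS 4

/* 'device' and 'layout' are assumed to have been created elsewhere. */
VkResult create_pool_and_allocate(VkDevice device, VkDescriptorSetLayout layout,
                                  VkDescriptorPool *pool, VkDescriptorSet *set)
{
    /* Size the pool for exactly MAX_SETS sets, each using one uniform
     * buffer and one combined image sampler. */
    VkDescriptorPoolSize poolSizes[2] = {
        { VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER,         MAX_SETS },
        { VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER, MAX_SETS },
    };

    VkDescriptorPoolCreateInfo poolInfo = { VK_STRUCTURE_TYPE_DESCRIPTOR_POOL_CREATE_INFO };
    poolInfo.maxSets       = MAX_SETS;
    poolInfo.poolSizeCount = 2;
    poolInfo.pPoolSizes    = poolSizes;

    VkResult res = vkCreateDescriptorPool(device, &poolInfo, NULL, pool);
    if (res != VK_SUCCESS)
        return res;

    VkDescriptorSetAllocateInfo allocInfo = { VK_STRUCTURE_TYPE_DESCRIPTOR_SET_ALLOCATE_INFO };
    allocInfo.descriptorPool     = *pool;
    allocInfo.descriptorSetCount = 1;
    allocInfo.pSetLayouts        = &layout;

    res = vkAllocateDescriptorSets(device, &allocInfo, set);
    if (res == VK_ERROR_OUT_OF_POOL_MEMORY || res == VK_ERROR_FRAGMENTED_POOL) {
        /* Pool exhausted or fragmented: fix the pool sizing or allocation
         * pattern at design time rather than retrying at runtime. */
    }
    return res;
}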
The problem was that I had not allocated enough memory in the pool. I solved it by creating multiple pools, one for each descriptor set.

Grand Central Dispatch: Queue vs Semaphore for controlling access to a data structure?

I'm doing this with Macruby, but I don't think that should matter much here.
I've got a model which stores its state in a dictionary data structure. I want concurrent operations to be updating this data structure sporadically. It seems to me like GCD offers a few possible solutions to this, including these two:
wrap any code that accesses the data structure in a block sent to some serial queue
use a GCD semaphore, with client code sending wait/signal calls as necessary when accessing the structure
When the queues in the first solution are called synchronously, it seems pretty much equivalent to the semaphore solution. Do either of these solutions have clear advantages that I'm missing? Is there a better alternative I'm missing?
Also: would it be straightforward to implement a read-write (shared-exclusive) lock with GCD?
Serial Queue
Pros
there are no locks
Cons
tasks can't work concurrently in the Serial Queue
GCD Semaphore
Pros
tasks can work concurrently
Cons
it uses a lock, even though it is lightweight
We can also use atomic operations instead of a GCD semaphore; they would be lighter than a GCD semaphore in some situations.
Synchronization Tools - Atomic Operations
Guarding access to the data structure with dispatch_sync on serial queue is semantically equivalent to using a dispatch semaphore, and in the uncontended case, they should both be very fast. If performance is important, benchmark and see if there's any significant difference.
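For illustration, a minimal sketch of both options using the plain C libdispatch API (the queue label and the guarded variable are made up):

#include <dispatch/dispatch.h>
#include <stdio.h>

int main(void)
{
    __block int counter = 0;

    /* Option 1: funnel every access through a serial queue. */
    dispatch_queue_t q = dispatch_queue_create("com.example.model", DISPATCH_QUEUE_SERIAL);
    dispatch_sync(q, ^{
        counter += 1;                          /* critical section */
    });

    /* Option 2: guard the same access with a binary semaphore. */
    dispatch_semaphore_t sem = dispatch_semaphore_create(1);
    dispatch_semaphore_wait(sem, DISPATCH_TIME_FOREVER);
    counter += 1;                              /* critical section */
    dispatch_semaphore_signal(sem);

    printf("counter = %d\n", counter);

    dispatch_release(q);                       /* not needed under ARC */
    dispatch_release(sem);
    return 0;
}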
As for the readers-writer lock, you can indeed construct one on top of GCD—at least, I cobbled something together the other day here that seems to work. (Warning: there be dragons/not-well-tested code.) My solution funnels the read/write requests through an intermediary serial queue before submitting to a global concurrent queue. The serial queue is suspended/resumed at the appropriate times to ensure that write requests execute serially.
I wanted something that would simulate a private concurrent dispatch queue that allowed for synchronisation points—something that's not exposed in the public GCD api, but is strongly hinted at for the future.
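For reference, a minimal sketch of the readers-writer idiom that later became common, using dispatch_barrier_async on a private concurrent queue. Note this is a different technique from the serial-queue funnel described above (barriers were added to GCD after this answer was written), and the names here are illustrative:

#include <dispatch/dispatch.h>

/* Concurrent queue dedicated to guarding the shared data structure. */
static dispatch_queue_t rw_queue;

void rw_init(void)
{
    rw_queue = dispatch_queue_create("com.example.rwlock", DISPATCH_QUEUE_CONCURRENT);
}

/* Readers run concurrently with each other. */
void rw_read(void (^reader)(void))
{
    dispatch_sync(rw_queue, reader);
}

/* A writer waits for in-flight readers to drain, then runs exclusively;
 * later readers wait for it to finish. */
void rw_write(void (^writer)(void))
{
    dispatch_barrier_async(rw_queue, writer);
}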
Adding a warning (which ends up being a con for dispatch queues) to the previous answers.
You need to be careful of how the dispatch queues are called as there are some hidden scenarios that were not immediately obvious to me until I ran into them.
I replaced NSLock and @synchronized on a number of critical sections with dispatch queues, with the goal of having lightweight synchronization. Unfortunately, I ran into a situation that results in a deadlock, and I have traced it back to the dispatch_barrier_async / dispatch_sync pattern. It would seem that dispatch_sync may opportunistically call its block on the main queue (if already executing there) even when you create a concurrent queue. This is a problem since dispatch_sync on the current dispatch queue causes a deadlock.
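For reference, a minimal sketch of the classic deadlock being described, i.e. re-entering the queue you are already executing on with dispatch_sync (queue label is illustrative):

#include <dispatch/dispatch.h>

int main(void)
{
    dispatch_queue_t q = dispatch_queue_create("com.example.serial", DISPATCH_QUEUE_SERIAL);

    dispatch_sync(q, ^{
        /* The outer block is occupying the serial queue, so this inner
         * dispatch_sync can never start its block: deadlock. */
        dispatch_sync(q, ^{ /* never runs */ });
    });

    return 0;    /* never reached */
}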
I guess I'll be moving backwards and using another locking technique in these areas.

Live object is garbage collected?

I am using Garbage collector in my Cocoa based application on Mac OS X. It has 100s of threads running and synchronization is done using Operation Queue.
After a long run, one of the objects gets garbage collected and the application crashes.
Checking whether the object is non-nil also fails, as the object is invalid and contains some garbage value. Calling a method on the object leads to a crash.
Can anyone please help me debug the issue?
Thank you.
I am using Garbage collector in my Cocoa based application on Mac OS X. It has 100s of threads running and synchronization is done using Operation Queue.
More likely than not, the bug lies within the seemingly rather overly concurrent nature of your code. Running 100s of threads on a machine with "only" double digits worth of cores (if that) is unlikely to be very efficient and, of course, keeping everything properly in sync is going to be rather difficult.
The best place to start is to turn on Malloc stack logging and use malloc_history to find out what events occurred at the address that went south.
Also, there were fixes in 10.6.5 that impacted GC correctness.
If you can change the code of the garbage collected object, then override the finalize method like this, to get some information:
- (void) finalize
{
    NSLog(@"Finalizing!\n%@", [[NSThread callStackSymbols] componentsJoinedByString:@"\n"]);
    // if you put a breakpoint here, you can check the supposed references to this object
    [super finalize];
}

Can the try...catch mechanism be used to avoid memory crashes?

I am really interested to know: is it possible, using the try...catch mechanism, to avoid memory crashes in our application?
Let's say the part of the program where we expect a chance of a memory leak is kept inside a try...catch block; if the program crash (i.e., the memory leak) occurs, then the catch statement executes, so we can prevent our program from crashing.
Is it possible? If yes, how? If not, why not?
A try/catch block is there to catch an exception and stop it from propagating upwards in your callstack.
The idea goes that you catch it at the place where you know how to handle it, and then you get a chance to execute code in response to the exception.
It is not a magical solution that will prevent anything, it is just what I said above. What you do with the exception is what matters.
And a memory leak and a crash are not the same thing; it is rare that a program crashes due to memory leaks, unless it actually runs out of memory. A memory leak is rarely "fixable" after the fact. The correct, and usually only, way to fix a memory leak is to avoid it happening in the first place.
Also, yes, in a way you can keep your program from crashing by adding try/catch blocks all over, but the only thing you've succeeded in is to hide the crash from the user, and then let the program continue running. "Crashes" are not always safe to ignore, or rather, they are usually not safe to ignore.
If you are looking for some catch-all advice on how to avoid a program crashing, here's my advice:
Write a program that works correctly
I think we need to know more about what kinds of problems you're having, or you need to post a clearer question.
I would not trust any in-process system to do the right thing in case of an out-of-memory condition. We have systems which completely lock up when a PermGen exception occurs and need a kill -9 to get rid of them.
If you want the system to self-correct, wrap it in a script or a system which monitors its health via a heartbeat, a diagnostics page, or whatever makes sense. If you receive no health indication, kill it (hard if necessary) and restart it.
Best of all is to use testing and validation to include monitoring the memory (and disk) usage and make sure you really know how your system behaves and is properly configured so it does not happen.
The restarting solution is a poor alternative, because you then must test and ascertain that you can kill your application at any time and be confident it can be restarted correctly, which might even be more difficult.
If you are asking "can I catch segmentation faults with try catch", the answer is no.
try catch is for handling Objective-C exceptions, i.e. those raised by executing an @throw statement. Segmentation faults caused by e.g. null pointer dereferences are generated by the operating system as Unix signals, and can only be caught and handled by OS-level system calls, e.g. the sigaction(2) system call. Even then, the only sane thing you can do is terminate the program immediately, because you might have a corrupt heap or stack.
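For illustration, a minimal sketch in C of installing a SIGSEGV handler with sigaction(2); as noted above, about all the handler can safely do is report and terminate:

#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static void segv_handler(int sig, siginfo_t *info, void *context)
{
    (void)sig; (void)info; (void)context;
    /* Only async-signal-safe calls are allowed here: write() is safe, printf() is not. */
    static const char msg[] = "Caught SIGSEGV, terminating\n";
    (void)write(STDERR_FILENO, msg, sizeof msg - 1);
    _exit(EXIT_FAILURE);      /* terminate immediately; the heap/stack may be corrupt */
}

int main(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = segv_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);

    volatile int *p = NULL;
    *p = 42;                  /* null pointer dereference raises SIGSEGV */
    return 0;
}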

Does the .dispose() method do anything at all?

I was experimenting with ways to get rid of some memory leaks within my application the other day when I realized that I know virtually nothing about cleaning up my resources. I did some research, and hoped that just calling the .dispose() would solve all of my problems. We have a table in our database that contains about 65,000 records. Obviously when I fill my dataset from the data adapter, the memory usage can get pretty high. When I called the dispose method on the dataset, I was surprised to find out that NONE of the memory got released. Why did this happen? Clearing the dataset doesn't help either.
IDisposable, and thus Dispose, is not used to reduce memory pressure (although in some cases it might); instead, it is used for deterministic cleanup.
Consider this, you construct an object that maintains an active and open connection to your database server. This connection uses resources, both on your machine, and the server.
You could of course just leave the object be when you're done with it, and eventually it'll get picked up by the garbage collector, but suppose you want to make sure at least the resources gets freed, and thus the connection closed, when you're done with it. This is where IDisposable.Dispose comes into play.
It is used to clean up resources managed by the object.
It will, however, not free the managed memory allocated to the object. This is still left to the garbage collector, that will kick in at some later time to do that.
Do you actually have a memory problem, or do you just look at the memory usage in Task Manager or similar and go "that's a bit high"?
If the latter, then you should just leave it be for now. .NET will run garbage collection more often if you have less memory available, so unless you're in a situation where you get, or might suspect you will get soon, a memory overflow condition, you're probably not going to have any problems.
Let me explain what I mean by "run less often".
If you have 8GB of memory in your machine, and only have Windows and Notepad running, most of that memory will be available. When you now run your program, even if it loads minor data blocks into memory, you can keep doing that for a long time, and memory usage will steadily grow. Exactly when the GC will kick in and try to reduce your memory footprint I don't know, but I can almost guarantee you that you will wonder why it gets so high.
Let's just for the sake of the argument say that your program will eventually use 2GB of memory.
Now, if you run your program on a machine that has less memory available, GC will occur more often, and will kick in on a lower limit, which might keep the memory usage below 500MB or possibly even less.
The important part to note here is that in order to get an accurate picture of how much memory your application actually requires, you can't rely on Task Manager or similar ways to measure it; you need something more targeted.
Calling Dispose() will only release unmanaged resources, such as file handles, database connections, unmanaged memory, etc. It will not release garbage collected memory.
Garbage collected memory will only get released at the next collection, usually when the application domain's memory is deemed full.
I'm going to point out something here that hasn't been explicitly mentioned: calling Dispose() will only clean up (free) unmanaged resources if the developer of the component has coded it.
What I mean is this: if you suspect you have a memory leak, calling Dispose() is not going to fix it if the original developer has done a lousy job and not correctly freed up unmanaged resources. For a bit more info, check this blog post, and take note of the statement "The behaviour of Dispose is defined by the developer."
Some objects will ask one or more other entities to do something on its behalf until further notice, to the detriment of other entities. If an object which did so were to disappear without informing the former entities that their services were no longer needed, those entities would continue to uselessly act on behalf of an object that no longer needed them, to the continuing detriment of other entities that would want to use them.
In many cases, for an object "George" to tell an outside entity "Joe" that its services were no longer needed, George would have to know that its services were no longer needed. There are two normal means by which that can happen in .NET: finalization and IDisposable.
If an object overrides a method called Finalize, then when the object is created the .NET garbage collector will add it to a list of objects with registered finalizers. If the GC discovers that there exists no rooted reference to the object other than that list, the GC will remove the object from that list and add it to a strongly-rooted queue of objects which should have their Finalize method called as soon as possible. Such an object can then use its Finalize method to inform other entities that their services are no longer required.
Although finalization-based cleanup can sometimes work, there's no guarantee of timeliness. At one point during the design of .NET, Microsoft may have intended that finalization would be the primary cleanup method, but for a variety of reasons it cannot safely be relied upon.
The other cleanup approach, which should be the focus of one's efforts, is IDisposable. Basically, the idea behind IDisposable is simple: for every object that implements IDisposable, there should be one entity (generally either an object or a nested execution scope) which is responsible for ensuring that that object's IDisposable.Dispose method will get called sometime within the lifetime of the universe (which would imply sometime while a reference to the object still exists), and preferably as soon as code can tell that the object's services will no longer be required.
Note that IDisposable.Dispose generally promises that any outside entities which had been asked to do something on an object's behalf will be told that they no longer need to do so, but such a promise does not imply that the number of entities is non-zero. If an object hasn't asked any outside entities to do anything on its behalf, then delivering a message to "all" such entities doesn't require doing anything at all. On the other hand, the fact that a Dispose method may do nothing in some cases doesn't mean that it's guaranteed never to do anything in any case, nor that failure to call it in those cases where it would do something won't have detrimental effects.