I have an app that needs to send collected data every X milliseconds (and NOT sooner!). My first thought was to stack up the data in an NSMutableArray (array1) on thread1. When thread2 has finished waiting its X milliseconds, it will swap out the NSMutableArray with a fresh one (array2) and then process its contents. However, I don't want thread1 to further modify array1 once thread2 has it.
This will probably work, but thread safety is not a field where you want to "just try it out." What are the pitfalls to this approach, and what should I do instead?
(Also, if thread2 is actually an NSTimer instance, how does the problem/answer change? Would it all happen on one thread [which would be fine for me, since the processing takes a tiny fraction of a millisecond]?).
You should use either NSOperationQueue or Grand Central Dispatch. Basically, you'll create an operation that receives your data and uploads it once X milliseconds have passed. Each operation will be independent, and you can configure the queue with respect to how many concurrent operations you allow, operation priority, and so on.
The Apple docs on concurrency should help:
http://developer.apple.com/library/ios/#documentation/General/Conceptual/ConcurrencyProgrammingGuide/Introduction/Introduction.html
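As a minimal sketch of the GCD variant (the queue label and the uploadBatch: method are assumptions, not from the question), dispatch_after is convenient here because it guarantees the block runs no sooner than the given time:

- (void)scheduleUploadOfBatch:(NSArray *)batch afterMilliseconds:(int64_t)ms {
    static dispatch_queue_t uploadQueue;
    static dispatch_once_t once;
    dispatch_once(&once, ^{
        uploadQueue = dispatch_queue_create("com.example.upload", DISPATCH_QUEUE_SERIAL);
    });
    // dispatch_after never fires early, which matches the "NOT sooner" requirement.
    dispatch_time_t when = dispatch_time(DISPATCH_TIME_NOW, ms * NSEC_PER_MSEC);
    dispatch_after(when, uploadQueue, ^{
        [self uploadBatch:batch];
    });
}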
The pitfalls of this approach have to do with when you "swap out" the NSArray for a fresh one. Imagine that thread1 gets a reference to the array, and at the same time thread2 swaps the arrays and finishes processing. Thread1 is now writing to a dead array (one that will no longer be processed), even if it's just for a few milliseconds. The way to prevent this, of course, is by using synchronized code blocks (i.e., making your code "thread-safe") in the critical sections, but it's kind of hard not to overshoot the mark and synchronize too much of your code (sacrificing performance).
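If you do keep the array-swapping design, a minimal sketch of guarding the swap could look like this (the pendingSamples property and the processSamples: method are placeholder names):

// thread1 (producer): append under the same lock used for the swap
@synchronized (self) {
    [self.pendingSamples addObject:sample];
}

// thread2 (every X milliseconds): swap under the lock, process outside it
NSMutableArray *toProcess;
@synchronized (self) {
    toProcess = self.pendingSamples;
    self.pendingSamples = [NSMutableArray array];
}
[self processSamples:toProcess];

Because the append and the swap share one lock, thread1 can never write into an array that thread2 has already taken for processing.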
So the risks are you'll:
Make code that is not thread-safe
Make code that overuses synchronize and is slow (and threads already have a performance overhead)
Make some combination of these two: slow, unsafe code.
The idea is to "migrate away from threads", which is what the linked guide is about.
Related
If you have a semaphore that is being used to restrict access to a shared resource or limit the number of concurrent actions, what is the locking algorithm to be able to change the maximum value of that semaphore once it's in use?
Example 1
In NSOperationQueue, there is a property named maxConcurrentOperationCount. This value can be changed after the queue has been created. The documentation notes that changing this value doesn't affect any operations already running, but it does affect pending jobs, which presumably are waiting on a lock or semaphore to execute.
Since that semaphore is potentially being held by pending operations, you can't just replace it with one that has a new count. So another lock must be needed somewhere in the change, but where?
Example 2
In most of Apple's Metal sample code, they use a semaphore with an initial count of 3 to manage in-flight buffers. I'd like to experiment changing that number while my application is running, just to see how big of a difference it makes. I could tear down the entire class that uses that semaphore and then rebuild the Metal pipeline, but that's a bit heavy handed. Like above, I'm curious how I can structure a sequence of locks or semaphores to allow me to swap out that semaphore for a different one while everything is running.
My experience is with Grand Central Dispatch, but I'm equally interested in a C++ implementation that might use those locking or atomic constructs.
I should add that I'm aware I can technically just make unbalanced calls to signal and wait, but that doesn't seem right to me. Namely, whatever code is making these changes needs to be able to block itself if wait takes a while to reduce the count...
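For reference, the "unbalanced calls" approach mentioned above would look roughly like this with a dispatch semaphore (the _limitSemaphore ivar and delta parameter are hypothetical); the blocking behaviour when lowering the count is exactly the part that feels wrong:

- (void)adjustLimitBy:(NSInteger)delta {
    if (delta > 0) {
        for (NSInteger i = 0; i < delta; i++)
            dispatch_semaphore_signal(_limitSemaphore);   // add slots
    } else {
        for (NSInteger i = 0; i < -delta; i++)
            // Removing a slot blocks here until an in-flight user releases one.
            dispatch_semaphore_wait(_limitSemaphore, DISPATCH_TIME_FOREVER);
    }
}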
I am using sleep() in two ways in my current embedded (real time) software design:
To throttle a processing loop, but this is discussed here, and as pointed out thread priority will most likely answer very well for that.
Waiting for hardware to "settle". Let's say I am writing an interface to some hardware. Communication with the hardware is all good, but I want to change its mode and I know it only takes a small number of instruction cycles to do it. I am using sleep(1); to pause briefly to allow for this. I could set up a loop that keeps pinging it until I receive a valid response, but this would arguably be harder to read (much more code) and, in fact, slower because of data transfer times. In fact, I could probably do a usleep(100) or less in my case.
So my question is, is this a good practice? And if not, is there a better/efficient alternative?
Callback
The most ideal solution to this would be to have the hardware notify you when a particular operation is complete through some form of callback/signal.
When writing production code, I would almost always favor this solution above all others. Of course, this is provided that the API you are using exposes such methods.
Poll
If there is no way for you to receive such events, then the only other option is to check whether the operation has completed. The most naive solution is one that constantly checks in a tight loop (busy-waiting).
However, if you know roughly how long an operation should take you could always sleep for that duration, wake-up, check operation status then sleep again or continue.
If you are 100% sure about the timings and can guarantee that your thread is not woken up early then you can rely solely on sleep.
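As a rough sketch of the sleep-then-poll variant (readStatusRegister() and STATUS_READY stand in for whatever your hardware interface actually exposes):

#include <unistd.h>

int waitForSettle(void) {
    for (int attempts = 0; attempts < 50; attempts++) {
        if (readStatusRegister() == STATUS_READY)
            return 0;          /* hardware has settled */
        usleep(100);           /* back off instead of spinning */
    }
    return -1;                 /* give up after roughly 5 ms */
}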
Poor Design?
I wouldn't necessarily say that using sleep for this task is poor design. Sometimes you have no other choice. I would say that relying solely on sleep is poor design when you cannot guarantee timing, because you cannot be 100% sure that the operation you are waiting for has in fact completed.
On Linux I use sigsuspend; it suspends the process until it receives a signal.
Example
My main thread needs some data, but the data isn't ready, so the main thread is suspended.
Another thread reads the data and, when it finishes, fires a signal.
Now the main thread continues, and the data is ready.
If you use sleep, the data may or may not be ready when you wake up.
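A minimal sketch of that pattern in C (SIGUSR1, the producer thread, and the "data" are all placeholders for whatever your program actually does):

#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static volatile sig_atomic_t data_ready = 0;
static int shared_value;

static void on_ready(int sig) { data_ready = 1; }

static void *producer(void *arg) {
    pthread_t main_thread = *(pthread_t *)arg;
    sleep(1);                            /* pretend to read the data */
    shared_value = 42;
    pthread_kill(main_thread, SIGUSR1);  /* tell the main thread it is ready */
    return NULL;
}

int main(void) {
    pthread_t self = pthread_self(), worker;
    sigset_t block, old;

    signal(SIGUSR1, on_ready);

    /* Block SIGUSR1 so it is only delivered while we sit in sigsuspend(). */
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    pthread_sigmask(SIG_BLOCK, &block, &old);

    pthread_create(&worker, NULL, producer, &self);

    /* Suspend until the signal arrives; `old` still has SIGUSR1 unblocked. */
    while (!data_ready)
        sigsuspend(&old);

    printf("data = %d\n", shared_value);
    pthread_join(worker, NULL);
    return 0;
}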
So I restructured a central part in my Cocoa application (I really had to!) and I am running into issues since then.
Quick outline: my application controls the playback of QuickTime movies so that they are in sync with external timecode.
Thus, external timecode arrives on a CoreMIDI callback thread and gets posted to the application about 25 times per sec. The sync is then checked and adjusted if it needs to be.
All this checking and adjusting is done on the main thread.
Even if I put all the processing on a background thread, it would be a ton of work, as I'm currently using a lot of GCD blocks and would need to rewrite a lot of functions so that they can be called from an NSThread. So I would like to make sure first that it would actually solve my problem.
The problem
My Core MIDI callback is always called in time, but the GCD block that is dispatched to the main queue is sometimes blocked for up to 500 ms. Understandably, adjusting the sync does not quite work when that happens. I couldn't find a reason for it, so I'm guessing that I'm doing something that blocks the main thread.
I'm familiar with Instruments, but I couldn't find the right mode to see what keeps my messages from being processed in time.
I don't know what I can do about it and would appreciate any help.
Thanks in advance!
Watchdog
You can use Watchdog, which reports when the main thread is blocked for longer than a defined threshold.
https://github.com/wojteklu/Watchdog
You can install it using CocoaPods:
pod 'Watchdog'
You may be blocking the main thread or you might be flooding it with events.
I would suggest three things:
Grab a timestamp when the timecode arrives on the CoreMIDI callback thread (see mach_absolute_time()). Then grab the current time when your main thread work is being done. You can then adjust accordingly, based on how much time has elapsed between posting to the main thread and the event actually being processed.
create some kind of coalescing mechanism such that when your main thread is blocked, interim timecode events (that are now out of date) are tossed. This can be as simple as a global NSUInteger that is incremented every time an event is received. The block dispatched to the main queue captures the current value on creation, then checks it when it is processed. If it differs by more than N (N is for you to determine), then toss the event because more are in flight (see the sketch after this list).
consider not sending an event to the main thread for every timecode notification. 25 adjustments per second is a lot of work. If processing only 5 per second yields a "good enough" perceptual experience, then that is an awful lot of work saved.
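Here is a rough sketch of the coalescing counter from point 2 (the timecodeArrived() function, the syncController object, and the threshold of 3 are all hypothetical placeholders):

#import <Foundation/Foundation.h>
#import <stdatomic.h>

static atomic_uint_fast64_t eventGeneration;
static id syncController;   // placeholder for whatever object adjusts the sync

// Called ~25 times per second from the CoreMIDI callback thread.
void timecodeArrived(double timecodeSeconds) {
    uint64_t generation = atomic_fetch_add(&eventGeneration, 1) + 1;
    dispatch_async(dispatch_get_main_queue(), ^{
        // If several newer timecodes arrived while this block waited in the
        // main queue, this one is stale: drop it and let a fresher one adjust.
        if (atomic_load(&eventGeneration) - generation > 3) {
            return;
        }
        [syncController adjustSyncToTimecode:timecodeSeconds];
    });
}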
In general, instrumenting the main event loop is a bit tricky. The CPU profiler in Instruments can be quite helpful. It may come as a surprise, but so can the Allocations instrument. In particular, you can use the Allocations instrument to measure memory throughput. If there are tons of transient (short lived) allocations, it'll chew up a ton of CPU time doing all those allocations/deallocations.
Is it safe? For instance, if I create a bunch of different GCD queues that each compress (tar cvzf) some files, am I doing something wrong? Will the hard drive be destroyed?
Or does the system properly take care of such things?
Dietrich's answer is correct save for one detail (that is completely non-obvious).
If you were to spin off, say, 100 asynchronous tar executions via GCD, you'd quickly find that you have 100 threads running in your application (which would also be dead slow due to gross abuse of the I/O subsystem).
In a fully asynchronous concurrent system with queues, there is no way to know if a particular unit of work is blocked because it is waiting for a system resource or waiting for some other enqueued unit of work. Therefore, anytime anything blocks, you pretty much have to spin up another thread and consume another unit of work or risk locking up the application.
In such a case, the "obvious" solution is to wait a bit when a unit of work blocks before spinning up another thread to de-queue and process another unit of work with the hope that the first unit of work "unblocks" and continues processing.
Doing so, though, would mean that any asynchronous concurrent system with interaction between units of work -- a common case -- would be so slow as to be useless.
Far more effective is to limit the number of units of work that are enqueued in the global asynchronous queues at any one time. A GCD semaphore makes this quite easy: create the semaphore with the maximum number of in-flight units you want (say, 4), funnel all units of work through a single serial queue, wait on the semaphore before submitting each unit to the concurrent queue, and signal it when a unit completes. That way no more than the maximum number of units are ever in flight at once.
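A sketch of that throttle with dispatch_semaphore (the limit of 4, the queue label, the directoriesToCompress collection, and compressDirectory() are placeholders):

dispatch_semaphore_t slots = dispatch_semaphore_create(4);
dispatch_queue_t feeder = dispatch_queue_create("com.example.tar-feeder", DISPATCH_QUEUE_SERIAL);

for (NSString *path in directoriesToCompress) {
    dispatch_async(feeder, ^{
        // The serial feeder blocks here until one of the 4 slots frees up,
        // so at most 4 units of work are ever in flight on the global queue.
        dispatch_semaphore_wait(slots, DISPATCH_TIME_FOREVER);
        dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
            compressDirectory(path);           // the actual unit of work
            dispatch_semaphore_signal(slots);  // release the slot when done
        });
    });
}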
If you take something that is normally IO limited, such as tar, and run a bunch of copies in GCD,
It will run more slowly because you are throwing more CPU at an IO-bound task, meaning the IO will be more scattered and there will be more of it at the same time,
No more than N tasks will run at a time, which is the point of GCD, so "a billion queue entries" and "ten queue entries" give you the same thing if you have fewer than 10 threads,
Your hard drive will be fine.
Even though this question was asked back in May, it's still worth noting that GCD has now provided I/O primitives with the release of 10.7 (OS X Lion). See the man pages for dispatch_read and dispatch_io_create for examples on how to do efficient I/O with the new APIs. They are smart enough to properly schedule I/O against a single disk (or multiple disks) with knowledge of how much concurrency is, or is not, possible in the actual I/O requests.
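As a small example of the convenience API (the file path is just a placeholder), dispatch_read hands the whole read off to GCD and calls you back on the queue you specify:

#include <fcntl.h>

int fd = open("/tmp/example.dat", O_RDONLY);
dispatch_read(fd, SIZE_MAX, dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0),
              ^(dispatch_data_t data, int error) {
    if (error == 0) {
        NSLog(@"read %zu bytes", dispatch_data_get_size(data));
    }
    close(fd);
});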
I have 3 threads (in addition to the main thread). The threads read, process, and write. They each do this to a number of buffers, which are cycled through and reused. The reason it's set up this way is so the program can continue to do the other tasks while one of them is running. So, for example, while the program is writing to disk, it can simultaneously be reading more data.
The problem is I need to synchronize all this so the processing thread doesn't try to process buffers that haven't been filled with new data. Otherwise, there is a chance that the processing step could process leftover data in one of the buffers.
The read thread reads data into a buffer, then marks the buffer as "new data" in an array. So, it works like this:
//set up in main thread
NSConditionLock *readlock = [[NSConditionLock alloc] initWithCondition:0];
//set up lock in thread
[readlock lockWhenCondition:buffer_new[current_buf]];
//copy data to buffer
memcpy(buffer[current_buf],source_data,data_length);
//mark buffer as new (this is reset to 0 once the data is processed)
buffer_new[current_buf] = 1;
//unlock
[readlock unlockWithCondition:0];
I use buffer_new[current_buf] as the condition value for the NSConditionLock. If the buffer isn't marked as new, then the thread in question will block, waiting for the previous thread to write new data. That part seems to work okay.
The main problem is I need to sync this in both directions. If the read thread happens to take too long for some reason and the processing thread has already finished with processing all the buffers, the processing thread needs to wait and vice-versa.
I'm not sure NSConditionLock is the appropriate way to do this.
I'd turn this on its head. As you say, threading is hard and multi-way synchronization of threads is even harder. Queue based concurrency is often much more natural.
Define three queues: a read queue, a processing queue, and a write queue. Then employ a rule stating that no buffer shall be enqueued on more than one queue at a time.
That is, a buffer may be enqueued onto the read queue and, once done reading, enqueued into the processing queue, and once done processing, enqueued into the write queue.
You could use a stack of buffers if you want but, typically, the cost of allocation is pretty cheap compared to the cost of processing and, thus, enqueue-for-read could also do the allocation while dequeue-once-written could do the free.
This would be pretty straightforward to code with GCD. Note that if you really want parallelism, your various queues would really just be throttles, using semaphores -- potentially shared -- to enqueue the work to the global concurrent queues.
Note also that this design has a distinct advantage over what you are currently using in that it uses no locks. The only locks are hidden below the GCD APIs as a part of queue management, but that is effectively invisible to your code.
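A rough sketch of that pipeline (the Buffer type and the read/process/write/allocate functions are placeholders); each buffer is owned by exactly one queue at any moment, so no explicit locks are needed:

static dispatch_queue_t readQueue, processQueue, writeQueue;

static void setupPipeline(void) {
    readQueue    = dispatch_queue_create("com.example.read",    DISPATCH_QUEUE_SERIAL);
    processQueue = dispatch_queue_create("com.example.process", DISPATCH_QUEUE_SERIAL);
    writeQueue   = dispatch_queue_create("com.example.write",   DISPATCH_QUEUE_SERIAL);
}

static void scheduleRead(void) {
    dispatch_async(readQueue, ^{
        Buffer *buffer = allocateBuffer();     // enqueue-for-read does the allocation
        readIntoBuffer(buffer);
        dispatch_async(processQueue, ^{
            processBuffer(buffer);
            dispatch_async(writeQueue, ^{
                writeBufferToDisk(buffer);
                freeBuffer(buffer);            // dequeue-once-written does the free
            });
        });
        scheduleRead();                        // immediately queue the next read
    });
}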
Have you seen the Apple Concurrency Programming Guide?
It recommends several preferred approaches for moving away from a threads-and-locks concurrency model. Using operation queues, for example, can not only reduce and simplify your code, but also speed up your development and give you better performance.
Sometimes you do need to use threads, and you already have the right idea. But you will need to keep adding locks, and with each one the code gets more complicated, until you can't understand your own code. Then you start adding locks in random places. Then you're screwed.
Read the concurrency guide, then follow bbum's advice.