How to determine what's blocking the main thread - objective-c

So I restructured a central part of my Cocoa application (I really had to!) and have been running into issues ever since.
Quick outline: my application controls the playback of QuickTime movies so that they are in sync with external timecode.
Thus, external timecode arrives on a CoreMIDI callback thread and gets posted to the application about 25 times per sec. The sync is then checked and adjusted if it needs to be.
All this checking and adjusting is done on the main thread.
Even if I put all the processing on a background thread, it would be a ton of work: I'm currently using a lot of GCD blocks, and I would need to rewrite many functions so they can be called from an NSThread. So first I would like to make sure that this would actually solve my problem.
The problem
My Core MIDI callback is always called in time, but the GCD block that is dispatched to the main queue is sometimes blocked for up to 500 ms. Understandably, adjusting the sync does not quite work when that happens. I couldn't find a reason for it, so I'm guessing that I'm doing something that blocks the main thread.
I'm familiar with Instruments, but I couldn't find the right mode to see what keeps my messages from being processed in time.
I would appreciate if anyone could help.
Don't know what I can do about it.
Thanks in advance!

Watchdog
You can use Watchdog, which reports whenever the main thread is blocked for longer than a threshold you choose:
https://github.com/wojteklu/Watchdog
You can install it using CocoaPods:
pod 'Watchdog'

You may be blocking the main thread or you might be flooding it with events.
I would suggest three things:
Grab a timestamp when the timecode arrives on the CoreMIDI callback thread (see mach_absolute_time()). Then grab the current time when your main-thread work is actually done. You can then adjust accordingly based on how much time elapsed between posting to the main thread and it actually being processed.
Create some kind of coalescing mechanism such that when your main thread is blocked, interim timecode events (which are now out of date) are tossed. This can be as simple as a global NSUInteger that is incremented every time an event is received. The block dispatched to the main queue captures the current value on creation and checks it again when processed; if it differs by more than N (N for you to determine), toss the event because more are in flight. Both of these suggestions are combined in the sketch after this list.
Consider not sending an event to the main thread for every timecode notification. Twenty-five adjustments per second is a lot of work; if processing only 5 per second yields a "good enough" perceptual experience, then that is an awful lot of work saved.
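A minimal sketch combining the first two suggestions, assuming a hypothetical handleTimecodeEvent() entry point called from the CoreMIDI thread; the counter, N, and the comments are illustrative, and in production the shared counter should be atomic:

#import <Foundation/Foundation.h>
#include <mach/mach_time.h>

static NSUInteger eventGeneration = 0; // bumped for every incoming event (use atomics in production)

// Hypothetical entry point, called on the CoreMIDI callback thread.
static void handleTimecodeEvent(void) {
    uint64_t postedAt = mach_absolute_time();   // timestamp at posting time
    NSUInteger generation = ++eventGeneration;  // capture this event's generation

    dispatch_async(dispatch_get_main_queue(), ^{
        // Coalesce: if more than N newer events arrived in the meantime,
        // this one is stale, so toss it.
        const NSUInteger N = 3; // tune N for your latency tolerance
        if (eventGeneration - generation > N) {
            return;
        }

        // Measure how long the block sat in the main queue.
        mach_timebase_info_data_t timebase;
        mach_timebase_info(&timebase);
        uint64_t elapsedNs = (mach_absolute_time() - postedAt) * timebase.numer / timebase.denom;

        // ... check and adjust the sync, compensating by elapsedNs ...
    });
}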
In general, instrumenting the main event loop is a bit tricky. The CPU profiler in Instruments can be quite helpful. It may come as a surprise, but so can the Allocations instrument. In particular, you can use the Allocations instrument to measure memory throughput. If there are tons of transient (short lived) allocations, it'll chew up a ton of CPU time doing all those allocations/deallocations.

Related

It is said that using sleep() is poor design, but what about waiting for hardware to settle?

I am using sleep() in two ways in my current embedded (real time) software design:
To throttle a processing loop, but this is discussed here, and as pointed out, thread priority will most likely serve that purpose well.
Waiting for hardware to "settle". Let's say I am writing an interface to some hardware. Communication with the hardware is all good, but I want to change its mode, and I know it only takes a small number of instruction cycles to do so. I am using sleep(1) to pause briefly to allow for this. I could set up a loop that keeps pinging the hardware until I receive a valid response, but this would arguably be harder to read (much more code) and, in fact, slower because of data transfer times. In fact, I could probably do a usleep(100) or less in my case.
So my question is, is this a good practice? And if not, is there a better/efficient alternative?
Callback
The most ideal solution to this would be to have the hardware notify you when a particular operation is complete through some form of callback/signal.
When writing production code, I would almost always favor this solution above all others, provided, of course, that the API you are using exposes such methods.
Poll
If there is no way for you to receive such events, then the only other option is to check whether the operation has completed. The most naive solution is one that checks constantly (busy-waiting in a spin loop).
However, if you know roughly how long the operation should take, you can sleep for that duration, wake up, check the operation status, then sleep again or continue.
If you are 100% sure about the timings and can guarantee that your thread is not woken up early, then you can rely solely on sleep.
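A hedged sketch of that sleep-then-check loop; hw_mode_ready() and the timing constants are hypothetical stand-ins for your hardware interface:

#include <unistd.h>
#include <stdbool.h>

// Hypothetical status check exposed by the hardware interface.
extern bool hw_mode_ready(void);

// Returns true once the hardware reports the mode change, false on timeout.
static bool wait_for_mode_change(void) {
    // Expected settle time is ~100 us, so poll on that granularity
    // rather than spinning, or sleeping a full second.
    for (int attempt = 0; attempt < 50; attempt++) {
        if (hw_mode_ready()) {
            return true;
        }
        usleep(100); // yield the CPU for roughly one settle time
    }
    return false; // hardware never settled; treat as an error
}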
Poor Design?
I wouldn't necessarily say that using sleep for this task is poor design; sometimes you have no other choice. I would say that relying solely on sleep is poor design when you cannot guarantee timing, because you cannot be 100% sure that the operation you are waiting for has in fact completed.
On Linux I use sigsuspend; it suspends the process until it receives a signal.
Example
My main thread needs some data, but the data isn't ready, so the main thread is suspended.
Another thread reads the data and, when it finishes, fires a signal.
The main thread then continues, with the data ready.
If you used sleep instead, the data might or might not be ready when you wake.
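A minimal sketch of that pattern on Linux, assuming SIGUSR1 as the "data ready" signal:

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static volatile sig_atomic_t data_ready = 0;

static void on_usr1(int signum) {
    (void)signum;
    data_ready = 1; // the reader signals that the data is ready
}

int main(void) {
    // Install the handler for the (assumed) "data ready" signal.
    struct sigaction sa;
    sa.sa_handler = on_usr1;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);

    // Block SIGUSR1 so it can only be delivered inside sigsuspend();
    // this closes the race where the signal arrives just before we wait.
    sigset_t block, old;
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigprocmask(SIG_BLOCK, &block, &old);

    // ... start the reader thread/process here; it calls
    // kill(getpid(), SIGUSR1) once the data is ready ...

    while (!data_ready) {
        sigsuspend(&old); // atomically unblock SIGUSR1 and wait
    }
    printf("data is ready\n");
    return 0;
}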

operating system - context switches

I have been confused about the issue of context switches between processes, given a round-robin scheduler with a certain time slice (which is roughly what Unix and Windows both use, in a basic sense).
So, suppose we have 200 processes running on a single core machine. If the scheduler is using even 1ms time slice, each process would get its share every 200ms, which is probably not the case (imagine a Java high-frequency app, I would not assume it gets scheduled every 200ms to serve requests). Having said that, what am I missing in the picture?
Furthermore, Java and other languages allow putting the running thread to sleep for e.g. 100ms. Am I correct in saying that this does not cause a context switch, and if so, how is this achieved?
So, suppose we have 200 processes running on a single core machine. If the scheduler is using even 1ms time slice, each process would get its share every 200ms, which is probably not the case (imagine a Java high-frequency app, I would not assume it gets scheduled every 200ms to serve requests). Having said that, what am I missing in the picture?
No, you aren't missing anything; it's similar in non-pre-emptive systems. A process with pre-emption rights (i.e., higher priority than the others) can easily displace a less important one; a high-priority process might run, say, ten times as often as the lowest-priority process (actual results depend entirely on the situation and implementation), right up to the point where it would starve that lowest-priority process.
As for processes of similar priority, it comes down to the round-robin algorithm you mentioned, though which process is picked first is again implementation-dependent. Windows and classic Unix schedulers both use round-robin variants with priorities; the modern Linux task scheduler, however, is the Completely Fair Scheduler (CFS), which does not use fixed time slices.
Furthermore, Java and other languages allow putting the running thread to sleep for e.g. 100ms. Am I correct in saying that this does not cause a context switch, and if so, how is this achieved?
Programming languages and libraries implement "sleep" functionality with the aid of the kernel. Without kernel-level support, they'd have to busy-wait, spinning in a tight loop, until the requested sleep duration elapsed. This would wastefully consume the processor.
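To make the contrast concrete, here is a small C sketch of the wasteful busy-wait next to a kernel-backed sleep:

#include <time.h>

// Busy-wait: burns a full core for 100 ms doing nothing useful.
static void sleep_by_spinning(void) {
    struct timespec start, now;
    clock_gettime(CLOCK_MONOTONIC, &start);
    do {
        clock_gettime(CLOCK_MONOTONIC, &now);
    } while ((now.tv_sec - start.tv_sec) * 1000000000L
             + (now.tv_nsec - start.tv_nsec) < 100000000L);
}

// Kernel-backed sleep: the thread is marked not-runnable and a timer
// interrupt makes it runnable again, so it uses no CPU in the meantime.
static void sleep_via_kernel(void) {
    struct timespec ts = { .tv_sec = 0, .tv_nsec = 100000000L }; // 100 ms
    nanosleep(&ts, NULL);
}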
For threads that are put to sleep (e.g. Thread.sleep(long millis)), most systems do roughly the following:
Suspend execution of the process and mark it as not runnable.
Set a timer for the given wait time. Systems provide hardware timers that let the kernel register to receive an interrupt at a given point in the future.
When the timer hits, mark the process as runnable.
I hope you are aware of threading models like one-to-one, many-to-one, and many-to-many, so I won't go into much detail; just a reference for yourself.
It might appear to you as if this increases the overhead/complexity, but that is how threads (user threads created in the JVM) are operated on. The mapping onto kernel threads then depends on the threading models I mentioned above. Check this Quora question and its answers, and please go through the answer given by Robert Love.
For further reading, I'd suggest the Scheduling Algorithms explanation on OSDev.org and the book Operating System Concepts by Silberschatz, Galvin, and Gagne.

NSTimer sometimes freezes when app is doing heavy computation

I'd like to animate some loading points while the app is doing some computation in the background. I achieve this via an NSTimer:
self.timer = [NSTimer scheduledTimerWithTimeInterval:0.3f
                                              target:self
                                            selector:@selector(updateLoadingPoints:)
                                            userInfo:nil
                                             repeats:YES];
Unfortunately, sometimes, when the computation becomes pretty heavy, the method is not fired and the updating therefore doesn't happen. It seems like all the firing is in a queue which is fired after the heavy computation.
Is there a way to give the NSTimer a higher priority to ensure that it's regularly calling my method? Or is there another way to achieve this?
NSTimer works by adding events to the queue on the main run loop; it's the same event queue used for touch events and I/O data received events and so on. The time interval you set isn't a precise schedule; basically on each pass through the run loop, the timers are checked to see if any are due to be fired.
Because of the way they are implemented, there is no way to increase the priority of a timer.
It sounds like your secondary thread is taking a lot of CPU time away from the main thread, and so the timers don't fire as often as you would like. In other words, the main thread is starved for CPU time.
Calling performSelectorOnMainThread: won't necessarily help, because these methods essentially add a single-fire timer to the main thread's event queue. So you'll just be setting up timers in a different way.
To fix your problem, I would suggest that you increase the relative priority of the main thread by decreasing the priority of your computation thread. (See [NSThread setThreadPriority:].)
It may seem counter-intuitive to have your important worker thread running at a lower priority than the main thread, which is just drawing stuff to the screen, but in a human-friendly application, keeping the screen up to date and responding to user input usually is the most important thing that the app should be doing.
In practice, the main thread needs very little CPU, so it won't really be slowing your worker thread down; rather, you are just ensuring that for the small amount of time that the main thread needs to do something, it gets done quickly.
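A minimal sketch of that suggestion; the method name runComputation is a placeholder for your worker entry point:

// Entry point for the computation thread (placeholder name).
- (void)runComputation {
    @autoreleasepool {
        // 0.0 is lowest, 1.0 is highest; leaving the main thread higher lets
        // it preempt this work to fire timers and service UI events.
        [NSThread setThreadPriority:0.1];

        // ... heavy computation ...
    }
}

// Kicked off from the main thread:
[self performSelectorInBackground:@selector(runComputation) withObject:nil];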
The timer is added to the run loop it's been scheduled with. If you create the timer on a secondary thread (e.g. your worker thread), there's a good chance you also scheduled it on the secondary thread.
You want the UI updates on the main thread. Thus, you want the timer scheduled on the main thread. If your updates are still slow, perhaps your main thread can do less work, and ensure that you have very low number of threads, and that you are locking appropriately.
I suspect you created it on a secondary thread, which did not run its run loop as often as the timer wanted to fire. If that thread is doing a lot of (prolonged) work in the background and not running its run loop, the timer never gets a chance to fire, because timer messages can only be delivered while the run loop is being serviced.
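One way to guarantee the timer lands on the main run loop, regardless of which thread the setup code runs on:

dispatch_async(dispatch_get_main_queue(), ^{
    // scheduledTimerWithTimeInterval: installs the timer on the *current*
    // thread's run loop, so make sure this runs on the main thread.
    self.timer = [NSTimer scheduledTimerWithTimeInterval:0.3
                                                  target:self
                                                selector:@selector(updateLoadingPoints:)
                                                userInfo:nil
                                                 repeats:YES];
});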
Make your timer call from a separate thread rather than from the main thread. This will keep it separate from your main thread's other processing, which should give you the desired result.
Perform your computation on a separate thread, using performSelectorInBackground:withObject:. Always do as little as possible in your UI loop; any work done there will block mouse clicks, cause SPoDs/beachballs, and delay timer handlers.
I suspect that it's not just your TIMER being unresponsive, but the whole UI in general.

Is it safe to access the hard drive via many different GCD queues?

Is it safe? For instance, if I create a bunch of different GCD queues that each compress (tar cvzf) some files, am I doing something wrong? Will the hard drive be destroyed?
Or does the system properly take care of such things?
Dietrich's answer is correct save for one detail (that is completely non-obvious).
If you were to spin off, say, 100 asynchronous tar executions via GCD, you'd quickly find that you have 100 threads running in your application (which would also be dead slow due to gross abuse of the I/O subsystem).
In a fully asynchronous concurrent system with queues, there is no way to know if a particular unit of work is blocked because it is waiting for a system resource or waiting for some other enqueued unit of work. Therefore, anytime anything blocks, you pretty much have to spin up another thread and consume another unit of work or risk locking up the application.
In such a case, the "obvious" solution is to wait a bit when a unit of work blocks before spinning up another thread to de-queue and process another unit of work with the hope that the first unit of work "unblocks" and continues processing.
Doing so, though, would mean that any asynchronous concurrent system with interaction between units of work -- a common case -- would be so slow as to be useless.
Far more effective is to limit the # of units of work that are enqueued in the global asynchronous queues at any one time. A GCD semaphore makes this quite easy; you have a single serial queue into which all units of work are enqueued. Every time you dequeue a unit of work, you increment the semaphore. Every time a unit of work is completed, you decrement the semaphore. As long as the semaphore is below some maximum value (say, 4), then you enqueue a new unit of work.
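Expressed with GCD's actual semaphore API (where waiting on the semaphore replaces the explicit counter check), a sketch might look like this; compressFile() and filesToCompress are illustrative:

// Throttle: at most 4 work units in flight at once (4 is the example max).
dispatch_semaphore_t slots = dispatch_semaphore_create(4);
dispatch_queue_t feeder = dispatch_queue_create("com.example.feeder", DISPATCH_QUEUE_SERIAL);

for (NSURL *file in filesToCompress) { // filesToCompress is illustrative
    dispatch_async(feeder, ^{
        // The feeder blocks here (not the workers) until a slot frees up.
        dispatch_semaphore_wait(slots, DISPATCH_TIME_FOREVER);
        dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
            compressFile(file);               // hypothetical work function
            dispatch_semaphore_signal(slots); // slot is free again
        });
    });
}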
If you take something that is normally I/O-limited, such as tar, and run a bunch of copies in GCD:
It will run more slowly, because you are throwing more CPU at an I/O-bound task, meaning the I/O will be more scattered and there will be more of it in flight at the same time.
No more than N tasks will run at a time, which is the point of GCD, so "a billion queue entries" and "ten queue entries" give you the same thing if you have fewer than ten threads.
Your hard drive will be fine.
Even though this question was asked back in May, it's still worth noting that GCD has now provided I/O primitives with the release of 10.7 (OS X Lion). See the man pages for dispatch_read and dispatch_io_create for examples on how to do efficient I/O with the new APIs. They are smart enough to properly schedule I/O against a single disk (or multiple disks) with knowledge of how much concurrency is, or is not, possible in the actual I/O requests.
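A small sketch using one of those primitives; the path is illustrative, and SIZE_MAX asks dispatch_read to read until end-of-file:

#include <fcntl.h>
#include <stdint.h>

int fd = open("/tmp/archive.tar.gz", O_RDONLY); // illustrative path
dispatch_read(fd, SIZE_MAX, // SIZE_MAX = read until end-of-file
              dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0),
              ^(dispatch_data_t data, int error) {
    if (error == 0) {
        NSLog(@"read %zu bytes", dispatch_data_get_size(data));
    }
    close(fd);
});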

Feedback from threads to main program

My software will simulate a few hundred hardware devices, each of which will send several thousand reports to a database server.
Trying it without threading did not give very good results, so now it's time to thread.
Since I am load testing the DB server, some of those transactions will succeed and a few may fail. The GUI of the main program needs to reflect this. How should the threads communicate their results back to the main program? Update global variables? Send a message? Or something else?
Now, if I update only at the end of each thread, then the GUI is going to look rather boring (and I can't tell if the program hung). It might be nice to update the GUI periodically, but that might cause contention, with threads waiting for other threads to update (for instance, if I am writing to global variables, I need a mutex, which will block each thread that is waiting to write).
I'm new to threading. How is this normally done? Perhaps the main program could poll the threads, instead of the threads informing the main program?
One way to organize this is for your threads to add messages to a thread-safe queue (e.g. a ConcurrentQueue) as they get data. To keep things simple you can have a timer thread in your UI that periodically dequeues the queued messages to a private list and then renders them. This design allows your threads to easily queue and forget messages with minimal contention, and for your UI to periodically update itself without blocking your writers too much (i.e. for only the period it takes to dequeue current messages to a private list).
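An Objective-C sketch of that design, since the question doesn't name a platform; a locked NSMutableArray stands in for the thread-safe queue, and all names are illustrative:

@interface ResultSink : NSObject
- (void)post:(NSString *)message; // called from worker threads
@end

@implementation ResultSink {
    NSMutableArray<NSString *> *_pending;
    NSLock *_lock;
}

- (instancetype)init {
    if ((self = [super init])) {
        _pending = [NSMutableArray array];
        _lock = [NSLock new];
        // UI-side timer: drain the queue a few times a second.
        // (Note: a repeating NSTimer retains its target.)
        [NSTimer scheduledTimerWithTimeInterval:0.25
                                         target:self
                                       selector:@selector(drain:)
                                       userInfo:nil
                                        repeats:YES];
    }
    return self;
}

// Workers hold the lock only long enough to append one message.
- (void)post:(NSString *)message {
    [_lock lock];
    [_pending addObject:message];
    [_lock unlock];
}

// Swap the shared array for an empty one, then render without the lock held.
- (void)drain:(NSTimer *)timer {
    NSArray *batch;
    [_lock lock];
    batch = _pending;
    _pending = [NSMutableArray array];
    [_lock unlock];

    for (NSString *message in batch) {
        NSLog(@"%@", message); // stand-in for updating the GUI
    }
}
@end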
Although you are attempting to simulate the load of hundreds of devices, using a thread per device is not the way to model this, as you can only run so many threads concurrently anyway.