My software will simulate a few hundred hardware devices, each of which will send several thousand reports to a database server.
Trying it without threading did not give very good results, so now it's time to thread.
Since I am load-testing the d/b server, some of those transactions will succeed and a few may fail. The GUI of the main program needs to reflect this. How should the threads communicate their results back to the main program? Update global variables? Send a message? Or something else?
Now, if I update only at the end of each thread, the GUI is going to look rather boring (and I can't tell whether the program has hung). It might be nice to update the GUI periodically. But that might cause contention, with threads waiting on other threads to update (for instance, if I am writing to global variables, I need a mutex, which will block each thread that is waiting to write).
I'm new to threading. How is this normally done? Perhaps the main program could poll the threads, instead of the threads informing the main program?
One way to organize this is for your threads to add messages to a thread-safe queue (e.g. a ConcurrentQueue) as they produce results. To keep things simple, you can have a timer in your UI that periodically dequeues the pending messages to a private list and then renders them. This design lets your threads queue-and-forget messages with minimal contention, and lets your UI update itself periodically without blocking your writers for more than the moment it takes to move the pending messages to the private list.
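ConcurrentQueue comes from .NET, but the pattern itself is language-agnostic. Here is a minimal sketch of the same idea in C with pthreads; all the names are hypothetical, and the rendering side is left to your UI timer:

    #include <pthread.h>
    #include <stdlib.h>
    #include <string.h>

    /* A mutex-protected message queue: workers push, the UI timer drains. */
    typedef struct node {
        char msg[128];
        struct node *next;
    } node_t;

    typedef struct {
        node_t *head, *tail;
        pthread_mutex_t lock;
    } msg_queue_t;

    void queue_init(msg_queue_t *q) {
        q->head = q->tail = NULL;
        pthread_mutex_init(&q->lock, NULL);
    }

    /* Called by worker threads: queue-and-forget. */
    void queue_push(msg_queue_t *q, const char *msg) {
        node_t *n = malloc(sizeof *n);
        if (!n) return;
        strncpy(n->msg, msg, sizeof n->msg - 1);
        n->msg[sizeof n->msg - 1] = '\0';
        n->next = NULL;
        pthread_mutex_lock(&q->lock);
        if (q->tail) q->tail->next = n; else q->head = n;
        q->tail = n;
        pthread_mutex_unlock(&q->lock);
    }

    /* Called by the UI timer: detach the whole list in O(1) while holding
     * the lock, then render and free it at leisure without blocking writers. */
    node_t *queue_drain(msg_queue_t *q) {
        pthread_mutex_lock(&q->lock);
        node_t *all = q->head;
        q->head = q->tail = NULL;
        pthread_mutex_unlock(&q->lock);
        return all;
    }

The key design point is that the lock is held only for pointer swaps, never while rendering, so writers are blocked only momentarily.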
Although you are attempting to simulate the load of hundreds of devices, a thread per device is not the way to model it, since you can only run so many threads concurrently anyway.
Related
It is said that semaphores are designed for this, but how? It looks like I need to submit the semaphore before waiting for it to be signaled. Then what's the point of multithreading?
I'm using Skia (which has its own VkQueue) to draw the UI. I don't have access to the command buffer; I can only provide semaphores for it: it first waits on the scene-complete semaphore, then draws the UI and signals the present-ready semaphore.
It works fine when everything happens on a single thread, but after I moved the UI part to a second thread it stopped working, and I got validation errors like: VkQueue is waiting on a semaphore that has no way to be signaled. Of course, since it's on a different thread, the semaphore might not have been submitted to a queue yet.
The spec for vkQueuePresentKHR says:
All elements of the pWaitSemaphores member of pPresentInfo must be semaphores that are signaled, or have semaphore signal operations previously submitted for execution
You can't submit work that waits on a semaphore whose signal operation you plan to submit later. If you have this kind of dependency in your code, you need to externally synchronize the submissions so that the command buffers that will signal the semaphore are submitted BEFORE you submit the dependent command buffers, regardless of the queue.
If you're using multiple threads, it sounds like you need to rely on CPU-side synchronization primitives, such as a CPU semaphore, to properly order the work between them. Pure Vulkan sync primitives won't help you there.
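For instance, here is a minimal sketch of that CPU-side ordering in C with pthreads. The function and variable names are hypothetical, and it assumes the two threads submit to different queues (if they share a VkQueue, access to the queue itself must also be externally synchronized):

    #include <pthread.h>
    #include <stdbool.h>
    #include <vulkan/vulkan.h>

    /* CPU-side gate: the UI thread may not submit its wait until the
     * scene submit (which will signal the scene-complete semaphore)
     * has actually been queued. */
    static pthread_mutex_t gate_mutex = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  gate_cond  = PTHREAD_COND_INITIALIZER;
    static bool scene_submitted = false;

    /* Render thread: submit the scene work, then open the gate. */
    void submit_scene(VkQueue scene_queue, const VkSubmitInfo *scene_submit) {
        vkQueueSubmit(scene_queue, 1, scene_submit, VK_NULL_HANDLE);
        pthread_mutex_lock(&gate_mutex);
        scene_submitted = true;
        pthread_cond_signal(&gate_cond);
        pthread_mutex_unlock(&gate_mutex);
    }

    /* UI thread: wait until the signaling submit has been queued before
     * submitting the batch that waits on the scene-complete semaphore. */
    void submit_ui(VkQueue ui_queue, const VkSubmitInfo *ui_submit) {
        pthread_mutex_lock(&gate_mutex);
        while (!scene_submitted)
            pthread_cond_wait(&gate_cond, &gate_mutex);
        scene_submitted = false;   /* re-arm the gate for the next frame */
        pthread_mutex_unlock(&gate_mutex);
        vkQueueSubmit(ui_queue, 1, ui_submit, VK_NULL_HANDLE);
    }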
So I restructured a central part of my Cocoa application (I really had to!) and have been running into issues since then.
Quick outline: my application controls the playback of QuickTime movies so that they are in sync with external timecode.
Thus, external timecode arrives on a CoreMIDI callback thread and gets posted to the application about 25 times per sec. The sync is then checked and adjusted if it needs to be.
All this checking and adjusting is done on the main thread.
Even putting all the processing on a background thread would be a ton of work, as I'm currently using a lot of GCD blocks and would need to rewrite many functions so they can be called from an NSThread. So first I would like to make sure that this would actually solve my problem.
The problem
My Core MIDI callback is always called in time, but the GCD block that is dispatched to the main queue is sometimes blocked for up to 500 ms. Understandably, adjusting the sync does not quite work when that happens. I couldn't find a reason for it, so I'm guessing that I'm doing something that blocks the main thread.
I'm familiar with Instruments, but I couldn't find the right mode to see what keeps my messages from being processed in time.
I don't know what else to try, so I would appreciate it if anyone could help.
Thanks in advance!
Watchdog
You can use Watchdog, which logs a warning whenever the main thread is blocked for longer than a defined threshold:
https://github.com/wojteklu/Watchdog
You can install it using CocoaPods:
pod 'Watchdog'
You may be blocking the main thread or you might be flooding it with events.
I would suggest three things:
1. Grab a timestamp when the timecode arrives on the CoreMIDI callback thread (see mach_absolute_time()). Then grab the current time when your main-thread work is actually being done. You can then adjust accordingly, based on how much time elapsed between posting to the main thread and the work actually being processed.
2. Create some kind of coalescing mechanism so that when your main thread is blocked, interim timecode events (which are now out of date) are tossed. This can be as simple as a global NSUInteger that is incremented every time an event is received. The block dispatched to the main queue captures the current value on creation, then checks it when it is processed. If it differs by more than N (N for you to determine), toss the event, because more are in flight. (A sketch of both of these ideas follows the list.)
3. Consider not sending an event to the main thread for every timecode notification. 25 adjustments per second is a lot of work. If processing only 5 per second yields a "good enough" perceptual experience, that is an awful lot of work saved.
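Here is a minimal sketch of suggestions 1 and 2 together, written against GCD's C API; adjust_sync, the event payload, and N = 2 are hypothetical stand-ins for your own code:

    #include <dispatch/dispatch.h>
    #include <mach/mach_time.h>
    #include <stdatomic.h>
    #include <stdint.h>

    extern void adjust_sync(int64_t timecode, uint64_t elapsed_ticks); /* yours */

    static atomic_uint_fast64_t g_event_seq;   /* bumped once per MIDI event */

    void midi_timecode_arrived(int64_t timecode) {
        uint64_t arrived = mach_absolute_time();           /* suggestion 1 */
        uint64_t seq = atomic_fetch_add(&g_event_seq, 1) + 1;

        dispatch_async(dispatch_get_main_queue(), ^{
            /* Suggestion 2: if newer events are already in flight, this
             * one is stale; toss it. N == 2 here, tune to taste. */
            if (atomic_load(&g_event_seq) - seq > 2)
                return;
            /* Convert the tick delta with mach_timebase_info() if you
             * need nanoseconds; the raw delta works for relative checks. */
            adjust_sync(timecode, mach_absolute_time() - arrived);
        });
    }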
In general, instrumenting the main event loop is a bit tricky. The CPU profiler in Instruments can be quite helpful. It may come as a surprise, but so can the Allocations instrument. In particular, you can use the Allocations instrument to measure memory throughput. If there are tons of transient (short lived) allocations, it'll chew up a ton of CPU time doing all those allocations/deallocations.
Question as stated above: from the standpoint of the operating system, which one is easier to create, a thread or a process?
A new thread should be faster to create than a new process.
A process is a heavyweight system structure. It has its own virtual memory space, owns all of its handles (mutexes, semaphores, open files), and is protected from other processes. Cross-process communication has to go through the OS.
A thread is a "child" of a process. A thread is simply an execution context (registers, stack, and thread-local state) that can run on another hardware core or be co-scheduled on the same core as other threads of the process. Multiple threads share the resources of a single process, including the address space and the OS handles owned by the process.
There are also mechanisms for achieving multitasking during a program's runtime that are even faster than dynamically creating threads.
Some systems or code libraries provide thread pools. In this case, you tell the system how many threads you want to run and it creates them up front. Then, instead of creating and destroying threads (which is still a relatively slow process), you allocate and free threads from this pool.
Job tasking is another, similar, lighter-weight multicore mechanism, where several threads each have a job queue of tasks to execute. They run the tasks in their job queues and sleep when the queues are empty.
For both thread pools and job tasking, there is no per-task thread startup/shutdown cost; you pay it only when the global pools and queues are created and destroyed.
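To make the shape of this concrete, here is a minimal job-queue sketch in C with pthreads; the fixed-size ring buffer and all the names are hypothetical simplifications:

    #include <pthread.h>
    #include <stdbool.h>

    /* N long-lived workers pull jobs from a shared queue instead of a
     * thread being created and destroyed per task. */
    #define NWORKERS 4
    #define QCAP 64

    typedef void (*job_fn)(void *);
    typedef struct { job_fn fn; void *arg; } job_t;

    static job_t jobs[QCAP];
    static int qhead, qtail, qcount;
    static bool shutting_down;
    static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  qcond = PTHREAD_COND_INITIALIZER;

    static void *worker(void *unused) {
        (void)unused;
        for (;;) {
            pthread_mutex_lock(&qlock);
            while (qcount == 0 && !shutting_down)
                pthread_cond_wait(&qcond, &qlock);    /* sleep when idle */
            if (qcount == 0) {                        /* woken to shut down */
                pthread_mutex_unlock(&qlock);
                return NULL;
            }
            job_t job = jobs[qhead];
            qhead = (qhead + 1) % QCAP;
            qcount--;
            pthread_mutex_unlock(&qlock);
            job.fn(job.arg);                          /* run outside the lock */
        }
    }

    void pool_start(pthread_t tids[NWORKERS]) {
        for (int i = 0; i < NWORKERS; i++)
            pthread_create(&tids[i], NULL, worker, NULL);
    }

    void pool_submit(job_fn fn, void *arg) {
        pthread_mutex_lock(&qlock);
        if (qcount < QCAP) {                          /* sketch: drop if full */
            jobs[qtail] = (job_t){ fn, arg };
            qtail = (qtail + 1) % QCAP;
            qcount++;
            pthread_cond_signal(&qcond);
        }
        pthread_mutex_unlock(&qlock);
    }

    void pool_stop(pthread_t tids[NWORKERS]) {
        pthread_mutex_lock(&qlock);
        shutting_down = true;
        pthread_cond_broadcast(&qcond);
        pthread_mutex_unlock(&qlock);
        for (int i = 0; i < NWORKERS; i++)
            pthread_join(tids[i], NULL);
    }

The workers drain any queued jobs before honoring shutdown, and each job runs outside the lock so one slow task never blocks the queue itself.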
Well, traditionally threads are called "lightweight processes", so I guess they are easier to set up.
IIRC, in Linux both forking and starting a new thread (clone(2)) are implemented deep down by a call to the same function (do_fork), and the set-up times are comparable for modest numbers. For large numbers of forks/clones (think thousands), the costs start to add up.
In TLPI there is a nice comparison:
Forking 100,000 times: 22.27 seconds
Cloning 100,000 times: 2.97 seconds
In particular, a really nice feature of clone is that its speed remains constant even as the size of the cloned process grows.
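If you want a feel for the gap on your own machine, here is a rough sketch; it is not the TLPI benchmark itself (which exercises fork(2)/clone(2) directly and does not wait on each child), just an approximation using fork and pthread_create:

    #include <pthread.h>
    #include <stdio.h>
    #include <sys/wait.h>
    #include <time.h>
    #include <unistd.h>

    /* Time N process creations vs. N thread creations. The absolute
     * numbers will vary by machine; the ratio is what is interesting. */
    #define N 10000

    static void *noop(void *arg) { return arg; }

    static double elapsed(struct timespec a, struct timespec b) {
        return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
    }

    int main(void) {
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < N; i++) {
            pid_t pid = fork();
            if (pid == 0) _exit(0);      /* child exits immediately */
            waitpid(pid, NULL, 0);
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("forking %d times: %.2f s\n", N, elapsed(t0, t1));

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < N; i++) {
            pthread_t t;
            pthread_create(&t, NULL, noop, NULL);
            pthread_join(t, NULL);
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("creating %d threads: %.2f s\n", N, elapsed(t0, t1));
        return 0;
    }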
The real advantage of threads is that they don't need IPC.
A new thread is easier to create, since a new process requires more setup than a thread: a security context, inheritable handles, a current directory, and so on.
The major differences between threads and processes are:
1. Threads share the address space of the process that created them; processes have their own address space.
2. Threads have direct access to the data segment of their process; processes have their own copy of the data segment of the parent process.
3. Threads can communicate directly with other threads of their process; processes must use interprocess communication to communicate with sibling processes.
4. Threads have almost no overhead; processes have considerable overhead.
5. New threads are easily created; new processes require duplication of the parent process.
6. Threads can exercise considerable control over threads of the same process; processes can only exercise control over child processes.
7. Changes to the main thread (cancellation, priority change, etc.) may affect the behavior of the other threads of the process; changes to the parent process do not affect child processes.
A thread just has to be as easy as or easier to create than a process, since creating a process implies creating at least one thread to run the process's code.
Rgds,
Martin
Is it safe? For instance, if I create a bunch of different GCD queues that each compress (tar cvzf) some files, am I doing something wrong? Will the hard drive be destroyed?
Or does the system properly take care of such things?
Dietrich's answer is correct save for one detail (that is completely non-obvious).
If you were to spin off, say, 100 asynchronous tar executions via GCD, you'd quickly find that you have 100 threads running in your application (which would also be dead slow due to gross abuse of the I/O subsystem).
In a fully asynchronous concurrent system with queues, there is no way to know if a particular unit of work is blocked because it is waiting for a system resource or waiting for some other enqueued unit of work. Therefore, anytime anything blocks, you pretty much have to spin up another thread and consume another unit of work or risk locking up the application.
In such a case, the "obvious" solution is to wait a bit when a unit of work blocks before spinning up another thread to de-queue and process another unit of work with the hope that the first unit of work "unblocks" and continues processing.
Doing so, though, would mean that any asynchronous concurrent system with interaction between units of work -- a common case -- would be so slow as to be useless.
Far more effective is to limit the number of units of work enqueued in the global asynchronous queues at any one time. A GCD semaphore makes this quite easy: you have a single serial queue into which all units of work are enqueued; every time you dequeue a unit of work, you increment the semaphore; every time a unit of work is completed, you decrement the semaphore; and as long as the semaphore is below some maximum value (say, 4), you enqueue a new unit of work.
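In code, that bookkeeping is usually expressed directly with GCD's semaphore API. A minimal sketch in C with blocks; the work_for callback and the limit of 4 are hypothetical:

    #include <dispatch/dispatch.h>
    #include <stddef.h>

    /* Allow at most 4 units of work in flight at once. */
    void enqueue_all(size_t njobs, void (^work_for)(size_t)) {
        dispatch_semaphore_t gate = dispatch_semaphore_create(4);
        dispatch_queue_t q =
            dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

        for (size_t i = 0; i < njobs; i++) {
            /* Blocks until one of the 4 slots frees up. */
            dispatch_semaphore_wait(gate, DISPATCH_TIME_FOREVER);
            dispatch_async(q, ^{
                work_for(i);                        /* the actual unit of work */
                dispatch_semaphore_signal(gate);    /* release the slot */
            });
        }
    }

The feeder thread itself blocks on the semaphore, so no more than four units of work are ever queued or running, no matter how large njobs is.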
If you take something that is normally I/O-limited, such as tar, and run a bunch of copies in GCD:
1. It will run more slowly, because you are throwing more CPU at an I/O-bound task; the I/O will be more scattered and there will be more of it in flight at the same time.
2. No more than N tasks will run at a time (which is the point of GCD), so "a billion queue entries" and "ten queue entries" give you the same thing if you have fewer than 10 threads.
3. Your hard drive will be fine.
Even though this question was asked back in May, it's still worth noting that GCD has now provided I/O primitives with the release of 10.7 (OS X Lion). See the man pages for dispatch_read and dispatch_io_create for examples on how to do efficient I/O with the new APIs. They are smart enough to properly schedule I/O against a single disk (or multiple disks) with knowledge of how much concurrency is, or is not, possible in the actual I/O requests.
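For example, a minimal use of dispatch_read might look like the sketch below; the path handling and the 64 KB length are arbitrary choices for illustration:

    #include <dispatch/dispatch.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Read up to 64 KB from a file asynchronously; GCD schedules the
     * I/O and invokes the handler on the given queue when data arrives. */
    void read_file_async(const char *path) {
        int fd = open(path, O_RDONLY);
        if (fd < 0) return;
        dispatch_queue_t q =
            dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
        dispatch_read(fd, 64 * 1024, q, ^(dispatch_data_t data, int error) {
            if (error == 0)
                printf("read %zu bytes\n", dispatch_data_get_size(data));
            close(fd);
        });
    }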
I am aware that threads are used for multitasking and that they are lightweight. But let's say I need a process without any multitasking; I just create a process. Will the OS associate a single thread with the process, or will it execute the process on its own, without needing a thread?
Please clarify.
Regards,
Harish
Well, that depends on the OS you're talking about, but for many, the creation of a process includes the act of creating a single thread for that process.
That thread is then free to go and create other threads belonging to the process.
It makes little sense to talk about a process with no threads, since that would mean no code is running for that process, so it can't really do anything useful. And one of the things it won't be able to do is create the first thread for that process, if you want it to do any useful work :-)
As an example, in the Linux kernel, the creation of a process is little different to creating a new thread. That's because the kernel schedules threads rather than processes.
Processes are now considered to be groups of threads with the same thread group ID (TGID), that TGID being the thread ID (TID) of the first thread created for that process.
When you fork or vfork or clone (without CLONE_THREAD), you get a new thread with a new TID, and the TGID is set to that TID: that's a new process.
When you clone with CLONE_THREAD, you get a new thread with a new TID, but the TGID remains the same as the cloner's: that's a different thread in the same process.
That's how Linux (as an example) distinguishes between processes and threads without having to make the scheduler too complicated. The scheduler can choose to ignore thread groups entirely if it wishes. It's actually incredibly clever.
To code outside the scheduler, a group of threads with the same TGID is considered a process (with the TGID being what the code outside of the scheduler sees as the process ID).
This includes both user space code and other bits of the kernel since, for example, how threads are grouped into processes has a bearing on things like signal delivery and exit codes.
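You can watch this distinction from user space. A small sketch, assuming Linux (gettid is called via syscall(2) for portability to older glibcs): every thread reports the same getpid() (the TGID), but each has its own TID.

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <stdio.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Print the TGID (what getpid() returns) and the TID of the
     * calling thread. */
    static void *show_ids(void *name) {
        printf("%s: pid (TGID) = %d, tid = %ld\n",
               (const char *)name, getpid(), (long)syscall(SYS_gettid));
        return NULL;
    }

    int main(void) {
        pthread_t t;
        show_ids("main thread");       /* first thread: TID == TGID */
        pthread_create(&t, NULL, show_ids, "second thread");
        pthread_join(t, NULL);         /* same TGID, different TID */
        return 0;
    }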
A process -is- a thread.
When a process begins, it begins with a single thread.
Before the days of multi-threading, the term thread was unnecessary because you couldn't have a process with more than one thread.
Nowadays, you can create additional threads, and so have a process with multiple threads.
A process is also a bunch of other things: memory, handles, what have you. One of the things it comprises is threads. The threads share some of the other things in the process (such as memory), but have their own individual instances of others (such as stacks).