I am aware that threads are used for multi-tasking and they are light weight. But my doubt is lets say I need a process without any multi-tasking. I just created a process. Now will the CPU associate a single thread to the process OR will it execute the process alone without need to have a thread?
Please clarify.
Regards,
Harish
Well, that depends on the OS that you're talking about but, for many, the creation of a process includes the act of creating a single thread for that process.
That thread is then free to go and create other threads belonging to the process.
It makes little sense to talk about a process with no threads since that means no code is running for that process so it can't really do anything useful. And one of the things it won't be able to do is create the first thread for that process if you want it to do any useful work :-)
As an example, in the Linux kernel, the creation of a process is little different to creating a new thread. That's because the kernel schedules threads rather than processes.
Processes are now considered to be groups of threads with the same thread group ID (TGID), that TGID being the thread ID (TID) of the first thread created for that process.
When you fork or vfork or clone (without CLONE_THREAD), you get a new thread with a new TID and the TGID is set to that TID - that's a new process.
When you clone with CLONE_THREAD, you get a new thread with a new TID but the TGID remains the same as your cloner. That's a different thread in the same process.
That's how Linux (as an example) distinguishes between processes and threads without having to make the scheduler too complicated. The scheduler can choose to ignore thread groups entirely if it wishes. It's actually incredibly clever.
To code outside the scheduler, a group of threads with the same TGID is considered a process (with the TGID being what the code outside of the scheduler sees as the process ID).
This includes both user space code and other bits of the kernel since, for example, how threads are grouped into processes has a bearing on things like signal delivery and exit codes.
A process -is- a thread.
When a process begins, it begins with a single thread.
Before the days of multi-threading, the term thread was unnecessary because you couldn't have a process with more than one thread.
Now days, you can create additional threads, and so have a process with multiple threads.
A process is also a bunch of other things - memory, stack, whathaveyou; one of the things it is, is threads. The threads share some of the other things in the process (such as memory), but have their own individual instances of others (such as stacks).
Related
I am currently pursuing an undergraduate level course in Operating Systems. I'm somewhat confused about the functions of dispatcher and scheduler in process scheduling. Based on what I've learnt, the medium term scheduler selects the process for swapping out and in , and once the processes are selected, the actual swap operation is performed by Dispatcher by context switching. Also the short term scheduler is responsible for scheduling the processes and allocate them CPU time, based on the scheduling algorithm followed.
Please correct me if I'm wrong. I'm really confused about the functions of medium term scheduler vs dispatcher, and differences between Swapping & context switching.
You describing things in system specific terms.
The scheduler and the dispatcher could be all the same thing. However, the frequently are divided so that the scheduler maintains a queue of processes and the dispatcher handles the actual context switch.
If you divide the scheduler into long term, medium term, and short term, that division (if it exists at all) is specific to the operating system.
Swapping in the process of removing a process from memory. A process can be made non-executable through a context switch but may not be swapped out. Swapping is generally independent of scheduling. However, a process must be swapped in to run and the memory management will try to avoid swapping out executing processes.
A scheduler evaluate the requirement of the request to be serviced and thus imposes ordering.
Basically,whatever you have known about scheduler and dispatcher is correct.Sometimes they are referred to as a same unit or scheduler(short time in this case) contains dispatcher as a single unit and together are responsible for allocating a process to CPU for execution.Sometimes they are referred as two separate units,the scheduler selects a process according to some algorithm and the dispatcher is a software that is responsible for actual context switching.
Question as stated above .. from the stand point of Operating System, which one is easier to create, a thread or a process?
A new thread should be faster to create than a new process.
A process is a heavy weight system structure. It has it's own virtual memory space, owns all handles (mutexes, semaphores, open files), and has protection from other processes. Cross-process communication has to go through the OS.
A thread is a "child" to a process. A thread is simply an execution context (registers, stack, and thread-local state) that can run on another hardware core or be co-scheduled on the same core as other threads within a process. Multiple threads share the resources of a single process including the address space and OS handles owned by the process.
There are structures even faster than dynamically creating threads for achieving multitasking during a programs runtime.
Some systems or code libraries support have thread pools (light-weight threads). In this case, you tell the system how many threads you want to run and it creates them up front. Then instead of creating and destroying threads (which is still a relatively slow process), you can allocate and free threads from this pool.
Job Tasking is another similar lighter weight multicore structure where you have several threads with a job queues of tasks to execute. They run the tasks in their job queues and then sleep when the queues are empty.
For both thread pools and job tasking, there is no need for thread startup / shutdown cost aside from upon creation and destruction of the global pools and queues.
Well traditionally threads are called "lightweight processes" so I guess they are easier to set up.
IIRC in Linux both forking and starting a new thread (clone(2)) are implemented deep down with a call to the same function (do_fork) and the set-up times are really comparable for decent numbers. For large numbers of forks / clones (think thousands) they start to add up.
In TLPI there is a nice comparison:
Forking 100,000 times: 22.27 seconds
Cloning 100,000 times: 2.97 seconds
In particular a really nice feature of clone is that the speed remains constant even if the size of the process cloned grows.
The real advantage of threads lies in that they don't need IPC.
A new thread is easier to create, since when a new process is created, it requires more setup than a thread, e.g. a security context, an inheritable handle, a current directory, etc.
The major difference between threads and processes is
1.Threads share the address space of the process that
created it; processes have their own address.
2.Threads have direct access to the data segment of its
process; processes have their own copy of the data segment
of the parent process.
3.Threads can directly communicate with other threads of
its process; processes must use interprocess communication
to communicate with sibling processes.
4.Threads have almost no overhead; processes have
considerable overhead.
5.New threads are easily created; new processes require
duplication of the parent process.
6.Threads can exercise considerable control over threads of
the same process; processes can only exercise control over
child processes.
7.Changes to the main thread (cancellation, priority
change, etc.) may affect the behavior of the other threads
of the process; changes to the parent process does not
affect child processes.
A thread just has to be as easy or easier to create than a process since a creating a process implies creating at least one thread to run the process code.
Rgds,
Martin
Is it safe? For instance, if I create a bunch of different GCD queues that each compress (tar cvzf) some files, am I doing something wrong? Will the hard drive be destroyed?
Or does the system properly take care of such things?
Dietrich's answer is correct save for one detail (that is completely non-obvious).
If you were to spin off, say, 100 asynchronous tar executions via GCD, you'd quickly find that you have 100 threads running in your application (which would also be dead slow due to gross abuse of the I/O subsystem).
In a fully asynchronous concurrent system with queues, there is no way to know if a particular unit of work is blocked because it is waiting for a system resource or waiting for some other enqueued unit of work. Therefore, anytime anything blocks, you pretty much have to spin up another thread and consume another unit of work or risk locking up the application.
In such a case, the "obvious" solution is to wait a bit when a unit of work blocks before spinning up another thread to de-queue and process another unit of work with the hope that the first unit of work "unblocks" and continues processing.
Doing so, though, would mean that any asynchronous concurrent system with interaction between units of work -- a common case -- would be so slow as to be useless.
Far more effective is to limit the # of units of work that are enqueued in the global asynchronous queues at any one time. A GCD semaphore makes this quite easy; you have a single serial queue into which all units of work are enqueued. Every time you dequeue a unit of work, you increment the semaphore. Every time a unit of work is completed, you decrement the semaphore. As long as the semaphore is below some maximum value (say, 4), then you enqueue a new unit of work.
If you take something that is normally IO limited, such as tar, and run a bunch of copies in GCD,
It will run more slowly because you are throwing more CPU at an IO-bound task, meaning the IO will be more scattered and there will be more of it at the same time,
No more than N tasks will run at a time, which is the point of GCD, so "a billion queue entries" and "ten queue entries" give you the same thing if you have less than 10 threads,
Your hard drive will be fine.
Even though this question was asked back in May, it's still worth noting that GCD has now provided I/O primitives with the release of 10.7 (OS X Lion). See the man pages for dispatch_read and dispatch_io_create for examples on how to do efficient I/O with the new APIs. They are smart enough to properly schedule I/O against a single disk (or multiple disks) with knowledge of how much concurrency is, or is not, possible in the actual I/O requests.
My software will simulate a few hundred hardware devices, each of which will send several thousand reports to a database server.
Trying it without threading did not give very good results, so now it's time to thread.
Since I am load testing the d/b server, some of those transactions will succeed and a few may fail. The GUI of the main program needs to reflect this. How should the threads communicate their results back to the main program? Update global variables? Send a message? Or something lese?
Now, if I update only at the end of each thread then the GUI is going to look rather boring (and I can't tell if the program hung). It might be nice to update the GUI periodically. But that might cause contention, with threads waiting for other threads to update (for instance, if I am writing to global variables, I need a mutex, which will block each thread which is waiting to write).
I'm new to threading. How is this normally done? Perhaps the main program could poll the threads, instead of the threads iforming the main program?
One way to organize this is for your threads to add messages to a thread-safe queue (e.g. a ConcurrentQueue) as they get data. To keep things simple you can have a timer thread in your UI that periodically dequeues the queued messages to a private list and then renders them. This design allows your threads to easily queue and forget messages with minimal contention, and for your UI to periodically update itself without blocking your writers too much (i.e. for only the period it takes to dequeue current messages to a private list).
Although you are attempting to simulate the load of hundreds of devices, using thread per device is not the way to model this as you can only run so many threads concurrently anyway.
What is the difference between a thread/process/task?
Process:
A process is an instance of a computer program that is being executed.
It contains the program code and its current activity.
Depending on the operating system (OS), a process may be made up of multiple threads of execution that execute instructions concurrently.
Process-based multitasking enables you to run the Java compiler at the same time that you are using a text editor.
In employing multiple processes with a single CPU,context switching between various memory context is used.
Each process has a complete set of its own variables.
Thread:
A thread is a basic unit of CPU utilization, consisting of a program counter, a stack, and a set of registers.
A thread of execution results from a fork of a computer program into two or more concurrently running tasks.
The implementation of threads and processes differs from one operating system to another, but in most cases, a thread is contained inside a process. Multiple threads can exist within the same process and share resources such as memory, while different processes do not share these resources.
Example of threads in same process is automatic spell check and automatic saving of a file while writing.
Threads are basically processes that run in the same memory context.
Threads may share the same data while execution.
Thread Diagram i.e. single thread vs multiple threads
Task:
A task is a set of program instructions that are loaded in memory.
Short answer:
A thread is a scheduling concept, it's what the CPU actually 'runs' (you don't run a process). A process needs at least one thread that the CPU/OS executes.
A process is data organizational concept. Resources (e.g. memory for holding state, allowed address space, etc) are allocated for a process.
To explain on simpler terms
Process: process is the set of instruction as code which operates on related data and process has its own various state, sleeping, running, stopped etc. when program gets loaded into memory it becomes process. Each process has atleast one thread when CPU is allocated called sigled threaded program.
Thread: thread is a portion of the process. more than one thread can exist as part of process. Thread has its own program area and memory area. Multiple threads inside one process can not access each other data. Process has to handle sycnhronization of threads to achieve the desirable behaviour.
Task: Task is not widely concept used worldwide. when program instruction is loaded into memory people do call as process or task. Task and Process are synonyms nowadays.
A process invokes or initiates a program. It is an instance of a program that can be multiple and running the same application. A thread is the smallest unit of execution that lies within the process. A process can have multiple threads running. An execution of thread results in a task. Hence, in a multithreading environment, multithreading takes place.
A program in execution is known as process. A program can have any number of processes. Every process has its own address space.
Threads uses address spaces of the process. The difference between a thread and a process is, when the CPU switches from one process to another the current information needs to be saved in Process Descriptor and load the information of a new process. Switching from one thread to another is simple.
A task is simply a set of instructions loaded into the memory. Threads can themselves split themselves into two or more simultaneously running tasks.
for more Understanding refer the link: http://www.careerride.com/os-thread-process-and-task.aspx
Wikipedia sums it up quite nicely:
Threads compared with processes
Threads differ from traditional multitasking operating system processes in that:
processes are typically independent, while threads exist as
subsets of a process
processes carry considerable state information, whereas multiple
threads within a process share state
as well as memory and other resources
processes have separate address spaces, whereas threads share their
address space
processes interact only through system-provided inter-process
communication mechanisms.
Context switching between threads in the same process is
typically faster than context
switching between processes.
Systems like Windows NT and OS/2 are said to have "cheap" threads and "expensive" processes; in other operating systems there is not so great a difference except the cost of address space switch which implies a TLB flush.
Task and process are used synonymously.
from wiki clear explanation
1:1 (Kernel-level threading)
Threads created by the user are in 1-1 correspondence with schedulable entities in the kernel.[3] This is the simplest possible threading implementation. Win32 used this approach from the start. On Linux, the usual C library implements this approach (via the NPTL or older LinuxThreads). The same approach is used by Solaris, NetBSD and FreeBSD.
N:1 (User-level threading)
An N:1 model implies that all application-level threads map to a single kernel-level scheduled entity;[3] the kernel has no knowledge of the application threads. With this approach, context switching can be done very quickly and, in addition, it can be implemented even on simple kernels which do not support threading. One of the major drawbacks however is that it cannot benefit from the hardware acceleration on multi-threaded processors or multi-processor computers: there is never more than one thread being scheduled at the same time.[3] For example: If one of the threads needs to execute an I/O request, the whole process is blocked and the threading advantage cannot be utilized. The GNU Portable Threads uses User-level threading, as does State Threads.
M:N (Hybrid threading)
M:N maps some M number of application threads onto some N number of kernel entities,[3] or "virtual processors." This is a compromise between kernel-level ("1:1") and user-level ("N:1") threading. In general, "M:N" threading systems are more complex to implement than either kernel or user threads, because changes to both kernel and user-space code are required. In the M:N implementation, the threading library is responsible for scheduling user threads on the available schedulable entities; this makes context switching of threads very fast, as it avoids system calls. However, this increases complexity and the likelihood of priority inversion, as well as suboptimal scheduling without extensive (and expensive) coordination between the userland scheduler and the kernel scheduler.