I'm trying to understand how their scheduling criteria work.
Why is the mix of I/O-bound and CPU-bound processes more important to batch processes?
Is preemptive scheduling important to all of them?
Thanks a lot for the help.
A mixed system typically allows the system manager to create batch queues into which those with appropriate privileges may submit jobs. The usual purpose of the batch queues is to use the CPU when interactive processes are not using it.
Batch queues are usually assigned priorities that override the normal user priorities. The system manager typically assigns batch queues priority levels that are lower than the normal interactive priority; if you set the priority low enough, the batch queue does not interfere with interactive users.
It is also possible to schedule the batch queues so that they only run at specified times (e.g., between 2AM and 6AM).
The system manager generally does not concern himself with whether jobs are I/O-bound or CPU-bound.
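As a concrete illustration, here is a minimal Unix-flavoured sketch in Python of a batch worker that lowers its own priority and restricts itself to an off-peak window; the run_next_batch_job helper is a hypothetical stand-in, not any real system's API:

    # Minimal sketch: a batch worker that stays out of interactive users' way.
    # os.nice(19) drops to the lowest scheduling priority (Unix only).
    import datetime
    import os
    import time

    os.nice(19)  # lowest priority: interactive processes preempt us

    def in_batch_window(start_hour=2, end_hour=6):
        """True only during the off-peak window (e.g., 2AM-6AM)."""
        return start_hour <= datetime.datetime.now().hour < end_hour

    def run_next_batch_job():
        """Hypothetical: pull one job from the batch queue and execute it."""
        time.sleep(1)

    while True:
        if in_batch_window():
            run_next_batch_job()
        else:
            time.sleep(60)  # outside the window: sit idle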
We are looking into creating a distributed system for task execution in .NET (C#), where the tasks have priorities. There are a lot of options, and I would like to get your take on them. The options and their disadvantages are:
1) Amazon's SWF (Simple Workflow) - in .NET we can't use a framework such as Java's Flow Framework, which simplifies things, so this means a lot of boilerplate code. In addition, this offering from Amazon doesn't seem to be very popular (so: no community support, and it might eventually disappear).
2) Building our own on top of a queuing system
2.a) SQS - not really a FIFO, and using 2 queues (normal and high priority) won't give us granular control over the priorities (we might be able to live with that)
2.b) RabbitMQ - administrative overhead (setting it up, configuring it in cluster mode for reliability, etc)
3) I have received another suggestion to use an "event-driven" approach without queues. I can't see how that's possible; maybe someone can clarify it for me? (Oh, and is this related to a technology called Akka (actor-based)?)
Thank you
SQS is probably going to be the simplest - very little code is required, the cost is extremely low, and the setup time is minimal.
If 2 queues and high/low priority are not enough, then create 3 queues, or 5 queues, or 10 queues - you can be as granular as you need to be.
You can have multiple worker machines scanning all the queues in priority order (as sketched below), or have some machines dedicated to processing just the high-priority queues; these machines could be bigger/faster if you want to process even more quickly.
Another option is to have separate auto-scaling policies that spin up more/faster machines based on a small increase in the length of the high-priority queues, but scale up only smaller/cheaper machines when the low-priority queue gets very long. Lots of options to choose from to fine-tune your solution.
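To make the queue-scanning idea concrete, here is a minimal sketch in Python with boto3 (the same pattern applies from the .NET SDK); the queue URLs and the handle function are hypothetical placeholders:

    # Minimal sketch: one worker scanning N SQS queues in strict priority order.
    import time
    import boto3

    sqs = boto3.client("sqs")

    QUEUE_URLS = [  # highest priority first; add as many tiers as you need
        "https://sqs.us-east-1.amazonaws.com/123456789012/tasks-high",
        "https://sqs.us-east-1.amazonaws.com/123456789012/tasks-medium",
        "https://sqs.us-east-1.amazonaws.com/123456789012/tasks-low",
    ]

    def handle(body):
        """Hypothetical task handler."""
        print("processing:", body)

    while True:
        got_message = False
        for url in QUEUE_URLS:  # scan from high to low priority
            resp = sqs.receive_message(QueueUrl=url, MaxNumberOfMessages=1)
            for msg in resp.get("Messages", []):
                handle(msg["Body"])
                sqs.delete_message(QueueUrl=url,
                                   ReceiptHandle=msg["ReceiptHandle"])
                got_message = True
                break  # restart the scan from the top-priority queue
            if got_message:
                break
        if not got_message:
            time.sleep(1)  # every queue empty: back off briefly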
I am currently pursuing an undergraduate-level course in Operating Systems. I'm somewhat confused about the functions of the dispatcher and the scheduler in process scheduling. Based on what I've learnt, the medium-term scheduler selects processes for swapping out and in, and once the processes are selected, the actual swap operation is performed by the dispatcher via context switching. Also, the short-term scheduler is responsible for scheduling processes and allocating them CPU time, based on the scheduling algorithm followed.
Please correct me if I'm wrong. I'm really confused about the functions of the medium-term scheduler vs. the dispatcher, and the differences between swapping and context switching.
You are describing things in system-specific terms.
The scheduler and the dispatcher could be one and the same thing. However, they frequently are divided so that the scheduler maintains a queue of processes and the dispatcher handles the actual context switch.
If you divide the scheduler into long term, medium term, and short term, that division (if it exists at all) is specific to the operating system.
Swapping is the process of removing a process from memory. A process can be made non-executable through a context switch without being swapped out. Swapping is generally independent of scheduling; however, a process must be swapped in to run, and the memory manager will try to avoid swapping out executing processes.
A scheduler evaluates the requirements of the requests to be serviced and thus imposes an ordering on them.
Basically, what you already know about the scheduler and dispatcher is correct. Sometimes they are referred to as a single unit, or the (short-term) scheduler contains the dispatcher as a sub-unit, and together they are responsible for allocating a process to the CPU for execution. Sometimes they are treated as two separate units: the scheduler selects a process according to some algorithm, and the dispatcher is the software responsible for the actual context switch.
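A toy Python sketch of that division of labour (not any real OS's implementation; process "contexts" are just dicts here):

    # Toy sketch: the scheduler decides WHICH process runs next; the
    # dispatcher performs the switch (save old context, restore new one).
    from collections import deque

    class Scheduler:
        """Maintains the ready queue and applies the policy (FIFO here)."""
        def __init__(self):
            self.ready = deque()

        def admit(self, proc):
            self.ready.append(proc)

        def select_next(self):
            return self.ready.popleft() if self.ready else None

    class Dispatcher:
        """Performs the context switch between two processes."""
        def switch(self, scheduler, current, nxt):
            if current is not None:
                current["state"] = "ready"   # save the outgoing context
                scheduler.admit(current)     # ...and requeue it
            nxt["state"] = "running"         # restore the incoming context
            return nxt

    sched, disp = Scheduler(), Dispatcher()
    sched.admit({"pid": 1, "state": "ready"})
    sched.admit({"pid": 2, "state": "ready"})

    running = None
    for _ in range(3):                       # simulate a few quanta
        nxt = sched.select_next()            # scheduling decision
        running = disp.switch(sched, running, nxt)  # context switch
        print("running:", running["pid"])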
I'm doing some fill in the blanks from a sample exam for my class and I was hoping you could double check my terminology.
The various scheduling queues used by the operating system would consist of lists of processes.
Interrupt handling is the technique of periodically checking to see if a condition (such as completion of some requested I/O operation) has been met.
When the CPU is in kernel mode, a running program has access to a restricted set of CPU functionality.
The job of the CPU scheduler is to select a process on the ready queue and change its state.
The CPU normally supports a vector of interrupts so the OS can respond appropriately when some event of interest occurs in the hardware.
Using traps, a device controller can use idle time on the bus to read from or write to main memory.
During a context switch, the state of one process is copied from the CPU and saved, and the state of a different process is restored.
An operating system consists of a kernel and a collection of application programs that run as user processes and either provide OS services to the user or work in the background to keep the computer running smoothly.
There are so many terms from our chapters, I am not quite sure if I am using the correct ones.
My thoughts:
1. Processes and/or threads. Jobs and tasks aren't unheard of either. There can be other things. E.g. in MS Windows there are also Deferred Procedure Calls (DPCs) that can be queued.
2. This must be polling (a minimal sketch follows this list).
4. Why CPU scheduler? Why not just scheduler?
6. I'm not sure about traps in the hardware/bus context.
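For item 2, here is a minimal sketch of what polling looks like; the io_done check is a hypothetical stand-in for reading a device status register:

    # Polling: the CPU repeatedly re-checks a condition itself, instead of
    # letting the device raise an interrupt when it finishes.
    import random
    import time

    def io_done():
        """Hypothetical stand-in for reading a device's status register."""
        return random.random() < 0.1

    while not io_done():      # busy-wait, periodically re-checking
        time.sleep(0.001)     # brief pause between checks
    print("I/O complete")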
In a distributed system, a certain node distributes 'X' units of work equally across 'N' nodes (via socket message passing).
As we increase the number of worker nodes, each node completes its job faster, but we have to set up more connections.
In a real situation, it would be like going from 10 nodes in a Hadoop-like system, each processing 100GB, to 1,000,000 nodes, each processing 1MB.
What's the impact of setting up more connections in this case? Is this a big overhead in the poll() function?
What's the best approach?
Sounds like you will need to consult Amdahl's Law.
At least that was how I computed how many machines on a high-speed switch were optimal for my parallel computations.
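For reference, Amdahl's Law bounds the speedup from n workers when only a fraction p of the job parallelizes: speedup(n) = 1 / ((1 - p) + p / n). A quick worked example in Python (the 95% figure is just an illustrative assumption):

    # Amdahl's Law: speedup(n) = 1 / ((1 - p) + p / n), where p is the
    # parallelizable fraction of the work. p = 0.95 is an assumed value.
    def speedup(p, n):
        return 1.0 / ((1.0 - p) + p / n)

    p = 0.95
    for n in (10, 100, 1000, 1_000_000):
        print(f"{n:>9} nodes -> speedup {speedup(p, n):.2f}")
    # The speedup approaches 1 / (1 - p) = 20 no matter how many nodes are
    # added, which is why per-connection overhead eventually dominates.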
Does it have to use sockets and message passing between the Supervisor and the Workers?
You can use some type of queuing to avoid putting load onto the Supervisor, or a distributed file system similar to HDFS to distribute the tasks and collect the results.
It also depends on the number of nodes you are planning to deploy the Workers on; 1,000,000 nodes is a very big number, so in that case you'll have to distribute the tasks into multiple queues.
The thing to be careful about is what happens if all the nodes finish their tasks at the same time, so it is worth putting some variability into when they can request a new task. ZooKeeper (http://hadoop.apache.org/zookeeper/) is potentially something you can also use to synchronise the jobs.
Can you measure your network cost? The time spent doing useful work on a worker machine is only one part of the total; the message send and receive add their own cost.
Also, can you describe the big-O cost of merging each worker's result into the master's result?
Does your master round-robin over the expected responses?
btw -- if your worker nodes are finishing more quickly but underutilizing their CPU resources, you may be missing a design trade-off?
of course, you could be the rule or the exception to any law (argument / out-of-date research). ;-)
What is the difference between a thread/process/task?
Process:
A process is an instance of a computer program that is being executed.
It contains the program code and its current activity.
Depending on the operating system (OS), a process may be made up of multiple threads of execution that execute instructions concurrently.
Process-based multitasking enables you to run the Java compiler at the same time that you are using a text editor.
When employing multiple processes with a single CPU, context switching between the various memory contexts is used.
Each process has a complete set of its own variables.
Thread:
A thread is a basic unit of CPU utilization, consisting of a program counter, a stack, and a set of registers.
A thread of execution results from a fork of a computer program into two or more concurrently running tasks.
The implementation of threads and processes differs from one operating system to another, but in most cases, a thread is contained inside a process. Multiple threads can exist within the same process and share resources such as memory, while different processes do not share these resources.
An example of threads in the same process is automatic spell checking and automatic saving of a file while you are writing.
Threads are basically processes that run in the same memory context.
Threads may share the same data during execution.
[Diagram: a single-threaded vs. a multi-threaded process]
Task:
A task is a set of program instructions that are loaded in memory.
Short answer:
A thread is a scheduling concept; it's what the CPU actually 'runs' (you don't run a process). A process needs at least one thread that the CPU/OS executes.
A process is a data-organization concept. Resources (e.g., memory for holding state, an allowed address space, etc.) are allocated to a process.
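A minimal runnable sketch of the distinction, using Python's threading and multiprocessing modules: threads see the same memory, while a child process gets its own copy of the address space:

    import threading
    import multiprocessing

    counter = [0]  # state that lives in this process's memory

    def bump():
        counter[0] += 1

    if __name__ == "__main__":
        # Threads share the parent's memory: both increments are visible.
        t1 = threading.Thread(target=bump)
        t2 = threading.Thread(target=bump)
        t1.start(); t2.start(); t1.join(); t2.join()
        print("after threads:", counter[0])   # -> 2

        # A child process works on its own copy of the address space, so
        # its increment never appears in the parent.
        p = multiprocessing.Process(target=bump)
        p.start(); p.join()
        print("after process:", counter[0])   # -> still 2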
To explain in simpler terms:
Process: a process is a set of instructions (code) that operates on related data, and a process has its own states: sleeping, running, stopped, etc. When a program gets loaded into memory it becomes a process. A process with exactly one thread of execution is called a single-threaded program.
Thread: a thread is a portion of a process; more than one thread can exist within a process. A thread has its own stack and register state, but threads inside one process share the process's memory and data, so the program has to handle synchronization of its threads to achieve the desired behaviour.
Task: "task" is not a precisely used concept. When a program's instructions are loaded into memory, people call it a process or a task interchangeably; task and process are synonyms nowadays.
A process invokes or initiates a program. It is an instance of a program, and multiple instances of the same application can run at once. A thread is the smallest unit of execution and lies within a process. A process can have multiple threads running, each carrying out a task; this is what makes a multithreading environment.
A program in execution is known as a process. A program can have any number of processes. Every process has its own address space.
Threads use the address space of their process. The difference between a thread and a process is that when the CPU switches from one process to another, the current information needs to be saved in a process descriptor and the information of the new process loaded; switching from one thread to another is simpler.
A task is simply a set of instructions loaded into memory. A thread can itself split into two or more simultaneously running tasks.
For more, see: http://www.careerride.com/os-thread-process-and-task.aspx
Wikipedia sums it up quite nicely:
Threads compared with processes
Threads differ from traditional multitasking operating system processes in that:
- processes are typically independent, while threads exist as subsets of a process
- processes carry considerable state information, whereas multiple threads within a process share state as well as memory and other resources
- processes have separate address spaces, whereas threads share their address space
- processes interact only through system-provided inter-process communication mechanisms
- context switching between threads in the same process is typically faster than context switching between processes
Systems like Windows NT and OS/2 are said to have "cheap" threads and "expensive" processes; in other operating systems there is not so great a difference, except for the cost of an address-space switch, which implies a TLB flush.
Task and process are used synonymously.
From the wiki, a clear explanation:
1:1 (Kernel-level threading)
Threads created by the user are in 1-1 correspondence with schedulable entities in the kernel.[3] This is the simplest possible threading implementation. Win32 used this approach from the start. On Linux, the usual C library implements this approach (via the NPTL or older LinuxThreads). The same approach is used by Solaris, NetBSD and FreeBSD.
N:1 (User-level threading)
An N:1 model implies that all application-level threads map to a single kernel-level scheduled entity;[3] the kernel has no knowledge of the application threads. With this approach, context switching can be done very quickly and, in addition, it can be implemented even on simple kernels which do not support threading. One of the major drawbacks, however, is that it cannot benefit from the hardware acceleration on multi-threaded processors or multi-processor computers: there is never more than one thread being scheduled at the same time.[3] For example: if one of the threads needs to execute an I/O request, the whole process is blocked and the threading advantage cannot be utilized. GNU Portable Threads uses user-level threading, as does State Threads.
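A toy illustration of that N:1 drawback, using Python generators as cooperative "user threads" multiplexed on one kernel thread (not a real threading library): a single blocking call stalls every task:

    # Toy N:1 model: all "threads" are generators run by one scheduler loop
    # on ONE kernel thread, so one blocking call freezes all of them.
    import time

    def task(name):
        for i in range(3):
            print(f"{name} step {i}")
            yield                 # cooperative yield back to the scheduler

    def blocking_task():
        print("entering blocking I/O...")
        time.sleep(2)             # blocks the only kernel thread: A and B freeze
        print("...blocking I/O done")
        yield

    tasks = [task("A"), blocking_task(), task("B")]
    while tasks:                  # round-robin user-level scheduler
        for t in list(tasks):
            try:
                next(t)           # run until the task's next yield
            except StopIteration:
                tasks.remove(t)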
M:N (Hybrid threading)
M:N maps some M number of application threads onto some N number of kernel entities,[3] or "virtual processors." This is a compromise between kernel-level ("1:1") and user-level ("N:1") threading. In general, "M:N" threading systems are more complex to implement than either kernel or user threads, because changes to both kernel and user-space code are required. In the M:N implementation, the threading library is responsible for scheduling user threads on the available schedulable entities; this makes context switching of threads very fast, as it avoids system calls. However, this increases complexity and the likelihood of priority inversion, as well as suboptimal scheduling without extensive (and expensive) coordination between the userland scheduler and the kernel scheduler.