Look at the non-preemptive SJF table below, for example.
Suppose the burst time of process P1 were very large compared to 7; then processes P2, P3, and P4 would have to wait a long time until P1 frees the CPU. No book or article that I've read mentions SJF being subject to the convoy effect. Why not?
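Since the original table isn't reproduced here, the following minimal Python sketch uses assumed arrival and burst times (P1 = 7, in line with the question) and simulates non-preemptive SJF; inflating P1's burst shows the waiting-time blow-up described above:

```python
# Minimal non-preemptive SJF simulation. The arrival and burst times are
# assumed (the original table isn't reproduced here): P1 arrives first,
# so the shorter jobs that arrive while it runs must wait behind it.
def sjf_schedule(procs):
    """procs: list of (name, arrival, burst); returns {name: waiting_time}."""
    time, waiting = 0, {}
    pending = sorted(procs, key=lambda p: p[1])   # order by arrival
    while pending:
        ready = [p for p in pending if p[1] <= time]
        if not ready:                  # CPU idle until the next arrival
            time = pending[0][1]
            continue
        job = min(ready, key=lambda p: p[2])      # shortest burst first
        name, arrival, burst = job
        waiting[name] = time - arrival            # time spent in the ready queue
        time += burst                             # non-preemptive: run to completion
        pending.remove(job)
    return waiting

for p1_burst in (7, 700):   # inflate P1's burst to provoke the convoy
    table = [("P1", 0, p1_burst), ("P2", 2, 4), ("P3", 4, 1), ("P4", 5, 4)]
    print(p1_burst, sjf_schedule(table))
# 7   -> {'P1': 0, 'P3': 3, 'P2': 6, 'P4': 7}
# 700 -> {'P1': 0, 'P3': 696, 'P2': 699, 'P4': 700}
```

Nothing in non-preemptive SJF lets the short jobs jump ahead once P1 holds the CPU, which is exactly the convoy behaviour the question describes.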
Is there a way to give an NP-completeness proof of the following problem?
Given a set of N 0-1 vectors (which can be considered as jobs), every job has a number of tasks to be processed (example: v_1 = (0,1,1,0) means this job needs tasks 2 and 3). In addition, we have an integer T. The idea is to partition all vectors into T separate sets A_i, i = 1, ..., T, subject to some constraints:
- once a job in a set A_i is started, it must be finished before the next job in the same set starts;
- the cost of each task is 1, but if a task j is shared between two or more jobs at the same time, then its cost is counted only once.
The objective is to minimize this total cost. In other words, find a partition A_1, ..., A_T, together with an execution order for the jobs in each set A_i, so as to match up identical tasks.
This problem looks a bit like multiprocessor scheduling without those additional constraints, but I couldn't find a formal proof.
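To pin down the cost function, here is a brute-force Python sketch for tiny instances. It encodes one possible reading of the problem (an assumption on my part: every job takes one unit of time and the T sets advance in lockstep, so jobs at the same position in different sets run simultaneously); the data is made up:

```python
from itertools import permutations, product

jobs = [(0, 1, 1, 0), (0, 1, 0, 1), (1, 1, 0, 0), (0, 1, 1, 0)]  # made-up data
T = 2

def partitions(items, bins):
    """Yield all ways to split items into `bins` ordered lists."""
    if not items:
        yield [[] for _ in range(bins)]
        return
    head, tail = items[0], items[1:]
    for p in partitions(tail, bins):
        for i in range(bins):
            yield p[:i] + [p[i] + [head]] + p[i + 1:]

def cost(sequences):
    """Total cost when the per-set job orders run in lockstep."""
    horizon = max(len(s) for s in sequences)
    total = 0
    for t in range(horizon):
        running = [s[t] for s in sequences if t < len(s)]
        # a task shared by several simultaneous jobs is paid for only once
        total += sum(1 for task in zip(*running) if any(task))
    return total

best = min(
    cost(orders)
    for part in partitions(jobs, T)
    for orders in product(*(permutations(s) for s in part))
)
print(best)
```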
Thank you in advance.
Question
A scheduler attempts to share the CPU between multiple processes. Two
processes, P1 and P2, are running. P1 does many I/O operations, while P2 does
very few.
Explain what happens if a simple ‘round robin’ pre-emptive scheduling
algorithm is used to schedule P1 and P2.
My Attempt
From my understanding, a scheduler is said to be pre-emptive when it can be invoked by an interrupt, move the running process into another state, and move another process into the running state. Round robin means that each process, P1 and P2, gets an equal share of CPU time. But if P1 performs many I/O operations while P2 performs fewer, wouldn't P1 get more time on the CPU, since it has many more operations? For example, if each process were given 1 second at a time, and P1 had to perform 50 I/O operations (each taking 1 second, for simplicity) while P2 had to perform 3, would I be correct in assuming the order would go P1, P2, P1, P2, P1, P2, P1, P1, ... (continuing with P1 until its operations are complete)?
That is my understanding; hopefully some of you can provide more insight. Thank you.
Your understanding is pretty close to the mark.
Round robin means that the scheduler picks each process in turn. So if there are only two processes, the scheduler will pick one and then the other (assuming both are ready).
As to your first question, process P2 actually gets more CPU time. Here is an example where P1 is scheduled first and does an I/O after 0.5 seconds:

Time (s)   Event
0.0        P1 starts
0.5        P1 does I/O; P2 is scheduled
1.5        P2's time is up; P1 is scheduled because its I/O has finished
2.0        P1 does I/O; P2 is scheduled
3.0        P2's time is up; P1 is scheduled because its I/O has finished

Total P1 CPU time: 1 second
Total P2 CPU time: 2 seconds
You can see that because P1 does more I/O, it gets less total CPU time: the scheduler doesn't take into account that P1 never uses its full time slice.
If both P1 and P2 do I/O, the schedule will still be:
P1, P2, P1, P2, P1, P2, etc.
because if P1 yields the CPU, P2 is ready and vice versa.
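To make that timeline concrete, here is a small Python sketch of the same scenario; the 1-second quantum, the 0.5-second compute burst, and the 1-second I/O latency are all assumptions carried over from the example above:

```python
# Round-robin sketch of the timeline above. Assumptions from the example:
# a 1-second quantum, P1 computes for 0.5 s and then blocks on a 1-second
# I/O, P2 never blocks.
QUANTUM, P1_RUN, IO_LATENCY, HORIZON = 1.0, 0.5, 1.0, 3.0

clock = 0.0
cpu = {"P1": 0.0, "P2": 0.0}
p1_ready_at = 0.0            # when P1's outstanding I/O completes

while clock < HORIZON:
    if p1_ready_at <= clock:
        # P1 is runnable: it computes briefly, then yields for I/O
        cpu["P1"] += P1_RUN
        clock += P1_RUN
        p1_ready_at = clock + IO_LATENCY
        print(f"{clock:.1f}s  P1 does I/O; P2 is scheduled")
    else:
        # P2 burns its whole quantum while P1 waits on I/O
        cpu["P2"] += QUANTUM
        clock += QUANTUM
        print(f"{clock:.1f}s  P2's time is up; P1 is scheduled")

print(cpu)   # {'P1': 1.0, 'P2': 2.0} -- P2 ends up with twice the CPU
```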
Assuming you are on a Linux system, looking at /proc/sched_debug will give you a lot of info (average time, wait time) on the scheduler details as well as on individual processes (number of involuntary switches, etc.).
You might also be interested in Tuning the Task Scheduler.
I am new to OptaPlanner, and right now I am focusing on understanding the project job scheduling example. I am trying to run this example using the sample data from the OptaPlanner manual, as in the picture below:
I have some questions about the domain classes in this example:
What is the difference between GlobalResource and LocalResource? In the example, all the resources are GlobalResources, right? Then what is the use of LocalResource?
There are 3 JobTypes: SOURCE, STANDARD and SINK. What does each one mean? Does SOURCE mean the job should be the first to start, before the others? Does STANDARD mean it should run after its predecessor job has finished, but not after the SINK job? Does SINK mean it is the last job, run after all the other jobs have finished?
What is the meaning of the properties releaseDate and criticalPathDuration in the Project class? Relating them to the picture above, what are their values for the projects Book1 and Book2?
What is the meaning of requirement in ResourceRequirement?
I would be really thankful if someone could help me create XML sample data like that in the OptaPlanner distribution, as it would help me understand this example much faster. Thanks & regards.
A LocalResource belongs to a specific Project, while a GlobalResource is shared between projects.
So a LocalResource only has to worry about being used by other jobs in the same Project, while a GlobalResource has to worry about jobs across all projects.
That's an implementation trick. The source and sink jobs are basically dummies. Because a project might start with multiple jobs in parallel, a SOURCE job is put in front of them to provide a single root. The same holds for the end: a project can end with multiple jobs, so a SINK job is put after them to provide a single tail. This makes it easier and faster to determine the makespan, etc.
IIRC, releaseDate is the first date on which we are allowed to start the first job. For example: you have to create a book, but you'll only get the actual final content next Monday, so the releaseDate is next Monday (you can't start any work before that date).
The criticalPathDuration is a theoretical minimum duration (if we can happily ignore resources, IIRC). For example: if job A takes 5 days, job B takes 2 days, and B has to be done AFTER A, then the critical path duration is 7 days. Adding a job C which takes 1 day and can be done in parallel with the others doesn't affect that.
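As an illustration of that definition (a sketch, not OptaPlanner's actual implementation), the critical path duration is just the longest path through the job-precedence DAG when resources are ignored; the job names and durations below are the ones from the example:

```python
from functools import lru_cache

# Durations and precedences from the example above: B must run after A,
# C is independent and can run in parallel.
durations = {"A": 5, "B": 2, "C": 1}
predecessors = {"A": [], "B": ["A"], "C": []}

@lru_cache(maxsize=None)
def earliest_finish(job):
    # A job can finish once its longest chain of predecessors has finished.
    start = max((earliest_finish(p) for p in predecessors[job]), default=0)
    return start + durations[job]

critical_path_duration = max(earliest_finish(j) for j in durations)
print(critical_path_duration)   # 7 (A -> B); C in parallel doesn't change it
```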
ResourceRequirement is the many-to-many relationship between ExecutionMode and Resource. Remember that an ExecutionMode belongs to a specific Job. For example: doing job A in executionMode A1 requires 1 laborer and 5 days, while doing job A in executionMode A2 requires 2 laborers and 3 days.
I just wrote my first Hadoop job. It processes many files and generates multiple output files for each input file. I am running it on a two-node cluster, and it takes about 10 minutes for my largest input set. Looking at the counters below, what optimizations can I make to run it faster? Are there any specific indicators one should look for in these counters?
Version: 2.0.0-mr1-cdh4.1.2
Map task capacity: 20
Reduce task capacity: 20
Avg tasks per node: 20
We can see here that most of the data reduction happens in the map phase (map output bytes are much lower than HDFS bytes read, and likewise map output records are much lower than map input records). We also see that a lot of CPU time is spent, and that the number of shuffled bytes is low.
So this job:
a) does most of its data reduction in the map phase, and
b) is CPU bound.
So I think the code of the mapper and reducer should be optimized. I/O is probably not important for this job.
My query is regarding engineering the priority value of a process. In my system, process A is running under SCHED_RR at priority 83. Now I have another process B, also SCHED_RR, and I want B's priority to be higher than A's (i.e., I want B always to be scheduled in preference to A).
To do this, what value should I choose for B? I have read in the code that there is a penalty/bonus of 5 depending on a process's history.
Also, if I choose the value 84 or 85, is there any chance that in some situations my process is ignored?
Please help me engineer this value.
Now I get it: real-time tasks (SCHED_FIFO/SCHED_RR) are not governed by the penalty/bonus rules. With the O(1) scheduler, the task with the higher priority is chosen, so in my case process B will always be scheduled when its priority is greater than process A's.
The penalty/bonus applies only to SCHED_OTHER/SCHED_NORMAL tasks.
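For reference, here is a minimal sketch (assuming a Linux box and sufficient privileges, e.g. root or CAP_SYS_NICE) that puts the current process into SCHED_RR at priority 85, using Python's wrappers around the sched_setscheduler(2) syscall:

```python
import os

# Put this process into SCHED_RR at priority 85. For real-time classes
# the higher priority simply wins; the penalty/bonus heuristics only
# apply to SCHED_OTHER/SCHED_NORMAL tasks.
param = os.sched_param(85)
os.sched_setscheduler(0, os.SCHED_RR, param)   # pid 0 = calling process

# Verify the policy and priority took effect.
print(os.sched_getscheduler(0) == os.SCHED_RR,
      os.sched_getparam(0).sched_priority)     # True 85
```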