Processes sharing a CPU (scheduler) - process

Question
A scheduler attempts to share the CPU between multiple processes. Two
processes, P1 and P2, are running. P1 does many I/O operations, while P2 does
very few.
Explain what happens if a simple ‘round robin’ pre-emptive scheduling
algorithm is used to schedule P1 and P2.
My Attempt
From my understanding, a scheduler is said to be pre-emptive when it can be invoked by an interrupt, move a process from the running state to another state, and then move another process into the running state. Round-robin means that each process, P1 and P2, would get equal time with the CPU, but if P1 is performing many I/O operations while P2 performs fewer, wouldn't P1 get more time with the CPU as it has many more operations? If each process were given, for example, 1 second, and P1 had to perform 50 I/O operations (each taking 1 second, for simplicity) while P2 had to perform 3 I/O operations, would I be correct in assuming that the order would go: P1, P2, P1, P2, P1, P2, P1, P1 (continuing with P1 until its operations are complete)?
That is my understanding; hopefully some of you can provide more insight. Thank you.

Your understanding is pretty close to the mark.
Round robin means that the scheduler picks each process in turn. So if there are only two processes, the scheduler will pick one and then the other (assuming both are ready).
As to your first question, process P2 actually gets more CPU time. Here is an example, with a 1-second time slice, where P1 is scheduled first and does an I/O after 0.5 seconds:
Time (s)  Event
0.0       P1 starts
0.5       P1 does I/O; P2 is scheduled
1.5       P2's time is up; P1 is scheduled because its I/O has finished
2.0       P1 does I/O; P2 is scheduled
3.0       P2's time is up; P1 is scheduled because its I/O has finished
Total P1 time: 1 second
Total P2 time: 2 seconds
You can see that P1, because it does more I/O, gets less total CPU time: the scheduler doesn't take into account the fact that P1 doesn't use all of its allocated time.
If both P1 and P2 do I/O, the schedule will still be:
P1, P2, P1, P2, P1, P2, etc.
because if P1 yields the CPU, P2 is ready and vice versa.
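To check the timeline above, here is a minimal round-robin simulation sketch. It assumes a 1-second quantum, that P1 runs 0.5 s before every I/O and each I/O takes 1 s, that P2 never blocks, and that a process whose I/O completes at the same instant a quantum expires is queued ahead of the preempted process. The names `Proc` and `round_robin` are hypothetical, chosen for illustration only.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False: compare processes by identity
class Proc:
    name: str
    cpu_burst: float        # CPU time used before blocking for I/O (inf = never blocks)
    io_time: float          # duration of each I/O
    remaining: float = 0.0  # CPU time left in the current burst
    cpu_used: float = 0.0   # total CPU time received so far


def round_robin(procs, quantum, until):
    """Simulate pre-emptive round robin; return (start, end, name) CPU slices."""
    for p in procs:
        p.remaining = p.cpu_burst
    ready = deque(procs)
    blocked = []   # (wake_time, proc) for processes waiting on I/O
    timeline = []
    now = 0.0

    def wake_up(t):
        # Move every process whose I/O has finished by time t to the ready queue.
        for item in sorted(blocked, key=lambda x: x[0]):
            if item[0] <= t:
                blocked.remove(item)
                ready.append(item[1])

    while now < until:
        wake_up(now)
        if not ready:
            now = min(w for w, _ in blocked)  # CPU idle until the next I/O completes
            continue
        p = ready.popleft()
        run = min(quantum, p.remaining, until - now)
        timeline.append((now, now + run, p.name))
        p.cpu_used += run
        p.remaining -= run
        now += run
        wake_up(now)  # I/O completions at this instant queue ahead of p
        if p.remaining <= 0:
            p.remaining = p.cpu_burst              # burst done: block for I/O
            blocked.append((now + p.io_time, p))
        else:
            ready.append(p)                        # quantum expired: back of the queue
    return timeline


p1 = Proc("P1", cpu_burst=0.5, io_time=1.0)
p2 = Proc("P2", cpu_burst=float("inf"), io_time=0.0)
timeline = round_robin([p1, p2], quantum=1.0, until=3.0)
for start, end, name in timeline:
    print(f"{start:4.1f}-{end:4.1f} {name}")
print(p1.cpu_used, p2.cpu_used)  # 1.0 2.0
```

The schedule alternates P1, P2, P1, P2, and P2 ends up with twice P1's CPU time, because plain round robin gives the unused part of P1's slice to nobody in particular; P2 simply gets its full slice every turn.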

Assuming you are on a Linux system, /proc/sched_debug will give you a lot of information about the scheduler (average time, wait time) as well as about individual processes (number of non-voluntary switches, etc.).
You might also be interested in Tuning the Task Scheduler.
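For a per-process view of the same idea, here is a small sketch that reads the context-switch counters from `/proc/<pid>/status` (the `voluntary_ctxt_switches` and `nonvoluntary_ctxt_switches` fields) rather than parsing /proc/sched_debug, which may live under debugfs on recent kernels. An I/O-bound process like P1 would typically accumulate mostly voluntary switches (it blocks), while a CPU-bound one like P2 accumulates non-voluntary switches (it is preempted when its slice ends). The function names are hypothetical.

```python
import os


def ctx_switches(status_text):
    """Extract the context-switch counters from the text of /proc/<pid>/status."""
    counts = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches"):
            counts[key] = int(value.strip())
    return counts


def read_ctx_switches(pid="self"):
    """Read the counters for a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return ctx_switches(f.read())


if os.path.exists("/proc/self/status"):
    print(read_ctx_switches())
```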

Related

How are reserved slots re-allocated between reservation/projects if idle slots are used?

The documentation on Introduction to Reservations: Idle Slots states that idle slots from reservations can be used by other reservations if required:
By default, queries running in a reservation automatically use idle slots from other reservations. That means a job can always run as long as there's capacity. Idle capacity is immediately preemptible back to the original assigned reservation as needed, regardless of the priority of the query that needs the resources. This happens automatically in real time.
However, I'm wondering whether this can have a negative effect on other reservations in a scenario where idle slots are used but are needed shortly afterwards by the "owning" reservation.
To be concrete, I would like to understand whether I can regard assigned slots as a guarantee or only as best effort.
Example:
Reserved slots: 100
Reservation A: 50 Slots
Reservation B: 50 Slots
"A" starts a query at 14:00:00 and the computation takes 300 seconds if 100 slots are used.
All slots are idle at the start of the query, thus all 100 slots are made available to A.
5 seconds later at 14:00:05 "B" starts a query that takes 30 seconds if 50 slots are used.
Note:
For the sake of simplicity, let's assume that both queries have exactly 1 stage and that each unit of computation ("job") in the stage takes the full time of the query, i.e. the stage is divided into 100 jobs, and once a slot starts a computation it takes the full 300 seconds to finish successfully.
I'm fairly certain that with multiple stages or shorter computation times (e.g. if the computation can be broken down into 1000 jobs), GBQ would be smart enough to dynamically reassign each freed-up slot to the reservation it belongs to.
Questions:
does "B" now have to wait until a slot in "A" finishes?
this would mean ~5 min waiting time
I'm not sure how "realistic" the 5 min are, but I feel this is an important variable since I wouldn't worry about a couple of seconds - but I would worry about a couple of minutes!
or might an already started computation of "A" also be killed mid-flight?
the documentation Introduction to Reservations: Slot Scheduling seems to suggest something like this:
The goal of the scheduler is to find a medium between being too aggressive with evicting running tasks (which results in wasting slot time) and being too lenient (which results in jobs with long running tasks getting a disproportionate share of the slot time).
Answer via Reddit
A stage may run for quite some time (minutes, even hours in really bad cases), but a stage is run by many workers, and most workers complete their work within a very short time, e.g. milliseconds or seconds. Hence rebalancing, i.e. reallocating slots from one job to another, is very fast.
So if a rebalancing happens and a job loses a large part of slots, then it will run a lot slower. And the one that gains slots will run fast. And this change is quick.
So in the above example, as job B starts 5 seconds in, within a second or so it would have acquired most of its slots.
So bottom line:
a query is broken up into "a lot" of units of work
each unit of work finishes pretty fast
this gives GBQ the opportunity to reassign slots
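The reasoning above can be illustrated with a toy model. This is a hypothetical sketch, not BigQuery's actual slot scheduler: it assumes running work units are never killed, that each slot runs back-to-back units of a fixed length, and that each time a unit finishes after B arrives, that slot is handed to B. It compares the question's worst case (100 units of 300 s, all started at 14:00:00) with short 1-second units whose start times are assumed to be staggered across the slots. The function name `time_until_reclaimed` is made up for illustration.

```python
import math


def time_until_reclaimed(unit_seconds, offsets, arrival, needed):
    """Toy model: slot i started its first work unit at offsets[i] and runs
    back-to-back units of unit_seconds. A unit is never interrupted; the first
    unit that finishes at or after `arrival` frees the slot for the newcomer.
    Returns the time at which the newcomer holds `needed` slots."""
    handovers = []
    for off in offsets:
        # smallest k >= 1 with off + k * unit_seconds >= arrival
        k = max(1, math.ceil((arrival - off) / unit_seconds))
        handovers.append(off + k * unit_seconds)
    handovers.sort()
    return handovers[needed - 1]


# Question's scenario: 100 slots all start a 300 s unit at t=0, B arrives at t=5.
long_wait = time_until_reclaimed(300, [0] * 100, arrival=5, needed=50) - 5
# Assumed alternative: 1 s units, slot start times staggered by 10 ms.
short_wait = time_until_reclaimed(1, [i / 100 for i in range(100)], arrival=5, needed=50) - 5
print(long_wait)             # 295
print(round(short_wait, 2))  # 0.49
```

In this model B's wait is bounded by the length of a single work unit, which is why many short units let slots move back to the owning reservation almost immediately, while the single 300-second-unit case gives the ~5 minute wait the question worries about.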

Does G1 collect all (both eden and survivor) or only part of the garbage in a young GC?

I am curious whether G1 will choose only part of the young regions to collect in order to reach the target GC pause time.
And what is the real meaning of the InitiatingHeapOccupancyPercent parameter?
In G1 GC, the heap is divided into regions. A young GC always evacuates all young regions (eden and survivor); to meet the pause-time target, G1 changes how many regions make up the young generation between collections, which is why you can observe the young generation size varying in the logs. InitiatingHeapOccupancyPercent indicates when to start a concurrent marking cycle. By default it's 45, which means that once the heap is 45% full, a marking cycle kicks in.
Reference: https://www.oracle.com/technetwork/tutorials/tutorials-1876574.html
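As a quick worked example of the threshold arithmetic, the sketch below computes the heap occupancy at which marking would start for a fixed `-XX:InitiatingHeapOccupancyPercent` value. The 4096 MB heap is an assumed example size, and `marking_threshold_mb` is a hypothetical helper. Note that since JDK 9 G1 adapts this threshold at run time by default (`G1UseAdaptiveIHOP`), using the flag as a starting value, so the fixed computation matches behavior mainly when adaptive IHOP is disabled or on older JDKs. The pause-time target itself is set separately with `-XX:MaxGCPauseMillis` (200 ms by default).

```python
def marking_threshold_mb(heap_mb, ihop_percent=45):
    """Heap occupancy (MB) at which G1 would start a concurrent marking
    cycle under a fixed InitiatingHeapOccupancyPercent."""
    return heap_mb * ihop_percent / 100


print(marking_threshold_mb(4096))  # 1843.2
```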

Can Shortest Job First Scheduling be subject to convoy effect?

Look at the table below, for example:
[Table: Non-Preemptive SJF]
Suppose the burst time of process P1 were a very large number compared to 7: processes P3, P2 and P4 would have to wait a long time until P1 frees the CPU. No book or article that I've read mentions SJF being subject to the convoy effect. Why not?
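The scenario in the question can be reproduced with a short non-preemptive SJF sketch. The arrival and burst values below are hypothetical (the question's table is not reproduced here); only P1's burst of 7 comes from the question. Ties in burst length are broken by earlier arrival, which is an assumption. Running the same workload with P1's burst raised from 7 to 100 shows the short jobs queuing behind the long one.

```python
def sjf_nonpreemptive(procs):
    """procs: list of (name, arrival, burst).
    Returns {name: (start, finish, waiting)} under non-preemptive SJF."""
    pending = sorted(procs, key=lambda p: p[1])  # ordered by arrival time
    now = 0
    result = {}
    while pending:
        arrived = [p for p in pending if p[1] <= now]
        if not arrived:
            now = pending[0][1]  # CPU idle until the next arrival
            continue
        # Shortest burst first; ties broken by earlier arrival.
        job = min(arrived, key=lambda p: (p[2], p[1]))
        pending.remove(job)
        name, arrival, burst = job
        result[name] = (now, now + burst, now - arrival)
        now += burst  # runs to completion: no preemption
    return result


def average_wait(schedule):
    return sum(w for _, _, w in schedule.values()) / len(schedule)


# Hypothetical workload: (name, arrival, burst)
base = [("P1", 0, 7), ("P2", 2, 4), ("P3", 4, 1), ("P4", 5, 4)]
long_first = [("P1", 0, 100)] + base[1:]
print(average_wait(sjf_nonpreemptive(base)))        # 4.0
print(average_wait(sjf_nonpreemptive(long_first)))  # 73.75
```

Because P1 is the only job present at t=0, SJF has nothing shorter to pick, and once P1 is running the non-preemptive policy cannot take the CPU back, so every later arrival waits for the whole burst.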

How to calculate average waiting time in preemptive priority scheduling

Given the following table:
I want to calculate the average waiting time of preemptive priority scheduling.
In the table above, the bigger the number (in the priority column), the higher the priority.
Partial solution:
| p1 | p3 | p1 | p2 | p5 | p4 |
0    8    29   33   52   67   80
What do I do from here?
Thanks
Did you understand the partial solution (the actual schedule) you have posted?
Waiting time of a process = finish time of that process - execution time - arrival time
Once you have this for all processes, just take the average. That gives you the average waiting time of the scheduling algorithm for this instance.
More details:
Here process p1 did not wait during the first 8 seconds. Then at t=8 it was preempted, and it was in wait mode from t=8 to t=29 while process p3 was executing, so p1 waited for 21 seconds while p3 ran. At t=29, p1 started again and completed at t=33. In total, p1 waited for 21 seconds. Its execution time is 8 + 4 = 12 seconds, so as per the formula we get a waiting time of 33 - 12 - 0 = 21. Basically, for each process we look at the interval between when it arrived and when it finished; any time within this interval when it is not executing is wait time.
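The formula can be applied mechanically to the Gantt chart from the partial solution. In this sketch (hypothetical helper names), finish time and total execution time are read off the chart, and waiting time is computed only for processes whose arrival time is supplied; only p1's arrival time (0) is known from the answer, so the other arrival times must come from the question's table.

```python
def finish_and_burst(gantt):
    """gantt: list of (name, start, end) slices.
    Returns {name: (finish_time, total_execution_time)}."""
    info = {}
    for name, start, end in gantt:
        finish, cpu = info.get(name, (0, 0))
        info[name] = (max(finish, end), cpu + (end - start))
    return info


def waiting_times(gantt, arrivals):
    """Waiting time = finish time - execution time - arrival time."""
    info = finish_and_burst(gantt)
    return {name: info[name][0] - info[name][1] - arrival
            for name, arrival in arrivals.items()}


# Slices from the partial solution.
gantt = [("p1", 0, 8), ("p3", 8, 29), ("p1", 29, 33),
         ("p2", 33, 52), ("p5", 52, 67), ("p4", 67, 80)]
print(finish_and_burst(gantt)["p1"])    # (33, 12)
print(waiting_times(gantt, {"p1": 0}))  # {'p1': 21}
```

Once every arrival time from the table is filled in, the average waiting time is `sum(w.values()) / len(w)` for the returned dictionary `w`.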

Changing real-time process priority in Linux?

My query is about choosing the priority value of a process. On my system, process A is running under RR at priority 83. Now I have another process B under RR, and I want B's priority to be higher than A's (i.e. I want B always to be scheduled in preference to A).
What value should I choose for B? I have read in the code that there is a penalty/bonus of 5 depending on a process's history.
Also, if I choose 84 or 85, is there any chance that in some situations my process is ignored?
Please help me choose this value.
Now I got it. Real-time tasks (SCHED_FIFO/SCHED_RR) are not governed by the penalty/bonus rules. With the O(1) scheduler, the task with the higher priority is chosen; in my case, process B will be scheduled if its priority is greater than process A's.
The penalty/bonus applies to SCHED_OTHER/SCHED_NORMAL tasks.
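Since real-time priorities get no bonus or penalty, any value strictly greater than A's is enough, so 84 already wins. The sketch below picks that value and applies it with Python's `os.sched_setscheduler`, assuming 83 is the `sched_priority` value as passed to `sched_setscheduler` (where a higher number means a higher priority, typically 1..99 on Linux). The helpers `priority_above` and `set_rr_priority` are hypothetical; applying the policy needs root or CAP_SYS_NICE and is not done automatically here.

```python
import os


def priority_above(other, max_prio=99):
    """Smallest SCHED_RR priority strictly higher than `other`.
    max_prio=99 assumes the usual Linux real-time range (1..99)."""
    if other >= max_prio:
        raise ValueError("no higher real-time priority available")
    return other + 1


def rr_priority_range():
    """(min, max) SCHED_RR priority as reported by the OS (Unix only)."""
    return (os.sched_get_priority_min(os.SCHED_RR),
            os.sched_get_priority_max(os.SCHED_RR))


def set_rr_priority(pid, prio):
    """Switch `pid` (0 = calling process) to SCHED_RR at `prio`.
    Linux only; requires root or CAP_SYS_NICE."""
    os.sched_setscheduler(pid, os.SCHED_RR, os.sched_param(prio))


b_prio = priority_above(83)  # 84
print(b_prio)
```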