kernel: Can preemption occur while do_fork() is executing? - process

Is do_fork() safe from preemption? In other words, can the parent process allocate a new task struct and then get preempted, before getting a chance to insert the new task struct into the ready queue?

It's not safe from preemption.
The do_fork calls copy_process which in turn does the sched_fork that initializes the task. Afterwards the do_fork calls wake_up_new_task in order to put it on the run queue.
This is separated in order to be able to kill or terminate a process before being scheduled.
The sched_fork disables preemption, but enables it once its done with its work, making it possible for the kernel to preempt before calling the wake_up_new_task and putting it on the run queue.
This is based on my knowledge of the 2.6 kernel.

Related

Vulkan - How to efficiently copy data to CPU *and* wait for it

Let's say I want to execute the following commands:
cmd_buff start
dispatch (write to texture1)
copy (texture1 on gpu to buffer1 host-visible)
dispatch (write to texture2)
cmd_buff end
I'd like to know as soon as possible when buffer1's data are available.
My idea here is to have a waiting thread on which I'd wait for the copy to have completed. What I'd do is first split the above list of cmds into:
cmd_buff_1 start
dispatch (write to texture1)
copy (texture1 on gpu to buffer1 host-visible)
cmd_buff_1 end
and:
cmd_buff_2 start
dispatch (write to texture2)
cmd_buff_2 end
Now, I'd call vkQueueSubmit with cmd_buff_1 and with some fence1, followed by a call to another vkQueueSubmit with cmd_buff_2 with NULL fence.
On the waiting thread I'd call vkWaitForFences( fence1 ).
That's how I see such an operation. However, I'm wondering if that is optimal and if there was actually any way to put a direct sync still within cmd_buff_1 so that I wouldn't need to split the cmd buffer into two?
Never break up submit operations just to test fences; submit operations are too heavyweight to do that. If the CPU needs to check to see if work on the GPU has reached a specific point, there are many options other than a fence.
The simplest mechanism for something like this is to use an event. Set the event after the transfer operation, then use vkGetEventStatus on the CPU to see when it is ready. That's a polling function, so a waiting CPU thread won't immediately wake up when the data is ready (but then, there's no guarantee that would happen with a non-polling function either).
If timeline semaphores are available to you, you can wait for them to reach a particular counter value on the CPU with vkWaitSemaphores. This requires that you break the batch up into two batches, but they can both be submitted in the same submit command.

Barriers or semaphores for multiple submissions over the same queue?

In this example where we copy some buffer into vertex buffer and we want to quickly to start rendering using this buffer in two submissions without waiting over some fence:
vkBeginCommandBuffer(tansferCommandBuffer)
vkCmdCopyBuffer(tansferCommandBuffer, hostVisibleBuffer, vertexBuffer)
vkEndCommandBuffer(tansferCommandBuffer)
vkQueueSubmit(queue, tansferCommandBuffer)
vkBeginCommandBuffer(renderCommandBuffer)
...
vkCmdBindVertexBuffers(vertexBuffer)
vkCmdDraw()
...
vkEndCommandBuffer(renderCommandBuffer)
vkQueueSubmit(queue, renderCommandBuffer)
From what I understand is that tansferCommandBuffer might not have been finished when renderCommandBuffer is submitted, and renderCommandBuffer may get scheduled and reads form floating data in vertexBuffer.
We could attach a semaphore while submitting tansferCommandBuffer to be singled after completion and forward this semaphore to renderCommandBuffer to wait for before execution. The issue here is that it blocks the second batch commands that do not depend on the buffer.
Or we could insert a barrier after the copy command or before the bind vertex command, which seems to be much better since we can specify that the access to the buffer is our main concerned and possibly keep part of the batch to be executed.
Is there any good reason for using semaphores instead of barriers for similar cases (single queue, multiple submissions)?
Barriers are necessary whenever you change the way in which resources are used to inform the driver/hardware about that change. So in your example the barrier is probably needed too, nevertheless.
But as for the semaphores. When you submit a command buffer, you specify both semaphore handles and pipeline stages at which wait should occur on a corresponding semaphore. You do this through the following members of the VkSubmitInfo structure:
pWaitSemaphores is a pointer to an array of semaphores upon which to wait before the command buffers for this batch begin execution. If
semaphores to wait on are provided, they define a semaphore wait
operation.
pWaitDstStageMask is a pointer to an array of pipeline stages at which each corresponding semaphore wait will occur.
So when you submit a command buffer, all commands up to the specified stage can be executed by the hardware.

Can a process terminate after I/O without returning to the CPU?

I have a question about the following diagram from Operating Systems Concepts: http://unboltingbinary.in/wp-content/uploads/2015/04/image028.jpg
This diagram seems to imply that after every I/O operation, the process is placed back on the ready queue before being sent to the CPU again. However, is it possible for a process to terminate after I/O but before being sent to the ready queue?
Suppose we have a program that computes a number and then writes it to storage. In this case, does the process really need to return to the CPU after the I/O operation? It seems to me that the process should be allowed to terminate right after I/O. That way, there would be no need for a context switch.
Once one process has successfully executed a termination request on another, the threads of the terminated process should never run again, no matter what state they were in - blocked on I/O, blocked on inter-thread comms, running on a core, sleeping, whatever - they all must be stopped immediately if running and all be put in a state where they will never run again.
Anything else would be a security issue - terminated threads should not be given execution at all, (else it may not be possible to terminate the process).
Process termination requires the cpu. Changes to kernel mode structures on process exit, returning memory resources, etc. all require the cpu.
A process simply just does not evaporate. The term you want here is process rundown - I think.

Operating Systems - General Process Creation

Review Question
Consider the Program
#include <stdio.h>
int main(){
putchar('X');
exit(0);
}
Suppose it is compiled an an a.out file is generated. now suppose that a user in a local console window types a.out and hits the return key. what happens? be sure to describe a plausible but detailed and comprehensive sequence of operating system actions and events, not just what the user sees.
My answer
First, the shell will create a process in User Space
Then it will perform the system call 'putchar' Which simulates input, and the process will switch to kernel mode
It will then add the process (thread) to the long term scheduler where it will join the set of all processes that are ready to run
Once it is selected, it will move to the short term scheduler, where it will receive some processing time (ready -> running)
Since this process is an IO bound process, it will then head to the IO queue, where it will be stored in a buffer where it awaits execution (running -> waiting)
Once the IO is complete, the putchar call will print the X on the peripheral for which it is applied (the monitor) (waiting -> running)
Once the process returns to the short term scheduler it will again receive more processing time. Since there is nothing left to do but terminate, the process terminates (running -> terminated)
Is this valid understanding? Am I missing some critical concepts for process creation? I know it is relatively simple process, but please advise anything I am missing.
Thanks for reading, and thanks in advance for assistance.
First, the shell will create a process in User Space
// A lot of things happen before this!!
//The program will be loaded by the loaded.
//VM areas will be created for this process.
//Linking for library files will be done.
//Then a series of pagefault will occur will happen to bring your file on physical and virtual memory
Then it will perform the system call 'putchar' Which simulates input, and the process will switch to kernel mode
//putchar in not at all a system call!!!!
//putchar will call its library implementation, which will further call a write() system call and your program will get trapped inside the kernel
It will then add the process (thread) to the long term scheduler where it will join the set of all processes that are ready to run
//Totally depends upon the scheduling algorithms.. might be possible your process will be first to run!!
Once it is selected, it will move to the short term scheduler, where it will receive some processing time (ready -> running)
//Right, waiting on RunQ
Since this process is an IO bound process, it will then head to the IO queue, where it will be stored in a buffer where it awaits execution (running -> waiting)
//Sort of, it will be waiting on I/O queue, waiting for an interrupt, to write on o/p device
Once the IO is complete, the putchar call will print the X on the peripheral for which it is applied (the monitor) (waiting -> running)
//Correct
Once the process returns to the short term scheduler it will again receive more processing time. Since there is nothing left to do but terminate, the process terminates (running -> terminated)
//Before this it will again get trapped inside the kernel when your program will execute RETURN statement.
//It will call the back the startup function which was responsible for calling the main() function.
//Then startup() function will return 0 to operating system, and hence OS will kill this process and moce it to terminated state..
I still don't think its a complete version as 100's of machine instruction will be executed for this program and its difficult to pin point each and everyone..
But, still if you have some doubt post your comment!!]
Hope this will help!!!

How does a VxWorks scheduler get executed?

Would like to know how the scheduler gets called so that it can switch tasks. As in even if its preemptive scheduling or round robin scheduling - the scheduler should come in to picture to do any kind of task switching. Supposing a low priority task has an infinite loop - when does the scheduler intervene and switch to a higher priority task?
Query is:
1. Who calls the scheduler? [in VxWorks]
2. If it gets called at regular intervals - how is that mechanism implemented?
Thanks in advance.
--Ashwin
The simple answer is that vxWorks takes control through a hardware interrupt from the system timer that occurs continually at fixed intervals while the system is running.
Here's more detail:
When vxWorks starts, it configures your hardware to generate a timer interrupt every n milliseconds, where n is often 10 but completely depends on your hardware. The timer interval is generally set up by vxWorks in your Board Support Package (BSP) when it starts.
Every time the timer fires an interrupt, the system starts executing the timer interrupt handler. The timer interrupt handler is part of vxWorks, so now vxWorks has control. The first thing it does is save the CPU state (such as registers) into the Task Control Block (TCB) of the currently running task.
Then eventually vxWorks runs the scheduler to determine who runs next. To run a task, vxWorks copies the state of the task from its TCB into the machine registers, and after it does that the task has control of the CPU.
Bonus info:
vxWorks provides hooks into the task switching logic so you can have a function get called whenever your task gets preempted.
indiv provides a very good answer, but it is only partially accurate.
The actual working of the system is slightly more complex.
The scheduler can be executed as a result of either synchronous or asynchronous operations.
Synchronous refers to operations that are caused as a result of the code in the currently executing task. A prime example of this would be to take a semaphore (semTake).
If the semaphore is not available, the currently executing task will pend and no longer be available to execute. At this point, the scheduler will be invoked and determine the next task that should execute and will perform a context switch.
Asynchronous operations essentially refer to interrupts. Timer interrupts were very well described by indiv. However, a number of different elements could cause an interrupt to execute: network traffic, sensor, serial data, etc...
It is also good to remember that the timer interrupt does not necessarily cause a context switch! Yes, the interrupt will occur, and delayed task and the time slice counters will be decremented. However, if the time slice is not expired, or no higher priority task transitions from the pended to the ready state, then the scheduler will not actually be invoked, and you will return back to the original task, at the exact point where execution was interrupted.
Note that the scheduler does not have its own context; it is not a task. It is simply code that executes in whatever context it is invoked from. Either from the interrupt context (asynchronous) or from the invoking task context (synchronous).
Unless you have a majorily-customized target build, the scheduler is invoked by the Timer interrupt. Details are platform-specific, though.
The scheduler also gets invoked if current task gets completed or blocks.