After a process calls syscall wait(), who will wake it up? - process

I have a general idea that a process can be in ready_queue where CPU selects candidate to run next. And there are these other queues on which a process waits for (broadly speaking) events. I know from OS courses long time ago that there are wait queues for IO and interrupts. My questions are:
There are many events a process can wait on. Is there a wait queue corresponding to each such event?
Are these wait queues created/destroyed dynamically? If so, which kernel module is responsible for managing these queues? The scheduler? Are there any predefined queues that will always exist?
To eventually get a waiting process off a wait queue, does the kernel have a way of mapping from each actual event (either hardware or software) to the wait queue, and then remove ALL processes on that queue? If so, what mechanisms does a kernel employ?
To give an example:
....
pid = fork();
if (pid == 0) { // child process
// Do something for a second;
}
else { // parent process
wait(NULL);
printf("Child completed.");
}
....
wait(NULL) is a blocking system call. I want to know the rest of the journey the parent process goes through. My take of the story line is as follows, PLEASE correct me if I miss crucial steps or if I am completely wrong:
Normal system call setup through libc runtime. Now parent process is in kernel mode, ready to execute whatever is in wait() syscall.
wait(NULL) creates a wait queue where the kernel can later find this queue.
wait(NULL) puts the parent process onto this queue, creates an entry in some map that says "If I (the kernel) ever receives an software interrupt, signal, or whatever that indicates that the child process is finished, scheduler should come look at this wait queue".
Child process finishes and kernel somehow noticed this fact. Kernel context switches to scheduler, which looks up in the map to find the wait queue where the parent process is on.
Scheduler moves the parent process to ready queue, does its magic and sometime later the parent process is finally selected to run.
Parent process is still in kernel mode, inside wait(NULL) syscall. Now the main job of rest of the syscall is to exit kernel mode and eventually return the parent process to user land.
The process continues its journey on the next instruction, and may later be waiting on other wait queues until it finishes.
PS: I am hoping to know the inner workings of the OS kernel, what stages a process goes through in the kernel and how the kernel interact and manipulate these processes. I do know the semantics and the contract of the wait() Syscall APIs and that is not what I want to know from this question.

Let's explore the kernel sources. First of all, it seems all the
various wait routines (wait, waitid, waitpid, wait3, wait4) end up in the
same system call, wait4. These days you can find system calls in the
kernel by looking for the macros SYSCALL_DEFINE1 and so, where the number
is the number of parameters, which for wait4 is coincidentally 4. Using the
google-based freetext search in the Free Electrons Linux Cross
Reference we eventually find the definition:
1674 SYSCALL_DEFINE4(wait4, pid_t, upid, int __user *, stat_addr,
1675 int, options, struct rusage __user *, ru)
Here the macro seems to split each parameter into its type and name. This
wait4 routine does some parameter checking, copies them into a wait_opts
structure, and calls do_wait(), which is a few lines up in the same file:
1677 struct wait_opts wo;
1705 ret = do_wait(&wo);
1551 static long do_wait(struct wait_opts *wo)
(I'm missing out lines in these excerpts as you can tell by the
non-consecutive line numbers).
do_wait() sets another field of the structure to the name of a function,
child_wait_callback() which is a few lines up in the same file. Another
field is set to current. This is a major "global" that points to
information held about the current task:
1558 init_waitqueue_func_entry(&wo->child_wait, child_wait_callback);
1559 wo->child_wait.private = current;
The structure is then added to a queue specifically designed for a process
to wait for SIGCHLD signals, current->signal->wait_chldexit:
1560 add_wait_queue(&current->signal->wait_chldexit, &wo->child_wait);
Let's look at current. It is quite hard to find its definition as it
varies per architecture, and following it to find the final structure is a
bit of a rabbit warren. Eg current.h
6 #define get_current() (current_thread_info()->task)
7 #define current get_current()
then thread_info.h
163 static inline struct thread_info *current_thread_info(void)
165 return (struct thread_info *)(current_top_of_stack() - THREAD_SIZE);
55 struct thread_info {
56 struct task_struct *task; /* main task structure */
So current points to a task_struct, which we find in sched.h
1460 struct task_struct {
1461 volatile long state; /* -1 unrunnable, 0 runnable, >0 stopped */
1659 /* signal handlers */
1660 struct signal_struct *signal;
So we have found current->signal out of current->signal->wait_chldexit,
and the struct signal_struct is in the same file:
670 struct signal_struct {
677 wait_queue_head_t wait_chldexit; /* for wait4() */
So the add_wait_queue() call we had got to above refers to this
wait_chldexit structure of type wait_queue_head_t.
A wait queue is simply an initially empty, doubly-linked list of structures that contain a
struct list_head types.h
184 struct list_head {
185 struct list_head *next, *prev;
186 };
The call add_wait_queue()
wait.c
temporarily locks the structure and via an inline function
wait.h
calls list_add() which you can find in
list.h.
This sets the next and prev pointers appropriately to add the new item on
the list.
An empty list has the two pointers pointing at the list_head structure.
After adding the new entry to the list, the wait4() system call sets a
flag that will remove the process from the runnable queue on the next
reschedule and calls do_wait_thread():
1573 set_current_state(TASK_INTERRUPTIBLE);
1577 retval = do_wait_thread(wo, tsk);
This routine calls wait_consider_task() for each child of the process:
1501 static int do_wait_thread(struct wait_opts *wo, struct task_struct *tsk)
1505 list_for_each_entry(p, &tsk->children, sibling) {
1506 int ret = wait_consider_task(wo, 0, p);
which goes very deep but in fact is just trying to see if any child already
satisfies the syscall, and we can return with the data immediately. The
interesting case for you is when nothing is found, but there are still running
children. We end up calling schedule(), which is when the process gives
up the cpu and our system call "hangs" for a future event.
1594 if (!signal_pending(current)) {
1595 schedule();
1596 goto repeat;
1597 }
When the process is woken up, it will continue with the code after
schedule() and again go through all the children to see if the wait
condition is satisfied, and probably return to the caller.
What wakes up the process to do that? A child dies and generates a SIGCHLD
signal.
In signal.c
do_notify_parent() is called by a process as it dies:
1566 * Let a parent know about the death of a child.
1572 bool do_notify_parent(struct task_struct *tsk, int sig)
1656 __wake_up_parent(tsk, tsk->parent);
__wake_up_parent() calls __wake_up_sync_key() and uses exactly the
wait_chldexit wait queue we set up previously.
exit.c
1545 void __wake_up_parent(struct task_struct *p, struct task_struct *parent)
1547 __wake_up_sync_key(&parent->signal->wait_chldexit,
1548 TASK_INTERRUPTIBLE, 1, p);
I think we should stop there, as wait() is clearly one of the more
complex examples of a system call and the use of wait queues. You can find
a simpler presentation of the mechanism in this 3 page Linux Journal
article from 2005. Many things
have changed, but the principle is explained. You might also buy the books
"Linux Device Drivers" and "Linux Kernel Development", or check out the
earlier editions of these that can be found online.
For the "Anatomy Of A System Call" on the way from user space to the kernel
you might read these lwn articles.
Wait queues are heavily used throughout the kernel whenever a task,
needs to wait for some condition. A grep through the kernel sources finds
over 1200 calls of init_waitqueue_head() which is how you initialise a
waitqueue you have dynamically created by simply kmalloc()-ing the space
to hold the structure.
A grep for the DECLARE_WAIT_QUEUE_HEAD() macro finds over 150 uses of
this declaration of a static waitqueue structure. There is no intrinsic
difference between these. A driver, for example, can choose either method
to create a wait queue, often depending on whether it can manage
many similar devices, each with their own queue, or is only expecting one device.
No central code is responsible for these queues, though there is common
code to simplify their use. A driver, for example, might create an empty
wait queue when it is installed and initialised. When you use it to read data from some
hardware, it might start the read operation by writing directly into the
registers of the hardware, then queue an entry (for "this" task, i.e. current) on its wait queue to give up
the cpu until the hardware has the data ready.
The hardware would then interrupt the cpu, and the kernel would call the
driver's interrupt handler (registered at initialisation). The handler code
would simply call wake_up() on the wait queue, for the kernel to
put all tasks on the wait queue back in the run queue.
When the task gets the cpu again, it continues where it left off (in
schedule()) and checks that the hardware has completed the operation, and
can then return the data to the user.
So the kernel is not responsible for the driver's wait queue, as it only
looks at it when the driver calls it to do so. There is no mapping from the
hardware interrupt to the wait queue, for example.
If there are several tasks on the same wait queue, there are variants of
the wake_up() call that can be used to wake up only 1 task, or all of
them, or only those that are in an interruptable wait (i.e. are designed to
be able to cancel the operation and return to the user in case of a
signal), and so on.

In order to wait for a child process to terminate, a parent process will just execute a wait() system call. This call will suspend the parent process until any of its child processes terminates, at which time the wait() call returns and the parent process can continue.
The prototype for the wait( call is:
#include <sys/types.h>
#include <sys/wait.h>
pid_t wait(int *status);
The return value from wait is the PID of the child process which terminated. The parameter to wait() is a pointer to a location which will receive the child's exit status value when it terminates.
When a process terminates it executes an exit() system call, either directly in its own code, or indirectly via library code. The prototype for the exit() call is:
#include <std1ib.h>
void exit(int status);
The exit() call has no return value as the process that calls it terminates and so couldn't receive a value anyway. Notice, however, that exit() does take a parameter value - status. As well as causing a waiting parent process to resume execution, exit() also returns the status parameter value to the parent process via the location pointed to by the wait() parameter.
In fact, wait() can return several different pieces of information via the value to which the status parameter points. Consequently, a macro is provided called WEXITSTATUS() (accessed via ) which can extract and return the child's exit status. The following code fragment shows its use:
#include <sys/wait.h>
int statval, exstat;
pid_t pid;
pid = wait(&statval);
exstat = WEXITSTATUS(statval);
In fact, the version of wait() that we have just seen is only the simplest version available under Linux. The new POSIX version is called waitpid. The prototype for waitpid() is:
#include <sys/types.h>
#include <sys/wait.h>
pid_t waitpid(pid_t pid, int *status, int options);
where pid specifies what to wait for, status is the same as the simple wait() parameter and options allows you to specify that a call to waitpid() should not suspend the parent process if no child process is ready to report its exit status.
The various possibilities for the pid parameter are:
< -1 wait for a child whose PGID is -pid
-1 same behavior as standard wait()
0 wait for child whose PGID = PGID of calling process
> 0 wait for a child whose PID = pid
The standard wait() call is now redundant as the following waitpid() call is exactly equivalent:
#include <sys/wait.h>
int statval;
pid_t pid;
pid = waitpid(-1, &statval, 0);
It is possible for a child process which only executes for a very short time to terminate before its parent process has had the chance to wait() for it. In these circumstances the child process will enter a state, known as a zombie state, in which all its resources have been released back to the system except for its process data structure, which holds its exit status. When the parent eventually wait()s for the child, the exit status is delivered immediately and then the process data structure can also be released back to the system.

Related

Do I need two different queue submissions here?

I have once command buffer 1, and command buffer 2. Both have had their recordings finished, and now I want to submit both of them, preferably in an efficient way (one vkQueueSubmit call.
The second command buffer needs to wait until the first one has completed. I thought that I could do this with one submit:
/* I THINK THE FOLLOWING IS WRONG */
VkSubmitInfo info;
info.pCommandBuffers = /* Pointer to the first command buffer */;
info.commandBufferCount = 2;
info.pSignalSemaphores = /* The first pointer value points to the semaphore
that the first command buffer will signal when completed */
info.pWaitSemaphores = /* The second pointer value points to the same semaphore as above */
So I'm pretty sure the above is a wrong understanding, does that mean the only way in this case to have a dependency of the second command buffer on the first to have two different queue submissions? I mean two different VkSubmitInfo's?
Binary semaphores are meant for interqueue communication. And while you can use timeline semaphores for intraqueue communications, for something like this, it's best to take a step back and ask yourself what is really going on.
Because it's not true that CB2 needs to wait for CB1 to complete. What's happening is that the result of some operation within CB1 needs to complete before some operation within CB2 can start. This is best accomplished with a VkEvent. CB1's operation can set an event that CB2 corresponding operation will wait on.
There's no need to separate these operations into different batches, let alone different queue submissions.

Vulkan Queue submission synchronization - vkWaitForFences vs vkQueueWaitIdle [duplicate]

I have a function that copies data from one buffer to another, I need to synchronize its execution.
I have such a bad option:
void MainWindow::copyBuffer(VkBuffer srcBuffer, VkBuffer dstBuffer, VkDeviceSize size)
{
VkCommandBuffer commandBuffer;
vkAllocateCommandBuffers(logicalDevice, &allocInfo, &commandBuffer);
//Start recording
vkBeginCommandBuffer(commandBuffer, &beginInfo);
vkCmdCopyBuffer(commandBuffer, srcBuffer, dstBuffer, 1, &copyRegion);
vkEndCommandBuffer(commandBuffer);
//Run command buffer
vkQueueSubmit(graphicsQueue, 1, &submitInfo, VK_NULL_HANDLE);
//Waiting for completion
vkQueueWaitIdle(graphicsQueue);
vkFreeCommandBuffers(logicalDevice, commandPool, 1, &commandBuffer);
}
This option is bad because if I want to execute the copyBuffer() function several times, then all the buffers will be copied strictly one at a time.
I want to use a fence for each function call so that multiple calls can run in parallel.
So far, I have only such a solution:
void MainWindow::copyBuffer(VkBuffer srcBuffer, VkBuffer dstBuffer, VkDeviceSize size)
{
VkCommandBuffer commandBuffer;
vkAllocateCommandBuffers(logicalDevice, &allocInfo, &commandBuffer);
//Create fence
VkFenceCreateInfo fenceInfo{};
fenceInfo.sType = VK_STRUCTURE_TYPE_FENCE_CREATE_INFO;
fenceInfo.flags = VK_FENCE_CREATE_SIGNALED_BIT;
VkFence executionCompleteFence = VK_NULL_HANDLE;
if (vkCreateFence(logicalDevice, &fenceInfo, VK_NULL_HANDLE, &executionCompleteFence) != VK_SUCCESS) {
throw MakeErrorInfo("Failed to create fence");
}
//Start recording
vkBeginCommandBuffer(commandBuffer, &beginInfo);
vkCmdCopyBuffer(commandBuffer, srcBuffer, dstBuffer, 1, &copyRegion);
vkEndCommandBuffer(commandBuffer);
//Run command buffer
vkQueueSubmit(graphicsQueue, 1, &submitInfo, VK_NULL_HANDLE);
vkWaitForFences(logicalDevice, 1, &executionCompleteFence, VK_TRUE, UINT64_MAX);
vkResetFences(logicalDevice, 1, &executionCompleteFence);
vkFreeCommandBuffers(logicalDevice, commandPool, 1, &commandBuffer);
vkDestroyFence(logicalDevice, executionCompleteFence, VK_NULL_HANDLE);
}
Which of these options is better?
Is the second option written correctly?
Both functions are bad in the same way. They both block the CPU from doing anything until the transfer is done. And they both could be used to potentially submit multiple CBs to the same queue in the same frame, but with different submit commands.
Neither is desirable if performance is something you care about.
Ultimately, what you need to do is have your copyBuffer function not actually perform the copy. You should have a function which builds a command buffer to do a copy. That CB is then stored in a place to be submitted later with other copying CBs. Or better yet, you can have just one copying CB that each command adds to (the first one called in a frame will create the CB).
At some point in the future, before you've submitted the work that will use this data, you need to submit the transfer operations. And the way this works depends on if you're submitting the transfer operations on the same queue as the work that will consume them or not.
If they're on the same queue, then all you need to do is have an event in a command buffer at the end of your batch that synchronizes the transfer operations with their receivers. If you want to be more clever, each transfer operation can have its own event, which the receiving operations will wait on.
And in same-queue transfers, you also want to make sure that you submit the transfers in the same vkQueueSubmit call as the rest of your work. Or to put it another way, you should never make more than one call to vkQueueSubmit for a particular queue in a particular frame.
If you're dealing with separate queues, then things change. A bit. If timeline semaphores aren't an option, you'll need to submit your transfer work before you submit the receiving operations. This is because the transfer batch will need to signal a semaphore that the receiving operation will wait on. And a binary semaphore cannot be waited on until the operation that signals it has been submitted to a queue.
But otherwise, everything else stays the same. Of course, you don't need events since you're synchronizing by semaphore.
The two functions are semantically identical and do exactly the same blocking behavior.
The second is slightly better. vkQueueWaitIdle is kind of a debug and out-of-hotspot feature. It might incur a hidden second submit to signal the implicit fence.
You don't need to reset fence that you subsequently destroy anyway. And you are creating it presignaled, which is a bug. Also you forgot to pass it to the vkQueueSubmit.

Semaphore wait() and signal()

I am going through process synchronization, and facing difficulty in understanding semaphore. So here is my doubt:
the source says that
" Semaphore S is an integer variable that is accessed through standard atomic operations i.e. wait() and signal().
It also provided basic definition of wait()
wait(Semaphore S)
{
while S<=0
; //no operation
S--;
}
Definition of signal()
signal(S)
{
S++;
}
Let the initial value of a semaphore be 1, and say there are two concurrent processes P0 and P1 which are not supposed to perform operations of their critical section simultaneously.
Now say P0 is in its critical section, so the Semaphore S must have value 0, now say P1 wants to enter its critical section so it executes wait(), and in wait() it continuously loops, now to exit from the loop the semaphore value must be incremented, but it may not be possible because according the source, wait() is an atomic operation and can't be interrupted and thus the process P0 can't call signal() in a single processor system.
I want to know, is the understanding i have so far is correct or not. and if correct then how come process P0 call signal() when process P1 is strucked in while loop?
I think the top-voted answer is inaccurate!
Operation wait() and signal() must be completely atomic; no two processes can execute wait() or signal() operation simultaneously because they are implemented in kernel and processes in kernel mode can not be preempted.
If several processes attempt a P(S) simultaneously, only one process will be allowed to proceed(non-preemptive kernel that is free of race condition).
for the above implementation to work preemption is necessary (preemptive kernel)
read about the atomicity of semaphore operations
http://personal.kent.edu/~rmuhamma/OpSystems/Myos/semaphore.htm
https://en.wikibooks.org/wiki/Operating_System_Design/Processes/Semaphores
I think it's an inaccuracy in your source. Atomic for the wait() operation means each iteration of it is atomic, meaning S-- is performed without interruption, but the whole operation is interruptible after each completion of S-- inside the while loop.
I don't think, keeping an infinite while loop inside the wait() operation is wise. I would go for Stallings' example;
void semWait(semaphore s){
s.count--;
if(s.count<0)
*place this process in s.queue and block this process
}
I think what the book means for the atomic operation is testing S<=0 to be true as well as S--. Just like testAndset() it mention before.
if both separate operations S<=0 and S-- are atomic but can be interrupt by other process, this method won't work.
imagine two process p0 and p1, if p0 want to enter the critical section and tested S<=0 to be true. and it was interrupted by p1 and tested S<=0 also be true. then both of the process will enter the critical section. And that's wrong.
the actual not atomic operation is inside the while loop, even if the while loop is empty, other process can still interrupt current one when S<=0 tested to be false, which enable other process can continue their work in critical section and release the lock.
however, I think the code from the book can not actually use in OS since I don't know how to make operations S<=0 to be true and S-- together atomic. more possible way to do that is put the S-- inside the while loop like SomeWittyUsername said.
When a task attempts to acquire a semaphore that is unavailable, the semaphore places the task onto a wait queue and puts the task to sleep.The processor is then free to execute other code.When the semaphore becomes available, one of the tasks on the wait queue is awakened so that it can then acquire the semaphore.
while S<=0
; //no operation This doesn't mean that the processor running this code. The process/task is blocked until it gets the semaphore.
i think ,
when process P1 is strucked in while loop it will be in the wait state.processor will switch over among the process p0 & p1 (context switching) so the priority goes to p0 and it call signal() and then s will be incremented by 1 and p0 exit from the section so process P1 can enter into critical section and can avoid the mutual exclusion

Event handling in embedded code

I want to know how events are used in embedded system code.
Main intention is to know how exactly event flags are set/reset in code. and how to identify which task is using which event flag and which bits of the flag are getting set/reset by each task.
Please put your suggestion or comments about it.
Thanks in advance.
(edit 1: copied from clarification in answer below)
Sorry for not specifying the details required. Actually I am interested in the analysis of any application written in C language using vxworks/Itron/OSEK OS. For example there is eventLib library in vxworks to support event handling. I want to know that how one can make use of such system routines to handle events in task. What is event flag(is it global/local...or what ?), how to set bits of any event flag and which can be the possible relationship between task and event flags ??
How task can wait for multiple events in AND and OR mode ??
I came across one example in which the scenario given below looks dangerous, but why ??
Scenarios is ==> *[Task1 : Set(e1), Task2 : Wait(e1) and Set(e2), Task3 : Wait(e2) ]*
I know that multiple event flags waited by one task or circular dependency between multiple tasks(deadlock) are dangerous cases in task-event relationship, but how above scenario is dangerous, I am not getting it....Kindly explain.
(Are there any more such scenarios possible in task-event handling which should be reviewed in code ?? )
I hope above information is sufficient ....
Many embedded systems use Interrupt Service Routines (ISR) to handle events. You would define an ISR for a given "flag" and reset that flag after you handle the event.
For instance say you have a device performing Analog to Digital Conversions (ADC). On the device you could have an ISR that fires each time the ADC completes a conversion and then handle it within the ISR or notify some other task that the data is available (if you want to send it across some communications protocol). After you complete that you would reset the ADC flag so that it can fire again at it's next conversion.
Usually there are a set of ISRs defined in the devices manual. Sometimes they provide general purpose flags that you could also handle as you wish. Each time resetting the flag that caused the routine to fire.
The eventLib in VxWorks is similar to signal() in unix -- it can indicate to a different thread that something occurred. If you need to pass data with the event, you may want to use Message Queues instead.
The events are "global" between the sender and receiver. Since each sender indicates which task the event is intended for, there can be multiple event masks in the system with each sender/receiver pair having their own interpretation.
A basic example:
#define EVENT1 0x00000001
#define EVENT2 0x00000002
#define EVENT3 0x00000004
...
#define EVENT_EXIT 0x80000000
/* Spawn the event handler task (event receiver) */
rcvTaskId = taskSpawn("tRcv",priority,0,stackSize,handleEvents,0,0,0,0,0,0,0,0,0,0);
...
/* Receive thread: Loop to receive events */
STATUS handleEvents(void)
{
UINT32 rcvEventMask = 0xFFFFFFFF;
while(1)
{
UINT32 events = 0;
if (eventReceive(rcvEventMask. EVENTS_WAIT_ANY, WAIT_FOREVER, &events) == OK)
{
/* Process events */
if (events & EVENT1)
handleEvent1();
if (events & EVENT2)
handleEvent2();
...
if (events & EVENT_EXIT)
break;
}
}
return OK;
}
The event sender is typically a hardware driver (BSP) or another thread. When a desired action occurs, the driver builds a mask of all pertinent events and sends them to the receiver task.
The sender needs to obtain the taskID of the receiver. The taskID can be a global,
int RcvTaskID = ERROR;
...
eventSend(RcvTaskID, eventMask);
it can be registered with the driver/sender task by the receiver,
static int RcvTaskID = ERROR;
void DRIVER_setRcvTaskID(int rcvTaskID)
{
RcvTaskID = rcvTaskID;
}
...
eventSend(RcvTaskID, eventMask);
or the driver/sender task can call a receiver API method to send the event (wrapper).
static int RcvTaskID;
void RECV_sendEvents(UINT32 eventMask)
{
eventSend(RcvTaskID, eventMask);
}
This question needs to provide more context. Embedded systems can be created using a wide range of languages, operating systems (including no operating system), frameworks etc. There is nothing universal about how events are created and handled in an embedded system, just as there is nothing universal about how events are created and handled in computing in general.
If you're asking how to set, clear, and check the various bits that represent events, this example may help. The basic strategy is to declare a (usually global) variable and use one bit to represent each condition.
unsigned char bit_flags = 0;
Now we can assign events to the bits:
#define TIMER_EXPIRED 0x01 // 0000 0001
#define DATA_READY 0x02 // 0000 0010
#define BUFFER_OVERFLOW 0x04 // 0000 0100
And we can set, clear, and check bits with bitwise operators:
// Bitwise OR: bit_flags | 00000001 sets the first bit.
bit_flags |= TIMER_EXPIRED; // Set TIMER_EXPIRED bit.
// Bitwise AND w/complement clears bits: flags & 11111101 clears the 2nd bit.
bit_flags &= ~DATA_READY; // Clear DATA_READY bit.
// Bitwise AND tests a bit. The result is BUFFER_OVERFLOW
// if the bit is set, 0 if the bit is clear.
had_ovflow = bit_flags & BUFFER_OVERFLOW;
We can also set or clear combinations of bits:
// Set DATA_READY and BUFFER_OVERFLOW bits.
bit_flags |= (DATA_READY | BUFFER_OVERFLOW);
You'll often see these operations implemented as macros:
#define SET_BITS(bits, data) data |= (bits)
#define CLEAR_BITS(bits, data) data &= ~(bits)
#define CHECK_BITS(bits, data) (data & (bits))
Also, a note about interrupts and interrupt service routines: they need to run fast, so a typical ISR will simply set a flag, increment a counter, or copy some data and exit immediately. Then you can check the flag and attend to the event at your leisure. You probably do not want to undertake lengthy or error-prone activities in your ISR.
Hope that's helpful!
Sorry for not specifying the details required. Actually I am interested in the analysis of any application written in C language using vxworks/Itron/OSEK OS.
For example there is eventLib library in vxworks to support event handling.
I want to know that how one can make use of such system routines to handle events in task. What is event flag(is it global/local...or what ?), how to set bits of any event flag and which can be the possible relationship between task and event flags ??
I hope above information is sufficient ....
If you're interested in using event-driven programming at the embedded level you should really look into QP. It's an excellent lightweight framework and if you get the book "Practical UML Statecharts in C/C++" by Miro Samek you find everything from how to handle system events in an embedded linux kernel (ISR's etc) to handling and creating them in a build with QP as your environment. (Here is a link to an example event).
In one family of embedded systems I designed (for a PIC18Fxx micro with ~128KB flash and 3.5KB RAM), I wrote a library to handle up to 16 timers with 1/16-second resolution (measured by a 16Hz pulse input to the CPU). The code is set up to determine whether any timer is in the Expired state or any dedicated wakeup pin is signaling, and if not, sleep until the next timer would expire or a wakeup input changes state. Quite a handy bit of code, though I should in retrospect probably have designed it to work with multiple groups of eight timers rather than one set of 16.
A key aspect of my timing routines which I have found to be useful is that they mostly aren't driven by interrupts; instead I have a 'poll when convenient' routine which updates the timers off a 16Hz counter. While it sometimes feels odd to have timers which aren't run via interrupt, doing things that way avoids the need to worry about interrupts happening at odd times. If the action controlled by a timer wouldn't be able to happen within an interrupt (due to stack nesting and other limitations), there's no need to worry about the timer in an interrupt--just keep track of how much time has passed.

Precisely time a function call

I am using a microcontroller with a C51 core. I have a fairly timeconsuming and large subroutine that needs to be called every 500ms. An RTOS is not being used.
The way I am doing it right now is that I have an existing Timer interrupt of 10 ms. I set a flag after every 50 interrupts that is checked for being true in the main program loop. If the Flag is true the subroutine is called. The issue is that by the time the program loop comes round to servicing the flag, it is already more than 500ms,sometimes even >515 ms in case of certain code paths. The time taken is not accurately predictable.
Obviously, the subroutine cannot be called from inside the timer interrupt due to that large time it takes to execute.The subroutine takes 50ms to 89ms depending upon various conditions.
Is there a way to ensure that the subroutine is called in exactly 500ms each time?
I think you have some conflicting/not-thought-through requirements here. You say that you can't call this code from the timer ISR because it takes too long to run (implying that it is a lower-priority than something else which would be delayed), but then you are being hit by the fact that something else which should have been lower-priority is delaying it when you run it from the foreground path ('program loop').
If this work must happen at exactly 500ms, then run it from the timer routine, and deal with the fall-out from that. This is effectively what a pre-emptive RTOS would be doing anyway.
If you want it to run from the 'program loop', then you will have to make sure than nothing else which runs from that loop ever takes more than the maximum delay you can tolerate - often that means breaking your other long-running work into state-machines which can do a little bit of work per pass through the loop.
I don't think there's a way to guarantee it but this solution may provide an acceptable alternative.
Might I suggest not setting a flag but instead modifying a value?
Here's how it could work.
1/ Start a value at zero.
2/ Every 10ms interrupt, increase this value by 10 in the ISR (interrupt service routine).
3/ In the main loop, if the value is >= 500, subtract 500 from the value and do your 500ms activities.
You will have to be careful to watch for race conditions between the timer and main program in modifying the value.
This has the advantage that the function runs as close as possible to the 500ms boundaries regardless of latency or duration.
If, for some reason, your function starts 20ms late in one iteration, the value will already be 520 so your function will then set it to 20, meaning it will only wait 480ms before the next iteration.
That seems to me to be the best way to achieve what you want.
I haven't touched the 8051 for many years (assuming that's what C51 is targeting which seems a safe bet given your description) but it may have an instruction which will subtract 50 without an interrupt being possible. However, I seem to remember the architecture was pretty simple so you may have to disable or delay interrupts while it does the load/modify/store operation.
volatile int xtime = 0;
void isr_10ms(void) {
xtime += 10;
}
void loop(void) {
while (1) {
/* Do all your regular main stuff here. */
if (xtime >= 500) {
xtime -= 500;
/* Do your 500ms activity here */
}
}
}
You can also use two flags - a "pre-action" flag, and a "trigger" flag (using Mike F's as a starting point):
#define PREACTION_HOLD_TICKS (2)
#define TOTAL_WAIT_TICKS (10)
volatile unsigned char pre_action_flag;
volatile unsigned char trigger_flag;
static isr_ticks;
interrupt void timer0_isr (void) {
isr_ticks--;
if (!isr_ticks) {
isr_ticks=TOTAL_WAIT_TICKS;
trigger_flag=1;
} else {
if (isr_ticks==PREACTION_HOLD_TICKS)
preaction_flag=1;
}
}
// ...
int main(...) {
isr_ticks = TOTAL_WAIT_TICKS;
preaction_flag = 0;
tigger_flag = 0;
// ...
while (1) {
if (preaction_flag) {
preaction_flag=0;
while(!trigger_flag)
;
trigger_flag=0;
service_routine();
} else {
main_processing_routines();
}
}
}
A good option is to use an RTOS or write your own simple RTOS.
An RTOS to meet your needs will only need to do the following:
schedule periodic tasks
schedule round robin tasks
preform context switching
Your requirements are the following:
execute a periodic task every 500ms
in the extra time between execute round robin tasks ( doing non-time critical operations )
An RTOS like this will guarantee a 99.9% chance that your code will execute on time. I can't say 100% because whatever operations your do in your ISR's may interfere with the RTOS. This is a problem with 8-bit micro-controllers that can only execute one instruction at a time.
Writing an RTOS is tricky, but do-able. Here is an example of small ( 900 lines ) RTOS targeted at ATMEL's 8-bit AVR platform.
The following is the Report and Code created for the class CSC 460: Real Time Operating Systems ( at the University of Victoria ).
Would this do what you need?
#define FUDGE_MARGIN 2 //In 10ms increments
volatile unsigned int ticks = 0;
void timer_10ms_interrupt( void ) { ticks++; }
void mainloop( void )
{
unsigned int next_time = ticks+50;
while( 1 )
{
do_mainloopy_stuff();
if( ticks >= next_time-FUDGE_MARGIN )
{
while( ticks < next_time );
do_500ms_thingy();
next_time += 50;
}
}
}
NB: If you got behind with servicing your every-500ms task then this would queue them up, which may not be what you want.
One straightforward solution is to have a timer interrupt that fires off at 500ms...
If you have some flexibility in your hardware design, you can cascade the output of one timer to a second stage counter to get you a long time base. I forget, but I vaguely recall being able to cascade timers on the x51.
Ah, one more alternative for consideration -- the x51 architecture allow two levels of interrupt priorities. If you have some hardware flexibility, you can cause one of the external interrupt pins to be raised by the timer ISR at 500ms intervals, and then let the lower-level interrupt processing of your every-500ms code to occur.
Depending on your particular x51, you might be able to also generate a lower priority interrupt completely internal to your device.
See part 11.2 in this document I found on the web: http://www.esacademy.com/automation/docs/c51primer/c11.htm
Why do you have a time-critical routine that takes so long to run?
I agree with some of the others that there may be an architectural issue here.
If the purpose of having precise 500ms (or whatever) intervals is to have signal changes occuring at specific time intervals, you may be better off with a fast ISR that ouputs the new signals based on a previous calculation, and then set a flag that would cause the new calculation to run outside of the ISR.
Can you better describe what this long-running routine is doing, and what the need for the specific interval is for?
Addition based on the comments:
If you can insure that the time in the service routine is of a predictable duration, you might get away with missing the timer interrupt postings...
To take your example, if your timer interrupt is set for 10 ms periods, and you know your service routine will take 89ms, just go ahead and count up 41 timer interrupts, then do your 89 ms activity and miss eight timer interrupts (42nd to 49th).
Then, when your ISR exits (and clears the pending interrupt), the "first" interrupt of the next round of 500ms will occur about a ms later.
Given that you're "resource maxed" suggests that you have your other timer and interrupt sources also in use -- which means that relying on the main loop to be timed accurately isn't going to work, because those other interrupt sources could fire at the wrong moment.
If I'm interpretting your question correctly, you have:
a main loop
some high priority operation that needs to be run every 500ms, for a duration of up to 89ms
a 10ms timer that also performs a small number of operations.
There are three options as I see it.
The first is to use a second timer of a lower priority for your 500ms operations. You can still process your 10ms interrupt, and once complete continue servicing your 500ms timer interrupt.
Second option - doe you actually need to service your 10ms interrupt every 10ms? Is it doing anything other than time keeping? If not, and if your hardware will allow you to determine the number of 10ms ticks that have passed while processing your 500ms op's (ie. by not using the interrupts themselves), then can you start your 500ms op's within the 10ms interrupt and process the 10ms ticks that you missed when you're done.
Third option: To follow on from Justin Tanner's answer, it sounds like you could produce your own preemptive multitasking kernel to fill your requirements without too much trouble.
It sounds like all you need is two tasks - one for the main super loop and one for your 500ms task.
The code to swap between two contexts (ie. two copies of all of your registers, using different stack pointers) is very simple, and usually consists of a series of register pushes (to save the current context), a series of register pops (to restore your new context) and a return from interrupt instruction. Once your 500ms op's are complete, you restore the original context.
(I guess that strictly this is a hybrid of preemptive and cooperative multitasking, but that's not important right now)
edit:
There is a simple fourth option. Liberally pepper your main super loop with checks for whether the 500ms has elapsed, both before and after any lengthy operations.
Not exactly 500ms, but you may be able to reduce the latency to a tolerable level.