Context switch time - Role of RTOS and Processor

Context switch time - Role of RTOS and Processor - embedded

Does the RTOS play a major role or processor play a major role in determining the time for context switch ? What is the percentage of share between these two major players in determining the context switch time .
Can anyone tell with respect to uC/OS-II RTOS ?

I would say both are significant, but it is not really as simple as that:
The actual context switch time is simply a matter of the number of instruction cycles required to perform the switch, like anything in software it may be coded efficiently or it may not. On the other hand, all other things being equal, a processor with a large register set will require more instruction cycles to save the context; but having a large register set may make other code far more efficient.
A processor may also have an architecture that directly supports fast context switching. For example the lowly 8bit 8051 has four duplicate register banks; so a context switch is little more that a register bank switch (so long as you have not more that four threads), and given that Silicon Labs produce 8051 based devices at 100MIPS, that could be very fast indeed!
More sophisticated processors and operating systems may use an MMU to provide thread memory protection, this is an additional context switch overhead but with benefits that may override that. Also of course such processors generally also have high clock rates which helps.
So all in all, the processor speed, the processor architecture, the quality of the RTOS implementation, and the functionality provided by the RTOS may all affect context switch time. But in the end the easiest way to improve switch time is almost certainly to increase the clock rate.
Although it is nice to have more headroom, if context switch time is a make or break issue for your project on any reputable RTOS you should consider the suitability of either your hardware or your design. You should aim toward a design that minimises context switches. For example, if an ADC conversion takes 6us and a context switch takes 20us, then you would do better to busy-wait than to use a conversion-complete interrupt; better yet use DMA transfers to avoid context switches on single data items where possible.

uC/OS-II RTOS is written in C, with some very specific sections(maybe in assembly) for the processor specific handling. The context switching will be part of the sections that are very specific to the processor.
So the context switch time will be very dependent on the processor selected and the specific sections used to adapt uC/OS-II to that processor. I believe all the source code is available so you should be able to see how much source is needed for a context switch. I also think uC/OS-II has callback's that may allow you to add some performance measuring code.

Just to complete on what Clifford was saying, context switching time also depends on the conditions that trigger the context switch, so mainly it depends on the benchmark.
Depending on the RTOS implementation, in some cases it's possible to switch directly to the first waiting process bypassing the scheduler altogether.
This of course gives a huge boost in some benchmarks.
For example we make some benchmark that measures the overhead (in µs) required to deliver a signal and switch to the high-priority process varying the particular kernel configuration and the target architecture:
http://www.bertos.org/discover/context-switch-overhead

Related

is it recommended to use SPI flash to run code instead internal flash due to memory limitation of internal flash?

We used the LPC546xx family microcontroller in our project, currently, at the initial stage, we are finalizing the software and hardware requirements. The basic firmware size (which contains RTOS, 3rd party stack, library, etc...) currently is 480 KB. Now once full application developed than the size will exceed the internal flash size (512KB) and plus we needed storage which can hold firmware update image separately.
So we planned to use SPI flash (S25LP064A-JBLE, http://www.issi.com/WW/pdf/IS25LP032-064-128.pdf, serial flash memory) of 4MB\8MB to boot and run firmware.
is it recommended to run code from SPI flash? how can I map external flash memory directly to CPU memory space? Can anyone give an example that contains this memory mapping(linker script etc..) or demo application in which LPC546xx uses SPI FLASH?

Generally speaking it's not recommended, or differently put: the closer to the CPU the better. Both S25LP064A and LPC546xx however support XIP, so it is viable.
This is not a trivial issue as many aspects are affecting. I.e. issue is best avoided and should really have been ironed out in the planning stage. Embedded Systems are more about compromising than anything and making the right/better choices takes skill end experience.
Same question with replies on the NXP forum: link
512K of NVRAM is huge. There are almost certainly room for optimisations even if 3'rd party libraries are used.
On a related note this discussion concerning XIP should give valuable insight: link.
I would strongly encourage use of file-systems if not done already, for which external storage is much better suited. The further from the computational unit, the more relevant. That's not XIP and the penalty is copy-to-RAM either way you do it. I.e. performance will be slower. But in my experience, the need for speed has often-times not been thoroughly considered and at least partially greatly overestimated.
Regarding your mentioning of RTOS and FW-upgrade:
Unless it's a poor RTOS there's file-system awareness built in. Especially for FW upgrading (Note: you'll need room for 3 images, factory reset included), unless already supported by the SoC-vendor by some other means (OTA), it will make life much easier and less risky. If there's no FS-awareness, it can be added.
FW upgrade requires a lot of extra storage. More if simpler. Simpler is however also safer which especially for FW upgrades matters hugely. In the simplest case (binary flat image), you'll need at least twice the amount of memory you're already consuming.
All-in-all: I think the direction you're going is viable and depending on the actual situation perhaps your only choice.

Interrupts execution context

I'm trying to figure out this basic scenario:
Suppose my cpu received an exception or an interrupt. What I do know, is that the cpu starts to perform an interrupt service routine (looks at the idtr register to locate the idt table, and goes to the appropriate entry to receive the isr address), but in what context is the code running?
Meaning if I have a thread currently running and generating an interrupt of some sort, in which context will the isr run, in the initial process that "holds" the thread, or in some other magical thread?
Thanks!

Interesting question, which raises a few different issues.
The first is that interrupts don’t actually run inside of any thread from the CPU’s perspective. Indeed, the CPU itself is barely aware of threads; it may know a bit more if it has hyper threading or some similar technology, but a thread is really an operating system thing (or, sometimes, an application thing).
The second is that ISRs (Interrupt Service Routines) generally run at some elevated privilege level; you don’t really say which processor family you’re talking about, so it’s difficult to be specific, but modern processors normally have at least one special mode that they enter for handling interrupts — often with its own register bank. One might also ask, as part of your question, whose page table is active during an interrupt?
Third is the question of whose memory map ISRs have when they are entered. The answer, again, is going to be highly processor specific; it’s possible to imagine architectures that disable paging on ISR entry, other architectures that switch automatically to an interrupt page table, and (probably the most common approach) those that decide not to bother doing anything about the page table when entering an ISR.
The fourth is that some operating systems have policies of their own on these kinds of things. A common approach on modern operating systems is to make ISRs themselves as short as possible, and where any significant work needs to be done, convert the interrupt into some kind of event that can be handled by a kernel thread (or even, potentially, by a user thread). In this kind of system, the code that actually handles an interrupt may well be running in a specific thread, though it probably isn’t actually an interrupt service routine at that point.
Summary:
ISRs themselves don’t really run in the context of any given thread.
ISRs may run with the page table of the interrupted thread (depends on architecture).
ISRs may start with a copy of that thread’s registers (depends on architecture).
In modern systems, ISRs commonly try to schedule an event and then exit quickly. That event might be handled by a specific thread (e.g. for processor exceptions, it’s usually delivered as a signal or Structured Exception or similar to the thread that caused it); or by a pool of threads (e.g. to service I/O in the kernel).
If you’re interested in the specifics for x86 (I guess you are, as you use some Intel specific terms in your question), you need to look at the Intel 64 and IA-32 Architectures Software Developer’s Manual, volume 3B, and you’ll need to look at the operating system documentation. x86 is a very complicated architecture compared to some others — for instance, it can optionally perform a task switch on interrupt delivery (if you put a “task gate” in the IDT), in which case it will certainly have its own set of registers and quite possibly its own page table; even if this feature is used by a given operating system, there is no guarantee that x86 tasks map straightforwardly (or at all) to operating system processes and/or threads.

What is the best way to implement my program for Keil MCB1700 evaluation board?

I want to develop a program for MCB1700 evaluation board.
Client software of PC reads a picture from HDD.
Then it sends the picture to MCB1700 evaluation board through socket (Ethernet).
Server of MCB1700 receives pictures from PC through Socket-connection and displays it on LCD.
Also server must perform such tasks:
To save picture to USB-stick;
To read picture from USB-stick and send it to client through socket;
To send and receive information through CAN
COM-logging.
etc.
Socket connection can be implemented with help of CMSIS and RL-ARM libraries.
But, as far as I understood, in both cases sofrware has to listen for incoming TCP-connection and handle network's events in an endless loop - All examples of Keil are based on such principle.
I always thought, that it is a poor way for embedded programming to use endless loops.
Moreover, I read such interesting statement
"it is certainly possible to create real-time programs without an RTOS
(by executing one or more tasks in a loop)"
http://www.keil.com/support/man/docs/rlarm/rlarm_ar_artxarm.htm
So, as I understood, it is normal practice to execute a lot of tasks in loop?
while (1) {
task1();
task2();
...
taskN();
}
I think that it is better to handle all events by interrupts.
Is it possible to use socket conection of CMSIS and RL-ARM libraries and organize all my software by handling of interrupts?
My server (on MCB1700) has to perform a lot of tasks. I guess, I should use RTOS RTX in my software. Isn't it so? Is it better to implement my software without RTX?

Simple real-time systems often operate in a "big-loop" architecture, with some time critical events handled by interrupts. It is not a "bad" architecture, bit it is somewhat inflexible, scales poorly, and any change may affect the timing of and or all parts of the system.
It is not true that RL-TCPnet must work this way, but it is designed to run stand alone, and the examples work that way so that there are no dependencies on other libraries for the widest applicability. They are only examples and are intended to demonstrate RL-TCPnet and nothing else. In that sense the examples are not realistic and should be adapted to your requirements.
You might devise a system where all your application code runs in interrupt handlers and the network stack is services in an endless-loop in the thread context, but architecturally that is probably far worse than the "big-loop" architecture, since you may end up with very long interrupt handlers, with the higher priority ones starving and affecting the real-time response of lower priority ones. You end up with a system that is difficult to schedule satisfactorily. Moreover not all library routines are suitable for execution in an interrupt handler, so it would be rather restrictive.
It is possible to service the network stack in a low priority thread in an RTOS. The framework for such operation is described in the documentation in the section: Using RL-TCPnet with RTX kernel.. This will require you to understand and use the RL-RTX kernel library, but given your reservations about "big loop" code, and the description of tasks your system must perform, it sounds like that is what you want to do in any case. If you choose to use a different RTOS, then RL-TCPnet can work in a similar way with any RTOS.

Why would I consider using an RTOS for my embedded project?

First the background, specifics of my question will follow:
At the company that I work at the platform we work on is currently the Microchip PIC32 family using the MPLAB IDE as our development environment. Previously we've also written firmware for the Microchip dsPIC and TI MSP families for this same application.
The firmware is pretty straightforward in that the code is split into three main modules: device control, data sampling, and user communication (usually a user PC). Device control is achieved via some combination of GPIO bus lines and at least one part needing SPI or I2C control. Data sampling is interrupt driven using a Timer module to maintain sample frequency and more SPI/I2C and GPIO bus lines to control the sampling hardware (ie. ADC). User communication is currently implemented via USB using the Microchip App Framework.
So now the question: given what I've described above, at what point would I consider employing an RTOS for my project? Currently I'm thinking of these possible trigger points as reasons to use an RTOS:
Code complexity? The code base architecture/organization is still small enough that I can keep all the details in my head.
Multitasking/Threading? Time-slicing the module execution via interrupts suffices for now for multitasking.
Testing? Currently we don't do much formal testing or verification past the HW smoke test (something I hope to rectify in the near future).
Communication? We currently use a custom packet format and a protocol that pretty much only does START, STOP, SEND DATA commands with data being a binary blob.
Project scope? There is a possibility in the near future that we'll be getting a project to integrate our device into a larger system with the goal of taking that system to mass production. Currently all our projects have been experimental prototypes with quick turn-around of about a month, producing one or two units at a time.
What other points do you think I should consider? In your experience what convinced (or forced) you to consider using an RTOS vs just running your code on the base runtime? Pointers to additional resources about designing/programming for an RTOS is also much appreciated.

There are many many reasons you might want to use an RTOS. They are varied & the degree to which they apply to your situation is hard to say. (Note: I tend to think this way: RTOS implies hard real time which implies preemptive kernel...)
Rate Monotonic Analysis (RMA) - if you want to use Rate Monotonic Analysis to ensure your timing deadlines will be met, you must use a pre-emptive scheduler
Meet real-time deadlines - even without using RMA, with a priority-based pre-emptive RTOS, your scheduler can help ensure deadlines are met. Paradoxically, an RTOS will typically increase interrupt latency due to critical sections in the kernel where interrupts are usually masked
Manage complexity -- definitely, an RTOS (or most OS flavors) can help with this. By allowing the project to be decomposed into independent threads or processes, and using OS services such as message queues, mutexes, semaphores, event flags, etc. to communicate & synchronize, your project (in my experience & opinion) becomes more manageable. I tend to work on larger projects, where most people understand the concept of protecting shared resources, so a lot of the rookie mistakes don't happen. But beware, once you go to a multi-threaded approach, things can become more complex until you wrap your head around the issues.
Use of 3rd-party packages - many RTOSs offer other software components, such as protocol stacks, file systems, device drivers, GUI packages, bootloaders, and other middleware that help you build an application faster by becoming almost more of an "integrator" than a DIY shop.
Testing - yes, definitely, you can think of each thread of control as a testable component with a well-defined interface, especially if a consistent approach is used (such as always blocking in a single place on a message queue). Of course, this is not a substitute for unit, integration, system, etc. testing.
Robustness / fault tolerance - an RTOS may also provide support for the processor's MMU (in your PIC case, I don't think that applies). This allows each thread (or process) to run in its own protected space; threads / processes cannot "dip into" each others' memory and stomp on it. Even device regions (MMIO) might be off limits to some (or all) threads. Strictly speaking, you don't need an RTOS to exploit a processor's MMU (or MPU), but the 2 work very well hand-in-hand.
Generally, when I can develop with an RTOS (or some type of preemptive multi-tasker), the result tends to be cleaner, more modular, more well-behaved and more maintainable. When I have the option, I use one.
Be aware that multi-threaded development has a bit of a learning curve. If you're new to RTOS/multithreaded development, you might be interested in some articles on Choosing an RTOS, The Perils of Preemption and An Introduction to Preemptive Multitasking.
Lastly, even though you didn't ask for recommendations... In addition to the many numerous commercial RTOSs, there are free offerings (FreeRTOS being one of the most popular), and the Quantum Platform is an event-driven framework based on the concept of active objects which includes a preemptive kernel. There are plenty of choices, but I've found that having the source code (even if the RTOS isn't free) is advantageous, esp. when debugging.

RTOS, first and foremost permits you to organize your parallel flows into the set of tasks with well-defined synchronization between them.
IMO, the non-RTOS design is suitable only for the single-flow architecture where all your program is one big endless loop. If you need the multi-flow - a number of tasks, running in parallel - you're better with RTOS. Without RTOS you'll be forced to implement this functionality in-house, re-inventing the wheel.

Code re-use -- if you code drivers/protocol-handlers using an RTOS API they may plug into future projects easier
Debugging -- some IDEs (such as IAR Embedded Workbench) have plugins that show nice live data about your running process such as task CPU utilization and stack utilization

Usually you want to use an RTOS if you have any real-time constraints. If you don’t have real-time constraints, a regular OS might suffice. RTOS’s/OS’s provide a run-time infrastructure like message queues and tasking. If you are just looking for code that can reduce complexity, provide low level support and help with testing, some of the following libraries might do:
The standard C/C++ libraries
Boost libraries
Libraries available through the manufacturer of the chip that can provide hardware specific support
Commercial libraries
Open source libraries

Additional to the points mentioned before, using an RTOS may also be useful if you need support for
standard storage devices (SD, Compact Flash, disk drives ...)
standard communication hardware (Ethernet, USB, Firewire, RS232, I2C, SPI, ...)
standard communication protocols (TCP-IP, ...)
Most RTOSes provide these features or are expandable to support them

Processor can support/require an RTOS?

I have few queries related with going in for an RTOS for the different processors in hand. These are generic questions. Maybe you can clarify with examples specific to any processor/rtos or even generally. How to determine if a processor can support a RTOS ? How to know if the processor requires a RTOS ?

Does a processor requires an RTOS?
No - you don't require an RTOS. You can have a sophisticated embedded application running without one. The applications that I am working on currently does not have an RTOS.
We have to think about scheduling various tasks in our application, and have to write code that schedules these tasks. We achieve most of it by simply using software timers and timeslicing different tasks as we deem appopriate. However, having an RTOS can make the process a lot easier by scheduling different parts of your code seamlessly, and you don't really have to worry about taking care of that then.
You have to consider a few things when you choose an RTOS. How much RAM does your processor have? How much FLASH do you have? You don't want to put an expensive chip on your board, and a heavy RTOS, if you don't need all the features of it.
For basic scheduling stuff, you can get relatively small RTOS's, that are not huge and that will do most things you want quite efficiently.
e.g. Free RTOS is open source and is roughly 9K's only
You can also choose to use RTOS' like VxWorks or Embedded Linux that do a whole lot more, but are either expensive or huge or both.
In the end, the RTOS you use really depends on what your application's needs are, and how much memory you have to spare for it.

This is another "how long is a piece of string" question, but I will give it +1 for being interesting.
Second point first. I don't think that a processor can require an RTOS; I would rather say that an application can.
As to whether a processor can support an RTOS, your principle questions are going to be how heavily you load it, how many events it must handle & how much processing they require, etc, and also the availability of interrupt handling mechanisms, etc.
Do you have a particular processor, ROTS, application in mind, or is this just a general question?

No processor REQUIRES an RTOS. RT is a feature of the programming, not something a processor can DEMAND.
EVERY processor that I know of supports RTOS - a hardware interrupt will interrupt at next instruction. It is basically the OS that stops that and handles things in a non-real-time fashion.

Why would a processor require and RTOS? After all an RTOS is just software running directly on the hardware, that software could equally be your application running directly on the hardware instead. That part of your question makes little sense. Now if you have a processor designed to run say Java code by executing bytecode in hardware, it would not make sense to use that processor with anything other than a JVM as the foundation for an application, but I cannot think of a processor that is so tailored to RTOS implementation that you could not use it without an RTOS.
Now with respect to whether a processor can support an RTOS the simplest way is to see if there is a commercial RTOS already implemented for it. Most processor vendors will ensure that such support is in place from one or more third-parties before a chip is generally available. Generally I would suggest that anything with an interrupt mechanism and timer hardware can support an RTOS or at least a scheduler of some sort given sufficient resources. However there are some very resource constrained microcontrollers where it would simply make no sense.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas