Is uint16_t and uint32_t interrupt safe in Cortex M architecture? - interrupt

I am working on some embedded stuff. I have multiple interrupts possibly working on the same data, so I was wondering whether the uint16_t and uint32_t data types are interrupt safe.
If an interrupt is working on uint16/32_t data and is interrupted halfway through by another interrupt that tries to read that data, the reader will see corrupted data. Is this a possible scenario?
Thanks

To expand on the answer from @DinhQC, all single-result instructions on 16- and 32-bit data types are 'atomic' with respect to interrupts on the Cortex-M, as long as the data is properly aligned (and you have to try quite hard to get the C compiler to give you unaligned data, because unaligned accesses are slow and need special treatment). Multiple-result operations like LDM and STM can be interrupted and resumed on most implementations, but the integrity of each individual 32-bit transfer within the LDM or STM is guaranteed.
The important thing is to understand whether the operations you're performing are single instructions at the machine level or not. If you're incrementing a shared variable, for example, this will take three instructions: a read, a modify, and a write. If an interrupt occurs in between the read and the write, and the interrupt service routine modifies the same variable, this modification will be overwritten when the ISR returns.
The safe way to go is to use some kind of hardware-supported mechanism to enforce atomicity or mutual exclusion over your shared data. There are more powerful, more flexible and faster approaches to mutual exclusion on the Cortex-M than disabling and re-enabling interrupts, though, notably the STREX and LDREX instructions (which are available in C too). Take a look at my answer to this other question for more information.

Cortex-M processors do not corrupt your data or give it an undefined value; the value will always be deterministic. However, there are many conditions that affect the value of the data in the presence of interrupts. The uint16/32_t data can be located in memory, or only inside the processor registers. If in memory, it can be 16/32-bit aligned or not. The processor, e.g. M0 or M4, and the operation performed on the data, e.g. add or multiply, also matter. All of those determine whether the instruction used to process the data is atomic or not.
You can find more details in this discussion and this answer by Joseph Yiu.
Generally speaking, if the instruction is atomic (single execution cycle), the interrupt cannot disturb the data operation. However, at your C code level, uint16/32_t data operation may take more than 1 instruction. Therefore, it is hard to guarantee that the program runs as expected. This also applies to uint8_t data. You may wish to disable interrupts before working on the shared data and enable interrupts afterwards. The technique is covered well in this answer (look at point 2).

Related

How to get data from an interrupt into a task-based system?

I am preparing for an embedded systems interview and was given this question as one of the questions to prepare with.
From my research, I learned that interrupts are raised by hardware when there is an important event that the OS should take care of, and that because of this, data cannot be returned by an interrupt.
However, I didn't find any definitive information about how interrupts work with a task-based system. Is this a trick question about interrupts or is there a method to get data from them?
It is true that an interrupt cannot "return" data to a caller, because there is no caller. Interrupts are triggered by asynchronous events independent of normal program flow.
However, it is possible for an interrupt to pass data to a thread/task context (or even to other interrupts) via shared memory or inter-process communication (IPC) such as a message queue, pipe or mailbox. Such data exchange must be non-blocking. So, for example, if a message queue is full, the ISR may not wait on the queue to become available - it must baulk and discard the data.
interrupts are called [...] when there is an important issue that the OS should take care of [...]
It is not about "importance", it is about timeliness, determinism, meeting real-time deadlines, and dealing with data before buffers or FIFOs are overrun, for example. There need not even be an OS, and interrupts are generally application specific and not an issue for the OS at all.
I didn't find any definitive information about how interrupts work with a task-based system.
Perhaps you need to hone your research technique (or Google Fu). https://www.google.com/search?q=rtos+interrupt+communication
It's not a trick question. The problem is the same whether you have a "task-based system" (RTOS) or not:
Interrupts will happen at random times, relative to your main program. I.e. you could be calculating something like a = b + c and the interrupt could happen at a time where b is loaded into a register, but before c is.
Interrupts execute in a different context from your main program. Depending on the architecture, you could have a separate stack, for example.
Given the above, the interrupt service routines cannot take any parameters and cannot return any values. The way to pass data in and out of an ISR is to use shared memory (e.g. a global variable).
There are several ways to get data from interrupt:
Using a queue or similar approach (implemented in FreeRTOS, for example). NOTE: there is a special API for use from ISRs.
Using a global variable. With this approach you need to take care of the consistency of the data, because an interrupt can happen at any time.
...

Will semaphore corrupt data transmission of peripherals like UART in a microcontroller?

Taking a semaphore disables interrupts, so will this cause other operations, like receiving data on SPI, to become corrupted?
Disabling interrupts cannot corrupt the data on the hardware interface.
The problem is that if the data is received by the hardware peripheral and it then raises an interrupt to have the processor collect the data, that collection will be delayed. If it is delayed for too long, then potentially more data will have been received. Depending on the peripheral, either the new data or the old data will have to be discarded. Either way, the stream of data will be incomplete.
In most cases it is difficult to predict or test how long it is safe to disable interrupts for, so if possible it is best to avoid turning interrupts off.
If the peripheral includes a FIFO buffer, then the length of time that it is safe to disable interrupts for may be increased (although still difficult to predict).
Most modern microcontrollers have many ways to avoid disabling interrupts:
A better approach is to have the peripheral transfer the data to memory with DMA, so no interrupt is required at all.
Most modern processor cores provide ways to implement a semaphore that do not even need to disable interrupts.
There's no standard way of implementing a semaphore. Disabling all interrupts on the MCU is one way to do it, but it's a very poor, amateur way of doing so, because in more complex applications with multiple interrupts it will make all real-time considerations and calculations a nightmare.
It creates subtle but severe bugs. Particularly when some quack has done so from deep inside some driver code. You import the driver into your project and suddenly previously working code breaks. In particular, be very careful about using various libs provided by silicon vendors - they are often of very poor quality.
There are better ways to do it, including:
Ensuring atomic access of shared variables, which can only be done with inline assembler or C11 _Atomic if supported.
Disabling one specific interrupt for a specific hardware peripheral, if it is possible to do so given the real-time considerations. This should then be handled by the driver for that hardware peripheral, in the form of setter/getter functions.
Use a "poor man's semaphore" in the form of a plain flag variable, by relying on the interrupt mechanism of the MCU blocking all other interrupts while the ISR is executing. Example.

TinyAVR 0-Series: Can I use pin-change sensing without entering interrupt handler?

I am evaluating the ATtiny806 running at 20MHz to build a cycle-accurate Intel 4004 microprocessor emulator. (I know it will be a bit too slow, but AVRs have a huge community.)
I need to synchronize to the external, two-phase non-overlapping clocks. These are not fast clocks (the original 4004 ran at 750kHz)
but if I spin-wait for every clock edge, I risk wasting most of my time budget.
The TinyAVR 0-series has a very nice pin-change interrupt facility that can be configured to trigger only on rising edges.
But, an interrupt routine round-trip is 8 cycles (3 in, 5 out).
My question is:
Can I leverage the pin-change sensing mechanism while never visiting an ISR?
(Other processor families let you poll for interruptible conditions without enabling interrupts from that peripheral). Can polling be done with a tight skip-on-bit/jump-back loop, followed by a set-bit instruction?
Straightforward way
You can always just poll on the level of the GPIO pin using the single cycle skip if bit set/clear instruction on the appropriate PORT register and bit.
But as you mention, polling does burn cycles so I'm not sure exactly what you want here - either a poll (that burns cycles but has low latency) or an interrupt (that has higher latency but allows processing to continue until the condition is true).
Note that if things get really tight and you are looking for, say, power savings by sleeping between clock signal transitions, then you can do tricks like having an ISR that never returns (saving the RETI cycles), but that requires some careful coding, probably with something like a state machine.
INTFLAG way
Alternately, if you want to use the internal pin state machine logic and you can live without interrupts, then you can use the INTFLAGS register to check for the pin change configured in the ISC bits of the PINxCTRL register. As long as global interrupts are not enabled in SREG, you can spin-poll on the appropriate INTFLAGS bit to check/wait for the desired condition, and then write a 1 to that bit to clear the flag.
Note that if you want to make this fast, you will probably want to map the appropriate PORT to a VPORT, since the VPORT registers are in I/O memory. This lets you use SBIS to test the INTFLAGS bit in a single cycle and SBI to clear it in a single cycle (these instructions only work on I/O memory, and the normal PORT registers are not in I/O memory).
Finally, one more complication: if you need to leave interrupts on while doing this, it is probably possible by hacking the interrupt priority registers. You'd set the pin change to be on level 0, make sure the interrupts you care about are level 1 or higher, and then trick the interrupt controller into thinking that a level-0 handler is already running, so these interrupts never actually fire. There are other restrictions to this strategy too, so avoid it if at all possible.
Programmable logic way
If you want to get really esoteric, you could likely route the input value of a pin to a configurable custom logic (CCL) LUT in the chip and then route the output of that module to a bit that you test with a 1-cycle bit-test instruction (maybe on an unused I/O pin). To do this, you'd feed the output of the LUT back into one of its inputs and use the LUT to create a strobe on the edge you are looking for. This is very complex, and since the strobe has no acknowledgement, if the signal changes while you are not looking (between spin checks) the edge will be lost and you will have to wait for the next one (probably fatal in your application).

Interrupt vector table: why do some architectures employ a "jump table" VS an "array of pointers"?

On some architectures (e.g. x86) the Interrupt Vector Table (IVT) is indeed what it says on the tin: a table of vectors, aka pointers. Each vector holds the address of an Interrupt Service Routine (ISR). When an Interrupt Request (IRQ) occurs, the CPU saves some context and loads the vector into the PC register, thus jumping to the ISR. So far, so good.
But on some other architectures (e.g. ARM) the IVT contains executable code, not pointers. When an IRQ occurs, the CPU saves some context and executes the vector. But there is no space in between these "vectors", so there is no room for storing the ISR there. Thus each "vector instruction" typically just jumps to the proper ISR somewhere else in memory.
My question is: what are the advantages of the latter approach ?
I would kinda understand if the ISRs themselves had fixed well-known addresses, and were spaced out so that reasonable ISRs would fit in place. Then we would save one indirection level, though at the expense of some fragmentation. But this "compact jump table" approach seems to have no advantage at all. What did I miss?
Some of the reasons, but probably not all of them (I'm pretty much self educated in these matters):
You can have fall through (one exception does nothing, and just goes to the next in the table)
The FIQ interrupt (Fast Interrupt Request) is the last entry in the table, and as the name suggests, it's used for devices that need immediate, low-latency processing.
Being last means you can just put the FIQ handler itself there (no jumping) and process it as fast as possible. Also, the way FIQ was designed, with its dedicated banked registers, allows for optimal implementation of FIQ handlers. See https://en.wikipedia.org/wiki/Fast_interrupt_request
I think it has to do with simplifying the processor's hardware.
If you have machine instructions (jump instructions) in the vector interrupt table, the only extra thing the processor has to do when it has to jump to an interrupt handler is to load the address of the corresponding interrupt vector in the PC.
Whereas, if you have addresses in the interrupt vector table, the processor must first read the interrupt handler's start address from memory, and then jump to it.
The extra hardware required to read from memory and then write to a register is more complex than that required to just write to a register.

Does the CPU stall when access external memory(SRAM) via FSMC

I am using an STM32F103 chip with a Cortex-M3 core in a project. According to the manual, section 3.3.1 (Cortex-M3 instructions), loading a 32-bit word with a single LDR instruction takes 2 CPU cycles to complete (assuming the destination is not PC).
My understanding is that this is only true for reading from internal memories (Flash or internal SRAM)
When reading from an external SRAM via the FSMC, it must take more cycles to complete the read operation. During the read operation, does the CPU stall until the FSMC is able to put the data together? In other words, do I lose CPU cycles when accessing external memories?
Thank you.
Edit 1: Also assume all access are aligned 32bit access.
LDR and STR instructions are not interruptible. The FSMC is bridged from the AHB, and can run at a much slower rate, as you already know. For reads, the pipeline will stall until the data is ready, and this may cause increased worst-case interrupt latency. The write may or may not stall the pipe, depending on configuration. The reference manual says there is a two-word write buffer, but it appears that may only be used to buffer bursting memories. If you were using a CRAM (PSRAM) with a bursting interface, subsequent writes would likely not complete before the next instruction is executing, but a subsequent read would stall (longer) to allow the write to finish before initiating the read.
If using LDM and STM instructions to perform multiple reads or writes, these instructions are interruptible, and it is implementation defined whether they restart from the beginning or continue where they left off when the interrupt returns. I haven't been able to find out how ST has chosen to implement this behavior. In either case, each individual bus transaction should not be interrupted.
In regards to LDRD and STRD for working on 64-bit values, I found this discussion which references the following from the ARM-ARM:
"... LDRD, ... STRD, ... instructions are executed as a sequence of word-aligned word accesses. Each 32-bit word access is guaranteed to be single-copy atomic. The architecture does not require subsequences of two or more word accesses from the sequence to be single-copy atomic."
So, it appears that LDRD and STRD are likely to function the same way LDM and STM function.
The STM32F1xx FSMC has programmable wait states - if for your memory that is not set to zero, then it will indeed take additional cycles. The data bus for the external memory is either 16 or 8 bits, so 32 bit accesses will also take additional cycles. Also the write FIFO can cause the insertion of wait states.
On the other hand, the Cortex-M is a Harvard architecture core with different memories on different buses, so that instruction and data fetches can occur simultaneously, minimising to some extent processor stalling.