Does flash memory ever change on execution? - embedded

I'm required to do a code verification using CRC. In this case, all I do is pass every byte found flash memory through an algorithm to calculate the CRC and compare the result to a predefined CRC value.
However, I'm hung up on the idea that the flash memory might change at some point, causing the CRC verification to fail.
Assuming that the code isn't touched again whatsoever, is it possible that flash memory will change during execution? If so, what can cause it to change? And how do I avoid said change?

Flash memory only means it retains its content in the absence of power; flash memory is definitely erasable / reprogrammable. The separate term Read-Only-Memory (ROM) means it cannot be altered after the initial write.
But memory doesn't change unless a CPU instruction touches it or it degrades or is affected by external factors. Flash memory contents might last ten years unperturbed. Usually it is the number of read/write cycles that degrade flash memory before age does. High static electrical charges could corrupt flash but magnetic fields should have little effect.
If you have any influence over the hardware specs, ROM should be considered if that is the primary intent; it has several advantages over flash for this purpose.
Lastly, you note that you will pass "every byte" of memory through your CRC algorithm. If the correct checksum is to be stored within the same memory medium, you have a recursive problem of trying to precompute a checksum that contains it's own checksum. In most cases, the valid checksum should be located in some segment of the memory that is not itself subjected to the scan.

In any event if it were to change spontaneously that is exactly what your code verification is intended to catch and you would want and expect the CRC to fail, so there is no problem - it is doing its job
It is mostly not a matter of spontaneously changing, but rather one of not being written correctly in the first instance, or possibly protecting against malicious or accidental tampering. If part of the flash is used for variable non-volatile storage, you would obviously not include that area in your CRC. If you are partitioning the same flash into code space and NV storage, potentially errors in the NV storage code could inadvertently modify the code space, so your CRC protects against that and also external tampering (via JTAG for example).
Flash memory is subject to erase/write cycle endurance, and after a number of cycles some bits may stick "high". Endurance varies between parts from around 10000 to 100000, and is seldom a problem for code storage. Flash memory also has a nominal data retention time; this is normally quoted at 10 years; but that is worst-case extreme condition rating - again your CRC guards against these effects.

Related

How do you design a configuration file system that copes with abrupt shutdown in embedded environment?

I'm designing a software that manages configuration file at application layer in embedded Linux.
Generally, it maintains two copies of the configuration file, one in RAM and one in flash memory. As soon as end-users update setting(s) by UI, the software saves it to the file in RAM, and then copy-paste it to the file in flash memory.
This scheme makes sure best stability in that the software reflects reality at the next second. However, the scheme compromises longevity to flash memory by accessing it every time.
As to longevity issue, I've thought about it by having a dedicated program doing this housekeeping, and adds this program to crontab then let it run like every 30 mins.
(Note: flash memory wears off only during erase cycles; the program only does housekeeping if the both files are not the same.)
But if the file in RAM is waiting for the program to do housekeeping and system shuts down unexpectedly, the file will lose.
So I'm thinking is there a way to have both longevity and not losing file at the same time? Or am I missing something?
There are many different reasons why flash can get corrupted: data retention over time, erase/write failures which are primarily caused by erase/write cycle wear, clock inaccuracies, read disturb in case of NAND flash, and even less likely errors sources such as cosmic rays or EMI. But also as in your case, algorithmic layer problems such as a flash erase/write getting interrupted by power loss or reset caused by EMI.
Similarly, there are many ways to deal with these various problems.
CRC16 or CRC32 depending on flash size is the classic way to deal with pretty much all possible flash errors, particularly with data retention since it most often manifests itself as single-bit errors, which CRC is great at discovering. It should ideally be designed so that the checksum is placed at the end of each erase-size segment. Or in case erase-size is very small (emulated eeprom/data flash etc), maybe a single CRC32 at the end of all data. Modern MCUs often have a CRC hardware peripheral which might be helpful.
Optionally you can let the CRC algorithm repair single bit errors, though this practice is often banned in high integrity systems.
ECC is used on NAND flash or in high integrity systems. Traditionally done through software (which is quite cumbersome), but lately also available through built-in hardware support, particularly on the "safety/chassis" kind of automotive microcontrollers. If you wish to use ECC then I highly recommend picking a part with such built-in support, then it can be used to replace manual CRC (which is somewhat painful to deal with real-time wise).
These parts with hardware ECC may also support a feature with an area where you can write down variables to have the hardware handle writing them to flash in the background, kind of similar to DMA.
Using the flash segment as FIFO. When storing reasonably small amounts of data in memory with large erase sizes, you can save flash erase/write cycles by only erasing the whole segment once, after which it will likely be set to "all ones" 0xFFFF... When writing, you look for the last available chunk of memory which is "all ones" and write there, even though the same data was previously written just before it. And when reading, you fetch the last written chunk before "all ones". Only when the whole erase size is used up do you perform an erase and start over from the beginning - data needs to be stored in RAM during this.
I strongly recommend picking a part with decent data flash though, meaning small erase sizes - so that you don't need to resort to hacks like this.
Mirror segments where all memory is stored as duplicates in two separate segments is mandatory practice for high integrity systems, though this can also be used to prevent corruption during power loss/unexpected resets and of course flash corruption in general. The idea is to always have at least one segment of intact data at all times, and optionally repair a corrupt one by overwriting it with the correct one at start-up. Also meaning that one segment must be verified to be correct and complete before writing to the next.
Keep the product cool. This is a hardware solution obviously, but data retention in particular is heavily affected by ambient temperature. The manufacturer normally guarantees some 15-20 years or so up to 85°C, but that might mean 100 years if you keep it at <25°C. As in, whenever possible, avoid mounting your MCU PCB near exhausts, oil coolers, hydraulics, heating elements etc etc.
Mirror segments in combination with CRC and/or ECC is likely the solution you are looking for here. Again, I strongly recommend to pick a MCU with dedicated data flash, meaning small erase segments and often far more erase/write cycles, ideally >100k.

What is the 'erased' value of a memory location on an sdhc card?

I am writing to a micro sd card (SDHC) for an embedded application. The application needs to be able to write very quickly to the card in real time.
I have seen that erasing the memory blocks beforehand makes the write a lot faster. Unfortunately I am struggling to get the erase command (and ACMD23) to work as the driver provided for the development board I am using is not complete.
Is there any way to erase the card by maybe writing an 'erased' value to the memory blocks beforehand? For example, if after erasing a block it becomes 0x12345678 can I just write this value instead to make it erased in order to get around using the erase command? Or is there some other way that the card marks a block as erased?
Thanks
I have tried writing 0xffffffff as the erased value but it has not helped.
I think you're misunderstanding how flash memory works.
Flash memory has blocks which are way bigger than what typical filesystems expect. Additionally, they have a limited number of erase cycles. Hence, the flash controller provides an abstraction that maps virtual sectors to physical blocks.
A sector that is "erased" is not actively erased at all. It's just unmapped, and an empty block (if available) is mapped in its place. In the background, the flash controller shuffles sectors around and erases physical blocks as they become wholly unused.
As you can see, the quality of the flash controller matters here. It's not even a driver issue, typically. The driver just sends the command; the flash controller executes them. If you need better performance, get a better SD card.

Could a tight loop destroy cells of a microcontroller's flash?

It is well-known that Flash memory has limited write endurance, less so that reads could also have an upper limit such as mentioned in this Flash endurance test's Conclusion (3rd point).
On a microcontroller the code is typically stored in Flash, and is executed by fetching code words directly from the Flash cells.(at least this is most commonly so on 8 bit micros, some 32 bit micros might have some small buffer).
Depending on the particular code, it might happen that a location is accessed extremely frequently, such as if on the main execution path there is some busy loop, such as a wait for an interrupt (for example from a timer, synchronizing execution to a fixed interval).This could generate 100K or even more (read) accesses per second on average to a single Flash cell (depending on clock and the particular code).
Could such code actually destroy the cells of the Flash underneath it?(Is there any necessity to be concerned about this particular problem when designing code for microcontrollers? Such as part of a system which is meant to operate for years? Of course the Flash could be periodically verified by CRC, but that doesn't prevent the system failing if it happens, only that the failure will more likely happen in a controlled manner)
Only erasing/writing will affect the memory cells, not reading. You don't need to consider the number of reads when designing the program.
Programmed flash memory does age though, meaning that the value of the cells might not be reliable after a certain amount of years. This is known as data retention and depends mainly on temperature. MCU manufacturers typically specify a worse case in years, assuming that the part is kept in maximum specified ambient temperature.
This is something to consider for products that are expected to live long (> 10 years), particularly in environments where high temperatures can be expected. CRC and/or ECC is a good counter-measure against data retention, although if you do find that a cell has been corrupted, it typically just means that the application should shut down to a non-recoverable safe state.
I know of two techniques to approach this issue:
1) One technique is to set aside a const 32-bit integer variable in the system code. Then calculate a CRC32 checksum of the compiled binary image, and inserting the checksum into the reserved variable using an ELF-editor.
A module in the system software will then calculate a CRC32 over the flash area occupied by the application and compare to the "stored" value.
If you are using GCC, the linker can define a symbol to tell you where the segment stops. This method can detect errors but cannot correct them.
2) Another technique is to use a microcontroller that supports Flash ECC. TI sells Cortex-R4 MCUs which support Flash ECC (Hercules series).
I doubt that this is a practical concern. The article you cited vaguely asserts that this can happen but with no supporting evidence or quantification of the effect. There is a vague, unsupported and unquantified reference in the introduction:
[...] flash degrades over time from erasing/writing (or even just reading, although that decay is slower) [...]
Then again in the conclusion:
We did not check flash decay for reads, but reading also causes long term decay. It would be interesting to see if we can read a spot enough times to cause failure.
The author may be referring to read-disturbance in NAND flash, but microcontrollers do not use NAND flash for code storage/execution since it is not random-access. Read disturb is not a permanent effect, erasing and re-writing the affected block restores endurance. NAND controllers maintain read counts for blocks and automatically copy and erase blocks as necessary. They also employ ECC to detect and correct errors, and identify "write-worn" areas.
There is the potential for long-term "bit-rot" but I doubt that it is caused specifically by reading rather just ageing.
Most RTOS systems spend the majority of their processing time in a do-nothing idle loop, and run happily 24/7 365 days a year. Some processors support a wait-for-interrupt instruction that halts the CPU in the idle loop, but by no means all, and it is not uncommon not to use such an instruction. Processors with flash accelerators or caches may also prevent continuous rapid fetch from a single location, but again that is by no means all microcontrollers.

Estimating available RAM left with safety margin in C (STM32F4)

I am currently developing application for STM32F407 using STM32CubeMx and Keil uVision. I know that dynamic memory allocation in embedded systems is mostly discouraged, but from spot to spot on internet I can find some arguments in favor of it.
Due to my inventors soul I wanted to try to do it, but do it safely. Let's assume I'm creating a dynamically allocated fifo for incoming UART messages, holding structs composed of the msg itself and its' length. However I wouldn't like to consume all the heap size doing so, therefore I want to check how much of it I have left: Me new (?) idea is to try temporarily allocating some big chunk of memory (say 100 char) - if it's successful, I accept the incoming msg, if not - it means that I'm running out of heap and ignore the msg (or accept it and dequeue the oldest). After checking I of course free the temp memory.
A few questions arise in my mind:
First of all, does it make sens at all? Do you think, basic on your experience, that it could be usefull and safe?
I couldn't find precise info about what exactly shares RAM in ES (I know about heap, stack and volatile vars) so my question is: providing that answer to 1. isn't "hell no go home", what size of the temp memory checker would you pick for the mentioned controller?
About the micro itself - it has 192kB RAM, however in the Drivers\CMSIS\Device\ST\STM32F4xx\Source\Templates\arm\startup_stm32f407xx.s file only 512B+1024B are allocated for heap and stack - isn't that very little, leaving the whooping, remaining 190kB for volatile vars? Would augmenting the heap size to, say 50kB be sensible? If yes, do I do it directly in this file or it's a better practice to do it somewhere else?
Probably for some of you "safe dynamic memory" and "embedded" in one post is both schocking and dazzling, but keep in mind that this is experimenting and exploring new horizons :) Thanks and greetings.
Keil uVision describes only the IDE. If you are using KEil MDK-ARM which implies ARM's RealView compiler then you can get accurate heap information using the __heapstats() function.
__heapstats() is a little strange in that rather than simply returning a value it outputs heap information to a formatted output stream facilitated by a function pointer and file descriptor passed to it. The output function must have an fprintf() like interface. You can use fprintf() of course, but that requires that you have correctly retargetted the stdio
For example the following:
typedef int (*__heapprt)(void *, char const *, ...);
__heapstats( (__heapprt)fprintf, stdout ) ;
outputs for example:
4180 bytes in 1 free blocks (avge size 4180)
1 blocks 2^11+1 to 2^12
Unfortunately that does not really achieve what you need since it outputs text. You could however implement your own function to capture the data in memory and parse the result. You may only need to capture the first decimal digit characters and discard anything else, except that the amount of free memory and the largest allocatable block are not necessarily the same thing of course. Fragmentation is indicated by the number or free blocks and their average size. You can perhaps guarantee to be able to allocate at least an average sized block.
The issue with dynamic allocation in embedded systems are to do with handling memory exhaustion and, in real-time systems, the non-deterministic timing of both allocation and deallocation using the default malloc/free implementations. In your case you might be better off using a fixed-block allocator. You can implement such an allocator by creating a static array of memory blocks (or by dynamically allocating them from the heap at start-up), and placing a pointer to each block on a queue or linked list or stack structure. To allocate you simply remove a pointer from the queue/list/stack, and to free you place a pointer back. When the available blocks structure is empty, memory is exhausted. It is entirely deterministic, and because it is your implementation can be easily monitored for performance and capacity.
With respect to question 3. You are expected to adjust the heap and system stack size to suit your application. Most tools I have used have a linker script that automatically allocates all available memory not statically allocated, allocated to a stack or reserved for other purposes to the heap. However MDK-ARM does not do that in the default linker scripts but rather allocates a fixed size heap.
You can use the linker map file summary to determine how much space is unused and manually expand the heap. I usually do that leaving a small amount of unused space to account for maintenance when the amount of statically allocated data may increase. At some point however; you end up running out of memory, and the arcane error messages from the linker may not make it obvious that your heap is just too big. It is possible to override the default linker script and provide your own, and no doubt possible then to automatically size the heap - though I have never taken the trouble to try it.
Okay I have tested my idea with dynamic heap free space checking and it worked well (although I didn't perform long-run tests), however #Clifford answer and this article convinced me to abandon the idea of dynamic allocation. Eventually I implemented my own, static heap with pages (2d array), occupied pages indicator (0-1 array of size of number of pages) and fifo of structs consisting of pointer to the msg on my static heap (actually just the index of the array) and length of message (to determine how many contiguous pages it occupies). 95% of msg I receive should take up only one page, 5% - 2 or 3 pages, so fragmentation is still possible, but at least I keep a tight rein on it and it affects only the part of memory assigned to this module of the code (in other words: the fragmentation doesn't leak to other parts of the code). So far it has worked without any problems and for sure is faster because the lookup time is O(n*m), n - number of pages, m - the longest page possible, but taking into consideration the laws of probability it goes down to O(n). Moreover n is always a lot smaller the number of all allocation units in memory, so way less to look for.

Does laptop Embedded Controller have limited writes?

I am wondering if I should be worried about excessive writes to the embedded controller registers on my laptop. I am guessing that if they are true registers, they probably act more like RAM rather than flash memory so this isn't a problem.
However, I have a script to modify the registers in my laptop's EC to better control the fan speed curve. It has to be re-applied after each power change event such as sleep/wake as well as power cable events, so it happens fairly often. I just want to make sure I am not burning out my chips in the process.
The script I am using to write to the EC is located here:
https://github.com/RayfenWindspear/perl-acpi-fanspeed
Well, it seems you're writing to ACPI registers. Registers here do not refer to any specific hardware; it just means its a specific address that you can reach using a specific bus. It's however highly unlikely that something that you have to re-write after every power cycle is overwriting permanent storage, so for all practical aspects I'd assume that you can rely on this for as long as your laptop lives.
Hardware peripherals are almost universally implemented as SRAM cells. They will not wear out first. The fan you are controlling will have a limited number of start/stop cycles. So it is much more likely that the act of toggling these registers will wear something else out prematurely (than the SRAM type memory cell itself).
To your particular case, correctly driving a fan/motor can significantly improve it's life time. Over driving a fan/motor does not always make it go faster, but instead creates heat. The heat weakens the wiring and eventually the coils will short reducing drive and eventually wearing out. That said, the element being cooled can be damaged by excess heat so tuning things just to reduce sound may not make sense.
Background details
Generally, the element is called a Flip-Flop with various forms. SystemRDL is an example as well as SystemC and others where digital engineers will model these. In digital hardware, the flip-flops have default or reset values. This is fixed like ROM on each chip and is not normally re-programmable, uses EEPROM technologyNote1 or is often configured via input lines which the hardware designer can pull them high/low with a resistor or connect them to another elements 'GPIO'.
It is analogous to 'initdata'. Program values that aren't zero get copied from flash, disk, etc to memory at program startup. So the flip-flops normally do not hold state over a power cycle; something else does this.
The 'Flash' technology is based off of a floating gate and uses 'quantum tunnelling' to program the floating gates. This process is slightly destructive. It was invented by Fowler and Nordheim in 1967, but wide spread electronics industry did not start to produce them until the early 90s with NOR flash followed by NAND flash and many variants. But the underlying physics is the same; just the digital connections are different. So as well as this defect you are concerned about, the flash technology actually followed many hardware chips such as 68k, i386, etc. So 'flip-flops' were well established and typically the 'register' part of the logic is not that great of a typical chip and a flip-flop uses the same logic (gates) as the rest of the chip logic. Meaning that using flash would be an extra overhead with little benefit.
Some additional take-away is that the startup up and shutdown of chips is usually the most destructive time. Often poor hardware designers do not put proper voltage supervision and some lines maybe floating with the expectation that system programs will set them immediately. Reset events, ESD, over heating, etc will all be more harmful than just the act of writing a peripheral register.
Note 1: EEPROM typically has 100,000+ cycles. These features are typically only used once at manufacture time to set a chip configuration for the system. These are actually quite rare, but possible.
The MLC (multi-level) NAND flash in SSD has pathetically low cycles like 8,000 in some cases. The SLC (single level) old school flash have 10,000+ cycles, but people demand large data formats.