How getrandom syscall guarantees randomness - cryptography

I've recently been reading about cryptography and have seen how important the randomness of seeds is in different algorithms.
After reading the getrandom manpage, which says that the random buffer is filled based on the kernel's environment, I want to know how exactly this is done. I could guess that the clock may be part of this environment, and maybe other hardware components that I don't know about.

Generally this is done by using the low-order bits of a high-resolution clock whenever 'external' events happen -- external in this case meaning events that are not governed by the same clock domain. So things like keystrokes and network packet arrivals are good choices.
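As a rough illustration (a toy sketch, not the kernel's actual code), mixing a clock sample into a pool on each external event might look like the following; the xor/rotate update is chosen purely for brevity, whereas the real kernel uses a cryptographic construction:

/* Toy illustration: mix the low-order bits of a high-resolution clock into
 * an entropy pool whenever an "external" event (keystroke, packet, ...) is
 * observed. Not the kernel's real algorithm. */
#include <stdint.h>
#include <stdio.h>
#include <time.h>

static uint64_t pool[4];          /* stand-in for the kernel entropy pool */

static void mix_event(unsigned source_id)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);

    /* Only the low-order bits of the timestamp are hard to predict,
     * because the event is not synchronized to this clock. */
    uint64_t sample = ((uint64_t)ts.tv_nsec << 8) ^ source_id;

    /* Crude mix: rotate and xor the sample across the pool words. */
    for (int i = 0; i < 4; i++) {
        pool[i] ^= sample;
        pool[i] = (pool[i] << 13) | (pool[i] >> 51);
        sample += pool[i];
    }
}

int main(void)
{
    /* Pretend three external events arrived. */
    mix_event(1); mix_event(2); mix_event(3);
    printf("pool[0] = %016llx\n", (unsigned long long)pool[0]);
    return 0;
}

From user space you never see any of this: you just call getrandom(buf, buflen, 0) (declared in sys/random.h) and the kernel hands back bytes derived from its internal pool.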

Related

Could a tight loop destroy cells of a microcontroller's flash?

It is well known that Flash memory has limited write endurance; less well known is that reads may also have an upper limit, such as mentioned in this Flash endurance test's Conclusion (3rd point).
On a microcontroller the code is typically stored in Flash and is executed by fetching code words directly from the Flash cells (at least this is most commonly so on 8-bit micros; some 32-bit micros might have a small buffer).
Depending on the particular code, it might happen that a location is accessed extremely frequently, for example if the main execution path contains a busy loop such as a wait for an interrupt (say from a timer, synchronizing execution to a fixed interval). This could generate 100K or even more (read) accesses per second on average to a single Flash cell, depending on the clock and the particular code.
Could such code actually destroy the cells of the Flash underneath it? (Is there any need to be concerned about this particular problem when designing code for microcontrollers, such as part of a system which is meant to operate for years? Of course the Flash could be periodically verified by CRC, but that doesn't prevent the system from failing if it happens; it only makes it more likely that the failure happens in a controlled manner.)
Only erasing/writing will affect the memory cells, not reading. You don't need to consider the number of reads when designing the program.
Programmed flash memory does age though, meaning that the value of the cells might not be reliable after a certain number of years. This is known as data retention and depends mainly on temperature. MCU manufacturers typically specify a worst case in years, assuming that the part is kept at the maximum specified ambient temperature.
This is something to consider for products that are expected to live long (> 10 years), particularly in environments where high temperatures can be expected. CRC and/or ECC is a good counter-measure against data retention, although if you do find that a cell has been corrupted, it typically just means that the application should shut down to a non-recoverable safe state.
I know of two techniques to approach this issue:
1) One technique is to set aside a const 32-bit integer variable in the system code, calculate a CRC32 checksum of the compiled binary image, and insert the checksum into the reserved variable using an ELF editor.
A module in the system software will then calculate a CRC32 over the flash area occupied by the application and compare it to the "stored" value.
If you are using GCC, the linker can define a symbol to tell you where the segment ends. This method can detect errors but cannot correct them. (A sketch of this self-check follows this list.)
2) Another technique is to use a microcontroller that supports Flash ECC. TI sells Cortex-R4 MCUs which support Flash ECC (Hercules series).
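As a rough sketch of technique 1 (the symbol names __app_start/__app_end and the reserved .app_crc word are illustrative, not any specific vendor's convention), the runtime self-check could look like this:

/* Hedged sketch: verify the application image in flash against a CRC32
 * stored in a reserved const word. Adapt the symbols to your linker
 * script and the patching step to your CRC tooling. */
#include <stdint.h>
#include <stddef.h>

extern const uint8_t __app_start[];   /* provided by the linker script */
extern const uint8_t __app_end[];     /* end of the checksummed region  */

/* Reserved word, patched with the expected CRC32 after the build.
 * It must live outside the region covered by the checksum. */
const volatile uint32_t app_expected_crc
    __attribute__((section(".app_crc"))) = 0xFFFFFFFFu;

static uint32_t crc32_calc(const uint8_t *p, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;              /* standard reflected CRC-32 */
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (uint32_t)(-(int32_t)(crc & 1)));
    }
    return ~crc;
}

/* Returns nonzero if the flash image matches the stored checksum. */
int app_flash_ok(void)
{
    uint32_t crc = crc32_calc(__app_start,
                              (size_t)(__app_end - __app_start));
    return crc == app_expected_crc;
}

A post-build step (an ELF editor or an objcopy-based tool, for example) patches the real checksum into the reserved word at a known address.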
I doubt that this is a practical concern. The article you cited asserts that this can happen, but offers no supporting evidence or quantification of the effect. There is an unsupported, unquantified reference in the introduction:
[...] flash degrades over time from erasing/writing (or even just reading, although that decay is slower) [...]
Then again in the conclusion:
We did not check flash decay for reads, but reading also causes long term decay. It would be interesting to see if we can read a spot enough times to cause failure.
The author may be referring to read disturbance in NAND flash, but microcontrollers do not use NAND flash for code storage/execution since it is not random-access. Read disturb is not a permanent effect; erasing and re-writing the affected block restores endurance. NAND controllers maintain read counts for blocks and automatically copy and erase blocks as necessary. They also employ ECC to detect and correct errors, and to identify "write-worn" areas.
There is the potential for long-term "bit-rot", but I doubt that it is caused specifically by reading rather than by just ageing.
Most RTOS systems spend the majority of their processing time in a do-nothing idle loop, and run happily 24/7 365 days a year. Some processors support a wait-for-interrupt instruction that halts the CPU in the idle loop, but by no means all, and it is not uncommon not to use such an instruction. Processors with flash accelerators or caches may also prevent continuous rapid fetch from a single location, but again that is by no means all microcontrollers.

Does laptop Embedded Controller have limited writes?

I am wondering if I should be worried about excessive writes to the embedded controller registers on my laptop. I am guessing that if they are true registers, they probably act more like RAM than flash memory, so this isn't a problem.
However, I have a script to modify the registers in my laptop's EC to better control the fan speed curve. It has to be re-applied after each power change event such as sleep/wake as well as power cable events, so it happens fairly often. I just want to make sure I am not burning out my chips in the process.
The script I am using to write to the EC is located here:
https://github.com/RayfenWindspear/perl-acpi-fanspeed
Well, it seems you're writing to ACPI registers. "Registers" here do not refer to any specific hardware; it just means a specific address that you can reach using a specific bus. It's highly unlikely, however, that something you have to re-write after every power cycle is overwriting permanent storage, so for all practical purposes I'd assume that you can rely on this for as long as your laptop lives.
Hardware peripherals are almost universally implemented as SRAM cells. They will not wear out first. The fan you are controlling will have a limited number of start/stop cycles. So it is much more likely that the act of toggling these registers will wear something else out prematurely (than the SRAM type memory cell itself).
In your particular case, correctly driving a fan/motor can significantly improve its lifetime. Overdriving a fan/motor does not always make it go faster, but instead creates heat. The heat weakens the wiring, and eventually the coils will short, reducing drive and eventually wearing the motor out. That said, the element being cooled can be damaged by excess heat, so tuning things just to reduce noise may not make sense.
Background details
Generally, the element is called a flip-flop, and it comes in various forms. SystemRDL is an example, as are SystemC and other languages in which digital engineers model these. In digital hardware, the flip-flops have default or reset values. These are fixed like ROM on each chip and are not normally re-programmable, use EEPROM technology (Note 1), or are often configured via input lines which the hardware designer can pull high/low with a resistor or connect to another element's GPIO.
It is analogous to 'initdata'. Program values that aren't zero get copied from flash, disk, etc to memory at program startup. So the flip-flops normally do not hold state over a power cycle; something else does this.
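For a concrete picture of that analogy, here is a minimal startup sketch for a flash-based MCU; the linker symbols (_sidata, _sdata, _edata, _sbss, _ebss) follow a common GNU-ld convention, but the names vary by toolchain:

/* Sketch of typical C startup on a flash-based MCU: non-zero initial
 * values live in flash and are copied to RAM before main() runs, just
 * as flip-flop reset values are re-established on every power-up. */
#include <stdint.h>

extern uint32_t _sidata;  /* start of .data initial values in flash */
extern uint32_t _sdata;   /* start of .data in RAM */
extern uint32_t _edata;   /* end of .data in RAM */
extern uint32_t _sbss;    /* start of .bss in RAM */
extern uint32_t _ebss;    /* end of .bss in RAM */

extern int main(void);

void Reset_Handler(void)
{
    uint32_t *src = &_sidata;
    uint32_t *dst = &_sdata;

    while (dst < &_edata)                    /* copy .data from flash to RAM */
        *dst++ = *src++;

    for (dst = &_sbss; dst < &_ebss; dst++)  /* zero the .bss section */
        *dst = 0;

    main();                                  /* hand over to the application */
    for (;;) ;                               /* trap if main ever returns */
}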
Flash technology is based on a floating gate and uses quantum tunnelling to program the floating gates. This process is slightly destructive. The floating gate dates from 1967 (the tunnelling mechanism is named after Fowler and Nordheim), but the wider electronics industry did not start to produce flash until the early 90s, with NOR flash followed by NAND flash and many variants. The underlying physics is the same; only the digital connections differ. As well as the defect you are concerned about, flash technology actually came after many hardware chips such as the 68k, i386, etc. Flip-flops were already well established, the 'register' part of the logic is not that large a portion of a typical chip, and a flip-flop uses the same logic (gates) as the rest of the chip. Using flash for registers would therefore be extra overhead with little benefit.
An additional take-away is that the start-up and shutdown of chips is usually the most destructive time. Poor hardware designs often lack proper voltage supervision, and some lines may be floating with the expectation that system software will set them immediately. Reset events, ESD, overheating, etc. will all be more harmful than just the act of writing a peripheral register.
Note 1: EEPROM typically has 100,000+ cycles. These features are typically only used once at manufacture time to set a chip configuration for the system. These are actually quite rare, but possible.
The MLC (multi-level) NAND flash in SSDs has pathetically low endurance, as few as 8,000 cycles in some cases. The old-school SLC (single-level) flash has 10,000+ cycles, but people demand large capacities.

Why are GPIOs used?

I have been searching around [in vain] for some good links/sources to help understand GPIOs and why they are used in embedded systems. Can anyone please point me to some ?
In any useful system, the CPU has to have some way to interact with the outside world - be it lights or sounds presented to the user or electrical signals used to communicate with other parts of the system. A GPIO (general purpose input/output) pin lets you either get input for your program from outside the CPU or to provide output to the user.
Some uses for GPIOs as inputs:
detect button presses
receive interrupt requests from external devices
Some uses for GPIOs as outputs:
blink an LED
sound a buzzer
control power for external devices
A good case for a bidirectional GPIO or a set of GPIOs can be to "bit-bang" a protocol that your SoC doesn't provide natively. You could roll your own SPI or I2C interface, for example.
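For example, a bit-banged SPI transfer (mode 0, MSB first) needs nothing more than three GPIOs; gpio_write(), gpio_read() and short_delay() below are placeholders for whatever GPIO and timing API your target actually provides:

/* Illustrative bit-banged SPI (mode 0, MSB first) over plain GPIOs.
 * Pin identifiers and the gpio_*/delay helpers are hypothetical. */
#include <stdint.h>

#define PIN_SCK   0
#define PIN_MOSI  1
#define PIN_MISO  2

extern void gpio_write(int pin, int level);  /* hypothetical GPIO wrapper */
extern int  gpio_read(int pin);              /* hypothetical GPIO wrapper */
extern void short_delay(void);               /* sets the bit rate */

uint8_t spi_transfer(uint8_t out)
{
    uint8_t in = 0;

    for (int bit = 7; bit >= 0; bit--) {
        gpio_write(PIN_MOSI, (out >> bit) & 1);  /* present the data bit */
        short_delay();
        gpio_write(PIN_SCK, 1);                  /* clock it out */
        in = (uint8_t)((in << 1) | (gpio_read(PIN_MISO) & 1));
        short_delay();
        gpio_write(PIN_SCK, 0);
    }
    return in;
}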
The reason you cannot find an answer is probably that if you know what an embedded system is and does, or indeed anything about digital electronic systems, then the answer is rather too obvious to write down! That is to say, by the time you get as far as actually implementing a working embedded system, you should already know what they are.
GPIO pins are, as a minimum, two-state digital logic I/O. In most cases some or all of them may also be interrupt sources. These interrupts may have options for rising-edge, falling-edge, dual-edge, or level triggering.
On some targets GPIO pins may have configurable output circuitry to allow, for example, external pull-ups to be omitted, or to allow connection to devices that require open-collector outputs, and in some cases even to provide filtering of high frequency noise and glitches.
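What that configuration looks like in code is entirely vendor-specific. Purely as an invented illustration (these register names and addresses do not belong to any real part), setting up a button pin as an input with a pull-up and a falling-edge interrupt might read like:

/* Hypothetical register-level configuration of one GPIO pin as an input
 * with an internal pull-up and a falling-edge interrupt. Register names,
 * addresses and bit layout are invented; every vendor spells these
 * differently in their reference manual. */
#include <stdint.h>

#define GPIOA_BASE   0x40010800u                 /* made-up base address */
#define GPIOA_DIR    (*(volatile uint32_t *)(GPIOA_BASE + 0x00))
#define GPIOA_PULL   (*(volatile uint32_t *)(GPIOA_BASE + 0x04))
#define GPIOA_FALL   (*(volatile uint32_t *)(GPIOA_BASE + 0x08))
#define GPIOA_INTEN  (*(volatile uint32_t *)(GPIOA_BASE + 0x0C))

void button_pin_init(unsigned pin)
{
    GPIOA_DIR   &= ~(1u << pin);   /* configure as input            */
    GPIOA_PULL  |=  (1u << pin);   /* enable internal pull-up       */
    GPIOA_FALL  |=  (1u << pin);   /* interrupt on falling edge     */
    GPIOA_INTEN |=  (1u << pin);   /* unmask the pin's interrupt    */
}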
In most embedded systems, a processor will be ultimately responsible for sensing the state of various devices which translate external stimuli to digital-level logic voltages (e.g. when a button is pushed, a pin will go low; otherwise it will sit high), and for controlling devices which translate logic-level voltages directly into action (e.g. when a pin is high, a light will go on; when low, it will go off). It used to be that processors did not have general-purpose I/O, but would instead have to use a shared bus to communicate with devices that could process I/O requests and set or report the state of the external circuits. Although this approach was not entirely without advantages (one processor could monitor or control thousands of circuits on a shared bus), it was inconvenient in many real-world applications.
While it is possible for a processor to control any number of inputs and outputs using a four-wire SPI bus or even a two-wire I2C bus, in many cases the number of signals a processor will need to monitor or control is sufficiently small that it's easier to simply include the circuitry to monitor or control some signals directly on the chip itself. Although dedicated interfacing hardware will frequently have output-only or input-only pins (the person choosing the hardware interface chips will know how many signals need to be monitored, and how many need to be controlled), a particular family of processor may be used in some applications that require e.g. 4 inputs and 28 outputs, and in other applications that require 28 inputs and 4 outputs. Instead of requiring that different parts be used in applications with different balances between inputs and outputs, it's simpler to just have one part with pins that can be configured as inputs or outputs, as needed.
I think you have it backwards. GPIO is the default in electronics. It's a pin, a signal, that can be programmed. Everything is made up of these. For a processor, dedicated peripherals are a special case, they're extras for when you know you want a more limited function.
From a chip manufacturer's perspective, you often don't know exactly what the user needs, so you can't put the exact peripherals on your chip. You make generic ones instead. Many applications are so rare that there's no market for a specific chip. The only thing you can do is use GPIO or make specific hardware yourself. Also, all (unused or potentially unused) pins are worth turning into GPIO, because that makes the part even more generic and reusable. Generic and reusable is very nearly the whole point of programmable chips; otherwise you would just make ASICs.
Some particularly suitable applications:
Reset parts (chips) in a system
Interface to switches, keypads, lights (all they have is one pin/signal!)
Controlling loads with relays or semiconductor switches (on-off)
Solenoid, motor, heater, valve...
Get interrupts from single signals
Thermostats, limit switches, level detectors, alarm devices...
BTW, the Parallax Propeller has practically nothing but GPIO pins. Peripherals are made in software. It works very well for many uses.

Testing Real Time Operating System for Hardness

I have an embedded device (Technologic TS-7800) that advertises real-time capabilities, but says nothing about 'hard' or 'soft'. While I wait for a response from the manufacturer, I figured it wouldn't hurt to test the system myself.
What are some established procedures to determine the 'hardness' of a particular device with respect to real time/deterministic behavior (latency and jitter)?
Being at college, I have access to some pretty neat hardware (good oscilloscopes and signal generators), so I don't think I'll run into any issues in terms of testing equipment, just expertise.
With that kind of equipment, it ought to be fairly easy to sync the o-scope to a steady clock, produce a spike each time the real-time system produces an output, and see how much that spike varies from center. The less the variation, the greater the hardness.
To clarify Bob's answer maybe:
Use the signal generator to generate a pulse at some varying frequency.
Random distribution across some range would be best.
Use the signal generator (trigger signal) to trigger the scope.
The RTOS has to respond, do its thing, and send an output pulse.
Feed the RTOS output into input 2 of the scope.
Set the scope to persist/collect mode.
Set the scope to start on A, stop on B, if you can.
In an ideal world, get it to measure the distribution for you. A LeCroy would.
Start with a much slower trace than you would expect. You need to be able to see slow outliers.
You'll be able to see the distribution.
Assuming a normal distribution the SD of the response time variation is the SOFTNESS.
(This won't really happen in practice, but if you don't get outliers it is reasonably useful. )
If there are outliers of large latency, then the RTOS is NOT very hard. It does not meet deadlines well, and so is unsuitable for hard real-time work.
Many RTOS-like things have a good left edge to the curve, sloping down like a 1/f curve.
That's indicative of combined jitters. The thing to look out for is spikes of slow response at the right end of the scope. Keep repeating the experiment with faster traces if there are no outliers, to get a good image of the slope. That should be good for some speculative conclusions in your paper.
If for your application, say a delta of 1uS is okay, and you measure 0.5us, it's all cool.
Anyway, you can publish the results (perhaps even publish in the formal sense, but certainly on the web).
Link from this Question to the paper when you've written it.
Hard real-time has more to do with how your software works than with the hardware on its own. When asking if something is hard real-time, the question must be applied to the complete system (hardware, RTOS and application). This means hard or soft real-time is a system design issue.
Under loading that exceeds the specification, even a hard real-time system will fail (hopefully with proper failure indication), while a soft real-time system with low loading can give hard real-time results. How much processing must happen in time, and how much pre/post-processing can be performed, is the real key to hard/soft real-time.
In some real-time applications some data loss is not a failure; it just has to stay below a certain level, again a system criterion.
You can generate inputs to the board and have a small application count them and check at what level data is going to be lost. But that gives you a rating specific to that system running that application. As soon as you start doing more processing your computational load increases and you now have a different hard real-time limit.
This board, running a bare-bones scheduler, will give great, predictable hard real-time performance for most tasks.
Running a full RTOS with heavy computational load you probably only get soft real-time.
Edit after comment
The most efficient and easiest way I have used to measure my software's performance (assuming you use a scheduler) is to use a free-running hardware timer on the board and timestamp the start and end of my cycle. Or, if you run a full RTOS, timestamp your acquisition and transition. Save your maximum time and keep a running average of the values over a second. If your average is around 50% and your maximum is within 20% of your average, you are OK. If not, it is time to refactor your application. As your application grows, the cycle time will grow, so you can monitor the effect of all your software changes on your cycle time.
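A sketch of that timestamping approach, assuming a hypothetical free-running microsecond counter timer_now_us() and a cycle that runs at roughly 1 kHz:

/* Timestamp the start and end of each cycle with a free-running hardware
 * timer, tracking the maximum and a running average over about a second.
 * timer_now_us() is a placeholder for whatever counter the board provides. */
#include <stdint.h>

extern uint32_t timer_now_us(void);      /* hypothetical free-running timer */

static uint32_t max_us;
static uint64_t sum_us;
static uint32_t samples;

void cycle_body(void);                   /* the real-time work under test */

void measured_cycle(void)
{
    uint32_t t0 = timer_now_us();
    cycle_body();
    uint32_t dt = timer_now_us() - t0;   /* wraps safely with unsigned math */

    if (dt > max_us)
        max_us = dt;
    sum_us += dt;
    samples++;

    if (samples >= 1000) {               /* roughly once per second at 1 kHz */
        uint32_t avg_us = (uint32_t)(sum_us / samples);
        /* Report avg_us and max_us via a background/terminal task here.
         * Rule of thumb from the answer: average near 50% of the budget
         * and maximum within ~20% of the average is healthy. */
        (void)avg_us;
        sum_us = 0;
        samples = 0;
        max_us = 0;
    }
}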
Another way is to use a hardware timer to generate a cyclic interrupt. If you are in time, reset the interrupt; if you miss the deadline, have the interrupt handler signal a failure. This will only give you a warning once your application is already taking too long, but it relies on hardware and interrupts, so you can't miss it.
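And a sketch of that cyclic-interrupt watchdog idea, with timer_irq_ack() and fail_safe() standing in for whatever the real hardware and system provide:

/* A hardware timer fires once per deadline period; the application "feeds"
 * the flag every cycle, and the ISR flags a failure if a period overran. */
#include <stdbool.h>

extern void timer_irq_ack(void);    /* hypothetical: clear the timer IRQ flag */
extern void fail_safe(void);        /* hypothetical: signal failure / safe state */

static volatile bool cycle_done;

void application_cycle(void)
{
    /* ... the real-time work ... */
    cycle_done = true;              /* deadline met for this period */
}

void TIMER_IRQHandler(void)         /* fires once per deadline period */
{
    timer_irq_ack();
    if (!cycle_done)
        fail_safe();                /* the previous period overran */
    cycle_done = false;
}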
These solutions also eliminate the requirement to hook up a scope to monitor the output, since the timing information can be displayed in any kind of terminal by a background task. If it is easy to monitor, you will monitor it regularly, so timing problems get solved as soon as they are introduced rather than at the end.
Hope this helps
I have the same board here at work. It's a slightly-modified 2.6 Kernel, I believe... not the real-time version.
I don't know that I've read anything in the docs yet that indicates that it is meant for strict RTOS work.
I think that this is not a hard real-time device, since it runs no RTOS.
I understand being a geek, but using an oscilloscope to test a computer with Ethernet/USB/other digital ports and a HUGE internal state (RAM) is both ineffective and unreliable.
Instead of watching waveforms, you can connect any PC to the output port and run a proper statistical analysis.
The established procedure (if the input signal is analog by nature) is to test system against several characteristic inputs - traditionally spikes, step functions and sine waves of different frequencies - and measure phase shift and variance for each input type. Worst case is then used in specifications of the system.
Again, if you are using standard ports, you can easily generate those on PC. If the input is truly analog, a separate DAC or simply a good sound card would be needed.
Now, that won't say anything about the OS being real-time - it could be running vanilla Linux or even Win CE and still produce good and stable results in those tests if the hardware is fast enough.
So, you need to simulate heavy and varying loads on the processor, memory and all ports, let it heat up and eat memory for a few hours, and then repeat the tests. If the latency stays constant, it's hard real-time. If it varies but, under any load and input signal type, never increases above the acceptable limit, it's soft. Otherwise, it's advertisement.
P.S.: The implication is that even for critical systems you don't actually need hard real-time if the hardware is fast enough.

What is the impact of virtualisation on cryptographically strong random number generators?

/dev/random and /dev/urandom use environmental noise to generate randomness.
With a virtualised server there can be multiple instances of an Operating System on one hardware configuration. These operating systems will all be sourcing their randomness from the same environmental noise.
Does this mean that, as a group, the random number generators' strength is reduced because all OS instances are basing their calculations on the same input? Or is the environmental noise partitioned out so that sharing doesn't occur?
If the latter is true, I can see this reducing the effectiveness of /dev/urandom, because it reuses its internal pool and, with less environmental input, its entropy is reduced.
/dev/random should be ok because it blocks until enough noise is acquired... unless of course the OS instances are all sharing the input.
So, the question: What is the impact of virtualisation on cryptographically strong random number generators, specifically those that use environmental noise?
I couldn't find any references quickly, but it would seem to me that the entropy is derived from the kernel data structures for the devices, not the actual devices themselves. Since these would be independent regardless of virtualization, I suspect the answer is not much.
[EDIT] After peeking at the kernel source (actually the patch history), it looks like Linux, at least, gathers entropy from keyboard presses, mouse activity, interrupt timing (but not all interrupts), and block device request finishing times. On a virtualized system, I suspect that mouse/keyboard events would be pretty rare and thus not contribute much to the entropy gathered. Presumably this would be offset by additional network I/O interrupt activity, but it's not clear. In this respect, I don't think it differs much from a non-VM server.
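If you want to see how starved a guest is, one quick (if crude) check is the kernel's own entropy estimate. Note that recent kernels report an essentially constant value here, so the comparison is most meaningful on older systems:

/* Print the kernel's entropy estimate, e.g. to compare a guest VM against
 * bare metal. On recent kernels this value is effectively constant. */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/proc/sys/kernel/random/entropy_avail", "r");
    if (!f) {
        perror("entropy_avail");
        return 1;
    }

    int bits = 0;
    if (fscanf(f, "%d", &bits) == 1)
        printf("kernel entropy estimate: %d bits\n", bits);
    fclose(f);
    return 0;
}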
Thanks.
From what I understand, a system that relies on network I/O for entropy is susceptible to man-in-the-middle attacks. I found the following article, which discusses appropriate sources of entropy. Its suggestion is to remove network I/O from the Linux kernel's entropy sources because of this susceptibility.
I think that means there is a possibility of exploiting the common hardware in a virtualised environment. The chance is increased if network I/O is used. Otherwise it is low enough not to be a concern for all but the most secure of applications. In such instances it is probably safer to host your own application anyway.
By definition, the randomness of a cryptographically strong PRNG should not be affected by virtualization. As you mention, the difference between /dev/random and /dev/urandom [ref: http://en.wikipedia.org/wiki/Urandom/] is that a read operation on /dev/random will block if the system has gathered insufficient entropy to produce the desired amount of random data. You may also write to /dev/random to mix your own data into the entropy pool.
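A minimal sketch of reading random bytes the way this answer describes (historically /dev/random could block until enough entropy was gathered, while /dev/urandom never blocks; recent kernels have largely unified the two):

/* Read 32 random bytes from the kernel's pool and print them in hex.
 * Swap in /dev/random to observe the (historical) blocking behaviour. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    unsigned char key[32];

    FILE *f = fopen("/dev/urandom", "rb");
    if (f == NULL) {
        perror("open");
        return EXIT_FAILURE;
    }
    if (fread(key, 1, sizeof key, f) != sizeof key) {
        perror("read");
        fclose(f);
        return EXIT_FAILURE;
    }
    fclose(f);

    for (size_t i = 0; i < sizeof key; i++)
        printf("%02x", key[i]);
    printf("\n");
    return EXIT_SUCCESS;
}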