How do I keep time without cumulative error? - embedded

How can you keep track of time in a simple embedded system, given that you need a fixed-point representation of the time in seconds, and that your time between ticks is not precisely expressible in that fixed-point format? How do you avoid cumulative errors in those circumstances?
This question is a reaction to this article on slashdot.
0.1 seconds cannot be neatly expressed as a binary fixed-point number, just as 1/3 cannot be neatly expressed as a decimal fixed-point number. Any binary fixed-point representation has a small error. For example, if there are 8 binary bits after the point (i.e. using an integer value scaled by 256), 0.1 times 256 is 25.6, which will be rounded to either 25 or 26, resulting in an error on the order of -2.3% or +1.6% respectively. Adding more binary bits after the point reduces the scale of this error, but cannot eliminate it.
With repeated addition, the error gradually accumulates.
How can this be avoided?

One approach is not to try to compute the time by repeated addition of this 0.1 seconds constant, but to keep a simple integer clock-tick count. This tick count can be converted to a fixed-point time in seconds as needed, usually using a multiplication followed by a division. Given sufficient bits in the intermediate representations, this approach allows for any rational scaling, and doesn't accumulate errors.
For example, if the current tick count is 1024, we can get the current time (in fixed point with 8 bits after the point) by multiplying that by 256, then dividing by 10 - or equivalently, by multiplying by 128 then dividing by 5. Either way, there is an error (the remainder discarded by the division), but the error is bounded: the remainder is always less than the divisor, so the result is off by less than one least-significant bit. There is no cumulative error.
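For instance, a minimal C sketch of that conversion (names are mine; assuming one tick = 0.1 s and a 24.8 fixed-point result):

    #include <stdint.h>

    /* Convert a raw tick count (one tick = 0.1 s) to fixed-point seconds
     * with 8 fractional bits. The cast to 64 bits gives the intermediate
     * multiplication enough headroom. */
    static uint32_t ticks_to_fixed_seconds(uint32_t ticks)
    {
        /* (ticks * 256) / 10 is the same as (ticks * 128) / 5 */
        return (uint32_t)(((uint64_t)ticks * 128u) / 5u);
    }

For a tick count of 1024 this returns 26214, i.e. 102.398 seconds in 24.8 fixed point, against a true value of 102.4.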

Another approach might be useful in contexts where integer multiplication and division are considered too costly (which should be getting pretty rare these days). It borrows an idea from Bresenham's line-drawing algorithm. You keep the current time in fixed point (rather than a tick count), but you also keep an error term. When the error term grows too large, you apply a correction to the time value, thus preventing the error from accumulating.
In the 8-bits-after-the-point example, the representation of 0.1 seconds is 25 (256/10) with an error term (remainder) of 6. At each step, we add 6 to our error accumulator. Based on this so far, the first two steps are...
Clock  Seconds  Error
-----  -------  -----
   25   0.0977      6
   50   0.1953     12
At the second step, the error value has overflowed - exceeded 10. Therefore, we increment the clock and subtract 10 from the error. This happens every time the error value reaches 10 or higher.
Therefore, the actual sequence is...
Clock  Seconds  Error  Overflowed?
-----  -------  -----  -----------
   25   0.0977      6
   51   0.1992      2  Yes
   76   0.2969      8
  102   0.3984      4  Yes
There is almost always an error (the clock is precisely correct only when the error value is zero), but the error is bounded by a small constant. There is no cumulative error in the clock value.
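A minimal sketch of this error-accumulator clock in C (hypothetical names, again assuming a 0.1 s tick and 8 fractional bits):

    #include <stdint.h>

    static uint32_t clock_fixed = 0;  /* seconds in 24.8 fixed point     */
    static uint8_t  error_accum = 0;  /* remainder, in units of 1/2560 s */

    /* Call once per 0.1 s hardware tick. 0.1 s is 25 remainder 6 in
     * 1/256ths, so add 25 each tick, accumulate the remainder, and add
     * a one-LSB correction whenever the accumulator reaches 10. */
    void clock_tick(void)
    {
        clock_fixed += 25;
        error_accum += 6;
        if (error_accum >= 10) {
            error_accum -= 10;
            clock_fixed += 1;
        }
    }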

A hardware-only solution is to arrange for the hardware clock ticks to run very slightly fast - precisely fast enough to compensate for cumulative losses caused by the rounding-down of the repeatedly added tick-duration value. That is, adjust the hardware clock tick speed so that the fixed-point tick-duration value is precisely correct.
This only works if there is only one fixed-point format used for the clock.

Why not keep a 0.1-second counter, and every ten ticks increment your seconds counter and wrap the 0.1 counter back to 0?

In this particular instance, I would have simply kept the time count in tenths of a second (or milliseconds, or whatever time scale is appropriate for the application). I do this all the time in small systems or control systems.
So a time value of 100 hours would be stored as 3_600_000 ticks - zero error (other than error that might be introduced by hardware).
The problems that are introduced by this simple technique are:
you need to account for the larger numbers. For example, you may have to use a 64-bit counter rather than a 32-bit counter
all your calculations need to be aware of the units used - this is the area that is most likely going to cause problems. I try to help with this problem by using time counters with a uniform unit. For example, this particular counter needs only 10 ticks per second, but another counter might need millisecond precision. In that case, I'd consider making both counters millisecond precision so they use the same units even though one doesn't really need that precision.
I've also had to play some other tricks like this with timers that aren't 'regular'. For example, I worked on a device that required a data acquisition to occur 300 times a second. The hardware timer fired once a millisecond. There's no way to scale the millisecond timer to get exactly 1/300th-of-a-second units. So we had to have logic that would perform the data acquisition every 3, 3, and then 4 ticks, repeating (3 + 3 + 4 = 10 ms, i.e. three acquisitions every 10 ms), to keep the acquisition from drifting.
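A rough sketch of that 3/3/4 scheduling in C (hypothetical names; the real device's logic differed in detail):

    #include <stdint.h>

    extern void start_acquisition(void);   /* hypothetical hook */

    /* Called from the 1 ms hardware timer interrupt. Triggers an
     * acquisition after 3, 3, then 4 ticks, repeating: 3 + 3 + 4 = 10 ms,
     * i.e. exactly 300 acquisitions per second with no drift. */
    void timer_1ms_isr(void)
    {
        static const uint8_t pattern[3] = { 3, 3, 4 };
        static uint8_t phase = 0;
        static uint8_t countdown = 3;

        if (--countdown == 0) {
            start_acquisition();
            phase = (uint8_t)((phase + 1) % 3);
            countdown = pattern[phase];
        }
    }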
If you need to deal with hardware time error, then you need more than one time source and use them together to keep the overall time in sync. Depending on your needs this can be simple or pretty complex.

Something I've seen implemented in the past: the increment value can't be expressed precisely in the fixed-point format, but it can be expressed as a fraction. (This is similar to the "keep track of an error value" solution.)
Actually in this case the problem was slightly different, but conceptually similar—the problem wasn't a fixed-point representation as such, but deriving a timer from a clock source that wasn't a perfect multiple. We had a hardware clock that ticks at 32,768 Hz (common for a watch crystal based low-power timer). We wanted a millisecond timer from it.
The millisecond timer should increment every 32.768 hardware ticks. The first approximation is to increment every 33 hardware ticks, for a nominal 0.7% error. But, noting that 0.768 is 768/1000, or 96/125, you can do this:
Keep a variable for the "fractional" value; start it at 0.
Wait for the hardware timer to count 32.
While true:
    Increment the millisecond timer.
    Add 96 to the "fractional" value.
    If the "fractional" value is >= 125, subtract 125 from it and wait for the hardware timer to count 33.
    Otherwise (the "fractional" value is < 125), wait for the hardware timer to count 32.
There will be some short term "jitter" on the millisecond counter (32 vs 33 hardware ticks) but the long-term average will be 32.768 hardware ticks.
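A compact C rendering of the same idea (hypothetical HAL call; this is a sketch, not the original firmware):

    #include <stdint.h>

    extern void set_next_compare_delta(uint8_t ticks);  /* hypothetical */

    volatile uint32_t millis = 0;

    /* Compare-match interrupt. Reprograms the next interval to 32 or 33
     * crystal ticks so the long-term average is exactly 32.768 ticks
     * per millisecond (32768 Hz / 1000 Hz). */
    void compare_match_isr(void)
    {
        static uint8_t frac = 0;   /* in 1/125ths of a tick */

        millis++;
        frac += 96;                /* 0.768 == 96/125 */
        if (frac >= 125) {
            frac -= 125;
            set_next_compare_delta(33);
        } else {
            set_next_compare_delta(32);
        }
    }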

Related

Why does my simulation compute only a certain number of digits before only changing the power magnitude?

I am using another person's code to try and demonstrate this problem in physics:
a large mass M collides with a smaller mass m, which then rebounds off a wall and returns to collide with the larger mass M again. This process repeats until the larger mass has turned around and the sign of its velocity flips. If the larger block is 16*100^n times more massive than the smaller block (where n is an integer), the number of collisions between the large block and the small block gives the first (n+1) digits of pi. For example, when the block is 1600 times bigger there are 31 collisions; if it is 16,000,000 times bigger there are 3141 collisions.
I did my code in vPython and it works, but only up to a point. I was able to get 31415 collisions with the original code. When I make N=5 the simulation completely fails and the screen turns black. Apparently this is because the time step is not small enough. So I tried making it smaller to see if it could compute more digits, and it does: I was able to count 314159 collisions by changing the time step to 0.00001. But then I input N=6 and again it collapses. So I reduce the time step to 0.000001 and it works, but it only gives me the number 3.14159e+6 without the extra digit of pi.
Can someone please tell me why this is? Why do I not get the next digit? Is my computer not powerful enough? I do not need to actually fix this problem, that is not the point; I just need to understand the limitations of my simulation and my computer, and why it cannot compute the next digit.

How to generate timestamps from the 33-bit PCR count

So I have been trying to wrap my head around mpeg-ts timing, and the PCR (program clock reference). I understand that this is used for video/audio synchronisation at the decoder.
My basic understanding so far is that everything is driven by a 27 MHz clock (oscillator). A counter driven by this clock counts from 0 to 299 and keeps repeating. Each time this rollover from 299 back to 0 occurs, a 33-bit PCR counter is incremented by 1. In effect, the 33-bit PCR counter is itself running at a rate of 90 kHz. So another way of saying this is that the 27 MHz clock is divided by 300, giving us a 90 kHz clock.
This 90 kHz clock is then used for the 33-bit PCR counter.
I understand that historically 90 kHz was chosen because mpeg-1 used a 90 kHz timebase. [see source here]
Anyway... I have read that the PCR 33-bit count values range from 0x000000000 all the way through to 0x1FFFFFFFF. And according to this, it shows what these values mean in terms of time as we humans understand it (Hours, Mins, Secs, etc):
00:00:00.000 (0x000000000)
to
26:30:43.717 (0x1FFFFFFFF)
So ultimately, my question is about how these hex values get translated into those timestamps. What would the equations be if someone gave me a hex value and I needed to reproduce the timestamp?
I would appreciate any help :)
==========
I am closer to an answer myself. Looking at the range from 0x000000000 to 0x1FFFFFFFF, this is basically 0 to 8589934591 in decimal. Since the PCR clock is 90 kHz, to get the number of seconds it takes to go from 0 to 8589934591 we can do 8589934591 / 90000, which gives us 95443.71768 seconds.
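Putting that into code, a sketch of the full conversion in C (the hour/minute/second split is plain arithmetic; the names are mine):

    #include <stdint.h>
    #include <stdio.h>

    /* Convert a 33-bit PCR base value (90 kHz ticks) to "HH:MM:SS.mmm". */
    void pcr_base_to_timestamp(uint64_t pcr_base, char out[16])
    {
        uint64_t total_ms = (pcr_base * 1000u) / 90000u;  /* 90 ticks per ms */
        unsigned ms  = (unsigned)(total_ms % 1000u);
        uint64_t s   = total_ms / 1000u;
        unsigned sec = (unsigned)(s % 60u);
        unsigned min = (unsigned)((s / 60u) % 60u);
        unsigned hr  = (unsigned)(s / 3600u);
        snprintf(out, 16, "%02u:%02u:%02u.%03u", hr, min, sec, ms);
    }

Passing 0x1FFFFFFFF produces "26:30:43.717", matching the range quoted above.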
Unless you are creating a strict bitrate encoder for broadcast over satellite or terrestrial radio, the PCR doesn't matter that much.
Scenario:
You are broadcasting to a wireless receiver with no return channel. The receiver has a clock running at what it thinks is 90,000 ticks per second. Your encoder is also running at 90,000 ticks per second. How can you be sure the receiver and the broadcaster have the EXACT same definition of a second? Maybe one side is running a little fast or slow. To keep the clocks in sync, the encoder sends its current time occasionally; this value is the PCR. For example, if you are broadcasting at 15,040,000 bits per second, the receiver receives a 188-byte packet every 0.0001 seconds. Every now and then (every 100 ms or so) the encoder will insert its current time. The receiver can compare this time to its internal clock and determine whether it is running faster or slower than the broadcast encoder. To keep the strict 10,000 packets per second (15,040,000 / (188*8) = 10,000), the encoder will insert null packets. On the internet, the null packets take bandwidth and have no value, so they are eliminated. Hence the PCR has almost no value anymore, because its time is no longer relative to the reception rate.
To answer your question: set the 27 MHz extension value to zero, and use a recent DTS minus a small static amount (like 100 ms) for the 90 kHz value.
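As a sketch of that suggestion (assuming the standard adaptation-field layout of a 33-bit base, 6 reserved bits and a 9-bit extension; names are mine):

    #include <stdint.h>

    /* Build the 6-byte PCR field from a 90 kHz DTS, per the suggestion
     * above: base = DTS minus 100 ms (9000 ticks), extension = 0. */
    void write_pcr_from_dts(uint64_t dts_90khz, uint8_t pcr[6])
    {
        uint64_t base = (dts_90khz - 9000u) & 0x1FFFFFFFFull;
        uint16_t ext  = 0;                    /* 27 MHz remainder, 0..299 */

        pcr[0] = (uint8_t)(base >> 25);
        pcr[1] = (uint8_t)(base >> 17);
        pcr[2] = (uint8_t)(base >> 9);
        pcr[3] = (uint8_t)(base >> 1);
        pcr[4] = (uint8_t)(((base & 1u) << 7) | 0x7Eu | (ext >> 8));
        pcr[5] = (uint8_t)ext;
    }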

Computing the approximate LCM of a set of numbers

I'm writing a tone generator program for a microcontroller.
I use a hardware timer to trigger an interrupt and check whether I need to set the signal high or low at a particular moment for a given note.
I'm using pretty limited hardware, so the slower I run the timer the more time I have to do other stuff (serial communication, loading the next notes to generate, etc.).
I need to find the frequency at which I should run the timer to have an optimal result, which is, generate a frequency that is accurate enough and still have time to compute the other stuff.
To achieve this, I need to find an approximate LCM of all the frequencies I need to play (within some percentage; the higher the frequency, the larger its error can be before a human ear notices it): this value will be the frequency at which to run the hardware timer.
Is there a simple enough algorithm to compute such a number? (EDIT: I shall clarify "simple enough": fast enough to run in a time t << 1 sec for fewer than 50 values on an 8-bit AVR microcontroller, and implementable in a few dozen lines at worst.)
LCM(a,b,c) = LCM(LCM(a,b),c)
Thus you can compute LCMs in a loop, bringing in frequencies one at a time.
Furthermore,
LCM(a,b) = a*b/GCD(a,b)
and GCDs are easily computed without any factoring by using the Euclidean algorithm.
To make this an algorithm for approximate LCMs, do something like round lower frequencies to multiples of 10 Hz and higher frequencies to multiples of 50 Hz.
Another idea that is a bit more principled would be to first convert each frequency to an octave (I think the formula is f maps to log(f/16)/log(2)). This gives you a number between 0 and 10 (or slightly higher, but anything above 10 is almost beyond human hearing, so you could perhaps round down). You could break 0-10 into, say, 50 intervals 0.0, 0.2, 0.4, ... and for each one compute ahead of time the corresponding frequency (which would be f = 16*2^o, where o is the octave). For each of these, go through by hand once and for all and find a nearby round number that has smallish prime factors. For example, if o = 5.4 then f = 675.58, which rounds to 675; if o = 5.8 then f = 891.44, which rounds to 890. Assemble these 50 numbers into a sorted array, and use binary search to replace each of your frequencies by the closest frequency in the array.
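A sketch of the basic fold in C, using the simple rounding rule from the first sentence above (the 1 kHz split point is my own arbitrary choice, and the 32-bit result can overflow if the rounded frequencies share few factors):

    #include <stdint.h>

    /* Euclid's algorithm. */
    static uint32_t gcd(uint32_t a, uint32_t b)
    {
        while (b != 0) {
            uint32_t t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    static uint32_t lcm(uint32_t a, uint32_t b)
    {
        return (a / gcd(a, b)) * b;
    }

    /* Approximate LCM of a set of tone frequencies: round each one to a
     * multiple of 10 Hz (below 1 kHz) or 50 Hz (above), then fold. */
    uint32_t approx_lcm(const uint16_t *freqs, uint8_t n)
    {
        uint32_t result = 1;
        for (uint8_t i = 0; i < n; i++) {
            uint32_t step = (freqs[i] < 1000u) ? 10u : 50u;
            uint32_t f = ((freqs[i] + step / 2u) / step) * step;
            if (f == 0)
                f = step;
            result = lcm(result, f);
        }
        return result;
    }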
An idea:
project the frequency range to a smaller interval
Let's say your frequency range is from 20 to 20000 Hz and you aim for 2% accuracy; you'll calculate over a 1-50 range. It has to be a non-linear transformation to keep the accuracy for lower frequencies. The goal is both to compute the result faster and to get a smaller LCM.
Use a prime factors table to easily compute the LCM on that reduced range
Store the pre-calculated prime-factor powers in an array (size about 50x7 for range 1-50), and then use it for the LCM: the LCM of a set of numbers is the product of the highest power of each prime factor that appears in any of them. It's easy to code and blazingly fast to run.
Do the first step in reverse to get the final number.
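A sketch of the LCM step in C (here I factor on the fly instead of storing the table, and use all 15 primes below 50 so every value in the reduced 1-50 range factors cleanly; values are assumed to lie in 1..50):

    #include <stdint.h>

    #define NPRIMES 15
    static const uint8_t primes[NPRIMES] =
        { 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47 };

    /* Exponent of prime p in v (tiny loop, since v <= 50). In a real build
     * these exponents would live in the precomputed array described above. */
    static uint8_t exp_of(uint8_t v, uint8_t p)
    {
        uint8_t e = 0;
        while (v > 1 && v % p == 0) {
            v /= p;
            e++;
        }
        return e;
    }

    /* LCM over the reduced range: take the highest power of each prime. */
    uint32_t lcm_small(const uint8_t *values, uint8_t n)
    {
        uint32_t result = 1;
        for (uint8_t p = 0; p < NPRIMES; p++) {
            uint8_t max_e = 0;
            for (uint8_t i = 0; i < n; i++) {
                uint8_t e = exp_of(values[i], primes[p]);
                if (e > max_e)
                    max_e = e;
            }
            for (uint8_t e = 0; e < max_e; e++)
                result *= primes[p];
        }
        return result;
    }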

How would you most efficiently store latitude and longitude data?

This question comes from a homework assignment I was given. You can base your storage system off of one of the three following formats:
DD MM SS.S
DD MM.MMM
DD.DDDDD
You want to maximize the amount of data you can store by using as few bytes as possible.
My solution is based off the first format. I used 3 bytes for latitude: 8 bits for the DD (-90 to 90), 6 bits for the MM (0-59), and 10 bits for the SS.S (0-59.9). I then used 25 bits for the longitude: 9 bits for the DDD (-180 to 180), 6 bits for the MM, and 10 for the SS.S. This solution doesn't fit nicely on a byte border, but I figured the next reading can be stored immediately following the previous one, and 8 readings would use only 49 bytes.
I'm curious what methods others can come up with. Is there a more efficient method of storing this data? As a note, I considered offset-based storage, but the problem gave no indication of how much the values may change between readings, so I'm assuming any change is possible.
Your suggested method is not optimal. You are using 10 bits (1024 possible values) to store a value in the range (0..599). This is a waste of space.
If you'll use 3 bytes for latitude, you should map the range [0, 2^24 - 1] to the range [-90, 90]. Each of the 2^24 values then represents 180/2^24 degrees, which is about 0.039 seconds of arc.
If you want only 0.1 second accuracy, you'll need 23 bits for latitudes and 24 bits for longitudes (you'll get 0.077 seconds accuracy). That's 47 bit total instead of your 49 bits, with better accuracy.
Can we do even better?
The exact number of bits needed for 0.1-second accuracy is log2(180*60*60*10 * 360*60*60*10) < 46.256, which means you could use 46,256 bits (5,782 bytes) to store 1000 (lat, lon) pairs, but the mathematics involved would require dealing with very large integers.
Can we do even better?
It depends. If your data set has concentrations, you can store only some points and relative distances from these points, using less bits. Clustering algorithms should be used.
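For concreteness, a C sketch of the 24-bit linear mapping described above (names are mine; longitude would use the same idea over [-180, 180], ideally with one more bit):

    #include <stdint.h>

    /* Map latitude in [-90, +90] degrees onto the full 24-bit range.
     * Step size is 180 / 2^24 degrees, roughly 0.04 arcseconds. */
    uint32_t encode_lat(double lat_deg)
    {
        double t = (lat_deg + 90.0) / 180.0;            /* 0..1 */
        return (uint32_t)(t * 16777215.0 + 0.5) & 0xFFFFFFu;
    }

    double decode_lat(uint32_t v)
    {
        return ((double)v / 16777215.0) * 180.0 - 90.0;
    }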
Sticking to existing technology:
If you used half-precision floating-point numbers to store only the DD.DDDDD data, you could be a lot more space-efficient, but you'd have to accept the limited precision of the format (a 10-bit significand and an exponent bias of 15), which means the coordinates stored might not be exact, only an approximation of the original value.
This is due to the way floating-point numbers are stored: essentially, a normalized significand is scaled by a power of two given by the exponent to produce the number, instead of a single scaled integer being stored (the way you calculated the numbers for your solution).
The next larger commonly used floating-point format uses 32 bits (the type "float" in many programming languages) - still efficient, but larger than your custom format.
If, however, you designed your own custom floating-point type and gradually added more bits, your results would become more exact, and it could STILL be more efficient than the solution you first found. Just play around with the number of bits used for the significand and the exponent, and find out how close your floating-point approximations come to the desired result in degrees!
Well, if this is for a large number of readings, then you may try a differential approach. Start with an absolute location, and then start saving incremental changes, which should ideally require less bits, depending on the nature of the changes. This is effectively compressing the stream. But somehow I don't think that's what this homework is about.

Difference between Logarithmic and Uniform cost criteria

I've got some problems understanding the difference between logarithmic (LCC) and uniform (UCC) cost criteria, and also how to use them in calculations.
Could someone please explain the difference between the two and perhaps show how to calculate the complexity of a problem like A+B*C?
(Yes this is part of an assignment =) )
Thx for any help!
/Marthin
Uniform cost criteria assign a constant cost to every machine operation regardless of the number of bits involved, while logarithmic cost criteria assign a cost to every machine operation proportional to the number of bits involved.
Problem size influences complexity. Since complexity depends on the size of the problem, we define complexity to be a function of problem size.
Definition: Let T(n) denote the complexity for an algorithm that is applied to a problem of size n.
The size n of a problem instance I is the number of (binary) bits used to represent the instance, so problem size is the length of the binary description of the instance. This is called the logarithmic cost criterion.
Uniform (unit) cost criterion:
If you assume that
- every computer instruction takes one time unit,
- every register is one storage unit,
- and a number always fits in a register,
then you can use the number of inputs as the problem size, since the length of the input (in bits) will be a constant times the number of inputs.
Uniform cost criteria assume that every instruction takes a single unit of time and that every register requires a single unit of space.
Logarithmic cost criteria assume that every instruction takes a number of time units proportional to the length (in bits) of its operands, i.e., logarithmic in their magnitude, and that every register requires space proportional to the length of the value it holds.
In simpler terms, what this means is that uniform cost criteria count the number of operations, and logarithmic cost criteria count the number of bit operations.
For example, suppose we have an 8-bit adder.
If we're using uniform cost criteria to analyze the run-time of the adder, we would say that an addition takes a single time unit; i.e., T(n) = 1.
If we're using logarithmic cost criteria to analyze the run-time of the adder, we would say that an addition takes lg n time units; i.e., T(n) = lg n, where n is the largest value the adder has to handle (in this example, n would be 256). Thus, T(n) = lg 256 = 8.
More specifically, say we're adding 200 to 32. To perform the addition, we have to add the binary bits together in the 1s column, the 2s column, the 4s column, and so on (columns meaning the bit positions). An 8-bit value covers the numbers 0 through 255; this is where logarithms come into the analysis, since lg 256 = 8. So to add the two numbers, we have to perform addition on 8 columns. Logarithmic cost criteria say that each of these 8 column additions takes a single unit of time. Uniform cost criteria say that the entire set of 8 column additions takes a single unit of time.
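A toy illustration of the difference (my own sketch, not part of the original answer): a ripple-carry addition that counts the single-bit additions the logarithmic criterion charges for, whereas the uniform criterion would charge a single unit for the whole call.

    #include <stdint.h>
    #include <stdio.h>

    /* Add two 8-bit values one bit column at a time, counting the
     * single-bit (full-adder) operations performed. */
    static unsigned ripple_add(uint8_t a, uint8_t b, uint8_t *sum)
    {
        unsigned bit_ops = 0;
        uint8_t carry = 0, result = 0;
        for (unsigned i = 0; i < 8; i++) {
            uint8_t abit = (a >> i) & 1u;
            uint8_t bbit = (b >> i) & 1u;
            uint8_t s    = abit ^ bbit ^ carry;
            carry  = (uint8_t)((abit & bbit) | (carry & (abit ^ bbit)));
            result |= (uint8_t)(s << i);
            bit_ops++;
        }
        *sum = result;
        return bit_ops;
    }

    int main(void)
    {
        uint8_t sum;
        unsigned ops = ripple_add(200u, 32u, &sum);
        /* Uniform cost: 1 time unit. Logarithmic cost: 'ops' time units. */
        printf("200 + 32 = %u using %u single-bit additions\n", sum, ops);
        return 0;
    }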
Similar analysis can be made in terms of space as well. Registers either take up a constant amount of space (under uniform cost criteria) or an amount of space proportional to the number of bits stored (under logarithmic cost criteria).
I think you should do some research on Big O notation... http://en.wikipedia.org/wiki/Big_O_notation#Orders_of_common_functions
If there is a part of the description you find difficult edit your question.