What is the meaning of round complexity in distributed network algorithms?

Can someone explain the meaning of time complexity in distributed network algorithms? The definition given in the DNA book by Pandurangan is as follows:
"In the synchronous model, time is measured by the number of clock ticks called rounds, i.e., processors are said to compute in “lock step”. When running a distributed algorithm, different nodes might take a different number of rounds to finish. In that case, the maximum time needed over all nodes is taken as the time complexity"
Can you explain the above definition with a simple example?

Let's say you want to compute the sum of a really large list (say, 1 billion numbers). To speed things up, you use 4 threads, each computing the sum of 250 million numbers; the four partial sums can then be added to get the total. If the time taken for each thread to run is:
thread1 takes 43 seconds
thread2 takes 39 seconds
thread3 takes 40 seconds
thread4 takes 41 seconds
Then you would say that the runtime of this operation is bounded by the thread that takes the longest, in this case 43 seconds. It doesn't matter if the other threads take 2 seconds; the longest task determines the runtime of your algorithm.
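To make that concrete, here is a minimal sketch in C using POSIX threads (the slice sizes and the summed data are made up for illustration): each thread sums its own slice, and because the main thread has to join all of them before adding the partial sums, the wall-clock time of the whole computation is the maximum of the per-thread times, not their sum.

/* compile with: cc -pthread sum.c */
#include <stdio.h>
#include <pthread.h>

#define N_THREADS 4
#define N_PER_THREAD 1000000UL  /* stand-in for the 250 million numbers per thread */

struct slice { unsigned long start, count; unsigned long long sum; };

static void *sum_slice(void *arg)
{
    struct slice *s = arg;
    s->sum = 0;
    for (unsigned long i = 0; i < s->count; i++)  /* sum start .. start+count-1 as stand-in data */
        s->sum += s->start + i;
    return NULL;
}

int main(void)
{
    pthread_t tid[N_THREADS];
    struct slice slices[N_THREADS];
    unsigned long long total = 0;

    for (int i = 0; i < N_THREADS; i++) {
        slices[i].start = (unsigned long)i * N_PER_THREAD;
        slices[i].count = N_PER_THREAD;
        pthread_create(&tid[i], NULL, sum_slice, &slices[i]);
    }
    for (int i = 0; i < N_THREADS; i++) {
        /* main() blocks here until the slowest thread finishes, so the total
           runtime is max(per-thread times), like the 43 seconds above */
        pthread_join(tid[i], NULL);
        total += slices[i].sum;
    }
    printf("total = %llu\n", total);
    return 0;
}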

Related

How to generate timestamps from the 33-bit PCR count

So I have been trying to wrap my head around MPEG-TS timing and the PCR (program clock reference). I understand that this is used for video/audio synchronisation at the decoder.
My basic understanding so far is that everything is driven by a 27 MHz clock (oscillator). This clock drives a counter that counts from 0 to 299 and keeps repeating. Each time this rollover from 299 back to 0 occurs, a 33-bit PCR counter is incremented by 1. In effect, the 33-bit PCR counter is therefore running at a rate of 90 kHz. Another way of saying this is that the 27 MHz clock is divided by 300, giving us a 90 kHz clock.
This 90 kHz clock is then used for the 33-bit PCR counter.
I understand that historically 90 kHz was chosen because MPEG-1 used a 90 kHz timebase. [see source here]
Anyway... I have read that the PCR 33-bit count values range from 0x000000000 all the way through to 0x1FFFFFFFF. According to this, these values correspond to the following times as we humans understand them (hours, minutes, seconds, etc.):
00:00:00.000 (0x000000000)
to
26:30:43.717 (0x1FFFFFFFF)
So ultimately, my question is: how do these hex values get translated into those timestamps? What would the equations be if someone gave me a hex value and I needed to reproduce the timestamp?
I would appreciate any help :)
==========
I am closer to an answer myself. Looking at the range from 0x000000000 to 0x1FFFFFFFF, this is 0 to 8589934591 in decimal. Since the PCR clock is 90 kHz, the number of seconds it takes to count from 0 to 8589934591 is 8589934591 / 90000 ≈ 95443.71768 seconds, which is about 26.5 hours and matches the 26:30:43.717 figure above.
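Building on that, here is a small C sketch of the conversion from a 33-bit PCR base value (90 kHz ticks) to an HH:MM:SS.mmm timestamp; the function name is mine and not part of any MPEG library:

#include <stdio.h>
#include <stdint.h>

/* Convert a 33-bit PCR base (90 kHz ticks) to HH:MM:SS.mmm */
static void pcr_base_to_timestamp(uint64_t pcr_base)
{
    uint64_t ms_total = (pcr_base * 1000ULL) / 90000ULL; /* total milliseconds */
    uint64_t hours   = ms_total / 3600000ULL;
    uint64_t minutes = (ms_total / 60000ULL) % 60;
    uint64_t seconds = (ms_total / 1000ULL) % 60;
    uint64_t millis  = ms_total % 1000;
    printf("%02llu:%02llu:%02llu.%03llu\n",
           (unsigned long long)hours, (unsigned long long)minutes,
           (unsigned long long)seconds, (unsigned long long)millis);
}

int main(void)
{
    pcr_base_to_timestamp(0x000000000ULL); /* prints 00:00:00.000 */
    pcr_base_to_timestamp(0x1FFFFFFFFULL); /* prints 26:30:43.717 */
    return 0;
}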
Unless you are creating a strict bitrate encoder for broadcast over satellite or terrestrial radio, the PCR doesn't matter that much.
Scenario:
You are broadcasting to a wireless receiver with no return channel. The receiver has a clock running at what it thinks is 90,000 ticks per second. Your encoder is also running at 90,000 ticks per second. How can you be sure the receiver and the broadcaster have the EXACT same definition of a second? Maybe one side is running a little fast or slow. To keep the clocks in sync, the encoder occasionally sends its current time; this value is the PCR. For example, if you are broadcasting at 15,040,000 bits per second, the receiver receives a 188-byte packet every 0.0001 seconds. Every now and then (say every 100 ms) the encoder will insert its current time. The receiver can compare this time to its internal clock and determine whether it is running faster or slower than the broadcast encoder. To keep the strict 10,000 packets per second (15,040,000 / (188 * 8) = 10,000), the encoder will insert null packets. On the internet, the null packets take bandwidth and have no value, so they are eliminated. Hence the PCR has almost no value anymore, because its time is no longer tied to the reception rate.
To answer your question: set the 27 MHz extension to zero, and use a recent DTS minus a small static amount (like 100 ms) for the 90 kHz base.
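In code form, that suggestion might look like the following sketch; the function and variable names are mine and purely illustrative, with the base in 90 kHz units and the extension in the usual 0-299 range:

#include <stdint.h>

/* Illustrative only: derive a PCR from a recent DTS as suggested above. */
void pcr_from_dts(uint64_t dts, uint64_t *pcr_base, uint32_t *pcr_ext)
{
    *pcr_base = dts - 9000;  /* 9000 ticks at 90 kHz = 100 ms behind the DTS */
    *pcr_ext  = 0;           /* the 27 MHz extension (0-299) simply set to zero */
}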

How does the kernel scheduler maintain time quanta precision with timer interrupts?

From my reading, there's a timer interrupt triggered by the hardware that executes fairly often and transfers control from a running process back to the kernel/scheduler, which is then able to determine whether the running process has exceeded its time quantum and, if so, run another task.
This seems imprecise.
For example:
If a timer interrupt fires every 1 unit, and the scheduler determines a CPU-bound process's time quantum to be 1.5 units, then that process would actually get 2 units of CPU time.
Or does the scheduler only hand out time quanta to processes in whole multiples of the timer interrupt?
Linux's scheduler (CFS) allocates time slices to threads by first defining a time period in which every thread will run once. This time period is computed by the sched_slice() function and depends on the number of threads on the CPU, and 2 variables that can be set from user space (sysctl_sched_latency and sysctl_sched_min_granularity):
If the number of threads is greater than sysctl_sched_latency / sysctl_sched_min_granularity, then the period will be nr_threads * sysctl_sched_min_granularity; otherwise the period will be sysctl_sched_latency.
For example, on my laptop, I have the following values:
% cat /proc/sys/kernel/sched_latency_ns
18000000
% cat /proc/sys/kernel/sched_min_granularity_ns
2250000
Therefore, sysctl_sched_latency / sysctl_sched_min_granularity = 8. Now, if I have 8 or fewer threads on a CPU, the scheduling period will be 18,000,000 nanoseconds (i.e. 18 milliseconds), shared among them; otherwise, each thread will be allocated 2,250,000 ns (2.25 ms).
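As a rough sketch of that period computation in C (this mirrors the description above rather than the exact kernel source; the constants are the values from my laptop):

#include <stdio.h>
#include <stdint.h>

/* Values read from /proc/sys/kernel above, in nanoseconds. */
static const uint64_t sysctl_sched_latency         = 18000000;  /* 18 ms   */
static const uint64_t sysctl_sched_min_granularity = 2250000;   /* 2.25 ms */

/* Scheduling period as described above: if there are "too many" threads,
 * stretch the period so each still gets at least the minimum granularity. */
static uint64_t sched_period(unsigned nr_threads)
{
    unsigned nr_latency =
        (unsigned)(sysctl_sched_latency / sysctl_sched_min_granularity); /* 8 */
    if (nr_threads > nr_latency)
        return (uint64_t)nr_threads * sysctl_sched_min_granularity;
    return sysctl_sched_latency;
}

int main(void)
{
    printf("4 threads:  period = %llu ns\n", (unsigned long long)sched_period(4));
    printf("12 threads: period = %llu ns\n", (unsigned long long)sched_period(12));
    return 0;
}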
Now, with those values in mind, if we look at the tick frequency (defined at compile time of the kernel) with this command:
% zcat /proc/config.gz | grep CONFIG_HZ
# CONFIG_HZ_PERIODIC is not set
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
CONFIG_HZ_300=y
# CONFIG_HZ_1000 is not set
CONFIG_HZ=300
So, on my laptop, I have 300 ticks per second, which means a tick roughly every 3.3 ms. That means that in my case, with more than 8 threads on a CPU, I will lose a little bit of precision in my time slices (a thread that should run for 2.25 ms will run for about 3.3 ms), but I could fix that by recompiling my kernel with a higher tick frequency.
However, it should be noted that this is actually not a problem because, as indicated by its name, CFS (Completely Fair Scheduler) aims at being fair, which will be the case here.

Computing the approximate LCM of a set of numbers

I'm writing a tone generator program for a microcontroller.
I use a hardware timer to trigger an interrupt and check whether I need to set the signal high or low at a particular moment for a given note.
I'm using pretty limited hardware, so the slower I run the timer the more time I have to do other stuff (serial communication, loading the next notes to generate, etc.).
I need to find the frequency at which I should run the timer to get an optimal result, which is: generate frequencies that are accurate enough while still leaving time to compute the other stuff.
To achieve this, I need to find an approximate LCM of all the frequencies I need to play (approximate within some percentage, since the higher the frequency, the larger its error can be before a human ear notices it): this value will be the frequency at which to run the hardware timer.
Is there a simple enough algorithm to compute such a number? (EDIT: I shall clarify "simple enough": fast enough to run in a time t << 1 s for fewer than 50 values on an 8-bit AVR microcontroller, and implementable in a few dozen lines at worst.)
LCM(a,b,c) = LCM(LCM(a,b),c)
Thus you can compute LCMs in a loop, bringing in frequencies one at a time.
Furthermore,
LCM(a,b) = a*b/GCD(a,b)
and GCDs are easily computed without any factoring by using the Euclidean algorithm.
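A minimal sketch of that exact version in C; the frequency values are made up, and the rounding idea from the next paragraph would be applied to each frequency before folding it in:

#include <stdio.h>
#include <stdint.h>

/* Euclidean GCD, then LCM(a,b) = a/gcd(a,b)*b, folded over a list. */
static uint64_t gcd(uint64_t a, uint64_t b)
{
    while (b != 0) {
        uint64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

static uint64_t lcm(uint64_t a, uint64_t b)
{
    return (a / gcd(a, b)) * b;
}

int main(void)
{
    /* Example frequencies, already rounded to "nice" values. */
    uint32_t freqs[] = { 440, 660, 880, 990 };
    uint64_t acc = 1;
    for (unsigned i = 0; i < sizeof freqs / sizeof freqs[0]; i++)
        acc = lcm(acc, freqs[i]);
    printf("LCM = %llu\n", (unsigned long long)acc);  /* 7920 */
    return 0;
}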
To make this an algorithm for approximate LCMs, do something like rounding lower frequencies to multiples of 10 Hz and higher frequencies to multiples of 50 Hz. Another idea that is a bit more principled would be to first convert each frequency to an octave (I think the formula is f maps to log(f/16)/log(2)). This gives you a number between 0 and 10 (or slightly higher, but anything above 10 is almost beyond human hearing, so you could perhaps round down). You could break 0-10 into, say, 50 intervals (0.0, 0.2, 0.4, ...) and for each one compute ahead of time the frequency corresponding to that octave (f = 16*2^o where o is the octave). For each of these, go through by hand once and for all and find a nearby round number that has a few smallish prime factors. For example, if o = 5.4 then f = 675.58, which rounds to 675; if o = 5.8 then f = 891.44, which rounds to 890. Assemble these 50 numbers into a sorted array, and use binary search to replace each of your frequencies by the closest frequency in the array.
An idea:
project the frequency range to a smaller interval
Let's say your frequency range is from 20 to 20000 Hz and you aim for 2% accuracy: you'll calculate over a 1-50 range. It has to be a non-linear transformation to keep the accuracy for lower frequencies. The goal is both to compute the result faster and to end up with a smaller LCM.
Use a prime factors table to easily compute the LCM on that reduced range
Store the pre-calculated prime-factor powers in an array (about 50x7 for the 1-50 range), and then use it for the LCM: the LCM of a set of numbers is the product of the highest power of each prime factor appearing in any of them (see the sketch after these steps). It's easy to code and blazingly fast to run.
Do the first step in reverse to get the final number.
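A possible sketch of the prime-factor-table LCM (step 2) in C, assuming the frequencies have already been projected onto the 1-50 range; the table is built at startup here for brevity, whereas on an AVR it would more likely be precomputed and stored in flash:

#include <stdio.h>
#include <stdint.h>

#define MAX_VAL   50
#define N_PRIMES  15

/* Primes up to 50; any value in 1..50 factors over these. */
static const uint8_t primes[N_PRIMES] =
    { 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47 };

/* exponents[v][i] = exponent of primes[i] in the factorisation of v. */
static uint8_t exponents[MAX_VAL + 1][N_PRIMES];

static void build_table(void)
{
    for (unsigned v = 1; v <= MAX_VAL; v++) {
        unsigned rest = v;
        for (unsigned i = 0; i < N_PRIMES; i++)
            while (rest % primes[i] == 0) {
                exponents[v][i]++;
                rest /= primes[i];
            }
    }
}

/* LCM of a set of values in 1..50: take the highest power of each prime. */
static uint64_t lcm_of_set(const uint8_t *vals, unsigned n)
{
    uint8_t max_exp[N_PRIMES] = { 0 };
    for (unsigned k = 0; k < n; k++)
        for (unsigned i = 0; i < N_PRIMES; i++)
            if (exponents[vals[k]][i] > max_exp[i])
                max_exp[i] = exponents[vals[k]][i];

    uint64_t result = 1;
    for (unsigned i = 0; i < N_PRIMES; i++)
        for (unsigned e = 0; e < max_exp[i]; e++)
            result *= primes[i];
    return result;
}

int main(void)
{
    build_table();
    uint8_t projected[] = { 12, 18, 40 };  /* example frequencies after projection */
    printf("LCM = %llu\n", (unsigned long long)lcm_of_set(projected, 3));  /* 360 */
    return 0;
}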

Difference between Logarithmic and Uniform cost criteria

I've got some problems understanding the difference between logarithmic (LCC) and uniform (UCC) cost criteria and also how to use them in calculations.
Could someone please explain the difference between the two and perhaps show how to calculate the complexity of an expression like A+B*C?
(Yes this is part of an assignment =) )
Thx for any help!
/Marthin
Uniform cost criteria assign a constant cost to every machine operation regardless of the number of bits involved, while logarithmic cost criteria assign a cost to every machine operation proportional to the number of bits involved.
Problem size influences complexity: since complexity depends on the size of the problem, we define complexity to be a function of problem size.
Definition: let T(n) denote the complexity of an algorithm applied to a problem of size n.
Under logarithmic cost criteria, the size n of a problem instance I is the number of (binary) bits used to represent the instance, so problem size is the length of the binary description of the instance.
Under unit (uniform) cost criteria, you assume that every computer instruction takes one time unit, every register is one storage unit, and a number always fits in a register; then you can use the number of inputs as the problem size, since the length of the input (in bits) will be a constant times the number of inputs.
Uniform cost criteria assume that every instruction takes a single unit of time and that every register requires a single unit of space.
Logarithmic cost criteria assume that every instruction takes a logarithmic number of time units (with respect to the length of the operands) and that every register requires a logarithmic number of units of space.
In simpler terms, what this means is that uniform cost criteria count the number of operations, and logarithmic cost criteria count the number of bit operations.
For example, suppose we have an 8-bit adder.
If we're using uniform cost criteria to analyze the run-time of the adder, we would say that addition takes a single time unit; i.e., T(N)=1.
If we're using logarithmic cost criteria to analyze the run-time of the adder, we would say that addition takes lg n time units; i.e., T(n) = lg n, where n is the largest number the adder has to handle (in this example, n = 256, so T(n) = lg 256 = 8).
More specifically, say we're adding 256 to 32. To perform the addition, we have to add the binary bits together in the 1s column, the 2s column, the 4s column, etc. (columns meaning the bit positions). Representing numbers up to 256 takes about lg 256 = 8 bits; this is where logarithms come into the analysis. So to add the two numbers, we have to work through 8 columns. Logarithmic cost criteria say that each of these 8 column additions takes a single unit of time. Uniform cost criteria say that the entire set of 8 column additions takes a single unit of time.
Similar analysis can be made in terms of space as well: registers either take up a constant amount of space (under uniform cost criteria) or a logarithmic amount of space (under logarithmic cost criteria).
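For the A+B*C expression from the question, a rough worked comparison (assuming the common convention that an operation's logarithmic cost is the total bit length of its operands, and writing l(x) for the number of bits of x):
Uniform cost: one multiplication plus one addition = 2 operations = O(1).
Logarithmic cost: the multiplication B*C costs about l(B) + l(C), and the addition A + (B*C) costs about l(A) + l(B*C), which is roughly l(A) + l(B) + l(C); so the total is O(l(A) + l(B) + l(C)), i.e. proportional to the number of bits in the operands rather than constant.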
I think you should do some research on Big O notation... http://en.wikipedia.org/wiki/Big_O_notation#Orders_of_common_functions
If there is a part of the description you find difficult, edit your question.

How do I keep time without cumulative error?

How can you keep track of time in a simple embedded system, given that you need a fixed-point representation of the time in seconds, and that the time between ticks is not precisely expressible in that fixed-point format? How do you avoid cumulative errors in those circumstances?
This question is a reaction to this article on slashdot.
0.1 seconds cannot be neatly expressed as a binary fixed-point number, just as 1/3 cannot be neatly expressed as a decimal fixed-point number. Any binary fixed-point representation has a small error. For example, if there are 8 binary bits after the point (i.e. using an integer value scaled by 256), 0.1 times 256 is 25.6, which will be rounded to either 25 or 26, resulting in an error in the order of -2.3% or +1.6% respectively. Adding more binary bits after the point reduces the scale of this error, but cannot eliminate it.
With repeated addition, the error gradually accumulates.
How can this be avoided?
One approach is not to try to compute the time by repeated addition of this 0.1 seconds constant, but to keep a simple integer clock-tick count. This tick count can be converted to a fixed-point time in seconds as needed, usually using a multiplication followed by a division. Given sufficient bits in the intermediate representations, this approach allows for any rational scaling, and doesn't accumulate errors.
For example, if the current tick count is 1024, we can get the current time (in fixed point with 8 bits after the point) by multiplying that by 256, then dividing by 10 - or equivalently, by multiplying by 128 then dividing by 5. Either way, there is an error (the remainder in the division), but the error is bounded since the remainder is always less than 5. There is no cumulative error.
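A minimal sketch of that conversion in C (8 bits after the point, 0.1 s ticks; the function name is mine):

#include <stdio.h>
#include <stdint.h>

/* Convert a 0.1 s tick count to time in 8.8 fixed point (256 = 1 second).
 * The only error is the truncation in the final division; it never accumulates. */
static uint32_t ticks_to_fixed_seconds(uint32_t ticks)
{
    return (uint32_t)(((uint64_t)ticks * 128u) / 5u);  /* same as *256/10 */
}

int main(void)
{
    printf("%u\n", ticks_to_fixed_seconds(1024));  /* 26214, i.e. 102.39 s in 8.8 */
    return 0;
}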
Another approach might be useful in contexts where integer multiplication and division are considered too costly (which should be getting pretty rare these days). It borrows an idea from Bresenham's line-drawing algorithm: you keep the current time in fixed point (rather than a tick count), but you also keep an error term. When the error term grows too large, you apply a correction to the time value, thus preventing the error from accumulating.
In the 8-bits-after-the-point example, the representation of 0.1 seconds is 25 (256/10) with an error term (remainder) of 6. At each step, we add 6 to our error accumulator. Based on this so far, the first two steps are...
Clock Seconds Error
----- ------- -----
25 0.0977 6
50 0.1953 12
At the second step, the error value has overflowed (reached 10 or more). Therefore, we increment the clock and subtract 10 from the error. This happens every time the error value reaches 10 or higher.
Therefore, the actual sequence is...
Clock Seconds Error Overflowed?
----- ------- ----- -----------
25 0.0977 6
51 0.1992 2 Yes
76 0.2969 8
102 0.3984 4 Yes
There is almost always an error (the clock is precisely correct only when the error value is zero), but the error is bounded by a small constant. There is no cumulative error in the clock value.
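The same scheme in a small C sketch (8.8 fixed point, 0.1 s ticks), reproducing the table above:

#include <stdio.h>
#include <stdint.h>

/* 8.8 fixed-point clock advanced by 0.1 s per tick, Bresenham style:
 * 0.1 s is 256/10 = 25 remainder 6, so add 25 each tick, accumulate the
 * remainder, and add one extra 1/256 whenever the accumulator reaches 10. */
int main(void)
{
    uint32_t clock = 0;   /* time in 1/256 s units */
    uint32_t error = 0;   /* accumulated remainder, always kept below 10 */

    for (int tick = 1; tick <= 4; tick++) {
        clock += 25;
        error += 6;
        if (error >= 10) {
            clock += 1;
            error -= 10;
        }
        printf("tick %d: clock=%u (%.4f s) error=%u\n",
               tick, clock, clock / 256.0, error);
    }
    return 0;
}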
A hardware-only solution is to arrange for the hardware clock ticks to run very slightly fast - precisely fast enough to compensate for cumulative losses caused by the rounding-down of the repeatedly added tick-duration value. That is, adjust the hardware clock tick speed so that the fixed-point tick-duration value is precisely correct.
This only works if there is only one fixed-point format used for the clock.
Why not have a 0.1-second counter, and every ten ticks increment your seconds counter and wrap the 0.1 counter back to 0?
In this particular instance, I would have simply kept the time count in tenths of a seconds (or milliseconds, or whatever time scale is appropriate for the application). I do this all the time in small systems or control systems.
So a time value of 100 hours would be stored as 3_600_000 ticks - zero error (other than error that might be introduced by hardware).
The problems that are introduced by this simple technique are:
you need to account for the larger numbers. For example, you may have to use a 64-bit counter rather than a 32-bit counter
all your calculations need to be aware of the units used - this is the area that is most likely going to cause problems. I try to help with this problem by using time counters with a uniform unit. For example, this particular counter needs only 10 ticks per second, but another counter might need millisecond precision. In that case, I'd consider making both counters millisecond precision so they use the same units even though one doesn't really need that precision.
I've also had to play some other tricks with timers that aren't 'regular'. For example, I worked on a device that required a data acquisition to occur 300 times a second. The hardware timer fired once a millisecond. There's no way to scale the millisecond timer to get exactly 1/300th-of-a-second units, so we had to have logic that would perform the data acquisition every 3, 3, and 4 ticks to keep the acquisition from drifting.
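A small sketch of that 3-3-4 cadence in C, simulated in a plain loop here rather than driven by an actual timer interrupt (the names are mine):

#include <stdio.h>
#include <stdint.h>

/* 300 acquisitions per second from a 1 ms timer: space them 3, 3, 4 ticks
 * apart, so every 10 ms (3+3+4) holds exactly 3 acquisitions with no drift. */
static const uint8_t cadence[3] = { 3, 3, 4 };

int main(void)
{
    uint8_t phase = 0;
    uint8_t countdown = cadence[0];

    for (int ms = 1; ms <= 20; ms++) {         /* simulate 20 timer interrupts */
        if (--countdown == 0) {
            printf("acquire at %d ms\n", ms);  /* fires at 3, 6, 10, 13, 16, 20 */
            phase = (uint8_t)((phase + 1) % 3);
            countdown = cadence[phase];
        }
    }
    return 0;
}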
If you need to deal with hardware time error, then you need more than one time source and use them together to keep the overall time in sync. Depending on your needs this can be simple or pretty complex.
Something I've seen implemented in the past: the increment value can't be expressed precisely in the fixed-point format, but it can be expressed as a fraction. (This is similar to the "keep track of an error value" solution.)
Actually in this case the problem was slightly different, but conceptually similar—the problem wasn't a fixed-point representation as such, but deriving a timer from a clock source that wasn't a perfect multiple. We had a hardware clock that ticks at 32,768 Hz (common for a watch crystal based low-power timer). We wanted a millisecond timer from it.
The millisecond timer should increment every 32.768 hardware ticks. The first approximation is to increment every 33 hardware ticks, for a nominal 0.7% error. But, noting that 0.768 is 768/1000, or 96/125, you can do this:
Keep a variable for "fractional" value. Start it on 0.
wait for the hardware timer to count 32.
While true:
increment the millisecond timer.
Add 96 to the "fractional" value.
If the "fractional" value is >= 125, subtract 125 from it and wait for the hardware timer to count 33.
Otherwise (the "fractional" value is < 125), wait for the hardware timer to count 32.
There will be some short term "jitter" on the millisecond counter (32 vs 33 hardware ticks) but the long-term average will be 32.768 hardware ticks.
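For reference, the same scheme as a small C simulation (not firmware for any particular part); after 1000 simulated milliseconds it has consumed exactly 32768 hardware ticks, confirming the long-term average of 32.768 ticks per millisecond:

#include <stdio.h>
#include <stdint.h>

/* Derive a millisecond timer from a 32768 Hz clock: 1 ms = 32.768 ticks,
 * and 0.768 = 96/125, so wait 32 ticks normally and 33 ticks whenever the
 * fractional accumulator rolls over. */
int main(void)
{
    uint32_t hw_ticks = 0;   /* total 32768 Hz ticks consumed */
    uint32_t frac = 0;       /* fractional accumulator, always kept below 125 */

    for (uint32_t ms = 1; ms <= 1000; ms++) {
        frac += 96;
        if (frac >= 125) {
            frac -= 125;
            hw_ticks += 33;  /* long step */
        } else {
            hw_ticks += 32;  /* short step */
        }
    }
    /* After 1000 ms we should have consumed exactly 32768 ticks. */
    printf("ticks after 1000 ms: %u\n", hw_ticks);
    return 0;
}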