How does the kernel scheduler maintain time quanta precision with timer interrupts?

From my reading, there's a timer interrupt raised by the hardware that fires fairly often and transfers control back from the running process to the kernel/scheduler, which can then determine whether the running process has exceeded its time quantum and, if so, run another task.
This seems imprecise.
For example:
If a timer interrupt occurred every 1 unit,
and the scheduler determined a CPU-bound process's time quantum to be 1.5 units, the process would actually get 2 units of CPU time.
Or does the scheduler only hand out time quanta to processes in whole multiples of the timer interrupt period?

Linux's scheduler (CFS) allocates time slices to threads by first defining a time period in which every runnable thread will run once. This period is computed by the __sched_period() function (each thread's slice within it by sched_slice()) and depends on the number of threads on the CPU and on two variables that can be set from user space (sysctl_sched_latency and sysctl_sched_min_granularity):
If the number of threads is greater than sysctl_sched_latency / sysctl_sched_min_granularity, then the period is nr_threads * sysctl_sched_min_granularity; otherwise the period is sysctl_sched_latency.
For example, on my laptop, I have the following values:
% cat /proc/sys/kernel/sched_latency_ns
18000000
% cat /proc/sys/kernel/sched_min_granularity_ns
2250000
Therefore, sysctl_sched_latency / sysctl_sched_min_granularity = 8. Now, if I have 8 or fewer threads on a CPU, the period is 18,000,000 nanoseconds (i.e. 18 milliseconds) and each thread gets an equal share of it (assuming equal priorities); otherwise, each thread is allocated 2,250,000 ns (2.25 ms).
Now, with those values in mind, if we look at the tick frequency (defined at compile time of the kernel) with this command:
% zcat /proc/config.gz | grep CONFIG_HZ
# CONFIG_HZ_PERIODIC is not set
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
CONFIG_HZ_300=y
# CONFIG_HZ_1000 is not set
CONFIG_HZ=300
So, on my laptop, I have 300 ticks per second, which means a tick every 3.33 ms. This means that in my case, with more than 8 threads on a CPU, I will lose a little bit of precision in my time slices (a thread that should run for 2.25 ms may run for up to 3.33 ms), but I could fix that by recompiling my kernel with a higher tick frequency.
However, it should be noted that this is actually not a problem because, as indicated by its name, CFS (Completely Fair Scheduler) aims at being fair, which will be the case here.
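To make the arithmetic concrete, here is a minimal sketch of the period/slice computation described above (not the kernel's code; it assumes equal thread weights, whereas CFS actually scales each slice by the thread's load weight):
/* Sketch of the CFS period/slice arithmetic described above.
 * Assumes equal thread weights; the real kernel scales each slice
 * by the thread's load weight. */
#include <stdio.h>

int main(void) {
    const long sched_latency_ns = 18000000;         /* sysctl_sched_latency */
    const long sched_min_granularity_ns = 2250000;  /* sysctl_sched_min_granularity */
    const long nr_latency = sched_latency_ns / sched_min_granularity_ns;  /* 8 */

    for (long nr_threads = 1; nr_threads <= 12; nr_threads++) {
        long period_ns = (nr_threads > nr_latency)
                             ? nr_threads * sched_min_granularity_ns
                             : sched_latency_ns;
        long slice_ns = period_ns / nr_threads;     /* equal weights */
        printf("%2ld threads: period %8ld ns, slice %8ld ns\n",
               nr_threads, period_ns, slice_ns);
    }
    return 0;
}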


What is the meaning of round complexity in distributed network algorithms

Can someone explain the meaning of time complexity in distributed networking algorithms? The definition given in the DNA book by Panduranga is as follows:
"In the synchronous model, time is measured by the number of clock ticks called rounds, i.e., processors are said to compute in “lock step”. When running a distributed algorithm, different nodes might take a different number of rounds to finish. In that case, the maximum time needed over all nodes is taken as the time complexity"
Can you explain the above definition with a simple example?
Let's say you want to compute the sum of a really large list (say, 1 billion numbers). To speed things up, you use 4 threads, each computing the sum of 250 million of the numbers, which can then be added to get the total sum. If the time taken for each thread to run is:
thread1 takes 43 seconds
thread2 takes 39 seconds
thread3 takes 40 seconds
thread4 takes 41 seconds
Then you would say that the runtime of this operation is bounded by the thread that takes the longest, in this case 43 seconds. It doesn't matter if the other threads take 2 seconds; the longest task determines the runtime of your algorithm.
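A rough sketch of that reasoning in C (the per-thread times are just the hypothetical numbers from above, not measurements):
/* Illustrative only: the overall time of a lock-step parallel phase is
 * bounded by the slowest worker. The thread times below are hypothetical. */
#include <stdio.h>

int main(void) {
    double thread_seconds[] = { 43.0, 39.0, 40.0, 41.0 };  /* hypothetical */
    int n = sizeof thread_seconds / sizeof thread_seconds[0];

    double total_cpu = 0.0, phase_time = 0.0;
    for (int i = 0; i < n; i++) {
        total_cpu += thread_seconds[i];           /* total CPU time spent */
        if (thread_seconds[i] > phase_time)
            phase_time = thread_seconds[i];       /* wall time = slowest thread */
    }
    printf("total CPU time: %.0f s, phase (wall) time: %.0f s\n",
           total_cpu, phase_time);
    return 0;
}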

Why does loading a tf.keras.Sequential model again and again with joblib take increasing time?

I'm in a situation where I need to load models of the same size (almost the same scale) one after another, and joblib.load keeps taking more and more time.
For example, loading the first model took 3 seconds and the second one took 5 seconds. Each load always takes more time than the previous one, even though the later model is not more complicated. Even if I keep loading the very same model again and again, it costs more and more time as well (time consumption as follows):
0 4.4071619510650635 seconds
1 4.408193111419678 seconds
2 5.5284717082977295 seconds
3 7.223154306411743 seconds
4 8.955665111541748 seconds
5 10.984207153320312 seconds
6 12.934542179107666 seconds
7 14.573008298873901 seconds
8 17.183340311050415 seconds
9 19.3760027885437 seconds
I don't know why this happens or how I can fix it.
It is because the Keras backend loads each model's context into a global session/graph and keeps it alive; deleting the model object itself does not clean up that environment.
Running the following piece of code releases that context and makes the next load much cheaper:
import keras
keras.backend.clear_session()  # reset the global Keras/TensorFlow state between loads
By the way, if you want to control the session of the joblib process yourself, take a look at tf.InteractiveSession, or tf.Session used inside a with block.

How to generate timestamps from the 33-bit PCR count

So I have been trying to wrap my head around mpeg-ts timing, and the PCR (program clock reference). I understand that this is used for video/audio synchronisation at the decoder.
My basic understanding so far is that everything is driven by a 27 MHz clock (oscillator). Driven by this clock, a counter runs at 27 MHz, counting from 0 to 299 and then repeating. Each time this rollover from 299 back to 0 occurs, a 33-bit PCR counter is incremented by 1. In effect, the 33-bit PCR counter is therefore itself running at a rate of 90 kHz. So another way of saying this is that the 27 MHz clock is divided by 300, giving us a 90 kHz clock.
This 90 kHz clock is then used for the 33-bit PCR counter.
I understand that historically 90 kHz was chosen because mpeg-1 used a 90kHz timebase. [see source here]
Anyway... I have read that the 33-bit PCR count values range from 0x000000000 all the way through to 0x1FFFFFFFF. And according to this, these values correspond to the following times as we humans understand them (hours, minutes, seconds, etc.):
00:00:00.000 (0x000000000)
to
26:30:43.717 (0x1FFFFFFFF)
So ultimately, my question is how these hex values get translated into those timestamps. What would the equations be if someone gave me a hex value and I needed to reproduce the timestamp?
I would appreciate any help :)
==========
I am closer to an answer myself. Looking at the range from 0x000000000 to 0x1FFFFFFFF, this is 0 to 8589934591 in decimal. Since the PCR base clock is 90 kHz, the number of seconds it takes to go from 0 to 8589934591 is 8589934591 / 90000, which gives us 95443.71768 seconds.
Unless you are creating a strict bitrate encoder for broadcast over satellite or terrestrial radio, the PCR doesn't matter that much.
Scenario:
You are broadcasting to a wireless receiver with no return channel. The receiver has a clock running at what it thinks is 90000 ticks per second. Your encoder is also running at 90000 ticks per second. How can you be sure the receiver and the broadcaster have the EXACT same definition of a second? Maybe one side is running a little fast or slow. To keep the clocks in sync, the encoder occasionally sends its current time; this value is the PCR. For example, if you are broadcasting at 15,040,000 bits per second, the receiver receives a 188-byte packet every 0.0001 seconds. Every now and then (every 100 ms or so) the encoder inserts its current time. The receiver can compare this time to its internal clock and determine whether it is running faster or slower than the broadcast encoder. To keep the strict 10,000 packets per second (15,040,000 / (188 * 8) = 10,000), the encoder will insert null packets. On the internet, the null packets take bandwidth and have no value, so they are eliminated. Hence the PCR has almost no value anymore, because its time is no longer relative to the reception rate.
To answer your question: set the 27 MHz part (the 9-bit PCR extension) to zero, and use a recent DTS minus a small static amount (like 100 ms) for the 90 kHz base.
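For the conversion itself, here is a minimal sketch, assuming you already have the 33-bit PCR base as an integer (the 9-bit / 27 MHz extension is ignored):
/* Convert a 33-bit PCR base value (90 kHz ticks) to HH:MM:SS.mmm.
 * Sketch only; ignores the 9-bit PCR extension (the 27 MHz remainder). */
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint64_t pcr_base = 0x1FFFFFFFFULL;                 /* example: maximum value */
    uint64_t total_ms = pcr_base * 1000ULL / 90000ULL;  /* 90 kHz ticks -> milliseconds */

    unsigned hours   = (unsigned)(total_ms / 3600000ULL);
    unsigned minutes = (unsigned)(total_ms / 60000ULL % 60ULL);
    unsigned seconds = (unsigned)(total_ms / 1000ULL % 60ULL);
    unsigned millis  = (unsigned)(total_ms % 1000ULL);

    printf("%02u:%02u:%02u.%03u\n", hours, minutes, seconds, millis);
    /* 0x1FFFFFFFF -> 26:30:43.717, matching the range quoted above */
    return 0;
}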

clock() accuracy

I have seen many posts about using the clock() function to determine the amount of elapsed time in a program with code looking something like:
clock_t start_time = clock();
// code to be timed
.
.
.
clock_t end_time = clock();
double elapsed_time = (double)(end_time - start_time) / CLOCKS_PER_SEC;
The value of CLOCKS_PER_SEC is almost surely not the actual number of clock ticks per second, so I am a bit wary of the result. Without worrying about threading and I/O, is the output of the clock() function being scaled in some way so that this division produces the correct wall-clock time?
The answer to your question is yes: clock()'s return value is scaled so that dividing by CLOCKS_PER_SEC yields seconds.
Note, though, that what clock() measures is platform-dependent: on Windows it behaves like a wall clock, whereas POSIX defines it as an approximation of the CPU time used by the process, so it can be misleading at first glance. CLOCKS_PER_SEC is likewise implementation-defined (1000 on the Windows compilers I've seen, 1,000,000 on POSIX systems), and the actual resolution of clock() is typically no better than milliseconds, with accuracy usually slightly worse.
If you're interested in the actual cycles, this can be hard to obtain.
The rdtsc instruction will let you read a count of "pseudo"-cycles since the CPU was booted. On older systems (like Intel Core 2), this counter usually ticks at the actual CPU frequency. But on newer systems, it doesn't.
To get a more accurate timer than clock(), you will need to use the hardware performance counters, which are OS-specific; internally these rely on the rdtsc instruction from the previous paragraph.
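If what you actually want is finer-grained interval timing than clock() without touching performance counters, here is a hedged sketch using the standard POSIX clock_gettime() call:
/* Sketch: wall-clock interval timing with clock_gettime() on POSIX.
 * CLOCK_MONOTONIC is not affected by system time adjustments. */
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>

int main(void) {
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);

    /* code to be timed */
    volatile long x = 0;
    for (long i = 0; i < 10000000L; i++) x += i;

    clock_gettime(CLOCK_MONOTONIC, &end);
    double elapsed = (end.tv_sec - start.tv_sec)
                   + (end.tv_nsec - start.tv_nsec) / 1e9;
    printf("elapsed: %.6f s\n", elapsed);
    return 0;
}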

How do I keep time without cumulative error?

How can you keep track of time in a simple embedded system, given that you need a fixed-point representation of the time in seconds, and that your time between ticks is not precisely expressible in that fixed-point format? How do you avoid cumulative error in those circumstances?
This question is a reaction to this article on slashdot.
0.1 seconds cannot be neatly expressed as a binary fixed-point number, just as 1/3 cannot be neatly expressed as a decimal fixed-point number. Any binary fixed-point representation has a small error. For example, if there are 8 binary bits after the point (i.e. using an integer value scaled by 256), 0.1 times 256 is 25.6, which will be rounded to either 25 or 26, resulting in an error on the order of -2.3% or +1.6% respectively. Adding more binary bits after the point reduces the scale of this error, but cannot eliminate it.
With repeated addition, the error gradually accumulates.
How can this be avoided?
One approach is not to try to compute the time by repeated addition of this 0.1 seconds constant, but to keep a simple integer clock-tick count. This tick count can be converted to a fixed-point time in seconds as needed, usually using a multiplication followed by a division. Given sufficient bits in the intermediate representations, this approach allows for any rational scaling, and doesn't accumulate errors.
For example, if the current tick count is 1024, we can get the current time (in fixed point with 8 bits after the point) by multiplying that by 256, then dividing by 10 - or equivalently, by multiplying by 128 then dividing by 5. Either way, there is an error (the remainder in the division), but the error is bounded since the remainder is always less than 5. There is no cumulative error.
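A minimal sketch of that conversion, using the same numbers (0.1 s ticks, 8 fractional bits):
/* Sketch: convert a 0.1 s tick count to Q24.8 fixed-point seconds.
 * No cumulative error: the only error is the remainder of the final division. */
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t ticks = 1024;                        /* example from the text */
    /* seconds = ticks / 10; in Q24.8: ticks * 256 / 10 (64-bit intermediate) */
    uint32_t fixed = (uint32_t)(((uint64_t)ticks * 256u) / 10u);

    printf("ticks=%u  fixed=%u  (~%.4f s)\n",
           ticks, fixed, fixed / 256.0);          /* 1024 -> 26214 -> 102.3984 s */
    return 0;
}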
Another approach might be useful in contexts where integer multiplication and division is considered too costly (which should be getting pretty rare these days). It borrows an idea from Bresenham's line-drawing algorithm. You keep the current time in fixed point (rather than a tick count), but you also keep an error term. When the error term grows too large, you apply a correction to the time value, thus preventing the error from accumulating.
In the 8-bits-after-the-point example, the representation of 0.1 seconds is 25 (256/10) with an error term (remainder) of 6. At each step, we add 6 to our error accumulator. Based on this so far, the first two steps are...
Clock Seconds Error
----- ------- -----
25 0.0977 6
50 0.1953 12
At the second step, the error value has overflowed - exceeded 10. Therefore, we increment the clock and subtract 10 from the error. This happens every time the error value reaches 10 or higher.
Therefore, the actual sequence is...
Clock Seconds Error Overflowed?
----- ------- ----- -----------
25 0.0977 6
51 0.1992 2 Yes
76 0.2969 8
102 0.3984 4 Yes
There is almost always an error (the clock is precisely correct only when the error value is zero), but the error is bounded by a small constant. There is no cumulative error in the clock value.
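Here is a small sketch of that accumulator loop; it reproduces the table above:
/* Sketch: Bresenham-style clock update for a 0.1 s tick in Q24.8 fixed point.
 * increment = 256/10 = 25, remainder = 6; the error stays bounded, never cumulative. */
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t clock_fixed = 0;   /* seconds in Q24.8 */
    uint32_t error = 0;         /* accumulated remainder, always < 10 */

    for (int tick = 1; tick <= 4; tick++) {
        clock_fixed += 25;      /* 256 / 10, rounded down */
        error += 6;             /* 256 % 10 */
        if (error >= 10) {      /* overflow: apply the correction */
            clock_fixed += 1;
            error -= 10;
        }
        printf("tick %d: clock=%3u (%.4f s), error=%u\n",
               tick, clock_fixed, clock_fixed / 256.0, error);
    }
    return 0;
}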
A hardware-only solution is to arrange for the hardware clock ticks to run very slightly fast - precisely fast enough to compensate for cumulative losses caused by the rounding-down of the repeatedly added tick-duration value. That is, adjust the hardware clock tick speed so that the fixed-point tick-duration value is precisely correct.
This only works if there is only one fixed-point format used for the clock.
Why not have a 0.1-second counter, and every ten ticks increment your seconds counter and wrap the 0.1 counter back to 0?
In this particular instance, I would have simply kept the time count in tenths of a second (or milliseconds, or whatever time scale is appropriate for the application). I do this all the time in small systems or control systems.
So a time value of 100 hours would be stored as 3,600,000 ticks - zero error (other than error that might be introduced by hardware).
The problems that are introduced by this simple technique are:
you need to account for the larger numbers. For example, you may have to use a 64-bit counter rather than a 32-bit counter
all your calculations need to be aware of the units used - this is the area that is most likely going to cause problems. I try to help with this problem by using time counters with a uniform unit. For example, this particular counter needs only 10 ticks per second, but another counter might need millisecond precision. In that case, I'd consider making both counters millisecond precision so they use the same units even though one doesn't really need that precision.
I've also had to play some other tricks like this with timers that aren't 'regular'. For example, I worked on a device that required a data acquisition to occur 300 times a second. The hardware timer fired once a millisecond. There's no way to scale the millisecond timer to get exactly 1/300th-of-a-second units. So we had to have logic that would perform the data acquisition every 3, 3, and 4 ticks (repeating) to keep the acquisition from drifting.
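A sketch of that 3/3/4 scheme, simulated against a 1 ms tick (the acquisition itself is only indicated by a comment):
/* Sketch: trigger an acquisition 300 times per second from a 1 ms tick
 * by spacing triggers 3, 3, 4 ticks apart (3+3+4 = 10 ms, 3 acquisitions per 10 ms). */
#include <stdio.h>

static const int pattern[3] = { 3, 3, 4 };   /* tick counts between acquisitions */

int main(void) {
    int phase = 0, countdown = pattern[0];

    for (int ms = 0; ms < 30; ms++) {        /* simulate 30 ms of 1 ms ticks */
        if (--countdown == 0) {
            printf("acquire at t=%d ms\n", ms + 1);   /* do the acquisition here */
            phase = (phase + 1) % 3;
            countdown = pattern[phase];
        }
    }
    return 0;                                /* 9 acquisitions in 30 ms = 300/s */
}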
If you need to deal with hardware time error, then you need more than one time source and use them together to keep the overall time in sync. Depending on your needs this can be simple or pretty complex.
Something I've seen implemented in the past: the increment value can't be expressed precisely in the fixed-point format, but it can be expressed as a fraction. (This is similar to the "keep track of an error value" solution.)
Actually in this case the problem was slightly different, but conceptually similar—the problem wasn't a fixed-point representation as such, but deriving a timer from a clock source that wasn't a perfect multiple. We had a hardware clock that ticks at 32,768 Hz (common for a watch crystal based low-power timer). We wanted a millisecond timer from it.
The millisecond timer should increment every 32.768 hardware ticks. The first approximation is to increment every 33 hardware ticks, for a nominal 0.7% error. But, noting that 0.768 is 768/1000, or 96/125, you can do this:
Keep a variable for the "fractional" value. Start it at 0.
Wait for the hardware timer to count 32.
While true:
Increment the millisecond timer.
Add 96 to the "fractional" value.
If the "fractional" value is >= 125, subtract 125 from it and wait for the hardware timer to count 33.
Otherwise (the "fractional" value is < 125), wait for the hardware timer to count 32.
There will be some short-term "jitter" on the millisecond counter (32 vs 33 hardware ticks), but the long-term average will be 32.768 hardware ticks.
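A sketch of that loop in C, simulated so the long-term average can be checked; on real hardware the waits would come from a timer-compare interrupt rather than a simulated counter:
/* Sketch: derive a millisecond counter from a 32,768 Hz clock.
 * 1 ms = 32.768 hw ticks = 32 + 96/125 ticks, so alternate 32- and 33-tick
 * waits driven by a fractional accumulator. Simulated here for illustration. */
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint64_t hw_ticks = 0;       /* simulated 32,768 Hz counter */
    uint64_t milliseconds = 0;
    unsigned frac = 0;           /* "fractional" value, 0..124 */
    unsigned next_wait = 32;     /* first wait is 32 hw ticks */

    while (milliseconds < 10000) {
        hw_ticks += next_wait;   /* "wait" for the hardware counter */
        milliseconds++;
        frac += 96;
        if (frac >= 125) {       /* owe an extra hardware tick next time */
            frac -= 125;
            next_wait = 33;
        } else {
            next_wait = 32;
        }
    }
    /* After 10,000 ms the hw counter should read about 10 * 32,768 = 327,680. */
    printf("%llu ms took %llu hw ticks (ideal %.0f)\n",
           (unsigned long long)milliseconds, (unsigned long long)hw_ticks,
           milliseconds * 32.768);
    return 0;
}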