Total execution time of a program with conditional branches in a five-stage pipeline - conditional-statements

A CPU has a five-stage pipeline and runs at 1 GHz frequency. Instruction fetch
happens in the first stage of the pipeline. A conditional branch instruction
computes the target address and evaluates the condition in the third stage of the
pipeline. The processor stops fetching new instructions following a conditional
branch until the branch outcome is known. A program executes 10^9 instructions
out of which 20% are conditional branches. If each instruction takes one cycle to
complete on average, the total execution time of the program is:
(A) 1.0 second
(B) 1.2 seconds
(C) 1.4 seconds
(D) 1.6 seconds

Total_execution_time = (1 + stall_cycles * stall_frequency) * base_execution_time
base_execution_time = 1 s [at 1 GHz, 10^9 instructions at 1 cycle each take 10^9 cycles = 1 second]
stall_frequency = 20% = 0.20
stall_cycles = 2
[the branch outcome is known in the 3rd pipeline stage, so each branch causes 2 stall cycles]
Therefore Total_execution_time = (1 + 2 * 0.20) * 1 s = 1.4 seconds, so the answer is (C).
I don't know how to explain it better but hope it helps a bit :)
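The arithmetic above can be checked in a few lines (a minimal sketch; the variable names are mine):

```python
# Total execution time with branch stalls, assuming:
#  - 10**9 instructions at 1 GHz -> 1 ns per instruction, 1 s base time
#  - 20% of instructions are conditional branches, each adding 2 stall cycles
instructions = 10**9
frequency_hz = 10**9          # 1 GHz
branch_fraction = 0.20
stall_cycles = 2              # branch resolves in stage 3 -> 2 lost fetch slots

base_time = instructions / frequency_hz                       # 1.0 s
total_time = base_time * (1 + stall_cycles * branch_fraction)
print(total_time)  # 1.4
```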

Related

Calculating total number of program instructions?

So if a program take 5.7 seconds to execute on a processor with the clock frequency of 1.8 GHz where each instruction take 7 clock cycles, what's the total amount of instructions of the program?
I thought I could calculate it like this:
Total number of clock cycles = 5.7 seconds * 1.8 GHz = 10,260,000,000 cycles.
Then divide the total number of cycles by the cycles per instruction: 10,260,000,000 / 7 ≈ 1,465,714,286.
But apparently this is wrong? It's part of a quiz and I got this question wrong; I wonder why that is.
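For what it's worth, a quick check of the arithmetic described above, assuming the quiz's numbers (5.7 s runtime, 1.8 GHz clock, 7 cycles per instruction):

```python
# Instruction count from runtime, clock rate, and CPI:
#   instructions = (runtime * clock_rate) / cycles_per_instruction
runtime_s = 5.7
clock_hz = 1.8e9
cpi = 7

total_cycles = runtime_s * clock_hz   # 10,260,000,000 cycles
instructions = total_cycles / cpi     # ~1.4657e9 instructions
print(round(instructions))  # 1465714286
```

If the quiz marked this wrong, the most likely culprits are rounding conventions or a different reading of "7 clock cycles" (e.g. pipelined CPI vs. latency), not the formula itself.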

Estimation of Execution Time based on GFLOPS and Time Complexity

I have a CPU rated at 83.2 GFLOPS with 4 cores, so I understand that each core delivers 83.2 / 4 = 20.8 GFLOPS.
What I am trying to do is estimate the execution time of an algorithm. I found that we can roughly estimate the execution time using the following formula:
estimation_exec_time = algorithm_time_complexity / GFLOPS
So if we have a bubble sort algorithm with time complexity O(n^2) that runs on a VM using 1 core of my CPU, the estimated exec time would be:
estimation_exec_time = n^2 / 20.8 GFLOPS
The problem is that the estimated execution time is completely different from the real execution time when I time my code.
To be more specific, the formula returns an estimate of 0.00004807 s,
while the real execution time is 0.74258 s.
Is this approach with the formula flawed?
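One way to see the gap: a minimal sketch of the estimate, assuming n = 1000 (the value implied by the 0.00004807 s figure). A Big-O count is asymptotic and hides the constant factor, and peak GFLOPS assumes perfectly pipelined floating-point work, which a branchy, memory-bound bubble sort never achieves:

```python
# The poster's estimate, assuming n = 1000 and one 20.8 GFLOPS core.
# O(n^2) is not "n^2 floating-point operations": the hidden constant,
# branch mispredictions, and memory traffic dominate real runtime.
n = 1000
peak_flops = 20.8e9

estimated_s = n**2 / peak_flops
print(estimated_s)  # ~4.8e-05, vs. the ~0.74 s actually measured
```

So the formula is not so much "false" as a best-case lower bound that real, non-vectorized integer/branch-heavy code misses by orders of magnitude.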

Understanding spacy textcat_multilabel scorer output

I'm trying to understand the output of my textcat_multilabel job. I have 4 text categories and I'm using spacy version 3.2.0 (The methodologies have changed a lot recently and I don't really understand the documentation).
E    #      LOSS TEXTCAT  CATS_SCORE  SCORE
0    0      1.00          51.86       0.52
0    200    122.15        52.90       0.53
This is what I have in my config file. (btw. What is v1?)
scorer = {"@scorers":"spacy.textcat_multilabel_scorer.v1"}
threshold = 0.5
In fact, everything in the standard config file is unchanged from the suggestions except the dropout which I increased to 0.5.
The final row of my job shows these values: 0 8400 2.59 87.29 0.87
I am very impressed with the results that I'm getting with this job. Just need to understand what I'm doing.
E is epochs
# is training iterations / batches
LOSS_TEXTCAT is the loss of your textcat component. Loss normally fluctuates the first few iterations and then trends downward. The exact values are meaningless.
SCORE_TEXTCAT is the score of your textcat component on your dev set, see the docs for some details on that.
SCORE is overall score of your pipeline, a weighted average of any components you have. But you only have a textcat so it's basically the same as that score.
v1 is just version 1, components are versioned in case they are updated later so you can use older versions with newer code.

How does the kernel scheduler maintain time quanta precision with timer interrupts?

From my reading, there's a timer interrupt triggered by the hardware that fires fairly often and transfers control back from a running process to the kernel/scheduler, which can then determine whether the running process has exceeded its time quantum and, if so, run another task.
This seems imprecise.
For example:
If a timer interrupt fires every 1 unit
and the scheduler determines a CPU-bound process's time quantum to be 1.5 units, that process would actually get 2 units of CPU time.
Or does the scheduler only grant time quanta to processes in whole multiples of the timer interrupt?
Linux's scheduler (CFS) allocates time slices to threads by first defining a time period in which every thread will run once. This time period is computed by the sched_slice() function and depends on the number of threads on the CPU, and 2 variables that can be set from user space (sysctl_sched_latency and sysctl_sched_min_granularity):
If the number of threads is greater than sysctl_sched_latency / sysctl_sched_min_granularity; then the period will be nr_threads * sysctl_sched_min_granularity; else the period will be sysctl_sched_latency.
For example, on my laptop, I have the following values:
% cat /proc/sys/kernel/sched_latency_ns
18000000
% cat /proc/sys/kernel/sched_min_granularity_ns
2250000
Therefore, sysctl_sched_latency / sysctl_sched_min_granularity = 8. Now, if I have fewer than 8 threads on a CPU, the period will be 18,000,000 ns (i.e. 18 milliseconds), shared among the threads; otherwise, each thread will be allocated 2,250,000 ns (2.25 ms).
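The period computation described above can be sketched in a few lines (a simplification of the kernel logic, not the actual implementation; the values are the laptop's sysctls quoted above):

```python
# Simplified sketch of the CFS scheduling-period rule described above.
sched_latency_ns = 18_000_000         # sysctl_sched_latency
sched_min_granularity_ns = 2_250_000  # sysctl_sched_min_granularity

def sched_period(nr_threads: int) -> int:
    """Return the scheduling period (ns) for nr_threads runnable threads."""
    # latency / min_granularity = 8 with the values above
    if nr_threads > sched_latency_ns // sched_min_granularity_ns:
        return nr_threads * sched_min_granularity_ns
    return sched_latency_ns

print(sched_period(4))   # 18000000 ns: few threads, fixed latency window
print(sched_period(12))  # 27000000 ns: 12 * 2.25 ms
```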
Now, with those values in mind, if we look at the tick frequency (defined at compile time of the kernel) with this command:
% zcat /proc/config.gz | grep CONFIG_HZ
# CONFIG_HZ_PERIODIC is not set
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
CONFIG_HZ_300=y
# CONFIG_HZ_1000 is not set
CONFIG_HZ=300
So, on my laptop, I have 300 ticks per second, which means a tick roughly every 3.33 ms. In my case, with more than 8 threads on a CPU, I will lose a little bit of precision in my time slices (a thread that should run 2.25 ms will run until the next tick, about 3.33 ms), but I could fix it by recompiling my kernel with more frequent ticks.
However, it should be noted that this is actually not a problem because, as indicated by its name, CFS (Completely Fair Scheduler) aims at being fair, which will be the case here.

How to calculate percentage improvement in response time for performance testing

How should I calculate the percentage improvement in response time?
I am getting a 15306 ms response time for the old code and 799 ms for the updated code. What will be the percentage improvement in response time?
There are two ways to interpret "percentage improvement in response time". One is the classic and ubiquitous formula for computing a percentage change in a data point from an old value to a new value, which looks like this:
(new - old)/old*100%
So for your case:
(799 - 15306)/15306*100% = -94.78%
That means the new value is 94.78% smaller (faster, since we're talking about response time) than the old value.
The second way of interpreting the statement is to take the percentage of the old value that the new value "covers" or "reaches":
new/old*100%
For your case:
799/15306*100% = 5.22%
That means the new value is just 5.22% of the old value, which, for response time, means it takes just 5.22% of the time to respond, compared to the old response time.
The use of the word "improvement" suggests that you want the 94.78% value, as that shows how much of the lag in the old response time was eliminated ("improved") by the new code. But when it comes to natural language, it can be difficult to be certain about precise meaning without careful clarification.
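Both interpretations can be computed directly (a minimal sketch using the question's numbers):

```python
# Two readings of "percentage improvement", with the times from the question.
old_ms, new_ms = 15306, 799

pct_change = (new_ms - old_ms) / old_ms * 100   # classic percent change
pct_of_old = new_ms / old_ms * 100              # new as a share of old

print(round(pct_change, 2))  # -94.78  (94.78% less time)
print(round(pct_of_old, 2))  # 5.22    (new time is 5.22% of the old)
```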
I think the accepted answer suffers from the original question not having nice round numbers, and from the fact that there are 3 different ways to state the result.
Let's assume that the old time was 10 seconds and the new time is 5 seconds.
There's clearly a 50% reduction (or decrease) in the new time:
(old-new)/old x 100% = (10-5)/10 x 100% = 50%
But when you talk about an increase in performance, where a bigger increase is clearly better, you can't use the formula above. Instead, the increase in performance is 100%:
(old-new)/new x 100% = (10-5)/5 x 100% = 100%
The 5 second time is 2x faster than the 10 second time. Said a different way, you can do the task twice (2x) now for every time you used to be able to do it.
old/new = 10/5 = 2.0
So now let's consider the original question
The old time was 15306 ms and the new time is 799 ms.
There is a 94.78% reduction in time:
(old-new)/old x 100% = (15306-799)/15306 x 100% ≈ 94.78%
There is a 1816% increase in performance:
(old-new)/new x 100% = (15306-799)/799 x 100% = 1815.6%
Your new time is 19x faster:
old/new = 15306/799 = 19.16
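The three statements above can be computed directly (a quick sketch):

```python
# Three ways to state the same result, with the question's numbers.
old, new = 15306, 799

reduction_pct = (old - new) / old * 100   # how much time was cut
increase_pct = (old - new) / new * 100    # performance gain
speedup = old / new                       # "Nx faster"

print(round(reduction_pct, 1))  # 94.8
print(round(increase_pct, 1))   # 1815.6
print(round(speedup, 2))        # 19.16
```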
Actually, performance is about how much can be done in the same amount of time.
So the formula is OLD/NEW - 1.
In your case, performance increased by 1816% (i.e. you can do 18.16x more in the same time):
15306/799 - 1 = 18.16 ≈ 1816%
Note: before, you could do 1/15306 of the work per unit time; now 1/799.
your code's runtime is 94.78% shorter/improved/decreased:
(new - old) / old x 100%
(799 - 15306) / 15306 x 100% =~ -94.78% (minus represents decrease)
your code is 1816% faster:
(old - new) / new x 100%
(15306 - 799) / 799 x 100% =~ 1816%
A couple of responses have already answered the question correctly, but let me expand on those answers with some additional thoughts and a practical example.
Percentage improvement is usually calculated as ((NEW - OLD)/OLD) * 100
Let us think about it with some practical examples:
If I make $10,000 in my current job and get a new job that offers $12,000 then I am getting 20% increase in salary with this new job ((12000 - 10000)/10000)*100
If Train A travels at 100 miles per hour and Train B travels at 150 miles per hour, Train B is 50% faster than Train A. Simple, right?
It gets tricky when you try to measure a metric using another metric that has an inverse relationship with the metric you are trying to measure. Let me explain what I mean by this.
Let us try the second example again now using "time taken to reach destination". Let us say Train A takes 3 hours to reach the destination and Train B takes 2 hours to reach the same destination. Train B is faster than Train A by what percentage?
If we use the same formula we used in the above examples, we get ((2-3)/3)*100, which is -33%. This simply tells us that Train B takes 33% less time to reach the destination than Train A, but that is not what we are trying to determine, right? We are trying to measure the difference in speed as a percentage. If we take the absolute value instead, we get 33%, which may seem right, but not really. (I'll explain why in a minute.)
So, what do we do? The first thing we need to do is convert the metric we have in hand into the metric we want to measure. After that, we can use the same formula. In this example, we are trying to measure the difference in speed, so let us first get the speed of each train. Train A travels 1/3 of the distance per hour; Train B travels 1/2 of the distance per hour. The difference in speed as a percentage is then: ((1/2 - 1/3) / (1/3)) * 100 = ((1/2 - 1/3) * 3) * 100 = (3/2 - 1) * 100 = 50%
Which happens to be same as ((3 - 2)/2) * 100.
In short, when the metric we are measuring and the metric we have at hand have an inverse relationship, the formula should be
((OLD - NEW)/NEW) * 100
What is wrong with the original formula? Why can't we use it and conclude that Train B is only 33% faster? Because it is inaccurate: the original formula always yields a result less than 100%. Imagine a flight reaching the same destination in 15 minutes. The first formula says the flight is 91.6% faster, while the second formula says it is 1100% faster (or 11 times faster), which is more accurate.
Using this modified formula, the percentage improvement for case posted in the original question is ((15306 - 799)/799) * 100 = 1815.6%
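The train example can also be checked numerically, showing that converting to speed first and then applying the ordinary percent-change formula agrees with (OLD - NEW)/NEW on the raw times (a minimal sketch):

```python
# Inverse-metric example: times to the same destination.
time_a_h, time_b_h = 3, 2                        # hours for Train A and Train B
speed_a, speed_b = 1 / time_a_h, 1 / time_b_h    # distance-fractions per hour

via_speed = (speed_b - speed_a) / speed_a * 100  # percent change in speed
via_times = (time_a_h - time_b_h) / time_b_h * 100  # (OLD - NEW)/NEW on times

print(round(via_speed, 1))  # 50.0
print(round(via_times, 1))  # 50.0
```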
((old time - new time) / old time) * 100
This formula will give the Percentage Decreased in New Response time.
In your case, ((15306 - 799)/ 15306) * 100 = 94.78 %
The formula for finding the percentage of reduction is:
P = a/b × 100
Where P is the percentage of reduction, a is the amount of the reduction and b is the original amount that was reduced.
So to calculate a you do old - new, which translates into:
P = ((OLD - NEW)/OLD) * 100