VLIW - Instruction width performance increase - embedded

Would doubling the number of operations in a VLIW instruction allow a processor to achieve double the performance, since it can execute twice as many operations in parallel?

The answer depends on the type of calculation. Let us say that we have only one ALU on our machine, and imagine code that computes the sum of an array:
for (int i = 0; i < len; i++)
{
    sum += arr[i];
}
The pseudo assembly looks like the following:
; tick 0:
LD arr[i] -> %r0 ; load value from memory to register on ALU0
; tick 1:
ADD sum, %r0 -> sum ; increment sum value on ALU0
The loop body takes 2 ticks. If we double the number of ALUs and unroll the loop body, we get the following:
; tick 0:
LD arr[i] -> %r0 ; load value from memory to register on ALU0
LD arr[i+1] -> %r1 ; load value from memory to register on ALU1
; tick 1:
ADD sum, %r0 -> sum ; increment sum value on ALU0
; tick 2:
ADD sum, %r1 -> sum ; increment sum value on ALU0
Now two unrolled iterations take 3 ticks instead of 4. It is possible to perform the loads in parallel, but the addition itself cannot be parallelized, because its result depends on the previous loop iteration. So we do not double the performance by doubling the number of ALUs.
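The same point is visible at the C level. Here is a minimal sketch of that unrolling (sum_unrolled is a hypothetical helper; t0 and t1 stand in for the registers %r0 and %r1):
int sum_unrolled(const int *arr, int len) /* assumes len is even */
{
    int sum = 0;
    for (int i = 0; i + 1 < len; i += 2)
    {
        int t0 = arr[i];     /* tick 0: independent load, ALU0 */
        int t1 = arr[i + 1]; /* tick 0: independent load, ALU1 */
        sum += t0;           /* tick 1: needs the previous value of sum */
        sum += t1;           /* tick 2: needs the sum computed on tick 1 */
    }
    return sum;
}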
Now let's look at another example, the sum of two vectors:
for (int i = 0; i < len; i++)
{
    c[i] = a[i] + b[i];
}
Let us look at the pseudo assembly:
; tick 0:
LD a[i] -> %r0 ; load value a[i] on ALU0
; tick 1:
LD b[i] -> %r1 ; load value b[i] on ALU0
; tick 2:
ADD %r0, %r1 -> %r2 ; add values on ALU0
; tick 3:
ST c[i] <- %r2 ; store value to c[i] on ALU0
The loop body takes 4 ticks. What happens if we double the number of ALUs? In this case there are no dependencies on previous calculations, so we can unroll the body of the loop and get the following code:
; tick 0:
LD a[i] -> %r0 ; load value a[i] on ALU0
LD a[i+1] -> %r2 ; load value a[i+1] on ALU1
; tick 1:
LD b[i] -> %r1 ; load value b[i] on ALU0
LD b[i+1] -> %r3 ; load value b[i+1] on ALU1
; tick 2:
ADD %r0, %r1 -> %r4 ; add values for c[i] on ALU0
ADD %r2, %r3 -> %r5 ; add values for c[i+1] on ALU1
; tick 3:
ST c[i] <- %r4 ; store value to c[i] on ALU0
ST c[i+1] <- %r5 ; store value to c[i+1] on ALU1
We still have 4 ticks, but in those 4 ticks we complete 2 loop iterations. So we can say that doubling the number of ALUs doubled our performance.
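At the C level, this unrolling corresponds to a sketch like the following (vec_add_unrolled is a hypothetical helper; assumes len is even):
void vec_add_unrolled(int *c, const int *a, const int *b, int len)
{
    for (int i = 0; i + 1 < len; i += 2)
    {
        c[i] = a[i] + b[i];             /* lane 0: independent work for ALU0 */
        c[i + 1] = a[i + 1] + b[i + 1]; /* lane 1: independent work for ALU1 */
    }
}
Because the two statements touch disjoint data, a two-ALU VLIW can pack each pair of loads, adds, and stores into one wide instruction.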
These simple examples only illustrate the idea that instruction-level parallelism depends on the particular algorithm, and just doubling the ALUs may not lead to doubling the performance.
In more complex cases, VLIW systems have to rely on a complex optimizing compiler that performs the optimizations non-VLIW systems implement in hardware. In some cases this works better, in others worse.

Related

How to find the complexity of the following loop

What would be the worst-case running time of the following code, where the input is 2 variables and the loop exits when the first variable becomes larger than the second one? My first guess was O(1), considering that 3^x scales pretty quickly compared to 2^x, but I don't know whether it closes the gap quickly even when a is 1 and b is a very large integer.
i = 0;
cin >> a >> b;
while (a <= b)
{
    i++;
    a *= 3;
    b *= 2;
}
cout << i;
I think you are solving for the equation:
a * 3^n = b * 2^n
So solving for n, you get:
n = log(b / a) / log(3 / 2)
assuming that b > a > 1.
Even for huge differences, say a = 1.0001 and b = 10^1000, you get a comparatively small n ≈ 5679, because n grows only with the logarithm of b / a.
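A quick empirical check of the closed form (a minimal C sketch; smaller values are used because 10^1000 does not fit in a double):
#include <stdio.h>
#include <math.h>

int main(void)
{
    double a = 1.0, b = 1e9; /* stand-ins for the inputs; 10^1000 overflows a double */
    double a0 = a, b0 = b;
    int i = 0;
    while (a <= b)
    {
        i++;
        a *= 3;
        b *= 2;
    }
    printf("loop iterations:      %d\n", i);                         /* prints 52 */
    printf("log(b/a) / log(3/2) = %.1f\n", log(b0 / a0) / log(1.5)); /* about 51.1 */
    return 0;
}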

Why is the time complexity of the below snippet O(n) while the space complexity is O(1)

The code given below has a space complexity of O(1). I know it has something to do with the call stack, but I am unable to visualize it correctly. If somebody could make this a little clearer for me, that would be great.
int pairSumSequence(int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        sum += pairSum(i, i + 1);
    }
    return sum;
}

int pairSum(int a, int b) {
    return a + b;
}
How much space does it need in relation to the value of n?
The only variable used is sum.
sum doesn't change with regard to n, therefore it's constant.
If it's constant, then it's O(1).
How many instructions will it execute in relation to the value of n?
Let's first simplify the code, then analyze it row by row.
int pairSumSequence(int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        sum += 2 * i + 1;
    }
    return sum;
}
The declaration and initialization of a variable take constant time and don't change with the value of n. Therefore this line is O(1):
int sum = 0;
Similarly, returning a value takes constant time, so it's also O(1):
return sum;
Finally, let's analyze the inside of the for:
sum += 2 * i + 1;
This is also constant time, since it's basically one multiplication and two additions. Again O(1).
But this O(1) work happens inside a for loop:
for (int i = 0; i < n; i++) {
    sum += 2 * i + 1;
}
This for loop will execute exactly n times.
Therefore the total complexity of this function is:
C = O(1) + n * O(1) + O(1) = O(n)
Meaning that this function will take time proportional to the value of n.
Time/space complexity of O(1) means constant complexity; the constant is not necessarily 1, it can be an arbitrary number, but it has to be constant and not dependent on n. For example, if you always had 1000 variables (independent of n), it would still be O(1). It may even happen that the constant is so big compared to your n that O(n) would be much better than O(1) with that constant.
Now in your case, the time complexity is O(n) because you enter the loop n times and each iteration has constant time complexity, so it depends linearly on your n. Your space complexity, however, is independent of n (you always keep the same number of variables) and is constant, hence it is O(1).
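As an aside, the simplified loop has a closed form: the sum of 2 * i + 1 for i = 0 … n - 1 is n^2 (the sum of the first n odd numbers), so the whole function could be reduced to a single constant-time statement:
int pairSumSequence(int n) {
    return n * n; /* sum of the first n odd numbers */
}
That would make the time complexity O(1) as well, though it doesn't change the analysis of the loop version above.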

How do you calculate second Loop complexity?

For these nested loops:
for (int a = 1; a < N; a *= 2)
    for (int b = 1; b < N; b++)
I know the code completes in O(n*log(n)), but if the loop is:
for (int a = 1; a < N; a *= 2)
    for (int b = 1; b < a; b++)
Will it still complete in O(n*log(n))?
Yes (and no). Going strictly by the definition of big O, we will see below that the second algorithm is O(n*log(n)), but has a better bound.
First, let's review the first pair of loops. If we call each loop iteration a unit of computation, the total amount of computation is sum(sum(1 from 0 to N) from 0 to log2(N)), which is just N log2(N) (maybe off by one). This result shows that the first pair of loops is O(N log(N)).
Now for the second pair. Repeating the procedure above, the inner loop now runs a times, where a takes the values 1, 2, 4, …, up to N. The total is therefore the geometric series 1 + 2 + 4 + … ≤ 2N, so the second algorithm is actually Θ(N): linear, not just O(N log(N)).
Note that k N log(N) (for some k) is still an upper bound for the second algorithm, so strictly speaking it is still O(N log(N)).
[Plot of the expected runtime growth: red is the first pair of loops, green is the second pair, blue is 2N log(N).]
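The counts are cheap to verify empirically. A minimal C sketch (the loop bodies just count iterations):
#include <stdio.h>

int main(void)
{
    for (long N = 1024; N <= 1048576; N *= 32)
    {
        long long first = 0, second = 0;
        for (long a = 1; a < N; a *= 2) /* first pair of loops */
            for (long b = 1; b < N; b++)
                first++;
        for (long a = 1; a < N; a *= 2) /* second pair of loops */
            for (long b = 1; b < a; b++)
                second++;
        printf("N=%8ld  first=%12lld (~N log2 N)  second=%10lld (linear in N)\n",
               N, first, second);
    }
    return 0;
}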

Complexity of single loop that changes its index

Can someone please explain how to evaluate the complexity of the following code? Consider that array_of_size_n is made of positive random numbers in ascending order.
for (i = 0; i < n; i++) {
    temp = array_of_size_n[i] + last;
    if (temp > last) {
        do_something_else(temp); // doesn't change the complexity
        last = temp;
        i = 0;
    }
}
According to my test the growth is linear with a huge constant factor.
Assume last is 0 in the beginning.
The loop always gets past the first value: even when the if body resets i to 0, the loop's i++ immediately moves it back to 1.
So the loop gets stuck on the second value. If that value is 1, then last is incremented by 1 on every pass until the addition overflows at INT_MAX; from then on, if (temp > last) is false forever, and the remaining iterations run straight through, hence linear.
The size of the second value determines how fast last reaches INT_MAX, which is where the huge constant factor comes from.
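A simulation makes this visible. A minimal C sketch (signed overflow is undefined behavior in C, so the 32-bit wraparound is emulated explicitly via unsigned arithmetic; the array values are arbitrary):
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    int32_t arr[] = {3, 1000000, 1500000, 2000000, 2500000};
    int n = sizeof arr / sizeof arr[0];
    int32_t last = 0;
    long iterations = 0;
    for (int i = 0; i < n; i++)
    {
        iterations++;
        /* emulate 32-bit wraparound instead of relying on undefined behavior */
        int32_t temp = (int32_t)((uint32_t)arr[i] + (uint32_t)last);
        if (temp > last)
        {
            last = temp;
            i = 0; /* the loop's i++ bumps this straight back to 1 */
        }
    }
    printf("total iterations: %ld (roughly INT_MAX / arr[1] + n)\n", iterations);
    return 0;
}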

Big-Oh notation for code fragment

I am lost on these code fragments and am having a hard time finding any similar examples.
//Code fragment 1
sum = 0;
for (i = 0; i < n; i++)
    for (j = 1; j < i * i; j++)
        for (k = 0; k < j; k++)
            sum++;
I'm guessing it is O(n^4) for fragment 1.
//Code fragment 2
sum = 0;
for (i = 1; i < n; i++)
    for (j = 1; j < i * i; j++)
        if (j % i == 0)
            for (k = 0; k < j; k++)
                sum++;
I am very lost on this one. Not sure how the if statement affects the loop.
Thank you for the help ahead of time!
The first one is in fact O(n^5). The sum++ line is executed about 1^4 times, then 2^4 times, then 3^4, and so on. The sum of k-th powers up to n has a leading term in n^(k+1) (see e.g. Faulhaber's formula), so in this case n^5.
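To spell the count out: for a fixed i, the two inner loops execute sum(j from 1 to i^2 - 1) = i^2 (i^2 - 1) / 2, or about i^4 / 2, increments of sum. Summing i^4 / 2 over i = 1 … n - 1 gives a total on the order of n^5 / 10, hence O(n^5).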
For the second one, the way to think about it is that the innermost loop only executes when j is a multiple of i. So the middle loop may as well be written for (j = i; j < i * i; j += i), which makes the same number of passes as for (m = 1; m < i; m++) with j = m * i on each pass. So we now get a sum of cubes rather than fourth powers, and the highest term is therefore n^4.
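Explicitly: for a fixed i, the multiples of i below i^2 are j = i, 2i, …, (i - 1) i, and the innermost loop runs j times for each, so the work is i * (1 + 2 + … + (i - 1)) = i^2 (i - 1) / 2, or about i^3 / 2. Summing i^3 / 2 over i = 1 … n - 1 gives a total on the order of n^4 / 8, hence O(n^4).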
I'm fairly sure the 1st fragment is actually O(n^5).
Because:
the outer loop runs n times;
the middle loop runs i^2 times, where the average i over the range is about n / 2 (for each value x there is a corresponding n - x, and each such pair averages to n / 2), which gives about n^2 / 4 iterations (call it a);
the innermost loop then runs about a times again;
and when you multiply: n * a * a = n * (n^2 / 4) * (n^2 / 4) = n^5 / 16, or O(n^5).
I believe the second is O(n^4), because:
it's iterated n times;
then the middle loop iterates about n * n times (literally n * n / 4 on average, but constants drop in big-O notation);
then only about 1 / n of the j values are let through by the if (only multiples of i pass the test);
then the innermost loop repeats about n * n times.
So, n * n * n * n * n / n = n^4.
With a sum so handy to compute, you could run these for n = 10, n = 50, and so on, and just look at which of O(N^2), O(N^3), O(N^4), O(N^6) is the better match. (Note that the index of the innermost loop also runs up to n * n...)
First off, I agree with your assumption for the first scenario. Here is my breakdown for the second.
The if statement causes the innermost loop to run only some of the time, since j % i == 0 holds only for certain values of j. Bottom line: in big-O we ignore constant factors, so I think you're looking at O(n^3).
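Since the counts are cheap to compute, here is a minimal C harness along the lines of the suggestion above. Doubling n multiplies fragment 1's count by roughly 32 (consistent with n^5) and fragment 2's by roughly 16 (consistent with n^4):
#include <stdio.h>

int main(void)
{
    for (long n = 10; n <= 80; n *= 2)
    {
        long long s1 = 0, s2 = 0;
        /* code fragment 1 */
        for (long i = 0; i < n; i++)
            for (long j = 1; j < i * i; j++)
                for (long k = 0; k < j; k++)
                    s1++;
        /* code fragment 2 */
        for (long i = 1; i < n; i++)
            for (long j = 1; j < i * i; j++)
                if (j % i == 0)
                    for (long k = 0; k < j; k++)
                        s2++;
        printf("n=%3ld  fragment1=%12lld  fragment2=%10lld\n", n, s1, s2);
    }
    return 0;
}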