Imagine this piece of code:
#include <cstdint>
void Function(const int16_t *src, const int *indices, float *dst, int cnt, float mul)
{
    for (int i = 0; i < cnt; i++) dst[i] = float(src[indices[i]]) * mul;
}
This really calls for gather intrinsics, e.g. _mm_i32gather_epi32. I've had great success with these when loading floats, but are there any for 16-bit ints? Another problem here is that I need to transition from 16 bits on the input to 32 bits (float) on the output.
There is indeed no instruction to gather 16-bit integers, but (assuming there is no risk of a memory-access violation) you can just load 32-bit integers starting at the corresponding addresses and mask out the upper half of each value.
For uint16_t this would be a simple bitwise AND; for signed integers you can shift the values to the left so that the sign bit is at the most significant position. You can then (arithmetically) shift the values back before converting them to float, or, since you multiply them anyway, just scale the multiplication factor accordingly.
Alternatively, you could load from two bytes earlier and shift arithmetically to the right. Either way, your bottleneck will likely be the load ports (vpgatherdd requires 8 load uops; together with the load for the indices you have 9 loads distributed over two ports, which should result in 4.5 cycles for 8 elements).
Untested possible AVX2 implementation (does not handle the last elements, if cnt is not a multiple of 8 just execute your original loop at the end):
#include <immintrin.h>
#include <cstddef>
#include <cstdint>
void Function(int16_t const *src, int const *indices, float *dst, size_t cnt, float mul_)
{
    __m256 mul = _mm256_set1_ps(mul_*float(1.0f/0x10000));
    for (size_t i=0; i+8<=cnt; i+=8){ // todo handle last elements
        // load indices:
        __m256i idx = _mm256_loadu_si256(reinterpret_cast<__m256i const*>(indices + i));
        // load 16bit integers in the lower halves + garbage in the upper halves:
        __m256i values = _mm256_i32gather_epi32(reinterpret_cast<int const*>(src), idx, 2);
        // shift each value to upper half (removes garbage, makes sure sign is at the right place)
        // values are too large by a factor of 0x10000
        values = _mm256_slli_epi32(values, 16);
        // convert to float, scale and multiply:
        __m256 fvalues = _mm256_mul_ps(_mm256_cvtepi32_ps(values), mul);
        // store result for elements i .. i+7
        _mm256_storeu_ps(dst + i, fvalues);
    }
}
Porting this to AVX-512 should be straightforward.
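To check the bit trick without AVX2 hardware, here is a hedged scalar model in C of what each gather lane does. It covers both the shift-left variant used in the code above and the "load two bytes earlier, shift right" variant mentioned in the text. It assumes little-endian byte order (as on x86), GCC/Clang behavior for unsigned-to-signed conversion and right shifts of negative values, and readable padding around the accessed elements; all function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of one lane of the AVX2 gather trick.
   Assumes little-endian byte order (as on x86) and GCC/Clang semantics for
   unsigned-to-signed conversion and right shifts of negative values. */

/* Shift-left variant: load 32 bits at &src[idx] (wanted value in the low half,
   the next element as "garbage" in the high half), then shift left by 16.
   The result is the value times 0x10000. Requires src[idx + 1] to be readable. */
static int32_t lane_shift_left(const int16_t *src, int idx)
{
    uint32_t raw;
    memcpy(&raw, src + idx, sizeof raw);  /* unaligned 32-bit load */
    return (int32_t)(raw << 16);          /* garbage gone, sign bit now in bit 31 */
}

/* Shift-right variant: load 32 bits starting two bytes earlier, so the wanted
   value is already in the high half, then shift right arithmetically.
   Requires idx >= 1 (or a padding element before src[0]). */
static int32_t lane_shift_right(const int16_t *src, int idx)
{
    uint32_t raw;
    memcpy(&raw, src + idx - 1, sizeof raw);
    return (int32_t)raw >> 16;            /* sign-extends the high half */
}

/* Whole loop with the shift-left variant; 1/0x10000 is folded into mul,
   as in the AVX2 code. */
void function_model(const int16_t *src, const int *indices, float *dst,
                    size_t cnt, float mul)
{
    const float mul_scaled = mul * (1.0f / 0x10000);
    for (size_t i = 0; i < cnt; i++)
        dst[i] = (float)lane_shift_left(src, indices[i]) * mul_scaled;
}
```

Because the shifted value and the scaled multiplier differ from the originals only by powers of two, the product should match float(src[idx]) * mul exactly, as long as the scaled multiplier does not underflow.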
So I'm trying my hand at optimising some code, and have run into some issues trying to vectorise the code.
I essentially have a nested loop as such:
for(int i = 0; i<N; i++)
{
for(int j = 0; j<N; j++)
{
//Bunch of calculations
//array[i] += (x*y);
}
}
In the process of vectorising the inner loop, x and y both become vectorised. So I have x_vector with four values in the register, and y_vector with 4 values in the register.
In order to add these to array[i], I need to perform the calculation of x_vector*y_vector, sum the four results to a single variable and then add it to array[i]. So something like this:
__m128 x_vector ....
__m128 y_vector ....
__m128 xy_vector = _mm_mul_ps(x_vector, y_vector);
//now the xy_vector has all 4 multiplication results, need to sum them to a single variable
float result = _mm_someInstruction_ps(xy_vector);
array[i] += result;
Is there an intrinsic listed in the Intel Intrinsics Guide that does this? I looked into the _mm_add_ps intrinsic, but that returns a vector. Is there any add instruction that sums the contents of the register and then returns this result?
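There is no single SSE instruction that adds the four lanes of a __m128 into one scalar; the usual approach is a short shuffle-and-add sequence. A minimal sketch follows (hsum_ps and dot_ps are hypothetical helper names). It also shows the commonly recommended pattern for loops like the one above: keep a vector accumulator across the inner loop and do the horizontal sum only once at the end, instead of once per iteration. SSE3's _mm_hadd_ps can also do the reduction (two calls), but it is often no faster than shuffles.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_add_ps, _mm_shuffle_ps, _mm_movehl_ps, ... */

/* Horizontal sum of the four floats in v = [a b c d], SSE1 only. */
static float hsum_ps(__m128 v)
{
    __m128 shuf = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1)); /* [b a d c] */
    __m128 sums = _mm_add_ps(v, shuf);       /* [a+b, b+a, c+d, d+c] */
    shuf = _mm_movehl_ps(shuf, sums);        /* [c+d, d+c, d, c] */
    sums = _mm_add_ss(sums, shuf);           /* lane 0: (a+b) + (c+d) */
    return _mm_cvtss_f32(sums);              /* extract lane 0 as a float */
}

/* Dot product with a vector accumulator; reduces only once at the end.
   Assumes n is a multiple of 4. */
float dot_ps(const float *x, const float *y, int n)
{
    __m128 acc = _mm_setzero_ps();
    for (int j = 0; j < n; j += 4)
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(x + j), _mm_loadu_ps(y + j)));
    return hsum_ps(acc);
}
```

In the question's loop, that would mean computing array[i] += dot_ps(...) once per i rather than reducing inside the j loop. Note that summing in a different order can change floating-point rounding slightly compared with the scalar loop.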
In the code below, instead of using a for loop, I want to write a one-line expression using Eigen library functions that helps vectorise the code and thus makes parallelisation through OpenMP easy.
Eigen::VectorXd get_vector(int n, int j , int start){
Eigen::VectorXd foo(n);
indices = Eigen::VectorXd::LinSpaced(n, start + n - 1, start).array();
for(int i =0;i<indices.size();i++)
foo(i) = (array(indices(i)) - array(j))*(array(indices(i)) - array(j));
return foo;
}
// array is declared globally as an Eigen::VectorXd of length N > n and has already been filled with N random doubles.
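Assuming array is a VectorXd and you don't need indices outside your function:
return (array.segment(start, n).reverse().array() - array(j)).square();
The .reverse() reproduces the descending order of your LinSpaced indices, which run from start + n - 1 down to start; drop it if the order of the result doesn't matter.
And you should consider returning an ArrayXd instead of a VectorXd.
If array is actually an ArrayXd, you can omit the .array().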
void KeyExpansion(unsigned char key[N_KEYS], unsigned int* w)
{
unsigned int temp;
for(int i=0; i< N_KEYS; i++)
{
w[i] = (key[N_KEYS*i]<<24) + (key[N_KEYS*i+1]<<16) + (key[N_KEYS*i+2]<<8) + key[N_KEYS*i+3];
}
for(int i = 4; i< EXPANDED_KEY_COUNT; i++)
{
temp = w[i-1];
if(i % 4 == 0)
temp = SubWord(RotWord(temp)) ^ Rcon[i/4];
w[i] = temp ^ w[i-4] ;
}
}
Big-O helps us analyze running time as a function of the input size. The issue with your question is that there seem to be several inputs, which may or may not be related to each other.
The input variables appear to be N_KEYS and EXPANDED_KEY_COUNT. We also don't know what SubWord() or RotWord() do, based on what is provided.
Since SubWord() and RotWord() aren't provided, let's assume they run in constant time to keep the calculation simple.
You have basic loops that iterate over each value, so it's pretty straightforward: the first loop is O(N_KEYS) and the second is O(EXPANDED_KEY_COUNT), giving O(N_KEYS + EXPANDED_KEY_COUNT) overall. So the overall time complexity depends on two inputs and is bounded by the larger of the two.
If SubWord() or RotWord() do anything that isn't constant time, that would affect the O(EXPANDED_KEY_COUNT) portion of the code: you would multiply that term by their cost. Going by their names, they might depend on the length of the data they process, which would be yet another input variable. (In the standard AES key schedule, though, both operate on a single 32-bit word, so they are constant time.)
So this isn't a clear answer, because the question isn't fully clear, but I tried to break things down for you as best as I could.
I wrote a function that takes an array as an argument,
and I call it by passing the array's value as follows.
#include <stdio.h>
void arraytest(int a[])
{
    // changed the array a (swaps a[0] and a[1])
    a[0] = a[0] + a[1];
    a[1] = a[0] - a[1];
    a[0] = a[0] - a[1];
}
int main(void)
{
    int arr[] = {1, 2};
    printf("%d \t %d", arr[0], arr[1]);
    arraytest(arr);
    printf("\n After calling fun arr contains: %d\t %d", arr[0], arr[1]);
    return 0;
}
What I found is that, although I am calling arraytest() by passing values, the original int arr[] is changed.
Can you please explain why?
When passing an array as a parameter, this
void arraytest(int a[])
means exactly the same as
void arraytest(int *a)
so you are modifying the values in main.
For historical reasons, arrays are not first class citizens and cannot be passed by value.
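If you really do want the callee to get its own copy, one common workaround (not part of the answer above) is to wrap the array in a struct, because structs are copied when passed by value. A small sketch with a hypothetical int_pair type, mirroring the swap from the question:

```c
/* Hypothetical wrapper type: a struct containing an array IS copied
   when passed by value, so the callee works on its own copy. */
struct int_pair {
    int a[2];
};

/* Swaps the elements of the local copy only and returns it;
   the caller's struct is left unchanged. */
struct int_pair swapped_copy(struct int_pair p)
{
    int tmp = p.a[0];
    p.a[0] = p.a[1];
    p.a[1] = tmp;
    return p;
}
```

Calling swapped_copy(orig) leaves orig.a as {1, 2} and returns a struct holding {2, 1}. The copy costs time proportional to the array size, so this suits small, fixed-size arrays.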
For passing 2D (or higher multidimensional) arrays instead, see my other answers here:
How to pass a multidimensional [C-style] array to a function in C and C++, and here:
How to pass a multidimensional array to a function in C++ only, via std::vector<std::vector<int>>&
Passing 1D arrays as function parameters in C (and C++)
1. Standard array usage in C with natural type decay (adjustment) from array to ptr
@Bo Persson correctly states in his great answer here:
When passing an array as a parameter, this
void arraytest(int a[])
means exactly the same as
void arraytest(int *a)
Let me add some comments to add clarity to those two code snippets:
// param is array of ints; the arg passed automatically "adjusts" (frequently said
// informally as "decays") from `int []` (array of ints) to `int *`
// (ptr to int)
void arraytest(int a[])
// ptr to int
void arraytest(int *a)
However, let me also add that the above two forms also
mean exactly the same as
// array of 0 ints; automatically adjusts (decays) from `int [0]`
// (array of zero ints) to `int *` (ptr to int)
void arraytest(int a[0])
which means exactly the same as
// array of 1 int; automatically adjusts (decays) from `int [1]`
// (array of 1 int) to `int *` (ptr to int)
void arraytest(int a[1])
which means exactly the same as
// array of 2 ints; automatically adjusts (decays) from `int [2]`
// (array of 2 ints) to `int *` (ptr to int)
void arraytest(int a[2])
which means exactly the same as
// array of 1000 ints; automatically adjusts (decays) from `int [1000]`
// (array of 1000 ints) to `int *` (ptr to int)
void arraytest(int a[1000])
etc.
In every single one of the array examples above, and as shown in the example calls in the code just below, the input parameter type adjusts (decays) to an int *, and can be called with no warnings and no errors, even with build options -Wall -Wextra -Werror turned on (see my repo here for details on these 3 build options), like this:
int array1[2];
int * array2 = array1;
// works fine because `array1` automatically decays from an array type
// to a pointer type: `int *`
arraytest(array1);
// works fine because `array2` is already an `int *`
arraytest(array2);
As a matter of fact, the "size" value ([0], [1], [2], [1000], etc.) inside the array parameter here does not change the parameter's type, which is still int *; it is essentially there for self-documentation. Strictly speaking, ISO C requires a constant size to be greater than zero, so [0] is accepted only as a GCC extension (compiling with -pedantic should flag it); any positive integer constant is fine.
In practice, however, you should use it to specify the minimum size of the array you expect the function to receive, so that when writing code it's easy for you to track and verify. The MISRA-C-2012 standard (buy/download the 236-page 2012-version PDF of the standard for £15.00 here) goes so far as to state (emphasis added):
Rule 17.5 The function argument corresponding to a parameter declared to have an array type shall have an appropriate number of elements.
...
If a parameter is declared as an array with a specified size, the corresponding argument in each function call should point into an object that has at least as many elements as the array.
...
The use of an array declarator for a function parameter specifies the function interface more clearly than using a pointer. The minimum number of elements expected by the function is explicitly stated, whereas this is not possible with a pointer.
In other words, they recommend using the explicit size format, even though the C standard technically doesn't enforce it; it at least helps clarify to you as a developer, and to others using the code, what size array the function is expecting you to pass in.
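C99 also offers a way to state that minimum size in the declaration itself (a feature not mentioned in the MISRA text above): writing static inside the brackets promises the compiler that the argument points to at least that many elements. The parameter's type is still a plain pointer, so nothing is checked at run time; compilers such as Clang may warn when a call site visibly passes a smaller array or a null pointer. A sketch with a hypothetical function:

```c
/* C99: "static 2" promises that a points to at least 2 ints.
   The parameter is still adjusted to const int *. */
int sum_first_two(const int a[static 2])
{
    return a[0] + a[1];
}
```

Passing a three-element array is fine, since "at least 2" is satisfied; passing a one-element array is undefined behavior that the compiler may or may not diagnose.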
2. Forcing type safety on arrays in C
(Not recommended (correction: sometimes recommended, especially for fixed-size multi-dimensional arrays), but possible. See my brief argument against doing this at the end. Also, for my multi-dimensional-array [ex: 2D array] version of this, see my answer here.)
As @Winger Sendon points out in a comment below my answer, we can force C to treat array types differently based on the array size!
First, you must recognize that in my example just above, using the int array1[2]; like this: arraytest(array1); causes array1 to automatically decay into an int *. HOWEVER, if you take the address of array1 instead and call arraytest(&array1), you get completely different behavior! Now, it does NOT decay into an int *! This is because if you take the address of an array then you already have a pointer type, and pointer types do NOT adjust to other pointer types. Only array types adjust to pointer types. So instead, the type of &array1 is int (*)[2], which means "pointer to an array of size 2 of int", or "pointer to an array of size 2 of type int", or said also as "pointer to an array of 2 ints". So, you can FORCE C to check for type safety on an array by passing explicit pointers to arrays, like this:
// `a` is of type `int (*)[2]`, which means "pointer to array of 2 ints";
// since it is already a ptr, it can NOT automatically decay further
// to any other type of ptr
void arraytest(int (*a)[2])
{
// my function here
}
This syntax is hard to read, but similar to that of a function pointer. The online tool, cdecl, tells us that int (*a)[2] means: "declare a as pointer to array 2 of int" (pointer to array of 2 ints). Do NOT confuse this with the version withOUT parentheses: int * a[2], which means: "declare a as array 2 of pointer to int" (AKA: array of 2 pointers to int, AKA: array of 2 int*s).
Now, this function REQUIRES you to call it with the address operator (&) like this, using as an input parameter a POINTER TO AN ARRAY OF THE CORRECT SIZE!:
int array1[2];
// ok, since the type of `&array1` is `int (*)[2]` (ptr to array of
// 2 ints)
arraytest(&array1); // you must use the & operator here to prevent
// `array1` from otherwise automatically decaying
// into `int *`, which is the WRONG input type here!
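Inside such a function you first dereference the pointer to get back the int[2] and then index it. Because the array type is preserved, sizeof *a yields the size of the whole array, so the callee can recover the element count. A sketch with hypothetical names, reusing the swap from the question:

```c
#include <stddef.h>

/* The element count is recoverable from the type:
   *a is an int[2], so sizeof *a is 2 * sizeof(int). */
size_t arraytest_len(int (*a)[2])
{
    return sizeof *a / sizeof (*a)[0];
}

/* Same swap as in the question, now through an int (*)[2]. */
void arraytest_swap(int (*a)[2])
{
    int tmp = (*a)[0];   /* (*a) is the array itself; index it as usual */
    (*a)[0] = (*a)[1];
    (*a)[1] = tmp;
}
```

Call both as arraytest_swap(&array1); with int array1[2]. Passing plain array1 produces the incompatible-pointer-type warning discussed here.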
This, however, will produce a warning:
int array1[2];
// WARNING! Wrong type since the type of `array1` decays to `int *`:
// main.c:32:15: warning: passing argument 1 of ‘arraytest’ from
// incompatible pointer type [-Wincompatible-pointer-types]
// main.c:22:6: note: expected ‘int (*)[2]’ but argument is of type ‘int *’
arraytest(array1); // (missing & operator)
You may test this code here.
To force the C compiler to turn this warning into an error, so that you MUST always call arraytest(&array1); using only an input array of the correct size and type (int array1[2]; in this case), add -Werror to your build options. If running the test code above on onlinegdb.com, do this by clicking the gear icon in the top-right and then clicking on "Extra Compiler Flags" to type this option in. Now, this warning:
main.c:34:15: warning: passing argument 1 of ‘arraytest’ from incompatible pointer type [-Wincompatible-pointer-types]
main.c:24:6: note: expected ‘int (*)[2]’ but argument is of type ‘int *’
will turn into this build error:
main.c: In function ‘main’:
main.c:34:15: error: passing argument 1 of ‘arraytest’ from incompatible pointer type [-Werror=incompatible-pointer-types]
arraytest(array1); // warning!
^~~~~~
main.c:24:6: note: expected ‘int (*)[2]’ but argument is of type ‘int *’
void arraytest(int (*a)[2])
^~~~~~~~~
cc1: all warnings being treated as errors
Note that you can also create "type safe" pointers to arrays of a given size, like this:
int array[2]; // variable `array` is of type `int [2]`, or "array of 2 ints"
// `array_p` is a "type safe" ptr to array of size 2 of int; ie: its type
// is `int (*)[2]`, which can also be stated: "ptr to array of 2 ints"
int (*array_p)[2] = &array;
...but I do NOT necessarily recommend this (using these "type safe" arrays in C), as it reminds me a lot of the C++ antics used to force type safety everywhere, at the exceptionally high cost of language syntax complexity, verbosity, and difficulty architecting code, and which I dislike and have ranted about many times before (ex: see "My Thoughts on C++" here).
For additional tests and experimentation, see also the link just below.
References
See links above. Also:
My code experimentation online: https://onlinegdb.com/B1RsrBDFD
See also:
My answer on multi-dimensional arrays (ex: 2D arrays) which expounds upon the above, and uses the "type safety" approach for multi-dimensional arrays where it makes sense: How to pass a multidimensional array to a function in C and C++
If you want to pass a one-dimensional array as an argument to a function, you have to declare the formal parameter in one of the following three ways. All three declarations produce the same result, because each tells the compiler that an integer pointer is going to be received.
int func(int arr[] /* , other parameters */) {
    /* ... */
}
int func(int arr[SIZE] /* , other parameters */) {
    /* ... */
}
int func(int *arr /* , other parameters */) {
    /* ... */
}
So, you are modifying the original values.
Passing a multidimensional array as an argument to a function.
Passing a one-dimensional array as an argument is more or less trivial.
Let's take a look at the more interesting case of passing a 2D array.
In C you can't use a pointer-to-pointer construct (int **) in place of a 2D array.
Let's make an example:
void assignZeros(int(*arr)[5], const int rows) {
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < 5; j++) {
            *(*(arr + i) + j) = 0;
            // or equivalent assignment
            arr[i][j] = 0;
        }
    }
}
Here I have specified a function that takes as first argument a pointer to an array of 5 integers.
I can pass as argument any 2 dim array that has 5 columns:
int arr1[1][5]
int arr1[2][5]
...
int arr1[20][5]
...
You might be tempted to define a more general function that accepts any 2D array by changing the function signature as follows:
void assignZeros(int ** arr, const int rows, const int cols) {
for (int i = 0; i < rows; i++) {
for (int j = 0; j < cols; j++) {
*(*(arr + i) + j) = 0;
}
}
}
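This code would compile (a C compiler will at least warn about the incompatible pointer type when you pass it a 2D array), but you will get a runtime error, typically a segmentation fault, when it tries to assign the values the same way the first function does: *(arr + i) reinterprets the array's int contents as row pointers, which point nowhere valid.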
So in C, multidimensional arrays are not the same as pointers to pointers (to pointers...). An int(*arr)[5] is a pointer to an array of 5 elements,
an int(*arr)[6] is a pointer to an array of 6 elements, and they are pointers to different types!
So how do we define function parameters for higher dimensions? Simple, we just follow the pattern!
Here is the same function adjusted to take an array of 3 dimensions:
void assignZeros2(int(*arr)[4][5], const int dim1, const int dim2, const int dim3) {
for (int i = 0; i < dim1; i++) {
for (int j = 0; j < dim2; j++) {
for (int k = 0; k < dim3; k++) {
*(*(*(arr + i) + j) + k) = 0;
// or equivalent assignment
arr[i][j][k] = 0;
}
}
}
}
As you would expect, it can take as an argument any 3D array that has 4 elements in the second dimension and 5 elements in the third dimension. Anything like this would be OK:
arr[1][4][5]
arr[2][4][5]
...
arr[10][4][5]
...
But we have to specify the sizes of all dimensions except the first one.
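If you do need one function for any number of columns, C99 variable-length-array parameters are an alternative (not used in the answer above): the dimensions are passed first and then used in the array parameter's type, so the compiler still does the row arithmetic correctly. VLAs became optional in C11 and are not valid C++, but GCC and Clang accept them in C. A sketch with a hypothetical name:

```c
#include <stddef.h>

/* C99 VLA parameter: arr is adjusted to int (*)[cols], where cols is a
   run-time value, so any int[rows][cols] array can be passed. */
void assign_zeros_vla(size_t rows, size_t cols, int arr[rows][cols])
{
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            arr[i][j] = 0;
}
```

Both int a[3][4] and int b[2][5] can be passed to it, as assign_zeros_vla(3, 4, a) and assign_zeros_vla(2, 5, b). The caller is responsible for passing dimensions that match the actual array.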
You are not passing a copy of the array. What is passed is only a pointer holding the address in memory of the array's first element.
Therefore, when you start modifying the array inside the function, you are modifying the original array.
Remember that a[1] is *(a+1).
Arrays in C are, in most cases, converted to a pointer to the first element of the array itself. More specifically, arrays passed to functions are always converted into pointers.
Here is a quote from K&R, 2nd edition:
When an array name is passed to a function, what is passed is the
location of the initial element. Within the called function, this
argument is a local variable, and so an array name parameter is a
pointer, that is, a variable containing an address.
Writing:
void arraytest(int a[])
has the same meaning as writing:
void arraytest(int *a)
So even though you are not writing it explicitly, you are passing a pointer, and so you are modifying the values in main.
For more I really suggest reading this.
Moreover, you can find other answers on SO here
In C, except for a few special cases, an array reference always "decays" to a pointer to the first element of the array. Therefore, it isn't possible to pass an array "by value". An array in a function call will be passed to the function as a pointer, which is analogous to passing the array by reference.
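EDIT: There are three such special cases where an array does not decay to a pointer to its first element:
As the operand of sizeof: sizeof a is not the same as sizeof (&a[0]); it is the size of the whole array.
As the operand of unary &: &a is not the same as &(&a[0]) (which would not even compile), and not quite the same as &a[0]: it has the same address but the type "pointer to array".
As a string literal initializing a char array: char b[] = "foo" is not the same as char b[] = &("foo"); the characters of "foo" are copied into b.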
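These cases can be checked with sizeof. A small sketch (the helper name is hypothetical) contrasting the size of an array in the caller with the size of the parameter that receives it; declaring the parameter as int a[10] would behave the same, since it is adjusted to int *:

```c
#include <stddef.h>

/* The callee only ever sees a pointer, whatever the caller passed. */
size_t param_size(int *a)
{
    (void)a;            /* only its size is of interest here */
    return sizeof a;    /* size of an int *, not of the caller's array */
}
```

For int arr[10] in the caller, sizeof arr is 10 * sizeof(int), while param_size(arr) returns sizeof(int *); likewise (void *)&arr and (void *)&arr[0] compare equal even though their types differ.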
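Arrays are always effectively passed by reference (as a pointer to their first element), whether you write a[] or *a. The two definitions below are equivalent alternatives; use only one of them in a program, since they define the same function: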
int* printSquares(int a[], int size, int e[]) {
for(int i = 0; i < size; i++) {
e[i] = i * i;
}
return e;
}
int* printSquares(int *a, int size, int e[]) {
for(int i = 0; i < size; i++) {
e[i] = i * i;
}
return e;
}
An array can also be described as a "decay pointer".
Usually, when we put a variable name in a printf statement, its value gets printed; in the case of an array, the name decays to the address of the first element, hence the name "decay pointer".
And we can only pass the decayed pointer to a function.
An array as a formal parameter, as Mr. Bo said (int arr[] or int arr[10]), is equivalent to int *arr.
Such a parameter has its own storage (the size of a pointer: typically 4 bytes on 32-bit and 8 bytes on 64-bit platforms), stores the decayed pointer received, and we do pointer arithmetic on it.