vecLib cblas_sgemm documentation wrong? - blas

I'm trying to multiply two matrices using vecLib's cblas:
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <vecLib/cblas.h>
int main (void) {
float *A = malloc(sizeof(float) * 2 * 3);
float *B = malloc(sizeof(float) * 3 * 1);
float *C = malloc(sizeof(float) * 2 * 1);
cblas_sgemm(CblasRowMajor,
CblasNoTrans,
CblasNoTrans,
2,
1,
3,
1.0,
A, 2,
B, 3,
0.0,
C, 2);
printf ("[ %f, %f]\n", C[0], C[1]);
return 0;
}
According to the docs, every argument seems to match, yet I get this error:
lda must be >= MAX(K,1): lda=2 K=3
BLAS error: Parameter number 9 passed to cblas_sgemm had an invalid value

The error you are seeing is correct.
LDA is always the pitch (row stride) of the array A in linear memory. With row-major storage the pitch is the number of columns, not the number of rows, so LDA should be 3 in this case. By the same reasoning, LDB and LDC should both be 1, because B (3x1) and C (2x1) each have a single column.
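To make the role of the leading dimension concrete, here is a minimal sketch of a row-major matrix multiply in plain C. The function name naive_sgemm_rowmajor is hypothetical and is not part of vecLib; it only mirrors the argument order of cblas_sgemm for the non-transposed case and repeats the same kind of check that produced the error message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference routine, not part of vecLib: computes
 * C = alpha * A * B + beta * C for row-major, non-transposed matrices.
 * A is m x k, B is k x n, C is m x n. */
void naive_sgemm_rowmajor(int m, int n, int k, float alpha,
                          const float *A, int lda,
                          const float *B, int ldb,
                          float beta, float *C, int ldc)
{
    /* In row-major storage a leading dimension is the row pitch,
     * so it can never be smaller than the number of columns. */
    assert(lda >= (k > 1 ? k : 1));
    assert(ldb >= (n > 1 ? n : 1));
    assert(ldc >= (n > 1 ? n : 1));

    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            float sum = 0.0f;
            for (int p = 0; p < k; p++) {
                /* Element (i, p) of A lives at A[i * lda + p]. */
                sum += A[(size_t)i * lda + p] * B[(size_t)p * ldb + j];
            }
            /* C is not read when beta is zero, so it may be uninitialized. */
            float prev = (beta == 0.0f) ? 0.0f : beta * C[(size_t)i * ldc + j];
            C[(size_t)i * ldc + j] = alpha * sum + prev;
        }
    }
}
```

For the 2x3 by 3x1 product in the question, this sketch would be called with lda = 3, ldb = 1 and ldc = 1; passing lda = 2 trips the first assertion, which is the same condition the vecLib error reports.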


Different FFT results from Matlab fft and Objective-c fft

Here is my code in matlab:
x = [1 2 3 4];
result = fft(x);
a = real(result);
b = imag(result);
Result from matlab:
a = [10,-2,-2,-2]
b = [ 0, 2, 0,-2]
And my runnable code in objective-c:
int length = 4;
float* x = (float *)malloc(sizeof(float) * length);
x[0] = 1;
x[1] = 2;
x[2] = 3;
x[3] = 4;
// Setup the length
vDSP_Length log2n = log2f(length);
// Calculate the weights array. This is a one-off operation.
FFTSetup fftSetup = vDSP_create_fftsetup(log2n, FFT_RADIX2);
// For an FFT, numSamples must be a power of 2, i.e. is always even
int nOver2 = length/2;
// Define complex buffer
COMPLEX_SPLIT A;
A.realp = (float *) malloc(nOver2*sizeof(float));
A.imagp = (float *) malloc(nOver2*sizeof(float));
// Generate a split complex vector from the sample data
vDSP_ctoz((COMPLEX*)x, 2, &A, 1, nOver2);
// Perform a forward FFT using fftSetup and A
vDSP_fft_zrip(fftSetup, &A, 1, log2n, FFT_FORWARD);
//Take the fft and scale appropriately
Float32 mFFTNormFactor = 0.5;
vDSP_vsmul(A.realp, 1, &mFFTNormFactor, A.realp, 1, nOver2);
vDSP_vsmul(A.imagp, 1, &mFFTNormFactor, A.imagp, 1, nOver2);
printf("After FFT: \n");
printf("%.2f | %.2f \n",A.realp[0], 0.0);
for (int i = 1; i< nOver2; i++) {
printf("%.2f | %.2f \n",A.realp[i], A.imagp[i]);
}
printf("%.2f | %.2f \n",A.imagp[0], 0.0);
The output from objective c:
After FFT:
10.0 | 0.0
-2.0 | 2.0
The results are very close, but where is the rest? I know I missed something, but I don't know what.
Update: following another answer, I updated the output:
After FFT:
10.0 | 0.0
-2.0 | 2.0
-2.0 | 0.0
But even so, one element is still missing: -2.0 | -2.0
Performing an FFT delivers a right hand spectrum and a left hand spectrum.
If you have N samples, the frequencies returned are:
( -f(N/2), -f(N/2-1), ... -f(1), f(0), f(1), f(2), ..., f(N/2-1) )
If A(f(i)) is the complex amplitude A of the frequency component f(i), the following relation holds for real-valued input:
Real{A(f(i))} = Real{A(-f(i))} and Imag{A(f(i))} = -Imag{A(-f(i))}
This means the right hand spectrum and the left hand spectrum carry the same information; only the sign of the imaginary part differs.
Matlab returns the frequencies in a different order.
Matlab order is:
( f(0), f(1), f(2), ..., f(N/2-1), -f(N/2), -f(N/2-1), ... -f(1) )
To get the first ordering, use the Matlab function fftshift().
In the case of 4 samples, Matlab gives you:
a = [10,-2,-2,-2]
b = [ 0, 2, 0,-2]
This means:
A(f(0)) = 10 (DC value)
A(f(1)) = -2 + 2i (first frequency component of the right hand spectrum)
A(-f(2)) = -2 (second frequency component of the left hand spectrum)
A(-f(1)) = -2 - 2i (first frequency component of the left hand spectrum)
I do not fully understand your Objective-C code.
However, it seems to me that the program returns the right hand spectrum only. The missing element, -2 - 2i, is just the complex conjugate of A(f(1)).
So everything is fine.
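The symmetry relation above is enough to rebuild the Matlab-style output from the half spectrum. The following is a minimal sketch in plain C; expand_real_spectrum is a hypothetical helper, not a vDSP function, and it assumes the caller has already unpacked the half spectrum into bins 0 to N/2 (for vDSP_fft_zrip that means moving the value stored in imagp[0] to the real part of bin N/2, as the question's last printf does).

```c
#include <assert.h>

/* Hypothetical helper, not part of vDSP: rebuilds all n bins of the DFT of a
 * real signal from bins 0..n/2, using X[n - k] = conj(X[k]).
 * half_re/half_im hold n/2 + 1 values; full_re/full_im receive n values,
 * in the same order that Matlab's fft() uses. */
void expand_real_spectrum(const float *half_re, const float *half_im, int n,
                          float *full_re, float *full_im)
{
    assert(n > 0 && n % 2 == 0);

    /* Bins 0..n/2 are copied unchanged. */
    for (int k = 0; k <= n / 2; k++) {
        full_re[k] = half_re[k];
        full_im[k] = half_im[k];
    }
    /* Bins n/2+1..n-1 mirror bins n/2-1..1 with the imaginary sign flipped. */
    for (int k = n / 2 + 1; k < n; k++) {
        full_re[k] = half_re[n - k];
        full_im[k] = -half_im[n - k];
    }
}
```

With the three bins printed in the question (10 + 0i, -2 + 2i, -2 + 0i) and n = 4, the fourth bin comes out as -2 - 2i, matching the Matlab vectors a and b.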

How % operator works when we use negative values for operation?

When I execute the expression -13 % -10 in C, I get -3 as output. Why is the output not 3?
I did not know the answer to your question either, so I expanded your original expression into a short NetBeans/GCC C program:
/*
* File: main.c
* Author: Colleen
*
* Created on December 16, 2015, 9:43 AM
* Testing modulus operations with negative numbers.
*/
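#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
/*
* int16_t from <stdint.h> is the portable equivalent of the Windows INT16 type.
* The operands are promoted to int before % is applied and before printf sees them.
*/
int main(int argc, char** argv) {
int16_t A = -13;
int16_t B = -10;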
printf ("1. %3d %% %3d = %3d\n", A, B, A % B);
printf ("2. %3d %% %3d = %3d\n", -A, B, (-A)% B);
printf ("3. %3d %% %3d = %3d\n", A, -B, A%(-B));
printf ("4. %3d %% %3d = %3d\n", -A, -B, (-A)%(-B));
return (EXIT_SUCCESS);
}
Here were my results:
1. -13 % -10 = -3
2. 13 % -10 = 3
3. -13 % 10 = -3
4. 13 % 10 = 3
RUN SUCCESSFUL (total time: 35ms)
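So it looks to me like the dividend (the number you are dividing, or "A" above) and the remainder (the result of A % B) always have the same sign. This matches the C99 rule that integer division truncates toward zero and that (A / B) * B + A % B equals A: here -13 / -10 is 1, and -13 - (1 * -10) is -3.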
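The four results above can be checked against the identity (a / b) * b + a % b == a, which C99 and later guarantee for nonzero b. The sketch below also includes a hypothetical helper, floor_mod, for the case where a remainder with the sign of the divisor is wanted instead; neither function comes from the original post.

```c
#include <assert.h>

/* Returns 1 when the C division identity holds for a and b (b != 0). */
int division_identity_holds(int a, int b)
{
    assert(b != 0);
    return (a / b) * b + a % b == a;
}

/* Hypothetical helper: remainder that takes the sign of the divisor
 * (floored division), unlike C's %, whose result follows the dividend. */
int floor_mod(int a, int b)
{
    assert(b != 0);
    int r = a % b;                  /* truncated remainder: sign of a, or zero */
    if (r != 0 && ((r < 0) != (b < 0))) {
        r += b;                     /* shift into the divisor's sign range */
    }
    return r;
}
```

For example, -13 % 10 is -3 with the built-in operator, while floor_mod(-13, 10) gives 7, which is what languages with floored modulo report.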

Clang, link time optimization fails for AVX horizontal add

I have a small piece of testing code which calculates the dot products of two vectors with a third vector using AVX instructions (A dot C and B dot C below). It also adds the two products, but that is just to make the function return something for this example.
#include <iostream>
#include <immintrin.h>
double compute(const double *x)
{
__m256d A = _mm256_loadu_pd(x);
__m256d B = _mm256_loadu_pd(x + 4);
__m256d C = _mm256_loadu_pd(x + 8);
__m256d c1 = _mm256_mul_pd(A, C);
__m256d c2 = _mm256_mul_pd(B, C);
__m256d tmp = _mm256_hadd_pd(c1, c2);
__m128d lo = _mm256_extractf128_pd(tmp, 0);
__m128d hi = _mm256_extractf128_pd(tmp, 1);
__m128d dotp = _mm_add_pd(lo, hi);
double y[2];
_mm_store_pd(y, dotp);
return y[0] + y[1];
}
int main(int argc, char *argv[])
{
const double v[12] = {0.3, 2.9, 1.3, 4.0, -1.0, -2.1, -3.0, -4.0, 0.0, 2.0, 1.3, 1.2};
double x = 0;
std::cout << "AVX" << std::endl;
x = compute(v);
std::cout << "x = " << x << std::endl;
return 0;
}
When I compile as
clang++ -O3 -mavx main.cc -o main
everything works fine. If I enable link time optimization:
clang++ -flto -O3 -mavx main.cc -o main
I get the following error: "LLVM ERROR: Do not know how to split the result of this operator!". I have narrowed the culprit down to the _mm256_hadd_pd statement. If this is exchanged with e.g. _mm256_add_pd, link-time optimization works again. I realize that this is a silly example to use link-time optimization for, but the error occurred in a different context where link-time optimization is extremely helpful.
Can anyone explain what is going on here?

CGAL Solves a Quadratic Programming

I have a QP (quadratic programming) problem:
Minimize: -5x0 - x1 - 4x2 - 5x5 + 1000x0x2 + 1000x1x2 + 1000x0x3
+ 1000x1x3 + 1000x0x4 +1000x1x4
Subject to: x0>=0 x1>=0 x2>=0 x3>=0 x4>=0 x5>=0
x0+x1+x5<=5
x2+x3+x4<=5
The answer should be X0=0 X1=0 X2=5 X3=0 X4=0 X5=5 and obj=-45.
But CGAL gives me X0=5 X1=0 X2=0 X3=0 X4=0 X5=0 and obj=-25.
Any suggestions would be appreciated.
Kelly
The code is as follows:
#include <iostream>
#include <climits>
#include <cassert>
#include <CGAL/basic.h>
#include <CGAL/QP_models.h>
#include <CGAL/QP_functions.h>
// choose exact integral type
#ifdef CGAL_USE_GMP
#include <CGAL/Gmpz.h>
typedef CGAL::Gmpz ET;
#else
#include <CGAL/MP_Float.h>
typedef CGAL::MP_Float ET;
#endif
using namespace std;
// program and solution types
typedef CGAL::Quadratic_program<int> Program;
typedef CGAL::Quadratic_program_solution<ET> Solution;
int
main(){
Program qp (CGAL::SMALLER, true, 0.0, false, 0.0);
qp.set_c(0, -5);
qp.set_c(1, -1);
qp.set_c(2, -4);
qp.set_c(5, -5);
int g = 1000;
qp.set_d(2, 0, g);
qp.set_d(2, 1, g);
qp.set_d(3, 0, g);
qp.set_d(3, 1, g);
qp.set_d(4, 0, g);
qp.set_d(4, 1, g);
int nRow = 0;
qp.set_a(0, nRow, 1.0);
qp.set_a(1, nRow, 1.0);
qp.set_a(5, nRow, 1.0);
qp.set_b(nRow, 5);
nRow++;
qp.set_a(2, nRow, 1.0);
qp.set_a(3, nRow, 1.0);
qp.set_a(4, nRow, 1.0);
qp.set_b(nRow, 5);
Solution s = CGAL::solve_quadratic_program(qp, ET());
assert (s.solves_quadratic_program(qp));
CGAL::print_nonnegative_quadratic_program(std::cout, qp, "first_qp");
std::cout << s;
return 0;
}
Since your matrix D (the quadratic part of the objective function) is not positive semi-definite, your result isn't so surprising. CGAL does not guarantee convergence to a global minimum, only to a local one. What you obtain is a local minimum that respects the constraints you imposed.
If you set lower bounds of 1 for x2 and x5 by writing qp.set_l(2,true,1); qp.set_l(5,true,1);, you will see that the solver converges to the solution that you computed.
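The claim that D is not positive semi-definite can be checked by hand without CGAL. The sketch below, in plain C, evaluates the objective from the question at the two candidate points and evaluates the quadratic part along one direction where it is negative. The function names are hypothetical and the code only restates the formula given in the question.

```c
#include <assert.h>

/* Quadratic part of the objective from the question. The six bilinear
 * terms factor as 1000 * (x0 + x1) * (x2 + x3 + x4). */
double quadratic_term(const double x[6])
{
    return 1000.0 * (x[0] + x[1]) * (x[2] + x[3] + x[4]);
}

/* Full objective, evaluated only at points that satisfy the constraints. */
double objective(const double x[6])
{
    for (int i = 0; i < 6; i++) {
        assert(x[i] >= 0.0);
    }
    assert(x[0] + x[1] + x[5] <= 5.0);
    assert(x[2] + x[3] + x[4] <= 5.0);
    return -5.0 * x[0] - x[1] - 4.0 * x[2] - 5.0 * x[5] + quadratic_term(x);
}
```

Evaluating objective at (0, 0, 5, 0, 0, 5) gives -45 and at (5, 0, 0, 0, 0, 0) gives -25, as stated in the question. Evaluating quadratic_term along the direction (1, 0, -1, 0, 0, 0) gives -1000; a positive semi-definite quadratic form is never negative, so the problem is not convex.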

Having trouble creating a weather converter program

I started school for computer programming just a couple of weeks ago and we just started Objective-C! We need to convert Celsius to Fahrenheit and Kelvin. To do that I must input the temperature in Celsius. Then I use this equation to get Fahrenheit: Celsius * 9 / 5 + 32. To get Kelvin I add 273.15.
#include <stdio.h>
int main(void)
{
float Celsius;
float Farenheight = Celsius * 9 / 5 + 32;
float Kelvin = Celsius + 273.15;
printf("How many degrees in Celsius?");
scanf("%s %s %d", Celsius, Farenheight, Kelvin);
printf("C: %s, F: %s, K: %d", Celsius, Farenheight, Kelvin);
}
This is (the second revision of) what I came up with so far, but I am really unsure on how to do this. If anyone can help me that would be great!
Funnily enough, temperature conversion came up in another context earlier today.
Adapting that code to your outline, you need to read the value in celsius before you convert anything to kelvin or fahrenheit (whereas your code converts an uninitialized value, which is not a good idea):
double celsius;
printf("What is the temperature in degrees Celsius? ");
if (scanf("%lf", &celsius) == 1)
{
double kelvin = celsius + 273.15;
double fahrenheit = (celsius + 40.0) * (9.0 / 5.0) - 40.0;
printf("%7.2f °C = %7.2f K = %7.2f °F\n", celsius, kelvin, fahrenheit);
}
Note that the input is checked for validity before the result is used.
The conversion formula is simpler than the usual one you see quoted, and is symmetric for converting °F to °C or vice versa, the difference being the conversion factor (9.0 / 5.0) vs (5.0 / 9.0). It relies on -40°C = -40°F. Try it:
C =  0°C; (C+40) = 40; (C+40)*9 = 360; (C+40)*9/5 = 72; (C+40)*9/5-40 = 32°F.
F = 32°F; (F+40) = 72; (F+40)*5 = 360; (F+40)*5/9 = 40; (F+40)*5/9-40 =  0°C.
Absolute zero is -273.15°C, 0K, -459.67°F.
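The symmetric formula can be wrapped in a pair of small functions. This is a minimal sketch in plain C; the function and macro names are hypothetical, and the arithmetic follows the worked lines above step by step (add 40, scale, subtract 40).

```c
#include <assert.h>

#define ABSOLUTE_ZERO_C (-273.15)   /* absolute zero in degrees Celsius */

/* Both directions share one shape because -40 is the same in both scales. */
double celsius_to_fahrenheit(double c)
{
    return (c + 40.0) * 9.0 / 5.0 - 40.0;
}

double fahrenheit_to_celsius(double f)
{
    return (f + 40.0) * 5.0 / 9.0 - 40.0;
}

/* Rejects temperatures below absolute zero, which have no Kelvin value. */
double celsius_to_kelvin(double c)
{
    assert(c >= ABSOLUTE_ZERO_C);
    return c - ABSOLUTE_ZERO_C;
}
```

Keeping the conversions in functions separates them from the input handling, so the scanf check shown earlier only has to guard a single value before any of them is called.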
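Use this code snippet to read input from stdin:
#include <stdio.h>
int main (int argc, char *argv[]) {
int celsius;
printf("What is the temperature in celsius? ");
/* scanf returns the number of items converted; 1 means the read succeeded */
if (scanf("%d", &celsius) != 1) {
fprintf(stderr, "Invalid input\n");
return 1;
}
printf("celsius degree = %d\n", celsius);
return 0;
}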