CBLAS - ** On entry to SGEMM / DGEMM, parameter number X had an illegal value?

CBLAS - ** On entry to SGEMM / DGEMM, parameter number X had an illegal value? - blas

I am calling cblas_sgemm using the following parameters:
order: CblasRowMajor
transA, transB: either CblasTrans or CblasNoTrans
M: the number of rows (height) of the matrix op(A) and of the matrix C
N: the number of columns (width) of the matrix op(B) and of the matrix C
K: the number of columns (width) of the matrix op(A) and the number of rows (height) of the matrix op(B)
alpha: scalar
A: pointer to matrix A
LDA: When transA = CblasNoTrans then LDA = M, otherwise LDA = K
B: pointer to matrix B
LDB: when transB = CblasNoTrans then LDB = K, otherwise LDB = N
beta: scalar
C: pointer to matrix C (bias on entry and the result on exit)
LDC = M
where, op(M) = M if transM is CblasNoTrans, and Transpose(M) otherwise
The parameters are correct (according to the documentation) but I am getting am error:
"** On entry to SGEMM, parameter number X had an illegal value" - How do I fix this error?

TL;DR
When using: CblasRowMajor
Keep M,N,K setting as listed above.
Flip Transpose value when computing LDA,LDB, and set LDC = N
Details:
I ran into this problem using multiplication with Transpose and could not find a detailed answer that answers my problem. I write this Q/A in hope it will be useful for others.
The root problem, as others have noted, is that cblas is a wrapper over BLAS, which is written in FORTRAN that does not have an order parameter and expect column major matrix representation. The available documentation (usually) is the BLAS documentation - which I used above in the question.
However, While M,N,and K are logical values (the width/height of matrix regardless of its representation) the leading dimensions (LDA, LDB, LDC) are not. Therefore the computation of M,N, and K should stay the same when using CblasRowMajor. However, the transpose of op(A), op(B) and C, should be use when computing LDA, LDB, and LDC.
If CblasColMajor is used then it is the same representation as Fortran and the parameter setting shown in the question is the correct one.
Also note that when you get and error in parameter X, X is shifted by 1 because the error originated in BLAS that does not have an order parameter.

Related

Automatic Differentiation with respect to rank-based computations

I'm new to automatic differentiation programming, so this maybe a naive question. Below is a simplified version of what I'm trying to solve.
I have two input arrays - a vector A of size N and a matrix B of shape (N, M), as well a parameter vector theta of size M. I define a new array C(theta) = B * theta to get a new vector of size N. I then obtain the indices of elements that fall in the upper and lower quartile of C, and use them to create a new array A_low(theta) = A[lower quartile indices of C] and A_high(theta) = A[upper quartile indices of C]. Clearly these two do depend on theta, but is it possible to differentiate A_low and A_high w.r.t theta?
My attempts so far seem to suggest no - I have using the python libraries of autograd, JAX and tensorflow, but they all return a gradient of zero. (The approaches I have tried so far involve using argsort or extracting the relevant sub-arrays using tf.top_k.)
What I'm seeking help with is either a proof that the derivative is not defined (or cannot be analytically computed) or if it does exist, a suggestion on how to estimate it. My eventual goal is to minimize some function f(A_low, A_high) wrt theta.

This is the JAX computation that I wrote based on your description:
import numpy as np
import jax.numpy as jnp
import jax
N = 10
M = 20
rng = np.random.default_rng(0)
A = jnp.array(rng.random((N,)))
B = jnp.array(rng.random((N, M)))
theta = jnp.array(rng.random(M))
def f(A, B, theta, k=3):
C = B # theta
_, i_upper = lax.top_k(C, k)
_, i_lower = lax.top_k(-C, k)
return A[i_lower], A[i_upper]
x, y = f(A, B, theta)
dx_dtheta, dy_dtheta = jax.jacobian(f, argnums=2)(A, B, theta)
The derivatives are all zero, and I believe this is correct, because the change in value of the outputs does not depend on the change in value of theta.
But, you might ask, how can this be? After all, theta enters into the computation, and if you put in a different value for theta, you get different outputs. How could the gradient be zero?
What you must keep in mind, though, is that differentiation doesn't measure whether an input affects an output. It measures the change in output given an infinitesimal change in input.
Let's use a slightly simpler function as an example:
import jax
import jax.numpy as jnp
A = jnp.array([1.0, 2.0, 3.0])
theta = jnp.array([5.0, 1.0, 3.0])
def f(A, theta):
return A[jnp.argmax(theta)]
x = f(A, theta)
dx_dtheta = jax.grad(f, argnums=1)(A, theta)
Here the result of differentiating f with respect to theta is all zero, for the same reasons as above. Why? If you make an infinitesimal change to theta, it will in general not affect the sort order of theta. Thus, the entries you choose from A do not change given an infinitesimal change in theta, and thus the derivative with respect to theta is zero.
Now, you might argue that there are circumstances where this is not the case: for example, if two values in theta are very close together, then certainly perturbing one even infinitesimally could change their respective rank. This is true, but the gradient resulting from this procedure is undefined (the change in output is not smooth with respect to the change in input). The good news is this discontinuity is one-sided: if you perturb in the other direction, there is no change in rank and the gradient is well-defined. In order to avoid undefined gradients, most autodiff systems will implicitly use this safer definition of a derivative for rank-based computations.
The result is that the value of the output does not change when you infinitesimally perturb the input, which is another way of saying the gradient is zero. And this is not a failure of autodiff – it is the correct gradient given the definition of differentiation that autodiff is built on. Moreover, were you to try changing to a different definition of the derivative at these discontinuities, the best you could hope for would be undefined outputs, so the definition that results in zeros is arguably more useful and correct.

How to implement an equation with "diag" and circled-times symbol (⊗) in Python

The above equation is found in this paper (https://openreview.net/pdf?id=Sk_P2Q9sG) as equation #4.
I'm really confused as to how this gets converted into the following code:
epistemic = np.mean(p_hat**2, axis=0) - np.mean(p_hat, axis=0)**2
aleatoric = np.mean(p_hat*(1-p_hat), axis=0)
I think I may be confused due to the symbols in the equation though, for instance "diag" and the circle with a cross in it. How is the diag and circle represented as such in Python?
Thanks.

The linked article explains below eq. (2) the definition of this "outer square" operation:
v⊗2 = v vT.
In the field of machine learning, vectors are to be interpreted as column vectors. In numpy, if v is a vector of shape (n,), you would write:
v.reshape(-1, 1) * v
The result is a shape (n, n) array. If you want to stay closer to the notation with column and row vectors, you could also write:
v_column = v.reshape(-1, 1)
result = v_column # v_column.T
The function diag is the same as np.diag: you feed it a vector and you get a diagonal matrix.
How the Python snippet in your question implements the equations from the paper is hard to tell without information about what the shape of p_hat is and which axis represents t in the equation. In the most plausible definition, p_hat would have shape (T, n), in which case np.mean(p_hat**2, axis=0) would return shape (n,), which is not consistent with the equation that you quoted from the paper that should result in an (n, n) shape.
Given that the epistemic and aleatoric uncertainties in the paper seem to be scalars rather than vectors, I suspect that the authors made an error in the definition of the ⊗2 exponent: they should have written
v⊗2 = vT v
which translates to Python (on shape (n,) arrays) as
np.sum(v**2)

Projection of fisheye camera model by Scaramuzza

I am trying to understand the fisheye model by Scaramuzza, which is implemented in Matlab, see https://de.mathworks.com/help/vision/ug/fisheye-calibration-basics.html#mw_8aca38cc-44de-4a26-a5bc-10fb312ae3c5
The backprojection (uv to xyz) seems fairly straightforward according to the following equation:
, where rho=sqrt(u^2 +v^2)
However, how does the projection (from xyz to uv) work?! In my understanding we get a rather complex set of equations. Unfortunately, I don't find any details on that....

Okay, I believe I understand it now fully after analyzing the functions of the (windows) calibration toolbox by Scaramuzza, see https://sites.google.com/site/scarabotix/ocamcalib-toolbox/ocamcalib-toolbox-download-page
Method 1 found in file "world2cam.m"
For the projection, use the same equation above. In the projection case, the equation has three known (x,y,z) and three unknown variables (u,v and lambda). We first substitute lambda with rho by realizing that
u = x/lambda
v = y/lambda
rho=sqrt(u^2+v^2) = 1/lambda * sqrt(x^2+y^2) --> lambda = sqrt(x^2+y^2) / rho
After that, we have the unknown variables (u,v and rho)
u = x/lambda = x / sqrt(x^2+y^2) * rho
v = y/lambda = y / sqrt(x^2+y^2) * rho
z / lambda = z /sqrt(x^2+y^2) * rho = a0 + a2*rho^2 + a3*rho^3 + a4*rho^4
As you can see, the last equation now has only one unknown, namely rho. Thus, we can solve it easily using e.g. the roots function in matlab. However, the result does not always exist nor is it necessarily unique. After solving the unknown variable rho, calculating uv is very simple using the equation above.
This procedure needs to be performed for each point (x,y,z) separately and is thus rather computationally expensive for an image.
Method 2 found in file "world2cam_fast.m"
The last equation has the form rho(x,y,z). However, if we define m = z / sqrt(x^2+y^2) = tan(90°-theta), it only depends on one variable, namely rho(m).
Instead of solving this equation rho(m) for every new m, the authors "plot" the function for several values of m and fit an 8th order polynomial to these points. Using this polynomial they can calculate an approximate value for rho(m) much quicker in the following.
This becomes clear, because "world2cam_fast.m" makes use of ocam_model.pol, which is calculated in "undistort.m". "undistort.m" in turn makes use of "findinvpoly.m".

How can DWT be used in LSB substitution steganography

In steganography, the least significant bit (LSB) substitution method embeds the secret bits in the place of bits from the cover medium, for example, image pixels. In some methods, the Discrete Wavelet Transform (DWT) of the image is taken and the secret bits are embedded in the DWT coefficients, after which the inverse trasform is used to reconstruct the stego image.
However, the DWT produces float coefficients and for the LSB substitution method integer values are required. Most papers I've read use the 2D Haar Wavelet, yet, they aren't clear on their methodology. I've seen the transform being defined in terms of low and high pass filters (float transforms), or taking the sum and difference of pair values, or the average and mean difference, etc.
More explicitly, either in the forward or the inverse transform (but not necessarily in both depending on the formulas used) eventually float numbers will appear. I can't have them for the coefficients because the substitution won't work and I can't have them for the reconstructed pixels because the image requires integer values for storage.
For example, let's consider a pair of pixels, A and B as a 1D array. The low frequency coefficient is defined by the sum, i.e., s = A + B, and the high frequency coefficient by the difference, i.e., d = A - B. We can then reconstruct the original pixels with B = (s - d) / 2 and A = s - B. However, after any bit twiddling with the coefficients, s - d may not be even anymore and float values will emerge for the reconstructed pixels.
For the 2D case, the 1D transform is applied separately for the rows and the columns, so eventually a division by 4 will occur somewhere. This can result in values with float remainders .00, .25, .50 and .75. I've only come across one paper which addresses this issue. The rest are very vague in their methodology and I struggle to replicate them. Yet, the DWT has been widely implemented for image steganography.
My question is, since some of the literature I've read hasn't been enlightening, how can this be possible? How can one use a transform which introduces float values, yet the whole steganography method requires integers?

One solution that has worked for me is using the Integer Wavelet Transform, which some also refer to as a lifting scheme. For the Haar wavelet, I've seen it defined as:
s = floor((A + B) / 2)
d = A - B
And for inverse:
A = s + floor((d + 1) / 2)
B = s - floor(d / 2)
All the values throughout the whole process are integers. The reason it works is because the formulas contain information about both the even and odd parts of the pixels/coefficients, so there is no loss of information from rounding down. Even if one modifies the coefficients and then takes the inverse transform, the reconstructed pixels will still be integers.
Example implementation in Python:
import numpy as np
def _iwt(array):
output = np.zeros_like(array)
nx, ny = array.shape
x = nx // 2
for j in xrange(ny):
output[0:x,j] = (array[0::2,j] + array[1::2,j])//2
output[x:nx,j] = array[0::2,j] - array[1::2,j]
return output
def _iiwt(array):
output = np.zeros_like(array)
nx, ny = array.shape
x = nx // 2
for j in xrange(ny):
output[0::2,j] = array[0:x,j] + (array[x:nx,j] + 1)//2
output[1::2,j] = output[0::2,j] - array[x:nx,j]
return output
def iwt2(array):
return _iwt(_iwt(array.astype(int)).T).T
def iiwt2(array):
return _iiwt(_iiwt(array.astype(int).T).T)
Some languages already have built-in functions for this purpose. For example, Matlab uses lwt2() and ilwt2() for 2D lifting-scheme wavelet transform.
els = {'p',[-0.125 0.125],0};
lshaarInt = liftwave('haar','int2int');
lsnewInt = addlift(lshaarInt,els);
[cAint,cHint,cVint,cDint] = lwt2(x,lsnewInt) % x is your image
xRecInt = ilwt2(cAint,cHint,cVint,cDint,lsnewInt);
An article example where IWT was used for image steganography is Raja, K.B. et. al (2008) Robust image adaptive steganography using integer wavelets.

fmincon : impose vector greater than zero constraint

How do you impose a constraint that all values in a vector you are trying to optimize for are greater than zero, using fmincon()?
According to the documentation, I need some parameters A and b, where A*x ≤ b, but I think if I make A a vector of -1's and b 0, then I will have optimized for the sum of x>0, instead of each value of x greater than 0.
Just in case you need it, here is my code. I am trying to optimize over a vector (x) such that the (componentwise) product of x and a matrix (called multiplierMatrix) makes a matrix for which the sum of the columns is x.
function [sse] = myfun(x) % this is a nested function
bigMatrix = repmat(x,1,120) .* multiplierMatrix;
answer = sum(bigMatrix,1)';
sse = sum((expectedAnswer - answer).^2);
end
xGuess = ones(1:120,1);
[sse xVals] = fmincon(#myfun,xGuess,???);
Let me know if I need to explain my problem better. Thanks for your help in advance!

You can use the lower bound:
xGuess = ones(120,1);
lb = zeros(120,1);
[sse xVals] = fmincon(#myfun,xGuess, [],[],[],[], lb);
note that xVals and sse should probably be swapped (if their name means anything).
The lower bound lb means that elements in your decision variable x will never fall below the corresponding element in lb, which is what you are after here.
The empties ([]) indicate you're not using linear constraints (e.g., A,b, Aeq,beq), only the lower bounds lb.
Some advice: fmincon is a pretty advanced function. You'd better memorize the documentation on it, and play with it for a few hours, using many different example problems.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas