I have three sets of matrices {A_i}, {B_i}, and {C_i}, with n matrices in each set.
The A_i are of dimension l x m, the B_i are of dimension m x o, and the C_i are of dimension p x q.
I would like to compute the sum over i of the Kronecker products kron(A_i B_i, C_i).
Here is a concrete example for what I am after
import numpy as np

A = np.arange(12).reshape(2,3,2)
B = np.arange(12,24).reshape(2,2,3)
C = np.arange(32).reshape(2,4,4)

result = np.zeros((12,12))
for i in range(2):
    result += np.kron(A[i,:,:] @ B[i,:,:], C[i,:,:])
How can I implement this more efficiently?
Many thanks for your help!
As suggested, I had a look into numpy.einsum. This turned out to be quite nice. A solution is:
np.einsum('ijk,imn->jmkn', np.einsum('ijk,ikm->ijm', A, B), C).reshape(A.shape[1] * C.shape[1], B.shape[2] * C.shape[2])
The inner np.einsum() produces a 3-D array of the matrix products of the 2-D "slices" of A and B.
The outer np.einsum() mimics (after the appropriate reshape) the Kronecker product of these slices with the slices of C, summed over i.
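For reference, here is a quick check on the small example above that the einsum expression reproduces the loop (same arrays as before):

import numpy as np

A = np.arange(12).reshape(2,3,2)
B = np.arange(12,24).reshape(2,2,3)
C = np.arange(32).reshape(2,4,4)

# loop version
expected = np.zeros((12,12))
for i in range(2):
    expected += np.kron(A[i] @ B[i], C[i])

# einsum version from above, split into its two steps
AB = np.einsum('ijk,ikm->ijm', A, B)        # batch of the products A_i B_i, shape (2, 3, 3)
result = np.einsum('ijk,imn->jmkn', AB, C).reshape(
    A.shape[1] * C.shape[1], B.shape[2] * C.shape[2])

print(np.allclose(result, expected))        # True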
I found the following two posts very helpful:
Understanding NumPy's einsum
Speeding Up Kronecker Products Numpy
I am using PyTorch, and I have a tensor A of dimensions [a,b,c] and a tensor B of dimensions [a,d]. I want to create a tensor C of dimensions [a,b,c,d] that multiplies the elements of A with the elements of B through the following operation:
for i in range(a):
    for j in range(b):
        for k in range(c):
            for l in range(d):
                C[i,j,k,l] = A[i,j,k] * B[i,l]
This works as intended, but it is very slow. What would be best practice for such an operation in PyTorch?
Thank you.
A.reshape(a, b, c, 1) * B.reshape(a, 1, 1, d)
will do the trick.
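For reference, a quick check that the broadcasting result matches the loop (small, made-up sizes just for the test):

import torch

a, b, c, d = 3, 4, 5, 6          # arbitrary small sizes for the check
A = torch.randn(a, b, c)
B = torch.randn(a, d)

# broadcasting version
C_fast = A.reshape(a, b, c, 1) * B.reshape(a, 1, 1, d)

# reference: the original quadruple loop
C_loop = torch.empty(a, b, c, d)
for i in range(a):
    for j in range(b):
        for k in range(c):
            for l in range(d):
                C_loop[i, j, k, l] = A[i, j, k] * B[i, l]

print(torch.allclose(C_fast, C_loop))       # True
# an equivalent one-liner: torch.einsum('ijk,il->ijkl', A, B)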
I face a least-squares problem that I solve via scipy.linalg.lstsq(M, b), where:
M has shape (n, n)
b has shape (n,)
The issue is that I have to solve it many times for different b's. How can I do this more efficiently? I guess that lstsq does a lot of work that is independent of the value of b.
Any ideas?
If your linear system is well-determined, I would store the LU decomposition of M and reuse it for all the b's individually, or simply do one solve call on a 2-D array B holding the horizontally stacked b's; it really depends on your problem, but it is globally the same idea. Let's suppose you get each b one at a time. Then:
import numpy as np
from scipy.linalg import lstsq, lu_factor, lu_solve, svd, pinv
# as you didn't specify any practical dimensions
n = 100
# number of b's
nb_b = 10
# generate random n-square matrix M
M = np.random.rand(n**2).reshape(n,n)
# Set of nb_b of right hand side vector b as columns
B = np.random.rand(n*nb_b).reshape(n,nb_b)
# compute pivoted LU decomposition of M
M_LU = lu_factor(M)
# then solve for each b
X_LU = np.asarray([lu_solve(M_LU,B[:,i]) for i in range(nb_b)])
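If all the b's are available up front (as the columns of B here), the loop can also be collapsed into a single call; a minimal sketch, assuming lu_solve accepts a 2-D right-hand side:

# solve for all right-hand sides at once; solutions come back as columns
X_LU_all = lu_solve(M_LU, B)
# X_LU above stacked the solutions as rows, so X_LU_all.T should match X_LU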
but if it is under- or over-determined, you need to use lstsq as you did:
X_lstsq = np.asarray([lstsq(M,B[:,i])[0] for i in range(nb_b)])
or simply store the pseudo-inverse M_pinv with pinv (built on lstsq) or pinv2 (built on SVD):
# compute the pseudo-inverse of M
M_pinv = pinv(M)
X_pinv = np.asarray([np.dot(M_pinv,B[:,i]) for i in range(nb_b)])
or you can also do the work yourself, as pinv2 does for instance: just store the SVD of M and solve manually:
# compute svd of M
U,s,Vh = svd(M)
def solve_svd(U,s,Vh,b):
    # U diag(s) Vh x = b <=> diag(s) Vh x = U.T b = c
    c = np.dot(U.T,b)
    # diag(s) Vh x = c <=> Vh x = diag(1/s) c = w (trivial inversion of a diagonal matrix)
    w = np.dot(np.diag(1/s),c)
    # Vh x = w <=> x = Vh.H w (where .H stands for Hermitian = conjugate transpose)
    x = np.dot(Vh.conj().T,w)
    return x
X_svd = np.asarray([solve_svd(U,s,Vh,B[:,i]) for i in range(nb_b)])
All of these give the same result when checked with np.allclose (unless the system is not well-determined, in which case the direct LU approach fails). Finally, in terms of performance:
%timeit M_LU = lu_factor(M); X_LU = np.asarray([lu_solve(M_LU,B[:,i]) for i in range(nb_b)])
1000 loops, best of 3: 1.01 ms per loop
%timeit X_lstsq = np.asarray([lstsq(M,B[:,i])[0] for i in range(nb_b)])
10 loops, best of 3: 47.8 ms per loop
%timeit M_pinv = pinv(M); X_pinv = np.asarray([np.dot(M_pinv,B[:,i]) for i in range(nb_b)])
100 loops, best of 3: 8.64 ms per loop
%timeit U,s,Vh = svd(M); X_svd = np.asarray([solve_svd(U,s,Vh,B[:,i]) for i in range(nb_b)])
100 loops, best of 3: 5.68 ms per loop
Nevertheless, it's up to you to check these with appropriate dimensions.
Hope this helps.
Your question is unclear, but I am guessing you want to solve the equation Mx = b via scipy.linalg.lstsq(M, b) for different arrays (b0, b1, b2, ...). If that is the case, you could parallelize the process with concurrent.futures.ProcessPoolExecutor. The documentation for it is fairly simple and can help Python run multiple scipy solvers at once.
Hope this helps.
You can factorize M into either QR or SVD products and find the least-squares solution manually.
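For example, a minimal sketch of the QR route (reusing M, B and nb_b from the answer above, and assuming M has full column rank):

import numpy as np
from scipy.linalg import qr, solve_triangular

# economic QR factorization, computed once
Q, R = qr(M, mode='economic')

# least-squares solution of M x = b is x = R^{-1} (Q^T b)
X_qr = np.asarray([solve_triangular(R, Q.T @ B[:, i]) for i in range(nb_b)])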
Suppose we have two tensors:
tensor A whose shape is (d,m,n)
tensor B whose shape is (d,n,l).
If we want to get the pairwise matrix products over the last two dimensions of A and B, I think we can use np.einsum('dmn,...nl->d...ml',A,B), whose result has shape (d,d,m,l). However, I would like to get the products of only some of the pairs, not all of them.
Given a parameter k, with 1 <= k <= d, I want the following pairwise matrix products, where B is indexed in a rolling way (like numpy.roll), i.e. its index is taken modulo d:

A(0,...)@B(0,...), A(0,...)@B(1,...), ..., A(0,...)@B(k-1,...);
A(1,...)@B(1,...), A(1,...)@B(2,...), ..., A(1,...)@B(k,...);
...
A(d-2,...)@B(d-2,...), A(d-2,...)@B(d-1,...), ..., A(d-2,...)@B(k-3,...);
A(d-1,...)@B(d-1,...), A(d-1,...)@B(0,...), ..., A(d-1,...)@B(k-2,...).
Finally, we actually get a tensor whose shape is (d,k,m,l).
What's the most efficient way to do this?
I know several ways like:
First get np.einsum('dmn,...nl->d...ml',A,B), then use a mask to extract the (d,k) pairs.
tile B first, then use einsum in some way.
But I think there exists a better way.
I doubt you can do much better than a for loop. Here is, for example, a vectorized version using einsum and stride_tricks compared to a double for loop:
Code:
from simple_benchmark import BenchmarkBuilder, MultiArgument
import numpy as np
from numpy.lib.stride_tricks import as_strided

B = BenchmarkBuilder()

@B.add_function()
def loopy(A, B, k):
    d, m, n = A.shape
    l = B.shape[-1]
    out = np.empty((d, k, m, l), int)
    for i in range(d):
        for j in range(k):
            out[i, j] = A[i] @ B[(i+j) % d]
    return out

@B.add_function()
def vectory(A, B, k):
    d, m, n = A.shape
    l = B.shape[-1]
    BB = np.concatenate([B, B[:k-1]], 0)
    BB = as_strided(BB, (d, k, n, l), np.repeat(BB.strides, (2, 1, 1)))
    return np.einsum("ikl,ijln->ijkn", A, BB)

@B.add_arguments('d x k x m x n x l')
def argument_provider():
    for exp in range(10):
        d, k, m, n, l = (np.r_[1.6, 1.5, 1.5, 1.5, 1.5]**exp * (4, 2, 2, 2, 2)).astype(int)
        print(d, k, m, n, l)
        A = np.random.randint(0, 10, (d, m, n))
        B = np.random.randint(0, 10, (d, n, l))
        yield k*d*m*n*l, MultiArgument([A, B, k])

r = B.run()
r.plot()

import pylab
pylab.savefig('diagwa.png')
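And a quick correctness check on small, made-up arrays (different names so the BenchmarkBuilder B above is not clobbered):

d, k, m, n, l = 5, 3, 4, 2, 6
A_chk = np.random.randint(0, 10, (d, m, n))
B_chk = np.random.randint(0, 10, (d, n, l))
print(np.array_equal(loopy(A_chk, B_chk, k), vectory(A_chk, B_chk, k)))   # True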
I am writing a simple linear regression cost function (Python) for a simple neural network. I have come across the following two alternate ways of summing the error (cost) over m examples using numpy (np) matrices.
The cost function is:
def compute_cost(X, Y, W):
    m = Y.size
    H = h(X, W)     # h(X, W): the hypothesis/prediction function, defined elsewhere
    error = H - Y
    J = (1/(2*m)) * np.sum(error **2, axis=0) #1 (sum squared error over m examples)
    return J
X is the input matrix.
Y is the output matrix (labels).
W is the weights matrix.
It seems that the statement:
J = (1/(2*m)) * np.sum(error **2, axis=0) #1 (sum squared error over m examples)
can be replaced by:
J = (1/(2*m)) * np.dot(error.T, error) #2
with the same result.
I do not understand why np.dot is equivalent to summing over m examples, or why the two statements give the same result. Could you please provide some leads, and also point me to some link(s) where I can read more about this relationship between np.sum and np.dot?
There's nothing special, just simple linear algebra.
According to the numpy documentation, np.dot(a, b) performs a different operation depending on the types of its inputs:
If both a and b are 1-D arrays, it is inner product of vectors (without complex conjugation).
If both a and b are 2-D arrays, it is matrix multiplication, but using matmul or a @ b is preferred.
If your error is a 1-D array, then the transpose error.T is equal to error, and np.dot computes their inner product, which equals the sum of the squares of the elements.
If your error is a 2-D array, then matrix multiplication rules apply, so each row of error.T multiplies each column of error. When error is an m-by-1 column vector, the result is a 1x1 matrix, which is essentially a scalar. When error is a 1-by-N row vector, the result is an N-by-N matrix.
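A tiny numeric example makes the equivalence concrete for the 1-D and column-vector cases:

import numpy as np

error = np.array([1.0, -2.0, 3.0])
print(np.sum(error ** 2, axis=0))        # 14.0
print(np.dot(error.T, error))            # 14.0 (error.T is the same as error for a 1-D array)

# as an (m, 1) column vector, np.dot returns a 1x1 matrix instead
error_col = error.reshape(-1, 1)
print(np.dot(error_col.T, error_col))    # [[14.]]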
Let's say A is an M x J matrix and B is an N x J matrix.
I would like to generate an M x N matrix S where:
S_ij = w^T [A_i ; B_j ; A_i ∘ B_j]
Basically, each element of the result matrix S is the dot product of some vector w (a row of a matrix W) with the concatenation of row A_i, row B_j, and their element-wise product A_i ∘ B_j, for all i and j.
Ideally I would like to vectorize the operation and work only with the full matrices S, A and B instead of slicing and writing for loops. However, I am new to TensorFlow and can't seem to figure out how to write the code in matrix form, since A_i ∘ B_j seems to result in a 2-D array that then needs to be concatenated with the vectors A_i and B_j. A plain loop version of what I am after is sketched below.
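For concreteness (NumPy here just to illustrate the definition, with made-up shapes; the goal is to express this without the loops):

import numpy as np

M, N, J = 4, 5, 3
A = np.random.randn(M, J)
B = np.random.randn(N, J)
w = np.random.randn(3 * J)

S = np.empty((M, N))
for i in range(M):
    for j in range(N):
        # concatenate row A_i, row B_j and their element-wise product, then dot with w
        feature = np.concatenate([A[i], B[j], A[i] * B[j]])   # length 3*J
        S[i, j] = w @ feature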
Thanks