I have a matrix A of shape (n, m). My ultimate goal is to get Z of shape (n, m, m). Currently I am doing it with the loop below, but can it be done without a for loop, using something like numpy.tensordot or numpy.multiply.outer?
for i in range(0, A.shape[0]):
    Z[i, :, :] = np.outer(A[i, :], A[i, :])

You could use NumPy's Einstein summation, like this:
np.einsum('ij,ik->ijk', A, A)
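A minimal sketch wrapping that call in a function named using_einsum (the name the timings below use; its body here is an assumption) and checking it against the loop from the question. Since every index in 'ij,ik->ijk' appears in the output, nothing is summed; einsum only multiplies.

```python
import numpy as np

def using_einsum(A):
    # Z[i, j, k] = A[i, j] * A[i, k]; no index is summed over
    return np.einsum('ij,ik->ijk', A, A)

def using_loop(A):
    # The loop from the question, with Z preallocated
    n, m = A.shape
    Z = np.empty((n, m, m), dtype=A.dtype)
    for i in range(n):
        Z[i, :, :] = np.outer(A[i, :], A[i, :])
    return Z

A = np.random.random((4, 3))
Z = using_einsum(A)
print(Z.shape)                        # (4, 3, 3)
print(np.allclose(Z, using_loop(A)))  # True
```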
Just for completeness, the timing comparison with the also excellent answer (+1) from unutbu:
In [39]: A = np.random.random((1000,50))
In [40]: %timeit using_einsum(A)
100 loops, best of 3: 11.6 ms per loop
In [41]: %timeit using_broadcasting(A)
100 loops, best of 3: 10.2 ms per loop
In [42]: %timeit orig(A)
10 loops, best of 3: 27.8 ms per loop
Which teaches me that unutbu's machine is faster than mine, and that broadcasting is slightly faster than np.einsum here.

The loop
for i in range(0, A.shape[0]):
    Z[i, :, :] = np.outer(A[i, :], A[i, :])
means
Z_ijk = A_ij * A_ik
which can be computed using NumPy broadcasting:
Z = A[:, :, np.newaxis] * A[:, np.newaxis, :]
A[:, :, np.newaxis] has shape (n, m, 1) and A[:, np.newaxis, :] has shape
(n, 1, m). Multiplying the two causes both arrays to be broadcasted up to
shape (n, m, m).
NumPy multiplication is always performed elementwise. The values along the
broadcasted axis are the same everywhere, so elementwise multiplication results
in Z_ijk = A_ij * A_ik.
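A small sketch making the intermediate shapes visible, and, since the question mentions np.multiply.outer, showing how that relates: np.multiply.outer(A, A) pairs every row with every other row (shape (n, m, n, m)), and Z is only its "same row" slice, so it allocates n times more memory than needed. The sizes below are arbitrary.

```python
import numpy as np

n, m = 5, 3
A = np.random.random((n, m))

left = A[:, :, np.newaxis]   # shape (n, m, 1)
right = A[:, np.newaxis, :]  # shape (n, 1, m)
Z = left * right             # both broadcast to (n, m, m)

# Spot-check Z_ijk = A_ij * A_ik at one index
i, j, k = 2, 0, 1
print(np.isclose(Z[i, j, k], A[i, j] * A[i, k]))  # True

# np.multiply.outer(A, A)[i, j, l, k] = A[i, j] * A[l, k]
full = np.multiply.outer(A, A)
rows = np.arange(n)
# Index arrays on axes 0 and 2 pick l == i; the result has shape (n, m, m)
print(np.allclose(full[rows, :, rows, :], Z))  # True
```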
import numpy as np
def orig(A):
    Z = np.empty(A.shape + (A.shape[-1],), dtype=A.dtype)
    for i in range(0, A.shape[0]):
        Z[i, :, :] = np.outer(A[i, :], A[i, :])
    return Z
def using_broadcasting(A):
    return A[:, :, np.newaxis] * A[:, np.newaxis, :]
Here is a sanity check showing this produces the correct result:
A = np.random.random((1000,50))
assert np.allclose(using_broadcasting(A), orig(A))
By choosing A.shape[0] to be large we get an example which shows off the
advantage of broadcasting over looping in Python:
In [107]: %timeit using_broadcasting(A)
10 loops, best of 3: 6.12 ms per loop
In [108]: %timeit orig(A)
100 loops, best of 3: 16.9 ms per loop


Loop through numpy array on indexes and apply function [duplicate]

I have two arrays that have the shapes N X T and M X T. I'd like to compute the correlation coefficient across T between every possible pair of rows n and m (from N and M, respectively).
What's the fastest, most pythonic way to do this? (Looping over N and M would seem to me to be neither fast nor pythonic.) I'm expecting the answer to involve numpy and/or scipy. Right now my arrays are numpy arrays, but I'm open to converting them to a different type.
I'm expecting my output to be an array with the shape N X M.
N.B. When I say "correlation coefficient," I mean the Pearson product-moment correlation coefficient.
Here are some things to note:
The numpy function correlate requires input arrays to be one-dimensional.
The numpy function corrcoef accepts two-dimensional arrays, but they must have the same shape.
The scipy.stats function pearsonr requires input arrays to be one-dimensional.
Correlation (default 'valid' case) between two 2D arrays:
You can simply use matrix multiplication like so:
out = np.dot(arr_one, arr_two.T)
Correlation with the default "valid" case between each pairwise row combination (row1, row2) of the two input arrays corresponds to the multiplication result at position (row1, row2).
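A quick sketch checking that claim on random data (the shapes are arbitrary): for two equal-length 1D inputs, np.correlate in 'valid' mode returns a single value, the sum of elementwise products, which is exactly one entry of the matrix product.

```python
import numpy as np

arr_one = np.random.rand(4, 6)  # N x T
arr_two = np.random.rand(3, 6)  # M x T

out = np.dot(arr_one, arr_two.T)  # N x M

# Equal-length inputs: 'valid' mode yields a length-1 array, sum(r1 * r2)
expected = np.array([[np.correlate(r1, r2, 'valid')[0] for r2 in arr_two]
                     for r1 in arr_one])
print(out.shape)                   # (4, 3)
print(np.allclose(out, expected))  # True
```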
Row-wise Correlation Coefficient calculation for two 2D arrays:
def corr2_coeff(A, B):
    # Rowwise mean of input arrays & subtract from input arrays themselves
    A_mA = A - A.mean(1)[:, None]
    B_mB = B - B.mean(1)[:, None]
    # Sum of squares across rows
    ssA = (A_mA**2).sum(1)
    ssB = (B_mB**2).sum(1)
    # Finally get corr coeff
    return np.dot(A_mA, B_mB.T) / np.sqrt(np.dot(ssA[:, None], ssB[None]))
This is based upon this solution to How to apply corr2 functions in Multidimentional arrays in MATLAB
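One way to cross-check corr2_coeff, sketched here with arbitrary sizes: np.corrcoef does not need A and B to have the same number of rows if you stack them, only the same T. It returns an (N+M) x (N+M) matrix whose top-right N x M block should match.

```python
import numpy as np

def corr2_coeff(A, B):
    # Same computation as above
    A_mA = A - A.mean(1)[:, None]
    B_mB = B - B.mean(1)[:, None]
    ssA = (A_mA**2).sum(1)
    ssB = (B_mB**2).sum(1)
    return np.dot(A_mA, B_mB.T) / np.sqrt(np.dot(ssA[:, None], ssB[None]))

N, M, T = 4, 5, 20
A = np.random.rand(N, T)
B = np.random.rand(M, T)

# Top-right block of the stacked correlation matrix holds corr(A[n], B[m])
reference = np.corrcoef(np.vstack((A, B)))[:N, N:]
print(np.allclose(corr2_coeff(A, B), reference))  # True
```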
This section compares the runtime performance of the proposed approach against generate_correlation_map and the loopy pearsonr-based approach listed in the other answer (taken from the function test_generate_correlation_map() without the value-correctness verification code at the end of it). Please note that the timings for the proposed approach also include a check at the start for an equal number of columns in the two input arrays, as done in that other answer. The runtimes are listed next.
Case #1:
In [106]: A = np.random.rand(1000, 100)
In [107]: B = np.random.rand(1000, 100)
In [108]: %timeit corr2_coeff(A, B)
100 loops, best of 3: 15 ms per loop
In [109]: %timeit generate_correlation_map(A, B)
100 loops, best of 3: 19.6 ms per loop
Case #2:
In [110]: A = np.random.rand(5000, 100)
In [111]: B = np.random.rand(5000, 100)
In [112]: %timeit corr2_coeff(A, B)
1 loops, best of 3: 368 ms per loop
In [113]: %timeit generate_correlation_map(A, B)
1 loops, best of 3: 493 ms per loop
Case #3:
In [114]: A = np.random.rand(10000, 10)
In [115]: B = np.random.rand(10000, 10)
In [116]: %timeit corr2_coeff(A, B)
1 loops, best of 3: 1.29 s per loop
In [117]: %timeit generate_correlation_map(A, B)
1 loops, best of 3: 1.83 s per loop
The other, loopy pearsonr-based approach seemed too slow, but here are the runtimes for one small data size:
In [118]: A = np.random.rand(1000, 100)
In [119]: B = np.random.rand(1000, 100)
In [120]: %timeit corr2_coeff(A, B)
100 loops, best of 3: 15.3 ms per loop
In [121]: %timeit generate_correlation_map(A, B)
100 loops, best of 3: 19.7 ms per loop
In [122]: %timeit pearsonr_based(A, B)
1 loops, best of 3: 33 s per loop
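The pearsonr_based function timed above is not shown. A plausible version, and this is an assumption rather than the code actually timed, makes one scipy.stats.pearsonr call per (row of A, row of B) pair, i.e. N * M Python-level calls, which explains the huge gap:

```python
import numpy as np
from scipy.stats import pearsonr

def pearsonr_based(A, B):
    # Hypothetical loopy baseline: one pearsonr call per row pair
    out = np.empty((A.shape[0], B.shape[0]))
    for n in range(A.shape[0]):
        for m in range(B.shape[0]):
            out[n, m] = pearsonr(A[n], B[m])[0]  # [0] is the r statistic
    return out

A = np.random.rand(3, 10)
B = np.random.rand(4, 10)
print(pearsonr_based(A, B).shape)  # (3, 4)
```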
@Divakar provides a great option for computing the unscaled correlation, which is what I originally asked for.
In order to calculate the correlation coefficient, a bit more is required:
import numpy as np
def generate_correlation_map(x, y):
    """Correlate each n with each m.
    Parameters
    ----------
    x : np.array
      Shape N X T.
    y : np.array
      Shape M X T.
    Returns
    -------
    np.array
      N X M array in which each element is a correlation coefficient.
    """
    mu_x = x.mean(1)
    mu_y = y.mean(1)
    n = x.shape[1]
    if n != y.shape[1]:
        raise ValueError('x and y must ' +
                         'have the same number of timepoints.')
    s_x = x.std(1, ddof=n - 1)
    s_y = y.std(1, ddof=n - 1)
    cov = np.dot(x,
                 y.T) - n * np.dot(mu_x[:, np.newaxis],
                                  mu_y[np.newaxis, :])
    return cov / np.dot(s_x[:, np.newaxis], s_y[np.newaxis, :])
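Why ddof=n - 1? np.std divides the sum of squared deviations by n - ddof, so with ddof=n - 1 it divides by 1 and s_x becomes the square root of the plain sum of squares. The numerator is likewise the unnormalized covariance, because sum(x*y) - n*mu_x*mu_y equals sum((x - mu_x)*(y - mu_y)). A small numerical sketch of both identities (sizes are arbitrary):

```python
import numpy as np

x = np.random.rand(3, 8)
y = np.random.rand(2, 8)
n = x.shape[1]

# Denominator: std with ddof = n - 1 divides by n - (n - 1) = 1
s_x = x.std(1, ddof=n - 1)
root_ss = np.sqrt(((x - x.mean(1)[:, None]) ** 2).sum(1))
print(np.allclose(s_x, root_ss))  # True

# Numerator: raw cross products minus n * outer(means) = centered cross products
cov = np.dot(x, y.T) - n * np.outer(x.mean(1), y.mean(1))
centered = np.dot(x - x.mean(1)[:, None], (y - y.mean(1)[:, None]).T)
print(np.allclose(cov, centered))  # True
```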
Here's a test of this function, which passes:
from scipy.stats import pearsonr
def test_generate_correlation_map():
    x = np.random.rand(10, 10)
    y = np.random.rand(20, 10)
    desired = np.empty((10, 20))
    for n in range(x.shape[0]):
        for m in range(y.shape[0]):
            desired[n, m] = pearsonr(x[n, :], y[m, :])[0]
    actual = generate_correlation_map(x, y)
    np.testing.assert_array_almost_equal(actual, desired)
For those interested in computing the Pearson correlation coefficient between a 1D and 2D array, I wrote the following function, where x is a 1D array and y a 2D array.
import numpy as np
def pearsonr_2D(x, y):
    """computes pearson correlation coefficient
    where x is a 1D and y a 2D array"""
    upper = np.sum((x - np.mean(x)) * (y - np.mean(y, axis=1)[:, None]), axis=1)
    lower = np.sqrt(np.sum(np.power(x - np.mean(x), 2)) * np.sum(np.power(y - np.mean(y, axis=1)[:, None], 2), axis=1))
    rho = upper / lower
    return rho
Example run:
>>> x
Out[1]: array([1, 2, 3])
>>> y
Out[2]: array([[ 1, 2, 3],
[ 6, 7, 12],
[ 9, 3, 1]])
>>> pearsonr_2D(x, y)
Out[3]: array([ 1. , 0.93325653, -0.96076892])
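To sanity-check pearsonr_2D, a sketch comparing it with np.corrcoef applied to x and each row of y in turn (np.corrcoef(x, row) returns a 2 x 2 matrix whose [0, 1] entry is the coefficient), using the example data above:

```python
import numpy as np

def pearsonr_2D(x, y):
    """Pearson r between 1D x and each row of 2D y (same code as above)."""
    upper = np.sum((x - np.mean(x)) * (y - np.mean(y, axis=1)[:, None]), axis=1)
    lower = np.sqrt(np.sum(np.power(x - np.mean(x), 2))
                    * np.sum(np.power(y - np.mean(y, axis=1)[:, None], 2), axis=1))
    return upper / lower

x = np.array([1, 2, 3])
y = np.array([[1, 2, 3], [6, 7, 12], [9, 3, 1]])

rho = pearsonr_2D(x, y)
reference = np.array([np.corrcoef(x, row)[0, 1] for row in y])
print(np.allclose(rho, reference))  # True
```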

fastest way to use numpy.interp on a 2-D array

I have the following problem. I am trying to find the fastest way to use the interpolation method of numpy on a 2-D array of x-coordinates.
import numpy as np
xp = [0.0, 0.25, 0.5, 0.75, 1.0]
x = np.random.rand(10)
fp = np.random.rand(10, 5)
So basically, xp would be the x-coordinates of the data points, x would be an array containing the x-coordinates of the values I want to interpolate, and fp would be a 2-D array containing the y-coordinates of the data points.
[0.0, 0.25, 0.5, 0.75, 1.0]
array([ 0.54340494, 0.27836939, 0.42451759, 0.84477613, 0.00471886,
0.12156912, 0.67074908, 0.82585276, 0.13670659, 0.57509333])
array([[ 0.89132195, 0.20920212, 0.18532822, 0.10837689, 0.21969749],
[ 0.97862378, 0.81168315, 0.17194101, 0.81622475, 0.27407375],
[ 0.43170418, 0.94002982, 0.81764938, 0.33611195, 0.17541045],
[ 0.37283205, 0.00568851, 0.25242635, 0.79566251, 0.01525497],
[ 0.59884338, 0.60380454, 0.10514769, 0.38194344, 0.03647606],
[ 0.89041156, 0.98092086, 0.05994199, 0.89054594, 0.5769015 ],
[ 0.74247969, 0.63018394, 0.58184219, 0.02043913, 0.21002658],
[ 0.54468488, 0.76911517, 0.25069523, 0.28589569, 0.85239509],
[ 0.97500649, 0.88485329, 0.35950784, 0.59885895, 0.35479561],
[ 0.34019022, 0.17808099, 0.23769421, 0.04486228, 0.50543143]])
The desired outcome should look like this:
array([ 0.17196795, 0.73908678, 0.85459966, 0.49980648, 0.59893702,
0.9344241 , 0.19840596, 0.45777785, 0.92570835, 0.17977264])
Again, I'm looking for the fastest way to do this, because this is a simplified version of my problem, which has a length of about 1 million instead of 10.
So basically you want output equivalent to
np.array([np.interp(x[i], xp, fp[i]) for i in range(x.size)])
But that for loop is going to make that pretty slow for large x.size
This should work:
def multiInterp(x, xp, fp):
    xp = np.asarray(xp)  # allow xp to be passed as a list
    i, j = np.nonzero(np.diff(xp[None, :] < x[:, None]))
    d = (x - xp[j]) / np.diff(xp)[j]
    return fp[i, j] + np.diff(fp)[i, j] * d
EDIT: This works even better and can handle bigger arrays:
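def multiInterp2(x, xp, fp):
    xp = np.asarray(xp)  # allow xp to be passed as a list
    i = np.arange(x.size)
    # clip keeps x == xp[0] from giving j = -1 (which would wrap around)
    j = np.clip(np.searchsorted(xp, x) - 1, 0, xp.size - 2)
    d = (x - xp[j]) / (xp[j + 1] - xp[j])
    return (1 - d) * fp[i, j] + fp[i, j + 1] * d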
multiInterp2(x, xp, fp)
array([ 0.17196795, 0.73908678, 0.85459966, 0.49980648, 0.59893702,
0.9344241 , 0.19840596, 0.45777785, 0.92570835, 0.17977264])
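A sketch verifying multiInterp2 against the per-row np.interp loop from the question, with the same array sizes. Inside [xp[0], xp[-1]] the two should agree; outside that range this version extrapolates linearly, whereas np.interp clamps to the end values.

```python
import numpy as np

def multiInterp2(x, xp, fp):
    xp = np.asarray(xp)  # list input allowed
    i = np.arange(x.size)
    # Left interval index per x; clipped so x == xp[0] maps to interval 0
    j = np.clip(np.searchsorted(xp, x) - 1, 0, xp.size - 2)
    d = (x - xp[j]) / (xp[j + 1] - xp[j])  # fractional position in interval
    return (1 - d) * fp[i, j] + fp[i, j + 1] * d

xp = [0.0, 0.25, 0.5, 0.75, 1.0]
x = np.random.rand(10)
fp = np.random.rand(10, 5)

# Reference: one np.interp call per row, as in the question
looped = np.array([np.interp(x[i], xp, fp[i]) for i in range(x.size)])
print(np.allclose(multiInterp2(x, xp, fp), looped))  # True
```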
Timing tests with original data:
%timeit multiInterp2(x, xp, fp)
The slowest run took 6.87 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 25.5 µs per loop
%timeit np.concatenate([compiled_interp(x[[i]], xp, fp[i]) for i in range(fp.shape[0])])
The slowest run took 4.03 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 39.3 µs per loop
Seems to be faster even for a small size of x
Let's try something much, much bigger:
n = 10000
m = 10000
xp = np.linspace(0, 1, n)
x = np.random.rand(m)
fp = np.random.rand(m, n)
%timeit b() # kazemakase's above
10 loops, best of 3: 38.4 ms per loop
%timeit multiInterp2(x, xp, fp)
100 loops, best of 3: 2.4 ms per loop
The advantage scales a lot better, even compared with the compiled version of np.interp.
np.interp is basically a wrapper around the compiled numpy.core.multiarray.interp. We can shave off a bit of overhead by calling it directly:
from numpy.core.multiarray import interp as compiled_interp
def a(x=x, xp=xp, fp=fp):
    return np.array([np.interp(x[i], xp, fp[i]) for i in range(fp.shape[0])])
def b(x=x, xp=xp, fp=fp):
    return np.concatenate([compiled_interp(x[[i]], xp, fp[i]) for i in range(fp.shape[0])])
def multiInterp(x=x, xp=xp, fp=fp):
    xp = np.asarray(xp)  # allow xp to be passed as a list
    i, j = np.nonzero(np.diff(xp[None, :] < x[:, None]))
    d = (x - xp[j]) / np.diff(xp)[j]
    return fp[i, j] + np.diff(fp)[i, j] * d
Timing tests show that for the example arrays this is on par with Daniel Forsman's nice solution:
%timeit a()
10000 loops, best of 3: 44.7 µs per loop
%timeit b()
10000 loops, best of 3: 32 µs per loop
%timeit multiInterp()
10000 loops, best of 3: 33.3 µs per loop
For somewhat larger arrays multiInterp owns the floor:
n = 100
m = 1000
xp = np.linspace(0, 1, n)
x = np.random.rand(m)
fp = np.random.rand(m, n)
%timeit a()
100 loops, best of 3: 4.14 ms per loop
%timeit b()
100 loops, best of 3: 2.97 ms per loop
%timeit multiInterp()
1000 loops, best of 3: 1.42 ms per loop
But for even larger ones it falls behind:
n = 1000
m = 10000
%timeit a()
10 loops, best of 3: 43.3 ms per loop
%timeit b()
10 loops, best of 3: 32.9 ms per loop
%timeit multiInterp()
10 loops, best of 3: 132 ms per loop
Finally, for very big arrays (I'm on 32 bit) temporary arrays become a problem:
n = 10000
m = 10000
%timeit a()
10 loops, best of 3: 46.2 ms per loop
%timeit b()
10 loops, best of 3: 32.1 ms per loop
%timeit multiInterp()
# MemoryError

Numpy mean and std over every terms of arrays

I have a list of 2-dimensional arrays (all the same shape), and would like to get the element-wise mean and standard deviation across them, as result arrays of the same shape as the inputs. I have trouble understanding from the docs whether this is possible. All my attempts with the axis and keepdims parameters produce results of different shapes.
I would like for example to have: mean([x, x]) equal to x, and std([x, x]) zeroes shaped like x.
Is this possible without reshaping the arrays? If not, how do I do it with reshaping?
>>> x = np.array([[1,2],[3,4]])
>>> y= np.array([[2,3],[4,5]])
>>> np.mean([x,y])
I want [[1.5,2.5],[3.5,4.5]] instead.
As Divakar points out, you can pass the list of arrays to np.mean and specify axis=0 to average over corresponding values from each array in the list:
In [13]: np.mean([x,y], axis=0)
array([[ 1.5, 2.5],
[ 3.5, 4.5]])
This works for lists of arbitrary length. For just two arrays, (x+y)/2.0 is faster:
In [20]: %timeit (x+y)/2.0
100000 loops, best of 3: 1.96 µs per loop
In [21]: %timeit np.mean([x,y], axis=0)
10000 loops, best of 3: 21.6 µs per loop
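The question also asks for the standard deviation, and the same axis=0 idea should apply to np.std. A minimal sketch, using np.stack to make the new leading axis explicit (np.mean([x, y], axis=0) performs the same conversion implicitly). Note that np.std defaults to the population standard deviation (ddof=0).

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[2, 3], [4, 5]])

stacked = np.stack([x, y])   # shape (2, 2, 2): one slice per input array
mean = stacked.mean(axis=0)  # equals [[1.5, 2.5], [3.5, 4.5]], same shape as x
std = stacked.std(axis=0)    # ddof=0; every entry is 0.5 here

# The case from the question: std([x, x]) is all zeros, shaped like x
zeros = np.std([x, x], axis=0)
print(mean.shape, std.shape, zeros.shape)
```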

Nicer way to do nested dot products in numpy?

I'm finding this happening to me a lot: I want to compute a matrix product of the sort (X^T X)^{-1} X X^T, or something along these lines. I end up doing something like
X = np.array([[1,2],[3,4]])
a = np.dot(np.transpose(X), X)
b = np.dot(np.linalg.inv(a), X)
answer = np.dot(b, np.transpose(X))
Is there a better way to do this without resorting to the np.matrix type? Is there a way to do transpose without typing np.transpose?
Let's explore the options a bit
def array1(X):
    a = np.dot(X.T, X)
    b = np.dot(np.linalg.inv(a), X)
    return np.dot(b, X.T)
Basically your code, but using the .T notation instead of np.transpose.
Testing with your X:
In [12]: array1(X)
array([[-13.5, -32.5],
[ 10. , 24. ]])
What's the matrix equivalent?
In [17]: M=np.matrix(X)
In [18]: (M.T*M).I*M*M.T
matrix([[-13.5, -32.5],
[ 10. , 24. ]])
The matrix version is more compact, but is it clearer? It's not faster.
In [22]: timeit array1(X)
10000 loops, best of 3: 48.7 µs per loop
In [23]: timeit (M.T*M).I*M*M.T
10000 loops, best of 3: 95.4 µs per loop
First stab at an einsum equivalent:
In [32]: np.einsum('ij,jk,lk',inv(np.einsum('ji,jk',X,X)),X,X)
array([[-13.5, -32.5],
[ 10. , 24. ]])
In [33]: timeit np.einsum('ij,jk,lk',inv(np.einsum('ji,jk',X,X)),X,X)
10000 loops, best of 3: 55.1 µs per loop
That's basically the same as the dot version.
The matrix version shows me that I can simplify the array version to:
(same timing)
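Assuming Python 3.5+ and a NumPy recent enough to support it, the @ operator removes most of the nesting, and .T replaces np.transpose. np.linalg.multi_dot is another option for longer chains: it picks the cheapest multiplication order, which only matters for non-square shapes. A minimal sketch with the question's X:

```python
import numpy as np

X = np.array([[1, 2], [3, 4]])

# @ is left-associative: (inv(X^T X) @ X) @ X^T
answer = np.linalg.inv(X.T @ X) @ X @ X.T

# Same product via multi_dot, which chooses the evaluation order
answer2 = np.linalg.multi_dot([np.linalg.inv(X.T @ X), X, X.T])

print(np.allclose(answer, answer2))  # True
```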

Tensordot for numpy array and scipy sparse matrix

For a current project I have to compute the inner product of a lot of vectors with the same matrix (which is quite sparse). The vectors are associated with a two dimensional grid so I store the vectors in a three dimensional array:
X is an array of shape (I, J, N). The matrix A has shape (N, N). Now the task is to compute np.dot(A, X[i, j]) for each (i, j) in I x J.
For numpy arrays, this is quite easily accomplished with
Y = np.dot(X, A.T)
Now I'd like to store A as a sparse matrix, since it only contains a very limited number of nonzero entries and the dense product performs a lot of unnecessary multiplications. Unfortunately, the above solution won't work, since numpy's dot doesn't work with sparse matrices. And to the best of my knowledge there is no tensordot-like operation for scipy sparse.
Does anybody know a nice and efficient way to compute the above array Y with a sparse matrix A?
The obvious approach is to run a loop over your vectors and use the sparse matrix's .dot method:
import numpy as np
def naive_sps_x_dense_vecs(sps_mat, dense_vecs):
    rows, cols = sps_mat.shape
    I, J, _ = dense_vecs.shape
    out = np.empty((I, J, rows))
    for i in range(I):
        for j in range(J):
            out[i, j] = sps_mat.dot(dense_vecs[i, j])
    return out
But you may be able to speed things up a little by reshaping your 3D array to 2D and avoiding the Python loop:
def sps_x_dense_vecs(sps_mat, dense_vecs):
    rows, cols = sps_mat.shape
    vecs_shape = dense_vecs.shape
    dense_vecs = dense_vecs.reshape(-1, cols)
    out = sps_mat.dot(dense_vecs.T).T
    return out.reshape(vecs_shape[:-1] + (rows,))
The problem is that the sparse matrix has to be the first argument so that we can call its .dot method. That means the result comes back transposed, which in turn means that, after transposing, the last reshape triggers a copy of the whole array. So for fairly large values of I and J, combined with not-so-large values of N, the latter method will be several times faster than the former, but performance may even be reversed for other combinations of the parameters:
n, i, j = 100, 500, 500
a = sps.rand(n, n, density=1/n, format='csc')
vecs = np.random.rand(i, j, n)
>>> np.allclose(naive_sps_x_dense_vecs(a, vecs), sps_x_dense_vecs(a, vecs))
n, i, j = 100, 500, 500
%timeit naive_sps_x_dense_vecs(a, vecs)
1 loops, best of 3: 3.85 s per loop
%timeit sps_x_dense_vecs(a, vecs)
1 loops, best of 3: 576 ms per loop
n, i, j = 1000, 200, 200
%timeit naive_sps_x_dense_vecs(a, vecs)
1 loops, best of 3: 791 ms per loop
%timeit sps_x_dense_vecs(a, vecs)
1 loops, best of 3: 1.3 s per loop
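Before timing, it is worth confirming the reshape-based version against a dense reference. A sketch with small arbitrary sizes: converting A to a dense array (only feasible when A is small) and using np.dot(X, A.T) computes A @ X[i, j] for every (i, j).

```python
import numpy as np
import scipy.sparse as sps

def sps_x_dense_vecs(sps_mat, dense_vecs):
    # Reshape-based version from above
    rows, cols = sps_mat.shape
    vecs_shape = dense_vecs.shape
    dense_vecs = dense_vecs.reshape(-1, cols)  # (I*J, N)
    out = sps_mat.dot(dense_vecs.T).T          # (I*J, rows)
    return out.reshape(vecs_shape[:-1] + (rows,))

n, i, j = 20, 6, 7
a = sps.rand(n, n, density=0.1, format='csc')
vecs = np.random.rand(i, j, n)

# Dense reference: Y[i, j] = A @ X[i, j] for every (i, j)
dense_ref = np.dot(vecs, a.toarray().T)
print(np.allclose(sps_x_dense_vecs(a, vecs), dense_ref))  # True
```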
You could use jax to achieve what you are looking for. Let's suppose your sparse matrix is in csr_array format. You would first transform it into a jax BCOO array:
from scipy import sparse
from jax.experimental import sparse as jaxsparse
import jax.numpy as jnp
def convert_to_BCOO(x):
    x = x.transpose()  # get the transpose
    x = x.tocoo()
    x = jaxsparse.BCOO((x.data, jnp.column_stack((x.row, x.col))), shape=x.shape)
    x = x.sort_indices()
    return x
You could then use jax.sparsify to create a sparsified dot product as follows.
def dot(x, y):
    return jnp.dot(x, y)
sp_dot = jaxsparse.sparsify(dot)
A_transpose = convert_to_BCOO(A)
Y = sp_dot(X,A_transpose)
The function sp_dot now follows the exact same rules as np.dot.
Hope this helps!