How to vectorize this operation in numpy?

I have a 2d array s and I want to calculate differences elementwise, i.e. d[i, j] = s[i] - s[j] for every pair of rows i and j.
Since it cannot be written as a single matrix multiplication, I was wondering what is the proper way to vectorize it?

You can use broadcasting for that: d = s[:, None, :] - s[None, :, :]. Note that the None entries insert new axes of length 1, and numpy then implicitly performs the broadcasting operation between the two arrays.
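A minimal runnable sketch (with a made-up s, my addition) showing the shapes involved:
import numpy as np

s = np.arange(6.0).reshape(3, 2)      # shape (3, 2)
d = s[:, None, :] - s[None, :, :]     # shape (3, 3, 2)

# d[i, j] holds s[i] - s[j] for every pair of rows:
assert np.array_equal(d[1, 2], s[1] - s[2])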

Related

Numpy element-wise addition with multiple arrays

I'd like to know if there is a more efficient/pythonic way to add multiple numpy arrays (2D) rather than:
def sum_multiple_arrays(list_of_arrays):
    a = np.zeros(shape=list_of_arrays[0].shape)  # initialize array of 0s
    for array in list_of_arrays:
        a += array
    return a
PS: I am aware of np.add(), but it works only with 2 arrays.
np.sum(list_of_arrays, axis=0)
should work. Or
np.add.reduce(list_of_arrays).
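A quick check (my addition, with made-up arrays; assumes numpy is imported as np and sum_multiple_arrays is the function above) that both one-liners match the explicit loop:
import numpy as np

arrays = [np.full((2, 3), v, dtype=float) for v in (1, 2, 3)]

assert np.array_equal(np.sum(arrays, axis=0), sum_multiple_arrays(arrays))
assert np.array_equal(np.add.reduce(arrays), sum_multiple_arrays(arrays))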

Implementing custom matrix multiplication-like operations in numpy

I want to implement an operation on two matrices that is similar to matrix multiplication in that each element of the resulting matrix is a function of the ith row of the first matrix and the jth column of the second matrix.
I would like to be able to do this using numpy and/or pandas using vectorized computations.
In other words:
How do I implement $A \otimes B = C$
where $C_{ij} = \sum_k f(a_{ik}, b_{kj})$ in numpy and/or pandas?
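One way to sketch this (my suggestion, not from the thread): if f is already vectorized (a ufunc or built from numpy operations), you can broadcast the two operands against each other and reduce over the shared axis k:
import numpy as np

def generalized_matmul(A, B, f):
    # A[:, :, None] has shape (m, n, 1); B[None, :, :] has shape (1, n, p),
    # so f(...) broadcasts to (m, n, p); summing over axis 1 reduces over k.
    return f(A[:, :, None], B[None, :, :]).sum(axis=1)

A = np.arange(6.0).reshape(2, 3)
B = np.arange(12.0).reshape(3, 4)

# With f = np.multiply this reduces to ordinary matrix multiplication:
assert np.allclose(generalized_matmul(A, B, np.multiply), A @ B)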

Efficient axis-wise cartesian product of multiple 2D matrices with Numpy or TensorFlow

So first off, I think what I'm trying to achieve is some sort of Cartesian product but elementwise, across the columns only.
What I'm trying to do is: given multiple 2D arrays of shapes [(N,D1), (N,D2), (N,D3), ..., (N,Dn)],
compute a combinatorial product across axis=1, such that the final result is of shape (N, D) where D = D1*D2*D3*...*Dn.
e.g.
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[10, 20, 30],
              [5, 6, 7]])
cartesian_product([A, B], axis=1)
>> np.array([[1*10, 1*20, 1*30, 2*10, 2*20, 2*30],
             [3*5,  3*6,  3*7,  4*5,  4*6,  4*7]])
and extendable to cartesian_product([A,B,C,D...], axis=1)
e.g.
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[10, 20],
              [5, 6]])
C = np.array([[50, 0],
              [60, 8]])
cartesian_product([A, B, C], axis=1)
>> np.array([[1*10*50, 1*10*0, 1*20*50, 1*20*0, 2*10*50, 2*10*0, 2*20*50, 2*20*0],
             [3*5*60,  3*5*8,  3*6*60,  3*6*8,  4*5*60,  4*5*8,  4*6*60,  4*6*8]])
I have a working solution that essentially creates an empty (N, D) matrix and then fills it with broadcast column-wise vector products inside nested for loops, one loop per matrix in the provided list. Clearly this becomes horrible once the arrays get larger!
Is there an existing solution within numpy or tensorflow for this? Ideally one that is efficiently parallelizable (a tensorflow solution would be wonderful, but numpy is OK, and as long as the vector logic is clear it shouldn't be hard to make a tf equivalent).
I'm not sure if I need to use einsum, tensordot, meshgrid or some combination thereof to achieve this. I have a solution, but only for single-dimension vectors, from https://stackoverflow.com/a/11146645/2123721, even though that solution claims to work for arrays of arbitrary dimension (which appears to mean vectors). With that one I can do a .prod(axis=1), but again this is only valid for vectors.
thanks!
Here's one approach that works iteratively in an accumulating manner, extending dimensions for each pair from the list of arrays and using broadcasting for the elementwise multiplications -
L = [A, B, C]  # list of arrays
n = L[0].shape[0]

# seed with the pairwise product of the first two arrays: shape (n, D1*D2)
out = (L[1][:, None] * L[0][:, :, None]).reshape(n, -1)

# fold in each remaining array, growing the second axis each time
for i in L[2:]:
    out = (i[:, None] * out[:, :, None]).reshape(n, -1)
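A quick sanity check (my addition) against the second expected output above:
expected = np.array([[1*10*50, 1*10*0, 1*20*50, 1*20*0,
                      2*10*50, 2*10*0, 2*20*50, 2*20*0],
                     [3*5*60, 3*5*8, 3*6*60, 3*6*8,
                      4*5*60, 4*5*8, 4*6*60, 4*6*8]])
assert np.array_equal(out, expected)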

No broadcasting for tf.matmul in TensorFlow for 4D/3D tensors

First, I found another question here: No broadcasting for tf.matmul in TensorFlow
But that question does not solve my problem.
My problem is multiplying a batch of matrices by another batch of vectors.
x=tf.placeholder(tf.float32,shape=[10,1000,3,4])
y=tf.placeholder(tf.float32,shape=[1000,4])
x is a batch of matrices. There are 10*1000 matrices. Each matrix is of shape [3, 4].
y is a batch of vectors. There are 1000 vectors. Each vector is of shape [4].
Dim 1 of x and dim 0 of y are the same (here, 1000).
If tf.matmul supported broadcasting, I could write:
y=tf.reshape(y,[1,1000,4,1])
result=tf.matmul(x,y)
result=tf.reshape(result,[10,1000,3])
But tf.matmul does not support broadcasting.
If I use the approach from the question I referenced above:
x=tf.reshape(x,[10*1000*3,4])
y=tf.transpose(y,perm=[1,0]) #[4,1000]
result=tf.matmul(x,y)
result=tf.reshape(result,[10,1000,3,1000])
The result is of shape [10,1000,3,1000], not [10,1000,3].
I don't know how to remove the redundant 1000.
How do I get the same result as a tf.matmul that supports broadcasting?
I solved it myself:
x=tf.transpose(x,perm=[1,0,2,3]) #[1000,10,3,4]
x=tf.reshape(x,[1000,30,4])
y=tf.reshape(y,[1000,4,1])
result=tf.matmul(x,y) #[1000,30,1]
result=tf.reshape(result,[1000,10,3])
result=tf.transpose(result,perm=[1,0,2]) #[10,1000,3]
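For reference, here is a numpy sketch (my addition) of the same batched product; it computes result[b, n] = x[b, n] @ y[n], which einsum expresses directly:
import numpy as np

x = np.random.rand(10, 1000, 3, 4)
y = np.random.rand(1000, 4)

result = np.einsum('bnij,nj->bni', x, y)  # shape (10, 1000, 3)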
As indicated here, you can use a function as a workaround:
def broadcast_matmul(A, B):
    "Compute A @ B, broadcasting over the first `N-2` ranks"
    with tf.variable_scope("broadcast_matmul"):
        return tf.reduce_sum(A[..., tf.newaxis] * B[..., tf.newaxis, :, :],
                             axis=-2)
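Applied to the shapes from the question, usage could look like this (my reading, not from the original answer; the final squeeze drops the trailing length-1 axis):
y4 = tf.reshape(y, [1, 1000, 4, 1])   # make y broadcastable against x
result = broadcast_matmul(x, y4)      # shape [10, 1000, 3, 1]
result = tf.squeeze(result, axis=-1)  # shape [10, 1000, 3]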

einsum on a sparse matrix

It seems numpy's einsum function does not work with scipy.sparse matrices. Are there alternatives to do the sorts of things einsum can do with sparse matrices?
In response to @eickenberg's answer: The particular einsum I want to do is numpy.einsum("ki,kj->ij", A, A) - the sum of the outer products of the rows.
A restriction of scipy.sparse matrices is that they represent linear operators and are thus kept two dimensional, which leads to the question: Which operation are you seeking to do?
All einsum operations on a pair of 2D matrices are very easy to write without einsum using dot, transpose and pointwise operations, provided that the result does not exceed two dimensions.
So if you need a specific operation on a number of sparse matrices, it is probable that you can write it without einsum.
UPDATE: A specific way to implement np.einsum("ki, kj -> ij", A, A) is A.T.dot(A). In order to convince yourself, please try the following example:
import numpy as np
rng = np.random.RandomState(42)
a = rng.randn(3, 3)
b = rng.randn(3, 3)
the_einsum_ab = np.einsum("ki, kj -> ij", a, b)
the_a_transpose_times_b = a.T.dot(b)
# We write a test in order to assert equality
from numpy.testing import assert_array_equal
assert_array_equal(the_einsum_ab, the_a_transpose_times_b) # This passes, so equality
This result is slightly more general. Now if you use b = a you obtain your specific result.
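Since the question is about scipy.sparse, here is a quick check (my addition) that the same identity holds for a sparse matrix:
import numpy as np
from scipy import sparse

A = sparse.random(5, 4, density=0.3, format='csr', random_state=0)
dense = A.toarray()

sparse_result = A.T.dot(A).toarray()
einsum_result = np.einsum("ki,kj->ij", dense, dense)
assert np.allclose(sparse_result, einsum_result)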
einsum translates the index string into a calculation using the C version of np.nditer. http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html is a nice introduction to nditer. Note especially the Cython example at the end.
https://github.com/hpaulj/numpy-einsum/blob/master/einsum_py.py is a Python simulation of the einsum.
scipy.sparse has its own code (ultimately in C) to perform the basic operations: summation, matrix multiplication, etc. Sparse matrices have their own data structures. They can be lists, dictionaries, or a set of numpy arrays. Numpy notation can be used because sparse has the appropriate __xxx__ methods.
A sparse matrix is a matrix, a 2d array object. A sparse einsum could be written, but it would end up using the sparse matrix multiplication, not nditer. So at best it would be a notational convenience.
Sparse csr_matrix.dot is:
def dot(self, other):
    """Ordinary dot product
    ...
    """
    return self * other
A = sparse.csr_matrix([[1, 2], [3, 4]])

# All of the following produce the same result:
A.dot(A.T).A
(A * A.T).A
A.__rmul__(A.T).A
A.__mul__(A.T).A
np.einsum('ij,kj', A.A, A.A)
# array([[ 5, 11],
#        [11, 25]])