Numpy broadcasting inequality operator

Numpy broadcasting question. I have two arrays similar to these:
>>> my_array = np.array([[3, 1, 2, 0], [4, 5, 2, 1]])
>>> my_array
array([[3, 1, 2, 0],
       [4, 5, 2, 1]])
>>> second_array = np.array([2, 5])
>>> second_array
array([2, 5])
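What I want to do is treat second_array as a column and test, row by row, whether my_array is >= second_array, so that every element in a row of my_array is compared against the corresponding element of second_array. The result would look like this: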
>>> final_array = np.array([[(3 >= 2), (1 >= 2), (2 >= 2), (0 >= 2)], [(4 >= 5), (5 >= 5), (2 >= 5), (1 >= 5)]])
>>> final_array
array([[ True, False,  True, False],
       [False,  True, False, False]], dtype=bool)
I'm pretty new to matrix operations in Numpy (been doing them in R for a long time) so thanks for helping with such an introductory question.

You just need to reshape second_array into a column of shape (2, 1) so that it broadcasts across the columns of my_array:
my_array >= second_array.reshape(2,1) # or (-1,1) if height is unknown
Or equivalently:
my_array >= second_array[:,np.newaxis]
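To see why the reshape is needed, it can help to look at the shapes involved. The sketch below reuses the arrays from the question (the intermediate name column and the printed checks are additions for illustration). Shapes (2, 4) and (2,) are not broadcast-compatible because the trailing dimensions 4 and 2 differ, whereas (2, 4) and (2, 1) are: the single column is stretched across all four columns.

```python
import numpy as np

my_array = np.array([[3, 1, 2, 0], [4, 5, 2, 1]])   # shape (2, 4)
second_array = np.array([2, 5])                      # shape (2,)

# Add a trailing axis so each element lines up with one row of my_array.
column = second_array[:, np.newaxis]                 # shape (2, 1)

# The (2, 1) column is broadcast across the 4 columns of my_array.
result = my_array >= column                          # shape (2, 4)

print(column.shape)
print(result)
```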

Related

How do you concatenate several 2D arrays in numpy?

I would like
np.concatenate((np.array([[5,5],[2,3]]),np.array([[6,4],[7,8]])))
to yield
[ [[5,5],[2,3]], [[6,4],[7,8]] ]
Concatenate doesn't do the trick, but I am lost on how else to do it!
You can use numpy.stack() or numpy.append() (I suggest append if you have a large code base). Just note that this is NumPy's append function, not the append method of Python's built-in list.
>>> import numpy as np
>>> a = np.array([[5,5],[2,3]])
>>> b = np.array([[6,4],[7,8]])
>>> c = np.append([a], [b], axis=0)
>>> c
# answer:
array([[[5, 5],
        [2, 3]],

       [[6, 4],
        [7, 8]]])
now if we go with np.stack():
>>> d = np.stack((a,b))
>>> c == d
# answer:
array([[[ True, True],
[ True, True]],
[[ True, True],
[ True, True]]])
As you can see, the two results are identical.
See the NumPy documentation for numpy.append and numpy.stack for details.
for anyone wondering np.stack((a,b)) does the trick :)
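The difference between the two functions comes down to which axis is used. As a small sketch (the names joined, stacked and also_stacked are illustrative only): concatenate joins along an axis that already exists, while stack creates a new one, which is what the question asks for.

```python
import numpy as np

a = np.array([[5, 5], [2, 3]])
b = np.array([[6, 4], [7, 8]])

# concatenate joins along the existing axis 0: two (2, 2) arrays -> (4, 2)
joined = np.concatenate((a, b))

# stack inserts a new leading axis: two (2, 2) arrays -> (2, 2, 2)
stacked = np.stack((a, b))

# For equally shaped inputs, building an array from a list gives the same result.
also_stacked = np.array([a, b])

print(joined.shape, stacked.shape)
```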

Get indices of slices with at least one element obeying some condition

I have an ndarray A of shape (n, a, b)
I want a Boolean ndarray X of shape (a, b) where
X[i,j]=any(A[:, i, j] < 0)
How to achieve this?
I would use an intermediate matrix and the sum(axis) method:
import numpy as np

np.random.seed(24)
# example matrix filled either with 0 or -1:
A = np.random.randint(2, size=(3, 2, 2)) - 1
# condition test:
X_elementwise = A < 0
# Check whether the condition is fulfilled at least once along axis 0:
X = X_elementwise.sum(axis=0) >= 1
Values for A and X:
A = array([[[-1,  0],
            [-1,  0]],

           [[ 0,  0],
            [ 0, -1]],

           [[ 0,  0],
            [-1,  0]]])
X = array([[ True, False],
           [ True,  True]])
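A possible alternative, not part of the answer above, is to reduce the Boolean mask with any along the first axis, which states the "at least one" condition directly. This is a minimal sketch that hard-codes the example values of A instead of relying on the random seed; X_via_sum is only there to compare against the sum-based approach.

```python
import numpy as np

# Example array of shape (n, a, b) = (3, 2, 2)
A = np.array([[[-1, 0], [-1, 0]],
              [[0, 0], [0, -1]],
              [[0, 0], [-1, 0]]])

# True wherever at least one of the n slices is negative at position (i, j)
X = (A < 0).any(axis=0)

# Same result via the sum-based approach
X_via_sum = (A < 0).sum(axis=0) >= 1

print(X)
```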

Row wise contain in numpy [duplicate]

I was wondering what the best way is to perform a row-wise containment check in NumPy.
Suppose we have a vector V = [1, 2, 3, 4] and a matrix M = [[2, 3, 4], [3, 5, 6], [4, 1, 3], [5, 4, 2]] (the number of rows in M equals the length of V). After the row-wise check, I should get (False, False, True, True), because 1 is not in [2, 3, 4], 2 is not in [3, 5, 6], 3 is in [4, 1, 3], and 4 is in [5, 4, 2].
What would be the best way to do this operation in NumPy?
I do not want to use a for loop. That would obviously work, but it is not the best way to do it. I came up with the idea of subtracting and then counting the zeros in the result, which is much faster than using for loops. However, I would like to know whether there is a better way.
What you're looking for is the in operator; e.g., 1 in [1, 2, 3] returns True.
So given your values of v and m as numpy arrays as follows:
import numpy as np
v = np.array([1,2,3,4])
m = np.array([[2, 3, 4], [3, 5, 6], [4, 1, 3], [5, 4, 2]])
# Checking row wise contain
result = [(v[i] in m[i]) for i in range(len(v))]
The result is:
>>> [(v[i] in m[i]) for i in range(len(v))]
[False, False, True, True]
Another solution as Divakar pointed out would be to use
>>> (m==v[:,None]).any(1)
array([False, False, True, True], dtype=bool)
However, doing some rough timing checks:
>>> import time
>>> start_time=time.time(); (m==v[:,None]).any(1); print(time.time()-start_time)
array([False, False, True, True], dtype=bool)
0.000586032867432
>>> start_time=time.time(); [(v[i] in m[i]) for i in range(len(v))]; print(time.time()-start_time)
[False, False, True, True]
7.20024108887e-05
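On this four-row example the list comprehension is faster, but a single time.time() measurement on such a small input mostly reflects fixed call overhead, so repeat the comparison on realistically sized arrays before choosing.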
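To compare the two approaches more fairly, one option is to time them with timeit on a larger input. The sketch below is only illustrative: the sizes, the seed, and the function names with_loop and with_broadcast are assumptions, and the timings will vary by machine, so no expected numbers are shown.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)           # arbitrary seed, for repeatability only
n_rows, n_cols = 10_000, 3               # hypothetical sizes
v = rng.integers(0, 10, size=n_rows)
m = rng.integers(0, 10, size=(n_rows, n_cols))

def with_loop():
    # One Python-level membership test per row
    return [v[i] in m[i] for i in range(len(v))]

def with_broadcast():
    # Compare every row against its own value of v, then reduce per row
    return (m == v[:, None]).any(axis=1)

loop_time = timeit.timeit(with_loop, number=5)
broadcast_time = timeit.timeit(with_broadcast, number=5)
print(f"loop: {loop_time:.4f}s  broadcast: {broadcast_time:.4f}s")
```

Both functions return the same truth values; only the amount of work done at the Python level differs.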

Perform matrix multiplication between two arrays and get result only on masked places

I have two dense matrices, A [200000,10], B [10,100000]. I need to multiply them to get matrix C. I can't do that directly, since the resulting matrix won't fit into the memory. Moreover, I need only a few elements from the resulting matrix, like 1-2% of the total number of elements. I have a third matrix W [200000,100000] which is sparse and has non-zero elements on exactly those places which are interesting to me in the matrix C.
Is there a way to use W as a "mask" so that the resulting matrix C will be sparse and will contain only the needed elements?
Since a matrix multiplication is just a table of dot products, we can just perform the specific dot products we need, in a vectorized fashion.
import numpy as np
from scipy import sparse
iX, iY = W.nonzero()  # row and column indices of the non-zero entries of W
values = np.sum(A[iX]*B[:, iY].T, axis=-1) #batched dot product
C = sparse.coo_matrix((values, (iX, iY)), shape=W.shape)
First, get the indices of the non-zero entries in W. The (i, j) element of the result is then the dot product of the i-th row of A with the j-th column of B. Store each result as a triple (i, j, value) instead of in a dense matrix; this coordinate layout is what coo_matrix uses to represent a sparse matrix.
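At the sizes in the question, 1-2% of 2×10^10 entries means a few hundred million selected positions, so gathering all the selected rows of A at once could itself be too large for memory. One way around this, sketched below, is to process the non-zero positions in chunks. The function name masked_matmul and the chunk_size parameter are hypothetical, and the small random inputs are only there to check the result against a dense product.

```python
import numpy as np
from scipy import sparse

def masked_matmul(A, B, W, chunk_size=100_000):
    """Compute A @ B only at the non-zero positions of the sparse mask W.

    masked_matmul and chunk_size are hypothetical names for this sketch.
    """
    rows, cols = W.nonzero()
    values = np.empty(len(rows), dtype=np.result_type(A, B))
    for start in range(0, len(rows), chunk_size):
        stop = start + chunk_size
        r, c = rows[start:stop], cols[start:stop]
        # Row-wise dot products: row r[k] of A with column c[k] of B
        values[start:stop] = np.einsum('ij,ij->i', A[r], B[:, c].T)
    return sparse.coo_matrix((values, (rows, cols)), shape=W.shape)

# Small check against the dense computation
rng = np.random.default_rng(0)
A = rng.integers(0, 10, size=(6, 4))
B = rng.integers(0, 10, size=(4, 5))
W = sparse.csr_matrix(rng.random((6, 5)) < 0.3)

C = masked_matmul(A, B, W, chunk_size=3)
print(C.toarray())
```

A smaller chunk_size lowers peak memory at the cost of more loop iterations.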
Here's one approach using np.einsum for a vectorized solution -
import numpy as np
from scipy import sparse
from scipy.sparse import coo_matrix
# Get row, col for the output array
r,c,_= sparse.find(W)
# Get the sum-reduction using valid rows and corresponding cols from A, B
out = np.einsum('ij,ji->i',A[r],B[:,c])
# Store as sparse matrix
out_sparse = coo_matrix((out, (r, c)), shape=W.shape)
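The subscript string 'ij,ji->i' can look cryptic, so here is a small sketch of what it computes. X and Y are stand-ins for A[r] and B[:, c] (random values, illustration only). For each i the expression sums X[i, j] * Y[j, i] over j, which is the diagonal of X @ Y obtained without forming the full product.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.integers(0, 10, size=(5, 3))   # stands in for A[r]
Y = rng.integers(0, 10, size=(3, 5))   # stands in for B[:, c]

# out[i] = sum_j X[i, j] * Y[j, i]
out = np.einsum('ij,ji->i', X, Y)

# Two equivalent formulations, for comparison
via_diagonal = np.diag(X @ Y)          # builds the full 5x5 product first
via_sum = (X * Y.T).sum(axis=1)        # elementwise multiply, then sum per row

print(out)
```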
Sample run -
1) Inputs :
In [168]: A
Out[168]:
array([[4, 6, 1, 1, 1],
[0, 8, 1, 3, 7],
[2, 8, 3, 2, 2],
[3, 4, 1, 6, 3]])
In [169]: B
Out[169]:
array([[5, 2, 4],
[2, 1, 3],
[7, 7, 2],
[5, 7, 5],
[8, 5, 0]])
In [176]: W
Out[176]:
<4x3 sparse matrix of type '<type 'numpy.bool_'>'
with 5 stored elements in Compressed Sparse Row format>
In [177]: W.toarray()
Out[177]:
array([[ True, False, False],
[False, False, False],
[ True, True, False],
[ True, False, True]], dtype=bool)
2) Using dense array to perform direct calculations and verify results later on :
In [171]: (A.dot(B))*W.toarray()
Out[171]:
array([[52, 0, 0],
[ 0, 0, 0],
[73, 57, 0],
[84, 0, 56]])
3) Use the proposed code and get sparse matrix output :
In [172]: # Using proposed code
...: r,c,_= sparse.find(W)
...: out = np.einsum('ij,ji->i',A[r],B[:,c])
...: out_sparse = coo_matrix((out, (r, c)), shape=W.shape)
...:
4) Finally, verify the results by converting to a dense array and checking against the direct version -
In [173]: out_sparse.toarray()
Out[173]:
array([[52, 0, 0],
[ 0, 0, 0],
[73, 57, 0],
[84, 0, 56]])

Creating matrix out of an array of categories in numpy

I have a length-n numpy array, y, of integers in the range [0...k-1]. From this, I would like to create an n-by-k numpy matrix M, where M[i,j] is 1 if y[i]==j, and 0 else.
What is the best way to do this in numpy?
Use broadcasting:
a = np.array([1, 2, 3, 1, 2, 2, 3, 0])
m = a[:, None] == np.arange(max(a)+1)
the result is:
array([[False, True, False, False],
[False, False, True, False],
[False, False, False, True],
[False, True, False, False],
[False, False, True, False],
[False, False, True, False],
[False, False, False, True],
[ True, False, False, False]], dtype=bool)
Or create a zero array and fill it, which I think is faster:
m2 = np.zeros((len(a), a.max()+1), dtype=bool)
m2[np.arange(len(a)), a] = True
print(m2)
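Another common idiom, not taken from the answers here, is to index the rows of an identity matrix with the category array: row j of the k-by-k identity is exactly the encoding of category j. A minimal sketch, assuming the categories are integers in [0, k-1] as in the question:

```python
import numpy as np

y = np.array([1, 2, 3, 1, 2, 2, 3, 0])
k = y.max() + 1                      # number of categories, assuming all of 0..k-1 may occur

# Row j of the identity matrix has a single 1 in column j,
# so fancy-indexing with y picks the matching row for each element.
M = np.eye(k, dtype=int)[y]

print(M)
```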
This is maybe a bit out there, but it's a pretty extensible solution and at least worth noting. If you've already got scikit-learn, the DictVectorizer class can transform categorical features in a dataset into column-wise binary representations, just like you described:
import numpy as np
from sklearn.feature_extraction import DictVectorizer
# starting with your numpy array
y = np.array([1, 2, 3, 1, 2, 2, 3, 0])
# transform the array to a list of dicts, with original
# int values now as strings, and a throw-away key ''
y_dict = [{'':str(x)} for x in y.tolist()]
# create the vectorizer and transform the list of dicts
vec = DictVectorizer(sparse=False, dtype=int)
M = vec.fit_transform(y_dict)
print(M)
[[0 1 0 0]
[0 0 1 0]
[0 0 0 1]
[0 1 0 0]
[0 0 1 0]
[0 0 1 0]
[0 0 0 1]
[1 0 0 0]]
Again, probably overkill but it's kind of cute and I thought I'd throw it out there.