Row-wise contains in numpy [duplicate]

I was wondering what the best way is to perform a row-wise "contains" check in NumPy.
Suppose we have a vector V = [1, 2, 3, 4] and a matrix M = [[2, 3, 4], [3, 5, 6], [4, 1, 3], [5, 4, 2]] (the number of rows in M equals the length of V). A row-wise contains check should give (False, False, True, True), because 1 is not in [2, 3, 4], 2 is not in [3, 5, 6], 3 is in [4, 1, 3], and 4 is in [5, 4, 2].
What would be the best way to do this operation in NumPy?
I would rather not use a for loop. It would obviously work, but it is not the best way to do it. I came up with the idea of doing a subtraction and then counting the zeros in the result, which is much faster than using for loops. However, I would like to know whether there is a better way.

What you're looking for is the in operator; for example, 1 in [1,2,3] returns True.
So, given your values of v and m as numpy arrays:
import numpy as np
v = np.array([1,2,3,4])
m = np.array([[2,3,4], [3,5,6], [4,1,3], [5,4,2]])
# Checking row wise contain
result = [(v[i] in m[i]) for i in range(len(v))]
The result is:
>>> [(v[i] in m[i]) for i in range(len(v))]
[False, False, True, True]
Another solution, as Divakar pointed out, is to compare m against v reshaped into a column and reduce along each row:
>>> (m==v[:,None]).any(1)
array([False, False, True, True], dtype=bool)
However, doing some rough timing checks:
>>> import time
>>> start_time=time.time(); (m==v[:,None]).any(1); print(time.time()-start_time)
array([False, False, True, True], dtype=bool)
0.000586032867432
>>> start_time=time.time(); [(v[i] in m[i]) for i in range(len(v))]; print(time.time()-start_time)
[False, False, True, True]
7.20024108887e-05
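On this tiny 4x3 input the list comprehension measured faster. A single time.time() measurement is noisy, though, and the comparison may well reverse on larger arrays, so it is worth re-timing on realistically sized data.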
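To compare the two approaches on something larger than a 4x3 example, a rough sketch with timeit could look like the following. The array sizes, the value range, and the helper names with_loop and with_broadcasting are assumptions made for illustration only, and the timings will differ from machine to machine.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)        # fixed seed, only for repeatability
n_rows, n_cols = 10_000, 50           # assumed sizes, chosen for illustration
m = rng.integers(0, 1000, size=(n_rows, n_cols))
v = rng.integers(0, 1000, size=n_rows)

def with_loop(v, m):
    # Python-level loop: one `in` test per row
    return np.array([v[i] in m[i] for i in range(len(v))])

def with_broadcasting(v, m):
    # v[:, None] has shape (n_rows, 1), so it is compared against every column of m
    return (m == v[:, None]).any(axis=1)

# Both approaches should agree before their speed is compared
same_result = np.array_equal(with_loop(v, m), with_broadcasting(v, m))

# Take the best of a few repeats to reduce timing noise
loop_time = min(timeit.repeat(lambda: with_loop(v, m), number=3, repeat=3))
bcast_time = min(timeit.repeat(lambda: with_broadcasting(v, m), number=3, repeat=3))
print(same_result, loop_time, bcast_time)
```

Taking the minimum over several repeats is a common way to reduce noise from other processes; no particular outcome is claimed here.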

Related

get elements in one array while not in other array along with axis 0 [duplicate]

I have two 2D numpy arrays, A and B.
I want to remove all the rows in A which also appear in B.
I tried something like this:
A[~np.isin(A, B)]
but isin keeps the dimensions of A, whereas I need one boolean value per row to filter it.
EDIT: something like this
A = np.array([[3, 0, 4],
[3, 1, 1],
[0, 5, 9]])
B = np.array([[1, 1, 1],
[3, 1, 1]])
.....
A = np.array([[3, 0, 4],
[0, 5, 9]])
This is probably not the most performant solution, but it does exactly what you want. You can view A and B with a structured dtype in which each element spans one whole row. The arrays need to be contiguous first, which np.ascontiguousarray ensures:
Av = np.ascontiguousarray(A).view(np.dtype([('', A.dtype, A.shape[1])])).ravel()
Bv = np.ascontiguousarray(B).view(Av.dtype).ravel()
Now you can apply np.isin directly:
>>> np.isin(Av, Bv)
array([False, True, False])
According to the docs, invert=True is faster than negating the output of isin, so you can do
A[np.isin(Av, Bv, invert=True)]
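Try the following, which uses a matrix-vector product to collapse each row to a single number. Note that this encoding is not guaranteed to be unique: with the weights [4, 6, 10] used below, the rows [0, 5, 0] and [0, 0, 3] would both map to 30, so different rows can be treated as equal. It gives the right answer for this example, but treat it as a heuristic: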
import numpy as np
A = np.array([[3, 0, 4],
[3, 1, 1],
[0, 5, 9]])
B = np.array([[1, 1, 1],
[3, 1, 1]])
arr_max = np.maximum(A.max(0) + 1, B.max(0) + 1)
print (A[~np.isin(A.dot(arr_max), B.dot(arr_max))])
Output:
[[3 0 4]
[0 5 9]]
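A collision-free variant of the same row-encoding idea can be sketched with np.ravel_multi_index, which turns each row into its flat index in a grid whose side lengths are the per-column maxima plus one. This sketch assumes all entries are non-negative integers and that the product of the grid dimensions fits in a 64-bit integer; the names dims, keys_A and keys_B are illustrative.

```python
import numpy as np

A = np.array([[3, 0, 4],
              [3, 1, 1],
              [0, 5, 9]])
B = np.array([[1, 1, 1],
              [3, 1, 1]])

# Assumption: all entries are non-negative integers.
# Per-column grid size: one more than the largest value seen in that column.
dims = tuple((np.maximum(A.max(axis=0), B.max(axis=0)) + 1).tolist())

# Each row becomes its flat index in a grid of shape `dims`;
# distinct rows always get distinct indices.
keys_A = np.ravel_multi_index(tuple(A.T), dims)
keys_B = np.ravel_multi_index(tuple(B.T), dims)

result = A[~np.isin(keys_A, keys_B)]
print(result)
```

For wide arrays or large values the flat index can overflow, in which case a structured-dtype view or a direct row comparison is the safer choice.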
This is certainly not the most performant solution but it is relatively easy to read:
A = np.array([row for row in A if row not in B])
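Edit:
I found that the code above does not work correctly (for NumPy arrays, row in B is true as soon as any single element matches, not only when a whole row matches), but this does: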
A = [row for row in A if not any(np.equal(B, row).all(1))]
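The same row-by-row comparison can also be written without a Python loop by broadcasting A against B. The following is a minimal sketch rather than a tuned implementation; it builds an intermediate boolean array of shape (rows of A, rows of B, columns), so it assumes both arrays are small enough for that to fit in memory. The names row_matches, in_B and filtered are illustrative.

```python
import numpy as np

A = np.array([[3, 0, 4],
              [3, 1, 1],
              [0, 5, 9]])
B = np.array([[1, 1, 1],
              [3, 1, 1]])

# Shapes (n_A, 1, k) and (1, n_B, k) broadcast to (n_A, n_B, k):
# every row of A is compared element-wise with every row of B.
row_matches = (A[:, None, :] == B[None, :, :]).all(axis=-1)   # shape (n_A, n_B)

# A row of A is "in B" if it fully matches at least one row of B
in_B = row_matches.any(axis=1)

filtered = A[~in_B]
print(filtered)
```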

How do you concatenate several 2D arrays in numpy?

I would like
np.concatenate((np.array([[5,5],[2,3]]),np.array([[6,4],[7,8]])))
to yield
[ [[5,5],[2,3]], [[6,4],[7,8]] ]
Concatenate doesn't do the trick, but I am lost on how else to do it!
You can use numpy.stack() or numpy.append() (I suggest append if you have a large code base). Just note that this is NumPy's append, not the built-in append method of Python lists.
>>> import numpy as np
>>> a = np.array([[5,5],[2,3]])
>>> b = np.array([[6,4],[7,8]])
>>> c = np.append([a], [b], axis = 0)
>>> c
# answer:
array([[[5, 5],
[2, 3]],
[[6, 4],
[7, 8]]])
Now, if we go with np.stack() instead:
>>> d = np.stack((a,b))
>>> c == d
# answer:
array([[[ True, True],
[ True, True]],
[[ True, True],
[ True, True]]])
As you can see, they are the same.
See the NumPy documentation for numpy.append and numpy.stack for details.
For anyone wondering, np.stack((a,b)) does the trick :)
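The difference between the two functions shows up in the result shapes: np.concatenate joins along an existing axis, while np.stack creates a new one. A short sketch using the arrays from the question (the variable names joined, stacked and also_stacked are illustrative):

```python
import numpy as np

a = np.array([[5, 5], [2, 3]])
b = np.array([[6, 4], [7, 8]])

# concatenate joins along the existing axis 0: two (2, 2) arrays -> (4, 2)
joined = np.concatenate((a, b))

# stack inserts a new leading axis: two (2, 2) arrays -> (2, 2, 2)
stacked = np.stack((a, b))

# Adding the new axis by hand and then concatenating gives the same result
also_stacked = np.concatenate((a[None], b[None]))

print(joined.shape, stacked.shape, np.array_equal(stacked, also_stacked))
```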

Numpy broadcasting inequality operator

Numpy broadcasting question. I have two arrays similar to these:
>my_array = np.array([[3,1,2,0] , [4,5,2,1]])
>my_array
array([[3, 1, 2, 0],
[4, 5, 2, 1]])
>second_array = np.array([2,5])
>second_array
array([2, 5])
What I want to do is transpose second_array and test, by column, whether my_array is >= second_array. So the result would be like this:
>final_array = np.array([ [ (3 >= 2), (1>= 2), (2>=2), (0>=2)] , [(4 >=5),(5>=5),(2>=5),(1>=5)]])
>final_array
array([[ True, False, True, False],
[False, True, False, False]], dtype=bool)
I'm pretty new to matrix operations in Numpy (been doing them in R for a long time) so thanks for helping with such an introductory question.
You just need to reshape second_array so that it has appropriate dimensions:
my_array >= second_array.reshape(2,1) # or (-1,1) if height is unknown
Or equivalently:
my_array >= second_array[:,np.newaxis]
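To see why the reshape is needed, it can help to look at the shapes involved. In this small sketch (the name column is illustrative), second_array becomes a (2, 1) column, which broadcasts across the four columns of my_array so that each row is compared with its own threshold:

```python
import numpy as np

my_array = np.array([[3, 1, 2, 0],
                     [4, 5, 2, 1]])
second_array = np.array([2, 5])

# Shape (2,) -> (2, 1): one threshold per row of my_array
column = second_array[:, np.newaxis]

# (2, 4) compared with (2, 1) broadcasts to (2, 4)
final_array = my_array >= column

print(my_array.shape, column.shape, final_array.shape)
print(final_array)
```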

How to understand the np.argwhere function?

Signature: np.argwhere(a)
Docstring:
Find the indices of array elements that are non-zero, grouped by element.
Examples
>>> x = np.arange(6).reshape(2,3)
>>> x
array([[0, 1, 2],
[3, 4, 5]])
>>> np.argwhere(x>1)
array([[0, 2],
[1, 0],
[1, 1],
[1, 2]])
What does it mean by 'non-zero' and 'grouped by element'? And what is "x>1"?
Each row of the result describes one entry of x that satisfies the condition: the first value is its row index and the second value is its column index.
For example, 2 is greater than 1, so the first row of the argwhere result is [0, 2], pointing to the position of 2 in x.
Find the indices (positions) of array elements that are non-zero (true), grouped by element (each index is its own row).
Basically, if you pass a boolean array, you will find the indices where that array is true, but transposed so that the indices in the form [[x1, x2, ...], [y1, y2, ...]] become in the form [[x1, y1], [x2, y2], ...].
x > 1 is a boolean array which is True wherever x > 1 and False wherever x <= 1. In your example, it looks like
[[False, False, True],
[True, True, True]]
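The relationship between the two index layouts can be checked directly. A small sketch (the variable names mask, rows, cols and pairs are illustrative): np.nonzero returns one index array per dimension, and np.argwhere returns the same indices transposed, with one (row, column) pair per matching element.

```python
import numpy as np

x = np.arange(6).reshape(2, 3)
mask = x > 1                      # boolean array, True where the condition holds

# One array of row indices and one array of column indices
rows, cols = np.nonzero(mask)

# One [row, col] pair per True entry
pairs = np.argwhere(mask)

# argwhere is the transposed form of nonzero
same = np.array_equal(pairs, np.transpose((rows, cols)))

# The nonzero form can be used directly for indexing
selected = x[rows, cols]
print(pairs, same, selected)
```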

Perform matrix multiplication between two arrays and get result only on masked places

I have two dense matrices, A [200000,10] and B [10,100000]. I need to multiply them to get a matrix C, but I can't do that directly because the result won't fit in memory. Moreover, I only need a few elements of the result, about 1-2% of the total. I have a third matrix W [200000,100000], which is sparse and has non-zero elements exactly at the positions of C that interest me.
Is there a way to use W as a "mask" so that the resulting matrix C will be sparse and will contain only the needed elements?
Since a matrix multiplication is just a table of dot products, we can just perform the specific dot products we need, in a vectorized fashion.
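import numpy as np
from scipy import sparse
# W is assumed to be a scipy.sparse matrix; nonzero() returns the row and column indices of its stored non-zeros
iX, iY = W.nonzero()
values = np.sum(A[iX]*B[:, iY].T, axis=-1) #batched dot product
# coo_matrix takes (data, (row, col)); pass the shape so C matches W
C = sparse.coo_matrix((values, (iX, iY)), shape=W.shape)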
First, get the indices of the non-zero entries of W. The (i, j) element of the result is then the dot product of the i-th row of A with the j-th column of B. Store each result as an (i, j, value) triplet rather than in a dense matrix, which is how the COO sparse format represents a matrix.
Here's one approach using np.einsum for a vectorized solution -
from scipy import sparse
from scipy.sparse import coo_matrix
# Get row, col for the output array
r,c,_= sparse.find(W)
# Get the sum-reduction using valid rows and corresponding cols from A, B
out = np.einsum('ij,ji->i',A[r],B[:,c])
# Store as sparse matrix
out_sparse = coo_matrix((out, (r, c)), shape=W.shape)
Sample run -
1) Inputs :
In [168]: A
Out[168]:
array([[4, 6, 1, 1, 1],
[0, 8, 1, 3, 7],
[2, 8, 3, 2, 2],
[3, 4, 1, 6, 3]])
In [169]: B
Out[169]:
array([[5, 2, 4],
[2, 1, 3],
[7, 7, 2],
[5, 7, 5],
[8, 5, 0]])
In [176]: W
Out[176]:
<4x3 sparse matrix of type '<type 'numpy.bool_'>'
with 5 stored elements in Compressed Sparse Row format>
In [177]: W.toarray()
Out[177]:
array([[ True, False, False],
[False, False, False],
[ True, True, False],
[ True, False, True]], dtype=bool)
2) Using dense array to perform direct calculations and verify results later on :
In [171]: (A.dot(B))*W.toarray()
Out[171]:
array([[52, 0, 0],
[ 0, 0, 0],
[73, 57, 0],
[84, 0, 56]])
3) Use the proposed codes and get sparse matrix output :
In [172]: # Using proposed codes
...: r,c,_= sparse.find(W)
...: out = np.einsum('ij,ji->i',A[r],B[:,c])
...: out_sparse = coo_matrix((out, (r, c)), shape=W.shape)
...:
4) Finally verify results by converting to dense/array version and checking against direct version -
In [173]: out_sparse.toarray()
Out[173]:
array([[52, 0, 0],
[ 0, 0, 0],
[73, 57, 0],
[84, 0, 56]])
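With the sizes in the question, the intermediate arrays A[r] and B[:, c] can themselves be large, since each holds one length-10 vector per selected element. One possible way to bound that, sketched below, is to process the (row, column) pairs in chunks. The function name masked_matmul and the chunk_size parameter are hypothetical, and the default chunk size is an arbitrary assumption rather than a recommendation.

```python
import numpy as np
from scipy import sparse

def masked_matmul(A, B, W, chunk_size=1_000_000):
    """Compute entries of A @ B only where the sparse matrix W is non-zero.

    chunk_size (hypothetical tuning knob) limits how many dot products
    are materialised at once.
    """
    r, c, _ = sparse.find(W)                      # rows/cols of non-zero entries
    out = np.empty(len(r), dtype=np.result_type(A, B))
    for start in range(0, len(r), chunk_size):
        stop = start + chunk_size
        # Row-wise dot products between selected rows of A and columns of B
        out[start:stop] = np.einsum('ij,ij->i',
                                    A[r[start:stop]],
                                    B[:, c[start:stop]].T)
    return sparse.coo_matrix((out, (r, c)), shape=W.shape)

# Same inputs as in the sample run above
A = np.array([[4, 6, 1, 1, 1],
              [0, 8, 1, 3, 7],
              [2, 8, 3, 2, 2],
              [3, 4, 1, 6, 3]])
B = np.array([[5, 2, 4],
              [2, 1, 3],
              [7, 7, 2],
              [5, 7, 5],
              [8, 5, 0]])
W = sparse.csr_matrix(np.array([[True, False, False],
                                [False, False, False],
                                [True, True, False],
                                [True, False, True]]))

C = masked_matmul(A, B, W, chunk_size=2)          # tiny chunks to exercise the loop
print(C.toarray())
```

The result should agree with the dense check (A.dot(B))*W.toarray() from the sample run; only the peak memory use changes with chunk_size.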