Row wise contain in numpy [duplicate]

This question already has an answer here:
Numpy element-wise in operation
(1 answer)
Closed 5 years ago.
What is the best way to do a row-wise containment check in NumPy?
Suppose we have a vector V = [1, 2, 3, 4] and a matrix M = [[2, 3, 4], [3, 5, 6], [4, 1, 3], [5, 4, 2]] (the number of rows in M equals the length of V). After the row-wise containment check I should get (False, False, True, True), because 1 is not in [2, 3, 4], 2 is not in [3, 5, 6], 3 is in [4, 1, 3], and 4 is in [5, 4, 2].
What would be the best way to do this in NumPy?
I do not want to use a for loop. That would work, but it is not the best way to do it. I came up with the idea of subtracting and then counting the zeros in the result, which is much faster than using for loops. However, I wanted to know whether there is a better way to do it.
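The subtraction idea could look roughly like the following sketch. This is a reconstruction, since the question does not show the actual code; it uses broadcasting so that V[i] is subtracted from every element of row i, and a zero in a row means the value is present.

```python
import numpy as np

V = np.array([1, 2, 3, 4])
M = np.array([[2, 3, 4], [3, 5, 6], [4, 1, 3], [5, 4, 2]])

# V[:, None] has shape (4, 1), so it broadcasts across the 3 columns of M:
# row i of diff is M[i] - V[i].
diff = M - V[:, None]

# A zero in row i means V[i] occurs in M[i]; count the zeros per row.
zeros_per_row = np.count_nonzero(diff == 0, axis=1)
result = zeros_per_row > 0
print(result)  # [False False  True  True]
```

Note that diff == 0 is the same mask as M == V[:, None], so the subtraction step can be skipped entirely.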

What you're looking for is the in operator. For example, 1 in [1,2,3] returns True.
So, given your v and m as NumPy arrays:
import numpy as np
v = np.array([1,2,3,4])
m = np.array([[2,3,4], [3,5,6], [4,1,3], [5,4,2]])
# Row-wise containment check: is v[i] one of the elements of m[i]?
result = [(v[i] in m[i]) for i in range(len(v))]
The result is:
>>> [(v[i] in m[i]) for i in range(len(v))]
[False, False, True, True]
Another solution, as Divakar pointed out, is to compare with broadcasting and reduce along each row:
>>> (m==v[:,None]).any(1)
array([False, False, True, True], dtype=bool)
However, doing some rough timing checks:
>>> import time
>>> start_time=time.time(); (m==v[:,None]).any(1); print(time.time()-start_time)
array([False, False, True, True], dtype=bool)
0.000586032867432
>>> start_time=time.time(); [(v[i] in m[i]) for i in range(len(v))]; print(time.time()-start_time)
[False, False, True, True]
7.20024108887e-05
For this tiny 4-row input, the initial solution seems to be faster. A single time.time() measurement is rough, though, so it is worth timing realistic input sizes before drawing conclusions.
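To compare the two approaches on a larger input, a sketch using timeit might look like this. The array sizes and random data are arbitrary assumptions, and absolute numbers depend on your machine and NumPy version, so no timings are shown here.

```python
import timeit
import numpy as np

# Hypothetical larger input: 20,000 rows of 3 small integers each.
rng = np.random.default_rng(0)
m = rng.integers(0, 10, size=(20_000, 3))
v = rng.integers(0, 10, size=20_000)

def broadcast_any():
    # Compare each v[i] against its row m[i], then reduce along the row.
    return (m == v[:, None]).any(axis=1)

def list_comprehension():
    # One Python-level `in` check per row.
    return np.array([v[i] in m[i] for i in range(len(v))])

# Both approaches must agree before their timings are worth comparing.
assert np.array_equal(broadcast_any(), list_comprehension())

for fn in (broadcast_any, list_comprehension):
    # Best of 3 repeats of 3 calls each, to reduce timer noise.
    best = min(timeit.repeat(fn, number=3, repeat=3)) / 3
    print(f"{fn.__name__}: {best * 1e3:.2f} ms per call")
```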

Related

get elements in one array while not in other array along with axis 0 [duplicate]

I have two 2D NumPy arrays, A and B.
I want to remove all the rows in A that appear in B.
I tried something like this:
A[~np.isin(A, B)]
but isin keeps the dimensions of A, whereas I need one boolean value per row to filter it.
EDIT: for example:
A = np.array([[3, 0, 4],
[3, 1, 1],
[0, 5, 9]])
B = np.array([[1, 1, 1],
[3, 1, 1]])
.....
A = np.array([[3, 0, 4],
[0, 5, 9]])

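This is probably not the most performant solution, but it does exactly what you want. You can view A and B through a structured dtype whose single field holds one whole row, so that each row becomes a single element. The arrays must be contiguous first, e.g. via ascontiguousarray (this also assumes A and B have the same dtype and the same number of columns):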
Av = np.ascontiguousarray(A).view(np.dtype([('', A.dtype, A.shape[1])])).ravel()
Bv = np.ascontiguousarray(B).view(Av.dtype).ravel()
Now you can apply np.isin directly:
>>> np.isin(Av, Bv)
array([False, True, False])
According to the docs, invert=True is faster than negating the output of isin, so you can do
A[np.isin(Av, Bv, invert=True)]
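Putting the pieces together on the question's example, a self-contained sketch might look like this. It assumes, as the view trick requires, that A and B share the same dtype and number of columns; the name row_dtype is just an illustrative label.

```python
import numpy as np

A = np.array([[3, 0, 4],
              [3, 1, 1],
              [0, 5, 9]])
B = np.array([[1, 1, 1],
              [3, 1, 1]])

# Structured dtype with one field that holds a whole row of A.
# Assumes A and B share the same dtype and number of columns.
row_dtype = np.dtype([('', A.dtype, A.shape[1])])

# Each row becomes a single element of a 1-D array.
Av = np.ascontiguousarray(A).view(row_dtype).ravel()
Bv = np.ascontiguousarray(B).view(row_dtype).ravel()

# Keep the rows of A whose row-element does not occur in B.
result = A[np.isin(Av, Bv, invert=True)]
print(result)
# [[3 0 4]
#  [0 5 9]]
```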

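Try the following. It uses matrix multiplication for dimensionality reduction: each row is encoded as a single mixed-radix integer, so rows can be compared with a 1-D isin. This assumes non-negative integer entries and that the encoded values do not overflow the integer dtype:
import numpy as np
A = np.array([[3, 0, 4],
[3, 1, 1],
[0, 5, 9]])
B = np.array([[1, 1, 1],
[3, 1, 1]])
# Radix for each column: one more than the largest value in that column
arr_max = np.maximum(A.max(0) + 1, B.max(0) + 1)
# Place values 1, r0, r0*r1, ... so that distinct rows map to distinct integers
weights = np.concatenate(([1], np.cumprod(arr_max[:-1])))
print(A[~np.isin(A.dot(weights), B.dot(weights))])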
Output:
[[3 0 4]
[0 5 9]]
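Why the weights matter: if each column is simply weighted by its own max + 1, distinct rows can map to the same number. A small check with two hypothetical rows (values within the example's column ranges, where the per-column radix is 4, 6, 10) shows the collision and how cumulative-product (mixed-radix) weights avoid it:

```python
import numpy as np

# Per-column radix (max + 1) for the example's A and B.
radix = np.array([4, 6, 10])
# Hypothetical rows chosen to expose a collision.
row_a = np.array([3, 0, 0])
row_b = np.array([0, 2, 0])

# Using the radices directly as weights: 3*4 == 2*6, so the rows collide.
print(row_a.dot(radix), row_b.dot(radix))  # 12 12

# Mixed-radix place values 1, 4, 4*6 keep distinct rows distinct.
weights = np.concatenate(([1], np.cumprod(radix[:-1])))
print(row_a.dot(weights), row_b.dot(weights))  # 3 8
```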

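This is certainly not the most performant solution, but it is relatively easy to read:
A = np.array([row for row in A if row not in B])
Edit:
I found that the code above does not work correctly (for NumPy arrays, row in B evaluates (B == row).any(), which is True as soon as any element of B matches the element of row in the same column), but this does:
A = np.array([row for row in A if not any(np.equal(B, row).all(1))])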
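A quick demonstration of that pitfall with the question's data: the first row of A is not a row of B, yet NumPy's in reports it as present because one element happens to match.

```python
import numpy as np

A = np.array([[3, 0, 4],
              [3, 1, 1],
              [0, 5, 9]])
B = np.array([[1, 1, 1],
              [3, 1, 1]])

row = A[0]  # [3, 0, 4], which is not a row of B
# `row in B` evaluates (B == row).any(); B[1, 0] == 3 matches, so this is True.
print(row in B)  # True

# Requiring a full-row match with .all(1) gives the intended answer.
print(any(np.equal(B, row).all(1)))  # False

kept = np.array([r for r in A if not any(np.equal(B, r).all(1))])
print(kept.tolist())  # [[3, 0, 4], [0, 5, 9]]
```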

How do you concatenate several 2D arrays in numpy?

I would like
np.concatenate((np.array([[5,5],[2,3]]),np.array([[6,4],[7,8]])))
to yield
[ [[5,5],[2,3]], [[6,4],[7,8]] ]
Concatenate doesn't do the trick, but I am lost on how else to do it!

You can use numpy.stack() or numpy.append() (I suggest append if you have a large codebase). Just note that this is NumPy's append, not Python's built-in list append.
>>> import numpy as np
>>> a = np.array([[5,5],[2,3]])
>>> b = np.array([[6,4],[7,8]])
>>> c = np.append([a], [b], axis = 0)
>>> c
# answer:
array([[[5, 5],
[2, 3]],
[[6, 4],
[7, 8]]])
Now, with np.stack():
>>> d = np.stack((a,b))
>>> c == d
# answer:
array([[[ True, True],
[ True, True]],
[[ True, True],
[ True, True]]])
As you can see, they are the same.
See the NumPy reference documentation for numpy.append and numpy.vstack.

For anyone wondering: np.stack((a, b)) does the trick.
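Why concatenate alone does not work, and how stack relates to it: concatenate joins along an existing axis, while stack first adds a new axis to each input. A short sketch with the question's arrays:

```python
import numpy as np

a = np.array([[5, 5], [2, 3]])
b = np.array([[6, 4], [7, 8]])

# concatenate joins along an existing axis (axis 0), so the result is 4x2.
print(np.concatenate((a, b)).shape)  # (4, 2)

# stack adds a new leading axis to each input, then joins along it.
stacked = np.stack((a, b))
print(stacked.shape)  # (2, 2, 2)

# Equivalent: add the new axis yourself, then concatenate.
manual = np.concatenate((a[np.newaxis], b[np.newaxis]))
print(np.array_equal(stacked, manual))  # True
```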

Numpy broadcasting inequality operator

Numpy broadcasting question. I have two arrays similar to these:
>my_array = np.array([[3,1,2,0] , [4,5,2,1]])
>my_array
array([[3, 1, 2, 0],
[4, 5, 2, 1]])
>second_array = np.array([2,5])
>second_array
array([2, 5])
What I want to do is treat second_array as a column (i.e. transpose it) and test whether my_array >= second_array, comparing each row of my_array with the corresponding element of second_array. So the result would be like this:
>final_array = np.array([ [ (3 >= 2), (1>= 2), (2>=2), (0>=2)] , [(4 >=5),(5>=5),(2>=5),(1>=5)]])
>final_array
array([[ True, False, True, False],
[False, True, False, False]], dtype=bool)
I'm pretty new to matrix operations in Numpy (been doing them in R for a long time) so thanks for helping with such an introductory question.

You just need to reshape second_array so that it has appropriate dimensions:
my_array >= second_array.reshape(2,1) # or (-1,1) if height is unknown
Or equivalently:
my_array >= second_array[:,np.newaxis]
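A sketch of why transposing alone does not help and why the reshape does: a 1-D array has no second axis, so .T leaves it unchanged, and broadcasting aligns shapes from the right.

```python
import numpy as np

my_array = np.array([[3, 1, 2, 0], [4, 5, 2, 1]])
second_array = np.array([2, 5])

# Transposing a 1-D array is a no-op, so .T alone cannot make a column.
print(second_array.T.shape)  # (2,)

# Shapes (2, 4) and (2,) are aligned from the right: 4 vs 2 is incompatible.
raised = False
try:
    my_array >= second_array
except ValueError as exc:
    raised = True
    print("broadcast error:", exc)

# A (2, 1) column broadcasts across the 4 columns of my_array.
result = my_array >= second_array[:, np.newaxis]
print(result)
```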

How to understand the np.argwhere function?

Signature: np.argwhere(a)
Docstring:
Find the indices of array elements that are non-zero, grouped by element.
Examples
>>> x = np.arange(6).reshape(2,3)
>>> x
array([[0, 1, 2],
[3, 4, 5]])
>>> np.argwhere(x>1)
array([[0, 2],
[1, 0],
[1, 1],
[1, 2]])
What does it mean by 'non-zero' and 'grouped by element'? and what is "x>1"?

In each row, the first entry is the row index and the second entry is the column index of an entry of x that satisfies the condition.
For example, 2 is greater than 1, so the first row of argwhere gives you [0, 2], pointing to the position of 2 in x.
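To see that each row of the result really is a position in x, the values can be looked up again, either one pair at a time or all at once by indexing with a tuple of per-axis index arrays:

```python
import numpy as np

x = np.arange(6).reshape(2, 3)
idx = np.argwhere(x > 1)  # one (row, column) pair per matching element

# Each row of idx is a position in x; look the values up one at a time.
for row, col in idx:
    print(f"x[{row}, {col}] = {x[row, col]}")

# Or all at once: transpose to one index array per axis, then index with a tuple.
values = x[tuple(idx.T)]
print(values)  # [2 3 4 5]
```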

Find the indices (positions) of array elements that are non-zero (true), grouped by element (each index is its own row).
Basically, if you pass a boolean array, you will find the indices where that array is True, but transposed so that indices in the form [[x1, x2, ...], [y1, y2, ...]] become [[x1, y1], [x2, y2], ...].
x > 1 is a boolean array which is True wherever x > 1 and False wherever x <= 1. In your example, it looks like
[[False, False, True],
[True, True, True]]
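The "transposed" relationship can be checked directly against np.nonzero, which returns the per-axis form [[x1, x2, ...], [y1, y2, ...]]. The last line also shows what "non-zero" means for a plain numeric array:

```python
import numpy as np

x = np.arange(6).reshape(2, 3)
mask = x > 1  # boolean array, True where the condition holds

# nonzero returns one index array per axis: (rows, cols)
rows, cols = np.nonzero(mask)
print(rows, cols)  # [0 1 1 1] [2 0 1 2]

# argwhere returns one (row, col) pair per True element: the transpose.
pairs = np.argwhere(mask)
print(np.array_equal(pairs, np.transpose(np.nonzero(mask))))  # True

# "Non-zero" also applies to numeric input: here only x[0, 0] is zero.
print(np.argwhere(x).tolist())  # [[0, 1], [0, 2], [1, 0], [1, 1], [1, 2]]
```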

Perform matrix multiplication between two arrays and get result only on masked places

I have two dense matrices, A [200000,10], B [10,100000]. I need to multiply them to get matrix C. I can't do that directly, since the resulting matrix won't fit into the memory. Moreover, I need only a few elements from the resulting matrix, like 1-2% of the total number of elements. I have a third matrix W [200000,100000] which is sparse and has non-zero elements on exactly those places which are interesting to me in the matrix C.
Is there a way to use W as a "mask" so that the resulting matrix C will be sparse and will contain only the needed elements?

Since a matrix multiplication is just a table of dot products, we can just perform the specific dot products we need, in a vectorized fashion.
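import numpy as np
from scipy import sparse
# Row and column indices of the non-zero (wanted) entries of W
iX, iY = W.nonzero()
values = np.sum(A[iX]*B[:, iY].T, axis=-1) #batched dot product: row iX[k] of A with column iY[k] of B
C = sparse.coo_matrix((values, (iX, iY)), shape=W.shape)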
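One caveat at the sizes in the question: 1-2% of 200000 x 100000 is a few hundred million wanted entries, so A[iX] alone would hold that many rows of 10 values and may itself not fit in memory. A sketch that applies the same batched dot product in chunks; the function name masked_matmul and the chunk_size default are hypothetical choices, not part of any library:

```python
import numpy as np
from scipy import sparse

def masked_matmul(A, B, W, chunk_size=1_000_000):
    """Compute (A @ B) only at the non-zero positions of sparse W.

    chunk_size (an arbitrary choice) bounds the temporary
    (chunk_size, k) arrays gathered from A and B.
    """
    rows, cols = W.nonzero()
    values = np.empty(len(rows), dtype=np.result_type(A, B))
    for start in range(0, len(rows), chunk_size):
        r = rows[start:start + chunk_size]
        c = cols[start:start + chunk_size]
        # Batched dot product: row r[k] of A with column c[k] of B.
        values[start:start + chunk_size] = np.sum(A[r] * B[:, c].T, axis=-1)
    return sparse.coo_matrix((values, (rows, cols)), shape=W.shape)

# Small check against the dense product on hypothetical random data.
rng = np.random.default_rng(0)
A = rng.random((50, 10))
B = rng.random((10, 40))
W = sparse.csr_matrix(rng.random((50, 40)) < 0.05)
C = masked_matmul(A, B, W, chunk_size=7)
dense = (A @ B) * W.toarray()
print(np.allclose(C.toarray(), dense))  # True
```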

First, get the indices of the non-zero entries of W. Then you can get the (i, j) element of the result matrix by multiplying the i-th row of A with the j-th column of B, and store the result as a tuple (i, j, res) instead of in a dense matrix (this is how sparse matrices are typically stored, e.g. in the coordinate (COO) format).
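A literal sketch of that description, building one (i, j, res) triple per wanted entry and then a COO matrix from them. The helper name masked_matmul_triples is hypothetical; the loop runs in Python, so it only illustrates the idea and would be far too slow for hundreds of millions of entries. It is checked here against the sample data from the einsum answer:

```python
import numpy as np
from scipy import sparse

def masked_matmul_triples(A, B, W):
    rows, cols = W.nonzero()
    # One (i, j, value) triple per wanted entry: row i of A dotted with column j of B.
    triples = [(i, j, A[i] @ B[:, j]) for i, j in zip(rows, cols)]
    vals = np.array([t[2] for t in triples], dtype=np.result_type(A, B))
    return triples, sparse.coo_matrix((vals, (rows, cols)), shape=W.shape)

A = np.array([[4, 6, 1, 1, 1],
              [0, 8, 1, 3, 7],
              [2, 8, 3, 2, 2],
              [3, 4, 1, 6, 3]])
B = np.array([[5, 2, 4],
              [2, 1, 3],
              [7, 7, 2],
              [5, 7, 5],
              [8, 5, 0]])
W = sparse.csr_matrix(np.array([[True, False, False],
                                [False, False, False],
                                [True, True, False],
                                [True, False, True]]))

triples, C = masked_matmul_triples(A, B, W)
print(C.toarray())
```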

Here's one approach using np.einsum for a vectorized solution:
import numpy as np
from scipy import sparse
from scipy.sparse import coo_matrix
# Get row, col for the output array
r,c,_= sparse.find(W)
# Get the sum-reduction using valid rows and corresponding cols from A, B
out = np.einsum('ij,ji->i',A[r],B[:,c])
# Store as sparse matrix
out_sparse = coo_matrix((out, (r, c)), shape=W.shape)
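If the einsum subscripts look opaque: 'ij,ji->i' multiplies row i of the first operand with column i of the second and sums over j, i.e. one dot product per selected (r, c) pair. A small check on hypothetical random data against two more explicit spellings:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.integers(0, 10, size=(5, 4))  # stands in for A[r]: one row per wanted entry
Y = rng.integers(0, 10, size=(4, 5))  # stands in for B[:, c]: one column per wanted entry

# 'ij,ji->i': out[i] = sum over j of X[i, j] * Y[j, i]
out = np.einsum('ij,ji->i', X, Y)

# Same values as an element-wise product summed per row...
print(np.array_equal(out, (X * Y.T).sum(axis=1)))  # True
# ...and as the diagonal of X @ Y, which einsum computes without forming the full product.
print(np.array_equal(out, np.diag(X @ Y)))  # True
```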
Sample run -
1) Inputs :
In [168]: A
Out[168]:
array([[4, 6, 1, 1, 1],
[0, 8, 1, 3, 7],
[2, 8, 3, 2, 2],
[3, 4, 1, 6, 3]])
In [169]: B
Out[169]:
array([[5, 2, 4],
[2, 1, 3],
[7, 7, 2],
[5, 7, 5],
[8, 5, 0]])
In [176]: W
Out[176]:
<4x3 sparse matrix of type '<type 'numpy.bool_'>'
with 5 stored elements in Compressed Sparse Row format>
In [177]: W.toarray()
Out[177]:
array([[ True, False, False],
[False, False, False],
[ True, True, False],
[ True, False, True]], dtype=bool)
2) Using dense array to perform direct calculations and verify results later on :
In [171]: (A.dot(B))*W.toarray()
Out[171]:
array([[52, 0, 0],
[ 0, 0, 0],
[73, 57, 0],
[84, 0, 56]])
3) Use the proposed codes and get sparse matrix output :
In [172]: # Using proposed codes
...: r,c,_= sparse.find(W)
...: out = np.einsum('ij,ji->i',A[r],B[:,c])
...: out_sparse = coo_matrix((out, (r, c)), shape=W.shape)
...:
4) Finally verify results by converting to dense/array version and checking against direct version -
In [173]: out_sparse.toarray()
Out[173]:
array([[52, 0, 0],
[ 0, 0, 0],
[73, 57, 0],
[84, 0, 56]])
