How to understand the np.argwhere function? - numpy

Signature: np.argwhere(a)
Docstring:
Find the indices of array elements that are non-zero, grouped by element.
Examples
>>> x = np.arange(6).reshape(2,3)
>>> x
array([[0, 1, 2],
[3, 4, 5]])
>>> np.argwhere(x>1)
array([[0, 2],
[1, 0],
[1, 1],
[1, 2]])
What do 'non-zero' and 'grouped by element' mean here? And what is "x>1"?

Each row of the result describes one entry of x that satisfies the condition: the first value is its row index and the second value is its column index.
For example:
2 is greater than 1
so the first row of argwhere gives you [0, 2]
pointing to the position of 2 in x.

Find the indices (positions) of array elements that are non-zero (true), grouped by element (each index is its own row).
Basically, if you pass a boolean array, you get the indices where that array is True, but transposed: indices of the form [[x1, x2, ...], [y1, y2, ...]] become [[x1, y1], [x2, y2], ...].
x > 1 is a boolean array which is True wherever x > 1 and False wherever x <= 1. In your example, it looks like
[[False, False, True],
[True, True, True]]
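To make the "transposed" remark concrete, here is a small check (a sketch reusing x from the question; the names mask, by_axis and by_element are only illustrative) showing that np.argwhere returns the same indices as np.nonzero, arranged as one [row, col] pair per True element:

```python
import numpy as np

x = np.arange(6).reshape(2, 3)
mask = x > 1                      # boolean array, True where x > 1

by_axis = np.nonzero(mask)        # tuple: (row indices, column indices)
by_element = np.argwhere(mask)    # one [row, col] pair per True element

# argwhere is the transpose of what nonzero returns
same = np.array_equal(by_element, np.transpose(by_axis))
print(by_axis)
print(by_element)
print(same)
```

Use np.nonzero (or np.where with a single argument) when you want to index back into x, and np.argwhere when you want to loop over positions.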

Related

get elements in one array while not in other array along with axis 0 [duplicate]

I have 2 2d numpy arrays A and B
I want to remove all the rows in A which appear in B.
I tried something like this:
A[~np.isin(A, B)]
but isin keeps the dimensions of A, whereas I need one boolean value per row to filter it.
EDIT: something like this
A = np.array([[3, 0, 4],
[3, 1, 1],
[0, 5, 9]])
B = np.array([[1, 1, 1],
[3, 1, 1]])
.....
A = np.array([[3, 0, 4],
[0, 5, 9]])
Probably not the most performant solution, but does exactly what you want. You can change the dtype of A and B to be a unit consisting of one row. You need to ensure that the arrays are contiguous first, e.g. with ascontiguousarray:
Av = np.ascontiguousarray(A).view(np.dtype([('', A.dtype, A.shape[1])])).ravel()
Bv = np.ascontiguousarray(B).view(Av.dtype).ravel()
Now you can apply np.isin directly:
>>> np.isin(Av, Bv)
array([False, True, False])
According to the docs, invert=True is faster than negating the output of isin, so you can do
A[np.isin(Av, Bv, invert=True)]
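Try the following - it uses a dot product to reduce each row to a single integer, so that np.isin can compare whole rows. The weights form a mixed-radix code, so distinct rows get distinct integers as long as all entries are non-negative integers and the products do not overflow:
import numpy as np
A = np.array([[3, 0, 4],
              [3, 1, 1],
              [0, 5, 9]])
B = np.array([[1, 1, 1],
              [3, 1, 1]])
# number of distinct values needed per column (entries assumed >= 0)
arr_max = np.maximum(A.max(0) + 1, B.max(0) + 1)
# place value of each column: 1, n0, n0*n1, ...
weights = np.r_[1, np.cumprod(arr_max[:-1])]
print (A[~np.isin(A.dot(weights), B.dot(weights))])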
Output:
[[3 0 4]
[0 5 9]]
This is certainly not the most performant solution but it is relatively easy to read:
A = np.array([row for row in A if row not in B])
Edit:
I found that the code above does not work correctly (row not in B compares individual elements rather than whole rows), but this does:
A = [row for row in A if not any(np.equal(B, row).all(1))]
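The row-wise comparison in the corrected list comprehension can also be written without a Python loop by using broadcasting. This is only a sketch with the example arrays from the question; it builds a temporary boolean array of shape (len(A), len(B), n_cols), so it assumes both arrays are small enough for that to fit in memory. The names equal, in_B and result are illustrative.

```python
import numpy as np

A = np.array([[3, 0, 4],
              [3, 1, 1],
              [0, 5, 9]])
B = np.array([[1, 1, 1],
              [3, 1, 1]])

# Compare every row of A with every row of B.
# equal has shape (len(A), len(B), n_cols).
equal = A[:, None, :] == B[None, :, :]

# A row of A is "in B" if all of its columns match at least one row of B.
in_B = equal.all(axis=-1).any(axis=-1)

result = A[~in_B]
print(in_B)
print(result)
```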

Coalescing rows from boolean mask

I have a 2D array and a boolean mask of the same size. I want to use the mask to coalesce consecutive rows in the 2D array: By coalesce I mean to reduce the rows by taking the first occurrence. An example:
rows = np.r_['1,2,0', :6, :6]
mask = np.tile([1, 1, 0, 0, 1, 1], (2,1)).T.astype(bool)
Expected output:
array([[0, 0],
[2, 2],
[3, 3],
[4, 4]])
And to illustrate how the output might be obtained:
array([[0, 0],               array([[0, 0],               array([[0, 0],
       [1, 1],                      [0, 0],                      [2, 2],
       [2, 2],  -> select ->        [2, 2],  -> reduce ->        [3, 3],
       [3, 3],                      [3, 3],                      [4, 4]])
       [4, 4],                      [4, 4],
       [5, 5]])                     [4, 4]])
What I have tried:
rows[~mask].reshape(-1,2)
But this will only select the rows which should not be reduced.
Upgraded answer
I realized that my initial submission did a lot of unnecessary operations. Given a mask such as
mask = [1,1,0,0,1,1,0,0,1,1,1,0]
You simply want to negate the leading ones:
#negate:v       v       v
mask = [0,1,0,0,0,1,0,0,0,1,1,0]
then negate the mask to get your wanted rows. This way is MUCH more efficient than doing a forward fill on indices and removing repeated indices (see old answer). Revised solution:
import numpy as np
rows = np.r_['1,2,0', :6, :6]
mask = np.tile([1, 1, 0, 0, 1, 1], (2,1)).T.astype(bool)
def maskforwardfill(a: np.ndarray, mask: np.ndarray):
    mask = mask.copy()
    mask[1:] = mask[1:] & mask[:-1] # Negate leading True values
    # The first element should always be False: either it is False anyway,
    # or it is a leading True value (which should be set to False)
    mask[0] = False
    return a[~mask] # index out wanted rows
# Reduce mask's dimension since I assume that you only do complete rows
print(maskforwardfill(rows, mask.any(1)))
#[[0 0]
# [2 2]
# [3 3]
# [4 4]]
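As a quick check of the idea, the same two steps can be applied to the longer 1D mask quoted at the start of this answer. This is only a sketch; the names drop and keep are illustrative and not part of the answer's code.

```python
import numpy as np

mask = np.array([1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0], dtype=bool)

# A True value survives only if the previous value was also True,
# which clears the leading True of every run.
drop = mask.copy()
drop[1:] = mask[1:] & mask[:-1]
drop[0] = False

# Rows to keep are the negation of what is left.
keep = ~drop

print(drop.astype(int))
print(np.flatnonzero(keep))
```

The printed drop mask matches the "negated" mask shown above, and the kept indices are the first row of each True run plus every False row.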
Old answer
Here I assume that you only need complete rows (like in @Arne's answer). My idea is that given the mask and the corresponding array indices
mask = [1,1,0,0,1,1]
indices = [0,1,2,3,4,5]
you can use np.diff to first obtain
indices = [0,-1,2,3,4,-1]
Then a forward fill (where -1 acts as nan) on the indices such that you get
[0,0,2,3,4,4]
on which you can use np.unique to remove repeated indices:
[0,2,3,4] # The rows indices you want
Code:
import numpy as np
rows = np.r_['1,2,0', :6, :6]
mask = np.tile([1, 1, 0, 0, 1, 1], (2,1)).T.astype(bool)
def maskforwardfill(a: np.ndarray, mask: np.ndarray):
    mask = mask.copy()
    indices = np.arange(len(a))
    mask[np.diff(mask,prepend=[0]) == 1] = False # set leading True to False
    indices[mask] = -1
    indices = np.maximum.accumulate(indices) # forward fill indices
    indices = np.unique(indices) # remove repeats
    return a[indices] # index out wanted rows
# Reduce mask's dimension since I assume that you only do complete rows
print(maskforwardfill(rows, mask.any(1)))
#[[0 0]
# [2 2]
# [3 3]
# [4 4]]
Assuming it's always about complete rows, you can reduce the mask to one dimension. Then a straightforward approach is to iterate over the rows:
# reduce mask to one dimension for row selection
mask_1d = mask.any(axis=1)
# replace rows with previous ones based on mask
for i in range(1, len(rows)):
    if mask_1d[i-1] and mask_1d[i]:
        rows[i] = rows[i-1]
# leave out repeated rows
reduced = [rows[0]]
for i in range(1, len(rows)):
    if not (rows[i] == rows[i-1]).all():
        reduced.append(rows[i])
reduced = np.array(reduced)
reduced
array([[0, 0],
[2, 2],
[3, 3],
[4, 4]])

Return the entire row that has the max value in numpy - python

val = np.array([[1, 3], [2, 5], [0, 6], [1, 2] ])
print(np.max(val))
6
I also want to print the row [0, 6]. With axis, it returns values from the other rows as well, and argmax doesn't return the row index.
One way is to use np.where, which returns the indices where the condition is True:
r,_ = np.where(val == np.max(val))
val[r]
Output:
array([[0, 6]])
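An alternative sketch: np.argmax returns a position in the flattened array, and np.unravel_index converts that position back into a (row, column) pair. Unlike the np.where version, this picks only the first occurrence if the maximum appears more than once. The names row, col and max_row are illustrative.

```python
import numpy as np

val = np.array([[1, 3], [2, 5], [0, 6], [1, 2]])

# argmax works on the flattened array; unravel_index maps the flat
# position back to (row, column) for the original shape.
row, col = np.unravel_index(np.argmax(val), val.shape)

max_row = val[row]
print(row, col)
print(max_row)
```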

Perform matrix multiplication between two arrays and get result only on masked places

I have two dense matrices, A [200000,10], B [10,100000]. I need to multiply them to get matrix C. I can't do that directly, since the resulting matrix won't fit into the memory. Moreover, I need only a few elements from the resulting matrix, like 1-2% of the total number of elements. I have a third matrix W [200000,100000] which is sparse and has non-zero elements on exactly those places which are interesting to me in the matrix C.
Is there a way to use W as a "mask" so that the resulting matrix C will be sparse and will contain only the needed elements?
Since a matrix multiplication is just a table of dot products, we can just perform the specific dot products we need, in a vectorized fashion.
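import numpy as np
from scipy import sparse
# row and column indices of the non-zero entries of the sparse matrix W
iX, iY = W.nonzero()
values = np.sum(A[iX]*B[:, iY].T, axis=-1) #batched dot product
# coo_matrix takes (data, (row_indices, col_indices))
C = sparse.coo_matrix((values, (iX, iY)), shape=W.shape)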
First, get the indices of the non-zero places in W. Then you can get the (i,j) element of the result matrix by multiplying the i-th row of A with the j-th column of B, and save the result as a triple (i,j,res) instead of storing a full matrix (this coordinate format is a standard way to store sparse matrices).
Here's one approach using np.einsum for a vectorized solution -
from scipy import sparse
from scipy.sparse import coo_matrix
# Get row, col for the output array
r,c,_= sparse.find(W)
# Get the sum-reduction using valid rows and corresponding cols from A, B
out = np.einsum('ij,ji->i',A[r],B[:,c])
# Store as sparse matrix
out_sparse = coo_matrix((out, (r, c)), shape=W.shape)
Sample run -
1) Inputs :
In [168]: A
Out[168]:
array([[4, 6, 1, 1, 1],
[0, 8, 1, 3, 7],
[2, 8, 3, 2, 2],
[3, 4, 1, 6, 3]])
In [169]: B
Out[169]:
array([[5, 2, 4],
[2, 1, 3],
[7, 7, 2],
[5, 7, 5],
[8, 5, 0]])
In [176]: W
Out[176]:
<4x3 sparse matrix of type '<type 'numpy.bool_'>'
with 5 stored elements in Compressed Sparse Row format>
In [177]: W.toarray()
Out[177]:
array([[ True, False, False],
[False, False, False],
[ True, True, False],
[ True, False, True]], dtype=bool)
2) Using dense array to perform direct calculations and verify results later on :
In [171]: (A.dot(B))*W.toarray()
Out[171]:
array([[52, 0, 0],
[ 0, 0, 0],
[73, 57, 0],
[84, 0, 56]])
3) Use the proposed codes and get sparse matrix output :
In [172]: # Using proposed codes
...: r,c,_= sparse.find(W)
...: out = np.einsum('ij,ji->i',A[r],B[:,c])
...: out_sparse = coo_matrix((out, (r, c)), shape=W.shape)
...:
4) Finally verify results by converting to dense/array version and checking against direct version -
In [173]: out_sparse.toarray()
Out[173]:
array([[52, 0, 0],
[ 0, 0, 0],
[73, 57, 0],
[84, 0, 56]])
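With 1-2% of a 200000 x 100000 result, the gathered arrays A[r] and B[:, c] can themselves become very large. One way around that, sketched below with NumPy only, is to process the index pairs in chunks. The function name masked_matmul_values and the chunk parameter are assumptions made for this illustration, and a dense boolean W stands in for the sparse one so the example stays self-contained; the data is taken from the sample run above.

```python
import numpy as np

def masked_matmul_values(A, B, rows, cols, chunk=1_000_000):
    """Return (A @ B)[rows, cols] without ever forming A @ B."""
    out = np.empty(len(rows), dtype=np.result_type(A, B))
    for start in range(0, len(rows), chunk):
        sl = slice(start, start + chunk)
        # Row-wise dot product of the selected rows of A
        # with the selected columns of B.
        out[sl] = np.einsum('ij,ij->i', A[rows[sl]], B[:, cols[sl]].T)
    return out

A = np.array([[4, 6, 1, 1, 1],
              [0, 8, 1, 3, 7],
              [2, 8, 3, 2, 2],
              [3, 4, 1, 6, 3]])
B = np.array([[5, 2, 4],
              [2, 1, 3],
              [7, 7, 2],
              [5, 7, 5],
              [8, 5, 0]])
W = np.array([[1, 0, 0],
              [0, 0, 0],
              [1, 1, 0],
              [1, 0, 1]], dtype=bool)

rows, cols = np.nonzero(W)
# A tiny chunk size is used here only to exercise the loop.
vals = masked_matmul_values(A, B, rows, cols, chunk=2)
print(vals)
```

The resulting values can then be passed to coo_matrix((vals, (rows, cols)), shape=W.shape) exactly as in the answers above.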

Substitute entries of numpy array with numpy arrays

I have a numpy array A of size ((s1,...sm)) with integer entries and a dictionary D with integers as keys and numpy arrays of size ((t)) as values. I would like to evaluate the dictionary on every entry of the array A to get a new array B of size ((s1,...sm,t)).
For example
D={1:[0,1],2:[1,0]}
A=np.array([1,2,1])
The output should be
array([[0,1],[1,0],[0,1]])
Motivation: I have an array with indexes of unit vectors as entries and I need to transform it into an array with the vectors as entries.
If you can rename your keys to be 0-indexed, you might use direct array querying on your unit vectors:
>>> units = np.array([D[1], D[2]])
>>> B = units[A - 1] # -1 because 0 indexed: 1 -> 0, 2 -> 1
>>> B
array([[0, 1],
[1, 0],
[0, 1]])
And similarly for any shape:
>>> A = np.random.randint(0, 2, (10, 11, 12))  # entries are 0 or 1
>>> A.shape
(10, 11, 12)
>>> B = units[A]
>>> B.shape
(10, 11, 12, 2)
You can learn more about advanced indexing in the NumPy documentation.
>>> np.asarray([D[key] for key in A])
array([[0, 1],
[1, 0],
[0, 1]])
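Here's an approach that uses np.searchsorted to find, for each entry of A, the position of the matching key, and then indexes into the dictionary values with those positions. It assumes the keys are stored in sorted order. In Python 3 the key and value views have to be converted to lists first -
idx = np.searchsorted(list(D.keys()),A)
out = np.asarray(list(D.values()))[idx]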
Sample run -
In [45]: A
Out[45]: array([1, 2, 1])
In [46]: D
Out[46]: {1: [0, 1], 2: [1, 0]}
In [47]: idx = np.searchsorted(list(D.keys()),A)
...: out = np.asarray(list(D.values()))[idx]
...:
In [48]: out
Out[48]:
array([[0, 1],
[1, 0],
[0, 1]])
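If the keys cannot be renamed and are not guaranteed to be sorted, another option is a lookup table indexed directly by key. The sketch below assumes the keys are small non-negative integers and that all values have the same length t; the names lut and t are illustrative. Rows of the table that have no key simply stay zero.

```python
import numpy as np

D = {1: [0, 1], 2: [1, 0]}
A = np.array([[1, 2, 1],
              [2, 2, 1]])

# Length of the value vectors (assumed equal for every key).
t = len(next(iter(D.values())))

# Row k of the lookup table holds D[k].
lut = np.zeros((max(D) + 1, t), dtype=int)
for key, value in D.items():
    lut[key] = value

# Advanced indexing appends the value axis: shape (s1, ..., sm, t).
B = lut[A]
print(B.shape)
print(B)
```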