Can torch.where() be used in an equivalent broadcasting form? - numpy

I have the following segment of for loop in my code. The nested loop is slowing down my complete execution.
for q in range(batchSize):
    temp=torch.where((composition_matrix == pred[q]).all(dim=1))[0]
    if len(temp)==0:
        output[q]=0
    else:
        output[q]=int(temp[0])
Here, composition_matrix is a [14000,2]-dimensional PyTorch tensor with only positive integers as cell values. pred and output are both [batchSize,2]-dimensional torch tensors.
This for loop is slowing my code down a lot, and I have been unable to find an equivalent broadcasting solution for this code segment.
Does a broadcasting solution exist that eliminates this for loop?
I shall be grateful for any help.
A minimum reproducible example is
import torch
composition_matrix=torch.randint(3, 10, (14000,2))
batchSize=64
pred=torch.randint(3, 10, (batchSize,2))
output=torch.zeros([batchSize])
for q in range(batchSize):
    temp=torch.where((composition_matrix == pred[q]).all(dim=1))[0]
    if len(temp)==0:
        output[q]=0
    else:
        output[q]=int(temp[0])

To make it simple, you first need to understand what the operation is essentially doing. You've got two tensors. Tensor A is of shape (14000, 2) and tensor B is of shape (64, 2). The operation you want to do is:
For each row B[i] in B, compare that B[i] (of shape (2,)) with A (of
shape (14000, 2)). If B[i] occurs within A, set output[i] = index of
first occurrence.
This can actually be done in two lines of code (maybe even one line):
comp = (composition_matrix[:, None, :] == pred).all(dim=-1)
output = torch.argmax(comp.float(), axis=0)
The first line creates comp, the broadcasted comparison of composition_matrix and pred, a boolean tensor of shape (14000, 64).
The second line needs to find the "index of the first match". This can be done quite simply with argmax: it returns the index of the first "1" (or, if all the values are "0", it returns the first index, i.e., 0).
(Note that torch does not support argmax for "bool" tensors, and so comp needed to be cast to another data type.)
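Like the original loop, the argmax form returns 0 both when the first match is at row 0 and when there is no match at all. The following is a minimal sketch, assuming it is acceptable to mark "no match" with -1 (that sentinel and the small hand-made tensors are my own choices, not from the question), which uses any() to tell the two cases apart:

```python
import torch

# Small hand-made tensors (hypothetical values, not from the question).
composition_matrix = torch.tensor([[3, 4], [5, 6], [3, 4], [7, 8]])
pred = torch.tensor([[5, 6], [3, 4], [9, 9]])

# (4, 1, 2) == (3, 2) broadcasts to (4, 3, 2); reduce over the last axis.
comp = (composition_matrix[:, None, :] == pred).all(dim=-1)  # shape (4, 3)

first_match = torch.argmax(comp.float(), dim=0)  # first matching row, or 0
found = comp.any(dim=0)                          # True where a match exists

# Assumption: -1 marks "no match" (the original loop writes 0 instead).
output = torch.where(found, first_match, torch.full_like(first_match, -1))
print(output.tolist())  # [1, 0, -1]
```

If 0 is the desired value for missing rows, as in the question, the last step can be dropped and first_match used directly.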

Sorry for the short and probably over-simplified example. I fear a bigger one would be much more difficult to visualize. But I hope this suits your purpose.
My solution may seem a little complicated but it's fully vectorized and includes no explicit loops.
Here's what I would do:
import torch
torch.manual_seed(0)
batchSize = 8
pred = torch.randint(0, 10, (batchSize, 2))
output = torch.zeros((batchSize, 2))
composition_matrix = torch.randint(0, 10, (14, 2))
# compare all vectors in composition_matrix to all vectors in pred
comparisons = (composition_matrix.unsqueeze(0) == pred.unsqueeze(1))
comparisons = comparisons.all(2)
# form an index array the shape of the comparisons array
comparison_idxs = torch.arange(comparisons.shape[1])
comparison_idxs = comparison_idxs.repeat(batchSize).reshape(*comparisons.shape)
# multiply the comparisons array by the index array
where_result = (comparison_idxs*comparisons)
# replace invalid zeros with the maximal value in each sample
batch_idxs = torch.arange(comparisons.shape[0])
batch_idxs = batch_idxs.repeat(comparisons.shape[1])
batch_idxs = batch_idxs.reshape(comparisons.shape[1], comparisons.shape[0]).T
maxima = where_result.max(1).values[batch_idxs]
maxima_vecor = maxima[(1-comparisons.int()).bool()]
where_result[(1-comparisons.int()).bool()] = maxima_vecor
vectorized_output = where_result.min(1)[0]
output = torch.zeros([batchSize])
for q in range(batchSize):
    temp=torch.where((composition_matrix == pred[q]).all(dim=1))[0]
    if len(temp)==0:
        output[q]=0
    else:
        output[q]=int(temp[0])
output:
composition_matrix =
tensor([[6, 8],
[4, 3],
[6, 9],
[1, 4],
[4, 1],
[9, 9],
[9, 0],
[1, 2],
[3, 0],
[5, 5],
[2, 9],
[1, 8],
[8, 3],
[6, 9]])
pred =
tensor([[4, 9],
[3, 0],
[3, 9],
[7, 3],
[7, 3],
[1, 6],
[6, 9],
[8, 6]])
output =
tensor([0., 8., 0., 0., 0., 0., 2., 0.])
vectorized_output =
tensor([0, 8, 0, 0, 0, 0, 2, 0])
Some timing results:
torch.manual_seed(0)
batchSize = 8
pred = torch.randint(0, 10, (batchSize, 2))
composition_matrix = torch.randint(0, 10, (14000, 2))
print('timing the vectorized_solution:')
%timeit -n 1000 vectorized_solution(composition_matrix, pred,)
print('timing the loop_solution:')
%timeit -n 1000 loop_solution(composition_matrix, pred,)
output:
timing the vectorized_solution:
1000 loops, best of 5: 137 µs per loop
timing the loop_solution:
1000 loops, best of 5: 1.89 ms per loop

Related

numpy fill 3D mask array from 2D k-index boundary array

I want to use a 2D array which contains k-index values to quickly fill a 3D array with different mask values above/below each k-index. Only non-zero boundary indices will be used to fill.
Initialize 2D k-index array and extract valid i-j index arrays:
import numpy as np
boundary_indices = np.array([[0, 1, 2], [1, 2, 1], [0, 2, 0]])
ii, jj = np.where(boundary_indices > 0) # determine desired indices
kk = boundary_indices[ii, jj] # align boundary indices with valid indices
Yields:
boundary_indices = array([[0, 1, 2],
[1, 2, 1],
[0, 2, 0]])
ii = array([0, 0, 1, 1, 1, 2])
jj = array([1, 2, 0, 1, 2, 1])
kk = array([1, 2, 1, 2, 1, 2])
Loop through the indices and populate the output array:
output = np.zeros((3, 3, 3), dtype=np.int64)
for i, j, k in zip(ii, jj, kk):
    output[i, j, :k] = 7 # fill region above
    output[i, j, k:] = 8 # fill region below
While this does yield the correct results, it becomes quite slow once the size of the array increases significantly:
output[:, :, 0] = [[0, 7, 7],
[7, 7, 7],
[0, 7, 0]]
output[:, :, 1] = [[0, 8, 7],
[8, 7, 8],
[0, 7, 0]]
output[:, :, 2] = [[0, 8, 8],
[8, 8, 8],
[0, 8, 0]]
Is there a more efficient way to do this?
Tried output[ii, jj, kk] = 8 but that only imprints the boundary on the output array and not the regions above/below.
I was hoping that there would be some fancy-indexing magic and that something like this would work:
output[ii, jj, :kk] = 7
output[ii, jj, kk:] = 8
But it raises: TypeError: only integer scalar arrays can be converted to a scalar index
For this kind of operation, Numba or Cython can be used to produce efficient code. Here is an example with Numba:
import numpy as np
import numba as nb
# `parallel=True` can be added here for large arrays
@nb.njit('int64[:,:,::1](int64[:], int64[:], int64[:])')
def compute(ii, jj, kk):
    output = np.zeros((3, 3, 3), dtype=np.int64)
    n = output.shape[2]
    # `for idx in prange(ii.size)` can be used here for large arrays
    for i, j, k in zip(ii, jj, kk):
        # `i, j, k = ii[idx], jj[idx], kk[idx]` can be used here for large arrays
        for l in range(k): # fill region above
            output[i, j, l] = 7
        for l in range(k, n): # fill region below
            output[i, j, l] = 8
    return output
# Either kk needs to be converted to an int64-based array with kk.astype(np.int64)
# or boundary_indices needs to be an int64-based array in the first place.
output = compute(ii, jj, kk)
Note that the Numba function can be faster if ii and jj are contiguous. However, they are surprisingly not contiguous when retrieved from np.where. Besides, I assume that kk is a 64-bit integer array. You can change the signature (the string in the Numba jit decorator) to support 32-bit arrays. Also note that Numba can lazily compile the function based on the types provided at runtime, but this introduces a significant overhead during the first function call.
This code is significantly faster, especially for large arrays, thanks to Numba's just-in-time compilation. The Numba loop can be parallelized using prange and the parallel=True decorator flag, although the current code should already be pretty good.
Finally, note that you can perform the np.where(boundary_indices > 0) operation directly in the Numba loop, on the fly, to avoid creating possibly expensive temporary arrays.
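For comparison, the same fill can be written with plain NumPy broadcasting, without a JIT compiler. This is only a sketch, assuming the 3x3x3 output and the fill values 7 and 8 from the question; instead of slicing with an array, it compares a depth-index vector against every boundary index at once. The names k_axis, bounds, above and valid are mine.

```python
import numpy as np

boundary_indices = np.array([[0, 1, 2], [1, 2, 1], [0, 2, 0]])
n = 3  # depth of the output array (third axis), as in the question

k_axis = np.arange(n)                   # shape (n,)
bounds = boundary_indices[:, :, None]   # shape (3, 3, 1)

above = k_axis < bounds                 # True where k lies above the boundary
valid = bounds > 0                      # only non-zero boundaries are filled

# 7 above the boundary, 8 at or below it, 0 where the boundary index is zero.
output = np.where(valid, np.where(above, 7, 8), 0)

print(output[:, :, 1].tolist())  # [[0, 8, 7], [8, 7, 8], [0, 7, 0]]
```

This builds a few full-size temporaries, so for very large arrays the Numba loop may still be the better choice.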

Can someone explain this numpy slicing behaviour?

Could someone explain to me why the second assertion below fails? I do not understand why using a slice or a range for indexing would make a difference in this case.
import numpy as np
d = np.zeros(shape = (1,2,3))
assert d[:, 0, slice(0,2)].shape == d[:, 0, range(0,2)].shape #This doesn't trigger an exception as both operands return (1,2)
assert d[0, :, slice(0,2)].shape == d[0, :, range(0,2)].shape #This does because (1,2) != (2,1)...
Make the array more diagnostic:
In [66]: d = np.arange(6).reshape(1,2,3)
In [67]: d
Out[67]:
array([[[0, 1, 2],
[3, 4, 5]]])
scalar index in the middle:
In [68]: d[:,0,:2]
Out[68]: array([[0, 1]])
In [69]: d[:,0,range(2)]
Out[69]: array([[0, 1]])
Shape is (1,2) for both, though the 2nd is a copy because of the advanced indexing of the last dimension.
Shape is the same in the 2nd set, but the order actually differs:
In [70]: d[0,:,:2]
Out[70]:
array([[0, 1],
[3, 4]])
In [71]: d[0,:,range(2)]
Out[71]:
array([[0, 3],
[1, 4]])
[71] is a case of mixed basic and advanced indexing, which is documented as doing the unexpected. The middle sliced dimension is put last.
https://numpy.org/doc/stable/reference/arrays.indexing.html#combining-advanced-and-basic-indexing
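A short sketch of the same behaviour, plus one way to keep the sliced axis where it was, assuming the goal is the layout that the plain slice produces. Indexing in two steps avoids combining the scalar and the sequence in a single indexing operation:

```python
import numpy as np

d = np.arange(6).reshape(1, 2, 3)

basic = d[0, :, :2]        # basic indexing only
mixed = d[0, :, range(2)]  # scalar and sequence separated by a slice

# Indexing in two steps keeps the axes in their original order.
two_step = d[0][:, range(2)]

print(basic.tolist())     # [[0, 1], [3, 4]]
print(mixed.tolist())     # [[0, 3], [1, 4]]
print(two_step.tolist())  # [[0, 1], [3, 4]]
```

Here mixed is the transpose of basic, which matches Out[71] above.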

Coalescing rows from boolean mask

I have a 2D array and a boolean mask of the same size. I want to use the mask to coalesce consecutive rows in the 2D array: By coalesce I mean to reduce the rows by taking the first occurrence. An example:
rows = np.r_['1,2,0', :6, :6]
mask = np.tile([1, 1, 0, 0, 1, 1], (2,1)).T.astype(bool)
Expected output:
array([[0, 0],
       [2, 2],
       [3, 3],
       [4, 4]])
And to illustrate how the output might be obtained:
array([[0, 0],                array([[0, 0],                array([[0, 0],
       [1, 1],                       [0, 0],                       [2, 2],
       [2, 2],  -> select ->         [2, 2],  -> reduce ->         [3, 3],
       [3, 3],                       [3, 3],                       [4, 4]])
       [4, 4],                       [4, 4],
       [5, 5]])                      [4, 4]])
What I have tried:
rows[~mask].reshape(-1,2)
But this will only select the rows which should not be reduced.
Upgraded answer
I realized that my initial submission did a lot of unnecessary operations. Given the mask
mask = [1,1,0,0,1,1,0,0,1,1,1,0]
You simply want to negate the leading ones:
#negate:v v v
mask = [0,1,0,0,0,1,0,0,0,1,1,0]
then negate the mask to get your wanted rows. This way is MUCH more efficient than doing a forward fill on indices and removing repeated indices (see old answer). Revised solution:
import numpy as np
rows = np.r_['1,2,0', :6, :6]
mask = np.tile([1, 1, 0, 0, 1, 1], (2,1)).T.astype(bool)
def maskforwardfill(a: np.ndarray, mask: np.ndarray):
    mask = mask.copy()
    mask[1:] = mask[1:] & mask[:-1] # Negate leading True values
    mask[0] = False # First element should always be False, either it is False anyways, or it is a leading True value (which should be set to False)
    return a[~mask] # index out wanted rows
# Reduce mask's dimension since I assume that you only do complete rows
print(maskforwardfill(rows, mask.any(1)))
#[[0 0]
# [2 2]
# [3 3]
# [4 4]]
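To check the idea on the longer 12-element mask shown above, here is a small sketch. It uses a plain index vector in place of real rows (my simplification) and the hypothetical names drop and kept:

```python
import numpy as np

mask = np.array([1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0], dtype=bool)
rows = np.arange(len(mask))  # stand-in for the real rows

# A row is dropped when it is True and directly follows another True row.
drop = np.zeros_like(mask)
drop[1:] = mask[1:] & mask[:-1]

kept = rows[~drop]
print(drop.astype(int).tolist())  # [0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0]
print(kept.tolist())              # [0, 2, 3, 4, 6, 7, 8, 11]
```

The printed drop vector is the same as the modified mask in the explanation, and each run of consecutive True rows is reduced to its first row.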
Old answer
Here I assume that you only need complete rows (like in @Arne's answer). My idea is that, given the mask and the corresponding array indices
mask = [1,1,0,0,1,1]
indices = [0,1,2,3,4,5]
you can use np.diff to first obtain
indices = [0,-1,2,3,4,-1]
Then a forward fill (where -1 acts as nan) on the indices such that you get
[0,0,2,3,4,4]
on which you can use np.unique to remove repeated indices:
[0,2,3,4] # The rows indices you want
Code:
import numpy as np
rows = np.r_['1,2,0', :6, :6]
mask = np.tile([1, 1, 0, 0, 1, 1], (2,1)).T.astype(bool)
def maskforwardfill(a: np.ndarray, mask: np.ndarray):
    mask = mask.copy()
    indices = np.arange(len(a))
    mask[np.diff(mask,prepend=[0]) == 1] = False # set leading True to False
    indices[mask] = -1
    indices = np.maximum.accumulate(indices) # forward fill indices
    indices = np.unique(indices) # remove repeats
    return a[indices] # index out wanted rows
# Reduce mask's dimension since I assume that you only do complete rows
print(maskforwardfill(rows, mask.any(1)))
#[[0 0]
# [2 2]
# [3 3]
# [4 4]]
Assuming it's always about complete rows, you can reduce the mask to one dimension. Then a straightforward approach is to iterate over the rows:
# reduce mask to one dimension for row selection
mask_1d = mask.any(axis=1)
# replace rows with previous ones based on mask
for i in range(1, len(rows)):
    if mask_1d[i-1] and mask_1d[i]:
        rows[i] = rows[i-1]
# leave out repeated rows
reduced = [rows[0]]
for i in range(1, len(rows)):
    if not (rows[i] == rows[i-1]).all():
        reduced.append(rows[i])
reduced = np.array(reduced)
reduced
array([[0, 0],
[2, 2],
[3, 3],
[4, 4]])

How to get indices of multiple elements in a 2D tensor, in a GPU friendly way?

This question is similar to that already answered here, but that question does not address how to retrieve the indices of multiple elements.
I have a 2D tensor points with many rows and a small number of columns, and would like to get a tensor containing the row indices of all the elements in that tensor. I know what elements are present in points beforehand: it contains integer elements ranging from 0 to 999, and I can make a tensor using the range function to reflect the set of possible elements. The elements may be in any of the columns.
How can I retrieve the row indices where each element appears in my tensor in a way that avoids looping or using numpy, so I can do this quickly on a GPU?
I am looking for something like (points == elements).nonzero()[:,1]
Thanks!
Try torch.cat([(t == i).nonzero() for i in elements_to_compare])
>>> import torch
>>> t = torch.empty((15,4)).random_(0, 999)
>>> t
tensor([[429., 833., 393., 828.],
[555., 893., 846., 909.],
[ 11., 861., 586., 222.],
[232., 92., 576., 452.],
[171., 341., 851., 953.],
[ 94., 46., 130., 413.],
[243., 251., 545., 331.],
[620., 29., 194., 176.],
[303., 905., 771., 149.],
[482., 225., 7., 315.],
[ 44., 547., 206., 299.],
[695., 7., 645., 385.],
[225., 898., 677., 693.],
[746., 21., 505., 875.],
[591., 254., 84., 888.]])
>>> torch.cat([(t == i).nonzero() for i in [7,385]])
tensor([[ 9, 2],
[11, 1],
[11, 3]])
>>> torch.cat([(t == i).nonzero()[:,1] for i in [7,385]])
tensor([2, 1, 3])
Numpy:
>>> np.nonzero(np.isin(t, [7,385]))
(array([ 9, 11, 11], dtype=int64), array([2, 1, 3], dtype=int64))
>>> np.nonzero(np.isin(t, [7,385]))[1]
array([2, 1, 3], dtype=int64)
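If the Python-level list comprehension over the elements is a concern on a GPU, the comparison can also be broadcast in a single call. This is a minimal sketch, assuming the set of query values is small enough that a boolean tensor of shape (rows, cols, n_elements) fits in memory; the tensors and the names hits, idx and rows_per_hit are made up for illustration:

```python
import torch

points = torch.tensor([[998, 6], [1, 3], [998, 999], [2, 3]])
elements = torch.tensor([998, 3])

# (4, 2, 1) == (2,) broadcasts to (4, 2, 2): one boolean plane per element.
hits = points.unsqueeze(-1) == elements

# Each row of idx is (row in points, column in points, index into elements).
idx = hits.nonzero()
rows_per_hit = idx[:, 0]

print(idx.tolist())  # [[0, 0, 0], [1, 1, 1], [2, 0, 0], [3, 1, 1]]
```

The third column of idx says which query element produced each hit, so the results can be grouped per element afterwards if needed.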
I'm not sure if I'm correctly understanding what you're looking for, but if you want the indices of a certain value you could try using where and the sparse representation of the result.
E.g. in the below tensor points the value 998 is present at indices [0,0] and [2,0]. To get those indices one could:
In [34]: points=torch.tensor([ [998, 6], [1, 3], [998, 999], [2, 3] ] )
In [35]: torch.where(points==998, points, torch.tensor(0)).to_sparse().indices()
Out[35]:
tensor([[0, 2],
[0, 0]])

Perform matrix multiplication between two arrays and get result only on masked places

I have two dense matrices, A [200000,10], B [10,100000]. I need to multiply them to get matrix C. I can't do that directly, since the resulting matrix won't fit into the memory. Moreover, I need only a few elements from the resulting matrix, like 1-2% of the total number of elements. I have a third matrix W [200000,100000] which is sparse and has non-zero elements on exactly those places which are interesting to me in the matrix C.
Is there a way to use W as a "mask" so that the resulting matrix C will be sparse and will contain only the needed elements?
Since a matrix multiplication is just a table of dot products, we can just perform the specific dot products we need, in a vectorized fashion.
import numpy as np
from scipy import sparse
iX, iY = W.nonzero() # row and column indices of the non-zero entries of W
values = np.sum(A[iX]*B[:, iY].T, axis=-1) # batched dot product
C = sparse.coo_matrix((values, (iX, iY)), shape=W.shape)
First, get the indices of the non-zero entries in W. The (i,j) element of the result is then the product of the i-th row of A with the j-th column of B, and each result is stored as a triple (i,j,res) rather than in a dense matrix; this coordinate (COO) layout is a standard way to store sparse matrices.
Here's one approach using np.einsum for a vectorized solution -
from scipy import sparse
from scipy.sparse import coo_matrix
# Get row, col for the output array
r,c,_= sparse.find(W)
# Get the sum-reduction using valid rows and corresponding cols from A, B
out = np.einsum('ij,ji->i',A[r],B[:,c])
# Store as sparse matrix
out_sparse = coo_matrix((out, (r, c)), shape=W.shape)
Sample run -
1) Inputs :
In [168]: A
Out[168]:
array([[4, 6, 1, 1, 1],
[0, 8, 1, 3, 7],
[2, 8, 3, 2, 2],
[3, 4, 1, 6, 3]])
In [169]: B
Out[169]:
array([[5, 2, 4],
[2, 1, 3],
[7, 7, 2],
[5, 7, 5],
[8, 5, 0]])
In [176]: W
Out[176]:
<4x3 sparse matrix of type '<type 'numpy.bool_'>'
with 5 stored elements in Compressed Sparse Row format>
In [177]: W.toarray()
Out[177]:
array([[ True, False, False],
[False, False, False],
[ True, True, False],
[ True, False, True]], dtype=bool)
2) Using dense array to perform direct calculations and verify results later on :
In [171]: (A.dot(B))*W.toarray()
Out[171]:
array([[52, 0, 0],
[ 0, 0, 0],
[73, 57, 0],
[84, 0, 56]])
3) Use the proposed codes and get sparse matrix output :
In [172]: # Using proposed codes
...: r,c,_= sparse.find(W)
...: out = np.einsum('ij,ji->i',A[r],B[:,c])
...: out_sparse = coo_matrix((out, (r, c)), shape=W.shape)
...:
4) Finally verify results by converting to dense/array version and checking against direct version -
In [173]: out_sparse.toarray()
Out[173]:
array([[52, 0, 0],
[ 0, 0, 0],
[73, 57, 0],
[84, 0, 56]])
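With the sizes from the question, gathering every needed row of A and column of B in one go can itself need a lot of memory, since the gathered arrays have one row per non-zero entry of W. A possible workaround, sketched here under the assumption that processing the non-zero positions in fixed-size chunks is acceptable (masked_matmul and its chunk parameter are hypothetical names, not a SciPy API):

```python
import numpy as np
from scipy import sparse

def masked_matmul(A, B, W, chunk=100_000):
    """Compute (A @ B) only at the non-zero positions of sparse W, chunk by chunk."""
    r, c = W.nonzero()
    out = np.empty(len(r), dtype=np.result_type(A, B))
    for start in range(0, len(r), chunk):
        stop = start + chunk
        # Row-wise dot products for this chunk; temporaries have shape (chunk, k).
        out[start:stop] = np.einsum('ij,ij->i', A[r[start:stop]], B[:, c[start:stop]].T)
    return sparse.coo_matrix((out, (r, c)), shape=W.shape)

# Small made-up inputs to compare against the dense product.
rng = np.random.default_rng(0)
A = rng.integers(0, 10, size=(6, 4))
B = rng.integers(0, 10, size=(4, 5))
W = sparse.csr_matrix(rng.random((6, 5)) < 0.3)

C = masked_matmul(A, B, W, chunk=3)
print(np.array_equal(C.toarray(), (A @ B) * W.toarray()))
```

The chunk size trades peak memory against the number of Python-level iterations; a suitable value depends on the machine.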