Looking for an efficient way to index 2D tensor by another 2D tensor in pytorch - indexing

I have a tensor, say
A = tensor([
[0, 0],
[0, 2],
[0, 3],
[0, 4],
[0, 5],
[0, 6],
[1, 0],
[1, 1],
[1, 4],
[1, 5],
[1, 6]
])
and the other tensor
b = tensor([[0, 2], [1, 2]])
I would like to find an efficient way to index into A by b such that the result is
result = tensor([[0, 3], [1, 4]])
That is, match A’s first column of last dim (i.e. [0,…,1…]) with b’s first column of the last dim (i.e. [0,1]) by their values and then use b’s second column (i.e. [2, 2]) to index A’s second column.
Thanks

Work out a solution by converting it to one dimensional problem with torch.nonzero and offset by mask sum.
Instead of the original A, get a flatten version, like
A = tensor([[ 0], [ 2], [ 3], [ 4], [ 5], [ 7], [ 8], [11], [12]])
and also calculate the offsets along batch,
offset = tensor([[0], [5], [4]])
Similarly, get b
b = tensor([2, 2])
and
offset_b = b+offset.reshape(-1)[:-1]
Then
indices=A.reshape(-1)[offset_b]

Related

Algorithms of Joining arrays in numpy

I'm new in numpy, I understand the methods of "Joining arrays" in lower shape such as (n1, n2) beacause we can visualize, like a matrix.
But I don't undestand the logic in higher dimensions (n0, ...., n_{d-1}) of course I can't visualize that. To visualize I usually imagine a multidimensional array like a tree, so (n0, ...., n_{d-1}) means that at level (axis) i of tree every node has n_{i} children. So at level 0 (the root) we have n0 children and so on.
In substance what is the formal exact definiton of "Joining arrays" algorithms?
https://numpy.org/doc/stable/reference/routines.array-manipulation.html
Let's see I can illustrate some basic array operations.
First make a 2d array. Start with a 1d, [0,1,...5], and reshape it to (2,3):
In [1]: x = np.arange(6).reshape(2,3)
In [2]: x
Out[2]:
array([[0, 1, 2],
[3, 4, 5]])
I can join 2 copies of x along the 1st dimension (vstack, v for vertical also does this):
In [3]: np.concatenate([x,x], axis=0)
Out[3]:
array([[0, 1, 2],
[3, 4, 5],
[0, 1, 2],
[3, 4, 5]])
Note that the result is (4,3); no new dimension.
Or join them 'horizontally':
In [4]: np.concatenate([x,x], axis=1)
Out[4]:
array([[0, 1, 2, 0, 1, 2], # (2,6) shape
[3, 4, 5, 3, 4, 5]])
But if I supply them to np.array I make a 3d array (2,2,3) shape:
In [5]: np.array([x,x])
Out[5]:
array([[[0, 1, 2],
[3, 4, 5]],
[[0, 1, 2],
[3, 4, 5]]])
This action of np.array is really no different from making a 2d array from nested lists, np.array([[1,2],[3,4]]). We could just add a layer of nesting, just like Out[5} without the line breaks. I tend to think of this 3d array as having 2 blocks, each with 2 rows and 3 columns. But the names are just a convenience.
stack acts like np.array, making a 3d array. It actually changes the input arrays to (1,2,3) shape, and concatenates on the first axis.
In [6]: np.stack([x,x])
Out[6]:
array([[[0, 1, 2],
[3, 4, 5]],
[[0, 1, 2],
[3, 4, 5]]])
stack lets us join the array in other ways
In [7]: np.stack([x,x], axis=1) # expand to (2,1,3) and concatante
Out[7]:
array([[[0, 1, 2],
[0, 1, 2]],
[[3, 4, 5],
[3, 4, 5]]])
In [8]: np.stack([x,x], axis=2) # expand to (2,3,1) and concatenate
Out[8]:
array([[[0, 0],
[1, 1],
[2, 2]],
[[3, 3],
[4, 4],
[5, 5]]])
concatenate and the other stack functions don't add anything new to basic numpy arrays. They just provide a way(s) of making a new array from existing ones. There aren't any special algorithms.
If it helps you could think of these join functions as creating a new "blank" array, and filling it with copies of the source arrays. For example that last stack can be done with:
In [9]: res = np.zeros((2,3,2), int)
In [10]: res
Out[10]:
array([[[0, 0],
[0, 0],
[0, 0]],
[[0, 0],
[0, 0],
[0, 0]]])
In [11]: res[:,:,0] = x
In [12]: res[:,:,1] = x
In [13]: res
Out[13]:
array([[[0, 0],
[1, 1],
[2, 2]],
[[3, 3],
[4, 4],
[5, 5]]])

Random valid data items in numpy array

Suppose I have a numpy array as follows:
data = np.array([[1, 3, 8, np.nan], [np.nan, 6, 7, 9], [np.nan, 0, 1, 2], [5, np.nan, np.nan, 2]])
I would like to randomly select n-valid items from the array, including their indices.
Does numpy provide an efficient way of doing this?
Example
data = np.array([[1, 3, 8, np.nan], [np.nan, 6, 7, 9], [np.nan, 0, 1, 2], [5, np.nan, np.nan, 2]])
n = 5
Get valid indices
y_val, x_val = np.where(~np.isnan(data))
n_val = y_val.size
Pick random subset of size n by index
pick = np.random.choice(n_val, n)
Apply index to valid coordinates
y_pick, x_pick = y_val[pick], x_val[pick]
Get corresponding data
data_pick = data[y_pick, x_pick]
Admire
data_pick
# array([2., 8., 1., 1., 2.])
y_pick
# array([3, 0, 0, 2, 3])
x_pick
# array([3, 2, 0, 2, 3])
Find nonzeros by :
In [37]: a = np.array(np.nonzero(data)).reshape(-1,2)
In [38]: a
Out[38]:
array([[0, 0],
[0, 0],
[1, 1],
[1, 1],
[2, 2],
[2, 3],
[3, 3],
[3, 0],
[1, 2],
[3, 0],
[1, 2],
[3, 0],
[2, 3],
[0, 1],
[2, 3]])
Now pick a random choice :
In [44]: idx = np.random.choice(np.arange(len(a)))
In [45]: data[a[idx][0],a[idx][1]]
Out[45]: 2.0

is there an numpy function that will return an array with different combinations of the original?

For example if my array was
(2,2)
array([[1, 0],
[0, 1]])
I would want it to return:
(4,2,2)
array([[[0, 0],
[0, 1]],
[[1, 1],
[0, 1]],
[[1, 0],
[1, 1]],
[[1, 0],
[0, 0]]])
You can flip binary number at a time using:
(np.identity(inp.size, int)^inp.ravel()).reshape(-1, *inp.shape)
or more verbose but also more economical:
>>> out = np.empty(2*(inp.size,), inp.dtype)
>>> out[...] = inp.ravel()
>>> np.einsum('ii->i', out)[...]^=1
>>> out = out.reshape(-1, *inp.shape)

How to combined two arrays by interating with numpy? [duplicate]

I'd like to turn an open mesh returned by the numpy ix_ routine to a list of coordinates
eg, for:
In[1]: m = np.ix_([0, 2, 4], [1, 3])
In[2]: m
Out[2]:
(array([[0],
[2],
[4]]), array([[1, 3]]))
What I would like is:
([0, 1], [0, 3], [2, 1], [2, 3], [4, 1], [4, 3])
I'm pretty sure I could hack it together with some iterating, unpacking and zipping, but I'm sure there must be a smart numpy way of achieving this...
Approach #1 Use np.meshgrid and then stack -
r,c = np.meshgrid(*m)
out = np.column_stack((r.ravel('F'), c.ravel('F') ))
Approach #2 Alternatively, with np.array() and then transposing, reshaping -
np.array(np.meshgrid(*m)).T.reshape(-1,len(m))
For a generic case with for generic number of arrays used within np.ix_, here are the modifications needed -
p = np.r_[2:0:-1,3:len(m)+1,0]
out = np.array(np.meshgrid(*m)).transpose(p).reshape(-1,len(m))
Sample runs -
Two arrays case :
In [376]: m = np.ix_([0, 2, 4], [1, 3])
In [377]: p = np.r_[2:0:-1,3:len(m)+1,0]
In [378]: np.array(np.meshgrid(*m)).transpose(p).reshape(-1,len(m))
Out[378]:
array([[0, 1],
[0, 3],
[2, 1],
[2, 3],
[4, 1],
[4, 3]])
Three arrays case :
In [379]: m = np.ix_([0, 2, 4], [1, 3],[6,5,9])
In [380]: p = np.r_[2:0:-1,3:len(m)+1,0]
In [381]: np.array(np.meshgrid(*m)).transpose(p).reshape(-1,len(m))
Out[381]:
array([[0, 1, 6],
[0, 1, 5],
[0, 1, 9],
[0, 3, 6],
[0, 3, 5],
[0, 3, 9],
[2, 1, 6],
[2, 1, 5],
[2, 1, 9],
[2, 3, 6],
[2, 3, 5],
[2, 3, 9],
[4, 1, 6],
[4, 1, 5],
[4, 1, 9],
[4, 3, 6],
[4, 3, 5],
[4, 3, 9]])

Merge duplicate indices in a sparse tensor

Lets say I have a sparse tensor with duplicate indices and where they are duplicate I want to merge values (sum them up)
What is the best way to do this?
example:
indicies = [[1, 1], [1, 2], [1, 2], [1, 3]]
values = [1, 2, 3, 4]
object = tf.SparseTensor(indicies, values, shape=[10, 10])
result = tf.MAGIC(object)
result should be a spare tensor with the following values (or concrete!):
indicies = [[1, 1], [1, 2], [1, 3]]
values = [1, 5, 4]
The only thing I have though of is to string concat the indicies together to create an index hash apply it to a third dimension and then reduce sum on that third dimension.
indicies = [[1, 1, 11], [1, 2, 12], [1, 2, 12], [1, 3, 13]]
sparse_result = tf.sparse_reduce_sum(sparseTensor, reduction_axes=2, keep_dims=true)
But that feels very very ugly
Here is a solution using tf.segment_sum. The idea is to linearize the indices in to a 1-D space, get the unique indices with tf.unique, run tf.segment_sum, and convert the indices back to N-D space.
indices = tf.constant([[1, 1], [1, 2], [1, 2], [1, 3]])
values = tf.constant([1, 2, 3, 4])
# Linearize the indices. If the dimensions of original array are
# [N_{k}, N_{k-1}, ... N_0], then simply matrix multiply the indices
# by [..., N_1 * N_0, N_0, 1]^T. For example, if the sparse tensor
# has dimensions [10, 6, 4, 5], then multiply by [120, 20, 5, 1]^T
# In your case, the dimensions are [10, 10], so multiply by [10, 1]^T
linearized = tf.matmul(indices, [[10], [1]])
# Get the unique indices, and their positions in the array
y, idx = tf.unique(tf.squeeze(linearized))
# Use the positions of the unique values as the segment ids to
# get the unique values
values = tf.segment_sum(values, idx)
# Go back to N-D indices
y = tf.expand_dims(y, 1)
indices = tf.concat([y//10, y%10], axis=1)
tf.InteractiveSession()
print(indices.eval())
print(values.eval())
Maybe you can try:
indicies = [[1, 1], [1, 2], [1, 2], [1, 3]]
values = [1, 2, 3, 4]
object = tf.SparseTensor(indicies, values, shape=[10, 10])
tf.sparse.to_dense(object, validate_indices=False)
Using unsorted_segment_sum could be simpler:
def deduplicate(tensor):
if not isinstance(tensor, tf.IndexedSlices):
return tensor
unique_indices, new_index_positions = tf.unique(tensor.indices)
summed_values = tf.unsorted_segment_sum(tensor.values, new_index_positions, tf.shape(unique_indices)[0])
return tf.IndexedSlices(indices=unique_indices, values=summed_values, dense_shape=tensor.dense_shape)
Another solution is to use tf.scatter_nd which will create a dense tensor and accumulate values on duplicate indices. This behavior is clearly stated in the documentation:
If indices contains duplicates, the duplicate values are accumulated (summed).
Then eventually we can convert it back to a sparse representation.
Here is a code sample for TensorFlow 2.x in eager mode:
import tensorflow as tf
indicies = [[1, 1], [1, 2], [1, 2], [1, 3]]
values = [1, 2, 3, 4]
merged_dense = tf.scatter_nd(indices, values, shape=(10, 10))
merged_sparse = tf.sparse.from_dense(merged_dense)
print(merged_sparse)
Output
SparseTensor(
indices=tf.Tensor(
[[1 1]
[1 2]
[1 3]],
shape=(3, 2),
dtype=int64),
values=tf.Tensor([1 5 4], shape=(3,), dtype=int32),
dense_shape=tf.Tensor([10 10], shape=(2,), dtype=int64))
So. As per the solution mentioned above.
Another example.
For the shape [12, 5]:
Lines to be changed in the code:
linearized = tf.matmul(indices, [[5], [1]])
indices = tf.concat([y//5, y%5], axis=1)