Looking for an efficient way to index 2D tensor by another 2D tensor in pytorch

Looking for an efficient way to index 2D tensor by another 2D tensor in pytorch - indexing

I have a tensor, say
A = tensor([
[0, 0],
[0, 2],
[0, 3],
[0, 4],
[0, 5],
[0, 6],
[1, 0],
[1, 1],
[1, 4],
[1, 5],
[1, 6]
])
and the other tensor
b = tensor([[0, 2], [1, 2]])
I would like to find an efficient way to index into A by b such that the result is
result = tensor([[0, 3], [1, 4]])
That is, match A’s first column of last dim (i.e. [0,…,1…]) with b’s first column of the last dim (i.e. [0,1]) by their values and then use b’s second column (i.e. [2, 2]) to index A’s second column.
Thanks

Work out a solution by converting it to one dimensional problem with torch.nonzero and offset by mask sum.
Instead of the original A, get a flatten version, like
A = tensor([[ 0], [ 2], [ 3], [ 4], [ 5], [ 7], [ 8], [11], [12]])
and also calculate the offsets along batch,
offset = tensor([[0], [5], [4]])
Similarly, get b
b = tensor([2, 2])
and
offset_b = b+offset.reshape(-1)[:-1]
Then
indices=A.reshape(-1)[offset_b]

Related

Algorithms of Joining arrays in numpy

I'm new in numpy, I understand the methods of "Joining arrays" in lower shape such as (n1, n2) beacause we can visualize, like a matrix.
But I don't undestand the logic in higher dimensions (n0, ...., n_{d-1}) of course I can't visualize that. To visualize I usually imagine a multidimensional array like a tree, so (n0, ...., n_{d-1}) means that at level (axis) i of tree every node has n_{i} children. So at level 0 (the root) we have n0 children and so on.
In substance what is the formal exact definiton of "Joining arrays" algorithms?
https://numpy.org/doc/stable/reference/routines.array-manipulation.html

Let's see I can illustrate some basic array operations.
First make a 2d array. Start with a 1d, [0,1,...5], and reshape it to (2,3):
In [1]: x = np.arange(6).reshape(2,3)
In [2]: x
Out[2]:
array([[0, 1, 2],
[3, 4, 5]])
I can join 2 copies of x along the 1st dimension (vstack, v for vertical also does this):
In [3]: np.concatenate([x,x], axis=0)
Out[3]:
array([[0, 1, 2],
[3, 4, 5],
[0, 1, 2],
[3, 4, 5]])
Note that the result is (4,3); no new dimension.
Or join them 'horizontally':
In [4]: np.concatenate([x,x], axis=1)
Out[4]:
array([[0, 1, 2, 0, 1, 2], # (2,6) shape
[3, 4, 5, 3, 4, 5]])
But if I supply them to np.array I make a 3d array (2,2,3) shape:
In [5]: np.array([x,x])
Out[5]:
array([[[0, 1, 2],
[3, 4, 5]],
[[0, 1, 2],
[3, 4, 5]]])
This action of np.array is really no different from making a 2d array from nested lists, np.array([[1,2],[3,4]]). We could just add a layer of nesting, just like Out[5} without the line breaks. I tend to think of this 3d array as having 2 blocks, each with 2 rows and 3 columns. But the names are just a convenience.
stack acts like np.array, making a 3d array. It actually changes the input arrays to (1,2,3) shape, and concatenates on the first axis.
In [6]: np.stack([x,x])
Out[6]:
array([[[0, 1, 2],
[3, 4, 5]],
[[0, 1, 2],
[3, 4, 5]]])
stack lets us join the array in other ways
In [7]: np.stack([x,x], axis=1) # expand to (2,1,3) and concatante
Out[7]:
array([[[0, 1, 2],
[0, 1, 2]],
[[3, 4, 5],
[3, 4, 5]]])
In [8]: np.stack([x,x], axis=2) # expand to (2,3,1) and concatenate
Out[8]:
array([[[0, 0],
[1, 1],
[2, 2]],
[[3, 3],
[4, 4],
[5, 5]]])
concatenate and the other stack functions don't add anything new to basic numpy arrays. They just provide a way(s) of making a new array from existing ones. There aren't any special algorithms.
If it helps you could think of these join functions as creating a new "blank" array, and filling it with copies of the source arrays. For example that last stack can be done with:
In [9]: res = np.zeros((2,3,2), int)
In [10]: res
Out[10]:
array([[[0, 0],
[0, 0],
[0, 0]],
[[0, 0],
[0, 0],
[0, 0]]])
In [11]: res[:,:,0] = x
In [12]: res[:,:,1] = x
In [13]: res
Out[13]:
array([[[0, 0],
[1, 1],
[2, 2]],
[[3, 3],
[4, 4],
[5, 5]]])

Random valid data items in numpy array

Suppose I have a numpy array as follows:
data = np.array([[1, 3, 8, np.nan], [np.nan, 6, 7, 9], [np.nan, 0, 1, 2], [5, np.nan, np.nan, 2]])
I would like to randomly select n-valid items from the array, including their indices.
Does numpy provide an efficient way of doing this?

Example
data = np.array([[1, 3, 8, np.nan], [np.nan, 6, 7, 9], [np.nan, 0, 1, 2], [5, np.nan, np.nan, 2]])
n = 5
Get valid indices
y_val, x_val = np.where(~np.isnan(data))
n_val = y_val.size
Pick random subset of size n by index
pick = np.random.choice(n_val, n)
Apply index to valid coordinates
y_pick, x_pick = y_val[pick], x_val[pick]
Get corresponding data
data_pick = data[y_pick, x_pick]
Admire
data_pick
# array([2., 8., 1., 1., 2.])
y_pick
# array([3, 0, 0, 2, 3])
x_pick
# array([3, 2, 0, 2, 3])

Find nonzeros by :
In [37]: a = np.array(np.nonzero(data)).reshape(-1,2)
In [38]: a
Out[38]:
array([[0, 0],
[0, 0],
[1, 1],
[1, 1],
[2, 2],
[2, 3],
[3, 3],
[3, 0],
[1, 2],
[3, 0],
[1, 2],
[3, 0],
[2, 3],
[0, 1],
[2, 3]])
Now pick a random choice :
In [44]: idx = np.random.choice(np.arange(len(a)))
In [45]: data[a[idx][0],a[idx][1]]
Out[45]: 2.0

is there an numpy function that will return an array with different combinations of the original?

For example if my array was
(2,2)
array([[1, 0],
[0, 1]])
I would want it to return:
(4,2,2)
array([[[0, 0],
[0, 1]],
[[1, 1],
[0, 1]],
[[1, 0],
[1, 1]],
[[1, 0],
[0, 0]]])

You can flip binary number at a time using:
(np.identity(inp.size, int)^inp.ravel()).reshape(-1, *inp.shape)
or more verbose but also more economical:
>>> out = np.empty(2*(inp.size,), inp.dtype)
>>> out[...] = inp.ravel()
>>> np.einsum('ii->i', out)[...]^=1
>>> out = out.reshape(-1, *inp.shape)

How to combined two arrays by interating with numpy? [duplicate]

I'd like to turn an open mesh returned by the numpy ix_ routine to a list of coordinates
eg, for:
In[1]: m = np.ix_([0, 2, 4], [1, 3])
In[2]: m
Out[2]:
(array([[0],
[2],
[4]]), array([[1, 3]]))
What I would like is:
([0, 1], [0, 3], [2, 1], [2, 3], [4, 1], [4, 3])
I'm pretty sure I could hack it together with some iterating, unpacking and zipping, but I'm sure there must be a smart numpy way of achieving this...

Approach #1 Use np.meshgrid and then stack -
r,c = np.meshgrid(*m)
out = np.column_stack((r.ravel('F'), c.ravel('F') ))
Approach #2 Alternatively, with np.array() and then transposing, reshaping -
np.array(np.meshgrid(*m)).T.reshape(-1,len(m))
For a generic case with for generic number of arrays used within np.ix_, here are the modifications needed -
p = np.r_[2:0:-1,3:len(m)+1,0]
out = np.array(np.meshgrid(*m)).transpose(p).reshape(-1,len(m))
Sample runs -
Two arrays case :
In [376]: m = np.ix_([0, 2, 4], [1, 3])
In [377]: p = np.r_[2:0:-1,3:len(m)+1,0]
In [378]: np.array(np.meshgrid(*m)).transpose(p).reshape(-1,len(m))
Out[378]:
array([[0, 1],
[0, 3],
[2, 1],
[2, 3],
[4, 1],
[4, 3]])
Three arrays case :
In [379]: m = np.ix_([0, 2, 4], [1, 3],[6,5,9])
In [380]: p = np.r_[2:0:-1,3:len(m)+1,0]
In [381]: np.array(np.meshgrid(*m)).transpose(p).reshape(-1,len(m))
Out[381]:
array([[0, 1, 6],
[0, 1, 5],
[0, 1, 9],
[0, 3, 6],
[0, 3, 5],
[0, 3, 9],
[2, 1, 6],
[2, 1, 5],
[2, 1, 9],
[2, 3, 6],
[2, 3, 5],
[2, 3, 9],
[4, 1, 6],
[4, 1, 5],
[4, 1, 9],
[4, 3, 6],
[4, 3, 5],
[4, 3, 9]])

Merge duplicate indices in a sparse tensor

Lets say I have a sparse tensor with duplicate indices and where they are duplicate I want to merge values (sum them up)
What is the best way to do this?
example:
indicies = [[1, 1], [1, 2], [1, 2], [1, 3]]
values = [1, 2, 3, 4]
object = tf.SparseTensor(indicies, values, shape=[10, 10])
result = tf.MAGIC(object)
result should be a spare tensor with the following values (or concrete!):
indicies = [[1, 1], [1, 2], [1, 3]]
values = [1, 5, 4]
The only thing I have though of is to string concat the indicies together to create an index hash apply it to a third dimension and then reduce sum on that third dimension.
indicies = [[1, 1, 11], [1, 2, 12], [1, 2, 12], [1, 3, 13]]
sparse_result = tf.sparse_reduce_sum(sparseTensor, reduction_axes=2, keep_dims=true)
But that feels very very ugly

Here is a solution using tf.segment_sum. The idea is to linearize the indices in to a 1-D space, get the unique indices with tf.unique, run tf.segment_sum, and convert the indices back to N-D space.
indices = tf.constant([[1, 1], [1, 2], [1, 2], [1, 3]])
values = tf.constant([1, 2, 3, 4])
# Linearize the indices. If the dimensions of original array are
# [N_{k}, N_{k-1}, ... N_0], then simply matrix multiply the indices
# by [..., N_1 * N_0, N_0, 1]^T. For example, if the sparse tensor
# has dimensions [10, 6, 4, 5], then multiply by [120, 20, 5, 1]^T
# In your case, the dimensions are [10, 10], so multiply by [10, 1]^T
linearized = tf.matmul(indices, [[10], [1]])
# Get the unique indices, and their positions in the array
y, idx = tf.unique(tf.squeeze(linearized))
# Use the positions of the unique values as the segment ids to
# get the unique values
values = tf.segment_sum(values, idx)
# Go back to N-D indices
y = tf.expand_dims(y, 1)
indices = tf.concat([y//10, y%10], axis=1)
tf.InteractiveSession()
print(indices.eval())
print(values.eval())

Maybe you can try:
indicies = [[1, 1], [1, 2], [1, 2], [1, 3]]
values = [1, 2, 3, 4]
object = tf.SparseTensor(indicies, values, shape=[10, 10])
tf.sparse.to_dense(object, validate_indices=False)

Using unsorted_segment_sum could be simpler:
def deduplicate(tensor):
if not isinstance(tensor, tf.IndexedSlices):
return tensor
unique_indices, new_index_positions = tf.unique(tensor.indices)
summed_values = tf.unsorted_segment_sum(tensor.values, new_index_positions, tf.shape(unique_indices)[0])
return tf.IndexedSlices(indices=unique_indices, values=summed_values, dense_shape=tensor.dense_shape)

Another solution is to use tf.scatter_nd which will create a dense tensor and accumulate values on duplicate indices. This behavior is clearly stated in the documentation:
If indices contains duplicates, the duplicate values are accumulated (summed).
Then eventually we can convert it back to a sparse representation.
Here is a code sample for TensorFlow 2.x in eager mode:
import tensorflow as tf
indicies = [[1, 1], [1, 2], [1, 2], [1, 3]]
values = [1, 2, 3, 4]
merged_dense = tf.scatter_nd(indices, values, shape=(10, 10))
merged_sparse = tf.sparse.from_dense(merged_dense)
print(merged_sparse)
Output
SparseTensor(
indices=tf.Tensor(
[[1 1]
[1 2]
[1 3]],
shape=(3, 2),
dtype=int64),
values=tf.Tensor([1 5 4], shape=(3,), dtype=int32),
dense_shape=tf.Tensor([10 10], shape=(2,), dtype=int64))

So. As per the solution mentioned above.
Another example.
For the shape [12, 5]:
Lines to be changed in the code:
linearized = tf.matmul(indices, [[5], [1]])
indices = tf.concat([y//5, y%5], axis=1)

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Looking for an efficient way to index 2D tensor by another 2D tensor in pytorch - indexing

Related

Algorithms of Joining arrays in numpy

Random valid data items in numpy array

is there an numpy function that will return an array with different combinations of the original?

How to combined two arrays by interating with numpy? [duplicate]

Merge duplicate indices in a sparse tensor

Categories

Resources