Suppose that I have two tensors
x = tf.constant([["a", "b"], ["c", "d"]]),
y = tf.constant([["b", "c"], ["d", "c"]])
Then, I want to get the following tensors:
x_ = [[0, 1], [1, 1]]
y_ = [[1, 0], [1, 1]]
How is x_ constructed?
The (0,1) entry of x ("b") appears in the first row of y, so we set the (0,1) entry of x_ to 1. In addition, both the (1,0) and (1,1) entries of x ("c" and "d") appear in the second row of y, so we set the (1,0) and (1,1) entries of x_ to 1.
How is y_ constructed?
The (0,0) entry of y ("b") appears in the first row of x, so we set the (0,0) entry of y_ to 1. In addition, both the (1,0) and (1,1) entries of y ("d" and "c") appear in the second row of x, so we set the (1,0) and (1,1) entries of y_ to 1.
Ugliest solution possible:
import tensorflow as tf
x = tf.constant([["a", "b"], ["c", "d"]])
y = tf.constant([["b", "c"], ["d", "c"]])
def func(x, y):
    def compare(v, w):
        return tf.cast(tf.equal(v, w), tf.int32)

    y_rows = tf.range(tf.shape(y)[0])
    x_cols = tf.range(tf.shape(x)[1])
    res = tf.map_fn(lambda t: tf.map_fn(lambda z: compare(x[t, z], y[t]),
                                        x_cols, dtype=tf.int32),
                    y_rows, dtype=tf.int32)
    res = tf.reduce_max(res, axis=-1)
    return res
res1 = func(x,y)
res2 = func(y,x)
sess = tf.Session()
print(sess.run(res1))
print(sess.run(res2))
[[0 1]
[1 1]]
[[1 0]
[1 1]]
I don't know whether it has a gradient. Perhaps someone will propose something better.
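For instance, one possible cleaner alternative (just a sketch in the same TF 1.x session as above; row_membership is a made-up name): broadcast an element-wise comparison between each row of the first tensor and the corresponding row of the second tensor, then reduce over the last axis.
def row_membership(a, b):
    # matches[r, i, j] is True iff a[r, i] == b[r, j]
    matches = tf.equal(a[:, :, None], b[:, None, :])
    # 1 if a[r, i] occurs anywhere in row r of b, else 0
    return tf.cast(tf.reduce_any(matches, axis=-1), tf.int32)

x_ = row_membership(x, y)
y_ = row_membership(y, x)
print(sess.run([x_, y_]))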
I have been doing a project on image matching, so I need to find correspondences between 2 images. To get descriptors, I will need an interpolate function. However, when I read about an equivalent function implemented in TensorFlow, I still don't get how to implement tf.gather_nd(params, indices, batch_dims) in PyTorch, especially when there is the batch_dims argument. I have gone through Stack Overflow and there is no perfect equivalent yet.
The referred interpolate function in TensorFlow is below, and I have been trying to implement it in PyTorch. Information about the arguments is below:
inputs is a dense feature map[i] taken from a for loop over the batch, which means it is 3D, [H, W, C] (in PyTorch it is [C, H, W])
pos is a set of random point coordinates shaped like [[i, j], [i, j], ..., [i, j]], so it is 2D when it goes into the interpolate function (in PyTorch it is [[i, i, ..., i], [j, j, ..., j]])
the function then expands both of their dimensions when they get into it
I just want a faithful implementation of tf.gather_nd with the batch_dims argument. Thank you!
And here's a simple example of using it:
pos = tf.ones((12, 2)) ## stands for a set of 12 coordinates [[i, j], [i, j], …, [i, j]]
inputs = tf.ones((4, 4, 128)) ## stands for [H, W, C] of dense feature map
outputs = interpolate(pos, inputs, nd=True)
print(outputs.get_shape()) # We get (12, 128) here
interpolate function (tf version):
def interpolate(pos, inputs, nd=True):
    pos = tf.expand_dims(pos, 0)
    inputs = tf.expand_dims(inputs, 0)

    h = tf.shape(inputs)[1]
    w = tf.shape(inputs)[2]

    i = pos[:, :, 0]
    j = pos[:, :, 1]

    i_top_left = tf.clip_by_value(tf.cast(tf.math.floor(i), tf.int32), 0, h - 1)
    j_top_left = tf.clip_by_value(tf.cast(tf.math.floor(j), tf.int32), 0, w - 1)

    i_top_right = tf.clip_by_value(tf.cast(tf.math.floor(i), tf.int32), 0, h - 1)
    j_top_right = tf.clip_by_value(tf.cast(tf.math.ceil(j), tf.int32), 0, w - 1)

    i_bottom_left = tf.clip_by_value(tf.cast(tf.math.ceil(i), tf.int32), 0, h - 1)
    j_bottom_left = tf.clip_by_value(tf.cast(tf.math.floor(j), tf.int32), 0, w - 1)

    i_bottom_right = tf.clip_by_value(tf.cast(tf.math.ceil(i), tf.int32), 0, h - 1)
    j_bottom_right = tf.clip_by_value(tf.cast(tf.math.ceil(j), tf.int32), 0, w - 1)

    dist_i_top_left = i - tf.cast(i_top_left, tf.float32)
    dist_j_top_left = j - tf.cast(j_top_left, tf.float32)
    w_top_left = (1 - dist_i_top_left) * (1 - dist_j_top_left)
    w_top_right = (1 - dist_i_top_left) * dist_j_top_left
    w_bottom_left = dist_i_top_left * (1 - dist_j_top_left)
    w_bottom_right = dist_i_top_left * dist_j_top_left

    if nd:
        w_top_left = w_top_left[..., None]
        w_top_right = w_top_right[..., None]
        w_bottom_left = w_bottom_left[..., None]
        w_bottom_right = w_bottom_right[..., None]

    interpolated_val = (
        w_top_left * tf.gather_nd(inputs, tf.stack([i_top_left, j_top_left], axis=-1), batch_dims=1) +
        w_top_right * tf.gather_nd(inputs, tf.stack([i_top_right, j_top_right], axis=-1), batch_dims=1) +
        w_bottom_left * tf.gather_nd(inputs, tf.stack([i_bottom_left, j_bottom_left], axis=-1), batch_dims=1) +
        w_bottom_right * tf.gather_nd(inputs, tf.stack([i_bottom_right, j_bottom_right], axis=-1), batch_dims=1)
    )

    interpolated_val = tf.squeeze(interpolated_val, axis=0)
    return interpolated_val
As far as I'm aware there is no direct equivalent of tf.gather_nd in PyTorch, and implementing a generic version with batch_dims is not that simple. However, you likely don't need a generic version, and given the context of your interpolate function, a version for [C, H, W] would suffice.
At the beginning of interpolate you add a singular dimension to the front, which is the batch dimension. Setting batch_dims=1 in tf.gather_nd means there is one batch dimension at the beginning, so it is applied per batch, i.e. it indexes inputs[0] with pos[0] etc. There is no benefit to adding a singular batch dimension, because you could have just used the direct computation.
# Adding singular batch dimension
# Shape: [1, num_pos, 2]
pos = tf.expand_dims(pos, 0)
# Shape: [1, H, W, C]
inputs = tf.expand_dims(inputs, 0)
batched_result = tf.gather_nd(inputs, pos, batch_dims=1)
single_result = tf.gather_nd(inputs[0], pos[0])
# The first element in the batched result is the same as the single result
# Hence there is no benefit to adding a singular batch dimension.
tf.reduce_all(batched_result[0] == single_result) # => True
Single version
In PyTorch the implementation for [H, W, C] can be done with Python's indexing. While PyTorch usually uses [C, H, W] for images, it's only a matter of what dimension to index, but let's keep them the same as in TensorFlow for the sake of comparison. If you were to index them manually, you would do it as such: inputs[pos_h[0], pos_w[0]], inputs[pos_h[1], pos_w[1]] and so on. PyTorch allows you to do that automatically by providing the indices as lists: inputs[pos_h, pos_w], where pos_h and pos_w have the same length. All you need to do is split your pos into two separate tensors, one for the indices along the height dimension and the other along the width dimension, which you also did in the TensorFlow version.
import tensorflow as tf  # TF 2.x eager execution is assumed, since .numpy() is used below
import torch

inputs = torch.randn(4, 4, 128)
# Random positions 0-3, shape: [12, 2]
pos = torch.randint(4, (12, 2))
# Positions split by dimension
pos_h = pos[:, 0]
pos_w = pos[:, 1]
# Index the inputs with the indices per dimension
gathered = inputs[pos_h, pos_w]
# Verify that it's identical to TensorFlow's output
inputs_tf = tf.convert_to_tensor(inputs.numpy())
pos_tf = tf.convert_to_tensor(pos.numpy())
gathered_tf = tf.gather_nd(inputs_tf, pos_tf)
gathered_tf = torch.from_numpy(gathered_tf.numpy())
torch.equal(gathered_tf, gathered) # => True
If you want to apply it to a tensor of size [C, H, W] instead, you only need to change the dimensions you want to index:
# For [H, W, C]
gathered = inputs[pos_h, pos_w]
# For [C, H, W]
gathered = inputs[:, pos_h, pos_w]
Batched version
Making it a batched version (for [N, H, W, C] or [N, C, H, W]) is not that difficult, and using it is more appropriate, since you're dealing with batches anyway. The only tricky part is that each element in the batch should only be applied to its corresponding batch. For this the batch dimension needs to be enumerated, which can be done with torch.arange. The batch enumeration is just a list with the batch indices, which will be combined with the pos_h and pos_w indices, resulting in inputs[0, pos_h[0, 0], pos_w[0, 0]], inputs[0, pos_h[0, 1], pos_w[0, 1]] ... inputs[1, pos_h[1, 0], pos_w[1, 0]] etc.
batch_size = 3
inputs = torch.randn(batch_size, 4, 4, 128)
# Random positions 0-3, different for each batch, shape: [3, 12, 2]
pos = torch.randint(4, (batch_size, 12, 2))
# Positions split by dimension
pos_h = pos[:, :, 0]
pos_w = pos[:, :, 1]
batch_enumeration = torch.arange(batch_size) # => [0, 1, 2]
# pos_h and pos_w have shape [3, 12], so the batch enumeration needs to be
# repeated 12 times per batch.
# Unsqueeze to get shape [3, 1], now the 1 could be repeated to 12, but
# broadcasting will do that automatically.
batch_enumeration = batch_enumeration.unsqueeze(1)
# Index the inputs with the indices per dimension
gathered = inputs[batch_enumeration, pos_h, pos_w]
# Again, verify that it's identical to TensorFlow's output
inputs_tf = tf.convert_to_tensor(inputs.numpy())
pos_tf = tf.convert_to_tensor(pos.numpy())
# This time with batch_dims=1
gathered_tf = tf.gather_nd(inputs_tf, pos_tf, batch_dims=1)
gathered_tf = torch.from_numpy(gathered_tf.numpy())
torch.equal(gathered_tf, gathered) # => True
Again, for [N, C, H, W], only the dimensions that are indexed need to be changed:
# For [N, H, W, C]
gathered = inputs[batch_enumeration, pos_h, pos_w]
# For [N, C, H, W]
gathered = inputs[batch_enumeration, :, pos_h, pos_w]
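With that in place, a minimal single-image port of your interpolate function, built on the indexing shown above, could look like the sketch below (interpolate_torch is a made-up name; it assumes pos is a [num_pos, 2] float tensor of (i, j) coordinates and inputs is [H, W, C], mirroring the TensorFlow version):
import torch

def interpolate_torch(pos, inputs):
    # pos: [num_pos, 2] float (i, j) coordinates, inputs: [H, W, C]
    h, w = inputs.shape[0], inputs.shape[1]
    i, j = pos[:, 0], pos[:, 1]
    # Corner indices, clamped to the valid range
    i_floor = i.floor().long().clamp(0, h - 1)
    j_floor = j.floor().long().clamp(0, w - 1)
    i_ceil = i.ceil().long().clamp(0, h - 1)
    j_ceil = j.ceil().long().clamp(0, w - 1)
    # Bilinear weights, unsqueezed to broadcast over the channel dimension
    di = (i - i_floor.to(i.dtype)).unsqueeze(-1)
    dj = (j - j_floor.to(j.dtype)).unsqueeze(-1)
    return ((1 - di) * (1 - dj) * inputs[i_floor, j_floor]
            + (1 - di) * dj * inputs[i_floor, j_ceil]
            + di * (1 - dj) * inputs[i_ceil, j_floor]
            + di * dj * inputs[i_ceil, j_ceil])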
Just a little side note on the interpolate implementation: rounding the positions (floor and ceil respectively) doesn't make sense, because indices must be integers, so it has no effect as long as your positions are actual indices. That also results in i_top_left and i_bottom_left being the same value, but even if they were rounded differently, they would always be 1 position apart. Furthermore, i_top_left and i_top_right are literally the same. I don't think this function produces a meaningful output. I don't know what you're trying to achieve, but if you're looking for image interpolation you could have a look at torch.nn.functional.interpolate.
This is just an extension of Michael Jungo's batched-version answer for when pos is a 2D array instead of a 1D array (excluding the batch dimension).
bs = 2
H = 4
W = 6
C = 3
inputs = torch.randn(bs, H, W, C)
pos_h = torch.randint(H, (bs, H, W))
pos_w = torch.randint(W, (bs, H, W))
batch_enumeration = torch.arange(bs)
batch_enumeration = batch_enumeration.unsqueeze(1).unsqueeze(2)
inputs.shape
Out[34]: torch.Size([2, 4, 6, 3])
pos_h.shape
Out[35]: torch.Size([2, 4, 6])
pos_w.shape
Out[36]: torch.Size([2, 4, 6])
batch_enumeration.shape
Out[37]: torch.Size([2, 1, 1])
gathered = inputs[batch_enumeration, pos_h, pos_w]
For channels-first, we also need to enumerate the channels:
inputs = torch.randn(bs, C, H, W)
pos_h = torch.randint(H, (bs, 1, H, W))
pos_w = torch.randint(W, (bs, 1, H, W))
batch_enumeration = torch.arange(bs)
batch_enumeration = batch_enumeration.unsqueeze(1).unsqueeze(2).unsqueeze(3)
channel_enumeration = torch.arange(C)
channel_enumeration = channel_enumeration.unsqueeze(0).unsqueeze(2).unsqueeze(3)
inputs.shape
Out[49]: torch.Size([2, 3, 4, 6])
pos_h.shape
Out[50]: torch.Size([2, 1, 4, 6])
pos_w.shape
Out[51]: torch.Size([2, 1, 4, 6])
batch_enumeration.shape
Out[52]: torch.Size([2, 1, 1, 1])
channel_enumeration.shape
Out[57]: torch.Size([1, 3, 1, 1])
gathered = inputs[batch_enumeration, channel_enumeration, pos_h, pos_w]
gathered.shape
Out[59]: torch.Size([2, 3, 4, 6])
Let's verify
inputs_np = inputs.numpy()
pos_h_np = pos_h.numpy()
pos_w_np = pos_w.numpy()
gathered_np = gathered.numpy()
pos_h_np[0,0,0,0]
Out[68]: 0
pos_w_np[0,0,0,0]
Out[69]: 3
inputs_np[0,:,0,3]
Out[71]: array([ 0.79122806, -2.190181 , -0.16741803], dtype=float32)
gathered_np[0,:,0,0]
Out[72]: array([ 0.79122806, -2.190181 , -0.16741803], dtype=float32)
pos_h_np[1,0,3,4]
Out[73]: 1
pos_w_np[1,0,3,4]
Out[74]: 2
inputs_np[1,:,1,2]
Out[75]: array([ 0.9282498 , -0.34945545, 0.9136222 ], dtype=float32)
gathered_np[1,:,3,4]
Out[77]: array([ 0.9282498 , -0.34945545, 0.9136222 ], dtype=float32)
I improved on Michael Jungo's implementation. It now supports arbitrary leading batch dimensions.
import numpy as np
import torch

def gather_nd_torch(params, indices, batch_dim=1):
    """A PyTorch port of tensorflow.gather_nd.

    This implementation can handle leading batch dimensions in params; see below for a detailed explanation.

    The majority of this implementation is from Michael Jungo (https://stackoverflow.com/a/61810047/6670143);
    I just made it compatible with leading batch dimensions.

    Args:
        params: a tensor of dimension [b1, ..., bn, g1, ..., gm, c].
        indices: a tensor of dimension [b1, ..., bn, x, m].
        batch_dim: how many batch dimensions you have; in the above example, batch_dim = n.

    Returns:
        gathered: a tensor of dimension [b1, ..., bn, x, c].

    Example:
        >>> batch_size = 5
        >>> inputs = torch.randn(batch_size, batch_size, batch_size, 4, 4, 4, 32)
        >>> pos = torch.randint(4, (batch_size, batch_size, batch_size, 12, 3))
        >>> gathered = gather_nd_torch(inputs, pos, batch_dim=3)
        >>> gathered.shape
        torch.Size([5, 5, 5, 12, 32])

        >>> inputs_tf = tf.convert_to_tensor(inputs.numpy())
        >>> pos_tf = tf.convert_to_tensor(pos.numpy())
        >>> gathered_tf = tf.gather_nd(inputs_tf, pos_tf, batch_dims=3)
        >>> gathered_tf.shape
        TensorShape([5, 5, 5, 12, 32])
        >>> gathered_tf = torch.from_numpy(gathered_tf.numpy())
        >>> torch.equal(gathered_tf, gathered)
        True
    """
    batch_dims = params.size()[:batch_dim]         # [b1, ..., bn]
    batch_size = np.cumprod(list(batch_dims))[-1]  # b1 * ... * bn
    c_dim = params.size()[-1]                      # c
    grid_dims = params.size()[batch_dim:-1]        # [g1, ..., gm]
    n_indices = indices.size(-2)                   # x
    n_pos = indices.size(-1)                       # m

    # reshape the leading batch dims to a single batch dim
    params = params.reshape(batch_size, *grid_dims, c_dim)
    indices = indices.reshape(batch_size, n_indices, n_pos)

    # build gather indices: gather for each of the data points in this "batch"
    batch_enumeration = torch.arange(batch_size).unsqueeze(1)
    gather_dims = [indices[:, :, i] for i in range(len(grid_dims))]
    gather_dims.insert(0, batch_enumeration)
    gathered = params[tuple(gather_dims)]

    # reshape back to the shape with leading batch dims
    gathered = gathered.reshape(*batch_dims, n_indices, c_dim)
    return gathered
I have also made a demo Colab notebook, you can check it here. This implementation is way faster than TF's original implementation, according to my quick speed test on a Colab server with a GPU instance.
I want to create a batch of zero images with several channels, where one given pixel per image and channel is set to the value one.
If the images are indexed only by channel, the following code does the job just fine:
import numpy as np

num_channels = 3
im_size = 2
images = np.zeros((num_channels, im_size, im_size))
# random locations for the ones
pixels = np.random.randint(low=0, high=im_size,
size=(num_channels, 2))
images[np.arange(num_channels), pixels[:, 0], pixels[:, 1]] = 1
However, the analogous code fails if we want to consider the batch too:
batch_size = 4
num_channels = 3
im_size = 2
images = np.zeros((batch_size, num_channels, im_size, im_size))
# random locations for the ones
pixels = np.random.randint(low=0, high=im_size,
size=(batch_size, num_channels, 2))
images[np.arange(batch_size), np.arange(num_channels), pixels[:, :, 0], pixels[:, :, 1]] = 1
which gives the error
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (4,) (3,) (4,3) (4,3)
The following code will do the work, using an inefficient loop:
batch_size = 4
num_channels = 3
im_size = 2
images = np.zeros((batch_size, num_channels, im_size, im_size))
# random locations for the ones
pixels = np.random.randint(low=0, high=im_size,
size=(batch_size, num_channels, 2))
for k in range(batch_size):
    images[k, np.arange(num_channels), pixels[k, :, 0], pixels[k, :, 1]] = 1
How would you obtain a vectorized solution?
A simple vectorized solution using advanced indexing would be -
I,J = np.arange(batch_size)[:,None],np.arange(num_channels)
images[I, J, pixels[...,0], pixels[...,1]] = 1
An alternative, easier way to get those I, J indexers would be with np.ogrid -
I,J = np.ogrid[:batch_size,:num_channels]
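As a quick sanity check (a sketch reusing the sizes from the question), the vectorized assignment produces exactly the same array as the loop version:
import numpy as np

batch_size, num_channels, im_size = 4, 3, 2
pixels = np.random.randint(low=0, high=im_size, size=(batch_size, num_channels, 2))

# Loop version
images_loop = np.zeros((batch_size, num_channels, im_size, im_size))
for k in range(batch_size):
    images_loop[k, np.arange(num_channels), pixels[k, :, 0], pixels[k, :, 1]] = 1

# Vectorized version with advanced indexing
images_vec = np.zeros((batch_size, num_channels, im_size, im_size))
I, J = np.ogrid[:batch_size, :num_channels]
images_vec[I, J, pixels[..., 0], pixels[..., 1]] = 1

print(np.array_equal(images_loop, images_vec))  # True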
The image below shows the output of a single intermediate filter layer of a CNN before the application of a max-pooling layer.
I want to store the coordinates of the pixel with intensity 4 (at the bottom right of the matrix on the LHS of the arrow) as they are in the matrix on the LHS of the arrow. That is, the pixel at coordinate (4,4) (1-based indexing) in the left matrix is the one that gets stored in the bottom-right cell of the matrix on the RHS of the arrow. Now what I want to do is store this coordinate value (4,4), along with the coordinates of the other pixels {(2,2) for the pixel with intensity 6, (2,4) for the pixel with intensity 8 and (3,1) for the pixel with intensity 3}, as a list for later processing. How do I do this in TensorFlow?
Max pooling done with a filter of size 2 x 2 and stride of 2
You can use tf.nn.max_pool_with_argmax (link).
Note:
The indices in argmax are flattened, so that a maximum value at position [b, y, x, c] becomes flattened index ((b * height + y) * width + x) * channels + c.
We need to do some processing to make it fit your coordinates.
An example:
import tensorflow as tf
import numpy as np
def max_pool_with_argmax(net, filter_h, filter_w, stride):
    output, mask = tf.nn.max_pool_with_argmax(net, ksize=[1, filter_h, filter_w, 1],
                                              strides=[1, stride, stride, 1], padding='SAME')
    # If your ksize looks like [1, stride, stride, 1]
    loc_x = mask // net.shape[2]
    loc_y = mask % net.shape[2]
    loc = tf.concat([loc_x + 1, loc_y + 1], axis=-1)  # count from 0 so add 1
    # If your ksize is all changing, use the following
    # c = tf.mod(mask, net.shape[3])
    # remain = tf.cast(tf.divide(tf.subtract(mask, c), net.shape[3]), tf.int64)
    # x = tf.mod(remain, net.shape[2])
    # remain = tf.cast(tf.divide(tf.subtract(remain, x), net.shape[2]), tf.int64)
    # y = tf.mod(remain, net.shape[1])
    # remain = tf.cast(tf.divide(tf.subtract(remain, y), net.shape[1]), tf.int64)
    # b = tf.mod(remain, net.shape[0])
    # loc = tf.concat([y + 1, x + 1], axis=-1)
    return output, loc
input = tf.Variable(np.random.rand(1, 6, 4, 1), dtype=np.float32)
output, mask = max_pool_with_argmax(input,2,2,2)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    input_value, output_value, mask_value = sess.run([input, output, mask])
    print(input_value[0, :, :, 0])
    print(output_value[0, :, :, 0])
    print(mask_value[0, :, :, :])
#print
[[0.20101677 0.09207255 0.32177696 0.34424785]
[0.4116488 0.5965447 0.20575707 0.63288754]
[0.3145412 0.16090539 0.59698933 0.709239 ]
[0.00252096 0.18027237 0.11163216 0.40613824]
[0.4027637 0.1995668 0.7462126 0.68812144]
[0.8993007 0.55828506 0.5263306 0.09376772]]
[[0.5965447 0.63288754]
[0.3145412 0.709239 ]
[0.8993007 0.7462126 ]]
[[[2 2]
[2 4]]
[[3 1]
[3 4]]
[[6 1]
[5 3]]]
You can see (2,2) for the pixel with intensity 0.5965447, (2,4) for the pixel with intensity 0.63288754, and so on.
Let's say you have the following max-pooling layer:
pool_layer = tf.nn.max_pool(conv_output,
                            ksize=[1, 2, 2, 1],
                            strides=[1, 2, 2, 1],
                            padding='VALID')
you can use:
max_pos = tf.gradients([pool_layer], [conv_output])[0]
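With non-overlapping windows like these, that gradient is nonzero exactly at the positions selected by the pooling and zero elsewhere, so (a rough sketch, same TF 1.x setup as above) the coordinates can then be recovered with tf.where:
# max_pos has the same shape as conv_output, with nonzero entries at the selected pixels.
# tf.where returns one [b, y, x, c] index row per selected pixel (0-based).
max_coords = tf.where(tf.not_equal(max_pos, 0))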