K-means example(tf.expand_dims) - tensorflow

In Example code of Kmeans of Tensorflow,
When use the function 'tf.expand_dims'(Inserts a dimension of 1 into a tensor's shape.) in point_expanded, centroids_expanded
before calculate tf.reduce_sum.
why is these have different indexes(0, 1) in second parameter?
import numpy as np
import tensorflow as tf
points_n = 200
clusters_n = 3
iteration_n = 100
points = tf.constant(np.random.uniform(0, 10, (points_n, 2)))
centroids = tf.Variable(tf.slice(tf.random_shuffle(points), [0, 0],[clusters_n, -1]))
points_expanded = tf.expand_dims(points, 0)
centroids_expanded = tf.expand_dims(centroids, 1)
distances = tf.reduce_sum(tf.square(tf.subtract(points_expanded, centroids_expanded)), 2)
assignments = tf.argmin(distances, 0)
means = []
for c in range(clusters_n):
means.append(tf.reduce_mean(tf.gather(points,tf.reshape(tf.where(tf.equal(assignments, c)), [1, -1])), reduction_indices=[1]))
new_centroids = tf.concat(means,0)
update_centroids = tf.assign(centroids, new_centroids)
init = tf.global_variables_initializer()
with tf.Session() as sess:
sess.run(init)
for step in range(iteration_n):
[_, centroid_values, points_values, assignment_values] = sess.run([update_centroids, centroids, points, assignments])
print("centroids" + "\n", centroid_values)
plt.scatter(points_values[:, 0], points_values[:, 1], c=assignment_values, s=50, alpha=0.5)
plt.plot(centroid_values[:, 0], centroid_values[:, 1], 'kx', markersize=15)
plt.show()

This is done to subtract each centroid from each point. First, make sure you understand the notion of broadcasting (https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html)
that is linked from tf.subtract (https://www.tensorflow.org/api_docs/python/tf/subtract). Then, you just need to draw the shapes of points, expanded_points, centroids, and expanded_centroids and understand what values get "broadcast" where. Once you do that you will see that broadcasting allows you to compute exactly what you want - subtract each point from each centroid.
As a sanity check, since there are 200 points, 3 centroids, and each is 2D, we should have 200*3*2 differences. This is exactly what we get:
In [53]: points
Out[53]: <tf.Tensor 'Const:0' shape=(200, 2) dtype=float64>
In [54]: points_expanded
Out[54]: <tf.Tensor 'ExpandDims_4:0' shape=(1, 200, 2) dtype=float64>
In [55]: centroids
Out[55]: <tf.Variable 'Variable:0' shape=(3, 2) dtype=float64_ref>
In [56]: centroids_expanded
Out[56]: <tf.Tensor 'ExpandDims_5:0' shape=(3, 1, 2) dtype=float64>
In [57]: tf.subtract(points_expanded, centroids_expanded)
Out[57]: <tf.Tensor 'Sub_5:0' shape=(3, 200, 2) dtype=float64>
If you are having trouble drawing the shapes, you can think of broadcasting the expanded_points with dimension (1, 200, 2) to dimension (3, 200, 2) as copying the 200x2 matrix 3 times along the first dimension. The 3x2 matrix in centroids_expanded (of shape (3, 1, 2)) get copied 200 times along the second dimension.

Related

How to implement tf.gather_nd in Pytorch with the argument batch_dims?

I have been doing a project on image matching, so I need to find correspondences between 2 images. To get descriptors, I will need a interpolate function. However, when I read about a equivalent function which is done in Tensorflow, I still don’t get how to implement tf.gather_nd(parmas, indices, barch_dims) in Pytorch. Especially when there is a argument: batch_dims. I have gone through stackoverflow and there is no perfect equivalence yet.
The referred interpolate function in Tensorflow is below and I have been trying to implement this in Pytorch Arguments' information is below:
inputs is a dense feature map[i] from a for loop of batch size, which means it is 3D[H, W, C](in pytorch is [C, H, W])
pos is a set of random point coordinate shapes like [[i, j], [i, j],...,[i, j]], so it is 2D when it goes in interpolate function(in pytorch is [[i,i,...,i], [j,j,...,j]])
and it then expands both of their dimensions when they get into this function
I just want a perfect implement of tf.gather_nd with argument batch_dims. Thank you!
And here's a simple example of using it:
pos = tf.ones((12, 2)) ## stands for a set of coordinates [[i, i,…, i], [j, j,…, j]]
inputs = tf.ones((4, 4, 128)) ## stands for [H, W, C] of dense feature map
outputs = interpolate(pos, inputs, batched=False)
print(outputs.get_shape()) # We get (12, 128) here
interpolate function (tf version):
def interpolate(pos, inputs, nd=True):
pos = tf.expand_dims(pos, 0)
inputs = tf.expand_dims(inputs, 0)
h = tf.shape(inputs)[1]
w = tf.shape(inputs)[2]
i = pos[:, :, 0]
j = pos[:, :, 1]
i_top_left = tf.clip_by_value(tf.cast(tf.math.floor(i), tf.int32), 0, h - 1)
j_top_left = tf.clip_by_value(tf.cast(tf.math.floor(j), tf.int32), 0, w - 1)
i_top_right = tf.clip_by_value(tf.cast(tf.math.floor(i), tf.int32), 0, h - 1)
j_top_right = tf.clip_by_value(tf.cast(tf.math.ceil(j), tf.int32), 0, w - 1)
i_bottom_left = tf.clip_by_value(tf.cast(tf.math.ceil(i), tf.int32), 0, h - 1)
j_bottom_left = tf.clip_by_value(tf.cast(tf.math.floor(j), tf.int32), 0, w - 1)
i_bottom_right = tf.clip_by_value(tf.cast(tf.math.ceil(i), tf.int32), 0, h - 1)
j_bottom_right = tf.clip_by_value(tf.cast(tf.math.ceil(j), tf.int32), 0, w - 1)
dist_i_top_left = i - tf.cast(i_top_left, tf.float32)
dist_j_top_left = j - tf.cast(j_top_left, tf.float32)
w_top_left = (1 - dist_i_top_left) * (1 - dist_j_top_left)
w_top_right = (1 - dist_i_top_left) * dist_j_top_left
w_bottom_left = dist_i_top_left * (1 - dist_j_top_left)
w_bottom_right = dist_i_top_left * dist_j_top_left
if nd:
w_top_left = w_top_left[..., None]
w_top_right = w_top_right[..., None]
w_bottom_left = w_bottom_left[..., None]
w_bottom_right = w_bottom_right[..., None]
interpolated_val = (
w_top_left * tf.gather_nd(inputs, tf.stack([i_top_left, j_top_left], axis=-1), batch_dims=1) +
w_top_right * tf.gather_nd(inputs, tf.stack([i_top_right, j_top_right], axis=-1), batch_dims=1) +
w_bottom_left * tf.gather_nd(inputs, tf.stack([i_bottom_left, j_bottom_left], axis=-1), batch_dims=1) +
w_bottom_right * tf.gather_nd(inputs, tf.stack([i_bottom_right, j_bottom_right], axis=-1), batch_dims=1)
)
interpolated_val = tf.squeeze(interpolated_val, axis=0)
return interpolated_val
As far as I'm aware there is no directly equivalent of tf.gather_nd in PyTorch and implementing a generic version with batch_dims is not that simple. However, you likely don't need a generic version, and given the context of your interpolate function, a version for [C, H, W] would suffice.
At the beginning of interpolate you add a singular dimension to the front, which is the batch dimension. Setting batch_dims=1 in tf.gather_nd means there is one batch dimension at the beginning, therefore it applies it per batch, i.e. it indexes inputs[0] with pos[0] etc. There is no benefit of adding a singular batch dimension, because you could have just used the direct computation.
# Adding singular batch dimension
# Shape: [1, num_pos, 2]
pos = tf.expand_dims(pos, 0)
# Shape: [1, H, W, C]
inputs = tf.expand_dims(inputs, 0)
batched_result = tf.gather_nd(inputs, pos, batch_dims=1)
single_result = tf.gater_nd(inputs[0], pos[0])
# The first element in the batched result is the same as the single result
# Hence there is no benefit to adding a singular batch dimension.
tf.reduce_all(batched_result[0] == single_result) # => True
Single version
In PyTorch the implementation for [H, W, C] can be done with Python's indexing. While PyTorch usually uses [C, H, W] for images, it's only a matter of what dimension to index, but let's keep them the same as in TensorFlow for the sake of comparison. If you were to index them manually, you would do it as such: inputs[pos_h[0], pos_w[0]], inputs[pos_h[1], pos_w[1]] and so on. PyTorch allows you to do that automatically by providing the indices as lists: inputs[pos_h, pos_w], where pos_h and pos_w have the same length. All you need to do is split your pos into two separate tensors, one for the indices along the height dimension and the other along the width dimension, which you also did in the TensorFlow version.
inputs = torch.randn(4, 4, 128)
# Random positions 0-3, shape: [12, 2]
pos = torch.randint(4, (12, 2))
# Positions split by dimension
pos_h = pos[:, 0]
pos_w = pos[:, 1]
# Index the inputs with the indices per dimension
gathered = inputs[pos_h, pos_w]
# Verify that it's identical to TensorFlow's output
inputs_tf = tf.convert_to_tensor(inputs.numpy())
pos_tf = tf.convert_to_tensor(pos.numpy())
gathered_tf = tf.gather_nd(inputs_tf, pos_tf)
gathered_tf = torch.from_numpy(gathered_tf.numpy())
torch.equal(gathered_tf, gathered) # => True
If you want to apply it to a tensor of size [C, H, W] instead, you only need to change the dimensions you want to index:
# For [H, W, C]
gathered = inputs[pos_h, pos_w]
# For [C, H, W]
gathered = inputs[:, pos_h, pos_w]
Batched version
Making it a batched batched version (for [N, H, W, C] or [N, C, H, W]) is not that difficult, and using that is more appropriate, since you're dealing with batches anyway. The only tricky part is that each element in the batch should only be applied to the corresponding batch. For this the batch dimensions needs to be enumerated, which can be done with torch.arange. The batch enumeration is just the list with the batch indices, which will be combined with the pos_h and pos_w indices, resulting in inputs[0, pos_h[0, 0], pos_h[0, 0]], inputs[0, pos_h[0, 1], pos_h[0, 1]] ... inputs[1, pos_h[1, 0], pos_h[1, 0]] etc.
batch_size = 3
inputs = torch.randn(batch_size, 4, 4, 128)
# Random positions 0-3, different for each batch, shape: [3, 12, 2]
pos = torch.randint(4, (batch_size, 12, 2))
# Positions split by dimension
pos_h = pos[:, :, 0]
pos_w = pos[:, :, 1]
batch_enumeration = torch.arange(batch_size) # => [0, 1, 2]
# pos_h and pos_w have shape [3, 12], so the batch enumeration needs to be
# repeated 12 times per batch.
# Unsqueeze to get shape [3, 1], now the 1 could be repeated to 12, but
# broadcasting will do that automatically.
batch_enumeration = batch_enumeration.unsqueeze(1)
# Index the inputs with the indices per dimension
gathered = inputs[batch_enumeration, pos_h, pos_w]
# Again, verify that it's identical to TensorFlow's output
inputs_tf = tf.convert_to_tensor(inputs.numpy())
pos_tf = tf.convert_to_tensor(pos.numpy())
# This time with batch_dims=1
gathered_tf = tf.gather_nd(inputs_tf, pos_tf, batch_dims=1)
gathered_tf = torch.from_numpy(gathered_tf.numpy())
torch.equal(gathered_tf, gathered) # => True
Again, for [N, C, H, W], only the dimensions that are indexed need to be changed:
# For [N, H, W, C]
gathered = inputs[batch_enumeration, pos_h, pos_w]
# For [N, C, H, W]
gathered = inputs[batch_enumeration, :, pos_h, pos_w]
Just a little side note on the interpolate implementation, rounding the positions (floor and ceil respectively) doesn't make sense, because indices must be integers, so it has no effect, as long as your positions are actual indices. That also results in i_top_left and i_bottom_left being the same value, but even if they are to be rounded differently, they are always 1 position apart. Furthermore, i_top_left and i_top_right are literally the same. I don't think that this function produces a meaningful output. I don't know what you're trying to achieve, but if you're looking for image interpolation you could have a look at torch.nn.functional.interpolate.
This is just an extension of Michael Jungo's batched version answer when pos is 2D array instead of 1D array (excluding batch dimension).
bs = 2
H = 4
W = 6
C = 3
inputs = torch.randn(bs, H, W, C)
pos_h = torch.randint(H, (bs, H, W))
pos_w = torch.randint(W, (bs, H, W))
batch_enumeration = torch.arange(bs)
batch_enumeration = batch_enumeration.unsqueeze(1).unsqueeze(2)
inputs.shape
Out[34]: torch.Size([2, 4, 6, 3])
pos_h.shape
Out[35]: torch.Size([2, 4, 6])
pos_w.shape
Out[36]: torch.Size([2, 4, 6])
batch_enumeration.shape
Out[37]: torch.Size([2, 1, 1])
gathered = inputs[batch_enumeration, pos_h, pos_w]
For channel first, we also need to enumerate channels
inputs = torch.randn(bs, C, H, W)
pos_h = torch.randint(H, (bs, 1, H, W))
pos_w = torch.randint(W, (bs, 1, H, W))
batch_enumeration = torch.arange(bs)
batch_enumeration = batch_enumeration.unsqueeze(1).unsqueeze(2).unsqueeze(3)
channel_enumeration = torch.arange(C)
channel_enumeration = channel_enumeration.unsqueeze(0).unsqueeze(2).unsqueeze(3)
inputs.shape
Out[49]: torch.Size([2, 3, 4, 6])
pos_h.shape
Out[50]: torch.Size([2, 1, 4, 6])
pos_w.shape
Out[51]: torch.Size([2, 1, 4, 6])
batch_enumeration.shape
Out[52]: torch.Size([2, 1, 1, 1])
channel_enumeration.shape
Out[57]: torch.Size([1, 3, 1, 1])
gathered = inputs[batch_enumeration, channel_enumeration, pos_h, pos_w]
gathered.shape
Out[59]: torch.Size([2, 3, 4, 6])
Let's verify
inputs_np = inputs.numpy()
pos_h_np = pos_h.numpy()
pos_w_np = pos_w.numpy()
gathered_np = gathered.numpy()
pos_h_np[0,0,0,0]
Out[68]: 0
pos_w_np[0,0,0,0]
Out[69]: 3
inputs_np[0,:,0,3]
Out[71]: array([ 0.79122806, -2.190181 , -0.16741803], dtype=float32)
gathered_np[0,:,0,0]
Out[72]: array([ 0.79122806, -2.190181 , -0.16741803], dtype=float32)
pos_h_np[1,0,3,4]
Out[73]: 1
pos_w_np[1,0,3,4]
Out[74]: 2
inputs_np[1,:,1,2]
Out[75]: array([ 0.9282498 , -0.34945545, 0.9136222 ], dtype=float32)
gathered_np[1,:,3,4]
Out[77]: array([ 0.9282498 , -0.34945545, 0.9136222 ], dtype=float32)
I improved the answer from Michael Jungo's implementation. Now it supports arbitrary leading batch dimensions.
def gather_nd_torch(params, indices, batch_dim=1):
""" A PyTorch porting of tensorflow.gather_nd
This implementation can handle leading batch dimensions in params, see below for detailed explanation.
The majority of this implementation is from Michael Jungo # https://stackoverflow.com/a/61810047/6670143
I just ported it compatible to leading batch dimension.
Args:
params: a tensor of dimension [b1, ..., bn, g1, ..., gm, c].
indices: a tensor of dimension [b1, ..., bn, x, m]
batch_dim: indicate how many batch dimension you have, in the above example, batch_dim = n.
Returns:
gathered: a tensor of dimension [b1, ..., bn, x, c].
Example:
>>> batch_size = 5
>>> inputs = torch.randn(batch_size, batch_size, batch_size, 4, 4, 4, 32)
>>> pos = torch.randint(4, (batch_size, batch_size, batch_size, 12, 3))
>>> gathered = gather_nd_torch(inputs, pos, batch_dim=3)
>>> gathered.shape
torch.Size([5, 5, 5, 12, 32])
>>> inputs_tf = tf.convert_to_tensor(inputs.numpy())
>>> pos_tf = tf.convert_to_tensor(pos.numpy())
>>> gathered_tf = tf.gather_nd(inputs_tf, pos_tf, batch_dims=3)
>>> gathered_tf.shape
TensorShape([5, 5, 5, 12, 32])
>>> gathered_tf = torch.from_numpy(gathered_tf.numpy())
>>> torch.equal(gathered_tf, gathered)
True
"""
batch_dims = params.size()[:batch_dim] # [b1, ..., bn]
batch_size = np.cumprod(list(batch_dims))[-1] # b1 * ... * bn
c_dim = params.size()[-1] # c
grid_dims = params.size()[batch_dim:-1] # [g1, ..., gm]
n_indices = indices.size(-2) # x
n_pos = indices.size(-1) # m
# reshape leadning batch dims to a single batch dim
params = params.reshape(batch_size, *grid_dims, c_dim)
indices = indices.reshape(batch_size, n_indices, n_pos)
# build gather indices
# gather for each of the data point in this "batch"
batch_enumeration = torch.arange(batch_size).unsqueeze(1)
gather_dims = [indices[:, :, i] for i in range(len(grid_dims))]
gather_dims.insert(0, batch_enumeration)
gathered = params[gather_dims]
# reshape back to the shape with leading batch dims
gathered = gathered.reshape(*batch_dims, n_indices, c_dim)
return gathered
I have also made a demo Colab notebook, you can check it here. This implementation is way faster than TF's original implementation according to my poor speed test on Colab server with a GPU instance.

How to merge two dimensions of a tensor without assuming the shape of it

Let's assume the following function:
from tensorflow.python.keras import backend as K
def broadcast_sum(a, b):
a = K.expand_dims(a, 1)
b = K.expand_dims(b, 2)
c = a + b
cs = K.shape(c)
return K.reshape(c, (cs[0], -1, cs[-1]))
Given the two tensors of shapes (1, 3, 2) and (1, 4, 2), it correctly returns:
>>> broadcast_sum(K.placeholder((1, 3, 2)), K.placeholder((1, 4, 2)))
>>> <tf.Tensor 'Reshape_2:0' shape=(1, 12, 2) dtype=float32>
Right now, this function works only with 3D input (because of the reshape line). My question is, how can I make this work with any shape (using the same function) without knowing the shape? Of course, I'm assuming the inputs are of the same shape and at least a 3D. But how can I have a single function that works with 3D, 4D and so on?
And I'm assuming that it's always the second dimension (from left) that the function will broadcast and the rest of the dimensions are identical between the two inputs. Here are the shapes that I want to make the same function to work with:
>>> broadcast_sum(K.placeholder((1, 3, 5, 2)), K.placeholder((1, 4, 5, 2)))
>>> <tf.Tensor 'Reshape_3:0' shape=(1, 60, 2) dtype=float32>
Of course, the returned tensor is wrong right now. It should be of shape (1, 12, 5, 2).
[UPDATE]
Please also consider that the first dimension (the batch size) could be None. In fact, any of the dimensions except the rightmost one could be None.
And I'm assuming that it's always the second dimension (from left)
that the function will broadcast and the rest of the dimensions are
identical between the two inputs.
Based on this, I reuse the shape information from one of the inputs.
from tensorflow.python.keras import backend as K
def broadcast_sum(a, b):
final_shape = (a.shape[0], -1, *a.shape[2:])
a = K.expand_dims(a, 1)
b = K.expand_dims(b, 2)
c = a + b
return K.reshape(c, final_shape)
print(broadcast_sum(K.placeholder((1, 3, 2)), K.placeholder((1, 4, 2))))
print(broadcast_sum(K.placeholder((1, 3, 5, 2)), K.placeholder((1, 4, 5, 2))))
Tensor("Reshape:0", shape=(1, 4, 3, 2), dtype=float32)
Tensor("Reshape_1:0", shape=(1, 12, 5, 2), dtype=float32)

Tensorflow add broadcasting

Can someone tell me what's going on here? I'm confused.
In [125]: a
Out[125]: <tf.Tensor 'MatMul_86739:0' shape=(100, 1) dtype=float32>
In [126]: embed
Out[126]: <tf.Tensor 'embedding_lookup_41205:0' shape=(100,) dtype=float32>
In [128]: a+embed
Out[128]: <tf.Tensor 'add_43373:0' shape=(100, 100) dtype=float32>
How can (100,1) + (100,) be (100,100)? and if so, WHY?
The rules for TensorFlow's broadcasting operators are based on NumPy's broadcasting rules.
The basic algorithm for broadcasting works from right to left. Assume we are adding (or applying another binary broadcasting operator to) two tensors x and y, the following code computes the shape of the result:
result_shape = []
# Loop over the matching dimensions of x and y in reverse.
for x_dim, y_dim in zip(x.shape[::-1], y.shape[::-1]):
if x.shape == y.shape:
result_shape.insert(0, x.shape)
elif x.shape == 1:
result_shape.insert(0, y.shape) # x will be broadcast along this dimension.
elif y.shape == 1:
result_shape.insert(0, x.shape) # y will be broadcast along this dimension.
else:
raise ValueError("Shapes of x and y are incompatible.")
# If x and y have a different rank, the leading dimensions are inherited
# from the tensor with higher rank.
if len(x.shape) > len(y.shape):
num_leading_dims = len(x.shape) - len(y.shape)
result_shape = x.shape[0:num_leading_dims] + result_shape
elif len(y.shape) > len(x.shape):
num_leading_dims = len(y.shape) - len(x.shape)
result_shape = y.shape[0:num_leading_dims] + result_shape
Now in your example, you have x.shape = (100,) and y.shape = (100, 1):
The first comparison is between 100 and 1, and so result_shape = [100].
y.shape is longer than x.shape, so we add the leading dimension to the result, giving result_shape = [100, 100].

SparseTensor equivalent of tf.tile?

There's a tf.tile function, which takes a tensor and copies it a given number of times.
f = tf.tile([5], [3])
f.eval() == array([3, 3, 3], dtype=int32)
How to achieve something similar with SparseTensors:
g = tf.SparseTensorValue([[0, 0]], values=[5], shape=[1, 1])
tiled = tf.tile(g, [10, 1]) <- gives ValueError: Argument must be a dense tensor
?
Ok, I have found a solution (that works on SparseTensors, but not on SparseTensorValues):
tiled = tf.sparse_concat(0, [g] * 10)

Confused about conv2d_transpose

I'm getting this error message when using conv2d_transpose:
W tensorflow/core/common_runtime/executor.cc:1102] 0x7fc81f0d6250 Compute status: Invalid argument: Conv2DBackpropInput: Number of rows of out_backprop doesn't match computed: actual = 32, computed = 4
[[Node: generator/g_h1/conv2d_transpose = Conv2DBackpropInput[T=DT_FLOAT, padding="SAME", strides=[1, 2, 2, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/cpu:0"](generator/g_h1/conv2d_transpose/output_shape, generator/g_h1/w/read, _recv_l_0)]]
However, it occurs after the graph is built while compiling the loss function (Adam). Any ideas on what would cause this? I suspect it's related to the input dimensions but I'm not sure exactly why.
Full error: https://gist.github.com/jimfleming/75d88e888044615dd6e3
Relevant code:
# l shape: [batch_size, 32, 32, 4]
output_shape = [self.batch_size, 8, 8, 128]
filter_shape = [7, 7, 128, l.get_shape()[-1]]
strides = [1, 2, 2, 1]
with tf.variable_scope("g_h1"):
w = tf.get_variable('w', filter_shape, initializer=tf.random_normal_initializer(stddev=0.02))
h1 = tf.nn.conv2d_transpose(l, w, output_shape=output_shape, strides=strides, padding='SAME')
h1 = tf.nn.relu(h1)
output_shape = [self.batch_size, 16, 16, 128]
filter_shape = [7, 7, 128, h1.get_shape()[-1]]
strides = [1, 2, 2, 1]
with tf.variable_scope("g_h2"):
w = tf.get_variable('w', filter_shape, initializer=tf.random_normal_initializer(stddev=0.02))
h2 = tf.nn.conv2d_transpose(h1, w,output_shape=output_shape, strides=strides, padding='SAME')
h2 = tf.nn.relu(h2)
output_shape = [self.batch_size, 32, 32, 3]
filter_shape = [5, 5, 3, h2.get_shape()[-1]]
strides = [1, 2, 2, 1]
with tf.variable_scope("g_h3"):
w = tf.get_variable('w', filter_shape, initializer=tf.random_normal_initializer(stddev=0.02))
h3 = tf.nn.conv2d_transpose(h2, w,output_shape=output_shape, strides=strides, padding='SAME')
h3 = tf.nn.tanh(h3)
Thanks for the question! You're exactly right---the problem is that the input and output dimensions being passed to tf.nn.conv2d_transpose don't agree. (The error may be detected when computing gradients, but the gradient computation isn't the problem.)
Let's look at just the first part of your code, and simplify it a little bit:
sess = tf.Session()
batch_size = 3
output_shape = [batch_size, 8, 8, 128]
strides = [1, 2, 2, 1]
l = tf.constant(0.1, shape=[batch_size, 32, 32, 4])
w = tf.constant(0.1, shape=[7, 7, 128, 4])
h1 = tf.nn.conv2d_transpose(l, w, output_shape=output_shape, strides=strides, padding='SAME')
print sess.run(h1)
I replaced the variables with constants --- it's easier to see what's going on.
If you try to run this code, you get a similar error:
InvalidArgumentError: Conv2DCustomBackpropInput: Size of out_backprop doesn't match computed: actual = 32, computed = 4
[[Node: conv2d_transpose_6 = Conv2DBackpropInput[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 2, 2, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/cpu:0"](conv2d_transpose_6/output_shape, Const_25, Const_24)]]
Now, the error is a little misleading --- it talks about the 'out_backprop' argument to 'Conv2DCustomBackpropInput'. The key is that tf.nn.conv2d_transpose is actually just the gradient of tf.nn.conv2d, so Tensorflow uses the same code internally (Conv2DCustomBackpropInput) to compute the gradient of tf.nn.conv2d and to compute tf.nn.conv2d_transpose.
The error means that the 'output_shape' you requested is not possible, given the shapes of 'l' and 'w'.
Since tf.nn.conv2d_transpose is the backward (gradient) counterpart of tf.nn.conv2d, one way to see what the correct shapes should be is to use the corresponding forward operation:
output = tf.constant(0.1, shape=output_shape)
expected_l = tf.nn.conv2d(output, w, strides=strides, padding='SAME')
print expected_l.get_shape()
# Prints (3, 4, 4, 4)
That is, in the forward direction, if you provided a tensor of shape 'output_shape', you would get out a tensor of shape (3, 4, 4, 4).
So one way to fix the problem is to change the shape of 'l' to (3, 4, 4, 4); if you change the code above to:
l = tf.constant(0.1, shape=[batch_size, 4, 4, 4])
everything works fine.
In general, try using tf.nn.conv2d to get a feel for what the relationship between the tensor shapes is. Since tf.nn.conv2d_transpose is its backward counterpart, it has the same relationship between input, output and filter shapes (but with the roles of the input and output reversed.)
Hope that helps!
Using padding='SAME' in tf.nn.conv2d_transpose() function may works too