Backpropagating gradients through nested tf.map_fn - tensorflow

I would like to map a TensorFlow function on each vector corresponding to the depth channel of every pixel in a matrix with dimension [batch_size, H, W, n_channels].
In other words, for every image of size H x W that I have in the batch:
I extract some feature maps F_k (whose number is n_channels) with the same size H x W (hence, the feature maps all together form a tensor of shape [H, W, n_channels]);
then, I wish to apply a custom function to the vector v_ij associated with the i-th row and j-th column of each feature map F_k, spanning the depth channel in its entirety (i.e. v_ij has dimension [1 x 1 x n_channels]). Ideally, all of this would happen in parallel.
A picture to explain the process can be found below. The only difference with the picture is that both input and output "receptive fields" have size 1x1 (apply the function to each pixel independently).
This would be similar to applying a 1x1 convolution to the matrix; however, I need to apply a more general function over the depth channel, rather than a simple sum operation.
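Concretely, here is a NumPy sketch of the computation I am after (my_custom_fun is just a placeholder name for the real function):
import numpy as np

def my_custom_fun(v):
    # placeholder: any transformation from an n_channels vector to a vector of the same size
    return v[::-1]

x = np.random.rand(2, 4, 4, 8)  # [batch_size, H, W, n_channels]
out = np.empty_like(x)
for b in range(x.shape[0]):
    for i in range(x.shape[1]):      # rows
        for j in range(x.shape[2]):  # columns
            out[b, i, j, :] = my_custom_fun(x[b, i, j, :])  # v_ij spans the depth channel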
I think tf.map_fn() could be an option and I tried the following solution, where I recursively use tf.map_fn() to access the features associated with each pixel. However, this seems sub-optimal, and most importantly it raises an error when trying to backpropagate the gradients.
Do you have any idea of the reason why this happens and how I should structure my code to avoid the error?
This is my current implementation of the function:
import tensorflow as tf
from tensorflow import layers

def apply_function_on_pixel_features(incoming):
    # at first the input is [None, H, W, n_channels]
    if len(incoming.get_shape()) > 1:
        return tf.map_fn(lambda x: apply_function_on_pixel_features(x), incoming)
    else:
        # here the input is [n_channels]
        # apply some function that performs a transformation and returns a vector of the same size
        output = my_custom_fun(incoming)  # my_custom_fun() doesn't change the shape
        return output
and the body of my code:
H = 128
W = 132
n_channels = 8
x1 = tf.placeholder(tf.float32, [None, H, W, 1])
x2 = layers.conv2d(x1, filters=n_channels, kernel_size=3, padding='same')
# now apply a function to the features vector associated to each pixel
x3 = apply_function_on_pixel_features(x2)
x4 = tf.nn.softmax(x3)
loss = cross_entropy(x4, labels)
optimizer = tf.train.AdamOptimizer(lr)
train_op = optimizer.minimize(loss) # <--- ERROR HERE!
Particularly, the error is the following:
File "/home/venvs/tensorflowGPU/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2481, in AddOp
self._AddOpInternal(op)
File "/home/venvs/tensorflowGPU/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2509, in _AddOpInternal
self._MaybeAddControlDependency(op)
File "/home/venvs/tensorflowGPU/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2547, in _MaybeAddControlDependency
op._add_control_input(self.GetControlPivot().op)
AttributeError: 'NoneType' object has no attribute 'op'
The whole error stack and the code can be found here.
Thanks for the help,
G.
Update:
Following @thushv89's suggestion, I added a possible solution to the problem. I still don't know why my previous code didn't work; any insight on this would still be very appreciated.

@gabriele, regarding having to depend on batch_size, have you tried doing it the following way? This function does not depend on batch_size. You can replace the map_fn with anything you like.
def apply_function_on_pixel_features(incoming):
    # get input shape:
    _, W, H, C = incoming.get_shape().as_list()
    incoming_flat = tf.reshape(incoming, shape=[-1, C])

    # apply function on every vector of shape [1, C]
    out_matrix = tf.map_fn(lambda x: x + 1, incoming_flat)  # dimension remains unchanged

    # go back to the input shape [None, W, H, C]
    out_matrix = tf.reshape(out_matrix, shape=[-1, W, H, C])
    return out_matrix
The full code I tested is below.
import numpy as np
import tensorflow as tf
from tensorflow.keras.losses import categorical_crossentropy
def apply_function_on_pixel_features(incoming):
    # get input shape:
    _, W, H, C = incoming.get_shape().as_list()
    incoming_flat = tf.reshape(incoming, shape=[-1, C])

    # apply function on every vector of shape [1, C]
    out_matrix = tf.map_fn(lambda x: x + 1, incoming_flat)  # dimension remains unchanged

    # go back to the input shape [None, W, H, C]
    out_matrix = tf.reshape(out_matrix, shape=[-1, W, H, C])
    return out_matrix
H = 32
W = 32

x1 = tf.placeholder(tf.float32, [None, H, W, 1])
labels = tf.placeholder(tf.float32, [None, 10])
x2 = tf.layers.conv2d(x1, filters=1, kernel_size=3, padding='same')

# now apply a function to the features vector associated to each pixel
x3 = apply_function_on_pixel_features(x2)

x4 = tf.layers.flatten(x3)
x4 = tf.layers.dense(x4, units=10, activation='softmax')
loss = categorical_crossentropy(labels, x4)

optimizer = tf.train.AdamOptimizer(0.001)
train_op = optimizer.minimize(loss)

x = np.zeros(shape=(10, H, W, 1))
y = np.random.choice([0, 1], size=(10, 10))

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    sess.run(train_op, feed_dict={x1: x, labels: y})

Following @thushv89's suggestion, I reshaped the array, applied the function and then reshaped it back (to avoid the tf.map_fn recursion). I still don't know exactly why the previous code didn't work, but the current implementation lets the gradients propagate back to the previous layers. I'll leave it below, for whoever might be interested:
def apply_function_on_pixel_features(incoming, batch_size):
    # get input shape:
    _, W, H, C = incoming.get_shape().as_list()
    incoming_flat = tf.reshape(incoming, shape=[batch_size * W * H, C])

    # apply function on every vector of shape [1, C]
    out_matrix = my_custom_fun(incoming_flat)  # dimension remains unchanged

    # go back to the input shape [None, W, H, C]
    out_shape = tf.convert_to_tensor([batch_size, W, H, C])
    out_matrix = tf.reshape(out_matrix, shape=out_shape)
    return out_matrix
Notice that I now needed to pass the batch size to reshape the tensor correctly, because TensorFlow would complain if I gave None or -1 as a dimension.
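For completeness, here is a sketch of a variant that avoids passing batch_size explicitly by reading the runtime shape with tf.shape (my_custom_fun is still a placeholder for the actual shape-preserving function):
def apply_function_on_pixel_features_dynamic(incoming):
    # static channel count; batch and spatial sizes are read at runtime
    C = incoming.get_shape().as_list()[-1]
    dyn_shape = tf.shape(incoming)  # [batch_size, W, H, C] as a runtime tensor

    incoming_flat = tf.reshape(incoming, shape=[-1, C])
    out_matrix = my_custom_fun(incoming_flat)  # placeholder, dimension unchanged

    # restore the original (possibly unknown) shape at runtime
    return tf.reshape(out_matrix, shape=dyn_shape)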
Any comments and insight on the above code would still be very appreciated.

Related

Colab resources and Self-Attention (OOM when allocating tensor)

I'm trying to implement a self-attention GAN on Google Colab with Keras. When I test my attention layer I get an OOM error. So, am I doing something wrong with the matrix multiplications, or is it just too expensive an operation for the Colab GPU at higher resolutions (> 64 x 64)?
def hw_flatten(x):
    # input shape x: [BATCH, HEIGHT, WIDTH, CHANNELS]
    # flatten the feature volume across the width and height dimensions
    x = Reshape((x.shape[1] * x.shape[2], x.shape[3]))(x)  # in the Reshape layer, batch is implicit
    return x  # returns [BATCH, W*H, CHANNELS]

def matmul(couple_t):
    tensor_1 = couple_t[0]
    tensor_2 = couple_t[1]
    transpose = couple_t[2]  # boolean
    return tf.matmul(tensor_1, tensor_2, transpose_b=transpose)

class SelfAttention(Layer):
    def __init__(self, ch, **kwargs):
        super(SelfAttention, self).__init__(**kwargs)
        self.ch = ch

    def attentionMap(self, feature_map):
        f = Conv2D(filters=feature_map.shape[3] // 8, kernel_size=(1, 1), strides=1, padding='same')(feature_map)  # [bs, h, w, c']
        g = Conv2D(filters=feature_map.shape[3] // 8, kernel_size=(1, 1), strides=1, padding='same')(feature_map)  # [bs, h, w, c']
        h = Conv2D(filters=feature_map.shape[3], kernel_size=(1, 1), strides=1, padding='same')(feature_map)  # [bs, h, w, c]
        s = Lambda(matmul)([hw_flatten(g), hw_flatten(f), True])  # [bs, N, N]
        beta = Activation("softmax")(s)
        o = Lambda(matmul)([beta, hw_flatten(h), False])  # [bs, N, C]
        gamma = self.add_weight(name='gamma', shape=[1], initializer='zeros', trainable=True)
        o = Reshape(feature_map.shape[1:])(o)  # [bs, h, w, C]
        x = gamma * o + feature_map
        print(x.shape)
        return x
This is the test:
tensor = np.random.normal(0, 1, size=(32, 64, 64, 512)).astype('float64')
attention_o = SelfAttention(64)
a = attention_o.attentionMap(tensor)
This is the error:
OOM when allocating tensor with shape[32,4096,4096] and type double
Thank you so much for your Attention :D
Your tensor of size 32x4096x4096 has 536,870,912 entries! Multiplied by the number of bytes in a double (8), that is 4,294,967,296 bytes, i.e. 4 GiB for this single tensor, and the softmax and its gradients need several more buffers of the same size, so it will not fit in the Colab GPU's memory. You might want to add a few max pooling layers to reduce the dimensionality of your data before applying self-attention (and use float32 instead of float64 to halve the memory).
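For concreteness, the memory arithmetic (plain Python, using binary gigabytes):
entries = 32 * 4096 * 4096    # attention map shape [batch, N, N] with N = 64*64
bytes_total = entries * 8     # float64 = 8 bytes per entry
print(entries)                # 536870912
print(bytes_total / 2**30)    # 4.0 -> 4 GiB for this single tensor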

Randomly selecting elements from a tensor in Tensorflow

Given a tensor whose shape is Nx2, how is it possible to select k elements from this tensor akin to np.random.choice (with equal probability)? Another point to note is that the value of N changes dynamically during execution, meaning I'm dealing with a dynamically-sized tensor.
You can just wrap np.random.choice as a tf.py_func. See for example this answer. In your case, you need to flatten your tensor so it is an array of length 2*N:
import numpy as np
import tensorflow as tf

a = tf.placeholder(tf.float32, shape=[None, 2])
size = tf.placeholder(tf.int32)
y = tf.py_func(lambda x, s: np.random.choice(x.reshape(-1), s), [a, size], tf.float32)

with tf.Session() as sess:
    print(sess.run(y, {a: np.random.rand(4, 2), size: 5}))
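If you'd rather stay inside the graph (tf.py_func runs in the Python interpreter and cannot be serialized with the graph), here is a sketch of a pure-TensorFlow alternative. Note that it samples without replacement, unlike np.random.choice's default:
import numpy as np
import tensorflow as tf

a = tf.placeholder(tf.float32, shape=[None, 2])
size = tf.placeholder(tf.int32)

flat = tf.reshape(a, [-1])           # length 2*N, with N dynamic
shuffled = tf.random_shuffle(flat)   # uniform random permutation
y = tf.slice(shuffled, [0], [size])  # take the first k elements

with tf.Session() as sess:
    print(sess.run(y, {a: np.random.rand(4, 2), size: 5}))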
I had a similar problem, where I wanted to subsample points from a point cloud for an implementation of PointNet. My input dimension was [None, 2048, 3], and I was subsampling down to [None, 1024, 3] using the following custom layer:
class SubSample(Layer):
    def __init__(self, num_samples):
        super(SubSample, self).__init__()
        self.num_samples = num_samples

    def build(self, input_shape):
        self.shape = input_shape  # [None, 2048, 3]

    def call(self, inputs, training=None):
        k = tf.random.uniform([self.shape[1],])    # [2048,]
        bl = tf.argsort(k) < self.num_samples      # [2048,]
        res = tf.boolean_mask(inputs, bl, axis=1)  # [None, 1024, 3]
        # Reshape needed so that the channel shape is passed when `run_eagerly=False`,
        # otherwise it returns `None`
        return tf.reshape(res, (-1, self.num_samples, self.shape[-1]))  # [None, 1024, 3]
SubSample(1024)(tf.random.uniform((64,2048,3))).shape
>>> TensorShape([64, 1024, 3])
As far as I can tell, this works for TensorFlow 2.5.0
Note that this isn't directly an answer to the question at hand, but the answer that I was looking for when I stumbled across this question.

How to optimize a variable with dynamic shape?

In the following code, m0 is a constant with shape (3,1), but its shape changes inside the while loop.
After the loop, TensorFlow therefore doesn't know its shape, so I use set_shape to set it to the correct shape.
However, when I run it through the optimizer (taking gradients), it raises an error:
Incompatible shapes between op input and calculated input gradient. Forward operation: while_29/Enter_1. Input index: 0. Original input shape: (3, 1). Calculated input gradient shape: (15, 1)
It seems the gradient computation still treats the shape as (3,1), while set_shape changed it to (15,1). Could anyone please tell me how to fix this?
sess = tf.Session()

i0 = tf.constant(0)
m0 = tf.ones([3, 1])
x = tf.get_variable('www', shape=(3, 1), initializer=tf.zeros_initializer)
loop = 5

def _cond(i0, m0):
    return tf.less(i0, loop - 1)

def _res(i0, m0):
    n = tf.ones([3, 1]) + x
    m0 = tf.concat([m0, n], axis=0)
    return i0 + 1, m0

i0, m0 = tf.while_loop(
    _cond, _res, loop_vars=[i0, m0],
    shape_invariants=[i0.get_shape(), tf.TensorShape([None, 1])])
m0.set_shape([loop * 3, 1])

opt = tf.train.AdagradOptimizer(1)
grad = opt.compute_gradients(m0)

sess.run(tf.global_variables_initializer())
print(sess.run(grad))
The short answer is that your problem can be efficiently solved by creating m0 like this:
m0 = 1 + tf.tile( x, (loop,1) )
The underlying problem, however, is that you are growing m0 in a loop. Since you know the size that you want m0 to take, if you really must use a while_loop then you should use a TensorArray. Something like this:
import numpy
import tensorflow as tf

def mystack(x, n):
    loop_vars = [
        tf.constant(0, tf.int32),
        tf.TensorArray(x.dtype, size=n),
    ]
    _, fx = tf.while_loop(
        lambda j, _: j < n,
        lambda j, result: (j + 1, result.write(j, 1 + x)),
        loop_vars
    )
    return tf.reshape(fx.stack(), (-1, 1))

x = tf.constant(numpy.random.randn(3, 1), tf.float32)
loop = 5
m = mystack(x, loop)

with tf.Session() as sess:
    print(sess.run(m).shape)
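As a quick sanity check that gradients now flow through the TensorArray version (a sketch reusing m and x from above; tf.gradients also works with respect to a constant like x):
# each element of x contributes to `loop` rows of m, so d(sum m)/dx = loop
g = tf.gradients(tf.reduce_sum(m), x)[0]
with tf.Session() as sess:
    print(sess.run(g))  # [[5.], [5.], [5.]]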

How to shape a Tensor array?

I have lately been vexed by the following error message:
ValueError: Cannot feed value of shape (2455040,) for Tensor 'Placeholder:0', which has shape '(2455040, ?)'
Which is being produced from running the following code:
NUMCLASSES = 16
NUMPIXELS = 959 * 640 * 4

# set up to feed an array of images [images, size_of_image]
x = tf.placeholder(tf.float32, [NUMPIXELS, None])
....deletia....
# Define loss and optimizer..why is this 2d?
y_ = tf.placeholder(tf.float32, [None, NUMCLASSES])
sess = tf.InteractiveSession()
tf.global_variables_initializer().run(session=sess)
tl = get_tensor_list()

for f, n in tl:
    str = '/users/me/downloads/train/' + f
    mm = Image.open(str)
    mm = mm.convert('F')
    mma = np.array(mm)
    i = mma.flatten()  # now this is an array of floats of size NUMPIXELS
    sess.run(train_step, feed_dict={x: i, y_: n})  # <<DEATH
Somehow, that array is getting a shape that tf does not like [(x,) when it wants (x,?)]. How to satisfy the tensorgods in this case? The tensor must be what it must be for other mathematical reasons not discussed.
Reshaping the array might help:
i = mma.flatten().reshape((NUMPIXELS,1))
The error happens because the two tensors have different ranks: a tensor with shape (2455040,) has rank 1, while a tensor with shape (2455040, ?) has rank 2.
You can do this:
x = tf.placeholder(tf.float32, [None])
x = tf.reshape(x, [NUMPIXELS,-1])
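A minimal check of that fix, with a toy NUMPIXELS (a sketch; in the question the real value stays 959*640*4):
import numpy as np
import tensorflow as tf

NUMPIXELS = 12  # toy value for the sketch
x_in = tf.placeholder(tf.float32, [None])
x = tf.reshape(x_in, [NUMPIXELS, -1])  # rank 2: a flat feed becomes (NUMPIXELS, 1)

with tf.Session() as sess:
    out = sess.run(x, feed_dict={x_in: np.zeros(NUMPIXELS, dtype=np.float32)})
    print(out.shape)  # (12, 1)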

What does it mean "inputs must be a list"?

The code below gives me "inputs must be a list", at this line:
outputs, states = rnn.rnn(lstm_cell, x, dtype=tf.float32)
When I define the placeholder for input x, I set its shape as [None, None], which I think of as a 2-dimensional array. However, the code keeps requiring x to be a list type.
Below, I have attached all of my code before training. This code is inside a method of a class.
x = tf.placeholder("float",[None,None])
y = tf.placeholder("float",[None])
lstm_cell = rnn_cell.BasicLSTMCell(self.n_hidden, forget_bias=1.0)
outputs, states = rnn.rnn(lstm_cell, x, dtype=tf.float32)
pred = tf.matmul(outpus[-1], self.weights['out']) + self.biases['out']
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred,y))
optimizer = tf.train.AdamOptimizer(learning_rate=self.learning_rate).minimize(cost)
correct_pred = tf.equal(tf.argmax(pred,1), tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
init = tf.initialize_all_variables()
self.sess = tf.Session()
self.sess.run(init)
Additionally, the practical inputs will be word sequences and labels, formed as x=[["aaa","aaa","aaa"],["bbb","bbb"]], y=["c1","c2"].
There, the first element array of x is labeled with "c1" and the second with "c2". In particular, the size of each element array of x is not fixed.
As stated by the documentation, the argument inputs of the function tf.nn.rnn() is:
inputs: A length T list of inputs, each a Tensor of shape [batch_size, input_size], or a nested tuple of such elements.
In your code, the argument inputs is x, a Tensor placeholder of shape [None, None]. In order for your code to work, x must be a list of T tensors of shape [None, input_length].
The following code generates a list of tensors as inputs and therefore the function tf.nn.rnn works.
import tensorflow as tf

x = tf.placeholder("float", [None, 16])
y = tf.placeholder("float", [None])
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(256, forget_bias=1.0)

inputs = []
for t in range(10):
    inputs.append(x)
print(len(inputs))

outputs, states = tf.nn.rnn(lstm_cell, inputs, dtype=tf.float32)
pred = tf.matmul(outputs[-1], self.weights['out']) + self.biases['out']
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))
optimizer = tf.train.AdamOptimizer(learning_rate=self.learning_rate).minimize(cost)
correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
init = tf.initialize_all_variables()
self.sess = tf.Session()
self.sess.run(init)
Note how the placeholder x has a defined shape of [None, input_size]. It won't work with a shape [None, None] because the first dimension is the batch_size, which can be None, but the second dimension is the size of each item in the input sequence, and that value can't be None.
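Regarding the variable-length sequences mentioned in the question, a common approach is to pad every sequence to a fixed max_len and pass the true lengths via the sequence_length argument, so the padded steps are ignored. A sketch (depending on your TensorFlow version, tf.nn.rnn may be called tf.nn.static_rnn and tf.unstack may be tf.unpack):
import tensorflow as tf

max_len, input_size, n_hidden = 10, 16, 256

# sequences padded to max_len; seq_len carries the true length of each one
x = tf.placeholder(tf.float32, [None, max_len, input_size])
seq_len = tf.placeholder(tf.int32, [None])

# turn [batch, max_len, input_size] into a length-max_len list of [batch, input_size]
inputs = tf.unstack(x, axis=1)

lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)
outputs, states = tf.nn.rnn(lstm_cell, inputs, dtype=tf.float32,
                            sequence_length=seq_len)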