Colab resources and Self-Attention (OOM when allocating tensor) - tensorflow

I'm trying to implement a Self-Attention GAN with Keras on Google Colab. When I test my attention layer I get an OOM error. Am I doing something wrong with the matrix multiplications, or is this operation simply too expensive for the Colab GPU at higher resolutions (> 64 x 64)?
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Layer, Conv2D, Reshape, Lambda, Activation

def hw_flatten(x):
    # Input shape x: [BATCH, HEIGHT, WIDTH, CHANNELS]
    # Flatten the feature volume across the width and height dimensions
    x = Reshape((x.shape[1] * x.shape[2], x.shape[3]))(x)  # in the Reshape layer the batch dimension is implicit
    return x  # returns [BATCH, H*W, CHANNELS]

def matmul(couple_t):
    tensor_1 = couple_t[0]
    tensor_2 = couple_t[1]
    transpose = couple_t[2]  # boolean
    return tf.matmul(tensor_1, tensor_2, transpose_b=transpose)

class SelfAttention(Layer):
    def __init__(self, ch, **kwargs):
        super(SelfAttention, self).__init__(**kwargs)
        self.ch = ch

    def attentionMap(self, feature_map):
        f = Conv2D(filters=feature_map.shape[3] // 8, kernel_size=(1, 1), strides=1, padding='same')(feature_map)  # [bs, h, w, c']
        g = Conv2D(filters=feature_map.shape[3] // 8, kernel_size=(1, 1), strides=1, padding='same')(feature_map)  # [bs, h, w, c']
        h = Conv2D(filters=feature_map.shape[3], kernel_size=(1, 1), strides=1, padding='same')(feature_map)  # [bs, h, w, c]
        s = Lambda(matmul)([hw_flatten(g), hw_flatten(f), True])  # [bs, N, N]
        beta = Activation("softmax")(s)
        o = Lambda(matmul)([beta, hw_flatten(h), False])  # [bs, N, C]
        gamma = self.add_weight(name='gamma', shape=[1], initializer='zeros', trainable=True)
        o = Reshape(feature_map.shape[1:])(o)  # [bs, h, w, C]
        x = gamma * o + feature_map
        print(x.shape)
        return x
This is the test:
tensor = np.random.normal(0, 1, size=(32, 64, 64, 512)).astype('float64')
attention_o = SelfAttention(64)
a = attention_o.attentionMap(tensor)
This is the error:
OOM when allocating tensor with shape[32,4096,4096] and type double
Thank you so much for your Attention :D

Your tensor of shape 32x4096x4096 has 536,870,912 entries. Multiplied by the number of bytes in a double (8), that is roughly 4.3 GB for a single attention map; the softmax output, the second matmul and the gradients need tensors of the same size, so the attention block alone quickly exhausts a Colab GPU's memory. Switching from float64 to float32 halves the cost, but the real problem is the N x N attention map with N = 64*64 = 4096. You might want to add in a few max pooling layers to reduce the dimensionality of your data before applying self-attention.
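As an illustration (a sketch, not from the original answer; the pooling call below is only one possible way to shrink the map), you can estimate the size of the [bs, N, N] attention map and downsample the feature map before attention:

import numpy as np
from tensorflow.keras.layers import MaxPooling2D

# rough memory estimate for the attention map
bs, h, w = 32, 64, 64
N = h * w
print(bs * N * N * 8 / 1024**3, "GiB in float64")  # ~4.0 GiB
print(bs * N * N * 4 / 1024**3, "GiB in float32")  # ~2.0 GiB

# possible mitigation: pool before attention so N becomes (h/2)*(w/2) = 1024,
# shrinking the attention map by a factor of 16
# pooled = MaxPooling2D(pool_size=2)(feature_map)
# attended = SelfAttention(64).attentionMap(pooled)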

Related

How to implement style based recalibration module (SRM) - Attention mechanism

I am using a pre-trained VGG16 and applying a style-based recalibration module (SRM) after the last fully-connected layer, but it doesn't work. Could you please explain what I'm doing wrong? I've attached my code below for reference:
import tensorflow as tf

def SRM_block(x, channels, use_bias=False, is_training=True, scope='srm_block'):
    with tf.compat.v1.variable_scope(scope):
        bs, h, w, c = x.get_shape().as_list()  # c = channels
        x = tf.reshape(x, shape=[bs, -1, c])  # [bs, h*w, c]
        x_mean, x_var = tf.nn.moments(x, axes=1, keep_dims=True)  # [bs, 1, c]
        x_std = tf.sqrt(x_var + 1e-5)
        t = tf.concat([x_mean, x_std], axis=1)  # [bs, 2, c]
        z = tf.layers.conv1d(t, channels, kernel_size=2, strides=1, use_bias=use_bias)
        z = tf.layers.batch_normalization(z, momentum=0.9, epsilon=1e-05, center=True, scale=True, training=is_training, name=scope)
        z = tf.contrib.layers.batch_norm(z, decay=0.9, epsilon=1e-05, center=True, scale=True, updates_collections=None, is_training=is_training, scope=scope)
        g = tf.sigmoid(z)
        x = tf.reshape(x * g, shape=[bs, h, w, c])
        return x
Pretrained Model
vgg16_model = VGG16(include_top=False, input_shape=(SIZE,SIZE,3), weights='imagenet')
vgg16_model.trainable = False # Freeze the Weights
Adding custom Layers
x = vgg16_model.output
x = SRM_block(x, 512)
x = GlobalAveragePooling2D()(x)
x = Dense(256, activation="relu")(x)
x = Dropout(0.2)(x)
predictions = Dense(NUM_CLASSES, activation="softmax")(x)
The error:
TypeError: Exception encountered when calling layer "tf.reshape" (type TFOpLambda).
Failed to convert elements of [None, -1, 512] to Tensor. Consider casting elements to a supported type. See https://www.tensorflow.org/api_docs/python/tf/dtypes for supported TF dtypes.
Call arguments received by layer "tf.reshape" (type TFOpLambda): tensor=tf.Tensor(shape=(None, 7, 7, 512), dtype=float32), shape=['None', '-1', '512'], name=None

Backpropagating gradients through nested tf.map_fn

I would like to map a TensorFlow function on each vector corresponding to the depth channel of every pixel in a matrix with dimension [batch_size, H, W, n_channels].
In other words, for every image of size H x W that I have in the batch:
I extract some feature maps F_k (n_channels of them) with the same size H x W (hence, the feature maps together form a tensor of shape [H, W, n_channels]);
then, I wish to apply a custom function to the vector v_ij associated with the i-th row and j-th column of each feature map F_k, i.e. a function that looks at the depth channel in its entirety (v_ij has dimension [1 x 1 x n_channels]). Ideally, all of this would happen in parallel.
A picture to explain the process can be found below. The only difference with the picture is that both input and output "receptive fields" have size 1x1 (apply the function to each pixel independently).
This would be similar to applying a 1x1 convolution to the matrix; however, I need to apply a more general function over the depth channel, rather than a simple sum operation.
I think tf.map_fn() could be an option and I tried the following solution, where I recursively use tf.map_fn() to access the features associated with each pixel. However, this seems sub-optimal, and most importantly it raises an error when trying to backpropagate the gradients.
Do you have any idea of the reason why this happens and how I should structure my code to avoid the error?
This is my current implementation of the function:
import tensorflow as tf
from tensorflow import layers

def apply_function_on_pixel_features(incoming):
    # at first the input is [None, W, H, n_channels]
    if len(incoming.get_shape()) > 1:
        return tf.map_fn(lambda x: apply_function_on_pixel_features(x), incoming)
    else:
        # here the input is [n_channels]
        # apply some function that performs a transformation and returns a vector of the same size
        output = my_custom_fun(incoming)  # my_custom_fun() doesn't change the shape
        return output
and the body of my code:
H = 128
W = 132
n_channels = 8
x1 = tf.placeholder(tf.float32, [None, H, W, 1])
x2 = layers.conv2d(x1, filters=n_channels, kernel_size=3, padding='same')
# now apply a function to the features vector associated to each pixel
x3 = apply_function_on_pixel_features(x2)
x4 = tf.nn.softmax(x3)
loss = cross_entropy(x4, labels)
optimizer = tf.train.AdamOptimizer(lr)
train_op = optimizer.minimize(loss) # <--- ERROR HERE!
Particularly, the error is the following:
File "/home/venvs/tensorflowGPU/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2481, in AddOp
self._AddOpInternal(op)
File "/home/venvs/tensorflowGPU/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2509, in _AddOpInternal
self._MaybeAddControlDependency(op)
File "/home/venvs/tensorflowGPU/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2547, in _MaybeAddControlDependency
op._add_control_input(self.GetControlPivot().op)
AttributeError: 'NoneType' object has no attribute 'op'
The whole error stack and the code can be found here.
Thanks for the help,
G.
Update:
Following @thushv89's suggestion, I added a possible solution to the problem. I still don't know why my previous code didn't work; any insight on this would still be very appreciated.
@gabriele, regarding having to depend on batch_size, have you tried doing it the following way? This function does not depend on batch_size. You can replace the map_fn with anything you like.
def apply_function_on_pixel_features(incoming):
    # get the input shape
    _, W, H, C = incoming.get_shape().as_list()
    incoming_flat = tf.reshape(incoming, shape=[-1, C])
    # apply the function to every vector of shape [1, C]
    out_matrix = tf.map_fn(lambda x: x + 1, incoming_flat)  # dimension remains unchanged
    # go back to the input shape [None, W, H, C]
    out_matrix = tf.reshape(out_matrix, shape=[-1, W, H, C])
    return out_matrix
The full code of what I tested is as below.
import numpy as np
import tensorflow as tf
from tensorflow.keras.losses import categorical_crossentropy

def apply_function_on_pixel_features(incoming):
    # get the input shape
    _, W, H, C = incoming.get_shape().as_list()
    incoming_flat = tf.reshape(incoming, shape=[-1, C])
    # apply the function to every vector of shape [1, C]
    out_matrix = tf.map_fn(lambda x: x + 1, incoming_flat)  # dimension remains unchanged
    # go back to the input shape [None, W, H, C]
    out_matrix = tf.reshape(out_matrix, shape=[-1, W, H, C])
    return out_matrix
H = 32
W = 32
x1 = tf.placeholder(tf.float32, [None, H, W, 1])
labels = tf.placeholder(tf.float32, [None, 10])
x2 = tf.layers.conv2d(x1, filters=1, kernel_size=3, padding='same')
# now apply a function to the features vector associated to each pixel
x3 = apply_function_on_pixel_features(x2)
x4 = tf.layers.flatten(x3)
x4 = tf.layers.dense(x4, units=10, activation='softmax')
loss = categorical_crossentropy(labels, x4)
optimizer = tf.train.AdamOptimizer(0.001)
train_op = optimizer.minimize(loss)
x = np.zeros(shape=(10, H, W, 1))
y = np.random.choice([0,1], size=(10, 10))
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    sess.run(train_op, feed_dict={x1: x, labels: y})
Following @thushv89's suggestion, I reshaped the array, applied the function and then reshaped it back (so as to avoid the tf.map_fn recursion). I still don't know exactly why the previous code didn't work, but the current implementation lets the gradients propagate back to the previous layers. I'll leave it below for whoever might be interested:
def apply_function_on_pixel_features(incoming, batch_size):
    # get the input shape
    _, W, H, C = incoming.get_shape().as_list()
    incoming_flat = tf.reshape(incoming, shape=[batch_size * W * H, C])
    # apply the function to every vector of shape [1, C]
    out_matrix = my_custom_fun(incoming_flat)  # dimension remains unchanged
    # go back to the input shape [None, W, H, C]
    out_shape = tf.convert_to_tensor([batch_size, W, H, C])
    out_matrix = tf.reshape(out_matrix, shape=out_shape)
    return out_matrix
Notice that I now need to pass the batch size explicitly to reshape the tensor correctly, because TensorFlow would complain if I gave None or -1 as a dimension.
Any comments and insight on the above code would still be very appreciated.
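As a side note (a sketch, not from the original answers): the explicit batch_size argument can usually be avoided by reading the batch dimension dynamically with tf.shape, assuming my_custom_fun preserves the [M, C] shape:

def apply_function_on_pixel_features_dynamic(incoming):
    # static spatial/channel dims, dynamic batch dim
    _, W, H, C = incoming.get_shape().as_list()
    batch_size = tf.shape(incoming)[0]  # known only at run time
    incoming_flat = tf.reshape(incoming, shape=[batch_size * W * H, C])
    out_matrix = my_custom_fun(incoming_flat)  # assumed to keep the [M, C] shape
    return tf.reshape(out_matrix, shape=[batch_size, W, H, C])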

Self-Attention GAN in Keras

I'm currently considering implementing the Self-Attention GAN in Keras.
The way I'm thinking of implementing it is as follows:
def Attention(X, channels):
    def hw_flatten(x):
        return np.reshape(x, (x.shape[0], -1, x.shape[-1]))

    f = Conv2D(channels // 8, kernel_size=1, strides=1, padding='same')(X)  # [bs, h, w, c']
    g = Conv2D(channels // 8, kernel_size=1, strides=1, padding='same')(X)  # [bs, h, w, c']
    h = Conv2D(channels, kernel_size=1, strides=1, padding='same')(X)       # [bs, h, w, c]

    # N = h * w
    flatten_g = hw_flatten(g)
    flatten_f = hw_flatten(f)
    s = np.matmul(flatten_g, flatten_f.reshape((flatten_f.shape[0], flatten_f.shape[-1], -1)))  # [bs, N, N]
    beta = softmax(s, axis=-1)  # attention map
    flatten_h = hw_flatten(h)   # [bs, N, C]
    o = np.matmul(beta, flatten_h)  # [bs, N, C]
    gamma = 0
    o = np.reshape(o, X.shape)  # [bs, h, w, C]
    y = gamma * o + X
    return y
But I have no idea how to add a trainable scalar gamma as described in the paper: SAGAN
I also hope someone can give me some ideas on how to initialize a trainable Keras scalar.
EDIT:
My implementation is now:
class Attention(Layer):
    def __init__(self, ch, **kwargs):
        super(Attention, self).__init__(**kwargs)
        self.channels = ch
        self.filters_f_g = self.channels // 8
        self.filters_h = self.channels

    def build(self, input_shape):
        kernel_shape_f_g = (1, 1) + (self.channels, self.filters_f_g)
        print(kernel_shape_f_g)
        kernel_shape_h = (1, 1) + (self.channels, self.filters_h)

        # Create the trainable weight variables for this layer:
        self.gamma = self.add_weight(name='gamma', shape=[1], initializer='zeros', trainable=True)
        self.kernel_f = self.add_weight(shape=kernel_shape_f_g,
                                        initializer='glorot_uniform',
                                        name='kernel_f')
        self.kernel_g = self.add_weight(shape=kernel_shape_f_g,
                                        initializer='glorot_uniform',
                                        name='kernel_g')
        self.kernel_h = self.add_weight(shape=kernel_shape_h,
                                        initializer='glorot_uniform',
                                        name='kernel_h')
        self.bias_f = self.add_weight(shape=(self.filters_f_g,),
                                      initializer='zeros',
                                      name='bias_f')
        self.bias_g = self.add_weight(shape=(self.filters_f_g,),
                                      initializer='zeros',
                                      name='bias_g')
        self.bias_h = self.add_weight(shape=(self.filters_h,),
                                      initializer='zeros',
                                      name='bias_h')
        super(Attention, self).build(input_shape)
        # Set the input spec
        self.input_spec = InputSpec(ndim=4, axes={3: input_shape[-1]})
        self.built = True

    def call(self, x):
        def hw_flatten(x):
            return K.reshape(x, shape=[K.shape(x)[0], K.shape(x)[1] * K.shape(x)[2], K.shape(x)[-1]])

        f = K.conv2d(x, kernel=self.kernel_f, strides=(1, 1), padding='same')  # [bs, h, w, c']
        f = K.bias_add(f, self.bias_f)
        g = K.conv2d(x, kernel=self.kernel_g, strides=(1, 1), padding='same')  # [bs, h, w, c']
        g = K.bias_add(g, self.bias_g)
        h = K.conv2d(x, kernel=self.kernel_h, strides=(1, 1), padding='same')  # [bs, h, w, c]
        h = K.bias_add(h, self.bias_h)

        s = tf.matmul(hw_flatten(g), hw_flatten(f), transpose_b=True)  # [bs, N, N]
        beta = K.softmax(s, axis=-1)  # attention map
        o = K.batch_dot(beta, hw_flatten(h))  # [bs, N, C]
        o = K.reshape(o, shape=K.shape(x))    # [bs, h, w, C]
        x = self.gamma * o + x
        return x

    def compute_output_shape(self, input_shape):
        return input_shape
There are several problems with the modifications you made to the original code:
You cannot use numpy operations in the middle of your Keras/TF graph. First, because numpy will try to operate directly, while the input tensors are actually evaluated / receive their values only at graph runtime. Second, because Keras/TF won't be able to back-propagate through non-Keras/TF operations.
You should replace the original TensorFlow operations with their keras or keras.backend equivalents instead (e.g. tf.matmul() with keras.backend.batch_dot(), tf.nn.softmax() with keras.backend.softmax(), etc.).
You are mixing Keras layers (e.g. Conv2D) and Keras operations (e.g. np/keras.backend.reshape). Keras operations should be wrapped in a Lambda layer to be used alongside layers.
Since this custom layer has a trainable parameter (gamma), you would need to write your own custom layer, e.g.:
from keras import backend as K
from keras.engine.topology import Layer

class AttentionLayer(Layer):
    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim
        super(AttentionLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        # Create a trainable weight variable for this layer:
        self.gamma = self.add_weight(name='gamma', shape=[1], initializer='uniform', trainable=True)
        super(AttentionLayer, self).build(input_shape)

    def call(self, x):
        channels = K.int_shape(x)[-1]
        x = activation(x, channels)  # use the TF implementation, or reimplement with Keras operations
        return x

    def compute_output_shape(self, input_shape):
        return input_shape
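For completeness, a minimal usage sketch (not part of the original answer; it assumes the attention computation from the question has been filled into call()):

from keras.layers import Input, Conv2D
from keras.models import Model

inp = Input(shape=(64, 64, 3))
feat = Conv2D(64, kernel_size=1, padding='same')(inp)
att = AttentionLayer(output_dim=64)(feat)  # gamma is created in build() and trained with the rest of the model
model = Model(inp, att)
model.summary()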

ValueError: Cannot feed value of shape (128, 28, 28) for Tensor 'Placeholder:0', which has shape '(?, 784)'

I am new to TensorFlow and machine learning, and I am trying out a CNN on my own input data using TensorFlow, but I am getting the error attached below.
The image size is 28x28 and there are 15 labels.
I don't understand the numpy reshaping in this script, or the error.
Help is highly appreciated.
import tensorflow as tf
import os
import skimage.data
import numpy as np
import random

def load_data(data_directory):
    directories = [d for d in os.listdir(data_directory)
                   if os.path.isdir(os.path.join(data_directory, d))]
    labels = []
    images = []
    for d in directories:
        label_directory = os.path.join(data_directory, d)
        file_names = [os.path.join(label_directory, f)
                      for f in os.listdir(label_directory)
                      if f.endswith(".jpg")]
        for f in file_names:
            images.append(skimage.data.imread(f))
            labels.append(d)
        print(str(d) + ' Completed')
    return images, labels

ROOT_PATH = "H:\Testing\TrainingData"
train_data_directory = os.path.join(ROOT_PATH, "Training")
test_data_directory = os.path.join(ROOT_PATH, "Testing")

print('Loading Data...')
images, labels = load_data(train_data_directory)
print('Data has been Loaded')

n_classes = 15
training_examples = 10500
test_examples = 4500
batch_size = 128

x = tf.placeholder('float', [None, 784])
y = tf.placeholder('float')

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def maxpool2d(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

def neural_network_model(x):
    weights = {'W_Conv1': tf.Variable(tf.random_normal([5, 5, 1, 32])),
               'W_Conv2': tf.Variable(tf.random_normal([5, 5, 32, 64])),
               'W_FC': tf.Variable(tf.random_normal([7*7*64, 1024])),
               'Output': tf.Variable(tf.random_normal([1024, n_classes]))}
    biases = {'B_Conv1': tf.Variable(tf.random_normal([32])),
              'B_Conv2': tf.Variable(tf.random_normal([64])),
              'B_FC': tf.Variable(tf.random_normal([1024])),
              'Output': tf.Variable(tf.random_normal([n_classes]))}
    x = tf.reshape(x, shape=[-1, 28, 28, 1])
    conv1 = conv2d(x, weights['W_Conv1'])
    conv1 = maxpool2d(conv1)
    conv2 = conv2d(conv1, weights['W_Conv2'])
    conv2 = maxpool2d(conv2)
    fc = tf.reshape(conv2, [-1, 7*7*64])
    fc = tf.nn.relu(tf.matmul(fc, weights['W_FC']) + biases['B_FC'])
    output = tf.matmul(fc, weights['Output']) + biases['Output']
    return output

def next_batch(num, data, labels):
    idx = np.arange(0, len(data))
    np.random.shuffle(idx)
    idx = idx[:num]
    data_shuffle = [data[i] for i in idx]
    labels_shuffle = [labels[i] for i in idx]
    return np.asarray(data_shuffle), np.asarray(labels_shuffle)

def train_neural_network(x):
    prediction = neural_network_model(x)
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=y))
    optimizer = tf.train.AdamOptimizer().minimize(cost)
    hm_epochs = 10
    with tf.Session() as sess:
        # OLD:
        # sess.run(tf.initialize_all_variables())
        # NEW:
        sess.run(tf.global_variables_initializer())
        for epoch in range(hm_epochs):
            epoch_loss = 0
            for _ in range(int(training_examples / batch_size)):
                epoch_x, epoch_y = next_batch(batch_size, images, labels)
                _, c = sess.run([optimizer, cost], feed_dict={x: epoch_x, y: epoch_y})
                epoch_loss += c
            print('Epoch', epoch, 'completed out of', hm_epochs, 'loss:', epoch_loss)
        correct = tf.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))
        accuracy = tf.reduce_mean(tf.cast(correct, 'float'))
        print('Accuracy:', accuracy.eval({x: images, y: labels}))

print('Training Neural Network...')
train_neural_network(x)
What am I doing wrong? What needs to be fixed, and how do I fix the shape of the numpy array?
If you look closely, you'll see that you have two x placeholders:
x = tf.placeholder('float', [None, 784]) # global
...
x = tf.reshape(x, shape=[-1,28,28,1]) # in neural_network_model
One of them is in the function scope, hence not visible in train_neural_network, so TensorFlow takes the one with shape [?, 784]. You should get rid of one of them.
Also note that your training data has rank 3, i.e. shape [batch_size, 28, 28], so it's not directly compatible with either of those placeholders.
To feed it into the first x, take epoch_x.reshape([-1, 784]). For the second placeholder (once you make it visible), take epoch_x.reshape([-1, 28, 28, 1]).
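A minimal sketch of the first option (an assumption that the images load as 28x28 grayscale arrays; variable names follow the question's code):

# keep only the global placeholder
x = tf.placeholder('float', [None, 784])

# inside neural_network_model, reshape into an image tensor without shadowing the global x:
#   x_image = tf.reshape(x, shape=[-1, 28, 28, 1])

# when feeding, flatten each [batch_size, 28, 28] batch to [batch_size, 784]:
epoch_x, epoch_y = next_batch(batch_size, images, labels)
_, c = sess.run([optimizer, cost], feed_dict={x: epoch_x.reshape([-1, 784]), y: epoch_y})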

Which is correct shape of linear regression in my model?

I am designing a regression network to predict a person's weight, ranging from 10 to 100 kg. My dataset has 50 training samples:
Vector 1: 1024x1 corresponding to 40kg
Vector 2: 1024x1 corresponding to 20kg
Vector 3: 1024x1 corresponding to 40kg
...
Vector 50: 1024x1 corresponding to 30kg
Hence, my dataset size is 1024x50 and the label size is 1x50. If I design a simple linear regression like y = xW + b, the sizes of W and b will be:
W is 1024x1
b is 1x50
Am I right?
This is my TensorFlow code, but it gives a wrong prediction:
# Training Data
train_X = ...  # shape 1024 x 50
train_Y = ...  # shape 1 x 50
n_samples = 50
learning_rate = 0.0001
training_epochs = 1000
display_step = 50

# tf Graph Input
X = tf.placeholder("float")
Y = tf.placeholder("float")

# Set model weights
W = tf.Variable(tf.truncated_normal([1024, 1], mean=0.0, stddev=1.0, dtype=tf.float32))
b = tf.Variable(tf.zeros(1, dtype=tf.float32))

# Construct a linear model
pred = tf.add(tf.multiply(X, W), b)

# Mean squared error
cost = tf.reduce_sum(tf.pow(pred - Y, 2)) / (2 * n_samples)
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
init = tf.global_variables_initializer()

# Start training
with tf.Session() as sess:
    # Run the initializer
    sess.run(init)
    # Fit all training data
    for epoch in range(training_epochs):
        for (x, y) in zip(train_X, train_Y):
            sess.run(optimizer, feed_dict={X: x, Y: y})
        # Display logs per epoch step
        if (epoch + 1) % display_step == 0:
            c = sess.run(cost, feed_dict={X: train_X, Y: train_Y})
            print("Epoch:", '%04d' % (epoch + 1), "cost=", "{:.9f}".format(c),
                  "W=", sess.run(W), "b=", sess.run(b))
    print("Optimization Finished!")
print("Optimization Finished!")
W is 1024x1
b is 1x50
Am I right?
No. The shape of W is correct, but b should be a scalar (a 1x1 matrix): with a 1x50 bias you would have one trainable bias per data point, which makes no sense. However, in your code it is correctly set to size 1.
What is wrong is the matrix multiplication; your model should be:
pred = tf.matmul(X, W) + b # you will have to transpose your train_X
tf.multiply is pointwise multiplication, not matrix multiplication.
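A minimal corrected sketch (it assumes you transpose the data so samples are rows, i.e. X is [50, 1024] and Y is [50, 1]):

X = tf.placeholder(tf.float32, [None, 1024])  # one 1024-dim sample per row
Y = tf.placeholder(tf.float32, [None, 1])     # one target weight per row

W = tf.Variable(tf.truncated_normal([1024, 1], mean=0.0, stddev=1.0))
b = tf.Variable(tf.zeros([1]))                # a single scalar bias, broadcast over the batch

pred = tf.matmul(X, W) + b                            # [None, 1]
cost = tf.reduce_sum(tf.pow(pred - Y, 2)) / (2 * 50)  # same loss as before

# feed the transposed data:
# sess.run(optimizer, feed_dict={X: train_X.T, Y: train_Y.reshape(-1, 1)})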