How to share weights using tf.layers.conv2d - tensorflow

I have built an autoencoder using tf.layers.conv2d layers and would like to train it in phases: train the outer layers first, then the middle layers, and then the inner ones. I understand this is possible with tf.nn.conv2d, because the weights are declared with tf.get_variable, but I would think it should also be possible with tf.layers.conv2d.
If I enter a new variable scope different from the original graph to change the inputs to the convolutional layers (i.e. skip the inner layers during phase 1), I am not able to reuse the weights. If I do not enter a new variable scope, I am not able to freeze the weights that I don't want to train in this phase.
Basically I am trying to use the training method from Aurélien Géron here: https://github.com/ageron/handson-ml/blob/master/15_autoencoders.ipynb
Except I would like to use a CNN instead of dense layers. How can I do this?

No need to create the variables by hand. This works just as well:
import tensorflow as tf

inputs_1 = tf.placeholder(tf.float32, (None, 512, 512, 3), name='inputs_1')
inputs_2 = tf.placeholder(tf.float32, (None, 512, 512, 3), name='inputs_2')

with tf.variable_scope('conv'):
    out_1 = tf.layers.conv2d(inputs_1, 32, [3, 3], name='conv_1')
with tf.variable_scope('conv', reuse=True):
    out_2 = tf.layers.conv2d(inputs_2, 32, [3, 3], name='conv_1')

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    print(tf.trainable_variables())
If you give tf.layers.conv2d the same name, it will use the same weights (assuming reuse=True, otherwise there will be a ValueError).
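As a side note on the phased-training part of the question, a minimal, untested sketch: with shared variables you can freeze what you don't want to update in a given phase by handing the optimizer only the variables you do want to train. The loss below is a hypothetical placeholder, and the scope name just matches the snippet above:
loss = tf.reduce_mean(tf.square(out_2))  # hypothetical loss, just for illustration
# collect only the variables of the layers to train in this phase
phase_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='conv')
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss, var_list=phase_vars)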
In TensorFlow 2.0, tf.layers is replaced by Keras layers, where variables are shared by reusing the same layer object:
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation='relu',
                           input_shape=(512, 512, 3)),
])

@tf.function
def f1(x):
    return model(x)

@tf.function
def f2(x):
    return model(x)
Both f1 and f2 will use the layer with the same variables.
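A quick check of this (my addition, assuming TF 2.x eager execution): both functions produce the same output for the same input, and the model owns exactly one set of weights:
import numpy as np

x = tf.random.normal([1, 512, 512, 3])
y1 = f1(x)
y2 = f2(x)
print(len(model.trainable_variables))       # 2: the shared Conv2D kernel and bias
print(np.allclose(y1.numpy(), y2.numpy()))  # True: same weights applied to the same input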

I'd recommend setting it up a little differently. Instead of using tf.layers.conv2d, I would explicitly create the weights with calls to tf.get_variable() and then use those weights with calls to tf.nn.conv2d(). This way you don't black-box the variable creation and can reference the variables easily. It's also a good way to learn exactly what's going on in your network, since you wrote the shapes for every set of weights by hand!
Sample (untested) code:
inputs = tf.placeholder(tf.float32, (batch_size, 512, 512, 3), name='inputs')
weights = tf.get_variable(name='weights', shape=[5, 5, 3, 16], dtype=tf.float32)

with tf.variable_scope("convs"):
    hidden_layer_1 = tf.nn.conv2d(input=inputs, filter=weights,
                                  strides=[1, 1, 1, 1], padding="SAME")
with tf.variable_scope("convs", reuse=True):
    # the same shared weights applied to the input a second time
    hidden_layer_2 = tf.nn.conv2d(input=inputs, filter=weights,
                                  strides=[1, 1, 1, 1], padding="SAME")
This creates the convolutional weights and applies them twice to your input. I haven't tested this code, so there may be bugs, but this is roughly how it should look. See the references here for variable sharing and here for tf.nn.conv2d.
Hopefully that helps! I would be more thorough, but I have no idea what your code looks like.

Related

Tensorboard graph orphan layers

While building a model that includes transfer learning (from VGG-16), I encountered this strange behavior: the TensorBoard graph shows layers that are not part of the new model but were part of the old one, above the point of separation, and they are just dangling there.
When investigating further, model.summary() does not show these layers, model.get_layer("block4_conv1") can't find them either, and tf.keras.utils.plot_model doesn't show them either. But if they are not part of the graph, how does TensorBoard know about them?
To build the new model, I used the recommended method.
Model first stage:
vgg_input_model = tf.keras.applications.VGG16(weights='imagenet', include_top=False, input_tensor=x)
final_vgg_layer = vgg_input_model.get_layer("block3_pool")
input_model = tf.keras.Model(inputs=vgg_input_model.inputs, outputs=final_vgg_layer.output)
input_model.trainable = True
x = tf.keras.layers.Conv2D(512, 1, padding="same", activation='relu', name="stage0_final_conv1")(input_model.output)
x = tf.keras.layers.Conv2D(512, 1, padding="same", activation='relu', name="stage0_final_conv2")(x)
x = tf.keras.layers.Conv2D(256, 1, padding="same", activation='relu', name="stage0_final_conv3")(x)
x = tf.keras.layers.Conv2D(128, 1, padding="same", activation='relu', name="stage0_final_conv4")(x)
TF:2.1 (nightly-2.x)
PY:3.5
Tensorboard: 2.1.0a20191124
After trying multiple methods, I came to the conclusion that the recommended way is wrong: doing model_b = tf.keras.Model(inputs=model_a.inputs, outputs=model_a.get_layer("some_layer").output) will leave dangling layers from model_a.
Using tf.keras.backend.clear_session() in between may clean the Keras graph, but then TensorBoard's graph is left empty.
The best solution I found is a config+weights copy of the required model, layer by layer, rebuilding the connections in a new model. That way there is no relationship whatsoever in the Keras graph between the two models.
(This is simple for a sequential model like VGG, but might be more difficult for something like ResNet.)
Sample code:
tf.keras.backend.clear_session()
input_shape = (368, 368, 3)  # only the input shape is shared between the models

# transfer-learning model definition
input_layer_vgg = tf.keras.layers.Input(shape=input_shape)
vgg_input_model = tf.keras.applications.VGG16(weights='imagenet', include_top=False,
                                              input_tensor=input_layer_vgg)
name_last_layer = "block3_pool"  # the last layer to copy

tf.keras.backend.clear_session()  # clean the graph from the transfer-learning model

input_layer = tf.keras.layers.Input(shape=input_shape)  # input layer for the new model
x = input_layer
for layer in vgg_input_model.layers[1:]:  # copy over the layers, skipping the original input layer
    config = layer.get_config()    # get config
    weights = layer.get_weights()  # get weights
    copy_layer = type(layer).from_config(config)  # create a new layer from the config
    x = copy_layer(x)  # connect to the previous layers; required for the proper sizing
                       # of the layer, set_weights will not work without it
    copy_layer.set_weights(weights)
    if layer.name == name_last_layer:
        break

del vgg_input_model
input_model = tf.keras.Model(inputs=input_layer, outputs=x)  # create the new model;
                                                             # if needed, x can be used further down the line
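As a quick sanity check (my addition, untested; plot_model also needs pydot/graphviz installed), the rebuilt model should end at the copied block3_pool and contain no dangling VGG layers:
input_model.summary()  # stops at the copied block3_pool layer
tf.keras.utils.plot_model(input_model, to_file='input_model.png')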

Tensorflow: Get the same tensor after a series of convolution and deconvolution

I am wondering whether it is possible to end up with the same tensor after propagating it through a convolutional and then deconvolutional filter. For example:
import numpy as np
import tensorflow as tf

random_image = np.random.rand(1, 6, 6, 3)
input_image = tf.placeholder(shape=[1, 6, 6, 3], dtype=tf.float32)
conv = tf.layers.conv2d(input_image, filters=6, kernel_size=[3, 3], strides=(1, 1), data_format="channels_last")
deconv = tf.layers.conv2d_transpose(conv, filters=3, kernel_size=[3, 3], strides=(1, 1), data_format="channels_last")
sess = tf.Session()
sess.run(tf.global_variables_initializer())
print(random_image)
# Get an output which will be same as:
print(sess.run(deconv, feed_dict={input_image: random_image}))
In other words, if the generated random_image vector is, for example, [1,2,3,4,5], I want the deconv vector after convolution and deconvolution to be [1,2,3,4,5] as well.
However, I am not able to get it to work.
Looking forward to your answers!
It's possible to get some degree of visual similarity, for example by using VarianceScaling initialization, or even a completely custom initializer. But transposed convolution isn't mathematically a deconvolution, so you can't get exact equality with conv2d_transpose.
Take a look at Why isn't this Conv2d_Transpose / deconv2d returning the original input in tensorflow?
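To illustrate the "some degree of similarity" point, here is a minimal, untested sketch that reuses the question's graph and simply fits the conv/deconv pair to the single image with an MSE loss; the reconstruction can get close, but not mathematically equal:
loss = tf.reduce_mean(tf.square(deconv - input_image))
train_op = tf.train.AdamOptimizer(1e-2).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(1000):
        sess.run(train_op, feed_dict={input_image: random_image})
    # small, but not exactly zero
    print(sess.run(loss, feed_dict={input_image: random_image}))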

Logits representation in TensorFlow’s sparse_softmax_cross_entropy

I have a question regarding the sparse_softmax_cross_entropy cost function in TensorFlow.
I want to use it in a semantic segmentation context where I use an autoencoder architecture with typical convolution operations to downsample images into a feature vector. This vector is then upsampled (using conv2d_transpose and one-by-one convolutions) to create an output image.
Hence, my input consists of single-channel images with shape (1, 128, 128, 1), where the first index is the batch size and the last one the number of channels. The pixels of the image are currently either 0 or 1, so each pixel is mapped to a class. The output image of the autoencoder follows the same rules. Hence, I can't use any predefined cost function other than MSE or the previously mentioned one.
The network works fine with MSE, but I can't get it working with sparse_softmax_cross_entropy. It seems like this is the correct cost function in this context, but I'm a bit confused about the representation of the logits. The official docs say that the logits should have the shape (d_i, ..., d_n, num_classes). I tried to ignore the num_classes part, but this causes an error which says that only the interval [0, 1) is allowed. Of course, I need to specify the number of classes, which would turn the allowed interval into [0, 2) because the exclusive upper bound is obviously num_classes.
Could someone please explain how to turn my output image into the required logits?
The current code for the cost function is:
self._loss_op = tf.reduce_mean((tf.nn.sparse_softmax_cross_entropy_with_logits(labels=tf.squeeze(self._target_placeholder, [3]), logits=self._model, name="Loss")))
The squeeze removes the last dimension of the label input, giving the labels the shape [1, 128, 128]. This causes the following exception:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Received a label value of 1 which is outside the valid range of [0, 1).
Edit:
As requested, here's a minimal example to verify the behavior of the cost function in the context of fully-convolutional nets:
Constructor snippet:
def __init__(self, img_channels=1, img_width=128, img_height=128):
    ...
    self._loss_op = None
    self._learning_rate_placeholder = tf.placeholder(tf.float32, [], 'lr')
    self._input_placeholder = tf.placeholder(tf.float32, [None, img_width, img_height, img_channels], 'x')
    self._target_placeholder = tf.placeholder(tf.float32, [None, img_width, img_height, img_channels], 'y')
    self._model = self.build_model()
    self.init_optimizer()
build_model() snippet:
def build_model(self):
    with tf.variable_scope('conv1', reuse=tf.AUTO_REUSE):
        # not necessary
        x = tf.reshape(self._input_placeholder, [-1, self._img_width, self._img_height, self._img_channels])
        conv1 = tf.layers.conv2d(x, 32, 5, activation=tf.nn.relu)
        conv1 = tf.layers.max_pooling2d(conv1, 2, 2)
    with tf.variable_scope('conv2', reuse=tf.AUTO_REUSE):
        conv2 = tf.layers.conv2d(conv1, 64, 3, activation=tf.nn.relu)
        conv2 = tf.layers.max_pooling2d(conv2, 2, 2)
    with tf.variable_scope('conv3_red', reuse=tf.AUTO_REUSE):
        conv3 = tf.layers.conv2d(conv2, 1024, 30, strides=1, activation=tf.nn.relu)
    with tf.variable_scope('conv4_red', reuse=tf.AUTO_REUSE):
        conv4 = tf.layers.conv2d(conv3, 64, 1, strides=1, activation=tf.nn.relu)
    with tf.variable_scope('conv5_up', reuse=tf.AUTO_REUSE):
        conv5 = tf.layers.conv2d_transpose(conv4, 32, (128, 128), strides=1, activation=tf.nn.relu)
    with tf.variable_scope('conv6_1x1', reuse=tf.AUTO_REUSE):
        conv6 = tf.layers.conv2d(conv5, 1, 1, strides=1, activation=tf.nn.relu)
    return conv6
init_optimizer() snippet:
def init_optimizer(self):
    self._loss_op = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=tf.squeeze(self._target_placeholder, [3]), logits=self._model, name="Loss"))
    optimizer = tf.train.AdamOptimizer(learning_rate=self._learning_rate_placeholder)
    self._train_op = optimizer.minimize(self._loss_op)
By definition, a logit is an unscaled log-probability (strictly speaking, a log-odds), or, simply put, any real number. A sequence of logits of length num_classes can be interpreted as an unscaled probability distribution. For example, in your case num_classes=2, so logits=[125.0, -10.0] is an unscaled probability distribution for one pixel (which clearly favors 0 over 1). This array can be squashed into a valid distribution by a softmax, and this is what tf.nn.sparse_softmax_cross_entropy_with_logits does internally. For [125.0, -10.0] the squashed distribution will be very close to [1.0, 0.0].
Once again, that length-2 array is for a single pixel.
If you want to compute the cross-entropy over the entire image, the network has to output this binary distribution for all pixels and all images in a batch, i.e. output a [batch_size, 128, 128, 2] tensor. The term sparse in the name of the loss refers to the fact that the labels are not one-hot encoded (more details here). It's most useful when the number of classes is large, i.e. when one-hot encoding becomes too inefficient in terms of memory, but in your case it's insignificant. If you decide to use the tf.nn.sparse_softmax_cross_entropy_with_logits loss, the labels must be [batch_size, 128, 128], must be tf.int32 or tf.int64, and must contain the correct class indices, zero or one. That's it: TensorFlow can then compute the cross-entropy between these two arrays.
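Applied to the question's network, a minimal sketch (untested, reusing the names from the snippets above): make the final 1x1 convolution output two channels with no activation, and feed integer labels without the channel dimension:
logits = tf.layers.conv2d(conv5, 2, 1, strides=1, activation=None)      # [batch, 128, 128, 2]
labels = tf.cast(tf.squeeze(self._target_placeholder, [3]), tf.int32)   # [batch, 128, 128]
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))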

Am I using tf.get_variable() correctly?

I read from here that it is recommended to always use tf.get_variable(...), although this seems a bit troublesome when I'm trying to implement a network.
For example:
def create_weights(shape, name='weights',
                   initializer=tf.random_normal_initializer(0, 0.1)):
    weights = tf.get_variable(name, shape, initializer=initializer)
    print("weights created named: {}".format(weights.name))
    return weights

def LeNet(in_units, keep_prob):
    # define the network
    with tf.variable_scope("conv1"):
        conv1 = conv(in_units, create_weights([5, 5, 3, 32]), create_bias([32]))
        pool1 = maxpool(conv1)
    with tf.variable_scope("conv2"):
        conv2 = conv(pool1, create_weights([5, 5, 32, 64]), create_bias([64]))
        pool2 = maxpool(conv2)
    # reshape the network to feed it into the fully connected layers
    with tf.variable_scope("flatten"):
        flatten = tf.reshape(pool2, [-1, 1600])
        flatten = dropout(flatten, keep_prob)
    with tf.variable_scope("fc1"):
        fc1 = fc(flatten, create_weights([1600, 120]), biases=create_bias([120]))
        fc1 = dropout(fc1, keep_prob)
    with tf.variable_scope("fc2"):
        fc2 = fc(fc1, create_weights([120, 84]), biases=create_bias([84]))
    with tf.variable_scope("logits"):
        logits = fc(fc2, create_weights([84, 43]), biases=create_bias([43]))
    return logits
I have to use with tf.variable_scope(...) every single time I call create_weights. Furthermore, if I wanted to change, say, the conv1 weights to [7, 7, 3, 32] instead of [5, 5, 3, 32], I would have to restart the kernel, as the variable already exists. On the other hand, if I used tf.Variable(...) I wouldn't have any of these problems.
Am I using tf.variable_scope(...) incorrectly?
It seems that you cannot change what already exists in a variable scope; only after you restart the kernel can you change a variable that you defined before (in fact, you then create a new one, because the previous one has been deleted).
...
That is only my guess; I would appreciate it if someone could give a more detailed answer.
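One hedged side note (my addition, not part of the answer above): in a TF 1.x notebook you can usually avoid restarting the kernel by clearing the default graph before rebuilding the network, so variables such as conv1/weights can be recreated with new shapes:
tf.reset_default_graph()             # drops the previously created variables from the default graph
logits = LeNet(in_units, keep_prob)  # rebuild with the changed weight shapes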

How could I use batch normalization with tensorflow slim?

I've found a few questions on using batch normalization in TensorFlow, but none of them are about its wrapper in slim.
I'm trying to use batch normalization to train an MNIST digit classifier. While the training performance is high enough, the validation and test performance is poor.
I built only one graph, and passed is_training as a tf.placeholder, just like this (BN is used in every conv and fc layer):
is_training = tf.placeholder(tf.bool, [])
x_image = tf.reshape(x, [-1, 28, 28, 1])
with slim.arg_scope([slim.conv2d, slim.fully_connected],
                    normalizer_fn=slim.batch_norm,
                    normalizer_params={'is_training': is_training}):
    conv1 = slim.conv2d(x_image, 32, [5, 5], scope='conv1')
    pool1 = slim.max_pool2d(conv1, [2, 2], scope='pool1')
    conv2 = slim.conv2d(pool1, 64, [5, 5], scope='conv2')
    pool2 = slim.max_pool2d(conv2, [2, 2], scope='pool2')
    flatten = slim.flatten(pool2)
    fc = slim.fully_connected(flatten, 1024, scope='fc1')
    drop = slim.dropout(fc, keep_prob=keep_prob)
    logits = slim.fully_connected(drop, 10, activation_fn=None, scope='logits')
I also added control dependencies as follows:
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
if update_ops:
    updates = tf.group(*update_ops)
    cross_entropy = control_flow_ops.with_dependencies([updates], cross_entropy)
For the training phase, I use:
sess.run([net['cross_entropy'], net['accuracy']],
         feed_dict={net['x']: batch_xs,
                    net['y_']: batch_ys,
                    net['keep_prob']: 1.0,
                    net['is_training']: True})
For the validation phase, I use:
sess.run(net['accuracy'], feed_dict={net['x']: batch_xs,
                                     net['y_']: batch_ys,
                                     net['keep_prob']: 1.0,
                                     net['is_training']: False})
For testing purposes, I dump the trained model to a checkpoint and then pass is_training as False. Again, its performance is poor.
So what's wrong with it? Is it about the reuse parameter? Or do I need to maintain the gamma and beta variables in the BN layers myself?
For ease of reproducibility, here is my code (set phase to train to train a model and validate, or to test to restore from a checkpoint and test):
https://github.com/soloice/mnist-bn/blob/master/mnist_bn.py
Finally I figured out the problem; see https://github.com/tensorflow/tensorflow/issues/1122#issuecomment-280325584 for details. Roughly speaking, one needs to use slim.learning.create_train_op to create the train op, and must be patient and wait for the moving mean/variance parameters to warm up.
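A rough sketch (untested; variable names assumed from the question) of what that fix looks like: let slim build the train op so the batch-norm update ops are attached automatically, then give the moving statistics time to warm up before judging validation/test accuracy with is_training=False:
optimizer = tf.train.AdamOptimizer(learning_rate=1e-3)
# create_train_op attaches the UPDATE_OPS dependencies, so the moving mean/variance
# are updated on every training step
train_op = slim.learning.create_train_op(cross_entropy, optimizer)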