How can I use batch normalization with TensorFlow Slim?

I've found a few questions about using batch normalization in TensorFlow, but none of them are about its wrapper in Slim.
I'm trying to use batch normalization to train an MNIST digit classifier. While the training performance is high enough, the validation and test performance is poor.
I built only one graph and passed is_training as a tf.placeholder, like this (BN is used in every conv and fc layer):
is_training = tf.placeholder(tf.bool, [])
x_image = tf.reshape(x, [-1, 28, 28, 1])
with slim.arg_scope([slim.conv2d, slim.fully_connected],
                    normalizer_fn=slim.batch_norm,
                    normalizer_params={'is_training': is_training}):
    conv1 = slim.conv2d(x_image, 32, [5, 5], scope='conv1')
    pool1 = slim.max_pool2d(conv1, [2, 2], scope='pool1')
    conv2 = slim.conv2d(pool1, 64, [5, 5], scope='conv2')
    pool2 = slim.max_pool2d(conv2, [2, 2], scope='pool2')
    flatten = slim.flatten(pool2)
    fc = slim.fully_connected(flatten, 1024, scope='fc1')
    drop = slim.dropout(fc, keep_prob=keep_prob)
    logits = slim.fully_connected(drop, 10, activation_fn=None, scope='logits')
I also added control dependencies as follows:
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
if update_ops:
    updates = tf.group(*update_ops)
    cross_entropy = control_flow_ops.with_dependencies([updates], cross_entropy)
For the training phase, I use:
sess.run([net['cross_entropy'], net['accuracy']],
         feed_dict={net['x']: batch_xs,
                    net['y_']: batch_ys,
                    net['keep_prob']: 1.0,
                    net['is_training']: True})
For the validation phase, I use:
sess.run(net['accuracy'], feed_dict={net['x']: batch_xs,
                                     net['y_']: batch_ys,
                                     net['keep_prob']: 1.0,
                                     net['is_training']: False})
For testing, I dump the trained model to a checkpoint and then pass is_training as False. Again, its performance is poor.
So what's wrong with it? Is it about the reuse parameter? Or do I need to maintain the gamma and beta variables in the BN layers myself?
For ease of reproducibility, here is my code (set phase to train to train a model and validate, or to test to restore from a checkpoint and test):
https://github.com/soloice/mnist-bn/blob/master/mnist_bn.py

Finally I figured out the problem, see https://github.com/tensorflow/tensorflow/issues/1122#issuecomment-280325584
for details. Roughly speaking, one needs to use slim.learning.create_train_op to create the train op, and has to wait patiently for the moving mean/variance parameters to warm up.
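For reference, a minimal sketch of that fix, reusing the cross_entropy tensor from the question's code (the optimizer choice is just an assumption):
optimizer = tf.train.AdamOptimizer(1e-4)  # any optimizer works here
# create_train_op adds the ops collected in tf.GraphKeys.UPDATE_OPS (the moving
# mean/variance updates registered by slim.batch_norm) as dependencies of the
# returned op, so the statistics are updated on every training step.
train_op = slim.learning.create_train_op(cross_entropy, optimizer)
# later, in the training loop:
# sess.run(train_op, feed_dict={...})  # same feed_dict as before, with is_training: True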

Related

How to share weights using tf.layers.conv2d

I have built an autoencoder using tf.layers.conv2d layers and would like to train it in phases. That is, train the outer layers first, then the middle layers, and then the inner layers. I understand this is possible with tf.nn.conv2d because the weights are declared with tf.get_variable, but I would think it should also be possible with tf.layers.conv2d.
If I enter a new variable scope different from the original graph to change the inputs to the convolutional layers (i.e. skip the inner layers during phase 1), I am not able to reuse the weights. If I do not enter a new variable scope, I am not able to freeze the weights that I don't want to train in this phase.
Basically I am trying to use the training method from Aurélien Géron here https://github.com/ageron/handson-ml/blob/master/15_autoencoders.ipynb
except that I would like to use a CNN instead of dense layers. How can I do it?
No need to create the variables by hand. This works just as well:
import tensorflow as tf
inputs_1 = tf.placeholder(tf.float32, (None, 512, 512, 3), name='inputs_1')
inputs_2 = tf.placeholder(tf.float32, (None, 512, 512, 3), name='inputs_2')
with tf.variable_scope('conv'):
    out_1 = tf.layers.conv2d(inputs_1, 32, [3, 3], name='conv_1')
with tf.variable_scope('conv', reuse=True):
    out_2 = tf.layers.conv2d(inputs_2, 32, [3, 3], name='conv_1')
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    print(tf.trainable_variables())
If you give tf.layers.conv2d the same name, it will use the same weights (assuming reuse=True, otherwise there will be a ValueError).
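A variant of the same pattern (not in the original answer) is reuse=tf.AUTO_REUSE, which creates the variables on the first call and reuses them on later calls, so both convolutions can live under a single scope declaration:
with tf.variable_scope('conv', reuse=tf.AUTO_REUSE):
    # same scope + same layer name => the same kernel and bias for both tensors
    out_1 = tf.layers.conv2d(inputs_1, 32, [3, 3], name='conv_1')
    out_2 = tf.layers.conv2d(inputs_2, 32, [3, 3], name='conv_1')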
In TensorFlow 2.0, tf.layers is replaced by Keras layers, where variables are reused by using the same layer object:
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation='relu',
                           input_shape=(512, 512, 3)),
])

@tf.function
def f1(x):
    return model(x)

@tf.function
def f2(x):
    return model(x)
Both f1 and f2 will use the layer with the same variables.
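A quick check (a sketch, assuming the model and functions defined above) that the calls really do share parameters: the single Conv2D layer owns exactly one kernel and one bias.
import numpy as np
x = np.random.rand(1, 512, 512, 3).astype(np.float32)
_ = f1(x)
_ = f2(x)
# one Conv2D layer => one kernel + one bias, shared by f1 and f2
print(len(model.trainable_variables))  # 2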
I'd recommend setting it up a little bit differently. Instead of using tf.layers.conv2d, I would explicitly make the weights using calls to tf.get_variable() and then use these weights with calls to tf.nn.conv2d(). This way, you don't black-box the variable creation and can reference the weights easily. It's also a good way to learn exactly what's going on in your network, since you write the shape of every set of weights by hand!
Sample (untested) code:
inputs = tf.placeholder(tf.float32, (batch_size, 512, 512, 3), name='inputs')
weights = tf.get_variable(name='weights', shape=[5, 5, 3, 16], dtype=tf.float32)
with tf.variable_scope("convs"):
    hidden_layer_1 = tf.nn.conv2d(input=inputs, filter=weights,
                                  strides=[1, 1, 1, 1], padding="SAME")
with tf.variable_scope("convs", reuse=True):
    # the same `weights` variable expects 3 input channels, so it is applied to `inputs` again
    hidden_layer_2 = tf.nn.conv2d(input=inputs, filter=weights,
                                  strides=[1, 1, 1, 1], padding="SAME")
This creates the convolutional weights and applies them twice to your input. I haven't tested this code, so there may be bugs, but this is roughly how it should look. References here for variable sharing and here for tf.nn.conv2d.
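On the freezing half of the question (not covered above): whichever way the layers are built, one common way to train only some of them in a given phase is to hand the optimizer just that phase's variables via var_list. A minimal sketch, assuming a scalar loss tensor is already defined and that the layers to train in this phase live under the 'convs' scope:
# gather only the trainable variables created under the scopes for this phase
phase_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='convs')
optimizer = tf.train.AdamOptimizer(1e-3)
# `loss` is assumed to exist; variables outside var_list receive no gradient
# updates, i.e. they stay frozen during this phase
train_op = optimizer.minimize(loss, var_list=phase_vars)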
Hopefully that helps! I would be more thorough, but I have no idea what your code looks like.

Why, if we use tf.make_template() in the training stage, must we use tf.make_template() again in the testing stage?

I defined a model function named "drrn_model". While training my model, I build it with:
shared_model = tf.make_template('shared_model', drrn_model)
train_output = shared_model(train_input, is_training=True)
It begins training step by step, and I can restore a .ckpt file into the model when I want to continue training from an earlier point.
But there is a problem when I test my trained model.
I use the code below directly without using tf.make_template:
train_output = drrn_model(train_input, is_training=False)
Then the terminal gave me a lot of NotFoundError messages like "Key LastLayer/Variable_2 not found in checkpoint".
But when I use
shared_model = tf.make_template('shared_model', drrn_model)
output_tensor = shared_model(input_tensor, is_training=False)
It tests normally.
So why must we use tf.make_template() again in the testing stage? What is the difference between drrn_model and make_template when we construct our model?
And there is another question about the BN layer in TensorFlow.
I have tried many ways, but the output is always wrong (always worse than the version without the BN layer).
Here is my newest version of the model with the BN layer:
tensor = None

def drrn_model(input_tensor, is_training):
    with tf.device("/gpu:0"):
        with tf.variable_scope("FirstLayer"):
            conv_0_w = tf.get_variable("conv_w", [3, 3, 1, 128],
                                       initializer=tf.random_normal_initializer(stddev=np.sqrt(2.0 / 9)))
            tensor = tf.nn.conv2d(tf.nn.relu(batchnorm(input_tensor, is_training=is_training)),
                                  conv_0_w, strides=[1, 1, 1, 1], padding="SAME")
        first_layer = tensor

        ### recursion ###
        with tf.variable_scope("recycle", reuse=False):
            tensor = drrnblock(first_layer, tensor, is_training)
        for i in range(1, 10):
            with tf.variable_scope("recycle", reuse=True):
                tensor = drrnblock(first_layer, tensor, is_training)

        ### end layer ###
        with tf.variable_scope("LastLayer"):
            conv_end_w = tf.get_variable("conv_w", [3, 3, 128, 1],
                                         initializer=tf.random_normal_initializer(stddev=np.sqrt(2.0 / 9)))
            conv_end_layer = tf.nn.conv2d(tf.nn.relu(batchnorm(tensor, is_training=is_training)),
                                          conv_end_w, strides=[1, 1, 1, 1], padding='SAME')
        tensor = tf.add(input_tensor, conv_end_layer)
        return tensor

def drrnblock(first_layer, input_layer, is_training):
    conv1_w = tf.get_variable("conv1__w", [3, 3, 128, 128],
                              initializer=tf.random_normal_initializer(stddev=np.sqrt(2.0 / 9)))
    conv1_layer = tf.nn.conv2d(tf.nn.relu(batchnorm(input_layer, is_training=is_training)),
                               conv1_w, strides=[1, 1, 1, 1], padding="SAME")
    conv2_w = tf.get_variable("conv2__w", [3, 3, 128, 128],
                              initializer=tf.random_normal_initializer(stddev=np.sqrt(2.0 / 9)))
    conv2_layer = tf.nn.conv2d(tf.nn.relu(batchnorm(conv1_layer, is_training=is_training)),
                               conv2_w, strides=[1, 1, 1, 1], padding="SAME")
    tensor = tf.add(first_layer, conv2_layer)
    return tensor
def batchnorm(inputs, is_training, decay=0.999):  # there is my BN layer
    scale = tf.Variable(tf.ones([inputs.get_shape()[-1]]))
    beta = tf.Variable(tf.zeros([inputs.get_shape()[-1]]))
    pop_mean = tf.Variable(tf.zeros([inputs.get_shape()[-1]]), trainable=False)
    pop_var = tf.Variable(tf.ones([inputs.get_shape()[-1]]), trainable=False)
    if is_training:
        batch_mean, batch_var = tf.nn.moments(inputs, [0, 1, 2])
        print("batch_mean.shape: ", batch_mean.shape)
        train_mean = tf.assign(pop_mean, pop_mean * decay + batch_mean * (1 - decay))
        train_var = tf.assign(pop_var, pop_var * decay + batch_var * (1 - decay))
        with tf.control_dependencies([train_mean, train_var]):
            return tf.nn.batch_normalization(inputs, batch_mean, batch_var, beta, scale,
                                             variance_epsilon=1e-3)
    else:
        return tf.nn.batch_normalization(inputs, pop_mean, pop_var, beta, scale,
                                         variance_epsilon=1e-3)
Please tell me what is wrong in my code.
Thanks a lot!

Tensorflow: Using Batch Normalization gives poor (erratic) validation loss and accuracy

I am trying to use Batch Normalization using tf.layers.batch_normalization() and my code looks like this:
def create_conv_exp_model(fingerprint_input, model_settings, is_training):
    # Dropout placeholder
    if is_training:
        dropout_prob = tf.placeholder(tf.float32, name='dropout_prob')

    # Mode placeholder
    mode_placeholder = tf.placeholder(tf.bool, name="mode_placeholder")

    he_init = tf.contrib.layers.variance_scaling_initializer(mode="FAN_AVG")

    # Input Layer
    input_frequency_size = model_settings['bins']
    input_time_size = model_settings['spectrogram_length']
    net = tf.reshape(fingerprint_input,
                     [-1, input_time_size, input_frequency_size, 1],
                     name="reshape")
    net = tf.layers.batch_normalization(net,
                                        training=mode_placeholder,
                                        name='bn_0')

    for i in range(1, 6):
        net = tf.layers.conv2d(inputs=net,
                               filters=8 * (2 ** i),
                               kernel_size=[5, 5],
                               padding='same',
                               kernel_initializer=he_init,
                               name="conv_%d" % i)
        net = tf.layers.batch_normalization(net,
                                            training=mode_placeholder,
                                            name='bn_%d' % i)
        with tf.name_scope("relu_%d" % i):
            net = tf.nn.relu(net)
        net = tf.layers.max_pooling2d(net, [2, 2], [2, 2], 'SAME',
                                      name="maxpool_%d" % i)

    net_shape = net.get_shape().as_list()
    net_height = net_shape[1]
    net_width = net_shape[2]
    net = tf.layers.conv2d(inputs=net,
                           filters=1024,
                           kernel_size=[net_height, net_width],
                           strides=(net_height, net_width),
                           padding='same',
                           kernel_initializer=he_init,
                           name="conv_f")
    net = tf.layers.batch_normalization(net,
                                        training=mode_placeholder,
                                        name='bn_f')
    with tf.name_scope("relu_f"):
        net = tf.nn.relu(net)

    net = tf.layers.conv2d(inputs=net,
                           filters=model_settings['label_count'],
                           kernel_size=[1, 1],
                           padding='same',
                           kernel_initializer=he_init,
                           name="conv_l")

    ### Squeeze
    squeezed = tf.squeeze(net, axis=[1, 2], name="squeezed")

    if is_training:
        return squeezed, dropout_prob, mode_placeholder
    else:
        return squeezed, mode_placeholder
And my train step looks like this:
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate_input)
    gvs = optimizer.compute_gradients(cross_entropy_mean)
    capped_gvs = [(tf.clip_by_value(grad, -2., 2.), var) for grad, var in gvs]
    train_step = optimizer.apply_gradients(gvs)
During training, I am feeding the graph with:
train_summary, train_accuracy, cross_entropy_value, _, _ = sess.run(
    [
        merged_summaries, evaluation_step, cross_entropy_mean, train_step,
        increment_global_step
    ],
    feed_dict={
        fingerprint_input: train_fingerprints,
        ground_truth_input: train_ground_truth,
        learning_rate_input: learning_rate_value,
        dropout_prob: 0.5,
        mode_placeholder: True
    })
During validation,
validation_summary, validation_accuracy, conf_matrix = sess.run(
    [merged_summaries, evaluation_step, confusion_matrix],
    feed_dict={
        fingerprint_input: validation_fingerprints,
        ground_truth_input: validation_ground_truth,
        dropout_prob: 1.0,
        mode_placeholder: False
    })
My loss and accuracy curves (orange is training, blue is validation):
[Plot of loss vs. number of iterations]
[Plot of accuracy vs. number of iterations]
The validation loss (and accuracy) seem very erratic. Is my implementation of Batch Normalization wrong? Or is this normal with Batch Normalization and I should wait for more iterations?
You need to pass is_training to tf.layers.batch_normalization(..., training=is_training); otherwise it normalizes the inference mini-batches using the mini-batch statistics instead of the moving statistics accumulated during training, which is wrong.
There are mainly two things to check.
1. Are you sure that you are using batch normalization (BN) correctly in the train op?
If you read the layer documentation:
Note: when training, the moving_mean and moving_variance need to be updated.
By default the update ops are placed in tf.GraphKeys.UPDATE_OPS, so they
need to be added as a dependency to the train_op. Also, be sure to add
any batch_normalization ops before getting the update_ops collection.
Otherwise, update_ops will be empty, and training/inference will not work
properly.
For example:
x_norm = tf.layers.batch_normalization(x, training=training)
# ...
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss)
2. Otherwise, try lowering the "momentum" in the BN.
During training, BN in fact keeps two moving averages, of the mean and of the variance, that are supposed to approximate the population statistics. The moving mean and variance are initialized to 0 and 1 respectively; then, step by step, they are multiplied by the momentum value (default 0.99) and the new batch value is added with weight 1 - momentum (i.e. 0.01). At inference (test) time, the normalization uses these statistics. For this reason, it takes these values a little while to arrive at the "real" mean and variance of the data.
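As a sketch of that suggestion (the names mirror the documentation example above):
# default momentum is 0.99; a lower value makes the moving statistics track
# the batch statistics faster, at the cost of noisier estimates
x_norm = tf.layers.batch_normalization(x, training=training, momentum=0.9)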
Source:
https://www.tensorflow.org/api_docs/python/tf/layers/batch_normalization
https://github.com/keras-team/keras/issues/7265
https://github.com/keras-team/keras/issues/3366
The original BN paper can be found here:
https://arxiv.org/abs/1502.03167
I also observed oscillations in validation loss when adding batch norm before ReLU. We found that moving the batch norm after the ReLU resolved the issue.
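For clarity, a sketch of the two orderings that comment describes, using the same tf.layers calls as the question (layer names here are placeholders):
# conv -> BN -> ReLU (the ordering used in the question)
net = tf.layers.conv2d(net, 64, [5, 5], padding='same', name='conv_a')
net = tf.layers.batch_normalization(net, training=mode_placeholder, name='bn_a')
net = tf.nn.relu(net)
# conv -> ReLU -> BN (the reordering that reportedly smoothed the validation loss)
net = tf.layers.conv2d(net, 64, [5, 5], padding='same', name='conv_b')
net = tf.nn.relu(net)
net = tf.layers.batch_normalization(net, training=mode_placeholder, name='bn_b')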

Different batch size different results with inception_v2 during testing

I use inception_v2 as the base network for classification. During training, the batch size is 128.
During testing, if the batch size is 128, everything is OK. However, if the batch size is smaller than 128 the results are different, and the precision declines as the batch size drops. If the batch size is 1, the network fails completely. I also used inception_v3 and inception_v1, and the same problems appeared. However, if the base network is replaced with AlexNet (TensorFlow), everything goes well. I also replaced inception_v2 with VGG (slim), and everything goes well.
I think the bug is associated with inception_v1~v3 or batch normalization. Maybe I have not used inception_v2 properly. Has anyone encountered similar problems?
def net(inputs, num_classes=2, batch_size=128, dropout=0.8, height=224, width=224):
    with slim.arg_scope(inception_v2.inception_v2_arg_scope()):
        end_points = inception_v2.inception_v2(inputs=inputs, num_classes=1001,
                                               is_training=True, spatial_squeeze=False)
    kernel_size = inception_v2._reduced_kernel_size_for_small_input(end_points['Mixed_5c'], [7, 7])
    net = slim.avg_pool2d(end_points['Mixed_5c'], kernel_size, padding='VALID', stride=1,
                          scope='avgpool'.format(*kernel_size))
    net = slim.dropout(net, keep_prob=dropout, scope='Dropout_2b')
    end_points['Dropout_2b'] = net
    # regresion layer
    with tf.variable_scope('Regresion') as scope:
        inputchannel = 1024
        stddev = (2.0 / inputchannel) ** 0.5
        logits = slim.conv2d(end_points["Dropout_2b"], num_classes, [1, 1],
                             activation_fn=None, normalizer_fn=None,
                             weights_initializer=trunc_normal(stddev),
                             scope='regresion_layer')
    print("inception_v2 finished!")
    return logits, end_points
During testing, is_training=False
restore_list = [v for v in tf.trainable_variables() if (v.name.startswith("InceptionV2"))]
saver_googlenet = tf.train.Saver(var_list=restore_list) # var_list=restore_list
saver_googlenet.restore(sess, 'inception_v2.ckpt')
saver_all = tf.train.Saver(var_list=tf.global_variables(), max_to_keep=20)
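One restore detail worth noting (an observation, not from the original thread): the moving means and variances created by slim's batch norm are non-trainable variables, so a var_list built from tf.trainable_variables() will not restore them, and testing would then run with freshly initialized statistics. A sketch of including them as well:
# include every InceptionV2 variable, trainable or not (moving_mean, moving_variance, ...)
restore_list = slim.get_variables_to_restore(include=['InceptionV2'])
saver_googlenet = tf.train.Saver(var_list=restore_list)
saver_googlenet.restore(sess, 'inception_v2.ckpt')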

How to visualize feature maps of a CNN in TensorFlow? [duplicate]

In the Caffe framework it is possible to watch the learned filters during CNN training and their resulting convolutions with input images. I wonder: is it possible to do the same with TensorFlow?
A Caffe example can be viewed in this link:
http://nbviewer.jupyter.org/github/BVLC/caffe/blob/master/examples/00-classification.ipynb
Grateful for your help!
To see just a few conv1 filters in TensorBoard, you can use this code (it works for cifar10):
# this should be a part of the inference(images) function in cifar10.py file
# conv1
with tf.variable_scope('conv1') as scope:
    kernel = _variable_with_weight_decay('weights', shape=[5, 5, 3, 64],
                                         stddev=1e-4, wd=0.0)
    conv = tf.nn.conv2d(images, kernel, [1, 1, 1, 1], padding='SAME')
    biases = _variable_on_cpu('biases', [64], tf.constant_initializer(0.0))
    bias = tf.nn.bias_add(conv, biases)
    conv1 = tf.nn.relu(bias, name=scope.name)
    _activation_summary(conv1)

with tf.variable_scope('visualization'):
    # scale weights to [0 1], type is still float
    x_min = tf.reduce_min(kernel)
    x_max = tf.reduce_max(kernel)
    kernel_0_to_1 = (kernel - x_min) / (x_max - x_min)
    # to tf.image_summary format [batch_size, height, width, channels]
    kernel_transposed = tf.transpose(kernel_0_to_1, [3, 0, 1, 2])
    # this will display random 3 filters from the 64 in conv1
    tf.image_summary('conv1/filters', kernel_transposed, max_images=3)
I also wrote a simple gist to display all 64 conv1 filters in a grid.
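For reference, a rough sketch of one way to build such a grid (not the gist itself): pad each 5x5 kernel, tile the 64 filters into an 8x8 mosaic, and log it as a single image.
with tf.variable_scope('visualization_grid'):
    # kernel has shape [5, 5, 3, 64]; rescale to [0, 1] as above
    x_min = tf.reduce_min(kernel)
    x_max = tf.reduce_max(kernel)
    k = (kernel - x_min) / (x_max - x_min)
    # pad each filter by one pixel so the tiles are visually separated
    k = tf.pad(k, [[1, 1], [1, 1], [0, 0], [0, 0]])   # -> [7, 7, 3, 64]
    k = tf.transpose(k, [3, 0, 1, 2])                 # -> [64, 7, 7, 3]
    k = tf.reshape(k, [8, 8, 7, 7, 3])                # 8x8 grid of 7x7 tiles
    k = tf.transpose(k, [0, 2, 1, 3, 4])              # -> [8, 7, 8, 7, 3]
    grid = tf.reshape(k, [1, 8 * 7, 8 * 7, 3])        # one [1, 56, 56, 3] image
    tf.image_summary('conv1/filter_grid', grid)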