Using batch norm when restoring the model? - tensorflow

I have a small problem using batch norm when restoring a model in TensorFlow.
Below is my batch norm function, which comes from here:
def _batch_normalization(self, input_tensor, is_training, batch_norm_epsilon, decay=0.999):
    """Batch normalization for dense nets.
    Args:
        input_tensor: `tensor`, the input tensor to be normalized.
        is_training: `bool`, if True, update the mean/variance using a moving average,
            else use the stored mean/variance.
        batch_norm_epsilon: `float`, epsilon param for batch normalization.
        decay: `float`, decay param for the moving-average update, default is 0.999.
    Returns:
        the normalized tensor.
    """
    # batch normalization is computed per channel (the last dimension).
    input_shape_channels = int(input_tensor.get_shape()[-1])
    # scale and beta are used in the formula: scale * (x - E(x)) / sqrt(var(x)) + beta
    scale = tf.Variable(tf.ones([input_shape_channels]))
    beta = tf.Variable(tf.zeros([input_shape_channels]))
    # global mean and var hold the moving averages of the batch mean and var.
    global_mean = tf.Variable(tf.zeros([input_shape_channels]), trainable=False)
    global_var = tf.Variable(tf.ones([input_shape_channels]), trainable=False)
    # if training, update the mean and var; otherwise use the stored mean/var directly.
    if is_training:
        # reduce over every axis except the channel axis.
        axis = list(range(len(input_tensor.get_shape()) - 1))
        batch_mean, batch_var = tf.nn.moments(input_tensor, axes=axis)
        # update the moving mean and var.
        train_mean = tf.assign(global_mean, global_mean * decay + batch_mean * (1 - decay))
        train_var = tf.assign(global_var, global_var * decay + batch_var * (1 - decay))
        with tf.control_dependencies([train_mean, train_var]):
            return tf.nn.batch_normalization(input_tensor,
                                             batch_mean, batch_var, beta, scale, batch_norm_epsilon)
    else:
        return tf.nn.batch_normalization(input_tensor,
                                         global_mean, global_var, beta, scale, batch_norm_epsilon)
I train the model and save it using tf.train.Saver(). Below is the test code:
def inference(self, images_for_predict):
    """Load the pre-trained model and do the inference.
    Args:
        images_for_predict: `tensor`, images to predict using the pre-trained model.
    Returns:
        the predicted labels.
    """
    tf.reset_default_graph()
    images, labels, _, _, prediction, accuracy, saver = self._build_graph(1, False)
    predictions = []
    correct = 0
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # saver = tf.train.import_meta_graph('./models/dense_nets_model/dense_nets.ckpt.meta')
        # saver.restore(sess, tf.train.latest_checkpoint('./models/dense_nets_model/'))
        saver.restore(sess, './models/dense_nets_model/dense_nets.ckpt')
        for i in range(100):
            pred, corr = sess.run([tf.argmax(prediction, 1), accuracy],
                                  feed_dict={
                                      images: [images_for_predict.images[i]],
                                      labels: [images_for_predict.labels[i]]})
            correct += corr
            predictions.append(pred[0])
    print("PREDICTIONS:", predictions)
    print("ACCURACY:", correct / 100)
But the predictions are always very bad, like this:
('PREDICTIONS:', [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
('ACCURACY:', 0.080000000000000002)
Some notes: images_for_predict = mnist.test, and the self._build_graph method takes two params: batch_size and is_training.
Can anyone help me?

After trying a lot of approaches, I solved this problem. Below is what I did.
First, thanks to @gdelab, I used tf.layers.batch_normalization instead, so my batch norm function now looks like this:
def _batch_normalization(self, input_tensor, is_training):
    return tf.layers.batch_normalization(input_tensor, training=is_training)
The is_training param is a placeholder, defined like this: is_training = tf.placeholder(tf.bool)
When building your graph, remember to add this code where you create the optimizer:
extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(extra_update_ops):
    train_step = tf.train.AdamOptimizer(self.learning_rate).minimize(cross_entropy)
This is needed because the ops that tf.layers.batch_normalization adds to update the moving mean and variance are not automatically added as dependencies of the train operation, so if you don't do anything extra, they never get run.
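To see why those update ops matter, here is a minimal plain-Python sketch of the moving-average bookkeeping that batch norm relies on during training. It is an illustration only, not TensorFlow's implementation: the helper name update_moving_stats, the decay of 0.9, and the sample batches are all assumptions made for this example. If the update step never runs, the stored statistics stay at their initial values (mean 0, variance 1), and those are the values inference would then use.

```python
def update_moving_stats(moving_mean, moving_var, batch, decay=0.9):
    """Return the moving mean/variance after seeing one batch of scalars."""
    batch_mean = sum(batch) / len(batch)
    batch_var = sum((x - batch_mean) ** 2 for x in batch) / len(batch)
    new_mean = moving_mean * decay + batch_mean * (1 - decay)
    new_var = moving_var * decay + batch_var * (1 - decay)
    return new_mean, new_var


# Hypothetical activations: every batch has mean 10 and variance 4.
batches = [[8.0, 12.0, 8.0, 12.0]] * 200

# Case 1: the update is executed on every training step.
mean_run, var_run = 0.0, 1.0
for batch in batches:
    mean_run, var_run = update_moving_stats(mean_run, var_run, batch)

# Case 2: the update ops are never executed, so the statistics keep
# their initial values.
mean_skipped, var_skipped = 0.0, 1.0

print("updated:", round(mean_run, 3), round(var_run, 3))
print("skipped:", mean_skipped, var_skipped)
```

In the first case the stored statistics converge towards the data's mean and variance; in the second, inference would normalize with statistics that have nothing to do with the data.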
Then train the net. After training finishes, save the model using code like this:
saver = tf.train.Saver(var_list=tf.global_variables())
savepath = saver.save(sess, 'here_is_your_personal_model_path')
Note that the var_list=tf.global_variables() param makes sure TensorFlow saves all the params, including the global mean/var, which are set as not trainable.
When restoring and testing the model, do this:
# build the graph as in training:
images, labels, _, _, prediction, accuracy, saver = self._build_graph(1, False)
saver = tf.train.Saver()
saver.restore(sess, 'here_is_your_personal_model_path')
Now you can test your model. I hope this helps, thanks!

Seeing your implementation of batch norm, when you load your model, you need to keep the graph built with images, labels, _, _, prediction, accuracy, saver = self._build_graph(1, False) and load the weight values from the checkpoint, but NOT the meta graph. I think that saver.restore(sess, './models/dense_nets_model/dense_nets.ckpt') also restores the meta graph now (sorry if I'm wrong), so you need to restore only the "data" part of it.
Otherwise, you're just using the training graph, in which the mean and variance used in batch norm are the ones obtained from the batch. But when you're testing, the batch has size 1, so normalizing by the mean and variance of the batch always brings your data to 0, hence the constant output.
In any case, I'd suggest using tf.layers.batch_normalization instead, with an is_training placeholder that you'll need to feed to your network...
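The batch-size-1 effect described above can be reproduced without TensorFlow. The following is a minimal sketch in plain Python, assuming a dense (2-D) input where the moments are taken over the batch axis only; the helper name batch_norm_with_batch_stats and the sample values are made up for illustration. With a single sample, the batch mean equals the sample itself and the batch variance is 0, so every normalized feature collapses to beta.

```python
import math


def batch_norm_with_batch_stats(batch, scale=1.0, beta=0.0, epsilon=1e-5):
    """Normalize each feature column of `batch` with the batch's own statistics."""
    n = len(batch)
    num_features = len(batch[0])
    normalized = [[0.0] * num_features for _ in range(n)]
    for j in range(num_features):
        column = [row[j] for row in batch]
        mean = sum(column) / n
        var = sum((x - mean) ** 2 for x in column) / n
        for i in range(n):
            centered = batch[i][j] - mean
            normalized[i][j] = scale * centered / math.sqrt(var + epsilon) + beta
    return normalized


# Two very different inputs, each fed as a batch of size 1.
out_a = batch_norm_with_batch_stats([[0.3, 5.0, -2.0]])
out_b = batch_norm_with_batch_stats([[9.1, -4.0, 7.5]])

print(out_a)  # every feature collapses to beta (0.0 here)
print(out_b)  # same result, regardless of the input values
```

For convolutional inputs the moments also span the spatial axes, so the output is not identically zero, but it is still normalized with per-image statistics rather than the ones learned during training.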

Related

Question about TensorFlow MNIST data reshape

I'm learning TensorFlow now, and I wonder why numpy.swapaxes(0, 3) is required.
I know that the result has shape (1, 14, 14, 5), meaning 1 image of 14 x 14 positions with 5 values each,
and that after numpy.swapaxes(3, 0) it becomes (5, 14, 14, 1), i.e. 5 images.
Below is my code; please help with my question. Thank you.
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
# load MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
# get only 1 image & reshape it
img = mnist.train.images[0].reshape(28, 28)
plt.imshow(img, cmap='gray')
sess = tf.InteractiveSession()
# reshape image to NHWC with 1 color channel
img = img.reshape(-1, 28, 28, 1)
# 3x3 filter, 5 filters
W1 = tf.Variable(tf.random_normal([3, 3, 1, 5], stddev=0.01))
# use zero padding
conv2d = tf.nn.conv2d(img, W1, strides=[1, 2, 2, 1], padding='SAME')
print(conv2d)
sess.run(tf.global_variables_initializer())
# compute the convolution output
conv2d_img = conv2d.eval()
# show the convolved images
conv2d_img = np.swapaxes(conv2d_img, 0, 3)
for i, one_img in enumerate(conv2d_img):
    plt.subplot(1, 5, i + 1), plt.imshow(one_img.reshape(14, 14), cmap='gray')
# pooling
pool = tf.nn.max_pool(conv2d, ksize=[1, 2, 2, 1],
                      strides=[1, 2, 2, 1], padding='SAME')
print(pool)
sess.run(tf.global_variables_initializer())
pool_img = pool.eval()
# show the pooled images
pool_img = np.swapaxes(pool_img, 0, 3)
for i, one_img in enumerate(pool_img):
    plt.subplot(1, 5, i + 1), plt.imshow(one_img.reshape(7, 7), cmap='gray')
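The swapping is necessary because it changes the position of the channel axis so that the results can be plotted one by one.
By default, TensorFlow uses NHWC, where C = 1 for the input since we have a grayscale image.
Therefore, the number of channels (1 for a grayscale image, 3 for an RGB) has to be on the last axis of your input data. The conv2d output is also NHWC, with shape (1, 14, 14, 5): 1 image, 14 x 14 positions, and 5 channels, one per filter.
After np.swapaxes(conv2d_img, 0, 3) the shape is (5, 14, 14, 1), so in your code the loop sees the NHWC relation as 5 images (one per filter, in the batch position), 14 for height, 14 for width, and 1 for the image channel, and each one can be drawn as a 14 x 14 grayscale image.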
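A small NumPy-only sketch of what the swap does to the axes may help. The array below is a stand-in for the conv2d output (its values are arbitrary and assumed only for illustration); the point is that after swapping axes 0 and 3, iterating over the first axis yields one 14 x 14 map per filter.

```python
import numpy as np

# Stand-in for the conv2d output: 1 image, 14 x 14 positions, 5 filter responses.
conv_out = np.arange(1 * 14 * 14 * 5, dtype=np.float32).reshape(1, 14, 14, 5)

swapped = np.swapaxes(conv_out, 0, 3)
print(conv_out.shape)  # (1, 14, 14, 5)
print(swapped.shape)   # (5, 14, 14, 1)

# Iterating over the first axis now yields one feature map per filter.
for i, one_img in enumerate(swapped):
    feature_map = one_img.reshape(14, 14)
    # Each map is exactly the i-th channel of the original output.
    assert np.array_equal(feature_map, conv_out[0, :, :, i])
```

Without the swap, enumerate would loop once over a single (14, 14, 5) block, which cannot be reshaped to (14, 14).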

Tensorflow - fixed convolutional kernel

I am trying to create a fixed convolutional kernel that applies a blur filter to each channel separately:
# inputs = <previous layer>
kernel_weights = np.array([[1, 2, 1],
[2, 4, 2],
[1, 2, 1]])
kernel_weights = kernel_weights / np.sum(kernel_weights)
kernel_weights = np.reshape(kernel_weights, (*kernel_weights.shape, 1, 1))
kernel_weights = np.tile(kernel_weights, (1, 1, inputs.get_shape().as_list()[3], 1))
return tf.nn.depthwise_conv2d_native(max_pool, kernel_weights, strides=(1, 2, 2, 1), padding="SAME")
I'm currently under the impression that this convolutional kernel can/will change during training - how can I prevent it from doing so?
Would it be sufficient to wrap it in a tf.constant before passing it to the conv2d layer? Like so:
kernel_weights = tf.constant(kernel_weights)
Thanks!
As GPhilo's comment correctly identifies: just passing the kernel as a tf.constant (or plain numpy array) works, verified by plotting the histogram for the kernel in tensorboard.
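For reference, the kernel construction itself can be checked with NumPy alone. This is a sketch under assumptions: num_channels is a hypothetical stand-in for inputs.get_shape().as_list()[3], and float32 is chosen here only because it is the usual dtype of TensorFlow inputs. Since the result is a plain array (or a tf.constant wrapping it) and not a tf.Variable, an optimizer has no variable to update.

```python
import numpy as np

num_channels = 3  # hypothetical; in the question this comes from the input tensor

blur = np.array([[1, 2, 1],
                 [2, 4, 2],
                 [1, 2, 1]], dtype=np.float32)
blur /= blur.sum()  # normalize so the weights sum to 1

# Depthwise kernels have shape (height, width, in_channels, channel_multiplier).
kernel = np.tile(blur[:, :, np.newaxis, np.newaxis], (1, 1, num_channels, 1))

print(kernel.shape)        # (3, 3, 3, 1)
print(kernel[:, :, 0, 0])  # the same normalized blur filter for every channel
```

Every input channel gets its own copy of the same 3 x 3 filter, which is what applying the blur to each channel separately requires.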

Tensorflow tf.metrics.accuracy multi-label always zero

My label looks like this:
label = [0, 1, 0, 0, 1, 1, 0]
In other words, classes 1, 4, 5 are present at the corresponding sample. I believe this is called a soft class.
I'm calculating my loss with:
logits = tf.layers.dense(encoding, 7, activation=None)
cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(
labels=labels,
logits=logits
)
loss = tf.reduce_mean(cross_entropy)
According to Tensorboard, the loss is decreasing over time, as expected. However, the accuracy is flat at zero:
eval_metric_ops = {
'accuracy': tf.metrics.accuracy(labels=labels, predictions=logits),
}
tf.summary.scalar('accuracy', eval_metric_ops['accuracy'][1])
How do I calculate the accuracy of my model when using soft classes?
Did you solve this? I think the comment about softmax_cross_entropy_with_logits is incorrect because you have a multi-label, (each label is a) binary-class problem.
Partial solution:
labels = tf.constant([1, 1, 1, 0, 0, 0]) # example
predictions = tf.constant([0, 1, 0, 0, 1, 0]) # example
is_equal = tf.equal(labels, predictions)
accuracy = tf.reduce_mean(tf.cast(is_equal, tf.float32))
This gives a number, but it still needs to be converted into a tf metric.
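One likely reason the original metric stays at zero is that tf.metrics.accuracy checks predictions and labels for equality, and raw logits are real numbers that almost never equal exactly 0 or 1. Below is a minimal plain-Python sketch of the missing thresholding step. The helper names and the example logits are assumptions for illustration, and the 0.5 threshold on the sigmoid output is a common convention, not something stated in the question.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def multilabel_accuracy(labels, logits, threshold=0.5):
    """Fraction of individual label slots that are predicted correctly."""
    correct = 0
    total = 0
    for label_row, logit_row in zip(labels, logits):
        for label, logit in zip(label_row, logit_row):
            predicted = 1 if sigmoid(logit) > threshold else 0
            correct += int(predicted == label)
            total += 1
    return correct / total


labels = [[0, 1, 0, 0, 1, 1, 0]]
logits = [[-2.1, 3.0, 0.4, -1.5, 2.2, -0.3, -4.0]]  # made-up model outputs

print(multilabel_accuracy(labels, logits))  # 5 of the 7 slots match
```

In TensorFlow 1.x the same idea would presumably be to pass thresholded predictions, for example the sigmoid of the logits compared against 0.5 and cast to the label dtype, to tf.metrics.accuracy instead of the raw logits.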

Presence of unaccounted conditional nodes in TensorBoard

The Problem
When I run my training, my preprocessing examples are created successfully, but the training itself does not start. Even stranger, when analyzing my TensorBoard graph I see some extra conditional nodes which do not exist in the code. I want to know where these extra nodes come from and why, and exactly why the training does not begin. Below is a systematic description of the situation:
TensorFlow Graph
The following TensorBoard diagram shows my graph :
The code which constructs this graph is below
def getconv2drelu(inputtensor, kernelsize, strides, padding, convname,
                  imagesummaries=False):
    weights = tf.get_variable("weights", shape=kernelsize, dtype=tf.float32,
                              initializer=tf.truncated_normal_initializer(0, 0.01),
                              regularizer=tf.nn.l2_loss)
    biases = tf.get_variable("biases", shape=kernelsize[3], dtype=tf.float32,
                             initializer=tf.constant_initializer(0.0))
    conv = tf.nn.conv2d(input=inputtensor, filter=weights, strides=strides,
                        padding=padding, name=convname)
    response = tf.nn.bias_add(conv, biases)
    if imagesummaries:
        filters = (weights - tf.reduce_min(weights)) / (tf.reduce_max(
            weights) - tf.reduce_min(weights))
        filters = tf.transpose(filters, [3, 0, 1, 2])
        tf.summary.image(convname + " filters", filters,
                         max_outputs=kernelsize[3])
    response = tf.nn.relu(response)
    activation_summary(response)
    return response
def getfullyconnected(inputtensor, numinput, numoutput):
    weights = tf.get_variable("weights", shape=[numinput, numoutput],
                              dtype=tf.float32,
                              initializer=tf.truncated_normal_initializer(0, 0.01))
    biases = tf.get_variable("biases", shape=[numoutput], dtype=tf.float32,
                             initializer=tf.truncated_normal_initializer(0, 0.01))
    response = tf.add(tf.matmul(inputtensor, weights), biases)
    response = tf.nn.relu(response)
    activation_summary(response)
    return response
def inference(inputs):
    with tf.variable_scope("layer1"):
        conv = getconv2drelu(inputtensor=inputs, kernelsize=[7, 7, 3, 96],
                             strides=[1, 2, 2, 1], padding="VALID",
                             convname="conv1", imagesummaries=True)
        pool = tf.nn.max_pool(conv, [1, 3, 3, 1], strides=[1, 3, 3, 1],
                              padding="SAME", name="pool1")
    with tf.variable_scope("layer2"):
        conv = getconv2drelu(inputtensor=pool, kernelsize=[7, 7, 96, 256],
                             strides=[1, 1, 1, 1], padding="VALID",
                             convname="conv2", imagesummaries=False)
        pool = tf.nn.max_pool(conv, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1],
                              padding="SAME", name="pool2")
    with tf.variable_scope("layer3"):
        conv = getconv2drelu(inputtensor=pool, kernelsize=[7, 7, 256, 512],
                             strides=[1, 1, 1, 1], padding="SAME",
                             convname="conv3", imagesummaries=False)
    with tf.variable_scope("layer4"):
        conv = getconv2drelu(inputtensor=conv, kernelsize=[3, 3, 512, 512],
                             strides=[1, 1, 1, 1], padding="SAME",
                             convname="conv4", imagesummaries=False)
    with tf.variable_scope("layer5"):
        conv = getconv2drelu(inputtensor=conv, kernelsize=[3, 3, 512, 1024],
                             strides=[1, 1, 1, 1], padding="SAME",
                             convname="conv5", imagesummaries=False)
    with tf.variable_scope("layer6"):
        conv = getconv2drelu(inputtensor=conv, kernelsize=[3, 3, 1024, 1024],
                             strides=[1, 1, 1, 1], padding="SAME",
                             convname="conv6", imagesummaries=False)
        pool = tf.nn.max_pool(conv, [1, 3, 3, 1], strides=[1, 3, 3, 1],
                              padding="SAME", name="pool1")
    pool = tf.contrib.layers.flatten(pool)
    with tf.variable_scope("fc1"):
        fc = getfullyconnected(pool, 5 * 5 * 1024, 4096)
        drop = tf.nn.dropout(fc, keep_prob=0.5)
    with tf.variable_scope("fc2"):
        fc = getfullyconnected(drop, 4096, 4096)
        drop = tf.nn.dropout(fc, keep_prob=0.5)
    with tf.variable_scope("fc3"):
        logits = getfullyconnected(drop, 4096, 1000)
    return logits
The complete TensorBoard graph is shown below :
The figure is too small, but you can see a series of pink nodes to the left. An expanded version of such a segment is shown below :
Expansion of one of the condition blocks ( all blocks are similar !!) is shown below :
I am unable to understand the presence and existence of these extra condition blocks. All my images when fed to the graph are of size [221, 221, 3].
You can also see that inside a condition block there is an isVariableInitialized test. I do initialize my variables right after launching the session, so I do not understand why these checks are performed. I have figured out that these condition blocks are there due to the use of tf.get_variable(), which checks for initialization. Do they cause any performance difference?
Another observation
When I decrease the batchsize, the size of my tensorboard file also decreases. But the nodes shown on the graph remain the same. Why is this so ?
My training code is as follows :
with tf.control_dependencies(putops):
    train_op = tf.group(apply_gradient_op, variables_averages_op)
sess.run(train_op)  # tf.Session() has been defined before sess.run()
putops is initialized to [], and during graph construction for each GPU it is populated as follows:
# cpu_compute_stage is appended only once since it corresponds to centralized preprocessing
cpu_compute_stage = data_flow_ops.StagingArea(
    [tf.float32, tf.int32],
    shapes=[images_shape, labels_shape]
)
cpu_compute_stage_op = gpu_copy_stage.put(
    [host_images, host_labels])
putops.append(gpu_copy_stage_op)
# For each device, gpu_compute_stage is also appended to putops, once per GPU, since a CPU-GPU copy has to take place
with tf.device('/gpu:%d' % i):
    with tf.name_scope('%s_%d' % (TOWER_NAME, i)) as scope:
        gpu_compute_stage = data_flow_ops.StagingArea(
            [tf.float32, tf.int32],
            shapes=[images_shape, labels_shape]
        )
        gpu_compute_stage_op = gpu_compute_stage.put(
            [host_images, host_labels]
        )
        putops.append(gpu_compute_stage_op)
However, my code does not run despite the fact that I do initialize both global and local variables.

How to get CNN kernel values in Tensorflow

I am using the code below to create CNN layers.
conv1 = tf.layers.conv2d(inputs = input, filters = 20, kernel_size = [3,3],
padding = "same", activation = tf.nn.relu)
and I want to get the values of all kernels after training. It does not work if I simply do
kernels = conv1.kernel
So how should I retrieve the values of these kernels? I am also not sure what variables and methods conv2d has, since the TensorFlow documentation doesn't really list them for the conv2d class.
You can find all the variables in the list returned by tf.global_variables() and easily look up the variable you need.
If you wish to get these variables by name, declare the layer as:
conv_layer_1 = tf.layers.conv2d(activation=tf.nn.relu,
filters=10,
inputs=input_placeholder,
kernel_size=(3, 3),
name="conv1", # NOTE THE NAME
padding="same",
strides=(1, 1))
Recover the graph as:
gr = tf.get_default_graph()
Recover the kernel values as:
conv1_kernel_val = gr.get_tensor_by_name('conv1/kernel:0').eval()
Recover the bias values as:
conv1_bias_val = gr.get_tensor_by_name('conv1/bias:0').eval()
You mean you want to get the value of the weights for the conv1 layer.
You haven't actually defined the weights with conv2d; you need to do that. When I create a convolutional layer I use a function that performs all the necessary steps. Here's a copy/paste of the function I use to create each of my convolutional layers:
def _conv_layer(self, name, in_channels, filters, kernel, input_tensor, strides, dtype=tf.float32):
    with tf.variable_scope(name):
        w = tf.get_variable("w", shape=[kernel, kernel, in_channels, filters],
                            initializer=tf.contrib.layers.xavier_initializer_conv2d(), dtype=dtype)
        b = tf.get_variable("b", shape=[filters], initializer=tf.constant_initializer(0.0), dtype=dtype)
        c = tf.nn.conv2d(input_tensor, w, strides, padding='SAME', name=name + "c")
        a = tf.nn.relu(c + b, name=name + "_a")
        print(name + "_a", a.get_shape().as_list(), name + "_w", w.get_shape().as_list(),
              "params", np.prod(w.get_shape().as_list()[1:]) + filters)
        return a, w.get_shape().as_list()
This is what I use to define 5 convolutional layers, this example is straight out of my code, so note that it's 5 convolutional layers stacked without using max pooling or anything, strides of 2 and 5x5 kernels.
conv1_a, _ = self._conv_layer("conv1", 3, 24, 5, self.imgs4d, [1, 2, 2, 1]) # 24.8 MiB/feature -> 540 x 960
conv2_a, _ = self._conv_layer("conv2", 24, 80, 5, conv1_a, [1, 2, 2, 1]) # 6.2 MiB -> 270 x 480
conv3_a, _ = self._conv_layer("conv3", 80, 256, 5, conv2_a, [1, 2, 2, 1]) # 1.5 MiB -> 135 x 240
conv4_a, _ = self._conv_layer("conv4", 256, 750, 5, conv3_a, [1, 2, 2, 1]) # 0.4 MiB -> 68 x 120
conv5_a, _ = self._conv_layer("conv5", 750, 2048, 5, conv4_a, [1, 2, 2, 1]) # 0.1 MiB -> 34 x 60
There's also a good tutorial on the tensorflow website on how to set up a convolutional network:
https://www.tensorflow.org/tutorials/deep_cnn
The direct answer to your question is that the weights for the convolutional layer are defined there as w, that's the tensor you're asking about if I understand you correctly.