I user inception_v2 as a base network for classification. During training, the batchsize=128.
During testing, if the batchsize=128, everything is ok. However, if the batchsize is smaller than 128 the results are different. And the precision declines as the batchsize drops. If the batchsize=1, the network will failed. I also used inception_v3 and inception_v1, the same problems appered. However, if the base network is replaced with Alex network (tensorflow), everything goes well. I also replace the inception_v2 with vgg (slim), and everything goes well.
I think the bug is associated with inception_v1~v3 or batch normalization. Maybe I have not used inception_v2 properly. Did anyone encounter similar problems?
def net(inputs, num_classes=2, batch_size=128, dropout=0.8, height=224, width=224):
with slim.arg_scope(inception_v2.inception_v2_arg_scope()):
end_points = inception_v2.inception_v2(inputs=inputs, num_classes=1001,is_training=True, spatial_squeeze=False)
kernel_size =inception_v2._reduced_kernel_size_for_small_input(end_points['Mixed_5c'], [7, 7])
net = slim.avg_pool2d(end_points['Mixed_5c'], kernel_size, padding='VALID', stride=1,scope='avgpool'.format(*kernel_size))
net = slim.dropout(net, keep_prob=dropout, scope='Dropout_2b')
end_points['Dropout_2b'] = net
# regresion layer
with tf.variable_scope('Regresion') as scope:
inputchannel = 1024
stddev = (2.0/inputchannel)**0.5
logits = slim.conv2d(end_points["Dropout_2b"], num_classes, [1, 1], activation_fn=None, normalizer_fn=None,weights_initializer=trunc_norm al(stddev), scope='regresion_layer')
print("inception_v2 finished!")
return logits, end_points
During testing, is_training=False
restore_list = [v for v in tf.trainable_variables() if (v.name.startswith("InceptionV2"))]
saver_googlenet = tf.train.Saver(var_list=restore_list) # var_list=restore_list
saver_googlenet.restore(sess, 'inception_v2.ckpt')
saver_all = tf.train.Saver(var_list=tf.global_variables(),max_to_keep=20)
Related
My issue:
I am trying to train a semantic segmentation model in tf.keras, in fact it works very well when I am using channels_last (WHC) mode (it reaches 96%+ val acc). I wanted to train it in channels_first (CHW) mode so the weights are compatible with TensorRT. When I do this, the ~80% training accuracy in the first few epochs dips down to around 0.020% and stays there permanently.
It is useful to know that the base of my model is a tf.keras.applications.MobileNet() model with the pre-trained 'imagenet' weights. (Model architecture at the bottom.)
The transformation process:
I used the guidelines provided and I change only a few things here:
Set tf.keras.backend.set_image_data_format() to 'channels_first'.
I change the channel order in the input tensor from: input_tensor=Input(shape=(376, 672, 3)) to: input_tensor=Input(shape=(3, 376, 672))
In my image preprocessing (using tf.data.Dataset), i use tf.transpose(img, perm=[2, 0, 1]) on both my input image and one-hot encoded mask to change the channel orders. I checked this with equality assertion to make sure its correct and it seems to be fine.
When I change these the training starts fine but as I said the training accuracy goes down to almost zero. When I revert back everything's fine again.
Possible leads:
What am I doing wrong or what could be the problematic part here? My suspicions are around these questions:
Are the pre-trained imageNet weights changed to the 'channels_first' order also when I set the backend? Is this something I should consider at all?
Could it be that the tf.transpose() function messes up the mask's one-hot encoding? (I have 3 classes represented by 3 colors: lane, opposing lane, background)
Maybe I am not seeing something obvious. I can provide further code and answers as needed.
EDIT:
08/17: This is still an ongoing issue, I have tried several things:
I checked if the image and the mask is correct after the transpose with numpy assertion, seems correct.
I suspected that the loss function calculates on the wrong axis, so I customized the loss function for the first axis (where the channels are). Here it is:
def ReverseAxisLoss(y_true, y_pred):
return K.categorical_crossentropy(y_true, y_pred, from_logits=True, axis=1)
My main suspicion is that the 'channels first' backend setting does nothing to transpose the pretrained 'imagenet' weights for the mobilenet part. Is there an updated way for TF2.x / Keras to transpose the pre-trained weights into CHW format?
Here is the architecture that I use (the skipNet() is the head network and the mobilenet is the base, and it is connected in the create_model() function)
def skipNet(encoder_output, feed1, feed2, classes):
# random initializer and regularizer
stddev = 0.01
init = RandomNormal(stddev=stddev)
weight_decay = 1e-3
reg = l2(weight_decay)
score_feed2 = Conv2D(kernel_size=(1, 1), filters=classes, padding="SAME",
kernel_initializer=init, kernel_regularizer=reg)(feed2)
score_feed2_bn = BatchNormalization()(score_feed2)
score_feed1 = Conv2D(kernel_size=(1, 1), filters=classes, padding="SAME",
kernel_initializer=init, kernel_regularizer=reg)(feed1)
score_feed1_bn = BatchNormalization()(score_feed1)
upscore2 = Conv2DTranspose(kernel_size=(4, 4), filters=classes, strides=(2, 2),
padding="SAME", kernel_initializer=init,
kernel_regularizer=reg)(encoder_output)
height_pad1 = ZeroPadding2D(padding=((1,0),(0,0)))(upscore2)
upscore2_bn = BatchNormalization()(height_pad1)
fuse_feed1 = add([score_feed1_bn, upscore2_bn])
upscore4 = Conv2DTranspose(kernel_size=(4, 4), filters=classes, strides=(2, 2),
padding="SAME", kernel_initializer=init,
kernel_regularizer=reg)(fuse_feed1)
height_pad2 = ZeroPadding2D(padding=((0,1),(0,0)))(upscore4)
upscore4_bn = BatchNormalization()(height_pad2)
fuse_feed2 = add([score_feed2_bn, upscore4_bn])
upscore8 = Conv2DTranspose(kernel_size=(16, 16), filters=classes, strides=(8, 8),
padding="SAME", kernel_initializer=init,
kernel_regularizer=reg, activation="softmax")(fuse_feed2)
return upscore8
def create_model(classes):
base_model = tf.keras.applications.MobileNet(input_tensor=Input(shape=IMG_SHAPE),
include_top=False,
weights='imagenet')
conv4_2_output = base_model.get_layer(index=43).output
conv3_2_output = base_model.get_layer(index=30).output
conv_score_output = base_model.output
head_model = skipNet(conv_score_output, conv4_2_output, conv3_2_output, classes)
for layer in base_model.layers:
layer.trainable = False
model = Model(inputs=base_model.input, outputs=head_model)
return model
I am trying to train an object detection model as described in this paper
There are 3 fully connected layers with 512, 512, 25 neurons. The 16x55x55 feature map from the last convolutional layer is fed into the fully connected layers to retrieve the appropriate class. At this stage, every grid described by (16x1x1) is fed into the fully connected layers to classify the grid as belonging to one of the 25 classes. The structure can be seen in the pciture below
fully connected layers
I am trying to adapt the code from TF MNIST classification tutorial, and I would like to know if it is okay to just sum the losses from each grid as in the code snippet below and use it to train the model weights.
flat_fmap = tf.reshape(last_conv_layer, [-1, 16*55*55])
total_loss = 0
for grid of flat_fmap:
dense1 = tf.layers.dense(inputs=grid, units=512, activation=tf.nn.relu)
dense2 = tf.layers.dense(inputs=dense1, units=512, activation=tf.nn.relu)
logits = tf.layers.dense(inputs=dense2, units=25)
total_loss += tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
train_op = optimizer.minimize(
loss=total_loss,
global_step=tf.train.get_global_step())
return tf.estimator.EstimatorSpec(mode=tf.estimator.ModeKeys.TRAIN, loss=total_loss, train_op=train_op)
In the code above, I think at every iteration 3 new layers are being creating. However, I would like the weights to be preserved when classifying one grid and then another.
Adding to the total_loss should be ok.
tf.losses.sparse_softmax_cross_entropy is also adding losses together.
It calculates a sparse_softmax with logits and then reduces the resulting array though a sum using math_ops.reduce_sum.
So you are adding them together, one way or another.
As you can see in its source
The for loop on the network declaration seems unusual, it probably makes more sense to do it at run time and pass each grid through the feed_dict.
dense1 = tf.layers.dense(inputs=X, units=512, activation=tf.nn.relu)
dense2 = tf.layers.dense(inputs=dense1, units=512, activation=tf.nn.relu)
logits = tf.layers.dense(inputs=dense2, units=25)
loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001).minimize(loss)
total_loss = 0
with tf.session as sess:
sess.run(init)
for grid in flat_fmap:
_, l = sess.run([optimizer,loss], feed_dict{X: grid, labels=labels})
total_loss += l
I am trying to use Batch Normalization using tf.layers.batch_normalization() and my code looks like this:
def create_conv_exp_model(fingerprint_input, model_settings, is_training):
# Dropout placeholder
if is_training:
dropout_prob = tf.placeholder(tf.float32, name='dropout_prob')
# Mode placeholder
mode_placeholder = tf.placeholder(tf.bool, name="mode_placeholder")
he_init = tf.contrib.layers.variance_scaling_initializer(mode="FAN_AVG")
# Input Layer
input_frequency_size = model_settings['bins']
input_time_size = model_settings['spectrogram_length']
net = tf.reshape(fingerprint_input,
[-1, input_time_size, input_frequency_size, 1],
name="reshape")
net = tf.layers.batch_normalization(net,
training=mode_placeholder,
name='bn_0')
for i in range(1, 6):
net = tf.layers.conv2d(inputs=net,
filters=8*(2**i),
kernel_size=[5, 5],
padding='same',
kernel_initializer=he_init,
name="conv_%d"%i)
net = tf.layers.batch_normalization(net,
training=mode_placeholder,
name='bn_%d'%i)
with tf.name_scope("relu_%d"%i):
net = tf.nn.relu(net)
net = tf.layers.max_pooling2d(net, [2, 2], [2, 2], 'SAME',
name="maxpool_%d"%i)
net_shape = net.get_shape().as_list()
net_height = net_shape[1]
net_width = net_shape[2]
net = tf.layers.conv2d( inputs=net,
filters=1024,
kernel_size=[net_height, net_width],
strides=(net_height, net_width),
padding='same',
kernel_initializer=he_init,
name="conv_f")
net = tf.layers.batch_normalization( net,
training=mode_placeholder,
name='bn_f')
with tf.name_scope("relu_f"):
net = tf.nn.relu(net)
net = tf.layers.conv2d( inputs=net,
filters=model_settings['label_count'],
kernel_size=[1, 1],
padding='same',
kernel_initializer=he_init,
name="conv_l")
### Squeeze
squeezed = tf.squeeze(net, axis=[1, 2], name="squeezed")
if is_training:
return squeezed, dropout_prob, mode_placeholder
else:
return squeezed, mode_placeholder
And my train step looks like this:
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate_input)
gvs = optimizer.compute_gradients(cross_entropy_mean)
capped_gvs = [(tf.clip_by_value(grad, -2., 2.), var) for grad, var in gvs]
train_step = optimizer.apply_gradients(gvs))
During training, I am feeding the graph with:
train_summary, train_accuracy, cross_entropy_value, _, _ = sess.run(
[
merged_summaries, evaluation_step, cross_entropy_mean, train_step,
increment_global_step
],
feed_dict={
fingerprint_input: train_fingerprints,
ground_truth_input: train_ground_truth,
learning_rate_input: learning_rate_value,
dropout_prob: 0.5,
mode_placeholder: True
})
During validation,
validation_summary, validation_accuracy, conf_matrix = sess.run(
[merged_summaries, evaluation_step, confusion_matrix],
feed_dict={
fingerprint_input: validation_fingerprints,
ground_truth_input: validation_ground_truth,
dropout_prob: 1.0,
mode_placeholder: False
})
My loss and accuracy curves (orange is training, blue is validation):
Plot of loss vs number of iterations,
Plot of accuracy vs number of iterations
The validation loss (and accuracy) seem very erratic. Is my implementation of Batch Normalization wrong? Or is this normal with Batch Normalization and I should wait for more iterations?
You need to pass is_training to tf.layers.batch_normalization(..., training=is_training) or it tries to normalize the inference minibatches using the minibatch statistics instead of the training statistics, which is wrong.
There are mainly two things to check.
1. Are you sure that you are using batch normalization (BN) correctly in the train op?
If you read the layer documentation:
Note: when training, the moving_mean and moving_variance need to be updated.
By default the update ops are placed in tf.GraphKeys.UPDATE_OPS, so they
need to be added as a dependency to the train_op. Also, be sure to add
any batch_normalization ops before getting the update_ops collection.
Otherwise, update_ops will be empty, and training/inference will not work
properly.
For example:
x_norm = tf.layers.batch_normalization(x, training=training)
# ...
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
train_op = optimizer.minimize(loss)
2. Otherwise, try lowering the "momentum" in the BN.
During the training, in fact, the BN uses two moving averages of the mean and the variance that are supposed to approximate the population statistics. Mean and variance are initialized to 0 and 1 respectively and then, step by step, they are multiplied by the momentum value (default is 0.99) and added the new value*0.01. At inference (test) time, the normalization uses these statistics. For this reason, it takes these values a little while to arrive at the "real" mean and variance of the data.
Source:
https://www.tensorflow.org/api_docs/python/tf/layers/batch_normalization
https://github.com/keras-team/keras/issues/7265
https://github.com/keras-team/keras/issues/3366
The original BN paper can be found here:
https://arxiv.org/abs/1502.03167
I also observed oscillations in validation loss when adding batch norm before ReLU. We found that moving the batch norm after the ReLU resolved the issue.
I am now working on building a stereo matching network using Keras with tensorflow as backend. The network has the following structure:
After training the whole network, I need to test it. However, training phase and testing phase are quite different. I have to split the model into two parts. The first part is CNN+Concatenate which only needs to be run once, while the fully-connected part (actually I modify it to be fully-conv form when testing) needs to be run for d times with slightly different input, where d varies from 100 to 228.
The first part network code:
# input image dimensions
img_rows, img_cols = X1.shape[0], X1.shape[1]
input_shape = (img_rows, img_cols, 1)
X1 = X1.reshape(1, img_rows, img_cols, 1)
X2 = X2.reshape(1, img_rows, img_cols, 1)
# number of conv filters to use
nb_filters = 112
# CNN kernel size
kernel_size = (3,3)
left_branch = Sequential()
left_branch.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1], border_mode='same', input_shape=input_shape))
left_branch.add(Activation('relu'))
left_branch.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1], border_mode='same'))
left_branch.add(Activation('relu'))
left_branch.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1], border_mode='same'))
left_branch.add(Activation('relu'))
left_branch.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1], border_mode='same'))
left_branch.add(Activation('relu'))
right_branch = Sequential()
right_branch.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1], border_mode='same', input_shape=input_shape))
right_branch.add(Activation('relu'))
right_branch.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1], border_mode='same'))
right_branch.add(Activation('relu'))
right_branch.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1], border_mode='same'))
right_branch.add(Activation('relu'))
right_branch.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1], border_mode='same'))
right_branch.add(Activation('relu'))
merged = Merge([left_branch, right_branch], mode='concat')
cnn = Sequential()
cnn.add(merged)
I load the weights gained from training phase into the first part of the network and try to get prediction of it.
def load_cnn_weights(filepath):
f = h5py.File(filepath, mode='r')
weights = []
for i in range(1, 9):
weights.append(f['model_weights/conv2d_{}/conv2d_{}/kernel:0'.format(i, i)][()])
weights.append(f['model_weights/conv2d_{}/conv2d_{}/bias:0'.format(i, i)][()])
f.close()
return weights
weights = load_cnn_weights("/home/users/shixin.li/segment/Lecun_stereo_rebuild/weights.hdf5")
cnn.set_weights(weights)
output_cnn = cnn.predict([X1, X2])
I already check that the weights are read successfully and can fit into the network according to calling get_weights() function. X1 and X2 are not zero, they are normalized gray scale image matrix. I even tried compile the network before predict. But the result output_cnn gives all zero.
I didn't see anyone have this problem and I am stuck for two days. The part which really confuses me is that the input and weights are all not zero, then why the result is zero? If you could help, I would really appreciate that!
You might want to try using tfdbg to find out exactly what the inputs to the op with all-zero outputs are, to try to understand what is going on.
Trying to implement a minimal toy RNN example in tensorflow.
The goal is to learn a mapping from the input data to the target data, similar to this wonderful concise example in theanets.
Update: We're getting there. The only part remaining is to make it converge (and less convoluted). Could someone help to turn the following into running code or provide a simple example?
import tensorflow as tf
from tensorflow.python.ops import rnn_cell
init_scale = 0.1
num_steps = 7
num_units = 7
input_data = [1, 2, 3, 4, 5, 6, 7]
target = [2, 3, 4, 5, 6, 7, 7]
#target = [1,1,1,1,1,1,1] #converges, but not what we want
batch_size = 1
with tf.Graph().as_default(), tf.Session() as session:
# Placeholder for the inputs and target of the net
# inputs = tf.placeholder(tf.int32, [batch_size, num_steps])
input1 = tf.placeholder(tf.float32, [batch_size, 1])
inputs = [input1 for _ in range(num_steps)]
outputs = tf.placeholder(tf.float32, [batch_size, num_steps])
gru = rnn_cell.GRUCell(num_units)
initial_state = state = tf.zeros([batch_size, num_units])
loss = tf.constant(0.0)
# setup model: unroll
for time_step in range(num_steps):
if time_step > 0: tf.get_variable_scope().reuse_variables()
step_ = inputs[time_step]
output, state = gru(step_, state)
loss += tf.reduce_sum(abs(output - target)) # all norms work equally well? NO!
final_state = state
optimizer = tf.train.AdamOptimizer(0.1) # CONVERGEs sooo much better
train = optimizer.minimize(loss) # let the optimizer train
numpy_state = initial_state.eval()
session.run(tf.initialize_all_variables())
for epoch in range(10): # now
for i in range(7): # feed fake 2D matrix of 1 byte at a time ;)
feed_dict = {initial_state: numpy_state, input1: [[input_data[i]]]} # no
numpy_state, current_loss,_ = session.run([final_state, loss,train], feed_dict=feed_dict)
print(current_loss) # hopefully going down, always stuck at 189, why!?
I think there are a few problems with your code, but the idea is right.
The main issue is that you're using a single tensor for inputs and outputs, as in:
inputs = tf.placeholder(tf.int32, [batch_size, num_steps]).
In TensorFlow the RNN functions take a list of tensors (because num_steps can vary in some models). So you should construct inputs like this:
inputs = [tf.placeholder(tf.int32, [batch_size, 1]) for _ in xrange(num_steps)]
Then you need to take care of the fact that your inputs are int32s, but a RNN cell works on float vectors - that's what embedding_lookup is for.
And finally you'll need to adapt your feed to put in the input list.
I think the ptb tutorial is a reasonable place to look, but if you want an even more minimal example of an out-of-the-box RNN you can take a look at some of the rnn unit tests, e.g., here.
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/kernel_tests/rnn_test.py#L164