Training Resnet-50 with CIFAR-100 dataset in TensorFlow, can't get good accuracy - tensorflow

I am trying to a resnet-50 model in tensorflow by cifar-100 dataset.I have used builtin resnet_v1_50 to create model in tensorflow with two fully connected layer on it's head.But my validation accuracy stuck at nearly 37%.What is the problem???am I configure wrongly define and configure resnet_v1_50??? my model creation code is given below.
import tensorflow as tf
from tensorflow.contrib.slim.python.slim.nets import resnet_v1
X = tf.placeholder(dtype=tf.float32, shape=[None, 32, 32, 3])
Y = tf.placeholder(dtype=tf.float32, shape=[None, 100])
net, end_points = resnet_v1.resnet_v1_50(X,global_pool=False,is_training=True)
flattened = tf.contrib.layers.flatten(net)
dense_fc1 = tf.layers.dense(inputs=flattened,units=625, activation=tf.nn.relu,kernel_initializer=tf.contrib.layers.xavier_initializer())
dropout_fc1 = tf.layers.dropout(inputs=dense_fc1,rate=0.5, training=self.training)
logits = tf.layers.dense(inputs=dropout_fc1, units=num_classes,kernel_initializer = tf.contrib.layers.xavier_initializer())
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)

I think you have an extra dense layer. ResNet uses single fully-connected layer with softmax and size=num_classes.
You might also need to make sure that your hyperparameters are set correctly, like learning_rate and weight_decay and your input processing pipeline is also correct.
Here is an extra link to see if your pipeline is similar to a working solution.

Related

One back-propagation pass in keras [duplicate]

I am interested in building reinforcement learning models with the simplicity of the Keras API. Unfortunately, I am unable to extract the gradient of the output (not error) with respect to the weights. I found the following code that performs a similar function (Saliency maps of neural networks (using Keras))
get_output = theano.function([model.layers[0].input],model.layers[-1].output,allow_input_downcast=True)
fx = theano.function([model.layers[0].input] ,T.jacobian(model.layers[-1].output.flatten(),model.layers[0].input), allow_input_downcast=True)
grad = fx([trainingData])
Any ideas on how to calculate the gradient of the model output with respect to the weights for each layer would be appreciated.
To get the gradients of model output with respect to weights using Keras you have to use the Keras backend module. I created this simple example to illustrate exactly what to do:
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras import backend as k
model = Sequential()
model.add(Dense(12, input_dim=8, init='uniform', activation='relu'))
model.add(Dense(8, init='uniform', activation='relu'))
model.add(Dense(1, init='uniform', activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
To calculate the gradients we first need to find the output tensor. For the output of the model (what my initial question asked) we simply call model.output. We can also find the gradients of outputs for other layers by calling model.layers[index].output
outputTensor = model.output #Or model.layers[index].output
Then we need to choose the variables that are in respect to the gradient.
listOfVariableTensors = model.trainable_weights
#or variableTensors = model.trainable_weights[0]
We can now calculate the gradients. It is as easy as the following:
gradients = k.gradients(outputTensor, listOfVariableTensors)
To actually run the gradients given an input, we need to use a bit of Tensorflow.
trainingExample = np.random.random((1,8))
sess = tf.InteractiveSession()
sess.run(tf.initialize_all_variables())
evaluated_gradients = sess.run(gradients,feed_dict={model.input:trainingExample})
And thats it!
The below answer is with the cross entropy function, feel free to change it your function.
outputTensor = model.output
listOfVariableTensors = model.trainable_weights
bce = keras.losses.BinaryCrossentropy()
loss = bce(outputTensor, labels)
gradients = k.gradients(loss, listOfVariableTensors)
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())
evaluated_gradients = sess.run(gradients,feed_dict={model.input:training_data1})
print(evaluated_gradients)

Batch normalization layer for CNN-LSTM

Suppose that I have a model like this (this is a model for time series forecasting):
ipt = Input((data.shape[1] ,data.shape[2])) # 1
x = Conv1D(filters = 10, kernel_size = 3, padding = 'causal', activation = 'relu')(ipt) # 2
x = LSTM(15, return_sequences = False)(x) # 3
x = BatchNormalization()(x) # 4
out = Dense(1, activation = 'relu')(x) # 5
Now I want to add batch normalization layer to this network. Considering the fact that batch normalization doesn't work with LSTM, Can I add it before Conv1D layer? I think it's rational to have a batch normalization layer after LSTM.
Also, where can I add Dropout in this network? The same places? (after or before batch normalization?)
What about adding AveragePooling1D between Conv1D and LSTM? Is it possible to add batch normalization between Conv1D and AveragePooling1D in this case without any effect on LSTM layer?
Update: the LayerNormalization implementation I was using was inter-layer, not recurrent as in the original paper; results with latter may prove superior.
BatchNormalization can work with LSTMs - the linked SO gives false advice; in fact, in my application of EEG classification, it dominated LayerNormalization. Now to your case:
"Can I add it before Conv1D"? Don't - instead, standardize your data beforehand, else you're employing an inferior variant to do the same thing
Try both: BatchNormalization before an activation, and after - apply to both Conv1D and LSTM
If your model is exactly as you show it, BN after LSTM may be counterproductive per ability to introduce noise, which can confuse the classifier layer - but this is about being one layer before output, not LSTM
If you aren't using stacked LSTM with return_sequences=True preceding return_sequences=False, you can place Dropout anywhere - before LSTM, after, or both
Spatial Dropout: drop units / channels instead of random activations (see bottom); was shown more effective at reducing coadaptation in CNNs in paper by LeCun, et al, w/ ideas applicable to RNNs. Can considerably increase convergence time, but also improve performance
recurrent_dropout is still preferable to Dropout for LSTM - however, you can do both; just do not use with with activation='relu', for which LSTM is unstable per a bug
For data of your dimensionality, any sort of Pooling is redundant and may harm performance; scarce data is better transformed via a non-linearity than simple averaging ops
I strongly recommend a SqueezeExcite block after your Conv; it's a form of self-attention - see paper; my implementation for 1D below
I also recommend trying activation='selu' with AlphaDropout and 'lecun_normal' initialization, per paper Self Normalizing Neural Networks
Disclaimer: above advice may not apply to NLP and embed-like tasks
Below is an example template you can use as a starting point; I also recommend the following SO's for further reading: Regularizing RNNs, and Visualizing RNN gradients
from keras.layers import Input, Dense, LSTM, Conv1D, Activation
from keras.layers import AlphaDropout, BatchNormalization
from keras.layers import GlobalAveragePooling1D, Reshape, multiply
from keras.models import Model
import keras.backend as K
import numpy as np
def make_model(batch_shape):
ipt = Input(batch_shape=batch_shape)
x = ConvBlock(ipt)
x = LSTM(16, return_sequences=False, recurrent_dropout=0.2)(x)
# x = BatchNormalization()(x) # may or may not work well
out = Dense(1, activation='relu')
model = Model(ipt, out)
model.compile('nadam', 'mse')
return model
def make_data(batch_shape): # toy data
return (np.random.randn(*batch_shape),
np.random.uniform(0, 2, (batch_shape[0], 1)))
batch_shape = (32, 21, 20)
model = make_model(batch_shape)
x, y = make_data(batch_shape)
model.train_on_batch(x, y)
Functions used:
def ConvBlock(_input): # cleaner code
x = Conv1D(filters=10, kernel_size=3, padding='causal', use_bias=False,
kernel_initializer='lecun_normal')(_input)
x = BatchNormalization(scale=False)(x)
x = Activation('selu')(x)
x = AlphaDropout(0.1)(x)
out = SqueezeExcite(x)
return out
def SqueezeExcite(_input, r=4): # r == "reduction factor"; see paper
filters = K.int_shape(_input)[-1]
se = GlobalAveragePooling1D()(_input)
se = Reshape((1, filters))(se)
se = Dense(filters//r, activation='relu', use_bias=False,
kernel_initializer='he_normal')(se)
se = Dense(filters, activation='sigmoid', use_bias=False,
kernel_initializer='he_normal')(se)
return multiply([_input, se])
Spatial Dropout: pass noise_shape = (batch_size, 1, channels) to Dropout - has the effect below; see Git gist for code:

How to build a pretrained CNN-LSTM network with Keras

I'm trying to use a CNN-LSTM network with Keras in order to analyze videos. I read about it and run into TimeDistributed function and some examples.
Actually, I tried the network described below, which is in fact composed by a convolutional and pooling layers followed by recurrent and dense layers.
model = Sequential()
model.add(TimeDistributed(Conv2D(2, (2,2), activation= 'relu' ), input_shape=(None, IMG_SIZE, IMG_SIZE, 3)))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(50))
model.add(Dense(50, activation = 'softmax'))
model.compile(loss = 'categorical_crossentropy' , optimizer = 'adam' , metrics = ['acc'])
I haven't tested properly the model, since my dataset is too small. However, during training process the network reaches accuracy 0.98 in 4-5 epochs (perhaps it is overfitting, but it isn't a problem yet because I hope to get a suitable dataset later).
Then, I read about how to use a pretrained convolutional network (MobileNet, ResNet or Inception) as a feature extractor for LSTM network, such that I use the following code:
inputs = Input(shape = (frames, IMG_SIZE, IMG_SIZE, 3))
cnn_base = InceptionV3(include_top = False, weights='imagenet', input_shape = (IMG_SIZE, IMG_SIZE, 3))
cnn_out = GlobalAveragePooling2D()(cnn_base.output)
cnn = Model(inputs=cnn_base.input, outputs=cnn_out)
encoded_frames = TimeDistributed(cnn)(inputs)
encoded_sequence = LSTM(256)(encoded_frames)
hidden_layer = Dense(1024, activation="relu")(encoded_sequence)
outputs = Dense(50, activation="softmax")(hidden_layer)
model = Model([inputs], outputs)
In this case, when training the model it always shows accuracy ~0.02 (it is the baseline 1/50).
Since the first model at least learned anything, I am wondering if there is any error with the way the network is build in the second case.
Has anybody faced this situation? Any advice?
Thank you.
The reason is you have very small amount of data and retraining the complete Inception V3 weights. Either you have to train the model with more amount of data OR train the model with more number of epochs with hyper parameter tuning. You can find more about hyper parameter training here.
The ideal way is to freeze the base model by base_model.trainable = False and just train the new layers that you have added on top of the Inception V3 layers.
OR
Unfreeze the top layers of the base model(Inception V3 layers) and set the bottom layers to be un-trainable. You can do it as below -
# Let's take a look to see how many layers are in the base model
print("Number of layers in the base model: ", len(base_model.layers))
# Fine-tune from this layer onwards
fine_tune_at = 100
# Freeze all the layers before the `fine_tune_at` layer
for layer in base_model.layers[:fine_tune_at]:
layer.trainable = False

How to do Gradient Normalization using Tensorflow LazyAdamOptimizer in functional Keras Model?

I am using a bidirectional RNN in Keras and need to use Tensoflows LazyAdamOptimizer. I need to do Gradient Normalization. How can I implement gradient normalization with tensorflows LazyAdamOptimizer and than use the functional keras model further on?
I am training a unsupervised RNN to predict a input sequence of lenght 10. The Problem is, that i am using a keras functional model. Because of the sparsity of the embedding layer i need to use Tensorflows LazyAdamOptimizer, which is not a default optimizer in keras. When using a default keras optimizer i can do gradient normalization just by setting the argument 'clipnorm=1' in the optimizer function. Because i am using LazyAdam i need to do this with tensorflow and than pass it back to my keras model, but i can't get the code going.
#model architecture
model_input = Input(shape=(seq_len, ))
embedding_a = Embedding(len(port_fwd_dict), 50, input_length=seq_len, mask_zero=True)(model_input)
lstm_a = Bidirectional(GRU(25, return_sequences=True,implementation=2, reset_after=True, recurrent_activation='sigmoid'), merge_mode="concat (embedding_a)
dropout_a = Dropout(0.2)(lstm_a)
lstm_b = Bidirectional(GRU(25, return_sequences=False, activation="relu", implementation=2, reset_after=True, recurrent_activation='sigmoid'), merge_mode="concat")(dropout_a)
dropout_b = Dropout(0.2)(lstm_b)
dense_layer = Dense(100, activation="linear")(dropout_b)
dropout_c = Dropout(0.2)(dense_layer)
model_output = Dense(len(port_fwd_dict)-1, activation="softmax(dropout_c)
# trying to implement gradient normalization
optimizer = tf.contrib.opt.LazyAdamOptimizer()
optimizer = tf.contrib.estimator.clip_gradients_by_norm(optimizer, 1)
loss = tf.reduce_mean(categorical_crossentropy(model_input, model_output))
train_op = optimizer.minimize(loss, tf.train.get_global_step())
model = Model(inputs=model_input, outputs=model_output)
model.compile(optimizer=train_op, loss='categorical_crossentropie', metrics = [ 'categorical_accuracy'])
history = model.fit(X_train, Y_train, epochs=epochs, batch_size=batch_size, validation_split=validation_split, class_weight = 'auto')
Blockquote
I get the following Error: NameError: name 'categorical_crossentropy' is not defined
But even if this error is solved, i do not know if this code will work. Because I need to use the keras function model.compile and in this function there need to be a loss specified. but when i do this in the tensorflow part above, it is not working.
I need a way to do gradient normalization and use my normal keras functional model?!
maybe you can try my implement of lazy optimizer:
https://github.com/bojone/keras_lazyoptimizer
It is a pure keras implement, wrapping a existing optimizer to be a lazy version.

How can I get a tensor output by a tensorflow.layer

I created a CNN model using higher level tensorflow layers, like
conv1 = tf.layers.conv2d(...)
maxpooling1 = tf.layers.max_pooling2d(...)
conv2 = tf.layers.conv2d(...)
maxpooling2 = tf.layers.max_pooling2d(...)
flatten = tf.layers.flatten(...)
logits = tf.layers.dense(...)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(...))
optimizer = tf.train.AdadeltaOptimizer(init_lr).minimize(loss)
acc = tf.reduce_mean(...)
The model is well trained and saved, everything is good so far. Next, I want to load this saved model, make a change to the learning rate, and continue to train (I know tensorflow provides exponential_decay() function to allow a decay learning rate, here i just want to be in full control of learning rate, and change it manually). To do this, my idea is like:
saver = tf.train.import_meta_grah(...)
saver.restore(sess, tf.train.latest_chechpoint(...))
graph = tf.get_default_graph()
inputImg_ = graph.get_tensor_by_name(...) # this is place_holder in model
labels_ = graph.get_tensor_by_name(...) # place_holder in model
logits = graphget_tensor_by_name(...) # output of dense layer
loss = grah.get_tensor_by_name(...) # loss
optimizer = tf.train.AdadeltaOptimizer(new_lr).minimize(loss) # I give it a new learning rate
acc = tf.reduce_mean(...)
Now I got a problem. the code above can successfully obtain inputmg_, labels_, because I named them when I defined them. But I cannot obtain logits because logits = tf.layers.dense(name='logits') the name is actually given to the dense layer instead of the output tensor logits. That means, I cannot obtain the tensor conv1, conv2 either. It seems tensorflow cannot name a tensor output by a layer. In this case, is there a way to obtain these tensors, like logits, conv1, maxpooling1? I've searched for the answer for a while but failed.
I was having the same problem and solved it using tf.identity.
Since the dense layer has bias and weights parameters, when you name it, you are naming the layer, not the output tensor.
The tf.identity returns a tensor with the same shape and contents as input.
So just leave the dense layer unamed and use it as input to the tf.identity
self.output = tf.layers.dense(hidden_layer3, 2)
self.output = tf.identity(self.output, name='output')
Now you can load the output
output = graph.get_tensor_by_name('output:0')