Firstly, I would like to apologise, as I am not allowed to post much code because it's for a university project, but I am seriously stuck.
I am trying to train a ConvNet using the CIFAR-10 dataset with TensorFlow using the following model:
Image: [32,32,3]
conv1: 5,5,3,64 + bias[64](initialised to 0.0's)
norm1: depth_radius=4, bias=1.0, alpha=0.001/9.0, beta=0.75
pool1: ksize=[1,3,3,1], strides=[1,2,2,1], padding=SAME
conv2: 5,5,64,64 + bias[64](initialised to 0.1's)
pool2: ksize=[1,3,3,1], strides=[1,2,2,1], padding=SAME
norm2: depth_radius=4, bias=1.0, alpha=0.001/9.0, beta=0.75
local1: 8*8*64, 384 + bias[384](initialised to 0.1's)
local2: 384, 192 + bias[192](initialised to 0.1's)
dropout: keep_prob=0.5
softmax: [192,10] + bias[10](initialised to 0.0's)
However, the results I'm getting are (with batches of 1000):
step 0, training accuracy 0.09
step 1, training accuracy 0.096
step 2, training accuracy 0.1
step 3, training accuracy 0.108
step 4, training accuracy 0.122
step 5, training accuracy 0.094
step 6, training accuracy 0.086
step 7, training accuracy 0.082
step 8, training accuracy 0.104
step 9, training accuracy 0.09
I'm using the following to update weights:
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(y_conv + 1e-10, y_))
train_step = tf.train.AdamOptimizer(0.0001).minimize(cross_entropy)
This is the guide I've been reading: https://www.tensorflow.org/versions/r0.11/tutorials/deep_cnn/index.html#convolutional-neural-networks
I have tried varying the learning rate from 1e-1 to 1e-8, but no luck.
Any help is greatly appreciated. Thanks in advance.
Use tf.nn.sparse_softmax_cross_entropy_with_logits instead of tf.nn.softmax_cross_entropy_with_logits.
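For reference, a minimal sketch of that change (assuming your labels y_ are integer class indices rather than one-hot vectors; variable names follow the question, and the 1e-10 offset on the logits is dropped because the op already applies the softmax internally in a numerically stable way):

cross_entropy = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)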
You could also try a few things with your dataset:
normalize your images
shuffle your training data so the batches are closer to independent and identically distributed (i.i.d.)
try grayscale images to get a baseline for your model
I am training the Mask R-CNN model.
I saved the weights after training the 'heads' for 2 epochs, and I want to continue from epoch three. But the model.train() function does not have an initial_epoch argument like model.fit in a Sequential model, for example.
I have the following code, but if I run it with the loaded weights it starts from the first epoch, and I don't want that:
EPOCHS = [1, 3, 5, 8]
model.train(dataset_train, dataset_val,
            learning_rate = LEARNING_RATE,
            epochs = EPOCHS[1],
            layers = 'all',
            augmentation = augmentation)
I would appreciate it if someone could tell me what the substitute for initial_epoch is in my case.
After the first 2 epochs of fitting, your model has changed its weights. So, when you call fit once again, the model will continue training. Your progress won't be lost.
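If you are using the matterport Mask_RCNN implementation, epochs in model.train() is the target epoch number rather than a count, and the model tracks its progress in model.epoch, which is restored from the checkpoint filename when you load weights. A sketch under that assumption (weights_path is a placeholder for your epoch-2 checkpoint):

# Restores model.epoch from the filename (e.g. ..._0002.h5), so training resumes there
model.load_weights(weights_path, by_name=True)
model.train(dataset_train, dataset_val,
            learning_rate=LEARNING_RATE,
            epochs=EPOCHS[1],   # target epoch 3: continues from epoch 2 and trains one more epoch
            layers='all',
            augmentation=augmentation)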
I am new to machine learning and LSTMs. I am referring to this link, LSTM for multistep forecasting, for the Encoder-Decoder LSTM Model With Multivariate Input section.
Here is my dataset description after reshaping the train and test set.
print(dataset.shape)
print(train_x.shape, train_y.shape)
print(test.shape)
(2192, 15)
(1806, 14, 14) (1806, 7, 1)
(364, 15)
In the above, I have n_input=14, n_out=7.
Here is my LSTM model description:
def build_model(train, n_input):
    # prepare data
    train_x, train_y = to_supervised(train, n_input)
    # define parameters
    verbose, epochs, batch_size = 2, 100, 16
    n_timesteps, n_features, n_outputs = train_x.shape[1], train_x.shape[2], train_y.shape[1]
    # reshape output into [samples, timesteps, features]
    train_y = train_y.reshape((train_y.shape[0], train_y.shape[1], 1))
    # define model
    model = Sequential()
    model.add(LSTM(200, activation='relu', input_shape=(n_timesteps, n_features)))
    model.add(RepeatVector(n_outputs))
    model.add(LSTM(200, activation='relu', return_sequences=True))
    model.add(TimeDistributed(Dense(100, activation='relu')))
    model.add(TimeDistributed(Dense(1)))
    model.compile(loss='mse', optimizer='adam')
    # fit network
    model.fit(train_x, train_y, epochs=epochs, batch_size=batch_size, verbose=verbose)
    return model
On evaluating the model, I get the following output:
Epoch 98/100
- 8s - loss: 64.6554
Epoch 99/100
- 7s - loss: 64.4012
Epoch 100/100
- 7s - loss: 63.9625
According to my understanding (please correct me if I am wrong):
Here my model accuracy is 63.9625 (looking at the last epoch, 100). Also, it is not stable, since there is a gap between epoch 99 and epoch 100.
Here are my questions:
How are the epochs and batch size defined above related to the model's accuracy? How does increasing or decreasing them affect the accuracy?
Are the epochs, batch size, and n_input defined above appropriate for the model?
How can I increase my model's accuracy? Is the above dataset large enough for this model?
I am not able to connect all these parameters; kindly help me understand how to achieve better accuracy by tuning them.
Having a very large number of epochs will not necessarily improve your accuracy. More epochs can increase the accuracy up to a certain limit, beyond which you begin to overfit your model. Having very few will result in underfitting. See this. So looking at the huge difference between epoch 99 and epoch 100, you can already tell that you are overfitting the model. As a rule of thumb, when you notice the accuracy stops increasing, that is the ideal number of epochs; it is usually between 1 and 10. 100 seems too much already.
Batch size does not affect your accuracy. This is just used to control the speed or performance based on the memory in your GPU. If you have huge memory, you can have a huge batch size so training will be faster.
What you can do to increase your accuracy is:
1. Increase your dataset for the training.
2. Try using convolutional networks instead. You can find more on convolutional networks on this youtube channel; in a nutshell, CNNs help you identify which features to focus on when training your model.
3. Try other algorithms.
There is no well-defined formula for batch size. Typically a larger batch size will run faster, but may compromise your accuracy. You will have to play around with the number.
However, one component with regard to epochs that you are missing is validation. It is normal to have a validation dataset and observe whether the accuracy over this dataset goes up or down. If the accuracy over this dataset goes up, you can multiply your learning rate by 0.8. See this link: https://machinelearningmastery.com/difference-test-validation-datasets/
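If you are using Keras (as the build_model function above suggests), a minimal sketch of adding validation with early stopping and learning-rate reduction; the callback settings here are illustrative, not tuned:

from keras.callbacks import EarlyStopping, ReduceLROnPlateau

early_stop = EarlyStopping(monitor='val_loss', patience=10)                 # stop when val loss stalls
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.8, patience=5)   # shrink LR on plateaus
model.fit(train_x, train_y,
          epochs=epochs, batch_size=batch_size, verbose=verbose,
          validation_split=0.2,                 # hold out 20% of the training data
          callbacks=[early_stop, reduce_lr])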
I'm new to TFLearn.
I was studying this introductory tutorial to TFLearn, in which a fixed number of epochs is set. However, I would like to know if it is possible to use the combination of learning_rate and accuracy to determine the end of the network training ...
for example: decrease or increase the learning_rate according to the accuracy ... or stop the training according to the accuracy.
# Build neural network
net = tflearn.input_data(shape=[None, 6])
net = tflearn.fully_connected(net, 32)
net = tflearn.fully_connected(net, 32)
net = tflearn.fully_connected(net, 2, activation='softmax')
net = tflearn.regression(net)
# Define model
model = tflearn.DNN(net)
# Start training (apply gradient descent algorithm)
model.fit(data, labels, n_epoch=10, batch_size=16, show_metric=True)
:)
Look into http://tflearn.org/models/dnn/ and the best_checkpoint_path and best_val_accuracy parameters. They will save your best checkpoint.
If you want to stop the training, you have to program a callback yourself to stop the training. Here is a nice tutorial about early stopping with TFlearn: http://mckinziebrandon.me/TensorflowNotebooks/2016/11/28/early-stop-solution.html
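A minimal sketch of such a callback, following the pattern from that tutorial (val_acc_thresh and the other numbers are illustrative; a validation set must be passed to fit() so that val_acc is populated):

import tflearn

class EarlyStoppingCallback(tflearn.callbacks.Callback):
    def __init__(self, val_acc_thresh):
        self.val_acc_thresh = val_acc_thresh

    def on_epoch_end(self, training_state):
        # training_state.val_acc is None unless a validation set is provided to fit()
        if training_state.val_acc is not None and training_state.val_acc >= self.val_acc_thresh:
            raise StopIteration

early_stopping = EarlyStoppingCallback(val_acc_thresh=0.95)
try:
    model.fit(data, labels, n_epoch=100, batch_size=16, show_metric=True,
              validation_set=0.1, callbacks=early_stopping)
except StopIteration:
    print("Stopped training early.")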
When I was training a CNN to classify images of distorted digits varying from 0 to 9, the accuracy on the training set and test set clearly improved.
Epoch[0] Batch [100] Train-multi-accuracy_0=0.296000
...
Epoch[0] Batch [500] Train-multi-accuracy_0=0.881900
In Epoch[1] and Epoch[2] the accuracy oscillated slightly between 0.85 and 0.95; however,
Epoch[3] Batch [300] Train-multi-accuracy_0=0.926400
Epoch[3] Batch [400] Train-multi-accuracy_0=0.105300
Epoch[3] Batch [500] Train-multi-accuracy_0=0.098200
From then on, the accuracy stayed around 0.1, which meant the network was only giving random predictions.
I repeated the training several times, and this happened every time. What's wrong with it?
Is the adaptive learning rate strategy the reason?
model = mx.model.FeedForward(...,
                             optimizer = 'adam',
                             num_epoch = 50,
                             wd = 0.00001,
                             ...,
                             )
What exactly is the model you're training? If you're using the MNIST dataset, usually a simple 2-layer MLP trained with SGD will give you pretty high accuracy.
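For instance, a rough baseline sketch using the same old mx.model.FeedForward API (the layer sizes, learning rate, and the train_iter/val_iter data iterators are illustrative placeholders, not from the question):

import mxnet as mx

# simple 2-layer MLP baseline
data = mx.sym.Variable('data')
fc1  = mx.sym.FullyConnected(data=data, num_hidden=128, name='fc1')
act1 = mx.sym.Activation(data=fc1, act_type='relu', name='relu1')
fc2  = mx.sym.FullyConnected(data=act1, num_hidden=10, name='fc2')
mlp  = mx.sym.SoftmaxOutput(data=fc2, name='softmax')

model = mx.model.FeedForward(symbol=mlp,
                             optimizer='sgd',
                             learning_rate=0.1,
                             num_epoch=10)
model.fit(X=train_iter, eval_data=val_iter)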
I am adding my RNN text classification model. I am using the last state to classify text. The dataset is small; I am using GloVe vectors for the embedding.
def rnn_inputs(FLAGS, input_data):
    with tf.variable_scope('rnn_inputs', reuse=True):
        W_input = tf.get_variable("W_input", [FLAGS.en_vocab_size, FLAGS.num_hidden_units])
        embeddings = tf.nn.embedding_lookup(W_input, input_data)
        return embeddings

self.inputs_X = tf.placeholder(tf.int32, shape=[None, None, FLAGS.num_dim_input], name='inputs_X')
self.targets_y = tf.placeholder(tf.float32, shape=[None, None], name='targets_y')
self.dropout = tf.placeholder(tf.float32, name='dropout')
self.seq_leng = tf.placeholder(tf.int32, shape=[None, ], name='seq_leng')

with tf.name_scope("RNNcell"):
    stacked_cell = rnn_cell(FLAGS, self.dropout)

with tf.name_scope("Inputs"):
    with tf.variable_scope('rnn_inputs'):
        W_input = tf.get_variable("W_input", [FLAGS.en_vocab_size, FLAGS.num_hidden_units], initializer=tf.truncated_normal_initializer(stddev=0.1))
    inputs = rnn_inputs(FLAGS, self.inputs_X)
    # initial_state = stacked_cell.zero_state(FLAGS.batch_size, tf.float32)

with tf.name_scope("DynamicRnn"):
    # flat_inputs = tf.reshape(inputs, [FLAGS.batch_size, -1, FLAGS.num_hidden_units])
    flat_inputs = tf.transpose(tf.reshape(inputs, [-1, FLAGS.batch_size, FLAGS.num_hidden_units]), perm=[1, 0, 2])
    all_outputs, state = tf.nn.dynamic_rnn(cell=stacked_cell, inputs=flat_inputs, sequence_length=self.seq_leng, dtype=tf.float32)
    outputs = state[0]

with tf.name_scope("Logits"):
    with tf.variable_scope('rnn_softmax'):
        W_softmax = tf.get_variable("W_softmax", [FLAGS.num_hidden_units, FLAGS.num_classes])
        b_softmax = tf.get_variable("b_softmax", [FLAGS.num_classes])
    logits = rnn_softmax(FLAGS, outputs)
    probabilities = tf.nn.softmax(logits, name="probabilities")
    self.accuracy = tf.equal(tf.argmax(self.targets_y, 1), tf.argmax(logits, 1))

with tf.name_scope("Loss"):
    self.loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=self.targets_y))

with tf.name_scope("Grad"):
    self.lr = tf.Variable(0.0, trainable=False)
    trainable_vars = tf.trainable_variables()
    grads, _ = tf.clip_by_global_norm(tf.gradients(self.loss, trainable_vars), FLAGS.max_gradient_norm)
    optimizer = tf.train.AdamOptimizer(self.lr)
    self.train_optimizer = optimizer.apply_gradients(zip(grads, trainable_vars))

sampling_outputs = all_outputs[0]
sampling_logits = rnn_softmax(FLAGS, sampling_outputs)
self.sampling_probabilities = tf.nn.softmax(sampling_logits)
Output printed
EPOCH 7 SUMMARY 40 STEP
Training loss 0.439
Training accuracy 0.247
----------------------
Validation loss 0.452
Validation accuracy 0.234
----------------------
Saving the model.
EPOCH 8 SUMMARY 45 STEP
Training loss 0.429
Training accuracy 0.281
----------------------
Validation loss 0.462
Validation accuracy 0.203
----------------------
Saving the model.
EPOCH 9 SUMMARY 50 STEP
Training loss 0.428
Training accuracy 0.268
----------------------
Validation loss 0.465
Validation accuracy 0.188
----------------------
Saving the model.
EPOCH 10 SUMMARY 55 STEP
Training loss 0.424
Training accuracy 0.284
----------------------
Validation loss 0.455
Validation accuracy 0.172
----------------------
Saving the model.
EPOCH 11 SUMMARY 60 STEP
Training loss 0.421
Training accuracy 0.305
----------------------
Validation loss 0.461
Validation accuracy 0.156
----------------------
Saving the model.
EPOCH 12 SUMMARY 65 STEP
Training loss 0.418
Training accuracy 0.299
----------------------
Validation loss 0.462
Validation accuracy 0.141
----------------------
Saving the model.
EPOCH 13 SUMMARY 70 STEP
Training loss 0.416
Training accuracy 0.286
----------------------
Validation loss 0.462
Validation accuracy 0.156
----------------------
Saving the model.
EPOCH 14 SUMMARY 75 STEP
Training loss 0.413
Training accuracy 0.323
----------------------
Validation loss 0.468
Validation accuracy 0.141
----------------------
Saving the model.
After 165 epochs:
EPOCH 165 SUMMARY 830 STEP
Training loss 0.306
Training accuracy 0.544
----------------------
Validation loss 0.547
Validation accuracy 0.109
----------------------
Saving the model.
If training loss goes down but validation loss goes up, it is likely that you are running into the problem of overfitting. This means: generally speaking, it is not that hard for a machine learning algorithm to perform exceptionally well on the training set (i.e. the training loss is very low). If the algorithm just memorizes the training data set, it will produce a perfect score.
The challenge in machine learning, however, is to devise a model that performs well on unseen data, i.e. data that was not presented to the algorithm during training. This is what your validation set represents. If a model performs well on unseen data, we say that it generalizes well. If a model performs well only on the training data, we call this overfitting. A model that does not generalize well is essentially useless, as it has not learned anything about the underlying structure of the data but has just memorized the training set. This is useless because a trained model will be used on new data and probably never on the data used during training.
So how can you prevent that?
Reduce your model's capacity, i.e. take a simpler model and see if this can still accomplish the task. A less powerful model is less susceptible to simply memorize the data. Cf. also Occam's razor.
Use regularization: use e.g. dropout regularization in your model or add e.g. L1 or L2 norm of your trainable parameters to your loss function.
To get more information about this, search online for regularization, overfitting, etc.
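A minimal sketch of both ideas applied to the graph from the question (reusing its variable names; l2_lambda is an illustrative value, not from the original code):

# 1. Dropout on the final RNN state before the softmax projection
outputs = tf.nn.dropout(outputs, keep_prob=self.dropout)
logits = rnn_softmax(FLAGS, outputs)

# 2. L2 penalty on all trainable parameters, added to the existing loss
l2_lambda = 1e-4
l2_loss = l2_lambda * tf.add_n([tf.nn.l2_loss(v) for v in tf.trainable_variables()])
self.loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=self.targets_y)) + l2_loss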