How to start training from a different epoch in Mask R-CNN? - tensorflow

I am training the Mask R-CNN model.
I saved the weights after training the 'heads' layers for 2 epochs, and I want to continue from epoch three. But the model.train() function does not have an initial_epoch argument the way model.fit does for a Sequential model, for example.
I have the following code, but if I run it with the loaded weights it starts from the first epoch, and I don't want that:
EPOCHS = [1, 3, 5, 8]

model.train(dataset_train, dataset_val,
            learning_rate=LEARNING_RATE,
            epochs=EPOCHS[1],
            layers='all',
            augmentation=augmentation)
I would appreciate it if someone could tell me what the substitute for initial_epoch is in my case.

After the first 2 epochs of fitting, your model has already updated its weights. So when you call train again, the model simply continues training from those weights; your progress won't be lost.
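If you are using the Matterport Mask R-CNN implementation (an assumption on my part), epochs in model.train() is interpreted as the target total epoch number rather than the number of additional epochs, and loading a checkpoint whose filename follows the library's own naming scheme also restores the internal epoch counter. A minimal sketch of resuming, where WEIGHTS_PATH is a hypothetical name for your saved checkpoint:

# Assumes the Matterport Mask R-CNN API; WEIGHTS_PATH points to the checkpoint
# saved after the 2 "heads" epochs (hypothetical variable name).
model.load_weights(WEIGHTS_PATH, by_name=True)

# `epochs` is the total target epoch count, not "how many more" epochs:
# with 2 epochs already done, epochs=EPOCHS[1] (i.e. 3) continues with epoch 3.
model.train(dataset_train, dataset_val,
            learning_rate=LEARNING_RATE,
            epochs=EPOCHS[1],
            layers='all',
            augmentation=augmentation)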

Related

Stateful LSTM Tensorflow Invalid Input_h Shape Error

I am experimenting with a stateful LSTM on a time-series regression problem using TensorFlow. I apologize that I cannot share the dataset.
Below is my code.
train_feature = train_feature.reshape((train_feature.shape[0], 1, train_feature.shape[1]))
val_feature = val_feature.reshape((val_feature.shape[0], 1, val_feature.shape[1]))

batch_size = 64

model = tf.keras.Sequential()
model.add(tf.keras.layers.LSTM(50, batch_input_shape=(batch_size, train_feature.shape[1], train_feature.shape[2]), stateful=True))
model.add(tf.keras.layers.Dense(1))

model.compile(optimizer='adam',
              loss='mse',
              metrics=[tf.keras.metrics.RootMeanSquaredError()])

model.fit(train_feature, train_label,
          epochs=10,
          batch_size=batch_size)
When I run the above code, at the end of the first epoch I get the following error:
InvalidArgumentError: [_Derived_] Invalid input_h shape: [1,64,50] [1,49,50]
[[{{node CudnnRNN}}]]
[[sequential_1/lstm_1/StatefulPartitionedCall]] [Op:__inference_train_function_1152847]
Function call stack:
train_function -> train_function -> train_function
However, the model trains successfully if I change batch_size to 1 and change the training code to the following:
total_epochs = 10

for i in range(total_epochs):
    model.fit(train_feature, train_label,
              epochs=1,
              validation_data=(val_feature, val_label),
              batch_size=batch_size,
              shuffle=False)
    model.reset_states()
Nevertheless, with very large data (1 million rows), training takes a very long time when the batch_size is 1.
So I wonder: how can I train a stateful LSTM with a batch size larger than 1 (e.g. 64) without getting the invalid input_h shape error?
Thanks for your answers.
The fix is to ensure batch size never changes between batches. They must all be the same size.
Method 1
One way is to use a batch size that perfectly divides your dataset into equal-sized batches. For example, if total size of data is 1500 examples, then use a batch size of 50 or 100 or some other proper divisor of 1500.
batch_size = len(data) // proper_divisor
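If you don't have a divisor in mind, a small helper like the one below (purely illustrative, not part of the original answer) lists the candidate batch sizes that split the data into equal batches:

# List every batch size up to max_size that divides the data evenly.
def even_batch_sizes(n_examples, max_size=256):
    return [b for b in range(1, max_size + 1) if n_examples % b == 0]

print(even_batch_sizes(len(train_feature)))  # pick any of these as batch_size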
Method 2
The other way is to ignore any batch that is smaller than the specified size, which can be done with the TensorFlow Dataset API by setting drop_remainder to True.
batch_size = 64

train_data = tf.data.Dataset.from_tensor_slices((train_feature, train_label))
train_data = train_data.repeat().batch(batch_size, drop_remainder=True)

steps_per_epoch = len(train_feature) // batch_size

model.fit(train_data,
          epochs=10, steps_per_epoch=steps_per_epoch)
When using the Dataset API like this, you also need to specify how many rounds of training count as an epoch (essentially, how many batches make up 1 epoch). A tf.data.Dataset instance (the result of tf.data.Dataset.from_tensor_slices) doesn't know the size of the data it streams to the model, so what constitutes one epoch has to be specified manually with steps_per_epoch.
Your new code will look like this:
train_feature = train_feature.reshape((train_feature.shape[0], 1, train_feature.shape[1]))
val_feature = val_feature.reshape((val_feature.shape[0], 1, val_feature.shape[1]))

batch_size = 64

train_data = tf.data.Dataset.from_tensor_slices((train_feature, train_label))
train_data = train_data.repeat().batch(batch_size, drop_remainder=True)

model = tf.keras.Sequential()
model.add(tf.keras.layers.LSTM(50, batch_input_shape=(batch_size, train_feature.shape[1], train_feature.shape[2]), stateful=True))
model.add(tf.keras.layers.Dense(1))

model.compile(optimizer='adam',
              loss='mse',
              metrics=[tf.keras.metrics.RootMeanSquaredError()])

steps_per_epoch = len(train_feature) // batch_size

model.fit(train_data,
          epochs=10, steps_per_epoch=steps_per_epoch)
You can also include the validation set, like this (other code not shown):
batch_size = 64

val_data = tf.data.Dataset.from_tensor_slices((val_feature, val_label))
val_data = val_data.repeat().batch(batch_size, drop_remainder=True)

validation_steps = len(val_feature) // batch_size

model.fit(train_data, epochs=10,
          steps_per_epoch=steps_per_epoch,
          validation_data=val_data,
          validation_steps=validation_steps)
Caveat: this means a few datapoints will never be seen by the model. To get around that, you can shuffle the dataset each round of training, so that the datapoints left out change from epoch to epoch, giving every example a chance to be seen by the model.
buffer_size = 1000  # the bigger, the slower but more effective the shuffling

train_data = tf.data.Dataset.from_tensor_slices((train_feature, train_label))
train_data = train_data.shuffle(buffer_size=buffer_size, reshuffle_each_iteration=True)
train_data = train_data.repeat().batch(batch_size, drop_remainder=True)
Why the error occurs
Stateful RNNs and their variants (LSTM, GRU, etc.) require a fixed batch size. The reason is simply that statefulness is one way to realize truncated backpropagation through time: the final hidden state of one batch is passed as the initial hidden state of the next batch. The final hidden state of the first batch therefore has to have exactly the same shape as the initial hidden state of the next batch, which requires the batch size to stay the same across batches.
When you set the batch size to 64, model.fit uses whatever data remains at the end of an epoch as a batch, and that batch may contain fewer than 64 datapoints. So you get this error because the batch size differs from what the stateful LSTM expects. You don't have the problem with a batch size of 1 because any remaining data at the end of an epoch always contains exactly 1 datapoint, so no error occurs; more generally, 1 is a divisor of any integer, so if you pick any other divisor of your data size you should not get the error.
In the error message you posted, it appears the last batch has a size of 49 instead of 64. On a side note: the reason the shapes look different from the input is that, under the hood, Keras works with the tensors in time-major format (i.e. the first axis holds the steps of the sequence). When you pass a tensor of shape (10, 15, 2), representing (batch_size, steps_per_sequence, num_features), Keras reshapes it to (15, 10, 2) under the hood.
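You can check this directly: the size of the short final batch is just the remainder of the dataset size divided by the batch size (an illustrative check, not from the original answer):

# A non-zero remainder (49 in the error above) means the last batch is short
# and the stateful LSTM will reject it.
print(len(train_feature) % batch_size)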

Validation loss is inconsistent if I predict same results and calculate loss afterwards

I have an LSTM model that predicts the weather. When I run the model with model.fit, it reports around 20% MAPE.
When I try to predict the same data that was given to model.fit and calculate the loss myself, the result is 60% MAPE. What might be causing this difference? I would have ignored it, but the difference is too large.
Here is my code in main:
# preparing the data and building the model first

regressor.fit(x_train, y_train, epochs=100, batch_size=32,
              validation_data=(x_test, y_test))

results = regressor.predict(x_test)
print(bm.mean_absolute_percentage_error(y_test, results))
in bm:
def mean_absolute_percentage_error(real, est):
    """Calculates the mean absolute percentage error."""
    sess = Session()
    with sess.as_default():
        tensor = losses.mean_absolute_percentage_error(real, est)
        return tensor.eval()[-1]
I used the same function that Keras uses to calculate MAPE. Even if I made a mistake when preparing the test data, both should be consistently wrong because they take the same set as an argument.

Understanding epoch, batch size, accuracy and performance gain in lstm forecasting model

I am new to machine learning and LSTMs. I am referring to this link, LSTM for multistep forecasting, specifically the Encoder-Decoder LSTM Model With Multivariate Input section.
Here is my dataset description after reshaping the train and test set.
print(dataset.shape)
print(train_x.shape, train_y.shape)
print(test.shape)

(2192, 15)
(1806, 14, 14) (1806, 7, 1)
(364, 15)
In the above I have n_input=14 and n_out=7.
Here is my LSTM model definition:
def build_model(train, n_input):
    # prepare data
    train_x, train_y = to_supervised(train, n_input)
    # define parameters
    verbose, epochs, batch_size = 2, 100, 16
    n_timesteps, n_features, n_outputs = train_x.shape[1], train_x.shape[2], train_y.shape[1]
    # reshape output into [samples, timesteps, features]
    train_y = train_y.reshape((train_y.shape[0], train_y.shape[1], 1))
    # define model
    model = Sequential()
    model.add(LSTM(200, activation='relu', input_shape=(n_timesteps, n_features)))
    model.add(RepeatVector(n_outputs))
    model.add(LSTM(200, activation='relu', return_sequences=True))
    model.add(TimeDistributed(Dense(100, activation='relu')))
    model.add(TimeDistributed(Dense(1)))
    model.compile(loss='mse', optimizer='adam')
    # fit network
    model.fit(train_x, train_y, epochs=epochs, batch_size=batch_size, verbose=verbose)
    return model
On evaluating the model, I get the following output:
Epoch 98/100
- 8s - loss: 64.6554
Epoch 99/100
- 7s - loss: 64.4012
Epoch 100/100
- 7s - loss: 63.9625
According to my understanding (please correct me if I am wrong):
Here my model accuracy is 63.9625 (looking at the last epoch, 100). Also, this is not stable, since there is a gap between epoch 99 and epoch 100.
Here are my questions:
How are the epochs and batch size defined above related to model accuracy? How does increasing or decreasing them affect model accuracy?
Are my above-defined epochs, batch size, and n_input correct for the model?
How can I increase my model accuracy? Is the above dataset size good enough for this model?
I am not able to connect all these parameters, so kindly help me understand how to achieve more accuracy by tuning the above factors.
Having a very large number of epochs will not necessarily improve your accuracy. More epochs can increase accuracy up to a certain limit, beyond which you begin to overfit your model; having very few will result in underfitting. See this. So, looking at the big difference between epoch 99 and epoch 100, you can already tell that you are overfitting the model. As a rule of thumb, the ideal number of epochs is reached when you notice the accuracy stops increasing, usually between 1 and 10; 100 already seems too much (a sketch automating this stopping rule follows the list below).
Batch size does not affect your accuracy. It is just used to control the speed or performance based on the memory of your GPU. If you have a lot of memory, you can use a large batch size so training is faster.
What you can do to increase your accuracy is:
1. Increase your dataset for the training.
2. Try using Convolutional Networks instead. Find more on convolutional networks on this YouTube channel; in a nutshell, CNNs help you identify which features to focus on when training your model.
3. Try other algorithms.
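The stopping rule mentioned above can be automated with Keras's EarlyStopping callback and a validation split. A minimal sketch, where the patience value and the 20% validation split are assumptions rather than part of the original answer:

from tensorflow.keras.callbacks import EarlyStopping

# Stop when validation loss has not improved for 5 consecutive epochs,
# keeping the best weights seen so far.
early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

model.fit(train_x, train_y, epochs=100, batch_size=16,
          validation_split=0.2,     # hold out the last 20% of samples
          callbacks=[early_stop])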
There is no well-defined formula for batch size. Typically a larger batch size runs faster but may compromise your accuracy; you will have to experiment with the number.
However, one component with regard to epochs that you are missing is validation. It is normal to hold out a validation dataset and observe whether the accuracy on that dataset goes up or down. If the accuracy over this dataset goes up, you can multiply your learning rate by 0.8. See this link: https://machinelearningmastery.com/difference-test-validation-datasets/
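A common way to adjust the learning rate based on validation performance in Keras is the ReduceLROnPlateau callback, which multiplies the learning rate by a factor whenever a monitored metric stops improving. A minimal sketch; the monitored metric, factor, and patience here are illustrative choices, and val_x/val_y stand in for your held-out set:

from tensorflow.keras.callbacks import ReduceLROnPlateau

# Multiply the learning rate by 0.8 whenever validation loss has not improved
# for 3 consecutive epochs.
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.8, patience=3, verbose=1)

model.fit(train_x, train_y, epochs=100, batch_size=16,
          validation_data=(val_x, val_y),   # val_x / val_y: your validation set
          callbacks=[reduce_lr])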

Keras model.load_weights(WEIGHTS) provide inaccurate results

I'm training an LSTM RNN for description generation using Keras (TensorFlow backend) with the MSCOCO dataset. During training the model reached 92% accuracy with a loss of 0.79. Furthermore, while the model was training, I tested description generation at each epoch, and the model gave very good predictions, producing a meaningful description when given a random word.
However, after training, I loaded the model using Keras's model.load_weights(WEIGHTS) method and tried to create a description by giving it a random word as I had done before. But now the model does not produce a meaningful description; it just outputs random words that have no meaning at all.
Can anyone tell me what could be causing this?
My model parameters are:
10 LSTM layers, Learning rate: 0.04, Activation: Softmax, Loss Function: Categorical Cross entropy, Optimizer: rmsprop
UPDATE:
This is my model:
model = Sequential()
model.add(LSTM(HIDDEN_DIM, input_shape=(None, VOCAB_SIZE), return_sequences=True))
for i in range(LAYER_NUM - 1):
    model.add(LSTM(HIDDEN_DIM, return_sequences=True))
model.add(TimeDistributed(Dense(VOCAB_SIZE)))
model.add(Activation('softmax'))
model.add(Dropout(0.04))
model.compile(loss="categorical_crossentropy", optimizer="rmsprop", metrics=['accuracy'])
This is how I train & save my model weights (I generate a description at each epoch to test the accuracy):
model.fit(X, Y, batch_size=BATCH_SIZE, verbose=1, epochs=EPOCHS)
EPOCHS += 1
generate_description(model, GENERATE_LENGTH, VOCAB_SIZE, index_to_word)
model.save_weights('checkpoint_layer_{}_hidden_{}_epoch_{}.hdf5'.format(LAYER_NUM, HIDDEN_DIM, EPOCHS))
This is how I load my model (WEIGHTS = my saved model):
model.load_weights(WEIGHTS)
desc = generate_description(model, GENERATE_LENGTH, VOCAB_SIZE, index_to_word)
print(desc)
I provide a randomly generated vector to my model for testing. This is how I generate the description:
def generate_description(model, length, vocab_size, index_to_word):
    index = [np.random.randint(vocab_size)]
    Y_word = [index_to_word[index[-1]]]
    X = np.zeros((1, length, vocab_size))
    for i in range(length):
        # Appending the last predicted word to next timestep
        X[0, i, :][index[-1]] = 1
        print(index_to_word[index[-1]])
        index = np.argmax(model.predict(X[:, :i + 1, :])[0], 1)
        Y_word.append(index_to_word[index[-1]])
    Y_word.append(' ')
    return ('').join(Y_word)

Tensorflow dynamic_rnn training loss decreasing, validation loss increasing

I am adding my RNN text classification model. I am using the last state to classify the text. The dataset is small; I am using GloVe vectors for the embedding.
def rnn_inputs(FLAGS, input_data):
    with tf.variable_scope('rnn_inputs', reuse=True):
        W_input = tf.get_variable("W_input", [FLAGS.en_vocab_size, FLAGS.num_hidden_units])
    embeddings = tf.nn.embedding_lookup(W_input, input_data)
    return embeddings

self.inputs_X = tf.placeholder(tf.int32, shape=[None, None, FLAGS.num_dim_input], name='inputs_X')
self.targets_y = tf.placeholder(tf.float32, shape=[None, None], name='targets_y')
self.dropout = tf.placeholder(tf.float32, name='dropout')
self.seq_leng = tf.placeholder(tf.int32, shape=[None, ], name='seq_leng')

with tf.name_scope("RNNcell"):
    stacked_cell = rnn_cell(FLAGS, self.dropout)

with tf.name_scope("Inputs"):
    with tf.variable_scope('rnn_inputs'):
        W_input = tf.get_variable("W_input", [FLAGS.en_vocab_size, FLAGS.num_hidden_units], initializer=tf.truncated_normal_initializer(stddev=0.1))
    inputs = rnn_inputs(FLAGS, self.inputs_X)
    # initial_state = stacked_cell.zero_state(FLAGS.batch_size, tf.float32)

with tf.name_scope("DynamicRnn"):
    # flat_inputs = tf.reshape(inputs, [FLAGS.batch_size, -1, FLAGS.num_hidden_units])
    flat_inputs = tf.transpose(tf.reshape(inputs, [-1, FLAGS.batch_size, FLAGS.num_hidden_units]), perm=[1, 0, 2])
    all_outputs, state = tf.nn.dynamic_rnn(cell=stacked_cell, inputs=flat_inputs, sequence_length=self.seq_leng, dtype=tf.float32)
    outputs = state[0]

with tf.name_scope("Logits"):
    with tf.variable_scope('rnn_softmax'):
        W_softmax = tf.get_variable("W_softmax", [FLAGS.num_hidden_units, FLAGS.num_classes])
        b_softmax = tf.get_variable("b_softmax", [FLAGS.num_classes])
    logits = rnn_softmax(FLAGS, outputs)
    probabilities = tf.nn.softmax(logits, name="probabilities")
    self.accuracy = tf.equal(tf.argmax(self.targets_y, 1), tf.argmax(logits, 1))

with tf.name_scope("Loss"):
    self.loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=self.targets_y))

with tf.name_scope("Grad"):
    self.lr = tf.Variable(0.0, trainable=False)
    trainable_vars = tf.trainable_variables()
    grads, _ = tf.clip_by_global_norm(tf.gradients(self.loss, trainable_vars), FLAGS.max_gradient_norm)
    optimizer = tf.train.AdamOptimizer(self.lr)
    self.train_optimizer = optimizer.apply_gradients(zip(grads, trainable_vars))

sampling_outputs = all_outputs[0]
sampling_logits = rnn_softmax(FLAGS, sampling_outputs)
self.sampling_probabilities = tf.nn.softmax(sampling_logits)
Output printed
EPOCH 7 SUMMARY 40 STEP
Training loss 0.439
Training accuracy 0.247
----------------------
Validation loss 0.452
Validation accuracy 0.234
----------------------
Saving the model.
EPOCH 8 SUMMARY 45 STEP
Training loss 0.429
Training accuracy 0.281
----------------------
Validation loss 0.462
Validation accuracy 0.203
----------------------
Saving the model.
EPOCH 9 SUMMARY 50 STEP
Training loss 0.428
Training accuracy 0.268
----------------------
Validation loss 0.465
Validation accuracy 0.188
----------------------
Saving the model.
EPOCH 10 SUMMARY 55 STEP
Training loss 0.424
Training accuracy 0.284
----------------------
Validation loss 0.455
Validation accuracy 0.172
----------------------
Saving the model.
EPOCH 11 SUMMARY 60 STEP
Training loss 0.421
Training accuracy 0.305
----------------------
Validation loss 0.461
Validation accuracy 0.156
----------------------
Saving the model.
EPOCH 12 SUMMARY 65 STEP
Training loss 0.418
Training accuracy 0.299
----------------------
Validation loss 0.462
Validation accuracy 0.141
----------------------
Saving the model.
EPOCH 13 SUMMARY 70 STEP
Training loss 0.416
Training accuracy 0.286
----------------------
Validation loss 0.462
Validation accuracy 0.156
----------------------
Saving the model.
EPOCH 14 SUMMARY 75 STEP
Training loss 0.413
Training accuracy 0.323
----------------------
Validation loss 0.468
Validation accuracy 0.141
----------------------
Saving the model.
After 165 epochs:
EPOCH 165 SUMMARY 830 STEP
Training loss 0.306
Training accuracy 0.544
----------------------
Validation loss 0.547
Validation accuracy 0.109
----------------------
Saving the model.
If the training loss goes down but the validation loss goes up, it is likely that you are running into the problem of overfitting. This means: generally speaking, it is not that hard for a machine learning algorithm to perform exceptionally well on the training set (i.e. the training loss is very low). If the algorithm just memorizes the training data set, it will produce a perfect score.
The challenge in machine learning, however, is to devise a model that performs well on unseen data, i.e. data that was not presented to the algorithm during training. This is what your validation set represents. If a model performs well on unseen data, we say that it generalizes well. If a model performs well only on the training data, we call this overfitting. A model that does not generalize well is essentially useless, as it has not learned anything about the underlying structure of the data but has just memorized the training set. This matters because a trained model will be used on new data, and probably never again on the data used during training.
So how can you prevent that?
Reduce your model's capacity, i.e. take a simpler model and see whether it can still accomplish the task. A less powerful model is less able to simply memorize the data. Cf. also Occam's razor.
Use regularization: for example, use dropout regularization in your model, or add the L1 or L2 norm of your trainable parameters to your loss function.
To get more information about this, search online for regularization, overfitting, etc.
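A minimal TF1-style sketch of the regularization suggestion, in the spirit of the question's graph code; the l2_lambda value and the cell construction are illustrative assumptions, and num_hidden_units, keep_prob, and loss stand in for the question's own variables:

import tensorflow as tf  # TF1-style graph mode, matching the question's code

# L2 penalty: add the squared norm of all trainable (non-bias) weights to the loss.
l2_lambda = 1e-4  # hypothetical regularization strength; tune on the validation set
l2_penalty = tf.add_n([tf.nn.l2_loss(v) for v in tf.trainable_variables()
                       if 'bias' not in v.name.lower()])
loss = loss + l2_lambda * l2_penalty  # in the question's code this is self.loss

# Dropout on the recurrent cell (skip if your rnn_cell helper already applies it):
cell = tf.nn.rnn_cell.LSTMCell(num_hidden_units)
cell = tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=keep_prob)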