I have a time-series dataset and I trained an LSTM on it. I trained for 200 epochs and the resulting loss and val_loss values looked pretty good (IMO).
I then thought the result could be even better with more epochs, so I retrained with 400 epochs, but the loss and val_loss went up.
Somehow the result is different and even worse. Is it better to stick with the 200-epoch model, or is there really a situation where more epochs can make the model worse?
This is probably because your learning rate (lr) is too large; you could try reducing it. From the graph, the training loss itself increased, so I don't think this is a case of overfitting.
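For example, in Keras you could pass an explicitly smaller learning rate when compiling. This is only a sketch: the question does not say which optimizer or loss is used, so Adam and MSE here are assumptions.

from tensorflow.keras.optimizers import Adam

# Adam's default learning rate is 1e-3; dropping it by a factor of 10 is a common first try
model.compile(loss='mse', optimizer=Adam(learning_rate=1e-4))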
I trained an ML model in TensorFlow, and at that time the test accuracy was high, so I saved the model. Now, some days later, I have loaded that pretrained model, and this time I am not getting the same test accuracy on the same test dataset; the accuracy has dropped a lot.
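(For reference, the save/load/evaluate round trip described above, sketched with tf.keras; the file path and the test_x/test_y variables are placeholders.)

import tensorflow as tf

# after training: persist architecture + weights + optimizer state
model.save('my_model.h5')                            # placeholder path

# days later: restore the model and re-evaluate on the same test set
restored = tf.keras.models.load_model('my_model.h5')
loss, acc = restored.evaluate(test_x, test_y, verbose=0)
print('test accuracy:', acc)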
I am training my neural network and the loss and accuracy graphs look like this:
The training accuracy drops to 0 after epoch 55.
I am training a Bi-LSTM for text classification. I use word embeddings as the features and Adam as the optimizer.
Does anyone know why? Thanks!
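(For context, a minimal Keras version of the setup described: word embeddings feeding a Bi-LSTM classifier trained with Adam. The vocabulary size, sequence length and number of classes below are placeholders.)

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, Dense

vocab_size, seq_len, n_classes = 20000, 100, 5       # placeholder values

model = Sequential([
    Embedding(vocab_size, 128, input_length=seq_len),  # word-embedding features
    Bidirectional(LSTM(64)),                           # Bi-LSTM encoder
    Dense(n_classes, activation='softmax'),
])
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam', metrics=['accuracy'])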
I am new to machine learning and LSTMs. I am following this link, LSTM for multistep forecasting, specifically the Encoder-Decoder LSTM Model With Multivariate Input section.
Here are the dataset shapes after reshaping the train and test sets:
print(dataset.shape)
print(train_x.shape, train_y.shape)
print(test.shape)
(2192, 15)
(1806, 14, 14) (1806, 7, 1)
(364, 15)
In the above, I have n_input=14 and n_out=7.
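(To make the shapes concrete, a generic sliding-window split looks roughly like the sketch below. This is not the to_supervised from the tutorial, and the exact column handling evidently differs, since train_x ends up with 14 features; it is only meant to illustrate how n_input=14 past timesteps map to n_out=7 future target values.)

import numpy as np

def make_windows(data, n_input=14, n_out=7):
    # data: 2-D array of shape (timesteps, features)
    X, y = [], []
    for start in range(len(data) - n_input - n_out + 1):
        in_end = start + n_input
        out_end = in_end + n_out
        X.append(data[start:in_end, :])     # n_input timesteps of input features
        y.append(data[in_end:out_end, 0])   # next n_out values of the target column
    return np.array(X), np.array(y)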
Here is my LSTM model definition:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, RepeatVector, TimeDistributed, Dense

def build_model(train, n_input):
    # prepare data (to_supervised is defined as in the linked tutorial)
    train_x, train_y = to_supervised(train, n_input)
    # define parameters
    verbose, epochs, batch_size = 2, 100, 16
    n_timesteps, n_features, n_outputs = train_x.shape[1], train_x.shape[2], train_y.shape[1]
    # reshape output into [samples, timesteps, features]
    train_y = train_y.reshape((train_y.shape[0], train_y.shape[1], 1))
    # define model
    model = Sequential()
    model.add(LSTM(200, activation='relu', input_shape=(n_timesteps, n_features)))
    model.add(RepeatVector(n_outputs))
    model.add(LSTM(200, activation='relu', return_sequences=True))
    model.add(TimeDistributed(Dense(100, activation='relu')))
    model.add(TimeDistributed(Dense(1)))
    model.compile(loss='mse', optimizer='adam')
    # fit network
    model.fit(train_x, train_y, epochs=epochs, batch_size=batch_size, verbose=verbose)
    return model
On evaluating the model, I am getting the output as:
Epoch 98/100
- 8s - loss: 64.6554
Epoch 99/100
- 7s - loss: 64.4012
Epoch 100/100
- 7s - loss: 63.9625
According to my understanding (please correct me if I am wrong):
Here my model's accuracy is 63.9625 (looking at the last epoch, 100). Also, this is not stable, since there is still a gap between epoch 99 and epoch 100.
Here are my questions:
How are the epochs and batch size defined above related to model accuracy? How does increasing or decreasing them affect the accuracy?
Are the epochs, batch size and n_input I defined above appropriate for this model?
How can I increase my model's accuracy? Is the dataset size above large enough for this model?
I am not able to connect all these parameters, so please help me understand how to achieve better accuracy through the factors above.
Using a very large number of epochs will not necessarily improve your accuracy. More epochs can increase accuracy up to a certain limit, beyond which you begin to overfit your model; too few epochs will result in underfitting. See this. So, looking at the huge difference between epoch 99 and epoch 100, you can already tell that you are overfitting the model. As a rule of thumb, the ideal number of epochs is reached when you notice the accuracy has stopped increasing, usually somewhere between 1 and 10; 100 already seems like too much.
Batch size does not affect your accuracy. It is just used to control speed and performance based on the memory of your GPU: if you have a lot of memory, you can use a large batch size so that training runs faster.
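If you want to automate the "stop when the metric stops improving" rule of thumb above, Keras provides an EarlyStopping callback; a minimal sketch, assuming a validation set (val_x, val_y) is available:

from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss', patience=10,
                           restore_best_weights=True)
model.fit(train_x, train_y, epochs=100, batch_size=16,
          validation_data=(val_x, val_y),   # val_x / val_y are placeholders
          callbacks=[early_stop], verbose=2)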
What you can do to increase your accuracy is:
1. Increase the size of your training dataset.
2. Try using convolutional networks instead. You can find more on convolutional networks on this YouTube channel; in a nutshell, CNNs help you identify which features to focus on when training your model.
3. Try other algorithms.
There is no well-defined formula for batch size. Typically, a larger batch size will run faster but may compromise your accuracy. You will have to experiment with the number.
However, one component related to epochs that you are missing is validation. It is normal to hold out a validation dataset and observe whether the accuracy on it goes up or down. If the accuracy on this dataset goes up, you can multiply your learning rate by 0.8. See this link: https://machinelearningmastery.com/difference-test-validation-datasets/
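One common way to implement this kind of learning-rate adjustment in Keras is the ReduceLROnPlateau callback; the sketch below shrinks the rate by a factor of 0.8 when the validation loss stops improving, which is the usual variant (the monitored metric and patience are choices you would tune):

from tensorflow.keras.callbacks import ReduceLROnPlateau

reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.8, patience=5)
model.fit(train_x, train_y, epochs=100, batch_size=16,
          validation_split=0.2,             # hold out 20% of training data for validation
          callbacks=[reduce_lr], verbose=2)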
I'm trying to apply this example model to input images that are much larger (224x224 RGB). With stochastic gradient descent training I get initial loss values that are extremely high but then they drop to 0:
Minibatch loss at step 0: 85038.437500
Minibatch accuracy: 7.0%
Minibatch loss at step 500: 4275149.500000
Minibatch accuracy: 46.9%
Minibatch loss at step 1000: 6613.396484
Minibatch accuracy: 98.4%
Minibatch loss at step 1500: 0.000000
Minibatch accuracy: 100.0%
Minibatch loss at step 2000: 0.000000
Minibatch accuracy: 100.0%
Minibatch loss at step 2500: 0.000000
Minibatch accuracy: 100.0%
Minibatch loss at step 3000: 0.000000
Minibatch accuracy: 100.0%
Test accuracy: 86.9%
I've tried setting the learning rate of the GradientDescentOptimizer to 0.1 and 0.01, but it doesn't help.
What does it mean for the loss to drop to zero? How can I prevent this? Is this model inherently not applicable to this input set?
Zero loss means the model fits the training data perfectly, and this is confirmed by your result of 100% training accuracy.
That is actually pretty good, but I also see signs of overfitting, since the test accuracy is only 86.9%, considerably lower than the training accuracy. This means the model fits the training data too well and is also fitting noise that is simply not present in the test data. The model does still generalize, but not as well as the training numbers suggest; if you look at the test loss (instead of the accuracy) you will see that it is non-zero.
How can you prevent overfitting? The model you are using is quite simple and does not seem to use any kind of regularization. Adding L1/L2 regularization, Dropout or Batch Normalization should reduce the overfitting.
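For example, in Keras, L2 regularization, Dropout and Batch Normalization can be added along these lines (a sketch on a generic dense block, not the exact model from the question; input_dim and n_classes are placeholders):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization
from tensorflow.keras.regularizers import l2

model = Sequential([
    Dense(256, activation='relu', kernel_regularizer=l2(1e-4),  # L2 weight penalty
          input_shape=(input_dim,)),
    BatchNormalization(),
    Dropout(0.5),                     # randomly drop half of the units during training
    Dense(n_classes, activation='softmax'),
])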