This is the exact code I'm running with Keras and TensorFlow as the backend. With each run of the same program, the training results differ: sometimes it reaches 100% accuracy around the 400th epoch, sometimes around the 200th.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# XOR inputs and targets
training_data = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], "float32")
target_data = np.array([[0], [1], [1], [0]], "float32")

model = Sequential()
model.add(Dense(4, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='mean_squared_error',
              optimizer='adam',
              metrics=['binary_accuracy'])
model.fit(training_data, target_data, epochs=500, verbose=2)
Epoch 403/500
0s - loss: 0.2256 - binary_accuracy: 0.7500
So why does the result change on each execution when the training data is fixed? I would greatly appreciate an explanation.
The training set is fixed, but the initial weights of the neural network are initialized to random values in a small range, so each time you train the network you get slightly different results.
If you want reproducible results, you can fix the NumPy random seed with numpy.random.seed so the same initial weights are used every time, but beware that this can bias your network.
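A minimal sketch of fixing the seeds (assuming a TensorFlow backend; the exact TF call depends on your version):
import random
import numpy as np
import tensorflow as tf

# Fix the seeds before building the model so the initial weights
# (and any shuffling) are the same on every run.
random.seed(42)
np.random.seed(42)
tf.random.set_seed(42)  # on TF 1.x use tf.set_random_seed(42)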
I am using Keras to perform image classification. I have 10 classes with ~900 images each. I used VGG16 and built this small network on top of it:
model = Sequential()
model.add(Flatten(input_shape=train_data.shape[1:]))
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))
I am training for 50 epochs:
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
I get the accuracy and loss below:
[INFO] accuracy: 94.72%
[INFO] Loss: 0.45841544931342115
Yet I am not sure how to stabilize the loss. Should I increase the number of epochs, or are there other parameters I need to change?
Since the validation loss fluctuates from the first epochs, I think you forgot to freeze the main VGG model and are training more than just the Dense layers you stacked on top.
It is also better to use GlobalAveragePooling2D instead of flattening.
If the problem persists, try more efficient pre-trained CNN architectures such as MobileNetV2 or Xception.
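A rough sketch of that setup (assuming tf.keras, ImageNet weights, and a 224x224 input; num_classes is taken from your code, everything else here is illustrative):
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense, Dropout
from tensorflow.keras.models import Model

base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the convolutional base so only the new head is trained

x = GlobalAveragePooling2D()(base.output)  # instead of Flatten
x = Dense(256, activation='relu')(x)
x = Dropout(0.5)(x)
out = Dense(num_classes, activation='softmax')(x)

model = Model(base.input, out)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])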
I am doing binary classification using Keras from TensorFlow 2.0: a dense net with binary cross-entropy loss.
For debugging purposes I was trying to overfit my model first, and found weird behavior: I get a different loss depending on whether I put the sigmoid in the last Dense layer's constructor or leave that layer without any activation and add the sigmoid separately as the next step of the model.
I cannot find where in tf/keras it is stated that, if the last layer of the model is just a sigmoid, the BinaryCrossentropy loss will be computed without that sigmoid (as if from_logits=True). Or there may be another explanation of the following results.
As far as I understand, BinaryCrossentropy(from_logits=True) expects logits, in this case the output of the Dense layer without any activation. That is the first example printed by the posted code. Since there is no activation and the single neuron has a single weight of 1, the prediction equals the input, and so does the loss.
In the second variant I used a Dense layer without activation in the constructor, but added a sigmoid activation as a "separate layer" and set BinaryCrossentropy(from_logits=False); in this case BCE takes not logits but the "probabilities" computed by sigmoid(logits) as input. The probability predicted by the model equals 1, which makes sense. I expected Keras to take that probability, convert it back to a logit, and compute the loss (source). Due to numerical imprecision (probabilities are clipped, see the tf source), I figured that BCE cannot produce a loss higher than 15.3332 when computed from probabilities rather than logits, yet it does in this case.
The expected behavior shows up in the last case (third print of the code), where the activation is in the constructor of the Dense layer: the loss is clipped to 15.3332, no matter how big the input/output is, as long as it is above about 16.
My assumption is that in the second case the loss is calculated from the logits, which are the output of the Dense layer but not the output of the model.
from tensorflow.python.keras.layers import Input, Dense
from tensorflow.python.keras.activations import sigmoid
from tensorflow.python.keras.models import Model
from tensorflow.python.keras.losses import BinaryCrossentropy
import numpy as np

# Single training sample with a huge feature value and label 0.
x_train = np.array([[10000]])
y_train = [0]

# Case 1: no activation, loss computed from logits.
inp = Input(x_train.shape[1])
out = Dense(1, trainable=False, use_bias=False)(inp)
model = Model(inp, out)
model.get_layer('dense').set_weights([np.array([[1]])])
model.compile(optimizer='adam', loss=BinaryCrossentropy(from_logits=True))
print('\nFrom logits=True, no activation')
model.fit(x_train, y_train, epochs=1)
print('Prediction:')
print(model.predict(x_train))

# Case 2: sigmoid added as a separate step, loss set to from_logits=False.
inp = Input(x_train.shape[1])
out = Dense(1, trainable=False, use_bias=False)(inp)
out = sigmoid(out)
model = Model(inp, out)
model.get_layer('dense_1').set_weights([np.array([[1]])])
model.compile(optimizer='adam', loss=BinaryCrossentropy(from_logits=False))
print('\nFrom logits=False, separate sigmoid')
model.fit(x_train, y_train, epochs=1)
print('Prediction:')
print(model.predict(x_train))

# Case 3: sigmoid set inside the Dense layer, loss set to from_logits=False.
inp = Input(x_train.shape[1])
out = Dense(1, trainable=False, use_bias=False, activation='sigmoid')(inp)
model = Model(inp, out)
model.get_layer('dense_2').set_weights([np.array([[1]])])
model.compile(optimizer='adam', loss=BinaryCrossentropy(from_logits=False))
print('\nFrom logits=False, sigmoid in dense')
model.fit(x_train, y_train, epochs=1)
print('Prediction:')
print(model.predict(x_train))
From logits=True, no activation
1/1 [==============================] - 0s 17ms/sample - loss: 1000.0000
Prediction:
[[1000.]]
From logits=False, separate sigmoid
Train on 1 samples
1/1 [==============================] - 0s 15ms/sample - loss: 1000.0000
Prediction:
[[1.]]
From logits=False, sigmoid in dense
Train on 1 samples
1/1 [==============================] - 0s 17ms/sample - loss: 15.3332
Prediction:
[[1.]]
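The saturation described above can be checked directly on the loss object. A quick sketch, assuming TF 2.x (the exact saturated value depends on the backend epsilon and float precision):
import tensorflow as tf

bce_probs = tf.keras.losses.BinaryCrossentropy(from_logits=False)
bce_logits = tf.keras.losses.BinaryCrossentropy(from_logits=True)

# From probabilities: y_pred is clipped away from 0 and 1 internally,
# so a "perfectly wrong" prediction saturates around 15.33 instead of diverging.
print(bce_probs([0.0], [1.0]).numpy())

# From logits: no clipping, log(1 + exp(x)) grows roughly like x for large x.
print(bce_logits([0.0], [10000.0]).numpy())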
I am new to machine learning and LSTMs. I am following this link, LSTM for multistep forecasting, specifically the Encoder-Decoder LSTM Model With Multivariate Input section.
Here is my dataset description after reshaping the train and test sets:
print(dataset.shape)
print(train_x.shape, train_y.shape)
print(test.shape)
(2192, 15)
(1806, 14, 14) (1806, 7, 1)
(364, 15)
Above I have n_input=14 and n_out=7.
Here is my LSTM model definition:
def build_model(train, n_input):
    # prepare data
    train_x, train_y = to_supervised(train, n_input)
    # define parameters
    verbose, epochs, batch_size = 2, 100, 16
    n_timesteps, n_features, n_outputs = train_x.shape[1], train_x.shape[2], train_y.shape[1]
    # reshape output into [samples, timesteps, features]
    train_y = train_y.reshape((train_y.shape[0], train_y.shape[1], 1))
    # define model
    model = Sequential()
    model.add(LSTM(200, activation='relu', input_shape=(n_timesteps, n_features)))
    model.add(RepeatVector(n_outputs))
    model.add(LSTM(200, activation='relu', return_sequences=True))
    model.add(TimeDistributed(Dense(100, activation='relu')))
    model.add(TimeDistributed(Dense(1)))
    model.compile(loss='mse', optimizer='adam')
    # fit network
    model.fit(train_x, train_y, epochs=epochs, batch_size=batch_size, verbose=verbose)
    return model
On training the model, I get the following output:
Epoch 98/100
- 8s - loss: 64.6554
Epoch 99/100
- 7s - loss: 64.4012
Epoch 100/100
- 7s - loss: 63.9625
According to my understanding (please correct me if I am wrong):
Here my model's accuracy is 63.9625 (looking at the last epoch, 100). It is also not stable, since there is a gap between epoch 99 and epoch 100.
Here are my questions:
How are the number of epochs and the batch size defined above related to model accuracy? How does increasing or decreasing them affect accuracy?
Are the epochs, batch size, and n_input defined above correct for this model?
How can I increase my model's accuracy? Is the above dataset size good enough for this model?
I am not able to connect all these parameters; kindly help me understand how to achieve better accuracy by tuning the above factors.
Training for a very large number of epochs will not necessarily improve your accuracy. More epochs can increase accuracy up to a certain limit, beyond which you begin to overfit your model; too few will result in underfitting. See this. So looking at the gap between epoch 99 and epoch 100, you can already tell that you are overfitting the model. As a rule of thumb, the ideal number of epochs is the point where the accuracy stops increasing, usually between 1 and 10; 100 already seems too many.
Batch size does not directly affect your accuracy. It is mainly used to control speed and performance based on the memory of your GPU: if you have a lot of memory, you can use a large batch size so training is faster.
What you can do to increase your accuracy is:
1. Increase your training dataset.
2. Try using convolutional networks instead. You can find more on convolutional networks in this YouTube channel; in a nutshell, CNNs help the model identify which features to focus on during training.
3. Try other algorithms.
There is no well-defined formula for batch size. Typically a larger batch size runs faster but may compromise your accuracy; you will have to experiment with the number.
However, one component with regard to epochs that you are missing is validation. It is normal to hold out a validation dataset and observe whether the accuracy on it goes up or down. If the accuracy on this dataset stops improving, you can reduce your learning rate, for example by multiplying it by 0.8. See this link: https://machinelearningmastery.com/difference-test-validation-datasets/
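A hedged sketch of that idea with Keras callbacks, applied to the build_model setup above (the monitored metric, patience values, and the 0.8 factor are illustrative choices, not fixed rules):
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

callbacks = [
    # stop once the validation loss has not improved for a few epochs
    EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True),
    # shrink the learning rate when the validation loss plateaus
    ReduceLROnPlateau(monitor='val_loss', factor=0.8, patience=3),
]

model.fit(train_x, train_y, epochs=100, batch_size=16,
          validation_split=0.2, callbacks=callbacks, verbose=2)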
I know this is a very bad thing to do, but I noticed something strange using Keras MobileNet:
I use the same data for the training and validation sets:
train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(IM_WIDTH, IM_HEIGHT),
    batch_size=batch_size,
    class_mode="categorical"
)
validation_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(IM_WIDTH, IM_HEIGHT),
    class_mode="categorical"
)
but I don't get the same accuracy on both!
epoch 30/30 - loss: 0.3485 - acc: 0.8938 - val_loss: 1.7545 - val_acc: 0.4406
It seems that I am overfitting the training set compared to the validation set... but they are supposed to be the same! How is that possible?
The training loss is calculated on the fly, while the validation loss is only calculated after the epoch has been trained. So at the beginning, a nearly untrained net makes the training loss look worse than it actually is. This effect should vanish in later epochs, since by then a single epoch's impact on the score is not that big anymore.
This behaviour is addressed in the Keras FAQ.
If you evaluate both at the end of the epoch with a self-written callback, they should be the same.
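A minimal sketch of such a callback (assuming tf.keras 2.x, where evaluate accepts a generator and the model was compiled with a single accuracy metric; on older Keras you would use evaluate_generator instead):
from tensorflow.keras.callbacks import Callback

class TrainEvalAtEpochEnd(Callback):
    # Re-evaluate the model on the training generator after each epoch,
    # so the score is computed with the same fixed weights as val_acc.
    def __init__(self, generator):
        super().__init__()
        self.generator = generator

    def on_epoch_end(self, epoch, logs=None):
        loss, acc = self.model.evaluate(self.generator, verbose=0)
        print('epoch %d: end-of-epoch train loss %.4f, acc %.4f' % (epoch + 1, loss, acc))

model.fit(train_generator,
          validation_data=validation_generator,
          epochs=30,
          callbacks=[TrainEvalAtEpochEnd(train_generator)])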
For people reading this after a while:
I still don't understand how this issue happened, but working on the batch size (reducing it) helped a lot.
I am using Keras and an RNN to classify Slack text data on whether the text is reaction-worthy or not (1 - emoji, 0 - no emoji). I have removed usernames and URLs from the text and dropped duplicates that had different target variables.
I am not able to get the model to generalize to unseen data. The loss on the train/val sets looks good and continually decreases, but the accuracy of the val set only decreases.
I am using pretrained GloVe word embeddings since my training set is only about 25,000 sentences.
I have added additional layers, changed my regularization value, and increased dropout, but I get similar results. Is my model not complex enough to generalize the data? When I added additional layers they were much smaller but deeper, because the training time was about 2 minutes per epoch.
Any insight would be appreciated.
from keras.models import Sequential
from keras import layers
from keras.layers import Embedding, Convolution1D, MaxPooling1D, Dropout, Dense
from keras.optimizers import Adam
from keras.regularizers import l2

embedding_layer = Embedding(len(word_index) + 1,
                            100,
                            weights=[embeddings_matrix],
                            input_length=max_message_length,
                            embeddings_regularizer=l2(0.001),
                            trainable=True)

# Creating the model
model = Sequential()
model.add(embedding_layer)
model.add(Convolution1D(filters=32, kernel_size=3, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(Dropout(0.7))
model.add(layers.GRU(128))
model.add(Dropout(0.7))
model.add(Dense(1, activation='sigmoid'))

# Compiling the model with our given optimizer
optimizer = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.000025)
model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
print(model.summary())