Keras model.load_weights(WEIGHTS) provides inaccurate results - tensorflow

I'm training an LSTM RNN for description generation using Keras (TensorFlow backend) on the MSCOCO dataset. During training the model reached 92% accuracy with a loss of 0.79. While the model was still training, I also tested the description generation at each epoch, and given a random seed word the model produced very good, meaningful descriptions.
However, after training, I loaded the model with Keras's model.load_weights(WEIGHTS) method and tried to generate a description from a random word as I had done before. But now the model does not produce a meaningful description; it just outputs random words that have no meaning at all.
Can anyone tell me what could be causing this?
My model parameters are:
10 LSTM layers, Learning rate: 0.04, Activation: Softmax, Loss Function: Categorical Cross entropy, Optimizer: rmsprop
UPDATE:
This is my model:
model = Sequential()
model.add(LSTM(HIDDEN_DIM, input_shape=(None, VOCAB_SIZE), return_sequences=True))
for i in range(LAYER_NUM - 1):
    model.add(LSTM(HIDDEN_DIM, return_sequences=True))
model.add(TimeDistributed(Dense(VOCAB_SIZE)))
model.add(Activation('softmax'))
model.add(Dropout(0.04))
model.compile(loss="categorical_crossentropy", optimizer="rmsprop", metrics=['accuracy'])
This is how I train & save my model weights (I generate a description at each epoch to test the accuracy):
model.fit(X, Y, batch_size=BATCH_SIZE, verbose=1, epochs=EPOCHS)
EPOCHS += 1
generate_description(model, GENERATE_LENGTH, VOCAB_SIZE, index_to_word)
model.save_weights('checkpoint_layer_{}_hidden_{}_epoch_{}.hdf5'.format(LAYER_NUM, HIDDEN_DIM, EPOCHS))
This is how I load my model (WEIGHTS = my saved model):
model.load_weights(WEIGHTS)
desc = generate_description(model, GENERATE_LENGTH, VOCAB_SIZE, index_to_word)
print(desc)
I provide a randomly generated word vector to my model for testing. This is how I generate the description:
def generate_description(model, length, vocab_size, index_to_word):
    # Start from a randomly chosen word index
    index = [np.random.randint(vocab_size)]
    Y_word = [index_to_word[index[-1]]]
    X = np.zeros((1, length, vocab_size))
    for i in range(length):
        # Appending the last predicted word to the next timestep (as a one-hot vector)
        X[0, i, :][index[-1]] = 1
        print(index_to_word[index[-1]])
        # Predict the next word from the sequence generated so far
        index = np.argmax(model.predict(X[:, :i + 1, :])[0], 1)
        Y_word.append(index_to_word[index[-1]])
        Y_word.append(' ')
    return ('').join(Y_word)

Related

Why is my val_loss starting low and then increasing? Does it even matter with transfer learning?

Hey guys, I am new to machine learning and I am running this code from https://machinelearningmastery.com/how-to-develop-a-convolutional-neural-network-to-classify-photos-of-dogs-and-cats/
I want to understand why my val_loss starts low and then increases. Is this overfitting or underfitting? Also, what can I do to improve the val_loss so the model gives a better fit? In the blog post, his cross-entropy plot is a lot different from mine.
def define_model():
    # load model
    model = VGG16(include_top=False, input_shape=(224, 224, 3))
    # mark loaded layers as not trainable
    for layer in model.layers:
        layer.trainable = False
    # add new classifier layers
    flat1 = Flatten()(model.layers[-1].output)
    class1 = Dense(128, activation='relu', kernel_initializer='he_uniform')(flat1)
    output = Dense(1, activation='sigmoid')(class1)
    # define new model
    model = Model(inputs=model.inputs, outputs=output)
    # compile model
    opt = SGD(lr=0.001, momentum=0.9)
    model.compile(optimizer=opt, loss='binary_crossentropy', metrics=['accuracy'])
    return model

# plot diagnostic learning curves
def summarize_diagnostics(history):
    # plot loss
    pyplot.subplot(211)
    pyplot.title('Cross Entropy Loss')
    pyplot.plot(history.history['loss'], color='blue', label='train')
    pyplot.plot(history.history['val_loss'], color='orange', label='test')
    # plot accuracy
    pyplot.subplot(212)
    pyplot.title('Classification Accuracy')
    pyplot.plot(history.history['accuracy'], color='blue', label='train')
    pyplot.plot(history.history['val_accuracy'], color='orange', label='test')
    # save plot to file
    filename = sys.argv[0].split('/')[-1]
    pyplot.savefig(filename + '_plot.png')
    pyplot.close()

# run the test harness for evaluating a model
def run_test_harness():
    # define model
    model = define_model()
    # create data generator
    datagen = ImageDataGenerator(featurewise_center=True)
    # specify imagenet mean values for centering
    datagen.mean = [123.68, 116.779, 103.939]
    # prepare iterators
    train_it = datagen.flow_from_directory('dataset_dogs_vs_cats/train/',
        class_mode='binary', batch_size=64, target_size=(224, 224))
    test_it = datagen.flow_from_directory('dataset_dogs_vs_cats/test/',
        class_mode='binary', batch_size=64, target_size=(224, 224))
    # fit model
    history = model.fit_generator(train_it, steps_per_epoch=len(train_it),
        validation_data=test_it, validation_steps=len(test_it), epochs=10, verbose=1)
    # evaluate model
    _, acc = model.evaluate_generator(test_it, steps=len(test_it), verbose=1)
    print('> %.3f' % (acc * 100.0))
    # learning curves
    summarize_diagnostics(history)
    model.save('Transfer_Learning_Model.h5')

# entry point, run the test harness
run_test_harness()
A number of issues. The VGG model was trained with pixel values from -1 to +1, so you need to add the following:
def scaler(x):
    y = x / 127.5 - 1
    return y

datagen = ImageDataGenerator(preprocessing_function=scaler)
I would remove featurewise_center=True. If you use it, you have to call the generator's fit method on the training set first.
In model.fit you have steps_per_epoch=len(train_it) and validation_steps=len(test_it).
Since you set a batch_size of 64 in the generators, these values should be
steps_per_epoch=len(train_it.labels)//64 and validation_steps=len(test_it.labels)//64. Actually, in model.fit you can leave these parameters out; model.fit will calculate the right values internally. Your validation loss curve indicates a small degree of overfitting. Add a dropout layer after the class1 layer and set the dropout rate to something like 0.2, for example as shown below.
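As an illustration only (not the blog's original code), here is one way the dropout could be wired into define_model() from the question. This is a sketch that assumes the question's imports plus Dropout from keras.layers:
# Sketch: the question's define_model() with a Dropout layer added after class1
def define_model():
    model = VGG16(include_top=False, input_shape=(224, 224, 3))
    for layer in model.layers:
        layer.trainable = False
    flat1 = Flatten()(model.layers[-1].output)
    class1 = Dense(128, activation='relu', kernel_initializer='he_uniform')(flat1)
    drop1 = Dropout(0.2)(class1)  # randomly zero out 20% of activations during training
    output = Dense(1, activation='sigmoid')(drop1)
    model = Model(inputs=model.inputs, outputs=output)
    opt = SGD(lr=0.001, momentum=0.9)
    model.compile(optimizer=opt, loss='binary_crossentropy', metrics=['accuracy'])
    return model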
If you want to push for the highest degree of accuracy, I recommend you incorporate two Keras callbacks. The EarlyStopping callback monitors validation loss and halts training if the loss fails to reduce after 'patience' number of consecutive epochs. Setting restore_best_weights=True will load the weights for the epoch with the lowest validation loss, so you don't have to save and then reload the weights. Set epochs to a large number to ensure this callback activates. Use the Keras callback ReduceLROnPlateau to automatically adjust the learning rate based on validation loss. The documentation for the callbacks is located here. The code I use is shown below.
es = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3,
                                      verbose=1, restore_best_weights=True)
rlronp = tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=1,
                                              verbose=1)
callbacks = [es, rlronp]
In model.fit, add callbacks=callbacks.
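Putting the pieces together, a rough sketch of the fit call might look like this (assuming TF 2.x, where model.fit accepts the directory iterators directly; epochs is deliberately large so EarlyStopping decides when to stop, and the steps arguments are left out as suggested):
history = model.fit(train_it,
                    validation_data=test_it,
                    epochs=100,  # large on purpose; EarlyStopping will halt training earlier
                    callbacks=callbacks,
                    verbose=1)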

The validation loss of my LSTM model is very volatile

I'm creating an LSTM model to predict the closing price of bitcoin. However, when I start training, my validation loss becomes very volatile and my test prediction turns out inaccurate.
Here's my model:
model = Sequential()
model.add(LSTM(80, input_shape=(1,look_back)))
model.add(LSTM(60))
model.compile(optimizer='adam', loss='mean_squared_error')
Fitting the model:
from keras.callbacks import ModelCheckpoint
callbacks = [ModelCheckpoint(save_best_only = True, filepath='btc_close_prediction.h5')]
history = model.fit(xTrain, yTrain, batch_size=10, epochs=30, callbacks=callbacks, validation_split=0.2)
loss graph:
Prediction Plot:
Please advise how I can adjust my model for a better val_loss and better prediction accuracy.
Your validation dataset should comprise 5,000 samples to get a smooth validation loss.
Also, try a transformer model; it requires less training data.
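As a rough sketch of the first suggestion (hypothetical, assuming xTrain/yTrain from the question are large enough to spare 5,000 samples), you could hold out an explicit validation block instead of relying on validation_split:
# Hypothetical hold-out of the last 5,000 samples for validation
n_val = 5000
xVal, yVal = xTrain[-n_val:], yTrain[-n_val:]
xTr, yTr = xTrain[:-n_val], yTrain[:-n_val]

history = model.fit(xTr, yTr,
                    batch_size=10,
                    epochs=30,
                    callbacks=callbacks,
                    validation_data=(xVal, yVal))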

Arbitrary threshold for sigmoid activation function for CNN binary classification?

I am classifying the sentiment of reviews - 0 or 1 - using gensim Doc2Vec and a CNN in TensorFlow 2.2.0:
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim,
                              input_length=maxlen,
                              embeddings_initializer=Constant(embedding), trainable=False),
    tf.keras.layers.Conv1D(128, 5, activation='relu'),
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(loss='binary_crossentropy',
              optimizer=tf.keras.optimizers.Adam(1e-4),
              metrics=['accuracy'])
history = model.fit(X_train, y_train,
                    epochs=8,
                    validation_split=0.3,
                    batch_size=10)
I then make predictions and convert my sigmoid probability to 0 or 1 using np.round():
predicted = model.predict(X_test)
predicted = np.round(predicted,1).astype(np.int32)
I get great results (~96% accuracy) indicating that the threshold of 0.5 is working as expected...
However, when I try to predict on a set of new data, the model still seems to separate bad reviews from good ones, but the outputs all sit around approximately 0.0:
# Example sigmoid outputs for new test reviews:
good_review_1: 0.000052
good_review_2: 0.000098
bad_review_1: 0.112334
bad_review_2: 0.214934
Mind you, the model never saw X_test during training, and it is able to predict on it just fine. It's only when I introduce a new set of review text strings that I run into incorrect predictions. For new reviews, the only preprocessing I do before calling model.predict() is feeding them through the same tokenizer used for model training:
s = 'This is a sample bad review.'
tokenizer.texts_to_sequences(pd.Series(s))
s = pad_sequences(s, maxlen=maxlen, padding='pre', truncating='pre')
model.predict(s)
I've been trying to make sense of this conundrum, but I'm making little progress. I ran into this post, and it indicates:
Some sigmoid functions will have this at 0, while some will have it set to a different 'threshold'.
But this still doesn't explain why my model was able to predict with np.round()'s 0.5 threshold on the X_test dataset (which the model never learned on) and then is unable to predict on a new dataset at the same 0.5 threshold...

Why am I getting bad results with Keras vs random forest or knn?

I'm learning deep learning with Keras and trying to compare the results (accuracy) with classical machine learning algorithms from sklearn (e.g. random forest, k-nearest neighbors).
It seems that with Keras I'm getting the worst results.
I'm working on a simple classification problem: the iris dataset.
My Keras code looks like this:
samples = datasets.load_iris()
X = samples.data
y = samples.target
df = pd.DataFrame(data=X)
df.columns = samples.feature_names
df['Target'] = y
# prepare data
X = df[df.columns[:-1]]
y = df[df.columns[-1]]
# hot encoding
encoder = LabelEncoder()
y1 = encoder.fit_transform(y)
y = pd.get_dummies(y1).values
# split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3)
# build model
model = Sequential()
model.add(Dense(1000, activation='tanh', input_shape = ((df.shape[1]-1),)))
model.add(Dense(500, activation='tanh'))
model.add(Dense(250, activation='tanh'))
model.add(Dense(125, activation='tanh'))
model.add(Dense(64, activation='tanh'))
model.add(Dense(32, activation='tanh'))
model.add(Dense(9, activation='tanh'))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train)
score, acc = model.evaluate(X_test, y_test, verbose=0)
#results:
#score = 0.77
#acc = 0.711
I have tried adding layers and/or changing the number of units per layer and/or changing the activation function (to relu), but it seems the result never goes above 0.85.
With sklearn's random forest or k-neighbors I'm getting results (on the same dataset) above 0.95.
What am I missing?
With sklearn I put in little effort and got good results, while with Keras I made a lot of changes but still can't reach the sklearn results. Why is that?
How can I get the same results with Keras?
In short, you need:
ReLU activations
Simpler model
Data normalization
More epochs
In detail:
The first issue here is that nowadays we never use activation='tanh' for the intermediate network layers. In such problems, we practically always use activation='relu'.
The second issue is that you have built quite a large Keras model, and it might very well be the case that with only 100 iris samples in your training set you have too little data to effectively train such a large model. Try reducing drastically both the number of layers and the number of nodes per layer. Start simpler.
Large neural networks really thrive when we have lots of data, but in cases of small datasets, like here, their expressiveness and flexibility may become a liability instead, compared with simpler algorithms, like RF or k-nn.
The third issue is that, in contrast to tree-based models, like Random Forests, neural networks generally require normalizing the data, which you don't do. Truth is that knn also requires normalized data, but in this special case, since all iris features are in the same scale, it does not affect the performance negatively.
Last but not least, you seem to run your Keras model for only one epoch (the default value if you don't specify anything in model.fit); this is somewhat equivalent to building a random forest with a single tree (which, BTW, is still much better than a single decision tree).
All in all, with the following changes in your code:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
model = Sequential()
model.add(Dense(150, activation='relu', input_shape = ((df.shape[1]-1),)))
model.add(Dense(150, activation='relu'))
model.add(Dense(y.shape[1], activation='softmax'))
model.fit(X_train, y_train, epochs=100)
and everything else as is, we get:
score, acc = model.evaluate(X_test, y_test, verbose=0)
acc
# 0.9333333373069763
We can do better: use slightly more training data and stratify them, i.e.
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                     test_size=0.20,  # a few more samples for training
                                                     stratify=y)
And with the same model & training epochs you can get a perfect accuracy of 1.0 in the test set:
score, acc = model.evaluate(X_test, y_test, verbose=0)
acc
# 1.0
(Details might differ due to some randomness imposed by default in such experiments).
Adding some dropout might help you improve accuracy. See Tensorflow's documentation for more information.
Essentially, you add a Dropout layer in just the same way you added those Dense() layers:
model.add(Dropout(0.2))
Note: the parameter 0.2 means that 20% of the layer's outputs are randomly dropped during training to reduce the interdependencies between them, which reduces overfitting.
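For example, a sketch of where such a layer could sit in the smaller iris model from the first answer (an illustrative placement, assuming the same df and y as in the question and that Dropout is imported from keras.layers alongside Dense):
# Sketch: the simplified model from the first answer, with Dropout between the Dense layers
model = Sequential()
model.add(Dense(150, activation='relu', input_shape=((df.shape[1] - 1),)))
model.add(Dropout(0.2))  # randomly drop 20% of activations during training
model.add(Dense(150, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=100)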

What does it mean when training and validation accuracy are 1.000 but results are still poor?

I am using Keras to perform landmark detection - specifically locating parts of the body on a picture of a human. I have gathered around 2,000 training samples and am using rmsprop w/ mse loss function. After training my CNN, I am left with loss: 3.1597e-04 - acc: 1.0000 - val_loss: 0.0032 - val_acc: 1.0000
I figured this would mean my model would perform well on the test data, however, instead the predicted points are way off from the labeled points. Any ideas or help would be greatly appreciated!
IMG_SIZE = 96
NUM_KEYPOINTS = 15
NUM_EPOCHS = 50
NUM_CHANNELS = 1
TESTING = True
def load(test=False):
    # load data from CSV file
    df = pd.read_csv(fname)
    # convert Image to numpy arrays
    df['Image'] = df['Image'].apply(lambda im: np.fromstring(im, sep=' '))
    df = df.dropna()  # drop rows with missing values
    X = np.vstack(df['Image'].values) / 255.  # scale pixel values to [0, 1]
    X = X.reshape(X.shape[0], IMG_SIZE, IMG_SIZE, NUM_CHANNELS)
    X = X.astype(np.float32)
    y = df[df.columns[:-1]].values
    y = (y - (IMG_SIZE / 2)) / (IMG_SIZE / 2)  # scale target coordinates to [-1, 1]
    X, y = shuffle(X, y, random_state=42)  # shuffle train data
    y = y.astype(np.float32)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
    return X_train, X_test, y_train, y_test

def build_model():
    # construct the neural network
    model = Sequential()
    model.add(Conv2D(16, (3, 3), activation='relu', input_shape=(IMG_SIZE, IMG_SIZE, NUM_CHANNELS)))
    model.add(MaxPooling2D(2, 2))
    model.add(Conv2D(32, (3, 3), activation='relu'))
    model.add(MaxPooling2D(2, 2))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D(2, 2))
    model.add(Flatten())
    model.add(Dropout(0.5))
    model.add(Dense(500, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(NUM_KEYPOINTS * 2))
    return model

if __name__ == '__main__':
    X_train, X_test, y_train, y_test = load(test=TESTING)
    model = build_model()
    sgd = optimizers.SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
    model.compile(optimizer=sgd, loss='mse', metrics=['accuracy'])
    hist = model.fit(X_train, y_train, epochs=NUM_EPOCHS, verbose=1, validation_split=0.2)
    # save the model
    model.save_weights("/output/model_weights.h5")
    histFile = open("/output/training_history", "wb")
    pickle.dump(hist.history, histFile)
According to this question, How does keras define "accuracy" and "loss"?, your "accuracy" is defined as categorical accuracy, which makes absolutely no sense for your problem.
After training you are left with a 10x difference between your training loss and validation loss, which would suggest overfitting (hard to say for sure without a graph and some examples).
To start fixing it:
Use a metric that makes sense in your context and you understand what it does and how it's computed.
Take random examples where the metric is very good and where it is very bad, and manually validate that this really is the case (otherwise you need a different metric).
In your case I would imagine a metric based on the distance between the desired locations and the predicted ones. This is not a default thing and you would have to implement it yourself (see the sketch after this list).
Always be suspicious if the model says it's perfect.
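As an illustration only (not the asker's code), such a distance-based metric could look something like the following, assuming the targets are the flattened (x, y) keypoint coordinates produced by load() and that TensorFlow ops are available:
import tensorflow as tf

NUM_KEYPOINTS = 15  # as defined in the question

def mean_keypoint_distance(y_true, y_pred):
    # Hypothetical metric: mean Euclidean distance between true and predicted
    # keypoints, after reshaping the flat (NUM_KEYPOINTS * 2) vector into (x, y) pairs
    true_pts = tf.reshape(y_true, (-1, NUM_KEYPOINTS, 2))
    pred_pts = tf.reshape(y_pred, (-1, NUM_KEYPOINTS, 2))
    dists = tf.sqrt(tf.reduce_sum(tf.square(true_pts - pred_pts), axis=-1))
    return tf.reduce_mean(dists)

# usage: model.compile(optimizer=sgd, loss='mse', metrics=[mean_keypoint_distance])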
It is impossible to tell from your question, but I will venture a guess here based on some implications of your data split.
Typically, when one splits one's data into more than two sets, one uses all but one of them to train on some parameter or another. For example, the first split is used to choose the model weights, the second split to choose the model architecture, etc. Presumably you are training something with your 'validation' set, otherwise you wouldn't have it. Thus, the problem is almost certainly overfitting. The way you usually detect overfitting is the difference between the accuracy of your model on the data used to train it (usually everything except one single split, i.e. what you are calling your 'training' and 'validation' splits) and its accuracy on a split your model has not touched, which you are calling your 'test' split.
So, per your question-comment, "I assume if the validation accuracy is that high then there is no overfitting, right?": no. If the accuracy of your model on any data you've used to train anything at all is higher than its accuracy on data that your model has never touched in any way, shape, or form, then you've overfit. That seems to be the case with you.
On the other hand, it may be the case that you've simply not shuffled your data. It's impossible to tell without having a look at the training/testing pipeline.
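For concreteness, here is a sketch of a shuffled three-way split (illustrative only, not the asker's pipeline) in which the test split is never used for any training decision:
from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle

# Hypothetical split: 60% train, 20% validation, 20% test
X, y = shuffle(X, y, random_state=42)
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=42)

# hist = model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=NUM_EPOCHS)
# final_metrics = model.evaluate(X_test, y_test)  # only at the very end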