tensorflow 2 evaluate inconsistent with sklearn accuracy_score

I am trying to train a model to predict gender using the CelebA dataset and TensorFlow.
This is my model:
train_data_gen = train_image_generator.flow_from_dataframe(
    dataframe=train_split,
    directory=celeba.images_folder,
    x_col='id',
    y_col='Male',
    target_size=(IMG_WIDTH, IMG_HEIGHT),
    batch_size=batch_size,
    classes=['1', '0']
)
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet')
model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(2),
    tf.keras.layers.Softmax()
])
base_learning_rate = 0.001
model.compile(optimizer=tf.keras.optimizers.RMSprop(lr=base_learning_rate),
              loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
Then I use the following to evaluate the model:
test_data_gen = test_image_generator.flow_from_dataframe(
    dataframe=test_split,
    directory=celeba.images_folder,
    x_col='id',
    y_col='Male',
    target_size=(IMG_WIDTH, IMG_HEIGHT),
    batch_size=batch_size,
    classes=['1', '0']
)
model = tf.keras.models.load_model("cp-0004.ckpt")
# Re-evaluate the model
loss, acc = model.evaluate(test_data_gen, verbose=2)
which gives an accuracy of 0.87.
But when I use the following, I get 0.51 accuracy!
pred_test = model.predict(test_data_gen)
pred_df = pd.DataFrame(pred_test, columns=["Male", "Female"])
pred_df[pred_df > 0.5] = "1"
pred_df[pred_df < 0.5] = "0"
# test_split_raw = celeba.split('test', drop_zero=False)
confusion_matrix(test_split["Male"].astype(int).values, np.argmax(pred_df.values, 1))
Can anyone explain why the accuracy from the evaluate function is different?

You want to check test_image_generator.flow_from_dataframe: the default value of shuffle is True.
Your generator therefore yields your test data in random order.
Your model then predicts on those shuffled images, but you compare the predictions against your ordered dataframe. If you want to compare to test_split["Male"], set shuffle to False. If you don't set shuffle to False, you will always get ~0.5 accuracy (if your classes are equally distributed).
Another hint: use the .evaluate() method when you have labeled data; it reports accuracy as well.
Use .predict() only for new, unlabeled data.
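As a minimal sketch (assuming the same test_split, celeba, and model objects as above), disabling shuffling makes the two accuracy figures agree:
import numpy as np
from sklearn.metrics import accuracy_score

test_data_gen = test_image_generator.flow_from_dataframe(
    dataframe=test_split,
    directory=celeba.images_folder,
    x_col='id',
    y_col='Male',
    target_size=(IMG_WIDTH, IMG_HEIGHT),
    batch_size=batch_size,
    classes=['1', '0'],
    shuffle=False  # keep predictions aligned with the dataframe order
)

pred_test = model.predict(test_data_gen)
pred_labels = np.argmax(pred_test, axis=1)  # predicted class indices
true_labels = test_data_gen.classes         # labels in the generator's (unshuffled) order

print(accuracy_score(true_labels, pred_labels))  # should now match model.evaluate()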

Related

How to provide specific training, validation and test sets in StellarGraph PaddedGraphGenerator

I am trying to train a graph convolutional neural network using the StellarGraph library. I would like to run this example https://stellargraph.readthedocs.io/en/stable/demos/graph-classification/gcn-supervised-graph-classification.html
but without the N-fold cross-validation, by providing my own training, validation, and test sets. This is the code I am using (taken from this post):
generator = PaddedGraphGenerator(graphs=graphs)
train_gen = generator.flow([x for x in range(0, len(graphs_train))],
                           targets=graphs_train_labels,
                           batch_size=35)
test_gen = generator.flow([x for x in range(len(graphs_train), len(graphs_train) + len(graphs_test))],
                          targets=graphs_test_labels,
                          batch_size=35)
# Stopping criterion
es = EarlyStopping(monitor="val_loss",
                   min_delta=0,
                   patience=20,
                   restore_best_weights=True)
# Model definition
gc_model = GCNSupervisedGraphClassification(layer_sizes=[64, 64],
                                            activations=["relu", "relu"],
                                            generator=generator,
                                            dropout=0.5)
x_inp, x_out = gc_model.in_out_tensors()
predictions = Dense(units=32, activation="relu")(x_out)
predictions = Dense(units=16, activation="relu")(predictions)
predictions = Dense(units=1, activation="sigmoid")(predictions)
# Creating Keras model and preparing it for training
model = Model(inputs=x_inp, outputs=predictions)
model.compile(optimizer=Adam(0.001), loss=binary_crossentropy, metrics=["acc"])
# GNN Training
history = model.fit(train_gen, epochs=10, validation_data=test_gen, verbose=1)
model.fit(x=graphs_train,
          y=graphs_train_labels,
          epochs=10,
          verbose=1,
          callbacks=[es])
# Calculate performance on the validation data
test_metrics = model.evaluate(valid_gen, verbose=1)
valid_acc = test_metrics[model.metrics_names.index("acc")]
print(f"Test Accuracy model = {valid_acc}")
But at the end I am getting this error:
ValueError: Failed to find data adapter that can handle input: (<class 'list'> containing values of types {"<class 'stellargraph.core.graph.StellarGraph'>"}), <class 'numpy.ndarray'>
What am I missing here? Is it because of the way I have created the graphs? In my case, graphs is a list containing the StellarGraph objects.
Problem solved. I was calling
model.fit(x=graphs_train,
          y=graphs_train_labels,
          epochs=10,
          verbose=1,
          callbacks=[es])
after the line
history = model.fit(train_gen, epochs=10, validation_data=test_gen, verbose=1)
The second call passes the raw list of StellarGraph objects (and a NumPy array of labels) straight to Keras, which has no data adapter for that type; only the batches produced by the PaddedGraphGenerator flow can be consumed by the model.
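A minimal sketch of the corrected sequence (assuming the train_gen and test_gen defined above): keep a single fit call that consumes the generator, and attach the early-stopping callback to it.
# Sketch: one fit call on the generator, with the EarlyStopping callback.
# train_gen / test_gen come from PaddedGraphGenerator.flow, so Keras
# receives padded tensors rather than raw StellarGraph objects.
history = model.fit(train_gen,
                    epochs=10,
                    validation_data=test_gen,
                    verbose=1,
                    callbacks=[es])

test_metrics = model.evaluate(test_gen, verbose=1)
test_acc = test_metrics[model.metrics_names.index("acc")]
print(f"Test Accuracy model = {test_acc}")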

Getting constant accuracies for training and validation sets even though their losses are changing during CNN training?

As the title describes, I've been experiencing an issue during the training of my CNN model: the accuracies of the training and validation sets stay constant even though their losses are changing. I have included the details of the model and its training setup below. What may cause this issue?
Here is the data used for the training (X_train & y_train) and validation/test (X_test & y_test) sets:
df = pd.read_csv(CSV_PATH, sep=',', header=None)
print(f'Shape of all data: {df.shape}')
y = df.iloc[:, -1].values
X = df.iloc[:, :-1].values
encoder = LabelEncoder()
encoder.fit(y)
encoded_Y = encoder.transform(y)
dummy_y = to_categorical(encoded_Y)
X_train, X_test, y_train, y_test = train_test_split(X, dummy_y, test_size=0.3, random_state=RANDOM_STATE)
X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))
X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], 1))
Here are the shapes of training and test sets:
Shape of X_train: (1322, 10800, 1)
Shape of Y_train: (1322, 3)
Shape of X_test: (567, 10800, 1)
Shape of y_test: (567, 3)
Here is my CNN model:
# Model hyper-parameters
activation_fn = 'relu'
n_lr = 1e-4
weight_decay = 1e-4
batch_size = 64
num_epochs = 200*10*10
num_classes = 3
n_dropout = 0.6
n_momentum = 0.5
n_kernel = 5
n_reg = 1e-5
# the sequential model
model = Sequential()
model.add(Conv1D(128, n_kernel, input_shape=(10800, 1)))
model.add(BatchNormalization())
model.add(Activation(activation_fn))
model.add(MaxPooling1D(pool_size=2, strides=2))
model.add(Dropout(n_dropout))
model.add(Conv1D(256, n_kernel))
model.add(BatchNormalization())
model.add(Activation(activation_fn))
model.add(MaxPooling1D(pool_size=2, strides=2))
model.add(Dropout(n_dropout))
model.add(GlobalAveragePooling1D()) # have tried model.add(Flatten()) as well
model.add(Dense(256, activation=activation_fn))
model.add(Dropout(n_dropout))
model.add(Dense(64, activation=activation_fn))
model.add(Dropout(n_dropout))
model.add(Dense(num_classes, activation='softmax'))
adam = Adam(lr=n_lr, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=weight_decay)
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['acc'])
Here is how I have evaluated the model:
Y_pred = model.predict(X_test, verbose=0)
y_pred = np.argmax(Y_pred, axis=1)
y_test_int = np.argmax(y_test, axis=1)
My model always predicts the same one of the three classes during evaluation, as you can see from the classification report below (via the classification_report(y_test_int, y_pred) function):
              precision    recall  f1-score   support

      normal      0.743     1.000     0.852       421
         apb      0.000     0.000     0.000        45
         pvc      0.000     0.000     0.000       101
The model was trained using the EarlyStopping callback of Keras, so training continued for 4,173 epochs. Here are the losses obtained during training for the training and validation sets: [loss curves plot]
Here are the accuracies obtained during training for the training and validation sets: [accuracy curves plot]
The model was implemented using Keras and hosted on Google Colab.
Although such issues are difficult to resolve without the data, there are a couple of general rules that apply.
The very first thing to do when a model does not seem to learn anything, as here (despite the mild drop in the loss), is to remove all dropout.
Dropout is not supposed to be used by default; its nominal function is to guard against overfitting. But before you start worrying about overfitting, you must first have some success with fitting, which is clearly not happening here. A dropout rate of n_dropout = 0.6 is also rather aggressive, which does not help, either. A sketch of the change follows.
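A minimal sketch of the suggested change, assuming the same hyper-parameters and imports as above: the same architecture with every Dropout layer removed (they can be reintroduced later, at lower rates, once the model actually fits).
# Sketch: the model from the question with all Dropout layers stripped out.
model = Sequential()
model.add(Conv1D(128, n_kernel, input_shape=(10800, 1)))
model.add(BatchNormalization())
model.add(Activation(activation_fn))
model.add(MaxPooling1D(pool_size=2, strides=2))
model.add(Conv1D(256, n_kernel))
model.add(BatchNormalization())
model.add(Activation(activation_fn))
model.add(MaxPooling1D(pool_size=2, strides=2))
model.add(GlobalAveragePooling1D())
model.add(Dense(256, activation=activation_fn))
model.add(Dense(64, activation=activation_fn))
model.add(Dense(num_classes, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['acc'])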

Tensorflow Hub vs Keras application - performance drop

I have an image classification problem and I want to use Keras pretrained models for this task.
When I use a model like this:
model = tf.keras.Sequential([
    hub.KerasLayer("https://tfhub.dev/google/tf2-preview/mobilenet_v2/feature_vector/4",
                   output_shape=[1280],
                   trainable=False),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(num_classes, activation='softmax')
])
model.build([None, image_size[0], image_size[1], 3])
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss='categorical_crossentropy',
    metrics=['acc'])
I easily get ~90% accuracy and very low loss on a balanced dataset. However, if I use keras.applications like this:
base_model = tf.keras.applications.mobilenet_v2.MobileNetV2(
    input_shape=input_img_size,
    include_top=False,
    weights='imagenet'
)
base_model.trainable = False
x = tf.keras.layers.Dropout(0.5)(base_model.output)
x = tf.keras.layers.Dense(num_classes, activation='softmax')(x)
model = tf.keras.models.Model(inputs=base_model.input, outputs=x)
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss='categorical_crossentropy',
    metrics=['acc'])
and use the proper tf.keras.applications.mobilenet_v2.preprocess_input function in the data generator (leaving everything else the same), it gets stuck at around 60% validation and 80% training accuracy.
What is the difference between these approaches? Why is one superior to the other?
The data generator:
datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    preprocessing_function=preprocessing_function,
    rotation_range=10,
    zoom_range=0.3,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True,
    vertical_flip=True,
    shear_range=0.2,
)
Training:
history = model.fit_generator(
    train_generator,
    epochs=nb_epochs,
    verbose=1,
    steps_per_epoch=steps_per_epoch,
    validation_data=valid_generator,
    validation_steps=val_steps_per_epoch,
    callbacks=[
        checkpoint,
        learning_rate_reduction,
        csv_logger,
        tensorboard_callback,
    ],
)
I believe you are training two different 'models'. In your TensorFlow Hub example, you used MobileNet's feature vector. A feature vector, as I understand it, is not the same as the full model: it is a 1-D tensor of a fixed length (here 1280), essentially the output of the last layer before MobileNet's classification head, with global pooling already applied. This is different from the tf.keras example, where you invoke the full MobileNet model with include_top=False, whose output is still an unpooled 4-D feature map.
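As a minimal sketch (an assumed fix, not part of the original answer): passing pooling='avg' to MobileNetV2 makes the keras.applications model emit the same kind of pooled 1280-d feature vector as the Hub layer.
# Sketch: produce a pooled feature vector comparable to the Hub
# feature_vector layer (assumes the same input_img_size / num_classes).
base_model = tf.keras.applications.mobilenet_v2.MobileNetV2(
    input_shape=input_img_size,
    include_top=False,
    pooling='avg',  # global average pooling -> a [batch, 1280] vector
    weights='imagenet'
)
base_model.trainable = False
x = tf.keras.layers.Dropout(0.5)(base_model.output)
outputs = tf.keras.layers.Dense(num_classes, activation='softmax')(x)
model = tf.keras.models.Model(inputs=base_model.input, outputs=outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss='categorical_crossentropy',
              metrics=['acc'])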

Why is my confusion matrix "shifted" to the right?

I built a classifier of 4 flower types based on ResNet50. The accuracy is really high during training, and everything seems good. However, once I plot my confusion matrix, I see that the values are "shifted" to the right instead of lying on the main diagonal.
What does this mean? Is it a problem with my dataset, or my code?
Here's what I did to use ResNet 50:
def create_model(input_shape, top='flatten'):
    if top not in ('flatten', 'avg', 'max'):
        raise ValueError('unexpected top layer type: %s' % top)
    # connects base model with new "head"
    BottleneckLayer = {
        'flatten': Flatten(),
        'avg': GlobalAveragePooling2D(),
        'max': GlobalMaxPooling2D()
    }[top]
    base = InceptionResNetV2(input_shape=input_shape,
                             include_top=False,
                             weights='imagenet')
    x = BottleneckLayer(base.output)
    x = Dense(NUM_OF_FLOWERS, activation='linear')(x)
    model = Model(inputs=base.inputs, outputs=x)
    return model

base = ResNet50(input_shape=input_shape, include_top=False)
x = Flatten()(base.output)
x = Dense(NUM_OF_FLOWERS, activation='softmax')(x)
model = Model(inputs=base.inputs, outputs=x)
Confusion Matrix Generation:
# Predict the values from the validation dataset
Y_pred = model.predict_generator(validation_generator, nb_validation_samples // batch_size + 1)
# Convert prediction probabilities to class indices
Y_pred_classes = numpy.argmax(Y_pred, axis=1)
# True class indices, in the generator's file order
Y_true = validation_generator.classes
# compute the confusion matrix
confusion_mtx = confusion_matrix(Y_true, Y_pred_classes)
# plot the confusion matrix
plot_confusion_matrix(confusion_mtx, classes=range(4))
As requested, this is how I created the generators:
train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    color_mode='rgb',
    class_mode='categorical',
    shuffle=True)
validation_generator = test_datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    color_mode='rgb',
    class_mode='categorical',
    shuffle=False)
Every time I execute model.predict(), the predictions change, always shifting one cell to the right. [image album of confusion matrices omitted]
Yes, I imagine it is the code. Check your indexing where you create your confusion matrix; it will be off by one.
Look at the validation_generator class. When you use data_generator.flow_from_directory, you need to check that the shuffle param is set to False, as in the example below:
val_generator = val_data_generator.flow_from_directory(
    test_data_dir,
    target_size=(IMAGE_WIDTH, IMAGE_HEIGHT),
    batch_size=100,
    class_mode="binary",
    classes=['dog', 'cat'],
    shuffle=False)
The default is True, and it shuffles only the images; the labels in val_generator.classes keep their original order, so shuffled predictions no longer line up with them.
This is an interesting problem. It can be fixed by re-creating the ImageDataGenerator flow right before you call model.predict.
So:
validation_generator = test_datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    color_mode='rgb',
    class_mode='categorical',
    shuffle=False)

Y_pred = model.predict_generator(validation_generator, nb_validation_samples // batch_size + 1)
# Convert prediction probabilities to class indices
Y_pred_classes = numpy.argmax(Y_pred, axis=1)
# True class indices, in the generator's file order
Y_true = validation_generator.classes
# compute the confusion matrix
confusion_mtx = confusion_matrix(Y_true, Y_pred_classes)
# plot the confusion matrix
plot_confusion_matrix(confusion_mtx, classes=range(4))
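A related sketch (an aside, not from the original answer): the same effect can usually be achieved without rebuilding the generator, since Keras directory iterators expose a reset() method that rewinds the internal batch index.
# Sketch: rewind the existing generator so prediction starts from the
# first batch again, keeping outputs aligned with validation_generator.classes.
validation_generator.reset()
Y_pred = model.predict_generator(validation_generator,
                                 nb_validation_samples // batch_size + 1)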

My loss is "nan" and accuracy is "0.0000e+00" in transfer learning: InceptionV3

I am working on transfer learning. My use case is to classify two categories of images. I used InceptionV3 to classify the images. When training my model, I am getting nan as the loss and 0.0000e+00 as the accuracy in every epoch. I am using 20 epochs because my dataset is small: I have 1000 images for training, 100 for testing, and a batch size of 5.
from keras.applications.inception_v3 import InceptionV3
from keras.preprocessing import image
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D
from keras import backend as K

# create the base pre-trained model
base_model = InceptionV3(weights='imagenet', include_top=False)

# add a global spatial average pooling layer
x = base_model.output
x = GlobalAveragePooling2D()(x)
# let's add a fully-connected layer
x = Dense(1024, activation='relu')(x)
x = Dense(512, activation='relu')(x)
x = Dense(32, activation='relu')(x)
# and a logistic layer -- we have 2 classes
predictions = Dense(1, activation='softmax')(x)

# this is the model we will train
model = Model(inputs=base_model.input, outputs=predictions)

for layer in base_model.layers:
    layer.trainable = False

# we chose to train the top 2 inception blocks, i.e. we will freeze
# the first 249 layers and unfreeze the rest:
for layer in model.layers[:249]:
    layer.trainable = False
for layer in model.layers[249:]:
    layer.trainable = True

model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1./255)

training_set = train_datagen.flow_from_directory(
    'C:/Users/Desktop/Transfer/train/',
    target_size=(64, 64),
    batch_size=5,
    class_mode='binary')
test_set = test_datagen.flow_from_directory(
    'C:/Users/Desktop/Transfer/test/',
    target_size=(64, 64),
    batch_size=5,
    class_mode='binary')

model.fit_generator(
    training_set,
    steps_per_epoch=1000,
    epochs=20,
    validation_data=test_set,
    validation_steps=100)
It sounds like your gradient is exploding. There could be a few reasons for that:
1. Check that your input is generated correctly. For example, use the save_to_dir parameter of flow_from_directory.
2. Since you have a batch size of 5 and 1000 training images, fix steps_per_epoch from 1000 to 1000/5 = 200.
3. Use a sigmoid activation instead of softmax.
4. Set a lower learning rate in Adam; to do that, create the optimizer separately, like adam = Adam(0.0001), and pass it in model.compile(..., optimizer=adam).
5. Try VGG16 instead of InceptionV3.
Let us know when you have tried all of the above; a sketch combining several of these fixes follows.
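A minimal sketch combining fixes 2-4 (assuming the same base_model, x, and generators as in the question):
# Sketch: sigmoid output head, corrected step counts, lower Adam LR.
from keras.optimizers import Adam

predictions = Dense(1, activation='sigmoid')(x)  # sigmoid instead of softmax
model = Model(inputs=base_model.input, outputs=predictions)

adam = Adam(0.0001)  # lower learning rate
model.compile(loss="binary_crossentropy", optimizer=adam, metrics=["accuracy"])

model.fit_generator(
    training_set,
    steps_per_epoch=1000 // 5,   # 1000 training images / batch size 5
    epochs=20,
    validation_data=test_set,
    validation_steps=100 // 5)   # 100 test images / batch size 5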
Using softmax for the activation does not make sense with a single output unit. Your single output value is normalized by itself and therefore always equals 1: the purpose of softmax is to make the values sum to 1, and with a single value you always get exactly 1. I believe that at some moment you got 0 as the predicted value, which resulted in a division by zero and a NaN loss value.
You should either change the number of classes to 2 by using:
- predictions = Dense(2, activation='softmax')(x)
- class_mode='categorical' in flow_from_directory
- loss="categorical_crossentropy"
or use the sigmoid activation function for the last layer.
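A minimal sketch of the two-class variant (assuming the same x tensor, base_model, and data directories as in the question):
# Sketch: two-class softmax head with matching categorical settings.
predictions = Dense(2, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

training_set = train_datagen.flow_from_directory(
    'C:/Users/Desktop/Transfer/train/',
    target_size=(64, 64),
    batch_size=5,
    class_mode='categorical')  # one-hot labels to match the 2-unit softmax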