Good training accuracy but bad evaluation - tensorflow

I trained a DNN model and got good training accuracy but bad evaluation accuracy.
def DNN_Metrix(shape, dropout):
    model = tf.keras.Sequential()
    print(shape)
    model.add(tf.keras.layers.Flatten(input_shape=shape))
    model.add(tf.keras.layers.Dense(10, activation=tf.nn.relu))

    for i in range(0, 2):
        model.add(tf.keras.layers.Dense(10, activation=tf.nn.relu))

    model.add(tf.keras.layers.Dense(8, activation=tf.nn.tanh))
    model.add(tf.keras.layers.Dense(1, activation=tf.nn.sigmoid))
    model.compile(loss='binary_crossentropy',
                  optimizer=tf.keras.optimizers.Adam(),
                  metrics=['accuracy'])
    return model

model_dnn = DNN_Metrix(shape=(28, 20, 1), dropout=0.1)
model_dnn.fit(
    train_dataset,
    steps_per_epoch=1000,
    epochs=10,
    verbose=2
)
Here is my training process and the result:
Epoch 10/10
- 55s - loss: 0.4763 - acc: 0.7807
But when I evaluated on the test dataset, I got:
result = model_dnn.evaluate(np.array(X_test), np.array(y_test), batch_size=len(X_test))
loss, accuracy = [0.9485417604446411, 0.3649936616420746]
It's a binary classification problem; the ratio of positive to negative labels is about 0.37 : 0.63.
I don't think this is a result of overfitting: I have 700k training instances, each with shape 28 * 20, and my DNN model is simple with few parameters.
Here is my code for generating the test and training data:
def parse_function(example_proto):
    dics = {
        'feature': tf.FixedLenFeature(shape=(), dtype=tf.string, default_value=None),
        'label': tf.FixedLenFeature(shape=(2), dtype=tf.float32),
        'shape': tf.FixedLenFeature(shape=(2), dtype=tf.int64)
    }
    parsed_example = tf.parse_single_example(example_proto, dics)
    parsed_example['feature'] = tf.decode_raw(parsed_example['feature'], tf.float64)
    parsed_example['feature'] = tf.reshape(parsed_example['feature'], [28, 20, 1])
    label_t = tf.cast(parsed_example['label'], tf.int32)
    parsed_example['label'] = parsed_example['label'][1]
    return parsed_example['feature'], parsed_example['label']

def read_tfrecord(train_tfrecord):
    dataset = tf.data.TFRecordDataset(train_tfrecord)
    dataset = dataset.map(parse_function)
    dataset = dataset.shuffle(buffer_size=10000)
    dataset = dataset.repeat(100)
    dataset = dataset.batch(670)
    return dataset

def read_tfrecord_test(test_tfrecord):
    dataset = tf.data.TFRecordDataset(test_tfrecord)
    dataset = dataset.map(parse_function)
    return dataset
# tf_record_target = 'train_csv_temp_norm_vx.tfrecords'
train_files = 'train_baseline.tfrecords'
test_files = 'test_baseline.tfrecords'

train_dataset = read_tfrecord(train_files)
test_dataset = read_tfrecord_test(test_files)

it_test_dts = test_dataset.make_one_shot_iterator()
it_train_dts = train_dataset.make_one_shot_iterator()

X_test = []
y_test = []
el = it_test_dts.get_next()
count = 1
with tf.Session() as sess:
    while True:
        try:
            x_t, y_t = sess.run(el)
            X_test.append(x_t)
            y_test.append(y_t)
        except tf.errors.OutOfRangeError:
            break

Judging from the fact that the class distribution in your test set is about 37% / 63% and your final accuracy is 0.365, I would first check the labels predicted on the test set.
Most probably, all of your predictions are class 0, given that class 0 accounts for about 37% of your dataset. If so, your neural network is not learning anything useful from the training set, and you have a severe case of overfitting.
I recommend always using a validation set, so that at the end of each epoch you can check whether your network has actually learnt anything. In a situation like yours, you would spot the overfitting issue very quickly.
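A minimal sketch of both checks, reusing model_dnn, X_test, y_test and train_dataset from the question (the 0.5 threshold is the usual cut-off for a sigmoid output; using the test data for validation here is only for illustration, a separate held-out split is preferable):
import numpy as np

# 1) Inspect the predicted labels: if almost everything lands on one side
#    of the 0.5 threshold, the model has collapsed to a single class.
probs = model_dnn.predict(np.array(X_test))
preds = (probs > 0.5).astype(int)
print(np.unique(preds, return_counts=True))

# 2) Track generalisation during training with a held-out validation set
#    (ideally a separate split, not the test set itself).
model_dnn.fit(
    train_dataset,
    steps_per_epoch=1000,
    epochs=10,
    validation_data=(np.array(X_test), np.array(y_test)),
    verbose=2,
)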

Training accuracy doesn't mean much. A NN can fit any random set of inputs and outputs, even if they're unrelated. That's why you want to use validation data.
After training, look at your loss curves; this will give you a better idea of where things are going wrong.
For classification problems, NNs default to guessing the most popular class they have seen in the training data. This is usually what happens when the experiment isn't set up correctly.
And since you're dealing with binary classification, you might want to look at something like StratifiedKFold, which will give you folds of train/test data in which the class proportions are preserved.
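A minimal sketch of what that looks like with scikit-learn (X and y here are placeholders for your own feature and label arrays):
from sklearn.model_selection import StratifiedKFold

# X: feature array, e.g. shape (n_samples, 28, 20, 1); y: binary labels (placeholders)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, val_idx) in enumerate(skf.split(X, y)):
    X_tr, X_val = X[train_idx], X[val_idx]
    y_tr, y_val = y[train_idx], y[val_idx]
    # each fold keeps roughly the same 0.37 : 0.63 class ratio as the full dataset
    print(f"fold {fold}: positive fraction in val = {y_val.mean():.3f}")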

Related

Tensorflow Probability is stuck with negative binomial distribution on fit, no error

I have this simple Bayesian neural network that gets stuck on .fit(x, y):
def get_model(input_shape, loss, optimizer, metrics, kl_weight, output_shape):
    inputs = Input(shape=(input_shape))
    x = BatchNormalization()(inputs)
    x = tfpl.DenseVariational(units=128, activation='tanh',
                              make_posterior_fn=get_posterior,
                              make_prior_fn=get_prior,
                              kl_weight=kl_weight)(x)
    count = Dense(1)(x)
    logits = Dense(output_shape, activation='sigmoid')(x)
    neg_binom = tfp.layers.DistributionLambda(
        lambda t: tfd.NegativeBinomial(total_count=t[..., 0:1], logits=t[..., 1:]))
    cat = Concatenate(axis=-1)([count, logits])
    outputs = neg_binom(cat)
    model = Model(inputs, outputs)
    model.compile(optimizer=optimizer, loss=loss, metrics=metrics)
    return model
I do not get an error; the model compiles, and when I call model.fit(x, y) I just get:
Epoch 1/500
and it stays stuck there forever (the longest I waited was about 20 minutes).
When I use a Poisson layer instead, as I did before, fitting starts instantly and an epoch takes about 1 s.
What could be the cause of this?
Many thanks for your insights and any tips on things to try to debug this behaviour.
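Not an answer from the original thread, but a generic first debugging step (an assumption, not a confirmed fix) is to compile with run_eagerly=True and fit on a tiny slice of the data, so prints and breakpoints work inside the training step and you can see where it hangs:
# Hypothetical debugging sketch: eager execution of the train step on a tiny
# subset makes it easier to localise the hang (e.g. NaNs or a slow KL term).
model = get_model(input_shape, loss, optimizer, metrics, kl_weight, output_shape)
model.compile(optimizer=optimizer, loss=loss, metrics=metrics, run_eagerly=True)
model.fit(x[:32], y[:32], epochs=1, batch_size=8, verbose=1)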

XGBoost iterative training: Not having all 0,...,C labels in minibatch without erroring

When training XGBoost iteratively on data too large to fit in memory, one may want to use "batches". The problem, however, is that each batch may not contain all 0,...,C labels. This leads to the error ValueError: The label must consist of integer labels of form 0, 1, 2, ..., [num_class-1].
Is there a way to train XGBoost where we just have some subset of the labels, which may not contain zero?
The code has a structure similar to this:
train = module.trainloader
test = module.valloader

# Train on one minibatch to get started
sample = next(iter(train))
X = xgb.DMatrix(sample[0].numpy(), label=sample[1].numpy())

params = {
    'learning_rate': 0.007,
    'updater': 'refresh',
    'process_type': 'update',
}

# Get initial model training
model = xgb.train(params, dtrain=X)

for i, (trainsample, valsample) in enumerate(zip(train, test)):
    X_train, y_train = trainsample
    X_test, y_test = valsample
    X_train = xgb.DMatrix(X_train, label=y_train)
    X_test = xgb.DMatrix(X_test)
    model = xgb.train(params, dtrain=X_train, xgb_model=model)
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print(accuracy)
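No answer is included above, but one commonly suggested workaround (an assumption here, not verified against this exact setup) is to declare the full label space up front with the multiclass objective and num_class, so XGBoost never has to infer the number of classes from a single minibatch:
# Hypothetical sketch: C is the total number of classes across the whole
# dataset (assumed known in advance), declared once in the params.
params = {
    'learning_rate': 0.007,
    'updater': 'refresh',
    'process_type': 'update',
    'objective': 'multi:softprob',
    'num_class': C,
}
model = xgb.train(params, dtrain=X)  # then continue the minibatch loop as above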

Completely different results using Tensorflow and Pytorch for MobilenetV3 Small

I am using transfer learning from MobileNetV3 Small to predict 5 different points on an image. I am doing this as a regression task.
For both models:
Setting the last 50 layers trainable and adding the same fully connected layers to the end.
Learning rate 3e-2
Batch size 32
Adam optimizer with the same betas
100 epochs
The inputs consist of RGB unscaled images
Pytorch
Model
def _init_weights(m):
    if type(m) == nn.Linear:
        nn.init.xavier_uniform_(m.weight)
        m.bias.data.fill_(0.01)

def get_mob_v3_small():
    model = torchvision.models.mobilenet_v3_small(pretrained=True)
    children_list = get_children(model)
    for c in children_list[:-50]:
        for p in c.parameters():
            p.requires_grad = False
    return model

class TransferMobileNetV3_v2(nn.Module):
    def __init__(self,
                 num_keypoints: int = 5):
        super(TransferMobileNetV3_v2, self).__init__()
        self.classifier_neurons = num_keypoints * 2
        self.base_model = get_mob_v3_small()
        self.base_model.classifier = nn.Sequential(
            nn.Linear(in_features=1024, out_features=1024),
            nn.ReLU(),
            nn.Linear(in_features=1024, out_features=512),
            nn.ReLU(),
            nn.Linear(in_features=512, out_features=self.classifier_neurons)
        )
        self.base_model.apply(_init_weights)

    def forward(self, x):
        out = self.base_model(x)
        return out
Training Script
def train(net, trainloader, testloader, train_loss_fn, optimizer, scaler, args):
    len_dataloader = len(trainloader)
    for epoch in range(1, args.epochs + 1):
        net.train()
        for batch_idx, sample in enumerate(trainloader):
            inputs, labels = sample
            inputs, labels = inputs.to(args.device), labels.to(args.device)

            optimizer.zero_grad()
            with torch.cuda.amp.autocast(args.use_amp):
                prediction = net(inputs)
                loss = train_loss_fn(prediction, labels)

            scaler.scale(loss).backward()
            scaler.step(optimizer)
            scaler.update()

def main():
    args = make_args_parser()
    args.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    seed = args.seed
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    np.random.seed(seed)

    loss_fn = nn.MSELoss()
    optimizer = optim.Adam(net.parameters(), lr=3e-2,
                           betas=(0.9, 0.999))
    scaler = torch.cuda.amp.GradScaler(enabled=args.use_amp)

    train(net, train_loader, test_loader, loss_fn, optimizer, scaler, args)
Tensorflow
Model
base_model = tf.keras.applications.MobileNetV3Small(weights='imagenet',
                                                    input_shape=(224, 224, 3))
x_in = base_model.layers[-6].output
x = Dense(units=1024, activation="relu")(x_in)
x = Dense(units=512, activation="relu")(x)
x = Dense(units=10, activation="linear")(x)
model = Model(inputs=base_model.input, outputs=x)

for layer in model.layers[:-50]:
    layer.trainable = False
Training Script
model.compile(loss="mse",
              optimizer=tf.keras.optimizers.Adam(learning_rate=3e-2))
history = model.fit(input_numpy, output_numpy,
                    verbose=1,
                    batch_size=32, epochs=100, validation_split=0.2)
Results
The PyTorch model predicts one single point around the center for all 5 different points.
The TensorFlow model predicts the points quite well and they are quite accurate.
The loss of the PyTorch model is much higher than that of the TensorFlow model.
Please do let me know what is going wrong, as I am trying my best to shift to PyTorch for this work and I need this model to give me similar/identical results.
Note: I also noticed that the MobileNetV3 Small architecture seems to differ between PyTorch and TensorFlow. I do not know if I am interpreting it wrong, but I'm putting it here just in case.
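One difference worth ruling out (an observation added here, not part of the original post): torchvision's pretrained weights expect inputs scaled to [0, 1] and normalised with ImageNet statistics, whereas the Keras MobileNetV3 application bundles its preprocessing inside the model and accepts raw [0, 255] images. A minimal sketch of the PyTorch-side transform:
import torchvision.transforms as T

# Standard ImageNet preprocessing assumed by torchvision's pretrained models.
imagenet_preprocess = T.Compose([
    T.ToTensor(),                            # HWC uint8 [0, 255] -> CHW float [0, 1]
    T.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet channel means
                std=[0.229, 0.224, 0.225]),  # ImageNet channel stds
])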

Selecting loss and metrics for Tensorflow model

I'm trying to do transfer learning, using a pretrained Xception model with a newly added classifier.
This is the model:
base_model = keras.applications.Xception(
    weights="imagenet",
    input_shape=(224, 224, 3),
    include_top=False
)
The dataset I'm using is oxford_flowers102, taken directly from TensorFlow Datasets (see its dataset page).
I have a problem with selecting some parameters - either training accuracy shows suspiciously low values, or there's an error.
I need help specifying these parameters for this (oxford_flowers102) dataset:
1. The newly added dense layer for the classifier. I was trying with
outputs = keras.layers.Dense(102, activation='softmax')(x)
and I'm not sure whether I should select the activation function here or not.
2. The loss function for the model.
3. The metrics.
I tried:
model.compile(
    optimizer=keras.optimizers.Adam(),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[keras.metrics.Accuracy()],
)
I'm not sure whether it should be SparseCategoricalCrossentropy or CategoricalCrossentropy, and what about the from_logits parameter?
I'm also not sure whether I should choose keras.metrics.Accuracy() or keras.metrics.CategoricalAccuracy() for metrics.
I am definitely lacking some theoretical knowledge, but right now I just need this to work. Looking forward to your answers!
About the data set: oxford_flowers102
The dataset is divided into a training set, a validation set, and a test set. The training set and validation set each consist of 10 images per class (totaling 1020 images each). The test set consists of the remaining 6149 images (minimum 20 per class).
'test' 6,149
'train' 1,020
'validation' 1,020
If we check, we'll see:
import tensorflow_datasets as tfds
tfds.disable_progress_bar()

data, ds_info = tfds.load('oxford_flowers102',
                          with_info=True, as_supervised=True)
train_ds, valid_ds, test_ds = data['train'], data['validation'], data['test']

for i, data in enumerate(train_ds.take(3)):
    print(i + 1, data[0].shape, data[1])
1 (500, 667, 3) tf.Tensor(72, shape=(), dtype=int64)
2 (500, 666, 3) tf.Tensor(84, shape=(), dtype=int64)
3 (670, 500, 3) tf.Tensor(70, shape=(), dtype=int64)
ds_info.features["label"].num_classes
102
So, it has 102 categories or classes, the target is an integer label, and the input images come in different shapes.
Clarification
First, if you keep this integer target or label, you should use sparse_categorical_accuracy for accuracy and sparse_categorical_crossentropy for the loss function. But if you transform your integer label into a one-hot encoded vector, then you should use categorical_accuracy for accuracy and categorical_crossentropy for the loss function. As this dataset has integer labels, you can choose sparse_categorical, or you can transform the labels to one-hot in order to use categorical.
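For reference, the integer-to-one-hot transformation mentioned here is a single call (the same tf.one_hot call reappears in the full example below):
# integer label (0..101) -> one-hot vector of length 102
label_onehot = tf.one_hot(label, depth=102)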
Second, if you set outputs = keras.layers.Dense(102, activation='softmax')(x) for the last layer, you will get probability scores. But if you set outputs = keras.layers.Dense(102)(x), then you will get logits. So, if you set activation='softmax', then you should not use from_logits=True. For example, in your code above you should do as follows (here's some theory for you):
...
(a)
# Use softmax activation (no logits output)
outputs = keras.layers.Dense(102, activation='softmax')(x)
...
model.compile(
    optimizer=keras.optimizers.Adam(),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=False),
    metrics=[keras.metrics.Accuracy()],
)
or,
(b)
# no activation, output will be logits
outputs = keras.layers.Dense(102)(x)
...
model.compile(
    optimizer=keras.optimizers.Adam(),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[keras.metrics.Accuracy()],
)
Third, Keras uses string identifiers such as metrics=['acc'] and optimizer='adam'. But in your case you need to be a bit more specific, since you specify the loss function explicitly. So, instead of keras.metrics.Accuracy(), you should choose keras.metrics.SparseCategoricalAccuracy() if your targets are integers, or keras.metrics.CategoricalAccuracy() if your targets are one-hot encoded vectors.
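For completeness, a minimal sketch of the sparse (integer-label) combination, matching option (a) above:
# Sparse variant: integer labels, softmax probabilities from the last layer.
outputs = keras.layers.Dense(102, activation='softmax')(x)
...
model.compile(
    optimizer=keras.optimizers.Adam(),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=False),
    metrics=[keras.metrics.SparseCategoricalAccuracy()],
)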
Code Examples
Here is an end-to-end example. Note that I will transform the integer labels to one-hot encoded vectors (right now, it's a matter of preference for me). Also, I want probabilities (not logits) from the last layer, which means from_logits = False. For all of this, I need to choose the following parameters for training:
# use softmax to get probabilities
outputs = keras.layers.Dense(102,
                             activation='softmax')(x)

# so no logits; set it to False (FYI, it is already False by default)
loss = keras.losses.CategoricalCrossentropy(from_logits=False),

# specify the metrics properly
metrics = keras.metrics.CategoricalAccuracy(),
Let's complete the whole code.
import tensorflow_datasets as tfds
tfds.disable_progress_bar()

data, ds_info = tfds.load('oxford_flowers102',
                          with_info=True, as_supervised=True)
train_ds, valid_ds, test_ds = data['train'], data['validation'], data['test']

NUM_CLASSES = ds_info.features["label"].num_classes
train_size = len(data['train'])

batch_size = 64
img_size = 120
Preprocess and Augmentation
import tensorflow as tf

# pre-process functions
def normalize_resize(image, label):
    image = tf.cast(image, tf.float32)
    image = tf.divide(image, 255)
    image = tf.image.resize(image, (img_size, img_size))
    label = tf.one_hot(label, depth=NUM_CLASSES)  # int to one-hot
    return image, label

# augmentation
def augment(image, label):
    image = tf.image.random_flip_left_right(image)
    return image, label

train = train_ds.map(normalize_resize).cache().map(augment).shuffle(100).\
    batch(batch_size).repeat()
valid = valid_ds.map(normalize_resize).cache().batch(batch_size)
test = test_ds.map(normalize_resize).cache().batch(batch_size)
Model
from tensorflow import keras

base_model = keras.applications.Xception(
    weights='imagenet',
    input_shape=(img_size, img_size, 3),
    include_top=False)
base_model.trainable = False

inputs = keras.Input(shape=(img_size, img_size, 3))
x = base_model(inputs, training=False)
x = keras.layers.GlobalAveragePooling2D()(x)
outputs = keras.layers.Dense(NUM_CLASSES, activation='softmax')(x)
model = keras.Model(inputs, outputs)
Okay, additionally, here I like to use two metrics to compute top-1 and top-3 accuracy.
model.compile(optimizer=keras.optimizers.Adam(),
              loss=keras.losses.CategoricalCrossentropy(),
              metrics=[
                  keras.metrics.TopKCategoricalAccuracy(k=3, name='acc_top3'),
                  keras.metrics.TopKCategoricalAccuracy(k=1, name='acc_top1')
              ])

model.fit(train, steps_per_epoch=train_size // batch_size,
          epochs=20, validation_data=valid, verbose=2)
...
Epoch 19/20
15/15 - 2s - loss: 0.2808 - acc_top3: 0.9979 - acc_top1: 0.9917 -
val_loss: 1.5025 - val_acc_top3: 0.8147 - val_acc_top1: 0.6186
Epoch 20/20
15/15 - 2s - loss: 0.2743 - acc_top3: 0.9990 - acc_top1: 0.9885 -
val_loss: 1.4948 - val_acc_top3: 0.8147 - val_acc_top1: 0.6255
Evaluate
# evaluate on test set
model.evaluate(test, verbose=2)
97/97 - 18s - loss: 1.6482 - acc_top3: 0.7733 - acc_top1: 0.5994
[1.648208498954773, 0.7732964754104614, 0.5994470715522766]

CNN with imbalanced data stuck with 70% testing accuracy

I'm working on an image classification task for diabetic retinopathy with fundus image data. There are 5 classes. The data distribution is 1805 images (class 1), 370 images (class 2), 999 images (class 3), 193 images (class 4), and 295 images (class 5).
Here are the steps that I have tried to run:
Preprocessing (resized 224 * 224)
The train/test split is 85% : 15%
x_train, xtest, y_train, ytest = train_test_split(
    x_train, y_train,
    test_size=0.15,
    random_state=SEED,
    stratify=y_train
)
Data augmentation
ImageDataGenerator(
    zoom_range=0.15,
    fill_mode='constant',
    cval=0.,
    horizontal_flip=True,
    vertical_flip=True,
)
Training with the ResNet-50 model and cross-validation
def getResNet():
    modelres = ResNet50(weights=None, include_top=False,
                        input_shape=(IMAGE_HEIGHT, IMAGE_HEIGHT, 3))
    x = modelres.output
    x = GlobalAveragePooling2D()(x)
    x = Dense(5, activation='softmax')(x)
    model = Model(inputs=modelres.input, outputs=x)
    return model

num_folds = 5
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=2021)

cvscores = []
fold = 1

for train, val in skf.split(x_train, y_train.argmax(1)):
    print('Fold: ', fold)
    Xtrain = x_train[train]
    Xval = x_train[val]
    Ytrain = y_train[train]
    Yval = y_train[val]

    data_generator = create_datagen().flow(Xtrain, Ytrain, batch_size=32, seed=2021)

    model = getResNet()
    model.compile(loss='categorical_crossentropy',
                  optimizer=Adam(lr=0.0001),
                  metrics=['accuracy'])

    with tf.compat.v1.device('/device:GPU:0'):
        model_train = model.fit(data_generator,
                                validation_data=(Xval, Yval),
                                epochs=30, batch_size=32, verbose=1)
    model_name = 'cnn_keras_aug_Fold_' + str(fold) + '.h5'
    model.save(model_name)

    scores = model.evaluate(xtest, ytest, verbose=0)
    print("%s: %.2f%%" % (model.metrics_names[1], scores[1] * 100))
    cvscores.append(scores[1] * 100)
    fold = fold + 1
The maximum results I got from this method were a training accuracy of 81.2%, a validation accuracy of 72.2%, and a test accuracy of 70.73%.
Can anyone give me ideas to improve the model so that I can get the test accuracy above 90%, if possible?
Later, I will use this model as a pre-trained model to train diabetic retinopathy data as well but from other sources.
BTW, I've tried replacing my preprocessing with this method:
def preprocessing(path):
    image = cv2.imread(path)
    image = crop_image_from_gray(image)
    green = image[:, :, 1]
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    cl = clahe.apply(green)
    image[:, :, 0] = image[:, :, 0]
    image[:, :, 2] = image[:, :, 2]
    image[:, :, 1] = cl
    image = cv2.resize(image, (224, 224))
    return image
I've also tried replacing my model with VGG16 and EfficientNetB0. However, none of that had much effect on my results; I'm still stuck at about 70% accuracy.
Please help me come up with ideas to improve my modeling results.
Your training accuracy is 81.2%. It is generally impossible to have a testing accuracy higher than the training accuracy, i.e. with the current setup you will not reach 90%.
However, your validation (and also testing) accuracy is about 70-72%. My suggestion is that your model is overfitting on your small dataset. So if you add model regularization (e.g. dropout), it is possible that the gap between your training and your validation (and test) scores will decrease. This way you can improve your validation score.
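A minimal sketch of that suggestion, adding a Dropout layer to the head from getResNet() above (assumes Dropout is imported alongside the other Keras layers; the 0.5 rate is an arbitrary starting point, not a tuned value):
def getResNetWithDropout(dropout_rate=0.5):
    modelres = ResNet50(weights=None, include_top=False,
                        input_shape=(IMAGE_HEIGHT, IMAGE_HEIGHT, 3))
    x = modelres.output
    x = GlobalAveragePooling2D()(x)
    x = Dropout(dropout_rate)(x)          # randomly zeroes units during training
    x = Dense(5, activation='softmax')(x)
    return Model(inputs=modelres.input, outputs=x)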
To further increase the score, you need to check your data manually, try to understand which classes contribute the most to the errors, and figure out how those errors can be reduced (e.g. by updating your preprocessing pipeline).
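A minimal sketch of that kind of per-class error analysis, reusing model, xtest and ytest from the question (assumes ytest is one-hot encoded, as in the training code):
import numpy as np
from sklearn.metrics import confusion_matrix, classification_report

y_true = np.argmax(ytest, axis=1)
y_pred = np.argmax(model.predict(xtest), axis=1)

print(confusion_matrix(y_true, y_pred))        # which classes get confused with which
print(classification_report(y_true, y_pred))   # per-class precision, recall, F1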