Selecting loss and metrics for Tensorflow model - tensorflow

I'm trying to do transfer learning, using a pretrained Xception model with a newly added classifier.
This is the model:
base_model = keras.applications.Xception(
weights="imagenet",
input_shape=(224,224,3),
include_top=False
)
The dataset I'm using is oxford_flowers102 taken directly from tensorflow datasets.
This is a dataset page.
I have a problem with selecting some parameters - either training accuracy shows suspiciously low values, or there's an error.
I need help with specifying this parameter, for this (oxford_flowers102) dataset:
Newly added dense layer for the classifier. I was trying with:
outputs = keras.layers.Dense(102, activation='softmax')(x) and I'm not sure whether I should select the activation function here or not.
loss function for model.
metrics.
I tried:
model.compile(
optimizer=keras.optimizers.Adam(),
loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=[keras.metrics.Accuracy()],
)
I'm not sure whether it should be SparseCategoricalCrossentropy or CategoricalCrossentropy, and what about from_logits parameter?
I'm also not sure whether should I choose for metricskeras.metrics.Accuracy() or keras.metrics.CategoricalAccuracy()
I am definitely lacking some theoretical knowledge, but right now I just need this to work. Looking forward to your answers!

About the data set: oxford_flowers102
The dataset is divided into a training set, a validation set, and a test set. The training set and validation set each consist of 10 images per class (totaling 1020 images each). The test set consists of the remaining 6149 images (minimum 20 per class).
'test' 6,149
'train' 1,020
'validation' 1,020
If we check, we'll see
import tensorflow_datasets as tfds
tfds.disable_progress_bar()
data, ds_info = tfds.load('oxford_flowers102',
with_info=True, as_supervised=True)
train_ds, valid_ds, test_ds = data['train'], data['validation'], data['test']
for i, data in enumerate(train_ds.take(3)):
print(i+1, data[0].shape, data[1])
1 (500, 667, 3) tf.Tensor(72, shape=(), dtype=int64)
2 (500, 666, 3) tf.Tensor(84, shape=(), dtype=int64)
3 (670, 500, 3) tf.Tensor(70, shape=(), dtype=int64)
ds_info.features["label"].num_classes
102
So, it has 102 categories or classes and the target comes with an integer with different shapes input.
Clarification
First, if you keep this integer target or label, you should use sparse_categorical_accuracy for accuracy and sparse_categorical_crossentropy for loss function. But if you transform your integer label to a one-hot encoded vector, then you should use categorical_accuracy for accuracy, and categorical_crossentropy for loss function. As these data set have integer labels, you can choose sparse_categorical or you can transform the label to one-hot in order to use categorical.
Second, if you set outputs = keras.layers.Dense(102, activation='softmax')(x) to the last layer, you will get probabilities score. But if you set outputs = keras.layers.Dense(102)(x), then you will get logits. So, if you set activations='softmax', then you should not use from_logit = True. For example in your above code you should do as follows (here's some theory for you):
...
(a)
# Use softmax activation (no logits output)
outputs = keras.layers.Dense(102, activation='softmax')(x)
...
model.compile(
optimizer=keras.optimizers.Adam(),
loss=keras.losses.SparseCategoricalCrossentropy(from_logits=False),
metrics=[keras.metrics.Accuracy()],
)
or,
(b)
# no activation, output will be logits
outputs = keras.layers.Dense(102)(x)
...
model.compile(
optimizer=keras.optimizers.Adam(),
loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=[keras.metrics.Accuracy()],
)
Third, keras uses string identifier such as metrics=['acc'] , optimizer='adam'. But in your case, you need to be a bit more specific as you mention loss function specific. So, instead of keras.metrics.Accuracy(), you should choose keras.metrics.SparseCategoricalAccuracy() if you target are integer or keras.metrics.CategoricalAccuracy() if your target are one-hot encoded vector.
Code Examples
Here is an end-to-end example. Note, I will transform integer labels to a one-hot encoded vector (right now, it's a matter of preference to me). Also, I want probabilities (not logits) from the last layer which means from_logits = False. And for all of these, I need to choose the following parameters in my training:
# use softmax to get probabilities
outputs = keras.layers.Dense(102,
activation='softmax')(x)
# so no logits, set it false (FYI, by default it already false)
loss = keras.losses.CategoricalCrossentropy(from_logits=False),
# specify the metrics properly
metrics = keras.metrics.CategoricalAccuracy(),
Let's complete the whole code.
import tensorflow_datasets as tfds
tfds.disable_progress_bar()
data, ds_info = tfds.load('oxford_flowers102',
with_info=True, as_supervised=True)
train_ds, valid_ds, test_ds = data['train'], data['validation'], data['test']
NUM_CLASSES = ds_info.features["label"].num_classes
train_size = len(data['train'])
batch_size = 64
img_size = 120
Preprocess and Augmentation
import tensorflow as tf
# pre-process functions
def normalize_resize(image, label):
image = tf.cast(image, tf.float32)
image = tf.divide(image, 255)
image = tf.image.resize(image, (img_size, img_size))
label = tf.one_hot(label , depth=NUM_CLASSES) # int to one-hot
return image, label
# augmentation
def augment(image, label):
image = tf.image.random_flip_left_right(image)
return image, label
train = train_ds.map(normalize_resize).cache().map(augment).shuffle(100).\
batch(batch_size).repeat()
valid = valid_ds.map(normalize_resize).cache().batch(batch_size)
test = test_ds.map(normalize_resize).cache().batch(batch_size)
Model
from tensorflow import keras
base_model = keras.applications.Xception(
weights='imagenet',
input_shape=(img_size, img_size, 3),
include_top=False)
base_model.trainable = False
inputs = keras.Input(shape=(img_size, img_size, 3))
x = base_model(inputs, training=False)
x = keras.layers.GlobalAveragePooling2D()(x)
outputs = keras.layers.Dense(NUM_CLASSES, activation='softmax')(x)
model = keras.Model(inputs, outputs)
Okay, additionally, here I like to use two metrics to compute top-1 and top-3 accuracy.
model.compile(optimizer=keras.optimizers.Adam(),
loss=keras.losses.CategoricalCrossentropy(),
metrics=[
keras.metrics.TopKCategoricalAccuracy(k=3, name='acc_top3'),
keras.metrics.TopKCategoricalAccuracy(k=1, name='acc_top1')
])
model.fit(train, steps_per_epoch=train_size // batch_size,
epochs=20, validation_data=valid, verbose=2)
...
Epoch 19/20
15/15 - 2s - loss: 0.2808 - acc_top3: 0.9979 - acc_top1: 0.9917 -
val_loss: 1.5025 - val_acc_top3: 0.8147 - val_acc_top1: 0.6186
Epoch 20/20
15/15 - 2s - loss: 0.2743 - acc_top3: 0.9990 - acc_top1: 0.9885 -
val_loss: 1.4948 - val_acc_top3: 0.8147 - val_acc_top1: 0.6255
Evaluate
# evaluate on test set
model.evaluate(test, verbose=2)
97/97 - 18s - loss: 1.6482 - acc_top3: 0.7733 - acc_top1: 0.5994
[1.648208498954773, 0.7732964754104614, 0.5994470715522766]

Related

Deep Learning model (LSTM) predicts same class label

I am trying to solve the Spoken Digit Recognition task using the LSTM model, where the audio files are converted into spectrograms and fed into an LSTM model after doing Global Average Pooling. Here is the architecture of it
tf.keras.backend.clear_session()
#input layer
input_= Input(shape = (64, 35))
lstm = LSTM(100, activation='tanh', return_sequences= True, kernel_regularizer = l2(0.000001),
recurrent_initializer = 'glorot_uniform')(input_)
lstm = GlobalAveragePooling1D(data_format='channels_first')(lstm)
dense = Dense(20, activation='relu', kernel_regularizer = l2(0.000001), kernel_initializer='glorot_uniform')(lstm)
drop = Dropout(0.8)(dense)
dense1 = Dense(25, activation='relu', kernel_regularizer = l2(0.000001), kernel_initializer= 'he_uniform')(drop)
drop = Dropout(0.95)(dense1)
output = Dense(10,activation = 'softmax', kernel_regularizer = l2(0.000001), kernel_initializer= 'glorot_uniform')(drop)
model_2 = Model(inputs = [input_], outputs = output)
model_2.summary()
Having summary as -
I need to calculate the F1 score to check the performance of the model, I have implemented a custom callback and used TensorFlow addons F1 score too. However, I won't get the correct result, for every epoch I get the constant F1 score value.
On further digging, I found out that my model predicts the same class label, for the entire epoch, whereas it is supposed to predict 10 classes in one epoch. as there are 10 class label values present.
Here is my model.compile and model.predict commands. I have used TensorFlow addon here -
from tensorflow import keras
opt = keras.optimizers.Adam(0.001, clipnorm=0.8)
model_2.compile(loss='categorical_crossentropy', optimizer=opt, metrics = metric)
hist = model_2.fit([X_train_spectrogram],
[y_train_converted],
validation_data= ([X_test_spectrogram], [y_test_converted]),
epochs = 10,
verbose =1,
callbacks=[tensorBoard_callbk2, ClearMemory()],
# steps_per_epoch = 3,
batch_size=32)
Here is what I mean by getting the same prediction, the entire array is filled with the same predicted values.
Why is the model predicting the same class label? or How to rectify it?
I have tried increasing the number of trainable parameters, increasing - decreasing batch size too, but it won't help me. If anyone knows can you please help me out?

Tensorflow 2: How to fit a subclassed model that returns multiple values in the call method?

I built the following model via Model Subclassing in TensorFlow 2:
from tensorflow.keras import Model, Input
from tensorflow.keras.applications import DenseNet201
from tensorflow.keras.applications.densenet import preprocess_input
from tensorflow.keras.layers import Flatten, Dense
class Detector(Model):
def __init__(self, num_classes=3, name="DenseNet201"):
super(Detector, self).__init__(name=name)
self.feature_extractor = DenseNet201(
include_top=False,
weights="imagenet",
)
self.feature_extractor.trainable = False
self.flatten_layer = Flatten()
self.prediction_layer = Dense(num_classes, activation=None)
def call(self, inputs):
x = preprocess_input(inputs)
extracted_feature = self.feature_extractor(x, training=False)
x = self.flatten_layer(extracted_feature)
y_hat = self.prediction_layer(x)
return extracted_feature, y_hat
The subsequent steps are compiling and fitting the model. The model compiled as normal but when fitting my image generator (built from ImageDataGenerator), I encountered the error: InvalidArgumentError: Incompatible shapes: [64,18,18] vs. [64,1] [[node Equal (defined at :19) ]] [Op:__inference_train_function_32187] Function call stack: train_function –.
history = detector.fit(
train_generator,
epochs=1,
validation_data=val_generator,
callbacks=callbacks
)
This is obvious because TensorFlow does not know whether the prediction is y_hat or extracted_featureduring detector.fit() and thus threw an error. So, what is the right implementation of detector.fit for my case?
Following this question-answer1, you should first train your model with (let's say) one input and one output. And later if you want to compute grad-cam, you would pick some intermediate layer of your base model (not the final output of the base model) and in that case, you need to build your feature extractor separately. For example
# (let's say: one input and one output)
# use for training
base_model = keras.application(...)
x = base_model(..)
dese_drop_bn_[whatever] = x
out = dese_drop_bn_[whatever]
model = Model(base_model.input, out)
# inference / we need to compute grad cam
new_model = tf.keras.models.Model(model.input,
[model.layers[15].output, model.output])
In the above, the model is used for training, and later in inference time if you need to compute grad-cam based on the layer for example layer number 15, you need to build new_model with appropriate outputs. Hope this makes things clear. For more information about feature extraction, see the official doc, Extract and reuse nodes in the graph of layers2. FYI, the exact same things are happening here as I informed you earlier. Also, check this official code example, you will see exact same thing there.
However, there is another way that I'm thinking might work for your easily. That is, as you're using a custom model, we can take the privilege training argument in the call() method. Normally in training time, this is True and for inference time it's False. So, based on this, we can return desired output the accordingly. Here is the complete code example:
import tensorflow as tf
# get some data
data_dir = tf.keras.utils.get_file(
'flower_photos',
'https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz',
untar=True)
datagen_kwargs = dict(rescale=1./255, validation_split=.20)
dataflow_kwargs = dict(target_size=(64, 64),
batch_size=16,
interpolation="bilinear")
train_datagen = tf.keras.preprocessing.image.ImageDataGenerator(
rotation_range=40,
horizontal_flip=True,
width_shift_range=0.2, height_shift_range=0.2,
shear_range=0.2, zoom_range=0.2,
**datagen_kwargs)
train_generator = train_datagen.flow_from_directory(
data_dir, subset="training", shuffle=True, **dataflow_kwargs)
for image, label in train_generator:
print(image.shape, image.dtype)
print(label.shape, label.dtype)
print(label[:4])
break
(16, 64, 64, 3) float32
(16, 5) float32
[[0. 0. 0. 0. 1.]
[0. 0. 0. 1. 0.]
[0. 0. 0. 1. 0.]
[0. 0. 0. 0. 1.]]
Here we do that trick based on the boolean value of training in the call method.
class Detector(Model):
def __init__(self, num_classes=5, name="DenseNet201"):
super(Detector, self).__init__(name=name)
self.feature_extractor = DenseNet201(
include_top=False,
weights="imagenet",
)
self.feature_extractor.trainable = False
self.flatten_layer = Flatten()
self.prediction_layer = Dense(num_classes, activation='softmax')
def call(self, inputs, training):
x = preprocess_input(inputs)
extracted_feature = self.feature_extractor(x, training=False)
x = self.flatten_layer(extracted_feature)
y_hat = self.prediction_layer(x)
if training:
return y_hat
else:
return [y_hat, extracted_feature]
Train
det = Detector()
det.compile(loss='categorical_crossentropy',
optimizer='adam', metrics=['acc'])
train_step = train_generator.samples // train_generator.batch_size
det.fit(train_generator,
steps_per_epoch=train_step,
validation_data=train_generator,
validation_steps=train_step,
epochs=2, verbose=2)
Epoch 1/2
37s 139ms/step - loss: 1.7543 - acc: 0.2650 - val_loss: 1.5310 - val_acc: 0.3764
Epoch 2/2
21s 115ms/step - loss: 1.4913 - acc: 0.3915 - val_loss: 1.3066 - val_acc: 0.4667
<tensorflow.python.keras.callbacks.History at 0x7fa2890b1790>
Evaluate
det.evaluate(train_generator,
steps=train_step)
4s 76ms/step - loss: 1.3066 - acc: 0.4667
[1.3065541982650757, 0.46666666865348816]
Inference
Here, we will get two outputs of this model (unlike 1 output that we've got in the training time).
y_hat, base_feature = det.predict(train_generator,
steps=train_step)
y_hat.shape, base_feature.shape
((720, 5), (720, 2, 2, 1920))
Now, you can do grad-cam or whatever require such feature maps.

constant loss values with normal CNNs and transfer learning

I am working on the dataset given in the paper https://arxiv.org/ftp/arxiv/papers/1511/1511.02459.pdf
In this paper, a dataset of images (portraits of people) is labeled by a floating number between 1 and 5 (1 ugly, 5 good looking). I wanted to work on this dataset and use MobileNetV2 with transfer learning (pretrained on Imagenet) in Tensorflow 2.4.0-dev20201009 with CUDA 11.1 on my RTX 3070 8gb. I don't really see my mistake but training my model yields often in constant validation loss, for example:
78/78 [==============================] - ETA: 0s - loss: 52145660442.33472020-11-20 13:19:36.796481: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:596] layout failed: Invalid argument: Size of values 2 does not match size of permutation 4 # fanin shape insequential/dense/BiasAdd-0-TransposeNHWCToNCHW-LayoutOptimizer
78/78 [==============================] - 16s 70ms/step - loss: 51654522711.5709 - val_loss: 9.5415
Epoch 2/300
78/78 [==============================] - 4s 52ms/step - loss: 9.4870 - val_loss: 9.5415
Epoch 3/300
78/78 [==============================] - 4s 52ms/step - loss: 9.3986 - val_loss: 9.5415
Epoch 4/300
78/78 [==============================] - 4s 51ms/step - loss: 9.4950 - val_loss: 9.5415
Epoch 5/300
78/78 [==============================] - 4s 52ms/step - loss: 9.4076 - val_loss: 9.5415
Epoch 6/300
78/78 [==============================] - 4s 52ms/step - loss: 9.4993 - val_loss: 9.5415
Epoch 7/300
78/78 [==============================] - 4s 52ms/step - loss: 9.3758 - val_loss: 9.5415
...
The validation loss would remain constant for 300 epochs. My code can be found here below. Let me summarize:
I used transfer-learning from Imagenet and froze the convolutional base of MobileNetV2.
I added a dense layer as the classificator and 1 output neuron. The loss function I used is MSE. The optimizer in the code is SGD, and I also tried ADAM which could also yield constant loss values on the validation set.
The above error (constant val loss) occurs also with different learning rates and with ADAM. Sometimes the same learning rate yields not constant val loss but reasonable loss. I assume this is due to the randomized weights initialization method on the dense layers in my classificator. I even tried absurd learning_rates like 10, and values are still constant. If the lr is very high then changes should be clearly seen! This is not the case. What is wrong?
My code:
import os
from typing import Dict, Any
from PIL import Image
from sklearn.model_selection import GridSearchCV
import tensorflow as tf
from tensorflow.keras.applications.mobilenet_v2 import MobileNetV2
from tensorflow.keras import layers
from tensorflow import keras
import matplotlib.pyplot as plt
import pickle
import numpy as np
import cv2
import random
#method to create the model
def create_model(IMG_SIZE, lr):
#Limit memore usage of GPU
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
try:
tf.config.experimental.set_virtual_device_configuration(gpus[0], [
tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024*7)])
except RuntimeError as e:
print(e)
model = keras.Sequential()
model.add(MobileNetV2(input_shape=(IMG_SIZE, IMG_SIZE, 3), include_top=False))
model.layers[0].trainable = False
model.add(layers.GlobalAveragePooling2D())
model.add(tf.keras.layers.Dropout(0.8))
model.add(layers.Dense(128, activation="relu"))
model.add(layers.Dense(1, activation="relu"))
#use adam or sgd as optimizers
adam = tf.keras.optimizers.Adam(learning_rate=lr, beta_1=0.9, beta_2=0.98,
epsilon=1e-9)
sgd = tf.keras.optimizers.SGD(lr=lr, decay=1e-6, momentum=0.5)
model.compile(optimizer=sgd,
loss=tf.losses.mean_squared_error,
)
model.summary()
return model
#preprocessing
def loadImages(IMG_SIZE):
path = os.path.join(os.getcwd(), 'data\\Images')
training_data=[]
labelMap = getLabelMap()
for img in os.listdir(path):
out_array = np.zeros((350,350, 3), np.float32) #original size of images in the dataset
try:
img_array = cv2.imread(os.path.join(path, img))
img_array=img_array.astype('float32') #cast to float because to prevent normalization erros
out_array = cv2.normalize(img_array, out_array, 0, 1, cv2.NORM_MINMAX) #normalize image
out_array = cv2.resize(out_array, (IMG_SIZE, IMG_SIZE)) #resize, bc we need 224x224 for Imagenet pretrained weights
training_data.append([out_array, float(labelMap[img])])
except Exception as e:
pass
return training_data
#preprocessing, the txt file All_labels.txt has lines of the form 'filename.jpg 3.2' and 3.2 is the label
def getLabelMap():
map = {}
path = os.getcwd()
path = os.path.join(path, "data\\train_test_files\\All_labels.txt")
f = open(path, "r")
for line in f:
line = line.split()
map[line[0]] = line[1]
f.close()
return map
#not important, in case you want to see the images after preprocessing
def showimg(image):
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
plt.imshow(image)
plt.show()
#pickle the preprocessed data
def pickle_it(training_set, IMG_SIZE):
X = []
Y = []
for features, label in training_set:
X.append(features)
Y.append(label)
X = np.array(X).reshape(-1, IMG_SIZE, IMG_SIZE, 3)
Y = np.array(Y)
pickle_out = open("X.pickle", "wb")
pickle.dump(X, pickle_out)
pickle_out.close()
pickle_out = open("Y.pickle", "wb")
pickle.dump(Y, pickle_out)
pickle_out.close()
#for prediction after training the model
def betterThan(y, Y):
Z=np.sort(Y)
cnt = 0
for z in Z:
if z>y:
break
else:
cnt = cnt+1
return float(cnt/len(Y))
#for prediction after training the model
def predictImage(image, model, Y):
img_array = cv2.imread(image)
img_array = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE))
img_array = np.array(img_array).reshape(-1, IMG_SIZE, IMG_SIZE, 3)
y = model.predict(img_array)
per = betterThan(y, Y)
print('You look better than ' + str(per) + '% of the dataset')
#Main/Driver function
#Preprocessing
IMG_SIZE = 224
training_set=[]
training_set = loadImages(IMG_SIZE)
random.shuffle(training_set)
pickle_it(training_set, IMG_SIZE) #I pickle my data, so that I don't always have to go through the preprocessing
#Load preprocessed data
X = pickle.load(open("X.pickle", "rb"))
Y = pickle.load(open("Y.pickle", "rb"))
#Just to check that the images look correct
showimg(X[0])
# define the grid search parameters, feel free to edit the grids
batch_size = [64]
epochsGrid = [300]
learning_rate = [0.1]
#save models and best parameters found in grid search
size_histories = {}
min_val_loss = 10
best_para = {}
#ignore this, used for bugs on my gpu... You possibly don't need this
config = tf.compat.v1.ConfigProto(gpu_options=tf.compat.v1.GPUOptions(allow_growth=True))
sess = tf.compat.v1.Session(config=config)
#grid search, training the model
for epochs in epochsGrid:
for batch in batch_size:
for lr in learning_rate:
model = create_model(IMG_SIZE, lr)
model_name = str(epochs) + '_' + str(batch) + '_' + str(lr)
#train the model with the given hyperparameters
size_histories[model_name] = model.fit(X, Y, batch_size=batch, epochs=epochs, validation_split=0.1)
# save model with the best loss value
if min(size_histories[model_name].history['val_loss']) < min_val_loss:
min_val_loss = min(size_histories[model_name].history['val_loss'])
best_para['epoch'] = epochs
best_para['batch'] = batch
best_para['lr'] = lr
model.save('savedModel')
#If you want to make prediction
model = tf.keras.models.load_model("savedModel")
image = os.path.join(os.getcwd(), 'data\\otherImages\\beautifulWomen.jpg')
predictImage(image, model, Y)
EDIT:
I have found the issue. It is 'relu' in the output neuron. When I change my loss from RMSE to MAPE I will see that I got a 100 percent error on validation. I assume this is because all my validation data is output to 0. This is only possible when the value in the output neuron before 'relu' is negative. I don't know why this is the case. But removing 'relu' will yield better training.
Does anyone know why 'relu' causes this problem with regression problems?
If this is your last layer
model.add(layers.Dense(1, activation="relu"))
then your models final output is y if y > 0 else 0. At your untrained state, your model could very well have y pinned to something like -17 or 17 with fairly equal chance. In the case of -17, the relu will convert that to 0 and also set the gradient to 0, which means the network doesn't learn. Yeah, the network doesn't learn anything from any part of a network where a relu unit output 0. In the case of the layer before
model.add(layers.Dense(128, activation="relu"))
there will be a really good chance that about half of the units will fire with a positive value and so they learn, so that layer is fine.
What can be done in the case of a bad initialization or after training a bad state in which the output of that last layer is pushed down to below 0? Well, what if we just don't use relu. What activation to use? None! Let's look at what that would be
1: model = keras.Sequential()
2: model.add(MobileNetV2(input_shape=(IMG_SIZE, IMG_SIZE, 3), include_top=False))
3: model.layers[0].trainable = False
4: model.add(layers.GlobalAveragePooling2D())
5: model.add(tf.keras.layers.Dropout(0.8))
6: model.add(layers.Dense(128, activation="relu"))
7: model.add(layers.Dense(1))
Lines 1-6 are all the same. It is important to note that the output of line 6 passes through the non-linear relu activation, and so there is the capability to learn non-linearities. Line 7, without an activation function will be a linear combination of Line 6, with a full ability to generate gradients in the positive and negative output region. When backprop is applied to learn the target values of 1 to 5, if the network outputs -17, it can learn to output a larger number. Yeah!
If you'd like to have 2 layers of nonlinearity, I'd suggest the following
1: model = keras.Sequential()
2: model.add(MobileNetV2(input_shape=(IMG_SIZE, IMG_SIZE, 3), include_top=False))
3: model.layers[0].trainable = False
4: model.add(layers.GlobalAveragePooling2D())
5: model.add(layers.Dense(128, activation="tanh"))
6: model.add(layers.Dense(64, activation="tanh"))
7: model.add(layers.Dense(1))
Ditch the dropout unless you have actual proof that it helps in this very specific network (and right now I suspect you don't). Try tanh as your hidden layer activation function. It has some nice features, like being positive and negative, gradient even with large and/or negative numbers, and acts somewhat to automatically regularize weights. But, importantly, the last output either has no activation function.

Good training accuracy but bad evaluation

I trained a DNN model, get good training accuracy but bad evaluation accuracy.
def DNN_Metrix(shape, dropout):
model = tf.keras.Sequential()
print(shape)
model.add(tf.keras.layers.Flatten(input_shape=shape))
model.add(tf.keras.layers.Dense(10,activation=tf.nn.relu))
for i in range(0,2):
model.add(tf.keras.layers.Dense(10,activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(8,activation=tf.nn.tanh))
model.add(tf.keras.layers.Dense(1, activation=tf.nn.sigmoid))
model.compile(loss='binary_crossentropy',
optimizer=tf.keras.optimizers.Adam(),
metrics=['accuracy'])
return model
model_dnn = DNN_Metrix(shape=(28,20,1), dropout=0.1)
model_dnn.fit(
train_dataset,
steps_per_epoch=1000,
epochs=10,
verbose=2
)
Here is my training process, and result:
Epoch 10/10
- 55s - loss: 0.4763 - acc: 0.7807
But when I evaluation with test dataset, I got:
result = model_dnn.evaluate(np.array(X_test), np.array(y_test), batch_size=len(X_test))
loss, accuracy = [0.9485417604446411, 0.3649936616420746]
it's a binary classification, Positive label : Negetive label is about
0.37 : 0.63
I don't think it was result from overfiting, I have 700k instances when training, with shape of 28 * 20, and my DNN model is simple and have few parameters.
Here is my code when generating the test data and training data:
def parse_function(example_proto):
dics = {
'feature': tf.FixedLenFeature(shape=(), dtype=tf.string, default_value=None),
'label': tf.FixedLenFeature(shape=(2), dtype=tf.float32),
'shape': tf.FixedLenFeature(shape=(2), dtype=tf.int64)
}
parsed_example = tf.parse_single_example(example_proto, dics)
parsed_example['feature'] = tf.decode_raw(parsed_example['feature'], tf.float64)
parsed_example['feature'] = tf.reshape(parsed_example['feature'], [28,20,1])
label_t = tf.cast(parsed_example['label'], tf.int32)
parsed_example['label'] = parsed_example['label'][1]
return parsed_example['feature'], parsed_example['label']
def read_tfrecord(train_tfrecord):
dataset = tf.data.TFRecordDataset(train_tfrecord)
dataset = dataset.map(parse_function)
dataset = dataset.shuffle(buffer_size=10000)
dataset = dataset.repeat(100)
dataset = dataset.batch(670)
return dataset
def read_tfrecord_test(test_tfrecord):
dataset = tf.data.TFRecordDataset(test_tfrecord)
dataset = dataset.map(parse_function)
return dataset
# tf_record_target = 'train_csv_temp_norm_vx.tfrecords'
train_files = 'train_baseline.tfrecords'
test_files = 'test_baseline.tfrecords'
train_dataset = read_tfrecord(train_files)
test_dataset = read_tfrecord_test(test_files)
it_test_dts = test_dataset.make_one_shot_iterator()
it_train_dts = train_dataset.make_one_shot_iterator()
X_test = []
y_test = []
el = it_test_dts.get_next()
count = 1
with tf.Session() as sess:
while True:
try:
x_t, y_t = sess.run(el)
X_test.append(x_t)
y_test.append(y_t)
except tf.errors.OutOfRangeError:
break
Judging from the fact that your data distribution in your test set is [37%-63%] and your final accuracy is 0.365, I would first check the labels predicted on the test set.
Most probably, all your predictions are of class 0, provided that class 0 amounts for 37% of your dataset. In this case, it means that your neural network is not able to learn anything on the training set, and you have a massive scenario of overfitting.
I recommend that you always use a validation set, so that at the end of each epoch, you would check to see if your neural network has learnt anything. In such a situation(like yours), you would see very fast the overfitting issue.
Training accuracy doesn't mean much. A NN can fit any random set of inputs and outputs, even if they're unrelated. That's why you want to use validation data.
After training look at your loss curves, this will give you a better idea of where things are going wrong.
NN's default to just guessing the most popular class it's seen in training data for classification problems. This is usually what happens when you haven't setup your experiment correctly.
And since your dealing with binary classification you might want to look at things like StratifiedKFold which will provided you folds of train/test data were the sample % is persevered.

Translating Pytorch program into Keras: different results

I have translated a pytorch program into keras.
A working Pytorch program:
import numpy as np
import cv2
import torch
import torch.nn as nn
from skimage import segmentation
np.random.seed(1)
torch.manual_seed(1)
fi = "in.jpg"
class MyNet(nn.Module):
def __init__(self, n_inChannel, n_outChannel):
super(MyNet, self).__init__()
self.seq = nn.Sequential(
nn.Conv2d(n_inChannel, n_outChannel, kernel_size=3, stride=1, padding=1),
nn.ReLU(inplace=True),
nn.BatchNorm2d(n_outChannel),
nn.Conv2d(n_outChannel, n_outChannel, kernel_size=3, stride=1, padding=1),
nn.ReLU(inplace=True),
nn.BatchNorm2d(n_outChannel),
nn.Conv2d(n_outChannel, n_outChannel, kernel_size=1, stride=1, padding=0),
nn.BatchNorm2d(n_outChannel)
)
def forward(self, x):
return self.seq(x)
im = cv2.imread(fi)
data = torch.from_numpy(np.array([im.transpose((2, 0, 1)).astype('float32')/255.]))
data = data.cuda()
labels = segmentation.slic(im, compactness=100, n_segments=10000)
labels = labels.flatten()
u_labels = np.unique(labels)
label_indexes = np.array([np.where(labels == u_label)[0] for u_label in u_labels])
n_inChannel = 3
n_outChannel = 100
model = MyNet(n_inChannel, n_outChannel)
model.cuda()
model.train()
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
label_colours = np.random.randint(255,size=(100,3))
for batch_idx in range(100):
optimizer.zero_grad()
output = model( data )[ 0 ]
output = output.permute( 1, 2, 0 ).view(-1, n_outChannel)
ignore, target = torch.max( output, 1 )
im_target = target.data.cpu().numpy()
nLabels = len(np.unique(im_target))
im_target_rgb = np.array([label_colours[ c % 100 ] for c in im_target]) # correct position of "im_target"
im_target_rgb = im_target_rgb.reshape( im.shape ).astype( np.uint8 )
for inds in label_indexes:
u_labels_, hist = np.unique(im_target[inds], return_counts=True)
im_target[inds] = u_labels_[np.argmax(hist, 0)]
target = torch.from_numpy(im_target)
target = target.cuda()
loss = loss_fn(output, target)
loss.backward()
optimizer.step()
print (batch_idx, '/', 100, ':', nLabels, loss.item())
if nLabels <= 3:
break
fo = "out.jpg"
cv2.imwrite(fo, im_target_rgb)
(source: https://github.com/kanezaki/pytorch-unsupervised-segmentation/blob/master/demo.py)
My translation into Keras:
import cv2
import numpy as np
from skimage import segmentation
from keras.layers import Conv2D, BatchNormalization, Input, Reshape
from keras.models import Model
import keras.backend as k
from keras.optimizers import SGD, Adam
from skimage.util import img_as_float
from skimage import io
from keras.models import Sequential
np.random.seed(0)
fi = "in.jpg"
im = cv2.imread(fi).astype(float)/255.
labels = segmentation.slic(im, compactness=100, n_segments=10000)
labels = labels.flatten()
print (labels.shape)
u_labels = np.unique(labels)
label_indexes = [np.where(labels == u_label)[0] for u_label in np.unique(labels)]
n_channels = 100
model = Sequential()
model.add ( Conv2D(n_channels, kernel_size=3, activation='relu', input_shape=im.shape, padding='same'))
model.add( BatchNormalization())
model.add( Conv2D(n_channels, kernel_size=3, activation='relu', padding='same'))
model.add( BatchNormalization())
model.add( Conv2D(n_channels, kernel_size=1, padding='same'))
model.add( BatchNormalization())
model.add( Reshape((im.shape[0] * im.shape[1], n_channels)))
img = np.expand_dims(im,0)
print (img.shape)
output = model.predict(img)
print (output.shape)
im_target = np.argmax(output[0], 1)
print (im_target.shape)
for inds in label_indexes:
u_labels_, hist = np.unique(im_target[inds], return_counts=True)
im_target[inds] = u_labels_[np.argmax(hist, 0)]
def custom_loss(loss_target, loss_output):
return k.categorical_crossentropy(target=k.stack(loss_target), output=k.stack(loss_output), from_logits=True)
model.compile(optimizer=SGD(lr=0.1, momentum=0.9), loss=custom_loss)
model.fit(img, output, epochs=100, batch_size=1, verbose=1)
pred_result = model.predict(x=[img])[0]
print (pred_result.shape)
target = np.argmax(pred_result, 1)
print (target.shape)
nLabels = len(np.unique(target))
label_colours = np.random.randint(255, size=(100, 3))
im_target_rgb = np.array([label_colours[c % 100] for c in im_target])
im_target_rgb = im_target_rgb.reshape(im.shape).astype(np.uint8)
cv2.imwrite("out.jpg", im_target_rgb)
However, Keras output is really different than of pytorch
Input image:
Pytorch result:
Keras result:
Could someone help me for this translation?
Edit 1:
I corrected two errors as advised by #sebrockm
1. removed `relu` from last conv layer
2. added `from_logits = True` in the loss function
Also, changed the no. of conv layers from 4 to 3 to match with the original code.
However, output image did not improve than before and the `loss` are resulted in negative:
Epoch 99/100
1/1 [==============================] - 0s 92ms/step - loss: -22.8380
Epoch 100/100
1/1 [==============================] - 0s 99ms/step - loss: -23.039
I think that the Keras code lacks connection between model and output. However, could not figure out to make this connection.
Two major mistakes that I see (likely related):
The last convolutional layer in the original model does not have an activation function, while your translation uses relu.
The original model uses CrossEntropyLoss as loss function, while your model uses categorical_crossentropy with logits=False (a default argument). Without mathematical background the difference is tricky to explain, but in short: CrossEntropyLoss has a softmax built in, that's why the model doesn't have one on the last layer. To do the same in keras, use k.categorical_crossentropy(..., logits=True). "logits" means the input values are expected not to be "softmaxed", i.e. all values can be arbitrary. Currently, your loss function expects the output values to be "softmaxed", i.e. all values must be between 0 and 1 (and sum up to 1).
Update:
One other mistake, likely a huge one: In Keras, you calculate the output once in the beginning and never change it from there on. Then you train your model to fit on this initially generated output.
In the original pytorch code, target (which is the variable being trained on) gets updated in every training loop.
So, you cannot use Keras' fit method which is designed for doing the entire training for you (given fixed training data). You will have to replicate the training loop manually, just as it is done in the pytorch code. I'm not sure if this is easily doable with the API Keras provides. train_on_batch is one method you surely will need in your manual loop. You will have to do some more work, I'm afraid...