Selecting Metrics in Keras CNN - tensorflow

I am trying to use a CNN to classify cats vs. dogs and noticed something strange.
When I define the model's compile statement as below:
cat_dog_model.compile(optimizer =optimizers.Adam(),
metrics= [metrics.Accuracy()], loss=losses.binary_crossentropy)
my accuracy is very bad, something like 0.15% after 25 epochs.
When I define the same thing as
cnn.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
my accuracy shoots up to 55% in the first epoch and almost 80% by epoch 25.
When I read the Keras docs, they mention explicitly that
You can either instantiate an optimizer before passing it to model.compile(), as in the above example, or you can pass it by its string identifier. In the latter case, the default parameters for the optimizer will be used.
The metrics parameter is also used as per the API (Keras Metrics API).
So, as per my understanding, I am using default parameters in both cases. Also, when I change only the metrics parameter to the hard-coded string, I get the same accuracy as with the second statement. So somehow the accuracy metric is causing this issue, but I can't figure out why. Any help is appreciated.
My question is: why is hard-coding the metric as a string better than passing it as a metric object?
Some more details: I am using 8k images for training and about 2k images for validation.
Sample code (you can change line 32 to get different results):
from keras import models, layers, losses, metrics, optimizers
import numpy as np
import pandas as pd
from keras.preprocessing.image import ImageDataGenerator, load_img,img_to_array
train_datagen = ImageDataGenerator(rescale = 1./255,shear_range = 0.2,zoom_range = 0.2,horizontal_flip = True)
train_set = train_datagen.flow_from_directory('/content/drive/MyDrive/....../training_set/',
target_size = (64, 64),batch_size = 32,class_mode = 'binary')
test_datagen = ImageDataGenerator(rescale = 1./255)
test_set = test_datagen.flow_from_directory(
target_size = (64, 64),batch_size = 32,class_mode = 'binary')
cat_dog_model = models.Sequential()
cat_dog_model.add(layers.Conv2D(filters=32, kernel_size=3, activation='relu', input_shape=[64, 64, 3]))
cat_dog_model.add(layers.MaxPool2D(pool_size=2, strides=2))
cat_dog_model.add(layers.Conv2D(filters=32, kernel_size=3, activation='relu'))
cat_dog_model.add(layers.MaxPool2D(pool_size=2, strides=2))
cat_dog_model.add(layers.Flatten())
cat_dog_model.add(layers.Dense(units=128, activation='relu'))
cat_dog_model.add(layers.Dense(units=1, activation='sigmoid'))
cat_dog_model.compile(optimizer =optimizers.Adam(), metrics= [metrics.Accuracy()], loss=losses.binary_crossentropy)
cat_dog_model.summary()
cat_dog_model.fit(x=train_set, validation_data=test_set, epochs=25)
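A likely explanation is that the two metrics compute different things. metrics.Accuracy() counts how often the prediction exactly equals the label, whereas the string 'accuracy' is resolved by Keras to a variant that fits the model; for a single sigmoid output trained with binary_crossentropy that is binary accuracy, which first thresholds the predicted probability at 0.5. A sigmoid output almost never equals 0.0 or 1.0 exactly, so exact matching reports a near-zero accuracy even though training itself is the same (metrics do not feed into the gradients). The pure-Python sketch below mimics both computations on made-up values; it is an illustration, not the Keras implementation. In the model above, metrics=[metrics.BinaryAccuracy()] should report the same numbers as the string version.

```python
# Made-up labels and sigmoid outputs for five images (1.0 = dog, 0.0 = cat, say).
y_true = [1.0, 0.0, 1.0, 0.0, 1.0]
# The last value stands in for a saturated sigmoid that rounded to exactly 1.0.
y_prob = [0.91, 0.12, 0.67, 0.45, 1.0]


def exact_match_accuracy(y_true, y_pred):
    """Same idea as keras.metrics.Accuracy: a hit only when y_pred == y_true."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)


def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Same idea as keras.metrics.BinaryAccuracy: threshold y_pred first."""
    hits = sum(1 for t, p in zip(y_true, y_pred)
               if t == (1.0 if p > threshold else 0.0))
    return hits / len(y_true)


print(exact_match_accuracy(y_true, y_prob))  # 0.2
print(binary_accuracy(y_true, y_prob))       # 1.0
```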


Great validation accuracy but terrible prediction

I have got the following CNN:
import os
import numpy as np
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.models import Sequential
from keras.utils import to_categorical
from sklearn.preprocessing import LabelEncoder
from tqdm import tqdm
# Load the data
data_dir = PATH_DIR
x_train = []
y_train = []
total_files = 0
for subdir in os.listdir(data_dir):
    subdir_path = os.path.join(data_dir, subdir)
    if os.path.isdir(subdir_path):
        total_files += len([f for f in os.listdir(subdir_path) if f.endswith('.npy')])
with tqdm(total=total_files, unit='file') as pbar:
    for subdir in os.listdir(data_dir):
        subdir_path = os.path.join(data_dir, subdir)
        if os.path.isdir(subdir_path):
            for image_file in os.listdir(subdir_path):
                if image_file.endswith('.npy'):
                    image_path = os.path.join(subdir_path, image_file)
                    image = np.load(image_path)
                    x_train.append(image)
                    # assumed: the class label is the sub-directory name
                    y_train.append(subdir)
                    pbar.update(1)
x_train = np.array(x_train)
y_train = np.array(y_train)
# Preprocess the labels
label_encoder = LabelEncoder()
y_train = label_encoder.fit_transform(y_train)
y_train = to_categorical(y_train)
# Create the model
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(57, 57, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(8, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10)
model.save('GeneratedModels/units_model_np.h5')
I then call the following function in a loop, about 15 times a second, where image is a NumPy array:
import cv2
import numpy as np
import tensorflow as tf

def guess_unit(image, classList):
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    model = tf.keras.models.load_model(MODEL_PATH)
    image = np.expand_dims(image, axis=0)
    prediction = model.predict(image, verbose=0)
    index = np.argmax(prediction)
    # Return the predicted unit
    return classList[index]
The problem is that when I train the model, the accuracy is very high (99.99976%), but when I use predict, the output is terribly wrong, to the point that it does not make any sense. Sometimes the image received is the same, but predict returns two different things.
I have no idea what I am doing wrong. It's the first time I am tinkering with neural networks.
I have tried using model.predict on the images the model was trained on, and it always gets them right. It is only when it receives dynamic images that it's terribly wrong.
NOTE: I have 8 classes and it was trained using about 13000 images.
Generally, to get a realistic estimate of how your model performs on data it hasn't seen, you have to split your data into training, validation and test sets (which I see you haven't done). This can be done manually or by passing validation_split to your fit function.
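A manual split is worth considering here because Keras's validation_split takes the last fraction of the arrays, before any shuffling; with the loading loop above, which reads one class directory after another, that slice would likely contain only the last class or two. A minimal NumPy sketch of a shuffled three-way split, assuming x and y are aligned arrays like x_train and y_train above (the helper name shuffled_split and the 15%/15% fractions are illustrative choices, not from the original):

```python
import numpy as np


def shuffled_split(x, y, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle x and y together, then cut them into train/validation/test parts.

    Hypothetical helper for illustration; x and y must have the same length.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))          # one shared permutation keeps pairs aligned
    n_test = int(len(x) * test_frac)
    n_val = int(len(x) * val_frac)
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]
    return ((x[train_idx], y[train_idx]),
            (x[val_idx], y[val_idx]),
            (x[test_idx], y[test_idx]))


# Usage with the arrays from the loading loop:
# (x_tr, y_tr), (x_val, y_val), (x_te, y_te) = shuffled_split(x_train, y_train)
# model.fit(x_tr, y_tr, validation_data=(x_val, y_val), epochs=10)
```

The test split is then only touched once, at the end, to check how the model does on images it has never seen.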
Without seeing how your loss and accuracy curves behave, it's difficult to make any suggestions. However, it might be the case that you are underfitting or overfitting your data (I would assume that you are facing overfitting in your case). If you are overfitting, I would suggest adding some regularization or changing your model architecture, as the one used might not be appropriate. Options would be to add regularization via Dropout or to add regularization to your weights.
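To make the Dropout suggestion concrete: during training, a Keras Dropout(rate) layer zeroes each input independently with probability rate and scales the surviving inputs by 1 / (1 - rate), so the expected activation is unchanged; at inference time it passes inputs through untouched. The NumPy sketch below re-implements that rule for illustration only (it is not the Keras source, and inverted_dropout is a made-up name):

```python
import numpy as np


def inverted_dropout(x, rate, training, rng):
    """Illustrative re-implementation of the rule Keras's Dropout layer follows."""
    if not training or rate == 0.0:
        return x                                  # inference: identity
    keep = rng.random(x.shape) >= rate            # each unit survives with prob 1 - rate
    return np.where(keep, x / (1.0 - rate), 0.0)  # rescale the survivors


rng = np.random.default_rng(0)
activations = np.ones(100_000)
dropped = inverted_dropout(activations, rate=0.5, training=True, rng=rng)
print((dropped == 0).mean())   # close to 0.5: about half the units are zeroed
print(dropped.mean())          # close to 1.0: the expected value is preserved
```

Because the network only sees a random subset of units on each step, it cannot rely on any single feature, which is what makes Dropout act as a regularizer.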

Deep Learning model (LSTM) predicts same class label

I am trying to solve the Spoken Digit Recognition task using an LSTM model: the audio files are converted into spectrograms and fed into the LSTM, whose output then goes through Global Average Pooling. Here is its architecture:
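from tensorflow.keras.layers import Input, LSTM, GlobalAveragePooling1D, Dense, Dropout
from tensorflow.keras.models import Model
from tensorflow.keras.regularizers import l2
# input layer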
input_= Input(shape = (64, 35))
lstm = LSTM(100, activation='tanh', return_sequences= True, kernel_regularizer = l2(0.000001),
recurrent_initializer = 'glorot_uniform')(input_)
lstm = GlobalAveragePooling1D(data_format='channels_first')(lstm)
dense = Dense(20, activation='relu', kernel_regularizer = l2(0.000001), kernel_initializer='glorot_uniform')(lstm)
drop = Dropout(0.8)(dense)
dense1 = Dense(25, activation='relu', kernel_regularizer = l2(0.000001), kernel_initializer= 'he_uniform')(drop)
drop = Dropout(0.95)(dense1)
output = Dense(10,activation = 'softmax', kernel_regularizer = l2(0.000001), kernel_initializer= 'glorot_uniform')(drop)
model_2 = Model(inputs = [input_], outputs = output)
Model summary: [image not included]
I need to calculate the F1 score to check the performance of the model. I have implemented a custom callback and also used the TensorFlow Addons F1 score. However, I don't get the correct result: for every epoch I get the same constant F1 score.
On further digging, I found out that my model predicts the same class label for the entire epoch, whereas it is supposed to predict all 10 classes, since there are 10 class labels present.
Here are my model.compile and model.fit calls. I have used TensorFlow Addons here:
from tensorflow import keras
opt = keras.optimizers.Adam(0.001, clipnorm=0.8)
model_2.compile(loss='categorical_crossentropy', optimizer=opt, metrics = metric)  # metric: the TensorFlow Addons F1 score, defined elsewhere
hist = model_2.fit([X_train_spectrogram], [y_train_converted],
validation_data= ([X_test_spectrogram], [y_test_converted]),
epochs = 10,
verbose =1,
callbacks=[tensorBoard_callbk2, ClearMemory()],
# steps_per_epoch = 3,
)
Here is what I mean by getting the same prediction: the entire array is filled with the same predicted value.
Why is the model predicting the same class label, and how can I rectify it?
I have tried increasing the number of trainable parameters and both increasing and decreasing the batch size, but it doesn't help. If anyone knows, can you please help me out?
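For intuition about why the F1 score stays constant: if every sample is assigned the same class, the per-class counts, and therefore the F1 score, are fixed by the label distribution alone, whatever the weights are. The pure-Python macro-F1 below (an illustrative helper, not the TensorFlow Addons implementation) shows this for 10 balanced classes; counting the distinct argmax values of model.predict is a quick way to confirm the same collapse in the real model.

```python
from collections import Counter


def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    scores = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / num_classes


y_true = list(range(10)) * 2            # 20 samples, two per digit
collapsed = [3] * len(y_true)           # the model outputs class 3 for everything
print(Counter(collapsed))               # Counter({3: 20}): only one class predicted
print(macro_f1(y_true, collapsed, 10))  # about 0.018 (= 1/55), whatever the weights
print(macro_f1(y_true, y_true, 10))     # 1.0 for perfect predictions
```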

Why am I getting an error in transfer learning?

I am training a model for Optical Character Recognition of the Gujarati language. The input image is a character image. I have 37 classes. There are 22200 training images in total (600 per class) and 5920 testing images (160 per class). My input images are 32x32.
Below is my code:
model = tf.keras.applications.DenseNet121(include_top=False, weights='imagenet', pooling='max')
base_inputs = model.layers[0].input
base_outputs = model.layers[-1].output # NOTICE -1 not -2
prefinal_outputs = layers.Dense(1024)(base_outputs)
final_outputs = layers.Dense(37)(prefinal_outputs)
new_model = keras.Model(inputs=base_inputs, outputs=base_outputs)
from tensorflow.keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(
test_datagen = ImageDataGenerator(horizontal_flip = False)
training_set = train_datagen.flow_from_directory('C:/Users/shweta/Desktop/characters/train',
target_size = (32, 32),
batch_size = 64,
class_mode = 'categorical')
test_set = test_datagen.flow_from_directory('C:/Users/shweta/Desktop/characters/test',
target_size = (32, 32),
batch_size = 64,
class_mode = 'categorical')
new_model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])
new_model.fit_generator(training_set,
epochs = 25,
validation_data = test_set, shuffle=True)
new_model.save('alphanumeric.mod')
I am getting following output:
Thanks in advance!
First of all, very well written code.
These are some of the things I noticed while going through the code and the tf.keras docs.
What kind of labels do you have? categorical_crossentropy expects one-hot encoded labels (check this). So, if your labels are integers, use sparse_categorical_crossentropy.
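As a small illustration of the difference: the two losses compute the same number and only take the label in different forms (a one-hot vector versus an integer class index). Note that flow_from_directory with class_mode='categorical', as used above, already yields one-hot labels. A pure-Python sketch of the per-sample formulas with made-up probabilities (the helper names are illustrative, not Keras functions):

```python
import math


def categorical_crossentropy(one_hot, probs):
    """Per-sample loss when the label is a one-hot vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))


def sparse_categorical_crossentropy(label, probs):
    """Per-sample loss when the label is an integer class index."""
    return -math.log(probs[label])


def to_one_hot(label, num_classes):
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


probs = [0.1, 0.7, 0.2]   # made-up softmax output for a 3-class problem
label = 1                 # the true class as an integer

cce = categorical_crossentropy(to_one_hot(label, 3), probs)
scce = sparse_categorical_crossentropy(label, probs)
print(cce, scce)          # both equal -log(0.7), about 0.357
```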
Similar issue
There was a post where someone was trying to classify into 2 classes and used categorical instead of binary crossentropy, if you want to look at it.
Let me know how it goes!
PS: @gerry made a very good point: if the labels are one-hot encoded, use categorical_crossentropy!
The code should be:
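import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.optimizers import Adam
model = tf.keras.applications.DenseNet121(include_top=False, weights='imagenet', pooling='max', input_shape=(32,32,3))
base_outputs = model.layers[-1].output
prefinal_outputs = layers.Dense(1024)(base_outputs)
# softmax so the outputs are probabilities, as categorical_crossentropy expects by default
final_outputs = layers.Dense(37, activation='softmax')(prefinal_outputs)
new_model = keras.Model(inputs=model.input, outputs=final_outputs)
new_model.compile(Adam(), loss='categorical_crossentropy', metrics=['accuracy'])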
Also, you should use model.fit in the future: model.fit can now work with generators, and model.fit_generator will be deprecated in future versions of TensorFlow. I ran this against your dataset and got accurate results in about 10 epochs. Here is some additional advice. It is best to use an adjustable learning rate, and the Keras callback ReduceLROnPlateau makes this easy to do (see its documentation). Set it to monitor the validation loss. My use is shown below.
lr_adjust=tf.keras.callbacks.ReduceLROnPlateau( monitor="val_loss", factor=0.5, patience=1, verbose=1, mode="auto",
min_delta=0.00001, cooldown=0, min_lr=0)
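To see what factor=0.5, patience=1 and min_delta do together, here is a simplified pure-Python restatement of the plateau rule, run on made-up validation losses. It is an approximation written for illustration (mode='min' only, cooldown ignored), not the Keras source:

```python
def reduce_lr_on_plateau(val_losses, lr, factor=0.5, patience=1,
                         min_delta=0.00001, min_lr=0.0):
    """Simplified re-statement of the ReduceLROnPlateau rule (mode='min',
    cooldown=0); returns the learning rate in effect after each epoch."""
    best = float("inf")
    wait = 0
    history = []
    for loss in val_losses:
        if loss < best - min_delta:       # counts as an improvement
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience and lr > min_lr:
                lr = max(lr * factor, min_lr)  # cut the learning rate
                wait = 0
        history.append(lr)
    return history


# Made-up validation losses over six epochs.
print(reduce_lr_on_plateau([1.0, 0.8, 0.800005, 0.85, 0.7, 0.71], lr=0.001))
# -> [0.001, 0.001, 0.0005, 0.00025, 0.00025, 0.000125]
```

With patience=1, every epoch that fails to beat the best loss by more than min_delta halves the learning rate, which is fairly aggressive; a larger patience gives the optimizer more time before cutting.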
I also recommend using the ModelCheckpoint callback (see its documentation). Set it up to monitor the validation loss, and it will save the weights that achieved the lowest validation loss. My implementation is shown below.
save_loc=r'c:\Temp' # set this to the path where you want to save the weights
checkpoint=tf.keras.callbacks.ModelCheckpoint(filepath=save_loc, monitor='val_loss', verbose=1, save_best_only=True,
save_weights_only=True, mode='auto', save_freq='epoch', options=None)
callbacks=[checkpoint, lr_adjust]
In model.fit, include callbacks=callbacks. When training is completed, load these saved weights into the model and then save the model. You can use the saved model to make predictions.

Writing the following CNN in TensorFlow

I am new to deep learning. I have learned the basics through reading and am now trying to implement a real network to see how (and whether) it really works. I chose TensorFlow in DIGITS and the following network because the authors give the exact architecture along with the training material: Steganalysis with DL
I wrote the following code for the architecture in Steganalysis with DL by looking at existing networks in DIGITS and at the TensorFlow documentation.
from model import Tower
from utils import model_property
import tensorflow as tf
import tensorflow.contrib.slim as slim
import utils as digits
class UserModel(Tower):
def inference(self):
x = tf.reshape(self.x, shape=[-1, self.input_shape[0], self.input_shape[1], self.input_shape[2]])
with slim.arg_scope([slim.conv2d, slim.fully_connected],
conv1 = tf.layers.conv2d(inputs=x, filters=64, kernel_size=7, padding='same', strides=2, activation=tf.nn.relu)
rnorm1 = tf.nn.local_response_normalization(input=conv1)
conv2 = tf.layers.conv2d(inputs=rnorm1, filters=16, kernel_size=5, padding='same', strides=1, activation=tf.nn.relu)
rnorm2 = tf.nn.local_response_normalization(input=conv2)
flatten = tf.contrib.layers.flatten(rnorm2)
fc1 = tf.contrib.layers.fully_connected(inputs=flatten, num_outputs=1000, activation_fn=tf.nn.relu)
fc2 = tf.contrib.layers.fully_connected(inputs=fc1, num_outputs=1000, activation_fn=tf.nn.relu)
fc3 = tf.contrib.layers.fully_connected(inputs=fc2, num_outputs=2)
sm = tf.nn.softmax(fc3)
return fc3
def loss(self):
model = self.inference
loss = digits.classification_loss(model, self.y)
accuracy = digits.classification_accuracy(model, self.y)
self.summaries.append(tf.summary.scalar(, accuracy))
return loss
I tried running it, but the accuracy is pretty low. Could someone tell me whether I've done it completely wrong, what's wrong with it, and how to code it properly?
UPDATE: Thank you, Nessuno! With the fix you mentioned, I came up with this code:
from model import Tower
from utils import model_property
import tensorflow as tf
import tensorflow.contrib.slim as slim
import utils as digits
class UserModel(Tower):
def inference(self):
x = tf.reshape(self.x, shape=[-1, self.input_shape[0], self.input_shape[1], self.input_shape[2]])
with slim.arg_scope([slim.conv2d, slim.fully_connected],
conv1 = tf.layers.conv2d(inputs=x, filters=64, kernel_size=7, padding='Valid', strides=2, activation=tf.nn.relu)
rnorm1 = tf.nn.local_response_normalization(input=conv1)
conv2 = tf.layers.conv2d(inputs=rnorm1, filters=16, kernel_size=5, padding='Valid', strides=1, activation=tf.nn.relu)
rnorm2 = tf.nn.local_response_normalization(input=conv2)
flatten = tf.contrib.layers.flatten(rnorm2)
fc1 = tf.contrib.layers.fully_connected(inputs=flatten, num_outputs=1000, activation_fn=tf.nn.relu)
fc2 = tf.contrib.layers.fully_connected(inputs=fc1, num_outputs=1000, activation_fn=tf.nn.relu)
fc3 = tf.contrib.layers.fully_connected(inputs=fc2, num_outputs=2, activation_fn=None)
return fc3
def loss(self):
model = self.inference
loss = digits.classification_loss(model, self.y)
accuracy = digits.classification_accuracy(model, self.y)
self.summaries.append(tf.summary.scalar(, accuracy))
return loss
The solver type is SGD and the learning rate is 0.001. I am shuffling the training data. I have increased the training data to 6000 images (3000 per category, 20% of which is reserved for validation). I downloaded the training data from this link. But I am only getting the following graph, which I think shows overfitting. Do you have any suggestions to improve the validation accuracy?
In NVIDIA DIGITS, classification_loss, exactly like TensorFlow's tf.nn.softmax_cross_entropy_with_logits, expects as input a linear layer of neurons (unscaled logits).
Instead, you're passing sm = tf.nn.softmax(fc3) as input, hence you're applying the softmax operation twice, and this is the reason for your low accuracy.
In order to solve this issue, just change the model output layer to
fc3 = slim.fully_connected(fc2, 2, activation_fn=None, scope='fc3')
return fc3
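To see why a second softmax hurts, apply softmax to some logits and then again to the resulting probabilities: because probabilities lie in [0, 1], their differences are at most 1, so with two classes the second softmax can never output more than e / (1 + e), about 0.73, however confident the network is. A pure-Python sketch with made-up logits:

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, -1.0]       # what fc3 (activation_fn=None) might output
once = softmax(logits)     # what the loss is meant to compute internally
twice = softmax(once)      # what happens when sm = tf.nn.softmax(fc3) is fed in

print(once)                    # roughly [0.953, 0.047]
print(twice)                   # roughly [0.712, 0.288]: the prediction is flattened
print(math.e / (1 + math.e))   # upper bound for any 2-class double softmax, ~0.731
```

The flattened probabilities give the loss much weaker gradients to work with, which fits the low accuracy described in the question.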

ValueError: Error when checking target: expected activation_6 to have shape(None,2) but got array with shape (5760,1)

I am trying to adapt Python code for a Convolutional Neural Network (in Keras) with 8 classes to work on 2 classes. My problem is that I get the following error message:
ValueError: Error when checking target: expected activation_6 to have
shape(None,2) but got array with shape (5760,1).
My model is as follows:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Activation, Flatten, Dense
from keras import backend as K

class MiniVGGNet:
    @staticmethod
    def build(width, height, depth, classes):
        # initialize the model along with the input shape to be
        # "channels last" and the channels dimension itself
        model = Sequential()
        inputShape = (height, width, depth)
        chanDim = -1
        # if we are using "channels first", update the input shape
        # and channels dimension
        if K.image_data_format() == "channels_first":
            inputShape = (depth, height, width)
            chanDim = 1
        # first CONV => RELU => CONV => RELU => POOL layer set
        model.add(Conv2D(32, (3, 3), padding="same",
            input_shape=inputShape))
        model.add(Activation("relu"))
        model.add(Conv2D(32, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        # second CONV => RELU => CONV => RELU => POOL layer set
        model.add(Conv2D(64, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(Conv2D(64, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        # first (and only) set of FC => RELU layers
        # softmax classifier
        model.add(Flatten())
        model.add(Dense(classes))
        model.add(Activation("softmax"))
        # return the constructed network architecture
        return model
where classes = 2 and inputShape = (32, 32, 3).
I know that my error has something to do with my classes/use of binary_crossentropy and occurs at the model.fit call in the training script below, but I haven't been able to figure out why it is problematic or how to fix it.
By changing model.add(Dense(classes)) above to model.add(Dense(classes-1)), I can get the model to train, but then my label size and target_names are mismatched, and everything is classified into a single category.
# import the necessary packages
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from pyimagesearch.nn.conv import MiniVGGNet
from pyimagesearch.preprocessing import ImageToArrayPreprocessor
from pyimagesearch.preprocessing import SimplePreprocessor
from pyimagesearch.datasets import SimpleDatasetLoader
from keras.optimizers import SGD
#from keras.datasets import cifar10
from imutils import paths
import matplotlib.pyplot as plt
import numpy as np
import argparse
# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True,
help="path to input dataset")
ap.add_argument("-o", "--output", required=True,
help="path to the output loss/accuracy plot")
args = vars(ap.parse_args())
# grab the list of images that we'll be describing
print("[INFO] loading images...")
imagePaths = list(paths.list_images(args["dataset"]))
# initialize the image preprocessors
sp = SimplePreprocessor(32, 32)
iap = ImageToArrayPreprocessor()
# load the dataset from disk then scale the raw pixel intensities
# to the range [0, 1]
sdl = SimpleDatasetLoader(preprocessors=[sp, iap])
(data, labels) = sdl.load(imagePaths, verbose=500)
data = data.astype("float") / 255.0
# partition the data into training and testing splits using 75% of
# the data for training and the remaining 25% for testing
(trainX, testX, trainY, testY) = train_test_split(data, labels,
test_size=0.25, random_state=42)
# convert the labels from integers to vectors
trainY = LabelBinarizer().fit_transform(trainY)
testY = LabelBinarizer().fit_transform(testY)
# initialize the label names for the items dataset
labelNames = ["mint", "used"]
# initialize the optimizer and model
print("[INFO] compiling model...")
opt = SGD(lr=0.01, decay=0.01 / 10, momentum=0.9, nesterov=True)
model = MiniVGGNet.build(width=32, height=32, depth=3, classes=2)
model.compile(loss="binary_crossentropy", optimizer=opt,
metrics=["accuracy"])
# train the network
print("[INFO] training network...")
H = model.fit(trainX, trainY, validation_data=(testX, testY),
batch_size=64, epochs=10, verbose=1)
print ("Made it past training")
# evaluate the network
print("[INFO] evaluating network...")
predictions = model.predict(testX, batch_size=64)
print(classification_report(testY.argmax(axis=1),
predictions.argmax(axis=1), target_names=labelNames))
# plot the training loss and accuracy
plt.style.use("ggplot")
plt.figure()
plt.plot(np.arange(0, 10), H.history["loss"], label="train_loss")
plt.plot(np.arange(0, 10), H.history["val_loss"], label="val_loss")
plt.plot(np.arange(0, 10), H.history["acc"], label="train_acc")
plt.plot(np.arange(0, 10), H.history["val_acc"], label="val_acc")
plt.title("Training Loss and Accuracy on items dataset")
plt.xlabel("Epoch #")
plt.legend()
plt.savefig(args["output"])
I have looked at these questions already, but cannot work out how to get around this problem based on the responses.
Stackoverflow Question 1
Stackoverflow Question 2
Stackoverflow Question 3
Any advice or help would be much appreciated, as I've spent the last couple of days on this.
Matt's comment was absolutely correct in that the problem lay with using LabelBinarizer and this hint led me to a solution that did not require me to give up using softmax, or change the last layer to have classes = 1. For posterity and for others, here's the section of code that I altered and how I was able to avoid LabelBinarizer:
from keras.utils import np_utils
from sklearn.preprocessing import LabelEncoder
# load the dataset from disk then scale the raw pixel intensities
# to the range [0,1]
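sp = SimplePreprocessor(32, 32)
iap = ImageToArrayPreprocessor()
sdl = SimpleDatasetLoader(preprocessors=[sp, iap])
(data, labels) = sdl.load(imagePaths, verbose=500)
# encode the labels, converting them from strings to integers
le = LabelEncoder()
labels = le.fit_transform(labels)
data = data.astype("float") / 255.0
# one-hot encode the integer labels, giving shape (N, 2)
labels = np_utils.to_categorical(labels,2)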
# partition the data into training and testing splits using 75% of
# the data for training and the remaining 25% for testing
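The shape difference behind the error can be reproduced without Keras. The NumPy sketch below mimics, for illustration only, what LabelEncoder, a two-class LabelBinarizer and to_categorical(labels, 2) produce for the "mint"/"used" labels (np.unique and np.eye stand in for the sklearn/Keras calls; to_categorical itself returns floats rather than ints):

```python
import numpy as np

labels = np.array(["mint", "used", "used", "mint"])

# Like LabelEncoder: sorted class names plus an integer index per sample.
classes, int_labels = np.unique(labels, return_inverse=True)

# Like a two-class LabelBinarizer: a single 0/1 column, shape (N, 1).
binarized = (int_labels == 1).astype(int).reshape(-1, 1)

# Like to_categorical(int_labels, 2): one column per class, shape (N, 2).
one_hot = np.eye(len(classes), dtype=int)[int_labels]

print(classes)          # ['mint' 'used']
print(binarized.shape)  # (4, 1) -> clashes with a Dense(2) + softmax output
print(one_hot.shape)    # (4, 2) -> matches the model's (None, 2) output
print(one_hot)
```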
I believe the problem lies in the use of LabelBinarizer.
From this example:
>>> from sklearn import preprocessing
>>> lb = preprocessing.LabelBinarizer()
>>> lb.fit_transform(['yes', 'no', 'no', 'yes'])
array([[1],
       [0],
       [0],
       [1]])
I gather that the output of your transformation has the same format, i.e. a single 1 or 0 encoding "is new" or "is used".
If your problem only calls for classification between these two classes, that format is preferable because it contains all the information and uses less space than the alternative, i.e. [1,0], [0,1], [0,1], [1,0].
Therefore, using classes = 1 would be correct, and the output should be a float indicating the network's confidence that a sample belongs to the class encoded as 1. Since the two probabilities have to sum to one, the probability of the other class can easily be inferred by subtracting the output from 1.
You would need to replace softmax with a sigmoid activation, because softmax on a single value always returns 1. A single sigmoid output compared with a 0/1 target is exactly the setup binary_crossentropy is designed for, so there should be no need to fall back to mean_squared_error as the loss.
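Both points can be checked numerically: softmax over a single logit is exp(z) / exp(z) = 1 for every z, while a two-unit softmax is equivalent to a sigmoid of the difference of the two logits, which is why one sigmoid unit carries the same information as two softmax units. A small pure-Python check with made-up logits:

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# One output unit with softmax: always exactly 1, so nothing can be learned.
print(softmax([3.7]), softmax([-12.0]))   # [1.0] [1.0]

# Two output units with softmax vs. one unit with sigmoid on the logit difference.
a, b = 1.3, -0.4
print(softmax([a, b])[0], sigmoid(a - b))  # the same probability (up to rounding)
```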
If you are looking to expand your model to cover more than two classes, you would want to convert your target vector to a one-hot encoding. With more than two classes, LabelBinarizer already produces one-hot rows (one column per class); sklearn also has OneHotEncoder, which may be the more appropriate replacement.
NB: You can specify the activation function for any layer more easily with, for example:
Dense(36, activation='relu')
This may be helpful in keeping your code to a manageable size.