Why are encoded representations bad for classification?

I have a pre-trained, well-performing auto-encoder. When I train a classifier on the encodings it produces, the classifier does very poorly. In particular, it does much worse than a classifier trained on the normal (i.e. unencoded) inputs.
However, when I fine-tune the encoder based on classification loss, the classifier does quite well.
Why are encoded representations bad for classification?
Details: I’m working on CIFAR-100 and trying to classify coarse image labels, i.e. 20 classes (but I think I had the same problem when doing classification on CIFAR-10). The classifier has 5 layers and I’m using dropout:
classifier = tf.keras.Sequential([
], name='classifier')


Keras CNN: Multi Label Classification of Images

I am rather new to deep learning and have some questions about performing a multi-label image classification task with Keras convolutional neural networks. They mainly concern evaluating Keras models on multi-label classification tasks. I will structure this a bit to give a better overview first.
Problem Description
The underlying dataset consists of album cover images from different genres. In my case those are electronic, rock, jazz, pop, hiphop. So we have 5 possible classes that are not mutually exclusive. The task is to predict the possible genres for a given album cover. Each album cover is of size 300px x 300px. The images are loaded into tensorflow datasets and resized to 150px x 150px.
Model Architecture
The architecture for the model is the following.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
# Images are resized to 150px x 150px (see the problem description)
img_height, img_width = 150, 150

data_augmentation = keras.Sequential([
    layers.experimental.preprocessing.RandomZoom(height_factor=(0.2, 0.6), width_factor=(0.2, 0.6)),
])

def create_model(num_classes=5, augmentation_layers=None):
    model = Sequential()
    # We can pass a list of layers performing data augmentation here
    if augmentation_layers:
        # The first layer of the augmentation layers must define the input shape
        # (the body of this branch is missing in the post; presumably the layers are added here)
        model.add(augmentation_layers)
    model.add(layers.experimental.preprocessing.Rescaling(1./255, input_shape=(img_height, img_width, 3)))
    model.add(layers.Conv2D(32, (3, 3), activation='relu'))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(64, (3, 3), activation='relu'))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(128, (3, 3), activation='relu'))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(128, (3, 3), activation='relu'))
    model.add(layers.MaxPooling2D((2, 2)))
    # Flatten the feature maps before the dense layers (not present in the post as shown; assumed)
    model.add(layers.Flatten())
    model.add(layers.Dense(512, activation='relu'))
    # Use sigmoid activation function. Basically we train binary classifiers for each class
    # by specifying binary crossentropy loss and sigmoid activation on the output layer.
    model.add(layers.Dense(num_classes, activation='sigmoid'))
    return model
I'm not using the usual metrics like standard accuracy here. In this paper I read that you cannot evaluate multi-label classification models with the usual methods. Chapter 7 (evaluation metrics) presents the hamming loss and an adjusted accuracy (a variant of exact match), which I use for this model.
The hamming loss is already provided by tensorflow-addons, and I found an implementation of the subset accuracy on Stack Overflow (the link is in the code comment below).
from tensorflow_addons.metrics import HammingLoss

hamming_loss = HammingLoss(mode="multilabel", threshold=0.5)

def subset_accuracy(y_true, y_pred):
    # From https://stackoverflow.com/questions/56739708/how-to-implement-exact-match-subset-accuracy-as-a-metric-for-keras
    threshold = tf.constant(.5, tf.float32)
    gtt_pred = tf.math.greater(y_pred, threshold)
    gtt_true = tf.math.greater(y_true, threshold)
    accuracy = tf.reduce_mean(tf.cast(tf.equal(gtt_pred, gtt_true), tf.float32), axis=-1)
    return accuracy

# Create model
model = create_model(num_classes=5, augmentation_layers=data_augmentation)
# Compile model
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=[subset_accuracy, hamming_loss])
# Fit the model
history = model.fit(training_dataset, epochs=epochs, validation_data=validation_dataset, callbacks=callbacks)
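One thing that may be worth checking about the subset_accuracy function above: because it averages the per-label matches over the last axis, it appears to report the fraction of correctly predicted labels per sample (that is, 1 minus the hamming loss) rather than a strict exact match. Here is a minimal sketch in plain Python of the difference, using made-up labels and predictions. The helper names binarize, per_label_accuracy and exact_match_accuracy are hypothetical and not part of Keras or tensorflow-addons.

```python
def binarize(rows, threshold=0.5):
    """Turn probabilities (or 0/1 labels) into booleans using the threshold."""
    return [[value > threshold for value in row] for row in rows]

def per_label_accuracy(y_true, y_pred, threshold=0.5):
    """Mean fraction of matching labels per sample (mean over the label axis)."""
    t, p = binarize(y_true, threshold), binarize(y_pred, threshold)
    per_sample = [sum(a == b for a, b in zip(tr, pr)) / len(tr) for tr, pr in zip(t, p)]
    return sum(per_sample) / len(per_sample)

def exact_match_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of samples whose whole label vector is predicted correctly."""
    t, p = binarize(y_true, threshold), binarize(y_pred, threshold)
    hits = [tr == pr for tr, pr in zip(t, p)]
    return sum(hits) / len(hits)

# Made-up example: 2 album covers, 5 genres each
y_true = [[1, 0, 0, 1, 0],
          [0, 1, 0, 0, 0]]
y_pred = [[0.9, 0.1, 0.2, 0.3, 0.1],   # misses the 4th genre
          [0.2, 0.8, 0.1, 0.1, 0.3]]   # all five labels correct

print(per_label_accuracy(y_true, y_pred))    # roughly 0.9: 9 of 10 labels match
print(exact_match_accuracy(y_true, y_pred))  # 0.5: only 1 of 2 samples is fully correct
```

If a strict exact match is wanted in the TensorFlow version, the analogous change would presumably be to reduce the boolean comparison with tf.reduce_all over the last axis before casting, instead of taking the mean.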
Problem with this model
When training the model, subset_accuracy and hamming_loss get stuck at some point and stop improving.
What could cause this behaviour? I am honestly a little bit lost right now. Could this be a case of the dying ReLU problem? Or am I using the metrics incorrectly, or is their implementation maybe wrong?
So far I have tried different optimizers and lowering the learning rate (e.g. from 0.01 to 0.001, 0.0001, etc.), but that didn't help either.
Maybe somebody has an idea that can help me.
Thanks in advance!
I think you need to tune your model's hyperparameters properly. For that I recommend trying the Keras Tuner library.
This will take some time to run, but it should find a good set of hyperparameters.

Why is my CNN/Image Classifier model accuracy so low?

I'm currently trying to build a CNN that can detect whether a patient has pneumonia caused by covid or not, and no matter what parameters I change, the model accuracy stays at 49%/50%, so it's basically useless because it's the same as a coin flip. Here is my code; I thought I would try using the VGG-16 model.
from tensorflow import keras  # needed for keras.losses.binary_crossentropy below
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, MaxPool2D, GlobalAveragePooling2D
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping
from tensorflow.keras.preprocessing.image import ImageDataGenerator
# Loading in the dataset
traindata = ImageDataGenerator(rescale=1/255)
trainingdata = traindata.flow_from_directory(
testdata = ImageDataGenerator(rescale=1/255)
testingdata = testdata.flow_from_directory(
# Initialize the model w/ Sequential & add layers + input and output <- will refer to the VGG 16 model architecture
model = Sequential()
model.add(Conv2D(input_shape=(224,224,3),filters=64,kernel_size=(2,2),padding="same", activation="relu"))
model.add(Conv2D(filters=64, kernel_size=(3,3), padding="same", activation ="relu"))
model.add(MaxPool2D(pool_size=(2,2), strides=2))
model.add(Conv2D(filters=128, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=128, kernel_size=(3,3), padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2,2), strides=2))
model.add(Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2,2), strides=2))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2,2), strides=2))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2,2), strides=2))
model.add(Dense(units=4096, activation="relu"))
model.add(Dense(units=4096, activation="relu"))
model.add(Dense(units=1000, activation="relu"))
model.add(Dense(units=1, activation="softmax"))
# Compile the model
model_optimizer = Adam(lr=0.001)
model.compile(optimizer=model_optimizer, loss=keras.losses.binary_crossentropy, metrics=['accuracy'])
# Add the callbacks
checkpoint = ModelCheckpoint(filepath="Covid-19.hdf5", monitor='val_acc', verbose=1, save_best_only=True, save_weights_only=False, mode='auto')
early = EarlyStopping(monitor='val_acc', min_delta=0, patience=50, verbose=1, mode='auto')
fit = model.fit_generator(steps_per_epoch=25, generator=trainingdata, validation_data=testingdata, validation_steps=10,epochs=10,callbacks=[checkpoint,early])
This always gives:
Epoch 1/10 6/25 [======>.......................] - ETA: 1:22:37 -
loss: 7.5388 - accuracy: 0.5083
<- Well, it just always gives a really poor accuracy...
Additional info:
Some of the images in the data set are JPG others are PNG (Not sure if this is the culprit)
The Dataset has 2072 images for training Covid CTs and 2098 images for training NonCovid CTs
The Dataset has 576 images for testing Covid CTs and 532 images for testing NonCovid CTs
File structure looks like this: Covid19ModelImages -> Training Data & Testing Data - Training Data has 2 subfolders Covid19CT and noncovid19 CT and testing data also has 2 subfolders Covid19CT and noncovid19CT
Also: Am I just being too impatient? I never let it run past the 1st epoch because I just assume it's never going to get better than 50%. Could it be that the model will improve in later epochs?
If anyone would be willing to help out, or if you need any other additional info to maybe help you gain a better understanding of the problem, please let me know!
Since you are using binary cross entropy, the activation function in the dense layer with 1 unit should be "sigmoid".
Since you are not using a GPU, you have very long training times per epoch. To see if the model is working correctly you may want to reduce this time. There are a few things you could do. Try reducing the image size, say to 128 by 128. With 224 x 224 you have 50176 pixels to process versus 16384 for the 128 x 128 image, so you reduce the computations by about a factor of 3. Also, you have two dense layers with 4096 units. This is also computationally expensive and may lead to overfitting. Try your model initially without these layers and see how it performs.
I am not a fan of early stopping because it is a crutch to avoid dealing with the overfitting issue. If you encounter overfitting, add a dropout layer to help avoid it.
Finally, I recommend you use an adjustable learning rate. The callback ReduceLROnPlateau makes this easy to do. Set it to monitor validation loss. You can set the parameters to reduce the learning rate by a factor < 1 if the loss fails to decrease after "patience" consecutive epochs. I usually use factor=.5 and patience=1. This also enables you to use a larger initial learning rate for faster convergence.
You need to let your model run for several epochs to see if the training loss and validation loss are decreasing.
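To illustrate why the output activation matters here: softmax normalizes across the units of a layer, so with a single unit it always returns 1, whatever the logit is. The model then predicts the same class for every image, which would be consistent with an accuracy stuck around 50% on a roughly balanced dataset. A minimal sketch in plain Python (the softmax and sigmoid helpers are written out by hand for illustration and are not the Keras implementations):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Logistic function: maps a single logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# With one output unit, softmax ignores the logit; sigmoid does not.
for z in (-5.0, 0.0, 5.0):
    print(z, softmax([z])[0], round(sigmoid(z), 4))

# Pixel counts mentioned above: 224x224 versus 128x128
print(224 * 224, 128 * 128, 224 * 224 / (128 * 128))
```

The loop prints a softmax value of 1.0 for every logit, while the sigmoid value moves from near 0 to near 1. The last line reproduces the pixel arithmetic from the answer (50176 versus 16384, a ratio of about 3).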

How are the input layers in Keras defined?

So I have this assignment to train a very simple neural network. Our dataset has 6 features that are fed into the network, and we are required to train it and then predict one output number. The professor gave us the code and basically told us to learn by ourselves lol. My question is: in the following code, in which the layers for the neural network are defined, does the first dense layer (the one with 50 nodes) correspond to the input layer, or is it the first hidden layer?
If it's the first hidden layer, how are input layers defined?
Thanks in advance!
def get_compiled_model():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(50, activation='relu', input_shape=(6,)),
        tf.keras.layers.Dense(30, activation='relu'),
        tf.keras.layers.Dense(30, activation='relu'),
        tf.keras.layers.Dense(1, activation='linear'),
    ])
    # (the compile call is not shown in the post)
    return model
The first dense layer is the first hidden layer. Keras automatically provides an input layer in Sequential objects, and the number of units is defined by input_shape or input_dim.
You can also explicitly state the input layer as follows:
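def get_compiled_model():
    model = tf.keras.Sequential([
        # Explicit input layer: 6 features per sample
        tf.keras.Input(shape=(6,)),
        tf.keras.layers.Dense(50, activation='relu'),
        tf.keras.layers.Dense(30, activation='relu'),
        tf.keras.layers.Dense(30, activation='relu'),
        tf.keras.layers.Dense(1, activation='linear'),
    ])
    # (the compile call is not shown in the post)
    return model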
It is the first hidden layer. The input layer isn't defined as a separate layer; it simply consists of the input data, and its size is defined by input_shape=(6,).
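One way to convince yourself of this is to count trainable parameters: the input layer has none, while the first Dense layer already owns a 6 x 50 weight matrix plus 50 biases. A small sketch in plain Python (dense_params is a hypothetical helper, not a Keras function); the per-layer numbers should match what model.summary() reports for this network, assuming the layers are exactly those shown above:

```python
def dense_params(n_inputs, n_units):
    """Trainable parameters of a fully connected layer: weights plus biases."""
    return n_inputs * n_units + n_units

# 6 input features, followed by the four Dense layers of the model
layer_sizes = [6, 50, 30, 30, 1]

total = 0
for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
    count = dense_params(n_in, n_out)
    total += count
    print(f"Dense({n_out}) fed by {n_in} values: {count} parameters")

print("total:", total)
```

The first line of output (350 parameters for Dense(50)) only makes sense if that layer receives the 6 features as its input, i.e. if it is the first hidden layer rather than the input layer.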

Tflite 200mb big

I am building a model which should classify flowers. So I created a model with Tensorflow:
keras.layers.Conv2D(128, (3,3), activation='relu', input_shape=(imageShape[0], imageShape[1],3)),
keras.layers.Conv2D(256, (3,3), activation='relu'),
keras.layers.Conv2D(512, (3,3), activation='relu'),
keras.layers.Dense(280, activation='relu'),
keras.layers.Dense(4, activation='softmax')
opt = tf.keras.optimizers.RMSprop()
optimizer= opt,
While training I save checkpoints as .h5
checkpoint = ModelCheckpoint("preSaved"+str(time.time())+".h5", monitor='val_loss', verbose=1,
save_best_only=True, save_weights_only=False, mode='auto', period=1)
Now I got an epoch with a pretty low loss and want to convert it to .tflite to upload it to Firebase (use it in an Android Studio App).
import tensorflow as tf
new_model= tf.keras.models.load_model(filepath="model.h5")
tflite_converter = tf.lite.TFLiteConverter.from_keras_model(new_model)
tflite_model = tflite_converter.convert()
open("tf_lite_model.tflite", "wb").write(tflite_model)
The .h5 is about 335 MB and the final .tflite is 160 MB. But Firebase only allows .tflite files up to 60 MB, and if I use a local model it needs minutes to load. I read that .tflite files are usually smaller.
Is there a problem in my model or when I convert it to .tflite?
The model size is largely determined by your model architecture (the different layers that make up the model and the number of parameters in each layer). You can experiment changing those to get a smaller model.
Here is a much simpler architecture for an image classification model. Keep in mind, of course, that a smaller model might have lower accuracy than a more sophisticated version.
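To see where the size probably comes from, here is a rough back-of-the-envelope estimate in plain Python. It assumes float32 weights (4 bytes each), that the file size is dominated by the weights, that 160 MB means 160 * 10^6 bytes, and that the layers shown in the question are the only ones with parameters; the question does not show the image shape or any pooling layers, so this is only a sketch. conv2d_params is a hypothetical helper.

```python
BYTES_PER_FLOAT32 = 4

def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Trainable parameters of a Conv2D layer: kernel weights plus biases."""
    return kernel_h * kernel_w * in_channels * filters + filters

# The three convolutions from the question: (kernel_h, kernel_w, in_channels, filters)
conv_layers = [(3, 3, 3, 128), (3, 3, 128, 256), (3, 3, 256, 512)]
conv_total = sum(conv2d_params(*layer) for layer in conv_layers)
output_layer = 280 * 4 + 4  # Dense(4) fed by Dense(280)

tflite_bytes = 160 * 10**6
total_params = tflite_bytes // BYTES_PER_FLOAT32

# Whatever is left must sit in Dense(280), which follows the flattened feature maps
dense_280_params = total_params - conv_total - output_layer
implied_flatten_size = (dense_280_params - 280) // 280

print("conv parameters:", conv_total, f"(~{conv_total * BYTES_PER_FLOAT32 / 1e6:.1f} MB)")
print("Dense(280) parameters (implied):", dense_280_params)
print("implied flattened input size:", implied_flatten_size)
print(f"share of Dense(280): {dense_280_params / total_params:.1%}")
```

Under these assumptions the convolutions account for only about 6 MB, so nearly all of the file would be the first dense layer. Shrinking what is fed into that layer (for example with more pooling or smaller input images) or reducing its number of units is therefore likely to have the biggest effect on the .tflite size.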

RNN Not Generalizing on Text Classification

I am using Keras and an RNN to classify Slack text data on whether the text is reaction-worthy or not (1 - emoji, 0 - no emoji). I have removed usernames and URLs from the text and dropped duplicates with different target variables.
I am not able to get the model to generalize to unseen data. The loss on the train/val sets looks good and continually decreases, but the accuracy on the val set only decreases.
I am using a pretrained GloVe word embedding since my training set is only about 25,000 sentences.
I have added additional layers, changed my regularization value and increased dropout, but I get similar results. Is my model not complex enough to generalize to the data? When I added additional layers, they were much smaller but deeper because the training time was about 2 min per epoch.
Any insight would be appreciated.
embedding_layer = Embedding(len(word_index) + 1,
# Creating the Model
model = Sequential()
model.add(Convolution1D(filters=32, kernel_size=3, padding='same', activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# Compiling the model with our given Optimizer
optimizer = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.000025)
model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])