Multivariate Binary Classification Prediction Tensorflow 2 LSTM - tensorflow

I am currently working on the implementation of an LSTM to predict a binary outcome (either 0 or 1) for a given set of normed scaled features.
self._regressor.add(LSTM(units=60, activation='relu', return_sequences=True, input_shape=(data.x_train.shape[1], data.x_train.shape[2])))
self._regressor.add(Dropout(0.2))
self._regressor.add(LSTM(units=60, activation='relu', return_sequences=True))
self._regressor.add(Dropout(0.3))
self._regressor.add(LSTM(units=80, activation='relu', return_sequences=True))
self._regressor.add(Dropout(0.4))
self._regressor.add(LSTM(units=120, activation='relu'))
self._regressor.add(Dropout(0.5))
#this is the output layer
self._regressor.add(Dense(units=1, activation='sigmoid'))
self._logger.info("TensorFlow Summary\n {}".format(self._regressor.summary()))
#run regressor
self._regressor.compile(optimizer='adam', loss="binary_crossentropy", metrics=['accuracy'])
self._regressor.fit(data.x_train, data.y_train, epochs=1, batch_size=32)
data.y_pred_scaled = self._regressor.predict(data.x_test)
data.y_pred = self._scaler_target.inverse_transform(data.y_pred_scaled)
scores = self._regressor.evaluate(data.x_test, data.y_test, verbose=0)
My issue here is that the output of my prediction has a range of max: 0.5188445 and min: 0.518052, implying to me that all of my classifications are positive (which is definitely incorrect). I even tried predict_classes and this yielded an array of 1's.
I am struggling to find where my issue is despite numerous searches online. I have ensured that my final output layer consists of a sigmoid function as well as included the loss as the binary_crossentropy also. My data has been scaled using sklearn's MinMaxScaler with feature_range=(0,1). I am running my code through a debugger and everything up to the self._regressor.fit looks good so far. I am just struggling with quantifying the output of the predictions.
Any help would be greatly appreciated.

Related

CNN with LSTM-Layer

I have implemented a CNN with an LSTM layer. My input consists of four images. The images were transformed into a tensor by feature extraction. The input shape is (4,256,256,3).
The following is the structure of my model:
model = keras.models.Sequential()
model.add(TimeDistributed(Conv2D(32,(3,3),padding = 'same', activation = 'relu'),input_shape = (4,256,256,3)))
model.add(TimeDistributed(MaxPooling2D((2,2))))
model.add(TimeDistributed(Dropout(0.25)))
model.add(TimeDistributed(Conv2D(64,(3,3),padding = 'same', activation = 'relu')))
model.add(TimeDistributed(MaxPooling2D((4,4))))
model.add(TimeDistributed(Dropout(0.25)))
model.add(TimeDistributed(Conv2D(128,(3,3),padding = 'same', activation = 'relu')))
model.add(TimeDistributed(MaxPooling2D((2,2))))
model.add(TimeDistributed(Dropout(0.25)))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(128, activation='tanh'))# finalize with standard Dense, Dropout...
model.add(Dense(64, activation='relu'))
model.add(Dropout(.5))
model.add(Dense(1, activation='relu'))
optim = keras.optimizers.Adam(learning_rate=0.001)
model.compile(optimizer=optim, loss=['MSE'])
history = model.fit(x=X, y=Y, batch_size=4, epochs=5, validation_split=0.2, validation_data=(X,Y))
My problem is that my model predicts the same values for all inputs.
What could be the problem?
you use the same data for training and validation. this kills the whole point of validation. Perhaps the mistake lies in this. Try to split the data, or apply cross validation.
Also, the application of the relu activation function to the last layer in combination with the mse error looks strange. At least the real can give an unlimited result, and the data should be normalized.
I hope this will help you
if you are working with a classification problem specifically binary classification, then using sigmoid activation instead softmax And MSE loss is not a good choice for binary classification.

Keras CNN: Multi Label Classification of Images

I am rather new to deep learning and got some questions on performing a multi-label image classification task with keras convolutional neural networks. Those are mainly referring to evaluating keras models performing multi label classification tasks. I will structure this a bit to get a better overview first.
Problem Description
The underlying dataset are album cover images from different genres. In my case those are electronic, rock, jazz, pop, hiphop. So we have 5 possible classes that are not mutual exclusive. Task is to predict possible genres for a given album cover. Each album cover is of size 300px x 300px. The images are loaded into tensorflow datasets, resized to 150px x 150px.
Model Architecture
The architecture for the model is the following.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
data_augmentation = keras.Sequential(
[
layers.experimental.preprocessing.RandomFlip("horizontal",
input_shape=(img_height,
img_width,
3)),
layers.experimental.preprocessing.RandomFlip("vertical"),
layers.experimental.preprocessing.RandomRotation(0.4),
layers.experimental.preprocessing.RandomZoom(height_factor=(0.2, 0.6), width_factor=(0.2, 0.6))
]
)
def create_model(num_classes=5, augmentation_layers=None):
model = Sequential()
# We can pass a list of layers performing data augmentation here
if augmentation_layers:
# The first layer of the augmentation layers must define the input shape
model.add(augmentation_layers)
model.add(layers.experimental.preprocessing.Rescaling(1./255))
else:
model.add(layers.experimental.preprocessing.Rescaling(1./255, input_shape=(img_height, img_width, 3)))
model.add(layers.Conv2D(32, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dense(512, activation='relu'))
# Use sigmoid activation function. Basically we train binary classifiers for each class by specifiying binary crossentropy loss and sigmoid activation on the output layer.
model.add(layers.Dense(num_classes, activation='sigmoid'))
model.summary()
return model
I'm not using the usual metrics here like standard accuracy. In this paper I read that you cannot evaluate multi-label classification models with the usual methods. In chapter 7. evaluation metrics the hamming loss and an adjusted accuracy (variant of exact match) are presented which I use for this model.
The hamming loss is already provided by tensorflow-addons (see here) and an implementation of the subset accuracy I found here (see here).
from tensorflow_addons.metrics import HammingLoss
hamming_loss = HammingLoss(mode="multilabel", threshold=0.5)
def subset_accuracy(y_true, y_pred):
# From https://stackoverflow.com/questions/56739708/how-to-implement-exact-match-subset-accuracy-as-a-metric-for-keras
threshold = tf.constant(.5, tf.float32)
gtt_pred = tf.math.greater(y_pred, threshold)
gtt_true = tf.math.greater(y_true, threshold)
accuracy = tf.reduce_mean(tf.cast(tf.equal(gtt_pred, gtt_true), tf.float32), axis=-1)
return accuracy
# Create model
model = create_model(num_classes=5, augmentation_layers=data_augmentation)
# Compile model
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=[subset_accuracy, hamming_loss])
# Fit the model
history = model.fit(training_dataset, epochs=epochs, validation_data=validation_dataset, callbacks=callbacks)
Problem with this model
When training the model subset_accuracy hamming_loss are at some point stuck which looks like the following:
What could cause this behaviour. I am honestly a little bit lost right now. Could this be a case of the dying relu problem? Or is it wrong use of the metrics mentioned or is the implementation of those maybe wrong?
So far I tried to test differen optimizers and lowering the learning rate (e.g. from 0.01 to 0.001, 0.0001, etc..) but that didn't help either.
Maybe somebody has an idea that can help me.
Thanks in advance!
I think you need to tune your model's hyperparameters right. For that I'll recommend try using Keras Tuner library.
This would take some time to run, but will fetch you right set of hyperparameters.

(TensorFlow) TimeDistributed layer for image classification

I know that “Time Distributed” layers are used when we have several images that are chronologically ordered to detect movements, actions, directions etc.
However, I work on speech classification using spectrograms. Every speech is transformed into a spectrogram, which will be fed later to a neural network to perform classification. So my database is in the form of 2093 RGB images(100x100x3). For now I have used a CNN and the input is
x_train = np.array(x_train).reshape(2093,100,100, 3)
And every thing works just fine.
But now, I would like to use CNN+BLSTM (similar to the following picture, which is taken from this paper) , which means I am going to need time steps. So, every image should be divided into smaller frames.
The question is, how to prepare the data to do such a thing ?
Assuming that I want to divide every image into 10 frames (time steps). Should I just reshape the data
x_train = np.array(x_train).reshape(2093,10,10,100, 3)
Which works just fine but I'm not sure if it's the right thing , or there is another way to do that ?
This is the model that I'm using
model = tf.keras.Sequential([
tf.keras.layers.TimeDistributed(tf.keras.layers.Conv2D(filters=64, kernel_size=2, padding='same', activation='relu', input_shape=(100,100,3),name="conv1")),
tf.keras.layers.Dropout(0.3),
tf.keras.layers.TimeDistributed(tf.keras.layers.MaxPooling2D(pool_size=2)),
tf.keras.layers.TimeDistributed(tf.keras.layers.Conv2D(filters=128, kernel_size=2, padding='same', activation='relu')),
tf.keras.layers.Dropout(0.3),
tf.keras.layers.TimeDistributed(tf.keras.layers.MaxPooling2D(pool_size=2)),
tf.keras.layers.TimeDistributed(tf.keras.layers.Conv2D(filters=256, kernel_size=2, padding='same', activation='relu')),
tf.keras.layers.Dropout(0.3),
tf.keras.layers.TimeDistributed(tf.keras.layers.MaxPooling2D(pool_size=2)),
tf.keras.layers.TimeDistributed(tf.keras.layers.Flatten()),
tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
tf.keras.layers.Dropout(0.3),
tf.keras.layers.Dense(200, activation="relu"),
tf.keras.layers.Dense(10, activation= "softmax")
])
By using the previous model, I got 47% on train accuracy and 46% accuracy on validation accuracy, but with using only CNN I got 95% on train and 71% on validation, could anyone give me a hint how to solve this problem ?

How to use a Keras LSTM with batch size of 1 ? (avoid zero padding)

I want to train a Keras LSTM using a batch size of one.
In this case I wouldn't need zero padding so far I understood. The necessity for zero padding comes from equalizing the size of the batches, right?
It turns out this is not as easy as I thought.
My network looks like this:
model = Sequential()
model.add(Embedding(output_dim=embeddings.shape[1],
input_dim=embeddings.shape[0],
weights=[embeddings],
trainable=False))
model.add(Dropout(0.2))
model.add(Bidirectional(LSTM(100,
return_sequences=True,
activation="tanh",kernel_initializer="glorot_uniform")))
model.add(TimeDistributed(Dense(maxLabel)))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy',
optimizer='sgd',
metrics=['accuracy'])
model.fit(x_train, y_train, epochs=20, batch_size=1, shuffle=True)
I naively provide my training data and training labels as simple numpy array with this shape properties:
X: (2161,)
Y: (2161,)
I get now a ValueError: Error when checking target: expected activation_1 to have 3 dimensions, but got array with shape (2161, 1)
I am not sure how to satisfy this 3-D property without zero-padding which is what I want to avoid when working with batch sizes of one anyway.
Does anyone see what I am doing wrong?

RNN Not Generalizing on Text Classification

I am using keras and RNN to classify slack text data on whether the text is reaction worthy or not (1 - emoji, 0 - no emoji). I have removed usernames and urls from the text as well as dropped duplicates with different target variables.
I am not able to get the model to generalize to unseen data. The loss of the train/val sets look good and continually decrease but the accuracy of the val set only decreases.
I am using a pretrained GLOVE word embedding since my training size is only about 25,000 sentences.
I have added additional layers, changed my regularization value and increased dropout but get similar results. Is my model not complex enough to generalize the data? The times i added additional layers they were much smaller but deeper because the training time was about 2 min per epoch.
Any insight would be appreciated.
embedding_layer = Embedding(len(word_index) + 1,
100,
weights=[embeddings_matrix],
input_length=max_message_length,
embeddings_regularizer=l2(0.001),
trainable=True)
# Creating the Model
model = Sequential()
model.add(embedding_layer)
model.add(Convolution1D(filters=32, kernel_size=3, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(Dropout(0.7))
model.add(layers.GRU(128))
model.add(Dropout(0.7))
model.add(Dense(1, activation='sigmoid'))
# Compiling the model with our given Optimizer
optimizer = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.000025)
model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
print(model.summary())