Categorical_crossentropy loss is 0.0000e+00 for a BiLSTM sentiment analysis model - tensorflow

This is the graph of my model (image omitted).
Code:
def model_creation(vocab_size, embedding_dim, embedding_matrix,
                   rnn_units, batch_size,
                   train_embed=False):
    model = Sequential([
        Embedding(vocab_size, embedding_dim,
                  weights=[embedding_matrix], trainable=train_embed, mask_zero=True),
        Bidirectional(LSTM(rnn_units, return_sequences=True, dropout=0.5)),
        Bidirectional(LSTM(rnn_units, dropout=0.25)),
        Dense(1, activation="softmax")
    ])
    return model
The embedding layer receives an embedding matrix with values from Word2Vec.
This is the code for the embedding matrix:
Embedding Matrix
import numpy as np

def create_embedding_matrix(encoder, dict_w2v):
    embedding_dim = 50
    embedding_matrix = np.zeros((encoder.vocab_size, embedding_dim))
    for word in encoder.tokens:
        embedding_vector = dict_w2v.get(word)
        if embedding_vector is not None:  # dictionary contains word
            token_id = encoder.encode(word)[0]
            embedding_matrix[token_id] = embedding_vector
    return embedding_matrix
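For context, dict_w2v here is a plain word-to-vector mapping. A minimal sketch of building one from a trained gensim Word2Vec model (assuming gensim 4.x; tokenized_reviews is a hypothetical list of tokenized reviewText strings):

from gensim.models import Word2Vec

# Train (or load) a 50-dimensional Word2Vec model, then turn it into a dict.
w2v = Word2Vec(sentences=tokenized_reviews, vector_size=50, min_count=1)
dict_w2v = {word: w2v.wv[word] for word in w2v.wv.index_to_key}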
Dataset
I'm using the amazon product dataset https://jmcauley.ucsd.edu/data/amazon/
This is what the dataframe looks like (preview omitted).
I'm only interested in overall and reviewText:
overall is my label and reviewText is my feature.
overall has a range of [1,5]
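For reference, with categorical_crossentropy the 1-5 ratings would normally be shifted to 0-4 class indices and one-hot encoded into 5 columns; a minimal sketch, with df standing in for the dataframe above:

import tensorflow as tf

labels = df["overall"].to_numpy() - 1          # shift [1, 5] ratings to [0, 4] class indices
y = tf.keras.utils.to_categorical(labels, 5)   # one-hot labels, shape (n_samples, 5)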
Problem
During training with categorical_crossentropy, the loss sits at 0.0000e+00, so it can't be minimized and accuracy stays stuck at 0.1172.
Did I configure my model wrong, or is there some other problem? How do I fix this loss issue? Please tell me if anything is unclear and I'll provide more information.

Related

Output logits with softmax aren't extreme when using Tensorflow. No prediction is very confident

I trained a text classification model in TensorFlow and plotted the softmax confidence for correct predictions as well as for incorrect ones. Doing this, I noticed that no output prediction had a high class probability. For example, predicting between 4 classes gave these results:
(TensorFlow version)
input text: "Text that fits into class 0"
logits: [.3928, 0.2365, 0.1854, 0.1854]
label: class 0
I would hope that the output for class 0 would be higher than .3928! None of the predicted probabilities is higher than 0.5 (confidence plot omitted).
Next, I retrained the exact same model on the same dataset, but in PyTorch. With PyTorch I got the results I was looking for. Both models reached the same validation accuracy after training (90%).
(Pytorch Version)
input text: "Text that fits into class 0"
logits: [0.8778, 0.0532, 0.0056, 0.0635]
label: class 0
Here is what I understand about the softmax function:
The softmax function transforms its inputs, which can be seen as logits, into a probability distribution over the classes. The maximum value of the softmax is 1, so if your largest logit is 0.5, then it means that the highest probability assigned by the softmax will be relatively low (less than 1/2).
To have more extreme outputs, you need to have larger logits. One way to do this is to train your model with more data and a more powerful architecture, so that it can learn more complex relationships between inputs and outputs. It may not be desirable for a model to have extremely high confidence in its predictions, as it could lead to overfitting to the training data. The appropriate level of confidence will depend on the specific use case and the desired trade-off between precision and recall.
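To illustrate that point numerically: scaling the logits up makes the softmax output sharper without changing the ranking. A small self-contained sketch:

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

print(softmax(np.array([1.0, 0.5, 0.0, 0.0])))  # mild confidence, max prob ~0.43
print(softmax(np.array([4.0, 2.0, 0.0, 0.0])))  # same ranking, larger logits -> max prob ~0.85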
My TensorFlow model:
tokenizer = AutoTokenizer.from_pretrained('prajjwal1/bert-tiny', from_pt = True)
encoder = TFAutoModel.from_pretrained('prajjwal1/bert-tiny', from_pt = True)
# Define input layer with token and attention mask
input_ids = tf.keras.layers.Input(shape=(None,), dtype=tf.int32, name="input_ids")
attention_mask = tf.keras.layers.Input(shape=(None,), dtype=tf.int32, name="attention_mask")
# Call the BERT-tiny encoder with the inputs
pooler_output = encoder(input_ids, attention_mask=attention_mask)[1]  # index 1 is the pooler output
# Define a dense layer on top of the pooled output
x = tf.keras.layers.Dense(units=params['fc_layer_size'])(pooler_output)
x = tf.keras.layers.Dropout(params['dropout'])(x)
outputs = tf.keras.layers.Dense(4, activation='softmax')(x)
# Define a model with the inputs and dense layer
model = tf.keras.Model(inputs=[input_ids, attention_mask], outputs=outputs)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.0008)
metrics = [tf.metrics.SparseCategoricalAccuracy()]
# Compile the model (pass the optimizer object defined above, not the default 'sgd' string)
model.compile(optimizer=optimizer, loss=loss, metrics=metrics)
My PyTorch model:
tokenizer = AutoTokenizer.from_pretrained('prajjwal1/bert-tiny')
encoder = AutoModel.from_pretrained('prajjwal1/bert-tiny')

class EscalationClassifier(nn.Module):
    def __init__(self, encoder):
        super(EscalationClassifier, self).__init__()
        self.encoder = encoder
        self.fc1 = nn.Linear(128, 312)
        self.fc2 = nn.Linear(312, 4)
        self.dropout = nn.Dropout(0.2)

    def forward(self, input_ids, attention_mask):
        pooled_output = self.encoder(input_ids, attention_mask=attention_mask)[1]  # index 1 is the pooler output
        x = self.fc1(pooled_output)
        x = self.dropout(x)
        x = self.fc2(x)
        return x

model = EscalationClassifier(encoder)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.0008)
Can anyone help me explain why my TensorFlow outputs aren't as confident as the PyTorch outputs? (The problem doesn't seem to be with the softmax itself.)

How to properly stack RNN layers?

I've been trying to implement a character-level language model in tensorflow based on this tutorial.
I would like to extend the model by allowing multiple RNN layers to be stacked. So far I've come up with this:
class MyModel(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, rnn_type, rnn_units, num_layers, dropout):
        super().__init__()
        self.rnn_type = rnn_type.lower()
        self.num_layers = num_layers
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        if self.rnn_type == 'gru':
            rnn_layer = tf.keras.layers.GRU
        elif self.rnn_type == 'lstm':
            rnn_layer = tf.keras.layers.LSTM
        elif self.rnn_type == 'rnn':
            rnn_layer = tf.keras.layers.SimpleRNN
        else:
            raise ValueError(f'Unsupported RNN layer: {rnn_type}')
        setattr(self, self.rnn_type, rnn_layer(rnn_units, return_sequences=True, return_state=True, dropout=dropout))
        for i in range(1, num_layers):
            setattr(self, f'{self.rnn_type}_{i}', rnn_layer(rnn_units, return_sequences=True, return_state=True, dropout=dropout))
        self.dense = tf.keras.layers.Dense(vocab_size)

    def call(self, inputs, states=None, return_state=False, training=False):
        x = self.embedding(inputs, training=training)
        rnn = self.get_layer(self.rnn_type)
        if states is None:
            states = rnn.get_initial_state(x)
        x, states = rnn(x, initial_state=states, training=training)
        for i in range(1, self.num_layers):
            layer = self.get_layer(f'{self.rnn_type}_{i}')
            x, states = layer(x, initial_state=states, training=training)
        x = self.dense(x, training=training)
        if return_state:
            return x, states
        return x

model = MyModel(
    vocab_size=vocab_size,
    embedding_dim=embedding_dim,
    rnn_type='gru',
    rnn_units=512,
    num_layers=3,
    dropout=dropout)
When trained for 30 epochs on the dataset in the tutorial, this model generates some random gibberish. Now I don't know if I'm doing the stacking wrong or if the dataset is just too small.
There are multiple factors contributing to the bad predictions of your model:
The dataset is small.
The model you are using is quite simple.
The training time is very short.
Predicting Shakespeare sonnets will produce random gibberish even if trained right.
Try to train it for longer. This will ultimately lead to better results, although predicting correct speech based on text is one of the hardest tasks in machine learning in general. For example, GPT-3, one of the models that solves this problem almost perfectly, consists of billions of parameters (see here).
EDIT: The reason your model performs worse than the one in the tutorial, even though you have more stacked RNN layers, may be that more layers need more training time. Simply increasing the number of layers will not necessarily improve prediction quality. As I said, try increasing the training time or playing around with hyperparameters (learning rate, normalization layers, etc.).
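On the stacking itself, note that the call method in the question passes the first layer's final state into every later layer as initial_state and reuses a single states variable across all layers. A common alternative is to give each stacked layer its own state; a minimal sketch (assuming TF 2.x and GRU layers, where each layer returns exactly one state tensor):

import tensorflow as tf

class StackedGRUModel(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, rnn_units, num_layers):
        super().__init__()
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        # One GRU per layer; each layer receives and returns its own state.
        self.rnn_layers = [
            tf.keras.layers.GRU(rnn_units, return_sequences=True, return_state=True)
            for _ in range(num_layers)
        ]
        self.dense = tf.keras.layers.Dense(vocab_size)

    def call(self, inputs, states=None, return_state=False, training=False):
        x = self.embedding(inputs, training=training)
        if states is None:
            states = [None] * len(self.rnn_layers)  # each layer builds its own zero state
        new_states = []
        for layer, state in zip(self.rnn_layers, states):
            x, state = layer(x, initial_state=state, training=training)
            new_states.append(state)
        x = self.dense(x, training=training)
        return (x, new_states) if return_state else x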

How to extract the hidden vector (the output of the ReLU after the third encoder layer) as the image representation

I am implementing an autoencoder on the Fashion-MNIST dataset. The code for the autoencoder:
class MNISTClassifier(Model):
    def __init__(self):
        super(MNISTClassifier, self).__init__()
        self.encoder = Sequential([
            layers.Dense(128, activation="relu"),
            layers.Dense(64, activation="relu"),
            layers.Dense(32, activation="relu")
        ])
        self.decoder = Sequential([
            layers.Dense(64, activation="relu"),
            layers.Dense(128, activation="relu"),
            layers.Dense(784, activation="relu")
        ])

    def call(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return decoded

autoencoder = MNISTClassifier()
Now I want to train an SVM classifier on the image representations extracted from the above autoencoder. That is: once the fully-connected autoencoder is trained, for each image I want to extract the 32-dimensional hidden vector (the output of the ReLU after the third encoder layer) as the image representation, and then train a linear SVM classifier on the Fashion-MNIST training images based on these 32-dimensional features.
How do I extract this 32-dimensional hidden vector?
Thanks in advance!
I recommend using the Functional API to define multiple outputs for your model, because the code is clearer. However, you can do this with a Sequential model too, by taking the output of any layer you want and adding it to your model's outputs.
Print your model.summary() and check your layers to find the layer you want to branch from. You can access each layer's output by its index with model.layers[index].output.
Then you can create a multi-output model of the layers you want, like this:
third_layer = model.layers[2]
last_layer = model.layers[-1]
my_model = Model(inputs=model.input, outputs=(third_layer.output, last_layer.output))
Then you can access the outputs of both of the layers you have defined:
third_layer_predict, last_layer_predict = my_model.predict(X_test)
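With the subclassed model from the question there is also a simpler route: the encoder is already a standalone Sequential sub-model, so after training the autoencoder you can call it directly to get the 32-dimensional codes. A minimal sketch (assuming x_train/x_test are flattened 784-dimensional Fashion-MNIST images and y_train/y_test their labels):

from sklearn.svm import LinearSVC

# The encoder attribute is itself a Keras model; its output is the 32-dim hidden vector.
train_codes = autoencoder.encoder.predict(x_train)  # shape (n_samples, 32)
test_codes = autoencoder.encoder.predict(x_test)

svm = LinearSVC()                                   # linear SVM on the learned representation
svm.fit(train_codes, y_train)
print('SVM test accuracy:', svm.score(test_codes, y_test))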

How to apply Triplet Loss for a ResNet50 based Siamese Network in Keras or Tf 2

I have a ResNet-based siamese network which uses the idea that you try to minimize the L2 distance between two images and then apply a sigmoid so that it gives you a {0: 'same', 1: 'different'} output. Based on how far off the prediction is, you flow the gradients back through the network. The problem is that the gradient updates are too small, since we're squashing the distance into {0, 1}, so I thought of using the same architecture but with triplet loss instead.
I1 = Input(shape=image_shape)
I2 = Input(shape=image_shape)
res_m_1 = ResNet50(include_top=False, weights='imagenet', input_tensor=I1, pooling='avg')
res_m_2 = ResNet50(include_top=False, weights='imagenet', input_tensor=I2, pooling='avg')
x1 = res_m_1.output
x2 = res_m_2.output
# x = Flatten()(x) or use this one if not using any pooling layer
distance = Lambda( lambda tensors : K.abs( tensors[0] - tensors[1] )) ([x1,x2] )
final_output = Dense(1,activation='sigmoid')(distance)
siamese_model = Model(inputs=[I1,I2], outputs=final_output)
siamese_model.compile(loss='binary_crossentropy', optimizer=Adam(), metrics=['acc'])
siamese_model.fit_generator(train_gen,steps_per_epoch=1000,epochs=10,validation_data=validation_data)
So how can I change it to use the triplet loss function? What adjustments need to be made here to get this working? One change will be that I'll have to calculate
res_m_3 = ResNet50(include_top=False, weights='imagenet', input_tensor=I2, pooling='avg')
x3 = res_m_3.output
One thing I found in the TensorFlow docs is the triplet semi-hard loss, given as:
tfa.losses.TripletSemiHardLoss()
As shown in the paper, the best results are from triplets known as "Semi-Hard". These are defined as triplets where the negative is farther from the anchor than the positive, but still produces a positive loss. To efficiently find these triplets we utilize online learning and only train from the Semi-Hard examples in each batch.
Another implementation of Triplet Loss which I found on Kaggle is: Triplet Loss Keras
Which one should I use and most importantly, HOW?
P.S: People also use something like: x = Lambda(lambda x: K.l2_normalize(x,axis=1))(x) after model.output. Why is that? What is this doing?
Following this answer of mine, and keeping the role of TripletSemiHardLoss in mind, we could do the following:
import tensorflow as tf
import tensorflow_addons as tfa
import tensorflow_datasets as tfds
from tensorflow.keras import models, layers

BATCH_SIZE = 32
LATENT_DIM = 128

def _normalize_img(img, label):
    img = tf.cast(img, tf.float32) / 255.
    return (img, label)

train_dataset, test_dataset = tfds.load(name="mnist", split=['train', 'test'], as_supervised=True)

# Build your input pipelines
train_dataset = train_dataset.shuffle(1024).batch(BATCH_SIZE)
train_dataset = train_dataset.map(_normalize_img)
test_dataset = test_dataset.batch(BATCH_SIZE)
test_dataset = test_dataset.map(_normalize_img)

inputs = layers.Input(shape=(28, 28, 1))
resNet50 = tf.keras.applications.ResNet50(include_top=False, weights=None, input_tensor=inputs, pooling='avg')
outputs = layers.Dense(LATENT_DIM, activation=None)(resNet50.output)  # No activation on final dense layer
outputs = layers.Lambda(lambda x: tf.math.l2_normalize(x, axis=1))(outputs)  # L2 normalize embeddings
siamese_model = models.Model(inputs=inputs, outputs=outputs)

# Compile the model
siamese_model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss=tfa.losses.TripletSemiHardLoss())

# Train the network
history = siamese_model.fit(
    train_dataset,
    epochs=3)
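After training, siamese_model maps an image to an L2-normalized 128-dimensional embedding (that is what the Lambda layer does, which is also what the P.S. in the question is asking about: on unit vectors, Euclidean distance and cosine similarity are monotonically related, which stabilizes the distance-based loss). Pairs can then be compared by Euclidean distance; a minimal sketch, continuing from the code above, with hypothetical batches img_a and img_b:

# img_a, img_b: batches of shape (n, 28, 28, 1), normalized like the training data
emb_a = siamese_model(img_a)
emb_b = siamese_model(img_b)
dist = tf.norm(emb_a - emb_b, axis=1)  # in [0, 2] for unit vectors; small distance -> likely same class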

Keras Sequential Model accuracy is bad. Model is Ignoring/neglecting a class

Little background: I'm making a simple rock-paper-scissors image classifier. Basically, I want the image classifier to be able to distinguish between rock, paper, and scissors images.
Problem: the program works amazingly for two of the classes, rock and paper, but completely fails whenever it is given a scissors test image. I've tried increasing my training data and a few other things, but no luck. I was wondering if anyone has any ideas on how to fix this.
Sidenote: I suspect it also has something to do with overfitting. I say this because the model has about 92% accuracy on the training data but only 55% accuracy on the test data.
import numpy as np
import os
import cv2
import random
import tensorflow as tf
from tensorflow import keras

CATEGORIES = ['rock', 'paper', 'scissors']
IMG_SIZE = 400  # The size of the images that your neural network will use
CLASS_SIZE = len(CATEGORIES)
TRAIN_DIR = "../Train/"

def loadData(directoryPath):
    data = []
    for category in CATEGORIES:
        path = os.path.join(directoryPath, category)
        class_num = CATEGORIES.index(category)
        for img in os.listdir(path):
            try:
                img_array = cv2.imread(os.path.join(path, img), cv2.IMREAD_GRAYSCALE)
                new_array = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE))
                data.append([new_array, class_num])
            except Exception as e:
                pass
    return data

training_data = loadData(TRAIN_DIR)
random.shuffle(training_data)

X = []  # features
y = []  # labels
for i in range(len(training_data)):
    features = training_data[i][0]
    label = training_data[i][1]
    X.append(features)
    y.append(label)
X = np.array(X)
y = np.array(y)
X = X / 255.0

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(IMG_SIZE, IMG_SIZE)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(CLASS_SIZE)
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.fit(X, y, epochs=25)

TEST_DIR = "../Test/"
test_data = loadData(TEST_DIR)
random.shuffle(test_data)
test_images = []
test_labels = []
for i in range(len(test_data)):
    features = test_data[i][0]
    label = test_data[i][1]
    test_images.append(features)
    test_labels.append(label)
test_images = np.array(test_images)
test_images = test_images / 255.0
test_labels = np.array(test_labels)

test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
print('\nTest accuracy:', test_acc)

# Saving the model
model_json = model.to_json()
with open("model.json", "w") as json_file:
    json_file.write(model_json)
model.save_weights("model.h5")
print("Saved model to disk")
model.save('CNN.model')
If you want to create a massive amount of training data fast: https://github.com/ThomasStuart/RockPaperScissorsMachineLearning/blob/master/source/0.0-collectMassiveData.py
Thanks in advance to any help or ideas :)
You can simply test for overfitting by adding two additional layers: one dropout layer and one dense layer. Also be sure to shuffle your training data after each epoch, so the model keeps its learning general. And if I see this correctly, you are doing a multi-class classification but do not have a softmax activation in the last layer; I would recommend using it (and then setting from_logits=False in the loss, since the model would now output probabilities instead of logits).
With dropout and softmax your model would look like this:
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(IMG_SIZE, IMG_SIZE)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dropout(0.4),  # 0.4 means 40% of the neurons will be randomly dropped
    keras.layers.Dense(CLASS_SIZE, activation="softmax")
])
As a last piece of advice: CNNs generally perform much better on tasks like this. You might want to switch to a CNN for even better performance.
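For illustration, a minimal CNN sketch along those lines (assuming the same grayscale IMG_SIZE images and CLASS_SIZE from the question; conv layers need an explicit channel axis, so the arrays would first be expanded with X = X[..., None]):

model = keras.Sequential([
    keras.layers.Input(shape=(IMG_SIZE, IMG_SIZE, 1)),   # grayscale images with a channel axis
    keras.layers.Conv2D(32, 3, activation='relu'),
    keras.layers.MaxPooling2D(),
    keras.layers.Conv2D(64, 3, activation='relu'),
    keras.layers.MaxPooling2D(),
    keras.layers.Flatten(),
    keras.layers.Dropout(0.4),
    keras.layers.Dense(CLASS_SIZE, activation='softmax')  # outputs probabilities
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['accuracy'])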