How to define a combined loss function in Keras? - tensorflow

My model architecture has two outputs, and I want to train it with a different loss for each output: MSE for one and cross-entropy for the other. At first, I used two Keras losses:
model1.compile(loss=['mse','sparse_categorical_crossentropy'], metrics = ['mse','accuracy'], optimizer='adam')
This works, but the cross-entropy loss is very unstable: one epoch reports 74% accuracy and the next drops to 32%. I'm confused about why that happens.
Now I define a custom loss:
from tensorflow.keras.losses import mean_squared_error, binary_crossentropy

def my_custom_loss(y_true, y_pred):
    mse = mean_squared_error(y_true[0], y_pred[0])
    crossentropy = binary_crossentropy(y_true[1], y_pred[1])
    return mse + crossentropy
But it's not working; the total loss becomes negative.

It is hard to judge the issue from the information given. One reason might be a batch size that is too small or a learning rate that is too high, making the training unstable. I also wonder why you use sparse_categorical_crossentropy in the top example and binary_crossentropy in the lower one. How many classes do you actually have?
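If the goal is just to balance two per-output losses, a minimal sketch (the two-output model and the names out_reg and out_cls are hypothetical, not the poster's architecture) can keep the built-in losses and scale them with loss_weights instead of writing a single custom function:

import tensorflow as tf

# Hypothetical model with a regression head and a classification head.
inputs = tf.keras.Input(shape=(16,))
x = tf.keras.layers.Dense(64, activation='relu')(inputs)
out_reg = tf.keras.layers.Dense(1, name='out_reg')(x)
out_cls = tf.keras.layers.Dense(10, activation='softmax', name='out_cls')(x)
model = tf.keras.Model(inputs, [out_reg, out_cls])

# One loss per named output; loss_weights scales each term before Keras sums
# them into the total loss that is minimized.
model.compile(
    optimizer='adam',
    loss={'out_reg': 'mse', 'out_cls': 'sparse_categorical_crossentropy'},
    loss_weights={'out_reg': 1.0, 'out_cls': 0.5},
    metrics={'out_reg': ['mse'], 'out_cls': ['accuracy']},
)

This also makes it easy to see each output's loss separately in the training logs, which helps when one of the two is unstable.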

Related

How to build a Neural Network in Keras using a custom loss function with datapoint-specific weights?

I want to train a neural network for a classification task in Keras with a TensorFlow backend, using a custom loss function. In my loss, I want to give different weights to different training examples: some datapoints I consider important and some less so. I want the loss function to take this into account and punish errors on important examples more than on less important ones.
I have already built my model:
import tensorflow as tf

input = tf.keras.Input(shape=(16,))
hidden_layer_1 = tf.keras.layers.Dense(5, kernel_initializer='glorot_uniform', activation='relu')(input)
output = tf.keras.layers.Dense(1, kernel_initializer='normal', activation='softmax')(hidden_layer_1)
model = tf.keras.Model(input, output)
model.compile(loss=custom_loss(input), optimizer='adam', run_eagerly=True, metrics=[tf.keras.metrics.Accuracy(), 'acc'])
and the current state of my loss function is:
def custom_loss(input):
    def loss(y_true, y_pred):
        return ...
    return loss
I'm struggling to implement the loss function in the way I explained above, mainly because I don't exactly know what input, y_pred and y_true are (KerasTensors, I know - but what is their content? And do they hold one training example or the whole batch?). I'd appreciate help with:
printing out the values of input, y_true and y_pred;
converting the input value to a numpy ndarray ([1,3,7] for example) so I can use the array to look up my weight for this specific training data point;
once I have my weight as a number (0.5 for example), implementing the loss computation in Keras. My loss for one training example should be 0 if the classification was correct and the weight if it was incorrect.
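No answer is posted here, but a common way to achieve per-example weighting without a closure is to pass sample_weight to model.fit. Below is a minimal sketch under that assumption (the data, the weighting rule, and the model sizes are made up for illustration):

import numpy as np
import tensorflow as tf

# Made-up data: 100 examples with 16 features, binary labels, and a
# per-example weight (important examples get a larger weight).
x = np.random.rand(100, 16).astype('float32')
y = np.random.randint(0, 2, size=(100, 1)).astype('float32')
sample_weights = np.where(y.flatten() == 1, 2.0, 0.5)  # assumed importance rule

inputs = tf.keras.Input(shape=(16,))
hidden = tf.keras.layers.Dense(5, activation='relu')(inputs)
outputs = tf.keras.layers.Dense(1, activation='sigmoid')(hidden)  # sigmoid, since a 1-unit softmax is always 1
model = tf.keras.Model(inputs, outputs)

# Keras multiplies each example's loss by its sample weight before averaging,
# so errors on important examples cost more.
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x, y, sample_weight=sample_weights, epochs=5, batch_size=16)

Inside a custom loss, y_true and y_pred are batch-sized tensors, which is why looking up per-example metadata from within the closure is awkward; feeding the weights in through fit sidesteps that.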

Where do class_weights or a weighted loss penalize the network?

I am working on a semantic segmentation project with multiclass data that is highly imbalanced. I looked into handling this during training via model.fit parameters, namely class_weight or sample_weight.
I can implement this using a class_weight dictionary such as
{0: 1, 1: 10, 2: 15}
I also saw a method of applying weights inside the loss function.
But at what point do these weights take effect?
If class_weights are used, where does the penalty apply? I already have a kernel_regularizer for each layer, so if my classes are penalized based on the class weights, will that penalize the output of each layer (y = Wx + b) or only the final layer?
Likewise, if I use a weighted loss function, is the penalty applied only at the final layer before the loss calculation, or at each layer with the final loss computed afterwards?
Any explanation on this would be very useful.
The class_weights you mention in your dictionary are there to account for your imbalanced data. They never change; they only increase the penalty for misclassified instances of minority classes (that way your network pays more attention to them, and the gradients treat one 'Class2' instance as if it were 15 times more important than one 'Class0' instance).
The kernel_regularizer you mention is added to your loss function and penalizes large weight norms for the weight matrices throughout the network (if you use kernel_regularizer = tf.keras.regularizers.l1(0.01) in a Dense layer, it only affects that layer). So that is a different kind of weight that has nothing to do with classes, only with the weights inside your network. Your eventual loss will be something like loss = cross_entropy + a * norm(weight_matrix), so the network has the additional task of keeping its weight norms low while minimizing the classification loss (cross-entropy).
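To make the mechanics concrete, here is a minimal sketch (the toy model, data, and class count are illustrative, not the poster's segmentation network). class_weight rescales only the per-example loss term, while kernel_regularizer adds a separate weight-norm term to the same total loss:

import numpy as np
import tensorflow as tf

num_classes = 3  # assumed, matching the {0: 1, 1: 10, 2: 15} example

# Dummy data just to make the sketch runnable.
x_train = np.random.rand(64, 32).astype('float32')
y_train = np.random.randint(0, num_classes, size=(64,))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(32,),
                          kernel_regularizer=tf.keras.regularizers.l2(0.01)),  # adds a * ||W||^2 to the total loss
    tf.keras.layers.Dense(num_classes, activation='softmax'),
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# class_weight rescales the cross-entropy of each example according to its true
# class; it never touches intermediate layer outputs or the L2 term.
model.fit(x_train, y_train, class_weight={0: 1, 1: 10, 2: 15}, epochs=2, batch_size=16)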

RNN Text Generation: How to balance training/test loss with validation loss?

I'm working on a short project that involves implementing a character RNN for text generation. My model uses a single LSTM layer with varying units (messing around with between 50 and 500), dropout at a rate of 0.2, and softmax activation. I'm using RMSprop with a learning rate of 0.01.
My issue is that I can't find a good way to characterize the validation loss. I'm using a validation split of 0.3 and I'm finding that the validation loss starts to become constant after only a few epochs (maybe 2-5 or so) while the training loss keeps decreasing. Does validation loss carry much weight in this sort of problem? The purpose of the model is to generate new strings, so quantifying the validation loss with other strings seems... pointless?
It's hard for me to really find the best model since qualitatively I get the sense that the best model is trained for more epochs than it takes for the validation loss to stop changing but also for fewer epochs than it takes for the training loss to start increasing. I would really appreciate any advice you have regarding this problem as well as any general advice about RNN's for text generation, especially regarding dropout and overfitting. Thanks!
This is the code for fitting the model for every epoch. The callback is a custom callback that just prints a few tests. I'm now realizing that history_callback.history['loss'] is probably the training loss isn't it...
for i in range(num_epochs):
    history_callback = model.fit(x, y,
                                 batch_size=128,
                                 epochs=1,
                                 callbacks=[print_callback],
                                 validation_split=0.3)
    loss_history.append(history_callback.history['loss'])
    validation_loss_history.append(history_callback.history['val_loss'])
My intention for this model isn't to replicate sentences from the training data; rather, I'd like to generate sentences from the same distribution that I'm training on.
Yes, history_callback.history['loss'] is the training loss and history_callback.history['val_loss'] is the validation loss.
And yes, validation loss carries weight in this sort of problem: you don't just want to replicate the sentences seen during training, you want the model to learn the patterns in the training data and generate new sentences when given new input.
From the information in the question and the insights from the comments (thanks to Brian Bartoldson), it appears that your model is overfitting. In addition to EarlyStopping and dropout, you can try the techniques below to mitigate the overfitting problem.
3.a. Shuffle the data by using shuffle=True in model.fit (code is shown below).
3.b. Use recurrent_dropout. For example, setting recurrent_dropout=0.2 on a recurrent layer (LSTM) randomly drops 20% of the units used in that layer's recurrent (state-to-state) transformation during training.
3.c. Use regularization. You can try l1 or l1_l2 regularization via the kernel_regularizer, recurrent_regularizer, bias_regularizer, and activity_regularizer arguments of the LSTM layer.
Sample code using shuffle, early stopping, recurrent_dropout, and regularization is shown below:
import tensorflow as tf
from tensorflow.keras.regularizers import l2
from tensorflow.keras.models import Sequential

model = Sequential()
regularizer = l2(0.001)
model.add(tf.keras.layers.LSTM(units=50, activation='relu',
                               kernel_regularizer=regularizer,
                               recurrent_regularizer=regularizer,
                               bias_regularizer=regularizer,
                               activity_regularizer=regularizer,
                               dropout=0.2, recurrent_dropout=0.3))
callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=15)
history_callback = model.fit(x, y,
                             batch_size=128,
                             epochs=1,
                             callbacks=[print_callback, callback],
                             validation_split=0.3, shuffle=True)
Hope this helps. Happy Learning!

Dropout with densely connected layer

I am using a DenseNet model for one of my projects and am having some difficulties using regularization.
Without any regularization, both validation and training loss (MSE) decrease. The training loss drops faster though, resulting in some overfitting of the final model.
So I decided to use dropout to avoid overfitting. When using Dropout, both validation and training loss decrease to about 0.13 during the first epoch and remain constant for about 10 epochs.
After that both loss functions decrease in the same way as without dropout, resulting in overfitting again. The final loss value is in about the same range as without dropout.
So for me it seems like dropout is not really working.
If I switch to L2 regularization, though, I am able to avoid overfitting, but I would rather use dropout as a regularizer.
Now I am wondering if anyone has experienced this kind of behaviour?
I use dropout in both the dense block (bottleneck layer) and in the transition block (dropout rate = 0.5):
def bottleneck_layer(self, x, scope):
    with tf.name_scope(scope):
        x = Batch_Normalization(x, training=self.training, scope=scope + '_batch1')
        x = Relu(x)
        x = conv_layer(x, filter=4 * self.filters, kernel=[1, 1], layer_name=scope + '_conv1')
        x = Drop_out(x, rate=dropout_rate, training=self.training)
        x = Batch_Normalization(x, training=self.training, scope=scope + '_batch2')
        x = Relu(x)
        x = conv_layer(x, filter=self.filters, kernel=[3, 3], layer_name=scope + '_conv2')
        x = Drop_out(x, rate=dropout_rate, training=self.training)
        return x

def transition_layer(self, x, scope):
    with tf.name_scope(scope):
        x = Batch_Normalization(x, training=self.training, scope=scope + '_batch1')
        x = Relu(x)
        x = conv_layer(x, filter=self.filters, kernel=[1, 1], layer_name=scope + '_conv1')
        x = Drop_out(x, rate=dropout_rate, training=self.training)
        x = Average_pooling(x, pool_size=[2, 2], stride=2)
        return x
Without any regularization, both validation and training loss (MSE) decrease. The training loss drops faster though, resulting in some overfitting of the final model.
This is not overfitting.
Overfitting starts when your validation loss starts increasing while your training loss continues decreasing; that divergence of the two curves is its telltale signature.
(The classic illustration of this is in the Wikipedia entry on overfitting - different things may lie on the horizontal axis, e.g. depth or number of boosted trees, number of neural-net fitting iterations, etc.)
The (generally expected) difference between training and validation loss is something completely different, called the generalization gap:
An important concept for understanding generalization is the generalization gap, i.e., the difference between a model’s performance on training data and its performance on unseen data drawn from the same distribution.
where, practically speaking, validation data is unseen data indeed.
So for me it seems like dropout is not really working.
It can very well be the case - dropout is not expected to work always and for every problem.
Interesting problem,
I would recommend plotting the validation loss and the training loss to see whether it is really overfitting. If the validation loss stops changing while the training loss keeps dropping (you will also probably see a large gap between them), then it is overfitting.
If it is overfitting, try reducing the number of layers or the number of nodes (and then play a little with the dropout rate). Reducing the number of epochs could also help.
If you would like to use a different method instead of dropout, I would recommend the GaussianNoise layer.
Keras - https://keras.io/layers/noise/
TensorFlow - https://www.tensorflow.org/api_docs/python/tf/keras/layers/GaussianNoise
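A minimal sketch of that idea (the surrounding model is hypothetical; in a DenseNet you would place the layer wherever the dropout currently sits):

import tensorflow as tf

# GaussianNoise adds zero-mean noise during training only, acting as a
# regularizer, and is an identity op at inference time (like dropout).
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(64,)),
    tf.keras.layers.GaussianNoise(stddev=0.1),  # noise level is a tunable assumption
    tf.keras.layers.Dense(1),
])
model.compile(optimizer='adam', loss='mse')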

Tensorflow Polynomial Linear Regression curve fit

I have created this linear regression model using TensorFlow (Keras). However, I am not getting good results: the model tries to fit the points around a straight line. I believe fitting the points with a degree-n polynomial could give better results. I have googled how to change my model to polynomial regression using TensorFlow Keras, but could not find a good resource. Any recommendation on how to improve the prediction?
I have a large dataset. I shuffled it first and then split it into 80% training and 20% testing. The dataset is also normalized.
1) Building model:
def build_model():
    model = keras.Sequential()
    model.add(keras.layers.Dense(units=300, input_dim=32))
    model.add(keras.layers.Activation('sigmoid'))
    model.add(keras.layers.Dense(units=250))
    model.add(keras.layers.Activation('tanh'))
    model.add(keras.layers.Dense(units=200))
    model.add(keras.layers.Activation('tanh'))
    model.add(keras.layers.Dense(units=150))
    model.add(keras.layers.Activation('tanh'))
    model.add(keras.layers.Dense(units=100))
    model.add(keras.layers.Activation('tanh'))
    model.add(keras.layers.Dense(units=50))
    model.add(keras.layers.Activation('linear'))
    model.add(keras.layers.Dense(units=1))
    # sigmoid tanh softmax relu
    optimizer = tf.train.RMSPropOptimizer(0.001,
                                          decay=0.9,
                                          momentum=0.0,
                                          epsilon=1e-10,
                                          use_locking=False,
                                          centered=False,
                                          name='RMSProp')
    # optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
    model.compile(loss='mse',
                  optimizer=optimizer,
                  metrics=['mae'])
    return model

model = build_model()
model.summary()
2) Train the model:
class PrintDot(keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs):
        if epoch % 100 == 0:
            print('')
        print('.', end='')

EPOCHS = 500

# Store training stats
history = model.fit(train_data, train_labels, epochs=EPOCHS,
                    validation_split=0.2, verbose=1,
                    callbacks=[PrintDot()])
3) Plot training loss and validation loss:
4) Stop when results do not improve:
5) Evaluate the result
[loss, mae] = model.evaluate(test_data, test_labels, verbose=0)
#Testing set Mean Abs Error: 1.9020842795676374
6) Predict:
test_predictions = model.predict(test_data).flatten()
7) Prediction error:
Polynomial regression is linear regression with some additional input features that are polynomial functions of the original input features.
That is:
Let the original input features be (x1, x2, x3, ...).
Generate a set of polynomial features by adding transformations of the original features, for example (x1^2, x2^3, x1^3*x2, ...).
One may decide which functions to include depending on constraints such as intuition about correlation with the target values, computational resources, and training time.
Append these new features to the original input feature vector. The transformed input feature vector now has size len(x1, x2, x3, ...) + len(x1^2, x2^3, x1^3*x2, ...).
This updated set of input features (x1, x2, x3, x1^2, x2^3, x1^3*x2, ...) is then fed into the normal linear regression model; the ANN's architecture may be tuned again to get the best trained model. A short sketch of this feature expansion follows.
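A minimal sketch of that feature expansion, assuming scikit-learn is available (the degree, the dummy data, and the single-layer model are illustrative choices, not the poster's setup):

import numpy as np
import tensorflow as tf
from sklearn.preprocessing import PolynomialFeatures

# Dummy data standing in for the normalized 32-feature dataset.
x = np.random.rand(256, 32).astype('float32')
y = np.random.rand(256, 1).astype('float32')

# Expand the inputs with polynomial terms (degree 2: squares and pairwise products).
poly = PolynomialFeatures(degree=2, include_bias=False)
x_poly = poly.fit_transform(x)

# A plain linear model on the expanded features is polynomial regression
# in the original features.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(1, input_dim=x_poly.shape[1])
])
model.compile(optimizer='adam', loss='mse', metrics=['mae'])
model.fit(x_poly, y, epochs=5, validation_split=0.2, verbose=0)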
PS: I see that your network is huge while the number of inputs is only 32 - that is not a common scale of architecture. Even for this particular model, reducing the network to one or two hidden layers may help train a better model (a suggestion under the assumption that this dataset is similar to other commonly seen regression datasets).
I've actually created polynomial layers for TensorFlow 2.0, though these may not be exactly what you are looking for. If they are, you could use those layers directly or follow the procedure used there to create a more general layer: https://github.com/jloveric/piecewise-polynomial-layers