Define custom loss with class weight argument in TensorFlow

I want to use focal loss for my imbalanced tabular data. I tried the focal loss from the TensorFlow API (BinaryFocalCrossentropy), but it is not working. My goal is to use focal loss with class weights as a custom loss function. There are other ways to do this, but I want to use the TensorFlow API.
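import random as python_random
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.models import Sequential

# Fix the seeds for reproducibility.
np.random.seed(16)
python_random.seed(17)
tf.random.set_seed(18)

model = Sequential()
model.add(LSTM(128, input_shape=(50, 5), return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(1, activation="sigmoid"))
model.compile(
    # The output layer already applies a sigmoid, so from_logits must be False.
    loss=tf.keras.losses.BinaryFocalCrossentropy(
        apply_class_balancing=True, gamma=2, from_logits=False),
    metrics=[tf.keras.metrics.AUC(name="auc"), tf.keras.metrics.Recall()],
    optimizer=adam,  # optimizer instance defined elsewhere (not shown)
)
model.summary()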
I got the following error and am not sure why. I would appreciate any suggestions.
TypeError: BinaryFocalCrossentropy.__init__() got an unexpected keyword argument 'apply_class_balancing'
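The TypeError says that the installed BinaryFocalCrossentropy constructor does not accept apply_class_balancing. If upgrading is not an option, the class-balanced focal loss can be written by hand. The following is a minimal, dependency-free sketch of the formula only, assuming the usual weighting (alpha for the positive class, 1 - alpha for the negative class). The function names are hypothetical, and a real Keras loss would need the same arithmetic expressed with TensorFlow ops.

```python
import math


def binary_focal_loss(y_true, p_pred, gamma=2.0, alpha=0.25, eps=1e-7):
    """Class-balanced focal loss for a single example.

    y_true: 0 or 1. p_pred: predicted probability of class 1 (after sigmoid).
    """
    p = min(max(p_pred, eps), 1.0 - eps)              # clip to avoid log(0)
    p_t = p if y_true == 1 else 1.0 - p               # probability of the true class
    alpha_t = alpha if y_true == 1 else 1.0 - alpha   # class-balancing weight
    # (1 - p_t) ** gamma down-weights examples that are already well classified.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(labels, probs, **kwargs):
    """Average the per-example focal loss over a batch."""
    losses = [binary_focal_loss(y, p, **kwargs) for y, p in zip(labels, probs)]
    return sum(losses) / len(losses)


easy = binary_focal_loss(1, 0.9)  # confident and correct: tiny loss
hard = binary_focal_loss(1, 0.1)  # confident and wrong: much larger loss
print(easy, hard)
```

With gamma=0 and alpha=1 the expression reduces to ordinary binary cross-entropy on the positive class, which is a convenient sanity check when porting it.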

Related

weighted loss function for multilabel classification

I am working on a multilabel classification problem for images. I have 5 classes and use a sigmoid for the last classification layer. My data is imbalanced as a result of the multilabel setup, and I thought I could use:
tf.nn.weighted_cross_entropy_with_logits( labels, logits, pos_weight, name=None)
However, I don't know how to get logits from my model. I also think I shouldn't use a sigmoid in the last layer, since this loss function applies the sigmoid to the logits itself.
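To make the role of the logits concrete, here is a small plain-Python sketch of the documented formula behind tf.nn.weighted_cross_entropy_with_logits: labels * -log(sigmoid(logits)) * pos_weight + (1 - labels) * -log(1 - sigmoid(logits)). The logits are simply the outputs of the last layer before any sigmoid. The helper names and the sample values are made up for illustration, and this naive version is not numerically stable for large logits.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def weighted_bce_with_logits(label, logit, pos_weight):
    """Per-element weighted cross-entropy; the sigmoid is applied inside the loss."""
    p = sigmoid(logit)
    return -(pos_weight * label * math.log(p) + (1 - label) * math.log(1.0 - p))


# One sample with 5 independent labels (multi-label) and its raw model outputs.
labels = [1, 0, 0, 1, 0]
logits = [2.0, -1.0, 0.5, -0.3, -2.0]
per_label = [weighted_bce_with_logits(y, z, pos_weight=3.0)
             for y, z in zip(labels, logits)]
print(per_label)
```

A pos_weight above 1 makes errors on positive labels cost more, which is the usual way to counter rare positives.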
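First of all, I suggest you have a look at the TensorFlow tutorial on classification with imbalanced data. However, keep in mind that this tutorial covers binary classification and uses a sigmoid as the activation function of the last dense layer. A softmax activation is the usual choice for multi-class classification, where each sample has exactly one label; when several labels can be active at once (multi-label), each class typically keeps its own sigmoid.
The softmax function normalizes a set of N real numbers into a probability distribution, so the outputs sum to 1.
For K = 2, softmax reduces to the sigmoid: the softmax probability of the first class equals the sigmoid of the difference between the two logits.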
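The K = 2 relationship can be checked numerically. This is a small sketch in plain Python; the two logit values are arbitrary.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


z0, z1 = 1.3, -0.4
p_softmax = softmax([z0, z1])[0]  # probability of class 0 under softmax
p_sigmoid = sigmoid(z0 - z1)      # sigmoid of the logit difference
print(p_softmax, p_sigmoid)       # the two values agree up to rounding
```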
I don't know your model, but you could create something like this (following the tutorial):
import tensorflow as tf

# The last layer has no activation, so the model outputs raw logits.
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation=None)
])
To obtain the predictions you could do:
predictions = model(x_train[:1]).numpy() # obtains the prediction logits
tf.nn.softmax(predictions).numpy() # converts the logits to probabilities
In order to train you can define the following loss, compile the model, and train:
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)
Now, since you have an imbalanced dataset, you need to add weights. If you look at the documentation of SparseCategoricalCrossentropy, you can see that the __call__ method has an optional parameter sample_weight:
Optional sample_weight acts as a coefficient for the loss. If a scalar
is provided, then the loss is simply scaled by the given value. If
sample_weight is a tensor of size [batch_size], then the total loss
for each sample of the batch is rescaled by the corresponding element
in the sample_weight vector.
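One way to build such a sample_weight vector is to look up a per-class weight for every label. The sketch below uses the "balanced" heuristic, n_samples / (n_classes * class_count), which is the formula scikit-learn's compute_class_weight uses. The helper name and the toy labels are made up for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency ('balanced' heuristic)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * count) for c, count in counts.items()}


labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]           # 8 negatives, 2 positives
class_weights = balanced_class_weights(labels)     # rare class gets the larger weight
sample_weight = [class_weights[y] for y in labels]  # one weight per sample
print(class_weights)
```

The resulting list, converted to an array, has one entry per sample, which is the [batch_size] shape the quoted documentation describes.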
I suggest you have a look at this answer if you have doubts on how to proceed. I think it answers perfectly what you want to achieve.
I also find that this tutorial explains the multi-label classification problem well.

LSTM training difficulties

I want to train an LSTM model on tabular time series data. My data shape is
((7342689, 50, 5), (7342689,))
I am having a hard time getting the training loss to behave. Initially I tried the default learning rate, but it didn't help. My class labels are severely skewed, so I added focal loss and class weights to handle the class imbalance. I also tried adding one more layer with 50 neurons, but then the loss started to increase instead of decrease. I would appreciate your suggestions. Thanks!
Here is my current model architecture:
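import numpy as np
import tensorflow_addons as tfa
from sklearn.utils import class_weight
from tensorflow import keras
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.optimizers import Adam

adam = Adam(learning_rate=0.0001)
model = keras.Sequential()
model.add(LSTM(100, input_shape=(50, 5)))
model.add(Dropout(0.5))
model.add(Dense(1, activation="sigmoid"))
model.compile(loss=tfa.losses.SigmoidFocalCrossEntropy(),
              metrics=[keras.metrics.binary_accuracy],
              optimizer=adam)
model.summary()
# Weight each class inversely to its frequency ('balanced' heuristic).
classes = np.unique(y_train)
class_weights = dict(zip(classes, class_weight.compute_class_weight('balanced', classes=classes, y=y_train)))
history = model.fit(X_train, y_train, batch_size=64, epochs=50, class_weight=class_weights)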
The loss first decreased and then increased, which may be because the optimization process got stuck in a local optimum. You could try reducing the learning rate and increasing the number of epochs.
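Rather than picking a smaller learning rate by hand, the rate can be lowered automatically when the loss stops improving. Keras provides the ReduceLROnPlateau callback for this. The following is only a simplified plain-Python sketch of the idea (no min_delta or cooldown), with made-up loss values and a hypothetical function name.

```python
def reduce_on_plateau(losses, lr=1e-4, factor=0.5, patience=2, min_lr=1e-6):
    """Return the learning rate used in each epoch.

    The rate is multiplied by `factor` after `patience` consecutive epochs
    without a new best loss, and never drops below `min_lr`.
    """
    best = float("inf")
    wait = 0
    schedule = []
    for loss in losses:
        schedule.append(lr)  # learning rate in effect for this epoch
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)
                wait = 0
    return schedule


# Loss improves for three epochs, then drifts upward.
losses = [0.70, 0.60, 0.55, 0.56, 0.58, 0.57, 0.61]
schedule = reduce_on_plateau(losses)
print(schedule)
```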

efficientnet.tfkeras vs tf.keras.applications.efficientnet

I am trying to use EfficientNet to train on my custom dataset.
With all other code, data, and configuration the same, I find that
efficientnet.tfkeras.EfficientNetB0 gives ~90% training/prediction accuracy, while tf.keras.applications.efficientnet.EfficientNetB0 only gives ~70% accuracy.
I assumed both would be the same implementation of EfficientNet. Am I missing something here?
I am using the latest efficientnet and TensorFlow 2.3.0.
import efficientnet.tfkeras
import tensorflow as tf
from tensorflow.keras import layers as L

# `strategy` (a tf.distribute strategy) and IMAGE_SIZE are defined elsewhere (not shown).
with strategy.scope():
    model = tf.keras.Sequential([
        efficientnet.tfkeras.EfficientNetB0(  # or tf.keras.applications.efficientnet.EfficientNetB0
            input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3),
            weights='imagenet',
            include_top=False
        ),
        L.GlobalAveragePooling2D(),
        L.Dense(1, activation='sigmoid')
    ])
    model.compile(
        optimizer='adam',
        loss='binary_crossentropy',
        metrics=['binary_crossentropy']
    )
model.summary()
I ran into the same problem with EfficientNetB4 and found the following:
The total number of parameters is not equal. The trainable parameters are equal, but the non-trainable parameters are not: the efficientnet.tfkeras model has 7 fewer non-trainable parameters than the tf.keras.applications model.
The number of layers is not equal: the efficientnet.tfkeras model has fewer layers than the tf.keras.applications model.
The differing layers are at the very beginning. The most noteworthy are the normalization and rescaling layers, which are present in the tf.keras.applications model but not in the efficientnet.tfkeras model. You can observe this yourself using the model.summary() method.
Applying these layers directly, using model.layers[i](array), shows that they rescale the image by dividing it by 255 and then normalize it according to:
(input_image - IMAGENET_MEAN) / square_root(IMAGENET_STD)
Thus, the image normalization is built into the model. If you also perform this normalization yourself on the input image, the image is normalized twice, which results in extremely small pixel values, and the model will have a hard time learning.
TLDR: Do not normalize the input image yourself, because normalization is built into the tf.keras.applications model; input images should have values in the range 0-255.
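The effect of normalizing twice can be illustrated on a single channel. The sketch below follows the formula quoted in this answer and assumes the commonly used ImageNet red-channel statistics (mean 0.485, std 0.229). The function name is hypothetical and this is not the library's code.

```python
import math

# Assumed ImageNet statistics for the red channel (illustrative values).
IMAGENET_MEAN_R = 0.485
IMAGENET_STD_R = 0.229


def builtin_preprocess(pixel):
    """Mimic the built-in layers described above: rescale by 1/255, then normalize."""
    scaled = pixel / 255.0
    return (scaled - IMAGENET_MEAN_R) / math.sqrt(IMAGENET_STD_R)


raw_pixels = [0.0, 128.0, 255.0]                 # what the model expects
prescaled = [p / 255.0 for p in raw_pixels]      # user already divided by 255

correct = [builtin_preprocess(p) for p in raw_pixels]
double = [builtin_preprocess(p) for p in prescaled]

correct_spread = max(correct) - min(correct)
double_spread = max(double) - min(double)
print(correct_spread, double_spread)
```

In the pre-scaled case every pixel lands in a range 255 times narrower than intended, so the network sees almost no contrast between dark and bright pixels.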

Multivariate Binary Classification Prediction Tensorflow 2 LSTM

I am currently working on the implementation of an LSTM to predict a binary outcome (either 0 or 1) for a given set of normed scaled features.
self._regressor.add(LSTM(units=60, activation='relu', return_sequences=True, input_shape=(data.x_train.shape[1], data.x_train.shape[2])))
self._regressor.add(Dropout(0.2))
self._regressor.add(LSTM(units=60, activation='relu', return_sequences=True))
self._regressor.add(Dropout(0.3))
self._regressor.add(LSTM(units=80, activation='relu', return_sequences=True))
self._regressor.add(Dropout(0.4))
self._regressor.add(LSTM(units=120, activation='relu'))
self._regressor.add(Dropout(0.5))
#this is the output layer
self._regressor.add(Dense(units=1, activation='sigmoid'))
self._logger.info("TensorFlow Summary")
self._regressor.summary(print_fn=self._logger.info)  # summary() returns None, so route its lines to the logger
#run regressor
self._regressor.compile(optimizer='adam', loss="binary_crossentropy", metrics=['accuracy'])
self._regressor.fit(data.x_train, data.y_train, epochs=1, batch_size=32)
data.y_pred_scaled = self._regressor.predict(data.x_test)
data.y_pred = self._scaler_target.inverse_transform(data.y_pred_scaled)
scores = self._regressor.evaluate(data.x_test, data.y_test, verbose=0)
My issue here is that the output of my prediction has a range of max: 0.5188445 and min: 0.518052, implying to me that all of my classifications are positive (which is definitely incorrect). I even tried predict_classes and this yielded an array of 1's.
I am struggling to find where my issue is despite numerous searches online. I have ensured that my final output layer consists of a sigmoid function as well as included the loss as the binary_crossentropy also. My data has been scaled using sklearn's MinMaxScaler with feature_range=(0,1). I am running my code through a debugger and everything up to the self._regressor.fit looks good so far. I am just struggling with quantifying the output of the predictions.
Any help would be greatly appreciated.
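The all-ones result from predict_classes follows directly from the narrow output range: for a sigmoid output, class labels are obtained by thresholding the probability at 0.5, and every prediction here sits just above that. A plain-Python sketch, using the minimum and maximum reported above plus one made-up value in between:

```python
def to_classes(probabilities, threshold=0.5):
    """Convert sigmoid outputs to class labels by thresholding."""
    return [1 if p > threshold else 0 for p in probabilities]


# Reported min and max, with a hypothetical value in between.
preds = [0.518052, 0.5184, 0.5188445]

print(to_classes(preds))                    # every value exceeds 0.5
print(to_classes(preds, threshold=0.5185))  # a different threshold splits them
print(max(preds) - min(preds))              # the outputs are nearly constant
```

Outputs this close together suggest the network is emitting roughly the same value for every input, so the threshold is a symptom rather than the cause.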

How to do Gradient Normalization using Tensorflow LazyAdamOptimizer in functional Keras Model?

I am using a bidirectional RNN in Keras and need to use TensorFlow's LazyAdamOptimizer. I need to do gradient normalization. How can I implement gradient normalization with TensorFlow's LazyAdamOptimizer and then continue to use the functional Keras model?
I am training an unsupervised RNN to predict an input sequence of length 10. The problem is that I am using a Keras functional model. Because of the sparsity of the embedding layer, I need to use TensorFlow's LazyAdamOptimizer, which is not a default optimizer in Keras. With a default Keras optimizer, I can do gradient normalization just by setting the argument 'clipnorm=1' in the optimizer. Because I am using LazyAdam, I need to do this in TensorFlow and then pass it back to my Keras model, but I can't get the code to work.
#model architecture
model_input = Input(shape=(seq_len, ))
embedding_a = Embedding(len(port_fwd_dict), 50, input_length=seq_len, mask_zero=True)(model_input)
lstm_a = Bidirectional(GRU(25, return_sequences=True, implementation=2, reset_after=True, recurrent_activation='sigmoid'), merge_mode="concat")(embedding_a)
dropout_a = Dropout(0.2)(lstm_a)
lstm_b = Bidirectional(GRU(25, return_sequences=False, activation="relu", implementation=2, reset_after=True, recurrent_activation='sigmoid'), merge_mode="concat")(dropout_a)
dropout_b = Dropout(0.2)(lstm_b)
dense_layer = Dense(100, activation="linear")(dropout_b)
dropout_c = Dropout(0.2)(dense_layer)
model_output = Dense(len(port_fwd_dict)-1, activation="softmax")(dropout_c)
# trying to implement gradient normalization
optimizer = tf.contrib.opt.LazyAdamOptimizer()
optimizer = tf.contrib.estimator.clip_gradients_by_norm(optimizer, 1)
loss = tf.reduce_mean(categorical_crossentropy(model_input, model_output))
train_op = optimizer.minimize(loss, tf.train.get_global_step())
model = Model(inputs=model_input, outputs=model_output)
model.compile(optimizer=train_op, loss='categorical_crossentropie', metrics = [ 'categorical_accuracy'])
history = model.fit(X_train, Y_train, epochs=epochs, batch_size=batch_size, validation_split=validation_split, class_weight = 'auto')
I get the following error: NameError: name 'categorical_crossentropy' is not defined
But even if this error is solved, I do not know whether this code will work, because I need to call the Keras function model.compile, which requires a loss to be specified. When I define the loss in the TensorFlow part above instead, it does not work.
I need a way to do gradient normalization and still use my normal Keras functional model.
Maybe you can try my implementation of a lazy optimizer:
https://github.com/bojone/keras_lazyoptimizer
It is a pure Keras implementation that wraps an existing optimizer to make a lazy version of it.
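For reference, the operation that clipnorm=1 requests is simple: if a gradient's L2 norm exceeds the limit, the gradient is scaled down so its norm equals the limit; otherwise it is left alone. A plain-Python sketch of that arithmetic for one gradient vector (the function name is hypothetical, and a real optimizer would apply this to each gradient tensor):

```python
import math


def clip_by_norm(grad, clip_norm):
    """Rescale `grad` so its L2 norm does not exceed `clip_norm`."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= clip_norm or norm == 0.0:
        return list(grad)                 # already within the limit
    scale = clip_norm / norm
    return [g * scale for g in grad]      # same direction, norm == clip_norm


clipped = clip_by_norm([3.0, 4.0], 1.0)      # norm 5, scaled down to norm 1
unchanged = clip_by_norm([0.3, 0.4], 1.0)    # norm 0.5, left as is
print(clipped, unchanged)
```

The direction of the gradient is preserved; only its length is limited.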