weighted loss function for multilabel classification - tensorflow

I am working on multilabel classification problem for images. I have 5 classes and I am using sigmoid for the last layer of classification. I have imbalanced data caused by multilabel problem and I thought I can use:
tf.nn.weighted_cross_entropy_with_logits( labels, logits, pos_weight, name=None)
However I don't know how to get logits from my model. I also think I shouldn't use sigmoid in the last layer since this loss function applies sigmoid to the logit.

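First of all, I suggest you have a look at the TensorFlow tutorial on classification with imbalanced data. Keep in mind, however, that this tutorial covers binary classification and uses a sigmoid as the activation of the last dense layer. A softmax activation is the usual choice for multi-class classification, where each sample has exactly one label; for multi-label classification, where several classes can be active at once, a sigmoid per class remains appropriate.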
The softmax function normalizes a set of N real numbers into a probability distribution such that they sum up to 1.
For K = 2, softmax reduces to a sigmoid: the softmax probability of one class equals the sigmoid of the difference between the two logits.
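As a quick numerical check of the K = 2 statement, here is a minimal sketch in plain Python (the helper names are illustrative, not part of TensorFlow) that compares a two-way softmax with a sigmoid of the logit difference:

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    """Softmax with the usual max-subtraction for numerical stability."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

z0, z1 = 0.3, 1.7  # arbitrary example logits
p = softmax([z0, z1])
# The softmax probability of class 1 should match sigmoid(z1 - z0).
print(p[1], sigmoid(z1 - z0))
```

With more than two classes this equivalence no longer holds, because softmax couples all outputs through the shared normalization.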
I don't know your model, but you could create something like this (following the tutorial):
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation=None)
])
To obtain the predictions you could do:
predictions = model(x_train[:1]).numpy() # obtains the prediction logits
tf.nn.softmax(predictions).numpy() # converts the logits to probabilities
In order to train you can define the following loss, compile the model, and train:
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)
Now, since you have an imbalanced dataset, you will want to add weights. If you look at the documentation of SparseCategoricalCrossentropy, you can see that the __call__ method has an optional parameter sample_weight:
Optional sample_weight acts as a coefficient for the loss. If a scalar
is provided, then the loss is simply scaled by the given value. If
sample_weight is a tensor of size [batch_size], then the total loss
for each sample of the batch is rescaled by the corresponding element
in the sample_weight vector.
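To make the quoted behaviour concrete, here is a small plain-Python sketch of per-sample rescaling. It is not the Keras implementation: the function names are hypothetical, and it deliberately stops before the final batch reduction.

```python
import math

def sparse_ce_from_logits(logits, label):
    """Cross-entropy of one sample: -log softmax(logits)[label]."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[label]

def weighted_losses(batch_logits, labels, sample_weight):
    """Rescale each per-sample loss by its entry in sample_weight."""
    return [w * sparse_ce_from_logits(z, y)
            for z, y, w in zip(batch_logits, labels, sample_weight)]

batch_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]  # made-up logits
labels = [0, 2]                                     # integer class ids
unweighted = weighted_losses(batch_logits, labels, [1.0, 1.0])
weighted = weighted_losses(batch_logits, labels, [1.0, 3.0])
print(unweighted, weighted)
```

In Keras, a vector like this can be passed either when calling the loss object (sample_weight=...) or through the sample_weight argument of model.fit.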
I suggest you have a look at this answer if you have doubts on how to proceed. I think it answers perfectly what you want to achieve.
Also I find that this tutorial explains pretty well the multi-label classification problem.
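For the multi-label case in the question, the loss named there, tf.nn.weighted_cross_entropy_with_logits, expects raw logits, which a final Dense layer with activation=None (as in the model above) provides. The sketch below reproduces the per-element formula from the TensorFlow documentation in plain Python so it can be inspected by hand. The helper names and the negatives/positives heuristic for pos_weight are assumptions for illustration, not something the original posts prescribe.

```python
import math

def weighted_ce_with_logits(label, logit, pos_weight):
    """Per-element loss:
    pos_weight * label * -log(sigmoid(x)) + (1 - label) * -log(1 - sigmoid(x)).
    """
    # log(1 + exp(-x)) = -log(sigmoid(x)), computed in a numerically stable way.
    softplus_neg = max(-logit, 0.0) + math.log1p(math.exp(-abs(logit)))
    return (1.0 - label) * logit + (1.0 + (pos_weight - 1.0) * label) * softplus_neg

def pos_weights_from_labels(label_rows):
    """One weight per class: #negatives / #positives (a common heuristic)."""
    n = len(label_rows)
    weights = []
    for c in range(len(label_rows[0])):
        pos = sum(row[c] for row in label_rows)
        weights.append((n - pos) / pos if pos else 1.0)
    return weights

# Toy multi-hot labels for 5 classes (made-up data).
labels = [[1, 0, 0, 0, 1], [0, 0, 0, 0, 1], [0, 1, 0, 0, 1], [0, 0, 1, 0, 0]]
pos_weight = pos_weights_from_labels(labels)

logits = [0.2, -1.0, 0.5, -2.0, 1.5]  # model output for the first sample
sample_loss = sum(weighted_ce_with_logits(y, x, w)
                  for y, x, w in zip(labels[0], logits, pos_weight))
print(pos_weight, sample_loss)
```

A pos_weight above 1 makes missed positives more expensive for that class, while a value below 1 does the opposite. At prediction time the logits still need a sigmoid to become per-class probabilities.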

Related

How to build a Neural Network in Keras using a custom loss function with datapoint-specific weight?

I want to train a Neural Network for a classification task in Keras using a TensorFlow backend with a custom loss function. In my loss, I want to give different weights to different training examples. I have some datapoints I consider important and some I do not consider as important. I want my loss function to take this into account and punish errors in important examples more than in less important ones.
I have already built my model:
input = tf.keras.Input(shape=(16,))
hidden_layer_1 = tf.keras.layers.Dense(5, kernel_initializer='glorot_uniform', activation='relu')(input)
output = tf.keras.layers.Dense(1, kernel_initializer='normal', activation='softmax')(hidden_layer_1)
model = tf.keras.Model(input, output)
model.compile(loss=custom_loss(input), optimizer='adam', run_eagerly=True, metrics = [tf.keras.metrics.Accuracy(), 'acc'])
and the current state of my loss function is:
def custom_loss(input):
    def loss(y_true, y_pred):
        return ...
    return loss
I'm struggling with implementing the loss function in the way I explained above, mainly because I don't exactly know what input, y_pred and y_true are (KerasTensors, I know - but what is the content? And is it for one training example only or for the whole batch?). I'd appreciate help with
printing out the values of input, y_true and y_pred
converting the input value to a numpy ndarray ([1,3,7] for example) so I can use the array to look up my weight for this specific training data point
once I have my weight as a number (0.5 for example), how do I implement the computation of the loss function in Keras? My loss for one training example should be 0 if the classification was correct and weight if it was incorrect.
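The loss described in the last item can be written down outside Keras as a plain-Python sketch. It assumes a binary task, a 0.5 decision threshold, and weights that have already been looked up per example; none of this comes from the question itself. In a Keras loss, y_true and y_pred hold a whole batch, so the same logic would apply element-wise.

```python
def weighted_zero_one_loss(y_true, y_prob, weights, threshold=0.5):
    """Per-example loss: 0 if the thresholded prediction is correct, else the weight."""
    losses = []
    for t, p, w in zip(y_true, y_prob, weights):
        predicted = 1 if p > threshold else 0
        losses.append(0.0 if predicted == t else w)
    return losses

# Three examples: correct, wrong (important), wrong (less important).
print(weighted_zero_one_loss([1, 0, 1], [0.9, 0.8, 0.2], [0.5, 2.0, 1.0]))
```

A loss of this form is piecewise constant, so its gradient is zero almost everywhere and gradient-based training cannot learn from it directly. A differentiable stand-in is to multiply a cross-entropy loss by the same per-example weights, for instance through sample_weight.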

How to calculate confidence score of a Neural Network prediction

I am using a deep neural network model (implemented in Keras) to make predictions. Something like this:
def make_model():
    model = Sequential()
    model.add(Conv2D(20,(5,5), activation = "relu"))
    model.add(MaxPooling2D(pool_size=(2,2)))
    model.add(Flatten())
    model.add(Dense(20, activation = "relu"))
    model.add(Lambda(lambda x: tf.expand_dims(x, axis=1)))
    model.add(SimpleRNN(50, activation="relu"))
    model.add(Dense(1, activation="sigmoid"))
    model.compile(loss = "binary_crossentropy", optimizer = adagrad, metrics = ["accuracy"])
    return model
model = make_model()
model.fit(x_train, y_train, validation_data = (x_validation,y_validation), epochs = 25, batch_size = 25, verbose = 1)
## Prediction:
prediction = model.predict_classes(x)
probabilities = model.predict_proba(x) # I assume these are the probabilities of the class being predicted
My problem is a binary classification problem. I wish to calculate the confidence score of each of these predictions, i.e. I wish to know: is my model 99% certain it is "0", or is it 58% certain it is "0"?
I have found some views on how to do it, but can't implement them. The approach I wish to follow says: "With classifiers, when you output you can interpret values as the probability of belonging to each specific class. You can use their distribution as a rough measure of how confident you are that an observation belongs to that class."
How should I predict with something like the above model so that I get its confidence about each prediction? I would appreciate some practical examples (preferably in Keras).
The softmax is a problematic way to estimate the confidence of the model's prediction.
There are a few recent papers about this topic.
You can look for "calibration" of neural networks in order to find relevant papers.
This is one example you can start with - https://arxiv.org/pdf/1706.04599.pdf
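One quantity commonly used in the calibration literature is the expected calibration error (ECE), which compares average confidence with observed accuracy inside confidence bins. The following is a minimal plain-Python sketch with equal-width bins; the function name and the binning details are my own choices, so treat it as an illustration rather than a reference implementation.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin-size-weighted average of |accuracy - mean confidence|."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Map a confidence in [0, 1] to a bin index; 1.0 goes to the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece

# Over-confident toy model: claims 95% but is right only half of the time.
overconfident = expected_calibration_error([0.95] * 4, [True, True, False, False])
# Well-calibrated toy model: claims 75% and is right 3 times out of 4.
calibrated = expected_calibration_error([0.75] * 4, [True, True, True, False])
print(overconfident, calibrated)
```

A value near 0 means the reported confidences match how often the model is actually right; larger values indicate over- or under-confidence.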
In Keras, there is a method called predict() that is available for both Sequential and Functional models. It will work fine in your case if you are using binary_crossentropy as your loss function and a final Dense layer with a sigmoid activation function.
Here is how to call it with one test data instance. With a single sigmoid output unit, mymodel.predict() returns one value per instance: the predicted probability p of class 1 (the probability of class 0 is then 1 - p). This value is the confidence score that you mentioned. You can further use np.where() as shown below to threshold it at 50% and determine the final class.
yhat_probabilities = mymodel.predict(mytestdata, batch_size=1)
yhat_classes = np.where(yhat_probabilities > 0.5, 1, 0).squeeze().item()
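To turn a single sigmoid output into "class plus confidence", something like the following plain-Python sketch can help. The function name and the 0.5 threshold are assumptions; the input stands for one value taken from the array returned by predict().

```python
def classify_with_confidence(prob_class_1, threshold=0.5):
    """Return (predicted class, probability the model assigns to that class)."""
    predicted = 1 if prob_class_1 > threshold else 0
    confidence = prob_class_1 if predicted == 1 else 1.0 - prob_class_1
    return predicted, confidence

for p in (0.99, 0.42, 0.01):
    print(p, classify_with_confidence(p))
```

For example, an output of 0.42 is read as class 0 with a confidence of about 58%, which matches the kind of statement asked for in the question.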
I've come to understand that the probabilities that are output by logistic regression can be interpreted as confidence.
Here are some links to help you come to your own conclusion.
https://machinelearningmastery.com/how-to-score-probability-predictions-in-python/
how to assess the confidence score of a prediction with scikit-learn
https://stats.stackexchange.com/questions/34823/can-logistic-regressions-predicted-probability-be-interpreted-as-the-confidence
https://kiwidamien.github.io/are-you-sure-thats-a-probability.html
How about using a softmax as the activation in the last layer? Let's say something like this:
model.add(Dense(2, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer = adagrad, metrics = ["accuracy"])
In this way, for each data point, the model gives you a probabilistic-ish result that tells you how likely it is that the data point belongs to each of the two classes.
For example, for a given X, if the model returns (0.3, 0.7), you know that X is more likely to belong to class 1 than to class 0, with estimated likelihoods of 0.7 and 0.3 respectively.

Multivariate Binary Classification Prediction Tensorflow 2 LSTM

I am currently working on the implementation of an LSTM to predict a binary outcome (either 0 or 1) for a given set of normed scaled features.
self._regressor.add(LSTM(units=60, activation='relu', return_sequences=True, input_shape=(data.x_train.shape[1], data.x_train.shape[2])))
self._regressor.add(Dropout(0.2))
self._regressor.add(LSTM(units=60, activation='relu', return_sequences=True))
self._regressor.add(Dropout(0.3))
self._regressor.add(LSTM(units=80, activation='relu', return_sequences=True))
self._regressor.add(Dropout(0.4))
self._regressor.add(LSTM(units=120, activation='relu'))
self._regressor.add(Dropout(0.5))
#this is the output layer
self._regressor.add(Dense(units=1, activation='sigmoid'))
self._logger.info("TensorFlow Summary\n {}".format(self._regressor.summary()))
#run regressor
self._regressor.compile(optimizer='adam', loss="binary_crossentropy", metrics=['accuracy'])
self._regressor.fit(data.x_train, data.y_train, epochs=1, batch_size=32)
data.y_pred_scaled = self._regressor.predict(data.x_test)
data.y_pred = self._scaler_target.inverse_transform(data.y_pred_scaled)
scores = self._regressor.evaluate(data.x_test, data.y_test, verbose=0)
My issue here is that the output of my prediction has a range of max: 0.5188445 and min: 0.518052, implying to me that all of my classifications are positive (which is definitely incorrect). I even tried predict_classes and this yielded an array of 1's.
I am struggling to find where my issue is despite numerous searches online. I have ensured that my final output layer consists of a sigmoid function as well as included the loss as the binary_crossentropy also. My data has been scaled using sklearn's MinMaxScaler with feature_range=(0,1). I am running my code through a debugger and everything up to the self._regressor.fit looks good so far. I am just struggling with quantifying the output of the predictions.
Any help would be greatly appreciated.

cost function after converting tf.layers to tf.keras.layers

I have a CNN where output dimension is [None, 10]
It is a multi-label problem, where the output signifies the possible categories to which x might belong (e.g., an image can be classified as cat, dark, and so on).
Following is what I have now, how can I change the code to keras version?
I can't find equivalent of sigmoid_cross_entropy_with_logits
model = tf.layers.dense(L3, category_num, activation=None)
cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(logits=model, labels=Y)
cost = tf.reduce_mean(tf.reduce_sum(cross_entropy, axis=1))
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)
The direct alternative in Keras is to use a sigmoid activation in your output layer and binary_crossentropy as the cost function.
net.add(Dense(..., activation='sigmoid'))
net.compile(optimizer, loss='binary_crossentropy')
Take a look at https://github.com/keras-team/keras/issues/741
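To see why sigmoid plus binary cross-entropy matches sigmoid_cross_entropy_with_logits, here is a plain-Python check of the two formulas on made-up numbers (helper names are illustrative). It also shows the one difference worth knowing about: the original cost sums over the class axis, whereas Keras' binary_crossentropy averages over it, so the two differ by a constant factor equal to the number of classes.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_ce_with_logits(logit, label):
    """Stable form from the TensorFlow docs: max(x, 0) - x*z + log(1 + exp(-|x|))."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def binary_ce_from_prob(prob, label):
    """Plain binary cross-entropy evaluated on a probability."""
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))

logits = [1.2, -0.7, 0.0, 3.0]  # one sample, four classes
labels = [1, 0, 1, 0]
per_class = [sigmoid_ce_with_logits(x, z) for x, z in zip(logits, labels)]
via_prob = [binary_ce_from_prob(sigmoid(x), z) for x, z in zip(logits, labels)]

summed = sum(per_class)             # reduce_sum over the class axis
averaged = summed / len(per_class)  # mean over the class axis
print(per_class, via_prob, summed, averaged)
```

If you would rather keep activation=None on the last layer, tf.keras.losses.BinaryCrossentropy(from_logits=True) accepts logits directly.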
In Keras:
# your model here -- last layer:
model.add(Dense(10))
model.add(Activation('sigmoid'))
# binary_crossentropy treats each of the 10 outputs as an independent yes/no label
model.compile(loss='binary_crossentropy',
              optimizer="adam", metrics=['accuracy'])

Keras dense layer outputs are 'nan'

I'm using Keras to build a RNN model with CTC loss.
I found that when I pass a tensor to a Dense layer with activation=None, the outputs of this layer are all nan.
But when I set activation='softmax', the outputs are normal, not nan.
Problem code (elements of logits are all nan):
logits = Dense(out_shape, activation = None, name="logits")(x_permute)#x_permute is a tensor with shape (?,1876,96)
loss_ctc = Lambda(ctc_lambda_func, name='ctc_my')(
[logits, labels, x_len, lab_len])
model = Model(inputs=[x, labels, x_len, lab_len], outputs=[loss_ctc])
model.compile(loss={'ctc_my': lambda y_true,y_pred: y_pred}, optimizer='adadelta')
Normal code (elements of logits are not nan):
logits = Dense(out_shape, activation = None, name="logits")(x_permute)#x_permute is a tensor with shape (?,1876,96)
output = Activation(activation="softmax", name="softmax")(logits)
loss_ctc = Lambda(ctc_lambda_func, name='ctc_my')(
[output, labels, x_len, lab_len])
model = Model(inputs=[x, labels, x_len, lab_len], outputs=[loss_ctc])
model.compile(loss={'ctc_my': lambda y_true,y_pred: y_pred}, optimizer='adadelta')
def ctc_lambda_func(args):
    y_pred, y_true, input_length, label_length = args
    return ctc_batch_cost(y_true, y_pred, input_length, label_length)
Can anyone help? Many thanks.
I may misunderstand you, but why would you want activation="none"?
Maybe what you want to use is linear activation?
Have a look at Keras Activation Functions
As per Klemen Grm:
Your neural network is completely linear. You might consider different activation functions (e.g. tanh, sigmoid, linear) for your hidden and output layers. This both lets you constrain the output range and will probably improve the learning properties of your network.
In addition to what Klemen says, for the last one you want a softmax, which normalizes the outputs into probabilities.
Neural networks have to implement complex mapping functions, so they need non-linear activation functions to bring in the non-linearity that enables them to approximate any function. A neuron without an activation function is equivalent to a neuron with a linear activation function.
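One plausible reason the softmax matters here: as far as I can tell from the Keras backend documentation, ctc_batch_cost expects y_pred to be the output of a softmax and takes its logarithm internally. Raw logits can be negative, and the logarithm of a negative number is nan in array libraries, which would then propagate into the weights during training. The plain-Python sketch below only illustrates that arithmetic; the helper names are hypothetical and it does not call Keras.

```python
import math

def softmax(logits):
    """Softmax with max-subtraction for numerical stability."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def log_like_array_lib(v):
    """math.log raises for v <= 0; array libraries return nan (or -inf at 0) instead."""
    if v > 0:
        return math.log(v)
    return float("nan") if v < 0 else float("-inf")

logits = [2.0, -1.5, 0.3]  # made-up outputs of a Dense layer with activation=None
log_of_logits = [log_like_array_lib(v) for v in logits]
log_of_probs = [log_like_array_lib(p) for p in softmax(logits)]
print(log_of_logits)  # contains nan for the negative logit
print(log_of_probs)   # all finite, all <= 0
```

If this is indeed the cause, keeping the softmax before the CTC Lambda layer, as in the "normal code" of the question, is the expected usage rather than a workaround.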