How to deal with Imbalanced Dataset for Multi Label Classification - tensorflow

I was wondering how to penalize less represented classes more then other classes when dealing with a really imbalanced dataset (10 classes over about 20000 samples but here is th number of occurence for each class : [10868 26 4797 26 8320 26 5278 9412 4485 16172 ]).
I read about the Tensorflow function : weighted_cross_entropy_with_logits (https://www.tensorflow.org/api_docs/python/tf/nn/weighted_cross_entropy_with_logits) but I am not sure I can use it for a multi label problem.
I found a post that sum up perfectly the problem I have (Neural Network for Imbalanced Multi-Class Multi-Label Classification) and that propose an idea but it had no answers and I thought the idea might be good :)
Thank you for your ideas and answers !

First of all, there is my suggestion you can modify your cost function to use in a multi-label way. There is code which show how to use Softmax Cross Entropy in Tensorflow for multilabel image task.
With that code, you can multiple weights in each row of loss calculation. Here is the example code in case you have multi-label task: (i.e, each image can have two labels)
logits_split = tf.split( axis=1, num_or_size_splits=2, value= logits )
labels_split = tf.split( axis=1, num_or_size_splits=2, value= labels )
weights_split = tf.split( axis=1, num_or_size_splits=2, value= weights )
total = 0.0
for i in range ( len(logits_split) ):
temp = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits( logits=logits_split[i] , labels=labels_split[i] ))
total += temp * tf.reshape(weights_split[i],[-1])

I think you can just use tf.nn.weighted_cross_entropy_with_logits for multiclass classification.
For example, for 4 classes, where the ratios to the class with the largest number of members are [0.8, 0.5, 0.6, 1], You would just give it a weight vector in the following way:
cross_entropy = tf.nn.weighted_cross_entropy_with_logits(
targets=ground_truth_input, logits=logits,
pos_weight = tf.constant([0.8,0.5,0.6,1]))

So I am not entirely sure that I understand your problem given what you have written. The post you link to writes about multi-label AND multi-class, but that doesn't really make sense given what is written there either. So I will approach this as a multi-class problem where for each sample, you have a single label.
In order to penalize the classes, I implemented a weight Tensor based on the labels in the current batch. For a 3-class problem, you could eg. define the weights as the inverse frequency of the classes, such that if the proportions are [0.1, 0.7, 0.2] for class 1, 2 and 3, respectively, the weights will be [10, 1.43, 5]. Defining a weight tensor based on the current batch is then
weight_per_class = tf.constant([10, 1.43, 5]) # shape (, num_classes)
onehot_labels = tf.one_hot(labels, depth=3) # shape (batch_size, num_classes)
weights = tf.reduce_sum(
tf.multiply(onehot_labels, weight_per_class), axis=1) # shape (batch_size, num_classes)
reduction = tf.losses.Reduction.MEAN # this ensures that we get a weighted mean
loss = tf.losses.softmax_cross_entropy(
onehot_labels=onehot_labels, logits=logits, weights=weights, reduction=reduction)
Using softmax ensures that the classification problem is not 3 independent classifications.

Related

How can I assign certain samples as negative samples when using sampled_softmax_loss in TensorFlow?

The API of sampled_softmax_loss goes like:
tf.nn.sampled_softmax_loss(
weights,
biases,
labels,
inputs,
num_sampled,
num_classes,
num_true=1,
sampled_values=None,
...
)
I've noticed that arg sampled_values is the one which determines what negatives samples we take and it's returned by a _candidate_sampler function like tf.random.fixed_unigram_candidate_sampler.
And in tf.random.fixed_unigram_candidate_sampler we can decide the probability of each sample chosen as negative sample.
How can I assign certain sample as negative sample on purpose?
For instance, in the case of recommender system, I'd like to add some hard negative sample to the model. So I want the hard negative samples been chosen for sure, not by probability like in _candidate_sampler function
How can I assign certain samples as negative samples when using sampled_softmax_loss in TensorFlow?
You need to understand that the sampler candidates function is only a remarks function and your question is right about how to create a negative sampler.
You don't need to create a negative sampler when you assigned a unique. The sampler is (sampled_candidates, true_expected_count, sampled_expected_count). Hard negative is when you add contrast values to significant the candidates. In this way, you can have it with distributions.
Random Uniform Candidates Sampler
Candidate Sampling
Sampled SoftMax
Simple: It is weight and bias are varies, and functions are the same.
import tensorflow as tf
weights = tf.zeros([4, 1])
biases = tf.zeros([4])
labels = tf.ones([4, 1])
inputs = tf.zeros([4, 1])
num_sampled = 1
num_classes = 1
true_classes = tf.ones([4, 4], dtype=tf.int64)
num_true = 4
num_sampled = 1
unique =True
range_max = 1
sampler = tf.random.uniform_candidate_sampler(
true_classes,
num_true,
num_sampled,
unique,
range_max,
seed=None,
name=None
)
loss_fn = tf.nn.sampled_softmax_loss(
weights,
biases,
labels,
inputs,
num_sampled,
num_classes,
num_true=1,
sampled_values=sampler,
remove_accidental_hits=True,
seed=None,
name='sampled_softmax_loss'
)
print( loss_fn )
Output: Value output as examples, and ran three times.
tf.Tensor([6.437752 6.437752 6.437752 6.437752], shape=(4,), dtype=float32)
tf.Tensor([6.437752 6.437752 6.437752 6.437752], shape=(4,), dtype=float32)
tf.Tensor([6.437752 6.437752 6.437752 6.437752], shape=(4,), dtype=float32)

Evaluating (model.evaluate) with a triplet loss Siamese neural network model - tensorflow

I have trained a Siamese neural network that uses triplet loss. It was a pain, but I think I managed to do it. However, I am struggling to understand how to make evaluations with this model.
The SNN:
def triplet_loss(y_true, y_pred):
margin = K.constant(1)
return K.mean(K.maximum(K.constant(0), K.square(y_pred[:,0]) - 0.5*(K.square(y_pred[:,1])+K.square(y_pred[:,2])) + margin))
def euclidean_distance(vects):
x, y = vects
return K.sqrt(K.maximum(K.sum(K.square(x - y), axis=1, keepdims=True), K.epsilon()))
anchor_input = Input((max_len, ), name='anchor_input')
positive_input = Input((max_len, ), name='positive_input')
negative_input = Input((max_len, ), name='negative_input')
Shared_DNN = create_base_network(embedding_dim = EMBEDDING_DIM, max_len=MAX_LEN, embed_matrix=embed_matrix)
encoded_anchor = Shared_DNN(anchor_input)
encoded_positive = Shared_DNN(positive_input)
encoded_negative = Shared_DNN(negative_input)
positive_dist = Lambda(euclidean_distance, name='pos_dist')([encoded_anchor, encoded_positive])
negative_dist = Lambda(euclidean_distance, name='neg_dist')([encoded_anchor, encoded_negative])
tertiary_dist = Lambda(euclidean_distance, name='ter_dist')([encoded_positive, encoded_negative])
stacked_dists = Lambda(lambda vects: K.stack(vects, axis=1), name='stacked_dists')([positive_dist, negative_dist, tertiary_dist])
model = Model([anchor_input, positive_input, negative_input], stacked_dists, name='triple_siamese')
model.compile(loss=triplet_loss, optimizer=adam_optim, metrics=[accuracy])
history = model.fit([Anchor,Positive,Negative],y=Y_dummy,validation_data=([Anchor_test,Positive_test,Negative_test],Y_dummy2), batch_size=128, epochs=25)
I understand that once a model is trained with triplets, the evaluation shouldn't actually require that triplets be used. However, how do I finagle this reshaping?
Because this is a SNN, I would want to feed two inputs into model.evaluate, along with a categorical variable denoting if the two inputs are similar or not (1 = similar, 0 = not similar).
So basically, I want model.evaluate(input1, input2, y_label). But I am not sure how to get this with the model that I trained. As shown above, I trained with three inputs: model.fit([Anchor,Positive,Negative],y=Y_dummy ... ) .
I know I should save the weights of my trained model, but I just don't know what model to load the weights onto.
Your help is greatly appreciated!
EDIT:
I am aware of the below approach for prediction, but I am not looking for prediction, I am looking to use model.evaluate as I want to get some final measure of loss/accuracy for the model. Also this approach only feeds the anchor into the model (wheras I'm interested in text similarity, so would want to feed in 2 inputs)
eval_model = Model(inputs=anchor_input, outputs=encoded_anchor)
eval_model.load_weights('weights.hdf5')
Considering that eval_model is trained to produce embeddings, I think that should be good to evaluate the similarity between two embeddings using cosine similarity.
Following the TF documentation, the cosine similarity is a number between -1 and 1. When it is a negative number closer to -1, it indicates greater similarity. When it is a positive number closer to 1, it indicates greater dissimilarity.
We can simply calculate the cosine similarity between Positive and Negative inputs for all the samples at disposal. When the cosine similarity is < 0 we can say that the two inputs are similar (1 = similar, 0 = not similar). In the end, is possible to calculate the binary accuracy as a final metric.
We can make all the calculations using TF and without the need of using model.evaluate.
eval_model = Model(inputs=anchor_input, outputs=encoded_anchor)
eval_model.load_weights('weights.hdf5')
cos_sim = tf.keras.losses.cosine_similarity(
eval_model(X1), eval_model(X2)
).numpy().reshape(-1,1)
accuracy = tf.reduce_mean(tf.keras.metrics.binary_accuracy(Y, -cos_sim, threshold=0))
Another approach consists in computing the cosine similarity between the anchor and positive images and comparing it with the similarity between the anchor and the negative images.
eval_model = Model(inputs=anchor_input, outputs=encoded_anchor)
eval_model.load_weights('weights.hdf5')
positive_similarity = tf.keras.losses.cosine_similarity(
eval_model(X_anchor), eval_model(X_positive)
).numpy().mean()
negative_similarity = tf.keras.losses.cosine_similarity(
eval_model(X_anchor), eval_model(X_negative)
).numpy().mean()
We should expect the similarity between the anchor and positive images to be larger than the similarity between the anchor and the negative images.

Predict if probability is higher than certain value

I am using a MLP model for classification.
When I predict for new data, I want to keep only those predictions whose probability of prediction is larger than 0.5, and change all other predictions into class 0.
How can I do it in keras ?
I'm using using last layer as follows
model.add(layers.Dense(7 , activation='softmax'))
Is it meaningful to get predictions with probability larger than 0.5 using the softmax?
newdata = (nsamples, nfeatures)
predictions = model.predict (newdata)
print (predictions.shape)
(500, 7)
you can do something like:
preds=model.predict etc
index=np.argmax(preds)
probability= preds(index)
if probability >=.75:
print (' class is ', index,' with high confidence')
elif probability >=.5:
print (' class is ', index,' with medium confidence')
else:
print (' class is ', index,' with low confidence')
Softmax function outputs probabilities. So in your case you will have 7 classes and their probability sum will be equal to 1.
Now consider a case [0.1, 0.1, 0.1, 0.1, 0.1, 0.2, 0.3] which is the output of the softmax. Appyling a threshold in that case would not make sense as you can see.
Threshold 0.5 has nothing to do with n-classed predictions. It is something special for binary classification.
For to get classes, you should use argmax.
Edit: If you want to drop your predictions if they are under a certain threshold, you can use, but that's not a correct way to deal with multi-class predictions:
labels = []
threshold = 0.5
for probs_thresholded in out:
labels.append([])
for i in range(len(probs_thresholded)):
if probs_thresholded[i] >= threshold:
labels[-1].append(1)
else:
labels[-1].append(0)

How BatchNormalization in keras works?

I want to know how BatchNormalization works in keras, so I write the code:
X_input = keras.Input((2,))
X = keras.layers.BatchNormalization(axis=1)(X_input)
model1 = keras.Model(inputs=X_input, outputs=X)
the input is a batch of two dimenstions vector, and normalizing it along axis=1, then print the output:
a = np.arange(4).reshape((2,2))
print('a=')
print(a)
print('output=')
print(model1.predict(a,batch_size=2))
and the output is:
a=
array([[0, 1],
[2, 3]])
output=
array([[ 0. , 0.99950039],
[ 1.99900079, 2.9985013 ]], dtype=float32)
I can not figure out the results. As far as I know, the mean of the batch should be ([0,1] + [2,3])/2 = [1,2], the var is 1/2*(([0,1] - [1,2])^2 + ([2,3]-[1,2])^2) = [1,1]. Finally, normalizing it with (x - mean)/sqrt(var), therefore the results are [-1, -1] and [1,1], where am I wrong?
BatchNormalization will substract the mean, divide by the variance, apply a factor gamma and an offset beta. If these parameters would actually be the mean and variance of your batch, the result would be centered around zero with variance 1.
But they are not. The keras BatchNormalization layer stores these as weights that can be trained, called moving_mean, moving_variance, beta and gamma. They are initialized as beta=0, gamma=1, moving_mean=0 and moving_variance=1. Since you don't have any train steps, BatchNorm does not change your values.
So, why don't you get exactly your input values? Because there is another parameter epsilon (a small number), which gets added to the variance. Therefore, all values are divided by 1+epsilon and end up a little bit below their input values.

Implementing contrastive loss and triplet loss in Tensorflow

I started to play with TensorFlow two days ago and I'm wondering if there is the triplet and the contrastive losses implemented.
I've been looking at the documentation, but I haven't found any example or description about these things.
Update (2018/03/19): I wrote a blog post detailing how to implement triplet loss in TensorFlow.
You need to implement yourself the contrastive loss or the triplet loss, but once you know the pairs or triplets this is quite easy.
Contrastive Loss
Suppose you have as input the pairs of data and their label (positive or negative, i.e. same class or different class). For instance you have images as input of size 28x28x1:
left = tf.placeholder(tf.float32, [None, 28, 28, 1])
right = tf.placeholder(tf.float32, [None, 28, 28, 1])
label = tf.placeholder(tf.int32, [None, 1]). # 0 if same, 1 if different
margin = 0.2
left_output = model(left) # shape [None, 128]
right_output = model(right) # shape [None, 128]
d = tf.reduce_sum(tf.square(left_output - right_output), 1)
d_sqrt = tf.sqrt(d)
loss = label * tf.square(tf.maximum(0., margin - d_sqrt)) + (1 - label) * d
loss = 0.5 * tf.reduce_mean(loss)
Triplet Loss
Same as with contrastive loss, but with triplets (anchor, positive, negative). You don't need labels here.
anchor_output = ... # shape [None, 128]
positive_output = ... # shape [None, 128]
negative_output = ... # shape [None, 128]
d_pos = tf.reduce_sum(tf.square(anchor_output - positive_output), 1)
d_neg = tf.reduce_sum(tf.square(anchor_output - negative_output), 1)
loss = tf.maximum(0., margin + d_pos - d_neg)
loss = tf.reduce_mean(loss)
The real trouble when implementing triplet loss or contrastive loss in TensorFlow is how to sample the triplets or pairs. I will focus on generating triplets because it is harder than generating pairs.
The easiest way is to generate them outside of the Tensorflow graph, i.e. in python and feed them to the network through the placeholders. Basically you select images 3 at a time, with the first two from the same class and the third from another class. We then perform a feedforward on these triplets, and compute the triplet loss.
The issue here is that generating triplets is complicated. We want them to be valid triplets, triplets with a positive loss (otherwise the loss is 0 and the network doesn't learn).
To know whether a triplet is good or not you need to compute its loss, so you already make one feedforward through the network...
Clearly, implementing triplet loss in Tensorflow is hard, and there are ways to make it more efficient than sampling in python but explaining them would require a whole blog post !
Triplet loss with semihard negative mining is now implemented in tf.contrib, as follows:
triplet_semihard_loss(
labels,
embeddings,
margin=1.0
)
where:
Args:
labels: 1-D tf.int32 Tensor with shape [batch_size] of multiclass
integer labels.
embeddings: 2-D float Tensor of embedding vectors.Embeddings should
be l2 normalized.
margin: Float, margin term in theloss definition.
Returns:
triplet_loss: tf.float32 scalar.
For further information, check the link bellow:
https://www.tensorflow.org/versions/master/api_docs/python/tf/contrib/losses/metric_learning/triplet_semihard_loss
Tiago, I don't think you are using the same formula Olivier gave.
Here is the right code (not sure it will work though, just fixing the formula) :
def compute_euclidean_distance(x, y):
"""
Computes the euclidean distance between two tensorflow variables
"""
d = tf.reduce_sum(tf.square(tf.sub(x, y)),1)
return d
def compute_contrastive_loss(left_feature, right_feature, label, margin):
"""
Compute the contrastive loss as in
L = 0.5 * Y * D^2 + 0.5 * (Y-1) * {max(0, margin - D)}^2
**Parameters**
left_feature: First element of the pair
right_feature: Second element of the pair
label: Label of the pair (0 or 1)
margin: Contrastive margin
**Returns**
Return the loss operation
"""
label = tf.to_float(label)
one = tf.constant(1.0)
d = compute_euclidean_distance(left_feature, right_feature)
d_sqrt = tf.sqrt(compute_euclidean_distance(left_feature, right_feature))
first_part = tf.mul(one-label, d)# (Y-1)*(d)
max_part = tf.square(tf.maximum(margin-d_sqrt, 0))
second_part = tf.mul(label, max_part) # (Y) * max(margin - d, 0)
loss = 0.5 * tf.reduce_mean(first_part + second_part)
return loss