Balanced Accuracy Score in TensorFlow

I am implementing a CNN for a highly unbalanced classification problem and I would like to implement custom metrics in tensorflow to use with the Select Best Model callback.
Specifically, I would like to implement the balanced accuracy score, which is the average of the recall of each class (see the sklearn implementation here). Does someone know how to do it?
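For reference, here is a minimal sklearn example of the behaviour I mean (the labels are made up, just to show that balanced accuracy is the macro-averaged recall):
from sklearn.metrics import balanced_accuracy_score, recall_score

y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
print(balanced_accuracy_score(y_true, y_pred))        # 0.625
print(recall_score(y_true, y_pred, average='macro'))  # 0.625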

I was facing the same issue so I implemented a custom class based off SparseCategoricalAccuracy:
class BalancedSparseCategoricalAccuracy(keras.metrics.SparseCategoricalAccuracy):
    def __init__(self, name='balanced_sparse_categorical_accuracy', dtype=None):
        super().__init__(name, dtype=dtype)

    def update_state(self, y_true, y_pred, sample_weight=None):
        y_flat = y_true
        if y_true.shape.ndims == y_pred.shape.ndims:
            y_flat = tf.squeeze(y_flat, axis=[-1])
        y_true_int = tf.cast(y_flat, tf.int32)

        cls_counts = tf.math.bincount(y_true_int)
        cls_counts = tf.math.reciprocal_no_nan(tf.cast(cls_counts, self.dtype))
        weight = tf.gather(cls_counts, y_true_int)
        return super().update_state(y_true, y_pred, sample_weight=weight)
The idea is to set each class weight inversely proportional to its size.
This code produces some warnings from Autograph but I believe those are Autograph bugs, and the metric seems to work fine.
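For completeness, a minimal usage sketch (the model and the three-class setup below are hypothetical, not from the question): once the metric is compiled into the model, its name can be monitored by ModelCheckpoint to keep the best model.
import tensorflow as tf
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(16, activation='relu', input_shape=(10,)),
    keras.layers.Dense(3, activation='softmax'),
])
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=[BalancedSparseCategoricalAccuracy()],
)
# Keep the model with the best validation balanced accuracy;
# pass `checkpoint` to model.fit(..., callbacks=[checkpoint]).
checkpoint = keras.callbacks.ModelCheckpoint(
    'best_model.h5',
    monitor='val_balanced_sparse_categorical_accuracy',
    mode='max',
    save_best_only=True,
)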

There are three ways I can think of to tackle the situation:
1) Random under-sampling - in this method you randomly remove samples from the majority classes.
2) Random over-sampling - in this method you increase the minority-class samples by replicating them.
3) Weighted cross entropy - you can also use weighted cross entropy so that the loss value is compensated for the minority classes. See here (a small sketch follows below).
I have personally tried method 2 and it did increase my accuracy by a significant amount, but it may vary from dataset to dataset.
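A minimal sketch of option 3 (model, x_train and y_train are placeholders from your own pipeline, and the "balanced" weighting formula below is just one common choice): weight each class inversely proportional to its frequency and let Keras apply the weights to the loss.
import numpy as np

# y_train: integer class labels (placeholder)
classes, counts = np.unique(y_train, return_counts=True)
class_weight = {int(c): len(y_train) / (len(classes) * n)
                for c, n in zip(classes, counts)}

# Keras scales each sample's loss by its class weight during training
model.fit(x_train, y_train, epochs=10, class_weight=class_weight)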

NOTE
It appears that the implementation/API of the Recall class, which I used as a template for my answer, has been modified in newer TF versions (as pointed out by @guilaumme-gaudin). I therefore recommend that you look at the Recall implementation in your current TF version and adapt the approach I describe in the original post from there; that way I don't have to update my answer every time the TF team modifies the implementation/API of its metrics.
Original post
I'm not an expert in Tensorflow, but by pattern-matching between the metric implementations in the tf source code I came up with this:
import numpy as np
from tensorflow.python.keras import backend as K
from tensorflow.python.keras.metrics import Metric
from tensorflow.python.keras.utils import metrics_utils
from tensorflow.python.ops import init_ops
from tensorflow.python.ops import math_ops
from tensorflow.python.keras.utils.generic_utils import to_list


class BACC(Metric):

    def __init__(self, thresholds=None, top_k=None, class_id=None, name=None, dtype=None):
        super(BACC, self).__init__(name=name, dtype=dtype)
        self.init_thresholds = thresholds
        self.top_k = top_k
        self.class_id = class_id

        default_threshold = 0.5 if top_k is None else metrics_utils.NEG_INF
        self.thresholds = metrics_utils.parse_init_thresholds(
            thresholds, default_threshold=default_threshold)
        self.true_positives = self.add_weight(
            'true_positives',
            shape=(len(self.thresholds),),
            initializer=init_ops.zeros_initializer)
        self.true_negatives = self.add_weight(
            'true_negatives',
            shape=(len(self.thresholds),),
            initializer=init_ops.zeros_initializer)
        self.false_positives = self.add_weight(
            'false_positives',
            shape=(len(self.thresholds),),
            initializer=init_ops.zeros_initializer)
        self.false_negatives = self.add_weight(
            'false_negatives',
            shape=(len(self.thresholds),),
            initializer=init_ops.zeros_initializer)

    def update_state(self, y_true, y_pred, sample_weight=None):
        return metrics_utils.update_confusion_matrix_variables(
            {
                metrics_utils.ConfusionMatrix.TRUE_POSITIVES: self.true_positives,
                metrics_utils.ConfusionMatrix.TRUE_NEGATIVES: self.true_negatives,
                metrics_utils.ConfusionMatrix.FALSE_POSITIVES: self.false_positives,
                metrics_utils.ConfusionMatrix.FALSE_NEGATIVES: self.false_negatives,
            },
            y_true,
            y_pred,
            thresholds=self.thresholds,
            top_k=self.top_k,
            class_id=self.class_id,
            sample_weight=sample_weight)

    def result(self):
        """Returns the balanced accuracy (average of recall and specificity)."""
        result = (math_ops.div_no_nan(self.true_positives, self.true_positives + self.false_negatives) +
                  math_ops.div_no_nan(self.true_negatives, self.true_negatives + self.false_positives)) / 2
        return result

    def reset_states(self):
        num_thresholds = len(to_list(self.thresholds))
        K.batch_set_value(
            [(v, np.zeros((num_thresholds,))) for v in self.variables])

    def get_config(self):
        config = {
            'thresholds': self.init_thresholds,
            'top_k': self.top_k,
            'class_id': self.class_id
        }
        base_config = super(BACC, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
I've simply taken the Recall class implementation from the source code as a template and extended it to make sure it has TP, TN, FP and FN defined.
After that I modified the result method so that it calculates balanced accuracy, and voila :)
I compared the results from this with sklearn's balanced accuracy score and the values matched so I think it's correct, but do double check just in case.
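If you want to double-check it yourself, here is a small sketch of such a comparison (the labels and probabilities are made up, and it assumes the default 0.5 threshold of the BACC class above):
import numpy as np
from sklearn.metrics import balanced_accuracy_score

y_true = np.array([0, 0, 0, 1, 1, 0, 1, 0], dtype=np.float32)
y_prob = np.array([0.1, 0.7, 0.2, 0.8, 0.4, 0.3, 0.9, 0.6], dtype=np.float32)

metric = BACC()
metric.update_state(y_true, y_prob)
print(metric.result().numpy())                                      # Keras metric
print(balanced_accuracy_score(y_true, (y_prob > 0.5).astype(int)))  # sklearn reference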

I have not tested this code yet, but looking at the source code of tensorflow==2.1.0, this might work for the binary classification case:
from tensorflow.keras.metrics import Recall
from tensorflow.python.ops import math_ops


class BalancedBinaryAccuracy(Recall):
    def result(self):
        result = (math_ops.div_no_nan(self.true_positives, self.true_positives + self.false_negatives) +
                  math_ops.div_no_nan(self.true_negatives, self.true_negatives + self.false_positives)) / 2
        return result[0] if len(self.thresholds) == 1 else result

As an alternative to writing a custom metric, you can write a custom callback using the metrics already implemented and available via the training logs. For example, you can define the training balanced accuracy callback like this:
class TrainBalancedAccuracyCallback(tf.keras.callbacks.Callback):

    def __init__(self, **kargs):
        super(TrainBalancedAccuracyCallback, self).__init__(**kargs)

    def on_epoch_end(self, epoch, logs={}):
        train_sensitivity = logs['tp'] / (logs['tp'] + logs['fn'])
        train_specificity = logs['tn'] / (logs['tn'] + logs['fp'])
        logs['train_sensitivity'] = train_sensitivity
        logs['train_specificity'] = train_specificity
        logs['train_balacc'] = (train_sensitivity + train_specificity) / 2
        print('train_balacc', logs['train_balacc'])
and the same for the validation:
class ValBalancedAccuracyCallback(tf.keras.callbacks.Callback):

    def __init__(self, **kargs):
        super(ValBalancedAccuracyCallback, self).__init__(**kargs)

    def on_epoch_end(self, epoch, logs={}):
        val_sensitivity = logs['val_tp'] / (logs['val_tp'] + logs['val_fn'])
        val_specificity = logs['val_tn'] / (logs['val_tn'] + logs['val_fp'])
        logs['val_sensitivity'] = val_sensitivity
        logs['val_specificity'] = val_specificity
        logs['val_balacc'] = (val_sensitivity + val_specificity) / 2
        print('val_balacc', logs['val_balacc'])
and then you can pass these as values to the callbacks argument of the model's fit method, as in the sketch below.
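A sketch of how the two pieces fit together (model, x_train, y_train, x_val and y_val are placeholders for a binary-classification setup): the built-in confusion-matrix metrics have to be registered under the names 'tp', 'tn', 'fp' and 'fn' so that the callbacks above can find them in the logs.
import tensorflow as tf

model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=[
        tf.keras.metrics.TruePositives(name='tp'),
        tf.keras.metrics.TrueNegatives(name='tn'),
        tf.keras.metrics.FalsePositives(name='fp'),
        tf.keras.metrics.FalseNegatives(name='fn'),
    ],
)
model.fit(
    x_train, y_train,
    validation_data=(x_val, y_val),
    epochs=10,
    callbacks=[TrainBalancedAccuracyCallback(), ValBalancedAccuracyCallback()],
)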

Related

How to avoid memory leakage in an autoregressive model within tensorflow

Recently, I have been training an LSTM with an attention mechanism for regression in tensorflow 2.9 and I ran into a problem during training with model.fit():
At the beginning the training time is okay, around 7s/step. However, it keeps increasing during training, and after some steps, say 1000, it can reach 50s/step. Below is part of the code for my model:
class AttentionModel(tf.keras.Model):
    def __init__(self, encoder_output_dim, dec_units, dense_dim, batch):
        super().__init__()
        self.dense_dim = dense_dim
        self.batch = batch
        encoder = Encoder(encoder_output_dim)
        decoder = Decoder(dec_units, dense_dim)
        self.encoder = encoder
        self.decoder = decoder

    def call(self, inputs):
        # Create a list to record the results
        tempt = list()
        encoder_output, encoder_state = self.encoder(inputs)
        new_features = np.zeros((self.batch, 1, 1))
        dec_initial_state = encoder_state
        for i in range(6):
            dec_inputs = DecoderInput(new_features=new_features, enc_output=encoder_output)
            dec_result, dec_state = self.decoder(dec_inputs, dec_initial_state)
            tempt.append(dec_result.logits)
            new_features = dec_result.logits
            dec_initial_state = dec_state
        result = tf.concat(tempt, 1)
        return result
In the official documentation for tf.function, I noticed: "Don't rely on Python side effects like object mutation or list appends".
Since I use a dynamic Python list with append() to record the intermediate variables, I guess a new tf.Graph is added each time during training. Is this the reason my training is getting slower and slower?
Additionally, what should I use instead of a Python list to avoid this? I have tried a numpy.zeros matrix, but it leads to another problem:
tempt = np.zeros(shape=(1, 6))
...
for i in range(6):
    dec_inputs = DecoderInput(new_features=new_features, enc_output=encoder_output)
    dec_result, dec_state = self.decoder(dec_inputs, dec_initial_state)
    tempt[i] = (dec_result.logits)
...
Cannot convert a symbolic tf.Tensor (decoder/dense_3/BiasAdd:0) to a numpy array. This error may indicate that you're trying to pass a Tensor to a NumPy call, which is not supported.
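In case it helps to see the pattern, here is a minimal sketch (a toy model, not the AttentionModel above) of accumulating per-step outputs in a tf.TensorArray and replacing the np.zeros initialisation with tf.zeros, so everything stays inside the graph:
import tensorflow as tf

class StepAccumulator(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(1)  # stands in for the decoder

    def call(self, inputs):
        steps = 6
        batch = tf.shape(inputs)[0]
        ta = tf.TensorArray(dtype=tf.float32, size=steps)
        new_features = tf.zeros((batch, 1))  # tf.zeros instead of np.zeros
        for i in range(steps):
            # feed the previous prediction back in, as in the original loop
            new_features = self.dense(tf.concat([inputs, new_features], axis=-1))
            ta = ta.write(i, new_features)
        # (steps, batch, 1) -> (batch, steps, 1), like tf.concat(tempt, 1)
        return tf.transpose(ta.stack(), [1, 0, 2])

model = StepAccumulator()
print(model(tf.random.normal((4, 8))).shape)  # (4, 6, 1)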

Metrics using batches v/s metrics using full dataset

I am training an image classification model using a pre-trained mobile network. During training, I am seeing very high values (more than 70%) for Accuracy, Precision, Recall, and F1-score on both the training dataset and the validation dataset.
For me, this is an indication that my model is learning fine.
But when I checked these metrics on the unbatched training and unbatched validation datasets, they are very low, not even 1%.
By unbatched I mean that I am not calculating these metrics over batches and averaging them to get the final metrics, which is what Tensorflow/Keras does during model training; I am calculating these metrics on the full dataset in a single run.
I am unable to find out what is causing this behaviour. Please help me understand what is causing this difference and how to ensure that the results are consistent on both (a minor difference is acceptable).
Code that I used for evaluating metrics
My old code
def test_model(model, data, CLASSES, label_one_hot=True, average="micro",
               threshold_analysis=False, thres_analysis_start_point=0.0,
               thres_analysis_end_point=0.95, thres_step=0.05, classwise_analysis=False,
               produce_confusion_matrix=False):
    images_ds = data.map(lambda image, label: image)
    labels_ds = data.map(lambda image, label: label).unbatch()
    NUM_VALIDATION_IMAGES = count_data_items(tf_records_filenames=data)
    cm_correct_labels = next(iter(labels_ds.batch(NUM_VALIDATION_IMAGES))).numpy()  # get everything as one batch
    if label_one_hot is True:
        cm_correct_labels = np.argmax(cm_correct_labels, axis=-1)
    cm_probabilities = model.predict(images_ds)
    cm_predictions = np.argmax(cm_probabilities, axis=-1)

    warnings.filterwarnings('ignore')
    overall_score = f1_score(cm_correct_labels, cm_predictions, labels=range(len(CLASSES)), average=average)
    overall_precision = precision_score(cm_correct_labels, cm_predictions, labels=range(len(CLASSES)), average=average)
    overall_recall = recall_score(cm_correct_labels, cm_predictions, labels=range(len(CLASSES)), average=average)
    # cmat = (cmat.T / cmat.sum(axis=1)).T # normalized
    # print('f1 score: {:.3f}, precision: {:.3f}, recall: {:.3f}'.format(score, precision, recall))
    overall_test_results = {'overall_f1_score': overall_score, 'overall_precision': overall_precision, 'overall_recall': overall_recall}

    if classwise_analysis is True:
        label_index_dict = get_index_label_from_tf_record(dataset=data)
        label_index_dict = {k: v for k, v in sorted(list(label_index_dict.items()))}
        label_index_df = pd.DataFrame(label_index_dict, index=[0]).T.reset_index().rename(columns={'index': 'class_ind', 0: 'class_names'})

        # Class wise precision, recall and f1_score
        classwise_score = f1_score(cm_correct_labels, cm_predictions, labels=range(len(CLASSES)), average=None)
        classwise_precision = precision_score(cm_correct_labels, cm_predictions, labels=range(len(CLASSES)), average=None)
        classwise_recall = recall_score(cm_correct_labels, cm_predictions, labels=range(len(CLASSES)), average=None)

        ind_class_count_df = class_ind_counter_from_tfrecord(data)
        ind_class_count_df = ind_class_count_df.merge(label_index_df, how='left', left_on='class_names', right_on='class_names')

        classwise_test_results = {'classwise_f1_score': classwise_score, 'classwise_precision': classwise_precision,
                                  'classwise_recall': classwise_recall, 'class_names': CLASSES}
        classwise_test_results_df = pd.DataFrame(classwise_test_results)

        if produce_confusion_matrix is True:
            cmat = confusion_matrix(cm_correct_labels, cm_predictions, labels=range(len(CLASSES)))
            return overall_test_results, classwise_test_results, cmat
        return overall_test_results, classwise_test_results

    if produce_confusion_matrix is True:
        cmat = confusion_matrix(cm_correct_labels, cm_predictions, labels=range(len(CLASSES)))
        return overall_test_results, cmat

    warnings.filterwarnings('always')
    return overall_test_results
Just to ensure that my model testing function is correct, I wrote a newer version of the code in TensorFlow.
def eval_model(y_true, y_pred):
    eval_results = {}
    unbatch_accuracy = tf.keras.metrics.CategoricalAccuracy(name='unbatch_accuracy')
    unbatch_recall = tf.keras.metrics.Recall(name='unbatch_recall')
    unbatch_precision = tf.keras.metrics.Precision(name='unbatch_precision')
    unbatch_f1_micro = tfa.metrics.F1Score(name='unbatch_f1_micro', num_classes=n_labels, average='micro')
    unbatch_f1_macro = tfa.metrics.F1Score(name='unbatch_f1_macro', num_classes=n_labels, average='macro')

    unbatch_accuracy.update_state(y_true, y_pred)
    unbatch_recall.update_state(y_true, y_pred)
    unbatch_precision.update_state(y_true, y_pred)
    unbatch_f1_micro.update_state(y_true, y_pred)
    unbatch_f1_macro.update_state(y_true, y_pred)

    eval_results['unbatch_accuracy'] = unbatch_accuracy.result().numpy()
    eval_results['unbatch_recall'] = unbatch_recall.result().numpy()
    eval_results['unbatch_precision'] = unbatch_precision.result().numpy()
    eval_results['unbatch_f1_micro'] = unbatch_f1_micro.result().numpy()
    eval_results['unbatch_f1_macro'] = unbatch_f1_macro.result().numpy()

    unbatch_accuracy.reset_states()
    unbatch_recall.reset_states()
    unbatch_precision.reset_states()
    unbatch_f1_micro.reset_states()
    unbatch_f1_macro.reset_states()

    return eval_results
The results are nearly the same with both functions.
Please suggest what is going on here.
I think this suggestion MAY help you, I am not sure. In this, you added:
unbatch_accuracy.reset_states()
unbatch_recall.reset_states()
unbatch_precision.reset_states()
unbatch_f1_micro.reset_states()
unbatch_f1_macro.reset_states()
Resetting the states at each epoch means the metric may not be cumulative.
After spending many hours, I found the issue was due to the shuffle function. I was using the below function to shuffle, batch and prefetch the dataset.
def shuffle_batch_prefetch(dataset, prefetch_size=1, batch_size=16,
                           shuffle_buffer_size=None,
                           drop_remainder=False,
                           interleave_num_pcall=None):
    if shuffle_buffer_size is None:
        raise ValueError("shuffle_buffer_size can't be None")

    def shuffle_fn(ds):
        return ds.shuffle(buffer_size=shuffle_buffer_size, seed=108)

    dataset = dataset.apply(shuffle_fn)
    dataset = dataset.batch(batch_size, drop_remainder=drop_remainder)
    dataset = dataset.prefetch(buffer_size=prefetch_size)
    return dataset
Part of the function that causes the problem
def shuffle_fn(ds):
    return ds.shuffle(buffer_size=shuffle_buffer_size, seed=108)

dataset = dataset.apply(shuffle_fn)
I removed the shuffle part and the metrics are back as expected.
Function after removing the shuffle part
def shuffle_batch_prefetch(dataset, prefetch_size=1, batch_size=16,
                           drop_remainder=False,
                           interleave_num_pcall=None):
    dataset = dataset.batch(batch_size, drop_remainder=drop_remainder)
    dataset = dataset.prefetch(buffer_size=prefetch_size)
    return dataset
Results after removing the shuffle part
I am still not able to understand why shuffling causes this behaviour; shuffling is considered best practice before training. However, I had already shuffled the training data at data-read time, so removing it here was not a problem for me.
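One thing that may be worth trying (this is an assumption about the cause, since test_model above reads the labels and the predictions from the dataset in two separate passes, and shuffle() re-shuffles on every iteration by default): keep the shuffle but make it deterministic across iterations with reshuffle_each_iteration=False.
def shuffle_batch_prefetch(dataset, prefetch_size=1, batch_size=16,
                           shuffle_buffer_size=None,
                           drop_remainder=False,
                           interleave_num_pcall=None):
    if shuffle_buffer_size is None:
        raise ValueError("shuffle_buffer_size can't be None")

    # Same shuffle order on every pass over the dataset
    dataset = dataset.shuffle(buffer_size=shuffle_buffer_size, seed=108,
                              reshuffle_each_iteration=False)
    dataset = dataset.batch(batch_size, drop_remainder=drop_remainder)
    dataset = dataset.prefetch(buffer_size=prefetch_size)
    return dataset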

TensorFlow-Keras generator: Turn off auto-sharding or switch auto_shard_policy to DATA

While training my model I ran into the issue described in the post Tensorflow - Keras: Consider either turning off auto-sharding or switching the auto_shard_policy to DATA to shard this dataset. My question now is: Does the solution mentioned by @Graham501617 work with generators as well? Here is some dummy code for what I use so far:
class BatchGenerator(Sequence):

    def __init__(self, some_args):
        ...

    def __len__(self):
        num_batches_in_sequence = ...

    def __getitem__(self, _):
        data, labels = get_one_batch(self.some_args)
        return data, labels
In the main script I do something like:
train_generator = BatchGenerator(some_args)
valid_generator = BatchGenerator(some_args)

cross_device_ops = tf.distribute.HierarchicalCopyAllReduce(num_packs=2)
strategy = tf.distribute.MirroredStrategy(cross_device_ops=cross_device_ops)
with strategy.scope():
    model = some_model
    model.compile(some_args)

history = model.fit(
    x=train_generator,
    validation_data=valid_generator,
    ...
)
I would probably have to modify the __getitem__ function somehow, wouldn't I?
I appreciate your support!
You'd have to wrap your generator into a single function...
The example below assumes your data is stored as numpy arrays (.npy), each file already contains one mini-batch, the files are labeled 0_x.npy, 1_x.npy, 2_x.npy, etc., and both the data and label arrays are float64.
import os
from pathlib import Path
import tensorflow as tf
import numpy as np

# Your new generator as a function rather than an object you need to instantiate
def getNextBatch(stop, data_dir):
    i = 0
    data_dir = data_dir.decode('ascii')
    while True:
        while i < stop:
            x = np.load(str(Path(data_dir + "/" + str(i) + "_x.npy")))
            y = np.load(str(Path(data_dir + "/" + str(i) + "_y.npy")))
            yield x, y
            i += 1
        i = 0

# Make a dataset given the directory and strategy
def makeDataset(generator_func, dir, strategy=None):
    # Get amount of files
    data_size = int(len([name for name in os.listdir(dir) if os.path.isfile(os.path.join(dir, name))]) / 2)

    # Make a dataset from the generator. MAKE SURE TO SPECIFY THE DATA TYPE!!!
    ds = tf.data.Dataset.from_generator(generator_func, args=[data_size, dir],
                                        output_types=(tf.float64, tf.float64))

    options = tf.data.Options()
    options.experimental_distribute.auto_shard_policy = tf.data.experimental.AutoShardPolicy.OFF
    ds = ds.with_options(options)

    # Optional: Make it a distributed dataset if you're using a strategy
    if strategy is not None:
        ds = strategy.experimental_distribute_dataset(ds)

    return ds

training_ds = makeDataset(getNextBatch, str(Path(data_dir + "/training")), None)
validation_ds = makeDataset(getNextBatch, str(Path(data_dir + "/validation")), None)

model.fit(training_ds,
          epochs=epochs,
          callbacks=callbacks,
          validation_data=validation_ds)
You might need to pass the number of steps per epoch in your fit() call, in which case you can reuse the generator you've already made, as in the sketch below.
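For example, a small sketch of that last remark (it assumes the Sequence instances from the question are still around, since len() on a Sequence returns its number of batches):
history = model.fit(
    training_ds,
    epochs=epochs,
    steps_per_epoch=len(train_generator),
    validation_data=validation_ds,
    validation_steps=len(valid_generator),
    callbacks=callbacks,
)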

how to calculate entropy on float numbers over a tensor in python keras

I have been struggling with this and could not get it to work; I hope someone can help me with this.
I want to calculate the entropy of each row of the tensor. Because my data are float numbers, not integers, I think I need to use bin_histogram.
For example, a sample of my data is tensor = [[0.2, -0.1, 1], [2.09, -1.4, 0.9]]
Just for information, my model is seq2seq and written in keras with the tensorflow backend.
This is my code so far; I need to correct rev_entropy:
class entropy_measure(Layer):

    def __init__(self, beta, batch, **kwargs):
        self.beta = beta
        self.batch = batch
        self.uses_learning_phase = True
        self.supports_masking = True
        super(entropy_measure, self).__init__(**kwargs)

    def call(self, x):
        return K.in_train_phase(self.rev_entropy(x, self.beta, self.batch), x)

    def get_config(self):
        config = {'beta': self.beta}
        base_config = super(entropy_measure, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))

    def rev_entropy(self, x, beta, batch):
        for i in x:
            i = pd.Series(i)
            p_data = i.value_counts()  # counts occurrence of each value
            entropy = entropy(p_data)  # get entropy from counts
            rev = 1 / (1 + entropy)
            return rev
        new_f_w_t = x * (rev.reshape(rev.shape[0], 1)) * beta
        return new_f_w_t
Any input is much appreciated:)
It looks like you have a series of questions that come together on this issue. I'll settle it here.
According to your code, you calculate entropy with scipy.stats.entropy:
scipy.stats.entropy(pk, qk=None, base=None)
Calculate the entropy of a distribution for given probability values.
If only probabilities pk are given, the entropy is calculated as S = -sum(pk * log(pk), axis=0).
Tensorflow does not provide a direct API to calculate entropy on each row of the tensor. What we need to do is to implement the above formula.
import tensorflow as tf
import pandas as pd
from scipy.stats import entropy

a = [1.1, 2.2, 3.3, 4.4, 2.2, 3.3]

res = entropy(pd.value_counts(a))

_, _, count = tf.unique_with_counts(tf.constant(a))
# [1 2 2 1]
prob = count / tf.reduce_sum(count)
# [0.16666667 0.33333333 0.33333333 0.16666667]
tf_res = -tf.reduce_sum(prob * tf.log(prob))

with tf.Session() as sess:
    print('scipy version: \n', res)
    print('tensorflow version: \n', sess.run(tf_res))
scipy version:
1.329661348854758
tensorflow version:
1.3296613488547582
Then we need to define a function and replace the Python for loop with tf.map_fn in your custom layer, following the code above.
def rev_entropy(self, x, beta, batch):
    def row_entropy(row):
        _, _, count = tf.unique_with_counts(row)
        prob = count / tf.reduce_sum(count)
        return -tf.reduce_sum(prob * tf.log(prob))

    value_ranges = [-10.0, 100.0]
    nbins = 50
    new_f_w_t = tf.histogram_fixed_width_bins(x, value_ranges, nbins)
    rev = tf.map_fn(row_entropy, new_f_w_t, dtype=tf.float32)

    new_f_w_t = x * 1 / (1 + rev) * beta
    return new_f_w_t
Note that this layer will not produce a gradient that can propagate backwards, since the entropy is computed from statistical counts of values, which are not differentiable. Maybe you need to rethink your hidden layer structure.

Output error rate per label / confusion matrix

I train an image classifier using Keras up to around 98% test accuracy. Now I know that the overall accuracy is 98%, but I want to know the accuracy/error per distinct class/label.
Does Keras have a built-in function for that, or would I have to test this myself per class/label?
Update: Thanks @gionni. I didn't know the actual term was "confusion matrix", but that's what I am actually looking for. That being said, is there a function to generate one? I have to use Keras 1.2.2 by the way.
I had a similar issue, so I can share my code with you. The following function computes the accuracy for a single class:
def single_class_accuracy(interesting_class_id):
    def fn(y_true, y_pred):
        class_id_preds = K.argmax(y_pred, axis=-1)
        # Replace class_id_preds with class_id_true for recall here
        positive_mask = K.cast(K.equal(class_id_preds, interesting_class_id), 'int32')
        true_mask = K.cast(K.equal(y_true, interesting_class_id), 'int32')
        acc_mask = K.cast(K.equal(positive_mask, true_mask), 'float32')
        class_acc = K.mean(acc_mask)
        return class_acc
    return fn
Now, if you want to get the accuracy for class 0, you could add it to the metrics while compiling the model:
model.compile(..., metrics=[..., single_class_accuracy(0)])
If you want the accuracy for all classes, you could type:
model.compile(...,
              metrics=[...] + [single_class_accuracy(i) for i in range(nb_of_classes)])
There may be better options, but you can use this:
import numpy as np

# gather each true label
distinct, counts = np.unique(trueLabels, axis=0, return_counts=True)

for dist, count in zip(distinct, counts):
    selector = (trueLabels == dist).all(axis=-1)
    selectedX = testData[selector]
    selectedY = trueLabels[selector]
    print('\n\nEvaluating for ' + str(count) + ' occurrences of class ' + str(dist))
    print(model.evaluate(selectedX, selectedY, verbose=0))