Is it possible to save a histogram during evaluation using the Estimator API?
I couldn't find a solution, since the Estimator API does not write any summaries during evaluation and I can only add scalars to the evaluation metrics.

For the sake of those who came here and haven't found a solution: I used the SummarySaverHook approach from the other answer, with a slight modification:
summary_writer = tf.compat.v1.summary.FileWriter(
logdir=self.model_dir + '/eval_histograms/',
summary_ops = [tf.compat.v1.summary.histogram(name=k, values=v)
for k, v in
eval_hooks = [
for summary_op in summary_ops]
And it worked fine!

You can use SummarySaverHook:
eval_hooks = []
eval_summary_hook = tf.train.SummarySaverHook(
summary_op=tf.summary.histogram(, logits))
return tf.estimator.EstimatorSpec(mode=mode,


Metrics using batches v/s metrics using full dataset

I am training an image classification model using the pre-trained mobile network. During training, I see very high values (more than 70%) for accuracy, precision, recall, and F1-score on both the training dataset and the validation dataset.
For me, this is an indication that my model is learning fine.
But when I compute these metrics on the unbatched training and unbatched validation datasets, they are very low, not even 1%.
"Unbatched" means that I do not calculate the metrics per batch and then average them to get the final values, which is what TensorFlow/Keras does during model training. Instead, I calculate the metrics on the full dataset in a single run.
I am unable to find out what is causing this behaviour. Please help me understand what causes the difference and how to make the results consistent between the two (a minor difference is acceptable).
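For scale, here is a small framework-free sketch of how far the two schemes described above can drift apart. It assumes the "average of per-batch values" scheme from the question. The labels, predictions, and helper names are made up for illustration. A smaller last batch shifts the batch-averaged number a little, which is the kind of minor difference one would expect, not a drop from 70% to under 1%.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where prediction and label agree."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def batch_averaged_accuracy(y_true, y_pred, batch_size):
    """Compute accuracy per batch, then take the plain mean of the batch values."""
    scores = []
    for start in range(0, len(y_true), batch_size):
        stop = start + batch_size
        scores.append(accuracy(y_true[start:stop], y_pred[start:stop]))
    return sum(scores) / len(scores)


# Hypothetical labels and predictions, 10 examples, 3 classes.
y_true = [0, 1, 2, 1, 0, 2, 1, 1, 0, 2]
y_pred = [0, 1, 1, 1, 0, 2, 2, 1, 0, 0]

full_accuracy = accuracy(y_true, y_pred)
batch_averaged = batch_averaged_accuracy(y_true, y_pred, batch_size=4)

print("full dataset:", full_accuracy)
print("batch averaged:", batch_averaged)
```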
Code that I used for evaluating metrics
My old code
def test_model(model, data, CLASSES, label_one_hot=True, average="micro",
               threshold_analysis=False, thres_analysis_start_point=0.0,
               thres_analysis_end_point=0.95, thres_step=0.05, classwise_analysis=False,
    images_ds = image, label: image)
    labels_ds = image, label: label).unbatch()
    NUM_VALIDATION_IMAGES = count_data_items(tf_records_filenames=data)
    cm_correct_labels = next(iter(labels_ds.batch(NUM_VALIDATION_IMAGES))).numpy()  # get everything as one batch
    if label_one_hot is True:
        cm_correct_labels = np.argmax(cm_correct_labels, axis=-1)
    cm_probabilities = model.predict(images_ds)
    cm_predictions = np.argmax(cm_probabilities, axis=-1)
    overall_score = f1_score(cm_correct_labels, cm_predictions, labels=range(len(CLASSES)), average=average)
    overall_precision = precision_score(cm_correct_labels, cm_predictions, labels=range(len(CLASSES)), average=average)
    overall_recall = recall_score(cm_correct_labels, cm_predictions, labels=range(len(CLASSES)), average=average)
    # cmat = (cmat.T / cmat.sum(axis=1)).T # normalized
    # print('f1 score: {:.3f}, precision: {:.3f}, recall: {:.3f}'.format(score, precision, recall))
    overall_test_results = {'overall_f1_score': overall_score, 'overall_precision': overall_precision, 'overall_recall': overall_recall}
    if classwise_analysis is True:
        label_index_dict = get_index_label_from_tf_record(dataset=data)
        label_index_dict = {k: v for k, v in sorted(list(label_index_dict.items()))}
        label_index_df = pd.DataFrame(label_index_dict, index=[0]).T.reset_index().rename(columns={'index': 'class_ind', 0: 'class_names'})
        # Class wise precision, recall and f1_score
        classwise_score = f1_score(cm_correct_labels, cm_predictions, labels=range(len(CLASSES)), average=None)
        classwise_precision = precision_score(cm_correct_labels, cm_predictions, labels=range(len(CLASSES)), average=None)
        classwise_recall = recall_score(cm_correct_labels, cm_predictions, labels=range(len(CLASSES)), average=None)
        ind_class_count_df = class_ind_counter_from_tfrecord(data)
        ind_class_count_df = ind_class_count_df.merge(label_index_df, how='left', left_on='class_names', right_on='class_names')
        classwise_test_results = {'classwise_f1_score': classwise_score, 'classwise_precision': classwise_precision,
                                  'classwise_recall': classwise_recall, 'class_names': CLASSES}
        classwise_test_results_df = pd.DataFrame(classwise_test_results)
        if produce_confusion_matrix is True:
            cmat = confusion_matrix(cm_correct_labels, cm_predictions, labels=range(len(CLASSES)))
            return overall_test_results, classwise_test_results, cmat
        return overall_test_results, classwise_test_results
    if produce_confusion_matrix is True:
        cmat = confusion_matrix(cm_correct_labels, cm_predictions, labels=range(len(CLASSES)))
        return overall_test_results, cmat
    return overall_test_results
Just to make sure that my model testing function is correct, I wrote a newer version of the code using TensorFlow metrics.
def eval_model(y_true, y_pred):
    eval_results = {}
    unbatch_accuracy = tf.keras.metrics.CategoricalAccuracy(name='unbatch_accuracy')
    unbatch_recall = tf.keras.metrics.Recall(name='unbatch_recall')
    unbatch_precision = tf.keras.metrics.Precision(name='unbatch_precision')
    unbatch_f1_micro = tfa.metrics.F1Score(name='unbatch_f1_micro', num_classes=n_labels, average='micro')
    unbatch_f1_macro = tfa.metrics.F1Score(name='unbatch_f1_macro', num_classes=n_labels, average='macro')
    unbatch_accuracy.update_state(y_true, y_pred)
    unbatch_recall.update_state(y_true, y_pred)
    unbatch_precision.update_state(y_true, y_pred)
    unbatch_f1_micro.update_state(y_true, y_pred)
    unbatch_f1_macro.update_state(y_true, y_pred)
    eval_results['unbatch_accuracy'] = unbatch_accuracy.result().numpy()
    eval_results['unbatch_recall'] = unbatch_recall.result().numpy()
    eval_results['unbatch_precision'] = unbatch_precision.result().numpy()
    eval_results['unbatch_f1_micro'] = unbatch_f1_micro.result().numpy()
    eval_results['unbatch_f1_macro'] = unbatch_f1_macro.result().numpy()
    return eval_results
Both functions give nearly the same results.
Please suggest what is going on here.
I am not sure, but this suggestion may help: the metric states may be reset at each epoch, so the values reported during training may not be cumulative.
After spending many hours, I found the issue was due to the shuffle function. I was using the below function to shuffle, batch and prefetch the dataset.
def shuffle_batch_prefetch(dataset, prefetch_size=1, batch_size=16,
    if shuffle_buffer_size is None:
        raise ValueError("shuffle_buffer_size can't be None")
    def shuffle_fn(ds):
        return ds.shuffle(buffer_size=shuffle_buffer_size, seed=108)
    dataset = dataset.apply(shuffle_fn)
    dataset = dataset.batch(batch_size, drop_remainder=drop_remainder)
    dataset = dataset.prefetch(buffer_size=prefetch_size)
    return dataset
Part of the function that causes the problem
    def shuffle_fn(ds):
        return ds.shuffle(buffer_size=shuffle_buffer_size, seed=108)
    dataset = dataset.apply(shuffle_fn)
I removed the shuffle part and the metrics are back to what I expected.
Function after removing the shuffle part
def shuffle_batch_prefetch(dataset, prefetch_size=1, batch_size=16,
    dataset = dataset.batch(batch_size, drop_remainder=drop_remainder)
    dataset = dataset.prefetch(buffer_size=prefetch_size)
    return dataset
Results after removing the shuffle part
I am still not able to understand why shuffling causes this error. Shuffling is considered best practice before training on your data. However, I had already shuffled the training data when reading it, so removing this step was not a problem for me.
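A likely explanation, assuming test_model reads the dataset twice (once to collect labels_ds and once inside model.predict(images_ds)): Dataset.shuffle reshuffles on every pass by default (reshuffle_each_iteration=True), and a fixed seed makes the sequence of orders reproducible between runs, not identical between passes. The labels and the predictions then come out in different orders, so even a perfect model scores close to chance. The sketch below imitates this with Python's random module instead of tf.data. The class count, example count, and helper names are made up for illustration.

```python
import random

NUM_CLASSES = 100     # hypothetical number of classes
NUM_EXAMPLES = 5000   # hypothetical dataset size

label_rng = random.Random(0)
labels = [label_rng.randrange(NUM_CLASSES) for _ in range(NUM_EXAMPLES)]


def accuracy(y_true, y_pred):
    """Fraction of positions where prediction and label agree."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def one_pass(examples, shuffler):
    """Return the examples in a freshly shuffled order, like one pass over a reshuffling dataset."""
    order = list(range(len(examples)))
    shuffler.shuffle(order)
    return [examples[i] for i in order]


# One seeded shuffler shared by both passes: each pass gets a different order.
shuffler = random.Random(108)
labels_pass = one_pass(labels, shuffler)       # first pass: collect the labels
predictions_pass = one_pass(labels, shuffler)  # second pass: a perfect model predicts each example's own label

aligned_accuracy = accuracy(labels, labels)                    # same order on both sides
misaligned_accuracy = accuracy(labels_pass, predictions_pass)  # orders differ between passes

print("aligned:", aligned_accuracy)
print("misaligned:", misaligned_accuracy)
```

If this is the cause, collecting labels and predictions in the same pass over the dataset, or shuffling only the training pipeline, should keep the two aligned.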

How to use an optimizer within a forward pass in PyTorch

I want to use an optimizer within the forward pass of a custom defined Function, but it doesn't work. My code is as follows:
class MyFct(Function):
    @staticmethod
    def forward(ctx, *args):
        input, weight, bias = args[0], args[1], args[2]
        y = torch.tensor([[0]], dtype=torch.float, requires_grad=True)  # initial guess
        loss_fn = lambda y_star: (input + weight - y_star)**2
        learning_rate = 1e-4
        optimizer = torch.optim.Adam([y], lr=learning_rate)
        for t in range(5000):
            y_star = y
            loss = loss_fn(y_star)
            if t % 100 == 99:
                print(t, loss.item())
            optimizer.zero_grad()
            loss.backward()  # this is where the RuntimeError is raised
            optimizer.step()
        return y_star
And these are my test inputs:
x = torch.tensor([[2]], dtype=torch.float, requires_grad=True)
w = torch.tensor([[2]], dtype=torch.float, requires_grad=True)
y = torch.tensor([[6]], dtype=torch.float)
fct = MyFct.apply
y_hat = fct(x, w, None)
I always get the RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn.
Also, I've tested the optimization outside of forward() and it works, so I guess it has something to do with the context? According to the documentation, "Tensor arguments that track history (i.e., with requires_grad=True) will be converted to ones that don’t track history before the call, and their use will be registered in the graph". Is this the problem? Is there a way to work around it?
I am new to PyTorch and I wonder what I'm overlooking. Any help and explanation is appreciated.
I think I found an answer: I need to wrap the optimization in a with torch.enable_grad(): block.
However, I still don't understand why it's necessary to convert the original Tensors to ones that don’t track history in forward().
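Here is a minimal sketch of that workaround, assuming the goal is to solve for y* = argmin (input + weight - y)^2 inside forward(). The class name InnerSolve, the learning rate, and the step count are my own choices, and the backward() is an assumption based on the closed-form solution y* = input + weight, not something from the original post.

```python
import torch
from torch.autograd import Function


class InnerSolve(Function):
    @staticmethod
    def forward(ctx, inp, weight):
        # Gradient tracking is switched off inside forward(), so the inner
        # optimization has to re-enable it explicitly.
        with torch.enable_grad():
            target = (inp + weight).detach()
            y = torch.zeros_like(target, requires_grad=True)  # initial guess
            optimizer = torch.optim.Adam([y], lr=0.1)
            for _ in range(500):
                optimizer.zero_grad()
                loss = ((target - y) ** 2).sum()
                loss.backward()
                optimizer.step()
        # Return a tensor that is detached from the inner optimization graph.
        return y.detach()

    @staticmethod
    def backward(ctx, grad_output):
        # Assumption: y* = inp + weight, so d(y*)/d(inp) = d(y*)/d(weight) = 1.
        return grad_output, grad_output


x = torch.tensor([[2.0]], requires_grad=True)
w = torch.tensor([[2.0]], requires_grad=True)
y_hat = InnerSolve.apply(x, w)
y_hat.sum().backward()
print(y_hat, x.grad, w.grad)
```

The outer graph only sees InnerSolve as a single node. Everything recorded by the inner loop is discarded once forward() returns, which is why the result is returned detached.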

Excluding slim.assign_from_checkpoint searching for Momentum variables

I am trying to fine-tune the vgg_16 model with the Momentum optimizer. For this, I use the pretrained models.
Before fine-tuning, I assign the variable values from the checkpoint as follows:
variables_to_restore = slim.get_variables_to_restore(exclude=["vgg_16/fc8"])
init_assign_op, init_feed_dict = slim.assign_from_checkpoint(model_path, variables_to_restore)
Note that I do not exclude the vgg_16/*/*/Momentum variables. Hence I receive an error,
ValueError: Checkpoint is missing variable [vgg_16/conv1/conv1_1/weights/Momentum],
as expected.
My problem is that including all the Momentum variables in the exclude list is very cumbersome. Is there a smarter way to exclude just the Momentum variables?
This is important since entering the exclusions manually is impractical for large models such as resnet.
Thank you in advance!
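You can solve this problem with the following code:
def _init_fn():
    # `exclusions` is a list of scope prefixes to skip, e.g. ["vgg_16/fc8"].
    variables_to_restore = []
    # slim.get_model_variables() returns only model variables, so optimizer
    # slots such as .../Momentum never end up in the restore list.
    for var in slim.get_model_variables():
        excluded = False
        for exclusion in exclusions:
            if var.op.name.startswith(exclusion):
                excluded = True
                break
        if not excluded:
            variables_to_restore.append(var)
    if tf.gfile.IsDirectory(FLAGS.checkpoint_path):
        checkpoint_path = tf.train.latest_checkpoint(FLAGS.checkpoint_path)
    else:
        checkpoint_path = FLAGS.checkpoint_path
    tf.logging.info('Fine-tuning from %s' % checkpoint_path)
    return slim.assign_from_checkpoint_fn(checkpoint_path, variables_to_restore)
Pass the function it returns as the init_fn argument: slim.learning.train(..., init_fn=_init_fn())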
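Another option, assuming the optimizer slot variables can be recognised by the name suffix seen in the error message ("/Momentum"), is to filter the restore list by name. The sketch below uses plain strings as stand-ins for variables so it runs without TensorFlow. The helper name drop_optimizer_slots and the sample names are hypothetical. With real variables you would filter on var.op.name and pass the remaining variables to slim.assign_from_checkpoint.

```python
def drop_optimizer_slots(variable_names, slot_suffixes=("/Momentum",)):
    """Keep only the names that do not end with an optimizer slot suffix."""
    return [name for name in variable_names if not name.endswith(slot_suffixes)]


# Hypothetical variable names, as they might appear after building the train op.
checkpoint_candidates = [
    "vgg_16/conv1/conv1_1/weights",
    "vgg_16/conv1/conv1_1/weights/Momentum",
    "vgg_16/conv1/conv1_1/biases",
    "vgg_16/conv1/conv1_1/biases/Momentum",
]

kept = drop_optimizer_slots(checkpoint_candidates)
print(kept)
```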

Output error rate per label / confusion matrix

I trained an image classifier using Keras up to around 98% test accuracy. Now I know that the overall accuracy is 98%, but I want to know the accuracy/error per distinct class/label.
Does Keras have a built-in function for that, or would I have to test this myself per class/label?
Update: Thanks @gionni. I didn't know the actual term was "confusion matrix", but that's what I am actually looking for. That being said, is there a function to generate one? I have to use Keras 1.2.2, by the way.
I had a similar issue, so I can share my code with you. The following function computes the accuracy for a single class:
def single_class_accuracy(interesting_class_id):
    def fn(y_true, y_pred):
        class_id_preds = K.argmax(y_pred, axis=-1)
        # Replace class_id_preds with class_id_true for recall here
        positive_mask = K.cast(K.equal(class_id_preds, interesting_class_id), 'int32')
        true_mask = K.cast(K.equal(y_true, interesting_class_id), 'int32')
        acc_mask = K.cast(K.equal(positive_mask, true_mask), 'float32')
        class_acc = K.mean(acc_mask)
        return class_acc
    return fn
Now, if you want the accuracy for class 0, you can add it to the metrics when compiling the model:
model.compile(..., metrics=[..., single_class_accuracy(0)])
If you want the accuracy for every class, you can write:
model.compile(..., metrics=[...] + [single_class_accuracy(i) for i in range(nb_of_classes)])
There may be better options, but you can use this:
import numpy as np
# gather each true label
distinct, counts = np.unique(trueLabels, axis=0, return_counts=True)
for dist, count in zip(distinct, counts):
    selector = (trueLabels == dist).all(axis=-1)
    selectedX = testData[selector]
    selectedY = trueLabels[selector]
    print('\n\nEvaluating for ' + str(count) + ' occurrences of class ' + str(dist))
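Since the update asks for a confusion matrix, here is a small framework-free sketch that does not depend on the Keras version. It assumes you already have integer class ids for the true labels and for the predictions (for example the argmax over each row of the model.predict output). The function names and the sample data are made up for illustration.

```python
def confusion_matrix(true_ids, pred_ids, num_classes):
    """matrix[t][p] counts the examples of true class t that were predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_ids, pred_ids):
        matrix[t][p] += 1
    return matrix


def per_class_accuracy(matrix):
    """For each true class, the fraction of its examples predicted correctly (per-class recall)."""
    result = []
    for class_id, row in enumerate(matrix):
        total = sum(row)
        result.append(row[class_id] / total if total else float("nan"))
    return result


# Hypothetical class ids for six test examples and three classes.
true_ids = [0, 0, 1, 1, 2, 2]
pred_ids = [0, 1, 1, 1, 2, 0]

cmat = confusion_matrix(true_ids, pred_ids, num_classes=3)
class_acc = per_class_accuracy(cmat)
print(cmat)
print(class_acc)
```

The off-diagonal entries of a row show which classes a given label gets confused with, and one minus the per-class value is the error rate for that label.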

Tensorflow: Don't Update if gradient is Nan

I have a deep model to train on CIFAR-10. Training works fine with CPU. However, when I use GPU support, it causes gradients for some batches to be NaNs (I checked it using tf.check_numerics) and it happens randomly but early enough. I believe the problem is related to my GPU.
My question is: is there a way to skip the update if at least one of the gradients contains NaNs, and force the model to proceed to the next batch?
Edit: Perhaps I should elaborate more on my problem.
This is how I apply the gradients:
with tf.control_dependencies([tf.check_numerics(grad, message='Gradient %s check failed, possible NaNs' % for grad, var in grads]):
    # Apply the gradients to adjust the shared variables.
    apply_gradient_op = opt.apply_gradients(grads, global_step=global_step)
I have thought of using tf.check_numerics first to verify whether there are NaNs in the gradients and then, if there are (the check failed), "pass" without calling opt.apply_gradients. However, is there a way to catch an error with tf.control_dependencies?
I figured it out, albeit not in the most elegant way.
My solution is as follows:
1) check all gradients first
2) if the gradients are NaN-free, apply them
3) otherwise, apply a fake update (with zero values); this needs a gradient override.
This is my code:
First, define a custom gradient:
@tf.RegisterGradient("ZeroGrad")  # registered under the name used in gradient_override_map
def _zero_grad(unused_op, grad):
    return tf.zeros_like(grad)
Then define an exception-handling function:
# this is added for gradient check of NaNs
def check_numerics_with_exception(grad, var):
    try:
        tf.check_numerics(grad, message='Gradient %s check failed, possible NaNs' %
    except:
        return tf.constant(False, shape=())
    else:
        return tf.constant(True, shape=())
Then create the conditional node:
num_nans_grads = tf.Variable(1.0, name='num_nans_grads')
check_all_numeric_op = tf.reduce_sum(tf.cast(tf.stack([tf.logical_not(check_numerics_with_exception(grad, var)) for grad, var in grads]), dtype=tf.float32))
with tf.control_dependencies([tf.assign(num_nans_grads, check_all_numeric_op)]):
    # Apply the gradients to adjust the shared variables.
    def fn_true_apply_grad(grads, global_step):
        apply_gradients_true = opt.apply_gradients(grads, global_step=global_step)
        return apply_gradients_true

    def fn_false_ignore_grad(grads, global_step):
        # print('batch update ignored due to nans, fake update is applied')
        g = tf.get_default_graph()
        with g.gradient_override_map({"Identity": "ZeroGrad"}):
            for (grad, var) in grads:
                tf.assign(var, tf.identity(var, name="Identity"))
            apply_gradients_false = opt.apply_gradients(grads, global_step=global_step)
        return apply_gradients_false

    apply_gradient_op = tf.cond(tf.equal(num_nans_grads, 0.), lambda: fn_true_apply_grad(grads, global_step), lambda: fn_false_ignore_grad(grads, global_step))
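Stripped of the graph machinery, the check-then-apply logic of the three steps above comes down to the control flow below. This is a plain-Python sketch with a hand-written SGD update, not TensorFlow code. The function names, the learning rate, and the list-of-lists layout for parameters are assumptions made for illustration. In a TensorFlow graph the same predicate could presumably be built from tf.reduce_all over tf.is_finite of each gradient and fed to tf.cond.

```python
import math


def all_finite(gradients):
    """True only if no gradient entry is NaN or infinite."""
    return all(math.isfinite(g) for grad in gradients for g in grad)


def sgd_step_skip_nonfinite(params, gradients, learning_rate=0.1):
    """Apply one SGD update, or leave the parameters untouched if any gradient is non-finite.

    Returns (new_params, applied), where applied says whether the update happened.
    """
    if not all_finite(gradients):
        return params, False  # skip this batch and move on to the next one
    updated = [
        [p - learning_rate * g for p, g in zip(param, grad)]
        for param, grad in zip(params, gradients)
    ]
    return updated, True


params = [[1.0, 2.0], [3.0]]

# A batch whose gradients contain a NaN: the update is skipped.
skipped_params, skipped_applied = sgd_step_skip_nonfinite(params, [[0.5, float("nan")], [1.0]])

# A clean batch: the update is applied.
new_params, new_applied = sgd_step_skip_nonfinite(params, [[0.5, 1.0], [1.0]])
print(skipped_params, skipped_applied)
print(new_params, new_applied)
```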