I followed the given MNIST tutorials and was able to train a model and evaluate its accuracy. However, the tutorials don't show how to make predictions with a trained model. I'm not interested in accuracy; I just want to use the model to predict a new example and see all the results (labels) in the output, each with its assigned score (sorted or not).

In the "Deep MNIST for Experts" example, see this line:
We can now implement our regression model. It only takes one line! We
multiply the vectorized input images x by the weight matrix W, add the
bias b, and compute the softmax probabilities that are assigned to
each class.
y = tf.nn.softmax(tf.matmul(x,W) + b)
Just pull on node y and you'll have what you want.
feed_dict = {x: [your_image]}
# sess is the tf.Session in which the model was trained
classification = sess.run(y, feed_dict)
print(classification)
This applies to just about any model you create - you'll have computed the prediction probabilities as one of the last steps before computing the loss.
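To see every label with its score, as the question asks, the probability vector returned for node y can be paired with the class indices and sorted. A minimal NumPy sketch, assuming `probs` stands in for one row of the softmax output; the logits are made up for illustration and the helper names are hypothetical:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

def ranked_scores(probs):
    """Return (label, score) pairs sorted from most to least likely."""
    order = np.argsort(probs)[::-1]
    return [(int(label), float(probs[label])) for label in order]

# Made-up logits for a single image with 10 classes (digits 0-9).
logits = np.array([0.1, 2.0, -1.0, 0.3, 0.0, 4.0, -0.5, 1.2, 0.7, -2.0])
probs = softmax(logits)

for label, score in ranked_scores(probs):
    print(label, round(score, 4))
```

With the session-based code above, `probs` would be `classification[0]`, the first (and only) row of the returned array.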

As @dga suggested, you need to run your new data instance through your already trained model.
Here is an example:
Assume you went through the first tutorial and calculated the accuracy of your model (the model is y = tf.nn.softmax(tf.matmul(x, W) + b)). Now you take your model and apply the new data point to it. In the following code I compute the output vector and take the position of its maximum value, then show the image and print that position.
from matplotlib import pyplot as plt
from random import randint
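# randint includes both endpoints, so subtract 1 to stay inside the array
num = randint(0, mnist.test.images.shape[0] - 1)
img = mnist.test.images[num]
# sess is the tf.Session in which the model was trained
classification = sess.run(tf.argmax(y, 1), feed_dict={x: [img]})
plt.imshow(img.reshape(28, 28))
plt.show()
print('NN predicted', classification[0])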

TensorFlow 2.0 compatible answer: Suppose you have built a Keras model as shown below:
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])
Then train and evaluate the model using the code below:
# model.compile(...) must be called before fit; it is not shown in this answer
model.fit(train_images, train_labels, epochs=10)
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
After that, if you want to predict the class of a particular image, you can do it using the code below (img must carry a leading batch dimension, for example shape (1, 28, 28)):
predictions_single = model.predict(img)
If you want to predict the classes of a set of images, you can use the code below:
predictions = model.predict(new_images)
where new_images is an array of images.
For more information, refer to this TensorFlow tutorial.
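Since predict works on batches, a single 28x28 image needs a leading batch axis, and each output row holds one score per class. The NumPy sketch below shows only that shape handling; `fake_predict` is a hypothetical stand-in for model.predict so the snippet runs without a trained model:

```python
import numpy as np

def fake_predict(batch):
    """Hypothetical stand-in for model.predict: one row of 10 scores per image."""
    if batch.ndim != 3:
        raise ValueError("expected shape (batch, 28, 28), got %s" % (batch.shape,))
    rng = np.random.default_rng(0)
    scores = rng.random((batch.shape[0], 10))
    return scores / scores.sum(axis=1, keepdims=True)

img = np.zeros((28, 28), dtype="float32")       # a single image
batch = np.expand_dims(img, axis=0)             # shape (1, 28, 28)

predictions_single = fake_predict(batch)        # shape (1, 10)
predicted_class = int(np.argmax(predictions_single[0]))

new_images = np.zeros((5, 28, 28), dtype="float32")
predictions = fake_predict(new_images)          # shape (5, 10)
predicted_classes = np.argmax(predictions, axis=1)  # one class index per image
```

The same `np.expand_dims` and `np.argmax(..., axis=1)` calls should apply unchanged to the arrays returned by a real Keras model.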

The question is specifically about the Google MNIST tutorial, which defines a predictor but doesn't apply it. Using guidance from Jonathan Hui's TensorFlow Estimator blog post, here is code which exactly fits the Google tutorial and does predictions:
from matplotlib import pyplot as plt
images = mnist.test.images[0:10]
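# The arguments to numpy_input_fn are cut off in the source; these are assumed, using the tutorial's feature key "x"
predict_input_fn = tf.estimator.inputs.numpy_input_fn(x={"x": images}, num_epochs=1, shuffle=False)
for image, p in zip(images, mnist_classifier.predict(input_fn=predict_input_fn)):
    print(p)
    plt.imshow(image.reshape(28, 28))
    plt.show()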


Using training weights on a non-training data to design a new loss function

I would like to access the training point(s) at a training iteration and incorporate a soft constraint into my loss function by using data points not included in the training set. I will use this post as a reference.
import numpy as np
import keras.backend as K
from keras.layers import Dense, Input
from keras.models import Model
# Some random training data and labels
features = np.random.rand(100, 5)
labels = np.random.rand(100, 3)
# Simple neural net with three outputs (input size matches the 5 feature columns)
input_layer = Input((5,))
hidden_layer = Dense(16)(input_layer)
output_layer = Dense(3)(hidden_layer)
# Model
model = Model(inputs=input_layer, outputs=output_layer)
# Each training point has another data pair. In the real example, I will have
# multiple supporters. That is why I am using a dict.
holder = np.random.rand(100, 5)
iter = np.arange(start=1, stop=features.shape[0], step=1)
supporters = {}
for i, j in zip(iter, holder):  # i represents the ith training data point
    supporters[i] = j
# Write a custom loss function
def custom_loss(y_true, y_pred):
    # Normal MSE loss
    mse = K.mean(K.square(y_true-y_pred), axis=-1)
    new_constraint = ....
    return mse + new_constraint
model.compile(loss=custom_loss, optimizer='sgd')
model.fit(features, labels, epochs=1, batch_size=1)
For simplicity, let us assume that I'd like to minimize the minimum absolute difference between the prediction for a training point and the predictions for its paired data stored in supporters, computed with the fixed network weights. Also, assume that I pass one training point per batch. However, I could not figure out how to perform this operation. I've tried something like what is shown below, but clearly it is not correct.
new_constraint = K.sum(y_pred -
Fit is the procedure for training and evaluating the model. I think it would be better for your problem to load a new instance of your model with your current weights and evaluate it on the batch in order to calculate the loss of the main model.
main_model = Model() # This is your main training model
def custom_loss_1(y_true, y_pred): # Avoid recursive calls
    mse = K.mean(K.square(y_true-y_pred), axis=-1)
    return mse
def custom_loss(y_true, y_pred):
    support_model = tf.keras.models.clone_model(main_model) # You copy the main model but the weights are uninitialized
    support_model.build(main_model.input_shape) # You build with inputs same as your support data
    support_model.compile(loss=custom_loss_1, optimizer='sgd')
    support_model.set_weights(main_model.get_weights()) # You load the weights of the main model
    mse = custom_loss_1(y_true, y_pred)
    # You just want to evaluate the model, not to train. If you have more
    # metrics than just loss then use support_model.evaluate(supporters)[0]
    new_constraint = K.sum(y_pred - support_model.predict(supporters)) # predict to get the output, evaluate to get the metrics
    return mse + new_constraint
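To make the intended quantity concrete, here is a NumPy sketch of the loss value for a single training point: the usual MSE plus the smallest mean absolute difference between the prediction and the predictions for that point's supporters. The linear `predict` function, the shapes, and the weighting factor `lam` are assumptions for illustration and are not part of the question's model.

```python
import numpy as np

def predict(weights, x):
    """Toy linear model standing in for the network with fixed weights."""
    return x @ weights

def combined_loss(y_true, y_pred, support_preds, lam=1.0):
    """MSE plus lam times the minimum mean absolute gap to any supporter prediction."""
    mse = np.mean((y_true - y_pred) ** 2)
    gaps = np.mean(np.abs(support_preds - y_pred), axis=1)  # one gap per supporter
    return mse + lam * np.min(gaps)

rng = np.random.default_rng(0)
weights = rng.random((5, 3))        # fixed weights: 5 features -> 3 outputs
x = rng.random(5)                   # one training point
y_true = rng.random(3)
supporters = rng.random((4, 5))     # four supporter points for this training point

y_pred = predict(weights, x)
support_preds = predict(weights, supporters)    # shape (4, 3)
loss = combined_loss(y_true, y_pred, support_preds)
print(loss)
```

Inside a Keras loss the same arithmetic would have to be written with backend tensor operations rather than NumPy so that gradients can flow through y_pred.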

How can I isolate why my tensorflow model has such a high loss and low accuracy?

The Context:
I am creating a test application that largely replicates the functionality described here.
I was able to run the code found in the tutorial linked above, and I see losses and accuracies that are reasonable, even after just a couple of epochs.
Tutorial Code: Early into the training of the two-headed CNN, losses and accuracy look good
This is because the code starts with the VGG16 model and the already trained weights, and it freezes those layers so that no learning is required for the core classification.
My test code largely replicates the tutorial structure. It uses the exact same dataset, and the already-trained VGG16 weights. However I load the image dataset using generators (rather than pulling all data into memory, as the tutorial does).
You can find how I created those generators in the answer provided here. I had struggled for a while, before I finally got it to a point that I think is correct.
The Problem:
When I train my model the classification loss and accuracy are as expected, however the bounding box loss grows, and the bounding box accuracy does not improve, over the epochs.
My Code: Even after just a couple epochs you see the bounding box loss starting to grow
Further Details:
I've spent a lot of time looking at the (image, target) tuples yielded by the generator, and I think I am handling the yielded data properly (including the unitrect).
A pycharm view of the images and target tuples yielded by generator
In fact I've also added a debug mode that allows me to display the images and rectangles fed into the training session.
A motorcycle with the bounding box as computed from the unit rectangle bounding box loaded from CSV into the dataframe (df); df is an input to flow_from_dataframe
The model I am using:
imodel = tf.keras.applications.vgg16.VGG16(weights=None, include_top=False,
                                           input_tensor=Input(shape=(224, 224, 3)))
imodel.load_weights(weights, by_name=True)
imodel.trainable = False
# flatten the max-pooling output of VGG
flatten = imodel.output
flatten = Flatten()(flatten)
# construct a fully-connected layer header to output the predicted
# bounding box coordinates
bboxHead = Dense(128, activation="relu")(flatten)
bboxHead = Dense(64, activation="relu")(bboxHead)
bboxHead = Dense(32, activation="relu")(bboxHead)
bboxHead = Dense(4, activation="sigmoid",
                 name="bounding_box")(bboxHead)
# construct a second fully-connected layer head, this one to predict
# the class label
softmaxHead = Dense(512, activation="relu")(flatten)
softmaxHead = Dropout(0.5)(softmaxHead)
softmaxHead = Dense(512, activation="relu")(softmaxHead)
softmaxHead = Dropout(0.5)(softmaxHead)
softmaxHead = Dense(len(classes), activation="softmax",
                    name="class_label")(softmaxHead)
# put together our model which accepts an input image and then outputs
# bounding box coordinates and a class label
model = Model(
    inputs=imodel.input,
    outputs=(bboxHead, softmaxHead))
# define a dictionary to set the loss methods -- categorical
# cross-entropy for the class label head and mean squared error
# for the bounding box head
losses = {
    "class_label": "categorical_crossentropy",
    "bounding_box": "mean_squared_error",
}
# define a dictionary that specifies the weights per loss (both the
# class label and bounding box outputs will receive equal weight)
lossWeights = {
    "class_label": 1.0,
    "bounding_box": 1.0
}
# initialize the optimizer, compile the model, and show the model
# summary
opt = Adam(learning_rate=learning_rate)
model.compile(loss=losses, optimizer=opt, metrics=["accuracy"], loss_weights=lossWeights)
My call to "fit":
model.fit(train_generator[0], steps_per_epoch=train_generator[1],
          validation_data=validation_generator[0], validation_steps=validation_generator[1],
          epochs=epochs, verbose=1)
The weights that I load are ones I've used in other experiments; I downloaded them from Kaggle (see vgg16_weights_tf_dim_ordering_tf_kernels.h5).
My Generator:
def generate_image_generator(generator, data_directory, df, subset, target_size, batch_size, shuffle, seed):
    genImages = generator.flow_from_dataframe(dataframe=df, directory=data_directory, target_size=target_size,
                                              y_col=['cls_onehot', 'bbox'],
                                              batch_size=batch_size, shuffle=shuffle, seed=seed)
    while True:
        images, labels = next(genImages)
        targets = {
            'class_label': labels[0],
            'bounding_box': np.array(labels[1], dtype="float32")
        }
        yield images, targets
def get_train_and_validate_generators(self, data_directory, files, max_images, validation_split, shuffle, seed, target_size):
    generator = ImageDataGenerator(validation_split=validation_split)
    df = get_dataframe(data_directory, files)
    if max_images:
        df = df.head(max_images)
    train_generator = generate_image_generator(generator, data_directory, df, "training",
                                               shuffle, seed)
    valid_generator = generate_image_generator(generator, data_directory, df, "validation",
                                               shuffle, seed)
Loading the dataframe from a list of CSV files:
def get_dataframe(data_directory, files):
    frames = []
    for di in files:
        df = pd.read_csv(data_directory+di["file"])
        frames.append(df)
    df = pd.concat(frames)
    df['cls_onehot'] = df['cls'].str.get_dummies().values.tolist()
    df['bbox'] = df[['sxu', 'syu', 'exu', 'eyu']].values.tolist()
    return df
A snippet of the CSV:
When I load weights from "imagenet", rather than using those I received from Kaggle, I see the very same increase in bounding box loss:
imodel = tf.keras.applications.vgg16.VGG16(weights="imagenet", include_top=False,
                                           input_tensor=Input(shape=(224, 224, 3)))
The Question:
Please provide suggestions on how to isolate this bounding box loss growth problem.
OK. It looks like the problem was not with my generator at all. The code was fine except for one silly oversight: I still had an old call to compile running. I called compile correctly the first time, with the composite loss function. Then I called it a second time with only categorical cross-entropy as the loss, effectively ignoring my bounding boxes.
Anyway, if someone stumbles on this post, I hope they find the complete view of how to do classification and object detection with a generator function useful.
I edited the above question with the correct details, so it now reflects the right answer.
I'd still like to get the perspective of the experts who have had to dig into the workings of a model to better understand the underlying details that lead to loss calculation.
Now that I'm starting to understand TensorFlow at a high level, it's clear how to recognize when things are working; it's not clear how to diagnose when things aren't working.
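One way to narrow down a problem like this is to check each piece in isolation before training. As a hedged example, the sketch below validates a batch of unit-rectangle targets and computes the bounding-box MSE by hand, so the number can be compared with what Keras reports for that head. The column order (sxu, syu, exu, eyu) follows the dataframe in the question; the helper names and sample values are hypothetical.

```python
import numpy as np

def check_unit_boxes(boxes):
    """Return a list of problems found in an (n, 4) array of unit boxes."""
    boxes = np.asarray(boxes, dtype="float32")
    if boxes.ndim != 2 or boxes.shape[1] != 4:
        return ["expected shape (n, 4), got %s" % (boxes.shape,)]
    problems = []
    if np.isnan(boxes).any():
        problems.append("NaN values present")
    if (boxes < 0).any() or (boxes > 1).any():
        problems.append("coordinates outside [0, 1]")
    if (boxes[:, 0] >= boxes[:, 2]).any() or (boxes[:, 1] >= boxes[:, 3]).any():
        problems.append("start corner is not above and left of end corner")
    return problems

def manual_mse(y_true, y_pred):
    """Mean squared error over all coordinates, computed outside the framework."""
    y_true = np.asarray(y_true, dtype="float32")
    y_pred = np.asarray(y_pred, dtype="float32")
    return float(np.mean((y_true - y_pred) ** 2))

good = [[0.1, 0.2, 0.5, 0.6], [0.0, 0.0, 1.0, 1.0]]
bad = [[0.1, 0.2, 1.5, 0.6]]

print(check_unit_boxes(good))
print(check_unit_boxes(bad))
print(manual_mse(good, [[0.1, 0.2, 0.5, 0.6], [0.5, 0.5, 0.5, 0.5]]))
```

Running checks like these on a few batches pulled from the generator, and printing the loss configuration right before fit, can help separate data problems from compile-time mistakes such as the duplicate compile call described above.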

fine-tuning huggingface DistilBERT for multi-class classification on custom dataset yields weird output shape on prediction

I'm trying to fine-tune huggingface's implementation of distilbert for multi-class classification (100 classes) on a custom dataset following the tutorial at
I'm doing so using Tensorflow, and fine-tuning in native tensorflow, that is, I use the following part of the tutorial for dataset creation:
import tensorflow as tf
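# The right-hand sides are cut off in the source; reconstructed from the tutorial's dataset-creation step
train_dataset = tf.data.Dataset.from_tensor_slices((dict(train_encodings), train_labels))
val_dataset = tf.data.Dataset.from_tensor_slices((dict(val_encodings), val_labels))
test_dataset = tf.data.Dataset.from_tensor_slices((dict(test_encodings), test_labels))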
And this one for fine-tuning:
from transformers import TFDistilBertForSequenceClassification
model = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')
optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
model.compile(optimizer=optimizer, loss=model.compute_loss) # can also use any keras loss fn
model.fit(train_dataset.shuffle(1000).batch(16), epochs=3, batch_size=16)
Everything seems to go fine with fine-tuning, but when I try to predict on the test dataset (2000 examples) using model.predict(test_dataset), the model seems to yield one prediction per token rather than one prediction per sequence.
That is, instead of getting an output of shape (1, 2000, 100), I get an output of shape (1, 1024000, 100), where 1024000 is the number of test examples (2000) * the sequence length (512).
Any hint on what's going on here?
(Sorry if this is naive, I'm very new to tensorflow).
I had exactly the same problem. I do not know why it happens, since the code looks right according to the tutorial.
But for me it worked to create NumPy arrays out of train_encodings and pass them directly to the fit method instead of creating the Dataset.
x1 = np.array(list(dict(train_encodings).values()))[0]
x2 = np.array(list(dict(train_encodings).values()))[1]
model.fit([x1,x2], train_labels, epochs=20)
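One plausible explanation, offered as an assumption rather than a confirmed diagnosis: the training dataset is batched with .batch(16), but test_dataset is passed to predict unbatched, so each 512-token example is itself treated as a batch of 512 inputs. The NumPy sketch below imitates that counting with a hypothetical `toy_output_shape`, which assumes one output row of 100 scores per leading-axis element of whatever it is fed.

```python
import numpy as np

NUM_EXAMPLES, SEQ_LEN, NUM_CLASSES = 2000, 512, 100

def toy_output_shape(chunks):
    """Hypothetical model: one output row per leading-axis element of each chunk."""
    rows = sum(np.asarray(chunk).shape[0] for chunk in chunks)
    return (rows, NUM_CLASSES)

input_ids = np.zeros((NUM_EXAMPLES, SEQ_LEN), dtype="int32")

# Unbatched: every element has shape (512,), so it looks like 512 separate inputs.
unbatched = [example for example in input_ids]
# Batched: every element has shape (16, 512), i.e. 16 sequences of 512 tokens.
batched = [input_ids[i:i + 16] for i in range(0, NUM_EXAMPLES, 16)]

print(toy_output_shape(unbatched))  # (1024000, 100)
print(toy_output_shape(batched))    # (2000, 100)
```

If this is indeed the cause, passing a batched dataset, for example model.predict(test_dataset.batch(16)), would be the first thing to try.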

Shouldn't same neural network weights produce same results?

So I am working with different deep learning frameworks as part of my research and have observed something weird (at least I cannot explain the cause of it).
I trained a fairly simple MLP model (on mnist dataset) in Tensorflow, extracted trained weights, created the same model architecture in PyTorch and applied the trained weights to PyTorch model. Now my expectation is to get same test accuracy from both Tensorflow and PyTorch models but this isn't the case. I get different results.
So my question is: If a model is trained to some optimal value, shouldn't the trained weights produce same results every time testing is done on the same dataset (regardless of the framework used)?
PyTorch Model:
class Net(nn.Module):
    def __init__(self) -> None:
        super(Net, self).__init__()
        self.fc1 = nn.Linear(784, 24)
        self.fc2 = nn.Linear(24, 10)
    def forward(self, x: Tensor) -> Tensor:
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x
Tensorflow Model:
def build_model() -> tf.keras.Model:
    # Build model layers
    model = models.Sequential()
    # Flatten Layer
    # Fully connected layer
    model.add(layers.Dense(24, activation='relu'))
    # compile the model
    # return newly built model
    return model
To extract weights from Tensorflow model and apply them to Pytorch model I use following functions:
Extract Weights:
def get_weights(model):
    # fetch latest weights
    weights = model.get_weights()
    # transpose weights
    t_weights = []
    for w in weights:
        t_weights.append(np.transpose(w))
    # return
    return t_weights
Apply Weights:
def set_weights(model, weights):
    """Set model weights from a list of NumPy ndarrays."""
    state_dict = OrderedDict(
        {k: torch.Tensor(v) for k, v in zip(model.state_dict().keys(), weights)}
    )
    model.load_state_dict(state_dict, strict=True)
Providing the solution in the answer section for the benefit of the community. From the comments:
If you are using the same weights in the same manner then results
should be the same, though float rounding error should also be
accounted for. Also, it doesn't matter whether the model is trained at all. You can
think of your model architecture as a chain of matrix multiplications
with element-wise nonlinearities in between. How big is the
difference? Are you comparing model outputs, or metrics computed over
the dataset? As a suggestion, initialize the model with some random values in
Keras and do a forward pass for a single batch. (paraphrased from jdehesa and Taras Sereda)
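A framework-free way to check the weight transfer is to do the forward pass by hand. Keras Dense stores its kernel as (inputs, outputs) and computes x @ W + b, while PyTorch nn.Linear stores weight as (outputs, inputs) and computes x @ W.T + b, which is why the kernels are transposed when moved across (transposing a 1-D bias changes nothing). The NumPy sketch below uses random weights, as the quote suggests, and compares the two conventions with a tolerance for float rounding; the function names are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def keras_style_forward(x, weights):
    """weights = [W1 (784, 24), b1 (24,), W2 (24, 10), b2 (10,)]"""
    w1, b1, w2, b2 = weights
    return relu(x @ w1 + b1) @ w2 + b2

def torch_style_forward(x, weights):
    """weights = [W1 (24, 784), b1 (24,), W2 (10, 24), b2 (10,)]"""
    w1, b1, w2, b2 = weights
    return relu(x @ w1.T + b1) @ w2.T + b2

rng = np.random.default_rng(0)
keras_weights = [
    rng.standard_normal((784, 24)).astype("float32"),
    rng.standard_normal(24).astype("float32"),
    rng.standard_normal((24, 10)).astype("float32"),
    rng.standard_normal(10).astype("float32"),
]
# Same transformation as get_weights above: transpose every array.
torch_weights = [np.transpose(w) for w in keras_weights]

batch = rng.random((8, 784)).astype("float32")
out_keras = keras_style_forward(batch, keras_weights)
out_torch = torch_style_forward(batch, torch_weights)
print("max abs difference:", float(np.max(np.abs(out_keras - out_torch))))
```

If the hand-computed outputs agree but the two frameworks still disagree, the remaining suspects are things outside the weights, such as input preprocessing or how the accuracy metric is computed.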

How to fix flatlined accuracy and NaN loss in tensorflow image classification

I am currently experimenting with TensorFlow and machine learning, and as a challenge, I decided to try and code a machine learning software, on the Kaggle website, that can analyze brain MRI scans and predict if a tumour exists or not. I did so with the code below and began training the model. However, the text that showed up during training showed that none of the loss values (training or validation) had proper values and that the accuracies flatlined, or fluctuated between two numbers (the same numbers each time).
I have looked at other posts but was unable to find anything that helped. I changed my loss function (from sparse_categorical_crossentropy to binary_crossentropy), but this did not change the values.
import os
from random import shuffle
import cv2
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import tensorflow as tf
from tensorflow import keras
# IMG_SIZE is used below, but its definition is not shown in the post
data_path = "../input/brain_tumor_dataset"
data = []
folders = os.listdir(data_path)
for folder in folders:
    for file in os.listdir(os.path.join(data_path, folder)):
        if file.endswith("jpg") or file.endswith("jpeg") or file.endswith("png") or file.endswith("JPG"):
            data.append(os.path.join(data_path, folder, file))
images = []
labels = []
for file in data:
    img = cv2.imread(file)
    img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))
    images.append(img)
    # The branch bodies are missing in the source; 1 = tumour, 0 = no tumour is assumed
    if "Y" in file:
        labels.append(1)
    else:
        labels.append(0)
union_list = list(zip(images, labels))
shuffle(union_list)
images, labels = zip(*union_list)
images = np.array(images)
labels = np.array(labels)
train_img = images[:200]
train_lbl = labels[:200]
val_img = images[200:]
val_lbl = labels[200:]
train_img = np.array(train_img)
val_img = np.array(val_img)
train_img = train_img.astype("float32") / 255.0
val_img = val_img.astype("float32") / 255.0
model = keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), padding='same', activation=tf.nn.relu, input_shape=(IMG_SIZE, IMG_SIZE, 3)),
    tf.keras.layers.MaxPooling2D((2,2), strides=2),
    tf.keras.layers.Conv2D(64, (3, 3), padding='same', activation=tf.nn.relu),
    tf.keras.layers.MaxPooling2D((2,2), strides=2),
    # Flatten is not shown in the source; added so the Dense layers receive a 2-D tensor
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation=tf.nn.relu),
    tf.keras.layers.Dense(1, activation=tf.nn.sigmoid)
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
history = model.fit(train_img, train_lbl, epochs=100, validation_data=(val_img, val_lbl))
This should give a result with increasing accuracy, and decreasing loss, but the loss is nan, and the accuracy is flatlined.
I managed to solve the problem. I looked at my code again and realized that my output layer only had one node. However, it needed to output the probabilities for two different categories ('yes' or 'no' for whether it is a tumour or not). Once I changed it to 2 nodes, the network began working properly and reached 95% accuracy on both the training and validation sets.
My validation accuracy still fluctuates a little between a few values, but this is most likely because I only have 23 images in the validation set. In order to decrease the fluctuations, however, I also decreased the epoch number to just 10. Everything seems to be great now.
It's likely the cause of the flatlining accuracy is the NaN loss. I'd try to figure out at what point in the computation the loss is becoming NaN (in inference? in the optimiser? in the loss calculation?). This post details some methods for outputting these intermediate values.
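As a small illustration of checking where a NaN could enter, the sketch below computes binary cross-entropy by hand in NumPy and tests arrays for NaN or inf values. It is a diagnostic sketch under the assumption that labels are 0/1 and predictions are probabilities; the helper names are hypothetical and the clipping constant `eps` is an arbitrary choice.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy with predictions clipped away from 0 and 1."""
    y_true = np.asarray(y_true, dtype="float64")
    y_pred = np.clip(np.asarray(y_pred, dtype="float64"), eps, 1.0 - eps)
    per_sample = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return float(np.mean(per_sample))

def report_non_finite(name, array):
    """Return a short message if the array holds NaN or inf, else None."""
    array = np.asarray(array, dtype="float64")
    bad = ~np.isfinite(array)
    if bad.any():
        return "%s contains %d non-finite values" % (name, int(bad.sum()))
    return None

labels = np.array([1, 0, 1, 0])
preds = np.array([0.9, 0.1, 1.0, 0.0])    # includes exact 0 and 1
loss = binary_crossentropy(labels, preds)
print(loss)

bad_inputs = np.array([0.5, np.nan, 0.2])
print(report_non_finite("images", bad_inputs))
print(report_non_finite("preds", preds))
```

Applying `report_non_finite` to the input images, the labels, and the model outputs for one batch is one way to see which stage first produces a non-finite value.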