mixed_precision - not learning anything - tensorflow

I am trying to use mixed_precision for my neural network. Currently I am training with only one image and no augmentation. The network learns normally when I am not using
policy = tf.keras.mixed_precision.Policy('mixed_float16')
tf.keras.mixed_precision.set_global_policy(policy)
but with mixed precision enabled the loss stays constant.
[Plot: training loss without mixed precision]
[Plot: training loss with mixed precision]
Creating Optimizer:
optimizer = tf.keras.optimizers.RMSprop()
optimizer = tf.keras.mixed_precision.LossScaleOptimizer(optimizer)
Training step:
with tf.GradientTape() as tape:
    heat_pred = self.model(image_batch, training=True)
    loss = self.getTotalLoss(heat_pred, annotation_batch)
    print("Training loss: {}".format(loss))
    scaled_loss = optimizer.get_scaled_loss(loss)
    print("Training scaled_loss: {}".format(scaled_loss))
scaled_gradients = tape.gradient(scaled_loss, self.model.trainable_variables)
gradients = optimizer.get_unscaled_gradients(scaled_gradients)
optimizer.apply_gradients(zip(gradients, self.model.trainable_variables))
Has anyone seen something like this?
EDIT:
If I check the gradients they are not 0. They look something like:
[-2.203200e+04, 4.542500e+02, 1.624000e+03, ...,
6.125000e+02, 6.860000e+02, 7.970000e+02],
[-1.819000e+03, 2.240625e+01, 1.200625e+02, ...,
4.284375e+01, 6.003125e+01, 7.137500e+01],
[-1.928000e+04, 3.230000e+02, 1.611000e+03, ...,
4.502500e+02, 8.300000e+02, 7.055000e+02]]]], dtype=float32)>, <tf.Tensor: shape=(18,), dtype=float32, numpy=
array([-40192., 981., 3306., 1214., 2340., 2396., 1392.,
2808., 2060., 3936., 2304., 4408., 2352., 3656.,
2282., 1054., 1890., 1842.], dtype=float32)>]
But when optimizer.apply_gradients(zip(gradients, self.model.trainable_variables)) is called, the weights don't seem to be updated (since the loss stays constant).
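A minimal debugging sketch, assuming the same self.model, optimizer, image_batch and annotation_batch as above (meant to be dropped into the same training-step method): it checks whether apply_gradients actually changed any weight and prints the current loss scale, since a dynamic LossScaleOptimizer skips the update and lowers the scale whenever it finds non-finite gradients.

before = [v.numpy().copy() for v in self.model.trainable_variables]

with tf.GradientTape() as tape:
    heat_pred = self.model(image_batch, training=True)
    loss = self.getTotalLoss(heat_pred, annotation_batch)
    scaled_loss = optimizer.get_scaled_loss(loss)
scaled_gradients = tape.gradient(scaled_loss, self.model.trainable_variables)
gradients = optimizer.get_unscaled_gradients(scaled_gradients)
optimizer.apply_gradients(zip(gradients, self.model.trainable_variables))

after = [v.numpy() for v in self.model.trainable_variables]
max_change = max(float(tf.reduce_max(tf.abs(a - b))) for a, b in zip(after, before))
print("current loss scale:", optimizer.loss_scale.numpy())  # drops after steps with overflowing gradients
print("largest weight change:", max_change)                 # 0.0 means no update was applied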

Related

Tensorflow 2: How to fit a subclassed model that returns multiple values in the call method?

I built the following model via Model Subclassing in TensorFlow 2:
from tensorflow.keras import Model, Input
from tensorflow.keras.applications import DenseNet201
from tensorflow.keras.applications.densenet import preprocess_input
from tensorflow.keras.layers import Flatten, Dense
class Detector(Model):
    def __init__(self, num_classes=3, name="DenseNet201"):
        super(Detector, self).__init__(name=name)
        self.feature_extractor = DenseNet201(
            include_top=False,
            weights="imagenet",
        )
        self.feature_extractor.trainable = False
        self.flatten_layer = Flatten()
        self.prediction_layer = Dense(num_classes, activation=None)

    def call(self, inputs):
        x = preprocess_input(inputs)
        extracted_feature = self.feature_extractor(x, training=False)
        x = self.flatten_layer(extracted_feature)
        y_hat = self.prediction_layer(x)
        return extracted_feature, y_hat
The subsequent steps are compiling and fitting the model. The model compiled as normal, but when fitting on my image generator (built from ImageDataGenerator) I encountered the error: InvalidArgumentError: Incompatible shapes: [64,18,18] vs. [64,1] [[node Equal (defined at :19) ]] [Op:__inference_train_function_32187] Function call stack: train_function.
history = detector.fit(
    train_generator,
    epochs=1,
    validation_data=val_generator,
    callbacks=callbacks
)
This is obvious because TensorFlow does not know whether the prediction is y_hat or extracted_feature during detector.fit() and thus throws an error. So, what is the right implementation of detector.fit for my case?
Following this question-answer, you should first train your model with (let's say) one input and one output. Later, if you want to compute grad-cam, you would pick some intermediate layer of your base model (not the final output of the base model), and in that case you need to build your feature extractor separately. For example:
# (let's say: one input and one output)
# used for training
base_model = keras.applications.DenseNet201(...)   # or any keras.applications backbone
x = base_model(..)
dense_drop_bn_whatever = x   # dense / dropout / bn ... whatever head you need
out = dense_drop_bn_whatever
model = Model(base_model.input, out)

# inference / when we need to compute grad-cam
new_model = tf.keras.models.Model(model.input,
                                  [model.layers[15].output, model.output])
In the above, model is used for training; later, at inference time, if you need to compute grad-cam based on some layer, for example layer number 15, you build new_model with the appropriate outputs. Hope this makes things clear. For more information about feature extraction, see the official doc, Extract and reuse nodes in the graph of layers. FYI, the exact same thing is happening here as I mentioned earlier. Also, check this official code example; you will see the exact same thing there.
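Here is a concrete, hedged sketch of the same idea (assumptions: a DenseNet201 backbone as in your question, a GlobalAveragePooling/Dense head, and an intermediate layer picked by name; the layer name and the head are illustrative, not a prescription):

import tensorflow as tf
from tensorflow.keras import Input, Model
from tensorflow.keras.applications import DenseNet201
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D

# single-output model used for training
inputs = Input(shape=(224, 224, 3))
base_model = DenseNet201(include_top=False, weights="imagenet", input_tensor=inputs)
x = GlobalAveragePooling2D()(base_model.output)
out = Dense(3, activation="softmax")(x)
model = Model(inputs, out)

# later, for grad-cam style inspection, expose an intermediate feature map
# of the trained model alongside its predictions
feature_layer = model.get_layer("conv5_block32_concat")   # layer choice is illustrative
new_model = Model(model.input, [feature_layer.output, model.output])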
However, there is another way that I think might work for you more easily. Since you're using a custom model, we can take advantage of the training argument in the call() method. At training time this is True and at inference time it's False, so based on this we can return the desired outputs accordingly. Here is the complete code example:
import tensorflow as tf

# get some data
data_dir = tf.keras.utils.get_file(
    'flower_photos',
    'https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz',
    untar=True)

datagen_kwargs = dict(rescale=1./255, validation_split=.20)
dataflow_kwargs = dict(target_size=(64, 64),
                       batch_size=16,
                       interpolation="bilinear")

train_datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=40,
    horizontal_flip=True,
    width_shift_range=0.2, height_shift_range=0.2,
    shear_range=0.2, zoom_range=0.2,
    **datagen_kwargs)

train_generator = train_datagen.flow_from_directory(
    data_dir, subset="training", shuffle=True, **dataflow_kwargs)

for image, label in train_generator:
    print(image.shape, image.dtype)
    print(label.shape, label.dtype)
    print(label[:4])
    break
(16, 64, 64, 3) float32
(16, 5) float32
[[0. 0. 0. 0. 1.]
[0. 0. 0. 1. 0.]
[0. 0. 0. 1. 0.]
[0. 0. 0. 0. 1.]]
Here we do that trick based on the boolean value of training in the call method.
class Detector(Model):
    def __init__(self, num_classes=5, name="DenseNet201"):
        super(Detector, self).__init__(name=name)
        self.feature_extractor = DenseNet201(
            include_top=False,
            weights="imagenet",
        )
        self.feature_extractor.trainable = False
        self.flatten_layer = Flatten()
        self.prediction_layer = Dense(num_classes, activation='softmax')

    def call(self, inputs, training):
        x = preprocess_input(inputs)
        extracted_feature = self.feature_extractor(x, training=False)
        x = self.flatten_layer(extracted_feature)
        y_hat = self.prediction_layer(x)
        if training:
            return y_hat
        else:
            return [y_hat, extracted_feature]
Train
det = Detector()
det.compile(loss='categorical_crossentropy',
            optimizer='adam', metrics=['acc'])

train_step = train_generator.samples // train_generator.batch_size
det.fit(train_generator,
        steps_per_epoch=train_step,
        validation_data=train_generator,
        validation_steps=train_step,
        epochs=2, verbose=2)
Epoch 1/2
37s 139ms/step - loss: 1.7543 - acc: 0.2650 - val_loss: 1.5310 - val_acc: 0.3764
Epoch 2/2
21s 115ms/step - loss: 1.4913 - acc: 0.3915 - val_loss: 1.3066 - val_acc: 0.4667
<tensorflow.python.keras.callbacks.History at 0x7fa2890b1790>
Evaluate
det.evaluate(train_generator,
             steps=train_step)
4s 76ms/step - loss: 1.3066 - acc: 0.4667
[1.3065541982650757, 0.46666666865348816]
Inference
Here, we will get two outputs from this model (unlike the single output we got at training time).
y_hat, base_feature = det.predict(train_generator,
                                  steps=train_step)
y_hat.shape, base_feature.shape
((720, 5), (720, 2, 2, 1920))
Now you can do grad-cam or whatever else requires such feature maps.
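A minimal Grad-CAM-style sketch under these assumptions: det is the trained Detector above, images is one preprocessed batch of shape (1, 64, 64, 3), and the helper name grad_cam is mine, not part of the answer:

import tensorflow as tf

def grad_cam(det, images, class_index=None):
    images = tf.convert_to_tensor(images, dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(images)  # record the full forward pass, including the frozen backbone
        y_hat, feature_maps = det(images, training=False)
        if class_index is None:
            class_index = int(tf.argmax(y_hat[0]))
        class_score = y_hat[:, class_index]
    grads = tape.gradient(class_score, feature_maps)             # shape (1, 2, 2, 1920)
    weights = tf.reduce_mean(grads, axis=(1, 2), keepdims=True)  # per-channel importance
    cam = tf.reduce_sum(weights * feature_maps, axis=-1)         # weighted sum over channels
    cam = tf.nn.relu(cam)                                        # keep positive contributions only
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()           # (1, 2, 2), normalized to [0, 1]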

Custom Keras binary_crossentropy loss function not working

I’m trying to re-define keras’s binary_crossentropy loss function so that I can customize it but it’s not giving me the same results as the existing one.
I'm using TF 1.13.1 with Keras 2.2.4.
I went through Keras's GitHub code. My understanding is that the loss in model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy']) is defined in losses.py, using binary_crossentropy defined in tensorflow_backend.py.
I ran a dummy data and model to test it. Here are my findings:
The custom loss function outputs the same results as keras’s one
Using the custom loss in a keras model gives different accuracy results
from numpy.random import seed
seed(1)
from tensorflow import set_random_seed
set_random_seed(2)

import tensorflow as tf
from keras import losses
import keras.backend as K
import keras.backend.tensorflow_backend as tfb
from keras.layers import Dense
from keras import Sequential

# Dummy check of loss output
def binary_crossentropy_custom(y_true, y_pred):
    return K.mean(binary_crossentropy_custom_tf(y_true, y_pred), axis=-1)

def binary_crossentropy_custom_tf(target, output, from_logits=False):
    """Binary crossentropy between an output tensor and a target tensor.

    # Arguments
        target: A tensor with the same shape as `output`.
        output: A tensor.
        from_logits: Whether `output` is expected to be a logits tensor.
            By default, we consider that `output`
            encodes a probability distribution.

    # Returns
        A tensor.
    """
    # Note: tf.nn.sigmoid_cross_entropy_with_logits
    # expects logits, Keras expects probabilities.
    if not from_logits:
        # transform back to logits
        _epsilon = tfb._to_tensor(tfb.epsilon(), output.dtype.base_dtype)
        output = tf.clip_by_value(output, _epsilon, 1 - _epsilon)
        output = tf.log(output / (1 - output))
    return tf.nn.sigmoid_cross_entropy_with_logits(labels=target,
                                                   logits=output)

logits = tf.constant([[-3., -2.11, -1.22],
                      [-0.33, 0.55, 1.44],
                      [2.33, 3.22, 4.11]])
labels = tf.constant([[1., 1., 1.],
                      [1., 1., 0.],
                      [0., 0., 0.]])

custom_sigmoid_cross_entropy_with_logits = binary_crossentropy_custom(labels, logits)
keras_binary_crossentropy = losses.binary_crossentropy(y_true=labels, y_pred=logits)

with tf.Session() as sess:
    print('CUSTOM sigmoid_cross_entropy_with_logits: ', sess.run(custom_sigmoid_cross_entropy_with_logits), '\n')
    print('KERAS keras_binary_crossentropy: ', sess.run(keras_binary_crossentropy), '\n')
#CUSTOM sigmoid_cross_entropy_with_logits: [16.118095 10.886106 15.942386]
#KERAS keras_binary_crossentropy: [16.118095 10.886106 15.942386]
# Dummy check of model accuracy
X_train = tf.random.uniform((3, 5), minval=0, maxval=1, dtype=tf.dtypes.float32)
labels = tf.constant([[1., 0., 0.],
                      [0., 0., 1.],
                      [1., 0., 0.]])

model = Sequential()
# First hidden layer
model.add(Dense(5, activation='relu', kernel_initializer='random_normal', input_dim=5))
# Output layer
model.add(Dense(3, activation='sigmoid', kernel_initializer='random_normal'))

# I ran model.fit for each model.compile below 10 times using the same X_train and provide the range of accuracy measurements
# model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])       # 0.748 < acc < 0.779
# model.compile(optimizer='adam', loss=losses.binary_crossentropy, metrics=['accuracy'])  # 0.761 < acc < 0.778
model.compile(optimizer='adam', loss=binary_crossentropy_custom, metrics=['accuracy'])    # 0.617 < acc < 0.663

history = model.fit(X_train, labels, steps_per_epoch=100, epochs=1)
I'd expect the custom loss function to give similar model accuracy output but it does not. Any idea? Thanks!
Keras automatically selects which accuracy implementation to use according to the loss, and this won't work if you use a custom loss. But in this case you can just explicitly use the right accuracy, which is binary_accuracy:
model.compile(optimizer='adam', loss=binary_crossentropy_custom, metrics=['binary_accuracy'])
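For completeness, a minimal sketch of that fix with the same model and custom loss as in the question (keras.metrics.binary_accuracy is the function form of the 'binary_accuracy' string; either spelling works):

from keras import metrics

model.compile(optimizer='adam',
              loss=binary_crossentropy_custom,
              metrics=[metrics.binary_accuracy])  # equivalent to metrics=['binary_accuracy']
history = model.fit(X_train, labels, steps_per_epoch=100, epochs=1)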

Tensorflow: Why is my loss declining although my gradients are zero?

To debug my code and understand RNNs, I set my gradients manually to 0 like this:
gvs = optimizer.compute_gradients(cost)
gvs[0] = (tf.zeros((5002,2), dtype=tf.float32), tf.trainable_variables()[0])
gvs[1] = (tf.zeros((2,), dtype=tf.float32), tf.trainable_variables()[1])
train_op = optimizer.apply_gradients(gvs)
I only have two trainable variables, so the quick-and-dirty approach above should set all gradients to zero:
tf.trainable_variables()
Out[8]:
[<tf.Variable 'rnn/basic_rnn_cell/kernel:0' shape=(5002, 2) dtype=float32_ref>,
<tf.Variable 'rnn/basic_rnn_cell/bias:0' shape=(2,) dtype=float32_ref>]
When I run the network the loss is still declining. How can that be? As far as I understand, the new variable values should be the old values minus learning rate times gradient.
I am using the AdaGradOptimizer.
Update: np.sum(sess.run(gvs[0][0])) and np.sum(sess.run(gvs[1][0])) both return 0.
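For reference, a minimal sketch (plain NumPy, with illustrative names, not the asker's actual graph) of the update rules being reasoned about: with a zero gradient, both plain gradient descent and AdaGrad leave the variable unchanged.

import numpy as np

def sgd_update(w, grad, lr=0.01):
    # gradient descent step: new value = old value - learning rate * gradient
    return w - lr * grad

def adagrad_update(w, grad, accum, lr=0.01, eps=1e-8):
    accum = accum + grad ** 2                        # running sum of squared gradients
    return w - lr * grad / (np.sqrt(accum) + eps), accum

w, accum = np.ones(3), np.zeros(3)
print(sgd_update(w, np.zeros(3)))                    # [1. 1. 1.] -- unchanged
print(adagrad_update(w, np.zeros(3), accum)[0])      # [1. 1. 1.] -- unchanged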

Tensorflow: Using Batch Normalization gives poor (erratic) validation loss and accuracy

I am trying to use Batch Normalization using tf.layers.batch_normalization() and my code looks like this:
def create_conv_exp_model(fingerprint_input, model_settings, is_training):
    # Dropout placeholder
    if is_training:
        dropout_prob = tf.placeholder(tf.float32, name='dropout_prob')

    # Mode placeholder
    mode_placeholder = tf.placeholder(tf.bool, name="mode_placeholder")

    he_init = tf.contrib.layers.variance_scaling_initializer(mode="FAN_AVG")

    # Input Layer
    input_frequency_size = model_settings['bins']
    input_time_size = model_settings['spectrogram_length']
    net = tf.reshape(fingerprint_input,
                     [-1, input_time_size, input_frequency_size, 1],
                     name="reshape")
    net = tf.layers.batch_normalization(net,
                                        training=mode_placeholder,
                                        name='bn_0')

    for i in range(1, 6):
        net = tf.layers.conv2d(inputs=net,
                               filters=8*(2**i),
                               kernel_size=[5, 5],
                               padding='same',
                               kernel_initializer=he_init,
                               name="conv_%d"%i)
        net = tf.layers.batch_normalization(net,
                                            training=mode_placeholder,
                                            name='bn_%d'%i)
        with tf.name_scope("relu_%d"%i):
            net = tf.nn.relu(net)
        net = tf.layers.max_pooling2d(net, [2, 2], [2, 2], 'SAME',
                                      name="maxpool_%d"%i)

    net_shape = net.get_shape().as_list()
    net_height = net_shape[1]
    net_width = net_shape[2]

    net = tf.layers.conv2d(inputs=net,
                           filters=1024,
                           kernel_size=[net_height, net_width],
                           strides=(net_height, net_width),
                           padding='same',
                           kernel_initializer=he_init,
                           name="conv_f")
    net = tf.layers.batch_normalization(net,
                                        training=mode_placeholder,
                                        name='bn_f')
    with tf.name_scope("relu_f"):
        net = tf.nn.relu(net)

    net = tf.layers.conv2d(inputs=net,
                           filters=model_settings['label_count'],
                           kernel_size=[1, 1],
                           padding='same',
                           kernel_initializer=he_init,
                           name="conv_l")

    ### Squeeze
    squeezed = tf.squeeze(net, axis=[1, 2], name="squeezed")

    if is_training:
        return squeezed, dropout_prob, mode_placeholder
    else:
        return squeezed, mode_placeholder
And my train step looks like this:
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate_input)
    gvs = optimizer.compute_gradients(cross_entropy_mean)
    capped_gvs = [(tf.clip_by_value(grad, -2., 2.), var) for grad, var in gvs]
    train_step = optimizer.apply_gradients(gvs)
During training, I am feeding the graph with:
train_summary, train_accuracy, cross_entropy_value, _, _ = sess.run(
    [
        merged_summaries, evaluation_step, cross_entropy_mean, train_step,
        increment_global_step
    ],
    feed_dict={
        fingerprint_input: train_fingerprints,
        ground_truth_input: train_ground_truth,
        learning_rate_input: learning_rate_value,
        dropout_prob: 0.5,
        mode_placeholder: True
    })
During validation,
validation_summary, validation_accuracy, conf_matrix = sess.run(
    [merged_summaries, evaluation_step, confusion_matrix],
    feed_dict={
        fingerprint_input: validation_fingerprints,
        ground_truth_input: validation_ground_truth,
        dropout_prob: 1.0,
        mode_placeholder: False
    })
My loss and accuracy curves (orange is training, blue is validation):
[Plot of loss vs. number of iterations]
[Plot of accuracy vs. number of iterations]
The validation loss (and accuracy) seem very erratic. Is my implementation of Batch Normalization wrong? Or is this normal with Batch Normalization and I should wait for more iterations?
You need to pass is_training to tf.layers.batch_normalization(..., training=is_training) or it tries to normalize the inference minibatches using the minibatch statistics instead of the training statistics, which is wrong.
There are mainly two things to check.
1. Are you sure that you are using batch normalization (BN) correctly in the train op?
If you read the layer documentation:
Note: when training, the moving_mean and moving_variance need to be updated.
By default the update ops are placed in tf.GraphKeys.UPDATE_OPS, so they
need to be added as a dependency to the train_op. Also, be sure to add
any batch_normalization ops before getting the update_ops collection.
Otherwise, update_ops will be empty, and training/inference will not work
properly.
For example:
x_norm = tf.layers.batch_normalization(x, training=training)

# ...

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss)
2. Otherwise, try lowering the "momentum" in the BN.
During training, in fact, BN uses two moving averages of the mean and the variance that are supposed to approximate the population statistics. The mean and variance are initialized to 0 and 1 respectively; then, step by step, they are multiplied by the momentum value (default is 0.99) and the new batch value times (1 - momentum), i.e. 0.01, is added. At inference (test) time, the normalization uses these statistics. For this reason, it takes these values a little while to arrive at the "real" mean and variance of the data.
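A minimal sketch of that adjustment, reusing the tf.layers call and mode_placeholder from the question (0.9 is just an example value; the default is 0.99):

net = tf.layers.batch_normalization(net,
                                    training=mode_placeholder,
                                    momentum=0.9,
                                    name='bn_0')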
Source:
https://www.tensorflow.org/api_docs/python/tf/layers/batch_normalization
https://github.com/keras-team/keras/issues/7265
https://github.com/keras-team/keras/issues/3366
The original BN paper can be found here:
https://arxiv.org/abs/1502.03167
I also observed oscillations in validation loss when adding batch norm before ReLU. We found that moving the batch norm after the ReLU resolved the issue.
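A minimal sketch of that reordering, using the same tf.layers API as in the question (net and mode_placeholder as defined there; layer names are illustrative):

# original ordering: conv -> batch norm -> relu
net = tf.layers.conv2d(net, filters=64, kernel_size=[3, 3], padding='same', name='conv_a')
net = tf.layers.batch_normalization(net, training=mode_placeholder, name='bn_a')
net = tf.nn.relu(net)

# reordered: conv -> relu -> batch norm (what resolved the oscillations for us)
net = tf.layers.conv2d(net, filters=64, kernel_size=[3, 3], padding='same', name='conv_b')
net = tf.nn.relu(net)
net = tf.layers.batch_normalization(net, training=mode_placeholder, name='bn_b')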

How to properly use tf.metrics.accuracy?

I have some trouble using the accuracy function from tf.metrics for a multi-class classification problem with logits as input.
My model output looks like:
logits = [[0.1, 0.5, 0.4],
          [0.8, 0.1, 0.1],
          [0.6, 0.3, 0.2]]
And my labels are one-hot encoded vectors:
labels = [[0, 1, 0],
          [1, 0, 0],
          [0, 0, 1]]
When I try to do something like tf.metrics.accuracy(labels, logits), it never gives the correct result. I am obviously doing something wrong, but I can't figure out what it is.
TL;DR
The accuracy function tf.metrics.accuracy calculates how often predictions match labels, based on two local variables it creates: total and count, which are used to compute the frequency with which logits match labels.
acc, acc_op = tf.metrics.accuracy(labels=tf.argmax(labels, 1),
                                  predictions=tf.argmax(logits, 1))

print(sess.run([acc, acc_op]))
print(sess.run([acc]))

# Output
# [0.0, 0.66666669]
# [0.66666669]
acc (accuracy): simply returns the metric using total and count; it doesn't update the metric.
acc_op (update op): updates the metric.
To understand why the acc returns 0.0, go through the details below.
Details using a simple example:
logits = tf.placeholder(tf.int64, [2, 3])
labels = tf.Variable([[0, 1, 0], [1, 0, 1]])

acc, acc_op = tf.metrics.accuracy(labels=tf.argmax(labels, 1),
                                  predictions=tf.argmax(logits, 1))
Initialize the variables:
Since metrics.accuracy creates two local variables total and count, we need to call local_variables_initializer() to initialize them.
sess = tf.Session()
sess.run(tf.local_variables_initializer())
sess.run(tf.global_variables_initializer())
stream_vars = [i for i in tf.local_variables()]
print(stream_vars)
#[<tf.Variable 'accuracy/total:0' shape=() dtype=float32_ref>,
# <tf.Variable 'accuracy/count:0' shape=() dtype=float32_ref>]
Understanding update ops and accuracy calculation:
print('acc:',sess.run(acc, {logits:[[0,1,0],[1,0,1]]}))
#acc: 0.0
print('[total, count]:',sess.run(stream_vars))
#[total, count]: [0.0, 0.0]
The above returns 0.0 for accuracy because total and count are zeros, in spite of giving matching inputs.
print('ops:', sess.run(acc_op, {logits:[[0,1,0],[1,0,1]]}))
#ops: 1.0
print('[total, count]:',sess.run(stream_vars))
#[total, count]: [2.0, 2.0]
With the new inputs, the accuracy is calculated when the update op is called. Note: since all the logits and labels match, we get an accuracy of 1.0, and the local variables total and count give the total correctly predicted and the total comparisons made, respectively.
Now we call accuracy with the new inputs (not the update ops):
print('acc:', sess.run(acc,{logits:[[1,0,0],[0,1,0]]}))
#acc: 1.0
The accuracy call doesn't update the metric with the new inputs; it just returns the value using the two local variables. Note: the logits and labels don't match in this case. Now calling the update op again:
print('op:',sess.run(acc_op,{logits:[[0,1,0],[0,1,0]]}))
#op: 0.75
print('[total, count]:',sess.run(stream_vars))
#[total, count]: [3.0, 4.0]
The metrics are updated with the new inputs.
More information on how to use the metrics during training and how to reset them during validation can be found here.
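As a minimal sketch of resetting the streaming metric (reusing stream_vars, sess, acc and acc_op from above): tf.metrics.accuracy keeps its state in the local variables accuracy/total and accuracy/count, so re-initializing them resets the running accuracy, e.g. between training and validation.

reset_op = tf.variables_initializer(stream_vars)
sess.run(reset_op)
print('[total, count]:', sess.run(stream_vars))
# [total, count]: [0.0, 0.0]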
On TF 2.0, if you are using the tf.keras API, you can define a custom class myAccuracy which inherits from tf.keras.metrics.Accuracy and overrides the update_state method like this:
# imports
# ...
class myAccuracy(tf.keras.metrics.Accuracy):
    def update_state(self, y_true, y_pred, sample_weight=None):
        y_true = tf.argmax(y_true, 1)
        y_pred = tf.argmax(y_pred, 1)
        return super(myAccuracy, self).update_state(y_true, y_pred, sample_weight)
Then, when compiling the model you can add metrics in the usual way.
from my_awesome_models import discriminador

discriminador.compile(tf.keras.optimizers.Adam(),
                      loss=tf.nn.softmax_cross_entropy_with_logits,
                      metrics=[myAccuracy()])

from my_puzzling_datasets import train_dataset, test_dataset

discriminador.fit(train_dataset.shuffle(70000).repeat().batch(1000),
                  epochs=1, steps_per_epoch=1,
                  validation_data=test_dataset.shuffle(70000).batch(1000),
                  validation_steps=1)
# Train for 1 steps, validate for 1 steps
# 1/1 [==============================] - 3s 3s/step - loss: 0.1502 - accuracy: 0.9490 - val_loss: 0.1374 - val_accuracy: 0.9550
Or evaluate your model over the whole dataset:
discriminador.evaluate(test_dataset.batch(TST_DSET_LENGTH))
#> [0.131587415933609, 0.95354694]
Applied to a CNN, you can write:
x_len = 24*24
y_len = 2

x = tf.placeholder(tf.float32, shape=[None, x_len], name='input')
fc1 = ...  # cnn's fully connected layer
keep_prob = tf.placeholder(tf.float32, name='keep_prob')
layer_fc_dropout = tf.nn.dropout(fc1, keep_prob, name='dropout')
y_pred = tf.nn.softmax(fc1, name='output')
logits = tf.argmax(y_pred, axis=1)

y_true = tf.placeholder(tf.float32, shape=[None, y_len], name='y_true')
acc, acc_op = tf.metrics.accuracy(labels=tf.argmax(y_true, axis=1), predictions=tf.argmax(y_pred, 1))

sess.run(tf.global_variables_initializer())
sess.run(tf.local_variables_initializer())

def print_accuracy(x_data, y_data, dropout=1.0):
    accuracy = sess.run(acc_op, feed_dict={y_true: y_data, x: x_data, keep_prob: dropout})
    print('Accuracy: ', accuracy)
Extending the answer to TF2.0, the tutorial here explains clearly how to use tf.metrics for accuracy and loss.
https://www.tensorflow.org/beta/tutorials/quickstart/advanced
Notice that it mentions that the metrics are reset after each epoch:
train_loss.reset_states()
train_accuracy.reset_states()
test_loss.reset_states()
test_accuracy.reset_states()
When labels and predictions are one-hot encoded:
def train_step(features, labels):
    with tf.GradientTape() as tape:
        predictions = model(features)
        loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=predictions))
    gradients = tape.gradient(loss, model.trainable_weights)
    optimizer.apply_gradients(zip(gradients, model.trainable_weights))

    train_loss(loss)
    train_accuracy(tf.argmax(labels, 1), tf.argmax(predictions, 1))
Here is how I use it:
import numpy as np
import tensorflow as tf

test_accuracy = tf.keras.metrics.Accuracy()

# use the dataset api or a normal dataset from lists / np arrays
ds_test_batch = zip(x_test, y_test)

predicted_classes = np.array([])
for (x, y) in ds_test_batch:
    # training=False is needed only if there are layers with different
    # behaviour during training versus inference (e.g. Dropout).
    # Adjust the input similar to your input during training.
    logits = model(x.reshape(1, -1), training=False)
    prediction = tf.argmax(logits, axis=1, output_type=tf.int64)
    predicted_classes = np.concatenate([predicted_classes, prediction.numpy()])
    test_accuracy(prediction, y)

print("Test set accuracy: {:.3%}".format(test_accuracy.result()))