I am trying to implement an object detector like YOLO. It uses a complex custom loss function, so I need to print/debug its tensors. I understand that the Python code only builds the computing graph, so a standard print won't work in non-eager mode.
tensorflow 1.12.0
keras 2.2.4
I tried all methods from these posts: Keras custom loss function not printing value of tensor, Debugging keras tensor values, and nothing works. I tried tf.Print, tf.print, a callback, K.print_tensor, all with the same result: in the console I see only the standard output messages. I'm not even sure the loss function has been called at all. The answer from this post, Keras - printing intermediate tensors in loss function (tf.Print and K.print_tensor do not work...), says that the loss function is sometimes not even called! OK, but how do I use the tf.contrib.eager.defun decorator then? The example is for pure TensorFlow, and I don't understand how to use it in Keras.
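One way to tell whether the loss function is called at all: in graph mode the Python body of the loss runs exactly once, while Keras builds the ops during model.compile(), so a plain Python print fires at compile time, whereas K.print_tensor adds a graph op that fires on every batch. A minimal sketch (the message text is just illustrative):
import keras.backend as K
def traced_loss(y_true, y_pred):
    # plain print: runs ONCE, while Keras builds the graph in model.compile();
    # if this line appears, the loss function was called
    print("traced_loss: building graph ops")
    d = y_pred - y_true
    # K.print_tensor wraps d in an op that prints on EVERY batch at run time;
    # note the output goes to stderr, not stdout
    d = K.print_tensor(d, message="d = ")
    return K.mean(K.square(d), axis=-1)
# model.compile(loss=traced_loss, optimizer='adam')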
import tensorflow as tf
from keras.datasets import fashion_mnist
import matplotlib.pyplot as plt
from keras.layers.convolutional import Conv2D, MaxPooling2D
from keras.layers import Flatten, Dense, Dropout
from keras.models import Sequential
from keras import optimizers
import numpy as np
from random import randrange
from keras.callbacks import LambdaCallback
import keras.backend as K
import keras
print(tf.__version__)
print(keras.__version__)
num_filters = 64
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255
#reshape
x_train = x_train.reshape(60000,28,28,1)[:1000,...]
x_test = x_test.reshape(10000,28,28,1)[:100,...]
# One-hot encode the labels
y_train = tf.keras.utils.to_categorical(y_train, 10)[:1000,...]
y_test = tf.keras.utils.to_categorical(y_test, 10)[:100,]
labels = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
model = Sequential()
model.add(Conv2D(input_shape=(28,28,1), filters=num_filters,kernel_size=3,strides=(1, 1),padding="valid", activation='relu', use_bias=True))
model.add(MaxPooling2D(pool_size=2))
model.add(Dropout(0.3))
model.add(Conv2D(filters=num_filters,kernel_size=3,strides=(1, 1),padding="valid", activation='relu', use_bias=True))
model.add(MaxPooling2D(pool_size=2))
model.add(Dropout(0.3))
model.add(Flatten())
model.add(Dense(256, activation = 'relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation = 'softmax'))
#model.summary()
#loss 1
def customLoss(yTrue, yPred):
    d = yPred - yTrue
    d = K.print_tensor(d)
    return K.mean(K.square(d), axis=-1)
#loss 2
def cat_loss(y_true, y_pred):
    d = y_true - y_pred
    d = tf.Print(d, [d], "Inside loss function")
    return tf.reduce_mean(tf.square(d))
model.compile(loss=customLoss,
optimizer='adam')
import keras.callbacks as cbks
# attempt 3: print via a callback
class CustomMetrics(cbks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        for k in logs:
            if k.endswith('cat_loss'):
                print(logs[k])
#checkpointer = ModelCheckpoint(filepath='model.weights.best.hdf5', verbose = 1, save_best_only=True)
model.fit(x_train,
y_train,
#verbose=1,
batch_size=16,
epochs=10,
validation_data=(x_test, y_test),
callbacks=[CustomMetrics()])
# Evaluate the model on test set
score = model.evaluate(x_test, y_test, verbose=0)
# Print test accuracy
print('\n', 'Test accuracy:', score)
rand_img = randrange(100)
result = np.argmax(model.predict(x_test[rand_img].reshape(1,28,28,1)))
plt.imshow(x_test[rand_img].reshape(28,28), cmap='gray')
plt.title(labels[result])
832/1000 [=======================>......] - ETA: 0s - loss: 0.0242
Warning (from warnings module):
File "C:\Python36\lib\site-packages\keras\callbacks.py", line 122
% delta_t_median)
UserWarning: Method on_batch_end() is slow compared to the batch update (0.101474). Check your callbacks.
976/1000 [============================>.] - ETA: 0s - loss: 0.0238
992/1000 [============================>.] - ETA: 0s - loss: 0.0236
1000/1000 [==============================] - 3s 3ms/step - loss: 0.0239 - val_loss: 0.0352
Test accuracy: 0.035189545452594756
The truth was somewhere near: IDLE doesn't show tf.Print (and therefore K.print_tensor) output in its shell, so when I ran python train.py from cmd.exe, I saw the tensor output.
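For reference, tf.Print and K.print_tensor write to the standard error stream, which some IDE shells (IDLE among them) do not display. A quick sketch to check which streams your shell actually shows (the log file name below is hypothetical):
import sys
sys.stdout.write("stdout visible\n")
sys.stderr.write("stderr visible\n")
# If only the first line shows up, tensor prints are being swallowed by the
# shell; run the script from a real terminal instead, optionally capturing
# the tensor output with a redirect:
#   python train.py 2> tensors.log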
Hello, I need to build an ANN using binary_alpha_digits from TensorFlow, but I am unable to pass the training data in, as the model requires 'flatten_input' while I am passing an ['image', 'label'] dictionary. How do I solve this problem? I appreciate any help, thanks.
from matplotlib import pyplot as plt
import tensorflow as tf
import tensorflow_datasets as tfds  # needed for tfds.load below
from tensorflow.keras import layers
train_ds, test_ds = tfds.load('BinaryAlphaDigits',
                              split=['train[:60%]', 'train[60%:]'])
model = tf.keras.Sequential()
model.add(layers.Flatten(input_shape=(28, 28)))
model.add(layers.Dense(10, activation=tf.nn.relu))
model.add(layers.Dense(10, activation=tf.nn.relu))
model.add(layers.Dense(10, activation=tf.nn.softmax))
model.compile(optimizer= tf.optimizers.Adam(),
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
epochs = 10
model.fit(train_ds, epochs=epochs)
Since you feed images into the model, the input shape must be defined as (Height, Width, Channels), which refers to the image dimensions and color mode; and you should also preprocess the dataset before fitting the model on it.
Notice too that the number of output units for multi-class classification is not set correctly for this dataset: it contains 36 labels (digits 0-9 and letters A-Z), so the last layer should have 36 units.
Here is code that works correctly for you, with a preprocessing function for the images and labels. Note also that the dataset's images have shape (20, 16, 1), so you can either resize them to (28, 28, 1) or feed the model images at their original size.
After preprocessing, the images are grouped into batches (or mini-batches), and the training set is shuffled to avoid high variance on the test set; hence the operations below.
from matplotlib import pyplot as plt
import tensorflow as tf
from tensorflow.keras import layers
import tensorflow_datasets as tfds
train_ds, test_ds = tfds.load('BinaryAlphaDigits', split=['train[:60%]', 'train[60%:]'])
def preprocess(data):
    image = data['image']
    image = tf.image.resize(image, (28, 28))
    label = data['label']
    return image, label
train_ds = train_ds.map(preprocess)
train_ds = train_ds.shuffle(1024)
train_ds = train_ds.batch(batch_size = 32)
test_ds = test_ds.map(preprocess)
test_ds = test_ds.batch(batch_size = 32)
model = tf.keras.Sequential()
model.add(layers.Flatten(input_shape=(28, 28, 1)))
model.add(layers.Dense(10, activation=tf.nn.relu))
model.add(layers.Dense(10, activation=tf.nn.relu))
model.add(layers.Dense(36, activation=tf.nn.softmax))
model.compile(optimizer= tf.optimizers.Adam(), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
epochs = 10
model.fit(train_ds, epochs=epochs)
tfds.load by default gives a dictionary with image and label as the keys.
train_ds, test_ds = tfds.load('BinaryAlphaDigits',
split=['train[:60%]', 'train[60%:]'])
train_ds = train_ds.shuffle(1024).batch(4)
for x in train_ds.take(1):
    print(type(x))
    print(x['image'].shape, x['label'])
>>>
<class 'dict'>
(4, 20, 16, 1) tf.Tensor([ 6 32 6 12], shape=(4,), dtype=int64)
There is a setting called as_supervised that returns it as a proper (image, label) tuple dataset instead. Check the docs here.
If you use that setting together with proper input and output sizes, your model works:
train_ds, test_ds = tfds.load('BinaryAlphaDigits',
split=['train[:60%]', 'train[60%:]'],as_supervised=True)
train_ds = train_ds.shuffle(1024).batch(4)
for x in train_ds.take(1):
    print(type(x))
    print(x[0].shape, x[1])
>>>
<class 'tuple'>
(4, 20, 16, 1) tf.Tensor([13 13 22 31], shape=(4,), dtype=int64)
model = tf.keras.Sequential()
model.add(layers.Flatten(input_shape=(20, 16,1)))
model.add(layers.Dense(10, activation=tf.nn.relu))
model.add(layers.Dense(10, activation=tf.nn.relu))
model.add(layers.Dense(36, activation=tf.nn.softmax))
model.compile(optimizer= tf.optimizers.Adam(),
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
epochs = 10
model.fit(train_ds, epochs=epochs)
>>>
Epoch 1/10
211/211 [==============================] - 1s 3ms/step - loss: 3.5428 - accuracy: 0.0629
Epoch 2/10
211/211 [==============================] - 0s 2ms/step - loss: 3.2828 - accuracy: 0.1105
The loss in the second epoch bears no consistent relation to the first, and from then on the loss stays exactly the same every epoch, as do all the other metrics. I have some background in deep learning, but this is my first time implementing my own model, so I want to understand intuitively what is going wrong with it. The dataset is cropped faces with two classes, each having 300 pictures. I highly appreciate your help.
import tensorflow as tf
from tensorflow import keras
from IPython.display import Image
import matplotlib.pyplot as plt
from keras.layers import ActivityRegularization
from keras.layers import Dropout
from tensorflow.keras.preprocessing.image import ImageDataGenerator
image_generator = ImageDataGenerator(
    featurewise_center=False, samplewise_center=False,
    featurewise_std_normalization=False, samplewise_std_normalization=False,
    rotation_range=0, width_shift_range=0.0, height_shift_range=0.0,
    brightness_range=None, shear_range=0.0, zoom_range=0.0, channel_shift_range=0.0,
    horizontal_flip=False, vertical_flip=False, rescale=1./255
)
image = image_generator.flow_from_directory('./util/untitled folder',batch_size=938)
x, y = image.next()
x_train = x[:500]
y_train = y[:500]
x_test = x[500:600]
y_test = y[500:600]
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(4)
test_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(4)
plt.imshow(x_train[0])
def convolutional_model(input_shape):
    input_img = tf.keras.Input(shape=input_shape)
    x = tf.keras.layers.Conv2D(64, (7, 7), padding='same')(input_img)
    x = tf.keras.layers.BatchNormalization(axis=3)(x)
    x = tf.keras.layers.ReLU()(x)
    x = tf.keras.layers.MaxPool2D(pool_size=(2, 2), strides=1, padding='same')(x)
    x = Dropout(0.5)(x)
    x = tf.keras.layers.Conv2D(128, (3, 3), padding='same', strides=1)(x)
    x = tf.keras.layers.ReLU()(x)
    x = tf.keras.layers.MaxPool2D(pool_size=(2, 2), padding='same', strides=4)(x)
    x = tf.keras.layers.Flatten()(x)
    x = ActivityRegularization(0.1, 0.2)(x)
    outputs = tf.keras.layers.Dense(2, activation='softmax')(x)
    model = tf.keras.Model(inputs=input_img, outputs=outputs)
    return model
conv_model = convolutional_model((256, 256, 3))
conv_model.compile(loss=keras.losses.categorical_crossentropy,
optimizer=keras.optimizers.SGD(lr=1),
metrics=['accuracy'])
conv_model.summary()
conv_model.fit(train_dataset,epochs=100, validation_data=test_dataset)
Epoch 1/100
2021-12-23 15:06:22.165763: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
2021-12-23 15:06:22.172255: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
125/125 [==============================] - ETA: 0s - loss: 804.6805 - accuracy: 0.4860
2021-12-23 15:06:50.936870: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
125/125 [==============================] - 35s 275ms/step - loss: 804.6805 - accuracy: 0.4860 - val_loss: 0.7197 - val_accuracy: 0.4980
Epoch 2/100
125/125 [==============================] - 34s 270ms/step - loss: 0.7360 - accuracy: 0.4820 - val_loss: 0.7197 - val_accuracy: 0.4980
Epoch 3/100
125/125 [==============================] - 34s 276ms/step - loss: 0.7360 - accuracy: 0.4820 - val_loss: 0.7197 - val_accuracy: 0.4980
As you have a constant loss and accuracy, it is highly likely that your network does not learn anything (since you have two classes, it always predicts one of them).
The activation function, loss function, and number of neurons on the last layer are correct.
The problem is not related to the way you load the images, but to the learning rate, which is 1. At such a high learning rate the network cannot learn anything.
You should start with a much smaller learning rate, for example 0.0001 or 0.00001, and only then debug the data-loading process if you still have poor performance.
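For example, a minimal recompile sketch, reusing conv_model from the question (1e-4 is a starting point to tune from, not a recommended final value):
from tensorflow import keras
conv_model.compile(
    loss=keras.losses.categorical_crossentropy,
    # learning_rate=1e-4 instead of lr=1; adjust by factors of 10 from here
    optimizer=keras.optimizers.SGD(learning_rate=1e-4),
    metrics=['accuracy'])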
I am quite certain it has something to do with how you load the data, and more specifically the x, y = image.next() part. If you can split the data from ./util/untitled folder into separate folders holding the training and validation data respectively, you can use the same kind of pipeline as in the examples section of the TensorFlow page:
train_datagen = ImageDataGenerator(
    featurewise_center=False,
    samplewise_center=False,
    featurewise_std_normalization=False,
    samplewise_std_normalization=False,
    rotation_range=0,
    width_shift_range=0.0,
    height_shift_range=0.0,
    brightness_range=None,
    shear_range=0.0,
    zoom_range=0.0,
    channel_shift_range=0.0,
    horizontal_flip=False,
    vertical_flip=False,
    rescale=1./255)
test_datagen = ImageDataGenerator(
    featurewise_center=False,
    samplewise_center=False,
    featurewise_std_normalization=False,
    samplewise_std_normalization=False,
    rotation_range=0,
    width_shift_range=0.0,
    height_shift_range=0.0,
    brightness_range=None,
    shear_range=0.0,
    zoom_range=0.0,
    channel_shift_range=0.0,
    horizontal_flip=False,
    vertical_flip=False,
    rescale=1./255)
train_generator = train_datagen.flow_from_directory(
    'data/train',
    target_size=(256, 256),
    batch_size=4)
validation_generator = test_datagen.flow_from_directory(
    'data/validation',
    target_size=(256, 256),
    batch_size=4)
model.fit(
    train_generator,
    epochs=100,
    validation_data=validation_generator)
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
import matplotlib.pyplot as plt
(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)
im = plt.imshow(x_train[0], cmap="gray")
plt.show()
x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
x_train = x_train/255
x_test = x_test/255
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)
model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(784,)))
model.add(Dense(256, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.summary()
model.compile(optimizer=SGD(), loss='categorical_crossentropy', metics=['accuracy'])
model.fit(x_train, y_train, batch_size=64, epochs=5, validation_data=(x_test, y_test))
I tried several different version combinations, but it still reports an error:
AttributeError: module 'tensorflow_core.compat.v2' has no attribute '__internal__'
Generally you get the above error due to an incompatibility between TensorFlow and Keras. You can import keras without any issues by upgrading to the latest version. For more details you can refer to this solution.
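For instance, a quick sketch to upgrade and confirm that the pair imports cleanly (the pip line belongs in a notebook cell; exact versions depend on your environment):
# !pip install --upgrade tensorflow keras
import tensorflow as tf
import keras  # a mismatched TF/Keras pair raises the '__internal__' AttributeError here
print(tf.__version__, keras.__version__)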
Coming to your code, it has a couple of issues, both easily resolved:
1. to_categorical is now packaged in np_utils, so you need to add the import shown below:
from keras.utils.np_utils import to_categorical
2. Typo: replace metics with metrics in model.compile.
Working code is shown below.
import keras
print(keras.__version__)
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
import matplotlib.pyplot as plt
from keras.utils.np_utils import to_categorical
(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)
im = plt.imshow(x_train[0], cmap="gray")
plt.show()
x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
x_train = x_train/255
x_test = x_test/255
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)
model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(784,)))
model.add(Dense(256, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.summary()
model.compile(optimizer=SGD(), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=64, epochs=5, validation_data=(x_test, y_test))
Output:
2.5.0
(60000, 28, 28) (60000,)
(10000, 28, 28) (10000,)
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 512) 401920
_________________________________________________________________
dense_1 (Dense) (None, 256) 131328
_________________________________________________________________
dense_2 (Dense) (None, 10) 2570
=================================================================
Total params: 535,818
Trainable params: 535,818
Non-trainable params: 0
_________________________________________________________________
Epoch 1/5
938/938 [==============================] - 21s 8ms/step - loss: 1.2351 - accuracy: 0.6966 - val_loss: 0.3644 - val_accuracy: 0.9011
Epoch 2/5
938/938 [==============================] - 7s 7ms/step - loss: 0.3554 - accuracy: 0.9023 - val_loss: 0.2943 - val_accuracy: 0.9166
Epoch 3/5
938/938 [==============================] - 7s 7ms/step - loss: 0.2929 - accuracy: 0.9176 - val_loss: 0.2553 - val_accuracy: 0.9282
Epoch 4/5
938/938 [==============================] - 7s 7ms/step - loss: 0.2538 - accuracy: 0.9281 - val_loss: 0.2309 - val_accuracy: 0.9337
Epoch 5/5
938/938 [==============================] - 7s 8ms/step - loss: 0.2313 - accuracy: 0.9355 - val_loss: 0.2096 - val_accuracy: 0.9401
<keras.callbacks.History at 0x7f615c82d090>
You can refer to this gist for the above use case in the tf.keras version.
Error:
AttributeError: module 'tensorflow_core.compat.v2' has no attribute '__internal__'
Solution:
Install Libraries
!pip install tensorflow==2.1
!pip install keras==2.3.1
Import
from tensorflow.keras.models import load_model
Can anyone point to references where one can learn how to perform Quantization Aware Training (QAT) with tf.GradientTape on TensorFlow 2?
I only see this done with the tf.keras API. I do not use tf.keras; I always build customized training with tf.GradientTape, which provides more control over the training process. I now need to quantize a model, but I only see references on how to do it using the tf.keras API.
In the official examples here, they show QAT training with model.fit. Here is a demonstration of Quantization Aware Training using tf.GradientTape(); but for complete reference, let's do both here.
Base model training. This is directly from the official doc. For more details, please check there.
import os
import tensorflow as tf
from tensorflow import keras
import tensorflow_model_optimization as tfmot
# Load MNIST dataset
mnist = keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
# Normalize the input image so that each pixel value is between 0 to 1.
train_images = train_images / 255.0
test_images = test_images / 255.0
# Define the model architecture.
model = keras.Sequential([
    keras.layers.InputLayer(input_shape=(28, 28)),
    keras.layers.Reshape(target_shape=(28, 28, 1)),
    keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation='relu'),
    keras.layers.MaxPooling2D(pool_size=(2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(10)
])
# Train the digit classification model
model.compile(optimizer='adam',
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
model.summary()
model.fit(
train_images,
train_labels,
epochs=1,
validation_split=0.1,
)
10ms/step - loss: 0.5411 - accuracy: 0.8507 - val_loss: 0.1142 - val_accuracy: 0.9705
<tensorflow.python.keras.callbacks.History at 0x7f9ee970ab90>
QAT .fit.
Now, performing QAT over the base model.
# -----------------------
# ------------- Quantization Aware Training -------------
import tensorflow_model_optimization as tfmot
quantize_model = tfmot.quantization.keras.quantize_model
# q_aware stands for quantization aware.
q_aware_model = quantize_model(model)
# `quantize_model` requires a recompile.
q_aware_model.compile(optimizer='adam',
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
q_aware_model.summary()
train_images_subset = train_images[0:1000]
train_labels_subset = train_labels[0:1000]
q_aware_model.fit(train_images_subset, train_labels_subset,
batch_size=500, epochs=1, validation_split=0.1)
356ms/step - loss: 0.1431 - accuracy: 0.9629 - val_loss: 0.1626 - val_accuracy: 0.9500
<tensorflow.python.keras.callbacks.History at 0x7f9edf0aef90>
Checking performance
_, baseline_model_accuracy = model.evaluate(
test_images, test_labels, verbose=0)
_, q_aware_model_accuracy = q_aware_model.evaluate(
test_images, test_labels, verbose=0)
print('Baseline test accuracy:', baseline_model_accuracy)
print('Quant test accuracy:', q_aware_model_accuracy)
Baseline test accuracy: 0.9660999774932861
Quant test accuracy: 0.9660000205039978
QAT tf.GradientTape().
Here is the QAT training part on the base model. Note we can also perform custom training over the base model.
batch_size = 500
train_dataset = tf.data.Dataset.from_tensor_slices((train_images_subset,
train_labels_subset))
train_dataset = train_dataset.batch(batch_size=batch_size,
drop_remainder=False)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam()
for epoch in range(1):
    for x, y in train_dataset:
        with tf.GradientTape() as tape:
            preds = q_aware_model(x, training=True)
            loss = loss_fn(y, preds)
        grads = tape.gradient(loss, q_aware_model.trainable_variables)
        optimizer.apply_gradients(zip(grads, q_aware_model.trainable_variables))
_, baseline_model_accuracy = model.evaluate(
test_images, test_labels, verbose=0)
_, q_aware_model_accuracy = q_aware_model.evaluate(
test_images, test_labels, verbose=0)
print('Baseline test accuracy:', baseline_model_accuracy)
print('Quant test accuracy:', q_aware_model_accuracy)
Baseline test accuracy: 0.9660999774932861
Quant test accuracy: 0.9645000100135803
I tried binary classification using only the MNIST digits "1" and "5",
but the accuracy isn't good. Is there anything wrong with the following program?
If you find something, please give me some advice.
loss: -9.9190e+04
accuracy: 0.5599
import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt
mnist = keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train / 255.0
x_test = x_test / 255.0
train_filter = np.where((y_train == 1) | (y_train == 5))
test_filter = np.where((y_test == 1) | (y_test == 5))
x_train, y_train = x_train[train_filter], y_train[train_filter]
x_test, y_test = x_test[test_filter], y_test[test_filter]
print("x_train", x_train.shape)
print("x_test", x_test.shape)
# x_train (12163, 28, 28)
# x_test (2027, 28, 28)
model = keras.Sequential(
    [
        keras.layers.Flatten(input_shape=(28, 28)),
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ]
)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5)
loss, acc = model.evaluate(x_test, y_test, verbose=2)
print("accuracy:", acc)
# 2027/1 - 0s - loss: -9.9190e+04 - accuracy: 0.5599
# accuracy: 0.5599408
Your y_train and y_test are filled with the class labels 1 and 5, but the sigmoid activation in your last layer squashes the output between 0 and 1.
If you change the 5s into 0s in your y, you will get a really high accuracy:
y_train = np.where(y_train == 5, 0, y_train)
y_test = np.where(y_test == 5, 0, y_test)
result:
64/64 - 0s - loss: 0.0087 - accuracy: 0.9990
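An equivalent sketch maps both label arrays to {0, 1} in one step, treating digit 1 as the positive class (which digit is positive is an arbitrary choice):
# labels are 1 and 5 after the np.where filtering above; map 1 -> 1 and 5 -> 0
y_train = (y_train == 1).astype("float32")
y_test = (y_test == 1).astype("float32")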