I copied some code that creates a neural network. After training, the logs are created successfully, but when I try to visualise them with TensorBoard it reports that no scalar data was found.
This is the code. The logs are created and the event files are there, but TensorBoard still shows no scalar data:
checkpoint_path = "autoencoder.h5" # For each epoch creating a checkpoint
checkpoint_dir = os.path.dirname(checkpoint_path)
cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path,save_weights_only=False,verbose=0,save_best_only=False) # To save the model if the metric is improved
# TensorBoard
! rm -rf ./logs_autoencoder/ # Removing all the files present in the directory
logdir = os.path.join("logs_autoencoder", datetime.datetime.now().strftime("%Y%m%d-%H%M%S")) # Directory for storing the logs that are required for tensorboard
%reload_ext tensorboard
%tensorboard --logdir $logdir
tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)
lrScheduler = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss',patience=2,factor=0.2,verbose=1)
callbacks = [cp_callback,tensorboard_callback,lrScheduler]
autoencoder.fit( train_dataset,shuffle=True,epochs=10,validation_data= test_dataset,callbacks=callbacks)
The output was:
Epoch 1/10
1338/1338 [==============================] - 839s 626ms/step - loss: 0.0104 - val_loss: 0.0046 - lr: 0.0010
Epoch 2/10
1338/1338 [==============================] - 818s 611ms/step - loss: 0.0047 - val_loss: 0.0042 - lr: 0.0010
Epoch 3/10
1338/1338 [==============================] - 824s 616ms/step - loss: 0.0043 - val_loss: 0.0041 - lr: 0.0010
Epoch 4/10
1338/1338 [==============================] - 824s 616ms/step - loss: 0.0040 - val_loss: 0.0037 - lr: 0.0010
Epoch 5/10
1338/1338 [==============================] - 829s 619ms/step - loss: 0.0038 - val_loss: 0.0033 - lr: 0.0010
Epoch 6/10
1338/1338 [==============================] - 834s 624ms/step - loss: 0.0036 - val_loss: 0.0032 - lr: 0.0010
Epoch 7/10
1338/1338 [==============================] - 852s 637ms/step - loss: 0.0035 - val_loss: 0.0032 - lr: 0.0010
Epoch 8/10
1338/1338 [==============================] - ETA: 0s - loss: 0.0034
Epoch 8: ReduceLROnPlateau reducing learning rate to 0.00020000000949949026.
1338/1338 [==============================] - 953s 712ms/step - loss: 0.0034 - val_loss: 0.0031 - lr: 0.0010
Epoch 9/10
1338/1338 [==============================] - 962s 719ms/step - loss: 0.0033 - val_loss: 0.0031 - lr: 2.0000e-04
Epoch 10/10
1338/1338 [==============================] - ETA: 0s - loss: 0.0033
Epoch 10: ReduceLROnPlateau reducing learning rate to 4.0000001899898055e-05.
1338/1338 [==============================] - 939s 702ms/step - loss: 0.0033 - val_loss: 0.0031 - lr: 2.0000e-04
Out[16]:
<keras.callbacks.History at 0x7f8cfe7b2090>
This could simply be an issue with the order of events in the Jupyter notebook. I'd recommend breaking things up a little:
checkpoint_path = "autoencoder.h5" # For each epoch creating a checkpoint
checkpoint_dir = os.path.dirname(checkpoint_path)
cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path,save_weights_only=False,verbose=0,save_best_only=False) # To save the model if the metric is improved
# TensorBoard
! rm -rf ./logs_autoencoder/ # Removing all the files present in the directory
logdir = os.path.join("logs_autoencoder", datetime.datetime.now().strftime("%Y%m%d-%H%M%S")) # Directory for storing the logs that are required for tensorboard
tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)
lrScheduler = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss',patience=2,factor=0.2,verbose=1)
callbacks = [cp_callback,tensorboard_callback,lrScheduler]
autoencoder.fit( train_dataset,shuffle=True,epochs=10,validation_data= test_dataset,callbacks=callbacks)
New cell block
%reload_ext tensorboard
%tensorboard --logdir $logdir
TensorBoard launches before model training starts, so it reports that nothing is present. Then, if you re-run the cell, the !rm -rf line deletes everything, so TensorBoard can't see the data from the previous run either.
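One way to make the ordering problem visible is to check that event files actually exist before launching TensorBoard. The sketch below uses only the standard library; `make_logdir` and `has_event_files` are hypothetical helper names, and the check assumes that TensorBoard event files contain "tfevents" in their file names.

```python
import datetime
import os


def make_logdir(root):
    """Create and return a fresh timestamped log directory under `root`."""
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    logdir = os.path.join(root, stamp)
    os.makedirs(logdir, exist_ok=True)
    return logdir


def has_event_files(logdir):
    """Return True if any file below `logdir` looks like a TensorBoard event file."""
    for _dirpath, _dirnames, filenames in os.walk(logdir):
        if any("tfevents" in name for name in filenames):
            return True
    return False
```

Because each run gets its own timestamped directory, there is no need to delete the parent log directory before every run. Calling has_event_files(logdir) in the cell after training, just before %tensorboard, shows whether there is anything to display yet.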
Here is my complete code:
from tensorflow.keras.preprocessing.image import load_img
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.applications.vgg16 import decode_predictions
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.models import load_model
from tensorflow.keras.layers import Dense, Flatten, Dropout, Input
from tensorflow.keras.models import Model
import numpy as np
from os import listdir
from os.path import isfile, join
import matplotlib.pyplot as plt
train_cats_path = r"C:\Users\jrand\Downloads\dogscats\train\cats"
train_cats = [train_cats_path+"\\"+f for f in listdir(train_cats_path) if isfile(join(train_cats_path, f))]
train_cat_array = []
for tmp_cat in train_cats:
    img = load_img(tmp_cat, target_size=(224, 224)) # Load image in with desired dimensions
    img = img_to_array(img) # Convert the image to a numpy array
    img = img.reshape(224, 224, 3)
    img = preprocess_input(img)
    train_cat_array.append(img)
x_train_cat_labels = np.array([1.0]*len(train_cat_array))
x_train_cat_images = np.array(train_cat_array)
train_dogs_path = r"C:\Users\jrand\Downloads\dogscats\train\cats"
train_dogs = [train_dogs_path+"\\"+f for f in listdir(train_dogs_path) if isfile(join(train_dogs_path, f))]
train_dog_array = []
for tmp_dog in train_dogs:
    img = load_img(tmp_dog, target_size=(224, 224)) # Load image in with desired dimensions
    img = img_to_array(img) # Convert the image to a numpy array
    img = img.reshape(224, 224, 3)
    img = preprocess_input(img)
    train_dog_array.append(img)
x_train_dog_labels = np.array([0.0]*len(train_dog_array))
x_train_dog_images = np.array(train_dog_array)
print("len of dog images", len(x_train_dog_images), "len of cat images", len(x_train_cat_images))
print("len of dog labels", len(x_train_dog_labels), "len of cat labels", len(x_train_cat_labels))
x_train_images = np.concatenate([x_train_dog_images, x_train_cat_images])[0:1000]
x_train_labels = np.concatenate([x_train_dog_labels, x_train_cat_labels])[0:1000]
model = VGG16(weights = 'imagenet', include_top = False, input_shape = (224, 224, 3))
for layer in model.layers:
    layer.trainable = False
x = Flatten()(model.output)
predictions = Dense(1, activation="sigmoid")(x)
new_model = Model(model.input, predictions)
# compile model
model.compile(optimizer="Adam", loss="binary_crossentropy", metrics=["accuracy"])
# train model
history = model.fit(x_train_images, x_train_labels, batch_size=1, epochs=100)
So, as you can see, I take the images of cats and dogs, generate labels for them, and then concatenate the two arrays with NumPy so that I can train on them.
I guess my problems are as follows:
I have to reduce the size of my training set by slicing the arrays to the first 1000 images, which doesn't help things.
I have to use a batch_size of 1, which also doesn't help things.
Can anyone give me some tips for improving this code and the NN performance?
Here is sample output as things currently stand:
len of dog images 11500 len of cat images 11500
len of dog labels 11500 len of cat labels 11500
2022-03-04 11:46:54.410085: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-03-04 11:46:55.170021: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6613 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1080, pci bus id: 0000:02:00.0, compute capability: 6.1
2022-03-04 11:46:55.170948: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 6613 MB memory: -> device: 1, name: NVIDIA GeForce GTX 1080, pci bus id: 0000:04:00.0, compute capability: 6.1
2022-03-04 11:46:56.198632: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
Epoch 1/100
2022-03-04 11:46:56.725619: I tensorflow/stream_executor/cuda/cuda_dnn.cc:369] Loaded cuDNN version 8201
1000/1000 [==============================] - 9s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 2/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 3/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 4/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 5/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 6/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 7/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 8/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 9/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 10/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 11/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 12/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 13/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 14/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 15/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 16/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 17/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 18/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 19/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 20/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 21/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 22/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 23/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 24/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 25/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 26/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 27/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 28/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 29/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 30/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 31/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 32/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 33/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 34/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 35/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 36/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 37/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 38/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 39/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 40/100
The reason I wrote this program is to get some experience with transfer learning. There may be better ways to accomplish this but this is the way I chose.
I updated the code and removed a loop I was messing with. Sorry about that.
I don't think the issue is the small dataset, since transfer learning is meant for working with smaller datasets.
The issue is that you freeze every layer of the pre-trained model (VGG16) and then call model.fit on that frozen model, so none of the layers being fitted are trainable and nothing is allowed to change. In fact, your problem is not that the accuracy is very low, but that it doesn't change at all between epochs. That should be a red flag that something in your code is broken!
Try adding at least one more Dense layer before the last one.
EDIT:
You are also compiling and calling fit() on model instead of new_model, so the new Dense layer is never trained.
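As a toy illustration of why a fully frozen model reports the same loss every epoch, here is a plain-Python sketch. It is not Keras: the `train` function, the parameter names `w` and `b`, and the tiny dataset are all made up for illustration. It runs gradient descent on a one-variable linear model and updates only the parameters listed as trainable.

```python
def train(params, trainable, data, lr=0.1, epochs=5):
    """Gradient descent on y = w*x + b with mean squared error.

    Only the parameter names listed in `trainable` are updated.
    Returns the loss measured at the start of each epoch.
    """
    losses = []
    n = len(data)
    for _ in range(epochs):
        residuals = [params["w"] * x + params["b"] - y for x, y in data]
        losses.append(sum(r * r for r in residuals) / n)
        grad_w = sum(2 * r * x for r, (x, _y) in zip(residuals, data)) / n
        grad_b = sum(2 * r for r in residuals) / n
        if "w" in trainable:
            params["w"] -= lr * grad_w
        if "b" in trainable:
            params["b"] -= lr * grad_b
    return losses


data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]  # samples of y = 2x + 1

# Everything frozen: the loss is identical in every epoch.
frozen_losses = train({"w": 0.5, "b": 0.0}, trainable=set(), data=data)

# One trainable parameter (the analogue of a trainable head): the loss goes down.
head_losses = train({"w": 0.5, "b": 0.0}, trainable={"b"}, data=data)
```

The frozen run mirrors the symptom in the question: identical numbers every epoch are a sign that no parameter is being updated, not that the model is merely learning slowly.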
I hope I've been helpful
I'm trying to follow the fine-tuning steps described in https://www.tensorflow.org/tutorials/images/transfer_learning#create_the_base_model_from_the_pre-trained_convnets to get a trained model for binary segmentation.
I create an encoder-decoder whose encoder uses the MobileNetV2 weights, frozen with encoder.trainable = False. Then I define my decoder as described in the tutorial and train the network for 300 epochs with a learning rate of 0.005. I get the following loss values and Jaccard index during the last epochs:
Epoch 297/300
55/55 [==============================] - 85s 2s/step - loss: 0.2443 - jaccard_sparse3D: 0.5556 - accuracy: 0.9923 - val_loss: 0.0440 - val_jaccard_sparse3D: 0.3172 - val_accuracy: 0.9768
Epoch 298/300
55/55 [==============================] - 75s 1s/step - loss: 0.2437 - jaccard_sparse3D: 0.5190 - accuracy: 0.9932 - val_loss: 0.0422 - val_jaccard_sparse3D: 0.3281 - val_accuracy: 0.9776
Epoch 299/300
55/55 [==============================] - 78s 1s/step - loss: 0.2465 - jaccard_sparse3D: 0.4557 - accuracy: 0.9936 - val_loss: 0.0431 - val_jaccard_sparse3D: 0.3327 - val_accuracy: 0.9769
Epoch 300/300
55/55 [==============================] - 85s 2s/step - loss: 0.2467 - jaccard_sparse3D: 0.5030 - accuracy: 0.9923 - val_loss: 0.0463 - val_jaccard_sparse3D: 0.3315 - val_accuracy: 0.9740
I save all the weights of this model and then run the fine-tuning with the following steps:
model.load_weights('my_pretrained_weights.h5')
model.trainable = True
model.compile(optimizer=Adam(learning_rate=0.00001, name='adam'),
loss=SparseCategoricalCrossentropy(from_logits=True),
metrics=[jaccard, "accuracy"])
model.fit(training_generator, validation_data=(val_x, val_y), epochs=5,
validation_batch_size=2, callbacks=callbacks)
Suddenly the performance of my model is much worse than during the training of the decoder:
Epoch 1/5
55/55 [==============================] - 89s 2s/step - loss: 0.2417 - jaccard_sparse3D: 0.0843 - accuracy: 0.9946 - val_loss: 0.0079 - val_jaccard_sparse3D: 0.0312 - val_accuracy: 0.9992
Epoch 2/5
55/55 [==============================] - 90s 2s/step - loss: 0.1920 - jaccard_sparse3D: 0.1179 - accuracy: 0.9927 - val_loss: 0.0138 - val_jaccard_sparse3D: 7.1138e-05 - val_accuracy: 0.9998
Epoch 3/5
55/55 [==============================] - 95s 2s/step - loss: 0.2173 - jaccard_sparse3D: 0.1227 - accuracy: 0.9932 - val_loss: 0.0171 - val_jaccard_sparse3D: 0.0000e+00 - val_accuracy: 0.9999
Epoch 4/5
55/55 [==============================] - 94s 2s/step - loss: 0.2428 - jaccard_sparse3D: 0.1319 - accuracy: 0.9927 - val_loss: 0.0190 - val_jaccard_sparse3D: 0.0000e+00 - val_accuracy: 1.0000
Epoch 5/5
55/55 [==============================] - 97s 2s/step - loss: 0.1920 - jaccard_sparse3D: 0.1107 - accuracy: 0.9926 - val_loss: 0.0215 - val_jaccard_sparse3D: 0.0000e+00 - val_accuracy: 1.0000
Is there any known reason why this is happening? Is it normal?
Thank you in advance!
OK, I found out what I do differently that makes it NOT necessary to recompile. I do not set encoder.trainable = False. What I do in the code below is equivalent:
for layer in encoder.layers:
    layer.trainable=False
Then train your model. Afterwards you can unfreeze the encoder weights with
for layer in encoder.layers:
    layer.trainable=True
In my tests I did not need to recompile the model and it worked as expected, although the Keras transfer-learning guide advises calling compile() again after changing trainable, so recompiling is the safer choice. You can verify which layers are unfrozen by printing the model summary before and after and looking at the number of trainable parameters. As for changing the learning rate, I find it best to use the Keras callback ReduceLROnPlateau to automatically adjust the learning rate based on validation loss. I also recommend using the EarlyStopping callback, which monitors validation loss and halts training if the loss fails to decrease for 'patience' consecutive epochs. Setting restore_best_weights=True will load the weights from the epoch with the lowest validation loss, so you don't have to save and then reload the weights. Set epochs to a large number to ensure this callback activates. The code I use is shown below:
es=tf.keras.callbacks.EarlyStopping( monitor="val_loss", patience=3,
verbose=1, restore_best_weights=True)
rlronp=tf.keras.callbacks.ReduceLROnPlateau( monitor="val_loss", factor=0.5, patience=1,
verbose=1)
callbacks=[es, rlronp]
In model.fit, set callbacks=callbacks.
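To make the patience logic concrete, here is a simplified plain-Python sketch of what an early-stopping rule does with a sequence of validation losses. It is an illustration under stated assumptions, not the Keras implementation: the function name `early_stopping` and the loss values are made up, and the real callback has further options that are ignored here.

```python
def early_stopping(val_losses, patience=3):
    """Simulate early stopping over a list of per-epoch validation losses.

    Returns (stop_epoch, best_epoch), both 1-based. stop_epoch is None when
    training runs to the end without the patience limit being reached.
    """
    best_loss = float("inf")
    best_epoch = None
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch


# Loss improves until epoch 2, then fails to improve for three epochs in a row.
stop_epoch, best_epoch = early_stopping([0.50, 0.40, 0.41, 0.42, 0.43, 0.30], patience=3)
```

Here training halts at epoch 5 and the best epoch is 2, which is the epoch whose weights restore_best_weights=True would bring back. Note that the improvement at epoch 6 is never seen, which is the trade-off of a small patience value.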
I am training a classifier model on cats vs dogs data. The model is a minor variant of ResNet18 and returns a softmax probability for each class. However, I am noticing that the validation loss is mostly NaN, whereas the training loss is steadily decreasing and behaves as expected. Training and validation accuracy increase epoch by epoch.
Epoch 1/15
312/312 [==============================] - 1372s 4s/step - loss: 0.7849 - accuracy: 0.5131 - val_loss: nan - val_accuracy: 0.5343
Epoch 2/15
312/312 [==============================] - 1372s 4s/step - loss: 0.6966 - accuracy: 0.5539 - val_loss: 13989871201999266517090304.0000 - val_accuracy: 0.5619
Epoch 3/15
312/312 [==============================] - 1373s 4s/step - loss: 0.6570 - accuracy: 0.6077 - val_loss: 747123703808.0000 - val_accuracy: 0.5679
Epoch 4/15
312/312 [==============================] - 1372s 4s/step - loss: 0.6180 - accuracy: 0.6483 - val_loss: nan - val_accuracy: 0.6747
Epoch 5/15
312/312 [==============================] - 1373s 4s/step - loss: 0.5838 - accuracy: 0.6852 - val_loss: nan - val_accuracy: 0.6240
Epoch 6/15
312/312 [==============================] - 1372s 4s/step - loss: 0.5338 - accuracy: 0.7301 - val_loss: 31236203781405710523301888.0000 - val_accuracy: 0.7590
Epoch 7/15
312/312 [==============================] - 1373s 4s/step - loss: 0.4872 - accuracy: 0.7646 - val_loss: 52170.8672 - val_accuracy: 0.7378
Epoch 8/15
312/312 [==============================] - 1372s 4s/step - loss: 0.4385 - accuracy: 0.7928 - val_loss: 2130819335420217655296.0000 - val_accuracy: 0.8101
Epoch 9/15
312/312 [==============================] - 1373s 4s/step - loss: 0.3966 - accuracy: 0.8206 - val_loss: 116842888.0000 - val_accuracy: 0.7857
Epoch 10/15
312/312 [==============================] - 1372s 4s/step - loss: 0.3643 - accuracy: 0.8391 - val_loss: nan - val_accuracy: 0.8199
Epoch 11/15
312/312 [==============================] - 1373s 4s/step - loss: 0.3285 - accuracy: 0.8557 - val_loss: 788904.2500 - val_accuracy: 0.8438
Epoch 12/15
312/312 [==============================] - 1372s 4s/step - loss: 0.3029 - accuracy: 0.8670 - val_loss: nan - val_accuracy: 0.8245
Epoch 13/15
312/312 [==============================] - 1373s 4s/step - loss: 0.2857 - accuracy: 0.8781 - val_loss: 121907.8594 - val_accuracy: 0.8444
Epoch 14/15
312/312 [==============================] - 1373s 4s/step - loss: 0.2585 - accuracy: 0.8891 - val_loss: nan - val_accuracy: 0.8674
Epoch 15/15
312/312 [==============================] - 1374s 4s/step - loss: 0.2430 - accuracy: 0.8965 - val_loss: 822.7968 - val_accuracy: 0.8776
I checked for the following -
Infinity/NaN in validation data
Infinity/NaN caused when normalizing data (using tf.keras.applications.resnet.preprocess_input)
If the model is predicting only one class & hence causing loss function to behave oddly
Training code for reference -
optimizer = tf.keras.optimizers.Adam(learning_rate=5e-3)
model = Resnet18(NUM_CLASSES=NUM_CLASSES) # variant of original model
model.compile(optimizer=optimizer, loss="categorical_crossentropy", metrics=["accuracy"])
history = model.fit(
train_dataset,
steps_per_epoch=len(X_train) // BATCH_SIZE,
epochs=EPOCHS,
validation_data=valid_dataset,
validation_steps=len(X_valid) // BATCH_SIZE,
verbose=1,
)
The most relevant answer I found was the last paragraph of the accepted answer here. However, that doesn't seem to be the case here, as the validation loss diverges by orders of magnitude compared to the training loss and returns NaN. It seems like the loss function is misbehaving.
I am currently studying the book Hands-On Machine Learning. I want to create a simple neural network, as described in chapter 10 of the book, for the MNIST handwritten digit data. But my model is stuck, and the accuracy is not increasing at all.
Here is my code:
import tensorflow as tf
from tensorflow import keras
import pandas as pd
import numpy as np
data = pd.read_csv('sample_data/mnist_train_small.csv', header=None)
test = pd.read_csv('sample_data/mnist_test.csv', header=None)
labels = data[0]
data = data.drop(0, axis=1)
test_labels = test[0]
test = test.drop(0, axis=1)
model = keras.models.Sequential([
keras.layers.Dense(300, activation='relu', input_shape=(784,)),
keras.layers.Dense(100, activation='relu'),
keras.layers.Dense(10, activation='softmax'),
])
model.compile(loss='sparse_categorical_crossentropy',
optimizer='sgd',
metrics=['accuracy'])
keras.utils.plot_model(model, show_shapes=True)
hist = model.fit(data.to_numpy(), labels.to_numpy(), epochs=20, validation_data=(test.to_numpy(), test_labels.to_numpy()))
The first few outputs are :
Epoch 1/20
625/625 [==============================] - 2s 3ms/step - loss: 2055059923226079526912.0000 - accuracy: 0.1115 - val_loss: 2.4539 - val_accuracy: 0.1134
Epoch 2/20
625/625 [==============================] - 2s 3ms/step - loss: 2.4160 - accuracy: 0.1085 - val_loss: 2.2979 - val_accuracy: 0.1008
Epoch 3/20
625/625 [==============================] - 2s 2ms/step - loss: 2.3006 - accuracy: 0.1110 - val_loss: 2.3014 - val_accuracy: 0.1136
Epoch 4/20
625/625 [==============================] - 2s 3ms/step - loss: 2.3009 - accuracy: 0.1121 - val_loss: 2.3014 - val_accuracy: 0.1136
Epoch 5/20
625/625 [==============================] - 2s 3ms/step - loss: 2.3009 - accuracy: 0.1121 - val_loss: 2.3014 - val_accuracy: 0.1136
Epoch 6/20
625/625 [==============================] - 2s 3ms/step - loss: 2.3008 - accuracy: 0.1121 - val_loss: 2.3014 - val_accuracy: 0.1136
Epoch 7/20
625/625 [==============================] - 2s 3ms/step - loss: 2.3008 - accuracy: 0.1121 - val_loss: 2.3014 - val_accuracy: 0.1136
Epoch 8/20
625/625 [==============================] - 2s 3ms/step - loss: 2.3008 - accuracy: 0.1121 - val_loss: 2.3014 - val_accuracy: 0.1136
Epoch 9/20
625/625 [==============================] - 2s 2ms/step - loss: 2.3008 - accuracy: 0.1121 - val_loss: 2.3014 - val_accuracy: 0.1136
Epoch 10/20
625/625 [==============================] - 2s 3ms/step - loss: 2.3008 - accuracy: 0.1121 - val_loss: 2.3014 - val_accuracy: 0.1136
Epoch 11/20
625/625 [==============================] - 2s 3ms/step - loss: 2.3008 - accuracy: 0.1121 - val_loss: 2.3014 - val_accuracy: 0.1136
Epoch 12/20
625/625 [==============================] - 2s 3ms/step - loss: 2.3008 - accuracy: 0.1121 - val_loss: 2.3014 - val_accuracy: 0.1136
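sparse_categorical_crossentropy is actually the right loss here: "sparse" refers to the labels being integer class indices (0 to 9), not to sparse matrices, whereas categorical_crossentropy expects one-hot encoded labels. The enormous loss in the first epoch points to a different cause: the pixel values are most likely still in the 0-255 range, so try scaling the inputs to 0-1 (for example by dividing by 255.0) before training. The Adam optimizer may also work better than plain SGD on this problem. As a side note, you can use data.iloc[] instead of data[] to select columns by position.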
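The relationship between the two losses can be checked by hand. The following plain-Python sketch (the function names mirror the Keras loss names but are written here from scratch for a single sample) computes cross-entropy once from an integer label and once from the equivalent one-hot vector.

```python
import math


def sparse_categorical_crossentropy(label, probs):
    """Cross-entropy for one sample when the label is an integer class index."""
    return -math.log(probs[label])


def categorical_crossentropy(one_hot, probs):
    """Cross-entropy for one sample when the label is a one-hot vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))


probs = [0.1, 0.7, 0.2]      # predicted class probabilities (softmax output)
label = 1                    # integer label, as stored in the CSV's first column
one_hot = [0.0, 1.0, 0.0]    # the same label, one-hot encoded

sparse_loss = sparse_categorical_crossentropy(label, probs)
dense_loss = categorical_crossentropy(one_hot, probs)
```

Both values are the negative log of the probability assigned to the true class, so the choice between the two losses depends only on how the labels are encoded.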
I just started to work with tensorflow 2.0 and followed the simple example from its official website.
import tensorflow as tf
import tensorflow.keras.layers as layers
mnist = tf.keras.datasets.mnist
(t_x, t_y), (v_x, v_y) = mnist.load_data()
model = tf.keras.Sequential()
model.add(layers.Flatten())
model.add(layers.Dense(128, activation="relu"))
model.add(layers.Dropout(0.2))
model.add(layers.Dense(10))
lossFunc = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer='adam', loss=lossFunc,
metrics=['accuracy'])
model.fit(t_x, t_y, epochs=5)
The output for the above code is:
Train on 60000 samples
Epoch 1/5
60000/60000 [==============================] - 4s 60us/sample - loss: 2.5368 - accuracy: 0.7455
Epoch 2/5
60000/60000 [==============================] - 3s 51us/sample - loss: 0.5846 - accuracy: 0.8446
Epoch 3/5
60000/60000 [==============================] - 3s 51us/sample - loss: 0.4751 - accuracy: 0.8757
Epoch 4/5
60000/60000 [==============================] - 3s 51us/sample - loss: 0.4112 - accuracy: 0.8915
Epoch 5/5
60000/60000 [==============================] - 3s 51us/sample - loss: 0.3732 - accuracy: 0.9018
However, if I change the lossFunc to the following:
def myfunc(y_true, y_pred):
    return lossFunc(y_true, y_pred)
which simply wraps the previous function, it performs totally differently. The output is:
Train on 60000 samples
Epoch 1/5
60000/60000 [==============================] - 4s 60us/sample - loss: 2.4444 - accuracy: 0.0889
Epoch 2/5
60000/60000 [==============================] - 3s 51us/sample - loss: 0.5696 - accuracy: 0.0933
Epoch 3/5
60000/60000 [==============================] - 3s 51us/sample - loss: 0.4493 - accuracy: 0.0947
Epoch 4/5
60000/60000 [==============================] - 3s 51us/sample - loss: 0.4046 - accuracy: 0.0947
Epoch 5/5
60000/60000 [==============================] - 3s 51us/sample - loss: 0.3805 - accuracy: 0.0943
The loss values are very similar but the accuracy values are totally different. Does anyone know what is going on here, and what is the correct way to write your own loss function?
When you use a built-in loss function, you can pass 'accuracy' as the metric. Under the hood, TensorFlow selects the appropriate accuracy function (in your case tf.keras.metrics.SparseCategoricalAccuracy()).
When you define a custom loss function, TensorFlow can't tell which accuracy function to use, so you need to specify tf.keras.metrics.SparseCategoricalAccuracy() explicitly. Please check the GitHub gist here.
The modified code and its output are as follows:
model2 = tf.keras.Sequential()
model2.add(layers.Flatten())
model2.add(layers.Dense(128, activation="relu"))
model2.add(layers.Dropout(0.2))
model2.add(layers.Dense(10))
lossFunc = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model2.compile(optimizer='adam', loss=myfunc,
metrics=['accuracy',tf.keras.metrics.SparseCategoricalAccuracy()])
model2.fit(t_x, t_y, epochs=5)
output
Train on 60000 samples
Epoch 1/5
60000/60000 [==============================] - 5s 81us/sample - loss: 2.2295 - accuracy: 0.0917 - sparse_categorical_accuracy: 0.7483
Epoch 2/5
60000/60000 [==============================] - 5s 76us/sample - loss: 0.5827 - accuracy: 0.0922 - sparse_categorical_accuracy: 0.8450
Epoch 3/5
60000/60000 [==============================] - 5s 76us/sample - loss: 0.4602 - accuracy: 0.0933 - sparse_categorical_accuracy: 0.8760
Epoch 4/5
60000/60000 [==============================] - 5s 76us/sample - loss: 0.4197 - accuracy: 0.0946 - sparse_categorical_accuracy: 0.8910
Epoch 5/5
60000/60000 [==============================] - 5s 76us/sample - loss: 0.3965 - accuracy: 0.0937 - sparse_categorical_accuracy: 0.8979
<tensorflow.python.keras.callbacks.History at 0x7f5095286780>
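A likely explanation for the accuracy column staying near 0.09 is that, with a custom loss, 'accuracy' falls back to a one-hot style comparison that takes the argmax of the labels as well as of the predictions. The plain-Python sketch below illustrates that assumption; the two functions are simplified stand-ins written for this example, not the TensorFlow implementations, and the scores are made up.

```python
def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=lambda i: values[i])


def sparse_categorical_accuracy(y_true, y_pred):
    """y_true holds integer class indices, y_pred holds per-class scores."""
    hits = [label == argmax(scores) for label, scores in zip(y_true, y_pred)]
    return sum(hits) / len(hits)


def categorical_accuracy(y_true, y_pred):
    """y_true is expected to hold one-hot rows; both sides go through argmax."""
    hits = [argmax(row) == argmax(scores) for row, scores in zip(y_true, y_pred)]
    return sum(hits) / len(hits)


labels = [3, 0, 2, 1]
scores = [
    [0.1, 0.2, 0.3, 2.0],  # predicts class 3
    [2.0, 0.1, 0.2, 0.3],  # predicts class 0
    [0.1, 0.2, 2.0, 0.3],  # predicts class 2
    [0.1, 2.0, 0.2, 0.3],  # predicts class 1
]

correct_metric = sparse_categorical_accuracy(labels, scores)

# Integer labels treated as length-1 "one-hot" rows: argmax is always 0,
# so a sample only counts as correct when the model predicts class 0.
wrong_metric = categorical_accuracy([[label] for label in labels], scores)
```

Every prediction here is correct, yet the one-hot style metric reports only the share of samples predicted as class 0. On MNIST that share is roughly one in ten, which would match the reported values.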
Hope this helps