I have trained a simple CNN for binary image-classification. Then I tried to make predictions. I used ImageDataGenerator to create a prediction-iterator (predict_it). Then I used
probabilities = model.predict(predict_it)
and got the following results, which are plausible:
print(probabilities)
[[9.0815449e-01]
[1.9250402e-02]
[9.8504424e-01]
...
[4.3504283e-06]
[3.5914429e-06]
[7.5442227e-07]]
So the model itself seems to work correctly. However, this method of prediction processes an entire subdirectory filled with images. Therefore I tried to process a single image. But then I always get the same value when executing the following code (multiple times for 8 different images):
probabilitiy = model.predict(single_image)
Result:
1/1 [==============================] - 0s 82ms/step
probability: [[0.5078959]]
1/1 [==============================] - 0s 33ms/step
probability: [[0.5078958]]
1/1 [==============================] - 0s 32ms/step
probability: [[0.5078958]]
1/1 [==============================] - 0s 42ms/step
probability: [[0.5078958]]
1/1 [==============================] - 0s 45ms/step
probability: [[0.50789577]]
1/1 [==============================] - 0s 39ms/step
probability: [[0.5078958]]
1/1 [==============================] - 0s 32ms/step
probability: [[0.5078957]]
1/1 [==============================] - 0s 33ms/step
probability: [[0.5078958]]
Does anyone have an idea how this can happen?
Thanks in advance.
I tried to save the model in different forms: (.keras and .h5) - but that didn't change the results mentioned above.
The code I use to preprocess a single image:
# a function to load and prepare each image
def load_image(filename):
# load the image
img = load_img(filename, grayscale=False, target_size=(256, 256))
# convert to array
img = img_to_array(img)
# reshape into a single sample with 1 channel
img = img.reshape(1, 256, 256, 3)
# prepare pixel data
img = img.astype("float32")
img = img / 255.0
return img
# run through all files in a specified folder
for filename in files:
# load a single image
img = load_image(filename)
# predict the probability
probability = model.predict(img)
# display the result
print(f"probability: {probability}")
I actually found the solution in the Keras documentation on the rescaling layer:
"Rescaling is applied both during training and inference."
Since my model has a rescaling-layer I must pass an image to the model (when doing single image prediction) which is NOT rescaled already. Otherwise rescaling is applied twice!
So the correct code for preprocessing a single image for prediction is:
def load_image(filename):
# load the image
img = load_img(filename, grayscale=False, target_size=(256, 256))
# convert to array
img = img_to_array(img)
# reshape into a single sample with 1 channel
img = img.reshape(1, 256, 256, 3)
# prepare pixel data
img = img.astype("float32")
return img
# run through all files in a specified folder
for filename in files:
# load a single image
img = load_image(filename)
# predict the probability
probability = model.predict(img)
# display the result
print(f"probability: {probability}")```
Related
Is there any way to show the training progress from the TensorFlow linear estimator: tf.estimator.LinearClassifier().train() similar to how the progress output would be with a model.fit() for each Epoch?
tensorflow==2.9.2
Epoch 1/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.4964 - accuracy: 0.8270
Epoch 2/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.3751 - accuracy: 0.8652
Epoch 3/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.3382 - accuracy: 0.8762
Here is a sample of my code
#input function
def make_input_fn(data_df, label_df, num_epochs=1000, shuffle=True, batch_size=32):
def input_function(): # inner function, this will be returned
ds = tf.data.Dataset.from_tensor_slices((dict(data_df), label_df)) # create tf.data.Dataset object with data and its label
if shuffle:
ds = ds.shuffle(1000) # randomize order of data
ds = ds.batch(batch_size).repeat(num_epochs) # split dataset into batches of 32 and repeat process for number of epochs
return ds # return a batch of the dataset
return input_function # return a function object for use
train_input_fn = make_input_fn(dftrain, y_train) # here we will call the input_function that was returned to us to get a dataset object we can feed to the model
eval_input_fn = make_input_fn(dfeval, y_eval, num_epochs=1, shuffle=False)
pre_input_fn = make_input_fn(dfpre, y_pre, num_epochs=1, shuffle=False)
linear_est = tf.estimator.LinearClassifier(feature_columns=feature_columns)
linear_est.train(train_input_fn) # train
result = linear_est.evaluate(eval_input_fn)
What I've been doing, and I'm sure this isn't recommended (but I've not seen another method) is to run linear_est.train multiple times, and access the return of linear_est.evaluate() as so:
loss_by_epoch = list()
current_loss = 1.0
acceptable_loss = 0.4
while current_loss > acceptable_loss:
linear_est.train(train_input_fn)
result = linear_est.evaluate(eval_input_fn)
current_loss = result['loss']
loss_by_epoch.append(current_loss)
print(loss_by_epoch)
P.S. If anyone else wants to answer this question, feel free; this answer seems like the only way, and I hope it isn't.
I'm trying to train a U-Net like model to segment the German Asphalt Pavement Distress Dataset.
Mask images are stored as grey value images.
Coding of the grey values:
0 = VOID, 1 = intact road, 2 = applied patch, 3 = pothole, 4 = inlaid patch, 5 = open joint, 6 = crack 7 = street inventory
I found the following colab notebook which was implementing U-Net segmentation on Oxford pets dataset:
https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/vision/ipynb/oxford_pets_image_segmentation.ipynb
I modified the notebook to fit my problem of GAPs segmentation, and this is a link to my modified notebook:
https://colab.research.google.com/drive/1YfM4lC78QNdfbkgz-1LGSKaBG4-65dkC?usp=sharing
The training is running, but while the loss is decreasing, but accuracy is never increasing above 0.05. I'm stuck with this issue for days now, and I need help about how to get the model to train properly.
The following is a link to the dataset images and masks:
https://drive.google.com/drive/folders/1-JvLSa9b1falqEake2KVaYYtyVh-dgKY?usp=sharing
In the Sequence class, you do not shuffle the content of the batches, only the batch order is shuffled with the fit method. You have to shuffle the order of all the data at each epoch.
Here is a way to do it in a Sequence subclass:
class OxfordPets(keras.utils.Sequence):
"""Helper to iterate over the data (as Numpy arrays)."""
def __init__(self, batch_size, img_size, input_img_paths, target_img_paths):
self.batch_size = batch_size
self.img_size = img_size
self.input_img_paths = input_img_paths
self.target_img_paths = target_img_paths
self.set_len = len(self.target_img_paths) // self.batch_size
self.indices = random.sample(range(self.set_len), k=self.set_len)
def __len__(self):
return self.set_len
def __getitem__(self, idx):
"""Returns tuple (input, target) correspond to batch #idx."""
i = idx * self.batch_size
indices = self.indices[i : i + self.batch_size]
batch_input_img_paths = [self.input_img_paths[k] for k in indices]
batch_target_img_paths = [self.target_img_paths[k] for k in indices]
x = np.zeros((self.batch_size,) + self.img_size + (3,), dtype="float32")
for j, path in enumerate(batch_input_img_paths):
img = load_img(path, target_size=self.img_size)
x[j] = img
y = np.zeros((self.batch_size,) + self.img_size + (1,), dtype="uint8")
for j, path in enumerate(batch_target_img_paths):
img = load_img(path, target_size=self.img_size, color_mode="grayscale")
y[j] = np.expand_dims(img, 2)
# Ground truth labels are 1, 2, 3. Subtract one to make them 0, 1, 2:
#y[j] -= 1 # I commented this line out because the ground truth labels of GAPs dataset are 0, 1, 2, 3, 4, 5, 6, 7
return x, y
def on_epoch_end(self):
self.indices = random.sample(range(self.set_len), k=self.set_len)
self.indices is a random shuffle of the all indices range(self.set_len) and it is built in the constructor and at the end of each epoch. This permits to shuffle the order of all the data.
Using rmsprop optimizer, it works then :
Epoch 1/15
88/88 [==============================] - 96s 1s/step - loss: 1.9617 - categorical_accuracy: 0.9156 - val_loss: 5.8705 - val_categorical_accuracy: 0.9375
Epoch 2/15
88/88 [==============================] - 93s 1s/step - loss: 0.4754 - categorical_accuracy: 0.9369 - val_loss: 1.9207 - val_categorical_accuracy: 0.9375
Epoch 3/15
88/88 [==============================] - 94s 1s/step - loss: 0.4497 - categorical_accuracy: 0.9447 - val_loss: 9.3833 - val_categorical_accuracy: 0.9375
Epoch 4/15
88/88 [==============================] - 94s 1s/step - loss: 0.3173 - categorical_accuracy: 0.9423 - val_loss: 14.2518 - val_categorical_accuracy: 0.9369
Epoch 5/15
88/88 [==============================] - 94s 1s/step - loss: 0.0645 - categorical_accuracy: 0.9400 - val_loss: 110.9821 - val_categorical_accuracy: 0.8963
Note that there is very quickly some overfitting.
I am working on the dataset given in the paper https://arxiv.org/ftp/arxiv/papers/1511/1511.02459.pdf
In this paper, a dataset of images (portraits of people) is labeled by a floating number between 1 and 5 (1 ugly, 5 good looking). I wanted to work on this dataset and use MobileNetV2 with transfer learning (pretrained on Imagenet) in Tensorflow 2.4.0-dev20201009 with CUDA 11.1 on my RTX 3070 8gb. I don't really see my mistake but training my model yields often in constant validation loss, for example:
78/78 [==============================] - ETA: 0s - loss: 52145660442.33472020-11-20 13:19:36.796481: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:596] layout failed: Invalid argument: Size of values 2 does not match size of permutation 4 # fanin shape insequential/dense/BiasAdd-0-TransposeNHWCToNCHW-LayoutOptimizer
78/78 [==============================] - 16s 70ms/step - loss: 51654522711.5709 - val_loss: 9.5415
Epoch 2/300
78/78 [==============================] - 4s 52ms/step - loss: 9.4870 - val_loss: 9.5415
Epoch 3/300
78/78 [==============================] - 4s 52ms/step - loss: 9.3986 - val_loss: 9.5415
Epoch 4/300
78/78 [==============================] - 4s 51ms/step - loss: 9.4950 - val_loss: 9.5415
Epoch 5/300
78/78 [==============================] - 4s 52ms/step - loss: 9.4076 - val_loss: 9.5415
Epoch 6/300
78/78 [==============================] - 4s 52ms/step - loss: 9.4993 - val_loss: 9.5415
Epoch 7/300
78/78 [==============================] - 4s 52ms/step - loss: 9.3758 - val_loss: 9.5415
...
The validation loss would remain constant for 300 epochs. My code can be found here below. Let me summarize:
I used transfer-learning from Imagenet and froze the convolutional base of MobileNetV2.
I added a dense layer as the classificator and 1 output neuron. The loss function I used is MSE. The optimizer in the code is SGD, and I also tried ADAM which could also yield constant loss values on the validation set.
The above error (constant val loss) occurs also with different learning rates and with ADAM. Sometimes the same learning rate yields not constant val loss but reasonable loss. I assume this is due to the randomized weights initialization method on the dense layers in my classificator. I even tried absurd learning_rates like 10, and values are still constant. If the lr is very high then changes should be clearly seen! This is not the case. What is wrong?
My code:
import os
from typing import Dict, Any
from PIL import Image
from sklearn.model_selection import GridSearchCV
import tensorflow as tf
from tensorflow.keras.applications.mobilenet_v2 import MobileNetV2
from tensorflow.keras import layers
from tensorflow import keras
import matplotlib.pyplot as plt
import pickle
import numpy as np
import cv2
import random
#method to create the model
def create_model(IMG_SIZE, lr):
#Limit memore usage of GPU
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
try:
tf.config.experimental.set_virtual_device_configuration(gpus[0], [
tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024*7)])
except RuntimeError as e:
print(e)
model = keras.Sequential()
model.add(MobileNetV2(input_shape=(IMG_SIZE, IMG_SIZE, 3), include_top=False))
model.layers[0].trainable = False
model.add(layers.GlobalAveragePooling2D())
model.add(tf.keras.layers.Dropout(0.8))
model.add(layers.Dense(128, activation="relu"))
model.add(layers.Dense(1, activation="relu"))
#use adam or sgd as optimizers
adam = tf.keras.optimizers.Adam(learning_rate=lr, beta_1=0.9, beta_2=0.98,
epsilon=1e-9)
sgd = tf.keras.optimizers.SGD(lr=lr, decay=1e-6, momentum=0.5)
model.compile(optimizer=sgd,
loss=tf.losses.mean_squared_error,
)
model.summary()
return model
#preprocessing
def loadImages(IMG_SIZE):
path = os.path.join(os.getcwd(), 'data\\Images')
training_data=[]
labelMap = getLabelMap()
for img in os.listdir(path):
out_array = np.zeros((350,350, 3), np.float32) #original size of images in the dataset
try:
img_array = cv2.imread(os.path.join(path, img))
img_array=img_array.astype('float32') #cast to float because to prevent normalization erros
out_array = cv2.normalize(img_array, out_array, 0, 1, cv2.NORM_MINMAX) #normalize image
out_array = cv2.resize(out_array, (IMG_SIZE, IMG_SIZE)) #resize, bc we need 224x224 for Imagenet pretrained weights
training_data.append([out_array, float(labelMap[img])])
except Exception as e:
pass
return training_data
#preprocessing, the txt file All_labels.txt has lines of the form 'filename.jpg 3.2' and 3.2 is the label
def getLabelMap():
map = {}
path = os.getcwd()
path = os.path.join(path, "data\\train_test_files\\All_labels.txt")
f = open(path, "r")
for line in f:
line = line.split()
map[line[0]] = line[1]
f.close()
return map
#not important, in case you want to see the images after preprocessing
def showimg(image):
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
plt.imshow(image)
plt.show()
#pickle the preprocessed data
def pickle_it(training_set, IMG_SIZE):
X = []
Y = []
for features, label in training_set:
X.append(features)
Y.append(label)
X = np.array(X).reshape(-1, IMG_SIZE, IMG_SIZE, 3)
Y = np.array(Y)
pickle_out = open("X.pickle", "wb")
pickle.dump(X, pickle_out)
pickle_out.close()
pickle_out = open("Y.pickle", "wb")
pickle.dump(Y, pickle_out)
pickle_out.close()
#for prediction after training the model
def betterThan(y, Y):
Z=np.sort(Y)
cnt = 0
for z in Z:
if z>y:
break
else:
cnt = cnt+1
return float(cnt/len(Y))
#for prediction after training the model
def predictImage(image, model, Y):
img_array = cv2.imread(image)
img_array = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE))
img_array = np.array(img_array).reshape(-1, IMG_SIZE, IMG_SIZE, 3)
y = model.predict(img_array)
per = betterThan(y, Y)
print('You look better than ' + str(per) + '% of the dataset')
#Main/Driver function
#Preprocessing
IMG_SIZE = 224
training_set=[]
training_set = loadImages(IMG_SIZE)
random.shuffle(training_set)
pickle_it(training_set, IMG_SIZE) #I pickle my data, so that I don't always have to go through the preprocessing
#Load preprocessed data
X = pickle.load(open("X.pickle", "rb"))
Y = pickle.load(open("Y.pickle", "rb"))
#Just to check that the images look correct
showimg(X[0])
# define the grid search parameters, feel free to edit the grids
batch_size = [64]
epochsGrid = [300]
learning_rate = [0.1]
#save models and best parameters found in grid search
size_histories = {}
min_val_loss = 10
best_para = {}
#ignore this, used for bugs on my gpu... You possibly don't need this
config = tf.compat.v1.ConfigProto(gpu_options=tf.compat.v1.GPUOptions(allow_growth=True))
sess = tf.compat.v1.Session(config=config)
#grid search, training the model
for epochs in epochsGrid:
for batch in batch_size:
for lr in learning_rate:
model = create_model(IMG_SIZE, lr)
model_name = str(epochs) + '_' + str(batch) + '_' + str(lr)
#train the model with the given hyperparameters
size_histories[model_name] = model.fit(X, Y, batch_size=batch, epochs=epochs, validation_split=0.1)
# save model with the best loss value
if min(size_histories[model_name].history['val_loss']) < min_val_loss:
min_val_loss = min(size_histories[model_name].history['val_loss'])
best_para['epoch'] = epochs
best_para['batch'] = batch
best_para['lr'] = lr
model.save('savedModel')
#If you want to make prediction
model = tf.keras.models.load_model("savedModel")
image = os.path.join(os.getcwd(), 'data\\otherImages\\beautifulWomen.jpg')
predictImage(image, model, Y)
EDIT:
I have found the issue. It is 'relu' in the output neuron. When I change my loss from RMSE to MAPE I will see that I got a 100 percent error on validation. I assume this is because all my validation data is output to 0. This is only possible when the value in the output neuron before 'relu' is negative. I don't know why this is the case. But removing 'relu' will yield better training.
Does anyone know why 'relu' causes this problem with regression problems?
If this is your last layer
model.add(layers.Dense(1, activation="relu"))
then your models final output is y if y > 0 else 0. At your untrained state, your model could very well have y pinned to something like -17 or 17 with fairly equal chance. In the case of -17, the relu will convert that to 0 and also set the gradient to 0, which means the network doesn't learn. Yeah, the network doesn't learn anything from any part of a network where a relu unit output 0. In the case of the layer before
model.add(layers.Dense(128, activation="relu"))
there will be a really good chance that about half of the units will fire with a positive value and so they learn, so that layer is fine.
What can be done in the case of a bad initialization or after training a bad state in which the output of that last layer is pushed down to below 0? Well, what if we just don't use relu. What activation to use? None! Let's look at what that would be
1: model = keras.Sequential()
2: model.add(MobileNetV2(input_shape=(IMG_SIZE, IMG_SIZE, 3), include_top=False))
3: model.layers[0].trainable = False
4: model.add(layers.GlobalAveragePooling2D())
5: model.add(tf.keras.layers.Dropout(0.8))
6: model.add(layers.Dense(128, activation="relu"))
7: model.add(layers.Dense(1))
Lines 1-6 are all the same. It is important to note that the output of line 6 passes through the non-linear relu activation, and so there is the capability to learn non-linearities. Line 7, without an activation function will be a linear combination of Line 6, with a full ability to generate gradients in the positive and negative output region. When backprop is applied to learn the target values of 1 to 5, if the network outputs -17, it can learn to output a larger number. Yeah!
If you'd like to have 2 layers of nonlinearity, I'd suggest the following
1: model = keras.Sequential()
2: model.add(MobileNetV2(input_shape=(IMG_SIZE, IMG_SIZE, 3), include_top=False))
3: model.layers[0].trainable = False
4: model.add(layers.GlobalAveragePooling2D())
5: model.add(layers.Dense(128, activation="tanh"))
6: model.add(layers.Dense(64, activation="tanh"))
7: model.add(layers.Dense(1))
Ditch the dropout unless you have actual proof that it helps in this very specific network (and right now I suspect you don't). Try tanh as your hidden layer activation function. It has some nice features, like being positive and negative, gradient even with large and/or negative numbers, and acts somewhat to automatically regularize weights. But, importantly, the last output either has no activation function.
I am running the code from https://www.tensorflow.org/tutorials/text/text_generation. I will copy it at the bottom of the question. If I change the EPOCHS line to
EPOCHS = 100
something odd happens to the loss. It starts by going down, as in:
Epoch 1/100
172/172 [==============================] - 301s 2s/step - loss: 2.7219
Epoch 2/100
172/172 [==============================] - 328s 2s/step - loss: 1.9963
Epoch 3/100
172/172 [==============================] - 344s 2s/step - loss: 1.7313
Epoch 4/100
172/172 [==============================] - 321s 2s/step - loss: 1.5778
Epoch 5/100
172/172 [==============================] - 325s 2s/step - loss: 1.4840
reaching it's lowest level at Epoch 46/100 when the loss is 0.6233. It then goes back up again finishing with:
Epoch 96/100
172/172 [==============================] - 292s 2s/step - loss: 0.8749
Epoch 97/100
172/172 [==============================] - 292s 2s/step - loss: 0.8933
Epoch 98/100
172/172 [==============================] - 292s 2s/step - loss: 0.9073
Epoch 99/100
172/172 [==============================] - 292s 2s/step - loss: 0.9181
Epoch 100/100
172/172 [==============================] - 292s 2s/step - loss: 0.9298
Why is it doing this and what does it mean?
import tensorflow as tf
import numpy as np
import os
import time
path_to_file = tf.keras.utils.get_file('shakespeare.txt', 'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')
# Read, then decode for py2 compat.
text = open(path_to_file, 'rb').read().decode(encoding='utf-8')
# length of text is the number of characters in it
print('Length of text: {} characters'.format(len(text)))
# Take a look at the first 250 characters in text
print(text[:250])
# The unique characters in the file
vocab = sorted(set(text))
print('{} unique characters'.format(len(vocab)))
# Creating a mapping from unique characters to indices
char2idx = {u:i for i, u in enumerate(vocab)}
idx2char = np.array(vocab)
text_as_int = np.array([char2idx[c] for c in text])
print('{')
for char,_ in zip(char2idx, range(20)):
print(' {:4s}: {:3d},'.format(repr(char), char2idx[char]))
print(' ...\n}')
# Show how the first 13 characters from the text are mapped to integers
print('{} ---- characters mapped to int ---- > {}'.format(repr(text[:13]), text_as_int[:13]))
# The maximum length sentence you want for a single input in characters
seq_length = 100
examples_per_epoch = len(text)//(seq_length+1)
# Create training examples / targets
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)
for i in char_dataset.take(5):
print(idx2char[i.numpy()])
sequences = char_dataset.batch(seq_length+1, drop_remainder=True)
for item in sequences.take(5):
print(repr(''.join(idx2char[item.numpy()])))
def split_input_target(chunk):
input_text = chunk[:-1]
target_text = chunk[1:]
return input_text, target_text
dataset = sequences.map(split_input_target)
for input_example, target_example in dataset.take(1):
print('Input data: ', repr(''.join(idx2char[input_example.numpy()])))
print('Target data:', repr(''.join(idx2char[target_example.numpy()])))
for i, (input_idx, target_idx) in enumerate(zip(input_example[:5], target_example[:5])):
print("Step {:4d}".format(i))
print(" input: {} ({:s})".format(input_idx, repr(idx2char[input_idx])))
print(" expected output: {} ({:s})".format(target_idx, repr(idx2char[target_idx])))
# Batch size
BATCH_SIZE = 64
# Buffer size to shuffle the dataset
# (TF data is designed to work with possibly infinite sequences,
# so it doesn't attempt to shuffle the entire sequence in memory. Instead,
# it maintains a buffer in which it shuffles elements).
BUFFER_SIZE = 10000
dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
dataset
# Length of the vocabulary in chars
vocab_size = len(vocab)
# The embedding dimension
embedding_dim = 256
# Number of RNN units
rnn_units = 1024
def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
model = tf.keras.Sequential([
tf.keras.layers.Embedding(vocab_size, embedding_dim,
batch_input_shape=[batch_size, None]),
tf.keras.layers.GRU(rnn_units,
return_sequences=True,
stateful=True,
recurrent_initializer='glorot_uniform'),
tf.keras.layers.Dense(vocab_size)
])
return model
model = build_model(
vocab_size=len(vocab),
embedding_dim=embedding_dim,
rnn_units=rnn_units,
batch_size=BATCH_SIZE)
for input_example_batch, target_example_batch in dataset.take(1):
example_batch_predictions = model(input_example_batch)
print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")
model.summary()
sampled_indices = tf.random.categorical(example_batch_predictions[0], num_samples=1)
sampled_indices = tf.squeeze(sampled_indices,axis=-1).numpy()
sampled_indices
print("Input: \n", repr("".join(idx2char[input_example_batch[0]])))
print()
print("Next Char Predictions: \n", repr("".join(idx2char[sampled_indices ])))
def loss(labels, logits):
return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)
example_batch_loss = loss(target_example_batch, example_batch_predictions)
print("Prediction shape: ", example_batch_predictions.shape, " # (batch_size, sequence_length, vocab_size)")
print("scalar_loss: ", example_batch_loss.numpy().mean())
model.compile(optimizer='adam', loss=loss)
# Directory where the checkpoints will be saved
checkpoint_dir = './training_checkpoints'
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
filepath=checkpoint_prefix,
save_weights_only=True)
EPOCHS = 100
history = model.fit(dataset, epochs=EPOCHS, callbacks=[checkpoint_callback])
tf.train.latest_checkpoint(checkpoint_dir)
model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1)
model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))
model.build(tf.TensorShape([1, None]))
model.summary()
def generate_text(model, start_string):
# Evaluation step (generating text using the learned model)
# Number of characters to generate
num_generate = 1000
# Converting our start string to numbers (vectorizing)
input_eval = [char2idx[s] for s in start_string]
input_eval = tf.expand_dims(input_eval, 0)
# Empty string to store our results
text_generated = []
# Low temperature results in more predictable text.
# Higher temperature results in more surprising text.
# Experiment to find the best setting.
temperature = 1.0
# Here batch size == 1
model.reset_states()
for i in range(num_generate):
predictions = model(input_eval)
# remove the batch dimension
predictions = tf.squeeze(predictions, 0)
# using a categorical distribution to predict the character returned by the model
predictions = predictions / temperature
predicted_id = tf.random.categorical(predictions, num_samples=1)[-1,0].numpy()
# Pass the predicted character as the next input to the model
# along with the previous hidden state
input_eval = tf.expand_dims([predicted_id], 0)
text_generated.append(idx2char[predicted_id])
return (start_string + ''.join(text_generated))
print(generate_text(model, start_string=u"ROMEO: "))
This particular model can't fit any better than this, since it is limited to its architecture and only one symbol generation per step.
A loss steadily going up after some epochs is a usual thing indicating your model overtrains, and there is no point in training any further.
You could tune hyperparameters to (possibly) make some minor improvements.
Edit:
To tune embedding dimensions, rnn units, and sequence length change those values:
seq_length = 100
embedding_dim = 256
rnn_units = 1024
To tune learning rate replace this lane:
model.compile(optimizer='adam', loss=loss)
with this one:
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.005), loss=loss)
Also, you can add arbitrary layers to build_model function.
Here is an example with an extra GRU layer:
def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
model = tf.keras.Sequential([
tf.keras.layers.Embedding(vocab_size, embedding_dim,
batch_input_shape=[batch_size, None]),
tf.keras.layers.GRU(rnn_units,
return_sequences=True,
stateful=True,
recurrent_initializer='glorot_uniform'),
tf.keras.layers.GRU(rnn_units,
return_sequences=True,
stateful=True,
recurrent_initializer='glorot_uniform'),
tf.keras.layers.Dense(vocab_size)
])
return model
I am training CNN with tf.keras. After of saving checkpoint Keras didn't start next epoch
Note:
1)As a saver was used tf.keras.callbacks.ModelCeckpoint
2)For training used fit_generator()
def iterate_minibatches(inputs, targets, batchsize):
assert len(inputs) == len(targets)
indices = np.arange(len(inputs))
np.random.shuffle(indices)
for start_idx in np.arange(0, len(inputs) - batchsize + 1, batchsize):
excerpt = indices[start_idx:start_idx + batchsize]
yield load_images(inputs[excerpt], targets[excerpt])
#Model path
model_path = "C:/Users/Paperspace/Desktop/checkpoints/cp.ckpt"
#saver = tf.train.Saver(max_to_keep=3)
cp_callback = tf.keras.callbacks.ModelCheckpoint(model_path,
verbose=1,
save_weights_only=True,
period=2)
tb_callback =TensorBoard(log_dir="./Graph/{}".format(time()))
batch_size = 750
history = model.fit_generator(generator=iterate_minibatches(X_train, Y_train,batch_size),
validation_data=iterate_minibatches(X_test, Y_test, batch_size),
# validation_data=None,
steps_per_epoch=len(X_train)//batch_size,
validation_steps=len(X_test)//batch_size,
verbose=1,
epochs=30,
callbacks=[cp_callback,tb_callback]
)
Actual result it stops training without any issue.
Expected result to go next epoch.
**Log**
Epoch 1/30
53/53 [==============================] - 919s 17s/step - loss: 1.2445 - acc: 0.0718
426/426 [==============================] - 7058s 17s/step - loss: 1.7877 - acc: 0.0687 - val_loss: 1.2445 - val_acc: 0.0718
Epoch 2/30
WARNING:tensorflow:Your dataset iterator ran out of data.
Epoch 00002: saving model to C:/Users/Paperspace/Desktop/checkpoints/cp.ckpt
WARNING:tensorflow:This model was compiled with a Keras optimizer (<tensorflow.python.keras.optimizers.Adam object at 0x0000023A913DE470>) but is being saved in TensorFlow format with `save_weights`. The model's weights will be saved, but unlike with TensorFlow optimizers in the TensorFlow format the optimizer's state will not be saved.
Consider using a TensorFlow optimizer from `tf.train`.
WARNING:tensorflow:From C:\Users\Paperspace\Anaconda3\lib\site-packages\tensorflow\python\keras\engine\network.py:1436: update_checkpoint_state (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.train.CheckpointManager to manage checkpoints rather than manually editing the Checkpoint proto.
0/426 [..............................] - ETA: 0s - loss: 0.0000e+00 - acc: 0.0687 - val_loss: 0.0000e+00 - val_acc: 0.0000e+00
On first look, your generator looks incorrect. Keras generators need a while True: loop in them. Maybe this will work for you
def iterate_minibatches(inputs, targets, batchsize):
assert len(inputs) == len(targets)
indices = np.arange(len(inputs))
np.random.shuffle(indices)
while True:
start = 0
end = batchsize
while start < len(inputs):
excerpt = indices[start:end]
yield load_images(inputs[excerpt], targets[excerpt])
start += batchsize
end += batchsize
A Keras generator has to yield batches in an infinite loop. This change should work, otherwise you can follow a tutorial like this.
def iterate_minibatches(inputs, targets, batchsize):
assert len(inputs) == len(targets)
while True:
indices = np.arange(len(inputs))
np.random.shuffle(indices)
for start_idx in np.arange(0, len(inputs) - batchsize + 1, batchsize):
excerpt = indices[start_idx:start_idx + batchsize]
yield load_images(inputs[excerpt], targets[excerpt])