TensorFlow/Keras stops using GPU after recompiling model

I'm trying to train my sequential model (RNN -> GRU -> Dense) with Keras/TensorFlow 2.0 in two phases, with different loss weights in each phase. To change the loss weights, I need to recompile the model between the phases. My problem is that training becomes much slower after the recompilation, and I can see no other explanation than that the GPU is no longer being used. Here is the relevant code:
# Build model
input_ = tf.keras.layers.Input(shape=(None, num_features))
masking = tf.keras.layers.Masking(mask_value=0.)(input_)
rnn = tf.keras.layers.SimpleRNN(24, return_sequences=True, name="rnn")(masking)
gru = tf.keras.layers.GRU(16, return_sequences=True, name="gru")(rnn)
dense1 = tf.keras.layers.Dense(5, activation=tf.nn.softmax, name="dense1")(gru)
dense2 = tf.keras.layers.Dense(1, activation=tf.math.sigmoid, name="dense2")(gru)
model = tf.keras.Model(inputs=[input_], outputs=[dense1, dense2])
# Learning rate scheduler: reduce the learning rate by a factor of 0.5 when there has been no progress for 7 epochs
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor='loss', factor=0.5, patience=7, min_lr=0.0001)
# Compile and fit, phase 1
optimizer = tf.keras.optimizers.Adam(lr=0.01, clipvalue=0.1)
model.compile(optimizer=optimizer, loss=['categorical_crossentropy', 'binary_crossentropy'], sample_weight_mode="temporal", loss_weights=[0.7, 0.3], metrics=['accuracy'])
model.fit_generator(train_generator(), steps_per_epoch=BATCHES_PER_EPOCH, epochs=375, callbacks=[reduce_lr])
# Recompile and fit, phase 2
optimizer.lr = 0.001
model.compile(optimizer=optimizer, loss=['categorical_crossentropy', 'binary_crossentropy'], sample_weight_mode="temporal", loss_weights=[0.99, 0.01], metrics=['accuracy'])
model.fit_generator(train_generator(), steps_per_epoch=BATCHES_PER_EPOCH, epochs=125, callbacks=[reduce_lr])
The output at the end of phase 1 and the start of phase 2 shows that training becomes about 5 times slower:
Epoch 374/375
4/4 [==============================] - 5s 1s/step - loss: 0.1177 - dense1_loss: 0.1479 - dense2_loss: 0.0473 - dense1_accuracy: 0.9249 - dense2_accuracy: 0.9784
Epoch 375/375
4/4 [==============================] - 5s 1s/step - loss: 0.1177 - dense1_loss: 0.1479 - dense2_loss: 0.0473 - dense1_accuracy: 0.9249 - dense2_accuracy: 0.9784
Epoch 1/125
4/4 [==============================] - 27s 7s/step - loss: 0.1494 - dense1_loss: 0.1504 - dense2_loss: 0.0478 - dense1_accuracy: 0.9225 - dense2_accuracy: 0.9779
Epoch 2/125
4/4 [==============================] - 24s 6s/step - loss: 0.1603 - dense1_loss: 0.1614 - dense2_loss: 0.0545 - dense1_accuracy: 0.9201 - dense2_accuracy: 0.9750
What could be the explanation? Is the model reorganized in some way when it's recompiled, so TensorFlow can no longer map the operations to the GPU?
(I have tried just changing the loss weights with model.loss_weights = [0.99, 0.01] but that doesn't work - recompilation is necessary.)

Try this:
Build two separate models that share the same layers (and therefore the same weights):
input_ = tf.keras.layers.Input(shape=(None, num_features))
masking = tf.keras.layers.Masking(mask_value=0.)(input_)
rnn = tf.keras.layers.SimpleRNN(24, return_sequences=True, name="rnn")(masking)
gru = tf.keras.layers.GRU(16, return_sequences=True, name="gru")(rnn)
dense1 = tf.keras.layers.Dense(5, activation=tf.nn.softmax, name="dense1")(gru)
dense2 = tf.keras.layers.Dense(1, activation=tf.math.sigmoid, name="dense2")(gru)
model1 = tf.keras.Model(inputs=[input_], outputs=[dense1, dense2])
model2 = tf.keras.Model(inputs=[input_], outputs=[dense1, dense2])
Compile and fit each one separately, with different optimiser instances:
optimizer1 = tf.keras.optimizers.Adam(lr=0.01, clipvalue=0.1)
optimizer2 = tf.keras.optimizers.Adam(lr=0.001, clipvalue=0.1)
model1.compile(optimizer=optimizer1, loss=['categorical_crossentropy', 'binary_crossentropy'], sample_weight_mode="temporal", loss_weights=[0.7, 0.3], metrics=['accuracy'])
model2.compile(optimizer=optimizer2, loss=['categorical_crossentropy', 'binary_crossentropy'], sample_weight_mode="temporal", loss_weights=[0.99, 0.01], metrics=['accuracy'])
model1.fit_generator(train_generator(), steps_per_epoch=BATCHES_PER_EPOCH, epochs=375, callbacks=[reduce_lr])
model2.fit_generator(train_generator(), steps_per_epoch=BATCHES_PER_EPOCH, epochs=125, callbacks=[reduce_lr])
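Because model1 and model2 are built from the same layer objects, they share their weights: whatever phase 1 learns is already in place when phase 2 starts. A quick check (illustrative only, not part of the original answer):
# Both models reference the exact same layer objects, so training one updates the other.
assert model1.get_layer("rnn") is model2.get_layer("rnn")
assert model1.get_layer("gru") is model2.get_layer("gru")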

Related

eager mode and keras.fit have different results

I am trying to convert model.fit() training in Keras to eager-mode training. The model is an autoencoder with one encoder and two decoders, and the decoders have different loss functions. The loss functions for the decoders are the same in the eager-mode version and in model.fit(), and I tried to set everything else up exactly as in model.fit(), but the resulting losses are different. I would really appreciate some help.
The link for google colab: https://colab.research.google.com/drive/1XNOwJ9oVgs1z9qqXIs_ldnKuSm3Dn2Ud?usp=sharing
Below are the definition and training of the model, which is trained with model.fit(). The output at the end shows the loss values.
def fit_ae(x_unlab, p_m, alpha, parameters):
    # Parameters
    _, dim = x_unlab.shape
    epochs = parameters['epochs']
    batch_size = parameters['batch_size']
    # Build model
    inputs = contrib_layers.Input(shape=(dim,))
    # Encoder
    h = contrib_layers.Dense(int(256), activation='relu', name='encoder1')(inputs)
    h = contrib_layers.Dense(int(128), activation='relu', name='encoder2')(h)
    h = contrib_layers.Dense(int(26), activation='relu', name='encoder3')(h)
    # Mask estimator
    output_1 = contrib_layers.Dense(dim, activation='sigmoid', name='mask')(h)
    # Feature estimator
    output_2 = contrib_layers.Dense(dim, activation='sigmoid', name='feature')(h)
    # Projection network
    model = Model(inputs=inputs, outputs=[output_1, output_2])
    model.compile(optimizer='rmsprop',
                  loss={'mask': 'binary_crossentropy',
                        'feature': 'mean_squared_error'},
                  loss_weights={'mask': 1, 'feature': alpha})
    m_unlab = mask_generator(p_m, x_unlab)
    m_label, x_tilde = pretext_generator(m_unlab, x_unlab)
    # Fit model on unlabeled data
    model.fit(x_tilde, {'mask': m_label, 'feature': x_unlab}, epochs=epochs, batch_size=batch_size)
########### OUTPUT
Epoch 1/15
4/4 [==============================] - 1s 32ms/step - loss: 1.0894 - mask_loss: 0.6560 - feature_loss: 0.2167
Epoch 2/15
4/4 [==============================] - 0s 23ms/step - loss: 0.6923 - mask_loss: 0.4336 - feature_loss: 0.1293
Epoch 3/15
4/4 [==============================] - 0s 26ms/step - loss: 0.4720 - mask_loss: 0.3022 - feature_loss: 0.0849
Epoch 4/15
4/4 [==============================] - 0s 23ms/step - loss: 0.4054 - mask_loss: 0.2581 - feature_loss: 0.0736
In the following code, I implemented the same model in eager mode. I set the optimizer and loss functions to be the same as in the code above, and the training data are the same for both models.
###################################################### MODEL AUTOENCODER ============================================
def eager_ae(x_unlab, p_m, alpha, parameters):
    # import pdb; pdb.set_trace()
    _, dim = x_unlab.shape
    epochs = parameters['epochs']
    batch_size = parameters['batch_size']
    E = keras.Sequential([
        Input(shape=[dim,]),
        Dense(256, activation='relu'),
        Dense(128, activation='relu'),
        Dense(26, activation='relu'),
    ])
    # Mask estimator
    output_1 = keras.Sequential([
        Dense(dim, activation='sigmoid'),
    ])
    # Feature estimator
    output_2 = keras.Sequential([
        Dense(dim, activation='sigmoid'),
    ])
    optimizer = tf.keras.optimizers.RMSprop()
    loss_mask = tf.keras.losses.BinaryCrossentropy()
    loss_feature = tf.keras.losses.MeanSquaredError()
    # Generate corrupted samples
    m_unlab = mask_generator(p_m, x_unlab)
    m_label, x_tilde = pretext_generator(m_unlab, x_unlab)
    for epoch in range(epochs):
        loss_metric = tf.keras.metrics.Mean(name='train_loss')
        len_batch = range(int(x_unlab.shape[0] / batch_size))
        for i in len_batch:
            samples = x_tilde[i*batch_size:(i+1)*batch_size]
            mask = m_label[i*batch_size:(i+1)*batch_size]
            # train_step(samples, tgt)
            with tf.GradientTape() as tape:
                latent = E(samples, training=True)
                out_mask = output_1(latent)
                out_feat = output_2(latent)
                # import pdb; pdb.set_trace()
                lm = loss_mask(out_mask, tf.Variable(mask, dtype=tf.float32))
                lf = loss_feature(out_feat, tf.Variable(samples, dtype=tf.float32))
                pred_loss = lm + alpha*lf
            trainable_vars = E.trainable_weights + output_1.trainable_weights + output_2.trainable_weights
            grads = tape.gradient(pred_loss, trainable_vars)
            optimizer.apply_gradients(zip(grads, trainable_vars))
            loss_metric.update_state(pred_loss)
        print(f'Epoch {epoch}, Loss {loss_metric.result()}')
    return E
############# OUTPUT
Epoch 0, Loss 7.902271747589111
Epoch 1, Loss 5.336598873138428
Epoch 2, Loss 2.880791664123535
Epoch 3, Loss 1.9296690225601196
Epoch 4, Loss 1.6377944946289062
Epoch 5, Loss 1.5342860221862793
Epoch 6, Loss 1.5015968084335327
Epoch 7, Loss 1.4912563562393188
The total loss in the first version ends up much lower (≈0.25) than the total loss in the second version (≈1.3). I cannot find the issue in my second (eager-mode) implementation.

How to reduce false positives and false negatives on train set in deep learning

I am training a deep neural network for a classification task on my dataset.
On both the train and test sets, these are my observations:
For every true positive there are approximately 3 false positives.
For approximately every 4 true negatives there is 1 false negative.
The data is scaled with StandardScaler and then clipped to the range [-5, 5].
Below is the output observed during training:
382/382 [==============================] - 3s 9ms/step - loss: 0.6897 - tp: 84096.0000 - fp: 244779.0000 - tn: 355888.0000 - fn: 97448.0000 - accuracy: 0.5625 - precision: 0.2557 - recall: 0.4632 - auc: 0.5407 - prc: 0.2722
val_loss: 0.6838 - val_tp: 19065.0000 - val_fp: 56533.0000 - val_tn: 91902.0000 - val_fn: 23829.0000 - val_accuracy: 0.5800 - val_precision: 0.2522 - val_recall: 0.4445 - val_auc: 0.5468 - val_prc: 0.2722
Can someone please let me know what I can do to minimise false classifications on both the train and test sets?
I am using an imbalanced dataset with class_weight, as shown in the code below:
METRICS = [
    keras.metrics.TruePositives(name='tp'),
    keras.metrics.FalsePositives(name='fp'),
    keras.metrics.TrueNegatives(name='tn'),
    keras.metrics.FalseNegatives(name='fn'),
    keras.metrics.BinaryAccuracy(name='accuracy'),
    keras.metrics.Precision(name='precision'),
    keras.metrics.Recall(name='recall'),
    keras.metrics.AUC(name='auc'),
    keras.metrics.AUC(name='prc', curve='PR'),  # precision-recall curve
]
pos = sum(y_train)
neg = y_train.shape[0] - pos
total = y_train.shape[0]
weight_for_0 = (1 / neg) * (total / 2.0)
weight_for_1 = (1 / pos) * (total / 2.0)
class_weight = {0: weight_for_0, 1: weight_for_1}
def make_model(size, layers, metrics=METRICS, output_bias=None):
    if output_bias is not None:
        output_bias = tf.keras.initializers.Constant(output_bias)
    model = keras.Sequential()
    model.add(keras.layers.Dense(size, input_shape=(window_length*indicators,)))
    model.add(keras.layers.Dropout(0.5))
    for i in range(layers-1):
        model.add(keras.layers.Dense(size))
        model.add(keras.layers.Dropout(0.5))
    model.add(keras.layers.Dense(1, activation="sigmoid", bias_initializer=output_bias))
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=0.0001),
        loss=tf.keras.losses.BinaryCrossentropy(),
        metrics=metrics)
    return model
EPOCHS = 100
BATCH_SIZE = 2048
early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor='val_prc',
    verbose=1,
    patience=10,
    mode='max',
    restore_best_weights=True)
model = make_model(size=size, layers=layers, output_bias=np.log(pos/neg))
history = model.fit(X_train, y_train, batch_size=BATCH_SIZE, epochs=EPOCHS,
                    callbacks=[early_stopping],
                    validation_data=(X_test, y_test_pos),
                    class_weight=class_weight)
Can somebody please help?

What does it mean when the loss starts going up again?

I am running the code from https://www.tensorflow.org/tutorials/text/text_generation. I will copy it at the bottom of the question. If I change the EPOCHS line to
EPOCHS = 100
something odd happens to the loss. It starts by going down, as in:
Epoch 1/100
172/172 [==============================] - 301s 2s/step - loss: 2.7219
Epoch 2/100
172/172 [==============================] - 328s 2s/step - loss: 1.9963
Epoch 3/100
172/172 [==============================] - 344s 2s/step - loss: 1.7313
Epoch 4/100
172/172 [==============================] - 321s 2s/step - loss: 1.5778
Epoch 5/100
172/172 [==============================] - 325s 2s/step - loss: 1.4840
reaching its lowest level at epoch 46/100, when the loss is 0.6233. It then goes back up again, finishing with:
Epoch 96/100
172/172 [==============================] - 292s 2s/step - loss: 0.8749
Epoch 97/100
172/172 [==============================] - 292s 2s/step - loss: 0.8933
Epoch 98/100
172/172 [==============================] - 292s 2s/step - loss: 0.9073
Epoch 99/100
172/172 [==============================] - 292s 2s/step - loss: 0.9181
Epoch 100/100
172/172 [==============================] - 292s 2s/step - loss: 0.9298
Why is it doing this and what does it mean?
import tensorflow as tf
import numpy as np
import os
import time
path_to_file = tf.keras.utils.get_file('shakespeare.txt', 'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')
# Read, then decode for py2 compat.
text = open(path_to_file, 'rb').read().decode(encoding='utf-8')
# length of text is the number of characters in it
print('Length of text: {} characters'.format(len(text)))
# Take a look at the first 250 characters in text
print(text[:250])
# The unique characters in the file
vocab = sorted(set(text))
print('{} unique characters'.format(len(vocab)))
# Creating a mapping from unique characters to indices
char2idx = {u:i for i, u in enumerate(vocab)}
idx2char = np.array(vocab)
text_as_int = np.array([char2idx[c] for c in text])
print('{')
for char, _ in zip(char2idx, range(20)):
    print(' {:4s}: {:3d},'.format(repr(char), char2idx[char]))
print(' ...\n}')
# Show how the first 13 characters from the text are mapped to integers
print('{} ---- characters mapped to int ---- > {}'.format(repr(text[:13]), text_as_int[:13]))
# The maximum length sentence you want for a single input in characters
seq_length = 100
examples_per_epoch = len(text)//(seq_length+1)
# Create training examples / targets
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)
for i in char_dataset.take(5):
    print(idx2char[i.numpy()])
sequences = char_dataset.batch(seq_length+1, drop_remainder=True)
for item in sequences.take(5):
    print(repr(''.join(idx2char[item.numpy()])))
def split_input_target(chunk):
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text
dataset = sequences.map(split_input_target)
for input_example, target_example in dataset.take(1):
    print('Input data: ', repr(''.join(idx2char[input_example.numpy()])))
    print('Target data:', repr(''.join(idx2char[target_example.numpy()])))
for i, (input_idx, target_idx) in enumerate(zip(input_example[:5], target_example[:5])):
    print("Step {:4d}".format(i))
    print(" input: {} ({:s})".format(input_idx, repr(idx2char[input_idx])))
    print(" expected output: {} ({:s})".format(target_idx, repr(idx2char[target_idx])))
# Batch size
BATCH_SIZE = 64
# Buffer size to shuffle the dataset
# (TF data is designed to work with possibly infinite sequences,
# so it doesn't attempt to shuffle the entire sequence in memory. Instead,
# it maintains a buffer in which it shuffles elements).
BUFFER_SIZE = 10000
dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
dataset
# Length of the vocabulary in chars
vocab_size = len(vocab)
# The embedding dimension
embedding_dim = 256
# Number of RNN units
rnn_units = 1024
def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embedding_dim,
                                  batch_input_shape=[batch_size, None]),
        tf.keras.layers.GRU(rnn_units,
                            return_sequences=True,
                            stateful=True,
                            recurrent_initializer='glorot_uniform'),
        tf.keras.layers.Dense(vocab_size)
    ])
    return model
model = build_model(
    vocab_size=len(vocab),
    embedding_dim=embedding_dim,
    rnn_units=rnn_units,
    batch_size=BATCH_SIZE)
for input_example_batch, target_example_batch in dataset.take(1):
    example_batch_predictions = model(input_example_batch)
    print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")
model.summary()
sampled_indices = tf.random.categorical(example_batch_predictions[0], num_samples=1)
sampled_indices = tf.squeeze(sampled_indices,axis=-1).numpy()
sampled_indices
print("Input: \n", repr("".join(idx2char[input_example_batch[0]])))
print()
print("Next Char Predictions: \n", repr("".join(idx2char[sampled_indices ])))
def loss(labels, logits):
    return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)
example_batch_loss = loss(target_example_batch, example_batch_predictions)
print("Prediction shape: ", example_batch_predictions.shape, " # (batch_size, sequence_length, vocab_size)")
print("scalar_loss: ", example_batch_loss.numpy().mean())
model.compile(optimizer='adam', loss=loss)
# Directory where the checkpoints will be saved
checkpoint_dir = './training_checkpoints'
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True)
EPOCHS = 100
history = model.fit(dataset, epochs=EPOCHS, callbacks=[checkpoint_callback])
tf.train.latest_checkpoint(checkpoint_dir)
model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1)
model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))
model.build(tf.TensorShape([1, None]))
model.summary()
def generate_text(model, start_string):
    # Evaluation step (generating text using the learned model)
    # Number of characters to generate
    num_generate = 1000
    # Converting our start string to numbers (vectorizing)
    input_eval = [char2idx[s] for s in start_string]
    input_eval = tf.expand_dims(input_eval, 0)
    # Empty string to store our results
    text_generated = []
    # Low temperature results in more predictable text.
    # Higher temperature results in more surprising text.
    # Experiment to find the best setting.
    temperature = 1.0
    # Here batch size == 1
    model.reset_states()
    for i in range(num_generate):
        predictions = model(input_eval)
        # remove the batch dimension
        predictions = tf.squeeze(predictions, 0)
        # using a categorical distribution to predict the character returned by the model
        predictions = predictions / temperature
        predicted_id = tf.random.categorical(predictions, num_samples=1)[-1, 0].numpy()
        # Pass the predicted character as the next input to the model
        # along with the previous hidden state
        input_eval = tf.expand_dims([predicted_id], 0)
        text_generated.append(idx2char[predicted_id])
    return (start_string + ''.join(text_generated))
print(generate_text(model, start_string=u"ROMEO: "))
This particular model can't fit much better than this, since it is limited by its architecture and generates only one symbol per step.
A loss that steadily goes up after some epochs is a common sign that your model is overfitting, and there is no point in training it any further.
You could tune the hyperparameters to (possibly) make some minor improvements.
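If you want training to stop automatically once the loss stops improving, an EarlyStopping callback is one option (a minimal sketch; monitoring 'loss' with patience=5 is just an illustrative choice, not part of the tutorial):
# Illustrative sketch: stop once the training loss has not improved for 5 epochs.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='loss',              # the tutorial's fit() call tracks only the training loss
    patience=5,
    restore_best_weights=True)
history = model.fit(dataset, epochs=EPOCHS,
                    callbacks=[checkpoint_callback, early_stop])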
Edit:
To tune the embedding dimension, RNN units, and sequence length, change these values:
seq_length = 100
embedding_dim = 256
rnn_units = 1024
To tune the learning rate, replace this line:
model.compile(optimizer='adam', loss=loss)
with this one:
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.005), loss=loss)
Also, you can add arbitrary layers to the build_model function.
Here is an example with an extra GRU layer:
def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embedding_dim,
                                  batch_input_shape=[batch_size, None]),
        tf.keras.layers.GRU(rnn_units,
                            return_sequences=True,
                            stateful=True,
                            recurrent_initializer='glorot_uniform'),
        tf.keras.layers.GRU(rnn_units,
                            return_sequences=True,
                            stateful=True,
                            recurrent_initializer='glorot_uniform'),
        tf.keras.layers.Dense(vocab_size)
    ])
    return model

After saving a checkpoint with ModelCheckpoint, Keras stopped the training process

I am training a CNN with tf.keras. After saving a checkpoint, Keras didn't start the next epoch.
Note:
1) tf.keras.callbacks.ModelCheckpoint was used as the saver.
2) fit_generator() was used for training.
def iterate_minibatches(inputs, targets, batchsize):
    assert len(inputs) == len(targets)
    indices = np.arange(len(inputs))
    np.random.shuffle(indices)
    for start_idx in np.arange(0, len(inputs) - batchsize + 1, batchsize):
        excerpt = indices[start_idx:start_idx + batchsize]
        yield load_images(inputs[excerpt], targets[excerpt])
# Model path
model_path = "C:/Users/Paperspace/Desktop/checkpoints/cp.ckpt"
# saver = tf.train.Saver(max_to_keep=3)
cp_callback = tf.keras.callbacks.ModelCheckpoint(model_path,
                                                 verbose=1,
                                                 save_weights_only=True,
                                                 period=2)
tb_callback = TensorBoard(log_dir="./Graph/{}".format(time()))
batch_size = 750
history = model.fit_generator(generator=iterate_minibatches(X_train, Y_train, batch_size),
                              validation_data=iterate_minibatches(X_test, Y_test, batch_size),
                              # validation_data=None,
                              steps_per_epoch=len(X_train)//batch_size,
                              validation_steps=len(X_test)//batch_size,
                              verbose=1,
                              epochs=30,
                              callbacks=[cp_callback, tb_callback])
Actual result: it stops training without reporting any issue.
Expected result: it should continue with the next epoch.
Log:
Epoch 1/30
53/53 [==============================] - 919s 17s/step - loss: 1.2445 - acc: 0.0718
426/426 [==============================] - 7058s 17s/step - loss: 1.7877 - acc: 0.0687 - val_loss: 1.2445 - val_acc: 0.0718
Epoch 2/30
WARNING:tensorflow:Your dataset iterator ran out of data.
Epoch 00002: saving model to C:/Users/Paperspace/Desktop/checkpoints/cp.ckpt
WARNING:tensorflow:This model was compiled with a Keras optimizer (<tensorflow.python.keras.optimizers.Adam object at 0x0000023A913DE470>) but is being saved in TensorFlow format with `save_weights`. The model's weights will be saved, but unlike with TensorFlow optimizers in the TensorFlow format the optimizer's state will not be saved.
Consider using a TensorFlow optimizer from `tf.train`.
WARNING:tensorflow:From C:\Users\Paperspace\Anaconda3\lib\site-packages\tensorflow\python\keras\engine\network.py:1436: update_checkpoint_state (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.train.CheckpointManager to manage checkpoints rather than manually editing the Checkpoint proto.
0/426 [..............................] - ETA: 0s - loss: 0.0000e+00 - acc: 0.0687 - val_loss: 0.0000e+00 - val_acc: 0.0000e+00
At first glance, your generator looks incorrect. Keras generators need a while True: loop in them. Maybe this will work for you:
def iterate_minibatches(inputs, targets, batchsize):
    assert len(inputs) == len(targets)
    indices = np.arange(len(inputs))
    np.random.shuffle(indices)
    while True:
        start = 0
        end = batchsize
        while start < len(inputs):
            excerpt = indices[start:end]
            yield load_images(inputs[excerpt], targets[excerpt])
            start += batchsize
            end += batchsize
A Keras generator has to yield batches in an infinite loop. This change should work; otherwise, you can follow a tutorial like this one.
def iterate_minibatches(inputs, targets, batchsize):
    assert len(inputs) == len(targets)
    while True:
        indices = np.arange(len(inputs))
        np.random.shuffle(indices)
        for start_idx in np.arange(0, len(inputs) - batchsize + 1, batchsize):
            excerpt = indices[start_idx:start_idx + batchsize]
            yield load_images(inputs[excerpt], targets[excerpt])

Val_loss in Keras using TFRecord

I was building a model in Keras using TensorFlow's Dataset API and TFRecords. I succeeded in training the model with Keras, but the problem lies in val_loss: it is not shown at all in Keras's progress bar.
if __name__ == '__main__':
    x_train, y_train = input_fn('train_whale_without07.tfrecords')
    x_test, y_test = input_fn('test_whale_without07.tfrecords')
    img_input = layers.Input(tensor=x_train)
    model = CNN(img_input)
    model.compile(optimizer=Adam(lr=0.001), loss='categorical_crossentropy',
                  metrics=[categorical_crossentropy, categorical_accuracy],
                  target_tensors=[y_train])
    model.fit(steps_per_epoch=3000, epochs=EPOCHS, batch_size=None, verbose=1,
              validation_data=([x_test, y_test]))
    model.save('my_model_keras.h5')
The results look like this:
Epoch 1/15
1/3000 [..............................] - ETA: 00:05:12 - loss: 8.1786 - categorical_crossentropy: 8.1786 - categorical_accuracy: 0.0000e+00
Does anybody know how to add val_loss?
Validation loss and metrics are only computed at the end of an epoch, not during training, so they won't be shown while iterating over batches of the training set; they appear only at the end of each epoch.
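If you want to see this explicitly, here is a minimal sketch of a custom callback (illustrative, not part of the original code): per-batch logs contain only training metrics, while val_loss shows up in the logs passed at the end of each epoch, provided validation data was supplied to fit().
class ShowValLoss(tf.keras.callbacks.Callback):
    # Illustrative only: per-batch logs never contain 'val_loss';
    # validation metrics are filled in at the end of each epoch.
    def on_epoch_end(self, epoch, logs=None):
        print('epoch', epoch, 'val_loss:', (logs or {}).get('val_loss'))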