AlreadyExistsError while training a network on colab - tensorflow

I'm trying to train an LSTMs network on Google Colab. However, this error occurs:
AlreadyExistsError: Resource __per_step_116/training_4/Adam/gradients/bidirectional_4/while/ReadVariableOp/Enter_grad/ArithmeticOptimizer/AddOpsRewrite_Add/tmp_var/N10tensorflow19TemporaryVariableOp6TmpVarE
[[{{node training_4/Adam/gradients/bidirectional_4/while/ReadVariableOp/Enter_grad/ArithmeticOptimizer/AddOpsRewrite_Add/tmp_var}}]]
I don't know where can be the issue. This is the model of the network:
sl_model = keras.models.Sequential()
sl_model.add(keras.layers.Embedding(max_index+1, hidden_size, mask_zero=True))
sl_model.add(keras.layers.Bidirectional(keras.layers.LSTM(hidden_size,
activation='tanh', dropout=0.2, recurrent_dropout = 0.2, return_sequences=True)))
sl_model.add(keras.layers.Bidirectional(keras.layers.LSTM(hidden_size, activation='tanh', dropout=0.2, recurrent_dropout = 0.2, return_sequences=False))
)
sl_model.add(keras.layers.Dense(max_length, activation='softsign'))
optimizer = keras.optimizers.Adam()
sl_model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['acc'])
batch_size = 128
epochs = 3
cbk = keras.callbacks.TensorBoard("logging/keras_model")
print("\nStarting training...")
sl_model.fit(x_train, y_train, epochs=epochs, batch_size=batch_size,
shuffle=True, validation_data=(x_dev, y_dev), callbacks=[cbk])
Thank you so much!

You need to restart your runtime -- this happens when you have defined multiple graphs built in a single jupyter (Colaboratory) runtime.
Calling tf.reset_default_graph() may also help, but depending on whether you are using eager exection and how you've defined your sessions this may or may not work.

Related

why am I getting error in transfer learning?

I am training a model for Optical Character Recognition of Gujarati Language. The input image is a character image. I have taken 37 classes. Total training images are 22200 (600 per class) and testing images are 5920 (160 per class). My input images are 32x32
Below is my code:
model = tf.keras.applications.DenseNet121(include_top=False, weights='imagenet', pooling='max')
base_inputs = model.layers[0].input
base_outputs = model.layers[-1].output # NOTICE -1 not -2
prefinal_outputs = layers.Dense(1024)(base_outputs)
final_outputs = layers.Dense(37)(prefinal_outputs)
new_model = keras.Model(inputs=base_inputs, outputs=base_outputs)
from tensorflow.keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=False)
test_datagen = ImageDataGenerator(horizontal_flip = False)
training_set = train_datagen.flow_from_directory('C:/Users/shweta/Desktop/characters/train',
target_size = (32, 32),
batch_size = 64,
class_mode = 'categorical')
test_set = test_datagen.flow_from_directory('C:/Users/shweta/Desktop/characters/test',
target_size = (32, 32),
batch_size = 64,
class_mode = 'categorical')
new_model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])
new_model.fit_generator(training_set,
epochs = 25,
validation_data = test_set, shuffle=True)
new_model.save('alphanumeric.mod')
I am getting following output:
Thanks in advance!
First of all, very well written code.
These are some of the things, I have noticed while I was going through the code and tf,keras docs.
I would like to ask what kind of labels have you got beacuse you know categorical_crossentropy expects ONE HOT CODED labels.(Check this).So, if your labels are integers, use sparsecategoricalentropy.
Similar issue
There was post where someone was trying to classsify into 2 and used categorical instead of binary crossentropy. If you want to look at.
Cheers
Let me know how it goes!
PS: #gerry made a very good point and if labels are One hot encoded use categoricalcrossentropy!
The code should be:
model = tf.keras.applications.DenseNet121(include_top=False, weights='imagenet, pooling='max', input_shape=(32,32,3))
base_outputs = model.layers[-1].output
prefinal_outputs = layers.Dense(1024)(base_outputs)
final_outputs = layers.Dense(37)(prefinal_outputs)
new_model = keras.Model(inputs=model.input, outputs=final_outputs)
new_model.compile(Adam(), loss='categorical_crossentropy', metrics=['accuracy'])
Also you should use model.fit in the future. Model.fit can now work with generators and model.fit_generator will be depreciate in future versions of tensorflow. I ran against your dataset and got accurate results in about 10 epochs. Here is some additional advice. It is best to use and adjustable learning rate. The keras callback ReduceLROnPlateau makes this easy to do. Documentation is here. Set it to monitor the validation loss. My use is shown below.
lr_adjust=tf.keras.callbacks.ReduceLROnPlateau( monitor="val_loss", factor=0.5, patience=1, verbose=1, mode="auto",
min_delta=0.00001, cooldown=0, min_lr=0)
Also I recommend using the callback ModelCheckpoint. Documentation is here. Set it up to monitor validation loss and it will save the weights that achieved the lowest validation loss. My implementation is shown below.
sav_loc=r'c:\Temp' # set this to the path where you want to save the weights
checkpoint=tf.keras.callbacks.ModelCheckpoint(filepath=save_loc, monitor='val_loss', verbose=1, save_best_only=True,
save_weights_only=True, mode='auto', save_freq='epoch', options=None)
callbacks=[checkpoint, lr_adjust]
In model.fit include callbacks=callbacks. When training is completed you want to load these saved weights into the model, then save the model. You can use the saved model to make predictions. Code is below.
model.load_weights(save_loc)
model.save(save_loc)

Tensorflow TensorBoard not showing acc, loss, acc_val, and loss_val only only epoch_accuracy and epoch_loss

I am looking to have a TensorBoard display graphs corresponding to acc, loss, acc_val, and loss_val but they do not appear for some reason. Here is what I am seeing.
I am looking to have this:
I am following the instruction here to be able to use tensorboard in a google colab notebook
This is the code used to generate the tensorboard:
opt = tf.keras.optimizers.Adam(lr=0.001, decay=1e-6)
tensorboard = TensorBoard(log_dir="logs/{}".format(NAME),
histogram_freq=1,
write_graph=True,
write_grads=True,
batch_size=BATCH_SIZE,
write_images=True)
model.compile(
loss='sparse_categorical_crossentropy',
optimizer=opt,
metrics=['accuracy']
)
# Train model
history = model.fit(
train_x, train_y,
batch_size=BATCH_SIZE,
epochs=EPOCHS,
validation_data=(validation_x, validation_y),
callbacks=[tensorboard]
)
How do I go about solving this issue? Any ideas? Your help is very much appreciated!
That's the intended behavior. If you want to log custom scalars such as a dynamic learning rate, you need to use the TensorFlow Summary API.
Retrain the regression model and log a custom learning rate. Here's how:
Create a file writer, using tf.summary.create_file_writer().
Define a custom learning rate function. This will be passed to the Keras LearningRateScheduler callback.
Inside the learning rate function, use tf.summary.scalar() to log the custom learning rate.
Pass the LearningRateScheduler callback to Model.fit().
In general, to log a custom scalar, you need to use tf.summary.scalar() with a file writer. The file writer is responsible for writing data for this run to the specified directory and is implicitly used when you use the tf.summary.scalar().
logdir = "logs/scalars/" + datetime.now().strftime("%Y%m%d-%H%M%S")
file_writer = tf.summary.create_file_writer(logdir + "/metrics")
file_writer.set_as_default()
def lr_schedule(epoch):
"""
Returns a custom learning rate that decreases as epochs progress.
"""
learning_rate = 0.2
if epoch > 10:
learning_rate = 0.02
if epoch > 20:
learning_rate = 0.01
if epoch > 50:
learning_rate = 0.005
tf.summary.scalar('learning rate', data=learning_rate, step=epoch)
return learning_rate
lr_callback = keras.callbacks.LearningRateScheduler(lr_schedule)
tensorboard_callback = keras.callbacks.TensorBoard(log_dir=logdir)
model = keras.models.Sequential([
keras.layers.Dense(16, input_dim=1),
keras.layers.Dense(1),
])
model.compile(
loss='mse', # keras.losses.mean_squared_error
optimizer=keras.optimizers.SGD(),
)
training_history = model.fit(
x_train, # input
y_train, # output
batch_size=train_size,
verbose=0, # Suppress chatty output; use Tensorboard instead
epochs=100,
validation_data=(x_test, y_test),
callbacks=[tensorboard_callback, lr_callback],
)

Tensorflow Hub vs Keras application - performance drop

I have image classification problem and i want to use Keras pretrained models for this task.
When I use such a model
model = tf.keras.Sequential([
hub.KerasLayer("https://tfhub.dev/google/tf2-preview/mobilenet_v2/feature_vector/4",
output_shape=[1280],
trainable=False),
tf.keras.layers.Dropout(0.5),
tf.keras.layers.Dense(num_classes, activation='softmax')
])
model.build([None, image_size[0], image_size[1], 3])
model.compile(
optimizer=tf.keras.optimizers.Adam(),
loss='categorical_crossentropy',
metrics=['acc'])
I easily get ~90% accuracy and very low loss on balanced dataset. However, if use keras.application like that:
`base_model = tf.keras.applications.mobilenet_v2.MobileNetV2(
input_shape=input_img_size,
include_top=False,
weights='imagenet'
)
base_model.trainable = False
model = tf.keras.layers.Dropout(0.5)(model)
model = tf.keras.layers.Dense(num_classes, activation='softmax')(model)
model = tf.keras.models.Model(inputs=base_model.input, outputs=model)
model.compile(
optimizer=tf.keras.optimizers.Adam(),
loss='categorical_crossentropy',
metrics=['acc'])`
and use a proper tf.keras.application.mobilenet_v2.preprocess_input function in datagenerator (and leaving everything else the same) it is stuck at around 60% validation and 80% training.
what is the difference between these approaches? why one is superior to the other?
The data generator:
datagen = tf.keras.preprocessing.image.ImageDataGenerator(
preprocessing_function = preprocessing_function,
rotation_range=10,
zoom_range=0.3,
width_shift_range=0.2,
height_shift_range=0.2,
horizontal_flip=True,
vertical_flip=True,
shear_range=0.2,
)
Training:
history = model.fit_generator(
train_generator,
epochs=nb_epochs,
verbose=1,
steps_per_epoch=steps_per_epoch,
validation_data=valid_generator,
validation_steps=val_steps_per_epoch,
callbacks=[
checkpoint,
learning_rate_reduction,
csv_logger,
tensorboard_callback,
],
)
I believe you are training two different 'models'. In your TensorFlow Hub example, you used mobilenet's feature vector. Feature vector as I understand it, is not the same as a model. It is a 1-D tensor of certain length. It is probably the last layer before the output of the mobilenet model. This is different from the tf.keras example, where you are invoking the full mobilenet model.

Tensorflow dense layers worse than keras sequential

I try to train an agent on the inverse-pendulum (similar to cart-pole) problem, which is a benchmark of reinforcement learning. I use neural-fitted-Q-iteration algorithm which uses a multi-layer neural network to evaluate the Q function.
I use Keras.Sequential and tf.layers.dense to build the neural network repectively, and leave all other things to be the same. However, Keras gives me a good results and tensorflow does not. In fact, tensorflow doesn't work at all with its loss being increasing and the agent learns nothing from the training.
Here I present the code for Keras as follows
def build_model():
model = Sequential()
model.add(Dense(5, input_dim=3))
model.add(Activation('sigmoid'))
model.add(Dense(5))
model.add(Activation('sigmoid'))
model.add(Dense(1))
model.add(Activation('sigmoid'))
adam = Adam(lr=1E-3)
model.compile(loss='mean_squared_error', optimizer=adam)
return model
and the tensorflow version is
class NFQ_fit(object):
"""
neural network approximator for NFQ iteration
"""
def __init__(self, sess, N_feature, learning_rate=1E-3, batch_size=100):
self.sess = sess
self.N_feature = N_feature
self.learning_rate = learning_rate
self.batch_size = batch_size
# DNN structure
self.inputs = tf.placeholder(tf.float32, [None, N_feature], 'inputs')
self.labels = tf.placeholder(tf.float32, [None, 1], 'labels')
self.l1 = tf.layers.dense(inputs=self.inputs,
units=5,
activation=tf.sigmoid,
use_bias=True,
kernel_initializer=tf.truncated_normal_initializer(0.0, 1E-2),
bias_initializer=tf.constant_initializer(0.0),
kernel_regularizer=tf.contrib.layers.l2_regularizer(1E-4),
name='hidden-layer-1')
self.l2 = tf.layers.dense(inputs=self.l1,
units=5,
activation=tf.sigmoid,
use_bias=True,
kernel_initializer=tf.truncated_normal_initializer(0.0, 1E-2),
bias_initializer=tf.constant_initializer(0.0),
kernel_regularizer=tf.contrib.layers.l2_regularizer(1E-4),
name='hidden-layer-2')
self.outputs = tf.layers.dense(inputs=self.l2,
units=1,
activation=tf.sigmoid,
use_bias=True,
kernel_initializer=tf.truncated_normal_initializer(0.0, 1E-2),
bias_initializer=tf.constant_initializer(0.0),
kernel_regularizer=tf.contrib.layers.l2_regularizer(1E-4),
name='outputs')
# optimization
# self.mean_loss = tf.losses.mean_squared_error(self.labels, self.outputs)
self.mean_loss = tf.reduce_mean(tf.square(self.labels-self.outputs))
self.regularization_loss = tf.losses.get_regularization_loss()
self.loss = self.mean_loss # + self.regularization_loss
self.train_op = tf.train.AdamOptimizer(learning_rate=self.learning_rate).minimize(self.loss)
The two models are the same. Both of them has two hidden layers with the same dimension. I expect that the problems may come from the kernel initialization but I don't know how to fix it.
Using Keras is great. If you want better TensorFlow integration check out tf.keras. There's no particular reason to use tf.layers if the Keras (or tf.keras) defaults work better.
In this case glorot_uniform looks like the default initializer. This is also the global TensorFlow default, so consider removing the kernel_initializer argument instead of the explicit truncated normal initialization in your question (or passing Glorot explicitly).

Google Colab TPU takes more time than GPU

Below is the code I am using. I commented out the line to convert my model to the TPU model. With GPU for the same amount of data it's taking 7 seconds for an epoch while using TPU it takes 90 secs.
Inp = tf.keras.Input(name='input', shape=(input_dim,), dtype=tf.float32)
x = tf.keras.layers.Dense(900, kernel_initializer='uniform', activation='relu', input_dim=input_dim, name = 'Dense_01')(Inp)
x = tf.keras.layers.Dropout(0.3, name = 'Dropout_02')(x)
output = tf.keras.layers.Dense(stop_criteria, activation='softmax',name = 'Dense_02')(x)
model = tf.keras.Model(inputs=[Inp], outputs=[output])
opt = tf.train.AdamOptimizer(.001)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['acc'])
'''tpu_model = tf.contrib.tpu.keras_to_tpu_model(model,
strategy=tf.contrib.tpu.TPUDistributionStrategy(
tf.contrib.cluster_resolver.TPUClusterResolver(TPU_ADDRESS)))'''
model.fit(X_tra, y_tra, epochs=5, batch_size=batch_size, shuffle=False,
validation_split=0.1, verbose=2)
Here is the link to the notebook
Have you tried the tpu_model.fit_generator method like in the example below?
The other part looks fine.
Also, one problem could be the use of Adam Optimizer. There was smth. about it, but I forgot where the link is. Try another optimizer and the code below and if a different optimizer worked, you know it must be smth. with the Adam Optimizer.
tf.keras.backend.clear_session()
training_model = lstm_model(seq_len=100, batch_size=128, stateful=False)
tpu_model = tf.contrib.tpu.keras_to_tpu_model(
training_model,
strategy=tf.contrib.tpu.TPUDistributionStrategy(
tf.contrib.cluster_resolver.TPUClusterResolver(TPU_WORKER)))
tpu_model.fit_generator(
training_generator(seq_len=100, batch_size=1024),
steps_per_epoch=100,
epochs=10,
)
tpu_model.save_weights('/tmp/bard.h5', overwrite=True)