Google Colab TPU takes more time than GPU - tensorflow

Below is the code I am using. I commented out the line that converts my model to a TPU model. On a GPU, one epoch takes 7 seconds for the same amount of data, while on a TPU it takes 90 seconds.
Inp = tf.keras.Input(name='input', shape=(input_dim,), dtype=tf.float32)
x = tf.keras.layers.Dense(900, kernel_initializer='uniform', activation='relu', input_dim=input_dim, name = 'Dense_01')(Inp)
x = tf.keras.layers.Dropout(0.3, name = 'Dropout_02')(x)
output = tf.keras.layers.Dense(stop_criteria, activation='softmax',name = 'Dense_02')(x)
model = tf.keras.Model(inputs=[Inp], outputs=[output])
opt = tf.train.AdamOptimizer(.001)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['acc'])
'''tpu_model = tf.contrib.tpu.keras_to_tpu_model(model,
    strategy=tf.contrib.tpu.TPUDistributionStrategy(
        tf.contrib.cluster_resolver.TPUClusterResolver(TPU_ADDRESS)))'''
model.fit(X_tra, y_tra, epochs=5, batch_size=batch_size, shuffle=False,
          validation_split=0.1, verbose=2)
Here is the link to the notebook

Have you tried the tpu_model.fit_generator method, as in the example below?
The rest of the code looks fine.
Another possible problem is the Adam optimizer. I recall reading about an issue with it, but I no longer have the link. Try a different optimizer with the code below; if that works, the problem is most likely related to the Adam optimizer.
tf.keras.backend.clear_session()
training_model = lstm_model(seq_len=100, batch_size=128, stateful=False)
tpu_model = tf.contrib.tpu.keras_to_tpu_model(
    training_model,
    strategy=tf.contrib.tpu.TPUDistributionStrategy(
        tf.contrib.cluster_resolver.TPUClusterResolver(TPU_WORKER)))
tpu_model.fit_generator(
    training_generator(seq_len=100, batch_size=1024),
    steps_per_epoch=100,
    epochs=10,
)
tpu_model.save_weights('/tmp/bard.h5', overwrite=True)

Related

Saving TensorFlow Neural Network KFold Cross Validation model

I am working on a sample neural network with KFold cross-validation, using TensorFlow 2.4.1 and sklearn.
Unfortunately, I am not able to save the model.
def my_model(self,):
    inputs = keras.Input(shape=(48, 48, 3))
    x = layers.Conv2D(filters=4, kernel_size=self.k_size, padding='same', activation="relu")(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPool2D()(x)
    x = layers.Flatten()(x)
    output = layers.Dense(10, activation='softmax')(x)
    model = keras.Model(inputs=inputs, outputs=output)
    model.compile(optimizer='adam',
                  loss=[keras.losses.SparseCategoricalCrossentropy(from_logits=True)],
                  metrics=['accuracy'])
    return model

def train_model(self):
    try:
        os.mkdir('model/saved_models')
    except OSError:
        pass
    try:
        os.mkdir('model/saved_graphs')
    except OSError:
        pass
    kf = KFold(n_splits=3)
    for train_index, test_index in kf.split(self.x_train):
        x_train, x_test = self.x_train[train_index], self.x_train[test_index]
        y_train, y_test = self.y_train[train_index], self.y_train[test_index]
        model = self.my_model()
        print(model.summary())
        trained_model = model.fit(x_train, y_train, epochs=self.epochs, steps_per_epoch=10, verbose=2)
        trained_model = trained_model.history
        print('Model evaluation', model.evaluate(x_test, y_test, verbose = 2))
    trained_model.save(f'model/saved_models/dummy_model_{date}')
    return trained_model
I am getting the following error:
trained_model.save(f'model/saved_models/dummy_model_{date}')
AttributeError: 'dict' object has no attribute 'save'
I cannot think of a way to get the trained model out of the for loop, and that may be the cause of this problem.
Can anybody suggest how to solve this issue? Or is there another way to build an ANN with KFold?
Thanks.
Yes, your code has a small mistake: you overwrite trained_model with the training history, which is a dictionary, and then call save on it. Call save on the model itself:
trained_model = trained_model.history  # training statistics; this is a plain dict
model.save(f'model/saved_models/dummy_model_{date}')  # this saves the actual model
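As for keeping a usable model from every fold, one option is to hold the model and its history in separate variables and give each fold its own save path. The sketch below uses only the standard library. `kfold_indices` is a hypothetical stand-in that mimics the contiguous, unshuffled splits that `KFold(n_splits=3)` produces by default, and the `_fold{n}` suffix in the path is an assumption, not something from the original code.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train_idx, test_idx) lists for contiguous, unshuffled folds.

    Hypothetical stand-in for the default behaviour of sklearn's KFold:
    the first n_samples % n_splits folds get one extra sample.
    """
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def fold_save_path(date, fold):
    # Assumed naming scheme: one path per fold so no fold overwrites another.
    return f"model/saved_models/dummy_model_{date}_fold{fold}"


folds = list(kfold_indices(n_samples=10, n_splits=3))
paths = [fold_save_path("2021-03-01", i) for i in range(len(folds))]

for i, (train_idx, test_idx) in enumerate(folds):
    # In the real loop: build the model, call model.fit(...) and store the
    # returned history in its own variable, then call model.save(paths[i]).
    print(i, test_idx, paths[i])
```

Inside the real loop, `model` stays a Keras model for the whole iteration, so `model.save(...)` works as long as the name is not reassigned to the history.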

Why am I getting an error in transfer learning?

I am training a model for optical character recognition of the Gujarati language. The input is a character image. I have 37 classes. There are 22200 training images (600 per class) and 5920 test images (160 per class). My input images are 32x32.
Below is my code:
model = tf.keras.applications.DenseNet121(include_top=False, weights='imagenet', pooling='max')
base_inputs = model.layers[0].input
base_outputs = model.layers[-1].output # NOTICE -1 not -2
prefinal_outputs = layers.Dense(1024)(base_outputs)
final_outputs = layers.Dense(37)(prefinal_outputs)
new_model = keras.Model(inputs=base_inputs, outputs=base_outputs)
from tensorflow.keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=False)
test_datagen = ImageDataGenerator(horizontal_flip = False)
training_set = train_datagen.flow_from_directory('C:/Users/shweta/Desktop/characters/train',
                                                 target_size = (32, 32),
                                                 batch_size = 64,
                                                 class_mode = 'categorical')
test_set = test_datagen.flow_from_directory('C:/Users/shweta/Desktop/characters/test',
                                            target_size = (32, 32),
                                            batch_size = 64,
                                            class_mode = 'categorical')
new_model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])
new_model.fit_generator(training_set,
                        epochs = 25,
                        validation_data = test_set, shuffle=True)
new_model.save('alphanumeric.mod')
I am getting the following output:
Thanks in advance!
First of all, very well-written code.
These are some of the things I noticed while going through the code and the tf.keras docs.
What kind of labels do you have? categorical_crossentropy expects one-hot encoded labels, so if your labels are integers, use sparse_categorical_crossentropy instead.
Similar issue:
There was a post where someone was trying to classify into 2 classes and used categorical instead of binary crossentropy, if you want to take a look.
Cheers
Let me know how it goes!
PS: @gerry made a very good point: if the labels are one-hot encoded, use categorical_crossentropy!
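To make the label-format point concrete, here is a minimal sketch in plain Python (no TensorFlow) of what the two losses compute for a single example. The function names mirror the Keras loss names but are hand-written here, and the probabilities are made-up values for illustration.

```python
import math


def categorical_crossentropy(one_hot, probs):
    """Cross-entropy for a one-hot label vector and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))


def sparse_categorical_crossentropy(class_index, probs):
    """Cross-entropy for an integer class label and predicted probabilities."""
    return -math.log(probs[class_index])


probs = [0.1, 0.7, 0.2]      # made-up softmax output for 3 classes
label_int = 1                # integer label
label_one_hot = [0, 1, 0]    # the same label, one-hot encoded

cce = categorical_crossentropy(label_one_hot, probs)
scce = sparse_categorical_crossentropy(label_int, probs)
print(cce, scce)
```

Both calls give the same value for the same underlying label; the only difference is the label format each one expects, which is why mixing them up causes shape errors or meaningless training.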
The code should be:
from tensorflow.keras.optimizers import Adam

model = tf.keras.applications.DenseNet121(include_top=False, weights='imagenet', pooling='max', input_shape=(32,32,3))
base_outputs = model.layers[-1].output
prefinal_outputs = layers.Dense(1024)(base_outputs)
final_outputs = layers.Dense(37)(prefinal_outputs)
# outputs must be final_outputs (the new 37-class head), not base_outputs
new_model = keras.Model(inputs=model.input, outputs=final_outputs)
new_model.compile(Adam(), loss='categorical_crossentropy', metrics=['accuracy'])
Also, you should use model.fit in the future. model.fit can now work with generators, and model.fit_generator will be deprecated in future versions of TensorFlow. I ran this against your dataset and got accurate results in about 10 epochs. Here is some additional advice. It is best to use an adjustable learning rate. The Keras callback ReduceLROnPlateau makes this easy to do. Set it to monitor the validation loss. My use is shown below.
lr_adjust = tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=1, verbose=1, mode="auto",
                                                 min_delta=0.00001, cooldown=0, min_lr=0)
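To get a feel for what these settings do, here is a simplified plain-Python sketch of the reduce-on-plateau rule. It is not the Keras implementation: cooldown handling is left out, and the function name and the validation losses are made up for illustration.

```python
def reduce_on_plateau(val_losses, lr, factor=0.5, patience=1,
                      min_delta=1e-5, min_lr=0.0):
    """Return the learning rate in effect after each epoch.

    Simplified sketch: the rate is multiplied by `factor` once the
    monitored loss has failed to improve by more than `min_delta`
    for `patience` consecutive epochs.
    """
    best = float("inf")
    wait = 0
    history = []
    for loss in val_losses:
        if loss < best - min_delta:
            best = loss        # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history


# Made-up validation losses: one bad epoch, then a plateau.
rates = reduce_on_plateau([1.0, 0.8, 0.9, 0.7, 0.7, 0.7], lr=0.001)
print(rates)
```

With patience=1, a single epoch without improvement is enough to halve the rate, so this setting reacts quickly; a larger patience tolerates noisier validation curves.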
Also, I recommend using the ModelCheckpoint callback. Set it up to monitor the validation loss, and it will save the weights that achieved the lowest validation loss. My implementation is shown below.
save_loc = r'c:\Temp'  # set this to the path where you want to save the weights
checkpoint = tf.keras.callbacks.ModelCheckpoint(filepath=save_loc, monitor='val_loss', verbose=1, save_best_only=True,
                                                save_weights_only=True, mode='auto', save_freq='epoch', options=None)
callbacks = [checkpoint, lr_adjust]
In model.fit include callbacks=callbacks. When training is completed you want to load these saved weights into the model, then save the model. You can use the saved model to make predictions. Code is below.
model.load_weights(save_loc)
model.save(save_loc)

TensorFlow TensorBoard not showing acc, loss, acc_val, and loss_val, only epoch_accuracy and epoch_loss

I am looking to have a TensorBoard display graphs corresponding to acc, loss, acc_val, and loss_val but they do not appear for some reason. Here is what I am seeing.
I am looking to have this:
I am following the instructions here to use TensorBoard in a Google Colab notebook.
This is the code used to generate the tensorboard:
opt = tf.keras.optimizers.Adam(lr=0.001, decay=1e-6)
tensorboard = TensorBoard(log_dir="logs/{}".format(NAME),
                          histogram_freq=1,
                          write_graph=True,
                          write_grads=True,
                          batch_size=BATCH_SIZE,
                          write_images=True)
model.compile(
    loss='sparse_categorical_crossentropy',
    optimizer=opt,
    metrics=['accuracy']
)
# Train model
history = model.fit(
    train_x, train_y,
    batch_size=BATCH_SIZE,
    epochs=EPOCHS,
    validation_data=(validation_x, validation_y),
    callbacks=[tensorboard]
)
How do I go about solving this issue? Any ideas? Your help is very much appreciated!
That's the intended behavior. If you want to log custom scalars such as a dynamic learning rate, you need to use the TensorFlow Summary API.
For example, to retrain a regression model and log a custom learning rate:
1. Create a file writer, using tf.summary.create_file_writer().
2. Define a custom learning rate function. This will be passed to the Keras LearningRateScheduler callback.
3. Inside the learning rate function, use tf.summary.scalar() to log the custom learning rate.
4. Pass the LearningRateScheduler callback to Model.fit().
In general, to log a custom scalar, you need to use tf.summary.scalar() with a file writer. The file writer is responsible for writing data for this run to the specified directory and is used implicitly when you call tf.summary.scalar().
from datetime import datetime

import tensorflow as tf
from tensorflow import keras

# x_train, y_train, x_test, y_test and train_size are assumed to be defined earlier.
logdir = "logs/scalars/" + datetime.now().strftime("%Y%m%d-%H%M%S")
file_writer = tf.summary.create_file_writer(logdir + "/metrics")
file_writer.set_as_default()

def lr_schedule(epoch):
    """
    Returns a custom learning rate that decreases as epochs progress.
    """
    learning_rate = 0.2
    if epoch > 10:
        learning_rate = 0.02
    if epoch > 20:
        learning_rate = 0.01
    if epoch > 50:
        learning_rate = 0.005
    # Log the rate so it shows up as a custom scalar in TensorBoard.
    tf.summary.scalar('learning rate', data=learning_rate, step=epoch)
    return learning_rate

lr_callback = keras.callbacks.LearningRateScheduler(lr_schedule)
tensorboard_callback = keras.callbacks.TensorBoard(log_dir=logdir)
model = keras.models.Sequential([
    keras.layers.Dense(16, input_dim=1),
    keras.layers.Dense(1),
])
model.compile(
    loss='mse',  # keras.losses.mean_squared_error
    optimizer=keras.optimizers.SGD(),
)
training_history = model.fit(
    x_train,  # input
    y_train,  # output
    batch_size=train_size,
    verbose=0,  # Suppress chatty output; use TensorBoard instead
    epochs=100,
    validation_data=(x_test, y_test),
    callbacks=[tensorboard_callback, lr_callback],
)
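Before training, it can help to check which epochs actually change the logged value. The sketch below repeats the thresholds of lr_schedule without the tf.summary call, so it runs without TensorFlow; the name `lr_schedule_values` is made up for this check.

```python
def lr_schedule_values(epoch):
    """Same thresholds as lr_schedule above, without the tf.summary call."""
    learning_rate = 0.2
    if epoch > 10:
        learning_rate = 0.02
    if epoch > 20:
        learning_rate = 0.01
    if epoch > 50:
        learning_rate = 0.005
    return learning_rate


# Epochs at which the rate differs from the previous epoch
# (100 epochs, matching the model.fit call above).
changes = [(e, lr_schedule_values(e)) for e in range(100)
           if e == 0 or lr_schedule_values(e) != lr_schedule_values(e - 1)]
print(changes)
```

Because the comparisons are strict (`>`), the rate changes at epochs 11, 21 and 51 rather than 10, 20 and 50, which is where the steps should appear in the TensorBoard scalar plot.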

tensorflow vs keras execution

I use this code to build a regression model
training_input_func = tf.estimator.inputs.pandas_input_fn(x=x_train,
                                                          y=y_train['Price'],
                                                          batch_size=256,
                                                          num_epochs=500,
                                                          shuffle=True)
regressor = tf.estimator.DNNRegressor(feature_columns=feature_cols,
                                      activation_fn=tf.nn.relu,
                                      hidden_units=[100, 50, 100],
                                      model_dir='model',
                                      optimizer=tf.train.GradientDescentOptimizer(learning_rate=0.01))
regressor.train(input_fn=training_input_func, steps=2000)
It takes 2-3 minutes to execute. But when I try this code using Keras:
epochs = 500
batch_size = 256
model_1 = keras.Sequential()
model_1.add(Dense(100, activation ="tanh"))
model_1.add(Dense(50, activation ="relu"))
model_1.add(Dense(y_train_array.shape[0]))
model_1.compile(loss='mean_squared_error', optimizer=Adam(), metrics=[metrics.mae])
model_1.fit(x_train_array, y_train_array,
            batch_size=batch_size,
            epochs=epochs,
            shuffle=True,
            verbose=2,  # 2 prints one line per epoch; use 0 to silence output
            validation_data=(x_validation_array, y_validation_array),
            callbacks=keras_callbacks,
            use_multiprocessing=True,
            workers=50)
It takes almost 3-4 hours for the first epoch. The training set has around 3M examples and the validation set around 30k. Is there a problem in my code? I know Keras takes more time compared to TensorFlow.
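Two things in the snippets above may be worth checking before concluding that Keras itself is slower. The arithmetic below is only a sketch: it takes "around 3M" as exactly 3,000,000 rows and assumes y_train_array has one row per training example, so that y_train_array.shape[0] is the number of examples. The helper `dense_params` and the single-output comparison are illustrative, not part of the original code.

```python
import math

n_train = 3_000_000      # "around 3M" training examples, per the question
batch_size = 256

# Estimator run: train(..., steps=2000) stops after 2000 batches.
estimator_steps = 2000
estimator_examples = estimator_steps * batch_size

# Keras run: each epoch iterates over the whole training set, 500 times.
keras_steps_per_epoch = math.ceil(n_train / batch_size)
keras_total_steps = keras_steps_per_epoch * 500


def dense_params(n_inputs, units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * units + units


# Output layer as written: Dense(y_train_array.shape[0]) on top of Dense(50).
params_as_written = dense_params(50, n_train)
# Hypothetical single-output regression head, for comparison.
params_single_output = dense_params(50, 1)

print(estimator_examples, keras_steps_per_epoch, keras_total_steps)
print(params_as_written, params_single_output)
```

Under these assumptions, a single Keras epoch already runs several times more batches than the whole estimator run, and the output layer as written has one unit per training row instead of one unit per predicted value, so the two timings are not measuring the same amount of work.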

AlreadyExistsError while training a network on colab

I'm trying to train an LSTM network on Google Colab. However, this error occurs:
AlreadyExistsError: Resource __per_step_116/training_4/Adam/gradients/bidirectional_4/while/ReadVariableOp/Enter_grad/ArithmeticOptimizer/AddOpsRewrite_Add/tmp_var/N10tensorflow19TemporaryVariableOp6TmpVarE
[[{{node training_4/Adam/gradients/bidirectional_4/while/ReadVariableOp/Enter_grad/ArithmeticOptimizer/AddOpsRewrite_Add/tmp_var}}]]
I don't know where the issue could be. This is the model of the network:
sl_model = keras.models.Sequential()
sl_model.add(keras.layers.Embedding(max_index+1, hidden_size, mask_zero=True))
sl_model.add(keras.layers.Bidirectional(keras.layers.LSTM(hidden_size,
    activation='tanh', dropout=0.2, recurrent_dropout=0.2, return_sequences=True)))
sl_model.add(keras.layers.Bidirectional(keras.layers.LSTM(hidden_size,
    activation='tanh', dropout=0.2, recurrent_dropout=0.2, return_sequences=False)))
sl_model.add(keras.layers.Dense(max_length, activation='softsign'))
optimizer = keras.optimizers.Adam()
sl_model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['acc'])
batch_size = 128
epochs = 3
cbk = keras.callbacks.TensorBoard("logging/keras_model")
print("\nStarting training...")
sl_model.fit(x_train, y_train, epochs=epochs, batch_size=batch_size,
             shuffle=True, validation_data=(x_dev, y_dev), callbacks=[cbk])
Thank you so much!
You need to restart your runtime. This happens when multiple graphs have been built in a single Jupyter (Colaboratory) runtime.
Calling tf.reset_default_graph() may also help, but depending on whether you are using eager execution and how you've defined your sessions, this may or may not work.