Tensorboard not updating by batch in google colab - tensorflow

I'm using TensorBoard in Google Colab, and it works fine if I want to track the epochs. However, I want to track the accuracy/loss per batch. I'm following the getting-started documentation at https://www.tensorflow.org/tensorboard/get_started, but if I change the argument update_freq to update_freq="batch", it doesn't work. I have tried it on my local PC and there it works. Any idea what is happening?
I'm using TensorBoard 2.8.0 and TensorFlow 2.8.0.
Code (running in Colab):
%load_ext tensorboard
import tensorflow as tf
import datetime
!rm -rf ./logs/
mnist = tf.keras.datasets.mnist
(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
def create_model():
    return tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation='relu'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation='softmax')
    ])
model = create_model()
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
log_dir = "logs/fit_2/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, update_freq="batch")
model.fit(x=x_train,
          y=y_train,
          epochs=5,
          validation_data=(x_test, y_test),
          callbacks=[tensorboard_callback])
I've tried using an integer instead and it doesn't work either. On my local computer I have no problems.

Since TensorFlow 2.3, batch-level summaries are part of Model.train_function rather than something the TensorBoard callback creates itself. This resulted in a 2x speedup for many small models in Model.fit, but it has the side effect that calling TensorBoard.on_train_batch_end(my_batch, my_metrics) in a custom training loop no longer logs batch-level metrics.
This was discussed in a GitHub issue.
A workaround is to create a custom callback, for example with LambdaCallback.
I have modified the last part of your code to explicitly log scalar values of batch_loss and batch_accuracy using tf.summary.scalar() so that they show up in the TensorBoard logs.
The modified code is as follows:
model = create_model()
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

from keras.callbacks import LambdaCallback

def batchOutput(batch, logs):
    tf.summary.scalar('batch_loss', data=logs['loss'], step=batch)
    tf.summary.scalar('batch_accuracy', data=logs['accuracy'], step=batch)
    return batch

batchLogCallback = LambdaCallback(on_batch_end=batchOutput)

log_dir = "logs/fit_2/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, update_freq='batch')

model.fit(x=x_train,
          y=y_train,
          epochs=1,
          validation_data=(x_test, y_test),
          callbacks=[tensorboard_callback, batchLogCallback])
I tried this in Colab as well, and it worked.
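Note that the tf.summary.scalar() calls above rely on a default summary writer being active; while Model.fit() is running, the TensorBoard callback provides one. If you want the batch logging to work independently of that callback, you can create and set a file writer explicitly. A minimal sketch, reusing the log_dir above (the "/batch" subdirectory name is just an example):

batch_writer = tf.summary.create_file_writer(log_dir + "/batch")
batch_writer.set_as_default()

def batch_output(batch, logs):
    # logs holds the metrics Keras computed for this batch
    tf.summary.scalar('batch_loss', data=logs['loss'], step=batch)
    tf.summary.scalar('batch_accuracy', data=logs['accuracy'], step=batch)

batch_log_callback = tf.keras.callbacks.LambdaCallback(on_batch_end=batch_output)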

Related

Tensorboard callback doesn't work when calling

I am new to TensorFlow and Keras and have just started my deep learning journey. I installed TensorFlow 2.4.3 along with Keras and was learning TensorBoard. I created a model for the IMDB dataset as follows:
import tensorflow as tf
import keras
from tensorflow.keras import *
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing import sequence
## model making
max_features = 2000
max_len = 500
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
x_train = sequence.pad_sequences(x_train, maxlen=max_len)
x_test = sequence.pad_sequences(x_test, maxlen=max_len)
model = models.Sequential()
model.add(layers.Embedding(max_features, 128,
                           input_length=max_len,
                           name='embed'))
model.add(layers.Conv1D(32, 7, activation='relu'))
model.add(layers.MaxPooling1D(5))
model.add(layers.Conv1D(32, 7, activation='relu'))
model.add(layers.GlobalMaxPooling1D())
model.add(layers.Dense(1))
model.summary()
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['acc'])
I used the TensorBoard callback here:
callbacks = [
    keras.callbacks.TensorBoard(
        log_dir='my_log_dir',
        histogram_freq=1,
        embeddings_freq=1,
    )
]
history = model.fit(x_train, y_train,
                    epochs=3,
                    batch_size=128,
                    validation_split=0.2,
                    callbacks=callbacks)
Then I got the following warning:
C:\Users\ktripat\Anaconda3\envs\tf2\lib\site-packages\keras\callbacks\tensorboard_v2.py:102: UserWarning: The TensorBoard callback does not support embeddings display when using TensorFlow 2.0. Embeddings-related arguments are ignored.
warnings.warn('The TensorBoard callback does not support.'
Please share a solution if you have one. Thank you in advance!
You will need to follow this guide. It describes how to save the weights of your embedding layer in a way that lets you visualize them in TensorBoard:
import os
import tensorflow as tf
from tensorboard.plugins import projector

# Set up a logs directory, so TensorBoard knows where to look for files.
log_dir = '/logs/imdb-example/'
if not os.path.exists(log_dir):
    os.makedirs(log_dir)

# Save labels separately in a line-by-line manner.
# (`encoder` here is the dataset's subword text encoder used in the guide.)
with open(os.path.join(log_dir, 'metadata.tsv'), "w") as f:
    for subwords in encoder.subwords:
        f.write("{}\n".format(subwords))
    # Fill in the rest of the labels with "unknown".
    for unknown in range(1, encoder.vocab_size - len(encoder.subwords)):
        f.write("unknown #{}\n".format(unknown))

# Save the weights we want to analyze as a variable. Note that the first
# value represents any unknown word, which is not in the metadata, so
# we remove that value here.
weights = tf.Variable(model.layers[0].get_weights()[0][1:])

# Create a checkpoint from the embedding; the filename and key are the
# name of the tensor.
checkpoint = tf.train.Checkpoint(embedding=weights)
checkpoint.save(os.path.join(log_dir, "embedding.ckpt"))

# Set up config.
config = projector.ProjectorConfig()
embedding = config.embeddings.add()
# The name of the tensor will be suffixed by `/.ATTRIBUTES/VARIABLE_VALUE`.
embedding.tensor_name = "embedding/.ATTRIBUTES/VARIABLE_VALUE"
embedding.metadata_path = 'metadata.tsv'
projector.visualize_embeddings(log_dir, config)
If you want to visualize the embeddings during training, you can run this code from a callback every X epochs, as sketched below.
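A minimal sketch of such a callback (the class name EmbeddingProjectorCallback and the save_every parameter are my own; the body simply repeats the checkpointing code from above):

import os
import tensorflow as tf

class EmbeddingProjectorCallback(tf.keras.callbacks.Callback):
    """Re-save the embedding checkpoint every `save_every` epochs."""

    def __init__(self, log_dir, save_every=5):
        super().__init__()
        self.log_dir = log_dir
        self.save_every = save_every

    def on_epoch_end(self, epoch, logs=None):
        if (epoch + 1) % self.save_every != 0:
            return
        # Same as above: checkpoint the embedding weights (dropping the
        # first, unknown-word row) under the key the projector config expects.
        weights = tf.Variable(self.model.layers[0].get_weights()[0][1:])
        checkpoint = tf.train.Checkpoint(embedding=weights)
        checkpoint.save(os.path.join(self.log_dir, "embedding.ckpt"))

Pass an instance of it to model.fit(..., callbacks=[...]) alongside your TensorBoard callback.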

Where is the key to make tensorboard work?

I'm new to TensorBoard and am learning to use it by following the tutorial, which goes well; TensorBoard works as expected there.
Referring to that tutorial, I wrote my own code to train a logical-AND model in a Jupyter notebook:
%load_ext tensorboard
import datetime
log_folder = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
import tensorflow as tf
import numpy as np
x_train = np.asarray([[0, 0],[0, 1],[1, 0],[1, 1]], np.float32)
y_train = np.asarray([0, 0, 0, 1], np.float32)
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(1, activation=tf.nn.sigmoid)
])
def custom_loss(y, a):
    return -(y*tf.math.log(a) + (1-y)*tf.math.log(1-a))

model.compile(loss=custom_loss,
              optimizer='SGD',
              metrics=['accuracy'])
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_folder, histogram_freq=1)
model.fit(x_train, y_train, epochs=2000, verbose=0,
          callbacks=[tensorboard_callback])
The training goes well (the model still needs some improvement).
However, TensorBoard shows nothing:
%tensorboard --logdir log_folder
Where is the key to make TensorBoard work?
You're just using the IPython magic wrong. You need to put a dollar sign in front of your variable name (see e.g. How to pass a variable to magic `run` function in IPython):
%tensorboard --logdir $log_folder
For further exploration, try pretending you're working on a different day (at least as far as the date goes), and add a cell like this:
log_folder_future = "logs/fit/" + (datetime.datetime.now() - datetime.timedelta(days=1)).strftime("%Y%m%d-%H%M%S")
up_dir = './' + '/'.join(log_folder_future.split('/')[:-1])

model.compile(loss=custom_loss,
              optimizer='SGD',
              metrics=['accuracy'])
tensorboard_callback_future = tf.keras.callbacks.TensorBoard(log_folder_future, histogram_freq=1)
model.fit(x_train, y_train, epochs=2500, verbose=0,
          callbacks=[tensorboard_callback_future])
and call it like this:
%tensorboard --logdir $up_dir
and end up with something like this
For more on TensorBoard directory structures and multiple runs, see https://github.com/tensorflow/tensorboard/blob/master/README.md#runs-comparing-different-executions-of-your-model

Keras ValueError trying to load model

I am using Anaconda Navigator, Jupyter to be precise.
import tensorflow as tf
from tensorflow import keras
print(tf.__version__)
>>> 1.14.0
This is my model:
def create_model():
    model = tf.keras.Sequential([
        keras.layers.Dense(86, activation='relu', kernel_regularizer=keras.regularizers.l2(0.0001), input_shape=(129,)),
        keras.layers.Dropout(0.2),
        keras.layers.Dense(142, activation='relu', kernel_regularizer=keras.regularizers.l2(0.0001)),
        keras.layers.Dropout(0.2),
        keras.layers.Dense(4, activation='softmax')
    ])
    return model
model = create_model()
# Display the model's architecture
model.summary()
After training, predicting, and evaluating my model, I decided to save it using
model.save('/Users/Jennifer/myproject/my_model.h5')
I checked the directory and the folder containing the h5 file, and then decided to load it using
new_model1 = tf.keras.models.load_model('/Users/Jennifer/myproject/my_model.h5')
I got an error:
ValueError: Unknown entries in loss dictionary: ['class_name', 'config']. Only expected following keys: ['dense_17']
Please help me. What should I do? I have spent almost the whole day trying to solve this issue. Thanks.
Here is a bit of a workaround that just loads the weights:
#!/usr/bin/env python3
from tensorflow import keras
import os

def create_model():
    model = keras.Sequential([
        keras.layers.Dense(86, activation='relu', kernel_regularizer=keras.regularizers.l2(0.0001), input_shape=(129,)),
        keras.layers.Dropout(0.2),
        keras.layers.Dense(142, activation='relu', kernel_regularizer=keras.regularizers.l2(0.0001)),
        keras.layers.Dropout(0.2),
        keras.layers.Dense(4, activation='softmax')
    ])
    return model

if os.path.exists("junk.h5"):
    model = create_model()
    model.load_weights("junk.h5")
else:
    model = create_model()
    model.compile(optimizer=keras.optimizers.Adam(0.0001), loss=keras.losses.CategoricalCrossentropy(from_logits=True), metrics=['accuracy'])
    model.save("junk.h5")
Another workaround would be to save the model without the optimizer:
model.save("junk.h5", include_optimizer=False)
It looks like the loss function you're using creates a dictionary with invalid keys. This sounds like a bug in Keras/TensorFlow. That is probably why it worked in Colab, which was using a newer version.
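If upgrading is not an option, one more thing you could try (an assumption on my part, not verified against your exact file) is to load the model without compiling it, so the stored loss/optimizer configuration is never parsed, and then recompile it yourself:

import tensorflow as tf

# Skip deserializing the saved loss/optimizer, then recompile manually
# (use whatever loss/optimizer you originally trained with).
new_model1 = tf.keras.models.load_model('/Users/Jennifer/myproject/my_model.h5',
                                        compile=False)
new_model1.compile(optimizer='adam',
                   loss='sparse_categorical_crossentropy',
                   metrics=['accuracy'])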

TensorFlow TensorBoard not showing acc, loss, acc_val, and loss_val, only epoch_accuracy and epoch_loss

I would like TensorBoard to display graphs corresponding to acc, loss, acc_val, and loss_val, but they do not appear for some reason. Here is what I am seeing.
I am looking to have this:
I am following the instructions here to be able to use TensorBoard in a Google Colab notebook.
This is the code used to generate the TensorBoard logs:
opt = tf.keras.optimizers.Adam(lr=0.001, decay=1e-6)

tensorboard = TensorBoard(log_dir="logs/{}".format(NAME),
                          histogram_freq=1,
                          write_graph=True,
                          write_grads=True,
                          batch_size=BATCH_SIZE,
                          write_images=True)
model.compile(
    loss='sparse_categorical_crossentropy',
    optimizer=opt,
    metrics=['accuracy']
)

# Train model
history = model.fit(
    train_x, train_y,
    batch_size=BATCH_SIZE,
    epochs=EPOCHS,
    validation_data=(validation_x, validation_y),
    callbacks=[tensorboard]
)
How do I go about solving this issue? Any ideas? Your help is very much appreciated!
That's the intended behavior. If you want to log custom scalars such as a dynamic learning rate, you need to use the TensorFlow Summary API.
Retrain the regression model and log a custom learning rate. Here's how:
1. Create a file writer, using tf.summary.create_file_writer().
2. Define a custom learning rate function. This will be passed to the Keras LearningRateScheduler callback.
3. Inside the learning rate function, use tf.summary.scalar() to log the custom learning rate.
4. Pass the LearningRateScheduler callback to Model.fit().
In general, to log a custom scalar, you need to use tf.summary.scalar() with a file writer. The file writer is responsible for writing data for this run to the specified directory and is implicitly used when you call tf.summary.scalar().
logdir = "logs/scalars/" + datetime.now().strftime("%Y%m%d-%H%M%S")
file_writer = tf.summary.create_file_writer(logdir + "/metrics")
file_writer.set_as_default()
def lr_schedule(epoch):
"""
Returns a custom learning rate that decreases as epochs progress.
"""
learning_rate = 0.2
if epoch > 10:
learning_rate = 0.02
if epoch > 20:
learning_rate = 0.01
if epoch > 50:
learning_rate = 0.005
tf.summary.scalar('learning rate', data=learning_rate, step=epoch)
return learning_rate
lr_callback = keras.callbacks.LearningRateScheduler(lr_schedule)
tensorboard_callback = keras.callbacks.TensorBoard(log_dir=logdir)
model = keras.models.Sequential([
keras.layers.Dense(16, input_dim=1),
keras.layers.Dense(1),
])
model.compile(
loss='mse', # keras.losses.mean_squared_error
optimizer=keras.optimizers.SGD(),
)
training_history = model.fit(
x_train, # input
y_train, # output
batch_size=train_size,
verbose=0, # Suppress chatty output; use Tensorboard instead
epochs=100,
validation_data=(x_test, y_test),
callbacks=[tensorboard_callback, lr_callback],
)
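With the %load_ext tensorboard extension loaded (as in the first question above), you can then view the custom 'learning rate' scalar alongside epoch_loss by pointing TensorBoard at the parent log directory:

%tensorboard --logdir logs/scalars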

AlreadyExistsError while training a network on colab

I'm trying to train an LSTM network on Google Colab. However, this error occurs:
AlreadyExistsError: Resource __per_step_116/training_4/Adam/gradients/bidirectional_4/while/ReadVariableOp/Enter_grad/ArithmeticOptimizer/AddOpsRewrite_Add/tmp_var/N10tensorflow19TemporaryVariableOp6TmpVarE
[[{{node training_4/Adam/gradients/bidirectional_4/while/ReadVariableOp/Enter_grad/ArithmeticOptimizer/AddOpsRewrite_Add/tmp_var}}]]
I don't know where the issue could be. This is the model of the network:
sl_model = keras.models.Sequential()
sl_model.add(keras.layers.Embedding(max_index + 1, hidden_size, mask_zero=True))
sl_model.add(keras.layers.Bidirectional(keras.layers.LSTM(hidden_size,
             activation='tanh', dropout=0.2, recurrent_dropout=0.2, return_sequences=True)))
sl_model.add(keras.layers.Bidirectional(keras.layers.LSTM(hidden_size,
             activation='tanh', dropout=0.2, recurrent_dropout=0.2, return_sequences=False)))
sl_model.add(keras.layers.Dense(max_length, activation='softsign'))

optimizer = keras.optimizers.Adam()
sl_model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['acc'])

batch_size = 128
epochs = 3

cbk = keras.callbacks.TensorBoard("logging/keras_model")
print("\nStarting training...")
sl_model.fit(x_train, y_train, epochs=epochs, batch_size=batch_size,
             shuffle=True, validation_data=(x_dev, y_dev), callbacks=[cbk])
Thank you so much!
You need to restart your runtime; this happens when you have built multiple graphs in a single Jupyter (Colaboratory) runtime.
Calling tf.reset_default_graph() may also help, but depending on whether you are using eager execution and how you've defined your sessions, this may or may not work.
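A minimal sketch of what that could look like in a cell before you rebuild the model (tf.reset_default_graph() only exists in TF 1.x; on TF 2.x the tf.compat.v1 form or clearing the Keras session is the equivalent):

import tensorflow as tf
from tensorflow import keras

# Drop any state left over from previously built models.
keras.backend.clear_session()
try:
    tf.reset_default_graph()            # TF 1.x
except AttributeError:
    tf.compat.v1.reset_default_graph()  # TF 2.x compatibility path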