I'm new to tensorboard and learning the use of it following the tutorial, which goes well and tensorboard works as expected.
Referring to that tutorial, I wrote my own code to train a logic-and model with jupyter notebook
%load_ext tensorboard
import datetime
log_folder = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
import tensorflow as tf
import numpy as np
x_train = np.asarray([[0, 0],[0, 1],[1, 0],[1, 1]], np.float32)
y_train = np.asarray([0, 0, 0, 1], np.float32)
model = tf.keras.models.Sequential([
tf.keras.layers.Dense(1, activation=tf.nn.sigmoid)
])
def custom_loss(y,a):
return -(y*tf.math.log(a) + (1-y)*tf.math.log(1-a))
model.compile(loss=custom_loss,
optimizer='SGD',
metrics=['accuracy'])
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_folder, histogram_freq=1)
model.fit(x_train, y_train, epochs=2000, verbose=0,
callbacks=[tensorboard_callback])
the training goes well and needs some improvement.
However, tensorboard shows nothing
%tensorboard --logdir log_folder
where is the key to make tensorboard work?
You're just using the ipython magic wrong. You need to put a dollar sign in front of your variable name (see e.g. How to pass a variable to magic ´run´ function in IPython).
%tensorboard --logdir $log_folder
For further exploring, try pretending you're working in the future (at least as far as date goes), and add a cell like this
log_folder_future = "logs/fit/" + (datetime.datetime.now() - datetime.timedelta(days=1)).strftime("%Y%m%d-%H%M%S")
up_dir = './' + '/'.join(log_folder_future.split('/')[:-1])
model.compile(loss=custom_loss,
optimizer='SGD',
metrics=['accuracy'])
tensorboard_callback_future = tf.keras.callbacks.TensorBoard(log_folder_future, histogram_freq=1)
model.fit(x_train, y_train, epochs=2500, verbose=0,
callbacks=[tensorboard_callback_future])
and call like this
%tensorboard --logdir $up_dir
and end up with something like this
For more on tensorboard directory structures and multiple runs, see this page
https://github.com/tensorflow/tensorboard/blob/master/README.md#runs-comparing-different-executions-of-your-model
Related
I am new to Tensorflow and Keras. I just started beginning my Deep learning Journey. I installed Tensorflow 2.4.3 as well as Keras. I was learning Tensorboard. I created a model for imdb dataset as follows
import tensorflow as tf
import keras
from tensorflow.keras import *
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing import sequence
## model making
max_features = 2000
max_len = 500
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
x_train = sequence.pad_sequences(x_train, maxlen=max_len)
x_test = sequence.pad_sequences(x_test, maxlen=max_len)
model = models.Sequential()
model.add(layers.Embedding(max_features, 128,
input_length=max_len,
name='embed'))
model.add(layers.Conv1D(32, 7, activation='relu'))
model.add(layers.MaxPooling1D(5))
model.add(layers.Conv1D(32, 7, activation='relu'))
model.add(layers.GlobalMaxPooling1D())
model.add(layers.Dense(1))
model.summary()
model.compile(optimizer='rmsprop',
loss='binary_crossentropy',
metrics=['acc'])
I used the tensorboard callback here.
callbacks = [
keras.callbacks.TensorBoard(
log_dir='my_log_dir',
histogram_freq=1,
embeddings_freq=1,
)
]
history = model.fit(x_train, y_train,
epochs=3,
batch_size=128,
validation_split=0.2,
callbacks=callbacks)
Then I got the following warning.
C:\Users\ktripat\Anaconda3\envs\tf2\lib\site-packages\keras\callbacks\tensorboard_v2.py:102: UserWarning: The TensorBoard callback does not support embeddings display when using TensorFlow 2.0. Embeddings-related arguments are ignored.
warnings.warn('The TensorBoard callback does not support.'
Please find any solution if you guys have any. Thank you in advance!
You will need to follow this guide.
It describes how to save the weights of your embedding layer in a way that you can visualize it in TensorBoard:
# Set up a logs directory, so Tensorboard knows where to look for files.
log_dir='/logs/imdb-example/'
if not os.path.exists(log_dir):
os.makedirs(log_dir)
# Save Labels separately on a line-by-line manner.
with open(os.path.join(log_dir, 'metadata.tsv'), "w") as f:
for subwords in encoder.subwords:
f.write("{}\n".format(subwords))
# Fill in the rest of the labels with "unknown".
for unknown in range(1, encoder.vocab_size - len(encoder.subwords)):
f.write("unknown #{}\n".format(unknown))
# Save the weights we want to analyze as a variable. Note that the first
# value represents any unknown word, which is not in the metadata, here
# we will remove this value.
weights = tf.Variable(model.layers[0].get_weights()[0][1:])
# Create a checkpoint from embedding, the filename and key are the
# name of the tensor.
checkpoint = tf.train.Checkpoint(embedding=weights)
checkpoint.save(os.path.join(log_dir, "embedding.ckpt"))
# Set up config.
config = projector.ProjectorConfig()
embedding = config.embeddings.add()
# The name of the tensor will be suffixed by `/.ATTRIBUTES/VARIABLE_VALUE`.
embedding.tensor_name = "embedding/.ATTRIBUTES/VARIABLE_VALUE"
embedding.metadata_path = 'metadata.tsv'
projector.visualize_embeddings(log_dir, config)
If you want to visualize during training, you can call this code in a save callback during training every X episodes using this.
I'm using tensorboard in google colab, it's works fine if i want to track the epochs. However, i want to track the accuracy/loss by batch. I'm trying it using the getting started at documentation https://www.tensorflow.org/tensorboard/get_started but if i change the argument update_freq by update_freq="batch" it doesn't work. I have tried in my local pc and it works. Any idea of what is happening?
Using tensorboard 2.8.0 and tensorflow 2.8.0
Code (running in colab)
%load_ext tensorboard
import tensorflow as tf
import datetime
!rm -rf ./logs/
mnist = tf.keras.datasets.mnist
(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
def create_model():
return tf.keras.models.Sequential([
tf.keras.layers.Flatten(input_shape=(28, 28)),
tf.keras.layers.Dense(512, activation='relu'),
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Dense(10, activation='softmax')
])
model = create_model()
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
log_dir = "logs/fit_2/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, update_freq="batch")
model.fit(x=x_train,
y=y_train,
epochs=5,
validation_data=(x_test, y_test),
callbacks=[tensorboard_callback])
I've tried to use a integer and it doesn't work either. In my local computer i've no problems.
The change after TensorFlow 2.3 made the batch-level summaries part of the Model.train_function rather than something that the TensorBoard callback creates itself. This resulted in a 2x improvement in speed for many small models in Model.fit, but it does have the side effect that calling TensorBoard.on_train_batch_end(my_batch, my_metrics) in a custom training loop will no longer log batch-level metrics.
This issue was discussed in one of the GitHub issue.
There can be a workaround by creating a custom callback like LambdaCallback.
I have modified the last part of your code to explicitly add scalar values of batch_loss and batch_accuracy using tf.summary.scalar() to be shown in tensorboard logs.
The code module is as follows:
model = create_model()
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
from keras.callbacks import LambdaCallback
def batchOutput(batch, logs):
tf.summary.scalar('batch_loss', data=logs['loss'], step=batch)
tf.summary.scalar('batch_accuracy', data=logs['accuracy'], step=batch)
return batch
batchLogCallback = LambdaCallback(on_batch_end=batchOutput)
log_dir = "logs/fit_2/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir='./logs', update_freq='batch')
model.fit(x=x_train,
y=y_train,
epochs=1,
validation_data=(x_test, y_test),
callbacks=[tensorboard_callback, batchLogCallback])
I tried this in Colab as well it worked.
I'm using Talos to run hyperparameter tuning of a Keras model. Running this short code on Google colab TPU is very slow. I think it has something to do with the type of data. Should I convert it to tensors to make the TPU faster?
%tensorflow_version 2.x
import os
import tensorflow as tf
import talos as ta
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam
from sklearn.model_selection import train_test_split
def iris_model(x_train, y_train, x_val, y_val, params):
# Specify a distributed strategy to use TPU
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])
tf.config.experimental_connect_to_host(resolver.master())
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.experimental.TPUStrategy(resolver)
# Use the strategy to create and compile a Keras model
with strategy.scope():
model = Sequential()
model.add(Dense(32, input_shape=(4,), activation=tf.nn.relu, name="relu"))
model.add(Dense(3, activation=tf.nn.softmax, name="softmax"))
model.compile(optimizer=Adam(learning_rate=0.1), loss=params['losses'])
# Convert data type to use TPU
x_train = x_train.astype('float32')
x_val = x_val.astype('float32')
# Fit the Keras model on the dataset
out = model.fit(x_train, y_train, batch_size=params['batch_size'], epochs=params['epochs'], validation_data=[x_val, y_val], verbose=0, steps_per_epoch=0)
return out, model
# Load dataset
X, y = ta.templates.datasets.iris()
# Train and test set
x_train, x_val, y_train, y_val = train_test_split(X, y, test_size=0.30, shuffle=False)
# Create a hyperparameter distributions
p = {'losses': ['logcosh'], 'batch_size': [128, 256, 384, 512, 1024], 'epochs': [10, 20]}
# Use Talos to scan the best hyperparameters of the Keras model
scan_object = ta.Scan(x_train, y_train, params=p, model=iris_model, experiment_name='test', x_val=x_val, y_val=y_val, fraction_limit=0.5)
Thank you for your question.
Unfortunately, I was not able to get your code sample to run on TensorFlow 2.2, so I don't know what performance you were seeing originally. I was able to fix it up and get it running on TPUs with the following changes:
Replace tf.config.experimental_connect_to_host(resolver.master()) with tf.config.experimental_connect_to_cluster(resolver)
Move TPU initialization outside of iris_model().
Use tf.data.Dataset for TPU input.
Here's the modified Colab code:
# Run this to install Talos before running the rest of the code.
!pip install git+https://github.com/autonomio/talos#1.0
%tensorflow_version 2.x
import os
import tensorflow as tf
import talos as ta
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam
from sklearn.model_selection import train_test_split
print(tf.__version__) # TF 2.2.0 in my case
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
def iris_model(x_train, y_train, x_val, y_val, params):
# Use the strategy to create and compile a Keras model
strategy = tf.distribute.experimental.TPUStrategy(resolver)
with strategy.scope():
model = Sequential()
model.add(Dense(32, input_shape=(4,), activation=tf.nn.relu, name="relu"))
model.add(Dense(3, activation=tf.nn.softmax, name="softmax"))
model.compile(optimizer=Adam(learning_rate=0.1), loss=params['losses'])
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(params['batch_size'])
val_dataset = tf.data.Dataset.from_tensor_slices((x_val, y_val)).batch(params['batch_size'])
# Fit the Keras model on the dataset
out = model.fit(train_dataset, epochs=params['epochs'], validation_data=val_dataset)
return out, model
# Load dataset
X, y = ta.templates.datasets.iris()
# Train and test set
x_train, x_val, y_train, y_val = train_test_split(X, y, test_size=0.30, shuffle=False)
# Create a hyperparameter distributions
p = {'losses': ['logcosh'], 'batch_size': [128, 256, 384, 512, 1024], 'epochs': [10, 20]}
# Use Talos to scan the best hyperparameters of the Keras model
scan_object = ta.Scan(x_train, y_train, params=p, model=iris_model, experiment_name='test', x_val=x_val, y_val=y_val, fraction_limit=0.5)
For me, the last call took a little less than 2 minutes.
For well-known datasets, you can skip the step of creating your own tf.data.Dataset by using the TensorFlow Datasets library. TFDS does have the iris dataset in their library. For an end-to-end example of using TFDS with TPUs, see TensorFlow's official TPU guide.
I want to profile my Keras model according to this comment on github. I use the tf.Keras API with Tensorflow version: 1.9.0-rc2 and Keras version: 2.1.6-tf.
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()
training_set = load_datasets(...)
model.compile(loss=helpers.mean_categorical_crossentropy,optimizer='adam',options=run_options,run_metadata=run_metadata)
model.fit(training_set.make_one_shot_iterator(), steps_per_epoch=steps_per_epoch_train,epochs=num_epochs, verbose=2)
trace = timeline.Timeline(step_stats=run_metadata.step_stats)
with open('timeline.ctf.json', 'w') as f:
f.write(trace.generate_chrome_trace_format())
Error
('Some keys in session_kwargs are not supported at this time: %s',
dict_keys(['options', 'run_metadata']))
In another github post someone gives this example and somehow it runs without errors. I however get the same error as above.
import keras
from keras.layers.core import Dense
from keras.models import Sequential
import tensorflow as tf
from tensorflow.python.client import timeline
import numpy as np
x = np.random.randn(10000, 2)
y = (x[:, 0] * x[:, 1]) > 0 # xor
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()
model = Sequential()
model.add(Dense(units=64, activation='relu', input_dim=2))
model.add(Dense(units=2, activation='softmax'))
model.compile(loss='categorical_crossentropy',
optimizer='sgd',
options=run_options,
run_metadata=run_metadata)
model.fit(x, keras.utils.to_categorical(y), epochs=1)
trace = timeline.Timeline(step_stats=run_metadata.step_stats)
with open('timeline.ctf.json', 'w') as f:
f.write(trace.generate_chrome_trace_format())
I also found this issue on github which suggests that profiling with Keras models isn't implemented yet. I am confused.
Does anybody know how to fix it?
There's a pull request that fixes this: https://github.com/tensorflow/tensorflow/pull/19932
It's not merged yet to master, but I got it to work by merging it locally, or simply applying the changes manually to the installed tensorflow library
So I am using tensorboard within keras. In tensorflow one could use two different summarywriters for train and validation scalars so that tensorboard could plot them in a same figure. Something like the figure in
TensorBoard - Plot training and validation losses on the same graph?
Is there a way to do this in keras?
Thanks.
To handle the validation logs with a separate writer, you can write a custom callback that wraps around the original TensorBoard methods.
import os
import tensorflow as tf
from keras.callbacks import TensorBoard
class TrainValTensorBoard(TensorBoard):
def __init__(self, log_dir='./logs', **kwargs):
# Make the original `TensorBoard` log to a subdirectory 'training'
training_log_dir = os.path.join(log_dir, 'training')
super(TrainValTensorBoard, self).__init__(training_log_dir, **kwargs)
# Log the validation metrics to a separate subdirectory
self.val_log_dir = os.path.join(log_dir, 'validation')
def set_model(self, model):
# Setup writer for validation metrics
self.val_writer = tf.summary.FileWriter(self.val_log_dir)
super(TrainValTensorBoard, self).set_model(model)
def on_epoch_end(self, epoch, logs=None):
# Pop the validation logs and handle them separately with
# `self.val_writer`. Also rename the keys so that they can
# be plotted on the same figure with the training metrics
logs = logs or {}
val_logs = {k.replace('val_', ''): v for k, v in logs.items() if k.startswith('val_')}
for name, value in val_logs.items():
summary = tf.Summary()
summary_value = summary.value.add()
summary_value.simple_value = value.item()
summary_value.tag = name
self.val_writer.add_summary(summary, epoch)
self.val_writer.flush()
# Pass the remaining logs to `TensorBoard.on_epoch_end`
logs = {k: v for k, v in logs.items() if not k.startswith('val_')}
super(TrainValTensorBoard, self).on_epoch_end(epoch, logs)
def on_train_end(self, logs=None):
super(TrainValTensorBoard, self).on_train_end(logs)
self.val_writer.close()
In __init__, two subdirectories are set up for training and validation logs
In set_model, a writer self.val_writer is created for the validation logs
In on_epoch_end, the validation logs are separated from the training logs and written to file with self.val_writer
Using the MNIST dataset as an example:
from keras.models import Sequential
from keras.layers import Dense
from keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(784,)))
model.add(Dense(10, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10,
validation_data=(x_test, y_test),
callbacks=[TrainValTensorBoard(write_graph=False)])
You can then visualize the two curves on a same figure in TensorBoard.
EDIT: I've modified the class a bit so that it can be used with eager execution.
The biggest change is that I use tf.keras in the following code. It seems that the TensorBoard callback in standalone Keras does not support eager mode yet.
import os
import tensorflow as tf
from tensorflow.keras.callbacks import TensorBoard
from tensorflow.python.eager import context
class TrainValTensorBoard(TensorBoard):
def __init__(self, log_dir='./logs', **kwargs):
self.val_log_dir = os.path.join(log_dir, 'validation')
training_log_dir = os.path.join(log_dir, 'training')
super(TrainValTensorBoard, self).__init__(training_log_dir, **kwargs)
def set_model(self, model):
if context.executing_eagerly():
self.val_writer = tf.contrib.summary.create_file_writer(self.val_log_dir)
else:
self.val_writer = tf.summary.FileWriter(self.val_log_dir)
super(TrainValTensorBoard, self).set_model(model)
def _write_custom_summaries(self, step, logs=None):
logs = logs or {}
val_logs = {k.replace('val_', ''): v for k, v in logs.items() if 'val_' in k}
if context.executing_eagerly():
with self.val_writer.as_default(), tf.contrib.summary.always_record_summaries():
for name, value in val_logs.items():
tf.contrib.summary.scalar(name, value.item(), step=step)
else:
for name, value in val_logs.items():
summary = tf.Summary()
summary_value = summary.value.add()
summary_value.simple_value = value.item()
summary_value.tag = name
self.val_writer.add_summary(summary, step)
self.val_writer.flush()
logs = {k: v for k, v in logs.items() if not 'val_' in k}
super(TrainValTensorBoard, self)._write_custom_summaries(step, logs)
def on_train_end(self, logs=None):
super(TrainValTensorBoard, self).on_train_end(logs)
self.val_writer.close()
The idea is the same --
Check the source code of TensorBoard callback
See what it does to set up the writer
Do the same thing in this custom callback
Again, you can use the MNIST data to test it,
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.train import AdamOptimizer
tf.enable_eager_execution()
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
y_train = y_train.astype(int)
y_test = y_test.astype(int)
model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(784,)))
model.add(Dense(10, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer=AdamOptimizer(), metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10,
validation_data=(x_test, y_test),
callbacks=[TrainValTensorBoard(write_graph=False)])
If you are using TensorFlow 2.0, you now get this by default using the Keras TensorBoard callback. (When using TensorFlow with Keras, make sure you're using tensorflow.keras.)
See this tutorial :
https://www.tensorflow.org/tensorboard/scalars_and_keras