How to write graph to tensorboard using tensorflow 2.0?

I am doing this:
# eager on
tf.summary.trace_on(graph=True, profiler=True)
tf.summary.trace_export('stuff', step=1, profiler_outdir='output')
# ... call train operation
tf.summary.trace_off()
The Profile section shows up in TensorBoard, but no graph yet.

Please find the GitHub gist here, where I have created a graph using TF 2.0 and visualized it in TensorBoard. For more information, please go through the following link.
The code is below:
!pip install tensorflow==2.0.0-beta1
import tensorflow as tf

# The function to be traced.
@tf.function
def my_func(x, y):
    # A simple hand-rolled layer.
    return tf.nn.relu(tf.matmul(x, y))

# Set up logging.
logdir = './logs/func'
writer = tf.summary.create_file_writer(logdir)

# Sample data for your function.
x = tf.random.uniform((3, 3))
y = tf.random.uniform((3, 3))

# Bracket the function call with
# tf.summary.trace_on() and tf.summary.trace_export().
tf.summary.trace_on(graph=True, profiler=True)
# Call only one tf.function when tracing.
z = my_func(x, y)
with writer.as_default():
    tf.summary.trace_export(
        name="my_func_trace",
        step=0,
        profiler_outdir=logdir)

%load_ext tensorboard
%tensorboard --logdir ./logs/func


How can I convert Tensor to EagerTensor

I got a tensor from Model.pred(); its class is <tf.python.framework.ops.Tensor> (not eager), but I can't use it in my custom loss function. So I tried to convert 'that Tensor' to <tf.python.framework.ops.EagerTensor>, because if I convert it I can use .numpy() for a calculation in the loss function.
Is there a way to convert it?
Or can I get a numpy value from <... ops.Tensor>?
I'm using TensorFlow 2.3.0.
You can either:
Force eager execution with tf.config.run_functions_eagerly(True) or tf.compat.v1.enable_eager_execution() at the start of your code.
Or use a session (documentation here) and call .eval() on your Tensor instead of .numpy().
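A minimal sketch of the first possibility, forcing eager execution (the loss function below is illustrative, not from the question):
import tensorflow as tf

# Run tf.function-compiled code (including Keras's train step) eagerly,
# so tensors inside a custom loss are EagerTensors and expose .numpy().
tf.config.run_functions_eagerly(True)

def custom_loss(y_true, y_pred):
    # With eager execution forced, .numpy() works here.
    diff = y_true.numpy() - y_pred.numpy()
    return tf.reduce_mean(tf.square(diff))

# Quick standalone check:
print(custom_loss(tf.constant([1.0, 2.0]), tf.constant([1.5, 2.5])).numpy())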
Example code of the second possibility:
import tensorflow as tf
tf.compat.v1.disable_eager_execution()

# Build a graph.
a = tf.constant(5.0)
b = tf.constant(6.0)
c = a * b

# Launch the graph in a session.
sess = tf.compat.v1.Session()
with sess.as_default():
    print(c.eval())
sess.close()

Hparams plugin with tf.keras (tensorflow 2.0)

I am trying to follow the example from the TensorFlow docs and set up hyperparameter logging. The docs also mention that, if you use tf.keras, you can just use the callback hp.KerasCallback(logdir, hparams). However, if I use the callback I don't get my metrics (only the outcome).
The trick is to define the Hparams config with the path in which TensorBoard saves its validation logs.
So, if your TensorBoard callback is set up as:
log_dir = 'path/to/training-logs'
tensorboard_cb = TensorBoard(log_dir=log_dir)
Then you should set up Hparams like this:
hparams_dir = os.path.join(log_dir, 'validation')
with tf.summary.create_file_writer(hparams_dir).as_default():
    hp.hparams_config(
        hparams=HPARAMS,
        metrics=[hp.Metric('epoch_accuracy')]  # metric saved by tensorboard_cb
    )
hparams_cb = hp.KerasCallback(
    writer=hparams_dir,
    hparams=HPARAMS
)
I got it working, but I'm not entirely sure which part was the magic word. Here is my flow in case it helps.
HP_NUM_LATENT = hp.HParam('num_latent_dim', hp.Discrete([2, 5, 100]))
hparams = {
    HP_NUM_LATENT: num_latent,
}
callbacks.append(hp.KerasCallback(log_dir, hparams))
model = create_simple_model(latent_dim=hparams[HP_NUM_LATENT])  # returns compiled model
model.fit(x, y, validation_data=validation_data,
          epochs=4,
          verbose=2,
          callbacks=callbacks)
Since I lost a couple of hours because of this, I would like to add to Julian's good remark about defining the hparams config: the tag of the metric you want to log with hparams, and possibly its group in hp.Metric(tag='epoch_accuracy', group='validation'), should match one of the metrics that you capture with Keras model.fit(..., metrics=). See hparams_demo for a good example.
I just want to add to the previous answers. If you are using TensorBoard in a notebook on Colab, the issue may not be due to your code but to how TensorBoard is run on Colab. The solution is to kill the existing TensorBoard and launch it again.
Please correct me if I am wrong.
Sample code:
import os
import random

import tensorflow as tf
from tensorboard.plugins.hparams import api as hp

HP_LR = hp.HParam('learning_rate', hp.Discrete([1e-4, 5e-4, 1e-3]))
HPARAMS = [HP_LR]
# this METRICS does not seem to have any effect in my example, as
# hp uses epoch_accuracy and epoch_loss for both training and validation anyway.
METRICS = [hp.Metric('epoch_accuracy', group="validation", display_name='val_accuracy')]

# save the configuration
log_dir = '/content/logs/hparam_tuning'
with tf.summary.create_file_writer(log_dir).as_default():
    hp.hparams_config(hparams=HPARAMS, metrics=METRICS)

def fitness_func(hparams, seed):
    rng = random.Random(seed)
    # here we build the model
    model = tf.keras.Sequential(...)
    model.compile(..., metrics=['accuracy'])  # need to pass the metric of interest
    # set up callbacks
    _log_dir = os.path.join(log_dir, seed)
    tb_callbacks = tf.keras.callbacks.TensorBoard(_log_dir)  # log metrics
    hp_callbacks = hp.KerasCallback(_log_dir, hparams)  # log hparams
    # fit the model
    history = model.fit(
        ..., validation_data=(x_te, y_te), callbacks=[tb_callbacks, hp_callbacks])

rng = random.Random(0)
session_index = 0
# random search
num_session_groups = 4
sessions_per_group = 2
for group_index in range(num_session_groups):
    hparams = {h: h.domain.sample_uniform(rng) for h in HPARAMS}
    hparams_string = str(hparams)
    for repeat_index in range(sessions_per_group):
        session_id = str(session_index)
        session_index += 1
        fitness_func(hparams, session_id)
To check if there is any existing TensorBoard process, run the following in Colab:
!ps ax | grep tensorboard
Assume the PID of the TensorBoard process is 5315. Then run
!kill 5315
and run
# of course, replace the dir below with your log_dir
%tensorboard --logdir='/content/logs/hparam_tuning'
In my case, after I reset TensorBoard as above, it properly logs the metrics specified in model.compile, i.e., the accuracies.

Can't import frozen graph with BatchNorm layer

I have trained a Keras model based on this repo.
After the training I save the model as checkpoint files like this:
sess=tf.keras.backend.get_session()
saver = tf.train.Saver()
saver.save(sess, current_run_path + '/checkpoint_files/model_{}.ckpt'.format(date))
Then I restore the graph from the checkpoint files and freeze it using the standard tf freeze_graph script. When I want to restore the frozen graph I get the following error:
Input 0 of node Conv_BN_1/cond/ReadVariableOp/Switch was passed float from Conv_BN_1/gamma:0 incompatible with expected resource
How can I fix this issue?
Edit: My problem is related to this question. Unfortunately, I can't use the workaround.
Edit 2:
I have opened an issue on github and created a gist to reproduce the error.
https://github.com/keras-team/keras/issues/11032
I just resolved the same issue. I combined a few answers (1, 2, 3) and realized that the issue originated from the batch norm layer's working state: training or inference. So, in order to resolve the issue, you just need to place one line before loading your model:
keras.backend.set_learning_phase(0)
Complete example, to export the model:
import tensorflow as tf
from tensorflow.python.framework import graph_io
from tensorflow.keras.applications.inception_v3 import InceptionV3

def freeze_graph(graph, session, output):
    with graph.as_default():
        graphdef_inf = tf.graph_util.remove_training_nodes(graph.as_graph_def())
        graphdef_frozen = tf.graph_util.convert_variables_to_constants(session, graphdef_inf, output)
        graph_io.write_graph(graphdef_frozen, ".", "frozen_model.pb", as_text=False)

tf.keras.backend.set_learning_phase(0)  # this line is most important
base_model = InceptionV3()

session = tf.keras.backend.get_session()

INPUT_NODE = base_model.inputs[0].op.name
OUTPUT_NODE = base_model.outputs[0].op.name
freeze_graph(session.graph, session, [out.op.name for out in base_model.outputs])
To load the *.pb model:
from PIL import Image
import numpy as np
import tensorflow as tf

# https://i.imgur.com/tvOB18o.jpg
im = Image.open("/home/chichivica/Pictures/eagle.jpg").resize((299, 299), Image.BICUBIC)
im = np.array(im) / 255.0
im = im[None, ...]

graph_def = tf.GraphDef()
with tf.gfile.GFile("frozen_model.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

graph = tf.Graph()
with graph.as_default():
    net_inp, net_out = tf.import_graph_def(
        graph_def, return_elements=["input_1", "predictions/Softmax"]
    )
with tf.Session(graph=graph) as sess:
    out = sess.run(net_out.outputs[0], feed_dict={net_inp.outputs[0]: im})
    print(np.argmax(out))
This is a bug with TensorFlow 1.1x and, as another answer stated, it is because of the internal batch norm learning vs. inference state. In TF 1.14.0 you actually get a cryptic error when trying to freeze a batch norm layer.
Using set_learning_phase(0) will put the batch norm layer (and probably others, like dropout) into inference mode, so the batch norm layer will not work during training, leading to reduced accuracy.
My solution is this:
Create the model using a function (do not use K.set_learning_phase(0)):
def create_model():
    inputs = Input(...)
    ...
    return model

model = create_model()
Train model
Save weights:
model.save_weights("weights.h5")
Clear session (important so layer names are the same) and set learning phase to 0:
K.clear_session()
K.set_learning_phase(0)
Recreate model and load weights:
model = create_model()
model.load_weights("weights.h5")
Freeze as before
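Put together, a minimal sketch of this workflow, assuming create_model() and the freeze_graph() helper from the earlier answer, and placeholder training data x_train / y_train:
import tensorflow as tf
from tensorflow.keras import backend as K

# 1. Build and train the model without touching the learning phase.
model = create_model()
model.fit(x_train, y_train, epochs=10)

# 2. Save only the weights.
model.save_weights("weights.h5")

# 3. Reset the graph (so layer names match) and switch to inference mode.
K.clear_session()
K.set_learning_phase(0)

# 4. Recreate the same architecture and load the trained weights.
model = create_model()
model.load_weights("weights.h5")

# 5. Freeze as before.
session = K.get_session()
freeze_graph(session.graph, session, [out.op.name for out in model.outputs])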
Thanks for pointing out the main issue! I found that keras.backend.set_learning_phase(0) does not always work, at least in my case.
Another approach might be: for l in keras_model.layers: l.trainable = False

listing available graphs in tensorflow

I am running into ValueError: Tensor("conv2d_1/kernel:0", ...) must be from the same graph as Tensor("IteratorGetNext:0", ...). I am trying to reuse a Keras model with the Estimator class.
I tried enclosing everything possible in a g = tf.Graph() / with g.as_default(): block:
import tensorflow as tf

g = tf.Graph()
with g.as_default():
    MODEL = get_keras_model(...)

    def model_fn(mode, features, labels, params):
        logits = MODEL(features)
        ...

    def parser(record):
        ...

    def get_dataset_inp_fn(filenames, epochs=20):
        def dataset_input_fn():
            dataset = tf.contrib.data.TFRecordDataset(filenames)
            dataset = dataset.map(parser)
            ...

    with tf.Session(graph=g) as sess:
        est = tf.estimator.Estimator(
            model_fn,
            model_dir=None,
            config=None,
            params={"optimizer": "AdamOptimizer",
                    "opt_params": {}}
        )
        est.train(get_dataset_inp_fn(["mydata.tfrecords"], epochs=20))
but that is not helpful.
Is there a way to list all graphs defined up to current point?
Here's a general debugging technique: put import pdb; pdb.set_trace() into the tf.Graph constructor, and then use bt to figure out who is creating the Graph. My first guess would be that Keras does not use the default graph and creates its own. You can use inspect.getsourcefile(tf.Graph) to find where the Graph file is located locally.
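A non-invasive sketch of the same idea, monkey-patching instead of editing TensorFlow's source (the names below are illustrative):
import traceback
import tensorflow as tf

# Print a stack trace every time a new tf.Graph is constructed,
# so you can see which code path creates it.
_original_graph_init = tf.Graph.__init__

def _traced_graph_init(self, *args, **kwargs):
    print("New tf.Graph created at:")
    traceback.print_stack()
    _original_graph_init(self, *args, **kwargs)

tf.Graph.__init__ = _traced_graph_init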
The function that checks the graphs and returns the error (I wish it returned the graph addresses as well) calls the following function to check the graphs:
from tensorflow.python.framework.ops import _get_graph_from_inputs
_get_graph_from_inputs([x])
In this case the graph that keras has created is identical to graph g, but one that is created by get_dataset_inp_fn is different from g.
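For example, a short sketch of that check (MODEL and g as in the question):
from tensorflow.python.framework.ops import _get_graph_from_inputs

# Which graph does the Keras model's output tensor belong to?
model_graph = _get_graph_from_inputs([MODEL.output])
print(model_graph is g)  # True only if the Keras model was built inside g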

Tensorflow Estimator API: Summaries

I can't manage to make summaries work with the Estimator API of TensorFlow.
The Estimator class is very useful for many reasons: I have already implemented my own classes which are really similar, but I am trying to switch to this one.
Here is the code sample:
import tensorflow as tf
import tensorflow.contrib.layers as layers
import tensorflow.contrib.learn as learn
import numpy as np

# To reproduce the error: docker run --rm -w /algo -v $(pwd):/algo tensorflow/tensorflow bash -c "python sample.py"

def model_fn(x, y, mode):
    logits = layers.fully_connected(x, 12, scope="dense-1")
    logits = layers.fully_connected(logits, 56, scope="dense-2")
    logits = layers.fully_connected(logits, 4, scope="dense-3")
    loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=y), name="xentropy")
    return {"predictions": logits}, loss, tf.train.AdamOptimizer(0.001).minimize(loss)

def input_fun():
    """ To be completed for a 4 classes classification problem """
    feature = tf.constant(np.random.rand(100, 10))
    labels = tf.constant(np.random.random_integers(0, 3, size=(100,)))
    return feature, labels

estimator = learn.Estimator(model_fn=model_fn, )

trainingConfig = tf.contrib.learn.RunConfig(save_checkpoints_secs=60)
estimator = learn.Estimator(model_fn=model_fn, model_dir="./tmp", config=trainingConfig)

# Works
estimator.fit(input_fn=input_fun, steps=2)

# The following code does not work
# Can't initialize saver
# saver = tf.train.Saver(max_to_keep=10)  # Error: No variables to save

# The following fails because I am missing a saver... :(
hooks = [
    tf.train.LoggingTensorHook(["xentropy"], every_n_iter=100),
    tf.train.CheckpointSaverHook("./tmp", save_steps=1000, checkpoint_basename='model.ckpt'),
    tf.train.StepCounterHook(every_n_steps=100, output_dir="./tmp"),
    tf.train.SummarySaverHook(save_steps=100, output_dir="./tmp"),
]
estimator.fit(input_fn=input_fun, steps=2, monitors=hooks)
As you can see, I can create an Estimator and use it, but I can't manage to add hooks to the fitting process.
The logging hook works just fine, but the others require both tensors and a saver, which I can't provide.
The tensors are defined in the model function, thus I can't pass them to the SummaryHook, and the Saver can't be initialized because there is no tensor to save...
Is there a solution to my problem? (I am guessing yes, but this part is lacking documentation in the TensorFlow docs.)
How can I initialize my saver? Or should I use other objects, such as Scaffold?
How can I pass summaries to the SummaryHook since they are defined in my model function?
Thanks in advance.
PS: I have seen the DNNClassifier API, but I want to use the Estimator API for convolutional nets and others. I need to create summaries for any estimator.
The intended use case is that you let the Estimator save summaries for you. There are options in RunConfig for configuring summary writing. RunConfigs get passed when constructing the Estimator.
Just have tf.summary.scalar("loss", loss) in the model_fn, and run the code without the summary hook. The loss is recorded and shown in TensorBoard.
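A hedged sketch adapting the question's model_fn (imports and input_fun as in the question; save_summary_steps is an assumption on my part for controlling how often summaries are written):
def model_fn(x, y, mode):
    logits = layers.fully_connected(x, 12, scope="dense-1")
    logits = layers.fully_connected(logits, 56, scope="dense-2")
    logits = layers.fully_connected(logits, 4, scope="dense-3")
    loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=y),
        name="xentropy")
    tf.summary.scalar("loss", loss)  # picked up by the Estimator, no SummarySaverHook needed
    return {"predictions": logits}, loss, tf.train.AdamOptimizer(0.001).minimize(loss)

trainingConfig = tf.contrib.learn.RunConfig(save_summary_steps=100,
                                            save_checkpoints_secs=60)
estimator = learn.Estimator(model_fn=model_fn, model_dir="./tmp", config=trainingConfig)
estimator.fit(input_fn=input_fun, steps=2)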
See also:
Tensorflow - Using tf.summary with 1.2 Estimator API