How to profile a keras predict call with tensorboard - tensorflow

I would like to have a timing profile/trace of a predict call to get an estimate of how fast my model can perform inference.
Right now I'm using:
log_dir="logs/profile/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, profile_batch = 1)
x_test, y_test = next(iter(training_ds))
_ = unet.predict(x_test, verbose=1, callbacks=[tensorboard_callback])
But the profiling tab does not show up in tensorboard. What am I missing here?

First, check whether CUPTI is loading correctly. When TensorFlow starts up you should see a log line like:
2019-12-13 12:01:47.617853: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcupti.so.10.0
If it didn't find the CUPTI libraries, make sure that your LD_LIBRARY_PATH is set correctly. $ echo $LD_LIBRARY_PATH should return something like:
/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
If this is all set, run the following snippet of code, assuming you have described your model in tensorflow/keras:
# Set up logging.
stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
logdir = 'logs/trace/%s' % stamp
writer = tf.summary.create_file_writer(logdir)
tf.summary.trace_on(graph=True, profiler=True)

# Forward pass
input, label = next(iter(dataset))  # tf DataSet object
your_model(input)

with writer.as_default():
    tf.summary.trace_export(name="model_trace", step=0, profiler_outdir=logdir)
The final step is critical for viewing the trace in TensorBoard: you have to open TensorBoard in Chrome for it to parse the .trace file correctly.
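One caveat worth adding (my own hedged aside, not part of the original answer): with eager execution in TF 2.x, tf.summary.trace_on(graph=True) only captures a graph for code that runs inside a tf.function, so if your_model(input) executes eagerly the exported trace may contain no graph. A minimal sketch of wrapping the forward pass, reusing the names from the snippet above:
@tf.function
def predict_step(x):
    return your_model(x)

tf.summary.trace_on(graph=True, profiler=True)
predict_step(input)  # the traced call
with writer.as_default():
    tf.summary.trace_export(name="model_trace", step=0, profiler_outdir=logdir)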

If you are using virtual environments, make sure you don't mix things up. See my answer to another question.
Also, there are four methods for writing profiling data. Since you already tried the Keras callback, you should give the other methods a try: Overview of profiling methods.
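For example, one of those alternatives is the programmatic profiler API (available from roughly TF 2.2 onwards). A minimal sketch, reusing unet and x_test from the question:
import tensorflow as tf

logdir = "logs/profile/manual"
tf.profiler.experimental.start(logdir)
_ = unet.predict(x_test, verbose=1)   # the call you want to profile
tf.profiler.experimental.stop()
# Then point TensorBoard at `logdir` and open the Profile tab.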

First of all, make sure that CUPTI is loading correctly. I ran into the same problem recently and ended up using the following:
import tensorflow as tf

logs = "../logs/"
for input_data in datas:
    with tf.profiler.experimental.Profile(logs):
        out_pred = model.predict(input_data)

Related

How do I create a Sagemaker training job with my own Tensorflow code without having to build a container?

I'm trying to define a Sagemaker Training Job with an existing Python class. To my understanding, I could create my own container but would rather not deal with container management.
When choosing "Algorithm Source" there is the option of "Your own algorithm source" but nothing is listed under resources. Where does this come from?
I know I could do this through a notebook, but I really want this defined in a job that can be invoked through an endpoint.
As Bruno has said you will have to use a container somewhere, but you can use an existing container to run your own custom tensorflow code.
There is a good example in the sagemaker github for how to do this.
The way this works is you modify your code to have an entry point which takes argparse command line arguments, and then you point a 'Sagemaker Tensorflow estimator' to the entry point. Then when you call fit on the sagemaker estimator it will download the tensorflow container and run your custom code in there.
So you start off with your own custom code that looks something like this
# my_custom_code.py
import tensorflow as tf
import numpy as np

def build_net():
    # single fully connected layer
    image_place = tf.placeholder(tf.float32, [None, 28*28])
    label_place = tf.placeholder(tf.int32, [None,])
    net = tf.layers.dense(image_place, units=1024, activation=tf.nn.relu)
    net = tf.layers.dense(net, units=10, activation=None)
    return image_place, label_place, net

def process_data():
    # load
    (x_train, y_train), (_, _) = tf.keras.datasets.mnist.load_data()
    # center
    x_train = x_train / 255.0
    m = x_train.mean()
    x_train = x_train - m
    # convert to right types
    x_train = x_train.astype(np.float32)
    y_train = y_train.astype(np.int32)
    # reshape so flat
    x_train = np.reshape(x_train, [-1, 28*28])
    return x_train, y_train

def train_model(init_learn, epochs):
    image_p, label_p, logit = build_net()
    x_train, y_train = process_data()
    # labels are integer class ids, so use the sparse cross-entropy op
    loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(
            logits=logit,
            labels=label_p))
    optimiser = tf.train.AdamOptimizer(init_learn)
    train_step = optimiser.minimize(loss)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for _ in range(epochs):
            sess.run(train_step, feed_dict={image_p: x_train, label_p: y_train})

if __name__ == '__main__':
    train_model(0.001, 10)
To make it work with sagemaker we need to create a command line entry point, which will allow sagemaker to run it in the container it will download for us eventually.
# entry.py
import argparse

from my_custom_code import train_model

if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument(
        '--model_dir',
        type=str)
    parser.add_argument(
        '--init_learn',
        type=float)
    parser.add_argument(
        '--epochs',
        type=int)

    args = parser.parse_args()
    train_model(args.init_learn, args.epochs)
Apart from specifying the arguments my function needs to take, we also need to provide a model_dir argument. This is always required and is an S3 location where any model artifacts will be saved when the training job completes. Note that you don't need to specify what this value is (though you can), as Sagemaker will provide a default location in S3 for you.
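As a hedged aside (not part of the original answer): inside the training container SageMaker also exposes the artifact directory as the SM_MODEL_DIR environment variable (conventionally /opt/ml/model), and anything written there is packaged and uploaded to S3 when the job finishes. A hypothetical helper the entry point could call after training:
import os
import tensorflow as tf

def save_artifacts(sess, model_dir=None):
    # save_artifacts is illustrative, not from the original answer
    model_dir = model_dir or os.environ.get('SM_MODEL_DIR', '/opt/ml/model')
    os.makedirs(model_dir, exist_ok=True)
    tf.train.Saver().save(sess, os.path.join(model_dir, 'model.ckpt'))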
So we have modified our code, now we need to actually run it on Sagemaker. Go to the AWS console and fire up a small instance from Sagemaker. Download your custom code to the instance, and then create a jupyter notebook as follows:
# sagemaker_run.ipynb
import sagemaker
from sagemaker.tensorflow import TensorFlow

hyperparameters = {
    'epochs': 10,
    'init_learn': 0.001}

role = sagemaker.get_execution_role()
source_dir = '/path/to/folder/with/my/code/on/instance'

estimator = TensorFlow(
    entry_point='entry.py',
    source_dir=source_dir,
    train_instance_type='ml.t2.medium',
    train_instance_count=1,
    hyperparameters=hyperparameters,
    role=role,
    py_version='py3',
    framework_version='1.12.0',
    script_mode=True)

estimator.fit()
Running the above will:
Spin up an ml.t2.medium instance
Download the tensorflow 1.12.0 container to the instance
Download any data we specify in fit to the newly created instance (in this case nothing)
Run our code on the instance
Upload the model artifacts to model_dir
And that is pretty much it. There is of course a lot not mentioned here but you can:
Download training/testing data from S3 (see the sketch below)
Save checkpoint and TensorBoard files during training and upload them to S3
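For the data point above, a rough hedged sketch (bucket and channel names are illustrative, not from the original answer): each named channel passed to fit is downloaded from S3 into the container, and its local path is exposed as an SM_CHANNEL_* environment variable.
# In the notebook: pass one or more named channels to fit.
estimator.fit({'training': 's3://your-bucket/path/to/training-data'})

# In entry.py: read the local directory the channel was downloaded to.
import os
data_dir = os.environ.get('SM_CHANNEL_TRAINING')  # e.g. /opt/ml/input/data/training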
The best resource I found was the example I shared but here are all the things I was looking at to get this working:
example code again
documentation
explanation of environment variables
I believe this is not possible, as you can see in this part of the SageMaker documentation. A container is needed to provide the capability to run with any language and framework.
The algorithms listed during training job creation are the algorithms you can create in SageMaker -> Training -> Algorithms. But it's necessary to define a container, which is a specification of how you do training and predictions. Even if you don't build a container, you will refer to an existing one (using a built-in algorithm) or use an algorithm from the marketplace which someone else built using an image.
I believe you could build an image that meets your needs, starting from an existing one.
After you build the image, you can easily use it to automate your training/prediction jobs from lambda. Here is an example.
Also, you can provide as many input channels to your container as you need to load data; in theory, you can pass a channel that refers to a script you want to load when your container starts. That's just an idea I've had, but depending on your scenario it could be worth a test. Normally, you would have an image that you customize during the docker build process. So, if you have several different scripts, you can create only one image and just parameterize it to use a custom script.
Here you can find a custom image that uses Tensorflow.
Here are listed a lot of examples of building different containers for several frameworks, also, Tensorflow.
I hope this helps; let me know if you need more info.
Regards.
I am not sure if this helps you, but you can make use of TensorFlow estimators, which are like built-in containers from AWS. You need a training script and a requirements.txt file containing the dependencies you may need. You can follow this link for more information: Sagemaker TensorFlow estimators documentation.

Tensorboard doesn't show runtime/memory for all operations

I have a network implemented in TensorFlow that takes very long to train and therefore want to profile it to see which parts cause the long runtime.
To do that, I follow the instructions here to capture runtime and memory information. My code looks like this:
# define network
loss = ...
train_op = tf.train.AdamOptimizer().minimize(loss, global_step=global_step)

# run forward and backward prop for one batch
run_metadata = tf.RunMetadata()
options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
_, loss, sum = sess.run([train_op, loss, sum], feed_dict=fd, options=options, run_metadata=run_metadata)
writer.add_run_metadata(run_metadata, 'step_%d' % step)
I can then see "session runs" in TensorBoard. However, as soon as I load a session run, most operations in my graph turn orange and no runtime or memory information is available for them.
According to the legend, these operations are "unused". But that cannot be the case, as almost everything except "loss" and "opt" is shown like that. Clearly, the whole network has to be used to compute the loss. So I don't really see why the graph is shown like this.
I use TF 1.3 on a Tesla K40c.
I used to have the same problem as you with Tensorboard not registering anything in my session run except the gradient and optimizer ops.
I fixed it by upgrading my version of Tensorflow to the 1.4 release.
Not sure.
Try adding this line
writer.add_summary(_, step)
after
writer.add_run_metadata...
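For reference, here is a hedged sketch of the combined pattern in TF 1.x, assuming merged = tf.summary.merge_all() and writer = tf.summary.FileWriter(logdir) already exist; note that the _ returned for train_op is None, so the summary value has to come from fetching the merged summary tensor:
run_metadata = tf.RunMetadata()
options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
_, loss_val, summary_val = sess.run(
    [train_op, loss, merged],
    feed_dict=fd, options=options, run_metadata=run_metadata)

# The metadata tag must be unique, hence the per-step suffix.
writer.add_run_metadata(run_metadata, 'step_%d' % step)
writer.add_summary(summary_val, step)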

Saving tf.trainable_variables() using convert_variables_to_constants

I have a Keras model that I would like to convert to a Tensorflow protobuf (e.g. saved_model.pb).
This model comes from transfer learning on the VGG-19 network, in which the head was cut off and replaced with fully-connected+softmax layers that were trained while the rest of the VGG-19 network was frozen.
I can load the model in Keras, and then use keras.backend.get_session() to run the model in tensorflow, generating the correct predictions:
frame = preprocess(cv2.imread("path/to/img.jpg"))
keras_model = keras.models.load_model("path/to/keras/model.h5")
keras_prediction = keras_model.predict(frame)
print(keras_prediction)

with keras.backend.get_session() as sess:
    tvars = tf.trainable_variables()
    output = sess.graph.get_tensor_by_name('Softmax:0')
    input_tensor = sess.graph.get_tensor_by_name('input_1:0')
    tf_prediction = sess.run(output, {input_tensor: frame})
    print(tf_prediction)  # this matches keras_prediction exactly
If I don't include the line tvars = tf.trainable_variables(), then the tf_prediction variable is completely wrong and doesn't match the output from keras_prediction at all. In fact all the values in the output (single array with 4 probability values) are exactly the same (~0.25, all adding to 1). This made me suspect that weights for the head are just initialized to 0 if tf.trainable_variables() is not called first, which was confirmed after inspecting the model variables. In any case, calling tf.trainable_variables() causes the tensorflow prediction to be correct.
The problem is that when I try to save this model, the variables from tf.trainable_variables() don't actually get saved to the .pb file:
with keras.backend.get_session() as sess:
    tvars = tf.trainable_variables()
    constant_graph = graph_util.convert_variables_to_constants(
        sess, sess.graph.as_graph_def(), ['Softmax'])
    graph_io.write_graph(constant_graph, './', 'saved_model.pb', as_text=False)
What I am asking is: how can I save a Keras model as a TensorFlow protobuf with the tf.trainable_variables() intact?
Thanks so much!
So your approach of freezing the variables in the graph (converting them to constants) should work, but it isn't necessary and is trickier than the other approaches (more on this below). If you want graph freezing for some reason (e.g. exporting to a mobile device), I'd need more details to help debug, as I'm not sure what implicit stuff Keras is doing behind the scenes with your graph. However, if you just want to save and load a graph later, I can explain how to do that (though no guarantees that whatever Keras is doing won't screw it up... happy to help debug that).
So there are actually two formats at play here. One is the GraphDef, which is used for Checkpointing, as it does not contain metadata about inputs and outputs. The other is a MetaGraphDef which contains metadata and a graph def, the metadata being useful for prediction and running a ModelServer (from tensorflow/serving).
In either case you need to do more than just call graph_io.write_graph because the variables are usually stored outside the graphdef.
There are wrapper libraries for both these use cases. tf.train.Saver is primarily used for saving and restoring checkpoints.
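A quick hedged sketch of that checkpoint path, for completeness (paths are illustrative):
# Checkpointing with tf.train.Saver (TF 1.x): variable values are written
# to checkpoint files alongside the graph, not baked into the GraphDef.
saver = tf.train.Saver()
with keras.backend.get_session() as sess:
    save_path = saver.save(sess, './keras_model.ckpt')

# Later, after rebuilding the same graph structure:
# saver.restore(sess, save_path)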
However, since you want prediction, I would suggest using a tf.saved_model.builder.SavedModelBuilder to build a SavedModel binary. I've provided some boilerplate for this below:
from tensorflow.python.saved_model.signature_constants import DEFAULT_SERVING_SIGNATURE_DEF_KEY as DEFAULT_SIG_DEF

builder = tf.saved_model.builder.SavedModelBuilder('./mymodel')
with keras.backend.get_session() as sess:
    output = sess.graph.get_tensor_by_name('Softmax:0')
    input_tensor = sess.graph.get_tensor_by_name('input_1:0')
    sig_def = tf.saved_model.signature_def_utils.predict_signature_def(
        {'input': input_tensor},
        {'output': output}
    )
    builder.add_meta_graph_and_variables(
        sess, [tf.saved_model.tag_constants.SERVING],
        signature_def_map={
            DEFAULT_SIG_DEF: sig_def
        }
    )
builder.save()
After running this code you should have a mymodel/saved_model.pb file as well as a directory mymodel/variables/ with protobufs corresponding to the variable values.
Then to load the model again, simply use tf.saved_model.loader:
# Does Keras give you the ability to start with a fresh graph?
# If not you'll need to do this in a separate program to avoid
# conflicts with the old default graph
with tf.Session(graph=tf.Graph()) as sess:
    meta_graph_def = tf.saved_model.loader.load(
        sess,
        [tf.saved_model.tag_constants.SERVING],
        './mymodel'
    )
    # From this point variables and graph structure are restored
    sig_def = meta_graph_def.signature_def[DEFAULT_SIG_DEF]
    print(sess.run(sig_def.outputs['output'].name,
                   feed_dict={sig_def.inputs['input'].name: frame}))
Obviously there's a more efficient prediction available with this code through tensorflow/serving, or Cloud ML Engine, but this should work.
It's possible that Keras is doing something under the hood which will interfere with this process as well, and if so we'd like to hear about it (and I'd like to make sure that Keras users are able to freeze graphs as well, so if you want to send me a gist with your full code or something maybe I can find someone who knows Keras well to help me debug.)
EDIT: You can find an end to end example of this here: https://github.com/GoogleCloudPlatform/cloudml-samples/blob/master/census/keras/trainer/model.py#L85

Cannot run Tensorflow code multiple times in Jupyter Notebook

I'm struggling running Tensorflow (v1.1) code multiple times in Jupyter Notebook.
For example, I execute this simple code snippet that creates an encoding layer for a seq2seq model:
# Construct encoder layer (LSTM)
encoder_cell = tf.contrib.rnn.LSTMCell(encoder_hidden_units)
encoder_outputs, encoder_final_state = tf.nn.dynamic_rnn(
    encoder_cell, encoder_inputs_embedded,
    dtype=tf.float32, time_major=False
)
First time is totally fine, my encoder is created.
However, if I rerun it (no matter the changes I've applied), I get this error:
Attempt to have a second RNNCell use the weights of a variable scope that already has weights
It's very annoying as it forces me to restart the kernel every time I want to change a layer.
Can someone explain to me why this happens and how I can fix it?
Thanks!
You are trying to build the exact same graph twice and therefore TensorFlow complains because the variables already exist in the default graph.
What you could do is to call tf.reset_default_graph() before trying to call the method a second time to ensure you create a new graph when required.
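For instance, here is a minimal sketch of a re-runnable notebook cell, assuming the inputs feeding the encoder are rebuilt in the same cell (the shapes here are illustrative, not taken from the question):
import tensorflow as tf

tf.reset_default_graph()  # drop variables created by previous runs of this cell

# Rebuild everything that lives in the graph, including the inputs.
encoder_hidden_units = 128
encoder_inputs_embedded = tf.placeholder(tf.float32, [None, None, 64])

encoder_cell = tf.contrib.rnn.LSTMCell(encoder_hidden_units)
encoder_outputs, encoder_final_state = tf.nn.dynamic_rnn(
    encoder_cell, encoder_inputs_embedded,
    dtype=tf.float32, time_major=False
)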
Just in case, I would also suggest using an interactive session as described here in the Start TensorFlow InteractiveSession section:
import tensorflow as tf
sess = tf.InteractiveSession()

TensorFlow, TensorBoard: No scalar data was found

I'm trying to figure out how to operate tensorboard.
I looked at the demo here:
https://www.tensorflow.org/code/tensorflow/examples/tutorials/mnist/mnist_with_summaries.py
It runs well on my laptop.
Much of it makes sense to me.
So, I wrote a simple tensorflow demo:
# tensorboard_demo1.py
import tensorflow as tf

sess = tf.Session()

with tf.name_scope('scope1'):
    y1 = tf.constant(22.9) * 1.1
    tf.scalar_summary('y1 scalar_summary', y1)

train_writer = tf.train.SummaryWriter('/tmp/tb1', sess.graph)

print('Result:')
# Now I should run the compute graph:
print(sess.run(y1))
train_writer.close()
# done
It seems to run okay.
Next I ran a simple shell command:
tensorboard --log /tmp/tb1
It told me to browse 0.0.0.0:6006
Which I did.
The web page tells me:
No scalar data was found.
How do I enhance my demo so that it logs a scalar-summary which tensorboard will show me?
You must call train_writer.add_summary() to add some data to the log. For example, one common pattern is to use tf.merge_all_summaries() to create a tensor that implicitly incorporates information from all summaries created in the current graph:
# Creates a TensorFlow tensor that includes information from all summaries
# defined in the current graph.
summary_t = tf.merge_all_summaries()
# Computes the current value of all summaries in the current graph.
summary_val = sess.run(summary_t)
# Writes the summary to the log.
train_writer.add_summary(summary_val)
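Putting that together with the original demo (keeping the pre-1.0 summary API used in the question; newer releases renamed these to tf.summary.scalar, tf.summary.merge_all and tf.summary.FileWriter), a hedged end-to-end sketch might look like:
# tensorboard_demo1.py, extended so TensorBoard has scalar data to display
import tensorflow as tf

sess = tf.Session()

with tf.name_scope('scope1'):
    y1 = tf.constant(22.9) * 1.1
    tf.scalar_summary('y1 scalar_summary', y1)

summary_t = tf.merge_all_summaries()
train_writer = tf.train.SummaryWriter('/tmp/tb1', sess.graph)

result, summary_val = sess.run([y1, summary_t])
print('Result:', result)
train_writer.add_summary(summary_val, 0)  # log at global step 0
train_writer.close()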