Device placement unknown in Tensorboard - tensorflow

I'd like to investigate device placement in the tensorboard using the following code for generating the graph in the summary
# Build the summary operation based on the TF collection of Summaries.
summary_op = tf.merge_all_summaries()
saver = tf.train.Saver(tf.all_variables())
summary_writer = tf.train.SummaryWriter(log_directory, graph_def=sess.graph_def)
This works for displaying the graph and summaries defined in the graph. But when selecting 'device placement' in the tensorboard, all nodes are assigned to 'unknown device'. Do I need to dump the device placement in some other way?

The TensorBoard graph visualizer only sees the explicit device assignments that you have made in your program (i.e. those made using with tf.Device("..."): blocks).
The reason for this is that the nodes in a TensorFlow graph are assigned to devices in multiple stages. The first stage, in the client (e.g. your Python program) allows you to explicitly—and optionally—assign devices to each node, and it is the output of this stage that is written to the TensorBoard logs. A later placement stage runs inside the TensorFlow backend, and assigns every node to a device.
I suspect you want to analyze the results of the later placement stage. Currently there is no support for this in TensorBoard, but you can extract some information by creating the tf.Session as follows:
sess = tf.Session(config=tf.ConfigProto(
log_device_placement=True))
…and then the device placement decisions will be logged to stderr.

Related

Running Tensorflow model inference script on multiple GPU

I'm trying to run the model scoring (inference graph) from tensorflow objec detection API to run it on multiple GPU's, tried specifying the GPU number in the main, but it runs only on single GPU.placed GPU utilization snapshot here
Using tensorflow-gpu==1.13.1, can you kindly point me what I'm missing here.
for i in range(2):
with tf.device('/gpu:{}' . format(i)):
tf_init()
init = tf.global_variables_initializer
with detection_graph.as_default():
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as session:
call to #run_inference_multiple_images function
The responses to this question should give you a few options for fixing this.
Usually TensorFlow will occupy all visible GPUs unless told otherwise. So if you haven't already tried, you could just remove the with tf.device line (assuming you only have the two GPUs) and TensorFlow should use them both.
Otherwise, I think the easiest is setting the environment variables with os.environ["CUDA_VISIBLE_DEVICES"] = "0,1".

Tensorboard doesn't show runtime/memory for all operations

I have a network implemented in TensorFlow that takes very long to train and therefore want to profile it to see which parts cause the long runtime.
To do that, I follow the instructions here to capture runtime and memory information. My code looks like this:
// define network
loss = ...
train_op = tf.train.AdamOptimizer().minimize(loss, global_step=global_step)
// run forward and backward prop for one batch
run_metadata = tf.RunMetadata()
options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
_,loss,sum = sess.run([train_op,loss,sum], feed_dict=fd, options=options, run_metadata=run_metadata)
writer.add_run_metadata(run_metadata, 'step_%d' % step)
I can then see "session runs" in TensorBoard. However, as soon as I load a session run, most operations in my graph turn orange as shown below and no runtime or memory information is available for them:
According to the legend, these operations are "unsused". But that cannot be the case, as almost everything except "loss" and "opt" are shown like that. Clearly, the whole network has to be used to compute the loss. So I don't really see why the graph is shown like this.
I use TF 1.3 on a Tesla K40c.
I used to have the same problem as you with Tensorboard not registering anything in my session run except the gradient and optimizer ops.
I fixed it by upgrading my version of Tensorflow to the 1.4 release.
Not sure.
Try adding this line
writer.add_summary(_, step)
after
writer.add_run_metadata...

How to smoothly produce Tensorflow auc summaries for training and test sets?

Tensorflow describes writing file summaries to visualize graph execution.
I envision three stages:
training the data (with optimization)
measuring accuracy on the training set (no optimization)
measuring accuracy on the test set (no optimization!)
I'd like all stages in the same script, as in the evaluate function of the wide_and_deep tutorial, but with the low-level API. I'd like three different graphs for stats like loss or AUC, one for each stage.
Suppose I use one session, and in each stage I define an AUC summary op:
# define auc
auc, auc_op = tf.metrics.auc(labels, predictions)
# summary scalar to track it
tf.summary.scalar("auc", auc_op, family=family_name)
# merge all summaries for evaluation and later writing
summary_op = tf.summary.merge_all()
...
summary_writer.add_summary(summary, step_num)
There are three graphs, but the first graph has all three runs on it, and the second graph has the last two runs (see below). What's worse, each stage starts from the previous state. This makes sense, because all the variables from the previous stages are still around.
I could use a different session for each stage, but that would throw away the model as well.
What is the smooth way to handle this?
I'd like to just clear some of the summary variables. I've tried re-initializing some variables, looked at related questions, read about name scope and variable scope and tried not to re-use variables for AUC, read about variables and sharing, looked into pruning nodes (though I don't understand it), etc. I have not made it work yet.
I am using the low-level API. I saw something like this in the high-level API in _eval_metric_ops, but I don't understand how they 'clear' the different stages. With name_scope?
Do I have to save and load the model into a new session just for this, or is there some clean way to graph each summary separately?
The metric ops will be local variables, so you could run tf.local_variables_initializer() in your Session, which will reset all of your metrics. You could also look through the local variables collection for those with "auc" in the name if you wanted to be a bit more discerning. The high-level way to do this would be to use an Estimator, which will manage metrics for you.

How to properly freeze a tensorflow graph containing a LookupTable

I am working with a model that uses multiple lookup tables to transform the model input from text to feature ids. I am able to train the model fine. I am able to load it via the javacpp bindings. I am using a default Saver object via the tensor flow supervisor on a periodic basis.
When I try to run the model I get the following error:
Table not initialized.
[[Node: hash_table_Lookup_3 = LookupTableFind[Tin=DT_STRING, Tout=DT_INT64,
_class=["loc:#string_to_index_2/hash_table"], _output_shapes=[[-1]],
_device="/job:localhost/replica:0/task:0/cpu:0"]
(string_to_index_2/hash_table, ParseExample/ParseExample:5, string_to_index_2/hash_table/Const)]]
I prepare the model by using the freeze_graph.py script as follows:
bazel-bin/tensorflow/python/tools/freeze_graph --input_graph=/tmp/tf/graph.pbtxt
--input_checkpoint=/tmp/tf/model.ckpt-0 --output_graph=/tmp/ticker_classifier.pb
--output_node_names=sigmoid --initializer_nodes=init_all_tables
As far as I can tell specifying the initializer_nodes has no effect on the resulting file. Am I running into something that is not currently supported? If not than is there something else I need to do to prepare the graph to be frozen?
I had the same problem when using C++ to invoke TF API to run the inference. It seems the reason is I train a model using tf.feature_column.categorical_column_with_hash_bucket, which needs to be initialized like this:
table_init_op = tf.tables_initializer(name="init_all_tables")
sess.run(table_init_op)
So when you want to freeze the model, you must append the name of table_init_op to the argument "--output_node_names":
freeze_graph --input_graph=/tmp/tf/graph.pbtxt
--input_checkpoint=/tmp/tf/model.ckpt-0
-- output_graph=/tmp/ticker_classifier.pb
--output_node_names=sigmoid,init_all_tables
--initializer_nodes=init_all_tables
When you load and init model in C++, you should first invoke TF C++ API like this:
std::vector<Tensor> dummy_outputs;
Status st = session->Run({}, {}, {"init_all_tables"}, dummy_outputs);
Now you have initialized all tables and can do other things such as inference. This issue may give you a help.

TensorFlow, TensorBoard: No scalar data was found

I'm trying to figure out how to operate tensorboard.
I looked at the demo here:
https://www.tensorflow.org/code/tensorflow/examples/tutorials/mnist/mnist_with_summaries.py
It runs well on my laptop.
Much of it makes sense to me.
So, I wrote a simple tensorflow demo:
# tensorboard_demo1.py
import tensorflow as tf
sess = tf.Session()
with tf.name_scope('scope1'):
y1 = tf.constant(22.9) * 1.1
tf.scalar_summary('y1 scalar_summary', y1)
train_writer = tf.train.SummaryWriter('/tmp/tb1',sess.graph)
print('Result:')
# Now I should run the compute graph:
print(sess.run(y1))
train_writer.close()
# done
It seems to run okay.
Next I ran a simple shell command:
tensorboard --log /tmp/tb1
It told me to browse 0.0.0.0:6006
Which I did.
The web page tells me:
No scalar data was found.
How do I enhance my demo so that it logs a scalar-summary which tensorboard will show me?
You must call train_writer.add_summary() to add some data to the log. For example, one common pattern is to use tf.merge_all_summaries() to create a tensor that implicitly incorporates information from all summaries created in the current graph:
# Creates a TensorFlow tensor that includes information from all summaries
# defined in the current graph.
summary_t = tf.merge_all_summaries()
# Computes the current value of all summaries in the current graph.
summary_val = sess.run(summary_t)
# Writes the summary to the log.
train_writer.add_summary(summary_val)