I use tf.summary.tensor_summary in my code, following this: https://www.tensorflow.org/api_docs/python/tf/summary/tensor_summary
But I didn't see anything new in tensorboard, and tensorboard also printed some warnings:
W0321 12:50:47.244003 Reloader tf_logging.py:121] This summary with tag 'component_3_finalize/wealth_tensor' is oddly not associated with a plugin.
How to make this work? Do I need install some plugin? I didn't find any docs on this.
UPDATE:
So I here is how I create my summary:
def finalize_graph(self, graph_building_context):
tf.summary.scalar('loss', loss)
# the wealth tensor is of shape [B], where B is the batch size at runtime
tf.summary.scalar('wealth', tf.reduce_mean(wealth))
# uhmm, this tensor_summary doesn't work yet
tf.summary.tensor_summary('wealth_tensor', wealth)
Then I use a MonitoredTrainingSession, by default it will save the summary, and I can see my loss and wealth scalar summry, but not this wealth_tensor summary.
Related
When loading images from a directory in Tensorflow, you use something like:
dataset = tf.keras.utils.image_dataset_from_directory(
"S:\\Images",
batch_size=32,
image_size=(128,128),
label_mode=None,
validation_split=0.20, #Reserve 20% of images for validation
subset='training', #If we specify a validation_split, we *must* specify subset
seed=619 #If using validation_split we *must* specify a seed to ensure there is no overlap between training and validation data
)
But of course some of the images (.jpg, .png, .gif, .bmp) will be invalid. So we want to ignore those errors; just skip them (and ideally log the filenames so they can be repaired, removed, or deleted).
There have been some ideas along the way of how to ignore invalid images:
Method 1: tf.contrib.data.ignore_errors (Tensorflow 1.x only)
Warning: The tf.contrib module will not be included in TensorFlow 2.0.
Sample usage:
dataset = dataset.apply(tf.contrib.data.ignore_errors())
The only down-side of this method is that it was only available in Tensorflow 1. Trying to use it today simply won't work, as the tf.contib namespace no longer exists. That led to a built-in method:
Method 2: tf.data.experimental.ignore_errors(log_warning=False) (deprecated)
From the documentation:
Creates a Dataset from another Dataset and silently ignores any errors. (deprecated)
Deprecated: THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: Use tf.data.Dataset.ignore_errors instead.
Sample usage:
dataset = dataset.apply(tf.data.experimental.ignore_errors(log_warning=True))
And this method works. It works great. And it has the advantage of working.
But it's apparently deprecated, and they documentation says we should use method 3:
Method 3 - tf.data.Dataset.ignore_errors(log_warning=False, name=None)
Drops elements that cause errors.
Sample usage:
dataset = dataset.ignore_errors(log_warning=True, name="Loading images from directory")
Except it doesn't work
The dataset.ignore_errors attribute doesn't work, and gives the error:
AttributeError: 'BatchDataset' object has no attribute 'ignore_errors'
Which means:
the thing that works is deprecated
they tell us to use this other thing instead
and "provide the instructions for updating"
but the other thing doesn't work
So we ask Stackoverflow:
How do i use tf.data.Dataset.ignore_errors to ignore errors?
Bonus Reading
TensorFlow Dataset `.map` - Is it possible to ignore errors?
TensorFlow: How to skip broken data
Untested Workaround
Not only is it not what i was asking, but people are not allowed to read this:
It looks like the tf.data.Dataset.ignore_errors() method is not
available in the BatchDataset object, which is what you are using in
your code. You can try using tf.data.Dataset.filter() to filter out
elements that cause errors when loading the images. You can use a
try-except block inside the lambda function passed to filter() to
catch the errors and return False for elements that cause errors,
which will filter them out. Here's an example of how you can use
filter() to achieve this:
def filter_fn(x):
try:
# Load the image and do some processing
# Return True if the image is valid, False otherwise
return True
except:
return False
dataset = dataset.filter(filter_fn)
Alternatively, you can use the tf.data.experimental.ignore_errors()
method, which is currently available in TensorFlow 2.x. This method
will silently ignore any errors that occur while processing the
elements of the dataset. However, keep in mind that this method is
experimental and may be removed or changed in a future version.
tf.data.Dataset.ignore_errors was introduced in TensorFlow 2.11. You can use tf.data.experimental.ignore_errors for older versions like so:
dataset.apply(tf.data.experimental.ignore_errors())
As what I know, when I running tensorflow model on python script I could use the follow code snippet to profile the timeline of each block in the model.
from tensorflow.python.client import timeline
options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()
batch_positive_score = sess.run([positive_score], feed_dict, options=options, run_metadata=run_metadata)
fetched_timeline = timeline.Timeline(run_metadata.step_stats)
chrome_trace = fetched_timeline.generate_chrome_trace_format()
with open('./result/timeline.json', 'w') as f:
f.write(chrome_trace)
But how to profile a model that loading on tensorflow-serving?
I think you can use tf.profiler, even during Serving because, it is finally a Tensorflow Graph and the changes made during Training (including Profiling, as per my understanding) will be reflected in Serving as well.
Please find the below Tensorflow Code:
# User can control the tracing steps and
# dumping steps. User can also run online profiling during training.
#
# Create options to profile time/memory as well as parameters.
builder = tf.profiler.ProfileOptionBuilder
opts = builder(builder.time_and_memory()).order_by('micros').build()
opts2 = tf.profiler.ProfileOptionBuilder.trainable_variables_parameter()
# Collect traces of steps 10~20, dump the whole profile (with traces of
# step 10~20) at step 20. The dumped profile can be used for further profiling
# with command line interface or Web UI.
with tf.contrib.tfprof.ProfileContext('/tmp/train_dir',
trace_steps=range(10, 20),
dump_steps=[20]) as pctx:
# Run online profiling with 'op' view and 'opts' options at step 15, 18, 20.
pctx.add_auto_profiling('op', opts, [15, 18, 20])
# Run online profiling with 'scope' view and 'opts2' options at step 20.
pctx.add_auto_profiling('scope', opts2, [20])
# High level API, such as slim, Estimator, etc.
train_loop()
After that, we can run the below mentioned commands in the command prompt:
bazel-bin/tensorflow/core/profiler/profiler \
--profile_path=/tmp/train_dir/profile_xx
tfprof> op -select micros,bytes,occurrence -order_by micros
# Profiler ui available at: https://github.com/tensorflow/profiler-ui
python ui.py --profile_context_path=/tmp/train_dir/profile_xx
Code for Visualizing Time and Memory:
# The following example generates a timeline.
tfprof> graph -step -1 -max_depth 100000 -output timeline:outfile=<filename>
generating trace file.
******************************************************
Timeline file is written to <filename>.
Open a Chrome browser, enter URL chrome://tracing and load the timeline file.
******************************************************
Attribute TensorFlow graph running time to your Python codes:
tfprof> code -max_depth 1000 -show_name_regexes .*model_analyzer.*py.* -select micros -account_type_regexes .* -order_by micros
Show your model variables and the number of parameters:
tfprof> scope -account_type_regexes VariableV2 -max_depth 4 -select params
Show the most expensive operation types:
tfprof> op -select micros,bytes,occurrence -order_by micros
Auto-profile:
tfprof> advise
For more detailed information on this , you can refer the below links:
Understand all the classes mentioned in this page =>
https://www.tensorflow.org/api_docs/python/tf/profiler
Code is given in detail in the below link:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/profiler/README.md
I've trained up a model and saved it in a checkpoint, but only just realized that I forgot to name one of the variables I'd like to inspect when I restore the model.
I know how to retrieve named variables from tensorflow, (g = tf.get_default_graph() and then g.get_tensor_by_name([name])). In this case, I know its scope, but it is unnamed. I've tried looking in tf.GraphKeys.GLOBAL_VARIABLES, but it doesn't appear there, for some reason.
Here's how it's defined in the model:
with tf.name_scope("contrastive_loss") as scope:
l2_dist = tf.cast(tf.sqrt(1e-4 + tf.reduce_sum(tf.subtract(pred_left, pred_right), 1)), tf.float32) # the variable I want
# I use it here when calculating another named tensor, if that helps.
con_loss = contrastive_loss(l2_dist)
loss = tf.reduce_sum(con_loss, name="loss")
Is there any way of finding the variable without a name?
First of all, following up on my first comment, it makes sense that tf.get_collection given a name scope is not working. From the documentation, if you provide a scope, only variables or operations with assigned names will be returned. So that's out.
One thing you can try is to list the name of every node in your Graph with:
print([node.name for node in tf.get_default_graph().as_graph_def().node])
Or possibly, when restoring from a checkpoint:
saver = tf.train.import_meta_graph(/path/to/meta/graph)
sess = tf.Session()
saver.restore(sess, /path/to/checkpoints)
graph = sess.graph
print([node.name for node in graph.as_graph_def().node])
Another option is to display the graph using tensorboard or Jupyter Notebook and the show_graph command. There might be a built-in show_graph now, but that link is to a git repository where one is defined. You will then have to search for your operation in the graph and then probably retrieve it with:
my_op = tf.get_collection('full_operation_name')[0]
If you want to set it up in the future so that you can retrieve it by name, you need to add it to a collection using tf.add_to_collection:
my_op = tf.some_operation(stuff, name='my_op')
tf.add_to_collection('my_op_name', my_op)
Then retrieve it by restoring your graph and then using:
my_restored_op = tf.get_collection('my_op_name')[0]
You might also be able to get by just naming it and then specifying its scope in tf.get_collection instead, but I am not sure. More information and a helpful tutorial can be found here.
tf.get_collection does not work with unnamed variables. So list the operations with:
graph = sess.graph
print(graph.get_operations())
... find your tensor in the list and then:
global_step_tensor = graph.get_tensor_by_name('complete_operation_name:0')
And I found this tutorial very helpful to understand the mechanism behind these.
I'm trying to visualize a tensor summary in tensorboard. However I can't see the tensor summary at all in the board. Here is my code:
out = tf.strided_slice(logits, begin=[self.args.uttWindowSize-1, 0], end=[-self.args.uttWindowSize+1, self.args.numClasses],
strides=[1, 1], name='softmax_truncated')
tf.summary.tensor_summary('softmax_input', out)
where out is a multi-dimensional tensor. I guess there must be something wrong with my code. Probably I used the tensor_summary function incorrectly.
What you do is you create a summary op, but you don't invoke it and don't write the summary (see documentation).
To actually create a summary you need to do the following:
# Create a summary operation
summary_op = tf.summary.tensor_summary('softmax_input', out)
# Create the summary
summary_str = sess.run(summary_op)
# Create a summary writer
writer = tf.train.SummaryWriter(...)
# Write the summary
writer.add_summary(summary_str)
Explicitly writing a summary (last two lines) is only necessary if you don't have a higher level helper like a Supervisor. Otherwise you invoke
sv.summary_computed(sess, summary_str)
and the Supervisor will handle it.
More info, also see:
How to manually create a tf.Summary()
Hopefully a workaround which achieves what you want. ..
If you wish to view the tensor values, you can convert them using as_string, then use summary.text. The values will appear in the tensorboard text tab.
Not tried with 3D tensors, but feel free to slice according to needs.
code snippet, which includes use of inserting a print statement to get console output as well.
predictions = tf.argmax(reshaped_logits, 1)
txtPredictions = tf.Print(tf.as_string(predictions),[tf.as_string(predictions)], message='predictions', name='txtPredictions')
txtPredictions_op = tf.summary.text('predictions', txtPredictions)
Not sure whether this is kinda obvious, but you could use something like
def make_tensor_summary(tensor, name='defaultTensorName'):
for i in range(tensor.get_shape()[0]:
for j in range(tensor.get_shape()[1]:
tf.summary.scalar(Name + str(i) + '_' + str(j), tensor[i, j])
in case you know it is a 'matrix-shaped' Tensor in advance.
After following this tutorial on summaries and TensorBoard, I've been able to successfully save and look at data with TensorBoard. Is it possible to open this data with something other than TensorBoard?
By the way, my application is to do off-policy learning. I'm currently saving each state-action-reward tuple using SummaryWriter. I know I could manually store/train on this data, but I thought it'd be nice to use TensorFlow's built in logging features to store/load this data.
As of March 2017, the EventAccumulator tool has been moved from Tensorflow core to the Tensorboard Backend. You can still use it to extract data from Tensorboard log files as follows:
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator
event_acc = EventAccumulator('/path/to/summary/folder')
event_acc.Reload()
# Show all tags in the log file
print(event_acc.Tags())
# E. g. get wall clock, number of steps and value for a scalar 'Accuracy'
w_times, step_nums, vals = zip(*event_acc.Scalars('Accuracy'))
Easy, the data can actually be exported to a .csv file within TensorBoard under the Events tab, which can e.g. be loaded in a Pandas dataframe in Python. Make sure you check the Data download links box.
For a more automated approach, check out the TensorBoard readme:
If you'd like to export data to visualize elsewhere (e.g. iPython
Notebook), that's possible too. You can directly depend on the
underlying classes that TensorBoard uses for loading data:
python/summary/event_accumulator.py (for loading data from a single
run) or python/summary/event_multiplexer.py (for loading data from
multiple runs, and keeping it organized). These classes load groups of
event files, discard data that was "orphaned" by TensorFlow crashes,
and organize the data by tag.
As another option, there is a script
(tensorboard/scripts/serialize_tensorboard.py) which will load a
logdir just like TensorBoard does, but write all of the data out to
disk as json instead of starting a server. This script is setup to
make "fake TensorBoard backends" for testing, so it is a bit rough
around the edges.
I think the data are encoded protobufs RecordReader format. To get serialized strings out of files you can use py_record_reader or build a graph with TFRecordReader op, and to deserialize those strings to protobuf use Event schema. If you get a working example, please update this q, since we seem to be missing documentation on this.
I did something along these lines for a previous project. As mentioned by others, the main ingredient is tensorflows event accumulator
from tensorflow.python.summary import event_accumulator as ea
acc = ea.EventAccumulator("folder/containing/summaries/")
acc.Reload()
# Print tags of contained entities, use these names to retrieve entities as below
print(acc.Tags())
# E. g. get all values and steps of a scalar called 'l2_loss'
xy_l2_loss = [(s.step, s.value) for s in acc.Scalars('l2_loss')]
# Retrieve images, e. g. first labeled as 'generator'
img = acc.Images('generator/image/0')
with open('img_{}.png'.format(img.step), 'wb') as f:
f.write(img.encoded_image_string)
You can also use the tf.train.summaryiterator: To extract events in a ./logs-Folder where only classic scalars lr, acc, loss, val_acc and val_loss are present you can use this GIST: tensorboard_to_csv.py
Chris Cundy's answer works well when you have less than 10000 data points in your tfevent file. However, when you have a large file with over 10000 data points, Tensorboard will automatically sampling them and only gives you at most 10000 points. It is a quite annoying underlying behavior as it is not well-documented. See https://github.com/tensorflow/tensorboard/blob/master/tensorboard/backend/event_processing/event_accumulator.py#L186.
To get around it and get all data points, a bit hacky way is to:
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator
class FalseDict(object):
def __getitem__(self,key):
return 0
def __contains__(self, key):
return True
event_acc = EventAccumulator('path/to/your/tfevents',size_guidance=FalseDict())
It looks like for tb version >=2.3 you can streamline the process of converting your tb events to a pandas dataframe using tensorboard.data.experimental.ExperimentFromDev().
It requires you to upload your logs to TensorBoard.dev, though, which is public. There are plans to expand the capability to locally stored logs in the future.
https://www.tensorflow.org/tensorboard/dataframe_api
You can also use the EventFileLoader to iterate through a tensorboard file
from tensorboard.backend.event_processing.event_file_loader import EventFileLoader
for event in EventFileLoader('path/to/events.out.tfevents.xxx').Load():
print(event)
Surprisingly, the python package tb_parse has not been mentioned yet.
From documentation:
Installation:
pip install tensorflow # or tensorflow-cpu pip install -U tbparse # requires Python >= 3.7
Note: If you don't want to install TensorFlow, see Installing without TensorFlow.
We suggest using an additional virtual environment for parsing and plotting the tensorboard events. So no worries if your training code uses Python 3.6 or older versions.
Reading one or more event files with tbparse only requires 5 lines of code:
from tbparse import SummaryReader
log_dir = "<PATH_TO_EVENT_FILE_OR_DIRECTORY>"
reader = SummaryReader(log_dir)
df = reader.scalars
print(df)