How to profile GPflow optimization process using timeline?

I am trying to profile GPflow using timeline and visualize the result with Chrome tracing. But the trace does not seem to show the optimization process, only model construction and prediction. I define a custom config:
custom_config = gpflow.settings.get_settings()
custom_config.profiling.output_file_name = 'gpflow_timeline'
custom_config.profiling.dump_timeline = True
And try to make a simple prediction after optimization:
with gpflow.settings.temp_settings(custom_config), gpflow.session_manager.get_session().as_default():
    k = gpflow.kernels.RBF()
    m = gpflow.models.GPR(X_train, y_train, kern=k)
    run_adam(m, lr=0.1, iterations=100, callback=__PrintAction(m, 'GPR with Adam'))
    mean, var = m.predict_y(X_test)
where the callback and the Adam optimization loop are defined as:
from gpflow.actions import Action, Loop

class __PrintAction(Action):
    def __init__(self, model, text):
        self.model = model
        self.text = text

    def run(self, ctx):
        likelihood = ctx.session.run(self.model.likelihood_tensor)
        print('{}: iteration {} likelihood {:.4f}'.format(self.text, ctx.iteration, likelihood))

def run_adam(model, lr, iterations, callback=None):
    adam = gpflow.train.AdamOptimizer(lr).make_optimize_action(model)
    actions = [adam] if callback is None else [adam, callback]
    Loop(actions, stop=iterations)()
    model.anchor(model.enquire_session())
Is it somehow possible to also show the optimization trace on the timeline?

Extension to @tadejk's answer:
You can modify gpflowrc in the GPflow/gpflow project folder, or create one in the same folder where you run your code, and tune the profiling parameters there.
[logging]
# possible levels: CRITICAL, ERROR, WARNING, INFO, DEBUG, NOTSET
level = WARNING
[verbosity]
tf_compile_verb = False
[dtypes]
float_type = float64
int_type = int32
[numerics]
jitter_level = 1e-6
# quadrature can be set to: allow, warn, error
ekern_quadrature = warn
[profiling]
dump_timeline = False
dump_tensorboard = False
output_file_name = timeline
output_directory = ./
each_time = False
[session]
intra_op_parallelism_threads = 0
inter_op_parallelism_threads = 0
I am not 100% sure, but merging everything into one JSON file might be a bad idea: each file is produced by a single session.run, so merging them all into one can mix things up.

I have set:
custom_config.profiling.each_time = True
to get the trace files after each run. I then merged the traces using jq:
jq -s '{traceEvents: map(.traceEvents[])}' gpflow_timeline_* >> gpflow_timeline_all.json

Related

How to calculate cosine similarity given sparse matrix data in TensorFlow?

I'm supposed to change part of a Python script from a GitHub project. The code is an attention-based similarity measure, but I want to turn it into cosine similarity.
The respective code is in the layers.py file (inside the call method).
Attention-Based:
def __call__(self, inputs):
    x = inputs
    # dropout
    if self.sparse_inputs:
        x = sparse_dropout(x, 1 - self.dropout, self.num_features_nonzero)
    else:
        x = tf.nn.dropout(x, 1 - self.dropout)
    # graph learning
    h = dot(x, self.vars['weights'], sparse=self.sparse_inputs)
    N = self.num_nodes
    edge_v = tf.abs(tf.gather(h, self.edge[0]) - tf.gather(h, self.edge[1]))
    edge_v = tf.squeeze(self.act(dot(edge_v, self.vars['a'])))
    sgraph = tf.SparseTensor(indices=tf.transpose(self.edge), values=edge_v, dense_shape=[N, N])
    sgraph = tf.sparse_softmax(sgraph)
    return h, sgraph
I edited the above code to match what I believe are my requirements (cosine similarity). However, when I run the following edited code:
def __call__(self, inputs):
    x = inputs
    # dropout
    if self.sparse_inputs:
        x = sparse_dropout(x, 1 - self.dropout, self.num_features_nonzero)
    else:
        x = tf.nn.dropout(x, 1 - self.dropout)
    # graph learning
    h = dot(x, self.vars['weights'], sparse=self.sparse_inputs)
    N = self.num_nodes
    h_norm = tf.nn.l2_normalize(h)
    edge_v = tf.matmul(h_norm, tf.transpose(h_norm))
    h_norm_1 = tf.norm(h_norm)
    edge_v /= h_norm_1 * h_norm_1
    edge_v = dot(edge_v, self.vars['a'])  # It causes an error when I add this line
    zero = tf.constant(0, dtype=tf.float32)
    where = tf.not_equal(edge_v, zero)
    indices = tf.where(where)
    values = tf.gather_nd(edge_v, indices)
    sgraph = tf.SparseTensor(indices, values, dense_shape=[N, N])
    return h, sgraph
The script raises some runtime errors:
[screenshot of error message]
I suspect the error here is related to line 226:
edge_v = dot(edge_v, self.vars['a']) # It causes an error when I add this line
Any advice on how to accomplish this successfully?
Link of the script on GitHub:
https://github.com/jiangboahu/GLCN-tf
Note: I don't want to use built-in functions, because I think they are not precise enough for this job.
ETA: It appears that there are some answers around, but they seem to tackle different problems, as far as I understood them.
Thanks a bunch in advance
What is dot? Have you imported the method?
It should be either:
edge_v = tf.keras.backend.dot(edge_v, self.vars['a'])
or (note that tf.tensordot requires an axes argument):
edge_v = tf.tensordot(edge_v, self.vars['a'], axes=1)
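As a quick sanity check (my own toy sketch, assuming TF2 eager execution; not part of the original answer), tensordot over one axis reduces to an ordinary matrix product for 2-D tensors:

import tensorflow as tf

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[5.0], [6.0]])

# for 2-D tensors, tensordot with axes=1 equals tf.matmul
print(tf.tensordot(a, b, axes=1))  # [[17.], [39.]]
print(tf.matmul(a, b))             # same result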

tfrecordswriter does not write

I am trying to create a tf.data.Dataset from a generator I wrote, and following this great answer: Split .tfrecords file into many .tfrecords files
Generator Code
def get_examples_generator(num_variants, vcf_reader):
    def generator():
        counter = 0
        for vcf_read in vcf_reader:
            is_vcf_ok = ...  # checking whether this "vcf" example is ok
            if is_vcf_ok and counter < num_variants:
                counter += 1
                # features extraction ...
                # we create an example
                example = make_example(img=img, label=label)  # returns a SerializedExample
                yield example
    return generator
TFRecordsWriter Usage Code
def write_sharded_tfrecords(filename, path, vcf_reader, num_variants, shard_len):
    assert Path(path).exists(), "path does not exist"
    generator = get_examples_generator(num_variants=num_variants,
                                       vcf_reader=vcf_reader)
    dataset = tf.data.Dataset.from_generator(generator,
                                             output_types=tf.string,
                                             output_shapes=())
    num_shards = int(np.ceil(num_variants / shard_len))
    formatter = lambda batch_idx: f'{path}/{filename}-{batch_idx:05d}-of-' \
                                  f'{num_shards:05d}.tfrecord'
    # inspired by https://stackoverflow.com/questions/54519309/split-tfrecords-file-into-many-tfrecords-files
    for i in range(num_shards):
        shard_path = formatter(i)
        writer = tf.data.experimental.TFRecordWriter(shard_path)
        shard = dataset.shard(num_shards, index=i)
        writer.write(shard)
This is supposed to be a straightforward use of the tfrecords writer. However, it does not write any files at all. Does anyone understand why this doesn't work?
In my functions, I call the writer with tf.io.TFRecordWriter. Try changing your writer and see if it works:
writer = tf.io.TFRecordWriter
...
As a further reference, this answer helped me:
https://stackoverflow.com/a/60283571
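For example, a minimal sketch of that change (my own assumption, based on the dataset from the question and TF2 eager execution):

# iterate each shard eagerly and write the serialized examples
# with tf.io.TFRecordWriter instead of the tf.data experimental writer
for i in range(num_shards):
    shard = dataset.shard(num_shards, index=i)
    with tf.io.TFRecordWriter(formatter(i)) as writer:
        for serialized_example in shard:
            writer.write(serialized_example.numpy())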

How to read parameters of layers of .tflite model in python

I was trying to read a tflite model and pull all the parameters of the layers out.
My steps:
I generated the flatbuffers model representation by running (build flatc first):
flatc --python tensorflow/tensorflow/lite/schema/schema.fbs
The result is a tflite/ folder that contains the layer description files (*.py) and some utility files.
I successfully loaded the model (in case of an ImportError, set PYTHONPATH to point to the folder where tflite/ is):
from tflite.Model import Model

def read_tflite_model(file):
    buf = open(file, "rb").read()
    buf = bytearray(buf)
    model = Model.GetRootAsModel(buf, 0)
    return model
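For reference, the loader is then used like this (the file name here is just a placeholder):

model = read_tflite_model('model.tflite')  # hypothetical path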
I partly pulled the model and node parameters out while iterating over the nodes:
Model part:
def print_model_info(model):
    version = model.Version()
    print("Model version:", version)
    description = model.Description().decode('utf-8')
    print("Description:", description)
    subgraph_len = model.SubgraphsLength()
    print("Subgraph length:", subgraph_len)
Nodes part:
from collections import deque

def print_nodes_info(model):
    # what does this 0 mean? should it always be zero?
    subgraph = model.Subgraphs(0)
    operators_len = subgraph.OperatorsLength()
    print('Operators length:', operators_len)
    nodes = deque(subgraph.InputsAsNumpy())
    STEP_N = 0
    MAX_STEPS = operators_len
    print("Nodes info:")
    while len(nodes) != 0 and STEP_N <= MAX_STEPS:
        print("MAX_STEPS={} STEP_N={}".format(MAX_STEPS, STEP_N))
        print("-" * 60)
        node_id = nodes.pop()
        print("Node id:", node_id)
        tensor = subgraph.Tensors(node_id)
        print("Node name:", tensor.Name().decode('utf-8'))
        print("Node shape:", tensor.ShapeAsNumpy())
        # which type is it? what does it mean?
        type_of_tensor = tensor.Type()
        print("Tensor type:", type_of_tensor)
        quantization = tensor.Quantization()
        min = quantization.MinAsNumpy()
        max = quantization.MaxAsNumpy()
        scale = quantization.ScaleAsNumpy()
        zero_point = quantization.ZeroPointAsNumpy()
        print("Quantization: ({}, {}), s={}, z={}".format(min, max, scale, zero_point))
        # I do not understand it again. what is j, that I set to 0 here?
        operator = subgraph.Operators(0)
        for i in operator.OutputsAsNumpy():
            nodes.appendleft(i)
        STEP_N += 1
    print("-" * 60)
Please point me to documentation or some example of using this API.
My problems are:
I cannot find documentation for this API.
Iterating over Tensor objects does not seem possible, as a Tensor has no Inputs or Outputs methods. Also, in subgraph.Operators(j=0), I do not understand what j means. Because of that, my loop only visits two nodes: the input (once) and then the same next node over and over again.
Iterating over Operator objects certainly is possible:
Here we iterate over them all, but I cannot figure out how to map an Operator to its Tensors.
def print_in_out_info_of_all_operators(model):
    # what does this 0 mean? should it always be zero?
    subgraph = model.Subgraphs(0)
    for i in range(subgraph.OperatorsLength()):
        operator = subgraph.Operators(i)
        print('Outputs', operator.OutputsAsNumpy())
        print('Inputs', operator.InputsAsNumpy())
I do not understand how to pull parameters out of an Operator object. The BuiltinOptions method gives me a Table object, and I do not know what to map it to.
subgraph = model.Subgraphs(0)
What does this 0 mean? Should it always be zero? Obviously not, but then what is it? The id of the subgraph? If so, I'm happy. If not, please try to explain it.
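For the Operator-to-Tensor mapping, here is a sketch of what I would try (my own assumption, based on the accessor names flatc generates from schema.fbs, not a confirmed answer):

def print_operator_tensor_names(model):
    # Subgraphs(i) indexes the model's vector of subgraphs;
    # most converted models contain exactly one, hence the 0
    subgraph = model.Subgraphs(0)
    for i in range(subgraph.OperatorsLength()):
        operator = subgraph.Operators(i)
        # the integers in Inputs/Outputs index the subgraph's tensor table
        for j in range(operator.InputsLength()):
            tensor = subgraph.Tensors(operator.Inputs(j))
            print('input :', tensor.Name().decode('utf-8'))
        for j in range(operator.OutputsLength()):
            tensor = subgraph.Tensors(operator.Outputs(j))
            print('output:', tensor.Name().decode('utf-8'))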

Printing value of tensorflow variable inside object

I am trying to print the value of a TensorFlow variable that is defined inside an object. To better illustrate my issue, I am currently trying to run the monodepth library. It has two main files, dataloader and main. Basically, the dataloader iterates over a text file of image file paths:
class MonodepthDataloader(object):
    """monodepth dataloader"""

    def __init__(self, data_path, filenames_file, params, dataset, mode):
        self.data_path = data_path
        self.params = params
        self.dataset = dataset
        self.mode = mode
        self.left_image_batch = None
        self.right_image_batch = None

        input_queue = tf.train.string_input_producer([filenames_file], shuffle=False)
        line_reader = tf.TextLineReader()
        _, line = line_reader.read(input_queue)
        split_line = tf.string_split([line]).values
        ...
        left_image_path = tf.string_join([self.data_path, split_line[0]])
        left_image_o = self.read_image(left_image_path)
I am trying to print out left_image_path to verify that it is being generated correctly. However, this is inside an object instantiated by monodepth_main. That is, monodepth_main calls the dataloader with the following lines:
dataloader = MonodepthDataloader(args.data_path, args.filenames_file, params, args.dataset, args.mode)
left = dataloader.left_image_batch
As a result I can't just use sess.run(x). I have also tried using tf.Print(line, [line]) but nothing shows up.
How do I print out the value of a tensorflow variable inside an object? Specifically left_image_path?
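For what it's worth, one way to inspect such a graph tensor (a sketch under my own assumptions: the tensor is saved on the object, e.g. self.left_image_path = left_image_path, and the TF1 input queue is started before evaluating anything):

# string_input_producer needs running queue runners,
# otherwise sess.run on downstream tensors blocks forever
with tf.Session() as sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    print(sess.run(dataloader.left_image_path))  # hypothetical attribute
    coord.request_stop()
    coord.join(threads)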

getting runtime statistics with monitoredtrainingsession in tensorflow

I am trying to profile my TensorFlow code (the runtime and memory consumption of each layer in the network) by following the runtime statistics instructions here. As far as I understand, I need to create run options and run metadata like this:
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()
and pass them to sess.run
However, as I am also trying to use tf.train.MonitoredTrainingSession, I don't know if I can pass the same things to this class. A plausible approach could make use of hooks, but I do not know how to use them; I am still very new to them.
You can simply create a custom hook and pass it to the MonitoredTrainingSession. There is no need to pass your own tf.RunMetadata() instance to the run call.
Here is an example Hook which stores metadata every N steps to ckptdir:
import tensorflow as tf

class TraceHook(tf.train.SessionRunHook):
    """Hook to perform Traces every N steps."""

    def __init__(self, ckptdir, every_step=50, trace_level=tf.RunOptions.FULL_TRACE):
        self._trace = every_step == 1
        self.writer = tf.summary.FileWriter(ckptdir)
        self.trace_level = trace_level
        self.every_step = every_step

    def begin(self):
        self._global_step_tensor = tf.train.get_global_step()
        if self._global_step_tensor is None:
            raise RuntimeError("Global step should be created to use _TraceHook.")

    def before_run(self, run_context):
        if self._trace:
            options = tf.RunOptions(trace_level=self.trace_level)
        else:
            options = None
        return tf.train.SessionRunArgs(fetches=self._global_step_tensor,
                                       options=options)

    def after_run(self, run_context, run_values):
        global_step = run_values.results - 1
        if self._trace:
            self._trace = False
            self.writer.add_run_metadata(run_values.run_metadata,
                                         f'{global_step}', global_step)
        if not (global_step + 1) % self.every_step:
            self._trace = True
It checks in before_run whether it has to trace on this call and, if so, adds the RunOptions. In after_run it checks whether the next run call needs to be traced and, if so, sets _trace to True again. It also stores the metadata whenever it is available.
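A minimal usage sketch (my own addition, not from the original answer; the checkpoint directory and training op are stand-ins, and the hook requires a global step in the graph):

# the hook fetches the global step, so one must exist
global_step = tf.train.get_or_create_global_step()
train_op = tf.assign_add(global_step, 1)  # stand-in for a real training op

hooks = [TraceHook(ckptdir='/tmp/ckptdir', every_step=50),
         tf.train.StopAtStepHook(last_step=200)]
with tf.train.MonitoredTrainingSession(hooks=hooks) as sess:
    while not sess.should_stop():
        sess.run(train_op)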