The interaction between the TensorFlow Python frontend and its C++ core

The TensorFlow whitepaper says that it has a core written in C++. Does that mean a computation graph specified in Python is completely transformed into equivalent C++ code for execution? If so, is it possible to extract the generated intermediate code? My use case is to observe the calls to the cuDNN library for a specified computation graph.

You can see the intermediate format if you do print(tf.get_default_graph().as_graph_def()). To observe cuDNN calls, you could add some print statements to tensorflow/stream_executor/cuda/cuda_dnn.cc.
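A minimal sketch of dumping that intermediate format (this uses the TF 1.x-style API; under TF 2.x, go through tf.compat.v1 as shown):

    import tensorflow.compat.v1 as tf
    tf.disable_eager_execution()

    a = tf.constant(1.0, name='a')
    b = tf.multiply(a, 2.0, name='b')

    # The GraphDef protocol buffer lists every node, its op type,
    # and its inputs -- this is the graph the C++ runtime executes.
    print(tf.get_default_graph().as_graph_def())

As an alternative to recompiling with print statements, cuDNN itself (version 7.x and later) can log every API call it receives if the environment variables CUDNN_LOGINFO_DBG=1 and CUDNN_LOGDEST_DBG=stdout are set before the process loads the library.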

Related

How do TensorFlow C++ operator implementations interface with cuDNN (using conv2d as an example)?

I'm trying to trace how TensorFlow actually uses cuDNN to implement different operators. I'll use TensorFlow's conv2d operator as an example (tf.nn.conv2d).
As a reference to give me an idea of what I should be looking for, as well as how to implement a conv2d operator in cuDNN, I've been reading this blog post: Peter Goldsborough - Convolutions with cuDNN.
So, based on this answer (Tensorflow: Where is tf.nn.conv2d Actually Executed?), TensorFlow will (roughly; I recognize there are other branches that could be taken) call down this stack:
tensorflow/python/ops/nn_ops.py:conv2d_v2
tensorflow/python/ops/nn_ops.py:conv2d
gen_nn_ops.py:conv2d
This file is generated when TF is built (see: looking for source code of from gen_nn_ops in tensorflow)
Then we call down into the C++ layer...
...and we end up in the Conv2DOp class (tensorflow/core/kernels/conv_ops.cc:Conv2DOp)
Now I assume (and someone please correct me if I'm wrong) that if we are using TF with cuDNN correctly, we will then be launching LaunchConv2DOp<GPUDevice, T>::operator().
Towards the end of this operator implementation, around where they start defining a se::dnn::BatchDescriptor and later run LaunchAutotunedConv, I think they are making use of their higher abstraction levels, but eventually, down through these levels, they interface with the cuDNN APIs.
Now I expected to find some sort of communication here between, for example, se::dnn::BatchDescriptor or LaunchAutotunedConv and either the cuDNN-specific methods found in tensorflow/stream_executor/cuda/cuda_dnn.cc, or any of the auto-generated stub files that wrap the cuDNN APIs for a given cuDNN version (e.g., tensorflow/stream_executor/cuda/cudnn_8_0.inc). However, I can find no link between these two levels of abstraction.
Am I missing something? At what point does TensorFlow actually make calls to the cuDNN APIs from its C++ operator implementations?
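The Python half of that stack can at least be confirmed from an interactive session; a minimal sketch (module paths may vary by TF version, and the C++ half below gen_nn_ops is not visible this way):

    import tensorflow as tf
    from tensorflow.python.ops import gen_nn_ops

    # The public symbol resolves to nn_ops.py ...
    print(tf.nn.conv2d.__module__)       # expected: tensorflow.python.ops.nn_ops
    # ... which forwards to the build-time-generated wrapper; that
    # wrapper adds a node whose registered "Conv2D" kernel is C++.
    print(gen_nn_ops.conv2d.__module__)  # expected: tensorflow.python.ops.gen_nn_ops

Everything below gen_nn_ops happens in C++, so the practical way to observe the eventual cuDNN calls is cuDNN's own API logging (mentioned in the first answer above) or print statements in cuda_dnn.cc.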

Tensorflow Federated in C++

I'm trying to find a way to utilise TensorFlow Federated in C++. I know it's possible for regular TensorFlow with the Core API; however, I can't find a way for Federated. If it's not possible, suggestions for workarounds would be highly appreciated!
It would be helpful to know which part(s) of TFF you want to use in C++, and what your use case is, as that will influence the answer:
The APIs for defining federated computations (tff.federated_computation); as with TensorFlow, these are pretty tightly coupled to Python.
Executing serialized computations (stored as instances of computation.proto). This can conceptually be done using a purely C++ API, though TFF doesn't currently provide such a runtime.
TFF has since implemented a C++ runtime along the lines of Brendan's answer. TFF's CC directory contains the code; most of the implementation is in the executors directory.
These APIs can certainly be called from C++ code; see, e.g., the implementation of TFF's RunWorker, which starts and runs a server that can execute TFF computations.
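To make the first bullet concrete, here is the canonical TFF hello-world; a minimal sketch assuming tensorflow_federated is installed (the decorator traces the Python function into a serialized computation.proto, which is exactly what a C++ runtime would then execute):

    import tensorflow_federated as tff

    @tff.federated_computation
    def hello_world():
        # Traced once at definition time; the result is stored as a
        # language-independent computation.proto.
        return 'Hello, World!'

    # Invoking it hands that proto to the configured (Python) runtime.
    print(hello_world())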

Why is TensorFlow while_loop node required?

Why does the basic static, compiled computation-graph structure of TF (as opposed to a dynamic graph) necessitate a dedicated while-loop node rather than enabling the use of "regular" Python control-flow expressions?
Thanks.
TensorFlow builds the computational graph and makes it static (unchangeable) for efficiency. Once it's finalized, telling the TensorFlow graph to do something is like sending input to a separate program that you can no longer change, other than by passing in different inputs. So the TensorFlow graph at that point has no knowledge of the Python control flow; it just runs when called. Because of this, it needs to know explicitly, ahead of time, where you want a while loop inside the TensorFlow graph. You can, however, still use Python control flow and just call the TensorFlow graph as though it were a specific function.
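A minimal sketch of the distinction (this runs under TF 2.x, where tf.while_loop is still available; in TF 1.x graph mode the same code builds nodes instead of executing eagerly):

    import tensorflow as tf

    # Graph-level loop: one While construct whose trip count is decided
    # when the graph runs, so the graph itself stays small.
    i = tf.constant(0)
    result = tf.while_loop(cond=lambda i: i < 10,
                           body=lambda i: i + 1,
                           loop_vars=[i])

    # Python-level loop: executes at graph-construction/trace time and
    # simply unrolls into ten separate add nodes; the finished graph
    # contains no loop at all, and the trip count is frozen at 10.
    x = tf.constant(0)
    for _ in range(10):
        x = x + 1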

How is tensorflow tf.add implemented?

I was hoping someone more familiar with the TensorFlow library could help with a simple question: how is the TensorFlow add operation implemented?
Other TensorFlow ops have registered and defined kernels, but where/how are basic arithmetic operations handled?
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/core/kernels
The tf.add() Python function is an automatically generated wrapper function (currently in the module tensorflow.python.ops.gen_math_ops) that adds a node to the current default TensorFlow graph.
When you run a graph containing that node (via tf.Session.run()), the TensorFlow runtime will invoke an instance of BinaryOp<Device, tensorflow::functor::add>, which contains some code that is common across all componentwise binary operations (e.g., broadcasting and argument validation), plus an invocation of tensorflow::functor::add, which uses Eigen's scalar_sum_op to perform the addition.
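A minimal sketch of that two-step structure from the Python side (TF 1.x-style graph mode, matching the tf.Session-era answer above; the exact op type string varies by version):

    import tensorflow.compat.v1 as tf
    tf.disable_eager_execution()

    a = tf.constant(1.0)
    b = tf.constant(2.0)
    c = tf.add(a, b)

    # Step 1: the generated wrapper only added a node to the graph.
    print(c.op.type)        # e.g. "Add" or "AddV2"

    # Step 2: the registered C++ kernel runs only when the graph does.
    with tf.Session() as sess:
        print(sess.run(c))  # 3.0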

How to parse tensorflow model with C++ API

I want to parse a pre-trained TensorFlow model. For example, given a model, I want to get the full list of operation nodes, including their names and dependencies.
So first I searched the Java API, and apparently few APIs are supported by the Java interface. Then I looked for the right C++ APIs, but failed to find them.
The reason I don't use Python is that I need to do this on Android devices.
The TensorFlow graph is stored as a GraphDef protocol buffer. You should be able to build a Java version of this and use it to inspect the stored graph. It will have the lists of operations and their dependencies, but will not have the values of the weights.
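For illustration, here is the parsing step in Python; the same GraphDef proto can be compiled for Java or C++, and the field access translates directly. The model.pb path is a placeholder for your serialized graph:

    from tensorflow.core.framework import graph_pb2

    # Parse a serialized GraphDef (a frozen/exported graph).
    graph_def = graph_pb2.GraphDef()
    with open('model.pb', 'rb') as f:
        graph_def.ParseFromString(f.read())

    # Each NodeDef carries the op name, op type, and input dependencies.
    for node in graph_def.node:
        print(node.name, node.op, list(node.input))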