I was hoping someone more familiar with the TensorFlow library could help with a simple question. I would like to know how the TensorFlow add operation is implemented.
Other TensorFlow ops are registered and have defined kernels, but where and how are basic arithmetic operations handled?
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/core/kernels
The tf.add() Python function is an automatically generated wrapper function (currently in the module tensorflow.python.ops.gen_math_ops) that adds a node to the current default TensorFlow graph.
When you run a graph containing that node (via tf.Session.run()), the TensorFlow runtime will invoke an instance of BinaryOp<Device, tensorflow::functor::add>, which contains some code that is common across all componentwise binary operations (e.g. for broadcasting and argument validation), and an invocation of tensorflow::functor::add(), which uses Eigen's scalar_sum_op to perform the addition.
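To make the split between shared code and the per-op functor concrete, here is a minimal pure-Python sketch of the pattern, not TensorFlow's actual C++. `binary_op` plays the role of BinaryOp (argument validation and broadcasting, restricted here to 1-D inputs for brevity), and the hypothetical `add_functor` and `mul_functor` play the role of functors such as functor::add; all names in the code are illustrative.

```python
from typing import Callable, List, Sequence


def add_functor(x: float, y: float) -> float:
    """Hypothetical stand-in for tensorflow::functor::add: only the arithmetic."""
    return x + y


def mul_functor(x: float, y: float) -> float:
    """A second functor that reuses the same shared machinery."""
    return x * y


def binary_op(functor: Callable[[float, float], float],
              a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Shared part of every componentwise binary op (1-D inputs only here):
    validate shapes, broadcast a length-1 operand, then apply the functor."""
    if len(a) != len(b) and 1 not in (len(a), len(b)):
        raise ValueError(f"incompatible shapes: [{len(a)}] vs. [{len(b)}]")
    n = max(len(a), len(b))
    # Broadcasting: repeat a length-1 operand to match the other operand.
    lhs = list(a) * n if len(a) == 1 else list(a)
    rhs = list(b) * n if len(b) == 1 else list(b)
    return [functor(x, y) for x, y in zip(lhs, rhs)]


print(binary_op(add_functor, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))  # [11.0, 22.0, 33.0]
print(binary_op(add_functor, [1.0, 2.0, 3.0], [5.0]))               # [6.0, 7.0, 8.0]
print(binary_op(mul_functor, [1.0, 2.0, 3.0], [2.0]))               # [2.0, 4.0, 6.0]
```

The point of the design is that adding a new elementwise op only means writing a new functor; validation and broadcasting are written once.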
I'm trying to trace how TensorFlow actually uses cuDNN to implement different operators. I'll use Tensorflow's conv2d operator as an example (tf.nn.conv2d).
As a reference to give me an idea of what I should be looking for, as well as how to implement a conv2d operator in cuDNN, I've been reading this blog post: Peter Goldsborough - Convolutions with cuDNN.
So, based on this answer (Tensorflow: Where is tf.nn.conv2d Actually Executed?), TensorFlow will (roughly; I recognize there are some other branches that could be taken) call down this stack:
tensorflow/python/ops/nn_ops.py:conv2d_v2
tensorflow/python/ops/nn_ops.py:conv2d
gen_nn_ops.py:conv2d
This file is generated when TF is built (see the answer to "looking for source code of from gen_nn_ops in tensorflow")
Then we call down into the C++ layer...
...and we end up in the Conv2DOp class (tensorflow/core/kernels/conv_ops.cc:Conv2DOp)
Now I assume (and someone please correct me if I'm wrong) that if we are correctly using TF with cuDNN, we will then be launching LaunchConv2DOp<GPUDevice, T>::operator().
Towards the end of this operator's implementation, where they start defining a se::dnn::BatchDescriptor (see here) and later run LaunchAutotunedConv (see here), I think they are making use of their higher abstraction levels, and somewhere further down those levels they interface with the cuDNN APIs.
Now I expected to find some sort of communication here between, for example, se::dnn::BatchDescriptor or LaunchAutotunedConv and either the cuDNN-specific methods found in tensorflow/stream_executor/cuda/cuda_dnn.cc, or any of the auto-generated stub files that are used to wrap the cuDNN APIs based on the cuDNN version (e.g., tensorflow/stream_executor/cuda/cudnn_8_0.inc). However, I can find no link between these two levels of abstraction.
Am I missing something? At what point does Tensorflow actually make calls to the cuDNN APIs from their C++ operator implementations?
I am trying to understand how the internal flow goes in MXNet when we call forward. Is there any way to get the source code of MXNet?
This really depends on what your symbolic graph looks like. I assume you use MXNet with Python (Python documentation). There you can choose to use the MXNet symbol library or the Gluon library.
Now, you were asking whether one can inspect the code, and yes, you can find it on GitHub. The folder python contains the Python interface and src contains all MXNet sources. What happens on forward is ultimately determined by the MXNet execution engine, which tracks the input/output dependencies of operators and neural network layers and allocates memory on the different devices (CPU, GPUs). There is a general architecture documentation for this.
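The dependency-tracking part of such an engine can be illustrated with a toy scheduler. This is a hypothetical pure-Python sketch, not MXNet's engine API: each operation declares the variables it reads and writes, and it is placed in the earliest "wave" after every earlier operation it conflicts with (read-after-write, write-after-write, or write-after-read). Operations in the same wave have no conflicts and could run in parallel. Memory allocation is left out.

```python
from typing import Dict, List, Sequence, Set, Tuple

# (operation name, variables it reads, variables it writes); names assumed unique.
Op = Tuple[str, Set[str], Set[str]]


def schedule(ops: Sequence[Op]) -> List[List[str]]:
    """Group ops (given in push order) into waves; ops in one wave don't conflict."""
    level: Dict[str, int] = {}
    for i, (name, reads, writes) in enumerate(ops):
        # An op must wait for every earlier op it conflicts with:
        # read-after-write, write-after-write, or write-after-read.
        waits_on = [
            level[prev]
            for prev, prev_reads, prev_writes in ops[:i]
            if prev_writes & (reads | writes) or prev_reads & writes
        ]
        level[name] = 1 + max(waits_on, default=-1)
    waves: List[List[str]] = [[] for _ in range(max(level.values(), default=-1) + 1)]
    for name, _, _ in ops:
        waves[level[name]].append(name)
    return waves


ops = [
    ("load_a", set(), {"a"}),
    ("load_b", set(), {"b"}),
    ("add", {"a", "b"}, {"c"}),
    ("relu", {"c"}, {"d"}),
    ("overwrite_a", set(), {"a"}),  # must wait until "add" has read "a"
]
print(schedule(ops))
# [['load_a', 'load_b'], ['add'], ['relu', 'overwrite_a']]
```

Note that overwrite_a lands next to relu, not next to load_a: it only has to wait for the reader of "a", which is what lets an engine overlap independent work.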
I suppose you are interested in what each and every operation does, such as argmax (reduction), tanh (unary math operation) or convolution (complex neural network operation). You can find this in the operator folder of MXNet. It would take a whole documentation of its own, and there is a special forum for MXNet specifics here, but I will give a short orientation:
Each operation in a (symbolic) execution graph needs a defined forward and backward operation. It also needs to define its output shape, so that it can be chained with other operations. If that operator needs weights, it needs to define the amount of memory it requires, so MXNet can allocate it.
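Those requirements can be summarized as an interface. The following is a hypothetical pure-Python sketch (the class DenseOpSketch and its method names are illustrative, not MXNet's C++ operator API) of a fully connected operator, y = W x, that provides all four pieces: output-shape inference, a declaration of the weight memory it needs, forward, and backward.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]
Matrix = List[List[float]]


@dataclass
class DenseOpSketch:
    """Hypothetical fully connected op: y = W x for a 1-D input x."""

    num_hidden: int

    def infer_shape(self, in_shape: Shape) -> Shape:
        # The output shape lets the graph chain this op with the next one.
        if len(in_shape) != 1:
            raise ValueError(f"expected a 1-D input, got shape {in_shape}")
        return (self.num_hidden,)

    def weight_shapes(self, in_shape: Shape) -> Dict[str, Shape]:
        # Tells the framework how much parameter memory to allocate.
        return {"weight": (self.num_hidden, in_shape[0])}

    def forward(self, x: List[float], weight: Matrix) -> List[float]:
        # y_i = sum_j W_ij * x_j
        return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in weight]

    def backward(self, x: List[float], weight: Matrix,
                 grad_out: List[float]) -> Tuple[List[float], Matrix]:
        # dL/dx_j = sum_i W_ij * g_i ;  dL/dW_ij = g_i * x_j
        grad_x = [sum(weight[i][j] * grad_out[i] for i in range(len(weight)))
                  for j in range(len(x))]
        grad_w = [[g_i * x_j for x_j in x] for g_i in grad_out]
        return grad_x, grad_w


op = DenseOpSketch(num_hidden=2)
W = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
x = [1.0, 2.0, 3.0]
print(op.infer_shape((3,)), op.weight_shapes((3,)))  # (2,) {'weight': (2, 3)}
print(op.forward(x, W))                              # [7.0, 5.0]
print(op.backward(x, W, [1.0, 1.0]))  # ([1.0, 1.0, 3.0], [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]])
```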
Each operation can have several implementations: a) CPU, b) GPU (CUDA), c) a wrapper around cuDNN.
All unary math operations follow the same pattern, so they are all defined in a similar way in mshadow_op.h (e.g. relu).
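The "same pattern" for unary ops can be sketched like this: each op is a tiny class exposing one static `map` function (mirroring, as far as I know, the structs with a static Map function in mshadow_op.h), and a single generic kernel per device applies it elementwise. This is a pure-Python illustration only; `unary_compute` and the `KERNELS` table are hypothetical names, and only a CPU path is shown.

```python
import math
from typing import Callable, Dict, List, Sequence


class Relu:
    """Unary op as a bare functor: only the scalar math lives here."""

    @staticmethod
    def map(a: float) -> float:
        return a if a > 0.0 else 0.0


class Tanh:
    @staticmethod
    def map(a: float) -> float:
        return math.tanh(a)


def unary_compute_cpu(op: type, data: Sequence[float]) -> List[float]:
    # One generic elementwise kernel serves every unary op.
    return [op.map(a) for a in data]


# Per-device kernel table; a GPU entry would launch a CUDA kernel instead.
KERNELS: Dict[str, Callable[[type, Sequence[float]], List[float]]] = {
    "cpu": unary_compute_cpu,
}


def unary_compute(device: str, op: type, data: Sequence[float]) -> List[float]:
    kernel = KERNELS.get(device)
    if kernel is None:
        raise NotImplementedError(f"no {device} kernel registered in this sketch")
    return kernel(op, data)


print(unary_compute("cpu", Relu, [-1.0, 0.0, 2.5]))  # [0.0, 0.0, 2.5]
print(unary_compute("cpu", Tanh, [0.0]))             # [0.0]
```

Adding another unary op then only means adding another small class with a `map`; the device kernels stay untouched.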
This is all I can tell you based on your quite broad question.
Why does the basic static, compiled computation graph structure of TF (as opposed to a dynamic graph) necessitate a dedicated while-loop node and not allow the use of "regular" Python control flow expressions?
Thanks.
TensorFlow builds the computational graph and makes it static (unchangeable) for efficiency. Once it's finalized, telling the TensorFlow graph to do something is like sending some input to a separate program which you can no longer change besides passing in different inputs. So the TensorFlow graph at that point has no knowledge of the Python control flow. It just runs when called. Because of this, it needs to explicitly know ahead of time where you want to add a while loop inside the TensorFlow graph. You can, however, still use Python control flow and just call the TensorFlow graph as though it were a specific function.
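A toy deferred-execution graph makes the point concrete. In this hypothetical pure-Python sketch (Tensor, constant, placeholder, while_loop and run are illustrative names, not TensorFlow's API), building a node records a computation but produces no value. A Python `for` loop therefore runs once, at build time, and simply unrolls a fixed number of steps into the graph, while a Python `if` or `while` on a symbolic condition fails because there is nothing to test yet. A loop whose trip count depends on runtime input has to live inside the graph as its own node.

```python
import operator
from typing import Callable, Dict

Feeds = Dict[str, float]


class Tensor:
    """Symbolic value: holds a recipe, computes nothing until run()."""

    def __init__(self, fn: Callable[[Feeds], object]) -> None:
        self.fn = fn

    def _binary(self, other: object, f: Callable) -> "Tensor":
        rhs = other if isinstance(other, Tensor) else constant(other)
        return Tensor(lambda feeds: f(self.fn(feeds), rhs.fn(feeds)))

    def __add__(self, other: object) -> "Tensor":
        return self._binary(other, operator.add)

    def __mul__(self, other: object) -> "Tensor":
        return self._binary(other, operator.mul)

    def __lt__(self, other: object) -> "Tensor":
        return self._binary(other, operator.lt)

    def __bool__(self) -> bool:
        # At build time there is no value yet, so Python's if/while can't branch on it.
        raise TypeError("a symbolic Tensor has no truth value at graph-build time")


def constant(value: object) -> Tensor:
    return Tensor(lambda feeds: value)


def placeholder(name: str) -> Tensor:
    return Tensor(lambda feeds: feeds[name])


def while_loop(cond: Callable[[Tensor], Tensor],
               body: Callable[[Tensor], Tensor], init: Tensor) -> Tensor:
    """A dedicated loop node: cond and body are re-evaluated at run time."""
    def run_loop(feeds: Feeds) -> object:
        value = init.fn(feeds)
        while cond(constant(value)).fn(feeds):
            value = body(constant(value)).fn(feeds)
        return value
    return Tensor(run_loop)


def run(t: Tensor, feeds: Feeds) -> object:
    return t.fn(feeds)


x = placeholder("x")

# Python control flow runs once, while building: this unrolls into x + 1 + 1 + 1.
y = x
for _ in range(3):
    y = y + 1
print(run(y, {"x": 1.0}))  # 4.0

# A data-dependent loop must be a node in the graph.
doubled = while_loop(lambda v: v < 100, lambda v: v * 2, x)
print(run(doubled, {"x": 3.0}))  # 192.0
```

Writing `while x < 100:` directly would raise the TypeError from `__bool__`, which is the toy version of why a dedicated loop node is needed.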
I want to parse a pre-trained TensorFlow model. For example, I want to get the full list of operation nodes, including their names and dependencies, for a given model.
So first I searched the Java API, and apparently few APIs are supported by the Java interface. Then I looked at the C++ API, but failed to find the right APIs.
The reason I don't use Python is that I need to do this on Android devices.
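The TensorFlow graph is stored as a GraphDef protocol buffer. You should be able to build a Java version of this protocol buffer and use it to inspect the stored graph. This will give you the list of operations and their dependencies, but not necessarily the values of the weights, which are usually kept in a separate checkpoint unless the graph has been frozen (variables converted to constants).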
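Once the GraphDef is parsed, dependencies come from each NodeDef's `input` strings, which have the forms "node", "node:1" (a specific output of that node) or "^node" (a control dependency). The sketch below shows that walk in Python, with plain dicts standing in for parsed NodeDef messages; the helper names are hypothetical. The same logic should carry over to the Java classes that protoc generates from graph.proto (a repeated `input` field gets a `getInputList()` accessor there), but treat that as an assumption to check against your build.

```python
from typing import Dict, List, Tuple


def parse_input(ref: str) -> Tuple[str, int, bool]:
    """Split a NodeDef input string into (producer node, output index, is_control).

    Control edges carry no data, so their output index is reported as -1 here.
    """
    if ref.startswith("^"):
        return ref[1:], -1, True
    name, sep, port = ref.rpartition(":")
    if sep and port.isdigit():
        return name, int(port), False
    return ref, 0, False


def list_ops_and_deps(nodes: List[dict]) -> Tuple[List[Tuple[str, str]], Dict[str, List[str]]]:
    """Return (name, op type) per node plus the producers each node depends on.

    `nodes` are plain dicts standing in for parsed NodeDef messages.
    """
    ops = [(n["name"], n["op"]) for n in nodes]
    deps = {n["name"]: sorted({parse_input(r)[0] for r in n.get("input", [])})
            for n in nodes}
    return ops, deps


graph = [
    {"name": "x", "op": "Placeholder"},
    {"name": "w", "op": "Const"},
    {"name": "mm", "op": "MatMul", "input": ["x", "w"]},
    {"name": "parts", "op": "Unpack", "input": ["mm"]},
    {"name": "out", "op": "Identity", "input": ["parts:1", "^w"]},
]
ops, deps = list_ops_and_deps(graph)
print(ops[2])        # ('mm', 'MatMul')
print(deps["out"])   # ['parts', 'w']
```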
The TensorFlow whitepaper says that it has a core written in C++. Does this mean that a computation graph specified in Python is completely transformed into equivalent C++ code for execution? If so, is it possible to extract the generated intermediate code? My use case is to observe the calls to the cuDNN library for a specified computation graph.
You can see the intermediate format if you do print(tf.get_default_graph().as_graph_def()). To observe cuDNN calls, you could perhaps add some print statements to tensorflow/stream_executor/cuda/cuda_dnn.cc.