Redundancies in tf.keras.backend and tensorflow libraries - tensorflow

I have been working in TensorFlow for about a year now, and I am transitioning from TF 1.x to TF 2.0, and I am looking for some guidance on how to use the tf.keras.backend library in TF 2.0. I understand that the transition to TF 2.0 is supposed to remove a lot of redundancies in modeling and building graphs, since there were many ways to create equivalent layers in earlier TensorFlow versions (and I'm insanely grateful for that change!), but I'm getting stuck on understanding when to use tf.keras.backend, because the operations appear redundant with other TensorFlow libraries.
I see that some of the functions in tf.keras.backend are redundant with other TensorFlow libraries. For instance, tf.keras.backend.abs and tf.math.abs are not aliases (or at least, they're not listed as aliases in the documentation), but both take the absolute value of a tensor. After examining the source code, it looks like tf.keras.backend.abs calls the tf.math.abs function, and so I really do not understand why they are not aliases. Other tf.keras.backend operations don't appear to be duplicated in TensorFlow libraries, but it looks like there are TensorFlow functions that can do equivalent things. For instance, tf.keras.backend.cast_to_floatx can be substituted with tf.dtypes.cast as long as you explicitly specify the dtype. I am wondering two things:
when is it best to use the tf.keras.backend library instead of the equivalent TensorFlow functions?
is there a difference in these functions (and other equivalent tf.keras.backend functions) that I am missing?

Short answer: Prefer tensorflow's native API such as tf.math.* to thetf.keras.backend.* API wherever possible.
Longer answer:
The tf.keras.backend.* API can be mostly viewed as a remnant of the keras.backend.* API. The latter is a design that serves the "exchangeable backend" design of the original (non-TF-specific) keras. This relates to the historical aspect of keras, which supports multiple backend libraries, among which tensorflow used to be just one of them. Back in 2015 and 2016, other backends, such as Theano and MXNet were quite popular too. But going into 2017 and 2018, tensorflow became the dominant backend of keras users. Eventually keras became a part of the tensorflow API (in 2.x and later minor versions of 1.x). In the old multi-backend world, the backend.* API provides a backend-independent abstraction over the myriad of supported backend. But in the tf.keras world, the value of the backend API is much more limited.
The various functions in tf.keras.backend.* can be divided into a few categories:
Thin wrappers around the equivalent or mostly-equivalent tensorflow native API. Examples: tf.keras.backend.less, tf.keras.backend.sin
Slightly thicker wrappers around tensorflow native APIs, with more features included. Examples: tf.keras.backend.batch_normalization, tf.keras.backend.conv2d(https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/keras/backend.py#L4869). They often perform proprocessing and implement other logics, which make your life easier than using native tensorflow API.
Unique functions that don't have equivalent in the native tensorflow API. Examples: tf.keras.backend.rnn, tf.keras.backend.set_learning_phase
For category 1, use native tensorflow APIs. For categories 2 and 3, you may want to use the tf.keras.backend.* API, as long as you can find it in the documentation page: https://www.tensorflow.org/api_docs/python/, because the documented ones have backward compatibility guarantees, so that you don't need to worry about a future version of tensorflow removing it or changing its behavior.

Related

How do TensorFlow C++ operator implementations interface with cuDNN (using conv2d as an example)?

I'm trying to trace how TensorFlow actually uses cuDNN to implement different operators. I'll use Tensorflow's conv2d operator as an example (tf.nn.conv2d).
As a reference to give me an idea of what I should be looking for, as well as how to implement a conv2d operator in cuDNN, I've been reading this blog post: Peter Goldsborough - Convolutions with cuDNN.
So based on this answer (ANSWER: Tensorflow: Where is tf.nn.conv2d Actually Executed?), Tensorflow will (roughly, I recognize there are some other branches that could be taken) call down this stack:
tensorflow/python/ops/nn_ops.py:conv2d_v2
tensorflow/python/ops/nn_ops.py:conv2d
gen_nn_ops.py:conv2d
This file is generated when TF is built (see ANSWER: looking for source code of from gen_nn_ops in tensorflow)
Then we call down into the C++ layer...
...and we end up in the Conv2DOP class (tensorflow/core/kernels/conv_ops.cc:Conv2DOP)
Now I assume (and someone please correct me if I'm wrong), that if we are correctly using TF with cuDNN, we will then be launching a LaunchConv2DOp<GPUDevice, T>::operator().
Towards the end of this operator implementation, around when they start defining a se::dnn::BatchDescriptor (see here), and later when they run LaunchAutotunedConv (see here), this is when I think they are basically making use of their higher abstraction levels, but eventually down these levels they interface with the cuDNN APIs.
Now I expected to find some sort of communication here between, for example, se::dnn::BatchDescriptor or LaunchAutotunedConv and either the cuDNN specific methods found in tensorflow/stream_executor/cuda/cuda_dnn.cc, or any of the auto-generated stub files that are used to wrap cuDNN APIs based on the cuDNN version (e.g., tensorflow/stream_executor/cuda/cudnn_8_0.inc. However, I can find no link between these 2 levels of abstraction.
Am I missing something? At what point does Tensorflow actually make calls to the cuDNN APIs from their C++ operator implementations?

Tensorflow Federated in C++

I'm trying to find a way to utilise Tensorflow Federated in C++. I know it's possible to do it for the regular Tensorflow with the Core API, however I can't find a way for Federated. If it's not possible suggestions for workarounds would be highly appreciated!
It would be helpful to know which part(s) of TFF you want to use in C++, and what your use case is, as that will influence the answer:
The APIs for defining federated computations (tff.federated_computation); as with TensorFlow, these are pretty tightly coupled to Python.
Executing serialized computations (stored as instances of computation.proto). This can conceptually be done using a purely C++ API, though TFF doesn't currently provide such a runtime.
TFF has since implemented a C++ runtime along the lines of Brendan's answer. TFF's CC directory contains the code; most of the implementation is in the executors directory.
These APIs can certainly be called from C++ code; see, e.g., the implementation of TFF's RunWorker, which starts and runs a server that can execute TFF computations.

TensorFlow 2 documentation for graph-mode

When I check the TensorFlow documentation (Python API docs or guides), it all seems exclusively for eager-mode. Almost all the examples don't even mention this.
For some specific operation/function like tf.nn.relu, this does not really make any difference.
However, for more complex things like tf.data (Dataset API, guide), it likely makes a difference. Esp all the examples would be different for graph mode.
Where can I find recent documentation (API references, guides, tutorials, examples) for graph mode?
(My current fallback is to check latest TF 1 documentation. But at some point, this will become more and more outdated.)
Or is graph mode deprecated so far that documentation for it seems not necessary anymore?
Graph mode in TensorFlow 2 is different from graph mode in TensorFlow 1. Instead of using sessions and placeholders, TensorFlow 2 uses functions annotated with tf.function. The eager mode examples you see can be executed in graph mode by wrapping them within a tf.function.
If you prefer to use the TensorFlow 1 style of graph mode with sessions and placeholders, you can still do so in TensorFlow 2 by using the tf.compat.v1 module. The API docs in that module describe the TensorFlow 1 style of graph mode. You can find archived guides about TensorFlow 1 graph mode at https://github.com/tensorflow/docs/tree/master/site/en/r1/guide

Tensorflow: Difference between tf.contrib.slim and tf.layers

When should I use tf.contrib.slim and when tf.layers?
As defined in https://www.tensorflow.org/api_docs/python/tf/contrib the contrib module contains "volatile or experimental code".
Generally speaking, the module tf.contrib contains contributed code. This code typically requires some additional tests and may encounter some significant changes before it is finally integrated into the TensorFlow core. In particular, this code is not supported by the Tensorflow team and may be modified or completely removed at any time without guarantees.
For this reason, in general I prefer to use tf.layers since it is more stable in terms of code support, but obviously some implementations intf.contrib can be useful sometimes (i.e. when there are new kind of layers, optimizers etc. and you cannot or don't want to write them yourself, sometimes these libraries are quickly updated).

TFLearn, tf.contrib.learn or tf.estimator?

I've been tooling around with Tensorflow and TFLearn for a few months. I've made some progress. However, I was expecting to be able to construct a functioning scikit-learn type Estimator as a TFLearn.DNN. I can fit, and I can predict, but I can't do cross-validation because evaluate is failing for me. TensorFlow is throwing:
ValueError: Cannot use the given session to evaluate tensor: the tensor's graph is different from the session's graph.
when I call evaluate. I thought the whole point of the TFLearn API was to abstract things like session management away from my code.
I have asked questions about problems I've had with TFLearn in several forums, including on the project's GitHub page. Unfortunately, I'm not getting any answers.
A few days ago, suddenly I encountered the tf.contrib.learn namespace. I'm seeing a lot of overlap between those classes and TFLearn. Then, I also found the tf.estimator class.
Finally, I just figured out that tensorflow.contrib sub-packages are third-party contributions. This leads me to wonder whether the original TFLearn is simply being absorbed into the larger TensorFlow package. Which direction is the code flowing?
I don't care what I use, as long as I get all the functionality of a scikit-learn estimator object.
I think it's best to use the official sub-modules of TensorFlow like tf.data and tf.estimator. They should be well maintained and features are added quickly.
For instance, #mrry seems in charge of tf.data and the module is very clean, easy to use and compatible with tf.estimator.
The module tf.estimator is a bit less clear, and comes from tf.contrib.learn. Don't take my word for it but I think tf.estimator will slowly replace tf.contrib.learn and it should be the official high-level API for TensorFlow (along with tf.keras).
You can find more information in the official blog post, where they explain the relationship between all modules.