TensorFlow 2 documentation for graph mode

When I check the TensorFlow documentation (Python API docs or guides), it all seems to be written exclusively for eager mode; almost none of the examples even mention graph mode.
For a specific operation/function like tf.nn.relu, this does not really make any difference.
However, for more complex things like tf.data (Dataset API, guide), it likely makes a difference; in particular, all the examples would look different in graph mode.
Where can I find recent documentation (API references, guides, tutorials, examples) for graph mode?
(My current fallback is to check the latest TF 1 documentation. But at some point, this will become more and more outdated.)
Or is graph mode so deprecated by now that documentation for it no longer seems necessary?

Graph mode in TensorFlow 2 is different from graph mode in TensorFlow 1. Instead of using sessions and placeholders, TensorFlow 2 uses functions annotated with tf.function. The eager mode examples you see can be executed in graph mode by wrapping them within a tf.function.
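For example, a minimal sketch: the same function runs eagerly on its own and as a graph once wrapped in tf.function, and a tf.data.Dataset can be iterated inside such a function as well.
import tensorflow as tf

def step(x):
    # an ordinary eager-mode computation
    return tf.nn.relu(x) * 2.0

graph_step = tf.function(step)  # traced into a graph on the first call

x = tf.constant([-1.0, 0.0, 3.0])
print(step(x))        # executed eagerly, op by op
print(graph_step(x))  # executed as a single TensorFlow graph

@tf.function
def sum_batches(ds):
    total = tf.constant(0, dtype=tf.int64)
    for batch in ds:  # dataset iteration is traced into the graph as well
        total += tf.reduce_sum(batch)
    return total

print(sum_batches(tf.data.Dataset.range(10).batch(4)))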
If you prefer to use the TensorFlow 1 style of graph mode with sessions and placeholders, you can still do so in TensorFlow 2 by using the tf.compat.v1 module. The API docs in that module describe the TensorFlow 1 style of graph mode. You can find archived guides about TensorFlow 1 graph mode at https://github.com/tensorflow/docs/tree/master/site/en/r1/guide
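A minimal sketch of that TF1-style usage under TF2 (you have to opt out of eager execution explicitly first):
import tensorflow as tf

tf.compat.v1.disable_eager_execution()  # opt back into TF1-style graph building

x = tf.compat.v1.placeholder(tf.float32, shape=[None])
y = tf.nn.relu(x)

with tf.compat.v1.Session() as sess:
    print(sess.run(y, feed_dict={x: [-1.0, 2.0]}))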

Related

Redundancies in tf.keras.backend and tensorflow libraries

I have been working with TensorFlow for about a year now, and I am transitioning from TF 1.x to TF 2.0. I am looking for some guidance on how to use the tf.keras.backend library in TF 2.0. I understand that the transition to TF 2.0 is supposed to remove a lot of redundancies in modeling and building graphs, since there were many ways to create equivalent layers in earlier TensorFlow versions (and I'm insanely grateful for that change!), but I'm getting stuck on understanding when to use tf.keras.backend, because its operations appear redundant with other TensorFlow libraries.
I see that some of the functions in tf.keras.backend are redundant with other TensorFlow libraries. For instance, tf.keras.backend.abs and tf.math.abs are not aliases (or at least, they're not listed as aliases in the documentation), but both take the absolute value of a tensor. After examining the source code, it looks like tf.keras.backend.abs calls the tf.math.abs function, and so I really do not understand why they are not aliases. Other tf.keras.backend operations don't appear to be duplicated in TensorFlow libraries, but it looks like there are TensorFlow functions that can do equivalent things. For instance, tf.keras.backend.cast_to_floatx can be substituted with tf.dtypes.cast as long as you explicitly specify the dtype. I am wondering two things:
when is it best to use the tf.keras.backend library instead of the equivalent TensorFlow functions?
is there a difference in these functions (and other equivalent tf.keras.backend functions) that I am missing?
Short answer: Prefer tensorflow's native API, such as tf.math.*, to the tf.keras.backend.* API wherever possible.
Longer answer:
The tf.keras.backend.* API can mostly be viewed as a remnant of the keras.backend.* API. The latter served the "exchangeable backend" design of the original (non-TF-specific) keras. This relates to the history of keras, which supported multiple backend libraries, of which tensorflow used to be just one. Back in 2015 and 2016, other backends such as Theano and MXNet were quite popular too, but going into 2017 and 2018, tensorflow became the dominant backend for keras users. Eventually keras became part of the tensorflow API (in 2.x and in later minor versions of 1.x). In the old multi-backend world, the backend.* API provided a backend-independent abstraction over the myriad of supported backends. But in the tf.keras world, the value of the backend API is much more limited.
The various functions in tf.keras.backend.* can be divided into a few categories:
Thin wrappers around the equivalent or mostly-equivalent tensorflow native API. Examples: tf.keras.backend.less, tf.keras.backend.sin
Slightly thicker wrappers around tensorflow native APIs, with more features included. Examples: tf.keras.backend.batch_normalization, tf.keras.backend.conv2d (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/keras/backend.py#L4869). They often perform preprocessing and implement other logic, which makes your life easier than using the native tensorflow API directly.
Unique functions that don't have an equivalent in the native tensorflow API. Examples: tf.keras.backend.rnn, tf.keras.backend.set_learning_phase
For category 1, use native tensorflow APIs. For categories 2 and 3, you may want to use the tf.keras.backend.* API, as long as you can find the function on the documentation page https://www.tensorflow.org/api_docs/python/, because the documented ones have backward-compatibility guarantees, so you don't need to worry about a future version of tensorflow removing them or changing their behavior.
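As a small illustration of category 1 (a hedged sketch; both spellings exist in current TF2 and give the same result):
import tensorflow as tf

x = tf.constant([-1.5, 2.0])

# Thin wrapper vs. native op: same result, so prefer the native API.
print(tf.math.abs(x))
print(tf.keras.backend.abs(x))

# cast_to_floatx is essentially a cast to keras' configured float type
# (float32 by default), so tf.cast covers it as well.
print(tf.keras.backend.cast_to_floatx([1, 2, 3]))
print(tf.cast([1, 2, 3], tf.keras.backend.floatx()))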

Has tf.Estimator become obsolete in Tensorflow 2.0?

Today I've set up a custom model using the tf.Estimator high-level API in Tensorflow 2.0.
It was a pain in the *** to get it running, and there are very few complete examples online that implement custom Estimators in Tensorflow 2, which made me question the reasons for using this API.
According to the docs, the main advantages of using the tf.Estimator API are:
You can run Estimator-based models on a local host or on a distributed multi-server environment without changing your model. Furthermore, you can run Estimator-based models on CPUs, GPUs, or TPUs without recoding your model.
You no longer have to worry about creating the computational graph or sessions since Estimators handle all the "plumbing" for you
Advantage 2 clearly doesn't apply to Tensorflow 2.0 anymore, as it runs in eager mode by default, so you don't have to worry about sessions anyway.
Advantage 1 also seems quite irrelevant in Tensorflow 2.0 - using tf.distribute.Strategy, you can now easily run even high-level tf.Keras models in a distributed fashion and on CPUs/GPUs/TPUs.
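For instance, a minimal sketch of what I mean (toy model and random data, just to show that the strategy scope is the only change needed):
import numpy as np
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # uses whatever GPUs/CPU it finds

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

x = np.random.rand(64, 8).astype("float32")
y = np.random.rand(64, 1).astype("float32")
model.fit(x, y, epochs=1, batch_size=16)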
tf.Keras models are so much easier and faster to set up, so why did they even bother to keep the tf.Estimator API in Tensorflow 2.0? Are there other advantages of using this API?

TFLearn, tf.contrib.learn or tf.estimator?

I've been tooling around with Tensorflow and TFLearn for a few months. I've made some progress. However, I was expecting to be able to construct a functioning scikit-learn type Estimator as a TFLearn.DNN. I can fit, and I can predict, but I can't do cross-validation because evaluate is failing for me. TensorFlow is throwing:
ValueError: Cannot use the given session to evaluate tensor: the tensor's graph is different from the session's graph.
when I call evaluate. I thought the whole point of the TFLearn API was to abstract things like session management away from my code.
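For reference, a minimal sketch of the kind of flow I mean (toy layers and random data, not my actual code):
import numpy as np
import tflearn

X = np.random.rand(100, 64).astype(np.float32)
Y = np.eye(10)[np.random.randint(0, 10, 100)]  # one-hot dummy labels

net = tflearn.input_data(shape=[None, 64])
net = tflearn.fully_connected(net, 32, activation='relu')
net = tflearn.fully_connected(net, 10, activation='softmax')
net = tflearn.regression(net)

model = tflearn.DNN(net)
model.fit(X, Y, n_epoch=2)   # works
model.predict(X)             # works
model.evaluate(X, Y)         # this is the call that throws the ValueError for me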
I have asked questions about problems I've had with TFLearn in several forums, including on the project's GitHub page. Unfortunately, I'm not getting any answers.
A few days ago, I suddenly encountered the tf.contrib.learn namespace. I'm seeing a lot of overlap between those classes and TFLearn. Then, I also found the tf.estimator class.
Finally, I just figured out that tensorflow.contrib sub-packages are third-party contributions. This leads me to wonder whether the original TFLearn is simply being absorbed into the larger TensorFlow package. Which direction is the code flowing?
I don't care what I use, as long as I get all the functionality of a scikit-learn estimator object.
I think it's best to use the official sub-modules of TensorFlow like tf.data and tf.estimator. They should be well maintained and features are added quickly.
For instance, @mrry seems to be in charge of tf.data, and the module is very clean, easy to use, and compatible with tf.estimator.
The module tf.estimator is a bit less clear, and comes from tf.contrib.learn. Don't take my word for it, but I think tf.estimator will slowly replace tf.contrib.learn, and it should be the official high-level API for TensorFlow (along with tf.keras).
You can find more information in the official blog post, where they explain the relationship between all modules.
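A minimal sketch of the tf.estimator + tf.data combination recommended above (TF 1.x-era API; the feature name "x" and the random data are just placeholders):
import numpy as np
import tensorflow as tf

feature_columns = [tf.feature_column.numeric_column("x", shape=[4])]

def input_fn():
    # tf.data pipeline feeding the estimator
    features = {"x": np.random.rand(32, 4).astype(np.float32)}
    labels = np.zeros(32, dtype=np.int32)
    return tf.data.Dataset.from_tensor_slices((features, labels)).repeat().batch(8)

estimator = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[16, 16],
    n_classes=2,
)
estimator.train(input_fn=input_fn, steps=100)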

keras + scikit-learn wrapper, appears to hang when GridSearchCV with n_jobs >1

UPDATE: I have had to re-write this question because, after some investigation, I realised that this is a different problem.
Context: running Keras in a grid-search setting using the KerasClassifier wrapper with scikit-learn. System: Ubuntu 16.04; libraries: Anaconda distribution 5.1, Keras 2.0.9, scikit-learn 0.19.1, TensorFlow 1.3.0 or Theano 0.9.0, using CPUs only.
Code:
I simply used the code here for testing: https://machinelearningmastery.com/use-keras-deep-learning-models-scikit-learn-python/, the second example 'Grid Search Deep Learning Model Parameters'. Pay attention to line 35, which reads:
grid = GridSearchCV(estimator=model, param_grid=param_grid)
Symptoms: when grid search uses more than 1 job (meaning CPUs?), e.g. setting 'n_jobs' on the line above to '2', as below:
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=2)
the code hangs indefinitely, either with tensorflow or theano, and there is no CPU usage (see the attached screenshot, where 5 python processes were created but none is using the CPU).
By debugging, it appears to be the following line in sklearn.model_selection._search (line 648) that causes the problem:
for parameters, (train, test) in product(candidate_params, cv.split(X, y, groups)))
The program hangs on this line and cannot continue.
I would really appreciate some insights as to what this means and why this could happen.
Thanks in advance
Are you using a GPU? If so, you can't have multiple threads running each variation of the params because they won't be able to share the GPU.
Here's a full example on how to use keras, sklearn wrappers in a Pipeline with GridsearchCV: Pipeline with a Keras Model
If you really want to have multiple jobs in the GridSearchCV, you can try to limit the GPU fraction used by each job (e.g. if each job only allocates 0.5 of the available GPU memory, you can run 2 jobs simultaneously)
See these issues:
Limit the resource usage for tensorflow backend
GPU memory fraction does not work in keras 2.0.9 but it works in 2.0.8
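A minimal sketch of that GPU-fraction idea for Keras on the TensorFlow 1.x backend (the session must be configured before any model is built; 0.5 is just an example value):
import tensorflow as tf
from keras import backend as K

config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.5  # leave room for a second job
K.set_session(tf.Session(config=config))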
I dealt with this problem too, and it really slowed me down not being able to run what is essentially trivially parallelizable code. The issue is indeed with the TensorFlow session. If a session is created in the parent process before GridSearchCV.fit(), it will hang!
The solution for me was to keep all session/graph creation code restricted to the KerasClassifier class and the model creation function I passed to it.
Also, what Felipe said about the memory is true: you will want to restrict the memory usage of TF in either the model creation function or a subclass of KerasClassifier.
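A minimal sketch of that arrangement (toy model; build_fn is the standard KerasClassifier argument, and nothing graph- or session-related runs in the parent process before fit):
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

def create_model():
    # everything graph/session-related happens inside the worker process
    model = Sequential()
    model.add(Dense(12, input_dim=8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

model = KerasClassifier(build_fn=create_model, verbose=0)
param_grid = {'batch_size': [16, 32], 'epochs': [5, 10]}
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=2)
# grid.fit(X, y)  # X, y: your training data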
Related info:
Session hang issue with python multiprocessing
Keras + Tensorflow and Multiprocessing in Python
TL;DR Answer: You can't because your Keras model can't be serialized, and serialization is needed for parallelizing in Python with joblib.
This problem is much detailed here: https://www.neuraxle.org/stable/scikit-learn_problems_solutions.html#problem-you-can-t-parallelize-nor-save-pipelines-using-steps-that-can-t-be-serialized-as-is-by-joblib
The solution to parallelize your code is to make your Keras estimator serializable. This can be done using savers as described at the link above.
If you're lucky enough to be using TensorFlow v2's prebuilt Keras module, the following practical code sample will prove useful to you, as you'd practically just need to take the code and adapt it to yours:
https://github.com/guillaume-chevalier/seq2seq-signal-prediction
In this example, all the saving and loading code is pre-written for you using Neuraxle-TensorFlow, which makes it parallelizable if you use Neuraxle's AutoML methods (e.g. Neuraxle's grid search and Neuraxle's own parallelism features).

Can Tensorflow's tf.contrib.learn be used with RNN (tf.contrib.rnn)?

The version of TensorFlow I am using is 1.0. I'd like to ask if tf.contrib.learn can be used in combination with tf.contrib.rnn?
Existing official examples (here and here) using tf.contrib.rnn all use 'basic'/'low-level' APIs to build and run the model. However, the high-level APIs in tf.contrib.learn produce much shorter code. Are there examples that show tf.contrib.learn being used with RNNs (tf.contrib.rnn)?
TensorFlow's API is changing rapidly, and there is not much documentation to be found on its website.