When to use the @tf.function decorator and when not? I know tf.function builds a graph, but how do I know when to build graphs? - tensorflow

I started my TensorFlow journey when it had already reached 2.0.0, so I never used graphs and sessions as in version 1. But recently I came across tf.function and AutoGraph, which suit me (though as far as I know, they are used only for the train step).
Now, when reading project code, I see many people use the tf.function decorator on many other functions when they want to build graphs. But I don't exactly get their point. How do I know when to use a graph and when not?
Can anyone help me?

Solution
The decorator @tf.function conveniently converts a Python function into a static TensorFlow graph. TensorFlow has operated in eager mode by default since version 2.0.0. Although eager mode helps with line-by-line execution and debugging, it comes with the pitfall of relatively slower execution compared to a static graph. Converting a function into a static graph can increase execution speed, for example while training your model.
Quoting tf.function documentation:
Functions can be faster than eager code, especially for graphs with many small ops. But for graphs with a few expensive ops (like convolutions), you may not see much speedup.
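To see the effect the quote describes, here is a rough timing sketch assuming TensorFlow 2.x. The function many_small_ops and its sizes are hypothetical, and the actual numbers depend heavily on hardware and TensorFlow version, so no expected timings are given.

```python
import timeit
import tensorflow as tf

def many_small_ops(x):
    # Many cheap element-wise ops: in eager mode, per-op Python overhead dominates
    for _ in range(100):
        x = x * 1.0001 + 0.0001
    return x

# The same Python function wrapped as a graph; the Python loop is unrolled at trace time
graph_fn = tf.function(many_small_ops)

x = tf.ones([10])
graph_fn(x)  # first call traces the graph; keep tracing cost out of the timing

eager_time = timeit.timeit(lambda: many_small_ops(x), number=100)
graph_time = timeit.timeit(lambda: graph_fn(x), number=100)
print(f"eager: {eager_time:.3f}s  graph: {graph_time:.3f}s")
```

Swapping the loop body for a single large tf.nn.conv2d call would typically shrink the gap, as the quote suggests.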
The static graph is traced once (per input signature) and is not updated when the function is called repeatedly while values it depends on change outside its input arguments. You should avoid using @tf.function in such scenarios, or update the function definition (if possible) to pass all the necessary variability in through the input arguments. However, if your function gets all its inputs through its arguments, applying @tf.function will not cause this problem.
Here is an example.
```python
import time
import tensorflow as tf

### When not to use @tf.function ###
# some variable that changes with time
var = time.time()

@tf.function
def func(*args, **kwargs):
    # your code
    return var
```
In the example above, although func() depends on var, it does not receive var through its arguments. Thus, when func() is called for the first time, @tf.function traces it into a static graph with the current value of var baked in. When the value of var changes later, the static graph is not updated. See this for more clarity. Also, I would highly encourage you to see the references section.
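A minimal sketch of this pitfall, assuming TensorFlow 2.x; scale, scale_global, and scale_arg are hypothetical names. A Python global read inside the function is frozen into the graph at trace time, while a value passed as an argument is read on every call.

```python
import tensorflow as tf

scale = 2.0  # plain Python global, read only while tracing

@tf.function
def scale_global(x):
    # `scale` is captured as a constant when this function is traced
    return x * scale

@tf.function
def scale_arg(x, s):
    # `s` is an input, so every call sees the value passed in
    return x * s

x = tf.constant(1.0)
first = scale_global(x)    # traces the graph with scale == 2.0
scale = 3.0                # change the global after tracing
stale = scale_global(x)    # same input signature, so the old graph (2.0) is reused
fresh = scale_arg(x, tf.constant(scale))  # sees 3.0
```

A call that does trigger a retrace (for example, a tensor of a different shape) would pick up the new global, which can make this kind of bug look intermittent.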
For Debugging
Quoting source
You can use tf.config.experimental_run_functions_eagerly (which temporarily disables running functions as functions) for debugging purposes.
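In newer TensorFlow 2.x releases (2.3 and later), the same switch is available without the experimental prefix as tf.config.run_functions_eagerly. A small sketch, with a hypothetical function debug_me, showing it used around a single call and then restored:

```python
import tensorflow as tf

@tf.function
def debug_me(x):
    # print() is a Python side effect: under tf.function it runs only while tracing,
    # but with functions run eagerly it runs on every call and shows concrete values
    print("python print, x =", x)
    return x + 1

tf.config.run_functions_eagerly(True)       # execute tf.function bodies eagerly
try:
    out = debug_me(tf.constant(1))           # x is a concrete EagerTensor here
finally:
    tf.config.run_functions_eagerly(False)   # restore normal graph execution
```

While eager execution is forced, ordinary Python debugging tools (breakpoints, print, .numpy()) work inside the function body.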
References
Better performance with tf.function
When to utilize tf.function
TensorFlow 2.0: tf.function and AutoGraph

Related

What is the difference in purpose between tf.py_function and tf.function?

The difference between the two is muddled in my head, notwithstanding the nuances of what is eager and what isn't. From what I gather, the @tf.function decorator has two benefits:
1. it converts functions into TensorFlow graphs for performance, and
2. it allows for a more Pythonic style of coding by interpreting many (but not all) commonplace Python operations into tensor operations, e.g. if into tf.cond, etc.
From the definition of tf.py_function, it seems that it does just #2 above. Hence, why bother with tf.py_function when tf.function does the job with a performance improvement to boot, and without tf.py_function's inability to be serialized?
They do indeed start to resemble each other as they are improved, so it is useful to see where they come from. Initially, the difference was that:
@tf.function turns Python code into a series of TensorFlow graph nodes.
tf.py_function wraps an existing Python function into a single graph node.
This means that tf.function requires your code to be relatively simple, while tf.py_function can handle any Python code, no matter how complex.
While the line is indeed blurring, with tf.py_function doing more interpretation and tf.function accepting lots of complex Python commands, the general rule stays the same:
If you have relatively simple logic in your python code, use tf.function.
When you use complex code, like large external libraries (e.g. connecting to a database, or loading a large external NLP package) use tf.py_function.
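A minimal sketch of the two approaches side by side, assuming TensorFlow 2.x and NumPy; the function names are hypothetical. In the first, AutoGraph turns the Python if into graph ops; in the second, arbitrary NumPy code is wrapped by tf.py_function as one opaque node inside a tf.function.

```python
import numpy as np
import tensorflow as tf

@tf.function
def simple_logic(x):
    # AutoGraph converts this `if` on a tensor condition into tf.cond
    if tf.reduce_sum(x) > 0:
        return x * 2.0
    return -x

def arbitrary_python(x):
    # Plain Python/NumPy: inside tf.py_function, x arrives as an EagerTensor
    return np.sort(x.numpy())[::-1].astype(np.float32)

@tf.function
def uses_py_function(x):
    # The whole Python function becomes a single graph node; it runs in the
    # Python interpreter and cannot be serialized with a SavedModel
    return tf.py_function(func=arbitrary_python, inp=[x], Tout=tf.float32)
```

The trade-off is the one described above: simple_logic is compiled into graph ops, while uses_py_function keeps full Python flexibility at the cost of speed and portability.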

Is an eager-graph compatible same code solution possible?

I am trying to write code that is eager and graph compatible. However, there is very little information online about how to do this, it being a literal footnote on TensorFlow's website. Furthermore, what they have written is confusing, saying:
The same code written for eager execution will also build a graph during graph execution. Do this by simply running the same code in a new Python session where eager execution is not enabled.
This implies that a same code solution is possible, where the only change required is the addition or removal of tf.enable_eager_execution().
Currently I use tf.keras to define my model and tf.data for my input pipeline. However, many eager operations don't work in graph, with the opposite also being true.
For example, I keep track of my number of epochs using tf.train.Checkpoint(). In eager mode, after restoring, I can access it using epochs.numpy() to assign its value to a local variable. However, this does not work with graphs, which instead require sess.run(epochs), because the values are not available until the graph is executed.
Again, to compute my gradients in eager I need to use some form of autograd, in my case tf.GradientTape(). This is not compatible with graphs, as "tf.GradientTape.gradients() does not support graph control flow."
I see that tfe.py_func exists, but once again, this only works when eager is not enabled, thus not helping for this problem.
So how do I make a same code solution, when it seems that many aspects of eager and graph directly conflict with each other?

How to use tf.layers classes instead of functions

It seems that tf.Layer modules come in two flavours: functions and classes. I normally use the functions directly (e.g. tf.layers.dense), but I'd like to know how to use the classes directly (tf.layers.Dense). I've started experimenting with the new eager execution mode in TensorFlow and I think using classes is going to be useful there as well, but I haven't seen good examples in the documentation. Is there any part of the TF documentation that shows how these are used?
I guess it would make sense to use them in a class where these layers are instantiated in the __init__ and then they're linked in the __call__ method when the inputs and dimensions are known?
Are these tf.layer classes related to tf.keras.Model? Is there an equivalent wrapper class for using tf.layers?
Update: for eager execution there's tfe.Network that must be inherited. There's an example here
tf.layers and tf.keras.layers classes are generally interchangeable, and in fact at head (and thus by the next release, 1.9), the former actually inherit from the latter.
TensorFlow is moving towards consolidating on tf.keras APIs for constructing models as that makes state ownership more explicit (e.g., parameters are "owned" by the Layer object, as opposed to the functional style where all model parameters are put in a "collection" associated with the complete graph). This style works well for both eager execution and graph construction (support for eager execution is improving with every release). I'd recommend using tf.keras.layers and tf.keras.Model.
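A minimal sketch of that style, assuming TensorFlow 2.x with tf.keras; the class name TwoLayerNet and the layer sizes are hypothetical. Layers are instantiated in __init__ and connected in call, which matches the asker's guess, and the same definition runs eagerly or wrapped in tf.function.

```python
import tensorflow as tf

class TwoLayerNet(tf.keras.Model):
    def __init__(self, hidden_units=16, num_classes=3):
        super().__init__()
        # Layers (and the weights they will own) are created here...
        self.hidden = tf.keras.layers.Dense(hidden_units, activation="relu")
        self.classifier = tf.keras.layers.Dense(num_classes)

    def call(self, inputs):
        # ...and connected here; weights are built lazily on the first call,
        # once the input dimension is known
        return self.classifier(self.hidden(inputs))

model = TwoLayerNet()
logits = model(tf.zeros([4, 8]))               # eager execution
graph_call = tf.function(model)                # same model inside a graph
logits_graph = graph_call(tf.zeros([4, 8]))
```

Because the Dense layers own their kernels and biases, model.trainable_variables lists exactly those four tensors, with no global collection involved.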
Some examples that you may find useful:
MNIST in the tensorflow/models repository
The programmer's guide
Other eager execution samples (where the exact same model definition works for both graph execution and eager execution).
Not all existing TensorFlow examples have been moved to this style, but they slowly will.
Hope that helps.

skipping layer in backpropagation in keras

I am using Keras with tensorflow backend and I am curious whether it is possible to skip a layer during backpropagation but have it execute in the forward pass. So here is what I mean
```python
Lambda(lambda x: a(x))
```
I want to apply a to x in the forward pass but I do not want a to be included in the derivation when the backprop takes place.
I was trying to find a solution, but I could not find anything. Can somebody help me out here?
UPDATE 2
In addition to tf.py_func, there is now an official guide on how to add a custom op.
UPDATE
See this question for an example of writing a custom op with gradient purely in Python without needing to rebuild anything. Note that there are some limitations to the method (see the documentation of tf.py_func).
Not exactly a solution to the problem, but still kind of an answer and too long for comments.
That's not even a Keras issue, but a TensorFlow one. Each op defines its own gradient computation that is used during backpropagation. If you really wanted to do something like that, you would need to implement the op in TensorFlow yourself (no easy feat) and define the gradient that you want, because you can't have "no gradient"; if anything it would be 1 or 0 (otherwise you can't go on with backpropagation). There is a tf.NoGradient function in TensorFlow which causes an op to propagate zeros, but I don't think it is meant to be (or can be) used outside of TensorFlow's own internals.
UPDATE
Okay, so a bit more context. TensorFlow graphs are built of ops, which are implemented by kernels; this is basically a 1-to-1 mapping, except that there may be, for example, a CPU and a GPU kernel for an op, hence the differentiation. The set of ops supported by TensorFlow is usually static. I mean, it can change with newer versions, but in principle you cannot add your own ops, because the ops of a graph go into the Protobuf serialized format, so if you made your own ops you would not be able to share your graph. Ops are then defined at the C++ level with the macro REGISTER_OP (see for example here), and kernels with REGISTER_KERNEL_BUILDER (see for example here).
Now, where do gradients come into play? Well, the funny thing is that the gradient of an op is not defined at the C++ level; there are ops (and kernels) that implement the gradient of other ops (if you look at the previous files you'll find ops/kernels with names ending in Grad), but (as far as I'm aware) these are not explicitly "linked" at this level. It seems that the associations between ops and their gradients are defined in Python, usually via tf.RegisterGradient or the aforementioned tf.NoGradient (see for example here; Python modules starting with gen_ are autogenerated with the help of the C++ macros); these registrations inform the backpropagation algorithm about how to compute the gradient of the graph.
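For an op that already exists, tf.RegisterGradient can be paired with the graph-mode Graph.gradient_override_map to swap in a different gradient without writing any C++. This is a sketch assuming TensorFlow 2.x and its tf.compat.v1 graph APIs; the gradient name "PassThroughGrad" is hypothetical, and overriding the gradient of tf.round is only an illustration.

```python
import tensorflow as tf

# Hypothetical gradient name; registration is global, so run this once per process
@tf.RegisterGradient("PassThroughGrad")
def _pass_through_grad(op, grad):
    # Round has a single input: return the upstream gradient unchanged
    return grad

g = tf.Graph()
with g.as_default():
    x = tf.compat.v1.placeholder(tf.float32, shape=[2])
    # Ops of type "Round" created in this block use PassThroughGrad for backprop
    with g.gradient_override_map({"Round": "PassThroughGrad"}):
        y = tf.round(x)
    loss = tf.reduce_sum(3.0 * y)
    dloss_dx = tf.compat.v1.gradients(loss, x)[0]

with tf.compat.v1.Session(graph=g) as sess:
    y_val, grad_val = sess.run([y, dloss_dx], feed_dict={x: [0.2, 1.7]})
```

Without the override, tf.round contributes no gradient; with it, the upstream gradient of 3.0 reaches x unchanged.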
So, how to actually work this out? Well, you need to create at least one op in C++ with the corresponding kernel(s) implementing the computation that you want for your forward pass. Then, if the gradient computation that you want to use can be expressed with existing TensorFlow ops (which is most likely), you would just need to call tf.RegisterGradient in Python and do the computation there in "standard" TensorFlow. This is quite complicated, but the good news is it's possible, and there's even an example for it (although I think they kinda forgot the gradient registration part in that one)! As you will see, the process involves compiling the new op code into a library (btw, I'm not sure if any of this works on Windows) that is then loaded from Python (obviously this involves going through the painful process of manually compiling TensorFlow with Bazel). A possibly more realistic example can be found in TensorFlow Fold, an extension of TensorFlow for structured data, which registers (as of now) one custom operation here through a macro defined here that calls REGISTER_OP; then, in Python, it loads the library and registers its gradient here through its own registration function defined here, which simply calls tf.NotDifferentiable (another name for tf.NoGradient).
tldr: It is rather hard, but it can be done and there are even a couple of examples out there.
As mentioned in @jdehesa's comments, you can implement your function with an "alternative gradient". Forgive me if my math is not correct, but I think a derivative returning "1" would be the correct way to have no effect on the backpropagation while still passing the learning through. For how to construct it, see here. The example I cited goes further and allows you to construct an activation function from a Python function. So in place of the spiky function, substitute your function a, and in place of its derivative d_spiky, use
```python
def constant(x):
    return 1
```
So on the forward pass, a is applied in the layer, and on the backward pass a derivative of 1 is used, which simply passes the incoming gradient through unchanged.
You can then just create an Activation layer in Keras using this function.
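The linked example builds the alternative gradient with tf.py_func and a gradient override. In TensorFlow 2.x, the same idea (apply a on the forward pass, treat its derivative as 1 on the backward pass) can be sketched with tf.custom_gradient instead. Here a is a hypothetical stand-in (tf.round) for the asker's function.

```python
import tensorflow as tf

def a(x):
    # Hypothetical stand-in for the asker's function `a`
    return tf.round(x)

@tf.custom_gradient
def a_straight_through(x):
    y = a(x)  # forward pass: apply `a` as usual
    def grad(upstream):
        # backward pass: derivative treated as 1, so the gradient passes through
        return upstream
    return y, grad

x = tf.constant([0.2, 1.7])
with tf.GradientTape() as tape:
    tape.watch(x)
    y = a_straight_through(x)
    loss = tf.reduce_sum(3.0 * y)
grad_x = tape.gradient(loss, x)  # the upstream gradient, untouched by `a`
```

In Keras, the wrapped function should then be usable as tf.keras.layers.Lambda(a_straight_through), in place of the Lambda from the question.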

Why does TensorFlow have a lot of mathematical equations re-implemented?

I was looking through the API in TensorFlow and noticed that a lot of mathematical operations that already exist in Python and NumPy have been re-implemented (or at least given a TensorFlow interface). For example:
Is there a good reason to do this?
I've been searching over their page but can't find why they'd do this.
I do have some guesses, though. One of my main guesses is that they probably want those operations to have some backpropagation effect on whatever neural network graph gets implemented; in other words, to have their derivatives implemented. Is this one of the reasons? (I wish I knew how to even check if my guess is right.)
For example, in one of the most basic examples of linear regression, one defines the prediction function that one wants to implement:
```python
product = tf.matmul(x, W)
y = product + b
```
instead of
```python
product = tf.matmul(x, W)
y = tf.add(product, b)
```
Somehow the first implementation does not interfere with the Stochastic Gradient Descent algorithm for training, so it probably doesn't matter whether one uses NumPy or tf.add to train? This is one aspect that confuses me: how do I know which one I should be using?
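A small check of why the first version works, assuming TensorFlow 2.x eager mode (the values of x, W, and b are made up): the + operator on tensors is overloaded to dispatch to tf.add, so the gradient tape records it like any other op, whereas a NumPy computation on extracted arrays is invisible to the tape.

```python
import tensorflow as tf

x = tf.constant([[1.0, 2.0]])
W = tf.Variable([[0.5], [0.25]])
b = tf.Variable([0.1])

with tf.GradientTape() as tape:
    product = tf.matmul(x, W)
    y_op = product + b          # `+` on a tensor dispatches to tf.add
y_fn = tf.add(product, b)       # explicit call: the same operation

# The tape differentiates through `+` exactly as through tf.add
grad_W, grad_b = tape.gradient(y_op, [W, b])

# A NumPy result is just an array: the tape has no record of this computation
y_np = product.numpy() + b.numpy()
```

So writing product + b is purely a matter of style; what matters for training is that the computation stays in TensorFlow ops rather than NumPy.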
Or maybe there are performance reasons? Or maybe it's to give those operations access to the GPU when GPUs are used?
You have to understand that you create a TensorFlow graph with these operations, meaning they aren't the same as the NumPy functions; they are more an abstraction of them.
Maybe you have noticed that you have to create a session and then evaluate the functions through that session to get a result, whereas NumPy functions are executed directly. This is because the graph and its functions define what to do, like writing down a formula; to get results for a specific x (or whatever), you have to insert a value for x. This is what you're doing through the session and eval.
So, to conclude: with TensorFlow you define a graph, which is a more abstract representation of the functions. The graph isn't executed when it is defined; it is executed when you call eval, i.e. when you run it through the session.
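The define-then-run workflow this answer describes can still be reproduced in TensorFlow 2.x through the tf.compat.v1 APIs (the answer predates TF 2, where graphs and sessions were the default). A minimal sketch with made-up values for W and b:

```python
import tensorflow as tf

g = tf.Graph()
with g.as_default():
    # Defining the graph: these lines only add nodes, nothing is computed yet
    x = tf.compat.v1.placeholder(tf.float32, shape=[None, 2])
    W = tf.constant([[1.0], [2.0]])
    b = tf.constant(0.5)
    y = tf.matmul(x, W) + b   # `y` is a symbolic tensor with no value yet

with tf.compat.v1.Session(graph=g) as sess:
    # Running the graph: a concrete value for x is supplied only now
    result = sess.run(y, feed_dict={x: [[1.0, 1.0]]})
```

Here result is a NumPy array holding 1.0 * 1.0 + 1.0 * 2.0 + 0.5 = 3.5.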
Also notice that you can't mix NumPy functions and TensorFlow functions directly, but you can define your own TensorFlow ops (https://www.tensorflow.org/versions/r0.9/how_tos/adding_an_op/index.html).
Btw, TensorFlow ops are generally not built on NumPy under the hood: their kernels are implemented in C++ (with separate GPU kernels), which is also what lets them run on GPUs. :)