The docs for tf.data.Dataset.map() state:
Performance can often be improved by setting num_parallel_calls so that map will use multiple threads to process elements.
In contrast, the older(?) tf.keras.utils.GeneratorEnqueuer had the use_multiprocessing argument.
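For illustration, here is a minimal sketch of that map() option (the dataset and the lambda are placeholders, not taken from the original question):

import tensorflow as tf

ds = tf.data.Dataset.range(1000)
# Apply the transformation with multiple threads; AUTOTUNE lets the tf.data
# runtime pick the degree of parallelism itself.
ds = ds.map(lambda x: x * 2, num_parallel_calls=tf.data.experimental.AUTOTUNE)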
Related
I started my TensorFlow journey when it had already reached 2.0.0, so I never used graphs and sessions as in version 1. But I recently came across tf.function and AutoGraph, which suit me (although as far as I know they are used only for the train step).
Now, when reading project code, I see many people use the tf.function decorator on many other functions when they want to build graphs, but I don't exactly get their point. How do I know when to use a graph and when not to?
Can anyone help me?
Solution
The @tf.function decorator conveniently converts a Python function to a static TensorFlow graph. TensorFlow operates in eager mode by default since version 2.0.0. Although eager mode helps with line-by-line execution, it comes with the pitfall of relatively slow execution compared to a static graph. Converting a function into a static graph increases execution speed while training your model.
Quoting tf.function documentation:
Functions can be faster than eager code, especially for graphs with many small ops. But for graphs with a few expensive ops (like convolutions), you may not see much speedup.
The static graph is created once and does not get updated if the function is called repeatedly with different values that are not passed as input arguments. You should avoid using @tf.function in such scenarios, or update the function definition (if possible) so that all the necessary variability comes in through the input arguments. However, if your function gets all its inputs through its arguments, then applying @tf.function will not cause any problem.
Here is an example.
### When not to use @tf.function ###
import time
import tensorflow as tf

# some variable that changes with time, not passed as an argument
var = time.time()

@tf.function
def func(*args, **kwargs):
    # the value of `var` is baked into the graph at trace time
    return var
In the example above, although func() depends on var, it does not access var through its arguments. So when @tf.function is applied, a static graph is created for func() on the first call, and when the value of var changes later, that change is not reflected in the static graph. See this for more clarity. Also, I would highly encourage you to look at the references section.
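For comparison, here is a sketch of the fix suggested above: route the changing value through the function's arguments so that every call supplies it to the traced graph (time.time() stands in for the original timestamp()):

import time
import tensorflow as tf

@tf.function
def func(var):
    # `var` is now an input, so its current value reaches the graph on each call
    return var + 1.0

print(func(tf.constant(time.time())))  # uses the value at call time
print(func(tf.constant(time.time())))  # a later call sees the newer value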
For Debugging
Quoting source
You can use tf.config.experimental_run_functions_eagerly (which temporarily disables running functions as functions) for debugging purposes.
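A small sketch of that debugging tip (newer TF releases rename the switch to tf.config.run_functions_eagerly; the symbol below is the one quoted above):

import tensorflow as tf

@tf.function
def step(x):
    # with eager execution forced, breakpoints and print() work line by line
    return x * 2

tf.config.experimental_run_functions_eagerly(True)   # run as plain Python
step(tf.constant(1))
tf.config.experimental_run_functions_eagerly(False)  # back to graph mode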
References
Better performance with tf.function
When to utilize tf.function
TensorFlow 2.0: tf.function and AutoGraph
The difference between the two is muddled in my head, notwithstanding the nuances of what is eager and what isn't. From what I gather, the @tf.function decorator has two benefits:
it converts functions into TensorFlow graphs for performance, and
allows for a more Pythonic style of coding by translating many (but not all) commonplace Python operations into tensor operations, e.g. if into tf.cond, etc.
From the definition of tf.py_function, it seems to provide only the second of these. So why bother with tf.py_function when tf.function does the job with a performance improvement to boot, and without the former's inability to serialize?
They do indeed start to resemble each other as they are improved, so it is useful to see where they come from. Initially, the difference was that:
@tf.function turns Python code into a series of TensorFlow graph nodes.
tf.py_function wraps an existing Python function into a single graph node.
This means that tf.function requires your code to be relatively simple, while tf.py_function can handle any Python code, no matter how complex.
While this line is indeed blurring, with tf.py_function doing more interpretation and tf.function accepting lots of complex Python constructs, the general rule stays the same:
If you have relatively simple logic in your Python code, use tf.function.
When you use complex code, like large external libraries (e.g. connecting to a database, or loading a large external NLP package), use tf.py_function (see the sketch below).
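As a sketch of that rule, assume a made-up pure-Python helper lookup_label (standing in for a database or NLP-library call) that tf.function could not trace:

import tensorflow as tf

def lookup_label(path):
    # arbitrary Python: file I/O, external libraries, anything goes
    return float(len(path.numpy()))

@tf.function
def double_length(path):
    # simple tensor logic: tf.function traces it into a fast graph
    return tf.strings.length(path) * 2

ds = tf.data.Dataset.from_tensor_slices(["a.png", "bb.png"])
# arbitrary Python logic is wrapped into a single graph node with tf.py_function
ds = ds.map(lambda p: tf.py_function(lookup_label, inp=[p], Tout=tf.float32))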
I was reading the Data Loading section of the TF performance guide. For prefetch it says:
The tf.data API provides a software pipelining mechanism through the tf.data.Dataset.prefetch transformation, which can be used to decouple the time when data is produced from the time when data is consumed. In particular, the transformation uses a background thread and an internal buffer to prefetch elements from the input dataset ahead of the time they are requested. The number of elements to prefetch should be equal to (or possibly greater than) the number of batches consumed by a single training step. You could either manually tune this value, or set it to tf.data.experimental.AUTOTUNE, which will prompt the tf.data runtime to tune the value dynamically at runtime.
What is AUTOTUNE doing internally? Which algorithms or heuristics are being applied?
Additionally, in practice, what kind of manual tuning is done?
tf.data builds a performance model of the input pipeline and runs an optimization algorithm to find a good allocation of its CPU budget across all parameters specified as AUTOTUNE. While the input pipeline is running, tf.data tracks the time spent in each operation, so that these times can be fed into the optimization algorithm.
The OptimizationOptions object gives some control over how autotune will behave.
The authors provide details about AUTOTUNE in their VLDB paper, https://vldb.org/pvldb/vol14/p2945-klimovic.pdf; refer to Section 3.3.2.
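In practice, manual tuning usually means benchmarking a few fixed buffer sizes; here is a minimal sketch of both options (the pipeline itself is made up):

import tensorflow as tf

ds = tf.data.Dataset.range(10000).map(lambda x: x * 2).batch(32)

# Manual tuning: prefetch a fixed number of batches and measure throughput.
ds_manual = ds.prefetch(2)

# Or let the tf.data runtime pick (and adapt) the buffer size itself.
ds_auto = ds.prefetch(tf.data.experimental.AUTOTUNE)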
I am using TensorFlow 1.12 and eager execution mode. I want to write the graph to the TensorBoard log. I found a function called tf.contrib.summary.graph; however, it requires a parameter called param. What should I pass for this parameter? Thanks.
As documented, the param parameter is the graph object, which in eager execution can be a tf.Graph, a tf.GraphDef, or a string containing a serialized GraphDef protocol buffer.
Note that in eager execution, by definition there isn't a single computation graph any more because the ops execute immediately instead of building a graph, so this is unlikely to be useful unless you're building traditional tf.Graph computation graphs in addition to running logic eagerly. We may introduce some ways to record graphs in eager mode for TF 2.0, but there still won't be a single graph.
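Along those lines, here is a rough sketch of what that could look like in TF 1.12 if you do build an explicit tf.Graph next to your eager code (the exact tf.contrib.summary API may differ slightly between releases):

import tensorflow as tf
tf.enable_eager_execution()

writer = tf.contrib.summary.create_file_writer("logs")

# Build a traditional graph alongside the eager code, then log it.
g = tf.Graph()
with g.as_default():
    x = tf.placeholder(tf.float32, name="x")
    y = tf.identity(x * 2.0, name="y")

with writer.as_default(), tf.contrib.summary.always_record_summaries():
    tf.contrib.summary.graph(g, step=0)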
When I was learning TensorFlow, one of its basic concepts was the computational graph, and the graphs were said to be static.
And I found that in PyTorch the graphs are said to be dynamic.
What's the difference between static computational graphs in TensorFlow and dynamic computational graphs in PyTorch?
Both frameworks operate on tensors and view any model as a directed acyclic graph (DAG), but they differ drastically on how you can define them.
TensorFlow follows the 'data as code and code is data' idiom. In TensorFlow you define the graph statically before a model can run. All communication with the outer world is performed via the tf.Session object and tf.placeholder, tensors that will be substituted by external data at runtime.
In PyTorch things are far more imperative and dynamic: you can define, change and execute nodes as you go, with no special session interfaces or placeholders. Overall, the framework is more tightly integrated with the Python language and feels more native most of the time. When you write in TensorFlow you sometimes feel that your model is behind a brick wall with several tiny holes to communicate through. Anyway, this still sounds more or less like a matter of taste.
However, those approaches differ not only from a software-engineering perspective: there are several dynamic neural network architectures that benefit from the dynamic approach. Recall RNNs: with static graphs, the input sequence length has to stay constant. This means that if you develop a sentiment analysis model for English sentences you must fix the sentence length to some maximum value and pad all shorter sequences with zeros. Not too convenient, huh? And you will get more problems in the domain of recursive RNNs and tree-RNNs. Currently TensorFlow has limited support for dynamic inputs via TensorFlow Fold; PyTorch has it by default.
References:
https://medium.com/towards-data-science/pytorch-vs-tensorflow-spotting-the-difference-25c75777377b
https://www.reddit.com/r/MachineLearning/comments/5w3q74/d_so_pytorch_vs_tensorflow_whats_the_verdict_on/
Both TensorFlow and PyTorch allow specifying new computations at any point in time. However, TensorFlow has a "compilation" step which incurs a performance penalty every time you modify the graph. So TensorFlow's optimal performance is achieved when you specify the computation once and then flow new data through the same sequence of computations.
It's similar to interpreters vs. compilers -- the compilation step makes things faster, but also discourages people from modifying the program too often.
To make things concrete: when you modify the graph in TensorFlow (by appending new computations using the regular API, or removing some computation using tf.contrib.graph_editor), this line is triggered in session.py. It serializes the graph, and then the underlying runtime reruns some optimizations, which can take extra time, perhaps 200 microseconds. In contrast, running an op in a previously defined graph, or in numpy/PyTorch, can be as low as 1 microsecond.
In TensorFlow you first have to define the graph, then you execute it.
Once defined, the graph is immutable: you can't add or remove nodes at runtime.
In PyTorch, instead, you can change the structure of the graph at runtime: you can add and remove nodes on the fly, dynamically changing its structure.
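To make the contrast concrete, here is a minimal sketch of the two styles (assuming TensorFlow 1.x for the graph-mode half and PyTorch for the other):

import tensorflow as tf  # 1.x, graph mode
import torch

# TensorFlow 1.x: define the whole graph first, then execute it in a session.
a = tf.placeholder(tf.float32)
b = tf.placeholder(tf.float32)
c = a * b
with tf.Session() as sess:
    print(sess.run(c, feed_dict={a: 2.0, b: 3.0}))  # 6.0

# PyTorch: nodes are created and evaluated as the Python code runs.
x = torch.tensor(2.0, requires_grad=True)
y = torch.tensor(3.0)
z = x * y             # built and computed immediately
z.backward()          # gradients for exactly the ops that actually ran
print(z.item(), x.grad.item())  # 6.0 3.0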