TFF: Customizing the model implementation

Can anyone please explain this part of the tutorial to me?
Here is the link: https://www.tensorflow.org/federated/tutorials/federated_learning_for_image_classification
Part:
Customizing the model implementation
Keras is the recommended high-level model API for TensorFlow, and we encourage using Keras models (via tff.learning.from_keras_model or tff.learning.from_compiled_keras_model) in TFF whenever possible.
However, tff.learning provides a lower-level model interface, tff.learning.Model, that exposes the minimal functionality necessary for using a model for federated learning. Directly implementing this interface (possibly still using building blocks like tf.keras.layers) allows for maximum customization without modifying the internals of the federated learning algorithms.
So let's do it all over again from scratch.
Defining model variables, forward pass, and metrics
The first step is to identify the TensorFlow variables we're going to work with. In order to make the following code more legible, let's define a data structure to represent the entire set. This will include variables such as weights and bias that we will train, as well as variables that will hold various cumulative statistics and counters we will update during training, such as loss_sum, accuracy_sum, and num_examples.
import collections

MnistVariables = collections.namedtuple(
    'MnistVariables', 'weights bias num_examples loss_sum accuracy_sum')

Roughly analogous to the multiple paths Keras exposes for creating a Keras model, TFF exposes several distinct ways of creating a tff.learning.Model. One of them is through the constructor functions tff.learning.from_keras_model and tff.learning.from_compiled_keras_model; each of these constructs and returns an instance of the abstract base class tff.learning.Model. The purpose of this section of the tutorial is to show that it is also possible to construct such an instance directly, by implementing the appropriate methods of the abstract interface.
If it is the collections.namedtuple MnistVariables you are asking about, it is simply a data container class introduced for convenience, to group the tf.Variables that the TFF runtime will use to track state during training. One important point from the tff.learning.Model documentation, evidenced by this tutorial, is the line:
All tf.Variables should be introduced in __init__
If you are familiar with TensorFlow Variables, you will understand that controlling their instantiation is quite important.
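For concreteness, here is a rough sketch of how such a constructor might look (not copied verbatim from the tutorial; the 784x10 shapes assume MNIST's flattened 28x28 input and 10 output classes, and it reuses the MnistVariables namedtuple defined above):

import tensorflow as tf

def create_mnist_variables():
    # Trainable parameters of a single dense layer, plus non-trainable
    # accumulators for the metrics tracked during training.
    return MnistVariables(
        weights=tf.Variable(tf.zeros([784, 10]), name='weights', trainable=True),
        bias=tf.Variable(tf.zeros([10]), name='bias', trainable=True),
        num_examples=tf.Variable(0.0, name='num_examples', trainable=False),
        loss_sum=tf.Variable(0.0, name='loss_sum', trainable=False),
        accuracy_sum=tf.Variable(0.0, name='accuracy_sum', trainable=False))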

Related

tensorflow eager and imperative custom layers

In some deep learning notes (Stanford CS20SI), I once saw the following statement regarding eager execution. I don't quite understand what "imperative custom layers" means, or how to understand this code example in the context of imperative custom layers?
Normally, using TensorFlow you are not able to access the content of a tensor directly. This means you cannot use ordinary if-statements. Instead, you have to construct both possible branches of the conditional and then use tf.cond to include a node that switches between the two, depending on the content of a tensor. This sometimes makes it hard to implement imperative logic in layers.
The example you posted shows that with eager execution you are now able to access the content of tensors, which means you can write if-statements, for-loops and so on directly in Python, without constructing a huge graph covering every possibility yourself. Since the code inside the layer is now executed just like code in a normal imperative programming language, you can call this kind of layer an imperative layer; this is the same motivation behind PyTorch.
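To make that concrete, here is a small, hypothetical sketch (the layer name and the zero-mean rule are made up) of an "imperative" layer that relies on eager execution, next to the tf.cond construction you would need in graph mode:

import tensorflow as tf

class ZeroIfNegativeMean(tf.keras.layers.Layer):
    # With eager execution, the Python if-statement can inspect the
    # actual value of the tensor directly.
    def call(self, inputs):
        if tf.reduce_mean(inputs) > 0:
            return inputs
        return tf.zeros_like(inputs)

# Graph-mode equivalent: build both branches and let tf.cond switch between them.
def zero_if_negative_mean(inputs):
    return tf.cond(tf.reduce_mean(inputs) > 0,
                   lambda: inputs,
                   lambda: tf.zeros_like(inputs))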

How to use tf.layers classes instead of functions

It seems that tf.layers modules come in two flavours: functions and classes. I normally use the functions directly (e.g., tf.layers.dense), but I'd like to know how to use the classes directly (tf.layers.Dense). I've started experimenting with the new eager execution mode in TensorFlow, and I think using the classes is going to be useful there as well, but I haven't seen good examples in the documentation. Is there any part of the TF documentation that shows how these are used?
I guess it would make sense to use them in a class where these layers are instantiated in the __init__ and then they're linked in the __call__ method when the inputs and dimensions are known?
Are these tf.layer classes related to tf.keras.Model? Is there an equivalent wrapper class for using tf.layers?
Update: for eager execution there's tfe.Network that must be inherited. There's an example here
tf.layers and tf.keras.layers classes are generally interchangeable and in fact at head (and thus by the next release, 1.9), the former actually inherit from the latter.
TensorFlow is moving towards consolidating on tf.keras APIs for constructing models as that makes state ownership more explicit (e.g., parameters are "owned" by the Layer object, as opposed to the functional style where all model parameters are put in a "collection" associated with the complete graph). This style works well for both eager execution and graph construction (support for eager execution is improving with every release). I'd recommend using tf.keras.layers and tf.keras.Model.
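As a minimal sketch of that style (the class name and layer sizes below are arbitrary, not taken from the docs): layers are instantiated in __init__, which is where their variables end up being owned, and they are wired together in call() once the inputs are known.

import tensorflow as tf

class TwoLayerNet(tf.keras.Model):
    def __init__(self):
        super(TwoLayerNet, self).__init__()
        # Layer objects are created here and own their parameters.
        self.hidden = tf.keras.layers.Dense(64, activation='relu')
        self.out = tf.keras.layers.Dense(10)

    def call(self, inputs):
        # Layers are linked here, once the inputs (and their shapes) are known.
        return self.out(self.hidden(inputs))

model = TwoLayerNet()
logits = model(tf.zeros([1, 784]))  # the same model definition works eagerly or in a graph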
Some examples that you may find useful:
MNIST in the tensorflow/models repository
The programmer's guide
Other eager execution samples (where the exact same model definition works for both graph execution and eager execution).
Not all existing TensorFlow examples have been moved to this style, but they slowly will be.
Hope that helps.

What is the difference of static Computational Graphs in tensorflow and dynamic Computational Graphs in Pytorch?

When I was learning TensorFlow, one of its basic concepts was the computational graph, and these graphs were said to be static.
I found that in PyTorch the graphs are said to be dynamic.
What is the difference between static computational graphs in TensorFlow and dynamic computational graphs in PyTorch?
Both frameworks operate on tensors and view any model as a directed acyclic graph (DAG), but they differ drastically on how you can define them.
TensorFlow follows the ‘data as code and code is data’ idiom. In TensorFlow you define the graph statically before a model can run. All communication with the outer world is performed via the tf.Session object and tf.placeholder tensors, which are substituted by external data at runtime.
In PyTorch things are far more imperative and dynamic: you can define, change and execute nodes as you go, with no special session interfaces or placeholders. Overall, the framework is more tightly integrated with the Python language and feels more native most of the time. Writing in TensorFlow can sometimes feel as if your model sits behind a brick wall with several tiny holes to communicate through. Still, this is more or less a matter of taste.
However, the approaches differ not only from a software engineering perspective: there are several neural network architectures that benefit from the dynamic approach. Recall RNNs: with static graphs, the input sequence length has to stay constant. This means that if you develop a sentiment analysis model for English sentences you must fix the sentence length to some maximum value and pad all shorter sequences with zeros. Not too convenient, huh? And you will run into more problems in the domain of recursive RNNs and tree-RNNs. Currently TensorFlow has limited support for dynamic inputs via TensorFlow Fold; PyTorch has it by default.
Reference:
https://medium.com/towards-data-science/pytorch-vs-tensorflow-spotting-the-difference-25c75777377b
https://www.reddit.com/r/MachineLearning/comments/5w3q74/d_so_pytorch_vs_tensorflow_whats_the_verdict_on/
Both TensorFlow and PyTorch allow specifying new computations at any point in time. However, TensorFlow has a "compilation" step which incurs a performance penalty every time you modify the graph. So TensorFlow's optimal performance is achieved when you specify the computation once, and then flow new data through the same sequence of computations.
It's similar to interpreters vs. compilers -- the compilation step makes things faster, but also discourages people from modifying the program too often.
To make things concrete, when you modify the graph in TensorFlow (by appending new computations using the regular API, or removing some computation using tf.contrib.graph_editor), this line is triggered in session.py. It will serialize the graph, and then the underlying runtime will rerun some optimizations, which can take extra time, perhaps 200 usec. In contrast, running an op in a previously defined graph, or in numpy/PyTorch, can be as low as 1 usec.
In TensorFlow you first have to define the graph, then you execute it.
Once defined, your graph is immutable: you can't add or remove nodes at runtime.
In PyTorch, instead, you can change the structure of the graph at runtime: you can add or remove nodes, dynamically changing its structure.
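As a rough sketch of the define-then-run style these answers describe (TensorFlow 1.x-style graph API; the values are arbitrary):

import tensorflow as tf  # TensorFlow 1.x graph mode

# Static graph: define the computation once...
x = tf.placeholder(tf.float32, shape=[None, 3])
y = tf.reduce_sum(x * 2.0)

# ...then flow data through it via a session.
with tf.Session() as sess:
    print(sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0]]}))  # 12.0

# In PyTorch (or with TensorFlow eager execution) the equivalent is computed
# immediately, with no separate definition and execution phases, e.g.:
#   y = (torch.tensor([[1.0, 2.0, 3.0]]) * 2.0).sum()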

Tensorflow - Why are there so many similar or even duplicate functions in tf.nn and tf.layers / tf.losses / tf.contrib.layers etc?

In Tensorflow (as of v1.2.1), it seems that there are (at least) two parallel APIs to construct computational graphs. There are functions in tf.nn, like conv2d, avg_pool, relu, dropout and then there are similar functions in tf.layers, tf.losses and elsewhere, like tf.layers.conv2d, tf.layers.dense, tf.layers.dropout.
Superficially, it seems that this situation only serves to confuse: for example, tf.nn.dropout uses a 'keep rate' while tf.layers.dropout uses a 'drop rate' as an argument.
Does this distinction have any practical purpose for the end-user / developer?
If not, is there any plan to cleanup the API?
TensorFlow offers, on the one hand, a low-level API (tf.*, tf.nn.*, ...), and on the other hand, a higher-level API (tf.layers.*, tf.losses.*, ...).
The goal of the higher-level API is to provide functions that greatly simplify the design of the most common neural nets. The lower-level API is there for people with special needs, or who wish to keep finer control over what is going on.
The situation is a bit confusing though, because some functions have the same or similar names, and there is no clear way to tell at first sight which namespace corresponds to which level of the API.
Now, let's look at conv2d for example. A striking difference between tf.nn.conv2d and tf.layers.conv2d is that the latter takes care of all the variables needed for weights and biases. A single line of code, and voilà, you just created a convolutional layer. With tf.nn.conv2d, you have to declare the weights variable yourself before passing it to the function. And as for the biases, they are actually not even handled: you need to add them yourself afterwards.
Add to that the fact that tf.layers.conv2d also lets you add regularization and activation in the same function call, and you can imagine how this reduces code size when one's needs are covered by the higher-level API.
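A rough illustrative sketch of that difference (TF 1.x-style APIs; the filter sizes are chosen arbitrarily):

import tensorflow as tf  # TensorFlow 1.x-style APIs

inputs = tf.placeholder(tf.float32, [None, 28, 28, 1])

# Low-level tf.nn: you declare the weights yourself, and the bias and
# activation are separate ops you add on top.
filters = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1))
biases = tf.Variable(tf.zeros([32]))
conv_nn = tf.nn.relu(
    tf.nn.conv2d(inputs, filters, strides=[1, 1, 1, 1], padding='SAME') + biases)

# Higher-level tf.layers: variables, bias and activation handled in one call.
conv_layers = tf.layers.conv2d(
    inputs, filters=32, kernel_size=5, padding='same', activation=tf.nn.relu)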
The higher level also makes some decisions by default that could be considered best practices. For example, losses in tf.losses are added to the tf.GraphKeys.LOSSES collection by default, which makes recovery and summation of the various components easy and somewhat standardized. If you use the lower-level API, you need to do all of that yourself. Obviously, you need to be careful when you start mixing low- and high-level API functions there.
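For instance, a tiny sketch of what the loss collection buys you (again TF 1.x-style API, with toy values):

import tensorflow as tf  # TensorFlow 1.x-style API

logits = tf.constant([[2.0, 0.5, 0.3]])
labels = tf.constant([0])

# Each tf.losses.* call adds its result to the tf.GraphKeys.LOSSES collection.
cross_entropy = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)

# get_total_loss() sums everything in that collection (plus regularization losses).
total_loss = tf.losses.get_total_loss()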
The higher-level API is also an answer to a great need from people who are used to similarly high-level functions in other frameworks, Theano aside. This is rather obvious when one considers the number of alternative higher-level APIs built on top of TensorFlow, such as Keras 2 (now part of the official TensorFlow API), slim (in tf.contrib.slim), TFLearn, TensorLayer, and the like.
Finally, if I may add a piece of advice: if you are beginning with TensorFlow and do not have a preference towards a particular API, I would personally encourage you to stick to the tf.keras.* API:
Its API is friendly and at least as good as the other high-level APIs built on top of the low-level TensorFlow API
It has a clear namespace within TensorFlow (although it can -- and sometimes should -- be used with parts from other namespaces, such as tf.data)
It is now a first-class citizen of TensorFlow (it used to be in tf.contrib.keras), and care is taken to make new TensorFlow features (such as eager) compatible with Keras.
Its generic implementation can use other toolkits such as CNTK, and so does not lock you into TensorFlow.

Tensorflow Define Op Polymorphic on Fully Defined vs not Fully Defined Shape

If I am defining a custom Op in TensorFlow, is it possible to provide two kernels for the Op that are polymorphic on whether the shapes of the inputs are fully defined? For example, I can construct certain structures once at kernel construction if the shape is fully known/defined.
It's not currently possible to do this. The kernel dispatch mechanism is implemented in a low-level part of the TensorFlow code where information about tensor shapes is not (generally) available.
However, the ability to specialize a graph based on known shapes does seem like a useful ability, and it might be worth raising this as a feature request on the GitHub issues page. One possible workaround would be to try registering an optimization pass that makes use of shape information and rewrites the names of ops with known input shapes to a different op that relies on static shape information (e.g. via an additional attr). However, doing this in TensorFlow currently requires you to rebuild from source.