report uninitialized variables Tensorboard - tensorflow

I am constructing a neural network in TensorFlow using the tf.layers module.
For some reason, the graph visualisation in TensorBoard shows a 'report_uninitialized_variables' node connected to every part of my graph.
Does anyone have an explanation for this? Is it related to the get_variable and variable_scope methods?
The graph seems to work; I am just trying to understand the meaning of these nodes. I am not sure whether it is related to the fact that I am using a MonitoredTrainingSession.
It seems to be connected to all the variables, including those of the optimizer.
https://i.stack.imgur.com/ySFM5.png
There is a sort of init node, but it seems to be a NoOp, so I am not sure whether proper initialization is done by the MonitoredTrainingSession. The strange thing is that the graph still works and no initialization error is raised. https://i.stack.imgur.com/umrRA.png

Did you use tf.train.Supervisor() in your code? I had the same situation when I used tf.train.Supervisor(). When a tf.train.Supervisor() object is created, it automatically verifies that the model is fully initialized by running the tf.report_uninitialized_variables() operation, which is why you see a report_uninitialized_variables block in TensorBoard. You can stop the Supervisor from verifying your model, so that there is no report_uninitialized_variables block in your graph.
Solution: tf.train.Supervisor(ready_op=None)
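To see the effect, here is a minimal sketch that builds a tiny graph twice, once with the default Supervisor and once with ready_op=None, and lists the op names in each. It assumes a TensorFlow install where the TF1-style API is reachable as tf.compat.v1; the variable "w" and the helper op_names are hypothetical and only serve the illustration.

```python
import tensorflow as tf

tf1 = tf.compat.v1  # TF1-style graph API (assumed to be available)


def op_names(**supervisor_kwargs):
    """Build a tiny graph, attach a Supervisor, and return all op names."""
    graph = tf1.Graph()
    with graph.as_default():
        tf1.get_variable("w", shape=[2, 2])  # hypothetical model variable
        # Creating the Supervisor is what adds the readiness check to the graph.
        tf1.train.Supervisor(graph=graph, logdir=None, **supervisor_kwargs)
    return [op.name for op in graph.get_operations()]


default_names = op_names()
silenced_names = op_names(ready_op=None)

# Ops that belong to the readiness check, if any.
marker = "report_uninitialized_variables"
print([n for n in default_names if marker in n][:3])
print([n for n in silenced_names if marker in n])
```

With ready_op=None the Supervisor no longer checks readiness itself, so initialization has to be guaranteed some other way (for example by the init_op it still runs).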

Related

What is difference between a regular model checkpoint and a saved model in tensorflow?

There's a fairly clear difference between a model and a frozen model. As described in model_files, relevant part: Freezing
...so there's the freeze_graph.py script that takes a graph definition and a set of checkpoints and freezes them together into a single file.
Is a "saved_model" most similar to a "frozen_model" (and not a saved "model_checkpoint")?
Is this defined somewhere in docs I'm missing?
In prior versions of tensorflow we would save and restore model weights, but this seems to be in context of a "model_checkpoint" not a "saved_model", is that still correct?
I'm asking more for the design overview here, not implementation specifics.
A checkpoint file contains only the variables of a specific model and should be loaded either with exactly the same predefined graph or with a specific assignment_map to load only chosen variables. See https://www.tensorflow.org/api_docs/python/tf/train/init_from_checkpoint
A SavedModel is broader because it also contains the graph, which can be loaded within a session so that training can be continued. A frozen graph, however, has its variables converted to constants when it is serialized and cannot be used to continue training.
You can find all the info here https://www.tensorflow.org/guide/saved_model
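The difference can be sketched in a few lines, assuming TensorFlow 2.x with eager execution (the question predates it, so treat this as an illustration rather than the API the answer had in mind). The Scale module is hypothetical. The checkpoint needs the Python class to rebuild the model before restoring, while the SavedModel can be loaded and called without it.

```python
import os
import tempfile

import tensorflow as tf


class Scale(tf.Module):
    """Hypothetical one-variable model: y = w * x."""

    def __init__(self):
        super().__init__()
        self.w = tf.Variable(3.0)

    @tf.function(input_signature=[tf.TensorSpec([], tf.float32)])
    def __call__(self, x):
        return self.w * x


workdir = tempfile.mkdtemp()
model = Scale()

# Checkpoint: variable values only. Restoring requires rebuilding the
# same object structure in code first.
ckpt_prefix = tf.train.Checkpoint(model=model).save(os.path.join(workdir, "ckpt"))
fresh = Scale()
fresh.w.assign(0.0)
tf.train.Checkpoint(model=fresh).restore(ckpt_prefix)
restored_w = float(fresh.w.numpy())

# SavedModel: variables plus the traced graph of __call__, so it can be
# loaded and run without the Scale class being importable.
export_dir = os.path.join(workdir, "saved")
tf.saved_model.save(model, export_dir)
loaded = tf.saved_model.load(export_dir)
result = float(loaded(tf.constant(2.0)).numpy())

print(restored_w, result)
print(sorted(os.listdir(export_dir)))
```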

Is an eager-graph compatible same code solution possible?

I am trying to write code that is both eager and graph compatible. However, there is very little information online on how to do this; it is literally a footnote on TensorFlow's website. Furthermore, what they have written is confusing:
The same code written for eager execution will also build a graph during graph execution. Do this by simply running the same code in a new Python session where eager execution is not enabled.
This implies that a same code solution is possible, where the only change required is the addition or removal of tf.enable_eager_execution().
Currently I use tf.keras to define my model and tf.data for my input pipeline. However, many eager operations don't work in graph, with the opposite also being true.
For example, I keep track of my number of epochs using tf.train.Checkpoint(). In eager mode, after restoring I can access it using epochs.numpy() to assign its value to a local variable. However, this does not work with graphs, which instead would require sess.run(epochs) due to the values not being defined during execution.
Again, to compute my gradients in eager I need to use some form of autograd, in my case tf.GradientTape(). This is not compatible with graphs, as "tf.GradientTape.gradients() does not support graph control flow."
I see that tfe.py_func exists, but once again, this only works when eager is not enabled, thus not helping for this problem.
So how do I make a same code solution, when it seems that many aspects of eager and graph directly conflict with each other?

How can I change the network dynamically in tensorflow?

I have a deep fully connected network.
I want to be able to change the structure of middle layers of the network dynamically.
What is the best way of doing that?
What I have done so far is create an output placeholder for my network. I thought I could create the network dynamically by using feed_dict. However, when I run it, I get:
`ValueError: No gradients provided for any variable, check your graph for ops that do not support gradients, between variables ... `
Tensorflow won't make this easy for you. Once you define the graph and open a session it's fixed. I believe you need to define a new graph, copy over your variables, and move on from there every time you want to alter the architecture. Kinda annoying for experimenting with this kind of stuff.
I have a friend/fellow researcher who's been experimenting with dynamic neural network architectures and is tackling this in pytorch, which has specific support for dynamically altering network architectures.
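The "define a new graph and copy over your variables" approach might look roughly like the sketch below. It assumes the TF1-style API via tf.compat.v1; the helper names, the variable names w0, w1, ... and the layer sizes are all hypothetical. Only variables whose name and shape match in both architectures are copied, and the rest keep their fresh initialization.

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1  # TF1-style graph API (assumed to be available)


def build_graph(layer_sizes, input_dim=4):
    """Build a fully connected ReLU stack in its own graph."""
    graph = tf1.Graph()
    with graph.as_default():
        h = tf1.placeholder(tf.float32, [None, input_dim], name="x")
        in_dim = input_dim
        for i, size in enumerate(layer_sizes):
            w = tf1.get_variable("w%d" % i, shape=[in_dim, size])
            h = tf.nn.relu(tf.matmul(h, w))
            in_dim = size
    return graph


def variable_values(sess):
    """Read every global variable of the default graph into a dict."""
    return {v.op.name: sess.run(v) for v in tf1.global_variables()}


# Old architecture: two hidden layers.
old_graph = build_graph([8, 8])
with old_graph.as_default(), tf1.Session() as sess:
    sess.run(tf1.global_variables_initializer())
    old_values = variable_values(sess)

# New architecture: a wider middle layer plus one extra layer.
new_graph = build_graph([8, 16, 8])
copied = []
with new_graph.as_default(), tf1.Session() as sess:
    sess.run(tf1.global_variables_initializer())
    for v in tf1.global_variables():
        old = old_values.get(v.op.name)
        # Copy only when both name and shape match.
        if old is not None and list(old.shape) == v.shape.as_list():
            sess.run(v.assign(old))
            copied.append(v.op.name)
    new_values = variable_values(sess)

print(copied)
```

Here only the first layer survives the change; how to initialize the reshaped or added layers is a separate design decision.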

skipping layer in backpropagation in keras

I am using Keras with tensorflow backend and I am curious whether it is possible to skip a layer during backpropagation but have it execute in the forward pass. So here is what I mean
Lambda (lambda x: a(x))
I want to apply a to x in the forward pass, but I do not want a to be included in the gradient computation when backpropagation takes place.
I was trying to find a solution but I could not find anything. Can somebody help me out here?
UPDATE 2
In addition to tf.py_func, there is now an official guide on how to add a custom op.
UPDATE
See this question for an example of writing a custom op with gradient purely in Python without needing to rebuild anything. Note that there are some limitations to the method (see the documentation of tf.py_func).
Not exactly a solution to the problem, but still kind of an answer and too long for comments.
That's not even a Keras issue, but a TensorFlow one. Each op defines its own gradient computation that is used during backpropagation. If you really wanted to do something like that, you would need to implement the op in TensorFlow yourself (no easy feat) and define the gradient that you want, because you can't have "no gradient"; if anything it would be 1 or 0 (otherwise you can't go on with backpropagation). There is a tf.NoGradient function in TensorFlow which causes an op to propagate zeros, but I don't think it is meant to be, or can be, used outside of TensorFlow's own internals.
UPDATE
Okay so a bit more of context. TensorFlow graphs are built of ops, which are implemented by kernels; this is basically a 1-to-1 mapping, except that there may be for example a CPU and a GPU kernel for an op, hence the differentiation. The set of ops supported by TensorFlow is usually static, I mean it can change with newer versions, but in principle you cannot add your own ops, because the ops of a graph go into the Protobuf serialized format, so if you made your own ops then you would not be able to share your graph. Ops are then defined at C++ level with the macro REGISTER_OP (see for example here), and kernels with REGISTER_KERNEL_BUILDER (see for example here).
Now, where do gradients come into play? Well, the funny thing is that the gradient of an op is not defined at C++ level; there are ops (and kernels) that implement the gradient of other ops (if you look at the previous files you'll find ops/kernels with the name ending in Grad), but (as far as I'm aware) these are not explicitly "linked" at this level. It seems that the associations between ops and their gradients are defined in Python, usually via tf.RegisterGradient or the aforementioned tf.NoGradient (see for example here; Python modules starting with gen_ are autogenerated with the help of the C++ macros); these registrations inform the backpropagation algorithm about how to compute the gradient of the graph.
So, how to actually work this out? Well, you need to create at least one op in C++ with the corresponding kernel(s) implementing the computation that you want for your forward pass. Then, if the gradient computation that you want to use can be expressed with existing TensorFlow ops (which is most likely), you would just need to call tf.RegisterGradient in Python and do the computation there in "standard" TensorFlow. This is quite complicated, but the good news is it's possible, and there's even an example for it (although I think they kinda forgot the gradient registration part in that one)! As you will see, the process involves compiling the new op code into a library (btw I'm not sure if any of this may work on Windows) that is then loaded from Python (obviously this involves going through the painful process of manual compilation of TensorFlow with Bazel). A possibly more realistic example can be found in TensorFlow Fold, an extension of TensorFlow for structured data that registers (as of now) one custom operation here through a macro defined here that calls REGISTER_OP, and then in Python it loads the library and registers its gradient here through their own registration function defined here that simply calls tf.NotDifferentiable (another name for tf.NoGradient).
tldr: It is rather hard, but it can be done and there are even a couple of examples out there.
As mentioned in @jdehesa's comments, you can implement your function with an "alternative gradient". Forgive me if my math is not correct, but I think a derivative returning "1" would be the correct way to have no effect on the backpropagation while still passing the learning through. For how to construct it, see here. The example I cited goes further and allows you to construct an activation function from a Python function. So in place of the spiky function, substitute your function a, and in place of its derivative d_spiky use
def constant(x):
    return 1
So in the forward pass, a is applied in the layer, and in the backward pass 1 is applied, which should simply pass the weight adjustments through.
You can then just create an Activation layer in Keras using this function.
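A minimal sketch of the same idea, using tf.custom_gradient instead of the py_func-based construction the answer links to (a swapped-in technique, assuming TensorFlow 2.x with eager execution). The function a below is a hypothetical stand-in (rounding, whose true gradient is zero almost everywhere), and a_with_identity_grad is a name invented for the illustration.

```python
import tensorflow as tf


def a(x):
    """Hypothetical stand-in for the function applied in the forward pass."""
    return tf.round(x)


@tf.custom_gradient
def a_with_identity_grad(x):
    """Forward: a(x). Backward: behave as if the local derivative were 1."""

    def grad(upstream):
        # Multiplying by 1 passes the incoming gradient through unchanged.
        return upstream

    return a(x), grad


x = tf.constant([0.2, 1.7, -2.4])
with tf.GradientTape() as tape:
    tape.watch(x)
    y = a_with_identity_grad(x)
    loss = tf.reduce_sum(3.0 * y)

grads = tape.gradient(loss, x)
print(y.numpy(), grads.numpy())
```

The forward values are the rounded inputs, while the gradient with respect to x is 3 everywhere, as if a were not there. In Keras, such a function could presumably be wrapped as tf.keras.layers.Lambda(a_with_identity_grad); that part is not exercised here.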

Advantages of naming operations in Tensorflow?

For operations in Tensorflow, we have the option to pick a name.
Example:
tf.argmin(input, dimension, name=None)
What does this do? Does it help with debugging? If so, how?
Defining names for ops and variables helps you build a logically correct graph.
You can then visualize your graph in TensorBoard and check that everything you defined is exactly as you intended.
In general, giving a name to a variable or an op is good practice. Further, when you export a graph and reuse it somewhere else, it is inconvenient to interact with the graph through the default names generated by TensorFlow. You will surely prefer to work with meaningful names.
Think about something like BatchNorm/relu:0 vs BatchNorm/network_output:0. The latter is clearer and describes exactly what you meant when you defined that operation.
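As a small sketch of the reuse case, assuming the TF1-style graph API via tf.compat.v1: a named op can be looked up later by its name alone, without holding on to the Python object that created it. The names "scores" and "best_class" are hypothetical.

```python
import tensorflow as tf

tf1 = tf.compat.v1  # TF1-style graph API (assumed to be available)

graph = tf1.Graph()
with graph.as_default():
    scores = tf1.placeholder(tf.float32, [None, 3], name="scores")
    best = tf.argmin(scores, axis=1, name="best_class")
    unnamed = tf.argmin(scores, axis=1)  # gets an auto-generated name

# The explicit name is stable and self-describing; the generated one is not.
print(best.name, unnamed.name)

# Later (for example after importing an exported graph) the tensors can be
# fetched by name only.
inputs = graph.get_tensor_by_name("scores:0")
fetched = graph.get_tensor_by_name("best_class:0")
with tf1.Session(graph=graph) as sess:
    result = sess.run(fetched, {inputs: [[3.0, 1.0, 2.0]]})
print(result)
```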