Integrating tfe.EagerVariableStore with tfe.Checkpoint? - tensorflow

tfe.Checkpoint seems to require things to be checkpointed to implement CheckpointableBase which EagerVariableStore doesn't.
What is the right way then to use EagerVariableStore to "eagerify" the functional parts of Tensorflow with ability to checkpoint?
Providing some working code would be appreciated.

For eagerifying functional code, I'd suggest tf.make_template rather than EagerVariableStore directly. When executing eagerly, this will create a variable store automatically (allowing variable reuse with tf.get_variable), and the object tf.make_template returns is checkpointable.
import tensorflow as tf
tf.enable_eager_execution()
def uses_functional_layers(x):
return tf.layers.dense(inputs=x, units=1)
save_template = tf.make_template("save_template", uses_functional_layers)
save_checkpoint = tf.train.Checkpoint(model=save_template)
save_template(tf.ones([1, 1]))
save_template.variables[0].assign([42.])
save_output = save_template(tf.ones([1, 1]))
save_path = save_checkpoint.save('/tmp/tf_template_ckpt')
So we make a function which wraps our functional layers / tf.get_variable usage, then make a template object out of that with tf.make_template, and finally can checkpoint that template object after it has been called once to create its variables.
An advantage of doing it this way is that we get restore-on-create for variables in the template, meaning the template is evaluated with the restored values the first time it is called:
import numpy
# Create a second template to load the checkpoint into
restore_template = tf.make_template("save_template", uses_functional_layers)
tf.train.Checkpoint(model=restore_template).restore(save_path)
numpy.testing.assert_allclose(
save_output,
restore_template(tf.ones([1, 1]))) # Variables are restored on creation
numpy.testing.assert_equal([42.], restore_template.variables[0].numpy())
Nested templates work too. Note that the template object strips its own variable_scope from variables created within it, but otherwise uses the full variable names (which may be more fragile than usual object-based checkpointing):
Looking up variables repeatedly with tf.get_variable (done each time the template is evaluated) is also quite slow, which is one reason TensorFlow is moving toward object-oriented Keras-style layers instead of functional layers.

I have found a "hackish" way, but works!
The main problem is:
tfe.EagerVariableStore doesn't inherit CheckpointableBase, hence it can't be saved with tfe.Checkpoint
The big idea is:
We are going to create a CheckpointableBase object that "points" to every variable stored in the tfe.EagerVariableStore
How to know what are stored in EagerVariableStore?
Reference: https://github.com/tensorflow/tensorflow/blob/r1.8/tensorflow/python/ops/variable_scope.py
It says that EagerVariableStore uses _VariableStore to store all the variables, via _store.
Now, the _VariableStore stores the variables in self._vars as a dictionary.
If we have a container = tfe.EagerVariableStore(), we can get all the variables via container._store._vars as a dictionary.
How to create a CheckpointableBase that points to every variable then?
We will use tfe.Checkpointable since it has __setattr__.
checkpointable = tfe.Checkpointable()
for k, v in container._store._vars.items():
setattr(checkpointable, k, v)
How to combine the two?
As we have a tfe.Checkpoint for saving, all we need to do is this:
saver = tfe.Checkpoint(checkpointable=checkpointable)
saver.save(...)
And saver.restore(...) to restore.
Your tfe.EagerVariableStore need not to be changed, the checkpointable after restored via tfe.Checkpoint will "replace" the values in tfe.EagerVariableStore automigically!

Related

Kernel's hyper-parameters; initialization and setting bounds

I think many other people like me might be interested in how they can use GPFlow for their special problems. The key is how GPFlow is customizable, and a good example would be very helpful.
In my case, I read and tried lots of comments in raised issues without any real success. Setting kernel model parameters is not straightforward (creating with default values, and then do it via the delete object method). Transform method is vague.
It would be really helpful if you could add an example showing. how one can initialize and set bounds of an anisotropic kernel model (length-scales values and bounds, variances, ...) and specially adding observations error (as an array-like alpha parameter)
If you just want to set a value, then you can do
model = gpflow.models.GPR(np.zeros((1, 1)),
np.zeros((1, 1)),
gpflow.kernels.RBF(1, lengthscales=0.2))
Alternatively
model = gpflow.models.GPR(np.zeros((1, 1)),
np.zeros((1, 1)),
gpflow.kernels.RBF(1))
model.kern.lengthscales = 0.2
If you want to change the transform, you either need to subclass the kernel, or you can also do
with gpflow.defer_build():
model = gpflow.models.GPR(np.zeros((1, 1)),
np.zeros((1, 1)),
gpflow.kernels.RBF(1))
transform = gpflow.transforms.Logistic(0.1, 1.))
model.kern.lengthscales = gpflow.params.Parameter(0.3, transform=transform)
model.compile()
You need the defer_build to stop the graph being compiled before you've changed the transform. Using the approach above, the compilation of the tensorflow graph is delayed (until the explicit model.compile()) so is built with the intended bounding transform.
Using an array parameter for likelihood variance is outside the scope of gpflow. For what it's worth (and because it has been asked about before), that particular model is especially problematic as it is not clear how test points are defined.
Setting kernel parameters can be done using the .assign() function, or through direct assignment. See the notebook https://github.com/GPflow/GPflow/blob/develop/doc/source/notebooks/understanding/tf_graphs_and_sessions.ipynb. You do not need to delete a parameter to assign a new value to it.
If you want to have per-datapoint noise, you will need to implement your own custom likelihood, which you can do by taking Gaussian likelihood in likelihoods.py as an example.
If by "bounds" you mean limiting the optimisation range for a parameter, you can use the Logistic transform. If you want to pass in a custom transformation for a parameter, you can pass a constructed Parameter object into constructors with a custom transform. Alternatively you can assign a newly created Parameter with a new transform to the model.
Here is more information on how to access and change GPflow parameters: viewing, getting and settings parameters documentation.
Extra bit for #user1018464 answer about replacing transform in existing parameter: changing transformation is a bit tricky, you can't change transformation once a model was compiled in TensorFlow.
E.g.
likelihood = gpflow.likelihoods.Gaussian()
likelihood.variance.transform = gpflow.transforms.Logistic(1., 10.)
----
GPflowError: Parameter "Gaussian/variance" has already been compiled.
Instead you have to reset GPflow object:
likelihood = gpflow.likelihoods.Gaussian() # All tensors compiled
likelihood.clear()
likelihood.variance.transform = gpflow.transforms.Logistic(2, 5)
likelihood.variance = 2.5
likelihood.compile()

How do I store and retrieve Tensors in a global namespace in Tensorflow?

I am trying to read a big Tensorflow project. For a project that the nodes of the computation graph are scattered around the project, I wonder if there is a way to store a Tensor node of the computation graph and add that node to the fetch list in sess.run?
For example, if I want to add probs at line 615 of
https://github.com/allenai/document-qa/blob/master/docqa/nn/span_prediction.py to a global namespace, is there a method like tf.add_node(probs, "probs"), and later I could get tf.get_node("probs"), just for the sake of conveniently passing node around the project.
A more general question would be, what will be a better idea to structure the tensorflow code and improve the efficiency of experimenting with different models.
Of course you can. To retrieve it later, you'll have to give it a name so that you can retrieve it by name. Take probs in your code as an example. It's created with tf.nn.softmax() function, the API for which is shown below.
tf.nn.softmax(
logits,
axis=None,
name=None,
dim=None
)
See the parameter name? You can add this parameter to line 615 like this:
probs = tf.nn.softmax(all_logits, name='my_tensor')
Later when you need it, you can call tf.Graph.get_tensor_by_name(name) to retrieve this tensor.
graph = tf.get_default_graph()
retrieved_probs = graph.get_tensor_by_name('my_tensor:0')
'my_tensor' is the name of the softmax operation, and ':0' should be added to the end of it meaning that you're retrieving the tensor instead of the operation. When calling Graph.get_operation_by_name(), no ':0' should be added.
You'll have to make sure that the tensor exists(it might be created in the code executed before this line, or it might be restored from a meta graph file). If it's created in a variable scope, you'll also have to add the scope name and a '/' in the front of the name param. For example, 'my_scope/my_tensor:0'.

Usage of tf.GraphKeys

In tensorflow, there's a class GraphKeys. I came across many codes, where it's been used. But it's not explained very well what's the usage of this class both in tensorflow documentation as well as in the codes, where it has been used.
Can someone please explain what's the usage of tf.GraphKey?
Thank you!
As far as I know, tf.GraphKeys is a collection of collections of keys for variables and ops in the graph. The usage (just as common python dictionaries) is to retrieve variables and ops.
Given that said, here are some subsets of tf.GraphKeys I came across:
GLOBAL_VARIABLES and LOCAL_VARIABLES contain all variables of the graph, which need to be initialized before training. tf.global_variables() returns the global variables in a list and can be used with tf.variables_initializer for initialization.
Variables created with option trainable=True will be added to TRAINABLE_VARIABLES and will be fetched and updated by any optimizer under tf.train during training.
SUMMARIES contains keys for all summaries added by tf.summary (scalar, image, histogram, text, etc). tf.summary.merge_all gathers all such keys and returns an op to be run and written to file so that you can visualize them on tensorboard.
Custom functions to update some variables can be added to UPDATE_OPS and separately run at each iteration using sess.run(tf.get_collection(tf.GraphKeys.UPDATE_OPS)). In this case, these variables are set trainable=False to avoid being updated by gradient descent.
You may create your own collections using tf.add_to_collection(some_name, var_or_op) and retrieve the variable or op later. You may retrieve specific variables or ops using tf.get_collection() and tweak the scope.

How do you list the variables in the graph in tensorflow?

This looks like it should work but it just prints an empty list after importing the pretrained Inception network: https://gist.github.com/tachim/6d44136171be86430dba16fecafa5872.
That shows contents of VARIABLES collection which does not get restored with import_graph_def. However, it does get restored when you import meta graph
You could go over Graph and show all names for ops of type Variable
[op.name for op in tf.get_default_graph().get_operations() if op.op_def and op.op_def.name=='Variable']
This gives you name of the variable op in the Graph, rather than Python object wrapping it, so you'd be limited to low-level Graph-based API. IE, you can fetch the variable value by using this name as argument to sess.run, but there's no convenient way to get to its initializer or assign_op.

Example to save and load a sub graph in TensorFlow?

I created a model within a variable/name scope and would like to store its graph definition and variables to disk to later load it without defining the graph again. How can I save and load all operations and variables within a given variable/name scope?
Conveniently, I would like to just use tf.Saver. Its save() has an option to store the meta graph but restore() does not seem to import it. Moreover, in the saver constructor, I can specify a list of parameters, but not a name scope to control what operations are saved.
There is also tf.train.write_graph() but I couldn't find an explanation of what it is and how it relates to the saver class and meta graph.