I'm working on a project which needs to evaluate the performance of CNN/RNN models after adding noise to all the variables. For example, with a simple MLP, I want to add random Gaussian noise to all the weight parameters, which is not difficult. However, it doesn't seem easy to manipulate the variables of an RNN: the variables inside tf.contrib.rnn.BasicLSTMCell are encapsulated and not accessible to users.
I found a possible way to do this using tf.train.Saver(): I can print all the variables, including the encapsulated ones. However, it is still not clear how to modify the values of all these variables.
Is there an easy way to do this?
You can use tf.trainable_variables (doc) or tf.global_variables (doc) to get those variables, and add noise to them.
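For instance, here is a minimal TF1-style sketch of that idea; the stddev value is just an illustrative choice:

import tensorflow as tf

# Build (or restore) the model first so its variables exist in the graph.
noise_ops = []
for var in tf.trainable_variables():  # also returns the LSTM cell's internal weights
    noise = tf.random_normal(tf.shape(var), stddev=0.01)
    noise_ops.append(tf.assign_add(var, noise))
add_noise = tf.group(*noise_ops)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(add_noise)  # perturb every trainable variable, then evaluate the model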
I use the Keras fit() method with custom metrics passed to the model.
The metrics are stateful, i.e. they are subclasses of Metric, as described in https://keras.io/api/metrics/#as-subclasses-of-metric-stateful
When I run the code in a multi-gpu environment using a tf.distribute.MirroredStrategy() my metric code is called on every GPU separately with batch_size/no_of_gpus examples passed, which is reasonable to expect.
What happens next is that multiple scalars (one from every GPU) of the metric value need to be reduced to a single scalar, and what I get all the time is a sum reduction, while I would like to control that.
Keep in mind that the reduction parameter belongs to Loss in Keras; there is no such thing in the Metric class: https://github.com/tensorflow/tensorflow/blob/acbc065f8eb2ed05c7ab5c42b5c5bd6abdd2f91f/tensorflow/python/keras/metrics.py#L87
(The only crazy thing I tried was to inherit from the Mean class, which is a subclass of Metric, but that didn't change anything.)
reduction is mentioned in the metrics code, but that is a reduction over multiple accumulated values within a single metric object. This is not what happens in the multi-GPU setting, where each metric works on its own GPU and is somehow aggregated at the end.
The way I debugged this behaviour was to print the shapes and results inside the update_state method of the metric, and then to look at the value of the metric in the logs object in the on_batch_end callback.
I tried looking at TF code, but couldn't find the place this is happening.
I would like to be able to control this behaviour, so that I can pick either 'mean' or 'sum' for the metric, or at least know where this is being done in the code.
Edited: I guess this https://github.com/tensorflow/tensorflow/issues/39268 sheds some more light on this issue
I am facing the same problem as you (and that's why I found your question).
Seeing that it's been 15 days since you asked the question and there are no answers/comments yet, I thought I might share my temporary workaround.
Like you, I also think that a SUM reduction is performed when combining progress over multiple GPUs. What I did was pass the number of GPUs (e.g. given by the num_replicas_in_sync attribute of your tf.distribute strategy object) into the __init__(...) constructor of the sub-classed metric object, and use it to divide the return value of the result() method.
Potentially, you could also use tf.distribute.get_strategy() from within the metric object to make it "strategy aware", and use the information to decide how to modify the values in an ad hoc manner so that the SUM reduction will produce what you want.
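As a minimal sketch of the first workaround (the ReplicaAwareMean class and its error statistic are invented for illustration):

import tensorflow as tf

class ReplicaAwareMean(tf.keras.metrics.Metric):
    def __init__(self, num_replicas, name="replica_aware_mean", **kwargs):
        super().__init__(name=name, **kwargs)
        self.num_replicas = num_replicas
        self.total = self.add_weight(name="total", initializer="zeros")
        self.count = self.add_weight(name="count", initializer="zeros")

    def update_state(self, y_true, y_pred, sample_weight=None):
        values = tf.cast(tf.abs(y_true - y_pred), self.dtype)  # placeholder statistic
        self.total.assign_add(tf.reduce_sum(values))
        self.count.assign_add(tf.cast(tf.size(values), self.dtype))

    def result(self):
        # Divide by the replica count to compensate for the implicit SUM reduction.
        return tf.math.divide_no_nan(self.total, self.count) / self.num_replicas

strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    metric = ReplicaAwareMean(strategy.num_replicas_in_sync)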
I hope this helps for now, whether as a suggestion or as a confirmation that you're not alone on this.
When implementing a subclass of the Keras Metric class, you have to override the merge_state() function correctly. If you do not override it, the default implementation is used, which is a simple sum.
See: https://www.tensorflow.org/api_docs/python/tf/keras/metrics/Metric
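A sketch of such an override (the LastValue metric is invented for illustration; the default merge_state would simply sum self.value across the metric objects):

import tensorflow as tf

class LastValue(tf.keras.metrics.Metric):
    def __init__(self, name="last_value", **kwargs):
        super().__init__(name=name, **kwargs)
        self.value = self.add_weight(name="value", initializer="zeros")

    def update_state(self, values, sample_weight=None):
        self.value.assign(tf.reduce_mean(tf.cast(values, self.dtype)))

    def result(self):
        return self.value

    def merge_state(self, metrics):
        # Average instead of summing, so the merged result keeps its scale.
        values = [self.value] + [m.value for m in metrics]
        self.value.assign(tf.add_n(values) / len(values))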
I think many other people like me might be interested in how they can use GPflow for their particular problems. The key question is how customizable GPflow is, and a good example would be very helpful.
In my case, I read and tried lots of comments in raised issues without any real success. Setting kernel model parameters is not straightforward (creating them with default values and then changing them via the delete-object method). The transform method is vague.
It would be really helpful if you could add an example showing how one can initialize and set bounds of an anisotropic kernel (length-scale values and bounds, variances, ...), and especially how to add observation error (as an array-like alpha parameter).
If you just want to set a value, then you can do
model = gpflow.models.GPR(np.zeros((1, 1)),
                          np.zeros((1, 1)),
                          gpflow.kernels.RBF(1, lengthscales=0.2))
Alternatively
model = gpflow.models.GPR(np.zeros((1, 1)),
                          np.zeros((1, 1)),
                          gpflow.kernels.RBF(1))
model.kern.lengthscales = 0.2
If you want to change the transform, you either need to subclass the kernel, or you can do the following:
with gpflow.defer_build():
    model = gpflow.models.GPR(np.zeros((1, 1)),
                              np.zeros((1, 1)),
                              gpflow.kernels.RBF(1))
    transform = gpflow.transforms.Logistic(0.1, 1.)
    model.kern.lengthscales = gpflow.params.Parameter(0.3, transform=transform)
model.compile()
You need defer_build to stop the graph being compiled before you've changed the transform. With the approach above, compilation of the TensorFlow graph is delayed (until the explicit model.compile()), so the graph is built with the intended bounding transform.
Using an array parameter for likelihood variance is outside the scope of gpflow. For what it's worth (and because it has been asked about before), that particular model is especially problematic as it is not clear how test points are defined.
Setting kernel parameters can be done using the .assign() function, or through direct assignment. See the notebook https://github.com/GPflow/GPflow/blob/develop/doc/source/notebooks/understanding/tf_graphs_and_sessions.ipynb. You do not need to delete a parameter to assign a new value to it.
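For example, a tiny sketch against the GPflow 1.x API, reusing the GPR model built earlier:

model.kern.lengthscales = 0.5        # direct assignment
model.kern.lengthscales.assign(0.5)  # equivalent, using .assign()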
If you want to have per-datapoint noise, you will need to implement your own custom likelihood, which you can do by taking Gaussian likelihood in likelihoods.py as an example.
If by "bounds" you mean limiting the optimisation range for a parameter, you can use the Logistic transform. If you want to pass in a custom transformation for a parameter, you can pass a constructed Parameter object into constructors with a custom transform. Alternatively you can assign a newly created Parameter with a new transform to the model.
Here is more information on how to access and change GPflow parameters: the viewing, getting and setting parameters documentation.
An extra bit for #user1018464's answer about replacing the transform of an existing parameter: changing the transform is a bit tricky, as you can't change it once the model has been compiled in TensorFlow.
E.g.
likelihood = gpflow.likelihoods.Gaussian()
likelihood.variance.transform = gpflow.transforms.Logistic(1., 10.)
----
GPflowError: Parameter "Gaussian/variance" has already been compiled.
Instead you have to reset the GPflow object:
likelihood = gpflow.likelihoods.Gaussian() # All tensors compiled
likelihood.clear()
likelihood.variance.transform = gpflow.transforms.Logistic(2, 5)
likelihood.variance = 2.5
likelihood.compile()
In TensorFlow, there's a class GraphKeys. I came across many codes where it's been used, but its usage is not explained very well, neither in the TensorFlow documentation nor in the codes using it.
Can someone please explain the usage of tf.GraphKeys?
Thank you!
As far as I know, tf.GraphKeys is a collection of standard names (keys) for the collections of variables and ops in the graph. The usage (just as with common Python dictionary keys) is to retrieve those variables and ops.
That said, here are some subsets of tf.GraphKeys I came across:
GLOBAL_VARIABLES and LOCAL_VARIABLES contain all variables of the graph, which need to be initialized before training. tf.global_variables() returns the global variables in a list and can be used with tf.variables_initializer for initialization.
Variables created with option trainable=True will be added to TRAINABLE_VARIABLES and will be fetched and updated by any optimizer under tf.train during training.
SUMMARIES contains keys for all summaries added by tf.summary (scalar, image, histogram, text, etc). tf.summary.merge_all gathers all such keys and returns an op to be run and written to file so that you can visualize them on tensorboard.
Custom ops that update some variables can be added to UPDATE_OPS and run separately at each iteration using sess.run(tf.get_collection(tf.GraphKeys.UPDATE_OPS)). In this case, these variables are created with trainable=False to avoid being updated by gradient descent.
You may create your own collections using tf.add_to_collection(some_name, var_or_op) and retrieve the variable or op later. You may retrieve specific variables or ops using tf.get_collection() and tweak the scope.
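To make the mechanics concrete, here is a small TF1-style sketch (the variable names are arbitrary):

import tensorflow as tf

w = tf.Variable(1.0, name="w")                   # added to GLOBAL_VARIABLES and TRAINABLE_VARIABLES
s = tf.Variable(0.0, trainable=False, name="s")  # added to GLOBAL_VARIABLES only
tf.add_to_collection("my_collection", w)         # a custom collection

print(tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES))  # [w]
print(tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES))     # [w, s]
print(tf.get_collection("my_collection"))                   # [w]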
Usually, if we define a function through a neural network in one class and later need that function's variable list in another class, in TensorFlow we can use tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="function name"). This is convenient and familiar to me, although I guess there are many other, more efficient ways to do so.
However, in some cases we may need to define a function that is built upon two different neural networks, say F(x) = F(NN_1(x), NN_2(x)). In another class, what is the right way to get the two variable lists of both NN_1() and NN_2()? Clearly, using tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="function name") here yields the mixed variable list of F(x) instead of the two separate variable lists of NN_1 and NN_2.
def function():
    with tf.name_scope("function_name"):
        with tf.name_scope("subfunction_name_1"):
            neural_network_1()  # build the first sub-network here
        with tf.name_scope("subfunction_name_2"):
            neural_network_2()  # build the second sub-network here
Within a tree of name scopes you can access the individual scope variables with:
vars_1 = tf.get_collection(
    tf.GraphKeys.TRAINABLE_VARIABLES, scope="function_name/subfunction_name_1")
vars_2 = tf.get_collection(
    tf.GraphKeys.TRAINABLE_VARIABLES, scope="function_name/subfunction_name_2")
When should I use one or another? Tutorials and examples use either Sequential([Stabilizer(), Recurrence(LSTM(hidden_dim))]) or LSTMP_component_with_self_stabilization from Examples/common/nn.py. I've tried replacing the former with Recurrence(LSTM(hidden_dim, enable_self_stabilization=True)) in the char_rnn.py example, but the results are significantly worse.
The Stabilizer layer multiplies its input by a learnable scalar. This simple trick has been shown to significantly improve convergence and stability. It has some similarity to BatchNormalization. Generally, when you can use BatchNormalization, you should try that first. Where that is not possible (specifically, inside recurrent loops), I recommend using Stabilizer instead.
Normally, you must inject it explicitly into your model. A special case are the recurrent step functions (e.g. LSTM), which include Stabilizers inside; use enable_self_stabilization=True to enable them. Those built-in Stabilizers only apply to internal variables; for the main input, you must insert a Stabilizer yourself.
If you include explicit Stabilizers but set enable_self_stabilization=False (e.g. as a default_option), then those explicit Stabilizers are no-ops.
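A sketch of the recommended combination, assuming a char_rnn-style setup like the one in the question (hidden_dim and num_labels are placeholder sizes):

from cntk.layers import Sequential, Stabilizer, Recurrence, LSTM, Dense

hidden_dim, num_labels = 256, 128  # placeholder sizes

# Explicit Stabilizer on the main input, plus self-stabilization
# of the LSTM's internal variables:
model = Sequential([
    Stabilizer(),
    Recurrence(LSTM(hidden_dim, enable_self_stabilization=True)),
    Dense(num_labels)
])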
It is not my experience that Stabilizer makes things worse; it is generally a sure-fire way to improve convergence. It does change numeric ranges, though, so if it makes convergence worse, I suggest experimenting with different hyper-parameter settings, e.g. reducing the learning rate.