Explanation for tf.contrib.framework.get_global_step() - tensorflow

I was trying out programming with tensorflow and I came across this function:
global_step = tf.contrib.framework.get_global_step()
Could anyone explain to me what exactly is happening here? I found this explanation in tensorflow's documentation, but it wasn't very clear to me.
global_step: An integer Variable representing the step counter to increment for each model training run. Can easily be created/incremented in TensorFlow via the get_global_step() function.
where get_global_step returns the global tensor.
Thank you very much!

It returns the global_step variable.
As far as I understand, this variable is used to keep track of the current (global) training step, i.e. when you pass it to optimizers, they will increment it everytime they do an update on the parameters.
From the definition of tf.train.Optimizer.minimize() you can see how this works:
global_step: Optional Variable to increment by one after the variables have been updated.
One other use case is to save a checkpoint:
saver.save(sess, FLAGS.train_dir, global_step=step)
PS:
You need to define it before you can call this function. i.e.: global_step = tf.Variable(0, name="global_step", trainable=False)
If you have multiple trainings within the same session, this variable will be incremented by all optimizers.

Related

How to get current global_step in data pipeline

I am trying to create a filter which depends on the current global_step of the training but I am failing to do so properly.
First, I cannot use tf.train.get_or_create_global_step() in the code below because it will throw
ValueError: Variable global_step already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope? Originally defined at:
This is why I tried fetching the scope with tf.get_default_graph().get_name_scope() and within that context I was able to "get" the global step:
def filter_examples(example):
scope = tf.get_default_graph().get_name_scope()
with tf.variable_scope(scope, reuse=tf.AUTO_REUSE):
current_step = tf.train.get_or_create_global_step()
subtokens_by_step = tf.floor(current_step / curriculum_step_update)
max_subtokens = min_subtokens + curriculum_step_size * tf.cast(subtokens_by_step, dtype=tf.int32)
return tf.size(example['targets']) <= max_subtokens
dataset = dataset.filter(filter_examples)
The problem with this is that it does not seem to work as I expected. From what I am observing, the current_step in the code above seems to be 0 all the time (I don't know that, just based on my observations I assume that).
The only thing that seems to make a difference, and it sounds weird, is restarting the training. I think, also based on observations, in that case current_step will be the actual current step of the training at this point. But the value itself won't update as the training continues.
If there a way to get the actual value of the current step and use it in my filter like above?
Environment
Tensorflow 1.12.1
As we discussed in the comments, having and updating your own counter might be an alternative to using the global_step variable. The counter variable could be updated as follows:
op = tf.assign_add(counter, 1)
with tf.control_dependencies(op):
# Some operation here before which the counter should be updated
Using tf.control_dependencies allows to "attach" the update of counter to a path within the computational graph. You can then use the counter variable wherever you need it.
If you use variables inside datasets you need to reinitilize iterators in tf 1.x.
iterator = tf.compat.v1.make_initializable_iterator(dataset)
init = iterator.initializer
tensors = iterator.get_next()
with tf.compat.v1.Session() as sess:
for epoch in range(num_epochs):
sess.run(init)
for example in range(num_examples):
tensor_vals = sess.run(tensors)

Why there is a need of using tf.Variable?

In the following code I am unable to understand the need of using tf.Variable? I get the same value whether I use tf.Variable or omit it.
`initial = tf.Variable(tf.truncated_normal(shape=[1,10,1], mean=0,
stddev=0.1,seed=123))`
As I answered in your another post, I will post it again, In tensorflow, anything that is created using tf.Variable(), will get updated during training in back-propagation, for example, a weight matrix.
Ideally, by default, every tf.Variable() becomes trainable unless you specify it non-trainable explicitly.
If you do this initial = tf.truncated_normal([5,10], mean=0, stddev=0.1), then tensorflow will not know that it's a trainable variable and hence it will not be trained. It will stay constant throughout the training.

Keras - coding a custom optimizer and attempting to compute a second gradient inside get_updates

I am a researcher in optimization and I trying to write a custom optimizer. I have come across a problem. I have asked in many places and so far no response.
Take any optimizer code, say just copy SGD. In the beginning of get_updates, you see
grads = self.get_gradients(loss, params)
now add the following line right after this one:
gradsb = self.get_gradients(loss, [tf.Variable(a) for a in params])
this should compute the gradients at a new tensor, with all the values the same as before
now try to see what you get:
for a in gradsb:
print(a)
you get a list of Nones (but if you print the list grads you see that they are still Tensors)
Why?
And how to circumvent this problem? This is important as I'd like to compute the gradients at another point for my algorithm.
When you write gradsb = self.get_gradients(loss, [tf.Variable(a) for a in params]) you are defining a new tf.Variable for each a in params. Because the loss does not depend on these new variables, your gradients are None.
If you want to compute a second gradient you need to make sure that you're computing it with respect to Tensors that the objective does depend on.
Apparently even replacing the current vector of parameters is not OK!! If I type this in the code:
grads = self.get_gradients(loss, params)
tempparam = [tf.Variable(a) for a in params]
params = [tf.add(a,a) for a in params]
gradsn = self.get_gradients(loss, params)
for a in gradsn:
print(a)
params = [tf.Variable(a) for a in tempparam]
The result is still that None is printed!!
I know you understand what I am trying to do, at each iteration of get_updates, I would like to compute the gradients at a (slightly) different value of the parameter tensors, and use that to construct the update to the parameters for optimization and training. Is there any way to do this within the keras package?

What does global_step mean in Tensorflow?

In this is tutorial code from TensorFlow website,
could anyone help explain what does global_step mean?
I found on the Tensorflow website written that global step is used count training steps, but I don't quite get what exactly it means.
Also, what does the number 0 mean when setting up global_step?
def training(loss,learning_rate):
tf.summary.scalar('loss',loss)
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
# Why 0 as the first parameter of the global_step tf.Variable?
global_step = tf.Variable(0, name='global_step',trainable=False)
train_op = optimizer.minimize(loss, global_step=global_step)
return train_op
According to Tensorflow doc global_step: increment by one after the variables have been updated. Does that mean after one update global_step becomes 1?
global_step refers to the number of batches seen by the graph. Every time a batch is provided, the weights are updated in the direction that minimizes the loss. global_step just keeps track of the number of batches seen so far. When it is passed in the minimize() argument list, the variable is increased by one. Have a look at optimizer.minimize().
You can get the global_step value using tf.train.global_step().
Also handy are the utility methods tf.train.get_global_step or tf.train.get_or_create_global_step.
0 is the initial value of the global step in this context.
The global_step Variable holds the total number of steps during training across the tasks (each step index will occur only on a single task).
A timeline created by global_step helps us understand know where we are in
the grand scheme, from each of the tasks separately. For instance, the loss and accuracy could be plotted against global_step on Tensorboard.
show you a vivid sample below:
code:
train_op = tf.train.GradientDescentOptimizer(learning_rate=LEARNING_RATE).minimize(loss_tensor,global_step=tf.train.create_global_step())
with tf.Session() as sess:
...
tf.logging.log_every_n(tf.logging.INFO,"np.mean(loss_evl)= %f at step %d",100,np.mean(loss_evl),sess.run(tf.train.get_global_step()))
corresponding print
INFO:tensorflow:np.mean(loss_evl)= 1.396970 at step 1
INFO:tensorflow:np.mean(loss_evl)= 1.221397 at step 101
INFO:tensorflow:np.mean(loss_evl)= 1.061688 at step 201
There are networks, e.g. GANs, that may need two (or more) different steps. Training a GANs with the WGAN specification requires that the steps on the discriminator (or critic) D are more than the ones done on the generator G. In that case, it is usefull to declare different global_steps variables.
Example: (G_lossand D_loss are the loss of the generator and the discriminator)
G_global_step = tf.Variable(0, name='G_global_step', trainable=False)
D_global_step = tf.Variable(0, name='D_global_step', trainable=False)
minimizer = tf.train.RMSPropOptimizer(learning_rate=0.00005)
G_solver = minimizer.minimize(G_loss, var_list=params, global_step=G_global_step)
D_solver = minimizer.minimize(D_loss, var_list=params, global_step=D_global_step)

Is it possible to make a trainable variable not trainable?

I created a trainable variable in a scope. Later, I entered the same scope, set the scope to reuse_variables, and used get_variable to retrieve the same variable. However, I cannot set the variable's trainable property to False. My get_variable line is like:
weight_var = tf.get_variable('weights', trainable = False)
But the variable 'weights' is still in the output of tf.trainable_variables.
Can I set a shared variable's trainable flag to False by using get_variable?
The reason I want to do this is that I'm trying to reuse the low-level filters pre-trained from VGG net in my model, and I want to build the graph like before, retrieve the weights variable, and assign VGG filter values to the weight variable, and then keep them fixed during the following training step.
After looking at the documentation and the code, I was not able to find a way to remove a Variable from the TRAINABLE_VARIABLES.
Here is what happens:
The first time tf.get_variable('weights', trainable=True) is called, the variable is added to the list of TRAINABLE_VARIABLES.
The second time you call tf.get_variable('weights', trainable=False), you get the same variable but the argument trainable=False has no effect as the variable is already present in the list of TRAINABLE_VARIABLES (and there is no way to remove it from there)
First solution
When calling the minimize method of the optimizer (see doc.), you can pass a var_list=[...] as argument with the variables you want to optimizer.
For instance, if you want to freeze all the layers of VGG except the last two, you can pass the weights of the last two layers in var_list.
Second solution
You can use a tf.train.Saver() to save variables and restore them later (see this tutorial).
First you train your entire VGG model with all trainable variables. You save them in a checkpoint file by calling saver.save(sess, "/path/to/dir/model.ckpt").
Then (in another file) you train the second version with non trainable variables. You load the variables previously stored with saver.restore(sess, "/path/to/dir/model.ckpt").
Optionally, you can decide to save only some of the variables in your checkpoint file. See the doc for more info.
When you want to train or optimize only certain layers of a pre-trained network, this is what you need to know.
TensorFlow's minimize method takes an optional argument var_list, a list of variables to be adjusted through back-propagation.
If you don't specify var_list, any TF variable in the graph could be adjusted by the optimizer. When you specify some variables in var_list, TF holds all other variables constant.
Here's an example of a script which jonbruner and his collaborator have used.
tvars = tf.trainable_variables()
g_vars = [var for var in tvars if 'g_' in var.name]
g_trainer = tf.train.AdamOptimizer(0.0001).minimize(g_loss, var_list=g_vars)
This finds all the variables they defined earlier that have "g_" in the variable name, puts them into a list, and runs the ADAM optimizer on them.
You can find the related answers here on Quora
In order to remove a variable from the list of trainable variables, you can first access the collection through:
trainable_collection = tf.get_collection_ref(tf.GraphKeys.TRAINABLE_VARIABLES)
There, trainable_collection contains a reference to the collection of trainable variables. If you pop elements from this list, doing for example trainable_collection.pop(0), you are going to remove the corresponding variable from the trainable variables, and thus this variable will not be trained.
Although this works with pop, I am still struggling to find a way to correctly use remove with the correct argument, so we don't depend on the index of the variables.
EDIT: Given that you have the name of the variables in the graph (you can obtain that by inspecting the graph protobuf or, what is easier, using Tensorboard), you can use it to loop through the list of trainable variables and then remove the variables from the trainable collection.
Example: say that I want the variables with names "batch_normalization/gamma:0" and "batch_normalization/beta:0" NOT to be trained, but they are already added to the TRAINABLE_VARIABLES collection. What I can do is:
`
#gets a reference to the list containing the trainable variables
trainable_collection = tf.get_collection_ref(tf.GraphKeys.TRAINABLE_VARIABLES)
variables_to_remove = list()
for vari in trainable_collection:
#uses the attribute 'name' of the variable
if vari.name=="batch_normalization/gamma:0" or vari.name=="batch_normalization/beta:0":
variables_to_remove.append(vari)
for rem in variables_to_remove:
trainable_collection.remove(rem)
`
This will successfully remove the two variables from the collection, and they will not be trained anymore.
You can use tf.get_collection_ref to get the reference of collection rather than tf.get_collection