What does tf.train.get_global_step() do in TensorFlow?

What is the use of the function tf.train.get_global_step() in TensorFlow?
In machine learning concepts what is it equivalent to?

You could use it to restart training exactly where you left off when the training procedure has been stopped for some reason. Of course you can always restart training without knowing the global_step (provided you save checkpoints regularly in your code), but unless you somehow keep track of how many iterations you have already performed, you will not know how many iterations are left after the restart. Sometimes you really want your model to be trained for exactly n iterations, not n plus an unknown number performed before the crash. So in my opinion, this is more of a practicality than a theoretical machine learning concept.

tf.train.get_global_step() returns the global step (a variable, a tensor from the variable node, or None) via get_collection(tf.GraphKeys.GLOBAL_STEP) or get_tensor_by_name('global_step:0').
The global step is widely used in learning rate decay (e.g. tf.train.exponential_decay; see Decaying the learning rate for more information).
You can pass the global step to the optimizer's apply_gradients or minimize method so that it is incremented by one on every training step.

Once you have defined the global step operation, you can get its value with sess.run(global_step_op).
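To make this concrete, here is a minimal sketch (assuming TensorFlow 1.x; the toy loss and the hyperparameters are illustrative only) that creates a global step, uses it for exponential learning rate decay, and lets the optimizer increment it on every step:
import tensorflow as tf

# Toy variable and loss, purely for illustration.
w = tf.Variable(5.0)
loss = tf.square(w)

# Create (or fetch) the global step and use it for learning rate decay.
global_step = tf.train.get_or_create_global_step()
learning_rate = tf.train.exponential_decay(
    learning_rate=0.1, global_step=global_step,
    decay_steps=1000, decay_rate=0.96, staircase=True)

optimizer = tf.train.GradientDescentOptimizer(learning_rate)
# Passing global_step here makes the optimizer increment it by one per step.
train_op = optimizer.minimize(loss, global_step=global_step)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(3):
        sess.run(train_op)
    # tf.train.get_global_step() retrieves the same variable from the collection.
    print(sess.run(tf.train.get_global_step()))  # prints 3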

Related

Defining assignment function as variable in tensorflow?

I am training a neural network by SGD (batch size = 1). The inputs are randomly generated, and the labels are calculated based on the input. In other words, the data does not have to be realistic, but the relationships between inputs and labels are specific. I will train my NN for only 1 epoch, but with many batches.
I have the following code:
training_input = tf.Variable(tf.zeros(...))
assign_training_input_with_random_values = training_input.assign(tf.random_normal(...))
# Create a session, initialize a bunch of variables, construct a neural network...
for batch in range(batch_number):
    sess.run(assign_training_input_with_random_values)
    # Train my neural network...
However, I noticed that if I write the above code differently, the speed goes down by a lot:
# Run the assignment operation directly without defining it as a variable
for batch in range(batch_number):
    sess.run(training_input.assign(tf.random_normal(...)))
    # Train my neural network...
The first snippet being significantly faster makes me worry that TensorFlow only randomizes when I define the assign_training_input_with_random_values variable, and that the same training example is then fed to the NN in every batch afterwards. In this case, the NN will probably not generalize well. Meanwhile, the second snippet is slow because it randomizes every batch. Is this actually the case, or is there another reason for this?
First, an explanation of your observations.
Computational difference between the 1st and 2nd solutions
It makes sense that your first solution is faster than the second. You define the assign operation once and then run that same op in every epoch. In the 2nd solution, however, you create a new op in every epoch, growing the computational graph over time, which causes your program to slow down.
Observation about the 1st solution
(After @Y.Z.'s finding) Apparently the first solution does evaluate to different random number arrays every time you run it. Therefore, the first solution is also valid.
Another way to implement this
Another way to implement this is to use a tf.placeholder to feed new values in every epoch, as follows.
import tensorflow as tf
import numpy as np
training_input = tf.Variable(tf.zeros(shape=[3, 2]))
tf_random = tf.placeholder(shape=[3, 2], dtype=tf.float32)
assign_training_input_with_random_values = training_input.assign(tf_random)
# Create a session, initialize a bunch of variables, construct a neural network...
epoch = 0
with tf.Session() as sess:
    while epoch < 10:
        epoch += 1
        sess.run(assign_training_input_with_random_values,
                 feed_dict={tf_random: np.random.normal(size=(3, 2))})
Comparing Solution 1 vs My solution
So it turns out that neither your first solution nor my solution grows the graph. If you run the line
print([n.name for n in tf.get_default_graph().as_graph_def().node])
for your first solution and for my solution (be careful to run tf.reset_default_graph() at the beginning), you'll see that the number of nodes remains constant regardless of the number of iterations. It appears that TensorFlow is smart enough to prune old tf.random tensors that are no longer used.
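For contrast, here is a small sketch (the shapes are illustrative only) showing how the second pattern does grow the graph, since a new assign op is created on each pass through the loop:
import tensorflow as tf

tf.reset_default_graph()
training_input = tf.Variable(tf.zeros(shape=[3, 2]))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(3):
        # Creating the assign op inside the loop adds new nodes every iteration.
        sess.run(training_input.assign(tf.random_normal(shape=[3, 2])))
        print(len(tf.get_default_graph().as_graph_def().node))  # keeps increasing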

Tensorflow: When to stop training due to the best (minimum) cost?

I am using TensorFlow to classify images using the LeNet network. I use AdamOptimizer to minimize the cost function. When I start to train the model, I can observe that the training accuracy, the validation accuracy and also the cost are changing, sometimes decreasing and sometimes increasing.
My questions: When should we stop the training? How can we know that the optimizer will find the minimum cost? For how many iterations should we train? Can we set a variable or condition to stop at the minimum cost?
My solution is to define a global variable (min_cost) and in each iteration check whether the cost is decreasing; if so, save the session and replace min_cost with the new cost. At the end, I will have the saved session for the minimum cost.
Is this a correct approach?
Thanks in advance,
When training neural networks, usually a target error is defined along with a maximum number of iterations to train. For example, a target error could be 0.001 MSE. Once this error has been reached, the training will stop; if this error has not been reached after the maximum number of iterations, the training will also stop.
But it seems like you want to train until you know the network can't do any better. Saving the 'best' parameters like you're doing is a fine approach, but do realise that once some kind of minimum cost has been reached, the error won't fluctuate that much anymore. It won't be like the error suddenly goes up significantly, so it is not completely necessary to save the network.
There is no such thing as a 'minimal cost': the network is always trying to go to some local minimum, and it will always be doing so. There is no real way you (or an algorithm) can figure out that there is no better error to be reached anymore.
tl;dr: just set a reasonable target error along with a maximum number of iterations.
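As a concrete illustration, here is a minimal sketch (TF 1.x; the toy quadratic loss, the hyperparameters and the checkpoint path are placeholders, not taken from the question) that combines a target error, an iteration cap, and saving the best parameters seen so far:
import tensorflow as tf

# Toy model and loss, purely for illustration.
w = tf.Variable(3.0)
cost = tf.square(w)
train_op = tf.train.AdamOptimizer(0.1).minimize(cost)

target_error = 1e-3
max_iterations = 10000
saver = tf.train.Saver()
best_cost = float('inf')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(max_iterations):
        _, current_cost = sess.run([train_op, cost])
        if current_cost < best_cost:
            best_cost = current_cost
            saver.save(sess, './best_model')  # keep the best parameters so far
        if current_cost < target_error:
            break  # target error reached before the iteration cap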

Why and when do I need to use the global step in tensorflow

I am using TensorFlow, but I am not sure why I even need the global_step variable or if it is even necessary for training. I have something like this:
gradients_and_vars = optimizer.compute_gradients(value)
train_op = optimizer.apply_gradients(gradients_and_vars)
and then in my loop inside a session I do this:
_ = sess.run([train_op])
I am using a Queue to feed my data to the graph. Do I even have to instantiate a global_step variable?
My loop looks like this:
while not coord.should_stop():
So this loop stops when it should stop. So why do I need the global_step at all?
You don't need the global step in all cases. But sometimes people want to stop training, tweak some code and then continue training with the saved and restored model. Then it is often nice to know how long (= for how many time steps) this model has been trained so far. Thus the global step.
Also, sometimes your learning rate regime might depend on how long the model has already been trained. Say you want to decay your learning rate every 100,000 steps. This is difficult to get right if you interrupt training in between and don't keep track of the number of steps already taken.
Furthermore, if you are using TensorBoard, the global step is the central parameter for the x-axis of your charts.
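Here is a minimal sketch (TF 1.x; the checkpoint directory is a placeholder, and the explicit increment op stands in for the increment that optimizer.minimize(..., global_step=...) would normally do) showing how the global step survives a save/restore cycle, so a resumed run knows how many steps it has already trained:
import tensorflow as tf

global_step = tf.train.get_or_create_global_step()
increment_step = tf.assign_add(global_step, 1)
saver = tf.train.Saver()

with tf.Session() as sess:
    ckpt = tf.train.latest_checkpoint('./checkpoints')
    if ckpt:
        saver.restore(sess, ckpt)  # resumes with the saved step count
    else:
        sess.run(tf.global_variables_initializer())
    for _ in range(100):
        step = sess.run(increment_step)
    saver.save(sess, './checkpoints/model', global_step=global_step)
    print('trained up to step', step)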

Convergence in Logistic Regression in distributed tensorflow

I'm trying to develop logistic regression in distributed tensorflow and I want to integrate a convergence check in my algorithm apart from the upper bound of iterations. The convergence criteria I am about to use is
||prevW - currW|| < E
where prevW is the previous values of the model weights and currW the current ones. E is the convergence tolerance.
My question is about the previous model weights. Since I am using between-graph replication and asynchronous training, I don't know when each worker of the cluster will update the weights. So let's say a worker has computed the new weights using a batch and wants to check whether the algorithm has converged in order to stop. Should I use the weights available in the local replica (i.e. use the corresponding tensor), or should I evaluate the tensor to get the last updated value before I continue with the current computation? I tried to do as described above, but the algorithm did not converge and stopped only after the upper bound on the iterations was reached.
Thank you beforehand for your help :D
I would do the convergence check in the same device where the variables are. This way you avoid copying too much stuff over the network. This can be done by putting it in a with tf.device(variable.device): block.
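For illustration, here is a minimal sketch of that idea (the weight shape, names and tolerance are placeholders for the question's setup): keep a copy of the previous weights on the same device as the variable and do the ||prevW - currW|| < E check there.
import tensorflow as tf

W = tf.Variable(tf.zeros([10, 1]), name='weights')
epsilon = 1e-4

with tf.device(W.device):
    # The previous-weights copy and the norm both live on the variable's device,
    # so the comparison does not pull the tensors over the network.
    prev_W = tf.Variable(tf.zeros([10, 1]), trainable=False, name='prev_weights')
    converged = tf.norm(W - prev_W) < epsilon
    remember_W = tf.assign(prev_W, W)  # run this after each convergence check
A worker would then run something like has_converged, _ = sess.run([converged, remember_W]) once per check, and stop when has_converged is True or the iteration cap is hit.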

What caching model does TensorFlow use?

I read the question here
TensorFlow - get current value of a Variable
and the answer has left me confused.
On one hand, dga says "And to be very clear: Running the variable will produce only the current value of the variable; it will not run any assign operations associated with it. It's cheap."
On the other hand, Salvador Dali says "@dga yes, if the variable depends on n other variables, they also need to be evaluated."
So, which is it? Does evaluating the variable only return its current value, or does it recompute its value from scratch from the variables it depends on?
What happens if I evaluate the same variable twice in a row? Does TensorFlow have any notion of "stale" variables, i.e. variables that need to be recomputed because their dependencies actually changed (like in a build system)?
I ask because I work with multiple nets where the partial output of one net becomes the partial input of another net. I want to fetch the gradients computed at the input layer of one net and merge and apply them to the output layer of another net. I was hoping to do this by manually retrieving/storing gradients in the variables of a graph, and then running graph operations to backpropagate the gradients. Thus I need to understand how it all works under the hood.
What I do is similar to this: How to use Tensorflow Optimizer without recomputing activations in reinforcement learning program that returns control after each iteration?, but I can't conclude whether it's possible based on the last answer (experimental support now in?)
Thanks!
@dga is correct. If you pass a tf.Variable object to tf.Session.run(), TensorFlow will return the current value of the variable, and it will not perform any computation. It is cheap (the cost of a memory copy, or possibly a network transfer in the case of a distributed TensorFlow setup). TensorFlow does not retain any history* about how the value of a tf.Variable was updated, so it cannot in general recompute its value from scratch.
(* Technically TensorFlow remembers the tf.Tensor that was used to initialize each variable, so it is possible to recompute the initial value of the variable.)
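A small sketch illustrating this behaviour (TF 1.x): fetching the variable returns its current value, and the assign op runs only when it is explicitly requested:
import tensorflow as tf

v = tf.Variable(1.0)
assign_op = v.assign(v + 1.0)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(v))          # 1.0 -- reading the variable runs no assigns
    print(sess.run(v))          # still 1.0
    print(sess.run(assign_op))  # 2.0 -- the assign happens only when fetched
    print(sess.run(v))          # 2.0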