I was working on hyperparameter optimization for a neural network, running the model for 20 epochs. After figuring out the best hyperparameters, I ran the same model again on its own (no hyperparameter optimization this time) but got different results. Not only that: the accuracy reached during hyperparameter optimization occurred at the last (20th) epoch, whereas when I ran the same model again, that accuracy was not reached until around 200 epochs, and even then the values were slightly lower. Below is the figure:
Therefore, I would like to know which random seed TensorFlow chose at that moment. That is, I am not interested in setting the random seed to some constant; I would like to see what was chosen by TensorFlow.
Your help is much appreciated!!
This question is very similar, but it does not have an answer; see its comments thread. In general, you cannot "extract the seed" at an arbitrary point in time, because there is no seed anymore once the RNG has started working.
If you just want to see the initial seed, you need to understand that there are graph-level and op-level seeds (see tf.set_random_seed, and the implementation in random_seed.py):
If both are set, then they are combined to produce the actual seed.
If the graph seed is set but the op seed is not, the seed is determined deterministically from the graph seed and the "op id".
If the op seed is set but the graph seed is not, then a default graph seed is used.
If neither is set, a random seed is produced. To see where this comes from, look at GuardedPhiloxRandom, which provides the two numbers that are finally used by PhiloxRandom. When no seed at all is provided, it picks two random values generated from /dev/urandom, as seen in random.cc.
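For reference, here is a rough, simplified paraphrase of that combination logic. The constant 87654321 matches the default graph seed visible in the output further down; the function and argument names are illustrative, not the library's actual API:

DEFAULT_GRAPH_SEED = 87654321

def combine_seeds(graph_seed, op_seed, op_id):
    # Graph seed set: fall back to an op-specific id when no op seed is given
    if graph_seed is not None:
        return graph_seed, (op_seed if op_seed is not None else op_id)
    # Only the op seed set: pair it with the default graph seed
    if op_seed is not None:
        return DEFAULT_GRAPH_SEED, op_seed
    # Neither set: the kernel draws entropy from /dev/urandom at run time
    return None, None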
You can actually see these seeds, by the way, when they are set. You just need to access the specific random operation you are interested in and read its seed and seed2 attributes. Note that TensorFlow's public functions return the result of a few extra operations (scaling, shifting), so you have to "climb up" the graph a bit to get to the interesting one:
import tensorflow as tf

def print_seeds(random_normal):
    # Get to the random TensorFlow op (RandomStandardNormal) and print seeds
    random_op = random_normal.op.inputs[0].op.inputs[0].op
    print(random_op.get_attr('seed'), random_op.get_attr('seed2'))

print_seeds(tf.random_normal(()))
# 0 0
print_seeds(tf.random_normal((), seed=100))
# 87654321 100
tf.set_random_seed(200)
print_seeds(tf.random_normal(()))
# 200 15
print_seeds(tf.random_normal((), seed=300))
# 200 300
Unfortunately, when the seed is left unspecified there is no way to retrieve the random values that TensorFlow generated. The two random numbers are passed to PhiloxRandom, which uses them to initialize its internal key_ and counter_ variables, and those cannot be read back out in any way.
Related
The documentation for PolynomialDecay suggests that, by default, frequency=100, so that pruning is only applied every 100 steps. This presumably means that the parameters which are pruned to 0 will drift away from 0 during the other 99 of every 100 steps. So at the end of the pruning process, unless you are careful to train for an exact multiple of 100 steps, you will end up with a model that is not perfectly pruned but instead has a large number of near-zero values.
How does one stop this from happening? Do you have to tweak frequency to be a divisor of the number of steps? I can't find any code samples that do that...
As per this example in the docs, the tfmot.sparsity.keras.UpdatePruningStep() callback must be registered during training:
callbacks = [
    tfmot.sparsity.keras.UpdatePruningStep(),
    …
]

model_for_pruning.fit(…, callbacks=callbacks)
This ensures that the mask is applied (and so the pruned weights are set to zero) when training ends.
https://github.com/tensorflow/model-optimization/blob/master/tensorflow_model_optimization/python/core/sparsity/keras/pruning_callbacks.py#L64
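For context, a minimal end-to-end sketch of how that callback is usually wired up. Everything here besides the tfmot calls (base_model, x_train, y_train, the schedule parameters) is a placeholder, not taken from the question:

import tensorflow as tf
import tensorflow_model_optimization as tfmot

pruning_params = {
    'pruning_schedule': tfmot.sparsity.keras.PolynomialDecay(
        initial_sparsity=0.0, final_sparsity=0.5,
        begin_step=0, end_step=1000)
}
model_for_pruning = tfmot.sparsity.keras.prune_low_magnitude(base_model, **pruning_params)
model_for_pruning.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

# The callback keeps the pruning step, and therefore the mask, up to date during fit()
callbacks = [tfmot.sparsity.keras.UpdatePruningStep()]
model_for_pruning.fit(x_train, y_train, epochs=2, callbacks=callbacks)

# strip_pruning removes the pruning wrappers so the exported model keeps only
# the final masked weights
final_model = tfmot.sparsity.keras.strip_pruning(model_for_pruning)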
What is the difference between the Global Seed and the Operation Seed in TensorFlow?
According to the TensorFlow documentation:
While explaining the Global Seed, they mention this:
If the global seed is set but the operation seed is not set, we get different results for every call to the random op, but the same sequence for every re-run of the program:
and while explaining the Operation Seed, they state something similar:
If the operation seed is set, we get different results for every call to the random op, but the same sequence for every re-run of the program:
What are the main differences between the two, and how do they operate at an intuitive level?
Thanks.
Here is a good description of the differences: https://www.kite.com/python/docs/tensorflow.set_random_seed
In short, tf.random.set_seed (or tf.set_random_seed) guarantees that all operations will produce repeatable results across sessions. It does so by setting the operation seed deterministically for every operation.
Setting an operation seed only makes sense as part of an operation definition, e.g. tf.random_uniform([1], seed=1), and will also lead to the same sequences being produced by that op across sessions.
What is the difference?
The graph seed makes all ops deterministic and repeatable. Use it if you want to fix all ops at once. Different ops will still produce different sequences from each other (but the same ones across sessions).
The operation seed makes a single operation deterministic. You can create two ops that will produce the same sequence as each other.
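A tiny sketch of the distinction in TF 1.x graph mode (matching the function names above); the shapes and seed values are arbitrary:

import tensorflow as tf

tf.set_random_seed(42)              # graph-level seed
a = tf.random_uniform([2])          # op seed derived from the graph seed and op id
b = tf.random_uniform([2])          # different sequence from `a`, but both repeat across sessions

c = tf.random_uniform([2], seed=1)  # explicit operation seed
d = tf.random_uniform([2], seed=1)  # same op seed, so same sequence as `c`

with tf.Session() as sess:
    print(sess.run([a, b]))         # a and b differ
    print(sess.run([c, d]))         # c and d are identical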
I am trying to get deterministic behaviour from tf.train.shuffle_batch(). I could instead use tf.train.batch(), which works fine (always the same order of elements), but I need to get examples from multiple tf-records, so I am stuck with shuffle_batch().
I am using:
random.seed(0)
np.random.seed(0)
tf.set_random_seed(0)
data_entries = tf.train.shuffle_batch(
    [data], batch_size=batch_size, num_threads=1, capacity=512,
    seed=57, min_after_dequeue=32)
But every time I restart my script I get slightly different results (not completely different, but about 20% of the elements are in the wrong order).
Is there anything I am missing?
Edit: Solved it! See my answer below!
Maybe I misunderstood something, but you can collect multiple tf-records in a queue with tf.train.string_input_producer(), then read the examples into tensors and finally use tf.train.batch().
Take a look at CIFAR-10 input.
Answering my own question:
First, the reason shuffle_batch is non-deterministic:
The time until I request a batch is inherently random.
In that time, a random number of tensors are available.
TensorFlow calls a shuffle operation that is seeded, but depending on the number of items available it will return a different order.
So no matter the seeding, the order is always different unless the number of elements is constant. The solution is therefore to keep the number of elements constant, but how do we do that?
By setting capacity=min_after_dequeue+batch_size. This forces TensorFlow to fill the queue up to full capacity before dequeuing an item. Therefore, at the time of the shuffle operation, we always have capacity items, which is a constant number.
So why are we doing this at all? Because one tf.record contains many examples, but we want examples from multiple tf.records. With a normal batch we would first get all the examples of one record and only then those of the next one. This also means we should set min_after_dequeue to something larger than the number of items in one tf.record. In my example, I have 50 examples in one file, so I set min_after_dequeue=2048.
Alternatively, we could also shuffle the examples before creating the tf.records, but this was not possible for me because I read tf.records from multiple directories (each with its own dataset).
Last note: to be extra safe, you should also use a batch size of 1.
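Putting the pieces together, a sketch of the adjusted call under those assumptions (the exact min_after_dequeue value depends on how many examples each of your tf.records holds):

batch_size = 1                    # as recommended in the last note
min_after_dequeue = 2048          # larger than the number of examples in one tf.record

data_entries = tf.train.shuffle_batch(
    [data], batch_size=batch_size, num_threads=1,
    capacity=min_after_dequeue + batch_size,
    min_after_dequeue=min_after_dequeue, seed=57)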
I am modeling a perceptual process in TensorFlow. In the setup I am interested in, the modeled agent is playing a resource game: it has to choose 1 out of n resources, relying only on the label that a classifier gives to each resource. Each resource is an ordered pair of two reals. The classifier only sees the first real, but payoffs depend on the second. There is a function taking the first to the second.
Anyway, ideally I'd like to train the classifier in the following way:
In each run, the classifier gives labels to n resources.
The agent then gets the payoff of the resource corresponding to the highest label in some predetermined ranking (say, A > B > C > D), with ties broken randomly.
The loss is taken to be the normalized absolute difference between the payoff thus obtained and the maximum payoff in the set of resources, i.e. (Payoff_max - Payoff) / Payoff_max.
For this to work, one needs to run inference n times, once for each resource, before calculating the loss. Is there a way to do this in tensorflow? If I am tackling the problem in the wrong way feel free to say so, too.
I don't have much knowledge of the ML aspects of this, but from a programming point of view I can see two ways of doing it. One is to copy your model n times. All the copies can share the same variables. The output of all of these copies would go into some function that determines the highest label. As long as this function is differentiable, the variables are shared, and n is not too large, it should work. You would need to feed all n inputs together. Note that backprop will run through each copy and update your weights n times. This is generally not a problem, but if it is, I have heard about some fancy tricks one can do by using partial_run.
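A rough sketch of that shared-copies idea; the single dense layer, n = 3, and the input shapes below are placeholders standing in for the real classifier (TF 1.x assumed):

import tensorflow as tf

# One classifier object means one set of weights, reused by every "copy"
classifier = tf.layers.Dense(units=4)   # 4 labels, say A > B > C > D

inputs = [tf.placeholder(tf.float32, [None, 1]) for _ in range(3)]  # n = 3 resources
outputs = [classifier(x) for x in inputs]   # same weights applied to each input

stacked = tf.stack(outputs, axis=0)     # [n, batch, num_labels], feeds the label/payoff logic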
Another way is to use tf.while_loop. It is pretty clever: it stores the activations from each run of the loop and can do backprop through them. The only tricky part should be accumulating the inference results before feeding them to your loss. Take a look at TensorArray for this. This question can be helpful: Using TensorArrays in the context of a while_loop to accumulate values
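And a minimal sketch of the tf.while_loop + TensorArray route; here classify stands for the (variable-sharing) classifier and resources for an [n, ...] tensor of inputs, both placeholders:

import tensorflow as tf

n = tf.shape(resources)[0]

def body(i, acc):
    logits = classify(resources[i])     # the classifier reuses the same variables each iteration
    return i + 1, acc.write(i, logits)

_, acc = tf.while_loop(
    cond=lambda i, _: i < n,
    body=body,
    loop_vars=(tf.constant(0), tf.TensorArray(tf.float32, size=n)))

all_logits = acc.stack()                # [n, num_labels], ready for the loss computation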
Given that I train a model, save it off with a metagraph/save.Saver, and then load that graph into a new script/process to test against test data, what is the best way to make sure I only iterate over the test data once?
With my training data, I want to be able to iterate over the entire data set for an arbitrary number of iterations. I use
tf.train.string_input_producer()
to drive a queue of files to load for training, so I can safely leave num_epochs at its default (None) and let other controls drive training termination.
However, when I run the graph for evaluation, I just want to evaluate the test set once (and gather the appropriate statistics).
Initial attempted solution:
Make a tensor for the number of epochs, pass that into tf.train.string_input_producer, and then tf.assign it to the appropriate value based on test/train.
But:
tf.train.string_input_producer only takes an integer as num_epochs, so this isn't possible... unless I'm missing something.
Further notes: I use
tf.train.batch()
to read in test/train data that has been serialized into protocol buffers (https://www.tensorflow.org/versions/r0.11/how_tos/reading_data/index.html#file-formats), so I have minimal visibility into how the data is loaded and how far along it is.
tf.train.batch apparently throws tf.errors.OutOfRangeError, but I'm not clear on how to catch that successfully, or whether that is even what I really want to do. I tried a very naive
try...except...finally
(like in https://www.tensorflow.org/versions/r0.11/how_tos/reading_data/index.html#creating-threads-to-prefetch-using-queuerunner-objects), which didn't catch the error from tf.train.batch.
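For reference, the prefetch-thread pattern in that linked guide looks roughly like the sketch below. eval_op is a placeholder for whatever the evaluation actually runs; the point is that the OutOfRangeError surfaces from sess.run() on the dequeue (once the producer, built with a finite num_epochs, runs dry), so that is where the except clause has to sit:

coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
try:
    while not coord.should_stop():
        sess.run(eval_op)            # accumulate test statistics here
except tf.errors.OutOfRangeError:
    print('Done evaluating -- input queue is exhausted')
finally:
    coord.request_stop()
coord.join(threads)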