Global Seed and Operation Seed in TensorFlow 2 - tensorflow

What is the difference between the global seed and the operation seed in TensorFlow?
According to the TensorFlow documentation, while explaining the global seed, they mention this:
If the global seed is set but the operation seed is not set, we get different results for every call to the random op, but the same sequence for every re-run of the program.
And while explaining the operation seed, they state something similar:
If the operation seed is set, we get different results for every call to the random op, but the same sequence for every re-run of the program.
What are the main differences between the two, and how do they operate at an intuitive level?
Thanks.

Here is a good description of the differences: https://www.kite.com/python/docs/tensorflow.set_random_seed
In short, tf.random.set_seed or tf.set_random_seed will guarantee that all operations produce repeatable results across sessions. It sets the operation seed deterministically for every operation.
Setting the operation seed makes sense only as part of an operation definition, e.g. tf.random_uniform([1], seed=1), and will also lead to the same sequences being produced by this op across sessions.
What is the difference?
graph-seed makes all ops repeatably deterministic. Use it if you want to fix all ops at once. Different ops will still produce different sequences (but those sequences repeat across sessions).
operation-seed makes a single operation deterministic. You can create 2 ops that will produce the same sequences.
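To make this concrete, here is a minimal sketch in the TF1, session-based style the answer uses (the variable names are illustrative):

import tensorflow as tf

tf.set_random_seed(1234)  # graph-level seed: fixes all ops, repeatably across sessions

a = tf.random_uniform([1])           # op seed derived from graph seed + op id
b = tf.random_uniform([1])           # different op id -> different sequence than a
c = tf.random_uniform([1], seed=7)   # explicit operation seed
d = tf.random_uniform([1], seed=7)   # same operation seed -> same sequence as c

with tf.Session() as sess:
    print(sess.run([a, b]))  # a and b differ, but both repeat in every new session
    print(sess.run([c, d]))  # c and d produce identical values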

Related

What is actually meant by parallel_iterations in tfp.mcmc.sample_chain?

I am not able to understand what the parameter parallel_iterations stands for when sampling multiple chains during MCMC.
The documentation for mcmc.sample_chain() doesn't give much detail; it just says that
The parallel iterations are the number of iterations allowed to run in parallel. It must be a positive integer.
I am running a NUTS sampler with multiple chains while specifying parallel_iterations=8.
Does it mean that the chains are strictly run in parallel? Is the parallel execution dependent on multi-core support? If so, what is a good value (based on the number of cores) to set parallel_iterations? Should I naively set it to some higher value?
TensorFlow can unroll iterations of while loops to execute in parallel when some parts of the data flow (e.g. the iteration condition) can be computed faster than other parts. If you don't have a special preference (e.g. reproducibility with legacy stateful samplers), leave it at the default.
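For illustration, a minimal sketch of where the parameter goes (the target density and all values here are made up for the example):

import tensorflow as tf
import tensorflow_probability as tfp

# Toy target: a standard normal log-density.
target = tfp.distributions.Normal(loc=0., scale=1.)

kernel = tfp.mcmc.NoUTurnSampler(
    target_log_prob_fn=target.log_prob, step_size=0.1)

samples = tfp.mcmc.sample_chain(
    num_results=1000,
    current_state=tf.zeros([8]),  # 8 chains, batched in one state tensor
    kernel=kernel,
    num_burnin_steps=500,
    # Upper bound on how many while-loop iterations TF may dispatch
    # concurrently; it does not control the number of chains.
    parallel_iterations=8,
    trace_fn=None)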

Problem when predicting via multiprocess with Tensorflow

I have 4 (or more) models (same structure but different training data). Now I want to ensemble them to make a prediction. I want to pre-load the models and then predict one input message (one message at a time) in parallel via multiprocessing. However, the program always stops at the session.run step, and I could not figure out why.
I tried passing all arguments to the function in each process, as shown in the code below. I also tried using a Queue object and putting all the data (except the model object) in the queue. I also tried setting the number of processes to 1. It made no difference.
from multiprocessing import Manager, Process

with Manager() as manager:
    first_level_test_features = manager.list()
    procs = []
    for id in range(4):
        p = Process(target=predict, args=(id, (message, models, configs, vocabs, emoji_dict, first_level_test_features)))
        procs.append(p)
        p.start()
    for p in procs:
        p.join()
I did not get any error message since it is just stuck there. I would expect the program to start multiple processes, with each process using the model passed to it to make the prediction.
I am unsure how session sharing across different Processes would work, and this is probably where your issue comes from. Given the way TensorFlow works, I would advise implementing the ensemble call as a graph operation, so that it can be run through a single session.run call, with TF handling the parallelization of computations wherever possible.
In practice, if you have symbolic tensors representing the models' predictions, you could use a TF operation to aggregate them (tf.concat, tf.reduce_mean, tf.add_n... whichever suits your design) and end up with a single symbolic tensor representing the ensemble prediction.
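A rough sketch of what that could look like (the placeholder shape, the dense layers and the input are hypothetical stand-ins for your four restored models):

import tensorflow as tf

# Stand-ins for the pre-loaded models' symbolic outputs (hypothetical;
# in the real setting these come from the restored graphs).
message_placeholder = tf.placeholder(tf.float32, shape=[None, 16])
model_outputs = [tf.layers.dense(message_placeholder, 3) for _ in range(4)]

# Aggregate into a single symbolic ensemble prediction by averaging.
ensemble_prediction = tf.reduce_mean(tf.stack(model_outputs, axis=0), axis=0)

# One session.run evaluates all models; TF parallelizes internally.
with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    result = session.run(ensemble_prediction,
                         feed_dict={message_placeholder: [[0.] * 16]})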
I hope this helps; if not, please provide some more details as to what your setting is, notably what form your models take.

Printing out the random seed automatically chosen by tensorflow

I was working on hyperparameter optimization for a neural network, running the model for 20 epochs. After figuring out the best hyperparameters, I ran the same model again alone (now with no hyperparameter optimization) but got different results. Not only that: the accuracy value reached while performing hyperparameter optimization occurred at the last (20th) epoch, whereas when I ran the same model again, that accuracy was not reached until 200 epochs, and even then the values were slightly lower. [Figure comparing the two accuracy curves omitted.]
Therefore, I would like to know what random seed TensorFlow chose at that moment. In other words, I am not interested in setting the random seed to a certain constant; I would like to see what was chosen by TensorFlow.
Your help is much appreciated!!
This question is very similar, but it does not have an answer; see the comments thread. In general, you cannot "extract the seed" at any given time, because there is no seed once the RNG has started working.
If you just want to see the initial seed, you need to understand there are graph-level and op-level seeds (see tf.set_random_seed, and the implementation in random_seed.py):
If both are set, then both are combined to produce the actual seed.
If the graph seed is set but the op seed is not, the seed is determined deterministically from the graph seed and the "op id".
If the op seed is set but the graph seed is not, then a default graph seed is used.
If neither of them is set, then a random seed is produced. To see where this comes from, you would look at GuardedPhiloxRandom, which provides the two numbers that are finally used by PhiloxRandom. When no seed at all is provided, it picks two random values generated from /dev/urandom, as seen in random.cc.
You can actually see these, by the way, when they are set. You just need to access the specific random operation that you are interested in and read its attributes seed and seed2. Note that TensorFlow public functions return the result of a few operations (scaling, displacing), so you have to "climb up" the graph a bit to get to the interesting one:
import tensorflow as tf

def print_seeds(random_normal):
    # Climb up to the underlying random op (RandomStandardNormal) and print its seeds
    random_op = random_normal.op.inputs[0].op.inputs[0].op
    print(random_op.get_attr('seed'), random_op.get_attr('seed2'))

print_seeds(tf.random_normal(()))
# 0 0
print_seeds(tf.random_normal((), seed=100))
# 87654321 100
tf.set_random_seed(200)
print_seeds(tf.random_normal(()))
# 200 15
print_seeds(tf.random_normal((), seed=300))
# 200 300
Unfortunately, when the seed is unspecified, there is no way to retrieve the random values generated by TensorFlow. The two random numbers are passed to PhiloxRandom, which uses them to initialize its internal key_ and counter_ variables, and these cannot be read out in any way.

Tensorflow Shuffle Batch Non Deterministic

I am trying to get deterministic behaviour from tf.train.shuffle_batch(). I could instead use tf.train.batch(), which works fine (always the same order of elements), but I need to get examples from multiple tf-records, so I am stuck with shuffle_batch().
I am using:
import random
import numpy as np
import tensorflow as tf

random.seed(0)
np.random.seed(0)
tf.set_random_seed(0)

data_entries = tf.train.shuffle_batch(
    [data], batch_size=batch_size, num_threads=1, capacity=512,
    seed=57, min_after_dequeue=32)
But every time I restart my script I get slightly different results (not completely different, but about 20% of the elements are in the wrong order).
Is there anything I am missing?
Edit: Solved it! See my answer below!
Maybe I misunderstood something, but you can collect multiple tf-records in a queue with tf.train.string_input_producer(), then read the examples into tensors, and finally use tf.train.batch().
Take a look at CIFAR-10 input.
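A rough sketch of that pipeline (the file names and the feature spec are hypothetical; adapt them to your records):

import tensorflow as tf

# One queue over several tf-record files; file order is shuffled with a fixed seed.
filename_queue = tf.train.string_input_producer(
    ['part-0.tfrecords', 'part-1.tfrecords'], shuffle=True, seed=0)

reader = tf.TFRecordReader()
_, serialized_example = reader.read(filename_queue)

# Hypothetical schema; replace with your actual features.
features = tf.parse_single_example(
    serialized_example,
    features={'data': tf.FixedLenFeature([], tf.string)})

# Deterministic batching: examples come out in read order.
batch = tf.train.batch([features['data']], batch_size=32)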
Answering my own question:
First, the reason shuffle_batch is non-deterministic:
The time until I request a batch is inherently random.
In that time, a random number of tensors becomes available.
TensorFlow's shuffle operation is seeded, but depending on the number of available items it will return a different order.
So no matter the seeding, the order is always different unless the number of elements is constant. The solution is therefore to keep the number of elements constant; but how do we do that?
By setting capacity=min_after_dequeue+batch_size. This forces TensorFlow to fill up the queue to full capacity before dequeuing an item. Therefore, at the time of the shuffle operation, we have capacity-many items, which is a constant number.
So why are we doing this? Because one tf-record contains many examples, but we want examples from multiple tf-records. With a normal batch we would first get all the examples of one record and only then those of the next. This also means we should set min_after_dequeue to something larger than the number of items in one tf-record. In my example, I have 50 examples in one file, so I set min_after_dequeue=2048.
Alternatively, we can also shuffle the examples before creating the tf.records, but this was not possible for me because I read tf.records from multiple directories (each with their own dataset).
Last note: to be extra safe, you should also use a batch size of 1.
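Putting the pieces together, a minimal sketch of the resulting call (assuming data is the tensor read from the records, as in the question):

import tensorflow as tf

batch_size = 1            # smallest batch, to be extra safe
min_after_dequeue = 2048  # larger than the example count of any single record

data_entries = tf.train.shuffle_batch(
    [data],
    batch_size=batch_size,
    num_threads=1,  # a single thread avoids scheduling races
    capacity=min_after_dequeue + batch_size,  # queue must fill before dequeuing
    seed=57,
    min_after_dequeue=min_after_dequeue)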

How do I train an HMM with Baum-Welch and multiple observations?

I am having some problems understanding how exactly the Baum-Welch algorithm works. I read that it adjusts the parameters of the HMM (the transition and the emission probabilities) in order to maximize the probability that my observation sequence may be produced by the given model.
However, what happens if I have multiple observation sequences? I want to train my HMM against a large set of observations (and I believe this is what is usually done).
ghmm for example can take both a single observation sequence and a full set of observations for the baumWelch method.
Does it work the same in both situations? Or does the algorithm have to know all observations at the same time?
In Rabiner's paper, the parameters of the GMMs (weights, means and covariances) are re-estimated in the Baum-Welch algorithm using these equations:

$$\bar{c}_{jk} = \frac{\sum_{t=1}^{T}\gamma_t(j,k)}{\sum_{t=1}^{T}\sum_{k=1}^{M}\gamma_t(j,k)}, \qquad \bar{\mu}_{jk} = \frac{\sum_{t=1}^{T}\gamma_t(j,k)\,O_t}{\sum_{t=1}^{T}\gamma_t(j,k)}, \qquad \bar{\Sigma}_{jk} = \frac{\sum_{t=1}^{T}\gamma_t(j,k)\,(O_t-\mu_{jk})(O_t-\mu_{jk})^\top}{\sum_{t=1}^{T}\gamma_t(j,k)}$$

where $\gamma_t(j,k)$ is the probability of being in state $j$ at time $t$ with the $k$-th mixture component accounting for observation $O_t$.
These are just for the single-observation-sequence case. In the multiple-sequence case, the numerators and denominators are simply summed over all observation sequences and then divided to get the parameters. (This can be done since they simply represent occupation counts; see pg. 273 of the paper.)
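For example, the mixture-weight update over $E$ observation sequences becomes (a sketch in the same notation, with $\gamma_t^{(e)}(j,k)$ computed on sequence $e$ of length $T_e$):

$$\bar{c}_{jk} = \frac{\sum_{e=1}^{E}\sum_{t=1}^{T_e}\gamma_t^{(e)}(j,k)}{\sum_{e=1}^{E}\sum_{t=1}^{T_e}\sum_{k'=1}^{M}\gamma_t^{(e)}(j,k')}$$

and analogously for the means and covariances.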
So it's not required to know all observation sequences during an invocation of the algorithm. As an example, the HERest tool in HTK has a mechanism that allows splitting up the training data amongst multiple machines. Each machine computes the numerators and denominators and dumps them to a file. In the end, a single machine reads these files, sums up the numerators and denominators and divides them to get the result. See pg. 129 of the HTK book v3.4