TensorFlow Code not reproducible, even using seed - tensorflow

I'm working on someone's old code written in TensorFlow 1.4, and I find that the results are not reproducible even with both NumPy and TensorFlow seeded:
import numpy as np
import tensorflow as tf

seed = 1
np.random.seed(seed)
tf.set_random_seed(seed)
I'm not training using a GPU. What is the correct way to set a seed in TensorFlow 1.4?
Relatedly, what is the difference between tf.random.set_seed and tf.set_random_seed?

Related

How to use legacy_seq2seq for TensorFlow 2?

I am new to TensorFlow and I want to use tensorflow.contrib.legacy_seq2seq, specifically embedding_rnn_seq2seq(), and I can't figure out how to use it (or whether there is an equivalent method) in TensorFlow 2.
I know that TensorFlow 2 removed contrib, and according to this document
tf.contrib.legacy_seq2seq has been deleted and replaced with tf.seq2seq in TensorFlow 2, but I can't find embedding_rnn_seq2seq() in the tf.seq2seq documentation I have seen.
The reason I want to use it is that I am trying to implement something similar to what is done with embedding_rnn_seq2seq() in this article. So is there an equivalent in TensorFlow 2, or is there a different way to achieve the same goal?
According to https://docs.w3cub.com/tensorflow~python/tf/contrib/legacy_seq2seq/embedding_rnn_seq2seq , contrib.legacy_seq2seq.embedding_rnn_seq2seq() creates an embedding of the encoder_inputs argument you pass (of shape num_encoder_symbols x input_size). It then runs an RNN to encode the embedded encoder_inputs into a state vector. Next it embeds the decoder_inputs argument you pass (of shape num_decoder_symbols x input_size) and runs an RNN decoder, initialized with the last encoder state, on the embedded decoder_inputs.
Contrib was a community-maintained part of TensorFlow, and seq2seq was part of it. In TensorFlow 2 it was removed.
You could use TensorFlow Addons instead, which contains community-maintained add-ons, including seq2seq.
You can import TensorFlow Addons via
import tensorflow_addons
Or you could use a TensorFlow version that still has contrib's seq2seq (1.15 is the last 1.x release).
There are also alternatives such as bidirectional recurrent neural networks and dynamic RNNs that may cover similar use cases.
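If the goal is just the behaviour described above (embed the encoder inputs, encode them into a state vector, embed the decoder inputs, and decode from that state), a rough TensorFlow 2 sketch built from plain tf.keras layers could look like the following. The vocabulary sizes and layer dimensions are placeholders, not values taken from the question:
import tensorflow as tf

num_encoder_symbols = 10000   # placeholder source vocabulary size
num_decoder_symbols = 10000   # placeholder target vocabulary size
embedding_size = 128
units = 256

# Encoder: embed integer token ids, then encode them into a state vector
encoder_inputs = tf.keras.Input(shape=(None,), dtype="int32")
enc_emb = tf.keras.layers.Embedding(num_encoder_symbols, embedding_size)(encoder_inputs)
_, state_h, state_c = tf.keras.layers.LSTM(units, return_state=True)(enc_emb)

# Decoder: embed the decoder token ids and run an RNN initialized with the encoder state
decoder_inputs = tf.keras.Input(shape=(None,), dtype="int32")
dec_emb = tf.keras.layers.Embedding(num_decoder_symbols, embedding_size)(decoder_inputs)
dec_out = tf.keras.layers.LSTM(units, return_sequences=True)(dec_emb, initial_state=[state_h, state_c])
logits = tf.keras.layers.Dense(num_decoder_symbols)(dec_out)

model = tf.keras.Model([encoder_inputs, decoder_inputs], logits)
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))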

Resetting graph in Tensorflow 2

In my use case, I have some time series data where at each time t, I train a new model over a rolling window. In TensorFlow 1, I had to do the following, otherwise models would accumulate in the default graph and essentially leak memory.
import tensorflow as tf
import keras.backend as K
...
tf.reset_default_graph()
K.clear_session()
In TensorFlow 2, I've found the equivalent functions tf.compat.v1.reset_default_graph() and tf.keras.backend.clear_session(). However, according to the documentation, TF2 ties graph variables to Python variables, so in theory, once a Python variable is destroyed, the corresponding graph variable should be destroyed as well. Is this interpretation correct? I've tried putting the model-creation code in a loop; while memory usage still grows, it isn't the sort of explosion I witnessed in TF1.
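For reference, a minimal sketch of the TF2 loop described above could look like this; build_model(), x_window, y_window and num_windows are placeholders for the rolling-window setup, not code from the question:
import gc
import tensorflow as tf

for t in range(num_windows):
    tf.keras.backend.clear_session()   # drop Keras' global graph/session state between windows
    model = build_model()              # hypothetical factory that builds a fresh model
    model.fit(x_window, y_window)      # placeholder training data for the window ending at t
    del model                          # remove the Python reference so TF2 can free the variables
    gc.collect()                       # encourage Python to reclaim the memory promptly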

Predict batches using Tensorflow Data API and Keras Model

Suppose I have a dataset and a Keras model. The dataset has been divided into batches using batch() from the tf.data Dataset API. Now I am looking for an efficient and clean way to do batch predictions for all testing samples.
I have tried the following code and it works.
import math

batch_size = 32
dataset = dataset.batch(batch_size)
# num_testing_samples is the total number of samples in the test set
predictions = keras_model.predict(dataset, steps=math.ceil(num_testing_samples / batch_size))
I wonder whether there is a more efficient and elegant way to implement this?
TF >= 1.14.0
You can just set steps=None. From the official documentation of tf.keras.Model.predict():
If x is a tf.data dataset and steps is None, predict will run until the input dataset is exhausted.
Just make sure that your dataset object is not in repeat mode and you are good to go :).
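With that in mind, the snippet from the question could be reduced to something like this (keras_model and dataset as defined in the question):
dataset = dataset.batch(32)
predictions = keras_model.predict(dataset)  # steps defaults to None, so predict runs until the dataset is exhausted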
TF 1.12.0 & 1.13.0
The support for tf.data.Dataset with tf.keras is very poor in these versions. The tf.data.Dataset object is transformed into an iterator here, which then triggers an error here if you didn't set the steps argument. This is patched in 1.14.0.

How do I use an imported MNIST dataset?

I'm not familiar with using Python as an ML tool and wanted to train on the MNIST dataset. I have downloaded the MNIST library using
pip install python-mnist
but do not know what my next step should be. What would an import statement look like? Should I also import TensorFlow and/or Keras to train the data?
I know the MNIST dataset is available in TensorFlow and Keras; however, importing via pip is necessary for my use case. Both TensorFlow and Keras have tutorials using the MNIST dataset, but I was wondering whether it is possible to use the dataset without going through their built-in loaders.
The import statement should look like this:
from mnist import MNIST
mndata = MNIST('./dir_with_mnist_data_files')  # directory containing the unpacked MNIST files
images, labels = mndata.load_training()
then you can work directly with the arrays of raw images and labels.
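As for TensorFlow/Keras: the loader returns plain Python lists, so you can feed them to any framework. A minimal sketch, assuming you want to convert them to NumPy arrays and train a small Keras classifier (the directory path and the model architecture are just placeholders):
import numpy as np
import tensorflow as tf
from mnist import MNIST

mndata = MNIST('./dir_with_mnist_data_files')   # placeholder path to the unpacked files
images, labels = mndata.load_training()

# python-mnist returns plain lists; convert to NumPy arrays and scale pixels to [0, 1]
X = np.asarray(images, dtype=np.float32) / 255.0    # each row is a flattened 28x28 image
y = np.asarray(labels)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32)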

tf.set_random_seed doesn't seem to work, is there a better way to make TensorFlow code reproducible?

I want to reproduce my results, so I used the following lines to fix the randomness:
import numpy as np
np.random.seed(1)
import tensorflow as tf
tf.set_random_seed(1)
But I still get different results on each run. Any idea how to fix this?
You also have to set the seed of every operation that uses random numbers.
TensorFlow has two different seeds: the graph-level seed and the operation-level seed.
For instance, tf.truncated_normal needs both the graph seed (which you set with tf.set_random_seed(1)) and the operation seed (the seed parameter) to be set in order to be reproducible.
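A minimal TF 1.x sketch of what this answer describes (the shape passed to tf.truncated_normal is arbitrary):
import numpy as np
import tensorflow as tf

np.random.seed(1)
tf.set_random_seed(1)                        # graph-level seed

# Operation-level seed passed explicitly via the seed parameter
noise = tf.truncated_normal([2, 3], seed=1)

with tf.Session() as sess:
    print(sess.run(noise))                   # with both seeds set, this prints the same values on every run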