What is the intuition behind the Iterator.get_next method? - tensorflow

The name of the method get_next() is a little bit misleading. The documentation says
Returns a nested structure of tf.Tensors representing the next element.
In graph mode, you should typically call this method once and use its result as the input to another computation. A typical loop will then call tf.Session.run on the result of that computation. The loop will terminate when the Iterator.get_next() operation raises tf.errors.OutOfRangeError. The following skeleton shows how to use this method when building a training loop:
dataset = ...  # A `tf.data.Dataset` object.
iterator = dataset.make_initializable_iterator()
next_element = iterator.get_next()

# Build a TensorFlow graph that does something with each element.
loss = model_function(next_element)
optimizer = ...  # A `tf.compat.v1.train.Optimizer` object.
train_op = optimizer.minimize(loss)

with tf.compat.v1.Session() as sess:
    try:
        while True:
            sess.run(train_op)
    except tf.errors.OutOfRangeError:
        pass
Python also has a built-in function called next, which needs to be called every time we need the next element of the iterator. However, according to the documentation of get_next() quoted above, get_next() should be called only once and its result should be evaluated by calling the session's run method. This is a little unintuitive, because I am used to Python's built-in next. In this script, get_next() is likewise called only once, and the result of that call is evaluated at every step of the computation.
What is the intuition behind get_next() and how is it different from next()? I think that, in the second example I linked above, the next element of the dataset (or feedable iterator) is retrieved every time the result of the first call to get_next() is evaluated with run, but this is a little unintuitive. I don't get why we do not need to call get_next at every step of the computation (to get the next element of the feedable iterator), even after reading the following note in the documentation:
NOTE: It is legitimate to call Iterator.get_next() multiple times, e.g. when you are distributing different elements to multiple devices in a single step. However, a common pitfall arises when users call Iterator.get_next() in each iteration of their training loop. Iterator.get_next() adds ops to the graph, and executing each op allocates resources (including threads); as a consequence, invoking it in every iteration of a training loop causes slowdown and eventual resource exhaustion. To guard against this outcome, we log a warning when the number of uses crosses a fixed threshold of suspiciousness.
In general, it is not clear how the Iterator works.

The idea is that get_next adds some operations to the graph such that, every time you evaluate them, you get the next element in the dataset. On each iteration you just need to run the operations that get_next created; you do not need to create them over and over again.
Maybe a good way to get an intuition is to try to write an iterator yourself. Consider something like the following:
import tensorflow as tf
tf.compat.v1.disable_v2_behavior()

# Make an iterator, returns next element and initializer
def iterator_next(data):
    data = tf.convert_to_tensor(data)
    i = tf.Variable(0)
    # Check we are not out of bounds
    with tf.control_dependencies([tf.assert_less(i, tf.shape(data)[0])]):
        # Get next value
        next_val_1 = data[i]
    # Update index after the value is read
    with tf.control_dependencies([next_val_1]):
        i_updated = tf.compat.v1.assign_add(i, 1)
        with tf.control_dependencies([i_updated]):
            next_val_2 = tf.identity(next_val_1)
    return next_val_2, i.initializer

# Test
with tf.compat.v1.Graph().as_default(), tf.compat.v1.Session() as sess:
    # Example data
    data = tf.constant([1, 2, 3, 4])
    # Make operations that give you the next element
    next_val, iter_init = iterator_next(data)
    # Initialize iterator
    sess.run(iter_init)
    # Iterate until exception is raised
    while True:
        try:
            print(sess.run(next_val))
        # assert throws InvalidArgumentError
        except tf.errors.InvalidArgumentError:
            break
Output:
1
2
3
4
Here, iterator_next gives you something comparable to what get_next in an iterator would give you, plus an initializer operation. Every time you run next_val you get a new element from data; you don't need to call the function every time (which is how next works in Python), you call it once and then evaluate its result multiple times.
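To make the contrast with Python's built-in next explicit, here is a small side-by-side sketch of my own; it reuses the iterator_next function defined above, so the names come from that snippet:
# Plain Python: next() must be called again for every element.
it = iter([1, 2, 3, 4])
print(next(it))  # 1
print(next(it))  # 2

# Graph mode: the ops are created once, then the same tensor is evaluated repeatedly.
next_val, iter_init = iterator_next([1, 2, 3, 4])
with tf.compat.v1.Session() as sess:
    sess.run(iter_init)
    print(sess.run(next_val))  # 1
    print(sess.run(next_val))  # 2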
EDIT: The function iterator_next above could also be simplified to the following:
def iterator_next(data):
    data = tf.convert_to_tensor(data)
    # Start from -1
    i = tf.Variable(-1)
    # First increment i
    i_updated = tf.compat.v1.assign_add(i, 1)
    with tf.control_dependencies([i_updated]):
        # Check i is not out of bounds
        with tf.control_dependencies([tf.assert_less(i, tf.shape(data)[0])]):
            # Get next value
            next_val = data[i]
    return next_val, i.initializer
Or even simpler:
def iterator_next(data):
    data = tf.convert_to_tensor(data)
    i = tf.Variable(-1)
    i_updated = tf.compat.v1.assign_add(i, 1)
    # Using i_updated directly as a value is equivalent to using i with
    # a control dependency to i_updated
    with tf.control_dependencies([tf.assert_less(i_updated, tf.shape(data)[0])]):
        next_val = data[i_updated]
    return next_val, i.initializer
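For reference, the same create-once/evaluate-many-times pattern with the real tf.data API looks like this in TF1-style graph mode (in TF2 eager mode you would simply write for element in dataset:):
import tensorflow as tf
tf.compat.v1.disable_v2_behavior()

dataset = tf.compat.v1.data.Dataset.from_tensor_slices([1, 2, 3, 4])
iterator = tf.compat.v1.data.make_initializable_iterator(dataset)
next_element = iterator.get_next()  # created once, outside the loop

with tf.compat.v1.Session() as sess:
    sess.run(iterator.initializer)
    while True:
        try:
            print(sess.run(next_element))  # prints 1, 2, 3, 4
        except tf.errors.OutOfRangeError:
            break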

Related

How to keep vectors in a dictionary in tensorflow?

It seems that tf.lookup.experimental.DenseHashTable cannot hold vectors and I could not find examples of how to use it.
Below you can find a simple implementation of a dictionary of vectors in TensorFlow. It is also an example of how to use tf.lookup.experimental.DenseHashTable and tf.TensorArray.
As said, vectors cannot be kept in tf.lookup.experimental.DenseHashTable, so the table maps each key to an index, and tf.TensorArray is used to hold the actual vectors.
Of course, this is a simple example, and it does not include deletion of entries from the dictionary - an operation that would require some management of the free cells of the array. Also, you should read the respective API pages of tf.lookup.experimental.DenseHashTable and tf.TensorArray to see how to tune them for your needs.
import tensorflow as tf

class DictionaryOfVectors:
    def __init__(self, dtype):
        empty_key = tf.constant('')
        deleted_key = tf.constant('deleted')
        self.ht = tf.lookup.experimental.DenseHashTable(key_dtype=tf.string,
                                                        value_dtype=tf.int32,
                                                        default_value=-1,
                                                        empty_key=empty_key,
                                                        deleted_key=deleted_key)
        self.ta = tf.TensorArray(dtype, size=0, dynamic_size=True, clear_after_read=False)
        self.inserts_counter = 0

    @tf.function
    def insertOrAssign(self, key, vec):
        # Insert the vector to the TensorArray. The write() method returns a new
        # TensorArray object with flow that ensures the write occurs. It should be
        # used for subsequent operations.
        with tf.init_scope():
            self.ta = self.ta.write(self.inserts_counter, vec)
            # Insert the same counter value to the hash table
            self.ht.insert_or_assign(key, self.inserts_counter)
            self.inserts_counter += 1

    @tf.function
    def lookup(self, key):
        with tf.init_scope():
            index = self.ht.lookup(key)
            return self.ta.read(index)

dictionary_of_vectors = DictionaryOfVectors(dtype=tf.float32)
dictionary_of_vectors.insertOrAssign('first', [1, 2, 3, 4, 5])
print(dictionary_of_vectors.lookup('first'))
The example is a bit more sophisticated because the insert and lookup methods are decorated with @tf.function. Since the methods change variables defined outside of them, tf.init_scope() is used. You might ask what is changed in the lookup() method, as it actually only reads from the hash table and the array. The reason is that in graph mode, the index returned from the lookup() call is a Tensor, and the TensorArray implementation contains a line with if index < 0: which fails with:
OperatorNotAllowedInGraphError: using a tf.Tensor as a Python bool is not allowed.
When we use tf.init_scope(), as explained in its API documentation, "code inside an init_scope block runs with eager execution enabled even when tracing a tf.function". So in that case index is not a symbolic Tensor but an eager scalar value.
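To see that quoted behavior in isolation, here is a tiny sketch of my own (not part of the original answer) that prints the tensor types while a tf.function is being traced:
import tensorflow as tf

@tf.function
def trace_demo():
    x = tf.constant(1)
    print('inside tf.function:', type(x))     # a symbolic graph Tensor during tracing
    with tf.init_scope():
        y = tf.constant(1)
        print('inside init_scope:', type(y))  # an EagerTensor: this block runs eagerly
    return x

trace_demo()  # both prints happen once, at tracing time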

How to initialize tf.contrib.lookup.HashTable used in Tensorflow Estimator model_fn?

I have a tf.contrib.lookup.HashTable declared inside a TensorFlow Estimator model_fn. As the session is not directly available to us in Estimators, I am stuck with not being able to initialize the table. I am aware that, if not used with Estimators, the table can be initialized with table.init.run() using the session.
I tried to initialize the table by using a SessionRunHook which I was already using for some other purpose. I pass the table init op as an argument to session run in the before_run function, but the table is still not initialized. I also tried to pass tf.tables_initializer() instead, but that did not work either. Another option I tried without success is the tf.add_to_collection(tf.GraphKeys.TABLE_INITIALIZERS.. command.
#sessionRunHook code below
class SaveToCSVHook(tf.train.SessionRunHook):
    def begin(self):
        samples_weights_table = session.graph.get_tensor_by_name('samples_weights_table:0')
        self.samples_weights_table_init_op = samples_weights_table.init
        self.table_init_op = tf.tables_initializer()  # also tried passing this to self.args instead - same result though
        tf.add_to_collection(tf.GraphKeys.TABLE_INITIALIZERS, samples_weights_table.init)

    def after_create_session(self, session, coord):
        self.args = {'table_init_op': self.samples_weights_table_init_op}

    def before_run(self, run_context):
        return tf.train.SessionRunArgs(self.args)

    def after_run(self, run_context, run_values):
        print(f"Got Values: {run_values.results}")

# Estimator model_fn code below
def model_fn(..):
    samples_weights_table = tf.contrib.lookup.HashTable(
        tf.contrib.lookup.KeyValueTensorInitializer(keysb, values,
                                                    key_dtype=tf.string,
                                                    value_dtype=tf.float32,
                                                    name='samples_weights_table_init_op'),
        -1.0, name='samples_weights_table')
I get the error:
FailedPreconditionError (see above for traceback): Table not initialized
which obviously means the table is not getting initialized.
If anyone is interested in the answer: the hash table does not need to be explicitly initialized when used with Estimators. Tables are initialized by default for high-level APIs like Estimators. The error goes away when the initializer code is removed, and the table works as expected.
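A minimal sketch of what that looks like in practice (my own illustration, not the asker's code: the keys, values, and the 'sample_id' feature are made up, and the tiny loss exists only to give the Estimator something to train):
import tensorflow as tf

def model_fn(features, labels, mode):
    # Hypothetical keys/values, just for illustration.
    table = tf.contrib.lookup.HashTable(
        tf.contrib.lookup.KeyValueTensorInitializer(
            tf.constant(['a', 'b', 'c']),
            tf.constant([0.1, 0.2, 0.3], dtype=tf.float32)),
        default_value=-1.0, name='samples_weights_table')
    # No explicit init op or SessionRunHook is needed: Estimator runs the
    # table initializers automatically before the first step.
    weights = table.lookup(features['sample_id'])
    bias = tf.get_variable('bias', [], initializer=tf.zeros_initializer())
    loss = tf.reduce_mean(tf.square(weights - bias))
    train_op = tf.train.GradientDescentOptimizer(0.1).minimize(
        loss, global_step=tf.train.get_or_create_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

estimator = tf.estimator.Estimator(model_fn=model_fn)
estimator.train(
    input_fn=lambda: ({'sample_id': tf.constant(['a', 'b', 'x'])}, None),
    steps=1)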

What's the difference between `tf.random_normal` and `tf.random_normal_initializer`?

Using the two functions seems to get the same result.
t4 = tf.get_variable('t4', initializer=tf.random_normal((2,), seed=0))
t5 = tf.get_variable('t5', shape=(2,), initializer=tf.random_normal_initializer(seed=0))
And I find that random_normal_initializer() also uses random_normal() internally.
I only vaguely understand the difference between them: random_normal returns a constant tensor, but random_normal_initializer returns a value only after the variable is initialized.
I want to know more about how to use these two functions at the right time.
Does using random_normal to initialize a variable actually initialize it twice (again after the variable is initialized)? In other words, are there performance differences between them?
Maxim's answer to this question is excellent, but I want to answer a slightly simpler question (with a few examples) that the OP might be asking:
Most basic answer: tf.random_normal returns a Tensor, but tf.random_normal_initializer returns a RandomNormal initializer object, not a Tensor. I think simple code best clarifies the difference between these two:
# Simple examples to clarify tf.random_normal from tf.random_normal_initializer
tf.reset_default_graph()

# OP's code
t4 = tf.get_variable('t4', initializer=tf.random_normal((2,), seed=0))
t5 = tf.get_variable('t5', shape=(2,), initializer=tf.random_normal_initializer(seed=0))

# clarifying Tensor vs Initializer outside the context of get_variable.
t6 = tf.random_normal((2,), seed=0)
t7 = tf.random_normal_initializer(seed=0)

# types
print(type(t6))  # <class 'tensorflow.python.framework.ops.Tensor'>
print(type(t7))  # <class 'tensorflow.python.ops.init_ops.RandomNormal'>

# run the graph...
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # OP's code
    print(sess.run(t4))  # [-0.39915761 2.10443926]
    print(sess.run(t5))  # [-0.39915761 2.10443926]
    # tf.random_normal is a Tensor
    print(sess.run(t6))  # [-0.39915761 2.10443926]
    # tf.random_normal_initializer returns a tf.RandomNormal, not a Tensor or Op, so can't be sess.run()!
    try:
        print(sess.run(t7))  # Exception!
    except:
        print("Exception!")
    # But notice that you don't need to initialize an initializer, just a variable.
    t8 = tf.random_normal_initializer(seed=0)
    t9 = tf.get_variable('t9', shape=(2,), initializer=t8)
    sess.run(t9.initializer)  # still need to initialize the variable
    print(sess.run(t9))  # [-0.39915761 2.10443926]
In your setting: now, as far as the code you are calling goes, there is no real difference; the initializer keyword is overloaded to accept both and will behave as Maxim indicates. From the tf/ops/variable_scope source:
if initializer is None:
    init, initializing_from_value = self._get_default_initializer(
        name=name, shape=shape, dtype=dtype)
    if initializing_from_value:
        init_shape = None
    else:
        init_shape = var_shape
elif callable(initializer):
    init = initializer
    init_shape = var_shape
elif isinstance(initializer, ops.Tensor):
    init = array_ops.slice(initializer, var_offset, var_shape)
    # Use the dtype of the given tensor.
    dtype = init.dtype.base_dtype
    init_shape = None
else:
    init = ops.convert_to_tensor(initializer, dtype=dtype)
    init = array_ops.slice(init, var_offset, var_shape)
    init_shape = None
tf.random_normal returns a tensor of the specified shape filled with random normal values. In addition, it creates a number of under-the-hood ops to compute the value:
random_normal/shape
random_normal/mean
random_normal/stddev
random_normal/RandomStandardNormal
random_normal/mul
At runtime, consecutive evaluations of this tensor produce a new value, but no other nodes are added.
tf.random_normal_initializer is an Initializer instance, which invokes tf.random_normal when it is called. So there is no big difference between tf.random_normal_initializer and tf.random_normal. Even if you call the init twice, neither of them will add new nodes to the graph at runtime, but both add 6 additional nodes at graph-construction time.
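A quick way to see this relationship (a sketch of my own, in the same TF 1.x style as the answer): calling the initializer object with a shape builds a random-normal tensor, just as if you had called tf.random_normal yourself:
import tensorflow as tf

init = tf.random_normal_initializer(seed=0)
t_from_init = init((2,))                   # Initializer.__call__ builds a random_normal tensor
t_direct = tf.random_normal((2,), seed=0)

with tf.Session() as sess:
    print(sess.run(t_from_init))  # with the same seed, these two print identical values
    print(sess.run(t_direct))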
Another alternative (which may be even more efficient in some cases) is initialization with a numpy.random.normal array, like this:
t1 = tf.Variable(name='t1', initial_value=np.random.normal(size=(2,)))
This way no random_normal nodes are added to the graph, neither at graph construction nor at runtime.
UPD: TensorFlow adds the const op .../initial_value in this case, and the whole numpy array is going to be present in the graph, which may be a problem if the array is large.
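If the array really is large, one common workaround (my own sketch, not from the original answer) is to build the variable from a placeholder and feed the numpy array only when running the initializer, so the values never end up as a constant in the GraphDef:
import numpy as np
import tensorflow as tf

init_values = np.random.normal(size=(2,)).astype(np.float32)

# The placeholder keeps the actual numbers out of the serialized graph.
init_ph = tf.placeholder(tf.float32, shape=(2,))
t1 = tf.Variable(init_ph, name='t1')

with tf.Session() as sess:
    sess.run(t1.initializer, feed_dict={init_ph: init_values})
    print(sess.run(t1))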

How can I reroute the training input pipeline to test pipeline in tensorflow using tf.contrib.graph_editor?

Suppose now I have a training input pipeline which finally generates train_x and train_y using tf.train.shuffle_batch. I export the meta graph and re-import the graph in another code file. Now I want to detach the input pipeline, i.e., the train_x and train_y, and connect a new test_x and test_y. How can I accomplish this using tf.contrib.graph_editor?
EDIT: As suggested by @iga, I changed my input using input_map:
filenames = tf.train.match_filenames_once(FLAGS.data_dir + '*', name='matching_filenames')
if FLAGS.ckpt != '':
    latest = FLAGS.log_dir + FLAGS.ckpt
else:
    latest = tf.train.latest_checkpoint(FLAGS.log_dir)
if not latest or not os.path.exists(latest + '.meta'):
    print("checkpoint " + latest + " does not exist")
    sys.exit(1)
saver = tf.train.import_meta_graph(latest + '.meta',
                                   input_map={'matching_filenames:0': filenames},
                                   import_scope='import')
g = tf.get_default_graph()
but I get the following error:
ValueError: graph_def is invalid at node u'matching_filenames/Assign':
Input tensor 'matching_filenames:0' Cannot convert a tensor of type
string to an input of type string_ref.
Is there an elegant way to resolve this?
For this task, you should be able to just use the input_map argument to tf.import_graph_def (https://www.tensorflow.org/api_docs/python/tf/import_graph_def). If you are using import_meta_graph, you can pass the input_map into its kwargs and it will get passed down to import_graph_def.
RESPONSE TO EDIT: I am assuming that your original graph (the one you are deserializing) had the same matching_filenames variable. Quite confusingly, the tensor name "matching_filenames:0" actually refers to the tensor going from the VariableV2 op to the Assign op. The type of this edge is string_ref and you don't really want to break that edge.
The output from a variable typically goes through an identity op called matching_filenames/read. This is what you want to use as the key in your input_map. For the value, you want the same tensor in your new filenames. So, your call should probably look like:
tf.train.import_meta_graph(latest + '.meta',
                           input_map={'matching_filenames/read': filenames.read_value()},
                           import_scope='import')
In general, variables are fairly complicated. If this does not work, you can use some placeholder op and feed the names into it manually.
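For completeness, the placeholder fallback mentioned above could look roughly like this (my own sketch; 'import/some_op:0' is a hypothetical op from the imported graph, the filenames fed are only examples, and latest is the checkpoint prefix from the question's code):
# A placeholder that stands in for the original filename tensor.
filenames_ph = tf.placeholder(tf.string, shape=[None], name='filenames_input')

saver = tf.train.import_meta_graph(
    latest + '.meta',
    input_map={'matching_filenames/read': filenames_ph},
    import_scope='import')

with tf.Session() as sess:
    saver.restore(sess, latest)
    sess.run('import/some_op:0',  # hypothetical op to evaluate
             feed_dict={filenames_ph: ['/path/to/test_file.tfrecords']})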

How to filter tensor from queue based on some predicate in tensorflow?

How can I filter data stored in a queue using a predicate function? For example, let's say we have a queue that stores tensors of features and labels and we just need those that meet the predicate. I tried the following implementation without success:
feature, label = queue.dequeue()
if (predicate(feature, label)):
    enqueue_op = another_queue.enqueue(feature, label)
The most straightforward way to do this is to dequeue a batch, run it through the predicate test, use tf.where to produce a dense vector of the indices that match the predicate, use tf.gather to collect the results, and enqueue that batch into the second queue. If you want that to happen automatically, you can start a queue runner on the second queue - the easiest way to do that is to use tf.train.batch:
Example:
import numpy as np
import tensorflow as tf

a = tf.constant(np.array([5, 1, 9, 4, 7, 0], dtype=np.int32))

q = tf.FIFOQueue(6, dtypes=[tf.int32], shapes=[])
enqueue = q.enqueue_many([a])
dequeue = q.dequeue_many(6)

predmatch = tf.less(dequeue, [5])
selected_items = tf.reshape(tf.where(predmatch), [-1])
found = tf.gather(dequeue, selected_items)

secondqueue = tf.FIFOQueue(6, dtypes=[tf.int32], shapes=[])
enqueue2 = secondqueue.enqueue_many([found])
dequeue2 = secondqueue.dequeue_many(3)  # XXX, hardcoded

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(enqueue)   # Fill the first queue
    sess.run(enqueue2)  # Filter, push into queue 2
    print(sess.run(dequeue2))  # Pop items off of queue2
The predicate produces a boolean vector; the tf.where produces a dense vector of the indexes of the true values, and the tf.gather collects items from your original tensor based upon those indexes.
A lot of things are hard-coded in this example that you'd need to parameterize in reality, of course, but hopefully it shows the structure of what you're trying to do (create a filtering pipeline). In practice, you'd want QueueRunners on there to keep things churning automatically; tf.train.batch is very useful for handling that -- see Threading and Queues for more detail.
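As a side note, the where/gather pair used above can be collapsed into a single call with tf.boolean_mask, which takes the tensor and the boolean predicate vector directly (a small sketch, not part of the original answer):
import tensorflow as tf

x = tf.constant([5, 1, 9, 4, 7, 0])
mask = tf.less(x, 5)             # [False, True, False, True, False, True]
kept = tf.boolean_mask(x, mask)

with tf.Session() as sess:
    print(sess.run(kept))        # [1 4 0]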