About the tf.contrib.data.Dataset (from TensorFlow 1.2, see here and here) usage:
The way how to get data doesn't really fit any way how I get the data usually. In my case, I have a thread and I receive data there and I don't know in advance when it will end but I see when it ends. Then I wait until I processed all the buffers and then I have finished one epoch. How can I get this logic with the Dataset?
Note that I prefer the Dataset interface over the QueueBase interface because it gives me the iterator interface which I can reinitialize and even reset to a different Dataset. This is more powerful compared to queues which cannot be reopened currently after they are closed (see here and here).
Maybe a similar question, or the same question: How can I wrap around a Dataset over a queue? I have some thread with reads some data from somewhere and which can feed it and queue it somehow. How do I get the data into the Dataset? I could repeat some dummy tensor infinite times and then use map to just return my queue.dequeue() but that really only gets me back to all the original problems with the queue, i.e. how to reopen the queue.
The new Dataset.from_generator() method allows you to define a Dataset that is fed by a Python generator. (To use this feature at present, you must download a nightly build of TensorFlow or build it yourself from source. It will be part of TensorFlow 1.4.)
The easiest way to implement your example would be to replace your receiving thread with a generator, with pseudocode as follows:
def receiver():
while True:
next_element = ... # Receive next element from external source.
# Note that this method may block.
end_of_epoch = ... # Decide whether or not to stop based on next_element.
if not end_of_epoch:
yield next_element # Note: you may need to convert this to an array.
else:
return # Returning will signal OutOfRangeError on downstream iterators.
dataset = tf.contrib.data.Dataset.from_generator(receiver, output_types=...)
# You can chain other `Dataset` methods after the generator. For example:
dataset = dataset.prefetch(...) # This will start a background thread
# to prefetch elements from `receiver()`.
dataset = dataset.repeat(...) # Note that each repetition will call
# `receiver()` again, and start from
# a fresh state.
dataset = dataset.batch(...)
More complicated topologies are possible. For example, you can use Dataset.interleave() to create many receivers in parallel.
Related
I'm computing gradients from a private network and applying them to another master network. Then I'm copying the weights for the master to the private (it sounds redundant but bear with me). The problem is that with every iteration get_weights becomes slower and I even run out of memory.
def work(self, session):
with session.as_default(), session.graph.as_default():
self.private_net = ACNetwork()
state = self.env.reset()
while counter<TOTAL_TR_STEPS:
action_index, action_vector = self.get_action(state)
next_state, reward, done, info = self.env.step(action_index)
....# store the new data : reward, state etc...
if done == True:
# end of episode
state = self.env.reset()
a_grads, c_grads = self.private_net.get_gradients()
self.master.update_from_gradients(a_grads, c_grads)
self._update_worker_net() #this is the slow one
!!!!!!
This is the function that uses get_weights.
def _update_worker_net(self):
self.private_net.actor_t.set_weights(\
self.master.actor_t.get_weights())
self.private_net.critic.set_weights(\
self.master.critic.get_weights())
return
Looking around I found a post that suggested using
K.clear_session()
at the end of the while block (at the !!!!!! segment) because somehow new nodes are being added (?!) at the graph. But that onle returned an error:
AssertionError: Do not use tf.reset_default_graph() to clear nested graphs. If you need a cleared graph, exit the nesting and create a new graph.
Is there a faster way to transfer weights? Is there a way to not add new nodes (if that is what is indeed happening?)
This would typically happen when you dynamically add new nodes to the graph. Example situation:
while True:
grad_op = optimizer.get_gradients()
session.run([gradients])
Where get_gradients will add new operations to the graph. Operations returned by get_gradients would not change regardless of how many times you call it, therefore a single call should be enough. The correct way to rewrite it would be:
grad_op = optimizer.get_gradients()
while True:
session.run([gradients])
Something like that is probably happening in your code. Try to make sure that you dont construct new operations within your while loop.
I have 4 (or more) models (same structure but different training data). Now I want to ensemble them to make a prediction. I want to pre-load the models and then predict one input message (one message at a time) in parallel via multiprocess. However, the program always stops at "session.run" step. I could not figure it out why.
I tried passing all arguments to the function in each process, as shown in the code below. I also tried using a Queue object and put all the data (except the model object) in the queue. I also tried to set the number of process to 1. It made no difference.
with Manager() as manager:
first_level_test_features=manager.list()
procs =[]
for id in range(4):
p = Process(target=predict, args=(id, (message, models, configs, vocabs, emoji_dict,first_level_test_features)))
procs.append(p)
p.start()
for p in procs:
p.join()
I did not get any error message since it is just stuck there. I would expect the program can start multiple processes and each process uses the model pass to it to make the prediction.
I am unsure how session sharing along different Processes would work, and this is probably where your issue comes from. Given the way TensorFlow works, I would advise implementing the ensemble call as a graph operation, so that it can be run through a single session.run call, with TF handling the parallelization of computations wherever possible.
In practice, if you have symbolic tensors representing the models' predictions, you could use a TF operation to aggregate them (tf.concat, tf.reduce_mean, tf.add_n... whichever suits your design) and end up with a single symbolic tensor representing the ensemble prediction.
I hope this helps; if not, please provide some more details as to what your setting is, notably which form your models have.
I am trying to use new features of TF, namely Data API, and I am not sure how prefetch works. In the code below
def dataset_input_fn(...)
dataset = tf.data.TFRecordDataset(filenames, compression_type="ZLIB")
dataset = dataset.map(lambda x:parser(...))
dataset = dataset.map(lambda x,y: image_augmentation(...)
, num_parallel_calls=num_threads
)
dataset = dataset.shuffle(buffer_size)
dataset = dataset.batch(batch_size)
dataset = dataset.repeat(num_epochs)
iterator = dataset.make_one_shot_iterator()
does it matter between each lines above I put dataset=dataset.prefetch(batch_size)? Or maybe it should be after every operation that would be using output_buffer_size if the dataset was coming from tf.contrib.data?
In discussion on github I found a comment by mrry:
Note that in TF 1.4 there will be a Dataset.prefetch() method that
makes it easier to add prefetching at any point in the pipeline, not
just after a map(). (You can try it by downloading the current nightly
build.)
and
For example, Dataset.prefetch() will start a background thread to
populate a ordered buffer that acts like a tf.FIFOQueue, so that
downstream pipeline stages need not block. However, the prefetch()
implementation is much simpler, because it doesn't need to support as
many different concurrent operations as a tf.FIFOQueue.
so it means prefetch could be put by any command and it works on the previous command. So far I have noticed the biggest performance gains by putting it only at the very end.
There is one more discussion on Meaning of buffer_size in Dataset.map , Dataset.prefetch and Dataset.shuffle where mrry explains a bit more about the prefetch and buffer.
UPDATE 2018/10/01:
From version 1.7.0 Dataset API (in contrib) has an option to prefetch_to_device. Note that this transformation has to be the last in the pipeline and when TF 2.0 arrives contrib will be gone. To have prefetch work on multiple GPUs please use MultiDeviceIterator (example see #13610) multi_device_iterator_ops.py.
https://www.tensorflow.org/versions/master/api_docs/python/tf/contrib/data/prefetch_to_device
Imagine the case that I want to pair samples from one pool of data with samples from another pool of data together to feed into the network. But many samples from the first pool should be paired with the same sample in the second pool. (let's assume all samples are of the same shape).
For example, if we denote the samples from the first pool as f_i, samples from the second pool as g_j, I might want to have a mini-batch of samples as below (each line is one sample in the mini-batch):
(f_0, g_0)
(f_1, g_0)
(f_2, g_0)
(f_3, g_0)
...
(f_10, g_0)
(f_11, g_1)
(f_12, g_1)
(f_13, g_1)
...
(f_19, g_1)
...
If the data from the second pool are small (like labels), then I can simply store them together with samples from the first pool to tfrecords. But in my case the data from the second pool are of the same size as data from the first pool (for example, both are movie segments). Then saving them in pair in one tfrecords files seems to almost double the disk space use.
I wonder if there is any way in which I can only save all the samples from the second pool once on the disk, but still feed the data to my network as the way I wanted? (Assume I already have already specified the mapping between samples in the first pool and those from the second pool based on their file names).
Thanks a lot!
You can use an iterator for each one of the tfrecords (or pool of samples), so you get two iterators where each can iterate on its own pace. When you run get_next() on an iterator, the next sample is returned, so you have to keep it in a tensor and manually feed it. Quoting from the documentation:
(Note that, like other stateful objects in TensorFlow, calling Iterator.get_next() does not immediately advance the iterator. Instead you must use the returned tf.Tensor objects in a TensorFlow expression, and pass the result of that expression to tf.Session.run() to get the next elements and advance the iterator.)
So all you need is a couple of loops that iterate and combine samples from each iterator as a pair, and then you can feed this when you run your desired operation. For example:
g_iterator = g_dataset.make_one_shot_iterator()
get_next_g = g_iterator.get_next()
f_iterator = f_dataset.make_one_shot_iterator()
get_next_f = f_iterator.get_next()
# loop g:
temp_g = session.run(get_next_g)
# loop f:
temp_f = session.run(get_next_f)
session.run(train, feed_dict={f: temp_f, g: temp_g})
Does this answer your question?
Given that I train a model; save it off with metagraph/save.Saver, and the load that graph into a new script/process to test against test data, what is the best way to make sure I only iterate over the test data once?
With my training data, I want to be able to iterate over the entire data set for an arbitrary number of iterations. I use
tf.train.string_input_producer()
to drive a queue of loading files for training, so I can safely leave num_epochs as default (=None) and let other controls drive training termination.
However, when I run the graph for evaluation, I just want to the evaluate the test set once (and gather the appropriate statistics).
Initial attempted solution:
Make a tensor for Epochs, and pass that into tf.train.string_input_producer, and then tf.Assign it to the appropriate value based on test/train.
But:
tf.train.string_input_producer only takes integers as num_epochs, so this isn't possible...unless I'm missing something.
Further notes: I use
tf.train.batch()
to read-in test/train data that has been serialized into protocol buffers (https://www.tensorflow.org/versions/r0.11/how_tos/reading_data/index.html#file-formats), so I have minimal visibility into how the data is loaded and how far along it is.
tf.train.batch apparently will throw tf.errors.OutOfRangeError, but I'm not clear how to catch that successfully, or if that is even what I really want to do. I tried a very naive
try...except...finally
(like in https://www.tensorflow.org/versions/r0.11/how_tos/reading_data/index.html#creating-threads-to-prefetch-using-queuerunner-objects), which didn't catch the error from tf.train.batch.