I intend to write a queue-runner thread which does some processing. Basically, I can formulate my action as some TF op which depends on a queue.dequeue.
So, maybe the canonical way to write this would be:
op = make_op()
with coord.stop_on_exception():
    while True:
        session.run(op)
At the very end it would stop because of an OutOfRangeError exception.
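For concreteness, here is a toy version of this first variant; the small FIFOQueue that is filled and closed up front, the dummy processing in make_op, and the coordinator wiring are stand-ins for my real setup:

import tensorflow as tf

# Toy queue, filled once and then closed, so that dequeue eventually
# raises OutOfRangeError.
queue = tf.FIFOQueue(capacity=10, dtypes=[tf.float32], shapes=[[]])
enqueue_op = queue.enqueue_many([[1.0, 2.0, 3.0]])
close_op = queue.close()

def make_op():
    return queue.dequeue() * 2.0  # stand-in for the real processing

op = make_op()
coord = tf.train.Coordinator()

with tf.Session() as session:
    session.run(enqueue_op)
    session.run(close_op)
    with coord.stop_on_exception():
        while True:
            session.run(op)  # runs three times, then OutOfRangeError stops the loop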
I could however also implement the loop as part of the graph and write it like this:
def body(last):
    with tf.control_dependencies([last]):
        return make_op()

op = tf.while_loop(
    cond=lambda last: tf.constant(True),
    body=body,
    loop_vars=[tf.no_op()],
    parallel_iterations=1, back_prop=False)
with coord.stop_on_exception():
    session.run(op)
I would expect that this is faster because there will be no Python code involved while executing the loop. Are there any downsides to this approach, or any reason why it would not work?
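For reference, here is the same toy setup from above with the in-graph loop; note that I had to give loop_vars an actual tensor (a dummy counter) rather than tf.no_op(), since while_loop expects tensor loop variables:

def make_step():
    return queue.dequeue() * 2.0  # stand-in for the real processing

def loop_body(i):
    step = make_step()
    with tf.control_dependencies([step]):
        return i + 1

in_graph_loop = tf.while_loop(
    cond=lambda i: tf.constant(True),
    body=loop_body,
    loop_vars=[tf.constant(0)],
    parallel_iterations=1,
    back_prop=False)

with tf.Session() as session:
    session.run(enqueue_op)
    session.run(close_op)
    with coord.stop_on_exception():
        session.run(in_graph_loop)  # loops in-graph until OutOfRangeError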
Related
I'm computing gradients from a private network and applying them to another master network. Then I'm copying the weights from the master to the private one (it sounds redundant, but bear with me). The problem is that with every iteration get_weights becomes slower, and I even run out of memory.
def work(self, session):
    with session.as_default(), session.graph.as_default():
        self.private_net = ACNetwork()
        state = self.env.reset()
        while counter < TOTAL_TR_STEPS:
            action_index, action_vector = self.get_action(state)
            next_state, reward, done, info = self.env.step(action_index)
            ...  # store the new data: reward, state, etc.
            if done == True:
                # end of episode
                state = self.env.reset()
                a_grads, c_grads = self.private_net.get_gradients()
                self.master.update_from_gradients(a_grads, c_grads)
                self._update_worker_net()  # this is the slow one
!!!!!!
This is the function that uses get_weights.
def _update_worker_net(self):
    self.private_net.actor_t.set_weights(
        self.master.actor_t.get_weights())
    self.private_net.critic.set_weights(
        self.master.critic.get_weights())
    return
Looking around I found a post that suggested using
K.clear_session()
at the end of the while block (at the !!!!!! segment), because somehow new nodes are being added (?!) to the graph. But that only returned an error:
AssertionError: Do not use tf.reset_default_graph() to clear nested graphs. If you need a cleared graph, exit the nesting and create a new graph.
Is there a faster way to transfer weights? Is there a way to not add new nodes (if that is indeed what is happening)?
This would typically happen when you dynamically add new nodes to the graph. Example situation:
while True:
    grad_op = optimizer.get_gradients()
    session.run([grad_op])
Here, get_gradients adds new operations to the graph. The operations returned by get_gradients do not change regardless of how many times you call it, so a single call is enough. The correct way to rewrite it would be:
grad_op = optimizer.get_gradients()
while True:
    session.run([grad_op])
Something like that is probably happening in your code. Try to make sure that you don't construct new operations within your while loop.
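For the weight transfer specifically, the same idea applies: build the copy operations once and only run them inside the loop. Below is a hedged sketch, assuming actor_t and critic are Keras models with identical architectures whose variables are exposed via .weights; build_sync_op and sync_op are names introduced here, not part of the original code.

import tensorflow as tf

def build_sync_op(private_net, master_net):
    # Create one assign op per variable pair, once, at graph-construction time.
    # Assumes the private and master networks have identical architectures.
    assigns = []
    for private_model, master_model in [(private_net.actor_t, master_net.actor_t),
                                        (private_net.critic, master_net.critic)]:
        for p_var, m_var in zip(private_model.weights, master_model.weights):
            assigns.append(tf.assign(p_var, m_var))
    return tf.group(*assigns)

# Before the while loop (construction happens exactly once):
#     self.sync_op = build_sync_op(self.private_net, self.master)
# Inside _update_worker_net, no new nodes are added; just run the op:
#     session.run(self.sync_op)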
y = []
for i in range(9999):
    x = ...  # build/get sub-graph
    y.append(x)
y = tf.stack(y)
I am trying to build a TensorFlow graph that contains multiple separate sub-graphs/inputs (independent of each other), and I am using a Python for-loop to achieve this (assuming the loop cannot be expressed with tensor operations).
I found that this part is really slow, and I think it is because the Python for-loop does not run in parallel (it processes the sub-graphs one by one). Is there a way to make it parallel (since each sub-graph/input is independent and separate)? Would switching to tf.while_loop help, or is there any other suggestion?
I am trying to minimize x*x with the Adam optimizer. I expect to get x=0 as the result, but I get a value of x close to the initial value.
import tensorflow as tf

x = tf.Variable(-2.)
sq = x * x
o = tf.train.AdamOptimizer(1e-1).minimize(sq)
with tf.Session() as sess:
    init = tf.global_variables_initializer()
    sess.run([init])
    sess.run([o])
    r = sess.run([x])
    print("done", r)
I get -1.9 as a result, instead of the expected 0.
Do I understand correctly that -2 is the initial value here, or is it something else? Does the Adam optimizer perform just one step, or is it possible to run it for continuous optimization? How do I get x=0 as the result?
sess.run([o]) runs only a single step. To perform a full optimization, you need to run many steps, which can be done by repeating the single step in a loop.
Thus, you can replace sess.run([o]) with:
for i in range(1000):
    sess.run([o])
This yields the result 3.4735016e-23, very close to the expected 0.
In my experience, people usually run many optimization steps just as I demonstrated, with a for loop. If you are interested in implementing the loop as a TensorFlow operation, and then running this operation only once, this can be done, but it is not recommended. The reasons are: (a) I don't think you will gain any "elegance" in your code by doing this. (b) If you want to run 1000 steps, you will need to add 1000 sets of operations to your graph, and group them as one. Contrast this to needing only one set of operations.
You can see more relevant information in this question.
What is the canonical way of running a TensorFlow "for loop"?
Specifically, suppose we have some body function which does NOT depend on the loop iteration, but must be run n times.
One might think that a good approach would be to run it inside a tf.while_loop, like this:
def body(x):
    return ...

def while_body(i, x):
    return i + 1, body(x)

i, x = tf.while_loop(lambda i, x: tf.less(i, n), while_body,
                     [tf.constant(0), x])
In fact, that is precisely what the highest rated answer in this question suggests:
How can I run a loop with a tensor as its range? (in tensorflow)
However, the tf.while_loop docs specifically say
For correct programs, while_loop should return the same result for any parallel_iterations > 0.
If you put a counter in the body, then it seems that that condition is violated. So it seems that there must be a different way of setting up a "for loop".
Furthermore, even if there is no explicit error, doing so seems like it will create a dependency between iterations, meaning that I do not think they will run in parallel.
After some investigation, it seems that the tf.while_loop idiom used above is quite common. Alternatively, one can use tf.scan:
def body(x):
    return ...

def scan_body(previous_output, iteration):
    return body(...)

x = tf.scan(scan_body, tf.range(n), initializer=[x])
although I have no idea if one is preferable from a performance point of view. Note in the above that we have to wrap the body function to accept the previous output.
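For concreteness, here is a minimal runnable sketch of the tf.while_loop idiom above, with a made-up body that simply doubles a vector n times; the names and the doubling are placeholders for the real computation, not anything from the question.

import tensorflow as tf

n = tf.constant(5)
x0 = tf.ones([3])

def body(x):
    return 2.0 * x  # made-up body; it does not depend on the loop iteration

def while_body(i, x):
    return i + 1, body(x)

_, result = tf.while_loop(lambda i, x: tf.less(i, n), while_body,
                          [tf.constant(0), x0])

with tf.Session() as sess:
    print(sess.run(result))  # [32. 32. 32.]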
About the usage of tf.contrib.data.Dataset (from TensorFlow 1.2, see here and here):
The way a Dataset gets its data doesn't really fit the way I usually get my data. In my case, I have a thread in which I receive data; I don't know in advance when it will end, but I can see when it does. Then I wait until I have processed all the buffers, and then I have finished one epoch. How can I express this logic with the Dataset?
Note that I prefer the Dataset interface over the QueueBase interface because it gives me an iterator interface which I can reinitialize and even reset to a different Dataset. This is more powerful than queues, which currently cannot be reopened after they are closed (see here and here).
Maybe a similar question, or the same question: how can I wrap a Dataset around a queue? I have a thread which reads some data from somewhere and can feed and queue it somehow. How do I get the data into the Dataset? I could repeat some dummy tensor infinitely and then use map to just return my queue.dequeue(), but that really only gets me back to all the original problems with the queue, i.e. how to reopen the queue.
The new Dataset.from_generator() method allows you to define a Dataset that is fed by a Python generator. (To use this feature at present, you must download a nightly build of TensorFlow or build it yourself from source. It will be part of TensorFlow 1.4.)
The easiest way to implement your example would be to replace your receiving thread with a generator, with pseudocode as follows:
def receiver():
    while True:
        next_element = ...   # Receive next element from external source.
                             # Note that this method may block.
        end_of_epoch = ...   # Decide whether or not to stop based on next_element.
        if not end_of_epoch:
            yield next_element  # Note: you may need to convert this to an array.
        else:
            return  # Returning will signal OutOfRangeError on downstream iterators.
dataset = tf.contrib.data.Dataset.from_generator(receiver, output_types=...)

# You can chain other `Dataset` methods after the generator. For example:
dataset = dataset.prefetch(...)  # This will start a background thread
                                 # to prefetch elements from `receiver()`.
dataset = dataset.repeat(...)    # Note that each repetition will call
                                 # `receiver()` again, and start from
                                 # a fresh state.
dataset = dataset.batch(...)
More complicated topologies are possible. For example, you can use Dataset.interleave() to create many receivers in parallel.
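To make the consumption side concrete, here is a small end-to-end sketch of iterating over such a generator-backed Dataset until the generator is exhausted. It uses the tf.data names as they appear in TensorFlow 1.4 (substitute tf.contrib.data on the nightly builds mentioned above), and the toy receiver is a stand-in for the real receiving logic.

import tensorflow as tf

def receiver():
    for i in range(10):  # stand-in for the external data source
        yield float(i)

dataset = tf.data.Dataset.from_generator(receiver, output_types=tf.float32)
dataset = dataset.batch(4)

iterator = dataset.make_one_shot_iterator()
next_batch = iterator.get_next()

with tf.Session() as sess:
    try:
        while True:
            print(sess.run(next_batch))
    except tf.errors.OutOfRangeError:
        pass  # the generator returned, i.e. one epoch is finished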