Can cond support TF ops with side effects? - tensorflow

The (source code) documentation for tf.cond is unclear on whether the functions to be performed when the predicate is evaluated can have side effects or not. I've done some tests but I'm getting conflicting results. For example the code below does not work:
import tensorflow as tf
from tensorflow.python.ops import control_flow_ops
pred = tf.placeholder(tf.bool, [])
count = tf.Variable(0)
adder = count.assign_add(1)
subtractor = count.assign_sub(2)
my_op = control_flow_ops.cond(pred, lambda: adder, lambda: subtractor)
sess = tf.InteractiveSession()
tf.initialize_all_variables().run()
my_op.eval(feed_dict={pred: True})
count.eval() # returns -1
my_op.eval(feed_dict={pred: False})
count.eval() # returns -2
I.e. no matter what value the predicate evaluates to, both functions are getting run, and so the net result is a subtraction of 1. On the other hand, this code snippet does work, where the only difference is that I add new ops to the graph every time my_op is called:
pred = tf.placeholder(tf.bool, [])
count = tf.Variable(0)
my_op = control_flow_ops.cond(pred, lambda:count.assign_add(1), lambda:count.assign_sub(2))
sess = tf.InteractiveSession()
tf.initialize_all_variables().run()
my_op.eval(feed_dict={pred: False})
count.eval() # returns -2
my_op.eval(feed_dict={pred: True})
count.eval() # returns -1
Not sure why creating new ops every time works while the other case doesn't, but I'd obviously rather not be adding nodes as the graph will eventually become too big.

Your second version, where the assign_add() and assign_sub() ops are created inside the lambdas passed to cond(), is the correct way to do this. Fortunately, each of the two lambdas is only evaluated once, during the call to cond(), so your graph will not grow without bound.
Essentially what cond() does is the following:
1. Create a Switch node, which forwards its input to only one of two outputs, depending on the value of pred. Let's call the outputs pred_true and pred_false. (They have the same value as pred, but that's unimportant, since this is never directly evaluated.)
2. Build the subgraph corresponding to the if_true lambda, where all of the nodes have a control dependency on pred_true.
3. Build the subgraph corresponding to the if_false lambda, where all of the nodes have a control dependency on pred_false.
4. Zip together the lists of return values from the two lambdas, and create a Merge node for each pair. A Merge node takes two inputs, of which only one is expected to be produced, and forwards it to its output.
5. Return the tensors that are the outputs of the Merge nodes.
This means you can run your second version, and be content that the graph remains a fixed size, regardless of how many steps you run.
The reason your first version doesn't work is that, when a Tensor is captured (like adder or subtractor in your example), an additional Switch node is added to enforce the logic that the value of the tensor is only forwarded to the branch that actually executes. This is an artifact of how TensorFlow combines feed-forward dataflow and control flow in its execution model. The result is that the captured tensors (in this case the results of the assign_add and assign_sub) will always be evaluated, even if they aren't used, and you'll see their side effects. This is something we need to document better, and as Michael says, we're going to make this more usable in future.
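If you want to convince yourself that the second version's graph really stays a fixed size, one quick check (a sketch, using the same TF APIs as the question) is to count the graph's operations before and after running the op repeatedly:
import tensorflow as tf
from tensorflow.python.ops import control_flow_ops

pred = tf.placeholder(tf.bool, [])
count = tf.Variable(0)
my_op = control_flow_ops.cond(pred,
                              lambda: count.assign_add(1),
                              lambda: count.assign_sub(2))

sess = tf.InteractiveSession()
tf.initialize_all_variables().run()
n_ops = len(tf.get_default_graph().get_operations())
for _ in range(100):
    my_op.eval(feed_dict={pred: True})
# Running the op adds no new nodes to the graph:
assert len(tf.get_default_graph().get_operations()) == n_ops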

The second case works because you have added the ops within the cond: this causes them to conditionally execute.
The first case is analogous to saying:
adder = (count += 1)
subtractor = (count -= 2)
if (cond) { adder } else { subtractor }
Since adder and subtractor are outside the conditional, they are always executed.
The second case is more like saying:
if (cond) { adder = (count += 1) } else { subtractor = (count -= 2) }
which in this case does what you expected.
We realize that the interaction between side effects and (somewhat) lazy evaluation is confusing, and we have a medium-term goal to make things more uniform. But the important thing to understand for now is that we do not do true lazy evaluation: the conditional acquires a dependency on every quantity defined outside the conditional that is used within either branch.
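One practical consequence of this (a sketch illustrating the point, not part of the original answer): side-effecting ops must be created inside the lambdas, while pure tensors can be captured from outside safely, since evaluating a pure tensor in both branches wastes a little compute but has no visible effect:
import tensorflow as tf

pred = tf.placeholder(tf.bool, [])
count = tf.Variable(0)
x = tf.constant(10)  # pure tensor: capturing it is harmless

my_op = tf.cond(pred,
                lambda: count.assign_add(1) + x,   # assign created inside the branch
                lambda: count.assign_sub(2) + x)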

Related

get_weights is slow with every iteration

I'm computing gradients from a private network and applying them to another, master network. Then I'm copying the weights from the master to the private one (it sounds redundant, but bear with me). The problem is that with every iteration get_weights becomes slower, and I even run out of memory.
def work(self, session):
    with session.as_default(), session.graph.as_default():
        self.private_net = ACNetwork()
        state = self.env.reset()
        while counter < TOTAL_TR_STEPS:
            action_index, action_vector = self.get_action(state)
            next_state, reward, done, info = self.env.step(action_index)
            # ... store the new data: reward, state etc...
            if done == True:
                # end of episode
                state = self.env.reset()
            a_grads, c_grads = self.private_net.get_gradients()
            self.master.update_from_gradients(a_grads, c_grads)
            self._update_worker_net()  # this is the slow one
            # !!!!!!
This is the function that uses get_weights.
def _update_worker_net(self):
    self.private_net.actor_t.set_weights(
        self.master.actor_t.get_weights())
    self.private_net.critic.set_weights(
        self.master.critic.get_weights())
    return
Looking around I found a post that suggested using
K.clear_session()
at the end of the while block (at the !!!!!! segment), because somehow new nodes are being added (?!) to the graph. But that only returned an error:
AssertionError: Do not use tf.reset_default_graph() to clear nested graphs. If you need a cleared graph, exit the nesting and create a new graph.
Is there a faster way to transfer weights? Is there a way to not add new nodes (if that is what is indeed happening?)
This would typically happen when you dynamically add new nodes to the graph. Example situation:
while True:
    grad_op = optimizer.get_gradients()
    session.run([grad_op])
Here get_gradients adds new operations to the graph on every iteration. The operations returned by get_gradients do not change regardless of how many times you call it, so a single call is enough. The correct way to rewrite it would be:
grad_op = optimizer.get_gradients()
while True:
    session.run([grad_op])
Something like that is probably happening in your code. Try to make sure that you don't construct new operations within your while loop.
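Concretely, the get_weights()/set_weights() round trips are a likely culprit here: in graph mode, set_weights() in particular can create fresh placeholder and assign ops on each call in some Keras versions. A hedged sketch of one way around it, assuming both networks expose matching lists of tf.Variable objects (the helper and attribute names below are hypothetical):
import tensorflow as tf

def make_copy_op(master_vars, private_vars):
    # Build the assign ops once, at graph-construction time.
    assigns = [tf.assign(dst, src)
               for dst, src in zip(private_vars, master_vars)]
    return tf.group(*assigns)

# Hypothetical wiring, done once before the training loop:
# copy_op = make_copy_op(self.master.actor_t.weights,
#                        self.private_net.actor_t.weights)
# Inside the while loop, only run the pre-built op:
# session.run(copy_op)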

How to make tensorflow assignment op part of computational graph without explicitly running its output?

I am trying to create a custom gradient in tensorflow to implement the exponentially smoothed (unbiased) gradient of a logarithm that is suggested in this paper (https://arxiv.org/pdf/1801.04062.pdf). What I need to do is create a new variable that stores an exponentially smoothed value, which is updated and used in a custom gradient function. Additionally, I need a flag which tells me when the first gradient calculation is being done, so I can initialize the exponentially smoothed value to the appropriate (data-dependent) value. Furthermore, the output of the custom gradient function must be just the gradient, so it will be a pain in the butt to access the output of a tf.assign from inside the custom gradient. Lastly, I do not want to create a second operation that 'manually' initializes the exponential smoothing by running it separately in my training loop. Anyway, this is all too complicated, so I have an abstract, but simple, problem outlined below, the solution to which would solve my problem:
What I need to be able to do is update one variable in a manner which is conditional upon a second, and furthermore I need to update the second variable without providing it as explicit output by my function. Example code demonstrating my problem is below:
import tensorflow as tf
a = tf.get_variable(name = "test",initializer=True)
b = tf.get_variable(name = "testval",initializer = 10.)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
def make_function(inp):
    with tf.variable_scope("", reuse=True):
        a = tf.get_variable(name="test", dtype=tf.bool)
        b = tf.get_variable(name="testval")
    iftrue = lambda: [tf.assign(b, inp), tf.assign(a, False)]
    iffalse = lambda: [tf.assign(b, (b + inp) / 2), tf.assign(a, False)]
    acond, bcond = tf.cond(a, iftrue, iffalse)
    return acond
I = tf.placeholder(tf.float32)
tcond = make_function(I)
print("{}\tThe initial values of a and b".format(sess.run([a,b])))
print("{}\t\tRun, tcond1. output is the updated value of b.".format(sess.run(tcond,{I:1})))
print("{}\tNow we see that b has been updated, but a has not.".format(sess.run([a,b])))
print("{}\t\tSo now the value is 2 instead of 1.5 like it should be.".format(sess.run(tcond,{I:2})))
The output is:
[True, 10.0] The initial values of a and b
1.0 Run, tcond1. output is the updated value of b.
[True, 1.0] Now we see that b has been updated, but a has not.
2.0 So now the value is 2 instead of 1.5 like it should be.
Now, I understand that I need to have a line like sess.run(acond) where acond is the output of the conditional within make_function, but I can't return that because my function needs to only return the value of b (not a), and I don't want to have to carry around an extra op that I need to remember to run on the first training iteration, but not on the others.
So, is there a way to add the assignment op acond to the computational graph without explicitly returning it and running it with sess.run?
Add this operation to a custom collection and then create a dependency between your final op (e.g. the train_op) and your acond.
Inside the method:
tf.add_to_collection("to_run", acond)
In the definition of the final op:
to_run = tf.get_collection("to_run")
with tf.control_dependencies(to_run):
    final_op = <something>
When you run final_op you are assured your acond has been already executed.
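For illustration, here is a minimal self-contained sketch of this pattern (the variable names are made up):
import tensorflow as tf

x = tf.get_variable("x", initializer=0.)
flag = tf.get_variable("flag", initializer=True)

# Somewhere inside a function: register the side-effecting op.
upd = tf.assign(flag, False)
tf.add_to_collection("to_run", upd)

# Where the final op is defined:
to_run = tf.get_collection("to_run")
with tf.control_dependencies(to_run):
    final_op = tf.identity(x)  # stand-in for the real train_op

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(final_op)
    print(sess.run(flag))  # False: upd ran as a dependency of final_op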

Cache intermediate tensor and update periodically

I have a large tensor that is expensive to calculate, but realistically I only need to recalculate it every 10 iterations or so (during gradient descent). What's the best way to do this?
More specifically:
Suppose I have an intermediate_tensor that is used in the calculation of final_tensor each time a tf.Session is run. final_tensor is, in my case, a set of modified gradients to use in optimization. It is possible to define a graph that contains both intermediate_tensor and final_tensor. However, running this graph will be inefficient when intermediate_tensor changes slowly. In pseudocode, this is what I'd like to do:
intermediate_tensor = tf.some_operation(earlier_variable)
final_tensor = tf.matmul(intermediate_tensor, other_earlier_variable)
with tf.Session() as sess:
    # pretending `partial_run` works like I want it to:
    sess.partial_run(intermediate_tensor, feed_dict={})
    for i in range(5):
        ft = sess.partial_run(final_tensor, feed_dict={})
        print(ft)
The experimental partial_run feature is almost what I'm looking for. However, partial_run can only be used if I want to evaluate final_tensor just once for each time I evaluate intermediate_tensor. It won't work for a for loop.
My workaround for the moment is to use tf.placeholder. I evaluate intermediate_tensor in one call to sess.run, then feed the result to a new call of sess.run as a placeholder. However, this is very inflexible. It requires that I hardcode the variable shape at compile time, for example. It's also not very good when the number of intermediate variables I'd like to use is very large.
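For concreteness, a minimal sketch of that placeholder workaround (the shapes and ops here are hypothetical stand-ins for the real ones):
import tensorflow as tf

earlier_variable = tf.get_variable("ev", shape=[4, 4])         # hypothetical
other_earlier_variable = tf.get_variable("oev", shape=[4, 4])  # hypothetical
intermediate_tensor = tf.square(earlier_variable)  # stand-in for some_operation

# The shape must be hardcoded here, which is the inflexibility complained about:
cached = tf.placeholder(tf.float32, shape=[4, 4])
final_tensor = tf.matmul(cached, other_earlier_variable)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(50):
        if step % 10 == 0:
            it_val = sess.run(intermediate_tensor)  # expensive, every 10 steps
        ft = sess.run(final_tensor, feed_dict={cached: it_val})  # cheap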
Is there a better way? This would be very helpful if, say, one were using a curvature matrix that doesn't need to be evaluated every iteration.

TensorFlow: tf.case() & f_default should do nothing

I'm working on a network with 4 class-specific autoencoders (3-layer feedforward), and within the training iteration there is a case check to decide which autoencoder has to be updated:
def f(k): return tf.train.AdamOptimizer(learning_rate=lernrate).minimize(Cost_List[k]), n_List[k].assign_add(1.0), Cost_List[k]
def g(): ???
nothing = g()
min_index = tf.argmin(Cost_List, 0)
Case_0 = (tf.equal(min_index,0), lambda: f(0))
Case_1 = (tf.equal(min_index,1), lambda: f(1))
Case_2 = (tf.equal(min_index,2), lambda: f(2))
Case_3 = (tf.equal(min_index,3), lambda: f(3))
Case_List = [Case_0, Case_1, Case_2, Case_3]
[optimizer, update, cost] = tf.case(Case_List, nothing)
In the case that no condition is fulfilled, nothing should be done. In this scenario one of the four cases will always be realized, so it's not a practical problem here yet, but I want to modify the code, and then it will be important that the training sample is skipped in the default case. The problem is that the return type(s) of f_default have to match all the other return types, because sess.run([optimizer, update, cost]) expects a certain type. How can I make it so that really nothing happens in the default case? I have already tried to use tf.no_op(), but that's not working...
Thanks,
Meridius
To make the signatures match, you could define g() as follows:
def g():
    return tf.no_op(), tf.no_op(), tf.constant(0.0)
Note that it would be slightly more efficient to pass g directly as f_default (as opposed to passing g() as the current code does) but the behaviour should be the same.
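With that definition, the call from the question would become (note that g itself is passed, not g()):
[optimizer, update, cost] = tf.case(Case_List, g)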

In dynamic_rnn() is it valid to include variables in the state?

I'm implementing an RNN cell around BasicLSTMCell where I want to be able to look back on past hidden states (across batch boundaries). I'm using dynamic_rnn() and the basic pattern I use is:
def __call__(self, inputs, old_state, scope=None):
    mem = old_state[2]
    # [do something with mem]
    cell_out, new_state = self.cell(inputs,
                                    (old_state[0],
                                     old_state[1]))
    h_state = new_state.h
    c_state = new_state.c
    # control dependency required because of self.buf_index
    with tf.get_default_graph().control_dependencies([cell_out]):
        new_mem = write_to_buf(self.out_buf,
                               cell_out,
                               self.buf_index)
    # update the buffer index
    with tf.get_default_graph().control_dependencies([new_mem]):
        inc_step = tf.assign(self.buf_index,
                             (self.buf_index + 1) % self.buf_size)
    with tf.get_default_graph().control_dependencies([inc_step]):
        h_state = tf.identity(h_state)
    t = [c_state, h_state, new_mem]
    return cell_out, tuple(t)
self.buf and self.buf_index are variables. write_to_buf() is a function that uses scatter_update() to write the new hidden states to the buffer and returns the result.
I rely on the assumption that accessing a scatter update's return value guarantees that the new variable value is used (similar to this), so that caching of variables does not mess things up.
From debug prints it seems to work but it would be nice to get some confirmation or suggestions on alternatives.