How to make tensorflow assignment op part of computational graph without explicitly running its output? - tensorflow

I am trying to create a custom gradient in tensorflow to implement the exponentially smoothed (unbiased) gradient of a logarithm that is suggested in this paper (https://arxiv.org/pdf/1801.04062.pdf). What I need to do is crease a new variable that stores an exponentially smoothed value, which is updated and used in a custom gradient function. Additionally, I need a flag which tells me when the first gradient calculation is being done, so I can initialize the exponentially smoothed value to the appropriate (data-dependent) value. Furthermore, the output of the custom gradient function must be just the gradient, so it will be a pain in the butt to access the output of a tf.assign from inside the custom gradient. Lastly, I do not want to create a second operation that 'manually' initializes the exponential smoothing by running it separately in my training loop. Anyway, this is all too complicated, so I have an abstract, but simple, problem outlined below, the solution to which would solve my problem:
What I need to be able to do is update one variable in a manner which is conditional upon a second, and furthermore I need to update the second variable without providing it as explicit output by my function. Example code demonstrating my problem is below:
import tensorflow as tf
a = tf.get_variable(name = "test",initializer=True)
b = tf.get_variable(name = "testval",initializer = 10.)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
def make_function(inp):
with tf.variable_scope("",reuse = True):
a = tf.get_variable(name = "test",dtype = tf.bool)
b = tf.get_variable(name = "testval")
iftrue = lambda: [tf.assign(b,inp),tf.assign(a,False)]
iffalse = lambda: [tf.assign(b,(b + inp)/2),tf.assign(a,False)]
acond,bcond = tf.cond(a,iftrue,iffalse)
return acond
I = tf.placeholder(tf.float32)
tcond = make_function(I)
print("{}\tThe initial values of a and b".format(sess.run([a,b])))
print("{}\t\tRun, tcond1. output is the updated value of b.".format(sess.run(tcond,{I:1})))
print("{}\tNow we see that b has been updated, but a has not.".format(sess.run([a,b])))
print("{}\t\tSo now the value is 2 instead of 1.5 like it should be.".format(sess.run(tcond,{I:2})))
The output is:
[True, 10.0] The initial values of a and b
1.0 Run, tcond1. output is the updated value of b.
[True, 1.0] Now we see that b has been updated, but a has not.
2.0 So now the value is 2 instead of 1.5 like it should be.
Now, I understand that I need to have a line like sess.run(acond) where acond is the output of the conditional within make_function, but I can't return that because my function needs to only return the value of b (not a), and I don't want to have to carry around an extra op that I need to remember to run on the first training iteration, but not on the others.
So, is there a way to add the assignment op acond to the computational graph without explicitly returning it and running with it sess.run?

Add this operation to a custom collection and, then, create a dependency between your final op (e.g. the train_op) and your acond.
Inside the method:
tf.add_to_collection("to_run", acond)
In the definition of the final op:
to_run = tf.get_collection("to_run")
with tf.control_dependencies(to_run):
final_op = <something>
When you run final_op you are assured your acond has been already executed.

Related

How do I get value function/critic values from Rllib's PPO algorithm for a range of observations?

Goal: I want to train a PPO agent on a problem and determine its optimal value function for a range of observations. Later I plan to work with this value function (economic inequality research). The problem is sufficiently complex so that dynamic programming techniques no longer work.
Approach: In order to check, whether I get correct outputs for the value function, I have trained PPO on a simple problem, whose analytical solution is known. However, the results for the value function are rubbish, which is why I suspect that I have done sth wrong.
The code:
from keras import backend as k_util
...
parser = argparse.ArgumentParser()
# Define framework to use
parser.add_argument(
"--framework",
choices=["tf", "tf2", "tfe", "torch"],
default="tf",
help="The DL framework specifier.",
)
...
def get_rllib_config(seeds, debug=False, framework="tf") -> Dict:
...
def get_value_function(agent, min_state, max_state):
policy = agent.get_policy()
value_function = []
for i in np.arange(min_state, max_state, 1):
model_out, _ = policy.model({"obs": np.array([[i]], dtype=np.float32)})
value = k_util.eval(policy.model.value_function())[0]
value_function.append(value)
print(i, value)
return value_function
def train_schedule(config, reporter):
rllib_config = config["config"]
iterations = rllib_config.pop("training_iteration", 10)
agent = PPOTrainer(env=rllib_config["env"], config=rllib_config)
for _ in range(iterations):
result = agent.train()
reporter(**result)
values = get_value_function(agent, 0, 100)
print(values)
agent.stop()
...
resources = PPO.default_resource_request(exp_config)
tune_analysis = tune.Tuner(tune.with_resources(train_schedule, resources=resources), param_space=exp_config).fit()
ray.shutdown()
So first I get the policy (policy = agent.get_policy()) and run a forward pass with each of the 100 values (model_out, _ = policy.model({"obs": np.array([[i]], dtype=np.float32)})). Then, after each forward pass I use the value_function() method to get the output of the critic network and evaluate the tensor via keras backend.
The results:
True VF (analytical solution)
VF output of Rllib
Unfortunately you can see that the results are not that promising. Maybe I have missed a pre- or postprocessing step? Does the value_function() method even return the last layer of the critic network?
I am very grateful for any help!
It's not part of your script, but I assume that you have trained the policy before you attempt to get useful values out of it.
You are correct in assuming that the value_function() returns the output of the last layer of the critic network in RLlib's implementations.
Have a look at the value function metrics to see if it's actually learning anything (RLlib logs .../learner_stats/vf_loss and .../learner_stats/vf_explained_var)!
After training the model, I'd also try to query the model directly. If that looks better, something is likely off with the code you posted here.

Writing custom optimizers questions

apologies beforehand if these are basic questions - I've done some digging around and can't find straightforward answers. If there are links to any resources that can help that would be great too!
I'm currently looking at the below piece of code for a class that implements optimizer.optimizer. I'm having trouble understanding the following things:
def _create_slots(self, var_list):
# Create slots for the first and second moments.
for v in var_list:
self._zeros_slot(v, "m", self._name)
def _apply_dense(self, grad, var):
lr_t = math_ops.cast(self._lr_t, var.dtype.base_dtype)
beta_t = math_ops.cast(self._beta_t, var.dtype.base_dtype)
alpha_t = math_ops.cast(self._alpha_t, var.dtype.base_dtype)
eps = 1e-7 #cap for moving average
m = self.get_slot(var, "m")
m_t = m.assign(tf.maximum(beta_t * m + eps, tf.abs(grad)))
var_update = state_ops.assign_sub(var, lr_t*grad*(1.0+alpha_t*tf.sign(grad)*tf.sign(m_t) ) )
#Create an op that groups multiple operations
#When this op finishes, all ops in input have finished
return control_flow_ops.group(*[var_update, m_t])
In create slots, is it allocating a variable "m" for each weight? And if I needed more variables then would I have another line in the for loop like self._zero_slots(v, "another variable", self._name) and so forth?
Is the input grad and var in _apply_dense the gradient per weight and variables per weight? What if I needed other gradients, e.g. if I wanted to do a global update based on the whole gradient matrix?
How is m being updated with new weights? It seems like on every iteration it would just be multiplied by beta.
In the line with var_update, it seems like var is being updated with the weight update, but I can also get my variables from var?

How to fetch gradients with respect to certain occurrences of variables in tensorflow?

Since tensorflow supports variable reuse, some part of computing graph may occur multiple times in both forward and backward process. So my question is, is it possible to update variables with respect their certain occurrences in the compute graph?
For example, in X_A->Y_B->Y_A->Y_B, Y_B occurs twice, how to update them respectively? I mean, at first, we take the latter occurrence as constant, and update the previous one, then do opposite.
A more simple example is, say X_A, Y_B, Y_A are all scalar variable, then let Z = X_A * Y_B * Y_A * Y_B, here the gradient of Z w.r.t both occurrences of Y_B is X_A * Y_B * Y_A, but actually the gradient of Z to Y_B is 2*X_A * Y_B * Y_A. In this example computing gradients respectively may seems unnecessary, but not always are those computation commutative.
In the first example, gradients to the latter occurrence may be computed by calling tf.stop_gradient on X_A->Y_B. But I could not think of a way to fetch the previous one. Is there a way to do it in tensorflow's python API?
Edit:
#Seven provided an example on how to deal with it when reuse a single variable. However often it's a variable scope that is reused, which contains many variables and functions that manage them. As far as I know, their is no way to reuse a variable scope with applying tf.stop_gradient to all variables it contains.
With my understanding, when you use A = tf.stop_gradient(A), A will be considered as a constant. I have an example here, maybe it can help you.
import tensorflow as tf
wa = tf.get_variable('a', shape=(), dtype=tf.float32,
initializer=tf.constant_initializer(1.5))
b = tf.get_variable('b', shape=(), dtype=tf.float32,
initializer=tf.constant_initializer(7))
x = tf.placeholder(tf.float32, shape=())
l = tf.stop_gradient(wa*x) * (wa*x+b)
op_gradient = tf.gradients(l, x)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
print sess.run([op_gradient], feed_dict={x:11})
I have a workaround for this question. Define a custom getter for the concerning variable scope, which wraps the default getter with tf.stop_gradient. This could set all variables returned in this scope as a Tensor contributing no gradients, though sometimes things get complicated because it returns a Tensor instead of a variable, such as when using tf.nn.batch_norm.

TensorFlow: tf.case() & f_default should do nothing

I'm working on a network with 4 class-specific autoencoders (3 layer feedforward), and within the training iteration, there is a case check to decide, which autoencoder has to be updated:
def f(k): return tf.train.AdamOptimizer(learning_rate=lernrate).minimize(Cost_List[k]), n_List[k].assign_add(1.0), Cost_List[k]
def g(): ???
nothing = g()
min_index = tf.argmin(Cost_List, 0)
Case_0 = (tf.equal(min_index,0), lambda: f(0))
Case_1 = (tf.equal(min_index,1), lambda: f(1))
Case_2 = (tf.equal(min_index,2), lambda: f(2))
Case_3 = (tf.equal(min_index,3), lambda: f(3))
Case_List = [Case_0, Case_1, Case_2, Case_3]
[optimizer, update, cost] = tf.case(Case_List, nothing)
In the case, that no condition is fulfilled, nothing should be done. In this scenario, one of the four cases will be realized, so it's no practical problem here yet, but I want to modify the code, and then it will be important, that the training sample will be skipped in the default case. The problem is, that the return type(s) of f_default and all other return types have to be the same, because sess.run([optimizer, update, cost]) is expecting a certain type. How can I do this, that really nothing happens in the default case? I have already tried to use tf.no_op() but that's not working...
Thanks,
Meridius
To make the signatures match, you could define g() as follows:
def g():
return tf.no_op(), tf.no_op(), tf.constant(0.0)
Note that it would be slightly more efficient to pass g directly as f_default (as opposed to passing g() as the current code does) but the behaviour should be the same.

Can cond support TF ops with side effects?

The (source code) documentation for tf.cond is unclear on whether the functions to be performed when the predicate is evaluated can have side effects or not. I've done some tests but I'm getting conflicting results. For example the code below does not work:
import tensorflow as tf
from tensorflow.python.ops import control_flow_ops
pred = tf.placeholder(tf.bool, [])
count = tf.Variable(0)
adder = count.assign_add(1)
subtractor = count.assign_sub(2)
my_op = control_flow_ops.cond(pred, lambda: adder, lambda: subtractor)
sess = tf.InteractiveSession()
tf.initialize_all_variables().run()
my_op.eval(feed_dict={pred: True})
count.eval() # returns -1
my_op.eval(feed_dict={pred: False})
count.eval() # returns -2
I.e. no matter what value the predicate evaluates to, both functions are getting run, and so the net result is a subtraction of 1. On the other hand, this code snippet does work, where the only difference is that I add new ops to the graph every time my_op is called:
pred = tf.placeholder(tf.bool, [])
count = tf.Variable(0)
my_op = control_flow_ops.cond(pred, lambda:count.assign_add(1), lambda:count.assign_sub(2))
sess = tf.InteractiveSession()
tf.initialize_all_variables().run()
my_op.eval(feed_dict={pred: False})
count.eval() # returns -2
my_op.eval(feed_dict={pred: True})
count.eval() # returns -1
Not sure why creating new ops every time works while the other case doesn't, but I'd obviously rather not be adding nodes as the graph will eventually become too big.
Your second version—where the assign_add() and assign_sub() ops are creating inside the lambdas passed to cond()—is the correct way to do this. Fortunately, each of the two lambdas is only evaluated once, during the call to cond(), so your graph will not grow without bound.
Essentially what cond() does is the following:
Create a Switch node, which forwards its input to only one of two outputs, depending on the value of pred. Let's call the outputs pred_true and pred_false. (They have the same value as pred but that's unimportant since this is never directly evaluated.)
Build the subgraph corresponding to the if_true lambda, where all of the nodes have a control dependency on pred_true.
Build the subgraph corresponding to the if_false lambda, where all of the nodes have a control dependency on pred_false.
Zip together the lists of return values from the two lambdas, and create a Merge node for each of these. A Merge node takes two inputs, of which only one is expected to be produced, and forwards it to its output.
Return the tensors that are the outputs of the Merge nodes.
This means you can run your second version, and be content that the graph remains a fixed size, regardless of how many steps you run.
The reason your first version doesn't work is that, when a Tensor is captured (like adder or subtractor in your example), an additional Switch node is added to enforce the logic that the value of the tensor is only forwarded to the branch that actually executes. This is an artifact of how TensorFlow combines feed-forward dataflow and control flow in its execution model. The result is that the captured tensors (in this case the results of the assign_add and assign_sub) will always be evaluated, even if they aren't used, and you'll see their side effects. This is something we need to document better, and as Michael says, we're going to make this more usable in future.
The second case works because you have added the ops within the cond: this causes them to conditionally execute.
The first case it is analogous to saying:
adder = (count += 1)
subtractor = (count -= 2)
if (cond) { adder } else { subtractor }
Since adder and subtractor are outside the conditional, they are always executed.
The second case is more like saying
if (cond) { adder = (count += 1) } else { subtractor = (count -= 2) }
which in this case does what you expected.
We realize that the interaction between side effects and (somewhat) lazy evaluation is confusing, and we have a medium-term goal to make things more uniform. But the important thing to understand for now is that we do not do true lazy evaluation: the conditional acquires a dependency on every quantity defined outside the conditional that is used within either branch.