I'm new to TensorFlow, so I'm not sure whether this question is specific to TensorFlow Federated.
I'm studying adversarial attacks on federated learning in this code. I'm curious how the weights received from the server are updated at the client.
For example, here is the code for a 'benign' update:
@tf.function
def compute_benign_update():
  """Computes the benign update sent back to the server."""
  tf.nest.map_structure(lambda a, b: a.assign(b), model_weights,
                        initial_weights)
  num_examples_sum = benign_dataset.reduce(initial_state=tf.constant(0),
                                           reduce_func=reduce_fn)
  weights_delta_benign = tf.nest.map_structure(lambda a, b: a - b,
                                               model_weights.trainable,
                                               initial_weights.trainable)
  aggregated_outputs = model.report_local_outputs()
  return weights_delta_benign, aggregated_outputs, num_examples_sum
I can see that the initial weights received from the server are assigned to model_weights, and then reduce_fn is used to train on a batch of data on the local client:
@tf.function
def reduce_fn(num_examples_sum, batch):
  """Runs `tff.learning.Model.train_on_batch` on a local client batch."""
  with tf.GradientTape() as tape:
    output = model.forward_pass(batch)
  gradients = tape.gradient(output.loss, model.trainable_variables)
  optimizer.apply_gradients(zip(gradients, model.trainable_variables))
  return num_examples_sum + tf.shape(output.predictions)[0]
Inside this function training occurs and (I think) model.trainable_variables is updated. The part that doesn't make sense to me is how the weights_delta_benign is calculated:
weights_delta_benign = tf.nest.map_structure(lambda a, b: a - b,
                                             model_weights.trainable,
                                             initial_weights.trainable)
It seems that the difference between model_weights.trainable and initial_weights.trainable is used, but didn't we originally set these equal in the first line of compute_benign_update()? I'm assuming reduce_fn alters initial_weights somehow, but I don't see the connection between the model.trainable_variables used in the reduce function and initial_weights.trainable.
Thanks, any help appreciated!
In the code you point to, initial_weights is only a collection of values (tf.Tensor objects), and model_weights is a reference to the model's variables (tf.Variable objects). We use initial_weights to assign the initial value to the model's variables.
Then the call to optimizer.apply_gradients(zip(gradients, model.trainable_variables)) modifies only the model's variables. (model.trainable_variables refers to the same objects as model_weights.trainable; I acknowledge this is a bit confusing.)
So the subsequent computation of weights_delta_benign is computing the difference between the model's trainable variables at the end and start of the client's training procedure.
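To make the tensor-vs-variable distinction concrete, here is a minimal standalone sketch (the names are hypothetical, not from the linked code): a snapshot tensor keeps its value while the variable is updated in place, so the final difference is nonzero.
import tensorflow as tf

v = tf.Variable([1.0, 2.0])  # plays the role of model_weights.trainable
snapshot = tf.identity(v)    # a plain tf.Tensor: a copy of the values, not a reference

v.assign_add([0.5, -0.5])    # "training" modifies the variable in place

delta = v - snapshot         # the snapshot is unchanged, so the delta is nonzero
print(delta.numpy())         # [ 0.5 -0.5]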
I would like to ask for some help with creating a custom layer.
What I am trying to do is actually quite simple: generating an output layer with 'stateful' variables, i.e. tensors whose value is updated at each batch.
In order to make everything clearer, here is a snippet of what I would like to do:
def call(self, inputs):
    c = self.constant
    m = self.extra_constant
    update = inputs * m + c
    X_new = self.X_old + update
    outputs = X_new
    self.X_old = X_new
    return outputs
The idea here is quite simple:
X_old is initialized to 0 in def __init__(self, ...)
update is computed as a function of the inputs to the layer
the output of the layer is computed (i.e. X_new)
the value of X_old is set equal to X_new so that, at the next batch, X_old is no longer equal to zero but equal to X_new from the previous batch.
I have found out that K.update does the job, as shown in the example:
X_new = K.update(self.X_old, self.X_old + update)
The problem here is that, if I then try to define the outputs of the layer as:
outputs = X_new
return outputs
I will receive the following error when I call model.fit():
ValueError: An operation has `None` for gradient. Please make sure that all of your ops have
gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
I keep getting this error even though I set layer.trainable = False and did not define any weights or biases for the layer. On the other hand, if I just do self.X_old = X_new, the value of X_old does not get updated.
Do you have a solution to implement this? I believe it should not be that hard, since stateful RNNs work in a 'similar' way.
Thanks in advance for your help!
Defining a custom layer can become confusing at times. Some of the methods that you override are called only once, but they give you the impression that, just like in many other OO libraries/frameworks, they will be called many times.
Here is what I mean: when you define a layer and use it in a model, the Python code you write when overriding the call method is not directly invoked during the forward or backward pass. Instead, it's called only once, when you call model.compile. It compiles the Python code into a computational graph, and that graph, through which the tensors flow, is what does the computations during training and prediction.
That's why, if you want to debug your model, putting a print statement won't work; you need to use tf.print to add a print command to the graph.
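As a small illustration of that point (assuming TF 2.x, where tf.function does the tracing), Python's print runs only while the code is traced, whereas tf.print becomes an op in the graph:
import tensorflow as tf

@tf.function
def f(x):
    print('traced')              # runs once, at tracing time
    tf.print('running with', x)  # becomes an op and runs on every call
    return x * 2

f(tf.constant(1.0))  # prints 'traced' and 'running with 1'
f(tf.constant(2.0))  # prints only 'running with 2'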
It is the same situation with the state variable you want to have. Instead of simply assigning old + update to new, you need to call a Keras function that adds that operation to the graph.
And note that tensors are immutable, so you need to define the state as a tf.Variable in the __init__ method.
So I believe this code is more like what you're looking for:
class CustomLayer(tf.keras.layers.Layer):
    def __init__(self, **kwargs):
        super(CustomLayer, self).__init__(**kwargs)
        # The state must be a tf.Variable so it can be mutated in the graph.
        self.state = tf.Variable(tf.zeros((3, 3), 'float32'))
        self.constant = tf.constant([[1, 1, 1], [1, 0, -1], [-1, 0, 1]], 'float32')
        self.extra_constant = tf.constant([[1, 1, 1], [1, 0, -1], [-1, 0, 1]], 'float32')
        self.trainable = False

    def call(self, X):
        m = self.constant
        c = self.extra_constant
        outputs = self.state + tf.matmul(X, m) + c
        # Add the state update to the graph instead of a plain Python assignment.
        tf.keras.backend.update(self.state, tf.reduce_sum(outputs, axis=0))
        return outputs
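As a quick sanity check of the layer (my own usage sketch, assuming inputs are batches of 3x3 matrices so that the state, the constants, and the batch-summed update all have compatible shapes):
import numpy as np
import tensorflow as tf

inputs = tf.keras.Input(shape=(3, 3))
model = tf.keras.Model(inputs, CustomLayer()(inputs))

x = np.ones((2, 3, 3), 'float32')
print(model.predict(x))  # state is all zeros on the first call
print(model.predict(x))  # state now holds the batch-sum of the previous outputs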
In all the tutorials (including the TF official docs) that I have seen about TFE, the examples use a gradient tape and manually add all the variables to the list passed to the gradient computation, e.g.
variables = [w1, b1, w2, b2]  # <--- manually store all the variables
optimizer = tf.train.AdamOptimizer()
with tf.GradientTape() as tape:
    y_pred = model.predict(x, variables)
    loss = model.compute_loss(y_pred, y)
grads = tape.gradient(loss, variables)  # <---- send them to tape.gradient
optimizer.apply_gradients(zip(grads, variables))
But is this the only way? Even for huge models, do we need to accumulate all the parameters by hand, or can we somehow access the default graph's variable list? Trying to access
tf.get_default_graph().get_collection(tf.GraphKeys.GLOBAL_VARIABLES)
or trainable_variables inside a TFE session gives an empty list.
To the best of my understanding, eager mode in TensorFlow stores information about the model in objects, for example in tf.keras.Model or tf.estimator.Estimator. In the absence of a graph, you can get the list of variables only there, using tf.keras.Model.trainable_variables for example.
Eager mode can, however, work with an explicitly created graph object. In that case, I think it will store the list of variables. Without it, the Keras model object is the only explicit storage for variables.
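For example, a minimal sketch (using current tf.keras APIs; the model and data here are made up) where the model object itself supplies the variable list instead of a hand-maintained [w1, b1, w2, b2]:
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation='relu', input_shape=(4,)),
    tf.keras.layers.Dense(1),
])
optimizer = tf.keras.optimizers.Adam()

x = tf.random.normal((8, 4))
y = tf.random.normal((8, 1))

with tf.GradientTape() as tape:
    loss = tf.reduce_mean(tf.square(model(x) - y))

# The model tracks its own variables, so no manual list is needed.
grads = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))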
I'm having trouble implementing my own parameter update, defined below, for a convolutional neural network that works when I use the AdamOptimizer.
A histogram of the weight and bias values shows no change over iterations, despite a change in loss. Thanks in advance.
def gradient_upgrade(gradients, base_rate, rate_multiplier):
    with tf.name_scope('gradient-update'):
        for i in range(len(weights)):
            weights[i].assign(tf.subtract(weights[i], tf.multiply(gradients[i], base_rate * rate_multiplier)))
            biases[i].assign(tf.subtract(biases[i], tf.multiply(gradients[len(weights) + i], base_rate * rate_multiplier)))
    return weights, biases

gradient = tf.gradients(cost, [*weights, *biases])
I later call sess.run with feed_dict set to a minibatch:
sess.run(gradient_upgrade(gradient, .001, 1), feed_dict=feed_dict)
weights and biases have the following forms, respectively:
tf.Variable(tf.truncated_normal(shape, stddev=0.05))
tf.Variable(tf.constant(0.05, shape=[length]))
What seems to be happening is that your assign operations are never executed. In your call
sess.run(gradient_upgrade(gradient, .001, 1), feed_dict=feed_dict)
the invocation of gradient_upgrade merely creates the assign operations; it does not execute them. Since the returned values of gradient_upgrade are weights and biases, sess.run() executes the portion of the graph needed to obtain these variables, which simply means reading their current values.
To execute the assign ops, you have a few options. First, you can save the created assign ops and pass them explicitly to sess.run(). Another option is to use tf.control_dependencies(). You can search for more examples, but basically it allows you to add dependencies between operations. In other words, you can tell TensorFlow "before you can execute any of the operations [a, b, c, ...], you need to execute all the operations in [x, y, z, ...]". Using it, you can create something like the update_op created by Estimators. This update op will depend on all of your assign ops; whenever you run sess.run(update_op), TensorFlow will execute all of your assign ops.
Pseudo-code would look something like this:
# Create and collect all of your assign ops in a list:
# assign_ops_list.append(weights[i].assign(...))

with tf.control_dependencies(assign_ops_list):
    train_op = ...  # some operation whose execution should trigger the assignments

sess.run(train_op)  # all assign ops will now run
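Here is a self-contained (TF1-style, graph-mode) version of that pseudo-code, with made-up assignments standing in for your gradient updates:
import tensorflow as tf
tf.compat.v1.disable_eager_execution()

w = tf.compat.v1.Variable([1.0, 1.0])
b = tf.compat.v1.Variable([0.0, 0.0])

# Stand-ins for your weights[i].assign(...) / biases[i].assign(...) ops.
assign_ops_list = [w.assign(w - 0.1), b.assign(b + 0.1)]

with tf.control_dependencies(assign_ops_list):
    update_op = tf.no_op()  # running this op forces all the assigns to run first

with tf.compat.v1.Session() as sess:
    sess.run(tf.compat.v1.global_variables_initializer())
    sess.run(update_op)
    print(sess.run([w, b]))  # [array([0.9, 0.9], ...), array([0.1, 0.1], ...)]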
I am trying to implement a memory-augmented neural network, in which the memory and the read/write/usage weight vectors are updated according to a combination of their previous values. These weights are different from the classic weight matrices between layers, which are automatically updated by the fit() function! My problem is the following: how can I correctly initialize these weights as Keras tensors and use them in the model? I explain it better with the following simplified example.
My API model is something like:
input = Input(shape=(5,6))
controller = LSTM(20, activation='tanh', stateful=False, return_sequences=True)(input)
write_key = Dense(4, activation='tanh')(controller)
read_key = Dense(4, activation='tanh')(controller)
w_w = Add()([w_u, w_r])                              # <---- UPDATE OF WRITE WEIGHTS
to_write = Dot()([w_w, write_key])
M = Add()([M, to_write])
cos_sim = Dot()([M, read_key])
w_r = Lambda(lambda x: softmax(x, axis=1))(cos_sim)  # <---- UPDATE OF READ WEIGHTS
w_u = Add()([w_u, w_r, w_w])                         # <---- UPDATE OF USAGE WEIGHTS
retrieved_memory = Dot()([w_r, M])
controller_output = concatenate([controller, retrieved_memory])
final_output = Dense(6, activation='sigmoid')(controller_output)
You can see that, in order to compute w_w^t, I first have to have defined w_r^{t-1} and w_u^{t-1}. So at the beginning I have to provide a valid initialization for these vectors. What is the best way to do it? The initializations I would like to have are:
M = K.variable(numpy.zeros((10,4)))    # MEMORY
w_r = K.variable(numpy.zeros((1,10)))  # READ WEIGHTS
w_u = K.variable(numpy.zeros((1,10)))  # USAGE WEIGHTS
But, analogously to what was said in #2486 (entron), these commands do not return a Keras tensor with all the needed metadata, and so this produces the following error:
AttributeError: 'NoneType' object has no attribute 'inbound_nodes'
I also thought of using the old M, w_r and w_u as additional inputs at each iteration and, analogously, getting the same variables as outputs to complete the loop. But this means that I would have to use the fit() function to train the model online with just the target as the final output (Model 1), and employ the predict() function on the model with all the secondary outputs (Model 2) to get the variables to use at the next iteration. I would also have to pass the weight matrices from Model 1 to Model 2 using get_weights() and set_weights(). As you can see, it becomes a bit messy and too slow.
Do you have any suggestions for this problem?
P.S. Please, do not focus too much on the API model above because it is a simplified (almost meaningless) version of the complete one where I skipped several key steps.
Overview: I want to update only selected variables in a network. The network has parts A->B (in the forward direction), and each part has its own loss: La and Lb. I want to train the weights a of A to optimize Lb. While doing this, the weights b of B should stay fixed. How can I do this?
Approach 1: Select only a as the variables to minimize, using var_list in optimizer.minimize(loss, var_list=[a]) (see https://github.com/tensorflow/tensorflow/issues/834). This crashes with the error ValueError: No gradients provided for any variable, check your graph for ops that do not support gradients, between variables (...) and loss (...). This actually works fine in other scenarios, but apparently it does not like that the weights b are not in the var_list.
Edit 1: The line that causes the error: a_optim = tf.train.AdamOptimizer(args.lr, beta1=args.beta1).minimize(self.a_loss, var_list=self.a_vars, global_step=self.global_step)
Approach 2: Same as Approach 1, but also include b in the var_list. The problem is now that the network updates a and b, whereas it should just send the gradients through B and only update A.
Edit 2: The line that works, but is not what I want: a_optim = tf.train.AdamOptimizer(args.lr, beta1=args.beta1).minimize(self.a_loss, var_list=self.a_vars+self.b_vars, global_step=self.global_step)
Approach 3: Use tf.stop_gradient(tensor) (Holding variables constant during optimizer). From the documentation I infer that this only stops the gradients from flowing further to the left in the graph, whereas I want to ignore the variables on the right (see the sketch after this list).
Approach 4: Create B's variables with tf.Variable(..., trainable=False), but that looks very inflexible if I want to alternate training between A and B.
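Here is a tiny graph-mode sketch (made-up values, not my real network) of what I mean about Approach 3: wrapping A's output in tf.stop_gradient cuts the flow back toward A, which is the opposite of what I need:
import tensorflow as tf
tf.compat.v1.disable_eager_execution()

a = tf.compat.v1.Variable(2.0)   # stands in for A's weights
h = a * 3.0                      # output of part A
h_stopped = tf.stop_gradient(h)  # blocks gradients flowing back toward A
loss_b = h_stopped * 5.0         # stands in for Lb, computed by part B

print(tf.compat.v1.gradients(loss_b, [a]))  # [None]: nothing reaches A's weights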
I found that, for better control over which variables are updated during optimization, it is better to use the compute_gradients and apply_gradients approach.
compute_gradients returns a list of (gradient, variable) tuples. You can modify the returned gradient tensors however you want, and you can also select the subset of variables to update.
Then you pass the list of (gradient, variable) tuples that you want to update to apply_gradients.
Here is an example:
optimizer = tf.train.AdamOptimizer(learning_rate=0.0001)
grads = optimizer.compute_gradients(your_cost_function)

# You can modify each 'g' here and exclude the (g, v) pairs you don't want to update.
grad_lists = [(g, v) for g, v in grads]
train_op = optimizer.apply_gradients(grad_lists)
Then, run your session.
sess.run(train_op, feed_dict={...})
Also, since you have two loss functions, you should create two train operations.
Hope this helps!
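For the A->B setup described in the question, here is a self-contained TF1-style sketch (the scope names, shapes, and dummy loss are made up) of the var_list filtering described above:
import tensorflow as tf
tf.compat.v1.disable_eager_execution()

x = tf.compat.v1.placeholder(tf.float32, [None, 4])
with tf.compat.v1.variable_scope('A'):
    h = tf.compat.v1.layers.dense(x, 8)
with tf.compat.v1.variable_scope('B'):
    y = tf.compat.v1.layers.dense(h, 1)
loss_b = tf.reduce_mean(tf.square(y))  # dummy stand-in for Lb

a_vars = tf.compat.v1.get_collection(tf.compat.v1.GraphKeys.TRAINABLE_VARIABLES, scope='A')
b_vars = tf.compat.v1.get_collection(tf.compat.v1.GraphKeys.TRAINABLE_VARIABLES, scope='B')

opt_a = tf.compat.v1.train.AdamOptimizer(1e-4)
opt_b = tf.compat.v1.train.AdamOptimizer(1e-4)

# Gradients flow through B's ops, but only A's variables are updated:
train_a_op = opt_a.apply_gradients(opt_a.compute_gradients(loss_b, var_list=a_vars))
# A separate train op for B's variables, to alternate training:
train_b_op = opt_b.apply_gradients(opt_b.compute_gradients(loss_b, var_list=b_vars))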
It turns out that the final op in A was non-differentiable (tf.argmax), and therefore gradients obviously could not be passed from B to A.