Is it possible to send different subsets of weights to different clients? - tensorflow

I'm trying to use tensorflow-federated to select a different subset of weights at the server and send them to the clients. The clients would then train and send back the trained weights. The server aggregates the results and starts a new communication round.
The main problem is that I cannot access the numpy version of the weights, so I don't know how to select a subset of them for each layer. I tried using tf.gather_nd and tf.tensor_scatter_nd_update to perform the selection and update, but they only work on tensors, not on lists of tensors (which is what server_state is in tensorflow-federated).
Does anyone have any hint on how to solve this problem? Is it even possible to send different weights to each client?

If I follow correctly, a way to write the high-level computation being described in the TFF type shorthand would be:
@tff.federated_computation(...)
def run_one_round(server_state, client_datasets):
  weights_subset = tff.federated_map(subset_fn, server_state)
  clients_weights_subset = tff.federated_broadcast(weights_subset)
  client_models = tff.federated_map(client_training_fn,
                                    (clients_weights_subset, client_datasets))
  aggregated_update = tff.federated_aggregate(client_models, ...)
  new_server_state = tff.federated_map(apply_aggregated_update_fn,
                                       (server_state, aggregated_update))
  return new_server_state
If this is true, it seems like the majority of the work needs to happen in subset_fn, which takes the server state and returns a subset of the global model weights. Generally a model is a structure (list or dict, possibly nested) of tf.Tensor, which as you observed cannot be used as an argument to tf.gather_nd or tf.tensor_scatter_nd_update. However, they can be applied pointwise to the structure of tensors using tf.nest.map_structure. For example, selecting the value at [0, 0] from a nested structure of three tensors:
import tensorflow as tf
import pprint

struct_of_tensors = {
    'trainable': [tf.constant([[2.0, 4.0, 6.0]]), tf.constant([[5.0]])],
    'non_trainable': [tf.constant([[1.0]])],
}
pprint.pprint(tf.nest.map_structure(
    lambda tensor: tf.gather_nd(params=tensor, indices=[[0, 0]]),
    struct_of_tensors))
>>> {'non_trainable': [<tf.Tensor: shape=(1,), dtype=float32, numpy=array([1.], dtype=float32)>],
     'trainable': [<tf.Tensor: shape=(1,), dtype=float32, numpy=array([2.], dtype=float32)>,
                   <tf.Tensor: shape=(1,), dtype=float32, numpy=array([5.], dtype=float32)>]}
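For the update direction (writing an aggregated subset back into the full weights), the same pointwise trick should work with tf.tensor_scatter_nd_update; a minimal sketch, assuming each tensor is updated at the same [0, 0] index as above:
updates = {
    'trainable': [tf.constant([20.0]), tf.constant([50.0])],
    'non_trainable': [tf.constant([10.0])],
}
# Apply tf.tensor_scatter_nd_update pointwise over the two matching structures.
updated_struct = tf.nest.map_structure(
    lambda tensor, update: tf.tensor_scatter_nd_update(
        tensor, indices=[[0, 0]], updates=update),
    struct_of_tensors, updates)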

Related

'Tensor' vs 'tf.Tensor' - tensorflow

I am exploring tensorflow internals, and sometimes when I print the value of a tensor, I will see data like the following: Tensor("x/PlaceholderWithDefault:0", shape=(), dtype=int32)
and other times I will see tf.Tensor(0, shape=(), dtype=int32).
What is the difference between these two expressions? Are Tensor and tf.Tensor different? And if not, why are they displayed differently (and seem to have different behavior)?
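For reference, a minimal sketch that reproduces both printouts in TF 2.x (the values here are illustrative): inside a tf.function the tensor is a symbolic graph tensor, while outside it is an eager tensor.
import tensorflow as tf

print(tf.constant(0))  # eager: tf.Tensor(0, shape=(), dtype=int32)

@tf.function
def f(x):
    print(x)  # symbolic: Tensor("x:0", shape=(), dtype=int32)
    return x

f(tf.constant(0))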

GluonCV object detection fine-tuning - Select which layers are modified (freeze the rest)

I have a question about the procedure for fine-tuning a pre-trained object detection model with GluonCV, described in this tutorial.
As far as I understand, the described procedure modifies all the weight values in the model.
I wanted to only fine-tune the fully connected layer at the end of the network, and freeze the rest of the weights.
I assume that I should specify which parameters I want to modify when creating the Trainer:
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.001, 'wd': 0.0005, 'momentum': 0.9})
so, instead of net.collect_params(), I should list the parameters I’m interested in training, and run the rest of the process normally.
However, I don't know how to isolate these parameters precisely. I tried printing:
params = net.collect_params()
but, out of this list, I don’t know which ones correspond to the final FC layers. Any suggestions?
Let's say we have a pretrained Gluon model for a classification task:
>>> import mxnet as mx
>>> net = mx.gluon.nn.HybridSequential()
>>> net.add(mx.gluon.nn.Conv2D(channels=6, kernel_size=5, padding=2, activation='sigmoid'))
>>> net.add(mx.gluon.nn.MaxPool2D(pool_size=2, strides=2))
>>> net.add(mx.gluon.nn.Flatten())
>>> net.add(mx.gluon.nn.Dense(units=10))
>>> net.collect_params()
hybridsequential0_ (
  Parameter conv0_weight (shape=(6, 0, 5, 5), dtype=<class 'numpy.float32'>)
  Parameter conv0_bias (shape=(6,), dtype=<class 'numpy.float32'>)
  Parameter dense0_weight (shape=(10, 0), dtype=float32)
  Parameter dense0_bias (shape=(10,), dtype=float32)
)
To fine-tune this convolutional network, we want to freeze all the blocks except Dense.
First, recall that the collect_params method accepts a regexp string to select specific block parameters by their names (or prefixes; the prefix parameter of Conv2D, Dense, or any other Gluon (hybrid) block). By default, the prefixes are derived from class names, i.e. if a block is a Conv2D then its prefix is conv0_, conv1_, etc. Moreover, collect_params returns an instance of mxnet.gluon.parameter.ParameterDict, which has a setattr method.
Solution:
>>> conv_params = net.collect_params('(?!dense).*')
>>> conv_params.setattr('grad_req', 'null')
or simply
>>> net.collect_params('(?!dense).*').setattr('grad_req', 'null')
Here we exclude all the parameters whose names match dense, keeping only the conv blocks, and set their grad_req attribute to 'null'. Now, training the model net with mxnet.gluon.Trainer will update only the dense parameters.
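Alternatively, as you suspected, you can pass only the parameters you want to train to the Trainer; a minimal sketch, assuming the same net as above:
import mxnet as mx

# Hand only the Dense block's parameters (selected by prefix) to the Trainer,
# so the optimizer never touches the conv parameters.
trainer = mx.gluon.Trainer(
    net.collect_params('dense.*'), 'sgd',
    {'learning_rate': 0.001, 'wd': 0.0005, 'momentum': 0.9})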
It is more convenient to have a pretrained model with separate attributes for specific blocks, e.g. a features block, anchor generators, etc. In our case, we have a convolutional network that extracts features and passes them to an output block.
class ConvNet(mx.gluon.nn.HybridSequential):
    def __init__(self, n_classes, params=None, prefix=None):
        super().__init__(params=params, prefix=prefix)
        self.features = mx.gluon.nn.HybridSequential()
        self.features.add(mx.gluon.nn.Conv2D(channels=6, kernel_size=5, padding=2,
                                             activation='sigmoid'))
        self.features.add(mx.gluon.nn.MaxPool2D(pool_size=2, strides=2))
        self.features.add(mx.gluon.nn.Flatten())
        self.output = mx.gluon.nn.Dense(units=n_classes)

    def hybrid_forward(self, F, x):
        x = self.features(x)
        return self.output(x)
With this convnet declaration, we don't have to use regexps to access the required blocks:
>>> net = ConvNet(n_classes=10)
>>> net.features.collect_params().setattr('grad_req', 'null')
Gluon CV models follow exactly this pattern. See the documentation of the desired model and choose the blocks you would like to freeze. If the docs are unclear, run collect_params to see all the parameters, use a regexp to select everything except the blocks you want to fine-tune, and set the returned parameters' grad_req to 'null'.
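For instance, a hedged sketch on a GluonCV detector (assuming the chosen zoo model exposes a features attribute, as the SSD models do; the model name is illustrative):
from gluoncv import model_zoo

# Load a pretrained detector and freeze its feature extractor; only the
# remaining (head) parameters will receive gradient updates.
net = model_zoo.get_model('ssd_512_mobilenet1.0_voc', pretrained=True)
net.features.collect_params().setattr('grad_req', 'null')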

What's the difference between tf.random.normal and tf.distributions.Normal?

What's the difference between tf.random.normal and tf.distributions.Normal? Or the difference between tf.distributions.Multinomial and tf.random.multinomial or anything similar?
Is tf.distributions.Normal used as the backend for tf.random.normal?
I recently looked at TF Probability, the new home for TF distributions. This is my understanding:
They are not the same. tf.distributions.Normal gives you a distribution object from which you can sample (sampling is equivalent to evaluating the tensor returned by a tf.random.normal call with the same loc and scale values). But a distribution additionally lets you evaluate the probability of a sample that you provide, along with everything else that comes with having access to a distribution.
For example, you could do the following:
>>> import tensorflow as tf
>>> dist = tf.distributions.Normal(loc=0., scale=1.)
>>> dist.log_prob(tf.random.normal(shape=(3,3)))
<tf.Tensor: id=58, shape=(3, 3), dtype=float32, numpy=
array([[-0.9486696 , -0.95645994, -1.1610177 ],
       [-1.244764  , -1.416851  , -1.1236244 ],
       [-0.9292835 , -0.98901427, -0.9705758 ]], dtype=float32)>
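And the flip side, a small sketch: sampling from the distribution object is the analogue of calling tf.random.normal directly, while log_prob has no tf.random counterpart.
samples = dist.sample(sample_shape=(3, 3))  # equivalent to tf.random.normal(shape=(3, 3))
log_p = dist.log_prob(samples)              # only possible with a distribution object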

How to debug Keras in TensorFlow 2.0?

Actually, I already see this problem in TensorFlow 1.13.0 (TensorFlow 1.12.0 works fine).
My code is listed as a simple example:
def lambda_layer(temp):
    print(temp)
    return temp
which is used as a lambda layer in my Keras model.
In TensorFlow 1.12.0, print(temp) outputs the actual data, like the following:
[<tf.Tensor: id=250, shape=(1024, 2, 32), dtype=complex64, numpy=
array([[[ 7.68014073e-01+0.95353246j, 7.01403618e-01+0.64385843j,
8.30483198e-01+1.0340731j , ..., -8.88018191e-01+0.4751519j ,
-1.20197642e+00+0.6313924j , -1.03787208e+00+0.22964947j],
[-7.94382274e-01+0.56390345j, -4.73938555e-01+0.55901265j,
-8.73749971e-01+0.67095983j, ..., -5.81580341e-01-0.91620034j,
-7.04443693e-01-1.2709806j , -3.23135853e-01-1.0887597j ]],
(The first dimension is 1024 because I use 1024 as the batch size.) But when I update to TensorFlow 1.13.0 or TensorFlow 2.0, the same code outputs
Tensor("lambda_1/truediv:0", shape=(None, 1), dtype=float32)
This is a problem, since I cannot see the actual values to find my mistakes.
So, any idea how to solve this?
You see that output because the Keras model is being converted to its graph representation, so print prints the tf.Tensor's graph description.
To see the content of a tf.Tensor when using TensorFlow 2.0, use tf.print instead of print: the former is converted into a graph operation that runs each time the graph executes, while the latter runs only once, when the model is traced.
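A minimal sketch of the fix (the model and shapes are illustrative):
import tensorflow as tf

def lambda_layer(temp):
    tf.print(temp)  # compiled into the graph, so it prints the values on every batch
    return temp

model = tf.keras.Sequential([
    tf.keras.layers.Lambda(lambda_layer, input_shape=(1,)),
])
model.predict(tf.ones((2, 1)))  # prints [[1] [1]]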

Keras Backend: Difference between random_normal and random_normal_variable

My neural network has a custom layer which takes an input vector x, generates a normally distributed tensor A, and returns both A (used in subsequent layers) and the product Ax. Assuming I want to reuse the value stored in A at the output of the custom layer in a second, different layer, is there any subtle aspect I need to factor in when deciding which Keras backend function (K.backend.random_normal or K.backend.random_normal_variable) to use to generate A?
a) The backend function random_normal returns a tensor storing a different value following each call (see code snippet below). To me, this suggests that random_normal acts as a generator of normally distributed values. Does this mean that one should not use random_normal if they want the tensor's value to persist across calls?
b) The backend function random_normal_variable appears safer (see code snippet below), as it retains its value across calls.
Is my conceptual understanding correct? Or am I missing something basic?
I am using Keras 2.1.2 and Tensorflow 1.4.0.
Experiment with random_normal (value changes across calls):
In [5]: A = K.random_normal(shape=(2,2), mean=0.0, stddev=0.5)
In [6]: K.get_value(A)
Out[6]: array([[ 0.4459489 , -0.82019573],
               [-0.39853573, -0.33919844]], dtype=float32)
In [7]: K.get_value(A)
Out[7]: array([[-0.37467018,  0.42445764],
               [-0.573843  , -0.3468301 ]], dtype=float32)
Experiment with random_normal_variable (value holds across calls):
In [9]: B = K.random_normal_variable(shape=(2,2), mean=0., scale=0.5)
In [10]: K.get_value(B)
Out[10]: array([[ 0.07700552,  0.28008622],
                [-0.69484973, -1.32078779]], dtype=float32)
In [11]: K.get_value(B)
Out[11]: array([[ 0.07700552,  0.28008622],
                [-0.69484973, -1.32078779]], dtype=float32)
From my understanding, this is because random_normal_variable returns an instantiated Variable, while random_normal returns a Tensor.
K.random_normal(shape=(2,2), mean=0.0, stddev=0.5)
<tf.Tensor 'random_normal:0' shape=(2, 2) dtype=float32>
K.random_normal_variable(shape=(2,2), mean=0.0, scale=0.5)
<tf.Variable 'Variable:0' shape=(2, 2) dtype=float32_ref>
As for why the values vary for the Tensor but not for the Variable, I think the answer in this thread sums it up well:
Variable is basically a wrapper on Tensor that maintains state across multiple calls to run [...]
That answer also mentions that a variable needs to be initialized before it can be evaluated. Here you never initialized B explicitly, yet evaluating it worked; that is because random_normal_variable already initializes the returned variable internally, via a call to tensorflow.random_normal_initializer. Hope this clarifies why your code behaves this way.
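As a practical consequence, if you have already built a random_normal tensor but want to hold one sampled value across calls, you can snapshot it into a variable; a minimal sketch, with the same Keras 2.x backend as in the question:
from keras import backend as K

A = K.random_normal(shape=(2, 2), mean=0.0, stddev=0.5)
A_fixed = K.variable(K.get_value(A))  # evaluates A once and stores the result
K.get_value(A_fixed)  # returns the same values on every call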