What's the difference between tf.random.normal and tf.distributions.Normal? - tensorflow

What's the difference between tf.random.normal and tf.distributions.Normal? Or the difference between tf.distributions.Multinomial and tf.random.multinomial or anything similar?
Is tf.distributions.Normal used as the backend for tf.random.normal?

I recently looked at tf probability, the new place for tf distributions. This is my understanding:
They are not the same. tf.distributions.Normal will give you a distribution object from which you can sample (this will be same as evaluating the tensor returned by tf.random.normal function call for the same mean and loc values). But, a distribution additionally allows you to evaluate probability of a sample that you provide and all the aspects of having access to a distribution.
For example, you could do the following:
>>> import tensorflow as tf
>>> dist = tf.distributions.Normal(loc=0., scale=1.)
>>> dist.log_prob(tf.random.normal(shape=(3,3)))
<tf.Tensor: id=58, shape=(3, 3), dtype=float32, numpy=
array([[-0.9486696 , -0.95645994, -1.1610177 ],
[-1.244764 , -1.416851 , -1.1236244 ],
[-0.9292835 , -0.98901427, -0.9705758 ]], dtype=float32)>

Related

Is it possible to send different subset of weights to different clients?

I'm trying to use tensorflow-federated to select different subset of weights at the server and send them to the clients. The clients then would train and send back the trained weights. The server aggregates the results and starts a new communication round.
The main problem is that I cannot access the numpy version of the weights and therefore I don't know how to access a subset of them for each layer. I tried using tf.gather_nd and tf.tensor_scatter_nd_update to perform selection and update, but they only work for tensors, and not lists of tensors (as the server_state is in tensorflow-federated).
Does anyone have any hint to solve this problem? Is it even possible to send different weights to each client?
If I follow correctly, a way to write the high-level computation being described in the TFF type shorthand would be:
#tff.federated_computation(...)
def run_one_round(server_state, client_datasets):
weights_subset = tff.federated_map(subset_fn, server_state)
clients_weights_subset = tff.federated_broadcast(weights_subset)
client_models = tff.federated_map(client_training_fn,
(clients_weights_subset, client_datasets))
aggregated_update = tff.federated_aggregate(client_models, ...)
new_server_state = tff.federated_map(apply_aggregated_update_fn, server_state)
return new_server_state
If this is true, it seems like the majority of the work needs to happen in subset_fn which takes the server state and returns a subset of the global mode weights. Generally a model is a structure (list or dict, possibly nested) of tf.Tensor, which as you observed cannot be used as an argument to tf.gather_nd or tf.tensor_scatter_nd_update. However, they can be be applied pointwise to the structure of tensors uses tf.nest.map_structure. For example, selecting the value at [0, 0] from a nested structure of three tensors:
import tensorflow as tf
import pprint
struct_of_tensors = {
'trainable': [tf.constant([[2.0, 4.0, 6.0]]), tf.constant([[5.0]])],
'non_trainable': [tf.constant([[1.0]])],
}
pprint.pprint(tf.nest.map_structure(
lambda tensor: tf.gather_nd(params=tensor, indices=[[0, 0]]),
struct_of_tensors))
>>> {'non_trainable': [<tf.Tensor: shape=(1,), dtype=float32, numpy=array([1.], dtype=float32)>],
'trainable': [<tf.Tensor: shape=(1,), dtype=float32, numpy=array([2.], dtype=float32)>,
<tf.Tensor: shape=(1,), dtype=float32, numpy=array([5.], dtype=float32)>]}

Loss added to custom layer in tensorflow 2 is cleared when compiling

I am trying to port the implementation of concrete dropout in keras in https://github.com/yaringal/ConcreteDropout/blob/master/concrete-dropout-keras.ipynb to tensorflow 2. This is mostly straightforward, as tf 2 has most of the keras API built into it. However, the custom losses are being cleared before fitting.
After the model is defined, and before compiling it, I can see that the losses for each concrete dropout layer have been added to the model losses by the line self.layer.add_loss(regularizer) run when the layers are built:
>>> print(model.losses)
[<tf.Tensor: id=64, shape=(), dtype=float32, numpy=-8.4521576e-05>, <tf.Tensor: id=168, shape=(), dtype=float32, numpy=-0.000650166>, <tf.Tensor: id=272, shape=(), dtype=float32, numpy=-0.000650166>, <tf.Tensor: id=376, shape=(), dtype=float32, numpy=-0.000650166>, <tf.Tensor: id=479, shape=(), dtype=float32, numpy=-0.000650166>]
After the compilation, however, model.losses becomes an empty list, and the assertion assert len(model.losses) == 5 fails. If I choose to ignore the assertion, the fact that the layer losses are being neglected shows up in the warning WARNING:tensorflow:Gradients do not exist for variables ['concrete_dropout/p_logit:0', 'concrete_dropout_1/p_logit:0', 'concrete_dropout_2/p_logit:0', 'concrete_dropout_3/p_logit:0', 'concrete_dropout_4/p_logit:0'] when minimizing the loss. when training the model.
After digging into the compilation code in https://github.com/tensorflow/tensorflow/blob/r2.0/tensorflow/python/keras/engine/training.py#L184 I believe the problematic lines are
# Clear any `_eager_losses` that was added.
self._clear_losses()
Why is this being done at model compilation? And how can I add a loss per layer in tensorflow 2, if this is not the way to do it?
Since that custom loss is not dependent on the model's inputs, you should add it using a zero-argument callable like this:
self.layer.add_loss(lambda: regularizer)

How to debug Keras in TensorFlow 2.0?

Actually, I find the problem already in TensorFlow 1.13.0. (tensorflow1.12.0 works well).
My code is listed as a simple example:
def Lambda layer(temp):
print(temp)
return temp
which is used as a lambda layer in my Keras model.
In tensorflow1.12.0, the print(temp) can output the detail data like following
[<tf.Tensor: id=250, shape=(1024, 2, 32), dtype=complex64, numpy=
array([[[ 7.68014073e-01+0.95353246j, 7.01403618e-01+0.64385843j,
8.30483198e-01+1.0340731j , ..., -8.88018191e-01+0.4751519j ,
-1.20197642e+00+0.6313924j , -1.03787208e+00+0.22964947j],
[-7.94382274e-01+0.56390345j, -4.73938555e-01+0.55901265j,
-8.73749971e-01+0.67095983j, ..., -5.81580341e-01-0.91620034j,
-7.04443693e-01-1.2709806j , -3.23135853e-01-1.0887597j ]],
It is because I use the 1024 as batch_size. But when I update to tensorflow1.13.0 or TensorFlow 2.0, the same code's output
Tensor("lambda_1/truediv:0", shape=(None, 1), dtype=float32)
This is terrible since I can not know the exact mistakes.
So, any idea about how to solve it?
You see that output because the Keras model is being converted to its graph representation, and thus print printes the tf.Tensor graph description.
To see the content of a tf.Tensor when using Tensorflow 2.0 you should use tf.print instead of print since the former gets converted to its graph representation while the latter doesn't.

Using seed to sample in tensorflow-probability

I am trying to use tensorflow-probability and started off with something really simple:
import tensorflow as tf
import tensorflow_probability as tfp
tf.enable_eager_execution()
tfd = tfp.distributions
poiss = tfd.Poisson(0.8)
poiss.sample(2, seed=1)
#> Out: <tf.Tensor: id=3569, shape=(2,), dtype=float32, numpy=array([0., 0.], dtype=float32)>
poiss.sample(2, seed=1)
#> Out: <tf.Tensor: id=3695, shape=(2,), dtype=float32, numpy=array([1., 0.], dtype=float32)>
poiss.sample(2, seed=1)
#> Out: <tf.Tensor: id=3824, shape=(2,), dtype=float32, numpy=array([2., 2.], dtype=float32)>
poiss.sample(2, seed=1)
#> Out: <tf.Tensor: id=3956, shape=(2,), dtype=float32, numpy=array([0., 1.], dtype=float32)>
I was thinking I would get the same results when re-using the same seed, but somehow that's not true.
I also tried without eager execution, but the results still weren't reproducible. Same story if I add something like tf.set_random_seed(12).
I suppose there is something basic I am missing?
For those interested, I am running Python 3.5.2 on Ubuntu 16.04 with
tensorflow-probability==0.5.0
tensorflow==1.12.0
For deterministic output in graph mode, you need to set both the graph random seed (tf.set_random_seed) and the op random seed (seed= in your sample call).
The workings of random samplers in TFv2 are still being sorted out. For now, my best understanding is that you can call tf.set_random_seed prior to each call to a sampler, and pass the sampler a seed=, if you want deterministic output in eager.
This is now cleaner, we support fully deterministic randomness in TFP. You can pass a tuple of two ints for seed, or a Tensor of shape (2,) to trigger the deterministic behavior. tfp.random.split_seed is also relevant here.
Besides setting the seed for sample or sample_chain in mcmc, you might need to set the followings as well:
seed = 24
os.environ['TF_DETERMINISTIC_OPS'] = 'true'
os.environ['PYTHONHASHSEED'] = f'{seed}'
np.random.seed(seed)
random.seed(seed)
tf.random.set_seed(seed)

Keras Backend: Difference between random_normal and random_normal_variable

My neural network has a custom layer, which takes an input vector x, generates a normally distributed tensor A and returns both A (used in subsequent layers) and the product Ax. Assuming I want to reuse the value stored in A at the output of the custom layer, in a second different layer, is there any subtle aspect that I need to factor while determining which Keras backend function (K.backend.random_normal or K.backend.random_normal_variable) I should use in order to generate A?
a) The backend function random_normal returns a tensor storing a different value following each call (see code snippet below). To me, this suggests that random_normal acts as a generator of normally distributed values. Does this mean that one should not use random_normal to generate a normally distributed tensor if they want to hold its value following calls?
b) The backend function random_normal_variable appears safer (see code snippet below) as it retains value across calls.
Is my conceptual understanding correct? Or am I missing something basic?
I am using Keras 2.1.2 and Tensorflow 1.4.0.
Experiment with random_normal (value changes across calls):
In [5]: A = K.random_normal(shape = (2,2), mean=0.0, stddev=0.5)
In [6]: K.get_value(A)
Out[6]: array([[ 0.4459489 , -0.82019573],
[-0.39853573, -0.33919844]], dtype=float32)
In [7]: K.get_value(A)
Out[7]: array([[-0.37467018, 0.42445764],
[-0.573843 , -0.3468301 ]], dtype=float32)
Experiment with random_normal_variable (value holds across calls):
In [9]: B = K.random_normal_variable(shape=(2,2), mean=0., scale=0.5)
In [10]: K.get_value(B)
Out[10]: array([[ 0.07700552, 0.28008622],
[-0.69484973, -1.32078779]], dtype=float32)
In [11]: K.get_value(B)
Out[11]: array([[ 0.07700552, 0.28008622],
[-0.69484973, -1.32078779]], dtype=float32)
From my understanding, this is due to the fact that random_normal_variable returns an instantiated Variable while random_normal returns a Tensor.
K.random_normal(shape=(2,2), mean=0.0, stddev=0.5)
<tf.Tensor 'random_normal:0' shape=(2, 2) dtype=float32>
K.random_normal_variable(shape=(2,2), mean=0.0, scale=0.5)
<tf.Variable 'Variable:0' shape=(2, 2) dtype=float32_ref>
As for why the values vary for the Tensor and not for the Variable, I think the answer to this thread sums it up well:
Variable is basically a wrapper on Tensor that maintains state across multiple calls to run [...]
The answer also mentions that the variable needs to be initialized to evaluate it, which is the case here as you noticed (since you did not initialize the variable to evaluate it). In fact, the returned variable is already initialized thanks to a call to tensorflow.random_normal_initializer within the random_normal_variable function. Hope this clarifies why your code has this behaviour.