Keras Backend: Difference between random_normal and random_normal_variable - tensorflow

My neural network has a custom layer, which takes an input vector x, generates a normally distributed tensor A and returns both A (used in subsequent layers) and the product Ax. Assuming I want to reuse the value stored in A at the output of the custom layer in a second, different layer, is there any subtle aspect that I need to factor in when deciding which Keras backend function (K.random_normal or K.random_normal_variable) I should use to generate A?
a) The backend function random_normal returns a tensor storing a different value following each call (see code snippet below). To me, this suggests that random_normal acts as a generator of normally distributed values. Does this mean that one should not use random_normal to generate a normally distributed tensor if they want to hold its value following calls?
b) The backend function random_normal_variable appears safer (see code snippet below) as it retains value across calls.
Is my conceptual understanding correct? Or am I missing something basic?
I am using Keras 2.1.2 and Tensorflow 1.4.0.
Experiment with random_normal (value changes across calls):
In [5]: A = K.random_normal(shape=(2,2), mean=0.0, stddev=0.5)
In [6]: K.get_value(A)
Out[6]: array([[ 0.4459489 , -0.82019573],
               [-0.39853573, -0.33919844]], dtype=float32)
In [7]: K.get_value(A)
Out[7]: array([[-0.37467018,  0.42445764],
               [-0.573843  , -0.3468301 ]], dtype=float32)
Experiment with random_normal_variable (value holds across calls):
In [9]: B = K.random_normal_variable(shape=(2,2), mean=0., scale=0.5)
In [10]: K.get_value(B)
Out[10]: array([[ 0.07700552,  0.28008622],
                [-0.69484973, -1.32078779]], dtype=float32)
In [11]: K.get_value(B)
Out[11]: array([[ 0.07700552,  0.28008622],
                [-0.69484973, -1.32078779]], dtype=float32)

From my understanding, this is due to the fact that random_normal_variable returns an instantiated Variable while random_normal returns a Tensor.
K.random_normal(shape=(2,2), mean=0.0, stddev=0.5)
<tf.Tensor 'random_normal:0' shape=(2, 2) dtype=float32>
K.random_normal_variable(shape=(2,2), mean=0.0, scale=0.5)
<tf.Variable 'Variable:0' shape=(2, 2) dtype=float32_ref>
As for why the values vary for the Tensor and not for the Variable, I think the answer to this thread sums it up well:
Variable is basically a wrapper on Tensor that maintains state across multiple calls to run [...]
The answer also mentions that a variable needs to be initialized before it can be evaluated. As you noticed, you did not have to do that yourself here: the returned variable is already initialized, thanks to a call to tensorflow.random_normal_initializer within the random_normal_variable function. Hope this clarifies why your code behaves this way.
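The Tensor-versus-Variable behaviour described above can be reproduced without Keras. The following is a minimal sketch, assuming TensorFlow 2.x and its tf.compat.v1 graph/session API as a stand-in for the TensorFlow 1.4 setup in the question; the names tf1, graph and the a_*/b_* result variables are only illustrative.

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1  # TF1-style graph/session API (assumption: running on TF 2.x)

graph = tf.Graph()
with graph.as_default():
    # A random op: a fresh sample is drawn every time the op is executed.
    a = tf1.random_normal(shape=(2, 2), mean=0.0, stddev=0.5)
    # A variable: the random op only supplies the initial value, which is then stored.
    b = tf1.Variable(tf1.random_normal(shape=(2, 2), mean=0.0, stddev=0.5))
    init = tf1.global_variables_initializer()

with tf1.Session(graph=graph) as sess:
    sess.run(init)
    # Two separate run() calls re-execute the random op, so `a` changes ...
    a_first, a_second = sess.run(a), sess.run(a)
    # ... while the variable keeps the value stored at initialization.
    b_first, b_second = sess.run(b), sess.run(b)
    # Within a single run() call the op is executed once, so both fetches agree.
    a_same_run_1, a_same_run_2 = sess.run([a, a])

print("tensor equal across runs:  ", np.array_equal(a_first, a_second))
print("variable equal across runs:", np.array_equal(b_first, b_second))
print("tensor equal within a run: ", np.array_equal(a_same_run_1, a_same_run_2))
```

The last check suggests that a random tensor is still consistent within one evaluation of the graph; it is only across separate evaluations that its value is redrawn.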

Related

'Tensor' vs 'tf.Tensor' tensorflow

I am exploring tensorflow internals, and sometimes when I print the value of a tensor, I will see data like the following: Tensor("x/PlaceholderWithDefault:0", shape=(), dtype=int32)
and other times I will see tf.Tensor(0, shape=(), dtype=int32).
What is the difference between these two expressions? Are Tensor and tf.Tensor different? And if not, why are they displayed differently (and seem to have different behavior)?

Is it possible to send different subset of weights to different clients?

I'm trying to use tensorflow-federated to select a different subset of weights at the server for each client and send it to that client. The clients would then train and send back the trained weights. The server aggregates the results and starts a new communication round.
The main problem is that I cannot access the numpy version of the weights and therefore I don't know how to access a subset of them for each layer. I tried using tf.gather_nd and tf.tensor_scatter_nd_update to perform selection and update, but they only work for tensors, and not lists of tensors (as the server_state is in tensorflow-federated).
Does anyone have any hint to solve this problem? Is it even possible to send different weights to each client?
If I follow correctly, a way to write the high-level computation being described in the TFF type shorthand would be:
@tff.federated_computation(...)
def run_one_round(server_state, client_datasets):
    weights_subset = tff.federated_map(subset_fn, server_state)
    clients_weights_subset = tff.federated_broadcast(weights_subset)
    client_models = tff.federated_map(client_training_fn,
                                      (clients_weights_subset, client_datasets))
    aggregated_update = tff.federated_aggregate(client_models, ...)
    new_server_state = tff.federated_map(apply_aggregated_update_fn, server_state)
    return new_server_state
If this is true, it seems like the majority of the work needs to happen in subset_fn, which takes the server state and returns a subset of the global model weights. Generally a model is a structure (list or dict, possibly nested) of tf.Tensor, which as you observed cannot be used as an argument to tf.gather_nd or tf.tensor_scatter_nd_update. However, these functions can be applied pointwise to the structure of tensors using tf.nest.map_structure. For example, selecting the value at [0, 0] from a nested structure of three tensors:
import tensorflow as tf
import pprint
struct_of_tensors = {
    'trainable': [tf.constant([[2.0, 4.0, 6.0]]), tf.constant([[5.0]])],
    'non_trainable': [tf.constant([[1.0]])],
}
pprint.pprint(tf.nest.map_structure(
    lambda tensor: tf.gather_nd(params=tensor, indices=[[0, 0]]),
    struct_of_tensors))
>>> {'non_trainable': [<tf.Tensor: shape=(1,), dtype=float32, numpy=array([1.], dtype=float32)>],
 'trainable': [<tf.Tensor: shape=(1,), dtype=float32, numpy=array([2.], dtype=float32)>,
               <tf.Tensor: shape=(1,), dtype=float32, numpy=array([5.], dtype=float32)>]}
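The same pointwise pattern should also cover the update direction with tf.tensor_scatter_nd_update. A minimal sketch, assuming TensorFlow 2.x in eager mode; zero_first_entry is a hypothetical update rule chosen only for illustration (it overwrites the entry at [0, 0] with 0.0), not something from the question:

```python
import tensorflow as tf

struct_of_tensors = {
    'trainable': [tf.constant([[2.0, 4.0, 6.0]]), tf.constant([[5.0]])],
    'non_trainable': [tf.constant([[1.0]])],
}

def zero_first_entry(tensor):
    # Hypothetical update rule: write 0.0 at index [0, 0], leave the rest untouched.
    return tf.tensor_scatter_nd_update(tensor, indices=[[0, 0]], updates=[0.0])

# map_structure applies the function to every tensor and preserves the nesting.
updated = tf.nest.map_structure(zero_first_entry, struct_of_tensors)

print(updated['trainable'][0].numpy())
print(updated['non_trainable'][0].numpy())
```

The result has the same dict-of-lists layout as the input, so it can be passed on wherever the original structure was expected.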

Create a TF Dataset of SparseTensors with from_generator

I have a generator that yields tf.sparse.SparseTensors. I want to turn this into a TensorFlow Dataset, but am running into some issues. I am using TF2. First, unlike regular Tensors, you cannot simply yield them and provide the correct data types in output_types. For a sparse tensor of [1,0,0,0,5,0], the error looks like
tensorflow.python.framework.errors_impl.InvalidArgumentError: TypeError: `generator` yielded an element that could not be converted to the expected type. The expected type was int64, but the yielded element was SparseTensor(indices=tf.Tensor(
E [[0]
E [4]], shape=(2, 1), dtype=int64), values=tf.Tensor([1 5], shape=(2,), dtype=int64), dense_shape=tf.Tensor([6], shape=(1,), dtype=int64)).
After some searching, I found this open issue, https://github.com/tensorflow/tensorflow/issues/16689, and tried to do something similar: read the indices, values, and shape as separate tensors into a TF Dataset, and then map over the dataset to create the sparse tensor. This does not work as shown in some of the examples in the GitHub issue: tf.sparse.SparseTensor(indices, values, shape) does not seem to accept indices and shape in the form of a tf.Tensor. It will happily take a list or a numpy array, but not a Tensor. Since map is not eager, I cannot call .numpy() on the Tensor either. What is the best way to get this to work? I see there is tf.py_function/tf.numpy_function which could help, but constructing the output type can be tricky (though not impossible) for my use case, since the incoming data is not fixed and can have a mix of sparse and dense tensors.

How to debug Keras in TensorFlow 2.0?

Actually, the problem already appears in TensorFlow 1.13.0 (TensorFlow 1.12.0 works well).
My code is listed as a simple example:
def lambda_layer(temp):
    print(temp)
    return temp
which is used as a lambda layer in my Keras model.
In TensorFlow 1.12.0, print(temp) outputs the detailed data, like the following:
[<tf.Tensor: id=250, shape=(1024, 2, 32), dtype=complex64, numpy=
array([[[ 7.68014073e-01+0.95353246j, 7.01403618e-01+0.64385843j,
8.30483198e-01+1.0340731j , ..., -8.88018191e-01+0.4751519j ,
-1.20197642e+00+0.6313924j , -1.03787208e+00+0.22964947j],
[-7.94382274e-01+0.56390345j, -4.73938555e-01+0.55901265j,
-8.73749971e-01+0.67095983j, ..., -5.81580341e-01-0.91620034j,
-7.04443693e-01-1.2709806j , -3.23135853e-01-1.0887597j ]],
The leading dimension is 1024 because I use 1024 as the batch_size. But when I update to TensorFlow 1.13.0 or TensorFlow 2.0, the same code outputs
Tensor("lambda_1/truediv:0", shape=(None, 1), dtype=float32)
This is terrible, since I can no longer see the actual values and track down my mistakes.
So, any idea how to solve this?
You see that output because the Keras model is being converted to its graph representation, and thus print prints the tf.Tensor graph description.
To see the content of a tf.Tensor when using TensorFlow 2.0 you should use tf.print instead of print, since the former gets converted to its graph representation while the latter doesn't.
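The difference between the two calls can be seen in a small sketch, assuming TensorFlow 2.x. A tf.function is used here as a stand-in for the graph that Keras builds around the lambda layer (an assumption for illustration; halve is a hypothetical function, not the asker's layer):

```python
import tensorflow as tf

@tf.function
def halve(x):
    # Python print runs only while the function is traced into a graph,
    # so it shows the symbolic description: Tensor("x:0", shape=..., dtype=...).
    print("Python print:", x)
    # tf.print becomes an op in the graph and prints the actual values
    # each time the function is executed.
    tf.print("tf.print:", x)
    return x / 2.0

result = halve(tf.constant([2.0, 4.0]))
# A second call with the same input signature reuses the traced graph:
# only the tf.print line appears again.
result_again = halve(tf.constant([6.0, 8.0]))
```

Note that tf.print writes to standard error by default, so its output may appear in a different place than the Python print output.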

Tensorflow: what exactly does tf.gradients() return

Quick question as I'm kind of confused here.
Let's say we have a simple graph:
a = tf.Variable(tf.truncated_normal(shape=[200, 1], mean=0., stddev=.5))
b = tf.Variable(tf.truncated_normal(shape=[200, 100], mean=0., stddev=.5))
add = a+b
add
<tf.Tensor 'add:0' shape=(200, 100) dtype=float32> #shape is because of broadcasting
So I've got a node that takes in 2 tensors, and produces 1 tensor as an output. Let's now run tf.gradients on it
tf.gradients(add, [a, b])
[<tf.Tensor 'gradients/add_grad/Reshape:0' shape=(200, 1) dtype=float32>,
<tf.Tensor 'gradients/add_grad/Reshape_1:0' shape=(200, 100) dtype=float32>]
So we get gradients exactly in the shape of the input tensors. But... why?
It's not as if there's a single scalar with respect to which we can take the partial derivative. Shouldn't the gradients map from every single value of the input tensors to every single value of the output tensor, effectively giving a 200x1x200x100 gradient for input a?
This is just a simple example where every element of the output tensor depends only on one value from tensor b and one row from tensor a. However, if we did something more complicated, like running a Gaussian blur on a tensor, then the gradients would surely have to be bigger than just the input tensor.
What am I getting wrong here?
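By default tf.gradients takes the gradient of the scalar you get by summing all elements of all tensors passed to tf.gradients as outputs. Because the quantity being differentiated is a single scalar, each returned gradient has the same shape as the corresponding input.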
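To make the shapes concrete, here is a small NumPy sketch (not TensorFlow itself) of the same computation. It assumes smaller stand-in shapes, 4x1 and 4x3 instead of 200x1 and 200x100, builds the full Jacobian the question asks about, and shows that summing it over the output axes gives a gradient with the shape of the input. The names rows, cols, jac_a and total are illustrative.

```python
import numpy as np

rows, cols = 4, 3                 # small stand-ins for 200 and 100
a = np.random.randn(rows, 1)
b = np.random.randn(rows, cols)

# Full Jacobian of add = a + b with respect to a:
# jac_a[i, j, k, 0] = d add[i, j] / d a[k, 0], which is 1 if i == k and 0 otherwise.
jac_a = np.zeros((rows, cols, rows, 1))
for i in range(rows):
    jac_a[i, :, i, 0] = 1.0

# Gradient of sum(add) with respect to a: sum the Jacobian over the output axes.
grad_a = jac_a.sum(axis=(0, 1))   # shape (rows, 1); each a[k] feeds `cols` outputs
# Each element of b feeds exactly one output element, so its gradient is all ones.
grad_b = np.ones_like(b)

# Finite-difference check of d sum(a + b) / d a.
def total(a_val, b_val):
    return (a_val + b_val).sum()

eps = 1e-6
numeric_grad_a = np.zeros_like(a)
for k in range(rows):
    bumped = a.copy()
    bumped[k, 0] += eps
    numeric_grad_a[k, 0] = (total(bumped, b) - total(a, b)) / eps

print(jac_a.shape, grad_a.shape, grad_b.shape)
print(np.allclose(numeric_grad_a, grad_a, atol=1e-4))
```

With the original shapes, the same reasoning would give a gradient of shape (200, 1) for a, with every entry equal to 100 because of broadcasting, and a gradient of ones with shape (200, 100) for b.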