Actually, I already see the problem in TensorFlow 1.13.0 (TensorFlow 1.12.0 works fine).
Here is my code, reduced to a simple example:
def lambda_layer(temp):
    print(temp)
    return temp
which is used as a Lambda layer in my Keras model.
In TensorFlow 1.12.0, print(temp) outputs the actual data, like the following:
[<tf.Tensor: id=250, shape=(1024, 2, 32), dtype=complex64, numpy=
array([[[ 7.68014073e-01+0.95353246j, 7.01403618e-01+0.64385843j,
8.30483198e-01+1.0340731j , ..., -8.88018191e-01+0.4751519j ,
-1.20197642e+00+0.6313924j , -1.03787208e+00+0.22964947j],
[-7.94382274e-01+0.56390345j, -4.73938555e-01+0.55901265j,
-8.73749971e-01+0.67095983j, ..., -5.81580341e-01-0.91620034j,
-7.04443693e-01-1.2709806j , -3.23135853e-01-1.0887597j ]],
The leading dimension is 1024 because I use 1024 as the batch_size. But when I update to TensorFlow 1.13.0 or TensorFlow 2.0, the same code outputs:
Tensor("lambda_1/truediv:0", shape=(None, 1), dtype=float32)
This is terrible, since I cannot see the actual values and so cannot track down my mistakes.
So, any idea how to solve it?
You see that output because the Keras model is being converted to its graph representation, so print prints the symbolic tf.Tensor description instead of its values.
To see the content of a tf.Tensor in TensorFlow 2.0, use tf.print instead of print: tf.print is converted into a graph operation that runs with the model, while print is not.
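A minimal sketch of the difference, assuming TensorFlow 2.x. It uses tf.function to force graph tracing in place of a full Keras model (an assumption made to keep the example small), and the name lambda_layer simply mirrors the question; the same function body could be passed to tf.keras.layers.Lambda.

```python
import tensorflow as tf

@tf.function  # stand-in for the graph conversion Keras performs (assumption)
def lambda_layer(temp):
    # Python print runs only while the function is being traced, so it
    # shows the symbolic tensor (name, shape, dtype) rather than values.
    print("print:", temp)
    # tf.print becomes an op in the graph and prints values on every call.
    tf.print("tf.print:", temp)
    return temp

result = lambda_layer(tf.constant([[1.0, 2.0], [3.0, 4.0]]))
```

Note that tf.print writes to stderr by default; its output_stream argument can redirect it, for example to sys.stdout.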
I am exploring tensorflow internals, and sometimes when I print the value of a tensor, I will see data like the following: Tensor("x/PlaceholderWithDefault:0", shape=(), dtype=int32)
and other times I will see tf.Tensor(0, shape=(), dtype=int32).
What is the difference between these two expressions? Are Tensor and tf.Tensor different? And if not, why are they displayed differently (and seem to have different behavior)?
I'm trying to use tensorflow-federated to select different subsets of weights at the server and send them to the clients. The clients would then train and send back the trained weights. The server aggregates the results and starts a new communication round.
The main problem is that I cannot access the numpy version of the weights and therefore I don't know how to access a subset of them for each layer. I tried using tf.gather_nd and tf.tensor_scatter_nd_update to perform selection and update, but they only work for tensors, and not lists of tensors (as the server_state is in tensorflow-federated).
Does anyone have any hint to solve this problem? Is it even possible to send different weights to each client?
If I follow correctly, a way to write the high-level computation being described in the TFF type shorthand would be:
@tff.federated_computation(...)
def run_one_round(server_state, client_datasets):
    weights_subset = tff.federated_map(subset_fn, server_state)
    clients_weights_subset = tff.federated_broadcast(weights_subset)
    client_models = tff.federated_map(client_training_fn,
                                      (clients_weights_subset, client_datasets))
    aggregated_update = tff.federated_aggregate(client_models, ...)
    new_server_state = tff.federated_map(apply_aggregated_update_fn,
                                         (server_state, aggregated_update))
    return new_server_state
If this is right, most of the work needs to happen in subset_fn, which takes the server state and returns a subset of the global model weights. Generally a model is a structure (list or dict, possibly nested) of tf.Tensor, which, as you observed, cannot be passed directly as an argument to tf.gather_nd or tf.tensor_scatter_nd_update. However, these ops can be applied pointwise to the structure of tensors using tf.nest.map_structure. For example, selecting the value at [0, 0] from a nested structure of three tensors:
import tensorflow as tf
import pprint
struct_of_tensors = {
    'trainable': [tf.constant([[2.0, 4.0, 6.0]]), tf.constant([[5.0]])],
    'non_trainable': [tf.constant([[1.0]])],
}
pprint.pprint(tf.nest.map_structure(
    lambda tensor: tf.gather_nd(params=tensor, indices=[[0, 0]]),
    struct_of_tensors))
This prints:
{'non_trainable': [<tf.Tensor: shape=(1,), dtype=float32, numpy=array([1.], dtype=float32)>],
 'trainable': [<tf.Tensor: shape=(1,), dtype=float32, numpy=array([2.], dtype=float32)>,
               <tf.Tensor: shape=(1,), dtype=float32, numpy=array([5.], dtype=float32)>]}
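The same pointwise approach should carry over to writing values back. A minimal sketch, assuming TensorFlow 2.x and reusing the structure from the example above; overwriting the [0, 0] entry with 0.0 is only an illustrative choice, not part of the original answer.

```python
import tensorflow as tf

struct_of_tensors = {
    'trainable': [tf.constant([[2.0, 4.0, 6.0]]), tf.constant([[5.0]])],
    'non_trainable': [tf.constant([[1.0]])],
}

# Apply tf.tensor_scatter_nd_update to every leaf tensor, replacing the
# entry at index [0, 0] with 0.0 and leaving all other entries unchanged.
updated = tf.nest.map_structure(
    lambda tensor: tf.tensor_scatter_nd_update(
        tensor, indices=[[0, 0]], updates=[0.0]),
    struct_of_tensors)
```

The result has the same nesting as the input, so it could be handed to later steps that expect the original model structure.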
Is there a way to access the current Keras training step as a tensor in the tensorflow graph?
I am trying to build a model which has an 'epsilon' parameter which is decayed as a function of the current training step.
epsilon = some_fn_of(K.global_step)  # <- Something like this?
self.q = K.Sequential([
    K.layers.InputLayer(input_shape),
    K.layers.Dense(n, name='q'),
    # tf.cond expects zero-argument callables for both branches
    K.layers.Lambda(lambda x: tf.cond(tf.random.uniform((), 0, 1) < epsilon,
                                      lambda: tf.constant(0.0),
                                      lambda: x)),
], name='q')
FYI: I'm using the Tensorflow bundled Keras.
I don't know if this will work for all purposes, but it looks like you can find the next training step number using model.optimizer.iterations. The variable name appears to have the format "<optimizer name>/iter:0". You can find the iterations property in the Optimizer documentation. Example value:
<tf.Variable 'Adam/iter:0' shape=() dtype=int64, numpy=5978>
I suspect that Keras does not have any such tensor in the graph and that the only way to access the step is through Callbacks (Keras docs, TensorFlow docs), especially since Keras is meant to be backend-agnostic and so would likely maintain the step outside of TensorFlow.
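If the optimizer.iterations route mentioned above is available, epsilon could be derived from it roughly as follows. This is a sketch assuming TensorFlow 2.x; the function epsilon_from_step, the linear schedule, and its start, end and decay_steps parameters are hypothetical and not part of Keras.

```python
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam()

def epsilon_from_step(step, start=1.0, end=0.05, decay_steps=1000.0):
    """Linearly decay from start to end over decay_steps (hypothetical schedule)."""
    # Fraction of the decay completed, clipped to 1.0 once decay_steps is reached.
    progress = tf.minimum(tf.cast(step, tf.float32) / decay_steps, 1.0)
    return start + (end - start) * progress

# No training step has been applied yet, so iterations is 0.
epsilon_now = epsilon_from_step(optimizer.iterations)
# A step count past decay_steps yields the final value.
epsilon_late = epsilon_from_step(tf.constant(5000, dtype=tf.int64))
```

For the value to change during training, the function would have to be called inside the layer (so the variable is read on every step) rather than evaluated once when the model is built.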
What's the difference between tf.random.normal and tf.distributions.Normal? Or the difference between tf.distributions.Multinomial and tf.random.multinomial or anything similar?
Is tf.distributions.Normal used as the backend for tf.random.normal?
I recently looked at TensorFlow Probability, the new home of tf.distributions. This is my understanding:
They are not the same. tf.distributions.Normal gives you a distribution object from which you can sample (this is equivalent to evaluating the tensor returned by a tf.random.normal call with the same mean and standard deviation). But a distribution additionally lets you evaluate the probability of a sample that you provide, along with everything else that comes with having access to the distribution.
For example, you could do the following:
>>> import tensorflow as tf
>>> dist = tf.distributions.Normal(loc=0., scale=1.)
>>> dist.log_prob(tf.random.normal(shape=(3,3)))
<tf.Tensor: id=58, shape=(3, 3), dtype=float32, numpy=
array([[-0.9486696 , -0.95645994, -1.1610177 ],
[-1.244764 , -1.416851 , -1.1236244 ],
[-0.9292835 , -0.98901427, -0.9705758 ]], dtype=float32)>
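To make concrete what log_prob adds on top of sampling, here is a sketch that evaluates the normal log-density by hand using only core TensorFlow ops, assuming TensorFlow 2.x. The helper normal_log_prob is hypothetical and only mirrors the standard formula; it is not the library's implementation.

```python
import math
import tensorflow as tf

def normal_log_prob(x, loc=0.0, scale=1.0):
    """Log-density of Normal(loc, scale) at x (hypothetical helper)."""
    z = (x - loc) / scale
    # log N(x) = -z^2 / 2 - log(scale) - log(2 * pi) / 2
    return -0.5 * tf.square(z) - tf.math.log(scale) - 0.5 * math.log(2.0 * math.pi)

# tf.random.normal only draws samples; the density has to come from elsewhere.
samples = tf.random.normal(shape=(3, 3))
log_probs = normal_log_prob(samples)

# The density peaks at the mean, where the log-density is -log(2 * pi) / 2.
peak = normal_log_prob(tf.constant(0.0))
```

No entry of log_probs can exceed the peak value, which is consistent with the numbers in the output above.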
My neural network has a custom layer, which takes an input vector x, generates a normally distributed tensor A and returns both A (used in subsequent layers) and the product Ax. Assuming I want to reuse the value stored in A at the output of the custom layer in a second, different layer, is there any subtle aspect that I need to factor in when deciding which Keras backend function (K.backend.random_normal or K.backend.random_normal_variable) I should use to generate A?
a) The backend function random_normal returns a tensor storing a different value following each call (see code snippet below). To me, this suggests that random_normal acts as a generator of normally distributed values. Does this mean that one should not use random_normal to generate a normally distributed tensor if they want to hold its value following calls?
b) The backend function random_normal_variable appears safer (see code snippet below) as it retains value across calls.
Is my conceptual understanding correct? Or am I missing something basic?
I am using Keras 2.1.2 and Tensorflow 1.4.0.
Experiment with random_normal (value changes across calls):
In [5]: A = K.random_normal(shape = (2,2), mean=0.0, stddev=0.5)
In [6]: K.get_value(A)
Out[6]: array([[ 0.4459489 , -0.82019573],
[-0.39853573, -0.33919844]], dtype=float32)
In [7]: K.get_value(A)
Out[7]: array([[-0.37467018, 0.42445764],
[-0.573843 , -0.3468301 ]], dtype=float32)
Experiment with random_normal_variable (value holds across calls):
In [9]: B = K.random_normal_variable(shape=(2,2), mean=0., scale=0.5)
In [10]: K.get_value(B)
Out[10]: array([[ 0.07700552, 0.28008622],
[-0.69484973, -1.32078779]], dtype=float32)
In [11]: K.get_value(B)
Out[11]: array([[ 0.07700552, 0.28008622],
[-0.69484973, -1.32078779]], dtype=float32)
From my understanding, this is due to the fact that random_normal_variable returns an instantiated Variable while random_normal returns a Tensor.
K.random_normal(shape=(2,2), mean=0.0, stddev=0.5)
<tf.Tensor 'random_normal:0' shape=(2, 2) dtype=float32>
K.random_normal_variable(shape=(2,2), mean=0.0, scale=0.5)
<tf.Variable 'Variable:0' shape=(2, 2) dtype=float32_ref>
As for why the values vary for the Tensor and not for the Variable, I think the answer to this thread sums it up well:
Variable is basically a wrapper on Tensor that maintains state across multiple calls to run [...]
The answer also mentions that a variable needs to be initialized before it can be evaluated, which you did not have to do here, as you noticed. That is because the returned variable is already initialized thanks to a call to tensorflow.random_normal_initializer within the random_normal_variable function. Hope this clarifies why your code behaves this way.
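The same distinction can be sketched in TensorFlow 2.x with eager execution (an assumption; the question uses Keras 2.1.2 on TensorFlow 1.4.0, where the backend functions behave as shown above). The helper sample is hypothetical.

```python
import tensorflow as tf

def sample():
    # Every call runs the random op again and draws fresh values.
    return tf.random.normal(shape=(2, 2), mean=0.0, stddev=0.5)

# A Variable stores the values it was created with, so repeated reads agree.
A_var = tf.Variable(sample())
first = A_var.numpy()
second = A_var.numpy()

# Two separate draws are independent and will almost surely differ.
draw_a = sample().numpy()
draw_b = sample().numpy()
```

If A has to keep its value between the custom layer and a later layer, holding it in a variable (or passing the same tensor object along) avoids drawing it a second time.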