Effect of nested @tf.function calls on computation graph - tensorflow

Consider the following code:
import tensorflow as tf

@tf.function
def inner(tensor):
    tf.print(tensor)

@tf.function
def outer(tensor):
    tf.print(tensor)
    inner(tensor)

tensor = tf.convert_to_tensor([1, 2, 3], dtype=tf.int32)

writer = tf.summary.create_file_writer('logs/outer')
tf.summary.trace_on(graph=True)
outer(tensor)
with writer.as_default():
    tf.summary.trace_export(name='outer', step=0)
When I inspect the computation graph in TensorBoard, it looks as follows:
Notice that inner is represented by a StatefulPartitionedCall. There's also another node in the TensorBoard output, which I theorize is the actual op instantiation of inner, but it has no apparent tie to the StatefulPartitionedCall.
What can be concluded from this? Do inner and outer each get a separate computation graph? Can I be confident that inner is still executing in graph mode (my empirical testing says yes)? Is there a performance hit from not having it all inlined in a single graph, or is it all effectively still in a single graph?
Thanks for any insight.

Yes. When you create a tf.function, you are creating a new graph for the operations within that function. All computed tensors then belong to this created graph and can only be accessed if returned from the tf.function, irrespective of inner/outer graph relationship (under normal use anyway - there are ways to capture tensors from an outer graph).
The following snippet demonstrates that for each tf.function in your example, a separate tf.FuncGraph is created.
import tensorflow as tf

@tf.function
def inner(tensor):
    g = tensor.op.graph
    print(f"In inner tf.function, we have graph {g} named {g.name}. "
          f"The graph {g.name} has {g.outer_graph} as its outer graph, "
          f"which is named {g.outer_graph.name}.")

@tf.function
def outer(tensor):
    inner(tensor)
    g = tensor.op.graph
    print(f"In outer tf.function, we have graph {g} named {g.name}.")

tensor = tf.convert_to_tensor([1, 2, 3], dtype=tf.int32)
outer(tensor)
Running the above will give you the following output (modulo id, of course).
In inner tf.function, we have graph FuncGraph(name=inner, id=5173956128) named inner. The graph inner has FuncGraph(name=outer, id=5172536368) as its outer graph, which is named outer.
In outer tf.function, we have graph FuncGraph(name=outer, id=5172536368) named outer.
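To relate this back to the TensorBoard picture, here is a hedged follow-up sketch (reusing outer and tensor from above; get_concrete_function and get_operations are standard TensorFlow APIs) that lists the ops recorded in outer's traced graph:
# Sketch: inspect which ops outer's concrete graph actually contains.
concrete = outer.get_concrete_function(tensor)
for op in concrete.graph.get_operations():
    print(op.type, op.name)
# The nested call to inner is expected to appear as a single call op
# (StatefulPartitionedCall / PartitionedCall) rather than as inlined ops,
# i.e. inner keeps its own FuncGraph and is invoked from outer's graph.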

Related

Extracting mean and std from MixtureNormal model in Tensorflow Probability

I'm currently using TensorFlow Probability to build an MDN for a regression problem. Everything works great; however, I would like to explore some properties of the model. Because I'm using a model with a mixture of Gaussians, I should be able to see the mean and std of each Gaussian component. Indeed, I can extract the weights from the model. It seems like there are three numbers for each Gaussian component. I'm wondering which (if any) are the mean and std from the mixture of Gaussians.
The model I am using is built as follows:
def keras_model_2gauss_mdn(n_variables, name='gauss2_mdn'):
    event_shape = [1]
    num_components = 2
    param_size = tfp.layers.MixtureNormal.params_size(num_components, event_shape)
    x_1 = tf.keras.Input(shape=n_variables)
    hidden_0 = tf.keras.layers.Dense(192, activation='relu')(x_1)
    hidden_1 = tf.keras.layers.Dense(192, activation='relu')(hidden_0)
    hidden_2 = tf.keras.layers.Dense(192, activation='relu')(hidden_1)
    hidden_3 = tf.keras.layers.Dense(128, activation='relu')(hidden_2)
    hidden_4 = tf.keras.layers.Dense(64, activation='relu')(hidden_3)
    hidden_5 = tf.keras.layers.Dense(param_size, activation=None)(hidden_4)
    output = tfp.layers.MixtureNormal(num_components, event_shape)(hidden_5)
    return tf.keras.Model(inputs=x_1, outputs=output, name=name)
After compiling and fitting (i.e. after training), I can get the weights of the whole model by calling .get_weights(). By selecting the last vector from this output, I can get the weights of the MixtureNormal layer. This looks something like
array([ 0.09415845, -0.0941584 , -0.02495631, -0.05152947, -0.04510244,
       -0.00484127], dtype=float32)
I suspect the first number in each group of three is the weight, the second is the mean, and the third is the std, but I need some clarity on whether this is actually the case.
Notice that I've also tried the solution given here and it doesn't seem to work for tfp.layers.MixtureNormal.
I'm rather new to ML and tensorflow, so any help is greatly appreciated!
The idea here is when you pass an input to your network, you get a distribution back. In order to make things work nicely with Keras and other things you might do with the output of a NN, the resulting distribution is wrapped in something called _TensorCoercible. This means that when you pass the distribution into a TF op, the distribution will turn itself into a tensor. The default way of doing this is to sample the distribution, but it's configurable via the convert_to_tensor_fn argument that all TFP layers accept. Eg, you could use convert_to_tensor_fn=lambda dist: dist.mean() (or whatever you like!). Anyway, this means that when you invoke your model on some input, you don't directly get the MixtureSameFamily (Distribution!) instance underlying the MixtureNormal (TFP layer!) output -- you get a _TensorCoercible wrapper around it.
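As an aside, here is a minimal sketch of overriding that coercion behaviour via the convert_to_tensor_fn argument (num_components, event_shape, and hidden_5 are taken from the question's code; choosing the mean is just illustrative):
# Sketch: make tensor coercion return the distribution's mean instead of a sample.
output = tfp.layers.MixtureNormal(
    num_components, event_shape,
    convert_to_tensor_fn=lambda dist: dist.mean())(hidden_5)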
To get the MixtureSameFamily instance, look at the tensor_distribution member on the resultant TC object. It appears that, within the MSF instance, the mixture distribution is not a TC, but the components distribution is. Not sure why. Here's a runnable snippet adapted from your code:
import tensorflow as tf
import tensorflow_probability as tfp

n_variables = [1]
name = 'blah'
event_shape = [1]
num_components = 2
param_size = tfp.layers.MixtureNormal.params_size(num_components, event_shape)
x_1 = tf.keras.Input(shape=n_variables)
hidden_0 = tf.keras.layers.Dense(192, activation='relu')(x_1)
hidden_1 = tf.keras.layers.Dense(192, activation='relu')(hidden_0)
hidden_2 = tf.keras.layers.Dense(192, activation='relu')(hidden_1)
hidden_3 = tf.keras.layers.Dense(128, activation='relu')(hidden_2)
hidden_4 = tf.keras.layers.Dense(64, activation='relu')(hidden_3)
hidden_5 = tf.keras.layers.Dense(param_size, activation=None)(hidden_4)
output = tfp.layers.MixtureNormal(num_components, event_shape)(hidden_5)
model = tf.keras.Model(inputs=x_1, outputs=output, name=name)
model.compile()

dist = model(tf.constant([[1.]]))

print('mixture component logits: ',
      dist.tensor_distribution.mixture_distribution.logits.numpy())
print('mixture component means: ',
      dist.tensor_distribution.components_distribution.tensor_distribution.mean().numpy())
print('mixture component stddevs: ',
      dist.tensor_distribution.components_distribution.tensor_distribution.stddev().numpy())
Output:
mixture component logits:  [[0.01587015 0.03365375]]
mixture component means:  [[[ 0.04741365]
  [-0.01594907]]]
mixture component stddevs:  [[[0.68762577]
  [0.687484  ]]]
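A hedged follow-up: since the mixture distribution is a Categorical over the components, the actual mixture weights (probabilities) can be obtained by applying a softmax to the logits printed above:
print('mixture weights: ',
      tf.nn.softmax(dist.tensor_distribution.mixture_distribution.logits).numpy())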
HTH!

Layer-wise relevance propagation (LRP) in a Keras neural network

I have been following an LRP implementation in PyTorch and wanted to test it out using TensorFlow and Keras. I am using the same model with weights (VGG16) in Keras and was able to successfully execute the forward pass and element-wise division using
# keras-tensorflow implementation
z = incr(clasifierLayers[l](A[l])) # forward pass step(1)
s = (R[l+1]/z) # Element wise division step(2)
But I am having trouble recreating the backward pass. In the original LRP code, which uses PyTorch, the backward pass is computed using
# pyTorch implementation
(z*s).sum().backward(); c = A[l].grad
and when I tried to replicate the backward pass using TensorFlow, my gradient returns None. Here is my code trying to compute the backward pass:
def getGradients(product, layer, l):
    with tf.GradientTape() as tape:
        tape.watch(product)
        a = layers[l](A[l])
    gradient = tape.gradient(product, a)
    return gradient

c = getGradients((z*s).numpy().sum(), layers[l], l)  # backward pass step(3)
Can someone tell me what's wrong with this implementation?
Thanks in advance.
I tried to replicate the issue with one layer, performing an LRP backward step; here is the code:
import tensorflow as tf

x = tf.ones((1, 10))
layer = tf.keras.layers.Dense(10)
y = layer(x)

with tf.GradientTape() as tape:
    tape.watch(x)
    z = tf.keras.layers.Dense(10)(x) + 1e-9
    s = y / z
    s = tf.reshape(s, z.shape)
    c = tape.gradient(tf.reduce_sum(z*s), x)

y*c
This code works, in the sense that it returns the gradients to c.
I did not test it with a dataset, so I do not know if it works as it should. Nonetheless, I think the problem with your code is that you should have the first block:
# keras-tensorflow implementation
z = incr(clasifierLayers[l](A[l])) # forward pass step(1)
s = (R[l+1]/z) # Element wise division step(2)
inside the GradientTape scope and ask for the gradients with respect to A[l].
Edit:
I forgot to avoid gradients being propagated through s. The gradient computation should be done as follows:
c = tape.gradient(tf.reduce_sum(z*s.numpy()), x)
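Putting the pieces together, a hedged sketch of the corrected step in terms of the question's own variables (clasifierLayers, A, R and incr come from the question's code; tf.stop_gradient is an alternative to detaching s with .numpy()):
with tf.GradientTape() as tape:
    tape.watch(A[l])
    z = incr(clasifierLayers[l](A[l]))  # forward pass, step (1)
    s = R[l+1] / z                      # element-wise division, step (2)
    # Block gradient flow through s, then differentiate w.r.t. A[l], step (3).
    c = tape.gradient(tf.reduce_sum(z * tf.stop_gradient(s)), A[l])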

Which type of parameter is the Adam optimizer in GPflow working on, constrained or unconstrained?

In the GPflow documentation, e.g. for SVGP and natural gradients, the Adam optimizer in TensorFlow is used to train the model parameters (lengthscale, variance, inducing inputs, etc.) of a GP model with the stochastic variational inference technique, while the natural gradient optimizer is used for the variational parameters. A snippet looks as follows:
def run_adam(model, iterations):
    """
    Utility function running the Adam optimizer

    :param model: GPflow model
    :param iterations: number of iterations
    """
    # Create an Adam Optimizer action
    logf = []
    train_iter = iter(train_dataset.batch(minibatch_size))
    training_loss = model.training_loss_closure(train_iter, compile=True)
    optimizer = tf.optimizers.Adam()

    @tf.function
    def optimization_step():
        optimizer.minimize(training_loss, model.trainable_variables)

    for step in range(iterations):
        optimization_step()
        if step % 10 == 0:
            elbo = -training_loss().numpy()
            logf.append(elbo)
    return logf
As demonstrated, model.trainable_variables is passed to the Adam optimizer; the model inherits from tf.Module, and its trainable variables include several parameters such as the lengthscale and variance.
What I am wondering is whether the Adam optimizer works on the unconstrained or the constrained version of the model's parameters. A snippet of test code runs as follows:
import gpflow as gpf
import numpy as np

x = np.arange(10)[:, np.newaxis]
y = np.arange(10)[:, np.newaxis]
model = gpf.models.GPR((x, y),
                       kernel=gpf.kernels.SquaredExponential(variance=2, lengthscales=3),
                       noise_variance=4)
model.kernel.parameters[0].unconstrained_variable is model.trainable_variables[0]
and returns
True
As far as I know, parameters of a Gaussian process like the lengthscales and the variances of a kernel are non-negative, and they should be constrained during training. I am not an expert on the source code of GPflow or TensorFlow, but it seems that Adam is working on the unconstrained parameters. Is this simply a misunderstanding on my part, or is something else going on?
Thanks in advance for any help!
You're right, and that's by design. A constrained variable in GPflow is represented by a Parameter. The Parameter wraps the unconstrained_variable. When you access .trainable_variables on your model, this will include the unconstrained_variable of the Parameter, and so when you pass these variables to the optimizer, the optimizer will train those rather than the Parameter itself.
But your model doesn't see the unconstrained_variable; it sees the Parameter interface, which is a tf.Tensor-like interface related to the wrapped unconstrained_variable via an optional transformation. This transformation maps the unconstrained value to a constrained value, so your model will only see the constrained value. It's not a problem that your constrained value must be positive; the transform maps negative unconstrained values to positive constrained values.
You can see the unconstrained and constrained values of the first Parameter for your kernel, as well as the transform that relates them, with
param = model.kernel.parameters[0]
param.value() # this is what your model will see
param.unconstrained_variable # this is what the optimizer will see
param.transform # the above two are related via this
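A hedged check of that relationship (assuming the default transform, which is a TensorFlow Probability bijector exposing a forward method): mapping the unconstrained variable forward through the transform should reproduce the constrained value the model sees.
# Expected to print (approximately) the same value twice.
constrained = param.transform.forward(param.unconstrained_variable)
print(constrained.numpy(), param.value().numpy())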

How to initialize a keras tensor employed in an API model

I am trying to implement a memory-augmented neural network, in which the memory and the read/write/usage weight vectors are updated according to a combination of their previous values. These weights are different from the classic weight matrices between layers, which are automatically updated with the fit() function! My problem is the following: how can I correctly initialize these weights as Keras tensors and use them in the model? I explain it better with the following simplified example.
My API model is something like:
input = Input(shape=(5,6))
controller = LSTM(20, activation='tanh',stateful=False, return_sequences=True)(input)
write_key = Dense(4,activation='tanh')(controller)
read_key = Dense(4,activation='tanh')(controller)
w_w = Add()([w_u, w_r]) #<---- UPDATE OF WRITE WEIGHTS
to_write = Dot()([w_w, write_key])
M = Add()([M,to_write])
cos_sim = Dot()([M,read_key])
w_r = Lambda(lambda x: softmax(x,axis=1))(cos_sim) #<---- UPDATE OF READ WEIGHTS
w_u = Add()([w_u,w_r,w_w]) #<---- UPDATE OF USAGE WEIGHTS
retrieved_memory = Dot()([w_r,M])
controller_output = concatenate([controller,retrieved_memory])
final_output = Dense(6,activation='sigmoid')(controller_output)
You can see that, in order to compute w_w^t, I have to have first defined w_r^{t-1} and w_u^{t-1}. So, at the beginning I have to provide a valid initialization for these vectors. What is the best way to do it? The initializations I would like to have are:
M = K.variable(numpy.zeros((10,4))) # MEMORY
w_r = K.variable(numpy.zeros((1,10))) # READ WEIGHTS
w_u = K.variable(numpy.zeros((1,10))) # USAGE WEIGHTS
But, analogously to what is said in #2486 (entron), these commands do not return a Keras tensor with all the needed metadata, and so this returns the following error:
AttributeError: 'NoneType' object has no attribute 'inbound_nodes'
I also thought of using the old M, w_r and w_u as further inputs at each iteration and, analogously, getting the same variables as outputs to close the loop. But this means that I would have to use the fit() function to train the model online with just the target as the final output (Model 1), and employ the predict() function on the model with all the secondary outputs (Model 2) to get the variables to use at the next iteration. I would also have to pass the weight matrices from Model 1 to Model 2 using get_weights() and set_weights(). As you can see, it becomes a bit messy and too slow.
Do you have any suggestions for this problem?
P.S. Please, do not focus too much on the API model above because it is a simplified (almost meaningless) version of the complete one where I skipped several key steps.

writing a custom cost function in tensorflow

I'm trying to write my own cost function in TensorFlow; however, apparently I cannot 'slice' the tensor object?
import tensorflow as tf
import numpy as np

# Establish variables
x = tf.placeholder("float", [None, 3])
W = tf.Variable(tf.zeros([3,6]))
b = tf.Variable(tf.zeros([6]))

# Establish model
y = tf.nn.softmax(tf.matmul(x,W) + b)

# Truth
y_ = tf.placeholder("float", [None,6])

def angle(v1, v2):
    return np.arccos(np.sum(v1*v2, axis=1))

def normVec(y):
    return np.cross(y[:,[0,2,4]], y[:,[1,3,5]])

angle_distance = -tf.reduce_sum(angle(normVec(y_), normVec(y)))

# This is the example code they give for cross entropy
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
I get the following error:
TypeError: Bad slice index [0, 2, 4] of type <type 'list'>
At present, TensorFlow can't gather on axes other than the first; that capability has been requested.
But for what you want to do in this specific situation, you can transpose, then gather 0,2,4, and then transpose back. It won't be crazy fast, but it works:
tf.transpose(tf.gather(tf.transpose(y), [0,2,4]))
This is a useful workaround for some of the limitations in the current implementation of gather.
(It is also correct that you can't use a NumPy slice on a TensorFlow node - you can run it and slice the output - and that you need to initialize those variables before you run. :) You're mixing tf and np in a way that doesn't work.
x = tf.Something(...)
is a TensorFlow graph object. NumPy has no idea how to cope with such objects.
foo = sess.run(x)
is back to an object Python can handle.
You typically want to keep your loss calculation in pure TensorFlow, so do the cross product and the other functions in tf. You'll probably have to do the arccos the long way, as tf doesn't have a function for it.
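For instance, a hedged sketch of normVec written in pure TensorFlow with the transpose/gather workaround from above (it assumes tf.cross is available in your TensorFlow version; otherwise spell the 3-D cross product out componentwise):
def norm_vec_tf(y):
    # Equivalent of y[:, [0, 2, 4]] and y[:, [1, 3, 5]] via transpose + gather.
    even = tf.transpose(tf.gather(tf.transpose(y), [0, 2, 4]))
    odd = tf.transpose(tf.gather(tf.transpose(y), [1, 3, 5]))
    return tf.cross(even, odd)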
Just realized that the following failed:
cross_entropy = -tf.reduce_sum(y_*np.log(y))
You can't use NumPy functions on tf objects, and the indexing may be different too.
I think you can use the "Wraps Python function" method in tensorflow. Here's the link to the documentation.
And as for the people who answered "Why don't you just use tensorflow's built in function to construct it?" - sometimes the cost function people are looking for cannot be expressed with tf's functions, or is extremely difficult to express.
This is because you have not initialized your Variable, so it does not hold your Tensor right now (you can read more in my answer here).
Just do something like this:
def normVec(y):
    print y
    return np.cross(y[:,[0,2,4]], y[:,[1,3,5]])

t1 = normVec(y_)
# and comment everything after it.
This will show that you do not have actual values yet, only the symbolic Tensor("Placeholder_1:0", shape=TensorShape([Dimension(None), Dimension(6)]), dtype=float32).
Try initializing your variables
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
and evaluate your variable with sess.run(y). P.S. You have not fed your placeholders up to this point.
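A minimal sketch of feeding the placeholder with hypothetical dummy data, just to show the feed_dict mechanics for the graph defined above (np is the NumPy import from the question's code):
# Feed a batch of two rows into the x placeholder and evaluate the softmax output.
batch_x = np.random.rand(2, 3).astype("float32")
print(sess.run(y, feed_dict={x: batch_x}))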