Is it possible to incorporate an Add() function in the tf.keras.Sequential() model, when defined like:
from tensorflow import keras
model = keras.Sequential([
keras.Input(shape(input_shape,)),
keras.layers.Dense(32),
keras.layers.Dense(8),
# I want to add here
keras.layers.Add()(some_var)
], name='my_model')
some_var is a tensor of with the same size as the network at that point. So each element needs to be added to its corresponding element in some_var.
I know I can do this quite easily with the functional API, but would prefer to use a sequential model as it would match other branches in my network.
If its not clear keras.layers.Add()(some_var) is just a guess of how I would like it to work. This gives the error: ValueError: A merge layer should be called on a list of inputs..
My question is specific to the style in which I define the Sequential model.
One of the main difference between Functional and Sequential API is that Sequential works with single input and single output where as Functional API works with single-input and single-output or single-input and multiple-output, or multiple-inputs and multiple-outputs. So using Functional API, you can add two layers of multiple-inputs through `keras.layers.Add().
Also, this keras.layers.Add() can be used in to add two input tensors which is not really we do. we can rather use like d = tf.add(a,b). Both c and d are equal
a = tf.constant(1.,dtype=tf.float32, shape=(1,3)).
b = tf.constant(2.,dtype=tf.float32, shape=(1,3)).
c = tf.keras.layers.Add()([a, b]).
The following example is from keras website. You can see how it is used in Functional API
import keras
input1 = keras.layers.Input(shape=(16,))
x1 = keras.layers.Dense(8, activation='relu')(input1)
input2 = keras.layers.Input(shape=(32,))
x2 = keras.layers.Dense(8, activation='relu')(input2)
# equivalent to added = keras.layers.add([x1, x2])
added = keras.layers.Add()([x1, x2])
out = keras.layers.Dense(4)(added)
model = keras.models.Model(inputs=[input1, input2], outputs=out)
Thanks to #today comment (and then a deleted answer?!), I solved it using the tf.keras.layer.Lambda function.
model = keras.Sequential([
keras.Input(shape(input_shape,)),
keras.layers.Dense(32),
keras.layers.Dense(8),
keras.layers.Lambda(lambda x : x + some_var)
], name='my_model')
Related
I'm currently using tensorflow probability to build an MDN to perform a regression problem. Everything works great, however, I would like to explore some properties of the model. Because I'm using a model with a mixture of gaussians, I should be able to see the mean and std of each gaussian component. Indeed, I can extract the weights from the model. It seems like there are three numbers from each gaussian component. I'm wondering which (if any) are the mean and std from the mixture of gaussians.
The model I am using is built as follows:
def keras_model_2gauss_mdn(n_variables, name='gauss2_mdn'):
event_shape = [1]
num_components = 2
param_size = tfp.layers.MixtureNormal.params_size(num_components, event_shape)
x_1 = tf.keras.Input(shape=n_variables)
hidden_0 = tf.keras.layers.Dense(192, activation='relu')(x_1)
hidden_1 = tf.keras.layers.Dense(192, activation='relu')(hidden_0)
hidden_2 = tf.keras.layers.Dense(192, activation='relu')(hidden_1)
hidden_3 = tf.keras.layers.Dense(128, activation='relu')(hidden_2)
hidden_4 = tf.keras.layers.Dense(64, activation='relu')(hidden_3)
hidden_5 = tf.keras.layers.Dense(param_size, activation=None)(hidden_4)
output = tfp.layers.MixtureNormal(num_components, event_shape)(hidden_5)
return tf.keras.Model(inputs=x_1, outputs=output, name=name)
After compiling and fitting (i.e. after training), I can get the weights from the whole model by calling .get_weights. By selecting the last vector from this output, I can get the weights of the MixtureNormal layer. This looks something like
array([ 0.09415845, -0.0941584 , -0.02495631, -0.05152947, -0.04510244,
-0.00484127], dtype=float32)
I suspect the first number in each group of three is the weight, the second is the mean, and the third is the std, but need some clarity on if this is actually the case.
Notice that I've also tried the solution given here and it doesn't seem to work for tfp.layers.MixtureNormal.
I'm rather new to ML and tensorflow, so any help is greatly appreciated!
The idea here is when you pass an input to your network, you get a distribution back. In order to make things work nicely with Keras and other things you might do with the output of a NN, the resulting distribution is wrapped in something called _TensorCoercible. This means that when you pass the distribution into a TF op, the distribution will turn itself into a tensor. The default way of doing this is to sample the distribution, but it's configurable via the convert_to_tensor_fn argument that all TFP layers accept. Eg, you could use convert_to_tensor_fn=lambda dist: dist.mean() (or whatever you like!). Anyway, this means that when you invoke your model on some input, you don't directly get the MixtureSameFamily (Distribution!) instance underlying the MixtureNormal (TFP layer!) output -- you get a _TensorCoercible wrapper around it.
To get the MixtureSameFamily instance, look at the tensor_distribution member on the resultant TC object. It appears that, within the MSF instance, the mixture distribution is not a TC, but the components distribution is. Not sure why. Here's a runnable snippet adapted from your code:
import tensorflow as tf
import tensorflow_probability as tfp
n_variables=[1]
name='blah'
event_shape = [1]
num_components = 2
param_size = tfp.layers.MixtureNormal.params_size(num_components, event_shape)
x_1 = tf.keras.Input(shape=n_variables)
hidden_0 = tf.keras.layers.Dense(192, activation='relu')(x_1)
hidden_1 = tf.keras.layers.Dense(192, activation='relu')(hidden_0)
hidden_2 = tf.keras.layers.Dense(192, activation='relu')(hidden_1)
hidden_3 = tf.keras.layers.Dense(128, activation='relu')(hidden_2)
hidden_4 = tf.keras.layers.Dense(64, activation='relu')(hidden_3)
hidden_5 = tf.keras.layers.Dense(param_size, activation=None)(hidden_4)
output = tfp.layers.MixtureNormal(num_components, event_shape)(hidden_5)
model = tf.keras.Model(inputs=x_1, outputs=output, name=name)
model.compile()
dist = model(tf.constant([[1.]]))
print('mixture component logits: ',
dist.tensor_distribution.mixture_distribution.logits.numpy())
print('mixutre component means: ',
dist.tensor_distribution.components_distribution.tensor_distribution.mean().numpy())
print('mixture component stddevs: ',
dist.tensor_distribution.components_distribution.tensor_distribution.stddev().numpy())
Output:
mixture component logits: [[0.01587015 0.03365375]]
mixutre component means: [[[ 0.04741365]
[-0.01594907]]]
mixture component stddevs: [[[0.68762577]
[0.687484 ]]]
HTH!
Usually we feed a model for training with external data. But I would like to use tensor coming from intermediate layer of the same model as an input for next batch.
I believe that this can be acheived by using manual loop for training. This time, I prefer to use fit_generator() from Keras (v2.2.4). I create a mode using Functional API.
Any help are appreciated. Thanks.
A very simple approach is to make the loop inside your own model:
inputs = Input(...)
#part 1 layers:
layer1 = SomeLayer(...)
layer2 = SomeLayer(...)
layer3 = SomeLayer(...)
intermediateLayer = IntermediateLayer(...)
#first pass:
out = layer1(inputs)
out = layer2(out)
out = layer3(out)
intermediate_out = intermediateLayer(out)
#second pass:
out = layer1(intermediate_out)
out = layer2(out)
out = layer3(out)
second_pass_out = intermediateLayer(out)
#rest of the model - you decide wheter you need the first pass or only the second
out = SomeLayer(...)(second_pass_out)
out = SomeLayer(...)(out)
...
final_out = FinalLayer(...)(out)
The model then goes:
model = Model(inputs, final_out)
You can, depending on your purposes, make only the second pass participate in training, blocking gradients from the first pass.
#right after intermediate_out, before using it
intermediate_out = Lambda(lambda x: K.stop_gradients(x))(intermediate_out)
You can also create more models that will share these layers, and use each model for a purpose while they will always be updated together (as they use the same layers).
Notice that in "part 1", there are layers that get "reused".
While in "rest of the model" the layers are not "reused", if for some reason you need to reuse the layers for the second part, you should do it the same way it was done for "part 1".
This is how I solve my problem.
model.compile(optimizer=optimizer, loss=loss, metrics=metrics)
model.metrics_tensors =+ [self.model.get_layer('your_intermediate_layer').output] # This line is to access the output of a layer during training (what I want)
Then train like this:
loss_out, ...., your_intermediate_layer_out = model.train_on_batch(X, y)
your_intermediate_layer_out is a numpy array I am looking for during model's training.
My code like this:
balabala...
conv_model.add(keras.layers.Flatten())
input2 = keras.models.Sequential()
input2.add(keras.layers.Activation('linear', input_shape=(1,)))
model = keras.models.Sequential()
model.add(keras.layers.Merge([conv_model, input2], mode='concat'))
balabala.....
When I run this code, it says:
UserWarning: The `Merge` layer is deprecated and will be removed after
08/2017. Use instead layers from `keras.layers.merge`, e.g. `add`,
`concatenate`, etc.
I have tried to use 'keras.layers.Concatenate' in many ways like:
model.add(keras.layers.Concatenate([conv_model, angle]))
But it says:
The first layer in a Sequential model must get an `input_shape` or
`batch_input_shape` argument
Is anybody can help?
Sequential models are not supposed to work with branches.
You need a functional API model.
input2 = Input((1,))
out2 = Activation('linear')(input2)
concatenated = Concatenate(axis=chooseOne)([conv_model.output,out2])
model = Model([conv_model.input,input2], concatenated)
PS: the layer Activation('linear') does absolutely nothing in any model.
I wanna draw the weights of tf.layers.dense in tensorboard histogram, but it not show in the parameter, how could I do that?
The weights are added as a variable named kernel, so you could use
x = tf.dense(...)
weights = tf.get_default_graph().get_tensor_by_name(
os.path.split(x.name)[0] + '/kernel:0')
You can obviously replace tf.get_default_graph() by any other graph you are working in.
I came across this problem and just solved it. tf.layers.dense 's name is not necessary to be the same with the kernel's name's prefix. My tensor is "dense_2/xxx" but it's kernel is "dense_1/kernel:0". To ensure that tf.get_variable works, you'd better set the name=xxx in the tf.layers.dense function to make two names owning same prefix. It works as the demo below:
l=tf.layers.dense(input_tf_xxx,300,name='ip1')
with tf.variable_scope('ip1', reuse=True):
w = tf.get_variable('kernel')
By the way, my tf version is 1.3.
The latest tensorflow layers api creates all the variables using the tf.get_variable call. This ensures that if you wish to use the variable again, you can just use the tf.get_variable function and provide the name of the variable that you wish to obtain.
In the case of a tf.layers.dense, the variable is created as: layer_name/kernel. So, you can obtain the variable by saying:
with tf.variable_scope("layer_name", reuse=True):
weights = tf.get_variable("kernel") # do not specify
# the shape here or it will confuse tensorflow into creating a new one.
[Edit]: The new version of Tensorflow now has both Functional and Object-Oriented interfaces to the layers api. If you need the layers only for computational purposes, then using the functional api is a good choice. The function names start with small letters for instance -> tf.layers.dense(...). The Layer Objects can be created using capital first letters e.g. -> tf.layers.Dense(...). Once you have a handle to this layer object, you can use all of its functionality. For obtaining the weights, just use obj.trainable_weights this returns a list of all the trainable variables found in that layer's scope.
I am going crazy with tensorflow.
I run this:
sess.run(x.kernel)
after training, and I get the weights.
Comes from the properties described here.
I am saying that I am going crazy because it seems that there are a million slightly different ways to do something in tf, and that fragments the tutorials around.
Is there anything wrong with
model.get_weights()
After I create a model, compile it and run fit, this function returns a numpy array of the weights for me.
In TF 2 if you're inside a #tf.function (graph mode):
weights = optimizer.weights
If you're in eager mode (default in TF2 except in #tf.function decorated functions):
weights = optimizer.get_weights()
in TF2 weights will output a list in length 2
weights_out[0] = kernel weight
weights_out[1] = bias weight
the second layer weight (layer[0] is the input layer with no weights) in a model in size: 50 with input size: 784
inputs = keras.Input(shape=(784,), name="digits")
x = layers.Dense(50, activation="relu", name="dense_1")(inputs)
x = layers.Dense(50, activation="relu", name="dense_2")(x)
outputs = layers.Dense(10, activation="softmax", name="predictions")(x)
model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(...)
model.fit(...)
kernel_weight = model.layers[1].weights[0]
bias_weight = model.layers[1].weights[1]
all_weight = model.layers[1].weights
print(len(all_weight)) # 2
print(kernel_weight.shape) # (784,50)
print(bias_weight.shape) # (50,)
Try to make a loop for getting the weight of each layer in your sequential network by printing the name of the layer first which you can get from:
model.summary()
Then u can get the weight of each layer running this code:
for layer in model.layers:
print(layer.name)
print(layer.get_weights())
I am trying to implemente a Memory-augmented neural network, in which the memory and the read/write/usage weight vectors are updated according to a combination of their previous values. These weigths are different from the classic weight matrices between layers that are automatically updated with the fit() function! My problem is the following: how can I correctly initialize these weights as keras tensors and use them in the model? I explain it better with the following simplified example.
My API model is something like:
input = Input(shape=(5,6))
controller = LSTM(20, activation='tanh',stateful=False, return_sequences=True)(input)
write_key = Dense(4,activation='tanh')(controller)
read_key = Dense(4,activation='tanh')(controller)
w_w = Add()([w_u, w_r]) #<---- UPDATE OF WRITE WEIGHTS
to_write = Dot()([w_w, write_key])
M = Add()([M,to_write])
cos_sim = Dot()([M,read_key])
w_r = Lambda(lambda x: softmax(x,axis=1))(cos_sim) #<---- UPDATE OF READ WEIGHTS
w_u = Add()([w_u,w_r,w_w]) #<---- UPDATE OF USAGE WEIGHTS
retrieved_memory = Dot()([w_r,M])
controller_output = concatenate([controller,retrieved_memory])
final_output = Dense(6,activation='sigmoid')(controller_output)`
You can see that, in order to compute w_w^t, I have to have first defined w_r^{t-1} and w_u^{t-1}. So, at the beginning I have to provide a valid initialization for these vectors. What is the best way to do it? The initializations I would like to have are:
M = K.variable(numpy.zeros((10,4))) # MEMORY
w_r = K.variable(numpy.zeros((1,10))) # READ WEIGHTS
w_u = K.variable(numpy.zeros((1,10))) # USAGE WEIGHTS`
But, analogously to what said in #2486(entron), these commands do not return a keras tensor with all the needed meta-data and so this returns the following error:
AttributeError: 'NoneType' object has no attribute 'inbound_nodes'
I also thought to use the old M, w_r and w_u as further inputs at each iteration and analogously get in output the same variables to complete the loop. But this means that I have to use the fit() function to train online the model having just the target as final output (Model 1), and employ the predict() function on the model with all the secondary outputs (Model 2) to get the variables to use at the next iteration. I have also to pass the weigth matrices from Model 1 to Model 2 using get_weights() and set_weights(). As you can see, it becomes a little bit messy and too slow.
Do you have any suggestions for this problem?
P.S. Please, do not focus too much on the API model above because it is a simplified (almost meaningless) version of the complete one where I skipped several key steps.