How to initialize a Keras tensor used in a functional API model

I am trying to implement a memory-augmented neural network, in which the memory and the read/write/usage weight vectors are updated according to a combination of their previous values. These weights are different from the classic weight matrices between layers, which are updated automatically by the fit() function. My problem is the following: how can I correctly initialize these weights as Keras tensors and use them in the model? The following simplified example explains it better.
My API model is something like:
input = Input(shape=(5,6))
controller = LSTM(20, activation='tanh',stateful=False, return_sequences=True)(input)
write_key = Dense(4,activation='tanh')(controller)
read_key = Dense(4,activation='tanh')(controller)
w_w = Add()([w_u, w_r]) #<---- UPDATE OF WRITE WEIGHTS
to_write = Dot()([w_w, write_key])
M = Add()([M,to_write])
cos_sim = Dot()([M,read_key])
w_r = Lambda(lambda x: softmax(x,axis=1))(cos_sim) #<---- UPDATE OF READ WEIGHTS
w_u = Add()([w_u,w_r,w_w]) #<---- UPDATE OF USAGE WEIGHTS
retrieved_memory = Dot()([w_r,M])
controller_output = concatenate([controller,retrieved_memory])
final_output = Dense(6,activation='sigmoid')(controller_output)
You can see that, in order to compute w_w^t, I must first have defined w_r^{t-1} and w_u^{t-1}. So, at the beginning, I have to provide a valid initialization for these vectors. What is the best way to do it? The initializations I would like to have are:
M = K.variable(numpy.zeros((10,4))) # MEMORY
w_r = K.variable(numpy.zeros((1,10))) # READ WEIGHTS
w_u = K.variable(numpy.zeros((1,10))) # USAGE WEIGHTS
But, as noted in #2486 (entron), these commands do not return a Keras tensor with all the needed metadata, so this raises the following error:
AttributeError: 'NoneType' object has no attribute 'inbound_nodes'
I also thought of passing the old M, w_r and w_u as additional inputs at each iteration and, analogously, getting the same variables as outputs to complete the loop. But this means that I have to use the fit() function to train the model online with just the target as final output (Model 1), and use the predict() function on the model with all the secondary outputs (Model 2) to get the variables to use at the next iteration. I also have to pass the weight matrices from Model 1 to Model 2 using get_weights() and set_weights(). As you can see, it becomes a little messy and too slow.
Do you have any suggestions for this problem?
P.S. Please, do not focus too much on the API model above because it is a simplified (almost meaningless) version of the complete one where I skipped several key steps.
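For reference, the recurrence in the simplified model above can be written out in plain NumPy to make the required initial state and the shapes explicit. This is only a sketch of the update order, not a Keras solution: the helper name memory_step and the per-timestep key shape (1, 4) are assumptions, and the zero initializations are the ones listed in the question.

```python
import numpy as np

def softmax(x, axis=1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def memory_step(M, w_r, w_u, write_key, read_key):
    """One update of the memory and weight vectors (hypothetical helper)."""
    w_w = w_u + w_r                  # write weights from previous usage/read weights
    M = M + w_w.T @ write_key        # (10, 1) @ (1, 4) -> (10, 4)
    cos_sim = read_key @ M.T         # (1, 4) @ (4, 10) -> (1, 10), named as in the question
    w_r = softmax(cos_sim, axis=1)   # new read weights
    w_u = w_u + w_r + w_w            # new usage weights
    retrieved = w_r @ M              # (1, 10) @ (10, 4) -> (1, 4)
    return M, w_r, w_u, retrieved

# Initial state, as in the question.
M = np.zeros((10, 4))
w_r = np.zeros((1, 10))
w_u = np.zeros((1, 10))

rng = np.random.default_rng(0)
for t in range(5):
    # Random keys stand in for the Dense(4, activation='tanh') outputs.
    write_key = np.tanh(rng.standard_normal((1, 4)))
    read_key = np.tanh(rng.standard_normal((1, 4)))
    M, w_r, w_u, retrieved = memory_step(M, w_r, w_u, write_key, read_key)
```

One option in Keras may be to carry M, w_r and w_u as the state of a custom RNN cell, so that each timestep receives the previous values; whether that fits the full model is not something this sketch can show.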


How do I explicitly split a Dataset tuple in Tensorflow's functional API into two separate layers from just one input layer?

The input to my model is a BatchDataset object called dataset_train, and it is batched to yield (training_data, label).
For some of the machinery in my model, I need to be able to split the Dataset tuple inside the model and independently access both the data and the label. This is a single input model with multiple outputs, so I am using Tensorflow's Functional API. For the sake of reproducibility, I am working with timeseries, so a toy dataset would look like this:
time = np.arange(1000)
data = np.random.randn(1000)
label = np.random.randn(1000)
training_data = np.zeros(shape=(time.size,2))
training_data[:,0] = time
training_data[:,1] = data
dataset_train = tf.keras.utils.timeseries_dataset_from_array(
    data = training_data,
    targets = label,
    batch_size = batch_size,
    sequence_length = sequence_length,
    sequence_stride = 1,
)
Note: Sequence Length and batch_size are additional semi-arbitrary hyperparameters that are not important for the purposes of this question.
How do I split apart the Dataset in Tensorflow's Functional API into the training data element and the label element?
Here is pseudocode of what I am looking for:
input = Single Input Layer that defines something capable of accepting dataset_train
training_data = input.element_spec[0]
label = input.element_spec[1]
After that point, my model can perform its actions on training_data and label independently.
First Solution I tried:
I first started by trying to define two input layers and pass each element of the dataset tuple to its own input layer, and then act on each input layer independently.
training_data = tf.keras.Input(shape=(sequence_length,2))
label = tf.keras.Input(shape = sequence_length)
#model machinery
model = tf.keras.Model(
    inputs = [training_data, label],
    outputs = [output_1, output_2]
)
#model machinery
history =, epochs = 500)
The first problem I had with this is that I got the following error:
ValueError: Layer "model_5" expects 2 input(s), but it received 1 input tensors. Inputs received: [<tf.Tensor 'IteratorGetNext:0' shape=(None, None, 2) dtype=float64>]
This is a problem, because if I actually pass the model a dictionary of datasets (never mind that this isn't supported), then I introduce a circular dependency: in order to use model.predict, the model expects labels among its inputs. In other words, I need the answers to get the answers. I need to pass it only a single Dataset to avoid this circular dependency (TensorFlow implicitly assumes that the second element in a Dataset is the label, and doesn't require Datasets with labels for model.predict), so I decided to abandon this strategy in favor of unpacking the Input layer directly within the functional API for the model.
Second Solution I tried:
I thought maybe I could unpack the Dataset using the .get_single_element() method in the following code excerpt
input = tf.keras.Input(shape = (sequence_length, 2))
training_dataset, label = input.get_single_element()
This gave the following error:
AttributeError: 'KerasTensor' object has no attribute 'get_single_element'
I then thought the problem was that because the symbolic tensor wasn't of type Dataset, I needed to define the input layer to expect a Dataset. After reading through the documentation and spending ~9 hours messing around, I realized that tf.keras.Input takes an argument called type_spec, which allows the user to specify exactly the type of symbolic tensor to create (I think - I'm still a little shaky on understanding exactly what's going on and I'm more than a little sleep deprived, which isn't helping). As it turns out there's a way to generate the type_spec from the dataset itself, so I did that to make sure that I wasn't making a mistake in generating it.
input = tf.keras.Input(tensor = dataset_train)
training_dataset, label = input.get_single_element()
Which gives the following error:
AttributeError: 'BatchDataset' object has no attribute 'dtype'
I'm not really sure why I get this error, but I tried to circumvent it by explicitly defining the type_spec in the Input layer
input = tf.keras.Input(type_spec:
training_dataset, label = input.get_single_element()
Which gives the following error:
ValueError: KerasTensor only supports TypeSpecs that have a shape field; got DatasetSpec, which does not have a shape.
I also had tried to make the DatasetSpec manually instead of generating it using .from_value earlier and had gotten the same error. I thought then it was just because I was messing it up, but now that I've gotten this error from .from_value, I'm beginning to suspect that this line of solutions won't work because DatasetSpec implicitly is missing a shape. I might also be confused, because performing dataset_train.element_spec clearly reveals that the dataset does have a shape, so I'm not sure why Tensorflow can't infer from it.
Any help in furthering either of those non-functional solutions so that I can explicitly access the training_data and label separately from an input Dataset inside the Functional API would be much appreciated!
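One possible direction, sketched under assumptions: rather than unpacking the Dataset inside the model, restructure it with Dataset.map before it reaches the model, so that the label also appears among the inputs. The toy shapes, the variable names and the use of from_tensor_slices below are illustrative assumptions, not the pipeline from the question.

```python
import numpy as np
import tensorflow as tf

sequence_length = 8   # assumed toy value
batch_size = 4        # assumed toy value

# Toy stand-in for dataset_train: windows of shape (sequence_length, 2), one label each.
windows = np.random.randn(32, sequence_length, 2).astype("float32")
labels = np.random.randn(32).astype("float32")
dataset_train = tf.data.Dataset.from_tensor_slices((windows, labels)).batch(batch_size)

# element_spec is a (data, label) tuple of TensorSpecs. It describes the
# elements, but it is not itself something a Keras Input can consume.
data_spec, label_spec = dataset_train.element_spec

# Re-pack each element as ((data, label), label): the first item is what a
# two-input model would receive, the second is still the training target.
dataset_two_inputs = dataset_train.map(lambda x, y: ((x, y), y))

# Iterating a Dataset eagerly yields the tuple elements directly.
for (x_batch, y_as_input), y_batch in dataset_two_inputs.take(1):
    print(x_batch.shape, y_as_input.shape, y_batch.shape)
```

This does not remove the need for labels at prediction time; it only shows where the (data, label) tuple can be split.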

Using Tensorflow Dataset from_generator() to create multi Input/Output with Custom Generator and ImageDataGenerator

I am trying to scale up my model, which uses a "cluster loss" extension. The implementation works so far on MNIST, but I would like to benefit from data augmentation and multi-processing for the real dataset.
In short, the network follows work done with the "centre loss", which somewhat resembles a Siamese network. The important part of the architecture is that the model has 2 inputs and 2 outputs. Therefore, I implemented a custom generator to feed the model as follows:
def my_generator(stop):
    i = 0
    while i < stop:
        batch =
        img = batch[0]
        labels = batch[1]
        labels_size = np.shape(labels)
        cluster = np.zeros(labels_size)
        x = [img, labels]
        y = [labels, cluster]
        yield x, y
        i += 1
which calls the generator ("train_gen") defined as follows:
generator = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255, horizontal_flip=True)
train_gen = generator.flow_from_dataframe(df, x_col='img_path', y_col='label',
target_size=(32, 32),
The generator works if I set only one worker in the fit function, but obviously it's painfully slow. So I tried to use the recommended tf.data API from TensorFlow to fit my model, but setting it up as follows,
ds =,
output_types=([tf.float32, tf.float32], [tf.float32, tf.float32]))
I got the following error:
TypeError: Cannot convert value [tf.float32, tf.float32] to a Tensorflow DType.
From there, I tried multiple things, following this post
For example, trying to return tuples instead of arrays:
x = (img, labels)
y = (labels, cluster)
But I got:
ValueError: as_list() is not defined on an unknown TensorShape
Does anyone have experience with this? I am not sure I understand the error. I think I could perhaps change the "output_types" argument, but TensorFlow has no "list" or "tuple" DType.
Here is a link to my code, which constructs a small image dataset from cifar10 to feed a toy model.
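A minimal sketch of one way a two-input, two-output structure can be described to from_generator, assuming TF 2.4 or later (for output_signature). The generator toy_generator, the image size and the 10-dimensional labels are hypothetical stand-ins. The relevant points are that the generator yields nested tuples rather than lists, and that the signature mirrors that nesting with one TensorSpec per leaf, so the shapes are known.

```python
import numpy as np
import tensorflow as tf

def toy_generator(stop, batch=4):
    # Hypothetical stand-in for my_generator: yields nested tuples, not lists.
    for _ in range(stop):
        img = np.random.rand(batch, 32, 32, 3).astype("float32")
        labels = np.random.rand(batch, 10).astype("float32")
        cluster = np.zeros_like(labels)
        yield (img, labels), (labels, cluster)

# One TensorSpec per leaf; None leaves the batch dimension unspecified.
img_spec = tf.TensorSpec(shape=(None, 32, 32, 3), dtype=tf.float32)
vec_spec = tf.TensorSpec(shape=(None, 10), dtype=tf.float32)

ds = tf.data.Dataset.from_generator(
    lambda: toy_generator(3),
    output_signature=((img_spec, vec_spec), (vec_spec, vec_spec)),
)

for (img, labels), (labels_out, cluster) in ds.take(1):
    print(img.shape, labels.shape, cluster.shape)
```

The older output_types / output_shapes arguments accept the same nested-tuple layout; the as_list() error above is likely a consequence of the shapes being left unspecified.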
I do not think your generator works as you expect. Each time it is called it sets i=0. The code after
yield x, y
i += 1
i += 1 never executes. Put a print statement as below
yield x, y
i += 1
print ('the value of i is ',i)
and you will see it never executes.
The above is true if you execute
which is how generators are used. However if you execute
then the i += 1 statement does execute. Normally with generators you use them with next(my_generator). I believe gets the next batch by using next() on the generator you specify.
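The difference described above can be checked with a small pure-Python sketch. The generator count_up is a hypothetical stand-in for my_generator. Calling the generator function afresh each time restarts it at i = 0, while reusing one generator object resumes after the yield, so i += 1 does run.

```python
def count_up(stop):
    i = 0
    while i < stop:
        yield i
        i += 1  # runs when the generator is resumed by the following next() call

# A fresh generator object on every call: always restarts at 0.
fresh = [next(count_up(3)) for _ in range(3)]

# One generator object reused: resumes after the yield each time.
gen = count_up(3)
reused = [next(gen) for _ in range(3)]

print(fresh)   # [0, 0, 0]
print(reused)  # [0, 1, 2]
```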

How to create two graphs for train and validation?

When I read the TensorFlow guide on graphs and sessions (Graphs and Sessions), I found that it suggests creating two graphs for training and validation.
I think this is reasonable, and I want to use it because my training and validation models are different (for encoder-decoder mode or dropout). However, I don't know how to make the variables in the trained graph available to the test graph without using tf.train.Saver().
When I create two graphs and create variables inside each graph, I find that the variables are totally different, as they belong to different graphs.
I have googled a lot and I know there are questions about this problem, such as question1, but there is still no useful answer. Is there a code example, or does anyone know how to create two graphs for training and validation separately, such as:
def train_model():
    g_train = tf.Graph()
    with g_train.as_default():
        ...  # build the training model here
def validation_model():
    g_test = tf.Graph()
    with g_test.as_default():
        ...  # build the validation model here
One easy way of doing that is to create a 'forward function' that defines the model and changes its behaviour based on extra parameters.
Here is an example:
def forward_pass(x, is_training, reuse=tf.AUTO_REUSE, name='model_forward_pass'):
    # Note the reuse attribute as it tells the getter to either create the graph or get the weights
    with tf.variable_scope(name, reuse=reuse):
        x = tf.layers.conv2d(x, ...)
        x = tf.layers.dense(x, ...)
        x = tf.layers.dropout(x, rate, training=is_training)  # Note the is_training attribute
    return x
Now you can call the 'forward_pass' function anywhere in your code. You simply need to provide the is_training attribute to use the correct mode for dropout for example. The 'reuse' argument will automatically get the correct values for your weights as long as the 'name' of the 'variable_scope' is the same.
For example:
train_logits_model1 = forward_pass(x_train, is_training=True, name='model1')
# Graph is defined and dropout is used in training mode
test_logits_model1 = forward_pass(x_test, is_training=False, name='model1')
# Graph is reused but the dropout behaviour change to inference mode
train_logits_model2 = forward_pass(x_train2, is_training=True, name='model2')
# Name changed, model2 is added to the graph and dropout is used in training mode
To add to this answer: as you stated that you want to have 2 separate graphs, you could do that using an assign function:
train_graph = forward_pass(x, is_training=True, reuse=False, name='train_graph')
test_graph = forward_pass(x, is_training=False, reuse=False, name='test_graph')
train_vars = tf.get_collection('variables', 'train_graph/.*')
test_vars = tf.get_collection('variables','test_graph/.*')
test_assign_ops = []
for test, train in zip(test_vars, train_vars):
    test_assign_ops += [tf.assign(test, train)]
assign_op =*test_assign_ops) # Replace vars in the test_graph by the one in train_graph
I'm a big advocate of method 1, as it is much cleaner and reduces memory usage.
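In TensorFlow 2 the same weight-sharing idea does not need variable scopes: a Keras model owns its variables, and the training argument switches layer behaviour. A minimal sketch, assuming TF 2.x with tf.keras; the layer sizes and the input are arbitrary.

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1),
])

x = np.ones((8, 4), dtype="float32")

train_out = model(x, training=True)     # dropout active
eval_out_1 = model(x, training=False)   # dropout disabled
eval_out_2 = model(x, training=False)   # same weights, same deterministic result

# Both modes read the same variables, so nothing has to be copied or assigned.
print(len(model.trainable_variables))   # two kernels and two biases
```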

How to get weights in tf.layers.dense?

I want to plot the weights of tf.layers.dense as a TensorBoard histogram, but the weights do not appear among its parameters. How can I do that?
The weights are added as a variable named kernel, so you could use
x = tf.layers.dense(...)
weights = tf.get_default_graph().get_tensor_by_name(
os.path.split([0] + '/kernel:0')
You can obviously replace tf.get_default_graph() by any other graph you are working in.
I came across this problem and just solved it. The name of the tensor returned by tf.layers.dense does not necessarily share a prefix with the kernel's name. My tensor is "dense_2/xxx" but its kernel is "dense_1/kernel:0". To ensure that tf.get_variable works, you had better set name=xxx in the tf.layers.dense call so that both names have the same prefix. It works as in the demo below:
with tf.variable_scope('ip1', reuse=True):
    w = tf.get_variable('kernel')
By the way, my tf version is 1.3.
The latest tensorflow layers api creates all the variables using the tf.get_variable call. This ensures that if you wish to use the variable again, you can just use the tf.get_variable function and provide the name of the variable that you wish to obtain.
In the case of a tf.layers.dense, the variable is created as: layer_name/kernel. So, you can obtain the variable by saying:
with tf.variable_scope("layer_name", reuse=True):
    # Do not specify the shape here, or it will confuse TensorFlow into creating a new variable.
    weights = tf.get_variable("kernel")
[Edit]: The new version of TensorFlow now has both functional and object-oriented interfaces to the layers API. If you need the layers only for computational purposes, then using the functional API is a good choice. The function names start with lowercase letters, for instance tf.layers.dense(...). The layer objects are created using a capitalized name, e.g. tf.layers.Dense(...). Once you have a handle to this layer object, you can use all of its functionality. To obtain the weights, just use obj.trainable_weights, which returns a list of all the trainable variables found in that layer's scope.
I am going crazy with tensorflow.
I run this:
after training, and I get the weights.
Comes from the properties described here.
I am saying that I am going crazy because it seems that there are a million slightly different ways to do something in tf, and that fragments the tutorials around.
Is there anything wrong with
After I create a model, compile it and run fit, this function returns a numpy array of the weights for me.
In TF 2 if you're inside a @tf.function (graph mode):
weights = optimizer.weights
If you're in eager mode (default in TF2 except in @tf.function decorated functions):
weights = optimizer.get_weights()
In TF2, a layer's weights attribute returns a list of length 2:
weights_out[0] = kernel weight
weights_out[1] = bias weight
For example, these are the weights of the second layer (layers[0] is the input layer, which has no weights) in a model with layer size 50 and input size 784:
from tensorflow import keras
from tensorflow.keras import layers
inputs = keras.Input(shape=(784,), name="digits")
x = layers.Dense(50, activation="relu", name="dense_1")(inputs)
x = layers.Dense(50, activation="relu", name="dense_2")(x)
outputs = layers.Dense(10, activation="softmax", name="predictions")(x)
model = keras.Model(inputs=inputs, outputs=outputs)
kernel_weight = model.layers[1].weights[0]
bias_weight = model.layers[1].weights[1]
all_weight = model.layers[1].weights
print(len(all_weight)) # 2
print(kernel_weight.shape) # (784,50)
print(bias_weight.shape) # (50,)
Try looping over the layers of your sequential network, printing the name of each layer first, which you can get from:
Then you can get the weights of each layer by running this code:
for layer in model.layers:
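A minimal sketch of such a loop, assuming TF 2.x with tf.keras. The small model is only there to make the example self-contained, and the dictionary name shapes is an illustrative choice.

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(784,), name="digits")
x = tf.keras.layers.Dense(50, activation="relu", name="dense_1")(inputs)
outputs = tf.keras.layers.Dense(10, activation="softmax", name="predictions")(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)

# Collect {layer name: [shape of each weight array]} for layers that have weights.
shapes = {}
for layer in model.layers:
    arrays = layer.get_weights()  # list of NumPy arrays, empty for the input layer
    if arrays:
        shapes[layer.name] = [a.shape for a in arrays]
        print(layer.name, shapes[layer.name])
```

For a Dense layer the list holds the kernel first and the bias second, which matches the indexing shown above.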

Update only part of the word embedding matrix in Tensorflow

Assuming that I want to update a pre-trained word-embedding matrix during training, is there a way to update only a subset of the word embedding matrix?
I have looked into the Tensorflow API page and found this:
# Create an optimizer.
opt = GradientDescentOptimizer(learning_rate=0.1)
# Compute the gradients for a list of variables.
grads_and_vars = opt.compute_gradients(loss, <list of variables>)
# grads_and_vars is a list of tuples (gradient, variable). Do whatever you
# need to the 'gradient' part, for example cap them, etc.
capped_grads_and_vars = [(MyCapper(gv[0]), gv[1]) for gv in grads_and_vars]
# Ask the optimizer to apply the capped gradients.
However, how do I apply that to the word-embedding matrix? Suppose I do:
word_emb = tf.Variable(0.2 * tf.random_uniform([syn0.shape[0],s['es']], minval=-1.0, maxval=1.0, dtype=tf.float32),name='word_emb',trainable=False)
gather_emb = tf.gather(word_emb,indices) #assuming that I pass some indices as placeholder through feed_dict
opt = tf.train.AdamOptimizer(1e-4)
grad = opt.compute_gradients(loss,gather_emb)
How do I then use opt.apply_gradients and tf.scatter_update to update the original embedding matrix? (Also, TensorFlow throws an error if the second argument of compute_gradients is not a tf.Variable.)
TL;DR: With the default implementation of opt.minimize(loss), TensorFlow will generate a sparse update for word_emb that modifies only the rows of word_emb that participated in the forward pass.
The gradient of the tf.gather(word_emb, indices) op with respect to word_emb is a tf.IndexedSlices object (see the implementation for more details). This object represents a sparse tensor that is zero everywhere, except for the rows selected by indices. A call to opt.minimize(loss) calls AdamOptimizer._apply_sparse(word_emb_grad, word_emb), which makes a call to tf.scatter_sub(word_emb, ...)* that updates only the rows of word_emb that were selected by indices.
If, on the other hand, you want to modify the tf.IndexedSlices that is returned by opt.compute_gradients(loss, [word_emb]), you can perform arbitrary TensorFlow operations on its indices and values properties, and create a new tf.IndexedSlices that can be passed to opt.apply_gradients([(..., word_emb)]). For example, you could cap the gradients using MyCapper() (as in the example) using the following calls:
# compute_gradients returns a list of (gradient, variable) pairs.
(grad, _), = opt.compute_gradients(loss, [word_emb])
# apply_gradients expects the same (gradient, variable) pairing.
train_op = opt.apply_gradients(
    [(tf.IndexedSlices(MyCapper(grad.values), grad.indices, grad.dense_shape), word_emb)])
Similarly, you could change the set of indices that will be modified by creating a new tf.IndexedSlices with different indices.
* In general, if you want to update only part of a variable in TensorFlow, you can use the tf.scatter_update(), tf.scatter_add(), or tf.scatter_sub() operators, which respectively set, add to (+=) or subtract from (-=) the value previously stored in a variable.
Since you just want to select the elements to be updated (and not to change the gradients), you can do as follows.
Let indices_to_update be a boolean tensor that indicates the indices you wish to update, and let entry_stop_gradients be defined as in the link. Then:
gather_emb = entry_stop_gradients(gather_emb, indices_to_update)
Actually, I was also struggling with this problem. In my case, I needed to train a model with w2v embeddings, but not all of the tokens existed in the embedding matrix. For the tokens that were not in the matrix, I used random initialization. Of course, tokens whose embeddings were already trained shouldn't be updated, so I came up with this solution:
class PartialEmbeddingsUpdate(tf.keras.layers.Layer):
    def __init__(self, len_vocab, weights, indices_to_update):
        super(PartialEmbeddingsUpdate, self).__init__()
        self.embeddings = tf.Variable(weights, name='embedding', dtype=tf.float32)
        # True for every vocabulary row that appears in indices_to_update
        self.bool_mask = tf.equal(tf.expand_dims(tf.range(0,len_vocab),1), tf.expand_dims(indices_to_update,0))
        self.bool_mask = tf.reduce_any(self.bool_mask,1)
        self.bool_mask_not = tf.logical_not(self.bool_mask)
        # Cast both masks to float columns of shape (len_vocab, 1) so they broadcast over the embedding dimension
        self.bool_mask_not = tf.expand_dims(tf.cast(self.bool_mask_not, dtype=self.embeddings.dtype),1)
        self.bool_mask = tf.expand_dims(tf.cast(self.bool_mask, dtype=self.embeddings.dtype),1)
    def call(self, input):
        input = tf.cast(input, dtype=tf.int32)
        # Frozen rows go through stop_gradient; rows to update keep their gradient
        embeddings = tf.stop_gradient(self.bool_mask_not * self.embeddings) + self.bool_mask * self.embeddings
        return tf.gather(embeddings,input)
Here len_vocab is your vocabulary length, weights is the matrix of weights (some of which shouldn't be updated) and indices_to_update holds the indices of the tokens which should be updated. I then used this layer instead of tf.keras.layers.Embedding. I hope it helps anyone who encounters the same problem.
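The masking idea can be checked in isolation with a small eager-mode sketch, assuming TF 2.x. The vocabulary size, the rows chosen as trainable and the toy loss are assumptions made for illustration. The gradient that reaches the variable should be zero for every frozen row.

```python
import numpy as np
import tensorflow as tf

vocab, dim = 6, 3
emb = tf.Variable(np.ones((vocab, dim), dtype="float32"))

# Rows allowed to train (assumed for illustration): 1 and 4.
trainable_rows = tf.constant([1, 4])
mask = tf.reduce_any(
    tf.equal(tf.range(vocab)[:, None], trainable_rows[None, :]), axis=1)
mask = tf.cast(mask, emb.dtype)[:, None]   # shape (vocab, 1)

ids = tf.constant([0, 1, 4, 5])            # tokens seen in this batch

with tf.GradientTape() as tape:
    # Frozen rows pass through stop_gradient; trainable rows keep their gradient.
    mixed = tf.stop_gradient((1.0 - mask) * emb) + mask * emb
    loss = tf.reduce_sum(tf.gather(mixed, ids))

# Densify in case the gradient comes back as tf.IndexedSlices.
grad = tf.convert_to_tensor(tape.gradient(loss, emb))
print(grad.numpy())
```

Rows 0 and 5 are looked up in the batch but receive no gradient, while rows 1 and 4 do.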