How to create a custom layer in Keras with 'stateful' variables/tensors? - tensorflow

I would like some help creating a custom layer.
What I am trying to do is quite simple: build an output layer with 'stateful' variables, i.e. tensors whose values are updated at each batch.
To make this clearer, here is a snippet of what I would like to do:
def call(self, inputs):
    c = self.constant
    m = self.extra_constant
    update = inputs*m + c
    X_new = self.X_old + update
    outputs = X_new
    self.X_old = X_new
    return outputs
The idea here is quite simple:
X_old is initialized to 0 in __init__(self, ...)
update is computed as a function of the inputs to the layer
the output of the layer is computed (i.e. X_new)
the value of X_old is set equal to X_new so that, at the next batch, X_old is no longer equal to zero but equal to X_new from the previous batch.
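For reference, the update rule in the steps above can be written out in plain Python, independent of Keras. This is only a sketch of the desired semantics: the class name RunningState and the scalar values are hypothetical, and a real layer would hold the state in a variable rather than in a Python attribute.

```python
class RunningState:
    """Plain-Python model of the desired behaviour (not a Keras layer)."""

    def __init__(self, constant, extra_constant):
        self.constant = constant              # c in the snippet above
        self.extra_constant = extra_constant  # m in the snippet above
        self.x_old = 0.0                      # X_old starts at zero

    def call(self, inputs):
        update = inputs * self.extra_constant + self.constant
        x_new = self.x_old + update
        self.x_old = x_new  # carried over to the next batch
        return x_new


layer = RunningState(constant=1.0, extra_constant=2.0)
first = layer.call(3.0)   # 0 + (3*2 + 1)
second = layer.call(1.0)  # previous output + (1*2 + 1)
```

The second call starts from the output of the first one, which is exactly the behaviour the layer is supposed to reproduce across batches.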
I have found out that K.update does the job, as shown in the example:
X_new = K.update(self.X_old, self.X_old + update)
The problem here is that, if I then try to define the outputs of the layer as:
outputs = X_new
return outputs
I receive the following error when I call model.fit():
ValueError: An operation has `None` for gradient. Please make sure that all of your ops have
gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
I keep getting this error even though I set layer.trainable = False and did not define any bias or weights for the layer. On the other hand, if I just do self.X_old = X_new, the value of X_old does not get updated.
Do you have a solution for implementing this? I believe it should not be that hard, since stateful RNNs work in a 'similar' way.
Thanks in advance for your help!

Defining a custom layer can be confusing at times. Some of the methods you override are called only once, even though, as in many other OO libraries and frameworks, it looks as if they will be called many times.
Here is what I mean: when you define a layer and use it in a model, the Python code you write in the overridden call method is not executed directly on every forward or backward pass. Instead, it runs only when the computational graph is built (traced), typically once when the model is built or first run, not at every batch. The resulting graph, through which the tensors flow, is what performs the computations during training and prediction.
That's why a plain print statement will not help you debug the model at run time; you need to use tf.print to add a print operation to the graph.
The same applies to the state variable you want to keep. Instead of simply assigning old + update to new in Python, you need to call a function that adds the assignment operation to the graph.
Also note that tensors are immutable, so you need to define the state as a tf.Variable in the __init__ method.
So I believe this code is more like what you're looking for:
import tensorflow as tf

class CustomLayer(tf.keras.layers.Layer):
    def __init__(self, **kwargs):
        super(CustomLayer, self).__init__(**kwargs)
        # State that persists across batches; must be a variable, not a tensor
        self.state = tf.Variable(tf.zeros((3,3), 'float32'))
        self.constant = tf.constant([[1,1,1],[1,0,-1],[-1,0,1]], 'float32')
        self.extra_constant = tf.constant([[1,1,1],[1,0,-1],[-1,0,1]], 'float32')
        self.trainable = False

    def call(self, X):
        m = self.constant
        c = self.extra_constant
        outputs = self.state + tf.matmul(X, m) + c
        # Adds the state update to the graph; the value passed here
        # must match the shape of self.state
        tf.keras.backend.update(self.state, tf.reduce_sum(outputs, axis=0))
        return outputs

Related

how to get labels when using model.predict()

In my project, I have a number of cases where I have a Dataset instance and I need to get predictions from some model on every item in the dataset.
The model.predict() API is optimized perfectly for this, as shown in the documentation. However, there seems to be one major catch. I also happen to need the labels to compare with the predicted values, i.e. the dataset contains x,y pairs, and I'd like to end up with (y_predicted, y) pairs after the prediction is complete. This does not seem to be possible with the predict() API though, and I can't think of a clean way to 'split' the dataset so that the x's are fed into the model and the y's are retained to be joined back up with the predicted y's.
EDIT: I know it's quite simple to do by iterating over the dataset manually and calling the model directly, e.g.
for x, y in dataset:
    y_pred = model(x)
    result.append((y, y_pred))
However, this seems like it will be a fair bit slower than using the inbuilt predict() as Tensorflow won't be able to multi-thread/optimize the input pipeline.
Does anyone have a good way to accomplish this?
Given the concerns you mentioned, it may be best to override predict to suit your needs. You don't actually need to override that function itself, though; overriding predict_step, which predict calls, is enough. Just use this class instead of Model:
class MyModel(tf.keras.Model):
    def predict_step(self, data):
        x, y = data
        return self(x, training=False), y
If your model is currently Sequential, inherit from that instead. Basically the only change I made from the default implementation is to add , y to the model call result.
Note that this also makes some assumptions, such as that your dataset consists of (input, label) batch pairs. You may need to adapt it slightly to your needs. Here is a minimal example:
import tensorflow as tf
import numpy as np
(imgs, lbls), (te_imgs, te_lbls) = tf.keras.datasets.mnist.load_data()
imgs = imgs.astype(np.float32).reshape((-1, 784)) / 255.
te_imgs = te_imgs.astype(np.float32).reshape((-1, 784)) / 255.
lbls = lbls.astype(np.int32)
te_lbls = te_lbls.astype(np.int32)
tr_data = tf.data.Dataset.from_tensor_slices((imgs, lbls)).shuffle(60000).batch(128)
te_data = tf.data.Dataset.from_tensor_slices((te_imgs, te_lbls)).batch(128)
class MyModel(tf.keras.Model):
    def predict_step(self, data):
        x, y = data
        return self(x, training=False), y
inp = tf.keras.Input((784,))
logits = tf.keras.layers.Dense(10)(inp)
model = MyModel(inp, logits)
opt = tf.keras.optimizers.Adam()
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(loss=loss, optimizer=opt)
something = model.predict(te_data)
print(something[0].shape, something[1].shape)
This shows ((10000, 10), (10000,)) -- predict now returns a tuple of outputs, labels (this can be confirmed by inspecting the returned labels and comparing to the images in the test set).

For loop in tensorflow/ keras

I am trying to use a for loop within a model definition (and attempting to recreate TabNet in keras).
class TabNet(keras.Model):
    def __init__(self, input_dim, output_dim, steps, n_d, n_a, gamma=1.3):
        super().__init__()
        self.n_d, self.n_a, self.steps = n_d, n_a, steps
        self.shared = SharedBlock(n_d+n_a)
        self.first_block = SharedBlock(n_a)
        self.decision_blocks = [DecisionBlock(n_d+n_a)] * steps
        self.prior_scale = Prior(input_dim, gamma)
        self.bn = layers.BatchNormalization()
        self.attention = [AttentiveTransformer(input_dim)] * steps
        self.final = layers.Dense(output_dim)
        self.eps = 1e-8

    @tf.function
    def call(self, x):
        self.prior_scale.reset()
        final_out = 0
        M_loss = 0
        x = self.bn(x)
        attention = self.first_block(self.shared(x))
        for i in range(self.steps):
            mask = self.attention[i](attention, self.prior_scale.P)
            M_loss += tf.reduce_sum(mask * tf.math.log(mask + self.eps), axis=-1) / self.steps
            prior = self.prior_scale(mask)
            out = self.decision_blocks[i](self.shared(x * prior))
            attention, output = out[:,:self.n_a], out[:,self.n_a:]
            final_out += tf.nn.relu(output)
        return self.final(final_out), M_loss
If you're unaware of what those individual blocks are, simply assume that they are linear layers. I have a colab notebook with the full code if you wish to see what they actually are.
However, I cannot train it, as I am getting the error "iterating over tf.Tensor is not allowed: AutoGraph did not convert this function. Try decorating it directly with @tf.function." I have decorated it, and it still does not help.
I am fairly certain it is the for loop that is causing me the error when I do model.fit(train_x, train_y). Would appreciate any thoughts on how to implement the above for loop in the tensorflow way. tf.while_loop is all I have seen so far and the examples given are fairly simplistic compared to what I want to do.
This is my proposal.
I don't know exactly what your network does, but as far as I can see you want to produce two outputs and combine them inside your loss. One of the outputs is also the result of some hidden operations inside the network (M_loss).
If you want to return two outputs, Keras needs two targets in order to fit. In the code I provide below, the first target is the real labels and the other is a fake target (an array of zeros).
As said before, you are trying to build a combined loss of the form sparse_entropy(y_true, y_pred) - reg_sparse * M_loss. To make this possible, I split the loss into two pieces (one for each output): the sparse part and the M_loss part. The sparse loss is simply SparseCategoricalCrossentropy(from_logits=True) from Keras, while for M_loss I wrote this function following your code:
def m_loss(y_true, y_pred):
    m = tf.reduce_mean(y_pred, keepdims=True)
    return m
m_loss uses only y_pred, which is the hidden quantity computed inside your network. y_true does not matter for this operation, which is why we pass an array of zeros when fitting.
At this point we have to combine the two losses, which is possible in Keras in this way:
reg_sparse = 0.1
model.compile('Adam', loss=[sce, m_loss], loss_weights=[1,-reg_sparse])
model.fit(train_x, [train_y, np.zeros(train_y.shape[0])], epochs=3)
In this case, the final loss is the combination 1*sce + (-reg_sparse)*m_loss.
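As a quick arithmetic check of that weighting, here is a plain-Python sketch. The per-output loss values are made-up numbers for illustration, and combine_losses is a hypothetical helper; it only mirrors the weighted sum that loss_weights describes and is not how Keras computes it internally.

```python
def combine_losses(losses, weights):
    """Weighted sum of per-output losses, mirroring the loss_weights idea."""
    return sum(w * l for w, l in zip(weights, losses))


reg_sparse = 0.1
sce_value = 0.50      # made-up cross-entropy value
m_loss_value = -2.0   # made-up value for the M_loss output

# 1 * sce + (-reg_sparse) * m_loss
total = combine_losses([sce_value, m_loss_value], [1, -reg_sparse])
```

With these numbers the second term contributes -0.1 * -2.0 = 0.2, so the negative weight turns a negative M_loss into a positive penalty.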
this is the full running code: https://colab.research.google.com/drive/152q1rmqTJ0dWLbFN8PqzCBhWkVKirkU5?usp=sharing
I also made a few small changes to TabNet, for example in the way final_out and M_loss are created.
No, actually the for loop is not the problem. I checked your code; the problem is that you forgot to call the superclass constructor in your SharedBlock, DecisionBlock and Prior.
For example, your code should look like this:
class SharedBlock(layers.Layer):
    def __init__(self, units, mult=tf.sqrt(0.5)):
        super().__init__()
        self.layer1 = FCBlock(units)
        self.layer2 = FCBlock(units)
        self.mult = mult
After doing these changes you will not see that error again but something else comes up.
TypeError: in user code:
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:1147 predict_function *
outputs = self.distribute_strategy.run(
<ipython-input-46-f609cb1acdfa>:15 call *
self.prior_scale.reset()
TypeError: tf__reset() missing 1 required positional argument: 'len_x'
To resolve this issue, make the following change in class Prior(layers.Layer):
    def reset(self, len_x=1.0):
        self.P = 1.0
Then you will get another issue.
AttributeError: in user code:
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:1147 predict_function *
outputs = self.distribute_strategy.run(
<ipython-input-46-f609cb1acdfa>:26 call *
out = self.decision[i](self.shared(x * prior))
AttributeError: 'TabNet' object has no attribute 'decision'
For this issue I would ask you to open another question, as I think your main issue is resolved.
UPDATE:
See the comment section of this answer, where a solution has been provided for the issue AttributeError: 'TabNet' object has no attribute 'decision'.
UPDATE: 21/07
I have to disappoint you again that the issue is not with the for loop.
If you look closely at the error log you will see that the issue is due to the full_loss function.
<ipython-input-10-07e59f23d230>:7 full_loss *
logits, M_loss = y_pred
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py:561 __iter__
self._disallow_iteration()
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py:554 _disallow_iteration
self._disallow_when_autograph_enabled("iterating over `tf.Tensor`")
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py:532 _disallow_when_autograph_enabled
" decorating it directly with @tf.function.".format(task))
OperatorNotAllowedInGraphError: iterating over `tf.Tensor` is not allowed: AutoGraph did not convert this function. Try decorating it directly with @tf.function.
The exact problem is caused by the below statement.
logits, M_loss = y_pred
If you use the below code that does not use your loss function you will see a different result.
model.compile('Adam', loss='sparse_categorical_crossentropy')
model.fit(train_x, train_y, batch_size=1)
Received a label value of 1 which is outside the valid range of [0, 1). Label values: 1
[[node sparse_categorical_crossentropy_1/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits (defined at <ipython-input-26-d39f533b7a69>:2) ]] [Op:__inference_train_function_18003]
I do not understand the model code completely, and model.summary() is not that helpful in your case. There seems to be a problem with your last layer; at least, the error message suggests that it does not have enough output units (one per class).
I suggest looking into the last layer and the loss function.
The reason I am sure the for loop is not the cause is that even if you comment out the for loop, you still get the same error.
I hope this helps you further; it took me a few hours to figure out.

`get_variable()` doesn't recognize existing variables for tf.estimator

This question has been asked here; the difference is that my problem is focused on Estimator.
Some context: we trained a model using Estimator and defined some variables inside the Estimator input_fn, which preprocesses data into batches. Now we are moving on to prediction. During prediction, we use the same input_fn to read and process the data, but we get an error saying that the variable (word_embeddings) does not exist, although the variables exist in the checkpoint graph. Here is the relevant bit of code in input_fn:
with tf.variable_scope('vocabulary', reuse=tf.AUTO_REUSE):
    if mode == tf.estimator.ModeKeys.TRAIN:
        word_to_index, word_to_vec = load_embedding(graph_params["word_to_vec"])
        word_embeddings = tf.get_variable(initializer=tf.constant(word_to_vec, dtype=tf.float32),
                                          trainable=False,
                                          name="word_to_vec",
                                          dtype=tf.float32)
    else:
        word_embeddings = tf.get_variable("word_to_vec", dtype=tf.float32)
Basically, in prediction mode the else branch is invoked to load the variable from the checkpoint. The failure to recognize this variable indicates either a) inappropriate use of scope, or b) that the graph has not been restored. I don't think scope matters much here as long as reuse is set properly.
I suspect the cause is that the graph is not yet restored at the input_fn phase. Usually, the graph is restored by calling saver.restore(sess, "/tmp/model.ckpt"). Looking through the Estimator source code did not turn up anything related to restore; the closest I found is MonitoredSession, a wrapper around training. This has already strayed quite far from the original problem and I am not confident I am on the right path, so I would appreciate any insights.
One line summary of my question: How does graph get restored within tf.estimator, via input_fn or model_fn?
Hi, I think your error comes simply from not specifying the shape in tf.get_variable (at predict time). It seems you need to specify the shape even if the variable is going to be restored.
I ran the following test with a simple linear regressor estimator that only needs to predict x + 5:
def input_fn(mode):
    def _input_fn():
        with tf.variable_scope('all_input_fn', reuse=tf.AUTO_REUSE):
            if mode == tf.estimator.ModeKeys.TRAIN:
                var_to_follow = tf.get_variable('var_to_follow', initializer=tf.constant(20))
                x_data = np.random.randn(1000)
                labels = x_data + 5
                return {'x':x_data}, labels
            elif mode == tf.estimator.ModeKeys.PREDICT:
                var_to_follow = tf.get_variable("var_to_follow", dtype=tf.int32, shape=[])
                return {'x':[0,10,100,var_to_follow]}
    return _input_fn
featcols = [tf.feature_column.numeric_column('x')]
model = tf.estimator.LinearRegressor(featcols, './outdir')
This code works perfectly fine: the value of the variable is 20, and for fun I also use it in my test set to confirm :p
However, if you remove shape=[], it breaks. You can also pass a different initializer, such as tf.constant(500); everything still works and the restored value 20 is used.
By running
model.train(input_fn(tf.estimator.ModeKeys.TRAIN), max_steps=10000)
and
preds = model.predict(input_fn(tf.estimator.ModeKeys.PREDICT))
print(next(preds))
You can visualize the graph and you'll see that a) the scoping is normal and b) the graph is restored.
Hope this will help you.

Manipulating nn.Dense() layer parameters manually in MxNet

I'm trying to implement my own optimization algorithm for MxNet (Imperative / Gluon) that does not use gradients. My question is pretty simple: is there an easy way to create a new nn.Dense(...) layer initialized with parameters (i.e. biases and weights) given by two nd.array() instances?
Thank you in advance!
You can create a custom block with parameters that set differentiable=False, and provide the data for initialization through the init argument. See the scales parameter in the example below taken from this tutorial. You can also see an example of FullyConnected which you'll want to use for your dense layer too. F is used to denote a generic backend, typically this would be mx.ndarray, but after hybridization this is set to mx.symbol.
class NormalizationHybridLayer(gluon.HybridBlock):
    def __init__(self, hidden_units, scales):
        super(NormalizationHybridLayer, self).__init__()
        with self.name_scope():
            self.weights = self.params.get('weights',
                                           shape=(hidden_units, 0),
                                           allow_deferred_init=True)
            self.scales = self.params.get('scales',
                                          shape=scales.shape,
                                          init=mx.init.Constant(scales.asnumpy().tolist()), # Convert to regular list to make this object serializable
                                          differentiable=False)

    def hybrid_forward(self, F, x, weights, scales):
        normalized_data = F.broadcast_div(F.broadcast_sub(x, F.min(x)), (F.broadcast_sub(F.max(x), F.min(x))))
        weighted_data = F.FullyConnected(normalized_data, weights, num_hidden=self.weights.shape[0], no_bias=True)
        scaled_data = F.broadcast_mul(scales, weighted_data)
        return scaled_data

How to initialize a keras tensor employed in an API model

I am trying to implement a memory-augmented neural network, in which the memory and the read/write/usage weight vectors are updated according to a combination of their previous values. These weights are different from the classic weight matrices between layers, which are updated automatically by the fit() function. My problem is the following: how can I correctly initialize these weights as Keras tensors and use them in the model? I explain it with the following simplified example.
My API model is something like:
input = Input(shape=(5,6))
controller = LSTM(20, activation='tanh',stateful=False, return_sequences=True)(input)
write_key = Dense(4,activation='tanh')(controller)
read_key = Dense(4,activation='tanh')(controller)
w_w = Add()([w_u, w_r]) #<---- UPDATE OF WRITE WEIGHTS
to_write = Dot()([w_w, write_key])
M = Add()([M,to_write])
cos_sim = Dot()([M,read_key])
w_r = Lambda(lambda x: softmax(x,axis=1))(cos_sim) #<---- UPDATE OF READ WEIGHTS
w_u = Add()([w_u,w_r,w_w]) #<---- UPDATE OF USAGE WEIGHTS
retrieved_memory = Dot()([w_r,M])
controller_output = concatenate([controller,retrieved_memory])
final_output = Dense(6,activation='sigmoid')(controller_output)
You can see that, in order to compute w_w^t, I have to have first defined w_r^{t-1} and w_u^{t-1}. So, at the beginning I have to provide a valid initialization for these vectors. What is the best way to do it? The initializations I would like to have are:
M = K.variable(numpy.zeros((10,4))) # MEMORY
w_r = K.variable(numpy.zeros((1,10))) # READ WEIGHTS
w_u = K.variable(numpy.zeros((1,10))) # USAGE WEIGHTS
But, analogously to what is said in #2486 (entron), these commands do not return a Keras tensor with all the needed metadata, so this produces the following error:
AttributeError: 'NoneType' object has no attribute 'inbound_nodes'
I also thought of feeding the old M, w_r and w_u as additional inputs at each iteration and getting the same variables back as outputs to close the loop. But this means that I have to use the fit() function to train the model online with just the target as the final output (Model 1), and use the predict() function on the model with all the secondary outputs (Model 2) to get the variables to use at the next iteration. I also have to pass the weight matrices from Model 1 to Model 2 using get_weights() and set_weights(). As you can see, it becomes a little messy and too slow.
Do you have any suggestions for this problem?
P.S. Please, do not focus too much on the API model above because it is a simplified (almost meaningless) version of the complete one where I skipped several key steps.