Unable to understand the behavior of the `build` method in TensorFlow Keras layers (tf.keras.layers.Layer)

Layers in TensorFlow Keras have a build method that is used to defer weight creation until you have seen what the input is going to be (see a layer's build method).
I have a few questions I have not been able to find the answers to:
Here it is said that:
If you assign a Layer instance as attribute of another Layer, the outer layer will start tracking the weights of the inner layer.
What does it mean to track the weights of a layer?
The same link also mentions that
We recommend creating such sublayers in the init method (since the sublayers will typically have a build method, they will be built when the outer layer gets built).
Does it mean that while running the build method of the child class (self), there will be an iteration through all the attributes of self, and whichever are found to be instances of (subclassed from) tf.keras.layers.Layer will have their build methods run automatically?
I can run this code:
class Net(tf.keras.Model):
    """A simple linear model."""

    def __init__(self):
        super(Net, self).__init__()
        self.l1 = tf.keras.layers.Dense(5)

    def call(self, x):
        return self.l1(x)

net = Net()
print(net.variables)
But not this:
class Net(tf.keras.Model):
    """A simple linear model."""

    def __init__(self):
        super(Net, self).__init__()
        self.l1 = tf.keras.layers.Dense(5)

    def build(self, input_shape):
        super().build()

    def call(self, x):
        return self.l1(x)

net = Net()
print(net.variables)
Why?

I would say the build mentioned means that when you build a self-defined tf.keras.Model, for example
net = Net()
you get all the tf.keras.layers.Layer objects created in __init__ stored in net, which is a callable object. It thereby becomes a complete object that TF can train later; this is what is meant by tracking. The next time you call net(inputs) you can get your outputs.
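As a minimal sketch of that tracking (the input shape here is an assumption for illustration): before the inner Dense layer is built there is nothing to track, and after the first call its kernel and bias show up on the outer model.
import tensorflow as tf

class Net(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.l1 = tf.keras.layers.Dense(5)  # assigned as an attribute, so it gets tracked

    def call(self, x):
        return self.l1(x)

net = Net()
print(len(net.variables))  # 0: the Dense layer has not been built yet
net(tf.ones((1, 3)))       # the first call builds l1 for input shape (1, 3)
print(len(net.variables))  # 2: the tracked kernel and bias of l1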
Here is an example of a self-defined TensorFlow decoder with attention:
class BahdanauAttention(tf.keras.layers.Layer):
    def __init__(self, units):
        super(BahdanauAttention, self).__init__()
        self.W1 = tf.keras.layers.Dense(units)
        self.W2 = tf.keras.layers.Dense(units)
        self.V = tf.keras.layers.Dense(1)

    def call(self, query, values):
        # query hidden state shape == (batch_size, hidden size)
        # query_with_time_axis shape == (batch_size, 1, hidden size)
        # values shape == (batch_size, max_len, hidden size)
        # we are doing this to broadcast addition along the time axis to calculate the score
        query_with_time_axis = tf.expand_dims(query, 1)
        # score shape == (batch_size, max_length, 1)
        # we get 1 at the last axis because we are applying score to self.V
        # the shape of the tensor before applying self.V is (batch_size, max_length, units)
        score = self.V(tf.nn.tanh(
            self.W1(query_with_time_axis) + self.W2(values)))
        # attention_weights shape == (batch_size, max_length, 1)
        attention_weights = tf.nn.softmax(score, axis=1)
        # context_vector shape after sum == (batch_size, hidden_size)
        context_vector = attention_weights * values
        context_vector = tf.reduce_sum(context_vector, axis=1)
        return context_vector, attention_weights
class Decoder(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, dec_units, batch_sz):
        super(Decoder, self).__init__()
        self.batch_sz = batch_sz
        self.dec_units = dec_units
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.gru = tf.keras.layers.GRU(self.dec_units,
                                       return_sequences=True,
                                       return_state=True,
                                       recurrent_initializer='glorot_uniform')
        self.fc = tf.keras.layers.Dense(vocab_size)
        # used for attention
        self.attention = BahdanauAttention(self.dec_units)

    def call(self, x, hidden, enc_output):
        # enc_output shape == (batch_size, max_length, hidden_size)
        context_vector, attention_weights = self.attention(hidden, enc_output)
        # x shape after passing through embedding == (batch_size, 1, embedding_dim)
        x = self.embedding(x)
        # x shape after concatenation == (batch_size, 1, embedding_dim + hidden_size)
        x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)
        # passing the concatenated vector to the GRU
        output, state = self.gru(x)
        # output shape == (batch_size * 1, hidden_size)
        output = tf.reshape(output, (-1, output.shape[2]))
        # output shape == (batch_size, vocab)
        x = self.fc(output)
        return x, state, attention_weights
I have tried creating a tf.keras.layers.Layer object inside call and got a really poor outcome; I guess that was because a layer created in call is instantiated again on every forward-backward propagation, each time with fresh weights.
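A hypothetical sketch of that pitfall (not code from the question), assuming eager execution: constructing the layer inside call makes a brand-new Dense, with fresh random weights, on every forward pass, so nothing ever gets trained.
class BadNet(tf.keras.Model):
    def call(self, x):
        # a new Dense object (and a new set of weights) is created on every call
        return tf.keras.layers.Dense(5)(x)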

Related

Custom layer in tensorflow to output the running maximum of its inputs

I am trying to create a custom layer in TensorFlow to output the running maximum of its inputs. The layer has a memory variable and a comparison function. I wrote the following:
class ComputeMax(tf.keras.layers.Layer):
    def __init__(self):
        super(ComputeMax, self).__init__()

    def build(self, input_shape):
        self.maxval = tf.Variable(initial_value=tf.zeros((input_shape)),
                                  trainable=False)

    def call(self, inputs):
        self.maxval.assign(tf.maximum(inputs, self.maxval))
        return self.maxval

my_sum = ComputeMax()
x = tf.ones((1, 2))
y = my_sum(x)
print(y.numpy())  # [1, 1]
y = my_sum(x)
print(y.numpy())  # [1, 1]
It works as above. When I try it in a test model:
model = Sequential()
model.add(tf.keras.Input(shape=(2)))
model.add(Dense(1, activation='relu'))
model.add(ComputeMax())
model.compile(optimizer='adam', loss='mse')
I get the error on compile:
ValueError: Cannot convert a partially known TensorShape to a Tensor: (None, 1)
What am I missing?
Actually, the layer needs to know the number of input neurons from the previous layer, which is the last value in input_shape. You are using input_shape as-is, which still includes the (unknown) batch dimension, leading to a variable shaped like the whole batch.
This implementation might help.
class ComputeMax(tf.keras.layers.Layer):
    def __init__(self):
        super(ComputeMax, self).__init__()

    def build(self, input_shape):
        self.maxval = tf.Variable(initial_value=tf.zeros((input_shape[-1])),
                                  trainable=False)

    def call(self, inputs):
        self.maxval.assign(tf.maximum(inputs, self.maxval))
        return self.maxval
But it probably won't give you the same answers as the 1-D NumPy test above.
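If the running update itself also errors out, one possible variant (my assumption, not the asker's spec) is to reduce over the batch axis before updating, so the assigned value always matches maxval's shape; note the layer then returns the running max itself rather than a per-example tensor.
class ComputeMax(tf.keras.layers.Layer):
    def build(self, input_shape):
        self.maxval = tf.Variable(tf.zeros(input_shape[-1]), trainable=False)

    def call(self, inputs):
        # collapse the batch axis so the update matches maxval's shape
        batch_max = tf.reduce_max(inputs, axis=0)
        self.maxval.assign(tf.maximum(batch_max, self.maxval))
        return self.maxval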

How to connect a single neuron in one layer with the single neuron in another layer

I am new to TensorFlow. I want to connect a layer with another layer in which only the corresponding neurons are connected with weights to each other, as shown below. This means not all the neurons in the previous layer are connected to a neuron in the next layer.
Now I get 4 neurons with w_i * x_i. Further, I need to add all these outputs to get a single value. Then I want to pass this single value to a dense layer of size 4 for the autoencoder operation to complete.
I have created my custom layer for the w_i * x_i operation and I am getting it correctly, but when I apply a normal dense layer after the addition I get the following error:
ValueError: Input 0 of layer dense_15 is incompatible with the layer: : expected min_ndim=2, found ndim=1. Full shape received: [100]
Following is my code for the custom layer and model:
class Layer_w_x(tf.keras.layers.Layer):
    def __init__(self):
        super(Layer_w_x, self).__init__()

    def build(self, input_shape):
        self.w = self.add_weight(shape=(input_shape[-1],),
                                 initializer="random_normal", trainable=True)

    def call(self, inputs):
        return tf.multiply(inputs, self.w)

class MyModel(Model):
    def __init__(self, **kwargs):
        super(MyModel, self).__init__(**kwargs)
        self.layer_1 = Layer_w_x()
        self.dense = Dense(4, activation='sigmoid')

    def call(self, inputs):
        # CALCULATION FOR FIRST NEURON
        h1 = self.layer_1(inputs)
        h4 = tf.reduce_sum(h1, 1)
        encoded = self.dense(h4)
        return encoded

model = MyModel()
output = model(my_train_data1)
my_train_data1 has size (100, 4).
You can make your own layer by subclassing from tf.keras.layers.Layer.
Creating custom layers is described in https://www.tensorflow.org/guide/keras/custom_layers_and_models
and https://www.tensorflow.org/tutorials/customization/custom_layers.
I created the layer for you.
import tensorflow as tf

class DirectLayer(tf.keras.layers.Layer):
    def __init__(self):
        super(DirectLayer, self).__init__()

    def build(self, input_shape):
        self.w = self.add_weight(
            shape=input_shape,
            initializer="random_normal",
            trainable=True,
        )
        self.b = self.add_weight(
            shape=(input_shape[-1],), initializer="random_normal", trainable=True
        )

    def call(self, inputs):
        # the output has the same shape as the input
        return tf.multiply(inputs, self.w) + self.b

layer1 = DirectLayer()
layer2 = DirectLayer()
x = tf.ones([16, 2])
y = layer1(x)
y = layer2(y)
tf.print(y.shape)
It outputs TensorShape([16, 2]), which means that it maintains the input's dimensions, which, I believe, is what you want. Notice I used tf.multiply (elementwise multiplication) as opposed to what the Dense layer does, tf.matmul (matrix multiplication).
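As for the ValueError in the question itself: after tf.reduce_sum(h1, 1) the tensor has shape (100,), i.e. ndim=1, but Dense expects at least ndim=2. A hedged fix, keeping the rest of the asker's model unchanged, is to keep a feature axis when summing:
    def call(self, inputs):
        h1 = self.layer_1(inputs)                      # (100, 4), elementwise w_i * x_i
        h4 = tf.reduce_sum(h1, axis=1, keepdims=True)  # (100, 1) instead of (100,)
        encoded = self.dense(h4)                       # Dense(4) now sees ndim=2
        return encoded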

Failed to run the tflite model on Interpreter due to Internal Error

I am trying to build an offline translator for Android. My model is highly inspired by this guide: https://www.tensorflow.org/tutorials/text/nmt_with_attention. I just made some modifications to make sure the model is serialisable. (You can find the code for the model at the end.)
The model works perfectly in my Jupyter notebook. I am using TensorFlow version 2.3.0-dev20200617, and I was also able to generate the tflite file using the following snippet:
converter = tf.lite.TFLiteConverter.from_keras_model(partial_model)
tflite_model = converter.convert()
with tf.io.gfile.GFile('goog_nmt_v2.tflite', 'wb') as f:
    f.write(tflite_model)
However when I used the generated tflite model to get predictions on android, it throws the error java.lang.IllegalArgumentException: Internal error: Failed to run on the given Interpreter: tensorflow/lite/kernels/concatenation.cc:73 t->dims->data[d] != t0->dims->data[d] (8 != 1) Node number 84 (CONCATENATION) failed to prepare.
This is strange because I have provided the exact same input dimensions as I did in my Jupyter notebook. Here is the Java code used to test (with dummy inputs) whether the model runs on Android:
HashMap<Integer, Object> outputVal = new HashMap<>();
for(int i=0; i<2; i++) outputVal.put(i, new float[1][5]);
float[][] inp_test = new float[1][8];
float[][] enc_hidden = new float[1][1024];
float[][] dec_input = new float[1][1];
float[][] dec_test = new float[1][8];
tfLite.runForMultipleInputsOutputs(new Object[] {inp_test,enc_hidden, dec_input, dec_test},outputVal);
And here are my gradle dependencies:
dependencies {
    implementation fileTree(dir: 'libs', include: ['*.jar'])
    implementation 'androidx.appcompat:appcompat:1.1.0'
    implementation 'org.tensorflow:tensorflow-lite:0.0.0-nightly'
    implementation 'org.tensorflow:tensorflow-lite-select-tf-ops:0.0.0-nightly'
    // This dependency adds the necessary TF op support.
    implementation 'androidx.constraintlayout:constraintlayout:1.1.3'
    testImplementation 'junit:junit:4.12'
    androidTestImplementation 'androidx.test.ext:junit:1.1.1'
    androidTestImplementation 'androidx.test.espresso:espresso-core:3.2.0'
}
As the error pointed out, there was something wrong with the dimensions at node 84. So I went ahead and visualised the tflite file using Netron. I have zoomed in on the concatenation node; you can find the pic of the node along with the input and output dimensions here. You can find the whole generated graph here.
As it turns out, the concatenation node at location 84 isn't actually concatenating; you can see this from the input and output dimensions. It just spits out a 1x1x1 matrix after processing 1x1x1 and 1x1x256 matrices. I know the tflite graph isn't the same as the original model graph, since a lot of operations are replaced and even removed for optimisation, but this seems a little odd.
I can't relate this to the error. And if it runs perfectly in Jupyter, is it a framework issue or am I missing something? Also, could anyone please explain what the error means by t->dims->data[d] != t0->dims->data[d]? What is d?
If you have an answer to even one of these questions, please write it. If you require any extra details, please let me know.
Here is the code for the model:
Tx = 8

def Partial_model():
    outputs = []
    X = tf.keras.layers.Input(shape=(Tx,))
    partial = tf.keras.layers.Input(shape=(Tx,))
    enc_hidden = tf.keras.layers.Input(shape=(units,))
    dec_input = tf.keras.layers.Input(shape=(1,))
    d_i = dec_input
    e_h = enc_hidden
    X_i = X
    enc_output, e_h = encoder(X, enc_hidden)
    dec_hidden = enc_hidden
    print(dec_input.shape, 'inp', dec_hidden.shape, 'dec_hidd')
    for t in range(1, Tx):
        print(t, 'tt')
        # passing enc_output to the decoder
        predictions, dec_hidden, _ = decoder(d_i, dec_hidden, enc_output)
        # outputs.append(predictions)
        print(predictions.shape, 'pred')
        d_i = tf.reshape(partial[:, t], (-1, 1))
        print(dec_input.shape, 'dec_input')
        predictions, dec_hidden, _ = decoder(d_i, dec_hidden, enc_output)
        d_i = tf.squeeze(d_i)
        outputs.append(tf.math.top_k(predictions, 5))
    return tf.keras.Model(inputs=[X, enc_hidden, dec_input, partial],
                          outputs=[outputs[0][0], outputs[0][1]])
class Encoder():
    def __init__(self, vocab_size, embedding_dim, enc_units, batch_sz):
        self.batch_sz = batch_sz
        self.enc_units = enc_units
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.gru = tf.keras.layers.GRU(self.enc_units,
                                       return_sequences=True,
                                       return_state=True,
                                       recurrent_initializer='glorot_uniform')

    def __call__(self, x, hidden):
        x = self.embedding(x)
        output, state = self.gru(x, initial_state=hidden)
        print(output.shape, hidden.shape, "out", "hid")
        return output, state

    def initialize_hidden_state(self):
        return tf.zeros((self.batch_sz, self.enc_units))

class BahdanauAttention():
    def __init__(self, units):
        self.W1 = tf.keras.layers.Dense(units)
        self.W2 = tf.keras.layers.Dense(units)
        self.V = tf.keras.layers.Dense(1)

    def __call__(self, query, values):
        # query hidden state shape == (batch_size, hidden size)
        # query_with_time_axis shape == (batch_size, 1, hidden size)
        # values shape == (batch_size, max_len, hidden size)
        # we are doing this to broadcast addition along the time axis to calculate the score
        print(query.shape, 'shape')
        query_with_time_axis = tf.expand_dims(query, 1)
        # score shape == (batch_size, max_length, 1)
        # we get 1 at the last axis because we are applying score to self.V
        # the shape of the tensor before applying self.V is (batch_size, max_length, units)
        print("2")
        score = self.V(tf.nn.tanh(
            self.W1(query_with_time_axis) + self.W2(values)))
        print("3")
        # attention_weights shape == (batch_size, max_length, 1)
        attention_weights = tf.nn.softmax(score, axis=1)
        # context_vector shape after sum == (batch_size, hidden_size)
        context_vector = attention_weights * values
        context_vector = tf.reduce_sum(context_vector, axis=1)
        return context_vector, attention_weights

class Decoder():
    def __init__(self, vocab_size, embedding_dim, dec_units, batch_sz):
        self.dec_units = dec_units
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.gru = tf.keras.layers.GRU(self.dec_units,
                                       return_sequences=True,
                                       return_state=True,
                                       recurrent_initializer='glorot_uniform')
        self.fc = tf.keras.layers.Dense(vocab_size)
        # used for attention
        self.attention = BahdanauAttention(self.dec_units)

    def __call__(self, x, hidden, enc_output):
        # enc_output shape == (batch_size, max_length, hidden_size)
        context_vector, attention_weights = self.attention(hidden, enc_output)
        print(context_vector.shape, 'c_v', attention_weights.shape, "attention_w")
        # x shape after passing through embedding == (batch_size, 1, embedding_dim)
        x = self.embedding(x)
        # x shape after concatenation == (batch_size, 1, embedding_dim + hidden_size)
        print(x.shape, 'xshape', context_vector.shape, 'context')
        expanded_dims = tf.expand_dims(context_vector, 1)
        x = tf.concat([expanded_dims, x], axis=-1)
        # passing the concatenated vector to the GRU
        output, state = self.gru(x)
        # output shape == (batch_size * 1, hidden_size)
        output = tf.reshape(output, (-1, output.shape[2]))
        # output shape == (batch_size, vocab)
        x = self.fc(output)
        return x, state, attention_weights
You can load the generated .tflite file inside a Python notebook and pass the same inputs as to the Keras model. You should see the exact same outputs, because there is no loss of accuracy during model conversion. If there is a problem there, there will be a problem during the Android operations; if not, everything should work fine. Use the code below, from the TensorFlow guide, to run inference in Python:
import numpy as np
import tensorflow as tf
# Load the TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="converted_model.tflite")
interpreter.allocate_tensors()
# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
# Test the model on random input data.
input_shape = input_details[0]['shape']
input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], input_data)
interpreter.invoke()
# The function `get_tensor()` returns a copy of the tensor data.
# Use `tensor()` in order to get a pointer to the tensor.
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)
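Note that the model in the question has four inputs, so the single set_tensor call above would need to be repeated per input. A hedged sketch (the input ordering is an assumption; inspect input_details to match names to indices, and the dummy shapes follow the Java snippet above):
dummy_inputs = [np.zeros((1, 8), dtype=np.float32),     # inp_test
                np.zeros((1, 1024), dtype=np.float32),  # enc_hidden
                np.zeros((1, 1), dtype=np.float32),     # dec_input
                np.zeros((1, 8), dtype=np.float32)]     # dec_test / partial
for detail, data in zip(input_details, dummy_inputs):
    interpreter.set_tensor(detail['index'], data)
interpreter.invoke()
for detail in output_details:
    print(interpreter.get_tensor(detail['index']).shape)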
Happy coding!

Custom Attention Layer in Keras

I want to create a custom attention layer such that, for the input at any time step, the layer returns the weighted mean of the inputs at all time steps.
For example, I want an input tensor with shape [32, 100, 2048] to go into the layer and come out as a tensor with shape [32, 100, 2048]. I wrote the layer as follows:
import tensorflow as tf
from keras.layers import Layer, Dense
# or
from tensorflow.keras.layers import Layer, Dense

class Attention(Layer):
    def __init__(self, units_att):
        self.units_att = units_att
        self.W = Dense(units_att)
        self.V = Dense(1)
        super().__init__()

    def __call__(self, values):
        t = tf.constant(0, dtype=tf.int32)
        time_steps = tf.shape(values)[1]
        initial_outputs = tf.TensorArray(dtype=tf.float32, size=time_steps)
        initial_att = tf.TensorArray(dtype=tf.float32, size=time_steps)

        def should_continue(t, *args):
            return t < time_steps

        def iteration(t, values, outputs, atts):
            score = self.V(tf.nn.tanh(self.W(values)))
            # attention_weights shape == (batch_size, time_step, 1)
            attention_weights = tf.nn.softmax(score, axis=1)
            # context_vector shape after sum == (batch_size, hidden_size)
            context_vector = attention_weights * values
            context_vector = tf.reduce_sum(context_vector, axis=1)
            outputs = outputs.write(t, context_vector)
            atts = atts.write(t, attention_weights)
            return t + 1, values, outputs, atts

        t, values, outputs, atts = tf.while_loop(should_continue, iteration,
                                                 [t, values, initial_outputs, initial_att])
        outputs = outputs.stack()
        outputs = tf.transpose(outputs, [1, 0, 2])
        atts = atts.stack()
        atts = tf.squeeze(atts, -1)
        atts = tf.transpose(atts, [1, 0, 2])
        return t, values, outputs, atts
For input = tf.constant(2, shape=[32, 100, 2048], dtype=tf.float32) I get the output with shape [32, 100, 2048] in TF2 and [32, None, 2048] in TF1.
For input = Input(shape=(None, 2048)) I get the output with shape [None, None, 2048] in TF1, and in TF2 I get the error
TypeError: 'Tensor' object cannot be interpreted as an integer
Finally, in both cases, I can't use this layer in my model because my model input is Input(shape=(None, 2048)) and I get the error
AttributeError: 'NoneType' object has no attribute '_inbound_nodes'
in TF1; in TF2 I get the same error as above. I create my model with the Keras functional API.
From the code you have shared, it looks like you want to implement Bahdanau's attention layer. You want to attend to all the 'values' (the previous layer's output, i.e. all its hidden states), and your 'query' would be the last hidden state of the decoder. Your code should actually be very simple and should look like:
class Bahdanau(tf.keras.layers.Layer):
    def __init__(self, n):
        super(Bahdanau, self).__init__()
        self.w = tf.keras.layers.Dense(n)
        self.u = tf.keras.layers.Dense(n)
        self.v = tf.keras.layers.Dense(1)

    def call(self, query, values):
        query = tf.expand_dims(query, 1)
        e = self.v(tf.nn.tanh(self.w(query) + self.u(values)))
        a = tf.nn.softmax(e, axis=1)
        c = a * values
        c = tf.reduce_sum(c, axis=1)
        return a, c

## Say we want 10 units in the single-layer MLP determining w, u
attentionlayer = Bahdanau(10)

## Call with inputs: decoder state at t-1 and all encoder hidden states
a, c = attentionlayer(stminus1, hj)
We are not specifying the tensor shape anywhere in the code. This code will return you a context tensor of the same size as 'stminus1', which is the 'query'. It does this after attending to all the 'values' (all hidden states of the encoder) using Bahdanau's attention mechanism.
So, assuming your batch size is 32, timesteps = 100 and embedding dimension = 2048, the shape of stminus1 should be (32, 2048) and the shape of hj should be (32, 100, 2048). The shape of the output context would be (32, 2048). We also return the 100 attention weights, just in case you want to route them to a nice display.
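A quick shape check of the layer above under those same assumptions (dummy zero tensors, batch 32, 100 timesteps, dimension 2048):
stminus1 = tf.zeros((32, 2048))    # decoder state at t-1 (the query)
hj = tf.zeros((32, 100, 2048))     # all encoder hidden states (the values)
a, c = attentionlayer(stminus1, hj)
print(a.shape)  # (32, 100, 1): one attention weight per timestep
print(c.shape)  # (32, 2048): context, same size as the query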
This is the simplest version of 'Attention'. If you have any other intent, please let me know and I will reformat my answer. For more specific details, please refer to https://towardsdatascience.com/create-your-own-custom-attention-layer-understand-all-flavours-2201b5e8be9e

Input to attention in TensorFlow 2.0 tutorial on "Neural machine translation with attention"

There is one question I had when I learned from the example "Neural machine translation with attention".
class Decoder(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, dec_units, batch_sz):
        super(Decoder, self).__init__()
        self.batch_sz = batch_sz
        self.dec_units = dec_units
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.gru = tf.keras.layers.GRU(self.dec_units,
                                       return_sequences=True,
                                       return_state=True,
                                       recurrent_initializer='glorot_uniform')
        self.fc = tf.keras.layers.Dense(vocab_size)
        # used for attention
        self.attention = BahdanauAttention(self.dec_units)

    def call(self, x, hidden, enc_output):
        # enc_output shape == (batch_size, max_length, hidden_size)
        context_vector, attention_weights = self.attention(hidden, enc_output)
        # x shape after passing through embedding == (batch_size, 1, embedding_dim)
        x = self.embedding(x)
        # x shape after concatenation == (batch_size, 1, embedding_dim + hidden_size)
        x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)
        # passing the concatenated vector to the GRU
        output, state = self.gru(x)
        # output shape == (batch_size * 1, hidden_size)
        output = tf.reshape(output, (-1, output.shape[2]))
        # output shape == (batch_size, vocab)
        x = self.fc(output)
        return x, state, attention_weights
Why is the attention weight calculated from encoder_output and encoder_hidden, and why is the context vector concatenated with decoder_embedding? In my opinion, the attention weight should be calculated from encoder_output and every single hidden state of decoder_output, and the context vector should be concatenated with decoder_output.
Maybe I have not understood seq2seq with attention completely?
The attention is called in every step of the decoder. The inputs to the decoder step are:
- the previously decoded token x (or the ground-truth token while training),
- the previous hidden state of the decoder hidden,
- the hidden states of the encoder enc_output.
As you correctly say, the attention takes the single decoder hidden state and all encoder hidden states as input, which gives you the context vector.
context_vector, attention_weights = self.attention(hidden, enc_output)
The context vector gets concatenated with the embedding only after the attention mechanism has been called, when it is used as the input of the GRU cell.
x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)
output, state = self.gru(x)
The variable output will become hidden in the next step of the decoder.
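To make that state flow concrete, here is a hedged sketch of the decoding loop implied above (start_tokens, targets and max_target_len are assumed names, following the tutorial's training loop):
dec_hidden = enc_hidden                      # the decoder starts from the encoder state
dec_input = tf.expand_dims(start_tokens, 1)  # shape (batch, 1)
for t in range(1, max_target_len):
    predictions, dec_hidden, _ = decoder(dec_input, dec_hidden, enc_output)
    # teacher forcing: feed the ground-truth token as the next input
    dec_input = tf.expand_dims(targets[:, t], 1)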