Custom Attention Layer using in Keras - tensorflow

I want to create a custom attention layer that for input at any time this layer returns the weighted mean of inputs at all time inputs.
For Example, I want that input tensor with shape [32,100,2048] goes to layer and I get the tensor with the shape [32,100,2048]. I wrote the Layer as follow:
import tensorflow as tf
from keras.layers import Layer, Dense
#or
from tensorflow.keras.layers import Layer, Dense
class Attention(Layer):
def __init__(self, units_att):
self.units_att = units_att
self.W = Dense(units_att)
self.V = Dense(1)
super().__init__()
def __call__(self, values):
t = tf.constant(0, dtype= tf.int32)
time_steps = tf.shape(values)[1]
initial_outputs = tf.TensorArray(dtype=tf.float32, size=time_steps)
initial_att = tf.TensorArray(dtype=tf.float32, size=time_steps)
def should_continue(t, *args):
return t < time_steps
def iteration(t, values, outputs, atts):
score = self.V(tf.nn.tanh(self.W(values)))
# attention_weights shape == (batch_size, time_step, 1)
attention_weights = tf.nn.softmax(score, axis=1)
# context_vector shape after sum == (batch_size, hidden_size)
context_vector = attention_weights * values
context_vector = tf.reduce_sum(context_vector, axis=1)
outputs = outputs.write(t, context_vector)
atts = atts.write(t, attention_weights)
return t + 1, values, outputs, atts
t, values, outputs, atts = tf.while_loop(should_continue, iteration,
[t, values, initial_outputs, initial_att])
outputs = outputs.stack()
outputs = tf.transpose(outputs, [1,0,2])
atts = atts.stack()
atts = tf.squeeze(atts, -1)
atts = tf.transpose(atts, [1,0,2])
return t, values, outputs, atts
For input= tf.constant(2, shape= [32, 100, 2048], dtype= tf.float32) I get the
output with shape = [32,100,2048] in tf2 and [32,None, 2048] in tf1.
For Input input= Input(shape= (None, 2048)) I get the output with shape = [None, None, 2048] in tf1 and I get error
TypeError: 'Tensor' object cannot be interpreted as an integer
in tf2.
Finally, in both cases, I can't use this layer in my model because my model input is Input(shape= (None, 2048)) and I get the error
AttributeError: 'NoneType' object has no attribute '_inbound_nodes'
in tf1 and in tf2 I get the same error as said in above, I create my model with Keras functional method.

From the code you have shared, looks like you want to implement Bahdanau's attention layer in your code. You want to attend to all the 'values' (prev layer output - all its hidden states) and your 'query' would be the last hidden state of the decoder. Your code should actually be very simple and should look like:
class Bahdanau(tf.keras.layers.Layer):
def __init__(self, n):
super(Bahdanau, self).__init__()
self.w = tf.keras.layers.Dense(n)
self.u = tf.keras.layers.Dense(n)
self.v = tf.keras.layers.Dense(1)
def call(self, query, values):
query = tf.expand_dims(query, 1)
e = self.v(tf.nn.tanh(self.w(query) + self.u(values)))
a = tf.nn.softmax(e, axis=1)
c = a * values
c = tf.reduce_sum(c, axis=1)
return a,c
##Say we want 10 units in the single layer MLP determining w,u
attentionlayer = Bahdanau(10)
##Call with i/p: decoderstate # t-1 and all encoder hidden states
a, c = attentionlayer(stminus1, hj)
We are not specifying the tensor shape anywhere in the code. This code will return you a context tensor of same size as 'stminus1' which is the 'query'. It does this after attending to all the 'values' (all output states of decoder) using Bahdanau's attention mechanism.
So assuming your batch size is 32, timesteps=100 and embedding dimension=2048, the shape of stminus1 should be (32,2048) and the shape of the hj should be (32,100,2048). The shape of the output context would be (32,2048). We also returned the 100 attention weights just in case you want to route them to a nice display.
This is the simplest version of 'Attention'. If you have any other intent, please let me know and I will reformat my answer. For more specific details, please refer https://towardsdatascience.com/create-your-own-custom-attention-layer-understand-all-flavours-2201b5e8be9e

Related

How to make a Keras Dense Layer deal with 3D tensor as input for this Softmax Fully Connected Layer?

I am working on a custom problem, and i have to change the fully connected layer (Dense with softmax), My model code is something like this (with Keras Framework):
.......
batch_size = 8
inputs = tf.random.uniform(shape=[batch_size,1024,256],dtype=tf.dtypes.float32)
preds = Dense(num_classes,activation='softmax')(x) #final layer with softmax activation
....
model = Model(inputs=base_model.input,outputs=preds)
So, i have to change the Code of Dense Layer to output a Tensor of probabilities with the shape of [batch_size, 1024, num_classes], without using a for loop, i need it to be optimized and not a consuming time function
The Dense code version that i want to change:
class Dense(Layer):
"""Just your regular densely-connected NN layer.
`Dense` implements the operation:
`output = activation(dot(input, kernel) + bias)`
where `activation` is the element-wise activation function
passed as the `activation` argument, `kernel` is a weights matrix
created by the layer, and `bias` is a bias vector created by the layer
(only applicable if `use_bias` is `True`).
Note: if the input to the layer has a rank greater than 2, then
it is flattened prior to the initial dot product with `kernel`.
# Example
```python
# as first layer in a sequential model:
model = Sequential()
model.add(Dense(32, input_shape=(16,)))
# now the model will take as input arrays of shape (*, 16)
# and output arrays of shape (*, 32)
# after the first layer, you don't need to specify
# the size of the input anymore:
model.add(Dense(32))
```
# Arguments
units: Positive integer, dimensionality of the output space.
activation: Activation function to use
(see [activations](../activations.md)).
If you don't specify anything, no activation is applied
(ie. "linear" activation: `a(x) = x`).
use_bias: Boolean, whether the layer uses a bias vector.
kernel_initializer: Initializer for the `kernel` weights matrix
(see [initializers](../initializers.md)).
bias_initializer: Initializer for the bias vector
(see [initializers](../initializers.md)).
kernel_regularizer: Regularizer function applied to
the `kernel` weights matrix
(see [regularizer](../regularizers.md)).
bias_regularizer: Regularizer function applied to the bias vector
(see [regularizer](../regularizers.md)).
activity_regularizer: Regularizer function applied to
the output of the layer (its "activation").
(see [regularizer](../regularizers.md)).
kernel_constraint: Constraint function applied to
the `kernel` weights matrix
(see [constraints](../constraints.md)).
bias_constraint: Constraint function applied to the bias vector
(see [constraints](../constraints.md)).
# Input shape
nD tensor with shape: `(batch_size, ..., input_dim)`.
The most common situation would be
a 2D input with shape `(batch_size, input_dim)`.
# Output shape
nD tensor with shape: `(batch_size, ..., units)`.
For instance, for a 2D input with shape `(batch_size, input_dim)`,
the output would have shape `(batch_size, units)`.
"""
def __init__(self, units,
activation=None,
use_bias=True,
kernel_initializer='glorot_uniform',
bias_initializer='zeros',
kernel_regularizer=None,
bias_regularizer=None,
activity_regularizer=None,
kernel_constraint=None,
bias_constraint=None,
**kwargs):
if 'input_shape' not in kwargs and 'input_dim' in kwargs:
kwargs['input_shape'] = (kwargs.pop('input_dim'),)
super(Dense, self).__init__(**kwargs)
self.units = units
self.activation = activations.get(activation)
self.use_bias = use_bias
self.kernel_initializer = initializers.get(kernel_initializer)
self.bias_initializer = initializers.get(bias_initializer)
self.kernel_regularizer = regularizers.get(kernel_regularizer)
self.bias_regularizer = regularizers.get(bias_regularizer)
self.activity_regularizer = regularizers.get(activity_regularizer)
self.kernel_constraint = constraints.get(kernel_constraint)
self.bias_constraint = constraints.get(bias_constraint)
self.input_spec = InputSpec(min_ndim=2)
self.supports_masking = True
def build(self, input_shape):
assert len(input_shape) >= 2
input_dim = input_shape[-1]
self.kernel = self.add_weight(shape=(input_dim, self.units),
initializer=self.kernel_initializer,
name='kernel',
regularizer=self.kernel_regularizer,
constraint=self.kernel_constraint)
if self.use_bias:
self.bias = self.add_weight(shape=(self.units,),
initializer=self.bias_initializer,
name='bias',
regularizer=self.bias_regularizer,
constraint=self.bias_constraint)
else:
self.bias = None
self.input_spec = InputSpec(min_ndim=2, axes={-1: input_dim})
self.built = True
def call(self, inputs):
output = K.dot(inputs, self.kernel)
if self.use_bias:
output = K.bias_add(output, self.bias)
if self.activation is not None:
output = self.activation(output)
return output
def compute_output_shape(self, input_shape):
assert input_shape and len(input_shape) >= 2
assert input_shape[-1]
output_shape = list(input_shape)
output_shape[-1] = self.units
return tuple(output_shape)
def get_config(self):
config = {
'units': self.units,
'activation': activations.serialize(self.activation),
'use_bias': self.use_bias,
'kernel_initializer': initializers.serialize(self.kernel_initializer),
'bias_initializer': initializers.serialize(self.bias_initializer),
'kernel_regularizer': regularizers.serialize(self.kernel_regularizer),
'bias_regularizer': regularizers.serialize(self.bias_regularizer),
'activity_regularizer': regularizers.serialize(self.activity_regularizer),
'kernel_constraint': constraints.serialize(self.kernel_constraint),
'bias_constraint': constraints.serialize(self.bias_constraint)
}
base_config = super(Dense, self).get_config()
return dict(list(base_config.items()) + list(config.items()))
There are three different ways in which this can be done (that I can think of). If you want to have a single dense layer, that maps a vector of 256 elements to a vector of num_classes elements, and apply it all across your batch of data (that is, use the same 256 x num_classes matrix of weights for every sample), then you don't need to do anything special, just use a regular Dense layer:
import tensorflow as tf
from tensorflow.keras import Input
from tensorflow.keras.layers import Dense
batch_size = 8
num_classes = 10
inp = Input(shape=(1024, 256))
layer = Dense(num_classes, activation='softmax')
out = layer(inp)
print(out.shape)
# (None, 1024, 10)
print(layer.count_params())
# 2570
Another way would be to have a single huge Dense layer that takes all 1024 * 256 values in at the same time and produces all 1024 * num_classes values at the output, that is, a layer with a matrix of weights with shape (1024 * 256) x (1024 * num_classes) (in the order if gigabytes of memory!). This is easy to do too, although it seems unlikely to be what you need:
import tensorflow as tf
from tensorflow.keras import Input
from tensorflow.keras.layers import Flatten, Dense, Reshape, Softmax
batch_size = 8
num_classes = 10
inp = Input(shape=(1024, 256))
res = Flatten()(inp)
# This takes _a lot_ of memory!
layer = Dense(1024 * num_classes, activation=None)
out_res = layer(res)
# Apply softmax after reshaping
out_preact = Reshape((-1, num_classes))(out_res)
out = Softmax()(out_preact)
print(out.shape)
# (None, 1024, 10)
print(layer.count_params())
# 2684364800
Finally, you may want to have a set of 1024 weight matrices, each one applied to the corresponding sample in the input, which would imply an array of weights with shape (1024, 256, num_classes). I don't think this can be done with one of the standard Keras layers (or don't know how to)1, but it's easy enough to write a custom layer based on Dense to do that:
import tensorflow as tf
from tensorflow.keras.layers import Dense, InputSpec
class Dense2D(Dense):
def __init__(self, *args, **kwargs):
super(Dense2D, self).__init__(*args, **kwargs)
def build(self, input_shape):
assert len(input_shape) >= 3
input_dim1 = input_shape[-2]
input_dim2 = input_shape[-1]
self.kernel = self.add_weight(shape=(input_dim1, input_dim2, self.units),
initializer=self.kernel_initializer,
name='kernel',
regularizer=self.kernel_regularizer,
constraint=self.kernel_constraint)
if self.use_bias:
self.bias = self.add_weight(shape=(input_dim1, self.units),
initializer=self.bias_initializer,
name='bias',
regularizer=self.bias_regularizer,
constraint=self.bias_constraint)
else:
self.bias = None
self.input_spec = InputSpec(min_ndim=3, axes={-2: input_dim1, -1: input_dim2})
self.built = True
def call(self, inputs):
# Multiply each set of weights with each input element
output = tf.einsum('...ij,ijk->...ik', inputs, self.kernel)
if self.use_bias:
output += self.bias
if self.activation is not None:
output = self.activation(output)
return output
def compute_output_shape(self, input_shape):
assert input_shape and len(input_shape) >= 3
assert input_shape[-1]
output_shape = list(input_shape)
output_shape[-1] = self.units
return tuple(output_shape)
You would then use it like this:
import tensorflow as tf
from tensorflow.keras import Input
batch_size = 8
num_classes = 10
inp = Input(shape=(1024, 256))
layer = Dense2D(num_classes, activation='softmax')
out = layer(inp)
print(out.shape)
# (None, 1024, 10)
print(layer.count_params())
# 2631680
1: As today points out in the comments, you can actually use a LocallyConnected1D layer to do the same that I tried to do with my Dense2D layer. It is as simple as this:
import tensorflow as tf
from tensorflow.keras import Input
from tensorflow.keras.layers import LocallyConnected1D
batch_size = 8
num_classes = 10
inp = Input(shape=(1024, 256))
layer = LocallyConnected1D(num_classes, 1, activation='softmax')
out = layer(inp)
print(out.shape)
# (None, 1024, 10)
print(layer.count_params())
# 2631680

Failed to run the tflite model on Interpreter due to Internal Error

I am trying to build an offline translator for android. My model is highly inspired from this guide: https://www.tensorflow.org/tutorials/text/nmt_with_attention. I just did some modifications to make sure the model is serialisable. (You can find the code for the model at the end)
The model works perfectly on my jupyter notebook. I am using Tensorflow version: 2.3.0-dev20200617, I also was able to generate the tflite file using the following snippet:
converter = tf.lite.TFLiteConverter.from_keras_model(partial_model)
tflite_model = converter.convert()
with tf.io.gfile.GFile('goog_nmt_v2.tflite', 'wb') as f:
f.write(tflite_model)
However when I used the generated tflite model to get predictions on android, it throws the error java.lang.IllegalArgumentException: Internal error: Failed to run on the given Interpreter: tensorflow/lite/kernels/concatenation.cc:73 t->dims->data[d] != t0->dims->data[d] (8 != 1) Node number 84 (CONCATENATION) failed to prepare.
This is strange because I have provided the exact same input dimensions as I did in my jupyter notebook. Here is the java code that is used to test (dummy inputs) if model runs on android:
HashMap<Integer, Object> outputVal = new HashMap<>();
for(int i=0; i<2; i++) outputVal.put(i, new float[1][5]);
float[][] inp_test = new float[1][8];
float[][] enc_hidden = new float[1][1024];
float[][] dec_input = new float[1][1];
float[][] dec_test = new float[1][8];
tfLite.runForMultipleInputsOutputs(new Object[] {inp_test,enc_hidden, dec_input, dec_test},outputVal);
And here are my gradle dependencies:
dependencies {
implementation fileTree(dir: 'libs', include: ['*.jar'])
implementation 'androidx.appcompat:appcompat:1.1.0'
implementation 'org.tensorflow:tensorflow-lite:0.0.0-nightly'
implementation 'org.tensorflow:tensorflow-lite-select-tf-ops:0.0.0-nightly'
// This dependency adds the necessary TF op support.
implementation 'androidx.constraintlayout:constraintlayout:1.1.3'
testImplementation 'junit:junit:4.12'
androidTestImplementation 'androidx.test.ext:junit:1.1.1'
androidTestImplementation 'androidx.test.espresso:espresso-core:3.2.0'
}
As the error pointed, there was something wrong with dimensions at node 84. So I went ahead and visualised the tflite file using Netron. I have zoomed the concatenation node, you can find the pic of the node along with input and output dimensions here. You can find the whole generated graph here.
As it turns out, the concatenation node at location 84 isn't actually concatenating, you can see this from the input and output dimensions. It just spits out a 1X1X1 matrix after processing 1X1X1 and 1X1X256 matrix. I know tflite graph isn't same as the original model graph since a lot of operations are replaced and even removed for optimisations but this seems a little odd.
I can't relate this to the error. And if it runs prefectly on jupyter, is it a framework issue or am I missing something? Also, could anyone please explain me what does the error mean by t->dims->data[d] != t0->dims->data[d] what is d?
Please if you have answers to even any one of the question, please write it. If you require any extra details please let me know.
Here is the code for the model:
Tx = 8
def Partial_model():
outputs = []
X = tf.keras.layers.Input(shape=(Tx,))
partial = tf.keras.layers.Input(shape=(Tx,))
enc_hidden = tf.keras.layers.Input(shape=(units,))
dec_input = tf.keras.layers.Input(shape=(1,))
d_i = dec_input
e_h = enc_hidden
X_i = X
enc_output, e_h = encoder(X, enc_hidden)
dec_hidden = enc_hidden
print(dec_input.shape, 'inp', dec_hidden.shape, 'dec_hidd')
for t in range(1, Tx):
print(t, 'tt')
# passing enc_output to the decoder
predictions, dec_hidden, _ = decoder(d_i, dec_hidden, enc_output)
# outputs.append(predictions)
print(predictions.shape, 'pred')
d_i = tf.reshape(partial[:, t], (-1, 1))
print(dec_input.shape, 'dec_input')
predictions, dec_hidden, _ = decoder(d_i, dec_hidden, enc_output)
d_i = tf.squeeze(d_i)
outputs.append(tf.math.top_k(predictions, 5))
return tf.keras.Model(inputs = [X, enc_hidden, dec_input, partial], outputs = [outputs[0][0], outputs[0][1]])
class Encoder():
def __init__(self, vocab_size, embedding_dim, enc_units, batch_sz):
self.batch_sz = batch_sz
self.enc_units = enc_units
self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
self.gru = tf.keras.layers.GRU(self.enc_units,
return_sequences=True,
return_state=True,
recurrent_initializer='glorot_uniform')
def __call__(self, x, hidden):
x = self.embedding(x)
output, state = self.gru(x, initial_state = hidden)
print(output.shape, hidden.shape, "out", "hid")
return output, state
def initialize_hidden_state(self):
return tf.zeros((self.batch_sz, self.enc_units))
class BahdanauAttention():
def __init__(self, units):
self.W1 = tf.keras.layers.Dense(units)
self.W2 = tf.keras.layers.Dense(units)
self.V = tf.keras.layers.Dense(1)
def __call__(self, query, values):
# query hidden state shape == (batch_size, hidden size)
# query_with_time_axis shape == (batch_size, 1, hidden size)
# values shape == (batch_size, max_len, hidden size)
# we are doing this to broadcast addition along the time axis to calculate the score
print(query.shape, 'shape')
query_with_time_axis = tf.expand_dims(query, 1)
# score shape == (batch_size, max_length, 1)
# we get 1 at the last axis because we are applying score to self.V
# the shape of the tensor before applying self.V is (batch_size, max_length, units)
print("2")
score = self.V(tf.nn.tanh(
self.W1(query_with_time_axis) + self.W2(values)))
print("3")
# attention_weights shape == (batch_size, max_length, 1)
attention_weights = tf.nn.softmax(score, axis=1)
# context_vector shape after sum == (batch_size, hidden_size)
context_vector = attention_weights * values
context_vector = tf.reduce_sum(context_vector, axis=1)
return context_vector, attention_weights
class Decoder():
def __init__(self, vocab_size, embedding_dim, dec_units, batch_sz):
self.dec_units = dec_units
self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
self.gru = tf.keras.layers.GRU(self.dec_units,
return_sequences=True,
return_state=True,
recurrent_initializer='glorot_uniform')
self.fc = tf.keras.layers.Dense(vocab_size)
# used for attention
self.attention = BahdanauAttention(self.dec_units)
def __call__(self, x, hidden, enc_output):
# enc_output shape == (batch_size, max_length, hidden_size)
context_vector, attention_weights = self.attention(hidden, enc_output)
print(context_vector.shape, 'c_v', attention_weights.shape, "attention_w")
# x shape after passing through embedding == (batch_size, 1, embedding_dim)
x = self.embedding(x)
# x shape after concatenation == (batch_size, 1, embedding_dim + hidden_size)
print(x.shape, 'xshape', context_vector.shape, 'context')
expanded_dims = tf.expand_dims(context_vector, 1)
x = tf.concat([expanded_dims, x], axis=-1)
# passing the concatenated vector to the GRU
output, state = self.gru(x)
# output shape == (batch_size * 1, hidden_size)
output = tf.reshape(output, (-1, output.shape[2]))
# output shape == (batch_size, vocab)
x = self.fc(output)
return x, state, attention_weights
You can load the generated .tflite file inside python notebook and pass the same inputs as at Keras model. You have to see the exact outputs because during conversion of model there is no loss of accuracy. If there is a problem there...there will be problem during android operations. If not...everything will work fine. Use below code from Tensorflow guide to run inference in Python:
import numpy as np
import tensorflow as tf
# Load the TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="converted_model.tflite")
interpreter.allocate_tensors()
# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
# Test the model on random input data.
input_shape = input_details[0]['shape']
input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], input_data)
interpreter.invoke()
# The function `get_tensor()` returns a copy of the tensor data.
# Use `tensor()` in order to get a pointer to the tensor.
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)
Happy coding!

Unable to understand the behavior of method `build` in tensorflow keras layers (tf.keras.layers.Layer)

Layers in tensorflow keras have a method build that is used to defer the weights creation to a time when you have seen what the input is going to be. a layer's build method
I have a few questions i have not been able to find the answer of:
here it is said that
If you assign a Layer instance as attribute of another Layer, the outer layer will start tracking the weights of the inner layer.
What does it mean to track the weights of a layer?
The same link also mentions that
We recommend creating such sublayers in the init method (since the sublayers will typically have a build method, they will be built when the outer layer gets built).
Does it mean that while running the build method of child class (self), there will an iteration through all the attributes of self and whichever are found to be subclassed from (instances of) tf.keras.layer.Layer will have their build methods run automatically?
I can run this code:
class Net(tf.keras.Model):
"""A simple linear model."""
def __init__(self):
super(Net, self).__init__()
self.l1 = tf.keras.layers.Dense(5)
def call(self, x):
return self.l1(x)
net = Net()
print(net.variables)
But not this:
class Net(tf.keras.Model):
"""A simple linear model."""
def __init__(self):
super(Net, self).__init__()
self.l1 = tf.keras.layers.Dense(5)
def build(self,input_shape):
super().build()
def call(self, x):
return self.l1(x)
net = Net()
print(net.variables)
why?
I would say the build mentioned means, when you build a self-defined tf.keras.Model for example
net = Net()
then you will get all the tf.keras.layers.Layer objects create in __init__, and being stored in net which is a callable object. In this case, it will become a completed object for TF to train later, this is what it said to track. The next time you call net(inputs) you'll can get your outputs.
Here is a example of Tensorflow self-defined decoder with attention
class BahdanauAttention(tf.keras.layers.Layer):
def __init__(self, units):
super(BahdanauAttention, self).__init__()
self.W1 = tf.keras.layers.Dense(units)
self.W2 = tf.keras.layers.Dense(units)
self.V = tf.keras.layers.Dense(1)
def call(self, query, values):
# query hidden state shape == (batch_size, hidden size)
# query_with_time_axis shape == (batch_size, 1, hidden size)
# values shape == (batch_size, max_len, hidden size)
# we are doing this to broadcast addition along the time axis to calculate the score
query_with_time_axis = tf.expand_dims(query, 1)
# score shape == (batch_size, max_length, 1)
# we get 1 at the last axis because we are applying score to self.V
# the shape of the tensor before applying self.V is (batch_size, max_length, units)
score = self.V(tf.nn.tanh(
self.W1(query_with_time_axis) + self.W2(values)))
# attention_weights shape == (batch_size, max_length, 1)
attention_weights = tf.nn.softmax(score, axis=1)
# context_vector shape after sum == (batch_size, hidden_size)
context_vector = attention_weights * values
context_vector = tf.reduce_sum(context_vector, axis=1)
return context_vector, attention_weights
class Decoder(tf.keras.Model):
def __init__(self, vocab_size, embedding_dim, dec_units, batch_sz):
super(Decoder, self).__init__()
self.batch_sz = batch_sz
self.dec_units = dec_units
self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
self.gru = tf.keras.layers.GRU(self.dec_units,
return_sequences=True,
return_state=True,
recurrent_initializer='glorot_uniform')
self.fc = tf.keras.layers.Dense(vocab_size)
# used for attention
self.attention = BahdanauAttention(self.dec_units)
def call(self, x, hidden, enc_output):
# enc_output shape == (batch_size, max_length, hidden_size)
context_vector, attention_weights = self.attention(hidden, enc_output)
# x shape after passing through embedding == (batch_size, 1, embedding_dim)
x = self.embedding(x)
# x shape after concatenation == (batch_size, 1, embedding_dim + hidden_size)
x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)
# passing the concatenated vector to the GRU
output, state = self.gru(x)
# output shape == (batch_size * 1, hidden_size)
output = tf.reshape(output, (-1, output.shape[2]))
# output shape == (batch_size, vocab)
x = self.fc(output)
return x, state, attention_weights
I have tried to put tf.keras.layers.Layer object in call and got really poor outcome, guess that was because if you put it in call then it will be call multiple times while each time a forward-backward propagation happends.

How to use a pre-trained embedding matrix in tensorflow 2.0 RNN as initial weights in an embedding layer?

I'd like to use a pretrained GloVe embedding as the initial weights for an embedding layer in an RNN encoder/decoder. The code is in Tensorflow 2.0. Simply adding the embedding matrix as a weights = [embedding_matrix] parameter to the tf.keras.layers.Embedding layer won't do it because the encoder is an object and I'm not sure now to effectively pass the embedding_matrix to this object at training time.
My code closely follows the neural machine translation example in the Tensorflow 2.0 documentation. How would I add a pre-trained embedding matrix to the encoder in this example? The encoder is an object. When I get to training, the GloVe embedding matrix is unavailable to the Tensorflow graph. I get the error message:
RuntimeError: Cannot get value inside Tensorflow graph function.
The code uses the GradientTape method and teacher forcing in the training process.
I've tried modifying the encoder object to include the embedding_matrix at various points, including in the encoder's init, call and initialize_hidden_state. All of these fail. The other questions on stackoverflow and elsewhere are for Keras or older versions of Tensorflow, not Tensorflow 2.0.
class Encoder(tf.keras.Model):
def __init__(self, vocab_size, embedding_dim, enc_units, batch_sz):
super(Encoder, self).__init__()
self.batch_sz = batch_sz
self.enc_units = enc_units
self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim, weights=[embedding_matrix])
self.gru = tf.keras.layers.GRU(self.enc_units,
return_sequences=True,
return_state=True,
recurrent_initializer='glorot_uniform')
def call(self, x, hidden):
x = self.embedding(x)
output, state = self.gru(x, initial_state = hidden)
return output, state
def initialize_hidden_state(self):
return tf.zeros((self.batch_sz, self.enc_units))
encoder = Encoder(vocab_inp_size, embedding_dim, units, BATCH_SIZE)
# sample input
sample_hidden = encoder.initialize_hidden_state()
sample_output, sample_hidden = encoder(example_input_batch, sample_hidden)
print ('Encoder output shape: (batch size, sequence length, units) {}'.format(sample_output.shape))
print ('Encoder Hidden state shape: (batch size, units) {}'.format(sample_hidden.shape))
# ... Bahdanau Attention, Decoder layers, and train_step defined, see link to full tensorflow code above ...
# Relevant training code
EPOCHS = 10
training_record = pd.DataFrame(columns = ['epoch', 'training_loss', 'validation_loss', 'epoch_time'])
for epoch in range(EPOCHS):
template = 'Epoch {}/{}'
print(template.format(epoch +1,
EPOCHS))
start = time.time()
enc_hidden = encoder.initialize_hidden_state()
total_loss = 0
total_val_loss = 0
for (batch, (inp, targ)) in enumerate(dataset.take(steps_per_epoch)):
batch_loss = train_step(inp, targ, enc_hidden)
total_loss += batch_loss
if batch % 100 == 0:
template = 'batch {} ============== train_loss: {}'
print(template.format(batch +1,
round(batch_loss.numpy(),4)))
I was trying to do the same thing and getting the exact same error. The problem was that weights in the Embedding layer is currently deprecated. Changing weights= to embeddings_initializer= worked for me.
self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim,
embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
trainable=False)
firslty : load pretrained embedding matrix using
def pretrained_embeddings(file_path, EMBEDDING_DIM, VOCAB_SIZE, word2idx):
# 1.load in pre-trained word vectors #feature vector for each word
print("graph in function",tf.get_default_graph())
print('Loading word vectors...')
word2vec = {}
with open(os.path.join(file_path+'.%sd.txt' % EMBEDDING_DIM), errors='ignore', encoding='utf8') as f:
# is just a space-separated text file in the format:
# word vec[0] vec[1] vec[2] ...
for line in f:
values = line.split()
word = values[0]
vec = np.asarray(values[1:], dtype='float32')
word2vec[word] = vec
print('Found %s word vectors.' % len(word2vec))
# 2.prepare embedding matrix
print('Filling pre-trained embeddings...')
num_words = VOCAB_SIZE
# initialization by zeros
embedding_matrix = np.zeros((num_words, EMBEDDING_DIM))
for word, i in word2idx.items():
if i < VOCAB_SIZE:
embedding_vector = word2vec.get(word)
if embedding_vector is not None:
# words not found in embedding index will be all zeros.
embedding_matrix[i] = embedding_vector
return embedding_matrix
2-then update Encoder class as following:
class Encoder(tf.keras.Model):
def __init__(self, vocab_size, embedding_dim, enc_units, batch_sz,embedding_matrix):
super(Encoder, self).__init__()
self.batch_sz = batch_sz
self.enc_units = enc_units
self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim, weights=[embedding_matrix])
self.gru = tf.keras.layers.GRU(self.enc_units,
return_sequences=True,
return_state=True,
recurrent_initializer='glorot_uniform')
def call(self, x, hidden):
x = self.embedding(x)
output, state = self.gru(x, initial_state = hidden)
return output, state
def initialize_hidden_state(self):
return tf.zeros((self.batch_sz, self.enc_units))
3-calling function that loads pre-trained embedding to get embedding matrix
embedding_matrix = pretrained_embeddings(file_path, EMBEDDING_DIM,vocab_size, word2idx)
encoder = Encoder(vocab_inp_size, embedding_dim, units, BATCH_SIZE,embedding_matrix)
# sample input
sample_hidden = encoder.initialize_hidden_state()
sample_output, sample_hidden = encoder(example_input_batch, sample_hidden)
print ('Encoder output shape: (batch size, sequence length, units) {}'.format(sample_output.shape))
print ('Encoder Hidden state shape: (batch size, units) {}'.format(sample_hidden.shape))
Note : this works on tensorflow 1.13.1 well

Finetuning DNN with continuous outputs in the last layer

Greatly appreciate it if someone could help me out here:
I'm trying to do some finetuning on a regression task --- my inputs are 200X200 RGB images and my prediction output/label is a set of real values (let's say, within [0,10], though scaling is not a big deal here...?) --- on top of InceptionV3 architecture. Here are my functions that take a pretrained Inception model, remove the last layer and add a a new layer, set up for finetuning...
"""
Fine-tuning functions
"""
IM_WIDTH, IM_HEIGHT = 299, 299 #fixed size for InceptionV3
NB_EPOCHS = 3
BAT_SIZE = 32
FC_SIZE = 1024
NB_IV3_LAYERS_TO_FREEZE = 172
def eucl_dist(inputs):
x, y = inputs
return ((x - y)**2).sum(axis=-1)
def add_new_last_continuous_layer(base_model):
"""Add last layer to the convnet
Args:
base_model: keras model excluding top, for instance:
base_model = InceptionV3(weights='imagenet',include_top=False)
Returns:
new keras model with last layer
"""
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(FC_SIZE, activation='relu')(x)
predictions = Lambda(eucl_dist, output_shape=(1,))(x)
model = Model(input=base_model.input, output=predictions)
return model
def setup_to_finetune_continuous(model):
"""Freeze the bottom NB_IV3_LAYERS and retrain the remaining top
layers.
note: NB_IV3_LAYERS corresponds to the top 2 inception blocks in
the inceptionv3 architecture
Args:
model: keras model
"""
for layer in model.layers[:NB_IV3_LAYERS_TO_FREEZE]:
layer.trainable = False
for layer in model.layers[NB_IV3_LAYERS_TO_FREEZE:]:
layer.trainable = True
model.compile(optimizer=SGD(lr=0.0001, momentum=0.9),
loss='eucl_dist')
Here are my implementations:
base_model = InceptionV3(weights = "imagenet",
include_top=False, input_shape=(3,200,200))
model0 = add_new_last_continuous_layer(base_model)
setup_to_finetune_continuous(model0)
history=model0.fit(train_x, train_y, validation_data = (test_x, test_y), nb_epoch=epochs, batch_size=32)
scores = model0.evaluate(test_x, test_y, verbose = 0)
features = model0.predict(X_train)
where train_x is a (168435, 3, 200, 200) numpy array and train_y is a (168435,) numpy array. The same goes for test_x and test_y except the number of observations is 42509.
I got the TypeError: Tensor object is not iterable bug which occurred at predictions = Lambda(eucl_dist, output_shape=(1,))(x)'' when going through theadd_new_last_continuous_layer()`` function. Could you anyone kindly give me some guidance to get around that and what the problem is? Greatly appreciated and happy holidays!
EDIT:
Changed the functions to:
def eucl_dist(inputs):
x, y = inputs
return ((x - y)**2).sum(axis=-1)
def add_new_last_continuous_layer(base_model):
"""Add last layer to the convnet
Args:
base_model: keras model excluding top, for instance:
base_model = InceptionV3(weights='imagenet',include_top=False)
Returns:
new keras model with last layer
"""
x = base_model.output
x = GlobalAveragePooling2D()(x)
x1 = Dense(FC_SIZE, activation='relu')(x)
x2 = Dense(FC_SIZE, activation='relu')(x)
predictions = Lambda(eucl_dist, output_shape=eucl_dist_shape)([x1,x2])
model = Model(input=base_model.input, output=predictions)
return model
Your output shape for the lambda layer is wrong. Define your functions like this:
from keras import backend as K
def euclidean_distance(vects):
x, y = vects
return K.sqrt(K.maximum(K.sum(K.square(x - y), axis=1, keepdims=True), K.epsilon()))
def eucl_dist_output_shape(shapes):
shape1, shape2 = shapes
return (shape1[0], 1)
predictions = Lambda(euclidean_distance, output_shape=eucl_dist_output_shape)([input1, input2])