I'm making a NER model with Bi-LSTM. I want to use Attention layers with it. I want to what is the right way to fit that Attention Layer? There are two layers given as: tf.keras.layers.Attention and tf.keras.layers.AdditiveAttention. I think it uses All Hidden states from LSTM as well as the last output but I'm not quite sure. Below is the code. Please tell me where do I have to put that Attention Layer? Documentation was not helpful for me. All other answers have used their own CustomAttention() Layer.
def build_model(vocab_size:int,n_tags:int,max_len:int,emb_dim:int=300,emb_weights=False,use_elmo:bool=False,use_crf:bool=False,train_embedding:bool=False):
'''
Build and return a Keras model based on the given inputs
args:
n_tags: No of unique 'y' tags present in the data
max_len: Maximum length of sentence to use
emb_dim: Size of embedding dimension
emb_weights: pretrained Embedding Weights for Embedding Layer. if False, use default
use_elmo: Whether to use Elmo Embeddings
use_crf: Whether to use the CRF layer
train_embedding: Whether to train the embeddings weights
out:
Keras model. See comments for each type of loss function and metric to use
'''
assert not(isinstance(emb_weights,np.ndarray) and use_elmo), "Either provide embedding weights or use ELMO. Not both"
inputs = Input(shape=(max_len,))
if isinstance(emb_weights,np.ndarray):
x = Embedding(trainable=train_embedding,input_dim=vocab_size, output_dim=emb_dim, input_length=max_len, mask_zero=True, embeddings_initializer=keras.initializers.Constant(emb_weights))(inputs)
elif use_elmo:
x = Lambda(ElmoEmbedding, output_shape=(max_len, 1024))(inputs) # Lambda will create a layer based on the function defined
else: # use default Embeddings
x = Embedding(input_dim=vocab_size, output_dim=emb_dim, input_length=max_len, mask_zero=True,)(inputs) # n_words = vocab_size
x = Bidirectional(LSTM(units=50, return_sequences=True,recurrent_dropout=0.1))(x)
# I think the attention layer will come here but I'm not sure exactly how to implement it here.
if use_crf:
try: # If you can not modify your crf.py file, it'll use the second package
x = Dense(50, activation="relu")(x) # use TimeDistributed(Dense(50, activation="relu")(x)) in case otherwise
crf = CRF(n_tags) # Instantiate CRF layer
out = crf(x)
model = Model(inputs, out)
return model # use crf_loss and crf_accuracy at compile time
except:
output = Dense(n_tags, activation=None)(x)
crf = CRF_TF2(dtype='float32') # it does not take any n_tags. See the documentation.
output = crf(output)
base_model = Model(inputs, output)
model = ModelWithCRFLoss(base_model) # It has Loss and Metric already. Change the model if you want to use DiceLoss.
return model # Do not use any metric or loss with this model.compile(). Just use Optimizer and run training
else:
out = Dense(n_tags, activation="softmax")(x) # Wrap it around TimeDistributed(Dense()) if you have old versions
model = Model(inputs, out)
return model # use "sparse_categorical_crossentropy", "accuracy"
I am trying to implement a convolutional autoencoder where some of the convolutional filters are input content dependent. For example, in a simple toy example, knowing the digit label for MNIST could further help with reconstruction in an autoencoder setup.
The more general idea is that there could be some relevant, auxiliary information (whether the information is the class label or some other information) that that is useful to incorporate. While there are various ways to use this label/auxiliary information, I will do so through creating a separate convolutional filter. Let's say the model has 15 typical convolutional filters, I would like to add an additional convolutional filter that corresponds to the MNIST digit and can be thought of as an embedding of the digit in the form of a 3x3 kernel. We would use the digit as an additional input to the network and then learn a distinct kernel/filter embedding for each digit.
However, I am having difficulty implementing a convolutional filter/kernel that is input dependent. I am not using tf.keras.layers.Conv2D layer because that takes in the # of filters to be used, but not the actual filter parameters to make this input dependent.
# load and preprocess data
num_classes = 10
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train, x_test = np.float32(x_train)/255, np.float32(x_test)/255
x_train, x_test = np.expand_dims(x_train, axis=-1), np.expand_dims(x_test, axis=-1)
y_train = keras.utils.to_categorical(y_train, num_classes=num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes=num_classes)
num_filters = 15
input_img = layers.Input(shape=(28,28,1))
conv_0 = keras.layers.Conv2D(num_filters, (3,3), strides=2, padding='same', activation='relu')(input_img)
# embed the target as a 3x3 kernel/filter -> this should map to a distinct embedding for
# each target
target = layers.Input(shape=(10,))
target_encoded = layers.Dense(9, activation='relu')(target)
target_encoded = layers.Reshape((3,3,1,1))(target_encoded)
# Using tf.nn.conv2d so that I can specify kernel
# Kernel needs to be a 4D tensor of dimensions (filter_height, filter_width, input_channels, output_channels)
# which in this case is (3,3,1,1)
# However it is currently (None,3,3,1,1) because the first dimension is batch size so this doesn't work
target_conv = tf.nn.conv2d(input_img, target_encoded, strides=[1, 1, 1, 1], padding='SAME')
I am currently using tf.nn.conv2d which takes a kernel as input in the format (filter_height, filter_width, input_channels, output_channels). However, this doesn't work as is because data is fed in batches. Therefore, each sample in the batch has a label and therefore a corresponding kernel so the kernels are of shape (None, 3, 3, 1, 1) which is not compatible with the expected format. This is illustrated in the code chunk above (which doesn't work). What are potential work arounds? Is there a simpler way to implement this concept of an input dependent conv2d filter?
Making A Conv2D with SWAPPABLE kernel!
You'll need to make your own Conv2D that takes as input the image to process AND the kernel to use.
# Define our new Convolution
class DynamicConv2D(tf.keras.layers.Layer):
def __init__(self, padding='SAME'):
super(DynamicConv2D, self).__init__()
self.padding = padding
def call(self, input, kernel):
return tf.nn.conv2d(input=input, filters=kernel,
strides=(1,1), padding=self.padding)
And let's test it out
dc2d = DynamicConv2D(padding='VALID')
input_tensor = np.ones([1,4,4,3],dtype=np.float32)
kernel_tensor = np.ones([2,2,3,1],dtype=np.float32)
dc2d(input_tensor, kernel_tensor)
returns
array([[[[12.], [12.], [12.]],
[[12.], [12.], [12.]],
[[12.], [12.], [12.]]]])
It looks like it works great... but there is a HUGE problem
HUGE ISSUE WITH KERAS - BATCH BY DEFAULT
Yeah, so here is the deal: tensorflow keras is really really really set on everything being set up so the first dimension is the batch. But if you look up above we have to specify the ONE KERNEL for the whole batch. We can't pass in a batch of kernel_tensorS, but just one.
THERE IS A WORK AROUND!
Let's borrow something from RNN training schemes, specifically we are going to solve this by being careful about what we send per batch. More specifically, for a batch we are going to make sure all input images use the same kernel_tensor. You'll have to figure out how you do that efficiently with your data pipeline, but here is an example to get you going.
Working Code
(We will rewrite out dynamic conv2d so that it takes a category and stores its
own kernel per category)
# Define our new Convolution
class DynamicConv2D(tf.keras.layers.Layer):
def __init__(self, padding='SAME', input_dim=10, kernel_shape=[3,3,1,8]):
super(DynamicConv2D, self).__init__()
self.padding = padding
self.input_dim = input_dim
self.kernel_shape = kernel_shape
self.kernel_size = kernel_shape[0]*kernel_shape[1]*kernel_shape[2]*kernel_shape[3] # = 3*3*1*8
self.category_to_kernel = tf.keras.layers.Embedding(self.input_dim,self.kernel_size)
def call(self, input, categories):
just_first_category = tf.slice(categories,(0,0),(1,1))
flat_kernel = self.category_to_kernel(just_first_category)
kernel = tf.reshape(flat_kernel,self.kernel_shape)
return tf.nn.conv2d(input=input, filters=kernel, strides=(1,1), padding=self.padding)
This class by default does a 3x3 convolution, reading in 1 filter from the previous layer and outputting 8
# Example output
dc2d = DynamicConv2D(padding='VALID')
image_data = np.ones([4,10,10,1],dtype=np.float32)
# prove that you can send in a different category and get different results
print( dc2d(image_data, [[3]]*4).numpy()[0,0,0,:3] )
print( dc2d(image_data, [[4]]*4).numpy()[0,0,0,:3] )
--------
[ 0.014 -0.002 0.108]
[ 0.021 0.014 -0.034]
Use it to make a tf.Keras model
# model input
image_input = tf.keras.Input(shape=(28,28,1), dtype=tf.float32)
category_input = tf.keras.Input(shape=(1,), dtype=tf.int32)
# do covolution
dynamic_conv2d = DynamicConv2D(padding='VALID')(image_input, category_input)
# make the model
model = tf.keras.Model(inputs=[image_input, category_input], outputs=dynamic_conv2d)
And we can use the model like so
# use the model
input_as_tensor = tf.constant(image_data,dtype=tf.float32)
category_as_tensor = tf.constant([[4]]*4,dtype=tf.int32)
result = model.predict(x=(input_as_tensor, category_as_tensor))
print('The output shape is',result.shape)
print('The first 3 values of the first output image are', result[0,0,0,:3])
---------
The output shape is (4, 8, 8, 8)
The first 3 values of the first output image are [-0.028 -0.009 0.015]
I want to get the value of an intermediate tensor in a convolutional neural network for a specific input. I know how to do this in keras and even though I have trained a model using keras, I'm going to move towards constructing and training the model using only tensorflow. Therefore, I want to move away from something like K.function(input_layer, output_layer) which is fairly simple, and instead use tensorflow. I believe I should use placeholder values, like the following approach:
with tf.compat.v1.Session(graph=tf.Graph()) as sess:
loaded_model = tf.keras.models.load_model(filepath)
graph = tf.compat.v1.get_default_graph()
images = tf.compat.v1.placeholder(tf.float32, shape=(None, 28, 28, 1)) # To specify input at MNIST images
output_tensor = graph.get_tensor_by_name(tensor_name) # tensor_name is 'dense_1/MatMul:0'
output = sess.run([output_tensor], feed_dict={images: x_test[0:1]}) # x_test[0:1] is of shape (1, 28, 28, 1)
print(output)
However, I get the following error message for the sess.run() line: Invalid argument: You must feed a value for placeholder tensor 'conv2d_2_input' with dtype float and shape [?,28,28,1]. I am unsure why I get this message because the image used for feed_dict is of type float and is what I believe to be the correct shape. Any help would be suggested.
You must use the input tensor from the Keras model, not make your own new placeholder, which would be disconnected from the rest of the model:
with tf.Graph().as_default(), tf.compat.v1.Session() as sess:
# Load model
loaded_model = tf.keras.models.load_model(filepath)
# Take model input tensor
images = loaded_model.input
# Take output of the second layer (index 1)
output_tensor = loaded_model.layers[1].output
# Evaluate
output = sess.run(output_tensor, feed_dict={images: x_test[0:1]})
print(output)
I'm learning tensorflow 2 working through the text classification with TF hub tutorial. It used an embedding module from TF hub. I was wondering if I could modify the model to include a LSTM layer. Here's what I've tried:
train_data, validation_data, test_data = tfds.load(
name="imdb_reviews",
split=('train[:60%]', 'train[60%:]', 'test'),
as_supervised=True)
embedding = "https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1"
hub_layer = hub.KerasLayer(embedding, input_shape=[],
dtype=tf.string, trainable=True)
model = tf.keras.Sequential()
model.add(hub_layer)
model.add(tf.keras.layers.Embedding(10000, 50))
model.add(tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)))
model.add(tf.keras.layers.Dense(64, activation='relu'))
model.add(tf.keras.layers.Dense(1))
model.summary()
model.compile(optimizer='adam',
loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
metrics=['accuracy'])
history = model.fit(train_data.shuffle(10000).batch(512),
epochs=10,
validation_data=validation_data.batch(512),
verbose=1)
results = model.evaluate(test_data.batch(512), verbose=2)
for name, value in zip(model.metrics_names, results):
print("%s: %.3f" % (name, value))
I don't know how to get the vocabulary size from the hub_layer. So I just put 10000 there. When run it, it throws this exception:
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[480,1] = -6 is not in [0, 10000)
[[node sequential/embedding/embedding_lookup (defined at .../learning/tensorflow/text_classify.py:36) ]] [Op:__inference_train_function_36284]
Errors may have originated from an input operation.
Input Source operations connected to node sequential/embedding/embedding_lookup:
sequential/embedding/embedding_lookup/34017 (defined at Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/contextlib.py:112)
Function call stack:
train_function
I stuck here. My questions are:
how should I use the embedding module from TF hub to feed an LSTM layer? it looks like embedding lookup has some issues with the setting.
how do I get the vocabulary size from the hub layer?
Thanks
Finally figured out the way to link pre-trained embeddings to LSTM or other layers. Just post the steps here in case anyone feels helpful.
Embedding layer has to be the first layer in the model. (hub_layer is the same as Embedding layer.) The not very intuitive part is that any text input to the hub layer will be converted to only one vector of shape [embedding_dim]. You need to do sentence splitting and tokenization to make sure whatever input to the model is a sequence in the form of array of arrays. e.g., "Let us prepare the data." should be converted to [["let"],["us"],["prepare"], ["the"], ["data"]]. You will also need to pad the sequences if you are using batch mode.
In addition, you will need to convert your target tokens to int if your training labels are strings. The input to the model is array of strings with shape [batch, seq_length], the hub embedding layer converts it to [batch, seq_length, embed_dim]. (If you add a LSTM or other RNN layer, the output from the layer is [batch, seq_length, rnn_units]. ) The output dense layer will output index of text instead of actual text. The index of text is stored in the downloaded tfhub directory as "tokens.txt". You can load the file and convert text to the corresponding index. Otherwise you cannot compute the loss.
I created a CNN model using higher level tensorflow layers, like
conv1 = tf.layers.conv2d(...)
maxpooling1 = tf.layers.max_pooling2d(...)
conv2 = tf.layers.conv2d(...)
maxpooling2 = tf.layers.max_pooling2d(...)
flatten = tf.layers.flatten(...)
logits = tf.layers.dense(...)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(...))
optimizer = tf.train.AdadeltaOptimizer(init_lr).minimize(loss)
acc = tf.reduce_mean(...)
The model is well trained and saved, everything is good so far. Next, I want to load this saved model, make a change to the learning rate, and continue to train (I know tensorflow provides exponential_decay() function to allow a decay learning rate, here i just want to be in full control of learning rate, and change it manually). To do this, my idea is like:
saver = tf.train.import_meta_grah(...)
saver.restore(sess, tf.train.latest_chechpoint(...))
graph = tf.get_default_graph()
inputImg_ = graph.get_tensor_by_name(...) # this is place_holder in model
labels_ = graph.get_tensor_by_name(...) # place_holder in model
logits = graphget_tensor_by_name(...) # output of dense layer
loss = grah.get_tensor_by_name(...) # loss
optimizer = tf.train.AdadeltaOptimizer(new_lr).minimize(loss) # I give it a new learning rate
acc = tf.reduce_mean(...)
Now I got a problem. the code above can successfully obtain inputmg_, labels_, because I named them when I defined them. But I cannot obtain logits because logits = tf.layers.dense(name='logits') the name is actually given to the dense layer instead of the output tensor logits. That means, I cannot obtain the tensor conv1, conv2 either. It seems tensorflow cannot name a tensor output by a layer. In this case, is there a way to obtain these tensors, like logits, conv1, maxpooling1? I've searched for the answer for a while but failed.
I was having the same problem and solved it using tf.identity.
Since the dense layer has bias and weights parameters, when you name it, you are naming the layer, not the output tensor.
The tf.identity returns a tensor with the same shape and contents as input.
So just leave the dense layer unamed and use it as input to the tf.identity
self.output = tf.layers.dense(hidden_layer3, 2)
self.output = tf.identity(self.output, name='output')
Now you can load the output
output = graph.get_tensor_by_name('output:0')