Combining a Pre-trained Model with a Custom Model in TF - tensorflow

I have a simple network whose complexity I would like to increase by combining it with a pre-trained model such as InceptionV3. However, once I join them together with the following command:
snn_model = Model(inputs=baseModel.input, outputs=model, name = 'snn')
I face this error:
ValueError: Output tensors of a Functional model must be the output of a TensorFlow `Layer` (thus holding past layer metadata). Found: <tensorflow.python.keras.engine.functional.Functional object at 0x7f82d1804c10>
My Network is as follows:
def build_siamese_model(inputShape, embeddingDim=48):
    # increase model complexity by adding Inception
    # make the network itself generate the embeddings
    # specify the inputs for the feature extractor network
    inputs = Input(inputShape)
    # define the first set of CONV => RELU => POOL => DROPOUT layers
    x = Conv2D(64, (2, 2), padding='same', activation='relu')(inputs)
    x = MaxPooling2D(pool_size=2)(x)
    x = Dropout(0.3)(x)
    # second set of CONV => RELU => POOL => DROPOUT layers
    x = Conv2D(64, (2, 2), padding='same', activation='relu')(x)
    x = MaxPooling2D(pool_size=2)(x)
    x = Dropout(0.3)(x)
    # prepare the final outputs
    pooledOutput = GlobalAveragePooling2D()(x)
    outputs = Dense(embeddingDim)(pooledOutput)
    # build the model
    model = Model(inputs, outputs)
    # return the model to the calling function
    return model
I am combining my network with InceptionV3 as follows:
baseModel = InceptionV3(weights="imagenet", include_top=False, input_shape=(160, 160,3), input_tensor=Input(shape=(160, 160,3)))
snn_model = Model(inputs=baseModel.input, outputs=model, name = 'snn')
Even when I try to connect the models the other way around, by feeding the InceptionV3 output as input to my custom network, I get another error:
ValueError: Negative dimension size caused by subtracting 2 from 1 for '{{node max_pooling2d_62/MaxPool}} = MaxPool[T=DT_FLOAT, data_format="NHWC", explicit_paddings=[], ksize=[1, 2, 2, 1], padding="VALID", strides=[1, 2, 2, 1]](Placeholder)' with input shapes: [?,1,1,64].
So, my idea is to combine a custom model with a pre-trained model to increase complexity and achieve better performance.

A simple Google search for transfer learning will turn up Transfer learning and fine-tuning as the first result. I suggest you read it first, as it covers exactly what you are trying to do.
Basically, you use InceptionV3 as you would a normal layer inside your build_siamese_model function that returns your entire model. Something like this will do:
# specify the inputs for the feature extractor network
inputs = Input(inputShape)
# define the first set of CONV => RELU => POOL => DROPOUT layers
x = baseModel(inputs) # initialized from InceptionV3
x = Conv2D(64,(2,2), padding='same', activation='relu')(x)
x = MaxPooling2D(pool_size=2)(x)
x = Dropout(0.3)(x)
# the rest of your code ...
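For completeness, here is a minimal sketch of what the full function could look like under this approach. The 160x160x3 input shape is taken from the question; freezing the base model is one common transfer-learning choice, not a requirement:
import tensorflow as tf
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.layers import (Input, Conv2D, MaxPooling2D, Dropout,
                                     GlobalAveragePooling2D, Dense)
from tensorflow.keras.models import Model

def build_siamese_model(inputShape=(160, 160, 3), embeddingDim=48):
    # pre-trained feature extractor, used below as if it were a single layer
    baseModel = InceptionV3(weights="imagenet", include_top=False,
                            input_shape=inputShape)
    baseModel.trainable = False  # freeze it for plain transfer learning

    inputs = Input(inputShape)
    x = baseModel(inputs)  # output is a small feature map, e.g. (3, 3, 2048)
    x = Conv2D(64, (2, 2), padding='same', activation='relu')(x)
    x = MaxPooling2D(pool_size=2)(x)  # (3, 3) -> (1, 1); no room for more pooling
    x = Dropout(0.3)(x)
    pooledOutput = GlobalAveragePooling2D()(x)
    outputs = Dense(embeddingDim)(pooledOutput)
    return Model(inputs, outputs)
Note that only one CONV => POOL block is kept: InceptionV3 has already downsampled the 160x160 input to a tiny feature map, so stacking a second pooling layer on top would reproduce the negative-dimension error from the question.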
Again, you should read the documentation to understand how to properly instantiate a pre-trained model and handle its batch normalization layers.

The first error is due to specifying the output incorrectly. The outputs argument must be the output of a layer (a tensor), not a Model object. So, to resolve the first error, the modified code should look like this:
snn_model = Model(inputs=baseModel.input, outputs=model.output, name = 'snn')
The second error indicates that you are shrinking the input tensor through your model layers until it reaches a (1, 1) spatial size; it cannot go further, since each Conv2D or MaxPooling layer makes the input smaller, and the dimension would become negative somewhere in your middle layers. So, try removing some layers, or feed a bigger input size, to avoid reaching a (1, 1) tensor in the middle layers. You can use model.summary() to see each layer's output shape and trace the journey of the input tensor through your layers.
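As a small sketch of the shape arithmetic (assuming the roughly 3x3x2048 feature map that InceptionV3 produces for 160x160 inputs):
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D
from tensorflow.keras.models import Model

inp = Input((3, 3, 2048))                    # InceptionV3 output for 160x160 images
x = Conv2D(64, (2, 2), padding='same')(inp)  # -> (3, 3, 64)
x = MaxPooling2D(pool_size=2)(x)             # -> (1, 1, 64)
# another MaxPooling2D(pool_size=2) here would need at least a 2x2 input,
# which is exactly what triggers the "Negative dimension size" error
Model(inp, x).summary()                      # prints each layer's output shape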

Related

How to use Attention or AdditiveAttention Layers which are given in tensorflow (Keras) for NER task?

I'm making a NER model with Bi-LSTM. I want to use Attention layers with it. What is the right way to fit that Attention layer? There are two layers given: tf.keras.layers.Attention and tf.keras.layers.AdditiveAttention. I think it uses all hidden states from the LSTM as well as the last output, but I'm not quite sure. Below is the code. Please tell me where I have to put that Attention layer. The documentation was not helpful for me; all other answers have used their own CustomAttention() layer.
def build_model(vocab_size:int, n_tags:int, max_len:int, emb_dim:int=300, emb_weights=False, use_elmo:bool=False, use_crf:bool=False, train_embedding:bool=False):
    '''
    Build and return a Keras model based on the given inputs
    args:
        vocab_size: Size of the vocabulary
        n_tags: No of unique 'y' tags present in the data
        max_len: Maximum length of sentence to use
        emb_dim: Size of embedding dimension
        emb_weights: pretrained Embedding Weights for Embedding Layer. if False, use default
        use_elmo: Whether to use Elmo Embeddings
        use_crf: Whether to use the CRF layer
        train_embedding: Whether to train the embeddings weights
    out:
        Keras model. See comments for each type of loss function and metric to use
    '''
    assert not (isinstance(emb_weights, np.ndarray) and use_elmo), "Either provide embedding weights or use ELMO. Not both"
    inputs = Input(shape=(max_len,))
    if isinstance(emb_weights, np.ndarray):
        x = Embedding(trainable=train_embedding, input_dim=vocab_size, output_dim=emb_dim, input_length=max_len, mask_zero=True, embeddings_initializer=keras.initializers.Constant(emb_weights))(inputs)
    elif use_elmo:
        x = Lambda(ElmoEmbedding, output_shape=(max_len, 1024))(inputs)  # Lambda will create a layer based on the function defined
    else:  # use default Embeddings
        x = Embedding(input_dim=vocab_size, output_dim=emb_dim, input_length=max_len, mask_zero=True)(inputs)  # n_words = vocab_size
    x = Bidirectional(LSTM(units=50, return_sequences=True, recurrent_dropout=0.1))(x)
    # I think the attention layer will come here but I'm not sure exactly how to implement it here.
    if use_crf:
        try:  # If you can not modify your crf.py file, it'll use the second package
            x = Dense(50, activation="relu")(x)  # use TimeDistributed(Dense(50, activation="relu"))(x) in case otherwise
            crf = CRF(n_tags)  # Instantiate CRF layer
            out = crf(x)
            model = Model(inputs, out)
            return model  # use crf_loss and crf_accuracy at compile time
        except Exception:
            output = Dense(n_tags, activation=None)(x)
            crf = CRF_TF2(dtype='float32')  # it does not take any n_tags. See the documentation.
            output = crf(output)
            base_model = Model(inputs, output)
            model = ModelWithCRFLoss(base_model)  # It has Loss and Metric already. Change the model if you want to use DiceLoss.
            return model  # Do not use any metric or loss with model.compile(). Just use an optimizer and run training
    else:
        out = Dense(n_tags, activation="softmax")(x)  # Wrap it in TimeDistributed(Dense()) if you have old versions
        model = Model(inputs, out)
        return model  # use "sparse_categorical_crossentropy", "accuracy"
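For reference, one common placement for tf.keras.layers.Attention is self-attention over the Bi-LSTM outputs; a minimal sketch follows (the sizes and the Concatenate merge are illustrative choices, not from the question):
import tensorflow as tf
from tensorflow.keras.layers import (Input, Embedding, Bidirectional, LSTM,
                                     Attention, Concatenate, Dense)

max_len, vocab_size, n_tags = 50, 10000, 17  # hypothetical sizes

inputs = Input(shape=(max_len,))
x = Embedding(input_dim=vocab_size, output_dim=300)(inputs)
x = Bidirectional(LSTM(units=50, return_sequences=True))(x)
attn = Attention()([x, x])    # self-attention: query = value = Bi-LSTM outputs
x = Concatenate()([x, attn])  # keeps the (batch, max_len, ...) sequence shape
out = Dense(n_tags, activation="softmax")(x)
model = tf.keras.Model(inputs, out)
AdditiveAttention takes the same [query, value] call signature, so it can be swapped in at the same spot.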

How to specify sample dependent kernels/filters in Conv2d?

I am trying to implement a convolutional autoencoder where some of the convolutional filters are input content dependent. For example, in a simple toy example, knowing the digit label for MNIST could further help with reconstruction in an autoencoder setup.
The more general idea is that there could be some relevant auxiliary information (whether that information is the class label or something else) that is useful to incorporate. While there are various ways to use this label/auxiliary information, I will do so by creating a separate convolutional filter. Let's say the model has 15 typical convolutional filters; I would like to add an additional convolutional filter that corresponds to the MNIST digit and can be thought of as an embedding of the digit in the form of a 3x3 kernel. We would use the digit as an additional input to the network and then learn a distinct kernel/filter embedding for each digit.
However, I am having difficulty implementing a convolutional filter/kernel that is input dependent. I am not using the tf.keras.layers.Conv2D layer because it takes the number of filters to be used, but not the actual filter parameters, so it cannot be made input dependent.
# load and preprocess data
num_classes = 10
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train, x_test = np.float32(x_train)/255, np.float32(x_test)/255
x_train, x_test = np.expand_dims(x_train, axis=-1), np.expand_dims(x_test, axis=-1)
y_train = keras.utils.to_categorical(y_train, num_classes=num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes=num_classes)
num_filters = 15
input_img = layers.Input(shape=(28,28,1))
conv_0 = keras.layers.Conv2D(num_filters, (3,3), strides=2, padding='same', activation='relu')(input_img)
# embed the target as a 3x3 kernel/filter -> this should map to a distinct embedding for
# each target
target = layers.Input(shape=(10,))
target_encoded = layers.Dense(9, activation='relu')(target)
target_encoded = layers.Reshape((3,3,1,1))(target_encoded)
# Using tf.nn.conv2d so that I can specify kernel
# Kernel needs to be a 4D tensor of dimensions (filter_height, filter_width, input_channels, output_channels)
# which in this case is (3,3,1,1)
# However it is currently (None,3,3,1,1) because the first dimension is batch size so this doesn't work
target_conv = tf.nn.conv2d(input_img, target_encoded, strides=[1, 1, 1, 1], padding='SAME')
I am currently using tf.nn.conv2d, which takes a kernel as input in the format (filter_height, filter_width, input_channels, output_channels). However, this doesn't work as is because data is fed in batches. Each sample in the batch has a label and therefore a corresponding kernel, so the kernels are of shape (None, 3, 3, 1, 1), which is not compatible with the expected format. This is illustrated in the code chunk above (which doesn't work). What are potential workarounds? Is there a simpler way to implement this concept of an input-dependent Conv2D filter?
Making A Conv2D with SWAPPABLE kernel!
You'll need to make your own Conv2D that takes as input the image to process AND the kernel to use.
# Define our new Convolution
class DynamicConv2D(tf.keras.layers.Layer):
    def __init__(self, padding='SAME'):
        super(DynamicConv2D, self).__init__()
        self.padding = padding

    def call(self, input, kernel):
        return tf.nn.conv2d(input=input, filters=kernel,
                            strides=(1, 1), padding=self.padding)
And let's test it out
dc2d = DynamicConv2D(padding='VALID')
input_tensor = np.ones([1,4,4,3],dtype=np.float32)
kernel_tensor = np.ones([2,2,3,1],dtype=np.float32)
dc2d(input_tensor, kernel_tensor)
returns
array([[[[12.], [12.], [12.]],
[[12.], [12.], [12.]],
[[12.], [12.], [12.]]]])
It looks like it works great... but there is a HUGE problem
HUGE ISSUE WITH KERAS - BATCH BY DEFAULT
Yeah, so here is the deal: TensorFlow Keras is really, really set on everything being laid out so that the first dimension is the batch. But if you look up above, we have to specify ONE kernel for the whole batch. We can't pass in a batch of kernel tensors, just one.
THERE IS A WORK AROUND!
Let's borrow something from RNN training schemes: we are going to solve this by being careful about what we send per batch. More specifically, for a given batch we are going to make sure all input images use the same kernel_tensor. You'll have to figure out how to do that efficiently with your data pipeline, but here is an example to get you going.
Working Code
(We will rewrite our DynamicConv2D so that it takes a category and stores its own kernel per category.)
# Define our new Convolution
class DynamicConv2D(tf.keras.layers.Layer):
    def __init__(self, padding='SAME', input_dim=10, kernel_shape=[3,3,1,8]):
        super(DynamicConv2D, self).__init__()
        self.padding = padding
        self.input_dim = input_dim
        self.kernel_shape = kernel_shape
        self.kernel_size = kernel_shape[0]*kernel_shape[1]*kernel_shape[2]*kernel_shape[3]  # = 3*3*1*8
        self.category_to_kernel = tf.keras.layers.Embedding(self.input_dim, self.kernel_size)

    def call(self, input, categories):
        just_first_category = tf.slice(categories, (0, 0), (1, 1))
        flat_kernel = self.category_to_kernel(just_first_category)
        kernel = tf.reshape(flat_kernel, self.kernel_shape)
        return tf.nn.conv2d(input=input, filters=kernel, strides=(1, 1), padding=self.padding)
This class by default does a 3x3 convolution, reading in 1 channel from the previous layer and outputting 8.
# Example output
dc2d = DynamicConv2D(padding='VALID')
image_data = np.ones([4,10,10,1],dtype=np.float32)
# prove that you can send in a different category and get different results
print( dc2d(image_data, [[3]]*4).numpy()[0,0,0,:3] )
print( dc2d(image_data, [[4]]*4).numpy()[0,0,0,:3] )
--------
[ 0.014 -0.002 0.108]
[ 0.021 0.014 -0.034]
Use it to make a tf.Keras model
# model input
image_input = tf.keras.Input(shape=(28,28,1), dtype=tf.float32)
category_input = tf.keras.Input(shape=(1,), dtype=tf.int32)
# do covolution
dynamic_conv2d = DynamicConv2D(padding='VALID')(image_input, category_input)
# make the model
model = tf.keras.Model(inputs=[image_input, category_input], outputs=dynamic_conv2d)
And we can use the model like so
# use the model
input_as_tensor = tf.constant(image_data,dtype=tf.float32)
category_as_tensor = tf.constant([[4]]*4,dtype=tf.int32)
result = model.predict(x=(input_as_tensor, category_as_tensor))
print('The output shape is',result.shape)
print('The first 3 values of the first output image are', result[0,0,0,:3])
---------
The output shape is (4, 8, 8, 8)
The first 3 values of the first output image are [-0.028 -0.009 0.015]
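If you really need a different kernel for every sample within a batch (rather than one kernel per batch), another workaround is to map the convolution over the batch dimension. This is just a sketch, assuming a reasonably recent TF 2.x (fn_output_signature requires TF 2.3+) and kernels shaped [batch, kh, kw, c_in, c_out]; it trades speed for flexibility compared to the per-batch scheme above:
import tensorflow as tf

def per_sample_conv2d(images, kernels, padding='SAME'):
    # images:  [batch, h, w, c_in]; kernels: [batch, kh, kw, c_in, c_out]
    def conv_one(pair):
        img, k = pair
        # tf.nn.conv2d wants a 4-D input, so add and strip a batch axis of 1
        return tf.nn.conv2d(img[tf.newaxis], k, strides=(1, 1, 1, 1),
                            padding=padding)[0]
    return tf.map_fn(conv_one, (images, kernels),
                     fn_output_signature=tf.float32)

images = tf.random.normal([4, 10, 10, 1])    # 4 images...
kernels = tf.random.normal([4, 3, 3, 1, 8])  # ...each with its own 3x3 kernel
print(per_sample_conv2d(images, kernels).shape)  # (4, 10, 10, 8)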

Issue with feeding value into placeholder tensor for sess.run()

I want to get the value of an intermediate tensor in a convolutional neural network for a specific input. I know how to do this in keras and even though I have trained a model using keras, I'm going to move towards constructing and training the model using only tensorflow. Therefore, I want to move away from something like K.function(input_layer, output_layer) which is fairly simple, and instead use tensorflow. I believe I should use placeholder values, like the following approach:
with tf.compat.v1.Session(graph=tf.Graph()) as sess:
    loaded_model = tf.keras.models.load_model(filepath)
    graph = tf.compat.v1.get_default_graph()
    images = tf.compat.v1.placeholder(tf.float32, shape=(None, 28, 28, 1))  # to match MNIST images
    output_tensor = graph.get_tensor_by_name(tensor_name)  # tensor_name is 'dense_1/MatMul:0'
    output = sess.run([output_tensor], feed_dict={images: x_test[0:1]})  # x_test[0:1] is of shape (1, 28, 28, 1)
    print(output)
However, I get the following error message for the sess.run() line: Invalid argument: You must feed a value for placeholder tensor 'conv2d_2_input' with dtype float and shape [?,28,28,1]. I am unsure why I get this message, because the image used for feed_dict is of type float and is, I believe, the correct shape. Any help would be appreciated.
You must use the input tensor from the Keras model, not make your own new placeholder, which would be disconnected from the rest of the model:
with tf.Graph().as_default(), tf.compat.v1.Session() as sess:
    # Load model
    loaded_model = tf.keras.models.load_model(filepath)
    # Take model input tensor
    images = loaded_model.input
    # Take output of the second layer (index 1)
    output_tensor = loaded_model.layers[1].output
    # Evaluate
    output = sess.run(output_tensor, feed_dict={images: x_test[0:1]})
    print(output)
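If you don't strictly need a raw session, the same intermediate activation can also be read with a Keras sub-model; a minimal sketch, assuming TF 2.x eager execution (filepath and x_test as in the question):
import tensorflow as tf

loaded_model = tf.keras.models.load_model(filepath)
# sub-model that ends at the layer whose output we want
feature_extractor = tf.keras.Model(inputs=loaded_model.input,
                                   outputs=loaded_model.layers[1].output)
output = feature_extractor(x_test[0:1]).numpy()
print(output)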

How to use embedding models in tensorflow hub with LSTM layer?

I'm learning tensorflow 2, working through the text classification with TF hub tutorial. It uses an embedding module from TF hub. I was wondering if I could modify the model to include an LSTM layer. Here's what I've tried:
train_data, validation_data, test_data = tfds.load(
    name="imdb_reviews",
    split=('train[:60%]', 'train[60%:]', 'test'),
    as_supervised=True)
embedding = "https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1"
hub_layer = hub.KerasLayer(embedding, input_shape=[],
                           dtype=tf.string, trainable=True)
model = tf.keras.Sequential()
model.add(hub_layer)
model.add(tf.keras.layers.Embedding(10000, 50))
model.add(tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)))
model.add(tf.keras.layers.Dense(64, activation='relu'))
model.add(tf.keras.layers.Dense(1))
model.summary()
model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'])
history = model.fit(train_data.shuffle(10000).batch(512),
                    epochs=10,
                    validation_data=validation_data.batch(512),
                    verbose=1)
results = model.evaluate(test_data.batch(512), verbose=2)
for name, value in zip(model.metrics_names, results):
    print("%s: %.3f" % (name, value))
I don't know how to get the vocabulary size from the hub_layer, so I just put 10000 there. When I run it, it throws this exception:
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[480,1] = -6 is not in [0, 10000)
[[node sequential/embedding/embedding_lookup (defined at .../learning/tensorflow/text_classify.py:36) ]] [Op:__inference_train_function_36284]
Errors may have originated from an input operation.
Input Source operations connected to node sequential/embedding/embedding_lookup:
sequential/embedding/embedding_lookup/34017 (defined at Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/contextlib.py:112)
Function call stack:
train_function
I am stuck here. My questions are:
How should I use the embedding module from TF hub to feed an LSTM layer? It looks like the embedding lookup has some issues with the setting.
How do I get the vocabulary size from the hub layer?
Thanks
I finally figured out the way to link pre-trained embeddings to an LSTM or other layers. Posting the steps here in case anyone finds them helpful.
The embedding layer has to be the first layer in the model (hub_layer plays the same role as an Embedding layer). The not very intuitive part is that any text input to the hub layer will be converted to only one vector of shape [embedding_dim]. You need to do sentence splitting and tokenization to make sure whatever goes into the model is a sequence in the form of an array of arrays, e.g. "Let us prepare the data." should be converted to [["let"],["us"],["prepare"],["the"],["data"]]. You will also need to pad the sequences if you are using batch mode.
In addition, you will need to convert your target tokens to int if your training labels are strings. The input to the model is an array of strings with shape [batch, seq_length]; the hub embedding layer converts it to [batch, seq_length, embed_dim]. (If you add an LSTM or another RNN layer, the output from that layer is [batch, seq_length, rnn_units].) The output dense layer will output the index of the text instead of the actual text. The text-to-index mapping is stored in the downloaded tfhub directory as "tokens.txt"; you can load that file and convert text to the corresponding index. Otherwise you cannot compute the loss.
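As a small illustration of the preprocessing described above (plain Python; the padding length is a made-up example):
sentence = "Let us prepare the data."
tokens = [[w] for w in sentence.lower().rstrip(".").split()]
# [['let'], ['us'], ['prepare'], ['the'], ['data']]
max_len = 8  # hypothetical sequence length for batch mode
padded = tokens + [[""]] * (max_len - len(tokens))  # pad with empty strings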

How can I get a tensor output by a tensorflow layer

I created a CNN model using higher level tensorflow layers, like
conv1 = tf.layers.conv2d(...)
maxpooling1 = tf.layers.max_pooling2d(...)
conv2 = tf.layers.conv2d(...)
maxpooling2 = tf.layers.max_pooling2d(...)
flatten = tf.layers.flatten(...)
logits = tf.layers.dense(...)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(...))
optimizer = tf.train.AdadeltaOptimizer(init_lr).minimize(loss)
acc = tf.reduce_mean(...)
The model is well trained and saved; everything is good so far. Next, I want to load this saved model, change the learning rate, and continue training. (I know tensorflow provides the exponential_decay() function for a decaying learning rate; here I just want to be in full control of the learning rate and change it manually.) To do this, my idea is the following:
saver = tf.train.import_meta_graph(...)
saver.restore(sess, tf.train.latest_checkpoint(...))
graph = tf.get_default_graph()
inputImg_ = graph.get_tensor_by_name(...)  # this is a placeholder in the model
labels_ = graph.get_tensor_by_name(...)    # placeholder in the model
logits = graph.get_tensor_by_name(...)     # output of the dense layer
loss = graph.get_tensor_by_name(...)       # loss
optimizer = tf.train.AdadeltaOptimizer(new_lr).minimize(loss)  # I give it a new learning rate
acc = tf.reduce_mean(...)
Now I have a problem. The code above can successfully obtain inputImg_ and labels_, because I named them when I defined them. But I cannot obtain logits, because in logits = tf.layers.dense(name='logits') the name is actually given to the dense layer, not to the output tensor logits. That means I cannot obtain the tensors conv1 and conv2 either. It seems tensorflow cannot name a tensor output by a layer. In this case, is there a way to obtain these tensors, like logits, conv1, maxpooling1? I've searched for an answer for a while but failed.
I was having the same problem and solved it using tf.identity.
Since the dense layer has bias and weight parameters, when you name it, you are naming the layer, not the output tensor.
tf.identity returns a tensor with the same shape and contents as its input.
So just leave the dense layer unnamed and use its output as input to tf.identity:
self.output = tf.layers.dense(hidden_layer3, 2)
self.output = tf.identity(self.output, name='output')
Now you can load the output by name:
output = graph.get_tensor_by_name('output:0')
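The same trick covers the other tensors asked about (conv1, maxpooling1, ...); a sketch of the pattern, using the inputImg_ placeholder from the question:
# at graph-construction time
conv1 = tf.layers.conv2d(inputImg_, 32, (3, 3), activation=tf.nn.relu)
conv1 = tf.identity(conv1, name='conv1')
maxpooling1 = tf.layers.max_pooling2d(conv1, pool_size=2, strides=2)
maxpooling1 = tf.identity(maxpooling1, name='maxpooling1')

# after restoring the graph
conv1 = graph.get_tensor_by_name('conv1:0')
maxpooling1 = graph.get_tensor_by_name('maxpooling1:0')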