How to apply L2 normalization to a layer in Keras?

I am trying to normalize a layer in my neural network using L2 normalization. I want to divide each element of a specific layer's output by that output's L2 norm (the square root of the sum of its squared elements), and I assume that Keras' l2_normalize can do this: https://www.tensorflow.org/api_docs/python/tf/keras/backend/l2_normalize?version=stable. However, I am not sure how to actually use it, since the documentation has no examples. Other examples I found wrap it in a lambda function, for example Lambda(lambda x: K.l2_normalize(x,axis=1))(previous_layer), but I am not sure why this is needed. I would appreciate help on how keras.backend.l2_normalize should be used and why a Lambda layer might be needed. Thanks!
Here is how I'd want to use it:
autoencoder = Sequential()
# Encoder Layer
autoencoder.add(Dense(encoded_dim, input_shape=(input_dim,),
activation='relu'))
# Normalization - Need help here!
# TODO: Add l2_normalize here
# Decoder Layer
# TODO: Add final output layer here

Do as in the example you mentioned; it's fine.
You need a layer for every operation in the model, and backend operations are no exception. That's the reason for the Lambda layer: Keras needs layers to do its magic.
import keras.backend as K
from keras.layers import Lambda
autoencoder.add(Lambda(lambda x: K.l2_normalize(x, axis=1)))
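To make the axis=1 choice concrete, here is a plain NumPy sketch of what the Lambda computes. It follows the formula documented for tf.math.l2_normalize, x / sqrt(max(sum(x**2), epsilon)), which as far as I know is what the TF2 backend's l2_normalize calls. The helper name l2_normalize_rows and the epsilon value 1e-12 (tf.math.l2_normalize's default) are assumptions for illustration.

```python
import numpy as np

def l2_normalize_rows(x, epsilon=1e-12):
    """Hypothetical helper: divide each row (one sample) of x by its L2 norm.

    Mirrors x / sqrt(max(sum(x**2), epsilon)) computed along axis=1;
    epsilon only guards against dividing an all-zero row by zero.
    """
    x = np.asarray(x, dtype=np.float64)
    squared_sum = np.sum(x ** 2, axis=1, keepdims=True)  # one value per sample
    return x / np.sqrt(np.maximum(squared_sum, epsilon))

# A batch of two encodings: axis 0 is the batch, axis 1 holds the features.
batch = np.array([[3.0, 4.0],
                  [0.0, 0.0]])
out = l2_normalize_rows(batch)
print(out)  # rows become [0.6, 0.8] and [0.0, 0.0]
```

With axis=1 each sample is normalized independently; axis=0 would instead normalize each feature across the batch, which is usually not what you want for an encoding layer.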
If you are planning to get the encoder's output later for other things, I suggest you create the encoder and decoder as separate models:
encoder = Sequential()
#.... add encoder layers ....
#encoder.add(....)
decoder = Sequential()
#.... add decoder layers ....
#decoder.add(....)
Finally, create the autoencoder as another model for training:
from keras.layers import Input
from keras.models import Model
autoInput = Input(shape_of_the_encoder_input)
encodedData = encoder(autoInput)
decodedData = decoder(encodedData)
autoencoder = Model(autoInput, decodedData)
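Putting both pieces together, here is a minimal runnable sketch assuming TensorFlow 2.x with tf.keras. The sizes (input_dim=20, encoded_dim=8), the random training data, and the use of tf.math.l2_normalize instead of K.l2_normalize are my assumptions; tf.math.l2_normalize does the same per-row normalization without depending on keras.backend, whose contents changed in Keras 3.

```python
import numpy as np
import tensorflow as tf

# Hypothetical sizes for illustration only.
input_dim, encoded_dim = 20, 8

# Encoder: Dense followed by L2 normalization wrapped in a Lambda layer.
encoder = tf.keras.Sequential([
    tf.keras.Input(shape=(input_dim,)),
    tf.keras.layers.Dense(encoded_dim, activation='relu'),
    # Divide each sample's code by its L2 norm (axis=1 is the feature axis).
    tf.keras.layers.Lambda(lambda x: tf.math.l2_normalize(x, axis=1)),
])

# Decoder: maps the normalized code back to the input space.
decoder = tf.keras.Sequential([
    tf.keras.Input(shape=(encoded_dim,)),
    tf.keras.layers.Dense(input_dim, activation='linear'),
])

# Autoencoder for training; it reuses the encoder's and decoder's layers (and weights).
auto_input = tf.keras.Input(shape=(input_dim,))
autoencoder = tf.keras.Model(auto_input, decoder(encoder(auto_input)))
autoencoder.compile(optimizer='adam', loss='mse')

data = np.random.rand(64, input_dim).astype('float32')  # random toy data
autoencoder.fit(data, data, epochs=2, verbose=0)

# After training, the encoder alone returns the normalized codes.
codes = encoder.predict(data, verbose=0)
norms = np.linalg.norm(codes, axis=1)
print(norms[:5])
```

Each code has norm 1, except that with a ReLU encoder a sample whose code is all zeros stays all zeros. On TF 2.9 and later, tf.keras.layers.UnitNormalization(axis=-1) should do the same job without a Lambda.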

Related

How to randomly initialize layers in pretrained model?

I am using the Xception model with pretrained ImageNet weights, like so:
model = keras.applications.Xception(
weights='imagenet',
input_shape=(150,150,3)
)
Now I would like to take a specific layer (by its name, using model.get_layer(layerName)) and reinitialize its weights to completely random values.
What is the simplest way to do so, and is it even possible?
You could use a reinitialize function like this:
def reinitialize_layer(model, initializer, layer_name):
    layer = model.get_layer(layer_name)
    layer.set_weights([initializer(shape=w.shape) for w in layer.get_weights()])
Instead of layer_name you could also work with the layer index. You could also extend the function so that it takes a list of layer names, if you want to reinitialize more than one layer.
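The list-of-names variant can be sketched as follows, assuming TF 2.x with tf.keras. The function name reinitialize_layers, the small stand-in model, and its layer names are hypothetical; Constant(0.5) is used only so the effect is easy to see, and for random weights you would pass something like tf.keras.initializers.GlorotUniform() instead.

```python
import numpy as np
import tensorflow as tf

def reinitialize_layers(model, initializer, layer_names):
    """Hypothetical extension of reinitialize_layer for several named layers.

    Every weight of each named layer (kernel, bias, ...) is replaced by a
    fresh array from `initializer` with the same shape.
    """
    for name in layer_names:
        layer = model.get_layer(name)
        # np.asarray turns the tensor returned by the initializer into a NumPy array.
        layer.set_weights([np.asarray(initializer(shape=w.shape))
                           for w in layer.get_weights()])

# Small stand-in model (no download needed); layer names are illustrative.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(3, name="hidden"),
    tf.keras.layers.Dense(2, name="head"),
])
before_hidden = model.get_layer("hidden").get_weights()

reinitialize_layers(model, tf.keras.initializers.Constant(0.5), ["head"])
print(model.get_layer("head").get_weights())
```

Only the listed layers change; every other layer keeps its current (e.g. pretrained) weights.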
Usage example:
import keras
model = keras.applications.Xception(
weights='imagenet',
input_shape=(299,299,3)
)
# zeros as illustrative example, change to something else
initializer = keras.initializers.Zeros()
# check pretrained weights
print(model.get_layer("predictions").get_weights())
# change "predictions" to whatever layer name you like to use instead
reinitialize_layer(model, initializer, "predictions")
# check weights after reinitialization
print(model.get_layer("predictions").get_weights())
model.compile(...)
model.fit(...)

How to use Attention or AdditiveAttention Layers which are given in tensorflow (Keras) for NER task?

I'm making a NER model with a Bi-LSTM and want to add an attention layer to it. What is the right way to fit that attention layer in? Two layers are available: tf.keras.layers.Attention and tf.keras.layers.AdditiveAttention. I think it uses all hidden states from the LSTM as well as the last output, but I'm not quite sure. The code is below; please tell me where I have to put the attention layer. The documentation was not helpful for me, and all other answers use their own CustomAttention() layer.
def build_model(vocab_size:int,n_tags:int,max_len:int,emb_dim:int=300,emb_weights=False,use_elmo:bool=False,use_crf:bool=False,train_embedding:bool=False):
    '''
    Build and return a Keras model based on the given inputs
    args:
        vocab_size: Number of unique words in the vocabulary
        n_tags: No of unique 'y' tags present in the data
        max_len: Maximum length of sentence to use
        emb_dim: Size of embedding dimension
        emb_weights: pretrained Embedding Weights for Embedding Layer. if False, use default
        use_elmo: Whether to use Elmo Embeddings
        use_crf: Whether to use the CRF layer
        train_embedding: Whether to train the embeddings weights
    out:
        Keras model. See comments for each type of loss function and metric to use
    '''
    assert not(isinstance(emb_weights,np.ndarray) and use_elmo), "Either provide embedding weights or use ELMO. Not both"
    inputs = Input(shape=(max_len,))
    if isinstance(emb_weights,np.ndarray):
        x = Embedding(trainable=train_embedding,input_dim=vocab_size, output_dim=emb_dim, input_length=max_len, mask_zero=True, embeddings_initializer=keras.initializers.Constant(emb_weights))(inputs)
    elif use_elmo:
        x = Lambda(ElmoEmbedding, output_shape=(max_len, 1024))(inputs) # Lambda will create a layer based on the function defined
    else: # use default Embeddings
        x = Embedding(input_dim=vocab_size, output_dim=emb_dim, input_length=max_len, mask_zero=True,)(inputs) # n_words = vocab_size
    x = Bidirectional(LSTM(units=50, return_sequences=True,recurrent_dropout=0.1))(x)
    # I think the attention layer will come here but I'm not sure exactly how to implement it here.
    if use_crf:
        try: # If you can not modify your crf.py file, it'll use the second package
            x = Dense(50, activation="relu")(x) # use TimeDistributed(Dense(50, activation="relu"))(x) in case otherwise
            crf = CRF(n_tags) # Instantiate CRF layer
            out = crf(x)
            model = Model(inputs, out)
            return model # use crf_loss and crf_accuracy at compile time
        except:
            output = Dense(n_tags, activation=None)(x)
            crf = CRF_TF2(dtype='float32') # it does not take any n_tags. See the documentation.
            output = crf(output)
            base_model = Model(inputs, output)
            model = ModelWithCRFLoss(base_model) # It has Loss and Metric already. Change the model if you want to use DiceLoss.
            return model # Do not use any metric or loss with this model.compile(). Just use Optimizer and run training
    else:
        out = Dense(n_tags, activation="softmax")(x) # Wrap it around TimeDistributed(Dense()) if you have old versions
        model = Model(inputs, out)
        return model # use "sparse_categorical_crossentropy", "accuracy"
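One common way to wire tf.keras.layers.Attention into a BiLSTM tagger is self-attention at the spot marked in build_model: pass the BiLSTM's full sequence output (return_sequences=True) as both query and value, so each token's representation can draw on every other token, then classify each position. The sketch below assumes TF 2.x and is simplified to the softmax branch (no CRF, ELMo or pretrained embeddings); the sizes and the choice to concatenate the attention context with the LSTM features are my assumptions, not the only valid design.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical sizes for a toy tagging problem.
vocab_size, n_tags, max_len, emb_dim = 100, 5, 12, 16

inputs = layers.Input(shape=(max_len,))
x = layers.Embedding(input_dim=vocab_size, output_dim=emb_dim)(inputs)
x = layers.Bidirectional(layers.LSTM(units=50, return_sequences=True))(x)  # (batch, max_len, 100)
# Self-attention: every time step (query) attends over all time steps (value).
context = layers.Attention()([x, x])  # (batch, max_len, 100)
# Keep the LSTM features and append the attention context for each token.
x = layers.Concatenate()([x, context])
out = layers.Dense(n_tags, activation="softmax")(x)  # one tag distribution per token
model = tf.keras.Model(inputs, out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

tokens = np.random.randint(1, vocab_size, size=(4, max_len))
probs = model.predict(tokens, verbose=0)
print(probs.shape)
```

AdditiveAttention takes the same [query, value] list, so it can be swapped in at the same place. I left out mask_zero to keep the sketch short; with padding, the mask should reach the attention layer, which accepts mask=[query_mask, value_mask].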

Keras remove activation function of last layer

I want to use ResNet50 with Imagenet weights.
The last layer of ResNet50 is (from here)
x = layers.Dense(1000, activation='softmax', name='fc1000')(x)
I need to keep the weights of this layer but remove the softmax function.
I want to manually change it so my last layer looks like this
x = layers.Dense(1000, name='fc1000')(x)
but the weights stay the same.
Currently I build my network like this:
resnet = Sequential([
    Input(shape=(224,224,3)),
    ResNet50(weights='imagenet', input_shape=(224,224,3))
])
I need the Input layer because otherwise the model.compile says that placeholders aren't filled.
Generally there are two ways of achieving this:
Quick way - supported functions:
To change the final layer's activation function, you can pass the classifier_activation argument.
So to get rid of the activation altogether, you can build the model like this:
import tensorflow as tf
resnet = tf.keras.Sequential([
tf.keras.layers.Input(shape=(224,224,3)),
tf.keras.applications.ResNet50(
weights='imagenet',
input_shape=(224,224,3),
pooling="avg",
classifier_activation=None
)
])
This, however, will not work if you want a different function that the classifier_activation parameter does not support (e.g. a custom activation function); with weights='imagenet', only None and 'softmax' are accepted.
To achieve this you can use the workaround solution:
Long way - copy the model's weights
This solution copies the original model's weights onto your custom one. It works because, apart from the activation function, you are not changing the model's architecture, so both models have the same list of weights in the same order.
You need to:
1. Download original model.
2. Save its weights.
3. Declare your modified version of the model (in your case, without the activation function).
4. Set the weights of the new model.
The snippet below illustrates this concept in more detail:
import tensorflow as tf
# 1. Download original resnet
resnet = tf.keras.Sequential([
tf.keras.layers.Input(shape=(224,224,3)),
tf.keras.applications.ResNet50(
weights='imagenet',
input_shape=(224,224,3),
pooling="avg"
)
])
# 2. Hold weights in memory:
imagenet_weights = resnet.get_weights()
# 3. Declare the model, but without softmax
resnet_no_softmax = tf.keras.Sequential([
tf.keras.layers.Input(shape=(224,224,3)),
tf.keras.applications.ResNet50(
include_top=False,
weights=None,  # no second download needed: step 4 overwrites all weights
input_shape=(224,224,3),
pooling="avg"
),
tf.keras.layers.Dense(1000, name='fc1000')
])
# 4. Pass the imagenet weights onto the second resnet
resnet_no_softmax.set_weights(imagenet_weights)
Hope this helps!
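To check that dropping the activation really removes only the softmax, here is a small sketch assuming TF 2.x. It uses weights=None so nothing is downloaded and a hypothetical 64x64 input to keep it fast (with weights='imagenet' and the classifier on top, the input has to be 224x224). It copies the weights of the softmax model into a classifier_activation=None model and compares the outputs.

```python
import numpy as np
import tensorflow as tf

# Hypothetical small input so the check runs quickly; random weights, no download.
shape = (64, 64, 3)
with_softmax = tf.keras.applications.ResNet50(weights=None, input_shape=shape)
logits_model = tf.keras.applications.ResNet50(
    weights=None, input_shape=shape, classifier_activation=None)
# Same architecture apart from the activation, so the weight lists line up.
logits_model.set_weights(with_softmax.get_weights())

x = np.random.rand(2, *shape).astype("float32")
probs = with_softmax(x, training=False).numpy()
logits = logits_model(x, training=False).numpy()
# Applying softmax to the logits should reproduce the original probabilities.
print(np.abs(tf.nn.softmax(logits).numpy() - probs).max())
```

The printed maximum difference should be close to zero, which confirms that the two models differ only in the final activation.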

How can I use TensorFlow's sampled softmax loss function in a Keras model?

I'm training a language model in Keras and would like to speed up training by using sampled softmax as the final activation function in my network. From the TF docs, it looks like I need to supply arguments for weights and biases, but I'm unsure of what is expected as input for these. It seems like I could write a custom function in Keras as follows:
import keras.backend as K
def sampled_softmax(weights, biases, y_true, y_pred, num_sampled, num_classes):
return K.sampled_softmax(weights, biases, y_true, y_pred, num_sampled, num_classes)
However, I'm unsure of how to "plug this in" to my existing network. The architecture for the LM is pretty dead-simple:
model = Sequential()
model.add(Embedding(input_dim=len(vocab), output_dim=256))
model.add(LSTM(1024, return_sequences=True))
model.add(Dense(len(vocab), activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
Given this architecture, could I pass the sampled_softmax function as the loss argument when calling the compile method on the model? Or does this need to be written as a layer that comes after the final fully-connected layer? Any guidance here would be greatly appreciated. Thanks.
The key observation here is that the TensorFlow sampled softmax function returns actual losses, not a set of predictions over the set of possible labels to compare with the ground truth data to then compute losses as a separate step. This makes the model setup a little bit weird.
First, we add a second input layer to the model that encodes the target (training) data a second time as an input, in addition to being the target output. This is used for the labels argument of the sampled_softmax_loss function. It needs to be a Keras input, because it's treated as an input when we go to instantiate and set up the model.
Second, we construct a new custom Keras layer that calls the sampled_softmax_loss function with two Keras layers as its inputs: the output of the dense layer that predicts our classes, and then the second input that contains a copy of the training data. Note that we're doing some serious hackery accessing the _keras_history instance variable to fetch the weight and bias tensors from the output tensor of the original fully-connected layer.
Finally, we have to construct a new "dumb" loss function that ignores the training data and just uses the loss reported by the sampled_softmax_loss function.
Note that because the sampled softmax function returns losses, not class predictions, you can't use this model specification for validation or inference. You'll need to re-use the trained layers from this "training version" in a new specification that applies a standard softmax to the output of the original dense layer (which uses the default linear activation).
There is definitely a more elegant way to do this, but I believe this works, so I figured I'd post it here now as-is rather than wait until I have something that's a little bit neater. For example, you'd probably want to make the number of classes an argument of the SampledSoftmax layer, or better yet, condense this all into the loss function as in the original question and avoid passing in the training data twice.
import tensorflow as tf
from keras.models import Model
from keras.layers import Input, Dense, Layer

class SampledSoftmax(Layer):
    def __init__(self, **kwargs):
        super(SampledSoftmax, self).__init__(**kwargs)

    def call(self, inputs):
        """
        The first input should be the model as it were, and the second the
        target (i.e., a repeat of the training data) to compute the labels
        argument
        """
        # the Dense layer that produced inputs[0]
        dense_layer = inputs[0]._keras_history[0]
        # the labels input to this function is batch size by 1, where the
        # value at position (i, 0) is the index that is true (not zero)
        # e.g., (0, 0, 1) => (2) or (0, 1, 0, 0) => (1)
        return tf.nn.sampled_softmax_loss(
            # TF expects weights of shape [num_classes, dim], while a Keras
            # Dense kernel is [dim, num_classes], so transpose it
            weights=tf.transpose(dense_layer.kernel),
            biases=dense_layer.bias,
            # the hidden activations fed into the Dense layer, shape [batch, dim]
            inputs=dense_layer.input,
            labels=tf.reshape(tf.argmax(inputs[1], 1), [-1, 1]),
            num_sampled=1000,
            num_classes=200000)

def custom_loss(y_true, y_pred):
    # y_pred already holds the per-example sampled losses
    return tf.reduce_mean(y_pred)

num_classes = 200000
hidden_input = Input(shape=(300,))
target_input = Input(shape=(num_classes,))
dense = Dense(num_classes)
outputs = dense(hidden_input)
outputs = SampledSoftmax()([outputs, target_input])
model = Model([hidden_input, target_input], outputs)
model.compile(optimizer='adam', loss=custom_loss)
# train as desired
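As noted above, there is a more elegant way. Here is a sketch assuming TF 2.x with tf.keras, in which the custom layer owns the output projection itself, stored as [num_classes, dim] (the shape tf.nn.sampled_softmax_loss expects), instead of borrowing a Dense layer's weights through _keras_history. The layer name SampledSoftmaxLoss, its full_logits helper, the toy sizes, and the use of integer class ids as labels (instead of a one-hot copy of the targets) are my assumptions.

```python
import numpy as np
import tensorflow as tf

class SampledSoftmaxLoss(tf.keras.layers.Layer):
    """Hypothetical layer that returns per-example sampled softmax losses."""

    def __init__(self, num_classes, num_sampled, **kwargs):
        super().__init__(**kwargs)
        self.num_classes = num_classes
        self.num_sampled = num_sampled

    def build(self, input_shape):
        dim = int(input_shape[0][-1])  # size of the hidden representation
        # tf.nn.sampled_softmax_loss wants weights of shape [num_classes, dim].
        self.kernel = self.add_weight(name="kernel", shape=(self.num_classes, dim),
                                      initializer="glorot_uniform")
        self.bias = self.add_weight(name="bias", shape=(self.num_classes,),
                                    initializer="zeros")

    def call(self, inputs):
        hidden, labels = inputs  # hidden: [batch, dim]; labels: [batch, 1] class ids
        return tf.nn.sampled_softmax_loss(
            weights=tf.convert_to_tensor(self.kernel),
            biases=tf.convert_to_tensor(self.bias),
            labels=tf.cast(labels, tf.int64),
            inputs=hidden,
            num_sampled=self.num_sampled,
            num_classes=self.num_classes)

    def full_logits(self, hidden):
        # Exact logits over all classes, for validation or inference.
        kernel = tf.convert_to_tensor(self.kernel)
        return tf.matmul(hidden, kernel, transpose_b=True) + tf.convert_to_tensor(self.bias)

num_classes, dim = 1000, 32  # hypothetical toy sizes
hidden_in = tf.keras.Input(shape=(dim,))
label_in = tf.keras.Input(shape=(1,), dtype="int64")
sampled = SampledSoftmaxLoss(num_classes, num_sampled=50)
losses = sampled([hidden_in, label_in])
model = tf.keras.Model([hidden_in, label_in], losses)
# The "dumb" loss: the model output already is the loss, so y_true is ignored.
model.compile(optimizer="adam", loss=lambda y_true, y_pred: tf.reduce_mean(y_pred))

h = np.random.rand(16, dim).astype("float32")
y = np.random.randint(0, num_classes, size=(16, 1))
history = model.fit([h, y], np.zeros((16,), dtype="float32"), epochs=1, verbose=0)
print(history.history["loss"])
```

For inference, apply tf.nn.softmax to sampled.full_logits(hidden), which uses the same trained weights over all classes.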

Implementing a many-to-many LSTM in TensorFlow?

I am using TensorFlow to make predictions on time-series data. Say I have 50 tags and I want to predict the next 5 tags.
As shown in the following picture, I want to make it like the 4th structure.
I went through the tutorial demo: Recurrent Neural Networks
But it seems to provide only something like the 5th structure in the picture above, which is different.
Which model should I use? I am thinking of seq2seq models, but I'm not sure if that is the right way.
You are right that you can use a seq2seq model. For brevity, I've written up an example of how to do it in Keras, which can also run on a TensorFlow backend. I haven't run the example, so it might need tweaking. If your tags are one-hot encoded, use a cross-entropy loss instead, with a softmax output of one unit per tag.
from keras.models import Model
from keras.layers import Input, LSTM, RepeatVector, TimeDistributed, Dense
# The input shape is your sequence length and your token embedding size
inputs = Input(shape=(seq_len, embedding_size))
# Build a RNN encoder
encoder = LSTM(128, return_sequences=False)(inputs)
# Repeat the encoding for every input to the decoder
encoding_repeat = RepeatVector(5)(encoder)
# Pass your (5, 128) encoding to the decoder
decoder = LSTM(128, return_sequences=True)(encoding_repeat)
# Output each timestep into a fully connected layer
sequence_prediction = TimeDistributed(Dense(1, activation='linear'))(decoder)
model = Model(inputs, sequence_prediction)
model.compile('adam', 'mse') # Or categorical_crossentropy
model.fit(X_train, y_train)
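For the one-hot tag case mentioned above, here is a runnable variant sketched with tf.keras (TF 2.x assumed): each decoder step gets a softmax over the tag vocabulary, and the model is trained with categorical cross-entropy. The sizes (50 input steps, 5 output steps, 20 distinct tags) and the random one-hot data are hypothetical stand-ins for real sequences.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, LSTM, RepeatVector, TimeDistributed, Dense
from tensorflow.keras.models import Model

# Hypothetical sizes: 50 past tags in, 5 future tags out, 20 distinct tag values.
seq_len, n_future, n_tags = 50, 5, 20

inputs = Input(shape=(seq_len, n_tags))                # one-hot tag history
encoding = LSTM(128)(inputs)                           # (batch, 128)
repeated = RepeatVector(n_future)(encoding)            # (batch, 5, 128)
decoded = LSTM(128, return_sequences=True)(repeated)   # (batch, 5, 128)
# One softmax over the tag vocabulary for each future step.
tag_probs = TimeDistributed(Dense(n_tags, activation='softmax'))(decoded)

model = Model(inputs, tag_probs)
model.compile('adam', 'categorical_crossentropy')

# Random one-hot toy data standing in for real tag sequences.
rng = np.random.default_rng(0)
X_train = tf.keras.utils.to_categorical(rng.integers(0, n_tags, (32, seq_len)), n_tags)
y_train = tf.keras.utils.to_categorical(rng.integers(0, n_tags, (32, n_future)), n_tags)
model.fit(X_train, y_train, epochs=1, verbose=0)

pred = model.predict(X_train[:2], verbose=0)
print(pred.shape)
print(pred.argmax(axis=-1))  # predicted tag id for each of the 5 future steps
```

RepeatVector feeds the same encoding to every decoder step, which is what turns the single encoder state into a fixed-length output sequence.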