Is it possible to add different behavior for training and testing in keras Functional API - tensorflow

I want different behavior in the functional API for training and testing. Is it possible?
E.g. (pseudocode):
a = Input
b = CONV1(a)
if testing:
    return b
c = CONV2(b)

Yes, this can be achieved by defining a custom Keras layer.
Example code:
import tensorflow as tf
from tensorflow.keras.layers import Dense

class diff_behavior_layer(tf.keras.layers.Layer):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)  # call super() before creating sublayers
        self.dense_1 = Dense(64)

    def call(self, inputs, training=None):
        if training:
            return self.dense_1(inputs)  # extra Dense layer only in training mode
        else:
            return inputs  # identity at inference time

inputs = tf.keras.Input(shape=(2,))
x = Dense(64)(inputs)
x = Dense(64)(x)
x = diff_behavior_layer()(x)
outputs = Dense(64)(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)

data = tf.random.normal((4, 2))  # dummy batch
model(data, training=True)   # flows through 4 Dense layers
model(data, training=False)  # flows through 3 Dense layers
Remark: 'training' must be the name of the keyword argument here. You cannot define your own keyword argument, such as 'testing'.
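A small follow-up sketch (my own addition, not part of the original answer): the built-in Keras training and evaluation loops set this flag automatically, so the two code paths are also selected without passing training by hand:
targets = tf.random.normal((4, 64))  # dummy targets matching the 64-unit output
model.compile(optimizer='adam', loss='mse')
model.fit(data, targets, epochs=1, verbose=0)  # layers are called with training=True
model.evaluate(data, targets, verbose=0)       # layers are called with training=False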

Related

What does self._compute_output_and_mask_jointly = True do in tf.keras.layers.Masking layer?

The tf.keras.layers.Masking layer has _compute_output_and_mask_jointly set to True in its __init__(...). What does this attribute do, other than stating what the layer already does in its call(...)?
def __init__(self, mask_value=0., **kwargs):
    ...
    self._compute_output_and_mask_jointly = True
In addition, the mask has already been created and applied in call(...). What is the purpose of compute_mask(...)? It seems redundant.
def compute_mask(self, inputs, mask=None):
    return tf.reduce_any(tf.not_equal(inputs, self.mask_value), axis=-1)

def call(self, inputs):
    boolean_mask = tf.reduce_any(
        tf.not_equal(inputs, self.mask_value), axis=-1, keepdims=True)
    outputs = inputs * tf.cast(boolean_mask, inputs.dtype)
    # Compute the mask and outputs simultaneously.
    outputs._keras_mask = tf.squeeze(boolean_mask, axis=-1)  # pylint: disable=protected-access
    return outputs
First of all, a hefty, fair warning:
This is an implementation detail, never use it!
It may in fact be on the way out.
Having said that, this is a minor optimization, used by the single layers.Masking class out of all the layer classes there are. It is part of TensorFlow Keras (as opposed to TensorFlow proper). When this attribute is present and set to True on a layer, the Keras framework assumes that the output mask has already been computed in the __call__ invocation and placed into the output tensor's _keras_mask attribute, and it optimizes out a call to the compute_mask method, both in eager and in graph-tracing modes. That is all there is to it. No magic up to eleven.
Actually, creating the _keras_mask attribute on the output tensor has the same effect all by itself. And you'll indeed avoid a nasty surprise one day by setting neither of these attributes.
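To see the mechanism from the outside, here is a minimal sketch (my own addition, assuming TF 2.x eager execution): the mask shows up on the output tensor itself, computed jointly with the output.
import tensorflow as tf

x = tf.constant([[[1.], [0.], [2.]]])          # shape (batch=1, timesteps=3, features=1)
y = tf.keras.layers.Masking(mask_value=0.)(x)  # the zero timestep gets masked
print(y._keras_mask)                           # [[ True False  True]] -- set jointly with the output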

How to use Attention or AdditiveAttention Layers which are given in tensorflow (Keras) for NER task?

I'm making a NER model with a Bi-LSTM and I want to use an Attention layer with it. What is the right way to fit that Attention layer? There are two layers provided: tf.keras.layers.Attention and tf.keras.layers.AdditiveAttention. I think attention uses all the hidden states from the LSTM as well as the last output, but I'm not quite sure. Below is the code. Please tell me where I have to put the Attention layer. The documentation was not helpful for me; all other answers use their own CustomAttention() layer.
def build_model(vocab_size:int, n_tags:int, max_len:int, emb_dim:int=300,
                emb_weights=False, use_elmo:bool=False, use_crf:bool=False,
                train_embedding:bool=False):
    '''
    Build and return a Keras model based on the given inputs
    args:
        n_tags: No of unique 'y' tags present in the data
        max_len: Maximum length of sentence to use
        emb_dim: Size of embedding dimension
        emb_weights: pretrained embedding weights for the Embedding layer. If False, use default
        use_elmo: Whether to use Elmo embeddings
        use_crf: Whether to use the CRF layer
        train_embedding: Whether to train the embedding weights
    out:
        Keras model. See comments for the loss function and metric to use for each type
    '''
    assert not (isinstance(emb_weights, np.ndarray) and use_elmo), "Either provide embedding weights or use ELMO. Not both"
    inputs = Input(shape=(max_len,))
    if isinstance(emb_weights, np.ndarray):
        x = Embedding(trainable=train_embedding, input_dim=vocab_size, output_dim=emb_dim,
                      input_length=max_len, mask_zero=True,
                      embeddings_initializer=keras.initializers.Constant(emb_weights))(inputs)
    elif use_elmo:
        x = Lambda(ElmoEmbedding, output_shape=(max_len, 1024))(inputs)  # Lambda will create a layer based on the function defined
    else:  # use default Embeddings
        x = Embedding(input_dim=vocab_size, output_dim=emb_dim, input_length=max_len, mask_zero=True)(inputs)  # n_words = vocab_size
    x = Bidirectional(LSTM(units=50, return_sequences=True, recurrent_dropout=0.1))(x)
    # I think the attention layer will come here, but I'm not sure exactly how to implement it.
    if use_crf:
        try:  # If you can not modify your crf.py file, it'll use the second package
            x = Dense(50, activation="relu")(x)  # use TimeDistributed(Dense(50, activation="relu"))(x) otherwise
            crf = CRF(n_tags)  # Instantiate CRF layer
            out = crf(x)
            model = Model(inputs, out)
            return model  # use crf_loss and crf_accuracy at compile time
        except Exception:
            output = Dense(n_tags, activation=None)(x)
            crf = CRF_TF2(dtype='float32')  # it does not take any n_tags. See the documentation.
            output = crf(output)
            base_model = Model(inputs, output)
            model = ModelWithCRFLoss(base_model)  # It has loss and metric already. Change the model if you want to use DiceLoss.
            return model  # Do not pass any metric or loss to model.compile(); just set the optimizer and run training
    else:
        out = Dense(n_tags, activation="softmax")(x)  # Wrap it in TimeDistributed(Dense()) if you have old versions
        model = Model(inputs, out)
        return model  # use "sparse_categorical_crossentropy", "accuracy"
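For context, a minimal sketch (an assumption on my part, not from the original post) of one common placement: self-attention over the Bi-LSTM outputs, at the commented line inside build_model above, using the sequence as both query and value so the per-timestep shape needed for tagging is preserved.
# Hedged sketch: insert right after the Bidirectional LSTM line.
# tf.keras.layers.Attention is called with a list [query, value].
attn = tf.keras.layers.Attention()   # or tf.keras.layers.AdditiveAttention()
x = attn([x, x])                     # self-attention: query = value = Bi-LSTM output
# x keeps shape (batch, max_len, 2*50), so the downstream Dense/CRF code is unchanged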

Reusable block in Keras' functional API

The goal is to create a block of layers, using Keras' functional API, which is usable (also syntax-wise) like a 'normal' Keras layer.
Here is a toy example:
from tensorflow.keras import layers as kl

def layer_block(prev_layer, args):
    # some code using 'args'
    layer = kl.Dense(units=prev_layer.shape[1])(prev_layer)
    layer = kl.Dense(units=5)(layer)
    layer = kl.Dense(units=prev_layer.shape[1])(layer)
    return layer
This block is called as layer_block(prev_layer, args), which contradicts the syntax of Keras' functional API. It should rather look like layer_block(args)(prev_layer).
The approach so far is to wrap this block in another function:
def outer_block(args):
    def layer_block(prev_layer, args):
        # some code using 'args'
        layer = kl.Dense(units=prev_layer.shape[1])(prev_layer)
        layer = kl.Dense(units=5)(layer)
        layer = kl.Dense(units=prev_layer.shape[1])(layer)
        return layer
    return lambda prev_layer: layer_block(prev_layer, args)
Now two questions arise:
Is there an easier way to achieve this?
Is it effective this way or does it have negative impact on performance?
Thank you in advance!
What you're doing doesn't affect performance; you're creating the layers perfectly fine.
There is no problem with either of your approaches, but if you do want to make it work as an actual layer, transform it into a model.
This may not work in every Keras version:
import tensorflow

class LayerBlock(tensorflow.keras.Model):  # not sure if it works in plain keras (without tf)
    def __init__(self, outer_units):
        super(LayerBlock, self).__init__()
        self.layer1 = kl.Dense(units=outer_units)
        self.layer2 = kl.Dense(units=5)
        self.layer3 = kl.Dense(units=outer_units)

    def call(self, inputs):
        x = self.layer1(inputs)
        x = self.layer2(x)
        x = self.layer3(x)
        return x
This tutorial seems to suggest that you can use tf.keras.layers.Layer instead of tf.keras.Model, but that sounds strange to me. It may work with eager mode on, but it lacks the build method with a self.built = True statement.
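As a usage sketch (my own addition, assuming the LayerBlock class above), the block now composes exactly like a single layer in the functional API:
import tensorflow as tf

inputs = tf.keras.Input(shape=(10,))
x = LayerBlock(outer_units=10)(inputs)  # layer-like syntax, as requested
model = tf.keras.Model(inputs, x)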

How to create a Keras layer that takes effect only during the evaluation phase (and is transparent during training)?

I want to add a layer to my model that, during evaluation, takes the input, applies some transformation (a quantization in this case, but it could be anything) and returns it as the output. This layer must, however, be completely transparent during training, meaning that it must return the input tensor unchanged.
I have written the following function:
from keras.layers import Lambda
import keras.backend as K

def myquantize(x):
    return K.in_test_phase(K.clip(K.round(x * (2 ** 5)) / (2 ** 5), -3.9, 3.9), x)
which I then use via a Lambda layer:
y = keras.layers.Conv1D(**args1)(x)
y = keras.layers.AveragePooling1D(pool_size=2)(y)
y = keras.layers.Lambda(myquantize)(y)
y = keras.layers.Conv1D(**args2)(y)
#...
Now, in principle, K.in_test_phase should return x during training and the quantized expression during testing.
However, training the network with this layer prevents it from learning (i.e. the training loss stops decreasing after 3 epochs), while if I remove it the network keeps training normally. I assume this layer is not actually transparent during training as expected.
in_test_phase has a training parameter which you can set explicitly to indicate whether you are training or not. If you don't set it explicitly, the value of learning_phase is used, and this value keeps changing when you reset the graph or when you call the different fit/predict/evaluate functions of the model.
Since your full code isn't shown, you can make use of the training parameter: set it to True during training, then save the weights of the model using the model's save_weights function. When you wish to test your model, set the training parameter to False, load the weights using load_weights, and proceed accordingly.
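A minimal sketch of the flow this answer describes (my own wording; the weights file name is hypothetical):
def myquantize_train(x):
    # training=True forces the non-test branch: the layer is transparent
    return K.in_test_phase(K.clip(K.round(x * (2 ** 5)) / (2 ** 5), -3.9, 3.9),
                           x, training=True)

def myquantize_test(x):
    # training=False forces the test branch: the quantization is applied
    return K.in_test_phase(K.clip(K.round(x * (2 ** 5)) / (2 ** 5), -3.9, 3.9),
                           x, training=False)

# Train with Lambda(myquantize_train), then: model.save_weights('weights.h5')
# Rebuild the model with Lambda(myquantize_test), then: model.load_weights('weights.h5')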
For those who are in a similar situation, I created a custom layer like the following, which I only use during training:
class MyLayer(keras.layers.Layer):
    def __init__(self, **kwargs):
        super(MyLayer, self).__init__(**kwargs)

    def compute_output_shape(self, input_shape):
        return input_shape

    def call(self, inputs, **kwargs):
        x = inputs
        return K.identity(x)
Note that this layer always returns the input tensor; it merely serves as a 'placeholder' for the next step. In the evaluation part of the code, I wrote the following:
class MyLayer(keras.layers.Layer):
    def __init__(self, **kwargs):
        super(MyLayer, self).__init__(**kwargs)

    def compute_output_shape(self, input_shape):
        return input_shape

    def call(self, inputs, **kwargs):
        x = inputs
        return x  # your actual processing here
Here, the only difference is that you actually perform the desired processing steps on your tensor. When I load my stored model, I pass this class as a custom object:
model = keras.models.load_model(model_file, custom_objects={'MyLayer': MyLayer})
Be careful to pass as MyLayer the version where the actual processing is performed.
This is my solution; other suggestions are welcome.
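For completeness, a minimal sketch (my own addition, not the poster's code) that folds the two classes into one by keying off the training flag, mirroring the first Q&A above:
class QuantizeOnEval(keras.layers.Layer):
    def call(self, inputs, training=None):
        if training:
            return inputs  # transparent during training
        # applied only at evaluation/inference time
        return K.clip(K.round(inputs * (2 ** 5)) / (2 ** 5), -3.9, 3.9)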

Manipulating nn.Dense() layer parameters manually in MxNet

I'm trying to implement my own optimization algorithm for MXNet (imperative / Gluon) that does not use gradients. My question is pretty simple: is there a simple way to create a new nn.Dense(...) layer initialized with parameters (i.e. biases and weights) represented by two nd.array() instances?
Thank you in advance!
You can create a custom block with parameters that are set to differentiable=False, and provide the data for initialization through the init argument. See the scales parameter in the example below, taken from this tutorial. You can also see an example of FullyConnected, which you'll want to use for your dense layer too. F is used to denote a generic backend: typically this would be mx.ndarray, but after hybridization it is set to mx.symbol.
class NormalizationHybridLayer(gluon.HybridBlock):
    def __init__(self, hidden_units, scales):
        super(NormalizationHybridLayer, self).__init__()
        with self.name_scope():
            self.weights = self.params.get('weights',
                                           shape=(hidden_units, 0),
                                           allow_deferred_init=True)
            self.scales = self.params.get('scales',
                                          shape=scales.shape,
                                          init=mx.init.Constant(scales.asnumpy().tolist()),  # Convert to regular list to make this object serializable
                                          differentiable=False)

    def hybrid_forward(self, F, x, weights, scales):
        normalized_data = F.broadcast_div(F.broadcast_sub(x, F.min(x)),
                                          F.broadcast_sub(F.max(x), F.min(x)))
        weighted_data = F.FullyConnected(normalized_data, weights,
                                         num_hidden=self.weights.shape[0], no_bias=True)
        scaled_data = F.broadcast_mul(scales, weighted_data)
        return scaled_data
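As a complementary sketch (my own assumption, not from the tutorial): if all you need is an nn.Dense whose weights and biases come from existing NDArrays, you can also create the layer, initialize it, and overwrite its parameters with set_data. The shapes below are hypothetical.
import mxnet as mx
from mxnet.gluon import nn

dense = nn.Dense(units=3, in_units=4)       # 3 output units, 4 input units
dense.initialize()
dense.weight.set_data(mx.nd.ones((3, 4)))   # weight shape is (units, in_units)
dense.bias.set_data(mx.nd.zeros((3,)))      # bias shape is (units,)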