How to create a recurrent connection between 2 layers in Tensorflow/Keras? - tensorflow

Essentially what I would like to do is take the following very simple feedforward graph:
And then add a recurrent layer that feeds the outputs of the second Dense layer as Input to the first Dense layer, like demonstrated below. Both models are obviously simplifications of my actual use case, though I suppose the general principle for which I am asking holds true for both.
I wonder if there may be an efficient way in Tensorflow or even keras to accomplish this, especially regarding GPU processing efficiency. While I am fairly confident that I could hack together a custom model in Tensorflow that would accomplish this function-wise am I pessimistic about the GPU processing efficiency of such a custom model. I therefore would very much appreciate if someone knows about an efficient way to accomplish these recurrent connections between 2 layers. Thank you for your time! =)
For completeness sake, here is the code to create the first simple feedforward graph. The recurrent graph I created through image editing.
inputs = tf.keras.Input(shape=(128,))
h_1 = tf.keras.layers.Dense(64)(inputs)
h_2 = tf.keras.layers.Dense(32)(h_1)
out = tf.keras.layers.Dense(16)(h_2)
model = tf.keras.Model(inputs, out)

Since my question hasn't received any answers would I like to share the solution I came up with in case someone finds this question via search.
Please let me know if you find or come up with a better solution - thanks!
class SimpleModel(tf.keras.Model):
def __init__(self, input_shape, *args, **kwargs):
super(SimpleModel, self).__init__(*args, **kwargs)
# Create node layers
self.node_1 = tf.keras.layers.InputLayer(input_shape=input_shape)
self.node_2 = tf.keras.layers.Dense(64, activation='sigmoid')
self.node_3 = tf.keras.layers.Dense(32, activation='sigmoid')
self.node_4 = tf.keras.layers.Dense(16, activation='sigmoid')
self.conn_3_2_recurrent_state = None
# Create recurrent connection states
node_1_output_shape = self.node_1.compute_output_shape(input_shape)
node_2_output_shape = self.node_2.compute_output_shape(node_1_output_shape)
node_3_output_shape = self.node_3.compute_output_shape(node_2_output_shape)
self.conn_3_2_recurrent_state = tf.Variable(initial_value=self.node_3(tf.ones(shape=node_2_output_shape)),
trainable=False,
validate_shape=False,
dtype=tf.float32)
# OR
# self.conn_3_2_recurrent_state = tf.random.uniform(shape=node_3_output_shape, minval=0.123, maxval=4.56)
# OR
# self.conn_3_2_recurrent_state = tf.ones(shape=node_3_output_shape)
# OR
# self.conn_3_2_recurrent_state = tf.zeros(shape=node_3_output_shape)
def call(self, inputs):
x = self.node_1(inputs)
#tf.print(self.conn_3_2_recurrent_state)
#tf.print(self.conn_3_2_recurrent_state.shape)
x = tf.keras.layers.Concatenate(axis=-1)([x, self.conn_3_2_recurrent_state])
x = self.node_2(x)
x = self.node_3(x)
self.conn_3_2_recurrent_state.assign(x)
#tf.print(self.conn_3_2_recurrent_state)
#tf.print(self.conn_3_2_recurrent_state.shape)
x = self.node_4(x)
return x
# Demonstrate statefulness of model (uncomment tf prints in model.call())
model = SimpleModel(input_shape=(10, 128))
x = tf.ones(shape=(10, 128))
model(x)
model(x)
# Demonstrate trainability of the recurrent connection TF model
x = tf.random.uniform(shape=(10, 128))
y = tf.ones(shape=(10, 16))
model = SimpleModel(input_shape=(10, 128))
model.compile(optimizer='adam', loss='binary_crossentropy')
model.fit(x=x, y=y, epochs=100)

Related

How to create a keras layer with a custom gradient *and learnable parameters* in TF2.0?

this is a similar question to: How to create a keras layer with a custom gradient in TF2.0?
Only, I would like to introduce a learnable parameter into the custom layer that I am training.
Here's a toy example of my current approach here:
# Method for calculation custom gradient
#tf.custom_gradient
def scaler(x, s):
def grad(upstream):
dy_dx = s
dy_ds = x
return dy_dx, dy_ds
return x * s, grad
# Keras Layer with trainable parameter
class TestLayer(tf.keras.layers.Layer):
def build(self, input_shape):
self.scale = self.add_weight("scale",
shape=[1,],
initializer=tf.keras.initializers.Constant(value=2.0),
trainable=True)
def call(self, inputs):
return scaler(inputs, self.scale)
# Creates Keras Model that uses the layer
def Model():
x_in = tf.keras.layers.Input(shape=(1,))
x_out = TestLayer()(x_in)
return tf.keras.Model(inputs=x_in, outputs=x_out, name="fp8_test")
# Create toy dataset, want to learn `scale` such to satisfy 5 = 2 * scale (i.e, `scale` should learn ~2.5)
def Dataset():
inps = tf.ones(shape=(10**5,)) * 2 # inputs
expected = tf.ones(shape=(10**5,)) * 5 # targets
data_in = tf.data.Dataset.from_tensors(inps)
data_exp = tf.data.Dataset.from_tensors(expected)
dataset = tf.data.Dataset.zip((data_in, data_exp))
return dataset
model = Model()
model.summary()
dataset = Dataset()
# Use `MSE` loss and `SGD` optimizer
model.compile(
loss=tf.keras.losses.MSE,
optimizer=tf.keras.optimizers.SGD(),
)
model.fit(dataset, epochs=100)
This is failing with the following shape related error in the optimizer:
ValueError: Shapes must be equal rank, but are 1 and 2 for '{{node SGD/SGD/update/ResourceApplyGradientDescent}} = ResourceApplyGradientDescent[T=DT_FLOAT, use_locking=true](fp8_test/test_layer_1/ReadVariableOp/resource, SGD/Identity, SGD/IdentityN)' with input shapes: [], [], [100000,1].
I've been staring at the docs for a while, I'm a bit stumped as to why this isn't working, I would really appreciate any input on how to fix this toy example.
Thanks in advance.

How to properly stack RNN layers?

I've been trying to implement a character-level language model in tensorflow based on this tutorial.
I would like to extend the model by allowing multiple RNN layers to be stacked. So far I've come up with this:
class MyModel(tf.keras.Model):
def __init__(self, vocab_size, embedding_dim, rnn_type, rnn_units, num_layers, dropout):
super().__init__(self)
self.rnn_type = rnn_type.lower()
self.num_layers = num_layers
self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
if self.rnn_type == 'gru':
rnn_layer = tf.keras.layers.GRU
elif self.rnn_type == 'lstm':
rnn_layer = tf.keras.layers.LSTM
elif self.rnn_type == 'rnn':
rnn_layer = tf.keras.layers.SimpleRNN
else:
raise ValueError(f'Unsupported RNN layer: {rnn_type}')
setattr(self, self.rnn_type, rnn_layer(rnn_units, return_sequences=True, return_state=True, dropout=dropout))
for i in range(1, num_layers):
setattr(self, f'{self.rnn_type}_{i}', rnn_layer(rnn_units, return_sequences=True, return_state=True, dropout=dropout))
self.dense = tf.keras.layers.Dense(vocab_size)
def call(self, inputs, states=None, return_state=False, training=False):
x = inputs
x = self.embedding(x, training=training)
rnn = self.get_layer(self.rnn_type)
if states is None:
states = rnn.get_initial_state(x)
x, states = rnn(x, initial_state=states, training=training)
for i in range(1, self.num_layers):
layer = self.get_layer(f'{self.rnn_type}_{i}')
x, states = layer(x, initial_state=states, training=training)
x = self.dense(x, training=training)
if return_state:
return x, states
else:
return x
model = MyModel(
vocab_size=vocab_size,
embedding_dim=embedding_dim,
rnn_type='gru',
rnn_units=512,
num_layers=3,
dropout=dropout)
When trained for 30 epochs on the dataset in the tutorial, this model generates some random gibberish. Now I don't know if I'm doing the stacking wrong or if the dataset is just too small.
There are multiple factors contributing to the bad predictions of your model:
The dataset is small
The model itself you are using is quite simple
The training time is very short
Predicting Shakespeare sonnets will produce random gibberish even if trained right
Try to train it for longer. This will ultimately lead to better results although predicting coorect speech based on text may be one of the hardest tasks in Machine Learning in general. For example GPT3, one of the models, which solves this problem almost perfectly, consists of billions of parameters (see here).
EDIT: The reason why your model is performing worse than the one in the tutorial although you have more stacked RNN layers may be, that more layers need more training time. Simply increasing the number of layers will not necessarily increase your prediction quality. As I said, try to increase training time or play around with hyper parameters (learning rate, Nomralization layers, etc.).

How to save and reload a Subclassed model in TF 2.6.0 / Python 3.9.7 wihtout performance drop?

Looks like the million dollars question. I have the model below built by sub classing Model in Keras.
Model trains fine and have good performance but I cannot find a way to save and restore the model without incurring a significant performance loss.
I track AUC on ROC curves for anomaly detection, and the ROC curve after loading the model is worse than before, using exactly the same validation data set.
I suspect the problem to come from the BatchNormalization, but I could be wrong.
I've tried several option:
This works but leads to performance drop.
model.save() / tf.keras.models.load()
This works but also lead to performance drop:
model.save_weights() / model.load_weights()
This does not work and I get the following error:
tf.saved_model.save() / tf.saved_model.load()
AttributeError: '_UserObject' object has no attribute 'predict'
This does not work either, as Subclassed model do not support json export:
model.to_json()
Here is the model:
class Deep_Seq2Seq_Detector(Model):
def __init__(self, flight_len, param_len, hidden_state=16):
super(Deep_Seq2Seq_Detector, self).__init__()
self.input_dim = (None, flight_len, param_len)
self._name_ = "LSTM"
self.units = hidden_state
self.regularizer0 = tf.keras.Sequential([
layers.BatchNormalization()
])
self.encoder1 = layers.LSTM(self.units,
return_state=False,
return_sequences=True,
#activation="tanh",
name='encoder1',
input_shape=self.input_dim)#,
#kernel_regularizer= tf.keras.regularizers.l1(),
#)
self.regularizer1 = tf.keras.Sequential([
layers.BatchNormalization(),
layers.Activation("tanh")
])
self.encoder2 = layers.LSTM(self.units,
return_state=False,
return_sequences=True,
#activation="tanh",
name='encoder2')#,
#kernel_regularizer= tf.keras.regularizers.l1()
#) # input_shape=(None, self.input_dim[1],self.units),
self.regularizer2 = tf.keras.Sequential([
layers.BatchNormalization(),
layers.Activation("tanh")
])
self.encoder3 = layers.LSTM(self.units,
return_state=True,
return_sequences=False,
activation="tanh",
name='encoder3')#,
#kernel_regularizer= tf.keras.regularizers.l1(),
#) # input_shape=(None, self.input_dim[1],self.units),
self.repeat = layers.RepeatVector(self.input_dim[1])
self.decoder = layers.LSTM(self.units,
return_sequences=True,
activation="tanh",
name="decoder",
input_shape=(self.input_dim[1],self.units))
self.dense = layers.TimeDistributed(layers.Dense(self.input_dim[2]))
#tf.function
def call(self, x):
# Encoder
x0 = self.regularizer0(x)
x1 = self.encoder1(x0)
x11 = self.regularizer1(x1)
x2 = self.encoder2(x11)
x22 = self.regularizer2(x2)
output, hs, cs = self.encoder3(x22)
# see https://www.tensorflow.org/guide/keras/rnn
encoded_state = [hs, cs]
repeated_vec = self.repeat(output)
# Decoder
decoded = self.decoder(repeated_vec, initial_state=encoded_state)
output_decoder = self.dense(decoded)
return output_decoder
I've seen Git threads, but no straight answer:
https://github.com/keras-team/keras/issues/4875
Did anyone found a solution ? Do I have to use the Functional or Sequential API instead ?
It seems the problem was coming from the Sublcassing API.
I reconstructed the exact same model using the Functionnal API and now model.save / model.load yields similar results.

Completely different results using Tensorflow and Pytorch for MobilenetV3 Small

I am using transfer learning from MobileNetV3 Small to predict 5 different points on an image. I am doing this as a regression task.
For both models:
Setting the last 50 layers trainable and adding the same fully connected layers to the end.
Learning rate 3e-2
Batch size 32
Adam optimizer with the same betas
100 epochs
The inputs consist of RGB unscaled images
Pytorch
Model
def _init_weights(m):
if type(m) == nn.Linear:
nn.init.xavier_uniform_(m.weight)
m.bias.data.fill_(0.01)
def get_mob_v3_small():
model = torchvision.models.mobilenet_v3_small(pretrained=True)
children_list = get_children(model)
for c in children_list[:-50]:
for p in c.parameters():
p.requires_grad = False
return model
class TransferMobileNetV3_v2(nn.Module):
def __init__(self,
num_keypoints: int = 5):
super(TransferMobileNetV3_v2, self).__init__()
self.classifier_neurons = num_keypoints*2
self.base_model = get_mob_v3_small()
self.base_model.classifier = nn.Sequential(
nn.Linear(in_features=1024, out_features=1024),
nn.ReLU(),
nn.Linear(in_features=1024, out_features=512),
nn.ReLU(),
nn.Linear(in_features=512, out_features=self.classifier_neurons)
)
self.base_model.apply(_init_weights)
def forward(self, x):
out = self.base_model(x)
return out
Training Script
def train(net, trainloader, testloader, train_loss_fn, optimizer, scaler, args):
len_dataloader = len(trainloader)
for epoch in range(1, args.epochs+1):
net.train()
for batch_idx, sample in enumerate(trainloader):
inputs, labels = sample
inputs, labels = inputs.to(args.device), labels.to(args.device)
optimizer.zero_grad()
with torch.cuda.amp.autocast(args.use_amp):
prediction = net(inputs)
loss = train_loss_fn(prediction, labels)
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
def main():
args = make_args_parser()
args.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
seed = args.seed
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
np.random.seed(seed)
loss_fn = nn.MSELoss()
optimizer = optim.Adam(net.parameters(), lr=3e-2,
betas=(0.9, 0.999))
scaler = torch.cuda.amp.GradScaler(enabled=args.use_amp)
train(net, train_loader, test_loader, loss_fn, optimizer, scaler, args)
Tensorflow
Model
base_model = tf.keras.applications.MobileNetV3Small(weights='imagenet',
input_shape=(224,224,3))
x_in = base_model.layers[-6].output
x = Dense(units=1024, activation="relu")(x_in)
x = Dense(units=512, activation="relu")(x)
x = Dense(units=10, activation="linear")(x)
model = Model(inputs=base_model.input, outputs=x)
for layer in model.layers[:-50]:
layer.trainable=False
Training Script
model.compile(loss = "mse",
optimizer = tf.keras.optimizers.Adam(learning_rate=3e-2))
history = model.fit(input_numpy, output_numpy,
verbose=1,
batch_size=32, epochs=100,validation_split = 0.2)
Results
The PyTorch model predicts one single point around the center for all 5 different points.
The Tensorflow model predicts the points quite well and are quite accurate.
The loss in the Pytorch model is much higher than the Tensorflow model.
Please do let me know what is going wrong as I am trying my best to shift to PyTorch for this work and I need this model to give me similar/identical results. Please do let me know what is going wrong as I am trying my best to shift to PyTorch for this work and I need this model to give me similar/identical results.
Note: I also noticed that the MobileNetV3 Small model seems to be different in PyTorch and different in Tensorflow. I do not know if am interpreting it wrong, but I'm putting it here just in case.

TensorFlow 2.3: load model from ModelCheckPoint callback with both custom layers and model

I have wrote a custom code to build a UNet architecture. To do so I have firstly subclassed the tf.keras.layers.Layer object to define an encoder convolutional block composed by a conv3D layer, a BatchNormalization layer and a Activation layer, similarly I defined a decoder inverse convolutional block composed by a Conv3DTranspose layer, a BatchNormalization layer, an Activation layer and a Concatenate layer. Finally I subclassed the tf.keras.Model object to define the full model, composed by 4 enconding blocks and 4 decoding blocks.
To checkpoint the model while training I have used the tf.keras.callbacks.ModelCheckpoint callback. However when a I try to load back the model (that in fact is still training) with tf.keras.models.load_model() I receive the following error: ValueError: No model found in config file.
Here the full code for the model definition, building and fitting:
import tensorflow as tf
# Encoder block
class ConvBlock(tf.keras.layers.Layer):
def __init__(self, n_filters, conv_size, conv_stride, **kwargs):
super(ConvBlock, self).__init__(**kwargs)
self.conv3D = tf.keras.layers.Conv3D(
filters=n_filters,
kernel_size=conv_size,
strides=conv_stride,
padding="same",
)
self.batch_norm = tf.keras.layers.BatchNormalization()
self.relu = tf.keras.layers.Activation("relu")
def call(self, inputs, training=None):
h = self.conv3D(inputs)
if training:
h = self.batch_norm(h)
h = self.relu(h)
return h
# Decoder block
class InvConvBlock(tf.keras.layers.Layer):
def __init__(self, n_filters, conv_size, conv_stride, activation, **kwargs):
super(InvConvBlock, self).__init__(**kwargs)
self.conv3D_T = tf.keras.layers.Conv3DTranspose(
filters=n_filters,
kernel_size=conv_size,
strides=conv_stride,
padding="same",
)
self.batch_norm = tf.keras.layers.BatchNormalization()
self.activ = tf.keras.layers.Activation(activation)
self.concat = tf.keras.layers.Concatenate(axis=-1)
def call(self, inputs, feat_concat=None, training=None):
h = self.conv3D_T(inputs)
if training:
h = self.batch_norm(h)
h = self.activ(h)
if feat_concat is not None:
h = self.concat([h, feat_concat])
return h
class UNet(tf.keras.Model):
def __init__(self, n_filters, e_size, e_stride, d_size, d_stride, **kwargs):
super(UNet, self).__init__(**kwargs)
# Encoder
self.conv_block_1 = ConvBlock(n_filters, e_size, e_stride)
self.conv_block_2 = ConvBlock(n_filters * 2, e_size, e_stride)
self.conv_block_3 = ConvBlock(n_filters * 4, e_size, (1, 1, 1))
self.conv_block_4 = ConvBlock(n_filters * 8, e_size, (1, 1, 1))
# Decoder
self.inv_conv_block_1 = InvConvBlock(n_filters * 4, d_size, (1, 1, 1), "relu")
self.inv_conv_block_2 = InvConvBlock(n_filters * 2, d_size, (1, 1, 1), "relu")
self.inv_conv_block_3 = InvConvBlock(n_filters, d_size, d_stride, "relu")
self.inv_conv_block_4 = InvConvBlock(1, d_size, d_stride, "sigmoid")
def call(self, inputs, **kwargs):
h1 = self.conv_block_1(inputs, **kwargs)
h2 = self.conv_block_2(h1, **kwargs)
h3 = self.conv_block_3(h2, **kwargs)
h = self.conv_block_4(h3, **kwargs)
h = self.inv_conv_block_1(h, feat_concat=h3, **kwargs)
h = self.inv_conv_block_2(h, feat_concat=h2, **kwargs)
h = self.inv_conv_block_3(h, feat_concat=h1, **kwargs)
h = self.inv_conv_block_4(h, **kwargs)
return h
model = UNet(
n_filters,
e_size,
e_stride,
d_size,
d_stride,
)
model.build((None, *input_shape, 1))
loss = tf.keras.losses.BinaryCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam(learning_rate)
metrics = [tf.keras.metrics.Precision(), tf.keras.metrics.Recall()]
model.compile(
loss=loss,
optimizer=optimizer,
metrics=metrics,
)
CP_callback = tf.keras.callbacks.ModelCheckpoint(
f"{checkpoint_dir}/model.h5", save_freq='epoch', monitor="loss"
)
unet.fit(
data,
epochs=opts.epochs,
callbacks=[CP_callback],
)
To load the model I used the following code on another python console:
import tensorflow as tf
model = tf.keras.models.load_model(f'{checkpoint_dir}/model.h5')
but here I receive the above mentioned error. What am I missing? Or what am I doing wrong?
Thank you in advance for your help.
This is because you don't define the get_config method in your custom layers. For this check, this exited answer in SO.
Otherwise, you can save the trained weights (not the full model) and load the model as follows. In that case, you don't need to define this function. Please note, it's good practice to do, however. Here is a workaround for your problem:
# callback
tf.keras.callbacks.ModelCheckpoint('model.h5',
monitor='val_loss',
verbose= 1,
save_best_only=True,
mode= 'min',
save_weights_only=True) # <---- only save weight
# train
model = UNet(
n_filters,
e_size,
e_stride,
d_size,
d_stride,
)
model.compile(...)
model.fit(...)
# inference
model = UNet(
n_filters,
e_size,
e_stride,
d_size,
d_stride,
)
model.build((None, *input_shape, 1))
model.load_weights('model.h5')
For more details, see the documentation of Serialization and saving and also collab demonstration of François Chollet. Also, We've written an article about model subclassing and custom training stuff in tf 2.x, in the Save and Load section (at the bottom) of this article, we've demonstrated many strategies, here, hope that help.
Update
I've run your public colab notebook. Unfortunately, I am facing the same issue, and it's a bit weird and currently, I don't have the exact answer for saving the entire model in the ModelCheckpoint callback with Custom Layer even if we define the get_config() method.
However, there is another workaround that may come in handy for you. As we know there are two major ways to save tf models: (1). SaveModel and HDF5 format. The way is we choose the SaveMoedl format. Which is recommended by the way and safe to use.
The key difference between HDF5 and SavedModel is that HDF5 uses object configs to save the model architecture, while SavedModel saves the execution graph. Thus, SavedModels are able to save custom objects like subclassed models and custom layers without requiring the original code.
Now, as for your requirements, you are saving the entire model along with the best loss or val_loss in training time. For that, we can define a custom callback do save the model for lowest validation_loss (or whatever you want). As follows:
class SaveModelH5(tf.keras.callbacks.Callback):
def on_train_begin(self, logs=None):
self.val_loss = []
def on_epoch_end(self, epoch, logs=None):
current_val_loss = logs.get("val_loss")
self.val_loss.append(logs.get("val_loss"))
if current_val_loss <= min(self.val_loss):
print('Find lowest val_loss. Saving entire model.')
self.model.save('unet', save_format='tf') # < ----- Here
save_model = SaveModelH5()
unet.fit(.., callbacks=save_model)
Using
model.save('any_name', save_format=`tf`)
allows us create a any_name working directory, inside which it contains assets, saved_model.pb, and variables. The model architecture and training configuration, including the optimizer, losses, and metrics are stored in saved_model.pb. The weights are saved in the variables directory.
When saving the model and its layers, the SavedModel format stores the class name, call function, losses, and weights (and the config, if implemented). The call function defines the computation graph of the model/layer. In the absence of the model/layer config, the call function is used to create a model that exists like the original model which can be trained, evaluated, and used for inference. When we need to re-load the saved model, we can do as follows:
new_unet = tf.keras.models.load_model("unet", compile=False)
Colab.