Fitting concatenated models throws an error regarding the input - tensorflow

I have 3 pre-trained models. I want to concatenate them and then add a fully connected network on top:
output = concatenate(
    [
        resnet.layers[-2].output,  # model 1
        cnn2.layers[-2].output,    # model 2
        cnn.layers[-2].output,     # model 3
    ],
    axis=-1)
fc = Dense(units=4096, activation="relu")(output)
fc = Dense(units=4096, activation="relu")(fc)
logits = Dense(3, activation="sigmoid")(fc)
model = Model(inputs=[resnet.inputs, cnn2.inputs, cnn.inputs], outputs=logits)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["acc"])

epochs = 10
x_train_second, x_val_second, y_train_second, y_val_second = train_test_split(x_train, y_train, test_size=0.15)
history = model.fit(
    datagen.flow([x_train_second, x_train_second, x_train_second], y=y_train_second, batch_size=64),
    epochs=epochs,
    steps_per_epoch=x_train_second.shape[0] // 64,
)
I get the error:
ValueError: Layer "model_68" expects 3 input(s), but it received 2 input tensors. Inputs received: [<tf.Tensor 'IteratorGetNext:0' shape=(None, None, None, None) dtype=float32>, <tf.Tensor 'IteratorGetNext:1' shape=(None, None, None, None) dtype=float32>]
Which is a bit confusing, since I am passing 3 inputs to x in model.fit(). Even if I pass 4 or 5 inputs there, I still get the same error, which suggests that something is way off...
Update: I think it is because of datagen.flow(), because if I use:
model.fit(
    [x_train_second, x_train_second, x_train_second],
    y=y_train_second,
    batch_size=64,
    epochs=epochs,
)
it works. So the question is how to use ImageDataGenerator.flow() with a merged model.
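One workaround (a sketch, assuming all three branches take the same images, as in the fit call above) is to wrap a single flow in a plain Python generator that repeats each augmented batch for every input:

def multi_input_flow(datagen, x, y, batch_size=64):
    # Drive one augmented stream and feed the same batch to all three inputs.
    flow = datagen.flow(x, y, batch_size=batch_size)
    while True:
        x_batch, y_batch = next(flow)
        yield [x_batch, x_batch, x_batch], y_batch

history = model.fit(
    multi_input_flow(datagen, x_train_second, y_train_second, batch_size=64),
    epochs=epochs,
    steps_per_epoch=x_train_second.shape[0] // 64,
)

In TF 2.x, model.fit accepts a Python generator directly; each yielded element is ([input1, input2, input3], targets), matching the three model inputs.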

Related

Can't get multi-output CNN to work (tensorflow and keras)

I'm currently working on a task of fiber tip tracking on an endoscopic video.
For this purpose I have two models:
a classifier that tells whether the image contains the fiber (is_visible)
a regressor that predicts the fiber tip position (x, y)
I am using ResNet18 pretrained on ImageNet for this purpose, and it works great. But I'm experiencing performance issues,
so I decided to combine these two models into a single one using a multi-output approach.
But so far I haven't been able to get it to work.
TENSORFLOW:
TensorFlow version: 2.10.1
DATASET:
My dataset is stored in HDF5 format. Each sample has:
an image (224, 224, 3)
a uint8 visibility flag (is_visible)
two floats for the fiber tip position (x, y)
I am loading this dataset using a custom generator, as follows:
output_types = (tf.float32, tf.uint8, tf.float32)
output_shapes = (
    tf.TensorShape((None, image_height, image_width, number_of_channels)),  # image
    tf.TensorShape((None, 1)),        # is_visible
    tf.TensorShape((None, 1, 1, 2)),  # x, y
)
train_dataset = tf.data.Dataset.from_generator(
    generator, output_types=output_types, output_shapes=output_shapes,
)
MODEL:
My model is defined as follows:
model = ResNet18(input_shape=(224, 224, 3), weights="imagenet", include_top=False)
inputLayer = model.input
innerLayer = tf.keras.layers.Flatten()(model.output)
is_visible = tf.keras.layers.Dense(1, activation="sigmoid", name="is_visible")(innerLayer)
position = tf.keras.layers.Dense(2)(innerLayer)
position = tf.keras.layers.Reshape((1, 1, 2), name="position")(position)
model = tf.keras.Model(inputs=[inputLayer], outputs=[is_visible, position])
adam = tf.keras.optimizers.Adam(1e-4)
model.compile(
    optimizer=adam,
    loss={
        "is_visible": "binary_crossentropy",
        "position": "mean_squared_error",
    },
    loss_weights={
        "is_visible": 1.0,
        "position": 1.0,
    },
    metrics={
        "is_visible": "accuracy",
        "position": "mean_squared_error",
    },
)
PROBLEM:
The dataset works great; I can loop through each batch. But when it comes to training
model.fit(
    train_dataset,
    validation_data=validation_dataset,
    epochs=100000,
    callbacks=callbacks,
)
I get the following error:
ValueError: Can not squeeze dim[3], expected a dimension of 1, got 2 for '{{node mean_squared_error/weighted_loss/Squeeze}} = SqueezeT=DT_FLOAT, squeeze_dims=[-1]' with input shapes: [?,1,1,2].
I tried to change the dataset format like so:
output_types = (tf.float32, tf.uint8, tf.float32, tf.float32)
output_shapes = (
    tf.TensorShape((None, image_height, image_width, number_of_channels)),  # image
    tf.TensorShape((None, 1)),  # is_visible
    tf.TensorShape((None, 1)),  # x
    tf.TensorShape((None, 1)),  # y
)
But this leads to another error:
ValueError: Data is expected to be in format x, (x,), (x, y), or (x, y, sample_weight), found: (<tf.Tensor 'IteratorGetNext:0' shape=(None, 224, 224, 3) dtype=float32>, <tf.Tensor 'IteratorGetNext:1' shape=(None, 1) dtype=uint8>, <tf.Tensor 'IteratorGetNext:2' shape=(None, 1) dtype=float32>, <tf.Tensor 'IteratorGetNext:3' shape=(None, 1) dtype=float32>)
I tried to wrap is_visible and (x, y) returned from train_dataset into a dictionary, like so:
yield image_batch, {"is_visible": is_visible_batch, "position": position_batch}
Also tried these options:
yield image_batch, (is_visible_batch, position_batch)
yield image_batch, [is_visible_batch, position_batch]
But that didn't help.
Can anyone tell me what I am doing wrong? I am totally stuck.
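A sketch of one likely fix, assuming the generator can be changed: keep the position target flat, (batch, 2), rather than (batch, 1, 1, 2) (the trailing dimension of 2 is what the loss machinery tries and fails to squeeze), drop the Reshape layer so the model output matches, and yield a dict keyed by the output layer names, declared with output_signature:

# Model head: keep position flat, shape (batch, 2); no Reshape.
position = tf.keras.layers.Dense(2, name="position")(innerLayer)
model = tf.keras.Model(inputs=[inputLayer], outputs=[is_visible, position])

# Generator yields targets as a dict keyed by the output names:
#     yield image_batch, {"is_visible": is_visible_batch,  # (batch, 1)
#                         "position": position_batch}      # (batch, 2)
output_signature = (
    tf.TensorSpec(shape=(None, image_height, image_width, number_of_channels), dtype=tf.float32),
    {
        "is_visible": tf.TensorSpec(shape=(None, 1), dtype=tf.uint8),
        "position": tf.TensorSpec(shape=(None, 2), dtype=tf.float32),
    },
)
train_dataset = tf.data.Dataset.from_generator(generator, output_signature=output_signature)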

How to create joint loss with paired Dataset samples in Tensorflow Keras API?

I'm trying to train an autoencoder, with constraints that force one or more of the hidden/encoded nodes/neurons to have an interpretable value. My training approach uses paired images (though after training the model should operate on a single image) and utilizes a joint loss function that includes (1) the reconstruction loss for each of the images and (2) a comparison between values of the hidden/encoded vector, from each of the two images.
I've created an analogous simple toy problem and model to make this clearer. In the toy problem, the autoencoder is given a vector of length 3 as input. The encoding uses one dense layer to compute the mean (a scalar) and another dense layer to compute some other representation of the vector (given my construction, it will likely just learn an identity matrix, i.e., copy the input vector). See the figure below. The lowest node of the hidden layer is intended to compute the mean of the input vector. The rest of the hidden nodes are unconstrained aside from having to accommodate a reconstruction that matches the input.
The figure below exhibits how I wish to train the model, using paired images. "MSE" is mean-squared-error, although the identity of the actual function is not important for the question I'm asking here. The loss function is the sum of the reconstruction loss and the mean-estimation loss.
I've tried creating (1) a tf.data.Dataset to generate paired vectors, (2) a Keras model, and (3) a custom loss function. However, I'm failing to understand how to do this correctly for this particular situation.
I can't get Model.fit() to run correctly and associate the model outputs with the Dataset targets as intended. See the code and errors below. Can anyone help? I've done many Google and Stack Overflow searches and still don't understand how to implement this.
import tensorflow as tf
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

DTYPE = tf.dtypes.float32
N_VEC = 3

def my_generator(n):
    while True:
        # Create two identical vectors of length n, except with different means.
        # An internal layer (single neuron) of the model should predict the
        # mean of the input vector. To train it to do so, with paired
        # vector inputs, use a loss function that penalizes incorrect
        # predictions of the difference of the means of two input vectors.
        input_vec1 = tf.random.normal((n,), dtype=DTYPE)
        target_mean_diff = tf.random.normal((1,), dtype=DTYPE)
        input_vec2 = input_vec1 + target_mean_diff

        # Model is a constrained autoencoder. Output targets are
        # identical to the input vectors. Including them as explicit
        # targets in this generator, for generalization.
        target_vec1 = tf.identity(input_vec1)
        target_vec2 = tf.identity(input_vec2)

        yield ({'input_vec1': input_vec1,
                'input_vec2': input_vec2},
               {'target_vec1': target_vec1,
                'target_vec2': target_vec2,
                'target_mean_diff': target_mean_diff})

def my_dataset(n, batch_size=4):
    ds = tf.data.Dataset.from_generator(
        my_generator,
        output_signature=({'input_vec1': tf.TensorSpec(shape=(n,), dtype=DTYPE),
                           'input_vec2': tf.TensorSpec(shape=(n,), dtype=DTYPE)},
                          {'target_vec1': tf.TensorSpec(shape=(n,), dtype=DTYPE),
                           'target_vec2': tf.TensorSpec(shape=(n,), dtype=DTYPE),
                           'target_mean_diff': tf.TensorSpec(shape=(1,), dtype=DTYPE)}),
        args=(n,))
    ds = ds.batch(batch_size)
    return ds
## Do a brief test using the Dataset
ds = my_dataset(N_VEC, batch_size=4)
ds_iter = iter(ds)
dict_inputs, dict_targets = next(ds_iter)
print(dict_inputs)
print(dict_targets)
## Define the Model
layer_encode_vec = tf.keras.layers.Dense(N_VEC, activation=None, name='encode_vec')
layer_decode_vec = tf.keras.layers.Dense(N_VEC, activation=None, name='decode_vec')
layer_encode_mean = tf.keras.layers.Dense(1, activation=None, name='encode_mean')
layer_decode_mean = tf.keras.layers.Dense(N_VEC, activation=None, name='decode_mean')
input1 = tf.keras.Input(shape=(N_VEC,), name='input_vec1')
input2 = tf.keras.Input(shape=(N_VEC,), name='input_vec2')
vec_encoded1 = layer_encode_vec(input1)
vec_encoded2 = layer_encode_vec(input2)
mean_encoded1 = layer_encode_mean(input1)
mean_encoded2 = layer_encode_mean(input2)
mean_diff = mean_encoded2 - mean_encoded1
pred_vec1 = layer_decode_vec(vec_encoded1) + layer_decode_mean(mean_encoded1)
pred_vec2 = layer_decode_vec(vec_encoded2) + layer_decode_mean(mean_encoded2)
model = tf.keras.Model(inputs=[input1, input2], outputs=[pred_vec1, pred_vec2, mean_diff])
print(model.summary())
## Define the joint loss function
def loss_total(y_true, y_pred):
    loss_reconstruct = tf.reduce_mean(tf.keras.losses.MSE(y_true[0], y_pred[0])) / 2 + \
                       tf.reduce_mean(tf.keras.losses.MSE(y_true[1], y_pred[1])) / 2
    loss_mean = tf.reduce_mean(tf.keras.losses.MSE(y_true[2], y_pred[2]))
    return loss_reconstruct + loss_mean

## Compile model
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)
model.compile(optimizer=optimizer, loss=loss_total)
## Train model
history = model.fit(x=ds, epochs=10, steps_per_epoch=10)
Output: Example batch from the Dataset:
{'input_vec1': <tf.Tensor: shape=(4, 3), dtype=float32, numpy=
array([[-0.53022575, -0.02389329, 0.32843253],
[-0.61793506, -0.8276422 , -1.3469328 ],
[-0.5401968 , 0.3141346 , -1.3638284 ],
[-1.2189807 , 0.23848908, 0.75108534]], dtype=float32)>, 'input_vec2': <tf.Tensor: shape=(4, 3), dtype=float32, numpy=
array([[-0.23415083, 0.27218163, 0.6245074 ],
[-0.57636774, -0.7860749 , -1.3053654 ],
[ 0.65463066, 1.508962 , -0.16900098],
[-0.49326736, 0.9642024 , 1.4767987 ]], dtype=float32)>}
{'target_vec1': <tf.Tensor: shape=(4, 3), dtype=float32, numpy=
array([[-0.53022575, -0.02389329, 0.32843253],
[-0.61793506, -0.8276422 , -1.3469328 ],
[-0.5401968 , 0.3141346 , -1.3638284 ],
[-1.2189807 , 0.23848908, 0.75108534]], dtype=float32)>, 'target_vec2': <tf.Tensor: shape=(4, 3), dtype=float32, numpy=
array([[-0.23415083, 0.27218163, 0.6245074 ],
[-0.57636774, -0.7860749 , -1.3053654 ],
[ 0.65463066, 1.508962 , -0.16900098],
[-0.49326736, 0.9642024 , 1.4767987 ]], dtype=float32)>, 'target_mean_diff': <tf.Tensor: shape=(4, 1), dtype=float32, numpy=
array([[0.29607493],
[0.04156734],
[1.1948274 ],
[0.7257133 ]], dtype=float32)>}
Output: The model summary:
Model: "model"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_vec1 (InputLayer) [(None, 3)] 0
__________________________________________________________________________________________________
input_vec2 (InputLayer) [(None, 3)] 0
__________________________________________________________________________________________________
encode_vec (Dense) (None, 3) 12 input_vec1[0][0]
input_vec2[0][0]
__________________________________________________________________________________________________
encode_mean (Dense) (None, 1) 4 input_vec1[0][0]
input_vec2[0][0]
__________________________________________________________________________________________________
decode_vec (Dense) (None, 3) 12 encode_vec[0][0]
encode_vec[1][0]
__________________________________________________________________________________________________
decode_mean (Dense) (None, 3) 6 encode_mean[0][0]
encode_mean[1][0]
__________________________________________________________________________________________________
tf.__operators__.add (TFOpLambd (None, 3) 0 decode_vec[0][0]
decode_mean[0][0]
__________________________________________________________________________________________________
tf.__operators__.add_1 (TFOpLam (None, 3) 0 decode_vec[1][0]
decode_mean[1][0]
__________________________________________________________________________________________________
tf.math.subtract (TFOpLambda) (None, 1) 0 encode_mean[1][0]
encode_mean[0][0]
==================================================================================================
Total params: 34
Trainable params: 34
Non-trainable params: 0
__________________________________________________________________________________________________
Output: The error message when calling model.fit():
Epoch 1/10
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
...
ValueError: Found unexpected keys that do not correspond to any
Model output: dict_keys(['target_vec1', 'target_vec2', 'target_mean_diff']).
Expected: ['tf.__operators__.add', 'tf.__operators__.add_1', 'tf.math.subtract']
You can pass a dict to Model for both inputs and outputs like so:
model = tf.keras.Model(
    inputs={"input_vec1": input1, "input_vec2": input2},
    outputs={
        "target_vec1": pred_vec1,
        "target_vec2": pred_vec2,
        "target_mean_diff": mean_diff,
    },
)
which avoids having to name the output layers.
For the losses, it's currently applying loss_total to each of the 3 outputs individually and summing to get the final loss, which is not what you want. So you can either break out each of the losses individually:
model.compile(
    optimizer=optimizer,
    loss={"target_vec1": "mse", "target_vec2": "mse", "target_mean_diff": "mse"},
    loss_weights={"target_vec1": 0.5, "target_vec2": 0.5, "target_mean_diff": 1},
)
or you can manually train the model using a modified loss function that takes dict input. Something like:
def loss_total(y_true, y_pred):
    loss_reconstruct = (
        tf.reduce_mean(tf.keras.losses.MSE(y_true["target_vec1"], y_pred["target_vec1"])) / 2
        + tf.reduce_mean(tf.keras.losses.MSE(y_true["target_vec2"], y_pred["target_vec2"])) / 2
    )
    loss_mean = tf.reduce_mean(tf.keras.losses.MSE(y_true["target_mean_diff"], y_pred["target_mean_diff"]))
    return loss_reconstruct + loss_mean

for epoch in range(10):
    for batch, (x, y) in zip(range(10), ds):
        with tf.GradientTape() as tape:
            outputs = model(x, training=True)
            loss = loss_total(y, outputs)
        trainable_vars = model.trainable_variables
        gradients = tape.gradient(loss, trainable_vars)
        optimizer.apply_gradients(zip(gradients, trainable_vars))
        print(f"Batch: {batch}, loss: {loss.numpy()}")

Basic RNN training using fit_generator doesn't output the expected shape

I'm implementing a basic RNN composed of a 512-unit GRU and a dense layer, using Keras:
model = Sequential()
model.add(GRU(units=512,
              return_sequences=True,
              input_shape=(None, num_x_signals,)))
model.add(Dense(num_y_signals, activation='sigmoid'))
I needed to generate input batches on the fly, so I used fit_generator:
model.fit_generator(generator=generator_train, epochs=NB_EPOCHS, steps_per_epoch=STEPS_PER_EPOCH,
                    validation_data=generator_test, validation_steps=900, callbacks=callbacks)
And here is how I define my batch generator:
SAMPLE_PERIOD_PER_INPUT = 1728
PERIOD_TO_PREDICT = 288
BATCH_SIZE = 64

def batch_generator(batch_size, sequence_length, train=True):
    while True:
        x_shape = (batch_size, sequence_length, num_x_signals)
        x_batch = np.zeros(shape=x_shape, dtype=np.float16)
        y_shape = (batch_size, PERIOD_TO_PREDICT, num_y_signals)
        y_batch = np.zeros(shape=y_shape, dtype=np.float16)
        for i in range(batch_size):
            if train:
                idx = np.random.randint(num_train - sequence_length)
                predict_idx = (idx + sequence_length) - PERIOD_TO_PREDICT
                x_batch[i] = x_train_scaled[idx:idx + sequence_length]
                y_batch[i] = y_train_scaled[predict_idx:idx + sequence_length]
            else:
                idx = np.random.randint(num_test - sequence_length)
                predict_idx = (idx + sequence_length) - PERIOD_TO_PREDICT
                x_batch[i] = x_test_scaled[idx:idx + sequence_length]
                y_batch[i] = y_test_scaled[predict_idx:idx + sequence_length]
        yield (x_batch, y_batch)

generator_train = batch_generator(batch_size=BATCH_SIZE, sequence_length=SAMPLE_PERIOD_PER_INPUT)
generator_test = batch_generator(batch_size=BATCH_SIZE, sequence_length=SAMPLE_PERIOD_PER_INPUT, train=False)
I also use a "custom" loss function, because I need to ignore the first part of each computed sequence, which is not expected to be accurate:
warmup_steps = 50

def loss_mse_warmup(y_true, y_pred):
    y_true_slice = y_true[:, warmup_steps:, :]
    y_pred_slice = y_pred[:, warmup_steps:, :]
    loss = tf.losses.mean_squared_error(labels=y_true_slice,
                                        predictions=y_pred_slice)
    loss_mean = tf.reduce_mean(loss)
    return loss_mean

optimizer = RMSprop(lr=1e-3)
model.compile(loss=loss_mse_warmup, optimizer=optimizer)
Here is the summary of my model :
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
gru (GRU) (None, None, 512) 798720
_________________________________________________________________
dense (Dense) (None, None, 1) 513
=================================================================
Total params: 799,233
Trainable params: 799,233
Non-trainable params: 0
_________________________________________________________________
But when I run this, I get shape errors:
2 root error(s) found.
(0) Invalid argument: Incompatible shapes: [64,238,1] vs. [64,1678,1]
[[{{node loss_4/dense_loss/mean_squared_error/SquaredDifference}}]]
[[loss_4/mul/_167]]
(1) Invalid argument: Incompatible shapes: [64,238,1] vs. [64,1678,1]
[[{{node loss_4/dense_loss/mean_squared_error/SquaredDifference}}]]
0 successful operations.
0 derived errors ignored.
Any ideas why? Where did I go wrong?
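A sketch of what seems to be going on, judging from the shapes: with return_sequences=True, the GRU emits a prediction at every one of the 1728 input timesteps, while y_batch only covers the last PERIOD_TO_PREDICT = 288 steps. After loss_mse_warmup slices off 50 warmup steps, the loss compares [64, 1678, 1] against [64, 238, 1]. One fix (a sketch, assuming full-window targets are available) is to make the targets span the same window as the inputs:

def batch_generator(batch_size, sequence_length, train=True):
    while True:
        x_batch = np.zeros((batch_size, sequence_length, num_x_signals), dtype=np.float16)
        # Targets cover the same window as the inputs, so they match the
        # (batch, sequence_length, num_y_signals) model output.
        y_batch = np.zeros((batch_size, sequence_length, num_y_signals), dtype=np.float16)
        for i in range(batch_size):
            x_src, y_src, n = (x_train_scaled, y_train_scaled, num_train) if train \
                else (x_test_scaled, y_test_scaled, num_test)
            idx = np.random.randint(n - sequence_length)
            x_batch[i] = x_src[idx:idx + sequence_length]
            y_batch[i] = y_src[idx:idx + sequence_length]
        yield (x_batch, y_batch)

Alternatively, if only the last 288 steps should be predicted, the model output could be cropped to that span, e.g. with a final Lambda(lambda s: s[:, -PERIOD_TO_PREDICT:, :]) layer; the warmup slice then applies equally to both prediction and target.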

The two structures don't have the same nested structure while adding return_state=True over LSTM

I don't know if this is a bug or an error on my part.
I have also reported this issue here.
What I am trying to do is make my custom LSTM stateful.
This code runs fine without return_state=True. Once I add it, it raises this error: The two structures don't have the same nested structure.
Here is a reproducible example:
from keras.layers import Lambda
import keras
import numpy as np
import tensorflow as tf

SEQUENCE_LEN = 45
LATENT_SIZE = 20
EMBED_SIZE = 50
VOCAB_SIZE = 100
BATCH_SIZE = 10

def rev_entropy(x):
    def row_entropy(row):
        _, _, count = tf.unique_with_counts(row)
        count = tf.cast(count, tf.float32)
        prob = count / tf.reduce_sum(count)
        prob = tf.cast(prob, tf.float32)
        rev = -tf.reduce_sum(prob * tf.log(prob))
        return rev

    nw = tf.reduce_sum(x, axis=1)
    rev = tf.map_fn(row_entropy, x)
    rev = tf.where(tf.is_nan(rev), tf.zeros_like(rev), rev)
    rev = tf.cast(rev, tf.float32)
    max_entropy = tf.log(tf.clip_by_value(nw, 2, LATENT_SIZE))
    concentration = (max_entropy / (1 + rev))
    new_x = x * (tf.reshape(concentration, [BATCH_SIZE, 1]))
    return new_x

inputs = keras.layers.Input(shape=(SEQUENCE_LEN,), name="input")
embedding = keras.layers.Embedding(output_dim=EMBED_SIZE, input_dim=VOCAB_SIZE,
                                   input_length=SEQUENCE_LEN, trainable=True)(inputs)
encoded = keras.layers.Bidirectional(keras.layers.LSTM(LATENT_SIZE, return_state=True),
                                     merge_mode="sum", name="encoder_lstm")(embedding)
encoded = Lambda(rev_entropy)(encoded)
decoded = keras.layers.RepeatVector(SEQUENCE_LEN, name="repeater")(encoded)
decoded = keras.layers.Bidirectional(keras.layers.LSTM(EMBED_SIZE, return_sequences=True, return_state=True),
                                     merge_mode="sum", name="decoder_lstm")(decoded)
autoencoder = keras.models.Model(inputs, decoded)
autoencoder.compile(optimizer="sgd", loss='mse')
autoencoder.summary()

x = np.random.randint(0, 90, size=(10, 45))
print(x.shape)
y = np.random.normal(size=(10, 45, 50))
print(y.shape)
history = autoencoder.fit(x, y, epochs=1)
Update 1
After applying the idea from the comment, tf.map_fn(row_entropy, encoded, dtype=tf.float32), I received a new error:
ValueError: Layer repeater expects 1 inputs, but it received 5 input tensors. Input received: [<tf.Tensor 'encoder_lstm/add_16:0' shape=(?, 20) dtype=float32>, <tf.Tensor 'encoder_lstm/while/Exit_3:0' shape=(?, 20) dtype=float32>, <tf.Tensor 'encoder_lstm/while/Exit_4:0' shape=(?, 20) dtype=float32>, <tf.Tensor 'encoder_lstm/while_1/Exit_3:0' shape=(?, 20) dtype=float32>, <tf.Tensor 'encoder_lstm/while_1/Exit_4:0' shape=(?, 20) dtype=float32>]
Also, note that this error is raised even without that Lambda layer, so it seems something else is wrong.
If I inspect encoded.shape, it says encoded is a list of length 5; however, it should be a tensor of shape (batch_size, latent_size).
Everything is fine without return_state=True.
Any help is appreciated!
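A sketch of one way to resolve this: with return_state=True, a Bidirectional LSTM returns a list of five tensors, [output, forward_h, forward_c, backward_h, backward_c], so the output has to be unpacked before being passed to the next layer (variable names here are illustrative):

encoded, fwd_h, fwd_c, bwd_h, bwd_c = keras.layers.Bidirectional(
    keras.layers.LSTM(LATENT_SIZE, return_state=True),
    merge_mode="sum", name="encoder_lstm")(embedding)
encoded = Lambda(rev_entropy)(encoded)
decoded = keras.layers.RepeatVector(SEQUENCE_LEN, name="repeater")(encoded)

The decoder Bidirectional LSTM with return_state=True likewise returns five tensors, so only its first element should be used as the Model output.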

How to change the LSTMCell weight format from tensorflow to tf.keras

I have some old code from TensorFlow 1 that I want to make work with TensorFlow 2 / tf.keras. I would like to keep the same LSTM weights, but cannot figure out how to convert the format.
I have the old weights saved in a checkpoint file, and also have them saved in csv files.
My old code looks something like this:
input_placeholder = tf.placeholder(tf.float32, [None, None, input_units])
lstm_layers = [tf.nn.rnn_cell.LSTMCell(layer_size), tf.nn.rnn_cell.LSTMCell(layer_size)]
stacked = tf.contrib.rnn.MultiRNNCell(lstm_layers)
features, state = tf.nn.dynamic_rnn(stacked, input_placeholder, dtype=tf.float32)
And my new code looks something like this:
input_placeholder = tf.placeholder(tf.float32, [None, None, input_units])
lstm_layers = [tf.keras.layers.LSTMCell(layer_size),tf.keras.layers.LSTMCell(layer_size)]
stacked = tf.keras.layers.StackedRNNCells(lstm_layers)
features = stacked(input_placeholder)
... #later in the code
features.set_weights(previous_weights)
The old bias seems to match the new bias.
The old kernel seems to be the concatenation of the kernel and recurrent kernel.
I am able to load the previous_weights into the model (and have explicitly checked that the weights loaded correctly), however my tests fail to produce the same result.
Digging into the source code, the kernels seem to have a different format under the hood.
Is it possible to calculate the kernel and recurrent_kernel (tf.keras) using these old saved kernel weights?
Links if they're helpful:
https://github.com/tensorflow/tensorflow/blob/r1.13/tensorflow/python/ops/rnn_cell_impl.py
https://github.com/tensorflow/tensorflow/blob/r1.13/tensorflow/python/keras/layers/recurrent.py
In case anyone else encounters this.
There are three differences I found when migrating the weights:
The kernel is shuffled along the gate axis. Both implementations do the four gate dot-products the LSTM calls for by concatenating the gate weights into one matrix, but the middle two quarters of this concatenated matrix are swapped (TF1 orders the gate blocks i, c, f, o; Keras orders them i, f, c, o).
The kernel is split along the input axis. The rnn_cell implementation has a single weight matrix that is dot-producted with the concatenation of the input and the hidden state, whereas the Keras implementation stores these as two attributes, kernel and recurrent_kernel, dot-products them separately, and sums the results.
A forget bias is explicitly added in the rnn_cell computation, but in Keras it is folded into the cell bias (the unit_forget_bias option only modifies the initialization).
A migration function that accounts for these three differences is
def convert_lstm_weights(tf1_kernel, tf1_bias, forget_bias=True):
    # Reorder the gate blocks: TF1 uses (i, c, f, o), Keras uses (i, f, c, o).
    a, b, c, d = tf.split(tf1_kernel, num_or_size_splits=4, axis=1)
    lstm_kernel = tf.concat(values=[a, c, b, d], axis=1)
    # Split the rows into input weights and recurrent weights.
    # hps.hidden_dim is the LSTM hidden size.
    kernel, recurrent_kernel = lstm_kernel[:-hps.hidden_dim], lstm_kernel[-hps.hidden_dim:]
    a, b, c, d = tf.split(tf1_bias, num_or_size_splits=4, axis=0)
    # Fold TF1's explicit forget bias (+1) into the Keras bias.
    bias = tf.concat(values=[a, c + float(forget_bias), b, d], axis=0)
    return kernel, recurrent_kernel, bias
And two differences I've found that need to be accounted for during use:
The gate activation in tf.compat.v1.nn.rnn_cell.LSTMCell is sigmoid, but in tf.keras.layers.LSTMCell it is hard sigmoid, so it needs to be set at initialization with recurrent_activation="sigmoid".
The states are returned in opposite orders.
output, (c_state_new, m_state_new) = tf.compat.v1.nn.rnn_cell.LSTMCell(hidden_size, state_is_tuple=True)(input, (c_state, m_state))
becomes
output, (h_state_new, c_state_new) = tf.keras.layers.LSTMCell(hidden_size, recurrent_activation="sigmoid")(input, (h_state, c_state))
where the hidden state is referred to by m in rnn_cell and h in keras.
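A hypothetical usage sketch (tf1_kernel and tf1_bias are placeholders for the arrays loaded from the old checkpoint or the CSV files):

# Build the Keras cell once so its weight variables exist, then overwrite them.
cell = tf.keras.layers.LSTMCell(hidden_size, recurrent_activation="sigmoid")
cell.build((None, input_units))
kernel, recurrent_kernel, bias = convert_lstm_weights(tf1_kernel, tf1_bias, forget_bias=True)
cell.set_weights([kernel.numpy(), recurrent_kernel.numpy(), bias.numpy()])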
You can split the matrix:
If you look at the Keras build method (quoted further down), the kernel has shape (input_shape[-1], self.units) and the recurrent kernel has shape (self.units, self.units).
Let's say you have 20 inputs and 128 nodes in an LSTM layer:
input_units = 20
layer_size = 128

input_placeholder = tf.placeholder(tf.float32, [None, None, input_units])
lstm_layers = [tf.nn.rnn_cell.LSTMCell(layer_size), tf.nn.rnn_cell.LSTMCell(layer_size)]
stacked = tf.contrib.rnn.MultiRNNCell(lstm_layers)
output, state = tf.nn.dynamic_rnn(stacked, input_placeholder, dtype=tf.float32)
Your trainable parameters will have these shapes:
[<tf.Variable 'rnn/multi_rnn_cell/cell_0/lstm_cell/kernel:0' shape=(148, 512) dtype=float32_ref>,
<tf.Variable 'rnn/multi_rnn_cell/cell_0/lstm_cell/bias:0' shape=(512,) dtype=float32_ref>,
<tf.Variable 'rnn/multi_rnn_cell/cell_1/lstm_cell/kernel:0' shape=(256, 512) dtype=float32_ref>,
<tf.Variable 'rnn/multi_rnn_cell/cell_1/lstm_cell/bias:0' shape=(512,) dtype=float32_ref>]
In TF 1.x, the kernel and the recurrent kernel of TF 2.0 are concatenated into a single matrix (see here):
def build(self, input_shape):
    self.kernel = self.add_weight(shape=(input_shape[-1], self.units),
                                  initializer='uniform',
                                  name='kernel')
    self.recurrent_kernel = self.add_weight(
        shape=(self.units, self.units),
        initializer='uniform',
        name='recurrent_kernel')
    self.built = True
In this new version you now have two separate weight matrices:
input_placeholder = tf.placeholder(tf.float32, [None, None, input_units])
lstm_layers = [tf.keras.layers.LSTMCell(layer_size), tf.keras.layers.LSTMCell(layer_size)]
stacked = tf.keras.layers.StackedRNNCells(lstm_layers)
output = tf.keras.layers.RNN(stacked, return_sequences=True, return_state=True, dtype=tf.float32)(input_placeholder)
Thus, your trainable parameters are:
[<tf.Variable 'rnn_1/while/stacked_rnn_cells_1/kernel:0' shape=(20, 512) dtype=float32>,
 <tf.Variable 'rnn_1/while/stacked_rnn_cells_1/recurrent_kernel:0' shape=(128, 512) dtype=float32>,
 <tf.Variable 'rnn_1/while/stacked_rnn_cells_1/bias:0' shape=(512,) dtype=float32>,
 <tf.Variable 'rnn_1/while/stacked_rnn_cells_1/kernel_1:0' shape=(128, 512) dtype=float32>,
 <tf.Variable 'rnn_1/while/stacked_rnn_cells_1/recurrent_kernel_1:0' shape=(128, 512) dtype=float32>,
 <tf.Variable 'rnn_1/while/stacked_rnn_cells_1/bias_1:0' shape=(512,) dtype=float32>]
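Putting the shapes together, a minimal numpy sketch for the first cell (assuming the gate reordering from the answer above; tf1_kernel is a placeholder for the loaded (148, 512) TF1 array):

import numpy as np

input_units, layer_size = 20, 128

# tf1_kernel: (148, 512) = (input_units + layer_size, 4 * layer_size).
# TF1 orders the gate blocks (i, c, f, o); Keras expects (i, f, c, o).
i, c, f, o = np.split(tf1_kernel, 4, axis=1)
reordered = np.concatenate([i, f, c, o], axis=1)

kernel = reordered[:input_units, :]            # (20, 512)  -> Keras 'kernel'
recurrent_kernel = reordered[input_units:, :]  # (128, 512) -> Keras 'recurrent_kernel'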