Keras embedding layer with variable length in functional API - tensorflow

I have the following sequential model that works with variable length inputs:
m = Sequential()
m.add(Embedding(len(chars), 4, name="embedding"))
m.add(Bidirectional(LSTM(16, unit_forget_bias=True, name="lstm")))
m.add(Dense(len(chars),name="dense"))
m.add(Activation("softmax"))
m.summary()
This gives the following summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding (Embedding) (None, None, 4) 204
_________________________________________________________________
bidirectional_2 (Bidirection (None, 32) 2688
_________________________________________________________________
dense (Dense) (None, 51) 1683
_________________________________________________________________
activation_2 (Activation) (None, 51) 0
=================================================================
Total params: 4,575
Trainable params: 4,575
Non-trainable params: 0
However, when I try to implement the same model in the functional API, nothing I try for the Input layer shape seems to match the sequential model. Here is one of my attempts:
charinput = Input(shape=(4,),name="input",dtype='int32')
embedding = Embedding(len(chars), 4, name="embedding")(charinput)
lstm = Bidirectional(LSTM(16, unit_forget_bias=True, name="lstm"))(embedding)
dense = Dense(len(chars),name="dense")(lstm)
output = Activation("softmax")(dense)
And here is the summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input (InputLayer) (None, 4) 0
_________________________________________________________________
embedding (Embedding) (None, 4, 4) 204
_________________________________________________________________
bidirectional_1 (Bidirection (None, 32) 2688
_________________________________________________________________
dense (Dense) (None, 51) 1683
_________________________________________________________________
activation_1 (Activation) (None, 51) 0
=================================================================
Total params: 4,575
Trainable params: 4,575
Non-trainable params: 0

Use shape=(None,) in the input layer, in your case:
charinput = Input(shape=(None,),name="input",dtype='int32')

Try adding the argument input_length=None to the Embedding layer.
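Putting both suggestions together, a sketch of the full functional model (the final Model(...) line is my addition, assuming the usual imports from keras.models and keras.layers):
charinput = Input(shape=(None,), name="input", dtype='int32')
embedding = Embedding(len(chars), 4, input_length=None, name="embedding")(charinput)
lstm = Bidirectional(LSTM(16, unit_forget_bias=True, name="lstm"))(embedding)
dense = Dense(len(chars), name="dense")(lstm)
output = Activation("softmax")(dense)
model = Model(inputs=charinput, outputs=output)
model.summary()  # the embedding layer should now show (None, None, 4)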

Related

Split trained autoencoder to encoder and decoder

I realize now that implementing it like this would have been a good idea. However, I have an already trained and fine-tuned autoencoder that looks like this:
Model: "autoencoder"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
user_input (InputLayer) [(None, 5999)] 0
_________________________________________________________________
user_e00 (Dense) (None, 64) 384000
_________________________________________________________________
user_e01 (Dense) (None, 64) 4160
_________________________________________________________________
user_e02 (Dense) (None, 64) 4160
_________________________________________________________________
user_e03 (Dense) (None, 64) 4160
_________________________________________________________________
user_out (Dense) (None, 32) 2080
_________________________________________________________________
emb_dropout (Dropout) (None, 32) 0
_________________________________________________________________
user_d00 (Dense) (None, 64) 2112
_________________________________________________________________
user_d01 (Dense) (None, 64) 4160
_________________________________________________________________
user_d02 (Dense) (None, 64) 4160
_________________________________________________________________
user_d03 (Dense) (None, 64) 4160
_________________________________________________________________
user_res (Dense) (None, 5999) 389935
=================================================================
Total params: 803,087
Trainable params: 0
Non-trainable params: 803,087
_________________________________________________________________
Now I want to split it into encoder and decoder. I believe I already found the right way for the encoder, which would be:
encoder_in = model.input
encoder_out = model.get_layer(name='user_out').output
encoder = Model(encoder_in, encoder_out, name='encoder')
For the decoder I would like to do something like:
decoder_in = model.get_layer("user_d00").input
decoder_out = model.output
decoder = Model(decoder_in, decoder_out, name='decoder')
but that throws:
WARNING:tensorflow:Functional inputs must come from `tf.keras.Input` (thus holding past layer metadata), they cannot be the output of a previous non-Input layer. Here, a tensor specified as input to "decoder" was not an Input tensor, it was generated by layer emb_dropout.
Note that input tensors are instantiated via `tensor = tf.keras.Input(shape)`.
The tensor that caused the issue was: emb_dropout/cond_3/Identity:0
I believe I have to create an Input layer with the shape of the output of emb_dropout and feed it into user_d00 (the Dropout layer is no longer needed once training has ended). Does anyone know how to do this correctly?
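One common approach is exactly the idea described above: create a fresh Input and re-call the trained decoder layers on it, which reuses their weights. A minimal sketch (layer names taken from the summary above; the import paths are assumptions):
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model

# fresh Input matching the shape of emb_dropout's output (32 units)
decoder_in = Input(shape=(32,), name='decoder_in')

# chain the trained decoder layers onto the new input, skipping the Dropout
x = decoder_in
for name in ['user_d00', 'user_d01', 'user_d02', 'user_d03', 'user_res']:
    x = model.get_layer(name)(x)

decoder = Model(decoder_in, x, name='decoder')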

How to apply average pooling at each time step of LSTM output?

I'm trying to apply average pooling at each time step of the LSTM output. Please find my architecture below:
X_input = tf.keras.layers.Input(shape=(64,35))
X= tf.keras.layers.LSTM(512,activation="tanh",return_sequences=True,kernel_initializer=tf.keras.initializers.he_uniform(seed=45),kernel_regularizer=tf.keras.regularizers.l2(0.1))(X_input)
X= tf.keras.layers.LSTM(256,activation="tanh",return_sequences=True,kernel_initializer=tf.keras.initializers.he_uniform(seed=45),kernel_regularizer=tf.keras.regularizers.l2(0.1))(X)
X = tf.keras.layers.GlobalAvgPool1D()(X)
X = tf.keras.layers.Dense(128,activation="relu",kernel_initializer=tf.keras.initializers.he_uniform(seed=45),kernel_regularizer=tf.keras.regularizers.l2(0.1))(X)
X = tf.keras.layers.Dense(64,activation="relu",kernel_initializer=tf.keras.initializers.he_uniform(seed=45),kernel_regularizer=tf.keras.regularizers.l2(0.1))(X)
X = tf.keras.layers.Dense(32,activation="relu",kernel_initializer=tf.keras.initializers.he_uniform(seed=45),kernel_regularizer=tf.keras.regularizers.l2(0.1))(X)
# X = tf.keras.layers.Dense(16,activation="relu",kernel_initializer=tf.keras.initializers.he_uniform(seed=45),kernel_regularizer=tf.keras.regularizers.l2(0.1))(X)
output_layer = tf.keras.layers.Dense(10,activation='softmax', kernel_initializer=tf.keras.initializers.he_uniform(seed=45))(X)
model2 = tf.keras.Model(inputs = X_input,outputs = output_layer)
I want to take the average at each time step, not over each unit.
For example, right now I get the shape (None, 256), but I want the global average pooling layer to output the shape (None, 64). What do I need to do for that?
I am not sure this is the most efficient way, but you can try this:
X = tf.keras.layers.Reshape(target_shape=(64,256,1))(X)
X = tf.keras.layers.TimeDistributed(tf.keras.layers.GlobalAveragePooling1D())(X)
X = tf.keras.layers.Reshape(target_shape=(64,))(X)
instead of :
X = tf.keras.layers.GlobalAvgPool1D()(X)
The summary is now:
Model: "functional_13"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_14 (InputLayer) [(None, 64, 35)] 0
_________________________________________________________________
lstm_26 (LSTM) (None, 64, 512) 1122304
_________________________________________________________________
lstm_27 (LSTM) (None, 64, 256) 787456
_________________________________________________________________
reshape_2 (Reshape) (None, 64, 256, 1) 0
_________________________________________________________________
time_distributed_8 (TimeDist (None, 64, 1) 0
_________________________________________________________________
reshape_3 (Reshape) (None, 64) 0
_________________________________________________________________
dense_61 (Dense) (None, 128) 8320
_________________________________________________________________
dense_62 (Dense) (None, 64) 8256
_________________________________________________________________
dense_63 (Dense) (None, 32) 2080
_________________________________________________________________
dense_64 (Dense) (None, 10) 330
=================================================================
Total params: 1,928,746
Trainable params: 1,928,746
Non-trainable params: 0
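A simpler alternative (my own suggestion, not part of the answer above) is to average over the feature axis directly with a Lambda layer; this maps (None, 64, 256) to (None, 64) without the two Reshape layers:
# average over the 256 units at each of the 64 time steps
X = tf.keras.layers.Lambda(lambda t: tf.reduce_mean(t, axis=-1))(X)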

Select size for output vector with 1000s of labels

Most of the examples on the Internet regarding multi-label image classification are based on just a few labels. For example, with 6 classes we get:
model = models.Sequential()
model.add(layer=base)
model.add(layer=layers.Flatten())
model.add(layer=layers.Dense(units=256, activation="relu"))
model.add(layer=layers.Dense(units=6, activation="sigmoid"))
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
vgg16 (Model) (None, 7, 7, 512) 14714688
_________________________________________________________________
flatten_1 (Flatten) (None, 25088) 0
_________________________________________________________________
dense_1 (Dense) (None, 256) 6422784
_________________________________________________________________
dense_2 (Dense) (None, 6) 1542
=================================================================
Total params: 21,139,014
Trainable params: 13,503,750
Non-trainable params: 7,635,264
However, for datasets with significantly more labels, the number of training parameters explodes and eventually the training process fails with a ResourceExhaustedError. For example, with 3047 labels we get:
model = models.Sequential()
model.add(layer=base)
model.add(layer=layers.Flatten())
model.add(layer=layers.Dense(units=256, activation="relu"))
model.add(layer=layers.Dense(units=3047, activation="sigmoid"))
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
vgg16 (Model) (None, 7, 7, 512) 14714688
_________________________________________________________________
flatten_1 (Flatten) (None, 25088) 0
_________________________________________________________________
dense_1 (Dense) (None, 256) 6422784
_________________________________________________________________
dense_2 (Dense) (None, 3047) 783079
=================================================================
Total params: 21,920,551
Trainable params: 14,285,287
Non-trainable params: 7,635,264
_________________________________________________________________
Obviously, there is something wrong with my network, but I am not sure how to overcome this issue...
A ResourceExhaustedError is related to memory issues: either you don't have enough memory on your system, or some other part of the code is consuming too much of it.
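If the memory pressure comes from the model itself, one common way to shrink it (my own suggestion, a sketch, not part of the answer above) is to replace Flatten with GlobalAveragePooling2D, so the first Dense layer sees 512 inputs instead of 25088, cutting its parameters from roughly 6.4M to roughly 131K; reducing the batch size also lowers peak memory:
model = models.Sequential()
model.add(layer=base)
model.add(layer=layers.GlobalAveragePooling2D())  # (None, 7, 7, 512) -> (None, 512)
model.add(layer=layers.Dense(units=256, activation="relu"))
model.add(layer=layers.Dense(units=3047, activation="sigmoid"))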

LSTM Network not learning from sequences. Underfitting or Overfitting using Keras, TF backend

Thanks in advance for your help.
I am working on a problem with sequences of 4 characters. I have around 18,000 sequences in the training set, working with Keras with the TensorFlow backend. The total number of possible characters to predict is 52.
When I use a network like "Network A" below, with around 490K parameters to learn, the network overfits tremendously and the validation loss increases like crazy, even within 300 epochs. Either way, the validation accuracy does not reach 20%.
When I use "Network B" below, with around 8K parameters to learn, the network does not seem to learn. Accuracy does not go over 40% on the training data even after 3000 epochs, and stays around 10% on the validation set.
I have tried lots of configurations in the middle without any real success.
Do you have any recommendation?
Both cases using the following config:
rms = keras.optimizers.RMSprop(lr=0.01, rho=0.9, epsilon=None, decay=0.0)
model.compile(loss='categorical_crossentropy', optimizer=rms, metrics=['accuracy'])
Network A
Shape of input matrix: 4 x 1
Shape of output: 57
Layer (type) Output Shape Param #
=================================================================
lstm_3 (LSTM) (None, 4, 256) 264192
_________________________________________________________________
dropout_2 (Dropout) (None, 4, 256) 0
_________________________________________________________________
lstm_4 (LSTM) (None, 4, 128) 197120
_________________________________________________________________
dropout_3 (Dropout) (None, 4, 128) 0
_________________________________________________________________
lstm_5 (LSTM) (None, 32) 20608
_________________________________________________________________
dense_1 (Dense) (None, 128) 4224
_________________________________________________________________
dropout_4 (Dropout) (None, 128) 0
_________________________________________________________________
dense_2 (Dense) (None, 57) 7353
_________________________________________________________________
activation_1 (Activation) (None, 57) 0
=================================================================
Total params: 493,497
Trainable params: 493,497
Non-trainable params: 0
"Network B"
Shape of input matrix:
4 1
Shape of Output:
57
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_6 (LSTM) (None, 4, 32) 4352
_________________________________________________________________
dropout_5 (Dropout) (None, 4, 32) 0
_________________________________________________________________
lstm_7 (LSTM) (None, 16) 3136
_________________________________________________________________
dropout_6 (Dropout) (None, 16) 0
_________________________________________________________________
dense_3 (Dense) (None, 57) 969
_________________________________________________________________
activation_2 (Activation) (None, 57) 0
=================================================================
Total params: 8,457
Trainable params: 8,457
Non-trainable params: 0
I can see that your input shape is 4x1 and you feed that directly to your LSTM. What is the format of your input? It seems that at each timestep (for each character) you have a dimension of 1, so maybe you just passed an int?
Since you are dealing with sequences of 4 characters, you have to treat them as categorical variables and encode them in a proper way.
You could, for example, one-hot encode them, or embed them to a certain dimension using an Embedding layer.
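A minimal sketch of both options (the layer sizes and the X_int variable are illustrative assumptions, not from the original post):
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense
from keras.utils import to_categorical

# option 1: feed integer-encoded characters through an Embedding layer
model = Sequential()
model.add(Embedding(input_dim=52, output_dim=16, input_length=4))  # (None, 4) -> (None, 4, 16)
model.add(LSTM(32))
model.add(Dense(57, activation='softmax'))

# option 2: one-hot encode the input up front
# X_int is assumed to hold integer-encoded sequences of shape (n, 4)
X_onehot = to_categorical(X_int, num_classes=52)  # -> shape (n, 4, 52)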

Keras - Freezing A Model And Then Adding Trainable Layers

I am taking a CNN model that is pretrained, and then trying to implement a CNN-LSTM with parallel CNNs all with the same weights from the pretraining.
# load in CNN
weightsfile = 'final_weights.h5'
modelfile = '2dcnn_model.json'
# load model from json
json_file = open(modelfile, 'r')
loaded_model_json = json_file.read()
json_file.close()
fixed_cnn_model = keras.models.model_from_json(loaded_model_json)
fixed_cnn_model.load_weights(weightsfile)
# remove the last 2 dense FC layers and freeze it
fixed_cnn_model.pop()
fixed_cnn_model.pop()
fixed_cnn_model.trainable = False
print(fixed_cnn_model.summary())
This will produce the summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 32, 32, 4) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 30, 30, 32) 1184
_________________________________________________________________
conv2d_2 (Conv2D) (None, 28, 28, 32) 9248
_________________________________________________________________
conv2d_3 (Conv2D) (None, 26, 26, 32) 9248
_________________________________________________________________
conv2d_4 (Conv2D) (None, 24, 24, 32) 9248
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 12, 12, 32) 0
_________________________________________________________________
conv2d_5 (Conv2D) (None, 10, 10, 64) 18496
_________________________________________________________________
conv2d_6 (Conv2D) (None, 8, 8, 64) 36928
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 4, 4, 64) 0
_________________________________________________________________
conv2d_7 (Conv2D) (None, 2, 2, 128) 73856
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 1, 1, 128) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 128) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 128) 0
_________________________________________________________________
dense_1 (Dense) (None, 512) 66048
=================================================================
Total params: 224,256
Trainable params: 0
Non-trainable params: 224,256
_________________________________________________________________
Now I will add to it, compile, and show that all the non-trainable parameters become trainable.
# create sequential model to get this all before the LSTM
# initialize loss function, SGD optimizer and metrics
loss = 'binary_crossentropy'
optimizer = keras.optimizers.Adam(lr=1e-4,
                                  beta_1=0.9,
                                  beta_2=0.999,
                                  epsilon=1e-08,
                                  decay=0.0)
metrics = ['accuracy']
currmodel = Sequential()
currmodel.add(TimeDistributed(fixed_cnn_model, input_shape=(num_timewins, imsize, imsize, n_colors)))
currmodel.add(LSTM(units=size_mem,
                   activation='relu',
                   return_sequences=False))
currmodel.add(Dense(1024, activation='relu'))
currmodel.add(Dense(2, activation='softmax'))
currmodel = Model(inputs=currmodel.input, outputs = currmodel.output)
config = currmodel.compile(optimizer=optimizer, loss=loss, metrics=metrics)
print(currmodel.summary())
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
time_distributed_3_input (In (None, 5, 32, 32, 4) 0
_________________________________________________________________
time_distributed_3 (TimeDist (None, 5, 512) 224256
_________________________________________________________________
lstm_3 (LSTM) (None, 50) 112600
_________________________________________________________________
dropout_1 (Dropout) (None, 50) 0
_________________________________________________________________
dense_1 (Dense) (None, 1024) 52224
_________________________________________________________________
dropout_2 (Dropout) (None, 1024) 0
_________________________________________________________________
dense_2 (Dense) (None, 2) 2050
=================================================================
Total params: 391,130
Trainable params: 391,130
Non-trainable params: 0
_________________________________________________________________
How am I supposed to freeze the layers in this case? I am almost 100% positive that I had working code in this format in an earlier Keras version. It seems like this is the right direction, since you define a model and declare certain layers trainable or not.
Then you add layers, which are trainable by default. However, this seems to convert all the layers to trainable.
Try adding:
for layer in currmodel.layers[:5]:
    layer.trainable = False
First, print the layer indices in your network:
for i, layer in enumerate(currmodel.layers):
    print(i, layer.name)
Now check which layers are trainable and which are not:
for i, layer in enumerate(currmodel.layers):
    print(i, layer.name, layer.trainable)
Now you can set the 'trainable' attribute for the layers you want. Say you want to train only the last 2 layers out of a total of 6 (the numbering starts from 0); then you can write something like this:
for layer in currmodel.layers[:4]:
    layer.trainable = False
for layer in currmodel.layers[4:]:
    layer.trainable = True
To cross-check, print again and you will see the desired settings.
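One detail worth adding (my own note, not part of the answer above): in Keras, changes to the trainable attribute only take effect after the model is compiled again, so recompile after setting the flags:
# recompile so the new trainable settings take effect
currmodel.compile(optimizer=optimizer, loss=loss, metrics=metrics)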