I have a Bi-LSTM model and I want to work out its computational complexity. I have read online that:
The computational complexity of learning LSTM models per weight and time step with the stochastic gradient descent (SGD) optimization technique is O(1). Therefore, the learning computational complexity per time step is O(W).
But how do I find the number of time steps in my model? My model is:
model = Sequential()
model.add(Embedding(max_words, 768, input_length=max_len, weights=[embedding]))
model.add(Dense(2, activation='softmax', use_bias=True, kernel_regularizer=regularizers.l1_l2(l1=1e-5, l2=1e-4), bias_regularizer=regularizers.l2(1e-4),
Model summary is
Model: "sequential_1"
Layer (type) Output Shape Param #
embedding_1 (Embedding) (None, 768, 768) 37147392
batch_normalization_2 (Batch (None, 768, 768) 3072
activation_2 (Activation) (None, 768, 768) 0
bidirectional_1 (Bidirection (None, 32) 100480
batch_normalization_3 (Batch (None, 32) 128
activation_3 (Activation) (None, 32) 0
dropout_1 (Dropout) (None, 32) 0
dense_1 (Dense) (None, 2) 66
Total params: 37,251,138
Trainable params: 37,249,538
Non-trainable params: 1,600
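For reference, the number of time steps of an LSTM is the sequence length it receives, which is the second dimension of its input shape; in the summary above the Bidirectional layer is fed (None, 768, 768), so that would be 768. The sketch below reproduces the layer's parameter count W from the standard LSTM formula. The value of 16 units per direction is an assumption inferred from the output width of 32, since the layer definition itself is not shown, and the function name is purely illustrative.

```python
def lstm_param_count(input_dim, units):
    """Weights of one standard LSTM layer: 4 gates, each with an
    input kernel, a recurrent kernel, and a bias vector."""
    return 4 * units * (input_dim + units + 1)


# Values read off the summary above. The 16 units per direction are an
# assumption: a Bidirectional output of 32 with concatenated directions.
time_steps = 768
input_dim = 768
units_per_direction = 16

# Two directions, each a full LSTM.
w_bilstm = 2 * lstm_param_count(input_dim, units_per_direction)
print(w_bilstm)  # 100480, matching the Param # column

# Under the quoted O(W)-per-time-step estimate, processing one sequence
# costs on the order of W * T operations for this layer.
print(w_bilstm * time_steps)
```

The same formula can be applied to the other recurrent layers in a summary to get their share of W.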
I'm trying to train my Keras model but the shapes are incompatible.
The error says
ValueError: Shapes (None, 3) and (None, 3, 3) are incompatible
My training set's shape is (2000, 3, 768) and the labels' shape is (2000, 3).
What am I doing wrong?
Model definition and fit code
input_shape = x_train.shape[1:]
model = my_dnn(input_shape, 3)
model.fit(x_train, y_train, epochs=25, verbose=1)
Model code
def my_dnn(input, num_classes):
    model = Sequential()
    model.compile( loss='categorical_crossentropy',
    return model
In addition to what's said, it seems you are carrying the second dimension of the input data until the end of the model. So your model summary is something like this:
Layer (type) Output Shape Param #
dense_1 (Dense) (None, 3, 1024) 787456
activation_1 (Activation) (None, 3, 1024) 0
dropout_1 (Dropout) (None, 3, 1024) 0
dense_2 (Dense) (None, 3, 512) 524800
activation_2 (Activation) (None, 3, 512) 0
dense_3 (Dense) (None, 3, 225) 115425
activation_3 (Activation) (None, 3, 225) 0
dense_4 (Dense) (None, 3, 100) 22600
activation_4 (Activation) (None, 3, 100) 0
dense_5 (Dense) (None, 3, 3) 303
activation_5 (Activation) (None, 3, 3) 0
Total params: 1,450,584
Trainable params: 1,450,584
Non-trainable params: 0
As you can see, the output shape of the model (None, 3, 3) is not compatible with the label's shape (None, 3), and at some point, you need to use a Flatten layer.
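The shape bookkeeping behind this can be sketched without Keras. The helper names below are hypothetical; they only mimic the rule that a Dense layer transforms the last axis while Flatten merges every non-batch axis. Placing the Flatten first is just one option chosen for illustration.

```python
from math import prod


def dense_shape(shape, units):
    """A Dense layer only transforms the last axis."""
    return shape[:-1] + (units,)


def flatten_shape(shape):
    """Flatten keeps the batch axis and merges all remaining axes."""
    return (shape[0], prod(shape[1:]))


def final_shape(input_shape, layer_units, flatten_first):
    """Propagate a shape through a stack of Dense layers."""
    shape = flatten_shape(input_shape) if flatten_first else input_shape
    for units in layer_units:
        shape = dense_shape(shape, units)
    return shape


units = [1024, 512, 225, 100, 3]  # Dense sizes from the summary above
batch_input = (None, 3, 768)      # None stands for the batch size

print(final_shape(batch_input, units, flatten_first=False))  # (None, 3, 3)
print(final_shape(batch_input, units, flatten_first=True))   # (None, 3)
```

Without the Flatten, the extra axis of size 3 survives to the output, which is the (None, 3, 3) reported in the error.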
There are two possible cases:
1. Your problem is multi-class classification, so you need softmax instead of sigmoid, with accuracy or CategoricalAccuracy() as the metric.
2. Your problem is multi-label classification, so you need binary_crossentropy and tf.keras.metrics.BinaryAccuracy().
Depending on how your dataset is built and the task you are trying to solve, you need to opt for one of those.
For case 1, ensure your labels are one-hot encoded (OHE).
Also, Marco Cerliani and Amir (in the comment below) point out that the output needs to be 2D rather than 3D: you should either preprocess the data accordingly before feeding it to the network or, as suggested in the comment below, use a Flatten() at some point (probably before the final Dense()).
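To illustrate why the two cases call for different output activations, here is a minimal sketch in plain Python. The logits are made-up numbers, and the functions are written out by hand rather than taken from Keras.

```python
import math


def softmax(logits):
    """Turn logits into probabilities that compete and sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logits):
    """Score each class independently; the results need not sum to 1."""
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]


logits = [2.0, 1.0, -1.0]  # made-up scores for 3 classes

# Multi-class: exactly one class is correct, so the scores share one unit of mass.
print(sum(softmax(logits)))

# Multi-label: every class is its own yes/no decision.
print(sigmoid(logits))
```

Softmax pairs naturally with categorical_crossentropy and one-hot labels, while independent sigmoid outputs pair with binary_crossentropy.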
Most of the examples on the Internet regarding multi-label image classification are based on just a few labels. For example, with 6 classes we get:
model = models.Sequential()
model.add(layer=layers.Dense(units=256, activation="relu"))
model.add(layer=layers.Dense(units=6, activation="sigmoid"))
Layer (type) Output Shape Param #
vgg16 (Model) (None, 7, 7, 512) 14714688
flatten_1 (Flatten) (None, 25088) 0
dense_1 (Dense) (None, 256) 6422784
dense_2 (Dense) (None, 6) 1542
Total params: 21,139,014
Trainable params: 13,503,750
Non-trainable params: 7,635,264
However, for datasets with significantly more labels, the number of trainable parameters grows and the training process eventually fails with a ResourceExhaustedError. For example, with 3047 labels we get:
model = models.Sequential()
model.add(layer=layers.Dense(units=256, activation="relu"))
model.add(layer=layers.Dense(units=3047, activation="sigmoid"))
Layer (type) Output Shape Param #
vgg16 (Model) (None, 7, 7, 512) 14714688
flatten_1 (Flatten) (None, 25088) 0
dense_1 (Dense) (None, 256) 6422784
dense_2 (Dense) (None, 3047) 783079
Total params: 21,920,551
Trainable params: 14,285,287
Non-trainable params: 7,635,264
Obviously, there is something wrong with my network, but I am not sure how to overcome this issue...
A ResourceExhaustedError is related to memory: either you don't have enough memory on your system, or some other part of the code is causing memory issues.
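As a rough check, the Dense parameter counts in the two summaries can be reproduced by hand. The sketch below (with a hypothetical helper name) suggests the 3047-unit output layer adds well under a million weights compared with the 6-unit one, so the growth in parameters alone may not account for the error.

```python
def dense_param_count(inputs, units):
    """One weight per input-unit pair, plus one bias per unit."""
    return inputs * units + units


flatten_size = 7 * 7 * 512  # 25088, the flattened VGG16 output

hidden = dense_param_count(flatten_size, 256)
small_head = dense_param_count(256, 6)
large_head = dense_param_count(256, 3047)

print(hidden)                   # 6422784
print(small_head)               # 1542
print(large_head)               # 783079
print(large_head - small_head)  # 781537
```

The difference matches the gap between the two "Total params" lines, and it is small next to the hidden Dense layer.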
Thanks in advance for your help.
I am working on a problem with sequences of 4 characters. I have around 18,000 sequences in the training set, and I am using Keras with the TensorFlow backend. The total number of possible characters to predict is 52.
When I use a network like "Network A" below, with around 490K parameters to learn, the network overfits tremendously and the validation loss keeps increasing, even within 300 epochs. Either way, the validation accuracy does not go above 20%.
When I use "Network B" below, with around 8K parameters to learn, the network does not seem to learn. Accuracy does not go over 40% on the training data, even after 3000 epochs, and stays around 10% on the validation set.
I have tried lots of configurations in between without any real success.
Do you have any recommendations?
Both cases use the following config:
rms = keras.optimizers.RMSprop(lr=0.01, rho=0.9, epsilon=None, decay=0.0)
model.compile(loss='categorical_crossentropy', optimizer=rms, metrics=['accuracy'])
Network A
Shape of input matrix:
4 1
Shape of Output:
Layer (type) Output Shape Param #
lstm_3 (LSTM) (None, 4, 256) 264192
dropout_2 (Dropout) (None, 4, 256) 0
lstm_4 (LSTM) (None, 4, 128) 197120
dropout_3 (Dropout) (None, 4, 128) 0
lstm_5 (LSTM) (None, 32) 20608
dense_1 (Dense) (None, 128) 4224
dropout_4 (Dropout) (None, 128) 0
dense_2 (Dense) (None, 57) 7353
activation_1 (Activation) (None, 57) 0
Total params: 493,497
Trainable params: 493,497
Non-trainable params: 0
"Network B"
Shape of input matrix:
4 1
Shape of Output:
Layer (type) Output Shape Param #
lstm_6 (LSTM) (None, 4, 32) 4352
dropout_5 (Dropout) (None, 4, 32) 0
lstm_7 (LSTM) (None, 16) 3136
dropout_6 (Dropout) (None, 16) 0
dense_3 (Dense) (None, 57) 969
activation_2 (Activation) (None, 57) 0
Total params: 8,457
Trainable params: 8,457
Non-trainable params: 0
I can see that your input shape is "4x1" and that you feed it directly to your LSTM. What is the format of your input? It seems that at each timestep (for each character) you have a dimension of 1, so maybe you just passed an int?
Since you are dealing with sequences of 4 characters, you have to treat them as categorical variables and encode them properly.
You could, for example, one-hot encode them, or embed them to a certain dimension using an Embedding layer.
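A minimal sketch of the one-hot option in plain Python follows. The 52-character vocabulary is assumed here to be the ASCII letters, which may not match the actual character set, and the function name is illustrative.

```python
import string

# Assumed vocabulary: 26 lowercase + 26 uppercase letters = 52 characters.
ALPHABET = string.ascii_letters
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def one_hot_sequence(seq):
    """Encode each character as a vector with a single 1 at its index."""
    encoded = []
    for ch in seq:
        row = [0] * len(ALPHABET)
        row[CHAR_TO_INDEX[ch]] = 1
        encoded.append(row)
    return encoded


sample = one_hot_sequence("abCD")
print(len(sample), len(sample[0]))  # 4 52
```

With this encoding each sample has shape (4, 52) rather than (4, 1), so the LSTM no longer sees the character index as a magnitude.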
I am taking a CNN model that is pretrained, and then trying to implement a CNN-LSTM with parallel CNNs all with the same weights from the pretraining.
# load in CNN
weightsfile = 'final_weights.h5'
modelfile = '2dcnn_model.json'
# load model architecture from json
with open(modelfile, 'r') as json_file:
    loaded_model_json = json_file.read()
fixed_cnn_model = keras.models.model_from_json(loaded_model_json)
# remove the last 2 dense FC layers and freeze it
fixed_cnn_model.trainable = False
This will produce the summary:
Layer (type) Output Shape Param #
input_1 (InputLayer) (None, 32, 32, 4) 0
conv2d_1 (Conv2D) (None, 30, 30, 32) 1184
conv2d_2 (Conv2D) (None, 28, 28, 32) 9248
conv2d_3 (Conv2D) (None, 26, 26, 32) 9248
conv2d_4 (Conv2D) (None, 24, 24, 32) 9248
max_pooling2d_1 (MaxPooling2 (None, 12, 12, 32) 0
conv2d_5 (Conv2D) (None, 10, 10, 64) 18496
conv2d_6 (Conv2D) (None, 8, 8, 64) 36928
max_pooling2d_2 (MaxPooling2 (None, 4, 4, 64) 0
conv2d_7 (Conv2D) (None, 2, 2, 128) 73856
max_pooling2d_3 (MaxPooling2 (None, 1, 1, 128) 0
flatten_1 (Flatten) (None, 128) 0
dropout_1 (Dropout) (None, 128) 0
dense_1 (Dense) (None, 512) 66048
Total params: 224,256
Trainable params: 0
Non-trainable params: 224,256
Now I will add layers on top of it, compile, and show that the non-trainable parameters all become trainable.
# create sequential model to get this all before the LSTM
# initialize loss function, Adam optimizer and metrics
loss = 'binary_crossentropy'
optimizer = keras.optimizers.Adam(lr=1e-4,
metrics = ['accuracy']
currmodel = Sequential()
currmodel.add(TimeDistributed(fixed_cnn_model, input_shape=(num_timewins, imsize, imsize, n_colors)))
currmodel.add(Dense(1024, activation='relu'))
currmodel.add(Dense(2, activation='softmax'))
currmodel = Model(inputs=currmodel.input, outputs = currmodel.output)
config = currmodel.compile(optimizer=optimizer, loss=loss, metrics=metrics)
Layer (type) Output Shape Param #
time_distributed_3_input (In (None, 5, 32, 32, 4) 0
time_distributed_3 (TimeDist (None, 5, 512) 224256
lstm_3 (LSTM) (None, 50) 112600
dropout_1 (Dropout) (None, 50) 0
dense_1 (Dense) (None, 1024) 52224
dropout_2 (Dropout) (None, 1024) 0
dense_2 (Dense) (None, 2) 2050
Total params: 391,130
Trainable params: 391,130
Non-trainable params: 0
How am I supposed to freeze the layers in this case? I am almost 100% positive that I had working code in this format in an earlier Keras version. It seems like the right approach, since you define a model and declare certain layers trainable or not.
Then you add layers, which are trainable by default. However, this seems to convert all the layers to trainable.
Try adding:
for layer in currmodel.layers[:5]:
    layer.trainable = False
First, print the layer numbers in your network:
for i, layer in enumerate(currmodel.layers):
    print(i, layer.name)
Now check which layers are trainable and which are not:
for i, layer in enumerate(currmodel.layers):
    print(i, layer.name, layer.trainable)
Now you can set the 'trainable' attribute for the layers you want. Say you want to train only the last 2 layers out of a total of 6 (the numbering starts from 0); then you can write something like this:
for layer in currmodel.layers[:5]:
    layer.trainable = False
for layer in currmodel.layers[5:]:
    layer.trainable = True
To cross-check, print again and you should see the desired settings. Recompile the model afterwards so that the change takes effect.
I have the following sequential model that works with variable length inputs:
m = Sequential()
m.add(Embedding(len(chars), 4, name="embedding"))
m.add(Bidirectional(LSTM(16, unit_forget_bias=True, name="lstm")))
Gives the following summary:
Layer (type) Output Shape Param #
embedding (Embedding) (None, None, 4) 204
bidirectional_2 (Bidirection (None, 32) 2688
dense (Dense) (None, 51) 1683
activation_2 (Activation) (None, 51) 0
Total params: 4,575
Trainable params: 4,575
Non-trainable params: 0
However, when I try to implement the same model with the functional API, whatever I try as the Input layer shape does not give the same result as the sequential model. Here is one of my attempts:
charinput = Input(shape=(4,),name="input",dtype='int32')
embedding = Embedding(len(chars), 4, name="embedding")(charinput)
lstm = Bidirectional(LSTM(16, unit_forget_bias=True, name="lstm"))(embedding)
dense = Dense(len(chars),name="dense")(lstm)
output = Activation("softmax")(dense)
And here is the summary:
Layer (type) Output Shape Param #
input (InputLayer) (None, 4) 0
embedding (Embedding) (None, 4, 4) 204
bidirectional_1 (Bidirection (None, 32) 2688
dense (Dense) (None, 51) 1683
activation_1 (Activation) (None, 51) 0
Total params: 4,575
Trainable params: 4,575
Non-trainable params: 0
Use shape=(None,) in the input layer. In your case:
charinput = Input(shape=(None,),name="input",dtype='int32')
Try adding the argument input_length=None to the Embedding layer.