I want to mask zeros before a Conv1D layer, but this is not supported. Do you have a solution for this problem?
Here's a piece of my code:
# Defining the neural network architecture. This contains five TDNN layers, a statistics layer, and two fully connected layers followed by the softmax output.
input_shape_data=(1865,20)
main_input = Input(shape=input_shape_data, dtype='float', name='main_input')
u = Masking(mask_value=0.)(main_input)
x1 = Conv1D(512, kernel_size=5, strides=1, padding='same',dilation_rate=1,activation='relu',kernel_initializer='glorot_uniform', bias_initializer='zeros')(u)
x1 = BatchNormalization()(x1)
Thank you in advance
Related
I am trying to write code for this architecture (Question Answering model; paper: https://www.hindawi.com/journals/cin/2019/9543490/) and I am looking for help with how to get the hidden state matrices Hq and Ha from the stacked BiLSTM layers. Could someone please advise?
# Creating Embedding Layer for Query
# Considered fixed length as 40 for both question and answer as per research paper
embedding_layer1 = layers.Embedding(vocab_size_query, 300, weights=[embedding_matrix_query], input_length=40, trainable=False)
input_text1 =Input(shape=(40,), name="input_text")
x = embedding_layer1(input_text1)
# Creating Bidirectional layer for Query
# Each word in the context and question should be made aware of the nearby words. We use a bidirectional recurrent neural network (LSTMs) here.
x = Bidirectional(LSTM(128,recurrent_dropout=0.5,kernel_regularizer=regularizers.l2(0.001),return_sequences=True))(x)
x = Bidirectional(LSTM(128,recurrent_dropout=0.5,kernel_regularizer=regularizers.l2(0.001),return_sequences=True))(x)
flatten_1 = Flatten()(x)
# Creating Embedding Layer for Passage
embedding_layer2 = layers.Embedding(vocab_size_answer, 300, weights=[embedding_matrix_answer], input_length=40, trainable=False)
input_text2 = Input(shape=(40,), name="input_text2")  # layer names must be unique, so this input gets its own name
x2 = embedding_layer2(input_text2)
# Creating Bidirectional layer for Passage
x2 = Bidirectional(LSTM(128,recurrent_dropout=0.5,kernel_regularizer=regularizers.l2(0.001),return_sequences=True))(x2)
x2 = Bidirectional(LSTM(128,recurrent_dropout=0.5,kernel_regularizer=regularizers.l2(0.001),return_sequences=True))(x2)
flatten_2 = Flatten()(x2)
According to the model structure and your source code, you can obtain Hq and Ha by extracting the outputs of the flatten_1 and flatten_2 layers. To extract the output of an intermediate layer, you can create a new model whose input is the original model's input and whose output is the desired layer's output.
from tensorflow.keras.models import Model
model = ... # create the original model
layer_name = 'my_layer'
intermediate_layer_model = Model(inputs=model.input,
                                 outputs=model.get_layer(layer_name).output)
intermediate_output = intermediate_layer_model.predict(data)
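Applied to the code above, a minimal sketch might look like this (query_data and answer_data are hypothetical names for your padded index arrays of shape (n_samples, 40); Hq and Ha are taken here as the flattened BiLSTM outputs):
from tensorflow.keras.models import Model

# Build a model whose outputs are the two flatten tensors defined above.
hq_ha_model = Model(inputs=[input_text1, input_text2],
                    outputs=[flatten_1, flatten_2])
# predict returns one array per output
Hq, Ha = hq_ha_model.predict([query_data, answer_data])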
This is the model I made for my deep learning project, and I am getting decent accuracy out of it. My question is: if I froze the weights of the initial model (my VGG19 base model), how did I manage to train the whole model? Also, after adding the VGG19 base with its layers frozen, I got better results than I achieved with only a few CNN layers. Could it be because the pre-trained VGG19 weights were feeding into my CNN layers?
img_h=224
img_w=224
initial_model = applications.vgg19.VGG19(weights='imagenet', include_top=False,input_shape = (img_h,img_w,3))
last = initial_model.output
for layer in initial_model.layers:
    layer.trainable = False
x = Conv2D(128, kernel_size=3, strides=1, activation='relu')(last)
x = Conv2D(64, kernel_size=3, strides=1, activation='relu')(x)
x = Flatten()(x)
x = Dense(512, activation='relu')(x)
x = Dense(256, activation='relu')(x)
x = Dense(128, activation='relu')(x)
x = Dropout(0.1)(x)
preds = Dense(2, activation='sigmoid')(x)
"Freezing the layers" just means you don't update the weights on those layers when you backpropagate the error. Therefore, you'll just update the weights on those layers that are not frozen, which enables your neural net to learn.
You are adding some layers after VGG. I don't know if this is a common approach, but it totally makes sense that it kind of works, assuming you are interpreting your metrics right.
Your VGG has already been pre-trained on ImageNet, so it's a pretty good baseline for many use-cases. You are basically using VGG as your encoder. Then, on the output of this encoder (which we can call latent representation of your input), you train a neural net.
I would also try out more mainstream transfer learning techniques, where you gradually unfreeze layers starting from the end, or use a gradually smaller learning rate.
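For instance, here is a minimal sketch of gradual unfreezing with a smaller learning rate (assuming the tf.keras API, that the model above was built as model = Model(initial_model.input, preds), and that the number of layers to unfreeze, the learning rate, and the loss are tuning choices rather than prescriptions):
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

model = Model(initial_model.input, preds)
# ... first train the new head with the VGG19 base frozen, as in the code above ...

# Then unfreeze the last few VGG19 layers and fine-tune with a much smaller learning rate.
for layer in initial_model.layers[-4:]:
    layer.trainable = True

model.compile(optimizer=Adam(learning_rate=1e-5),
              loss='binary_crossentropy',
              metrics=['accuracy'])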
Suppose that I have a model like this (this is a model for time series forecasting):
ipt = Input((data.shape[1] ,data.shape[2])) # 1
x = Conv1D(filters = 10, kernel_size = 3, padding = 'causal', activation = 'relu')(ipt) # 2
x = LSTM(15, return_sequences = False)(x) # 3
x = BatchNormalization()(x) # 4
out = Dense(1, activation = 'relu')(x) # 5
Now I want to add a batch normalization layer to this network. Considering that batch normalization reportedly doesn't work well with LSTM, can I add it before the Conv1D layer? I also think it's reasonable to have a batch normalization layer after the LSTM.
Also, where can I add Dropout in this network? In the same places (before or after batch normalization)?
What about adding AveragePooling1D between Conv1D and LSTM? In that case, is it possible to add batch normalization between Conv1D and AveragePooling1D without any effect on the LSTM layer?
Update: the LayerNormalization implementation I was using was inter-layer, not recurrent as in the original paper; results with the latter may prove superior.
BatchNormalization can work with LSTMs - the linked SO gives false advice; in fact, in my application of EEG classification, it dominated LayerNormalization. Now to your case:
"Can I add it before Conv1D"? Don't - instead, standardize your data beforehand, else you're employing an inferior variant to do the same thing
Try both: BatchNormalization before an activation, and after - apply to both Conv1D and LSTM
If your model is exactly as you show it, BN after the LSTM may be counterproductive, as it can introduce noise that confuses the classifier layer - but this is about being one layer before the output, not about the LSTM itself
If you aren't using stacked LSTM with return_sequences=True preceding return_sequences=False, you can place Dropout anywhere - before LSTM, after, or both
Spatial Dropout: drop units / channels instead of random activations (see bottom); it was shown to be more effective at reducing coadaptation in CNNs in a paper by LeCun et al., with ideas applicable to RNNs. It can considerably increase convergence time, but also improve performance
recurrent_dropout is still preferable to Dropout for LSTM - however, you can do both; just do not use it with activation='relu', for which LSTM is unstable per a bug
For data of your dimensionality, any sort of Pooling is redundant and may harm performance; scarce data is better transformed via a non-linearity than simple averaging ops
I strongly recommend a SqueezeExcite block after your Conv; it's a form of self-attention - see paper; my implementation for 1D below
I also recommend trying activation='selu' with AlphaDropout and 'lecun_normal' initialization, per paper Self Normalizing Neural Networks
Disclaimer: above advice may not apply to NLP and embed-like tasks
Below is an example template you can use as a starting point; I also recommend the following SO's for further reading: Regularizing RNNs, and Visualizing RNN gradients
from keras.layers import Input, Dense, LSTM, Conv1D, Activation
from keras.layers import AlphaDropout, BatchNormalization
from keras.layers import GlobalAveragePooling1D, Reshape, multiply
from keras.models import Model
import keras.backend as K
import numpy as np
def make_model(batch_shape):
    ipt = Input(batch_shape=batch_shape)
    x = ConvBlock(ipt)
    x = LSTM(16, return_sequences=False, recurrent_dropout=0.2)(x)
    # x = BatchNormalization()(x)  # may or may not work well
    out = Dense(1, activation='relu')(x)

    model = Model(ipt, out)
    model.compile('nadam', 'mse')
    return model

def make_data(batch_shape):  # toy data
    return (np.random.randn(*batch_shape),
            np.random.uniform(0, 2, (batch_shape[0], 1)))

batch_shape = (32, 21, 20)
model = make_model(batch_shape)
x, y = make_data(batch_shape)
model.train_on_batch(x, y)
Functions used:
def ConvBlock(_input):  # cleaner code
    x = Conv1D(filters=10, kernel_size=3, padding='causal', use_bias=False,
               kernel_initializer='lecun_normal')(_input)
    x = BatchNormalization(scale=False)(x)
    x = Activation('selu')(x)
    x = AlphaDropout(0.1)(x)
    out = SqueezeExcite(x)
    return out

def SqueezeExcite(_input, r=4):  # r == "reduction factor"; see paper
    filters = K.int_shape(_input)[-1]
    se = GlobalAveragePooling1D()(_input)
    se = Reshape((1, filters))(se)
    se = Dense(filters//r, activation='relu', use_bias=False,
               kernel_initializer='he_normal')(se)
    se = Dense(filters, activation='sigmoid', use_bias=False,
               kernel_initializer='he_normal')(se)
    return multiply([_input, se])
Spatial Dropout: pass noise_shape=(batch_size, 1, channels) to Dropout - this drops whole channels (the dropout mask is shared across all timesteps) rather than random individual activations; see the Git gist for code.
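A minimal sketch of that idea within the template above (the 0.2 rate is an assumption, and 10 matches the Conv1D filter count; Keras also ships SpatialDropout1D, which has the same effect):
from keras.layers import Dropout, SpatialDropout1D

# Inside ConvBlock, drop whole channels instead of individual activations:
x = Dropout(0.2, noise_shape=(None, 1, 10))(x)   # mask is shared across all timesteps
# equivalently:
x = SpatialDropout1D(0.2)(x)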
I'm trying to use a CNN-LSTM network with Keras in order to analyze videos. I read about it and ran into the TimeDistributed wrapper and some examples.
Actually, I tried the network described below, which is composed of convolutional and pooling layers followed by recurrent and dense layers.
model = Sequential()
model.add(TimeDistributed(Conv2D(2, (2,2), activation= 'relu' ), input_shape=(None, IMG_SIZE, IMG_SIZE, 3)))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(50))
model.add(Dense(50, activation = 'softmax'))
model.compile(loss = 'categorical_crossentropy' , optimizer = 'adam' , metrics = ['acc'])
I haven't properly tested the model, since my dataset is too small. However, during the training process the network reaches an accuracy of 0.98 in 4-5 epochs (perhaps it is overfitting, but that isn't a problem yet because I hope to get a suitable dataset later).
Then I read about how to use a pretrained convolutional network (MobileNet, ResNet or Inception) as a feature extractor for the LSTM network, so I use the following code:
inputs = Input(shape = (frames, IMG_SIZE, IMG_SIZE, 3))
cnn_base = InceptionV3(include_top = False, weights='imagenet', input_shape = (IMG_SIZE, IMG_SIZE, 3))
cnn_out = GlobalAveragePooling2D()(cnn_base.output)
cnn = Model(inputs=cnn_base.input, outputs=cnn_out)
encoded_frames = TimeDistributed(cnn)(inputs)
encoded_sequence = LSTM(256)(encoded_frames)
hidden_layer = Dense(1024, activation="relu")(encoded_sequence)
outputs = Dense(50, activation="softmax")(hidden_layer)
model = Model([inputs], outputs)
In this case, when training the model it always shows an accuracy of ~0.02 (the chance baseline for 50 classes, i.e. 1/50).
Since the first model at least learned something, I am wondering if there is any error in the way the network is built in the second case.
Has anybody faced this situation? Any advice?
Thank you.
The reason is that you have a very small amount of data and are retraining the complete Inception V3 weights. Either you have to train the model with more data, OR train the model for more epochs with hyperparameter tuning. You can find more about hyperparameter tuning here.
The ideal way is to freeze the base model with base_model.trainable = False and train only the new layers that you have added on top of the Inception V3 layers.
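For example, a minimal sketch of that option applied to the model in the question, where cnn_base is the InceptionV3 instance (the optimizer and loss are carried over from the first model in the question):
# Freeze the Inception V3 base before compiling; only the new LSTM/Dense layers stay trainable.
cnn_base.trainable = False   # in tf.keras this propagates to all nested layers
# (equivalently: for layer in cnn_base.layers: layer.trainable = False)

model = Model([inputs], outputs)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])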
OR
Unfreeze the top layers of the base model (the Inception V3 layers) and set the bottom layers to be non-trainable. You can do it as below -
# Let's take a look to see how many layers are in the base model
print("Number of layers in the base model: ", len(base_model.layers))
# Fine-tune from this layer onwards
fine_tune_at = 100
# Freeze all the layers before the `fine_tune_at` layer
for layer in base_model.layers[:fine_tune_at]:
    layer.trainable = False
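After changing any trainable flags, re-compile the model so the change takes effect; a smaller learning rate is commonly used for fine-tuning (the value below is an assumption to tune):
from tensorflow.keras.optimizers import Adam

# Re-compile so the new trainable configuration is picked up.
model.compile(optimizer=Adam(learning_rate=1e-5),
              loss='categorical_crossentropy',
              metrics=['acc'])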
Let us say that I build an extremely simple CNN with Keras to classify vectors.
My input (X_train) is a matrix in which each row is a vector and each column is a feature. My input labels (y_train) form a matrix where each row is a one-hot encoded vector. This is a binary classifier.
My CNN is built as follows:
model = Sequential()
model.add(Conv1D(64,3))
model.add(Activation('relu'))
model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dense(2))
model.add(Activation('sigmoid'))
model.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
model.fit(X_train,y_train,batch_size = 32)
But when I try to run this code, I get back this error message:
Input 0 is incompatible with layer conv1d_23: expected ndim=3, found ndim=2
Why would Keras expect 3 dims? One dim for samples, and one for features. And more importantly, how can I fix this?
X_train is supposed to have the shape (batch_size, steps, input_dim), see the documentation. It seems like you are missing one of the dimensions.
I would guess input_dim in your case is 1 and that is why it is missing. If so, change the model.fit line to
model.fit(tf.expand_dims(X_train,-1), y_train,batch_size = 32)
Your code is not a minimal working example, so I am not able to verify if that is the only problem, but this should hopefully fix your current error message.
A Conv1D layer expects an input with shape (samples, width, channels), so this does not match your input data, producing an error.
The convolution operation is done along the width dimension, so assuming you want to do the convolution over what you call features, you should reshape your data to add a dummy channels dimension with a value of one:
X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))
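Putting the pieces together, a minimal sketch of the corrected setup (input_shape is added for clarity, though Keras can also infer it at fit time; the layer sizes are kept from the question):
from keras.models import Sequential
from keras.layers import Conv1D, Activation, Flatten, Dense

# Add a dummy channels dimension: (samples, features) -> (samples, features, 1)
X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))

model = Sequential()
model.add(Conv1D(64, 3, input_shape=(X_train.shape[1], 1)))
model.add(Activation('relu'))
model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dense(2))
model.add(Activation('sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=32)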