How to add a few layers before the model in transfer learning with tensorflow

I am trying to use transfer learning in TensorFlow. I know the high-level paradigm:
base_model=MobileNet(weights='imagenet',include_top=False) #imports the mobilenet model and discards the last 1000-neuron layer
x=base_model.output
x=GlobalAveragePooling2D()(x)
x=Dense(1024,activation='relu')(x) #we add dense layers so that the model can learn more complex functions and classify for better results.
x=Dense(1024,activation='relu')(x) #dense layer 2
x=Dense(512,activation='relu')(x) #dense layer 3
preds=Dense(120,activation='softmax')(x) #final layer with softmax activation
and then one assembles the model with
model=Model(inputs=base_model.input,outputs=preds)
However, I want there to be a few other layers before base_model.input. I want to add adversarial noise to the incoming images, and a few other things. So effectively I want to know how to do something like:
base_model=MobileNet(weights='imagenet',include_top=False) #imports the mobilenet model and discards the last 1000-neuron layer
x = somerandomelayers(x_in)
base_model.input = x_in
x=base_model.output
x=GlobalAveragePooling2D()(x)
x=Dense(1024,activation='relu')(x) #we add dense layers so that the model can learn more complex functions and classify for better results.
x=Dense(1024,activation='relu')(x) #dense layer 2
x=Dense(512,activation='relu')(x) #dense layer 3
preds=Dense(120,activation='softmax')(x) #final layer with softmax activation
model=Model(inputs=x_in,outputs=preds)
but the line base_model.input = x_in is apparently not the way to do it, as it throws an AttributeError: can't set attribute. How do I go about achieving the desired behavior?

You need to define an input layer. It's rather straightforward; just be sure to set the right shapes. For example, you can use any predefined model from Keras:
base_model = keras.applications.any_model(...)
input_layer = keras.layers.Input(shape)
x = keras.layers.Layer(...)(input_layer)
...
x = base_model(x)
...
output = keras.layers.Dense(num_classes, activation)(x)
model = keras.Model(inputs=input_layer, outputs=output)
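To make that concrete for the original question, here is a minimal runnable sketch, assuming 224x224 RGB inputs and using GaussianNoise as a stand-in for the custom adversarial-noise layers (any layers you like can go in its place):
from tensorflow.keras.applications import MobileNet
from tensorflow.keras.layers import Input, GaussianNoise, GlobalAveragePooling2D, Dense
from tensorflow.keras.models import Model

base_model = MobileNet(weights='imagenet', include_top=False)

x_in = Input(shape=(224, 224, 3))
x = GaussianNoise(0.1)(x_in)  # placeholder for your noise/preprocessing layers (active only during training)
x = base_model(x)             # call the base model on the new tensor instead of reassigning its input
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
x = Dense(1024, activation='relu')(x)
x = Dense(512, activation='relu')(x)
preds = Dense(120, activation='softmax')(x)
model = Model(inputs=x_in, outputs=preds)
The key point is that base_model is called like a layer on the new tensor; its input attribute is never reassigned.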

Related

tf.keras Functional model gives different results on the same data

I have defined my Functional model like this:
base_model = VGG16(include_top=False, input_shape=(224,224,3), pooling='avg')
inputs = tf.keras.Input(shape=(224,224,3))
x = preprocess_input(inputs)
x = base_model(x, training=False)
x = tf.keras.layers.Dropout(0.2)(x, training=True)
outputs = tf.keras.layers.Dense(1, activation='sigmoid')(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
The problem is that when I call .evaluate() or .predict() I get slightly different results every time, even when using the exact same batch (with shuffle=False in my dataset, and all the random seeds initialized).
I tried reconstructing the model without some of the layers, and I found the culprit to be the two layers constructed by the line x=preprocess_input(inputs), which give randomness to the results:
(model summary image)
Note: preprocess_input is the VGG16 preprocessing function, tf.keras.applications.vgg16.preprocess_input.
However, if I redefine my Functional model as a Sequential one:
new_model = tf.keras.Sequential()
new_model.add(model.layers[0]) #input layer
new_model.add(tf.keras.layers.Lambda(preprocess_input))
new_model.add(model.layers[3]) #vgg16
new_model.add(model.layers[4]) #dropout
new_model.add(model.layers[5]) #dense
The problem is gone and I get consistent results from .evaluate() or .predict().
What could potentially cause the Functional model to behave like this?
EDIT
As xdurch0 pointed out, it was the Dropout layer that caused the differing results: because it was called with training=True, the Functional model applied dropout even during the .predict() and .evaluate() methods.
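For reference, a sketch of the corrected lines: dropping the training=True flag leaves Dropout active only during fit(), which makes .predict() and .evaluate() deterministic again.
x = base_model(x, training=False)
x = tf.keras.layers.Dropout(0.2)(x)  # no training=True: dropout is skipped at inference
outputs = tf.keras.layers.Dense(1, activation='sigmoid')(x)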

Should feature embeddings be taken before or after dropout layer in neural network?

I am training a binary text classification model using BERT as follows:
def create_model():
    text_input = tf.keras.layers.Input(shape=(), dtype=tf.string, name='text')
    preprocessed_text = bert_preprocess(text_input)
    outputs = bert_encoder(preprocessed_text)
    # Neural network layers
    l1 = tf.keras.layers.Dropout(0.1, name="dropout")(outputs['pooled_output'])
    l2 = tf.keras.layers.Dense(1, activation='sigmoid', name="output")(l1)
    # Use inputs and outputs to construct a final model
    model = tf.keras.Model(inputs=[text_input], outputs=[l2])
    return model
This code is borrowed from the example on tfhub: https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4.
I want to extract feature embeddings from the penultimate layer and use them for comparison, clustering, visualization, etc. between examples. Should this be done before dropout (l1 in the model above) or after dropout (l2 in the model above)?
I am trying to figure out whether this choice makes a significant difference, or is it fine either way? For example, if I extract feature embeddings after dropout and compute feature similarities between two examples, this might be affected by which nodes are randomly set to 0 (but perhaps this is okay).
In order to answer your question, let's recall how a Dropout layer works:
The Dropout layer is usually used as a means to mitigate overfitting. Suppose two layers, A and B, are connected through a Dropout layer. Then during the training phase, neurons in layer A are being randomly dropped. That prevents layer B from becoming too dependent upon specific neurons in layer A, as these neurons are not always available. Therefore, layer B has to take into consideration the overall signal coming from layer A, and (hopefully) cannot cling to some noise which is specific to the training set.
An important point to note is that the Dropout mechanism is activated only during the training phase. While predicting, Dropout does nothing.
If I understand you correctly, you want to know whether to take the features before or after the Dropout (note that in your network, l1 denotes the features after Dropout has been applied). If so, I would take the features before Dropout, because technically it does not really matter (Dropout is inactive during prediction), and it is the more reasonable choice (Dropout is not meaningful without a following layer).
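As a sketch of how to grab those pre-Dropout features from the model in the question (assuming create_model, bert_preprocess, and bert_encoder as defined above), you can build a sub-model that outputs the input tensor of the named Dropout layer:
model = create_model()
# The Dropout layer's *input* is the pooled BERT output, i.e. the features before dropout
feature_extractor = tf.keras.Model(
    inputs=model.input,
    outputs=model.get_layer("dropout").input)
embeddings = feature_extractor(tf.constant(["an example sentence"]))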

How can I properly train a model to predict a moving average using LSTM in keras?

I'm learning how to train RNN models in Keras, and I was expecting that training a model to predict the moving average of the last N steps would be quite easy.
I have a time series with thousands of steps and I'm able to create a model and train it with batches of data.
If I train it with the following model, though, the test-set predictions differ a lot from the real values (batch = 30, moving-average window = 10):
inputs = tf.keras.Input(shape=(batch_length, num_features))
x = tf.keras.layers.LSTM(10, return_sequences=False)(inputs)
outputs = tf.keras.layers.Dense(num_labels)(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs, name="test_model")
To get good predictions, I need to add another TimeDistributed layer, getting 2D predictions instead of 1D ones (one prediction per time step):
inputs = tf.keras.Input(shape=(batch_length, num_features))
x = tf.keras.layers.LSTM(10, return_sequences=True)(inputs)
x = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(num_labels))(x)
outputs = tf.keras.layers.Dense(num_labels)(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs, name="test_model")
If your goal is to take the last 10 timesteps as input and predict their moving average, I suggest trying a regressor model with densely connected layers rather than an RNN (a linear activation with regularization might work well enough). That option would be cheaper to train and run than an LSTM.
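A minimal sketch of such a regressor, assuming the input is a flat window of the last 10 values (the window size and regularization strength here are illustrative):
window = 10  # moving-average window; also the input size
inputs = tf.keras.Input(shape=(window,))
outputs = tf.keras.layers.Dense(
    1, activation='linear',
    kernel_regularizer=tf.keras.regularizers.l2(1e-4))(inputs)
model = tf.keras.Model(inputs=inputs, outputs=outputs, name="ma_regressor")
model.compile(optimizer='adam', loss='mse')
A single linear layer can represent a 10-step moving average exactly (all weights equal to 1/10 and zero bias), so this model contains the target function by construction.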

I want to build 2 copies of a decoder containing the same FC layers but with different dropout, and I want them to share weights. How can this be done?

I want to build 2 copies of the decoder of a model with different dropouts, but the layers should share their weights. How can this be achieved with Keras?
I know how to share a layer with the Keras API (https://keras.io/getting-started/functional-api-guide/#shared-layers), but I want 2 sets of layers: they should have different dropouts yet common weights.
I want to achieve this architecture.
      Conv
      Pool
     /    \
dropout1  dropout2
  FC1       FC2
softmax1  softmax2
     \    /
      out
This is easy with the Keras Functional API. I assume you want to share weights between FC1 and FC2:
pool_out = SomePoolingLayer()(input_tensor)
shared_fc = Dense(neurons, activation='softmax')
drop1 = Dropout(0.5)(pool_out)
drop2 = Dropout(0.5)(pool_out)
fc1 = shared_fc(drop1)
fc2 = shared_fc(drop2)
out = somehow_merge()([fc1, fc2])
somehow_merge can be any merge layer, such as concatenate or average.
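Put together as a runnable sketch (the input shape, layer sizes, and dropout rates are illustrative; Average is one possible merge):
from tensorflow.keras.layers import (Input, Conv2D, MaxPooling2D, Flatten,
                                     Dropout, Dense, Average)
from tensorflow.keras.models import Model

inp = Input(shape=(32, 32, 3))
x = Conv2D(16, 3, activation='relu')(inp)
x = MaxPooling2D()(x)
x = Flatten()(x)

shared_fc = Dense(64, activation='relu')          # one FC layer object, reused by both branches
shared_softmax = Dense(10, activation='softmax')  # shared softmax layer

drop1 = Dropout(0.3)(x)   # branch 1: its own dropout rate
drop2 = Dropout(0.6)(x)   # branch 2: a different rate, same weights downstream
out1 = shared_softmax(shared_fc(drop1))
out2 = shared_softmax(shared_fc(drop2))

out = Average()([out1, out2])
model = Model(inputs=inp, outputs=out)
Because shared_fc and shared_softmax are single layer objects called twice, both branches read and update the same weights during training.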

MLP to initialize LSTM cell state in Keras

Can we use the output of an MLP as the cell state of an LSTM network, and train the MLP too with backpropagation?
This is similar to image captioning with a CNN & LSTM, where the output of the CNN is flattened and used as the initial hidden/cell state, and the stacked network is trained end to end, so that even the CNN part is updated through backpropagation.
I tried an architecture in Keras to achieve the same. Please find the code here.
But the weights of the MLP are not being updated. I understand this is more straightforward in TensorFlow, where we can explicitly specify which parameters to update with the loss, but can anyone help me with the Keras API?
Yes, we can. Simply pass the MLP output as the initial state. Remember that an LSTM has two hidden states, h and c. You can read more about this here. Note that you also do not have to create multiple Keras models; you can simply connect all the layers:
# define mlp
mlp_inp = Input(batch_shape=(batch_size, hidden_state_dim))
mlp_dense = Dense(hidden_state_dim, activation='relu')(mlp_inp)
## Define LSTM model
lstm_inp = Input(batch_shape=(batch_size, seq_len, inp_size))
lstm_layer = LSTM(lstm_dim)(lstm_inp, initial_state=[mlp_dense, mlp_dense])  # hidden_state_dim must equal lstm_dim for the state shapes to match
lstm_out = Dense(10,activation='softmax')(lstm_layer)
model = Model([mlp_inp, lstm_inp], lstm_out)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=['accuracy'])
However, because the LSTM has two states, you may want to consider a separate MLP layer for each initial state:
# define mlp
mlp_inp = Input(batch_shape=(batch_size, hidden_state_dim))
mlp_dense_h = Dense(hidden_state_dim, activation='relu')(mlp_inp)
mlp_dense_c = Dense(hidden_state_dim, activation='relu')(mlp_inp)
## Define LSTM model
lstm_inp = Input(batch_shape=(batch_size, seq_len, inp_size))
lstm_layer = LSTM(lstm_dim)(lstm_inp, initial_state=[mlp_dense_h,mlp_dense_c])
lstm_out = Dense(10,activation='softmax')(lstm_layer)
model = Model([mlp_inp, lstm_inp], lstm_out)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=['accuracy'])
Also, note that when you go about saving this model, use save_weights instead of save, because save cannot handle the initial-state passing.
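A short usage sketch of that saving pattern (the file name is illustrative): save only the weights, then rebuild the architecture with the same code before loading them back.
model.save_weights('mlp_lstm_weights.h5')
# ... later: rebuild the same architecture with the code above, then restore the weights
model.load_weights('mlp_lstm_weights.h5')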