Obtaining the output of an intermediate layer in TensorFlow/Keras

I'm trying to obtain the output of an intermediate layer in Keras. Following is my code:
XX = model.input # Keras Sequential() model object
YY = model.layers[0].output
F = K.function([XX], [YY]) # K refers to keras.backend
Xaug = X_train[:9]
Xresult = F([Xaug.astype('float32')])
Running this, I got an error:
InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'dropout_1/keras_learning_phase' with dtype bool
I came to know that, because I'm using a dropout layer in my model, I have to pass a learning_phase() flag to my function, as per the Keras documentation.
I changed my code to the following:
XX = model.input
YY = model.layers[0].output
F = K.function([XX, K.learning_phase()], [YY])
Xaug = X_train[:9]
Xresult = F([Xaug.astype('float32'), 0])
Now I'm getting a new error that I'm unable to figure out:
TypeError: Cannot interpret feed_dict key as Tensor: Can not convert a int into a Tensor.
Any help would be appreciated.
PS : I'm new to TensorFlow and Keras.
Edit 1:
Following is the complete code that I'm using. I'm using a Spatial Transformer Network, as discussed in this NIPS paper, with its Keras implementation here:
input_shape = X_train.shape[1:]
# initial weights
b = np.zeros((2, 3), dtype='float32')
b[0, 0] = 1
b[1, 1] = 1
W = np.zeros((100, 6), dtype='float32')
weights = [W, b.flatten()]
locnet = Sequential()
locnet.add(Convolution2D(64, (3, 3), input_shape=input_shape, padding='same'))
locnet.add(Activation('relu'))
locnet.add(Convolution2D(64, (3, 3), padding='same'))
locnet.add(Activation('relu'))
locnet.add(MaxPooling2D(pool_size=(2, 2)))
locnet.add(Convolution2D(128, (3, 3), padding='same'))
locnet.add(Activation('relu'))
locnet.add(Convolution2D(128, (3, 3), padding='same'))
locnet.add(Activation('relu'))
locnet.add(MaxPooling2D(pool_size=(2, 2)))
locnet.add(Convolution2D(256, (3, 3), padding='same'))
locnet.add(Activation('relu'))
locnet.add(Convolution2D(256, (3, 3), padding='same'))
locnet.add(Activation('relu'))
locnet.add(MaxPooling2D(pool_size=(2, 2)))
locnet.add(Dropout(0.5))
locnet.add(Flatten())
locnet.add(Dense(100))
locnet.add(Activation('relu'))
locnet.add(Dense(6, weights=weights))
model = Sequential()
model.add(SpatialTransformer(localization_net=locnet,
                             output_size=(128, 128), input_shape=input_shape))
model.add(Convolution2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Convolution2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Convolution2D(128, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Convolution2D(128, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Convolution2D(256, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Convolution2D(256, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Convolution2D(256, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Convolution2D(256, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dense(num_classes))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
#==============================================================================
# Start Training
#==============================================================================
#define training results logger callback
csv_logger = keras.callbacks.CSVLogger(training_logs_path+'.csv')
model.fit(X_train, y_train,
          batch_size=batch_size,
          epochs=20,
          validation_data=(X_valid, y_valid),
          shuffle=True,
          callbacks=[SaveModelCallback(), csv_logger])
#==============================================================================
# Visualize what Transformer layer has learned
#==============================================================================
XX = model.input
YY = model.layers[0].output
F = K.function([XX, K.learning_phase()], [YY])
Xaug = X_train[:9]
Xresult = F([Xaug.astype('float32'), 0])
# input images
for i in range(9):
    plt.subplot(3, 3, i + 1)
    plt.imshow(np.squeeze(Xaug[i]))
    plt.axis('off')
# output of the SpatialTransformer layer for the same images
for i in range(9):
    plt.subplot(3, 3, i + 1)
    plt.imshow(np.squeeze(Xresult[0][i]))
    plt.axis('off')

The easiest way is to create a new model in Keras, without calling the backend directly. You'll need the functional Model API for this:
from keras.models import Model
XX = model.input
YY = model.layers[0].output
new_model = Model(XX, YY)
Xaug = X_train[:9]
Xresult = new_model.predict(Xaug)
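If you'd rather select the layer by name than by index, the same pattern works with get_layer (the layer name below is a hypothetical placeholder; check model.summary() for the real one). Note that model.predict runs in the test phase, so the dropout learning-phase flag is handled automatically:
layer_name = 'spatial_transformer_1'  # hypothetical name -- see model.summary()
intermediate_model = Model(model.input, model.get_layer(layer_name).output)
features = intermediate_model.predict(X_train[:9].astype('float32'))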

You could try:
model1 = tf.keras.models.Sequential(base_model.layers[:1])
model2 = tf.keras.models.Sequential(base_model.layers[1:])
Xaug = X_train[:9]
out = model1(Xaug)
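As a quick sanity check (a sketch, assuming all layers behave deterministically at inference), feeding the intermediate output through the second half should reproduce the full model's prediction:
out2 = model2(out)  # continue through the remaining layers
# np.allclose(out2.numpy(), base_model(Xaug).numpy()) should then hold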

Related

Is it possible/how to add new convolution layers to pre-trained model for transfer learning?

I'm trying to reuse a pre-trained model and add some new convolution layers. The pre-trained classifier is also replaced.
1. The pre-trained model looks like this:
input_shape = X_train.shape[1:] #(224, 224, 3)
num_classes = 500
model = Sequential()
model.add(Conv2D(32, kernel_size = (3, 3), input_shape=input_shape, activation='relu', padding = 'same',kernel_initializer='glorot_normal'))
model.add(Conv2D(32, kernel_size = (3, 3), activation = 'relu', padding = 'same',kernel_initializer='glorot_normal'))
model.add(MaxPool2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Conv2D(64, kernel_size = (3, 3), activation = 'relu', padding = 'same',kernel_initializer='glorot_normal'))
model.add(Conv2D(64, kernel_size = (3, 3), activation = 'relu', padding = 'same',kernel_initializer='glorot_normal'))
model.add(MaxPool2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Conv2D(128, kernel_size = (3, 3), activation = 'relu', padding = 'same',kernel_initializer='glorot_normal'))
model.add(Conv2D(128, kernel_size = (3, 3), activation = 'relu', padding = 'same',kernel_initializer='glorot_normal'))
model.add(MaxPool2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Conv2D(256, kernel_size = (3, 3), activation = 'relu', padding = 'same',kernel_initializer='glorot_normal'))
model.add(Conv2D(256, kernel_size = (3, 3), activation = 'relu', padding = 'same',kernel_initializer='glorot_normal'))
model.add(Conv2D(256, kernel_size = (3, 3), activation = 'relu', padding = 'same',kernel_initializer='glorot_normal'))
model.add(MaxPool2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(num_classes, activation='softmax'))
2. I try to perform transfer learning as follows:
# mark loaded layers as not trainable
for layer in base_model.layers:
    layer.trainable = False
x = base_model.layers[-3].output
# Add new layers:
x = tf.keras.layers.Conv2D(256, kernel_size=(3, 3), activation='relu', padding='same', kernel_initializer='glorot_normal')(x)  # line 47, referenced in the error below
x = tf.keras.layers.MaxPool2D(pool_size=(2, 2), strides=(2, 2))(x)
x = tf.keras.layers.Conv2D(512, kernel_size=(3, 3), activation='relu', padding='same')(x)
x = tf.keras.layers.Conv2D(512, kernel_size=(3, 3), activation='relu', padding='same')(x)
x = tf.keras.layers.Conv2D(512, kernel_size=(3, 3), activation='relu', padding='same')(x)
x = tf.keras.layers.MaxPool2D(pool_size=(2, 2), strides=(2, 2))(x)
x = tf.keras.layers.Conv2D(512, kernel_size=(3, 3), activation='relu', padding='same')(x)
x = tf.keras.layers.Conv2D(512, kernel_size=(3, 3), activation='relu', padding='same')(x)
x = tf.keras.layers.Conv2D(512, kernel_size=(3, 3), activation='relu', padding='same')(x)
x = tf.keras.layers.MaxPool2D(pool_size=(2, 2), strides=(2, 2))(x)
x = Flatten()(x)
x = Dense(1024, activation='relu')(x)
output = Dense(class_number, activation='softmax')(x)
# define new model
new_model = Model(inputs=base_model.inputs, outputs=output)
new_model.summary()
I encounter this error:
ValueError: Input 0 of layer "conv2d_11" is incompatible with the layer: expected min_ndim=4, found ndim=2. Full shape received: (None, 256) [at line 47]
Is it possible/how to add new convolution layers to a pre-trained model for transfer learning?
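For what it's worth, the error above is consistent with the branch point: base_model.layers[-3] is the Dense(256) layer, whose output is a 2-D (None, 256) tensor, so a Conv2D cannot follow it. A small diagnostic sketch for picking a 4-D feature map instead:
# print each layer's output shape; branch from the last layer whose output
# is still 4-D (batch, H, W, C) -- e.g. a MaxPool2D -- not from Flatten/Dense
for i, layer in enumerate(base_model.layers):
    print(i, layer.name, layer.output_shape)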
Update:
Inspired by Remove top layer from pre-trained model, transfer learning, tensorflow (load_model), the following code works for my purpose.
input_shape = (224, 224, 3)
# num_classes = y_train.shape[1]
num_classes = 2000
# Load Model
path = r'D:\00_twm_cnn_model\02_prt\02_outputs'
os.chdir(path)
ModelName = r'Model_CNN_VGG-16_chi_prt_100_2022-06-15_1.h5'
base_model = load_model(ModelName)
base_model.summary()
model = tf.keras.Sequential()
for layer in base_model.layers[0:-6]:
    model.add(layer)
model.add(Conv2D(512, kernel_size = (3, 3), activation = 'relu', padding = 'same',input_shape=input_shape,name='conv2d_10'))
model.add(Conv2D(512, kernel_size = (3, 3), activation = 'relu', padding = 'same',input_shape=input_shape,name='conv2d_11'))
model.add(Conv2D(512, kernel_size = (3, 3), activation = 'relu', padding = 'same',input_shape=input_shape,name='conv2d_12'))
model.add(MaxPool2D(pool_size=(2, 2), strides=(2, 2),name='max_pooling2d_4'))
model.add(Conv2D(512, kernel_size = (3, 3), activation = 'relu', padding = 'same',input_shape=input_shape,name='conv2d_13'))
model.add(Conv2D(512, kernel_size = (3, 3), activation = 'relu', padding = 'same',input_shape=input_shape,name='conv2d_14'))
model.add(Conv2D(512, kernel_size = (3, 3), activation = 'relu', padding = 'same',input_shape=input_shape,name='conv2d_15'))
model.add(MaxPool2D(pool_size=(2, 2), strides=(2, 2),name='max_pooling2d_5'))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.1))
#model.add(BatchNormalization())
model.add(Dense(num_classes, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
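One assumption worth stating: the layers copied over from base_model keep whatever trainable flag they had. If they should stay frozen during fine-tuning (as in the earlier attempt), mark them non-trainable and recompile so the change takes effect:
# the copied layers are the first len(base_model.layers) - 6 layers of `model`
for layer in model.layers[:len(base_model.layers) - 6]:
    layer.trainable = False
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])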

Model performance on test set fluctuates highly from epoch to epoch

I have been trying to train a binary classifier for photos with the following architecture:
class PatchNDepthBasedCNN:
    @staticmethod
    def build(width=256, height=256, depth=3):
        # initialize the model along with the input shape to be
        # "channels last" and the channels dimension itself
        model = Sequential()
        inputShape = (height, width, depth)
        chanDim = -1
        # if we are using "channels first", update the input shape
        # and channels dimension
        if K.image_data_format() == "channels_first":
            inputShape = (depth, height, width)
            chanDim = 1
        # 1
        model.add(Conv2D(32, (3, 3), strides=1, padding="same", input_shape=inputShape))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Activation("relu"))
        model.add(MaxPooling2D(pool_size=(3, 3), strides=2, padding="same"))
        # 2
        model.add(Conv2D(32, (3, 3), strides=1, padding="same"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Activation("relu"))
        model.add(Conv2D(25, (3, 3), strides=1, padding="same"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Activation("relu"))
        model.add(Conv2D(32, (3, 3), strides=1, padding="same"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Activation("relu"))
        x1 = MaxPooling2D(pool_size=(3, 3), strides=2, padding="same")
        model.add(x1)
        # the reshaped tensor will be used later in concatenate
        print(x1.output.shape)
        x1r = Reshape((int(width / 8), int(height / 8), 128))(x1.output)
        # 3
        model.add(Conv2D(32, (3, 3), strides=1, padding="same"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Activation("relu"))
        model.add(Conv2D(25, (3, 3), strides=1, padding="same"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Activation("relu"))
        model.add(Conv2D(32, (3, 3), strides=1, padding="same"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Activation("relu"))
        x2 = MaxPooling2D(pool_size=(3, 3), strides=2, padding="same")
        model.add(x2)
        # 4
        model.add(Conv2D(32, (3, 3), strides=1, padding="same"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Activation("relu"))
        model.add(Conv2D(25, (3, 3), strides=1, padding="same"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Activation("relu"))
        x3 = Conv2D(32, (3, 3), strides=1, padding="same")
        model.add(x3)
        model.add(BatchNormalization(axis=chanDim))
        model.add(Activation("relu"))
        # 5 - concat
        c = Concatenate()([x1r, x2.output, x3.output])
        model.add(InputLayer(input_tensor=c))
        # 6
        model.add(Conv2D(32, (3, 3), strides=1, padding="same"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Activation("relu"))
        model.add(Conv2D(25, (3, 3), strides=1, padding="same"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Activation("relu"))
        # 7
        final_layer = Conv2D(2, (3, 3), strides=1, padding="same")
        model.add(final_layer)
        model.add(BatchNormalization(axis=chanDim))
        model.add(Activation("relu"))
        # * first (and only) set of FC => RELU layers
        model.add(Flatten())
        model.add(Dense(64))
        model.add(Activation("relu"))
        model.add(BatchNormalization())
        # model.add(Dropout(0.5))
        # softmax classifier
        model.add(Dense(2))
        model.add(Activation("softmax"))
        return model

new_model = PatchNDepthBasedCNN.build(width=IMG_DIM, height=IMG_DIM, depth=3)
new_model.compile(
    optimizer="rmsprop",
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
During training, I save the model at each epoch (for the purposes of the experiment). I always thought that the latest model (the latest epoch) must be the preferred one (assuming the model hasn't started to overfit). Still, when I evaluate each variant (epoch) of the trained model on the test set (from another data distribution), I get randomly fluctuating results from epoch to epoch: say, test accuracy at epoch 60 can be around 72%, at epoch 61 - 97%, at epoch 63 - 80%.
At the same time, if I substitute the last two layers of the model and change the loss function to simulate an SVM, I get overall worse results, but the trend is clearly visible from epoch to epoch (test accuracy slowly rises from the 50% baseline to around 78%, and then fluctuates within a small margin):
class PatchNDepthBasedCNN:
    @staticmethod
    def build(width=256, height=256, depth=3):
        # initialize the model along with the input shape to be
        # "channels last" and the channels dimension itself
        model = Sequential()
        inputShape = (height, width, depth)
        chanDim = -1
        # if we are using "channels first", update the input shape
        # and channels dimension
        if K.image_data_format() == "channels_first":
            inputShape = (depth, height, width)
            chanDim = 1
        # 1
        model.add(Conv2D(32, (3, 3), strides=1, padding="same", input_shape=inputShape))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Activation("relu"))
        model.add(MaxPooling2D(pool_size=(3, 3), strides=2, padding="same"))
        # 2
        model.add(Conv2D(32, (3, 3), strides=1, padding="same"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Activation("relu"))
        model.add(Conv2D(25, (3, 3), strides=1, padding="same"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Activation("relu"))
        model.add(Conv2D(32, (3, 3), strides=1, padding="same"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Activation("relu"))
        x1 = MaxPooling2D(pool_size=(3, 3), strides=2, padding="same")
        model.add(x1)
        # the reshaped tensor will be used later in concatenate
        print(x1.output.shape)
        x1r = Reshape((int(width / 8), int(height / 8), 128))(x1.output)
        # 3
        model.add(Conv2D(32, (3, 3), strides=1, padding="same"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Activation("relu"))
        model.add(Conv2D(25, (3, 3), strides=1, padding="same"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Activation("relu"))
        model.add(Conv2D(32, (3, 3), strides=1, padding="same"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Activation("relu"))
        x2 = MaxPooling2D(pool_size=(3, 3), strides=2, padding="same")
        model.add(x2)
        # 4
        model.add(Conv2D(32, (3, 3), strides=1, padding="same"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Activation("relu"))
        model.add(Conv2D(25, (3, 3), strides=1, padding="same"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Activation("relu"))
        x3 = Conv2D(32, (3, 3), strides=1, padding="same")
        model.add(x3)
        model.add(BatchNormalization(axis=chanDim))
        model.add(Activation("relu"))
        # 5 - concat
        c = Concatenate()([x1r, x2.output, x3.output])
        model.add(InputLayer(input_tensor=c))
        # 6
        model.add(Conv2D(32, (3, 3), strides=1, padding="same"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Activation("relu"))
        model.add(Conv2D(25, (3, 3), strides=1, padding="same"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Activation("relu"))
        # 7
        final_layer = Conv2D(2, (3, 3), strides=1, padding="same")
        model.add(final_layer)
        model.add(BatchNormalization(axis=chanDim))
        model.add(Activation("relu"))
        model.add(Flatten())
        model.add(Dense(256))
        model.add(Activation("relu"))
        model.add(Dense(2, kernel_regularizer=l2(0.0001)))
        model.add(Activation('linear'))
        return model

new_model = PatchNDepthBasedCNN.build(width=IMG_DIM, height=IMG_DIM, depth=3)
new_model.compile(loss='hinge',
                  optimizer='adadelta',
                  metrics=['accuracy'])
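As a side note on the per-epoch saving described above, here is a minimal checkpointing sketch that also tracks the best validation model (X_valid/y_valid and the monitor name 'val_accuracy' are assumptions; older Keras versions use 'val_acc'):
from keras.callbacks import ModelCheckpoint

every_epoch = ModelCheckpoint('model_epoch_{epoch:02d}.h5')  # save each epoch
best_only = ModelCheckpoint('model_best.h5', monitor='val_accuracy',
                            save_best_only=True, mode='max')  # keep only the best
new_model.fit(X_train, y_train, epochs=80,
              validation_data=(X_valid, y_valid),
              callbacks=[every_epoch, best_only])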
What are the possible reasons/explanations for this behavior?
What advice could you give me to try to achieve better results (if it is possible to infer from the provided data)?
Thank you for considering my question!
EDIT: Removed unused code (the LR and other unused parameters from the model; they weren't actually taken into account while training, I just forgot to remove them).

Tensor Tensor("flatten/Reshape:0", shape=(?, 2622), dtype=float32) is not an element of this graph

Hello StackOverflow Team:
I built a model based on (Vgg_Face_Model) with the weights loaded from (vgg_face_weights.h5).
Note that I use tensorflow-gpu = 2.1.0 and keras = 2.3.1, with an Anaconda 3 environment as the interpreter, used with PyCharm.
The code raises an error in this part:
input_descriptor = [model.predict(face), img]
The code is:
def face_recognizer(face, db_descriptors):
    # face = cv2.imread(img)
    # face = cv2.resize(face, (IMG_Size, IMG_Size))
    t0 = time.perf_counter()
    face = np.array(face).reshape(-1, IMG_Size, IMG_Size, 3)
    ###### here error #################################
    input_descriptor = [model.predict(face), img]
    ###################################################
    K_nn_result = K_nn_Classifier(input_descriptor[0], db_descriptors, 5)
    input_result = Knn_Distance_Score(K_nn_result)
    if input_result[0] <= 10:
        identity = 'stranger'
    else:
        identity = input_result[1]
    # print('Done in', time.perf_counter() - t0)
    return input_result, identity

def PrepareModels(self):
    global mpFaceDetection, FaceDetector, model
    mpFaceDetection = mp.solutions.face_detection
    FaceDetector = mpFaceDetection.FaceDetection()
    model = loadModel()
The model is:
import os
from pathlib import Path
# from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.models import Model, Sequential, load_model
from tensorflow.keras.layers import Input, Convolution2D, ZeroPadding2D, MaxPooling2D, Flatten, Dense, Dropout, \
    Activation
import gdown
# ---------------------------------------
def Vgg_Face_Model():
    model = Sequential()
    model.add(ZeroPadding2D((1, 1), input_shape=(224, 224, 3)))
    model.add(Convolution2D(64, (3, 3), activation='relu'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(128, (3, 3), activation='relu'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(128, (3, 3), activation='relu'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(256, (3, 3), activation='relu'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(256, (3, 3), activation='relu'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(256, (3, 3), activation='relu'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(512, (3, 3), activation='relu'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(512, (3, 3), activation='relu'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(512, (3, 3), activation='relu'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(512, (3, 3), activation='relu'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(512, (3, 3), activation='relu'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(512, (3, 3), activation='relu'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))
    model.add(Convolution2D(4096, (7, 7), activation='relu'))
    model.add(Dropout(0.5))
    model.add(Convolution2D(4096, (1, 1), activation='relu'))
    model.add(Dropout(0.5))
    model.add(Convolution2D(2622, (1, 1)))
    model.add(Flatten())
    model.add(Activation('softmax'))
    return model
def loadModel():
    model = Vgg_Face_Model()
    # -----------------------------------
    home = str(Path.home())
    if not os.path.isfile(home + '/.deepface/weights/vgg_face_weights.h5'):
        print("vgg_face_weights.h5 will be downloaded...")
        url = 'https://drive.google.com/uc?id=1CPSeum3HpopfomUEK1gybeuIVoeJT_Eo'
        output = home + '/.deepface/weights/vgg_face_weights.h5'
        gdown.download(url, output, quiet=False)
    # -----------------------------------
    model.load_weights(home + '/.deepface/weights/vgg_face_weights.h5')
    # -----------------------------------
    # TO-DO: why? -- layers[-2] is the Flatten before the softmax, so the
    # descriptor model outputs the 2622-d pre-softmax vector as the face embedding
    vgg_model_descriptor = Model(inputs=model.layers[0].input, outputs=model.layers[-2].output)
    return vgg_model_descriptor

# model = loadModel()
Output:
Tensor Tensor("flatten/Reshape:0", shape=(?, 2622), dtype=float32) is not an element of this graph.
from tensorflow.python.keras.backend import set_session
sess = tf.Session()
# This is a global session and graph
graph = tf.get_default_graph()
set_session(sess)

# now where you are calling the model
global sess
global graph
with graph.as_default():
    set_session(sess)
    input_descriptor = [model.predict(face), img]
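Since the question states tensorflow-gpu 2.1.0, the TF1-style session/graph calls above live under tf.compat.v1 there. A minimal sketch of the same setup for TF 2.x (assumes graph-mode behaviour is genuinely required):
import tensorflow as tf

tf.compat.v1.disable_eager_execution()  # restore graph/session semantics in TF 2.x
sess = tf.compat.v1.Session()
graph = tf.compat.v1.get_default_graph()
tf.compat.v1.keras.backend.set_session(sess)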

Combination of ResNet and ConvNet

I have prepared a CNN model for image colorization:
"""Encoder - Input grayscale image (L)"""
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(256, 256, 1)))
...
"""Latent space"""
model.add(Conv2D(512, (3,3), activation='relu', padding='same'))
"""Decoder - output (A,B)"""
...
model.add(Conv2D(2, (3, 3), activation='tanh', padding='same'))
Now I want to use ResNet as a feature extractor and merge its output into the latent space.
I have already imported the ResNet model as:
resnet50_imagnet_model = tf.keras.applications.resnet.ResNet50(weights="imagenet",
                                                               include_top=False,
                                                               input_shape=(256, 256, 3),
                                                               pooling='max')
Encoder:
"""Encoder - Input grayscale image (L)"""
encoder = Sequential()
encoder.add(Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(256, 256, 1)))
...
Decoder:
"""Decoder - output (A, B)"""
decoder = Sequential()
...
Then use tf.keras.Sequential() to merge all the models:
comb_model = tf.keras.Sequential(
    [encoder, resnet50_imagnet_model, decoder]
)
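Note that chaining [encoder, resnet50_imagnet_model, decoder] in one Sequential only works if each model's output shape matches the next model's input. To literally merge the ResNet features into the latent space, as the question asks, here is a functional-API sketch (the 32x32 latent map size and the tiling step are illustrative assumptions, not taken from the original code):
from tensorflow.keras.layers import Input, Concatenate, RepeatVector, Reshape
from tensorflow.keras.models import Model

gray_in = Input(shape=(256, 256, 1))
rgb_in = Input(shape=(256, 256, 3))
latent = encoder(gray_in)                     # assumed latent map: (None, 32, 32, 512)
res_feat = resnet50_imagnet_model(rgb_in)     # (None, 2048) because pooling='max'
res_feat = RepeatVector(32 * 32)(res_feat)    # tile the global features...
res_feat = Reshape((32, 32, 2048))(res_feat)  # ...over every latent position
merged = Concatenate()([latent, res_feat])    # (None, 32, 32, 2560)
ab_out = decoder(merged)                      # decoder must accept the merged map
comb_model = Model(inputs=[gray_in, rgb_in], outputs=ab_out)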

Feed CNN features to LSTM

I want to build an end-to-end trainable model with the following properties:
CNN to extract features from image
The features are reshaped into a matrix
Each row of this matrix is then fed to LSTM1
Each column of this matrix is then fed to LSTM2
The output of LSTM1 and LSTM2 are concatenated for the final output
(it's more or less similar to Figure 2 in this paper: https://arxiv.org/pdf/1611.07890.pdf)
My problem now is: after the reshape, how can I feed the values of the feature matrix to an LSTM with Keras or TensorFlow?
This is my code so far with VGG16 net (also a link to Keras issues):
# VGG16
model = Sequential()
model.add(Conv2D(64, (3, 3), activation='relu', padding='same', input_shape=(224, 224, 3)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
# block 2
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
# block 3
model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(256, (3, 3), activation='relu'))
model.add(Conv2D(256, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
# block 4
model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(512, (3, 3), activation='relu'))
model.add(Conv2D(512, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
# block 5
model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(512, (3, 3), activation='relu'))
model.add(Conv2D(512, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
# block 6
model.add(Flatten())
model.add(Dense(4096, activation='relu'))
model.add(Dense(4096, activation='relu'))
# reshape the feature 4096 = 64 * 64
model.add(Reshape((64, 64)))
# How to feed each row of this to LSTM?
# This is my first solution but it doesn’t look correct:
# model.add(LSTM(256, input_shape=(64, 1))) # 256 hidden units, sequence length = 64, feature dim = 1
Consider building your CNN model with Conv2D and MaxPool2D layers until you reach your Flatten layer, because the vectorized output from the Flatten layer will be your input data to the LSTM part of your structure.
So, build your CNN model like this:
model_cnn = Sequential()
model_cnn.add(Conv2D...)
model_cnn.add(MaxPooling2D...)
...
model_cnn.add(Flatten())
Now, this is an interesting point: the current version of Keras has some incompatibilities with some TensorFlow structures that will not let you stack all your layers in just one Sequential object.
So it's time to use the Keras Model object to complete your neural network, with a trick:
input_lay = Input(shape=(None, ?, ?, ?)) #dimensions of your data
time_distribute = TimeDistributed(Lambda(lambda x: model_cnn(x)))(input_lay) # keras.layers.Lambda is essential to make our trick work :)
lstm_lay = LSTM(?)(time_distribute)
output_lay = Dense(?, activation='?')(lstm_lay)
And finally, it's time to put our two separate models together:
model = Model(inputs=[input_lay], outputs=[output_lay])
model.compile(...)
OBS: Note that you can substitute my model_cnn example with your VGG without including the top layers, since the vectorized output from the VGG Flatten layer will be the input of the LSTM model.
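To address the row/column part of the question directly, here is a minimal functional-API sketch (the 256 units and the sigmoid head are illustrative assumptions, not from the paper): rows are consumed as timesteps by one LSTM, and a Permute makes columns the timesteps for the second LSTM before the two final states are concatenated.
from tensorflow.keras.layers import Input, LSTM, Permute, Concatenate, Dense
from tensorflow.keras.models import Model

feat = Input(shape=(64, 64))                   # the reshaped 64x64 feature matrix
rows = LSTM(256)(feat)                         # LSTM1: rows as timesteps
cols = LSTM(256)(Permute((2, 1))(feat))        # LSTM2: transpose so columns are timesteps
merged = Concatenate()([rows, cols])           # concatenate the two final states
out = Dense(1, activation='sigmoid')(merged)   # hypothetical output head
lstm_head = Model(feat, out)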