How to replace the input channel shape from (224, 224, 3) to (224, 224, 1) in VGG16? - tensorflow

I am using VGG16 for transfer learning. My images are grayscale, so I need to change the input shape of VGG16 from (224, 224, 3) to (224, 224, 1). I tried the following code and got this error:
TypeError: build() takes from 1 to 2 positional arguments but 4 were given
Can anyone help me figure out where I am going wrong?
vgg16_model = load_model('Fetched_VGG.h5')
vgg16_model.summary()
# transform the model to Sequential
model = Sequential()
for layer in vgg16_model.layers[1:-1]:
    model.add(layer)
# Freeze the layers (prevent the weights from being updated)
for layer in model.layers:
    layer.trainable = False
model.build(224, 224, 1)
model.add(Dense(2, activation='softmax', name='predictions'))

You can't. Even if you get rid of the input layer, the model's graph has already been built and its first conv layer expects an input with 3 channels. I don't think there is an easy workaround to make it accept 1 channel, if there is one at all.
Instead, repeat your data along the third dimension so the same grayscale image fills all 3 bands instead of RGB; that works just fine.
If your image has the shape (224, 224, 1):
import numpy as np
gray_image_3band = np.repeat(gray_img, repeats = 3, axis = -1)
If your image has the shape (224, 224):
gray_image_3band = np.repeat(gray_img[..., np.newaxis], repeats = 3, axis = -1)
This way you don't need to call model.build() anymore; keep the input layer. But if you ever want to call it, you need to pass the shape as a tuple, like this:
model.build( (224, 224, 1) ) # this is correct, notice the parentheses
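Putting the whole approach together, here is a minimal sketch, assuming grayscale training images in an array gray_imgs of shape (num_samples, 224, 224, 1), the same Fetched_VGG.h5 file, and the two-class head from the question:
import numpy as np
from tensorflow.keras.models import load_model, Sequential
from tensorflow.keras.layers import Dense

# repeat the single gray channel three times so the shape matches VGG16's RGB input
gray_imgs_3band = np.repeat(gray_imgs, repeats=3, axis=-1)   # (num_samples, 224, 224, 3)

vgg16_model = load_model('Fetched_VGG.h5')

model = Sequential()
# keep the input layer, drop only the original prediction layer
for layer in vgg16_model.layers[:-1]:
    model.add(layer)
for layer in model.layers:
    layer.trainable = False
model.add(Dense(2, activation='softmax', name='predictions'))

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# model.fit(gray_imgs_3band, labels, ...)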

Related

How does transfer learning on EfficientNets work for grayscale images?

My question is more about how the algorithm works. I have successfully implemented and trained an EfficientNet model on grayscale images, and now I want to understand why it works.
The most important aspect here is that grayscale images have 1 channel. When I set channels=1, the model doesn't work because, if I understood correctly, it was built for 3-channel images. When I set channels=3 it works perfectly.
So my question is: when I set channels=3 but feed the model preprocessed images with channels=1, why does it still work?
Code for EfficientNetB5
# Variable assignments
num_classes = 9
img_height = 84
img_width = 112
channels = 3
batch_size = 32
# Make the input layer
new_input = Input(shape=(img_height, img_width, channels),
                  name='image_input')
# Download and use EfficientNetB5
tmp = tf.keras.applications.EfficientNetB5(include_top=False,
                                           weights='imagenet',
                                           input_tensor=new_input,
                                           pooling='max')
model = Sequential()
model.add(tmp) # adding EfficientNetB5
model.add(Flatten())
...
Code for preprocessing into grayscale
data_generator = ImageDataGenerator(validation_split=0.2)
train_generator = data_generator.flow_from_directory(
    train_path,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    color_mode="grayscale",  ###################################
    class_mode="categorical",
    subset="training")
I dug into what happens when you give grayscale images to EfficientNet models with three-channel inputs.
Here are the first layers of EfficientNetB5 with input_shape (128, 128, 3):
Layer (type)                      Output Shape           Param #   Connected to
==================================================================================================
input_7 (InputLayer)              [(None, 128, 128, 3)]  0         []
rescaling_7 (Rescaling)           (None, 128, 128, 3)    0         ['input_7[0][0]']
normalization_13 (Normalization)  (None, 128, 128, 3)    7         ['rescaling_7[0][0]']
tf.math.truediv_4 (TFOpLambda)    (None, 128, 128, 3)    0         ['normalization_13[0][0]']
stem_conv_pad (ZeroPadding2D)     (None, 129, 129, 3)    0         ['tf.math.truediv_4[0][0]']
And here is the shape of the output of each of these layers when the model has as input a grayscale image:
input_7 (128, 128, 1)
rescaling_7 (128, 128, 1)
normalization_13 (128, 128, 3)
tf.math.truediv_4 (128, 128, 3)
stem_conv_pad (129, 129, 3)
As you can see, the number of channels of the output tensor switches from 1 to 3 when proceeding to the normalization_13 layer, so let's see what this layer is actually doing.
The Normalization layer is performing this operation on the input tensor:
(input_tensor - self.mean) / sqrt(self.var) // see https://www.tensorflow.org/api_docs/python/tf/keras/layers/Normalization
The number of channels changes after the subtraction. As a matter of fact, self.mean looks like this:
<tf.Tensor: shape=(1, 1, 1, 3), dtype=float32, numpy=array([[[[0.485, 0.456, 0.406]]]], dtype=float32)>
So self.mean has three channels, and when you subtract a three-channel tensor from a one-channel tensor, broadcasting produces: [firstTensor - secondTensorFirstChannel, firstTensor - secondTensorSecondChannel, firstTensor - secondTensorThirdChannel]
And this is how the magic happens and this is why the model can take as input grayscale images!
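To see this broadcasting behaviour concretely, here is a small NumPy sketch (the dummy grayscale batch is made up, but the mean values are the ones shown above):
import numpy as np

gray_batch = np.ones((1, 128, 128, 1), dtype=np.float32)          # dummy one-channel batch
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32).reshape(1, 1, 1, 3)

out = gray_batch - mean        # broadcasting: (1, 128, 128, 1) - (1, 1, 1, 3)
print(out.shape)               # (1, 128, 128, 3)
print(out[0, 0, 0])            # [0.515 0.544 0.594] -> one value per channel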
I have checked this with EfficientNetB5 and EfficientNetV2B2. Even though they differ in how the Normalization layer is declared, the process is the same. I suppose that is also the case for the other EfficientNet models.
I hope it was clear enough!
This is interesting. If training still works with channels = 3 even though the input is grayscale, I would check the batch shape of the train_generator (maybe print a couple of batches to get a feel for it). Here is a code snippet to quickly check the batch shape (plotImages() is available in the TensorFlow docs):
imgs,labels = next(train_generator)
print('Batch shape: ',imgs.shape)
plotImages(imgs,labels)

a problem using LSTM network (neural networks)

I'm trying to create a speaker diarization system using LSTMs (I'm trying to make the network tell the difference between speakers).
This is the model I've created:
model = Sequential()
model.add(LSTM(768, batch_input_shape=(39, 40, 1), return_sequences=True))
model.add(Dense(256))
model.add(LSTM(768, return_sequences=True))
model.add(Dense(256))
model.add(LSTM(768, return_sequences=True))
model.add(Dense(4))
There are 4 different speakers.
In my dataset I have the array 'features' (of length 256, for 256 speech segments).
For each segment in 'features' I have 39 vectors to represent it, and each of these vectors has size 40.
Each of these 39 vectors is extracted from a different time window (I used log mel filterbank energies).
I also have the array 'labels', which is also of length 256 and contains the label for each segment.
I used 'to_categorical' for it:
labels = tf.keras.utils.to_categorical(labels, num_classes=4)
I tried using a generator to feed it to the network, but it didn't work.
This is the class I used:
class KerasBatchGenerator(object):
    def __init__(self, features, batch_size, labels):
        self.features = features
        self.batch_size = batch_size
        self.labels = labels

    def generate(self):
        while True:
            for i in self.labels:
                for j in self.features:
                    temp = [j, i]
                    # temp = np.expand_dims(temp, axis=1)
                    temp = np.expand_dims(temp, axis=2)
                    yield tuple(temp)
And the code I used to run the network is:
train_data_generator = KerasBatchGenerator(features, batch_size, labels)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit_generator(train_data_generator.generate(), 100, 1)
please help!!!
If I guessed correctly, you want to classify which input is spoken by which speaker.
In that case your final layer should have shape (batch_size, numOfClasses), i.e. (39, 4).
But if you take a close look at the summary, the output shape of the final layer is (39, 40, 4).
To get the proper shape, remove the argument return_sequences=True from the last LSTM layer.
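A minimal sketch of the adjusted model, keeping the layer sizes from the question (the softmax activation is my assumption, since categorical_crossentropy is used):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(768, batch_input_shape=(39, 40, 1), return_sequences=True))
model.add(Dense(256))
model.add(LSTM(768, return_sequences=True))
model.add(Dense(256))
model.add(LSTM(768))                        # no return_sequences: output shape (39, 768)
model.add(Dense(4, activation='softmax'))   # output shape (39, 4)
model.summary()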

Issues with Keras Conv1D and VGG

I am trying to build a deep learning model with VGG16 on top. I have implemented it in Keras using the following code:
image_input = Input(shape=(224, 224, 3))
model = VGG16(input_tensor=image_input, include_top=True,weights='imagenet')
model.summary()
fc7 = model.get_layer('fc2').output
conv1d = Conv1D(1,5,activation='relu', name="conv1d",input_shape=(1,4096)) (fc7) #error appears here
# flat = Flatten()(conv1d)
fc8 = Dense(512, activation='relu', name="fc8")(conv1d)
#x= Flatten(name='flatten')(last_layer)
out = Dense(num_classes, activation='softmax', name='output')(fc8)
custom_vgg_model = Model(image_input, out)
custom_vgg_model.summary()
I am getting the following error:
ValueError: Input 0 is incompatible with layer conv1d: expected ndim=3, found ndim=2
Why can't we do a 1D convolution over consecutive feature vectors?
A fully connected layer in a VGG is 2D, and a 1D convolutional layer expects 3D data.
At the point where VGG adds a Dense layer, it destroys the image format (4D) with a flatten or a global pooling, transforming it into plain data (2D). You no longer have dimensions to use convolutions.
If you explain why you want a Conv1D and what you expect from it, we could think of an alternative.
Example model:
movie_data = any_data_with_shape((number_of_videos, frames, 224, 224, 3))
movie_input = Input((None,224,224,3)) #None means any number of frames
vgg = VGG16(include_top=True,weights='imagenet')
This part is only necessary if you're getting intermediate outputs from vgg:
vgg_in = vgg.input
vgg_out = vgg.get_layer('fc2').output #make sure this layer exists
vgg = Model(vgg_in, vgg_out)
Continue:
vgg_outs = TimeDistributed(vgg)(movie_input) #out shape (None, frames, fc2_units)
outs = Conv1D(.....)(vgg_outs)
outs = GlobalAveragePooling1D()(outs)
outs = Dense(....)(outs)
.....
your_model = Model(movie_input, outs)
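For reference, a hedged end-to-end sketch of the same idea with the placeholders filled in; the Conv1D and Dense parameters here are illustrative choices, not part of the original answer:
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Input, TimeDistributed, Conv1D, GlobalAveragePooling1D, Dense
from tensorflow.keras.models import Model

num_classes = 10   # assumption: replace with your own number of classes

vgg = VGG16(include_top=True, weights='imagenet')
vgg = Model(vgg.input, vgg.get_layer('fc2').output)   # cut VGG16 at fc2 (4096 units)

movie_input = Input((None, 224, 224, 3))              # None = any number of frames
x = TimeDistributed(vgg)(movie_input)                 # (batch, frames, 4096)
x = Conv1D(64, 5, activation='relu')(x)               # 1D convolution over the frame axis
x = GlobalAveragePooling1D()(x)
out = Dense(num_classes, activation='softmax')(x)

your_model = Model(movie_input, out)
your_model.summary()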

How to flatten the RNN output for Dense layer?

I'd like to classify a signal containing
X = (n_samples, n_timesteps, n_features), where n_samples=476, n_timesteps=400, n_features=16 are the number of samples, timesteps, and features (or channels) of the signal.
y = (n_samples, n_timesteps, 1). Each timestep is labeled by either 0 or 1 (binary classification).
My graph model is shown in the figure below.
The input is fed into a 32-unit LSTM. The LSTM output enters a 1-unit Dense layer to generate a 400x1 vector, where 400 is the number of timesteps. I would then like to feed this 400x1 vector into a 400-unit Dense layer. I tried to flatten the output of the 1-unit Dense layer, but the shape of the final output does not match the 400x1 label vector.
The snippet and model are shown as follows.
input_layer = Input(shape=(n_timestep, n_feature))
lstm1 = LSTM(32, return_sequences=True)(input_layer)
dense1 = Dense(1, activation='sigmoid')(lstm1)
flat1 = TimeDistributed(Flatten())(dense1)
dense2 = TimeDistributed(Dense(400, activation='sigmoid'))(flat1)
model = Model(inputs=input_layer, outputs=dense2)
model.summary()
The error is seen below.
ValueError: Error when checking target: expected time_distributed_4 to have shape (400, 400) but got array with shape (400, 1)
Please let me know how to fix it. Thanks.

How to load MobileNet weights with an input tensor in Keras

I'm trying to apply transfer learning to MNIST using MobileNet weights in Keras. Keras documentation for MobileNet: https://keras.io/applications/#mobilenet
MobileNet accepts 224x224x3 as input, but MNIST images are 28x28x1. I'm creating a Lambda layer which converts a 28x28x1 image into 224x224x3 and sends it as input to MobileNet. The following code causes:
TypeError: Input layers to a Model must be InputLayer objects. Received inputs: Tensor("lambda_2/ResizeNearestNeighbor:0", shape=(?, 224, 224, 3), dtype=float32). Input 0 (0-based) originates from layer type Lambda.
height = 28
width = 28
input_image = Input(shape=(height,width,1))
def resize_image_to_inception(x):
    x = K.repeat_elements(x, 3, axis=3)
    x = K.resize_images(x, 8, 8, data_format="channels_last")
    return x
input_image_ = Lambda(resize_image_to_inception, output_shape=(224, 224, 3))(input_image)
print(type(input_image_))
base_model = MobileNet(input_tensor=input_image_, weights='imagenet', include_top=False)