Training the autoencoder with identical images - tensorflow

I am training the autoencoder with 2000 identical images. My expectation is, that given the autoencoder has enough capacity the loss will approach 0 and the accuracy will approach 1 after a certain training time. Instead I see a quick convergence to loss = 0.07 and accuracy=0.76. Reducing the number of convolutional layers gave some improvement. Reducing the number of kernels per layer increased the loss. There is no improvement after that. Is my expectation wrong? Or is there something wrong with my autoencoder architecture? What can be done to make an almost lossless autoencoder?
input_img = Input(shape=(image_size_x, image_size_y, 1))
x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
Thanks!

You need to add a dense layer between your autoconvolutional encoder and autoconvolution decoder. This is the latent reprensentation, also called embedding layer. This is the layer in which the image is compressed. That is the "compressed knowledge" that the architecture is trying to "learn".
For the implementation, from this tutorial: https://www.tensorflow.org/tutorials/generative/cvae
I would suggest you add these lines between the encoder and the decoder part:
x = tf.keras.layers.Flatten()(x),
x = tf.keras.layers.Dense(latent_dim + latent_dim)

Related

movie similarity using Word2Vec and deep Convolutional Autoencoders

i am new to python and i am trying to create a model that can measure how similar movies are based on the movies description,the steps i followed so far are:
1.turn each movie description into a vector of 100*(maximum number of words possible for a movie description) values using Word2Vec, this results in a 21300-values vector for each movie description.
2.create a deep convolutional autoencoder that tries to compress each vector(and hopefully extract meaning from it).
while the first step was successful and i am still struggling with the autoencoder, here is my code so far:
encoder_input = keras.Input(shape=(21300,), name='sum')
encoded= tf.keras.layers.Reshape((150,142,1),input_shape=(21300,))(encoder_input)
x = tf.keras.layers.Conv2D(128, (3, 3), activation="relu", padding="same",input_shape=(1,128,150,142))(encoded)
x = tf.keras.layers.MaxPooling2D((2, 2), padding="same")(x)
x = tf.keras.layers.Conv2D(64, (3, 3), activation="relu", padding="same")(x)
x = tf.keras.layers.MaxPooling2D((2, 2), padding="same")(x)#49*25*64
x = tf.keras.layers.Conv2D(32, (3, 3), activation="relu", padding="same")(x)
x = tf.keras.layers.MaxPooling2D((2, 2), padding="same")(x)#25*13*32
x = tf.keras.layers.Conv2D(16, (3, 3), activation="relu", padding="same")(x)
x = tf.keras.layers.MaxPooling2D((2, 2), padding="same")(x)
x = tf.keras.layers.Conv2D(8, (3, 3), activation="relu", padding="same")(x)
x = tf.keras.layers.MaxPooling2D((2, 2), padding="same")(x)
x=tf.keras.layers.Flatten()(x)
encoder_output=keras.layers.Dense(units=90, activation='relu',name='encoder')(x)
x= tf.keras.layers.Reshape((10,9,1),input_shape=(28,))(encoder_output)
# Decoder
decoder_input=tf.keras.layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = tf.keras.layers.UpSampling2D((2, 2))(decoder_input)
x = tf.keras.layers.Conv2D(16, (3, 3), activation='relu')(x)
x = tf.keras.layers.UpSampling2D((2, 2))(x)
x = tf.keras.layers.Conv2D(32, (3, 3), activation='relu')(x)
x = tf.keras.layers.UpSampling2D((2, 2))(x)
x = tf.keras.layers.Conv2D(64, (3, 3), activation='relu')(x)
x = tf.keras.layers.UpSampling2D((2, 2))(x)
x = tf.keras.layers.Conv2D(128, (3, 3), activation='relu')(x)
x = tf.keras.layers.UpSampling2D((2, 2))(x)
decoder_output = keras.layers.Conv2D(1, (3, 3), activation='relu', padding='same')(x)
autoencoder = keras.Model(encoder_input, decoder_output)
opt = tf.keras.optimizers.Adam(learning_rate=0.001, decay=1e-6)
autoencoder = keras.Model(encoder_input, decoder_output, name='autoencoder')
autoencoder.compile(opt, loss='mse')
print("STARTING FITTING")
history = autoencoder.fit(
movies_vector,
movies_vector,
epochs=25,
)
print("ENCODER READY")
#USING THE MIDDLE LAYER
encoder = keras.Model(inputs=autoencoder.input,
outputs=autoencoder.get_layer('encoder').output)
running this code gives me the following error:
required broadcastable shapes [[node mean_squared_error/SquaredDifference (defined at tmp/ipykernel_52/3425712667.py:119) ]] [Op:__inference_train_function_1568]
i have two questions:
1.how can i fix this error?
2.how can i improve my autoencoder so that i can use the compressed vectors to test for movie similarity?
The output of your model is (batch_size, 260, 228, 1), while your targets appear to be (batch_size, 21300). You can solve that problem by either adding a tf.keras.layers.Flatten() layer to the end of your model, or by not flattening your input.
You probably should not be using 2D convolutions, as there is no spatial or temporal correlation between adjacent feature channels in most text embedding. You should be able to safely reshape to (150,142) rather than (150, 142, 1) and use 1D convolution, pooling, and upsampling layers.

problem using convolutional autoencoder for 2d data

I want to train an autoencoder for the purpose of gpr investigations.
The input data dimension is 149x8.However, While i am trying deep autoencoder it works fine
input_img = Input(shape=(8,))
encoded1 = Dense(8, activation='relu')(input_img)
encoded2 = Dense(4, activation='relu')(encoded1)
encoded3 = Dense(2, activation='relu' )(encoded2)
decoded1 = Dense(2, activation='relu' )(encoded3)
decoded2 = Dense(4, activation='relu')(decoded1)
decoded3 = Dense(8, activation='relu' )(decoded2)
decoded = Dense(8, activation='linear')(decoded3)
autoencoder = Model(input_img, decoded)
sgd = optimizers.Adam(lr=0.001)
autoencoder.compile(optimizer=sgd, loss='mse')
autoencoder.summary()
..................................................
But while trying to use convolutional autoencoder for the same input
it gives error `ValueError: Input 0 is incompatible with layer conv2d_1: expected ndim=4, found ndim=2`
can anybody suggest me how to overcome this problem.
My code is
input_img = Input(shape=(8,))
x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = layers.UpSampling2D((2, 2))(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = layers.UpSampling2D((2, 2))(x)
x = layers.Conv2D(16, (3, 3), activation='relu')(x)
x = layers.UpSampling2D((2, 2))(x)
decoded = layers.Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
autoencoder = Model(input_img, decoded)
sgd = optimizers.Adam(lr=0.001)
autoencoder.compile(optimizer=sgd, loss='mse')
autoencoder.summary()
Wrong Input Shape:
This is because we are passing the input shape of (8,) and 1 extra dimension added by TensorFlow for Batch size, so the error message says that it found ndim=3, but the CNN has expected min_ndim=4, 3 for the image size and 1 for the batch size. e.g.
input_shape=(number_of_rows, 28,28,1)

Get the output of a specific layer as the result on test data in place of last layer (auto-encoder latent features) in keras

I am trying to get the output of the latent layer/hidden layer to use it as input for something else. I trained my model in an efficient way to minimize the loss so my model could learn the latent features efficiently and as close as possible to the image.
My model is
input_img = Input(shape=(28, 28, 1)) # adapt this if using `channels_first` image data format
#Encoder
x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)
# Decoder
x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x) # opposite of Pooling
x = Conv2D(16, (3, 3), activation='relu')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
I want the output of encoded layer as my output for the model. Is it possible? ad If yes, Please tell me how.
you can simply do in this way
autoencoder.fit(...)
latent_model = Model(input_img, encoded)
latent_representation = latent_model.predict(X)

CNN Overfitting

I have a siamese CNN that is performing very well (96% accuracy, 0.08 loss) on training data but poorly (70% accuracy, 0.1 loss) on testing data.
The architecture is below:
input_main = Input(shape=input_shape, dtype='float32')
x = Conv2D(32, (3, 3), padding='same', activation='relu',
kernel_regularizer=l2(0.005))(input_main)
x = Conv2D(16, (5, 5), activation='relu',
kernel_regularizer=l2(0.005))(x)
x = MaxPooling2D(pool_size=(5, 5))(x)
x = Dropout(0.5)(x)
x = Conv2D(32, (3, 3), padding='same', activation='relu',
kernel_regularizer=l2(0.0005))(x)
x = Conv2D(32, (7, 7), activation='relu',
kernel_regularizer=l2(0.005))(x)
x = MaxPooling2D(pool_size=(3, 3))(x)
x = Dropout(0.5)(x)
x = Flatten()(x)
#x = Dropout(0.5)(x)
x = Dense(16, activation='relu',
kernel_regularizer=l2(0.005))(x)
model = Model(inputs=input_main, outputs=x)
Two of these are then combined to make a siamese architecture, and the difference between the vectors from the final layer informs the result. I have experimented with dropout and regularization, and neither has been able to solve the problem (these parameters are the ones I am testing at time of posting)
I have also tried simplifying the architecture to fewer conv layers, and this has not solved the problem.
The data is 256x128x1 images, sent through the network in pairs with binary labels based on whether they are the same or not. I also use data augmentation, with some small rotations and translations.
Can anyone suggest anything else to try to solve this overfitting problem?

Training Convolutional Autoencoder with Keras

I'm training a convolutional autoencoder for IR faces, this is my first time doing autoencoder. I have about 1300 training images, and I didn't using any regulation method. Here's what I got after 800 epochs:
top: test images, bottom: output from autoencoder.
And this is my training curve: top: training loss, bottom: validation loss. Validation loss uses the test set images that is separated from training set. At the end, the training loss is about 0.006, but the validation loss is 0.009.
My model is defined bellow, with input images with size 110X150 and output images with size 88X120. I simply resize the source images to make the training labels. Each sample/label are normalized by dividing by 255.
As for the architecture of this network, I read one paper using this similar layout for RGB images face feature, and I halved each layer's depth (channels) for my purpose.
So my question is, is there something wrong? The training curve is quite weird to me. And how do I improve this autoencoder? More epochs? Regulations? Choose another activation function(I heard about leaky ReLU is better). Any feedback and suggestion is appreciated, thanks!
def create_models():
input_img = Input(shape=(150, 110, 1)) # adapt this if using `channels_first` image data format
x = Conv2D(128, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(128, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(256, (3, 3), activation='relu')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(512, (3, 3), activation='relu')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)
# at this point the representation is (8, 6, 512) i.e. 128-dimensional
x = Conv2D(512, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(256, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(128, (3, 3), activation='relu')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(128, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='tanh', padding='same')(x)
autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adadelta', loss='mean_squared_error')
return autoencoder