Incompatible input_shape in deep learning using keras - tensorflow

I am starting off my deep learning journey using the IMDB movie review dataset. I am not sure how the training data is loaded and the input_shape is specified.
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)
If I understand it correctly, there are 25000 reviews of different movies, and the words of each review are encoded as a list of integers.
train_data.shape
(25000,)
Since each review has different lengths, how is it possible to store such data in a matrix (i.e., 25,000 rows but columns with different lengths)?
model = models.Sequential()
model.add(layers.Dense(16, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(16, activation='relu', input_shape=(16,)))
model.add(layers.Dense(1, activation='sigmoid', input_shape=(16,)))
After reshaping the input to (25000,10000) using one-hot encoding, why is the first argument to input_shape not 25000 (number of samples) rather than 10000 since the Dense layer will compute the outputs according to relu(dot(w, inputs) + b)? ==> (25000, 10000) dot (10000, 16) = (25000, 16) Why don't we specify the input_shape as [None, 10000] as in Tensorflow core?

You can pad the sequences to a common length using Keras' pad_sequences:
data = pad_sequences(data, maxlen=max_length, padding='post')
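To make the effect of post-padding concrete, here is a minimal pure-Python sketch. The helper `pad_post` and the toy reviews are assumptions for illustration, not the Keras implementation; like the Keras default `truncating='pre'`, it drops tokens from the start of sequences that are longer than `maxlen`.

```python
def pad_post(sequences, maxlen, value=0):
    """Pad each sequence at the end to exactly `maxlen` items (illustrative helper).

    Assumes maxlen >= 1.
    """
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]  # keep only the last `maxlen` tokens if too long
        padded.append(seq + [value] * (maxlen - len(seq)))
    return padded


# Three toy "reviews" of different lengths (made-up word indices)
reviews = [[1, 14, 22], [1, 194, 1153, 194, 8255], [1, 6]]
padded = pad_post(reviews, maxlen=4)
for row in padded:
    print(row)
```

After padding, every row has the same length, so the data fits in a regular 2D array.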
I also recommend using an Embedding layer:
model = Sequential()
model.add(Embedding(num_words, 32, input_length=max_length))
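On the input_shape part of the question: in Keras, `input_shape` describes a single sample and leaves out the batch dimension, so `(10000,)` already corresponds to `[None, 10000]`. A minimal NumPy sketch of the multi-hot encoding that produces such rows follows; the function name `vectorize_sequences` and the toy data are assumptions for illustration.

```python
import numpy as np


def vectorize_sequences(sequences, dimension=10000):
    """Turn lists of word indices into fixed-width multi-hot rows."""
    results = np.zeros((len(sequences), dimension), dtype="float32")
    for i, seq in enumerate(sequences):
        results[i, seq] = 1.0  # set every index that occurs in review i
    return results


# Toy data: 3 reviews over a vocabulary of 10 words
toy_reviews = [[1, 5, 5, 9], [2, 3], [7]]
x = vectorize_sequences(toy_reviews, dimension=10)
print(x.shape)      # (number of samples, dimension)
print(x.shape[1:])  # per-sample shape, which is what input_shape describes
```

The 25000 in the question plays the role of the 3 here: it is the batch axis and is not part of input_shape.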

Related

CNN with LSTM-Layer

I have implemented a CNN with an LSTM layer. My input consists of four images. The images were transformed into a tensor by feature extraction. The input shape is (4,256,256,3).
The following is the structure of my model:
model = keras.models.Sequential()
model.add(TimeDistributed(Conv2D(32,(3,3),padding = 'same', activation = 'relu'),input_shape = (4,256,256,3)))
model.add(TimeDistributed(MaxPooling2D((2,2))))
model.add(TimeDistributed(Dropout(0.25)))
model.add(TimeDistributed(Conv2D(64,(3,3),padding = 'same', activation = 'relu')))
model.add(TimeDistributed(MaxPooling2D((4,4))))
model.add(TimeDistributed(Dropout(0.25)))
model.add(TimeDistributed(Conv2D(128,(3,3),padding = 'same', activation = 'relu')))
model.add(TimeDistributed(MaxPooling2D((2,2))))
model.add(TimeDistributed(Dropout(0.25)))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(128, activation='tanh'))
# finalize with standard Dense, Dropout...
model.add(Dense(64, activation='relu'))
model.add(Dropout(.5))
model.add(Dense(1, activation='relu'))
optim = keras.optimizers.Adam(learning_rate=0.001)
model.compile(optimizer=optim, loss=['MSE'])
history = model.fit(x=X, y=Y, batch_size=4, epochs=5, validation_split=0.2, validation_data=(X,Y))
My problem is that my model predicts the same values for all inputs.
What could be the problem?
You use the same data for training and validation (validation_data=(X, Y)), which defeats the whole point of validation. Perhaps the mistake lies there. Try splitting the data, or apply cross-validation.
Also, applying the relu activation function to the last layer in combination with the MSE loss looks strange. At the least, relu can give an unbounded result, and the data should be normalized.
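One way a relu output can produce identical predictions: if the pre-activation of the final unit is negative for every input, relu maps all of them to 0. A minimal sketch with made-up pre-activation values (assumed for illustration, not taken from the model above):

```python
def relu(z):
    """Rectified linear unit for a single number."""
    return max(0.0, z)


# Hypothetical pre-activation values of the final Dense(1) unit for four inputs
pre_activations = [-2.3, -0.7, -5.1, -0.01]

relu_outputs = [relu(z) for z in pre_activations]  # what activation='relu' returns
linear_outputs = list(pre_activations)             # what a linear output would return

print(relu_outputs)    # every prediction collapses to the same value
print(linear_outputs)  # the inputs are still distinguishable
```

In that regime the gradient through the relu is also zero, so the unit cannot recover during training. For a regression target, a linear output (no activation) avoids this.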
I hope this will help you
If you are working on a classification problem, specifically binary classification, use a sigmoid activation on the output layer instead of softmax. MSE loss is also not a good choice for binary classification.

Keras CNN: Multi Label Classification of Images

I am rather new to deep learning and have some questions about performing a multi-label image classification task with Keras convolutional neural networks. They mainly concern evaluating Keras models that perform multi-label classification. I will structure this a bit to give a better overview first.
Problem Description
The underlying dataset consists of album cover images from different genres. In my case those are electronic, rock, jazz, pop, hiphop. So we have 5 possible classes that are not mutually exclusive. The task is to predict the possible genres for a given album cover. Each album cover is of size 300px x 300px. The images are loaded into tensorflow datasets and resized to 150px x 150px.
Model Architecture
The architecture for the model is the following.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
data_augmentation = keras.Sequential(
    [
        layers.experimental.preprocessing.RandomFlip("horizontal",
                                                     input_shape=(img_height,
                                                                  img_width,
                                                                  3)),
        layers.experimental.preprocessing.RandomFlip("vertical"),
        layers.experimental.preprocessing.RandomRotation(0.4),
        layers.experimental.preprocessing.RandomZoom(height_factor=(0.2, 0.6), width_factor=(0.2, 0.6))
    ]
)
def create_model(num_classes=5, augmentation_layers=None):
    model = Sequential()
    # We can pass a list of layers performing data augmentation here
    if augmentation_layers:
        # The first layer of the augmentation layers must define the input shape
        model.add(augmentation_layers)
        model.add(layers.experimental.preprocessing.Rescaling(1./255))
    else:
        model.add(layers.experimental.preprocessing.Rescaling(1./255, input_shape=(img_height, img_width, 3)))
    model.add(layers.Conv2D(32, (3, 3), activation='relu'))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(64, (3, 3), activation='relu'))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(128, (3, 3), activation='relu'))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(128, (3, 3), activation='relu'))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Flatten())
    model.add(layers.Dense(512, activation='relu'))
    # Use sigmoid activation function. Basically we train binary classifiers for each class by specifying binary crossentropy loss and sigmoid activation on the output layer.
    model.add(layers.Dense(num_classes, activation='sigmoid'))
    model.summary()
    return model
I'm not using the usual metrics here like standard accuracy. In this paper I read that you cannot evaluate multi-label classification models with the usual methods. In chapter 7. evaluation metrics the hamming loss and an adjusted accuracy (variant of exact match) are presented which I use for this model.
The hamming loss is already provided by tensorflow-addons (see here) and an implementation of the subset accuracy I found here (see here).
from tensorflow_addons.metrics import HammingLoss
hamming_loss = HammingLoss(mode="multilabel", threshold=0.5)
def subset_accuracy(y_true, y_pred):
    # From https://stackoverflow.com/questions/56739708/how-to-implement-exact-match-subset-accuracy-as-a-metric-for-keras
    threshold = tf.constant(.5, tf.float32)
    gtt_pred = tf.math.greater(y_pred, threshold)
    gtt_true = tf.math.greater(y_true, threshold)
    accuracy = tf.reduce_mean(tf.cast(tf.equal(gtt_pred, gtt_true), tf.float32), axis=-1)
    return accuracy
# Create model
model = create_model(num_classes=5, augmentation_layers=data_augmentation)
# Compile model
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=[subset_accuracy, hamming_loss])
# Fit the model
history = model.fit(training_dataset, epochs=epochs, validation_data=validation_dataset, callbacks=callbacks)
Problem with this model
When training the model, subset_accuracy and hamming_loss get stuck at some point, which looks like the following:
What could cause this behaviour? I am honestly a little bit lost right now. Could this be a case of the dying relu problem? Or am I using the metrics incorrectly, or is their implementation maybe wrong?
So far I have tried different optimizers and lowering the learning rate (e.g. from 0.01 to 0.001, 0.0001, etc.), but that didn't help either.
Maybe somebody has an idea that can help me.
Thanks in advance!
I think you need to tune your model's hyperparameters. For that I recommend trying the Keras Tuner library.
This will take some time to run, but should find you a suitable set of hyperparameters.
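On the question of whether the metric is implemented as intended: the `subset_accuracy` function in the question takes the mean over the label axis, which gives the fraction of correctly predicted labels per sample (that is, one minus the Hamming loss) rather than an exact match. A NumPy sketch of the difference, using made-up labels and probabilities:

```python
import numpy as np

# Made-up ground truth and predicted probabilities for 3 samples and 5 labels
y_true = np.array([[1, 0, 1, 0, 0],
                   [0, 1, 0, 0, 1],
                   [1, 1, 0, 0, 0]])
y_prob = np.array([[0.9, 0.2, 0.4, 0.1, 0.3],   # one label wrong
                   [0.1, 0.8, 0.2, 0.3, 0.7],   # all labels right
                   [0.6, 0.4, 0.7, 0.2, 0.1]])  # two labels wrong

# Threshold both at 0.5 and compare label by label
matches = (y_prob > 0.5) == (y_true > 0.5)

# Mean over the label axis: fraction of correct labels per sample
per_label_accuracy = matches.mean(axis=-1)

# Exact match: 1.0 only if every label of the sample is correct
exact_match = matches.all(axis=-1).astype(float)

print(per_label_accuracy)
print(exact_match)
```

In TensorFlow, an exact-match variant would use `tf.reduce_all` over the last axis instead of `tf.reduce_mean`. Which of the two you want to report is a separate decision.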

Why do we need to put one more layer and where is the softmax activation function?

I'm reading and testing the basic example of CNN from TensorFlow tutorial web site:
The model from the tutorial looks like this:
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Flatten())
# 1.why do we need the next line ?
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10))
Two basic questions:
We are building CNN network.
Why do we need the last layer (model.add(layers.Dense(64, activation='relu'))) ?
It is not part of the CNN network, and from my tests, I'm getting same results (accuracy) with or without this last layer
In the tutorial they wrote that they used softmax in the last layer:
"CIFAR has 10 output classes, so you use a final Dense layer with 10 outputs and a softmax activation"
but they didn't use softmax in their code.
I checked the documentation, and the default activation function is None, not softmax. So does the tutorial have a mistake, and softmax is not actually used?
Convolutional Neural Network (CNN)
A CNN consists of (conv-pool)^n - (flatten or global pool) - (Dense)^m, where the (conv-pool)^n part extracts features from a 2D signal and the (Dense)^m part selects among the features from the previous layers.
The output of the last convolutional layer is (4,4,64), i.e. 64 feature maps of size 4 × 4 (2D signals). We then flatten them to get a 4 × 4 × 64 = 1024-dimensional vector (alternatively, global max/avg pooling would give a 64-dimensional vector). With Flatten you have a 1024-dimensional vector and 10 classes. Mapping 1024 dimensions directly to 10 reduces the dimension drastically, which can lead to a loss of important features. This is known as a 'representational bottleneck'. To avoid it, you can insert a Dense layer (with, say, 64 neurons) that first projects the 1024-dimensional vector to 64 dimensions, followed by the projection from 64 to 10 dimensions. If you use global max/avg pooling, you can skip the additional Dense layer. In your case it seems that the representational bottleneck is avoided.
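The shape arithmetic behind (4,4,64) and 1024 can be checked with a few lines of plain Python. The helper names are assumptions for illustration; the sketch assumes the tutorial's settings (3 × 3 kernels without padding, stride 1, and 2 × 2 max pooling).

```python
def conv_valid(size, kernel):
    """Output side length of an unpadded ('valid') convolution with stride 1."""
    return size - kernel + 1


def max_pool(size, pool):
    """Output side length of non-overlapping max pooling (floor division)."""
    return size // pool


side = 32                   # CIFAR images are 32 x 32
side = conv_valid(side, 3)  # Conv2D(32, (3, 3))  -> 30
side = max_pool(side, 2)    # MaxPooling2D((2, 2)) -> 15
side = conv_valid(side, 3)  # Conv2D(64, (3, 3))  -> 13
side = max_pool(side, 2)    # MaxPooling2D((2, 2)) -> 6
side = conv_valid(side, 3)  # Conv2D(64, (3, 3))  -> 4

channels = 64               # number of filters in the last Conv2D
flattened = side * side * channels

print((side, side, channels), flattened)
```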
The tutorial is using
model.compile(optimizer='adam',
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
TensorFlow has an efficient implementation that computes the loss directly from logits. This way, you need not use softmax in the layer: with from_logits=True the loss is computed as if you had used softmax.
If you still wish to use softmax in the Dense layer, you can, but then set from_logits=False in the loss passed to compile(). However, the latter approach is less efficient, as it requires double work.
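The equivalence of the two routes can be checked numerically. The sketch below is an illustration of the math in plain Python, not TensorFlow's internal code; the function names and the example logits are assumptions.

```python
import math


def softmax(logits):
    """Convert logits to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_from_probs(probs, label):
    """Route 1: softmax in the layer, then -log(p[label])."""
    return -math.log(probs[label])


def cross_entropy_from_logits(logits, label):
    """Route 2: work on logits directly via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


logits = [2.0, -1.0, 0.5]  # made-up outputs of a final Dense layer without activation
label = 0                  # integer class label, as with sparse categorical cross-entropy

loss_a = cross_entropy_from_probs(softmax(logits), label)
loss_b = cross_entropy_from_logits(logits, label)
print(loss_a, loss_b)
```

Both routes give the same loss up to floating-point error; the logits route avoids computing explicit probabilities and then taking their logarithm.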
The purpose of a dense (fully connected) layer before the final dense layer is to weight the extracted features, in effect letting them vote for the most appropriate label, before the final layer makes the selection. In the original illustration, a few more neurons are added to help select the label cat.
Check this link out for a deeper understanding of fc layers: https://missinglink.ai/guides/convolutional-neural-networks/fully-connected-layers-convolutional-neural-networks-complete-guide/
A softmax layer typically maps the predictions (logits) into a more understandable format in which the values in the tensor add up to 1:
[1.79, 0.0, 0.69, 0.0] # Before applying softmax (logits)
[0.6, 0.1, 0.2, 0.1] # After applying softmax (rounded)
Note: The typical way of using the predictions is to take the index of the highest value in the tensor
import numpy as np
preds = model.predict(batch_data)
highest_val = np.argmax(preds, axis=-1) # index of the highest value per sample; 0 for the example above

How to build a pretrained CNN-LSTM network with Keras

I'm trying to use a CNN-LSTM network with Keras in order to analyze videos. I read about it and ran into the TimeDistributed wrapper and some examples.
I tried the network described below, which is composed of convolutional and pooling layers followed by recurrent and dense layers.
model = Sequential()
model.add(TimeDistributed(Conv2D(2, (2,2), activation= 'relu' ), input_shape=(None, IMG_SIZE, IMG_SIZE, 3)))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(50))
model.add(Dense(50, activation = 'softmax'))
model.compile(loss = 'categorical_crossentropy' , optimizer = 'adam' , metrics = ['acc'])
I haven't tested the model properly, since my dataset is too small. However, during training the network reaches an accuracy of 0.98 in 4-5 epochs (perhaps it is overfitting, but that isn't a problem yet because I hope to get a suitable dataset later).
Then, I read about how to use a pretrained convolutional network (MobileNet, ResNet or Inception) as a feature extractor for LSTM network, such that I use the following code:
inputs = Input(shape = (frames, IMG_SIZE, IMG_SIZE, 3))
cnn_base = InceptionV3(include_top = False, weights='imagenet', input_shape = (IMG_SIZE, IMG_SIZE, 3))
cnn_out = GlobalAveragePooling2D()(cnn_base.output)
cnn = Model(inputs=cnn_base.input, outputs=cnn_out)
encoded_frames = TimeDistributed(cnn)(inputs)
encoded_sequence = LSTM(256)(encoded_frames)
hidden_layer = Dense(1024, activation="relu")(encoded_sequence)
outputs = Dense(50, activation="softmax")(hidden_layer)
model = Model([inputs], outputs)
In this case, when training the model it always shows accuracy ~0.02 (it is the baseline 1/50).
Since the first model at least learned something, I am wondering if there is an error in the way the network is built in the second case.
Has anybody faced this situation? Any advice?
Thank you.
The reason is that you have a very small amount of data and are retraining all of the Inception V3 weights. Either train the model on more data, or train it for more epochs with hyperparameter tuning. You can find more about hyperparameter tuning here.
The ideal way is to freeze the base model with base_model.trainable = False (cnn_base in your code) and train only the new layers that you have added on top of the Inception V3 layers.
OR
Unfreeze the top layers of the base model (Inception V3 layers) and set the bottom layers to be non-trainable. You can do it as below:
# Let's take a look to see how many layers are in the base model
print("Number of layers in the base model: ", len(base_model.layers))
# Fine-tune from this layer onwards
fine_tune_at = 100
# Freeze all the layers before the `fine_tune_at` layer
for layer in base_model.layers[:fine_tune_at]:
    layer.trainable = False
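To see what the slice `[:fine_tune_at]` does without loading a real network, here is a sketch with stand-in objects. `FakeLayer` and the layer count of 150 are hypothetical and only mimic the `trainable` flag of a Keras layer.

```python
from dataclasses import dataclass


@dataclass
class FakeLayer:
    """Stand-in for a Keras layer; only the `trainable` flag matters here."""
    name: str
    trainable: bool = True


# Hypothetical base model with 150 layers, just to have something to slice
base_layers = [FakeLayer(f"layer_{i}") for i in range(150)]
fine_tune_at = 100

# Freeze everything before index `fine_tune_at`; later layers stay trainable
for layer in base_layers[:fine_tune_at]:
    layer.trainable = False

frozen = sum(not layer.trainable for layer in base_layers)
trainable = sum(layer.trainable for layer in base_layers)
print(frozen, trainable)
```

Layers 0 to 99 (closest to the input) end up frozen, and only the layers from index 100 onwards are updated during fine-tuning.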

Convolutional neural network Conv1d input shape

I am trying to create a CNN to classify data. My data is X[N_data, N_features].
I want to create a neural net capable of classifying it. My problem concerns the input shape of a Conv1D layer in Keras.
I want to apply a filter over, let's say, 10 features and then reuse the same weights for the next ten features.
For each sample, my convolutional layer would create N_features/10 new neurons.
How can I do so? What should I put in input_shape?
def cnn_model():
    model = Sequential()
    model.add(Conv1D(filters=1, kernel_size=10 ,strides=10,
                     input_shape=(1, 1,N_features),kernel_initializer= 'uniform',
                     activation= 'relu'))
    model.flatten()
    model.add(Dense(N_features/10, init= 'uniform' , activation= 'relu' ))
Any advice?
thank you!
Try:
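def cnn_model():
    model = Sequential()
    model.add(Conv1D(filters=1, kernel_size=10, strides=10,
                     input_shape=(N_features, 1), kernel_initializer='uniform',
                     activation='relu'))
    # Flatten is a layer, not a method of the model
    model.add(Flatten())
    # Dense needs an integer number of units, hence the integer division
    model.add(Dense(N_features // 10, kernel_initializer='uniform', activation='relu'))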
....
And reshape your x to shape (nb_of_examples, nb_of_features, 1).
EDIT:
Conv1D was designed for sequence analysis: its convolutional filters are the same no matter which part of the sequence they are applied to. The second dimension is the so-called features dimension, where you can have a vector of multiple features at each timestep. One may think of the sequence dimension as analogous to the spatial dimensions, and of the feature dimension as analogous to the channel (color) dimension, in Conv2D. As @putonspectacles mentioned in his comment, you may set the sequence dimension to None in order to make your network invariant to the input length.
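A pure-Python sketch of what `Conv1D(filters=1, kernel_size=10, strides=10)` computes on one sample of shape `(N_features, 1)`, ignoring the activation: the same 10 weights are applied to each consecutive block of 10 features. The function `conv1d_valid` and all numbers are assumptions for illustration.

```python
def conv1d_valid(signal, kernel, stride, bias=0.0):
    """1D convolution without padding, for one filter and one input channel."""
    k = len(kernel)
    outputs = []
    for start in range(0, len(signal) - k + 1, stride):
        window = signal[start:start + k]
        outputs.append(sum(w * x for w, x in zip(kernel, window)) + bias)
    return outputs


n_features = 30
sample = [float(i) for i in range(n_features)]  # one row of X, made-up values
kernel = [0.1] * 10                             # the same 10 weights reused for every block

out = conv1d_valid(sample, kernel, stride=10)
print(len(out))  # one output per block of 10 features
print(out)
```

With a stride equal to the kernel size the blocks do not overlap, so the layer produces N_features // 10 values per sample while learning only 10 weights and one bias.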
@Marcin's answer might work, but my suggestion, given the documentation here:
When using this layer as the first layer in a model, provide an
input_shape argument (tuple of integers or None, e.g. (10, 128) for
sequences of 10 vectors of 128-dimensional vectors, or (None, 128) for
variable-length sequences of 128-dimensional vectors.
would be:
model = Sequential()
model.add(Conv1D(filters=1, kernel_size=10 ,strides=10,
input_shape=(None, N_features),kernel_initializer= 'uniform',
activation= 'relu'))
Note that since the input data has shape (N_Data, N_features), we set the number of examples as unspecified (None). The strides argument controls the size of the timesteps in this case.
To feed ordinary tabular feature data of shape (nrows, ncols) to Keras' Conv1D, the following 2 steps are needed:
xtrain = xtrain.reshape(nrows, ncols, 1)  # reshape returns a new array, so assign it
# For conv1d statement:
input_shape = (ncols, 1)
For example, taking the first 4 features of the iris dataset:
To see the usual format and its shape:
iris_array = np.array(irisdf.iloc[:,:4].values)
print(iris_array[:5])
print(iris_array.shape)
The output shows the usual format and its shape:
[[5.1 3.5 1.4 0.2]
[4.9 3. 1.4 0.2]
[4.7 3.2 1.3 0.2]
[4.6 3.1 1.5 0.2]
[5. 3.6 1.4 0.2]]
(150, 4)
The following code alters the format:
nrows, ncols = iris_array.shape
iris_array = iris_array.reshape(nrows, ncols, 1)
print(iris_array[:5])
print(iris_array.shape)
The output of the above code shows the new data format and its shape:
[[[5.1]
[3.5]
[1.4]
[0.2]]
[[4.9]
[3. ]
[1.4]
[0.2]]
[[4.7]
[3.2]
[1.3]
[0.2]]
[[4.6]
[3.1]
[1.5]
[0.2]]
[[5. ]
[3.6]
[1.4]
[0.2]]]
(150, 4, 1)
This format works well with Keras' Conv1D. The corresponding input_shape is (4, 1).