efficientnet.tfkeras vs tf.keras.applications.efficientnet - tensorflow

I am trying to use efficientnet to custom train my dataset.
And I find out with all other code/data/config the same.
efficientnet.tfkeras.EfficientNetB0 can gives ~90% training/prediction accruacy and tf.keras.applications.efficientnet.EfficientNetB0 only gives ~70% accuracy.
But I guess both should be the same implementation of the efficient net, or I am missing something here?
I am using latest efficientnet and Tensorflow 2.3.0
with strategy.scope():
model = tf.keras.Sequential([
efficientnet.tfkeras.EfficientNetB0( #tf.keras.applications.efficientnet.EfficientNetB0
input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3),
weights='imagenet',
include_top=False
),
L.GlobalAveragePooling2D(),
L.Dense(1, activation='sigmoid')
])
model.compile(
optimizer='adam',
loss='binary_crossentropy',
metrics=['binary_crossentropy']
)
model.summary()

I did run into the same problem for EfficientNetB4 and did encounter the following:
The number of total parameters are not equal. The trainable parameters are equal, but the non-trainable parameters aren't. The efficientnet.tfkeras has 7 fewer non-trainable parameters than the tf.keras.applications model.
The number of layers are not equal, the efficientnet.tfkeras has fewer layers than tf.keras.application model.
The different layers are at the very beginning, the most noteworthy are the normalization and rescaling layers, which are in the tf.keras.applications model, but not in the efficientnet.tfkeras model. You can observe this yourself using the model.summary() method.
When applying this layer, by using model.layers[i](array), it turn out these layers do rescale the image by dividing it by 255 and applying normalization according to:
(input_image - IMAGENET_MEAN) / square_root(IMAGENET_STD)
Thus, it turns out the image normalization is build into the model. When you perform this normalization yourself to the input image, the image will be normalized twice resulting in extremely small pixel values. The model will therefore have a hard time learning.
TLDR: Do not normalize the input image as it is build into the tf.keras.application model, input images should have values in the range 0-255.

Related

Is validation curve slight greater or lower in CNN models good?

Can you tell me which one among the two is a good validation vs train plot?
Both of them are trained with same keras sequential layers, but the second one is trained using more number of samples, i.e. augmented the dataset.
I'm a little bit confused about the zigzags in the first plot, otherwise I think it is better than the second.
In the second plot, there are no zigzags but the validation accuracy tends to be a little high than train, is it overfitting or considerable?
It is an image detection model where the first model's dataset size is 5170 and the second had 9743 samples.
The convolutional layers defined for the model building:
tf.keras.layers.Conv2D(128,(3,3), activation = 'relu', input_shape = (150,150,3)),
tf.keras.layers.MaxPool2D(2,2),
tf.keras.layers.Conv2D(64,(3,3), activation = 'relu'),
tf.keras.layers.MaxPool2D(2,2),
tf.keras.layers.Conv2D(32,(3,3), activation = 'relu'),
tf.keras.layers.MaxPool2D(2,2),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(512,activation='relu'),
tf.keras.layers.Dropout(0.5),
tf.keras.layers.Dense(128,activation='relu'),
tf.keras.layers.Dropout(0.25),
tf.keras.layers.Dense(1,activation='sigmoid')
Can the model be improved?
From the graphs the second graph where you have more samples is better. The reason is with more samples the model is trained on a much wider probability distribution of images. So when validation is run you have a better chance of correctly classifying the image. You have a lot of dropout in your model. This is good to prevent over fitting, however it will lower the training accuracy relative to the validation accuracy. Your model seems to be doing well. It might improve if you add additional convolution- max pooling layers. Alternative of course is to use transfer learning. I would recommend efficientnetb3. I also recommend using an adjustable learning rate. The Keras callback ReduceLROnPlateau works well for that purpose. Documentation is here.. Code below shows my recommended settings.
rlronp=tf.keras.callbacks.ReduceLROnPlateau(
monitor='val_loss',
factor=0.5,
patience=2,
verbose=1,
mode='auto'
)
in model.fit include callbacks=[rlronp]

Improve multiclass text classification model with LSTM and Glove, Keras and Tensorflow

I have spent some time trying to improve my F1-Score for my multiclass text classification task. I am extraction aspects and sentiments from laptop reviews. Therefore there are 3 labels, B_A / I_A / O etc. I would really appreciate any suggestions to improve my network, for example additional layers or another embedding. (Maybe I should also try something else than multiclass classification for my task)
Now I have got a F1-Score of about 60% for the following code:
#vocab_size=4840, embedding is glove6B, max_seq_length=100
model = Sequential()
model.add(Embedding(vocab_size, 300, weights=[embedding_vectors], input_length=max_seq_length,
trainable= False))
model.add(Dropout(0.1))
model.add(Conv1D(3000, 1, activation='relu'))
model.add(Bidirectional(LSTM(units=150, recurrent_dropout=0, return_sequences=True)))
model.add(Dense(32, activation='relu'))
model.add(Dense(n_tags, activation='softmax'))
model.compile(loss="categorical_crossentropy", optimizer="rmsprop", metrics=["categorical_accuracy"])
model.summary()
# fit model on train data
model.fit(x_train, y_train,
batch_size=64,
epochs=10)
I don't know about the data, but I do have a lot of suggestions in general for mult-text classification with keras:
Instead of adding 1 3000 Conv1D layer, try adding multiple Conv1D layers of a smaller filtering amount
For the 32 neuron Dense layer, try increasing the amount of neurons. Often, when you don't have enough neurons in the layer before the output layer, the model loses accuracy
Instead of adding activation='relu' into the layers, instead try adding a LeakyReLU, so it would fix the dying ReLU problem if it is there
Instead of adding the Dropout after the Embedding layer, add the Dropout after the Conv1D layer. I wouldn't see the need for a Dropout after an untrainable layer made just for vectorizing inputs
If you haven't tried any of my suggestions already, I would recommend trying it. I especially would try the 4th one, as a Dropout after an Embedding layer doesn't seem neccessary.

Extracting activations from a specific layer of neural network

I was working on an image recognition problem. After training the model, I saved the architecture as well as weights. Now I want to use the model for extracting features from other images and perform SVM on that. For this, I want to remove the last two layers of my model and get the values calculated by the CNN and fully connected layers till then. How can I do that in Keras?
# a simple model
model = keras.models.Sequential([
keras.layers.Input((32,32,3)),
keras.layers.Conv2D(16, 3, activation='relu'),
keras.layers.Flatten(),
keras.layers.Dense(10, activation='softmax')
])
# after training
feature_only_model = keras.models.Model(model.inputs, model.layers[-2].output)
feature_only_model take a (32,32,3) for input and the output is the feature vector
If your model is subclassed - just change call() method.
If not:
if your model is complicated - wrap your model by subclassed model and change forward pass in call() method, or
if your model is simple - create model without the last layers, load weights to every layer separately

Tensorfow-lite PReLU Fusion and TransposeConv Bias

When we convert a tf.keras model with PReLU with tf 1.15, the PReLU layers becomes ReLU and seem to get fused with previous operators. As a result, the keras h5 file of 28 MB becomes 1.3 MB in size.It looks like number of parameters gets significantly less since i did not use share weights axes option with PReLU. So, does this conversion work properly without any accuracy loss? Are the weights of PReLU discarded altogether? Similarly does the fusion take into account the bias of transpose convolution layers(bias is not mentioned as input property in netron). Do these fusions preserve the trained weight parameters internally and do they effect the inference accuracy of tflite?
Prelu Fusion:-
input = Input(shape=(512,512,3), name='ip')
x = Conv2D(filters=8, kernel_size=2, strides=2, padding='valid')(input)
x = PReLU()(x) # shared_axes not used
It shows prelu/ReLU in output property
Transpose conv:-
cout1 = Conv2DTranspose(filters=8, kernel_size=2, strides=2, padding = 'same' )(pout1) # Bias is true by default
It does not show bias in output property
So, does the fusion work properly by combining weights or are they being discarded?
If all the values in the weights are zeros it automatically discards them during fusion/conversion. So, PReLU became ReLU after fusion and transpose conv+bias became transpose conv. The problem arises when you convert a model to tflite format before training, since the weights have their default values(zeros).

Why did the Keras Sequential model give a different result compared to Model model?

I've tried a simple lstm model in keras to do a simple sentiment analysis using imdb dataset using both Sequential model and Model model, and turns out the latter gives a worse result. Here's my code :
model = Sequential()
model.add(Embedding(top_words, embedding_vector_length, input_length=max_review_length))
model.add(LSTM(100))
model.add(Dense(2, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
It gives a result around 0.6 of accuracy in the first epoch, while the other code that use Model :
_input = Input(shape=[max_review_length], dtype='int32')
embedded = Embedding(
input_dim=top_words,
output_dim=embedding_size,
input_length=max_review_length,
trainable=False,
mask_zero=False
)(_input)
lstm = LSTM(100, return_sequences=True)(embedded)
probabilities = Dense(2, activation='softmax')(lstm)
model = Model(_input, probabilities)
model.compile(loss='categorical_crossentropy',
optimizer='rmsprop',
metrics=['accuracy'])
and it gives 0.5 accuracy as a result of the first epoch and never change afterwards.
Any reason for that, or am i doing something wrong? Thanks in advance
I see two main differences between your two models :
You have set the embeddings of the second model as "trainable=False". So you have probably a lot fewer parameters to optimize the second model compared to the first one.
The LSTM is returning the whole sequence in the second model, so the outputs shape will be different, so I don't see how you can compare the two models, they are not doing the same thing.