TensorFlow Hub vs Keras applications - performance drop - tensorflow

I have an image classification problem and I want to use Keras pretrained models for this task.
When I use a model like this:
import tensorflow as tf
import tensorflow_hub as hub

model = tf.keras.Sequential([
    hub.KerasLayer("https://tfhub.dev/google/tf2-preview/mobilenet_v2/feature_vector/4",
                   output_shape=[1280],
                   trainable=False),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(num_classes, activation='softmax')
])
model.build([None, image_size[0], image_size[1], 3])
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss='categorical_crossentropy',
    metrics=['acc'])
I easily get ~90% accuracy and very low loss on a balanced dataset. However, if I use keras.applications like this:
base_model = tf.keras.applications.mobilenet_v2.MobileNetV2(
    input_shape=input_img_size,
    include_top=False,
    weights='imagenet'
)
base_model.trainable = False
x = tf.keras.layers.Dropout(0.5)(base_model.output)
x = tf.keras.layers.Dense(num_classes, activation='softmax')(x)
model = tf.keras.models.Model(inputs=base_model.input, outputs=x)
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss='categorical_crossentropy',
    metrics=['acc'])
and use the proper tf.keras.applications.mobilenet_v2.preprocess_input function in the data generator (leaving everything else the same), it gets stuck at around 60% validation and 80% training accuracy.
What is the difference between these approaches? Why is one superior to the other?
The data generator:
datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    preprocessing_function=preprocessing_function,
    rotation_range=10,
    zoom_range=0.3,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True,
    vertical_flip=True,
    shear_range=0.2,
)
Training:
history = model.fit_generator(
    train_generator,
    epochs=nb_epochs,
    verbose=1,
    steps_per_epoch=steps_per_epoch,
    validation_data=valid_generator,
    validation_steps=val_steps_per_epoch,
    callbacks=[
        checkpoint,
        learning_rate_reduction,
        csv_logger,
        tensorboard_callback,
    ],
)

I believe you are training two different models. In your TensorFlow Hub example, you use MobileNet's feature-vector module. A feature vector is not the full model: it is a 1-D tensor of fixed length (here 1280), taken just before MobileNet's classification head, with the global average pooling already applied. Your tf.keras.applications example with include_top=False, by contrast, outputs the unpooled 4-D feature map, so the head you attach sees a different input. Adding a GlobalAveragePooling2D layer should make the two setups comparable, as in the sketch below.
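A minimal sketch of that change, reusing input_img_size and num_classes from the question:
import tensorflow as tf

base_model = tf.keras.applications.mobilenet_v2.MobileNetV2(
    input_shape=input_img_size,
    include_top=False,
    weights='imagenet')
base_model.trainable = False

x = base_model.output
x = tf.keras.layers.GlobalAveragePooling2D()(x)  # the pooling step the Hub module already includes
x = tf.keras.layers.Dropout(0.5)(x)
outputs = tf.keras.layers.Dense(num_classes, activation='softmax')(x)
model = tf.keras.models.Model(inputs=base_model.input, outputs=outputs)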

Related

Softmax activation gives worse performance with sparse_categorical_crossentropy loss

I have a simple Keras sequential model.
I have N categories and I have to predict in which category the next point will fall, based on the previous ones.
The weird thing is that when I remove the softmax activation function from the output layer, the performance is better (lower loss and higher sparse_categorical_accuracy).
As loss I'm using sparse_categorical_crossentropy with from_logits=True.
Is there any reason for that? Shouldn't it be the opposite?
Thank you in advance for any suggestion!
def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embedding_dim,
                                  batch_input_shape=[batch_size, None]),
        tf.keras.layers.GRU(rnn_units,
                            return_sequences=True,
                            stateful=True,
                            recurrent_initializer='glorot_uniform'),
        tf.keras.layers.Dense(vocab_size, activation='softmax')
    ])
    return model

model = build_model(
    vocab_size=vocab_size,
    embedding_dim=embedding_dim,
    rnn_units=rnn_units,
    batch_size=BATCH_SIZE)

def loss(labels, logits):
    return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)

model.compile(optimizer='adam', loss=loss, metrics=['sparse_categorical_accuracy'])

EPOCHS = 5
history = model.fit(train_set, epochs=EPOCHS, validation_data=val_set)
In a nutshell, when you use the option from_logits=True, you are telling the loss function that your neural network's output is not normalized. Since you are using a softmax activation in your last layer, your outputs are in fact normalized, so you have two options:
Remove the softmax activation, as you have already tried. Keep in mind that, after this, your outputs are raw logits rather than normalized probabilities.
Use from_logits=False.
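A minimal sketch of both consistent configurations (the 10-unit output size is just a placeholder):
import tensorflow as tf

# Option 1: raw logits in the last layer, loss told to expect logits.
model_logits = tf.keras.Sequential([
    tf.keras.layers.Dense(10)  # no activation: outputs are logits
])
model_logits.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

# Option 2: softmax in the last layer, loss told to expect probabilities.
model_probs = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation='softmax')
])
model_probs.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False))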

tensorflow 2 evaluate inconsistent with sklearn accuracy_score

I am trying to train a model to predict gender using the CelebA dataset and TensorFlow.
This is my setup:
train_data_gen = train_image_generator.flow_from_dataframe(
    dataframe=train_split,
    directory=celeba.images_folder,
    x_col='id',
    y_col='Male',
    target_size=(IMG_WIDTH, IMG_HEIGHT),
    batch_size=batch_size,
    classes=['1', '0']
)
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet')
model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(2),
    tf.keras.layers.Softmax()
])
base_learning_rate = 0.001
model.compile(optimizer=tf.keras.optimizers.RMSprop(lr=base_learning_rate),
              loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
Then I use the following to evaluate the model
test_data_gen = test_image_generator.flow_from_dataframe(
    dataframe=test_split,
    directory=celeba.images_folder,
    x_col='id',
    y_col='Male',
    target_size=(IMG_WIDTH, IMG_HEIGHT),
    batch_size=batch_size,
    classes=['1', '0']
)
model = tf.keras.models.load_model("cp-0004.ckpt")
# Re-evaluate the model
loss, acc = model.evaluate(test_data_gen, verbose=2)
which gives an accuracy of 0.87.
But when I use the following, I get 0.51 accuracy!
pred_test = model.predict(test_data_gen)
pred_df = pd.DataFrame(pred_test, columns=["Male", "Female"])
pred_df[pred_df > 0.5] = "1"
pred_df[pred_df < 0.5] = "0"
# test_split_raw = celeba.split('test', drop_zero=False)
confusion_matrix(test_split["Male"].astype(int).values, np.argmax(pred_df.values, 1))
Can anyone explain why the accuracy from the evaluate function is different?
You want to check test_image_generator.flow_from_dataframe: the default value of shuffle is True, so your generator yields your test data in random order.
Your model then predicts for those randomly ordered images, but you compare against your ordered dataframe. If you want to compare to test_split["Male"], set shuffle to False; otherwise you will always get ~0.5 accuracy (if your data is equally distributed). A sketch is shown below.
Another hint: use the .evaluate() method when you have labeled data; it also reports accuracy. Use .predict() only for new, unlabeled data.
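A minimal sketch of the fix, reusing the names from the question:
import numpy as np
from sklearn.metrics import confusion_matrix

# Disable shuffling so predictions line up with the dataframe order.
test_data_gen = test_image_generator.flow_from_dataframe(
    dataframe=test_split,
    directory=celeba.images_folder,
    x_col='id',
    y_col='Male',
    target_size=(IMG_WIDTH, IMG_HEIGHT),
    batch_size=batch_size,
    classes=['1', '0'],
    shuffle=False)  # keep the order of test_split

pred_test = model.predict(test_data_gen)
pred_labels = np.argmax(pred_test, axis=1)
# The i-th prediction now corresponds to the i-th row of the dataframe.
print(confusion_matrix(test_data_gen.classes, pred_labels))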

AlreadyExistsError while training a network on colab

I'm trying to train an LSTM network on Google Colab. However, this error occurs:
AlreadyExistsError: Resource __per_step_116/training_4/Adam/gradients/bidirectional_4/while/ReadVariableOp/Enter_grad/ArithmeticOptimizer/AddOpsRewrite_Add/tmp_var/N10tensorflow19TemporaryVariableOp6TmpVarE
[[{{node training_4/Adam/gradients/bidirectional_4/while/ReadVariableOp/Enter_grad/ArithmeticOptimizer/AddOpsRewrite_Add/tmp_var}}]]
I don't know where the issue could be. This is the network model:
sl_model = keras.models.Sequential()
sl_model.add(keras.layers.Embedding(max_index + 1, hidden_size, mask_zero=True))
sl_model.add(keras.layers.Bidirectional(keras.layers.LSTM(
    hidden_size, activation='tanh', dropout=0.2,
    recurrent_dropout=0.2, return_sequences=True)))
sl_model.add(keras.layers.Bidirectional(keras.layers.LSTM(
    hidden_size, activation='tanh', dropout=0.2,
    recurrent_dropout=0.2, return_sequences=False)))
sl_model.add(keras.layers.Dense(max_length, activation='softsign'))

optimizer = keras.optimizers.Adam()
sl_model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['acc'])

batch_size = 128
epochs = 3
cbk = keras.callbacks.TensorBoard("logging/keras_model")
print("\nStarting training...")
sl_model.fit(x_train, y_train, epochs=epochs, batch_size=batch_size,
             shuffle=True, validation_data=(x_dev, y_dev), callbacks=[cbk])
Thank you so much!
You need to restart your runtime -- this happens when you have built multiple graphs in a single Jupyter (Colaboratory) runtime.
Calling tf.reset_default_graph() may also help, but depending on whether you are using eager execution and how you've defined your sessions, this may or may not work. A sketch of clearing the global state is shown below.
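A minimal sketch (whether this suffices depends on your TF version and session setup):
import tensorflow as tf

# Clear Keras' global state so repeated cell executions in the same
# Colab runtime don't accumulate graphs.
tf.keras.backend.clear_session()

# In TF 1.x graph mode you can also reset the default graph explicitly:
# tf.compat.v1.reset_default_graph()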

My loss is "nan" and accuracy is "0.0000e+00" in transfer learning: InceptionV3

I am working on transfer learning. My use case is to classify two categories of images. I used InceptionV3 to classify the images. When training my model, I get nan as the loss and 0.0000e+00 as the accuracy in every epoch. I am using 20 epochs because my dataset is small: I have 1000 images for training and 100 for testing, with 5 records per batch.
from keras.applications.inception_v3 import InceptionV3
from keras.preprocessing import image
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D
from keras import backend as K

# create the base pre-trained model
base_model = InceptionV3(weights='imagenet', include_top=False)

# add a global spatial average pooling layer
x = base_model.output
x = GlobalAveragePooling2D()(x)
# let's add a fully-connected layer
x = Dense(1024, activation='relu')(x)
x = Dense(512, activation='relu')(x)
x = Dense(32, activation='relu')(x)
# and a logistic layer -- we have 2 classes
predictions = Dense(1, activation='softmax')(x)

# this is the model we will train
model = Model(inputs=base_model.input, outputs=predictions)

for layer in base_model.layers:
    layer.trainable = False

# we chose to train the top 2 inception blocks, i.e. we will freeze
# the first 249 layers and unfreeze the rest:
for layer in model.layers[:249]:
    layer.trainable = False
for layer in model.layers[249:]:
    layer.trainable = True

model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1./255)

training_set = train_datagen.flow_from_directory(
    'C:/Users/Desktop/Transfer/train/',
    target_size=(64, 64),
    batch_size=5,
    class_mode='binary')
test_set = test_datagen.flow_from_directory(
    'C:/Users/Desktop/Transfer/test/',
    target_size=(64, 64),
    batch_size=5,
    class_mode='binary')

model.fit_generator(
    training_set,
    steps_per_epoch=1000,
    epochs=20,
    validation_data=test_set,
    validation_steps=100)
It sounds like your gradient is exploding. There could be a few reasons for that:
Check that your input is generated correctly. For example, use the save_to_dir parameter of flow_from_directory.
Since you have a batch size of 5, fix steps_per_epoch from 1000 to 1000/5 = 200.
Use a sigmoid activation instead of softmax.
Set a lower learning rate in Adam; to do that you need to create the optimizer separately, like adam = Adam(0.0001), and pass it in model.compile(..., optimizer=adam). See the sketch after this list.
Try VGG16 instead of InceptionV3.
Let us know when you have tried all of the above.
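For the learning-rate suggestion, a minimal sketch (model is the model from the question):
from keras.optimizers import Adam

# Create the optimizer explicitly so a lower learning rate can be set.
adam = Adam(0.0001)
model.compile(loss="binary_crossentropy", optimizer=adam, metrics=["accuracy"])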
Using softmax as the activation does not make sense with a single output unit: the purpose of softmax is to make the values sum to 1, so a single value is normalized against itself and always equals 1. Since the predicted probability is then always 1, binary cross-entropy evaluates log(1 - 1) = log(0) for negative-class samples, which produces the NaN loss.
You should either change the number of classes to 2 by:
predictions = Dense(2, activation='softmax')(x)
class_mode='categorical' in flow_from_directory
loss="categorical_crossentropy"
or use the sigmoid activation function for the last layer, as sketched below.
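A minimal sketch of both fixes, reusing the x tensor from the question:
from keras.layers import Dense

# Option A: one sigmoid unit, binary labels
# (keep class_mode='binary' and loss="binary_crossentropy")
predictions = Dense(1, activation='sigmoid')(x)

# Option B: two softmax units, one-hot labels
# (switch to class_mode='categorical' and loss="categorical_crossentropy")
# predictions = Dense(2, activation='softmax')(x)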

How to use a batch-trained model to predict on a single input?

I have an RNN model that has been trained on a Dataset:
train = tf.data.Dataset.from_tensor_slices(
    (data_x[:train_size], data_y[:train_size])).batch(batch_size).repeat()
The model:
model = tf.keras.Sequential()
model.add(tf.keras.layers.GRU(units=lstm_num_units,
                              return_sequences=True,
                              kernel_initializer='random_uniform',
                              recurrent_initializer='random_uniform',
                              bias_initializer='random_uniform',
                              batch_size=batch_size,
                              input_shape=[seq_len, num_features]))
model.add(tf.keras.layers.LSTM(units=lstm_num_units,
                               batch_size=batch_size,
                               return_sequences=True,
                               input_shape=[seq_len, num_features]))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(units=dence_units))
model.add(tf.keras.layers.Dropout(drop_flat))
model.add(tf.keras.layers.Dense(units=out_units))
model.add(tf.keras.layers.Softmax())
model.compile(loss="sparse_categorical_crossentropy",
              optimizer=tf.train.RMSPropOptimizer(opt),
              metrics=['accuracy'])
model.fit(train, epochs=EPOCHS,
          steps_per_epoch=repeat_size_train,
          validation_data=validate,
          validation_steps=repeat_size_validate,
          verbose=1,
          shuffle=True,
          callbacks=[tensorboard, cp_callback])
I need to make a prediction on a single input of length seq_len, but it looks like my input has to match the batch size:
ar = np.random.randint(98, size=[batch_size, seq_len])
ar = np.reshape(ar, [batch_size, seq_len, 1])
prediction = model.predict(ar)
Is there a way to make it work on a single input of shape [1, seq_len, 1]?
Yes: rebuild the model without a batch size in the first layer, then copy the weights of the old model:
newModel.set_weights(oldModel.get_weights())
A fixed batch size only matters in stateful=True models, where it keeps state consistent between batches; mathematically, the result does not depend on the batch size. A sketch follows.
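A minimal sketch of the rebuild, reusing the hyperparameter names from the question and assuming model is the trained model (the custom initializers can be dropped because set_weights overwrites the weights anyway):
import numpy as np
import tensorflow as tf

new_model = tf.keras.Sequential([
    tf.keras.layers.GRU(units=lstm_num_units,
                        return_sequences=True,
                        input_shape=[seq_len, num_features]),
    tf.keras.layers.LSTM(units=lstm_num_units,
                         return_sequences=True),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(units=dence_units),
    tf.keras.layers.Dropout(drop_flat),
    tf.keras.layers.Dense(units=out_units),
    tf.keras.layers.Softmax(),
])
new_model.set_weights(model.get_weights())  # weight shapes do not depend on batch size

# Predict on a single sequence of shape [1, seq_len, 1]:
single = np.random.randint(98, size=[1, seq_len, 1]).astype(np.float32)
prediction = new_model.predict(single)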