TensorFlow model inference accuracy dropping with batch size

I trained a DenseNet121-based model on my data and reached the desired accuracy in training. But during prediction with BATCH=1 the accuracy drops badly. I have found that the prediction output depends on the batch size. I get the same accuracy if I keep the batch size the same as during training, but for any other batch size the accuracy is lower. The lower the batch size, the lower the accuracy. Please help, as I need to run predictions on a single image at a time. Below is the model:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.applications import DenseNet121

# max_seq_length and TOTAL_SYMBOLS are defined elsewhere in the poster's code.
def make_model():
    base_model = DenseNet121(include_top=False, weights="imagenet", input_shape=(128, 128, 3), pooling="max")
    inputs = keras.Input(shape=(128, 128, 3))
    output = base_model(inputs, training=True)
    output = tf.keras.layers.Dropout(0.2)(output)
    output = keras.layers.Dense(units=max_seq_length * TOTAL_SYMBOLS)(output)
    output = keras.layers.Reshape((max_seq_length, TOTAL_SYMBOLS))(output)
    model = keras.Model(inputs, output)
    return model

model = make_model()

This is not an answer, but I found a way to solve the problem. I built the DenseNet121 network from scratch and used that to train my model, and everything worked fine. I suspect there are some optimizations in keras.applications.YOUR_MODEL, or in the preprocessing that Keras provides for it, which are creating this problem. The optimizations seem to be batch dependent.
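One possible cause, which the thread does not confirm, is the `training=True` argument passed to `base_model` in the question. In Keras, that argument keeps the BatchNormalization layers inside the base model in training mode, so they normalize with the statistics of the current batch instead of the moving averages learned during training. The sketch below is a simplified, single-feature batch norm in plain Python (not the Keras implementation; the moving statistics are made-up values) that shows why the same sample then produces different outputs at different batch sizes.

```python
import math


def batch_norm(batch, moving_mean, moving_var, training, eps=1e-3):
    """Normalize a batch of scalar activations (one feature, for simplicity)."""
    if training:
        # Training mode: statistics come from the current batch.
        mean = sum(batch) / len(batch)
        var = sum((x - mean) ** 2 for x in batch) / len(batch)
    else:
        # Inference mode: fixed statistics accumulated during training.
        mean, var = moving_mean, moving_var
    return [(x - mean) / math.sqrt(var + eps) for x in batch]


# Hypothetical moving statistics, as if accumulated during training.
MOVING_MEAN, MOVING_VAR = 2.0, 4.0

sample = 5.0
full_batch = [sample, 1.0, 0.0, 2.0]

# Training mode: the output for `sample` depends on what else is in the batch.
in_batch = batch_norm(full_batch, MOVING_MEAN, MOVING_VAR, training=True)[0]
alone = batch_norm([sample], MOVING_MEAN, MOVING_VAR, training=True)[0]

# Inference mode: the output for `sample` ignores the rest of the batch.
in_batch_inf = batch_norm(full_batch, MOVING_MEAN, MOVING_VAR, training=False)[0]
alone_inf = batch_norm([sample], MOVING_MEAN, MOVING_VAR, training=False)[0]

print("training mode: ", in_batch, alone)
print("inference mode:", in_batch_inf, alone_inf)
```

In this toy version a batch of one is normalized against itself and collapses to zero. Convolutional batch norm also averages over spatial positions, so the effect in a real network is less extreme but still batch dependent. If this is what happens in the question, calling the base model with `training=False`, as the Keras transfer-learning guide does, would be worth trying before rebuilding the network from scratch.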

Related

My model fit too slow, tringle of val_loss is 90

I have a task to write a neural network with 9 input neurons and 4 output neurons for a multiclass classification problem. I have tried different models, and in all of them:
Drop-out mechanism is used.
Batch normalization is used.
All the resulting networks appear to overfit. Precision is below 80%, and I want at least 90%. The median loss is 0.8.
Can you suggest which model I should use?
Dataset:
TMS_coefficients.RData file
Part of my code:
import tensorflow as tf
from sklearn.model_selection import train_test_split
from tensorflow.keras import layers

(trainX, testX, trainY, testY) = train_test_split(dataset,
    values, test_size=0.25, random_state=42)
# neural network model
visible = layers.Input(shape=(9,))
hidden0 = layers.Dense(64, activation="tanh")(visible)
batch0 = layers.BatchNormalization()(hidden0)
drop0 = layers.Dropout(0.3)(batch0)
hidden1 = layers.Dense(32, activation="tanh")(drop0)
batch1 = layers.BatchNormalization()(hidden1)
drop1 = layers.Dropout(0.2)(batch1)
hidden2 = layers.Dense(128, activation="tanh")(drop1)
batch2 = layers.BatchNormalization()(hidden2)
drop2 = layers.Dropout(0.5)(batch2)
hidden3 = layers.Dense(64, activation="tanh")(drop2)
batch3 = layers.BatchNormalization()(hidden3)
output = layers.Dense(4, activation="softmax")(batch3)
model = tf.keras.Model(inputs=visible, outputs=output)
model.compile(optimizer=tf.keras.optimizers.Adam(0.0001),
              loss='categorical_crossentropy',
              metrics=['Precision'])
history = model.fit(trainX, trainY, validation_data=(testX, testY), epochs=5000, batch_size=256)
Judging from the loss curve, the model is not overfitting at all. In fact, it is underfitting. Why? Because when you stopped training, the validation loss curve had not yet flattened. That means your model could still improve if it were trained longer.
A model overfits when the training loss keeps decreasing (or stays the same) while the validation loss gradually increases. This is clearly not the case here.
So, here is what you can do:
Try training longer.
Add more layers.
Try different activation functions, such as ReLU instead of tanh.
Use lower dropout (your model is probably struggling to learn with such high dropout values).
Make sure you shuffle your data before the train-test split (sklearn's train_test_split() does this by default), and check that the test data is similar to the training data and that both go through the same preprocessing steps.
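The distinction drawn above between overfitting and underfitting can be turned into a rough check on the recorded loss curves. The sketch below is only a heuristic: the function name, the `window` and `tol` parameters, and the example curves are all made up for illustration. With Keras, the two lists would come from `history.history['loss']` and `history.history['val_loss']`.

```python
def diagnose(train_loss, val_loss, window=3, tol=1e-3):
    """Label a run from the loss trend over the last `window` epochs.

    A heuristic only: `window` and `tol` are arbitrary illustrative choices.
    """
    def trend(values):
        # Positive means the loss rose over the window, negative means it fell.
        return values[-1] - values[-window]

    train_trend = trend(train_loss)
    val_trend = trend(val_loss)

    if val_trend > tol and train_trend <= tol:
        # Validation loss rises while training loss falls or stays flat.
        return "overfitting"
    if val_trend < -tol:
        # Validation loss is still falling: training longer may help.
        return "still improving"
    return "plateaued"


# Made-up curves for illustration.
still_learning = diagnose([1.0, 0.8, 0.6, 0.5, 0.4], [1.1, 0.95, 0.85, 0.8, 0.76])
overfit = diagnose([1.0, 0.6, 0.4, 0.3, 0.2], [1.0, 0.8, 0.85, 0.9, 1.0])
flat = diagnose([0.5, 0.5, 0.5], [0.6, 0.6, 0.6])

print(still_learning, overfit, flat)
```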

How to reduce the size of neural network model file in keras? To deploy to OpenMV

I need to train an image classification model with 15 categories. Because it must be deployed to OpenMV, the model size cannot exceed 1 MB; otherwise OpenMV cannot load it. I used the pretrained MobileNetV2 model. As in the example on the Keras website, I placed the convolutional base of MobileNetV2 at the bottom of my model and added a softmax-activated dense layer on top. Training works well and the accuracy has reached 90%, but the exported H5 model is 3 MB.
I also tried an easy-to-use transfer learning website: https://studio.edgeimpulse.com/
The classification model exported from that website is only 600 KB (after INT8 quantization). What causes my model to be so large?
Here is the structure of my network:
def mobile_net_v2(data_augmentation, input_shape):
    base_model = keras.applications.mobilenet_v2.MobileNetV2(
        weights='imagenet',
        input_shape=input_shape,
        alpha=0.35,
        include_top=False)
    inputs = keras.Input(shape=input_shape)
    base_model.trainable = False
    x = data_augmentation(inputs)
    x = layers.Rescaling(1. / 255)(x)
    x = base_model(x)
    x = layers.Flatten()(x)
    outputs = layers.Dense(15, activation='softmax')(x)
    return keras.Model(inputs, outputs)
The input size is (96, 96)
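The thread gives no answer here, so the following is only a back-of-the-envelope sketch of two factors that may account for part of the gap. First, an H5 file stores float32 weights (4 bytes each), whereas an INT8-quantized model stores 1 byte per weight. Second, `Flatten` feeds every spatial position into the dense layer. The arithmetic assumes that the MobileNetV2 base reduces a 96x96 input by a factor of 32 and outputs 1280 channels; the helper names are made up.

```python
def dense_params(inputs, units):
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units


def size_kib(params, bytes_per_param):
    """Storage needed for `params` parameters, in KiB."""
    return params * bytes_per_param / 1024


# Assumptions: 96x96 input, overall stride 32, 1280 output channels.
side = 96 // 32                               # 3x3 feature map
channels = 1280
flatten_features = side * side * channels     # inputs to Dense after Flatten
pooled_features = channels                    # inputs to Dense after global pooling

flatten_head = dense_params(flatten_features, 15)
pooled_head = dense_params(pooled_features, 15)

print("Flatten head params:", flatten_head)
print("Pooled head params: ", pooled_head)
print("Flatten head, float32 KiB:", size_kib(flatten_head, 4))
print("Flatten head, int8 KiB:   ", size_kib(flatten_head, 1))
print("Pooled head, float32 KiB: ", size_kib(pooled_head, 4))
```

Under these assumptions, replacing `layers.Flatten()` with `layers.GlobalAveragePooling2D()` would shrink the classification head roughly ninefold, and quantizing the whole model to INT8 would cut the stored weights to about a quarter of their float32 size. Whether that is enough to fit under the OpenMV limit would have to be measured on the exported file.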

How can I properly train a model to predict a moving average using LSTM in keras?

I'm learning how to train an RNN model in Keras, and I expected that training a model to predict the moving average of the last N steps would be quite easy.
I have a time series with thousands of steps and I'm able to create a model and train it with batches of data.
If I train it with the following model though, the test set predictions differ a lot from real values. (batch = 30, moving average window = 10)
inputs = tf.keras.Input(shape=(batch_length, num_features))
x = tf.keras.layers.LSTM(10, return_sequences=False)(inputs)
outputs = tf.keras.layers.Dense(num_labels)(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs, name="test_model")
To get good predictions, I need to add a TimeDistributed layer, which gives 2D predictions instead of 1D ones (one prediction per time step):
inputs = tf.keras.Input(shape=(batch_length, num_features))
x = tf.keras.layers.LSTM(10, return_sequences=True)(inputs)
x = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(num_labels))(x)
outputs = tf.keras.layers.Dense(num_labels)(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs, name="test_model")
If your goal is to feed in the last 10 timesteps and predict their moving average, I suggest trying a regressor built from densely connected layers rather than an RNN. (A linear activation with regularization might work well enough.)
That option would be cheaper to train and run than an LSTM.
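The reason a dense regressor is a natural fit is that a moving average is itself a linear function of the window: a single linear unit with every weight equal to 1/N and zero bias reproduces it exactly. The sketch below checks this in plain Python. It assumes the target is the mean of the same window that is given as input; the series and helper names are made up.

```python
def moving_average_targets(series, window):
    """Return (inputs, targets); each input is `window` consecutive values."""
    inputs = [series[i:i + window] for i in range(len(series) - window + 1)]
    targets = [sum(w) / window for w in inputs]
    return inputs, targets


def linear_predict(x, weights, bias=0.0):
    """Output of a single linear unit (a Dense(1) layer with no activation)."""
    return sum(xi * wi for xi, wi in zip(x, weights)) + bias


window = 10
series = [float((7 * i) % 13) for i in range(40)]   # arbitrary toy series
inputs, targets = moving_average_targets(series, window)

# The weights a trained linear regressor would ideally converge to.
weights = [1.0 / window] * window
predictions = [linear_predict(x, weights) for x in inputs]

max_error = max(abs(p - t) for p, t in zip(predictions, targets))
print("windows:", len(inputs), "max error:", max_error)
```

An LSTM has to discover the same averaging through its gates and nonlinearities, which is a harder optimization problem for no gain on this particular task.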

How to avoid overfitting in CNN?

I'm building a model that predicts people's age by analyzing their face. I'm using this pretrained model, and I wrote a custom loss function and a custom metric. I get decent results, but I want to improve them. In particular, I noticed that after some epochs the model begins to overfit the training set and the val_loss increases. How can I avoid this? I'm already using Dropout, but it doesn't seem to be enough.
I think maybe I should use L1 and L2 regularization, but I don't know how.
def resnet_model():
    model = VGGFace(model='resnet50')  # model: {resnet50, vgg16, senet50}
    xl = model.get_layer('avg_pool').output
    x = keras.layers.Flatten(name='flatten')(xl)
    x = keras.layers.Dense(4096, activation='relu')(x)
    x = keras.layers.Dropout(0.5)(x)
    x = keras.layers.Dense(4096, activation='relu')(x)
    x = keras.layers.Dropout(0.5)(x)
    x = keras.layers.Dense(11, activation='softmax', name='predictions')(x)
    model = keras.engine.Model(model.input, outputs=x)
    return model

model = resnet_model()
initial_learning_rate = 0.0003
epochs = 20
batch_size = 110
num_steps = train_x.shape[0] // batch_size
learning_rate_fn = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    [3*num_steps, 10*num_steps, 16*num_steps, 25*num_steps],
    [1e-4, 1e-5, 1e-6, 1e-7, 5e-7]
)
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate_fn)
model.compile(loss=custom_loss, optimizer=optimizer, metrics=['accuracy', one_off_accuracy])
model.fit(train_x, train_y, epochs=epochs, batch_size=batch_size, validation_data=(test_x, test_y))
This is an example of result:
There are many regularization methods to help you avoid overfitting your model:
Dropouts:
Randomly disables neurons during training, forcing the other neurons to learn as well.
L1/L2 penalties:
Penalize large weights. L2 discourages the model from relying heavily on a few parameters, while L1 pushes some weights to exactly zero.
Random Gaussian Noise at the inputs:
Adds random Gaussian noise to the inputs: x = x + r, where r is drawn from a zero-mean normal distribution. This makes it harder for the model to memorize your dataset, because each input is slightly different in every epoch.
Label Smoothing:
Instead of saying that a target is 0 or 1, you can smooth those values (e.g. 0.1 and 0.9).
Early Stopping:
This is a common technique for avoiding training your model for too long. If you notice that the training loss keeps decreasing while the validation accuracy is also decreasing, that is a good sign to stop training, as your model is beginning to overfit.
K-Fold Cross-Validation:
This is a very strong technique: the model is trained and validated on different splits of the data, so it is not fed the same inputs all the time and overfitting becomes easier to detect.
Data Augmentations:
By rotating/shifting/zooming/flipping/padding etc. an image, you force the model to learn more robust parameters rather than overfit to the existing dataset.
There are certainly more techniques to avoid overfitting. This repository contains many examples of how the above techniques are applied to a dataset:
https://github.com/kochlisGit/Tensorflow-State-of-the-Art-Neural-Networks
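Two of the techniques listed above, label smoothing and early stopping, are simple enough to sketch in plain Python. The helper names, the smoothing factor, the patience value and the example losses are all illustrative choices, not values from the question.

```python
def smooth_labels(one_hot, smoothing=0.1):
    """Blend a one-hot target with the uniform distribution over classes."""
    num_classes = len(one_hot)
    return [y * (1.0 - smoothing) + smoothing / num_classes for y in one_hot]


def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return the epoch index at which training would stop, or None.

    Training stops once the validation loss has failed to improve on its
    best value by more than `min_delta` for `patience` consecutive epochs.
    """
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return None


smoothed = smooth_labels([0.0, 0.0, 1.0, 0.0], smoothing=0.1)
stop_at = early_stopping_epoch([1.0, 0.8, 0.7, 0.72, 0.75, 0.9], patience=2)
never = early_stopping_epoch([1.0, 0.9, 0.8, 0.7], patience=2)

print(smoothed)
print(stop_at, never)
```

In Keras itself, the corresponding built-ins are `tf.keras.callbacks.EarlyStopping` (passed to `model.fit` through `callbacks=`) and the `label_smoothing` argument of `tf.keras.losses.CategoricalCrossentropy`. For the L1/L2 penalties the question asks about, `Dense` layers accept a `kernel_regularizer` argument such as `tf.keras.regularizers.l2(1e-4)`; the coefficient is a starting guess that needs tuning.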
You can try incorporating image augmentation into your training, which increases the "sample size" of your data as well as its "diversity", as @Suraj S Jain mentioned. The official tutorial is here: https://www.tensorflow.org/tutorials/images/data_augmentation

Recurrent Neural Network Mini-Batch dependency after trained

Currently, I have a neural network, built in TensorFlow, that is used to classify time sequence data into one of 6 categories. The network is composed of:
2 fully connected layers -> LSTM unit -> softmax -> output
All layers have regularization in the form of dropout and/or layer normalization. To speed up training, I am mini-batching the data, with mini-batch size = number of categories = 6. Each mini-batch contains exactly one sample for each of the 6 categories, arranged randomly. Below is the feed-forward code, where x has shape [batch_size, number of time steps, number of features], and the various get functions are simple definitions that create standard fully connected layers and LSTM units with regularization.
def getFullyConnected(input, hidden, dropout, layer, phase):
    weight = tf.Variable(tf.random_normal([input.shape.dims[1].value, hidden]), name="weight_layer"+str(layer))
    bias = tf.Variable(tf.random_normal([1]), name="bias_layer"+str(layer))
    layer = tf.add(tf.matmul(input, weight), bias)
    layer = tf.contrib.layers.batch_norm(layer,
                                         center=True, scale=True,
                                         is_training=phase)
    layer = tf.minimum(tf.nn.relu(layer), FLAGS.relu_clip)
    layer = tf.nn.dropout(layer, (1.0 - dropout))
    return layer

def RNN(x, weights, biases, time_steps):
    # shape the input as [batch_size*time_steps, input_depth]
    x = tf.reshape(x, [-1, input_depth])
    layer1 = getFullyConnected(input=x, hidden=16, dropout=full_drop, layer=1, phase=True)
    layer2 = getFullyConnected(input=layer1, hidden=input_depth*3, dropout=full_drop, layer=2, phase=True)
    rnn_input = tf.reshape(layer2, [-1, time_steps, input_depth*3])
    # 1-layer LSTM with n_hidden units.
    LSTM_cell = getLSTMcell(n_hidden)
    # generate prediction
    outputs, state = tf.nn.dynamic_rnn(LSTM_cell,
                                       rnn_input,
                                       dtype=tf.float32,
                                       time_major=False)
    # good old tensorboard saves
    tf.summary.histogram('weight', weights['out'])
    tf.summary.histogram('bias', biases['out'])
    # there are time_steps outputs, but only grab the last output for the classification
    return tf.sigmoid(tf.matmul(outputs[:, -1, :], weights['out']) + biases['out'])
Surprisingly, this network trained extremely well, giving me about 99.75% accuracy on my test data (which the trained network had never seen). However, it only scored this high when I fed the data into the network with the same mini-batch size as during training, 6. If I fed the data one sample at a time (mini-batch size = 1), the network scored around 60%. What is weird is that if I train the network with single samples only (mini-batch size = 1), the trained network works perfectly fine with high accuracy. This leads me to the odd conclusion that the network is almost learning to use the batch size, so much so that it depends on the mini-batch to classify correctly.
Is it a thing for a deep network to become so dependent on the mini-batch size during training that the final trained network requires input data with the same mini-batch size just to perform correctly?
Any ideas or thoughts would be appreciated!
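A possible explanation, not confirmed in the thread, lies in the code above: both calls to `getFullyConnected` pass `phase=True`, so `batch_norm` always runs with `is_training=True` and normalizes each sample against the other samples in its mini-batch, including at test time. With one sample per class in every batch, those batch statistics carry information that a batch of one cannot supply. The sketch below is a simplified single-feature batch norm in plain Python (the class name, momentum and data are made up) that keeps moving statistics during training and uses them at inference, which makes batched and one-at-a-time predictions agree.

```python
import math


class RunningBatchNorm:
    """Single-feature batch norm that tracks moving statistics."""

    def __init__(self, momentum=0.9, eps=1e-3):
        self.momentum = momentum
        self.eps = eps
        self.moving_mean = 0.0
        self.moving_var = 1.0

    def __call__(self, batch, is_training):
        if is_training:
            # Normalize with the current batch and update the moving averages.
            mean = sum(batch) / len(batch)
            var = sum((x - mean) ** 2 for x in batch) / len(batch)
            m = self.momentum
            self.moving_mean = m * self.moving_mean + (1 - m) * mean
            self.moving_var = m * self.moving_var + (1 - m) * var
        else:
            # Normalize with the statistics accumulated during training.
            mean, var = self.moving_mean, self.moving_var
        return [(x - mean) / math.sqrt(var + self.eps) for x in batch]


bn = RunningBatchNorm()
train_batch = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]   # toy mini-batch of size 6
for _ in range(200):
    bn(train_batch, is_training=True)

# Inference mode: a batch of 6 and six batches of 1 give the same outputs.
batched = bn(train_batch, is_training=False)
single = [bn([x], is_training=False)[0] for x in train_batch]

# Training mode with batches of 1: every sample is normalized against itself.
single_train_mode = [bn([x], is_training=True)[0] for x in train_batch]

print(batched == single)
print(single_train_mode)
```

In the original TF1 code this would correspond to feeding `phase` from a placeholder that is `True` during training and `False` during evaluation, and making sure the moving-average update ops collected under `tf.GraphKeys.UPDATE_OPS` run with each training step. The dropout layers are also applied unconditionally in the code above and would normally be disabled at test time as well.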