Approximating a smooth multidimensional function using Keras to an error of 1e-4 - tensorflow

I am trying to approximate a function that smoothly maps five inputs to a single probability using Keras, but seem to have hit a limit. A similar problem was posed here (Keras Regression to approximate function (goal: loss < 1e-7)) for a ten-dimensional function and I have found that the architecture proposed there, namely:
model = Sequential()
model.add(Dense(128,input_shape=(5,), activation='tanh'))
model.add(Dense(64,activation='tanh'))
model.add(Dense(1,activation='sigmoid'))
model.compile(optimizer='adam', loss='mae')
gives me my best results, converging to a best loss of around 7e-4 on my validation data when the batch size is 1000. Adding or removing more neurons or layers seems to reduce the accuracy. Dropout regularisation also reduces accuracy. I am currently using 1e7 training samples, which took two days to generate (hence the desire to approximate this function). I would like to reduce the mae by another order of magnitude, does anyone have any suggestions how to do this?

I recommend use utilize the keras callbacks ReduceLROnPlateau, documentation is [here][1] and ModelCheckpoint, documentation is [here.][2]. For the first, set it to monitory validation loss and it will reduce the learning rate by a factor(factor) if the loss fails to reduce after a fixed number (patience) of consecutive epochs. For the second also monitor validation loss and save the weights for the model with the lowest validation loss to a directory. After training load the weights and use them to evaluate or predict on the test set. My code implementation is shown below.
checkpoint=tf.keras.callbacks.ModelCheckpoint(filepath=save_loc, monitor='val_loss', verbose=1, save_best_only=True,
save_weights_only=True, mode='auto', save_freq='epoch', options=None)
lr_adjust=tf.keras.callbacks.ReduceLROnPlateau( monitor="val_loss", factor=0.5, patience=1, verbose=1, mode="auto",
min_delta=0.00001, cooldown=0, min_lr=0)
callbacks=[checkpoint, lr_adjust]
history = model.fit_generator( train_generator, epochs=EPOCHS,
steps_per_epoch=STEPS_PER_EPOCH,validation_data=validation_generator,
validation_steps=VALIDATION_STEPS, callbacks=callbacks)
model.load_weights(save_loc) # load the saved weights
# after this use the model to evaluate or predict on the test set.
# if you are satisfied with the results you can then save the entire model with
model.save(save_loc)
[1]: https://keras.io/api/callbacks/reduce_lr_on_plateau/
[2]: https://keras.io/api/callbacks/model_checkpoint/

Related

Validation loss reported seems to be wrong, can preprocessing be the reason?

I'm training a resnet model with Keras, fine tuned on my own images. While training, Tensorboard is constantly reporting a validation loss that seems unrelated to training loss (much higher, see image below where train is orange line and validation blue line). Furthermore when training is finished (for example final losses as reported by Tensorboard could be respectively 0.06 and 0.57) I evaluate the model "manually" and validation loss seems to be in the same range of training loss (ex:0.07).
I suspect that preprocessing could be the reason of this strange result. Essentially the inputs and the outputs of the model are created like this:
inp = tf.keras.Input(input_shape)
resnet = tf.keras.applications.ResNet50V2(include_top=False, input_shape=input_shape, input_tensor=inp,pooling="avg")
# Add ResNet50V2 specific preprocessing method into the model.
preprocessed = tf.keras.layers.Lambda(lambda x: tf.keras.applications.resnet_v2.preprocess_input(x))(inp)
out = resnet(preprocessed)
out = tf.keras.layers.Dense(num_outputs, activation=None)(out)
and the training :
model.compile(
optimizer=tf.keras.optimizers.Adam(lrate),
loss='mse',
metrics=[tf.keras.metrics.MeanSquaredError()],
)
model.fit(
train_dataset,
epochs=epochs,
validation_data=val_dataset,
callbacks=callbacks
)
It's like if preprocessing does not occur when validation loss is calculated but I don't know why.

Is validation curve slight greater or lower in CNN models good?

Can you tell me which one among the two is a good validation vs train plot?
Both of them are trained with same keras sequential layers, but the second one is trained using more number of samples, i.e. augmented the dataset.
I'm a little bit confused about the zigzags in the first plot, otherwise I think it is better than the second.
In the second plot, there are no zigzags but the validation accuracy tends to be a little high than train, is it overfitting or considerable?
It is an image detection model where the first model's dataset size is 5170 and the second had 9743 samples.
The convolutional layers defined for the model building:
tf.keras.layers.Conv2D(128,(3,3), activation = 'relu', input_shape = (150,150,3)),
tf.keras.layers.MaxPool2D(2,2),
tf.keras.layers.Conv2D(64,(3,3), activation = 'relu'),
tf.keras.layers.MaxPool2D(2,2),
tf.keras.layers.Conv2D(32,(3,3), activation = 'relu'),
tf.keras.layers.MaxPool2D(2,2),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(512,activation='relu'),
tf.keras.layers.Dropout(0.5),
tf.keras.layers.Dense(128,activation='relu'),
tf.keras.layers.Dropout(0.25),
tf.keras.layers.Dense(1,activation='sigmoid')
Can the model be improved?
From the graphs the second graph where you have more samples is better. The reason is with more samples the model is trained on a much wider probability distribution of images. So when validation is run you have a better chance of correctly classifying the image. You have a lot of dropout in your model. This is good to prevent over fitting, however it will lower the training accuracy relative to the validation accuracy. Your model seems to be doing well. It might improve if you add additional convolution- max pooling layers. Alternative of course is to use transfer learning. I would recommend efficientnetb3. I also recommend using an adjustable learning rate. The Keras callback ReduceLROnPlateau works well for that purpose. Documentation is here.. Code below shows my recommended settings.
rlronp=tf.keras.callbacks.ReduceLROnPlateau(
monitor='val_loss',
factor=0.5,
patience=2,
verbose=1,
mode='auto'
)
in model.fit include callbacks=[rlronp]

Why my validation loss is smaller than the training loss?

I am consistently getting a higher training loss than the validation loss while training a deep convolution autoencoder. Notice in my train data generator, I am doing data augmentation with Keras zoom_range. If I raise the zoom range like [0.8-4], [0.8,6] etc the gap between training and validation loss keeps increasing.
Is it because training loss is calculated on augmented data? Assuming more augmentation makes it harder for the model to predict(reconstruct) the input image. Or something wrong with my training method? I have attached my code snippet for the training command as well.
checkpoint = ModelCheckpoint(model_save_dir, monitor='val_loss', save_best_only=False, mode='min')
callbacks_list = [checkpoint]
history = model.fit(train_generator, validation_data=val_generator, epochs=n_epochs, shuffle=True, callbacks=callbacks_list)
It looks like your train loss increase as you are increasing data augmentation effect and basically this is because it becomes harder for the model to learn pattern with too much data augmentation.
In my point of view the goal of data augmentation is to make realistic change from initial data to improve the model's robustness like a regularization technique.
However the loss of validation remains the same so I presume the efficiency of the learning phase is not impaired so much. I will have made sure that the distribution of the labels is homogenous and the data from train/val is stratified. I will also have made a test set (without any data augmentation such as the validation set) to make comparaison more valuable.

How to build a Neural Network in Keras using a custom loss function with datapoint-specific weight?

I want to train a Neural Network for a classification task in Keras using a TensorFlow backend with a custom loss function. In my loss, I want to give different weights to different training examples. I have some datapoints I consider important and some I do not consider as important. I want my loss function to take this into account and punish errors in important examples more than in less important ones.
I have already built my model:
input = tf.keras.Input(shape=(16,))
hidden_layer_1 = tf.keras.layers.Dense(5, kernel_initializer='glorot_uniform', activation='relu')(input)
output = tf.keras.layers.Dense(1, kernel_initializer='normal', activation='softmax')(hidden_layer_1)
model = tf.keras.Model(input, output)
model.compile(loss=custom_loss(input), optimizer='adam', run_eagerly=True, metrics = [tf.keras.metrics.Accuracy(), 'acc'])
and the currrent state of my loss function is:
def custom_loss(input):
def loss(y_true, y_pred):
return ...
return loss
I'm struggling with implementing the loss function in the way I explained above, mainly because I don't exactly know what input, y_pred and y_true are (KerasTensors, I know - but what is the content? And is it for one training example only or for the whole batch?). I'd appreciate help with
printing out the values of input, y_true and y_pred
converting the input value to a numpy ndarray ([1,3,7] for example) so I can use the array to look up my weight for this specific training data point
once I have my weigth as a number (0.5 for example), how do I implement the computation of the loss function in Keras? My loss for one training exaple should be 0 if the classification was correct and weight if it was incorrect.

Tensorflow Polynomial Linear Regression curve fit

I have created this Linear regression model using Tensorflow (Keras). However, I am not getting good results and my model is trying to fit the points around a linear line. I believe fitting points around degree 'n' polynomial can give better results. I have looked googled how to change my model to polynomial linear regression using Tensorflow Keras, but could not find a good resource. Any recommendation on how to improve the prediction?
I have a large dataset. Shuffled it first and then spited to 80% training and 20% Testing. Also dataset is normalized.
1) Building model:
def build_model():
model = keras.Sequential()
model.add(keras.layers.Dense(units=300, input_dim=32))
model.add(keras.layers.Activation('sigmoid'))
model.add(keras.layers.Dense(units=250))
model.add(keras.layers.Activation('tanh'))
model.add(keras.layers.Dense(units=200))
model.add(keras.layers.Activation('tanh'))
model.add(keras.layers.Dense(units=150))
model.add(keras.layers.Activation('tanh'))
model.add(keras.layers.Dense(units=100))
model.add(keras.layers.Activation('tanh'))
model.add(keras.layers.Dense(units=50))
model.add(keras.layers.Activation('linear'))
model.add(keras.layers.Dense(units=1))
#sigmoid tanh softmax relu
optimizer = tf.train.RMSPropOptimizer(0.001,
decay=0.9,
momentum=0.0,
epsilon=1e-10,
use_locking=False,
centered=False,
name='RMSProp')
#optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
model.compile(loss='mse',
optimizer=optimizer,
metrics=['mae'])
return model
model = build_model()
model.summary()
2) Train the model:
class PrintDot(keras.callbacks.Callback):
def on_epoch_end(self, epoch, logs):
if epoch % 100 == 0: print('')
print('.', end='')
EPOCHS = 500
# Store training stats
history = model.fit(train_data, train_labels, epochs=EPOCHS,
validation_split=0.2, verbose=1,
callbacks=[PrintDot()])
3) plot Train loss and val loss
enter image description here
4) Stop When results does not get improved
enter image description here
5) Evaluate the result
[loss, mae] = model.evaluate(test_data, test_labels, verbose=0)
#Testing set Mean Abs Error: 1.9020842795676374
6) Predict:
test_predictions = model.predict(test_data).flatten()
enter image description here
7) Prediction error:
enter image description here
Polynomial regression is a linear regression with some extra additional input features which are the polynomial functions of original input features.
i.e.;
let the original input features are : (x1,x2,x3,...)
Generate a set of polynomial functions by adding some transformations of the original features, for example: (x12, x23, x13x2,...).
One may decide which all functions are to be included depending on their constraints such as intuition on correlation to the target values, computational resources, and training time.
Append these new features to the original input feature vector. Now the transformed input feature vector has a size of len(x1,x2,x3,...) + len(x12, x23, x13x2,...)
Further, this updated set of input features (x1,x2,x3,x12, x23, x13x2,...) is feeded into the normal linear regression model. ANN's architecture may be tuned again to get the best trained model.
PS: I see that your network is huge while the number of inputs is only 32 - this is not a common scale of architecture. Even in this particular linear model, reducing the hidden layers to one or two hidden layers may help in training better models (It's a suggestion with an assumption that this particular dataset is similar to other generally seen regression datasets)
I've actually created polynomial layers for Tensorflow 2.0, though these may not be exactly what you are looking for. If they are, you could use those layers directly or follow the procedure used there to create a more general layer https://github.com/jloveric/piecewise-polynomial-layers