Does Keras prioritize metrics or loss? - tensorflow

I'm struggling to understand how a Keras model works.
When we train a model, we pass metrics (like ['accuracy']) and a loss function (like cross-entropy) as arguments.
What I want to know is which one is the goal the model actually optimizes.
After fitting, does the trained model maximize accuracy, or minimize loss?

The model optimizes the loss; metrics are only there for your information and for reporting results.
https://en.wikipedia.org/wiki/Loss_function
Note that metrics are optional, but you must provide a loss function to do training.
You can also evaluate a model on metrics that were not added during training.
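As a minimal sketch of that split (the model shape and the data names x_train, y_train, x_test, y_test are placeholders, not from the question):

```python
import tensorflow as tf

# Toy classifier; the architecture is illustrative only.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])

model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",  # this is what training minimizes
    metrics=["accuracy"],                    # this is only reported
)
# model.fit(x_train, y_train, epochs=5)

# A metric that was never passed to compile() can still be computed afterwards
# from the model's predictions, e.g. with a stand-alone metric object:
# top3 = tf.keras.metrics.SparseTopKCategoricalAccuracy(k=3)
# top3.update_state(y_test, model.predict(x_test))
# print(top3.result().numpy())
```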

Keras models work by minimizing the loss, adjusting the trainable model parameters via backpropagation. The metrics, such as training accuracy and validation accuracy, are provided as information, but they can also be used to improve your model's performance through Keras callbacks. Documentation for that is located here. For example, the callback ReduceLROnPlateau (documentation is here) can be used to monitor a metric like validation loss and reduce the model's learning rate if the loss fails to decrease after a certain number (the patience parameter) of consecutive epochs.
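A small sketch of wiring that callback into training; the monitor, factor, and patience values here are illustrative, not recommendations:

```python
from tensorflow.keras.callbacks import ReduceLROnPlateau

# Halve the learning rate whenever val_loss fails to improve for 3
# consecutive epochs, but never go below 1e-6.
reduce_lr = ReduceLROnPlateau(
    monitor="val_loss",
    factor=0.5,
    patience=3,
    min_lr=1e-6,
    verbose=1,
)

# model.fit(x_train, y_train,
#           validation_data=(x_val, y_val),
#           epochs=50,
#           callbacks=[reduce_lr])
```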

Related

Neural network classification loss from validation set: Does it update anything dynamically

I'm attempting to study up a bit on the theory of training neural networks, and right now I have gotten to validation sets.
Now, I can understand that a validation set gives us a loss index, which helps us know whether we are overfitting or not. But when I read books and watch videos, everyone seems to express themselves in a manner that is a bit ambiguous.
Does the model update itself when the validation set is being run? Can layers, weights, biases, or the number of neurons be updated "automatically" during validation?
Thank you very much
Loss and accuracy refer to the current loss and accuracy on the training set.
The loss is what backpropagation minimizes over the course of the epoch, and reducing it is what improves the accuracy.
At the end of each epoch, your trained neural network is evaluated against your validation set; this is what validation loss and validation accuracy refer to. The validation pass only evaluates the model, it does not update any weights.
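A minimal sketch of where those per-epoch numbers come from, assuming x_train/y_train and x_val/y_val already exist and the model was compiled with metrics=["accuracy"]:

```python
history = model.fit(
    x_train, y_train,
    validation_data=(x_val, y_val),  # evaluated once per epoch,
                                     # never used for gradient updates
    epochs=20,
)

# Per-epoch training figures vs. validation figures:
print(history.history["loss"], history.history["val_loss"])
print(history.history["accuracy"], history.history["val_accuracy"])
```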

CNN model's train_loss goes down well but val_loss changes a lot

I'm using Keras (TensorFlow) to train my own CNN model.
As shown in the chart, the train_loss goes down well, but the val_loss changes a lot from epoch to epoch.
What could be the reason, and what should I do to improve it?
This is typical behavior when training in deep learning. Think about it: your target loss is the training loss, so it is directly affected by the training process and, as you said, "goes down well". The validation loss is only affected indirectly, so naturally it will be more volatile in comparison.
When you are training, the model is attempting to estimate the real distribution of the data; however, all it has to rely on is the distribution of the training dataset (which is similar, but not the same).
The big spike at the end of your loss curve might be the result of over-fitting. If you are not using a decaying learning rate during training, I would suggest adding one.
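One possible way to add a decaying learning rate in Keras, sketched below; the schedule values are illustrative, not tuned, and `model` is assumed to be your CNN:

```python
import tensorflow as tf

# Exponentially decay the learning rate as training progresses.
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3,
    decay_steps=1000,   # decay a little every 1000 optimizer steps
    decay_rate=0.9,
)

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=lr_schedule),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
```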

TensorFlow model saving to be approached differently during training Vs. deployment?

Assume that I have a CNN which I am training on some dataset. The most important part of the model is the CNN architecture.
Now when I write the code, I define the model structure in a Python class. However, outside that class, I define a number of other nodes such as the loss, accuracy, a tf.Variable to keep count of epochs, and so on.
When I am training, in order to resume training properly, I'd like to save all these nodes (e.g. the loss, the epoch variable, etc.), and not just the CNN structure.
However, once I am done with training, I would like to save only the CNN architecture and no nodes for loss, accuracy etc. This is because it will let people using my model exercise freedom in writing their own fine-tuning code.
How can I achieve this in TF code? Can someone show an example?
Is this approach to saving also followed by others? I just want to know if my approach is right.
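The question is phrased in graph-style TensorFlow terms, but as a hedged sketch of one common way to separate the two concerns in tf.keras (the layer choices and paths are assumptions, not a definitive answer): keep a training checkpoint that bundles optimizer state and bookkeeping variables for resuming, and export only the model itself for deployment.

```python
import tensorflow as tf

# Toy stand-in for the CNN; the real architecture goes here.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(32, 32, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
optimizer = tf.keras.optimizers.Adam()
epoch = tf.Variable(0, dtype=tf.int64)  # bookkeeping variable, as in the question

# Training-time artifact: model weights + optimizer slots + epoch counter.
ckpt = tf.train.Checkpoint(model=model, optimizer=optimizer, epoch=epoch)
manager = tf.train.CheckpointManager(ckpt, "./train_ckpts", max_to_keep=3)
# manager.save()                           # call periodically while training
# ckpt.restore(manager.latest_checkpoint)  # to resume training later

# Deployment-time artifact: architecture + weights only, no training
# bookkeeping, so others can write their own fine-tuning code on top of it.
# model.save("exported_model")
```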

Behavior of Dropout layers in test / training phase

According to the Keras documentation dropout layers show different behaviors in training and test phase:
Note that if your model has a different behavior in training and
testing phase (e.g. if it uses Dropout, BatchNormalization, etc.), you
will need to pass the learning phase flag to your function:
Unfortunately, nobody talks about the actual differences. Why should dropout behave differently in the test phase? I expect the layer to set a certain fraction of neurons to 0. Why should this behavior depend on the training/test phase?
Dropout is used in the training phase to reduce the chance of overfitting. As you mention, this layer deactivates certain neurons. The model becomes less dependent on the weights of any individual node. Basically, with the dropout layer the trained model becomes the average of many thinned models. Check a more detailed explanation here.
However, when you apply your trained model you want to use the full power of the model. You want to use all neurons in the trained (averaged) network to get the highest accuracy.
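A quick sketch of what the learning-phase flag changes, using the TF 2.x Keras API:

```python
import tensorflow as tf

dropout = tf.keras.layers.Dropout(rate=0.5)
x = tf.ones((1, 10))

print(dropout(x, training=True))   # roughly half the units zeroed, the
                                   # survivors scaled up by 1 / (1 - rate)
print(dropout(x, training=False))  # identity: every unit passes through

# model.fit() runs layers with training=True; model.predict() and
# model.evaluate() run them with training=False.
```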

Dropout and data split in model.fit

As we know, dropout is a mechanism that helps control overfitting. In the Keras training process, we can conduct online cross-validation by monitoring the validation loss, setting up the data split in model.fit.
Generally, do I need to use both of these mechanisms? Or, if I set up a data split in model.fit, do I no longer need dropout?
Dropout is a regularization technique, i.e. it prevents the network from overfitting on your data too quickly. The validation loss just gives you an indication of when your network is overfitting. These are two completely different things, and having a validation loss does not help you when your model is overfitting; it just shows you that it is.
I would say that having a validation loss is valuable information to have during training and you should never go without it. Whether you need regularization techniques such as noise, dropout or batch normalization depends on how your network learns. If you see that it overfits then you should attempt to employ regularization techniques.
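A minimal sketch of using both together (layer sizes and the 0.3/0.2 values are illustrative, and x_train/y_train are assumed to exist): Dropout regularizes the network, while validation_split only carves out data so Keras can report val_loss.

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Dense(128, activation="relu", input_shape=(20,)),
    layers.Dropout(0.3),               # regularization against overfitting
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# validation_split holds out 20% of the training data purely for monitoring;
# the dropout layer is what actually fights overfitting.
# model.fit(x_train, y_train, validation_split=0.2, epochs=30)
```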