Regression Loss Function Working Perfectly on My Classification Model - tensorflow

I have built a TensorFlow model that detects what type of shot a table tennis player is performing. Having built the neural network, I believe I am dealing with a multi-label classification problem. Binary cross-entropy and categorical cross-entropy both gave poor loss and accuracy, while MSE and MAE each gave 98% accuracy and a loss of 0.004.
Why is this happening, given that I have supervised learning data with 3 output labels, as shown in the figure below?
The dataset I have collected showing 3 output labels

If your learner has .98 for R squared (if I understand you correctly), it is likely that you're overfitting and will therefore get poor predictions on test data. Prediction errors that low are typically symptomatic of overfitting... but honestly, this is probably a better question for Cross Validated.
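Separately from the overfitting question, it may help to keep in mind that values from different loss functions are not on the same scale, so an MSE of 0.004 is not directly comparable to a cross-entropy value. A minimal sketch in plain Python, using a made-up prediction and one-hot target (both hypothetical, not taken from the dataset above):

```python
import math

# Hypothetical one-hot target and predicted probabilities for 3 classes.
target = [1.0, 0.0, 0.0]
pred = [0.90, 0.05, 0.05]

def mse(y_true, y_pred):
    # Mean of squared differences over the output units.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def categorical_cross_entropy(y_true, y_pred):
    # Negative log-probability assigned to the true class.
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)

mse_value = mse(target, pred)
ce_value = categorical_cross_entropy(target, pred)
print(f"MSE: {mse_value:.4f}")
print(f"Cross-entropy: {ce_value:.4f}")
```

For this one prediction the MSE is about 0.005 while the cross-entropy is about 0.105, so a smaller number under MSE does not by itself mean a better model.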

Related

Tensorflow loss converging but model fails to predict even on train data

I am using an ANN in TensorFlow to learn a simple known function, Y = sin(X) or Y = cos(X). My loss function converges properly.
Loss function convergence graph. If the loss converges, the model should have fitted my training dataset well.
However, when I predict on the training set itself, the model fails to reproduce even the training data, which is strange.
Here it can be seen that after about the 200th value the model shows no sign of having learned anything.
If the loss has converged, the model should fit the training dataset perfectly, but that is not happening here. What is wrong in my code?
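import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

# Training data: 1000 points of sin(x) over five full periods
X = np.linspace(0, 10 * np.pi, 1000)
Y = np.sin(X)
# One wide hidden layer followed by a linear output unit
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(500, input_shape=(1,), activation='relu'))
model.add(tf.keras.layers.Dense(1))
opt = tf.keras.optimizers.Adam(0.01)
model.compile(optimizer=opt, loss='mse')
r = model.fit(X.reshape(-1, 1), Y, epochs=100)
# Training loss per epoch
plt.plot(r.history['loss'])
plt.show()
# Predictions on the training inputs, plotted against the true curve
Yhat = model.predict(X.reshape(-1, 1)).flatten()
plt.plot(Y)
plt.plot(Yhat)
plt.show()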
It is the nature of your data.
It reminds me of the old result showing that a single-layer perceptron cannot compute even XOR.
Anyway, the reason here is that your model is shallow, and shallow networks are much less efficient than deep ones. To put that in perspective, a model like the one below
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(20,input_shape=(1,),activation='relu'))
model.add(tf.keras.layers.Dense(20,activation='relu'))
model.add(tf.keras.layers.Dense(1))
will likely perform better even though it has only about 1/3 of the parameters of the original model, because the deeper you go, the more complex the representations the model can create. The core thing to remember is:
A deep learning model does not build non-linear decision boundaries directly, as each and every unit is fundamentally designed to create a linear decision boundary. So what does it do? By stacking those linear decision boundaries, it builds a representation of the data that is linearly separable.
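The point about stacking linear units can be illustrated with XOR, the classic case that a single linear unit cannot solve. Here is a minimal sketch in plain Python with hand-picked weights (chosen for illustration, not learned): two ReLU hidden units remap the four inputs so that a single linear output unit reproduces XOR.

```python
def relu(z):
    return max(0.0, z)

def xor_net(x1, x2):
    # Hidden layer: two units, each a linear function followed by ReLU.
    h1 = relu(x1 + x2)
    h2 = relu(x1 + x2 - 1.0)
    # Output layer: a purely linear combination of the hidden representation.
    return h1 - 2.0 * h2

outputs = {(a, b): xor_net(a, b) for a in (0, 1) for b in (0, 1)}
for inputs, value in outputs.items():
    print(inputs, value)
```

In the hidden space (h1, h2) the four inputs land on (0, 0), (1, 0), (1, 0) and (2, 1), which a single linear unit can separate.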
Also, the most important thing is to know your data. In this case, probabilistic models will give almost perfect results. You can easily implement those using TensorFlow Probability.
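The "1/3 of the parameters" figure in the answer above can be checked by hand. A small sketch, assuming standard fully connected layers with one bias per unit (the helper name `dense_params` is made up for this example):

```python
def dense_params(n_inputs, n_units):
    # One weight per input per unit, plus one bias per unit.
    return n_inputs * n_units + n_units

# Original model: 1 -> 500 -> 1
shallow = dense_params(1, 500) + dense_params(500, 1)
# Suggested model: 1 -> 20 -> 20 -> 1
deep = dense_params(1, 20) + dense_params(20, 20) + dense_params(20, 1)

print(shallow, deep, round(deep / shallow, 2))
```

This gives 1501 parameters for the wide model and 481 for the deeper one, a ratio of roughly 0.32.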

Overfitting DL model?

I am trying to build a deep learning model to pick out tropical cyclones in weather model data. I have collected the data, normalized it to the range [0, 1], and passed it to my early model. Then I plotted my loss and accuracy curves, shown here. I am getting weird curves: the validation loss starts increasing after ~50 epochs, indicating overfitting, but the validation accuracy is still increasing. Is my model overfitting (at around 50 epochs) or not?
These are classic overfitting curves! You can recognize overfitting because even though training accuracy keeps increasing, validation accuracy does not. There are several ways to prevent overfitting, too numerous to name in one answer. You could apply L1/L2 regularization, dropout, or artificially expand your training data (among others).
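As a rough illustration of what L1 and L2 regularization add to the objective, here is a minimal plain-Python sketch. The weights, the base loss value, and the strength `lam` are all made-up numbers; in Keras the same idea is normally expressed through a layer's `kernel_regularizer` argument rather than written by hand.

```python
def l1_penalty(weights, lam):
    # L1: penalize the sum of absolute weight values.
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    # L2: penalize the sum of squared weight values.
    return lam * sum(w * w for w in weights)

weights = [0.5, -1.5, 2.0]   # hypothetical model weights
base_loss = 0.30             # hypothetical data loss (e.g. cross-entropy)
lam = 0.01                   # hypothetical regularization strength

loss_l1 = base_loss + l1_penalty(weights, lam)
loss_l2 = base_loss + l2_penalty(weights, lam)
print(loss_l1, loss_l2)
```

Because the penalty grows with the size of the weights, the optimizer is pushed toward smaller weights, which tends to limit how closely the model can fit noise in the training set.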

Training Inception V2 from scratch - diverging

As a learning exercise, I'm training the Inception (v2) model from scratch using the ImageNet dataset from the Kaggle competition. I've heard people say it took them a week or so of training on a GPU to converge this model in this same dataset. I'm currently training it on my MacBook Pro (single CPU), so I'm expecting it to converge in no less than a month or so.
Here's my implementation of the Inception model. Input is 224x224x3 images, with values in range [0, 1].
The learning rate was set to a static 0.01 and I'm using the stochastic gradient descent optimizer.
My question
After 48 hours of training, the training loss seems to indicate that it's learning from the training data, but the validation loss is beginning to get worse. Ordinarily, this would feel like the model is overfitting. Does it look like something might be wrong with my model or dataset, or is this perfectly expected, since I've only trained 5.8 epochs?
My training and validation loss and accuracy after 1.5 epochs.
Training and validation loss and accuracy after 5.8 epochs.
Some input images as seen by the model, as well as the output of one of the early convolution layers.

Optimizer and Estimator in Neural Networks

When I started with neural networks, it seemed I understood optimizers and estimators well.
Estimators: a Classifier classifies a value based on the sample set, and a Regressor predicts a value based on the sample set.
Optimizer: different optimizers (Adam, GradientDescentOptimizer) are used to minimise the loss function, which could be complex.
I understand that every estimator comes with a default optimizer internally to minimise the loss.
Now my question is: how do they fit together to optimize the training?
Short answer: the loss function links them together.
For example, if you are doing classification, your classifier takes an input and outputs a prediction. You then calculate the loss from the predicted class and the ground-truth class. The task of the optimizer is to minimize that loss by modifying the parameters of your classifier.
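A minimal sketch of how the three pieces connect, written in plain Python rather than with the TensorFlow Estimator API. The toy data, the two-parameter logistic classifier, and the learning rate are all assumptions made for illustration.

```python
import math

# Toy 1-D binary classification data (hypothetical): label is 1 when x > 0.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]

def predict(w, b, x):
    # The "estimator": maps an input to a predicted probability.
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

def loss(w, b):
    # The loss: mean binary cross-entropy between predictions and labels.
    total = 0.0
    for x, y in zip(xs, ys):
        p = predict(w, b, x)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(xs)

def gradient_step(w, b, lr):
    # The "optimizer": one step of plain gradient descent on the loss above.
    grad_w = sum((predict(w, b, x) - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(predict(w, b, x) - y for x, y in zip(xs, ys)) / len(xs)
    return w - lr * grad_w, b - lr * grad_b

w, b = 0.0, 0.0
initial_loss = loss(w, b)
for _ in range(200):
    w, b = gradient_step(w, b, lr=0.5)
final_loss = loss(w, b)
print(initial_loss, final_loss)
```

The classifier only produces predictions, the loss scores them against the labels, and the optimizer is the only part that changes `w` and `b`.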

Tensorflow: loss decreasing, but accuracy stable

My team is training a CNN in Tensorflow for binary classification of damaged/acceptable parts. We created our code by modifying the cifar10 example code. In my prior experience with Neural Networks, I always trained until the loss was very close to 0 (well below 1). However, we are now evaluating our model with a validation set during training (on a separate GPU), and it seems like the precision stopped increasing after about 6.7k steps, while the loss is still dropping steadily after over 40k steps. Is this due to overfitting? Should we expect to see another spike in accuracy once the loss is very close to zero? The current max accuracy is not acceptable. Should we kill it and keep tuning? What do you recommend? Here is our modified code and graphs of the training process.
https://gist.github.com/justineyster/6226535a8ee3f567e759c2ff2ae3776b
Precision and Loss Images
A decrease in binary cross-entropy loss does not imply an increase in accuracy. Consider label 1, predictions 0.2, 0.4 and 0.6 at timesteps 1, 2 and 3, and a classification threshold of 0.5. Going from timestep 1 to timestep 2 decreases the loss but does not increase accuracy, because both predictions are still below the threshold.
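The example can be worked through numerically. A short plain-Python sketch using the label, predictions, and threshold from the paragraph above (the helper name is made up):

```python
import math

def binary_cross_entropy(label, p):
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

label = 1
threshold = 0.5
predictions = [0.2, 0.4, 0.6]  # model output at timesteps 1, 2, 3

losses = [binary_cross_entropy(label, p) for p in predictions]
correct = [int((p > threshold) == bool(label)) for p in predictions]

for t, (p, l, c) in enumerate(zip(predictions, losses, correct), start=1):
    print(f"timestep {t}: p={p:.1f} loss={l:.3f} correct={c}")
```

The loss falls from about 1.61 to about 0.92 between timesteps 1 and 2 while the prediction stays on the wrong side of the threshold. Accuracy only changes at timestep 3.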
First check that your model has enough capacity by confirming that it can overfit the training data. Once it does, reduce the overfitting with regularization techniques such as dropout, L1 and L2 regularization, and data augmentation.
Last, confirm your validation data and training data come from the same distribution.
Here are my suggestions. One possible problem is that your network has started to memorize the data, so yes, you should increase regularization.
Update:
Here I want to mention one more problem that may cause this:
The class balance in the validation set is far from what you have in the training set. As a first step, I would recommend trying to understand what your test data (the real-world data your model will face at inference time) looks like descriptively: its class balance and other similar characteristics. Then try to build train and validation sets with nearly the same characteristics as the real data.
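One quick way to compare class balance across splits is to compute the fraction of each class in each split. A minimal sketch with made-up label lists (real labels would come from your own train and validation data):

```python
from collections import Counter

def class_ratios(labels):
    # Fraction of samples belonging to each class.
    counts = Counter(labels)
    total = len(labels)
    return {cls: counts[cls] / total for cls in sorted(counts)}

# Hypothetical labels: 1 = damaged, 0 = acceptable.
train_labels = [0] * 80 + [1] * 20
val_labels = [0] * 50 + [1] * 50

train_ratios = class_ratios(train_labels)
val_ratios = class_ratios(val_labels)
print("train:", train_ratios)
print("validation:", val_ratios)
```

A gap like the one in this made-up example (20% positives in training, 50% in validation) would be a reason to rebuild the splits before tuning anything else.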
Well, I faced a similar situation when I used a softmax function in the last layer instead of a sigmoid for binary classification.
My validation loss and training loss were decreasing, but the accuracy of both remained constant. This taught me why sigmoid is used for binary classification.
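One way this can happen (an assumption, since the answer does not show its model) is a final layer with a single unit and a softmax activation. Softmax normalizes across the units of a layer, so with only one unit the output is always 1 regardless of the logit, whereas a sigmoid still varies with it. A plain-Python sketch:

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

logits = [-3.0, 0.0, 2.5]  # hypothetical outputs of a single final unit
softmax_out = [softmax([z])[0] for z in logits]
sigmoid_out = [sigmoid(z) for z in logits]
print(softmax_out)
print(sigmoid_out)
```

A constant output of 1 would account for an accuracy that never moves. It is only one possible cause, so it is worth checking the shape and activation of the last layer before drawing conclusions.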