I am currently training a TensorFlow-based image classification algorithm on 3 classes. Two of the classes are relatively similar, so the model is having a hard time telling them apart, and I am currently using trial and error to see if I can improve the results.
I have encountered the following issue, though: is this just a sign that it is time to train for far fewer epochs?
My validation loss also seems to be mostly noise; is there a way to improve this?
I have tried data augmentation and increasing the number of layers, and I am currently experimenting with different dropout rates (a rough sketch is below).
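For reference, a rough sketch of the kind of setup I am experimenting with is shown here; the augmentation settings, layer sizes, and the 0.3 dropout rate are illustrative placeholders (assuming a recent tf.keras), not my exact values:

    import tensorflow as tf

    # Illustrative only: augmentation settings, layer sizes and the dropout
    # rate are placeholders, not the exact values from my model.
    data_augmentation = tf.keras.Sequential([
        tf.keras.layers.RandomFlip("horizontal"),
        tf.keras.layers.RandomRotation(0.1),
        tf.keras.layers.RandomZoom(0.1),
    ])

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(128, 128, 3)),
        data_augmentation,                        # only active during training
        tf.keras.layers.Rescaling(1.0 / 255),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dropout(0.3),             # the dropout rate I am tuning
        tf.keras.layers.Dense(3, activation="softmax"),  # 3 classes
    ])

    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    # Early stopping is what I mean by cutting down the epochs: stop once the
    # validation loss stops improving and keep the best weights.
    early_stop = tf.keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=5, restore_best_weights=True)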
Any help would be much appreciated.
I am new to deep learning, and I am trying to implement an RNN (with 2 GRU layers).
At first, the network seems to do its job quite well. However, I am currently trying to understand the loss and accuracy curves. I attached the pictures below. The dark-blue line is the training set and the cyan line is the validation set.
After 50 epochs the validation loss increases. My assumption is that this indicates overfitting. However, I am unsure why the validation mean absolute error still decreases. Do you maybe have an idea?
One idea I had in mind was that this could be caused by some big outliers in my dataset, so I have already tried to clean it up. I have also tried to scale the data properly, and I added a few dropout layers for further regularization (rate=0.2). However, these are just normal dropout layers, because cuDNN does not seem to support recurrent_dropout in TensorFlow.
Remark: I am using the negative log-likelihood as the loss function and a TensorFlow Probability distribution layer as the output layer.
Any hints on what I should investigate?
Thanks in advance
Edit: I also attached the non-probabilistic plot, as recommended in the comments. It seems like the mean absolute error behaves normally here (it does not improve all the time).
What are the outputs of your model? It sounds pretty strange that you're using the negative log-likelihood (which basically "works" with distributions) as the loss function but the MAE as a metric, which is suited for deterministic continuous values.
I don't know what your task is, and perhaps this combination is meaningful in your specific case, but perhaps the strange behavior comes from there.
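If your setup looks roughly like the purely illustrative sketch below (the GRU sizes, the Normal output head, and the negative_log_likelihood helper are my assumptions, not your code), note that Keras computes the "mae" metric on the tensor it gets from the distribution layer (by default a sample, as far as I know), not on the distribution itself, which can make the NLL loss and the MAE metric move quite differently:

    import tensorflow as tf
    import tensorflow_probability as tfp

    tfd = tfp.distributions

    # Negative log-likelihood of the targets under the predicted distribution.
    def negative_log_likelihood(y_true, y_dist):
        return -y_dist.log_prob(y_true)

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(None, 8)),   # (timesteps, features); placeholder shape
        tf.keras.layers.GRU(64, return_sequences=True),
        tf.keras.layers.GRU(64),
        tf.keras.layers.Dense(2),          # parameterizes [loc, raw scale]
        tfp.layers.DistributionLambda(
            lambda t: tfd.Normal(loc=t[..., :1],
                                 scale=1e-3 + tf.math.softplus(t[..., 1:]))),
    ])

    # "mae" is computed on whatever the distribution layer converts to a
    # tensor (a sample by default), not on the distribution's mean.
    model.compile(optimizer="adam",
                  loss=negative_log_likelihood,
                  metrics=["mae"])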
I implemented a Capsule Network with EM routing, based on Sara Sabour and Hinton's article. It works great on the MNIST dataset and on other MNIST-like grayscale datasets such as Hoda (Persian/Arabic digits), but when I tried it on CIFAR10 the accuracy was unbelievably low.
Yup, that is the current problem with Capsule Networks. They work well on MNIST because of the dataset's simplicity: all you need is for some edges and blobs to be detected in order to classify each sample. For more complex datasets, naively stacking capsules and hoping they perform well does not work. However, work is currently being done to tweak the CapsNet architecture and make it perform better. When CNNs were first developed, they had the same problem; it took many years for CNNs to become what they are now.
Refer to this paper if you want to know the performance of CapsNet on different datasets: https://arxiv.org/abs/1712.03480
Earlier I mentioned that work is being done to improve CapsNet; some of that work has already been published. You can refer to these:
http://openaccess.thecvf.com/content_CVPR_2019/papers/Rajasegaran_DeepCaps_Going_Deeper_With_Capsule_Networks_CVPR_2019_paper.pdf
http://proceedings.mlr.press/v97/jeong19b/jeong19b.pdf
Bear in mind that the time it takes to train a CapsNet is much higher than for a CNN, so it is not easy to test out these architectures.
I have trained a neural network on a classification task, and it is learning, although its accuracy is not high. I am trying to figure out which test examples it is not confident about, so that I can gain some more insight into what is happening.
In order to do this, I decided to use the standard softmax probabilities in TensorFlow. To do this, I called tf.nn.softmax(logits) and used the probabilities it provides. I noticed that many times the probabilities were 99%, but the prediction was still wrong. As such, even when I only consider examples whose predicted probability is higher than 99%, I get poor accuracy, only 2-3 percentage points higher than my original accuracy.
Does anyone have any ideas as to why the network is so confident about wrong predictions? I am still new to deep learning, so I am looking for some ideas to help me out.
Also, is using the softmax probabilities the right way to determine the confidence of a neural network's predictions? If not, is there a better way?
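For reference, the confidence filtering I'm doing is roughly like the sketch below (written in TF 2 eager style with placeholder data; in my real code the logits and labels come from my model and test set):

    import numpy as np
    import tensorflow as tf

    # Placeholder data standing in for my real model outputs and labels.
    num_examples, num_classes = 1000, 5
    logits = np.random.randn(num_examples, num_classes).astype(np.float32)
    labels = np.random.randint(num_classes, size=num_examples)

    probs = tf.nn.softmax(logits, axis=-1).numpy()   # per-class probabilities
    predictions = probs.argmax(axis=-1)
    confidence = probs.max(axis=-1)

    overall_acc = (predictions == labels).mean()

    # Keep only the examples the network is "sure" about (>= 99% probability).
    confident = confidence >= 0.99
    confident_acc = ((predictions[confident] == labels[confident]).mean()
                     if confident.any() else float("nan"))

    print("overall accuracy:", overall_acc)
    print(">=99% subset accuracy:", confident_acc,
          "(", int(confident.sum()), "of", len(labels), "examples )")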
Thanks!
Edit: From the answer below, it seems like my network is just performing poorly. Is there another way to identify which of the network's predictions are likely to be wrong, besides looking at the confidence (since the confidence doesn't seem to work well)?
Imagine your samples are separable by a vertical line, but your NN classifier learnt a horizontal line; in this case, any prediction given by your classifier can only ever reach about 50% accuracy. However, the NN will still assign higher confidence to the samples that are further away from the horizontal line.
In short, when your model is doing poor classification, higher confidence contributes little to nothing to accuracy.
Suggestion: check whether the information needed for correct classification is actually present in the data, then improve the overall accuracy first.
I'm getting some learning experience with TensorFlow's Estimator API. While doing a classification task on a small dataset with TensorFlow's tf.contrib.learn.DNNClassifier (I know there is tf.estimator.DNNClassifier, but I have to work with TensorFlow 1.2), I get the following accuracy graph on my test dataset. I wonder why there are these negative peaks.
I thought they could occur because of overfitting and self-repairing. The next data point after each peak seems to have the same value as the point before.
I tried to look into the code to find any proof that the Estimator's train function has such a mechanism, but did not find any.
So, is there such a mechanism or are there other possible explanations?
I don't think that the Estimator's train function has any such mechanism.
Some possible theories:
Does your training restart at any point? It's possible that if you have some exponential moving average (EMA) in your model, the moving average has to be recomputed upon restart.
Is your input data randomized? If not, it's possible that a whole patch of input data is misclassified at once, and again the EMA possibly smooths this out (a rough sketch of the kind of shuffling I mean is below).
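This is just plain NumPy, so it is independent of which Estimator input function you use; the array shapes are placeholders:

    import numpy as np

    # Placeholder arrays standing in for the real features and labels.
    features = np.random.rand(10000, 20).astype(np.float32)
    labels = np.random.randint(2, size=10000)

    # Shuffle features and labels with the same permutation (ideally once per
    # epoch), so no contiguous patch of similar examples hits the model in one go.
    rng = np.random.RandomState(42)
    perm = rng.permutation(len(labels))
    features, labels = features[perm], labels[perm]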
This is pretty mysterious to me. If you do find out what the real issue is please do share!
I am working on a deep learning (CNN + AEs) approach on facial images.
I have:
an input layer of 112*112*3 for facial images,
3 blocks of convolution + max pooling + ReLU,
2 fully connected layers with 512 neurons each and 50% dropout to avoid overfitting,
and a final output layer with 10 neurons, since I have 10 classes.
I also use the reduced mean of the softmax cross-entropy as the loss, along with L2 regularization.
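A rough Keras-style sketch of this architecture is below; the filter counts, kernel sizes, and L2 strength are illustrative, and my actual implementation uses the low-level TensorFlow API rather than Keras:

    import tensorflow as tf
    from tensorflow.keras import layers, regularizers

    l2 = regularizers.l2(1e-4)   # L2 strength is a placeholder

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(112, 112, 3)),
        # 3 blocks of convolution + max pooling + ReLU
        layers.Conv2D(32, 3, padding="same", activation="relu", kernel_regularizer=l2),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, padding="same", activation="relu", kernel_regularizer=l2),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, padding="same", activation="relu", kernel_regularizer=l2),
        layers.MaxPooling2D(),
        layers.Flatten(),
        # 2 fully connected layers of 512 neurons with 50% dropout
        layers.Dense(512, activation="relu", kernel_regularizer=l2),
        layers.Dropout(0.5),
        layers.Dense(512, activation="relu", kernel_regularizer=l2),
        layers.Dropout(0.5),
        layers.Dense(10),   # 10 classes; logits feed the softmax cross-entropy
    ])

    # Mean softmax cross-entropy over the batch, momentum optimizer as described.
    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"])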
For training, I divided my dataset into 3 groups:
60% for training
20% for validation
20% for evaluation
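The split itself is a straightforward 60/20/20 partition, something like this sketch (scikit-learn and the placeholder arrays are used here only for illustration):

    import numpy as np
    from sklearn.model_selection import train_test_split

    # Placeholder arrays standing in for the real images and labels.
    images = np.random.rand(500, 112, 112, 3).astype(np.float32)
    labels = np.random.randint(10, size=500)

    # 60% training, then split the remaining 40% in half: 20% validation, 20% evaluation.
    x_train, x_rest, y_train, y_rest = train_test_split(
        images, labels, test_size=0.4, stratify=labels, random_state=0)
    x_val, x_eval, y_val, y_eval = train_test_split(
        x_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=0)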
The problem is that after a few epochs the validation error rate settles at a fixed value and never changes. I have used TensorFlow to implement my project.
I haven't had such a problem with CNNs before, so I think this is a first for me. I have checked the code; it's based on the TensorFlow documentation, so I don't think the problem is with the code. Maybe I need to change some parameters, but I am not sure.
Any ideas about common solutions for such a problem?
Update:
I changed the optimizer from momentum to Adam with the default learning rate. The validation error now changes, but it is lower than the mini-batch error most of the time, even though both use the same batch size.
I have tested the model with and without biases (initialized to 0.1), but there is no good fit yet.
Update
I fixed the issue; I will update with more details soon.
One tool that I found helpful for this type of problem is TensorBoard. You can visualize detailed training performance information after each epoch for different points in the computational graph. Adding key metrics is worth it, since you can see how training progresses after applying changes to the adaptive learning rate, batch size, neural network architecture, dropout / regularization, number of GPUs, etc.
Here is the link that I found helpful to add these details:
https://www.tensorflow.org/how_tos/graph_viz/#runtime_statistics
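For example, with the TF 1.x summary API (which matches the linked docs), logging scalars looks roughly like this self-contained toy example; replace the dummy loss with the tensors your own graph already computes:

    import tensorflow as tf

    # Toy TF 1.x graph: minimize x^2 and log the loss at every step.
    x = tf.Variable(5.0)
    loss = tf.square(x)
    train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

    tf.summary.scalar("loss", loss)
    merged = tf.summary.merge_all()

    with tf.Session() as sess:
        writer = tf.summary.FileWriter("./logs", sess.graph)
        sess.run(tf.global_variables_initializer())
        for step in range(100):
            _, summary = sess.run([train_op, merged])
            writer.add_summary(summary, step)
        writer.close()

    # Then inspect the curves with: tensorboard --logdir ./logs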