Does changing a token name in an image caption model affect performance? - tensorflow

If I train an image caption model then stop to rename a few tokens:
1. Should I train the model from scratch?
2. Or can I reload the model and continue training from the last epoch with the updated vocabulary?
Will either approach affect model accuracy/performance differently?

I would go for option 2.
When training the model from scratch, you initialize the model's weights randomly and then fit them to your problem. However, if instead of random weights you use weights that have already been trained on a similar problem, you may decrease the convergence time. This option is kind of similar to the idea of transfer learning.
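If the rename only changes a token's string and not its integer index, the embedding rows and output-layer weights learned so far still line up with the vocabulary, which is what makes continuing from a checkpoint feasible. A minimal sketch in plain Python, assuming the vocabulary is stored as a token-to-index dict (the `rename_tokens` helper and the sample tokens are hypothetical and not part of any TensorFlow API):

```python
def rename_tokens(vocab, renames):
    """Return a new token->index mapping with some tokens renamed.

    Indices are left untouched, so weights indexed by token id
    (embedding rows, output-layer columns) still match after the rename.
    """
    new_vocab = {}
    for token, index in vocab.items():
        new_token = renames.get(token, token)
        if new_token in new_vocab:
            # Two tokens would collapse into one; that is a merge, not a rename.
            raise ValueError(f"duplicate token after rename: {new_token!r}")
        new_vocab[new_token] = index
    return new_vocab


vocab = {"<start>": 0, "<end>": 1, "dog": 2, "kitty": 3}
renamed = rename_tokens(vocab, {"kitty": "cat"})
print(renamed)
```

If the rename instead merges or splits tokens, the indices change and the saved weights no longer map cleanly onto the new vocabulary, so this shortcut would not apply.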

Just to give the other team a voice: So what is actually the difference between training from scratch and reloading a model and continuing training?
(2) will converge faster, but (1) will probably perform better and should therefore be chosen. Do we really care about training time when it is traded off against performance? Usually not.
The further your model has already converged on a specific problem, the harder it is to move it to a different optimum. You might be lucky, and the chance that you are going down the right rabbit hole rises with similar tasks and similar data. Yet with a change in your setup this cannot be guaranteed.
Pre-training for a few epochs on a domain other than your target domain definitely makes sense and is beneficial, yet the question arises why you would not train on your target domain from the very beginning.
Note: For a more substantial read I'd like to refer you to this paper, where they explain in more depth why domain is of the essence and transfer learning could mess with your final performance.

It depends on the number of tokens being relabeled compared to the total number. Since you mentioned there are only a few of them, the optimal solution is clear in my opinion.
You should start the training from scratch but initialize the weights with the values they had when the previous training stopped (again, it is crucial that the re-labeled samples do not make up a substantial share of the data). This way, the model will likely converge faster than when starting with random weights, and also better than when trying to re-fit ("forget") what it managed to learn from the previous training.
Topologically speaking you are initializing in a position where the model is closer to a global minimum but has not made any steps towards a local minimum.
Hope this helps.

Related

How to automatically judge whether the training process of the deep learning model is converged?

When training a deep learning model, I have to look at the loss curve and the performance curve to judge whether training has converged.
This costs me a lot of time. Sometimes the point of convergence judged by the naked eye is not accurate.
Therefore, I'd like to know whether there is an algorithm or a package that can automatically judge whether the training of a deep learning model has converged.
Can anyone help me?
Thanks a lot.
At the risk of disappointing you, I believe there is no such universal algorithm. In my experience, it depends on what you want to achieve, which metrics are important to you, and how long you are willing to let the training run.
I have already seen validation losses dramatically go up (a sign of overfitting) while other metrics (mIoU in this case) were still improving on the validation set. In these cases, you need to know what your target is.
It is possible (although very rare) that your loss goes up for a substantial amount of time before going down again and reaching better levels than before. There is no way to anticipate this.
Finally, and this is arguably a common case if you have tons of training data, your validation loss may continually go down, but more and more slowly. In this case, the best strategy if you had an infinite amount of time would be to keep the training going indefinitely. In practice this is impossible, and you need to find the right balance between performance and training time.
If you really need an algorithm, I would suggest this quite simple one:
Compute a validation metric M(i) after each epoch i, on a fixed subset of your validation set or on the whole validation set. Let's suppose that the higher M(i) is, the better. Fix an integer k depending on the duration of one training epoch (k~3 should do the trick).
If for some n you have M(n) > max(M(n+1), ..., M(n+k)), stop and keep the network you had at epoch n.
It's far from perfect, but should be enough for simple tasks.
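The stopping rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `stopping_epoch` is a hypothetical helper, the metric history is made up, and epochs are counted from 0 here.

```python
def stopping_epoch(metrics, k=3):
    """Return the first epoch n with M(n) > max(M(n+1), ..., M(n+k)), or None.

    metrics[i] is the validation metric after epoch i (higher is better).
    Only epochs that already have k later epochs to compare against are checked.
    """
    for n in range(len(metrics) - k):
        if metrics[n] > max(metrics[n + 1 : n + 1 + k]):
            return n
    return None


# Made-up validation metric history, one value per epoch.
history = [0.50, 0.58, 0.63, 0.61, 0.62, 0.60, 0.59]
best = stopping_epoch(history, k=3)
print(best)
```

In a real training loop you would call this after every epoch and keep a checkpoint per epoch so that the network from epoch n can be restored. Keras ships an `EarlyStopping` callback with a `patience` argument that follows a similar idea.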
[Edit] If you're not using it yet, I invite you to use TensorBoard to visualize the evolution of your metrics throughout the training. Once set up, it is a huge time saver.

Object Detection model stuck at low mAP

I am trying to reproduce the results of the SSDLite model reported in the MobileNetV2 paper (arXiv:1801.04381), which should achieve about 22.1% mAP on the COCO detection challenge. However, I am stuck at 9% mAP. This is strange behavior because the model does work somewhat, but is still far off from the reported result. Can this much of a gap be caused by hyperparameters/optimizer choices (I am using adam instead of sgd), or is it almost certain that there is a bug in my implementation?
It is also worth mentioning that the model successfully overfits a small subset of the training set, but on the whole training set the loss seems to reach a plateau fairly quickly.
Has anyone encountered a problem similar to this?
"Can this much of a gap be caused by hyperparameters/optimizer choices (I am using adam instead of sgd), or is it almost certain that there is a bug in my implementation?"
Even small changes in the hyperparameters and a different optimizer choice can have a large impact on the training and on the resulting precision of the model. So your low precision is not necessarily due to a bug; it could also be due to a poor choice of parameters.
"It is also worth mentioning that the model successfully overfits a small subset of the training set, but on the whole training set the loss seems to reach a plateau fairly quickly."
It seems like you have run into a local optimum that only works for a subset of your data, which could also point to a suboptimal parameterization.
As @Matias Valdenegro also mentioned, to reproduce the exact result you might have to use the same parameters as in the original implementation.

Deep learning basic thoughts

I am trying to understand the basics of deep learning and have recently been reading a bit about deeplearning4j. However, I can't really find an answer to this: how does the training performance scale with the amount of training data?
Apparently, the cost function always depends on all the training data, since it just sums the squared error per input. Thus, I guess that at each optimization step all data points have to be taken into account. deeplearning4j has the dataset iterator and the INDArray, where the data can live anywhere, and thus (I think) doesn't limit the amount of training data. Still, doesn't that mean that the amount of training data is directly related to the calculation time per step of the gradient descent?
DL4J uses an iterator; Keras uses a generator. It is still the same idea: your data comes in batches, and each batch is used for one SGD step. So it is the minibatch size that matters, not the whole amount of data you have.
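To make the batching idea concrete, here is a small pure-Python sketch of minibatch SGD on a toy one-parameter model. It does not use the DL4J or Keras APIs; `minibatches`, `sgd_epoch`, the toy data, and the learning rate are all assumptions chosen for illustration.

```python
import random


def minibatches(data, batch_size, seed=0):
    """Yield shuffled batches; each yielded batch drives one SGD step."""
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [data[i] for i in indices[start : start + batch_size]]


def sgd_epoch(w, data, batch_size, lr=0.01):
    """One epoch of minibatch SGD for the 1-D model y = w * x (squared error)."""
    steps = 0
    for batch in minibatches(data, batch_size):
        # Gradient of the mean squared error over this batch only,
        # so the work per step depends on batch_size, not on len(data).
        grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        w -= lr * grad
        steps += 1
    return w, steps


data = [(x, 3.0 * x) for x in range(1, 11)]  # 10 samples, true w = 3
w, steps = sgd_epoch(0.0, data, batch_size=2)
print(w, steps)
```

With more data, the number of steps per epoch grows, but the cost of a single step stays tied to the batch size.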
Fundamentally speaking, it doesn't (though your mileage may vary). You must find the right architecture for your problem. Adding new data records may introduce new features that are hard to capture with your current architecture. I would always question my net's capacity. Retrain your model and check whether the metrics drop.

Is this overfitting

I'm running a machine learning algorithm to answer True/False questions.
Assume I use a classification algorithm.
After running it on 1200 data points, I got 30% accuracy.
But then I made a second algorithm that always negates the first algorithm's answer.
Thus its accuracy is 70%.
Is this overfitting for the second algorithm? Assume my first algorithm consistently achieves 30% accuracy.
To your questions.
I feel like the answer depends on the machine learning model you choose and on the training set. Most ML models make mistakes initially. In your case, if the accuracy of Algo 2 is 70%, it might mean that Algo 1 is good at predicting the wrong thing, if I'm understanding this correctly. Although this might be true at the beginning, negating an ML answer is a bad idea. The better idea is to prepare your data correctly and train on a data set that is the best fit for your model.
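The 30% and 70% figures follow from simple arithmetic on a binary task: every wrong True/False answer becomes right when negated, so the negated accuracy is 1 minus the original accuracy. A small sketch with made-up labels and predictions (all values here are hypothetical):

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


labels = [True, False, True, True, False, False, True, False, True, False]
# Hypothetical first classifier: right on 3 of the 10 examples.
first = [True, True, False, False, True, True, False, False, True, True]
# Second "algorithm": always negate the first one's answer.
negated = [not p for p in first]

print(accuracy(first, labels), accuracy(negated, labels))
```

Whether the 70% holds up on new data is a separate question that only a held-out set can answer.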
Most machine learning models make mistakes; it is bound to happen. But the training set and all that data help you choose the right model. Data preparation is key to building your training set correctly. I know I'm bouncing all over the place; I apologize for that.
For instance, we might have a logistic regression model and want to identify the individuals who have a certain condition versus those who don't. The first thing we do is properly prepare our data and then train the model on it (this is the short version). My point is that training is very important: it is what allows your ML model to make accurate predictions.
I should say I really enjoy machine learning/deep learning, but I am by no means an expert. I highly recommend this class though; it's how I started off understanding the fundamentals.
Coursera Andrew Ng course

How to fix incorrect guess in image recognition

I'm very new to this stuff, so please bear with me. I followed a quick, simple video about image recognition/classification on YouTube, and the program could indeed classify the image with a high percentage. But I also have some other images that were incorrectly classified.
From the TensorFlow site (https://www.tensorflow.org/tutorials/image_retraining#distortions):
"However, one should generally avoid point-fixing individual errors in the test set, since they are likely to merely reflect more general problems in the (much larger) training set."
So here are my questions:
1. What would be the best way to correct the program's guess? E.g. the image is B, but the app returned the result "A - 70%, B - 30%".
2. If the answer to question 1 is to retrain, how do I go about retraining the program without deleting the bottleneck files created previously? I.e. I want the program to keep learning while retaining what I have already trained it to recognize.
Unfortunately there is often no easy fix, because the model you are training is highly complex and very hard for a human to interpret.
However, there are techniques you can use to try and reduce your test error. First make sure your model isn't overfitting or underfitting by observing the difference between train and test errors. If either is the case then try applying standard techniques, such as choosing a deeper model and/or using more filters if underfitting or adding regularization if overfitting.
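The train/test comparison described above can be written down as a rough heuristic. In this sketch the `diagnose_fit` helper and both thresholds are assumptions picked for illustration, not standard values; sensible cutoffs depend on the task.

```python
def diagnose_fit(train_error, test_error, high_error=0.15, max_gap=0.05):
    """Rough label for a model given its train and test error rates.

    high_error and max_gap are illustrative thresholds, not standard values.
    """
    if train_error > high_error:
        return "underfitting"  # cannot even fit the training data
    if test_error - train_error > max_gap:
        return "overfitting"   # fits the training data, generalizes poorly
    return "ok"


print(diagnose_fit(0.30, 0.32))
print(diagnose_fit(0.02, 0.20))
print(diagnose_fit(0.05, 0.07))
```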
Since you say you are already classifying correctly a high percentage of the time, I would start inspecting misclassified examples directly to try and gain insight into what you might be able to improve.
If possible, try and observe what your misclassified images have in common. If you are lucky they will all fall into one or a small number of categories. Here are some examples of what you might see and possible solutions:
Problem: Dogs facing left are misclassified as cats
Solution: Try augmenting your training set with horizontal flips (and possibly small rotations)
Problem: Darker images are being misclassified
Solution: Make sure you are normalizing your images properly
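As a rough illustration of the two fixes above, here is a pure-Python sketch that treats an image as a list of rows of pixel values. The helpers `horizontal_flip` and `normalize` are hypothetical; a horizontal flip is used as one augmentation that targets the left-facing versus right-facing case, and in practice you would use your framework's own image utilities.

```python
def horizontal_flip(image):
    """Mirror an image given as a list of rows of pixel values."""
    return [list(reversed(row)) for row in image]


def normalize(image, max_value=255.0):
    """Scale pixel values from [0, max_value] to [0.0, 1.0]."""
    return [[pixel / max_value for pixel in row] for row in image]


# Tiny made-up grayscale image: 2 rows, 3 columns.
image = [[0, 128, 255],
         [64, 32, 16]]
flipped = horizontal_flip(image)
scaled = normalize(image)
print(flipped)
print(scaled[0])
```

Adding the flipped copies to the training set gives the model examples facing both ways, and applying the same normalization at training and prediction time keeps dark and bright images on a comparable scale.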
It is also possible that you have reached the limits of your current approach. If you still need to do better, consider trying a different approach, such as using a network pretrained for image recognition, like VGG.