I study ML and I see that most of the time the focus of the algorithms is run time rather than accuracy: reducing features, taking a sample from the data set, using approximations, and so on.
I'm not sure why this is the focus, since once I have trained my model I don't need to train it again if my accuracy is high enough. Whether it takes 1 hour or 10 days to train the model doesn't really matter, because I only do it once and my goal is to predict my outcomes as well as possible (minimum loss).
If I train a model to distinguish between cats and dogs, I want it to be as accurate as possible, not the fastest, since once I have trained this model I don't need to train any more models.
I can understand why models that depend on fast-changing data need this focus on speed, but for general model training I don't understand why the focus is on speed.
Speed is a relative term, and accuracy is also relative, depending on the difficulty of the task. Currently the goal is to achieve human-like performance in applications at reasonable cost, because this can replace human labor and cut costs.
From what I have seen in reading papers, people usually focus on accuracy first to produce something that works. Then they do ablation studies - studies where pieces of the model are removed or modified - to achieve the same performance with less time or memory.
The field is largely validated experimentally. There really isn't much of a theory that states why CNNs work so well, other than that they can model any function given non-linear activation functions (https://en.wikipedia.org/wiki/Universal_approximation_theorem). There have been some recent efforts to explain why they work well. One I recall is MobileNetV2: Inverted Residuals and Linear Bottlenecks. The explanation of embedding data into a low-dimensional space without losing information might be worth reading.
When training a deep learning model, I have to look at the loss curve and the performance curve to judge whether the training process has converged.
This has cost me a lot of time, and sometimes the point of convergence judged by the naked eye is not accurate.
Therefore, I'd like to know whether there exists an algorithm or a package that can automatically judge whether the training process of a deep learning model has converged.
Can anyone help me?
Thanks a lot.
At the risk of disappointing you, I believe there is no such universal algorithm. In my experience, it depends on what you want to achieve, which metrics are important to you, and how much time you are willing to let the training go on for.
I have already seen validation losses dramatically go up (a sign of overfitting) while other metrics (mIoU in this case) were still improving on the validation set. In these cases, you need to know what your target is.
It is possible (although very rare) that your loss goes up for a substantial amount of time before going down again and reaching better levels than before. There is no way to anticipate this.
Finally, and this is arguably a common case if you have tons of training data, your validation loss may continually go down, but more and more slowly. In this case, the best strategy, if you had an infinite amount of time, would be to let the training keep going indefinitely. In practice this is impossible, and you need to find the right balance between performance and training time.
If you really need an algorithm, I would suggest this quite simple one:
Compute a validation metric M(i) after each i-th epoch, on a fixed subset of your validation set or on the whole validation set. Let's suppose that the higher M(i) is, the better. Fix an integer k depending on the duration of one training epoch (k ~ 3 should do the trick).
If for some n you have M(n) > max(M(n+1), ..., M(n+k)), stop and keep the network you had at epoch n.
It's far from perfect, but should be enough for simple tasks.
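If it helps, here is a minimal sketch of that rule in plain Python (the function name and the default patience are my own; adapt them to your training loop):

```python
# Minimal sketch of the stopping rule above: stop once the best validation
# metric has not been beaten for k consecutive epochs, and keep the best epoch.
def should_stop(metrics, k=3):
    """metrics = [M(1), ..., M(i)] observed so far, higher is better; k is the patience.

    Returns (stop, best_epoch_index) where best_epoch_index is None unless stopping.
    """
    if len(metrics) <= k:
        return False, None
    best_epoch = max(range(len(metrics)), key=lambda i: metrics[i])
    # The best value was observed at least k epochs ago, i.e. none of the
    # following k epochs improved on it.
    if best_epoch <= len(metrics) - 1 - k:
        return True, best_epoch
    return False, None
```

If you train with Keras, the built-in tf.keras.callbacks.EarlyStopping callback (with patience=k and restore_best_weights=True) implements essentially the same idea.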
[Edit] If you're not using it yet, I invite you to use TensorBoard to visualize the evolution of your metrics throughout the training. Once set up, it is a huge time saver.
I am trying to understand the basics of deep learning, lately reading a bit through deeplearning4j. However, I can't really find an answer to this: how does the training performance scale with the amount of training data?
Apparently, the cost function always depends on all the training data, since it just sums the squared error per input. Thus, I guess that at each optimization step all data points have to be taken into account. I mean, deeplearning4j has the dataset iterator and the INDArray, where the data can live anywhere and thus (I think) doesn't limit the amount of training data. Still, doesn't that mean that the amount of training data is directly related to the calculation time per step of gradient descent?
DL4J uses an iterator; Keras uses a generator. It is still the same idea: your data comes in batches and is used for SGD. So the mini-batch size matters, not the whole amount of data you have.
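To make the point concrete, here is a minimal sketch (plain NumPy, with a linear model and squared-error loss as a stand-in for a network) showing that one SGD step only touches a mini-batch, so the cost per step depends on the batch size, not on the total number of examples:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 20))   # large data set
y = rng.normal(size=100_000)
w = np.zeros(20)
batch_size, lr = 32, 0.01

for step in range(1_000):
    idx = rng.integers(0, len(X), size=batch_size)   # sample a mini-batch
    xb, yb = X[idx], y[idx]
    grad = 2.0 * xb.T @ (xb @ w - yb) / batch_size   # gradient on the batch only
    w -= lr * grad                                   # each step touches 32 rows, not 100,000
```

More data does mean more batches per epoch, so a full pass over the data takes longer, but the cost of each individual gradient step stays the same.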
Fundamentally speaking, it doesn't (though your mileage may vary). You must research the right architecture for your problem. Adding new data records may introduce new features, which may be hard to capture with your current architecture. I would always question my net's capacity: retrain your model and check whether the metrics drop.
I am planning to make a small TensorFlow image classification project, which is expected to run on machines with low processing power, and one of the concerns I was asked about was the time needed to train the model.
The project is still in the conception stage and no clear boundaries have been set.
But assuming that we will use TensorFlow for Python, with a simple neural network on a data set of, say, n images, is there a way to estimate or predict the time required to train the model, given the hardware in use, before performing the training?
I asked one of my colleagues who works with neural networks, and he said that maybe we could calculate the time needed by measuring the time for the first epoch and then estimating how many epochs will be needed. Is this a valid approach? If so, is it even possible to estimate the number of epochs needed? In either case, is there a way to calculate it before performing any training?
There is no definite way of finding the number of epochs at which the model converges; it is one of the hyperparameters.
Apart from the type of model you are training, convergence also depends on the distribution of the data and the optimizer you are using.
You can make a rough estimate by looking at the number of parameters in your model, checking the time for one epoch, and getting a rough idea from "experience" of the number of epochs. But you always have to look at the training and validation loss curves to check for convergence.
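As a rough sketch of your colleague's suggestion, you can time each epoch with a small tf.keras callback and extrapolate (the model and data names in the commented lines are placeholders for your own):

```python
import time
import tensorflow as tf

class EpochTimer(tf.keras.callbacks.Callback):
    """Prints the wall-clock time of every epoch."""
    def on_epoch_begin(self, epoch, logs=None):
        self._start = time.perf_counter()

    def on_epoch_end(self, epoch, logs=None):
        print(f"Epoch {epoch}: {time.perf_counter() - self._start:.1f} s")

# Time the first epoch, then extrapolate with a guessed number of epochs:
# model.fit(train_images, train_labels, epochs=1, callbacks=[EpochTimer()])
# estimated_total_seconds = seconds_per_epoch * expected_epochs
```

The extrapolation is only as good as your guess of the number of epochs, which, as noted above, you cannot know in advance; you still have to watch the loss curves.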
I’m running a machine learning algorithm to answer True/False questions.
Assume I use a classification algorithm.
After running it on 1,200 data points, I got 30% accuracy.
But then I made a second algorithm that always negates the first algorithm's answer.
Thus its accuracy is 70%.
Is this overfitting for the second algorithm, assuming my first algorithm consistently achieves 30% accuracy?
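To make the setup concrete, here is a minimal sketch (with made-up labels) of what I mean: negating every answer of a binary classifier turns its accuracy into 1 minus the original accuracy on the same data.

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 1])
y_pred = np.array([0, 1, 0, 1, 1, 0, 0, 1, 1, 0])   # a weak classifier: 30% accuracy here

accuracy = (y_pred == y_true).mean()
flipped_accuracy = ((1 - y_pred) == y_true).mean()   # negate every answer
print(accuracy, flipped_accuracy)                    # 0.3 and 0.7 - they always sum to 1
```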
To your questions.
I feel like this answer kind of depends on the machine learning model you choose and on the training set. Most ML models make mistakes initially. In your case, if Algorithm 2 reaches 70%, it might mean that Algorithm 1 is good at predicting the wrong thing, if I'm understanding this correctly. Although this might hold at first, negating an ML model's answer is a bad idea. The better idea is to prepare your data correctly and train on a data set that is the best fit for your model.
Most machine learning models make mistakes; it is bound to happen. But the training set and all that data help you choose the right model. Data preparation is key to building your training set correctly. I know I'm bouncing all over the place; I apologize for that.
For instance, we might have a logistic regression model and want to identify the individuals who have a certain condition versus those who don't. The first thing we do is properly prepare our data and then train the model (this is the short version). My point is that training a model properly is very important; it determines how accurately your model can predict.
I should say I really enjoy machine learning/deep learning, but I am by no means an expert. I highly recommend this class, though; it's how I started off understanding the fundamentals.
Coursera Andrew Ng course
I am working on a comparison of deep learning models with an application in vehicular network communication security. I want to know how I can compute the complexity of these models in order to assess the performance of my proposed ones. I am making use of TensorFlow.
You can compare the complexity of two deep networks with respect to space and time.
Regarding space complexity:
Number of parameters in your model -> this is directly proportional to the amount of memory consumed by your model.
Regarding time complexity:
Amount of time it takes to train a single batch for a given batch size.
Amount of time it takes for training to converge
Amount of time it takes to perform inference on a single sample (a small measurement sketch follows this list).
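Here is a small sketch of how you might measure these with TensorFlow (the toy model below is only an illustration; substitute your own network):

```python
import time
import tensorflow as tf

# Toy stand-in model; replace with your own architecture.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(64,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10),
])
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

# Space: number of parameters (proportional to the memory the model consumes).
print("Parameters:", model.count_params())

# Time: seconds to train on a single batch of a given size
# (the very first call includes graph-building overhead; time a later call
# for a steadier number).
x = tf.random.normal((32, 64))
y = tf.random.uniform((32,), maxval=10, dtype=tf.int32)
start = time.perf_counter()
model.train_on_batch(x, y)
print("Seconds per training batch:", time.perf_counter() - start)

# Time: inference latency on a single sample.
sample = tf.random.normal((1, 64))
start = time.perf_counter()
model(sample, training=False)
print("Inference seconds per sample:", time.perf_counter() - start)
```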
Some papers also discuss the architecture complexity. For example, if GoogLeNet accuracy is only marginally higher than VGG-net, some people might prefer VGG-net as it is a lot easier to implement.
You can also discuss some analysis on tolerance of your network to hyperparameter tuning i.e. how your performance varies when you change the hyperparameters.
If your model is in a distributed setting, there are other things to mention such as the communication interval as it is the bottleneck sometimes.
In summary, you can discuss pretty much anything that you feel is implemented differently in another network and contributes additional complexity without much improvement in accuracy compared to your network.
I don't think you will need it, but there is also an open source project called DeepBench for benchmarking different deep network models.