Can we take subset of any pre-trained model in tensorflow? For example, if we have a pre-trained model which can detect 545 obejcts, can we make a subset of this model which can detect only 20 objects so that the time taken to load the model as well as the detection process can be reduced.
The best you can do is reduce the weights that are related to the last (output) layer only. So, if the size of your second last layer is 1000 so it will reduce your parameters by (1000 * 545 - 1000 * 20) = 525000.
But, if your network is very deep this won't prove to be a great speedup, as you will still need to calculate all the other layers except the last one.
You could, but it's a non-negligible amount of work, and it won't substantially improve the speed.
Indeed, what you'd need to change is only the class prediction layer, which you'd have to reduce from n_featuresx545 to n_featuresx20. Typically at that stage you have n_features=7*7=49 (although it actually depends on the method you're using; this is true for Faster RCNN with usual settings), so you'd save approximately 26k parameters and 8million operations per image (considering 300 detections per image), which is negligible compared to the millions of parameters and billions of operations usually involved in object detection models.
And changing the prediction layer without retraining and while keeping the trained values is not staight-forward, you'd have to write a piece of code to modify your network manually.
Related
Hello everyone so I made this cnn model.
My data:
Train folder->30 classes->800 images each->24000 all together
Validation folder->30 classes->100 images each->3000 all together
Test folder->30 classes -> 100 images each -> 3000 all together
-I've applied data augmentation. ( on the train data)
-I got 5 conv layers with filters 32->64->128->128->128
each with maxpooling and batch normalization
-Added dropout 0.5 after flattening layers
Train part looks good. Validation part has a lot of 'jumps' though. Does it overfit?
Is there any way to fix this and make validation part more stable?
Note: I plann to increase epochs on my final model I'm just experimenting to see what works best since the model takes a lot of time in order to train. So for now I train with 20 epochs.
I've applied data augmentation (on the train data).
What does this mean? What kind of data did you add and how much? You might think I'm nitpicking, but if the distribution of the augmented data is different enough from the original data, then this will indeed cause your model to generalize poorly to the validation set.
Increasing your epochs isn't going to help here, your training loss is already decreasing reasonably. Training your model for longer is a good step if the validation loss is also decreasing nicely, but that's obviously not the case.
Some things I would personally try:
Try decreasing the learning rate.
Try training the model without the augmented data and see how the validation loss behaves.
Try splitting the augmented data so that it's also contained in the validation set and see how the model behaves.
Train part looks good. Validation part has a lot of 'jumps' though. Does it overfit?
the answer is yes. The so-called 'jumps' in the validation part may indicate that the model is not generalizing well to the validation data and therefore your model might be overfitting.
Is there any way to fix this and make validation part more stable?
To fix this you can use the following:
Increasing the size of your training set
Regularization techniques
Early stopping
Reduce the complexity of your model
Use different hyperparameters like learning rate
I have a 2 layered Neural Network that I'm training on about 10000 features (genomic data) with about 100 samples in my data set. Now I realized that anytime I run my model (i.e. compile & fit) I get varying validation/testing accuracys even if I leave the train/test/validation split untouched. Sometimes its around 70% sometimes around 90%.
Due to the stochastic nature of the NN I anticipate some variation but could these strong fluctuations be a sign of something else?
The reason why you're seeing such a big instability with your validation accuracy is because your neural network is huge in comparison to the data you train it on.
Even with just 12 neurons per layer, you still have 12 * 10000 + 12 = 120012 parameters in your first layer. Now think about what the neural network does under the hood. It takes your 10000 inputs, it multiplies each input by some weight and then sums all these inputs. Now you provide it only 64 training examples on which the training algorithm is supposed to decide what are the correct input weights. Just based on intuition, from a purely combinatorial perspective there is going to be large amount of weight assignments that do well on your 64 training samples. And you have no guarantee that the training algorithm will pick such weight assignment that will also do well on your out-of-sample data.
Given neural network is able to represent a wide variety of functions (it's been proven that under certain assumptions it can approximate any function, that's called general approximation). To select the function you want you provide the training algorithm with data to constrain the space of all possible functions the network can represent to a subspace of functions that fit your data. However, such function is in no way guaranteed to represent the true underlying relationship between the input and the output. And especially if the number of parameters is larger than the number of samples (in this case by a few orders of magnitude), you're nearly guaranteed to see your network simply memorize the samples in your training data, simply because it has the capacity to do so and you haven't constrained it enough.
In other words, what you're seeing is overfitting. In NNs, the general rule of thumb is that you want at least a couple of times more samples than you have parameters (look in to the Hoeffding Inequality for theoretical rationale of this) and in effect the more samples you have, the less you're afraid of overfitting.
So here is a couple of possible solutions:
Use an algorithm that's more suitable for the case where you have high input dimension and low sample count, such as Kernel SVM (Support Vector Machine). With such a low sample count, it's quite possible that a Kernel SVM algorithm will achieve better and more consistent validation accuracy. (You can easily test this, they are available in the scikit-learn package, really easy to use)
If you insist on using NN - use regularization. Given the fact you already have working code, this will be easy, just add kernel_regularizer to all your layers, I would try both L1 and L2 regularization (probably separately). L1 regularization tends to push weights to zero so it might help reduce the number of parameters in your problem. L2 just tries to make all the weights small. Use your validation set to decide the best value for each regularization. You can optimize both for the best mean accuracy and also the lowest variance in accuracy on your validation data (do something like 20 training runs for each parameter value of L1 and L2 regularization, usually just trying different orders of magnitude is sufficient, e.g. 1e-4, 1e-3, 1e-2, 1e-1, 1, 1e1).
If most of your input features are not really predictive or if they are highly correlated, PCA (Principal Component Analysis) can be used to project your inputs into a much lower dimensional space (e.g. from 10000 to 20), where you'd have much smaller neural network (still I'd use L1 or L2 for regularization because even then you'd have more weights than training samples)
On a final note, the point of a testing set is to use it very sparsely (ideally only once). It should be the final reported metric after all your research and model tuning is done. You should not optimize any values on it. You should do all this on your validation set. To avoid overfitting on your validation set, look into k-fold cross validation.
I was writing a neural net to train Resnet on CIFAR-10 dataset.
The paper Deep Residual Learning For Image Recognition mentions training for around 60,000 epochs.
I was wondering - what exactly does an epoch refer to in this case? Is it a single pass through a minibatch of size 128 (which would mean around 150 passes through the entire 50000 image training set?
Also how long is this expected to take to train(assume CPU only, 20-layer or 32-layer ResNet)? With the above definition of an epoch, it seems it would take a very long time...
I was expecting something around 2-3 hours only, which is equivalent to about 10 passes through the 50000 image training set.
The paper never mentions 60000 epochs. An epoch is generally taken to mean one pass over the full dataset. 60000 epochs would be insane. They use 64000 iterations on CIFAR-10. An iteration involves processing one minibatch, computing and then applying gradients.
You are correct in that this means >150 passes over the dataset (these are the epochs). Modern neural network models often take days or weeks to train. ResNets in particular are troublesome due to their massive size/depth. Note that in the paper they mention training the model on two GPUs which will be much faster than on the CPU.
If you are just training some models "for fun" I would recommend scaling them down significantly. Try 8 layers or so; even this might be too much. If you are doing this for research/production use, get some GPUs.
I have a multilayer perceptron with 5 hidden layers and 256 neurons each. When I start training, I get different prediction probabilities for each train sample until epoch 50, but then the number of duplicate predictions increases, on epoch 300 I already have 30% of duplicate predictions which does not make sense since the input data is different for all training samples. Any idea what causes this behavior?
Clarifications:
with "duplicate predictions", I mean items with the exactly same predicted probability to belong to class A (it's a binary classification problem)
I have 4000 training samples with 200 features each and all samples are different, it does not make sense that the number of duplicate predictions increases to 30% while training. So I wonder what can cause this behavior.
One point, you say you are doing a binary prediction, and when you say "duplicate predictions", even with your clarification it's hard to understand your meaning. I am guessing that you have two outputs for your binary classifier, one for class A and one for class B and you are getting roughly the same value for a given sample. If that's the case, then the first thing to do is to use 1 output. A binary classification problem is better modeled with 1 output that ranges between 0 and 1 (sigmoid the output neuron). This way there will be no ambiguity, the network will have to choose one or the other, or when it's confused you'll get ~0.5 and it will be clear.
Second, it is very common for a network to start learning well and then to perform more poorly after overtraining. Especially with small datasets such as what you have. In fact, even with the little knowledge I have of your dataset I would put a small bet on you getting better performance out of an algorithm like XGA Boost than a neural network (I assume you're using a neural net and not literally a perceptron).
But regarding the performance degrading over time. When this happens you want to look into something called "early stopping". At some point the network will start memorizing the input, and may be part of what's happening. Essentially you train until the performance on your held out test data starts to worsen.
To address this you can apply various forms of regularization (L2 regularization, dropout, batch normalization all come to mind). You can also reduce the size of your network. 5 layers of 256 neurons sounds too big for the problem. Try trimming this down and I bet your results will improve. There is a sweet spot for architecture size in neural networks. When your network is too large it can, and often will, over fit. When it's too small it won't be expressive enough for the data. Angrew Ng's coursera class has some helpful practical advice on dealing with this.
Usually, when using Keras, the datasets used to train the neural network are labeled.
For example, if I have a 100,000 rows of patients with 12 field per each row, then the last field will indicate if this patient is diabetic or no (0 or 1).
And then after training is finished I can insert a new record and predict if this person is diabetic or no.
But in the case of unlabeled datasets, where I can not label the data due to some reasons, how can I train the neural network to let him know that those are the normal records and any new record that does not match this network will be malicious or not accepted ?
This is called one-class learning and is usually done by using autoencoders. You train an autoencoder on the training data to reconstruct the data itself. The labels in this case is the input itself. This will give you a reconstruction error. https://en.wikipedia.org/wiki/Autoencoder
Now you can define a threshold where the data is benign or not, depending on the reconstruction error. The hope is that the reconstruction of the good data is better than the reconstruction of the bad data.
Edit to answer the question about the difference in performance between supervised and unsupervised learning.
This cannot be said with any certainty, because I have not tried it and I do not know what the final accuracy is going to be. But for a rough estimate supervised learning will perform better on the trained data, because more information is supplied to the algorithm. However if the actual data is quite different to the training data the network will underperform in practice, while the autoencoder tends to deal better with different data. Additionally, per rule of thumb you should have 5000 examples per class to train a neural network reliably, so labeling could take some time. But you will need some data to test anyways.
It sounds like you need fit two different models:
a model for bad record detection
a model for prediction of a patient's likelihood to be diabetic
For both of these models, you will need to have labels. For the first model your labels would indicate whether the record is good or bad (malicious) and the second would be whether the patient is diabetic or not.
In order to detect bad records, you may find that simple logistic regression or SVM performs adequately.