I am trying to prune a pre-trained model using TFMOT (the TensorFlow Model Optimization Toolkit). Is it necessary to re-train the pruned model to get a reduced gzip size?
Without re-training, the model's gzip size does not shrink.
Yes, it is necessary to train the model to apply pruning.
Pruning gradually reduces some weights to zero during the training process. This gradual process is necessary to maintain good accuracy, and it can be tuned with a specific pruning schedule.
For best results, pruning should be applied to an already trained model.
The TensorFlow team is currently investigating how to apply pruning in a single shot, without training. The objective is not to produce a useful model, but to measure the size and performance benefits of pruning without re-training.
https://github.com/tensorflow/model-optimization/issues/621
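For concreteness, here is a minimal sketch of that workflow with the TFMOT Keras API. The model path, the training data names and the schedule steps below are placeholders rather than anything from the question: prune, briefly fine-tune, strip the pruning wrappers, then gzip to see the size reduction.

```python
# Minimal sketch: prune an already-trained Keras model with TFMOT,
# fine-tune briefly, strip the wrappers, and gzip to measure the size win.
# "trained_model.h5", x_train and y_train are placeholders.
import os, gzip, tempfile
import tensorflow as tf
import tensorflow_model_optimization as tfmot

model = tf.keras.models.load_model("trained_model.h5")

pruning_params = {
    "pruning_schedule": tfmot.sparsity.keras.PolynomialDecay(
        initial_sparsity=0.0, final_sparsity=0.9,
        begin_step=0, end_step=1000),
}
pruned = tfmot.sparsity.keras.prune_low_magnitude(model, **pruning_params)
pruned.compile(optimizer="adam",
               loss="sparse_categorical_crossentropy",
               metrics=["accuracy"])

# The re-training (fine-tuning) step is what actually drives weights to zero.
pruned.fit(x_train, y_train, epochs=2,
           callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])

# Strip the pruning wrappers before export, then gzip the saved file.
final = tfmot.sparsity.keras.strip_pruning(pruned)
_, keras_file = tempfile.mkstemp(".h5")
tf.keras.models.save_model(final, keras_file, include_optimizer=False)
with open(keras_file, "rb") as f, gzip.open(keras_file + ".gz", "wb") as g:
    g.write(f.read())
print("gzipped size (bytes):", os.path.getsize(keras_file + ".gz"))
```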
I'm training a classification model with custom layers on top of BERT. The training performance of this model goes down with increasing epochs (after the first epoch), and I'm not sure what to fix here: is it the model or the data?
(For the data, the labels are binary and balanced in the number of data points per label.)
Any quick pointers on what the problem could be? Has anyone come across this before?
Edit: it turns out there was a mismatch between the transformers library and the TF version I was using. Once I fixed that, the training performance was fine!
Thanks!
Remember that fine-tuning a pre-trained model like BERT usually requires far fewer epochs than training a model from scratch. In fact, the authors of BERT recommend between 2 and 4 epochs. Further training often translates to overfitting your data and forgetting the pre-trained weights (see catastrophic forgetting).
In my experience, this especially affects small datasets, where it's easy to overfit even by the 2nd epoch. Besides, you haven't described your custom layers on top of BERT, but adding much complexity there might also increase overfitting -- note that the common architecture for text classification only adds a linear transformation.
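As a hedged illustration of that "single linear transformation" setup, a minimal binary classifier could look like the sketch below. It assumes the Hugging Face transformers TF API (with a version whose outputs expose pooler_output); the sequence length and learning rate are assumptions, not details from the question.

```python
# Minimal sketch: BERT with a single linear (sigmoid) head for binary
# classification. max_len and the learning rate are illustrative choices.
import tensorflow as tf
from transformers import TFBertModel

max_len = 128
bert = TFBertModel.from_pretrained("bert-base-uncased")

input_ids = tf.keras.Input(shape=(max_len,), dtype=tf.int32, name="input_ids")
attention_mask = tf.keras.Input(shape=(max_len,), dtype=tf.int32,
                                name="attention_mask")

# Pooled [CLS] representation followed by one dense layer -- nothing more.
outputs = bert(input_ids, attention_mask=attention_mask)
prob = tf.keras.layers.Dense(1, activation="sigmoid")(outputs.pooler_output)

model = tf.keras.Model(inputs=[input_ids, attention_mask], outputs=prob)
model.compile(optimizer=tf.keras.optimizers.Adam(2e-5),
              loss="binary_crossentropy", metrics=["accuracy"])
# Fine-tune for only 2-4 epochs, as the BERT authors recommend.
```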
I'm deploying a deep learning model and saved the Keras model as an .h5 file. I think a complex model will be large in size and hence slow to serve, but is there anything I can do other than reducing the number of layers in the model? Is there some way of compressing the .h5 file so it loads faster on the server?
Thank you
There is a way to do that.
What you are looking for is called quantization.
It does not necessarily mean reducing the number of layers (that is closer to model pruning); quantization reduces both the size and the latency of the model by lowering the precision of the weights (and, in some cases, the activations).
For more detailed information, read this page on the official TensorFlow documentation: https://www.tensorflow.org/lite/performance/post_training_quantization
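For example, a minimal sketch of post-training (dynamic-range) quantization with the TFLite converter, assuming your model is a Keras .h5 file; the file names are placeholders:

```python
# Minimal sketch: convert a Keras model to a quantized TFLite model.
import tensorflow as tf

model = tf.keras.models.load_model("model.h5")  # placeholder path

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # dynamic-range quantization
tflite_model = converter.convert()

# The resulting file is typically much smaller than the original .h5.
with open("model_quantized.tflite", "wb") as f:
    f.write(tflite_model)
```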
What does each of these stages do? I understand that for neural nets in NLP, training will find the best parameters for the word embeddings. But what is the purpose of the evaluation step? What is it supposed to do? How is that different from the prediction phase?
Training, evaluation and prediction are the three main steps of building a model (in basically any ML framework) and of moving it from research/development to production.
Training:
A suitable ML architecture is selected based on the problem that needs to be solved, and hyperparameter optimization is carried out to fine-tune the model. The model is then trained on the data for a certain number of epochs, while metrics such as loss, accuracy or MSE are monitored.
Evaluation:
We need to move the model to production. The model in the production stage will only make inferences, and hence we require the best model possible. So, in order to evaluate or test the model against some predefined criteria, the evaluation phase is carried out.
Evaluation is carried out on data that is a subset of the original dataset; the training and evaluation splits are made while preprocessing the data. Metrics are calculated to check the performance of the model on the evaluation dataset.
The evaluation data has never been seen by the model, as the model is not trained on it. Hence, the metrics here give a realistic estimate of how the model will perform on new data.
Prediction:
After testing the model, we can move it to production. In the production phase, models only make inference (predictions) on the data given to them; no training takes place here.
Even after a thorough evaluation, the model will still make some mispredictions. Hence, in the production stage, we can collect interactive feedback from users about the performance of the model.
Now,
But what is the purpose of the evaluation step? What is it supposed to do? How is that different from the prediction phase?
Evaluation is there to make the model better for most of the cases it will come across. Prediction, in production, mostly surfaces other problems that are not related to model performance.
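A minimal sketch of the three stages in Keras; the model and the x_train / y_train, x_val / y_val and x_new arrays are placeholders, not anything from the question:

```python
# Minimal sketch of the training / evaluation / prediction stages in Keras.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# Training: fit the parameters and monitor the metrics.
model.fit(x_train, y_train, epochs=10)

# Evaluation: measure the metrics on held-out data the model never trained on.
loss, acc = model.evaluate(x_val, y_val)

# Prediction: plain inference, no labels and no weight updates.
probs = model.predict(x_new)
```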
I have built a model and I am successfully able to prune it using tf.contrib's model pruning module with the default params and 90% sparsity. The problem is that when I run the pruned model, it still takes the same execution time as the original model. My guess is that instead of running only the pruned subgraph, TensorFlow is running the entire graph with masked weights, and that's why there is no improvement even after pruning.
So how do I export the pruned model with the subgraph and respective weights, and use it?
The strip_pruning_vars utility might be what you're looking for.
From the README: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/model_pruning#adding-pruning-ops
Removing pruning ops from the trained graph
Once the model is trained, it is necessary to remove the auxiliary variables (mask, threshold) and pruning ops added to the graph in the steps above. This can be accomplished using the strip_pruning_vars utility.
Would you mind sharing your code?
I need to gain some knowledge about deep neural networks.
For a very deep neural network like ResNet, we can use transfer learning to train a model.
ResNet has been trained on the ImageNet dataset, so its pre-trained weights can be used to train a model on another dataset (for example, training a model for lung cancer detection on CT lung images).
I feel that this approach will not be accurate, as the pre-trained weights have been trained entirely on other kinds of objects, not on medical data.
Instead of transfer learning, is it possible to train the ResNet from scratch? (The number of images available to train it is only around 1500.) Is that something that can be done on a normal computer?
Can someone please share their ideas on this with me?
is it possible to train the resnet from scratch?
Yes, it is possible, but the amount of time needed to reach good accuracy greatly depends on the data. For instance, training the original ResNet-50 on an NVIDIA M40 GPU took 14 days (~10^18 single-precision ops). The most expensive operations in a CNN are the convolutions in the early layers.
ImageNet contains about 14 million images (typically resized to 224x224x3). Since your dataset is roughly 10,000x smaller, each epoch will take roughly 10,000x fewer ops. On top of that, if you pass grayscale instead of RGB images, the first convolution will take 3x fewer ops. Likewise, the spatial image size affects the training time. Training on smaller images also lets you increase the batch size, which usually speeds things up due to vectorization.
All in all, I estimate that a machine with a single consumer GPU, such as a GTX 1080 or 1080 Ti, can train ~100 epochs of a ResNet-50 model in a day. Obviously, training on a 2-GPU machine would be even faster. If that is what you mean by a normal computer, the answer is yes.
But since your dataset is very small, there's a big chance of overfitting. This looks like the biggest issue that your approach faces.
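For reference, here is a minimal sketch of the transfer-learning route in Keras, freezing the ImageNet-pretrained ResNet-50 backbone to limit overfitting on a small dataset. The input shape, the binary output and the dataset names are assumptions, not details from the question.

```python
# Minimal sketch: transfer learning with a frozen ResNet-50 backbone.
# Input shape, class count and train_ds/val_ds are illustrative placeholders.
import tensorflow as tf

base = tf.keras.applications.ResNet50(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze pre-trained weights to reduce overfitting

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # e.g. cancer / no cancer
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="binary_crossentropy", metrics=["accuracy"])

# model.fit(train_ds, validation_data=val_ds, epochs=20)  # placeholder datasets
```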