Let us take a practical situation a researcher often finds themselves in when using TensorFlow:
Multiple GPUs are available for training, and I'd like to use them for a speedup.
Subsequently, I'd like to give the trained model to a colleague or collaborator who has a different number of GPUs (maybe only one!).
It is important that the code need not be modified when shared with multiple collaborators.
However, the TensorFlow documentation and examples are not very clear about this scenario.
Basic questions are:
How do I write code that trains a model on multiple GPUs and from which the model can be easily restored from checkpoints?
How do I deal with the situation where my collaborators have a different number of GPUs? More precisely, what best practices should I follow to ensure that the code and model I share with them are easily usable?
Are there examples or best practices that other TensorFlow users facing the same situation can share?
NOTE: I am not looking for a ready-made solution. My primary purpose is to understand a TensorFlow feature that is not very well documented.
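For illustration, here is a minimal sketch of one way to get this behavior in TF 2.x with tf.distribute.MirroredStrategy, which picks up however many GPUs the current machine exposes, so the same script and checkpoints work for one GPU or many (a sketch under those assumptions, not a definitive recipe):

```python
import tensorflow as tf

# MirroredStrategy replicates the model across all visible GPUs by default;
# on a single-GPU (or CPU-only) machine it degrades gracefully, so the code
# need not change between collaborators.
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():  # variables created here are mirrored across devices
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

# Weights-only checkpoints are device-count agnostic: a model trained on
# four GPUs restores cleanly on a one-GPU machine.
model.save_weights("shared_model.ckpt")
# On the collaborator's machine (any GPU count):
# model.load_weights("shared_model.ckpt")
```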
I would like to serve roughly 600 models with TensorFlow Serving.
I am trying to find a solution to eventually reduce the number of models:
My models have the same architecture; only the weights change.
Is it possible to load only one model and change the weights?
Would it be possible to aggregate all those models together so that, effectively, the first input of the model would be a model ID together with the input features for that model?
Has anyone tried running a couple of hundred models on one machine? I found this Cortex solution, but wanted to avoid using another technology.
https://towardsdatascience.com/how-to-deploy-1-000-models-on-one-cpu-with-tensorflow-serving-ec4297bff54b
If the models have the same architecture but different weights, you can try merging all those models into a "super model". However, I would need to know more about the task to see whether that's possible.
To serve 600 models, you would need a very powerful machine and a lot of memory (depending on how big your models are and how much you use them in parallel).
You can either run TF Serving yourself, or use a provider such as Inferrd.com, Google, or AWS.
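To make the "super model" idea concrete, here is a minimal sketch assuming each original model is a single dense layer (a real architecture would need one stacked weight tensor per layer); the random stacked weights below are stand-ins for values collected from the 600 checkpoints:

```python
import numpy as np
import tensorflow as tf

N_MODELS, IN_DIM, OUT_DIM = 600, 32, 1

# Stand-ins for weights gathered from the individual checkpoints.
stacked_w = tf.constant(np.random.randn(N_MODELS, IN_DIM, OUT_DIM).astype("float32"))
stacked_b = tf.constant(np.random.randn(N_MODELS, OUT_DIM).astype("float32"))

@tf.function
def super_model(model_id, features):
    """model_id: [batch] int32, features: [batch, IN_DIM] float32."""
    w = tf.gather(stacked_w, model_id)           # [batch, IN_DIM, OUT_DIM]
    b = tf.gather(stacked_b, model_id)           # [batch, OUT_DIM]
    return tf.einsum("bi,bio->bo", features, w) + b

# One served model now answers for any of the 600 by routing on the ID.
out = super_model(tf.constant([0, 599]), tf.zeros([2, IN_DIM]))
```

This trades 600 servables for one, at the cost of holding every weight stack in memory at once.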
I am new to the Transformers concept, and I am going through some tutorials and writing my own code to understand question answering on the SQuAD 2.0 dataset using transformer models. On the Hugging Face website, I came across two different links:
https://huggingface.co/models
https://huggingface.co/transformers/pretrained_models.html
I want to know the difference between these two pages. Does one link have just pre-trained models while the other has both pre-trained and fine-tuned models?
Now, if I want to take, say, an ALBERT model for question answering, train it on my SQuAD 2.0 training dataset, and evaluate the model, which of the links should I go to?
I would formulate it like this:
The second link basically describes "community-accepted models", i.e., models that serve as the basis for the implemented Hugging Face classes, like BERT, RoBERTa, etc., and some related models that have high acceptance or have been peer-reviewed.
This list has been around much longer, whereas the list in the first link was only recently introduced directly on the Hugging Face website, where the community can upload arbitrary checkpoints that are simply considered "compatible" with the library. Oftentimes, these are additional models trained by practitioners or other volunteers, with task-specific fine-tuning. Note that all models from /pretrained_models.html are also included in the /models interface.
If you have a very narrow use case, you might as well check whether there is already a model that has been fine-tuned on your specific task. In the worst case, you'll simply end up with the base model anyway.
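As a minimal sketch of that shortcut, any checkpoint from the /models hub loads through the Auto classes; the checkpoint name below is only an illustrative assumption, so search the hub for an ALBERT model fine-tuned on SQuAD 2.0:

```python
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

# Hypothetical hub checkpoint name; replace with whatever a search on
# https://huggingface.co/models turns up for "albert squad2".
name = "twmkn9/albert-base-v2-squad2"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForQuestionAnswering.from_pretrained(name)
```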
I'm trying to train a model for a sentence classification task. The input is a sentence (a vector of integers) and the output is a label (0 or 1). I've seen some articles here and there about using BERT and GPT-2 for text classification tasks. However, I'm not sure which one I should pick to start with. Which of these recent NLP models, the original Transformer, BERT, GPT-2, or XLNet, would you use to start with? And why? I'd rather implement it in TensorFlow, but I'm flexible about going with PyTorch too.
Thanks!
It highly depends on your dataset, and it is part of the data scientist's job to find which model is more suitable for a particular task in terms of the selected performance metric, training cost, model complexity, etc.
When you work on the problem, you will probably test all of the above models and compare them. Which one should you choose first? Andrew Ng in "Machine Learning Yearning" suggests starting with a simple model so you can quickly iterate and test your ideas, data-preprocessing pipeline, etc.:
Don’t start off trying to design and build the perfect system. Instead, build and train a basic system quickly—perhaps in just a few days.
Following this suggestion, you can start with a simpler model such as ULMFiT as a baseline, verify your ideas, and then move on to more complex models and see how they improve your results.
Note that modern NLP models contain a large number of parameters, and it is difficult to train them from scratch without a large dataset. That's why you may want to use transfer learning: you can download a pre-trained model, use it as a basis, and fine-tune it on your task-specific dataset to achieve better performance and reduce training time.
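A minimal sketch of that transfer-learning recipe, assuming the Hugging Face transformers library and using bert-base-uncased purely as an example starting point (toy data and one epoch, just to show the shape of the code):

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

# Start from a pre-trained encoder and fine-tune a 2-label classifier head.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = TFAutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

sentences = ["a great movie", "a terrible movie"]   # toy stand-in data
labels = tf.constant([1, 0])
enc = tokenizer(sentences, padding=True, return_tensors="tf")

model.compile(
    optimizer=tf.keras.optimizers.Adam(2e-5),       # small LR for fine-tuning
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
model.fit(dict(enc), labels, epochs=1)
```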
I agree with Max's answer, but if the constraint is to use a state-of-the-art large pre-trained model, there is a really easy way to do this: the library by HuggingFace called pytorch-transformers. Whether you choose BERT, XLNet, or whatever, the models are easy to swap out. Here is a detailed tutorial on using that library for text classification.
EDIT: I just came across this repo, pytorch-transformers-classification (Apache 2.0 license), which is a tool for doing exactly what you want.
Well, as others mentioned, it depends on the dataset; multiple models should be tried and the best one chosen.
However, sharing my experience: XLNet beats all other models so far by a good margin. Hence, if learning is not the objective, I would simply start with XLNet, then try a few more down the line, and conclude. It just saves time in exploring.
The repo below is excellent for doing all this quickly. Kudos to them.
https://github.com/microsoft/nlp-recipes
It uses Hugging Face transformers and makes them dead simple. 😃
I have used XLNet, BERT, and GPT-2 for summarization tasks (English only). Based on my experience, GPT-2 works best among all three on short, paragraph-size notes, while BERT performs better for longer texts (up to 2-3 pages). You can use XLNet as a benchmark.
Is there any equivalent or alternative library to fastai in TensorFlow for easier training and debugging of deep learning models, including analysis of a trained model's results?
fastai is built on top of PyTorch; I am looking for a similar library for TensorFlow.
The obvious choice would be to use tf.keras.
It is bundled with TensorFlow and is becoming its official "high-level" API, to the point where in TF 2 you would probably need to go out of your way to avoid using it at all.
It is clearly the source of inspiration for fastai, which aims to ease the use of PyTorch as Keras does for TensorFlow, as mentioned by the authors time and again:
Unfortunately, Pytorch was a long way from being a good option for part one of the course, which is designed to be accessible to people with no machine learning background. It did not have anything like the clear simple API of Keras for training models. Every project required dozens of lines of code just to implement the basics of training a neural network. Unlike Keras, where the defaults are thoughtfully chosen to be as useful as possible, Pytorch required everything to be specified in detail. However, we also realised that Keras could be even better. We noticed that we kept on making the same mistakes in Keras, such as failing to shuffle our data when we needed to, or vice versa. Also, many recent best practices were not being incorporated into Keras, particularly in the rapidly developing field of natural language processing. We wondered if we could build something that could be even better than Keras for rapidly training world-class deep learning models.
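For a flavor of what that high-level API looks like in practice, here is the standard tf.keras MNIST quickstart pattern, shown only as an illustration of how little code a basic training run takes:

```python
import tensorflow as tf

# Load data, build, compile, and fit in a handful of lines.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=2, validation_data=(x_test, y_test))
```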
The TensorFlow text summarization model described at https://github.com/tensorflow/models/tree/master/textsum requires a multi-GPU architecture in order to train. My repeated attempts at training the model have resulted in memory exceptions and the machine crashing for various reasons. Is the trained summarization model available, so that it can be used without the need for training? The model is trained on the Gigaword dataset, which is not free; if the trained model is not available from Google, is this a factor in the reason why?
So as far as I can tell, no one has put the referenced trained model out there. I too was originally running into memory issues on my MacBook Pro and eventually ended up using my gaming laptop, which had a much better GPU.
The other option, of course, is to take advantage of AWS and use something like their g2.2xlarge instance. They also have P2 instances, but I have not checked those out yet.
With regards to the Gigaword dataset, it simply comes down to licensing. It is not a free license from LDC, and often the academics working on this have the dataset provided to them via their universities or companies. I have not had luck finding it; however, LDC did get back to me and advised that they have other article datasets with a price tag of around $300, which is much more reasonable for those of us just trying to learn TF. That said, if you don't want to buy anything, you can always write your own page scraper and format the data for the textsum model. https://github.com/tensorflow/models/pull/379/files
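A rough sketch of that formatting step, assuming you already have (article, abstract) string pairs from your scraper; the length-prefixed tf.Example layout below follows my reading of textsum's data_convert_example.py, so double-check it against the repo:

```python
import struct
from tensorflow.core.example import example_pb2

def write_textsum_binary(pairs, path):
    """Write (article, abstract) string pairs in textsum's binary format."""
    with open(path, "wb") as f:
        for article, abstract in pairs:
            ex = example_pb2.Example()
            ex.features.feature["article"].bytes_list.value.append(article.encode())
            ex.features.feature["abstract"].bytes_list.value.append(abstract.encode())
            data = ex.SerializeToString()
            f.write(struct.pack("q", len(data)))           # 8-byte length prefix
            f.write(struct.pack("%ds" % len(data), data))  # serialized Example

# Hypothetical usage with scraped pairs:
write_textsum_binary([("some article text", "its abstract")], "train-0")
```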
Hope this helps some. Good luck!