Does spaCy use dev-data to tune hyper-parameters? Or dev-data is totally out of the training process, and so equivalent to test data?
Following the standard greatly explained here, validation data and test data are different. Please is someone can clarify which is the case for spaCy under the referred standard. Thanks a lot.
The spacy core library does not do any hyperparameter tuning. For spacy train, the dev data is used for the evaluation displayed during training, to select the best model, and for early stopping (the early stopping setting is called patience).
Related
For some reason, I get wildly different loss and acc when I evaluate my BERT test set right after training vs. when I load from a saved checkpoint. I thought it might have been my adaptation of BERT, so I tried modifying the run_classifier.py script as little as possible to fit my use case, and I still am seeing this problem.
The only reason I can think of is that the model isn't loading correctly, but I don't know how to fix it. I believe I'm loading how originally intended. For the init_checkpoint parameter, I pass path/to/classifier/model.ckpt-{last_step}. There are three model files (meta, index, data) but there are also the checkpoint, events, and graph files. Do I need to be doing something with those other three files as well? I'm used to using keras, and this pure tensorflow saving/loading process seems unnecessarily convoluted to me.
Thank you in advance for any help/insight regarding BERT or pure tf saving/loading! If you're unfamiliar with BERT, here's the github link: BERT GitHub
I'm trying to train a model for a sentence classification task. The input is a sentence (a vector of integers) and the output is a label (0 or 1). I've seen some articles here and there about using Bert and GPT2 for text classification tasks. However, I'm not sure which one should I pick to start with. Which of these recent models in NLP such as original Transformer model, Bert, GPT2, XLNet would you use to start with? And why? I'd rather to implement in Tensorflow, but I'm flexible to go for PyTorch too.
Thanks!
It highly depends on your dataset and is part of the data scientist's job to find which model is more suitable for a particular task in terms of selected performance metric, training cost, model complexity etc.
When you work on the problem you will probably test all of the above models and compare them. Which one of them to choose first? Andrew Ng in "Machine Learning Yearning" suggest starting with simple model so you can quickly iterate and test your idea, data preprocessing pipeline etc.
Don’t start off trying to design and build the perfect system.
Instead, build and train a basic system quickly—perhaps in just a few
days
According to this suggestion, you can start with a simpler model such as ULMFiT as a baseline, verify your ideas and then move on to more complex models and see how they can improve your results.
Note that modern NLP models contain a large number of parameters and it is difficult to train them from scratch without a large dataset. That's why you may want to use transfer learning: you can download pre-trained model and use it as a basis and fine-tune it to your task-specific dataset to achieve better performance and reduce training time.
I agree with Max's answer, but if the constraint is to use a state of the art large pretrained model, there is a really easy way to do this. The library by HuggingFace called pytorch-transformers. Whether you chose BERT, XLNet, or whatever, they're easy to swap out. Here is a detailed tutorial on using that library for text classification.
EDIT: I just came across this repo, pytorch-transformers-classification (Apache 2.0 license), which is a tool for doing exactly what you want.
Well like others mentioned, it depends on the dataset and multiple models should be tried and best one must be chosen.
However, sharing my experience, XLNet beats all other models so far by a good margin. Hence if learning is not the objective, i would simple start with XLNET and then try a few more down the line and conclude. It just saves time in exploring.
Below repo is excellent to do all this quickly. Kudos to them.
https://github.com/microsoft/nlp-recipes
It uses hugging face transformers and makes them dead simple. 😃
I have used XLNet, BERT, and GPT2 for summarization tasks (English only). Based on my experience, GPT2 works the best among all 3 on short paragraph-size notes, while BERT performs better for longer texts (up to 2-3 pages). You can use XLNet as a benchmark.
This is a newbie question for the tensorflow experts:
I reading lot of data from power transformer connected to an array of solar panels using arduinos, my question is can I use tensorflow to predict the power generation in future.
I am completely new to tensorflow, if can point me to something similar I can start with that or any github repo which is doing similar predictive modeling.
Edit: Kyle pointed me to the MNIST data, which I believe is a Image Dataset. Again, not sure if tensorflow is the right computation library for this problem or does it only work on Image datasets?
thanks, Rajesh
Surely you can use tensorflow to solve your problem.
TensorFlow™ is an open source software library for numerical
computation using data flow graphs.
So it works not only on Image dataset but also others. Don't worry about this.
And about prediction, first you need to train a model(such as linear regression) on you dataset, then predict. The tutorial code can be found in tensorflow homepage .
Get your hand dirty, you will find it works on your dataset.
Good luck.
You can absolutely use TensorFlow to predict time series. There are plenty of examples out there, like this one. And this is a really interesting one on using RNN to predict basketball trajectories.
In general, TF is a very flexible platform for solving problems with machine learning. You can create any kind of network you can think of in it, and train that network to act as a model for your process. Depending on what kind of costs you define and how you train it, you can build a network to classify data into categories, predict a time series forward a number of steps, and other cool stuff.
There is, sadly, no short answer for how to do this, but that's just because the possibilities are endless! Have fun!
I have a question related to this one:
TensorFlow in production for real time predictions in high traffic app - how to use?
I want to setup TensorFlow Serving to do inference as a service for our other application. I see how TensorFlow Serving helps me to do that. Additionally, it mentions a continuous training pipeline, which probably is related to the possibility that TensorFlow Serving can serve with multiple versions of a trained model. But what I am not sure is how to retrain your model as you get new data. The other post mentions the idea to run retraining with cron jobs. However, I am not sure if automatic retraining is a good idea. What architecture would you propose for a continuous retraining pipeline with a system continuously facing new, labelled data?
Edit: It is a supervised learning case. The question is would you automatically retrain your model after n new datapoints came in or would you retrain during the downtime of the customer automatically or just retrain manually?
You probably want to use some kind of semi-supervised training. There's fairly extensive research in that area.
A crude, but expedient way, which works well, is to use the current best models that you have to label the new, incoming data. Models are typically able to produce a score (hopefully a logprob). You can use that score to only train on the data that fits well.
That is an approach that we have used in speech recognition and is an excellent baseline.
I am facing a very strange problem where I am building an RNN model using tensorflow and then storing the model variables (all) using tf.Saver after I finish training.
During testing, I just build the inference part again and restore the variables to the graph. The restoration part does not give any error.
But when I start testing on the evaluation test, I always get same output from the inference all i.e. for all test inputs, I get the same output.
I printed the output during training and I do see that output is different for different training samples and cost is also decreasing.
But when I do testing, it always gives me same output no matter what is the input.
Can someone help me to understand why this could be happening? I want to post some minimal example but as I am not getting any error, I am not sure what should I post here. I will be happy to share more information if it can help the issue.
One difference I have between the inference graph during training and testing is the number of time steps in RNN. During training I train for n steps (n = 20 or more) for a batch before updating gradients while for testing I just use one step as I only want to predict for that input.
Thanks
I have been able to resolve this issue. This seemed to be happening as one of my input feature was very dominant in its original values due to which after some operations all values were converging to single number.
Scaling that feature has helped to resolve this.
Thanks
Can you create a small reproducible case and post this as a bug to https://github.com/tensorflow/tensorflow/issues ? That will help this question get attention from the right people.