I want to create a model that start with some features (like mean, std and more) and it ends to recreate the signal. Looking on the internet i understood that if i want to create this kind of model i have to use LSTM, but the only example that i find (for time series) are forecasting models.
I tried to create a model with a single LSTM neuron, and as input the n features but it didnt give me the hopped results. My questions are : Can you suggest me a paper that can explain me better what i want to accomplish ? There are other different ways to proceed ?
Related
I'm a beginner on time series analysis with deep learning, and I have been searching for examples with LSTM in which more than one series (for example one for each city or place) is trained to avoid fitting a model for each one. The main benefit of course is that you have more training data and less computational costs. I have found an interesting code to help modeling this problem with conditional/temporally-static variables (it's called cond-rnn). But wherever I search, it's not clear to me some issues regarding sorting the inputs appropriately.
The context is that I have a target and a set of autoregressive inputs (features, lags, timesteps, wherever you call it), in which data from different series are stack together. RF and GB are outperforming LSTM on this task (with overfitting, even when I use 100k+ samples, dropout or regularization), and I'm not sure if I'm using it appropriately.
It is wrong to stack series together and have the inputs-targets randomly sorted (as in the figure)? Does the LSTM need to receive the inputs temporally sorted?
If they need so, do you have any advice on how to deal with the problem of providing new series (that start from the first time period) to the LSTM training? This answer to a similar problem (but another perspective) suggest to pick "places" as an input column, but I don't think this answer help the questions here I posed.
I am using huggingface transformer models for text-summarization.
Currently I am testing different models such as T5 and Pegasus.
Now these models were trained for summarizing Big Texts into very short like a maximum of two sentences. Now I have the task, that I want summarizations, that are about half the size of the text, ergo the generated summaries are too small for my purpose.
My question now is, if there is a way to tell the model that another sentence came before?
Kind of similar to the logic inside stateful RNNs (although I know they work completly different).
If yes, I could summarize small windows over the sentences always with the information which content came before.
Is that just a thing of my mind? I cant believe that I am the only one, who wants to create shorter summaries, but not only 1 or two sentence long ones.
Thank you
Why not transfer learning? Train them on your specific texts and summaries.
I trained T5 on specific limited text over 5 epoch and got very good results. I adopted the code from here to my needs https://github.com/patil-suraj/exploring-T5/blob/master/t5_fine_tuning.ipynb
Let me know if you have a specific training questions.
I'm trying to train a model for a sentence classification task. The input is a sentence (a vector of integers) and the output is a label (0 or 1). I've seen some articles here and there about using Bert and GPT2 for text classification tasks. However, I'm not sure which one should I pick to start with. Which of these recent models in NLP such as original Transformer model, Bert, GPT2, XLNet would you use to start with? And why? I'd rather to implement in Tensorflow, but I'm flexible to go for PyTorch too.
Thanks!
It highly depends on your dataset and is part of the data scientist's job to find which model is more suitable for a particular task in terms of selected performance metric, training cost, model complexity etc.
When you work on the problem you will probably test all of the above models and compare them. Which one of them to choose first? Andrew Ng in "Machine Learning Yearning" suggest starting with simple model so you can quickly iterate and test your idea, data preprocessing pipeline etc.
Don’t start off trying to design and build the perfect system.
Instead, build and train a basic system quickly—perhaps in just a few
days
According to this suggestion, you can start with a simpler model such as ULMFiT as a baseline, verify your ideas and then move on to more complex models and see how they can improve your results.
Note that modern NLP models contain a large number of parameters and it is difficult to train them from scratch without a large dataset. That's why you may want to use transfer learning: you can download pre-trained model and use it as a basis and fine-tune it to your task-specific dataset to achieve better performance and reduce training time.
I agree with Max's answer, but if the constraint is to use a state of the art large pretrained model, there is a really easy way to do this. The library by HuggingFace called pytorch-transformers. Whether you chose BERT, XLNet, or whatever, they're easy to swap out. Here is a detailed tutorial on using that library for text classification.
EDIT: I just came across this repo, pytorch-transformers-classification (Apache 2.0 license), which is a tool for doing exactly what you want.
Well like others mentioned, it depends on the dataset and multiple models should be tried and best one must be chosen.
However, sharing my experience, XLNet beats all other models so far by a good margin. Hence if learning is not the objective, i would simple start with XLNET and then try a few more down the line and conclude. It just saves time in exploring.
Below repo is excellent to do all this quickly. Kudos to them.
https://github.com/microsoft/nlp-recipes
It uses hugging face transformers and makes them dead simple. 😃
I have used XLNet, BERT, and GPT2 for summarization tasks (English only). Based on my experience, GPT2 works the best among all 3 on short paragraph-size notes, while BERT performs better for longer texts (up to 2-3 pages). You can use XLNet as a benchmark.
Currently, I am working on deep neural network for image detection and I founded a model called YOLO Network, and it's very powerful to make objects detections, but I have a question:
How can we design and concept our own model? Do we use a brut force for that, for example "I use 2 convolutional and 1 pooling layer and 1 fully connected layer" after that if the result is'nt good I change the number of layers and change the parameter until I find the best model, Please if there is anyone who knows some informations about that, show me how ?
I use Tensorflow.
Thanks,
There are a couple of papers addressing this issue. For example in http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Szegedy_Rethinking_the_Inception_CVPR_2016_paper.pdf some general principles are mentioned, like preserving information by not having too rapid changes in any cut of the graph seperating the output from the input.
Another paper is https://arxiv.org/pdf/1606.02228.pdf where specific hyperparameter combinations are tried.
The remainder are just what you observe in practice and depends on your dataset and on your requirement. Maybe you have performance requirements because you want to deploy to mobile or you need more than 90 % accuracy. Then you will have to choose your model accordingly.
I want to implement some LSTM model in Tensorflow. I think I understood the tutorials fairly well. In those input data was given in the form of words, which were embedded into a continous vector space (which has several advantages).
I now want to make an LSTM to predict a series of contionous numbers and do not know what is the best approach to that.
Should I discretize my input range, thus effectively get a classification problem with a number of classes and use the embedding desribed before, or stick to the continous numbers and do regression? In that case I just in each time step pass one feature to the model, namely the continous number?
Here's two example you may find helpful.
https://github.com/MorvanZhou/tutorials/blob/master/tensorflowTUT/tf20_RNN2.2/full_code.py
http://mourafiq.com/2016/05/15/predicting-sequences-using-rnn-in-tensorflow.html
You can just use regression. However, if your input is forever long, you need to fix size sequences.