I have a couple of questions regarding the logic behind multivariate, multi-step time series forecasting. To better illustrate, I am going to use an example:
Imagine I want to predict future closing prices for a stock given the past 60 days of data for closing price, opening price, and volume for that stock. I want to predict the closing price for the next 1, 2, 3, 4, and 5 days. My questions are as follows:
During training, should I give as input all features (closing price, opening price, volume) for the past 60 days and make the model predict all features one day into the future? Mind you, the only thing I care about is the future closing price, so I suspect this strategy might dilute the loss function with errors on features I don't need. Should I instead give the same input, but make the model predict the closing price for the next 1-5 days? In that case, the loss function will only take the closing price into account, but the model won't use predicted values to predict later values, e.g., use the prediction for day 1 to produce the prediction for day 2. Which is best here?
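To make the second option concrete, here is a minimal sketch of how the training samples could be sliced: all three features for 60 days as input, and only the closing price for the following 5 days as target. The function name `make_windows`, the column order (close first), and the random data are assumptions made for illustration, not part of the original setup.

```python
import numpy as np

def make_windows(data, lookback=60, horizon=5, target_col=0):
    """Slice a (time, features) array into supervised samples.

    X[i] holds all features for `lookback` consecutive days;
    y[i] holds only the target column for the following `horizon` days.
    """
    X, y = [], []
    for start in range(len(data) - lookback - horizon + 1):
        end = start + lookback
        X.append(data[start:end, :])
        y.append(data[end:end + horizon, target_col])
    return np.asarray(X), np.asarray(y)

# Synthetic stand-in for (close, open, volume); not real market data.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 3))

X, y = make_windows(data)
print(X.shape, y.shape)
```

With targets shaped like this, a loss such as MSE is computed on the closing price only, while the inputs still carry all features.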
I know there is no clear-cut answer for this, but I wonder which model architecture is best suited for this purpose. In particular, I wonder how many neurons to include in the LSTM layers, and how many LSTM layers to use. Also, is it always a good idea to have the final layer be a Dense layer with the number of neurons equal to the number of outputs you're expecting from the model?
Related
I am currently training a time-series LSTM with daily timesteps between 2005 and 2019. My model includes hourly data as separate features, plus sin/cos cyclical encoding of the day of the year and one-hot encoding of the day of the week. I have found that if I also include a straight-line feature in the model, i.e. just start the first date in the data as a 1 and add one to every day until the end of the dataset, then scale it, the predictions greatly improve on the testing data. Has anybody else used a linear feature like this before? Is there any precedent for it?
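For reference, the feature set described above could be built roughly as follows. This is only a sketch: the exact date range, the 365.25-day period, the 80/20 split, and scaling the trend with training-period statistics are assumptions, not details given in the question.

```python
import datetime as dt
import numpy as np

# Daily dates from 2005 through 2019 (assumed inclusive range).
start, end = dt.date(2005, 1, 1), dt.date(2019, 12, 31)
n_days = (end - start).days + 1
dates = [start + dt.timedelta(days=i) for i in range(n_days)]

# Cyclical encoding of the day of the year.
day_of_year = np.array([d.timetuple().tm_yday for d in dates])
doy_sin = np.sin(2 * np.pi * day_of_year / 365.25)
doy_cos = np.cos(2 * np.pi * day_of_year / 365.25)

# One-hot encoding of the day of the week (Monday = 0).
dow_onehot = np.eye(7)[[d.weekday() for d in dates]]

# Straight-line feature: 1, 2, 3, ... then min-max scaled using
# the training portion only, so no test information leaks in.
trend = np.arange(1, n_days + 1, dtype=float)
n_train = int(0.8 * n_days)
t_min, t_max = trend[:n_train].min(), trend[:n_train].max()
trend_scaled = (trend - t_min) / (t_max - t_min)

features = np.column_stack([doy_sin, doy_cos, dow_onehot, trend_scaled])
print(features.shape)
```

Note that with this scaling the trend exceeds 1 in the test period, so the network sees values outside the range it was trained on.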
I have a reasonably well-working neural net that uses mostly LSTM, Dropout and Dense layers. I usually use it for sales prediction only, but now I'd like to train and predict with datasets of different shapes.
I have several columns showing marketing spend per channel, as well as sales for different products. Below you find an image illustrating the dataset. The orange data (marketing channels and product sales) are supposed to be the training data. When I do a many-to-many prediction, I could just forecast all the columns, as I do when I have a dataset containing only sales.
However, I already know the future marketing spend, because it is planned ahead. For that I could just use pystats (OLS, for example), but LSTMs are really good at remembering past marketing spend and sales.
Actual Question:
Is there a way to use a TensorFlow neural net with a different input shape for training and test data? Test data in this case would be either actual test data or the actual future.
Or any other comparable model? Unfortunately, I have not found any solution during my research.
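One possible framing, sketched below under assumptions, is to give the model two inputs with fixed shapes: a past window of all columns, and a future window of the marketing columns only. The shapes are then the same at training and at forecast time, because the planned spend fills the second input. The helper `make_samples` and all sizes here are hypothetical, and the arrays are random placeholders.

```python
import numpy as np

def make_samples(marketing, sales, lookback, horizon):
    """Build (past window, known future marketing, future sales) triples."""
    full = np.hstack([marketing, sales])
    past, future_mkt, target = [], [], []
    for t in range(lookback, len(full) - horizon + 1):
        past.append(full[t - lookback:t])         # all columns, past
        future_mkt.append(marketing[t:t + horizon])  # planned spend
        target.append(sales[t:t + horizon])       # what we forecast
    return np.asarray(past), np.asarray(future_mkt), np.asarray(target)

rng = np.random.default_rng(0)
marketing = rng.uniform(size=(100, 3))  # 3 hypothetical channels
sales = rng.uniform(size=(100, 2))      # 2 hypothetical products

past, future_mkt, target = make_samples(marketing, sales, lookback=12, horizon=4)

# At forecast time the two inputs have the same per-sample shapes:
latest_past = np.hstack([marketing, sales])[-12:][None]
planned_spend = rng.uniform(size=(1, 4, 3))  # placeholder for the real plan
print(past.shape, future_mkt.shape, target.shape)
```

A two-input network (for example, one recurrent branch per input, merged before the output layer) could then consume these arrays; that part is not shown here.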
Thanks for your time.
I am trying to do multi-step (i.e., sequence-to-sequence) forecasts for product sales using both (multivariate) sequential and non-sequential inputs.
Specifically, I am using sales numbers as well as some other sequential inputs (e.g., price, is day before holiday, etc...) of the past n days to predict the sales for future m days. Additionally, I have some non-sequential features characterizing the product itself.
Definitions:
n_seq_features <- number of sequential features (in the multivariate time-series) including sales
n_non_seq_features <- number of non-sequential features characterizing a product
I got as far as building a hybrid-model, where first the sequential input is passed through some LSTM layers. The output of the final LSTM layer is then concatenated with the non-sequential features and fed into some dense layers.
What I can't quite get my head around, though, is how to feed in the future sequential input (everything except the sales numbers for the following m days) in a way that efficiently uses the sequential information (i.e., causality, etc.). For m=1, I can simply input the sequential data for this one day together with the non-sequential input after the LSTM layers; however, as soon as m becomes greater than 1, this appears to waste causal information.
The only ways I could think of were:
to incorporate the sequential information for future m days as features in the LSTM input blowing up the input shape from (..., n, n_seq_features) to (..., n, n_seq_features + m*(n_seq_features-1))
add a separate LSTM branch handling the future data, the output of which is then 'somehow' fed into the dense layers at the last stage of the model
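The two options above differ mainly in how the future data is laid out. A small sketch of the resulting shapes for a single sample follows; the sizes and the assumption that sales sits in column 0 are made up for illustration.

```python
import numpy as np

n, m = 30, 7            # past days, future days (illustrative sizes)
n_seq_features = 4      # column 0 is assumed to be sales

rng = np.random.default_rng(1)
series = rng.normal(size=(n + m, n_seq_features))

past = series[:n]                # (n, n_seq_features)
future_known = series[n:, 1:]    # (m, n_seq_features - 1), sales dropped

# Option 1: flatten the future block and repeat it at every past step,
# giving (n, n_seq_features + m * (n_seq_features - 1)).
future_flat = future_known.reshape(-1)
option1 = np.concatenate([past, np.tile(future_flat, (n, 1))], axis=1)

# Option 2: keep the future block as its own sequence for a second
# LSTM branch; its time axis is preserved.
option2 = (past, future_known)

print(option1.shape, option2[0].shape, option2[1].shape)
```

Option 1 repeats the same future values at all n steps and loses their ordering in time, whereas option 2 keeps them as an ordered sequence of length m.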
I only started using LSTM networks a while ago so I unfortunately have only limited intuition on how they are best utilized (especially in hybrid approaches). For this reason, I would like to ask:
Is the general approach of injecting sequential and non-sequential input at different stages of the same model (i.e., trained concurrently) useful or would one rather split it into separate models which can be trained independently for more fine-grained control?
How is future sequential input injected into an LSTM network to preserve causal information? Can this be achieved with a high-level frontend like Keras, or does it require a very deep dive into the TensorFlow backend?
Are LSTM networks not the way to go for this specific problem in the first place?
Cheers and thanks in advance for any advice, resources or thoughts on the matter. :)
In case someone is having a similar issue with future sequential (or temporal) data, University of Oxford and Google Cloud AI have come up with a new architecture to handle all three types of input (past temporal, future temporal as well as static). It is called Temporal Fusion Transformer and, at least from reading the paper, looks like a neat fit. However, I have yet to implement and test it. There is also a PyTorch Tutorial available.
I am working on a project for price movement forecasting and I am stuck with poor quality predictions.
At every time-step I am using an LSTM to predict the next 10 time-steps. The input is the sequence of the last 45-60 observations. I tested several different ideas, but they all seem to give similar results. The model is trained to minimize MSE.
For each idea I tried a model predicting 1 step at a time, where each prediction is fed back as an input for the next prediction, and a model directly predicting the next 10 steps (multiple outputs). For each idea I also tried using just the moving average of the previous prices as input, and extending the input to include the order book at those time-steps.
Each time-step corresponds to a second.
These are the results so far:
1) The first attempt was to use the moving average of the last N steps as input and predict the moving average of the next 10.
At time t, I use the ground truth value of the price and use the model to predict t+1....t+10
This is the result
Predicting moving average
On closer inspection we can see what's going wrong:
The prediction seems to be a flat line that barely depends on the input data.
2) The second attempt was to predict differences instead of the price itself. This time the input, instead of simply being X[t] (where X is my input matrix), is X[t]-X[t-1].
This did not really help.
The plot this time looks like this:
Predicting differences
But on close inspection, when plotting the differences, the predictions are always basically 0.
Plot of differences
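One way to quantify whether a forecast is doing anything beyond repeating the last observed value is to compare its MSE with that of a persistence baseline. The sketch below uses a synthetic random walk, and `model_pred` is a made-up placeholder (baseline plus small noise) standing in for real model output.

```python
import numpy as np

def persistence_forecast(last_values, horizon):
    """Repeat the last observed value for every future step."""
    return np.repeat(last_values[:, None], horizon, axis=1)

def mse(a, b):
    return float(np.mean((a - b) ** 2))

rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.normal(scale=0.1, size=1000))  # synthetic

lookback, horizon = 60, 10
idx = np.arange(lookback, len(prices) - horizon + 1)
y_true = np.stack([prices[i:i + horizon] for i in idx])
baseline = persistence_forecast(prices[idx - 1], horizon)

# Placeholder for real model predictions (assumption for this demo).
model_pred = baseline + rng.normal(scale=0.01, size=baseline.shape)

# Skill > 0: better than persistence; near 0: no better; < 0: worse.
skill = 1.0 - mse(y_true, model_pred) / mse(y_true, baseline)
print(round(skill, 4))
```

A skill score close to zero indicates that the model adds little over simply carrying the last value forward.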
At this point, I am stuck and running out of ideas to try. I was hoping someone with more experience with this type of data could point me in the right direction.
Am I using the right objective to train the model? Are there any details when dealing with this type of data that I am missing?
Are there any "tricks" to prevent your model from always predicting values similar to what it last saw? (Such predictions do incur low error, but they become meaningless at that point.)
At least just a hint on where to dig for further info would be highly appreciated.
Thanks!
Am I using the right objective to train the model?
Yes, but LSTMs are always very tricky for forecasting time series, and they are very prone to overfitting compared to other time series models.
Are there any details when dealing with this type of data that I am missing?
Are there any "tricks" to prevent your model from always predicting similar values to what it last saw?
I haven't seen your code or the details of the LSTM you are using. Make sure you are using a very small network and that you are avoiding overfitting. Also make sure that, after differencing the data, you reintegrate it before evaluating the final forecast.
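As a minimal illustration of the reintegration step (the helper name `reintegrate` and the toy prices are assumptions), predicted differences are turned back into levels by a cumulative sum added to the last known level:

```python
import numpy as np

def reintegrate(last_level, predicted_diffs):
    """Turn predicted differences back into price levels."""
    return last_level + np.cumsum(predicted_diffs)

prices = np.array([100.0, 101.0, 100.5, 102.0, 103.0])
diffs = np.diff(prices)  # what a differenced model would be trained on

# Sanity check with the true differences: levels are recovered exactly.
recovered = reintegrate(prices[0], diffs)
print(recovered)

# An all-zero difference forecast is a flat line at the last level.
flat = reintegrate(prices[-1], np.zeros(10))
```

This also shows why difference predictions near zero look like a flat line once they are mapped back to the price scale.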
One trick is to build a model that forecasts 10 steps ahead directly, instead of building a one-step-ahead model and then forecasting recursively.
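The difference between the two strategies can be sketched with a plain linear least-squares model standing in for the LSTM (an assumption made to keep the example small; the series is synthetic and the helper names are hypothetical):

```python
import numpy as np

def windows(series, lookback, horizon):
    n = len(series) - lookback - horizon + 1
    X = np.stack([series[i:i + lookback] for i in range(n)])
    Y = np.stack([series[i + lookback:i + lookback + horizon] for i in range(n)])
    return X, Y

def fit_linear(X, Y):
    Xb = np.hstack([X, np.ones((len(X), 1))])  # add bias column
    W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
    return W

def predict_linear(W, X):
    return np.hstack([X, np.ones((len(X), 1))]) @ W

def recursive_forecast(W_one, window, horizon):
    """Predict one step, append it to the window, repeat."""
    window, out = window.copy(), []
    for _ in range(horizon):
        nxt = predict_linear(W_one, window[None, :])[0, 0]
        out.append(nxt)
        window = np.append(window[1:], nxt)
    return np.array(out)

rng = np.random.default_rng(0)
series = np.sin(0.1 * np.arange(400)) + rng.normal(scale=0.05, size=400)
lookback, horizon = 20, 10

X, Y = windows(series, lookback, horizon)
W_direct = fit_linear(X, Y)      # one model, 10 outputs at once
W_one = fit_linear(X, Y[:, :1])  # one-step-ahead model

last_window = series[-lookback:]
direct = predict_linear(W_direct, last_window[None, :])[0]
recursive = recursive_forecast(W_one, last_window, horizon)
```

In the recursive version, each step's error is fed back in as input to the next step, while the direct version is trained on every horizon explicitly.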
I am new to tensorflow/tflearn and deep learning so these may be basic questions but I would appreciate any input.
Question 1: I have been able to successfully run an LSTM model using tflearn on 2 years of time series data. I can run the model with different values of "look_back" (e.g. 1 day, 7 days, 30 days), but it outputs a single value at each iteration. Running the LSTM with a larger look-back improves the RMSE on my test data set. My question is: if my goal is to predict the "next 30 days" given a set of historical daily values, how do I modify the model? I presume I need to either modify my OUTPUT tensor to be a sequence, or somehow feed the decoder output at each iteration back in as the input to the next. Or do I modify the model to output a full sequence? I cannot find any clear example of how this may be done.
Question 2: After a model is trained, how exactly do you productionize it? Suppose in my case I trained/tested a model using a year of data to predict the next 30 days. How exactly can I now implement this so that, as I get daily values, they get integrated with the model? Again, any example of this would be great.
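For Question 2, one common pattern is to keep a rolling buffer of the most recent values and re-run the trained model whenever a new value arrives. The sketch below assumes this pattern; `RollingForecaster` and `placeholder_predict` are hypothetical names, and the placeholder simply repeats the last value where a trained model's predict call would go.

```python
from collections import deque
import numpy as np

class RollingForecaster:
    """Keep the latest `lookback` values and forecast on demand."""

    def __init__(self, predict_fn, lookback):
        self.predict_fn = predict_fn
        self.buffer = deque(maxlen=lookback)  # oldest values drop off

    def update(self, value):
        self.buffer.append(float(value))

    def ready(self):
        return len(self.buffer) == self.buffer.maxlen

    def forecast(self):
        if not self.ready():
            raise ValueError("not enough history yet")
        window = np.array(list(self.buffer))[None, :]  # batch of one
        return self.predict_fn(window)[0]

def placeholder_predict(window, horizon=30):
    # Stand-in for a trained model: repeats the last observed value.
    return np.repeat(window[:, -1:], horizon, axis=1)

forecaster = RollingForecaster(placeholder_predict, lookback=7)
for daily_value in range(1, 11):  # values arriving one per day
    forecaster.update(daily_value)

forecast = forecaster.forecast()
print(forecast.shape)
```

Any scaling applied during training would need to be applied to the buffer in the same way before calling the model; retraining on a schedule is a separate decision.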
I have tried to go through the tensorflow tutorials but I am not sure they address these points.
Thanks