Time-series prediction for price forecasting (problems with predictions) - TensorFlow

I am working on a project for price movement forecasting and I am stuck with poor quality predictions.
At every time-step I am using an LSTM to predict the next 10 time-steps. The input is the sequence of the last 45-60 observations. I tested several different ideas, but they all seem to give similar results. The model is trained to minimize MSE.
For each idea I tried two models: one predicting 1 step at a time, where each prediction is fed back as an input for the next prediction, and one directly predicting the next 10 steps (multiple outputs). For each idea I also tried using as input just the moving average of the previous prices, and extending the input to include the order book at those time-steps.
Each time-step corresponds to a second.
These are the results so far:
1) The first attempt used the moving average of the last N steps as input and predicted the moving average of the next 10.
At time t, I use the ground truth value of the price and use the model to predict t+1, ..., t+10.
This is the result:
[Figure: Predicting moving average]
On closer inspection we can see what's going wrong:
The prediction seems to be a flat line; it does not depend much on the input data.
2) The second attempt was to predict differences instead of the price itself. This time the input, instead of simply being X[t] (where X is my input matrix), is X[t]-X[t-1].
This did not really help.
The plot this time looks like this:
[Figure: Predicting differences]
But on closer inspection, when plotting the differences, the predictions are basically always 0.
[Figure: Plot of differences]
At this point I am stuck and running out of ideas to try. I was hoping someone with more experience with this type of data could point me in the right direction.
Am I using the right objective to train the model? Are there any details when dealing with this type of data that I am missing?
Are there any "tricks" to prevent your model from always predicting similar values to what it last saw? (Such predictions do incur low error, but they become meaningless at that point.)
At least just a hint on where to dig for further info would be highly appreciated.

Am I using the right objective to train the model?
Yes, but LSTMs are tricky for forecasting time series, and they are much more prone to overfitting than other time-series models.
Are there any details when dealing with this type of data that I am missing?
Are there any "tricks" to prevent your model from always predicting similar values to what it last saw?
I haven't seen your code or the details of the LSTM you are using. Make sure you are using a very small network and that you are avoiding overfitting. Also make sure that, after differencing the data, you reintegrate the predictions before evaluating the final forecast.
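To make the differencing and reintegration step concrete, here is a minimal sketch in plain Python. The helper names `difference` and `reintegrate` and the numbers are made up for illustration; the predicted differences stand in for whatever the model outputs.

```python
from itertools import accumulate

def difference(prices):
    """Return first differences: d[i] = prices[i + 1] - prices[i]."""
    return [b - a for a, b in zip(prices, prices[1:])]

def reintegrate(last_price, predicted_diffs):
    """Turn predicted differences back into price levels.

    Each forecast level is the last observed price plus the running
    sum of the predicted differences up to that step.
    """
    return [last_price + s for s in accumulate(predicted_diffs)]

prices = [100.0, 101.0, 103.0, 102.0, 104.0]
diffs = difference(prices)  # this is what the model would be trained on

# Hypothetical model output: predicted differences for t+1, t+2, t+3.
predicted_diffs = [0.5, -0.25, 1.0]
forecast = reintegrate(prices[-1], predicted_diffs)
print(forecast)
```

The error should then be measured between `forecast` and the true prices, not between the predicted and true differences, since near-zero differences can look deceptively accurate.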
One trick to try is to build a model that forecasts 10 steps ahead directly, instead of building a one-step-ahead model and then forecasting recursively.
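The difference between the two setups can be sketched as follows. This is only an illustration in plain Python: `direct_targets`, `recursive_forecast` and the stand-in "model" (a function that repeats the last value) are hypothetical and not taken from the question's code.

```python
def direct_targets(series, window, horizon):
    """Training pairs for a direct model: last `window` values -> next `horizon` values."""
    return [
        (series[i:i + window], series[i + window:i + window + horizon])
        for i in range(len(series) - window - horizon + 1)
    ]

def recursive_forecast(one_step_model, history, window, horizon):
    """Roll a one-step model forward, feeding each prediction back as input."""
    values = list(history[-window:])
    forecast = []
    for _ in range(horizon):
        next_value = one_step_model(values[-window:])
        forecast.append(next_value)
        values.append(next_value)  # errors made here propagate to later steps
    return forecast

series = list(range(10))
pairs = direct_targets(series, window=4, horizon=3)

def repeat_last(window_values):
    """Stand-in for a trained one-step model (illustration only)."""
    return window_values[-1]

rolled = recursive_forecast(repeat_last, series, window=4, horizon=3)
print(pairs[0], rolled)
```

In the direct setup the network has one output per horizon step and is trained on the `pairs`; in the recursive setup any bias in the one-step prediction is fed back in and compounds over the horizon.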


Weak accuracy with testing data

I'm dealing with a data science problem, and I ran into this issue.
I have labelled data (training data) and unlabelled data (test data), and both of them have a lot of missing values.
I worked on my data and split it into training data and validation data.
I got very good accuracy and a very small RMSE between Y_validation and the predicted values (model.predict(X_validate)). But when I submit my solution, the RMSE gets bigger on the testing data!
What can I do?
First, you need to label your test data. If your test data is not labelled, you will not be able to gauge the accuracy, and you will not get an accurate measure of the error.
The training set contains known outputs that the model learns from. The test data has to be labelled so that, when the model returns its predictions on the test data, we can check whether the model has correctly predicted the labels of the test data.
On top of doing a train/test split, you can also use cross-validation to get a more reliable estimate of performance and so help improve your model. You can read more here: https://towardsdatascience.com/train-test-split-and-cross-validation-in-python-80b61beca4b6
This sometimes happens when a model doesn't generalize well, for example when it overfits the training data.
Resampling or better sampling of test and train data (which as mentioned, needs to be labeled) can help you get a better generalized model.
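As an illustration of the cross-validation idea mentioned above, here is a minimal sketch of a k-fold split in plain Python. `k_fold_indices` is a hypothetical helper written for this example (libraries such as scikit-learn ship their own version), and it does not shuffle, which is an assumption you may want to change for non-sequential data.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, val_idx) pairs; every sample is used for validation exactly once."""
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, n_folds)
    # The first `extra` folds get one additional sample each.
    fold_sizes = [base + (1 if i < extra else 0) for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size

folds = list(k_fold_indices(n_samples=10, n_folds=3))
for train_idx, val_idx in folds:
    # Fit on train_idx, compute RMSE on val_idx, then average over the folds.
    print(len(train_idx), val_idx)
```

If the RMSE varies a lot between folds, the single validation score was probably too optimistic, which would be consistent with the gap seen at submission time.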

How can I use multiple datasets with one model in Keras?

I am trying Forex prediction with Keras and TensorFlow using an LSTM network.
Of course I want it to train on many days of trading, but to do that I would have to give it sequential data with big jumps and phases without movement while the market is closed. This isn't ideal, as the model gets "confused" by these jumps and flat phases. Alternatively, I can use one day of minute-by-minute data, but this way I have very limited training data and the model won't be very good.
Do you have ideas on how to fix this?
Here is my current code:
If you plan on fitting multiple datasets as data slices, sequentially, something like this would work:
for _ in range(10):
    # somehow cut the data into slices and fit them one by one
    model.fit(data_slice, label_slice ......)
This works because successive calls to fit train the same model incrementally.
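One possible way to do the slicing, sketched under assumptions: treat each trading day as its own contiguous series and build the input windows per day, so that no window straddles the period when the market is closed. The helper `make_windows` and the toy prices are hypothetical.

```python
def make_windows(series, window, horizon=1):
    """Build (input_window, target) pairs from ONE contiguous series."""
    pairs = []
    for start in range(len(series) - window - horizon + 1):
        x = series[start:start + window]
        y = series[start + window + horizon - 1]
        pairs.append((x, y))
    return pairs

# Hypothetical data: each inner list is one trading day of prices.
days = [
    [1.10, 1.11, 1.12, 1.13, 1.14],
    [1.30, 1.29, 1.28, 1.27],
]

# Window each day on its own, then pool the samples from all days.
samples = [pair for day in days for pair in make_windows(day, window=3)]
for x, y in samples:
    print(x, "->", y)
```

The pooled samples can then be passed to a single fit call, or each day's samples can be fitted in turn as in the loop above; either way the model never sees the overnight jump inside an input window.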

Binary classification of every time series step based on past and future values

I'm currently facing a Machine Learning problem and I've reached a point where I need some help to proceed.
I have various time series of positional (x, y, z) data tracked by sensors. I've developed some more features. For example, I rasterized the whole 3D space and calculated a cell_x, cell_y and cell_z for every time step. The time series themselves have variable lengths.
My goal is to build a model which classifies every time step with the labels 0 or 1 (binary classification based on past and future values). Therefore I have a lot of training time series where the labels are already set.
One thing which could be very problematic is that very few samples in the data are labelled 1 (for example, only 3 of 800 samples).
It would be great if someone can help me in the right direction because there are too many possible problems:
Wrong hyperparameters
Incorrect model
Too few labels of 1, but I think that's not a big problem because I only need the model to suggest the right time steps. So I would only use the peaks of the output.
Bad or too little training data
Bad features
I appreciate any help and tips.
Your model seems very strange. Why use only 2 units in the LSTM layer? Also, your problem is a binary classification, so you should use only one neuron in your output layer (try inserting one additional dense layer between the LSTM layer and the output layer, and try dropout layers between them).
Binary crossentropy does not make much sense with 2 output neurons unless you have a multi-label problem. But if you switch to one output neuron, it is the right loss. You then also need sigmoid as the activation function.
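To see why a single sigmoid output pairs naturally with binary crossentropy, here is a small worked sketch in plain Python. It is not Keras code; the function names are just for illustration, and the clipping constant is an arbitrary choice.

```python
import math

def sigmoid(z):
    """Map a raw output (logit) to a probability of class 1."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_crossentropy(y_true, p, eps=1e-12):
    """Loss for one sample: -[y*log(p) + (1-y)*log(1-p)]."""
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))

p = sigmoid(0.0)                                # undecided output
loss_undecided = binary_crossentropy(1, p)
loss_confident_right = binary_crossentropy(1, 0.9)
loss_confident_wrong = binary_crossentropy(1, 0.1)
print(p, loss_undecided, loss_confident_right, loss_confident_wrong)
```

One probability is enough to describe both classes, which is why a second output neuron adds nothing for a plain binary problem.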
As a last piece of advice: try class weights.
This can make a huge difference if your labels are imbalanced.
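A rough sketch of how such weights could be computed, using the 3-in-800 ratio from the question as example data. The helper `balanced_class_weights` is hypothetical; it uses the common inverse-frequency heuristic n_samples / (n_classes * class_count), which is one reasonable choice, not the only one.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Example data mirroring the question: 3 positives out of 800 samples.
labels = [0] * 797 + [1] * 3
weights = balanced_class_weights(labels)
print(weights)
```

In Keras, a dictionary like this can be passed to `fit` through its `class_weight` argument when there is one label per sample. For one label per time step, as in this question, per-step sample weights may be needed instead, so treat this as a starting point.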
You can create the model using TensorFlow's BasicLSTMCell; the shape of your data fits it. You can find the documentation for BasicLSTMCell here, and that documentation contains code that will help you build a BasicLSTMCell model. Hope this helps, cheers.

LSTM Sequence Prediction in Keras just outputs last step in the input

I am currently working with Keras using TensorFlow as the backend. I have an LSTM sequence prediction model, shown below, that I am using to predict one step ahead in a data series (input 30 steps [each with 4 features], output predicted step 31).
def build_model():
    model = Sequential()
    # (layer definitions are not shown in the post)
    model.compile(loss="mse", optimizer="rmsprop")
    return model
The issue I'm having is that after training the model and testing it - even with the same data it trained on - what it outputs is essentially the 30th step in the input. My first thought is the patterns of my data must be too complex to accurately predict, at least with this relatively simple model, so the best answer it can return is essentially the last element of the input. To limit the possibility of over-fitting I've tried turning training epochs down to 1 but the same behavior appears. I've never observed this behavior before though and I have worked with this type of data before with successful results (for context, I'm using vibration data taken from 4 points on a complex physical system that has active stabilizers; the prediction is used in a pid loop for stabilization hence why, at least for now, I'm using a simpler model to keep things fast).
Does that sound like the most likely cause, or does anyone have another idea? Has anyone seen this behavior before? In case it helps with visualization, here is what the prediction looks like for one vibration point compared to the desired output (note: these screenshots are zoomed-in selections of a very large dataset; as @MarcinMożejko noticed, I did not zoom quite the same both times, so any offset between the images is due to that. The intent is to show the horizontal offset between the prediction and true data within each image):
...and compared to the 30th step of the input:
Note: Each data point seen by the Keras model is an average over many actual measurements, with the averaging window moved along in time. This is done because the vibration data is extremely chaotic at the smallest resolution I can measure, so instead I use this moving average technique to predict the larger movements (which are the more important ones to counteract anyway). That is why the offset in the first image appears as many points instead of just one: it is 'one average', or 100 individual points, of offset.
-----Edit 1, code used to get from the input datasets 'X_test, Y_test' to the plots shown above-----
model_1 = lstm.build_model() # The function above, pulled from another file 'lstm'
prediction = model_1.predict(X_test)
temp_predicted_sensor_b = (prediction[:, 0] + 1) * X_b_orig[:, 0]
sensor_b_y = (Y_test[:, 0] + 1) * X_b_orig[:, 0]
plot_results(temp_predicted_sensor_b, sensor_b_y)
plot_results(temp_predicted_sensor_b, X_b_orig[:, 29])
For context:
X_test.shape = (41541, 30, 4)
Y_test.shape = (41541, 4)
X_b_orig is the raw (averaged as described above) data from the b sensor. This is multiplied by the prediction and input data when plotting to undo normalization I do to improve the prediction. It has shape (41541, 30).
----Edit 2----
Here is a link to a complete project setup to demonstrate this behavior:
That is because, for your data (stock data?), the best prediction for the 31st value is the 30th value itself. The model is correct and fits the data.
I have had a similar experience predicting stock data.
I feel I should post a follow-up, since it seems this post has been getting more attention than my other questions.
Ferret Zhang's answer is correct (and has been accepted), and I find this discovery actually quite funny when you understand it in relation to the stock / cryptocurrency data which some have commented about. What sequence prediction is ultimately doing is assigning statistical weights to different moves, to pick the highest-probability move and 'predict' it will happen. In the case of stock data, in a vacuum it is (at least at this scale) completely random: there is an equal probability of moving up or down, and hence the model predicts that it will stay exactly the same.
The model, in a sense, learned that the best way to play is to not play at all :)
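This effect is easy to reproduce on simulated data. The sketch below is plain Python on a synthetic random walk, not the vibration or stock data from this thread. It compares the "repeat the last value" forecast with a forecast that extrapolates the last move.

```python
import random

def mse(predictions, targets):
    """Mean squared error between two equally long sequences."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

random.seed(0)

# Simulated random walk: each step adds independent zero-mean noise.
walk = [0.0]
for _ in range(5000):
    walk.append(walk[-1] + random.gauss(0.0, 1.0))

# Forecast walk[i + 1] for i = 1 .. len(walk) - 2.
targets = walk[2:]
persistence = walk[1:-1]  # "same as the last value"
momentum = [walk[i] + (walk[i] - walk[i - 1]) for i in range(1, len(walk) - 1)]

persistence_mse = mse(persistence, targets)
momentum_mse = mse(momentum, targets)
print(persistence_mse, momentum_mse)
```

On data like this, the persistence forecast is the best available one, so a network trained on MSE that ends up reproducing the last input is behaving sensibly. Comparing a model against this baseline is a quick way to check whether it has learned anything beyond it.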

Time series classification using LSTM - How to approach?

I am working on an experiment with LSTM for time series classification and I have been going through several HOWTOs, but still, I am struggling with some very basic questions:
Is the main idea of training the LSTM to take the same sample from every time series?
E.g. if I have time series A (with samples a1,a2,a3,a4), B (b1,b2,b3,b4) and C (c1,c2,c3,c4), then I will feed the LSTM with batches of (a1,b1,c1), then (a2,b2,c2) etc.? Meaning that all time series need to be of the same size/number of samples?
If so, can anyone more experienced be so kind as to describe to me very simply how to approach the whole process of training the LSTM and creating the classifier?
My intention is to use TensorFlow, but I am still new to this.
If your goal is classification, then your data should be a time series and a label. During training, you feed each series into the LSTM, look only at the last output, and backprop as necessary.
Judging from your question, you are probably confused about batching -- you can train multiple items at once. However, each item in the batch would get its own hidden state, and only the parameters of the layers are updated.
The time series in a single batch should be of the same length. You should terminate each sequence with an END token and pad items that are too short with a special PAD token; the LSTM should learn that PADs after an END are useless.
There is no need for different batches to have the same number of items, nor to have items of the same length.
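The padding step can be sketched like this in plain Python. The `PAD` value and the `pad_batch` helper are hypothetical, and the mask is an addition that the answer above does not mention: it lets later code ignore the padded positions explicitly instead of relying on the LSTM to learn that.

```python
PAD = 0.0  # hypothetical padding value; pick one that cannot occur in real data

def pad_batch(sequences, pad_value=PAD):
    """Right-pad every sequence in one batch to the length of the longest one."""
    max_len = max(len(seq) for seq in sequences)
    padded = [list(seq) + [pad_value] * (max_len - len(seq)) for seq in sequences]
    # mask[i][t] is 1 for real time steps and 0 for padding.
    mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences]
    return padded, mask

# One batch of three series with different lengths.
batch = [[0.3, 0.7, 0.2, 0.9], [0.5, 0.1], [0.4, 0.8, 0.6]]
padded, mask = pad_batch(batch)
for row, m in zip(padded, mask):
    print(row, m)
```

Only the series inside one batch need a common length, so a batch of short series can be padded to a shorter length than a batch of long ones. In Keras, a `Masking` layer can play the role of the mask, assuming the padding value never appears as real data.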