TensorFlow model with time series data, having different input shapes for training and prediction - tensorflow

I have a reasonably well-working neural net, built mostly from LSTM, Dropout and Dense layers. I usually use it for sales prediction only, but now I'd like to train and predict with datasets of different shapes.
I have several columns showing marketing spend per channel, as well as sales for different products. The image below illustrates the dataset. The orange data (marketing channels and product sales) is supposed to be the training data. When I do a many-to-many prediction, I could just forecast all the columns, as I do when I have a dataset containing only sales.
But I already know the future marketing spend, because it is planned ahead. For that I could just use pystats (OLS, for example), but LSTMs are really good at remembering past marketing spend and sales.
Actual Question:
Is there a way to use a TensorFlow neural net with a different input shape for training and test data? Test data in this case would be either actual test data or the actual future.
Or any other comparable model? Unfortunately, I have not found any solution during my research.
Thanks for your time.
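One possible way to frame this (a sketch, not the only option) is to keep the input shapes identical at training and prediction time, but to split each sample into two parts: a past window containing all columns, and a future window containing only the marketing columns that are planned ahead. The target is then the future sales. The NumPy sketch below only builds such windows; the column layout (marketing columns first, then sales), the helper name make_windows and the window lengths are assumptions made for illustration.

```python
import numpy as np

def make_windows(data, n_marketing, n_past, n_future):
    """Slice a (time, columns) array into past / known-future / target windows.

    Assumed layout: the first `n_marketing` columns are marketing spend,
    the remaining columns are product sales.
    """
    past, future_known, target = [], [], []
    last_start = len(data) - n_past - n_future
    for start in range(last_start + 1):
        split = start + n_past
        end = split + n_future
        past.append(data[start:split, :])                   # history, all columns
        future_known.append(data[split:end, :n_marketing])  # planned spend only
        target.append(data[split:end, n_marketing:])        # sales to predict
    return np.stack(past), np.stack(future_known), np.stack(target)

# Toy data: 100 time steps, 3 marketing channels, 2 products.
rng = np.random.default_rng(0)
data = rng.random((100, 5))
past, future_known, target = make_windows(data, n_marketing=3, n_past=30, n_future=7)
print(past.shape, future_known.shape, target.shape)
```

At prediction time the same two inputs are available without knowing any future sales: the last n_past observed rows, and the planned spend for the next n_future steps.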

Related

How to inject future sequential input for multi-step time series forecasts in LSTM networks

I am trying to do multi-step (i.e., sequence-to-sequence) forecasts for product sales using both (multivariate) sequential and non-sequential inputs.
Specifically, I am using sales numbers as well as some other sequential inputs (e.g., price, is day before holiday, etc...) of the past n days to predict the sales for future m days. Additionally, I have some non-sequential features characterizing the product itself.
Definitions:
n_seq_features <- number of sequential features (in the multivariate time-series) including sales
n_non_seq_features <- number of non-sequential features characterizing a product
I got as far as building a hybrid-model, where first the sequential input is passed through some LSTM layers. The output of the final LSTM layer is then concatenated with the non-sequential features and fed into some dense layers.
What I can't quite get my head around, though, is how to feed in future sequential input (everything except sales numbers for the following m days) in a way that efficiently utilizes the sequential information (i.e., causality, etc...). For m=1, I can simply input the sequential data for this one day together with the non-sequential input after the LSTM layers; however, as soon as m becomes greater than 1, this appears to be a waste of causal information.
The only ways I could think of were:
to incorporate the sequential information for future m days as features in the LSTM input blowing up the input shape from (..., n, n_seq_features) to (..., n, n_seq_features + m*(n_seq_features-1))
add a separate LSTM branch handling the future data, the output of which is then 'somehow' fed into the dense layers at the last stage of the model
I only started using LSTM networks a while ago so I unfortunately have only limited intuition on how they are best utilized (especially in hybrid approaches). For this reason, I would like to ask:
Is the general approach of injecting sequential and non-sequential input at different stages of the same model (i.e., trained concurrently) useful or would one rather split it into separate models which can be trained independently for more fine-grained control?
How is future sequential input injected into an LSTM network to preserve causal information? Can this be achieved with a high-level frontend like Keras, or does it require a very deep dive into the TensorFlow backend?
Are LSTM networks not the way to go for this specific problem in the first place?
Cheers and thanks in advance for any advice, resources or thoughts on the matter. :)
In case someone is having a similar issue with future sequential (or temporal) data, University of Oxford and Google Cloud AI have come up with a new architecture to handle all three types of input (past temporal, future temporal as well as static). It is called Temporal Fusion Transformer and, at least from reading the paper, looks like a neat fit. However, I have yet to implement and test it. There is also a PyTorch Tutorial available.
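As a rough illustration of the second idea from the question (a separate LSTM branch for the future data), the sketch below uses the Keras functional API: one LSTM encodes the past window, a second LSTM reads the known future inputs in time order starting from the encoder's final state, and the static features are repeated for every forecast step before the dense layers. This is only one possible wiring, not a tested recommendation; the layer sizes, the input names and the choice to pass the encoder state as initial_state are assumptions.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

n, m = 28, 7                             # past window length, forecast horizon
n_seq_features = 4                       # sequential features including sales
n_non_seq_features = 3                   # static product features
n_future_features = n_seq_features - 1   # everything except sales

past_in = layers.Input(shape=(n, n_seq_features), name="past")
future_in = layers.Input(shape=(m, n_future_features), name="future")
static_in = layers.Input(shape=(n_non_seq_features,), name="static")

# Encoder: summarise the past in the LSTM's final hidden and cell state.
_, state_h, state_c = layers.LSTM(32, return_state=True)(past_in)

# Decoder: step through the known future inputs, starting from the encoder state.
decoded = layers.LSTM(32, return_sequences=True)(
    future_in, initial_state=[state_h, state_c]
)

# Repeat the static features for each future step and merge them in.
static_rep = layers.RepeatVector(m)(static_in)
merged = layers.Concatenate(axis=-1)([decoded, static_rep])
hidden = layers.Dense(16, activation="relu")(merged)  # applied per time step
sales_out = layers.Dense(1)(hidden)                   # one sales value per step

model = tf.keras.Model([past_in, future_in, static_in], sales_out)
model.compile(optimizer="adam", loss="mse")

# Shape check with dummy data: a batch of 8 samples.
batch = 8
pred = model.predict(
    [
        np.zeros((batch, n, n_seq_features), dtype="float32"),
        np.zeros((batch, m, n_future_features), dtype="float32"),
        np.zeros((batch, n_non_seq_features), dtype="float32"),
    ],
    verbose=0,
)
print(pred.shape)
```

All three branches are trained jointly in this sketch, so it stays within high-level Keras and needs no backend-level code.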

Once a CNN is trained, should its outputs be deterministic?

I just trained a CNN with Tensorflow/Keras and saved it as a model. I tried running about 1000 inputs through it multiple times, and each time got a slightly different prediction accuracy. The accuracy was good, and I am not concerned with the performance; however, I thought that CNN models, once trained, should be deterministic. That is, any input will always be classified the same way. Is this not the case? Is there variability in the way a model can predict once trained? If not, hopefully I can assume that I have programmed some variability into my code unawares. Any help would be appreciated.
Once a CNN is trained, should its outputs be deterministic?
Well, in theory, yes. In practice, as Peter Duniho points out in his excellent explanatory comment, we can see very small deviations because of the way values are calculated, aggregated, etc.
However, the probability of such small deviations changing the predicted category (and therefore the accuracy) of a classification model is so small, even over a sample size of 1000, that I'd be almost certain something else is at play in your example.
Have you left some training-time behaviour switched on, such as dropout or batch normalisation? Are you certain you are evaluating precisely the same 1000 inputs each time? I suspect the issue is in the code rather than in rounding errors.
Can you determine which specific classification changes?
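One common way this kind of variability creeps in is a stochastic layer such as dropout being evaluated in training mode. The toy NumPy sketch below is not the asker's model; it is a hypothetical one-layer classifier with fixed weights, used only to show that the same inputs give identical predictions in inference mode and differing predictions when a random dropout mask is applied on each call.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 20))   # 1000 fixed inputs
w = rng.normal(size=(20, 3))      # fixed "trained" weights, 3 classes

def predict(x, w, training, rng, rate=0.5):
    """Toy one-layer classifier with dropout applied to the inputs."""
    if training:
        # Inverted dropout: random mask, surviving values rescaled.
        mask = rng.random(x.shape) >= rate
        x = x * mask / (1.0 - rate)
    return np.argmax(x @ w, axis=1)

# Inference mode: no randomness, so repeated runs agree exactly.
a = predict(x, w, training=False, rng=rng)
b = predict(x, w, training=False, rng=rng)
print("inference runs identical:", np.array_equal(a, b))

# Training mode left on: a fresh mask on each call changes some predictions.
c = predict(x, w, training=True, rng=rng)
d = predict(x, w, training=True, rng=rng)
print("dropout runs identical:", np.array_equal(c, d))
```

In Keras, model.predict(x) and model(x, training=False) run dropout and batch normalisation in inference mode, whereas model(x, training=True) keeps them in training mode.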

time-series prediction for price forecasting (problems with predictions)

I am working on a project for price movement forecasting and I am stuck with poor quality predictions.
At every time-step I am using an LSTM to predict the next 10 time-steps. The input is the sequence of the last 45-60 observations. I tested several different ideas, but they all seem to give similar results. The model is trained to minimize MSE.
For each idea I tried a model predicting 1 step at a time, where each prediction is fed back as an input for the next prediction, and a model directly predicting the next 10 steps (multiple outputs). For each idea I also tried using just the moving average of the previous prices as input, and extending the input to include the order book at those time-steps.
Each time-step corresponds to a second.
These are the results so far:
1) The first attempt used the moving average of the last N steps as input and predicted the moving average of the next 10.
At time t, I use the ground truth value of the price and use the model to predict t+1....t+10
This is the result:
[Figure: Predicting moving average]
On closer inspection we can see what's going wrong:
The prediction seems to be a flat line that does not depend much on the input data.
2) The second attempt was to predict differences instead of simply the price. This time the input, instead of simply being X[t] (where X is my input matrix), would be X[t]-X[t-1].
This did not really help.
The plot this time looks like this:
[Figure: Predicting differences]
But on close inspection, when plotting the differences, the predictions are always basically 0.
[Figure: Plot of differences]
At this point I am stuck and running out of ideas to try. I was hoping someone with more experience in this type of data could point me in the right direction.
Am I using the right objective to train the model? Are there any details when dealing with this type of data that I am missing?
Are there any "tricks" to prevent your model from always predicting values similar to what it last saw? (Such predictions do incur low error, but they become meaningless at that point.)
At least just a hint on where to dig for further info would be highly appreciated.
Thanks!
Am I using the right objective to train the model?
Yes, but LSTMs are always very tricky for forecasting time series, and they are very prone to overfitting compared to other time series models.
Are there any details when dealing with this type of data that I am missing?
Are there any "tricks" to prevent your model from always predicting similar values to what it last saw?
I haven't seen your code or the details of the LSTM you are using. Make sure you are using a very small network and that you are avoiding overfitting. Also make sure that, after differencing the data, you reintegrate the predictions before evaluating the final forecast.
One trick to try is to build a model that forecasts 10 steps ahead directly, instead of building a one-step-ahead model and then forecasting recursively.
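The differencing and reintegration step mentioned above can be sketched with NumPy as follows. The price series is synthetic (a toy random walk), and predicted_diffs is a placeholder for whatever a model would output; both are assumptions made for illustration. The sketch also shows why a model that always predicts differences close to 0 is no better than a naive "last value" forecast, which makes that naive forecast a useful baseline to compare against.

```python
import numpy as np

rng = np.random.default_rng(1)
prices = 100.0 + np.cumsum(rng.normal(scale=0.1, size=500))  # toy random walk

# Difference the series: diffs[t] = prices[t + 1] - prices[t].
diffs = np.diff(prices)

horizon = 10
last_known = prices[-horizon - 1]   # last price observed before the forecast
true_future = prices[-horizon:]

# Reintegrating the true differences recovers the true future prices.
recovered = last_known + np.cumsum(diffs[-horizon:])

# Placeholder model output: differences that are all (basically) zero.
predicted_diffs = np.zeros(horizon)
forecast = last_known + np.cumsum(predicted_diffs)

# Naive persistence baseline: repeat the last known price.
naive = np.full(horizon, last_known)

mse_forecast = np.mean((forecast - true_future) ** 2)
mse_naive = np.mean((naive - true_future) ** 2)
print("reintegration exact:", np.allclose(recovered, true_future))
print("zero-diff forecast equals naive baseline:", np.allclose(forecast, naive))
print("MSE forecast vs naive:", mse_forecast, mse_naive)
```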

How can I use multiple datasets with one model in Keras?

I am trying Forex prediction with Keras and Tensorflow using a LSTM Network.
Of course I want it to train on many days of trading, but to do that I would have to give it sequential data with big jumps and phases without movement (when the market is closed). This isn't ideal, as the model gets "confused" by these jumps and phases of no movement. Alternatively, I can use one day of minute-by-minute data, but then I have very little training data and the model won't be very good.
Do you have ideas on how to fix this?
Here is my current code:
CODE
Thanks
If you plan on fitting multiple datasets as data slices, sequentially, something like this would work:
for _ in range(10):
    # somehow cut the data into slices and fit them one by one
    model.fit(data_slice, label_slice, ...)
This works because successive calls to fit train the same model incrementally.
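One way to do the "cut the data into slices" step, sketched here with NumPy, is to split the series wherever the gap between consecutive timestamps is too large and then build input/target windows only inside each session, so that no training sample spans a market close. The helper names split_sessions and windows_within_sessions, the max_gap value and the toy data are all hypothetical.

```python
import numpy as np

def split_sessions(timestamps, values, max_gap):
    """Split a series wherever consecutive timestamps are more than max_gap apart."""
    breaks = np.where(np.diff(timestamps) > max_gap)[0] + 1
    return np.split(values, breaks)

def windows_within_sessions(sessions, n_in, n_out):
    """Build (input, target) windows that never span two sessions."""
    xs, ys = [], []
    for s in sessions:
        for start in range(len(s) - n_in - n_out + 1):
            xs.append(s[start:start + n_in])
            ys.append(s[start + n_in:start + n_in + n_out])
    return np.array(xs), np.array(ys)

# Toy data: two "trading days" of 100 minutes each, separated by a long gap.
timestamps = np.concatenate([np.arange(0, 100), np.arange(1000, 1100)])
values = np.arange(200, dtype=float)

sessions = split_sessions(timestamps, values, max_gap=1)
x, y = windows_within_sessions(sessions, n_in=30, n_out=5)
print(len(sessions), x.shape, y.shape)
```

Because every window is self-contained, the windows from all days can be pooled and passed to a single fit call (with a trailing feature axis added, e.g. x[..., None], for an LSTM input).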

Using unlabeled dataset in Keras

Usually, when using Keras, the datasets used to train the neural network are labeled.
For example, if I have 100,000 rows of patients with 12 fields per row, then the last field will indicate whether the patient is diabetic or not (0 or 1).
Then, after training is finished, I can insert a new record and predict whether this person is diabetic or not.
But in the case of unlabeled datasets, where I cannot label the data for some reason, how can I train the neural network so that it learns what normal records look like and treats any new record that does not match as malicious or not accepted?
This is called one-class learning and is usually done with autoencoders. You train an autoencoder on the training data to reconstruct the data itself. The label in this case is the input itself. This will give you a reconstruction error. https://en.wikipedia.org/wiki/Autoencoder
Now you can define a threshold on the reconstruction error that decides whether a record is benign or not. The hope is that the reconstruction of the good data is better than the reconstruction of the bad data.
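The thresholding step can be sketched as follows. To keep the example small and dependency-free, a linear projection (PCA computed with an SVD) is swapped in for a trained autoencoder; it plays the same role of compressing and reconstructing each record, but it is not a neural network. The synthetic data, the 2-dimensional bottleneck and the 99th-percentile threshold are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "normal" records: 12 fields that really vary along only 2 hidden directions.
mixing = rng.normal(size=(2, 12))

def normal_records(k):
    return rng.normal(size=(k, 2)) @ mixing + 0.05 * rng.normal(size=(k, 12))

train = normal_records(1000)
new_normal = normal_records(200)
new_odd = 3.0 * rng.normal(size=(50, 12))   # records that ignore that structure

# Stand-in for a trained autoencoder: project onto the top-2 principal directions.
mean = train.mean(axis=0)
_, _, vt = np.linalg.svd(train - mean, full_matrices=False)
components = vt[:2]

def reconstruction_error(x):
    """Mean squared error between each record and its reconstruction."""
    centered = x - mean
    reconstructed = centered @ components.T @ components
    return np.mean((centered - reconstructed) ** 2, axis=1)

# Threshold chosen so that about 1% of the training records would be flagged.
threshold = np.percentile(reconstruction_error(train), 99)

flag_normal = reconstruction_error(new_normal) > threshold
flag_odd = reconstruction_error(new_odd) > threshold
print("flagged normal:", flag_normal.mean(), "flagged odd:", flag_odd.mean())
```

With a real autoencoder only reconstruction_error changes (it would call the trained model); choosing the threshold from the error distribution on known-good data works the same way.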
Edit to answer the question about the difference in performance between supervised and unsupervised learning.
This cannot be said with any certainty, because I have not tried it and I do not know what the final accuracy is going to be. As a rough estimate, though, supervised learning will perform better on data like the training data, because more information is supplied to the algorithm. However, if the actual data is quite different from the training data, the network will underperform in practice, while the autoencoder tends to deal better with different data. Additionally, as a rule of thumb you should have 5000 examples per class to train a neural network reliably, so labeling could take some time. But you will need some labeled data for testing anyway.
It sounds like you need to fit two different models:
a model for bad record detection
a model for prediction of a patient's likelihood to be diabetic
For both of these models, you will need to have labels. For the first model your labels would indicate whether the record is good or bad (malicious) and the second would be whether the patient is diabetic or not.
In order to detect bad records, you may find that simple logistic regression or SVM performs adequately.