Binary classification of every time series step based on past and future values - tensorflow

I'm currently facing a Machine Learning problem and I've reached a point where I need some help to proceed.
I have various time series of positional (x, y, z) data tracked by sensors. I've engineered some additional features; for example, I rasterized the whole 3D space and calculated cell_x, cell_y and cell_z values for every time step. The time series themselves have variable lengths.
My goal is to build a model that classifies every time step with a label of 0 or 1 (binary classification based on past and future values). For this I have many training time series where the labels are already set.
One thing that could be very problematic is that there are very few 1 labels in the data (for example, only 3 of 800 samples are labeled with 1).
It would be great if someone could point me in the right direction, because there are many possible problems:
Wrong hyperparameters
Incorrect model
Too few 1 labels, though I think that's not a big problem because I only need the model to suggest the right time steps, so I would only use the peaks of the output.
Bad or too little training data
Bad features
I appreciate any help and tips.

Your model seems very strange. Why use only 2 units in the LSTM layer? Also, your problem is binary classification, in which case you should use only one neuron in your output layer (try inserting an additional dense layer between the LSTM layer and the output layer, and try dropout layers between them).
Binary cross-entropy does not make much sense with 2 output neurons unless you have a multi-label problem. But if you switch to one output neuron, it's the right loss. You then also need sigmoid as the activation function.
One last piece of advice: try class weights.
http://scikit-learn.org/stable/modules/generated/sklearn.utils.class_weight.compute_class_weight.html
This can make a huge difference if your labels are unbalanced.
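A minimal sketch of how those suggestions could fit together (the shapes, random data and layer sizes here are purely illustrative, not from the question; for per-time-step labels you would set return_sequences=True and apply the dense layers to every step):

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight
from tensorflow import keras
from tensorflow.keras import layers

# Illustrative shapes: 800 padded sequences, 50 steps, 6 features each.
n_samples, timesteps, n_features = 800, 50, 6
X = np.random.rand(n_samples, timesteps, n_features)
y = np.zeros(n_samples)
y[:3] = 1  # heavily imbalanced labels, as in the question

model = keras.Sequential([
    layers.LSTM(64, input_shape=(timesteps, n_features)),
    layers.Dense(32, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),  # one neuron for binary classification
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Upweight the rare positive class so it contributes more to the loss.
weights = compute_class_weight("balanced", classes=np.array([0.0, 1.0]), y=y)
model.fit(X, y, epochs=10, batch_size=32,
          class_weight={0: weights[0], 1: weights[1]})
```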

You can create the model using TensorFlow's BasicLSTMCell; the shape of your data fits BasicLSTMCell. You can find the documentation for BasicLSTMCell here, and for creating the model this documentation contains code that will help you build a BasicLSTMCell model. Hope this helps, cheers.

Related

Is it possible to train a NN in Keras with features that won't be available for prediction?

I'm fairly new to this topic as a whole and struggle to wrap my head around even the basics of neural networks. I'm not looking for a project plan; I appreciate that you probably have better things to do.
Nonetheless, any idea or push in the right direction is appreciated.
Imagine a grey-box model of some kind (a thermal network, an electrical network, and so on) where it's desirable to predict returns based on very few features, with an underlying smart model that is trained on a much bigger dataset.
My question is whether it is possible to train a model with a set of features and define some as mandatory and others as some sort of good-to-have features for the predictions.
Any tips are appreciated.
Cheers
Yes, you can train your model like that, but you must feed all the features during prediction. For example, say you have 30 mandatory features and 10 optional features, 40 in total. You must feed all 40 features to get a prediction from your model; the input data shape must always be the same. So the features were supposed to be optional, why are you being forced to supply them all now? Well, I will discuss two options.
Option 1: set the input shape to None. If you set the input shape to None, your model will accept input of any shape, but you will have to handle a few things. You can't freely use MaxPooling layers: if you really need MaxPool, you will have to calculate the input and output shapes of all the layers using only the mandatory feature shape (the minimum input shape). If you calculate them using (mandatory + optional) features, you will end up with an error, because the input to a MaxPool layer can become too small to be reduced any further. Take care of that and you're good to go; a sketch follows below.
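A minimal tf.keras sketch of Option 1, assuming the features arrive as a 1-D vector (the layer choices are illustrative, not prescriptive):

```python
from tensorflow import keras
from tensorflow.keras import layers

# The feature axis is None, so the same model accepts the 30 mandatory
# features alone or all 40 (mandatory + optional) features.
inputs = keras.Input(shape=(None, 1))  # (num_features, channels)
x = layers.Conv1D(16, kernel_size=3, padding="same", activation="relu")(inputs)
# Global pooling collapses the variable-length axis to a fixed size,
# sidestepping the shape bookkeeping a plain MaxPooling stack would need.
x = layers.GlobalMaxPooling1D()(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy")
```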
Option 2: I will give you an example. I was using the OpenPose output dataset to classify some movements. The OpenPose output is 18 bone keypoints, i.e. 36 features including the x and y coordinates. These keypoints were extracted from live camera frames, but we can't assume all the body parts of a person will always be inside the frame. When someone's legs are outside of the frame, we can't get their leg keypoints, but we still need to classify. There were a lot of options: we could replace the missing keypoints with 0, or find the median/mean of such poses and use that value for the missing keypoints. We found the best choice by analyzing all the data. If you're going with this option, I suggest you analyze the data first and then decide how to handle the missing fields.
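A sketch of the imputation idea from Option 2; fill_missing is a hypothetical helper, and NaN is assumed as the missing-value marker:

```python
import numpy as np

def fill_missing(features, fill_values):
    """Replace NaN-marked optional features with precomputed fill values
    (zeros, or per-feature medians, whichever your data analysis favors)."""
    features = np.asarray(features, dtype=float)
    return np.where(np.isnan(features), fill_values, features)

# Fill values computed once from the training set, e.g.:
# fill_values = np.nanmedian(X_train, axis=0)
```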

time-series prediction for price forecasting (problems with predictions)

I am working on a project for price movement forecasting and I am stuck with poor quality predictions.
At every time step I am using an LSTM to predict the next 10 time steps. The input is the sequence of the last 45-60 observations. I tested several different ideas, but they all seem to give similar results. The model is trained to minimize MSE.
For each idea I tried both a model predicting 1 step at a time, where each prediction is fed back as an input for the next prediction, and a model directly predicting the next 10 steps (multiple outputs). For each idea I also tried using just the moving average of the previous prices as input, and extending the input to include the order book at those time steps.
Each time-step corresponds to a second.
These are the results so far:
1) The first attempt used the moving average of the last N steps as input to predict the moving average of the next 10.
At time t, I use the ground-truth value of the price and use the model to predict t+1, ..., t+10.
This is the result:
[plot: predicting the moving average]
On closer inspection we can see what's going wrong: the prediction is essentially a flat line that does not care much about the input data.
2) The second attempt was to predict differences instead of the price itself. The input this time, instead of simply being X[t] (where X is my input matrix), would be X[t] - X[t-1].
This did not really help.
The plot this time looks like this:
[plot: predicting differences]
But on closer inspection, when plotting the differences, the predictions are always basically 0:
[plot of differences]
At this point, I am stuck and running out of ideas to try. I was hoping someone with more experience with this type of data could point me in the right direction.
Am I using the right objective to train the model? Are there any details I am missing when dealing with this type of data?
Are there any "tricks" to prevent the model from always predicting values similar to what it last saw? (Such predictions do incur a low error, but they become meaningless at that point.)
At least just a hint on where to dig for further info would be highly appreciated.
Thanks!
Am I using the right objective to train the model?
Yes, but LSTMs are always very tricky for forecasting time series, and they are very prone to overfitting compared to other time-series models.
Are there any details I am missing when dealing with this type of data?
Are there any "tricks" to prevent the model from always predicting values similar to what it last saw?
I haven't seen your code or the details of the LSTM you are using. Make sure you are using a very small network and that you are avoiding overfitting. Also make sure that after you difference the data, you reintegrate it before evaluating the final forecast.
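A sketch of that reintegration step (assuming first differences and NumPy arrays; reintegrate is an illustrative helper name):

```python
import numpy as np

def reintegrate(last_observed_price, predicted_diffs):
    """Turn predicted first differences back into price levels."""
    return last_observed_price + np.cumsum(predicted_diffs)

# e.g. price_forecast = reintegrate(prices[-1], predicted_diffs)
```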
One trick to try is to build a model that forecasts 10 steps ahead directly instead of building a one-step-ahead model and then forecasting recursively.
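A minimal direct multi-step sketch in Keras (the sizes are illustrative; keep the network small, as noted above):

```python
from tensorflow import keras
from tensorflow.keras import layers

lookback, horizon, n_features = 60, 10, 1  # illustrative sizes

# Deliberately small network; one forward pass emits all 10 future steps
# instead of feeding predictions back in recursively.
model = keras.Sequential([
    layers.LSTM(16, input_shape=(lookback, n_features)),
    layers.Dense(horizon),  # 10 linear outputs, one per future step
])
model.compile(optimizer="adam", loss="mse")
```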

Tensorflow bounded regression vs classification

As part of my master's thesis I have been tasked with predicting an integer label (0-255) which is a binned representation of an angle. The feature columns are also integers, in the range 0-255.
So far I have used the custom TensorFlow layers estimator, implementing a 256-output classifier, which performs well. However, my issue with the classification approach I am using is the following:
My classification model thinks that predicting a 3 instead of a 28 is as good/bad as predicting a 27 as a 28
The numerical interval / ordinal nature of my data (not sure which) leads me to believe that if I used regression I would achieve results with fewer drastically incorrect predictions or outliers.
My goal:
to reduce the number of drastically incorrect predicted outliers
My questions:
Is regression the better approach, or can I improve my classification to include an ordinal/interval relationship between my labels?
If I choose regression, is there a way to bound my predicted output to 0-255? (I know I will have to round the predicted float values.)
Thanks in advance. Any other comments, suggestions or ideas to help me best tackle the problem are also very welcome.
If I made any incorrect assumptions or mistakes in my interpretation of the problem, feel free to correct me.
Question 1: Regression is the simpler approach; however, you can also use classification and manipulate the loss function so that misclassifications that are "close" to the true class incur a lower loss.
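One possible way to encode that, assuming a softmax output over the 256 bins and integer bin labels (this is a soft-argmax penalty; other distance-aware losses would work too):

```python
import tensorflow as tf

NUM_BINS = 256

def soft_argmax_loss(y_true, y_pred):
    """Penalize by the distance between the expected predicted bin and the
    true bin, so predicting 27 for a true 28 costs far less than predicting 3."""
    bins = tf.range(NUM_BINS, dtype=tf.float32)
    expected_bin = tf.reduce_sum(y_pred * bins, axis=-1)  # soft argmax
    return tf.reduce_mean(tf.square(expected_bin - tf.cast(y_true, tf.float32)))
```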
Question 2: The TensorFlow op for bounding your prediction is tf.clip_by_value. Are you mapping all 360 degrees to [0, 255]? In that case you will want to consider the boundary cases: if your estimator yields -4 and the true value is 251, they actually represent (almost) the same angle, since -4 wraps around to 252, so the loss should be near 0.
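A sketch of both ideas; wraparound_mse is a hypothetical loss that measures error on the circle rather than on the line:

```python
import tensorflow as tf

def wraparound_mse(y_true, y_pred, period=256.0):
    """MSE on the circular distance, so predictions that wrap around the
    0/255 boundary (e.g. -4 vs. a true 251) incur (almost) no loss."""
    half = period / 2.0
    diff = tf.math.floormod(y_pred - y_true + half, period) - half
    return tf.reduce_mean(tf.square(diff))

# Bounding a plain regression output to the valid range:
# bounded = tf.clip_by_value(raw_prediction, 0.0, 255.0)
```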

Understanding the Input Parameters in RNN

I'm having a hard time understanding the different jargon used with RNNs, namely the following terms:
batch_size, time_steps, inputs and instances.
Let me go through my understanding of each parameter; please correct me where I'm wrong.
Suppose I've got a sequence of numbers and I want to predict the next number. The numbers are the following:
[1,2,3,4,5,....,100]
time_steps: This parameter determines how far the RNN will look into the past before it predicts the future. For simplicity, I want to predict 1 number ahead, and I want to do so after seeing 10 numbers from the past. So, in this case, time_steps will be 10.
inputs: These are the values at each time step. For the first time steps the inputs are
t0: [1]
t1: [2]
.
.
.
t10: [10]
batch_size: This helps with efficient computation of the RNN model. Suppose my batch_size is 2; in that case, at time step t0, the RNN inputs will be
t0: [1]
t0: [11]
Then what is the meaning of instances? E.g., in this post, instances are mentioned, and there are multiple other places where instances are used. Does it mean each loop over a batch? E.g., if there are 5 batches, each of size 2, will there be 5 instances?
Please help me correct my understanding.
Thanks!
batch_size
Batch size, in general, represents the size of the mini-batches constructed from the experimental dataset. Since deep learning requires a lot of computation, it is better to use mini-batch operations, because they make GPU usage worthwhile.
time_steps
Since an RNN takes sequential inputs, the index of each element in the input sequence can be referred to as a time step of that sequence. For example, if [1,2,3,4,5,....,100] is a sequence, the index of each element in the sequence is a time step.
inputs
The term inputs has a broader meaning, so I am not sure my definition is exact. According to my understanding, the inputs to an RNN are the individual values provided to the RNN at each time step. For example, in [1,2,3,4,5,....,100], each element is an input to the RNN at a particular time step.
But in a more abstract sense, if someone asks what the input of your deep neural model is, you can say it is English sentences, images, audio clips, videos, etc. In short, the meaning of the term inputs depends on the context.
instances
An instance, in general, refers to a training/dev/test example in the dataset. For example, the sequence [1,2,3,4,5,....,100] can be a training instance in your dataset.
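A small NumPy sketch tying the four terms together for the [1, ..., 100] example (this windowing layout is one common choice, not the only one):

```python
import numpy as np

series = np.arange(1, 101)  # the sequence [1, 2, ..., 100]
time_steps = 10             # look 10 numbers into the past

# Each training instance pairs 10 past values (the inputs at each
# time step) with the next value as the target.
X = np.array([series[i:i + time_steps] for i in range(len(series) - time_steps)])
y = series[time_steps:]

X = X.reshape(-1, time_steps, 1)  # (instances, time_steps, features)
print(X.shape, y.shape)           # (90, 10, 1) (90,)

batch_size = 2  # how many instances are processed together per update
```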
Hope this helps!
Alright pal, you did well learning those concepts. I had a hard time learning them correctly. Everything you know seems to be in order, and as for "instances": they're basically a set of data. There's no fixed usage of "instances" in the deep learning community; some people use it to refer to a different set of data or to batches of data. I rarely hear it in papers.

Predict all probable trajectories in a grid structure using Keras

I'm trying to predict sequences of 2D coordinates, but I don't want only the single most probable future path; I want all the most probable paths, to visualize them in a grid map.
For this I have training data consisting of 40000 sequences. Each sequence consists of 10 2D coordinate pairs as input and 6 2D coordinate pairs as labels.
All the coordinates are in a fixed value range.
What would be my first step towards predicting all the probable paths? To get all probable paths, I would have to apply a softmax at the end, where each cell in the grid is one class, right? But how do I process the data to reflect this grid-like structure? Any ideas?
A softmax activation won't do the trick, I'm afraid; if you have an infinite number of combinations, or even a finite number of combinations that do not already appear in your data, there is no way to turn this into a multi-class classification problem (or if you do, you will lose generality).
The only way forward I can think of is a recurrent model employing variational encoding. To begin with, you have a lot of annotated data, which is good news; a recurrent network fed with a sequence X (10,2,) will definitely be able to predict a sequence Y (6,2,). But since you want not just one but rather all probable sequences, this won't suffice. Your implicit assumption here is that there is some probability space hidden behind your sequences, which affects how they play out over time; so to model the sequences properly, you need to model that latent probability space. A Variational Auto-Encoder (VAE) does just that; it learns the latent space, so that during inference the output prediction depends on sampling over that latent space. Multiple predictions over the same input can then result in different outputs, meaning that you can finally sample your predictions to empirically approximate the distribution of potential outputs.
Unfortunately, VAEs can't really be explained within a single paragraph on Stack Overflow, and even if they could I wouldn't be the most qualified person to attempt it. Try searching the web for LSTM-VAE and arm yourself with patience; you'll probably need to do some studying, but it's definitely worth it. It might also be a good idea to look into Pyro or Edward, which are probabilistic network libraries for Python, better suited to the task at hand than Keras.
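For orientation only, here is a heavily compressed LSTM-VAE sketch in tf.keras (TF 2.x); all layer sizes are arbitrary, and you should treat it as scaffolding for your own reading, not a drop-in solution:

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

latent_dim = 8
in_len, out_len = 10, 6  # 10 observed pairs in, 6 future pairs out

class KLLoss(layers.Layer):
    """Adds the KL term that pulls the latent space towards a unit Gaussian."""
    def call(self, inputs):
        z_mean, z_log_var = inputs
        kl = -0.5 * tf.reduce_mean(
            1.0 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var))
        self.add_loss(kl)
        return inputs

def sample(args):
    z_mean, z_log_var = args
    eps = tf.random.normal(tf.shape(z_mean))
    return z_mean + tf.exp(0.5 * z_log_var) * eps

# Encoder: compress the observed trajectory into a latent distribution.
enc_in = keras.Input(shape=(in_len, 2))
h = layers.LSTM(32)(enc_in)
z_mean = layers.Dense(latent_dim)(h)
z_log_var = layers.Dense(latent_dim)(h)
z_mean, z_log_var = KLLoss()([z_mean, z_log_var])
z = layers.Lambda(sample)([z_mean, z_log_var])

# Decoder: roll the latent code out into a future trajectory.
d = layers.RepeatVector(out_len)(z)
d = layers.LSTM(32, return_sequences=True)(d)
out = layers.TimeDistributed(layers.Dense(2))(d)

vae = keras.Model(enc_in, out)
vae.compile(optimizer="adam", loss="mse")  # reconstruction + KL (added above)

# vae.fit(X_train, Y_train) with X: (N, 10, 2) and Y: (N, 6, 2);
# repeated vae.predict(x) calls then sample different plausible paths
# because of the stochastic latent draw.
```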