LSTM for Regression (in Tensorflow) - tensorflow

I want to implement some LSTM model in Tensorflow. I think I understood the tutorials fairly well. In those input data was given in the form of words, which were embedded into a continous vector space (which has several advantages).
I now want to make an LSTM to predict a series of contionous numbers and do not know what is the best approach to that.
Should I discretize my input range, thus effectively get a classification problem with a number of classes and use the embedding desribed before, or stick to the continous numbers and do regression? In that case I just in each time step pass one feature to the model, namely the continous number?

Here's two example you may find helpful.
https://github.com/MorvanZhou/tutorials/blob/master/tensorflowTUT/tf20_RNN2.2/full_code.py
http://mourafiq.com/2016/05/15/predicting-sequences-using-rnn-in-tensorflow.html
You can just use regression. However, if your input is forever long, you need to fix size sequences.

Related

From features to sequence

I want to create a model that start with some features (like mean, std and more) and it ends to recreate the signal. Looking on the internet i understood that if i want to create this kind of model i have to use LSTM, but the only example that i find (for time series) are forecasting models.
I tried to create a model with a single LSTM neuron, and as input the n features but it didnt give me the hopped results. My questions are : Can you suggest me a paper that can explain me better what i want to accomplish ? There are other different ways to proceed ?

Issues when modelling LSTM for multi series

I'm a beginner on time series analysis with deep learning, and I have been searching for examples with LSTM in which more than one series (for example one for each city or place) is trained to avoid fitting a model for each one. The main benefit of course is that you have more training data and less computational costs. I have found an interesting code to help modeling this problem with conditional/temporally-static variables (it's called cond-rnn). But wherever I search, it's not clear to me some issues regarding sorting the inputs appropriately.
The context is that I have a target and a set of autoregressive inputs (features, lags, timesteps, wherever you call it), in which data from different series are stack together. RF and GB are outperforming LSTM on this task (with overfitting, even when I use 100k+ samples, dropout or regularization), and I'm not sure if I'm using it appropriately.
It is wrong to stack series together and have the inputs-targets randomly sorted (as in the figure)? Does the LSTM need to receive the inputs temporally sorted?
If they need so, do you have any advice on how to deal with the problem of providing new series (that start from the first time period) to the LSTM training? This answer to a similar problem (but another perspective) suggest to pick "places" as an input column, but I don't think this answer help the questions here I posed.

Can predictions be trusted if learning curve shows validation error lower than training error?

I'm working with neural networks (NN) as a part of my thesis in geophysics, and is using TensorFlow with Keras for training my network.
My current task is to use a NN to approximate a thermodynamical model i.e a nonlinear regression problem. It takes 13 input parameters and outputs a velocity profile (velocity vs. depth) of 450 parameters. My data consists of 100,000 synthetic examples (i.e. no noise is present), split in training (80k), validation (10k) and testing (10k).
I've tested my network for a number of different architectures: wider (5-800 neurons) and deeper (up to 10 layers), different learning rates and batch sizes, and even for many epochs (5000). Basically all the standard tricks of the trade...
But, I am puzzled by the fact that the learning curve shows validation error lower than training error (for all my tests), and I've never been able to overfit to the training data. See figure below:
The error on the test set is correspondingly low, thus the network seems to be able to make decent predictions. It seems like a single hidden layer of 50 neurons is sufficient. However, I'm not sure if I can trust these results due to the behavior of the learning curve. I've considered that this might be due to the validation set consisting of examples that are "easy" to predict, but I cannot see how I should change this. A bigger validation set perhaps?
To wrap it up: Is is necessarily a bad sign if the validation error is lower than or very close to the training error? What if the predictions made with said network are decent?
Is it possible that overfitting is simply not possible for my problem and data?
In addition to trying a higher k fold and the additional testing holdout sample,perhaps mix it up when sampling from the original data set: Select a stratified sample when partitioning out the training and validation/test sets. Then partition the validation and test set without stratifying the sampling.
My opinion is that if you introduce more variation in your modeling methodology (without breaking any "statistical rules"), you can be more confident in the model that you have created.
You can achieve more trustworthy results by repeating your experiments on different data. Use cross validation with high fold (like k=10) to get better confidence of your solution performance. Usually neural networks easily overfit, if your solution has similar results on validation and test set its a good sign.
It is not that easy to tell when not knowing the exact way you have setup the experiment:
what cross-validation method did you use?
how did you split the data?
etc
As you mentioned, the fact that you observe validation error lower than training can be a result of the fact that either the training dataset contains many "hard" cases to learn or the validation set contains many "easy" cases to predict.
However, since generally speaking training loss is expected to underestimate the validation, to me the specific model appear to have unpredictable/unknown fit (perform better in predicting the unknown that the known feels indeed weird).
In order to overcome this, I would start experimenting by reconsidering the data splitting strategy, adding more data if possible, or even change your performance metric.

is my model using future data?

I am training a model in TensorFlow to predict a time series. The network gets a window of data of length L and tries to come up with the next value.
I do the training in batches of overlapping windows (that slide forward in time). To speed up the process, instead of feeding an array of windows, I feed a single larger one and I use tf.extract_image_patches to extract the windows.
My question is: is it the case that the model can "cheat" by looking at the next values in the larger window? Technically, the next value of each window (except for the last one) is in the initial large window that I feed at the beginning.
EDIT: my model is a custom recurrent neural network, that is fed the various windows (one per loop) and the previous prediction.
Unless you are using Recurrent units, your model wouldn't know what it is going to be fed next.
Also, it is generally not a good idea to keep such a structure (overlapping windows) in input data. It's better to use shuffled data, rest whatever works in your particular case.

Cells detection using deep learning techniques

I have to analyse some images of drops, taken using a microscope, which may contain some cell. What would be the best thing to do in order to do it?
Every acquisition of images returns around a thousand pictures: every picture contains a drop and I have to determine whether the drop has a cell inside or not. Every acquisition dataset presents with a very different contrast and brightness, and the shape of the cells is slightly different on every setup due to micro variations on the focus of the microscope.
I have tried to create a classification model following the guide "TensorFlow for poets", defining two classes: empty drops and drops containing a cell. Unfortunately the result wasn't successful.
I have also tried to label the cells and giving to an object detection algorithm using DIGITS 5, but it does not detect anything.
I was wondering if these algorithms are designed to recognise more complex object or if I have done something wrong during the setup. Any solution or hint would be helpful!
Thank you!
This is a collage of drops from different samples: the cells are a bit different from every acquisition, due to the different setup and ambient lights
This kind of problem should definitely be possible. I would suggest starting with a cifar 10 convolutional neural network tutorial and customizing it for your problem.
In future posts you should tell us how your training is progressing. Make sure you're outputting the following information every few steps (maybe every 10-100 steps):
Loss/cost function output, you should see your loss decreasing over time.
Classification accuracy on the current batch of your training data
Classification accuracy on a held out test set (if you've implemented test set evaluation, you might implement this second)
There are many, many, many things that can go wrong, from bad learning rates, to preprocessing steps that go awry. Neural networks are very hard to debug, they are very resilient to bugs, making it hard to even know if you have a bug in your code. For that reason make sure you're visualizing everything.
Another very important step to follow is to save the images exactly as you are passing them to tensorflow. You will have them in a matrix form, you can save that matrix form as an image. Do that immediately before you pass the data to tensorflow. Make sure you are giving the network what you expect it to receive. I can't tell you how many times I and others I know have passed garbage into the network unknowingly, assume the worst and prove yourself wrong!
Your next post should look something like this:
I'm training a convolutional neural network in tensorflow
My loss function (sigmoid cross entropy) is decreasing consistently (show us a picture!)
My input images look like this (show us a picture of what you ACTUALLY FEED to the network)
My learning rate and other parameters are A, B, and C
I preprocessed the data by doing M and N
The accuracy the network achieves on training data (and/or test data) is Y
In answering those questions you're likely to solve 10 problems along the way, and we'll help you find the 11th and, with some luck, last one. :)
Good luck!