ConvLSTM2D data preparation - tensorflow

I'm trying to use ConvLSTM2D in Keras on 1700 samples of 90x30 data.
I have already used Conv2D, for which the data has shape (1700, 90, 30, 1). The data format is (batch, rows, cols, channels).
Now I want to use ConvLSTM2D, but I found out that I need to change the data format to (samples, time, rows, cols, channels).
samples=1700, rows=90, cols=30, channels=1
How do I determine "time"?

ConvLSTM2D, like LSTMs and recurrent neural networks in general, is used when the input data is a time series. This lets the model take advantage of temporal structure in the data.
In the case of ConvLSTM2D, the input is usually a video consisting of multiple frames. Consequently, you have to reshape the data as follows:
samples=1700, time=t, rows=90, cols=30, channels=1
where t is the number of frames in the video.
As an example, let's say we want to do video classification (or frame prediction) based on a short video clip of 10 frames; then t=10.
This of course only makes sense if your image frames are in temporal order. To do the reshaping, simply use tf.reshape(...).
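As a minimal sketch of that reshape, assume the 1700 arrays are consecutive frames of a single recording (an assumption; the question does not say). The frames can then be grouped into non-overlapping clips of t=10. NumPy is used here instead of tf.reshape to keep the example light, and the array `frames` is a hypothetical stand-in for the real data. Note that under this assumption the number of samples becomes 1700/t rather than 1700.

```python
import numpy as np

# Hypothetical stand-in for the real data: 1700 frames of 90x30 with 1 channel.
# Each frame is filled with its own index so the ordering can be checked.
frame_ids = np.arange(1700, dtype=np.float32).reshape(1700, 1, 1, 1)
frames = frame_ids * np.ones((1, 90, 30, 1), dtype=np.float32)

t = 10                           # assumed clip length (frames per sample)
n_clips = frames.shape[0] // t   # number of non-overlapping clips

# Drop leftover frames (none here), then split axis 0 into (samples, time).
clips = frames[: n_clips * t].reshape(n_clips, t, 90, 30, 1)

print(clips.shape)            # (170, 10, 90, 30, 1)
print(clips[3, 4, 0, 0, 0])   # 34.0, i.e. frame 3*10 + 4
```

If each of the 1700 samples is already its own clip, the time axis has to come from the data itself rather than from regrouping.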

Related

TF model wrong output dimensions

I am trying to build a model that can extract human speech from a recording. To do this I have loaded 1500 noisy files (some of these files are exactly the same but with different speech-to-noise ratios: -1, 1, 3, 5, 7). I want my model to take in a wav file as a one-dimensional array/tensor along the horizontal axis and output a one-dimensional array/tensor that I could then play.
Currently, this is how my data is set up.
This is how my model is set up.
The error I am having is that I am not able to make a prediction, and when I am, I get an array/tensor with only one element instead of one with 220500. The reason for 220500 is that it is the length of the background noise that was overlaid onto the clean speech, so every file is this length.
I have been messing around with layers.Input because I want my model to take in every row as one "object"/audio clip. I don't know if that is what's happening, because the only "successful" prediction is an error.
The model you built expects data in the format (batch_size, 1, 220500), because in the input layer you declared an input_shape of (1, 220500).
For the data you are using, you should just use an input_shape of (220500,).
Another problem you might encounter is that you are using a single unit in the last layer. This way the output of the model will be (batch_size, 1), but you need (batch_size, 220500) as an output.
For this last problem I suggest using a generative recurrent neural network.
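The shape bookkeeping above can be checked without building a model. The following is a minimal NumPy sketch, not the asker's code (which is not shown). It relies on the fact that a dense layer acts on the last axis only, so a single unit is modelled here as a (220500, 1) weight matrix. The batch size of 4 and the zero-filled arrays are placeholders.

```python
import numpy as np

clip_len = 220500   # samples per audio clip, from the question
batch = 4           # hypothetical batch size

x = np.zeros((batch, clip_len), dtype=np.float32)  # matches input_shape=(220500,)
x_extra_axis = x[:, np.newaxis, :]                 # matches input_shape=(1, 220500)

# A dense layer with one unit: weights of shape (clip_len, 1).
w_single = np.zeros((clip_len, 1), dtype=np.float32)

print(x_extra_axis.shape)               # (4, 1, 220500)
print((x @ w_single).shape)             # (4, 1): one number per clip
print((x_extra_axis @ w_single).shape)  # (4, 1, 1)
```

To get one output value per input sample, the last layer has to produce clip_len values per clip rather than one.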

What are the effects of padding a tensor?

I'm working on a problem using Keras that has been presenting me with issues:
My X data is all of shape (num_samples, 8192, 8), but my Y data is of shape (num_samples, 4), where 4 is a one-hot encoded vector.
Both X and Y data will be run through LSTM layers, but the layers are rejecting the Y data because it doesn't match the shape of the X data.
Is padding the Y data with 0s so that it matches the dimensions of the X data unreasonable? What kind of effects would that have? Is there a better solution?
Edited for clarification:
As requested, here is more information:
My Y data represents the expected output of passing the X data through my model. This is my first time working with LSTMs, so I don't have an architecture in mind, but I'd like to use an architecture that works well with classifying long (8192-length) sequences of words into one of several categories. Additionally, the dataset that I have is of an immense size when fed through an LSTM, so I'm currently using batch-training.
Technologies being used:
Keras (Tensorflow Backend)
TL;DR Is padding one tensor with zeroes in all dimensions to match another tensor's shape a bad idea? What could be a better approach?
First of all, let's make sure your representation is actually what you think it is. The input to an LSTM (or any recurrent layer, for that matter) must have per-sample dimensionality (timesteps, features); i.e. if you have 1000 training samples, each consisting of 100 timesteps, with each timestep having 10 values, your input shape will be (100,10,), and the full data array will have shape (1000, 100, 10). Therefore I assume from your question that each input sample in your X set has 8192 steps and 8 values per step. Great; a single LSTM layer can iterate over these and produce 4-dimensional representations with absolutely no problem, just like so:
from keras.layers import Input, LSTM
myLongInput = Input(shape=(8192,8,))
myRecurrentFunction = LSTM(4)
myShortOutput = myRecurrentFunction(myLongInput)
myShortOutput.shape
TensorShape([Dimension(None), Dimension(4)])
I assume your problem stems from trying to apply yet another LSTM on top of the first one; the next LSTM expects a tensor that has a time dimension, but your output has none. If that is the case, you'll need to let your first LSTM also output the intermediate representations at each time step, like so:
myNewRecurrentFunction=LSTM(4, return_sequences=True)
myLongOutput = myNewRecurrentFunction(myLongInput)
myLongOutput.shape
TensorShape([Dimension(None), Dimension(None), Dimension(4)])
As you can see the new output is now a 3rd order tensor, with the second dimension now being the (yet unassigned) timesteps. You can repeat this process until your final output, where you usually don't need the intermediate representations but rather only the last one. (Sidenote: make sure to set the activation of your last layer to a softmax if your output is in one-hot format)
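To make the shape difference concrete, here is a rough NumPy sketch of what return_sequences changes. It uses a plain tanh recurrence as a stand-in for the LSTM cell (so it is not Keras's implementation), random weights, and a shortened sequence of 16 steps instead of 8192. The function name run_recurrence is made up for illustration.

```python
import numpy as np

def run_recurrence(x, units, return_sequences=False, seed=0):
    """Simple tanh RNN over x of shape (batch, timesteps, features)."""
    rng = np.random.default_rng(seed)
    batch, timesteps, features = x.shape
    w_in = rng.normal(scale=0.1, size=(features, units))
    w_rec = rng.normal(scale=0.1, size=(units, units))
    h = np.zeros((batch, units))
    states = []
    for step in range(timesteps):
        # New state depends on the current input and the previous state.
        h = np.tanh(x[:, step, :] @ w_in + h @ w_rec)
        states.append(h)
    # Either every intermediate state, or only the last one.
    return np.stack(states, axis=1) if return_sequences else h

x = np.ones((2, 16, 8))  # (batch, timesteps, features)
last_only = run_recurrence(x, units=4)
all_steps = run_recurrence(x, units=4, return_sequences=True)

print(last_only.shape)  # (2, 4)
print(all_steps.shape)  # (2, 16, 4)
```

The last slice along the time axis of all_steps is the same state that the non-sequence call returns, which is why only the final recurrent layer needs to drop the time dimension.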
On to your original question, zero-padding has very little negative impact on your network. The network will strain itself a bit in the beginning trying to figure out the concept of the additional values you have just thrown at it, but will very soon be able to learn they're meaningless. This comes at a cost of a larger parameter space (therefore more time and memory complexity), but doesn't really affect predictive power most of the time.
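For a sense of scale, this is roughly what padding the labels to the input shape would look like with np.pad. The three sample labels are invented for illustration, and placing the original values at the start of each padded sample is just one possible convention.

```python
import numpy as np

# Three made-up one-hot labels of shape (num_samples, 4).
y = np.eye(4, dtype=np.float32)[[0, 2, 3]]

# Add a trailing axis, then zero-pad to (num_samples, 8192, 8).
y_padded = np.pad(y[:, :, np.newaxis], ((0, 0), (0, 8192 - 4), (0, 8 - 1)))

print(y.shape)         # (3, 4)
print(y_padded.shape)  # (3, 8192, 8)
print(int(y_padded.sum()), "non-zero values out of", y_padded.size)
```

Each sample grows from 4 values to 65536, almost all of them zeros, which is the time and memory cost mentioned above. Letting the final LSTM return only its last state avoids the padding entirely.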
I hope that was helpful.

How to train a classifier that contains multi-dimensional feature input values

I am trying to model a classifier that takes multi-dimensional features as input. Does anyone know of a dataset that contains multi-dimensional features?
Let's say, for example: in the MNIST data we have the pixel location as the feature, and the feature value is a single-dimensional grey-scale value that ranges over 0-255. But if we consider a colour image, a single grey-scale value is not sufficient. In this case we will also take the pixel location as the feature, but the feature value will be three-dimensional (R (0-255) as one dimension, G (0-255) as the second, and B (0-255) as the third). So how can one solve this using a feed-forward neural network?
SMALL SUGGESTIONS ALSO ACCEPTED.
The same way.
If you plug the pixels into your network directly, just reshape the tensor to have length H*W*3.
If you use convolutions, note that the last parameter is the number of input/output dimensions (channels). Just make sure the first convolution uses 3 as input.
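A minimal sketch of the first option, using NumPy and a made-up batch of 32x32 RGB images (the sizes and random pixel values are placeholders): each image is flattened into one vector of length H*W*3 that a feed-forward network can take as input.

```python
import numpy as np

n_images, H, W = 5, 32, 32   # hypothetical batch and image size

# Colour images: one (R, G, B) triple per pixel location, values in 0-255.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(n_images, H, W, 3))

# Flatten each image to a single vector of length H*W*3.
flat = images.reshape(n_images, H * W * 3)

print(flat.shape)  # (5, 3072)
```

The first three entries of each flattened row are the R, G and B values of the top-left pixel, so no information is lost; only the spatial layout is no longer explicit.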

Are there any pros to having a convolution layer using a filter the same size as the input data?

Are there any pros to having a convolution layer using a filter the same size as the input data (i.e. the filter can only fit over the input one way)?
A filter the same size as the input data will collapse the output dimensions to 1 x 1 x n_filters, which could be useful towards the end of a network that has a low-dimensional output, such as a single number.
One place this is used is in sliding window object detection, where redundant computation is saved by making only one forward pass to compute the output on all windows.
However, it is more typical to add one or more dense layers that give the desired output dimension instead of fully collapsing your data with convolution layers.
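The collapse to 1 x 1 x n_filters can be checked numerically. In this NumPy sketch (sizes and random values are arbitrary, and stride 1 with no padding is assumed), a filter as large as the input fits in exactly one position, so the convolution reduces to one dot product per filter. That is the same arithmetic as a dense layer applied to the flattened input.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, C, n_filters = 5, 5, 3, 4   # arbitrary sizes for illustration

x = rng.normal(size=(H, W, C))                   # one input
kernels = rng.normal(size=(n_filters, H, W, C))  # filters as large as the input

# Only one valid position exists, so each filter yields a single number.
conv_out = np.einsum("hwc,fhwc->f", x, kernels).reshape(1, 1, n_filters)

# Dense layer on the flattened input, using the same weights.
dense_out = kernels.reshape(n_filters, -1) @ x.reshape(-1)

print(conv_out.shape)                            # (1, 1, 4)
print(np.allclose(conv_out.ravel(), dense_out))  # True
```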

Vector representation in multidimensional time-series prediction in Tensorflow

I have a large data set (~30 million data-points with 5 features) that I have reduced using K-means down to 200,000 clusters. The data is a time-series with ~150,000 time-steps. The data on which I would like to train the model is the presence of particular clusters at each time-step. The purpose of the predictive model is to generate a generalized sequence, similar to generating syntactically correct sentences from a model trained on word sequences. The easiest way to think about this data is that I'm trying to predict the pixels in the next video frame from pixels in the current video frame in order to generate a new sequence of frames that approximates the original sequence.
The raw and sparse representation at each time-step would be 200,000 binary values representing which clusters are present or not at that time step. Note, no more than 200 clusters may be present in any one time-step and thus this representation is extremely sparse.
What is the best representation to convert this sparse vector to a dense vector that would be more suitable to time-series prediction using Tensorflow?
I initially had in mind a RNN / LSTM trained on the vectors at each time-step, but due to the size of the training vector I'm now wondering if a convolution approach would be more suitable.
Note, I have not actually used TensorFlow beyond some simple tutorials, but I have previously used OpenCV ML functions. Please consider me a novice in your responses.
Thank you.