How to train a classifier that takes multi-dimensional feature values as input - tensorflow

I am trying to build a classifier whose input features are multi-dimensional. Does anyone know of a dataset that contains multi-dimensional features?
Let's say, for example: in the MNIST data the feature is the pixel location, and the feature value is a single-dimensional grayscale value ranging from 0 to 255. But if we consider a colour image, a single grayscale value is not sufficient; we still take the pixel location as the feature, but the feature value is now 3-dimensional (R (0-255) as one dimension, G (0-255) as a second dimension and B (0-255) as a third dimension). How can one solve this case using a feed-forward neural network?
SMALL SUGGESTIONS ALSO ACCEPTED.

The same way.
If you plug the pixels into your network directly, just reshape the tensor to have length H*W*3.
If you use convolutions, note that the last parameter is the number of input/output channels. Just make sure the first convolution uses 3 input channels.
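A minimal sketch of both options (assuming Keras on the TensorFlow backend and a hypothetical 32x32 RGB input; the layer sizes are placeholders, not a recommendation):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Flatten, Conv2D

H, W = 32, 32  # hypothetical image size

# Option 1: plain feed-forward network on the flattened H*W*3 vector
mlp = Sequential([
    Flatten(input_shape=(H, W, 3)),   # (H, W, 3) -> (H*W*3,)
    Dense(128, activation='relu'),
    Dense(10, activation='softmax'),  # e.g. 10 classes
])

# Option 2: convolutional network; the first Conv2D simply takes 3 input channels
cnn = Sequential([
    Conv2D(16, (5, 5), activation='relu', input_shape=(H, W, 3)),
    Flatten(),
    Dense(10, activation='softmax'),
])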

Related

Dimensions of a convolution?

I have some questions regarding how this convolution is calculated and its output dimensions. I'm familiar with simple convolutions with an n x m kernel, using strides, dilations or padding; that's not a problem. But these dimensions seem odd to me. Since the model I'm using is the well-known ONNX MNIST model, I assume it is correct.
So, my point is:
If the input has dimensions of 1x1x28x28, how is the output 1x8x28x28?
W denotes the kernel. How can it be 8x1x5x5? As far as I know, the first dimension is the batch size, but here I'm just doing inference with 1 input. Does this make sense?
I'm implementing this convolution operator from scratch, and so far it works for a 1x1x28x28 input and a 1x1x5x5 kernel, but that extra dimension doesn't make sense to me.
Attached is the convolution that I'm trying to do; I hope it is not too ONNX-specific.
I do not see the code you are using, but I guess 8 is the number of kernels. This means you apply 8 different 5x5 kernels to your input over a batch size of 1. That is how you get 1x8x28x28 as the output; the 8 denotes the number of activation maps (one for each kernel).
The numbers of your kernel dimensions (8x1x5x5) explained:
8: Number of different filters/kernels (will be number of output maps per image)
1: Number of input channels. If your input image was RGB instead of grayscale, this would be 3 instead of 1.
5: First spatial dimension
5: Second spatial dimension
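If it helps, here is a minimal from-scratch sketch in plain NumPy (assuming stride 1 and a padding of 2, which is what the unchanged 28x28 output size implies here) showing how an 8x1x5x5 kernel maps a 1x1x28x28 input to a 1x8x28x28 output:

import numpy as np

def conv2d(x, w, pad):
    # x: (N, C_in, H, W), w: (C_out, C_in, kH, kW)
    N, C_in, H, W = x.shape
    C_out, _, kH, kW = w.shape
    xp = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode='constant')
    H_out = H + 2 * pad - kH + 1
    W_out = W + 2 * pad - kW + 1
    out = np.zeros((N, C_out, H_out, W_out))
    for n in range(N):
        for co in range(C_out):                           # one activation map per kernel
            for i in range(H_out):
                for j in range(W_out):
                    patch = xp[n, :, i:i + kH, j:j + kW]  # spans all input channels
                    out[n, co, i, j] = np.sum(patch * w[co])
    return out

x = np.random.rand(1, 1, 28, 28)   # batch of 1 grayscale image
w = np.random.rand(8, 1, 5, 5)     # 8 kernels, 1 input channel, 5x5 spatial
print(conv2d(x, w, pad=2).shape)   # (1, 8, 28, 28)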

How to generate different samples using PixelCNN?

I am trying PixelCNN, which is an auto-regressive generative model. After training, the model receives an all-zero tensor and generates the next pixel starting from the top-left corner. Now that the model parameters are fixed, can the model only produce the same output when starting from the same zero tensor? How can one produce different samples?
Yes, you always provide an all-zero tensor. However, for PixelCNN each pixel location is represented by a distribution. So when you do the forward pass, you sample from that distribution at the end; that is how the pixel values come out different on each run.
This is of course because PixelCNN is a probabilistic neural network: the pixels, as mentioned before, are all represented by conditional probability distributions (conditioned on the previously generated pixels), not just point estimates.
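As a rough sketch (the model name pixelcnn_model, the image size and the 256 intensity levels are assumptions here, not part of the question), the usual sampling loop looks like this; the randomness comes from drawing each pixel from the predicted categorical distribution instead of taking the argmax:

import numpy as np

def sample_image(pixelcnn_model, H=28, W=28, levels=256):
    # Start from the all-zero canvas and fill it in raster-scan order.
    img = np.zeros((1, H, W, 1), dtype=np.float32)
    for i in range(H):
        for j in range(W):
            # Forward pass: assume the model outputs per-pixel probabilities over intensities.
            probs = pixelcnn_model.predict(img)[0, i, j]          # shape (levels,)
            # Sampling (rather than argmax) is what makes each generated image different.
            img[0, i, j, 0] = np.random.choice(levels, p=probs) / (levels - 1)
    return img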

What are the effects of padding a tensor?

I'm working on a problem using Keras that has been presenting me with issues:
My X data is all of shape (num_samples, 8192, 8), but my Y data is of shape (num_samples, 4), where 4 is a one-hot encoded vector.
Both X and Y data will be run through LSTM layers, but the layers are rejecting the Y data because it doesn't match the shape of the X data.
Is padding the Y data with 0s so that it matches the dimensions of the X data unreasonable? What kind of effects would that have? Is there a better solution?
Edited for clarification:
As requested, here is more information:
My Y data represents the expected output of passing the X data through my model. This is my first time working with LSTMs, so I don't have an architecture in mind, but I'd like to use an architecture that works well with classifying long (8192-length) sequences of words into one of several categories. Additionally, the dataset that I have is of an immense size when fed through an LSTM, so I'm currently using batch-training.
Technologies being used:
Keras (Tensorflow Backend)
TL;DR Is padding one tensor with zeroes in all dimensions to match another tensor's shape a bad idea? What could be a better approach?
First of all, let's make sure your representation is actually what you think it is: the input to an LSTM (or any recurrent layer, for that matter) must have shape (timesteps, features) per sample, i.e. if you have 1000 training samples, each consisting of 100 timesteps with 10 values per timestep, each sample has input shape (100, 10). Therefore I assume from your question that each input sample in your X set has 8192 steps and 8 values per step. Great; a single LSTM layer can iterate over these and produce 4-dimensional representations with absolutely no problem, just like so:
from keras.layers import Input, LSTM

myLongInput = Input(shape=(8192,8,))
myRecurrentFunction = LSTM(4)
myShortOutput = myRecurrentFunction(myLongInput)
myShortOutput.shape
TensorShape([Dimension(None), Dimension(4)])
I assume your problem stems from trying to apply yet another LSTM on top of the first one; the next LSTM expects a tensor that has a time dimension, but your output has none. If that is the case, you'll need to let your first LSTM also output the intermediate representations at each time step, like so:
myNewRecurrentFunction=LSTM(4, return_sequences=True)
myLongOutput = myNewRecurrentFunction(myLongInput)
myLongOutput.shape
TensorShape([Dimension(None), Dimension(None), Dimension(4)])
As you can see the new output is now a 3rd order tensor, with the second dimension now being the (yet unassigned) timesteps. You can repeat this process until your final output, where you usually don't need the intermediate representations but rather only the last one. (Sidenote: make sure to set the activation of your last layer to a softmax if your output is in one-hot format)
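Putting those pieces together, a minimal sketch of a stacked model for your shapes could look like the following (the layer widths are placeholders, not a recommendation):

from keras.models import Model
from keras.layers import Input, LSTM, Dense

inputs = Input(shape=(8192, 8,))
x = LSTM(64, return_sequences=True)(inputs)   # keeps the time dimension for the next LSTM
x = LSTM(32)(x)                               # the last LSTM returns only the final state
outputs = Dense(4, activation='softmax')(x)   # matches your one-hot Y of shape (num_samples, 4)

model = Model(inputs, outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])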
On to your original question, zero-padding has very little negative impact on your network. The network will strain itself a bit in the beginning trying to figure out the concept of the additional values you have just thrown at it, but will very soon be able to learn they're meaningless. This comes at a cost of a larger parameter space (therefore more time and memory complexity), but doesn't really affect predictive power most of the time.
I hope that was helpful.

Are there any pros to having a convolution layer using a filter the same size as the input data?

Are there any pros to having a convolution layer using a filter the same size as the input data (i.e. the filter can only fit over the input one way)?
A filter the same size as the input data will collapse the output dimensions to 1 x 1 x n_filters, which can be useful towards the end of a network that has a low-dimensional output, such as a single number.
One place this is used is in sliding window object detection, where redundant computation is saved by making only one forward pass to compute the output on all windows.
However, it is more typical to add one or more dense layers that give the desired output dimension instead of fully collapsing your data with convolution layers.
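A small sketch (Keras, with a hypothetical 28x28x1 input) contrasting the two options; the full-size convolution collapses the output to 1x1xn_filters, which is shape-equivalent to flattening and applying a dense layer:

from keras.models import Sequential
from keras.layers import Conv2D, Flatten, Dense

# Filter the same size as the input: the output collapses to (1, 1, n_filters)
collapse = Sequential([
    Conv2D(10, kernel_size=(28, 28), input_shape=(28, 28, 1)),
    Flatten(),                          # -> shape (10,)
])

# The more typical alternative: flatten, then a dense layer of the desired size
dense = Sequential([
    Flatten(input_shape=(28, 28, 1)),
    Dense(10),                          # -> shape (10,)
])

Both end up with 10 outputs per sample (and, in fact, the same number of weights); the convolutional form is what enables the sliding-window reuse mentioned above when the input is larger than 28x28.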

Is mxnet.symbol.Convolution cyclic?

Is the Convolution symbol computed cyclically, i.e., does it assume that the padded input symbol is periodic in all dimensions?
More specifically, if I've got an input symbol of dimensions 1x3xHxW, representing an RGB image, and I define a convolution operating on it as below:
conv1 = mxnet.symbol.Convolution(data=input, kernel=(3, 5, 5), pad=(0, 2, 2)...
what will the trained filter look like? I expect it to be composed of linear combinations of 2-D filters operating on each of the color channels R, G, B.
Am I correct?
It turns out that convolutions in mxnet are 3D: the first two dimensions reflect the image coordinates, while the third dimension reflects the depth, i.e., the dimension of the feature space. For an RGB image at the input layer the depth is 3 (for a grayscale image it is 1). For any other layer, the depth is the number of features.
Across the depth dimension the kernel spans all of the input features, so every feature of the current layer can affect any feature of the following layer through linear combinations learned to optimize detection accuracy.
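As a sketch (assuming the standard 2-D mxnet.symbol.Convolution call; the 32x32 spatial size is arbitrary), you can let mxnet infer the weight shape and see that each of the 8 filters spans all three input channels:

import mxnet as mx

data = mx.sym.Variable('data')
conv = mx.sym.Convolution(data=data, kernel=(5, 5), num_filter=8, pad=(2, 2))

# Infer shapes for a 1x3xHxW RGB input
arg_shapes, out_shapes, _ = conv.infer_shape(data=(1, 3, 32, 32))
print(dict(zip(conv.list_arguments(), arg_shapes)))
# e.g. {'data': (1, 3, 32, 32), 'convolution0_weight': (8, 3, 5, 5), 'convolution0_bias': (8,)}
print(out_shapes)   # [(1, 8, 32, 32)]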