Is mxnet.symbol.Convolution cyclic?

Is the Convolution symbol computed cyclically, i.e., does it assume that the padded input symbol is periodic in all dimensions?
More specifically, if I've got an input symbol of dimensions 1x3xHxW, representing an RGB image, and I define a convolution operating on it as below:
conv1 = mxnet.symbol.Convolution(data=input, kernel=(3, 5, 5), pad=(0, 2, 2)...
what will the trained filter look like? I expect it to be composed of linear combinations of 2-D filters operating on each of the color channels R, G, B.
Am I correct?

It turns out that convolutions in mxnet are 3-D: the first two dimensions reflect the image coordinates, while the third dimension reflects the depth, i.e., the dimension of the feature space. For an RGB image at the input layer, the depth is 3 (unless it is a grayscale image, which has depth == 1). For any other layer, the depth is the number of features.
The convolution across the depth dimension always spans the full depth (each kernel covers every input channel), so that all features of the current layer can affect any feature of the following layer through learned linear combinations that optimize detection accuracy.
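As a quick check (a minimal sketch; the num_filter=8 and the 256x256 input size are arbitrary example values, not from the question), mxnet's shape inference shows that each 2-D filter spans the full input depth:
import mxnet as mx
data = mx.sym.Variable('data')
conv1 = mx.sym.Convolution(data=data, name='conv1', kernel=(5, 5), pad=(2, 2), num_filter=8)
arg_shapes, out_shapes, _ = conv1.infer_shape(data=(1, 3, 256, 256))
print(dict(zip(conv1.list_arguments(), arg_shapes)))
# 'conv1_weight' comes out as (8, 3, 5, 5): each filter covers all 3 input channels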

Related

How to train a classifier on multi-dimensional input feature values

I am trying to build a classifier that takes multi-dimensional features as input. Does anyone know of a dataset that contains multi-dimensional features?
Let's say, for example: in the MNIST data we have the pixel location as the feature, and the feature value is a one-dimensional grayscale value that varies from 0 to 255. But if we consider a colour image, a single grayscale value is no longer sufficient. We still take the pixel location as the feature, but the feature value now has 3 dimensions (R (0-255) as one dimension, G (0-255) as the second, and B (0-255) as the third). So in this case, how can one solve it using a feedforward neural network?
Small suggestions are also accepted.
The same way.
If you plug the pixels into your network directly, just reshape the tensor to have length H*W*3, as in the sketch below.
If you use convolutions, note that the last parameter is the number of input/output channels. Just make sure the first convolution uses 3 as its input depth.
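For instance, a minimal numpy sketch of the reshape (the 32x32 size is an arbitrary example):
import numpy as np
img = np.random.randint(0, 256, size=(32, 32, 3)).astype(np.float32)  # toy RGB image
x = img.reshape(-1) / 255.0   # length H*W*3 = 3072, ready for a feedforward input layer
print(x.shape)                # (3072,)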

Is a TensorFlow 3d convolution layer with kernel depth equal to the input depth equivalent to a 2d convolution layer?

To further explain the title, I am passing a series of single channel pictures into a convolution network and am comparing and contrasting conv3d versus conv2d. There are two possible setups I'm considering:
Setup 1 uses a conv2d layer with each picture input as a single channel. Input dimensions [batch_size, width, height, num_pictures]. Kernel dimensions [width, height], and stride [1, 1]. Valid padding.
Setup 2 uses a conv3d layer, with the pictures forming the "depth" component of the kernel. Input dimensions [batch_size, num_pictures, width, height, 1]. Kernel dimensions [num_pictures, width, height]. Stride [1, 1, 1]. Valid padding.
The way I see it, 2d convolution networks consider all channels of a given input, so is there any functional difference between the two setups above, in behaviour or in performance?
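One way to express the two setups (a sketch with arbitrary sizes: 10 pictures of 64x64 and a 5x5 spatial kernel, using the TF 1.x API):
import tensorflow as tf
# Setup 1: pictures stacked as channels of a 2-D convolution
x2d = tf.placeholder(tf.float32, [None, 64, 64, 10])    # [batch, H, W, num_pictures]
k2d = tf.Variable(tf.truncated_normal([5, 5, 10, 1]))   # [kh, kw, in_channels, out_channels]
y2d = tf.nn.conv2d(x2d, k2d, strides=[1, 1, 1, 1], padding='VALID')
# Setup 2: pictures along the depth axis of a 3-D convolution
x3d = tf.placeholder(tf.float32, [None, 10, 64, 64, 1])   # [batch, depth, H, W, channels]
k3d = tf.Variable(tf.truncated_normal([10, 5, 5, 1, 1]))  # [kd, kh, kw, in_ch, out_ch]
y3d = tf.nn.conv3d(x3d, k3d, strides=[1, 1, 1, 1, 1], padding='VALID')
# With kernel depth equal to the input depth and valid padding, the 3-D output depth
# collapses to 1, so both setups compute the same weighted sum over all pictures.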

Shape of tensor for 2D image in Keras

I am a newbie to Keras (and somewhat new to TF), but I have found the shape definition for the input layer very confusing.
So in the examples, when we have a 1D vector of length 20 as input, the shape gets defined as
...Input(shape=(20,)...)
And when a 2D tensor for greyscale images needs to be defined for MNIST, it is defined as:
...Input(shape=(28, 28, 1)...)
So my question is: why is the tensor not defined as (20) and (28, 28)? Why is a second dimension added and left empty in the first case, and why does the number of channels have to be defined in the second?
I understand that it depends on the layer, so Conv1D, Dense or Conv2D take different shapes, but it seems the first parameter is implicit?
According to the docs, Dense needs (batch_size, ..., input_dim), but how is this related to the example:
Dense(32, input_shape=(784,))
Thanks
Tuples vs numbers
input_shape must be a tuple, so only (20,) can satisfy it; the number 20 is not a tuple. There is also the parameter input_dim, to make your life easier if you have only one dimension; it can take 20. (But really, I find it just confusing; I always work with input_shape and use tuples, to keep a consistent understanding.)
Dense(32, input_shape=(784,)) is the same as Dense(32, input_dim=784).
Images
Images don't have only pixels, they also have channels (red, green, blue).
A black/white image has only one channel.
So: (28 pixels, 28 pixels, 1 channel).
But notice that there isn't any obligation to follow this shape for images everywhere. You can shape them the way you like. But some kinds of layers do demand a certain shape, otherwise they couldn't work.
Some layers demand specific shapes
It's the case of the 2D convolutional layers, which need (size1, size2, channels). They need this shape because they must apply the convolutional filters accordingly.
It's also the case of recurrent layers, which need (timeSteps, featuresPerStep) to perform their recurrent calculations.
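For example (the layer sizes here are arbitrary):
from keras.layers import Conv2D, LSTM
Conv2D(16, (3, 3), input_shape=(28, 28, 1))   # demands (size1, size2, channels)
LSTM(8, input_shape=(10, 4))                  # demands (timeSteps, featuresPerStep)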
MNIST models
Again, there isn't any obligation to shape your image in a specific way. You must do it according to which first layer you choose and what you intend to achieve. It's a free thing.
Many examples simply don't care about an image being a 2D structured thing, and they just use models that take 784 pixels. That's enough. They probably start with Dense layers, which demand shapes like (size,).
Other examples may care and use a shape (28, 28), but then these models will have to reshape the input to fit the needs of the next layer.
2D convolutional layers will demand (28, 28, 1).
The main idea is: input arrays must match input_shape or input_dim.
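As a small illustration (the filter counts and layer sizes are arbitrary), the same flattened batch can feed either kind of first layer after an appropriate reshape:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Conv2D, Flatten
x_flat = np.random.rand(100, 784).astype('float32')   # batch of flattened images
x_img = x_flat.reshape(-1, 28, 28, 1)                 # restore the 2-D structure + channel
dense_model = Sequential([Dense(32, input_shape=(784,))])   # consumes x_flat
conv_model = Sequential([Conv2D(16, (3, 3), input_shape=(28, 28, 1)),
                         Flatten(), Dense(10)])             # consumes x_img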
Tensor shapes
Be careful, though, when reading Keras error messages or working with custom / lambda layers.
All these shapes we defined before omit an important dimension: the batch size or the number of samples.
Internally all tensors will have this additional dimension as the first dimension. Keras will report it as None (a dimension that will adapt to any batch size you have).
So, input_shape=(784,) will be reported as (None, 784).
And input_shape=(28,28,1) will be reported as (None, 28, 28, 1).
And your actual input data must have a shape that matches that reported shape.
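You can see the reported shape directly:
from keras.models import Sequential
from keras.layers import Dense
model = Sequential([Dense(32, input_shape=(784,))])
print(model.input_shape)   # (None, 784) -- None is the batch dimension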

How to implement the DecomposeMe architecture in TensorFlow?

There is a type of architecture that I would like to experiment with in TensorFlow.
The idea is to compose 2-D filter kernels by a combination of 1-D filters.
From the paper:
Simplifying ConvNets through Filter Compositions
The essence of our proposal consists of decomposing the ND kernels of a traditional network into N consecutive layers of 1D kernels.
...
We propose DecomposeMe, which is an architecture consisting of decomposed layers. Each decomposed layer represents an N-D convolutional layer as a composition of 1D filters, in addition including a non-linearity φ(·) in-between.
...
Converting existing structures to decomposed ones is a straightforward process, as each existing ND convolutional layer can systematically be decomposed into sets of consecutive layers consisting of 1D linearly rectified kernels and 1D transposed kernels, as shown in Figure 1.
If I understand correctly, a single 2-D convolutional layer is replaced with two consecutive 1-D convolutions?
Considering that the weights are shared and transposed, it is not clear to me how exactly to implement this in TensorFlow.
I know this question is old and you probably already figured it out, but it might help someone else with the same problem.
Separable convolution can be implemented in TensorFlow as follows (d, K and N below are example values; TF 1.x API):
import tensorflow as tf
d, K, N = 5, 16, 32   # filter length and channel counts (example values)
X = tf.placeholder(tf.float32, shape=[None, 100, 100, 3])
v1 = tf.Variable(tf.truncated_normal([d, 1, 3, K], stddev=0.001))   # d x 1 column filters
h1 = tf.Variable(tf.truncated_normal([1, d, K, N], stddev=0.001))   # 1 x d row filters
conv2 = lambda x, w: tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')
M1 = tf.nn.relu(conv2(conv2(X, v1), h1))
Standard 2D convolution with a column vector is the same as convolving each column of the input with that vector. Convolution with v1 produces K feature maps (an output image with K channels), which is then convolved with h1 to produce the final desired number of feature maps N.
Weight sharing, to my knowledge, is simply a misleading term, meant to emphasize the fact that you use one filter that is convolved with each patch in the image. Obviously you are going to use the same filter to obtain the results for each output pixel; that is how everyone does it in image/signal processing.
Then, in order to "decompose" a convolution layer as shown on page 5, it can be done by simply adding activation units between the convolutions (ignoring biases):
M1 = tf.nn.relu(conv2(tf.nn.relu(conv2(X, v1)), h1))
Note that each filter in v1 is a column vector [d, 1], and each filter in h1 is a row vector [1, d]. The paper is a little vague, but this is how separable convolution is performed: you convolve the image with the column vectors, then convolve the result with the row vectors to obtain the final output.

How image is reduced to 7x7 by TensorFlow?

I'm reading the tutorial Deep MNIST for Experts. At the start of the section Densely Connected Layer, it says that "[...] the image size has been reduced to 7x7".
I can't seem to find out how they get to this 7x7 matrix. To my understanding, we start at 28x28 and have two layers of 5x5 convolution kernels. 28 divided by 4 is 7, but 28 is not divisible by 5.
5x5 is the "window" size for the convolution layer. It does not reduce the image size: TensorFlow and Caffe, among others, automatically supply border padding. Torch, to name one, requires you to add that border yourself (2 positions in each direction, in this case).
Each kernel (filter) considers a 5x5 subset of the entire image. For instance, to compute the value for position [7, 12] in the image, the convolution process considers the 5x5 window spanning rows 5 through 9 and columns 10 through 14. It multiplies each of these 25 values by its corresponding weight and sums the products. This sum becomes the value in the next layer for the center position [7, 12].
This process repeats for every position in the image, and for each kernel in the layer.
As @Aenimated1 already mentioned, the size reduction comes from two poolings of 2x each. This operation divides the image into 2x2 windows, passing along the maximum value (or another statistic, if the user specifies one) of each 2x2 square. This reduces the 28x28 image to 14x14; the second pooling reduces it to 7x7.
The reduction in the "image size" is the result of the pooling layers added after each convolutional layer. Each 2x2 pooling decreases the width and height by a factor of 2, thus yielding a 7x7 matrix after both pooling ops.
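A minimal sketch of just the size-reducing steps (the convolutions are omitted because, with SAME padding, they keep the spatial size; TF 1.x API):
import tensorflow as tf
x = tf.placeholder(tf.float32, [None, 28, 28, 1])
pool = lambda t: tf.nn.max_pool(t, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
h1 = pool(x)    # shape (None, 14, 14, 1)
h2 = pool(h1)   # shape (None, 7, 7, 1)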