How does tf.nn.conv2d behave with an even-sized filter? - tensorflow

I've read this question, whose accepted answer only mentions square odd-sized filters (1x1, 3x3), and I am intrigued by how tf.nn.conv2d() behaves with a square even-sized filter (e.g. 2x2), given that none of its elements can be considered its centre.
If padding='VALID' then I assume that tf.nn.conv2d() will stride across the input in the same way as it would if the filter were odd-sized.
However, if padding='SAME' how does tf.nn.conv2d() choose to centre the even-sized filter over the input?

See the description here:
https://www.tensorflow.org/versions/r0.9/api_docs/python/nn.html#convolution
For VALID padding, you're exactly right. You simply walk the filter over the input without any padding, moving the filter by the stride each time.
For SAME padding, you do the same as VALID padding, but conceptually you pad the input with some zeros before and after each dimension before computing the convolution. If an odd number of padding elements must be added, the right/bottom side gets the extra element.
Use the pad_... formulae in the link above to work out how much padding to add. For simplicity, consider a 1D convolution: for an input of size 7, a window of size 2, and a stride of 1, you would add 1 element of padding on the right and 0 elements on the left.
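If it helps, here is a small sketch of that arithmetic in Python, following the pad_... formulae from the linked docs (the function name same_padding_1d is just for illustration):
import math

def same_padding_1d(input_size, filter_size, stride):
    # Output size for SAME padding is ceil(input_size / stride).
    out_size = math.ceil(input_size / stride)
    # Total padding needed so that every window fits.
    pad_total = max((out_size - 1) * stride + filter_size - input_size, 0)
    pad_before = pad_total // 2           # left/top
    pad_after = pad_total - pad_before    # right/bottom gets the extra element
    return pad_before, pad_after

print(same_padding_1d(7, 2, 1))  # (0, 1): one zero of padding on the right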
Hope that helps!

When you apply the filter in a conv layer, each output element is just the inner product of the filter with one patch of the input. For illustration, take a filter that averages each patch:
taking the example of a 1D input of size 5, [1, 2, 3, 4, 5],
using a size-2 filter with VALID padding and stride 1,
you get an output of size 4 (taking the mean of the inner product with the weight matrix [1, 1]): [(1+2)/2, (2+3)/2, (3+4)/2, (4+5)/2],
which is [1.5, 2.5, 3.5, 4.5].
If you do SAME padding with stride 1, you get an output of size 5: [(1+2)/2, (2+3)/2, (3+4)/2, (4+5)/2, (5+0)/2], where the final 0 is the zero padding,
which is [1.5, 2.5, 3.5, 4.5, 2.5].
Likewise, if you have a 224x224 input and do 2x2 filtering with VALID padding and stride 1, it will output a 223x223 result.
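As a quick check, here is a minimal sketch reproducing the 1D example with tf.nn.conv2d, assuming TensorFlow 2.x eager execution (the vector is packed as a 1x5 single-channel image):
import numpy as np
import tensorflow as tf

x = tf.constant(np.arange(1, 6, dtype=np.float32).reshape(1, 1, 5, 1))      # [1,2,3,4,5]
w = tf.constant(np.array([0.5, 0.5], dtype=np.float32).reshape(1, 2, 1, 1)) # averaging filter

valid = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='VALID')
same = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')
print(valid.numpy().ravel())  # [1.5 2.5 3.5 4.5]
print(same.numpy().ravel())   # [1.5 2.5 3.5 4.5 2.5] -- the zero is padded on the right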

Related

How to do 2D Convolution only at a specific location?

This question has been asked multiple times, but I still could not find what I was looking for. Imagine
data = np.random.rand(N,N)    # shape N x N
kernel = np.random.rand(3,3)  # shape M x M (here M = 3)
I know convolution typically means placing the kernel all over the data. But in my case N and M are on the order of 10000. So I wish to get the value of the convolution at a specific location in the data, say at (10,37), without doing unnecessary calculations at all locations. The output will then be just a number. The main goal is to reduce computation and memory expenses. Is there any inbuilt function that does this with minimal adjustments?
Indeed, evaluating the convolution at a particular position reduces to the sum over the entries of the pointwise multiplication of the corresponding submatrix of data with the flipped kernel. Here is a reproducible example.
Code
N = 1000
M = 3
np.random.seed(777)
data = np.random.rand(N,N) #shape N x N
kernel= np.random.rand(M,M) #shape M x M
# Pointwise convolution = pointwise product
data[10:10+M,37:37+M]*kernel[::-1, ::-1]
> array([[0.70980514, 0.37426475, 0.02392947],
         [0.24387766, 0.1985901 , 0.01103323],
         [0.06321042, 0.57352696, 0.25606805]])
and the convolution value is the sum over these entries:
conv = np.sum(data[10:10+M, 37:37+M] * kernel[::-1, ::-1])
conv
> 2.45430578
The kernel is flipped by definition of the convolution, as explained here and as kindly pointed out by Warren Weckesser. Thanks!
The key is to make sense of the index you provided. I assumed it refers to the upper-left corner of the submatrix in data. However, it could just as well refer to the midpoint when M is odd.
Concept
A different example, with N=7 and M=3, illustrates the idea; it is presented here for the kernel
kernel = np.array([[3,0,-1], [2,0,1], [4,4,3]])
which, when flipped, yields
kernel[::-1, ::-1]
> array([[ 3,  4,  4],
         [ 1,  0,  2],
         [-1,  0,  3]])
EDIT 1:
Please note that the lecturer in this video does not explicitly mention that flipping the kernel is required before the pointwise multiplication to adhere to the mathematically proper definition of convolution.
EDIT 2:
For large M and a target index close to the boundary of data, a ValueError: operands could not be broadcast together with shapes ... might be thrown. This can be prevented by padding the matrix data with zeros beforehand (although this blows up the memory requirement), i.e.
data = np.pad(data, pad_width=M, mode='constant')
Note that padding with pad_width=M shifts the target index by M in each axis.
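The whole idea can be wrapped in a small helper; conv_at is a hypothetical name, and it assumes (i, j) is the upper-left corner of the window:
import numpy as np

def conv_at(data, kernel, i, j):
    # Convolution value at one location: pointwise product of the submatrix
    # with the flipped kernel, summed over all entries. (i, j) is taken to be
    # the upper-left corner; for a midpoint convention with odd M, slice
    # data[i-M//2:i+M//2+1, j-M//2:j+M//2+1] instead.
    M = kernel.shape[0]
    return np.sum(data[i:i+M, j:j+M] * kernel[::-1, ::-1])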

Filter size and stride when upsampling image using Conv2D Transpose

I am using Conv2DTranspose to upsample images by factors of 18, 9, 6, and 3.
My images are of sizes (1,1), (2,2), (3,3), and (6,6). The goal is to upsample them to size (18,18).
The problem I am having is choosing the correct filter size, stride, and padding to achieve this. I have read articles about checkerboard patterns that can arise from improper sizes, but I still have not found any guidance on which sizes to choose.
For the image (1,1) -> (18,18), I have chosen a filter size of (18,18) with stride 1 and no padding. This makes sense to me, as this one pixel is solely responsible for the look of the entire upsampled image.
But the other three are giving me problems.
One solution I have thought of for (2,2) -> (18,18) is to use a filter size of (9,9) with stride (9,9), so that each pixel of the (2,2) input produces a (9,9) block of upsampled pixels.
Is this a proper way, or would you recommend something else?
Have a look at the Keras docs. You can find the formula to calculate the output shape there:
new_rows = ((rows - 1) * strides[0] + kernel_size[0] - 2 * padding[0] + output_padding[0])
new_cols = ((cols - 1) * strides[1] + kernel_size[1] - 2 * padding[1] + output_padding[1])
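Plugging your proposed values into the formula gives new_rows = (2 - 1) * 9 + 9 - 0 + 0 = 18, so a (9,9) kernel with stride (9,9) and no padding does map (2,2) to (18,18). A quick sanity check, assuming tf.keras:
import tensorflow as tf

x = tf.random.normal([1, 2, 2, 1])  # a batch of one 2x2 single-channel image
up = tf.keras.layers.Conv2DTranspose(filters=1, kernel_size=9, strides=9, padding='valid')
print(up(x).shape)  # (1, 18, 18, 1)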

How does the gradient of the sum trick work to get maxpooling positions in keras?

The keras examples directory contains a lightweight version of a stacked what-where autoencoder (SWWAE) which they train on MNIST data. (https://github.com/fchollet/keras/blob/master/examples/mnist_swwae.py)
In the original SWWAE paper, the authors compute the what and where using soft functions. However, in the keras implementation, they use a trick to get these locations. I would like to understand this trick.
Here is the code of the trick.
def getwhere(x):
    '''Calculate the 'where' mask that contains switches indicating which
    index contained the max value when MaxPool2D was applied. Using the
    gradient of the sum is a nice trick to keep everything high level.'''
    y_prepool, y_postpool = x
    return K.gradients(K.sum(y_postpool), y_prepool)  # How exactly does this line work?
where y_prepool is an MxN matrix and y_postpool is an (M/2)x(N/2) matrix (let's assume canonical pooling with a window of size 2).
I have verified that the output of getwhere() is a bed of nails matrix where the nails indicate the position of the max (the local argmax if you will).
Can someone construct a small example demonstrating how getwhere works using this "trick"?
Let's focus on the simplest example, without really talking about convolutions. Say we have a vector
x = [1 4 2]
which we max-pool over (with a single, big window), we get
mx = 4
mathematically speaking, it is:
mx = x[argmax(x)]
now, the "trick" to recover one hot mask used by pooling is to do
magic = d mx / dx
there is no gradient for argmax, however it "passes" the corresponding gradient to an element in a vector at the location of maximum element, so:
d mx / dx = [0/dx[1] dx[2]/dx[2] 0/dx[3]] = [0 1 0]
as you can see, all the gradient for non-maximum elements are zero (due to argmax), and "1" appears at the maximum value because dx/x = 1.
Now for "proper" maxpool you have many pooling regions, connected to many input locations, thus taking analogous gradient of sum of pooled values, will recover all the indices.
Note however, that this trick will not work if you have heavily overlapping kernels - you might end up with bigger values than "1". Basically if a pixel is max-pooled by K kernels, than it will have value K, not 1, for example:
    [ 1, 2, 3]
x = [13, 3, 1]
    [ 4, 2, 9]
if we max-pool with a 2x2 window and stride 1, we get
mx = [13, 3]
     [13, 9]
and the gradient trick gives you
        [0, 0, 1]
magic = [2, 0, 0]
        [0, 0, 1]
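Here is a minimal runnable sketch of the same trick on the 1D example, assuming TensorFlow 2.x with GradientTape (the Keras example uses K.gradients in graph mode instead):
import tensorflow as tf

x = tf.reshape(tf.constant([1., 4., 2.]), [1, 1, 3, 1])  # NHWC layout
with tf.GradientTape() as tape:
    tape.watch(x)
    # A single 1x3 pooling window covering the whole vector.
    pooled = tf.nn.max_pool2d(x, ksize=[1, 3], strides=[1, 3], padding='VALID')
    total = tf.reduce_sum(pooled)
mask = tape.gradient(total, x)
print(tf.reshape(mask, [3]).numpy())  # [0. 1. 0.] -- a "nail" at the argmax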

numpy: Broadcasting a vector horizontally

I have a 1-D numpy array v. I'd like to copy it to make a matrix with each row being a copy of v. That's easy: np.broadcast_to(v, desired_shape).
However, if I'd like to treat v as a column vector, and copy it to make a matrix with each column being a copy of v, I can't find a simple way to do it. Through trial and error, I'm able to do this:
np.broadcast_to(v.reshape(v.shape[0], 1), desired_shape)
While that works, I can't claim to understand it (even though I wrote it!).
Part of the problem is that numpy doesn't seem to have a concept of a column vector (hence the reshape hack instead of what in math would just be .T).
But, a deeper part of the problem seems to be that broadcasting only works vertically, not horizontally. Or perhaps a more correct way to say it would be: broadcasting only works on the higher dimensions, not the lower dimensions. I'm not even sure if that's correct.
In short, while I understand the concept of broadcasting in general, when I try to use it for particular applications, like copying the col vector to make a matrix, I get lost.
Can you help me understand or improve the readability of this code?
https://en.wikipedia.org/wiki/Transpose - this article on Transpose talks only of transposing a matrix.
https://en.wikipedia.org/wiki/Row_and_column_vectors -
a column vector or column matrix is an m × 1 matrix
a row vector or row matrix is a 1 × m matrix
You can easily create row or column vectors (matrices):
In [464]: np.array([[1],[2],[3]]) # column vector
Out[464]:
array([[1],
       [2],
       [3]])
In [465]: _.shape
Out[465]: (3, 1)
In [466]: np.array([[1,2,3]]) # row vector
Out[466]: array([[1, 2, 3]])
In [467]: _.shape
Out[467]: (1, 3)
But in numpy the basic structure is an array, not a vector or matrix.
[Array in Computer Science] - Generally, a collection of data items that can be selected by indices computed at run-time
A numpy array can have 0 or more dimensions. In contrast, a MATLAB matrix has 2 or more dimensions; originally, a 2d matrix was all that MATLAB had.
To talk meaningfully about a transpose you have to have at least 2 dimensions. One dimension may have size one, and so map onto a 1d vector, but it is still a matrix, a 2d object.
So adding a dimension to a 1d array, whether done with reshape or [:,None], is NOT a hack. It is a perfectly valid and normal numpy operation.
The basic broadcasting rules are:
a dimension of size 1 can be changed to match the corresponding dimension of the other array
a dimension of size 1 can be added automatically on the left (front) to match the number of dimensions.
In this example, both steps apply: (5,)=>(1,5)=>(3,5)
In [458]: np.broadcast_to(np.arange(5), (3,5))
Out[458]:
array([[0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4]])
In this, we have to explicitly add the size one dimension on the right (end):
In [459]: np.broadcast_to(np.arange(5)[:,None], (5,3))
Out[459]:
array([[0, 0, 0],
       [1, 1, 1],
       [2, 2, 2],
       [3, 3, 3],
       [4, 4, 4]])
np.broadcast_arrays(np.arange(5)[:,None], np.arange(3)) produces two (5,3) arrays.
np.broadcast_arrays(np.arange(5), np.arange(3)[:,None]) makes two (3,5) arrays.
np.broadcast_arrays(np.arange(5), np.arange(3)) produces an error because it has no way of determining whether you want (5,3) or (3,5) or something else.
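A quick shape check of those three calls (assuming import numpy as np):
a, b = np.broadcast_arrays(np.arange(5)[:, None], np.arange(3))
print(a.shape, b.shape)   # (5, 3) (5, 3)
a, b = np.broadcast_arrays(np.arange(5), np.arange(3)[:, None])
print(a.shape, b.shape)   # (3, 5) (3, 5)
# np.broadcast_arrays(np.arange(5), np.arange(3)) raises ValueError,
# since (5,) and (3,) cannot be broadcast together.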
Broadcasting always adds new dimensions to the left because it'd be ambiguous and bug-prone to try to guess when you want new dimensions on the right. You can make a function to broadcast to the right by reversing the axes, broadcasting, and reversing back:
def broadcast_rightward(arr, shape):
    return np.broadcast_to(arr.T, shape[::-1]).T
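For example, applied to the original problem (a quick check, assuming import numpy as np):
v = np.arange(5)
broadcast_rightward(v, (5, 3))
# array([[0, 0, 0],
#        [1, 1, 1],
#        [2, 2, 2],
#        [3, 3, 3],
#        [4, 4, 4]])  -- each column is a copy of v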

Tensorflow extract_glimpse offset

I am trying to use the extract_glimpse function of TensorFlow, but I am encountering some difficulties with its offsets parameter.
Let's assume that I have a batch containing a single one-channel 5x5 matrix called M, and that I want to extract a 3x3 glimpse of it.
When I call extract_glimpse([M], [3,3], [[1,1]], centered=False, normalized=False), it returns the result I am expecting: the 3x3 submatrix centered at position (1,1) in M.
But when I call extract_glimpse([M], [3,3], [[2,1]], centered=False, normalized=False), it doesn't return the 3x3 submatrix centered at position (2,1) in M; it returns the same result as the first call.
What am I missing?
The pixel coordinates actually have a range of 2 times the size (this is not documented, so it is indeed a bug). This is at least true in the case of centered=True and normalized=False: with those settings, the offsets range from minus the size to plus the size of the tensor. I therefore wrote a wrapper that is more intuitive for numpy users, using pixel coordinates starting at (0,0). This wrapper and more details about the problem are available on the TensorFlow GitHub page.
For your particular case, I would try something like:
offsets1 = [-5 + 3,
            -5 + 3]
extract_glimpse([M], [3,3], [offsets1], centered=True, normalized=False)
offsets2 = [-5 + 3 + 2,
            -5 + 3]
extract_glimpse([M], [3,3], [offsets2], centered=True, normalized=False)