TensorFlow: is there a way to specify padding along axes?

I want to perform a convolution over an image in TensorFlow. I want the kernel to be as tall as the image and very thin. For example:
kernel_size = [200, 24]
image_size = [200, 400]
If I use padding "SAME", instead of getting a vector out I get a [200, 400] image back, since TensorFlow pads the image at the top and bottom and convolves the kernel over the padded image.
If, on the other hand, I use padding "VALID", the problem at the top and bottom disappears, but the kernel does not fully cover the horizontal extent of my image: if the horizontal dimension of the image is not divisible by the kernel width, you lose a part of it.
Is there a way to perform "VALID" padding at the top and bottom and "SAME" padding left and right? Or is there another way of doing this?

With the default padding options of TensorFlow's convolution functions there is no way to do this; you will have to pad your tensor manually to make the horizontal dimension divisible by the kernel width.
After padding manually, use VALID padding so that the manually added pixels are treated as nothing other than padding.
To pad your tensor manually you could use tf.concat with a constant tensor of the shape and value you want. If your images are always the same size, this is not difficult to figure out.
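For example, a rough sketch of that approach (the tensor name images and the NHWC layout are assumptions, not from the question):

import tensorflow as tf

# images: a [batch, 200, 400, 1] tensor; 400 is 8 columns short of 17 * 24 = 408,
# so append 8 zero columns on the right before convolving.
pad_cols = 8
zeros = tf.zeros_like(images[:, :, :pad_cols, :])   # same batch/height/channels, pad_cols wide
padded = tf.concat([images, zeros], axis=2)         # concatenate along the width axis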

For any non-standard padding, TF has a dedicated padding operation: tf.pad(). So just figure out the appropriate padding and call something like:
# 't' is [[1, 2, 3], [4, 5, 6]].
# 'paddings' is [[1, 1], [2, 2]].
# rank of 't' is 2.
pad(t, paddings, "CONSTANT") ==> [[0, 0, 0, 0, 0, 0, 0],
                                  [0, 0, 1, 2, 3, 0, 0],
                                  [0, 0, 4, 5, 6, 0, 0],
                                  [0, 0, 0, 0, 0, 0, 0]]
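For the concrete case in the question, a minimal sketch could look like this (the placeholder, the kernel initialization, and the choice of a horizontal stride equal to the kernel width are all my assumptions):

import tensorflow as tf

images = tf.placeholder(tf.float32, [None, 200, 400, 1])       # NHWC, single channel
kernel = tf.Variable(tf.truncated_normal([200, 24, 1, 1]))     # as tall as the image, 24 wide
kernel_w = 24
pad_w = (kernel_w - 400 % kernel_w) % kernel_w                 # 8 extra columns to reach a multiple of 24
padded = tf.pad(images, [[0, 0], [0, 0], [0, pad_w], [0, 0]])  # pad the width axis only
out = tf.nn.conv2d(padded, kernel, strides=[1, 1, kernel_w, 1], padding='VALID')
# out has shape [batch, 1, 17, 1]: the height collapses to 1 and the width covers the whole padded image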

Related

Why Reshape and Permute for segmentation with unet?

I am doing image semantic segmentation with U-Net, and I am confused about the last layers for pixel classification. The U-Net code is like this:
...
reshape = Reshape((n_classes,self.img_rows * self.img_cols))(conv9)
permute = Permute((2,1))(reshape)
activation = Activation('softmax')(permute)
model = Model(input = inputs, output = activation)
return model
...
Can I just reshape without using Permute like this?
reshape = Reshape((self.img_rows * self.img_cols, n_classes))(conv9)
Updated:
I found that the training result is not right when using the direct reshape:
reshape = Reshape((self.img_rows * self.img_cols, n_classes))(conv9)  # the loss does not converge
My groundtruth is generated like this:
X = []
Y = []
im = cv2.imread(impath)
X.append(im)
seg_labels = np.zeros((height, width, n_classes))
for spath in segpaths:
    mask = cv2.imread(spath, 0)
    seg_labels[:, :, c] += mask
Y.append(seg_labels.reshape(width*height, n_classes))
Why does reshaping directly not work?
You clearly misunderstand the meaning of each operation and the final goal:
final goal: classification for each pixel, i.e. softmax along the semantic class axis
how to achieve this goal in the original code? Let's see the code line by line:
reshape = Reshape((n_classes,self.img_rows * self.img_cols))(conv9) # L1
permute = Permute((2,1))(reshape) # L2
activation = Activation('softmax')(permute) # L3
L1's output dim = n_class-by-n_pixs (n_pixs = img_rows x img_cols)
L2's output dim = n_pixs-by-n_class
L3's output dim = n_pixs-by-n_class
Note the default softmax activation is applied to the last axis, i.e. the axis that n_class stands for, which is the semantic class axis.
Therefore, this original code fulfills the final goal of semantic segmentation.
Let's revisit the code that you want to change, which is
reshape = Reshape((self.img_rows * self.img_cols, n_classes))(conv9) # L4
L4's output dim = n_pixs-by-n_class
My guess is that you think L4's output dim matches L2's, and thus L4 is a short-cut that is equivalent to executing L1 and L2.
However, matching the shape does not necessarily mean matching the physical meaning of axes. Why? A simple example will explain.
Say you have 2 semantic classes and 3 pixels. To see the difference assume all three pixels belong to the same class.
In other words, a ground truth tensor will look like this
#  cls#1 cls#2
[ [0, 1],   # pixel #1
  [0, 1],   # pixel #2
  [0, 1],   # pixel #3
]
Assume you have a perfect network that generates the exact response for each pixel; your solution will nevertheless create a tensor like the one below,
#  cls#1 cls#2
[ [0, 0],   # pixel #1
  [0, 1],   # pixel #2
  [1, 1],   # pixel #3
]
whose shape is the same as the ground truth's, but fails to match the physical meaning of axes.
This further makes the softmax operation meaningless, because it is supposed to be applied along the class dimension, but that dimension does not physically exist here. As a result, it leads to the following erroneous output after applying softmax (values rounded),
#  cls#1 cls#2
[ [0.50, 0.50],   # pixel #1
  [0.27, 0.73],   # pixel #2
  [0.50, 0.50],   # pixel #3
]
which completely messes up the training, even under this ideal assumption.
Therefore, it is a good habit to write down the physical meaning of each axis of a tensor. Whenever you do a reshape operation, ask yourself whether the physical meaning of each axis changes in the way you expect.
For example, if you have a tensor T of shape batch_dim x img_rows x img_cols x feat_dim, you can do many things, and not all of them make sense (because they break the physical meaning of the axes):
(Wrong) reshape it to whatever x feat_dim, because the whatever dimension is meaningless at test time, where the batch_size might be different.
(Wrong) reshape it to batch_dim x feat_dim x img_rows x img_cols, because the 2nd dimension is NOT the feature dimension, and neither are the 3rd and 4th.
(Correct) permute the axes (3,1,2); this gives you a tensor of shape batch_dim x feat_dim x img_rows x img_cols while keeping the physical meaning of each axis.
(Correct) reshape it to batch_dim x whatever x feat_dim. This is also valid, because whatever = img_rows x img_cols is the pixel-location dimension, and the meanings of batch_dim and feat_dim are unchanged.
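For instance, a quick NumPy check of the (Correct) permute option (the shapes here are made up for illustration):

import numpy as np

# hypothetical tensor: batch_dim=2, img_rows=3, img_cols=4, feat_dim=5
T = np.random.rand(2, 3, 4, 5)

# Keras Permute((3, 1, 2)) corresponds to np.transpose with the batch axis kept first
P = np.transpose(T, (0, 3, 1, 2))                 # batch_dim x feat_dim x img_rows x img_cols
print(np.allclose(P[1, :, 2, 3], T[1, 2, 3, :]))  # True: every pixel keeps its feature vector

# a bare reshape to the same shape scrambles which feature belongs to which pixel
R = T.reshape(2, 5, 3, 4)
print(np.allclose(R[1, :, 2, 3], T[1, 2, 3, :]))  # False (in general)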
Your code will still run since the shapes match, but the result (and the backprop) will be different because the values of the tensors will be different. For example:
arr = np.array([[[1,1,1],[1,1,1]],[[2,2,2],[2,2,2]],[[3,3,3],[3,3,3]],[[4,4,4],[4,4,4]]])
arr.shape
>>> (4, 2, 3)
# do reshape, then permute
reshape_1 = arr.reshape((4, 2*3))
np.swapaxes(reshape_1, 1, 0)
>>> array([[1, 2, 3, 4],
           [1, 2, 3, 4],
           [1, 2, 3, 4],
           [1, 2, 3, 4],
           [1, 2, 3, 4],
           [1, 2, 3, 4]])
# do reshape directly
reshape_2 = arr.reshape(2*3, 4)
reshape_2
>>> array([[1, 1, 1, 1],
           [1, 1, 2, 2],
           [2, 2, 2, 2],
           [3, 3, 3, 3],
           [3, 3, 4, 4],
           [4, 4, 4, 4]])
The Reshape and Permute are done to take the softmax at each pixel location. Adding to @meowongac's answer, Reshape preserves the order of the elements. In this case, since the channel dimension has to be moved to the end, Reshape followed by Permute is appropriate.
Consider the case of a (2,2) image with 3 values at each location:
arr = np.array([[[1,1],[1,1]],[[2,2],[2,2]],[[3,3],[3,3]]])
>>> arr.shape
(3, 2, 2)
>>> arr
array([[[1, 1],
        [1, 1]],
       [[2, 2],
        [2, 2]],
       [[3, 3],
        [3, 3]]])
>>> arr[:,0,0]
array([1, 2, 3])
The channel values at each location are [1,2,3]. The goal is to swap the channel axis(length 3) to the end.
>>> arr.reshape((2,2,3))[0,0]
array([1, 1, 1])  # incorrect
>>> arr.transpose((1,2,0))[0,0]  # similar to what Permute does
array([1, 2, 3])  # correct
More examples at this link: https://discuss.pytorch.org/t/how-to-change-shape-of-a-matrix-without-dispositioning-the-elements/30708

Spatial Pyramid Pooling - Input Size Error (? - None)

I've been trying to implement the Spatial Pyramid Pooling (https://arxiv.org/abs/1406.4729), but I've been having a problem with the input size.
My input has shape (batch_size, None, n_feature_maps) and I have the following code:
self.y_conv_unstacked = tf.unstack(self.conv_output, axis=0)
self.y_maxpool = []
for tensor in self.y_conv_unstacked:
    for size_pool in self.out_pool_size:
        self.w_strd = self.w_size = math.ceil(float(tensor.get_shape()[1]) / size_pool)
        self.pad_w = int(size_pool * self.w_size - tensor.get_shape()[1])
        self.padded_tensor = tf.pad(tensor, tf.constant([[0, 0], [0, 0], [0, self.pad_w], [0, 0]]))
        self.max_pool = tf.nn.max_pool(self.padded_tensor, ksize=[1, 1, self.w_size, 1],
                                       strides=[1, 1, self.w_strd, 1], padding='SAME')
        self.spp_tensor = tf.concat([self.spp_tensor, tf.reshape(self.max_pool, [1, size_pool, self.n_fm1])], axis=1)
    self.y_maxpool.append(spp_tensor)
Since the inputs in the batch have different sizes, I am unstacking them and pooling each tensor separately. However, when using tensor.get_shape()[1], it returns "?". If I use tensor.get_shape().as_list()[1], it returns None.
I would like to know how I can work around this nondefined size. Is it possible to get the tensor's shape at runtime?
Edit: Using tf.shape, I get a tensor. How can I use this tensor to create the ksize, strides and paddings I need?
I would like to know how I can work around this nondefined size. Is it possible to get the tensor's shape at runtime?
Use the tf.shape() op to get the dynamic shape of a tensor instead of x.get_shape(), which returns the static shape of x.
This is explained in detail here.
In the above code, replace tensor.get_shape()[1] with tf.shape(tensor)[1].
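Regarding the edit: tf.pad accepts its paddings argument as a tensor, so the horizontal padding can be computed at run time from tf.shape(). A rough sketch, assuming a rank-3 (height, width, channels) tensor named tensor and an example bin count (note that, as far as I know, the ksize and strides of tf.nn.max_pool still have to be static Python ints, so only the padding part can be made dynamic this way):

import tensorflow as tf

size_pool = 4                                          # example bin count
width = tf.shape(tensor)[1]                            # dynamic width, an int32 scalar tensor
w_size = (width + size_pool - 1) // size_pool          # ceil(width / size_pool), computed in the graph
pad_w = size_pool * w_size - width
padded = tf.pad(tensor, [[0, 0], [0, pad_w], [0, 0]])  # pad only the width axis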

Tensorflow, update the Variable to have arbitrary shape

So, according to the documentation, we can use tf.assign with validate_shape=False to change the shape. It does change the shape of the content of the variable, but the shape you can get from get_shape() doesn't get updated. For example:
>>> a = tf.Variable([1, 1, 1, 1])
>>> sess.run(tf.global_variables_initializer())
>>> tf.assign(a, [[1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1]], validate_shape=False).eval()
array([[1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1]], dtype=int32)
>>> a.get_shape()
TensorShape([Dimension(4)])
It's pretty annoying that the later layers of the network base their shapes on the get_shape() value of this variable. So, even though the actual shape is correct, TensorFlow will complain that the dimensions don't match. Any ideas on how to update the "believed" shape of each Variable?
In short: use set_shape to update the static shape of the variable.
You can understand what is going on by reading the TF FAQ:
In TensorFlow, a tensor has both a static (inferred) shape and a dynamic (true) shape. The static shape can be read using the tf.Tensor.get_shape method: this shape is inferred from the operations that were used to create the tensor, and may be partially complete. If the static shape is not fully defined, the dynamic shape of a Tensor t can be determined by evaluating tf.shape(t).
So the static shape was not properly inferred and you should give TF a hint. Luckily the next few lines from the same FAQ tell you what to do:
The tf.Tensor.set_shape method updates the static shape of a Tensor object, and it is typically used to provide additional shape information when this cannot be inferred directly. It does not change the dynamic shape of the tensor.
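To make the FAQ's distinction concrete, here is a quick illustration (the placeholder is just an example, not from the question):

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 4])
print(x.get_shape())   # static shape, known (possibly only partially) at graph-construction time: (?, 4)
dynamic = tf.shape(x)  # dynamic shape: an int32 tensor that is only resolved when the graph runs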
Since validate_shape is set to False, the static shape of the variable doesn't get updated automatically in the graph. A workaround is to set it manually to the new shape (which is known):
import numpy as np
import tensorflow as tf

sess = tf.Session()
a = tf.Variable([1, 1, 1, 1], validate_shape=False)
sess.run(tf.global_variables_initializer())
new_arr_assign = np.array([[1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1]])
tf.assign(a, new_arr_assign, validate_shape=False).eval(session=sess)
a.set_shape(new_arr_assign.shape)
a.get_shape()
# result: TensorShape([Dimension(2), Dimension(7)])

Keras Array Input Error

I get the following error in my Keras model:
ValueError: Error when checking model input: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 6 arrays but instead got the following list of 3 arrays: [array([[ 0, 0, 0, ..., 18, 12, 1],
[ 0, 0, 0, ..., 18, 11, 1],
[ 0, 0, 0, ..., 18, 9, 1],
...,
[ 0, 0, 0, ..., 18, 15, 1],
[ 0, 0, 0, ..., 18, 9, ...
I think the model is misinterpreting the input somehow.
This happens when I feed input to my model. The same input works perfectly well in another program.
It's impossible to diagnose your exact problem without more information.
I usually specify the input_shape parameter of the first layer based on my training data X.
e.g.
model = Sequential()
model.add(Dense(32, input_shape=X.shape[1:]))  # input_shape expects a shape tuple, e.g. X.shape[1:]
I think you'll want X to look something like this:
[
  [[ 0, 0, 0, ..., 18, 11, 1]],
  [[ 0, 0, 0, ..., 18, 9, 1]],
  ...
]
So you could try reshaping it with the following line:
X = np.array([[sample] for sample in X])
The problem really comes from giving the wrong input to the network.
In my case the problem was that my custom image generator was passing the entire dataset as input rather than one batch of image-label pairs at a time. This was because I thought that Keras's generator.flow(x, y, batch_size) already had a yield structure inside; the correct generator structure should be as follows (with a separate yield):
def generator(batch_size):
    (images, labels) = utils.get_data(1000)  # gets 1000 samples from dataset
    labels = to_categorical(labels, 2)
    generator = ImageDataGenerator(featurewise_center=True,
                                   featurewise_std_normalization=True,
                                   rotation_range=90.,
                                   width_shift_range=0.1,
                                   height_shift_range=0.1,
                                   zoom_range=0.2)
    generator.fit(images)
    gen = generator.flow(images, labels, batch_size=32)
    while 1:
        x_batch, y_batch = gen.next()
        yield ([x_batch, y_batch])
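As a usage sketch (the step and epoch counts are placeholders, and Keras 2-style fit_generator arguments are assumed):

model.fit_generator(generator(batch_size=32), steps_per_epoch=100, epochs=10)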
I realize the question is old but it might save some time for someone to find the issue.

How to use tensorflow to implement deconvolution?

I want to use TensorFlow to implement a fully convolutional network. There is a function
tf.nn.conv2d_transpose(value, filter, output_shape, strides, padding, name),
which could be used for bilinear upsampling. However, I am confused about how to use it. The input is an image with a single channel, and the output is also an image with a single channel, whose size is twice that of the input.
I tried to use the function as follows, but got an IndexError: list index out of range:
with tf.name_scope('deconv') as scope:
    deconv = tf.nn.conv2d_transpose(conv6, [3, 3, 1, 1],
                                    [1, 26, 20, 1], 2, padding='SAME', name=None)
Got it! (assuming input_size = [1, 13, 10, 1])
with tf.name_scope('deconv') as scope:
    deconv = tf.nn.conv2d_transpose(input_layer, [3, 3, 1, 1],
                                    [1, 26, 20, 1], [1, 2, 2, 1], padding='SAME', name=None)
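Note that the filter argument of conv2d_transpose is the kernel tensor itself, not just its shape, so a fully runnable version would look roughly like this (the placeholder and the truncated-normal initialization are my own illustrative choices):

import tensorflow as tf

input_layer = tf.placeholder(tf.float32, [1, 13, 10, 1])             # NHWC, single channel
kernel = tf.Variable(tf.truncated_normal([3, 3, 1, 1], stddev=0.1))  # [height, width, out_channels, in_channels]
with tf.name_scope('deconv'):
    deconv = tf.nn.conv2d_transpose(input_layer, kernel,
                                    output_shape=[1, 26, 20, 1],     # twice the spatial size of the input
                                    strides=[1, 2, 2, 1], padding='SAME')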