How to explain the result of tf.map_fn? - tensorflow

Look at the code:
import tensorflow as tf
import numpy as np
elems = tf.ones([1,2,3],dtype=tf.int64)
alternates = tf.map_fn(lambda x: (x, x, x), elems, dtype=(tf.int64, tf.int64, tf.int64))
with tf.Session() as sess:
    print(sess.run(alternates))
The output is:
(array([[[1, 1, 1],
[1, 1, 1]]], dtype=int64), array([[[1, 1, 1],
[1, 1, 1]]], dtype=int64), array([[[1, 1, 1],
[1, 1, 1]]], dtype=int64))
I can't understand the output. Can someone explain it?
Update
elems is a tensor, so it should be unpacked along axis 0, giving [[1,1,1],[1,1,1]]. Then map_fn passes [[1,1,1],[1,1,1]] into lambda x: (x, x, x), which means x = [[1,1,1],[1,1,1]], so I think the output of map_fn should be
[[[1,1,1],[1,1,1]],
[[1,1,1],[1,1,1]],
[[1,1,1],[1,1,1]]]
The shape of the output would be [3,2,3], or a list of tensors of shape (2,3).
But in fact, the output is a tuple of tensors, and the shape of each tensor is [1,2,3].
Or in other words:
import tensorflow as tf
import numpy as np
elems = tf.constant([1,2,3],dtype=tf.int64)
alternates = tf.map_fn(lambda x: (x, 2*x, -x), elems, dtype=(tf.int64, tf.int64, tf.int64))
with tf.Session() as sess:
    print(sess.run(alternates))
Why is the output
(array([1, 2, 3], dtype=int64),
array([2, 4, 6], dtype=int64),
array([-1, -2, -3], dtype=int64))
rather than
(array([1, 2, -1], dtype=int64),
array([2, 4, -2], dtype=int64),
array([3, 6, -3], dtype=int64))
The two questions are the same.
Update 2
import tensorflow as tf
import numpy as np
elems = [tf.constant([1,2,3],dtype=tf.int64)]
alternates = tf.map_fn(lambda x: x, elems, dtype=tf.int64)
with tf.Session() as sess:
    print(sess.run(alternates))
elems is a list of tensors, so according to the API, tf.constant([1,2,3],dtype=tf.int64) should be unpacked along axis 0, and map_fn should work like [x for x in [1,2,3]], but in fact it raises an error:
ValueError: The two structures don't have the same nested structure. First struc
ture: <dtype: 'int64'>, second structure: [<tf.Tensor 'map/while/TensorArrayRead
V3:0' shape=() dtype=int64>].
What's wrong?
Update 3
import tensorflow as tf
import numpy as np
elems = (tf.constant([1,2,3],dtype=tf.int64),tf.constant([1,2,3],dtype=tf.int64))
alternates = tf.map_fn(lambda x: x, elems, dtype=(tf.int64, tf.int64))
with tf.Session() as sess:
    print(sess.run(alternates))
The output is
(array([1, 2, 3], dtype=int64), array([1, 2, 3], dtype=int64))
It seems that elems aren't unpacked, why?
import tensorflow as tf
import numpy as np
elems = (tf.constant([1,2,3],dtype=tf.int64),tf.constant([1,2,3],dtype=tf.int64))
alternates = tf.map_fn(lambda x: [x], elems, dtype=(tf.int64, tf.int64))
with tf.Session() as sess:
    print(sess.run(alternates))
It raises an error:
TypeError: The two structures don't have the same sequence type. First structure
has type <class 'tuple'>, while second structure has type <class 'list'>.
Who can tell me how tf.map_fn works?

First,
elems = tf.ones([1,2,3],dtype=tf.int64)
elems is a 3-dimensional tensor with shape 1x2x3 full of ones, that is:
[[[1, 1, 1],
[1, 1, 1]]]
Then,
alternates = tf.map_fn(lambda x: (x, x, x), elems, dtype=(tf.int64, tf.int64, tf.int64))
alternates is a tuple of three tensors with the same shape as elems, each of which is built according to the given function. Since the function simply returns a tuple repeating its input three times, that means that the three tensors will be the same as elems. If the function were lambda x: (x, 2 * x, -x) then the first output tensor would be the same as elems, the second would be the double of elems and the third one the opposite.
In all these cases it is preferable to use regular operations instead of tf.map_fn; however, there may be cases where you have a function that accepts tensors with N dimensions and a tensor with N + 1 dimensions that you want it applied to.
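For instance, here is a minimal sketch of that "N vs. N + 1" situation (illustrative only: tf.matrix_inverse in fact accepts batched input directly, but it stands in for any function written for a single matrix):
import tensorflow as tf

batch = tf.constant([[[2., 0.], [0., 2.]],
                     [[1., 0.], [0., 4.]]])
# Apply a "one matrix at a time" function across the extra leading dimension.
inverses = tf.map_fn(tf.matrix_inverse, batch)
with tf.Session() as sess:
    print(sess.run(inverses))  # [[[0.5 0.] [0. 0.5]], [[1. 0.] [0. 0.25]]]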
UPDATE:
I think you are thinking of tf.map_fn "the other way around", so to speak. There is not a one-to-one correspondence between the number of elements or rows in the tensor and the number of outputs of the function; in fact, you could pass a function returning a tuple with as many elements as you want.
Taking your last example:
elems = tf.constant([1,2,3],dtype=tf.int64)
alternates = tf.map_fn(lambda x: (x, 2*x, -x), elems, dtype=(tf.int64, tf.int64, tf.int64))
tf.map_fn first splits elems along the first axis, that is, into 1, 2 and 3, and applies the function to each of them, getting:
(1, 2, -1)
(2, 4, -2)
(3, 6, -3)
Note that, as I said, each of these tuples could have as many elements as you wanted. Now, the final output is produced by concatenating the results in the same position; so you get:
[1, 2, 3]
[2, 4, 6]
[-1, -2, -3]
Again, if the function produced tuples with more elements you would get more output tensors.
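As a sanity check, this split-apply-stack behaviour can be reproduced by hand with tf.unstack and tf.stack (a rough sketch only; tf.map_fn itself is built on a while_loop with TensorArrays, which is what makes it work when the first dimension is not known statically):
import tensorflow as tf

elems = tf.constant([1, 2, 3], dtype=tf.int64)
# Apply the function to each slice along the first axis...
results = [(x, 2 * x, -x) for x in tf.unstack(elems)]
# ...then stack the outputs sitting in the same position back together.
alternates = tuple(tf.stack(list(t)) for t in zip(*results))
with tf.Session() as sess:
    print(sess.run(alternates))
# (array([1, 2, 3]), array([2, 4, 6]), array([-1, -2, -3]))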
UPDATE 2:
About your new example:
import tensorflow as tf
import numpy as np
elems = (tf.constant([1,2,3],dtype=tf.int64),tf.constant([1,2,3],dtype=tf.int64))
alternates = tf.map_fn(lambda x: x, elems, dtype=(tf.int64, tf.int64))
with tf.Session() as sess:
    print(sess.run(alternates))
The documentation says:
This method also allows multi-arity elems and output of fn. If elems is a (possibly nested) list or tuple of tensors, then each of these tensors must have a matching first (unpack) dimension. The signature of fn may match the structure of elems. That is, if elems is (t1, [t2, t3, [t4, t5]]), then an appropriate signature for fn is: fn = lambda (t1, [t2, t3, [t4, t5]]):.
Here elems is a tuple of two tensors with the same size in the first dimension, as required. tf.map_fn takes one element of each input tensor at a time (so a tuple of two elements) and applies the given function to it, which should return the same structure that you passed in dtype (a tuple of two elements, too); if you don't give a dtype, then the expected output structure is the same as that of the input (again, a tuple of two elements, so in your case dtype is optional). Anyway, it goes like this:
f((1, 1)) -> (1, 1)
f((2, 2)) -> (2, 2)
f((3, 3)) -> (3, 3)
These results are combined by concatenating the corresponding elements in the structure; in this case, all the numbers in the first position produce the first output and all the numbers in the second position produce the second output. The result is, finally, the requested structure (the two-element tuple) filled with these concatenations:
([1, 2, 3], [1, 2, 3])
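Since dtype is optional in this case, the same example can be written without it (a minimal check in the question's TF 1.x style):
import tensorflow as tf

elems = (tf.constant([1, 2, 3], dtype=tf.int64),
         tf.constant([1, 2, 3], dtype=tf.int64))
# fn returns the same structure as elems, so dtype can be omitted.
alternates = tf.map_fn(lambda x: x, elems)
with tf.Session() as sess:
    print(sess.run(alternates))  # (array([1, 2, 3]), array([1, 2, 3]))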

Your input elems have shape (1,2,3) and look like this:
[[[1, 1, 1],
[1, 1, 1]]]
It's not a matrix containing the values 1, 2, 3, because you created it with tf.ones(), which makes a tensor filled with 1s with the shape you pass as a parameter.
Replying to the Update:
map_fn is applied to elems itself.
According to tf.map_fn's documentation:
elems: A tensor or (possibly nested) sequence of tensors, each of which will be unpacked along their first dimension. The nested sequence of the resulting slices will be applied to fn.
From what I understand there, the function expects a tensor or a list of tensors and supposedly slices them and applies the function to each element. However, from the results it seems that if you pass in a single tensor, that tensor is the element the function is applied to directly, so x has shape (1,2,3) when the lambda function is called.
The function then creates a tuple with 3 copies of your (1,2,3) matrix (which is the array(...) in your output).
Restructuring the output and adding indentation to make it clearer, it looks as follows:
(
array( # first copy of `x`
[
[
[1, 1, 1],
[1, 1, 1]
]
], dtype=int64
),
array( # second copy of `x`
[
[
[1, 1, 1],
[1, 1, 1]
]
], dtype=int64
),
array( # third copy of `x`
[
[
[1, 1, 1],
[1, 1, 1]
]
], dtype=int64
),
) # end of the tuple
Update 2:
My suspicion is that you ran into a bug. If you define elems as a list, you get the error, but if you define it as a tuple with elems = (tf.constant([1,2,3],dtype=tf.int64)), the code works as expected. Different handling of tuples and lists is very suspicious... which is why I believe it's a bug.
As @mrry pointed out, in my example with the tuple I missed a comma (and thus elems was the tensor itself and not a tuple containing the tensor).

Related

Why Reshape and Permute for segmentation with unet?

I am doing an image semantic segmentation job with unet. I am confused about the last layers for pixel classification. The Unet code is like this:
...
reshape = Reshape((n_classes,self.img_rows * self.img_cols))(conv9)
permute = Permute((2,1))(reshape)
activation = Activation('softmax')(permute)
model = Model(input = inputs, output = activation)
return model
...
Can I just reshape without using Permute like this?
reshape = Reshape((self.img_rows * self.img_cols, n_classes))(conv9)
Updated:
I found the training result is not right when reshaping directly:
reshape = Reshape((self.img_rows * self.img_cols, n_classes))(conv9)  # the loss does not converge
My groundtruth is generated like this:
X = []
Y = []
im = cv2.imread(impath)
X.append(im)
seg_labels = np.zeros((height, width, n_classes))
for spath in segpaths:
    mask = cv2.imread(spath, 0)
    seg_labels[:, :, c] += mask
Y.append(seg_labels.reshape(width*height, n_classes))
Why does reshaping directly not work?
You clearly misunderstand the meaning of each operation and the final goal:
final goal: classification for each pixel, i.e. softmax along the semantic class axis
how to achieve this goal in the original code? Let's see the code line by line:
reshape = Reshape((n_classes,self.img_rows * self.img_cols))(conv9) # L1
permute = Permute((2,1))(reshape) # L2
activation = Activation('softmax')(permute) # L3
L1's output dim = n_class-by-n_pixs, (n_pixs=img_rows x img_cols)
L2's output dim = n_pixs-by-n_class
L3's output dim = n_pixs-by-n_class
Note the default softmax activation is applied to the last axis, i.e. the axis that n_class stands for, which is the semantic class axis.
Therefore, this original code fulfills the final goal of semantic segmentation.
Let's revisit the code that you want to change, which is
reshape = Reshape((self.img_rows * self.img_cols, n_classes))(conv9) # L4
L4's output dim = n_pixs-by-n_class
My guess is that you think L4's output dim matches L2's, and thus L4 is a short-cut that is equivalent to executing L1 and L2.
However, matching the shape does not necessarily mean matching the physical meaning of axes. Why? A simple example will explain.
Say you have 2 semantic classes and 3 pixels. To see the difference assume all three pixels belong to the same class.
In other words, a ground truth tensor will look like this
# cls#1 cls#2
[ [0, 1], # pixel #1
[0, 1], # pixel #2
[0, 1], # pixel #3
]
Assume you have a perfect network that generates the exact response for each pixel; your solution will still create a tensor like the one below
# cls#1 cls#2
[ [0, 0], # pixel #1
[0, 1], # pixel #2
[1, 1], # pixel #3
]
whose shape is the same as the ground truth's, but fails to match the physical meaning of axes.
This further makes the softmax operation meaningless, because it is supposed to apply to the class dimension, but this dimension does not physically exist. As a result, it leads to the following erroneous output after applying softmax,
# cls#1 cls#2
[ [0.5, 0.5], # pixel #1
[0, 1], # pixel #2
[0.5, 0.5], # pixel #3
]
which completely messes up the training even under this ideal assumption.
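A quick NumPy check of this 2-class, 3-pixel example (a sketch; the scores are laid out as (n_classes, n_pixs), like L1's output, and the network is assumed to be perfect):
import numpy as np

scores = np.array([[0, 0, 0],    # class #1 score for pixels #1..#3
                   [1, 1, 1]])   # class #2 score for pixels #1..#3

print(scores.T)                # permute: [[0 1] [0 1] [0 1]] -- matches the ground truth
print(scores.reshape(3, 2))    # direct reshape: [[0 0] [0 1] [1 1]] -- the scrambled tensor above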
Therefore, it is a good habit to write down the physical meaning of each axis of a tensor. When you do any tensor reshape operation, ask yourself whether the physical meaning of an axis is changed in your expected way.
For example, if you have a tensor T of shape batch_dim x img_rows x img_cols x feat_dim, you can do many things, and not all of them make sense (because they can break the physical meaning of the axes):
(Wrong) reshape it to whatever x feat_dim, because the whatever dimension is meaningless at test time, where the batch_size might be different.
(Wrong) reshape it to batch_dim x feat_dim x img_rows x img_cols, because the 2nd dimension is NOT the feature dimension and neither for the 3rd and 4th dimension.
(Correct) permute the axes to (3,1,2), which gives you a tensor of shape batch_dim x feat_dim x img_rows x img_cols while keeping the physical meaning of each axis.
(Correct) reshape it to batch_dim x whatever x feat_dim. This is also valid, because whatever = img_rows x img_cols corresponds to the pixel-location dimension, and the meanings of both batch_dim and feat_dim are unchanged.
Your code will still be runnable since the shape will be the same, but the result (backprops) will be different since the values of tensors will be different. For example:
arr = np.array([[[1,1,1],[1,1,1]],[[2,2,2],[2,2,2]],[[3,3,3],[3,3,3]],[[4,4,4],[4,4,4]]])
arr.shape
>>>(4, 2, 3)
#do reshape, then permute
reshape_1 = arr.reshape((4, 2*3))
np.swapaxes(reshape_1, 1, 0)
>>>array([[1, 2, 3, 4],
[1, 2, 3, 4],
[1, 2, 3, 4],
[1, 2, 3, 4],
[1, 2, 3, 4],
[1, 2, 3, 4]])
#do reshape directly
reshape_2 = arr.reshape(2*3, 4)
reshape_2
>>>array([[1, 1, 1, 1],
[1, 1, 2, 2],
[2, 2, 2, 2],
[3, 3, 3, 3],
[3, 3, 4, 4],
[4, 4, 4, 4]])
The Reshape and Permute are done to take the softmax at each pixel location. Adding to @meowongac's answer, Reshape preserves the order of the elements. In this case, since the channel dimension has to be swapped to the end, Reshape followed by Permute is appropriate.
Consider the case of a (2,2) image with 3 values at each location:
arr = np.array([[[1,1],[1,1]],[[2,2],[2,2]],[[3,3],[3,3]]])
>>> arr.shape
(3, 2, 2)
>>> arr
array([[[1, 1],
[1, 1]],
[[2, 2],
[2, 2]],
[[3, 3],
[3, 3]]])
>>> arr[:,0,0]
array([1, 2, 3])
The channel values at each location are [1,2,3]. The goal is to move the channel axis (length 3) to the end.
>>> arr.reshape((2,2,3))[0,0]
array([1, 1, 1]) # incorrect
>>> arr.transpose((1,2,0))[0,0] # similar to what permute does.
array([1, 2, 3]) # correct
More examples at this link: https://discuss.pytorch.org/t/how-to-change-shape-of-a-matrix-without-dispositioning-the-elements/30708

Tensorflow conv2d on RGB image

From the accepted answer in this question, given the following input and kernel matrices, the output of tf.nn.conv2d is
[[14 6]
[6 12]]
which makes sense. However, when I make the input and kernel matrices have 3 channels each (by repeating each original matrix) and run the same code:
# the previous input
i_grey = np.array([
    [4, 3, 1, 0],
    [2, 1, 0, 1],
    [1, 2, 4, 1],
    [3, 1, 0, 2]
])
# copy to 3-dimensions
i_rgb = np.repeat( np.expand_dims(i_grey, axis=0), 3, axis=0 )
# convert to tensor
i_rgb = tf.constant(i_rgb, dtype=tf.float32)
# make kernel depth match input; same process as input
k = np.array([
    [1, 0, 1],
    [2, 1, 0],
    [0, 0, 1]
])
k_rgb = np.repeat( np.expand_dims(k, axis=0), 3, axis=0 )
# convert to tensor
k_rgb = tf.constant(k_rgb, dtype=tf.float32)
here's what my input and kernel matrices look like at this point
# reshape input to format: [batch, in_height, in_width, in_channels]
image_rgb = tf.reshape(i_rgb, [1, 4, 4, 3])
# reshape kernel to format: [filter_height, filter_width, in_channels, out_channels]
kernel_rgb = tf.reshape(k_rgb, [3, 3, 3, 1])
conv_rgb = tf.squeeze( tf.nn.conv2d(image_rgb, kernel_rgb, [1,1,1,1], "VALID") )
with tf.Session() as sess:
    conv_result = sess.run(conv_rgb)
    print(conv_result)
I get the final output:
[[35. 15.]
[35. 26.]]
But I was expecting the original output*3:
[[42. 18.]
[18. 36.]]
because from my understanding, each channel of the kernel is convolved with each channel of the input, and the resultant matrices are summed to get the final output.
Am I missing something from this process or the tensorflow implementation?
Reshape is a tricky function. It will produce the shape you want, but it can easily jumble the values together in the process. In cases like yours, one should avoid using reshape if at all possible.
In this particular case, it is better to add the new axis and then duplicate the arrays along it. With the [batch, in_height, in_width, in_channels] layout, channels is the last dimension, and that is the axis that should be used in the repeat() call. The following code should better reflect the logic behind it:
i_grey = np.expand_dims(i_grey, axis=0) # add batch dim
i_grey = np.expand_dims(i_grey, axis=3) # add channel dim
i_rgb = np.repeat(i_grey, 3, axis=3 ) # duplicate along channels dim
And likewise with filters:
k = np.expand_dims(k, axis=2) # input channels dim
k = np.expand_dims(k, axis=3) # output channels dim
k_rgb = np.repeat(k, 3, axis=2) # duplicate along the input channels dim
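Putting the pieces together, here is a minimal end-to-end sketch (TF 1.x, with the values from the question); since the three channels are identical copies and their products are summed, the result should be three times the single-channel output:
import numpy as np
import tensorflow as tf

i_grey = np.array([[4, 3, 1, 0],
                   [2, 1, 0, 1],
                   [1, 2, 4, 1],
                   [3, 1, 0, 2]], dtype=np.float32)
k = np.array([[1, 0, 1],
              [2, 1, 0],
              [0, 0, 1]], dtype=np.float32)

# Input laid out as [batch, in_height, in_width, in_channels]
i_rgb = np.repeat(np.expand_dims(np.expand_dims(i_grey, axis=0), axis=3), 3, axis=3)
# Kernel laid out as [filter_height, filter_width, in_channels, out_channels]
k_rgb = np.repeat(np.expand_dims(np.expand_dims(k, axis=2), axis=3), 3, axis=2)

conv_rgb = tf.squeeze(tf.nn.conv2d(tf.constant(i_rgb), tf.constant(k_rgb),
                                   [1, 1, 1, 1], "VALID"))
with tf.Session() as sess:
    print(sess.run(conv_rgb))  # [[42. 18.] [18. 36.]] -- three times the single-channel result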

Converting sparse tensor dense shape into an integer value in tensorflow

If I want to get the shape of a normal tensor in TensorFlow and store the values in a list, I would use the following:
a_shape=[a.shape[0].value , a.shape[1].value]
If I'm not mistaken, using .value converts the element in the tensor to a real number.
With sparse tensors, I type the following
a_sparse_shape=[a.dense_shape[0].value, a.dense_shape[1].value]
However, I get the error message
" 'Tensor' object has no attribute 'value' "
Does anyone have any alternate solutions?
Yes, there is an alternative:
import tensorflow as tf
tensor = tf.random_normal([2, 2, 2, 3])
tensor_shape = tensor.get_shape().as_list()
print(tensor_shape)
# [2, 2, 2, 3]
Same for sparse tensors:
sparse_tensor = tf.SparseTensor(indices=[[0, 0], [1, 1]],
                                values=[1, 2],
                                dense_shape=[2, 2])
sparse_tensor_shape = sparse_tensor.get_shape().as_list()
print(sparse_tensor_shape)
# [2, 2]
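If the static shape is not fully defined (for example, when the sparse tensor comes from a tf.sparse_placeholder), get_shape() may contain unknown dimensions. In that case one option, sketched below in the same TF 1.x style, is to evaluate the dense_shape tensor at runtime:
import tensorflow as tf

sparse_tensor = tf.SparseTensor(indices=[[0, 0], [1, 1]],
                                values=[1, 2],
                                dense_shape=[2, 2])
with tf.Session() as sess:
    dynamic_shape = sess.run(sparse_tensor.dense_shape)  # numpy array, e.g. [2 2]
    print(dynamic_shape.tolist())
# [2, 2]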

How do I select certain columns of a 2D tensor in TensorFlow?

As generalized slicing is being worked on in this issue, what would be the best way to achieve an op gathering columns of a 2D tensor (matrix)? For example, for tensor t:
1 2 3 4
5 6 7 8
and indices [1,3], I would like to get:
2 4
6 8
which is equivalent to numpy t[:, [1,3]].
Meanwhile the gather method has an axis parameter.
import tensorflow as tf
params = tf.constant([[1,2,3],[4,5,6]])
indices = [0,2]
op = tf.gather(params, indices, axis=1)
produces the output
[[1 3]
[4 6]]
There is a function named tf.nn.embedding_lookup(params, ind) which retrieves the rows of the params tensor.
To achieve what you want, we can first transpose the tensor t from which you want to select certain columns. Then we look up the rows of tf.transpose(t) (the columns of t). After the selection, we transpose the result back.
import tensorflow as tf
t = tf.constant([[1, 2, 3],
                 [4, 5, 6]])
ind = tf.constant([0, 2])
result = tf.transpose(tf.nn.embedding_lookup(tf.transpose(t), ind))
with tf.Session() as sess:
    print(sess.run(result))
So far, I have created a workaround by flattening the input and using gather:
def gather_cols(params, indices, name=None):
    """Gather columns of a 2D tensor.

    Args:
        params: A 2D tensor.
        indices: A 1D tensor. Must be one of the following types: ``int32``, ``int64``.
        name: A name for the operation (optional).

    Returns:
        A 2D Tensor. Has the same type as ``params``.
    """
    with tf.op_scope([params, indices], name, "gather_cols") as scope:
        # Check input
        params = tf.convert_to_tensor(params, name="params")
        indices = tf.convert_to_tensor(indices, name="indices")
        try:
            params.get_shape().assert_has_rank(2)
        except ValueError:
            raise ValueError('\'params\' must be 2D.')
        try:
            indices.get_shape().assert_has_rank(1)
        except ValueError:
            raise ValueError('\'indices\' must be 1D.')

        # Define op
        p_shape = tf.shape(params)
        p_flat = tf.reshape(params, [-1])
        i_flat = tf.reshape(tf.reshape(tf.range(0, p_shape[0]) * p_shape[1],
                                       [-1, 1]) + indices, [-1])
        return tf.reshape(tf.gather(p_flat, i_flat),
                          [p_shape[0], -1])
Which for:
params = tf.constant([[1, 2, 3],
                      [4, 5, 6]])
indices = [0, 2]
op = gather_cols(params, indices)
produces the expected output:
[[1 3]
[4 6]]

reduce_sum by certain dimension

I have two embedding tensors A and B, which look like
[
[1,1,1],
[1,1,1]
]
and
[
[0,0,0],
[1,1,1]
]
what I want to do is calculate the L2 distance d(A,B) element-wise.
First I did a tf.square(tf.sub(lhs, rhs)) to get
[
[1,1,1],
[0,0,0]
]
and then I want to do an element-wise reduce which returns
[
3,
0
]
but tf.reduce_sum does not allow me to reduce by row. Any input would be appreciated. Thanks.
Add the reduction_indices argument with a value of 1, e.g.:
tf.reduce_sum( tf.square( tf.sub( lhs, rhs) ), 1 )
That should produce the result you're looking for. Here is the documentation on reduce_sum().
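For reference, a minimal runnable check with the arrays from the question (tf.sub was later renamed to tf.subtract, so adjust the call to your TensorFlow version):
import tensorflow as tf

lhs = tf.constant([[1, 1, 1], [1, 1, 1]], dtype=tf.float32)
rhs = tf.constant([[0, 0, 0], [1, 1, 1]], dtype=tf.float32)
# Sum the squared differences along axis 1, i.e. one value per row.
row_sums = tf.reduce_sum(tf.square(tf.subtract(lhs, rhs)), 1)
with tf.Session() as sess:
    print(sess.run(row_sums))  # [3. 0.]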
According to the TensorFlow documentation, the reduce_sum function takes the following arguments:
tf.reduce_sum(input_tensor, axis=None, keep_dims=False, name=None, reduction_indices=None)
However, reduction_indices has been deprecated, so it is better to use axis instead. If axis is not set, all of the dimensions are reduced.
As an example, this is taken from the documentation:
# 'x' is [[1, 1, 1]
# [1, 1, 1]]
tf.reduce_sum(x) ==> 6
tf.reduce_sum(x, 0) ==> [2, 2, 2]
tf.reduce_sum(x, 1) ==> [3, 3]
tf.reduce_sum(x, 1, keep_dims=True) ==> [[3], [3]]
tf.reduce_sum(x, [0, 1]) ==> 6
The above requirement can be written in this manner:
import numpy as np
import tensorflow as tf
a = np.array([[1,7,1],[1,1,1]])
b = np.array([[0,0,0],[1,1,1]])
xtr = tf.placeholder("float", [None, 3])
xte = tf.placeholder("float", [None, 3])
pred = tf.reduce_sum(tf.square(tf.subtract(xtr, xte)),1)
# Initializing the variables
init = tf.global_variables_initializer()
# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    nn_index = sess.run(pred, feed_dict={xtr: a, xte: b})
    print(nn_index)
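For reference, with the a and b arrays above the result is [51., 0.]: the first rows differ by (1, 7, 1), so the squared distance is 1 + 49 + 1 = 51, while the second rows are identical.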