Per pixel softmax for fully convolutional network - tensorflow

I'm trying to implement something like a fully convolutional network, where the last convolution layer uses filter size 1x1 and outputs a 'score' tensor. The score tensor has shape [Batch, height, width, num_classes].
My question is: which TensorFlow function applies the softmax operation to each pixel, independently of the other pixels? The tf.nn.softmax op does not seem to be meant for this.
If no such op is available, I guess I will have to write one myself.
Thanks!
UPDATE: if I do have to implement it myself, I think I need to reshape the input tensor to [N, num_classes], where N = Batch x width x height, apply tf.nn.softmax, and then reshape it back. Does that make sense?

Reshaping it to 2d and then reshaping it back, like you guessed, is the right approach.

You can use this function.
I found it by searching GitHub.
import tensorflow as tf

"""
Multi dimensional softmax,
refer to https://github.com/tensorflow/tensorflow/issues/210
compute softmax along the dimension of target
the native softmax only supports batch_size x dimension
"""
def softmax(target, axis, name=None):
    with tf.name_scope(name, 'softmax', values=[target]):
        max_axis = tf.reduce_max(target, axis, keep_dims=True)
        target_exp = tf.exp(target - max_axis)
        normalize = tf.reduce_sum(target_exp, axis, keep_dims=True)
        softmax = target_exp / normalize
    return softmax
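
To check that the two approaches in this thread agree, here is a minimal NumPy sketch (NumPy is used as a stand-in for TensorFlow so that it runs on its own). The function names softmax_along_axis and softmax_via_reshape are made up for this illustration. It compares the axis-wise softmax with the "reshape to [N, num_classes] and back" approach on a random score tensor.

```python
import numpy as np

def softmax_along_axis(scores, axis=-1):
    """Numerically stable softmax along one axis (hypothetical helper)."""
    shifted = scores - np.max(scores, axis=axis, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=axis, keepdims=True)

def softmax_via_reshape(scores):
    """Flatten to [N, num_classes], apply a 2-D softmax, reshape back."""
    num_classes = scores.shape[-1]
    flat = scores.reshape(-1, num_classes)
    return softmax_along_axis(flat, axis=1).reshape(scores.shape)

rng = np.random.default_rng(0)
scores = rng.normal(size=(2, 4, 5, 3))  # [batch, height, width, num_classes]

probs = softmax_along_axis(scores, axis=-1)
probs_reshaped = softmax_via_reshape(scores)
```

Both results have the same shape as the input, and the class probabilities at every pixel sum to 1.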

Related

Keras/TensorFlow: What is the order of the weight tensor dimensions of a convolutional layer?

In channels_last format, the shape of the data tensor is (batch_size, height, width, channels) and the shape of the weight tensor is apparently (see reference 2) (rows, cols, input_depth, output_depth).
In channels_first format, the shape of the data tensor is (batch_size, channels, height, width) and the shape of the weight tensor is what?
I've looked high and low for the answer to that question. When I run my code and use model.get_weights() to get the weight and bias tensors, it appears that the format of the weight tensors is the same in channels_first as in channels_last. Yet, when I output the weight tensors to a file and read them back into my C/C++ code which is hand-crafted and doesn't use TensorFlow, it doesn't appear to be working. The results are numerically nonsensical. Maybe there is some other problem, but I would like to obtain a definitive answer to this question.
BTW, the reason I'm switching between channels_last and channels_first is that I need to be able to develop my code on a CPU machine and then run large training sessions on a GPU machine.
Any help is appreciated.
References:
Data tensor shape is explained here.
Weight tensor shape is partially explained here.
You can find the answer in the TF/Keras source code, in keras/keras/layers/convolutional/base_conv.py. The data_format setting (channels_first or channels_last) only takes effect in the forward computation; in the weight definition, the kernel shape is always:
kernel_shape = self.kernel_size + (input_channel // self.groups, self.filters)
That is why model.get_weights() shows the same weight format for both channels_first and channels_last.
In detail, the convolution is ultimately performed by conv1d, conv2d, conv3d, etc. in gen_nn_ops, which are defined and executed in C/C++. Each of these operations takes data_format to interpret the inputs, but not the kernels (weights/filters).
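
For the hand-written C/C++ side, the following NumPy sketch may help as a reference. It assumes the kernel layout (rows, cols, input_depth, output_depth) described above and assumes the weights were written to the file in C (row-major) order; conv2d_valid and kernel_offset are hypothetical helpers, not TensorFlow or Keras functions. It shows a naive convolution on a channels_last example and the flat offset a C program would use to index the same kernel.

```python
import numpy as np

def conv2d_valid(x_hwc, kernel):
    """Naive 'valid' convolution (cross-correlation), stride 1, one example.

    x_hwc:  (height, width, in_channels)
    kernel: (rows, cols, in_channels, out_channels)
    """
    rows, cols, in_ch, out_ch = kernel.shape
    out_h = x_hwc.shape[0] - rows + 1
    out_w = x_hwc.shape[1] - cols + 1
    out = np.zeros((out_h, out_w, out_ch))
    for i in range(out_h):
        for j in range(out_w):
            patch = x_hwc[i:i + rows, j:j + cols, :]
            # Sum over rows, cols and in_channels; one value per output channel.
            out[i, j, :] = np.tensordot(patch, kernel, axes=3)
    return out

def kernel_offset(r, c, i, o, shape):
    """Flat index of kernel[r, c, i, o] in a row-major (C order) buffer."""
    rows, cols, in_ch, out_ch = shape
    return ((r * cols + c) * in_ch + i) * out_ch + o

rng = np.random.default_rng(0)
kernel = rng.normal(size=(3, 3, 2, 4))
x = rng.normal(size=(6, 6, 2))

y = conv2d_valid(x, kernel)
flat = kernel.ravel()  # the order a row-major dump would have
```

If the C/C++ code reads the weights with a different index order than kernel_offset, the results will look numerically nonsensical even though the file contents are correct.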

Convert 2D Convolutional Neural Networks to 1D Convolutional Neural Networks in Tensorflow

Say I have extracted some features and the result is 10x10 data (maybe an image or a cepstrogram).
Usually I would feed this into my 2D conv and I'll be on my way.
My question is: if I had to convert this into a 1D input of 100 values, what disadvantages would I get, besides the obvious one that my filter would no longer detect the surrounding neighbors but only the previous and next values, which might lead to worse performance?
And if I had to do this anyway, would I just reshape, use a Reshape layer, or use a Permute layer?
Thanks
Yes, you are correct regarding the GNA: our Intel GNA hardware natively supports only 1D convolutions, and 2D convolution support is experimental.
This article (GNA Plugin - OpenVINO™ Toolkit) specifies the steps to add Permute layers before or after convolutions.
You could try both methods and see which one works for you.
Generally, a 1D convolution in TensorFlow is implemented as a 2D convolution wrapped in reshape layers, which add an H dimension before the 2D convolution and remove it afterwards.
At the same time, MO inserts permutes before and after the reshape layers, since they change the interpretation of the data.
For the advantages and disadvantages of 2D/1D CNNs, you may refer to this detailed thread.
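
The idea of a 1D convolution being a 2D convolution wrapped in reshapes can be sketched in NumPy as follows. This is only an illustration of the equivalence, not how TensorFlow or the GNA plugin implement it; correlate2d_valid is a hypothetical helper written for this example.

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Naive 2D 'valid' cross-correlation, stride 1, single channel."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

signal = np.arange(10, dtype=np.float64)
kernel_1d = np.array([1.0, 0.0, -1.0])

# Direct 1D cross-correlation.
direct = np.correlate(signal, kernel_1d, mode='valid')

# Same result via 2D: add a height dimension of size 1, convolve, remove it.
wrapped = correlate2d_valid(signal[np.newaxis, :], kernel_1d[np.newaxis, :])[0]
```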
In TensorFlow, these are the steps to build a CNN architecture:
1. Reshape the input if necessary using tf.reshape() to match the convolutional layer you intend to build (for example, for a 2D convolution, reshape each example into a three-dimensional [height, width, channels] format).
2. Create a convolutional layer using tf.nn.conv1d(), tf.nn.conv2d(), or tf.nn.conv3d(), depending on the dimensionality of the input.
3. Create a pooling layer using tf.nn.max_pool().
4. Repeat steps 2 and 3 for additional convolution and pooling layers.
5. Reshape the output of the convolution and pooling layers, flattening it to prepare for the fully connected layer.
6. Create a fully connected layer using tf.matmul(), add an activation using, for example, tf.nn.relu(), and apply dropout using tf.nn.dropout().
7. Create a final layer for class prediction, again using tf.matmul().
8. Store weights and biases using TensorFlow variables.
These are just the basic steps to create the CNN model; there are additional steps to define training and evaluation, execute the model, and tune it.
In step 2 of CNN development you create a 2D convolutional layer using tf.nn.conv2d(); this function computes a 2-D convolution given 4-D input and filter tensors.
So if you have a 1D vector, as in the MNIST dataset examples with 784 features, you can convert it to the 4-D input required by conv2d() using the TensorFlow reshape method. The reshape matches the picture format [Height x Width x Channel], so the input tensor becomes 4-D: [Batch Size, Height, Width, Channel]:
x = tf.reshape(x, shape=[-1, 28, 28, 1])
where x is a placeholder vector:
x = tf.placeholder(tf.float32, [None, num_input])
You may refer to the official TensorFlow documentation.
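
As a rough illustration of what these reshapes do to the data, here is a NumPy sketch (np.reshape follows the same row-major convention as tf.reshape; the array names and sizes are made up for the example). It shows the 784-to-28x28x1 reshape from above, and the opposite direction asked about in the question, where a 10x10 feature map is flattened to 100 steps and vertical neighbors end up 10 positions apart.

```python
import numpy as np

# A batch of 3 flat MNIST-like vectors with 784 features each.
batch = np.arange(3 * 784, dtype=np.float32).reshape(3, 784)
images = batch.reshape(-1, 28, 28, 1)  # mirrors tf.reshape(x, shape=[-1, 28, 28, 1])

# The opposite direction: 10x10 feature maps flattened to 100 steps, 1 channel.
features_2d = np.arange(2 * 10 * 10, dtype=np.float32).reshape(2, 10, 10, 1)
features_1d = features_2d.reshape(-1, 100, 1)

# Element (row 3, col 4) and its vertical neighbor (row 4, col 4)
# land at these positions in the flattened sequence.
pos_a = 3 * 10 + 4
pos_b = 4 * 10 + 4
```

A 1D filter of width 3 sliding over features_1d therefore never sees pos_a and pos_b together, which is the loss of neighborhood information mentioned in the question.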

Tensorflow weighted vs sigmoid cross-entropy loss

I am trying to implement multi-label classification using TensorFlow (i.e., each output pattern can have many active units). The problem has imbalanced classes (i.e., many more zeros than ones in the label distribution, which makes label patterns very sparse).
The best way to tackle the problem should be to use the tf.nn.weighted_cross_entropy_with_logits function. However, I get this runtime error:
ValueError: Tensor conversion requested dtype uint8 for Tensor with dtype float32
I can't understand what is wrong here. As input to the loss function, I pass the labels tensor, the logits tensor, and the positive class weight, which is a constant:
positive_class_weight = 10
loss = tf.nn.weighted_cross_entropy_with_logits(targets=labels, logits=logits, pos_weight=positive_class_weight)
Any hints about how to solve this? If I just pass the same labels and logits tensors to the tf.losses.sigmoid_cross_entropy loss function, everything works well (in the sense that TensorFlow runs properly, but of course, after training, the predictions are always zero).
See related problem here.
The error is likely to be thrown after the loss function, because the only significant difference between tf.losses.sigmoid_cross_entropy and tf.nn.weighted_cross_entropy_with_logits is the shape of the returned tensor.
Take a look at this example:
logits = tf.linspace(-3., 5., 10)
labels = tf.fill([10,], 1.)
positive_class_weight = 10
weighted_loss = tf.nn.weighted_cross_entropy_with_logits(targets=labels, logits=logits, pos_weight=positive_class_weight)
print(weighted_loss.shape)
sigmoid_loss = tf.losses.sigmoid_cross_entropy(multi_class_labels=labels, logits=logits)
print(sigmoid_loss.shape)
The logits and labels tensors are artificial and both have shape (10,). What matters is that weighted_loss and sigmoid_loss have different shapes. Here's the output:
(10,)
()
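This is because tf.losses.sigmoid_cross_entropy performs a reduction to a scalar, while tf.nn.weighted_cross_entropy_with_logits returns the per-element loss. The default reduction is Reduction.SUM_BY_NONZERO_WEIGHTS, which with the default weights amounts to the mean over all elements. So in order to replicate it, you have to wrap the weighted loss with tf.reduce_mean(...) (or tf.reduce_sum(...) if you want a plain sum).
If this doesn't help, make sure that the labels tensor has type float32. This mistake is very easy to make, e.g., the following declaration won't work: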
labels = tf.fill([10,], 1) # the type is not float!
You might also be interested in reading this question.
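
To make the shape difference concrete, here is a NumPy sketch of the weighted loss, written from the formula in the TensorFlow documentation: targets * -log(sigmoid(logits)) * pos_weight + (1 - targets) * -log(1 - sigmoid(logits)). It is a stand-in for illustration, not the TensorFlow implementation itself. The result is per-element and has to be reduced explicitly to get a scalar.

```python
import numpy as np

def weighted_cross_entropy_with_logits(targets, logits, pos_weight):
    """Per-element weighted sigmoid cross-entropy (NumPy stand-in)."""
    targets = np.asarray(targets, dtype=np.float64)
    logits = np.asarray(logits, dtype=np.float64)
    # log(1 + exp(-x)), computed stably as log1p(exp(-|x|)) + max(-x, 0).
    softplus_neg = np.log1p(np.exp(-np.abs(logits))) + np.maximum(-logits, 0.0)
    weight = 1.0 + (pos_weight - 1.0) * targets
    return (1.0 - targets) * logits + weight * softplus_neg

logits = np.linspace(-3.0, 5.0, 10)
labels = np.ones(10)  # float labels, all positive, as in the example above
positive_class_weight = 10

per_element = weighted_cross_entropy_with_logits(labels, logits, positive_class_weight)
scalar_loss = per_element.mean()  # explicit reduction to a scalar
```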

Tensorflow sequential matrix multiplication

I have two tensors of the following shapes:
tensor1 => shape(?, ?, 100) # corresponds to [batch_size, max_time, embedding_size]
tensor2 => shape(?, 100) # corresponds to [batch_size, embedding_size]
What I wish to do is, for every [100]-dimensional vector in tensor2, multiply it with the corresponding [max_time, 100] matrix in tensor1. This gives batch_size vectors of dimension max_time, which is the same as a [batch_size, max_time] matrix.
For those who know: I am basically trying to implement content-based attention over the encoded inputs given by the encoder of a seq2seq model. All the [max_time]-dimensional vectors are just the attention values that I later softmax.
I am aware that tensorflow provides the AttentionWrapper as well as various helpers in the contrib package. However, I wish to do this because I am experimenting with the attention mechanism to obtain a hybrid attention mask.
I have tried tf.while_loop, but got stuck on the ? shape when unrolling the loop. A vectorized implementation also doesn't seem very straightforward to me. Please help.
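What you can do is use tf.matmul and treat your vectors as 100 x 1 matrices. The product has shape [batch_size, max_time, 1], so squeeze the last dimension to get [batch_size, max_time]:
tensor2 = tf.expand_dims(tensor2, 2)
result = tf.squeeze(tf.matmul(tensor1, tensor2), axis=2)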
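
The shapes involved can be checked with a small NumPy sketch (np.matmul batches over the leading dimension in the same way; the concrete sizes 4 and 7 are arbitrary choices for the example). An einsum formulation is included as an alternative way to write the same product.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, max_time, embedding_size = 4, 7, 100

tensor1 = rng.normal(size=(batch_size, max_time, embedding_size))
tensor2 = rng.normal(size=(batch_size, embedding_size))

# Treat each vector as a [100, 1] matrix, multiply per batch element,
# then drop the trailing dimension of size 1.
scores = np.matmul(tensor1, tensor2[:, :, np.newaxis])[:, :, 0]

# Equivalent formulation: dot product over the embedding axis.
scores_einsum = np.einsum('bte,be->bt', tensor1, tensor2)
```

Each entry scores[b, t] is the dot product of the encoder output at time t with the query vector of batch element b.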

Why isn't this Conv2d_Transpose / deconv2d returning the original input in tensorflow?

import numpy as np
import tensorflow as tf

weights = tf.placeholder("float",[5,5,1,1])
imagein = tf.placeholder("float",[1,32,32,1])
conv = tf.nn.conv2d(imagein,weights,strides=[1,1,1,1],padding="SAME")
deconv = tf.nn.conv2d_transpose(conv, weights, [1,32,32,1], [1,1,1,1],padding="SAME")
dw = np.random.rand(5,5,1,1)
noise = np.random.rand(1,32,32,1)
sess = tf.InteractiveSession()
convolved = conv.eval(feed_dict={imagein: noise, weights: dw})
deconvolved = deconv.eval(feed_dict={imagein: noise, weights: dw})
I've been trying to figure out conv2d_transpose in order to reverse a convolution in Tensorflow. My understanding is that "deconvolved" should contain the same data as "noise" after applying a normal convolution and then its transpose, but "deconvolved" just contains some completely different image. Is there something wrong with my code, or is the theory incorrect?
There's a reason it's called conv2d_transpose rather than deconv2d: it isn't deconvolution. Convolution isn't an orthogonal transformation, so its inverse (deconvolution) isn't the same as its transpose (conv2d_transpose).
Your confusion is understandable: calling the transpose of convolution "deconvolution" has been standard neural network practice for years. I am happy that we were able to fix the name to be mathematically correct in TensorFlow; more details here:
https://github.com/tensorflow/tensorflow/issues/256
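
The difference between the transpose and the inverse can be seen in a small 1D NumPy sketch. It writes a zero-padded "same" convolution as a matrix A, so that the transpose operation corresponds to A.T and true deconvolution to solving with A. The helper conv_matrix and the kernel values are made up for this illustration; this is an analogy to what conv2d and conv2d_transpose do, not their implementation.

```python
import numpy as np

def conv_matrix(kernel, n):
    """Matrix A such that A @ x is the zero-padded 'same' 1D correlation of x."""
    k = len(kernel)
    pad = k // 2
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(k):
            col = i + j - pad
            if 0 <= col < n:
                A[i, col] = kernel[j]
    return A

rng = np.random.default_rng(0)
n = 8
A = conv_matrix(np.array([0.1, 0.7, 0.2]), n)
x = rng.normal(size=n)

y = A @ x                          # "convolved"
x_transpose = A.T @ y              # what a transposed convolution computes
x_inverse = np.linalg.solve(A, y)  # actual deconvolution (A is invertible here)
```

Only x_inverse recovers x. A.T @ A is not the identity matrix, so applying the transpose after the convolution returns a different signal.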