Dynamic Axes with a custom RNN - cntk

I’m running into a number of issues relating to dynamic axes. I am trying to implement a convolutional rnn similar to the of the LSTM() function but handles sequential image input and outputs an image.
I’m able to build the network and pass dummy data through it to produce output, but when I try to compute the error with an input_variable label I consistently see the following error:
RuntimeError: Node '__v2libuid__Input471__v2libname__img_label' (InputValue operation): DataFor: FrameRange's dynamic axis is inconsistent with matrix: {numTimeSteps:1, numParallelSequences:2, sequences:[{seqId:0, s:0, begin:0, end:1}, {seqId:1, s:1, begin:0, end:1}]} vs. {numTimeSteps:2, numParallelSequences:1, sequences:[{seqId:0, s:0, begin:0, end:2}]}`
If I understand this error message correctly, it claims that the value I passed in as the label has inconsistent axes to what is expected with 2 time steps and 1 parallel sequence, when what is desired is 1 time-step and 2 sequences. This makes sense to me, but I’m not sure how the data I’m passing in is not conforming to this. Here are (roughly) the variable declarations and eval statements:
…
img_input = input_variable(shape=img_shape, dtype=np.float32, name="img_input")
convlstm = Recurrence(conv_lstm_cell, initial_state=initial_state)(img_input)
out = select_last(convlstm)
img_label = input_variable(shape=img_shape, dynamic_axes=out.dynamic_axes, dtype=np.float32, name="img_label”)
error = squared_error(out, img_label)
…
dummy_input = np.ones(shape=(2, 3, 3, 32, 32)) # (batch, seq_len, channels, height, width)
dummy_label = np.ones(shape=(2, 3, 32, 32)) # (batch, channels, height, width)
out = error.eval({img_input:dummy_input, img_label:dummy_label})
I believe part of the issue is with the dynamic_axes set when creating the img_label input_variable, I’ve also tried setting it to [Axis.default_batch_axis()] and not setting it at all and either squared error complains about inconsistent axes between out and img_label or I see the same error as above.

The only issue I see with the above setup is that your dummy label should have an explicit dynamic axis so it should be declared as
dummy_label = np.ones(shape=(2, 1, 3, 32, 32))
Assuming your convlstm works similar to an lstm, then the following works without issues for me and it evaluates the loss for two input/output pairs.
x = C.input_variable((3,32,32))
cx = convlstm(x)
lx = C.sequence.last(cx)
y = C.input_variable(lx.shape, dynamic_axes=lx.dynamic_axes)
loss = C.squared_error(y, lx)
x0 = np.arange(2*3*3*32*32,dtype=np.float32).reshape(2,3,3,32,32)
y0 = np.arange(2*1*3*32*32,dtype=np.float32).reshape(2,1,3,32,32)
loss.eval({x:x0, y:y0})

Related

unknown size from stacked tensorArray in tf.while_loop output

The following code uses a tf.while_loop(...) for computations of a dynamic length.
outputs_tensor_array = tf.TensorArray(tf.float32,
size=0,
clear_after_read=False,
infer_shape=False,
dynamic_size = True,
element_shape[self.batch_size, self.size])
initial_args = [outputs_tensor_array, 0]
outputs, *_ = tf.while_loop(lambda out, idx, *_ : idx < max_len,
func,
initial_args + additional_args,
parallel_iterations = 32,
swap_memory = True)
outputs = outputs.stack()
I'm wondering if its possible to enforce a size, or atleast make that size be None in order to enforce a size constraint and enable further computations down the graph. The current shape is [?, batch, hidden_size]
tensor.set_shape will refine the static shape information and throw an error if it is incompatible with current static shape information (in the TensorArray.stack() case it will let you set any value for the zeroth dimension's static shape information).
tf.reshape can also be useful for asserting/filling in shape information, although it's not perfect. It will only throw an error if the size of the Tensor is wrong when the graph is executed (and may otherwise hide a shape error downstream).
More complicated, but you can also set_shape for the static shape information and then use tf.Assert with tf.shape to check the Tensor's shape when the graph is executed.

Tensorflow: Bug when using `tf.contrib.metrics.streaming_mean_iou`

I'm getting a strange error when trying to compute the intersection over union using tensorflows tf.contrib.metrics.streaming_mean_iou.
This was the code I was using before which works perfectly fine
tensorflow as tf
label = tf.image.decode_png(tf.read_file('/path/to/label.png'),channels=1)
label_lin = tf.reshape(label, [-1,])
weights = tf.cast(tf.less_equal(label_lin, 10), tf.int32)
mIoU, update_op = tf.contrib.metrics.streaming_mean_iou(label_lin, label_lin,num_classes = 11,weights = weights)
init = tf.local_variables_initializer()
sess.run(init)
sess.run([update_op])
However when I use a mask like this
mask = tf.image.decode_png(tf.read_file('/path/to/mask_file.png'),channels=1)
mask_lin = tf.reshape(mask, [-1,])
mask_lin = tf.cast(mask_lin,tf.int32)
mIoU, update_op = tf.contrib.metrics.streaming_mean_iou(label_lin, label_lin,num_classes = 11,weights = mask_lin)
init = tf.local_variables_initializer()
sess.run(init)
sess.run([update_op])
It keeps on failing after an irregular number of iterations showing this error:
*** Error in `/usr/bin/python': corrupted double-linked list: 0x00007f29d0022fd0 ***
I checked the shape and data type of both mask_lin and weights. They are the same, so I cannot really see what is going wrong here.
Also the fact that the error comes after calling update_op an irregular number of times is strange. Maybe TF empties the mask_lin object after calling several sess.run()'s ?
Or is this some TF bug? But then again why would it work with weights...

How to use maxout activation function in tensorflow?

I want to use maxout activation function in tensorflow, but I don't know which function should use.
I sent a pull request for maxout, here is the link:
https://github.com/tensorflow/tensorflow/pull/5528
Code is as follows:
def maxout(inputs, num_units, axis=None):
shape = inputs.get_shape().as_list()
if axis is None:
# Assume that channel is the last dimension
axis = -1
num_channels = shape[axis]
if num_channels % num_units:
raise ValueError('number of features({}) is not a multiple of num_units({})'
.format(num_channels, num_units))
shape[axis] = -1
shape += [num_channels // num_units]
outputs = tf.reduce_max(tf.reshape(inputs, shape), -1, keep_dims=False)
return outputs
Here is how it works:
I don't think there is a maxout activation but there is nothing stopping yourself from making it yourself. You could do something like the following.
with tf.variable_scope('maxout'):
layer_input = ...
layer_output = None
for i in range(n_maxouts):
W = tf.get_variable('W_%d' % d, (n_input, n_output))
b = tf.get_variable('b_%d' % i, (n_output,))
y = tf.matmul(layer_input, W) + b
if layer_output is None:
layer_output = y
else:
layer_output = tf.maximum(layer_output, y)
Note that this is code I just wrote in my browser so there may be syntax errors but you should get the general idea. You simply perform a number of linear transforms and take the maximum across all the transforms.
How about this code?
This seems to work in my test.
def max_out(input_tensor,output_size):
shape = input_tensor.get_shape().as_list()
if shape[1] % output_size == 0:
return tf.transpose(tf.reduce_max(tf.split(input_tensor,output_size,1),axis=2))
else:
raise ValueError("Output size or input tensor size is not fine. Please check it. Reminder need be zero.")
I refer the diagram in the following page.
From version 1.4 on you can use tf.contrib.layers.maxout.
Maxout is a layer such that it calculates N*M output for a N*1 input, and then it returns the maximum value across the column, i.e., the final output has shape N*1 as well. Basically it uses multiple linear fittings to mimic a complex function.

per_image_whitening in Python

I'm trying to set up TensorFlow to accept one image at a time but I believe I'm getting incorrect results because I pass a regular array without first performing tf.image.per_image_whitening() beforehand. Is there an easy way to do this in Python to an individual image without using the image queue?
Here's my code so far:
im = Image.open(request.FILES.values()[0])
im = im.convert('RGB')
im = im.crop((0, 0, cifar10.IMAGE_SIZE, cifar10.IMAGE_SIZE))
(width, height) = im.size
image_array = list(im.getdata())
image_array = np.array(image_array)
image_array = image_array.reshape((1, height, width, 3))
# tf.image.per_image_whitening() should be done here
#mean = numpy.mean(image_array)
#stddev = numpy.std(image_array)
#adjusted_stddev = max(stddev, 1.0/len(image_array.flatten())))
feed_dict = {"shuffle_batch:0": image_array}
# predictions always returns something close to [1, 0]
predictions = sess.run(tf.nn.softmax(logits), feed_dict=feed_dict)
If you want to avoid the image queue and do the predictions one by one, I think
image_array = (image_array - mean) / adjusted_stddev
should be able to do the trick.
If you want to do the prediction by batches, it's a little bit complicated as per_image_whitening (now per_image_standardization) only works with single images. So you need to do it before you form the batch like the way above or setup a preprocess procedure.

Visualizing output of convolutional layer in tensorflow

I'm trying to visualize the output of a convolutional layer in tensorflow using the function tf.image_summary. I'm already using it successfully in other instances (e. g. visualizing the input image), but have some difficulties reshaping the output here correctly. I have the following conv layer:
img_size = 256
x_image = tf.reshape(x, [-1,img_size, img_size,1], "sketch_image")
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
So the output of h_conv1 would have the shape [-1, img_size, img_size, 32]. Just using tf.image_summary("first_conv", tf.reshape(h_conv1, [-1, img_size, img_size, 1])) Doesn't account for the 32 different kernels, so I'm basically slicing through different feature maps here.
How can I reshape them correctly? Or is there another helper function I could use for including this output in the summary?
I don't know of a helper function but if you want to see all the filters you can pack them into one image with some fancy uses of tf.transpose.
So if you have a tensor that's images x ix x iy x channels
>>> V = tf.Variable()
>>> print V.get_shape()
TensorShape([Dimension(-1), Dimension(256), Dimension(256), Dimension(32)])
So in this example ix = 256, iy=256, channels=32
first slice off 1 image, and remove the image dimension
V = tf.slice(V,(0,0,0,0),(1,-1,-1,-1)) #V[0,...]
V = tf.reshape(V,(iy,ix,channels))
Next add a couple of pixels of zero padding around the image
ix += 4
iy += 4
V = tf.image.resize_image_with_crop_or_pad(image, iy, ix)
Then reshape so that instead of 32 channels you have 4x8 channels, lets call them cy=4 and cx=8.
V = tf.reshape(V,(iy,ix,cy,cx))
Now the tricky part. tf seems to return results in C-order, numpy's default.
The current order, if flattened, would list all the channels for the first pixel (iterating over cx and cy), before listing the channels of the second pixel (incrementing ix). Going across the rows of pixels (ix) before incrementing to the next row (iy).
We want the order that would lay out the images in a grid.
So you go across a row of an image (ix), before stepping along the row of channels (cx), when you hit the end of the row of channels you step to the next row in the image (iy) and when you run out or rows in the image you increment to the next row of channels (cy). so:
V = tf.transpose(V,(2,0,3,1)) #cy,iy,cx,ix
Personally I prefer np.einsum for fancy transposes, for readability, but it's not in tf yet.
newtensor = np.einsum('yxYX->YyXx',oldtensor)
anyway, now that the pixels are in the right order, we can safely flatten it into a 2d tensor:
# image_summary needs 4d input
V = tf.reshape(V,(1,cy*iy,cx*ix,1))
try tf.image_summary on that, you should get a grid of little images.
Below is an image of what one gets after following all the steps here.
In case someone would like to "jump" to numpy and visualize "there" here is an example how to display both Weights and processing result. All transformations are based on prev answer by mdaoust.
# to visualize 1st conv layer Weights
vv1 = sess.run(W_conv1)
# to visualize 1st conv layer output
vv2 = sess.run(h_conv1,feed_dict = {img_ph:x, keep_prob: 1.0})
vv2 = vv2[0,:,:,:] # in case of bunch out - slice first img
def vis_conv(v,ix,iy,ch,cy,cx, p = 0) :
v = np.reshape(v,(iy,ix,ch))
ix += 2
iy += 2
npad = ((1,1), (1,1), (0,0))
v = np.pad(v, pad_width=npad, mode='constant', constant_values=p)
v = np.reshape(v,(iy,ix,cy,cx))
v = np.transpose(v,(2,0,3,1)) #cy,iy,cx,ix
v = np.reshape(v,(cy*iy,cx*ix))
return v
# W_conv1 - weights
ix = 5 # data size
iy = 5
ch = 32
cy = 4 # grid from channels: 32 = 4x8
cx = 8
v = vis_conv(vv1,ix,iy,ch,cy,cx)
plt.figure(figsize = (8,8))
plt.imshow(v,cmap="Greys_r",interpolation='nearest')
# h_conv1 - processed image
ix = 30 # data size
iy = 30
v = vis_conv(vv2,ix,iy,ch,cy,cx)
plt.figure(figsize = (8,8))
plt.imshow(v,cmap="Greys_r",interpolation='nearest')
you may try to get convolution layer activation image this way:
h_conv1_features = tf.unpack(h_conv1, axis=3)
h_conv1_imgs = tf.expand_dims(tf.concat(1, h_conv1_features_padded), -1)
this gets one vertical stripe with all images concatenated vertically.
if you want them padded (in my case of relu activations to pad with white line):
h_conv1_features = tf.unpack(h_conv1, axis=3)
h_conv1_max = tf.reduce_max(h_conv1)
h_conv1_features_padded = map(lambda t: tf.pad(t-h_conv1_max, [[0,0],[0,1],[0,0]])+h_conv1_max, h_conv1_features)
h_conv1_imgs = tf.expand_dims(tf.concat(1, h_conv1_features_padded), -1)
I personally try to tile every 2d-filter in a single image.
For doing this -if i'm not terribly mistaken since I'm quite new to DL- I found out that it could be helpful to exploit the depth_to_space function, since it takes a 4d tensor
[batch, height, width, depth]
and produces an output of shape
[batch, height*block_size, width*block_size, depth/(block_size*block_size)]
Where block_size is the number of "tiles" in the output image. The only limitation to this is that the depth should be the square of block_size, which is an integer, otherwise it cannot "fill" the resulting image correctly.
A possible solution could be of padding the depth of the input tensor up to a depth that is accepted by the method, but I sill havn't tried this.
Another way, which I think very easy, is using the get_operation_by_name function. I had hard time visualizing the layers with other methods but this helped me.
#first, find out the operations, many of those are micro-operations such as add etc.
graph = tf.get_default_graph()
graph.get_operations()
#choose relevant operations
op_name = '...'
op = graph.get_operation_by_name(op_name)
out = sess.run([op.outputs[0]], feed_dict={x: img_batch, is_training: False})
#img_batch is a single image whose dimensions are (1,n,n,1).
# out is the output of the layer, do whatever you want with the output
#in my case, I wanted to see the output of a convolution layer
out2 = np.array(out)
print(out2.shape)
# determine, row, col, and fig size etc.
for each_depth in range(out2.shape[4]):
fig.add_subplot(rows, cols, each_depth+1)
plt.imshow(out2[0,0,:,:,each_depth], cmap='gray')
For example below is the input(colored cat) and output of the second conv layer in my model.
Note that I am aware this question is old and there are easier methods with Keras but for people who use an old model from other people (such as me), this may be useful.