visualizing 2nd convolutional layer - tensorflow

So via tf.summary I visualized the first convolutional layer in my model, of shape [5,5,3,32], as a set of individual images, one per filter. so this layer has a filter of 5x5 dimensions, of depth 3, and there are 32 of them. Im viewing these filters as 5x5 color(RGB) images.
Im wondering how to generalize this to a second convolutional layer, and third and on...
shape of the second convolutional layer is [5,5,32,64].
my questions is how would I transform that tensor into individual 5x5x3 images?
with the first conv layer of shape [5,5,3,32] I visualize it by transposing first tf.transpose(W_conv1,(3,0,1,2)), and then having 32 5x5x3 images.
doing a tf.transpose(W_conv2,(3,0,1,2)) would produce a shape [64,5,5,32]. how would I then use those "32 color channels"? (Im know its not that simple :) ).

Visualization of higher level filters is usually done indirectly. To visualize a particular filter, you look for images that the filter would respond the most to. For that you perform gradient ascent in the space of images (instead of changing the parameters of the network like when you train the network, you change the input image).
You will understand it easier if you play with the following Keras code: https://github.com/keras-team/keras/blob/master/examples/conv_filter_visualization.py

Related

Last fc layers in VGG16

The VGG16 architecture has input: 224x224x3 images.I want to have 48x48x3 inputs but to do this in keras, we remove the last fc layers which have 4096 neurons each.Why we have to do this? and is it needed to add another size of fc layers for this input?
Final pooling layer of VGG16 has dimension 7x7x512 for 224x224 input image. From there VGG16 uses fully connected layer of (7x7x512)x4096 to get 4096 dimensional output. However, since your input size is different your feature output dimension from final pooling layer will also be different (2x2x512 I think). So you need to change matrix dimension for fully connected layer to make it work. You have two other options though
use a global average pooling across spatial dimension to get 512 dimensional feature and then use few fully connected layers to get to your number of classes.
Resize you input image to 224x224x3 and you won't need to change anything in model architecture.
Removing the last FC layers is for fine-tuning or transfer learning, where you adapt an existing network to a new problem, such as changing the number of categories that your classifier can choose between.
You are adapting the network to take a different sized input, so you need to adjust the first layer(s) of the network.

Shape of tensor for 2D image in Keras

I am a newbie to Keras (and somehow to TF) but I have found shape definition for the input layer very confusing.
So in the examples, when we have a 1D vector of length 20 for input, shape gets defined as
...Input(shape=(20,)...)
And when a 2D tensor for greyscale images needs to be defined for MNIST, it is defined as:
...Input(shape=(28, 28, 1)...)
So my question is why the tensor is not defined as (20) and (28, 28)? Why in the first case a second dimension is added and left empty? Also in second, number of channels have to be defined?
I understand that it depends on the layer so Conv1D, Dense or Conv2D take different shapes but it seems the first parameter is implicit?
According to docs, Dense needs be (batch_size, ..., input_dim) but how is this related the example:
Dense(32, input_shape=(784,))
Thanks
Tuples vs numbers
input_shape must be a tuple, so only (20,) can satisfy it. The number 20 is not a tuple. -- There is the parameter input_dim, to make your life easier if you have only one dimension. This parameter can take 20. (But really, I find it just confusing, I always work with input_shape and use tuples, to keep a consistent understanding).
Dense(32, input_shape=(784,)) is the same as Dense(32, input_dim=784).
Images
Images don't have only pixels, they also have channels (red, green, blue).
A black/white image has only one channel.
So, (28pixels, 28pixels, 1channel)
But notice that there isn't any obligation to follow this shape for images everywhere. You can shape them the way you like. But some kinds of layers do demand a certain shape, otherwise they couldn't work.
Some layers demand specific shapes
It's the case of the 2D convolutional layers, which need (size1,size2,channels). They need this shape because they must apply the convolutional filters accordingly.
It's also the case of recurrent layers, which need (timeSteps,featuresPerStep) to perform their recurrent calculations.
MNIST models
Again, there isn't any obligation to shape your image in a specific way. You must do it according to which first layer you choose and what you intend to achieve. It's a free thing.
Many examples simply don't care about an image being a 2d structured thing, and they just use models that take 784 pixels. That's enough. They probably start with Dense layers, which demand shapes like (size,)
Other examples may care, and use a shape (28,28), but then these models will have to reshape the input to fit the needs of the next layer.
Convolutional layers 2D will demand (28,28,1).
The main idea is: input arrays must match input_shape or input_dim.
Tensor shapes
Be careful, though, when reading Keras error messages or working with custom / lambda layers.
All these shapes we defined before omit an important dimension: the batch size or the number of samples.
Internally all tensors will have this additional dimension as the first dimension. Keras will report it as None (a dimension that will adapt to any batch size you have).
So, input_shape=(784,) will be reported as (None,784).
And input_shape=(28,28,1) will be reported as (None,28,28,1)
And your actual input data must have a shape that matches that reported shape.

Detection Text from natural images

I write a code in tensorflow by using convolution neural network to detect the text from images. I used TFRecords file to read the street view text dataset, then, I resized the images to 128 for height and width.
I used 9-conv layer with zero padding and three max_pool layer with window size of (2×2) and stride of 2. Since I use just three pooling layer, the last layer shape will be (16×16). the last conv layer has '256' filters.
I used too, two regression fully connected layers (tf.nn.sigmoid) and tf.losses.mean_squared_error as a loss function.
My question is
is this architecture enough for detection process?? I know there is something call NMS for detection. Also what is the label in this case??
In general and this not a rule , it's just based on my experience, you should start with a smaller net 2 or 3 conv layer, and say what happens, if you get some good result focus more on the winning topology and adapt the hyperparameters ( learnrat, batchsize and so one ) , if you don't get good result at all go deep meaning add conv layer. and evaluate again. 12 conv is really huge , your problem complexity should be huge too ! otherwise you wil reach a good accuracy but waste a lot computer power and time for nothing ! and by the way use pyramid form meaning start wider and finish tiny

Convolutional Autoencoders: Black Feature Maps

I am working with convolutional autoencoders. My autoenoder configuration has one convolutional layer with stride (2,2) or avg-pooling and relu activation and one deconvolutional layer with stride (2,2) or avg-unpooling and relu activation.
I trained the autoencoder with the MNIST data set.
When I am looking at the feature maps after the first convolutional layer (20 filters with filter size 3), I got some black feature maps instead the learned filters are not black. The same happens if I change the number of filters or the filter size.
I get this phenomena with TensorFlow and Theano autoencoders. (I did not test other neural network software yet.)
Does anyone know why this happens?
I can avoid the black feature maps when adding a LRN layer but I want to understand why the black feature maps appear.
I found the same phenomenon.
After training an convolutional autoencoder with 7x7x3x6 for thousands RGB images, two or three filters has some outputs, other filters gets zero outputs.
And the error does not decrease when they has too many zero output filters.
I also changed the filter numbers and sizes but results were almost the same.

How can visualize tensorflow convolution filters?

In many documents, there are images about each filter like "Example".
I want to visual my convolution filters like "Example" image, but I don't know how can visualize it.
How can I visualize my convolution filters?
Think about each convolutional filter as x by x matrix, where x is the size of the filter. So your task is to put those matrices on a plot grid. I have made an example how to plot convolutional filters and output of convolutional layers using MNIST dataset, see conviz repository on github. Hope it helps you.
No those aren't the filters. You can read this paper which describes the procedures from converting the layer L's filters into these images.
In short words what it does is taking some filter, and uses a technique similar but not the same as back-propagation to convert the filter into an image.
The result of the 2d convolution is a tensor [batch, in_height, in_width, in_channels]. The image can be represented as a matrix [in_height, in_width, in_channels]. So all you need to do is to grab a few images from your batch, and add them to your summary with tf.summary.image().
For a tutorial how to do this, take a look at this answer.