I want to optimize a conditional GAN model (generator). Error: shape '[64, 3, 64, 64]' is invalid for input of size 2952192

This is the original GitHub source:
I want to rerun this model and use it in another area (CGAN).
I want to change the network to generate multiple labels and improve its performance, but it didn't work well.
The error information is as follows. I think it is a reshape problem, but I don't know how to figure it out. Any help would be appreciated. Thanks!
Input attributes:
```
# Number of all images: 81474
# Root directory for dataset
dataroot = "***"
# Number of workers for dataloader
workers = 2
# Batch size during training
batch_size = 64
# Spatial size of training images. All images will be resized to this
# size using a transformer.
image_size = 64
# Number of class labels
n_class = 27
# Number of channels in the training images. For color images this is 3
nc = 3
# Size of z latent vector (i.e. size of generator input)
nz = 100
# Size of feature maps in generator (output)
ngf = 64
# Size of feature maps in discriminator
ndf = 64
# Number of training epochs
num_epochs = 5
# Learning rate for optimizers
lr = 0.0002
# Beta1 hyperparam for Adam optimizers
beta1 = 0.5
# Number of GPUs available. Use 0 for CPU mode.
ngpu = 1
```
```
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, ngpu=1):
        super(Generator, self).__init__()
        self.label_emb = nn.Embedding(n_class, n_class)
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is Z, going into a convolution
            nn.ConvTranspose2d(nz + n_class, ngf * 16, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 16),
            # state size. (ngf*8) x 4 x 4
            nn.ConvTranspose2d(ngf * 16, ngf * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 8),
            # state size. (ngf*4) x 8 x 8
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            # state size. (ngf*2) x 16 x 16
            nn.ConvTranspose2d(ngf * 4, ngf, 4, 4, 1, bias=False),
            # state size. (ngf) x 32 x 32
            nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),
            # state size. (nc) x 64 x 64
        )

    def forward(self, noise_input, labels):
        # Concatenate label embedding and image to produce input
        #print(self.label_emb(labels).unsqueeze(2).unsqueeze(3).shape, noise_input.shape, labels.shape)
        gen_input = torch.cat((self.label_emb(labels).unsqueeze(2).unsqueeze(3), noise_input), 1)
        img = self.main(gen_input)
        img = img.view(img.size(0), *(nc, image_size, image_size))
        return img

netG = Generator(ngpu).to(device)
```
<ipython-input-65-50e58bbfe414> in <module>
37 noise = torch.randn(b_size, nz, 1, 1, device=device)
38 # Generate fake image batch with G
---> 39 fake = netG(noise, fake_style_labels)
40 label.fill_(fake_label)
41 # Classify all fake batch with D
~/opt/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1101 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102 return forward_call(*input, **kwargs)
1103 # Do not call functions when jit is used
1104 full_backward_hooks, non_full_backward_hooks = [], []
<ipython-input-61-cb81d45887cf> in forward(self, noise_input, labels)
31 gen_input = torch.cat((self.label_emb(labels).unsqueeze(2).unsqueeze(3), noise_input), 1)
32 img = self.main(gen_input)
---> 33 img = img.view(img.size(0), *(nc, image_size, image_size))
34 return img
RuntimeError: shape '[64, 3, 64, 64]' is invalid for input of size 2952192
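One way to track down a size mismatch like this is to trace the spatial size through each transposed convolution by hand. The sketch below is not part of the original post. It assumes the standard output-size rule for `nn.ConvTranspose2d` with `dilation=1` and `output_padding=0`, namely `out = (in - 1) * stride - 2 * padding + kernel_size`, and the helper name `conv_transpose_out` is hypothetical. It only does the arithmetic, so it runs without PyTorch.

```python
def conv_transpose_out(size, kernel, stride, padding):
    """Spatial output size of a transposed convolution (dilation=1, output_padding=0)."""
    return (size - 1) * stride - 2 * padding + kernel

# (kernel_size, stride, padding) of the five ConvTranspose2d layers in the question
layers = [(4, 1, 0), (4, 2, 1), (4, 2, 1), (4, 4, 1), (4, 2, 1)]

size = 1  # the generator input is (nz + n_class) x 1 x 1
sizes = []
for kernel, stride, padding in layers:
    size = conv_transpose_out(size, kernel, stride, padding)
    sizes.append(size)

batch_size, nc = 64, 3
numel = batch_size * nc * size * size

print(sizes)   # spatial size after each layer
print(numel)   # number of elements that reach img.view(...)
```

Under these assumptions the sizes come out as 4, 8, 16, 62, 124, and 64 * 3 * 124 * 124 = 2952192, which matches the number in the error message. The jump from 16 to 62 comes from the fourth layer, whose stride is 4; with a stride of 2 (as in the other upsampling layers) the same arithmetic gives 32 and then 64.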


Not understanding the data flow in U-Net-like architectures and having problems with the output of the Conv2DTranspose layers

I have a problem or two with the input dimensions of a modified U-Net architecture. To save you time and make my results easier to reproduce, I'll post the code and the output dimensions. The modified U-Net is the MultiResUNet architecture from https://github.com/nibtehaz/MultiResUNet/blob/master/MultiResUNet.py and is based on this paper: https://arxiv.org/abs/1902.04049. Please don't be put off by the length of the code. You can simply copy and paste it, and it shouldn't take longer than 10 seconds to reproduce my results. You also don't need a dataset for this. Tested with TF v1.9 and Keras v.2.20.
import tensorflow as tf
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Conv2DTranspose, concatenate, BatchNormalization, Activation, add
from tensorflow.keras.models import Model
from tensorflow.keras.activations import relu
# 2D convolutional layers
# Arguments:
#     x {keras layer} -- input layer
#     filters {int} -- number of filters
#     num_row {int} -- number of rows in filters
#     num_col {int} -- number of columns in filters
# Keyword Arguments:
#     padding {str} -- mode of padding (default: {'same'})
#     strides {tuple} -- stride of convolution operation (default: {(1, 1)})
#     activation {str} -- activation function (default: {'relu'})
#     name {str} -- name of the layer (default: {None})
# Returns:
#     [keras layer] -- [output layer]
def conv2d_bn(x, filters, num_row, num_col, padding="same", strides=(1, 1), activation='relu', name=None):
    x = Conv2D(filters, (num_row, num_col), strides=strides, padding=padding, use_bias=False)(x)
    x = BatchNormalization(axis=3, scale=False)(x)
    if activation is None:
        return x
    x = Activation(activation, name=name)(x)
    return x
# Our 2D transposed convolution with batch normalization
# Arguments:
#     x {keras layer} -- input layer
#     filters {int} -- number of filters
#     num_row {int} -- number of rows in filters
#     num_col {int} -- number of columns in filters
# Keyword Arguments:
#     padding {str} -- mode of padding (default: {'same'})
#     strides {tuple} -- stride of convolution operation (default: {(2, 2)})
#     name {str} -- name of the layer (default: {None})
# Returns:
#     [keras layer] -- [output layer]
def trans_conv2d_bn(x, filters, num_row, num_col, padding='same', strides=(2, 2), name=None):
    x = Conv2DTranspose(filters, (num_row, num_col), strides=strides, padding=padding)(x)
    x = BatchNormalization(axis=3, scale=False)(x)
    return x
# Our Multi-Res block
# Arguments:
#     U {int} -- number of filters in a corresponding UNet stage
#     inp {keras layer} -- input layer
# Returns:
#     [keras layer] -- [output layer]
def MultiResBlock(U, inp, alpha=1.67):
    W = alpha * U
    shortcut = inp
    shortcut = conv2d_bn(shortcut, int(W*0.167) + int(W*0.333) +
                         int(W*0.5), 1, 1, activation=None, padding='same')
    conv3x3 = conv2d_bn(inp, int(W*0.167), 3, 3,
                        activation='relu', padding='same')
    conv5x5 = conv2d_bn(conv3x3, int(W*0.333), 3, 3,
                        activation='relu', padding='same')
    conv7x7 = conv2d_bn(conv5x5, int(W*0.5), 3, 3,
                        activation='relu', padding='same')
    out = concatenate([conv3x3, conv5x5, conv7x7], axis=3)
    out = BatchNormalization(axis=3)(out)
    out = add([shortcut, out])
    out = Activation('relu')(out)
    out = BatchNormalization(axis=3)(out)
    return out
# Our ResPath
# Arguments:
#     filters {int} -- [description]
#     length {int} -- length of ResPath
#     inp {keras layer} -- input layer
# Returns:
#     [keras layer] -- [output layer]
def ResPath(filters, length, inp):
    shortcut = inp
    shortcut = conv2d_bn(shortcut, filters, 1, 1,
                         activation=None, padding='same')
    out = conv2d_bn(inp, filters, 3, 3, activation='relu', padding='same')
    out = add([shortcut, out])
    out = Activation('relu')(out)
    out = BatchNormalization(axis=3)(out)
    for i in range(length-1):
        shortcut = out
        shortcut = conv2d_bn(shortcut, filters, 1, 1,
                             activation=None, padding='same')
        out = conv2d_bn(out, filters, 3, 3, activation='relu', padding='same')
        out = add([shortcut, out])
        out = Activation('relu')(out)
        out = BatchNormalization(axis=3)(out)
    return out
# MultiResUNet
# Arguments:
#     height {int} -- height of image
#     width {int} -- width of image
#     n_channels {int} -- number of channels in image
# Returns:
#     [keras model] -- MultiResUNet model
def MultiResUnet(height, width, n_channels):
    inputs = Input((height, width, n_channels))
    # downsampling part begins here
    mresblock1 = MultiResBlock(32, inputs)
    pool1 = MaxPooling2D(pool_size=(2, 2))(mresblock1)
    mresblock1 = ResPath(32, 4, mresblock1)
    mresblock2 = MultiResBlock(32*2, pool1)
    pool2 = MaxPooling2D(pool_size=(2, 2))(mresblock2)
    mresblock2 = ResPath(32*2, 3, mresblock2)
    mresblock3 = MultiResBlock(32*4, pool2)
    pool3 = MaxPooling2D(pool_size=(2, 2))(mresblock3)
    mresblock3 = ResPath(32*4, 2, mresblock3)
    mresblock4 = MultiResBlock(32*8, pool3)
    # Upsampling part
    up5 = concatenate([Conv2DTranspose(
        32*4, (2, 2), strides=(2, 2), padding='same')(mresblock4), mresblock3], axis=3)
    mresblock5 = MultiResBlock(32*8, up5)
    up6 = concatenate([Conv2DTranspose(
        32*4, (2, 2), strides=(2, 2), padding='same')(mresblock5), mresblock2], axis=3)
    mresblock6 = MultiResBlock(32*4, up6)
    up7 = concatenate([Conv2DTranspose(
        32*2, (2, 2), strides=(2, 2), padding='same')(mresblock6), mresblock1], axis=3)
    mresblock7 = MultiResBlock(32*2, up7)
    conv8 = conv2d_bn(mresblock7, 1, 1, 1, activation='sigmoid')
    model = Model(inputs=[inputs], outputs=[conv8])
    return model
So now back to my problem with mismatched input/output dimensions in the U-Net architecture.
If I choose an input height/width of (128,128), (256,256) or (512,512) and do:
model = MultiResUnet(128, 128, 3)
TensorFlow builds the model and shows me the whole architecture without complaint. But if I do this:
model = MultiResUnet(36, 36, 3)
I get this error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last) in
----> 1 model = MultiResUnet(36, 36,3)
2 display(model.summary())
in MultiResUnet(height, width,
26 up5 = concatenate([Conv2DTranspose(
---> 27 32*4, (2, 2), strides=(2, 2), padding='same')(mresblock4), mresblock3], axis=3)
28 mresblock5 = MultiResBlock(32*8, up5)
in concatenate(inputs, axis, **kwargs)
682 A tensor, the concatenation of the inputs alongside axis axis.
683 """
--> 684 return Concatenate(axis=axis, **kwargs)(inputs)
in call(self, inputs, *args, **kwargs)
694 if all(hasattr(x, 'get_shape') for x in input_list):
695 input_shapes = nest.map_structure(lambda x: x.get_shape(), inputs)
--> 696 self.build(input_shapes)
698 # Check input assumptions set after layer building, e.g. input shape.
in wrapper(instance, input_shape)
146 else:
147 input_shape = tuple(tensor_shape.TensorShape(input_shape).as_list())
--> 148 output_shape = fn(instance, input_shape)
149 if output_shape is not None:
150 if isinstance(output_shape, list):
in build(self, input_shape)
388 'inputs with matching shapes '
389 'except for the concat axis. '
--> 390 'Got inputs shapes: %s' % (input_shape))
392 def _merge_function(self, inputs):
ValueError: A Concatenate layer requires inputs with matching shapes
except for the concat axis. Got inputs shapes: [(None, 8, 8, 128),
(None, 9, 9, 128)]
Why does the Conv2DTranspose give me the wrong dimension
(None, 8, 8, 128)
instead of
(None, 9, 9, 128)
and why doesn't the Concatenate layer complain when I choose input sizes like (128,128), (256,256), etc. (multiples of 32)?
So, to generalize this question: how can I make this U-Net architecture work with any input size, and how can I deal with the Conv2DTranspose layer producing an output whose width/height is one less than the dimension actually needed (when the input size isn't a multiple of 32 or isn't square)? Why doesn't this happen with input sizes that are multiples of 32? And what if I had variable input sizes?
Any help would be highly appreciated.
The U-Net family of models (such as the MultiResUNet model above) follows an encoder-decoder architecture. The encoder is a downsampling path that extracts features, whereas the decoder is an upsampling path. Feature maps from the encoder are concatenated with those of the decoder through skip connections. These feature maps are concatenated along the last axis, the 'channel' axis (taking the features to have dimensions [batch_size, height, width, channels]). For features to be concatenated along any axis (the 'channel' axis, in our case), the dimensions along all the other axes must match.
In the model architecture above, 3 downsampling/max-pooling operations are performed (through MaxPooling2D) in the encoder path. In the decoder path, 3 upsampling/transposed-convolution operations are performed, aiming to restore the image to its full dimensions. However, for the concatenations (through skip connections) to work, the height, width and batch_size of the downsampled and upsampled features must be identical at every "level" of the model. I'll illustrate this with the examples you mentioned in the question:
1st case: input dimensions (128,128,3): 128 -> 64 -> 32 -> 16 -> 32 -> 64 -> 128
2nd case: input dimensions (36,36,3): 36 -> 18 -> 9 -> 4 -> 8 -> 16 -> 32
In the 2nd case, when the height and width of the feature map reach 9 in the encoder path, further downsampling floors 9 / 2 to 4, and that loss cannot be recovered in the decoder while upsampling (4 doubles to 8, not 9). Hence, it throws an error because feature maps of dimensions (None, 8, 8, 128) and (None, 9, 9, 128) cannot be concatenated.
In general, for a simple encoder-decoder model (with skip connections) that has n downsampling (MaxPooling2D) layers, the input dimension must be a multiple of 2^n for the encoder features to be concatenated with the decoder features. In this case n=3, so the input must be a multiple of 8 to avoid these dimension mismatch errors.
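The size arithmetic above can be checked with a few lines of Python. This is a minimal sketch, not taken from the original answer: the helper names `trace_sizes` and `first_mismatch` are hypothetical, and it assumes each `MaxPooling2D` with pool size 2 floors the size to `size // 2` (the default 'valid' padding) while each `Conv2DTranspose` with stride 2 and 'same' padding doubles it.

```python
def trace_sizes(size, n_levels):
    """Return (encoder_sizes, decoder_sizes) along one spatial axis.

    encoder_sizes[i] is the size of the skip connection at level i;
    decoder_sizes[i] is the upsampled size that must match it.
    """
    encoder = [size]
    for _ in range(n_levels):
        encoder.append(encoder[-1] // 2)      # 2x2 max pooling floors odd sizes

    decoder = {}
    current = encoder[-1]
    for level in range(n_levels - 1, -1, -1):
        current *= 2                          # stride-2 transposed conv doubles
        decoder[level] = current
    return encoder, [decoder[i] for i in range(n_levels)]


def first_mismatch(size, n_levels):
    """Deepest level whose skip and upsampled sizes differ, or None."""
    encoder, decoder = trace_sizes(size, n_levels)
    for level in range(n_levels - 1, -1, -1):
        if encoder[level] != decoder[level]:
            return level, encoder[level], decoder[level]
    return None


print(trace_sizes(128, 3))
print(first_mismatch(128, 3))
print(first_mismatch(36, 3))
```

For an input of 36, the first concatenation on the way up sees 9 from the encoder and 8 from the decoder, the same pair reported in the `ValueError`.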
Hope this helps! :)
Thanks @Balraj Ashwath for the great answer! If your input has size h along some dimension and you want to use this architecture with depth d (h >= 2^d), one possibility is to pad that dimension with delta_h zeros, where delta_h is given by the following expression:
import numpy as np
h, d = 36, 3
delta_h = np.ceil(h/(2**d)) * (2**d) - h
print(delta_h)  # 4.0
Then, following the example of @Balraj Ashwath, the padded size 36 + 4 = 40 gives:
40 -> 20 -> 10 -> 5 -> 10 -> 20 -> 40
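An integer-only variant of the same calculation is sketched below. It is an illustration rather than part of the answer above: the helper name `pad_to_multiple` is hypothetical, and splitting the padding as evenly as possible between the two sides is just one reasonable choice.

```python
def pad_to_multiple(h, d):
    """Return (before, after) zero-padding so that h + before + after
    is the next multiple of 2**d (no padding if h already is one)."""
    factor = 2 ** d
    delta = -h % factor            # same value as ceil(h / factor) * factor - h
    before = delta // 2
    after = delta - before
    return before, after


for h in (36, 128, 250):
    before, after = pad_to_multiple(h, 3)
    print(h, "->", h + before + after, "padding", (before, after))
```

The resulting pair could be passed per axis to something like `np.pad` or Keras' `ZeroPadding2D`, with the model output cropped back to the original size afterwards.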

Dynamic output shape is incorrect / not the same as static

I am trying to implement a patch creation function using TensorFlow's extract_image_patches function, but the dynamic output shape is not what I expect.
Let me briefly explain what it does. The input shape is supposed to be 6000x4000. We first reduce its sides by their greatest common divisor, which gives an aspect ratio of 3:2. Then we pass 64 as an argument to our function to create patches of size 3x64, 2x64 = 192, 128. This gives us 31x31 distinct patches. Everything works fine with the static output, but with the dynamic output things go wrong. I could not find which part causes the different dynamic output.
# input_shape_inbuild: (None, 6000, 4000, 1)
# ---LAYER---
# Input Size: (None, 6000, 4000, 1)
# Patch Size: (x,y) = 192, 128
# Aspect ratio: (3, 2)
!wget https://www.fujifilm.com/products/digital_cameras/x/fujifilm_x_pro2/sample_images/img/index/ff_x_pro2_001.JPG
img = cv2.imread('ff_x_pro2_001.JPG', 0)
img = tf.reshape(img, [1,img.shape[0],img.shape[1],1])
# Note: TensorFlow takes images as (y, x),
# so a 6000x4000 image is given as tf.func(4000, 6000)
# Here I define a custom layer in TensorFlow.
class create_patches(Layer):
    def __init__(self, patchMultiplier):
        super(create_patches, self).__init__()
        self.patchMultiplier = patchMultiplier

    def build(self, input_shape):
        print('input_shape_inbuild: ', input_shape)

        def aspect_ratio(width, height):
            # find greatest common divisor of input_shape
            def gcd(x, y):
                while y != 0:
                    (x, y) = (y, x % y)
                return x
            r = gcd(width, height)
            x = int(width/r)
            y = int(height/r)
            return x, y

        self.aspect_ratio = aspect_ratio(input_shape[1], input_shape[2])
        self.patchSize_x = self.aspect_ratio[0] * self.patchMultiplier
        self.patchSize_y = self.aspect_ratio[1] * self.patchMultiplier

    def call(self, inputs):
        print('Input Size:', inputs._keras_shape)
        print('Patch Size: (x,y) = {}, {}'.format(self.patchSize_x, self.patchSize_y))
        print('Aspect ratio: {}'.format(self.aspect_ratio))
        # call tf.extract_image_patches to return it.
        # The padding argument is missing from the post; 'VALID' is inferred
        # from the 31x31 result reported below.
        out = tf.extract_image_patches(images=inputs,
                                       ksizes=[1, self.patchSize_y, self.patchSize_x, 1],
                                       strides=[1, self.patchSize_y, self.patchSize_x, 1],
                                       rates=[1, 1, 1, 1],
                                       padding='VALID')
        return out

    def compute_output_shape(self, input_shape):
        ksize_cols = self.patchSize_x
        ksize_rows = self.patchSize_y
        # output shape = [batch, out_rows, out_cols, ksize_rows * ksize_cols * depth]
        shape = (self.patchSize_x, self.patchSize_y,
                 (input_shape[1]/self.patchSize_x) * (input_shape[2]/self.patchSize_y))
        shape = (input_shape[0],
                 input_shape[1]/self.patchSize_x,  # patch row count
                 input_shape[2]/self.patchSize_y,  # patch col count
                 self.patchSize_x * self.patchSize_y)  # patch pixel count
        return shape
#here is input with 6000x4000 pixels.
input_shape_1 = Input(shape=(6000, 4000, 1))
#here I fed input to my custom layer.
x1 = create_patches(64)(input_shape_1)
print('Output shape: ', x1.shape)
# here I build a model to see static output
f = K.function([input_shape_1], [x1])
import numpy as np
#result = f([np.random.randint(256, size=(1,4000,6000,1))])
result = f([img])
result = np.array(result)
# [batch, out_rows, out_cols, ksize_rows * ksize_cols * depth]
# Result shape: (1, 1, 125, 125, 1536)
print('Result shape: ', result.shape, '\n\n')
#print(result[:, :, :, 0].shape)
Here is the output I get:
input_shape_inbuild: (None, 6000, 4000, 1)
Input Size: (None, 6000, 4000, 1)
Patch Size: (x,y) = 192, 128
Aspect ratio: (3, 2)
Output shape: (?, 46, 20, 24576)
Result shape: (1, 1, 31, 31, 24576)
The result shape is as I expected, but I could not work out where the 46 and 20 in the output shape come from. Could you tell me why it is like this?
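The two shapes can be reproduced with plain arithmetic. The sketch below is an illustration, not part of the original post: it assumes 'VALID' padding with the stride equal to the patch size, so each axis yields `(size - ksize) // stride + 1` patches, and the helper name `n_patches` is hypothetical. In the layer, `ksizes` is `[1, patchSize_y, patchSize_x, 1]`, i.e. 128 along axis 1 (rows) and 192 along axis 2 (columns).

```python
def n_patches(size, ksize, stride):
    """Number of patches along one axis for 'VALID' padding."""
    return (size - ksize) // stride + 1

ksize_rows, ksize_cols = 128, 192   # patchSize_y, patchSize_x from the post

# Shape declared in Input(shape=(6000, 4000, 1)): 6000 rows, 4000 columns
declared = (n_patches(6000, ksize_rows, ksize_rows),
            n_patches(4000, ksize_cols, ksize_cols))

# Shape of the array actually fed, assuming cv2.imread returns
# (height, width) = (4000, 6000) for a 6000x4000 picture
fed = (n_patches(4000, ksize_rows, ksize_rows),
       n_patches(6000, ksize_cols, ksize_cols))

print(declared)
print(fed)
print(ksize_rows * ksize_cols)
```

Under these assumptions the declared shape gives (46, 20) and the fed array gives (31, 31), which are the two results in the post, and 128 * 192 = 24576 matches the last dimension. That would suggest the symbolic shape is computed from the (6000, 4000) declared in `Input`, while the image that is fed has its axes the other way round, in line with the note that TensorFlow takes images as (y, x).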

Image adjustments with Conv2d

I am working on a CNN project using TensorFlow.
I imported the images (20 of them) using:
for filename in glob.glob('input_data/*.jpg'):
image_size_input = len(input_images[0])
The images have shape (250, 250) because they are grayscale.
But conv2d requires a 4-D input tensor. My input tensor looks like this:
x = tf.placeholder(tf.float32,shape=[None,image_size_output,image_size_output,1], name='x')
So I was not able to convert the 2-D images above into the required 4-D shape. How do I deal with the "None" dimension?
I tried this:
input_images_padded = []
for image in input_images:
temp = np.zeros((1,image_size_output,image_size_output,1))
for i in range(image_size_input):
for j in range(image_size_input):
temp[0,i,j,0] = image[i,j]
I got the following error:
File "/opt/intel/intelpython3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 975, in _run
% (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (20, 1, 250, 250, 1) for Tensor 'x_11:0', which has shape '(?, 250, 250, 1)'
Here's the entire code(for reference):
import tensorflow as tf
from PIL import Image
import glob
import cv2
import os
import numpy as np
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
input_images = []
output_images = []
for filename in glob.glob('input_data/*.jpg'):
for filename in glob.glob('output_data/*.jpg'):
image_size_input = len(input_images[0])
image_size_output = len(output_images[0])
# now adding padding to the input images to convert from 125x125 to 250x250 sized images
input_images_padded = []
for image in input_images:
temp = np.zeros((1,image_size_output,image_size_output,1))
for i in range(image_size_input):
for j in range(image_size_input):
temp[0,i,j,0] = image[i,j]
output_images_padded = []
for image in output_images:
temp = np.zeros((1,image_size_output,image_size_output,1))
for i in range(image_size_input):
for j in range(image_size_input):
temp[0,i,j,0] = image[i,j]
sess = tf.Session()
Creating tensor for the input
x = tf.placeholder(tf.float32,shape= [None,image_size_output,image_size_output,1], name='x')
Creating tensor for the output
y = tf.placeholder(tf.float32,shape= [None,image_size_output,image_size_output,1], name='y')
def create_weights(shape):
return tf.Variable(tf.truncated_normal(shape, stddev=0.05))
def create_biases(size):
return tf.Variable(tf.constant(0.05, shape=[size]))
def create_convolutional_layer(input, bias_count, filter_height, filter_width, num_input_channels, num_out_channels, activation_function):
weights = create_weights(shape=[filter_height, filter_width, num_input_channels, num_out_channels])
biases = create_biases(bias_count)
layer = tf.nn.conv2d(input=input,
strides=[1, 1, 1, 1],
layer += biases
layer = tf.nn.max_pool(value=layer,
ksize=[1, 2, 2, 1],
strides=[1, 1, 1, 1],
if activation_function=="relu":
layer = tf.nn.relu(layer)
return layer
Conv. Layer 1: Patch extraction
64 filters of size 1 x 9 x 9
Activation function: ReLU
Output: 64 feature maps
Parameters to optimize:
1 x 9 x 9 x 64 = 5184 weights and 64 biases
layer1 = create_convolutional_layer(input=x,
Conv. Layer 2: Non-linear mapping
32 filters of size 64 x 1 x 1
Activation function: ReLU
Output: 32 feature maps
Parameters to optimize: 64 x 1 x 1 x 32 = 2048 weights and 32 biases
layer2 = create_convolutional_layer(input=layer1,
'''Conv. Layer 3: Reconstruction
1 filter of size 32 x 5 x 5
Activation function: Identity
Output: HR image
Parameters to optimize: 32 x 5 x 5 x 1 = 800 weights and 1 bias'''
layer3 = create_convolutional_layer(input=layer2,
applying gradient descent algorithm
loss = tf.reduce_sum(tf.square(layer3-y))
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)
init = tf.global_variables_initializer()
for i in range(len(input_images)):
sess.run(train,{x: input_images_padded, y:output_images_padded})
curr_loss = sess.run([loss], {x: x_train, y: y_train})
print("loss: %s"%(curr_loss))
I think your padded image array is not right. I don't have experience writing TensorFlow code (though I have read some), but try this:
# imgs is your sequence of input images
# padded is the array to feed
cnt = len(imgs)
H, W = imgs[0].shape[:2]
padded = np.zeros((cnt, H, W, 1))
for i in range(cnt):
    padded[i, :, :, 0] = imgs[i]
One option would be to omit the shape when you create the placeholder, so that it accepts a tensor of any shape that you feed during sess.run().
From the docs:
shape: The shape of the tensor to be fed (optional). If the shape is not
specified, you can feed a tensor of any shape.
Alternatively, you can specify 20, which is your batch size. Note that the first dimension of the tensor always corresponds to batch_size.
Check the following lines. It works for me:
train_set = np.zeros((input_images.shape[0], input_images.shape[1], input_images.shape[2],1))
for image in range(input_images.shape[0]):
    train_set[image,:,:,0] = input_images[image,:,:]
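The same batch array can also be built without an explicit loop. The following is a minimal NumPy sketch under stated assumptions: the 20 random 250x250 arrays stand in for the grayscale images from the question, and the variable names are illustrative. It also shows why a list of `(1, H, W, 1)` arrays turns into a 5-D array when fed directly.

```python
import numpy as np

# Stand-in for 20 grayscale images of shape (250, 250)
rng = np.random.default_rng(0)
images = [rng.random((250, 250)) for _ in range(20)]

# Stack along a new leading batch axis, then add a trailing channel axis
batch = np.stack(images, axis=0)[..., np.newaxis]
print(batch.shape)

# A list of (1, H, W, 1) arrays converts to a 5-D array ...
per_image = [img.reshape(1, 250, 250, 1) for img in images]
print(np.array(per_image).shape)

# ... but concatenating along axis 0 merges the leading 1s into the batch axis
fixed = np.concatenate(per_image, axis=0)
print(fixed.shape)
```

An array of shape `(20, 250, 250, 1)` is compatible with a placeholder of shape `(None, 250, 250, 1)`, since `None` only leaves the batch size open.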

Out-of-range indexing error when visualizing features from convolution layers

I'm following the blog post "How convnets see the world" by Francois Chollet to visualize the features learned by a convnet. Here is my code:
from __future__ import print_function
from scipy.misc import imsave
import numpy as np
import time
from keras import applications
from keras import backend as K
from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
# dimensions of the generated pictures for each filter.
img_width = 128
img_height = 128
# the name of the layer we want to visualize
# (see model definition at keras/applications/vgg16.py)
layer_name = 'block5_conv1'
# util function to convert a tensor into a valid image
def deprocess_image(x):
    # normalize tensor: center on 0., ensure std is 0.1
    x -= x.mean()
    x /= (x.std() + 1e-5)
    x *= 0.1
    # clip to [0, 1]
    x += 0.5
    x = np.clip(x, 0, 1)
# build the VGG16 network with ImageNet weights
model = applications.VGG16(include_top=False, weights='imagenet', input_shape=(128,128,3))
print('Model loaded.')
# this is the placeholder for the input images
input_img = model.input
# get the symbolic outputs of each "key" layer (we gave them unique names).
layer_dict = dict([(layer.name, layer) for layer in model.layers[1:]])
def normalize(x):
    # utility function to normalize a tensor by its L2 norm
    return x / (K.sqrt(K.mean(K.square(x))) + 1e-5)

kept_filters = []
for filter_index in range(0, 20):
    # we only scan through the first 20 filters,
    # but there are actually 512 of them
    print('Processing filter %d' % filter_index)
    start_time = time.time()
    # we build a loss function that maximizes the activation
    # of the nth filter of the layer considered
    layer_output = layer_dict[layer_name].output
    loss = K.mean(layer_output[:, :, :, filter_index])
    # we compute the gradient of the input picture wrt this loss
    grads = K.gradients(loss, input_img)[0]
    # normalization trick: we normalize the gradient
    grads = normalize(grads)
    # this function returns the loss and grads given the input picture
    iterate = K.function([input_img], [loss, grads])
    # step size for gradient ascent
    step = 1.
    # we start from a gray image with some random noise
    img = load_img('para1.jpg')  # this is a PIL image
    x = img_to_array(img)
    x = x.reshape((1,) + x.shape)
    input_img_data = x
    input_img_data = (input_img_data - 0.5) * 20 + 128
    # we run gradient ascent for 20 steps
    for i in range(20):
        loss_value, grads_value = iterate([input_img_data])
        input_img_data += grads_value * step
        print('Current loss value:', loss_value)
        if loss_value <= 0.:
            # some filters get stuck to 0, we can skip them
            break
    # decode the resulting input image
    if loss_value > 0:
        img = deprocess_image(input_img_data[0])
        kept_filters.append((img, loss_value))
    end_time = time.time()
    print('Filter %d processed in %ds' % (filter_index, end_time - start_time))
# we will stich the best 64 filters on a 8 x 8 grid.
n = 8
# the filters that have the highest loss are assumed to be better-looking.
# we will only keep the top 64 filters.
kept_filters.sort(key=lambda x: x[1], reverse=True)
kept_filters = kept_filters[:n * n]
# build a black picture with enough space for
# our 8 x 8 filters of size 128 x 128, with a 5px margin in between
margin = 5
width = n * img_width + (n-1) * margin
height = n * img_height + (n-1) * margin
stitched_filters = np.zeros((width, height, 3))
# fill the picture with our saved filters
for i in range(n):
    for j in range(n):
        img, loss = kept_filters[i * n + j]
        stitched_filters[(img_width + margin) * i: (img_width + margin) * i + img_width,
                         (img_height + margin) * j: (img_height + margin) * j + img_height, :] = img
# save the result to disk
imsave('stitched_filters_%dx%d.png' % (n, n), stitched_filters)
As I run the code, I am stuck with the error:
File "C:/Users/rajaramans2/codes/untitled8.py", line 94, in <module>
img, loss = kept_filters[i * n + j]
IndexError: list index out of range
Kindly help with the modifications. I'm using an RGB image of dimensions (128,128) and trying to visualize convolutional layer 1 of block 5 of the VGG16 network.
On line 76, kept_filters is appended to inside the loop that starts on line 42, so the length of kept_filters is at most 20. On line 94, however, you try to access 8*8 = 64 elements of kept_filters, which is out of range.
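One way to avoid the out-of-range access is to derive the grid size from the number of filters that were actually kept. This is a minimal sketch rather than the blog post's code: the tuples in `kept_filters` are placeholders (a name plus a loss value instead of an image), and choosing the largest square grid that can be filled completely is an assumption about what is wanted.

```python
import math

# Placeholder for the (image, loss) pairs collected in the loop;
# at most 20 entries exist because only 20 filters were scanned.
kept_filters = [("filter_%d" % i, float(i)) for i in range(20)]

# Largest n such that an n x n grid can be filled completely
n = math.isqrt(len(kept_filters))

# Keep the n * n filters with the highest loss
kept_filters.sort(key=lambda item: item[1], reverse=True)
kept_filters = kept_filters[:n * n]

# Index exactly as the stitching loop does: kept_filters[i * n + j]
grid = [[kept_filters[i * n + j][0] for j in range(n)] for i in range(n)]
print(n, len(kept_filters))
print(grid[0])
```

Alternatively, scanning at least 64 filters (for example with `range(0, 64)`) would keep `n = 8` valid, provided enough of them end with a positive loss.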

How to get CNN kernel values in Tensorflow

I am using the code below to create CNN layers.
conv1 = tf.layers.conv2d(inputs = input, filters = 20, kernel_size = [3,3],
padding = "same", activation = tf.nn.relu)
and I want to get the values of all kernels after training. It does not work if I simply do
kernels = conv1.kernel
So how should I retrieve the values of these kernels? I am also not sure what variables and methods conv2d has, since the TensorFlow documentation doesn't really describe them for conv2d.
You can find all the variables in the list returned by tf.global_variables() and easily look up the variable you need.
If you wish to get these variables by name, declare the layer as:
conv_layer_1 = tf.layers.conv2d(activation=tf.nn.relu,
kernel_size=(3, 3),
name="conv1", # NOTE THE NAME
strides=(1, 1))
Recover the graph as:
gr = tf.get_default_graph()
Recover the kernel values as:
conv1_kernel_val = gr.get_tensor_by_name('conv1/kernel:0').eval()
Recover the bias values as:
conv1_bias_val = gr.get_tensor_by_name('conv1/bias:0').eval()
You mean you want to get the values of the weights for the conv1 layer.
You haven't actually defined the weights with conv2d; you need to do that. When I create a convolutional layer, I use a function that performs all the necessary steps. Here's a copy of the function I use to create each of my convolutional layers:
def _conv_layer(self, name, in_channels, filters, kernel, input_tensor, strides, dtype=tf.float32):
    with tf.variable_scope(name):
        w = tf.get_variable("w", shape=[kernel, kernel, in_channels, filters],
                            initializer=tf.contrib.layers.xavier_initializer_conv2d(), dtype=dtype)
        b = tf.get_variable("b", shape=[filters], initializer=tf.constant_initializer(0.0), dtype=dtype)
        c = tf.nn.conv2d(input_tensor, w, strides, padding='SAME', name=name + "c")
        a = tf.nn.relu(c + b, name=name + "_a")
        # report the activation shape, the weight shape and a parameter count
        print(name + "_a", a.get_shape().as_list(), name + "_w", w.get_shape().as_list(),
              "params", np.prod(w.get_shape().as_list()[1:]) + filters)
        return a, w.get_shape().as_list()
This is what I use to define 5 convolutional layers. The example is straight out of my code, so note that it's 5 convolutional layers stacked without max pooling or anything else, with strides of 2 and 5x5 kernels.
conv1_a, _ = self._conv_layer("conv1", 3, 24, 5, self.imgs4d, [1, 2, 2, 1]) # 24.8 MiB/feature -> 540 x 960
conv2_a, _ = self._conv_layer("conv2", 24, 80, 5, conv1_a, [1, 2, 2, 1]) # 6.2 MiB -> 270 x 480
conv3_a, _ = self._conv_layer("conv3", 80, 256, 5, conv2_a, [1, 2, 2, 1]) # 1.5 MiB -> 135 x 240
conv4_a, _ = self._conv_layer("conv4", 256, 750, 5, conv3_a, [1, 2, 2, 1]) # 0.4 MiB -> 68 x 120
conv5_a, _ = self._conv_layer("conv5", 750, 2048, 5, conv4_a, [1, 2, 2, 1]) # 0.1 MiB -> 34 x 60
There's also a good tutorial on the tensorflow website on how to set up a convolutional network:
The direct answer to your question is that the weights of the convolutional layer are defined there as w; that's the tensor you're asking about, if I understand you correctly.