Capsule network parameters - tensorflow

i have found the parameters used for MNIST dataset which is as below
# Parameters Based on Paper
epsilon = 1e-7
m_plus = 0.9
m_minus = 0.1
lambda_ = 0.5
alpha = 0.0005
epochs = 3
no_of_secondary_capsules = 10
params = {
"no_of_conv_kernels": 256,
"no_of_primary_capsules": 64,
"no_of_secondary_capsules": 128,
"primary_capsule_vector": 16,
"secondary_capsule_vector": 32,
"r":3,
}
the input shape for MNIST is 28,28,1
I want this parameters change for my input data shaped as 13,9,1
because when I use the MNIST parameters for capsule network it throws error about the shape
ValueError: Exception encountered when calling layer "primary_caps" (type PrimaryCaps).
in user code:
File "/content/Efficient-CapsNet/utils/layers_hinton.py", line 69, in call *
x = tf.nn.conv2d(inputs, self.kernel, self.s, 'VALID')
ValueError: Negative dimension size caused by subtracting 9 from 5 for '{{node primary_caps/Conv2D}} = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], explicit_paddings=[], padding="VALID", strides=[1, 2, 2, 1], use_cudnn_on_gpu=true](Placeholder, primary_caps/Conv2D/ReadVariableOp)' with input shapes: [?,5,11,256], [9,9,256,256].
Call arguments received:
• inputs=tf.Tensor(shape=(None, 5, 11, 256), dtype=float32)
can someone suggest parameters for capsule network?

The data was audio (13,9,1) so converting it to spectrogram image and then reading it with target size (28,28) helped me workaround the issue of using capsule network for the audio dataset.
This workaround can be used if you want to go with the original hyperparameters and network designs of the capsule network with dynamic routing paper.

Related

Random 3d image slicing tensorflow data, depth of NoneType shape

What I need to do is to cut some slices (fix size) of a 3D-binary masks randomly.
The data is stored in a tensorflow dataset (tf.data). It does have to be this kind of data type to be able to use caching for speed up.
My source code so far:
import tensorflow as tf #version 2.2.0
mask.shape # (512,512,None,1), where (width, height, depth, channel), depth is NOT FIXED and depends on the image and therefore unknown
slice_number = 10
positive = tf.where(tf.equal(masks[:, :, :-slice_number,:],1))[:, 2] #slices with non zero values
# now we need to select slice id from positive mask slices randomly,
# which failes since the shape is always None due to the fact that image depth is unknown.
pos_id = random.randint(0, positive.shape[0])
mask = mask[:, :, positive[pos_id]:positive[pos_id] + slice_number]
How do I get the shape? Any ideas are highly appreciated
Thanks in advance!
Assuming that you want to randomly slice a fixed slice_size from a Tensor dimension with unknown depth, the following demonstrates how it can be done:
import tensorflow as tf
#tf.function
def random_slice(slice_size):
# For demonstration purposes, generate your mask with random depth
random_depth = tf.random.uniform(shape=[], dtype=tf.int32,
minval=20, maxval=50)
mask = tf.ones([512, 512, random_depth, 1], dtype=tf.int32)
print(mask) # Mask with unknown depth: Tensor("ones:0", shape=(512, 512, None, 1), dtype=int32)
depth = tf.shape(mask)[2]
print(depth) # Unknown depth: Tensor("strided_slice:0", shape=(), dtype=int32)
depth_begin = tf.random.uniform(shape=[], dtype=tf.int32,
minval=0, maxval=depth-slice_size)
print(depth_begin) # Random begin of slice based on unknown depth: Tensor("random_uniform_1:0", shape=(), dtype=int32)
mask_sliced = tf.slice(mask,
begin=[0, 0, depth_begin, 0],
size=[512, 512, slice_size, 1])
print(mask_sliced) # Random slice with known dimensions: Tensor("Slice:0", shape=(512, 512, 10, 1), dtype=int32)
return mask_sliced
mask_sliced = random_slice(slice_size=10)
print(mask_sliced) # Resolved random slice

How to Feed Batched Sequences of Images through Tensorflow conv2d

This seems like a trivial question, but I've been unable to find the answer.
I have batched sequences of images of shape:
[batch_size, number_of_frames, frame_height, frame_width, number_of_channels]
and I would like to pass each frame through a few convolutional and pooling layers. However, TensorFlow's conv2d layer accepts 4D inputs of shape:
[batch_size, frame_height, frame_width, number_of_channels]
My first attempt was to use tf.map_fn over axis=1, but I discovered that this function does not propagate gradients.
My second attempt was to use tf.unstack over the first dimension and then use tf.while_loop. However, my batch_size and number_of_frames are dynamically determined (i.e. both are None), and tf.unstack raises {ValueError} Cannot infer num from shape (?, ?, 30, 30, 3) if num is unspecified. I tried specifying num=tf.shape(self.observations)[1], but this raises {TypeError} Expected int for argument 'num' not <tf.Tensor 'A2C/infer/strided_slice:0' shape=() dtype=int32>.
Since all the images (num_of_frames) are passed to the same convolutional model, you can stack both batch and frames together and do the normal convolution. Can be achieved by just using tf.resize as shown below:
# input with size [batch_size, frame_height, frame_width, number_of_channels
x = tf.placeholder(tf.float32,[None, None,32,32,3])
# reshape for the conv input
x_reshapped = tf.reshape(x,[-1, 32, 32, 3])
x_reshapped output size will be (50, 32, 32, 3)
# define your conv network
y = tf.layers.conv2d(x_reshapped,5,kernel_size=(3,3),padding='SAME')
#(50, 32, 32, 3)
#Get back the input shape
out = tf.reshape(x,[-1, tf.shape(x)[1], 32, 32, 3])
The output size would be same as the input: (10, 5, 32, 32, 3
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
print(sess.run(out, {x:np.random.normal(size=(10,5,32,32,3))}).shape)
#(10, 5, 32, 32, 3)

Cifar10 with variable batch and image size fails at reshape before fully connected (tf-slim)

I'm trying to set CIFAR10's tf-slim model to have input of dynamic batch, height, width and single channel, i.e. monochromatic images of different sizes. Given that all shapes but channel size are dynamic, the output shape of tf.flatten is (?, ?). Is there any way to circumvent this? I'm trying to adapt CIFAR10 to tf's DeepDream tutorial that uses InceptionV3 with an unspecific input shape.
I'm assuming this happens because CIFAR10 is not fully convolutional
import tensorflow as tf
slim = tf.contrib.slim
images = tf.placeholder(tf.float32, shape=(None, None, None, 1))
NUM_CLASSES = 18
scope = 'CifarNet'
with tf.variable_scope(scope, 'CifarNet', [images, NUM_CLASSES]):
net = slim.conv2d(images, 64, [5, 5], scope='conv1')
net = slim.max_pool2d(net, [2, 2], 2, scope='pool1')
net = tf.nn.lrn(net, 4, bias=1.0, alpha=0.001/9.0, beta=0.75, name='norm1')
net = slim.conv2d(net, 64, [5, 5], scope='conv2')
net = tf.nn.lrn(net, 4, bias=1.0, alpha=0.001/9.0, beta=0.75, name='norm2')
net = slim.max_pool2d(net, [2, 2], 2, scope='pool2')
net = slim.flatten(net)
net = slim.fully_connected(net, 384, scope='fc3')
ValueError: The last dimension of the inputs to Dense should be defined. Found None.

Problems with reshape in GAN's discriminator (Tensorflow)

I was trying to implement various GANs in Tensorflow (after doing it successfully in PyTorch), and I am having some problems while coding the discriminator part.
The code of the discriminator (very similar to the MNIST CNN tutorial) is:
def discriminator(x):
"""Compute discriminator score for a batch of input images.
Inputs:
- x: TensorFlow Tensor of flattened input images, shape [batch_size, 784]
Returns:
TensorFlow Tensor with shape [batch_size, 1], containing the score
for an image being real for each input image.
"""
with tf.variable_scope("discriminator"):
x = tf.reshape(x, [tf.shape(x)[0], 28, 28, 1])
h_1 = leaky_relu(tf.layers.conv2d(x, 32, 5))
m_1 = tf.layers.max_pooling2d(h_1, 2, 2)
h_2 = leaky_relu(tf.layers.conv2d(m_1, 64, 5))
m_2 = tf.layers.max_pooling2d(h_2, 2, 2)
m_2 = tf.contrib.layers.flatten(m_2)
h_3 = leaky_relu(tf.layers.dense(m_2, 4*4*64))
logits = tf.layers.dense(h_3, 1)
return logits
while the code for the generator (architecture of InfoGAN paper) is:
def generator(z):
"""Generate images from a random noise vector.
Inputs:
- z: TensorFlow Tensor of random noise with shape [batch_size, noise_dim]
Returns:
TensorFlow Tensor of generated images, with shape [batch_size, 784].
"""
with tf.variable_scope("generator"):
batch_size = tf.shape(z)[0]
fc = tf.nn.relu(tf.layers.dense(z, 1024))
bn_1 = tf.layers.batch_normalization(fc)
fc_2 = tf.nn.relu(tf.layers.dense(bn_1, 7*7*128))
bn_2 = tf.layers.batch_normalization(fc_2)
bn_2 = tf.reshape(bn_2, [batch_size, 7, 7, 128])
c_1 = tf.nn.relu(tf.contrib.layers.convolution2d_transpose(bn_2, 64, 4, 2, padding='valid'))
bn_3 = tf.layers.batch_normalization(c_1)
c_2 = tf.tanh(tf.contrib.layers.convolution2d_transpose(bn_3, 1, 4, 2, padding='valid'))
So far, so good. The number of parameters is correct (checked it). However, I am having some problems in the next block of code:
tf.reset_default_graph()
# number of images for each batch
batch_size = 128
# our noise dimension
noise_dim = 96
# placeholder for images from the training dataset
x = tf.placeholder(tf.float32, [None, 784])
# random noise fed into our generator
z = sample_noise(batch_size, noise_dim)
# generated images
G_sample = generator(z)
with tf.variable_scope("") as scope:
#scale images to be -1 to 1
logits_real = discriminator(preprocess_img(x))
# Re-use discriminator weights on new inputs
scope.reuse_variables()
logits_fake = discriminator(G_sample)
# Get the list of variables for the discriminator and generator
D_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, 'discriminator')
G_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, 'generator')
# get our solver
D_solver, G_solver = get_solvers()
# get our loss
D_loss, G_loss = gan_loss(logits_real, logits_fake)
# setup training steps
D_train_step = D_solver.minimize(D_loss, var_list=D_vars)
G_train_step = G_solver.minimize(G_loss, var_list=G_vars)
D_extra_step = tf.get_collection(tf.GraphKeys.UPDATE_OPS, 'discriminator')
G_extra_step = tf.get_collection(tf.GraphKeys.UPDATE_OPS, 'generator')
The problem I am getting is where I am doing the reshape in the discriminator, and the error says:
ValueError: None values not supported.
Sure, the value for the batch_size is None (btw, the same error I am getting even where I am changing it to some number), but shape function (as far as I understand) should get the dynamic shape, not the static one. I think that I am a bit lost here.
For what is worth, I am giving here the link to the entire notebook I am working: https://github.com/TheRevanchist/GANs/blob/master/GANs-TensorFlow.ipynb if someone wants to look at it.
NB: The code here is part of the Stanford CS231n assignment. I have no affiliation with Stanford though, so it isn't homework cheating (proof: the course is finished months ago).
The generator seems to be the problem. The output size should match the discriminator. And the other issues are batch norm should be applied before the activation unit. I have modified the code:
with tf.variable_scope("generator"):
fc = tf.layers.dense(z, 4*4*128)
bn_1 = leaky_relu(tf.layers.batch_normalization(fc))
bn_1 = tf.reshape(bn_1, [-1, 4, 4, 128])
c_1 = tf.layers.conv2d_transpose(bn_1, 64, 5, strides=2, padding='same')
bn_2 = leaky_relu(tf.layers.batch_normalization(c_1))
c_2 = tf.layers.conv2d_transpose(bn_2, 32, 5, strides=2, padding='same')
bn_3 = leaky_relu(tf.layers.batch_normalization(c_2))
c_3 = tf.layers.conv2d_transpose(bn_3, 1, 5, strides=2, padding='same')
c_3 = tf.layers.batch_normalization(c_3)
c_3 = tf.image.resize_images(c_3, (28, 28))
c_3 = tf.contrib.layers.flatten(c_3)
c_3 = tf.tanh(c_3)
return c_3
Your code gives the below output when run with the above changes
Instead of passing None to reshape you must pass -1.
So this:
x = tf.reshape(x, [tf.shape(x)[0], 28, 28, 1])
becomes
x = tf.reshape(x, [-1, 28, 28, 1])
and this:
bn_2 = tf.reshape(bn_2, [batch_size, 7, 7, 128])
becomes:
bn_2 = tf.reshape(bn_2, [-1, 7, 7, 128])
It will infer the batch size from the rest of the shape you provided.

Using Estimator for building an LSTM network

I am trying to build an LSTM network using an Estimator. My data looks like
X = [[1,2,3], [2,3,4], ... , [98,99,100]]
y = [2, 3, ... , 99]
I am using an Estimator:
regressor = learn.Estimator(model_fn=lstm_model,
params=model_params,
)
where the lstm_model function is
def lstm_model(features, targets, mode, params):
def lstm_cells(layers):
if isinstance(layers[0], dict):
return [tf.nn.rnn_cell.BasicLSTMCell(layer['steps'],state_is_tuple=True) for layer in layers]
return [tf.nn.rnn_cell.BasicLSTMCell(steps, state_is_tuple=True) for steps in layers]
stacked_lstm = tf.nn.rnn_cell.MultiRNNCell(lstm_cells(params['rnn_layers']), state_is_tuple=True)
output, layers = tf.nn.rnn(stacked_lstm, [features], dtype=tf.float32)
return learn.models.linear_regression(output, targets)
and params are
model_params = {
'steps': 1000,
'learning_rate': 0.03,
'batch_size': 24,
'time_steps': 3,
'rnn_layers': [{'steps': 3}],
'dense_layers': [10, 10]
}
and then I do the fitting
regressor.fit(X, y)
The issue I am facing is
output, layers = tf.nn.rnn(stacked_lstm, [features], dtype=tf.float32)
requires a sequence but I am not sure how to split my features to into list of tensors. The shape of features inside the lstm_model function is (?, 3)
I have two questions, how do I do the training in batches? and how do I split 'features' so
output, layers = tf.nn.rnn(stacked_lstm, [features], dtype=tf.float32)
doesn't throw and error. The error I am getting is
raise TypeError("%s that don't all match." % prefix)
TypeError: Tensors in list passed to 'values' of 'Concat' Op have types [float64, float32] that don't all match.
I am using tensorflow 0.12
I had to set the shape for features to be
(batch_size, time_step, 1) or (None, time_step, 1) and then unstack the features to go in the rnn. Unstacking the features in the "time_step" so you have a list of tensors with the size of time steps and the shape for each tensor should be (None, 1) or (batch_size, 1)