Tensorflow weightNorm with variable length input - tensorflow

in_dim, out_dim = 10, 7
bias = False
activation = None
layer = tfp.layers.weight_norm.WeightNorm(
tf.keras.layers.Dense(out_dim, input_shape=(None, in_dim, ),
use_bias = bias, activation = activation),
input_shape = (None, None, in_dim))
I would like to give a input with variable length in the second dimension.
Suppose I run the below code first.
input = tf.random.normal(shape = (2, 3, 10))
output = layer(input)
output.shape
# [2, 3, 7]
After running the above code, I give another input to network
input2 = tf.random.normal(shape = (2, 4, 10))
output2 = layer(input2)
However, it causes error
Input 0 of layer "weight_norm_3" is incompatible with the layer:
expected shape=(None, 3, 10), found shape=(2, 4, 10)
I would like to give a variable length of second dimension. How can I do it?

Related

Add vs Concatenate layer in Keras

I'm looking through some different neural network architectures and trying to piece together how to recreate them on my own.
One issue I'm running into is the functional difference between the Concatenate() and Add() layers in Keras. It seems like they accomplish similar things (combining multiple layers together), but I don't quite see the real difference between the two.
Here's a sample keras model that takes two separate inputs and then combines them:
inputs1 = Input(shape = (32, 32, 3))
inputs2 = Input(shape = (32, 32, 3))
x1 = Conv2D(kernel_size = 24, strides = 1, filters = 64, padding = "same")(inputs1)
x1 = BatchNormalization()(x1)
x1 = ReLU()(x1)
x1 = Conv2D(kernel_size = 24, strides = 1, filters = 64, padding = "same")(x1)
x2 = Conv2D(kernel_size = 24, strides = 1, filters = 64, padding = "same")(inputs2)
x2 = BatchNormalization()(x2)
x2 = ReLU()(x2)
x2 = Conv2D(kernel_size = 24, strides = 1, filters = 64, padding = "same")(x2)
add = Concatenate()([x1, x2])
out = Flatten()(add)
out = Dense(24, activation = 'softmax')(out)
out = Dense(10, activation = 'softmax')(out)
out = Flatten()(out)
mod = Model([inputs1, inputs2], out)
I can substitute out the Add() layer with the Concatenate() layer and everything works fine, and the models seem similar, but I have a hard time understanding the difference.
For reference, here's the plot of each one with keras's plot_model function:
KERAS MODEL WITH ADDED LAYERS:
KERAS MODEL WITH CONCATENATED LAYERS:
I notice when you concatenate your model size is bigger vs adding layers. Is it the case with a concatenation you just stack the weights from the previous layers on top of each other and with an Add() you add together the values?
It seems like it should be more complicated, but I'm not sure.
As you said, both of them combine input, but they combine in a different way.
their name already suggest their usage
Add() inputs are added together,
For example (assume batch_size=1)
x1 = [[0, 1, 2]]
x2 = [[3, 4, 5]]
x = Add()([x1, x2])
then x should be [[3, 5, 7]], where each element is added
notice that the input shape is (1, 3) and (1, 3), the output is also (1, 3)
Concatenate() concatenates the output,
For example (assume batch_size=1)
x1 = [[0, 1, 2]]
x2 = [[3, 4, 5]]
x = Concatenate()([x1, x2])
then x should be [[0, 1, 2, 3, 4, 5]], where the inputs are horizontally stacked together,
notice that the input shape is (1, 3) and (1, 3), the output is also (1, 6),
even when the tensor has more dimensions, similar behaviors still apply.
Concatenate creates a bigger model for an obvious reason, the output size is simply the size of all inputs summed, while add has the same size with one of the inputs
For more information about add/concatenate, and other ways to combine multiple inputs, see this

Is there a way to divide the keras mobilenetv2 model into submodels?

I am trying to divide the mobilenetv2 model into 2 parts.
I first want to run the first part of the model, save the output, and feed it later on to the second model for certain reasons. I've tried code found here,
but I get the following error:
ValueError: A merge layer should be called on a list of inputs.
I think it is because the model isn't a Sequential.
Can someone help?
As I mentioned in my comments, some layers in mobile_net_v2 expect more than one inputs which are outputs of some other previous layers. Therefore adding them to a sequential model individually causes errors. I have an alternative solution for you. Using the mobile_net_v2 implementation (of my own) in this link, I was able to create the models you want:
import tensorflow as tf
from tensorflow.keras import layers, Model, Sequential
def conv_block(input_tensor, c, s, t, expand=True):
"""
Convolutional Block for mobile net v2
Args:
input_tensor (keras tensor): input tensor
c (int): output channels
s (int): stride size of first layer in the series
t (int): expansion factor
expand (bool): expand filters or not?
Returns: keras tensor
"""
first_conv_channels = input_tensor.get_shape()[-1]
if expand:
x = layers.Conv2D(
first_conv_channels*t,
1,
1,
padding='same',
use_bias=False
)(input_tensor)
x = layers.BatchNormalization()(x)
x = layers.ReLU(6.0)(x)
else:
x = input_tensor
x = layers.DepthwiseConv2D(
3,
s,
'same',
1,
use_bias=False
)(x)
x = layers.BatchNormalization()(x)
x = layers.ReLU(6.0)(x)
x = layers.Conv2D(
c,
1,
1,
padding='same',
use_bias=False
)(x)
x = layers.BatchNormalization()(x)
if input_tensor.get_shape() == x.get_shape() and s == 1:
return x+input_tensor
return x
def splitted_model(input_shape=(224,224,3)):
input = layers.Input(shape=input_shape)
x = layers.Conv2D(
32,
3,
2,
padding='same',
use_bias=False
)(input)
x = layers.BatchNormalization()(x)
x = layers.ReLU(6.0)(x)
x = conv_block(x, 16, 1, 1, expand=False)
x = conv_block(x, 24, 2, 6)
x = conv_block(x, 24, 1, 6)
x = conv_block(x, 32, 2, 6)
x = conv_block(x, 32, 1, 6)
x = conv_block(x, 32, 1, 6)
x = conv_block(x, 64, 2, 6)
x = conv_block(x, 64, 1, 6)
x = conv_block(x, 64, 1, 6)
x = conv_block(x, 64, 1, 6)
model_f = Model(inputs=input, outputs=x)
input_2 = layers.Input(shape=(x.shape[1:]))
x = conv_block(input_2, 96, 1, 6)
x = conv_block(x, 96, 1, 6)
x = conv_block(x, 96, 1, 6)
x = conv_block(x, 160, 2, 6)
x = conv_block(x, 160, 1, 6)
x = conv_block(x, 160, 1, 6)
x = conv_block(x, 320, 1, 6)
x = layers.Conv2D(
1280,
1,
1,
padding='same',
use_bias=False
)(x)
x = layers.BatchNormalization()(x)
x = layers.ReLU(6.0)(x)
x = layers.GlobalAveragePooling2D()(x)
model_h = Model(inputs=input_2, outputs=x)
return model_f, model_h
You could create your two models as such:
IMG_SIZE = 160
IMG_SHAPE = (IMG_SIZE, IMG_SIZE, 3)
model_f, model_h = splitted_model(input_shape=IMG_SHAPE)
Note that the weights are randomly initialized. If you want to have the weights from mobilenet_v2 trained on imagenet, you could run the following code to copy weights:
mobile_net = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
include_top=False,
weights='imagenet')
layer_f_counter = 0
layer_h_counter = 0
for i in range(len(mobile_net.layers)):
if layer_f_counter<len(model_f.layers):
if len(mobile_net.layers[i].get_weights()) > 0:
if len(model_f.layers[layer_f_counter].get_weights()) > 0:
print(mobile_net.layers[i].name,'here', model_f.layers[layer_f_counter].name, layer_f_counter)
model_f.layers[layer_f_counter].set_weights(mobile_net.layers[i].get_weights())
layer_f_counter += 1
print(layer_f_counter)
else:
if len(model_f.layers[layer_f_counter].get_weights()) > 0:
continue
else:
layer_f_counter+=1
else:
if layer_h_counter<len(model_h.layers):
if len(mobile_net.layers[i].get_weights()) > 0:
if len(model_h.layers[layer_h_counter].get_weights()) > 0:
print(mobile_net.layers[i].name,'here', model_h.layers[layer_h_counter].name, layer_h_counter)
model_h.layers[layer_h_counter].set_weights(mobile_net.layers[i].get_weights())
layer_h_counter += 1
print(layer_h_counter)
else:
if len(model_h.layers[layer_h_counter].get_weights()) > 0:
continue
else:
layer_h_counter+=1
It iterates through the layers of mobilenet_v2 loaded from Keras, it copies the weights of the first part to model_f, and the rest to model_h. You could check that the weights are correctly copied by print out some random layer weights from mobile_net and also the new models as follows:
print(model_f.layers[1].get_weights()) # printing weights of first conv layer in model_f
print(mobile_net.get_layer('Conv1').get_weights()) # printing weights of fist conv layer in mobile_net
Also for model_h:
print(model_h.layers[-4].get_weights()) # printing weights of last conv layer in model_h
print(mobile_net.get_layer('Conv_1').get_weights()) # printing weights of last conv layer in mobile_net
Note that I randomly selected which block to separate moile_net into model_f and model_h, you could edit it to change where you want to split. Hope it helps.

Unpredicted tensor shape after dynamical slicing

I am relatively new to TF and am wondering how to get tensor slice dynamically, from a unknown shape of tensor?
I want to get the weights from the last layer (output_layer) and do softmax and then only look at those indices on the 2-nd dimensions (from the out_reshape). The numpy-type of striding didnt work, so I am using tf.gather instead (after changing the axis so that the desired axis is on the first axis).
And this works:
out_reshape = tf.gather(out_reshape, [1,2,3,4])
This outputs a tensor with [4, 3, ?] (as we expected). But I want to change the indices based on the data fed to T (instead of [1,2,3,4] as shown above).
this gives an unpredicted result (as shown in the code below):
out_reshape = tf.gather(out_reshape, T)
and
out_reshape.shape
this gives TensorShape(None)), but I was expecting to get [?, 3, ?], where the first value is the same length as T (the data fed into the T is an 1-d ndarray, such as [100, 200, 300, 400]).
What is going on here? Why its output shape collapses to None?
The entire code is something like this:
graph = tf.Graph()
tf.reset_default_graph()
with graph.as_default():
y=tf.placeholder(tf.float32, shape =(31, 3, None), name = 'Y_observed') # (samples x) ind x seqlen
T=tf.placeholder(tf.int32, shape =(None), name = "T_observed")
x = tf.placeholder(tf.float32, shape = (None, None, 4) , name = 'X_observed')
model = Conv1D(filters = 16,
kernel_size = filter_width,
padding = "same",
activation='relu')(x)
model = Conv1D(filters = 16,
kernel_size = filter_width,
padding = "same",
activation='relu')(model)
model = Conv1D(filters = n_output_channels,
kernel_size = 1,
padding = "same",
activation='relu')(model)
model_output = tf.identity(model, name='last_layer')
output_layer = tf.get_default_graph().get_tensor_by_name('last_layer:0')
out = output_layer[:, 512:, :]
out_norm = tf.nn.softmax( out, axis=1 )
out_reshape = tf.transpose(out_norm, (1, 2, 0)) # this gives a [?,3,?] tensor
out_reshape = tf.gather(out_reshape, T) # --> Problematic part !
...
updates = tf.train.AdamOptimizer(1e-4).minimize....
...

tf.multinomial outputs number other numbers than range

I am working with the OpenAI gym environment (using policy gradient). My network is outputting an action which is higher than the possible action range.
n_outputs = 9
learning_rate = 0.01
initializer = tf.variance_scaling_initializer()
X = tf.placeholder(tf.float32, shape=[None, 50, 70, 1])
network = tflearn.conv_2d(X, 32, 5, strides=2, activation='relu')
network = tflearn.max_pool_2d(network, 2)
network = tflearn.conv_2d(network, 32, 5, strides=2, activation='relu')
network = tflearn.max_pool_2d(network, 2)
network = tflearn.fully_connected(network, 256, activation='relu')
hidden = tf.layers.dense(network, 64, activation=tf.nn.relu, kernel_initializer=initializer)
logits = tf.layers.dense(hidden, n_outputs)
outputs = tf.nn.softmax(logits)
action = tf.multinomial(outputs, num_samples=1)
It outputs 9, which creates an error in the gym environment.
The full code.
tf.multinomial will sample outside of the range if it encounters numerical error, so in other words - you have NaNs in your graph.

Problems with reshape in GAN's discriminator (Tensorflow)

I was trying to implement various GANs in Tensorflow (after doing it successfully in PyTorch), and I am having some problems while coding the discriminator part.
The code of the discriminator (very similar to the MNIST CNN tutorial) is:
def discriminator(x):
"""Compute discriminator score for a batch of input images.
Inputs:
- x: TensorFlow Tensor of flattened input images, shape [batch_size, 784]
Returns:
TensorFlow Tensor with shape [batch_size, 1], containing the score
for an image being real for each input image.
"""
with tf.variable_scope("discriminator"):
x = tf.reshape(x, [tf.shape(x)[0], 28, 28, 1])
h_1 = leaky_relu(tf.layers.conv2d(x, 32, 5))
m_1 = tf.layers.max_pooling2d(h_1, 2, 2)
h_2 = leaky_relu(tf.layers.conv2d(m_1, 64, 5))
m_2 = tf.layers.max_pooling2d(h_2, 2, 2)
m_2 = tf.contrib.layers.flatten(m_2)
h_3 = leaky_relu(tf.layers.dense(m_2, 4*4*64))
logits = tf.layers.dense(h_3, 1)
return logits
while the code for the generator (architecture of InfoGAN paper) is:
def generator(z):
"""Generate images from a random noise vector.
Inputs:
- z: TensorFlow Tensor of random noise with shape [batch_size, noise_dim]
Returns:
TensorFlow Tensor of generated images, with shape [batch_size, 784].
"""
with tf.variable_scope("generator"):
batch_size = tf.shape(z)[0]
fc = tf.nn.relu(tf.layers.dense(z, 1024))
bn_1 = tf.layers.batch_normalization(fc)
fc_2 = tf.nn.relu(tf.layers.dense(bn_1, 7*7*128))
bn_2 = tf.layers.batch_normalization(fc_2)
bn_2 = tf.reshape(bn_2, [batch_size, 7, 7, 128])
c_1 = tf.nn.relu(tf.contrib.layers.convolution2d_transpose(bn_2, 64, 4, 2, padding='valid'))
bn_3 = tf.layers.batch_normalization(c_1)
c_2 = tf.tanh(tf.contrib.layers.convolution2d_transpose(bn_3, 1, 4, 2, padding='valid'))
So far, so good. The number of parameters is correct (checked it). However, I am having some problems in the next block of code:
tf.reset_default_graph()
# number of images for each batch
batch_size = 128
# our noise dimension
noise_dim = 96
# placeholder for images from the training dataset
x = tf.placeholder(tf.float32, [None, 784])
# random noise fed into our generator
z = sample_noise(batch_size, noise_dim)
# generated images
G_sample = generator(z)
with tf.variable_scope("") as scope:
#scale images to be -1 to 1
logits_real = discriminator(preprocess_img(x))
# Re-use discriminator weights on new inputs
scope.reuse_variables()
logits_fake = discriminator(G_sample)
# Get the list of variables for the discriminator and generator
D_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, 'discriminator')
G_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, 'generator')
# get our solver
D_solver, G_solver = get_solvers()
# get our loss
D_loss, G_loss = gan_loss(logits_real, logits_fake)
# setup training steps
D_train_step = D_solver.minimize(D_loss, var_list=D_vars)
G_train_step = G_solver.minimize(G_loss, var_list=G_vars)
D_extra_step = tf.get_collection(tf.GraphKeys.UPDATE_OPS, 'discriminator')
G_extra_step = tf.get_collection(tf.GraphKeys.UPDATE_OPS, 'generator')
The problem I am getting is where I am doing the reshape in the discriminator, and the error says:
ValueError: None values not supported.
Sure, the value for the batch_size is None (btw, the same error I am getting even where I am changing it to some number), but shape function (as far as I understand) should get the dynamic shape, not the static one. I think that I am a bit lost here.
For what is worth, I am giving here the link to the entire notebook I am working: https://github.com/TheRevanchist/GANs/blob/master/GANs-TensorFlow.ipynb if someone wants to look at it.
NB: The code here is part of the Stanford CS231n assignment. I have no affiliation with Stanford though, so it isn't homework cheating (proof: the course is finished months ago).
The generator seems to be the problem. The output size should match the discriminator. And the other issues are batch norm should be applied before the activation unit. I have modified the code:
with tf.variable_scope("generator"):
fc = tf.layers.dense(z, 4*4*128)
bn_1 = leaky_relu(tf.layers.batch_normalization(fc))
bn_1 = tf.reshape(bn_1, [-1, 4, 4, 128])
c_1 = tf.layers.conv2d_transpose(bn_1, 64, 5, strides=2, padding='same')
bn_2 = leaky_relu(tf.layers.batch_normalization(c_1))
c_2 = tf.layers.conv2d_transpose(bn_2, 32, 5, strides=2, padding='same')
bn_3 = leaky_relu(tf.layers.batch_normalization(c_2))
c_3 = tf.layers.conv2d_transpose(bn_3, 1, 5, strides=2, padding='same')
c_3 = tf.layers.batch_normalization(c_3)
c_3 = tf.image.resize_images(c_3, (28, 28))
c_3 = tf.contrib.layers.flatten(c_3)
c_3 = tf.tanh(c_3)
return c_3
Your code gives the below output when run with the above changes
Instead of passing None to reshape you must pass -1.
So this:
x = tf.reshape(x, [tf.shape(x)[0], 28, 28, 1])
becomes
x = tf.reshape(x, [-1, 28, 28, 1])
and this:
bn_2 = tf.reshape(bn_2, [batch_size, 7, 7, 128])
becomes:
bn_2 = tf.reshape(bn_2, [-1, 7, 7, 128])
It will infer the batch size from the rest of the shape you provided.