What does `y_dim` represent in the following GANs model?

This question is about an example implementation of GANs on images using TensorFlow.
I've excerpted the part of the code that I think gives enough context; see the reference for the full code. The code below defines a discriminator function, which can loosely be considered a typical convolutional neural network. As you can see, the discriminator's operations are conditioned on y_dim; could someone help explain what y_dim is? Even after looking at the Args docstring, I am still very confused about its definition.
class DCGAN(object):
    def __init__(self, sess, image_size=108, is_crop=True,
                 batch_size=64, sample_size=64, output_size=64,
                 y_dim=None, z_dim=100, gf_dim=64, df_dim=64,
                 gfc_dim=1024, dfc_dim=1024, c_dim=3, dataset_name='default',
                 checkpoint_dir=None, sample_dir=None):
        """
        Args:
            sess: TensorFlow session
            batch_size: The size of batch. Should be specified before training.
            output_size: (optional) The resolution in pixels of the images. [64]
            y_dim: (optional) Dimension of dim for y. [None]
            z_dim: (optional) Dimension of dim for Z. [100]
            gf_dim: (optional) Dimension of gen filters in first conv layer. [64]
            df_dim: (optional) Dimension of discrim filters in first conv layer. [64]
            gfc_dim: (optional) Dimension of gen units for fully connected layer. [1024]
            dfc_dim: (optional) Dimension of discrim units for fully connected layer. [1024]
            c_dim: (optional) Dimension of image color. For grayscale input, set to 1. [3]
        """
        ...

    def discriminator(self, image, y=None, reuse=False):
        if reuse:
            tf.get_variable_scope().reuse_variables()

        if not self.y_dim:
            h0 = lrelu(conv2d(image, self.df_dim, name='d_h0_conv'))
            h1 = lrelu(self.d_bn1(conv2d(h0, self.df_dim * 2, name='d_h1_conv')))
            h2 = lrelu(self.d_bn2(conv2d(h1, self.df_dim * 4, name='d_h2_conv')))
            h3 = lrelu(self.d_bn3(conv2d(h2, self.df_dim * 8, name='d_h3_conv')))
            h4 = linear(tf.reshape(h3, [self.batch_size, -1]), 1, 'd_h3_lin')
            return tf.nn.sigmoid(h4), h4
        else:
            yb = tf.reshape(y, [self.batch_size, 1, 1, self.y_dim])
            x = conv_cond_concat(image, yb)

            h0 = lrelu(conv2d(x, self.c_dim + self.y_dim, name='d_h0_conv'))
            h0 = conv_cond_concat(h0, yb)

            h1 = lrelu(self.d_bn1(conv2d(h0, self.df_dim + self.y_dim, name='d_h1_conv')))
            h1 = tf.reshape(h1, [self.batch_size, -1])
            h1 = tf.concat(1, [h1, y])

            h2 = lrelu(self.d_bn2(linear(h1, self.dfc_dim, 'd_h2_lin')))
            h2 = tf.concat(1, [h2, y])

            h3 = linear(h2, 1, 'd_h3_lin')
            return tf.nn.sigmoid(h3), h3

y_dim is the dimension of the label vector y, i.e. the number of classes when y is a one-hot label.
The DCGAN class uses it to say "do we know the number of training labels?". If we do, the model is conditioned on the labels and all of those conditioning tensors can be defined with the known dimension; if y_dim is None, the unconditional branch is used instead.
E.g. look at the main.py file where the MNIST dataset is used: y_dim is set to 10 for the 0-9 digit labels, and it stays None for the second (unconditional) option.
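As a rough sketch of what the conditional branch effectively does (my own illustration, not from the repository; assuming one-hot MNIST labels with y_dim = 10 and 28x28 grayscale images), conv_cond_concat broadcasts the label over the spatial dimensions and appends it as extra feature maps:
import tensorflow as tf

batch_size, y_dim = 64, 10                           # 10 classes for digits 0-9
image = tf.placeholder(tf.float32, [batch_size, 28, 28, 1])
y = tf.placeholder(tf.float32, [batch_size, y_dim])  # one-hot labels

yb = tf.reshape(y, [batch_size, 1, 1, y_dim])
# Tile the label over the spatial dimensions and concatenate it as channels:
x = tf.concat([image, yb * tf.ones([batch_size, 28, 28, y_dim])], axis=3)
print(x.shape)   # (64, 28, 28, 11): 1 image channel + 10 label channels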
Hope this helps.

How to use Keras Multiply() with tf.Variable?

How do I multiply tf.keras.layers with tf.Variable?
Context: I am creating a sample-dependent convolutional filter, which consists of a generic filter W that is transformed through sample-dependent shifting and scaling. The original convolutional filter W is therefore transformed into aW + b, where a is the sample-dependent scale and b is the sample-dependent shift. One application of this is training an autoencoder where the sample dependency is the label, so each label shifts and scales the convolutional filter. Because of the sample/label-dependent convolutions, I am using tf.nn.conv2d, which takes the actual filters as input (as opposed to just the number/size of filters), and a Lambda layer with tf.map_fn to apply a different "transformed filter" (based on the label) to each sample. Although the details are different, this kind of sample-dependent convolution approach is discussed in this post: Tensorflow: Convolutions with different filter for each sample in the mini-batch.
Here is what I am thinking:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

input_img = keras.Input(shape=(28, 28, 1))
label = keras.Input(shape=(10,)) # number of classes
num_filters = 32
shift = layers.Dense(num_filters, activation=None, name='shift')(label) # (32,)
scale = layers.Dense(num_filters, activation=None, name='scale')(label) # (32,)
# filter is of shape (filter_h, filter_w, input channels, output filters)
filter = tf.Variable(tf.ones((3,3,input_img.shape[-1],num_filters)))
# TODO: need to shift and scale -> shift*(filter) + scale along each output filter dimension (32 filter dimensions)
I am not sure how to implement the TODO part. I was thinking of using tf.keras.layers.Multiply() for the scaling and tf.keras.layers.Add() for the shifting, but as far as I know they do not work with a tf.Variable. How do I get around this? Assuming the dimensions/shape broadcasting work out, I would like to do something like the following (note: the output should still have the same shape as var and just be scaled along each of the 32 output filter dimensions):
output = tf.keras.layers.Multiply()([var, scale])
It requires some work and needs a custom layer; for example, you cannot use a tf.Variable inside tf.keras.layers.Lambda.
class ConvNorm(layers.Layer):
    def __init__(self, height, width, n_filters):
        super(ConvNorm, self).__init__()
        self.height = height
        self.width = width
        self.n_filters = n_filters

    def build(self, input_shape):
        self.filter = self.add_weight(
            shape=(self.height, self.width, input_shape[-1], self.n_filters),
            initializer='glorot_uniform',
            trainable=True)
        # TODO: Add bias too

    def call(self, x, scale, shift):
        shift_reshaped = tf.expand_dims(tf.expand_dims(shift, 1), 1)
        scale_reshaped = tf.expand_dims(tf.expand_dims(scale, 1), 1)
        norm_conv_out = tf.nn.conv2d(x, self.filter * scale + shift,
                                     strides=(1, 1, 1, 1), padding='SAME')
        return norm_conv_out
Using the layer
import tensorflow as tf
import tensorflow.keras.layers as layers
input_img = layers.Input(shape=(28, 28, 1))
label = layers.Input(shape=(10,)) # number of classes
num_filters = 32
shift = layers.Dense(num_filters, activation=None, name='shift')(label) # (32,)
scale = layers.Dense(num_filters, activation=None, name='scale')(label) # (32,)
conv_norm_out = ConvNorm(3, 3, 32)(input_img, scale, shift)
print(conv_norm_out.shape)
Note: I haven't added a bias. You will need a bias for the convolution layer as well, but that's straightforward.
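For completeness, here is one way the bias could be added (a sketch of my own, not part of the original answer), extending the ConvNorm layer above:
# Inside build():
self.bias = self.add_weight(shape=(self.n_filters,),
                            initializer='zeros',
                            trainable=True)

# Inside call(), after the convolution:
norm_conv_out = tf.nn.bias_add(norm_conv_out, self.bias)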

Parameters computation of LSTM

I am trying to compute the total number of parameters of an LSTM model, and I have some confusion.
I have searched for answers, such as this post and this post, but I still don't understand what role the hidden units play in the parameter computation (h1 = 64, h2 = 128 in my case).
import tensorflow as tf
b, t, d_in, d_out = 32, 256, 161, 257
data = tf.placeholder("float", [b, t, d_in]) # [batch, timestep, dim_in]
labels = tf.placeholder("float", [b, t, d_out]) # [batch, timestep, dim_out]
myinput = data
batch_size, seq_len, dim_in = myinput.shape
rnn_layers = []
h1 = 64
c1 = tf.nn.rnn_cell.LSTMCell(h1)
rnn_layers.append(c1)
h2 = 128
c2 = tf.nn.rnn_cell.LSTMCell(h1)
rnn_layers.append(c2)
multi_rnn_cell = tf.nn.rnn_cell.MultiRNNCell(rnn_layers)
rnnoutput, state = tf.nn.dynamic_rnn(cell=multi_rnn_cell,
                                     inputs=myinput, dtype=tf.float32)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
all_trainable_vars = tf.reduce_sum([tf.reduce_prod(v.shape) for v in tf.trainable_variables()])
print(sess.run(all_trainable_vars))
I printed the total number of parameters using TensorFlow, and it reports 90880 trainable parameters. How can I derive this result step by step? Thank you.
In your case, you defined an LSTM cell via the line c1 = tf.nn.rnn_cell.LSTMCell(h1). To answer your question, I will first introduce the mathematical definition of the LSTM, as in the picture below (picture source: the Wikipedia article on LSTM):
t: the time step t.
f_t: the forget gate.
i_t: the input gate.
o_t: the output gate.
c_t, h_t: the cell state and hidden state of the LSTM cell, respectively.
For tf.nn.rnn_cell.LSTMCell(h1), h1 = 64 is the dimension of h_t, i.e. dim(h_t) = 64.
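To get from there to 90880: with the default arguments (no peepholes, no projection), each tf.nn.rnn_cell.LSTMCell holds a kernel of shape [input_depth + num_units, 4 * num_units] and a bias of shape [4 * num_units]; the factor 4 covers the input, forget and output gates plus the candidate cell input. Also note that in the posted code the second cell is built with h1 (c2 = tf.nn.rnn_cell.LSTMCell(h1)), so both layers actually have 64 units. A quick check:
def lstm_params(input_depth, num_units):
    # kernel: (input_depth + num_units, 4 * num_units), bias: (4 * num_units,)
    return (input_depth + num_units) * 4 * num_units + 4 * num_units

layer1 = lstm_params(161, 64)  # input dim 161, 64 units -> 57856
layer2 = lstm_params(64, 64)   # input is layer 1's output (64), 64 units -> 33024
print(layer1, layer2, layer1 + layer2)  # 57856 33024 90880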

Understanding Tensorflow BasicLSTMCell Kernel and Bias shape

I want to better understand the shapes of TensorFlow's BasicLSTMCell kernel and bias.
@tf_export("nn.rnn_cell.BasicLSTMCell")
class BasicLSTMCell(LayerRNNCell):
    input_depth = inputs_shape[1].value
    h_depth = self._num_units
    self._kernel = self.add_variable(
        _WEIGHTS_VARIABLE_NAME,
        shape=[input_depth + h_depth, 4 * self._num_units])
    self._bias = self.add_variable(
        _BIAS_VARIABLE_NAME,
        shape=[4 * self._num_units],
        initializer=init_ops.zeros_initializer(dtype=self.dtype))
Why does the kernel have the shape [input_depth + h_depth, 4 * self._num_units] and the bias the shape [4 * self._num_units]? Does the factor 4 maybe come from the forget gate, block input, input gate and output gate? And what is the reason for the summation of input_depth and h_depth?
More information about my LSTM Network:
num_input = 12, timesteps = 820, num_hidden = 64, num_classes = 2.
With tf.trainable_variables() I get the following information:
Variable name: Variable:0 Shape: (64, 2) Parameters: 128
Variable name: Variable_1:0 Shape: (2,) Parameters: 2
Variable name: rnn/basic_lstm_cell/kernel:0 Shape: (76, 256) Parameters: 19456
Variable name: rnn/basic_lstm_cell/bias:0 Shape: (256,) Parameters: 256
The following code defines my LSTM network.
def RNN(x, weights, biases):
    x = tf.unstack(x, timesteps, 1)
    lstm_cell = rnn.BasicLSTMCell(num_hidden)
    outputs, states = rnn.static_rnn(lstm_cell, x, dtype=tf.float32)
    return tf.matmul(outputs[-1], weights['out']) + biases['out']
First, about summing input_depth and h_depth: RNNs generally follow equations like h_t = W*h_t-1 + V*x_t to compute the state h at time t. That is, we apply a matrix multiplication to the last state and the current input and add the two. This is actually equivalent to concatenating h_t-1 and x_t (let's just call this c), "stacking" the two matrices W and V (let's just call this S) and computing S*c.
Now we only have one matrix multiplication instead of two; I believe this can be parallelized more effectively so this is done for performance reasons. Since h_t-1 has size h_depth and x has size input_depth we need to add the two dimensionalities for the concatenated vector c.
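A tiny NumPy check of that equivalence (my own illustration, with arbitrary small dimensions):
import numpy as np

h_depth, input_depth, out_dim = 4, 3, 5
W = np.random.randn(out_dim, h_depth)      # multiplies the previous state h
V = np.random.randn(out_dim, input_depth)  # multiplies the current input x
h = np.random.randn(h_depth)
x = np.random.randn(input_depth)

S = np.hstack([W, V])        # the two weight matrices "stacked" side by side
c = np.concatenate([h, x])   # the state and the input concatenated
assert np.allclose(S @ c, W @ h + V @ x)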
Second, you are right about the factor 4 coming from the gates. This is essentially the same as above: Instead of carrying out four separate matrix multiplications for the input and each of the gates, we carry out one multiplication that results in a big vector that is the input and all four gate values concatenated. Then we can just split this vector into four parts. In the LSTM cell source code this happens in lines 627-633.
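With the numbers from your network this checks out: input_depth = num_input = 12 and h_depth = num_hidden = 64, so the kernel has shape [12 + 64, 4 * 64] = (76, 256) with 76 * 256 = 19456 parameters, and the bias has shape (256,). The remaining (64, 2) and (2,) variables are the output projection weights['out'] and biases['out'] that you define outside the cell.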

What's the effect of projection layer after n-grams convolution banks?

I'm studying the CBHG module for extracting representations from sequences in Tacotron.
CBHG consists of a 1-D convolution bank, a highway network and a bidirectional GRU.
The input to the 1-D convolutions is produced by a lookup_table that embeds the characters a-z.
After the 1-D convolutions are computed, tf.concat concatenates their results,
and a projection is then applied to that concatenated result.
I think it is related to word2vec-style embeddings such as CBOW, but it is very hard for me to follow.
What is the effect of the projection layer after the concatenated result of the n-gram convolution bank? Is it extracting a meaningful 'n' from the n-gram conv bank?
Please help me.
with tf.variable_scope('conv_bank'):
    # Convolution bank: concatenate on the last axis
    # to stack channels from all convolutions
    conv_fn = lambda k: \
        conv1d(inputs, k, bank_channel_size,
               tf.nn.relu, is_training, 'conv1d_%d' % k)
    conv_outputs = tf.concat(
        [conv_fn(k) for k in range(1, bank_size + 1)], axis=-1,
    )

# Maxpooling:
maxpool_output = tf.layers.max_pooling1d(
    conv_outputs,
    pool_size=maxpool_width,
    strides=1,
    padding='same')

# Two projection layers:
proj_out = maxpool_output
for idx, proj_size in enumerate(proj_sizes):
    activation_fn = None if idx == len(proj_sizes) - 1 else tf.nn.relu
    proj_out = conv1d(
        proj_out, proj_width, proj_size, activation_fn,
        is_training, 'proj_{}'.format(idx + 1))

how to stack LSTM layers using TensorFlow

What I have is the following, which I believe is a network with one hidden LSTM layer:
# Parameters
learning_rate = 0.001
training_iters = 100000
batch_size = 128
display_step = 10

# Network Parameters
n_input = 13
n_steps = 10
n_hidden = 512
n_classes = 13

# tf Graph input
x = tf.placeholder("float", [None, n_steps, n_input])
y = tf.placeholder("float", [None, n_classes])

# Define weights
weights = {
    'out': tf.Variable(tf.random_normal([n_hidden, n_classes]))
}
biases = {
    'out': tf.Variable(tf.random_normal([n_classes]))
}
However, I am trying to build an LSTM network using TensorFlow to predict power consumption. I have been looking around for a good example, but I could not find any model with 2 hidden LSTM layers. Here is the model that I would like to build:
1 input layer,
1 output layer,
2 hidden LSTM layers (with 512 neurons in each),
time step (sequence length): 10.
Could anyone guide me on building this using TensorFlow (defining the weights, building the input shape, training, predicting, using an optimizer or cost function, etc.)? Any help would be much appreciated.
Thank you so much in advance!
Here is how I do it in a translation model with GRU cells; you can just replace the GRU with an LSTM. It is really easy: just use tf.nn.rnn_cell.MultiRNNCell with a list of the multiple cells it should wrap. In the code below I am manually unrolling it, but you can pass it to tf.nn.dynamic_rnn or tf.nn.rnn as well.
y = input_tensor
with tf.variable_scope('encoder') as scope:
    rnn_cell = rnn.MultiRNNCell([rnn.GRUCell(1024) for _ in range(3)])
    state = tf.zeros((BATCH_SIZE, rnn_cell.state_size))
    output = [None] * TIME_STEPS
    for t in reversed(range(TIME_STEPS)):
        y_t = tf.reshape(y[:, t, :], (BATCH_SIZE, -1))
        output[t], state = rnn_cell(y_t, state)
        scope.reuse_variables()
    y = tf.pack(output, 1)
First you need some placeholders to put your training data (one batch)
x_input = tf.placeholder(tf.float32, [batch_size, truncated_series_length, 1])
y_output = tf.placeholder(tf.float32, [batch_size, truncated_series_length, 1])
An LSTM needs a state, which consists of two components, the hidden state and the cell state; there is a very good guide here: https://arxiv.org/pdf/1506.00019.pdf. For every layer in the LSTM you have one cell state and one hidden state.
The problem is that TensorFlow stores this in an LSTMStateTuple, which you cannot feed into a placeholder. So you need to store the state in a tensor and then unpack it into a tuple:
state_placeholder = tf.placeholder(tf.float32, [num_layers, 2, batch_size, state_size])
l = tf.unpack(state_placeholder, axis=0)
rnn_tuple_state = tuple(
    [tf.nn.rnn_cell.LSTMStateTuple(l[idx][0], l[idx][1])
     for idx in range(num_layers)]
)
Then you can use the built-in Tensorflow API to create the stacked LSTM layer.
cell = tf.nn.rnn_cell.LSTMCell(state_size, state_is_tuple=True)
cell = tf.nn.rnn_cell.MultiRNNCell([cell]*num_layers, state_is_tuple=True)
outputs, state = tf.nn.dynamic_rnn(cell, x_input, initial_state=rnn_tuple_state)
From here you continue with the outputs to calculate the logits and then a loss with respect to y_output.
Then you run each batch with sess.run, using truncated backpropagation (there is a good explanation here: http://r2rt.com/styles-of-truncated-backpropagation.html):
init_state = np.zeros((num_layers, 2, batch_size, state_size))
...current_state... = sess.run([...state...], feed_dict={x_input:batch_in, state_placeholder:current_state ...})
current_state = np.array(current_state)
You will have to convert the state to a numpy array before feeding it again.
Perhaps it is better to use a library like Tflearn or Keras instead?
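For reference, here is a minimal end-to-end sketch of my own (not from the answers above) that puts the pieces together with the dimensions from the question (n_input = 13, n_steps = 10, n_hidden = 512, n_classes = 13, two stacked LSTM layers), using the same TF 1.x API as the answers:
import tensorflow as tf

n_input, n_steps, n_hidden, n_classes, num_layers = 13, 10, 512, 13, 2

x = tf.placeholder(tf.float32, [None, n_steps, n_input])
y = tf.placeholder(tf.float32, [None, n_classes])

# Two stacked LSTM layers; build a separate cell object for each layer.
cells = [tf.nn.rnn_cell.LSTMCell(n_hidden) for _ in range(num_layers)]
multi_cell = tf.nn.rnn_cell.MultiRNNCell(cells)

# outputs has shape [batch, n_steps, n_hidden]; use the last time step for classification.
outputs, state = tf.nn.dynamic_rnn(multi_cell, x, dtype=tf.float32)
last_output = outputs[:, -1, :]

logits = tf.layers.dense(last_output, n_classes)
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=y, logits=logits))
train_op = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss)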