Tensorflow: reduce dimensions of a rank 3 tensor

I am trying to build the CLDNN that is researched in the paper linked here.
After the convolutional layers, the features go through a dimensionality-reduction layer. At the point where the features leave the conv layers, their shape is [?, N, M]. N represents the number of windows, and I think the network requires a reduction along M, so the shape of the features after the dim-reduction layer is [?, N, Q], where Q < M.
I have two questions.
How do I do this in TensorFlow? I tried using a weight matrix
W = tf.Variable(tf.truncated_normal([M, Q], stddev=0.1))
I thought tf.matmul(x, W) would yield [?, N, Q], but [?, N, M] and [M, Q] are not valid shapes for the multiplication. I would like to keep N constant and reduce the dimension M.
What kind of non-linearity should I apply to the result of tf.matmul(x, W)? I was thinking about using a ReLU, but I couldn't even get #1 done.

According to the linked paper (T. N. Sainath et al.: "Convolutional, Long Short-Term Memory, Fully Connected Deep Neural Networks"),
[...] reducing the dimensionality, such that we have 256 outputs from the linear layer, was appropriate.
That means that, whatever the input size is, i.e. [?, N, M] or any other shape (always assuming that the first dimension is the number of samples in a mini-batch, denoted by ?), the output will be [?, Q], where typically Q = 256.
As we do the dimensionality reduction by multiplying the input with a weight matrix, no spatial information is preserved. This means it doesn't matter whether each input is a matrix or a vector, so we can reshape the input x of the linear layer to have shape [?, N*M]. Then we can use a simple matrix multiplication tf.matmul(x, W), where W is a matrix with shape [N*M, Q].
W = tf.Variable(tf.truncated_normal([N*M, Q], stddev=0.1))
x_vec = tf.reshape(x, shape=(-1, N*M))
y = tf.matmul(x_vec, W)
Finally, regarding question 2: in the paper, the dimensionality-reduction layer is a linear layer, i.e. you do not apply a non-linearity to its output.
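As an aside, if you did want to keep N and reduce only M, as asked in question 1, one way to write it (a sketch of a per-window linear map, not what the paper does) is to fold the window dimension into the batch dimension, multiply, and unfold again:
# x has shape [?, N, M], W has shape [M, Q]
W = tf.Variable(tf.truncated_normal([M, Q], stddev=0.1))
x_2d = tf.reshape(x, [-1, M])       # [? * N, M]
y_2d = tf.matmul(x_2d, W)           # [? * N, Q]
y = tf.reshape(y_2d, [-1, N, Q])    # [?, N, Q]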

Related

Is tensorflow row- or column-vector centric?

I'm new to TensorFlow and Keras, and I'm a bit confused about how TF treats its input, matrix multiplication, and all that jazz.
You see, in linear algebra (LA) you can treat contravariant vectors as column matrices (the math standard)
or as row matrices.
Somewhere I've heard that:
a tensor of shape (n,), e.g. [1,2,3,4,5], is not considered a "vector" according to TF. Only tensors of shape (n,1) and (1,n) are considered vectors. But in many manuals people use those words without any system, creating complete confusion in my head.
a tensor of shape (n,1) is considered a column vector (according to TF)
but when you send this column vector (n,1) to some layer.call() as an input, you can see that it is treated as a row vector, because it is multiplied on the right by self.w, whereas in column-oriented LA it would have to be multiplied by self.w on the left.
def call(self, inputs):
    return tf.matmul(inputs, self.w) + self.b
So the questions are these:
What does it mean to be x-oriented, according to TF?
Is TensorFlow column-vector or row-vector oriented?
What's up with the non-vectors of shape (n,), and why are those not "vectors" according to TF?
What is expected as an input to a layer: column or row vectors?
About the left/right matrix multiplication x*W + b in the TF source code: why is x on the left and W on the right, and not vice versa? Why, if a layer expects a column vector as input, is it multiplied by W on the right?
I see that I'm confused and can't clearly state the question. Please be patient. Thanks.
It is because the elements of a rank-1 tensor are treated as scalars in elementwise operations.
For example:
Example 1.
x = tf.constant(5)
y = tf.constant([1, 2, 3])
x * y  # produces [5, 10, 15]
Example 2.
x = tf.constant([1, 2, 3])
y = tf.constant([[1], [2], [3]])
x * y  # produces [[1, 2, 3], [2, 4, 6], [3, 6, 9]]
TensorFlow also uses NumPy-style broadcasting whenever the shapes differ but are broadcast-compatible: https://numpy.org/doc/stable/user/basics.broadcasting.html
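To connect this back to the layer question: a Dense-style layer treats each sample as a row of a [batch, features] matrix, so it computes tf.matmul(x, W) + b with W of shape [features, units]. A minimal sketch with made-up numbers (not from the question):
import tensorflow as tf

x = tf.constant([[1., 2., 3.]])   # shape (1, 3): one sample with 3 features, i.e. a row
W = tf.constant([[1., 0.],
                 [0., 1.],
                 [1., 1.]])       # shape (3, 2): features -> units
b = tf.constant([0.5, -0.5])      # shape (2,), broadcast over the batch dimension

y = tf.matmul(x, W) + b           # shape (1, 2): rows in, rows out
# tf.matmul(tf.constant([1., 2., 3.]), W) would fail: matmul needs tensors of rank >= 2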

Are weights 1-D or 2-D in softmax Regression?

I've learned ML and have been learning DL from Andrew Ng's Coursera courses, and every time he talks about a linear classifier, the weights are just a 1-D vector.
Even during the assignments, when we roll an image into a 1-D vector (pixels * 3), the weights are still a 1-D vector.
I have now started O'Reilly's "Learning TensorFlow" book and came across the first example. The weight initialization in TensorFlow was a bit different.
The book says the following (page 14):
"Since we are not going to use the spatial information at this point, we will unroll our image pixels as a single long vector denoted x (Figure 2-2). Then
$xw^0 = \sum_i x_i w^0_i$
will be the evidence for the image containing the digit 0 (and in the same way we will have $w^d$ weight vectors for each one of the other digits, d = 1, . . . , 9)."
and the corresponding TensorFlow code:
data = input_data.read_data_sets(DATA_DIR, one_hot=True)
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
y_true = tf.placeholder(tf.float32, [None, 10])
y_pred = tf.matmul(x, W)
Why are the weights 2-D here? Are weights 2-D in a softmax linear classifier?
In the Coursera course, when he taught the softmax linear classifier, he still said the weights are 1-D. Can anyone explain this?
Yes, you are right that the weights are 1-D, but that is just for one neuron.
If you consider a straightforward layered neural network, it will have some number of layers (just one layer with 10 neurons in your code). So, in TensorFlow, the weights variable contains the weights for the entire layer, not for a single neuron, which makes it a 2-D array.
W = tf.Variable(tf.zeros([784, 10]))
This line means that there are 10 neurons, each with a weight vector of length 784.
One rule of thumb to understand this in TensorFlow is that the weight dimensions are written as
W = tf.Variable(tf.zeros([output_of_previous_layer, output_of_current_layer]))
or
W = tf.Variable(tf.zeros([input_of_current_layer, input_of_next_layer]))
You can read more about this at Intro to Neural Networks
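To make the "one 1-D weight vector per neuron" picture concrete, here is a small sketch with random, hypothetical values: each column of the 2-D W is exactly the 1-D weight vector of one output neuron.
import numpy as np

x = np.random.rand(1, 784)     # one flattened 28x28 image
W = np.random.rand(784, 10)    # 10 neurons, each with 784 weights (one per column)

logits_all = x @ W             # shape (1, 10): all 10 neurons at once
logit_0 = x @ W[:, 0]          # the "1-D weights" view for digit 0 alone

assert np.allclose(logits_all[0, 0], logit_0)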

How to constrain a layer to be a probability matrix?

I recently read this paper, which deals with noisy labels in convolutional neural networks.
They model the label noise with a probability transition matrix, which forms a simple constrained linear layer after the softmax output.
So as an example we may have a 3-by-3 probability transition matrix (3 classes):
[Image: example probability transition matrix. The sum of each column has to be 1.]
This matrix Q is basically trained in the same way as the rest of the network, via backpropagation, but it needs to be constrained to be a probability matrix. Quote from the paper:
After taking a gradient step with the Q and the model weights, we project Q back to the subspace of probability matrices because it represents conditional probabilities.
Now I am wondering what the best way is to implement such a layer in TensorFlow.
I have some ideas, but I'm not sure what would work or what the best procedure is.
1) Hard-code the constraint into the model before any training is done, something like:
# ... build conv model without Q
[...]
# shape of y_conv (the CNN output) is assumed to be a [3, 1] vector
y_conv = tf.nn.softmax(y_conv, 0)
# add linear layer representing Q, no bias
W_Q = weight_variable([3, 3])
# add constraint: columns are valid probability distributions
W_Q = tf.nn.softmax(W_Q, 0)
# output of model:
Q_out = tf.matmul(W_Q, y_conv)
# now compute loss, gradients and start training
2) Compute and apply gradients to the whole model (Q included), then apply the constraint:
train_op = ...
constraint_op = tf.assign(W_Q, tf.nn.softmax(W_Q, 0))

sess = tf.Session()
# compute and apply gradients in the form of a train_op
sess.run(train_op)
# then project W_Q back onto the constraint set
sess.run(constraint_op)
I think the second approach is closer to the paper quote, but I am not sure to what extent external assignments interfere with training.
Or maybe my ideas are bananas. I hope you can give me some advice!
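For what it's worth, here is a minimal TF1-style sketch (variable names are hypothetical) of how the projection step in idea 2 could look if the columns of W_Q are kept on the probability simplex by clipping and renormalizing after every gradient step. Clip-and-renormalize is just one simple choice, not necessarily the exact projection the paper uses:
import tensorflow as tf

W_Q = tf.Variable(tf.eye(3))                       # start close to "no label noise"

# projection: non-negative entries, columns summing to 1
clipped = tf.maximum(W_Q, 0.0)
col_sums = tf.reduce_sum(clipped, axis=0, keepdims=True)
project_op = tf.assign(W_Q, clipped / (col_sums + 1e-8))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(num_train_steps):            # train_op as above; num_train_steps is hypothetical
        sess.run(train_op)                         # gradient step on all weights, Q included
        sess.run(project_op)                       # then project Q back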

RNN and LSTM implementation in tensorflow

I have been trying to learn how to code up an RNN and an LSTM in TensorFlow. I found an example online in this blog post:
http://r2rt.com/recurrent-neural-networks-in-tensorflow-ii.html
Below are the snippets I am having trouble understanding, for an LSTM network to be used eventually for char-rnn generation:
x = tf.placeholder(tf.int32, [batch_size, num_steps], name='input_placeholder')
y = tf.placeholder(tf.int32, [batch_size, num_steps], name='labels_placeholder')

embeddings = tf.get_variable('embedding_matrix', [num_classes, state_size])

rnn_inputs = [tf.squeeze(i) for i in
              tf.split(1, num_steps, tf.nn.embedding_lookup(embeddings, x))]
A different section of the code now, where the weights are defined:
with tf.variable_scope('softmax'):
    W = tf.get_variable('W', [state_size, num_classes])
    b = tf.get_variable('b', [num_classes], initializer=tf.constant_initializer(0.0))

logits = [tf.matmul(rnn_output, W) + b for rnn_output in rnn_outputs]
y_as_list = [tf.squeeze(i, squeeze_dims=[1]) for i in tf.split(1, num_steps, y)]
x is the data to be fed, and y is the set of labels. In the LSTM equations we have a series of gates: x(t) gets multiplied by one set of weights, prev_hidden_state gets multiplied by another set of weights, biases are added, and non-linearities are applied.
Here are the doubts I have:
In this case only one weight matrix is defined. Does that mean it works for both x(t) and prev_hidden_state as well?
For the embeddings matrix, I know it has to be multiplied by the weight matrix, but why is the first dimension num_classes?
For the rnn_inputs we are using squeeze, which removes dimensions of 1, but why would I want to do that with a one-hot encoding?
Also, from the splits I understand that we are unrolling the x of dimension (batch_size x num_steps) into discrete (batch_size x 1) vectors and then passing these values through the network. Is this right?
Let me try to help.
In this case only one weight matrix is defined. Does that mean it works for both x(t) and prev_hidden_state as well?
There are more weights than the one you define: tf.nn.rnn_cell.LSTMCell has internal weights for the RNN cell, which TensorFlow creates implicitly when you call the cell.
The weight matrix you explicitly defined is the transform from the hidden state to the vocabulary space.
You can view the implicit weights as accounting for the recurrent part: they take the previous hidden state and the current input and output the new hidden state. The weight matrix you defined transforms the hidden states (state_size = 200) to the larger vocabulary space (vocab_size = 2000).
For further information, maybe you can look at this tutorial: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
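For reference, the standard LSTM cell described in that tutorial uses one weight matrix for the current input and another for the previous hidden state in each gate; these are the implicit weights mentioned above:
$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$
$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$
$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$
$\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)$
$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$
$h_t = o_t \odot \tanh(c_t)$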
For the embeddings matrix, I know it has to be multiplied by the weight matrix, but why is the first dimension num_classes?
The num_classes here is the vocab_size: the embedding matrix transforms the vocabulary into the required embedding size (which in this example is equal to state_size).
For the rnn_inputs we are using squeeze, which removes dimensions of 1, but why would I want to do that with a one-hot encoding?
You need to get rid of the extra dimension because tf.nn.rnn takes inputs of shape (batch_size, input_size) instead of (batch_size, 1, input_size).
Also, from the splits I understand that we are unrolling the x of dimension (batch_size x num_steps) into discrete (batch_size x 1) vectors and then passing these values through the network. Is this right?
To be more precise: after the embedding, the (batch_size, num_steps, state_size) tensor turns into a list of num_steps elements, each of size (batch_size, 1, state_size).
The flow goes like this:
The embedding matrix embeds each word as a state_size-dimensional vector (a row of the matrix), so its size is (vocab_size, state_size).
tf.nn.embedding_lookup retrieves the rows at the indices specified by the x placeholder, giving the RNN input, which has size (batch_size, num_steps, state_size).
tf.split splits the input into num_steps pieces of size (batch_size, 1, state_size).
tf.squeeze squeezes them to (batch_size, state_size), forming the desired input format for tf.nn.rnn.
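As a sanity check, here is a small shape walk-through of that flow with hypothetical sizes; note that it uses the newer tf.split argument order (value, num_splits, axis), not the older one in the blog snippet:
import tensorflow as tf

batch_size, num_steps, vocab_size, state_size = 32, 5, 2000, 200

x = tf.placeholder(tf.int32, [batch_size, num_steps])
embeddings = tf.get_variable('embedding_matrix', [vocab_size, state_size])

embedded = tf.nn.embedding_lookup(embeddings, x)         # (32, 5, 200)
pieces = tf.split(embedded, num_steps, axis=1)           # list of 5 tensors, each (32, 1, 200)
rnn_inputs = [tf.squeeze(p, axis=[1]) for p in pieces]   # list of 5 tensors, each (32, 200)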
If there's any problem with the TensorFlow methods, maybe you can look them up in the TensorFlow API documentation for a more detailed introduction.

Per pixel softmax for fully convolutional network

I'm trying to implement something like a fully convolutional network, where the last convolution layer uses a 1x1 filter and outputs a 'score' tensor. The score tensor has shape [Batch, height, width, num_classes].
My question is: which function in TensorFlow can apply the softmax operation for each pixel, independent of the other pixels? The tf.nn.softmax op seems not to be meant for such a purpose.
If there is no such op available, I guess I have to write one myself.
Thanks!
UPDATE: if I do have to implement it myself, I think I need to reshape the input tensor to [N, num_classes], where N = Batch x width x height, apply tf.nn.softmax, and then reshape it back. Does that make sense?
Reshaping it to 2d and then reshaping it back, like you guessed, is the right approach.
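A minimal sketch of that reshape -> softmax -> reshape approach, assuming scores is the [batch, height, width, num_classes] tensor from the question:
import tensorflow as tf

def per_pixel_softmax(scores, num_classes):
    shape = tf.shape(scores)                       # dynamic [B, H, W, C]
    flat = tf.reshape(scores, [-1, num_classes])   # [B*H*W, C]
    flat_probs = tf.nn.softmax(flat)               # softmax over the class axis
    return tf.reshape(flat_probs, shape)           # back to [B, H, W, C]

(In newer TensorFlow versions, tf.nn.softmax(scores, axis=-1) applies the softmax over the last dimension directly and gives the same per-pixel result.)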
You can use this function; I found it by searching on GitHub.
import tensorflow as tf

def softmax(target, axis, name=None):
    """
    Multi-dimensional softmax,
    refer to https://github.com/tensorflow/tensorflow/issues/210
    compute softmax along the dimension of target
    the native softmax only supports batch_size x dimension
    """
    with tf.name_scope(name, 'softmax', values=[target]):
        max_axis = tf.reduce_max(target, axis, keep_dims=True)
        target_exp = tf.exp(target - max_axis)
        normalize = tf.reduce_sum(target_exp, axis, keep_dims=True)
        softmax = target_exp / normalize
        return softmax