I'm trying to extract multiple patches around a set of 2D landmarks in the same image using TensorFlow.
Given an input tensor of 2D landmarks of shape [batch_size, num_landmarks, 2] and an input image tensor of shape [batch_size, num_rows, num_cols, num_channels], I would like to return a tensor of shape [batch_size, num_landmarks, patch_rows, patch_cols, num_channels].
For now we can assume that batch_size=1 and if so, the following code will do the above:
im = tf.tile(im, (num_landmarks, 1, 1, 1))  # repeat the image once per landmark
patches = tf.image.extract_glimpse(im, (patch_rows, patch_cols), landmarks,
                                   centered=False, normalized=False)  # glimpse size is (height, width)
I basically repeat the input image as many times as I have landmarks and then extract the glimpses. This is of course wasteful when I have a lot of landmarks, so I was wondering whether there is a better way.
EDIT:
I think tf.gather_nd could do the trick, so I'm working on building the indices I need to extract the patches.
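A minimal sketch of that idea (assuming batch_size=1, integer (row, col) landmark coordinates, and patches that lie fully inside the image) could look like this:

import tensorflow as tf

def extract_patches(image, landmarks, patch_rows, patch_cols):
    # image: [1, num_rows, num_cols, num_channels]; landmarks: [1, num_landmarks, 2]
    img = image[0]
    lm = tf.cast(landmarks[0], tf.int32)                      # [num_landmarks, 2]

    # Offset grid covering one patch, centred on the landmark.
    dr = tf.range(patch_rows) - patch_rows // 2
    dc = tf.range(patch_cols) - patch_cols // 2
    grid_r, grid_c = tf.meshgrid(dr, dc, indexing='ij')
    offsets = tf.stack([grid_r, grid_c], axis=-1)             # [patch_rows, patch_cols, 2]

    # Broadcast landmarks against the offsets to get absolute pixel indices.
    indices = lm[:, None, None, :] + offsets[None, :, :, :]   # [num_landmarks, patch_rows, patch_cols, 2]

    patches = tf.gather_nd(img, indices)                      # [num_landmarks, patch_rows, patch_cols, num_channels]
    return patches[None]                                      # add back the batch dimension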
I am trying to apply a Gabor filter to CIFAR-10 data. As far as I understand, that should be done in two steps:
1 - Using n Gabor kernels to generate n Gabor-filtered images.
2 - Stacking the n Gabor-filtered images to get an image of the original shape back, then feeding it to my neural network.
I tried using
tfio.experimental.filter.gabor(
input, freq, sigma=None, theta=0, nstds=3, offset=0, mode=None,
constant_values=None, name=None
)
and here I have two questions:
1 - Which argument of the TensorFlow function represents the kernel size?
2 - After using this TensorFlow function, will I get back a stacked image, or just filtered images that I should stack later using another function?
If we assume the following for the parameters:
lambda is the wavelength of the sinusoidal factor,
theta is the orientation of the normal to the parallel stripes of a Gabor function,
psi is the phase offset,
sigma is the sigma/standard deviation of the Gaussian envelope and
gamma is the spatial aspect ratio
Then, this is equivalent to calling the TensorFlow/IO function with the following parameters:
tfio.experimental.filter.gabor(input,
freq=(1/lambda),
sigma=(sigma, sigma/gamma),
theta=theta,
nstds=3,
offset=psi,
mode=None,
constant_values=None,
name=None)
The kernel size is defined by the nstds argument.
The output of the function is the resulting images already stacked. To convince ourselves, we can look at the source code:
real = tf.nn.depthwise_conv2d(
input, tf.cast(tf.math.real(g), input.dtype), [1, 1, 1, 1], padding="VALID"
)
imag = tf.nn.depthwise_conv2d(
input, tf.cast(tf.math.imag(g), input.dtype), [1, 1, 1, 1], padding="VALID"
)
The input is convolved with the Gabor kernel using a depthwise convolution, which concatenates the outputs together, as specified in the documentation of the function (emphasis is mine):
Given a 4D input tensor ('NHWC' or 'NCHW' data formats) and a filter tensor of shape [filter_height, filter_width, in_channels, channel_multiplier] containing in_channels convolutional filters of depth 1, depthwise_conv2d applies a different filter to each input channel (expanding from 1 channel to channel_multiplier channels for each), then concatenates the results together. The output has in_channels * channel_multiplier channels.
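As a quick sanity check of that stacking behaviour, here is a minimal sketch (using an assumed CIFAR-10-sized batch and an arbitrary 7x7 kernel) showing that depthwise_conv2d returns the per-channel results already concatenated along the channel axis:

import tensorflow as tf

images = tf.random.uniform([8, 32, 32, 3])   # assumed NHWC batch of CIFAR-10-sized images
kernel = tf.random.uniform([7, 7, 3, 1])     # one depth-1 filter per input channel
out = tf.nn.depthwise_conv2d(images, kernel, strides=[1, 1, 1, 1], padding="VALID")
print(out.shape)  # (8, 26, 26, 3): in_channels * channel_multiplier = 3 * 1 channels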
I was trying to build an LSTM neural net with Keras to predict tags for words in a set of sentences.
The implementation is all pretty straightforward, but the surprising thing was that,
given exactly the same and otherwise correctly implemented code and
using TensorFlow 1.4.0 with Keras running on the TensorFlow backend,
on some people's computers it returned tensors with wrong dimensions, while for others it worked perfectly.
The problem occurred in the following context:
First, we turned the list of training sentences (sentences as a list of word indices) into a 2-D matrix using the pad_sequences method from Keras (https://keras.io/preprocessing/sequence/):
def do_padding(sequences, length, padding_value):
return pad_sequences(sequences, maxlen=length, padding='post',
truncating='post', value=padding_value)
train_sents_padded = do_padding(train_sents, MAX_LENGTH,
word_to_id[PAD_TOKEN])
Next, we used our do_padding method on the corresponding training labels to turn them into a padded matrix. At the same time, we used the Keras to_categorical method (https://keras.io/utils/#to_categorical) to add a one-hot encoded vector to the created label matrix (one one-hot vector for each cell in the matrix, that is, for each word in each training sentence):
train_labels_padded = to_categorical(do_padding(train_labels, MAX_LENGTH,
label_to_id["O"]), NUM_LABELS)
We expected the resulting shape to be 3-D: (len(train_labels), MAX_LENGTH, NUM_LABELS). Yet, we found that the resulting shape was 2-D and basically looked like this: ((len(train_labels) x MAX_LENGTH), NUM_LABELS), meaning the two expected dimensions len(train_labels) and MAX_LENGTH were multiplied together and flattened into a single dimension.
Interestingly, as said before, this problem only occurred for about 50% of the people, all of whom were using TensorFlow 1.4.0 and Keras running on the TensorFlow backend.
We managed to solve the problem by reshaping the label matrix manually:
train_labels_padded = np.reshape(train_labels_padded, (len(train_labels),
MAX_LENGTH, NUM_LABELS))
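A small defensive variant of that fix (a sketch, assuming the variables from the snippets above) is to check the shape first and only reshape when the result actually comes back flattened:

import numpy as np

expected_shape = (len(train_labels), MAX_LENGTH, NUM_LABELS)
if train_labels_padded.shape != expected_shape:
    train_labels_padded = np.reshape(train_labels_padded, expected_shape)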
I was just wondering if any of you have experienced a similar problem and have figured out the reason why this happens.
I have two tensors of the following shapes:
tensor1 => shape(?, ?, 100) # corresponds to [batch_size, max_time, embedding_size]
tensor2 => shape(?, 100) # corresponds to [batch_size, embedding_size]
What I wish to do is, for every [100]-dimensional vector in tensor2, compute a matrix multiplication with the corresponding [max_time, 100]-dimensional matrix in tensor1 to get batch_size [max_time]-dimensional vectors; which is the same as a [batch_size, max_time]-dimensional matrix.
For those who know: I am basically trying to implement content-based attention over the encoded inputs given by the encoder of a seq2seq model. All the [max_time]-dimensional vectors are just the attention values that I later softmax.
I am aware that TensorFlow provides the AttentionWrapper as well as various helpers in the contrib package. However, I wish to do this because I am experimenting with the attention mechanism to obtain a hybrid attention mask.
I have tried tf.while_loop but got stuck on the ? shape when unrolling the loop. A vectorized implementation also doesn't seem very straightforward to me. Please help.
What you can do is use tf.matmul and treat your vectors as 100 x 1 matrices.
tensor2 = tf.expand_dims(tensor2, 2)  # [batch_size, 100, 1]
result = tf.matmul(tensor1, tensor2)  # [batch_size, max_time, 1]
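A short usage sketch with made-up concrete shapes (batch_size=4, max_time=7, embedding_size=100), including the squeeze and the softmax the question mentions:

import tensorflow as tf

tensor1 = tf.random.uniform([4, 7, 100])   # [batch_size, max_time, embedding_size]
tensor2 = tf.random.uniform([4, 100])      # [batch_size, embedding_size]

scores = tf.matmul(tensor1, tf.expand_dims(tensor2, 2))  # [4, 7, 1]
scores = tf.squeeze(scores, axis=2)                      # [4, 7] == [batch_size, max_time]
attention = tf.nn.softmax(scores)                        # attention weights over max_time (last axis)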
I am following this tutorial in order to understand CNNs in NLP. There are a few things which I don't understand despite having the code in front of me. I hope somebody can clear a few things up here.
The first rather minor thing is the sequence_length parameter of the TextCNN object. In the example on GitHub this is just 56, which I think is the maximum length of all sentences in the training data. This means that self.input_x is a 56-dimensional vector which will contain just the dictionary indices of the words of a sentence.
This list goes into tf.nn.embedding_lookup(W, self.input_x), which will return a matrix consisting of the word embeddings of those words given by self.input_x. According to this answer this operation is similar to using indexing with numpy:
matrix = np.random.random([1024, 64])
ids = np.array([0, 5, 17, 33])
print(matrix[ids])
But the problem here is that self.input_x most of the time looks like [1 3 44 25 64 0 0 0 0 0 0 0 .. 0 0]. So am I correct if I assume that tf.nn.embedding_lookup ignores the value 0?
Another thing I don't get is how tf.nn.embedding_lookup is working here:
# Embedding layer
with tf.device('/cpu:0'), tf.name_scope("embedding"):
W = tf.Variable(
tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0),
name="W")
self.embedded_chars = tf.nn.embedding_lookup(W, self.input_x)
self.embedded_chars_expanded = tf.expand_dims(self.embedded_chars, -1)
I assume that self.embedded_chars is the matrix which is the actual input to the CNN, where each row represents the word embedding of one word. But how can tf.nn.embedding_lookup know about those indices given by self.input_x?
The last thing which I don't understand here is
W is our embedding matrix that we learn during training. We initialize it using a random uniform distribution. tf.nn.embedding_lookup creates the actual embedding operation. The result of the embedding operation is a 3-dimensional tensor of shape [None, sequence_length, embedding_size].
Does this mean that we are actually learning the word embeddings here? The tutorial states at the beginning:
We will not use pre-trained word2vec vectors for our word embeddings. Instead, we learn embeddings from scratch.
But I don't see a line of code where this is actually happening. The code of the embedding layer does not look as if anything is being trained or learned - so where is it happening?
Answer to ques 1 (So am I correct if I assume that tf.nn.embedding_lookup ignores the value 0?) :
The 0's in the input vector are indices of the 0th symbol in the vocabulary, which is the PAD symbol. I don't think they get ignored when the lookup is performed: the 0th row of the embedding matrix will be returned.
Answer to ques 2 (But how can tf.nn.embedding_lookup know about those indices given by self.input_x?) :
The size of the embedding matrix is [V x E], where V is the size of the vocabulary and E is the dimension of the embedding vectors. The 0th row of the matrix is the embedding vector for the 0th element of the vocabulary, the 1st row is the embedding vector for the 1st element, and so on.
From the input vector x, we get the indices of the words in the vocabulary, which are used to index the embedding matrix.
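A tiny sketch of that row lookup (with made-up numbers):

import tensorflow as tf

W = tf.constant([[0.0, 0.0],    # row 0: embedding of the PAD symbol
                 [1.0, 1.0],    # row 1
                 [2.0, 2.0]])   # row 2
ids = tf.constant([[1, 0, 0],
                   [2, 2, 1]])                # two padded "sentences"
emb = tf.nn.embedding_lookup(W, ids)          # shape (2, 3, 2): rows of W selected by index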
Answer to ques 3 (Does this mean that we are actually learning the word embeddings here?):
Yes, we are actually learning the embedding matrix.
In the embedding layer, in the line W = tf.Variable(tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0), name="W"), W is the embedding matrix, and by default TensorFlow variables have trainable=True. So W will also be a learned parameter. To use pre-trained embeddings instead, set trainable=False.
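For example (a sketch, where pretrained_embeddings is an assumed NumPy array of shape [vocab_size, embedding_size] loaded elsewhere):

W = tf.Variable(pretrained_embeddings, dtype=tf.float32,
                trainable=False, name="W")  # frozen: not updated during training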
For detailed explanation of the code you can follow blog: https://agarnitin86.github.io/blog/2016/12/23/text-classification-cnn
I have an image data tensor of shape B*H*W*C and a position tensor of shape B*H*W*2. The values in the position tensor are pixel coordinates, and I want to sample pixels from the image data tensor according to these coordinates. I have tried one way to do that, reshaping the tensors to one-dimensional tensors, but I find it really inconvenient. I wonder whether I could implement it with some more convenient approach, like a matrix mapping (e.g. remap in OpenCV).
I would first ask whether you are sure the position matrix isn't redundant. If the position matrix entries simply correspond to the pixel locations in the image array, then for a given application, however you access the position matrix could be applied directly to the image data instead.
Perhaps as a starting point, running
sess = tf.Session()
np_img, np_pos = sess.run([tf_img, tf_pos], feed_dict={...})
will convert tensors to numpy arrays, which may make your operations easier.
Otherwise, a 1-D tensor isn't that bad, and there are TF functions for reshaping easily.
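For completeness, here is a minimal vectorized sketch of the sampling itself with tf.gather_nd, assuming the position tensor holds integer (row, col) coordinates; a batch index is prepended so each coordinate pair indexes into its own image:

import tensorflow as tf

B, H, W, C = 2, 4, 5, 3                                          # made-up sizes
img = tf.random.uniform([B, H, W, C])                            # image data, B*H*W*C
pos = tf.random.uniform([B, H, W, 2], maxval=4, dtype=tf.int32)  # integer (row, col) coordinates

batch_idx = tf.tile(tf.reshape(tf.range(B), [B, 1, 1, 1]), [1, H, W, 1])
indices = tf.concat([batch_idx, pos], axis=-1)                   # [B, H, W, 3]
sampled = tf.gather_nd(img, indices)                             # [B, H, W, C]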