I have to swap a tensor's axes with tf.transpose in order to do a batch matrix multiplication (see the code below).
tensor input_a: shape [10000, 10000]
tensor input_b: shape [batch_size, 10000, 10]
tensor output: shape [batch_size, 10000, 10]
# transpose_input_b: shape [10000, batch_size, 10]
transpose_input_b = tf.transpose(input_b, [1, 0, 2])
# reshape_input_b: shape [10000, batch_size * 10]
reshape_input_b = tf.reshape(transpose_input_b, [10000, -1])
# ret: shape [10000, batch_size * 10]
ret = tf.matmul(input_a, reshape_input_b, a_is_sparse = True)
# reshape_ret: [10000, batch_size, 10]
reshape_ret = tf.reshape(ret, [10000, -1, 10])
# output : [batch_size, 10000, 10]
output = tf.transpose(reshape_ret, [1, 0, 2])
However, this seems very slow. I noticed the following in the documentation for tf.transpose:
In numpy transposes are memory-efficient constant time operations as they simply return a new view of the same data with adjusted strides.
TensorFlow does not support strides, so transpose returns a new tensor with the items permuted.
So I suspect this is why my code runs slowly. Is there a more efficient way to swap a tensor's axes, or to do the batch matrix multiplication without swapping them?
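For reference, the product computed above is the contraction output[b, i, k] = sum_j input_a[i, j] * input_b[b, j, k], which can be written without any explicit transpose or reshape. The NumPy sketch below only checks that equivalence on small stand-in shapes (the sizes 6, 4 and 3 are placeholders, not the real 10000 and 10). tf.einsum accepts the same subscript string, but whether it is actually faster than the transpose version would have to be measured, and it does not carry the a_is_sparse hint.

```python
import numpy as np

rng = np.random.default_rng(0)
n, batch_size, d = 6, 4, 3  # small stand-ins for 10000, batch_size, 10
input_a = rng.standard_normal((n, n))
input_b = rng.standard_normal((batch_size, n, d))

# The transpose/reshape round trip from the question.
reshape_input_b = np.transpose(input_b, (1, 0, 2)).reshape(n, -1)
ret = input_a @ reshape_input_b
round_trip = np.transpose(ret.reshape(n, -1, d), (1, 0, 2))

# The same contraction written directly: sum over the shared axis j.
direct = np.einsum("ij,bjk->bik", input_a, input_b)

# Broadcasting matmul: input_a is applied to every batch element.
broadcast = np.matmul(input_a, input_b)

print(round_trip.shape, direct.shape, broadcast.shape)
```

All three results have shape [batch_size, n, d] and agree up to floating-point rounding.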
The problem is, I have an indices tensor with shape [batch_size, seq_len, k] and every element in this tensor is in range [0, hidden_dim). I want to create a mask tensor with shape [batch_size, seq_len, hidden_dim] where every element indexed by the indices tensor is 1 and other elements are 0. k is smaller than hidden_dim. For example:
indices = [[[0],[1],[2]]] #batch_size=1, seq_len=3, k=1
mask = tf.zeros(shape=(1,3,3)) #batch_size=1, seq_len=3, hidden_dim = 3
How can I get a target mask tensor whose elements indicated by the indices are 1, i.e.:
target_mask = [[[1, 0, 0], [0, 1, 0], [0, 0, 1]]]
This can be accomplished using tf.one_hot, e.g.:
mask = tf.one_hot(indices, depth=hidden_dim, axis=-1) # [batch, seq_len, k, hidden_dim]
I wasn't clear on what you'd like to happen to k. tf.one_hot() will keep the axis as is, i.e. you'll get a delta distribution for each [batch-index, seq-index, k-index] tuple.
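If the k axis is meant to be collapsed so that the result has shape [batch_size, seq_len, hidden_dim], one option is to take the maximum over that axis after the one-hot step (tf.reduce_max over axis 2 in TensorFlow). The sketch below shows the idea in NumPy, using an identity-matrix lookup as a stand-in for tf.one_hot; the second example with k=2 is made up for illustration.

```python
import numpy as np

hidden_dim = 3

def indices_to_mask(indices, hidden_dim):
    # One-hot encode every index: [batch, seq_len, k, hidden_dim].
    one_hot = np.eye(hidden_dim, dtype=np.int64)[indices]
    # Collapse the k axis: a position is 1 if any of the k indices hit it.
    return one_hot.max(axis=2)

# Example from the question: batch_size=1, seq_len=3, k=1.
indices = np.array([[[0], [1], [2]]])
target_mask = indices_to_mask(indices, hidden_dim)

# Hypothetical example with k=2 (repeated indices are harmless).
indices_k2 = np.array([[[0, 2], [1, 1]]])
mask_k2 = indices_to_mask(indices_k2, hidden_dim)

print(target_mask)
print(mask_k2)
```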
I have a tensor X of shape (N, ...) and a boolean mask, mask, of shape (N,). I want to shuffle the rows of X selected by mask along the first axis, leaving the other rows where they are.
How can this be done non-eagerly and, if possible, in place?
Note: I do not need gradients.
You can do that like this:
import tensorflow as tf

def shuffle_mask(x, mask, seed=None):
    n = tf.size(mask)
    # Get masked indices
    idx_masked = tf.cast(tf.where(mask), n.dtype)
    # Shuffle masked indices
    idx_masked_shuffled = tf.random.shuffle(tf.squeeze(idx_masked, 1), seed=seed)
    # Scatter shuffled indices into place
    idx_masked_shuffled_scat = tf.scatter_nd(idx_masked, idx_masked_shuffled, [n])
    # Combine shuffled and non-shuffled indices
    idx_shuffled = tf.where(mask, idx_masked_shuffled_scat, tf.range(n))
    # Gather using resulting indices
    return tf.gather(x, idx_shuffled)

# Test
with tf.Graph().as_default(), tf.Session() as sess:
    tf.random.set_random_seed(0)
    x = tf.constant([[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]])
    mask = tf.constant([True, False, True, True, False])
    y = shuffle_mask(x, mask)
    print(sess.run(y))
    # [[6 7]
    #  [2 3]
    #  [0 1]
    #  [4 5]
    #  [8 9]]
You cannot do the operation "in place": tensors in TensorFlow are immutable, so an op always produces a new tensor rather than modifying its input. (Only variables can be updated in place, e.g. with assign.)
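The index bookkeeping in shuffle_mask may be easier to follow in NumPy, where the permuted row numbers can be written straight into an index array. This is only an illustrative sketch of the same idea (the name shuffle_mask_np is made up here), not a replacement for the graph-mode version above.

```python
import numpy as np

def shuffle_mask_np(x, mask, seed=None):
    rng = np.random.default_rng(seed)
    # Start from the identity permutation 0..N-1.
    idx = np.arange(mask.size)
    # Row numbers selected by the mask.
    idx_masked = np.flatnonzero(mask)
    # Overwrite only the masked slots with a shuffled copy of themselves.
    idx[idx_masked] = rng.permutation(idx_masked)
    # Gather rows with the combined index array.
    return x[idx]

x = np.array([[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]])
mask = np.array([True, False, True, True, False])
y = shuffle_mask_np(x, mask, seed=0)
print(y)
```

Rows where mask is False stay in place; rows where it is True are permuted among themselves.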
From the accepted answer to this question: given the following input and kernel matrices, the output of tf.nn.conv2d is
[[14  6]
 [ 6 12]]
which makes sense. However, when I give the input and kernel matrices 3 channels each (by repeating each original matrix) and run the same code:
import numpy as np
import tensorflow as tf

# the previous input
i_grey = np.array([
    [4, 3, 1, 0],
    [2, 1, 0, 1],
    [1, 2, 4, 1],
    [3, 1, 0, 2]
])
# copy 3 times along a new leading axis
i_rgb = np.repeat(np.expand_dims(i_grey, axis=0), 3, axis=0)
# convert to tensor
i_rgb = tf.constant(i_rgb, dtype=tf.float32)
# make kernel depth match input; same process as input
k = np.array([
    [1, 0, 1],
    [2, 1, 0],
    [0, 0, 1]
])
k_rgb = np.repeat(np.expand_dims(k, axis=0), 3, axis=0)
# convert to tensor
k_rgb = tf.constant(k_rgb, dtype=tf.float32)
At this point both tensors have the channel axis first: i_rgb has shape [3, 4, 4] and k_rgb has shape [3, 3, 3].
# reshape input to format: [batch, in_height, in_width, in_channels]
image_rgb = tf.reshape(i_rgb, [1, 4, 4, 3])
# reshape kernel to format: [filter_height, filter_width, in_channels, out_channels]
kernel_rgb = tf.reshape(k_rgb, [3, 3, 3, 1])
conv_rgb = tf.squeeze( tf.nn.conv2d(image_rgb, kernel_rgb, [1,1,1,1], "VALID") )
with tf.Session() as sess:
    conv_result = sess.run(conv_rgb)
    print(conv_result)
I get the final output:
[[35. 15.]
[35. 26.]]
But I was expecting the original output*3:
[[42. 18.]
[18. 36.]]
because from my understanding, each channel of the kernel is convolved with each channel of the input, and the resultant matrices are summed to get the final output.
Am I missing something from this process or the tensorflow implementation?
Reshape is a tricky function. It will give you the shape you ask for, but it only reinterprets the flat order of the elements, so it can easily jumble values from different axes together. In cases like yours, it is best to avoid reshape altogether.
Here it is better to duplicate the arrays along a new axis instead. In the [batch, in_height, in_width, in_channels] format, channels is the last dimension, so that is the axis to pass to repeat(). The following code reflects that logic more directly:
i_grey = np.expand_dims(i_grey, axis=0) # add batch dim
i_grey = np.expand_dims(i_grey, axis=3) # add channel dim
i_rgb = np.repeat(i_grey, 3, axis=3 ) # duplicate along channels dim
And likewise with filters:
k = np.expand_dims(k, axis=2) # input channels dim
k = np.expand_dims(k, axis=3) # output channels dim
k_rgb = np.repeat(k, 3, axis=2) # duplicate along the input channels dim
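To check that this layout gives the expected "original output * 3", the sketch below repeats both arrays along the channel axes as described and runs a plain NumPy stand-in for a stride-1 VALID convolution (conv2d_valid is a hypothetical helper written for this check, not a TensorFlow function). Like tf.nn.conv2d, it does not flip the kernel.

```python
import numpy as np

i_grey = np.array([[4, 3, 1, 0],
                   [2, 1, 0, 1],
                   [1, 2, 4, 1],
                   [3, 1, 0, 2]], dtype=np.float32)
k = np.array([[1, 0, 1],
              [2, 1, 0],
              [0, 0, 1]], dtype=np.float32)

# Channels-last input [1, 4, 4, 3] and kernel [3, 3, 3, 1].
image_rgb = np.repeat(i_grey[np.newaxis, :, :, np.newaxis], 3, axis=3)
kernel_rgb = np.repeat(k[:, :, np.newaxis, np.newaxis], 3, axis=2)

def conv2d_valid(image, kernel):
    # Stride-1, no-padding convolution without kernel flipping.
    batch, in_h, in_w, _ = image.shape
    k_h, k_w, _, out_c = kernel.shape
    out = np.zeros((batch, in_h - k_h + 1, in_w - k_w + 1, out_c),
                   dtype=image.dtype)
    for r in range(out.shape[1]):
        for c in range(out.shape[2]):
            patch = image[:, r:r + k_h, c:c + k_w, :]
            # Sum over height, width and input channels.
            out[:, r, c, :] = np.tensordot(patch, kernel,
                                           axes=([1, 2, 3], [0, 1, 2]))
    return out

conv_result = conv2d_valid(image_rgb, kernel_rgb)[0, :, :, 0]
print(conv_result)
```

With the channels on the correct axes, every channel contributes the single-channel result once, so the sum is three times the grey-scale output.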
I have a small model used in a reinforcement learning context.
I can input a 2d tensor of states, and I get back a 2d tensor of action weights.
Let's say I input two states and get the following action weights out:
[[0.1, 0.2],
[0.3, 0.4]]
Now I have another 2d tensor that holds, for each state, the number of the action whose weight I want:
[[1],
[0]]
How can I use this tensor to get the weight of actions?
In this example I'd like to get:
[[0.2],
[0.3]]
This is similar to "Tensorflow tf.gather with axis parameter", but the indices are handled a little differently here:
a = tf.constant( [[0.1, 0.2], [0.3, 0.4]])
indices = tf.constant([[1],[0]])
# convert to full indices
full_indices = tf.stack([tf.range(indices.shape[0])[...,tf.newaxis], indices], axis=2)
# gather
result = tf.gather_nd(a,full_indices)
with tf.Session() as sess:
    print(sess.run(result))
    # [[0.2]
    #  [0.3]]
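The "full indices" step pairs each action number with the row it belongs to, so that every entry becomes a (row, column) coordinate. Here is the same construction in NumPy as a rough illustration, with np.take_along_axis shown as a cross-check; the variable names mirror the snippet above but are otherwise arbitrary.

```python
import numpy as np

a = np.array([[0.1, 0.2], [0.3, 0.4]])
indices = np.array([[1], [0]])

# Row number for every entry of indices: shape (2, 1).
rows = np.arange(indices.shape[0])[:, np.newaxis]
# Pair each row number with its column index: shape (2, 1, 2).
full_indices = np.stack([rows, indices], axis=2)

# Look up a[row, col] for every (row, col) pair: shape (2, 1).
result = a[full_indices[..., 0], full_indices[..., 1]]

# NumPy has a direct helper for this per-row lookup.
same = np.take_along_axis(a, indices, axis=1)

print(full_indices.tolist())
print(result)
```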
A simple way to do this is to squeeze indices down to 1d, multiply the weights element-wise by the corresponding one-hot vectors, sum across the action axis, and then expand the result back to 2d.
import tensorflow as tf
weights = tf.constant([[0.1, 0.2], [0.3, 0.4]])
indices = tf.constant([[1], [0]])
# Reduce from 2d (2, 1) to 1d (2,); axis=1 keeps the batch axis even for a batch of one
indices1d = tf.squeeze(indices, axis=1)
# One-hot vector corresponding to the indices. shape (2, 2)
action_one_hot = tf.one_hot(indices=indices1d, depth=weights.shape[1])
# Element-wise multiplication and sum across axis 1 to pick the weight. Shape (2,)
action_taken_weight = tf.reduce_sum(action_one_hot * weights, axis=1)
# Expand the dimension back to have a 2d. Shape (2, 1)
action_taken_weight2d = tf.expand_dims(action_taken_weight, axis=1)
sess = tf.InteractiveSession()
print("weights\n", sess.run(weights))
print("indices\n", sess.run(indices))
print("indices1d\n", sess.run(indices1d))
print("action_one_hot\n", sess.run(action_one_hot))
print("action_taken_weight\n", sess.run(action_taken_weight))
print("action_taken_weight2d\n", sess.run(action_taken_weight2d))
Should give you the following output:
weights
 [[0.1 0.2]
 [0.3 0.4]]
indices
 [[1]
 [0]]
indices1d
 [1 0]
action_one_hot
 [[0. 1.]
 [1. 0.]]
action_taken_weight
 [0.2 0.3]
action_taken_weight2d
 [[0.2]
 [0.3]]
Note: You can also do action_taken_weight = tf.reshape(action_taken_weight, tf.shape(indices)) instead of expand_dims.
In the Caffe framework it is possible to view the learned filters during CNN training, along with the result of convolving them with input images. Is it possible to do the same with TensorFlow?
A Caffe example can be viewed in this link:
http://nbviewer.jupyter.org/github/BVLC/caffe/blob/master/examples/00-classification.ipynb
Grateful for your help!
To see just a few conv1 filters in TensorBoard, you can use this code (it works for cifar10):
# this should be a part of the inference(images) function in cifar10.py file
# conv1
with tf.variable_scope('conv1') as scope:
    kernel = _variable_with_weight_decay('weights', shape=[5, 5, 3, 64],
                                         stddev=1e-4, wd=0.0)
    conv = tf.nn.conv2d(images, kernel, [1, 1, 1, 1], padding='SAME')
    biases = _variable_on_cpu('biases', [64], tf.constant_initializer(0.0))
    bias = tf.nn.bias_add(conv, biases)
    conv1 = tf.nn.relu(bias, name=scope.name)
    _activation_summary(conv1)

    with tf.variable_scope('visualization'):
        # scale weights to [0 1], type is still float
        x_min = tf.reduce_min(kernel)
        x_max = tf.reduce_max(kernel)
        kernel_0_to_1 = (kernel - x_min) / (x_max - x_min)
        # to tf.image_summary format [batch_size, height, width, channels]
        kernel_transposed = tf.transpose(kernel_0_to_1, [3, 0, 1, 2])
        # this will display random 3 filters from the 64 in conv1
        tf.image_summary('conv1/filters', kernel_transposed, max_images=3)
I also wrote a simple gist to display all 64 conv1 filters in a grid.
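For anyone who wants to see what the scaling and transpose steps do to the shapes, here is a small NumPy sketch on a random stand-in kernel. The 8x8 tiling at the end is just one possible way to lay 64 filters out in a grid; it is an assumption made for illustration and not necessarily what the gist does.

```python
import numpy as np

rng = np.random.default_rng(0)
# Random stand-in for the conv1 weights: [height, width, in_channels, out_channels].
kernel = rng.standard_normal((5, 5, 3, 64)).astype(np.float32)

# Scale weights to [0, 1].
x_min = kernel.min()
x_max = kernel.max()
kernel_0_to_1 = (kernel - x_min) / (x_max - x_min)

# Move the filter axis first: [64, 5, 5, 3], one small RGB image per filter.
kernel_transposed = np.transpose(kernel_0_to_1, (3, 0, 1, 2))

# Tile the 64 filters into an 8x8 grid image of shape [40, 40, 3].
grid = kernel_transposed.reshape(8, 8, 5, 5, 3)
grid = grid.transpose(0, 2, 1, 3, 4).reshape(8 * 5, 8 * 5, 3)

print(kernel_transposed.shape, grid.shape)
```

In the grid, the filter with index r * 8 + c occupies rows r*5 to r*5+4 and columns c*5 to c*5+4.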