masked softmax in theano - numpy

I am wondering if it is possible to apply a mask before performing theano.tensor.nnet.softmax.
This is the behavior I am looking for:
>>>a = np.array([[1,2,3,4]])
>>>m = np.array([[1,0,1,0]]) # ignore index 1 and 3
array([[ 0.11920292, 0. , 0.88079708, 0. ]])
Note that a and m are matrices, so I would like the softmax to work on an entire matrix and perform a row-wise masked softmax.
Also, the output should have the same shape as a, so the solution cannot use advanced indexing, e.g. theano.tensor.softmax(a[0,[0,2]]).

def masked_softmax(a, m, axis):
    e_a = T.exp(a)
    masked_e = e_a * m
    sum_masked_e = T.sum(masked_e, axis, keepdims=True)
    return masked_e / sum_masked_e
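For anyone who wants to check the numbers outside Theano, here is a minimal NumPy sketch of the same idea. It is not the code above: it additionally subtracts the largest unmasked value per row before exponentiating, so that large inputs do not overflow, and it assumes every row has at least one unmasked entry (otherwise the division yields NaN).

```python
import numpy as np

def masked_softmax(a, m, axis=-1):
    """Softmax along `axis`, restricted to positions where the mask m is nonzero."""
    m = np.asarray(m, dtype=bool)
    # Largest unmasked value per row, used as a shift for numerical stability.
    shift = np.max(np.where(m, a, -np.inf), axis=axis, keepdims=True)
    # Masked positions become exp(-inf) == 0, so they contribute nothing.
    masked_e = np.exp(np.where(m, a - shift, -np.inf))
    return masked_e / np.sum(masked_e, axis=axis, keepdims=True)

a = np.array([[1.0, 2.0, 3.0, 4.0]])
m = np.array([[1, 0, 1, 0]])  # ignore index 1 and 3
print(masked_softmax(a, m))
```

The output keeps the shape of a, with zeros at the masked positions.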

theano.tensor.switch is one way to do this.
In the computational graph you can do the following:
a_mask = theano.tensor.switch(m, a, -np.inf)
sm = theano.tensor.nnet.softmax(a_mask)
Hope it helps others.
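The fill-with-negative-infinity trick can be checked in plain NumPy. This is only a sketch: np.where stands in for theano.tensor.switch, and the softmax helper is my own, not a library function. A row whose mask is all zeros would come out as NaN, so it assumes at least one unmasked entry per row.

```python
import numpy as np

def softmax(x, axis=-1):
    # Ordinary row-wise softmax with the usual max-shift for stability.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

a = np.array([[1.0, 2.0, 3.0, 4.0],
              [0.5, 0.5, 0.5, 0.5]])
m = np.array([[1, 0, 1, 0],
              [1, 1, 0, 0]])

# Masked entries become -inf, so exp() maps them to exactly 0.
a_mask = np.where(m.astype(bool), a, -np.inf)
sm = softmax(a_mask)
print(sm)
```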


Triple tensor product with Tensorflow

Suppose I have a matrix A and two vectors x,y, of appropriate dimensions. I want to compute the dot product x' * A * y, where x' denotes the transpose. This should result in a scalar.
Is there a convenient API function in Tensorflow to do this?
(Note that I am using Tensorflow 2).
Use tf.linalg.tensordot(). See the documentation.
As you mentioned in the question, you are trying to compute a dot product. In this case tf.matmul() will not work directly on the vectors, because it expects inputs of rank 2 or higher (matrices).
Demo code snippet:
import tensorflow as tf
A = tf.constant([[1,4,6],[2,1,5],[3,2,4]])
x = tf.constant([3,2,7])
result = tf.linalg.tensordot(tf.transpose(x), A, axes=1)
result = tf.linalg.tensordot(result, x, axes=1)
And the result will be
>>>tf.Tensor(532, shape=(), dtype=int32)
A few points I want to mention here:
Don't forget the axes argument inside tf.linalg.tensordot().
tf.zeros(5) creates a rank-1 tensor of shape (5,), like [0,0,0,0,0], and transposing it gives the same tensor back. If you create it as tf.zeros((5,1)) instead, you get a column vector of shape (5,1), and transposing that gives a row vector of shape (1,5).
In that case the transpose does change the result, but I recommend using the code snippet I have given. For a dot product you don't have to worry much about this.
If you are still facing issues, I will be very happy to help you.
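To double-check the arithmetic of the demo, here is the same contraction as a NumPy sketch (NumPy rather than TensorFlow, so it runs without a TF install). It uses y equal to x to match the demo, and also shows the single-call einsum form; tf.einsum accepts the same subscript string.

```python
import numpy as np

A = np.array([[1, 4, 6], [2, 1, 5], [3, 2, 4]])
x = np.array([3, 2, 7])
y = np.array([3, 2, 7])  # same values as x, as in the demo

# Two contractions: first x' * A (a vector), then that vector with y (a scalar).
two_step = np.tensordot(np.tensordot(x, A, axes=1), y, axes=1)

# One call: sum over i and j of x[i] * A[i, j] * y[j].
one_step = np.einsum("i,ij,j->", x, A, y)

print(two_step, one_step)  # 532 532
```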
Just do the following,
import tensorflow as tf
x = tf.constant([1,2])
a = tf.constant([[2,3],[3,4]])
y = tf.constant([2,3])
z = tf.reshape(tf.matmul(tf.matmul(x[tf.newaxis,:], a), y[:, tf.newaxis]),[])
>>> 49
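Just use tf.transpose and the multiplication operator like this:
tf.transpose(x) * A * y
Note that * multiplies element-wise with broadcasting, so on its own this expression returns a matrix rather than the scalar x' * A * y.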
Based on your example:
x = tf.zeros(5)
A = tf.zeros((5,5))
How about
x = tf.expand_dims(x, -1)
tf.matmul(tf.matmul(x, A, transpose_a=True), x)

Ensemble network with categorical distribution in tensorflow

I have n networks, each with the same input / output. I want to randomly select one of the outputs according to a categorical distribution. tfp.distributions.Categorical outputs only integers, and I tried to do something like:
act_dist = tfp.distributions.Categorical(logits=act_logits) # act_logits are all the same, so the distribution is uniform
rand_out = act_dist.sample()
x = nn_out1 * tf.cast(rand_out == 0., dtype=tf.float32) + ... # for all my n networks
But rand_out == 0. is always false, as well as the other conditions.
Any idea for achieving what I need?
You might also look at MixtureSameFamily, which does a gather under the covers for you.
nn_out1 = tf.expand_dims(nn_out1, axis=2)
outs = tf.concat([nn_out1, nn_out2, ...], axis=2)
probs = tf.tile(tf.reduce_mean(tf.ones_like(nn_out1), axis=1, keepdims=True) / n, [1, n]) # trick to have ones of shape [None,1]
dist = tfp.distributions.MixtureSameFamily(
x = dist.sample()
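I think you need to use tf.equal, because in TF 1.x Tensor == 0 compares object identity rather than values, so it is always False. Also compare against an integer (tf.equal(rand_out, 0)), since the sample is an integer tensor.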
Separately though, you might want to use OneHotCategorical. For training, you might also try using RelaxedOneHotCategorical.
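To illustrate the one-hot idea, here is a small NumPy sketch of picking one network output per batch row. The stand-in outputs, shapes, and names (outs, choice, one_hot) are made up for the example, and the uniform draw plays the role of the categorical sample.

```python
import numpy as np

rng = np.random.default_rng(0)
n, batch, dim = 3, 4, 2

# Hypothetical stand-ins for nn_out1..nn_outn: network k outputs the constant k.
outs = np.stack([np.full((batch, dim), float(k)) for k in range(n)], axis=-1)  # (batch, dim, n)

# One categorical sample per batch row (uniform here, as in the question).
choice = rng.integers(0, n, size=batch)

# One-hot encode the samples and use them as selection weights.
one_hot = np.eye(n)[choice]                      # (batch, n)
x = np.sum(outs * one_hot[:, None, :], axis=-1)  # (batch, dim)
print(choice)
print(x)
```

Each row of x equals the output of whichever network was sampled for that row. A one-hot sample makes the comparison against each index unnecessary.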

How to create a new array of tensors from old one

I have a tensor [a, b, c, d, e, f, g, h, i] with dimension 9 X 1536. I need to create a new tensor which is like [(a,b), (a,c), (a,d), (a,e),(a,f),(a,g), (a,h), (a,i)] with dimension [8 x 2 x 1536]. How can I do it with tensorflow ?
I tried it like this:
x = tf.zeros((9, 1536))
x_new = tf.stack([(x[0], x[1]),
                  (x[0], x[2]),
                  (x[0], x[3]),
                  (x[0], x[4]),
                  (x[0], x[5]),
                  (x[0], x[6]),
                  (x[0], x[7]),
                  (x[0], x[8])])
This seems to work, but I would like to know if there is a better solution or approach that can be used instead.
You can obtain the desired output with a combination of tf.concat, tf.tile and tf.expand_dims:
import tensorflow as tf
import numpy as np
_in = tf.constant(np.random.randint(0,10,(9,1536)))
tile_shape = [(_in.shape[0]-1).value] + [1]*len(_in.shape[1:].as_list())
_out = tf.concat([
    tf.expand_dims(tf.tile([_in[0]], tile_shape), 1),
    tf.expand_dims(_in[1:], 1)
], axis=1)
tf.tile repeats the first element of _in creating a tensor of length len(_in)-1 (I compute separately the shape of the tile because we want to tile only on the first dimension).
tf.expand_dims adds a dimension we can then concat on
Finally, tf.concat stitches together the two tensors giving the desired result.
EDIT: Rewrote to fit the OP's actual use-case with multidimensional tensors.
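The same pairing can be sketched in NumPy to check the shapes. This is a different route from the tile/concat answer: it broadcasts the first row against the remaining rows and stacks the two on a new axis. The input values are arbitrary; tf.broadcast_to and tf.stack should allow the same pattern in TensorFlow.

```python
import numpy as np

x = np.arange(9 * 1536, dtype=np.float32).reshape(9, 1536)

rest = x[1:]                               # rows b..i, shape (8, 1536)
first = np.broadcast_to(x[0], rest.shape)  # row a repeated 8 times, shape (8, 1536)

# Pair them up along a new axis 1: pairs[k] == (a, k-th remaining row).
pairs = np.stack([first, rest], axis=1)
print(pairs.shape)  # (8, 2, 1536)
```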

gather values from 2dim tensor in tensorflow

Hi, TensorFlow beginner here... I'm trying to get the values of certain elements in a 2-dim tensor, in my case class scores from a probability matrix.
The probability matrix is (1000,81) with batchsize 1000 and number of classes 81. ClassIDs is (1000,) and contains the index for the highest class score for each sample. How do I get the corresponding class score from the probability matrix using tf.gather?
class_ids = tf.cast(tf.argmax(probs, axis=1), tf.int32)
class_scores = tf.gather_nd(probs,class_ids)
class_scores should be a tensor of shape (1000,) containing the highest class_score for each sample.
Right now I'm using a workaround that looks like this:
class_score_count = []
for i in range(probs.shape[0]):
    prob = probs[i,:]
    class_score = prob[class_ids[i]]
    class_score_count.append(class_score)
class_scores = tf.stack(class_score_count, axis=0)
Thanks for the help!
You can do it with tf.gather_nd like this:
class_ids = tf.cast(tf.argmax(probs, axis=1), tf.int32)
# If shape is not dynamic you can use probs.shape[0].value instead of tf.shape(probs)[0]
row_ids = tf.range(tf.shape(probs)[0], dtype=tf.int32)
idx = tf.stack([row_ids, class_ids], axis=1)
class_scores = tf.gather_nd(probs, idx)
You could also just use tf.reduce_max. It computes the maximum a second time, but it may not be much slower if your data is not too big:
class_scores = tf.reduce_max(probs, axis=1)
You need to run the class_ids tensor to get its values.
The values will be returned as a NumPy array,
and you can access a NumPy array normally with a loop.
You have to do something like this:
predictions = sess.run(tf.argmax(…, 1), feed_dict={x: X_data})
The predictions variable has all the information you need.
TensorFlow only returns the tensor values that you run explicitly.
I think this is what the batch_dims argument for tf.gather is for.
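For reference, here is the per-row lookup written out in NumPy, with random data standing in for probs. The (row, column) indexing mirrors the tf.gather_nd answer. As far as I know, tf.gather(probs, class_ids, batch_dims=1) should give the same (1000,) result in TensorFlow, but I have not run it here.

```python
import numpy as np

rng = np.random.default_rng(0)
probs = rng.random((1000, 81))        # stand-in for the probability matrix
class_ids = np.argmax(probs, axis=1)  # shape (1000,)

# Pair each row index with its class index and pick one score per row.
row_ids = np.arange(probs.shape[0])
class_scores = probs[row_ids, class_ids]

# Equivalent gather along axis 1 with one index per row.
alt = np.take_along_axis(probs, class_ids[:, None], axis=1)[:, 0]
print(class_scores.shape)  # (1000,)
```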

tensorflow: how to perform element-wise multiplication between two sparse matrix

I have two sparse matrices declared using the tf.sparse_placeholder. I need to perform the element-wise multiplication between the two matrices. But I cannot find such an implementation in tensorflow. The most related function is tf.sparse_tensor_dense_matmul, but this is a function performing matrix multiplication between one sparse matrix and one dense matrix.
What I hope to find is a way to perform element-wise multiplication between two sparse matrices. Is there any implementation of this in TensorFlow?
Below I show an example of multiplication between dense matrices, and of matrix multiplication between a sparse and a dense matrix. I'm looking forward to seeing a solution.
import tensorflow as tf
import numpy as np
# Element-wise multiplication, two dense matrices
A = tf.placeholder(tf.float32, shape=(100, 100))
B = tf.placeholder(tf.float32, shape=(100, 100))
C = tf.multiply(A, B)
sess = tf.InteractiveSession()
RandA = np.random.rand(100, 100)
RandB = np.random.rand(100, 100)
print(sess.run(C, feed_dict={A: RandA, B: RandB}))
# matrix multiplication, A is sparse and B is dense
A = tf.sparse_placeholder(tf.float32)
B = tf.placeholder(tf.float32, shape=(5,5))
C = tf.sparse_tensor_dense_matmul(A, B)
sess = tf.InteractiveSession()
indices = np.array([[3, 2], [1, 2]], dtype=np.int64)
values = np.array([1.0, 2.0], dtype=np.float32)
shape = np.array([5,5], dtype=np.int64)
Sparse_A = tf.SparseTensorValue(indices, values, shape)
RandB = np.ones((5, 5))
print(sess.run(C, feed_dict={A: Sparse_A, B: RandB}))
Thank you very much!!!
TensorFlow currently has no sparse-sparse element-wise multiplication operation.
We don't plan to add support for this currently, but contributions are definitely welcome! Feel free to create a GitHub issue, and perhaps you or someone in the community can pick it up :)
You can also use tf.matmul or tf.sparse_matmul for sparse matrices by setting a_is_sparse and b_is_sparse to True.
For element-wise multiplication, one workaround is to use tf.sparse_to_dense to convert the sparse tensors to a dense representation and then use tf.multiply for the element-wise multiplication.
The solution from another post works.
Use __mul__ to perform the element-wise multiplication (see the TF 2.1 reference).
I'm using Tensorflow 2.4.1.
Here's my workaround to multiply two sparse tensor element-wise:
def sparse_element_wise_mul(a: tf.SparseTensor, b: tf.SparseTensor):
    a_plus_b = tf.sparse.add(a, b)
    a_plus_b_square = tf.square(a_plus_b)
    minus_a_square = tf.negative(tf.square(a))
    minus_b_square = tf.negative(tf.square(b))
    _2ab = tf.sparse.add(
        tf.sparse.add(a_plus_b_square, minus_a_square),
        minus_b_square
    )
    ab = tf.sparse.map_values(tf.multiply, _2ab, 0.5)
    return ab
Here's some simple explanation:
Given that
(a+b)^2 = a^2 + 2a*b + b^2
we can calculate a*b by
a*b = ((a+b)^2 - a^2 - b^2) / 2
It seems the gradient can be calculated correctly with such a workaround.
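The identity itself is easy to sanity-check with dense NumPy arrays that are mostly zero. This only verifies the algebra, not the TensorFlow sparse ops, and the example values are made up.

```python
import numpy as np

a = np.zeros((5, 5))
a[3, 2] = 1.0
a[1, 2] = 2.0

b = np.zeros((5, 5))
b[1, 2] = 3.0
b[0, 4] = 5.0

# a*b = ((a+b)^2 - a^2 - b^2) / 2, applied element-wise.
ab = ((a + b) ** 2 - a ** 2 - b ** 2) / 2

print(np.argwhere(ab != 0))  # only position [1, 2], where both inputs are nonzero
print(ab[1, 2])              # 6.0
```

One thing to keep in mind: positions where only one input is nonzero evaluate to zero, so in the sparse version the result may still store explicit zeros at those indices.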