I am finding it impossible to get the max tensor in an n-dimensional array, even by summing the tensors and using gather or gather_nd.
By max tensor I mean the set of weights with the highest sum.
I have a tensor of shape (-1, 4, 30, 256) where 256 is the weights.
I need to get the maximum set of weights for each (-1, 0, 30), (-1, 1, 30), (-1, 2, 30) and (-1, 3, 30), so under each tensor in the 2nd dimension.
This would ideally result in a (-1, 4, 256) tensor.
reduce_max and any other max function will only return the maximum element values within the last dimension, not the maximum tensor (which is the set of weights with the highest sum) in the dimension itself. I have tried:
p1 = tf.reduce_sum(tensor, axis=3) # (-1, 4, 30)
p2 = tf.argmax(p1, 2) # (-1, 4)
Which gives the appropriate index values for the 3rd dimension:
[[0, 2, 2, 0],
[0, 1, 3, 0],
But running tf.gather or tf.gather_nd on the above does not work, even when splitting my data beforehand and using different axes.
Further, I can get the appropriate entries if I build the indexes for gather_nd by hand, e.g.:
tf.gather_nd(out5, [[0,0,0], [0,1,2], [0,2,2], [0,3,0], [1,0,0], [1,1,2], [1,2,2], [1,3,1]])
But as we are using a tensorflow variable of an unknown first dimension, I cannot build these indexes.
I have searched through related workarounds and found nothing applicable.
Can anyone tell me how to accomplish this? Thanks!
edit for clarification:
The maximum tensor of weights would be the set of weights with the highest sum:
[[ 1, 2, 3], [0, 0, 2], [1, 0, 2]] would be [1, 2, 3]

I figured it out using map_fn:
I reshaped my tensor to (-1, 120, 256)
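tfr = tf.reshape(sometensor, (-1, 120, 256))
def func(slice):
    f1 = tf.reduce_sum(slice, axis=1)  # sum each weight vector: (120,)
    f2 = tf.argmax(f1)                 # index of the vector with the largest sum
    return slice[f2]                   # the winning (256,) vector
bla = tf.map_fn(func, tfr)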
Which returns (-1, 256) with the vector that has the greatest sum (the highest set of weights).
Basically, map_fn iterates over the first axis, so it passes one (120, 256) slice at a time to func (as many slices as there are entries on the first axis). func returns the appropriate (256,) vector for each slice, and map_fn stacks them, which, voila, gives the answer.
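For anyone who wants to keep the second axis and end up with (-1, 4, 256) as the question asked, here is a minimal NumPy sketch of the same idea without a per-slice loop. The random data and the fixed batch size of 2 are assumptions for illustration only. In TensorFlow versions where tf.gather accepts batch_dims, tf.gather(tensor, p2, axis=2, batch_dims=2) should express the same selection, but that call is untested here.

```python
import numpy as np

rng = np.random.default_rng(0)
tensor = rng.normal(size=(2, 4, 30, 256))   # stand-in for the (-1, 4, 30, 256) tensor

sums = tensor.sum(axis=3)                   # (2, 4, 30): one sum per weight vector
best = sums.argmax(axis=2)                  # (2, 4): index of the best vector per group

# Broadcast the indices over the weight axis and pick along axis 2.
picked = np.take_along_axis(tensor, best[:, :, None, None], axis=2)  # (2, 4, 1, 256)
result = picked.squeeze(axis=2)             # (2, 4, 256)
```

Each result[b, k] is the weight vector tensor[b, k, best[b, k]], i.e. the one with the highest sum in that group.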


Argmax indexing in pytorch with 2 tensors of equal shape

Summarize the problem
I am working with high dimensional tensors in pytorch and I need to index one tensor with the argmax values from another tensor. So I need to index tensor y of dim [3,4] with the results from the argmax of tensor x with dim [3,4]. If the tensors are:
import torch as T
# Tensor to get argmax from
# expected argmax: [2, 0, 1]
x = T.tensor([[1, 2, 8, 3],
[6, 3, 3, 5],
[2, 8, 1, 7]])
# Tensor to index with argmax from previous
# expected tensor to retrieve [2, 4, 9]
y = T.tensor([[0, 1, 2, 3],
[4, 5, 6, 7],
[8, 9, 10, 11]])
# argmax
x_max, x_argmax = T.max(x, dim=1)
I would like an operation that, given the argmax indexes of x (x_argmax), retrieves the values of tensor y at those same positions.
Describe what you’ve tried
This is what I have tried:
# What I have tried
print(y[:, x_argmax])
print(y[..., x_argmax])
I have been reading a lot about numpy indexing, basic indexing, advanced indexing and combined indexing. I have been trying to use combined indexing (since I want a slice in first dimension of the tensor and the indexes values on the second one). But I have not been able to come up with a solution for this use case.
You are looking for torch.gather:
idx = torch.argmax(x, dim=1, keepdim=True) # get argmax directly, w/o max
out = torch.gather(y, 1, idx)
This results in a tensor of shape (3, 1) holding the values 2, 4 and 9.
How about y[T.arange(3), x_argmax]?
That does the job for me...
Explanation: T.max(x, dim=1) reduces away dimension 1, so x_argmax only holds one column index per row. The row information needs to be restored explicitly, which is what T.arange(3) does.
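Since the question mentions NumPy indexing, here is a small NumPy sketch of both answers side by side, using the x and y from the question. np.take_along_axis stands in for torch.gather here; it is an analogue, not the same function.

```python
import numpy as np

x = np.array([[1, 2, 8, 3],
              [6, 3, 3, 5],
              [2, 8, 1, 7]])
y = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7],
              [8, 9, 10, 11]])

idx = x.argmax(axis=1)                                   # [2, 0, 1]

# Analogue of torch.gather(y, 1, idx): keep a length-1 axis, then drop it.
via_gather = np.take_along_axis(y, idx[:, None], axis=1).squeeze(axis=1)

# Analogue of y[T.arange(3), x_argmax]: pair each row number with its column index.
via_arange = y[np.arange(y.shape[0]), idx]

print(via_gather, via_arange)   # both are [2 4 9]
```

Using y.shape[0] instead of a hard-coded 3 keeps the second form working for any number of rows.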

How to slice a tensor using given indices in tensorflow?

I have a tensor with probabilities. This is a dynamic tensor with shape (?, 30) and I am selecting index with the best probability of these 30 values as :
best_probability = tf.argmax(probability, axis = 1)
Now the shape of tensor best_probability is (?,). Now I want to select the values with these indices from another tensor called data with dimensions (?, 30, 1024, 3). Essentially, from each set of 30 values I want to select the one with the best probability, using the best_probability tensor.
The final output should have dimensions of (?, 1024, 3).
PS:- I tried gather_nd but it needs the best_probability tensor to be indexed as something like [[0, 9], [1, 10], [2, 15], [3, 25]]. To do so I wrote the following snippet.
selected_data = tf.stack(tf.range(probability.shape[0]),
tf.argmax(probability, axis = 1))
This doesn't work as I am dealing with a dynamic tensor. Is there any alternative way to solve this problem?
I was able to solve this issue using tf.batch_gather and tf.reshape
selected_data = tf.reshape(tf.batch_gather(data, best_probability),
(-1, data.shape[2],data.shape[3]))
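To make the index pairs from the question concrete, here is a minimal NumPy sketch of the selection. The batch size of 5 and the random data are assumptions for illustration. With a dynamic TensorFlow tensor the batch size is not available from probability.shape[0] at graph-construction time; tf.shape(probability)[0] is the usual way to get it as a tensor.

```python
import numpy as np

rng = np.random.default_rng(1)
batch = 5                                          # hypothetical batch size
probability = rng.random(size=(batch, 30))
data = rng.normal(size=(batch, 30, 1024, 3))

best_probability = probability.argmax(axis=1)      # (batch,)

# Pair row i with its best index, i.e. the [[0, 9], [1, 10], ...] pairs from the question.
pairs = np.stack([np.arange(batch), best_probability], axis=1)   # (batch, 2)
selected_data = data[pairs[:, 0], pairs[:, 1]]                   # (batch, 1024, 3)
```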

Slicing a tensor by an index tensor in Tensorflow

I have two following tensors (note that they are both Tensorflow tensors which means they are still virtually symbolic at the time I construct the following slicing op before I launch a tf.Session()):
params: has shape (64,784, 256)
indices: has shape (64, 784)
and I want to construct an op that returns the following tensor:
output: has shape (64,784) where
output[i,j] = params_tensor[i,j, indices[i,j] ]
What is the most efficient way in Tensorflow to do so?
ps: I tried with tf.gather but couldn't make use of it to perform the operation I described above.
Many thanks.
You can get exactly what you want using tf.gather_nd. The final expression is:
tf.gather_nd(params, tf.stack([
    tf.tile(tf.expand_dims(tf.range(tf.shape(indices)[0]), 1), [1, tf.shape(indices)[1]]),
    tf.transpose(tf.tile(tf.expand_dims(tf.range(tf.shape(indices)[1]), 1), [1, tf.shape(indices)[0]])),
    indices], 2))
This expression has the following explanation:
tf.gather_nd does what you expected and uses the indices to gather the output from the params
tf.stack combines three separate tensors, the last of which is the indices. The first two tensors specify the ordering of the first two dimensions (axis 0 and axis 1 of params/indices)
For the example provided, this ordering is simply 0, 1, 2, ..., 63 for axis 0, and 0, 1, 2, ... 783 for axis 1. These sequences are obtained with tf.range(tf.shape(indices)[0]) and tf.range(tf.shape(indices)[1]), respectively.
For the example provided, indices has shape (64, 784). The other two tensors from the last point above need to have this same shape in order to be combined with tf.stack
First, an additional dimension/axis is added to each of the two sequences using tf.expand_dims.
The use of tf.tile and tf.transpose can be shown by example: Assume the first two axes of params and indices have shape (5,3). We want the first tensor to be:
[[0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
We want the second tensor to be:
[[0, 1, 2], [0, 1, 2], [0, 1, 2], [0, 1, 2], [0, 1, 2]]
These two tensors almost function like specifying the coordinates in a grid for the associated indices.
The final part of tf.stack combines the three tensors on a new third axis, so that the result has the same 3 axes as params.
Keep in mind that if you have more or fewer axes than in the question, you need to modify the number of coordinate-specifying tensors in tf.stack accordingly.
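The coordinate-grid construction described above can be sketched in NumPy for the (5, 3) example. np.meshgrid replaces the tile/expand_dims/transpose steps here, and the last axis of length 7 and the random values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
params = rng.integers(0, 100, size=(5, 3, 7))
indices = rng.integers(0, 7, size=(5, 3))

# Coordinate grids for the first two axes, both of shape (5, 3).
rows, cols = np.meshgrid(np.arange(5), np.arange(3), indexing="ij")
# rows is [[0, 0, 0], [1, 1, 1], ...]; cols is [[0, 1, 2], [0, 1, 2], ...]

# Stack on a new last axis: one (i, j, k) coordinate per output element.
coords = np.stack([rows, cols, indices], axis=2)                  # (5, 3, 3)
output = params[coords[..., 0], coords[..., 1], coords[..., 2]]   # (5, 3)
```

Each output[i, j] equals params[i, j, indices[i, j]], which is what tf.gather_nd does with the stacked index tensor.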
What you want is like a custom reduction function. If indices holds the index of the maximum value along the last axis, then I would suggest using tf.reduce_max directly:
max_params = tf.reduce_max(params_tensor, reduction_indices=[2])
Otherwise, here is one way to get what you want (Tensor objects are not assignable, so we create a 2D list of tensors and pack it; tf.pack was renamed tf.stack in TensorFlow 1.0, which is what the code below uses):
import tensorflow as tf
import numpy as np
with tf.Graph().as_default():
    params_tensor = tf.stack(np.random.randint(1, 256, [5, 5, 10]).astype(np.int32))
    indices = tf.stack(np.random.randint(1, 10, [5, 5]).astype(np.int32))
    output = [[None for j in range(params_tensor.get_shape()[1])] for i in range(params_tensor.get_shape()[0])]
    # Pick params_tensor[i, j, indices[i, j]] one element at a time.
    for i in range(params_tensor.get_shape()[0]):
        for j in range(params_tensor.get_shape()[1]):
            output[i][j] = params_tensor[i, j, indices[i, j]]
    output = tf.stack(output)
    with tf.Session() as sess:
        params_tensor, indices, output = sess.run([params_tensor, indices, output])
        print(params_tensor)
        print(indices)
        print(output)
I know I'm late, but I recently had to do something similar, and was able to do it using Ragged Tensors:
output = tf.gather(params, tf.RaggedTensor.from_tensor(indices), batch_dims=-1, axis=-1)
Hope it helps

How do I swap tensor's axes in TensorFlow?

I have a tensor of shape (30, 116, 10), and I want to swap the first two dimensions, so that I have a tensor of shape (116, 30, 10)
I saw that numpy has such a function implemented (np.swapaxes) and I searched for something similar in tensorflow but I found nothing.
Do you have any idea?
tf.transpose provides the same functionality as np.swapaxes, although in a more generalized form. In your case, you can do tf.transpose(orig_tensor, [1, 0, 2]) which would be equivalent to np.swapaxes(orig_np_array, 0, 1).
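A quick NumPy check of that equivalence, as a sketch with a made-up array of the shape from the question: np.transpose takes the same kind of full permutation list that tf.transpose does.

```python
import numpy as np

orig = np.arange(30 * 116 * 10).reshape(30, 116, 10)

swapped = np.swapaxes(orig, 0, 1)            # (116, 30, 10)
transposed = np.transpose(orig, [1, 0, 2])   # same permutation, written out in full
```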
It is possible to use tf.einsum to swap axes if the number of input dimensions is unknown. For example:
tf.einsum("ij...->ji...", input) will swap the first two dimensions of input;
tf.einsum("...ij->...ji", input) will swap the last two dimensions;
tf.einsum("aij...->aji...", input) will swap the second and the third dimensions;
tf.einsum("ijk...->kij...", input) will permute the first three dimensions;
and so on.
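The same subscript strings can be tried out with np.einsum, which accepts the ellipsis notation as well. This is only a NumPy sketch on a made-up 4-D array, not a statement about every tf.einsum version.

```python
import numpy as np

x = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)

first_two = np.einsum("ij...->ji...", x)      # (3, 2, 4, 5)
last_two = np.einsum("...ij->...ji", x)       # (2, 3, 5, 4)
middle = np.einsum("aij...->aji...", x)       # (2, 4, 3, 5)
rotated = np.einsum("ijk...->kij...", x)      # (4, 2, 3, 5)
```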
You can transpose just the last two axes with tf.linalg.matrix_transpose. More generally, you can swap any number of trailing axes by working out what the leading indices are dynamically, and using relative indices for the axes you want to transpose:
x = tf.ones([5, 3, 7, 11])
trailing_axes = [-1, -2]
leading = tf.range(tf.rank(x) - len(trailing_axes)) # [0, 1]
trailing = trailing_axes + tf.rank(x) # [3, 2]
new_order = tf.concat([leading, trailing], axis=0) # [0, 1, 3, 2]
res = tf.transpose(x, new_order)
res.shape # [5, 3, 11, 7]

Outer product in tensorflow

In tensorflow, there are nice functions for entrywise and matrix multiplication, but after looking through the docs, I cannot find any internal function for taking an outer product of two tensors, i.e., making a bigger tensor by all possible products of elements of smaller tensors (like numpy.outer):
v_{i,j} = x_i*h_j
M_{ij,kl} = A_{ij}*B_{kl}
Does tensorflow have such a function?
Yes, you can do this by taking advantage of the broadcast semantics of tensorflow. Reshape the first tensor to size 1xN and the second to size Mx1, and you'll get a broadcast to MxN of all of the results when you multiply them.
You can play around with the same thing in numpy to see how it behaves in a simpler context, btw:
import numpy as np
a = np.array([1, 2, 3, 4, 5]).reshape([5,1])
b = np.array([6, 7, 8, 9, 10]).reshape([1,5])
a * b  # broadcasts to shape (5, 5)
How exactly you do it in tensorflow depends a bit on which axes you want to use and what semantics you want for the resulting multiply, but the general idea applies.
It is somewhat surprising that until recently there was no easy and "natural" way of doing an outer product between arbitrary tensors (also known as "tensor product") in tensorflow, especially given the name of the library...
With tensorflow>=1.6 you can now finally get what you want with a simple:
M = tf.tensordot(A, B, axes=0)
In earlier versions of tensorflow, axes=0 raises a ValueError: 'axes' must be at least 1.. Somehow tf.tensordot() used to need at least one dimension to actually sum over. The easy way out is to simply add a "fake" dimension with tf.expand_dims().
On tensorflow<=1.5 you can thus get the same result as above by doing:
M = tf.tensordot(tf.expand_dims(A, 0), tf.expand_dims(B, 0), axes=[[0],[0]])
This adds a new index of dimension 1 in location 0 for both tensors and then lets tf.tensordot() sum over those indices.
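np.tensordot follows the same axes convention, so both variants can be checked in NumPy. This is a sketch with made-up matrices A and B; A[None] plays the role of tf.expand_dims(A, 0).

```python
import numpy as np

A = np.arange(6).reshape(2, 3)
B = np.arange(20).reshape(4, 5)

# axes=0 sums over nothing, so the result is the full outer (tensor) product.
M = np.tensordot(A, B, axes=0)                            # (2, 3, 4, 5)

# The workaround: add a fake leading axis of length 1 and sum over it.
M_old = np.tensordot(A[None], B[None], axes=[[0], [0]])   # (2, 3, 4, 5)
```

In both cases M[i, j, k, l] equals A[i, j] * B[k, l], matching the M_{ij,kl} formula from the question.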
In case someone else stumbles upon this, according to the tensorflow docs you can use the tf.einsum() function to compute the outer product of two tensors a and b:
# Outer product
>>> einsum('i,j->ij', u, v) # output[i,j] = u[i]*v[j]
tf.multiply (and its '*' shortcut) results in an outer product, whether or not a batch is used. In particular, if the two input tensors have 3D shapes of [batch, n, 1] and [batch, 1, n] then this op will calculate the outer product for [n,1],[1,n] for each sample in the batch. If there is no batch, so that the two input tensors are 2D, this op will calculate the outer product just the same.
On the other hand, while tf.tensordot yields the outer product for 2D matrices, it did not broadcast similarly when a batch was added.
Without a batch:
a_np = np.array([[1, 2, 3]]) # shape: (1,3) [a row vector], 2D Tensor
b_np = np.array([[4], [5], [6]]) # shape: (3,1) [a column vector], 2D Tensor
a = tf.placeholder(dtype='float32', shape=[1, 3])
b = tf.placeholder(dtype='float32', shape=[3, 1])
c = a*b # Result: an outer-product of a,b
d = tf.multiply(a,b) # Result: an outer-product of a,b
e = tf.tensordot(a,b, axes=[0,1]) # Result: an outer-product of a,b
With a batch:
a_np = np.array([[[1, 2, 3]], [[4, 5, 6]]]) # shape: (2,1,3) [a batch of two row vectors], 3D Tensor
b_np = np.array([[[7], [8], [9]], [[10], [11], [12]]]) # shape: (2,3,1) [a batch of two column vectors], 3D Tensor
a = tf.placeholder(dtype='float32', shape=[None, 1, 3])
b = tf.placeholder(dtype='float32', shape=[None, 3, 1])
c = a*b # Result: an outer-product per batch
d = tf.multiply(a,b) # Result: an outer-product per batch
e = tf.tensordot(a,b, axes=[1,2]) # Does NOT result with an outer-product per batch
Running either of these two graphs:
sess = tf.Session()
result_astrix = sess.run(c, feed_dict={a: a_np, b: b_np})
result_multiply = sess.run(d, feed_dict={a: a_np, b: b_np})
result_tensordot = sess.run(e, feed_dict={a: a_np, b: b_np})
print('tf.tensordot(a,b, axes=[1,2]):')
print(result_tensordot)
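The difference in the batched case shows up in the result shapes. Here is a NumPy sketch using the same a_np and b_np; np.tensordot uses the same axes convention, but this is NumPy rather than the TensorFlow graph above.

```python
import numpy as np

a_np = np.array([[[1, 2, 3]], [[4, 5, 6]]], dtype=np.float32)              # (2, 1, 3)
b_np = np.array([[[7], [8], [9]], [[10], [11], [12]]], dtype=np.float32)   # (2, 3, 1)

# Broadcasting keeps the batch axis aligned: one (3, 3) outer product per sample.
per_batch = a_np * b_np                              # (2, 3, 3)

# tensordot contracts the two length-1 axes and keeps BOTH batch axes,
# so every sample of a is combined with every sample of b.
contracted = np.tensordot(a_np, b_np, axes=[1, 2])   # (2, 3, 2, 3)
```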
As pointed out in the other answers, the outer product can be done using broadcasting:
a = tf.range(10)
b = tf.range(5)
outer = a[..., None] * b[None, ...]
# array([[ 0, 0, 0, 0, 0],
# [ 0, 1, 2, 3, 4],
# [ 0, 2, 4, 6, 8],
# [ 0, 3, 6, 9, 12],
# [ 0, 4, 8, 12, 16],
# [ 0, 5, 10, 15, 20],
# [ 0, 6, 12, 18, 24],
# [ 0, 7, 14, 21, 28],
# [ 0, 8, 16, 24, 32],
# [ 0, 9, 18, 27, 36]], dtype=int32)
The a[..., None] inserts a new dimension of length 1 after the last axis.
Similarly, b[None, ...] inserts a new dimension of length 1 before the first axis.
The element-wise multiplication then broadcasts the tensors from shapes (10, 1) * (1, 5) to (10, 5) * (10, 5), computing the outer product.
Where you insert the additional dimensions determines for which dimensions the outer product is computed. For example, if both tensors have a batch dimension, you can skip it using :, which gives a[:, ..., None] * b[:, None, ...]. This can be further abbreviated as a[..., None] * b[:, None]. To perform the outer product over the last dimension, and thus support any number of batch dimensions, use a[..., None] * b[..., None, :].
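The last form can be checked in NumPy, where the same indexing syntax applies. This is a sketch with two made-up leading batch axes, compared against an einsum reference.

```python
import numpy as np

rng = np.random.default_rng(3)
a = rng.normal(size=(2, 7, 4))    # two leading batch axes, vectors of length 4
b = rng.normal(size=(2, 7, 5))    # same batch axes, vectors of length 5

# (2, 7, 4, 1) * (2, 7, 1, 5) broadcasts to (2, 7, 4, 5).
outer = a[..., None] * b[..., None, :]

# Reference: outer product over the last axis for every batch entry.
reference = np.einsum("...i,...j->...ij", a, b)
```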
I would have commented to MasDra, but SO wouldn't let me as a new registered user.
The general outer product of multiple vectors arranged in a list U of length order can be obtained via
tf.einsum(','.join(string.ascii_lowercase[0:order])+'->'+string.ascii_lowercase[0:order], *U)
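Here is how that subscript string comes out for three vectors, as a NumPy sketch with made-up values. np.einsum accepts the same string; note the import of string, which the one-liner needs.

```python
import string
import numpy as np

U = [np.array([1, 2]), np.array([3, 4, 5]), np.array([6, 7, 8, 9])]
order = len(U)

letters = string.ascii_lowercase[0:order]          # "abc"
subscripts = ",".join(letters) + "->" + letters    # "a,b,c->abc"
outer = np.einsum(subscripts, *U)                  # shape (2, 3, 4)
```

Each entry outer[i, j, k] is U[0][i] * U[1][j] * U[2][k]. Since the subscripts are single lowercase letters, this construction is limited to at most 26 vectors.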