Outer product in tensorflow - tensorflow

In tensorflow, there are nice functions for entrywise and matrix multiplication, but after looking through the docs, I cannot find any internal function for taking an outer product of two tensors, i.e., making a bigger tensor by all possible products of elements of smaller tensors (like numpy.outer):
v_{i,j} = x_i*h_j
or
M_{ij,kl} = A_{ij}*B_{kl}
Does tensorflow have such a function?

Yes, you can do this by taking advantage of tensorflow's broadcasting semantics. Reshape the first tensor to shape 1xN and the second to shape Mx1; multiplying them then broadcasts to an MxN result that contains every pairwise product.
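You can play around with the same thing in numpy to see how it behaves in a simpler context, btw:
import numpy as np
a = np.array([1, 2, 3, 4, 5]).reshape([5,1]) # column vector, shape (5, 1)
b = np.array([6, 7, 8, 9, 10]).reshape([1,5]) # row vector, shape (1, 5)
a*b # broadcasts to shape (5, 5): entry [i, j] is a[i]*b[j]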
How exactly you do it in tensorflow depends a bit on which axes you want to use and what semantics you want for the resulting multiply, but the general idea applies.
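As a rough illustration of both forms asked about in the question, here is a minimal NumPy sketch. The array names and shapes are made up for the example, and it assumes TensorFlow follows the same broadcasting rules, as the answer above states.

```python
import numpy as np

# v_{i,j} = x_i * h_j: column (3, 1) times row (1, 2)
x = np.array([1, 2, 3])
h = np.array([10, 20])
v = x[:, None] * h[None, :]          # shape (3, 2)

# M_{ij,kl} = A_{ij} * B_{kl}: append two length-1 axes to A
A = np.arange(6).reshape(2, 3)       # indices i, j
B = np.arange(20).reshape(4, 5)      # indices k, l
M = A[:, :, None, None] * B          # shape (2, 3, 4, 5)
```

Which axes get the extra length-1 dimensions decides the index order of the result.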

It is somewhat surprising that until recently there was no easy and "natural" way of doing an outer product between arbitrary tensors (also known as "tensor product") in tensorflow, especially given the name of the library...
With tensorflow>=1.6 you can now finally get what you want with a simple:
M = tf.tensordot(A, B, axes=0)
In earlier versions of tensorflow, axes=0 raises ValueError: 'axes' must be at least 1. Back then, tf.tensordot() needed at least one dimension to actually sum over. The easy way out is to simply add a "fake" dimension with tf.expand_dims().
On tensorflow<=1.5 you can thus get the same result as above by doing:
M = tf.tensordot(tf.expand_dims(A, 0), tf.expand_dims(B, 0), axes=[[0],[0]])
This adds a new index of dimension 1 in location 0 for both tensors and then lets tf.tensordot() sum over those indices.
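The equivalence of the two forms can be checked with a small sketch. It uses np.tensordot and np.expand_dims as stand-ins, on the assumption that they behave like their tf counterparts for these arguments; the shapes are arbitrary.

```python
import numpy as np

A = np.arange(6).reshape(2, 3)
B = np.arange(4)

# axes=0 contracts nothing: the result keeps all axes of A, then all axes of B.
M = np.tensordot(A, B, axes=0)                     # shape (2, 3, 4)

# Workaround: prepend a length-1 axis to each operand and contract over it.
M_old = np.tensordot(np.expand_dims(A, 0),
                     np.expand_dims(B, 0),
                     axes=[[0], [0]])              # shape (2, 3, 4)
```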

In case someone else stumbles upon this, according to the tensorflow docs you can use the tf.einsum() function to compute the outer product of two tensors a and b:
# Outer product
>>> einsum('i,j->ij', u, v) # output[i,j] = u[i]*v[j]
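A short sketch of the same subscript notation, written with np.einsum on the assumption that tf.einsum accepts the same strings. The second call is the matrix case from the question, and the arrays are illustrative only.

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([10, 20])
outer_uv = np.einsum('i,j->ij', u, v)      # outer_uv[i, j] = u[i] * v[j]

A = np.arange(6).reshape(2, 3)
B = np.arange(20).reshape(4, 5)
M = np.einsum('ij,kl->ijkl', A, B)         # M[i, j, k, l] = A[i, j] * B[k, l]
```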

tf.multiply (and its '*' shortcut) yields an outer product through broadcasting, whether or not a batch dimension is present. In particular, if the two input tensors have 3D shapes [batch, n, 1] and [batch, 1, n], this op calculates the outer product of [n,1] and [1,n] for each sample in the batch. If there is no batch, so that the two input tensors are 2D, the op calculates the outer product just the same.
On the other hand, while tf.tensordot yields the outer product for 2D matrices, it does not broadcast similarly when a batch dimension is added.
Without a batch:
a_np = np.array([[1, 2, 3]]) # shape: (1,3) [a row vector], 2D Tensor
b_np = np.array([[4], [5], [6]]) # shape: (3,1) [a column vector], 2D Tensor
a = tf.placeholder(dtype='float32', shape=[1, 3])
b = tf.placeholder(dtype='float32', shape=[3, 1])
c = a*b # Result: an outer-product of a,b
d = tf.multiply(a,b) # Result: an outer-product of a,b
e = tf.tensordot(a,b, axes=[0,1]) # Result: an outer-product of a,b
With a batch:
a_np = np.array([[[1, 2, 3]], [[4, 5, 6]]]) # shape: (2,1,3) [a batch of two row vectors], 3D Tensor
b_np = np.array([[[7], [8], [9]], [[10], [11], [12]]]) # shape: (2,3,1) [a batch of two column vectors], 3D Tensor
a = tf.placeholder(dtype='float32', shape=[None, 1, 3])
b = tf.placeholder(dtype='float32', shape=[None, 3, 1])
c = a*b # Result: an outer-product per batch
d = tf.multiply(a,b) # Result: an outer-product per batch
e = tf.tensordot(a,b, axes=[1,2]) # Does NOT result with an outer-product per batch
Running either of these two graphs:
sess = tf.Session()
result_asterisk = sess.run(c, feed_dict={a:a_np, b: b_np})
result_multiply = sess.run(d, feed_dict={a:a_np, b: b_np})
result_tensordot = sess.run(e, feed_dict={a:a_np, b: b_np})
print('a*b:')
print(result_asterisk)
print('tf.multiply(a,b):')
print(result_multiply)
print('tf.tensordot(a,b, axes=[1,2]):')
print(result_tensordot)
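The shape difference in the batched case can be seen without a session. This sketch uses NumPy on the same data and assumes np.tensordot matches tf.tensordot for these axes. The tensordot result pairs every sample of a with every sample of b, so the per-batch products sit only on its "diagonal" over the two batch axes.

```python
import numpy as np

a = np.array([[[1, 2, 3]], [[4, 5, 6]]])               # shape (2, 1, 3)
b = np.array([[[7], [8], [9]], [[10], [11], [12]]])    # shape (2, 3, 1)

per_batch = a * b                                      # shape (2, 3, 3)
contracted = np.tensordot(a, b, axes=[1, 2])           # shape (2, 3, 2, 3)

# per_batch[n, i, j] equals contracted[n, j, n, i] for each batch index n
same_for_batch_0 = np.array_equal(per_batch[0], contracted[0, :, 0, :].T)
```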

As pointed out in the other answers, the outer product can be done using broadcasting:
a = tf.range(10)
b = tf.range(5)
outer = a[..., None] * b[None, ...]
tf.InteractiveSession().run(outer)
# array([[ 0, 0, 0, 0, 0],
# [ 0, 1, 2, 3, 4],
# [ 0, 2, 4, 6, 8],
# [ 0, 3, 6, 9, 12],
# [ 0, 4, 8, 12, 16],
# [ 0, 5, 10, 15, 20],
# [ 0, 6, 12, 18, 24],
# [ 0, 7, 14, 21, 28],
# [ 0, 8, 16, 24, 32],
# [ 0, 9, 18, 27, 36]], dtype=int32)
Explanation:
The a[..., None] inserts a new dimension of length 1 after the last axis.
Similarly, b[None, ...] inserts a new dimension of length 1 before the first axis.
The element-wise multiplication then broadcasts the tensors from shapes (10, 1) * (1, 5) to (10, 5) * (10, 5), computing the outer product.
Where you insert the additional dimensions determines which dimensions the outer product is computed over. For example, if both tensors have a leading batch dimension, you can skip it using :, which gives a[:, ..., None] * b[:, None, ...]. This can be further abbreviated as a[..., None] * b[:, None]. To perform the outer product over the last dimension, and thus support any number of batch dimensions, use a[..., None] * b[..., None, :].
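A minimal check of the last form, with two batch dimensions. It is written in NumPy, whose indexing with ... and None is assumed to match the tensor slicing used above; the shapes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((2, 4, 3))    # batch axes (2, 4), vectors of length 3
b = rng.standard_normal((2, 4, 5))    # same batch axes, vectors of length 5

outer = a[..., None] * b[..., None, :]              # shape (2, 4, 3, 5)
reference = np.einsum('...i,...j->...ij', a, b)     # same result via einsum
```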

I would have commented to MasDra, but SO wouldn't let me as a new registered user.
The general outer product of multiple vectors arranged in a list U of length order can be obtained via
import string
tf.einsum(','.join(string.ascii_lowercase[0:order])+'->'+string.ascii_lowercase[0:order], *U)
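To make the generated subscript string concrete, here is a small sketch for order 3. It uses np.einsum with made-up vectors, on the assumption that tf.einsum takes the same string and argument list.

```python
import string
import numpy as np

U = [np.array([1, 2]), np.array([1, 10, 100]), np.array([1, -1])]
order = len(U)

letters = string.ascii_lowercase[:order]            # 'abc'
subscripts = ','.join(letters) + '->' + letters     # 'a,b,c->abc'
T = np.einsum(subscripts, *U)                       # T[a, b, c] = U[0][a] * U[1][b] * U[2][c]
```

Because each vector gets its own letter, this construction is limited to 26 vectors.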

Related

How can one utilize the indices provided by torch.topk()?

Suppose I have a pytorch tensor x of shape [N, N_g, 2]. It can be viewed as N * N_g 2d vectors. Specifically, x[i, j, :] is the 2d vector of the jth group in the ith batch.
Now I am trying to get the coordinates of vectors of top 5 length in each group. So I tried the following:
(i) First I used x_len = (x**2).sum(dim=2).sqrt() to compute their lengths, resulting in x_len.shape==[N, N_g].
(ii) Then I used tk = x_len.topk(5) to get the top 5 lengths in each group.
(iii) The desired output would be a tensor x_top5 of shape [N, 5, 2]. Naturally I thought of using tk.indices to index x so as to obtain x_top5. But I failed as it seems such indexing is not supported.
How can I do this?
A minimal example:
x = torch.randn(10,10,2) # N=10 is the batchsize, N_g=10 is the group size
x_len = (x**2).sum(dim=2).sqrt()
tk = x_len.topk(5)
x_top5 = x[tk.indices]
print(x_top5.shape)
# torch.Size([10, 5, 10, 2])
However, this gives x_top5 as a tensor of shape [10, 5, 10, 2], instead of [10, 5, 2] as desired.
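One possible approach, sketched here in NumPy rather than PyTorch: x[tk.indices] indexes only the first axis, so each row of indices needs to be paired with its own batch index. Here np.argsort stands in for topk. The same two-index pattern, for example x[torch.arange(N)[:, None], tk.indices], is assumed to carry over to PyTorch.

```python
import numpy as np

rng = np.random.default_rng(0)
N, N_g, k = 10, 10, 5
x = rng.standard_normal((N, N_g, 2))
x_len = np.sqrt((x ** 2).sum(axis=2))           # shape (N, N_g)

# Stand-in for topk: indices of the k largest lengths per row, largest first.
top_idx = np.argsort(-x_len, axis=1)[:, :k]     # shape (N, k)

# Pair every index with its batch row; (N, 1) broadcasts against (N, k).
batch_idx = np.arange(N)[:, None]
x_top5 = x[batch_idx, top_idx]                  # shape (N, k, 2)
```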

Explicit slicing across a particular dimension

I've got a 3D tensor x (e.g. 4x4x100). I want to obtain a subset of this by explicitly choosing elements across the last dimension. This would have been easy if I was choosing the same elements across the last dimension (e.g. x[:,:,30:50]), but I want to target different elements across that dimension using the 2D tensor indices, which specifies the idx across the third dimension. Is there an easy way to do this in numpy?
A simpler 2D example:
x = [[1,2,3,4,5,6],[10,20,30,40,50,60]]
indices = [1,3]
Let's say I want to grab two elements across the last dimension of x, starting from the points specified by indices. So my desired output is:
[[2,3],[40,50]]
Update: I think I could use a combination of take() and ravel_multi_index() but some of the platforms that are inspired by numpy (like PyTorch) don't seem to have ravel_multi_index so I'm looking for alternative solutions
Iterating over the idx and collecting the slices is not a bad option if the number of 'rows' isn't too large (and the size of the slices is relatively big).
In [55]: x = np.array([[1,2,3,4,5,6],[10,20,30,40,50,60]])
In [56]: idx = [1,3]
In [57]: np.array([x[j,i:i+2] for j,i in enumerate(idx)])
Out[57]:
array([[ 2, 3],
[40, 50]])
Joining the slices like this only works if they all are the same size.
An alternative is to collect the indices into an array, and do one indexing.
For example with a similar iteration:
idxs = np.array([np.arange(i,i+2) for i in idx])
But broadcasted addition may be better:
In [58]: idxs = np.array(idx)[:,None]+np.arange(2)
In [59]: idxs
Out[59]:
array([[1, 2],
[3, 4]])
In [60]: x[np.arange(2)[:,None], idxs]
Out[60]:
array([[ 2, 3],
[40, 50]])
ravel_multi_index is not hard to replicate (if you don't need clipping etc):
In [65]: np.ravel_multi_index((np.arange(2)[:,None],idxs),x.shape)
Out[65]:
array([[ 1, 2],
[ 9, 10]])
In [66]: x.flat[_]
Out[66]:
array([[ 2, 3],
[40, 50]])
In [67]: np.arange(2)[:,None]*x.shape[1]+idxs
Out[67]:
array([[ 1, 2],
[ 9, 10]])
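The broadcasted-addition idea extends to the 3D case from the question. The sketch below uses np.take_along_axis; the start offsets and the window length are hypothetical values chosen for the example.

```python
import numpy as np

x = np.arange(4 * 4 * 100).reshape(4, 4, 100)
starts = np.array([[ 0,  5, 10, 15],
                   [20, 25, 30, 35],
                   [40, 45, 50, 55],
                   [60, 65, 70, 75]])        # hypothetical start offsets, shape (4, 4)
n = 20                                       # hypothetical window length

last = starts[:, :, None] + np.arange(n)     # shape (4, 4, n): indices into the last axis
out = np.take_along_axis(x, last, axis=2)    # shape (4, 4, n)
```

No flat index is needed here, so ravel_multi_index is avoided entirely.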
For the 3D case in PyTorch:
x = [x[:,i].narrow(2,index,2) for i,index in enumerate(indices)]
x = torch.stack(x,dim=1)
By enumerating, you get both the position along the axis and the index from which you want to start slicing.
narrow gives you a zero-copy slice of the given length, starting from index start along a certain axis.
You said you wanted:
dim = 2
start = index
length = 2
Then you simply have to stack these tensors back into a single 3D tensor.
This is the least work-intensive approach I can think of for PyTorch.
EDIT
If you just want different indices along different axes and indices is a 2D tensor, you can do:
x = [x[:,i,index] for i,index in enumerate(indices)]
x = torch.stack(x,dim=1)
You really should have given a proper working example; without one, the question is unnecessarily confusing.
Here is how to do it in numpy; no clue about torch, though.
The following picks a slice of length n along the third dimension starting from points idx depending on the other two dimensions:
# example
a = np.arange(60).reshape(2, 3, 10)
idx = [(1,2,3),(4,3,2)]
n = 4
# build auxiliary 4D array where the last two dimensions represent
# a sliding n-window of the original last dimension
j,k,l = a.shape
s,t,u = a.strides
aux = np.lib.stride_tricks.as_strided(a, (j,k,l-n+1,n), (s,t,u,u))
# pick desired offsets from sliding windows
aux[(*np.ogrid[:j, :k], idx)]
# array([[[ 1, 2, 3, 4],
# [12, 13, 14, 15],
# [23, 24, 25, 26]],
# [[34, 35, 36, 37],
# [43, 44, 45, 46],
# [52, 53, 54, 55]]])
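On NumPy 1.20 or later, the auxiliary array can probably also be built with sliding_window_view, which avoids computing strides by hand. This sketch is an alternative to the as_strided call above, not the author's method, and it replaces the ogrid indexing with explicit index arrays.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.arange(60).reshape(2, 3, 10)
idx = np.array([(1, 2, 3), (4, 3, 2)])
n = 4

# Read-only view of shape (2, 3, 7, 4): every length-n window along the last axis.
windows = sliding_window_view(a, n, axis=-1)

j, k = idx.shape
picked = windows[np.arange(j)[:, None], np.arange(k)[None, :], idx]   # shape (2, 3, 4)
```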
I came up with below using broadcasting:
x = np.array([[1,2,3,4,5,6,7,8,9,10],[10,20,30,40,50,60,70,80,90,100]])
i = np.array([1,5])
N = 2 # number of elements I want to extract along each dimension. Starting points specified in i
r = np.arange(x.shape[-1])
r = np.broadcast_to(r, x.shape)
ii = i[:, np.newaxis]
ii = np.broadcast_to(ii, x.shape)
mask = np.logical_and(r-ii>=0, r-ii<N) # exactly N positions per row, starting at i
output = x[mask].reshape(x.shape[0], N)
Does this look alright?

Max tensor (not element) in an n-dimensional tensor

I am finding it impossible to get the max tensor in an n-dimensional array, even by summing the tensors and using gather or gather_nd.
By max tensor I mean the set of weights with the highest sum.
I have a tensor of shape (-1, 4, 30, 256) where 256 is the weights.
I need to get the maximum set of weights for each (-1, 0, 30), (-1, 1, 30), (-1, 2, 30) and (-1, 3, 30), so under each tensor in the 2nd dimension.
This would ideally result in a (-1, 4, 256) tensor.
reduce_max and any other max function will only return the maximum element values within the last dimension, not the maximum tensor (which is the set of weights with the highest sum) in the dimension itself. I have tried:
p1 = tf.reduce_sum(tensor, axis=3) # (-1, 4, 30)
p2 = tf.argmax(p1, 2) # (-1, 4)
Which gives the appropriate index values for the 3rd dimension:
[[0, 2, 2, 0],
[0, 1, 3, 0],
...
But running tf.gather or tf.gather_nd on the above does not work, even when splitting my data beforehand and using different axes.
Further, I can get the appropriate values if I build the indices for gather_nd by hand, e.g.:
tf.gather_nd(out5, [[0,0,0], [0,1,2], [0,2,2], [0,3,0], [1,0,0], [1,1,2], [1,2,2], [1,3,1]])
But as we are using a tensorflow variable of an unknown first dimension, I cannot build these indexes.
I have searched through related workarounds and found nothing applicable.
Can anyone tell me how to accomplish this? Thanks!
edit for clarification:
The maximum tensor of weights would be the set of weights with the highest sum:
[[ 1, 2, 3], [0, 0, 2], [1, 0, 2]] would be [1, 2, 3]
I figured it out using map_fn:
I reshaped my tensor to (-1, 120, 256)
tfr = tf.reshape(sometensor, ((-1, 120, 256)))
def func(slice):
    f1 = tf.reduce_sum(slice, axis=1)
    f2 = tf.argmax(f1)
    return slice[f2]
bla = tf.map_fn(func, tfr)
This returns a tensor of shape (-1, 256) holding the vector with the greatest sum (the highest set of weights) for each entry on the first axis.
Basically, map_fn iterates along the first axis, so it passes a (120, 256) chunk to func once for every entry on that axis. func returns the appropriate (256,) row for each chunk, and map_fn stacks those rows, which, voila, gives the answer.
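If the (-1, 4, 256) shape from the question is wanted, one option is to skip the reshape and gather along the third axis with the argmax indices. This is an alternative to the map_fn answer, sketched in NumPy with a hypothetical batch of 3. In TensorFlow, tf.gather with its batch_dims argument may serve the same purpose, but that is an untested assumption here.

```python
import numpy as np

rng = np.random.default_rng(0)
t = rng.standard_normal((3, 4, 30, 256))     # hypothetical batch of 3

sums = t.sum(axis=3)                         # (3, 4, 30)
best = sums.argmax(axis=2)                   # (3, 4): winning row per group

# Broadcast the index over the weight axis and gather along axis 2.
picked = np.take_along_axis(t, best[:, :, None, None], axis=2)   # (3, 4, 1, 256)
result = picked[:, :, 0, :]                  # (3, 4, 256)
```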

Sample from a tensor in Tensorflow along an axis

I have a matrix L of shape (2,5,2). The values along the last axis form a probability distribution. I want to sample another matrix S of shape (2, 5) where each entry is one of the following integers: 0, 1.
For example,
L = [[[0.1, 0.9],[0.2, 0.8],[0.3, 0.7],[0.5, 0.5],[0.6, 0.4]],
[[0.5, 0.5],[0.9, 0.1],[0.7, 0.3],[0.9, 0.1],[0.1, 0.9]]]
One of the samples could be,
S = [[1, 1, 1, 0, 1],
[1, 1, 1, 0, 1]]
The distributions are binomial in the above example. However, in general, the last dimension of L can be any positive integer, so the distributions can be multinomial.
The samples need to be generated efficiently within Tensorflow computation graph. I know how to do this using numpy using the functions apply_along_axis and numpy.random.multinomial.
You can use tf.multinomial() here.
You will first need to reshape your input tensor to shape [-1, N] (where N is the last dimension of L):
# L has shape [2, 5, 2]
L = tf.constant([[[0.1, 0.9],[0.2, 0.8],[0.3, 0.7],[0.5, 0.5],[0.6, 0.4]],
[[0.5, 0.5],[0.9, 0.1],[0.7, 0.3],[0.9, 0.1],[0.1, 0.9]]])
dims = L.get_shape().as_list()
N = dims[-1] # here N = 2
logits = tf.log(tf.reshape(L, [-1, N])) # shape [10, 2]; tf.multinomial expects logits (log-probabilities), not probabilities
Now we can apply the function tf.multinomial() to logits:
samples = tf.multinomial(logits, 1)
# We reshape to match the initial shape minus the last dimension
res = tf.reshape(samples, dims[:-1])
Be cautious when using tf.multinomial(). The inputs to the function should be logits (unnormalized log-probabilities) and not probability distributions.
However, in your example, the last axis is a probability distribution, so take its element-wise logarithm before passing it in.
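For comparison, sampling directly from probabilities along the last axis can be sketched in NumPy with inverse-CDF sampling, which needs no logits. This is not what tf.multinomial does internally. It is only an illustration of the sampling step, using the L from the question and a fixed seed.

```python
import numpy as np

rng = np.random.default_rng(0)
L = np.array([[[0.1, 0.9], [0.2, 0.8], [0.3, 0.7], [0.5, 0.5], [0.6, 0.4]],
              [[0.5, 0.5], [0.9, 0.1], [0.7, 0.3], [0.9, 0.1], [0.1, 0.9]]])

cdf = np.cumsum(L, axis=-1)                  # running totals along the last axis
u = rng.random(L.shape[:-1] + (1,))          # one uniform draw per distribution
# The sample is the number of CDF entries the draw has already passed.
S = (u >= cdf).sum(axis=-1)
S = np.minimum(S, L.shape[-1] - 1)           # guard against rounding in the last CDF entry
```

The same code handles any number of categories on the last axis.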

How do I swap tensor's axes in TensorFlow?

I have a tensor of shape (30, 116, 10), and I want to swap the first two dimensions, so that I have a tensor of shape (116, 30, 10)
I saw that numpy has such a function implemented (np.swapaxes) and I searched for something similar in tensorflow, but I found nothing.
Do you have any idea?
tf.transpose provides the same functionality as np.swapaxes, although in a more generalized form. In your case, you can do tf.transpose(orig_tensor, [1, 0, 2]) which would be equivalent to np.swapaxes(orig_np_array, 0, 1).
It is possible to use tf.einsum to swap axes if the number of input dimensions is unknown. For example:
tf.einsum("ij...->ji...", input) will swap the first two dimensions of input;
tf.einsum("...ij->...ji", input) will swap the last two dimensions;
tf.einsum("aij...->aji...", input) will swap the second and the third dimension;
tf.einsum("ijk...->kij...", input) will permute the first three dimensions;
and so on.
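The equivalences above can be checked with a quick sketch. It is written in NumPy, on the assumption that np.transpose and np.einsum mirror the tf functions for these arguments, and it uses the shape from the question.

```python
import numpy as np

x = np.arange(30 * 116 * 10).reshape(30, 116, 10)

swapped = np.swapaxes(x, 0, 1)               # (116, 30, 10)
transposed = np.transpose(x, [1, 0, 2])      # (116, 30, 10)
via_einsum = np.einsum('ij...->ji...', x)    # (116, 30, 10), works for any rank >= 2
last_two = np.einsum('...ij->...ji', x)      # (30, 10, 116)
```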
You can transpose just the last two axes with tf.linalg.matrix_transpose. More generally, you can swap any number of trailing axes by working out the leading indices dynamically and using relative indices for the axes you want to transpose:
x = tf.ones([5, 3, 7, 11])
trailing_axes = [-1, -2]
leading = tf.range(tf.rank(x) - len(trailing_axes)) # [0, 1]
trailing = trailing_axes + tf.rank(x) # [3, 2]
new_order = tf.concat([leading, trailing], axis=0) # [0, 1, 3, 2]
res = tf.transpose(x, new_order)
res.shape # [5, 3, 11, 7]