Mapping timeseries sequence input shape to desired output shape using EinsumDense - tensorflow

Can anyone help me understand how to handle compressing/expanding the dimension of a tensor using EinsumDense?
I have a timeseries (not NLP) input tensor of the shape (batch, horizon, features) wherein the intended output is (1, H, F); H is an arbitrary horizon and F is an arbitrary feature size. I'm actually using EinsumDense as my Feed Forward Network in a transformer encoder module and as a final dense layer in the transformer's output. The FFN should map (1, horizon, features) to (1, H, features) and the final dense layer should map (1, H, features) to (1, H, F).
My current equation is shf,h->shf for the FFN, and shf,hfyz->syz for the dense layer, however I'm getting a less than optimal result as compared to my original setup where there was no change in the horizon length and my equations were shf,h->shf and shf,hz->shz respectively.

My two cents,
First, an intuitive understanding of the transformer encoder: Given (batch, horizon, features), the attention mechanism tries to find a weighted linear combination of the projected features. The resulting weights are learned via attention scores obtained by operating between features, over each horizon. The FFN layer that comes next should be a linear combination of values within features.
Coming to EinsumDense by way of example we have two tensors:
a: Data (your input tensor to EinsumDense)
b: Weights (EinsumDense's internal weights tensor)
# create random data in a 3D tensor
a = tf.random.uniform(minval=1, maxval=3, shape=(1,2,3), dtype=tf.int32)
# [[[1, 2, 2],
# [2, 2, 1]]]
shf,h->shf:
This just scales the individual features.
b = tf.random.uniform(minval=2, maxval=4, shape=(2,), dtype=tf.int32)
# [3, 2]
tf.einsum('shf,h->shf', a, b)
# [[[3, 6, 6], #1st feature is scaled with 3
# [4, 4, 2]]]] #2nd feature is scaled with 2
shf,hz->shz: This does a linear combination within features
b = tf.random.uniform(minval=2, maxval=4, shape=(2,6), dtype=tf.int32)
# [[3, 3, 3, 3, 3, 3],
# [2, 2, 2, 3, 2, 3]]
tf.einsum('shf,hz->shz', a, b)
# [[[15, 15, 15, 15, 15, 15],
# [10, 10, 10, 15, 10, 15]]]
# every value is a linear combination of the first feature [1, 2, 2] with b. The first value is sum([1,2,2]*3)
The above two resembles the transformer encoder architecture, with a feature scaling layer. And the output structure is preserved (batch, H, F)
shf,hfyz->syz: This does both between features and within features combination.
b = tf.random.uniform(minval=2, maxval=4, shape=(2,3,4,5), dtype=tf.int32)
tf.einsum('shf,hfyz->syz', a,b)
# each element output `(i,j)` is a dot product of a and b[:,:,i,j]
# first element is tf.reduce_sum(a*b[:,:,0,0])
Here the output (s,y,z), y doesnt correspond to horizon and z doesn't correspond to features, but a combination of values between them.

Related

Average pooling tensorflow layer with differently shaped input tensors

I have extracted the embeddings for a particular entity X from every sentence in my dataset. Where X is mentioned more than once within the same sentence, this yields an embedding for each mention: I'd like to put these through an average pooling layer to arrive at a single embedding for X in each sentence.
Simplified working example:
import tensorflow as tf
embeddings = tf.constant([[1, 1, 1],
[2, 2, 2],
[4, 4, 4],
[5, 5, 5]])
# Let's imagine rows [1, 1, 1] & [4, 4, 4]
# correspond to embeddings for X from the same sentence
# We can indicate sentence belonging through an sent_idxs variable:
sent_idxs = tf.constant([0, 1, 0, 2])
With the help of related stackoverflow questions (Torch - How to calculate average of tensors with the same indexes, Summing over specific indices PyTorch (similar to scatter_add)), I could average embeddings corresponding to the same sentence like this:
unique_idxs, _, counts = tf.unique_with_counts(sent_idxs) # counts = ([2, 1, 1])
result_holder = tf.zeros([unique_idxs.shape[0], embeddings.shape[1]], dtype= embeddings.dtype)
embeddings = tf.tensor_scatter_nd_add(result_holder, tf.expand_dims(sent_idxs, axis=1), embeddings)
embeddings /= counts[:, None]
However, I would prefer to re-shape my original embeddings to instead perform the averaging with AveragePooling2D or AveragePooling1D, and I'm really struggling with imagining the appropriate shape to enable this.

How can one utilize the indices provided by torch.topk()?

Suppose I have a pytorch tensor x of shape [N, N_g, 2]. It can be viewed as N * N_g 2d vectors. Specifically, x[i, j, :] is the 2d vector of the jth group in the ith batch.
Now I am trying to get the coordinates of vectors of top 5 length in each group. So I tried the following:
(i) First I used x_len = (x**2).sum(dim=2).sqrt() to compute their lengths, resulting in x_len.shape==[N, N_g].
(ii) Then I used tk = x_len.topk(5) to get the top 5 lengths in each group.
(iii) The desired output would be a tensor x_top5 of shape [N, 5, 2]. Naturally I thought of using tk.indices to index x so as to obtain x_top5. But I failed as it seems such indexing is not supported.
How can I do this?
A minimal example:
x = torch.randn(10,10,2) # N=10 is the batchsize, N_g=10 is the group size
x_len = (x**2).sum(dim=2).sqrt()
tk = x_len.topk(5)
x_top5 = x[tk.indices]
print(x_top5.shape)
# torch.Size([10, 5, 10, 2])
However, this gives x_top5 as a tensor of shape [10, 5, 10, 2], instead of [10, 5, 2] as desired.

Separating custom keras metric inputs into two seperate metrics and finding median error

I have a ResNet network that I am using for a camera pose network. I have replaced the final classifier layer with a 1024 dense layer and then a 7 dense layer (first 3 for xyz, final 4 for quaternion).
My problem is that I want to record the xyz error and the quaternion error as two separate errors or metrics (Instead of just mean absolute error of all 7). The inputs of the custom metric template of customer_error(y_true,y_pred) are tensors. I don't know how to separate the inputs into two different xyz and q arrays. The function runs at compile time, when the tensors are empty and don't have any numpy components.
Ultimately I want to get the median xyz and q error using
median = tensorflow_probability.stats.percentile(input,q=50, interpolation='linear').
Any help would be really appreciated.
You could use the tf.slice() to extract just the first three elements of your model output.
import tensorflow as tf
# enabling eager mode to demo the slice fn
tf.compat.v1.enable_eager_execution()
import numpy as np
# just creating a random array dimesions size (2, 7)
# where 2 is just an arbitrary value chosen for the batch dimension
out = np.arange(0,14).reshape(2,7)
print(out)
# array([[ 0, 1, 2, 3, 4, 5, 6],
# [ 7, 8, 9, 10, 11, 12, 13]])
# put it in a tf variable
out_tf = tf.Variable(out)
# now using the slice operator
xyz = tf.slice(out_tf, begin=[0, 0], size=[-1,3])
# lets see what it looked like
print(xyz)
# <tf.Tensor: id=11, shape=(2, 3), dtype=int64, numpy=
# array([[0, 1, 2],
# [7, 8, 9]])>
Could wrap something like this into your custom metric to get what you need.
def xyz_median(y_true, y_pred):
"""get the median of just the X,Y,Z coords
UNTESTED though :)
"""
# slice to get just the xyz
xyz = tf.slice(out_tf, begin=[0, 0], size=[-1,3])
median = tfp.stats.percentile(xyz, q=50, interpolation='linear')
return median

Sample from a tensor in Tensorflow along an axis

I have a matrix L of shape (2,5,2). The values along the last axis form a probability distribution. I want to sample another matrix S of shape (2, 5) where each entry is one of the following integers: 0, 1.
For example,
L = [[[0.1, 0.9],[0.2, 0.8],[0.3, 0.7],[0.5, 0.5],[0.6, 0.4]],
[[0.5, 0.5],[0.9, 0.1],[0.7, 0.3],[0.9, 0.1],[0.1, 0.9]]]
One of the samples could be,
S = [[1, 1, 1, 0, 1],
[1, 1, 1, 0, 1]]
The distributions are binomial in the above example. However, in general, the last dimension of L can be any positive integer, so the distributions can be multinomial.
The samples need to be generated efficiently within Tensorflow computation graph. I know how to do this using numpy using the functions apply_along_axis and numpy.random.multinomial.
You can use tf.multinomial() here.
You will first need to reshape your input tensor to shape [-1, N] (where N is the last dimension of L):
# L has shape [2, 5, 2]
L = tf.constant([[[0.1, 0.9],[0.2, 0.8],[0.3, 0.7],[0.5, 0.5],[0.6, 0.4]],
[[0.5, 0.5],[0.9, 0.1],[0.7, 0.3],[0.9, 0.1],[0.1, 0.9]]])
dims = L.get_shape().as_list()
N = dims[-1] # here N = 2
logits = tf.reshape(L, [-1, N]) # shape [10, 2]
Now we can apply the function tf.multinomial() to logits:
samples = tf.multinomial(logits, 1)
# We reshape to match the initial shape minus the last dimension
res = tf.reshape(samples, dims[:-1])
Be cautious when using tf.multinomial(). The inputs to the function should be logits and not probability distributions.
However, in your example, the last axis is a probability distribution.

Outer product in tensorflow

In tensorflow, there are nice functions for entrywise and matrix multiplication, but after looking through the docs, I cannot find any internal function for taking an outer product of two tensors, i.e., making a bigger tensor by all possible products of elements of smaller tensors (like numpy.outer):
v_{i,j} = x_i*h_j
or
M_{ij,kl} = A_{ij}*B_{kl}
Does tensorflow have such a function?
Yes, you can do this by taking advantage of the broadcast semantics of tensorflow. Size the first out to size 1xN of itself, and the second to size Mx1 of itself, and you'll get a broadcast to MxN of all of the results when you multiply them.
(You can play around with the same thing in numpy to see how it behaves in a simpler context, btw:
a = np.array([1, 2, 3, 4, 5]).reshape([5,1])
b = np.array([6, 7, 8, 9, 10]).reshape([1,5])
a*b
How exactly you do it in tensorflow depends a bit on which axes you want to use and what semantics you want for the resulting multiply, but the general idea applies.
It is somewhat surprising that until recently there was no easy and "natural" way of doing an outer product between arbitrary tensors (also known as "tensor product") in tensorflow, especially given the name of the library...
With tensorflow>=1.6 you can now finally get what you want with a simple:
M = tf.tensordot(A, B, axes=0)
In earlier versions of tensorflow, axes=0 raises a ValueError: 'axes' must be at least 1.. Somehow tf.tensordot() used to need at least one dimension to actually sum over. The easy way out is to simply add a "fake" dimension with tf.expand_dims().
On tensorflow<=1.5 you can thus get the same result as above by doing:
M = tf.tensordot(tf.expand_dims(A, 0), tf.expand_dims(B, 0), axes=[[0],[0]])
This adds a new index of dimension 1 in location 0 for both tensors and then lets tf.tensordot() sum over those indices.
In case someone else stumbles upon this, according to the tensorflow docs you can use the tf.einsum() function to compute the outer product of two tensors a and b:
# Outer product
>>> einsum('i,j->ij', u, v) # output[i,j] = u[i]*v[j]
tf.multiply (and its '*' shortcut) result in an outer product, whether or not a batch is used. In particular, if the two input tensors have a 3D shapes of [batch, n, 1] and [batch, 1, n] then this op will calculate the outer product for [n,1],[1,n] per each sample in the batch. If there is no batch, so that the two input tensors are 2D, this op will calculate the outer product just the same.
On the other hand, while tf.tensordot yields the outer product for 2D matrices, it did not broadcast similarly when a batch was added.
Without a batch:
a_np = np.array([[1, 2, 3]]) # shape: (1,3) [a row vector], 2D Tensor
b_np = np.array([[4], [5], [6]]) # shape: (3,1) [a column vector], 2D Tensor
a = tf.placeholder(dtype='float32', shape=[1, 3])
b = tf.placeholder(dtype='float32', shape=[3, 1])
c = a*b # Result: an outer-product of a,b
d = tf.multiply(a,b) # Result: an outer-product of a,b
e = tf.tensordot(a,b, axes=[0,1]) # Result: an outer-product of a,b
With a batch:
a_np = np.array([[[1, 2, 3]], [[4, 5, 6]]]) # shape: (2,1,3) [a batch of two row vectors], 3D Tensor
b_np = np.array([[[7], [8], [9]], [[10], [11], [12]]]) # shape: (2,3,1) [a batch of two column vectors], 3D Tensor
a = tf.placeholder(dtype='float32', shape=[None, 1, 3])
b = tf.placeholder(dtype='float32', shape=[None, 3, 1])
c = a*b # Result: an outer-product per batch
d = tf.multiply(a,b) # Result: an outer-product per batch
e = tf.tensordot(a,b, axes=[1,2]) # Does NOT result with an outer-product per batch
Running any of these two graphs:
sess = tf.Session()
result_astrix = sess.run(c, feed_dict={a:a_np, b: b_np})
result_multiply = sess.run(d, feed_dict={a:a_np, b: b_np})
result_tensordot = sess.run(e, feed_dict={a:a_np, b: b_np})
print('a*b:')
print(result_astrix)
print('tf.multiply(a,b):')
print(result_multiply)
print('tf.tensordot(a,b, axes=[1,2]:')
print(result_tensordot)
As pointed out in the other answers, the outer product can be done using broadcasting:
a = tf.range(10)
b = tf.range(5)
outer = a[..., None] * b[None, ...]
tf.InteractiveSession().run(outer)
# array([[ 0, 0, 0, 0, 0],
# [ 0, 1, 2, 3, 4],
# [ 0, 2, 4, 6, 8],
# [ 0, 3, 6, 9, 12],
# [ 0, 4, 8, 12, 16],
# [ 0, 5, 10, 15, 20],
# [ 0, 6, 12, 18, 24],
# [ 0, 7, 14, 21, 28],
# [ 0, 8, 16, 24, 32],
# [ 0, 9, 18, 27, 36]], dtype=int32)
Explanation:
The a[..., None] inserts a new dimension of length 1 after the last axis.
Similarly, b[None, ...] inserts a new dimension of length 1 before the first axis.
The element-wide multiplication then broadcasts the tensors from shapes (10, 1) * (1, 5) to (10, 5) * (10, 5), computing the outer product.
Where you insert the additional dimensions determines for which dimensions the outer product is computed. For example, if both tensors have a batch size, you can skip that using : which gives a[:, ..., None] * b[:, None, ...]. This can be further abbreviated as a[..., None] * b[:, None]. To perform the outer product over the last dimension and thus supporting any number of batch dimensions, use a[..., None] * b[..., None, :].
I would have commented to MasDra, but SO wouldn't let me as a new registered user.
The general outer product of multiple vectors arranged in a list U of length order can be obtained via
tf.einsum(','.join(string.ascii_lowercase[0:order])+'->'+string.ascii_lowercase[0:order], *U)