hessian of a variable returned by tf.concat() is None - tensorflow

Let x and y be vectors of length N, and z is a function z = f(x,y). In Tensorflow v1.0.0, tf.hessians(z,x) and tf.hessians(z,y) both returns an N by N matrix, which is what I expected.
However, when I concatenate the x and y into a vector p of size 2*N using tf.concat, and run tf.hessian(z, p), it returns error "ValueError: None values not supported."
I understand this is because in the computation graph x,y ->z and x,y -> p, so there is no gradient between p and z. To circumvent the problem, I can create p first, slice it into x and y, but I will have to change a ton of my code. Is there a more elegant way?
related question: Slice of a variable returns gradient None
import tensorflow as tf
import numpy as np
N = 2
A = tf.Variable(np.random.rand(N,N).astype(np.float32))
B = tf.Variable(np.random.rand(N,N).astype(np.float32))
x = tf.Variable(tf.random_normal([N]) )
y = tf.Variable(tf.random_normal([N]) )
#reshape to N by 1
x_1 = tf.reshape(x,[N,1])
y_1 = tf.reshape(y,[N,1])
#concat x and y to form a vector with length of 2*N
p = tf.concat([x,y],axis = 0)
#define the function
z = 0.5*tf.matmul(tf.matmul(tf.transpose(x_1), A), x_1) + 0.5*tf.matmul(tf.matmul(tf.transpose(y_1), B), y_1) + 100
#works , hx and hy are both N by N matrix
hx = tf.hessians(z,x)
hy = tf.hessians(z,y)
#this gives error "ValueError: None values not supported."
#expecting a matrix of size 2*N by 2*N
hp = tf.hessians(z,p)

Compute the hessian by its definition.
gxy = tf.gradients(z, [x, y])
gp = tf.concat([gxy[0], gxy[1]], axis=0)
hp = []
for i in range(2*N):
hp.append(tf.gradients(gp[i], [x, y]))
Because tf.gradients computes the sum of (dy/dx), so when computing the second partial derivative, one should slice the vector into scalars and then compute the gradient. Tested on tf1.0 and python2.

Related

batched tensor slice, slice B x N x M with B x 1

I have an B x M x N tensor, X, and I have and B x 1 tensor, Y, which corresponds to the index of tensor X at dimension=1 that I want to keep. What is the shorthand for this slice so that I can avoid a loop?
Essentially I want to do this:
Z = torch.zeros(B,N)
for i in range(B):
Z[i] = X[i][Y[i]]
the following code is similar to the code in the loop. the difference is that instead of sequentially indexing the array Z,X and Y we are indexing them in parallel using the array i
B, M, N = 13, 7, 19
X = np.random.randint(100, size= [B,M,N])
Y = np.random.randint(M , size= [B,1])
Z = np.random.randint(100, size= [B,N])
i = np.arange(B)
Y = Y.ravel() # reducing array to rank-1, for easy indexing
Z[i] = X[i,Y[i],:]
this code can be further simplified as
-> Z[i] = X[i,Y[i],:]
-> Z[i] = X[i,Y[i]]
-> Z[i] = X[i,Y]
-> Z = X[i,Y]
pytorch equivalent code
B, M, N = 5, 7, 3
X = torch.randint(100, size= [B,M,N])
Y = torch.randint(M , size= [B,1])
Z = torch.randint(100, size= [B,N])
i = torch.arange(B)
Y = Y.ravel()
Z = X[i,Y]
The answer provided by #Hammad is short and perfect for the job. Here's an alternative solution if you're interested in using some less known Pytorch built-ins. We will use torch.gather (similarly you can achieve this with numpy.take).
The idea behind torch.gather is to construct a new tensor-based on two identically shaped tensors containing the indices (here ~ Y) and the values (here ~ X).
The operation performed is Z[i][j][k] = X[i][Y[i][j][k]][k].
Since X's shape is (B, M, N) and Y shape is (B, 1) we are looking to fill in the blanks inside Y such that Y's shape becomes (B, 1, N).
This can be achieved with some axis manipulation:
>>> Y.expand(-1, N)[:, None] # expand to dim=1 to N and unsqueeze dim=1
The actual call to torch.gather will be:
>>> X.gather(dim=1, index=Y.expand(-1, N)[:, None])
Which you can reshape to (B, N) by adding in [:, 0].
This function can be very effective in tricky scenarios...

Efficient implementation of factorization machine with matrix operations?

Link is here : https://www.csie.ntu.edu.tw/~r01922136/slides/ffm.pdf (slides 5-6)
Given the following matrices:
X : n * d
W : d * k
Is there an efficient way to calculate the n x 1 matrix using only matrix operations (eg. numpy, tensorflow), where the jth element is :
EDIT:
Current attempt is this, but obviously it's not very space efficient, as it requires storing matrices of size n*d*d :
n = 1000
d = 256
k = 32
x = np.random.normal(size=[n,d])
w = np.random.normal(size=[d,k])
xxt = np.matmul(x.reshape([n,d,1]),x.reshape([n,1,d]))
wwt = np.matmul(w.reshape([1,d,k]),w.reshape([1,k,d]))
output = xxt*wwt
output = np.sum(output,(1,2))
Avoid large temporary arrays
Not all types of algorithms are that easily or obviously to vectorize. The np.sum(xxt*wwt) can be rewritten using np.einsum. This should be faster than your solution, but has some other limitations (eg. no multithreading).
I would therefor suggest using a compiler like Numba.
Example
import numpy as np
import numba as nb
import time
#nb.njit(fastmath=True,parallel=True)
def factorization_nb(w,x):
n = x.shape[0]
d = x.shape[1]
k = w.shape[1]
output=np.empty(n,dtype=w.dtype)
wwt=np.dot(w.reshape((d,k)),w.reshape((k,d)))
for i in nb.prange(n):
sum=0.
for j in range(d):
for jj in range(d):
sum+=x[i,j]*x[i,jj]*wwt[j,jj]
output[i]=sum
return output
def factorization_orig(w,x):
n = x.shape[0]
d = x.shape[1]
k = w.shape[1]
xxt = np.matmul(x.reshape([n,d,1]),x.reshape([n,1,d]))
wwt = np.matmul(w.reshape([1,d,k]),w.reshape([1,k,d]))
output = xxt*wwt
output = np.sum(output,(1,2))
return output
Mesuring Performance
n = 1000
d = 256
k = 32
x = np.random.normal(size=[n,d])
w = np.random.normal(size=[d,k])
#first call has some compilation overhead
res_1=factorization_nb(w,x)
t1=time.time()
for i in range(100):
res_1=factorization_nb(w,x)
#res_2=factorization_orig(w,x)
print(time.time()-t1)
Timings
factorization_nb: 4.2 ms per iteration
factorization_orig: 460 ms per iteration (110x speedup)
For an einsum implemtnation in pytorch, it would be something like
V = torch.randn([50, 10])
x = torch.randn([50])
result = (torch.einsum('ik,jk,i,j->', V, V, x, x)-torch.einsum('ik,ik,i,i->', V, V, x, x))/2
where we subtract the contribution from the feature weight being dotted with itself.

matmul function for vector with tensor multiplication in tensorflow

In general when we multiply a vector v of dimension 1*n with a tensor T of dimension m*n*k, we expect to get a matrix/tensor of dimension m*k/m*1*k. This means that our tensor has m slices of matrices with dimension n*k, and v is multiplied to each matrix and the resulting vectors are stacked together. In order to do this multiplication in tensorflow, I came up with the following formulation. I am just wondering if there is any built-in function that does this standard multiplication straightforward?
T = tf.Variable(tf.random_normal((m,n,k)), name="tensor")
v = tf.Variable(tf.random_normal((1,n)), name="vector")
c = tf.stack([v,v]) # m times, here set m=2
output = tf.matmul(c,T)
You can do it with:
tf.reduce_sum(tf.expand_dims(v,2)*T,1)
Code:
m, n, k = 2, 3, 4
T = tf.Variable(tf.random_normal((m,n,k)), name="tensor")
v = tf.Variable(tf.random_normal((1,n)), name="vector")
c = tf.stack([v,v]) # m times, here set m=2
out1 = tf.matmul(c,T)
out2 = tf.reduce_sum(tf.expand_dims(v,2)*T,1)
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
n_out1 = sess.run(out1)
n_out2 = sess.run(out2)
#both n_out1 and n_out2 matches
Not sure if there is a better way, but it sounds like you could use tf.map_fn like this:
output = tf.map_fn(lambda x: tf.matmul(v, x), T)

Evaluating the squared term of a gaussian kernel for having a covariance matrix for multi-dimensional inputs [duplicate]

I have the following code. It is taking forever in Python. There must be a way to translate this calculation into a broadcast...
def euclidean_square(a,b):
squares = np.zeros((a.shape[0],b.shape[0]))
for i in range(squares.shape[0]):
for j in range(squares.shape[1]):
diff = a[i,:] - b[j,:]
sqr = diff**2.0
squares[i,j] = np.sum(sqr)
return squares
You can use np.einsum after calculating the differences in a broadcasted way, like so -
ab = a[:,None,:] - b
out = np.einsum('ijk,ijk->ij',ab,ab)
Or use scipy's cdist with its optional metric argument set as 'sqeuclidean' to give us the squared euclidean distances as needed for our problem, like so -
from scipy.spatial.distance import cdist
out = cdist(a,b,'sqeuclidean')
I collected the different methods proposed here, and in two other questions, and measured the speed of the different methods:
import numpy as np
import scipy.spatial
import sklearn.metrics
def dist_direct(x, y):
d = np.expand_dims(x, -2) - y
return np.sum(np.square(d), axis=-1)
def dist_einsum(x, y):
d = np.expand_dims(x, -2) - y
return np.einsum('ijk,ijk->ij', d, d)
def dist_scipy(x, y):
return scipy.spatial.distance.cdist(x, y, "sqeuclidean")
def dist_sklearn(x, y):
return sklearn.metrics.pairwise.pairwise_distances(x, y, "sqeuclidean")
def dist_layers(x, y):
res = np.zeros((x.shape[0], y.shape[0]))
for i in range(x.shape[1]):
res += np.subtract.outer(x[:, i], y[:, i])**2
return res
# inspired by the excellent https://github.com/droyed/eucl_dist
def dist_ext1(x, y):
nx, p = x.shape
x_ext = np.empty((nx, 3*p))
x_ext[:, :p] = 1
x_ext[:, p:2*p] = x
x_ext[:, 2*p:] = np.square(x)
ny = y.shape[0]
y_ext = np.empty((3*p, ny))
y_ext[:p] = np.square(y).T
y_ext[p:2*p] = -2*y.T
y_ext[2*p:] = 1
return x_ext.dot(y_ext)
# https://stackoverflow.com/a/47877630/648741
def dist_ext2(x, y):
return np.einsum('ij,ij->i', x, x)[:,None] + np.einsum('ij,ij->i', y, y) - 2 * x.dot(y.T)
I use timeit to compare the speed of the different methods. For the comparison, I use vectors of length 10, with 100 vectors in the first group, and 1000 vectors in the second group.
import timeit
p = 10
x = np.random.standard_normal((100, p))
y = np.random.standard_normal((1000, p))
for method in dir():
if not method.startswith("dist_"):
continue
t = timeit.timeit(f"{method}(x, y)", number=1000, globals=globals())
print(f"{method:12} {t:5.2f}ms")
On my laptop, the results are as follows:
dist_direct 5.07ms
dist_einsum 3.43ms
dist_ext1 0.20ms <-- fastest
dist_ext2 0.35ms
dist_layers 2.82ms
dist_scipy 0.60ms
dist_sklearn 0.67ms
While the two methods dist_ext1 and dist_ext2, both based on the idea of writing (x-y)**2 as x**2 - 2*x*y + y**2, are very fast, there is a downside: When the distance between x and y is very small, due to cancellation error the numerical result can sometimes be (very slightly) negative.
Another solution besides using cdist is the following
difference_squared = np.zeros((a.shape[0], b.shape[0]))
for dimension_iterator in range(a.shape[1]):
difference_squared = difference_squared + np.subtract.outer(a[:, dimension_iterator], b[:, dimension_iterator])**2.

solving a sparse non linear system of equations using scipy.optimize.root

I want to solve the following non-linear system of equations.
Notes
the dot between a_k and x represents dot product.
the 0 in the first equation represents 0 vector and 0 in the second equation is scaler 0
all the matrices are sparse if that matters.
Known
K is an n x n (positive definite) matrix
each A_k is a known (symmetric) matrix
each a_k is a known n x 1 vector
N is known (let's say N = 50). But I need a method where I can easily change N.
Unknown (trying to solve for)
x is an n x 1 a vector.
each alpha_k for 1 <= k <= N a scaler
My thinking.
I am thinking of using scipy root to find x and each alpha_k. We essentially have n equations from each row of the first equation and another N equations from the constraint equations to solve for our n + N variables. Therefore we have the required number of equations to have a solution.
I also have a reliable initial guess for x and the alpha_k's.
Toy example.
n = 4
N = 2
K = np.matrix([[0.5, 0, 0, 0], [0, 1, 0, 0],[0,0,1,0], [0,0,0,0.5]])
A_1 = np.matrix([[0.98,0,0.46,0.80],[0,0,0.56,0],[0.93,0.82,0,0.27],[0,0,0,0.23]])
A_2 = np.matrix([[0.23, 0,0,0],[0.03,0.01,0,0],[0,0.32,0,0],[0.62,0,0,0.45]])
a_1 = np.matrix(scipy.rand(4,1))
a_2 = np.matrix(scipy.rand(4,1))
We are trying to solve for
x = [x1, x2, x3, x4] and alpha_1, alpha_2
Questions:
I can actually brute force this toy problem and feed it to the solver. But how do I do I solve this toy problem in such a way that I can extend it easily to the case when I have let's say n=50 and N=50
I will probably have to explicitly compute the Jacobian for larger matrices??.
Can anyone give me any pointers?
I think the scipy.optimize.root approach holds water, but steering clear of the trivial solution might be the real challenge for this system of equations.
In any event, this function uses root to solve the system of equations.
def solver(x0, alpha0, K, A, a):
'''
x0 - nx1 numpy array. Initial guess on x.
alpha0 - nx1 numpy array. Initial guess on alpha.
K - nxn numpy.array.
A - Length N List of nxn numpy.arrays.
a - Length N list of nx1 numpy.arrays.
'''
# Establish the function that produces the rhs of the system of equations.
n = K.shape[0]
N = len(A)
def lhs(x_alpha):
'''
x_alpha is a concatenation of x and alpha.
'''
x = np.ravel(x_alpha[:n])
alpha = np.ravel(x_alpha[n:])
lhs_top = np.ravel(K.dot(x))
for k in xrange(N):
lhs_top += alpha[k]*(np.ravel(np.dot(A[k], x)) + np.ravel(a[k]))
lhs_bottom = [0.5*x.dot(np.ravel(A[k].dot(x))) + np.ravel(a[k]).dot(x)
for k in xrange(N)]
lhs = np.array(lhs_top.tolist() + lhs_bottom)
return lhs
# Solve the system of equations.
x0.shape = (n, 1)
alpha0.shape = (N, 1)
x_alpha_0 = np.vstack((x0, alpha0))
sol = root(lhs, x_alpha_0)
x_alpha_root = sol['x']
# Compute norm of residual.
res = sol['fun']
res_norm = np.linalg.norm(res)
# Break out the x and alpha components.
x_root = x_alpha_root[:n]
alpha_root = x_alpha_root[n:]
return x_root, alpha_root, res_norm
Running on the toy example, however, only produces the trivial solution.
# Toy example.
n = 4
N = 2
K = np.matrix([[0.5, 0, 0, 0], [0, 1, 0, 0],[0,0,1,0], [0,0,0,0.5]])
A_1 = np.matrix([[0.98,0,0.46,0.80],[0,0,0.56,0],[0.93,0.82,0,0.27],
[0,0,0,0.23]])
A_2 = np.matrix([[0.23, 0,0,0],[0.03,0.01,0,0],[0,0.32,0,0],
[0.62,0,0,0.45]])
a_1 = np.matrix(scipy.rand(4,1))
a_2 = np.matrix(scipy.rand(4,1))
A = [A_1, A_2]
a = [a_1, a_2]
x0 = scipy.rand(n, 1)
alpha0 = scipy.rand(N, 1)
print 'x0 =', x0
print 'alpha0 =', alpha0
x_root, alpha_root, res_norm = solver(x0, alpha0, K, A, a)
print 'x_root =', x_root
print 'alpha_root =', alpha_root
print 'res_norm =', res_norm
Output is
x0 = [[ 0.00764503]
[ 0.08058471]
[ 0.88300129]
[ 0.85299622]]
alpha0 = [[ 0.67872815]
[ 0.69693346]]
x_root = [ 9.88131292e-324 -4.94065646e-324 0.00000000e+000
0.00000000e+000]
alpha_root = [ -4.94065646e-324 0.00000000e+000]
res_norm = 0.0