Make an LSTM prediction written in NumPy faster - numpy

I have written a bidirectional LSTM prediction function in NumPy (not TensorFlow or PyTorch), and I need to make it faster. The network has three layers, but for simplicity I will only show (and time) the first one. This bi-LSTM layer calls the subfunctions LSTMf() and LSTMb() to process the input data (an array of 500 points) forwards and backwards. LSTMf() and LSTMb() contain loops, which I suspect take most of the time. Here is the prediction function:
import numpy as np
def predict(xt, ht, c, u, t, whff, wxff, bff, whif, wxif, bif, whlf, wxlf, blf, whof, wxof, bof, whfb,
            wxfb, bfb, whib, wxib, bib, whlb, wxlb, blb, whob, wxob, bob):
    def tanh(a):
        return np.tanh(a)
    def sig(a):
        return 1 / (1 + np.exp(-a))
    def cell(x, h, c, wh1, wx1, b1, wh2, wx2, b2, wh3, wx3, b3, wh4, wx4, b4):
        new_c = c * sig(h @ wh1 + x @ wx1 + b1) + sig(h @ wh2 + x @ wx2 + b2) * tanh(h @ wh3 + x @ wx3 + b3)
        new_h = tanh(new_c) * sig(h @ wh4 + x @ wx4 + b4)
        return new_c, new_h
    def LSTMf(xt, ht, c, t, whf, wxf, bf, whi, wxi, bi, whl, wxl, bl, who, wxo, bo):
        h = ht[t - 1:t]
        for i in range(t):
            c, h = cell(xt[i:i + 1], h, c, whf, wxf, bf, whi, wxi, bi, whl, wxl, bl, who, wxo, bo)
            ht[i] = h
        return ht
    def LSTMb(xt, ht, c, t, whf, wxf, bf, whi, wxi, bi, whl, wxl, bl, who, wxo, bo):
        h = ht[0:1]
        for i in range(t - 1, -1, -1):
            c, h = cell(xt[i:i + 1], h, c, whf, wxf, bf, whi, wxi, bi, whl, wxl, bl, who, wxo, bo)
            ht[i] = h
        return ht
    # LSTM-bi 1
    hf = LSTMf(xt, ht.copy(), c, t, whff, wxff, bff, whif, wxif, bif, whlf, wxlf, blf, whof, wxof, bof)
    hb = LSTMb(xt, ht.copy(), c, t, whfb, wxfb, bfb, whib, wxib, bib, whlb, wxlb, blb, whob, wxob, bob)
    xt = np.concatenate((hf, hb), axis=1)
    return xt
The input data and the rest of parameters can be artificially generated with the following code:
t = 500 # input's number of points
u = 64 # layer's number of units
xt = np.zeros((t, 1), dtype=np.float32) # input
ht = np.zeros((t, u), dtype=np.float32)
ou = np.zeros((1, u), dtype=np.float32)
uu = np.zeros((u, u), dtype=np.float32)
weights = {'wxif':ou,'wxff':ou,'wxlf':ou,'wxof':ou,'whif':uu,'whff':uu,'whlf':uu,'whof':uu,'bif':ou,'bff':ou,'blf':ou,'bof':ou,
'wxib':ou,'wxfb':ou,'wxlb':ou,'wxob':ou,'whib':uu,'whfb':uu,'whlb':uu,'whob':uu,'bib':ou,'bfb':ou,'blb':ou,'bob':ou}
yt = predict(xt, ht, ou, u, t, **weights) # Call example
I have timed it (1) like this, (2) with Numba, and (3) with Cython:
import numpy as np
from predict import predict
from predict_numba import predict_numba
from predict_cython import predict_cython
import timeit
n = 100
print(timeit.Timer(lambda: predict(xt, ht, ou, u, t, **weights)).timeit(n)/n) # 0.05198 s
predict_numba(xt, ht, ou, u, t, **weights) # Dummy slow numba call
print(timeit.Timer(lambda: predict_numba(xt, ht, ou, u, t, **weights)).timeit(n)/n) # 0.01149 s
print(timeit.Timer(lambda: predict_cython(xt, ht, ou, u, t, **weights)).timeit(n)/n) # 0.13345 s
I would like to make this prediction faster than 0.03 s.
Numba is fast enough but I cannot have a very slow first call (more than 30 s for the three layers)
Cython is very slow; I'm not sure if this is the reason, but following the advice here (Cython: matrix multiplication) I did not type most parameters, since the '@' operator does not support memory views.
Originally I was using Keras with CPU or GPU, but NumPy is faster than either. I have also heard of TorchScript which might be applicable. What can I do to make the prediction faster?
__
Context: This function predicts the R-peaks in an ECG window, and is meant to be called as frequently as possible, to predict the R-peaks of an ECG being acquired in real-time.
PS. In case you want to make sense of the calculations, this description of how an LSTM cell works might be of use: https://imgur.com/UFrd9oa
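One option that may be worth timing, sketched here under assumptions: the four gate products in cell() share the same h and x, so the weights can be concatenated once and each time step then needs a single (1, u) @ (u, 4u) product, while the input term xt @ wx + b can be computed for all steps before the loop. The names fuse_gates and lstm_fused are hypothetical, and the gate order (f, i, l, o) simply follows the argument order used in the question. Whether this reaches the 0.03 s target is not something this sketch demonstrates.

```python
import numpy as np

def sig(a):
    return 1.0 / (1.0 + np.exp(-a))

def fuse_gates(whf, wxf, bf, whi, wxi, bi, whl, wxl, bl, who, wxo, bo):
    # Hypothetical helper: stack the four gates column-wise in the order f, i, l, o.
    wh = np.concatenate((whf, whi, whl, who), axis=1)  # (u, 4u)
    wx = np.concatenate((wxf, wxi, wxl, wxo), axis=1)  # (1, 4u)
    b = np.concatenate((bf, bi, bl, bo), axis=1)       # (1, 4u)
    return wh, wx, b

def lstm_fused(xt, h, c, wh, wx, b, reverse=False):
    # xt: (t, 1) input, h and c: (1, u) initial states.
    t, u = xt.shape[0], wh.shape[0]
    xw = xt @ wx + b  # input contribution for every step at once, (t, 4u)
    out = np.empty((t, u), dtype=xw.dtype)
    steps = range(t - 1, -1, -1) if reverse else range(t)
    for i in steps:
        z = h @ wh + xw[i:i + 1]           # one matrix product per step
        f = sig(z[:, :u])                  # forget gate
        g = sig(z[:, u:2 * u])             # input gate
        cand = np.tanh(z[:, 2 * u:3 * u])  # candidate cell state
        o = sig(z[:, 3 * u:])              # output gate
        c = c * f + g * cand
        h = np.tanh(c) * o
        out[i] = h[0]
    return out
```

The forward and backward passes would then be lstm_fused(..., reverse=False) and lstm_fused(..., reverse=True), with the fused weights built once outside the prediction call.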

Related

How to fit data to model with analytical gradient in basinhopping or with another gradient descent method?

I'd like to fit experimental data to a model and extract the optimal model parameters, i.e. the parameters that minimize the error between the model function and the experimental data. To get them, I'd like to use a gradient descent method, TensorFlow, Bayesian inference, basinhopping, or something else that copes well with bad initial estimates and is robust. To speed things up, I'd like to supply the analytical gradient, for example in basinhopping. How do I do that with the basinhopping routine from SciPy? In the following example code, I have an example function and I'd like to use the analytical Jacobian instead of the numerical one, but I get an error. Do I have to sum up the Jacobian components?
Example code (my actual function is much more complex)
import random
import matplotlib.pyplot as plt
import numpy as np
# symbolic math
from sympy import lambdify, symbols, cos
from sympy.tensor.array import derive_by_array
# fitting
from scipy.optimize import basinhopping
# symbolic math with sympy ---
s_lst = x, a, b, c, d = symbols('x, a, b, c d', positive=True)
# mathematical function
y = a*x + cos(b*x)**2 * c*x**2 + d
# jacobian (derivatives after model parameters)
params = s_lst[1:]
jac_y = derive_by_array(y, params)
# translate sympy expression to python function
# function
get_y = lambdify(s_lst, y)
# jacobian (derivatives in a, b, c, d)
get_jac_y = [lambdify(s_lst, element) for element in jac_y]
#print(len(get_jac_y))
# data ---
x = np.linspace(0, 1, 500)
# measurement data
a = [random.randrange(4, 6, 1) for i in range(len(x))]
b = [random.randrange(3190, 3290, 1) for i in range(len(x))]
c = [random.randrange(90, 109, 1) for i in range(len(x))]
d = [0.1*random.randrange(0, 2, 1) for i in range(len(x))]
y_measured = get_y(x, a, b, c, d)
# exemplary model data
a, b, c, d = 5, 3200, 100, 1
y_model = get_y(x, a, b, c, d)
# plot
plt.plot(x, y_measured)
plt.plot(x, y_model)
plt.title('exemplary model and measured data')
plt.show()
# functions for fitting
def func(params, args1, args2=None):
    a, b, c, d = params
    y = get_y(args1, a, b, c, d)
    if args2 is None:
        return y
    return np.sum((y - args2)**2)
# derivatives
def dfunc(params, args1, args2):
    a, b, c, d = params
    jac = [jac(args1, a, b, c, d) for jac in get_jac_y]
    # because the derivative in d is one
    jac[-1] = np.ones(len(args1))
    return np.asarray(jac)
# function and derivatives
def objective_func(params, args1, args2):
    f = func(params, args1, args2)
    df = dfunc(params, args1, args2)
    return f, df
# fit with basinhopping and scipy ---
# initial model parameters
x0 = [1, 2, 33, 4]
# minimization with numerical jacobian, gives a result
minimizer_kwargs = {"args":(x, y_measured), 'method':'L-BFGS-B'}
ret = basinhopping(func, x0, minimizer_kwargs=minimizer_kwargs)
# minimization with analytical jacobian, fails,
# error: failed in converting 7th argument `g' of _lbfgsb.setulb to C/Fortran array
minimizer_kwargs = {"args":(x, y_measured), 'method':'L-BFGS-B', 'jac':True}
ret = basinhopping(objective_func, x0, minimizer_kwargs=minimizer_kwargs)
If I put in dfunc something like return [np.sum((j)) for j in jac] the program runs but fails. What would be the correct expression?
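A hedged sketch of what the gradient could look like: with 'jac': True, the minimizer expects the objective to return the scalar cost and a gradient of shape (n_params,). Since the cost is sum((y - y_measured)**2), the chain rule gives grad_k = 2 * sum_i (y_i - y_measured_i) * dy_i/dp_k, so the model Jacobian has to be weighted by the residual before summing. To stay self-contained, the model and its derivatives are written by hand here instead of coming from lambdify; model, model_jac, objective and fit are hypothetical names.

```python
import numpy as np
from scipy.optimize import basinhopping

def model(params, x):
    a, b, c, d = params
    return a * x + np.cos(b * x)**2 * c * x**2 + d

def model_jac(params, x):
    # Rows are dy/da, dy/db, dy/dc, dy/dd, each evaluated at every x.
    a, b, c, d = params
    return np.stack([
        x,
        -2.0 * np.cos(b * x) * np.sin(b * x) * c * x**3,
        np.cos(b * x)**2 * x**2,
        np.ones_like(x),
    ])

def objective(params, x, y_measured):
    residual = model(params, x) - y_measured
    cost = np.sum(residual**2)
    # Chain rule: weight each Jacobian row by the residual, then sum over x.
    grad = 2.0 * model_jac(params, x) @ residual
    return cost, grad

def fit(x, y_measured, x0, niter=10):
    # jac=True tells the local minimizer that objective returns (cost, grad).
    minimizer_kwargs = {"args": (x, y_measured), "method": "L-BFGS-B", "jac": True}
    return basinhopping(objective, x0, minimizer_kwargs=minimizer_kwargs, niter=niter)
```

Summing the plain Jacobian rows, as in return [np.sum(j) for j in jac], drops the 2 * (y - y_measured) factor, which would explain why that version runs but does not converge to a sensible fit.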

SVD Inversion, Moore Penrose and LSQ give different answers using Numpy

I am solving a matrix using different methods. According to my interpretation of the numpy descriptions, all three of my tested methods (SVD inversion, moore-penrose inversion, and Least Squares) should result in the same answer. However, the SVD inversion results in a very different answer. I cannot find a mathematical reason for this in Numerical Recipes. Is there a Numpy implementation nuance that is causing this?
I am using the following code on Python 3.8.10, Numpy 1.21.4, in a jupyter notebook
y = np.array([176, 166, 194])
x = np.array([324, 322, 376])
x = np.stack([x, np.ones_like(x)], axis=1)
# Solve the matrix using singular value decomposition
u, s, vh = np.linalg.svd(x, full_matrices=False)
s = np.where(s < np.finfo(s.dtype).eps, 0, s)
manual_scale, manual_offset = vh @ np.linalg.inv(np.diag(s)) @ u.T @ y
display(manual_scale, manual_offset, manual_scale * x + manual_offset)
# Solve the matrix using Moore-Penrose Inversion
# Manually
manual_scale, manual_offset = np.linalg.inv(x.T @ x) @ x.T @ y
display(manual_scale, manual_offset, manual_scale * x + manual_offset)
# Using supplied numpy methods
manual_scale, manual_offset = np.linalg.pinv(x) @ y
display(manual_scale, manual_offset, manual_scale * x + manual_offset)
# Solve using lstsq
((manual_scale, manual_offset), residuals, rank, s) = np.linalg.lstsq(x, y)
display(manual_scale, manual_offset, manual_scale * x + manual_offset)
The output (edited for clarity) is then
'SVD'
0.6091639943577222
29.167637174498772
array([[226.53677135, 29.77680117],
[225.31844336, 29.77680117],
[258.21329905, 29.77680117]])
'Manual Moore-Penrose'
0.4388335704125341
29.170697012800005
array([[171.35277383, 29.60953058],
[170.47510669, 29.60953058],
[194.17211949, 29.60953058]])
'Moore-Penrose'
0.43883357041251736
29.170697012802187
array([[171.35277383, 29.60953058],
[170.47510669, 29.60953058],
[194.17211949, 29.60953058]])
'LSTSQ'
/tmp/ipykernel_261995/387148285.py:24: FutureWarning: `rcond` parameter will change to the default of machine precision times ``max(M, N)`` where M and N are the input matrix dimensions.
To use the future default and silence this warning we advise to pass `rcond=None`, to keep using the old, explicitly pass `rcond=-1`.
((manual_scale, manual_offset), residuals, rank, s) = np.linalg.lstsq(x, y)
0.43883357041251814
29.17069701280214
array([[171.35277383, 29.60953058],
[170.47510669, 29.60953058],
[194.17211949, 29.60953058]])
As you can see, the latter three methods give the same result, yet the manual SVD calculation differs. What is going on?
You are missing a transpose of vh. The SVD solution should be
manual_scale, manual_offset = vh.T @ np.linalg.inv(np.diag(s)) @ u.T @ y
By the way, you can simplify the inverse of the diagonal factor:
manual_scale, manual_offset = vh.T @ np.diag(1/s) @ u.T @ y
(That assumes there are no zeros in s.)
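If zeros (or near-zeros) in s are a concern, a minimal sketch of the same SVD solve that skips them could look like this. The function name svd_solve and the default cutoff (machine epsilon times max(M, N), relative to the largest singular value) are assumptions chosen for illustration, not something the original code prescribes.

```python
import numpy as np

def svd_solve(x, y, rcond=None):
    """Least-squares solution of x @ coef = y via the SVD, ignoring tiny singular values."""
    u, s, vh = np.linalg.svd(x, full_matrices=False)
    if rcond is None:
        # Assumed cutoff: machine precision times the larger matrix dimension.
        rcond = np.finfo(s.dtype).eps * max(x.shape)
    keep = s > rcond * s.max()
    s_inv = np.zeros_like(s)
    s_inv[keep] = 1.0 / s[keep]  # invert only the singular values that are kept
    # x = u @ diag(s) @ vh, so the pseudo-inverse is vh.T @ diag(1/s) @ u.T
    return vh.T @ (s_inv * (u.T @ y))

y = np.array([176, 166, 194])
x = np.array([324, 322, 376])
x = np.stack([x, np.ones_like(x)], axis=1)
scale, offset = svd_solve(x, y)
```

Multiplying by s_inv elementwise avoids building the diagonal matrix at all.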
For the next person who needs this, the fixed code is below. Thanks Warren!
y = np.array([176, 166, 194])
x = np.array([324, 322, 376])
x = np.stack([x, np.ones_like(x)], axis=1)
# Solve the matrix using singular value decomposition
u, s, vh = np.linalg.svd(x, full_matrices=False)
s = np.where(s < np.finfo(s.dtype).eps, 0, s)
manual_scale, manual_offset = vh.T @ np.diag(1/s) @ u.T @ y
display('SVD')
display(manual_scale, manual_offset, manual_scale * x + manual_offset)
# Solve the matrix using Moore-Penrose Inversion
# Manually
manual_scale, manual_offset = np.linalg.inv(x.T @ x) @ x.T @ y
display('Manual Moore-Penrose')
display(manual_scale, manual_offset, manual_scale * x + manual_offset)
# Using supplied numpy methods
manual_scale, manual_offset = np.linalg.pinv(x) @ y
display('Moore-Penrose')
display(manual_scale, manual_offset, manual_scale * x + manual_offset)
# Solve using lstsq
display('LSTSQ')
((manual_scale, manual_offset), residuals, rank, s) = np.linalg.lstsq(x, y)
display(manual_scale, manual_offset, manual_scale * x + manual_offset)

What is wrong with my cython implementation of erosion operation of mathematical morphology

I have produced a naive implementation of "erosion". Performance is not relevant since I am just trying to understand the algorithm. However, the output of my implementation does not match the one I get from scipy.ndimage. What is wrong with my implementation?
Here is my implementation with a small test case:
import numpy as np
from PIL import Image
# a small image to play with a cross structuring element
imgmat = np.array([
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
[0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0],
[0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0],
[0,0,0,0,0,1,1,0,0,1,1,1,1,0,0,0,1,1,0,0,1,0,0,0,0,0,0,0,0,0],
[0,0,0,0,0,1,1,0,0,1,1,1,1,0,0,0,1,1,1,0,1,0,0,0,0,0,0,0,0,0],
[0,0,0,0,0,1,1,0,0,1,1,1,1,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0,0],
[0,0,0,0,0,1,1,0,0,1,1,1,1,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0,0],
[0,0,0,0,0,1,1,0,0,1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0],
[0,0,0,0,0,1,1,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
[0,0,0,0,0,1,1,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
[0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,1,1,1,1,0,0,1,0,1,1,0,0,0,0],
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,1,0,1,1,0,0,0,0],
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,1,1,0,0,0,0],
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,1,1,0,0,0,0],
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0],
])
imgmat2 = np.where(imgmat == 0, 0, 255).astype(np.uint8)
imarr = Image.fromarray(imgmat2).resize((100, 200))
imarr = np.array(imarr)
imarr = np.where(imarr == 0, 0, 1)
se_mat3 = np.array([
[0,1,0],
[1,1,1],
[0,1,0]
])
se_mat31 = np.where(se_mat3 == 1, 0, 1)
The imarr is: [image not shown]
My implementation of erosion:
%%cython -a
import numpy as np
cimport numpy as cnp
cdef erosionC(cnp.ndarray[cnp.int_t, ndim=2] img,
              cnp.ndarray[cnp.int_t, ndim=2] B, cnp.ndarray[cnp.int_t, ndim=2] X):
    """
    X: image coordinates
    struct_element_mat: black and white image, black region is considered as the shape
    of structuring element
    This operation checks whether (B *includes* X) = $B \subset X$
    as per defined in
    Serra (Jean), "Introduction to mathematical morphology",
    Computer Vision, Graphics, and Image Processing,
    vol. 35, no. 3 (September 1986).
    URL : https://linkinghub.elsevier.com/retrieve/pii/0734189X86900022..
    doi: 10.1016/0734-189X(86)90002-2
    Consulted on 6 August 2020, p. 283-305.
    """
    cdef cnp.ndarray[cnp.int_t, ndim=1] a, x, bx
    cdef cnp.ndarray[cnp.int_t, ndim=2] Bx, B_frame, Xcp, b
    cdef bint check
    a = B[0] # get an anchor point from the structuring element coordinates
    B_frame = B - a # express the se element coordinates with respect to the anchor point
    Xcp = X.copy()
    b = img.copy()
    for x in X: # X contains the foreground coordinates in the image
        Bx = B_frame + x # translate relative coordinates with respect to foreground coordinates considering it as the anchor point
        check = True # this is erosion so if any of the se coordinates is not in foreground coordinates we consider it a miss
        for bx in Bx: # Bx contains all the translated coordinates of se
            if bx not in Xcp:
                check = False
        if check:
            b[x[0], x[1]] = 1 # if there is a hit
        else:
            b[x[0], x[1]] = 0 # if there is no hit
    return b
def erosion(img: np.ndarray, struct_el_mat: np.ndarray, foregroundValue = 0):
    B = np.argwhere(struct_el_mat == 0)
    X = np.argwhere(img == foregroundValue)
    nimg = erosionC(img, B, X)
    return np.where(nimg == 1, 255, 0)
The calling code for both is:
from scipy import ndimage as nd
err = nd.binary_erosion(imarr, se_mat3)
imerrCustom = erosion(imarr, se_mat31, foregroundValue=1)
err produces: [image not shown]
imerrCustom produces: [image not shown]
In the end I am still not sure, but after reading several more papers, I assume that my interpretation of X as the foreground coordinates was the error. The iteration should probably have run over the entire image.
As stated, I am not sure this interpretation is correct either. But I wrote a new implementation that iterates over the whole image, and it gives a more plausible result. I am sharing it here in the hope that it helps someone:
%%cython -a
import numpy as np
cimport numpy as cnp
cdef dilation_c(cnp.ndarray[cnp.uint8_t, ndim=2] X,
                cnp.ndarray[cnp.uint8_t, ndim=2] SE):
    """
    X: boolean image
    SE: structuring element matrix
    origin: coordinate of the origin of the structuring element
    This operation checks whether (B *hits* X) = $B \cap X \not = \emptyset$
    as per defined in
    Serra (Jean), "Introduction to mathematical morphology",
    Computer Vision, Graphics, and Image Processing,
    vol. 35, no. 3 (September 1986).
    URL : https://linkinghub.elsevier.com/retrieve/pii/0734189X86900022..
    doi: 10.1016/0734-189X(86)90002-2
    Consulted on 6 August 2020, p. 283-305.
    The algorithm adapts DILDIRECT of
    Najman (Laurent) and Talbot (Hugues),
    Mathematical morphology: from theory to applications,
    2013. ISBN : 9781118600788, p. 329
    to the formula given in
    Jähne (Bernd),
    Digital image processing,
    6th rev. and ext. ed, Berlin ; New York,
    2005. TA1637 .J34 2005.
    ISBN : 978-3-540-24035-8.
    """
    cdef cnp.ndarray[cnp.uint8_t, ndim=2] O
    cdef list elst
    cdef int r, c, X_rows, X_cols, SE_rows, SE_cols, se_r, se_c
    cdef cnp.ndarray[cnp.int_t, ndim=1] bp
    cdef list conds
    cdef bint check, b, p, cond
    O = np.zeros_like(X)
    X_rows, X_cols = X.shape[:2]
    SE_rows, SE_cols = SE.shape[:2]
    # a boolean convolution
    for r in range(0, X_rows-SE_rows):
        for c in range(0, X_cols - SE_cols):
            conds = []
            for se_r in range(SE_rows):
                for se_c in range(SE_cols):
                    b = <bint>SE[se_r, se_c]
                    p = <bint>X[se_r+r, se_c+c]
                    conds.append(b and p)
            O[r,c] = <cnp.uint8_t>any(conds)
    return O
def dilation_erosion(
        img: np.ndarray,
        struct_el_mat: np.ndarray,
        foregroundValue: int = 1,
        isErosion: bool = False):
    """
    img: image matrix
    struct_el: NxN mesh grid of the structuring element whose center is SE's origin
    structuring element is encoded as 1
    foregroundValue: value to be considered as foreground in the image
    """
    B = struct_el_mat.astype(np.uint8)
    if isErosion:
        X = np.where(img == foregroundValue, 0, 1).astype(np.uint8)
    else:
        X = np.where(img == foregroundValue, 1, 0).astype(np.uint8)
    nimg = dilation_c(X, B)
    foreground, background = (255, 0) if foregroundValue == 1 else (0, 1)
    if isErosion:
        return np.where(nimg == 1, background, foreground).astype(np.uint8)
    else:
        return np.where(nimg == 1, foreground, background).astype(np.uint8)
    # return nimg
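For cross-checking against scipy.ndimage, here is a plain NumPy sketch of binary erosion that iterates over the structuring element instead of the foreground coordinates. It is not the Cython code above rewritten. It assumes an odd-sized structuring element whose origin is its centre, and it treats pixels outside the image as background. The names binary_erosion_naive and agrees_with_scipy are hypothetical.

```python
import numpy as np
from scipy import ndimage

def binary_erosion_naive(img, se):
    """Erode a binary image; a pixel survives only if the whole SE fits in the foreground."""
    img = np.asarray(img, dtype=bool)
    se = np.asarray(se, dtype=bool)
    rows, cols = img.shape
    # Assumption: the origin of the structuring element is its centre.
    r0, c0 = se.shape[0] // 2, se.shape[1] // 2
    padded = np.pad(img, ((r0, se.shape[0] - 1 - r0), (c0, se.shape[1] - 1 - c0)),
                    mode="constant", constant_values=False)
    out = np.ones(img.shape, dtype=bool)
    for dr, dc in np.argwhere(se):
        # Shifted view of the image for this SE position; AND keeps only full fits.
        out &= padded[dr:dr + rows, dc:dc + cols]
    return out

def agrees_with_scipy(img, se):
    # Compare with scipy's result (its default border value is also background).
    return np.array_equal(binary_erosion_naive(img, se), ndimage.binary_erosion(img, se))
```

Calling agrees_with_scipy(imarr, se_mat3) with the arrays from the question is one way to check a custom implementation pixel by pixel.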

Evaluating the squared term of a gaussian kernel for having a covariance matrix for multi-dimensional inputs [duplicate]

I have the following code. It is taking forever in Python. There must be a way to translate this calculation into a broadcast...
def euclidean_square(a,b):
    squares = np.zeros((a.shape[0],b.shape[0]))
    for i in range(squares.shape[0]):
        for j in range(squares.shape[1]):
            diff = a[i,:] - b[j,:]
            sqr = diff**2.0
            squares[i,j] = np.sum(sqr)
    return squares
You can use np.einsum after calculating the differences in a broadcasted way, like so -
ab = a[:,None,:] - b
out = np.einsum('ijk,ijk->ij',ab,ab)
Or use scipy's cdist with its optional metric argument set as 'sqeuclidean' to give us the squared euclidean distances as needed for our problem, like so -
from scipy.spatial.distance import cdist
out = cdist(a,b,'sqeuclidean')
I collected the different methods proposed here, and in two other questions, and measured the speed of the different methods:
import numpy as np
import scipy.spatial
import sklearn.metrics
def dist_direct(x, y):
    d = np.expand_dims(x, -2) - y
    return np.sum(np.square(d), axis=-1)
def dist_einsum(x, y):
    d = np.expand_dims(x, -2) - y
    return np.einsum('ijk,ijk->ij', d, d)
def dist_scipy(x, y):
    return scipy.spatial.distance.cdist(x, y, "sqeuclidean")
def dist_sklearn(x, y):
    return sklearn.metrics.pairwise.pairwise_distances(x, y, "sqeuclidean")
def dist_layers(x, y):
    res = np.zeros((x.shape[0], y.shape[0]))
    for i in range(x.shape[1]):
        res += np.subtract.outer(x[:, i], y[:, i])**2
    return res
# inspired by the excellent https://github.com/droyed/eucl_dist
def dist_ext1(x, y):
    nx, p = x.shape
    x_ext = np.empty((nx, 3*p))
    x_ext[:, :p] = 1
    x_ext[:, p:2*p] = x
    x_ext[:, 2*p:] = np.square(x)
    ny = y.shape[0]
    y_ext = np.empty((3*p, ny))
    y_ext[:p] = np.square(y).T
    y_ext[p:2*p] = -2*y.T
    y_ext[2*p:] = 1
    return x_ext.dot(y_ext)
# https://stackoverflow.com/a/47877630/648741
def dist_ext2(x, y):
    return np.einsum('ij,ij->i', x, x)[:,None] + np.einsum('ij,ij->i', y, y) - 2 * x.dot(y.T)
I use timeit to compare the speed of the different methods. For the comparison, I use vectors of length 10, with 100 vectors in the first group, and 1000 vectors in the second group.
import timeit
p = 10
x = np.random.standard_normal((100, p))
y = np.random.standard_normal((1000, p))
for method in dir():
    if not method.startswith("dist_"):
        continue
    t = timeit.timeit(f"{method}(x, y)", number=1000, globals=globals())
    print(f"{method:12} {t:5.2f}ms")
On my laptop, the results are as follows:
dist_direct 5.07ms
dist_einsum 3.43ms
dist_ext1 0.20ms <-- fastest
dist_ext2 0.35ms
dist_layers 2.82ms
dist_scipy 0.60ms
dist_sklearn 0.67ms
While the two methods dist_ext1 and dist_ext2, both based on the idea of writing (x-y)**2 as x**2 - 2*x*y + y**2, are very fast, there is a downside: When the distance between x and y is very small, due to cancellation error the numerical result can sometimes be (very slightly) negative.
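If slightly negative values are a problem downstream (for example before taking a square root), one simple workaround is to clip the result at zero. The sketch below repeats the dist_ext2 formula and adds the clipping step; the name dist_ext2_clipped is hypothetical. Clipping only hides the sign issue and does not recover the precision lost to cancellation.

```python
import numpy as np

def dist_ext2_clipped(x, y):
    # Same expansion as dist_ext2: |x|^2 + |y|^2 - 2 x.y
    d = np.einsum('ij,ij->i', x, x)[:, None] + np.einsum('ij,ij->i', y, y) - 2 * x.dot(y.T)
    # Cancellation can leave tiny negative entries; force them to zero in place.
    np.maximum(d, 0, out=d)
    return d
```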
Another solution besides using cdist is the following
difference_squared = np.zeros((a.shape[0], b.shape[0]))
for dimension_iterator in range(a.shape[1]):
    difference_squared = difference_squared + np.subtract.outer(a[:, dimension_iterator], b[:, dimension_iterator])**2.

hessian of a variable returned by tf.concat() is None

Let x and y be vectors of length N, and let z be a function z = f(x,y). In Tensorflow v1.0.0, tf.hessians(z,x) and tf.hessians(z,y) both return an N by N matrix, which is what I expected.
However, when I concatenate x and y into a vector p of size 2*N using tf.concat and run tf.hessians(z, p), it raises the error "ValueError: None values not supported."
I understand this is because in the computation graph x,y ->z and x,y -> p, so there is no gradient between p and z. To circumvent the problem, I can create p first, slice it into x and y, but I will have to change a ton of my code. Is there a more elegant way?
related question: Slice of a variable returns gradient None
import tensorflow as tf
import numpy as np
N = 2
A = tf.Variable(np.random.rand(N,N).astype(np.float32))
B = tf.Variable(np.random.rand(N,N).astype(np.float32))
x = tf.Variable(tf.random_normal([N]) )
y = tf.Variable(tf.random_normal([N]) )
#reshape to N by 1
x_1 = tf.reshape(x,[N,1])
y_1 = tf.reshape(y,[N,1])
#concat x and y to form a vector with length of 2*N
p = tf.concat([x,y],axis = 0)
#define the function
z = 0.5*tf.matmul(tf.matmul(tf.transpose(x_1), A), x_1) + 0.5*tf.matmul(tf.matmul(tf.transpose(y_1), B), y_1) + 100
#works , hx and hy are both N by N matrix
hx = tf.hessians(z,x)
hy = tf.hessians(z,y)
#this gives error "ValueError: None values not supported."
#expecting a matrix of size 2*N by 2*N
hp = tf.hessians(z,p)
Compute the hessian by its definition.
gxy = tf.gradients(z, [x, y])
gp = tf.concat([gxy[0], gxy[1]], axis=0)
hp = []
for i in range(2*N):
    hp.append(tf.gradients(gp[i], [x, y]))
tf.gradients computes the sum of dy/dx over all elements of y, so to get the second partial derivatives one should slice the gradient vector into scalars and take the gradient of each. Tested on TF 1.0 and Python 2.