Autodiff implementation for gradient calculation - tensorflow

I have worked through some papers on the autodiff algorithm in order to implement it myself (for learning purposes). I compared my algorithm against TensorFlow in several test cases, and the outputs did not match in most of them. Therefore I worked through the tutorial from this site and implemented it with TensorFlow operations, restricted to matrix multiplication, since that was one of the operations that did not work:
gradient of matmul and unbroadcast method:
def gradient_matmul(node, dx, adj):
    # dx is needed to know which of the two parents to differentiate with respect to
    a = node.parents[0]
    b = node.parents[1]
    # the operation was node.tensor = tf.matmul(a.tensor, b.tensor)
    if a == dx or b == dx:
        # the result depends on which parent we differentiate with respect to
        mm = tf.matmul(adj, tf.transpose(b.tensor)) if a == dx else \
            tf.matmul(tf.transpose(a.tensor), adj)
        return mm
    else:
        return None
def unbroadcast(adjoint, node):
    dim_a = len(adjoint.shape)
    dim_b = len(node.shape)
    if dim_a > dim_b:
        sum = tuple(range(dim_a - dim_b))
        res = tf.math.reduce_sum(adjoint, axis = sum)
        return res
    return adjoint
And finally the gradient calculation autodiff algorithm:
def gradient(y, dx):
    working = [y]
    adjoints = defaultdict(float)
    adjoints[y] = tf.ones(y.tensor.shape)
    while len(working) != 0:
        curr = working.pop(0)
        if curr == dx:
            return adjoints[curr]
        if curr.is_store:
            continue
        adj = adjoints[curr]
        for p in curr.parents:
            # for testing with matrix multiplication as only operation
            local_grad = gradient_matmul(curr, p, adj)
            adjoints[p] = unbroadcast(tf.add(adjoints[p], local_grad), p.tensor)
            if not p in working:
                working.append(p)
Yet it produces the same output as my initial implementation.
I constructed a matrix multiplication test case:
x = tf.constant([[[1.0, 1.0], [2.0, 3.0]], [[4.0, 5.0], [6.0, 7.0]]])
y = tf.constant([[3.0, -7.0], [-1.0, 5.0]])
z = tf.constant([[[1, 1], [2.0, 2]], [[3, 3], [-1, -1]]])
w = tf.matmul(tf.matmul(x, y), z)
Where w should be derived for each of the variables.
Tensorflow calculates the gradient:
[<tf.Tensor: shape=(2, 2, 2), dtype=float32, numpy=
array([[[-22., 18.],
[-22., 18.]],
[[ 32., -16.],
[ 32., -16.]]], dtype=float32)>, <tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[66., -8.],
[80., -8.]], dtype=float32)>, <tf.Tensor: shape=(2, 2, 2), dtype=float32, numpy=
array([[[ 5., 5.],
[ -1., -1.]],
[[ 18., 18.],
[-10., -10.]]], dtype=float32)>]
My implementation calculates:
[[[-5. 7.]
[-5. 7.]]
[[-5. 7.]
[-5. 7.]]]
[[33. 22.]
[54. 36.]]
[[[ 9. 9.]
[14. 14.]]
[[-5. -5.]
[-6. -6.]]]
Maybe the problem is the difference between NumPy's dot and TensorFlow's matmul?
But then I don't know how to fix the gradient or unbroadcast for the TensorFlow method...
Thanks for taking the time to look over my code! :)

I found the error. gradient_matmul should have been:
def gradient_matmul(node, dx, adj):
    a = node.parents[0]
    b = node.parents[1]
    if a == dx:
        return tf.matmul(adj, b.tensor, transpose_b=True)
    elif b == dx:
        return tf.matmul(a.tensor, adj, transpose_a=True)
    else:
        return None
The reason is that I only want to transpose the last two dimensions, whereas tf.transpose without a perm argument reverses all of them.
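To check the fix numerically, here is a minimal NumPy sketch. NumPy stands in for TensorFlow here, and matmul_grads and the shape-based unbroadcast are illustrative names, not part of the original code. It transposes only the last two axes, sums away the batch axis that broadcasting added for y, and reproduces the gradients TensorFlow reported for the test case in the question.

```python
import numpy as np

def matmul_grads(a, b, adj):
    """Adjoints of a @ b w.r.t. a and b, transposing only the last two axes."""
    grad_a = adj @ np.swapaxes(b, -1, -2)
    grad_b = np.swapaxes(a, -1, -2) @ adj
    return grad_a, grad_b

def unbroadcast(adjoint, shape):
    """Sum over the leading axes that broadcasting added beyond `shape`."""
    extra = adjoint.ndim - len(shape)
    return adjoint.sum(axis=tuple(range(extra))) if extra > 0 else adjoint

# A plain transpose reverses every axis, which is the original bug for 3-D inputs.
full_reverse = np.transpose(np.zeros((2, 3, 4)))  # shape (4, 3, 2)

x = np.array([[[1.0, 1.0], [2.0, 3.0]], [[4.0, 5.0], [6.0, 7.0]]])
y = np.array([[3.0, -7.0], [-1.0, 5.0]])
z = np.array([[[1.0, 1.0], [2.0, 2.0]], [[3.0, 3.0], [-1.0, -1.0]]])

u = x @ y                    # first matmul, y is broadcast over the batch axis
w = u @ z                    # second matmul
adj_w = np.ones_like(w)      # seed adjoint, as in gradient()

adj_u, grad_z = matmul_grads(u, z, adj_w)
grad_x, grad_y_batched = matmul_grads(x, y, adj_u)
grad_y = unbroadcast(grad_y_batched, y.shape)   # (2, 2, 2) -> (2, 2)

print(grad_x, grad_y, grad_z, sep="\n")
```

grad_x, grad_y and grad_z match the three tensors in the TensorFlow output listed in the question.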


Jacobian of a vector in Tensorflow

I think this question has never been properly answered (see How to calculate the Jacobian of a vector function with tensorflow or Computing Jacobian in TensorFlow 2.0), so I will try again:
I want to compute the jacobian of the vector valued function z = [x**2 + 2*y, y**2], that is, I want to obtain the matrix of the partial derivatives
[[2x, 0],
[2, 2y]]
(since this is automatic differentiation, the matrix is evaluated at a specific point).
with tf.GradientTape() as g:
    x = tf.Variable(1.0)
    y = tf.Variable(4.0)
    z = tf.convert_to_tensor([x**2 + 2*y, y**2])
jacobian = g.jacobian(z, [x, y])
print(jacobian)
Obtaining
[<tf.Tensor: shape=(2,), dtype=float32, numpy=array([2., 0.], dtype=float32)>, <tf.Tensor: shape=(2,), dtype=float32, numpy=array([2., 8.], dtype=float32)>]
What I actually want to obtain is the tensor
[[2., 0.],
[2., 8.]]
not that intermediate result. Can it be done?
Try something like this:
import numpy as np
import tensorflow as tf
with tf.GradientTape() as g:
    x = tf.Variable(1.0)
    y = tf.Variable(4.0)
    z = tf.convert_to_tensor([x**2 + 2*y, y**2])
jacobian = g.jacobian(z, [x, y])
print(np.array([jacob.numpy() for jacob in jacobian]))
Result
[[2. 0.]
[2. 8.]]
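One thing worth noting about the layout: the matrix printed above has one row per variable (row 0 is dz/dx, row 1 is dz/dy), which is the transpose of the conventional Jacobian J[i, j] = dz_i/dv_j. The following NumPy sketch checks this with central differences; numerical_jacobian is a hypothetical helper written for this illustration, not a TensorFlow API.

```python
import numpy as np

def f(v):
    """The vector-valued function from the question, z = [x**2 + 2*y, y**2]."""
    x, y = v
    return np.array([x**2 + 2*y, y**2])

def numerical_jacobian(func, v, eps=1e-6):
    """Central-difference Jacobian with J[i, j] = d func_i / d v_j."""
    v = np.asarray(v, dtype=float)
    columns = []
    for j in range(v.size):
        step = np.zeros_like(v)
        step[j] = eps
        columns.append((func(v + step) - func(v - step)) / (2 * eps))
    return np.stack(columns, axis=1)

J = numerical_jacobian(f, [1.0, 4.0])   # conventional layout: [[2, 2], [0, 8]]
per_variable = J.T                      # layout printed above: [[2, 0], [2, 8]]
print(J)
print(per_variable)
```

If the conventional layout is wanted in TensorFlow, stacking the list returned by g.jacobian along axis 1 (for example with tf.stack(jacobian, axis=1)) should give it, since each list entry holds the derivatives with respect to one variable.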

tf.math.bincount - use min/max weight instead of weight sum

I would like to get a max/min value in tf.math.bincount instead of the weight sum. Basically currently it works as:
values = tf.constant([1,1,2,3,2,4,4,5])
weights = tf.constant([1,5,0,1,0,5,4,5])
tf.math.bincount(values, weights=weights) #[0 6 0 1 9 5]
However, I would like to get max/min for the conflicting weights instead, e.g. for max it should return:
[0 5 0 1 5 5]
It requires some finessing, but you can accomplish this as follows:
def bincount_with_max_weight(values: tf.Tensor, weights: tf.Tensor) -> tf.Tensor:
    _range = tf.range(tf.reduce_max(values) + 1)
    return tf.map_fn(lambda x: tf.maximum(
        tf.reduce_max(tf.gather(weights, tf.where(tf.equal(values, x)))), 0), _range)
The output for the example case is:
[0 5 0 1 5 5]
Breaking it down, the first line computes the range of values in values:
_range = tf.range(tf.reduce_max(values) + 1)
and in the second line, the maximum of weight is computed per element in _range using tf.map_fn with tf.where, which retrieves indices where the clause is true, and tf.gather, which retrieves the values corresponding to supplied indices.
The tf.maximum wraps the output to handle the case where an element does not occur in values. In the example case, 0 does not occur in values, so the output without tf.maximum would be INT_MIN for 0:
[-2147483648 5 0 1 5 5]
This could also be applied on the final result tensor instead of per element:
def bincount_with_max_weight(values: tf.Tensor, weights: tf.Tensor) -> tf.Tensor:
    _range = tf.range(tf.reduce_max(values) + 1)
    result = tf.map_fn(lambda x:
        tf.reduce_max(tf.gather(weights, tf.where(tf.equal(values, x)))), _range)
    return tf.maximum(result, 0)
Note that this would not work if negative weights are utilized - in that case, tf.where can be used for comparing against the minimum integer value (tf.int32.min in the example, although this can be applied for any numeric dtype) instead of applying tf.maximum:
def bincount_with_max_weight(values: tf.Tensor, weights: tf.Tensor) -> tf.Tensor:
    _range = tf.range(tf.reduce_max(values) + 1)
    result = tf.map_fn(lambda x:
        tf.reduce_max(tf.gather(weights, tf.where(tf.equal(values, x)))), _range)
    return tf.where(tf.equal(result, tf.int32.min), 0, result)
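For reference, the same sentinel idea can be sketched in plain NumPy, which makes the logic easy to test outside a TensorFlow graph. This is only an illustration under the assumption of integer weights; bincount_with_max_weight_np is a hypothetical name, and a genuine weight equal to the dtype minimum would be mapped to 0, the same limitation as the tf.int32.min comparison above.

```python
import numpy as np

def bincount_with_max_weight_np(values, weights):
    """Per-bin maximum of integer weights; bins that never occur yield 0."""
    values = np.asarray(values)
    weights = np.asarray(weights)
    sentinel = np.iinfo(weights.dtype).min           # marks bins that were never hit
    out = np.full(values.max() + 1, sentinel, dtype=weights.dtype)
    np.maximum.at(out, values, weights)              # unbuffered, handles repeated bins
    return np.where(out == sentinel, 0, out)

example = bincount_with_max_weight_np([1, 1, 2, 3, 2, 4, 4, 5],
                                      [1, 5, 0, 1, 0, 5, 4, 5])
negative = bincount_with_max_weight_np([1, 1, 3], [-4, -2, -7])
print(example)    # [0 5 0 1 5 5]
print(negative)   # [ 0 -2  0 -7]
```

Swapping np.maximum.at for np.minimum.at and the sentinel for np.iinfo(...).max would give the per-bin minimum in the same way.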
Update
For handling the 2D Tensor case, we can use tf.map_fn to apply the maximum weight function to each pair of values and weights in the batch:
from typing import Optional

def bincount_with_max_weight(values: tf.Tensor, weights: tf.Tensor, axis: Optional[int] = None) -> tf.Tensor:
    _range = tf.range(tf.reduce_max(values) + 1)

    def mapping_function(x: int, _values: tf.Tensor, _weights: tf.Tensor) -> tf.Tensor:
        return tf.reduce_max(tf.gather(_weights, tf.where(tf.equal(_values, x))))

    if axis == -1:
        # one bincount per row: map over (values, weights) pairs, then over the bins
        result = tf.map_fn(lambda pair: tf.map_fn(lambda x: mapping_function(x, *pair), _range), (values, weights),
                           dtype=tf.int32)
    else:
        result = tf.map_fn(lambda x: mapping_function(x, values, weights), _range)
    return tf.where(tf.equal(result, tf.int32.min), 0, result)
For the 2D example provided:
values = tf.constant([[1, 1, 2, 3], [2, 1, 4, 5]])
weights = tf.constant([[1, 5, 0, 1], [0, 5, 4, 5]])
print(bincount_with_max_weight(values, weights, axis=-1))
The output is:
tf.Tensor(
[[0 5 0 1 0 0]
[0 5 0 0 4 5]], shape=(2, 6), dtype=int32)
This implementation is a generalization of the approach originally described - if axis is omitted, it will compute results for the 1D case.
For faster execution, try this:
import numpy as np
import tensorflow as tf

values = tf.constant([[1,1,2,3], [2,1,4,5]])
weights = tf.constant([[1,5,0,1], [0,5,4,5]])

def find_max_bins(output , values , weights):
    # in-place, unbuffered maximum: repeated bins keep the largest weight
    np.maximum.at(output , values , weights)
    return output

@tf.function(input_signature=[tf.TensorSpec(shape=[None], dtype = tf.float32),
                              tf.TensorSpec(shape=[None], dtype = tf.int32),
                              tf.TensorSpec(shape=[None], dtype = tf.int32)
                              ])
def tf_function(output , values , weights):
    print(values)
    y = tf.numpy_function(find_max_bins, [output , values , weights], tf.float32)
    return y

length = np.max(values)+1
initial_value = [0 for x in range(length)]
variable = tf.Variable(initial_value = initial_value, shape=(length) , dtype=tf.float32)
for i , (value , weight) in enumerate(zip(values , weights)):
    if(i > 0):
        output = tf.stack([output , tf_function(variable , value , weight)] , 0)
    else:
        output = tf_function(variable , value , weight)
    variable.assign_sub(initial_value)
Output:
<tf.Tensor: shape=(2, 6), dtype=float32, numpy=
array([[0., 5., 0., 1., 0., 0.],
[0., 5., 0., 0., 4., 5.]], dtype=float32)>

Problem about getting None from the GradientTape.gradient in TensorFlow

I tried the following code:
from d2l import tensorflow as d2l
import tensorflow as tf

@tf.function
def corr2d(X, k, Y):  #@save
    """Compute 2D cross-correlation."""
    with tf.GradientTape() as tape:
        for i in range(Y.shape[0]):
            for j in range(Y.shape[1]):
                Y[i, j].assign(tf.reduce_sum(tf.multiply(X[i: i + h, j: j + w], k)))
    print('Gradients = ', tape.gradient(Y, k))  # show the gradient
    print('Watched Variables = ', tape.watched_variables())  # show the watched variables

print(tf.__version__)
Xin = tf.constant([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]])
kernel = tf.Variable([[0.0, 1.0], [2.0, 3.0]])
h, w = kernel.shape
Y_hat = tf.Variable(tf.zeros((Xin.shape[0] - h + 1, Xin.shape[1] - w + 1)))  # prepare the output tensor
corr2d(Xin, kernel, Y_hat)
print(Y_hat)
I got the following results:
2.4.1
Gradients = None
Watched Variables = (<tf.Variable 'Variable:0' shape=(2, 2) dtype=float32>, <tf.Variable 'Variable:0' shape=(2, 2) dtype=float32>)
<tf.Variable 'Variable:0' shape=(2, 2) dtype=float32, numpy=
array([[19., 25.],
[37., 43.]], dtype=float32)>
Can anyone explain why the returned gradient is None even though the source variable kernel is included in the list of watched variables?
I'm not sure I really understood what you were trying to do. You were passing your variable as the target for the gradient.
It is always easier to think in terms of cost function and variables.
Let's say your cost function is y = x ** 2. In this case, it is possible to calculate the gradient of y with respect to x.
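Basically, you did not have a function from which a gradient with respect to k could be calculated: Y is a tf.Variable that is filled in through assign, and assignments to a variable are not recorded as differentiable operations, so the tape has no path from Y back to k.
I made a small change: look at the variable cost.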
import tensorflow as tf
def corr2d(X, k, Y):  #@save
    """Compute 2D cross-correlation."""
    with tf.GradientTape() as tape:
        cost = 0
        for i in range(Y.shape[0]):
            for j in range(Y.shape[1]):
                Y[i, j].assign(tf.reduce_sum(tf.multiply(X[i: i + h, j: j + w], k)))
                cost = cost + tf.reduce_sum(tf.multiply(X[i: i + h, j: j + w], k))
    print('\nGradients = ', tape.gradient(cost, k))  # show the gradient
    print('Watched Variables = ', tape.watched_variables())  # show the watched variables

print(tf.__version__)
Xin = tf.constant([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]])
kernel = tf.Variable([[0.0, 1.0], [2.0, 3.0]])
h, w = kernel.shape
Y_hat = tf.Variable(tf.zeros((Xin.shape[0] - h + 1, Xin.shape[1] - w + 1)))  # prepare the output tensor
corr2d(Xin, kernel, Y_hat)
print(Y_hat)
And now, you will get
Gradients = tf.Tensor(
[[ 8. 12.]
[20. 24.]], shape=(2, 2), dtype=float32)
Watched Variables = (<tf.Variable 'Variable:0' shape=(2, 2) dtype=float32, numpy=
array([[0., 1.],
[2., 3.]], dtype=float32)>, <tf.Variable 'Variable:0' shape=(2, 2) dtype=float32, numpy=
array([[19., 25.],
[37., 43.]], dtype=float32)>)
<tf.Variable 'Variable:0' shape=(2, 2) dtype=float32, numpy=
array([[19., 25.],
[37., 43.]], dtype=float32)>
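The reported numbers can be cross-checked by hand. Since cost is the sum of every output element, and each element is sum(window * k), the gradient of cost with respect to k is simply the sum of all windows of X. A small NumPy sketch of that arithmetic (an illustration only, independent of GradientTape):

```python
import numpy as np

X = np.array([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]])
k = np.array([[0.0, 1.0], [2.0, 3.0]])
h, w = k.shape
rows, cols = X.shape[0] - h + 1, X.shape[1] - w + 1

Y = np.zeros((rows, cols))
grad_k = np.zeros_like(k)
for i in range(rows):
    for j in range(cols):
        window = X[i:i + h, j:j + w]
        Y[i, j] = np.sum(window * k)   # forward pass: one output element
        grad_k += window               # d/dk of sum(window * k) is the window itself

print(Y)        # [[19. 25.] [37. 43.]]
print(grad_k)   # [[ 8. 12.] [20. 24.]]
```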

How does a process of optimization go with tensorflow?

I have a simple graph in TensorFlow:
(1) X = tf.Variable(dtype=tf.float32, shape=(1, 3), name="X", initial_value=np.array([[1,2,3]]))
(2) y = tf.reduce_sum(tf.square(X)) - 2 * tf.reduce_sum(tf.sin(tf.square(X)))
(3) training_op = tf.train.GradientDescentOptimizer(0.3).minimize(y)
Here's the code for 5 steps of gradient descent:
with tf.Session() as sess:
    sess.run(init)
    for i in range(5):
(4)     *res, _ = sess.run(fetches=[X, y, training_op])
        print(res)
[array([[1., 2., 3.]], dtype=float32), 13.006426]
[array([[ 1.0483627 , -0.76874477, -2.080069 ]], dtype=float32), 4.9738936]
[array([[ 0.9910337 , -1.0735381 , 0.10702228]], dtype=float32), -1.3677568]
[array([[ 1.0567244 , -0.95272505, 0.17122723]], dtype=float32), -1.3784065]
[array([[ 0.978967 , -1.0848547 , 0.27387527]], dtype=float32), -1.4229481]
I'm trying to figure out how its optimization process goes. Could you please explain it step by step?
I thought it should be like this:
Evaluate X (1)
Evaluate y (2)
Calculate the gradient and make a step (3) (as it says here, "Calling minimize() takes care of both computing the gradients and applying them to the variables.")
Then yield all the variables requested in fetches (4)
But the output shows that the first run yields the initial values, so I'm confused...
tf version == '1.15.0'
Thank you in advance!
upd1. If I change the order in fetches list, the output is still the same.
with tf.Session() as sess:
    sess.run(init)
    for i in range(5):
        _, *res = sess.run(fetches=[training_op, X, y])
        print(res)
[array([[1., 2., 3.]], dtype=float32), 13.006426]
[array([[ 1.0483627 , -0.76874477, -2.080069 ]], dtype=float32), 4.9738936]
[array([[ 0.9910337 , -1.0735381 , 0.10702228]], dtype=float32), -1.3677568]
[array([[ 1.0567244 , -0.95272505, 0.17122723]], dtype=float32), -1.3784065]
[array([[ 0.978967 , -1.0848547 , 0.27387527]], dtype=float32), -1.4229481]
upd2. A slight modification of the answer by @thushv89 does what I initially expected to see:
with tf.Session() as sess:
    sess.run(init)
    for i in range(2):
        res = sess.run(fetches=[X, y])
        print('Variables before the step', res)
        sess.run(training_op)
        res = sess.run(fetches=[X, y])
        print('Variables after the step', res)
        print()
Variables before the step [array([[1., 2., 3.]], dtype=float32), 13.006426]
Variables after the step [array([[ 1.0483627 , -0.76874477, -2.080069 ]], dtype=float32), 4.9738936]
Variables before the step [array([[ 1.0483627 , -0.76874477, -2.080069 ]], dtype=float32), 4.9738936]
Variables after the step [array([[ 0.9910337 , -1.0735381 , 0.10702228]], dtype=float32), -1.3677568]
You have fetches=[X, y, training_op]. sess.run() does not respect the order of this list (or at least you shouldn't expect it to). In your case, all of the following are executed, and only then are the results fetched:
Evaluate X (training_op hasn't happened yet)
Evaluate y (training_op still hasn't happened yet)
Execute training_op (now X and y have changed)
If you need a particular order between reading X and updating it, make that order explicit:
Option 1: Breaking the sess.run() function
r1 = sess.run(X)
_, r2 = sess.run(fetches=[training_op, y])
print(r1,r2)
Option 2: Using a separate tf.Variable with tf.control_dependencies
X = tf.Variable(dtype=tf.float32, shape=(1, 3), name="X", initial_value=np.array([[1,2,3]]))
prevX = tf.Variable(dtype=tf.float32, shape=(1, 3), name="prevX", initial_value=np.array([[1,2,3]]))
y = tf.reduce_sum(tf.square(X)) - 2 * tf.reduce_sum(tf.sin(tf.square(X)))
assign_op = tf.assign(prevX, X)
with tf.control_dependencies([assign_op]):
    training_op = tf.train.GradientDescentOptimizer(0.3).minimize(y)
with tf.Session() as sess:
    init = tf.global_variables_initializer()
    sess.run(init)
    for i in range(5):
        *res, _ = sess.run(fetches=[prevX, y, training_op])
        print(res)
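To see that the first printed row really is the state before the first update, the step can be reproduced by hand. The sketch below is plain NumPy rather than the TF 1.x graph, and the gradient 2x - 4x*cos(x^2) is derived manually from y; recording the values before each step gives the same first two rows as the output in the question.

```python
import numpy as np

def loss(x):
    """y = sum(x^2) - 2 * sum(sin(x^2)), as in line (2) of the question."""
    return np.sum(x**2) - 2.0 * np.sum(np.sin(x**2))

def grad(x):
    """Hand-derived gradient: d/dx [x^2 - 2 sin(x^2)] = 2x - 4x cos(x^2)."""
    return 2.0 * x - 4.0 * x * np.cos(x**2)

x = np.array([[1.0, 2.0, 3.0]])
learning_rate = 0.3
history = []
for _ in range(2):
    history.append((x.copy(), loss(x)))   # values *before* the update
    x = x - learning_rate * grad(x)       # one gradient descent step

for x_before, y_before in history:
    print(x_before, y_before)
```

The first entry is the initial X with y of about 13.0064, and the second is approximately [1.0484, -0.7687, -2.0801] with y of about 4.9739, matching the first two printed rows up to float32 rounding.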

How to get the Jacobian matrix form derivative of vector by vector in TensorFlow Eager Execution API?

In the MLP model the input of layer l can be computed by this formula:
z = Wa + b
W is the weight matrix between layer l-1 and layer l, a is the output signal of layer l-1 neuron, b is the bias of layer l.
For example:
I want to use TensorFlow Eager Execution API to get the derivatives:
I define a function to calculate the value of z:
def f002(W, a, b):
    return tf.matmul(W, a) + b
My main program:
def test001(args={}):
    tf.enable_eager_execution()
    tfe = tf.contrib.eager
    a = tf.reshape(tf.constant([1.0, 2.0, 3.0]), [3, 1])
    W = tf.constant([[4.0, 5.0, 6.0],[7.0, 8.0, 9.0]])
    b = tf.reshape(tf.constant([1001.0, 1002.0]), [2, 1])
    z = f002(W, a, b)
    print(z)
    grad_f1 = tfe.gradients_function(f002)
    dv = grad_f1(W, a, b)
    print(dv)
I can get the correct value of z in forward mode. But when I print the derivative results, I get something like this:
[<tf.Tensor: id=17, shape=(2, 3), dtype=float32, numpy=
array([[1., 2., 3.],
[1., 2., 3.]], dtype=float32)>, <tf.Tensor: id=18, shape=(3, 1),
dtype=float32, numpy=
array([[11.],
[13.],
[15.]], dtype=float32)>, <tf.Tensor: id=16, shape=(2, 1),
dtype=float32, numpy=
array([[1.],
[1.]], dtype=float32)>]
This is not what I want. How can I get the Jacobian matrix of a vector with respect to a vector?
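The printed values are consistent with gradients of the sum of z, that is, vector-Jacobian products with a vector of ones, rather than full Jacobians. The NumPy sketch below writes out the analytic Jacobians of z = Wa + b under that reading and contracts them with ones to recover the three arrays shown above. The names dz_da, dz_db, dz_dW and the vjp_* variables are illustrative only.

```python
import numpy as np

W = np.array([[4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
a = np.array([[1.0], [2.0], [3.0]])
b = np.array([[1001.0], [1002.0]])
z = W @ a + b                                   # [[1033.], [1052.]]

m, n = W.shape
dz_da = W.copy()                                # (2, 3): dz_i/da_j = W[i, j]
dz_db = np.eye(m)                               # (2, 2): dz_i/db_j = delta_ij
dz_dW = np.einsum("ij,k->ijk", np.eye(m), a[:, 0])  # (2, 2, 3): dz_i/dW[j, k] = delta_ij * a_k

# Contracting each Jacobian with ones over the output index i gives the
# gradient of sum(z), which is what the output in the question shows.
ones = np.ones(m)
vjp_W = np.einsum("i,ijk->jk", ones, dz_dW)     # [[1, 2, 3], [1, 2, 3]]
vjp_a = (dz_da.T @ ones).reshape(n, 1)          # [[11], [13], [15]]
vjp_b = (dz_db.T @ ones).reshape(m, 1)          # [[1], [1]]
print(vjp_W, vjp_a, vjp_b, sep="\n")
```

In TensorFlow 2.x, tf.GradientTape.jacobian is probably the most direct way to get the full Jacobians instead of these summed gradients; with the column-vector shapes used here the results would carry extra size-1 axes.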