How does the optimization process work in TensorFlow?

I have a simple graph in TensorFlow:
import tensorflow as tf
import numpy as np

X = tf.Variable(dtype=tf.float32, shape=(1, 3), name="X", initial_value=np.array([[1, 2, 3]]))  # (1)
y = tf.reduce_sum(tf.square(X)) - 2 * tf.reduce_sum(tf.sin(tf.square(X)))  # (2)
training_op = tf.train.GradientDescentOptimizer(0.3).minimize(y)  # (3)
init = tf.global_variables_initializer()
Here's the code for 5 steps of gradient descent:
with tf.Session() as sess:
    sess.run(init)
    for i in range(5):
        *res, _ = sess.run(fetches=[X, y, training_op])  # (4)
        print(res)
[array([[1., 2., 3.]], dtype=float32), 13.006426]
[array([[ 1.0483627 , -0.76874477, -2.080069 ]], dtype=float32), 4.9738936]
[array([[ 0.9910337 , -1.0735381 , 0.10702228]], dtype=float32), -1.3677568]
[array([[ 1.0567244 , -0.95272505, 0.17122723]], dtype=float32), -1.3784065]
[array([[ 0.978967 , -1.0848547 , 0.27387527]], dtype=float32), -1.4229481]
I'm trying to figure out how the optimization process works here. Could you please explain it step by step?
I thought it should be like this:
Evaluate X (1)
Evaluate y (2)
Calculate the gradient and make a step (3) (as the docs say, "Calling minimize() takes care of both computing the gradients and applying them to the variables.")
Then yield all the variables requested in fetches (4)
But the output shows that the first run yields the initial values, so I'm confused...
tf version == '1.15.0'
Thank you in advance!
upd1. If I change the order in the fetches list, the output is still the same.
with tf.Session() as sess:
    sess.run(init)
    for i in range(5):
        _, *res = sess.run(fetches=[training_op, X, y])
        print(res)
[array([[1., 2., 3.]], dtype=float32), 13.006426]
[array([[ 1.0483627 , -0.76874477, -2.080069 ]], dtype=float32), 4.9738936]
[array([[ 0.9910337 , -1.0735381 , 0.10702228]], dtype=float32), -1.3677568]
[array([[ 1.0567244 , -0.95272505, 0.17122723]], dtype=float32), -1.3784065]
[array([[ 0.978967 , -1.0848547 , 0.27387527]], dtype=float32), -1.4229481]
upd2. A slight modification of the answer by @thushv89 does what I initially expected to see:
with tf.Session() as sess:
    sess.run(init)
    for i in range(2):
        res = sess.run(fetches=[X, y])
        print('Variables before the step', res)
        sess.run(training_op)
        res = sess.run(fetches=[X, y])
        print('Variables after the step', res)
        print()
Variables before the step [array([[1., 2., 3.]], dtype=float32), 13.006426]
Variables after the step [array([[ 1.0483627 , -0.76874477, -2.080069 ]], dtype=float32), 4.9738936]
Variables before the step [array([[ 1.0483627 , -0.76874477, -2.080069 ]], dtype=float32), 4.9738936]
Variables after the step [array([[ 0.9910337 , -1.0735381 , 0.10702228]], dtype=float32), -1.3677568]

You have fetches=[X, y, training_op]. These don't respect the order you pass them in (at least, you shouldn't expect sess.run() to respect that order). This means that all of the following,
Evaluate X (so the training_op hasn't happened yet)
Evaluate y (still the training_op hasn't happened yet)
Execute training_op (now X and y have changed),
get executed, and only then are the results fetched. If you want the variable X to change first, you have two options.
Option 1: Breaking up the sess.run() call
r1 = sess.run(X)
_, r2 = sess.run(fetches=[training_op, y])
print(r1,r2)
Option 2: Using a separate tf.Variable with tf.control_dependencies
X = tf.Variable(dtype=tf.float32, shape=(1, 3), name="X", initial_value=np.array([[1, 2, 3]]))
prevX = tf.Variable(dtype=tf.float32, shape=(1, 3), name="prevX", initial_value=np.array([[1, 2, 3]]))
y = tf.reduce_sum(tf.square(X)) - 2 * tf.reduce_sum(tf.sin(tf.square(X)))

assign_op = tf.assign(prevX, X)
with tf.control_dependencies([assign_op]):
    training_op = tf.train.GradientDescentOptimizer(0.3).minimize(y)

with tf.Session() as sess:
    init = tf.global_variables_initializer()
    sess.run(init)
    for i in range(5):
        *res, _ = sess.run(fetches=[prevX, y, training_op])
        print(res)
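The point of Option 2 is that the tf.control_dependencies block forces assign_op to run before the gradient step, so prevX captures the value X had before the update and can be fetched in the same sess.run call.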

Related

Autodiff implementation for gradient calculation

I have worked through some papers about the autodiff algorithm in order to implement it myself (for learning purposes). I compared my algorithm against TensorFlow's output in test cases, and the outputs did not match in most cases. Therefore I worked through the tutorial from this site and implemented it with TensorFlow operations, just for the matrix multiplication operation, since that was one of the operations that did not work:
The gradient of matmul and the unbroadcast method:
def gradient_matmul(node, dx, adj):
    # dx is needed to know which of the two parents should be derived
    a = node.parents[0]
    b = node.parents[1]
    # the operation was node.tensor = tf.matmul(a.tensor, b.tensor)
    if a == dx or b == dx:
        # the result depends on which of the parents is the derivative
        mm = tf.matmul(adj, tf.transpose(b.tensor)) if a == dx else \
             tf.matmul(tf.transpose(a.tensor), adj)
        return mm
    else:
        return None

def unbroadcast(adjoint, node):
    dim_a = len(adjoint.shape)
    dim_b = len(node.shape)
    if dim_a > dim_b:
        # sum over the leading axes introduced by broadcasting
        axes = tuple(range(dim_a - dim_b))
        res = tf.math.reduce_sum(adjoint, axis=axes)
        return res
    return adjoint
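A quick illustration of what unbroadcast does (my own sketch, assuming eager execution): an adjoint carrying extra leading batch dimensions is summed down to the shape of the parent tensor, which is what happens for the non-batched operand of a batched matmul.
import tensorflow as tf

adj = tf.ones((2, 2, 2))      # adjoint with a leading batch axis
parent = tf.zeros((2, 2))     # parent tensor without the batch axis

reduced = unbroadcast(adj, parent)
print(reduced.shape)          # (2, 2): the leading axis has been summed away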
And finally the gradient calculation autodiff algorithm:
from collections import defaultdict

def gradient(y, dx):
    working = [y]
    adjoints = defaultdict(float)
    adjoints[y] = tf.ones(y.tensor.shape)
    while len(working) != 0:
        curr = working.pop(0)
        if curr == dx:
            return adjoints[curr]
        if curr.is_store:
            continue
        adj = adjoints[curr]
        for p in curr.parents:
            # for testing, with matrix multiplication as the only operation
            local_grad = gradient_matmul(curr, p, adj)
            adjoints[p] = unbroadcast(tf.add(adjoints[p], local_grad), p.tensor)
            if p not in working:
                working.append(p)
Yet it produces the same output as my initial implementation.
I constructed a matrix multiplication test case:
x = tf.constant([[[1.0, 1.0], [2.0, 3.0]], [[4.0, 5.0], [6.0, 7.0]]])
y = tf.constant([[3.0, -7.0], [-1.0, 5.0]])
z = tf.constant([[[1, 1], [2.0, 2]], [[3, 3], [-1, -1]]])
w = tf.matmul(tf.matmul(x, y), z)
w should be differentiated with respect to each of the variables.
TensorFlow calculates the gradients:
[<tf.Tensor: shape=(2, 2, 2), dtype=float32, numpy=
 array([[[-22.,  18.],
         [-22.,  18.]],
        [[ 32., -16.],
         [ 32., -16.]]], dtype=float32)>,
 <tf.Tensor: shape=(2, 2), dtype=float32, numpy=
 array([[66., -8.],
        [80., -8.]], dtype=float32)>,
 <tf.Tensor: shape=(2, 2, 2), dtype=float32, numpy=
 array([[[  5.,   5.],
         [ -1.,  -1.]],
        [[ 18.,  18.],
         [-10., -10.]]], dtype=float32)>]
My implementation calculates:
[[[-5. 7.]
[-5. 7.]]
[[-5. 7.]
[-5. 7.]]]
[[33. 22.]
[54. 36.]]
[[[ 9. 9.]
[14. 14.]]
[[-5. -5.]
[-6. -6.]]]
Maybe the problem is the difference between NumPy's dot and TensorFlow's matmul?
But then I don't know how to fix the gradient or the unbroadcast for the TensorFlow method...
Thanks for taking the time to look over my code! :)
I found the error: gradient_matmul should have been:
def gradient_matmul(node, dx, adj):
    a = node.parents[0]
    b = node.parents[1]
    if a == dx:
        return tf.matmul(adj, b.tensor, transpose_b=True)
    elif b == dx:
        return tf.matmul(a.tensor, adj, transpose_a=True)
    else:
        return None
since I only want to transpose the last two dimensions (the transpose_a/transpose_b flags do exactly that, whereas tf.transpose reverses all dimensions by default).
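For reference, here is a minimal sketch (not part of the original post, assuming TF 2.x eager execution) of how the TensorFlow reference gradients quoted above can be reproduced for the same test case; tape.gradient of a non-scalar output returns the gradient of its implicit sum:
import tensorflow as tf

x = tf.constant([[[1.0, 1.0], [2.0, 3.0]], [[4.0, 5.0], [6.0, 7.0]]])
y = tf.constant([[3.0, -7.0], [-1.0, 5.0]])
z = tf.constant([[[1, 1], [2.0, 2]], [[3, 3], [-1, -1]]])

with tf.GradientTape() as tape:
    tape.watch([x, y, z])                 # constants have to be watched explicitly
    w = tf.matmul(tf.matmul(x, y), z)

# A list of three tensors, one per input, matching the reference gradients shown above.
grads = tape.gradient(w, [x, y, z])
print(grads)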

Jacobian of a vector in Tensorflow

I think this question has never been properly answered (see How to calculate the Jacobian of a vector function with tensorflow or Computing Jacobian in TensorFlow 2.0), so I will try again:
I want to compute the Jacobian of the vector-valued function z = [x**2 + 2*y, y**2], that is, I want to obtain the matrix of partial derivatives
[[2x, 0],
[2, 2y]]
(since this is automatic differentiation, the matrix is evaluated at a specific point).
with tf.GradientTape() as g:
    x = tf.Variable(1.0)
    y = tf.Variable(4.0)
    z = tf.convert_to_tensor([x**2 + 2*y, y**2])
jacobian = g.jacobian(z, [x, y])
print(jacobian)
Obtaining
[<tf.Tensor: shape=(2,), dtype=float32, numpy=array([2., 0.], dtype=float32)>, <tf.Tensor: shape=(2,), dtype=float32, numpy=array([2., 8.], dtype=float32)>]
What I would naturally like to obtain is the tensor
[[2., 0.],
[2., 8.]]
not that intermediate result. Can it be done?
Try something like this:
import numpy as np
import tensorflow as tf

with tf.GradientTape() as g:
    x = tf.Variable(1.0)
    y = tf.Variable(4.0)
    z = tf.convert_to_tensor([x**2 + 2*y, y**2])
jacobian = g.jacobian(z, [x, y])
print(np.array([jacob.numpy() for jacob in jacobian]))
Result
[[2. 0.]
[2. 8.]]
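A small variation on the same idea (my sketch, not from the original answer): the per-variable gradients can be stacked directly in TensorFlow instead of converting them to NumPy first.
jacobian_matrix = tf.stack(jacobian)  # shape (2, 2): rows are dz/dx and dz/dy
print(jacobian_matrix)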

batch axis in keras custom layer

I want to make a custom layer that does the following, given a batch of input vectors.
For each vector a in the batch:
get the first element a[0].
multiply the vector a by a[0] elementwise.
So if the batch is
[[ 1., 2., 3.],
[ 4., 5., 6.],
[ 7., 8., 9.],
[10., 11., 12.]]
This should be a batch of 4 vectors, each with dimension 3 (or am I wrong here?).
Then my layer should transform the batch to the following:
[[ 1., 2., 3.],
[ 16., 20., 24.],
[ 49., 56., 63.],
[100., 110., 120.]]
Here is my implementation for the layer:
class MyLayer(keras.layers.Layer):
    def __init__(self, activation=None, **kwargs):
        super().__init__(**kwargs)
        self.activation = keras.activations.get(activation)

    def call(self, a):
        scale = a[0]
        return self.activation(a * scale)

    def get_config(self):
        base_config = super().get_config()
        return {**base_config,
                "activation": keras.activations.serialize(self.activation)}
But the output is different from what I expected:
batch = tf.Variable([[ 1,  2,  3],
                     [ 4,  5,  6],
                     [ 7,  8,  9],
                     [10, 11, 12]], dtype=tf.float32)
layer = MyLayer()
print(layer(batch))
Output:
tf.Tensor(
[[ 1. 4. 9.]
[ 4. 10. 18.]
[ 7. 16. 27.]
[10. 22. 36.]], shape=(4, 3), dtype=float32)
It looks like the implementation actually treats each column as a vector, which is strange to me, because other pre-written models, such as the Sequential model, specify the input shape as (batch_size, ...), which means each row, not each column, is a vector.
How should I modify my code so that it behaves the way I want?
Actually, your input shape is (4, 3), so when you index this tensor with a[0] you get the first row, which is [1, 2, 3]. To get what you want, you should instead take the first column and then transpose the matrix to obtain the desired result, like this:
def call(self, a):
    scale = a[:, 0]
    return tf.transpose(self.activation(tf.transpose(a) * scale))
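An alternative sketch (my own suggestion, not part of the original answer) that avoids the double transpose: keeping the first column as a (batch, 1) tensor lets broadcasting scale each row in place.
def call(self, a):
    scale = a[:, :1]                   # shape (batch, 1): one scale per row
    return self.activation(a * scale)  # broadcasts the scale across each row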

Why does manually unrolling an LSTM over timesteps give different outputs from static_rnn?

Here is my code building the LSTM manually:
import tensorflow as tf
import numpy as np

batch_size = 1
hidden_size = 4
num_steps = 3
input_dim = 5

np.random.seed(123)
input = np.ones([batch_size, num_steps, input_dim], dtype=int)

x = tf.placeholder(dtype=tf.float32, shape=[batch_size, num_steps, input_dim], name='input_x')
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=hidden_size)
initial_state = lstm_cell.zero_state(batch_size, dtype=tf.float32)

outputs = []
with tf.variable_scope('for_loop', initializer=tf.ones_initializer):
    for i in range(num_steps):
        if i > 0:
            tf.get_variable_scope().reuse_variables()
        output = lstm_cell(x[:, i, :], initial_state)
        outputs.append(output)

with tf.Session() as sess:
    init_op = tf.initialize_all_variables()
    sess.run(init_op)
    result = sess.run(outputs, feed_dict={x: input})
    print(result)
The outputs:
[(array([[0.7536526, 0.7536526, 0.7536526, 0.7536526]], dtype=float32), LSTMStateTuple(c=array([[0.99321693, 0.99321693, 0.99321693, 0.99321693]], dtype=float32), h=array([[0.7536526, 0.7536526, 0.7536526, 0.7536526]], dtype=float32))),
(array([[0.7536526, 0.7536526, 0.7536526, 0.7536526]], dtype=float32), LSTMStateTuple(c=array([[0.99321693, 0.99321693, 0.99321693, 0.99321693]], dtype=float32), h=array([[0.7536526, 0.7536526, 0.7536526, 0.7536526]], dtype=float32))),
(array([[0.7536526, 0.7536526, 0.7536526, 0.7536526]], dtype=float32), LSTMStateTuple(c=array([[0.99321693, 0.99321693, 0.99321693, 0.99321693]], dtype=float32), h=array([[0.7536526, 0.7536526, 0.7536526, 0.7536526]], dtype=float32)))]
And here is the code using static_rnn:
import tensorflow as tf
import numpy as np

batch_size = 1
hidden_size = 4
num_steps = 3
input_dim = 5

np.random.seed(123)
input = np.ones([batch_size, num_steps, input_dim], dtype=int)

x = tf.placeholder(dtype=tf.float32, shape=[batch_size, num_steps, input_dim], name='input_x')
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=hidden_size)
initial_state = lstm_cell.zero_state(batch_size, dtype=tf.float32)

y = tf.unstack(x, axis=1)
with tf.variable_scope('static_rnn', initializer=tf.ones_initializer):
    output, state = tf.nn.static_rnn(lstm_cell, y, initial_state=initial_state)

with tf.Session() as sess:
    init_op = tf.initialize_all_variables()
    sess.run(init_op)
    result = sess.run([output, state], feed_dict={x: input})
    print(result)
The outputs:
[[array([[0.7536526, 0.7536526, 0.7536526, 0.7536526]], dtype=float32),
array([[0.9631945, 0.9631945, 0.9631945, 0.9631945]], dtype=float32),
array([[0.9948382, 0.9948382, 0.9948382, 0.9948382]], dtype=float32)], LSTMStateTuple(c=array([[2.9925175, 2.9925175, 2.9925175, 2.9925175]], dtype=float32), h=array([[0.9948382, 0.9948382, 0.9948382, 0.9948382]], dtype=float32))]
The first cell gets exactly the same output, but from the second cell onward the manually built version seems to have no connection between preceding and succeeding cells -- the outputs of the 3 cells are identical. I think the manual code is wrong, but I can't figure out how to connect the BasicLSTMCells. Help!
Thanks to @Susmit Agrawal, I changed my code to:
for i in range(num_steps):
    if i > 0:
        # feed the state returned by the previous step, not the initial state
        output = lstm_cell(x[:, i, :], outputs[i - 1][1])
    else:
        # z_state is the initial zero state from lstm_cell.zero_state(...)
        output = lstm_cell(x[:, i, :], z_state)
    outputs.append(output)
This produces the correct outputs, identical to static_rnn.
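For completeness, a fuller sketch of the corrected loop (my reconstruction, reusing the names from the snippets above) that makes the state threading explicit by unpacking the (output, state) tuple returned by the cell:
outputs = []
state = lstm_cell.zero_state(batch_size, dtype=tf.float32)
with tf.variable_scope('for_loop', initializer=tf.ones_initializer):
    for i in range(num_steps):
        if i > 0:
            tf.get_variable_scope().reuse_variables()
        # BasicLSTMCell returns (output, new_state); feed new_state into the next step
        cell_output, state = lstm_cell(x[:, i, :], state)
        outputs.append(cell_output)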

How to get the Jacobian matrix form derivative of vector by vector in TensorFlow Eager Execution API?

In an MLP model the input of layer l can be computed by this formula:
z = Wa + b
W is the weight matrix between layer l-1 and layer l, a is the output of the layer l-1 neurons, and b is the bias of layer l.
I want to use the TensorFlow Eager Execution API to get the derivatives of z with respect to W, a, and b.
I define a function to calculate the value of z:
def f002(W, a, b):
    return tf.matmul(W, a) + b
My main program:
def test001(args={}):
    tf.enable_eager_execution()
    tfe = tf.contrib.eager

    a = tf.reshape(tf.constant([1.0, 2.0, 3.0]), [3, 1])
    W = tf.constant([[4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
    b = tf.reshape(tf.constant([1001.0, 1002.0]), [2, 1])

    z = f002(W, a, b)
    print(z)

    grad_f1 = tfe.gradients_function(f002)
    dv = grad_f1(W, a, b)
    print(dv)
I can get the correct value of z in the forward pass, but when I print the derivative results it displays something like this:
[<tf.Tensor: id=17, shape=(2, 3), dtype=float32, numpy=
 array([[1., 2., 3.],
        [1., 2., 3.]], dtype=float32)>,
 <tf.Tensor: id=18, shape=(3, 1), dtype=float32, numpy=
 array([[11.],
        [13.],
        [15.]], dtype=float32)>,
 <tf.Tensor: id=16, shape=(2, 1), dtype=float32, numpy=
 array([[1.],
        [1.]], dtype=float32)>]
This is not what I want. How do I get the Jacobian matrix, i.e. the derivative of a vector with respect to a vector?
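Note that the list shown above is the gradient of the implicitly summed output of f002 (for example, the entry for a is W^T times a vector of ones, i.e. [[11], [13], [15]]), not the Jacobian. One possible way to get full Jacobians, as a minimal sketch assuming a TensorFlow version where GradientTape.jacobian is available:
import tensorflow as tf

tf.enable_eager_execution()  # only needed on TF 1.x

a = tf.reshape(tf.constant([1.0, 2.0, 3.0]), [3, 1])
W = tf.constant([[4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
b = tf.reshape(tf.constant([1001.0, 1002.0]), [2, 1])

with tf.GradientTape() as tape:
    tape.watch([W, a, b])        # constants must be watched explicitly
    z = tf.matmul(W, a) + b

# Each Jacobian has shape z.shape + input.shape, e.g. dz/dW has shape (2, 1, 2, 3).
jacobians = tape.jacobian(z, [W, a, b])
print(jacobians)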