backward, grad function in pytorch - variables

I'm trying to use backward and grad in pytorch, but I don't understand the value that is returned.
Here is my code.
x = Variable(torch.FloatTensor([[1,2],[3,4]]), requires_grad=True)
y = x + 2
z = y * y
gradient = torch.ones(2, 2)
z.backward(gradient)
print(x.grad)
I think the result should be [[6,8],[10,12]], because dz/dx = 2*(x+2) and x = 1, 2, 3, 4.
But the returned value is [[7,9],[11,13]].
Why does this happen? I want to know how the gradient argument and x.grad work.
Please help.

The following code, run on pytorch v0.12.1:
import torch
from torch.autograd import Variable
x = Variable(torch.FloatTensor([[1,2],[3,4]]), requires_grad=True)
y = x + 2
z = y * y
gradient = torch.ones(2, 2)
z.backward(gradient)
print(x.grad)
returns
Variable containing:
6 8
10 12
[torch.FloatTensor of size 2x2]
Update your pytorch installation. Autograd is the component that handles gradient computation for pytorch; its documentation explains how it works.
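For readers on a recent PyTorch release, where Variable has been merged into Tensor, the same check can be sketched as below. This is a minimal sketch, assuming a PyTorch version that accepts torch.tensor(..., requires_grad=True); the name expected is introduced here only for illustration.

```python
import torch

# Same computation as in the question, written without the Variable wrapper.
x = torch.tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)
y = x + 2
z = y * y

# z is not a scalar, so backward() needs a tensor with the same shape as z.
# A tensor of ones yields the plain elementwise derivative dz/dx.
z.backward(torch.ones(2, 2))

# Analytic derivative for comparison: dz/dx = 2 * (x + 2)
expected = 2 * (x.detach() + 2)

print(x.grad)
print(torch.equal(x.grad, expected))
```

If x.grad differs from expected on your machine, the installed version is the first thing to check, as suggested above.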

Related

Gradient in Pytorch and TensorFlow

I am new to PyTorch and TensorFlow, and I would like to use them for solving ODEs and PDEs. My question is how to take the gradient of a 3x1 vector Y = [Y1(X1,X2,X3), Y2(X1,X2,X3), Y3(X1,X2,X3)]^T with respect to the vector X = [X1,X2,X3]^T, in both PyTorch and TensorFlow (Keras).
I expect to get the matrix
F = [[Y11, Y12, Y13] , [Y21, Y22, Y23], [Y31, Y32, Y33]]
where
Yij = dYi/dXj
Thanks
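The matrix F described here is the Jacobian of Y with respect to X. One possible way to get it in PyTorch is torch.autograd.functional.jacobian; the sketch below assumes a PyTorch version that ships that function, and the function Y in it is a made-up example, not taken from the question. TensorFlow 2 offers a comparable GradientTape.jacobian method.

```python
import torch
from torch.autograd.functional import jacobian

def Y(X):
    # Hypothetical vector-valued function, chosen only for illustration.
    x1, x2, x3 = X[0], X[1], X[2]
    return torch.stack([x1 * x2, x2 * x3, x1 + x3 ** 2])

X = torch.tensor([1.0, 2.0, 3.0])

# F[i, j] = dY_i / dX_j, a 3x3 matrix for a 3-vector input and output.
F = jacobian(Y, X)
print(F)
```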

LSTM from scratch in tensorflow 2

I'm trying to build an LSTM in tensorflow 2.1 from scratch, without using the one already supplied with keras (tf.keras.layers.LSTM), just to learn and code something. To do so, I've defined a class "Model" that, when called (as in model(input)), computes the matrix multiplications of the LSTM. I'm pasting part of my code here; the other parts are on github (link).
class Model(object):
    [...]
    def __call__(self, inputs):
        assert inputs.shape == (vocab_size, T_steps)
        outputs = []
        for time_step in range(T_steps):
            x = inputs[:,time_step]
            x = tf.expand_dims(x,axis=1)
            z = tf.concat([self.h_prev,x],axis=0)
            f = tf.matmul(self.W_f, z) + self.b_f
            f = tf.sigmoid(f)
            i = tf.matmul(self.W_i, z) + self.b_i
            i = tf.sigmoid(i)
            o = tf.matmul(self.W_o, z) + self.b_o
            o = tf.sigmoid(o)
            C_bar = tf.matmul(self.W_C, z) + self.b_C
            C_bar = tf.tanh(C_bar)
            C = (f * self.C_prev) + (i * C_bar)
            h = o * tf.tanh(C)
            v = tf.matmul(self.W_v, h) + self.b_v
            v = tf.sigmoid(v)
            y = tf.math.softmax(v, axis=0)
            self.h_prev = h
            self.C_prev = C
            outputs.append(y)
        outputs = tf.squeeze(tf.stack(outputs,axis=1))
        return outputs
But this neural network has three problems:
1) It is very slow during training. In comparison, a model that uses tf.keras.layers.LSTM() trains more than 10 times faster. Why is this? Maybe because I used stochastic training instead of minibatch training?
2) The NN does not seem to learn anything at all. After just a few training examples, the loss settles and stops decreasing; it only oscillates around the value it reached. After training, I tested the NN by making it generate some text, but it just outputs nonsense. Why isn't it learning anything?
3) The loss function outputs very high values. I've coded a categorical cross-entropy loss function but, with a 100-character sequence, the value of the function is over 370 per training example. Shouldn't it be much lower than this?
I wrote the loss function like this:
def compute_loss(predictions, desired_outputs):
    l = 0
    for i in range(T_steps):
        l -= tf.math.log(predictions[desired_outputs[i], i])
    return l
I know these are open questions, but unfortunately I can't make it work. So any answer, even a short one that helps me solve the problem myself, is fine :)
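A rough sanity check on the magnitude in point 3: the loss above is a sum over T_steps, not a mean, so a model that predicts a uniform distribution over characters would score T_steps * ln(vocab_size). The sketch below works that number out. The vocabulary size of 40 is a made-up value for illustration only, since the question does not state the real one.

```python
import math

T_steps = 100     # sequence length mentioned in the question
vocab_size = 40   # hypothetical; the real vocabulary size is not given

# A uniform prediction gives the correct character probability 1/vocab_size,
# so each time step adds -ln(1/vocab_size) = ln(vocab_size) to the summed loss.
per_step = -math.log(1.0 / vocab_size)
summed_loss = T_steps * per_step

print(round(per_step, 2))     # 3.69
print(round(summed_loss, 1))  # 368.9
```

Under that assumption, a summed loss near 370 would be about what an untrained, near-uniform model produces. Dividing the sum by T_steps gives a per-character value that is easier to compare across sequence lengths.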

Tensorflow gradientTape explanation

I am trying to understand the tensorflow API tf.GradientTape.
Below is the code I got from the official website:
x = tf.constant(3.0)
with tf.GradientTape(persistent=True) as g:
    g.watch(x)
    y = x * x
    z = y * y
dz_dx = g.gradient(z, x) # 108.0 (4*x^3 at x = 3)
dy_dx = g.gradient(y, x) # 6.0
I wanted to know how they got dz_dx as 108 and dy_dx as 6.
I also did another test, shown below:
x = tf.constant(3.0)
with tf.GradientTape(persistent=True) as g:
    g.watch(x)
    y = x * x * x
    z = y * y
dz_dx = g.gradient(z, x) # 1458.0
dy_dx = g.gradient(y, x) # 6.0
This time dz_dx becomes 1458, and I do not know why at all. Could an expert show me how the calculation is done?
From y=x*x, we have dy/dx=2*x. From z=y*y, we have dz/dy=2*y. According to the chain rule, dz/dx=(dz/dy)*(dy/dx)=(2*y)*(2*x)=(2*x*x)*(2*x)=4*x^3=108 at x=3, and dy/dx=2*x=6. The same derivation applies to your second example: y=x*x*x gives dy/dx=3*x^2=27, so dz/dx=(2*y)*(3*x^2)=(2*x^3)*(3*x^2)=6*x^5=1458. BTW, in your second example, dy/dx should be 27 instead of 6.
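The chain-rule arithmetic for the second example can be checked in plain Python, without TensorFlow. The sketch below applies the chain rule by hand and compares it with a central finite difference; the helper z_of_x and the step size h are names and values chosen here for illustration.

```python
def z_of_x(x):
    y = x * x * x   # second example: y = x^3
    return y * y    # z = y^2 = x^6

x = 3.0
y = x * x * x
dy_dx = 3 * x * x        # 27.0
dz_dy = 2 * y            # 54.0
dz_dx = dz_dy * dy_dx    # 1458.0, chain rule

# Independent, approximate check with a central finite difference.
h = 1e-4
numeric = (z_of_x(x + h) - z_of_x(x - h)) / (2 * h)

print(dy_dx, dz_dx, numeric)
```

The finite-difference value only approximates 1458, so compare it with a tolerance and not with exact equality.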

Add a constant variable to a cuda.FloatTensor

I have two questions:
1) How can I add/subtract a constant torch.FloatTensor of size 1 to all of the elements of a torch.FloatTensor of size 30?
2) How can I multiply each element of a torch.FloatTensor of size 30 by a random value (different or not for each)?
My code:
import torch
dtype = torch.cuda.FloatTensor
def main():
    pop, xmax, xmin = 30, 5, -5
    x = (xmax-xmin)*torch.rand(pop).type(dtype)+xmin
    y = torch.pow(x, 2)
    [miny, indexmin] = y.min(0)
    gxbest = x[indexmin]
    pxbest = x
    pybest = y
    v = torch.rand(pop)
    vnext = torch.rand()*v + torch.rand()*(pxbest - x) + torch.rand()*(gxbest - x)
main()
What is the best way to do it? I think I should somehow convert gxbest into a torch.FloatTensor of size 30, but how can I do that?
I tried to create a vector:
Variable(torch.from_numpy(np.ones(pop)))*gxbest
But it did not work. The multiplication does not work either.
RuntimeError: inconsistent tensor size
Thank you all for your help!
1) How can I add/subtract a constant torch.FloatTensor of size 1 to all of the elements of a torch.FloatTensor of size 30?
You can do it directly in pytorch 0.2.
import torch
a = torch.randn(30)
b = torch.randn(1)
print(a-b)
If you get an error due to a size mismatch, you can make a small change as follows.
print(a-b.expand(a.size(0))) # to make both a and b tensor of same shape
2) How can I multiply each element of a torch.FloatTensor of size 30 by a random value (different or not for each)?
In pytorch 0.2, you can do it directly as well.
import torch
a = torch.randn(30)
b = torch.randn(1)
print(a*b)
If you get an error due to a size mismatch, do as follows.
print(a*b.expand(a.size(0)))
So, in your case, you can simply change the size of the gxbest tensor from 1 to 30 as follows.
gxbest = gxbest.expand(30)
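Putting both parts together, here is a minimal CPU-only sketch, assuming a PyTorch version with broadcasting (0.2 or later, as mentioned above). The names diff, scaled_same and scaled_each are made up for this example.

```python
import torch

pop = 30
a = torch.randn(pop)   # tensor of size 30
b = torch.randn(1)     # tensor of size 1

# Broadcasting stretches the size-1 tensor across all 30 elements.
diff = a - b

# One random factor shared by every element.
scaled_same = a * torch.rand(1)

# A different random factor for each element.
scaled_each = a * torch.rand(pop)

print(diff.shape, scaled_same.shape, scaled_each.shape)
```

Note that torch.rand is normally called with a size, so the bare torch.rand() calls in the question would likely need to become torch.rand(1) or torch.rand(pop), depending on whether one shared factor or one factor per element is wanted.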

hessian of a variable returned by tf.concat() is None

Let x and y be vectors of length N, and let z be a function z = f(x,y). In Tensorflow v1.0.0, tf.hessians(z,x) and tf.hessians(z,y) both return an N by N matrix, which is what I expected.
However, when I concatenate x and y into a vector p of size 2*N using tf.concat and run tf.hessians(z, p), it raises the error "ValueError: None values not supported."
I understand this is because the computation graph has x,y -> z and x,y -> p, so there is no gradient path between p and z. To circumvent the problem, I could create p first and slice it into x and y, but I would have to change a ton of my code. Is there a more elegant way?
Related question: Slice of a variable returns gradient None
import tensorflow as tf
import numpy as np
N = 2
A = tf.Variable(np.random.rand(N,N).astype(np.float32))
B = tf.Variable(np.random.rand(N,N).astype(np.float32))
x = tf.Variable(tf.random_normal([N]) )
y = tf.Variable(tf.random_normal([N]) )
#reshape to N by 1
x_1 = tf.reshape(x,[N,1])
y_1 = tf.reshape(y,[N,1])
#concat x and y to form a vector with length of 2*N
p = tf.concat([x,y],axis = 0)
#define the function
z = 0.5*tf.matmul(tf.matmul(tf.transpose(x_1), A), x_1) + 0.5*tf.matmul(tf.matmul(tf.transpose(y_1), B), y_1) + 100
#works , hx and hy are both N by N matrix
hx = tf.hessians(z,x)
hy = tf.hessians(z,y)
#this gives error "ValueError: None values not supported."
#expecting a matrix of size 2*N by 2*N
hp = tf.hessians(z,p)
Compute the hessian by its definition.
gxy = tf.gradients(z, [x, y])
gp = tf.concat([gxy[0], gxy[1]], axis=0)
hp = []
for i in range(2*N):
    hp.append(tf.gradients(gp[i], [x, y]))
tf.gradients computes the sum of dy/dx over all elements of y, so to compute the second partial derivatives you have to slice the gradient vector into scalars and take the gradient of each one separately. Tested on tf1.0 and python2.
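To check what the resulting 2*N by 2*N matrix should look like, here is a framework-neutral NumPy sketch (not TensorFlow). For the z in the question, the Hessian with respect to p = [x, y] is block-diagonal, with 0.5*(A + A^T) and 0.5*(B + B^T) on the diagonal and zero cross terms. The random seed, the step size h and the helper z are choices made for this example.

```python
import numpy as np

N = 2
rng = np.random.default_rng(0)
A = rng.random((N, N))
B = rng.random((N, N))

def z(p):
    # p is the concatenation [x, y]; slice it back into its two halves.
    x, y = p[:N], p[N:]
    return 0.5 * x @ A @ x + 0.5 * y @ B @ y + 100.0

# Analytic Hessian with respect to p: block-diagonal, zero cross terms.
H = np.zeros((2 * N, 2 * N))
H[:N, :N] = 0.5 * (A + A.T)
H[N:, N:] = 0.5 * (B + B.T)

# Numerical check with central second differences.
p0 = rng.standard_normal(2 * N)
h = 1e-3
I = np.eye(2 * N)
H_num = np.zeros_like(H)
for i in range(2 * N):
    for j in range(2 * N):
        H_num[i, j] = (z(p0 + h * I[i] + h * I[j]) - z(p0 + h * I[i] - h * I[j])
                       - z(p0 - h * I[i] + h * I[j]) + z(p0 - h * I[i] - h * I[j])) / (4 * h * h)

print(np.allclose(H, H_num, atol=1e-5))
```

Each entry of hp from the loop above is a pair of gradients with respect to x and y; concatenating each pair should give one row of this matrix.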