How to use @tf.custom_gradient with multiple outputs and inputs? - tensorflow

I have a function with 4 inputs (x1, x2, x3, x4) and 2 outputs (y1, y2) using TensorFlow. I would like to specify the gradients myself, since I perform some non-autodiff operations inside the function.
I need to specify the derivatives of the outputs with respect to the inputs. These derivatives can be seen as a Jacobian of size (2, 4), so I have 8 of them: dy1_dx1, dy1_dx2, dy1_dx3, dy1_dx4, dy2_dx1, dy2_dx2, dy2_dx3 and dy2_dx4.
However, the grad function used with tf.custom_gradient must return as many gradients as there are inputs, that is, 4. So I do not know how to express 8 derivatives using just 4 elements. I tried to include them as lists, but that gives an error. Here is a minimal code to reproduce the error:
import tensorflow as tf

@tf.custom_gradient
def bar(x1, x2, x3, x4):
    def grad(dy1, dy2):
        dy1_dx1 = x2**2 * x3**3 * x4**4         # 360000
        dy1_dx2 = x1 * 2*x2 * x3**3 * x4**4     # 480000
        dy1_dx3 = x1 * x2**2 * 3*x3**2 * x4**4  # 540000
        dy1_dx4 = x1 * x2**2 * x3**3 * 4*x4**3  # 576000
        dy2_dx1 = x2**2 + x3**3 + x4**4         # 698
        dy2_dx2 = x1 + 2*x2 + x3**3 + x4**4     # 697
        dy2_dx3 = x1 + x2**2 + 3*x3**2 + x4**4  # 684
        dy2_dx4 = x1 + x2**2 + x3**3 + 4*x4**3  # 575
        return [dy1_dx1, dy2_dx1], [dy1_dx2, dy2_dx2], [dy1_dx3, dy2_dx3], [dy1_dx4, dy2_dx4]
    y1 = x1 * x2**2 * x3**3 * x4**4
    y2 = x1 + x2**2 + x3**3 + x4**4
    return [y1, y2], grad
x1 = tf.constant(2.0, dtype=tf.float32)
x2 = tf.constant(3.0, dtype=tf.float32)
x3 = tf.constant(4.0, dtype=tf.float32)
x4 = tf.constant(5.0, dtype=tf.float32)

with tf.GradientTape(persistent=True) as tape:
    tape.watch(x1)
    tape.watch(x2)
    tape.watch(x3)
    tape.watch(x4)
    z = bar(x1, x2, x3, x4)

print(tape.gradient(z, x1))  # [dy1_dx1, dy2_dx1]
print(tape.gradient(z, x2))  # [dy1_dx2, dy2_dx2]
print(tape.gradient(z, x3))  # [dy1_dx3, dy2_dx3]
print(tape.gradient(z, x4))  # [dy1_dx4, dy2_dx4]
The error says: "custom_gradient function expected to return 4 gradients, but returned 8 instead".
I expect there is some way to specify the corresponding 8 derivatives. Thank you in advance!
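For reference, a minimal sketch of the standard tf.custom_gradient contract as it applies here: grad receives one upstream gradient per output (dy1, dy2) and returns one value per input, namely the vector-Jacobian product dx_i = dy1 * dy1_dx_i + dy2 * dy2_dx_i, rather than the raw (2, 4) Jacobian. Note this sketch uses the analytic partials of y2 (1, 2*x2, 3*x3**2, 4*x4**3), which differ from the expressions computed in the question:

import tensorflow as tf

@tf.custom_gradient
def bar(x1, x2, x3, x4):
    y1 = x1 * x2**2 * x3**3 * x4**4
    y2 = x1 + x2**2 + x3**3 + x4**4
    def grad(dy1, dy2):
        # One vector-Jacobian product per input: the 8 partials are folded
        # into 4 values by the upstream gradients dy1 and dy2.
        dx1 = dy1 * (x2**2 * x3**3 * x4**4)        + dy2 * 1.0
        dx2 = dy1 * (x1 * 2*x2 * x3**3 * x4**4)    + dy2 * 2*x2
        dx3 = dy1 * (x1 * x2**2 * 3*x3**2 * x4**4) + dy2 * 3*x3**2
        dx4 = dy1 * (x1 * x2**2 * x3**3 * 4*x4**3) + dy2 * 4*x4**3
        return dx1, dx2, dx3, dx4
    return [y1, y2], grad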

Related

what is the difference between s[:] and s if s is a torch.Tensor [duplicate]

import numpy as np
import time
import torch

# d2l: companion utilities of the book "Dive into Deep Learning"
features, labels = d2l.get_data_ch7()

def init_adam_states():
    v_w, v_b = torch.zeros((features.shape[1], 1), dtype=torch.float32), torch.zeros(1, dtype=torch.float32)
    s_w, s_b = torch.zeros((features.shape[1], 1), dtype=torch.float32), torch.zeros(1, dtype=torch.float32)
    return ((v_w, s_w), (v_b, s_b))

def adam(params, states, hyperparams):
    beta1, beta2, eps = 0.9, 0.999, 1e-6
    for p, (v, s) in zip(params, states):
        v[:] = beta1 * v + (1 - beta1) * p.grad.data
        s = beta2 * s + (1 - beta2) * p.grad.data**2
        v_bias_corr = v / (1 - beta1 ** hyperparams['t'])
        s_bias_corr = s / (1 - beta2 ** hyperparams['t'])
        p.data -= hyperparams['lr'] * v_bias_corr / (torch.sqrt(s_bias_corr) + eps)
    hyperparams['t'] += 1

def train_ch7(optimizer_fn, states, hyperparams, features, labels, batch_size=10, num_epochs=2):
    # Initialize the model
    net, loss = d2l.linreg, d2l.squared_loss
    w = torch.nn.Parameter(torch.tensor(np.random.normal(0, 0.01, size=(features.shape[1], 1)), dtype=torch.float32),
                           requires_grad=True)
    b = torch.nn.Parameter(torch.zeros(1, dtype=torch.float32), requires_grad=True)

    def eval_loss():
        return loss(net(features, w, b), labels).mean().item()

    ls = [eval_loss()]
    data_iter = torch.utils.data.DataLoader(torch.utils.data.TensorDataset(features, labels), batch_size, shuffle=True)
    for _ in range(num_epochs):
        start = time.time()
        print(w)
        print(b)
        for batch_i, (X, y) in enumerate(data_iter):
            l = loss(net(X, w, b), y).mean()  # use the mean loss
            # Zero the gradients
            if w.grad is not None:
                w.grad.data.zero_()
                b.grad.data.zero_()
            l.backward()
            optimizer_fn([w, b], states, hyperparams)  # update the model parameters
            if (batch_i + 1) * batch_size % 100 == 0:
                ls.append(eval_loss())  # record the current training loss every 100 examples
    # Print the result and plot
    print('loss: %f, %f sec per epoch' % (ls[-1], time.time() - start))
    d2l.set_figsize()
    d2l.plt.plot(np.linspace(0, num_epochs, len(ls)), ls)
    d2l.plt.xlabel('epoch')
    d2l.plt.ylabel('loss')

train_ch7(adam, init_adam_states(), {'lr': 0.01, 't': 1}, features, labels)
I want to implement the Adam algorithm in the code above, and I am confused by the function named adam.
v = beta1 * v + (1 - beta1) * p.grad.data
s = beta2 * s + (1 - beta2) * p.grad.data**2
When I use the code above, the loss curve is shown in figure 1.
figure 1
v[:] = beta1 * v + (1 - beta1) * p.grad.data
s = beta2 * s + (1 - beta2) * p.grad.data**2
or
v = beta1 * v + (1 - beta1) * p.grad.data
s[:] = beta2 * s + (1 - beta2) * p.grad.data**2
When I use either of the two snippets above, the loss curve is shown in figure 2.
figure 2
v[:] = beta1 * v + (1 - beta1) * p.grad.data
s[:] = beta2 * s + (1 - beta2) * p.grad.data**2
When I use the code above, the loss curve is shown in figure 3.
figure 3
The loss curve in case 3 has always been smoother than the one in case 1, and the loss curve in case 2 sometimes fails to converge. Why are they different?
To answer the first question,
v = beta1 * v + (1 - beta1) * p.grad.data
is an out-of-place operation. Remember that Python variables are references to objects. Assigning a new value to the variable v does not change the underlying object that v referred to before the assignment; instead, the expression beta1 * v + (1 - beta1) * p.grad.data produces a new tensor, which v then refers to.
On the other hand,
v[:] = beta1 * v + (1 - beta1) * p.grad.data
is an in-place operation. After this operation v still refers to the same underlying object, and the elements of that tensor are modified and replaced with the values of the new tensor beta1 * v + (1 - beta1) * p.grad.data.
Take a look at the following 3 lines to see why this matters:
for p, (v, s) in zip(params, states):
    v[:] = beta1 * v + (1 - beta1) * p.grad.data
    s[:] = beta2 * s + (1 - beta2) * p.grad.data**2
v and s actually refer to tensors that are stored in states. If we do in-place operations, the values in states are updated to reflect the values assigned to v[:] and s[:].
If out-of-place operations are used, the values in states remain unchanged.
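A tiny self-contained demonstration of the difference:

import torch

states = [torch.zeros(3)]
v = states[0]

v = v + 1.0       # out-of-place: v is rebound to a brand-new tensor
print(states[0])  # tensor([0., 0., 0.]) -- the tensor inside states is untouched

v = states[0]
v[:] = v + 1.0    # in-place: the elements of the original tensor are overwritten
print(states[0])  # tensor([1., 1., 1.]) -- the change is visible through states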

streamlit and optimisation function

Hey, I'm working on a Streamlit application and I have a problem with the optimisation part: the minimizer just returns the initial guess as its result, and I don't know why. Here is my function:
def cons1(k):
    y1 = k[0]
    y2 = k[1]
    return y2 - y1 - 0.0000000000000000000000000000000000000000000001

def cons2(k):
    y1 = k[0]
    y2 = k[1]
    return y2 - s - 0.0000000000000000000000000000000000000000000001

def cons3(k):
    y1 = k[0]
    y2 = k[1]
    return s - y1 - 0.0000000000000000000000000000000000000000000001

def arrivé(k):
    y1 = k[0]
    y2 = k[1]
    return (BS_PUT(s, y2, T1, rd, rf, sigma1) - BS_CALL(s, y1, T1, rd, rf, sigma1)) ** 2

guess = [s, s]
cons3 = ({'type': 'ineq', 'fun': cons1,'type':'ineq','fun': cons2, 'type':'ineq','fun': cons3})
optimize2 = sci_opt.minimize(arrivé, guess, method='SLSQP', constraints=cons3, options={'disp': True})
this = optimize2.x[0]
that = optimize2.x[1]
#st.markdown(this)
#st.markdown(that)
g = BS_CALL(s, this, T1, rd, rf, sigma1)
#st.markdown(g)
f = BS_PUT(s, that, T1, rd, rf, sigma1)
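One likely culprit, for what it's worth: the cons3 = ({...}) line builds a single dict whose duplicated 'type'/'fun' keys collapse to the last pair, so only one constraint survives (and the name cons3 now shadows the constraint function of the same name). scipy.optimize.minimize expects a sequence of constraint dicts, one per constraint. A minimal sketch, reusing cons1/cons2/cons3, arrivé and guess from the code above:

import scipy.optimize as sci_opt

# One dict per constraint; for SLSQP, 'ineq' means fun(x) >= 0.
constraints = (
    {'type': 'ineq', 'fun': cons1},
    {'type': 'ineq', 'fun': cons2},
    {'type': 'ineq', 'fun': cons3},
)
optimize2 = sci_opt.minimize(arrivé, guess, method='SLSQP',
                             constraints=constraints, options={'disp': True})

Separately, note that subtracting 1e-46 from values anywhere near the scale of typical prices vanishes in double-precision rounding, so those safety margins have no effect.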

Write STL_File from Matplotlib Mesh

My program works with matplotlib and calculates envelopes in the y-z section. This can be done for several sections, and the result is plotted as a 3D plot in a diagram. It is also quite easy to show an envelope as a wireframe.
Now I want to write an STL file. For this I need a mesh. I have found a module named numpy-stl that provides this functionality, but I still need to create the mesh.
I have also found out that matplotlib already includes functionality for creating a mesh (triangulation).
Does someone have experience with this? Here is my code so far (a sketch of the numpy-stl step follows after it). Data comes from a dictionary of the form: {'X' : {'Y' : [], 'Z' : []}}.
def lines_draw(self):
    self.plot_data.clear()
    self.plot_data = publish('3D_DATA_PLOT')
    self.plot_Handling = []
    a = []
    x_keys = []
    x_keys_collect1 = []
    x_keys_collect2 = []
    #print(self.plot_data)
    _big_list_X = []
    _big_list_Y = []
    _big_list_Z = []
    big_list_X = []
    big_list_Y = []
    big_list_Z = []
    # Draw the section-lines
    for key, val in self.plot_data.items():
        a.clear()
        for i in range(len(self.plot_data[key]['Y'])):
            a.append(key)
        x = np.array(a)
        y = np.array(self.plot_data[key]['Y'])
        z = np.array(self.plot_data[key]['Z'])
        x = x.astype(np.float)
        y = y.astype(np.float)
        z = z.astype(np.float)
        linesdraw, = self.ax.plot(x, y, z)
        self.plot_Handling.append(linesdraw)
    x_keys = list(self.plot_data)
    dict_length = len(x_keys)
    a = 1
    # Draw the wireframes
    for i in range(dict_length-1):
        x_keys_collect1.clear()
        x_keys_collect2.clear()
        for xi in self.plot_data[x_keys[i]]['Y']:
            x_keys_collect1.append(x_keys[i])
            a = a + 1
        for xi in self.plot_data[x_keys[i+1]]['Y']:
            x_keys_collect2.append(x_keys[i+1])
        x1 = np.array(x_keys_collect1)
        y1 = np.array(self.plot_data[x_keys[i]]['Y'])
        z1 = np.array(self.plot_data[x_keys[i]]['Z'])
        x2 = np.array(x_keys_collect2)
        y2 = np.array(self.plot_data[x_keys[i+1]]['Y'])
        z2 = np.array(self.plot_data[x_keys[i+1]]['Z'])
        x1 = x1.astype(np.float)
        y1 = y1.astype(np.float)
        z1 = z1.astype(np.float)
        x2 = x2.astype(np.float)
        y2 = y2.astype(np.float)
        z2 = z2.astype(np.float)
        # i1, h1 = np.meshgrid(np.arange(a-1), np.linspace(x1[0],x2[0],5))
        # print(np.linspace(x1[0],x2[0],5))
        # i1 = i1.astype(np.int)
        # h1 = h1.astype(np.int)
        # X = (y2[i1] - y1[i1]) / (x2 - x1) * (h1 - x1) + y1[i1]
        # Y = (z2[i1] - z1[i1]) / (x2 - x1) * (h1 - x1) + z1[i1]
        # self.ax.plot_surface(h1, X, Y, color='m', alpha=0.3, linewidth=0)
        # y1 = -y1.astype(np.float)
        # y2 = -y2.astype(np.float)
        # i1, h1 = np.meshgrid(np.arange(a-1), np.linspace(x1[0],x2[0],5))
        # i1 = i1.astype(np.int)
        # h1 = h1.astype(np.int)
        # X = (y2[i1] - y1[i1]) / (x2 - x1) * (h1 - x1) + y1[i1]
        # Y = (z2[i1] - z1[i1]) / (x2 - x1) * (h1 - x1) + z1[i1]
        # self.ax.plot_surface(h1, X, Y, color='m', alpha=0.3, linewidth=0)
        big_list_X = np.array(x_keys_collect1 + x_keys_collect2)
        big_list_Y = np.array(self.plot_data[x_keys[i]]['Y'] + self.plot_data[x_keys[i+1]]['Y'])
        big_list_Z = np.array(self.plot_data[x_keys[i]]['Z'] + self.plot_data[x_keys[i+1]]['Z'])
        big_list_X = big_list_X.astype(np.float)
        big_list_Y = big_list_Y.astype(np.float)
        big_list_Z = big_list_Z.astype(np.float)
        print(big_list_X)
        # print(big_list_Y)
        # print(big_list_Z)
        # big_list_X, big_list_Z = np.meshgrid(big_list_X, big_list_Z)
        # big_list_X, big_list_Z = big_list_X.flatten(), big_list_Z.flatten()
        # tri = mtri.Triangulation(_big_list_X, _big_list_Z)
        print(big_list_X)
        # print(big_list_Y)
        # print(big_list_Z)
        self.ax.plot_trisurf(big_list_X, big_list_Y, big_list_Z, cmap=cm.jet, linewidth=0)  # ,triangles=tri.triangles)  # ,cmap=plt.cm.Spectral)
        a = 1
    # tri = mtri.Triangulation(X, Y)
    # self.ax.plot_trisurf(h1, X, Y, triangles=tri.triangles, cmap=plt.cm.Spectral)
    self.canvas.draw()
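Regarding the numpy-stl step: a minimal sketch, assuming numpy-stl is installed (imported as stl), that the flattened coordinate arrays built above (big_list_X/Y/Z) describe one surface patch, and that triangulating over (x, y) is acceptable, which is what plot_trisurf does by default. write_stl is a hypothetical helper name:

import numpy as np
import matplotlib.tri as mtri
from stl import mesh  # numpy-stl

def write_stl(x, y, z, filename='envelope.stl'):
    # Triangulate over (x, y), as plot_trisurf does when no triangles are given.
    tri = mtri.Triangulation(x, y)
    # One record per triangle; numpy-stl stores the three corner points
    # of triangle i in stl_mesh.vectors[i].
    stl_mesh = mesh.Mesh(np.zeros(len(tri.triangles), dtype=mesh.Mesh.dtype))
    for i, t in enumerate(tri.triangles):
        for j in range(3):
            stl_mesh.vectors[i][j] = [x[t[j]], y[t[j]], z[t[j]]]
    stl_mesh.save(filename)

# e.g. write_stl(big_list_X, big_list_Y, big_list_Z) for one pair of sections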

Quaternion addition like 3ds/gmax does with its quats

A project I'm working on needs a function which mimics 3ds/gmax's quaternion addition. A test case of (quat 1 2 3 4)+(quat 3 5 7 9) should equal (quat 20 40 54 2). These quats are in xyzw.
So, I figure it's basic algebra, given the clean numbers. It's got to be something like this multiply function, since it doesn't involve sin/cos:
const quaternion &operator *=(const quaternion &q)
{
    float x = v.x, y = v.y, z = v.z, sn = s*q.s - v*q.v;
    v.x = y*q.v.z - z*q.v.y + s*q.v.x + x*q.s;
    v.y = z*q.v.x - x*q.v.z + s*q.v.y + y*q.s;
    v.z = x*q.v.y - y*q.v.x + s*q.v.z + z*q.s;
    s = sn;
    return *this;
}
source
But I don't understand how sn = s*q.s - v*q.v is supposed to work: s is a float, v is a vector. Multiply vectors and add the result to a float?
I'm not even sure which terms of direction/rotation/orientation these values represent, but if the function satisfies the quat values above, it'll work.
Found it. Turns out to be known as multiplication. Addition is multiplication. Up is sideways. Not confusing at all :/
fn qAdd q1 q2 = (
    x1 = q1.x
    y1 = q1.y
    z1 = q1.z
    w1 = q1.w
    x2 = q2.x
    y2 = q2.y
    z2 = q2.z
    w2 = q2.w
    w = (w1 * w2) - (x1 * x2) - (y1 * y2) - (z1 * z2)
    x = (w1 * x2) + (x1 * w2) + (y1 * z2) - (z1 * y2)
    y = (w1 * y2) + (y1 * w2) + (z1 * x2) - (x1 * z2)
    z = (w1 * z2) + (z1 * w2) + (x1 * y2) - (y1 * x2)
    return (quat x y z w)
)
Swapping q1 and q2 yields a different result, which resembles neither addition nor multiplication.
source
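A quick check of the test case, as a minimal Python sketch of the same Hamilton product (components in xyzw order):

def q_mul(q1, q2):
    # Hamilton product; each quaternion is an (x, y, z, w) tuple.
    x1, y1, z1, w1 = q1
    x2, y2, z2, w2 = q2
    w = w1*w2 - x1*x2 - y1*y2 - z1*z2
    x = w1*x2 + x1*w2 + y1*z2 - z1*y2
    y = w1*y2 + y1*w2 + z1*x2 - x1*z2
    z = w1*z2 + z1*w2 + x1*y2 - y1*x2
    return (x, y, z, w)

print(q_mul((1, 2, 3, 4), (3, 5, 7, 9)))  # (20, 40, 54, 2), the expected result
print(q_mul((3, 5, 7, 9), (1, 2, 3, 4)))  # (22, 36, 56, 2): the product is not commutative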

Generating a random Gaussian double in Objective-C/C

I'm trying to generate a random Gaussian double in Objective-C (the same as random.nextGaussian in Java). However, rand_gauss() doesn't seem to work. Does anyone know a way of achieving this?
This link shows how to calculate it using the standard random() function.
I should note that you'll likely have to write the ranf() routine that converts the output of random() from [0, MAX_INT] to [0, 1], but that shouldn't be too difficult.
From the linked article:
The polar form of the Box-Muller transformation is both faster and more robust numerically. The algorithmic description of it is:
float x1, x2, w, y1, y2;

do {
    x1 = 2.0 * ranf() - 1.0;
    x2 = 2.0 * ranf() - 1.0;
    w = x1 * x1 + x2 * x2;
} while ( w >= 1.0 );

w = sqrt( (-2.0 * log( w )) / w );  /* natural log: the article writes ln() */
y1 = x1 * w;
y2 = x2 * w;                        /* y1 and y2 are two independent N(0,1) deviates */
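For comparison, a minimal Python sketch of the same polar method, with random.random() standing in for ranf():

import math
import random

def gauss_pair():
    # Polar Box-Muller: returns two independent N(0, 1) deviates.
    while True:
        x1 = 2.0 * random.random() - 1.0
        x2 = 2.0 * random.random() - 1.0
        w = x1 * x1 + x2 * x2
        if 0.0 < w < 1.0:
            break
    w = math.sqrt(-2.0 * math.log(w) / w)
    return x1 * w, x2 * w

print(gauss_pair())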