What is the difference between s[:] and s if s is a torch.Tensor?

import numpy as np
import time
import torch
import d2lzh_pytorch as d2l  # assumed: helper package from the Dive-into-DL-PyTorch project

features, labels = d2l.get_data_ch7()

def init_adam_states():
    v_w, v_b = torch.zeros((features.shape[1], 1), dtype=torch.float32), torch.zeros(1, dtype=torch.float32)
    s_w, s_b = torch.zeros((features.shape[1], 1), dtype=torch.float32), torch.zeros(1, dtype=torch.float32)
    return ((v_w, s_w), (v_b, s_b))

def adam(params, states, hyperparams):
    beta1, beta2, eps = 0.9, 0.999, 1e-6
    for p, (v, s) in zip(params, states):
        v[:] = beta1 * v + (1 - beta1) * p.grad.data
        s = beta2 * s + (1 - beta2) * p.grad.data**2
        v_bias_corr = v / (1 - beta1 ** hyperparams['t'])
        s_bias_corr = s / (1 - beta2 ** hyperparams['t'])
        p.data -= hyperparams['lr'] * v_bias_corr / (torch.sqrt(s_bias_corr) + eps)
    hyperparams['t'] += 1

def train_ch7(optimizer_fn, states, hyperparams, features, labels, batch_size=10, num_epochs=2):
    # Initialize the model
    net, loss = d2l.linreg, d2l.squared_loss
    w = torch.nn.Parameter(torch.tensor(np.random.normal(0, 0.01, size=(features.shape[1], 1)), dtype=torch.float32),
                           requires_grad=True)
    b = torch.nn.Parameter(torch.zeros(1, dtype=torch.float32), requires_grad=True)

    def eval_loss():
        return loss(net(features, w, b), labels).mean().item()

    ls = [eval_loss()]
    data_iter = torch.utils.data.DataLoader(torch.utils.data.TensorDataset(features, labels), batch_size, shuffle=True)
    for _ in range(num_epochs):
        start = time.time()
        print(w)
        print(b)
        for batch_i, (X, y) in enumerate(data_iter):
            l = loss(net(X, w, b), y).mean()  # use the mean loss
            # Zero the gradients
            if w.grad is not None:
                w.grad.data.zero_()
                b.grad.data.zero_()
            l.backward()
            optimizer_fn([w, b], states, hyperparams)  # update the model parameters
            if (batch_i + 1) * batch_size % 100 == 0:
                ls.append(eval_loss())  # record the current training loss every 100 samples
    # Print the result and plot the loss curve
    print('loss: %f, %f sec per epoch' % (ls[-1], time.time() - start))
    d2l.set_figsize()
    d2l.plt.plot(np.linspace(0, num_epochs, len(ls)), ls)
    d2l.plt.xlabel('epoch')
    d2l.plt.ylabel('loss')

train_ch7(adam, init_adam_states(), {'lr': 0.01, 't': 1}, features, labels)
I want to implement the Adam algorithm with the code above, and I am confused about the function named adam.
v = beta1 * v + (1 - beta1) * p.grad.data
s = beta2 * s + (1 - beta2) * p.grad.data**2
When I use the code above, the loss curve looks like figure 1.
figure 1
v[:] = beta1 * v + (1 - beta1) * p.grad.data
s = beta2 * s + (1 - beta2) * p.grad.data**2
or
v = beta1 * v + (1 - beta1) * p.grad.data
s[:] = beta2 * s + (1 - beta2) * p.grad.data**2
When I use either version of the code above, the loss curve looks like figure 2.
figure 2
v[:] = beta1 * v + (1 - beta1) * p.grad.data
s[:] = beta2 * s + (1 - beta2) * p.grad.data**2
When I use the code above, the loss curve looks like figure 3.
figure 3
The loss curve in case 3 is always smoother than the one in case 1.
The loss curve in case 2 sometimes doesn't converge.
Why are they different?

To answer the first question,
v = beta1 * v + (1 - beta1) * p.grad.data
is an out-of-place operation. Remember that Python variables are references to objects. Assigning a new value to the variable v does not change the object that v referred to before the assignment. Instead, the expression beta1 * v + (1 - beta1) * p.grad.data creates a new tensor, and v is rebound to refer to it.
On the other hand
v[:] = beta1 * v + (1 - beta1) * p.grad.data
is an in-place operation. After this operation v still refers to the same underlying object, and the elements of that tensor are overwritten with the values of the new tensor beta1 * v + (1 - beta1) * p.grad.data.
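A minimal illustration of the two cases. NumPy arrays follow the same rules as torch tensors for v = ... versus v[:] = ..., so this sketch uses NumPy to stay dependency-light; saved and saved_w are just hypothetical stand-ins for a second reference to the same object (such as the one held in states).

```python
import numpy as np

v = np.zeros(3)
saved = v                       # a second reference to the same array
v = 0.9 * v + 0.1 * np.ones(3)  # out-of-place: v is rebound to a brand-new array
print(saved, v is saved)        # [0. 0. 0.] False

w = np.zeros(3)
saved_w = w                     # a second reference to the same array
w[:] = 0.9 * w + 0.1 * np.ones(3)  # in-place: the existing array is overwritten
print(saved_w, w is saved_w)    # [0.1 0.1 0.1] True
```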
Take a look at the following three lines to see why this matters:
for p, (v, s) in zip(params, states):
    v[:] = beta1 * v + (1 - beta1) * p.grad.data
    s[:] = beta2 * s + (1 - beta2) * p.grad.data**2
v and s refer to the very tensors stored in states. With in-place operations, the values in states change to whatever is assigned to v[:] and s[:], so the moving averages carry over to the next call of adam.
With out-of-place operations, the values in states are never changed: they stay at the zeros created by init_adam_states, so every call starts the moving averages from scratch.
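A rough way to connect this to the three curves (an interpretation beyond the original answer, not a claim about the exact figures): the sketch below replays the adam update for a single scalar parameter with plain Python floats. adam_steps is a hypothetical helper; keep_v / keep_s = True mimics v[:] = ... / s[:] = ..., and False mimics v = ... / s = ..., where the stored state stays at zero.

```python
import math

def adam_steps(grads, lr=0.01, keep_v=True, keep_s=True,
               beta1=0.9, beta2=0.999, eps=1e-6):
    """Return the update size applied at each step for a scalar gradient sequence."""
    v_state, s_state = 0.0, 0.0   # what would live in `states`
    steps = []
    for t, g in enumerate(grads, start=1):
        v = beta1 * v_state + (1 - beta1) * g
        s = beta2 * s_state + (1 - beta2) * g ** 2
        if keep_v:
            v_state = v           # like v[:] = ...
        if keep_s:
            s_state = s           # like s[:] = ...
        v_hat = v / (1 - beta1 ** t)
        s_hat = s / (1 - beta2 ** t)
        steps.append(lr * v_hat / (math.sqrt(s_hat) + eps))
    return steps

lr = 0.01
constant = [1.0] * 5000
correct = adam_steps(constant, lr)                               # case 3
no_state = adam_steps(constant, lr, keep_v=False, keep_s=False)  # case 1
print(correct[-1], no_state[-1])  # about 0.01 versus about 0.0315

switch = [1.0] * 100 + [1e-3] * 100
full = adam_steps(switch, lr)                                    # case 3
mixed = adam_steps(switch, lr, keep_v=True, keep_s=False)        # case 2, first variant
print(full[100], mixed[100])      # about 0.009 versus roughly 87
```

With a constant gradient the correct version settles at a step of about lr, while the stateless version drifts to roughly three times lr, because the bias correction now divides the (1 - beta) factors by terms approaching 1 and every step depends only on the current gradient. In the mixed version, an accumulated first moment is divided by a second moment built from the current gradient alone, so the step right after the gradient shrinks is thousands of times larger than normal; that is one plausible way to get the non-converging runs of case 2.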

Related

How to calculate a shifted Gaussian distance map efficiently with NumPy

import numpy as np

def build_gaussian_map(s, point, sigma=25):
    x, y = point[0], point[1]
    gmap = np.zeros(s)
    for row in range(s[0]):
        for col in range(s[1]):
            gmap[row][col] = 1 / (2 * np.pi * sigma * sigma) * np.exp(-((x - row) * (x - row) + (y - col) * (y - col)) / (2 * sigma * sigma))
    return gmap
s - 2D array shape
point - point coordinates
I am calculating a Gaussian distance map centered at a given image point, point. Can I do this with matrix (vectorized) operations instead of loops?
Result map example:
import numpy as np

def build_gaussian_map(s, point, sigma=25):
    x, y = point[0], point[1]
    gmap = np.zeros(s)
    for row in range(s[0]):
        for col in range(s[1]):
            gmap[row][col] = 1 / (2 * np.pi * sigma * sigma) * np.exp(-((x - row) * (x - row) + (y - col) * (y - col)) / (2 * sigma * sigma))
    return gmap

def build_gaussian_map2(shape, point, sigma=25):
    x, y = point[0], point[1]
    # np.indices returns two arrays holding the row and the column index of every cell
    row, col = np.indices(shape)
    gmap = 1 / (2 * np.pi * sigma * sigma) * np.exp(-((x - row) * (x - row) + (y - col) * (y - col)) / (2 * sigma * sigma))
    return gmap

def main():
    s = (1000, 1000)
    result1 = build_gaussian_map(s, (100, 100))
    result2 = build_gaussian_map2(s, (100, 100))
    # Compare with a tolerance: the scalar and vectorized exp paths may differ in the last bits
    assert np.allclose(result1, result2)

main()
Profiling results:
Line #  Hits       Time    Per Hit  % Time  Line Contents
    24                                      def main():
    25     1        3.0        3.0     0.0      s = (1000, 1000)
    26     1  6126705.0  6126705.0    98.2      result1 = build_gaussian_map(s, (100, 100))
    27     1   105593.0   105593.0     1.7      result2 = build_gaussian_map2(s, (100, 100))
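import numpy as np

def gaussian_map(shape, point, sigma=20):
    a = np.arange(shape[0])
    b = np.arange(shape[1])
    # indexing='ij' makes x_grid vary along rows and gives a result of shape `shape`;
    # the default 'xy' indexing would return the transposed grid
    x_grid, y_grid = np.meshgrid(a, b, indexing='ij')
    return 1 / (2 * np.pi * sigma * sigma) * np.exp(- ((x_grid - point[0]) * (x_grid - point[0]) + (y_grid - point[1]) * (y_grid - point[1])) / (2 * sigma * sigma))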
I came up with this function; it seems to work well.
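Both versions above still call exp once per pixel. Because exp(-(a**2 + b**2) / c) equals exp(-a**2 / c) * exp(-b**2 / c), the map is separable, so a further variant (not from the thread) needs only one exp per row and one per column and combines them with an outer product. gaussian_map_separable is a hypothetical name; the formula and the row/column convention match build_gaussian_map.

```python
import numpy as np

def gaussian_map_separable(shape, point, sigma=25):
    """Same map as build_gaussian_map, computed from two 1-D Gaussian profiles."""
    x, y = point
    c = 2 * sigma * sigma
    g_rows = np.exp(-((x - np.arange(shape[0])) ** 2) / c)  # one value per row
    g_cols = np.exp(-((y - np.arange(shape[1])) ** 2) / c)  # one value per column
    # The outer product expands the two profiles into the full 2-D map.
    return np.outer(g_rows, g_cols) / (2 * np.pi * sigma * sigma)

m = gaussian_map_separable((300, 500), (100, 400))
row, col = np.unravel_index(np.argmax(m), m.shape)
print(m.shape)  # (300, 500)
# The peak sits at row 100, column 400, i.e. at `point`.
```

This also works for non-square shapes without any transposition concerns, since np.outer always returns an array of shape (len(g_rows), len(g_cols)).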

Tensorflow: Understanding tf.contrib.layers.instance_norm graph

I'm trying to understand the graph of tf.contrib.layers.instance_norm:
according to this graph:
x = gamma * (x + x_mean) / x_std - beta
but it should be
x = gamma * (x - x_mean) / x_std + beta
Am I missing something?
I figured it out: it is x = gamma * (x - x_mean) / x_std + beta, as it should be, because the operand order of the Sub op is beta - (Sub op input).
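One way to see why the Sub op has beta as its first operand, assuming the graph uses the same rewrite as tf.nn.batch_normalization, x * scale + (beta - mean * scale) with scale = gamma / std: expanding the textbook formula gives exactly that form. The NumPy sketch below checks the identity numerically on random NHWC data, normalizing each sample and channel over height and width; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4, 4, 3))   # hypothetical NHWC batch: 2 images, 4x4, 3 channels
gamma = rng.normal(size=3)          # per-channel scale
beta = rng.normal(size=3)           # per-channel offset
eps = 1e-5

# Instance norm statistics: one mean/std per sample and channel, over H and W.
mean = x.mean(axis=(1, 2), keepdims=True)
std = np.sqrt(x.var(axis=(1, 2), keepdims=True) + eps)

textbook = gamma * (x - mean) / std + beta

# Fused form: multiply x by scale, then add (beta - mean * scale).
# This subtraction has beta as its first operand.
scale = gamma / std
fused = x * scale + (beta - mean * scale)

print(np.allclose(textbook, fused))  # True
```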

Solve a system of nonlinear equations using SciPy fsolve (math domain error encountered)

I tried to use SciPy's fsolve to solve a system of two nonlinear equations.
The two equations are:
f1 = math.log(x) + 1. - ((1. + (m - 1)*x) / m) + chi * (1 - x)**2
f2 = math.log(1 - x) - (m - 1)*x + chi*m*x**2
m and chi are constants in this case. The goal is to find x and y that simultaneously satisfy f1(x) = f1(y) and f2(x) = f2(y). I know that good initial guesses for x and y are 0.3 and 0.99. Below is my code.
from scipy.optimize import fsolve
import math

# some global variables
m = 46.663
chi = 1.1500799949128826

def binodal_fsolve():
    def equations(p):
        x, y = p
        out = []
        out.append(math.log(x) + 1. - ((1. + (m - 1)*x) / m) + chi * (1 - x)**2 - (math.log(y) + 1. - ((1. + (m - 1)*y) / m) + chi * (1 - y)**2))
        out.append(math.log(1 - x) - (m - 1)*x + chi*m*x**2 - (math.log(1 - y) - (m - 1)*y + chi*m*y**2))
        return out

    initial_guess = [0.3, 0.99]
    ans = fsolve(equations, initial_guess)
    return ans

def test_answers(phiL, phiR):
    def functions(x):
        return math.log(x) + 1. - ((1. + (m - 1)*x) / m) + chi * (1 - x)**2, math.log(1 - x) - (m - 1)*x + chi*m*x**2

    return functions(phiL)[0], functions(phiR)[0], functions(phiL)[1], functions(phiR)[1]

print(test_answers(0.2542983070, 0.9999999274))
# (1.3598772108380786e-09, -1.5558330624053502e-09, -8.434988430355375, -8.435122589529684)

res = binodal_fsolve()
print(res)
When I execute the code, I always get a math domain error.
However, if I solve it with Maple's fsolve, I get the answers (0.2542983070, 0.9999999274).
By plugging these back to the equations, I get (1.3598772108380786e-09, -1.5558330624053502e-09, -8.434988430355375, -8.435122589529684) which suggests the answers are correct.
I don't know how to make scipy fsolve work. Any suggestions will be greatly appreciated.
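The math domain error comes from math.log, which raises ValueError whenever fsolve evaluates the equations at a trial point where x, y, 1 - x or 1 - y is not positive. In this case you can use the log function from numpy.lib.scimath instead, which returns a complex number when its argument is negative.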
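A minimal check of the behaviour described above: math.log refuses a negative argument, while the scimath version (also exposed as np.emath) switches to the complex branch. This is also why the full output further down contains a ComplexWarning: the residuals become complex at some trial points and are cast back to real.

```python
import math
import numpy as np

# math.log raises for arguments outside its real domain:
# this is the ValueError behind the "math domain error" in the question.
try:
    math.log(-0.5)
    raised = False
except ValueError:
    raised = True

# numpy.lib.scimath.log (np.emath.log) returns a complex result instead.
z = np.emath.log(-0.5)
print(raised, z)  # True, and a complex number equal to log(0.5) + pi*1j
```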
Instead of using scipy.optimize.fsolve, use scipy.optimize.root and change the method to lm which solves the system of nonlinear equations in a least squares sense using a modification of the Levenberg-Marquardt algorithm. For more methods, see the documentation.
from scipy.optimize import root
import numpy.lib.scimath as math

# some global variables
m = 46.663
chi = 1.1500799949128826

def binodal_fsolve():
    def equations(p):
        x, y = p
        out = []
        out.append(math.log(x) + 1. - ((1. + (m - 1)*x) / m) + chi * (1 - x)**2 - (math.log(y) + 1. - ((1. + (m - 1)*y) / m) + chi * (1 - y)**2))
        out.append(math.log(1 - x) - (m - 1)*x + chi*m*x**2 - (math.log(1 - y) - (m - 1)*y + chi*m*y**2))
        return out

    initial_guess = [0.3, 0.99]
    #ans = fsolve(equations, initial_guess)
    ans = root(equations, initial_guess, method='lm')
    return ans

def test_answers(phiL, phiR):
    def functions(x):
        return math.log(x) + 1. - ((1. + (m - 1)*x) / m) + chi * (1 - x)**2, math.log(1 - x) - (m - 1)*x + chi*m*x**2

    return functions(phiL)[0], functions(phiR)[0], functions(phiL)[1], functions(phiR)[1]

print(test_answers(0.2542983070, 0.9999999274))
# (1.3598772108380786e-09, -1.5558330624053502e-09, -8.434988430355375, -8.435122589529684)

res = binodal_fsolve()
print(res)
This gives the following roots x and y: array([0.25429812, 0.99999993]).
The full output:
(1.3598772108380786e-09, -1.5558330624053502e-09, -8.434988430355375, -8.435122589529684)
/home/user/.local/lib/python3.6/site-packages/scipy/optimize/minpack.py:401: ComplexWarning: Casting complex values to real discards the imaginary part
gtol, maxfev, epsfcn, factor, diag)
cov_x: array([[6.49303571e-01, 8.37627537e-07],
[8.37627537e-07, 1.08484856e-12]])
fjac: array([[ 1.52933340e+07, -1.00000000e+00],
[-1.97290115e+01, -1.24101235e+00]])
fun: array([-2.22945317e-07, -7.20367503e-04])
ipvt: array([2, 1], dtype=int32)
message: 'The relative error between two consecutive iterates is at most 0.000000'
nfev: 84
qtf: array([-0.00338589, 0.00022828])
status: 2
success: True
x: array([0.25429812, 0.99999993])

Octave fminunc "trust region become excessively small"

I am trying to run a linear regression using fminunc to optimize my parameters. However, while the code never fails, fminunc seems to run only once and does not converge. The exit flag that fminunc returns is -3, which, according to the documentation, means "The trust region radius became excessively small". What does this mean and how can I fix it?
This is my main script:
load('data.mat');
% returns matrix X, a matrix of data
% Initialize parameters
[m, n] = size(X);
X = [ones(m, 1), X];
initialTheta = zeros(n + 1, 1);
alpha = 1;
lambda = 0;
costfun = @(t) costFunction(t, X, surv, lambda, alpha);
options = optimset('GradObj', 'on', 'MaxIter', 1000);
[theta, cost, info] = fminunc(costfun, initialTheta, options);
And the cost function:
function [J, grad] = costFunction(theta, X, y, lambda, alpha)
  %COSTFUNCTION Implements a logistic regression cost function.
  %   [J grad] = COSTFUNCTION(initialParameters, X, y, lambda) computes the cost
  %   and the gradient for the logistic regression.
  %
  m = size(X, 1);
  J = 0;
  grad = zeros(size(theta));

  % un-regularized
  z = X * theta;
  J = (-1 / m) * y' * log(sigmoid(z)) + (1 - y)' * log(1 - sigmoid(z));
  grad = (alpha / m) * X' * (sigmoid(z) - y);

  % regularization
  theta(1) = 0;
  J = J + (lambda / (2 * m)) * (theta' * theta);
  grad = grad + alpha * ((lambda / m) * theta);
endfunction
Any help is much appreciated.
There are a few issues with the code above:
Because fminunc chooses its own step sizes, you don't need to provide an alpha. Remove all instances of it from the code, and your gradient lines should look like the following:
grad = (1 / m) * X' * (sigmoid(z) - y);
and
grad = grad + ((lambda / m) * theta); % This isn't quite correct, see below
In the regularization of the gradient you can't use theta as-is, because the term for j = 0 (theta(1)) must not be regularized. There are a number of ways to do this; here is one:
temp = theta;
temp(1) = 0;
grad = grad + ((lambda / m) * temp);
You are missing a set of parentheses in your cost function: the (-1 / m) is applied only to the first term of the expression. It should look like this:
J = (-1 / m) * ( y' * log(sigmoid(z)) + (1 - y)' * log(1 - sigmoid(z)) );
And finally, as a nit, a lambda value of 0 means that your regularization does nothing.
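Why these fixes matter for the exit flag (a reading beyond the original answer): with 'GradObj' set to 'on', fminunc trusts the gradient returned by costFunction, and if that gradient is not the gradient of J (as with the misplaced parentheses; with alpha = 1 the extra alpha factor happens to be harmless, but any other value would also break the match), the optimizer's local model is wrong, which is a plausible reason for the trust region to keep shrinking. A finite-difference gradient check catches this. The sketch below is in Python/NumPy rather than Octave; cost_function, buggy_cost and max_gradient_error are hypothetical names, and the data is random.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_function(theta, X, y, lam):
    """Corrected logistic cost and gradient (no alpha, bias not regularized)."""
    m = X.shape[0]
    h = sigmoid(X @ theta)
    J = (-1.0 / m) * (y @ np.log(h) + (1 - y) @ np.log(1 - h))
    reg = theta.copy()
    reg[0] = 0.0  # theta(1) in Octave: excluded from regularization
    J += (lam / (2 * m)) * (reg @ reg)
    grad = (1.0 / m) * X.T @ (h - y) + (lam / m) * reg
    return J, grad

def buggy_cost(theta, X, y):
    """The original cost line: (-1/m) multiplies only the first term."""
    m = X.shape[0]
    h = sigmoid(X @ theta)
    return (-1.0 / m) * (y @ np.log(h)) + (1 - y) @ np.log(1 - h)

def max_gradient_error(fun, grad, theta, step=1e-6):
    """Largest gap between an analytic gradient and central finite differences."""
    numeric = np.zeros_like(theta)
    for j in range(theta.size):
        e = np.zeros_like(theta)
        e[j] = step
        numeric[j] = (fun(theta + e) - fun(theta - e)) / (2 * step)
    return np.max(np.abs(numeric - grad))

rng = np.random.default_rng(0)
X = np.hstack([np.ones((50, 1)), rng.normal(size=(50, 2))])
y = (rng.random(50) < 0.5).astype(float)
theta = 0.1 * rng.normal(size=3)

_, g = cost_function(theta, X, y, 1.0)
err_ok = max_gradient_error(lambda t: cost_function(t, X, y, 1.0)[0], g, theta)
_, g0 = cost_function(theta, X, y, 0.0)
err_bad = max_gradient_error(lambda t: buggy_cost(t, X, y), g0, theta)
print(err_ok, err_bad)  # tiny versus large: the buggy cost does not match the gradient
```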

Quaternion addition like 3ds/gmax does with its quats

A project I'm working on needs a function which mimics 3ds/gmax's quaternion addition. A test case of (quat 1 2 3 4)+(quat 3 5 7 9) should equal (quat 20 40 54 2). These quats are in xyzw.
So, I figure it's basic algebra, given the clean numbers. It's got to be something like this multiply function, since it doesn't involve sin/cos:
const quaternion &operator *=(const quaternion &q)
{
    float x = v.x, y = v.y, z = v.z, sn = s*q.s - v*q.v;
    v.x = y*q.v.z - z*q.v.y + s*q.v.x + x*q.s;
    v.y = z*q.v.x - x*q.v.z + s*q.v.y + y*q.s;
    v.z = x*q.v.y - y*q.v.x + s*q.v.z + z*q.s;
    s = sn;
    return *this;
}
source
But I don't understand how sn = s*q.s - v*q.v is supposed to work. s is a float and v is a vector. Multiply vectors and add the result to a float?
I'm not even sure which terms of direction/rotation/orientation these values represent, but if the function satisfies the quat values above, it'll work.
Found it. Turns out to be known as multiplication. Addition is multiplication. Up is sideways. Not confusing at all :/
fn qAdd q1 q2 = (
    x1 = q1.x
    y1 = q1.y
    z1 = q1.z
    w1 = q1.w
    x2 = q2.x
    y2 = q2.y
    z2 = q2.z
    w2 = q2.w
    W = (W1 * W2) - (X1 * X2) - (Y1 * Y2) - (Z1 * Z2)
    X = (W1 * X2) + (X1 * W2) + (Y1 * Z2) - (Z1 * Y2)
    Y = (W1 * Y2) + (Y1 * W2) + (Z1 * X2) - (X1 * Z2)
    Z = (W1 * Z2) + (Z1 * W2) + (X1 * Y2) - (Y1 * X2)
    return (quat x y z w)
)
Swapping q1 and q2 yields different results, quite unlike ordinary addition or multiplication.
source
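The qAdd function above is the standard Hamilton product q1 * q2. Written in scalar/vector form it also explains the sn = s*q.s - v*q.v line from the C++ snippet: there, v*q.v must be the dot product, so the scalar part is s1*s2 minus v1 · v2, while the vector part is s1*v2 + s2*v1 + v1 × v2. The cross product term is what makes the operation non-commutative. The Python sketch below (quat_mul, dot and cross are hypothetical helper names) stores quats as (x, y, z, w) like the question and reproduces the test case.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def quat_mul(q1, q2):
    """Hamilton product q1 * q2 for quaternions stored as (x, y, z, w)."""
    v1, s1 = q1[:3], q1[3]
    v2, s2 = q2[:3], q2[3]
    s = s1 * s2 - dot(v1, v2)  # scalar part: sn = s*q.s - v*q.v
    c = cross(v1, v2)
    v = tuple(s1 * b + s2 * a + ci for a, b, ci in zip(v1, v2, c))
    return v + (s,)

print(quat_mul((1, 2, 3, 4), (3, 5, 7, 9)))  # (20, 40, 54, 2)
print(quat_mul((3, 5, 7, 9), (1, 2, 3, 4)))  # (22, 36, 56, 2)
```

The two results differ by exactly 2 * (v1 × v2) in the vector part, which is why swapping q1 and q2 changes the answer.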