Octave fminunc "trust region become excessively small" - optimization

I am trying to run a linear regression using fminunc to optimize my parameters. However, while the code never fails, fminunc seems to run only once and not converge. The exit flag that fminunc returns is -3, which, according to the documentation, means "The trust region radius became excessively small". What does this mean and how can I fix it?
This is my main:
load('data.mat');
% returns matrix X, a matrix of data

% Initialize parameters
[m, n] = size(X);
X = [ones(m, 1), X];
initialTheta = zeros(n + 1, 1);
alpha = 1;
lambda = 0;

costfun = @(t) costFunction(t, X, surv, lambda, alpha);
options = optimset('GradObj', 'on', 'MaxIter', 1000);
[theta, cost, info] = fminunc(costfun, initialTheta, options);
And the cost function:
function [J, grad] = costFunction(theta, X, y, lambda, alpha)
  %COSTFUNCTION Implements a logistic regression cost function.
  %   [J grad] = COSTFUNCTION(initialParameters, X, y, lambda) computes the cost
  %   and the gradient for the logistic regression.
  %
  m = size(X, 1);
  J = 0;
  grad = zeros(size(theta));

  % un-regularized
  z = X * theta;
  J = (-1 / m) * y' * log(sigmoid(z)) + (1 - y)' * log(1 - sigmoid(z));
  grad = (alpha / m) * X' * (sigmoid(z) - y);

  % regularization
  theta(1) = 0;
  J = J + (lambda / (2 * m)) * (theta' * theta);
  grad = grad + alpha * ((lambda / m) * theta);
endfunction
Any help is much appreciated.

There are a few issues with the code above:
Using fminunc means you don't have to provide an alpha (the algorithm chooses its own step sizes). Remove all instances of it from the code, and your gradient computations should look like the following
grad = (1 / m) * X' * (sigmoid(z) - y);
and
grad = grad + ((lambda / m) * theta); % This isn't quite correct, see below
In the regularization of the grad, you can't use theta directly, because the theta for j = 0 must not be regularized. There are a number of ways to do this, but here is one
temp = theta;
temp(1) = 0;
grad = grad + ((lambda / m) * temp);
You are missing a set of brackets in your cost function: the (-1 / m) is being applied to only a portion of the expression. It should look like
J = (-1 / m) * ( y' * log(sigmoid(z)) + (1 - y)' * log(1 - sigmoid(z)) );
And finally, as a nit, a lambda value of 0 means that your regularization does nothing.
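Putting those fixes together, here is a minimal sketch of the corrected cost function. It is written in Python/NumPy for illustration rather than Octave, and the function and helper names (cost_function, sigmoid) are my own:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_function(theta, X, y, lam):
    # regularized logistic-regression cost and gradient; no alpha needed
    m = X.shape[0]
    h = sigmoid(X @ theta)
    temp = theta.copy()
    temp[0] = 0  # do not regularize the intercept term
    J = ((-1.0 / m) * (y @ np.log(h) + (1 - y) @ np.log(1 - h))
         + (lam / (2.0 * m)) * (temp @ temp))
    grad = (1.0 / m) * X.T @ (h - y) + (lam / m) * temp
    return J, grad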

Related

Is there an R**2 value for the linear regression analysis?

I'm trying to code a linear regression analysis, but it prints TypeError: can't multiply sequence by non-int of type 'list'.
I am trying to compute the linear regression correlation coefficient:
import numpy as np

def corr_coef(x, y):
    N = len(x)
    num = (N * (x * y).sum()) - (x.sum() * y.sum())
    den = np.sqrt((N * (x**2).sum() - x.sum()**2) * (N * (y**2).sum() - y.sum()**2))
    R = num / den
    return R
num = (N * (x * y).sum()) - (x.sum() * y.sum())
TypeError: can't multiply sequence by non-int of type 'list'
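The error comes from x * y when x and y are plain Python lists (a sequence can only be multiplied by an int). A minimal sketch of a likely fix, assuming the inputs can be converted to NumPy arrays:

import numpy as np

def corr_coef(x, y):
    # convert plain lists to arrays so elementwise * and ** work
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    N = len(x)
    num = (N * (x * y).sum()) - (x.sum() * y.sum())
    den = np.sqrt((N * (x**2).sum() - x.sum()**2) * (N * (y**2).sum() - y.sum()**2))
    return num / den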

what is the difference between s[:] and s if s is a torch.Tensor [duplicate]

import numpy as np
import time
import torch
# `d2l` refers to the "Dive into Deep Learning" helper package this code assumes
features, labels = d2l.get_data_ch7()

def init_adam_states():
    v_w, v_b = torch.zeros((features.shape[1], 1), dtype=torch.float32), torch.zeros(1, dtype=torch.float32)
    s_w, s_b = torch.zeros((features.shape[1], 1), dtype=torch.float32), torch.zeros(1, dtype=torch.float32)
    return ((v_w, s_w), (v_b, s_b))

def adam(params, states, hyperparams):
    beta1, beta2, eps = 0.9, 0.999, 1e-6
    for p, (v, s) in zip(params, states):
        v[:] = beta1 * v + (1 - beta1) * p.grad.data
        s = beta2 * s + (1 - beta2) * p.grad.data**2
        v_bias_corr = v / (1 - beta1 ** hyperparams['t'])
        s_bias_corr = s / (1 - beta2 ** hyperparams['t'])
        p.data -= hyperparams['lr'] * v_bias_corr / (torch.sqrt(s_bias_corr) + eps)
    hyperparams['t'] += 1

def train_ch7(optimizer_fn, states, hyperparams, features, labels, batch_size=10, num_epochs=2):
    # Initialize the model
    net, loss = d2l.linreg, d2l.squared_loss
    w = torch.nn.Parameter(torch.tensor(np.random.normal(0, 0.01, size=(features.shape[1], 1)), dtype=torch.float32),
                           requires_grad=True)
    b = torch.nn.Parameter(torch.zeros(1, dtype=torch.float32), requires_grad=True)
    def eval_loss():
        return loss(net(features, w, b), labels).mean().item()
    ls = [eval_loss()]
    data_iter = torch.utils.data.DataLoader(torch.utils.data.TensorDataset(features, labels), batch_size, shuffle=True)
    for _ in range(num_epochs):
        start = time.time()
        print(w)
        print(b)
        for batch_i, (X, y) in enumerate(data_iter):
            l = loss(net(X, w, b), y).mean()  # use the average loss
            # zero the gradients
            if w.grad is not None:
                w.grad.data.zero_()
                b.grad.data.zero_()
            l.backward()
            optimizer_fn([w, b], states, hyperparams)  # update the model parameters
            if (batch_i + 1) * batch_size % 100 == 0:
                ls.append(eval_loss())  # record the training error every 100 examples
        # print the result and plot
        print('loss: %f, %f sec per epoch' % (ls[-1], time.time() - start))
    d2l.set_figsize()
    d2l.plt.plot(np.linspace(0, num_epochs, len(ls)), ls)
    d2l.plt.xlabel('epoch')
    d2l.plt.ylabel('loss')

train_ch7(adam, init_adam_states(), {'lr': 0.01, 't': 1}, features, labels)
I want to implement the Adam algorithm in the code above, and I'm confused by the function named adam.
v = beta1 * v + (1 - beta1) * p.grad.data
s = beta2 * s + (1 - beta2) * p.grad.data**2
When I use the code above, the loss function curve is figure 1.
figure 1
v[:] = beta1 * v + (1 - beta1) * p.grad.data
s = beta2 * s + (1 - beta2) * p.grad.data**2
or
v = beta1 * v + (1 - beta1) * p.grad.data
s[:] = beta2 * s + (1 - beta2) * p.grad.data**2
When I use the code above, the loss function curve is figure 2.
figure 2
v[:] = beta1 * v + (1 - beta1) * p.grad.data
s[:] = beta2 * s + (1 - beta2) * p.grad.data**2
When I use the code above, the loss function curve is figure 3.
figure 3
The loss function curve in case 3 is always smoother than the one in case 1.
The loss function curve in case 2 sometimes fails to converge.
Why are they different?
To answer the first question,
v = beta1 * v + (1 - beta1) * p.grad.data
is an out-of-place operation. Remember that Python variables are references to objects. By assigning a new value to the variable v, the underlying object which v referred to before this assignment is not changed. Instead, the expression beta1 * v + (1 - beta1) * p.grad.data produces a new tensor, which v then refers to.
On the other hand
v[:] = beta1 * v + (1 - beta1) * p.grad.data
is an in-place operation. After this operation v still refers to the same underlying object, and the elements of that tensor are modified and replaced with the values of the new tensor beta1 * v + (1 - beta1) * p.grad.data.
Take a look at the following 3 lines to see why this matters
for p, (v, s) in zip(params, states):
    v[:] = beta1 * v + (1 - beta1) * p.grad.data
    s[:] = beta2 * s + (1 - beta2) * p.grad.data**2
v and s are actually referring to tensors which are stored in states. If we do in-place operations then the values in states are changed to reflect the value assigned to v[:] and s[:].
If out-of-place operations are used then the values in states remain unchanged.
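A small self-contained demonstration of this aliasing difference (plain PyTorch, nothing else assumed):

import torch

state = torch.zeros(3)

v = state
v = v + 1.0     # out-of-place: v is rebound to a brand-new tensor
print(state)    # tensor([0., 0., 0.]): the original storage is unchanged

v = state
v[:] = v + 1.0  # in-place: writes through to the original storage
print(state)    # tensor([1., 1., 1.]): the original storage is updated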

solve a system of nonlinear equations using scipy fsolve (math domain error encountered)

I tried to use Scipy's fsolve to find the answers to a system of two nonlinear equations.
The two equations are:
f1 = math.log(x) + 1. - ((1. + (m - 1)*x) / m) + chi * (1 - x)**2
f2 = math.log(1 - x) - (m - 1)*x + chi*m*x**2
m and chi are constants in this case. The essential goal is to find x, y that simultaneously satisfy f1(x) = f1(y) and f2(x) = f2(y). I know the initial guesses for x and y are 0.3 and 0.99. Below is my code.
from scipy.optimize import fsolve
import math

# some global variables
m = 46.663
chi = 1.1500799949128826

def binodal_fsolve():
    def equations(p):
        x, y = p
        out = []
        out.append(math.log(x) + 1. - ((1. + (m - 1)*x) / m) + chi * (1 - x)**2 - (math.log(y) + 1. - ((1. + (m - 1)*y) / m) + chi * (1 - y)**2))
        out.append(math.log(1 - x) - (m - 1)*x + chi*m*x**2 - (math.log(1 - y) - (m - 1)*y + chi*m*y**2))
        return out
    initial_guess = [0.3, 0.99]
    ans = fsolve(equations, initial_guess)
    return ans

def test_answers(phiL, phiR):
    def functions(x):
        return math.log(x) + 1. - ((1. + (m - 1)*x) / m) + chi * (1 - x)**2, math.log(1 - x) - (m - 1)*x + chi*m*x**2
    return functions(phiL)[0], functions(phiR)[0], functions(phiL)[1], functions(phiR)[1]

print(test_answers(0.2542983070, 0.9999999274))
# (1.3598772108380786e-09, -1.5558330624053502e-09, -8.434988430355375, -8.435122589529684)

res = binodal_fsolve()
print(res)
When I execute the code, I always encounter a math domain error.
However, if I solve it using Maple's fsolve, I get the answers (0.2542983070, 0.9999999274).
Plugging these back into the equations gives (1.3598772108380786e-09, -1.5558330624053502e-09, -8.434988430355375, -8.435122589529684), which suggests the answers are correct.
I don't know how to make scipy's fsolve work. Any suggestions will be greatly appreciated.
In this case you can use the log function from numpy.lib.scimath, which returns a complex number when its argument is negative.
Also, instead of using scipy.optimize.fsolve, use scipy.optimize.root and set the method to lm, which solves the system of nonlinear equations in a least-squares sense using a modification of the Levenberg-Marquardt algorithm. For more methods, see the documentation.
from scipy.optimize import root
import numpy.lib.scimath as math

# some global variables
m = 46.663
chi = 1.1500799949128826

def binodal_fsolve():
    def equations(p):
        x, y = p
        out = []
        out.append(math.log(x) + 1. - ((1. + (m - 1)*x) / m) + chi * (1 - x)**2 - (math.log(y) + 1. - ((1. + (m - 1)*y) / m) + chi * (1 - y)**2))
        out.append(math.log(1 - x) - (m - 1)*x + chi*m*x**2 - (math.log(1 - y) - (m - 1)*y + chi*m*y**2))
        return out
    initial_guess = [0.3, 0.99]
    # ans = fsolve(equations, initial_guess)
    ans = root(equations, initial_guess, method='lm')
    return ans

def test_answers(phiL, phiR):
    def functions(x):
        return math.log(x) + 1. - ((1. + (m - 1)*x) / m) + chi * (1 - x)**2, math.log(1 - x) - (m - 1)*x + chi*m*x**2
    return functions(phiL)[0], functions(phiR)[0], functions(phiL)[1], functions(phiR)[1]

print(test_answers(0.2542983070, 0.9999999274))
# (1.3598772108380786e-09, -1.5558330624053502e-09, -8.434988430355375, -8.435122589529684)

res = binodal_fsolve()
print(res)
This gives the following roots x and y: array([0.25429812, 0.99999993]).
The full output:
(1.3598772108380786e-09, -1.5558330624053502e-09, -8.434988430355375, -8.435122589529684)
/home/user/.local/lib/python3.6/site-packages/scipy/optimize/minpack.py:401: ComplexWarning: Casting complex values to real discards the imaginary part
gtol, maxfev, epsfcn, factor, diag)
cov_x: array([[6.49303571e-01, 8.37627537e-07],
[8.37627537e-07, 1.08484856e-12]])
fjac: array([[ 1.52933340e+07, -1.00000000e+00],
[-1.97290115e+01, -1.24101235e+00]])
fun: array([-2.22945317e-07, -7.20367503e-04])
ipvt: array([2, 1], dtype=int32)
message: 'The relative error between two consecutive iterates is at most 0.000000'
nfev: 84
qtf: array([-0.00338589, 0.00022828])
status: 2
success: True
x: array([0.25429812, 0.99999993])

Look-at quaternion using up vector

I have a camera (in a custom 3D engine) that accepts a quaternion for the rotation transform. I have two 3D points representing a camera and an object to look at. I want to calculate the quaternion that looks from the camera to the object, while respecting the world up axis.
This question asks for the same thing without the "up" vector. All three answers result in the camera pointing in the correct direction, but rolling (as in yaw/pitch/roll; imagine leaning your head onto your ear while looking at something).
I can calculate an orthonormal basis of vectors that match the desired coordinate system by:
lookAt = normalize(target - camera)
sideaxis = cross(lookAt, worldUp)
rotatedup = cross(sideaxis, lookAt)
How can I create a quaternion from those three vectors? This question asks for the same thing...but unfortunately the only and accepted answer says ~"let's assume you don't care about roll", and then goes about ignoring the up axis. I do care about roll. I don't want to ignore the up axis.
A previous answer has given a valid solution using angles. This answer will present an alternative method.
The orthonormal basis vectors, renaming them F = lookAt, R = sideaxis, U = rotatedup, directly form the columns of the 3x3 rotation matrix which is equivalent to your desired quaternion:
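    | R.x  U.x  F.x |
M = | R.y  U.y  F.y |
    | R.z  U.z  F.z |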
Multiplication with a vector is equivalent to using said vector's components as the coordinates in the camera's basis.
A 3x3 rotation matrix can be converted into a quaternion without conversion to angles / use of costly trigonometric functions. Below is a numerically stable C++ snippet which does this, returning a normalized quaternion:
inline void CalculateRotation( Quaternion& q ) const {
    float trace = a[0][0] + a[1][1] + a[2][2];
    if( trace > 0 ) {
        float s = 0.5f / sqrtf(trace + 1.0f);
        q.w = 0.25f / s;
        q.x = ( a[2][1] - a[1][2] ) * s;
        q.y = ( a[0][2] - a[2][0] ) * s;
        q.z = ( a[1][0] - a[0][1] ) * s;
    } else {
        if ( a[0][0] > a[1][1] && a[0][0] > a[2][2] ) {
            float s = 2.0f * sqrtf( 1.0f + a[0][0] - a[1][1] - a[2][2]);
            q.w = (a[2][1] - a[1][2] ) / s;
            q.x = 0.25f * s;
            q.y = (a[0][1] + a[1][0] ) / s;
            q.z = (a[0][2] + a[2][0] ) / s;
        } else if (a[1][1] > a[2][2]) {
            float s = 2.0f * sqrtf( 1.0f + a[1][1] - a[0][0] - a[2][2]);
            q.w = (a[0][2] - a[2][0] ) / s;
            q.x = (a[0][1] + a[1][0] ) / s;
            q.y = 0.25f * s;
            q.z = (a[1][2] + a[2][1] ) / s;
        } else {
            float s = 2.0f * sqrtf( 1.0f + a[2][2] - a[0][0] - a[1][1] );
            q.w = (a[1][0] - a[0][1] ) / s;
            q.x = (a[0][2] + a[2][0] ) / s;
            q.y = (a[1][2] + a[2][1] ) / s;
            q.z = 0.25f * s;
        }
    }
}
Source: http://www.euclideanspace.com/maths/geometry/rotations/conversions/matrixToQuaternion
Converting this to suit your situation is of course just a matter of swapping the matrix elements with the corresponding vector components:
// your code from before
F = normalize(target - camera);   // lookAt
R = normalize(cross(F, worldUp)); // sideaxis
U = cross(R, F);                  // rotatedup

// note that R needed to be re-normalized,
// since F and worldUp are not necessarily perpendicular,
// so we must remove the sin(angle) factor of the cross product;
// the same is not needed for U because dot(R, F) = 0

// adapted source
Quaternion q;
double trace = R.x + U.y + F.z;
if (trace > 0.0) {
    double s = 0.5 / sqrt(trace + 1.0);
    q.w = 0.25 / s;
    q.x = (U.z - F.y) * s;
    q.y = (F.x - R.z) * s;
    q.z = (R.y - U.x) * s;
} else {
    if (R.x > U.y && R.x > F.z) {
        double s = 2.0 * sqrt(1.0 + R.x - U.y - F.z);
        q.w = (U.z - F.y) / s;
        q.x = 0.25 * s;
        q.y = (U.x + R.y) / s;
        q.z = (F.x + R.z) / s;
    } else if (U.y > F.z) {
        double s = 2.0 * sqrt(1.0 + U.y - R.x - F.z);
        q.w = (F.x - R.z) / s;
        q.x = (U.x + R.y) / s;
        q.y = 0.25 * s;
        q.z = (F.y + U.z) / s;
    } else {
        double s = 2.0 * sqrt(1.0 + F.z - R.x - U.y);
        q.w = (R.y - U.x) / s;
        q.x = (F.x + R.z) / s;
        q.y = (F.y + U.z) / s;
        q.z = 0.25 * s;
    }
}
(And needless to say swap y and z if you're using OpenGL.)
Assume you initially have three orthonormal vectors: worldUp, worldFront and worldSide, and let's use your equations for lookAt, sideAxis and rotatedUp. The worldSide vector will not be necessary to achieve the result.
Break the operation in two. First, rotate around worldUp. Then rotate around sideAxis, which will now actually be parallel to the rotated worldSide.
Axis1 = worldUp
Angle1 = (see below)
Axis2 = cross(lookAt, worldUp) = sideAxis
Angle2 = (see below)
Each of these rotations correspond to a quaternion using:
Q = cos(Angle/2) + i * Axis_x * sin(Angle/2) + j * Axis_y * sin(Angle/2) + k * Axis_z * sin(Angle/2)
Multiply both Q1 and Q2 and you get the desired quaternion.
Details for the angles:
Let P(worldUp) be the projection matrix onto the worldUp direction, i.e., P(worldUp).v = dot(worldUp, v).worldUp, or using kets and bras, P(worldUp) = |worldUp >< worldUp|. Let I be the identity matrix.
Project lookAt in the plane perpendicular to worldUp and normalize it.
tmp1 = (I - P(worldUp)).lookAt
n1 = normalize(tmp1)
Angle1 = arccos(dot(worldFront,n1))
Angle2 = arccos(dot(lookAt,n1))
EDIT1:
Notice that there is no need to compute transcendental functions. Since the dot product of a pair of normalized vectors is the cosine of the angle between them, assuming that cos(t) = x we have the trigonometric identities:
cos(t/2) = sqrt((1 + x)/2)
sin(t/2) = sqrt((1 - x)/2)
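For illustration, here is a minimal Python sketch of this construction (the helper names are mine; it assumes each axis is already normalized and that the angle lies in [0, pi], so both square roots are non-negative; the sign of each rotation still has to be picked, e.g. from a cross product):

import math

def quat_from_axis_cos(axis, cos_angle):
    # Q = cos(t/2) + sin(t/2) * (i*Axis_x + j*Axis_y + k*Axis_z), without calling acos
    w = math.sqrt((1.0 + cos_angle) / 2.0)
    s = math.sqrt((1.0 - cos_angle) / 2.0)
    return (w, axis[0] * s, axis[1] * s, axis[2] * s)

def quat_mul(a, b):
    # Hamilton product, components in (w, x, y, z) order
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)

# q1 from (worldUp, Angle1), q2 from (sideAxis, Angle2);
# q = quat_mul(q2, q1) applies the worldUp rotation first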
If somebody is searching for a C# version that handles every matrix edge case (not input edge cases!), here it is:
public static SoftQuaternion LookRotation(SoftVector3 forward, SoftVector3 up)
{
    forward = SoftVector3.Normalize(forward);

    // First matrix column
    SoftVector3 sideAxis = SoftVector3.Normalize(SoftVector3.Cross(up, forward));
    // Second matrix column
    SoftVector3 rotatedUp = SoftVector3.Cross(forward, sideAxis);
    // Third matrix column
    SoftVector3 lookAt = forward;

    // Sums of the matrix main diagonal elements
    SoftFloat trace1 = SoftFloat.One + sideAxis.X - rotatedUp.Y - lookAt.Z;
    SoftFloat trace2 = SoftFloat.One - sideAxis.X + rotatedUp.Y - lookAt.Z;
    SoftFloat trace3 = SoftFloat.One - sideAxis.X - rotatedUp.Y + lookAt.Z;

    // If the orthonormal vectors form an identity matrix, then return the identity rotation
    if (trace1 + trace2 + trace3 < SoftMath.CalculationsEpsilon)
    {
        return Identity;
    }

    // Choose the largest diagonal
    if (trace1 + SoftMath.CalculationsEpsilon > trace2 && trace1 + SoftMath.CalculationsEpsilon > trace3)
    {
        SoftFloat s = SoftMath.Sqrt(trace1) * (SoftFloat)2.0f;
        return new SoftQuaternion(
            (SoftFloat)0.25f * s,
            (rotatedUp.X + sideAxis.Y) / s,
            (lookAt.X + sideAxis.Z) / s,
            (rotatedUp.Z - lookAt.Y) / s);
    }
    else if (trace2 + SoftMath.CalculationsEpsilon > trace1 && trace2 + SoftMath.CalculationsEpsilon > trace3)
    {
        SoftFloat s = SoftMath.Sqrt(trace2) * (SoftFloat)2.0f;
        return new SoftQuaternion(
            (rotatedUp.X + sideAxis.Y) / s,
            (SoftFloat)0.25f * s,
            (lookAt.Y + rotatedUp.Z) / s,
            (lookAt.X - sideAxis.Z) / s);
    }
    else
    {
        SoftFloat s = SoftMath.Sqrt(trace3) * (SoftFloat)2.0f;
        return new SoftQuaternion(
            (lookAt.X + sideAxis.Z) / s,
            (lookAt.Y + rotatedUp.Z) / s,
            (SoftFloat)0.25f * s,
            (sideAxis.Y - rotatedUp.X) / s);
    }
}
This implementation is based on a deeper understanding of this conversation, and was tested for many edge-case scenarios.
P.S.
The quaternion's constructor is (x, y, z, w).
SoftFloat is a software float type, so you can easily change it to the built-in float if needed.
For a fully edge-case-safe implementation (including input) check this repo.
lookAt
sideaxis
rotatedup
If you normalize these 3 vectors, they are the components of a 3x3 rotation matrix. So just convert that rotation matrix to a quaternion.

how to zoom mandelbrot set

I have successfully implemented the Mandelbrot set as described in the Wikipedia article, but I do not know how to zoom into a specific section. This is the code I am using:
+(void)createSetWithWidth:(int)width Height:(int)height Thing:(void(^)(int, int, int, int))thing
{
    for (int i = 0; i < height; ++i)
        for (int j = 0; j < width; ++j)
        {
            double x0 = ((4.0f * (i - (height / 2))) / (height)) - 0.0f;
            double y0 = ((4.0f * (j - (width / 2))) / (width)) + 0.0f;
            double x = 0.0f;
            double y = 0.0f;
            int iteration = 0;
            int max_iteration = 15;
            while ((((x * x) + (y * y)) <= 4.0f) && (iteration < max_iteration))
            {
                double xtemp = ((x * x) - (y * y)) + x0;
                y = ((2.0f * x) * y) + y0;
                x = xtemp;
                iteration += 1;
            }
            thing(j, i, iteration, max_iteration);
        }
}
It was my understanding that x0 should be in the range -2.5 to 1 and y0 should be in the range -1 to 1, and that reducing those ranges would zoom, but that didn't really work at all. How can I zoom?
Suppose the center is (cx, cy) and the lengths you want to display are (lx, ly); then you can use the following scaling formulas:
x0 = cx + (i/width - 0.5)*lx;
y0 = cy + (j/width - 0.5)*ly;
What this does is first scale the pixel down to the unit interval (0 <= i/width < 1), then shift the center (-0.5 <= i/width - 0.5 < 0.5), then scale up to your desired dimension (-0.5*lx <= (i/width - 0.5)*lx < 0.5*lx). Finally, it shifts the result to the center you gave.
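To make the mapping concrete, here is a minimal Python sketch of the same escape-time loop with this center/length parameterization (the example centers and lengths are arbitrary values of mine, not from the question; each pixel index is divided by its own image dimension):

import numpy as np

def mandelbrot(width, height, cx, cy, lx, ly, max_iteration=200):
    # iteration count per pixel; zoom by shrinking lx and ly around (cx, cy)
    counts = np.zeros((height, width), dtype=int)
    for i in range(height):
        for j in range(width):
            x0 = cx + (j / width - 0.5) * lx    # horizontal pixel -> real axis
            y0 = cy + (i / height - 0.5) * ly   # vertical pixel -> imaginary axis
            x = y = 0.0
            iteration = 0
            while x * x + y * y <= 4.0 and iteration < max_iteration:
                x, y = x * x - y * y + x0, 2.0 * x * y + y0
                iteration += 1
            counts[i, j] = iteration
    return counts

# whole set:              mandelbrot(400, 400, -0.75, 0.0, 3.5, 3.5)
# a 10x zoom, for example: mandelbrot(400, 400, -0.75, 0.1, 0.35, 0.35)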
First off, with a max_iteration of 15 you're not going to see much detail. Mine uses 1000 iterations per point as a baseline, and can go to about 8000 iterations before it really gets too slow to wait for.
This might help: http://jc.unternet.net/src/java/com/jcomeau/Mandelbrot.java
This too: http://www.wikihow.com/Plot-the-Mandelbrot-Set-By-Hand