Adam optimizer in MatConvNet

I tried implementing Adam instead of the default SGD optimizer by changing the following code in cnn_train from:
opts.solver = [] ;  % empty array means use the default SGD solver
[opts, varargin] = vl_argparse(opts, varargin) ;
if ~isempty(opts.solver)
  assert(isa(opts.solver, 'function_handle') && nargout(opts.solver) == 2, ...
    'Invalid solver; expected a function handle with two outputs.') ;
  % Call without input arguments, to get default options
  opts.solverOpts = opts.solver() ;
end
to:
opts.solver = 'adam';
[opts, varargin] = vl_argparse(opts, varargin) ;
opts.solverOpts = opts.solver() ;
However, I get an error:
Insufficient number of outputs from right hand side of equal sign to satisfy assignment.
Error in cnn_train>accumulateGradients (line 508)
params.solver(net.layers{l}.weights{j}, state.solverState{l}{j}, ...
Have any of you tried changing from the default solver? What else should I change in cnn_train?
The code for the Adam function:
function [w, state] = adam(w, state, grad, opts, lr)
%ADAM
% Adam solver for use with CNN_TRAIN and CNN_TRAIN_DAG
%
%   See [Kingma et al., 2014](http://arxiv.org/abs/1412.6980)
%   | ([pdf](http://arxiv.org/pdf/1412.6980.pdf)).
%
% If called without any input argument, returns the default options
% structure. Otherwise provide all input arguments.
%
% W is the vector/matrix/tensor of parameters. It can be single/double
% precision and can be a `gpuArray`.
%
% STATE is as defined below and so are supported OPTS.
%
% GRAD is the gradient of the objective w.r.t W
%
% LR is the learning rate, referred to as \alpha by Algorithm 1 in
% [Kingma et al., 2014].
%
% Solver options: (opts.train.solverOpts)
%
% `beta1`:: 0.9
%    Decay rate for the 1st moment vector. See Algorithm 1 in [Kingma et al., 2014]
%
% `beta2`:: 0.999
%    Decay rate for the 2nd moment vector
%
% `eps`:: 1e-8
%    Additive offset when dividing by state.v
%
% The state is initialized as 0 (number) to start with. The first call to
% this function will initialize it with the default state consisting of
%
% `m`:: 0
%    First moment vector
%
% `v`:: 0
%    Second moment vector
%
% `t`:: 0
%    Global iteration number across epochs
%
% This implementation is borrowed from torch's optim.adam.
% Copyright (C) 2016 Aravindh Mahendran.
% All rights reserved.
%
% This file is part of the VLFeat library and is made available under
% the terms of the BSD license (see the COPYING file).
if nargin == 0  % return the default solver options
  w = struct('beta1', 0.9, 'beta2', 0.999, 'eps', 1e-8) ;
  return ;
end

if isequal(state, 0)  % start off with state = 0 so as to get the default state
  state = struct('m', 0, 'v', 0, 't', 0) ;
end
% update first moment vector `m`
state.m = opts.beta1 * state.m + (1 - opts.beta1) * grad ;
% update second moment vector `v`
state.v = opts.beta2 * state.v + (1 - opts.beta2) * grad.^2 ;
% update the time step
state.t = state.t + 1 ;
% This implicitly corrects for biased estimates of first and second moment
% vectors
lr_t = lr * (((1 - opts.beta2^state.t)^0.5) / (1 - opts.beta1^state.t)) ;
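% Equivalently (up to how eps enters), this matches the explicit
% bias-corrected form of Algorithm 1:
%   m_hat = state.m / (1 - opts.beta1^state.t) ;
%   v_hat = state.v / (1 - opts.beta2^state.t) ;
%   w     = w - lr * m_hat ./ (sqrt(v_hat) + opts.eps) ;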
% Update `w`
w = w - lr_t * state.m ./ (state.v.^0.5 + opts.eps) ;

"Insufficient number of outputs from right hand side of equal sign to satisfy assignment."
It seems your number of outputs doesn't match what's required by cnn_train.
Could you show your adam function?
In the latest version of MatConvNet, the solver is called as:
[net.layers{l}.weights{j}, state.solverState{l}{j}] = ...
    params.solver(net.layers{l}.weights{j}, state.solverState{l}{j}, ...
    parDer, params.solverOpts, thisLR) ;
It seems to match your adam function.
Why don't you try this:
opts.solver = @adam ;
instead of opts.solver = 'adam';
With a string, params.solver(...) just tries to index the character array, which can only produce a single output, hence the "Insufficient number of outputs" error; cnn_train needs a function handle it can call with two outputs.

Related

Is there an R**2 value to be found in linear regression analysis?

I'm trying to code a linear regression analysis, but it raises TypeError: can't multiply sequence by non-int of type 'list'. I am trying to compute the correlation coefficient:
import numpy as np

def corr_coef(x, y):
    N = len(x)
    num = (N * (x * y).sum()) - (x.sum() * y.sum())
    den = np.sqrt((N * (x**2).sum() - x.sum()**2) * (N * (y**2).sum() - y.sum()**2))
    R = num / den
    return R
num = (N * (x * y).sum()) - (x.sum() * y.sum())
TypeError: can't multiply sequence by non-int of type 'list'
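The message means x and y are plain Python lists, so x * y is attempted as sequence repetition rather than elementwise multiplication. A minimal sketch of the usual fix, assuming x and y arrive as lists (the sample inputs below are illustrative): convert them to NumPy arrays first. For simple linear regression, squaring the returned r gives R**2.
import numpy as np

def corr_coef(x, y):
    x = np.asarray(x, dtype=float)  # arrays make * and ** elementwise
    y = np.asarray(y, dtype=float)
    N = len(x)
    num = (N * (x * y).sum()) - (x.sum() * y.sum())
    den = np.sqrt((N * (x**2).sum() - x.sum()**2) * (N * (y**2).sum() - y.sum()**2))
    return num / den

r = corr_coef([1, 2, 3, 4], [2, 4, 5, 9])
print(r, r**2)  # correlation coefficient r, and r**2 = R**2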

z3py: Symbolic expressions cannot be cast to concrete Boolean values

I'm having trouble defining the objective function in an SMT problem with z3py.
Long story short, I have to optimize the placement of smaller blocks inside a board that has a fixed width but variable height.
I have an array of coordinates (each represented by an array of two integers) and a list of integers (representing the height of each block to place).
# [x, y] list of integer variables
P = [[Int("x_%s" % (i + 1)), Int("y_%s" % (i + 1))]
     for i in range(blocks)]
y = [int(b) for a, b in data[2:]]
I defined the objective function like this:
obj= Int(max([P[i][1] + y[i] for i in range(blocks)]))
It calculates the max height of the board given the starting coordinate of the blocks and their heights.
I know it could be better, but I think the problem would be the same even with a different definition.
Anyway, if I run my code, the following error occurs on the line of the objective function:
" raise Z3Exception("Symbolic expressions cannot be cast to concrete Boolean values.") "
While debugging I've seen that it's P[i][1] that gives the error, and I think it's because the program reads "y_i + 3" (for example) and the two can't be added together.
Point is: it's obvious that the objective function depends on the variables of the problem, so how can I get rid of this error? Is there another place where I should define the objective function, so that it waits for the P array to be instantiated before doing anything?
Full code:
from z3 import *
from math import ceil

width = 8
blocks = 4
x = [3, 3, 5, 5]
y = [3, 5, 3, 5]
height = ceil(sum([x[i] * y[i] for i in range(blocks)]) / width) + 1

# [blocks x 2] list of integer variables
P = [[Int("x_%s" % (i + 1)), Int("y_%s" % (i + 1))]
     for i in range(blocks)]

# value / domain constraints
values = [And(0 <= P[i][0], P[i][0] <= width - 1, 0 <= P[i][1], P[i][1] <= height - 1)
          for i in range(blocks)]

obj = Int(max([P[i][1] + y[i] for i in range(blocks)]))

board_problem = values  # other constraints I've not included for brevity

o = Optimize()
o.add(board_problem)
o.minimize(obj)

if (o.check == 'unsat'):
    print("The problem is unsatisfiable")
else:
    print("Solved")
The problem here is that you're calling Python's built-in max on symbolic values; it is not designed to work on symbolic expressions. Instead, define a symbolic version of max and use that:
# Return the maximum of a vector; error if empty
def symMax(vs):
    m = vs[0]
    for v in vs[1:]:
        m = If(v > m, v, m)
    return m

obj = symMax([P[i][1] + y[i] for i in range(blocks)])
With this change your program will go through and print Solved when run.
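As a quick illustration of what symMax builds, it just nests If terms. Separately, note that the question's final test compares the method o.check itself against the string 'unsat' without calling it; the usual pattern calls check() and compares against z3's sat value. A small self-contained sketch of both points (a, b, c are illustrative variables, not from the question):
from z3 import Int, If, Optimize, sat

def symMax(vs):
    m = vs[0]
    for v in vs[1:]:
        m = If(v > m, v, m)
    return m

a, b, c = Int('a'), Int('b'), Int('c')
print(symMax([a, b, c]))  # prints a nested If, e.g. If(c > If(b > a, b, a), c, If(b > a, b, a))

o = Optimize()
o.add(a >= 1, b >= 2, c >= 3)
o.minimize(symMax([a, b, c]))
if o.check() == sat:      # call check() and compare with z3's sat constant
    print(o.model())
else:
    print("The problem is unsatisfiable")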

Why is the result of trigonometric function calculation different?

I calculated the mean angle with NumPy in the three ways below.
To avoid issues with circular periodicity, I restricted the range to 0 to +180 degrees.
The calculation results of the three methods should match.
However, all calculation results are different.
Why is this?
import random
import numpy as np

degAry = []
sumDeg = 0
cosRad = 0
sinRad = 0
LEN = 300
RAD2DEG = 180.0 / np.pi  # 57.2957795

for i in range(LEN):
    deg = random.uniform(0, 180)
    rad = np.deg2rad(deg)
    degAry.append(deg)
    sumDeg += deg
    cosRad += np.cos(rad)
    sinRad += np.sin(rad)

print(np.arctan2(sinRad/LEN, cosRad/LEN) * RAD2DEG)  # 88.39325364335279
print(np.sum(degAry)/LEN)  # 88.75448888951954
print(sumDeg/LEN)  # 88.75448888951951
What makes you think that the mean angle and the angle of the mean vector should be the same? This is correct only for n = 1, 2. For n = 3, degAry = [0, 90, 90] is easily verified to be a counterexample: the mean of the angles is 60, with tan = sqrt(3), while the mean vector is (1/3, 2/3), corresponding to tan = 2.
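A quick NumPy check of that counterexample (nothing here beyond the numbers above):
import numpy as np

deg = np.array([0.0, 90.0, 90.0])
rad = np.deg2rad(deg)
print(deg.mean())  # 60.0, the mean of the angles
mean_vec_angle = np.degrees(np.arctan2(np.sin(rad).mean(), np.cos(rad).mean()))
print(mean_vec_angle)  # ~63.43, the angle of the mean vector (tan = 2)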
EDIT
See Mean of circular quantities, which suggests that the sin/cos approach is best.
Refactoring your code to use NumPy exclusively: the methods are indeed different. The first two, using RAD2DEG or np.degrees, yield the same results; the latter, which uses the sum of degrees divided by the sample size, differs.
It doesn't appear to be a summation-order issue (N = 3000; summing in the original order, ascending, and descending all yield the same result):
np.sum(deg) # 134364.25172174018
np.sum(np.sort(deg)) # 134364.25172174018
np.sum(np.sort(deg)[::-1]) # 134364.25172174018
I didn't carry it out with the summation for the cos and sin in radian form. I will leave that for others.
import numpy as np

PI = np.pi
sumDeg = 0.
cosRad = 0.
sinRad = 0.
N = 30
RAD2DEG = 180.0 / PI  # 57.2957795

deg = np.random.uniform(0, 90.0, N)
rad = np.deg2rad(deg)
sumDeg = np.sum(deg)
cosRad = np.sum(np.cos(rad))
sinRad = np.sum(np.sin(rad))

print(np.arctan2(sinRad/N, cosRad/N) * RAD2DEG)
print(np.degrees(np.arctan2(sinRad/N, cosRad/N)))
print(sumDeg/N)
Results for
> N = 1
> 22.746571717879792
> 22.746571717879792
> 22.746571717879792
>
> N= 30
> 48.99636699165551
> 48.99636699165551
> 49.000295118106884
>
> N = 300
> 44.39333460088003
> 44.39333460088003
> 44.44513528547155
>
> N = 3000
> 44.984167020219175
> 44.984167020219175
> 44.97574462726241

Octave fminunc "trust region become excessively small"

I am trying to run a linear regression using fminunc to optimize my parameters. However, while the code never fails, the fminunc function seems to only run once and not converge. The exit flag that fminunc returns is -3, which, according to the documentation, means "The trust region radius became excessively small". What does this mean and how can I fix it?
This is my main:
load('data.mat');
% returns matrix X, a matrix of data

% Initialize parameters
[m, n] = size(X);
X = [ones(m, 1), X];
initialTheta = zeros(n + 1, 1);
alpha = 1;
lambda = 0;

costfun = @(t) costFunction(t, X, surv, lambda, alpha);
options = optimset('GradObj', 'on', 'MaxIter', 1000);
[theta, cost, info] = fminunc(costfun, initialTheta, options);
And the cost function:
function [J, grad] = costFunction(theta, X, y, lambda, alpha)
%COSTFUNCTION Implements a logistic regression cost function.
%   [J, grad] = COSTFUNCTION(theta, X, y, lambda, alpha) computes the cost
%   and the gradient for the logistic regression.

m = size(X, 1);
J = 0;
grad = zeros(size(theta));

% un-regularized
z = X * theta;
J = (-1 / m) * y' * log(sigmoid(z)) + (1 - y)' * log(1 - sigmoid(z));
grad = (alpha / m) * X' * (sigmoid(z) - y);

% regularization
theta(1) = 0;
J = J + (lambda / (2 * m)) * (theta' * theta);
grad = grad + alpha * ((lambda / m) * theta);

endfunction
Any help is much appreciated.
There are a few issues with the code above:
Using fminunc means you don't have to provide an alpha; the optimizer picks its own step sizes. Remove all instances of it from the code, and your gradient computations should look like the following:
grad = (1 / m) * X' * (sigmoid(z) - y);
and
grad = grad + ((lambda / m) * theta); % This isn't quite correct, see below
In the regularization of grad, you can't use theta directly, because the theta for j = 0 must not be regularized. There are a number of ways to do this, but here is one:
temp = theta;
temp(1) = 0;
grad = grad + ((lambda / m) * temp);
You are missing a set of brackets in your cost function: the (-1 / m) is being applied to only a portion of the equation. It should look like:
J = (-1 / m) * ( y' * log(sigmoid(z)) + (1 - y)' * log(1 - sigmoid(z)) );
And finally, as a nit, a lambda value of 0 means that your regularization does nothing.
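Putting those fixes together, here is a minimal NumPy sketch of the corrected cost and gradient (a Python translation for reference, not the original Octave; the function and variable names are illustrative). An inconsistent cost/gradient pair is exactly the sort of thing that can drive a trust-region radius to zero:
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_function(theta, X, y, lam):
    """Regularized logistic-regression cost and gradient, without alpha."""
    m = X.shape[0]
    h = sigmoid(X @ theta)
    J = (-1.0 / m) * (y @ np.log(h) + (1 - y) @ np.log(1 - h))
    grad = (1.0 / m) * (X.T @ (h - y))
    temp = theta.copy()
    temp[0] = 0.0  # do not regularize the intercept term
    J += (lam / (2.0 * m)) * (temp @ temp)
    grad += (lam / m) * temp
    return J, grad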

rjags error Invalid vector argument to ilogit

I'd like to compare a betareg regression vs. the same regression using rjags
library(betareg)
d = data.frame(p = sample(c(.1, .2, .3, .4), 100, replace = TRUE),
               id = seq(1, 100, 1))

# I am looking to reproduce this regression with jags
b = betareg(p ~ id, data = d,
            link = c("logit"), link.phi = NULL, type = c("ML"))
summary(b)
Below I am trying to do the same regression with rjags
#install.packages("rjags")
library(rjags)
jags_str = "
model {
  # model
  y ~ dbeta(alpha, beta)
  alpha <- mu * phi
  beta <- (1 - mu) * phi
  logit(mu) <- a + b * id

  # priors
  a ~ dnorm(0, .5)
  b ~ dnorm(0, .5)
  t0 ~ dnorm(0, .5)
  phi <- exp(t0)
}"
id = d$id
y = d$p
model <- jags.model(textConnection(jags_str),
                    data = list(y = y, id = id))
update(model, 10000, progress.bar="none"); # Burnin for 10000 samples
samp <- coda.samples(model,
variable.names=c("mu"),
n.iter=20000, progress.bar="none")
summary(samp)
plot(samp)
I get an error on this line
model <- jags.model(textConnection(jags_str),
                    data = list(y = y, id = id))
Error in jags.model(textConnection(jags_str), data = list(y = y, id = id)) :
RUNTIME ERROR:
Invalid vector argument to ilogit
Can you advise (1) how to fix the error, and (2) how to set priors for the beta regression?
Thank you.
This error occurs because you have supplied the id vector to the scalar function logit. In JAGS, inverse link functions cannot be vectorized, so you need a for loop to go through each element of id. To do this I would probably add an additional element to your data list that denotes how long id is.
d = data.frame(p = sample(c(.1, .2, .3, .4), 100, replace = TRUE),
               id = seq(1, 100, 1), len_id = length(seq(1, 100, 1)))
From there you just need to make a small edit to your jags code.
for(i in 1:len_id){
  y[i] ~ dbeta(alpha[i], beta[i])
  alpha[i] <- mu[i] * phi
  beta[i] <- (1 - mu[i]) * phi
  logit(mu[i]) <- a + b * id[i]
}
However, if you track mu it is going to be a matrix of 20000 (# of iterations) by 100 (length of id). You are likely more interested in the actual parameters (a, b, and phi), so monitor those instead, e.g. variable.names = c("a", "b", "phi") in coda.samples.