Maximizing Likelihood, Julia - optimization

I have a log-likelihood function that I want to maximize with respect to theta (N). It is defined as:
function loglik(theta, n, r)
    N = theta; k = length(n)
    ar1 = float(lgamma(N + 1))
    ar2 = sum(n) * log(sum(n) / (k * N))
    ar3 = (k * N - sum(n)) * log(1 - sum(n) / (k * N))
    par = float(lgamma((N - r) + 1))
    return -(ar1 + ar2 + ar3 - par)
end
Then I use Optim.jl's optimize function as:
r=optimize(b->loglik(b,nn, 962), 978, BFGS() );
where nn is an array, and I get this error:
ERROR: MethodError: no method matching optimize(::#46#47, ::Float64, ::Optim.BFGS)
Can anyone help?

You're almost there! You need to initialize it with an array.
optimize(b->loglik(first(b),nn,962), [978.,], BFGS())
(though you still need to provide us with nn for this answer to show the output)
edit: since b is a scalar in loglik, I changed it to b->loglik(first(b),nn, 962) as suggested by Chris Rackauckas below.

Related

Calculating implied volatility using Scipy optimize brentq error

I want to calculate implied volatility using SciPy's brentq root-finding algorithm:
def calcimpliedvol(S, K, T, r, marketoptionPrice):
    d1 = (np.log(S/K) + (r - 0.5*sigma**2)*T) / (sigma*np.sqrt(T))
    d2 = d1 - sigma*np.sqrt(T)
    BSprice_call = S*si.norm.cdf(d1, 0, 1) - K*np.exp(-r*T)*si.norm.cdf(d2, 0, 1)
    fx = BSprice_call - marketoptionPrice
    return optimize.brentq(fx, 0, 1, maxiter=1000)
However, when I run the function with all the specified inputs (K=6, S=8, T=0.25, r=0, marketoptionPrice=4), I get an error saying sigma is not defined. Sigma is what I want the optimisation algorithm to find.
Could someone please advise what I am doing wrong in defining the function?
There are multiple issues with your code:
brentq needs a function as its first argument, namely the function whose root it finds. You passed it a variable instead; this is the main issue.
The Black-Scholes formula was wrong: d1 uses (r + 0.5*sigma**2), not (r - 0.5*sigma**2).
The code does not work for sigma=0, since you divide by sigma. At the very least you should not pass 0 as one of the bounds; better yet, handle the sigma=0 case separately inside the code.
The option price of 4 is very high for S=8, K=6, T=0.25. The implied volatility in this case is 2.18 (i.e. 218%), which is outside the upper bound of 1 you gave your root solver.
Here is the corrected code. For the first point, note how the function bs_price is defined inside your function and is then passed to the solver; the other issues are addressed as well.
import numpy as np
from scipy import optimize
import scipy.stats as si

def calcimpliedvol(S, K, T, r, marketoptionPrice):
    def bs_price(sigma):
        d1 = (np.log(S/K) + (r + 0.5*sigma**2)*T) / (sigma*np.sqrt(T))
        d2 = d1 - sigma*np.sqrt(T)
        BSprice_call = S*si.norm.cdf(d1, 0, 1) - K*np.exp(-r*T)*si.norm.cdf(d2, 0, 1)
        fx = BSprice_call - marketoptionPrice
        return fx
    return optimize.brentq(bs_price, 0.0001, 100, maxiter=1000)

calcimpliedvol(S=8, K=6, T=0.25, r=0, marketoptionPrice=4)
It returns 2.188862879492475.
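As a quick sanity check (a minimal verification sketch), plugging the returned sigma back into the Black-Scholes call price should reproduce the market price of 4:

import numpy as np
import scipy.stats as si

S, K, T, r = 8, 6, 0.25, 0
sigma = 2.188862879492475          # implied volatility returned above
d1 = (np.log(S/K) + (r + 0.5*sigma**2)*T) / (sigma*np.sqrt(T))
d2 = d1 - sigma*np.sqrt(T)
print(S*si.norm.cdf(d1) - K*np.exp(-r*T)*si.norm.cdf(d2))   # ~ 4.0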

SCIP what is the function for sign?

I am new to SCIP and have read through some of the example problems and documentation, but am still unsure how to formulate the following problem for the SCIP solver:
argmax(w) sum(sign(Aw) == sign(b))
where A is an n×m matrix, w is an m×1 vector, and b is an n×1 vector. The data type is floats/real numbers, and it is an unconstrained problem.
Values for A and b are also contained row-wise in a .txt file. How can I import that?
Overall, I am new to SCIP and have no idea how to start: creating variables (especially the objective function value parameter), importing data, formulating the objective function... It's a bit of a stretch for me to ask this question, but your help is appreciated!
This should work:

max  sum(i, delta(i))
delta(i) = 1  =>  beta(i) * sum(j, A(i,j)*w(j)) >= 0
delta(i) in {0, 1}, w(j) free

where beta(i) = sign(b(i)). The implication can be implemented using indicator constraints. This way we don't need big-M's.
Most likely the >= 0 constraint should be >= 0.0001 (otherwise we can set all w(j)=0).
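For concreteness, here is a minimal sketch of that formulation using PySCIPOpt, the SCIP Python interface. The file name data.txt and its layout (each row holding a row of A followed by the corresponding entry of b) are assumptions; adjust the loading step to your actual format.

import numpy as np
from pyscipopt import Model, quicksum

data = np.loadtxt("data.txt")            # assumed layout: each row = A(i,:) then b(i)
A, b = data[:, :-1], data[:, -1]
n, m = A.shape
beta = np.sign(b)                        # assumes no b(i) is exactly zero

model = Model("sign_agreement")
w = [model.addVar(lb=None, name="w%d" % j) for j in range(m)]            # free continuous
delta = [model.addVar(vtype="B", name="delta%d" % i) for i in range(n)]  # indicator binaries

eps = 0.0001
for i in range(n):
    # delta(i) = 1  =>  beta(i) * sum(j, A(i,j)*w(j)) >= eps
    # SCIP indicator constraints expect an "expr <= rhs" form, so negate both sides
    model.addConsIndicator(
        quicksum(-float(beta[i] * A[i, j]) * w[j] for j in range(m)) <= -eps,
        binvar=delta[i])

model.setObjective(quicksum(delta), "maximize")
model.optimize()
print("matched signs:", round(model.getObjVal()))

The eps of 0.0001 implements the tightened >= 0.0001 constraint mentioned above.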

Taking the gradient in TensorFlow, tf.gradients

I am using this TensorFlow function to get the Jacobian of my function. I came across two problems:
The TensorFlow documentation contradicts itself in the following two paragraphs, if I am not mistaken:
gradients() adds ops to the graph to output the partial derivatives of ys with respect to xs. It returns a list of Tensor of length len(xs) where each tensor is the sum(dy/dx) for y in ys.
Returns:
A list of sum(dy/dx) for each x in xs.
According to my test, it in fact returns a vector of length len(ys), which is the sum(dy/dx) for each x in xs.
I do not understand why they designed it so that the return value is the sum of the columns (or rows, depending on how you define your Jacobian).
How can I really get the Jacobian?
4. In the loss, I need the partial derivative of my function with respect to the input (x). But when I am optimizing with respect to the network weights, I define x as a placeholder whose value is fed later, while the weights are variables. In this case, can I still define the symbolic derivative of the function with respect to the input (x) and put it in the loss (which, when we later optimize with respect to the weights, will bring in a second-order derivative of the function)?
I think you are right and there is a typo there, it was probably meant to be "of length len(ys)".
For efficiency. I can't explain exactly the reasoning, but this seems to be a pretty fundamental characteristic of how TensorFlow handles automatic differentiation. See issue #675.
There is no straightforward way to get the Jacobian matrix in TensorFlow. Take a look at this answer and again issue #675. Basically, you need one call to tf.gradients per column/row; see the sketch at the end of this answer.
Yes, of course. You can compute whatever gradients you want; there is really no difference between a placeholder and any other operation. There are a few operations that do not have a gradient because it is not well defined or not implemented (in which case it will generally return 0), but that's all.
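For reference, here is a small TF1-style (graph mode) sketch of the per-row approach mentioned above, on a toy function rather than the asker's model: one tf.gradients call per output element, stacked into the full Jacobian.

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[3])
y = tf.stack([x[0] * x[1], tf.reduce_sum(x ** 2)])   # 2 outputs, 3 inputs

# one tf.gradients call per row of the Jacobian dy/dx
rows = [tf.gradients(y[i], x)[0] for i in range(2)]
jacobian = tf.stack(rows)                            # shape (2, 3)

with tf.Session() as sess:
    print(sess.run(jacobian, feed_dict={x: [1.0, 2.0, 3.0]}))
    # [[2. 1. 0.]
    #  [2. 4. 6.]]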

Errors to fit parameters of scipy.optimize

I use the scipy.optimize.minimize function (https://docs.scipy.org/doc/scipy/reference/tutorial/optimize.html) with method='L-BFGS-B'.
An example of what it returns is shown below:
fun: 32.372210618549758
hess_inv: <6x6 LbfgsInvHessProduct with dtype=float64>
jac: array([ -2.14583906e-04, 4.09272616e-04, -2.55795385e-05,
3.76587650e-05, 1.49213975e-04, -8.38440428e-05])
message: 'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'
nfev: 420
nit: 51
status: 0
success: True
x: array([ 0.75739412, -0.0927572 , 0.11986434, 1.19911266, 0.27866406,
-0.03825225])
The x value correctly contains the fitted parameters. How do I compute the errors associated with those parameters?
TL;DR: You can actually place an upper bound on how precisely the minimization routine has found the optimal values of your parameters. See the snippet at the end of this answer that shows how to do it directly, without resorting to calling additional minimization routines.
The documentation for this method says
The iteration stops when (f^k - f^{k+1})/max{|f^k|,|f^{k+1}|,1} <= ftol.
Roughly speaking, the minimization stops when the value of the function f that you're minimizing is minimized to within ftol of the optimum. (This is a relative error if f is greater than 1, and absolute otherwise; for simplicity I'll assume it's an absolute error.) In more standard language, you'll probably think of your function f as a chi-squared value. So this roughly suggests that you would expect

Δf ≈ ftol * max{|f|, 1}
Of course, just the fact that you're applying a minimization routine like this assumes that your function is well behaved, in the sense that it's reasonably smooth and is well approximated near the optimum by a quadratic function of the parameters x_i:

f(x) ≈ f_min + (1/2) * sum_ij Δx_i H_ij Δx_j
where Δx_i is the difference between the found value of parameter x_i and its optimal value, and H_ij is the Hessian matrix. A little (surprisingly nontrivial) linear algebra gets you to a pretty standard result for an estimate of the uncertainty in any quantity X that's a function of your parameters x_i:

(ΔX)^2 ≈ Δf * sum_ij (∂X/∂x_i) (H^-1)_ij (∂X/∂x_j)
which lets us write

ΔX ≈ sqrt( ftol * max{|f|, 1} * sum_ij (∂X/∂x_i) (H^-1)_ij (∂X/∂x_j) )
That's the most useful formula in general, but for the specific question here we just have X = x_i, so this simplifies to

Δx_i ≈ sqrt( ftol * max{|f|, 1} * (H^-1)_ii )
Finally, to be totally explicit, let's say you've stored the optimization result in a variable called res. The inverse Hessian is available as res.hess_inv, which is a function that takes a vector and returns the product of the inverse Hessian with that vector. So, for example, we can display the optimized parameters along with the uncertainty estimates with a snippet like this:
import numpy as np

ftol = 2.220446049250313e-09
tmp_i = np.zeros(len(res.x))
for i in range(len(res.x)):
    tmp_i[i] = 1.0
    # i-th diagonal element of the inverse Hessian
    hess_inv_i = res.hess_inv(tmp_i)[i]
    uncertainty_i = np.sqrt(max(1, abs(res.fun)) * ftol * hess_inv_i)
    tmp_i[i] = 0.0
    print('x^{0} = {1:12.4e} ± {2:.1e}'.format(i, res.x[i], uncertainty_i))
Note that I've incorporated the max behavior from the documentation, assuming that f^k and f^{k+1} are basically just the same as the final output value, res.fun, which really ought to be a good approximation. Also, for small problems, you can just use np.diag(res.hess_inv.todense()) to get the full inverse and extract the diagonal all at once. But for large numbers of variables, I've found that to be a much slower option. Finally, I've added the default value of ftol, but if you change it in an argument to minimize, you would obviously need to change it here.
One approach to this common problem is to run scipy.optimize.leastsq after minimize with 'L-BFGS-B', starting from the solution found by 'L-BFGS-B'. That is, leastsq will (normally) include an estimate of the 1-sigma errors as well as the solution.
Of course, that approach makes several assumptions, including that leastsq can be used and is appropriate for solving the problem. From a practical point of view, this requires that the objective function return an array of residual values with at least as many elements as variables, rather than a scalar cost function.
You may find lmfit (https://lmfit.github.io/lmfit-py/) useful here: it supports both 'L-BFGS-B' and 'leastsq' and gives a uniform wrapper around these and other minimization methods, so that you can use the same objective function for both (and specify how to convert the residual array into the scalar cost function). In addition, parameter bounds can be used with both methods. This makes it very easy to first do a fit with 'L-BFGS-B' and then with 'leastsq', using the values from 'L-BFGS-B' as starting values.
Lmfit also provides methods to explore confidence limits on parameter values in more detail, in case you suspect the simple but fast approach used by leastsq might be insufficient.
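For illustration, a hedged sketch of that workflow with lmfit, using a made-up exponential-decay model (the model, data, and parameter names are purely illustrative):

import numpy as np
from lmfit import Parameters, minimize, fit_report

rng = np.random.default_rng(1)
xdata = np.linspace(0, 10, 201)
ydata = 3.0 * np.exp(-0.7 * xdata) + rng.normal(scale=0.05, size=xdata.size)

def residual(params, x, data):
    v = params.valuesdict()
    model = v['amp'] * np.exp(-v['decay'] * x)
    return data - model                      # leastsq wants the residual array

params = Parameters()
params.add('amp', value=1.0, min=0)
params.add('decay', value=0.1, min=0)

# first L-BFGS-B, then leastsq starting from the L-BFGS-B solution
out1 = minimize(residual, params, args=(xdata, ydata), method='lbfgsb')
out2 = minimize(residual, out1.params, args=(xdata, ydata), method='leastsq')
print(fit_report(out2))                      # report includes 1-sigma uncertainties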
It really depends on what you mean by "errors". There is no general answer to your question, because it depends on what you're fitting and what assumptions you're making.
The easiest case is one of the most common: when the function you are minimizing is a negative log-likelihood. In that case, the inverse Hessian returned by the fit (hess_inv) is the covariance matrix describing the Gaussian approximation to the maximum likelihood. The parameter errors are the square roots of the diagonal elements of the covariance matrix.
Beware that if you are fitting a different kind of function or are making different assumptions, then that doesn't apply.
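For concreteness, here is a minimal sketch of that negative log-likelihood case on a toy Gaussian sample (the data and parametrization are made up for illustration):

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
data = rng.normal(loc=1.5, scale=0.8, size=500)

def neg_log_likelihood(p):
    mu, log_sigma = p
    sigma = np.exp(log_sigma)
    # Gaussian -log(likelihood), up to an additive constant
    return 0.5 * np.sum(((data - mu) / sigma) ** 2) + data.size * log_sigma

res = minimize(neg_log_likelihood, x0=[0.0, 0.0], method='L-BFGS-B')
cov = res.hess_inv.todense()       # approximate covariance matrix
errors = np.sqrt(np.diag(cov))     # 1-sigma errors on mu and log(sigma)
print(res.x, errors)

Keep in mind that hess_inv from L-BFGS-B is itself only an approximation built up during the iterations, so treat these errors as rough estimates.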

Gradient for Each Example Using map_fn

I want to get the gradient of a layer with respect to a parameter matrix for each example. Normally, I would need a Jacobian, but following this idea, I decided to use map_fn so I could feed forward data in a batch rather than one by one. This gives me a problem I do not understand, unfortunately. With the code
get_grads = tf.map_fn(lambda x: tf.gradients(x, W['1'])[0], softmax_probs)
sess.run(get_grads, feed_dict={x: images[0:100]})
I get this error
InvalidArgumentError: TensorArray map_21/TensorArray_36#map_21/while/gradients: Could not write to TensorArray index 0 because it has already been read.
W['1'] is a variable in the graph. Ideas?
It seems like your issue may be connected to this bug:
https://github.com/tensorflow/tensorflow/issues/7643
One commenter posts a possible fix at the end. You could try that out.
Alternatively, if what you want is the Jacobian, then you can check out this solution:
https://github.com/tensorflow/tensorflow/issues/675#issuecomment-362853672
although it appears that it will not work when nested.
I don't think this will work because x in this case is a loop variable which TensorFlow does not know how to connect to softmax_probs.
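For what it's worth, here is a hedged TF1-style sketch of a workaround that sidesteps map_fn (and the loop-variable issue mentioned above) entirely: slice the batch and call tf.gradients once per example. The toy graph below is illustrative, not the asker's network, and this approach scales poorly for large batches.

import numpy as np
import tensorflow as tf

batch_size = 5
x = tf.placeholder(tf.float32, shape=[batch_size, 4])
W = tf.Variable(tf.random_normal([4, 3]))
scores = tf.reduce_logsumexp(tf.matmul(x, W), axis=1)   # one scalar per example

# d(scores[i])/dW for each example, stacked into shape (batch_size, 4, 3)
per_example_grads = tf.stack(
    [tf.gradients(scores[i], W)[0] for i in range(batch_size)])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    grads = sess.run(per_example_grads,
                     feed_dict={x: np.random.rand(batch_size, 4)})
    print(grads.shape)                                   # (5, 4, 3)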