Convexity of a function and its optimization

Is the function convex in x and y jointly? What I want is to estimate both parameters x and y that minimize the least-squares objective. If the function is convex in x and y jointly, then technically I can find x and y by iterating between two steps: find the best x given y, and find the best y given x.
Obviously I know I might be wrong on multiple levels. The function looks non-convex, as there are saddle points, e.g. at x=0 and y=0. But if I have the constraint that y>0, this problem is no longer there.
Further, I am not sure whether the iterative algorithm works and converges even if the function is convex.

You can compute the Hessian and check whether it is positive semidefinite.
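For a concrete check, here is a small SymPy sketch; the objective f(x, y) = (a - x*y*b)^2 is my guess at a single term of the least-squares function in the question, so treat it as an illustration rather than the exact problem.

import sympy as sp

x, y, a, b = sp.symbols('x y a b', real=True)
f = (a - x*y*b)**2                      # one assumed least-squares term

H = sp.hessian(f, (x, y))               # 2x2 Hessian in x and y
print(H)

# Evaluate at a sample point; if the Hessian can be indefinite somewhere,
# the function is not jointly convex.
print(H.subs({a: 1, b: 1, x: 0, y: 0}).eigenvals())   # eigenvalues -2 and 2: indefinite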

A convex optimization problem is defined to have a convex objective, convex inequality constraints, and affine equality constraints. As you've pointed out, your objective is not convex, so this is not a convex optimization problem. The problem also seems underspecified. Why not just solve: minimize sum_i (a_i - alpha*b_i)^2 over alpha? This problem is convex in alpha, of course, and once you've found alpha you can choose any x and y such that x*y = alpha, though I admit it's not clear why you'd want to do this.
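A minimal sketch of that reformulation (the data arrays a and b are made up for illustration): one-dimensional least squares over alpha has a closed-form solution.

import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([0.5, 1.0, 1.5])

# minimize sum_i (a_i - alpha*b_i)^2  =>  alpha = sum(a_i*b_i) / sum(b_i^2)
alpha = np.dot(a, b) / np.dot(b, b)
print(alpha)

# Any x, y with x*y = alpha then gives the same objective value, e.g. x = alpha, y = 1.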

Related

Using fixed points to find a square root

In going through the exercises of SICP, it defines a fixed point of a function F as a value x that satisfies the equation F(x)=x, found by iterating until the value stops changing, for example F(F(F(x))).
The thing I don't understand is how a square root of, say, 9 has anything to do with that.
For example, if I have F(x) = sqrt(9), obviously x=3. Yet, how does that relate to doing:
F(F(F(x))) --> sqrt(sqrt(sqrt(9)))
Which I believe just converges to one:
>>> math.sqrt(math.sqrt(math.sqrt(math.sqrt(math.sqrt(math.sqrt(9))))))
1.0349277670798647
Since sqrt(x) = x when x=1. In other words, how does finding the square root of a constant have anything to do with finding fixed points of functions?
When calculating the square root of a number, say a, you essentially have an equation of the form x^2 - a = 0. That is, to find the square root of a, you have to find an x such that x^2 = a, or x^2 - a = 0 -- call the latter equation (1). Equation (1) is of the form g(x) = 0, where g(x) := x^2 - a.
To use the fixed-point method for calculating the roots of this equation, you have to make some subtle modifications to bring it to the form f(x) = x. One way to do this is to rewrite (1) as x = a/x -- call it (2). In (2) you have the form required for solving an equation by the fixed-point method: f(x) is a/x.
Observe that this method requires both sides of the equation to have an 'x' term; an equation of the form sqrt(a) = x doesn't meet the specification and hence can't be solved (iteratively) using the fixed-point method.
The thing I don't understand is how a square root of, say, 9 has anything to do with that.
For example, if I have F(x) = sqrt(9), obviously x=3. Yet, how does that relate to doing: F(F(F(x))) --> sqrt(sqrt(sqrt(9)))
These are standard methods for numerical calculation of roots of non-linear equations, quite a complex topic on its own and one which is usually covered in Engineering courses. So don't worry if you don't get the "hang of it", the authors probably felt it was a good example of iterative problem solving.
You need to convert the problem f(x) = 0 to a fixed point problem g(x) = x that is likely to converge to the root of f(x). In general, the choice of g(x) is tricky.
If f(x) = x² - a = 0, then you should choose g(x) as follows:
g(x) = 1/2*(x + a/x)
(This choice is based on Newton's method, which is a special case of fixed-point iterations).
To find the square root, sqrt(a):
Guess an initial value x_0.
Given a tolerance ε, compute x_(n+1) = 1/2*(x_n + a/x_n) for n = 0, 1, ... until |x_(n+1) - x_n| < ε.
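A minimal Python sketch of this iteration; the starting guess and the way the tolerance is used for stopping are my own choices.

def fixed_point_sqrt(a, x0=1.0, eps=1e-12, max_iter=100):
    x = x0
    for _ in range(max_iter):
        x_next = 0.5 * (x + a / x)       # g(x) = (x + a/x) / 2
        if abs(x_next - x) < eps:        # stop when the iterate stops changing
            return x_next
        x = x_next
    return x

print(fixed_point_sqrt(9))   # ~3.0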

Matrix Inverse in Visual Basic

I'm writing a program to do the Newton-Raphson method for n variables (a system of equations) using a DataGridView. My problem is to determine the inverse of the Jacobian matrix. I've searched the internet for a solution but I really couldn't find one, so if someone can help me I would really appreciate it. Thanks in advance.
If you are asking for a recommendation of a library, that is explicitly off topic on Stack Overflow. However, below I mention some algorithms that are commonly used; this may help you to find, or write, what you need. I would, though, not recommend writing something yourself unless you really want to, as it can be tricky to get these algorithms right. If you do decide to write something, I'd recommend the QR method as the easiest to write, though the theory is a little subtle.
First off do you really need to compute the inverse? If, for example, what you need to do is to compute
x = inv(J)*y
then it's faster and more accurate to treat this problem as
solve J*x = y for x
The methods below all factor J into other matrices, for which this solution can be done. A good package that implements the factorisation will also have the code to perform the solution.
If you really do need the inverse, often the best way is to solve, one column at a time,
J*K = I for K, where I is the identity matrix
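To make the "solve rather than invert" point concrete, here is a small NumPy sketch (the question targets Visual Basic, so read this as pseudocode for the idea; J and y are made up).

import numpy as np

J = np.array([[4.0, 1.0], [2.0, 3.0]])
y = np.array([1.0, 2.0])

x = np.linalg.solve(J, y)            # solve J*x = y directly, no inverse formed

# If the inverse really is needed, solve J*K = I column by column:
K = np.linalg.solve(J, np.eye(2))
print(x, K)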
LU decomposition
This may well be the fastest of the algorithms described here but is also the least accurate. An important point is that the algorithm must include (partial) pivoting, or it will not work on all invertible matrices, for example it will fail on a rotation through 90 degrees.
What you get is a factorisation of J into:
J = P*L*U
where P is a permutation matrix,
L lower triangular,
U upper triangular
So having factorised, to solve for x we do three steps, each straightforward, and each can be done in place (i.e. all the x's can be the same variable):
Solve P*x1 = y for x1
Solve L*x2 = x1 for x2
Solve U*x = x2 for x
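If you use a library rather than writing this yourself, SciPy's lu_factor/lu_solve pair does exactly the factorise-then-solve sequence above; a small sketch with made-up data follows.

import numpy as np
from scipy.linalg import lu_factor, lu_solve

J = np.array([[0.0, -1.0], [1.0, 0.0]])   # 90-degree rotation: needs pivoting
y = np.array([1.0, 2.0])

lu, piv = lu_factor(J)                    # P*L*U factorisation with partial pivoting
x = lu_solve((lu, piv), y)                # the permutation + two triangular solves
print(x)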
QR decomposition
This may be somewhat slower than LU but is more accurate. Conceptually this factorises J into
J = Q*R
Where Q is orthogonal and R upper triangular. However, as it is usually implemented, you in fact pass y as well as J to the routine, and it returns R (in J) and Q'*y (in the passed y), so to solve for x you just need to solve
R*x = y
which, given that R is upper triangular, is easy.
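A small NumPy/SciPy sketch of the QR route (here Q and R are returned explicitly rather than overwriting J and y as described above; the data are made up).

import numpy as np
from scipy.linalg import solve_triangular

J = np.array([[4.0, 1.0], [2.0, 3.0]])
y = np.array([1.0, 2.0])

Q, R = np.linalg.qr(J)                         # J = Q*R
x = solve_triangular(R, Q.T @ y, lower=False)  # back substitution on R*x = Q'*y
print(x)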
SVD (Singular value decomposition)
This is the most accurate, but also the slowest. Moreover unlike the others you can make progress even if J is singular (you can compute the 'generalised inverse' applied to y).
I recommend reading up on this, but advise against implementing it yourself.
Briefly you factorise J as
J = U*S*V'
where U and V are orthogonal and S diagonal.
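A sketch of using the SVD to apply the (generalised) inverse to y; the tolerance below which singular values are treated as zero is my own choice.

import numpy as np

J = np.array([[4.0, 1.0], [2.0, 3.0]])
y = np.array([1.0, 2.0])

U, s, Vt = np.linalg.svd(J)                 # J = U*S*V'
tol = 1e-12
s_inv = np.where(s > tol, 1.0 / s, 0.0)     # pseudo-inverse of the diagonal S
x = Vt.T @ (s_inv * (U.T @ y))              # x = V * pinv(S) * U' * y
print(x)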
There are, of course, many other ways of solving this problem. For example, if your matrices are very large (dimension in the thousands), it may be faster to use an iterative method, particularly if they are sparse (lots of zeroes).

Relaxation of linear constraints?

When we need to optimize a function over the positive real half-line and we only have unconstrained optimization routines, we substitute y = exp(x) or y = x^2, so that we can optimize over the log or the (signed) square root of the variable on the whole real line.
Can we do something similar for linear constraints of the form Ax = b, where x is a d-dimensional vector, A is an (N, d)-shaped matrix, and b is a vector of length N defining the constraints?
While, as Ervin Kalvelaglan says, this is not always a good idea, here is one way to do it.
Suppose we take the SVD of A, getting
A = U*S*V'
where if A is n x m
U is nxn orthogonal,
S is nxm, zero off the main diagonal,
V is mxm orthogonal
Computing the SVD is not a trivial computation.
We first zero out the elements of S which we think are non-zero just due to noise -- which can be a slightly delicate thing to do.
Then we can find one solution x~ to
A*x = b
as
x~ = V*pinv(S)*U'*b
(where pinv(S) is the pseudo-inverse of S, i.e. replace the non-zero elements of the diagonal by their multiplicative inverses)
Note that x~ is a least squares solution to the constraints, so we need to check that it is close enough to being a real solution, ie that Ax~ is close enough to b -- another somewhat delicate thing. If x~ doesn't satisfy the constraints closely enough you should give up: if the constraints have no solution neither does the optimisation.
Any other solution to the constraints can be written
x = x~ + sum c[i]*V[i]
where the V[i] are the columns of V corresponding to entries of S that are (now) zero. Here the c[i] are arbitrary constants. So we can change variables to using the c[] in the optimisation, and the constraints will be automatically satisfied. However this change of variables could be somewhat irksome!
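A rough NumPy sketch of the whole recipe with a made-up one-constraint example; the tolerance used to decide which singular values are "really" zero is an assumption.

import numpy as np

A = np.array([[1.0, 1.0, 0.0]])      # one constraint in three variables
b = np.array([2.0])

U, s, Vt = np.linalg.svd(A)          # A = U*S*V'
tol = 1e-10 * s.max()
rank = int(np.sum(s > tol))

# Particular (least-squares) solution x~ = V * pinv(S) * U' * b
x_part = Vt[:rank].T @ ((U.T @ b)[:rank] / s[:rank])
assert np.allclose(A @ x_part, b)    # check it really satisfies the constraints

# Columns of V belonging to the zero singular values span the feasible directions:
# x = x~ + N @ c satisfies A*x = b for any c, so optimise over c unconstrained.
N = Vt[rank:].T
c = np.array([0.7, -1.3])            # arbitrary example values for c
x = x_part + N @ c
print(A @ x)                         # still equals b (up to round-off)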

Update equation for gradient descent

Suppose we have an approximation function y = f(w, x), where x is the input, y is the output, and w is the weight. According to the gradient descent rule, we should update the weight according to w = w - df/dw. But is it possible to update the weight according to w = w - w * df/dw instead? Has anyone seen this before? The reason I want to do this is that it is easier for me to do it this way in my algorithm.
Recall that gradient descent is based on the Taylor expansion of f(w, x) in the close vicinity of w, and its purpose, in your context, is to repeatedly modify the weight in small steps. The reverse gradient direction is just a search direction, based on very local knowledge of the function f(w, x).
Usually the iterative update of the weight includes a step length, yielding the expression
w_(i+1) = w_(i) - nu_j df/dw,
where the value of the step length nu_j is found by using line search, see e.g. https://en.wikipedia.org/wiki/Line_search.
Hence, based on the discussion above, to answer your question: no, it is not a good idea to update according to
w_(i+1) = w_(i) - w_(i) df/dw.
Why? If w_(i) is large (in context), we'll take a huge step based on very local information, and we would be using something very different from the fine-stepped gradient descent method.
Also, as lejlot points out in the comments below, a negative value of w(i) would mean you traverse in the (positive) direction of the gradient, i.e., in the direction in which the function grows most rapidly, which is, locally, the worst possible search direction (for minimization problems).
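To make the comparison concrete, here is a toy sketch of the standard update on f(w) = (w - 3)^2; the function and the fixed step length nu are my own choices (a line search, as linked above, would pick nu adaptively).

def df_dw(w):
    return 2.0 * (w - 3.0)           # gradient of f(w) = (w - 3)^2

w, nu = 10.0, 0.1
for _ in range(100):
    w = w - nu * df_dw(w)            # standard update w_(i+1) = w_(i) - nu*df/dw
print(w)                             # close to the minimiser w = 3

# The proposed variant w = w - w * df/dw scales the step by w itself, which can
# blow up for large w and reverses the search direction for negative w.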

Fitting curves to a set of points

Basically, I have a set of up to 100 co-ordinates, along with the desired tangents to the curve at the first and last point.
I have looked into various methods of curve fitting, by which I mean an algorithm which takes the input data points and tangents and outputs the equation of the curve, such as the Gaussian method and interpolation, but I really struggled to understand them.
I am not asking for code (if you choose to give it, that's acceptable though :) ); I am simply looking for help with this algorithm. It will eventually be converted to Objective-C for an iPhone app, if that changes anything.
EDIT:
I know the order of all of the points. They are not too close together, so passing through all points is necessary - aka interpolation (unless anyone can suggest something else). And as far as I know, an algebraic curve is what I'm looking for. This is all being done on a 2D plane by the way
I'd recommend considering cubic splines. There is some explanation and code to calculate them in plain C in the Numerical Recipes book (chapter 3.3).
Most interpolation methods originally work with functions: given a set of x and y values, they compute a function which returns a y value for every x value, meeting the specified constraints. As a function can only ever produce a single y value for every x value, such a curve cannot loop back on itself.
To turn this into a real 2D setup, you want two functions which compute x resp. y values based on some parameter that is conventionally called t. So the first step is computing t values for your input data. You can usually get a good approximation by summing Euclidean distances: think of a polyline connecting all your points with straight segments; the parameter would then be the distance along this line for every input pair.
So now you have two interpolation problems: one to compute x from t and the other to compute y from t. You can formulate each as a spline interpolation, e.g. using cubic splines. That gives you a large system of linear equations which you can solve iteratively up to the desired precision.
The result of a spline interpolation will be a piecewise description of a suitable curve. If you wanted a single equation, then a Lagrange interpolation would fit that bill, but the result might have odd twists and turns for many sets of input data.
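As a sketch of the parametric approach with clamped end tangents, here is a SciPy version; the question targets Objective-C, so this only illustrates the idea, and the points and tangents are made up.

import numpy as np
from scipy.interpolate import CubicSpline

pts = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 1.0], [4.0, 3.0]])
tan_start = np.array([1.0, 1.0])     # desired tangent (dx/dt, dy/dt) at the first point
tan_end = np.array([0.0, 1.0])       # desired tangent at the last point

# Chord-length parameter t: cumulative Euclidean distance along the polyline.
t = np.concatenate(([0.0], np.cumsum(np.linalg.norm(np.diff(pts, axis=0), axis=1))))

# One cubic spline per coordinate (handled here by passing 2D data), with the
# end derivatives clamped to the given tangents.
spline = CubicSpline(t, pts, bc_type=((1, tan_start), (1, tan_end)))

tt = np.linspace(t[0], t[-1], 200)
curve = spline(tt)                   # 200 (x, y) samples along the fitted curve
print(curve[:3])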