How to run gradient descent algorithm when parameter space is constrained?

How to run gradient descent algorithm when parameter space is constrained? - optimization

I would like to maximize a function with one parameter.
So I run gradient descent (or, ascent actually): I start with an initial parameter and keep adding the gradient (times some learning rate factor that gets smaller and smaller), re-evaluate the gradient given the new parameter, and so on until convergence.
But there is one problem: My parameter must stay positive, so it is not supposed to become <= 0 because my function will be undefined. My gradient search will sometimes go into such regions though (when it was positive, the gradient told it to go a bit lower, and it overshoots).
And to make things worse, the gradient at such a point might be negative, driving the search toward even more negative parameter values. (The reason is that the objective function contains logs, but the gradient doesn't.)
What are some good (simple) algorithms that deal with this constrained optimization problem? I'm hoping for just a simple fix to my algorithm. Or maybe ignore the gradient and do some kind of line search for the optimal parameter?

Each time you update your parameter, check to see if it's negative, and if it is, clamp it to zero.
If clamping to zero is not acceptable, try adding a "log-barrier" (Google it). Basically, it adds a smooth "soft" wall to your objective function (and modifying your gradient) to keep it away from regions you don't want it to go to. You then repeatedly run the optimization by hardening up the wall to make it more infinitely vertical, but starting with the previously found solution. In the limit (in practice only a few iterations are needed), the problem you are solving is identical to the original problem with a hard constraint.

Without knowing more about your problem, it's hard to give specific advice. Your gradient ascent algorithm may not be particularly suitable for your function space. However, given that's what you've got, here's one tweak that would help.
You're following what you believe is an ascending gradient. But when you move forwards in the direction of the gradient, you discover you have fallen into a pit of negative value. This implies that there was a nearby local maximum, but also a very sharp negative gradient cliff. The obvious fix is to backtrack to your previous position, and take a smaller step (e.g half the size). If you still fall in, repeat with a still smaller step. This will iterate until you find the local maximum at the edge of the cliff.
The problem is, there is no guarantee that your local maximum is actually global (unless you know more about your function than you are sharing). This is the main limitation of naive gradient ascent - it always fixes on the first local maximum and converges to it. If you don't want to switch to a more robust algorithm, one simple approach that could help is to run n iterations of your code, starting each time from random positions in the function space, and keeping the best maximum you find overall. This Monte Carlo approach increases the odds that you will stumble on the global maximum, at the cost of increasing your run time by a factor n. How effective this is will depend on the 'bumpiness' of your objective function.

A simple trick to restrict a parameter to be positive is to re-parametrize the problem in terms of its logarithm (make sure to change the gradient appropriately). Of course it is possible that the optimum moves to -infty with this transformation, and the search does not converge.

At each step, constrain the parameter to be positive. This is (in short) the projected gradient method you may want to google about.

I have three suggestions, in order of how much thinking and work you want to do.
First, in gradient descent/ascent, you move each time by the gradient times some factor, which you refer to as a "learning rate factor." If, as you describe, this move causes x to become negative, there are two natural interpretations: Either the gradient was too big, or the learning rate factor was too big. Since you can't control the gradient, take the second interpretation. Check whether your move will cause x to become negative, and if so, cut the learning rate factor in half and try again.
Second, to elaborate on Aniko's answer, let x be your parameter, and f(x) be your function. Then define a new function g(x) = f(e^x), and note that although the domain of f is (0,infinity), the domain of g is (-infinity, infinity). So g cannot suffer the problems that f suffers. Use gradient descent to find the value x_0 that maximizes g. Then e^(x_0), which is positive, maximizes f. To apply gradient descent on g, you need its derivative, which is f'(e^x)*e^x, by the chain rule.
Third, it sounds like you're trying maximize just one function, not write a general maximization routine. You could consider shelving gradient descent, and tailoring the
method of optimization to the idiosyncrasies of your specific function. We would have to know a lot more about the expected behavior of f to help you do that.

Just use Brent's method for minimization. It is stable and fast and the right thing to do if you have only one parameter. It's what the R function optimize uses. The link also contains a simple C++ implementation. And yes, you can give it MIN and MAX parameter value limits.

You're getting good answers here. Reparameterizing is what I would recommend. Also, have you considered another search method, like Metropolis-Hastings? It's actually quite simple once you bull through the scary-looking math, and it gives you standard errors as well as an optimum.

Related

Smoothed Particle Hydrodynamics - Particle Density Estimation Issue

I'm currently writing an SPH Solver using CUDA on https://github.com/Mathiasb17/sph_opengl.
I have pretty good results and performances but in my mind they still seem pretty weird for some reason :
https://www.youtube.com/watch?v=_DdHN8qApns
https://www.youtube.com/watch?v=Afgn0iWeDoc
In some implementations, i saw that a particle does not contribute to its own internal forces (which would be 0 anyways due to the formulas), but it does contribute to its own density.
My simulations work "pretty fine" (i don't like "pretty fine", i want it perfect) and in my implementation a particle does not contribute to its own density.
Besides when i change the code so it does contribute to its own density, the resulting simulation becomes way too unstable (particles explode).
I asked this to a lecturer in physics based animation, he told me a particle should not contribute to its density, but did not give me specific details about this assertion.
Any idea of how it should be ?

As long as you calculate the density with the summation formula instead of the continuity equation, yes you need to do it with self-contribution.
Here is why:
SPH is an interpolation scheme, which allows you to interpolate a specific value in any position in space over a particle cloud. Any position means you are not restricted to evaluate it on a particle, but anywhere in space. If you do so, obviously you need to consider all particles within the influence radius. From this point of view, it is easy to see that interpolating a quantity at a particle's position does not influence its contribution.
For other quantities like forces, where the derivative of some quantity is approximated, you don't need to apply self-contribution (that would lead to the evaluation of 0/0).
To discover the source of the instability:
check if the kernel is normalised
are the stiffness of the liquid and the time step size compatible (for the weakly compressible case)?

Parameter Estimation to Minimize Runtime

Suppose, I an algorithm, whose runtime depends on two parameters. I want to find the best set of parameters that minimizes the runtime. The two parameters are continuous double values in the range of 0 to INFINITY.
Therefore, for two parameters a,b: I want to find the best values of a and b that minimize the runtime. I think this is pretty standard practice, but I could not find good literature on this. I found few literature such as MLE, Least Squares, etc. but they talk about distribution.

First use your brains to understand the possible functional relationship between those parameters and the running time, in a qualitative way. This means having a first idea on the number and positions of possible maxima, smoothness of the function, asymptotic behavior and any other clue that you can find.
Then make up your mind about a reasonable range of values where it makes sense to sample the function values. If those ranges are very wide, it is preferable to sample using a geometric progression rather than arithmetic (say, powers of 2).
Then measure and observe the function values with a graphical viewer and confirm your intuitions. It is likely that this will be enough to spot the gross location of the absolute maximum. Finding an accurate position might be useless if it gives you the last percents of improvement. It is also very likely that the location of the optimum will depend on the particular dataset, making accurate location even less useful.

Finding best path trought strongly connected component

I have a directed graph which is strongly connected and every node have some price(plus or negative). I would like to find best (highest score) path from node A to node B. My solution is some kind of brutal force so it takes ages to find that path. Is any algorithm for this or any idea how can I do it?

Have you tried the A* algorithm?
It's a fairly popular pathfinding algorithm.
The algorithm itself is not to difficult to implement, but there are plenty of implementations available online.
Dijkstra's algorithm is a special case for the A* (in which the heuristic function h(x) = 0).
There are other algorithms who can outperform it, but they usually require graph pre-processing. If the problem is not to complex and you're looking for a quick solution, give it a try.
EDIT:
For graphs containing negative edges, there's the Bellman–Ford algorithm. Detecting the negative cycles comes at the cost of performance, though (worse than the A*). But it still may be better than what you're currently using.
EDIT 2:
User #templatetypedef is right when he says the Bellman-Ford algorithm may not work in here.
The B-F works with graphs where there are edges with negative weight. However, the algorithm stops upon finding a negative cycle. I believe that is a useful behavior. Optimizing the shortest path in a graph that contains a cycle of negative weights will be like going down a Penrose staircase.
What should happen if there's the possibility of reaching a path with "minus infinity cost" depends on the problem.

Why is Verlet integration better than Euler integration?

Can someone explain to me why Verlet integration is better than Euler integration? And why RK4 is better than Verlet? I don't understand why it is a better method.

The Verlet method is is good at simulating systems with energy conservation, and the reason is that it is symplectic. In order to understand this statement you have to describe a time step in your simulation as a function, f, that maps the state space into itself. In other words each timestep can be written on the following form.
(x(t+dt), v(t+dt)) = f(x(t),v(t))
The time step function, f, of the Verlet method has the special property that it conserves state-space volume. We can write this in mathematical terms. If you have a set A of states in the state space, then you can define f(A) by
f(A) = {f(x)| for x in A}
Now let us assume that the sets A and f(A) are smooth and nice so we can define their volume. Then a symplectic map, f, will always fulfill that the volume of f(A) is the same as the volume of A. (and this will be fulfilled for all nice and smooth choices of A). This is fulfilled by the time step function of the Verlet method, and therefore the Verlet method is a symplectic method.
Now the final question is. Why is a symplectic method good for simulating systems with energy conservation, but I am afraid that you will have to read a book to understand this.

The Euler method is a first order integration scheme, i.e. the total error is proportional to the step size. However, it can be numerically unstable, in other words, the accumulated error can overwhelm the calculation giving you nonsense. Please note, this instability can occur regardless of how small you make the step size or whether the system is linear or not. I am not familiar with verlet integration, so I can not speak to its efficacy. But, the Runge-Kutta methods differ from the Euler method in more than just step size.
In essence, they are based on a better way of numerically approximating the derivative. The precise details escape me at the moment. In general, the fourth order Runge-Kutta method is considered the workhorse of the integration schemes, but it does have some disadvantages. It is slightly dissipative, i.e. a small first derivative dependent term is added to your calculation which resembles an added friction. Also, it has a fixed step size which can result can make it difficult to achieve the accuracy you desire. Alternatively, you can use an adaptive stepsize scheme, like the Runge-Kutta-Fehlberg method, which gives fifth order accuracy for an additional 6 function evaluations. This can greatly reduce the time necessary to perform your calculation while improving accuracy, as shown here.

If everything just coasts along in a linear way, it wouldn't matter what method you used, but when something interesting (i.e. non-linear) happens, you need to look more carefully, either by considering the non-linearity directly (verlet) or by taking smaller timesteps (rk4).

Modeling human running on a soccer field

In a soccer game, I am computing a steering force using steering behaviors. This part is ok.
However, I am looking for the best way to implement simple 2d human locomotion.
For instance, the players should not "steer" (or simply add acceleration computed from steering force) to its current velocity when the cos(angle) between the steering force and the current velocity or heading vectors is lower than 0.5 because it looks as if the player is a vehicule. A human, when there is an important change of direction, slows down and when it has slowed enough, it starts accelerating in the new direction.
Does anyone have any advice, ideas on how to achieve this behavior? Thanks in advance.

Make it change direction very quickly but without perfect friction. EG super mario
Edit: but feet should not slide - use procedural animation for feet

This is already researched and developed in an initiative called "Robocup". They have a simulation 2D league that should be really similar to what you are trying to accomplish.
Here's a link that should point you to the right direction:
http://wiki.robocup.org/wiki/Main_Page

Maybe you could compute the curvature. If the curvature value is to big, the speed slows down.
http://en.wikipedia.org/wiki/Curvature

At low speed a human can turn on a dime. At high speed only very slight turns require no slowing. The speed and radius of the turn are thus strongly correlated.
How much a human slows down when aiming toward a target is actually a judgment call, not an automatic computation. One human might come to almost a complete stop, turn sharply, and run directly toward the target. Another human might slow only a little and make a wide curving arc—even if this increases the total length to the target. The only caveat is that if the desired target is inside the radius of the curve at the current speed, the only reasonable path is to slow since it would take a wide loop far from the target in order to reach it (rather than circling it endlessly).
Here's how I would go about doing it. I apologize for the Imperial units if you prefer metric.
The fastest human ever recorded traveled just under 28 mph. Each of your human units should be given a personal top speed between 1 and 28 mph.
Create a 29-element table of the maximum acceleration and deceleration rates of a human traveling at each whole mph in a straight line. It doesn't have to be exact--just approximate accel and decel values for each value. Create fast, medium, slow versions of the 29-element table and assign each human to one of these tables. The table chosen may be mapped to the unit's top speed, so a unit with a max of 10mph would be a slow accelerator.
Create a 29-element table of the sharpest radius a human can turn at that mph (0-28).
Now, when animating each human unit, if you have target information and must choose an acceleration from that, the task is harder. If instead you just have a force vector, it is easier. Let's start with the force vector.
If the force vector's net acceleration and resultant angle would exceed the limit of the unit's ability, restrict the unit's new vector to the maximum angle allowed, and also decelerate the unit at its maximum rate for its current linear speed.
During the next clock tick, being slower, it will be able to turn more sharply.
If the force vector can be entirely accommodated, but the unit is traveling slower than its maximum speed for that curvature, apply the maximum acceleration the unit has at that speed.
I know the details are going to be quite difficult, but I think this is a good start.
For the pathing version where you have a target and need to choose a force to apply, the problem is a bit different, and even harder. I'm out of ideas for now--but suffice it to say that, given the example condition of the human already running away from the target at top stpeed, there will be a best-time path that is between on the one hand, slowing enough while turning to complete a perfect arc to the target, and on the other hand stopping completely, rotating completely and running straight to the target.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas