Why do numerical optimizers minimize rather than maximize? - optimization

I'm thinking about optim() in R or optimize() in Python's scipy. Both can minimize a function, and both can maximize a function by minimizing the inverted function. My question is: why are both coded to minimize rather than maximize? Is it just down to the numerical techniques they use (e.g., Nelder-Mead)?

Most (all?) solvers have minimization as the default behavior. I'm not sure what the historical reasons for that are; perhaps some of the early applications were all about minimizing penalties or costs rather than maximizing "goodness", but the standardization is certainly convenient.
Some frameworks offer an option to change the "sense" from min to max, but it is just syntactic sugar: it is exactly as easy to negate the objective function yourself. (I'm not sure what "inverted" function implies here; what you want is the negated function -f, not 1/f.)
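To make the "just negate it" point concrete, here is a minimal sketch with scipy.optimize.minimize; the toy objective f is made up purely for illustration:

    import numpy as np
    from scipy.optimize import minimize

    # Objective we actually want to MAXIMIZE: concave, with its maximum at x = 2
    def f(x):
        return -(x[0] - 2.0) ** 2 + 3.0

    # Minimize the negated objective instead
    res = minimize(lambda x: -f(x), x0=np.array([0.0]), method="Nelder-Mead")

    x_max = res.x        # argmax of f
    f_max = -res.fun     # maximum value of f (undo the sign flip)
    print(x_max, f_max)  # approximately [2.] and 3.0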


Higher precision eigenvalues with numpy

I'm currently computing several eigenvalues of different matrices and trying to find their closed-form solutions. The matrix is hermitian/self-adjoint, and tri-diagonal. Additionally, every diagonal element is positive and every off-diagonal is negative.
Due to what I suspect is an issue with trying to algebraically solve the quintic, sympy cannot solve the eigenvalues of my 14x14 matrix.
NumPy has given me great results that I'm sometimes able to turn into closed forms via Wolfram Alpha, but at other times the precision is insufficient to determine which of several candidate closed forms is the right one. As a result, I'd like to increase the precision with which numpy.linalg.eigvalsh outputs eigenvalues. Any help would be greatly appreciated!
Eigenvalue problems of size >= 5 have no general closed-form solution (for the reason you mention), so all general eigensolvers are iterative. As a result, there are a few sources of error.
First, there is the convergence error of the algorithm itself: even if all your computations were exact, you would need to run a certain number of iterations to reach a given accuracy.
Second, finite precision limits the overall accuracy.
Numerical analysts study how accurate a solution you can get for a given algorithm and working precision, and there are established results on this.
As for your specific problem, if you are not getting enough accuracy there are a few things you can try to do.
The first is to make sure you are using the best solver for your matrix structure; since your matrix is symmetric and tridiagonal, use a solver specialized for that type (as suggested by norok2). See the sketch below.
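For example, a sketch using SciPy's routine for real symmetric tridiagonal matrices, scipy.linalg.eigh_tridiagonal, which takes the diagonal and off-diagonal as separate 1-D arrays (the entries below are placeholders for yours):

    import numpy as np
    from scipy.linalg import eigh_tridiagonal

    n = 14
    d = np.full(n, 2.0)        # positive diagonal entries (placeholder values)
    e = np.full(n - 1, -1.0)   # negative off-diagonal entries (placeholder values)

    # Solver specialized for real symmetric tridiagonal matrices:
    # it only ever sees the diagonal d and the off-diagonal e.
    w = eigh_tridiagonal(d, e, eigvals_only=True)
    print(w)

Note that this still runs in float64; it exploits the structure for speed and robustness, but it does not by itself give you more digits.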
If that still doesn't give you enough accuracy, you can try to increase the precision.
However, the main issue with doing this in numpy is that the LAPACK functions under the hood are compiled for float64.
Thus, even if the numpy function allows inputs of higher precision (float128), it will round them before calling the LAPACK functions.
It might be possible to recompile those functions for higher precision, but that may not be worth the effort for your particular problem.
(As a side note, I'm not very familiar with scipy, so it may be the case that they have eigensolvers written in python which support all different types, but you need to be careful that they are actually doing every step in the higher precision and not silently rounding to float64 somewhere.)
For your problem, I would suggest using the package mpmath, which supports arbitrary precision linear algebra.
It is a bit slower since everything is done in software, but for a 14x14 matrix it should still be pretty quick; a sketch is below.
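A minimal sketch with mpmath, assuming its symmetric eigensolver mp.eigsy; the 3x3 matrix is just a stand-in for your 14x14 one:

    from mpmath import mp

    mp.dps = 50  # work with 50 significant decimal digits

    # Stand-in 3x3 symmetric tridiagonal matrix with positive diagonal and
    # negative off-diagonals; substitute your 14x14 matrix here.
    A = mp.matrix([[ 2, -1,  0],
                   [-1,  2, -1],
                   [ 0, -1,  2]])

    # Eigenvalues of a real symmetric matrix in arbitrary precision
    E = mp.eigsy(A, eigvals_only=True)
    for ev in E:
        print(ev)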

Support of trigonometric functions (e.g. cos, tan) in Z3

I would like to use Z3 to optimize a set of equations. The problem is that my equations are non-linear and, most importantly, they contain trigonometric functions. Is there a way to deal with that in Z3?
I am using the z3py API.
Transcendental numbers and trigonometric functions are usually not supported by SMT solvers. As Christopher pointed out (thanks!), Z3 does have some support for trigonometric functions and transcendentals, but that support is rather limited. (In practice, this means you shouldn't expect Z3 to decide every formula you throw at it; on complicated inputs it is most likely to simply return unknown.)
See https://link.springer.com/chapter/10.1007%2F978-3-642-38574-2_12 for the related publication. There are some examples in the following discussion thread that can get you started: https://github.com/Z3Prover/z3/issues/680
Also, note that the optimizing solver of Z3 doesn't handle nonlinear equations, so you wouldn't be able to optimize them. For this sort of optimization problem, traditional SMT solvers are simply not the right choice.
However, if you're happy with δ-satisfiability (allowing a bounded numerical error), check out dReal, which can deal with trigonometric functions: http://dreal.github.io/ As far as I can tell, however, it doesn't perform optimization either.
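For what it's worth, here is a rough sketch of how a trigonometric constraint looks with the dReal Python bindings; the names Variable, sin, And and CheckSatisfiability are taken from the dreal package's own examples, so treat the exact API as an assumption:

    # Rough sketch; assumes the `dreal` Python package exposes these names
    # as in its published examples.
    from dreal import Variable, And, sin, CheckSatisfiability

    x = Variable("x")

    # Ask for a point with sin(x) > 0.9 on [0, 10], up to delta = 1e-3
    formula = And(0 <= x, x <= 10, sin(x) > 0.9)
    result = CheckSatisfiability(formula, 0.001)
    print(result)  # a box of delta-satisfying values, or None if unsatisfiable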

Increase performance when calculating feature matrix?

Does calculate_feature_matrix use any libraries such as numba to increase performance?
I am one of the maintainers of Featuretools. calculate_feature_matrix currently only uses functions from Pandas/Numpy/Scipy to increase performance over raw Python. There are several areas where using numba or Cython may help, particularly in the PandasBackend class and in individual feature computation functions.
However, doing so requires a C compiler or shipping compiled C code, which adds extra complexity to the installation. Because of this complexity it's currently not high on our priority list, but we may consider adding it in the future.
Instead, we are more focused on scalability to larger datasets, which involves parallelization rather than subroutine optimization.
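To give a flavor of what that kind of low-level optimization would look like, here is a generic numba sketch; this is an illustration only, not Featuretools code, and grouped_mean is a made-up stand-in for an individual feature computation function:

    import numpy as np
    from numba import njit

    @njit
    def grouped_mean(group_ids, values, n_groups):
        # Made-up stand-in for a per-group feature computation, compiled by numba.
        sums = np.zeros(n_groups)
        counts = np.zeros(n_groups)
        for i in range(group_ids.shape[0]):
            g = group_ids[i]
            sums[g] += values[i]
            counts[g] += 1.0
        return sums / counts

    group_ids = np.array([0, 0, 1, 1, 1, 2], dtype=np.int64)
    values = np.array([1.0, 3.0, 2.0, 4.0, 6.0, 5.0])
    print(grouped_mean(group_ids, values, 3))  # [2. 4. 5.]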

Can variance be replaced by absolute value in this objective function?

Initially I modeled my objective function as follows:
argmin var(f(x),g(x))+var(c(x),d(x))
where f, g, c, d are linear functions.
In order to be able to use linear solvers, I remodeled the problem as follows:
argmin abs(f(x),g(x)) + abs(c(x),d(x))
Is it correct to change variance to absolute value in this context? I'm pretty sure they imply the same meaning, namely having the least difference between the two functions.
You haven't given enough context to answer the question. Even though your question doesn't seem to be about regression, in many ways it is similar to the question of choosing between least squares and least absolute deviations approaches to regression. If that term in your objective function is in any sense an error term then the most appropriate way to model the error depends on the nature of the error distribution. Least squares is better if there is normally distributed noise. Least absolute deviations is better in the nonparametric setting and is less sensitive to outliers. If the problem has nothing to do with probability at all then other criteria need to be brought in to decide between the two options.
Having said all this, the two ways of measuring distance are broadly similar. One will be fairly small if and only if the other is -- though they won't be equally small. If they are similar enough for your purposes then the fact that absolute values can be linearized could be a good motivation to use it. On the other hand -- if the variance-based one is really a better expression of what you are interested in then the fact that you can't use LP isn't sufficient justification to adopt absolute values. After all -- quadratic programming is not all that much harder than LP, at least below a certain scale.
To sum up -- they don't imply the same meaning, but they do imply similar meanings; and, whether or not they are similar enough depends upon your purposes.
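As an aside, since the linearization was mentioned above: minimizing abs(f(x) - g(x)) + abs(c(x) - d(x)) becomes an LP by introducing one auxiliary variable per absolute value. A sketch with scipy.optimize.linprog, where the coefficients of f, g, c, d are made up purely for illustration:

    import numpy as np
    from scipy.optimize import linprog

    # Made-up linear functions for illustration:
    #   f(x) = x1 + 2*x2 + 1,  g(x) = 3*x1 - x2
    #   c(x) = 2*x1 + x2,      d(x) = x1 + 4
    # so f - g has coefficients [-2, 3] and constant 1,
    #    c - d has coefficients [ 1, 1] and constant -4.
    fg_coef, fg_const = np.array([-2.0, 3.0]), 1.0
    cd_coef, cd_const = np.array([1.0, 1.0]), -4.0

    # Decision vector z = [x1, x2, t1, t2]; minimize t1 + t2
    obj = np.array([0.0, 0.0, 1.0, 1.0])

    # |f - g| <= t1 and |c - d| <= t2, written as four <= constraints
    A_ub = np.array([
        np.concatenate([ fg_coef, [-1.0, 0.0]]),   #  (f-g)(x) - t1 <= -fg_const
        np.concatenate([-fg_coef, [-1.0, 0.0]]),   # -(f-g)(x) - t1 <=  fg_const
        np.concatenate([ cd_coef, [0.0, -1.0]]),   #  (c-d)(x) - t2 <= -cd_const
        np.concatenate([-cd_coef, [0.0, -1.0]]),   # -(c-d)(x) - t2 <=  cd_const
    ])
    b_ub = np.array([-fg_const, fg_const, -cd_const, cd_const])

    res = linprog(obj, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None), (None, None), (0, None), (0, None)])
    print(res.x)  # [x1, x2, t1, t2]; here both absolute differences can reach 0

At the optimum, t1 and t2 equal the two absolute differences, so the LP value matches the original objective.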

Why is Verlet integration better than Euler integration?

Can someone explain to me why Verlet integration is better than Euler integration? And why RK4 is better than Verlet? I don't understand why it is a better method.
The Verlet method is good at simulating systems with energy conservation, and the reason is that it is symplectic. To understand this statement, you have to describe a time step of your simulation as a function, f, that maps the state space into itself. In other words, each time step can be written in the following form:
(x(t+dt), v(t+dt)) = f(x(t),v(t))
The time step function, f, of the Verlet method has the special property that it conserves state-space volume. We can write this in mathematical terms. If you have a set A of states in the state space, then you can define f(A) by
f(A) = {f(x) | x in A}
Now let us assume that the sets A and f(A) are smooth and nice enough that we can define their volume. A symplectic map f always has the property that the volume of f(A) equals the volume of A (and this holds for every nice and smooth choice of A). The time step function of the Verlet method has this volume-preserving property, and in fact satisfies the stronger symplectic condition, which is why the Verlet method is called a symplectic method.
The final question, then, is: why is a symplectic method good for simulating systems with energy conservation? I am afraid you will have to read a book to really understand this, but the sketch below at least shows the effect in practice.
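A quick way to see the practical consequence is to integrate a simple harmonic oscillator with explicit Euler and with (velocity) Verlet and watch the total energy; a minimal sketch:

    import numpy as np

    # Harmonic oscillator: a(x) = -x, exact energy E = 0.5*(v**2 + x**2) = 0.5
    def acc(x):
        return -x

    dt, n_steps = 0.05, 10_000

    # Explicit Euler
    x, v = 1.0, 0.0
    for _ in range(n_steps):
        x, v = x + dt * v, v + dt * acc(x)
    print("Euler  energy:", 0.5 * (v**2 + x**2))   # grows far above 0.5

    # Velocity Verlet
    x, v = 1.0, 0.0
    a = acc(x)
    for _ in range(n_steps):
        x = x + dt * v + 0.5 * dt**2 * a
        a_new = acc(x)
        v = v + 0.5 * dt * (a + a_new)
        a = a_new
    print("Verlet energy:", 0.5 * (v**2 + x**2))   # stays close to 0.5

With Euler the energy grows systematically at every step; the Verlet run keeps it oscillating around the true value, which is the volume-preservation property paying off in practice.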
The Euler method is a first-order integration scheme, i.e. the total error is proportional to the step size. It can also be numerically unstable: the accumulated error can overwhelm the calculation and give you nonsense. Note that this instability can occur regardless of how small you make the step size and whether or not the system is linear. I am not familiar with Verlet integration, so I cannot speak to its efficacy, but the Runge-Kutta methods differ from the Euler method in more than just step size.
In essence, they are based on a better way of numerically approximating the derivative over each step. The precise details escape me at the moment. In general, the fourth-order Runge-Kutta method is considered the workhorse of the integration schemes, but it does have some disadvantages. It is slightly dissipative, i.e. a small first-derivative-dependent term is added to your calculation which resembles an added friction. Also, it uses a fixed step size, which can make it difficult to achieve the accuracy you desire. Alternatively, you can use an adaptive step-size scheme, like the Runge-Kutta-Fehlberg method, which gives a fifth-order accurate estimate (and a built-in error estimate) from six function evaluations per step. This can greatly reduce the time necessary to perform your calculation while improving accuracy, as shown here.
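If you just want adaptive step-size control in practice, SciPy's solve_ivp handles it for you with an embedded Runge-Kutta pair (its default method, "RK45"); a small sketch on a harmonic oscillator:

    import numpy as np
    from scipy.integrate import solve_ivp

    # Simple harmonic oscillator written as a first-order system:
    # y = [x, v], dy/dt = [v, -x]
    def rhs(t, y):
        return [y[1], -y[0]]

    # RK45 is an adaptive embedded Runge-Kutta pair; rtol/atol control the
    # local error, and the solver chooses the step sizes itself.
    sol = solve_ivp(rhs, (0.0, 100.0), [1.0, 0.0], method="RK45",
                    rtol=1e-8, atol=1e-10)

    print("steps taken:", sol.t.size)
    print("final state:", sol.y[:, -1])  # close to [cos(100), -sin(100)]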
If everything just coasts along in a linear way, it wouldn't matter what method you used, but when something interesting (i.e. non-linear) happens, you need to look more carefully, either by considering the non-linearity directly (Verlet) or by taking smaller time steps (RK4).