How can I order the basic solutions of a min cost flow problem according to their cost? - optimization

I was wondering if, given a min cost flow problem and an integer n, there is an efficient algorithm/package or mathematical method, to obtain the set of the
n-best basic solutions of the min cost flow problem (instead of just the best).

Not so easy. There were some special LP solvers that could do that (see: Ralph E. Steuer, Multiple Criteria Optimization: Theory, Computation, and Application, Wiley, 1986), but currently available LP solvers can't.
There is a way to encode a basis using binary variables:
b[i] = 1 if variable x[i] = basic
0 nonbasic
Using this, we can use "no good cuts" or "solution pool" technology to get the k best bases. See: https://yetanothermathprogrammingconsultant.blogspot.com/2016/01/finding-all-optimal-lp-solutions.html. Note that not all solution-pools can do the k-best. (Cplex can't, Gurobi can.) The "no-good" cuts work with any mip solver.
Update: a more recent reference is Craig A. Piercy, Ralph E. Steuer,
Reducing wall-clock time for the computation of all efficient extreme points in multiple objective linear programming, European Journal of Operational Research, 2019, https://doi.org/10.1016/j.ejor.2019.02.042

Related

Time complexity of a genetic algorithm for bin packing

I am trying to explore genetic algorithms (GA) for the bin packing problem, and compare it to classical Any-Fit algorithms. However the time complexity for GA is never mentioned in any of the scholarly articles. Is this because the time complexity is very high? and that the main goal of a GA is to find the best solution without considering the time? What is the time complexity of a basic GA?
Assuming that termination condition is number of iterations and it's constant then in general it would look something like that:
O(p * Cp * O(Crossover) * Mp * O(Mutation) * O(Fitness))
p - population size
Cp - crossover probability
Mp - mutation probability
As you can see it not only depends on parameters like eg. population size but also on implementation of crossover, mutation operations and fitness function implementation. In practice there would be more parameters like for example chromosome size etc.
You don't see much about time complexity in publications because researchers most of the time compare GA using convergence time.
Edit Convergence Time
Every GA has some kind of a termination condition and usually it's convergence criteria. Let's assume that we want to find the minimum of a mathematical function so our convergence criteria will be the function's value. In short we reach convergence during optimization when it's no longer worth it to continue optimization because our best individual doesn't get better significantly. Take a look at this chart:
You can see that after around 10000 iterations fitness doesn't improve much and the line is getting flat. Best case scenario reaches convergence at around 9500 iterations, after that point we don't observe any improvement or it's insignificantly small. Assuming that each line shows different GA then Best case has the best convergence time becuase it reaches convergence criteria first.

Genetic algorithm - find max of minimized subsets

I have a combinatorial optimization problem for which I have a genetic algorithm to approximate the global minima.
Given X elements find: min f(X)
Now I want to expand the search over all possible subsets and to find the one subset where its global minimum is maximal compared to all other subsets.
X* are a subset of X, find: max min f(X*)
The example plot shows all solutions of three subsets (one for each color). The black dot indicates the highest value of all three global minima.
image: solutions over three subsets
The main problem is that evaluating the fitness between subsets runs agains the convergence of the solution within a subset. Further the solution is actually a local minimum.
How can this problem be generally described? I couldn't find a similar problem in the literature so far. For example if its solvable with a multi-object genetic algorithm.
Any hint is much appreciated.
While it may not always provide exactly the highest minima (or lowest maxima), a way to maintain local optima with genetic algorithms consists in implementing a niching method. These are ways to maintain population diversity.
For example, in Niching Methods for Genetic Algorithms by Samir W. Mahfoud 1995, the following sentence can be found:
Using constructed models of fitness sharing, this study derives lower bounds on the population size required to maintain, with probability gamma, a fixed number of desired niches.
If you know the number of niches and you implement the solution mentioned, you could theoretically end up with the local optima you are looking for.

Implementing a 2D recursive spatial filter using Scipy

Minimally, I would like to know how to achieve what is stated in the title. Specifically, signal.lfilter seems like the only implementation of a difference equation filter in scipy, but it is 1D, as shown in the docs. I would like to know how to implement a 2D version as described by this difference equation. If that's as simple as "bro, use this function," please let me know, pardon my naiveté, and feel free to disregard the rest of the post.
I am new to DSP and acknowledging there might be a different approach to answering my question so I will explain the broader goal and give context for the question in the hopes someone knows how do want I want with Scipy, or perhaps a better way than what I explicitly asked for.
To get straight into it, broadly speaking I am using vectorized computation methods (Numpy/Scipy) to implement a Monte Carlo simulation to improve upon a naive for loop. I have successfully abstracted most of my operations to array computation / linear algebra, but a few specific ones (recursive computations) have eluded my intuition and I continually end up in the digital signal processing world when I go looking for how this type of thing has been done by others (that or machine learning but those "frameworks" are much opinionated). The reason most of my google searches end up on scipy.signal or scipy.ndimage library references is clear to me at this point, and subsequent to accepting the "signal" representation of my data, I have spent a considerable amount of time (about as much as reasonable for a field that is not my own) ramping up the learning curve to try and figure out what I need from these libraries.
My simulation entails updating a vector of data representing the state of a system each period for n periods, and then repeating that whole process a "Monte Carlo" amount of times. The updates in each of n periods are inherently recursive as the next depends on the state of the prior. It can be characterized as a difference equation as linked above. Additionally this vector is theoretically indexed on an grid of points with uneven stepsize. Here is an example vector y and its theoretical grid t:
y = np.r_[0.0024, 0.004, 0.0058, 0.0083, 0.0099, 0.0133, 0.0164]
t = np.r_[0.25, 0.5, 1, 2, 5, 10, 20]
I need to iteratively perform numerous operations to y for each of n "updates." Specifically, I am computing the curvature along the curve y(t) using finite difference approximations and using the result at each point to adjust the corresponding y(t) prior to the next update. In a loop this amounts to inplace variable reassignment with the desired update in each iteration.
y += some_function(y)
Not only does this seem inefficient, but vectorizing things seems intuitive given y is a vector to begin with. Furthermore I am interested in preserving each "updated" y(t) along the n updates, which would require a data structure of dimensions len(y) x n. At this point, why not perform the updates inplace in the array? This is wherein lies the question. Many of the update operations I have succesfully vectorized the "Numpy way" (such as adding random variates to each point), but some appear overly complex in the array world.
Specifically, as mentioned above the one involving computing curvature at each element using its neighbouring two elements, and then imediately using that result to update the next row of the array before performing its own curvature "update." I was able to implement a non-recursive version (each row fails to consider its "updated self" from the prior row) of the curvature operation using ndimage generic_filter. Given the uneven grid, I have unique coefficients (kernel weights) for each triplet in the kernel footprint (instead of always using [1,-2,1] for y'' if I had a uniform grid). This last part has already forced me to use a spatial filter from ndimage rather than a 1d convolution. I'll point out, something conceptually similar was discussed in this math.exchange post, and it seems to me only the third response saliently addressed the difference between mathematical notion of "convolution" which should be associative from general spatial filtering kernels that would require two sequential filtering operations or a cleverly merged kernel.
In any case this does not seem to actually address my concern as it is not about 2D recursion filtering but rather having a backwards looking kernel footprint. Additionally, I think I've concluded it is not applicable in that this only allows for "recursion" (backward looking kernel footprints in the spatial filtering world) in a manner directly proportional to the size of the recursion. Meaning if I wanted to filter each of n rows incorporating calculations on all prior rows, it would require a convolution kernel far too big (for my n anyways). If I'm understanding all this correctly, a recursive linear filter is algorithmically more efficient in that it returns (for use in computation) the result of itself applied over the previous n samples (up to a level where the stability of the algorithm is affected) using another companion vector (z). In my case, I would only need to look back one step at output signal y[n-1] to compute y[n] from curvature at x[n] as the rest works itself out like a cumsum. signal.lfilter works for this, but I can't used that to compute curvature, as that requires a kernel footprint that can "see" at least its left and right neighbors (pixels), which is how I ended up using generic_filter.
It seems to me I should be able to do both simultaneously with one filter namely spatial and recursive filtering; or somehow I've missed the maths of how this could be mathematically simplified/combined (convolution of multiples kernels?).
It seems like this should be a common problem, but perhaps it is rarely relevant to do both at once in signal processing and image filtering. Perhaps this is why you don't use signals libraries solely to implement a fast monte carlo simulation; though it seems less esoteric than using a tensor math library to implement a recursive neural network scan ... which I'm attempting to do right now.
EDIT: For those familiar with the theoretical side of DSP, I know that what I am describing, the process of designing a recursive filters with arbitrary impulse responses, is achieved by employing a mathematical technique called the z-transform which I understand is generally used for two things:
converting between the recursion coefficients and the frequency response
combining cascaded and parallel stages into a single filter
Both are exactly what I am trying to accomplish.
Also, reworded title away from FIR / IIR because those imply specific definitions of "recursion" and may be confusing / misnomer.

Can I use this fitness function?

I am working on a project using a genetic algorithm, and I am trying to formulate a fitness function, my questions are:
What is the effect of fitness formula choice on a GA?
It is possible to make the fitness function equals directly the number of violation (in case of minimisation)?
What is the effect of fitness formula choice on a GA
The fitness function plays a very important role in guiding GA.
Good fitness functions will help GA to explore the search space effectively and efficiently. Bad fitness functions, on the other hand, can easily make GA get trapped in a local optimum solution and lose the discovery power.
Unfortunately every problem has its own fitness function.
For classification tasks error measures (euclidean, manhattan...) are widely adopted. You can also use entropy based approaches.
For optimization problems, you can use a crude model of the function you are investigating.
Extensive literature is available on the characteristics of fitness function (e.g. {2}, {3}, {5}).
From an implementation point of view, some additional mechanisms have to be taken into consideration: linear scaling, sigma truncation, power scaling... (see {1}, {2}).
Also the fitness function can be dynamic: changing during the evolution to help search space exploration.
It is possible to make the fitness function equals directly the number of violation (in case of minimisation)?
Yes, it's possible but you have to consider that it could be a too coarse grained fitness function.
If the fitness function is too coarse (*), it doesn't have enough expressiveness to guide the search and the genetic algorithm will get stuck in local minima a lot more often and may never converge on a solution.
Ideally a good fitness function should have the capacity to tell you what the best direction to go from a given point is: if the fitness of a point is good, a subset of its neighborhood should be better.
So no large plateau (a broad flat region that doesn't give a search direction and induces a random walk).
(*) On the other hand a perfectly smooth fitness function could be a sign you are using the wrong type of algorithm.
A naive example: you look for parameters a, b, c such that
g(x) = a * x / (b + c * sqrt(x))
is a good approximation of n given data points (x_i, y_i)
You could minimize this fitness function:
| 0 if g(x_i) == y_i
E1_i = |
| 1 otherwise
f1(a, b, c) = sum (E1_i)
i
and it could work, but the search isn't aimed. A better choice is:
E2_i = (y_i - g(x_i)) ^ 2
f1(a, b, c) = sum (E2_i)
i
now you have a "search direction" and greater probability of success.
Further details:
Genetic Algorithms: what fitness scaling is optimal? by Vladik Kreinovich, Chris Quintana
Genetic Algorithms in Search, Optimization and Machine Learning by Goldberg, D. (1989, Addison-Wesley)
The Royal Road for Genetic Algorithms: Fitness Landscapes and GA Performance by Melanie Mitchell, Stephanie Forrest, John H Holland.
Avoiding the pitfalls of noisy fitness functions with genetic algorithms by Fiacc Larkin, Conor Ryan (ISBN: 978-1-60558-325-9)
Essentials of Metaheuristics by Sean Luke

Standard Errors for Differential Evolution

Is it possible to calculate standard errors for Differential Evolution?
From the Wikipedia entry:
http://en.wikipedia.org/wiki/Differential_evolution
It's not derivative based (indeed that is one of its strengths) but how then so you calculate the standard errors?
I would have thought some kind of bootstrapping strategy might have been applicable but can't seem to find any sources than apply bootstrapping to DE?
Baz
Concerning the standard errors, differential evolution is just like any other evolutionary algorithm.
Using a bootstrapping strategy seems a good idea: the usual formulas assume a normal (Gaussian) distribution for the underlying data. That's almost never true for evolutionary computation (exponential distributions being far more common, probably followed by bimodal distributions).
The simplest bootstrap method involves taking the original data set of N numbers and sampling from it to form a new sample (a resample) that is also of size N. The resample is taken from the original using sampling with replacement. This process is repeated a large number of times (typically 1000 or 10000 times) and for each of these bootstrap samples we compute its mean / median (each of these are called bootstrap estimates).
The standard deviation (SD) of the means is the bootstrapped standard error (SE) of the mean and the SD of the medians is the bootstrapped SE of the median (the 2.5th and 97.5th centiles of the means are the bootstrapped 95% confidence limits for the mean).
Warnings:
the word population is used with different meanings in different contexts (bootstrapping vs evolutionary algorithm)
in any GA or GP, the average of the population tells you almost nothing of interest. Use the mean/median of the best-of-run
the average of a set that is not normally distributed produces a value that behaves non-intuitively. Especially if the probability distribution is skewed: large values in "tail" can dominate and average tends to reflect the typical value of the "worst" data not the typical value of the data in general. In this case it's better the median
Some interesting links are:
A short guide to using statistics in Evolutionary Computation
An Introduction to Statistics for EC Experimental Analysis