Can variance be replaced by absolute value in this objective function? - optimization

Initially I modeled my objective function as follows:
argmin var(f(x),g(x))+var(c(x),d(x))
where f,g,c,d are linear functions
in order to be able to use linear solvers I modeled the problem as follows
argmin abs(f(x),g(x))+abs(c(x),d(x))
is it correct to change variance to absolute value in this context, I'm pretty sure they imply the same meaning as having the least difference between two functions

You haven't given enough context to answer the question. Even though your question doesn't seem to be about regression, in many ways it is similar to the question of choosing between least squares and least absolute deviations approaches to regression. If that term in your objective function is in any sense an error term then the most appropriate way to model the error depends on the nature of the error distribution. Least squares is better if there is normally distributed noise. Least absolute deviations is better in the nonparametric setting and is less sensitive to outliers. If the problem has nothing to do with probability at all then other criteria need to be brought in to decide between the two options.
Having said all this, the two ways of measuring distance are broadly similar. One will be fairly small if and only if the other is -- though they won't be equally small. If they are similar enough for your purposes then the fact that absolute values can be linearized could be a good motivation to use it. On the other hand -- if the variance-based one is really a better expression of what you are interested in then the fact that you can't use LP isn't sufficient justification to adopt absolute values. After all -- quadratic programming is not all that much harder than LP, at least below a certain scale.
To sum up -- they don't imply the same meaning, but they do imply similar meanings; and, whether or not they are similar enough depends upon your purposes.

Related

Parameter Estimation to Minimize Runtime

Suppose, I an algorithm, whose runtime depends on two parameters. I want to find the best set of parameters that minimizes the runtime. The two parameters are continuous double values in the range of 0 to INFINITY.
Therefore, for two parameters a,b: I want to find the best values of a and b that minimize the runtime. I think this is pretty standard practice, but I could not find good literature on this. I found few literature such as MLE, Least Squares, etc. but they talk about distribution.
First use your brains to understand the possible functional relationship between those parameters and the running time, in a qualitative way. This means having a first idea on the number and positions of possible maxima, smoothness of the function, asymptotic behavior and any other clue that you can find.
Then make up your mind about a reasonable range of values where it makes sense to sample the function values. If those ranges are very wide, it is preferable to sample using a geometric progression rather than arithmetic (say, powers of 2).
Then measure and observe the function values with a graphical viewer and confirm your intuitions. It is likely that this will be enough to spot the gross location of the absolute maximum. Finding an accurate position might be useless if it gives you the last percents of improvement. It is also very likely that the location of the optimum will depend on the particular dataset, making accurate location even less useful.

Finding best path trought strongly connected component

I have a directed graph which is strongly connected and every node have some price(plus or negative). I would like to find best (highest score) path from node A to node B. My solution is some kind of brutal force so it takes ages to find that path. Is any algorithm for this or any idea how can I do it?
Have you tried the A* algorithm?
It's a fairly popular pathfinding algorithm.
The algorithm itself is not to difficult to implement, but there are plenty of implementations available online.
Dijkstra's algorithm is a special case for the A* (in which the heuristic function h(x) = 0).
There are other algorithms who can outperform it, but they usually require graph pre-processing. If the problem is not to complex and you're looking for a quick solution, give it a try.
EDIT:
For graphs containing negative edges, there's the Bellman–Ford algorithm. Detecting the negative cycles comes at the cost of performance, though (worse than the A*). But it still may be better than what you're currently using.
EDIT 2:
User #templatetypedef is right when he says the Bellman-Ford algorithm may not work in here.
The B-F works with graphs where there are edges with negative weight. However, the algorithm stops upon finding a negative cycle. I believe that is a useful behavior. Optimizing the shortest path in a graph that contains a cycle of negative weights will be like going down a Penrose staircase.
What should happen if there's the possibility of reaching a path with "minus infinity cost" depends on the problem.

Converting decision problems to optimization problems? (evolutionary algorithms)

Decision problems are not suited for use in evolutionary algorithms since a simple right/wrong fitness measure cannot be optimized/evolved. So, what are some methods/techniques for converting decision problems to optimization problems?
For instance, I'm currently working on a problem where the fitness of an individual depends very heavily on the output it produces. Depending on the ordering of genes, an individual either produces no output or perfect output - no "in between" (and therefore, no hills to climb). One small change in an individual's gene ordering can have a drastic effect on the fitness of an individual, so using an evolutionary algorithm essentially amounts to a random search.
Some literature references would be nice if you know of any.
Application to multiple inputs and examination of percentage of correct answers.
True, a right/wrong fitness measure cannot evolve towards more rightness, but an algorithm can nonetheless apply a mutable function to whatever input it takes to produce a decision which will be right or wrong. So, you keep mutating the algorithm, and for each mutated version of the algorithm you apply it to, say, 100 different inputs, and you check how many of them it got right. Then, you select those algorithms that gave more correct answers than others. Who knows, eventually you might see one which gets them all right.
There are no literature references, I just came up with it.
Well i think you must work on your fitness function.
When you say that some Individuals are more close to a perfect solution can you identify this solutions based on their genetic structure?
If you can do that a program could do that too and so you shouldn't rate the individual based on the output but on its structure.

rule based fuzzy control system and function approximation

I am trying to implement a function approximator (aggregation) using a rule-based fuzzy control system. So as to simplify my implementation (and have better understanding) I am trying to approximate y=x^2 (the simplest non-linear function). As far as i understand i have to map my input (e.g. uniform samples over [-1,1]) to fuzzy sets (fuzzyfication) and then use a defuzzyfication method to take crisp values. Is there any simple explanation of this procedure because fuzzy control system literature is a bit mess.
This is sort of a broad question, but I'll give it a go since it has sat unanswered for so long.
First, I believe you need to refine your objective (at least as it stated here). I would hesitate to use the term "function approximation" in this context. If I follow your question correctly, the objective is map a non-linear function into another domain via fuzzy methods.
To do so, you first need to define your fuzzy set membership functions. (This link is a good example of the process.) Without additional information, the I recommend the triangular function due to its ease in implementation. The number of fuzzy sets, their placement and width (or support), and degree of overlap is application specific. You've indicated that your input domain is [-1,1], so you might find that three fuzzy sets does the trick, i.e Negative, Zero, and Positive.
From there, you need to craft a set of rules, i.e. if x is Negative then...
With rules in place, you can then define the defuzzification process. In short, this step weights the activation of each rule according to the needs of the application.
I don't believe I can contribute more fully until the output is better defined. You state "use a defuzzyfication method to take crisp values." - what does this set of crisp values mean? What is the range? Etc. Furthermore, you'll get more a response if you can identify the areas in which you are stuck (i.e. more specific questions).

Difference between Gene Expression Programming and Cartesian Genetic Programming

Something pretty annoying in evolutionary computing is that mildly different and overlapping concepts tend to pick dramatically different names. My latest confusion because of this is that gene-expression-programming seems very similar to cartesian-genetic-programming.
(how) Are these fundamentally different concepts?
I've read that indirect encoding of GP instructions is an effective technique ( both GEP and CGP do that ). Has there been reached some sort of consensus that indirect encoding has outdated classic tree bases GP?
Well, it seems that there is some difference between gene expression programming (GEP) and cartesian genetic programming (CGP or what I view as classic genetic programming), but the difference might be more hyped up than it really ought to be. Please note that I have never used GEP, so all of my comments are based on my experience with CGP.
In CGP there is no distinction between genotype and a phenotype, in other words- if you're looking at the "genes" of a CGP you're also looking at their expression. There is no encoding here, i.e. the expression tree is the gene itself.
In GEP the genotype is expressed into a phenotype, so if you're looking at the genes you will not readily know what the expression is going to look like. The "inventor" of GP, Cândida Ferreira, has written a really good paper and there are some other resources which try to give a shorter overview of the whole concept.
Ferriera says that the benefits are "obvious," but I really don't see anything that would necessarily make GEP better than CGP. Apparently GEP is multigenic, which means that multiple genes are involved in the expression of a trait (i.e. an expression tree). In any case, the fitness is calculated on the expressed tree, so it doesn't seem like GEP is doing anything to increase the fitness. What the author claims is that GEP increases the speed at which the fitness is reached (i.e. in fewer generations), but frankly speaking you can see dramatic performance shifts from a CGP just by having a different selection algorithm, a different tournament structure, splitting the population into tribes, migrating specimens between tribes, including diversity into the fitness, etc.
Selection:
random
roulette wheel
top-n
take half
etc.
Tournament Frequency:
once per epoch
once per every data instance
once per generation.
Tournament Structure:
Take 3, kill 1 and replace it with the child of the other two.
Sort all individuals in the tournament by fitness, kill the lower half and replace it with the offspring of the upper half (where lower is worse fitness and upper is better fitness).
Randomly pick individuals from the tournament to mate and kill the excess individuals.
Tribes
A population can be split into tribes that evolve independently of each-other:
Migration- periodically, individual(s) from a tribe would be moved to another tribe
The tribes are logically separated so that they're like their own separate populations running in separate environments.
Diversity Fitness
Incorporate diversity into the fitness, where you count how many individuals have the same fitness value (thus are likely to have the same phenotype) and you penalize their fitness by a proportionate value: the more individuals with the same fitness value, the more penalty for those individuals. This way specimens with unique phenotypes will be encouraged, therefore there will be much less stagnation of the population.
Those are just some of the things that can greatly affect the performance of a CGP, and when I say greatly I mean that it's in the same order or greater than Ferriera's performance. So if Ferriera didn't tinker with those ideas too much, then she could have seen much slower performance of the CGPs... especially if she didn't do anything to combat stagnation. So I would be careful when reading performance statistics on GEP, because sometimes people fail to account for all of the "optimizations" available out there.
There seems to be some confusion in these answers that must be clarified. Cartesian GP is different from classic GP (aka tree-based GP), and GEP. Even though they share many concepts and take inspiration from the same biological mechanisms, the representation of the individuals (the solutions) varies.
In CGPthe representation (mapping between genotype and phenotype) is indirect, in other words, not all of the genes in a CGP genome will be expressed in the phenome (a concept also found in GEP and many others). The genotypes can be coded in a grid or array of nodes, and the resulting program graph is the expression of active nodes only.
In GEP the representation is also indirect, and similarly not all genes will be expressed in the phenotype. The representation in this case is much different from treeGP or CGP, but the genotypes are also expressed into a program tree. In my opinion GEP is a more elegant representation, easier to implement, but also suffers from some defects like: you have to find the appropriate tail and head size which is problem specific, the mnltigenic version is a bit of a forced glue between expression trees, and finally it has too much bloat.
Independently of which representation may be better than the other in some specific problem domain, they are general purpose, can be applied to any domain as long as you can encode it.
In general, GEP is simpler from GP. Let's say you allow the following nodes in your program: constants, variables, +, -, *, /, if, ...
For each of such nodes with GP you must create the following operations:
- randomize
- mutate
- crossover
- and probably other genetic operators as well
In GEP for each of such nodes only one operation is needed to be implemented: deserialize, which takes array of numbers (like double in C or Java), and returns the node. It resembles object deserialization in languages like Java or Python (the difference is that deserialization in programming languages uses byte arrays, where here we have arrays of numbers). Even this 'deserialize' operation doesn't have to be implemented by the programmer: it can be implemented by a generic algorithm, just like it's done in Java or Python deserialization.
This simplicity from one point of view may make searching of best solution less successful, but from other side: requires less work from programmer and simpler algorithms may execute faster (easier to optimize, more code and data fits in CPU cache, and so on). So I would say that GEP is slightly better, but of course the definite answer depends on problem, and for many problems the opposite may be true.