Optimizing assignments based on many variables - optimization

I was recently talking with someone in Resource Management and we discussed the problem of assigning developers to projects when there are many variables to consider (of possibly different weights), e.g.:
The developer's skills & the technology/domain of the project
The developer's travel preferences & the location of the project
The developer's interests and the nature of the project
The basic problem the RM person had to deal with on a regular basis was this: given X developers where each developers has a unique set of attributes/preferences, assign them to Y projects where each project has its own set of unique attributes/requirements.
It seems clear to me that this is a very mathematical problem; it reminds me of old optimization problems from algebra and/or calculus (I don't remember which) back in high school: you know, find the optimal dimensions for a container to hold the maximum volume given this amount of material—that sort of thing.
My question isn't about the math, but rather whether there are any software projects/libraries out there designed to address this kind of problem. Does anyone know of any?

My question isn't about the math, but rather whether there are any software projects/libraries out there designed to address this kind of problem. Does anyone know of any?
In my humble opinion, I think that this is putting the cart before the horse. You first need to figure out what problem you want to solve. Then, you can look for solutions.
For example, if you formulate the problem by assigning some kind of numerical compatibility score to every developer/project pair with the goal of maximizing the total sum of compatibility scores, then you have a maximum-weight matching problem which can be solved with the Hungarian algorithm. Conveniently, this algorithm is implemented as part of Google's or-tools library.
On the other hand, let's say that you find that computing compatibility scores to be infeasible or unreasonable. Instead, let's say that each developer ranks all the projects from best to worst (e.g.: in terms of preference) and, similarly, each project ranks each developer from best to worst (e.g.: in terms of suitability to the project). In this case, you have an instance of the Stable Marriage problem, which is solved by the Gale-Shapley algorithm. I don't have a pointer to an established library for G-S, but it's simple enough that it seems that lots of people just code their own.

Yes, there are mathematical methods for solving a type of problem which this problem can be shoehorned into. It is the natural consequence of thinking of developers as "resources", like machine parts, largely interchangeable, their individuality easily reduced to simple numerical parameters. You can make up rules such as
The fitness value is equal to the subject skill parameter multiplied by the square root of the reliability index.
and never worry about them again. The same rules can be applied to different developers, different subjects, different scales of projects (with a SLOC scaling factor of, say, 1.5). No insight or real leadership is needed, the equations make everything precise and "assured". The best thing about this approach is that when the resources fail to perform the way your equations say they should, you can just reduce their performance scores to make them fit. And if someone has already written the tool, then you don't even have to worry about the math.
(It is interesting to note that Resource Management people always seem to impose such metrics on others in an organization -- thereby making their own jobs easier-- and never on themselves...)

Related

Looking for ideas/references/keywords: adaptive-parameter-control of a search algorithm (online-learning)

I'm looking for ideas/experiences/references/keywords regarding an adaptive-parameter-control of search algorithm parameters (online-learning) in combinatorial-optimization.
A bit more detail:
I have a framework, which is responsible for optimizing a hard combinatorial-optimization-problem. This is done with the help of some "small heuristics" which are used in an iterative manner (large-neighborhood-search; ruin-and-recreate-approach). Every algorithm of these "small heuristics" is taking some external parameters, which are controlling the heuristic-logic in some extent (at the moment: just random values; some kind of noise; diversify the search).
Now i want to have a control-framework for choosing these parameters in a convergence-improving way, as general as possible, so that later additions of new heuristics are possible without changing the parameter-control.
There are at least two general decisions to make:
A: Choose the algorithm-pair (one destroy- and one rebuild-algorithm) which is used in the next iteration.
B: Choose the random parameters of the algorithms.
The only feedback is an evaluation-function of the new-found-solution. That leads me to the topic of reinforcement-learning. Is that the right direction?
Not really a learning-like-behavior, but the simplistic ideas at the moment are:
A: A roulette-wheel-selection according to some performance-value collected during the iterations (near past is more valued than older ones).
So if heuristic 1 did find all the new global best solutions -> high probability of choosing this one.
B: No idea yet. Maybe it's possible to use some non-uniform random values in the range (0,1) and i'm collecting some momentum of the changes.
So if heuristic 1 last time used alpha = 0.3 and found no new best solution, then used 0.6 and found a new best solution -> there is a momentum towards 1
-> next random value is likely to be bigger than 0.3. Possible problems: oscillation!
Things to remark:
- The parameters needed for good convergence of one specific algorithm can change dramatically -> maybe more diversify-operations needed at the beginning, more intensify-operations needed at the end.
- There is a possibility of good synergistic-effects in a specific pair of destroy-/rebuild-algorithm (sometimes called: coupled neighborhoods). How would one recognize something like that? Is that still in the reinforcement-learning-area?
- The different algorithms are controlled by a different number of parameters (some taking 1, some taking 3).
Any ideas, experiences, references (papers), keywords (ml-topics)?
If there are ideas regarding the decision of (b) in a offline-learning-manner. Don't hesitate to mention that.
Thanks for all your input.
Sascha
You have a set of parameter variables which you use to control your set of algorithms. Selection of your algorithms is just another variable.
One approach you might like to consider is to evolve your 'parameter space' using a genetic algorithm. In short, GA uses an analogue of the processes of natural selection to successively breed ever better solutions.
You will need to develop an encoding scheme to represent your parameter space as a string, and then create a large population of candidate solutions as your starting generation. The genetic algorithm itself takes the fittest solutions in your set and then applies various genetic operators to them (mutation, reproduction etc.) to breed a better set which then become the next generation.
The most difficult part of this process is developing an appropriate fitness function: something to quantitatively measure the quality of a given parameter space. Your search problem may be too complex to measure for each candidate in the population, so you will need a proxy model function which might be as hard to develop as the ideal solution itself.
Without understanding more of what you've written it's hard to see whether this approach is viable or not. GA is usually well suited to multi-variable optimisation problems like this, but it's not a silver bullet. For a reference start with Wikipedia.
This sounds like hyper heuristics which you're trying to do. Try looking for that keyword.
In Drools Planner (open source, java) I have support for tabu search and simulated annealing out the box.
I haven't implemented the ruin-and-recreate-approach (yet), but that should be easy, although I am not expecting better results. Challenge: Prove me wrong and fork it and add it and beat me in the examples.
Hyper heuristics are on my TODO list.

What model best suits optimizing for a real-time strategy game?

An article has been making the rounds lately discussing the use of genetic algorithms to optimize "build orders" in StarCraft II.
http://lbrandy.com/blog/2010/11/using-genetic-algorithms-to-find-starcraft-2-build-orders/
The initial state of a StarCraft match is pre-determined and constant. And like chess, decisions made in this early stage of the match have long-standing consequences to a player's ability to perform in the mid and late game. So the various opening possibilities or "build orders" are under heavy study and scrutiny. Until the circulation of the above article, computer-assisted build order creation probably wasn't as popularity as it has been recently.
My question is... Is a genetic algorithm really the best way to model optimizing build orders?
A build order is a sequence of actions. Some actions have prerequisites like, "You need building B before you can create building C, but you can have building A at any time." So a chromosome may look like AABAC.
I'm wondering if a genetic algorithm really is the best way to tackle this problem. Although I'm not too familiar with the field, I'm having a difficult time shoe-horning the concept of genes into a data structure that is a sequence of actions. These aren't independent choices that can be mixed and matched like a head and a foot. So what value is there to things like reproduction and crossing?
I'm thinking whatever chess AIs use would be more appropriate since the array of choices at any given time could be viewed as tree-like in a way.
Although I'm not too familiar with the field, I'm having a difficult time shoe-horning the concept of genes into a data structure that is a sequence of actions. These aren't independent choices that can be mixed and matched like a head and a foot. So what value is there to things like reproduction and crossing?
Hmm, that's a very good question. Perhaps the first few moves in Starcraft can indeed be performed in pretty much any order, since contact with the enemy is not as immediate as it can be in Chess, and therefore it is not as important to remember the order of the first few moves as it is to know which of the many moves are included in those first few. But the link seems to imply otherwise, which means the 'genes' are indeed not all that amenable to being swapped around, unless there's something cunning in the encoding that I'm missing.
On the whole, and looking at the link you supplied, I'd say that genetic algorithms are a poor choice for this situation, which could be accurately mathematically modelled in some parts and the search tree expanded out in others. They may well be better than an exhaustive search of the possibility space, but may not be - especially given that there are multiple populations and poorer ones are just wasting processing time.
However, what I mean by "a poor choice" is that it is inefficient relative to a more appropriate approach; that's not to say that it couldn't still produce 98% optimal results in under a second or whatever. In situations such as this where the brute force of the computer is useful, it is usually more important that you have modelled the search space correctly than to have used the most effective algorithm.
As TaslemGuy pointed out, Genetic Algorithms aren't guaranteed to be optimal, even though they usually give good results.
To get optimal results you would have to search through every possible combination of actions until you find the optimal path through the tree-like representation. However, doing this for StarCraft is difficult, since there are so many different paths to reach a goal. In chess you move a pawn from e2 to e4 and then the opponent moves. In StarCraft you can move a unit at instant x or x+1 or x+10 or ...
A chess engine can look at many different aspects of the board (e.g. how many pieces does it have and how many does the opponent have), to guide it's search. It can ignore most of the actions available if it knows that they are strictly worse than others.
For a build-order creator only time really matters. Is it better to build another drone to get minerals faster, or is it faster to start that spawning pool right away? Not as straightforward as with chess.
These kinds of decisions happen pretty early on, so you will have to search each alternative to conclusion before you can decide on the better one, which will take a long time.
If I were to write a build-order optimizer myself, I would probably try to formulate a heuristic that estimates how good (close the to the goal state) the current state is, just as chess engines do:
Score = a*(Buildings_and_units_done/Buildings_and_units_required) - b*Time_elapsed - c*Minerals - d*Gas + e*Drone_count - f*Supply_left
This tries to keep the score tied to the completion percentage as well as StarCraft common knowledge (keep your ressources low, build drones, don't build more supply than you need). The variables a to f would need tweaking, of course.
After you've got a heuristic that can somewhat estimate the worth of a situation, I would use Best-first search or maybe IDDFS to search through the tree of possibilities.
Edit:
I recently found a paper that actually describes build order optimization in StarCraft, in real time even. The authors use depth-first search with branch and bound and heuristics that estimate the minimum amount of effort required to reach the goal based on the tech tree (e.g. zerglings need a spawning pool) and the time needed to gather the required minerals.
Genetic Algorithm can be, or can sometimes not be, the optimal or non-optimal solution. Based on the complexity of the Genetic Algorithm, how much mutation there is, the forms of combinations, and how the chromosomes of the genetic algorithm is interpreted.
So, depending on how your AI is implemented, Genetic Algorithms can be the best.
You are looking at a SINGLE way to implement genetic algorithms, while forgetting about genetic programming, the use of math, higher-order functions, etc. Genetic algorithms can be EXTREMELY sophisticated, and by using clever combining systems for crossbreeding, extremely intelligent.
For instance, neural networks are optimized by genetic algorithms quite often.
Look up "Genetic Programming." It's similar, but uses tree-structures instead of lines of characters, which allows for more complex interactions that breed better. For more complex stuff, they typically work out better.
There's been some research done using hierarchical reinforcement learning to build a layered ordering of actions that efficiently maximizes a reward. I haven't found much code implementing the idea, but there are a few papers describing MAXQ-based algorithms that have been used to explicitly tackle real-time strategy game domains, such as this and this.
This Genetic algorithm only optimizes the strategy for one very specific part of the game: The order of the first few build actions of the game. And it has a very specific goal as well: To have as many roaches as quickly as possible.
The only aspects influencing this system seem to be (I'm no starcraft player):
build time of the various units and
buildings
allowed units and buildings given the available units and buildings
Larva regeneration rate.
This is a relatively limited, relatively well defined problem with a large search space. As such it is very well suited for genetic algorithms (and quite a few other optimization algorithm at that). A full gene is a specific set of build orders that ends in the 7th roach. From what I understand you can just "play" this specific gene to see how fast it finishes, so you have a very clear fitness test.
You also have a few nice constraints on the build order, so you can combine different genes slightly smarter than just randomly.
A genetic algorithm used in this way is a very good tool to find a more optimal build order for the first stage of a game of starcraft. Due to its random nature it is also good at finding a surprising strategy, which might have been an additional goal of the author.
To use a genetic algorithm as the algorithm in an RTS game you'd have to find a way to encode reactions to situations rather than just plain old build orders. This also involves correctly identifying situations which can be a difficult task in itself. Then you'd have to let these genes play thousands of games of starcraft, against each other and (possibly) against humans, selecting and combining winners (or longer-lasting losers). This is also a good application of genetic algorithms, but it involves solving quite a few very hard problems before you even get to the genetic algorithm part.

What are the typical use cases of Genetic Programming?

Today I read this blog entry by Roger Alsing about how to paint a replica of the Mona Lisa using only 50 semi transparent polygons.
I'm fascinated with the results for that particular case, so I was wondering (and this is my question): how does genetic programming work and what other problems could be solved by genetic programming?
There is some debate as to whether Roger's Mona Lisa program is Genetic Programming at all. It seems to be closer to a (1 + 1) Evolution Strategy. Both techniques are examples of the broader field of Evolutionary Computation, which also includes Genetic Algorithms.
Genetic Programming (GP) is the process of evolving computer programs (usually in the form of trees - often Lisp programs). If you are asking specifically about GP, John Koza is widely regarded as the leading expert. His website includes lots of links to more information. GP is typically very computationally intensive (for non-trivial problems it often involves a large grid of machines).
If you are asking more generally, evolutionary algorithms (EAs) are typically used to provide good approximate solutions to problems that cannot be solved easily using other techniques (such as NP-hard problems). Many optimisation problems fall into this category. It may be too computationally-intensive to find an exact solution but sometimes a near-optimal solution is sufficient. In these situations evolutionary techniques can be effective. Due to their random nature, evolutionary algorithms are never guaranteed to find an optimal solution for any problem, but they will often find a good solution if one exists.
Evolutionary algorithms can also be used to tackle problems that humans don't really know how to solve. An EA, free of any human preconceptions or biases, can generate surprising solutions that are comparable to, or better than, the best human-generated efforts. It is merely necessary that we can recognise a good solution if it were presented to us, even if we don't know how to create a good solution. In other words, we need to be able to formulate an effective fitness function.
Some Examples
Travelling Salesman
Sudoku
EDIT: The freely-available book, A Field Guide to Genetic Programming, contains examples of where GP has produced human-competitive results.
Interestingly enough, the company behind the dynamic character animation used in games like Grand Theft Auto IV and the latest Star Wars game (The Force Unleashed) used genetic programming to develop movement algorithms. The company's website is here and the videos are very impressive:
http://www.naturalmotion.com/euphoria.htm
I believe they simulated the nervous system of the character, then randomised the connections to some extent. They then combined the 'genes' of the models that walked furthest to create more and more able 'children' in successive generations. Really fascinating simulation work.
I've also seen genetic algorithms used in path finding automata, with food-seeking ants being the classic example.
Genetic algorithms can be used to solve most any optimization problem. However, in a lot of cases, there are better, more direct methods to solve them. It is in the class of meta-programming algorithms, which means that it is able to adapt to pretty much anything you can throw at it, given that you can generate a method of encoding a potential solution, combining/mutating solutions, and deciding which solutions are better than others. GA has an advantage over other meta-programming algorithms in that it can handle local maxima better than a pure hill-climbing algorithm, like simulated annealing.
I used genetic programming in my thesis to simulate evolution of species based on terrain, but that is of course the A-life application of genetic algorithms.
The problems GA are good at are hill-climbing problems. Problem is that normally it's easier to solve most of these problems by hand, unless the factors that define the problem are unknown, in other words you can't achieve that knowledge somehow else, say things related with societies and communities, or in situations where you have a good algorithm but you need to fine tune the parameters, here GA are very useful.
A situation of fine tuning I've done was to fine tune several Othello AI players based on the same algorithms, giving each different play styles, thus making each opponent unique and with its own quirks, then I had them compete to cull out the top 16 AI's that I used in my game. The advantage was they were all very good players of more or less equal skill, so it was interesting for the human opponent because they couldn't guess the AI as easily.
http://en.wikipedia.org/wiki/Genetic_algorithm#Problem_domains
You should ask yourself : "Can I (a priori) define a function to determine how good a particular solution is relative to other solutions?"
In the mona lisa example, you can easily determine if the new painting looks more like the source image than the previous painting, so Genetic Programming can be "easily" applied.
I have some projects using Genetic Algorithms. GA are ideal for optimization problems, when you cannot develop a fully sequential, exact algorithm do solve a problem. For example: what's the best combination of a car characteristcs to make it faster and at the same time more economic?
At the moment I'm developing a simple GA to elaborate playlists. My GA has to find the better combinations of albums/songs that are similar (this similarity will be "calculated" with the help of last.fm) and suggests playlists for me.
There's an emerging field in robotics called Evolutionary Robotics (w:Evolutionary Robotics), which uses genetic algorithms (GA) heavily.
See w:Genetic Algorithm:
Simple generational genetic algorithm pseudocode
Choose initial population
Evaluate the fitness of each individual in the population
Repeat until termination: (time limit or sufficient fitness achieved)
Select best-ranking individuals to reproduce
Breed new generation through crossover and/or mutation (genetic
operations) and give birth to
offspring
Evaluate the individual fitnesses of the offspring
Replace worst ranked part of population with offspring
The key is the reproduction part, which could happen sexually or asexually, using genetic operators Crossover and Mutation.

How to test numerical analysis routines?

Are there any good online resources for how to create, maintain and think about writing test routines for numerical analysis code?
One of the limitations I can see for something like testing matrix multiplication is that the obvious tests (like having one matrix being the identity) may not fully test the functionality of the code.
Also, there is the fact that you are usually dealing with large data structures as well. Does anyone have some good ideas about ways to approach this, or have pointers to good places to look?
It sounds as if you need to think about testing in at least two different ways:
Some numerical methods allow for some meta-thinking. For example, invertible operations allow you to set up test cases to see if the result is within acceptable error bounds of the original. For example, matrix M-inverse times the matrix M * random vector V should result in V again, to within some acceptable measure of error.
Obviously, this example exercises matrix inverse, matrix multiplication and matrix-vector multiplication. I like chains like these because you can generate quite a lot of random test cases and get statistical coverage that would be a slog to have to write by hand. They don't exercise single operations in isolation, though.
Some numerical methods have a closed-form expression of their error. If you can set up a situation with a known solution, you can then compare the difference between the solution and the calculated result, looking for a difference that exceeds these known bounds.
Fundamentally, this question illustrates the problem that testing complex methods well requires quite a lot of domain knowledge. Specific references would require a little more specific information about what you're testing. I'd definitely recommend that you at least have Steve Yegge's recommended book list on hand.
If you're going to be doing matrix calculations, use LAPACK. This is very well-tested code. Very smart people have been working on it for decades. They've thought deeply about issues that the uninitiated would never think about.
In general, I'd recommend two kinds of testing: systematic and random. By systematic I mean exploring edge cases etc. It helps if you can read the source code. Often algorithms have branch points: calculate this way for numbers in this range, this other way for numbers in another range, etc. Test values close to the branch points on either side because that's where approximation error is often greatest.
Random input values are important too. If you rationally pick all the test cases, you may systematically avoid something that you don't realize is a problem. Sometimes you can make good use of random input values even if you don't have the exact values to test against. For example, if you have code to calculate a function and its inverse, you can generate 1000 random values and see whether applying the function and its inverse put you back close to where you started.
Check out a book by David Gries called The Science of Programming. It's about proving the correctness of programs. If you want to be sure that your programs are correct (to the point of proving their correctness), this book is a good place to start.
Probably not exactly what you're looking for, but it's the computer science answer to a software engineering question.

How to gauge the quality of a software product

I have a product, X, which we deliver to a client, C every month, including bugfixes, enhancements, new development etc.) Each month, I am asked to err "guarantee" the quality of the product.
For this we use a number of statistics garnered from the tests that we do, such as:
reopen rate (number of bugs reopened/number of corrected bugs tested)
new bug rate (number of new, including regressions, bugs found during testing/number of corrected bugs tested)
for each new enhancement, the new bug rate (the number of bugs found for this enhancement/number of mandays)
and various other figures.
It is impossible, for reasons we shan't go into, to test everything every time.
So, my question is:
How do I estimate the number and type of bugs that remain in my software?
What testing strategies do I have to follow to make sure that the product is good?
I know this is a bit of an open question, but hey, I also know that there are no simple solutions.
Thanks.
I don't think you can ever really estimate the number of bugs in your app. Unless you use a language and process that allows formal proofs, you can never really be sure. Your time is probably better spent setting up processes to minimize bugs than trying to estimate how many you have.
One of the most important things you can do is have a good QA team and good work item tracking. You may not be able to do full regression testing every time, but if you have a list of the changes you've made to the app since the last release, then your QA people (or person) can focus their testing on the parts of the app that are expected to be affected.
Another thing that would be helpful is unit tests. The more of your codebase you have covered the more confident you can be that changes in one area didn't inadvertently affected another area. I've found this quite useful, as sometimes I'll change something and forget that it would affect another part of the app, and the unit tests showed the problem right away. Passed unit tests won't guarantee that you haven't broken anything, but they can help increase confidence that changes you make are working.
Also, this is a bit redundant and obvious, but make sure you have good bug tracking software. :)
The question is who requires you to provide the stats.
If it's non-technical people, fake the stats. By "fake", I mean "provide any inevitably meaningless, but real numbers" of the kind you mentioned.
If it's technical people without a CS background, they ought to be told about the halting problem, which is undecidable and is simpler than counting and classifying the remaining bugs.
There's a lot of metrics and tools regarding software quality (code coverage, cyclomatic complexity, coding guidelines and tools enforcing them, etc.). In practice, what works is automating as much tests as possible, having human testers do as many tests that weren't automated as possible, and then pray.
I think keeping it simple is the best way to go. Categorize your bugs by severity, and address them in order of decreasing severity.
This way you can hand over the highest-quality build possible (the number of significant bugs remaining is how I would gauge the quality of the product, as opposed to some complex statistics).
Most of the agile methodologies address this dilemma pretty clearly. You can't test everything. Neither can you test it infinite number of times before you release. So the procedure is to rely on the risk and likelihood of the bug. Both risk and likelihood are numerical values. The product of both gives you a RPN number. If the number is less than 15 you ship a beta. If you can bring it down to less than 10 you ship the product and push the bug to be fixed in a future releasee.
How to calculate risk ?
If its a crash then its a 5
If its a crash but you can provide a work around then its a number less than 5.
If the bug reduces the functionality then its a 4
How to calculate likelihood ?
can you re-produce it every time you run, its a 5.
If the work around provided still causes it to crash then less than 5
Well, I am curious to know whether anyone else using this scheme and eager to know their milage on this.
How long is a piece of string? Ultimately what makes a quality product? Bugs gives some indication yes, but many other factors are involved, Unit Test coverage is a key factor in IMO. But in my experience the main factor that effects whether a product can be deemed quality or not, is good understanding of the problem that is being solved. Often what happens is, the 'problem' that the product is meant to solve is not understood correctly and developers end up inventing the solution to a problem they have flesh out in their head, and not the real problem, thus 'bugs' are made. I am a strong proponent of iterative Agile development, that way the product is constantly access against the 'problem' and the product does not stray to far from its goal.
The questions I heard wer, how do I estimate the bugs in my software? and what techniques do I use to ensure the quality is good?
Rather than go through a full course, here are a couple approaches.
How do I estimate the bugs in my software?
Start with the history, you know how many you found during testing (hopefully) and you know how many were found after the fact. You can use that to estimate how efficient you are at finding bugs (DDR - Defect Detection Rate is one name for this). If you can show that for some consistent time period, your DDR is consistent (or improving) you can provide some insight into the quality of the release by guessing at the number of post-release defects that will be found once the product is released.
What techniques do I use to ensure the quality is good?
Root cause analysis on your bugs will point you to specific components that are buggy, specific developers that create buggy code, the fact that lacking full requirements results in implementation not matching expectations, etc.
Project Review meetings to quickly identify what was good, so those things can be repeated and what was bad and find a way to not do those again.
Hopefully, these give you a good start. Good Luck!
It seems the consensus is that the emphasis should be placed on unit testing. Bug tracking is a good indicator of the product quality, but is only is acurate as your test team. If you employ unit testing it gives you a measurable metric of code coverage and provides regression testing so you can be assured you didn't break anything since last month.
My company relies on system/integration level testing. I see alot of defects being introduced because there is a lack of regression testing. I think "bugs" where the developer's implementation of the requirements deviates from the user's vision is sort of a seperate problem that as Dan and rptony stated is best addressed by Agile methodologies.