I'm working on implementing iterative deepening with principal variation for alpha-beta search for a computer chess program, and I was hoping to include a time limit for the search. I was wondering about the consequences of the time limit being reached in the middle of, say, a search at a depth of 5. If this incomplete search has found a new principal variation, would that be guaranteed to be at least as good as the principal variation found by the complete search at a depth of 4? Otherwise, it seems like I should throw out anything found by the incomplete search at a depth of 5.
If you stop in the middle of an iteration, you can use the best move found so far, backed up to the root on that iteration. It's not guaranteed to be at least as good as the best move found by the previous iteration, but the current iteration has ordered it above the previous best. The best-scoring move will be missed by the current iteration only if it's ordered after the move the search was examining when it stopped.
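In code, the root driver might look roughly like this (a sketch, not a drop-in implementation: Board, Move, INFINITY, search(), orderMoves() and timeIsUp() are placeholders for whatever your engine already has):

```java
Move iterativeDeepening(Board board, int maxDepth) {
    Move bestMove = null;                          // best move of the last *completed* depth
    for (int depth = 1; depth <= maxDepth && !timeIsUp(); depth++) {
        Move bestThisDepth = null;
        int alpha = -INFINITY;
        for (Move move : orderMoves(board, bestMove)) {   // previous PV move searched first
            board.make(move);
            int score = -search(board, depth - 1, -INFINITY, -alpha);
            board.unmake(move);
            if (timeIsUp()) {
                break;   // the subtree search was cut short, so its score can't be trusted
            }
            if (score > alpha) {
                alpha = score;
                bestThisDepth = move;              // backed up to the root on this iteration
            }
        }
        if (timeIsUp()) {
            // Interrupted iteration: it ranked bestThisDepth above the previous best move,
            // but a root move it never reached could still be better.
            if (bestThisDepth != null) {
                bestMove = bestThisDepth;
            }
            break;
        }
        bestMove = bestThisDepth;                  // depth completed normally
    }
    return bestMove;
}
```

This keeps the behaviour described above: a move only replaces the previous iteration's choice if the current iteration actually finished searching it to the new depth.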
We are using the "auto delay until last" pattern from the docs, https://www.optaplanner.org/docs/optaplanner/latest/design-patterns/design-patterns.html
The loop detection computation is extremely expensive, and we would like to minimize the number of times it is called. Right now, it is being called in the #AfterVariableChanged method of the arrival time shadow variable listener.
The only information I have available in that method is the stop that got a new previous stop, and the score director. A move may change several planning variables, so I will be doing the loop detection once for every planning variable that changed, when I should only have to do it once per move (and possibly once for the undo move, unless I can be really clever and cache the loop detection result across moves).
Is there a way for me, from the score director, to figure out what move is being executed right now? I have no need to know exactly what move or what kind of move is being performed, only whether I am still in the same move as before.
I tried using scoreDirector.toString(), which has an incrementing number in it, but that number appears to be the same for a move and the corresponding undo move.
No, there is no access to a move from scoring code. That is by design - scoring needs to be independent of the moves executed. Every solution has a score, and if two solutions represent the same state of the problem, their scores must be equal - therefore the only thing that matters for the purposes of scoring is the state of the solution, not the state of the solver or any other external factor.
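That said, you can still get close to the "once per move" behaviour you want by keying a cache on the solution state that the loop detection actually reads, rather than on the move. A rough sketch (inside your variable listener; Stop, chainSignature() and detectLoops() are placeholders for your own code, not OptaPlanner API):

```java
// Cache keyed by the part of the solution state the loop detection depends on,
// e.g. the ids of the stops along the affected chain. Repeated calls during one move
// (one per changed variable), during its undo move, or from a later move that recreates
// the same chain then hit the cache instead of recomputing.
private final Map<List<Long>, Boolean> loopDetectionCache = new HashMap<>();

private boolean hasLoop(Stop anchorStop) {
    List<Long> signature = chainSignature(anchorStop);   // placeholder: key derived purely from state
    return loopDetectionCache.computeIfAbsent(signature, key -> detectLoops(anchorStop));
}
```

You would still need to bound or periodically clear the cache so it doesn't grow without limit.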
After reading the chessprogramming wiki and other sources, I've been confused about the exact purpose of iterative deepening. My original understanding was the following:
It consists of minimax search performed at depth=1, depth=2, etc. until reaching the desired depth. After the search at each depth, sort the root-node moves according to the results from that search, to give optimal move ordering for the next search at depth+1, so in the next, deeper search the PV-move is searched first, then the next best move, then the next best move after that, and so on.
Is this correct? Doubts emerged when I read about MVV-LVA ordering, specifically for ordering captures, and additionally about using hash tables and the like. For example, this page recommends a move ordering of:
1. PV-move of the principal variation from the previous iteration of an iterative deepening framework for the leftmost path, often implicitly done by 2.
2. Hash move from hash tables
3. Winning captures/promotions
4. Equal captures/promotions
5. Killer moves (non capture), often with mate killers first
6. Non-captures sorted by history heuristic and the like
7. Losing captures
If so, then what's the point of sorting the root moves after each depth, if only the PV-move is needed? On the other hand, if the whole point of ID is the PV-move, isn't it a waste to search every single depth up to the desired depth just to calculate the PV-move of each one?
What is the concrete purpose of ID, and how much computation does it save?
Correct me if I am wrong, but I think you are mixing 2 different concepts here.
Iterative deepening is mainly used to set a maximum search time for each move. The AI searches deeper and deeper, and when the allotted time is up it returns the move from the deepest search it finished. Since each increase in depth leads to exponentially longer search times, searching every depth from e.g. 1 to 12 takes almost the same time as searching only at depth 12. (With a branching factor of around 30, depths 1 through 11 together add only a few percent to the cost of the depth-12 search alone.)
Sorting the moves is done to maximize the effect of alpha-beta pruning. For optimal alpha-beta pruning you want to look at the best move first, which is of course impossible to know beforehand, but the ordering you listed above is a good guess. Just make sure that the sorting itself doesn't slow down your recursive function so much that it cancels out the gains from alpha-beta.
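For example, a cheap ordering function might look like this (sketch with placeholder types; victimValue() and attackerValue() stand in for whatever piece values your engine uses):

```java
// Previous iteration's PV move first, then captures by MVV-LVA (most valuable victim,
// least valuable attacker), then quiet moves. Sorted ascending, so lower keys come first.
void orderMoves(List<Move> moves, Move pvMove) {
    moves.sort(Comparator.comparingInt((Move m) -> {
        if (m.equals(pvMove)) {
            return Integer.MIN_VALUE;                              // always search the PV move first
        }
        if (m.isCapture()) {
            return -(10 * m.victimValue() - m.attackerValue());    // MVV-LVA, negated for ascending sort
        }
        return 0;                                                  // quiet moves after all captures
    }));
}
```

In a real engine you would typically precompute a score per move instead of recomputing it inside the comparator, precisely so the sorting doesn't eat the time that alpha-beta saves.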
Hope this helps and that I understood your question correctly.
I built an application which implements something similar to task assignment. I thought it worked well until recently I noticed the solutions are not optimal. In detail, there is a score table for each possible pair of machine and task, and usually the number of machines is much smaller than the number of tasks. I used hard/medium/soft rules, where the soft rule is incremental, based on the score of each assignment from the score table.
However, when I reviewed the results after a 1-2 hour run, I found that among the unassigned tasks there are many better choices (they would achieve a higher soft score if assigned) than the current assignments. The benchmark reports indicate that the total soft score reached a plateau within an hour and then got stuck at that score level.
I checked the logic of the rules - if the soft rule is working perfectly, it should eventually find an allocation which achieves the highest overall soft score while meeting the other hard/medium rules, shouldn't it?
I've been trying various things such as tuning algorithm parameters and scaling the score table, but none of them delivers the optimal solution.
One problem is that you might be facing a score trap (see the docs). In that case, make your constraint score more fine-grained to deal with it.
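As an illustration of "more fine-grained" (a sketch against recent OptaPlanner ConstraintStreams; Machine, getUsedCapacity() and getCapacity() are made-up names for whatever your model has):

```java
// Score trap: a flat penalty per overloaded machine gives the solver no gradient -
// a machine that is 1 unit over capacity scores exactly the same as one that is 100 over.
// Penalizing by the amount of excess rewards every small improvement instead:
Constraint machineCapacity(ConstraintFactory constraintFactory) {
    return constraintFactory.forEach(Machine.class)
            .filter(machine -> machine.getUsedCapacity() > machine.getCapacity())
            .penalize(HardMediumSoftScore.ONE_HARD,
                    machine -> machine.getUsedCapacity() - machine.getCapacity())
            .asConstraint("machineCapacity");
}
```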
If that's not the case and you're stuck in a local optimum, then I wouldn't play too much with the algorithm parameters - they will probably fix it, but you'll be overfitting on that dataset.
Instead, figure out the smallest possible move that gets you out of that local optimum and a step closer to the global optimum. Add that kind of move as a custom move. For example, if a normal swap move can't help, but you see a way of getting there by doing a 3-swap move, then implement that move.
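A custom move along those lines could look roughly like this (sketch only: Task, Machine and TaskAssignmentSolution are made-up names, "machine" is assumed to be a basic planning variable, and you would still need equals()/hashCode() plus a MoveListFactory or MoveIteratorFactory to generate the moves):

```java
import java.util.Objects;

import org.optaplanner.core.api.score.director.ScoreDirector;
import org.optaplanner.core.impl.heuristic.move.AbstractMove;

// Rotates the machines of three tasks: a gets b's machine, b gets c's, c gets a's.
public class ThreeTaskRotateMove extends AbstractMove<TaskAssignmentSolution> {

    private final Task a, b, c;

    public ThreeTaskRotateMove(Task a, Task b, Task c) {
        this.a = a;
        this.b = b;
        this.c = c;
    }

    @Override
    public boolean isMoveDoable(ScoreDirector<TaskAssignmentSolution> scoreDirector) {
        // Pointless if all three tasks are already on the same machine.
        return !(Objects.equals(a.getMachine(), b.getMachine())
                && Objects.equals(b.getMachine(), c.getMachine()));
    }

    @Override
    protected AbstractMove<TaskAssignmentSolution> createUndoMove(
            ScoreDirector<TaskAssignmentSolution> scoreDirector) {
        return new ThreeTaskRotateMove(c, b, a);   // rotating the other way undoes this move
    }

    @Override
    protected void doMoveOnGenuineVariables(ScoreDirector<TaskAssignmentSolution> scoreDirector) {
        Machine machineA = a.getMachine();
        Machine machineB = b.getMachine();
        Machine machineC = c.getMachine();
        scoreDirector.beforeVariableChanged(a, "machine");
        a.setMachine(machineB);
        scoreDirector.afterVariableChanged(a, "machine");
        scoreDirector.beforeVariableChanged(b, "machine");
        b.setMachine(machineC);
        scoreDirector.afterVariableChanged(b, "machine");
        scoreDirector.beforeVariableChanged(c, "machine");
        c.setMachine(machineA);
        scoreDirector.afterVariableChanged(c, "machine");
    }
}
```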
In the OptaPlanner configuration, there is a provision to specify the termination timeout.
Is there a better way to handle the termination timeout strategy? For example, my problem size is small and I have set the termination timeout to 10 seconds.
But I can see from the logs that the best score is obtained well within 2-3 seconds. Is there any way to exit once the best score is reached?
Or should the program always run until the timeout is reached and then output the best score?
Take a look at the Termination chapter in the OptaPlanner documentation.
What you are referring to is called BestScoreTermination but it might not be what you actually want -- do note that OptaPlanner has no way of knowing if the score is "the optimal score"... unless you configure Exhaustive Search (which doesn't scale well).
Therefore, if you misjudge your problem and set the BestScoreTermination to something "better" than the optimal value, OptaPlanner will run until it has tried out all combinations (which might take effectively forever on big problems). If you're looking for a compromise, take a look at "termination composition".
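For example, with programmatic configuration (a sketch assuming a recent OptaPlanner version; the same properties can go in solverConfig.xml):

```java
// Stop after 10 seconds OR as soon as the given score is reached, whichever comes first:
// multiple termination properties on one TerminationConfig are OR-ed by default.
// "0hard/0medium/0soft" is only an example - it must be a score that is actually attainable,
// otherwise the time limit is what ends the solve.
SolverConfig solverConfig = SolverConfig.createFromXmlResource("solverConfig.xml");
solverConfig.setTerminationConfig(new TerminationConfig()
        .withSecondsSpentLimit(10L)
        .withBestScoreLimit("0hard/0medium/0soft"));
// Another common compromise: withUnimprovedSecondsSpentLimit(2L) stops the solver
// once the best score hasn't improved for 2 seconds.
```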
I'm using OptaPlanner to solve some planning problems. I read the documentation, and I'm not quite sure how exactly the Hill Climbing and Tabu Search algorithms work. What I'm unsure of is:
does Hill Climbing pick only moves with THE BEST score that is BETTER than the current one, or does it also allow picking moves with THE BEST score that is NOT WORSE than the current one?
does Tabu Search allow picking moves that have a WORSE score than the current one if there is no move leading to a solution with a better or equal score?
For Hill Climbing, see HillClimbingAcceptor#isAccepted(...). It accepts any move that has a score that is better than or equal to the latest step score. And looking at the default forager config for hill climbing (in LocalSearchPhaseConfig, which does foragerConfig.setAcceptedCountLimit(1);), as soon as 1 move is accepted, it is the winning move.
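In other words, the acceptance rule boils down to something like this (paraphrased sketch, not the actual OptaPlanner source):

```java
// Hill Climbing: a move is accepted if its score is at least as good as the last step's score.
// Combined with acceptedCountLimit = 1, the first accepted move immediately becomes the step.
<S extends Score<S>> boolean isAccepted(S lastStepScore, S moveScore) {
    return moveScore.compareTo(lastStepScore) >= 0;
}
```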
For Tabu Search, it will select moves that have a worse score, if:
none of the moves it selects in a step (acceptedCountLimit is usually configured to 1000 or so) leads to a better score
OR all the moves that do lead to a better score are in the tabu list (they are "taboo to use"). For solutionTabu, this means there's a guarantee that they won't lead to a new best solution (but solutionTabu is useless). For entityTabu there is no such 100% guarantee, but you will get better results in about 99.999999999% of cases if you have more than, let's say, 50+ variables (and even more so with 1000+ variables).
PS: Hill Climbing sucks. There's never a good reason to not use Late Acceptance or Tabu Search instead.
PPS: Use the Benchmarker, and let HC, LA, TS, ... fight against each other. It will give you a lot of insight.