Iterative deepening in minimax - sorting all legal moves, or just finding the PV-move then using MVV-LVA? - chess

After reading the chessprogramming wiki and other sources, I've been confused about the exact purpose of iterative deepening. My original understanding was the following:
It consists of a minimax search performed at depth=1, then depth=2, and so on until reaching the desired depth. After the search at each depth, sort the root-node moves according to the results from that search, to get optimal move ordering in the next search at depth+1: in that deeper search the PV-move is searched first, then the next best move, then the next best move after that, and so on.
Is this correct? Doubts emerged when I read about MVV-LVA ordering, specifically about ordering captures, and additionally, using hash tables and such. For example, this page recommends a move ordering of:
1. PV-move of the principal variation from the previous iteration of an iterative deepening framework for the leftmost path, often implicitly done by 2.
2. Hash move from hash tables
3. Winning captures/promotions
4. Equal captures/promotions
5. Killer moves (non-capture), often with mate killers first
6. Non-captures sorted by history heuristic and the like
7. Losing captures
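To check my own understanding, here is roughly how I imagine that list as a single sort key (my own sketch, not from the wiki; `move.is_capture`, `move.victim`, `move.attacker` are hypothetical fields and the constants are made up):

    # Rough sketch of the above ordering as a sort key; piece values and
    # bonus constants are made up for illustration.
    PIECE_VALUE = {'P': 1, 'N': 3, 'B': 3, 'R': 5, 'Q': 9}

    def move_order_key(move, pv_move, hash_move, killers, history):
        if move == pv_move:          # 1. PV-move from the previous iteration
            return 1_000_000
        if move == hash_move:        # 2. hash move
            return 900_000
        if move.is_capture:          # 3./4. captures by MVV-LVA
            # (a real engine would use SEE to push losing captures below killers)
            return 100_000 + 10 * PIECE_VALUE[move.victim] - PIECE_VALUE[move.attacker]
        if move in killers:          # 5. killer moves
            return 50_000
        return history.get(move, 0)  # 6. quiet moves by history heuristic

    # moves.sort(key=lambda m: move_order_key(m, pv, hash_mv, killers, history),
    #            reverse=True)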
If so, then what's the point of sorting the moves at each depth, if only the PV-move is needed? On the other hand, if the whole point of ID is the PV-move, isn't it a waste to run a search at every single depth up to the desired depth just to compute each depth's PV-move?
What is the concrete purpose of ID, and how much computation does it save?

Correct me if I am wrong, but I think you are mixing 2 different concepts here.
Iterative deepening is mainly used to set a maximum search time for each move. The AI goes deeper and deeper, and when the allotted time is up it returns the move from the deepest search it finished. Since each increase in depth leads to exponentially longer search times, searching every depth from e.g. 1 to 12 takes almost the same time as searching only at depth 12.
Sorting the moves is done to maximize the effect of alpha-beta pruning. Optimal alpha-beta pruning requires looking at the best move first, which is of course impossible to know beforehand, but the ordering you listed above is a good guess. Just make sure that the sorting itself doesn't slow down your recursive function so much that it cancels out the gains from alpha-beta.
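As a minimal sketch of how the two concepts fit together (assuming a hypothetical `alphabeta_root` that searches the root to the given depth, tries the supplied move ordering first, and also polls the deadline internally):

    import time

    def iterative_deepening(position, max_depth, time_budget):
        # Hypothetical interface: alphabeta_root returns
        # (best_move, root moves sorted best-first by this search).
        deadline = time.monotonic() + time_budget
        best_move, ordered_moves = None, None
        for depth in range(1, max_depth + 1):
            best_move, ordered_moves = alphabeta_root(position, depth,
                                                      ordered_moves, deadline)
            # Feeding the previous iteration's ordering back in is what makes
            # the next, deeper search cheap: the PV-move gets tried first.
            if time.monotonic() >= deadline:
                break  # time is up: return the deepest fully searched result
        return best_move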
Hope this helps and that I understood your question correctly.

Related

Can negamax use an asymmetric evaluation function?

TLDR: I have an asymmetric evaluation function for an implementation of negamax - is that acceptable? Or do I need to make it symmetric?
Longer:
I'm writing a game AI (for the chess-like board game "Hive") that was using minimax with alpha-beta pruning and an asymmetric evaluation function.
But I was having trouble adding transposition tables correctly, and was losing confidence in my minimax implementation, so I decided to switch to negamax using the pseudo-code here: https://en.wikipedia.org/wiki/Negamax#Negamax_with_alpha_beta_pruning_and_transposition_tables
I've got everything "working" and AFAIK accurately following the pseudo-code, but my AI is now making some wildly different moves than before and games that usually ended after 10-15 turns now take 30+, and I'm not convinced the AI is actually playing better than it was before. I'm worried that having an asymmetric evaluation function means I'm scoring nodes differently than before (because of the negamax flip-flopping).
I don't want to change to a symmetric function unless I really have to - I've been trying to produce an optimal function experimentally (AI vs AI battles) and have put in hundreds if not thousands of compute hours into producing a strong evaluation function.
Negamax supports asymmetric evaluation functions, but they do not lead to optimal play (assuming you have no knowledge about your opponent).
I don't know enough about Hive, but in computer chess it is, in general, a bug to have an asymmetric evaluation function. The reasons behind it should be the same for chess and Hive.
For instance, take the starting position (in chess). White is next to move and let us assume your evaluation function gives the position a score of +0.08.
Now change the position so that black is first to move. Everything is the same, only the roles of white and black have been swapped. Under the assumption that +0.08 was the optimal score for the white position, why should the position for black not also be evaluated as +0.08?
The same argument goes for any position. If you reverse everything, there is no good reason for playing the position differently.
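To see why negamax in particular forces this, here is a minimal negamax sketch (hypothetical `evaluate`/position interface, not the asker's code). The sign flip in the recursive call only produces correct scores if `evaluate` scores from the side to move's perspective and treats both colours identically:

    import math

    def negamax(pos, depth, alpha, beta):
        # Assumes evaluate(pos) returns a score from the perspective of the
        # side to move in `pos`.
        if depth == 0 or pos.is_terminal():
            return evaluate(pos)
        best = -math.inf
        for move in pos.legal_moves():
            child = pos.make(move)
            # This negation implicitly assumes a symmetric evaluation: the
            # score black sees must be exactly minus the score white sees.
            score = -negamax(child, depth - 1, -beta, -alpha)
            best = max(best, score)
            alpha = max(alpha, score)
            if alpha >= beta:
                break  # beta cutoff
        return best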
There is only one exception to this rule. If one opponent is clearly stronger than the other, there are arguments for an asymmetric evaluation. For instance, take a completely drawn position like this:
FEN: 4k3/8/8/p1p1p1p1/PpPpPpPp/1P1P1P1P/8/4K3 b - - 0 1
This position could safely be evaluated as 0. Now imagine the starting position, but with white missing one knight. This should be a strong advantage for black.
Let us assume you are Magnus Carlsen and you are playing against an opponent who does not even know the chess rules. Which position would you prefer? Here, I would argue that an asymmetric evaluation could make sense (e.g., evaluate a likely draw similarly to a loss). Carlsen should avoid the drawn position, while the beginner should prefer it.
The chances that the beginner can hold their own against the world champion, even at one-knight odds, are practically zero. On the other hand, in the drawn position, the skill advantage does not matter, as no order of moves can result in a win or loss.
In computer chess, Rebel had a function to prefer tactical positions when playing against humans (see ANTI GRANDMASTER PLAY). There is also the common concept of "contempt", which is the score that engines assign to a draw (remis).
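A sketch of how contempt is commonly applied (assumed constant and names, not any particular engine's code): draws are scored slightly negatively from the engine's own point of view, so it avoids them against presumed-weaker opposition:

    CONTEMPT = 25  # assumed value in centipawns; engines tune or expose this

    def draw_score(side_to_move, engine_side):
        # A draw counts slightly against the engine itself, and slightly in
        # its favour when the opponent is the one achieving it.
        return -CONTEMPT if side_to_move == engine_side else CONTEMPT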
But note that in both my examples, this is not optimal play. Magnus Carlsen would not choose the position without the knight when playing a strong (or unknown) opponent. Also Rebel would not use the anti-human strategy against other machines, which also excel in tactical battles. (Even though, depending on the position, Rebel 10 did use ANTI GRANDMASTER PLAY against computers.)

Iterative deepening with a time limit

I'm working on implementing iterative deepening with principal variation for alpha-beta search for a computer chess program, and I was hoping to include a time limit for the search. I was wondering about the consequences of the time limit being reached in the middle of, say, a search at a depth of 5. If this incomplete search has found a new principal variation, would that be guaranteed to be at least as good as the principal variation found by the complete search at a depth of 4? Otherwise, it seems like I should throw out anything found by the incomplete search at a depth of 5.
If you stop in the middle of an iteration, you can use the best move found so far on that iteration, backed up to the root. It's not guaranteed to be at least as good as the best move found by the previous iteration, but the current iteration has ordered it above that move. The best-scoring move will be missed by the current iteration only if it's ordered after the move you stopped on.
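A minimal sketch of that policy (the `search_root`/`OutOfTime` names are hypothetical), keeping a partial iteration's result only when that iteration actually backed a move up to the root:

    def search_with_time_limit(pos, deadline):
        # Assumes search_root(pos, depth, deadline) raises OutOfTime mid-search,
        # carrying the best root move fully searched so far on that iteration.
        best_move = None
        depth = 1
        while True:
            try:
                best_move = search_root(pos, depth, deadline)
            except OutOfTime as stop:
                if stop.partial_best is not None:
                    # The current iteration ranked this move above the
                    # previous iteration's best, so prefer it.
                    best_move = stop.partial_best
                return best_move
            depth += 1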

Finding best path through strongly connected component

I have a directed graph which is strongly connected, and every node has some price (positive or negative). I would like to find the best (highest-score) path from node A to node B. My solution is some kind of brute force, so it takes ages to find that path. Is there an algorithm for this, or any idea how I can do it?
Have you tried the A* algorithm?
It's a fairly popular pathfinding algorithm.
The algorithm itself is not too difficult to implement, and there are plenty of implementations available online.
Dijkstra's algorithm is a special case for the A* (in which the heuristic function h(x) = 0).
There are other algorithms that can outperform it, but they usually require graph pre-processing. If the problem is not too complex and you're looking for a quick solution, give it a try.
EDIT:
For graphs containing negative edges, there's the Bellman–Ford algorithm. Detecting the negative cycles comes at the cost of performance, though (worse than the A*). But it still may be better than what you're currently using.
EDIT 2:
User templatetypedef is right when he says the Bellman-Ford algorithm may not work here.
Bellman-Ford works with graphs that contain negative-weight edges. However, the algorithm stops upon finding a negative cycle, and I believe that is useful behavior: optimizing the shortest path in a graph that contains a negative-weight cycle would be like going down a Penrose staircase.
What should happen if there's the possibility of reaching a path with "minus infinity cost" depends on the problem.
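For reference, a minimal Bellman-Ford sketch with the negative-cycle check. To apply it to the question you would first fold the node prices into edge weights, e.g. weight(u, v) = -price(v), turning "highest score" into "shortest path"; and as noted above, if that conversion produces a negative cycle, Bellman-Ford can only report it:

    def bellman_ford(n, edges, source):
        # edges: list of (u, v, w) triples over vertices 0..n-1
        INF = float('inf')
        dist = [INF] * n
        dist[source] = 0
        for _ in range(n - 1):           # n-1 rounds of relaxation
            for u, v, w in edges:
                if dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
        for u, v, w in edges:            # one extra round detects negative cycles
            if dist[u] + w < dist[v]:
                raise ValueError("negative-weight cycle reachable from source")
        return dist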

Determining hopeless branches early in branch-and-bound algorithms

I have to design a branch-and-bound algorithm that solves the optimal tour of a graph on the Cartesian plane every time. I have been given the hint that identifying hopeless branches earlier in the runtime will compound into a program that runs "a hundred times faster". I had the idea of assuming that the shortest edge connected to the starting/ending node will be either the first or last edge in the tour, but a thin diamond-shaped graph proves otherwise. Does anyone have ideas for how to eliminate these hopeless branches, or a reference that talks about this?
Basically, is there a better way to branch to subsets of solutions than just lexicographically, e.g. the first branch includes and excludes edge a-b, the second branch includes and excludes edge a-c?
So somewhere in your branch-and-bound algorithm, you look at possible places to go, and then somehow keep track of them to do later.
To make this more efficient, you can do a couple things:
Write a better bound calculator. In other words, come up with an algorithm that determines the bound more accurately. This will result in less time spent on paths that turn out to be poor.
Instead of using a stack to keep track of things to do, use a queue; better yet, use a priority queue (heap) ordered by bound, i.e. the subproblems that look best are at the top of the heap and the ones that look bad are at the bottom, as in the sketch below.
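A minimal best-first skeleton of that idea (the `lower_bound`/`expand`/`is_complete`/`cost` callbacks are hypothetical placeholders for your own tour representation):

    import heapq

    def branch_and_bound(root, lower_bound, expand, is_complete, cost):
        best, best_cost = None, float('inf')
        counter = 0                      # tie-breaker so heapq never compares nodes
        heap = [(lower_bound(root), counter, root)]
        while heap:
            bound, _, node = heapq.heappop(heap)
            if bound >= best_cost:
                continue                 # hopeless: bound can't beat the incumbent
            if is_complete(node):
                c = cost(node)
                if c < best_cost:
                    best, best_cost = node, c
                continue
            for child in expand(node):
                b = lower_bound(child)
                if b < best_cost:        # prune hopeless children immediately
                    counter += 1
                    heapq.heappush(heap, (b, counter, child))
        return best, best_cost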
Nearest-neighbor is a simple algorithm. Branch-and-bound is just an optimizing loop plus a sub-problem solver, and I think nearest-neighbor can itself be seen as a branch-and-bound algorithm. Instead, I would look into the simplex algorithm, a linear programming method, and also cutting-plane algorithms for solving TSP.

Travelling Salesman and Map/Reduce: Abandon Channel

This is an academic rather than practical question. In the Traveling Salesman Problem, or any other problem that involves finding a minimum ... if one were using a map/reduce approach, it seems like there would be value in having some means for the current minimum result to be broadcast to all of the computational nodes, so that they can abandon computations which already exceed it.
In other words if we map the problem out we'd like each node to know when to give up on a given partial result before it's complete but when it's already exceeded some other solution.
One approach that comes immediately to mind would be for the reducer to have a means of providing feedback to the mapper. Consider if we had 100 nodes, and millions of paths being fed to them by the mapper. If the reducer feeds the best result back to the mapper, then that value could be included as an argument along with each new path (problem subset). In this approach the granularity is fairly rough ... the 100 nodes will each keep grinding away on their partition of the problem to completion and only get the new minimum with their next request from the mapper. (For a small number of nodes and a huge number of problem partitions/subsets to work across, this granularity would be inconsequential; it's also likely that one could apply heuristics to the sequence in which the possible routes or problem subsets are fed to the nodes, to get rapid convergence towards the optimum and thus minimize the amount of "wasted" computation performed by the nodes.)
Another approach that comes to mind would be for the nodes to be actively subscribed to some sort of channel, or multicast or even broadcast from which they could glean new minimums from their computational loop. In that case they could immediately abandon a bad computation when notified of a better solution (by one of their peers).
So, my questions are:
Is this concept covered by any terms of art in relation to existing map/reduce discussions?
Do any of the current map/reduce frameworks provide features to support this sort of dynamic feedback?
Is there some flaw with this idea ... some reason why it's stupid?
That's a cool topic that doesn't have much prior literature, so this is pretty much a brainstorming post rather than an answer to all your problems ;)
Every TSP can be expressed as a graph, possibly looking like this one (image taken from the German Wikipedia):
Now you can run a graph algorithm on it. MapReduce can be used for graph processing quite well, although it has a lot of overhead.
You need a paradigm that is called "Message Passing". It was described in this paper here: Paper.
And I blogged about it in terms of graph exploration; it explains quite simply how it works: My Blogpost
This is how you can tell the mapper the current minimum result (maybe just for the vertex itself).
With all that knowledge in the back of your mind, it should be fairly straightforward to build the branch-and-bound algorithm you described. For example, take a random start vertex and branch to every adjacent vertex. This sends a message to each of these adjacent vertices with the cost at which it can be reached from the start vertex (map step). Each vertex only updates its stored cost if the new one is lower than the currently stored cost (reduce step); initially the stored cost should be set to infinity.
You do this over and over again until you've reached the start vertex again (obviously after you have visited every other one). So you have to somehow keep track of the currently best way to reach each vertex; this can be stored in the vertex itself, too. And every now and then you have to bound this branching and cut off branches that are too costly; this can be done in the reduce step after reading the messages, as in the sketch below.
Basically this is just a mix of graph algorithms in MapReduce and a kind of shortest-path computation.
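A toy sketch of one such map/reduce round in plain Python (the names are mine, standing in for the framework's map and reduce phases, not any real MapReduce API):

    from collections import defaultdict

    def map_step(graph, costs):
        # Every vertex with a known cost sends a message to each neighbour:
        # "you can be reached via me at this cost".
        messages = defaultdict(list)
        for v, cost in costs.items():
            for neighbour, weight in graph[v]:
                messages[neighbour].append(cost + weight)
        return messages

    def reduce_step(costs, messages, bound):
        # Each vertex keeps the cheapest offer; offers at or above the current
        # global bound are cut off -- the "bound" part of branch-and-bound.
        new_costs = dict(costs)
        for v, offers in messages.items():
            best = min(offers)
            if best < new_costs.get(v, float('inf')) and best < bound:
                new_costs[v] = best
        return new_costs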
Note that this won't yield the optimal route between the nodes; it is still a heuristic. And you're just parallelizing an NP-hard problem.
BUT, a little self-advertising again (maybe you've already read it in the blog post I linked): there exists an abstraction over MapReduce that has far less overhead for this kind of graph processing. It is called BSP (Bulk Synchronous Parallel). It is freer in its communication and computing model, so I'm sure this could be implemented a lot better with BSP than with MapReduce. You can also realize the channels you spoke of more naturally with it.
I'm currently involved in a Summer of Code project which targets these SSSP problems with BSP. Maybe you want to take a look if you're interested; this could then be a partial solution. It is described very well in my blog, too: SSSP's in my blog
I'm excited to hear some feedback ;)
It seems that Storm implements what I was thinking of. It's essentially a computational topology (think of how each compute node might be routing results based on a key/hashing function to the specific reducers).
This is not exactly what I described, but it might be useful if one had a sufficiently low-latency way to propagate the current bound (i.e. local optimum information), which each node in the topology could update/receive in order to know which results to discard.