what is the best data structure for solving TSP in visual basic - vb.net

I made a program in VB to solve TSP using Genetic algorithm and I use Array list as Data structure , I want to ask is there another data structure for solving TSP in visual basic better than what I used?
also I will make a program in VB to solve TSP using branch and bound algorithm , what is the best data structure can I use it in this case or array good for that?
thank you

I don't know VB, but the following should be general enough.
If the genotypes are directly the city permutations the data structures I use are (for N cities):
a distance matrix - a N-1 by N-1 2-D array where position (i, j) contains the distance from i-th city to j-th city
the genotypes are then arrays (or lists) of the city indices (i.e. 0..N-1)
The fitness evaluation is then easy and fast as it is just a single walk through the genotype with constant-time lookups of the distances. If memory is an issue and the problem is BIG (i.e. tens of thousands of cities and more) you might want to consider not storing the whole distance matrix and store only a part of it (if the problem type allows to, like in symmetric TSP where distance from A to B is equal to distance from B to A) or just not to store it at all and compute the distances on demand.
For branch-and-bound approach you basically need the distance matrix too. If you are going to do some distance-based prioritisation over the order in which are the cities chosen and your TSP is a metric one (i.e. each city is a point in a 2D plane) you can use K-D tree for fast lookup of cities nearest to any point in the plane.

Related

Breadth first or Depth first

There is a theory that says six degrees of seperations is the highest
degree for people to be connected through a chain of acquaintances.
(You know the Baker - Degree of seperation 1, the Baker knows someone
you don't know - Degree of separation 2)
We have a list of People P, list A of corresponding acquaintances
among these people, and a person x
We are trying to implement an algorithm to check if person x respects
the six degrees of separations. It returns true if the distance from x
to all other people in P is at most six, false otherwise.
We are tying to accomplish O(|P| + |A|) in the worst-case.
To implement this algorithm, I thought about implementing an adjacency list over an adjacency matrix to represent the Graph G with vertices P and edges A, because an Adjacency Matrix would take O(n^2) to traverse.
Now I thought about using either BFS or DFS, but I can't seem to find a reason as to why the other is more optimal for this case.
I want to use BFS or DFS to store the distances from x in an array d, and then loop over the array d to look if any Degree is larger than 6.
DFS and BFS have the same Time Complexity, but Depth is better(faster?) in most cases at finding the first Degree larger than 6, whereas Breadth is better at excluding all Degrees > 6 simultaneously.
After DFS or BFS I would then loop over the array containing the distances from person x, and return true if there was no entry >6 and false when one is found.
With BFS, the degrees of separations would always be at the end of the Array, which would maybe lead to a higher time complexity?
With DFS, the degrees of separations would be randomly scattered in the Array, but the chance to have a degree of separation higher than 6 early in the search is higher.
I don't know if it makes any difference to the Time Complexity if using DFS or BFS here.
Time complexity of BFS and DFS is exactly the same. Both methods visit all connected vertices of the graph, so in both cases you have O(V + E), where V is the number of vertices and E is the number of edges.
That being said, sometimes one algorithm can be preferred over the other precisely because the order of vertex visitation is different. For instance, if you were to evaluate a mathematical expression, DFS would be much more convenient.
In your case, BFS could be used to optimize graph traversal, because you can simply cut-off BFS at the required degree of separation level. All the people who have the required (or bigger) degree of separation would be left unmarked as visited.
The same trick would be much more convoluted to implement with DFS, because as you've astutely noticed, DFS first gets "to the bottom" of the graph, and then it goes back recursively (or via stack) up level by level.
I believe that you can use the the Dijkstra algorithm.
Is a BFS approach that updates your path, is the path have a smaller value. Think as distance have always a cost of 1 and, if you have two friends (A and B) for a person N.
Those friends have a common friend C but, in a first time your algorithm checks a distance for friend A with cost 4 and mark as visited, they can't check the friend B that maybe have a distance of 3. The Dijkstra will help you doing checking this.
The Dijkstra solve this in O(|V|+|E|log|V)
See more at https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm

Does translating the genes in a chromosome for a genetic algorithm for a combinatorial function increase the diversity of candidates?

I'm new to genetic algorithms and am writing code for the Traveling Salesman problem. I'm using cycle crossover to generate new offspring and I've found that this leads to some of the offspring retaining the same exact phenotype as one parent even when the two parents are different. Would translating the chromosomes avoid this?
By translate I mean a chromosome with phenotype ABCDE shifting over two to DEABC. They would be equivalent answers and have equal fitness, but might make more diverse offspring.
Is this worth it in the long run, or is it just wasting computing time?
Cycle crossover (CX) is based on the assumption that it's important to preserve the absolute position of cities (a city preferably inherits its position from either parent) and the preventive "translation" is against the spirit of CX.
Anyway multiple studies (e.g. 1) have shown that for TSP the key is to preserve the relative position of cities and the edges.
So it could work, but you have to experiment. Some form of mutation is another possibility.
Probably, if the characteristics of CX aren't satisfying, a different crossover operator is a better choice: staying with simple operators, one of the most successful is Order Crossover (e.g. 2).
L. Darrell Whitley, Timothy Starkweather, D'Ann Fuquay - Scheduling problems and traveling salesmen: The genetic edge recombination operator - 1989.
Pablo Moscato - On Genetic Crossover Operators for Relative Order Preservation.

Generating Matched Pairs for Statistical Analysis

In my study, a person is represented as a pair of real numbers (x, y). x is on [30, 80] and y is [60, 120]. There are two types of people, A and B. I have ~300 of each type. How can I generate the largest (or even a large) set of pairs of one person from A with one from B: ((xA, yA), (xB, yB)) such that each pair of points is close? Two points are close if abs(x1-x2) < dX and abs(y1 - y2) < dY. Similar constraints are acceptable. (That is, this constraint is roughly a Manhattan metric, but euclidean/etc is ok too.) Not all points need be used, but no point can be reused.
You're looking for the Hungarian Algorithm.
Suggested formulation: A are rows, B are columns, each cell contains a distance metric between Ai and Bi, e.g. abs(X(Ai)-X(Bi)) + abs(Y(Ai)-Y(Bi)). (You can normalize the X and Y values to [0,1] if you want distances to be proportional to the range of each variable)
Then use the Hungarian Algorithm to minimize matching weight.
You can filter out matches with distances over your threshold. If you're worried that this filtering might cause the approach to be sub-optimal, you could set distances over your threshold to a very high number.
There are many implementations of this algorithm. A short search found one in any conceivable language, including VBA for Excel and some online solvers (not sure about matching 300x300 matrix with them, though)
Hungarian algorithm did it, thanks Etov.
Source code available here: http://www.filedropper.com/stackoverflow1

Row / column vs linear indexing speed (spatial locality)

Related question:
This one
I am using a spatial grid which can potentially get big (10^6 nodes) or even bigger. I will regularly have to perform displacement operations (like a particle from a node to another). I'm not a crack in informatics but I begin to understand the concepts of cache lines and spatial locality, though not well yet. So, I was wandering if it is preferible to use a 2D array (and if yes, which one? I'd prefer to avoid boost for now, but maybe I will link it later) and indexing the displacement for example like this:
Array[i][j] -> Array[i-1][j+2]
or, with a 1D array, if NX is the "equivalent" number of columns:
Array[i*NX+j] -> Array[(i-1)*NX+j+2]
Knowing that it will be done nearly one million times per iteration, with nearly one million iteration as well.
With modern compilers and optimization enabled both of these will probably generate the exact same code
Array[i-1][j+2] // Where Array is 2-dimensional
and
Array[(i-1)*NX+j+2] // Where Array is 1-dimensional
assuming NX is the dimension of the second subscript in the 2-dimensional Array (the number of columns).

Search optimization problem

Suppose you have a list of 2D points with an orientation assigned to them. Let the set S be defined as:
S={ (x,y,a) | (x,y) is a 2D point, a is an orientation (an angle) }.
Given an element s of S, we will indicate with s_p the point part and with s_a the angle part. I would like to know if there exist an efficient data structure such that, given a query point q, is able to return all the elements s in S such that
(dist(q_p, s_p) < threshold_1) AND (angle_diff(q_a, s_a) < threshold_2) (1)
where dist(p1,p2), with p1,p2 2D points, is the euclidean distance, and angle_diff(a1,a2), with a1,a2 angles, is the difference between angles (taken to be the smallest one). The data structure should be efficient w.r.t. insertion/deletion of elements and the search as defined above. The number of vectors can grow up to 10.000 and more, but take this with a grain of salt.
Now suppose to change the above requirement: instead of using the condition (1), let's request all the elements of S such that, given a distance function d, we want all elements of S such that d(q,s) < threshold. If i remember well, this last setup is called range-search. I don't know if the first case can be transformed in the second.
For the distance search I believe the accepted best method is a Binary Space Partition tree. This can be stored as a series of bits. Each two bits (for a 2D tree) or three bits (for a 3D tree) subdivides the space one more level, increasing resolution.
Using a BSP, locating a set of objects to compare distances with is pretty easy. Just find the smallest set of squares or cubes which contain the edges of your distance box.
For the angle, I don't know of anything. I suppose that you could store each object in a second list or tree sorted by its angle. Then you would find every object at the proper distance using the BSP, every object at the proper angles using the angle tree, then do a set intersection.
You have effectively described a "three dimensional cyclindrical space", ie. a space that is locally three dimensional but where one dimension is topologically cyclic. In other words, it is locally flat and may be modeled as the boundary of a four-dimensional object C4 in (x, y, z, w) defined by
z^2 + w^2 = 1
where
a = arctan(w/z)
With this model, the space defined by your constraints is a 2-dimensional cylinder wrapped "lengthwise" around a cross section wedge, where the wedge wraps around the 4-d cylindrical space with an angle of 2 * threshold_2. This can be modeled using a "modified k-d tree" approach (modified 3-d tree), where the data structure is not a tree but actually a graph (it has cycles). You can still partition this space into cells with hyperplane separation, but traveling along the curve defined by (z, w) in the positive direction may encounter a point encountered in the negative direction. The tree should be modified to actually lead to these nodes from both directions, so that the edges are bidirectional (in the z-w curve direction - the others are obviously still unidirectional).
These cycles do not change the effectiveness of the data structure in locating nearby points or allowing your constraint search. In fact, for the most part, those algorithms are only slightly modified (the simplest approach being to hold a visited node data structure to prevent cycles in the search - you test the next neighbors about to be searched).
This will work especially well for your criteria, since the region you define is effectively bounded by these axis-defined hyperplane-bounded cells of a k-d tree, and so the search termination will leave a region on average populated around pi / 4 percent of the area.