All pairs maximally disjoint paths algorithm - optimization

Suppose that there are multiple source destination pairs in an undirected graph. I want to generate disjoint paths for multiple pairs. What would be the complexity of such problem? Is there any polynomial heuristic for finding edge-disjoint paths for these pairs? (i.e. path between s1 and d1 should not have any common edges with the path between s2 and d2)

This looks like a variant of the multi-commodity flow problem: http://en.wikipedia.org/wiki/Multi-commodity_flow_problem
Treat each source/sink pair as a new commodity, and give your edges unit weights to enforce disjoint paths. Now search the literature for approximations to this class of MCFP with unit capacities.

Your problem is NP-hard, even for the case of two sources and two sinks. It becomes polynomially solvable if you stop caring which source matches with which sink.

Related

In CGAL, can one convert a triangulation in more than three dimensions to a polytope?

If this question would be more appropriate on a related site, let me know, and I'd be happy to move it.
I have 165 vertices in ℤ11, all of which are at a distance of √8 from the origin and are extreme points on their corresponding convex hull. CGAL is able to calculate their d-dimensional triangulation in only 133 minutes on my laptop using just under a gigabyte of RAM.
Magma manages a similar 66 vertex case quite quickly, and, crucially for my application, it returns an actual polytope instead of a triangulation. Thus, I can view each d-dimensional face as a single object which can be bounded by an arbitrary number of vertices.
Additionally, although less essential to my application, I can also use Graph : TorPol -> GrphUnd to calculate all the topological information regarding how those faces are connected, and then AutomorphismGroup : Grph -> GrpPerm, ... to find the corresponding automorphism group of that cell structure.
Unfortunately, when applied to the original polytope, Magma's AutomorphismGroup : TorPol -> GrpMat only returns subgroups of GLd(ℤ), instead of the full automorphism group G, which is what I'm truly hoping to calculate. As a matrix group, G ∉ GL11(ℤ), but is instead ∈ GL11(𝔸), where 𝔸 represents the algebraic numbers. In general, I won't need the full algebraic closure of the rationals, ℚ̅, but just some field extension. However, I could make use of any non-trivially powerful representation of G.
With two days of calculation, Magma can manage the 165 vertex case, but is only able to provide information about the polytope's original 165 vertices, 10-facets, and volume. However, attempting to enumerate the d-faces, for any 2 ≤ d < 10, quickly consumes the 256 GB of RAM I have at my disposal.
CGAL's triangulation, on the other hand, only calculates collections of d-simplices, all of which have d + 1 vertices. It seems possible to derive the same facial information from such a triangulation, but I haven't thought of an easy way to code that up.
Am I missing something obvious in CGAL? Do you have any suggestions for alternative ways to calculate the polytope's face information, or to find the full automorphism group of my set of points?
You can use the package Combinatorial maps in CGAL, that is able to represent polytopes in nD. A combinatorial map describes all cells and all incidence and adjacency relations between the cells.
In this package, there is an undocumented method are_cc_isomorphic allowing to test if an isomorphism exist from two starting points. I think you can use this method from all possible pair of starting points to find all automorphisms.
Unfortunatly, there is no method to build a combinatorial map from a dD triangulation. Such method exists in 3D (cf. this file). It can be extended in dD.

Remove unnecessary pairs from reflexive asymetric transitive relation

I have a reflexive asymetric transitive relation represented as an nxn sparse scipy csr matrix.
Now as a result of some transformations I am left with many 'unnecessary' pairs:
set([('a','b'),('b','c'),('a','c')])
I need to remove pairs ('a','c') that can be seen as 'direct' edges when there are 'indirect' ones.
First I was thinking this is a special spanning arborescence, but actually in the following case:
set([('a','b'),('b','d'),('a','c'),('c','d')])
... no pair should be removed. The result is not necessarily a tree.
Is there a name for this kind of problem?
Is there an implementation in scipy?
If not, can you suggest an efficient algorithm in python/numpy/scipy?
EDIT: Seems like this is called a transitive reduction?
But there is no scipy.sparse.csgraph implementation?
EDIT: I guess to get an acyclic directed graph I would have to (temporarily) remove the 'reflexiveness', but this is not a problem.
So the problem is called Transitive Reduction of a directed acyclic graph.
The following code should solve the problem, although it might be far from optimal:
def transitive_reduction(edges): # edges is irreflexive and scipy sparse bool
reduction = edges.copy()
num, i = 99,2
while num > 0:
new = edges**i
num = len(new.nonzero()[0])
reduction = reduction > new
i += 1
reduction.eliminate_zeros() # might or might not be required
return reduction
Explanation: As long as paths of this length exists, we remove from reduction all direct paths for which there are indirect paths of length i.
Credits to #PaulPanzer.

Multiple trained models vs Multple features and one model

I'm trying to build a regression based M/L model using tensorflow.
I am trying to estimate an object's ETA based on the following:
distance from target
distance from target (X component)
distance from target (Y component)
speed
The object travels on specific journeys. This could be represented as from A->B or from A->C or from D->F (POINT 1 -> POINT 2). There are 500 specific journeys (between a set of points).
These journeys aren't completely straight lines, and every journey is different (ie. the shape of the route taken).
I have two ways of getting around this problem:
I can have 500 different models with 4 features and one label(the training ETA data).
I can have 1 model with 5 features and one label.
My dilemma is that if I use option 1, that's added complexity, but will be more accurate as every model will be specific to each journey.
If I use option 2, the model will be pretty simple, but I don't know if it would work properly. The new feature that I would add are originCode+ destinationCode. Unfortunately these are not quantifiable in order to make any numerical sense or pattern - they're just text that define the journey (journey A->B, and the feature would be 'AB').
Is there some way that I can use one model, and categorize the features so that one feature is just a 'grouping' feature (in order separate the training data with respect to the journey.
In ML, I believe that option 2 is generally the better option. We prefer general models rather than tailoring many models to specific tasks, as that gets dangerously close to hardcoding, which is what we're trying to get away from by using ML!
I think that, depending on the training data you have available, and the model size, a one-hot vector could be used to describe the starting/end points for the model. Eg, say we have 5 points (ABCDE), and we are going from position B to position C, this could be represented by the vector:
0100000100
as in, the first five values correspond to the origin spot whereas the second five are the destination. It is also possible to combine these if you want to reduce your input feature space to:
01100
There are other things to consider, as Scott has said in the comments:
How much data do you have? Maybe the feature space will be too big this way, I can't be sure. If you have enough data, then the model will intuitively learn the general distances (not actually, but intrinsically in the data) between datapoints.
If you have enough data, you might even be able to accurately predict between two points you don't have data for!
If it does come down to not having enough data, then finding representative features of the journey will come into use, ie. length of journey, shape of the journey, elevation travelled etc. Also a metric for distance travelled from the origin could be useful.
Best of luck!
I would be inclined to lean toward individual models. This is because, for a given position along a given route and a constant speed, the ETA is a deterministic function of time. If one moves monotonically closer to the target along the route, it is also a deterministic function of distance to target. Thus, there is no information to transfer from one route to the next, i.e. "lumping" their parameters offers no a priori benefit. This is assuming, of course, that you have several "trips" worth of data along each route (i.e. (distance, speed) collected once per minute, or some such). If you have only, say, one datum per route then lumping the parameters is a must. However, in such a low-data scenario, I believe that including a dummy variable for "which route" would ultimately be fruitless, since that would introduce a number of parameters that rivals the size of your dataset.
As a side note, NEITHER of the models you describe could handle new routes. I would be inclined to build an individual model per route, data quantity permitting, and a single model neglecting the route identity entirely just for handling new routes, until sufficient data is available to build a model for that route.

JGraphT: getting ranked solutions from e.g. KuhnMunkresMinimalWeightBipartitePerfectMatching

I've defined a bipartite graph with two equal partitions. Let's say the vertices are:
(T0, T1, T2, Z0, Z1, Z2),
the partitions are (T0, T1, T2) and (Z0, Z1, Z2). All vertices of partition "T" are connected to all vertices of partition "Z" through weighted edges.
I'm using the KuhnMunkresMinimalWeightBipartitePerfectMatching class to find the optimum assignment. It correctly gives me the best assignment possible.
My question is: can I ask for several solutions, ranked in order of cost?
I've explored two options so far that did not work:
first, I set the weight of the all the edges involved in the first assignment to a high value, effectively making that solution more expensive then other solutions. However, this excludes other feasible solutions.
second, I removed the edge from the graph after it had been used in the first solution. But then the graph is not necessarily bipartite any more. Besides that, it again removes edges that could be a part of a different solution.
Is there perhaps a different class I should be using? Or some way to extract several solutions from the one I use? I'm just starting out with JGraphT, any help is appreciated.

Search optimization problem

Suppose you have a list of 2D points with an orientation assigned to them. Let the set S be defined as:
S={ (x,y,a) | (x,y) is a 2D point, a is an orientation (an angle) }.
Given an element s of S, we will indicate with s_p the point part and with s_a the angle part. I would like to know if there exist an efficient data structure such that, given a query point q, is able to return all the elements s in S such that
(dist(q_p, s_p) < threshold_1) AND (angle_diff(q_a, s_a) < threshold_2) (1)
where dist(p1,p2), with p1,p2 2D points, is the euclidean distance, and angle_diff(a1,a2), with a1,a2 angles, is the difference between angles (taken to be the smallest one). The data structure should be efficient w.r.t. insertion/deletion of elements and the search as defined above. The number of vectors can grow up to 10.000 and more, but take this with a grain of salt.
Now suppose to change the above requirement: instead of using the condition (1), let's request all the elements of S such that, given a distance function d, we want all elements of S such that d(q,s) < threshold. If i remember well, this last setup is called range-search. I don't know if the first case can be transformed in the second.
For the distance search I believe the accepted best method is a Binary Space Partition tree. This can be stored as a series of bits. Each two bits (for a 2D tree) or three bits (for a 3D tree) subdivides the space one more level, increasing resolution.
Using a BSP, locating a set of objects to compare distances with is pretty easy. Just find the smallest set of squares or cubes which contain the edges of your distance box.
For the angle, I don't know of anything. I suppose that you could store each object in a second list or tree sorted by its angle. Then you would find every object at the proper distance using the BSP, every object at the proper angles using the angle tree, then do a set intersection.
You have effectively described a "three dimensional cyclindrical space", ie. a space that is locally three dimensional but where one dimension is topologically cyclic. In other words, it is locally flat and may be modeled as the boundary of a four-dimensional object C4 in (x, y, z, w) defined by
z^2 + w^2 = 1
where
a = arctan(w/z)
With this model, the space defined by your constraints is a 2-dimensional cylinder wrapped "lengthwise" around a cross section wedge, where the wedge wraps around the 4-d cylindrical space with an angle of 2 * threshold_2. This can be modeled using a "modified k-d tree" approach (modified 3-d tree), where the data structure is not a tree but actually a graph (it has cycles). You can still partition this space into cells with hyperplane separation, but traveling along the curve defined by (z, w) in the positive direction may encounter a point encountered in the negative direction. The tree should be modified to actually lead to these nodes from both directions, so that the edges are bidirectional (in the z-w curve direction - the others are obviously still unidirectional).
These cycles do not change the effectiveness of the data structure in locating nearby points or allowing your constraint search. In fact, for the most part, those algorithms are only slightly modified (the simplest approach being to hold a visited node data structure to prevent cycles in the search - you test the next neighbors about to be searched).
This will work especially well for your criteria, since the region you define is effectively bounded by these axis-defined hyperplane-bounded cells of a k-d tree, and so the search termination will leave a region on average populated around pi / 4 percent of the area.