How to find all possible paths from one RDF node to another? Any optimization for backtracking search? - sparql

In our knowledge graph (KG), the RDF triples look like this:
input1 <---- f(x) ----> output1 <---- f(x) ----> output2 <---- f(x) ----> output3
My goal is to find all possible paths from input1 to output3. We start from input1, and each time we try to find f(x) and its output for the given input. In SPARQL:
?function has_input input . ?function has_output ?output .
When we get some outputs, they become the inputs for the next query, and so on until we reach the goal output (output3).
As I am looking for all possible paths, I implemented a backtracking algorithm that considers all nodes and finds all path combinations.
Problem:
It takes too long to find all possible paths in the worst-case scenario. Is there any Semantic Web solution for this, or any optimization technique that reduces the backtracking time?
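For what it's worth, if the store supports SPARQL 1.1, the per-hop query above can be collapsed into a property path, letting the engine do the multi-hop traversal (or at least a reachability pre-check that prunes the backtracking). Below is a minimal sketch with rdflib, assuming the predicate names has_input/has_output from the question and an invented ex: namespace; note that property paths report reachable endpoints, not the individual paths themselves, so they complement rather than replace the path enumeration:
# Sketch: reachability check via a SPARQL 1.1 property path (rdflib).
# Predicate and node names (ex:has_input etc.) are assumptions.
from rdflib import Graph

g = Graph()
g.parse("kg.ttl")  # hypothetical Turtle file holding the triples

q = """
PREFIX ex: <http://example.org/>
ASK {
  # ^has_input steps from an input back to its function, has_output steps
  # on to that function's output; '+' repeats the hop one or more times.
  ex:input1 (^ex:has_input/ex:has_output)+ ex:output3 .
}
"""
print(g.query(q).askAnswer)  # True if output3 is reachable from input1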

Related

Inference on several inputs in order to calculate the loss function

I am modeling a perceptual process in TensorFlow. In the setup I am interested in, the modeled agent plays a resource game: it has to choose 1 out of n resources, relying only on the label that a classifier gives to the resource. Each resource is an ordered pair of two reals. The classifier only sees the first real, but the payoffs depend on the second. There is a function taking the first to the second.
Anyway, ideally I'd like to train the classifier in the following way:
In each run, the classifier gives labels to the n resources.
The agent then gets the payoff of the resource corresponding to the highest label in some predetermined ranking (say, A > B > C > D), chosen randomly in case of a draw.
The loss is taken to be the normalized absolute difference between the payoff thus obtained and the maximum payoff in the set of resources. I.e., (Payoff_max - Payoff) / Payoff_max
For this to work, one needs to run inference n times, once for each resource, before calculating the loss. Is there a way to do this in TensorFlow? If I am tackling the problem in the wrong way, feel free to say so, too.
I don't have much knowledge of the ML aspects of this, but from a programming point of view, I can see doing it in two ways. One is by copying your model n times. All the copies can share the same variables. The output of all of these copies would go into some function that determines the highest label. As long as this function is differentiable, the variables are shared, and n is not too large, it should work. You would need to feed all n inputs together. Note that backprop will run through each copy and update your weights n times. This is generally not a problem, but if it is, I heard about some fancy tricks one can do by using partial_run.
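For concreteness, here is a minimal sketch of the copying approach in 1.x-era TensorFlow (which this answer assumes); the single linear layer and all shapes are stand-ins for the real classifier:
import tensorflow as tf  # 1.x-era API, matching the answer above

n, num_labels = 4, 4  # example sizes; the ranking A > B > C > D gives 4 labels

def classifier(x, reuse):
    # Stand-in model: a single linear layer. All copies build their
    # variables in the same scope, so the weights are shared.
    with tf.variable_scope("clf", reuse=reuse):
        w = tf.get_variable("w", [1, num_labels])
        b = tf.get_variable("b", [num_labels])
        return tf.matmul(x, w) + b

# One placeholder per resource; the classifier only sees the first real.
inputs = [tf.placeholder(tf.float32, [None, 1]) for _ in range(n)]
logits = [classifier(x, reuse=(i > 0)) for i, x in enumerate(inputs)]
# stacked has shape [n, batch, num_labels]; a differentiable choice
# function over it then selects the resource whose payoff enters the loss.
stacked = tf.stack(logits)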
Another way is to use tf.while_loop. It is pretty clever: it stores the activations from each run of the loop and can do backprop through them. The only tricky part should be accumulating the inference results before feeding them to your loss. Take a look at TensorArray for this. This question can be helpful: Using TensorArrays in the context of a while_loop to accumulate values
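And a hedged sketch of the tf.while_loop variant, again 1.x-era TensorFlow with a stand-in model; the TensorArray accumulates the per-resource inference results:
import tensorflow as tf  # 1.x-era API

n, num_labels = 4, 4
resources = tf.placeholder(tf.float32, [n, 1])  # first real of each resource

def model(x):
    # Stand-in classifier; AUTO_REUSE shares the variables across iterations.
    with tf.variable_scope("clf", reuse=tf.AUTO_REUSE):
        w = tf.get_variable("w", [1, num_labels])
        return tf.matmul(x, w)

def body(i, ta):
    out = model(resources[i:i + 1])      # inference for resource i
    return i + 1, ta.write(i, out[0])    # accumulate in the TensorArray

_, results = tf.while_loop(lambda i, ta: i < n, body,
                           [tf.constant(0), tf.TensorArray(tf.float32, size=n)])
all_logits = results.stack()  # shape [n, num_labels], ready for the loss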

Solving a Mixed Integer Quadratic Program using SCIP

I have a mixed-integer quadratic program (MIQP) which I would like to solve using SCIP. The program is such that, on fixing the integer variables, it turns into a linear program, and on fixing the continuous variables, it becomes an integer program. A simple example:
maximize    \sum_i n_i * f_i(x_i)
subject to  n_1 * x_1 + n_2 * x_2 < t
            n_3 * x_1 + n_2 * x_2 < m
            ... (and many further quadratic constraints in the n_i's and x_i's)
Here f_i is a concave piecewise linear function,
the x_i's are continuous variables (they take real values), and
the n_i's are integer variables.
I am able to solve the problem using SCIP, but on problems with a large number of variables SCIP takes a lot of time to find the solution. In particular, I have noticed that it does not find many primal solutions, so the rate at which the upper bound decreases is very slow. However, I could get better results by running set heuristics emphasis aggressive.
It would be great if anyone can guide me on the following questions :
1) Is there any particular algorithm/software package which solves problems that fit exactly into the model described above?
2) Suggestions on how to improve the rate at which primal solutions are found.
3) What type of branching can I use to get better results ?
4) Any guidance on improving performance would be really helpful.
I am okay with relaxing the integer constraints as well.
Thanks
1) The algorithm in SCIP should fit your problem. There are other software packages that implement similar algorithms, e.g., BARON and ANTIGONE.
2) Have a look which primal heuristics were successful in your run and change their parameters to run them more frequently.
3) No idea. Default should be ok.
4) Make sure that your variables have good bounds. Tighter bounds allow a tighter relaxation to be constructed (a short sketch of points 2 and 4 follows below).
If you can post an instance of your problem somewhere, or a log of a SCIP run, including the detailed statistics at the end, maybe someone can give more hints on what to improve.
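To make points 2) and 4) concrete, here is a small PySCIPOpt sketch; the bounds and variable names are invented for illustration, and the same heuristic emphasis is reachable in the interactive shell via set heuristics emphasis aggressive:
from pyscipopt import Model, SCIP_PARAMSETTING

m = Model("miqp")
# Tight, finite bounds on every variable help the relaxation (point 4).
# These particular bounds are made up for the example.
n1 = m.addVar(vtype="I", lb=0, ub=50, name="n_1")
x1 = m.addVar(vtype="C", lb=0.0, ub=10.0, name="x_1")
# ... objective and quadratic constraints would go here ...

# Run primal heuristics more aggressively (point 2).
m.setHeuristics(SCIP_PARAMSETTING.AGGRESSIVE)
m.optimize()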

Optimizing Python KD Tree Searches

SciPy (http://www.scipy.org/) offers two KD-tree classes: the KDTree and the cKDTree.
The cKDTree is much faster, but is less customizable and less query-able than the KDTree (as far as I can tell from the docs).
Here is my problem:
I have a list of 3 million 2-dimensional (X, Y) points. I need to return all of the points within a distance of X units from every point.
With the KDTree, there is an option to do just this: KDTree.query_ball_tree(). It generates a list of lists of all the points within X units of every other point. HOWEVER: this list is enormous (about 744 million items long) and quickly fills up my virtual memory.
Potential solution #1: Is there a way to parse this list into a text file as it is being written?
Potential solution #2: I have tried using a for loop (for every point in the list) and finding each single point's neighbors within X units by employing KDTree.query_ball_point(). HOWEVER: this takes forever, as it needs to run the query millions of times. Is there a cKDTree equivalent of this KDTree tool?
Potential solution #3: Beats me, anyone else have any ideas?
From SciPy 0.12 on, both KD-tree classes have feature parity. Quoting the release announcement:
cKDTree feature-complete: the Cython version of KDTree, cKDTree, is now feature-complete. Most operations (construction, query, query_ball_point, query_pairs, count_neighbors and sparse_distance_matrix) are between 200 and 1000 times faster in cKDTree than in KDTree. With very minor caveats, cKDTree has exactly the same interface as KDTree, and can be used as a drop-in replacement.
Try using KDTree.query_ball_point instead. It takes a single point, or an array of points, and produces the points within a given distance of the input point(s).
You can use this function to perform batch queries. Give it, say, 100000 points at a time, and then write the results out to a file. Something like this:
from scipy.spatial import cKDTree

tree = cKDTree(pts)  # pts: the (N, 2) array of points; X: the search radius
BATCH_SIZE = 100000
for i in xrange(0, len(pts), BATCH_SIZE):
    # query a bounded batch at a time so the result never outgrows memory
    neighbours = tree.query_ball_point(pts[i:i+BATCH_SIZE], X)
    # write neighbours to a file...

How to effectively use knn in Stata

I have two questions with executing discrim knn in Stata.
1) How do you properly code the command? I've tried various versions, but always seem to get an error that too many variables are specified.
The vector with the correct result is buy.
I am trying: discrim knn buy, group(train test) k(1)
2) My understanding was that factor (binary) variables were fine for KNN, even encouraged. However, I get the error message that factor variables and time-series operators are not allowed.
Lastly, though I know this isn't the best space for this question: should each vector be normalized for KNN? I've heard conflicting responses.
I'm guessing that the error you're getting is
group(): too many variables specified
This is because you can only group by one variable with knn. knn performs discriminant analysis based on a single grouping variable, in your case the one distinguishing the training set from the test set. I imagine your train and test variables are binary, in which case using only one of them is enough, as they are merely logical opposites of each other: a single variable has enough information to distinguish the two groups.
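As an aside on the normalization question, which generalizes beyond Stata: kNN is distance-based, so features on larger scales dominate the distances unless you standardize them, while binary dummies are usually left as 0/1. A small illustrative scikit-learn sketch, in which all data and names are stand-ins:
# Illustrative sketch outside Stata: train/test is a data split rather
# than a grouping variable, and continuous features are standardized
# before kNN so no single scale dominates.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.random((500, 3))                    # stand-in features
buy = (X[:, 0] + X[:, 1] > 1).astype(int)   # stand-in for the 'buy' outcome

X_train, X_test, y_train, y_test = train_test_split(X, buy, random_state=0)
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=1))
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))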

Weighted Bipartite Matching covering one partition

I have a problem here that I managed to reduce to a weighted bipartite matching problem. Basically, I have a bipartite graph with partitions A and B and a set of weighted edges. In my case, |A| ≈ 20 and |B| = 300.
I want to find a set of edges which minimizes the total weight AND covers A (each vertex in A is incident to an edge of the solution).
Questions:
- Is there a special name for this kind of problem, so I can look for algorithms and solutions?
- I know I can reduce it to a weighted bipartite perfect matching by adding dummy vertices to A with infinite weight. But I'm worried about practical performance, since |B| >> |A|.
- Any suggestions on Java libraries? I found this: http://algs4.cs.princeton.edu/code/. I think AssignmentProblem.java is almost what I need (but I guess it doesn't ensure a perfect matching?).
Thanks in advance, and sorry about the bad English.
a) weighted bipartite perfect matching, also known as the assignment problem
b) ???
c) the Floyd or Floyd-Warshall algorithm is your friend
I've found a C implementation on the web, and you can also use Edmonds's blossom algorithm.
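As a hedged illustration of b), the performance concern when |B| >> |A|: recent SciPy versions solve the rectangular assignment problem directly, covering every vertex of the smaller side without explicit dummy vertices (Python rather than the requested Java, but it shows the shape of the reduction; the sparsity pattern and names below are invented for the demo):
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
A, B = 20, 300
BIG = 1e9                         # stand-in cost for "no such edge"
cost = np.full((A, B), BIG)
edges = rng.random((A, B)) < 0.3  # assumed sparsity pattern, demo only
cost[edges] = rng.random((A, B))[edges]

rows, cols = linear_sum_assignment(cost)  # len(rows) == |A| == 20
print(cost[rows, cols].sum())             # minimum total weight covering A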