Minimal cut/maximum flow in a directed graph

I have a directed graph.
First, I used the Ford-Fulkerson algorithm to increase the flow of the network. When I marked the vertices, I saw that the flow on the path s->a->b->d->t could be increased by one, so the graph changed accordingly.
I know that when searching for the maximum flow, you need to add up the capacities of all edges that cross from the minimal cut to the rest of the graph.
My minimal cut contains the vertices s, a, c, so when I added up the crossing edges I got c(G, !G) = 3 + 2 + 2 + 1 = 8, which is a lot bigger than the flow into t, which is 5.
What am I doing wrong? Have I messed up Ford-Fulkerson or the minimal cut?

Your minimum cut is not s, a, c, but s, a, b, c. Its capacity is 5, which is exactly the maximum flow that you calculated.
You can find the minimum cut by using the definition of the residual network. Recall that Ford-Fulkerson terminates when there is no path from s to t in the residual network.
The minimum cut (S, T) is then defined by
S = { v | there exists a path from s to v in the residual network }, with T containing all remaining vertices.
In your graph, the node b is reachable in the residual network: the flow of 3 on the edge b->c creates a residual edge c->b, and c is already reachable from s.
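For illustration, here is a minimal Python sketch of that recipe: run Ford-Fulkerson (Edmonds-Karp variant), then collect every vertex still reachable from s in the residual network. The graph at the bottom is a made-up example, not the one from the question:

from collections import deque

def max_flow_min_cut(cap, s, t):
    # Edmonds-Karp style Ford-Fulkerson; cap[u][v] is the capacity of u->v.
    # Returns the max-flow value and the source side S of a minimum cut.
    res = {u: dict(edges) for u, edges in cap.items()}   # residual capacities
    for u in list(cap):
        for v in cap[u]:
            res.setdefault(v, {}).setdefault(u, 0)       # reverse edges start at 0
    flow = 0
    while True:
        # BFS for an augmenting path s -> t in the residual network
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:       # no augmenting path left: flow is maximal
            break
        # push the bottleneck capacity along the path found
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        aug = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= aug
            res[v][u] += aug
        flow += aug
    # S = vertices still reachable from s in the final residual network
    return flow, set(parent)

# A small hypothetical graph (not the one from the question):
cap = {"s": {"a": 3, "b": 2}, "a": {"b": 1, "t": 2}, "b": {"t": 3}}
print(max_flow_min_cut(cap, "s", "t"))   # (5, {'s'})

On the question's graph, the returned set would be {s, a, b, c}, because the residual edge c->b keeps b reachable from s.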

Related

Getting 2 values of focal length when finding Intrinsic camera matrix (F not Fx,Fy)?

The following image is the example that was given in my computer vision class. Now I can't understand why we are getting 2 unique values of f. I can understand that mx·f and my·f may differ, but shouldn't the focal length f itself be the same?
I believe you have an Fx and an Fy. This is so that the matrix can scale f independently in the x and y directions. IIRC this is why you get 2 values of f.
If a single f is really wanted, it should be modeled that way in the camera model used for calibration:
e.g. give mx and my as constants to the camera model, and estimate f alone.
However, the calibration process that produced that K was probably not done that way, but treated the two elements (K(0,0) and K(1,1)) independently.
In other words, mx and my were effectively estimated as well, as a way of dealing with the aspect ratio.
That estimate is not the same as the values of mx and my calculated from the sensor specifications.
This is why you got 2 values.
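To make the relationship concrete, here is a small numeric sketch assuming the standard pinhole intrinsic matrix with fx = mx·f and fy = my·f; all numbers are made up:

import numpy as np

# Standard pinhole intrinsics: fx = mx * f and fy = my * f, where mx and my
# are the sensor's pixels-per-unit-length in x and y (all numbers made up).
f = 4.0e-3                 # focal length in metres (assumed)
mx, my = 2.0e5, 2.1e5      # pixels per metre; unequal means non-square pixels
cx, cy = 320.0, 240.0      # principal point in pixels (assumed)

K = np.array([[mx * f, 0.0,    cx],
              [0.0,    my * f, cy],
              [0.0,    0.0,    1.0]])

# Calibration returns K; dividing out mx and my (if known) recovers two
# estimates of f, which agree only if mx and my match the sensor spec.
fx, fy = K[0, 0], K[1, 1]
print(fx / mx, fy / my)    # both 4.0e-3 here; real calibrations usually differ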

Graph Between Steiner Tree and Complete Graph

Given a set of points P in the plane, and given a threshold t, I'd like to compute a connected graph G that minimizes the sum of the lengths of its edges, subject to the following constraints:
The vertices of G contain all the points in P.
For every pair of points u and v in P, their distance in G is no greater than t times their Euclidean distance.
When t=1, this problem is solved by constructing a complete graph on P. When t is infinite (or simply large enough), this problem is the Euclidean Steiner Tree Problem.
If there is already a name for this problem, I'm curious what it is. More than that, does anyone have any suggestions for how to design an algorithm for it? Since it contains the Euclidean Steiner tree problem as a special case, it can't be any easier, so I'm not looking for anything particularly time-efficient. Thanks!
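For experimentation, here is a hedged Python sketch (using networkx; the helper name satisfies_stretch is mine, not standard) that only checks constraint 2 for a candidate graph. It is a feasibility test, not an optimizer:

import math
import networkx as nx

def satisfies_stretch(G, P, t):
    # Check constraint 2: the graph distance between every pair of points in P
    # is at most t times their Euclidean distance. Assumes every point of P
    # is a node of G and edge weights are Euclidean lengths.
    dist = dict(nx.all_pairs_dijkstra_path_length(G, weight="weight"))
    for i, u in enumerate(P):
        for v in P[i + 1:]:
            if dist[u].get(v, math.inf) > t * math.dist(u, v):
                return False
    return True

# e.g. the complete graph on P satisfies the constraint for any t >= 1:
P = [(0, 0), (1, 0), (0, 1)]
G = nx.Graph()
for i, u in enumerate(P):
    for v in P[i + 1:]:
        G.add_edge(u, v, weight=math.dist(u, v))
print(satisfies_stretch(G, P, 1.0))   # True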

Running a logistic model in JAGS - Can you vectorize instead of looping over individual cases?

I'm fairly new to JAGS, so this may be a dumb question. I'm trying to run a model in JAGS that predicts the probability that a one-dimensional random walk process will cross boundary A before crossing boundary B. This model can be solved analytically via the following logistic model:
Pr(A,B) = 1/(1 + exp(-2 * (d/sigma) * theta))
where "d" is the mean drift rate (positive values indicate drift toward boundary A), "sigma" is the standard deviation of that drift rate and "theta" is the distance between the starting point and the boundary (assumed to be equal for both boundaries).
My dataset consists of 50 participants, who each provide 1800 observations. My model assumes that d is determined by a particular combination of observed environmental variables (which I'll just call 'x'), and a weighting coefficient that relates x to d (which I'll call 'beta'). Thus, there are three parameters: beta, sigma, and theta. I'd like to estimate a single set of parameters for each participant. My intention is to eventually run a hierarchical model, where group level parameters influence individual level parameters. However, for simplicity, here I will just consider a model in which I estimate a single set of parameters for one participant (and thus the model is not hierarchical).
My model in rjags would be as follows:
model{
  for ( i in 1:Ntotal ) {
    d[i] <- x[i] * beta
    probA[i] <- 1 / (1 + exp(-2 * (d[i] / sigma) * theta))
    y[i] ~ dbern(probA[i])
  }
  beta ~ dunif(-10, 10)
  sigma ~ dunif(0, 10)
  theta ~ dunif(0, 10)
}
This model runs fine, but takes ages to complete. I'm not sure how JAGS executes the code, but if this code were run in R it would be rather inefficient, because it would have to loop over cases, evaluating the model for each case individually. The time required would therefore grow rapidly with the sample size. I have a rather large sample, so this is a concern.
Is there a way to vectorise this code so that it calculates the likelihood for all of the data points at once? For example, if I were to fit this as a simple maximum-likelihood model, I would vectorise the model and calculate the probability of the data given particular parameter values for all 1800 cases provided by the participant (and thus would not need the for loop). I would then take the log of these likelihoods and sum them to give a single log-likelihood for all observations from the participant. This method yields enormous time savings. Is there a way to do this in JAGS?
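For comparison, the vectorised evaluation described above would look roughly like this in numpy; the data and parameter values are made up, and (as the answers below explain) this is not how JAGS actually evaluates a model:

import numpy as np

# Made-up data and parameter values, purely to show the vectorised evaluation.
x = np.random.normal(size=1800)              # environmental predictor
y = np.random.binomial(1, 0.5, size=1800)    # boundary-A outcomes (0/1)
beta, sigma, theta = 0.5, 1.0, 2.0

d = x * beta                                           # all 1800 drift rates at once
probA = 1.0 / (1.0 + np.exp(-2.0 * (d / sigma) * theta))
probA = np.clip(probA, 1e-12, 1 - 1e-12)               # guard against log(0)
loglik = np.sum(y * np.log(probA) + (1 - y) * np.log(1 - probA))
print(loglik)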
EDIT
Thanks for the responses, and for pointing out that the parameters in the model I showed might be unidentified. I should've pointed out that the model was a simplified version. The full model is below:
model{
  for ( i in 1:Ntotal ) {
    aExpectancy[i] <- 1 / (1 + exp(-gamma * (aTimeRemaining[i] - aDiscrepancy[i] * aExpectedLag[i])))
    bExpectancy[i] <- 1 / (1 + exp(-gamma * (bTimeRemaining[i] - bDiscrepancy[i] * bExpectedLag[i])))
    aUtility[i] <- aValence[i] * aExpectancy[i] / (1 + discount * aTimeRemaining[i])
    bUtility[i] <- bValence[i] * bExpectancy[i] / (1 + discount * bTimeRemaining[i])
    aMotivationalValueMean[i] <- aUtility[i] * aQualityMean[i]
    bMotivationalValueMean[i] <- bUtility[i] * bQualityMean[i]
    aMotivationalValueVariance[i] <- (aUtility[i] * aQualitySD[i])^2 + (bUtility[i] * bQualitySD[i])^2
    bMotivationalValueVariance[i] <- (aUtility[i] * aQualitySD[i])^2 + (bUtility[i] * bQualitySD[i])^2
    mvDiffVariance[i] <- aMotivationalValueVariance[i] + bMotivationalValueVariance[i]
    meanDrift[i] <- aMotivationalValueMean[i] - bMotivationalValueMean[i]
    probA[i] <- 1 / (1 + exp(-2 * (meanDrift[i] / sqrt(mvDiffVariance[i])) * theta))
    y[i] ~ dbern(probA[i])
  }
  theta ~ dunif(0, 10)
  discount ~ dunif(0, 10)
  gamma ~ dunif(0, 1)
}
In this model, the estimated parameters are theta, discount, and gamma, and these parameters can be recovered. When I run the model on the observations for a single participant (Ntotal = 1800), it takes about 5 minutes, which is totally fine. However, when I run it on the entire sample (45 participants x 1800 cases each = 81,000 observations), it has been running for 24 hours and is less than 50% of the way through. This seems odd, as I would expect it to take roughly 45 times as long, so 4 or 5 hours at most. Am I missing something?
I hope I'm not misreading this situation (and I apologize in advance if I am), but your question seems to come from a conceptual misunderstanding of how JAGS works (or WinBUGS or OpenBUGS, for that matter).
Your program does not actually run, because what you wrote is not written in a procedural programming language. So vectorising will not help.
What you wrote is just a description of your model, because the JAGS language is a declarative one.
Once JAGS reads your model, it assembles a transition kernel for an MCMC sampler whose stationary distribution is the posterior distribution of your parameters given your (observed) data. JAGS does nothing else with your program.
All the time you spent waiting for the program to run was actually spent waiting (and hoping) for your MCMC chain to converge.
So what is making your run take so long is most likely that the resulting sampler mixes poorly, or something along those lines.
That is why vectorising a program that is read only once will be of very little help.
So, your problem lies somewhere else.
I hope this helps and, if not, sorry.
All the best.
You can't vectorise in the same way that you would in R, but if you can group observations that share the same probability expression (i.e. a common d[i]), then you can use a binomial rather than a Bernoulli distribution, which will help enormously. If each observation has a unique d[i], then you are stuck, I'm afraid.
Another alternative is to look at Stan, which is generally faster for large data sets like yours.
Matt
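To illustrate the grouping Matt suggests, here is a hedged Python sketch of the implied data preparation: collapse Bernoulli rows that share the same x (and hence the same d[i] and probA[i]) into binomial counts; on the JAGS side the likelihood line would then read k[g] ~ dbin(probA[g], n[g]). The variable names are illustrative:

import numpy as np

# Illustrative rows: collapse Bernoulli observations that share the same x
# (and hence the same d[i] and probA[i]) into binomial counts.
x = np.array([0.2, 0.2, 0.5, 0.5, 0.5, 0.2])
y = np.array([1, 0, 1, 1, 0, 1])

levels, group = np.unique(x, return_inverse=True)
n = np.bincount(group)                  # trials per distinct x value
k = np.bincount(group, weights=y)       # successes per distinct x value

# JAGS data would then be list(x = levels, n = n, k = k, Ng = len(levels)),
# and the likelihood becomes k[g] ~ dbin(probA[g], n[g]) for g in 1:Ng.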

Within-cluster sum of squares of the next iteration is bigger than the previous one when K-means is applied to SURF features?

I am using the K-means algorithm to cluster SIFT descriptor vectors.
My K-means skeleton is:
choose the first K points of the data as the initial centers
do {
    // assign each point to its nearest cluster center
    data assignment;
    // recompute the center of each cluster:
    // each point is multi-dimensional data, e.g. (x1, x2, x3, ...),
    // and the center is the average over each dimension,
    // e.g. ((x1 + y1 + ...)/n, (x2 + y2 + ...)/n, ...)
    new centroid;
    // count how many points changed cluster in this pass
    sum total memberShipChanged;
} while (some point still changes its membership, i.e. total memberShipChanged != 0)
We all know that K-means aims to minimize the within-cluster sum of squares, illustrated in the snapshot below.
We can use a do-while iteration to reach that target. Now I prove why the within-cluster sum of squares gets smaller after every iteration.
Proof:
For simplicity, I only consider the two steps of a single iteration.
After the data-assignment step, every descriptor vector is assigned to its new nearest cluster center, so the within-cluster sum of squares decreases in this step. After all, the objective is the sum, over all vectors, of the squared distance to the assigned center; if every vector switches to its own nearest center, there is no doubt that the sum decreases.
In the new-centroid step, I use the arithmetic mean to compute each new center vector, and since the mean minimizes the sum of squared distances within a cluster, the local sum of each cluster must also decrease.
So the within-cluster sum of squares decreases twice in one iteration. After several iterations, no descriptor vector changes its membership anymore, and the within-cluster sum of squares reaches a local minimum.
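As a sanity check on this argument, here is a small numpy sketch on random toy data (not SURF descriptors) that prints the within-cluster sum of squares after each of the two steps; measured this way, the value never increases:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))            # toy data standing in for descriptors
K = 4
centers = X[:K].copy()                    # "first K points" initialization

def wcss(X, centers, labels):
    # within-cluster sum of squares under the given assignment
    return float(((X - centers[labels]) ** 2).sum())

for it in range(5):
    # assignment step: each point moves to its nearest center,
    # which can only lower (or keep) the objective
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    print(it, "after assignment:", wcss(X, centers, labels))
    # update step: the mean minimizes the sum of squared distances
    # within each cluster, so the objective again cannot increase
    for k in range(K):
        if (labels == k).any():
            centers[k] = X[labels == k].mean(axis=0)
    print(it, "after update:   ", wcss(X, centers, labels))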
===============================================================================
Now my question comes:
My SURF data is derived from 3000 images, every descriptor vector is 128-dimensional, and there are 1,296,672 vectors in total. In my code, I print
1) the number of vectors in each cluster
2) the total memberShipChanged in one iteration
3) the within-cluster sum of squares before each iteration.
Here is the output:
sum of square : 8246977014860
90504 228516 429755 266828 1653711 398631 193081 240072
memberShipChanged : 3501098
sum of square : 4462579627000
244521 284626 448700 228211 1361902 303864 317464 311810
memberShipChanged : 975442
sum of square : 4561378972772
323746 457785 388988 228431 993328 304606 473668 330546
memberShipChanged : 828709
sum of square : 4678353976030
359537 480818 346767 222646 789858 332876 612672 355924
memberShipChanged : 563256
......
I have listed only the first 4 iterations of the output. From it, we can see that after the first iteration the within-cluster sum of squares really does decrease, from 8246977014860 to 4462579627000. But the later iterations are of almost no use in minimizing it further (it even increases), although we can still observe memberShipChanged converging. I don't know why this happens. It looks like the first K-means iteration is overwhelmingly important.
Besides, what should I set as the new center coordinate of an empty cluster while memberShipChanged has not yet converged to 0? Right now I use (0, 0, 0, 0, 0, ...). But is this correct? Perhaps the within-cluster sum of squares increases because of it.

On what does the time complexity of graph algorithms depend?

I stumbled over this question in my textbook:
"In general, on what does the time complexity of Prim's, Kruskal's and Dijkstra's algorithms depends on?"
a. The number of vertices in the graph.
b. The number of edges in the graph.
c. Both, on the number of vertices and edges in the graph.
Explain your choice.
According to Wikipedia, the worst-case time complexities of Prim's, Kruskal's and Dijkstra's algorithms are O(E log V), O(E log V) and O(E + V log V), respectively. So I guess the answer is c? But why?
I don't know about Prim's and Kruskal's, and I might be wrong about Dijkstra's, but I think in its case the answer would be (a), because:
Dijkstra's will visit nodes along the shortest known path until it finds the destination.
This implies that if two edges point to the same node, only one will ever be followed by the algorithm, since one has a higher weight or they're equal, rendering the other edge moot to follow.
Therefore, the only way to increase the time spent traversing the graph by adding edges is to add nodes (adding edges to existing nodes can change the algorithm's traversal time, but not in proportion to the number of edges, only to their weights).
Therefore, my intuition is that only the number of nodes is in direct relation with the running time. The Wikipedia page on Dijkstra's algorithm seems to confirm this:

"The simplest implementation of the Dijkstra's algorithm stores vertices of set Q in an ordinary linked list or array, and extract minimum from Q is simply a linear search through all vertices in Q. In this case, the running time is O(E + V^2) or O(V^2)."

This is only an intuition, of course, and cs.stackexchange might be of greater use.
The answer is (c), because both V and E contribute to the asymptotic complexity of the respective algorithms. On further analysis one could argue that V matters less in Kruskal's and Prim's (since it appears only inside a log factor), but E carries roughly the same weight in all three cases.
Also, note that |E| <= |V|^2 always holds (for simple graphs).
In the worst case the graph is a complete graph, i.e. it has v(v-1)/2 edges, so e >> v and e ~ v^2.
The time complexities of Prim's and Dijkstra's algorithms are:
1. With an adjacency list and a priority queue: O((v + e) log v) in the worst case; since e >> v, this is O(e log v).
2. With a matrix and a priority queue: O(v^2 + e log v); in the worst case e ~ v^2, so O(v^2 + e log v) ~ O(e + e log v) ~ O(e log v).
3. As the graph gets denser (the worst case being a complete graph), we use a Fibonacci heap and an adjacency list: O(e + v log v).
The time complexity of Kruskal's is O(e log e). In the worst case e ~ v^2,
so log(v^2) = 2 log v,
and we can safely say that O(e log e) is O(2e log v),
i.e. O(e log v) in the worst case.
As you said, the time complexities of O(E log V), O(E log V), and O(E + V log V) mean that each one is dependent on both E and V. This is because each algorithm involves considering the edges and their respective weights in a graph. Since for Prim's and Kruskal's the MST has to be connected and include all vertices, and for Dijkstra's the shortest path has to pass from one vertex to another through other intermediary vertices, the vertices also have to be considered in each algorithm.
For example, with Dijkstra’s algorithm, you are essentially looking to add edges that are both low in cost and that connect vertices that will eventually provide a path from the starting vertex to the ending vertex. To find the shortest path, you cannot solely look for a path that connects the start vertex to the end, and you cannot solely look for the smallest weighted edges, you need to consider both. Since you are considering both edges and vertices, the time it takes to make these considerations throughout the algorithm will depend on the number of edges and number of vertices.
Additionally, different time complexities are possible through different implementations of the three algorithms, and analyzing each algorithm requires a consideration of both E and V.
For example, Prim's algorithm is O(V^2), but can be improved with the use of a min-heap-based priority queue to achieve the complexity you found: O(E log V). O(E log V) may seem like the faster algorithm, but that's not always the case: E can be as large as V^2, so in dense graphs with close to V^2 edges, O(E log V) becomes O(V^2 log V), which is worse than O(V^2). And if V is very small, then there is not much difference between O(V^2) and O(E log V). E and V also influence the running time through the way the graph is stored. For example, an adjacency list becomes very inefficient for dense graphs (with E approaching V^2) because checking whether an edge exists goes from close to O(1) to O(V).
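To make the dependence on both E and V concrete, here is a minimal binary-heap Dijkstra sketch in Python, with the V-dependent and E-dependent costs marked in comments; the example graph is made up:

import heapq

def dijkstra(adj, s):
    # Binary-heap Dijkstra. adj[u] is a list of (v, w) pairs.
    # Each vertex is popped once and each edge relaxed once,
    # giving O((V + E) log V) overall.
    dist = {s: 0}
    heap = [(0, s)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)             # up to V useful pops, each O(log V)
        if u in done:
            continue
        done.add(u)
        for v, w in adj.get(u, []):            # every edge examined once in total
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))  # up to E pushes, each O(log V)
    return dist

# tiny example: both more vertices and more edges add work
adj = {"s": [("a", 2), ("b", 5)], "a": [("b", 1), ("t", 4)], "b": [("t", 1)]}
print(dijkstra(adj, "s"))   # {'s': 0, 'a': 2, 'b': 3, 't': 4}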