graph-tool Collect Vertex Marginals - pv Size

I was running
gt.mcmc_equilibrate(state, force_niter=300, mcmc_args=dict(niter=10),
callback=collect_vertex_marginals)
And I got a property map (let's call it pv) of the vertex marginals. pv gives an array for each vertex, say [0.0, 0.0, 0.0, 299.0], which I understand counts how many times the vertex was in each block (in this case, all counts are in block 3), so the vertex is assigned to block 3 as it has the highest probability of being there.
So... is it that the nth element in the array corresponds to the nth block?
I thought this was the case, but pv[some vertex] sometimes has an array that is smaller than the number of blocks.
So... how should I interpret the vertex_marginals property map?
Your help is very much appreciated...

The arrays are resized on demand to avoid unnecessary memory usage. For each nonexistent entry, you can assume that the corresponding value is zero.
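In other words, a short array like [0.0, 0.0, 0.0, 299.0] simply omits trailing blocks that were never visited. A minimal sketch of how one might expand and normalize such an entry (assuming pv is the collected property map and B is the total number of blocks; the helper name is illustrative):

import numpy as np

def block_probabilities(pv, v, B):
    # Expand pv[v] to length B and normalize to block-membership probabilities.
    counts = np.zeros(B)
    stored = np.asarray(pv[v])       # may be shorter than B
    counts[:len(stored)] = stored    # missing trailing entries are implicitly zero
    return counts / counts.sum()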


How to break down a mesh distance minimization problem?

I'm having trouble solving a problem using "Ceres" and I could use some help!
To simplify the problem:
Imagine I have a mesh "A" that I want to scale and rotate (with scale + rotation represented as variables!) to be as close to a mesh "B" as possible.
"Closeness" is defined as the sum of:
For each vertex in A, find the distance to the nearest vertex in B
For each vertex in B, find the distance to the nearest vertex in A
Now that first clause is super easy to model.
We can add a residual block for each vertex in "A", deform the vertex within the Cost Function using our variables, and work out their nearest neighbour in "B" - great.
But what about that second clause?
Do we have to do it in a single residual block? Deform every vertex in "A" in the residual block and compare to every vertex in "B"?
Do we create a residual block for every vertex in "B", and then have to re-deform every vertex in "A" within each residual block?
Is there something better that we can do to break down the problem?
Any help would be truly appreciated - I've spent hours thinking about it and I feel kinda stuck!
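For reference, a minimal NumPy/SciPy sketch of the symmetric "closeness" objective described above. This is just the cost being minimized, not a Ceres residual-block layout, and the scale/rotation parameterization is illustrative:

import numpy as np
from scipy.spatial import cKDTree

def symmetric_closeness(A, B, scale, R):
    # A, B: (N, 3) and (M, 3) vertex arrays; R: 3x3 rotation matrix.
    A_def = scale * (A @ R.T)            # deform every vertex of A
    d_ab = cKDTree(B).query(A_def)[0]    # clause 1: each vertex of A to its nearest vertex of B
    d_ba = cKDTree(A_def).query(B)[0]    # clause 2: each vertex of B to its nearest deformed vertex of A
    return np.sum(d_ab ** 2) + np.sum(d_ba ** 2)

The sketch makes the difficulty explicit: the second clause needs the entire deformed mesh "A", which is why it does not decompose into per-vertex residual blocks as naturally as the first clause.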

Fast check if polygon contains point between dataframes

I have two dataframes. One contains a column of Polygons, taken from an image of polygon shapes. Each polygon has a set of coordinates. This dataframe also has a "segment-id" column. I have another dataframe, containing a column of Points, also with coordinates. These Points represent pixels from the same image of Polygon shapes, and therefore have the same coordinate system. I want to give every Point the "segment-id" of the Polygon which contains it. Every Polygon contains at least one Point.
Currently, I achieve this by using a nested for loop:
for i, row in enumerate(point_df.itertuples(), 0):
    point = pixel_df.at[i, 'geometry']
    for j in range(len(polygon_df)):
        polygon = polygon_df.iat[j, 0]
        if polygon.contains(point):
            pixel_df.at[i, 'segment_id'] = polygon_df.at[j, 'segment_id']
This is extremely slow. For 100 Points, it takes around 10 seconds. I need a faster way of doing this. I have tried using apply but it is still super slow.
Hope someone can help me out, thanks very much.
For fast "is point inside polygon":
Preparation: in the code that obtains the data describing the polygons, use all the vertices to find the minimum and maximum y-coord and the minimum and maximum x-coord, and store those with the polygon's data.
1) Using the point's coords and the polygon's minimum and maximum x and y (pre-determined during preparation), do a "bounding box" test. This is just a fast way to find out if the point is definitely not inside the polygon (so you can skip the more expensive steps most of the time).
2) Set a "yes/no" flag to "no"
3) For each edge in the polygon, determine if a horizontal line passing through the point would intersect the edge, and if it does, determine the x-coord of the intersection. If the x-coord of the intersection is less than the point's x-coord, toggle (with NOT) the "yes/no" flag. Ignore "horizontal line passes through a vertex" cases during this step.
4) For each vertex, compare its y-coord with the point's y-coord. If they're the same, look at both edges coming from that vertex and check whether their other endpoints lie in the same y direction. If they do (the edges form a 'V' or upside-down 'V' shape), ignore the vertex. Otherwise (the edges form a '<' or '>' shape), if the vertex's x-coord is less than the point's x-coord, toggle the "yes/no" flag.
After all this is done; that "yes/no" flag will tell you if the point was in the polygon.
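A minimal pure-Python sketch of the above, assuming poly is a list of (x, y) vertex tuples and bbox is the precomputed (xmin, ymin, xmax, ymax). This variant uses the common half-open edge test, which handles the vertex cases of step 4 implicitly:

def point_in_polygon(px, py, poly, bbox):
    xmin, ymin, xmax, ymax = bbox
    if not (xmin <= px <= xmax and ymin <= py <= ymax):
        return False                      # step 1: bounding-box reject
    inside = False                        # step 2: the "yes/no" flag
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > py) != (y2 > py):        # step 3: horizontal ray crosses this edge
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if x_cross < px:
                inside = not inside       # toggle when the crossing is to the left
    return inside

For the dataframes in the question, the bounding-box test alone already rejects most polygon/point pairs before any expensive containment test is done.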

Within cluster sum of square of the next iteration is bigger than the previous when K-means is applied to SURF features?

I am using the K-Means algorithm to cluster SIFT descriptor vectors.
My K-Means skeleton is
choose first K points of the data as initial centers
do {
    // assign each point to the corresponding cluster
    data assignment;
    // get the recalculated center of each cluster
    // suppose a point is multi-dimensional data, e.g. (x1, x2, x3, ...);
    // the center is composed of the average value of each dimension,
    // e.g. ((x1 + y1 + ...)/n, (x2 + y2 + ...)/n, ...)
    new centroid;
    sum total memberShipChanged;
} while (some point still changes its membership, i.e. total memberShipChanged != 0)
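For concreteness, a minimal NumPy sketch of this skeleton (assuming data is an (N, D) array of descriptor vectors; this is not the poster's actual code):

import numpy as np

def kmeans(data, K, max_iter=100):
    centers = data[:K].astype(float)                # first K points as initial centers
    labels = np.full(len(data), -1)
    for _ in range(max_iter):
        # data assignment: each point goes to its nearest center
        dists = ((data - centers[:, None, :]) ** 2).sum(axis=2)   # shape (K, N)
        new_labels = dists.argmin(axis=0)
        membership_changed = int((new_labels != labels).sum())
        labels = new_labels
        wcss = dists[labels, np.arange(len(data))].sum()
        print("sum of square :", wcss, "memberShipChanged :", membership_changed)
        if membership_changed == 0:
            break
        # new centroid: arithmetic mean of the members of each cluster
        for k in range(K):
            members = data[labels == k]
            if len(members) > 0:                    # empty clusters are simply left unchanged here
                centers[k] = members.mean(axis=0)
    return centers, labels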
We all know that K-means aims to minimize the within-cluster sum of squares, i.e. the sum, over all clusters, of the squared distances between each member vector and its cluster center.
We can use a do-while iteration to reach that target. Now I prove why the within-cluster sum of squares gets smaller after every iteration.
Proof:
For simplicity, I only consider 2 iterations.
After the data assignment step, every descriptor vector has its new nearest cluster center, so the within-cluster sum of squares decreases (or at least does not increase) after this step. After all, the within-cluster sum of squares is the sum, over every vector, of its squared distance to its center; if every vector chooses its own nearest center, there is no doubt that the sum cannot grow.
In the new centroid step, I use the arithmetic mean to calculate the new center vector, and since the mean minimizes the sum of squared distances within a cluster, the local sum of each cluster must decrease (or at least not increase).
So the within-cluster sum of squares decreases twice in one iteration. After several iterations, no descriptor vector changes its membership any more, and the within-cluster sum of squares reaches a local minimum.
===============================================================================
Now my question comes:
My SURF data is derived from 3000 images, every descriptor vector is 128-dimensional, and there are 1,296,672 vectors in total. In my code, I print
1) the number of vectors in each cluster
2) the total memberShipChanged in one iteration
3) the within-cluster sum of squares before each iteration.
Here is the output:
sum of square : 8246977014860
90504 228516 429755 266828 1653711 398631 193081 240072
memberShipChanged : 3501098
sum of square : 4462579627000
244521 284626 448700 228211 1361902 303864 317464 311810
memberShipChanged : 975442
sum of square : 4561378972772
323746 457785 388988 228431 993328 304606 473668 330546
memberShipChanged : 828709
sum of square : 4678353976030
359537 480818 346767 222646 789858 332876 612672 355924
memberShipChanged : 563256
......
I only list the output of 4 iterations. From the output, we can see that after the first iteration the within-cluster sum of squares really does decrease, from 8246977014860 to 4462579627000. But the other iterations are of nearly no use in minimizing it (it even increases), even though memberShipChanged keeps converging. I don't know why this happens. It seems the first k-means iteration is overwhelmingly important.
Besides, what should I set as the new center coordinates of an empty cluster while memberShipChanged has not yet converged to 0? Right now I use (0, 0, 0, 0, 0, ...). But is this correct? Perhaps the within-cluster sum of squares increases because of it.

Looking for an efficient structure for checking which circles enclose a point

I have a large set of overlapping circles each at a random location with a specific radius.
type Circle =
    struct
        val x: float
        val y: float
        val radius: float
    end
Given a new point with type
type Point =
    struct
        val x: float
        val y: float
    end
I would like to know which circles in my set enclose the new point. A linear search is trivial. I'm looking for a structure that can hold the circles and return the enclosing circles with better than O(N) for the presented point.
Ideally the structure should be fast for insertion of new circles and removal of circles as well.
I would like to implement this in F# but ideas in any language are fine.
For your information I'm looking to implement
http://takisword.wordpress.com/2009/08/13/bowyerwatson-algorithm/
but it would be an O(N^2) if I use the naive approach of scanning all circles for every new point.
If we assume that circles are distributed over some rectangle with area 1 and the average area of a circle is a, then a quadtree with m levels will leave you with a cell of area 1/2^m. This leaves
O(Na/2^m)
as the expected number of circles left in the remaining cell.
However, we have done O(m) comparisons to get to that cell. This leaves the total number of comparisons as
O(m) + O(Na/2^m)
The second term becomes constant if m is proportional to log(N).
This suggests that a quadtree can cut things down to O(log N)
A quadtree is a structure for efficient search in the plane. You can use it to hold a subdivision of the plane.
For example, you can create a quadtree with the following properties:
1. Every cell of the quadtree contains the indices of the circles overlapping it.
2. Every cell contains no more than K circles (for example 10) // this bound may be broken at maximum height
3. The height of the tree is bounded by M (usually O(log n))
You can construct the quadtree by iterating over overlapped cells: if the number of circles inside a cell exceeds K, subdivide that cell into four (unless the maximum height would be exceeded). A cell that lies entirely inside a circle also needs special handling, because subdividing it is pointless.
To find the enclosing circles, locate the point's cell in the quadtree, then iterate through the circles overlapping that cell and keep those which contain the point.
In the case of a sparse circle distribution the search will be very efficient.
In my bachelor thesis I adapted a quadtree for closest-segment location with expected time O(log n); I think a similar approach could be used here.
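A minimal Python sketch of such a quadtree (illustrative, not production code; K and max_depth correspond to the limits described above, and circles are assumed to lie within the root cell):

class QuadNode:
    def __init__(self, x0, y0, x1, y1, depth=0, K=10, max_depth=12):
        self.x0, self.y0, self.x1, self.y1 = x0, y0, x1, y1
        self.depth, self.K, self.max_depth = depth, K, max_depth
        self.circles = []         # (cx, cy, r) of circles overlapping this cell
        self.children = None

    def _overlaps(self, c):
        # circle/cell overlap test: clamp the center into the cell and compare distances
        cx, cy, r = c
        nx = min(max(cx, self.x0), self.x1)
        ny = min(max(cy, self.y0), self.y1)
        return (nx - cx) ** 2 + (ny - cy) ** 2 <= r * r

    def insert(self, c):
        if not self._overlaps(c):
            return
        if self.children is not None:
            for ch in self.children:
                ch.insert(c)
            return
        self.circles.append(c)
        if len(self.circles) > self.K and self.depth < self.max_depth:
            mx, my = (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2
            self.children = [QuadNode(self.x0, self.y0, mx, my, self.depth + 1, self.K, self.max_depth),
                             QuadNode(mx, self.y0, self.x1, my, self.depth + 1, self.K, self.max_depth),
                             QuadNode(self.x0, my, mx, self.y1, self.depth + 1, self.K, self.max_depth),
                             QuadNode(mx, my, self.x1, self.y1, self.depth + 1, self.K, self.max_depth)]
            for old in self.circles:
                for ch in self.children:
                    ch.insert(old)
            self.circles = []

    def query(self, px, py):
        # descend to the leaf cell containing the point, then test only its circles
        if self.children is not None:
            for ch in self.children:
                if ch.x0 <= px <= ch.x1 and ch.y0 <= py <= ch.y1:
                    return ch.query(px, py)
            return []
        return [c for c in self.circles
                if (px - c[0]) ** 2 + (py - c[1]) ** 2 <= c[2] ** 2]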
Actually you search for triangles whose circumcircles include the new point p. Thus your Delaunay triangulation is already the data structure you need: First search for the triangle t which includes p (google for 'delaunay walk'). The circumcircle of t certainly includes p. Then start from t and grow the (connected) area of triangles whose circumcircles include p.
Implementing it in a fast and reliable way is a lot of work. Unless you want to create a new library, you may want to use an existing one. My approach for C++ is Fade2D [1], but there are also many others; it depends on your specific needs.
[1] http://www.geom.at/fade2d/html/

Search optimization problem

Suppose you have a list of 2D points with an orientation assigned to them. Let the set S be defined as:
S={ (x,y,a) | (x,y) is a 2D point, a is an orientation (an angle) }.
Given an element s of S, we will indicate with s_p the point part and with s_a the angle part. I would like to know if there exists an efficient data structure that, given a query element q, is able to return all the elements s in S such that
(dist(q_p, s_p) < threshold_1) AND (angle_diff(q_a, s_a) < threshold_2) (1)
where dist(p1,p2), with p1,p2 2D points, is the Euclidean distance, and angle_diff(a1,a2), with a1,a2 angles, is the difference between the angles (taken to be the smallest one). The data structure should be efficient w.r.t. insertion/deletion of elements and the search defined above. The number of elements can grow up to 10,000 and more, but take this with a grain of salt.
Now suppose we change the above requirement: instead of using condition (1), let's request all the elements of S such that, given a distance function d, d(q,s) < threshold. If I remember correctly, this last setup is called range search. I don't know if the first case can be transformed into the second.
For the distance search I believe the accepted best method is a Binary Space Partition tree. This can be stored as a series of bits. Each two bits (for a 2D tree) or three bits (for a 3D tree) subdivides the space one more level, increasing resolution.
Using a BSP, locating a set of objects to compare distances with is pretty easy. Just find the smallest set of squares or cubes which contain the edges of your distance box.
For the angle, I don't know of anything. I suppose that you could store each object in a second list or tree sorted by its angle. Then you would find every object at the proper distance using the BSP, every object at the proper angles using the angle tree, then do a set intersection.
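A pragmatic sketch of this two-stage idea, assuming SciPy's k-d tree for the spatial query and, for simplicity, a plain angle filter over the spatial candidates instead of a separate angle-sorted structure (all names and data are illustrative):

import numpy as np
from scipy.spatial import cKDTree

points = np.random.rand(10000, 2)                  # s_p for each element of S
angles = np.random.uniform(-np.pi, np.pi, 10000)   # s_a for each element of S
tree = cKDTree(points)

def query(q_p, q_a, threshold_1, threshold_2):
    # spatial stage: all elements with dist(q_p, s_p) < threshold_1
    candidates = tree.query_ball_point(q_p, threshold_1)
    # angle stage: keep those whose smallest angle difference to q_a is below threshold_2
    diff = np.abs(np.angle(np.exp(1j * (angles[candidates] - q_a))))
    return [i for i, d in zip(candidates, diff) if d < threshold_2]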
You have effectively described a "three-dimensional cylindrical space", i.e. a space that is locally three-dimensional but where one dimension is topologically cyclic. In other words, it is locally flat and may be modeled as the boundary of a four-dimensional object C4 in (x, y, z, w) defined by
z^2 + w^2 = 1
where
a = arctan(w/z)
With this model, the space defined by your constraints is a 2-dimensional cylinder wrapped "lengthwise" around a cross-section wedge, where the wedge wraps around the 4-d cylindrical space with an angle of 2 * threshold_2. This can be modeled using a "modified k-d tree" approach (a modified 3-d tree), where the data structure is not a tree but actually a graph (it has cycles). You can still partition this space into cells with hyperplane separation, but traveling along the curve defined by (z, w) in the positive direction may reach a point that is also reachable in the negative direction. The tree should be modified so that these nodes can be reached from both directions, i.e. the edges are bidirectional (in the z-w curve direction; the others are obviously still unidirectional).
These cycles do not change the effectiveness of the data structure in locating nearby points or in allowing your constraint search. In fact, for the most part, those algorithms need only slight modifications (the simplest approach being to keep a visited-node set to prevent cycles in the search: you test the neighbors about to be searched).
This will work especially well for your criteria, since the region you define is effectively bounded by these axis-defined hyperplane-bounded cells of a k-d tree, and so the search termination will leave a region on average populated around pi / 4 percent of the area.
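A small sketch of the cyclic-dimension embedding described above (assuming threshold_2 <= pi): mapping each angle a to (z, w) = (cos a, sin a) on the unit circle turns the smallest angle difference into a chord length, so angle_diff(a1, a2) < threshold_2 exactly when the chord is shorter than 2 * sin(threshold_2 / 2).

import numpy as np

def embed(a):
    # place the angle on the unit circle in the (z, w) plane
    return np.array([np.cos(a), np.sin(a)])

def within_angle(a1, a2, threshold_2):
    chord = np.linalg.norm(embed(a1) - embed(a2))
    return chord < 2 * np.sin(threshold_2 / 2)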