Polygons from network of connected points - numpy

Given an array of 2D points (#pts x 2) and an array of which points are connected to which (#bonds x 2 int array with indices of pts), how can I efficiently return an array of polygons formed from the bonds?
There can be 'dangling' bonds (like in the top left of the image below) that don't close a polygon, and these should be ignored.
Here's an example:
import numpy as np
xy = np.array([[2.72,-2.976], [2.182,-3.40207],
[-3.923,-3.463], [2.1130,4.5460], [2.3024,3.4900], [.96979,-.368],
[-2.632,3.7555], [-.5086,.06170], [.23409,-.6588], [.20225,-.9540],
[-.5267,-1.981], [-2.190,1.4710], [-4.341,3.2331], [-3.318,3.2654],
[.58510,4.1406], [.74331,2.9556], [.39622,3.6160], [-.8943,1.0643],
[-1.624,1.5259], [-1.414,3.5908], [-1.321,3.6770], [1.6148,1.0070],
[.76172,2.4627], [.76935,2.4838], [3.0322,-2.124], [1.9273,-.5527],
[-2.350,-.8412], [-3.053,-2.697], [-1.945,-2.795], [-1.905,-2.767],
[-1.904,-2.765], [-3.546,1.3208], [-2.513,1.3117], [-2.953,-.5855],
[-4.368,-.9650]])
BL= np.array([[22,23], [28,29], [8,9],
[12,31], [18,19], [31,32], [3,14],
[32,33], [24,25], [10,30], [15,23],
[5,25], [12,13], [0,24], [27,28],
[15,16], [5,8], [0,1], [11,18],
[2,27], [11,13], [33,34], [26,33],
[29,30], [7,17], [9,10], [26,30],
[17,22], [5,21], [19,20], [17,18],
[14,16], [7,26], [21,22], [3,4],
[4,15], [11,32], [6,19], [6,13],
[16,20], [27,34], [7,8], [1,9]])

I can't tell you how to implement it with numpy, but here's an outline of a possible algorithm:
Add a list of attached bonds to each point.
Remove the points that have only one bond attached, remove this bond as well (these are the dangling bonds)
Attach two boolean markers to each bond, indicating if the bond has already been added to a polygon in one of the two possible directions. Each bond can only be used in two polygons. Initially set all markers to false.
Select any initial point and repeat the following step until all bonds have been used in both directions:
Select a bond that has not been used (in the respective direction). This is the first edge of the polygon. Of the bonds attached to the end point of the selected one, choose the one with minimal angle in e.g. counter-clockwise direction. Add this to the polygon and continue until you return to the initial point.
This algorithm will also produce a large polygon containing all the outer bonds of the network. I guess you will find a way to recognize this one and remove it.

For future readers, the bulk of the implementation of Frank's suggestion in numpy is below. The extraction of the boundary follows essentially the same algorithm as walking around a polygon, except using the minimum angle bond, rather than the max.
def extract_polygons_lattice(xy, BL, NL, KL):
''' Extract polygons from a lattice of points.
Parameters
----------
xy : NP x 2 float array
points living on vertices of dual to triangulation
BL : Nbonds x 2 int array
Each row is a bond and contains indices of connected points
NL : NP x NN int array
Neighbor list. The ith row has neighbors of the ith particle, padded with zeros
KL : NP x NN int array
Connectivity list. The ith row has ones where ith particle is connected to NL[i,j]
Returns
----------
polygons : list
list of lists of indices of each polygon
PPC : list
list of patches for patch collection
'''
NP = len(xy)
NN = np.shape(KL)[1]
# Remove dangling bonds
# dangling bonds have one particle with only one neighbor
finished_dangles = False
while not finished_dangles:
dangles = np.where([ np.count_nonzero(row)==1 for row in KL])[0]
if len(dangles) >0:
# Make sorted bond list of dangling bonds
dpair = np.sort(np.array([ [d0, NL[d0,np.where(KL[d0]!=0)[0]] ] for d0 in dangles ]), axis=1)
# Remove those bonds from BL
BL = setdiff2d(BL,dpair.astype(BL.dtype))
print 'dpair = ', dpair
print 'ending BL = ', BL
NL, KL = BL2NLandKL(BL,NP=NP,NN=NN)
else:
finished_dangles = True
# bond markers for counterclockwise, clockwise
used = np.zeros((len(BL),2), dtype = bool)
polygons = []
finished = False
while (not finished) and len(polygons)<20:
# Check if all bond markers are used in order A-->B
todoAB = np.where(~used[:,0])[0]
if len(todoAB) > 0:
bond = BL[todoAB[0]]
# bb will be list of polygon indices
# Start with orientation going from bond[0] to bond[1]
nxt = bond[1]
bb = [ bond[0], nxt ]
dmyi = 1
# as long as we haven't completed the full outer polygon, add next index
while nxt != bond[0]:
n_tmp = NL[ nxt, np.argwhere(KL[nxt]).ravel()]
# Exclude previous boundary particle from the neighbors array, unless its the only one
# (It cannot be the only one, if we removed dangling bonds)
if len(n_tmp) == 1:
'''The bond is a lone bond, not part of a triangle.'''
neighbors = n_tmp
else:
neighbors = np.delete(n_tmp, np.where(n_tmp == bb[dmyi-1])[0])
angles = np.mod( np.arctan2(xy[neighbors,1]-xy[nxt,1],xy[neighbors,0]-xy[nxt,0]).ravel() \
- np.arctan2( xy[bb[dmyi-1],1]-xy[nxt,1], xy[bb[dmyi-1],0]-xy[nxt,0] ).ravel(), 2*np.pi)
nxt = neighbors[angles == max(angles)][0]
bb.append( nxt )
# Now mark the current bond as used
thisbond = [bb[dmyi-1], bb[dmyi]]
# Get index of used matching thisbond
mark_used = np.where((BL == thisbond).all(axis=1))
if len(mark_used)>0:
#print 'marking bond [', thisbond, '] as used'
used[mark_used,0] = True
else:
# Used this bond in reverse order
used[mark_used,1] = True
dmyi += 1
polygons.append(bb)
else:
# Check for remaining bonds unused in reverse order (B-->A)
todoBA = np.where(~used[:,1])[0]
if len(todoBA) >0:
bond = BL[todoBA[0]]
# bb will be list of polygon indices
# Start with orientation going from bond[0] to bond[1]
nxt = bond[0]
bb = [ bond[1], nxt ]
dmyi = 1
# as long as we haven't completed the full outer polygon, add nextIND
while nxt != bond[1]:
n_tmp = NL[ nxt, np.argwhere(KL[nxt]).ravel()]
# Exclude previous boundary particle from the neighbors array, unless its the only one
# (It cannot be the only one, if we removed dangling bonds)
if len(n_tmp) == 1:
'''The bond is a lone bond, not part of a triangle.'''
neighbors = n_tmp
else:
neighbors = np.delete(n_tmp, np.where(n_tmp == bb[dmyi-1])[0])
angles = np.mod( np.arctan2(xy[neighbors,1]-xy[nxt,1],xy[neighbors,0]-xy[nxt,0]).ravel() \
- np.arctan2( xy[bb[dmyi-1],1]-xy[nxt,1], xy[bb[dmyi-1],0]-xy[nxt,0] ).ravel(), 2*np.pi)
nxt = neighbors[angles == max(angles)][0]
bb.append( nxt )
# Now mark the current bond as used --> note the inversion of the bond order to match BL
thisbond = [bb[dmyi], bb[dmyi-1]]
# Get index of used matching [bb[dmyi-1],nxt]
mark_used = np.where((BL == thisbond).all(axis=1))
if len(mark_used)>0:
used[mark_used,1] = True
dmyi += 1
polygons.append(bb)
else:
# All bonds have been accounted for
finished = True
# Check for duplicates (up to cyclic permutations) in polygons
# Note that we need to ignore the last element of each polygon (which is also starting pt)
keep = np.ones(len(polygons),dtype=bool)
for ii in range(len(polygons)):
polyg = polygons[ii]
for p2 in polygons[ii+1:]:
if is_cyclic_permutation(polyg[:-1],p2[:-1]):
keep[ii] = False
polygons = [polygons[i] for i in np.where(keep)[0]]
# Remove the polygon which is the entire lattice boundary, except dangling bonds
boundary = extract_boundary_from_NL(xy,NL,KL)
print 'boundary = ', boundary
keep = np.ones(len(polygons),dtype=bool)
for ii in range(len(polygons)):
polyg = polygons[ii]
if is_cyclic_permutation(polyg[:-1],boundary.tolist()):
keep[ii] = False
elif is_cyclic_permutation(polyg[:-1],boundary[::-1].tolist()):
keep[ii] = False
polygons = [polygons[i] for i in np.where(keep)[0]]
# Prepare a polygon patch collection
PPC = []
for polyINDs in polygons:
pp = Path(xy[polyINDs],closed=True)
ppp = patches.PathPatch(pp, lw=2)
PPC.append(ppp)
return polygons, PPC

Related

networkx position (biggest) nodes in the middle of a graph

I've been creating graphs with the networkx package and everything works fine. I would like to make the graphs even better by placing the bigger nodes in the middle of the graph and the layout functions from networkx does not seem to do the job. The nodes represent the size of degree (the higher connected the node, the bigger).
Is there any way to program these graphs in such a way that the bigger nodes are positioned in the middle? It does not have to be automated, i could also manually choose the nodes and give them the middle position but i can also not find how to do this.
If this is not possible with networkx or something else; is there any way to do it with Gephi or cytoscape? I had trouble with Gephi that it does not import the graph the same way i see it in my jupyter notebook (the colors, the node- and edge-sizes do not import).
To summarize; i want to put bigger nodes in the middle of my graph but i dont mind how i get it done (with networkx, matplotlib or whatever).
Unfortunately i cannot provide my actual graphs but here is an example which can look like one of my graphs; it is a directed weighted graph.
G = nx.gnp_random_graph(15, 0.2, directed=True)
d = dict(G.degree(weight='weight'))
d = {k: v/10 for k, v in d.items()}
edge_size = [(float(i)/sum(weights))*100 for i in weights]
node_size = [(v*1000) for v in d.values()]
nx.draw(G,width=edge_size,node_size=node_size)
There are several options:
import networkx as nx
G = nx.gnp_random_graph(15, 0.2, directed=True)
node_degree = dict(G.degree(weight='weight'))
# A) Precompute node positions, and then manually over-ride some node positions.
node_positions = nx.spring_layout(G)
node_positions[0] = (0.5, 0.5) # by default, networkx plots on a canvas with the origin at (0, 0) and a width and height of 1; (0.5, 0.5) is hence the center
nx.draw(G, pos=node_positions, node_size=[100 * node_degree[node] for node in G])
plt.show()
# B) Use netgraph to draw the graph and then drag the nodes around with the mouse.
from netgraph import InteractiveGraph # pip install netgraph
plot_instance = InteractiveGraph(G, node_size=node_degree)
plt.show()
# C) Modify the Fruchterman-Reingold algorithm to include a gravitational force that pulls nodes with a large "mass" towards the center.
# This is left as an exercise to the interested reader (i.e. very non-trivial).
Edit: option C is non-trivial but also very do-able.
Here is my stab at it.
#!/usr/bin/env python
# coding: utf-8
"""
FR layout but with an additional gravitational pull towards a gravitational center.
The pull is proportional to the mass of the node.
"""
import numpy as np
import matplotlib.pyplot as plt
# pip install netgraph
from netgraph._main import BASE_SCALE
from netgraph._utils import (
_get_unique_nodes,
_edge_list_to_adjacency_matrix,
)
from netgraph._node_layout import (
_is_within_bbox,
_get_temperature_decay,
_get_fr_repulsion,
_get_fr_attraction,
_rescale_to_frame,
_handle_multiple_components,
_reduce_node_overlap,
)
DEBUG = False
#_handle_multiple_components
def get_fruchterman_reingold_newton_layout(edges,
edge_weights = None,
k = None,
g = 1.,
scale = None,
origin = None,
gravitational_center = None,
initial_temperature = 1.,
total_iterations = 50,
node_size = 0,
node_mass = 1,
node_positions = None,
fixed_nodes = None,
*args, **kwargs):
"""Modified Fruchterman-Reingold node layout.
Uses a modified Fruchterman-Reingold algorithm [Fruchterman1991]_ to compute node positions.
This algorithm simulates the graph as a physical system, in which nodes repell each other.
For connected nodes, this repulsion is counteracted by an attractive force exerted by the edges, which are simulated as springs.
Unlike the original algorithm, there is an additional attractive force pulling nodes towards a gravitational center, in proportion to their masses.
Parameters
----------
edges : list
The edges of the graph, with each edge being represented by a (source node ID, target node ID) tuple.
edge_weights : dict
Mapping of edges to edge weights.
k : float or None, default None
Expected mean edge length. If None, initialized to the sqrt(area / total nodes).
g : float or None, default 1.
Gravitational constant that sets the magnitude of the gravitational pull towards the center.
origin : tuple or None, default None
The (float x, float y) coordinates corresponding to the lower left hand corner of the bounding box specifying the extent of the canvas.
If None is given, the origin is placed at (0, 0).
scale : tuple or None, default None
The (float x, float y) dimensions representing the width and height of the bounding box specifying the extent of the canvas.
If None is given, the scale is set to (1, 1).
gravitational_center : tuple or None, default None
The (float x, float y) coordinates towards which nodes experience a gravitational pull.
If None, the gravitational center is placed at the center of the canvas defined by origin and scale.
total_iterations : int, default 50
Number of iterations.
initial_temperature: float, default 1.
Temperature controls the maximum node displacement on each iteration.
Temperature is decreased on each iteration to eventually force the algorithm into a particular solution.
The size of the initial temperature determines how quickly that happens.
Values should be much smaller than the values of `scale`.
node_size : scalar or dict, default 0.
Size (radius) of nodes.
Providing the correct node size minimises the overlap of nodes in the graph,
which can otherwise occur if there are many nodes, or if the nodes differ considerably in size.
node_mass : scalar or dict, default 1.
Mass of nodes.
Nodes with higher mass experience a larger gravitational pull towards the center.
node_positions : dict or None, default None
Mapping of nodes to their (initial) x,y positions. If None are given,
nodes are initially placed randomly within the bounding box defined by `origin` and `scale`.
If the graph has multiple components, explicit initial positions may result in a ValueError,
if the initial positions fall outside of the area allocated to that specific component.
fixed_nodes : list or None, default None
Nodes to keep fixed at their initial positions.
Returns
-------
node_positions : dict
Dictionary mapping each node ID to (float x, float y) tuple, the node position.
References
----------
.. [Fruchterman1991] Fruchterman, TMJ and Reingold, EM (1991) ‘Graph drawing by force‐directed placement’,
Software: Practice and Experience
"""
# This is just a wrapper around `_fruchterman_reingold`, which implements (the loop body of) the algorithm proper.
# This wrapper handles the initialization of variables to their defaults (if not explicitely provided),
# and checks inputs for self-consistency.
assert len(edges) > 0, "The list of edges has to be non-empty."
if origin is None:
if node_positions:
minima = np.min(list(node_positions.values()), axis=0)
origin = np.min(np.stack([minima, np.zeros_like(minima)], axis=0), axis=0)
else:
origin = np.zeros((2))
else:
# ensure that it is an array
origin = np.array(origin)
if scale is None:
if node_positions:
delta = np.array(list(node_positions.values())) - origin[np.newaxis, :]
maxima = np.max(delta, axis=0)
scale = np.max(np.stack([maxima, np.ones_like(maxima)], axis=0), axis=0)
else:
scale = np.ones((2))
else:
# ensure that it is an array
scale = np.array(scale)
assert len(origin) == len(scale), \
"Arguments `origin` (d={}) and `scale` (d={}) need to have the same number of dimensions!".format(len(origin), len(scale))
dimensionality = len(origin)
if gravitational_center is None:
gravitational_center = origin + 0.5 * scale
else:
# ensure that it is an array
gravitational_center = np.array(gravitational_center)
if fixed_nodes is None:
fixed_nodes = []
connected_nodes = _get_unique_nodes(edges)
if node_positions is None: # assign random starting positions to all nodes
node_positions_as_array = np.random.rand(len(connected_nodes), dimensionality) * scale + origin
unique_nodes = connected_nodes
else:
# 1) check input dimensionality
dimensionality_node_positions = np.array(list(node_positions.values())).shape[1]
assert dimensionality_node_positions == dimensionality, \
"The dimensionality of values of `node_positions` (d={}) must match the dimensionality of `origin`/ `scale` (d={})!".format(dimensionality_node_positions, dimensionality)
is_valid = _is_within_bbox(list(node_positions.values()), origin=origin, scale=scale)
if not np.all(is_valid):
error_message = "Some given node positions are not within the data range specified by `origin` and `scale`!"
error_message += "\n\tOrigin : {}, {}".format(*origin)
error_message += "\n\tScale : {}, {}".format(*scale)
error_message += "\nThe following nodes do not fall within this range:"
for ii, (node, position) in enumerate(node_positions.items()):
if not is_valid[ii]:
error_message += "\n\t{} : {}".format(node, position)
error_message += "\nThis error can occur if the graph contains multiple components but some or all node positions are initialised explicitly (i.e. node_positions != None)."
raise ValueError(error_message)
# 2) handle discrepancies in nodes listed in node_positions and nodes extracted from edges
if set(node_positions.keys()) == set(connected_nodes):
# all starting positions are given;
# no superfluous nodes in node_positions;
# nothing left to do
unique_nodes = connected_nodes
else:
# some node positions are provided, but not all
for node in connected_nodes:
if not (node in node_positions):
warnings.warn("Position of node {} not provided. Initializing to random position within frame.".format(node))
node_positions[node] = np.random.rand(2) * scale + origin
unconnected_nodes = []
for node in node_positions:
if not (node in connected_nodes):
unconnected_nodes.append(node)
fixed_nodes.append(node)
# warnings.warn("Node {} appears to be unconnected. The current node position will be kept.".format(node))
unique_nodes = connected_nodes + unconnected_nodes
node_positions_as_array = np.array([node_positions[node] for node in unique_nodes])
total_nodes = len(unique_nodes)
if isinstance(node_size, (int, float)):
node_size = node_size * np.ones((total_nodes))
elif isinstance(node_size, dict):
node_size = np.array([node_size[node] if node in node_size else 0. for node in unique_nodes])
if isinstance(node_mass, (int, float)):
node_mass = node_mass * np.ones((total_nodes))
elif isinstance(node_mass, dict):
node_mass = np.array([node_mass[node] if node in node_mass else 0. for node in unique_nodes])
adjacency = _edge_list_to_adjacency_matrix(
edges, edge_weights=edge_weights, unique_nodes=unique_nodes)
# Forces in FR are symmetric.
# Hence we need to ensure that the adjacency matrix is also symmetric.
adjacency = adjacency + adjacency.transpose()
if fixed_nodes:
is_mobile = np.array([False if node in fixed_nodes else True for node in unique_nodes], dtype=bool)
mobile_positions = node_positions_as_array[is_mobile]
fixed_positions = node_positions_as_array[~is_mobile]
mobile_node_sizes = node_size[is_mobile]
fixed_node_sizes = node_size[~is_mobile]
mobile_node_masses = node_mass[is_mobile]
fixed_node_masses = node_mass[~is_mobile]
# reorder adjacency
total_mobile = np.sum(is_mobile)
reordered = np.zeros((adjacency.shape[0], total_mobile))
reordered[:total_mobile, :total_mobile] = adjacency[is_mobile][:, is_mobile]
reordered[total_mobile:, :total_mobile] = adjacency[~is_mobile][:, is_mobile]
adjacency = reordered
else:
is_mobile = np.ones((total_nodes), dtype=bool)
mobile_positions = node_positions_as_array
fixed_positions = np.zeros((0, 2))
mobile_node_sizes = node_size
fixed_node_sizes = np.array([])
mobile_node_masses = node_mass
fixed_node_masses = np.array([])
if k is None:
area = np.product(scale)
k = np.sqrt(area / float(total_nodes))
temperatures = _get_temperature_decay(initial_temperature, total_iterations)
# --------------------------------------------------------------------------------
# main loop
for ii, temperature in enumerate(temperatures):
candidate_positions = _fruchterman_reingold_newton(mobile_positions, fixed_positions,
mobile_node_sizes, fixed_node_sizes,
adjacency, temperature, k,
mobile_node_masses, fixed_node_masses,
gravitational_center, g)
is_valid = _is_within_bbox(candidate_positions, origin=origin, scale=scale)
mobile_positions[is_valid] = candidate_positions[is_valid]
# --------------------------------------------------------------------------------
# format output
node_positions_as_array[is_mobile] = mobile_positions
if np.all(is_mobile):
node_positions_as_array = _rescale_to_frame(node_positions_as_array, origin, scale)
node_positions = dict(zip(unique_nodes, node_positions_as_array))
return node_positions
def _fruchterman_reingold_newton(mobile_positions, fixed_positions,
mobile_node_radii, fixed_node_radii,
adjacency, temperature, k,
mobile_node_masses, fixed_node_masses,
gravitational_center, g):
"""Inner loop of modified Fruchterman-Reingold layout algorithm."""
combined_positions = np.concatenate([mobile_positions, fixed_positions], axis=0)
combined_node_radii = np.concatenate([mobile_node_radii, fixed_node_radii])
delta = mobile_positions[np.newaxis, :, :] - combined_positions[:, np.newaxis, :]
distance = np.linalg.norm(delta, axis=-1)
# alternatively: (hack adapted from igraph)
if np.sum(distance==0) - np.trace(distance==0) > 0: # i.e. if off-diagonal entries in distance are zero
warnings.warn("Some nodes have the same position; repulsion between the nodes is undefined.")
rand_delta = np.random.rand(*delta.shape) * 1e-9
is_zero = distance <= 0
delta[is_zero] = rand_delta[is_zero]
distance = np.linalg.norm(delta, axis=-1)
# subtract node radii from distances to prevent nodes from overlapping
distance -= mobile_node_radii[np.newaxis, :] + combined_node_radii[:, np.newaxis]
# prevent distances from becoming less than zero due to overlap of nodes
distance[distance <= 0.] = 1e-6 # 1e-13 is numerical accuracy, and we will be taking the square shortly
with np.errstate(divide='ignore', invalid='ignore'):
direction = delta / distance[..., None] # i.e. the unit vector
# calculate forces
repulsion = _get_fr_repulsion(distance, direction, k)
attraction = _get_fr_attraction(distance, direction, adjacency, k)
gravity = _get_gravitational_pull(mobile_positions, mobile_node_masses, gravitational_center, g)
if DEBUG:
r = np.median(np.linalg.norm(repulsion, axis=-1))
a = np.median(np.linalg.norm(attraction, axis=-1))
g = np.median(np.linalg.norm(gravity, axis=-1))
print(r, a, g)
displacement = attraction + repulsion + gravity
# limit maximum displacement using temperature
displacement_length = np.linalg.norm(displacement, axis=-1)
displacement = displacement / displacement_length[:, None] * np.clip(displacement_length, None, temperature)[:, None]
mobile_positions = mobile_positions + displacement
return mobile_positions
def _get_gravitational_pull(mobile_positions, mobile_node_masses, gravitational_center, g):
delta = gravitational_center[np.newaxis, :] - mobile_positions
direction = delta / np.linalg.norm(delta, axis=-1)[:, np.newaxis]
magnitude = mobile_node_masses - np.mean(mobile_node_masses)
return g * magnitude[:, np.newaxis] * direction
if __name__ == '__main__':
import networkx as nx
from netgraph import Graph
G = nx.gnp_random_graph(15, 0.2, directed=True)
node_degree = dict(G.degree(weight='weight'))
node_positions = get_fruchterman_reingold_newton_layout(
list(G.edges()),
node_size={node : BASE_SCALE * degree for node, degree in node_degree.items()},
node_mass=node_degree, g=2
)
Graph(G, node_layout=node_positions, node_size=node_degree)
plt.show()

Find pairs of array such as array_1 = -array_2

I search a way to find all the vector from a np.meshgrid(xrange, xrange, xrange) that are related by k = -k.
For the moment I do that :
#numba.njit
def find_pairs(array):
boolean = np.ones(len(array), dtype=np.bool_)
pairs = []
idx = [i for i in range(len(array))]
while len(idx) > 1:
e1 = idx[0]
for e2 in idx:
if (array[e1] == -array[e2]).all():
boolean[e2] = False
pairs.append([e1, e2])
idx.remove(e1)
if e2 != e1:
idx.remove(e2)
break
return boolean, pairs
# Give array of 3D vectors
krange = np.fft.fftfreq(N)
comb_array = np.array(np.meshgrid(krange, krange, krange)).T.reshape(-1, 3)
# Take idx of the pairs k, -k vector and boolean selection that give position of -k vectors
boolean, pairs = find_pairs(array)
It works but the execution time grow rapidly with N...
Maybe someone has already deal with that?
The main problem is that comb_array has a shape of (R, 3) where R = N**3 and the nested loop in find_pairs runs at least in quadratic time since idx.remove runs in linear time and is called in the for loop. Moreover, there are cases where the for loop does not change the size of idx and the loop appear to run forever (eg. with N=4).
One solution to solve this problem in O(R log R) is to sort the array and then check for opposite values in linear time:
import numpy as np
import numba as nb
# Give array of 3D vectors
krange = np.fft.fftfreq(N)
comb_array = np.array(np.meshgrid(krange, krange, krange)).T.reshape(-1, 3)
# Sorting
packed = comb_array.view([('x', 'f8'), ('y', 'f8'), ('z', 'f8')])
idx = np.argsort(packed, axis=0).ravel()
sorted_comb = comb_array[idx]
# Find pairs
#nb.njit
def findPairs(sorted_comb, idx):
n = idx.size
boolean = np.zeros(n, dtype=np.bool_)
pairs = []
cur = n-1
for i in range(n):
while cur >= i:
if np.all(sorted_comb[i] == -sorted_comb[cur]):
boolean[idx[i]] = True
pairs.append([idx[i], idx[cur]])
cur -= 1
break
cur -= 1
return boolean, pairs
findPairs(sorted_comb, idx)
Note that the algorithm assume that for each row, there are only up to one valid matching pair. If there are several equal rows, they are paired 2 by two. If your goal is to extract all the combination of equal rows in this case, then please note that the output will grow exponentially (which is not reasonable IMHO).
This solution is pretty fast even for N = 100. Most of the time is spent in the sort that is not very efficient (unfortunately Numpy does not provide a way to do a lexicographic argsort of the row efficiently yet though this operation is fundamentally expensive).

Create line network from closest points with boundaries

I have a set of points and I want to create line / road network from those points. Firstly, I need to determine the closest point from each of the points. For that, I used the KD Tree and developed a code like this:
def closestPoint(source, X = None, Y = None):
df = pd.DataFrame(source).copy(deep = True) #Ensure source is a dataframe, working on a copy to keep the datasource
if(X is None and Y is None):
raise ValueError ("Please specify coordinate")
elif(not X in df.keys() and not Y in df.keys()):
raise ValueError ("X and/or Y is/are not in column names")
else:
df["coord"] = tuple(zip(df[X],df[Y])) #create a coordinate
if (df["coord"].duplicated):
uniq = df.drop_duplicates("coord")["coord"]
uniqval = list(uniq.get_values())
dupl = df[df["coord"].duplicated()]["coord"]
duplval = list(dupl.get_values())
for kq,vq in uniq.items():
clstu = spatial.KDTree(uniqval).query(vq, k = 3)[1]
df.at[kq,"coord"] = [vq,uniqval[clstu[1]]]
if([uniqval[clstu[1]],vq] in list(df["coord"]) ):
df.at[kq,"coord"] = [vq,uniqval[clstu[2]]]
for kd,vd in dupl.items():
clstd = spatial.KDTree(duplval).query(vd,k = 1)[1]
df.at[kd,"coord"] = [vd,duplval[clstd]]
else:
val = df["coord"].get_values()
for k,v in df["coord"].items():
clst = spatial.KDTree(val).query(vd, k = 3)[1]
df.at[k,"coord"] = [v,val[clst[1]]]
if([val[clst[1]],v] in list (df["coord"])):
df.at[k,"coord"] = [v,val[clst[2]]]
return df["coord"]
The code can return the the closest points around. However, I need to ensure that no double lines are created (e.g (x,y) to (x1,y1) and (x1,y1) to (x,y)) and also I need to ensure that each point can only be used as a starting point of a line and an end point of a line despite the point being the closest one to the other points.
Below is the visualization of the result:
Result of the code
What I want:
What I want
I've also tried to separate the origin and target coordinate and do it like this:
df["coord"] = tuple(zip(df[X],df[Y])) #create a coordinate
df["target"] = "" #create a column for target points
count = 2 # create a count iteration
if (df["coord"].duplicated):
uniq = df.drop_duplicates("coord")["coord"]
uniqval = list(uniq.get_values())
for kq,vq in uniq.items():
clstu = spatial.KDTree(uniqval).query(vq, k = count)[1]
while not vq in (list(df["target"]) and list(df["coord"])):
clstu = spatial.KDTree(uniqval).query(vq, k = count)[1]
df.set_value(kq, "target", uniqval[clstu[count-1]])
else:
count += 1
clstu = spatial.KDTree(uniqval).query(vq, k = count)[1]
df.set_value(kq, "target", uniqval[clstu[count-1]])
but this return an error
IndexError: list index out of range
Can anyone help me with this? Many thanks!
Answering now about the global strategy, here is what I would do (rough pseudo-algorithm):
current_point = one starting point in uniqval
while (uniqval not empty)
construct KDTree from uniqval and use it for next line
next_point = point in uniqval closest to current_point
record next_point as target for current_point
remove current_point from uniqval
current_point = next_point
What you will obtain is a linear graph joining all your points, using closest neighbors "in some way". I don't know if it will fit your needs. You would also obtain a linear graph by taking next_point at random...
It is hard to comment on your global strategy without further detail about the kind of road network your want to obtain. So let me just comment your specific code and explain why the "out of range" error happens. I hope this can help.
First, are you aware that (list_a and list_b) will return list_a if it is empty, else list_b? Second, isn't the condition (vq in list(df["coord"]) always True? If yes, then your while loop is just always executing the else statement, and at the last iteration of the for loop, (count-1) will be greater than the total number of (unique) points. Hence your KDTree query does not return enough points and clstu[count-1] is out of range.

Verlet integrator + friction

I have been following "A Verlet based approach for 2D game physics" on Gamedev.net and I have written something similar.
The problem I am having is that the boxes slide along the ground too much.
How can I add a simple rested state thing where the boxes will have more friction and only slide a tiny bit?
Just introduce a small, constant acceleration on moving objects that points in the direction opposite to the motion. And make sure it can't actually reverse the motion; if you detect that in an integration step, just set the velocity to zero.
If you want to be more realistic, the acceleration should derive from a force which is proportional to the normal force between the object and the surface it's sliding on.
You can find this in any basic physics text, as "kinetic friction" or "sliding friction".
At the verlet integration: r(t)=2.00*r(t-dt)-1.00*r(t-2dt)+2at²
change the multipliers to 1.99 and 0.99 for friction
Edit: this is more true:
r(t)=(2.00-friction_mult.)*r(t-dt)-(1.00-friction_mult.)*r(t-2dt)+at²
Here is a simple time stepping scheme (symplectic Euler method with manually resolved LCP) for a box with Coulomb friction and a spring (frictional oscillator)
mq'' + kq + mu*sgn(q') = F(t)
import numpy as np
import matplotlib.pyplot as plt
q0 = 0 # initial position
p0 = 0 # initial momentum
t_start = 0 # initial time
t_end = 10 # end time
N = 500 # time points
m = 1 # mass
k = 1 # spring stiffness
muN = 0.5 # friction force (slip and maximal stick)
omega = 1.5 # forcing radian frequency [RAD]
Fstat = 0.1 # static component of external force
Fdyn = 0.6 # amplitude of harmonic external force
F = lambda tt,qq,pp: Fstat + Fdyn*np.sin(omega*tt) - k*qq - muN*np.sign(pp) # total force, note sign(0)=0 used to disable friction
zero_to_disable_friction = 0
omega0 = np.sqrt(k/m)
print("eigenfrequency f = {} Hz; eigen period T = {} s".format(omega0/(2*np.pi), 2*np.pi/omega0))
print("forcing frequency f = {} Hz; forcing period T = {} s".format(omega/(2*np.pi), 2*np.pi/omega))
time = np.linspace(t_start, t_end, N) # time grid
h = time[1] - time[0] # time step
q = np.zeros(N+1) # position
p = np.zeros(N+1) # momentum
absFfriction = np.zeros(N+1)
q[0] = q0
p[0] = p0
for n, tn in enumerate(time):
p1slide = p[n] + h*F(tn, q[n], p[n]) # end-time momentum, assuming sliding
q1slide = q[n] + h*p1slide/m # end-time position, assuming sliding
if p[n]*p1slide > 0: # sliding goes on
q[n+1] = q1slide
p[n+1] = p1slide
absFfriction[n] = muN
else:
q1stick = q[n] # assume p1 = 0 at t=tn+h
Fstick = -p[n]/h - F(tn, q1stick, zero_to_disable_friction) # friction force needed to stop at t=tn+h
if np.abs(Fstick) <= muN:
p[n+1] = 0 # sticking
q[n+1] = q1stick
absFfriction[n] = np.abs(Fstick)
else: # sliding starts or passes zero crossing of velocity
q[n+1] = q1slide # possible refinements (adapt to slip-start or zero crossing)
p[n+1] = p1slide
absFfriction[n] = muN

Storing plot objects in a list

I asked this question yesterday about storing a plot within an object. I tried implementing the first approach (aware that I did not specify that I was using qplot() in my original question) and noticed that it did not work as expected.
library(ggplot2) # add ggplot2
string = "C:/example.pdf" # Setup pdf
pdf(string,height=6,width=9)
x_range <- range(1,50) # Specify Range
# Create a list to hold the plot objects.
pltList <- list()
pltList[]
for(i in 1 : 16){
# Organise data
y = (1:50) * i * 1000 # Get y col
x = (1:50) # get x col
y = log(y) # Use natural log
# Regression
lm.0 = lm(formula = y ~ x) # make linear model
inter = summary(lm.0)$coefficients[1,1] # Get intercept
slop = summary(lm.0)$coefficients[2,1] # Get slope
# Make plot name
pltName <- paste( 'a', i, sep = '' )
# make plot object
p <- qplot(
x, y,
xlab = "Radius [km]",
ylab = "Services [log]",
xlim = x_range,
main = paste("Sample",i)
) + geom_abline(intercept = inter, slope = slop, colour = "red", size = 1)
print(p)
pltList[[pltName]] = p
}
# close the PDF file
dev.off()
I have used sample numbers in this case so the code runs if it is just copied. I did spend a few hours puzzling over this but I cannot figure out what is going wrong. It writes the first set of pdfs without problem, so I have 16 pdfs with the correct plots.
Then when I use this piece of code:
string = "C:/test_tabloid.pdf"
pdf(string, height = 11, width = 17)
grid.newpage()
pushViewport( viewport( layout = grid.layout(3, 3) ) )
vplayout <- function(x, y){viewport(layout.pos.row = x, layout.pos.col = y)}
counter = 1
# Page 1
for (i in 1:3){
for (j in 1:3){
pltName <- paste( 'a', counter, sep = '' )
print( pltList[[pltName]], vp = vplayout(i,j) )
counter = counter + 1
}
}
dev.off()
the result I get is the last linear model line (abline) on every graph, but the data does not change. When I check my list of plots, it seems that all of them become overwritten by the most recent plot (with the exception of the abline object).
A less important secondary question was how to generate a muli-page pdf with several plots on each page, but the main goal of my code was to store the plots in a list that I could access at a later date.
Ok, so if your plot command is changed to
p <- qplot(data = data.frame(x = x, y = y),
x, y,
xlab = "Radius [km]",
ylab = "Services [log]",
xlim = x_range,
ylim = c(0,10),
main = paste("Sample",i)
) + geom_abline(intercept = inter, slope = slop, colour = "red", size = 1)
then everything works as expected. Here's what I suspect is happening (although Hadley could probably clarify things). When ggplot2 "saves" the data, what it actually does is save a data frame, and the names of the parameters. So for the command as I have given it, you get
> summary(pltList[["a1"]])
data: x, y [50x2]
mapping: x = x, y = y
scales: x, y
faceting: facet_grid(. ~ ., FALSE)
-----------------------------------
geom_point:
stat_identity:
position_identity: (width = NULL, height = NULL)
mapping: group = 1
geom_abline: colour = red, size = 1
stat_abline: intercept = 2.55595281266726, slope = 0.05543539319091
position_identity: (width = NULL, height = NULL)
However, if you don't specify a data parameter in qplot, all the variables get evaluated in the current scope, because there is no attached (read: saved) data frame.
data: [0x0]
mapping: x = x, y = y
scales: x, y
faceting: facet_grid(. ~ ., FALSE)
-----------------------------------
geom_point:
stat_identity:
position_identity: (width = NULL, height = NULL)
mapping: group = 1
geom_abline: colour = red, size = 1
stat_abline: intercept = 2.55595281266726, slope = 0.05543539319091
position_identity: (width = NULL, height = NULL)
So when the plot is generated the second time around, rather than using the original values, it uses the current values of x and y.
I think you should use the data argument in qplot, i.e., store your vectors in a data frame.
See Hadley's book, Section 4.4:
The restriction on the data is simple: it must be a data frame. This is restrictive, and unlike other graphics packages in R. Lattice functions can take an optional data frame or use vectors directly from the global environment. ...
The data is stored in the plot object as a copy, not a reference. This has two
important consequences: if your data changes, the plot will not; and ggplot2 objects are entirely self-contained so that they can be save()d to disk and later load()ed and plotted without needing anything else from that session.
There is a bug in your code concerning list subscripting. It should be
pltList[[pltName]]
not
pltList[pltName]
Note:
class(pltList[1])
[1] "list"
pltList[1] is a list containing the first element of pltList.
class(pltList[[1]])
[1] "ggplot"
pltList[[1]] is the first element of pltList.
For your second question: Multi-page pdfs are easy -- see help(pdf):
onefile: logical: if true (the default) allow multiple figures in one
file. If false, generate a file with name containing the
page number for each page. Defaults to ‘TRUE’.
For your main question, I don't understand if you want to store the plot inputs in a list for later processing, or the plot outputs. If it is the latter, I am not sure that plot() returns an object you can store and retrieve.
Another suggestion regarding your second question would be to use either Sweave or Brew as they will give you complete control over how you display your multi-page pdf.
Have a look at this related question.