How to speed up graph coloring problem in python PuLP - optimization

I am trying to solve the classic graph coloring problem using python PuLP. We have n nodes, a collection of edges in the form edges = [(node1, node2), (node2, node4), ...], and we are trying to find the minimum number of node colors so that no connected nodes share a color.
My implementation works, but is slow. It is made of three constraints, plus the one optimization of initializing node0 to color 0 to somewhat limit the search space. The code is as follows:
from pulp import LpProblem, LpVariable, LpMinimize, lpSum

# node_count and edges come from the problem instance
nodes = range(node_count)
n_colors = 10
# colors = range(node_count)
colors = range(n_colors)
prob = LpProblem("coloring", LpMinimize)
# variable xnc shows if node n has color c
xnc = LpVariable.dicts("x", (nodes, colors), cat='Binary')
# array of colors to indicate which ones were used
used_colors = LpVariable.dicts("used", colors, cat='Binary')
# minimize how many colors are used, and minimize int value for those colors
prob += lpSum([used_colors[c] * c for c in colors])
# prob += lpSum([used_colors[c] for c in colors])
# set the first node to color 0 to constrain starting point
prob += xnc[0][0] == 1
# Every node uses one color
for n in nodes:
    prob += lpSum([xnc[n][c] for c in colors]) == 1
# Any connected nodes have different colors
for e in edges:
    e1, e2 = e[0], e[1]
    for c in colors:
        prob += xnc[e1][c] + xnc[e2][c] <= 1
# mark color as used if node has that color
for n in nodes:
    for c in colors:
        prob += xnc[n][c] <= used_colors[c]
prob.solve()
I see that there are symmetries, and I know I could reduce them by requiring any newly used color to be at most max(colors_already_used) + 1, so that if node 0 is color 0, node 1 will either have the same color, or color 1. But I am not sure how to encode this, because max is not a linear function and so, as far as I know, cannot be used directly in a PuLP model. I achieve a similar effect above by weighting each used color by its integer value in the objective, which speeds things up a bit, but I do not think it is as efficient or deterministic as the constraint I am after.
Also limiting the number of colors seems to have a nice effect on the speed, but I am not sure if it is worth the preprocessing cost to try and find a heuristic before starting the optimization, since it is not clear how many colors will be needed in advance.
What other constraints could I add, or in what other ways could I speed it up? I am mostly interested in better ways to formulate the problem, but I am also open to computational optimizations, i.e. parallelization, if they can be done in PuLP.
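On the preprocessing cost of finding a color limit: a greedy pass over the nodes is cheap and always yields a valid coloring, so the number of colors it uses is a safe upper bound for n_colors. Below is a minimal sketch of that idea in plain Python. The function name greedy_color_bound and the small example graph are made up for illustration, and the highest-degree-first ordering is just one common heuristic, not something the question's code already does.

```python
from collections import defaultdict

def greedy_color_bound(node_count, edges):
    """Color nodes greedily (highest degree first).

    Returns (n_colors, coloring). The coloring is always valid, so
    n_colors is an upper bound on the minimum number of colors.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    order = sorted(range(node_count),
                   key=lambda n: len(neighbors[n]), reverse=True)
    coloring = {}
    for n in order:
        # Colors already taken by colored neighbors of n
        taken = {coloring[m] for m in neighbors[n] if m in coloring}
        c = 0
        while c in taken:
            c += 1
        coloring[n] = c
    n_colors = max(coloring.values()) + 1 if coloring else 0
    return n_colors, coloring

# Hypothetical example graph: a 4-cycle with one chord
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
n_colors, coloring = greedy_color_bound(4, edges)
```

The resulting n_colors could then replace the hard-coded 10 in colors = range(n_colors). The bound may be loose on some graphs, but it never cuts off the optimum.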

Related

Computing Bounding Boxes from a Mask-Image (Tensorflow or other)

I'm looking for ways to convert a mask (a Height x Width boolean image) into a series of bounding boxes (see example picture below, which I hand-drew), with boxes encircling the "islands of truth".
Specifically, I'm looking for a way that would work with standard TensorFlow ops (though all input is welcome). I want this so I can convert the model to TFLite without adding custom ops and recompiling from source. But in general it would just be nice to be aware of different ways of doing this.
Notes:
I already have a solution involving non-standard Tensorflow, based on tfa.image.connected_components (see solution here). However that op is not included in Tensorflow Lite. It also feels like it does something slightly harder than necessary (finding connected components feels harder than just outlining blobs on an image without worrying about whether they are connected or not).
I know I haven't specified here exactly how I'd like the boxes generated (e.g. whether separate "yin-yang-style" connected components should have separate boxes even if they overlap, etc.). Really I'm not worried about the details, just that the resulting boxes look "reasonable".
Some related questions (please read before flagging as duplicate!):
Converting a binary mask into a bounding box in tensorflow asks about creating a single bounding box, which is significantly easier.
Generating bounding boxes from heatmap data (similar, but asks the slightly broader question of converting from "heatmap", and does not specify Tensorflow).
Create Bounding Boxes from Image Labels assumes the image has already been segmented into components (called "labels" there)
I'm ideally looking for something that does not need training (e.g. YOLO-style regression) and just works out of the box (heh).
Edit: Here is an example mask image: https://github.com/petered/data/blob/master/images/example_mask3.png which can be loaded into a mask with
mask = cv2.imread(os.path.expanduser('~/Downloads/example_mask3.png')).mean(axis=2) > 50
Well, not sure if this is doable with just tensorflow ops, but here is a Python/Numpy implementation (which uses a very inefficient double-for loop). In principle, it should be fast if vectorized (again, not sure if possible) or written in C, because it just does 2 passes over the pixels to compute the boxes.
I'm not sure if this algorithm has an existing name, but if not I would call it Downright Boxing because it involves extending the mask-segments down and to the right in order to find boxes.
Here's the result on the mask in the question (with a few extra shapes added as examples):
import numpy as np

def mask_to_boxes(mask: np.ndarray) -> np.ndarray:
    """ Convert a boolean (Height x Width) mask into a (N x 4) array of NON-OVERLAPPING bounding boxes
    surrounding "islands of truth" in the mask. Boxes indicate the (Left, Top, Right, Bottom) bounds
    of each island, with Right and Bottom being NON-INCLUSIVE (ie they point to the indices AFTER the island).
    This algorithm (Downright Boxing) does not necessarily put separate connected components into
    separate boxes.
    You can "cut out" the island-masks with
        boxes = mask_to_boxes(mask)
        island_masks = [mask[t:b, l:r] for l, t, r, b in boxes]
    """
    max_ix = max(s+1 for s in mask.shape)  # Use this to represent background
    # These arrays will be used to carry the "box start" indices down and to the right.
    x_ixs = np.full(mask.shape, fill_value=max_ix)
    y_ixs = np.full(mask.shape, fill_value=max_ix)
    # Propagate the earliest x-index in each segment to the bottom-right corner of the segment
    for i in range(mask.shape[0]):
        x_fill_ix = max_ix
        for j in range(mask.shape[1]):
            above_cell_ix = x_ixs[i-1, j] if i>0 else max_ix
            still_active = mask[i, j] or ((x_fill_ix != max_ix) and (above_cell_ix != max_ix))
            x_fill_ix = min(x_fill_ix, j, above_cell_ix) if still_active else max_ix
            x_ixs[i, j] = x_fill_ix
    # Propagate the earliest y-index in each segment to the bottom-right corner of the segment
    for j in range(mask.shape[1]):
        y_fill_ix = max_ix
        for i in range(mask.shape[0]):
            left_cell_ix = y_ixs[i, j-1] if j>0 else max_ix
            still_active = mask[i, j] or ((y_fill_ix != max_ix) and (left_cell_ix != max_ix))
            y_fill_ix = min(y_fill_ix, i, left_cell_ix) if still_active else max_ix
            y_ixs[i, j] = y_fill_ix
    # Find the bottom-right corners of each segment
    new_xstops = np.diff((x_ixs != max_ix).astype(np.int32), axis=1, append=False)==-1
    new_ystops = np.diff((y_ixs != max_ix).astype(np.int32), axis=0, append=False)==-1
    corner_mask = new_xstops & new_ystops
    y_stops, x_stops = np.array(np.nonzero(corner_mask))
    # Extract the boxes, getting the top-left corners from the index arrays
    x_starts = x_ixs[y_stops, x_stops]
    y_starts = y_ixs[y_stops, x_stops]
    ltrb_boxes = np.hstack([x_starts[:, None], y_starts[:, None], x_stops[:, None]+1, y_stops[:, None]+1])
    return ltrb_boxes
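The corner-finding step is the least obvious part, so here is a small isolated sketch of just that trick. Padding with a trailing 0 and looking for a difference of -1 marks the last active cell of every run, including runs that touch the right edge. The helper name run_ends and the toy array are only for illustration.

```python
import numpy as np

def run_ends(active: np.ndarray) -> np.ndarray:
    """Mark the last True cell of each horizontal run in a 2D boolean array."""
    # append=0 pads one background column on the right, so a run that
    # reaches the last column still produces a 1 -> 0 transition.
    steps = np.diff(active.astype(np.int32), axis=1, append=0)
    return steps == -1

active = np.array([[False, True,  True,  False, True],
                   [True,  False, False, True,  True]])
ends = run_ends(active)
# ends[0] marks columns 2 and 4; ends[1] marks columns 0 and 4
```

In mask_to_boxes the same test is applied along both axes, and a cell counts as a bottom-right corner only when both tests fire.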

Kinetic Theory Model

Edit: I've now fixed the problem I asked about. The spheres were leaving the box in the corners, where the if statements (in the while loop shown below) got confused. In the bits of code that reverse the individual components of velocity on contact with walls, some elif statements were used. When elif is used (as far as I can tell) if the sphere exceeds more than one position limit at a time, the program only reverses the velocity component for one of them. This is rectified when replacing elif with simply if. I'm not sure if I quite understand the reason behind this, so hopefully someone cleverer than I will comment such information, but for now, if anyone has the same problem, I hope my limited input helps!
Some context first:
I'm trying to build a model of the kinetic theory of gases in VPython, as a revision exercise for my (Physics) degree. This involves me building a hollow box and putting a bunch of spheres in it, randomly positioned throughout the box. I then need to assign each of the spheres its own random velocity and then use a loop to adjust the position of each sphere with reference to its velocity vector and a time step.
The spheres should also undergo elastic collisions with each wall and all other spheres.
When a sphere meets a wall in the x-direction, its x-velocity component is reversed and similarly in the y and z directions.
When a sphere meets another sphere, they swap velocities.
Currently, my code works so far as creating the right number of spheres and distributing them randomly and giving each sphere its own random velocity. The spheres also move as they should, except for collisions. The spheres should all stay inside the box as they should bounce off all the walls. They appear to be bouncing off each other, however, occasionally a sphere or two will go straight through the box.
I am extremely new to programming and I don't quite understand what's going on here or why it's happening but I'd be very grateful if someone could help me.
Below is the code I have so far (I've tried to comment what I'm doing at each step):
##########################################################
# This code is meant to create an empty box and then create
# a certain number of spheres (num_spheres) that will sit inside
# the box. Each sphere will then be assigned a random velocity vector.
# A loop will then adjust the position of each sphere to make them
# move. The spheres will undergo elastic collisions with the box walls
# and also with the other spheres in the box.
##########################################################
from visual import *
import random as random
import numpy as np
num_spheres = 15
fps = 24 #fps of while loop (later)
dt = 1.0/fps #time step
l = 40 #length of box
w = 2 #width of box
radius = 0.5 #radius of spheres
##########################################################
# Creating an empty box with sides length/height l, width w
wallR = box(pos = (l/2.0,0,0), size=(w,l,l), color=color.white, opacity=0.25)
wallL = box(pos = (-l/2.0,0,0), size=(w,l,l), color=color.white, opacity=0.25)
wallU = box(pos = (0,l/2.0,0), size=(l,w,l), color=color.white, opacity=0.25)
wallD = box(pos = (0,-l/2.0,0), size=(l,w,l), color=color.white, opacity=0.25)
wallF = box(pos = (0,0,l/2.0), size=(l,l,w), color=color.white, opacity=0.25)
wallB = box(pos = (0,0,-l/2.0), size=(l,l,w), color=color.white, opacity=0.25)
#defining a function that creates a list of 'num_spheres' randomly positioned spheres
def create_spheres(num):
    global l, radius
    particles = [] # Create an empty list
    for i in range(0,num): # Loop i from 0 to num-1
        v = np.random.rand(3)
        particles.append(sphere(pos= (3.0/4.0*l) * (v - 0.5), #pos such that spheres are inside box
                                radius = radius, color=color.red, index=i))
        # each sphere is given an index for ease of referral later
    return particles
#defining a global variable = the array of velocities for the spheres
velarray = []
#defining a function that gives each sphere a random velocity
def velocity_spheres(sphere_list):
    global velarray
    for sphere in sphere_list:
        #making the sign of each velocity component random
        rand = random.randint(0,1)
        if rand == 1:
            sign = 1
        else:
            sign = -1
        mu = 10 #defining an average for normal distribution
        sigma = 0.1 #defining standard deviation of normal distribution
        # 3 random numbers form the velocity vector
        vel = vector(sign*random.normalvariate(mu, sigma),sign*random.normalvariate(mu, sigma),
                     sign*random.normalvariate(mu, sigma))
        velarray.append(vel)
spheres = create_spheres(num_spheres) #creating some spheres
velocity_spheres(spheres) # invoking the velocity function
while True:
    rate(fps)
    for sphere in spheres:
        sphere.pos += velarray[sphere.index]*dt
        #incrementing sphere position by reference to its own velocity vector
        if abs(sphere.pos.x) > (l/2.0)-w-radius:
            (velarray[sphere.index])[0] = -(velarray[sphere.index])[0]
            #reversing x-velocity on contact with a side wall
        elif abs(sphere.pos.y) > (l/2.0)-w-radius:
            (velarray[sphere.index])[1] = -(velarray[sphere.index])[1]
            #reversing y-velocity on contact with a side wall
        elif abs(sphere.pos.z) > (l/2.0)-w-radius:
            (velarray[sphere.index])[2] = -(velarray[sphere.index])[2]
            #reversing z-velocity on contact with a side wall
        for sphere2 in spheres: #checking other spheres
            if sphere2 != sphere:
                #making sure we aren't checking the sphere against itself
                if abs(sphere2.pos-sphere.pos) < (sphere.radius+sphere2.radius):
                    #if the other spheres are touching the sphere we are looking at
                    v1 = velarray[sphere.index]
                    #noting the velocity of the first sphere before the collision
                    velarray[sphere.index] = velarray[sphere2.index]
                    #giving the first sphere the velocity of the second before the collision
                    velarray[sphere2.index] = v1
                    #giving the second sphere the velocity of the first before the collision
Thanks again for any help!
The elif statements within the while loop in the original question's code are the cause of the problem. An elif branch is only evaluated if the preceding if (and any earlier elif) condition was not satisfied. A sphere that reaches an edge or corner of the box satisfies at least two of the conditions for reversing velocity components. So while one would expect (at least) two velocity components to be reversed, only one is: the component handled by the first matching branch is reversed, and the remaining elif branches are skipped because a condition has already been satisfied.
If each elif is changed to be a separate if statement, the code works as intended.
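To make the difference concrete, here is a stripped-down sketch without VPython. The two helper functions are hypothetical stand-ins for the wall checks in the loop, using plain tuples instead of vector objects, and the limit is the same (l/2.0)-w-radius value as in the question.

```python
def reflect_elif(pos, vel, limit):
    """Buggy variant: at most one velocity component is reversed per step."""
    vel = list(vel)
    if abs(pos[0]) > limit:
        vel[0] = -vel[0]
    elif abs(pos[1]) > limit:
        vel[1] = -vel[1]
    elif abs(pos[2]) > limit:
        vel[2] = -vel[2]
    return vel

def reflect_if(pos, vel, limit):
    """Fixed variant: every axis is checked independently."""
    vel = list(vel)
    if abs(pos[0]) > limit:
        vel[0] = -vel[0]
    if abs(pos[1]) > limit:
        vel[1] = -vel[1]
    if abs(pos[2]) > limit:
        vel[2] = -vel[2]
    return vel

limit = 40 / 2.0 - 2 - 0.5      # (l/2.0) - w - radius = 17.5
pos = (19.0, 19.0, 0.0)         # past both the x and the y limit
vel = (10.0, 10.0, 10.0)

buggy = reflect_elif(pos, vel, limit)   # only x is reversed
fixed = reflect_if(pos, vel, limit)     # x and y are both reversed
```

With elif the sphere keeps its outward y-velocity and escapes through the edge; with separate if statements both components turn around.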

efficient way to draw continuous line in psychopy

I'm looking for a more efficient way to draw continuous lines in PsychoPy. That's what I've come up with, for now...
edit: the only improvement I could think of is to add a new line only if the mouse has really moved by adding if (mspos1-mspos2).any():
ms = event.Mouse(myWin)
lines = []
mspos1 = ms.getPos()
while True:
    mspos2 = ms.getPos()
    if (mspos1-mspos2).any():
        lines.append(visual.Line(myWin, start=mspos1, end=mspos2))
    for j in lines:
        j.draw()
    myWin.flip()
    mspos1 = mspos2
edit: I tried it with ShapeStim (code below), hoping that it would work better, but it gets edgy even more quickly.
vertices = [ms.getPos()]
con_line = visual.ShapeStim(myWin,
                            lineColor='red',
                            closeShape=False)
myclock.reset()
i = 0
while myclock.getTime() < 15:
    new_pos = ms.getPos()
    if (vertices[i]-new_pos).any():
        vertices.append(new_pos)
        i += 1
    con_line.vertices=vertices
    con_line.draw()
    myWin.flip()
The problem is that it becomes too resource-demanding to draw that many visual.Lines, or to manipulate that many vertices in the visual.ShapeStim, on each iteration of the loop. The loop hangs on the draw (for Lines) or on the vertex assignment (for ShapeStim) for so long that the mouse has moved far enough in the meantime for the line to show discontinuities ("edgy").
So it's a performance issue. Here are two ideas:
Set a minimum distance the mouse must travel before a new coordinate is added to the line. In the example below I impose the criterion that the mouse position should be at least 10 pixels away from the previous vertex to be recorded. In my testing, this reduced the number of vertices recorded per second to about a third. This strategy alone will postpone the performance issue but not prevent it, so on to...
Use the ShapeStim solution but regularly switch to a new ShapeStim, each with fewer vertices, so that the stimulus being updated isn't too complex. In the example below I cap each stimulus at 500 vertices before shifting to a new one. There might be a small glitch while generating the new stimulus, but nothing I've noticed.
So combining these two strategies, starting and ending mouse drawing with a press on the keyboard:
# Setting things up
from psychopy import visual, event, core
import numpy as np
# The crucial controls for performance. Adjust to your system/liking.
distance_to_record = 10  # number of pixels between coordinate recordings
screenshot_interval = 500  # number of coordinate recordings before shifting to a new ShapeStim
# Stimuli
myWin = visual.Window(units='pix')
ms = event.Mouse()
myclock = core.Clock()
# The initial ShapeStim in the "stimuli" list. We can refer to the latest
# as stimuli[-1] and will do that throughout the script. The others are
# "finished" and will only be used for draw.
stimuli = [visual.ShapeStim(myWin,
                            lineColor='white',
                            closeShape=False,
                            vertices=np.empty((0, 2)))]
# Wait for a key, then start with this mouse position
event.waitKeys()
stimuli[-1].vertices = np.array([ms.getPos()])
myclock.reset()
while not event.getKeys():
    # Get mouse position
    new_pos = ms.getPos()
    # Calculating distance moved since last. Pure pythagoras.
    # Index -1 is the last row.
    distance_moved = np.sqrt((stimuli[-1].vertices[-1][0]-new_pos[0])**2+(stimuli[-1].vertices[-1][1]-new_pos[1])**2)
    # If mouse has moved the minimum required distance, add the new vertex to the ShapeStim.
    if distance_moved > distance_to_record:
        stimuli[-1].vertices = np.append(stimuli[-1].vertices, np.array([new_pos]), axis=0)
    # ... and show it (along with any "full" ShapeStims)
    for stim in stimuli:
        stim.draw()
    myWin.flip()
    # Add a new ShapeStim once the old one is too full
    if len(stimuli[-1].vertices) > screenshot_interval:
        print("new shapestim now!")
        stimuli.append(visual.ShapeStim(myWin,
                                        lineColor='white',
                                        closeShape=False,
                                        vertices=[stimuli[-1].vertices[-1]]))  # start from the last vertex
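If you want to check the two strategies without a window or a mouse, the bookkeeping can be tried on a plain list of points. This is only a sketch of the logic, not PsychoPy code: thin_and_chunk is a made-up helper, it assumes a non-empty list of (x, y) tuples, and it starts a new chunk as soon as the cap is reached rather than one frame later as the script above does.

```python
import math

def thin_and_chunk(points, min_dist, max_vertices):
    """Keep only points farther than min_dist from the last kept point,
    and split the kept points into chunks of at most max_vertices.

    Each new chunk starts from the last vertex of the previous chunk,
    so the chunks join up into one continuous line when drawn.
    """
    chunks = [[points[0]]]
    for x, y in points[1:]:
        last_x, last_y = chunks[-1][-1]
        if math.hypot(x - last_x, y - last_y) > min_dist:
            chunks[-1].append((x, y))
            if len(chunks[-1]) >= max_vertices:
                chunks.append([(x, y)])  # start from the last vertex
    return chunks

# A straight mouse path sampled every pixel from x=0 to x=100
path = [(float(x), 0.0) for x in range(0, 101)]
chunks = thin_and_chunk(path, min_dist=10, max_vertices=4)
```

Here 101 raw samples collapse to 10 kept vertices spread over a few short chunks, which is the kind of reduction that keeps the per-frame vertex assignment cheap.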

Constructing a bubble trellis plot with lattice in R

First off, this is a homework question. The problem is ex. 2.6 from pg.26 of An Introduction to Applied Multivariate Analysis. It's laid out as:
Construct a bubble plot of the earthquake data using latitude and longitude as the scatterplot and depth as the circles, with greater depths giving smaller circles. In addition, divide the magnitudes into three equal ranges and label the points in your bubble plot with a different symbol depending on the magnitude group into which the point falls.
I have figured out that symbols, which is in base graphics, does not work well with lattice. Also, I haven't figured out whether lattice has the functionality to change symbol size (i.e. bubble size). I bought the lattice book in a fit of desperation last night, and as I see in some of the examples, it is possible to set symbol color and shape for each "cut" or panel. I am therefore working under the assumption that symbol size can also be manipulated, but I haven't been able to figure out how.
My code looks like:
plot(xyplot(lat ~ long | cut(mag, 3), data=quakes,
            layout=c(3,1), xlab="Longitude", ylab="Latitude",
            panel = function(x,y){
              grid.circle(x,y,r=sqrt(quakes$depth),draw=TRUE)
            }
))
Where I attempt to use the grid package to draw the circles, but when this executes, I just get a blank plot. Could anyone please point me in the right direction? I would be very grateful!
Here is some code for creating the plot that you need without using the lattice package. I obviously had to generate my own fake data, so you can disregard all of that and go straight to the plotting commands if you want.
####################################################################
#Pseudo Data
n = 20
latitude = sample(1:100,n)
longitude = sample(1:100,n)
depth = runif(n,0,.5)
magnitude = sample(1:100,n)
groups = rep(NA,n)
for(i in 1:n){
  if(magnitude[i] <= 33){
    groups[i] = 1
  }else if (magnitude[i] > 33 & magnitude[i] <=66){
    groups[i] = 2
  }else{
    groups[i] = 3
  }
}
####################################################################
#The actual code for generating the plot
plot(latitude[groups==1],longitude[groups==1],col="blue",pch=19,ylim=c(0,100),xlim=c(0,100),
xlab="Latitude",ylab="Longitude")
points(latitude[groups==2],longitude[groups==2],col="red",pch=15)
points(latitude[groups==3],longitude[groups==3],col="green",pch=17)
points(latitude[groups==1],longitude[groups==1],col="blue",cex=1/depth[groups==1])
points(latitude[groups==2],longitude[groups==2],col="red",cex=1/depth[groups==2])
points(latitude[groups==3],longitude[groups==3],col="green",cex=1/depth[groups==3])
You just need to add default.units = "native" to grid.circle()
plot(xyplot(lat ~ long | cut(mag, 3), data=quakes,
layout=c(3,1), xlab="Longitude", ylab="Latitude",
panel = function(x,y){
grid.circle(x,y,r=sqrt(quakes$depth),draw=TRUE, default.units = "native")
}
))
Obviously you need to tinker with some of the settings to get what you want.
I have written a package called tactile that adds a function for producing bubbleplots using lattice.
tactile::bubbleplot(depth ~ lat*long | cut(mag, 3), data=quakes,
layout=c(3,1), xlab="Longitude", ylab="Latitude")

np.fft.fft off by a factor of 1000 (fitting a power spectrum)

I'm trying to make a power spectrum from an experimental dataset which I am reading in, and then to fit it to a theoretical curve. Everything is working fine and I'm not getting errors, except that my curve keeps differing from the data by a factor of 1000 and I have absolutely no idea what the problem could be. I've asked a few people, but to no avail. (I hope that you guys will be able to help.)
Anyway, I'm pretty sure that it's not the units, as they were triple-checked by me and 2 others. Basically, I need to fit a power spectrum to an equation by using the least squares method.
I can't post the whole code, as it's rather long and a bit messy, but this is the Fourier part (I added comments to all arrays and vars which have not been declared in the code):
#Calculate stuff
Nm = 10**-6 #micro to meter
KbT = 4.10E-21 #Joule
T = 297. #K
l = zvalue*Nm #meter
meany = np.mean(cleandatay*Nm) #meter (cleandata is the array that I read in from a csv at the start.)
SDy = sum((cleandatay*Nm - meany)**2)/len(cleandatay) #meter^2
FmArray[0][i] = ((KbT*l)/SDy) #N
#print(FmArray[0][i])
print(100.0*i/len(filelist)) #how many % done?
#fourier
dt = cleant[1]-cleant[0] #timestep
N = len(cleandatay) #Same for cleant, its the corresponding time to cleandatay
Here is where the Fourier part starts. I take the FFT and turn it into a power spectrum, then I calculate the corresponding frequency steps with the array freqs.
fouriery = np.fft.fft((cleandatay*(10**-6)))
fourierpower = (np.abs(fouriery))**2
fourierpower = fourierpower[1:N//2] #remove 0th datapoint and /2 (remove negative freqs)
fourierpower = fourierpower*dt #*dt to account for steps
freqs = (1.+np.arange((N//2)-1.))/50.
#Least squares method
eta = 8.9E-4 #pa*s
Rbead = 0.5E-6 #meter
constant = 2*KbT/(3*eta*np.pi*Rbead)
omega = 2*np.pi*freqs #rad/s
Wcarray = 2.*np.pi*np.arange(0,30, 0.02003) #0.02 = 30/len(freqs)
ChiSq = np.zeros(len(Wcarray))
for k in range(0, len(Wcarray)):
    Py = (constant / (Wcarray[k]**2 + omega**2))
    ChiSq[k] = sum((fourierpower - Py)**2)
    pylab.loglog(omega, Py)
    print(k*100//len(Wcarray))
index = np.where(ChiSq == min(ChiSq))
cutoffw = Wcarray[index]
Pygoed = (constant / (Wcarray[index]**2 + omega**2))
print(cutoffw)
print(constant)
print(min(ChiSq))
pylab.loglog(omega,ChiSq)
So I have no idea what could be going wrong. I think it's the FFT, as nothing else can really go wrong.
Below is the picture I get when I plot all the fit lines against the spectrum. As you can see, it is off by about 1000 (actually exactly 1000, as this leaves a least-squares residual of 10^-22, but I can't just randomly multiply without knowing why).
Just to elaborate on the picture. The green dots are the fft spectrum, the lines are the fits, the red dot is where it thinks the cutoff frequency is, and the blue line is the chi-squared fit, looking for the lowest value.
Take a look at the documentation for the FFT that you are using. Many FFT implementations leave the forward transform unscaled, so the result is larger by a factor of N (the number of samples). Multiplying by 1/N will scale the results back in line. (You said that the result is off by 1000... could it be that you are using a 1024-point FFT?)
Your library FFT routine might include a scale factor of 1/sqrt(n).
Check the documentation for the FFT you used, as the way the scale factor is split between the forward FFT and the inverse FFT is a matter of convention.
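For NumPy specifically, the default convention is that np.fft.fft applies no scaling and np.fft.ifft applies the 1/N. A quick way to see what that does to a power spectrum is to compare the signal energy in both domains (Parseval's theorem). The sketch below uses made-up random data, and the dt/N normalization of the PSD is one common convention, assumed here rather than taken from the question; whether it matches the theoretical curve in the question has to be checked against that curve's definition.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000          # number of samples (illustrative)
dt = 0.01         # sampling interval in seconds (illustrative)
x = rng.normal(size=N)

X = np.fft.fft(x)                    # default norm: forward transform unscaled

energy_time = np.sum(x**2)
energy_freq_raw = np.sum(np.abs(X)**2)   # larger than energy_time by a factor N
energy_freq = energy_freq_raw / N        # matches energy_time

# Assumed convention: two-sided PSD in units of x^2 per Hz,
# so that integrating it over frequency gives the mean square of x.
psd = np.abs(X)**2 * dt / N
df = 1.0 / (N * dt)
mean_square_from_psd = np.sum(psd) * df
```

If the raw |X|^2 is used with only a dt factor, the spectrum ends up off by a factor that depends on N, which is worth ruling out before looking elsewhere.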